Places

Volume 19, Issue 7; 18 Apr 2016; last modified 25 Apr 2016

Keeping track of where you want to go by writing an app. Because that’s what you do, right?

There's no place like 127.0.0.1. (Or ::1, I suppose.)

I collect stuff. Not so much in the physical world, but in the digital world, I accumulate all kinds of data and metadata (not that there’s any clear distinction between those). I geotag my photographs. I create web pages for all my travel itineraries. I record GPS tracks of hikes and other recreational outings. I can tell you every movie I’ve seen since 2003. I’ve scanned hundreds of business cards. [Stop now before they think you’re crazy, —ed]

I’m largely unsatisfied with how all of this information is collected and preserved: several calendars, Google contacts, Emacs Org files, scattered XML and RDF (linked data) documents, and a bunch of Evernote notebooks. But that’s not what this posting is about.

One of the Evernote notebooks is “Travel - Places to go”: a collection of web clippings, magazine scans, and cryptic notes. I was looking at it the other day. Two thoughts struck me: first, the notes would be a lot more useful if they were on a map (yes, I’d like to stay in Icehotel Jukkasjärvi and I’d like to visit Huilo-Huilo Biological Reserve, but for entirely practical reasons, I’m not likely to do them both on the same trip!), and second, there are a lot of Wikipedia pages in that notebook.

Wikipedia pages. A lot of structured Wikipedia data is available in DBpedia. And thus an idea was born:

places.nwalsh.com

It’d be easy:

  1. Grab the structured data from DBpedia: geo_coordinates_en.tql.bz2, geo_coordinates_mappingbased_en.tql.bz2, images_en.tql.bz2, instance_types_en.tql.bz2, labels_en.tql.bz2, short_abstracts_en.tql.bz2.

  2. Write a couple hundred lines of Perl to de-normalize those files into JSON documents (the conversion itself is sketched after this list):

    {
        "uri": "https://en.wikipedia.org/wiki/Eiffel_Tower",
        "id": "wiki-221a0e",
        "type": "Building",
        "image": "http://en.wikipedia.org/wiki/Special:FilePath/Tour_Eiffel_Wikimedia_Commons.jpg",
        "coord": [
            48.858222,
            2.2945
        ],
        "title": "Eiffel Tower",
        "summary": "The Eiffel Tower (/ˈaɪfəl ˈtaʊər/ EYE-fəl TOWR; French: tour Eiffel [tuʁ‿ɛfɛl]
    About this sound listen) is an iron lattice tower located on the Champ de Mars in Paris,
    France. It was named after the engineer Alexandre Gustave Eiffel, whose company designed
    and built the tower."
    }
  3. Upload the million or so resulting JSON documents to MarkLogic and set up a couple of indexes.

  4. Bang out a surprisingly small amount of JavaScript to display an OpenStreetMap map with Leaflet (sketched after this list).

  5. Write a few short XQuery modules to search for places within geospatial constraints and to maintain document collections (which is how I chose to manage which places you want to see, have been to, or want to see again).

  6. Write a little more XQuery and a little more JavaScript to display a popup box for each place.
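
The real conversion was a couple hundred lines of Perl; purely as a sketch of the idea behind step 2, here’s roughly what grouping the quads by subject looks like (in JavaScript, to stay in one language with the snippets below), assuming the dumps have been decompressed and using field names like the JSON example above:

    // Sketch only: group DBpedia quads by subject URI and accumulate the
    // fields of one JSON document per place. Assumes the dump has been
    // decompressed and each line looks like:
    //   <subject> <predicate> object <graph> .
    const fs = require('fs');
    const readline = require('readline');

    const places = new Map(); // subject URI -> partially built JSON document
    const QUAD = /^<([^>]+)>\s+<([^>]+)>\s+(.+?)\s+<[^>]+>\s+\.\s*$/;

    const rl = readline.createInterface({
      input: fs.createReadStream('geo_coordinates_en.tql')
    });

    rl.on('line', (line) => {
      const m = QUAD.exec(line);
      if (!m) { return; }
      const [, subject, predicate, object] = m;
      const place = places.get(subject) || { uri: subject, coord: [] };

      // The WGS84 latitude/longitude predicates end in #lat and #long.
      if (predicate.endsWith('#lat')) {
        place.coord[0] = parseFloat(object.replace(/^"/, ''));
      } else if (predicate.endsWith('#long')) {
        place.coord[1] = parseFloat(object.replace(/^"/, ''));
      }
      places.set(subject, place);
    });

    rl.on('close', () => {
      // The other dumps (labels, types, images, abstracts) would be merged
      // in the same way before writing one JSON document per place.
      for (const place of places.values()) {
        console.log(JSON.stringify(place));
      }
    });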
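
Steps 4 and 6 amount to something like this minimal sketch: a Leaflet map over OpenStreetMap tiles, with a marker and popup for each place in the current viewport. The map element id and the /search endpoint (returning JSON documents like the one above) are assumptions for illustration, not the application’s actual code:

    // Minimal sketch: OpenStreetMap tiles in Leaflet, plus a marker and
    // popup per place. The /search endpoint is hypothetical.
    const map = L.map('map').setView([48.858222, 2.2945], 13);

    L.tileLayer('https://tile.openstreetmap.org/{z}/{x}/{y}.png', {
      attribution: '&copy; OpenStreetMap contributors'
    }).addTo(map);

    function showPlaces() {
      const b = map.getBounds(); // only fetch places inside the viewport
      fetch(`/search?south=${b.getSouth()}&west=${b.getWest()}` +
            `&north=${b.getNorth()}&east=${b.getEast()}`)
        .then((response) => response.json())
        .then((places) => {
          places.forEach((place) => {
            L.marker(place.coord) // [lat, lon], as in the JSON above
              .addTo(map)
              .bindPopup(`<b><a href="${place.uri}">${place.title}</a></b>` +
                         `<p>${place.summary}</p>`);
          });
        });
    }

    map.on('moveend', showPlaces);
    showPlaces();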

The Eiffel Tower

It took literally a couple of hours, most of which was spent working out the format of, and groveling over, huge .bz2 files. MarkLogic is a goddamn Swiss Army Chainsaw: I didn’t expect it to be difficult, but I was genuinely surprised how quickly it came together. (Also surprising: how many DBpedia entries have wildly incorrect geospatial coordinates. Not really surprising: how much bad data an uninitialized variable can introduce into a data conversion script.) I built a useful, custom geospatial mapping application with all of Wikipedia in a couple of hours!

Since then, I’ve spent maybe a couple of days total adding a few more features: the ability to add new places, per-user notes, import and export, and geocoding for address searches.
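
The address search needs a geocoder; as an illustration only (not necessarily the service actually used), here’s a minimal sketch against OpenStreetMap’s public Nominatim API, recentering the Leaflet map from the earlier sketch on the first hit:

    // Sketch only: geocode an address with OSM's public Nominatim service
    // and recenter the map (from the Leaflet sketch above) on the result.
    function geocode(address) {
      const url = 'https://nominatim.openstreetmap.org/search?format=json&q='
                + encodeURIComponent(address);
      return fetch(url)
        .then((response) => response.json())
        .then((results) => {
          if (results.length === 0) {
            throw new Error(`No match for "${address}"`);
          }
          const hit = results[0]; // lat/lon come back as strings
          map.setView([parseFloat(hit.lat), parseFloat(hit.lon)], 13);
          return hit;
        });
    }

    // e.g. geocode('Champ de Mars, Paris');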

It’d probably take another week or so to polish off the remaining rough edges, and someone with actual design skills to make it look nice, but that’s ok. It totally scratches my itch. Now I just have to figure out how to connect the places to Evernote pages. Hmmm, maybe I should use Trello instead. Ooh, now there’s an idea…

Comments

Only one ice hotel? There are several.

More serious question: why Perl?

—Posted by Lauren Wood on 19 Apr 2016 @ 03:16 UTC #

I think one might be enough :-) That one is well above the Arctic Circle and inconveniently far from Patagonia for my example. From a practical perspective, staying in one in Canada might be easier!

The DBpedia downloads are huge and only a relatively small fraction contain geo data. Loading all of DBpedia into my little single-node MarkLogic instance seemed like a bad plan, though obviously if I had a cluster, that would probably have been the easiest thing.

Given that I wanted to prune the data beforehand, I was going to have to parse it with something: Perl or Python or Ruby or … and my fingers still type Perl by default for that kind of hacking.

—Posted by Norman Walsh on 19 Apr 2016 @ 03:36 UTC #