Places
Keeping track of where you want to go by writing an app. Because that’s what you do, right?
There's no place like 127.0.0.1. (Or ::1, I suppose.)
I collect stuff. Not so much in the physical world, but in the digital world, I accumulate all kinds of data and metadata (not that there’s any clear distinction between those). I geotag my photographs. I create web pages for all my travel itineraries. I record GPS tracks of hikes and other recreational outings. I can tell you every movie I’ve seen since 2003. I’ve scanned hundreds of business cards. [Stop now before they think you’re crazy, —ed]
I’m largely unsatisfied with how all of this information is collected and preserved: several calendars, Google contacts, Emacs Org files, scattered XML and RDF (linked data) documents, and a bunch of Evernote notebooks. But that’s not what this posting is about.
One of the Evernote notebooks is “Travel - Places to go”: a collection of web clippings, magazine scans, and cryptic notes. I was looking at it the other day. Two thoughts struck me: first, the notes would be a lot more useful if they were on a map (yes, I’d like to stay in the Icehotel in Jukkasjärvi and I’d like to visit the Huilo-Huilo Biological Reserve, but for entirely practical reasons, I’m not likely to do them both on the same trip!), and second, there are a lot of Wikipedia pages in that notebook.
Wikipedia pages. A lot of structured Wikipedia data is available in DBpedia. And thus an idea was born:

It’d be easy:
- Grab the structured data from DBpedia: geo_coordinates_en.tql.bz2, geo_coordinates_mappingbased_en.tql.bz2, images_en.tql.bz2, instance_types_en.tql.bz2, labels_en.tql.bz2, and short_abstracts_en.tql.bz2.
- Write a couple hundred lines of Perl to de-normalize those files into JSON documents like this one (there’s a sketch of the groveling after this list):

      {
        "uri": "https://en.wikipedia.org/wiki/Eiffel_Tower",
        "id": "wiki-221a0e",
        "type": "Building",
        "image": "http://en.wikipedia.org/wiki/Special:FilePath/Tour_Eiffel_Wikimedia_Commons.jpg",
        "coord": [ 48.858222, 2.2945 ],
        "title": "Eiffel Tower",
        "summary": "The Eiffel Tower (/ˈaɪfəl ˈtaʊər/ EYE-fəl TOWR; French: tour Eiffel [tuʁ‿ɛfɛl] About this sound listen) is an iron lattice tower located on the Champ de Mars in Paris, France. It was named after the engineer Alexandre Gustave Eiffel, whose company designed and built the tower."
      }

- Upload the million or so JSON documents to MarkLogic and set up a couple of indexes.
- Bang out a surprisingly small amount of JavaScript to display an OpenStreetMap map with Leaflet (also sketched below).
- Write a few short XQuery modules to search for places within geospatial constraints and to maintain document collections (which is how I chose to manage which places you want to see, went to, or want to see again); see the XQuery sketch after this list.
- Write a little more XQuery and a little more JavaScript to display a popup box for each place.
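
For the curious, the Perl groveling looks roughly like this. A minimal sketch, not the real couple hundred lines: the N-Quad parsing is naive, and while the predicates (the WGS84 coordinates, rdfs:label, rdfs:comment for the short abstracts, foaf:depiction for the images) are the standard ones in those dumps, the field handling and output shape here are simplifications.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use JSON::PP qw(encode_json);

    # Each line of the DBpedia .tql dumps is an N-Quad:
    #   <subject> <predicate> <object> <graph> .
    # Run as: bzcat *_en.tql.bz2 | ./denorm.pl > places.jsonl
    # Accumulate the fields we care about per subject, then emit one JSON
    # document per place that actually has coordinates.
    my %place = ();   # initialize it; an uninitialized variable burned me here

    while (my $line = <>) {
        my ($s, $p, $o) = $line =~ m{^<([^>]+)> <([^>]+)> (.+) <[^>]+> \.\s*$}
            or next;    # skips comments and anything this naive regex misses
        if ($p eq 'http://www.w3.org/2003/01/geo/wgs84_pos#lat') {
            ($place{$s}{lat}) = $o =~ m{^"([-0-9.]+)"};
        }
        elsif ($p eq 'http://www.w3.org/2003/01/geo/wgs84_pos#long') {
            ($place{$s}{long}) = $o =~ m{^"([-0-9.]+)"};
        }
        elsif ($p eq 'http://www.w3.org/2000/01/rdf-schema#label') {
            ($place{$s}{title}) = $o =~ m{^"(.*)"\@en};
        }
        elsif ($p eq 'http://www.w3.org/2000/01/rdf-schema#comment') {
            ($place{$s}{summary}) = $o =~ m{^"(.*)"\@en};
        }
        elsif ($p eq 'http://xmlns.com/foaf/0.1/depiction') {
            ($place{$s}{image}) = $o =~ m{^<([^>]+)>};
        }
    }

    for my $s (sort keys %place) {
        my $d = $place{$s};
        next unless defined $d->{lat} && defined $d->{long};
        print encode_json({
            uri   => $s,
            coord => [ $d->{lat} + 0, $d->{long} + 0 ],
            map { defined $d->{$_} ? ( $_ => $d->{$_} ) : () }
                qw(title summary image),
        }), "\n";
    }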
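The XQuery side is similarly small. A sketch of the viewport search, assuming a geospatial JSON property index on coord (point format, lat/long order); the “want-to-go” collection name and the bounding box are just examples:

    xquery version "1.0-ml";

    (: Find places inside the current map viewport, limited to one
       collection. Assumes a geospatial JSON property index on "coord". :)
    let $box := cts:box(48.0, 1.0, 49.0, 3.0)   (: south, west, north, east :)
    return
      cts:search(fn:collection(),
        cts:and-query((
          cts:collection-query("want-to-go"),
          cts:json-property-geospatial-query("coord", $box)
        )))[1 to 100]

Marking a place as seen (or wanted) is then just xdmp:document-add-collections() on its document URI, and the bulk upload itself can go through MarkLogic Content Pump (mlcp) or plain xdmp:document-insert() calls.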
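And the Leaflet part really is small. In this sketch the tile layer is the standard OpenStreetMap one, but the /places.json endpoint, its query parameters, and its response shape (a JSON array of objects with coord, title, and summary) are invented for illustration:

    // An OpenStreetMap base layer plus one marker per place, each with
    // a popup.
    const map = L.map('map').setView([48.858222, 2.2945], 13);

    L.tileLayer('https://tile.openstreetmap.org/{z}/{x}/{y}.png', {
      attribution: '&copy; OpenStreetMap contributors'
    }).addTo(map);

    const markers = L.layerGroup().addTo(map);

    // Re-query the server for places whenever the viewport settles.
    map.on('moveend', async () => {
      const b = map.getBounds();
      const resp = await fetch('/places.json' +
        `?s=${b.getSouth()}&w=${b.getWest()}` +
        `&n=${b.getNorth()}&e=${b.getEast()}`);
      markers.clearLayers();
      for (const place of await resp.json()) {
        L.marker(place.coord)                    // coord is [lat, long]
          .addTo(markers)
          .bindPopup(`<b>${place.title}</b><br>${place.summary || ''}`);
      }
    });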

It took literally a couple hours, most of which was spent working out the format of, and groveling over, huge .bz2 files. MarkLogic is a goddamn Swiss Army Chainsaw: I didn’t expect it to be difficult, but I was genuinely surprised how quickly it came together. (Also surprising: how many DBpedia entries have wildly incorrect geospatial coordinates. Not really surprising: how much bad data an uninitialized variable can introduce into a data conversion script.) I built a useful, custom geospatial mapping application with all of Wikipedia in a couple of hours!
Since then, I’ve spent maybe a couple of days total adding a few more features: the ability to add new places, per-user notes, import and export, and geocoding for address searches.
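For the geocoding, one obvious choice (an assumption on my part; it simply fits the OSM/Leaflet stack, and any geocoder would do) is OpenStreetMap’s Nominatim service:

    // Resolve an address search to a [lat, long] pair via Nominatim.
    async function geocode(address) {
      const url = 'https://nominatim.openstreetmap.org/search?format=json&q=' +
                  encodeURIComponent(address);
      const results = await (await fetch(url)).json();
      if (results.length === 0) return null;
      return [Number(results[0].lat), Number(results[0].lon)];
    }

    // e.g.: map.setView(await geocode('Champ de Mars, Paris'), 15);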
It’d probably take another week or so to polish off the remaining rough edges, and someone with actual design skills to make it look nice, but that’s ok. It totally scratches my itch. Now I just have to figure out how to connect the places to Evernote pages. Hmmm, maybe I should use Trello instead. Ooh, now there’s an idea…
Comments
Only one ice hotel? There are several.
More serious question: why Perl?
I think one might be enough :-) That one is well above the Arctic Circle and inconveniently far from Patagonia for my example. From a practical perspective, staying in one in Canada might be easier!
The DBpedia downloads are huge and only a relatively small fraction contain geo data. Loading all of DBpedia into my little single-node MarkLogic instance seemed like a bad plan, though obviously if I had a cluster, that would probably have been the easiest thing.
Given that I wanted to prune the data beforehand, I was going to have to parse it with something: Perl or Python or Ruby or … and my fingers still type Perl by default for that kind of hacking.