I was interested in finding worldwide points of interest, and I quickly found the OpenStreetMap database. The complete database, which is updated daily by generous contributors, is available as a 16GB compressed XML file (around 250GB uncompressed). Thankfully, you can find mirrors that have partitioned the data in some meaningful way (like by major cities).
For our needs, the data is made up of a few important elements. The first is a node, which has a longitude, a latitude and an id. A node has zero or more tag child elements, which are key-value pairs of metadata. There's also a way element, which references multiple node elements. You see, in my naive mind a point of interest like a building would be represented by a single node. However, from a mapping point of view, it's really a polygon made up of multiple node elements. A way can also have zero or more tags.
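To make the node/way structure concrete, here's a minimal sketch in Python (not the C# importer's code) that reads a tiny hand-written sample in the OSM XML layout. The sample data, the `parse_osm` name and the shape of the returned dictionaries are all my own assumptions for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hand-written sample in the OSM XML layout (made-up data).
SAMPLE = """<osm version="0.6">
  <node id="1" lat="45.4215" lon="-75.6972">
    <tag k="amenity" v="cafe"/>
    <tag k="name" v="Example Cafe"/>
  </node>
  <node id="2" lat="45.4216" lon="-75.6970"/>
  <node id="3" lat="45.4217" lon="-75.6973"/>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <nd ref="3"/>
    <nd ref="1"/>
    <tag k="building" v="yes"/>
  </way>
</osm>"""


def read_tags(elem):
    """Collect the <tag k="..." v="..."/> children of a node or way."""
    return {t.get("k"): t.get("v") for t in elem.findall("tag")}


def parse_osm(source):
    """Stream through OSM XML and return (nodes, ways) keyed by id."""
    nodes, ways = {}, {}
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "node":
            nodes[int(elem.get("id"))] = {
                # Stored as (longitude, latitude).
                "loc": (float(elem.get("lon")), float(elem.get("lat"))),
                "tags": read_tags(elem),
            }
            elem.clear()  # drop the children and attributes we no longer need
        elif elem.tag == "way":
            ways[int(elem.get("id"))] = {
                # A way lists its nodes by id through <nd ref="..."/> children.
                "refs": [int(nd.get("ref")) for nd in elem.findall("nd")],
                "tags": read_tags(elem),
            }
            elem.clear()
    return nodes, ways


nodes, ways = parse_osm(io.BytesIO(SAMPLE.encode("utf-8")))
print(nodes[1]["tags"])
print(ways[10]["refs"])
```

Notice that the way closes its polygon by repeating its first node as its last reference, which matters when you compute anything from the outline.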
Ever since I wrote the MongoDB Geospatial tutorial, I've had an itch to try more real-world stuff with MongoDB's geo capabilities. This database seemed like an ideal candidate. The first thing I did was download a bunch of city dumps from a mirror and start writing a C# importer (GitHub). I wasn't actually interested in polygons, so I calculated the centroid of each way and converted it into a node. Most of the time the result was quite good. The importer's readme has more information.
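For the curious, here's one way the way-to-node conversion could look. This is a sketch in Python rather than the importer's actual C#, and it may not match the method the importer uses. It assumes longitude and latitude can be treated as flat x/y coordinates, which is reasonable for something building-sized, and it falls back to a plain average when the outline has no area (a line instead of a polygon, for example). The `way_centroid` name is mine.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def way_centroid(points: List[Point]) -> Point:
    """Area-weighted centroid of a polygon, treating lon/lat as planar."""
    if not points:
        raise ValueError("a way needs at least one node")

    # Closed ways repeat the first node as the last one; drop the duplicate.
    if len(points) > 1 and points[0] == points[-1]:
        ring = points[:-1]
    else:
        ring = points

    # Work relative to the first point to limit floating-point cancellation.
    ox, oy = ring[0]
    n = len(ring)
    area2 = 0.0  # twice the signed area (shoelace formula)
    cx = cy = 0.0
    for i in range(n):
        x0, y0 = ring[i][0] - ox, ring[i][1] - oy
        x1, y1 = ring[(i + 1) % n][0] - ox, ring[(i + 1) % n][1] - oy
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross

    if abs(area2) < 1e-15:
        # Degenerate outline: fall back to the average of the points.
        return (sum(p[0] for p in ring) / n, sum(p[1] for p in ring) / n)

    return (ox + cx / (3.0 * area2), oy + cy / (3.0 * area2))


square = [(10.0, 20.0), (12.0, 20.0), (12.0, 22.0), (10.0, 22.0), (10.0, 20.0)]
print(way_centroid(square))
```

A plain average of the points would be simpler, but it gets pulled toward any side of the outline that happens to have more nodes, which is why the sketch weights by area.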
Next, I wrote a little Sinatra app and did the obvious thing using the Google Maps API. You can also find the source for this on GitHub.
I've put up a demo at pots.mongly.com.
I also extracted the data for each city and made it available, so that you can play with it yourself. It's available at data.mongly.com (you can read about OpenStreetMap's licensing here). The data is meant to be easily imported into MongoDB using its mongoimport command. Download a city, extract it, and do the following:
    mongoimport -d pots -c tags PATH_TO_TAGS.json
    mongoimport -d pots -c nodes PATH_TO_COUNTRY.json
If mongoimport isn't in your PATH, you'll need to use the full path. Also, the tags.json file is the same for all cities, so you only need to import it once. Finally, connect to MongoDB, type use pots, and then create a 2D index on the loc field.
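In the mongo shell, the index would presumably be created with something like `db.nodes.createIndex({loc: "2d"})`. If you'd rather do it from a script, here's a rough Python sketch. It assumes you pass in a pymongo-style collection object (anything with `create_index` and `find`), and that loc is stored as a [longitude, latitude] pair. I haven't confirmed the order the importer writes, so check a document first. The helper names are mine.

```python
def ensure_geo_index(collection, field="loc"):
    """Create a legacy 2d index on `field` and return the index name."""
    # pymongo accepts the string "2d" (pymongo.GEO2D) as the index type.
    return collection.create_index([(field, "2d")])


def find_nearby(collection, lng, lat, limit=10):
    """Return a cursor over the documents closest to (lng, lat)."""
    # ASSUMPTION: loc is stored as [longitude, latitude].
    query = {"loc": {"$near": [lng, lat]}}
    return collection.find(query).limit(limit)


# Usage, assuming pymongo is installed and mongod is running locally:
#
#   from pymongo import MongoClient
#   nodes = MongoClient()["pots"]["nodes"]
#   ensure_geo_index(nodes)
#   for doc in find_nearby(nodes, -75.6972, 45.4215, limit=5):
#       print(doc)
```

A `$near` query needs the geospatial index to exist, so create the index before running any lookups.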
Different cities have different amounts of data. I left everything in, and you can see there's quite a bit of information. Given that MongoDB supports compound indexes, it'd be trivial to provide additional node filtering.
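As a rough idea of what that filtering could look like, here's a Python sketch that only builds the index specification and the query document, so it runs without a database. The `tags` field name and the example value are hypothetical, since I'm not describing the exact document layout here. The one firm rule is that in a compound 2d index the location field has to come first.

```python
def compound_index_spec(filter_field="tags"):
    """Index keys for a 2d index on loc plus one extra filter field."""
    # The 2d field must be listed first in a compound 2d index.
    # ASSUMPTION: `filter_field` ("tags" by default) is a hypothetical name.
    return [("loc", "2d"), (filter_field, 1)]


def nearby_with_filter(lng, lat, filter_field, value):
    """Query document: closest nodes to (lng, lat) matching an extra filter."""
    # ASSUMPTION: loc is stored as [longitude, latitude].
    return {"loc": {"$near": [lng, lat]}, filter_field: value}


print(compound_index_spec())
print(nearby_with_filter(-75.6972, 45.4215, "tags", "cafe"))

# With a pymongo collection these would be used as:
#   nodes.create_index(compound_index_spec())
#   nodes.find(nearby_with_filter(-75.6972, 45.4215, "tags", "cafe")).limit(10)
```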
And that's why, when people ask me "What did you do this weekend?", I can say I parsed a 250GB XML file (because, yes, I did download it and I did *try* to import it).