A recent project required me to find boundary data. After reading numerous articles, I decided I could probably pull it from openstreetmap.org. OpenStreetMap is a gigantic resource for global geographic data, and it's open data, so that is a plus. Then I just needed to figure out how to query the data I needed and convert it to a useful format to load into a database (Elasticsearch in my case).

The first problem is getting the data. Initially, I pulled the entire planet file (45 GB) from planet.openstreetmap.org. This file has everything from locations to streets, and also the boundary data that I need. After working on the data for a while, I found region-specific files at geofabrik.de, which are much smaller and easier to work with.

curl -L https://download.geofabrik.de/north-america-latest.osm.pbf -o na-latest.osm.pbf
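
A download this size is worth verifying before spending time on it. Geofabrik publishes an .md5 file next to each extract, so a minimal sketch of a check could look like this. The function name verify_md5 is hypothetical, and the fallback to md5 -q assumes a stock macOS without GNU md5sum.

```shell
# Compare a file's MD5 with the first field of a checksum file.
# $1 = downloaded file, $2 = the matching .md5 file
verify_md5() {
  file=$1
  sumfile=$2
  expected=$(cut -d ' ' -f 1 "$sumfile")
  if command -v md5sum >/dev/null 2>&1; then
    # GNU coreutils: output is "<hash>  <filename>"
    actual=$(md5sum "$file" | cut -d ' ' -f 1)
  else
    # Assumption: macOS, where `md5 -q` prints only the hash
    actual=$(md5 -q "$file")
  fi
  [ "$expected" = "$actual" ]
}

# Usage (not run here):
#   curl -L https://download.geofabrik.de/north-america-latest.osm.pbf.md5 -o na-latest.osm.pbf.md5
#   verify_md5 na-latest.osm.pbf na-latest.osm.pbf.md5 && echo "checksum OK"
```

The hashes are compared by hand rather than with md5sum -c because the file was saved under a different name than the one recorded in the .md5 file.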

After the file downloaded, I used several tools to work on it. The first is osmium, the Swiss Army knife of geo. If you plan on doing extensive geo work, this tool is a must. The next is osmfilter, which filters files in the osm format (you'll see the conversions later), together with its companion osmconvert, which handles those conversions. Finally, I used gdal (specifically its ogr2ogr command) to convert to the final GeoJSON data. I also used Nominatim to look up what data is available in OpenStreetMap. All these tools were available via Homebrew, so installation wasn't a problem.

brew install osmium-tool
brew install osmfilter
brew install gdal
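
The commands later in this post also call osmconvert and ogr2ogr, so it is worth confirming that every binary is actually on the PATH before starting. A small sketch follows; the function name missing_tools is made up for illustration.

```shell
# Print the name of every listed command that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Usage (not run here); empty output means everything is installed:
#   missing_tools osmium osmconvert osmfilter ogr2ogr
```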

The pbf file contains everything from borders to streets. I'm looking for just administrative boundaries. That's where osmium comes in.

osmium tags-filter na-latest.osm.pbf nwr/admin_level --overwrite -o na-latest-admin.osm.pbf
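
A narrower variant, offered as a sketch rather than what I ran: match only relations tagged boundary=administrative. By default osmium tags-filter also keeps the ways and nodes those relations reference, so the geometry can still be assembled. The function name is hypothetical, and boundaries mapped as plain closed ways instead of relations would be dropped by this expression.

```shell
# Keep only administrative boundary relations (plus the objects they reference).
# $1 = input .osm.pbf, $2 = output .osm.pbf
filter_admin_relations() {
  osmium tags-filter "$1" r/boundary=administrative --overwrite -o "$2"
}

# Usage (not run here):
#   filter_admin_relations na-latest.osm.pbf na-latest-admin.osm.pbf
```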

Now the pbf file only contains items related to administrative levels. Unfortunately, that also includes tags unrelated to the boundary polygons. Luckily, osmfilter sorts this out. As the name implies, this tool works with osm files rather than pbf, so a quick conversion with osmconvert is required.

osmconvert na-latest-admin.osm.pbf -o=na-latest-admin.osm

Perfect. Now osmfilter can be used to remove everything I don't need.

osmfilter na-latest-admin.osm --drop-tags="barrier= building= highway= landuse= office= place= waterway=" -o=na-latest-admin-noplace.osm

With the clean file, it's time to convert to GeoJSON. gdal can do this. Its OSM driver reads both .osm and .pbf files, but I went back to the more compact pbf format, so another conversion is required.

osmconvert na-latest-admin-noplace.osm -o=na-latest-admin-noplace.osm.pbf

Finally, create the GeoJSON.

ogr2ogr -f GeoJSON na-latest-admin-noplace.geojson na-latest-admin-noplace.osm.pbf multipolygons
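
If only one administrative level is needed, ogr2ogr can filter while it converts. The sketch below assumes the default GDAL OSM configuration, where the multipolygons layer exposes admin_level as a string field. The function name export_admin_level is hypothetical.

```shell
# Export only the multipolygons with a given admin_level.
# $1 = input .osm.pbf, $2 = output .geojson, $3 = admin_level value (e.g. 4)
export_admin_level() {
  # admin_level is compared as a string, hence the quotes around the value
  ogr2ogr -f GeoJSON -where "admin_level = '$3'" "$2" "$1" multipolygons
}

# Usage (not run here):
#   export_admin_level na-latest-admin-noplace.osm.pbf na-states.geojson 4
```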

And with that, I have a pretty GeoJSON file I can use for my project. Finding the correct data and learning how to manage it took a surprising amount of time.
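
The steps above can be chained into one function so that a failure at any stage stops the run. This is just a convenience sketch; the function name and the output-prefix argument are my own invention, and the intermediate file names mirror the ones used above.

```shell
# Run the whole pipeline for one extract.
# $1 = input .osm.pbf, $2 = prefix for all generated files
boundaries_to_geojson() {
  src=$1
  prefix=$2
  # 1. keep only objects carrying an admin_level tag
  osmium tags-filter "$src" nwr/admin_level --overwrite -o "${prefix}-admin.osm.pbf" &&
  # 2. pbf -> osm so osmfilter can read it
  osmconvert "${prefix}-admin.osm.pbf" -o="${prefix}-admin.osm" &&
  # 3. drop tags unrelated to the boundary polygons
  osmfilter "${prefix}-admin.osm" \
    --drop-tags="barrier= building= highway= landuse= office= place= waterway=" \
    -o="${prefix}-admin-noplace.osm" &&
  # 4. osm -> pbf for the final conversion
  osmconvert "${prefix}-admin-noplace.osm" -o="${prefix}-admin-noplace.osm.pbf" &&
  # 5. pbf -> GeoJSON
  ogr2ogr -f GeoJSON "${prefix}-admin-noplace.geojson" "${prefix}-admin-noplace.osm.pbf" multipolygons
}

# Usage (not run here):
#   boundaries_to_geojson na-latest.osm.pbf na-latest
```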

Alternative approach

When I first attacked the project, I used osmtogeojson to do the GeoJSON conversion. It worked, but some of the queries gave me odd results. This was probably due to my limited knowledge of the tools more than any specific issue with them. This is how I did it.

osmconvert planet.osm.pbf --out-o5m > planet.o5m
osmfilter --keep="admin_level=2 and boundary=administrative" planet.o5m -o=myfile.osm
osmtogeojson myfile.osm > myfile.geojson

Honorable mentions

I've listed a couple of utilities and sites I found on my journey to get boundary data.

Conversion script

After I had spent many hours reading documentation, this script is the one that pointed me in the right direction. Special thanks to SomeoneElseOSM for putting it together.

OSM Admin Boundaries Map

This excellent little utility visualizes available OpenStreetMap data. If you're looking to verify the validity of a boundary or even download GeoJSON for a specific place, this is the way to do it.

Please don't scrape this site. The author was kind enough to host this dataset, so let's be Good Samaritans and respect its intended use.

Pelias

These wildcats built a full geocoder on top of Elasticsearch. It's open-source and even has Docker images. If you're looking for a complete geo system, they seem to have done a great job.

Hopefully, this saves someone some time.
