A recent project required me to search for boundary data. After looking through numerous articles, I determined I could probably pull the data from openstreetmap.org. OpenStreetMap is a gigantic resource for global geoinformation, and its data is openly licensed, so that is a plus. Then I just needed to figure out how to query the data I needed and convert it to a useful format to stuff into a database (Elasticsearch in my case).
The first problem was getting the data. Initially, I pulled the entire planet file (45GB) from planet.openstreetmap.org. This file has everything from locations to streets, and also the boundary data that I needed. After working with it for a while, I found region-specific files at geofabrik.de, which are far easier to handle.
curl http://download.geofabrik.de/north-america-latest.osm.pbf -o na-latest.osm.pbf
After the file downloaded, I used several tools to work on it. The first one is osmium. It's the Swiss Army knife of geo. If you plan on doing extensive geo work, this tool is a must. The next one is osmfilter, which allows querying of the osm file format (you'll see conversions later). Finally, I used gdal to convert to the final GeoJSON data. I also used Nominatim to reference the data available in OpenStreetMap. All these tools were available via Homebrew, so installation wasn't a problem.
brew install osmium-tool
brew install osmfilter
brew install gdal
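Since every later step shells out to one of these tools, a quick check that the binaries actually landed on the PATH can save a confusing failure halfway through. This is a minimal sketch: missing_tools is a helper name made up for this post, and it assumes the packages above provide the osmium, osmconvert, osmfilter, and ogr2ogr commands used later.

```shell
# Sketch: report which of the given commands are not on PATH.
# missing_tools is a hypothetical helper, not part of any of these packages.
missing_tools() {
  local tool
  for tool in "$@"; do
    # command -v succeeds only when the shell can resolve the name
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
  return 0
}
```

Calling missing_tools osmium osmconvert osmfilter ogr2ogr prints nothing when all four are installed, and otherwise prints one missing name per line.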
The pbf file contains everything from borders to streets. I'm looking for just administrative boundaries. That's where osmium comes in.
osmium tags-filter na-latest.osm.pbf nwr/admin_level --overwrite -o na-latest-admin.osm.pbf
The resulting pbf file only contains items related to administrative levels. Unfortunately, that also includes tags unrelated to the boundary polygons. Luckily, osmfilter sorts this out. As the name implies, this tool works with osm files, so a quick conversion is required.
osmconvert na-latest-admin.osm.pbf -o=na-latest-admin.osm
osmfilter can be used to remove everything I don't need.
osmfilter na-latest-admin.osm --drop-tags="barrier= building= highway= landuse= office= place= waterway=" -o=na-latest-admin-noplace.osm
With the clean file, it's time to convert to GeoJSON. gdal can do this, but it requires the original pbf format (of course it does). Another conversion is required.
osmconvert na-latest-admin-noplace.osm -o=na-latest-admin-noplace.osm.pbf
Finally, create the GeoJSON file.
ogr2ogr -f GeoJSON na-latest-admin-noplace.geojson na-latest-admin-noplace.osm.pbf multipolygons
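Before trusting the output, a rough count of features per admin_level is a cheap sanity check. The sketch below is a text-level approximation, not a JSON parser. It assumes admin_level shows up as its own string property in the multipolygons layer (which I believe is the case with GDAL's default OSM configuration, but verify against your own file), and count_admin_levels is a name made up here.

```shell
# Sketch: count occurrences of each admin_level value in a GeoJSON file.
# Assumption: values are written as "admin_level": "N" string properties.
# count_admin_levels is a hypothetical helper name.
count_admin_levels() {
  grep -o '"admin_level": *"[^"]*"' "$1" \
    | sed 's/.*"\([^"]*\)"$/\1/' \
    | sort | uniq -c \
    | awk '{ print $2 "\t" $1 }'   # emit: level <TAB> count
}
```

Running count_admin_levels na-latest-admin-noplace.geojson prints one line per level with its count. If nothing prints, the assumption about where admin_level is stored probably does not hold for that file.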
And with that, I have a pretty GeoJSON file I can use for my project. Finding the correct data and learning how to manage it took a surprising amount of time.
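For reference, the whole chain can be collapsed into one script. This is a sketch, not a hardened tool: run and boundaries_to_geojson are names invented here, the intermediate file names are derived from the input name by a convention I assumed, and the final step feeds ogr2ogr the pbf produced by the second osmconvert call. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: chain the filter and conversion steps for one extract.
# Assumes osmium, osmconvert, osmfilter and ogr2ogr are on PATH.

# Print the command's words when DRY_RUN=1 (not re-quoted), else execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: boundaries_to_geojson INPUT.osm.pbf
# Intermediate names follow an assumed convention: <base>-admin, <base>-admin-noplace.
boundaries_to_geojson() {
  local input="$1"
  local base="${input%.osm.pbf}"
  local admin="${base}-admin"
  local clean="${admin}-noplace"

  # 1. Keep only objects carrying an admin_level tag.
  run osmium tags-filter "$input" nwr/admin_level --overwrite -o "${admin}.osm.pbf" || return 1
  # 2. Convert pbf to osm so osmfilter can read it.
  run osmconvert "${admin}.osm.pbf" "-o=${admin}.osm" || return 1
  # 3. Drop the tags that are unrelated to boundaries.
  run osmfilter "${admin}.osm" \
    --drop-tags="barrier= building= highway= landuse= office= place= waterway=" \
    "-o=${clean}.osm" || return 1
  # 4. Convert back to pbf for ogr2ogr.
  run osmconvert "${clean}.osm" "-o=${clean}.osm.pbf" || return 1
  # 5. Export the multipolygons layer as GeoJSON.
  run ogr2ogr -f GeoJSON "${clean}.geojson" "${clean}.osm.pbf" multipolygons || return 1
}
```

With DRY_RUN=1 boundaries_to_geojson na-latest.osm.pbf, the five commands are printed with the same file names used in this post. Without DRY_RUN, they run in order and stop at the first failure.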
When I first attacked the project, I used osmtogeojson to do the GeoJSON conversion. It worked, but some of the queries gave me odd results. That was probably due to my limited knowledge of the tools more than any specific issue with them. This is how I did it.
osmconvert planet.osm.pbf --out-o5m > planet.o5m
osmfilter --keep="admin_level=2 and boundary=administrative" planet.o5m -o=myfile.osm
osmtogeojson myfile.osm > myfile.geojson
I've listed a couple of utilities and sites I found on my journey to get boundary data.
After spending many hours reading documentation, this script is the one that put me in the right direction. Special thanks to SomeoneElseOSM for putting it together.
This excellent little utility visualizes available OpenStreetMap data. If you're looking to verify the validity of a boundary or even download geoJSON for a specific place, this is the way to do it.
Please don't scrape this site. The author was kind enough to host this dataset, so let's be good Samaritans and respect its intended use.
These wildcats built a full geocoder on top of Elasticsearch. It's open-source and even has Docker images. If you're looking for a complete geo system, they seem to have done a great job.
Hopefully, this saves someone some time.