Corinne Hutchinson's 7/8/2015 PuPPy Presentation on GeoDjango
Highlights of Tonight’s Talk
● Reasons for using location data in a web application
● Overview of GeoDjango & setting up a basic web application with geospatial support
● Scaling a system reliant on geospatial queries
Basic GeoSpatial Questions
● Where is A located? (i.e. point-in-polygon, mappability)
● What is the distance between A & B?
● What is the shortest path between A & B? (i.e. route planning)
● What’s the elevation change between A & B?
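As an illustration of the "distance between A & B" question, here is a minimal sketch of a great-circle distance using the haversine formula. It is not taken from the talk; the function name and the mean Earth radius constant are assumptions, and a spatial database would normally do this calculation for you.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius in kilometres


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate city-center coordinates, used here only as sample input
seattle = (47.6062, -122.3321)
portland = (45.5152, -122.6784)
print(round(haversine_km(*seattle, *portland), 1), "km")
```

The formula treats the Earth as a sphere, so it is an approximation; it also says nothing about route length or elevation, which need road-network and terrain data.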
GeoDjango
● Included in standard installations of Django (as django.contrib.gis) since 1.0
● Allows importing of geospatial data from essentially any vector data source (e.g. KML, shapefiles); raster data (bitmaps, etc) are not supported
● Provides familiar ORM interface for geospatial queries
● Straightforward to learn: excellent tutorial on main Django site
Starting a Geospatial Application: DB
● Pick your database and install any needed extensions
● Per-DB setup instructions are in the GeoDjango docs; supported options are PostgreSQL/PostGIS, MySQL, SQLite/SpatiaLite, or Oracle
● PostgreSQL/PostGIS and Oracle Spatial are generally considered the most mature spatial database options
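For the PostgreSQL/PostGIS option, the Django side of the setup might look like the settings fragment below. This is a sketch rather than the speaker's configuration: the database name, user, and the "world" app are hypothetical placeholders, and it assumes the PostGIS extension is already installed in the database.

```python
# settings.py (fragment)

INSTALLED_APPS = [
    "django.contrib.contenttypes",
    "django.contrib.auth",
    "django.contrib.gis",  # enables GeoDjango
    "world",               # hypothetical app holding the geospatial models
]

DATABASES = {
    "default": {
        # Spatial backend shipped with Django for PostgreSQL + PostGIS
        "ENGINE": "django.contrib.gis.db.backends.postgis",
        "NAME": "geodjango",  # hypothetical database name
        "USER": "geo",        # hypothetical database user
    }
}
```

The other supported databases have their own engines under django.contrib.gis.db.backends.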
Adding Data? Lots of Free Geospatial Data Sources
● US Census TIGER (Topologically Integrated Geographic Encoding and
Referencing): political boundaries e.g. states, counties, metro areas
● Natural Earth: natural features
● OpenStreetMap: map tiles, land use, etc
● NASA’s Socioeconomic Data and Applications Center (SEDAC): data about
human-environment interactions e.g. land use, poverty, climate
● Open Topography: topo data, most of the world
This is great! Can we scale it?
● Minimize direct database hits
● Re-route duplicated database calls to cache (e.g. Redis, Memcached)
● Reformat our data for cacheability: geohashes
Geohashes
● Developed by Gustavo Niemeyer, entered into public domain in 2008
● Method of sequentially subdividing the globe into spatial buckets
● Buckets represented as binary strings, encoded in base 32 at five bits per character (e.g. 0010110101011100011000110001101111000111 -> 5pf666y7)
● Allows for very fast point-in-polygon lookups
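The subdivision and encoding described above can be sketched in pure Python. This is an illustrative implementation of the standard geohash scheme, not code from the talk; the function names are made up for the example, and in practice a library such as python-geohash would likely be used instead.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def bits_to_base32(bits):
    """Encode a bit string, five bits per character, using the geohash alphabet."""
    if len(bits) % 5:
        raise ValueError("bit string length must be a multiple of 5")
    return "".join(BASE32[int(bits[i:i + 5], 2)] for i in range(0, len(bits), 5))


def geohash_bits(lat, lon, n_bits):
    """Halve the longitude and latitude ranges alternately, emitting one bit per split."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    out = []
    for i in range(n_bits):
        # Even-numbered bits refine longitude, odd-numbered bits refine latitude
        rng, value = (lon_range, lon) if i % 2 == 0 else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            out.append("1")
            rng[0] = mid  # keep the upper half
        else:
            out.append("0")
            rng[1] = mid  # keep the lower half
    return "".join(out)


def geohash_encode(lat, lon, precision=6):
    """Return the geohash of a point with `precision` characters."""
    return bits_to_base32(geohash_bits(lat, lon, 5 * precision))


# The bit string from the slide, five bits per character
print(bits_to_base32("0010110101011100011000110001101111000111"))  # → 5pf666y7
```

Because each extra character only subdivides the previous bucket, a shorter geohash is always a prefix of a longer one for the same point, which is what makes geohashes convenient as cache keys.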
Adding Geohashes to Point-in-Polygon Lookups
● Choose level of precision (5 or 6 are likely good)
● Convert point to a geohash, then extract the center of that geohash
● DB lookup to determine containing polygon
● Finally, cache the geohash-to-polygon mapping (e.g. set the key ‘c22zp’ to the value ‘Seattle’)
● Subsequent lookups: check cache for existing key matching geohash before conducting DB lookup
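The steps above can be sketched as follows. This is a simplified illustration, not the speaker's implementation: a plain dict stands in for Redis or Memcached, and `db_lookup` is a hypothetical callable that would wrap the real spatial query (in GeoDjango, something like a `geom__contains` filter on a hypothetical polygon model).

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet


def encode(lat, lon, precision=5):
    """Geohash a point by alternately halving the longitude and latitude ranges."""
    ranges = {True: [-180.0, 180.0], False: [-90.0, 90.0]}  # True = longitude turn
    chars, bits, is_lon = [], 0, True
    for i in range(5 * precision):
        rng = ranges[is_lon]
        value = lon if is_lon else lat
        mid = (rng[0] + rng[1]) / 2
        bit = 1 if value >= mid else 0
        rng[0 if bit else 1] = mid  # keep the half containing the point
        bits = (bits << 1) | bit
        is_lon = not is_lon
        if i % 5 == 4:  # every five bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
    return "".join(chars)


def decode_center(geohash):
    """Return the (lat, lon) center of the bucket a geohash represents."""
    ranges = {True: [-180.0, 180.0], False: [-90.0, 90.0]}
    is_lon = True
    for ch in geohash:
        idx = BASE32.index(ch)
        for shift in range(4, -1, -1):
            rng = ranges[is_lon]
            mid = (rng[0] + rng[1]) / 2
            rng[0 if (idx >> shift) & 1 else 1] = mid
            is_lon = not is_lon
    return (sum(ranges[False]) / 2, sum(ranges[True]) / 2)


class CachedPolygonLookup:
    """Point-in-polygon lookup that caches one answer per geohash bucket."""

    def __init__(self, db_lookup, precision=5, cache=None):
        self.db_lookup = db_lookup  # hypothetical: (lat, lon) -> polygon name
        self.precision = precision
        self.cache = {} if cache is None else cache  # stand-in for Redis/Memcached

    def lookup(self, lat, lon):
        key = encode(lat, lon, self.precision)
        if key in self.cache:  # cache hit: no database call
            return self.cache[key]
        center_lat, center_lon = decode_center(key)
        name = self.db_lookup(center_lat, center_lon)  # one DB hit per bucket
        self.cache[key] = name
        return name
```

Querying by the bucket's center means every point in a bucket gets the same answer, so buckets that straddle a polygon boundary can be misassigned; a higher precision shrinks that error at the cost of more cache keys.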
More Awesome GeoSpatial Libraries
● OGR/GDAL: interacting with geospatial data formats, e.g. opening files, etc
● PyShp: ESRI shapefile handling in pure Python (https://pypi.python.org/pypi/pyshp)
● PySAL: spatial analysis functions (https://github.com/pysal/pysal)
● PyQGIS: essentially anything you might want to do with GIS data (http://docs.qgis.org/testing/en/docs/pyqgis_developer_cookbook/intro.html)
● geopy: geocoding; integration with OpenStreetMap, Google Geocoding API, Baidu Maps, and many more (https://github.com/geopy/geopy)
● python-geohash: encoding/decoding points to geohashes, looking up geohash neighbors
● descartes: plotting geometric objects in matplotlib (https://pypi.python.org/pypi/descartes)
● NumPy: data wrangling (http://www.numpy.org/)
● pandas: data wrangling (http://pandas.pydata.org/)