OpenGeo: Introduction to PostGIS
PostGIS
PostGIS is an extension to the PostgreSQL relational database that provides spatial types,
indexes and functions, following the OGC “Simple Features for SQL” (SFSQL) specification.
Starting the Suite
You can start and stop the OpenGeo Suite, and access components like PostGIS and
GeoServer, via the “Dashboard”.
Start the Dashboard from the Start Menu > OpenGeo (Windows) or Applications >
OpenGeo (Mac OS X).
When you first start the dashboard, it provides a reminder about the default password for
accessing GeoServer.
Note
The PostGIS database has been installed with unrestricted access for local users (users
connecting from the same machine as the database is running). That means that it will
accept any password you provide. If you need to connect from a remote computer, the
password for the postgres user has been set to postgres.
1. First, we need to start up the Suite (which will start both PostGIS and GeoServer). Click
the green Start button at the top right corner of the Dashboard.
2. The first time the Suite starts, it initializes a data area and sets up template databases.
This can take a couple minutes. Once the Suite has started, you can click the Manage
option under the PostGIS component to start the pgAdmin utility.
About OpenGeo
OpenGeo provides commercial open
source software for internet mapping and
geospatial application development. We
are a social enterprise dedicated to the
growth and support of open source
software.
License
This work is licensed under a Creative
Commons Attribution-Share Alike 3.0
United States License. Feel free to use this
material, but we ask that you please retain
the OpenGeo branding, logos and style.
OpenGeo : Introduction to an Open Source Geostack : PostGIS http://workshops.opengeo.org/stack-intro/postgis.html
1 of 11 07/02/2011 10:35
Note
PostgreSQL has a number of administrative front-ends. The primary one is psql, a
command-line tool for entering SQL queries. Another popular PostgreSQL front-end is
the free and open source graphical tool pgAdmin. All queries done in pgAdmin can also
be done on the command line with psql.
3. If this is the first time you have run pgAdmin, you should have a server entry for PostGIS
(localhost:54321) already configured in pgAdmin. Double-click the entry, and enter
anything you like at the password prompt to connect to the database.
Note
If you have a previous installation of pgAdmin on your computer, you will not have an
entry for (localhost:54321). You will need to create a new connection. Go to File >
Add Server, and register a new server at localhost and port 54321 (note the
non-standard port number) in order to connect to the PostGIS bundled with the
OpenGeo Suite.
Creating a Database
PostgreSQL has the notion of a template database that can be used to initialize a new
database – the new database automatically gets a copy of everything from the template. When
you installed PostGIS, a spatially enabled database called template_postgis was created.
If we use template_postgis as a template when creating our new database, the new
database will be spatially enabled.
1. Open the Databases tree item and have a look at the available databases. The
postgres database is the user database for the default postgres user and is not too
interesting to us. The template_postgis database is what we are going to use to
create spatial databases.
2. Right-click on the Databases item and select New Database.
Note
If you receive an error indicating that the source database (template_postgis) is
being accessed by other users, this is likely because you still have it selected.
Right-click on the PostGIS (localhost:54321) item and select Disconnect.
You can then double-click the same item to reconnect and try again.
3. Fill in the New Database form as shown below and click OK.
Name      postgis
Owner     postgres
Encoding  UTF8
Template  template_postgis
4. Select the new postgis database and open it up to display the tree of objects. You’ll
see the public schema, and under that a couple of PostGIS-specific metadata tables –
geometry_columns and spatial_ref_sys – which we will discuss later.
5. Click on the SQL query button in the toolbar (or go to Tools > Query Tool).
6. Enter the following query into the query text field:
SELECT postgis_full_version();
Note
This is our first SQL query. postgis_full_version() is a management function
that returns version and build configuration.
7. Click the Play button in the toolbar (or press F5) to “Execute the query”. The query will
return a version string, confirming that PostGIS is properly enabled in the database.
8. You have successfully created a PostGIS spatial database! Now do a spatial calculation
just to make sure. Copy the following into the SQL window:
SELECT ST_Length('LINESTRING(0 0, 1 1)');
Our first spatial query constructs a diagonal line across a one-unit square. The length of
that line is sqrt(2), or approximately 1.4142.
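To see where that number comes from, the same length can be checked outside the database. The following is a minimal Python sketch, not part of the workshop materials; the helper name linestring_length is an assumption introduced here, and it only handles planar coordinates.

```python
import math

def linestring_length(coords):
    """Sum the straight-line distance between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))

# The diagonal of a one-unit square, as in the ST_Length() query.
diagonal = linestring_length([(0, 0), (1, 1)])
print(round(diagonal, 4))  # 1.4142
```

A line with more vertices works the same way: each segment is measured separately and the results are added up.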
Loading Shapes into PostGIS
The workshop data files are public domain data from the City of Medford, Oregon. The files are
located in the data/ directory of the workshop. The projection of the data is NAD83 State Plane
(Oregon South) in feet, more succinctly and opaquely known as EPSG:2270. The files are:
school_pt.shp: a small point file of school locations
road_ln.shp: a large line file of street centerlines
taxlot_ply.shp: a large polygon file of taxable property parcels
We will load our example data into PostGIS using the pgShapeLoader tool, which converts
shapefiles into PostGIS tables.
1. From the pgAdmin Plugins menu, select PostGIS Shapefile and DBF loader.
2. The loader will start with the connection information for your current pgAdmin database.
Click the “Test connection...” button to ensure you can connect to the database.
Now, click on the button in the “Shape File” area, and browse to the data directory.
Select the “school_pt.shp” file, and click “Open”.
3. Next, change the value of the SRID field to 2270.
4. Finally, click the “Import” button to start the process.
5. Repeat the process for “road_ln.shp” and “taxlot_ply.shp”. These are much larger files. To
make the load process go faster, open the “Options...” dialogue and turn the “Load using
COPY rather than INSERT” option on before running the import.
Loading Shapes into PostGIS... Using the Command Line
PostGIS ships with a command-line utility for loading shape files into the database, called
shp2pgsql, as well as a utility for exporting tables to shape files, called pgsql2shp.
If you completed the process with PostGIS Shapefile and DBF loader above, you do not
need to run these commands – the data is already loaded into your database.
Enter the workshop data directory, set the PATH environment variable to include the
PostgreSQL executables directory, and then run the data loading commands. shp2pgsql
converts the shape file into a SQL text file suitable for loading into the database. psql loads
the text file into the target database.
# set PATH=%PATH%;C:\Program Files\OpenGeo\OpenGeo Suite\pgsql\8.4\bin
# shp2pgsql -I -s 2270 -D road_ln.shp road_ln > road_ln.sql
# psql -p 54321 -f road_ln.sql -d postgis
# shp2pgsql -I -s 2270 -D taxlot_ply.shp taxlot_ply > taxlot_ply.sql
# psql -p 54321 -f taxlot_ply.sql -d postgis
# shp2pgsql -I -s 2270 -D school_pt.shp school_pt > school_pt.sql
# psql -p 54321 -f school_pt.sql -d postgis
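If you need to repeat the load for many shape files, the two-step pattern (convert, then load) can be generated rather than typed. The sketch below is an illustration only: the function name load_commands and its defaults are assumptions, and it merely prints the command lines instead of running them. The port is given to psql because psql is the program that connects to the database; shp2pgsql only writes SQL text.

```python
import shlex

def load_commands(table, srid=2270, port=54321, database="postgis"):
    """Return the two command lines that load <table>.shp into the database."""
    sql_file = f"{table}.sql"
    # shp2pgsql writes SQL to standard output; it never connects to the database.
    # -I builds a spatial index, -s sets the SRID, -D uses the faster dump format.
    convert = ["shp2pgsql", "-I", "-s", str(srid), "-D", f"{table}.shp", table]
    # psql is the program that connects, so the non-standard port goes here.
    load = ["psql", "-p", str(port), "-d", database, "-f", sql_file]
    return convert, sql_file, load

for table in ("road_ln", "taxlot_ply", "school_pt"):
    convert, sql_file, load = load_commands(table)
    print(f"{shlex.join(convert)} > {sql_file}")
    print(shlex.join(load))
```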
PostGIS System Tables
PostGIS follows the OGC SFSQL (Simple Features for SQL) specification, which means it
includes two standard system tables of metadata: SPATIAL_REF_SYS and
GEOMETRY_COLUMNS.
SPATIAL_REF_SYS
The SPATIAL_REF_SYS table contains information about “spatial reference systems”:
combinations of geographic systems (ellipsoids, datums) and projected systems (projections,
parameters) that are used for real-world mapping. “Transverse Mercator” is an example of a
projection, and WGS84 is an example of a spheroid, but “UTM Zone 10 North, NAD 83” is an
example of a full spatial reference system.
         Table "public.spatial_ref_sys"
  Column   |          Type           | Modifiers
-----------+-------------------------+-----------
 srid      | integer                 | not null
 auth_name | character varying(256)  |
 auth_srid | integer                 |
 srtext    | character varying(2048) |
 proj4text | character varying(2048) |
Indexes:
    "spatial_ref_sys_pkey" PRIMARY KEY, btree (srid)
Each row in the SPATIAL_REF_SYS table corresponds to one spatial reference system. The
srid column is the unique identifier, and is considered “internal” to the database. The
auth_name and auth_srid are the external authority and authority number. The authority is
usually “EPSG” and the table that ships with PostGIS matches the srid to the auth_srid for
convenience.
The srtext is the OGC “well-known text” representation of the spatial reference system. The
proj4text is the representation consumed by the Proj.4 reprojection library PostGIS uses to
provide on-the-fly reprojection. Because only the proj4text is used internally by PostGIS, it
is usually safe to omit the srtext when adding new entries, but be aware that external
programs may use the srtext to determine the projection of a particular table.
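A proj4text value is a space-separated list of +key=value parameters, which makes it easy to inspect. The Python sketch below splits such a string into a dictionary. The example string is an illustrative UTM Zone 10 North, NAD 83 definition written for this sketch, not a row copied from the table, and parse_proj4 is a hypothetical helper name.

```python
def parse_proj4(proj4text):
    """Split a Proj.4 definition into a dict; bare flags map to True."""
    params = {}
    for token in proj4text.split():
        key, _, value = token.lstrip("+").partition("=")
        params[key] = value if value else True
    return params

# Illustrative definition only (assumed, not read from spatial_ref_sys).
example = "+proj=utm +zone=10 +datum=NAD83 +units=m +no_defs"
params = parse_proj4(example)
print(params["proj"], params["zone"], params["units"])  # utm 10 m
```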
GEOMETRY_COLUMNS
The GEOMETRY_COLUMNS table contains information about the spatial columns in a database.
            Table "public.geometry_columns"
      Column       |          Type          | Modifiers
-------------------+------------------------+-----------
 f_table_catalog   | character varying(256) | not null
 f_table_schema    | character varying(256) | not null
 f_table_name      | character varying(256) | not null
 f_geometry_column | character varying(256) | not null
 coord_dimension   | integer                | not null
 srid              | integer                | not null
 type              | character varying(30)  | not null
Each row in the table corresponds to one spatial column. Tables may have multiple spatial
columns. Client software such as QGIS and uDig often use the GEOMETRY_COLUMNS table to
figure out which columns to display to the end user as “layers” suitable for viewing on a map.
The first four columns (f_table_catalog, f_table_schema, f_table_name,
f_geometry_column) serve to uniquely locate the geometry column. The next three
describe the spatial metadata:
coord_dimension provides the dimensionality (2, 3, or 4 dimensions are supported in
PostGIS);
srid provides the spatial reference system and must refer to a valid row in the
SPATIAL_REF_SYS table;
type provides the geometry type (point, linestring, polygon, etc).
Note that the GEOMETRY_COLUMNS table is not automatically updated as you create and drop
tables. You must manually keep it up to date.
One way to keep the table up-to-date is to religiously use the AddGeometryColumn()
function when managing DDL in spatial tables. This function takes in all the information
necessary to create a new column, performs the creation, and adds a metadata record:
SELECT AddGeometryColumn(
'public',
'mytable',
'mygeocolumn',
2,
4326,
'POLYGON'
);
Another way to keep the table up-to-date is to use helper functions. PostGIS 1.4 and higher
provide the Populate_Geometry_Columns() function, which checks for validity and also
fills in missing entries.
-- PostGIS 1.4
SELECT Populate_Geometry_Columns();
populate_geometry_columns
-------------------------------------------
probed:3 inserted:3 conflicts:0 deleted:0
(1 row)
Spatial Queries
We will now construct some queries of our spatial database, using “spatial SQL” functions
provided by PostGIS (and any other SFSQL spatial database). For a reference list of functions
we will be using, see the PostGIS Functions section.
Measuring
The taxlot_ply table contains 91,343 parcel polygons. It also includes a large number of
attributes about each parcel, including:
impvalue (improvement value)
landvalue (land value)
acreage (reported acreage)
yearblt (year built)
feeowner (name of the owner)
state (state of residence of the owner)
We can use the ST_Area() function in combination with these attributes to ask some
questions of the taxlot_ply table. Open the pgAdmin SQL window and run the following
queries against the database.
What is the area in acres of all parcels in the database?
SELECT Sum(ST_Area(the_geom)) / 43560
FROM taxlot_ply;
Answer: 1772888
What is the area in acres of parcels built on since 2000?
SELECT Sum(ST_Area(the_geom)) / 43560
FROM taxlot_ply
WHERE yearblt >= 2000;
Answer: 27176
What is the value per square foot of all parcels?
SELECT Sum(landvalue + impvalue) / Sum(ST_Area(the_geom))
FROM taxlot_ply;
Answer: 0.41
What is the value per square foot of all parcels held by out-of-state owners?
SELECT Sum(landvalue + impvalue) / Sum(ST_Area(the_geom))
FROM taxlot_ply
WHERE state != 'OR';
Answer: 0.38
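The division by 43560 in the acreage queries is the number of square feet in one acre; because the data is in a foot-based projection, ST_Area() returns square feet. The following Python sketch reproduces the arithmetic for a single made-up rectangular parcel. The parcel coordinates and its value are assumptions, and polygon_area is a plain shoelace-formula helper, not a PostGIS function.

```python
SQ_FT_PER_ACRE = 43560

def polygon_area(ring):
    """Shoelace formula for a simple polygon (closing vertex not repeated)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

# Hypothetical parcel: 660 ft by 66 ft, which is exactly one acre.
parcel = [(0, 0), (660, 0), (660, 66), (0, 66)]
parcel_value = 21780  # assumed land value plus improvement value, in dollars

area_sq_ft = polygon_area(parcel)
acres = area_sq_ft / SQ_FT_PER_ACRE
value_per_sq_ft = parcel_value / area_sq_ft
print(area_sq_ft, acres, value_per_sq_ft)  # 43560.0 1.0 0.5
```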
Measurement is not limited to areas. We can also use linear measurements to characterize the
roads in the county.
What is the breakdown of road types in the county?
SELECT
Sum(ST_Length(the_geom)) / 5280 AS miles,
Count(*) AS nsegments,
cfcc
FROM road_ln
GROUP BY cfcc
ORDER BY cfcc;
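The query groups road segments by their cfcc class code, sums the length of each group in feet, and converts to miles by dividing by 5280. The Python sketch below performs the same grouping on three made-up segments; the class codes and coordinates are assumptions chosen to give round numbers, and summarize is a hypothetical helper.

```python
import math
from collections import defaultdict

FEET_PER_MILE = 5280

# Hypothetical (cfcc, vertices) pairs standing in for rows of road_ln.
segments = [
    ("A41", [(0, 0), (5280, 0)]),
    ("A41", [(0, 0), (0, 2640)]),
    ("A31", [(0, 0), (3000, 4000)]),
]

def summarize(segments):
    """Return (miles, nsegments, cfcc) tuples ordered by cfcc."""
    totals = defaultdict(lambda: [0.0, 0])
    for cfcc, coords in segments:
        length_ft = sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))
        totals[cfcc][0] += length_ft
        totals[cfcc][1] += 1
    return [(feet / FEET_PER_MILE, count, cfcc)
            for cfcc, (feet, count) in sorted(totals.items())]

for miles, nsegments, cfcc in summarize(segments):
    print(f"{miles:.2f} {nsegments} {cfcc}")
```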
Sub-setting
So far, our queries have calculated one metric or a summary against every record in the
database. Databases are commonly used to store very large tables – larger than can be stored
in memory – and efficiently access sub-sets of those tables.
First, let’s find out the coordinates of the first school in our school_pt table:
SELECT ST_AsText(the_geom) FROM school_pt WHERE gid = 1;
Answer: POINT(4387009 402407)
Now, let’s take that point, and find the average property value in a one-mile (5280 foot) radius.
SELECT Sum(landvalue + impvalue) / Count(*) as avg_value
FROM taxlot_ply
WHERE
ST_DWithin(
the_geom,
ST_GeomFromText('POINT(4387009 402407)', 2270),
5280
);
Answer: 161,094
There are a number of things going on in this query:
The ST_GeomFromText() function is used to build a geometry object from the text
representation of a point. Note that the SRID is also set to 2270 at the same time, to
match the SRID of our data tables.
The ST_DWithin() function is then used to test every geometry against the query
point, and return true only if the geometry was within 5280 units (feet).
Finally, only those records that passed the distance test were fed into the calculation of
the average property value: total value divided by number of properties.
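The same filter-then-average logic can be sketched in a few lines of Python. This is a simplification under stated assumptions: each parcel is reduced to a single representative point (ST_DWithin() measures to the nearest part of the polygon), the parcel locations and values are invented, and average_value_within is a hypothetical helper.

```python
import math

def average_value_within(center, radius, parcels):
    """Average the values of parcels whose point lies within radius of center."""
    nearby = [value for point, value in parcels
              if math.dist(center, point) <= radius]
    if not nearby:
        return None
    return sum(nearby) / len(nearby)

school = (4387009, 402407)  # the school coordinates returned above, in feet
parcels = [
    ((4387009 + 1000, 402407), 150000),  # 1000 ft east: inside the radius
    ((4387009, 402407 + 5000), 250000),  # 5000 ft north: inside the radius
    ((4387009 + 6000, 402407), 900000),  # 6000 ft east: outside the radius
]

print(average_value_within(school, 5280, parcels))  # 200000.0
```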
Spatial Indexes
The PostGIS spatial index is an r-tree index, implemented on top of PostgreSQL’s GiST access
method infrastructure.
An “r-tree” (and any other spatial index) works by sorting the bounding boxes of features into a
quickly searchable tree. Because the features themselves are not indexed, just the bounding
boxes, all queries that use spatial indexes must proceed in two phases. First, the spatial index
is used to generate a subset of records that might match a spatial condition; then, an exact test
is used on just that subset to produce the final output set.
The “r-tree” index uses nested rectangles (or, in higher dimensions, cubes and hypercubes)
to sort the features into a quickly searchable tree.
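The two-phase idea can be illustrated without a real tree. In the Python sketch below, a linear scan over bounding boxes stands in for the index lookup (an actual r-tree would avoid visiting every box), and a ray-casting point-in-polygon test plays the role of the exact test. The feature names and shapes are assumptions made up for the example.

```python
def bbox(ring):
    """Return (xmin, ymin, xmax, ymax) for a list of vertices."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return (min(xs), min(ys), max(xs), max(ys))

def bbox_contains(box, point):
    x, y = point
    return box[0] <= x <= box[2] and box[1] <= y <= box[3]

def point_in_polygon(point, ring):
    """Exact test: ray casting against each polygon edge."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

features = {
    "triangle": [(0, 0), (10, 0), (0, 10)],
    "square": [(20, 20), (30, 20), (30, 30), (20, 30)],
}
boxes = {name: bbox(ring) for name, ring in features.items()}

def query(point):
    # Phase 1: cheap bounding-box filter (the part the index accelerates).
    candidates = [n for n, box in boxes.items() if bbox_contains(box, point)]
    # Phase 2: exact geometry test on the candidates only.
    matches = [n for n in candidates if point_in_polygon(point, features[n])]
    return candidates, matches

print(query((8, 8)))  # (['triangle'], [])
print(query((2, 2)))  # (['triangle'], ['triangle'])
```

The point (8, 8) shows why the second phase is needed: it falls inside the triangle's bounding box but outside the triangle itself.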
To create a spatial index in PostGIS, use the CREATE INDEX [indexname] ON
[tablename] USING GIST ( [geometry] ) command. The commands for our three
example tables appear in the exercise below.
Let’s compare an unindexed and indexed query for speed.
1. First, drop the spatial indexes on your tables.
DROP INDEX school_pt_the_geom_gist;
DROP INDEX taxlot_ply_the_geom_gist;
DROP INDEX road_ln_the_geom_gist;
2. Run the average property query, and see how fast it executes:
SELECT Sum(landvalue + impvalue) / Count(*) as avg_value
FROM taxlot_ply
WHERE
ST_DWithin(
the_geom,
ST_GeomFromText('POINT(4387009 402407)', 2270),
5280
);
3. Now, add the spatial indexes back onto your tables, and run the query again.
CREATE INDEX school_pt_the_geom_gist ON school_pt USING GIST (the_geom);
CREATE INDEX taxlot_ply_the_geom_gist ON taxlot_ply USING GIST (the_geom);
CREATE INDEX road_ln_the_geom_gist ON road_ln USING GIST (the_geom);
The unindexed query logs an execution time of over 1000ms, while with the indexes, a time of
less than 50ms is achieved.
Spatial Joins
With spatial indexes in place, we can perform spatial joins quickly: taking information from two
distinct tables and joining them together on the basis of spatial relationships.
Our last query determined the average property value within a one-mile radius of a single
school. We can use a spatial join to determine the property value within a one-mile radius
for all schools. Or, to keep the result set smaller, just the high schools.
SELECT
s.name AS school_name,
Sum(t.landvalue + t.impvalue) / Count(*) AS avg_property_value
FROM taxlot_ply t, school_pt s
WHERE
ST_DWithin(t.the_geom, s.the_geom, 5280)
AND
s.type = 'High School'
GROUP BY s.name
ORDER BY avg_property_value DESC;
And now we know where to send our kids to school in Medford.
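Conceptually, the join pairs every school with every parcel, keeps the pairs that pass the distance test, and then averages per school. The Python sketch below spells that out as a nested loop over invented data. School names, coordinates and values are assumptions, parcels are reduced to points, and the database would use the spatial index rather than comparing every pair.

```python
import math
from collections import defaultdict

# Hypothetical rows: (name, location) and (location, total value).
schools = [("North High", (0, 0)), ("South High", (20000, 0))]
parcels = [
    ((1000, 0), 100000),
    ((0, 2000), 300000),
    ((19000, 0), 500000),
    ((50000, 50000), 999999),  # more than a mile from either school
]

def average_value_by_school(schools, parcels, radius=5280):
    """Join on distance, then average parcel values per school, highest first."""
    values = defaultdict(list)
    for name, school_pt in schools:
        for parcel_pt, value in parcels:
            if math.dist(school_pt, parcel_pt) <= radius:
                values[name].append(value)
    averages = [(name, sum(v) / len(v)) for name, v in values.items()]
    return sorted(averages, key=lambda row: row[1], reverse=True)

for name, avg in average_value_by_school(schools, parcels):
    print(name, avg)
```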
Conclusion
These have been just a few examples of using spatial SQL for querying a database. In the
remaining sections of the workshop, most of the querying will happen behind the scenes, as
tools like GeoServer pull data from the database.
However, the power of the spatial database for analysis and querying remains easily available
via scripting languages and direct user tools like pgAdmin, which can be used to quickly
analyze or automate geospatial tasks.
OpenGeo is the geospatial division of
OpenPlans, a 501(c)(3) not-for-profit. We're
bringing the best practices of open source
software to organizations around the world.
148 Lafayette Street, Penthouse
New York, NY 10013
1-877-OPENGEO