Streaming GIS using PostGIS & SQLstream
Julian Hyde - Chief Architect
Sunil Mujumdar – Founding Engineer
The Data Crunch
» Data volumes rising fast
» Human-originated data (e.g. e-commerce purchases) rising fast
» Machine-generated data (e.g. e-commerce events and network packets) rising faster
» Sensor data (e.g. GIS-enabled mobile phones, road sensors) rising faster still
» Every business needs answers with lower latency
» Every significant problem is massively parallel & distributed:
» Geographically distributed organizations
» Multiple boxes for scale
» Exploit multiple cores
The world is no longer flat
• In a data warehouse, all records are equally important
• In many real-world applications, recent & close events are much more important
[Figure: Time and Space axes; labels "Now" and "Here"]
Case study: Mozilla
Data management is hard
» If you make a mistake, the system won’t be fast enough
» Can’t afford to lose data
» New technologies are very difficult to use
» MapReduce
» NoSQL
» Multi-threaded programming in Java, C++, Erlang, Scala, …
» Collaborate, interoperate, evolve
SQL – life in the old dinosaur yet
» Widely spoken
» Rich
» Orthogonal
» Declarative
» Tune your system without changing your logical schema
» Apps don’t interfere with each other
» Adaptive
» Route around failure
» Exploit available resources
» Make tradeoffs to meet QoS goals
Streaming SQL: example #1
Tweets about this conference:
» SELECT STREAM ROWTIME, author, text
FROM Tweets
WHERE text LIKE '%#PGWest%'
Streaming SQL basics
» Streams:
» CREATE STREAM Tweets (
author VARCHAR(20),
text VARCHAR(140));
» Relational operators have streaming counterparts:
» Project (SELECT)
» Filter (WHERE)
» Union
» Join
» Aggregation (GROUP BY)
» Windowed aggregation (e.g. SUM(x) OVER window)
» Sort (ORDER BY)
Streaming SQL: example #2
» Each minute, return the number of clicks on each web page:
» SELECT STREAM ROWTIME, uri, COUNT(*)
FROM PageRequests
GROUP BY FLOOR(ROWTIME TO MINUTE), uri
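To make the per-minute grouping concrete, here is a minimal Java sketch of what such a query has to do, assuming rows arrive in ROWTIME order. The class name `MinuteClickCounter`, its methods, and the "minuteStartMs,uri,count" row format are hypothetical and only serve the illustration; this is not how SQLstream implements the query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch of GROUP BY FLOOR(ROWTIME TO MINUTE), uri over a time-ordered stream. */
final class MinuteClickCounter {
    private static final long MINUTE_MS = 60_000L;

    // Start of the minute currently being counted, in epoch milliseconds.
    private long currentMinute = Long.MIN_VALUE;
    // uri -> number of clicks seen so far in the current minute.
    private final Map<String, Integer> counts = new TreeMap<>();

    /**
     * Feeds one page request. If the request opens a new minute, the totals of
     * the previous minute are returned; otherwise the result is empty.
     */
    List<String> onRequest(long rowtimeMs, String uri) {
        long minute = Math.floorDiv(rowtimeMs, MINUTE_MS) * MINUTE_MS;
        if (minute < currentMinute) {
            // The sketch relies on monotonic ROWTIME, like the streaming query.
            throw new IllegalArgumentException("ROWTIME moved back to an earlier minute");
        }
        List<String> emitted = new ArrayList<>();
        if (minute > currentMinute) {
            emitted = flush();
            currentMinute = minute;
        }
        counts.merge(uri, 1, Integer::sum);
        return emitted;
    }

    /** Emits and clears the totals of the current minute, e.g. at end of input. */
    List<String> flush() {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            rows.add(currentMinute + "," + e.getKey() + "," + e.getValue());
        }
        counts.clear();
        return rows;
    }
}
```

A minute's totals can only be emitted once a row from a later minute arrives, which is why sorted ROWTIME values matter for streaming aggregation.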
Streaming SQL: Time
» ROWTIME pseudo-column
» Provided by source application or generated by system
» WINDOW
» Present in regular SQL (e.g. SQL:2003) but more important in streaming SQL
» Defines a ‘working set’ for streaming JOIN, GROUP BY, windowed aggregation
» Monotonicity (“sortedness”)
» Prerequisite for certain streaming operations
Streaming SQL: example #3
Find all orders from New York that shipped within an hour:
» CREATE VIEW compliant_orders AS
SELECT STREAM *
FROM orders OVER sla
JOIN shipments
ON orders.id = shipments.orderid
WHERE city = 'New York'
WINDOW sla AS (RANGE INTERVAL '1' HOUR PRECEDING)
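The window in this join bounds how many orders must be remembered. A minimal Java sketch of that idea follows, assuming orders and shipments are fed to one object in overall ROWTIME order. The class `OrderShipmentJoin` and its method names are hypothetical, and the sketch is not SQLstream's join implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Sketch of a stream join where only the last hour of orders is retained. */
final class OrderShipmentJoin {
    private static final long WINDOW_MS = 60 * 60 * 1000L; // RANGE INTERVAL '1' HOUR

    private static final class Order {
        final long rowtimeMs;
        final int id;
        final String city;

        Order(long rowtimeMs, int id, String city) {
            this.rowtimeMs = rowtimeMs;
            this.id = id;
            this.city = city;
        }
    }

    // Orders in arrival order, so the oldest is always at the head.
    private final Deque<Order> window = new ArrayDeque<>();
    // Index over the same orders for the equi-join on order id.
    private final Map<Integer, Order> byId = new HashMap<>();

    void onOrder(long rowtimeMs, int id, String city) {
        Order order = new Order(rowtimeMs, id, city);
        window.addLast(order);
        byId.put(id, order);
    }

    /** Returns true if the shipment matches a New York order from the last hour. */
    boolean onShipment(long rowtimeMs, int orderId) {
        // Evict orders that have fallen out of the one-hour window.
        while (!window.isEmpty() && window.peekFirst().rowtimeMs < rowtimeMs - WINDOW_MS) {
            Order old = window.removeFirst();
            byId.remove(old.id, old);
        }
        Order match = byId.get(orderId);
        return match != null && "New York".equals(match.city);
    }
}
```

Memory use is proportional to the number of orders in one hour, not to the length of the stream.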
Streaming SQL: more
» Usual advanced SQL stuff:
» Schemas, views, tables
» Ability to nest queries
» User-defined functions and transforms
» Interoperate with 3rd party systems
» Adapters make external systems look like read/write streams
» Push/pull
» Active/passive
» Interact with databases:
» As source (change-data capture)
» Lookup (e.g. GIS lookup; normalizing current data using historic norms)
» As sink (populating the data warehouse)
Real-time road traffic monitoring
1. Map vehicle positions to road segments
2. Compute average speed of each road segment
3. Detect traffic incidents
[Figure labels: line segments representing sections of freeway; vehicle position]
» Vehicle id, latitude, longitude, speed, timestamp
» 15,000 vehicles with sensors
» Each vehicle transmits every minute
» Road network through New South Wales, Australia
Copyright © 2010 SQLstream, Inc.
Google earth
Road traffic analytics architecture
[Architecture diagram; labels: POSDATA_n.txt … POSDATA_nnn.txt, PositionLogStream, Parse, RoadInfoLookup, PostGIS, SQLstream, TrafficAnalytics, Dashboard]
Gathering input data
-- Define the Foreign Stream for reading log data
CREATE OR REPLACE FOREIGN STREAM "PositionLogStream" (
MESSAGE VARCHAR(132))
SERVER "PositionLogReader"
OPTIONS (file_pattern 'POSDATA.*\.txt')
DESCRIPTION 'Raw Vehicle Position Log Stream';
Problem 1: Map vehicle positions to road segments
SELECT STREAM segmentId,
roadElementId,
vehiclePositionX,
vehiclePositionY,
velocityX,
velocityY
FROM (TABLE RoadInfoLookup(
CURSOR (SELECT STREAM * FROM VehiclePositions),
'postgis_source.properties', -- data source properties
'road_segment', -- table name
'v_latitude', -- latitude column name
'v_longitude')) -- longitude column name
SQLstream user-defined transform (UDX)
» import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class RoadInfoLookupUdx {
    public static void RoadInfoLookup(
            ResultSet trafficInfoIn,
            PreparedStatement roadSegmentInfoOut)
        throws SQLException
    {
        // For each incoming position row, look up the road element
        // and write one output row.
        while (trafficInfoIn.next()) {
            double latitude = trafficInfoIn.getDouble(1);
            double longitude = trafficInfoIn.getDouble(2);
            int roadElementId = getInfo(latitude, longitude);
            roadSegmentInfoOut.setDouble(1, latitude);
            roadSegmentInfoOut.setDouble(2, longitude);
            roadSegmentInfoOut.setInt(3, roadElementId);
            // etc.
            roadSegmentInfoOut.executeUpdate();
        }
    }
Helper method to access PostGIS
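» // Static so that the static UDX method can call it; this assumes
// pstmt and connection are static fields of the class.
private static int getInfo(
        double latitude,
        double longitude) throws SQLException
{
    // First time through, prepare query.
    // "select …", srid and width are left as they appear on the slide.
    if (pstmt == null) {
        pstmt = connection.prepareStatement(
            "select … from road_segments where "
            + "ST_Distance(uts_geom, ST_GeomFromText(?, srid)) "
            + "< width");
    }
    // ST_GeomFromText expects well-known text. A WKT point is "POINT(x y)",
    // so longitude comes first, assuming the SRID is a longitude/latitude one.
    pstmt.setString(1, "POINT(" + longitude + " " + latitude + ")");
    ResultSet rset = pstmt.executeQuery();
    rset.next();
    return rset.getInt(1);
}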
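For readers without a PostGIS instance at hand, the lookup can be approximated in plain Java. The sketch below is a stand-in for the `ST_Distance(...) < width` test, not the talk's implementation: it assumes planar coordinates, represents each road segment as a hypothetical `{ax, ay, bx, by}` array, and uses an invented class name `NearestSegment`.

```java
/** Planar stand-in for the PostGIS distance test used to pick a road segment. */
final class NearestSegment {
    private NearestSegment() {
    }

    /** Distance from point (px, py) to the line segment (ax, ay)-(bx, by). */
    static double distanceToSegment(
            double px, double py,
            double ax, double ay,
            double bx, double by) {
        double dx = bx - ax;
        double dy = by - ay;
        double lengthSquared = dx * dx + dy * dy;
        // Project the point onto the line, then clamp to the segment's ends.
        double t = lengthSquared == 0
            ? 0
            : ((px - ax) * dx + (py - ay) * dy) / lengthSquared;
        t = Math.max(0, Math.min(1, t));
        double closestX = ax + t * dx;
        double closestY = ay + t * dy;
        return Math.hypot(px - closestX, py - closestY);
    }

    /**
     * Returns the index of the closest segment that is strictly nearer than
     * maxDistance, or -1 if no segment qualifies.
     */
    static int findSegment(double[][] segments, double px, double py, double maxDistance) {
        int best = -1;
        double bestDistance = maxDistance;
        for (int i = 0; i < segments.length; i++) {
            double[] s = segments[i];
            double d = distanceToSegment(px, py, s[0], s[1], s[2], s[3]);
            if (d < bestDistance) {
                best = i;
                bestDistance = d;
            }
        }
        return best;
    }
}
```

A linear scan like this is only reasonable for a handful of segments. For a full road network, a spatial index such as the one a GIS database provides is the practical choice.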
Problem 2: Compute average speed
Streaming query computes average over a 15-minute sliding window
Results are written to a Google Earth file (and elsewhere)
-- Average road element speeds
CREATE OR REPLACE VIEW "EstimatedReSpeeds"
DESCRIPTION 'Estimated RE Speeds' AS
SELECT STREAM "roadElementID",
AVG("vSpeed") OVER "last15" AS "reSpeed",
"reSpeedLimit"
FROM "Stage3"
WINDOW "last15" AS (
PARTITION BY "roadElementID"
RANGE INTERVAL '15' MINUTE PRECEDING);
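What the partitioned window keeps in memory can be sketched in a few lines of Java. This is an illustration under assumptions, not SQLstream's implementation: the class `SlidingAverage` and its method are hypothetical, timestamps are epoch milliseconds, and rows are assumed to arrive in ROWTIME order.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Sketch of AVG(speed) OVER (PARTITION BY roadElementId RANGE 15 MINUTE PRECEDING). */
final class SlidingAverage {
    private static final long WINDOW_MS = 15 * 60 * 1000L;

    private static final class Sample {
        final long rowtimeMs;
        final double speed;

        Sample(long rowtimeMs, double speed) {
            this.rowtimeMs = rowtimeMs;
            this.speed = speed;
        }
    }

    /** Per-partition state: the samples still inside the window and their sum. */
    private static final class Window {
        final Deque<Sample> samples = new ArrayDeque<>();
        double sum;
    }

    private final Map<Integer, Window> windows = new HashMap<>();

    /** Adds one row and returns the average speed over that road element's window. */
    double onRow(long rowtimeMs, int roadElementId, double speed) {
        Window w = windows.computeIfAbsent(roadElementId, k -> new Window());
        w.samples.addLast(new Sample(rowtimeMs, speed));
        w.sum += speed;
        // Drop samples older than 15 minutes; the row just added always stays.
        while (w.samples.peekFirst().rowtimeMs < rowtimeMs - WINDOW_MS) {
            w.sum -= w.samples.removeFirst().speed;
        }
        return w.sum / w.samples.size();
    }
}
```

The running sum keeps each update cheap, at the cost of some floating-point drift on very long streams.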
Problem 3: Incident detection
» Use Bollinger bands to detect outliers (3 standard deviations = 99.7%)
CREATE OR REPLACE VIEW "Incidents"
DESCRIPTION 'Detect incidents' AS
SELECT STREAM ...
FROM ( SELECT STREAM "roadElementID",
AVG("vSpeed") OVER "lastMinute" AS "avgSpeedLastMinute",
AVG("vSpeed") OVER "last15" AS "avgSpeedLast15",
STDDEV("vSpeed") OVER "last15" AS "stddevSpeedLast15",
"reSpeedLimit", ...
FROM "Stage3"
WINDOW "last15" AS (PARTITION BY "roadElementID" RANGE INTERVAL '15' MINUTE PRECEDING),
"lastMinute" AS (PARTITION BY "roadElementID" RANGE INTERVAL '1' MINUTE PRECEDING) )
WHERE "avgSpeedLastMinute" < "avgSpeedLast15" - 3 * "stddevSpeedLast15";
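The outlier test in the WHERE clause can be checked by hand with a small Java sketch. The class `IncidentDetector` is hypothetical, the two windows are passed in as plain arrays, and the population form of the standard deviation is an assumption, since the slide does not say which form STDDEV uses.

```java
/** Sketch of the lower Bollinger band test used for incident detection. */
final class IncidentDetector {
    private IncidentDetector() {
    }

    static double mean(double[] values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Population standard deviation (assumed; a sample form divides by n - 1). */
    static double stddev(double[] values) {
        double m = mean(values);
        double squares = 0;
        for (double v : values) {
            squares += (v - m) * (v - m);
        }
        return Math.sqrt(squares / values.length);
    }

    /**
     * Returns true when the last-minute average falls more than three
     * standard deviations below the 15-minute average.
     */
    static boolean isIncident(double[] last15Speeds, double[] lastMinuteSpeeds) {
        double lowerBand = mean(last15Speeds) - 3 * stddev(last15Speeds);
        return mean(lastMinuteSpeeds) < lowerBand;
    }
}
```

For example, with 15-minute speeds clustered around 100, a last-minute average of 57.5 falls below the band and is flagged, while 98.5 is not.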
Summary
• Emergence of data problems that are:
– Real-time
– Geospatial
– High throughput
• In particular, Intelligent Transport Systems (ITS) analytics
• Need to combine streaming, GIS and relational (SQL)
• Technology synergy:
– PostGIS is a mature GIS implementation, integrates SQL with GIS
– SQLstream integrates SQL with streaming
Any questions?
Thank you for attending!
Further reading:
» “Data in Flight” by Julian Hyde (Communications of the ACM, Vol. 53 No. 1, Pages 48-52)
Blogs:
» http://www.sqlstream.com/blog
» http://julianhyde.blogspot.com
Twitter:
» @julianhyde
» @sunil_mujumdar