

AegisDB: Integrated realtime geo-stream processing and monitoring system
Chengyang Zhang
Computer Science Department, University of North Texas
Ph.D. Proposal (1st Draft)

Why do we need a geospatial stream management system?
- Large volumes of real-time streaming data are produced by sensors and sensor networks
- Many streams produced by sensor networks are geo-referenced
- Example geo-streams: flooding, traffic jams
- Need gStreams (a geospatial Stream Database Management System) to flexibly fuse, query, and make sense of geo-stream data

Example of geospatial stream: TEO Project
Example of geospatial stream: NWS Weather Observations
Example of geospatial stream: Denton Water Level Map

Can existing database systems fulfill our requirements?
Queries on discrete data points at one time instant can be easily fulfilled using a spatial database, e.g. find the water level at location (60, -10) on a given date:

    SELECT waterLevel
    FROM surfaceWater
    WHERE location = GeomFromText('POINT(60 -10)', 4326)
    AND time LIKE %

However, queries on gStream often require aggregating discrete data points and running the queries continuously, e.g.
- FloodingRegion: a region in which every water level sensor reports a water level > d
- The house table stores house information
- Q1: Find the size of FloodingRegion every 0.5 hour
- Q2: Find the addresses of all houses traversed by the flood in the past 2 days
- Q3: List the phone numbers of all houses within 3 miles of the center of the flood region

Can Q1 be fulfilled?
Q1: Find the size of FloodingRegion every 0.5 hour

    SELECT Area(surfaceWater.location)
    FROM surfaceWater [NOW every 0.5 hour]
    WHERE surfaceWater.waterLevel > d
    GROUP BY surfaceWater.location

However, the statement above fails because:
i) GROUP BY only groups tuples with exactly the same values, while each sensor's location is distinct.
ii) The aggregate function Area only works on polygons, not points.
iii) The time predicates NOW and every 0.5 hour are not supported.

Existing Approaches
- Data stream management systems (DSMSs) support streaming data and queries on them. However, they handle point locations only naively and lack adequate support for evolving spatio-temporal extents.
- Traditional spatio-temporal databases are not designed for handling streaming applications.
- Moving object databases:
    - Little attention paid to extended spatio-temporal objects
    - Many external-memory-based algorithms for range queries, nearest neighbor queries, etc.
    - Only locations are dynamic, not monitored phenomena
- Sensor databases focus more on energy consumption and on hiding the inherent heterogeneity and unreliability of sensor networks.
In summary, current database systems are NOT capable of handling continuous queries involving both geo-streams and static extended geo-spatial objects.

II gStreams Architecture

Therefore we propose gStreams: Geospatial-Enabled Data Stream Management System
Challenges:
- Supporting spatio-temporal stream data types
- Spatio-temporal predicates under new data types: closure, consistency, representativeness, succinctness
- Query processing and optimization:
    - Fast, real-time query processing and response; QoS
    - Kinetic data structures
    - Handling both in-core and out-of-core data
    - Can leverage work from moving object databases, e.g. processing of range queries and nearest neighbor queries, which is mostly external-memory based

gStreams Architecture
Input: (1) sensor readings, e.g.
ozone reading; (2) continuous queries in gCQL
Output: gStreams continuously emits query results, e.g. spatial extents of high ozone concentration zones

March 26, SMU Talk

Stream Data Types

Sample Queries
- Notify me when my house is within 50 miles of the mandatory evacuation area of a forest fire.
- Continuously list the addresses of all the houses traversed by flood in the past 2 days in Denton county.
- Continuously list road segments that have been completely under flood-water for the past 24 hours.

Benefits
- Empowers geo-sensor web services, e.g. SWE (Sensor Web Enablement)
- Boosts ways to define and monitor more sophisticated spatio-temporal events for alert purposes
- Moves from disseminating raw geo-sensor data to providing geo-sensor-based monitoring services
- Allows real-time monitoring of spatio-temporal phenomena, useful for:
    - National weather services
    - Environment monitoring
    - Natural hazard monitoring
    - Transportation applications

III What we have implemented

Spatial Aggregator: Cluster By
Traditional Group By [1]
- Semantically only classifies tuples with the same values
- Spatial functions do not accept discrete points as operands
Our Cluster By
- Allows spatial points to be clustered according to spatial proximity
- Clusters are elevated to spatial objects that can be queried using spatial functions
Motivating Example: NCES [2] may want to identify the size of each low-income neighbourhood, i.e. each cluster of neighbouring houses with low household income. House information is stored in a spatial table House: house location (points), address, owner name, and household income.

Motivating Example Query Illustration
Approach: Initial Data, Selection, Clustering [3], Polygonization [4]

    SELECT ST_Area(ST_Polygon(House.location))
    FROM House
    WHERE House.household_income < 30K
    CLUSTER BY House.location

Experimental Dataset

Implementation of Cluster By

Comparison of Query Processing Strategies
- Nested Loop Join
- Indexed Nested Loop Join with Snapshot Bounding Box
- Indexed Nested Loop Join with Individual Bounding Box
- Incremental Indexed Nested Loop Join with Individual Bounding Box

References
1. ISO/IEC 9075. Database language SQL, international standard.
2. National Center for Educational Statistics.
3. M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD96.
4. I. Lee and V. Estivill-Castro. Fast cluster polygonization and its applications in data-rich environments. Geoinformatica, 10(4).
5. Zhang, C. and Huang, Y. Cluster By: a new SQL extension for spatial data aggregation. In Proceedings of the 15th Annual ACM International Symposium on Advances in Geographic Information Systems (Seattle, Washington, November 2007). GIS '07. ACM, New York, NY, 1-4.
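
Illustrative sketch (not part of the original slides): the CLUSTER BY query on the House table runs selection, clustering, polygonization, and then an area computation. The Python below is a minimal sketch of that pipeline under stated assumptions, not the authors' implementation. All function names, the `eps` distance threshold, and the sample data are hypothetical. Clustering here is a simple distance-threshold (single-linkage) grouping rather than the density-based algorithm of Ester et al., and polygonization uses a convex hull rather than the cluster polygonization of Lee and Estivill-Castro.

```python
from math import hypot


def cluster_by_proximity(points, eps):
    """Group points so that any two points within eps of each other share a cluster.

    Assumption: a single-linkage stand-in for density-based clustering.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            (x1, y1), (x2, y2) = points[i], points[j]
            if hypot(x1 - x2, y1 - y2) <= eps:
                parent[find(i)] = find(j)

    clusters = {}
    for i, p in enumerate(points):
        clusters.setdefault(find(i), []).append(p)
    return list(clusters.values())


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order.

    Assumption: a convex hull is used as a simple polygonization stand-in.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula; returns 0.0 for fewer than three vertices."""
    n = len(vertices)
    if n < 3:
        return 0.0
    total = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def cluster_by_area(houses, income_limit, eps):
    """Selection, clustering, polygonization, area: one area per cluster, sorted."""
    selected = [h["location"] for h in houses if h["household_income"] < income_limit]
    clusters = cluster_by_proximity(selected, eps)
    return sorted(polygon_area(convex_hull(c)) for c in clusters)


# Hypothetical sample data: two low-income neighbourhoods and one filtered-out house.
houses = [
    {"location": (0, 0), "household_income": 20_000},
    {"location": (1, 0), "household_income": 25_000},
    {"location": (1, 1), "household_income": 22_000},
    {"location": (0, 1), "household_income": 28_000},
    {"location": (0.5, 0.5), "household_income": 90_000},  # removed by selection
    {"location": (10, 10), "household_income": 15_000},
    {"location": (11, 10), "household_income": 18_000},
    {"location": (10, 11), "household_income": 19_000},
]
areas = cluster_by_area(houses, income_limit=30_000, eps=1.5)
print(areas)  # [0.5, 1.0]
```

The point of the sketch is the one made on the Cluster By slide: a cluster of points only becomes usable by an area function once it has been elevated to a polygon. A cluster with fewer than three distinct points yields area 0.0 here, which is one possible convention and not something the slides specify.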