Real-time Big Data streaming integration for sensor networks
Copyright © 2012 – Proprietary and Confidential Information of SQLstream Inc.
Real-time Control in a Big Data World
Sensors Expo, 2012
Presenter: Damian Black, SQLstream CEO
Agenda
» What is Streaming Big Data?
» The “Sensor Internet” – a Real-time connected world
» Architectures for processing Real-time/Fast Big Data
» Sharing and Reusing data with Relational Streaming
» Case studies and Examples
» Relational Streaming and Hadoop
» Mapping out the data management space
» Conclusions
Real-time Big Data
First, what is a Streaming Big Data Platform? Stream any data in, immediately stream out real-time answers. Continuously analyze and process massive data volumes. React in real-time to each and every new record.
And what, then, is Relational Streaming? A Streaming Big Data paradigm for processing data streams. Familiar relational expressions with automatic optimization. Queries executed continuously on a massively parallel scale.
A real-time connected world of sensors
» Technology Drivers
    » GPS enabled devices
    » Low cost wireless sensors
    » Ultra low power sensors
» Business & Environmental Drivers
    » Congestion Reduction
    » Smart Energy & Environment Monitoring
    » V2V, V2I and Smart Transportation
    » M2M
    » RFIDs & the ‘Internet of Things’
Today’s operational platforms – far from real-time
Poorly integrated operational platforms based on traditional store and process technology
Massive volumes of streaming data: Service, System, Sensors
Exponential Growth
Analyze Streaming Data in Flight
SQLstream s-Server
Respond to real-time analysis
Real-time alerts and visibility with continuously streaming results.
Historical data used for predictive real-time analytics
Existing operational systems and data warehouses kept up to date in real-time with continuous ETL
Massive volumes of streaming data: Service, System, Sensors
Exponential Growth
Streaming Data Processing – Achieving Scalability
» Fine-grained dataflow: pipelined & superscalar parallel processing
» Reuse of analytics and data streams across nodes
» Avoid transactional bottlenecks – fine-grained streaming dataflow
» SQL as a parallel dataflow language – standard, familiar, proven
Streaming Data Processing & Windows
Overview of real-time processing pipelines
Real-time streaming data feed: data sources such as log files, sensors and API feeds are turned into streaming data feeds
Example: Continuous query for real-time alerts
    CREATE VIEW sla_fulfilled AS
    SELECT STREAM *
    FROM orders OVER sla
    JOIN shipments
    ON orders.id = shipments.orderid
    WHERE city = 'New York'
    WINDOW sla AS (RANGE INTERVAL '1' HOUR PRECEDING)
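To make the windowed join concrete, here is a minimal Python sketch of the idea behind the sla_fulfilled query: each shipment is matched against orders seen in the preceding hour. It is an illustration only, not SQLstream code. The event tuple layout, the function name `sla_fulfilled`, and the assumption that `city` is a column of the orders stream are all hypothetical.

```python
from collections import deque
from datetime import datetime, timedelta

def sla_fulfilled(events, window=timedelta(hours=1), city="New York"):
    """Yield (order_id, order_time, shipment_time) for shipments whose order
    arrived within `window` before the shipment.

    `events` is an iterable of (kind, timestamp, order_id, city) tuples in
    timestamp order, where kind is "order" or "shipment" (assumed layout).
    """
    recent_orders = deque()  # (timestamp, order_id, city), oldest first
    for kind, ts, order_id, event_city in events:
        # Drop orders that have slid out of the time window.
        while recent_orders and recent_orders[0][0] < ts - window:
            recent_orders.popleft()
        if kind == "order":
            recent_orders.append((ts, order_id, event_city))
        else:
            # Join the shipment against the orders still in the window.
            for order_ts, recent_id, order_city in recent_orders:
                if recent_id == order_id and order_city == city:
                    yield (order_id, order_ts, ts)

t0 = datetime(2012, 6, 6, 9, 0)
sample_events = [
    ("order", t0, 1, "New York"),
    ("order", t0 + timedelta(minutes=10), 2, "Boston"),
    ("shipment", t0 + timedelta(minutes=30), 1, None),
    ("shipment", t0 + timedelta(minutes=40), 2, None),
    ("order", t0 + timedelta(minutes=45), 3, "New York"),
    ("shipment", t0 + timedelta(hours=2), 3, None),
]
matches = list(sla_fulfilled(sample_events))
```

Only order 1 matches in the sample data: order 2 is outside the city filter, and order 3 ships 75 minutes after it was placed, so it has already left the one-hour window.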
Case Study: Traffic Analytics from GPS Data
10 meter road segments
Road segment GIS database
One GPS event per vehicle per second
Historical Trend Data
Objective: Accurate and reliable Journey Time information with dynamic updating of alternative routes, identifying ‘worse than usual’ events and predictive incident detection.
SQL as an API – Simplifying Analytics
» Example: Compute average speed across any subset of the road network over rolling time windows from GPS events
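The rolling-window average described in this example can be sketched in a few lines of Python. This is a hedged illustration of the computation, not the SQL shown on the original slide (which did not survive extraction). The class name `RollingSegmentSpeed`, the event fields, and the 60-second window in the usage lines are assumptions.

```python
from collections import defaultdict, deque

class RollingSegmentSpeed:
    """Rolling average speed per road segment over a time window in seconds."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = defaultdict(deque)   # segment_id -> deque of (ts, speed)
        self.totals = defaultdict(float)   # segment_id -> sum of speeds in window

    def add(self, ts, segment_id, speed):
        """Record one GPS event and return the segment's current rolling average."""
        queue = self.events[segment_id]
        queue.append((ts, speed))
        self.totals[segment_id] += speed
        # Evict events older than the window, keeping the running sum in step.
        while queue[0][0] < ts - self.window:
            _, old_speed = queue.popleft()
            self.totals[segment_id] -= old_speed
        return self.totals[segment_id] / len(queue)

rolling = RollingSegmentSpeed(window_seconds=60)
first = rolling.add(0, "segment-A", 10.0)
second = rolling.add(30, "segment-A", 20.0)
third = rolling.add(90, "segment-A", 30.0)   # the event at ts=0 has expired
other = rolling.add(90, "segment-B", 5.0)
```

Keeping a running sum per segment means each event is added once and evicted once, so the cost per GPS event stays constant however long the window is. An average over a subset of the network could be derived by combining the per-segment sums and counts.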
Case Study: Real-time Seismic Event Detection
Input Signal Data (blue) and Detected Quakes (red)
The ‘Sensor Internet’ for Services
» Many sensors streaming data over the Internet in real-time.
» Streaming analytics maintained over varying time windows.
» Aggregated and continuously sorted: streaming “order by”.
[Diagram: multiple "stream Server" nodes]
Streaming SQL: Decaying Service Monitor Scoring
    CREATE OR REPLACE PUMP "SONG_SCORE_PUMP" STOPPED AS
    INSERT INTO "SERVICE_SCORE" ("serviceId", "SCORE")
    SELECT STREAM
        "SERVICE_ID" AS "serviceId",
        SUM("POINTS") OVER "LAST_WEEK" +
        ((SUM("POINTS") OVER "LAST_2_WEEKS" - SUM("POINTS") OVER "LAST_WEEK") * 0.5) +
        ((SUM("POINTS") OVER "LAST_3_WEEKS" - SUM("POINTS") OVER "LAST_2_WEEKS") * 0.25) +
        ((SUM("POINTS") OVER "LAST_4_WEEKS" - SUM("POINTS") OVER "LAST_3_WEEKS") * 0.125) AS "SCORE"
    FROM "SERVICE_SCORES"
    WINDOW
        "LAST_WEEK" AS (PARTITION BY "SONG_ID" RANGE INTERVAL '7' DAY PRECEDING),
        "LAST_2_WEEKS" AS (PARTITION BY "SONG_ID" RANGE INTERVAL '14' DAY PRECEDING),
        "LAST_3_WEEKS" AS (PARTITION BY "SONG_ID" RANGE INTERVAL '21' DAY PRECEDING),
        "LAST_4_WEEKS" AS (PARTITION BY "SONG_ID" RANGE INTERVAL '28' DAY PRECEDING);
» Millions of events per second
» Real-time service scoring
» Amazon EC2
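The decaying score in the pump above weights points from the most recent week at 1, the week before at 0.5, then 0.25, then 0.125, by subtracting nested window sums. A minimal Python sketch of that arithmetic for a single partition follows. The function names and the (timestamp, points) event layout are hypothetical, and window bounds are assumed to be inclusive.

```python
from datetime import datetime, timedelta

def window_sum(events, now, days):
    """Sum points for events with a timestamp in [now - days, now]."""
    start = now - timedelta(days=days)
    return sum(points for ts, points in events if start <= ts <= now)

def decayed_score(events, now):
    """Combine nested 7/14/21/28-day sums so each older week counts half as much."""
    last_week, last_2, last_3, last_4 = (
        window_sum(events, now, days) for days in (7, 14, 21, 28)
    )
    return (
        last_week
        + (last_2 - last_week) * 0.5
        + (last_3 - last_2) * 0.25
        + (last_4 - last_3) * 0.125
    )

now = datetime(2012, 6, 29)
sample = [
    (now - timedelta(days=1), 8),    # week 1, weight 1
    (now - timedelta(days=10), 8),   # week 2, weight 0.5
    (now - timedelta(days=17), 8),   # week 3, weight 0.25
    (now - timedelta(days=24), 8),   # week 4, weight 0.125
    (now - timedelta(days=40), 8),   # older than 28 days, ignored
]
score = decayed_score(sample, now)   # 8 + 4 + 2 + 1 = 15.0
```

Each difference of adjacent window sums isolates one week of points, which is why the SQL needs only four window definitions rather than a per-event decay function.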
Streaming SQL – Change in Rate of Service Errors
    SELECT STREAM ROWTIME, url, "numErrorsLastMinute"
    FROM (
        SELECT STREAM
            ROWTIME, url, "numErrorsLastMinute",
            AVG("numErrorsLastMinute") OVER (PARTITION BY url RANGE INTERVAL '1' MINUTE PRECEDING)
                AS "avgErrorsPerMinute",
            STDDEV("numErrorsLastMinute") OVER (PARTITION BY url RANGE INTERVAL '1' MINUTE PRECEDING)
                AS "stdDevErrorsPerMinute"
        FROM "ServiceRequestsPerMinute") AS S
    WHERE S."numErrorsLastMinute" > S."avgErrorsPerMinute" + 2 * S."stdDevErrorsPerMinute";
» Millions of records per second
» Real-time Bollinger Bands
» Amazon EC2
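The query for this slide flags a URL when its latest per-minute error count exceeds the window average by more than two standard deviations, which is the upper Bollinger Band rule. The Python sketch below shows that test for one URL's window. It is illustrative only: the function name `is_error_spike` is hypothetical, and using the population standard deviation (`statistics.pstdev`) is an assumption, since the slide does not say which variant STDDEV computes.

```python
from statistics import mean, pstdev

def is_error_spike(window_counts, k=2.0):
    """Return True if the newest count exceeds mean + k * stddev of the window.

    `window_counts` holds the per-minute error counts currently in the window,
    oldest first, with the newest count last (the window includes it).
    """
    if not window_counts:
        return False
    latest = window_counts[-1]
    threshold = mean(window_counts) + k * pstdev(window_counts)
    return latest > threshold

steady = [2, 3, 2, 3, 2, 3]
burst = [2, 2, 2, 2, 2, 2, 2, 2, 2, 20]
steady_flag = is_error_spike(steady)   # False: 3 is below 2.5 + 2 * 0.5
burst_flag = is_error_spike(burst)     # True: 20 is above 3.8 + 2 * 5.4
```

Because the newest count is part of the window it is compared against, a single outlier also raises the threshold, so very short windows are less sensitive than longer ones.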
Use Cases for S3 Data (Sensor x System x Service)
» Sensor Data:
    » Location, Power, Temperature, Pressure, Speed, …
    » GPS and Mobile Devices, RFID
» System Data:
    » Log files, Device records, SNMP MIBs
» Service Data:
    » Usage log files, transactions, Internet, other
» Industries & Applications:
    » Energy, Mining, Transportation, Manufacturing, Logistics, etc.
    » Performance, Security, Compliance, and Fraud Monitoring
    » Error and Service Level Monitoring
    » Usage, Metering and SCADA
Comparison of Big Data Processing Platforms
» Hadoop style: data chunking, coarse-grained dataflow
    » Petabytes of stored data
    » Batch processing
    » Historical queries
    » High latency
» Relational Streaming: DAGs of fine-grained dataflow
    » Millions of events per second
    » Stream processing
    » Continuous queries
    » Low latency
Relational Streaming overlaying Hadoop
» Relational Stream Processors co-located with Hadoop Servers to stream/re-stream local data
» Combination performs Real-time and Historical processing:
    » Querying the future – Continuous ETL and Analytics (parallel pipelines)
    » Querying the past – Hadoop batch jobs on stored tuples (parallel batches)
    » Re-streaming and Re-querying (for example, scenario & sensitivity analyses)
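Re-streaming can be pictured as running the same continuous query over stored records instead of a live feed. The Python sketch below is a hedged illustration of that idea, not a description of how SQLstream and Hadoop are actually wired together. The query `threshold_alerts`, the reading layout, and the limits are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

Reading = Tuple[int, float]  # (timestamp in seconds, sensor value)

def threshold_alerts(readings: Iterable[Reading], limit: float) -> Iterator[Reading]:
    """Continuous query: yield each reading above `limit` as soon as it is seen."""
    for ts, value in readings:
        if value > limit:
            yield (ts, value)

# Stored tuples standing in for data already landed in a batch store.
stored = [(1, 20.0), (2, 35.0), (3, 31.0), (4, 18.0)]

# Re-stream the same history under two scenarios (a simple sensitivity analysis).
baseline = list(threshold_alerts(stored, limit=30.0))
stricter = list(threshold_alerts(stored, limit=32.0))
```

Because the query consumes any iterable, the same function could be pointed at a live generator of readings ("querying the future") or at replayed history with different parameters.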
[Diagram: Hadoop & Relational Streaming Server. Operator labels: Group, Agg, Join, Project, Select; Reduce, Combine, Map, Split; Sort; Order]
Relational Streaming: A new data management quadrant
[Quadrant chart. Axes: Historical analysis / Periodic batches versus Continuous analysis / Real-time processing; High-level Declarative Language & Operation versus Low-level Procedural Language & Operation. Quadrant entries: Data Warehouses, Relational Streaming, Hadoop Big Data, Messaging Middleware]
Relational Streaming – the Next Wave of Big Data.
Query the Future
» Parallel Processing: Parallel processing made easy, auto-optimized, massive scale.
» Real-time Analysis: Process, analyze, and react – all in real-time.
» RT Data Integration: Continuous, real-time data integration:
    » Give each app the view of data and format it needs
    » Share all your data in real-time with all your apps
    » Perform Continuous ETL and Data Integration