Federated Stream Processing Support for Real-Time Business Intelligence Applications
Irina Botan, Younggoo Cho, Roozbeh Derakhshan, Nihal Dindar, Laura Haas, Kihong Kim, Nesime Tatbul
Introduction
• Business Intelligence (BI) enables better decision-making for businesses.
• In operational BI, responding to business events in real time is critical. This requires:
– reducing latency
– providing rich contextual information
We propose the MaxStream federated stream processing system as a platform for meeting these needs.
VLDB BIRTE Workshop, 2009 · Nesime Tatbul, ETH Zurich
Talk Outline
• Example Use Cases & Motivation
• MaxStream System
– Architecture
– Usage
– Feasibility
• Conclusions & Open Challenges
Example Use Cases
• Supply-Chain Optimization
• Call Center Management
• Quality Management in Manufacturing
• SLA Monitoring and Maintenance
• Global Shipment & Delivery Monitoring
• Fraud Detection in Financial Companies
• Real-time Marketing
• …
These use cases differ in their latency and data persistence requirements.
Example: Call Center Management
• Multiple centers across the globe
• Each incoming call is recorded with its arrival time and its service start and end times
• Main BI tasks:
– Run statistics on wait time, service duration, etc. for different regions
– Generate reports that analyze problems and propose strategic improvements
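Both statistics are differences of the captured timestamps: wait time = service start - arrival, and service duration = service end - service start. The Python sketch below computes per-region averages of both over a batch of call records. It is only an illustration, not part of MaxStream; the `Call` record, its field names, and the assumption that each call already carries its region are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Call:
    """One captured call (hypothetical record); times share a common clock, in seconds."""
    region: str  # assumption: the call is already tagged with its region
    arrival_time: float
    start_time: float
    end_time: float

    @property
    def wait_time(self) -> float:
        return self.start_time - self.arrival_time

    @property
    def service_duration(self) -> float:
        return self.end_time - self.start_time


def region_statistics(calls: Iterable[Call]) -> Dict[str, Dict[str, float]]:
    """Per-region call count, average wait time, and average service duration."""
    sums: Dict[str, list] = defaultdict(lambda: [0, 0.0, 0.0])  # [count, wait, duration]
    for c in calls:
        s = sums[c.region]
        s[0] += 1
        s[1] += c.wait_time
        s[2] += c.service_duration
    return {
        region: {"count": n, "avg_wait": w / n, "avg_duration": d / n}
        for region, (n, w, d) in sums.items()
    }
```

In the full system the region would come from joining operator codes with an operator-to-region table, as the hybrid query example later does.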
MaxStream Architecture: From 30,000 ft
• Key ideas:
– Uniform query language and API
– Relational database infrastructure as the basis for the federation layer (in our case: SAP MaxDB and SAP MaxDB Federator)
– “Just enough” streaming capability inside the federation layer
[Figure: a Client Application and a Data Agent connect to the Federation Layer, which reaches the underlying DBs and SPEs through Wrappers.]
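To make the layering concrete, here is a minimal Python sketch of the idea that clients talk to a single federation layer, which reaches every DB or SPE through a wrapper with a uniform interface. All names (`Wrapper`, `LoggingWrapper`, `FederationLayer.submit`) are hypothetical, and the sketch only routes statements; the real federator also parses, rewrites, optimizes, and translates them.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Wrapper(ABC):
    """Uniform interface the federation layer uses for every underlying engine."""

    @abstractmethod
    def execute(self, statement: str) -> List[str]:
        ...


class LoggingWrapper(Wrapper):
    """Hypothetical stand-in for a DB or SPE wrapper: it only records what it receives."""

    def __init__(self, engine_kind: str) -> None:
        self.engine_kind = engine_kind  # e.g. "DB" or "SPE"
        self.received: List[str] = []

    def execute(self, statement: str) -> List[str]:
        self.received.append(statement)
        return [f"{self.engine_kind} accepted: {statement}"]


class FederationLayer:
    """Single entry point for client applications (hypothetical API)."""

    def __init__(self) -> None:
        self._wrappers: Dict[str, Wrapper] = {}

    def register(self, name: str, wrapper: Wrapper) -> None:
        self._wrappers[name] = wrapper

    def submit(self, target: str, statement: str) -> List[str]:
        # A real federator would parse, rewrite, optimize, and translate the
        # statement into the target's dialect; this sketch only routes it.
        if target not in self._wrappers:
            raise KeyError(f"no wrapper registered for {target!r}")
        return self._wrappers[target].execute(statement)
```

Because every engine sits behind the same `execute` interface, adding another DB or SPE only means registering one more wrapper.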
Putting MaxStream into Context
• vs. Federated Databases
– Less focus on data locality, more focus on functional heterogeneity
• vs. Stream Processing Engines (SPEs)
– Unlike in distributed SPEs, the underlying engines may be heterogeneous
– Unlike stream-relational SPEs, the MaxStream federator is not a full-fledged SPE
• vs. Business Intelligence Software
– Tighter integration between (possibly heterogeneous) SPEs and databases
MaxStream Architecture: A Closer Look
[Figure: internals of the MaxStream Federator (SQL Parser, Query Rewriter, Query Optimizer, Query Executor, SQL Dialect Translator, Metadata, Input Event Tables, Output Event Tables). Labeled flows: DDL/DML statements in MaxStream's SQL dialect from the Client Application, Input Events, Output Events, and DDL/DML in each SPE's SQL. Connectors: Data Agents and Data Agents for SPE (using the SPE's SDK), linked over MaxDB ODBC. Attached engines: DBs and SPEs.]
MaxStream Architecture: Two Key Building Blocks
• Streaming Inputs through MaxStream
– ISTREAM operator for persistent input events
– Tuple queues for transient input events
• Streaming Outputs through MaxStream
– Monitoring Select over Event Tables
• Persistent Event Tables for Persistent output events
• In-Memory Event Tables for Transient output events
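A rough Python sketch of the transient-input path, assuming a tuple queue behaves like a bounded in-memory FIFO that hands events to a consumer and keeps no copy. The class name, the event shape, the capacity, and the drop-oldest overflow policy are hypothetical; the slides do not specify them.

```python
from collections import deque
from typing import Callable, Deque, Tuple

Event = Tuple[str, float]  # hypothetical (key, value) event shape


class TupleQueue:
    """In-memory FIFO buffer for transient input events.

    Events are handed to a consumer (for example, a data agent feeding an SPE)
    and then dropped; nothing is written to a table, so nothing survives a
    restart. Persistent events would instead be inserted into a table and
    picked up by ISTREAM.
    """

    def __init__(self, capacity: int = 10_000) -> None:
        # Assumption: when full, the oldest event is silently discarded (deque maxlen).
        self._buffer: Deque[Event] = deque(maxlen=capacity)

    def push(self, event: Event) -> None:
        self._buffer.append(event)

    def drain(self, consumer: Callable[[Event], None]) -> int:
        """Forward all buffered events in arrival order; return how many were sent."""
        sent = 0
        while self._buffer:
            consumer(self._buffer.popleft())
            sent += 1
        return sent
```

The trade-off the slide draws is visible here: the queue is cheap because it never touches storage, and for the same reason it offers no durability.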
Streaming Persistent Input Events
• The ISTREAM (“Insert STREAM”) Operator
– A relation-to-stream operator, first proposed by Widom et al. [STREAM Project], that streams the new tuples inserted into a given relation.
– Example:
INSERT INTO STREAM CallStream
SELECT OpCode, ArrivalTime, StartTime, EndTime
FROM ISTREAM(CallTable);
– At time T, CallTable contains r1, r2, r3; at time T+1, it contains r1, r2, r3, r4, r5. ISTREAM(CallTable) at T+1 therefore returns:
<r4, T+1>, <r5, T+1>
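The ISTREAM semantics in this example can be modeled as a bag difference between two consecutive snapshots of the relation, with every new tuple stamped with the later time. The Python sketch below is a conceptual model under that assumption, not MaxStream's implementation (which would more plausibly capture inserts as they happen); the function name and signature are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def istream(previous: Sequence[Hashable], current: Sequence[Hashable],
            now: int) -> List[Tuple[Hashable, int]]:
    """Rows inserted between two snapshots of a relation, each stamped with `now`.

    Conceptual model only (assumption): diff snapshots instead of capturing inserts.
    Uses bag (multiset) difference, so duplicate rows are counted. Rows deleted
    between the snapshots produce no output: ISTREAM reports insertions only.
    """
    new_counts = Counter(current) - Counter(previous)  # keeps positive counts only
    result: List[Tuple[Hashable, int]] = []
    for row in current:  # preserve the order in which rows appear in `current`
        if new_counts[row] > 0:
            result.append((row, now))
            new_counts[row] -= 1
    return result


# The slide's example with T = 0: CallTable grows from r1..r3 to r1..r5.
print(istream(["r1", "r2", "r3"], ["r1", "r2", "r3", "r4", "r5"], now=1))
# [('r4', 1), ('r5', 1)]
```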
Streaming Output Events
• Opposite of streaming input events, but…
– Unlike the SPE interface, the client application interface is not push-based.
• Alternative solutions:
– Each client monitors its own alerts on a given table.
• Cumbersome and error-prone
– A monitoring program does this for all registered clients, using periodic select queries (i.e., polling) or triggers.
• Not event-driven, inefficient, and not scalable
Streaming Output Events
• Our Solution: Monitoring Select
– The select operation blocks until there is at least one row to return.
– For continuous monitoring, the client program re-issues Monitoring Select in a loop.
– Monitoring Select operates on “Event Tables”.
• Example: detect regions whose average call waiting time is unusually long.
SELECT *
FROM /*+ EVENT */ CallAnalysis
WHERE AvgWait > 10;
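The blocking behavior can be sketched in Python with a condition variable: inserts wake blocked readers, and a reader returns only once at least one new row matches its predicate. This is a minimal sketch, not MaxStream's implementation. The `EventTable` class, the predicate argument, and especially the per-client cursor (used so that a re-issued select never returns the same row twice) are assumptions.

```python
import threading
import time
from typing import Any, Callable, Dict, List, Optional, Tuple

Row = Dict[str, Any]


class EventTable:
    """Append-only table whose readers can block until a matching row arrives."""

    def __init__(self) -> None:
        self._rows: List[Row] = []
        self._cond = threading.Condition()

    def insert(self, row: Row) -> None:
        with self._cond:
            self._rows.append(row)
            self._cond.notify_all()  # wake every blocked monitoring select

    def monitoring_select(self, predicate: Callable[[Row], bool], cursor: int,
                          timeout: Optional[float] = None) -> Tuple[List[Row], int]:
        """Block until a row at position >= cursor satisfies predicate.

        Returns the matching rows and the new cursor. Assumption: each client keeps
        its own cursor. Returns an empty list if a wait of `timeout` seconds passes
        with no new insert.
        """
        with self._cond:
            while True:
                matches = [r for r in self._rows[cursor:] if predicate(r)]
                cursor = len(self._rows)
                if matches:
                    return matches, cursor
                if not self._cond.wait(timeout):
                    return [], cursor


if __name__ == "__main__":
    table = EventTable()

    def producer() -> None:
        # Hypothetical stand-in for the SPE writing CallAnalysis results.
        for region, avg_wait in [("EU", 4.0), ("US", 12.5), ("APAC", 15.0)]:
            time.sleep(0.05)
            table.insert({"Region": region, "AvgWait": avg_wait})

    threading.Thread(target=producer, daemon=True).start()

    cursor, alerts = 0, []
    while len(alerts) < 2:  # the client re-issues the monitoring select in a loop
        rows, cursor = table.monitoring_select(lambda r: r["AvgWait"] > 10, cursor, timeout=5.0)
        alerts.extend(rows)
    print([r["Region"] for r in alerts])  # ['US', 'APAC']
```

Unlike polling, the client consumes no work while nothing has happened: it sleeps inside `wait` until an insert notifies it.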
Hybrid Queries in MaxStream
• Hybrid queries are continuous queries that join Streams with Tables
– Similar to joining Fact tables with Dimension tables in data warehouses
• One can conveniently use hybrid queries in MaxStream in two ways:
– To enrich the input stream before it is passed to the SPE
– To enrich the output stream after it is received from the SPE
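Enriching a stream amounts to an equi-join between each arriving event and a (usually small) table, much like a fact-dimension lookup. The generator below is a minimal Python sketch under stated assumptions: the table is indexed once in a hash map, its join key is unique, and it does not change while the stream runs. The column names follow the call center example (`Opcode`, `Operator`, `RegionNm`), but the `enrich` function itself is hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator

Row = Dict[str, Any]


def enrich(stream: Iterable[Row], table: Iterable[Row],
           stream_key: str, table_key: str) -> Iterator[Row]:
    """Join each stream event with the matching table row (inner equi-join)."""
    # Build a hash index on the table once. Assumption: the table is small,
    # its join key is unique, and it does not change while the stream runs.
    index = {row[table_key]: row for row in table}
    for event in stream:
        match = index.get(event[stream_key])
        if match is not None:  # events without a matching row are dropped
            yield {**event, **match}
```

The same function covers both uses on this slide: applied before the SPE it enriches input events, and applied to the SPE's results it enriches output events.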
Hybrid Queries: Call Center Example
CREATE TABLE CallTable (Opcode, ArrivalTime, StartTime, EndTime);
Enriching the input in MaxStream:
INSERT INTO STREAM CallStream
SELECT o.RegionNm AS Region, c.StartTime - c.ArrivalTime AS WaitTime,
c.EndTime - c.StartTime AS Duration
FROM ISTREAM(CallTable) c, OperatorsbyRegion o
WHERE c.Opcode = o.Operator;
Continuous query in SPE:
INSERT INTO TABLE CallAnalysis
SELECT Region, COUNT(*) AS Cnt, AVG(WaitTime) AS AvgWait,
AVG(Duration) AS CallLength
FROM CallStream
GROUP BY Region
KEEP 1 HOUR;
Enriching the output in MaxStream:
SELECT a.Region, a.AvgWait, a.CallLength, r.NOps, r.Training
FROM /*+ EVENT */ CallAnalysis a, Regions r
WHERE a.AvgWait > 10
AND a.Region = r.RegName;
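The last two statements can be mimicked in plain Python to show what each stage computes. The sketch reads KEEP 1 HOUR as a sliding window over event time and assumes events arrive in timestamp order; both readings are assumptions, as are the class and function names. `long_wait_alerts` plays the role of the output-side query: it keeps regions with AvgWait > 10 and attaches their `NOps` and `Training` attributes.

```python
from collections import deque
from typing import Any, Deque, Dict, List, Tuple


class CallAnalysisWindow:
    """Per-region COUNT/AVG over the calls of the last `window` seconds.

    Assumptions: KEEP 1 HOUR is a sliding window over event time, and
    events arrive in timestamp order.
    """

    def __init__(self, window: float = 3600.0) -> None:
        self.window = window
        self._events: Deque[Tuple[float, str, float, float]] = deque()

    def add(self, ts: float, region: str, wait_time: float, duration: float) -> None:
        self._events.append((ts, region, wait_time, duration))
        # Evict calls that are at least `window` seconds older than the newest one.
        while self._events and self._events[0][0] <= ts - self.window:
            self._events.popleft()

    def snapshot(self) -> Dict[str, Dict[str, float]]:
        """Current contents of the (hypothetical) CallAnalysis table."""
        acc: Dict[str, List[float]] = {}
        for _, region, wait, dur in self._events:
            a = acc.setdefault(region, [0, 0.0, 0.0])  # [count, wait sum, duration sum]
            a[0] += 1
            a[1] += wait
            a[2] += dur
        return {r: {"Cnt": n, "AvgWait": w / n, "CallLength": d / n}
                for r, (n, w, d) in acc.items()}


def long_wait_alerts(analysis: Dict[str, Dict[str, float]],
                     regions: Dict[str, Dict[str, Any]],
                     threshold: float = 10.0) -> List[Dict[str, Any]]:
    """Output-side enrichment: regions with AvgWait > threshold, joined with region data."""
    return [
        {"Region": r, **stats, **regions[r]}
        for r, stats in analysis.items()
        if stats["AvgWait"] > threshold and r in regions
    ]
```

Recomputing the aggregates from the buffered events keeps the sketch short; an SPE would more likely maintain running sums and subtract evicted events.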
Initial Feasibility Study
• Goal: to show
– whether MaxStream is useful for supporting real-time BI applications
– whether MaxStream's performance overhead is acceptable
• Setup: SAP Sales and Distribution Benchmark
– Persistent events, Throughput critical
– Original benchmark: No streaming
– We add streaming and make two comparisons:
• SD vs. SD with MaxStream/ISTREAM + SPE “X”
• SD vs. SD with MaxStream/Monitoring-Select
SAP Sales and Distribution (SD) Benchmark
• A business benchmark that models a sell-from-stock scenario consisting of 6 transactions, each with 1 to 4 dialog steps and about 10 seconds of think time per dialog step.
– Example transactions: Create customer order document, Create order delivery document, Create invoice, etc.
• Metric: throughput, measured as the number of dialog steps processed per minute (SAPs).
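Because every dialog step is followed by think time, the benchmark behaves like a closed system, and the interactive response time law X = N / (Z + R) (N users, think time Z, response time R) gives a quick sanity check on throughput. Using the 16,000 users and 13 ms response time from the results table, and taking think time as exactly 10 s (an idealization of "about 10 seconds"), the law yields about 95,875 dialog steps per minute, with 96,000 as the upper bound at zero response time. This is a back-of-the-envelope check under that assumption, not part of the benchmark definition.

```python
def closed_loop_throughput(users: int, think_time_s: float,
                           response_time_s: float) -> float:
    """Dialog steps per minute from the interactive response time law X = N / (Z + R)."""
    per_second = users / (think_time_s + response_time_s)
    return per_second * 60.0


# Assumption: exactly 10 s of think time per dialog step (the benchmark says "about").
print(round(closed_loop_throughput(16_000, 10.0, 0.0)))    # 96000 (upper bound)
print(round(closed_loop_throughput(16_000, 10.0, 0.013)))  # 95875
```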
Use of MaxStream in SAP SD Benchmark
MaxStream/ISTREAM + SPE “X”
• Stream incoming orders.
• Forward sales orders to SPE “X” via MaxStream in order to continuously compute the daily sum of sales orders for each product and region.
MaxStream/Monitoring-Select
• Monitor big sales.
• Continuously monitor big sales orders (i.e., those with an amount > 95) by storing the orders in an event table and running Monitoring Select over it.
MaxStream/ISTREAM (feeding SPE “X”):
INSERT INTO STREAM SalesOrderStream
SELECT A.MANDT, A.VBELN, A.NETWR, B.POSNR, B.MATNR, B.ZMENG
FROM ISTREAM(VBAK) A, VBAP B
WHERE A.MANDT = B.MANDT AND A.VBELN = B.VBELN;
MaxStream/Monitoring-Select:
SELECT A.MANDT, A.VBELN, B.KWMENG
FROM /*+ EVENT */ VBAK A, VBAP B
WHERE A.NETWR > 95 AND A.MANDT = B.MANDT AND A.VBELN = B.VBELN;
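In the ISTREAM scenario, the continuous query running in SPE “X” computes daily sums of sales orders per product and region. The slides do not show that query, so the Python sketch below only illustrates the idea as a tumbling one-day window: totals accumulate per (product, region) and are emitted when the first order of a new day arrives. The class name, the field names, the presence of a region attribute on each order, and in-order arrival are all assumptions.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]  # (product, region); region is a hypothetical attribute


class DailySalesSum:
    """Tumbling one-day window: running sum of order amounts per (product, region).

    Assumption: orders arrive in timestamp order.
    """

    def __init__(self) -> None:
        self.current_day: Optional[date] = None
        self.totals: Dict[Key, float] = defaultdict(float)

    def add(self, ts: datetime, product: str, region: str,
            amount: float) -> Optional[Tuple[date, Dict[Key, float]]]:
        """Add one order; when it opens a new day, return the finished day's totals."""
        closed = None
        day = ts.date()
        if self.current_day is not None and day != self.current_day:
            closed = (self.current_day, dict(self.totals))
            self.totals.clear()
        self.current_day = day
        self.totals[(product, region)] += amount
        return closed
```

Emitting a day only when the next one starts keeps the sketch simple; a real SPE would typically also close windows on a timer so that a quiet day still produces output.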
MaxStream SAP SD Benchmark Performance
| Metric | SD | SD with ISTREAM | SD with Monitoring-Select |
|---|---|---|---|
| # of SD users | 16,000 | 16,000 | 16,000 |
| Throughput (SAPs) | 95,910 | 95,910 | 95,846 |
| Dialog response time (ms) | 13 | 13 | 13 |
| DB server CPU utilization (%) | 49.8 | 50.6 | 50.1 |
SD with streaming features achieves performance similar to that of the standard SD benchmark.
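To put "similar performance" in numbers, the relative changes follow directly from the table: throughput is unchanged with ISTREAM and about 0.07% lower with Monitoring Select, while DB server CPU utilization rises by 0.8 and 0.3 percentage points, respectively. A small Python helper for the arithmetic (values copied from the table; the dictionary layout is only for illustration):

```python
from typing import Dict, Tuple

BASELINE: Dict[str, float] = {"throughput": 95_910, "cpu_pct": 49.8}
VARIANTS: Dict[str, Dict[str, float]] = {
    "ISTREAM": {"throughput": 95_910, "cpu_pct": 50.6},
    "Monitoring-Select": {"throughput": 95_846, "cpu_pct": 50.1},
}


def overhead(variant: Dict[str, float]) -> Tuple[float, float]:
    """Relative throughput change (%) and CPU change (percentage points) vs. plain SD."""
    d_tput = 100.0 * (variant["throughput"] - BASELINE["throughput"]) / BASELINE["throughput"]
    d_cpu = variant["cpu_pct"] - BASELINE["cpu_pct"]
    return round(d_tput, 3), round(d_cpu, 1)


for name, v in VARIANTS.items():
    print(name, overhead(v))
# ISTREAM (0.0, 0.8)
# Monitoring-Select (-0.067, 0.3)
```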
Conclusions
• Real-time BI requires new platforms that offer:
– the low latency of stream processing
– the analytics support of data warehouses
– the flexible, dynamic data access of data federation engines
• The MaxStream stream federation engine provides:
– access to heterogeneous SPEs and DBs
– flexible persistence and data federation capabilities
• MaxStream is low-overhead and useful in various operational BI scenarios.
Open Challenges
• Unified continuous query execution model and semantics
• Cost- and Capability-based query optimization and dispatching over multiple SPEs
• Transactional aspects of federated stream processing
• Distributed operation aspects (e.g., load balancing, high availability)
Thanks!
• You
• MaxStream team
• Chan Young Kwon (SAP Labs, Korea)
• ETH Zurich Enterprise Computing Center (ECC)
• More information: http://www.systems.ethz.ch/research/projects/maxstream/