Stream Processing: Key Driver for Enabling Instant Insights on Big Data
Pritesh Maker
Background
• Presently leading Engineering at DataTorrent
• Over a decade of experience in Data Management technologies including Data Integration, Data Virtualization and Data Quality & Profiling
• Past roles include leading Engineering at Informatica for their Big Data Management products and core Data Engine
• Interested in All Things Data!
Education
• BS in Computer Science from University of Texas at Austin
• MBA from Haas School of Business, University of California at Berkeley
Connect with me
• LinkedIn: https://www.linkedin.com/in/priteshmaker
• Twitter: @priteshmaker
Why is Stream Processing Vital?
Traditional Analytics – Data at Rest
[Diagram: source data (MS Queues, events, XML files, databases, sensor data, social) and enterprise repositories (RDBMS, EDW, NoSQL) arrive as Feed 1 through Feed m and Feed 1 through Feed n, go through Extract, Transform and Load with an optional staging area, and are then analyzed and visualized with business analytics, business intelligence and visualization tools.]
Next Generation – Data in Motion
Organizations need to react to changing business conditions in real time
• Faster decision making across all industries
• Few companies outside of financial markets, telecom & utilities have experience with streaming
Newer data sources – like sensors, social media feeds
• Higher Volume and Greater Velocity
• More unstructured and semi-structured data
Democratization of technologies
• Open Source Projects
• Large Scale Compute & Storage – Hadoop, NoSQL
• Streaming Technologies – Apex, Spark, Storm etc.
• Real-time dashboards and alert notification systems
Beyond niche use cases
• Broad applicability but needs more adoption
Stream vs. Batch Processing Pipelines
[Diagram: Stream Processing Data Pipeline, with stages labeled Ingest, Archive, Transform, Normalize, Analyze, Visualize/Persist and Action]
[Diagram: Batch Processing Data Pipeline, with stages labeled Extract, Transform, Load, Analyze and Action]
Stream Processing
• Continuous processing on data as it flows through a system
• Allows users to act on events instantaneously via alerts
• Processing related to time (event time vs. processing time)
• Real-Time – the difference between event time and processing time is negligible
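To make the event-time versus processing-time distinction concrete, here is a minimal Python sketch. The `Event` class, the `lag_seconds` and `is_real_time` helpers, the sample timestamps and the one-second "real-time" budget are all illustrative assumptions, not part of the talk or of any particular streaming engine.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Event:
    """Hypothetical event record: when it happened vs. when it was processed."""
    key: str
    event_time: datetime       # when the event occurred at the source
    processing_time: datetime  # when the pipeline handled it


def lag_seconds(event: Event) -> float:
    """Processing lag: processing time minus event time, in seconds."""
    return (event.processing_time - event.event_time).total_seconds()


def is_real_time(event: Event, budget_seconds: float = 1.0) -> bool:
    """Treat an event as 'real-time' if its lag fits an assumed budget."""
    return lag_seconds(event) <= budget_seconds


# Made-up sample events: one handled quickly, one handled late.
t0 = datetime(2016, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
fast = Event("card-1", t0, t0 + timedelta(milliseconds=200))
slow = Event("card-2", t0, t0 + timedelta(seconds=30))
```

What counts as "negligible" lag depends on the use case; the budget is a parameter here for that reason.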
Enables your Data In Motion Architecture
Big Data Application Types
[Diagram: application types arranged along a data velocity axis. Labels shown: IoT, Fraud, CDR, CDC, Reporting, SQL, Operations, Data Discovery, SQL on Streams, Streaming Discovery, Ad Hoc Query, Batch Processing, Stream Processing]
Sample Streaming Analytics Patterns
Preprocessing
• Filtering events
• Transforming attributes
Alerts & Thresholds
• Based on complex conditions
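As a rough illustration of alerting on complex conditions, the sketch below evaluates named rules against each event and yields the events that trip at least one rule. The rule names, the event fields (`amount`, `country`, `home_country`, `attempts`) and the thresholds are hypothetical and chosen only for the example.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Rule = Callable[[dict], bool]

# Hypothetical rules; each combines one or more conditions on a single event.
RULES: Dict[str, Rule] = {
    "large_foreign_txn": lambda e: e["amount"] > 1000
    and e["country"] != e["home_country"],
    "rapid_retry": lambda e: e["attempts"] >= 3,
}


def alerts(
    events: Iterable[dict], rules: Dict[str, Rule]
) -> Iterator[Tuple[dict, List[str]]]:
    """Yield (event, names of rules that fired) for every event with a hit."""
    for event in events:
        fired = [name for name, rule in rules.items() if rule(event)]
        if fired:
            yield event, fired


# Made-up sample transactions.
transactions = [
    {"amount": 50, "country": "US", "home_country": "US", "attempts": 1},
    {"amount": 5000, "country": "FR", "home_country": "US", "attempts": 1},
    {"amount": 20, "country": "US", "home_country": "US", "attempts": 4},
]
raised = list(alerts(transactions, RULES))
```

Because `alerts` is a generator, it can consume an unbounded iterator of events and emit alerts as they occur.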
Computing within Windows
• Aggregations
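One common form of windowed aggregation is the tumbling (fixed, non-overlapping) window. The sketch below sums values per key and window over a finite list of events. The function name, the event tuple layout and the sample clicks are assumptions, and a production engine would also emit results incrementally and handle late data, which this sketch does not.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_window_sums(
    events: Iterable[Tuple[float, str, float]],
    window_seconds: float,
) -> Dict[Tuple[int, str], float]:
    """Sum values per (window index, key) using fixed, non-overlapping windows.

    Each event is (event_time_seconds, key, value). The window index is
    event_time // window_seconds, so windows are aligned to time zero.
    """
    sums: Dict[Tuple[int, str], float] = defaultdict(float)
    for event_time, key, value in events:
        window = int(event_time // window_seconds)
        sums[(window, key)] += value
    return dict(sums)


# Made-up click events: (event time in seconds, ad id, click count).
clicks = [
    (1.0, "ad-A", 1.0),
    (4.0, "ad-A", 1.0),
    (7.0, "ad-B", 1.0),
    (12.0, "ad-A", 1.0),
]
per_window = tumbling_window_sums(clicks, window_seconds=10.0)
```

With 10-second windows, the first three clicks fall into window 0 and the last one into window 1.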
Combining Event Streams
• Correlation
• Error detection
Enrichment
• Looking up database, reference data
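Enrichment can be sketched as a lookup that merges reference attributes into each event. Here an in-memory dictionary stands in for the database or reference data; the `CUSTOMERS` table, its fields and the `customer_id` key are hypothetical.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical reference data, standing in for a database or cache lookup.
CUSTOMERS: Dict[str, Dict[str, str]] = {
    "c1": {"segment": "gold", "region": "EU"},
    "c2": {"segment": "basic", "region": "US"},
}


def enrich(
    events: Iterable[dict], reference: Dict[str, Dict[str, str]]
) -> Iterator[dict]:
    """Attach reference attributes to each event, keyed by customer_id.

    Events with an unknown customer_id pass through unchanged.
    """
    for event in events:
        extra = reference.get(event["customer_id"], {})
        yield {**event, **extra}


# Made-up sample orders; "c9" has no entry in the reference data.
orders = [
    {"customer_id": "c1", "amount": 10},
    {"customer_id": "c9", "amount": 25},
]
enriched = list(enrich(orders, CUSTOMERS))
```

Passing unknown keys through unchanged is one design choice; dropping them or routing them to an error stream are equally plausible alternatives.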
Temporal Events
• Detecting events within time windows
Tracking
• Tracking events over space & time
Trend Detection
• Rise, Fall
• Outliers
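Outlier detection on a stream can be approximated by comparing each new value with a rolling window of recent values. The sketch below flags a value when it lies more than `threshold` standard deviations from the rolling mean. The function name, window size, threshold and sample readings are assumptions for illustration, and this is only one of many possible detection techniques.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Iterable, Iterator, Tuple


def flag_outliers(
    values: Iterable[float], window: int = 5, threshold: float = 3.0
) -> Iterator[Tuple[float, bool]]:
    """Yield (value, is_outlier) using a rolling mean and standard deviation.

    A value is only scored once `window` earlier values are available.
    """
    history: Deque[float] = deque(maxlen=window)
    for value in values:
        is_outlier = False
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            is_outlier = sigma > 0 and abs(value - mu) > threshold * sigma
        yield value, is_outlier
        # The value joins the history after scoring, outlier or not.
        history.append(value)


# Made-up sensor readings with one obvious spike.
readings = [10.0, 11.0, 9.0, 10.0, 11.0, 10.0, 50.0, 10.0]
flags = [flag for _, flag in flag_outliers(readings)]
```

Note that a flagged value still enters the history and inflates the next few standard deviations; excluding flagged values is a reasonable variation.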
Source: https://iwringer.wordpress.com/2015/08/03/patterns-for-streaming-realtime-analytics/
Stream Processing Use Cases
Financial Services
• Detect fraudulent activity in real-time
• Risk Analysis
• Deliver personalized products and offerings
• Make decisions in real-time for trading and transactional platforms
Financial services big data fabric
Secure, fault-tolerant data ingestion, formatting & archiving. Data access layer for application processing.
[Diagram labels: Financial Data, SMTP Logs, Historical, Application 1, Application n, Persistent, Encrypt, Compliance, Alert on error, Archive]
Telecom
• Real-time network monitoring and protection
• Quality of service and Customer Satisfaction
• Take action based on users’ location
• Automatic resource allocation and load balancing
Online Advertising
• Dynamic bidding
• Real-time targeting & personalization
• Maximize click-through and conversion rates
• Reporting that can be updated continuously
Online advertising dynamic inventory purchases
High-volume, auto-scaling, fault-tolerant event stream. Dimensional computing to identify performing ads.
[Diagram labels: Ad Server 1, Ad Server 800, Real-time Dashboard, Ad Placement Strategy, Oracle DB, Fault-Tolerant Flume, In-memory analytic cube, Campaign Analysis]
Internet of Things
• Environment monitoring
• Infrastructure management
• Manufacturing
• Energy management
• Public Building & Home automation
• Transportation
IoT secure ingestion and predictive analysis
High-performance, multi-customer secure data ingestion. Complex event processing with historical data for predictive maintenance.
[Diagram labels: Sensor 1, Sensor 2, Sensor N, Application 1, Application n, Persistent, Data Governance, Complex Event Process, Predictive maintenance]
Stream Processing: Conclusion
Lots of untapped potential!
• Gives your business a competitive edge!
Open Source and Big Data technologies
• Built to address the scale and latency demands
Broad use cases
• Across industries and verticals