Introduction to Streaming Analytics
![Page 1: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/1.jpg)
BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENEVA HAMBURG COPENHAGEN LAUSANNE MUNICH STUTTGART VIENNA ZURICH
Introduction to Streaming Analytics
Guido Schmutz
![Page 2: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/2.jpg)
Guido Schmutz
• Working for Trivadis for more than 19 years
• Oracle ACE Director for Fusion Middleware and SOA
• Co-author of different books
• Consultant, Trainer, Software Architect for Java, Oracle, SOA and Big Data / Fast Data
• Member of Trivadis Architecture Board
• Technology Manager @ Trivadis
• More than 25 years of software development experience
Contact: [email protected]
Blog: http://guidoschmutz.wordpress.com
Twitter: gschmutz
![Page 3: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/3.jpg)
Our company.
© Trivadis – The Company, 03.06.16
Trivadis is a market leader in IT consulting, system integration, solution engineering and the provision of IT services focusing on […] and […] and Open Source technologies in Switzerland, Germany, Austria and Denmark. We offer our services in the following strategic business fields:
Trivadis Services takes over the interacting operation of your IT systems.
OPERATION
![Page 4: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/4.jpg)
Locations: Basel, Bern, Brugg, Copenhagen, Düsseldorf, Frankfurt, Freiburg, Geneva, Hamburg, Lausanne, Munich, Stuttgart, Vienna, Zurich
With over 600 specialists and IT experts in your region.
14 Trivadis branches and more than 600 employees
200 Service Level Agreements
Over 4,000 training participants
Research and development budget: CHF 5.0 million
Financially self-supporting and sustainably profitable
Experience from more than 1,900 projects per year at over 800 customers
![Page 5: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/5.jpg)
Agenda
1. Introduction & Foundation
2. Designing Streaming Analytics Solutions
3. Implementing Event Hub
4. Implementing Data Ingestion
5. Implementing Streaming Analytics
6. Scalability & Reliability
7. Streaming Analytics in Architecture
8. Summary
![Page 6: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/6.jpg)
Introduction & Foundation
![Page 7: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/7.jpg)
Big Data Definition (4 Vs)
Characteristics of Big Data: its Volume, Velocity and Variety in combination
+ Time to action? – Big Data + Real-Time = Stream Processing
![Page 8: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/8.jpg)
The world is changing …
The model of Generating/Consuming Data has changed ….
Old Model: few companies are generating data, all others are consuming data
New Model: all of us are generating data, and all of us are consuming data
![Page 9: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/9.jpg)
Who is generating Big Data?
Progress and innovation are no longer hindered by the ability to collect data
But by the ability to manage, analyze, summarize, visualize and discover knowledge from the collected data in a timely manner and in a scalable fashion
• Social media and networks (all of us are generating data)
• Scientific instruments (collecting all sorts of data)
• Mobile devices (tracking all objects all the time)
• Sensor technology and networks (measuring all kinds of data)
![Page 10: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/10.jpg)
Traditional Data Processing - Challenges
• Introduces too much “decision latency”
• Responses are delivered “after the fact”
• Maximum value of the identified situation is lost
• Decisions are made on old and stale data
• “Data at Rest”
![Page 11: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/11.jpg)
The New Era: Streaming Data Analytics / Fast Data
• Events are analyzed and processed in real-time as they arrive
• Decisions are timely, contextual and based on fresh data
• Decision latency is eliminated
• “Data in motion”
![Page 12: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/12.jpg)
Real Time Analytics Use Cases
• Algorithmic Trading
• Online Fraud Detection
• Geo Fencing
• Proximity/Location Tracking
• Intrusion detection systems
• Traffic Management
• Recommendations
• Churn detection
• Internet of Things (IoT) / Intelligent Sensors
• Social Media/Data Analytics
• Gaming Data Feed
• …
![Page 13: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/13.jpg)
What happens in an Internet minute
![Page 14: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/14.jpg)
Internet of Things – Sensors are/will be everywhere
There are more devices tapping into the internet than people on earth
How do we prepare our systems/architecture for the future?
Source: Cisco
Source: The Economist
![Page 15: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/15.jpg)
Different Types of Stream/Event Processing
Simple Event Processing (SEP)
Event Stream Processing (ESP)
![Page 16: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/16.jpg)
Different Types of Stream/Event Processing
Complex Event Processing (CEP)
![Page 17: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/17.jpg)
Native Streaming vs. Micro-Batching
Native Streaming
• Events processed as they arrive
• + low latency
• - lower throughput
• - fault tolerance is expensive
Micro-Batching
• Splits incoming stream into small batches
• + high(er) throughput
• + easier fault tolerance
• - higher latency
Source: Distributed Real-Time Stream Processing: Why and How by Petr Zapletal
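The latency trade-off between the two models can be illustrated with a minimal Python sketch. It is a simplified model, not the behaviour of any particular engine: processing time is assumed to be zero, and the function names (`native_streaming`, `micro_batching`, `average_latency`) are made up for this illustration.

```python
from typing import List, Tuple

Event = Tuple[float, str]  # (arrival time in seconds, payload)


def native_streaming(events: List[Event]) -> List[Tuple[str, float]]:
    """Native streaming: each event is processed at the moment it arrives."""
    return [(payload, arrival) for arrival, payload in events]


def micro_batching(events: List[Event], interval: float) -> List[Tuple[str, float]]:
    """Micro-batching: an event waits until the end of its batch interval."""
    processed = []
    for arrival, payload in events:
        batch_end = (int(arrival // interval) + 1) * interval
        processed.append((payload, batch_end))
    return processed


def average_latency(events: List[Event], processed: List[Tuple[str, float]]) -> float:
    """Mean time between arrival and processing (processing cost assumed zero)."""
    total = sum(done - arrival for (arrival, _), (_, done) in zip(events, processed))
    return total / len(events)


events = [(0.2, "a"), (0.7, "b"), (1.1, "c")]
print(average_latency(events, native_streaming(events)))     # no waiting
print(average_latency(events, micro_batching(events, 1.0)))  # waits for batch end
```

In this model the waiting time of micro-batching is bounded by the batch interval, which is why a shorter interval lowers latency at the cost of more, smaller batches.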
![Page 18: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/18.jpg)
How to design a Streaming Analytics Solution?
[Diagram: three pipeline variants]
• Event Stream → event → Data Ingestion → event → Persist (Queue)
• Event Stream → event → Data Ingestion → event → Analytics → result
• Event Stream → event → Data Ingestion/Analytics → result
![Page 19: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/19.jpg)
Demo Use Case – Truck Sensors
[Diagram components: Truck, Data Ingestion, Movement, Movement JSON, Geo-Fencing (NEAR, ENTER), Reckless Driving Detector, Truck Driver, Reckless Driver, Dashboard]
Sample truck event (delimited):
2016-06-02 14:39:56.605|98|27|Mark Lochbihler|803014426|Wichita to Little Rock Route 2|Normal|38.65|-90.21|5187297736652502631
Sample truck event (JSON):
{"timestamp": "2016-06-02 14:39:56.991", "truckId": 99, "driverId": 31, "driverName": "Rommel Garcia", "routeId": 1565885487, "routeName": "Springfield to KC Via Hanibal", "eventType": "Normal", "latitude": 37.16, "longitude": "-94.46", "correlationId": 5187297736652502631}
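The conversion from the pipe-delimited truck event to the JSON form could look roughly like the following Python sketch. The field order is inferred from the two sample records on this slide; the function name `movement_to_json` is hypothetical, and emitting both coordinates as numbers is an assumption (the slide's JSON sample carries the longitude as a string).

```python
import json

# Field order inferred from the sample records (an assumption, not a documented schema).
FIELDS = ["timestamp", "truckId", "driverId", "driverName", "routeId",
          "routeName", "eventType", "latitude", "longitude", "correlationId"]
INT_FIELDS = {"truckId", "driverId", "routeId", "correlationId"}
FLOAT_FIELDS = {"latitude", "longitude"}


def movement_to_json(line: str) -> str:
    """Convert one pipe-delimited truck event into a JSON document."""
    values = line.strip().split("|")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = {}
    for name, value in zip(FIELDS, values):
        if name in INT_FIELDS:
            record[name] = int(value)
        elif name in FLOAT_FIELDS:
            record[name] = float(value)
        else:
            record[name] = value
    return json.dumps(record)


sample = ("2016-06-02 14:39:56.605|98|27|Mark Lochbihler|803014426|"
          "Wichita to Little Rock Route 2|Normal|38.65|-90.21|5187297736652502631")
print(movement_to_json(sample))
```

Rejecting records with an unexpected number of fields keeps malformed sensor lines from silently shifting values into the wrong columns.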
![Page 20: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/20.jpg)
Designing Streaming Analytics Solutions
![Page 21: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/21.jpg)
How to design a Streaming Analytics System?
It usually starts very simple … just one data pipeline
[Diagram: Event Stream → event → Data Ingestion → Analytics]
![Page 22: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/22.jpg)
New Event Stream sources are added …
[Diagram: Event Stream, 2nd Event Stream, 3rd Event Stream, … nth Event Stream each send events through their own Data Ingestion (Data Ingestion, 2nd, 3rd, … nth Data Ingestion) to Analytics]
![Page 23: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/23.jpg)
New Processors are interested in the events …
[Diagram: Event Stream, 2nd Event Stream, 3rd Event Stream, … nth Event Stream each send events through their own Data Ingestion (Data Ingestion, 2nd, 3rd, … nth Data Ingestion) to Analytics and to a 2nd Analytics]
![Page 24: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/24.jpg)
… and the solution becomes the problem
[Diagram: Event Stream, 2nd Event Stream, 3rd Event Stream, … nth Event Stream each send events through their own Data Ingestion (Data Ingestion, 2nd, 3rd, … nth Data Ingestion) to Analytics, 2nd Analytics, 3rd Analytics, … nth Analytics]
![Page 25: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/25.jpg)
… and the solution becomes the problem
![Page 26: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/26.jpg)
… and the solution becomes the problem
[Diagram: the sources New Customers, Operational Logs, Click Stream and Meter Readings send events through CDC Ingestion, Log Ingestion, Click Stream Ingestion and Sensor Ingestion to the consumers Hadoop / Data Warehouse, Recommendation System, Log Search and Fraud Detection]
![Page 27: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/27.jpg)
Decouple event streams from consumers
“Unified Log”
Remember Enterprise Service Bus (ESB)?
What is the idea of a Unified Log?
[Diagram: Event Stream sources (New Customers, Operational Logs, Click Stream, Meter Readings) → Event Stream Ingestion (CDC Ingestion, Log Ingestion, Click Stream Ingestion, Sensor Ingestion) → Enterprise Event Bus → Event Stream Analytics (Hadoop / Data Warehouse, Recommendation System, Log Search, Fraud Detection)]
![Page 28: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/28.jpg)
Unified Log – What is it?
By Unified Log, we do not mean this …
137.229.78.245 - - [02/Jul/2012:13:22:26 -0800] "GET /wp-includes/js/tinymce/wp-tinymce.php?c=1&ver=349-20805 HTTP/1.1" 200 101114
137.229.78.245 - - [02/Jul/2012:13:22:28 -0800] "POST /wp-admin/admin-ajax.php HTTP/1.1" 200 30747
137.229.78.245 - - [02/Jul/2012:13:22:40 -0800] "POST /wp-admin/post.php HTTP/1.1" 302 -
137.229.78.245 - - [02/Jul/2012:13:22:40 -0800] "GET /wp-admin/post.php?post=387&action=edit&message=1 HTTP/1.1" 200 73160
137.229.78.245 - - [02/Jul/2012:13:22:41 -0800] "GET /wp-includes/css/editor.css?ver=3.4.1 HTTP/1.1" 304 -
137.229.78.245 - - [02/Jul/2012:13:22:41 -0800] "GET /wp-includes/js/tinymce/langs/wp-langs-en.js?ver=349-20805 HTTP/1.1" 304 -
137.229.78.245 - - [02/Jul/2012:13:22:41 -0800] "POST /wp-admin/admin-ajax.php HTTP/1.1" 200 30809
… but this
• a structured log (records are numbered beginning with 0, based on the order in which they are written)
• a.k.a. commit log or journal
[Diagram: records 0 1 2 3 4 5 6 7 8 9 10 11; record 0 is the 1st record, the next record is written after record 11]
![Page 29: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/29.jpg)
Central Unified Log for (real-time) subscription
Take all the organization’s data (events) and put it into a central log for subscription
Properties of the Unified Log:
• Unified: “Enterprise”, single deployment
• Append-Only: events are appended, no update in place => immutable
• Ordered: each event has an offset, which is unique within a shard
• Fast: should be able to handle thousands of messages / sec
• Distributed: lives on a cluster of machines
[Diagram: a Collector writes to the end of the log (records 0 to 11); Consumer System A reads at time = 6, Consumer System B reads at time = 10]
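The append-only and ordered properties, and the fact that each consumer reads at its own position, can be sketched in a few lines of Python. This is a single-process toy model (not distributed, not sharded); the class names `UnifiedLog` and `Consumer` are made up for this illustration.

```python
from typing import Any, List, Tuple


class UnifiedLog:
    """Append-only log: records are never updated, only added at the end."""

    def __init__(self) -> None:
        self._records: List[Any] = []

    def append(self, event: Any) -> int:
        """Append an event and return its offset (0 for the first record)."""
        offset = len(self._records)
        self._records.append(event)
        return offset

    def read(self, offset: int, max_records: int) -> List[Tuple[int, Any]]:
        """Return up to max_records (offset, event) pairs starting at offset."""
        chunk = self._records[offset:offset + max_records]
        return list(enumerate(chunk, start=offset))


class Consumer:
    """Each consumer tracks its own read position, independent of the others."""

    def __init__(self, log: UnifiedLog) -> None:
        self.log = log
        self.position = 0

    def poll(self, max_records: int) -> List[Any]:
        records = self.log.read(self.position, max_records)
        self.position += len(records)
        return [event for _, event in records]


log = UnifiedLog()
for i in range(12):                 # the collector writes records 0..11
    log.append(f"event-{i}")

system_a, system_b = Consumer(log), Consumer(log)
system_a.poll(6)                    # Consumer System A is now at position 6
system_b.poll(10)                   # Consumer System B is now at position 10
print(system_a.position, system_b.position)
```

Because reading never removes records, a slow consumer does not hold back a fast one, and a new consumer can start from offset 0 at any time.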
![Page 30: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/30.jpg)
Implementing Event Bus
![Page 31: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/31.jpg)
Apache Kafka - Overview
Distributed publish-subscribe messaging system
Designed for processing of real time activity stream data (logs, metrics collections, social media streams, …)
Initially developed at LinkedIn, now part of Apache
Does not use JMS API and standards
Kafka maintains feeds of messages in topics
[Diagram: Producers → Kafka Cluster → Consumers]
![Page 32: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/32.jpg)
Apache Kafka - Motivation
LinkedIn’s motivation for Kafka was:
• “A unified platform for handling all the real-time data feeds a large company might have.”
Must haves
• High throughput to support high volume event feeds.
• Support real-time processing of these feeds to create new, derived feeds.
• Support large data backlogs to handle periodic ingestion from offline systems.
• Support low-latency delivery to handle more traditional messaging use cases.
• Guarantee fault-tolerance in the presence of machine failures.
![Page 33: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/33.jpg)
Apache Kafka - Architecture
[Diagram: a Truck publishes to a Kafka Broker holding a Movement Topic and an Engine-Metrics Topic (messages 1 to 6 each); a Movement Processor and an Engine Processor consume them]
![Page 34: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/34.jpg)
Apache Kafka - Architecture
[Diagram: a Truck publishes to a Kafka Broker; the Movement Topic is split into Partition 0 and Partition 1 (messages 1 to 6 each), the Engine-Metrics Topic has Partition 0; two Movement Processors and one Engine Processor consume them]
![Page 35: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/35.jpg)
Apache Kafka
[Diagram: a Truck publishes to two Kafka Brokers; each broker holds partitions (P0, P1, messages 1 to 5) of the Movement Topic and P0 of the Engine-Metrics Topic; two Movement Processors and one Engine Processor consume them]
![Page 36: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/36.jpg)
Apache Kafka - Partition offsets
Offset: messages in the partitions are each assigned a unique (per partition) and sequential id called the offset
• Consumers track their pointers via (offset, partition, topic) tuples
Consumer group C1
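How per-partition offsets and consumer-side tracking fit together can be sketched with a small in-memory model in Python. It does not use the Kafka client API: `Topic`, `ConsumerGroup` and their methods are hypothetical names, and `zlib.crc32` is only a deterministic stand-in for a real partitioner.

```python
import zlib
from collections import defaultdict
from typing import Any, Dict, List, Tuple


class Topic:
    """A topic split into partitions; each partition is an ordered list."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: List[List[Any]] = [[] for _ in range(num_partitions)]

    def send(self, key: str, value: Any) -> Tuple[int, int]:
        """Append value to the partition chosen by key; return (partition, offset)."""
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append(value)
        return partition, len(self.partitions[partition]) - 1


class ConsumerGroup:
    """Tracks the next offset to read per (topic, partition)."""

    def __init__(self, group_id: str) -> None:
        self.group_id = group_id
        self.next_offset: Dict[Tuple[str, int], int] = defaultdict(int)

    def poll(self, topic: Topic, partition: int) -> List[Any]:
        key = (topic.name, partition)
        start = self.next_offset[key]
        records = topic.partitions[partition][start:]
        self.next_offset[key] = start + len(records)
        return records


movement = Topic("movement", num_partitions=2)
p, first = movement.send("truck-98", "position-1")
_, second = movement.send("truck-98", "position-2")   # same key, same partition

c1, c2 = ConsumerGroup("C1"), ConsumerGroup("C2")
print(c1.poll(movement, p))   # both records, in order
print(c1.poll(movement, p))   # nothing new for C1
print(c2.poll(movement, p))   # C2 has its own pointer and sees both records
```

Keying by truck keeps all events of one truck in a single partition, so their relative order is preserved for every consumer group.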
![Page 37: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/37.jpg)
Apache Kafka - Performance
Kafka at LinkedIn => over 1100 brokers / 60 clusters
• 800 billion messages/day
• 175 TB produced/day
• 650 TB consumed/day
• 13 million messages/second, 2.75 GB/second at busiest time of day
Kafka Performance at own setup => 6 brokers (VM) / 1 cluster
• 445’622 messages/second
• 31 MB/second
• 3.0405 ms average latency between producer / consumer
http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
https://engineering.linkedin.com/kafka/running-kafka-scale
![Page 38: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/38.jpg)
Demo Use Case – Truck Sensors
![Page 39: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/39.jpg)
Demo: Consuming Kafka Topic
![Page 40: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/40.jpg)
Demo: Monitoring Kafka Cluster with Kafka Manager
![Page 41: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/41.jpg)
Implementing Data Ingestion
![Page 42: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/42.jpg)
StreamSets Data Collector
• Founded by ex-Cloudera, Informatica employees
• Continuous open source, intent-driven, big data ingest
• Visible, record-oriented approach fixes combinatorial explosion
• Batch or stream processing
• Standalone, Spark cluster, MapReduce cluster
• IDE for pipeline development by ‘civilians’
• Relatively new - first public release September 2015
• So far, vast majority of commits are from StreamSets staff
![Page 43: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/43.jpg)
Apache NiFi
• Originated at NSA as Niagarafiles
• Open sourced December 2014, Apache TLP July 2015
• Opaque, file-oriented payload
• Distributed system of processors with centralized control
• Based on flow-based programming concepts
• Data Provenance
• Web-based user interface
![Page 44: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/44.jpg)
Demo Use Case – Truck Sensors
![Page 45: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/45.jpg)
Demo: Using Apache NiFi for Collection
![Page 46: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/46.jpg)
Implementing Streaming Analytics
![Page 47: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/47.jpg)
Streaming Analytics
[Diagram: streaming analytics offerings arranged by type (Product vs. Framework/Infrastructure) and by license (Open Source vs. Closed Source)]
![Page 48: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/48.jpg)
Implementing Streaming Analytics: Oracle Stream Analytics
![Page 49: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/49.jpg)
History of Oracle Stream Analytics
[Timeline with the years 2007, 2008, 2012, 2013, 2015 and 2016, showing:]
• Oracle Complex Event Processing (OCEP)
• Oracle Event Processing (OEP)
• Oracle Stream Explorer (SX)
• Oracle Event Processing for Java Embedded
• Oracle Stream Analytics (OSA)
• Oracle Edge Analytics (OEA)
• BEA WebLogic Event Server
• Oracle CQL
• Oracle IoT Cloud Service
![Page 50: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/50.jpg)
Oracle Stream Analytics: From Noise to Value
[Diagram labels: Devices / Gateways, Services; Computing Edge, Enterprise; “Sea of data”; Macro-event, High-value, Actionable, In-context; Edge Analytics (OEA), Stream Analytics, Fog]
OEA
• Filtering
• Correlation
• Aggregation
• Pattern matching
• High Volume
• Continuous Streaming
• Extreme Low Latency
• Disparate Sources
• Temporal Processing
• Pattern Matching
• Machine Learning
• High Volume
• Continuous Streaming
• Sub-Millisecond Latency
• Disparate Sources
• Time-Window Processing
• Pattern Matching
• High Availability / Scalability
• Coherence Integration
• Geospatial, Geofencing
• Big Data Integration
• Business Event Visualization
• Action!
![Page 51: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/51.jpg)
Oracle Stream Analytics Platform
What it does
• Compelling, friendly and visually stunning real-time streaming analytics user experience for business users to dynamically create and implement Instant Insight solutions
Key Features
• Analyze simulated or live data feeds to determine event patterns, correlation, aggregation & filtering
• Pattern library for industry-specific solutions
• Streams, References, Maps & Explorations
Benefits
• Accelerated delivery time
• Hides all challenges & complexities of underlying real-time event-driven infrastructure
![Page 52: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/52.jpg)
Oracle Stream Analytics - Connecting Everything & Anything of Interest to the Business
Understanding of CQL Filtering, Correlation, Pattern: NOT NEEDED
Understanding of IT Deployment and Management: NOT NEEDED
Understanding of Development, Java, Best Practices: NOT NEEDED
Understanding of the Event Driven Platform: NOT NEEDED
![Page 53: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/53.jpg)
Business accessibility to Geo-Streaming Analytics
Real Time Streaming Solutions face an increasing need to track "assets of interest" and initiate actions based on encroachment of boundary proximity to fixed and moving objects and other geographic, temporal, or event conditions.
Geo-Fence, Fence, Polygon
Geo-Streaming
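A polygon geo-fence check can be sketched with the classic ray-casting test. The Python example below is a generic illustration, not how Oracle Stream Analytics implements geo-fencing; it treats latitude/longitude as flat plane coordinates (a reasonable assumption only for small fences), and the fence coordinates and function names are invented for the example.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude)


def point_in_polygon(lat: float, lon: float, polygon: List[Point]) -> bool:
    """Ray casting: count how many polygon edges a ray from the point crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's longitude can be crossed.
        if (lon1 > lon) != (lon2 > lon):
            lat_cross = lat1 + (lon - lon1) * (lat2 - lat1) / (lon2 - lon1)
            if lat < lat_cross:
                inside = not inside
    return inside


def detect_enter(positions: List[Point], polygon: List[Point]) -> List[int]:
    """Return the indexes of positions at which the asset ENTERs the fence."""
    enters = []
    was_inside = False
    for index, (lat, lon) in enumerate(positions):
        now_inside = point_in_polygon(lat, lon, polygon)
        if now_inside and not was_inside:
            enters.append(index)
        was_inside = now_inside
    return enters


# Hypothetical rectangular fence and a short track of truck positions.
fence = [(37.0, -95.0), (37.0, -94.0), (38.0, -94.0), (38.0, -95.0)]
track = [(38.65, -90.21), (37.16, -94.46), (37.5, -94.5), (39.0, -94.5)]
print(detect_enter(track, fence))
```

Emitting an event only on the outside-to-inside transition avoids flooding downstream consumers with one alert per position while the truck stays inside the fence.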
![Page 54: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/54.jpg)
“Add value to your real-time streaming data discovery and analytics by applying and including mathematical, statistical analysis to the live output stream”
“These streaming “Excel spreadsheets” really do come to life”
Expression Builder enabling calculation for the Business User
![Page 55: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/55.jpg)
Concept of Connections & Connection Reuse in Streams
![Page 56: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/56.jpg)
Decision Table for Nested IF-THEN-ELSE Rules
![Page 57: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/57.jpg)
Topology View and Navigation
![Page 58: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/58.jpg)
Stream Analytics – Terminology for Business Users
Explorer: The Application User Interface
Catalog: The repository for browsing resources
![Page 59: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/59.jpg)
Stream Analytics – Terminology for Business Users
Stream: incoming flow of events that you want to analyze (CSV, Kafka, JMS, Rest, MQTT, …)
Exploration: application that correlates events from streams and data sources, using filters, groupings, summaries, ranges, and more
![Page 60: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/60.jpg)
Stream Analytics – Terminology for Business Users
Shape: A blueprint of an event in a stream or data in a data source. How the business data is represented in the selected stream
Map: collection of geo-fences
Reference: A connection to static data that is joined to a stream to enrich it and/or to be used in business logic and output
![Page 61: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/61.jpg)
Stream Analytics – Terminology for Business Users
Pattern: A pre-built Exploration that addresses a particular business scenario in a focused and simplified User Interface
Connection: collection of metadata required to connect to an external system
Targets: defines an interface with a downstream system
![Page 62: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/62.jpg)
Demo Use Case – Truck Sensors
![Page 63: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/63.jpg)
Demo: Oracle Stream Analytics
![Page 64: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/64.jpg)
Demo: Oracle Stream Analytics
![Page 65: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/65.jpg)
Demo: Oracle Stream Analytics
![Page 66: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/66.jpg)
Demo: Oracle Stream Analytics
![Page 67: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/67.jpg)
Implementing Streaming Analytics: Spark Streaming
![Page 68: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/68.jpg)
Apache Spark
Apache Spark is a fast and general engine for large-scale data processing
• The hot trend in Big Data!
• Originally developed 2009 in UC Berkeley’s AMPLab
• Based on 2007 Microsoft Dryad paper
• Written in Scala, supports Java, Python, SQL and R
• Can run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk
• One of the largest OSS communities in big data with over 200 contributors in 50+ organizations
• Open sourced in 2010 – since 2014 part of Apache Software Foundation
![Page 69: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/69.jpg)
Apache Spark
Libraries
• Spark SQL (Batch Processing)
• BlinkDB (Approximate Querying)
• Spark Streaming (Real-Time)
• MLlib, SparkR (Machine Learning)
• GraphX (Graph Processing)
Core Runtime
• Spark Core API and Execution Model
Cluster Resource Managers
• Spark Standalone, Mesos, YARN
Data Stores
• HDFS, Elasticsearch, NoSQL, S3
![Page 70: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/70.jpg)
Resilient Distributed Dataset (RDD)
Are
• Immutable
• Re-computable
• Fault tolerant
• Reusable
Have Transformations
• Produce new RDD
• Rich set of transformations available
• filter(), flatMap(), map(), distinct(), groupBy(), union(), join(), sortByKey(), reduceByKey(), subtract(), ...
Have Actions
• Start cluster computing operations
• Rich set of actions available
• collect(), count(), fold(), reduce(), …
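The split between lazy transformations and actions that trigger computation can be illustrated with a toy class in plain Python. `ToyRDD` is an invented, single-machine stand-in and not the Spark API; it only mirrors the idea that transformations record lineage and return a new immutable object, while actions recompute the result from the source data.

```python
from functools import reduce as _reduce
from typing import Any, Callable, Iterable, List, Tuple


class ToyRDD:
    """Immutable dataset plus the lineage of transformations applied to it."""

    def __init__(self, data: Iterable[Any], lineage: Tuple = ()) -> None:
        self._data = tuple(data)
        self._lineage = tuple(lineage)

    # Transformations: nothing is computed, a new ToyRDD is returned.
    def map(self, func: Callable[[Any], Any]) -> "ToyRDD":
        return ToyRDD(self._data, self._lineage + (("map", func),))

    def filter(self, predicate: Callable[[Any], bool]) -> "ToyRDD":
        return ToyRDD(self._data, self._lineage + (("filter", predicate),))

    def _compute(self) -> List[Any]:
        """Replay the lineage over the source data (re-computable)."""
        items = iter(self._data)
        for kind, func in self._lineage:
            items = map(func, items) if kind == "map" else filter(func, items)
        return list(items)

    # Actions: these trigger the actual computation.
    def collect(self) -> List[Any]:
        return self._compute()

    def count(self) -> int:
        return len(self._compute())

    def reduce(self, func: Callable[[Any, Any], Any]) -> Any:
        return _reduce(func, self._compute())


numbers = ToyRDD(range(5))
doubled = numbers.map(lambda x: x * 2)        # lazy: nothing runs yet
large = doubled.filter(lambda x: x > 2)       # still lazy
print(large.collect())                        # the action runs the whole chain
print(numbers.collect())                      # the source is unchanged
```

Recording lineage instead of results is what makes a lost partition recoverable: it can be recomputed from the source by replaying the same transformations.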
![Page 71: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/71.jpg)
RDD
Input Source
• File
• Database
• Stream
• Collection
[Diagram: Input Source → Data → RDD → RDD; .count() -> 100]
![Page 72: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/72.jpg)
Partitions RDD
[Diagram: Data is split into Partition 0 to Partition 9, distributed across Server 1 to Server 5]
![Page 73: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/73.jpg)
Partitions RDD
![Page 74: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/74.jpg)
Partitions RDD
[Diagram: Data is split into Partition 0 to Partition 9, distributed across Server 2 to Server 5]
![Page 75: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/75.jpg)
Spark Workflow
[Diagram: Input HDFS File → HadoopRDD → MappedRDD → MappedRDD → ShuffledRDD → Text File Output]
• sc.hadoopFile() → flatMap() → map() → reduceByKey() → saveAsTextFile()
• Stage 1 – flatMap() + map()
• Stage 2 – reduceByKey()
• Transformations (lazy)
• Action (executes the transformations)
• Master: DAG Scheduler
• MappedRDD partitions P0, P1, P3; ShuffledRDD partition P0
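The flatMap() → map() → reduceByKey() chain in this workflow is the familiar word count. The Python sketch below imitates the three steps on an in-memory list so the data flow is visible without a cluster; `flat_map` and `reduce_by_key` are local helper functions written for this example, not Spark calls, and the input lines are invented.

```python
from itertools import chain
from typing import Any, Callable, Dict, Iterable, List, Tuple


def flat_map(func: Callable[[Any], Iterable[Any]], records: Iterable[Any]) -> List[Any]:
    """Apply func to every record and flatten the resulting sequences."""
    return list(chain.from_iterable(func(record) for record in records))


def reduce_by_key(func: Callable[[Any, Any], Any],
                  pairs: Iterable[Tuple[Any, Any]]) -> Dict[Any, Any]:
    """Combine all values that share a key with the given function."""
    result: Dict[Any, Any] = {}
    for key, value in pairs:
        result[key] = func(result[key], value) if key in result else value
    return result


lines = ["to be or not to be", "to stream or to batch"]

words = flat_map(str.split, lines)                   # flatMap(): line -> words
pairs = [(word, 1) for word in words]                # map(): word -> (word, 1)
counts = reduce_by_key(lambda a, b: a + b, pairs)    # reduceByKey(): sum per word

print(counts)
```

The first two steps work on each record in isolation, while the last step needs all pairs with the same key in one place, which is where the stage boundary in the diagram comes from.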
![Page 76: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/76.jpg)
Spark Execution Model
[Diagram: a Master and a Server hosting a Worker with Executors and Data Storage]
![Page 77: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/77.jpg)
Spark Execution Model
Stage 1 – flatMap() + map()
Narrow Transformation
• filter()
• map()
• sample()
• flatMap()
[Diagram: a Master and four servers, each with Worker, Executor and Data Storage; RDD partitions P0, P1, P3]
![Page 78: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/78.jpg)
Spark Execution Model
Stage 2 – reduceByKey()
Wide Transformation: join(), reduceByKey(), union(), groupByKey()
Shuffle!
Diagram labels: Master; Workers, each with an Executor and Data Storage; RDD partition P0
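The difference between the two stages can be sketched in plain Python. This is an illustration under assumptions, not Spark's implementation: partitions are modelled as lists, and the CRC32-based partitioner as well as the function names are chosen for this example only. A narrow transformation works on each partition in isolation, while `reduce_by_key` first has to move equal keys into the same partition (the shuffle).

```python
import zlib


def narrow_map(partitions, fn):
    """Narrow transformation: each output partition depends on exactly one input partition."""
    return [[fn(rec) for rec in part] for part in partitions]


def shuffle_by_key(partitions, num_out):
    """Shuffle: records with the same key are moved to the same output partition."""
    out = [[] for _ in range(num_out)]
    for part in partitions:
        for key, value in part:
            target = zlib.crc32(key.encode("utf-8")) % num_out
            out[target].append((key, value))
    return out


def reduce_by_key(partitions, fn, num_out):
    """Wide transformation: a shuffle followed by a reduce inside each partition."""
    reduced = []
    for part in shuffle_by_key(partitions, num_out):
        acc = {}
        for key, value in part:
            acc[key] = fn(acc[key], value) if key in acc else value
        reduced.append(acc)
    return reduced


words = [["truck", "driver"], ["truck"], ["driver", "truck"]]   # 3 input partitions
pairs = narrow_map(words, lambda w: (w, 1))                    # stage 1: no data movement
totals = reduce_by_key(pairs, lambda a, b: a + b, num_out=2)    # stage 2: needs a shuffle
```

After the shuffle every key lives in exactly one output partition, which is what allows the per-key totals to be computed locally.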
![Page 79: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/79.jpg)
Batch vs. Real-Time Processing
Petabytes of Data
Gigabytes Per Second
![Page 80: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/80.jpg)
Discretized Stream (DStream)
Kafka
Truck
Truck
Truck
![Page 81: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/81.jpg)
![Page 82: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/82.jpg)
![Page 83: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/83.jpg)
Discretized Stream (DStream)
Kafka
Truck
Truck
Truck
Discrete by time
Individual Event
DStream = sequence of RDDs, one per time interval
![Page 84: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/84.jpg)
Discretized Stream (DStream)
DStream → Transform → DStream (every X seconds)
Transformations: .countByValue(), .reduceByKey(), .join(), .map()
![Page 85: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/85.jpg)
Discretized Stream (DStream)
Input Stream: messages arriving over time (time 1, time 2, time 3, …, time n)
Event DStream: one RDD per interval (RDD @ time 1 … RDD @ time n), each holding message 1 … message n
map() → Mapped DStream: one RDD per interval, each holding f(message 1) … f(message n)
saveAsHadoopFiles(): result 1 … result n for each interval
Time increases along the DStream; the transformation lineage runs from Event DStream to Mapped DStream; actions trigger Spark jobs
Adapted from Chris Fregly: http://slidesha.re/11PP7FV
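The discretization step can be sketched in a few lines of plain Python. This is only an illustration of the micro-batch idea and not the Spark Streaming API: `discretize`, `map_stream` and the sample events are assumptions made for this example. Events are grouped into fixed time intervals, and the same function is then applied to every batch.

```python
from collections import defaultdict


def discretize(events, batch_seconds):
    """Group (timestamp_seconds, payload) events into fixed, non-overlapping batches."""
    batches = defaultdict(list)
    for ts, payload in events:
        batch_start = int(ts // batch_seconds) * batch_seconds
        batches[batch_start].append(payload)
    return [(start, batches[start]) for start in sorted(batches)]


def map_stream(batches, fn):
    """Apply the same function to every batch, similar to map() on a DStream."""
    return [(start, [fn(p) for p in payloads]) for start, payloads in batches]


events = [(0.5, "truck-98"), (1.2, "truck-99"), (2.1, "truck-98"), (4.9, "truck-35")]
dstream = discretize(events, batch_seconds=2)
mapped = map_stream(dstream, str.upper)
```

Each element of `dstream` plays the role of one "RDD @ time n" on the slide; a real streaming engine would produce these batches continuously rather than from a finished list.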
![Page 86: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/86.jpg)
Demo Use Case – Truck Sensors
Sample truck event (pipe-delimited):
2016-06-02 14:39:56.605|98|27|Mark Lochbihler|803014426|Wichita to Little Rock Route 2|Normal|38.65|-90.21|5187297736652502631
Sample truck event (JSON):
{"timestamp": "2016-06-02 14:39:56.991", "truckId": 99, "driverId": 31, "driverName": "Rommel Garcia", "routeId": 1565885487, "routeName": "Springfield to KC Via Hanibal", "eventType": "Normal", "latitude": 37.16, "longitude": "-94.46", "correlationId": 5187297736652502631}
Diagram labels: Truck, Data Ingestion, Geo-Fencing (NEAR, ENTER), Reckless Driving Detector, Truck Driver, Dashboard, Movement, Movement JSON, Reckless Driver
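Turning the pipe-delimited event into the JSON form could look roughly like the sketch below. The field order is an assumption derived from the JSON sample on this slide, and `parse_truck_event` is a hypothetical helper, not part of the demo code. Unlike the slide's JSON, which carries the longitude as a string, this sketch parses both coordinates as numbers.

```python
import json

# Field order assumed from the JSON sample; not confirmed by the source.
FIELDS = ["timestamp", "truckId", "driverId", "driverName", "routeId",
          "routeName", "eventType", "latitude", "longitude", "correlationId"]
INT_FIELDS = {"truckId", "driverId", "routeId", "correlationId"}
FLOAT_FIELDS = {"latitude", "longitude"}


def parse_truck_event(line):
    """Split one pipe-delimited line and convert numeric fields."""
    values = line.strip().split("|")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    event = {}
    for name, raw in zip(FIELDS, values):
        if name in INT_FIELDS:
            event[name] = int(raw)
        elif name in FLOAT_FIELDS:
            event[name] = float(raw)
        else:
            event[name] = raw
    return event


line = ("2016-06-02 14:39:56.605|98|27|Mark Lochbihler|803014426|"
        "Wichita to Little Rock Route 2|Normal|38.65|-90.21|5187297736652502631")
event = parse_truck_event(line)
as_json = json.dumps(event)
```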
![Page 87: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/87.jpg)
Implementing Streaming Analytics: Apache Storm
![Page 88: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/88.jpg)
Apache Storm
A platform for doing analysis on streams of data as they come in, so you can react to data as it happens.
• Highly distributed real-time computation system
• Provides general primitives to do real-time computation
• To simplify working with queues & workers
• Scalable and fault-tolerant
Originated at Backtype, acquired by Twitter in 2011
Open sourced late 2011
Part of Apache since September 2013
![Page 89: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/89.jpg)
Apache Storm – Core concepts
Tuple
• Immutable set of key/value pairs
Stream
• An unbounded sequence of tuples that can be processed in parallel by Storm
Topology
• Wires data and functions via a DAG (directed acyclic graph)
• Executes on many machines, similar to a MR job in Hadoop
Spout
• Source of data streams (tuples)
• Can be run in “reliable” and “unreliable” mode
Bolt
• Consumes 1+ streams and produces new streams
• Complex operations often require multiple steps and thus multiple bolts
Diagram: two Spouts (one labelled Source of Stream B) and four Bolts, with tuples (T) flowing between them
• Bolt: Subscribes: A, Emits: C
• Bolt: Subscribes: A, Emits: D
• Bolt: Subscribes: A & B, Emits: -
• Bolt: Subscribes: C & D, Emits: -
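The wiring of streams and bolts can be modelled in-process with a few lines of Python. This is a toy sketch and not the Storm API: the `Topology` class, the bolt names and the sample tuples are hypothetical. It mirrors part of the diagram, with two bolts that subscribe to stream A and emit to C and D, and one bolt that subscribes to C and D and emits nothing.

```python
from collections import defaultdict, deque


class Topology:
    """Tiny in-process model of a topology: named streams wired to bolt functions."""

    def __init__(self):
        self.subscribers = defaultdict(list)   # stream name -> list of (bolt name, fn)

    def add_bolt(self, name, subscribes, fn):
        for stream in subscribes:
            self.subscribers[stream].append((name, fn))

    def run(self, spout_tuples):
        """Process (stream, tuple) pairs emitted by spouts until no tuples remain."""
        pending = deque(spout_tuples)
        while pending:
            stream, tup = pending.popleft()
            for _name, fn in self.subscribers[stream]:
                # A bolt returns the list of (stream, tuple) pairs it emits.
                pending.extend(fn(stream, tup))


seen_by_sink = []

topology = Topology()
topology.add_bolt("to_c", ["A"], lambda s, t: [("C", {**t, "via": "C"})])
topology.add_bolt("to_d", ["A"], lambda s, t: [("D", {**t, "via": "D"})])
# The sink records what it receives and emits nothing (append returns None).
topology.add_bolt("sink", ["C", "D"], lambda s, t: seen_by_sink.append((s, t["id"])) or [])

# Two tuples on stream A and one on stream B (which has no subscriber here).
topology.run([("A", {"id": 1}), ("B", {"id": 2}), ("A", {"id": 3})])
```

A real topology would run spouts and bolts as parallel tasks on many machines; the single queue here only shows how subscriptions route tuples.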
![Page 90: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/90.jpg)
Demo Use Case – Truck Sensors
![Page 91: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/91.jpg)
Apache Storm – How does it work ?
Trucks Movement → Shuffle Grouping → Geo Hashing (parallel instances)
Tuples shown on the slide:
{"timestamp": "2016-06-02 12:56:02.362", "truckId": 35, "driverId": 26, "driverName": "Michael Aube", "routeId": 1090292248, "eventType": "Normal", "latitude": 40.86, "longitude": "-89.91"}
"geohash": "dp206n3d",
![Page 92: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/92.jpg)
Apache Storm – How does it work ?
Trucks Movement → Shuffle Grouping → Geo Hashing (parallel instances) → Fields Grouping → Geo Fencer (parallel instances)
Tuples shown on the slide:
{"timestamp": "2016-06-02 12:56:02.362", "truckId": 35, "driverId": 26, "driverName": "Michael Aube", "routeId": 1090292248, "eventType": "Normal", "latitude": 40.86, "longitude": "-89.91"}
{"geohash": "dp206n3d", "timestamp": "2016-06-02 12:56:02.362", "truckId": 35, "driverId": 26, "driverName": "Michael Aube", "routeId": 1090292248, "eventType": "Normal", "latitude": 40.86, "longitude": "-89.91"}
{"geohash": "f00hfh99", ..
![Page 93: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/93.jpg)
Apache Storm – How does it work ?
Trucks Movement → Shuffle Grouping → Geo Hashing (parallel instances) → Fields Grouping → Geo Fencer (parallel instances) → Global Grouping → Alerter
Tuples shown on the slide:
{"timestamp": "2016-06-02 12:56:02.362", "truckId": 35, "driverId": 26, "driverName": "Michael Aube", "routeId": 1090292248, "eventType": "Normal", "latitude": 40.86, "longitude": "-89.91"}
{"geohash": "dp206n3d", "timestamp": "2016-06-02 12:56:02.362", "truckId": 35, "driverId": 26, "driverName": "Michael Aube", "routeId": 1090292248, "eventType": "Normal", "latitude": 40.86, "longitude": "-89.91"}
{"timestamp": "2016-06-02 12:56:02.362", "truckId": 35, "driverId": 26, "latitude": 40.86, "longitude": "-89.91"}
{"geohash": "f00hfh99", ..
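What the Geo Hashing step computes can be sketched with the standard geohash algorithm: longitude and latitude are bisected alternately, and every five bits become one base-32 character. The code below is a minimal sketch, assuming an 8-character precision because the sample hash on the slide has eight characters; `geohash_encode` is a name chosen for this example and not taken from the demo code.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"   # geohash alphabet (no a, i, l, o)


def geohash_encode(latitude, longitude, precision=8):
    """Encode a coordinate as a geohash string of the given length."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits, bit_count = 0, 0
    use_longitude = True   # bits are interleaved, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, longitude) if use_longitude else (lat_range, latitude)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid   # keep the upper half
        else:
            bits = bits << 1
            rng[1] = mid   # keep the lower half
        use_longitude = not use_longitude
        bit_count += 1
        if bit_count == 5:   # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


# Coordinates of the sample tuple (the slide carries the longitude as a string).
geohash = geohash_encode(40.86, float("-89.91"))
```

For the sample tuple this yields `dp206n3d`, the value shown on the slide. Nearby positions share a common prefix, which is what makes the hash usable as a grouping key for geo-fencing.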
![Page 94: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/94.jpg)
Apache Storm – Core concepts
Each Spout or Bolt runs N instances in parallel
Diagram: Trucks Movement → (Shuffle) → Geo Hashing (1st … nth instance) → (Fields) → Geo Fencing (1st … nth instance) → (Global) → Report
• Shuffle grouping: random distribution of tuples across tasks
• Fields grouping: grouped by value, such that equal values result in the same task
• All grouping: replicates to all tasks
• Global grouping: makes all tuples go to one task
• None grouping: makes the bolt run in the same thread as the bolt/spout it subscribes to
• Direct grouping: the producer (the task that emits) controls which consumer will receive the tuple
• Local or Shuffle grouping: similar to the shuffle grouping, but shuffles tuples among bolt tasks running in the same worker process, if any; otherwise falls back to shuffle grouping behavior
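The routing decision behind some of these groupings can be sketched as plain functions that map a tuple to a task index. This is an illustration under assumptions, not Storm's implementation: the CRC32 hash, the function names and the choice of task 0 for the global grouping are made for this example.

```python
import random
import zlib


def shuffle_grouping(tup, num_tasks, rng):
    """Pick a random task, which spreads the load evenly on average."""
    return rng.randrange(num_tasks)


def fields_grouping(tup, num_tasks, field):
    """Equal field values always map to the same task."""
    key = str(tup[field]).encode("utf-8")
    return zlib.crc32(key) % num_tasks


def global_grouping(tup, num_tasks):
    """All tuples go to a single task (task 0 in this sketch)."""
    return 0


def all_grouping(tup, num_tasks):
    """Every task receives a copy of the tuple."""
    return list(range(num_tasks))


rng = random.Random(42)
tuples = [{"truckId": 35, "geohash": "dp206n3d"},
          {"truckId": 99, "geohash": "f00hfh99"},
          {"truckId": 35, "geohash": "dp206n3d"}]

fields_tasks = [fields_grouping(t, 4, "geohash") for t in tuples]
shuffle_tasks = [shuffle_grouping(t, 4, rng) for t in tuples]
```

The first and third tuple carry the same geohash, so the fields grouping sends them to the same task, which is what a stateful geo-fencing bolt relies on.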
![Page 95: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/95.jpg)
Scalability & Reliability
![Page 96: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/96.jpg)
How to scale a Streaming Analytics System?
Diagram: Event Stream → Collecting Thread 1 … n → Queue (Persist) → Processing Thread 1 … n → result
![Page 97: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/97.jpg)
How to scale a Streaming Analytics System?
Diagram: Event Stream → Collecting Processes (each with collecting threads) → Queue 1 … n (Persist) → Processing Processes → result
![Page 98: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/98.jpg)
How to scale a Streaming Analytics System?
Diagram labels: Event Stream; Collecting Process 1, 2; Processing A and Processing B, each as Process 1, 2 with Thread 1 … n; queues Q1 … Qn between the steps
![Page 99: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/99.jpg)
How to make a Streaming Analytics System reliable?
Faults and stragglers are inevitable in large clusters running big data applications
Streaming applications must recover from them quickly
Diagram labels: Event Stream, Collecting Process, Processing A, Processing B, queues (Q)
![Page 100: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/100.jpg)
How to deal with “Stragglers”
Consumer goes slow
Backpressure / Queue up / Drop data
Other jobs grind to a halt ☹
Run out of memory ☹
Spill to disk: No thanks ☹
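Two of the options named on this slide, dropping data and signalling backpressure, can be sketched with a bounded buffer from the Python standard library. The `offer` helper and its return values are assumptions for illustration and do not reflect how any particular streaming engine implements them.

```python
import queue


def offer(buffer, item, strategy):
    """Try to hand an item to a slow consumer's bounded buffer.

    Returns "accepted", "dropped" or "blocked" (the producer must slow down).
    """
    try:
        buffer.put_nowait(item)
        return "accepted"
    except queue.Full:
        if strategy == "drop":
            return "dropped"    # data loss, but the producer keeps its pace
        if strategy == "backpressure":
            return "blocked"    # the caller is expected to retry later
        raise ValueError(f"unknown strategy: {strategy}")


buffer = queue.Queue(maxsize=2)   # bounded, so memory use stays limited
outcomes_drop = [offer(buffer, n, "drop") for n in range(4)]

buffer.get_nowait()               # the consumer finally processes one item
outcome_after_drain = offer(buffer, 99, "backpressure")
outcome_when_full = offer(buffer, 100, "backpressure")
```

An unbounded queue would accept every item and correspond to the "queue up" option, with the memory risk noted on the slide.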
![Page 101: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/101.jpg)
How to make Streaming Analytics System reliable?
Solution 1: using active/passive system (hot replication)
• Both systems process the full load
• In case of a failure, automatically switch and use the “passive” system
• Stragglers slow down both active and passive system
Legend: State = state in-memory and/or on-disk
Diagram: the Event Stream feeds two identical pipelines (Active and Passive), each with Collecting Process, queues, Processing A, Processing B and its own State
![Page 102: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/102.jpg)
How to make Streaming Analytics System reliable?
Solution 2: Upstream backup
• Nodes buffer sent messages and replay them to a new node in case of failure
• Stragglers are treated as failures
Legend: State = state in-memory and/or on-disk; buffer = buffer for replay, in-memory and/or on-disk
Diagram labels: Event Stream, Collecting Process, Processing A, Processing B, queues (Q), buffers, State
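The upstream-backup idea can be sketched as follows: the sender keeps every message until it is acknowledged and replays the unacknowledged ones to a replacement node. All class names and the crash simulation are hypothetical, and the sketch deliberately ignores how the failed node's own state would be restored.

```python
class UpstreamNode:
    """Keeps every sent message until the downstream node acknowledges it."""

    def __init__(self):
        self.buffer = {}    # message id -> payload, kept for replay
        self.next_id = 0

    def send(self, downstream, payload):
        msg_id = self.next_id
        self.next_id += 1
        self.buffer[msg_id] = payload
        downstream.receive(self, msg_id, payload)

    def ack(self, msg_id):
        self.buffer.pop(msg_id, None)   # safe to forget once processed

    def replay(self, new_downstream):
        # sorted() copies the items, so acks during replay do not disturb the loop.
        for msg_id, payload in sorted(self.buffer.items()):
            new_downstream.receive(self, msg_id, payload)


class ProcessingNode:
    def __init__(self, fail_on=None):
        self.processed = []
        self.fail_on = fail_on   # payload that simulates a crash (illustration only)
        self.alive = True

    def receive(self, upstream, msg_id, payload):
        if not self.alive:
            return
        if payload == self.fail_on:
            self.alive = False   # the node is lost before it can acknowledge
            return
        self.processed.append(payload)
        upstream.ack(msg_id)


upstream = UpstreamNode()
node_a = ProcessingNode(fail_on="e2")
for event in ["e1", "e2", "e3"]:
    upstream.send(node_a, event)

node_b = ProcessingNode()   # replacement node
upstream.replay(node_b)     # only the unacknowledged messages are sent again
```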
![Page 103: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/103.jpg)
Message Delivery Semantics
At most once [0,1]
• Messages may be lost
• Messages are never redelivered
At least once [1 .. n]
• Messages will never be lost
• but messages may be redelivered (might be ok if consumer can handle it)
Exactly once [1]
• Messages are never lost
• Messages are never redelivered
• Perfect message delivery
• Incurs higher latency for transactional semantics
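The remark that at-least-once "might be ok if consumer can handle it" can be illustrated with an idempotent consumer. The sketch below simulates a lost acknowledgement that causes a redelivery; the function and class names are hypothetical, and remembering message ids is only one of several ways to deduplicate.

```python
def deliver_at_least_once(messages, handler, fail_first_attempt_for):
    """Redeliver each message until its acknowledgement arrives."""
    pending_failures = set(fail_first_attempt_for)
    for msg_id, payload in messages:
        while True:
            handler(msg_id, payload)              # the handler runs on every attempt
            if msg_id in pending_failures:        # simulate a lost acknowledgement
                pending_failures.discard(msg_id)
                continue                          # sender retries: duplicate delivery
            break


class IdempotentCounter:
    """Consumer that tolerates redelivery by remembering processed message ids."""

    def __init__(self):
        self.seen = set()
        self.total = 0
        self.deliveries = 0

    def __call__(self, msg_id, amount):
        self.deliveries += 1
        if msg_id in self.seen:
            return            # duplicate: already applied
        self.seen.add(msg_id)
        self.total += amount


counter = IdempotentCounter()
deliver_at_least_once([(1, 10), (2, 20), (3, 30)], counter, fail_first_attempt_for={2})
```

Message 2 is delivered twice, yet the total counts it only once, so the observable result matches what exactly-once delivery would have produced.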
![Page 104: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/104.jpg)
Streaming Analytics in Architecture
![Page 105: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/105.jpg)
“Traditional Architecture” for Big Data
Layers: Data Collection, (Analytical) Data Processing, Result Store
Data Sources: Social, RDBMS, Sensor, ERP, Logfiles, Mobile, Machine
Components: Channel, Stage, Batch compute, Raw Data (Reservoir), Computed Information, Result Store, Query Engine
Data Consumer: Reports, Service, Analytic Tools, Alerting Tools
Legend: Data in Motion, Data at Rest
![Page 106: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/106.jpg)
Streaming Analytics Architecture for Big Data, a.k.a. (Complex) Event Processing
Layers: Data Collection, (Analytical) Real-Time Data Processing, Result Store
Data Sources: Social, Logfiles, Sensor, RDBMS, ERP, Mobile, Machine
Components: Channel, Messaging, Stream/Event Processing, Batch compute, Result Store
Data Consumer: Reports, Service, Analytic Tools, Alerting Tools
Legend: Data in Motion, Data at Rest
![Page 107: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/107.jpg)
Keep raw event data
Layers: Data Collection, (Analytical) Real-Time Data Processing, (Analytical) Batch Data Processing, Result Store
Data Sources: Social, Logfiles, Sensor, RDBMS, ERP, Mobile, Machine
Components: Channel, Messaging, Stream/Event Processing, Batch compute, Raw Data (Reservoir), Result Store
Data Consumer: Reports, Service, Analytic Tools, Alerting Tools
Legend: Data in Motion, Data at Rest
![Page 108: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/108.jpg)
“Lambda Architecture” for Big Data
Layers: Data Collection, (Analytical) Batch Data Processing, (Analytical) Real-Time Data Processing, Result Store
Data Sources: Social, RDBMS, Sensor, ERP, Logfiles, Mobile, Machine
Components: Channel, Messaging, Batch compute, Stream/Event Processing, Raw Data (Reservoir), Computed Information, Result Store, Query Engine
Data Consumer: Reports, Service, Analytic Tools, Alerting Tools
Legend: Data in Motion, Data at Rest
![Page 109: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/109.jpg)
“Kappa Architecture” for Big Data
Layers: Data Collection, “Raw Data Reservoir”, (Analytical) Real-Time Data Processing, Result Store
Data Sources: Social, Logfiles, Sensor, RDBMS, ERP, Mobile, Machine
Components: Messaging, Batch compute, Stream/Event Processing, Raw Data (Reservoir), Computed Information, Result Store
Data Consumer: Reports, Service, Analytic Tools, Alerting Tools
Legend: Data in Motion, Data at Rest
![Page 110: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/110.jpg)
“Unified Architecture” for Big Data
Layers: Data Collection, (Analytical) Batch Data Processing (calculate models of incoming data), (Analytical) Real-Time Data Processing, Result Store
Data Sources: Social, RDBMS, Sensor, ERP, Logfiles, Mobile, Machine
Components: Channel, Messaging, Batch compute, Stream/Event Processing, Raw Data (Reservoir), Computed Information, Prediction Models, Result Store, Query Engine
Data Consumer: Reports, Service, Analytic Tools, Alerting Tools
Legend: Data in Motion, Data at Rest
![Page 111: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/111.jpg)
Summary
![Page 112: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/112.jpg)
Summary
More and more use cases (such as IoT) make Streaming Analytics necessary
Treat events as events! Infrastructures for handling lots of events are available!
Platforms such as Oracle Stream Analytics enable the business to work directly on streaming data (empower the business analyst) => User Experience of an Excel Sheet on streaming data
Platforms such as Apache Storm and Apache Spark Streaming provide a highly scalable and fault-tolerant infrastructure for streaming analytics => Oracle Stream Analytics can use Spark Streaming as the runtime infrastructure
Platforms such as Kafka provide a high volume event broker infrastructure, a.k.a. Event Hub
![Page 113: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/113.jpg)
Comparison

| | Oracle Stream Analytics | Spark Streaming | Apache Storm |
|---|---|---|---|
| Community | n.a. | > 280 contributors | > 100 contributors |
| Language Options | Java, CQL | Java, Scala, Python | Java, Clojure, Scala, … |
| Processing Models | Event-Streaming | Micro-Batching | Event-Streaming |
| Processing DSL | Yes | Yes | No |
| Stateful Ops | Yes | Yes | No |
| Pattern detection | Yes | No | No |
| Scalability & Reliability | limited | yes | yes |
| Distributed RPC | No | No | Yes |
| Delivery Guarantees | At Least Once | Exactly Once | At most once / At least once |
| Latency | sub-second | seconds | sub-second |
| “self-service” for Biz | Yes | No | No |
| Platform | OEP server, Spark Streaming (YARN, Mesos) | YARN, Mesos, Standalone, DataStax EE | Storm Cluster, YARN |
![Page 114: Introduction to Streaming Analytics](https://reader031.fdocuments.in/reader031/viewer/2022021813/5882b6c21a28abd75a8b753b/html5/thumbnails/114.jpg)