![Page 1: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/1.jpg)
Processing Big Data in Motion
Streaming Data Ingestion and Processing
Roger Barga
General Manager, Kinesis Streaming Services, AWS
June 24th, 2016
![Page 2: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/2.jpg)
Riding the Streaming Rapids
2007 & 2008, 2009, 2010, 2011, 2012, 2013, 2015, 2016
Azure Stream Analytics
Complex Event Processing over Streaming Data
Relational Semantics and Implementation
Streaming Map Reduce & Machine Learning over Streams
DEBS Keynote 2013
![Page 3: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/3.jpg)
Interest in and demand for stream data processing is rapidly increasing*…
* Understatement of the year (credit to Kostas Tzoumas)…
![Page 4: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/4.jpg)
Most data is produced continuously
{
  "payerId": "Joe",
  "productCode": "AmazonS3",
  "clientProductCode": "AmazonS3",
  "usageType": "Bandwidth",
  "operation": "PUT",
  "value": "22490",
  "timestamp": "1216674828"
}
Metering Record
127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
Common Log Entry
<165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"][examplePriority@32473 class="high"]
Syslog Entry
“SeattlePublicWater/Kinesis/123/Realtime” –412309129140
MQTT Record
<R,AMZN,T,G,R1>
NASDAQ OMX Record
Smart Buildings
Beacons
Smart Textiles
Health Monitors
![Page 5: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/5.jpg)
Time is money…
Recent data is highly valuable
• If you act on it in time
• Perishable Insights (M. Gualtieri, Forrester)
Old + Recent data is more valuable
• If you have the means to combine them
![Page 6: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/6.jpg)
Most ‘big data’ (Hadoop) jobs process data that was continuously generated
Foundational for business critical workflows
Enable new class of applications & services that process data continuously.
Disruptive
![Page 7: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/7.jpg)
Agenda
• Scalable & Durable Data Ingest
  - A quick word on our motivation
  - Kinesis Streams, through a simple example
• Continuous Stream Data Processing
  - Kinesis Client Library (KCL)
  - How customers are using Kinesis Streams today
• Building on Kinesis Streams
  - Kinesis Firehose
  - Kinesis Analytics
![Page 8: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/8.jpg)
Our Motivation for Continuous Processing
AWS Metering service
• 100s of millions of billing records per second
• Terabytes++ per hour
• Hundreds of thousands of sources
• For each customer: gather all metering records & compute monthly bill
• Auditors guarantee 100% accuracy at month's end
Seems perfectly reasonable to run as a batch, but relentless pressure for realtime…
With a Data Warehouse to load
• 1000s of extract-transform-load (ETL) jobs every day
• Hundreds of thousands of files per load cycle
• Thousands of daily users, hundreds of queries per hour
![Page 9: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/9.jpg)
Our Motivation for Continuous Processing
AWS Metering service
• 100s of millions of billing records per second
• Terabytes++ per hour
• Hundreds of thousands of sources
• For each customer: gather all metering records & compute monthly bill
• Auditors guarantee 100% accuracy at month's end
Other Service Teams, Similar Requirements
• CloudWatch Logs and CloudWatch Metrics
• CloudFront API logging
• ‘Snitch’ internal datacenter hardware metrics
![Page 10: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/10.jpg)
Right Tool for the Job: Enable Streaming Data Ingestion and Processing
Real-time Ingest
• Highly Scalable
• Durable
• Replayable Reads
Continuous Processing
• Support multiple simultaneous data processing applications
• Load-balancing incoming streams, scale out processing
• Fault-tolerance, Checkpoint / Replay
![Page 11: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/11.jpg)
twitter-trends.com
Elastic Beanstalk
twitter-trends.com
Example application: twitter-trends.com website
![Page 12: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/12.jpg)
twitter-trends.com
Too big to handle on one box
![Page 13: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/13.jpg)
twitter-trends.com
The solution: streaming map/reduce
My top-10
My top-10
My top-10
Global top-10
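The streaming map/reduce idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the talk's implementation: the function names and the sample hashtags are hypothetical. It also assumes each hashtag is routed to exactly one worker. Without that assumption, merging truncated local lists only approximates the true global ranking.

```python
from collections import Counter
import heapq


def local_top_n(tags, n=10):
    """Map step: one worker counts the tags it saw and keeps its own top n."""
    return Counter(tags).most_common(n)


def global_top_n(local_lists, n=10):
    """Reduce step: merge the per-worker lists into one global top n."""
    merged = Counter()
    for local in local_lists:
        for tag, count in local:
            merged[tag] += count
    return heapq.nlargest(n, merged.items(), key=lambda item: item[1])


# Hypothetical input: the tags seen by three workers.
worker_inputs = [
    ["#aws", "#aws", "#debs"],
    ["#kinesis", "#kinesis", "#kinesis"],
    ["#sql", "#streams"],
]
local_lists = [local_top_n(tags, n=2) for tags in worker_inputs]
top_two = global_top_n(local_lists, n=2)
print(top_two)
```

Each worker only ever ships a short list, so the reduce step stays cheap no matter how many tweets the workers process.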
![Page 14: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/14.jpg)
twitter-trends.com
Core concepts
My top-10
My top-10
My top-10
Global top-10
Data record
Stream
Partition key
Shard
Worker
Shard: 14 17 18 21 23
Data record
Sequence number
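How a partition key selects a shard can be illustrated with a short sketch. Kinesis hashes the partition key with MD5 into a 128-bit integer, and each shard owns a contiguous range of that hash space. The even split of the range and the helper names below are assumptions made for illustration.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def hash_key(partition_key):
    """Map a partition key to an integer in the 128-bit hash space."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def even_ranges(shard_count):
    """Split the hash space into contiguous, equally sized ranges (assumed layout)."""
    step = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        end = HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * step - 1
        ranges.append((f"shard-{i}", start, end))
    return ranges


def shard_for(partition_key, ranges):
    """Return the shard whose hash range contains the key's hash."""
    value = hash_key(partition_key)
    for shard_id, start, end in ranges:
        if start <= value <= end:
            return shard_id
    raise ValueError("hash value outside every shard range")


ranges = even_ranges(3)
print(shard_for("#kinesis", ranges))
```

Records with the same partition key always land in the same shard, which is what gives per-key ordering by sequence number.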
![Page 15: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/15.jpg)
twitter-trends.com
How this relates to Kinesis
Kinesis Kinesis application
![Page 16: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/16.jpg)
Kinesis Streaming Data Ingestion
• Streams are made of Shards
• Each Shard ingests data up to 1 MB/sec, and up to 1000 TPS
• Producers use a PUT call to store data in a Stream: PutRecord {Data, PartitionKey, StreamName}
• Each Shard emits up to 2 MB/sec
• All data is stored for 24 hours, 7 days if extended retention is ‘ON’
• Scale Kinesis streams by adding or removing Shards
• Replay data from retention period
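The per-shard limits above translate directly into a sizing rule. The sketch below is a rough estimate using only the figures on this slide (1 MB/sec in, 1000 records/sec in, 2 MB/sec out per shard). The function name, the decimal KB-to-MB conversion, and the assumption that every consumer reads the full stream are mine.

```python
import math


def shards_needed(records_per_sec, avg_record_kb, consumers=1):
    """Estimate the shard count from the per-shard limits quoted on the slide."""
    ingest_mb_per_sec = records_per_sec * avg_record_kb / 1000
    by_bytes_in = math.ceil(ingest_mb_per_sec / 1.0)               # 1 MB/sec in
    by_records_in = math.ceil(records_per_sec / 1000)              # 1000 TPS in
    by_bytes_out = math.ceil(ingest_mb_per_sec * consumers / 2.0)  # 2 MB/sec out
    return max(1, by_bytes_in, by_records_in, by_bytes_out)


print(shards_needed(5800, 2))               # write bandwidth is the bottleneck
print(shards_needed(5800, 2, consumers=3))  # read bandwidth is the bottleneck
```

With a single consumer the write limit dominates. Once several applications each read the whole stream, the 2 MB/sec read limit becomes the constraint.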
![Page 17: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/17.jpg)
Real-Time Streaming Data Ingestion
Amazon Web Services: AZ | AZ | AZ
Durable, highly consistent storage replicates data across three data centers (availability zones)
Millions of sources producing 100s of terabytes per hour
Front End: Authentication, Authorization
Ordered stream of events supports multiple readers
Custom-built Streaming Applications (KCL)
• Real-time dashboards and alarms
• Machine learning algorithms or sliding window analytics
• Aggregate and archive to S3
• Aggregate analysis in Hadoop or a data warehouse
Inexpensive: $0.028 per million puts
Inexpensive: $0.014 per 1,000,000 PUT Payload Units
25 – 40ms 100 – 150ms
![Page 18: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/18.jpg)
Kinesis Client Library
![Page 19: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/19.jpg)
twitter-trends.com
Using the Kinesis API directly
KINESIS
// Worker: read one shard and maintain a local top-10
iterator = getShardIterator(shardId, LATEST);
while (true) {
    [records, iterator] = getNextRecords(iterator, maxRecsToReturn);
    process(records);
}

process(records): {
    for (record in records) {
        updateLocalTop10(record);
    }
    if (timeToDoOutput()) {
        writeLocalTop10ToDDB();
    }
}

// Aggregator: merge the local lists into the global top-10
while (true) {
    localTop10Lists = scanDDBTable();
    updateGlobalTop10List(localTop10Lists);
    sleep(10);
}
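The worker loop in the pseudocode can be sketched in Python against a client object that exposes `get_shard_iterator` and `get_records` with the same keyword arguments and response keys as the boto3 Kinesis client. To keep the example runnable without AWS credentials, it uses a hypothetical in-memory `FakeKinesis` stand-in. The stream name, shard id and `max_polls` bound are illustrative assumptions.

```python
from collections import Counter


def consume_shard(client, stream_name, shard_id, handle, max_polls, limit=100):
    """Poll one shard starting at LATEST and pass each batch to handle()."""
    iterator = client.get_shard_iterator(
        StreamName=stream_name, ShardId=shard_id, ShardIteratorType="LATEST"
    )["ShardIterator"]
    for _ in range(max_polls):
        if iterator is None:  # no next iterator: the shard has been closed
            break
        response = client.get_records(ShardIterator=iterator, Limit=limit)
        handle(response["Records"])
        iterator = response.get("NextShardIterator")


class FakeKinesis:
    """Hypothetical in-memory stand-in that serves pre-built batches."""

    def __init__(self, batches):
        self._batches = batches

    def get_shard_iterator(self, StreamName, ShardId, ShardIteratorType):
        return {"ShardIterator": "0"}

    def get_records(self, ShardIterator, Limit):
        index = int(ShardIterator)
        records = [{"Data": tag.encode("utf-8")} for tag in self._batches[index]]
        response = {"Records": records[:Limit]}
        if index + 1 < len(self._batches):
            response["NextShardIterator"] = str(index + 1)
        return response


local_counts = Counter()


def update_local_top10(records):
    """Count one occurrence per record; the payload is the hashtag itself."""
    for record in records:
        local_counts[record["Data"].decode("utf-8")] += 1


fake = FakeKinesis([["#aws", "#debs"], ["#aws"]])
consume_shard(fake, "tweets", "shard-0", update_local_top10, max_polls=5)
print(local_counts.most_common(10))
```

A production loop would also pause between polls and persist its position. Those are among the chores the Kinesis Client Library takes over.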
![Page 20: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/20.jpg)
KINESIS
twitter-trends.com
Challenges with using the Kinesis API directly
Kinesis application
Manual creation of workers and assignment to shards
How many workers per EC2 instance?
How many EC2 instances?
![Page 21: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/21.jpg)
KINESIS
twitter-trends.com
Using the Kinesis Client Library
Kinesis application
Shard mgmt table
![Page 22: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/22.jpg)
KINESIS
twitter-trends.com
Elasticity and Load Balancing
Shard mgmt table
Auto scaling Group
![Page 23: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/23.jpg)
KINESIS
twitter-trends.com
Fault Tolerance Support
Shard mgmt table
X Availability Zone 1
Availability Zone 3
![Page 24: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/24.jpg)
Worker Fail Over
Amazon.com Confidential
Shards: Shard-0, Shard-1, Shard-2
Workers: Worker1, Worker2, Worker3
LeaseKey  LeaseOwner  LeaseCounter
Shard-0   Worker1     85
Shard-1   Worker2     94
Shard-2   Worker3     76
![Page 25: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/25.jpg)
Worker Fail Over
Shards: Shard-0, Shard-1, Shard-2
Workers: Worker1, Worker2, Worker3
LeaseKey  LeaseOwner  LeaseCounter
Shard-0   Worker1     85 → 86
Shard-1   Worker2     94
Shard-2   Worker3     76 → 77
X (failed worker)
![Page 26: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/26.jpg)
Worker Fail Over
Shards: Shard-0, Shard-1, Shard-2
Workers: Worker1, Worker2, Worker3
LeaseKey  LeaseOwner  LeaseCounter
Shard-0   Worker1     85 → 86 → 87
Shard-1   Worker2     94
Shard-2   Worker3     76 → 77 → 78
X (failed worker)
![Page 27: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/27.jpg)
Worker Fail Over
Shards: Shard-0, Shard-1, Shard-2
Workers: Worker1, Worker2, Worker3
LeaseKey  LeaseOwner  LeaseCounter
Shard-0   Worker1     85 → 86 → 87 → 88
Shard-1   Worker3     94 → 95
Shard-2   Worker3     76 → 77 → 78 → 79
X (failed worker)
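The fail-over sequence in these lease tables can be replayed with a small simulation. It is a sketch under stated assumptions: live workers renew a lease by incrementing its counter, and another worker takes a lease over once it has seen the counter stand still for a number of consecutive checks. Counting checks rather than elapsed time, the threshold of two, and all names are simplifications of mine, not the KCL's actual code.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    owner: str
    counter: int


def heartbeat(leases, worker):
    """A live worker renews every lease it owns by bumping the counter."""
    for lease in leases.values():
        if lease.owner == worker:
            lease.counter += 1


def take_stale_leases(leases, worker, seen, max_stale=2):
    """Take over leases whose counter did not move for max_stale checks.

    seen maps lease key -> (last counter observed, consecutive stale checks).
    """
    taken = []
    for key, lease in leases.items():
        if lease.owner == worker:
            continue
        last_counter, stale = seen.get(key, (None, 0))
        stale = stale + 1 if last_counter == lease.counter else 0
        seen[key] = (lease.counter, stale)
        if stale >= max_stale:
            lease.owner = worker
            lease.counter += 1  # the new owner renews immediately
            seen.pop(key)
            taken.append(key)
    return taken


leases = {
    "Shard-0": Lease("Worker1", 85),
    "Shard-1": Lease("Worker2", 94),
    "Shard-2": Lease("Worker3", 76),
}
seen_by_worker3 = {}
for _ in range(3):  # Worker2 has failed and never heartbeats again
    for live_worker in ("Worker1", "Worker3"):
        heartbeat(leases, live_worker)
    take_stale_leases(leases, "Worker3", seen_by_worker3)

for key, lease in leases.items():
    print(key, lease.owner, lease.counter)
```

After three rounds the state matches the last fail-over table: Shard-1 has moved to Worker3, with no coordinator beyond the shared lease table.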
![Page 28: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/28.jpg)
Worker Load Balancing
Shards: Shard-0, Shard-1, Shard-2
Workers: Worker1, Worker2, Worker3, Worker4
LeaseKey  LeaseOwner  LeaseCounter
Shard-0   Worker1     88
Shard-1   Worker3     96
Shard-2   Worker3     78
X (failed worker)
![Page 29: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/29.jpg)
Worker Load Balancing
Shards: Shard-0, Shard-1, Shard-2
Workers: Worker1, Worker2, Worker3, Worker4
LeaseKey  LeaseOwner  LeaseCounter
Shard-0   Worker1     88
Shard-1   Worker3     96
Shard-2   Worker4     79
X (failed worker)
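The load-balancing step, where the newly started Worker4 picks up Shard-2 from Worker3, can be sketched as a fair-share rule. The target of `ceil(leases / live workers)` and the choice of which lease to take (the highest key held by the busiest worker) are assumptions for illustration. The slides do not state the exact policy.

```python
import math
from collections import Counter


def balance_new_worker(leases, live_workers, new_worker):
    """Move leases from the busiest worker to new_worker until it has its share.

    leases maps shard id -> owner and is updated in place.
    Returns the list of shards that changed hands.
    """
    target = math.ceil(len(leases) / len(live_workers))
    others = [worker for worker in live_workers if worker != new_worker]
    moved = []
    while others:
        load = Counter(leases.values())
        busiest = max(others, key=lambda worker: load[worker])
        # Stop once the new worker has its share or a move would not help.
        if load[new_worker] >= target or load[busiest] <= load[new_worker] + 1:
            break
        shard = max(key for key, owner in leases.items() if owner == busiest)
        leases[shard] = new_worker
        moved.append(shard)
    return moved


leases = {"Shard-0": "Worker1", "Shard-1": "Worker3", "Shard-2": "Worker3"}
moved = balance_new_worker(leases, ["Worker1", "Worker3", "Worker4"], "Worker4")
print(moved, leases)
```

Worker2 is left out of `live_workers` because it has failed. With three shards and three live workers the fair share is one lease each, so a single lease moves.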
![Page 30: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/30.jpg)
Resharding
Shards: Shard-0
Workers: Worker1, Worker2
LeaseKey  LeaseOwner  LeaseCounter  checkpoint
Shard-0   Worker1     90            SHARD_END
Shard-0 → Shard-1, Shard-2
![Page 31: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/31.jpg)
Resharding
Shards: Shard-0, Shard-1, Shard-2
Workers: Worker1, Worker2
LeaseKey  LeaseOwner  LeaseCounter  checkpoint
Shard-0   Worker1     90            SHARD_END
Shard-1   (none)      0             TRIM_HORIZON
Shard-2   (none)      0             TRIM_HORIZON
Shard-0 → Shard-1, Shard-2
![Page 32: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/32.jpg)
Resharding
Shards: Shard-0, Shard-1, Shard-2
Workers: Worker1, Worker2
LeaseKey  LeaseOwner  LeaseCounter  checkpoint
Shard-0   Worker1     90            SHARD_END
Shard-1   Worker1     2             TRIM_HORIZON
Shard-2   Worker2     3             TRIM_HORIZON
Shard-0 → Shard-1, Shard-2
![Page 33: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/33.jpg)
Resharding
Shards: Shard-1, Shard-2
Workers: Worker1, Worker2
LeaseKey  LeaseOwner  LeaseCounter  checkpoint
Shard-1   Worker1     2             TRIM_HORIZON
Shard-2   Worker2     3             TRIM_HORIZON
Shard-0 → Shard-1, Shard-2
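The ordering rule implied by the resharding tables is that child shards start from TRIM_HORIZON only after the parent's checkpoint reaches SHARD_END. It can be sketched as a readiness check. The `parent_of` mapping, the function name and the placeholder checkpoint value `seq-41` are hypothetical. A parent whose lease has already been deleted is treated as finished.

```python
SHARD_END = "SHARD_END"
TRIM_HORIZON = "TRIM_HORIZON"


def ready_shards(checkpoints, parent_of):
    """Return shards that may be processed now.

    A shard is ready if it is not finished and its parent (if any) is either
    checkpointed at SHARD_END or no longer present in the lease table.
    """
    ready = []
    for shard, checkpoint in checkpoints.items():
        if checkpoint == SHARD_END:
            continue
        parent = parent_of.get(shard)
        parent_done = parent is None or checkpoints.get(parent, SHARD_END) == SHARD_END
        if parent_done:
            ready.append(shard)
    return ready


parent_of = {"Shard-1": "Shard-0", "Shard-2": "Shard-0"}

# Parent still being read ("seq-41" is a placeholder checkpoint value).
during = {"Shard-0": "seq-41", "Shard-1": TRIM_HORIZON, "Shard-2": TRIM_HORIZON}
# Parent fully consumed.
after = {"Shard-0": SHARD_END, "Shard-1": TRIM_HORIZON, "Shard-2": TRIM_HORIZON}
# Parent lease removed from the table.
cleaned = {"Shard-1": TRIM_HORIZON, "Shard-2": TRIM_HORIZON}

print(ready_shards(during, parent_of))
print(ready_shards(after, parent_of))
print(ready_shards(cleaned, parent_of))
```

Finishing the parent first keeps records for a given partition key in order across the split.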
![Page 34: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/34.jpg)
500MM tweets/day = ~ 5,800 tweets/sec
At 2 KB/tweet, that is ~12 MB/sec (~1 TB/day)
$0.015/hour per shard, $0.014/million PUTS
Kinesis cost is $0.47/hour
Redshift cost is $0.850/hour (for a 2TB node)
Total: $1.32/hour
Cost & Scale
Putting this into production
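The figures on this slide can be re-derived step by step. The sketch below uses only the numbers quoted above. The variable names, the 1 MB/sec per-shard ingest limit carried over from the ingestion slide, and the decimal unit conversion are assumptions.

```python
import math

tweets_per_day = 500_000_000
tweet_kb = 2

tweets_per_sec = tweets_per_day / 86_400        # about 5,800 tweets/sec
mb_per_sec = tweets_per_sec * tweet_kb / 1000   # about 12 MB/sec
tb_per_day = tweets_per_day * tweet_kb / 1e9    # about 1 TB/day

shards = math.ceil(mb_per_sec / 1.0)            # 1 MB/sec ingest per shard
shard_cost = shards * 0.015                     # $0.015/hour per shard
put_cost = tweets_per_sec * 3600 / 1e6 * 0.014  # $0.014 per million PUTs

kinesis_cost = shard_cost + put_cost
total_cost = kinesis_cost + 0.850               # plus one Redshift node

print(f"{tweets_per_sec:,.0f} tweets/sec, {mb_per_sec:.1f} MB/sec, {tb_per_day:.0f} TB/day")
print(f"{shards} shards: ${shard_cost:.2f}/hour + PUTs ${put_cost:.2f}/hour")
print(f"Kinesis ${kinesis_cost:.2f}/hour, total ${total_cost:.2f}/hour")
```

Roughly 60% of the Kinesis charge comes from PUT requests rather than shard hours, so batching several tweets into one record would be the first cost lever to look at.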
![Page 35: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/35.jpg)
Design Challenge(s)
• Dynamic Resharding & Scale Out
• Enforcing Quotas (think proxy fleet with 1000s of servers)
• Distributed Denial of Service Attack (unintentional)
• Dynamic Load Balancing on Storage Servers
• Heterogeneous Workloads (tip of stream vs 7 day)
• Optimizing Fleet Utilization (proxy, control, data planes)
• Avoid Scaling Cliffs
• …
![Page 36: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/36.jpg)
Sushiro: Kaiten Sushi Restaurants
380 stores stream data from sushi plate sensors to Kinesis
![Page 37: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/37.jpg)
![Page 38: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/38.jpg)
Real-Time Streaming Data with Kinesis Streams
5 billion events/wk from connected devices | IoT
17 PB of game data per season | Entertainment
100 billion ad impressions/day, 30 ms response time | Ad Tech
100 GB/day click streams, 250+ sites | Enterprise
50 billion ad impressions/day, sub-50 ms responses | Ad Tech
17 million events/day | Technology
1 billion transactions per day | Bitcoin
1 TB+/day game data analyzed in real-time | Gaming
![Page 39: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/39.jpg)
Streams provide a foundational abstraction on which to build higher level services
![Page 40: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/40.jpg)
Amazon Kinesis Firehose
Load massive volumes of streaming data into Amazon S3, Redshift and Elasticsearch
Zero administration: Capture and deliver streaming data into Amazon S3, Amazon Redshift, and other destinations without writing an application or managing infrastructure.
Direct-to-data store integration: Batch, compress, and encrypt streaming data for delivery into data destinations in as little as 60 seconds using simple configurations.
Seamless elasticity: Scales to match data throughput without intervention.
Capture and submit streaming data
Analyze streaming data using your favorite BI tools
Firehose loads streaming data continuously into Amazon S3, Redshift and Elasticsearch
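The "batch, compress" step can be pictured as a buffer that flushes on whichever comes first: a size threshold or a maximum age. The class below is an illustrative sketch, not Firehose's implementation. The class name, thresholds and `deliver` callback are hypothetical, encryption is left out, and the age check only runs when a record arrives.

```python
import gzip


class DeliveryBuffer:
    """Collect records and hand them to deliver() as one gzip-compressed batch."""

    def __init__(self, max_bytes, max_seconds, deliver):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.deliver = deliver
        self._records = []
        self._size = 0
        self._opened_at = None

    def add(self, record, now):
        """Buffer one record; flush if the batch is large enough or old enough."""
        if not self._records:
            self._opened_at = now
        self._records.append(record)
        self._size += len(record)
        too_big = self._size >= self.max_bytes
        too_old = now - self._opened_at >= self.max_seconds
        if too_big or too_old:
            self.flush()

    def flush(self):
        """Compress and deliver whatever is buffered, then start a new batch."""
        if self._records:
            self.deliver(gzip.compress(b"".join(self._records)))
            self._records = []
            self._size = 0


delivered = []
buffer = DeliveryBuffer(max_bytes=10, max_seconds=60, deliver=delivered.append)
buffer.add(b"aaaa\n", now=0)
buffer.add(b"bbbb\n", now=1)    # reaches 10 bytes: flushed on size
buffer.add(b"c\n", now=100)
buffer.add(b"d\n", now=161)     # batch is 61 seconds old: flushed on age
print([gzip.decompress(batch) for batch in delivered])
```

Passing `now` explicitly keeps the sketch deterministic. A real delivery path would run the age check on a timer so that a quiet stream still flushes.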
![Page 41: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/41.jpg)
Amazon Kinesis Firehose: Fully Managed Service for Delivering Data Streams into AWS Destinations
Data Sources → AWS Endpoint → [Batch, Compress, Encrypt] → S3, Redshift
No Partition Keys | No Provisioning | End to End Elastic
![Page 42: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/42.jpg)
Amazon Kinesis Analytics
Analyze data streams continuously with standard SQL
• Apply SQL on streams: Easily connect to a Kinesis Stream or Firehose Delivery Stream and apply ANSI standard SQL.
• Build real-time applications: Perform continual processing on streaming data with sub-second processing latencies
• Easy Scalability: Elastically scales to match data throughput
Connect to Kinesis streams, Firehose delivery streams
Run standard SQL queries against data streams
Kinesis Analytics can send processed data to analytics tools so you can create alerts and respond in real-time
![Page 43: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/43.jpg)
Realtime Analytics Patterns
• Simple counting (e.g. failure count)
• Counting with Windows (e.g. failure count every hour)
• Preprocessing: filtering, transformations (e.g. data cleanup)
• Alerts, thresholds (e.g. alarm on high temperature)
• Data Correlation, Detect missing events, detecting erroneous data (e.g. detecting failed sensors)
• Joining event streams (e.g. detect a hit on soccer ball)
• Merge with data in database, collect, update data conditionally
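The "counting with windows" pattern (failure count every hour) can be sketched as a tumbling-window aggregate. The event shape `(timestamp_seconds, status)`, the `"FAIL"` marker and the function name are assumptions for illustration.

```python
from collections import Counter


def failures_per_window(events, window_seconds=3600):
    """Count FAIL events per tumbling window, keyed by window start time."""
    counts = Counter()
    for timestamp, status in events:
        if status == "FAIL":
            window_start = timestamp - timestamp % window_seconds
            counts[window_start] += 1
    return dict(sorted(counts.items()))


events = [
    (0, "FAIL"),
    (10, "OK"),
    (3599, "FAIL"),
    (3600, "FAIL"),
    (7300, "OK"),
]
hourly = failures_per_window(events)
print(hourly)
```

Tumbling windows do not overlap, so every event contributes to exactly one count. A sliding window would need to track per-event expiry instead.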
![Page 44: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/44.jpg)
Realtime Analytics Patterns (contd.)
• Detecting Event Sequence Patterns (e.g. small transaction followed by large transaction)
• Tracking - follow some related entity’s state in space, time etc. (e.g. location of airline baggage, vehicle, customer by beacon)
• Detect trends – Rise, turn, fall, outliers, complex trends like triple bottom etc., (e.g. algorithmic trading, SLA, load balancing).
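The event-sequence pattern from the first bullet (small transaction followed by large transaction) can be sketched as a per-account state machine. The thresholds, the time limit and the tuple layout `(timestamp, account, amount)` are hypothetical values chosen for the example.

```python
def small_then_large(transactions, small=1.0, large=1000.0, within=300):
    """Flag accounts where a small transaction is followed by a large one.

    transactions: iterable of (timestamp_seconds, account, amount) in time order.
    Returns a list of (account, small_timestamp, large_timestamp).
    """
    last_small = {}
    alerts = []
    for timestamp, account, amount in transactions:
        if amount <= small:
            last_small[account] = timestamp
        elif amount >= large and account in last_small:
            if timestamp - last_small[account] <= within:
                alerts.append((account, last_small[account], timestamp))
            del last_small[account]  # the pending probe is consumed either way
    return alerts


transactions = [
    (0, "a", 0.50),
    (60, "a", 2500.00),   # small then large within 5 minutes: flagged
    (70, "b", 3000.00),   # large with no preceding small one: ignored
    (100, "b", 0.20),
    (1000, "b", 5000.00), # follows a small one, but too late: ignored
]
alerts = small_then_large(transactions)
print(alerts)
```

Only one timestamp per account is kept, so memory grows with the number of active accounts, not with the length of the stream.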
![Page 45: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/45.jpg)
Amazon Kinesis: Streaming data made easy
Services make it easy to capture, deliver and process streams on AWS
Kinesis Firehose
For all developers, data scientists, IT professionals
Transform and load streaming data into S3, Redshift, Elasticsearch, and more…
Kinesis Analytics
For all developers, analysts and data scientists
Easily analyze streaming data using standard SQL
Kinesis Streams
For Technical Developers
Build your own custom application to process or analyze streaming data
![Page 46: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/46.jpg)
Stream Processing End2End
Kinesis Analytics
Transform
• Extract Fields
• Clean
• Enrich
Analyze
• Filter
• Temporal joins
• Combine w/ reference data
• Projections
• Correlate
• Windowed Aggregates
• Anomaly Detection
• …
Durable ingest, repeatable processing → In stream processing → low latency delivery to persist, alert, visualize
![Page 47: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/47.jpg)
IoT Sensors (Example: Hello Inc.)
Sleep monitoring devices send data like bedroom temperature, humidity, ambient light, noise level, and particulate count
Kinesis Streams reliably collects, stores, and exposes sensor data for processing
Firehose loads data into S3 and Redshift for data science and durable storage
Consumers get better sleep by monitoring and adjusting their sleeping conditions
DynamoDB enriches, aggregates, and transforms data for real-time per-user analyses
![Page 48: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/48.jpg)
Customer Clickstream
![Page 49: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/49.jpg)
Closing Thoughts
• Streaming data is highly prevalent and relevant;
• Stream data processing is on the rise;
• A key part of business critical workflows today, a powerful abstraction for building a new class of applications & data intensive services tomorrow.
• A rich area for distributed systems, programming model, IoT, and new service(s) research.
![Page 50: Processing Big Data in Motiondebs2016/BargaDEBS16.pdf · Amazon Kinesis Analytics Analyze data streams continuously with standard SQL • Apply SQL on streams: Easily connect to a](https://reader030.fdocuments.in/reader030/viewer/2022040608/5ec45e91a278ab1da5430c10/html5/thumbnails/50.jpg)
Questions