Amazon Kinesis
Transcript of Amazon Kinesis
![Page 1: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/1.jpg)
Deep Dive – Amazon Kinesis
Guy Ernest, Solution Architect - Amazon Web Services
@guyernest
![Page 2: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/2.jpg)
Motivation for Real Time Analytics
E*BI
RTML
![Page 3: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/3.jpg)
Analytics
Amazon Kinesis
Managed Service for Real Time Big Data Processing
Create Streams to Produce & Consume Data
Elastically Add and Remove Shards for Performance
Use Kinesis Worker Library, AWS Lambda, Apache Spark and Apache Storm to Process Data
Integration with S3, Redshift and DynamoDB
Compute Storage
AWS Global Infrastructure
Database
App Services
Deployment & Administration
Networking
Analytics
![Page 4: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/4.jpg)
Data Sources
App.4
[Machine Learning]
AWS Endpoint
App.1
[Aggregate & De-Duplicate]
Data Sources
Data Sources
Data Sources
App.2
[Metric Extraction]
S3
DynamoDB
Redshift
App.3
[Sliding Window Analysis]
Data Sources
Shard 1
Shard 2
Shard N
Availability Zone
Availability Zone
Amazon Kinesis Dataflow
Availability Zone
![Page 5: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/5.jpg)
Billing Auditors
Incremental Bill Computation
Metering Archive
Billing Management Service
Example Architecture - Metering
![Page 6: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/6.jpg)
Streams
Named Event Streams of Data
All data is stored for 24 hours
Shards
You scale Kinesis streams by adding or removing Shards
Each Shard ingests up to 1MB/sec of data and up to 1000 TPS
Partition Key
Identifier used for Ordered Delivery & Partitioning of Data across Shards
Sequence
Number of an event as assigned by Kinesis
Amazon Kinesis Components
![Page 7: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/7.jpg)
Getting Data In
![Page 8: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/8.jpg)
Producers use a PUT call to store data in a Stream
A Partition Key is used to distribute the PUTs across Shards
A unique Sequence # is returned to the Producer for each Event
Data can be ingested at 1MB/second or 1000 Transactions/second per Shard
1MB / Event
Kinesis - Ingesting Fast Moving Data
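The per-shard limits above imply a simple sizing rule: a stream needs enough shards to satisfy both the byte rate and the record rate. A minimal sketch of that arithmetic follows. It assumes 1MB means 1,048,576 bytes (the slide does not say), and `shards_needed` is a hypothetical helper name, not part of any AWS SDK.

```python
import math

# Per-shard ingest limits quoted on the slide.
SHARD_BYTES_PER_SEC = 1024 * 1024   # assumption: 1MB = 1,048,576 bytes
SHARD_RECORDS_PER_SEC = 1000


def shards_needed(records_per_sec: float, avg_record_bytes: float) -> int:
    """Smallest shard count that satisfies both ingest limits."""
    by_bytes = math.ceil(records_per_sec * avg_record_bytes / SHARD_BYTES_PER_SEC)
    by_records = math.ceil(records_per_sec / SHARD_RECORDS_PER_SEC)
    return max(1, by_bytes, by_records)


if __name__ == "__main__":
    # Many small records: the record-rate limit decides.
    print(shards_needed(5_000, 100))
    # Fewer, larger records: the byte-rate limit decides.
    print(shards_needed(200, 20_000))
```

Whichever limit is reached first sets the shard count, which is why small events tend to be record-bound and large events byte-bound.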
![Page 9: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/9.jpg)
Native Code Module to perform efficient writes to Multiple Kinesis Streams
C++/Boost
Asynchronous Execution
Configurable Aggregation of Events
Introducing the Kinesis Producer Library
My Application KPL Daemon
PutRecord(s)
Kinesis Stream
Kinesis Stream
Kinesis Stream
Kinesis Stream
Async
![Page 10: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/10.jpg)
KPL Aggregation
My Application KPL Daemon
PutRecord(s)
Kinesis Stream
Kinesis Stream
Kinesis Stream
Kinesis Stream
Async
1MB Max Event Size
Aggregate: events of 100k, 20k, 500k, 200k, 40k, 20k and 40k are packed together between a Protobuf Header and a Protobuf Footer
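The aggregation idea can be illustrated with a greedy packer that groups small events, in arrival order, into batches no larger than the 1MB limit. This is only a sketch of the concept: it is not the KPL's actual algorithm or wire format, it ignores the Protobuf header and footer overhead, and it assumes "k" on the slide means 1,000 bytes.

```python
from typing import Iterable, List

MAX_AGGREGATE_BYTES = 1024 * 1024  # assumption: the 1MB limit is 1,048,576 bytes


def aggregate(sizes: Iterable[int], limit: int = MAX_AGGREGATE_BYTES) -> List[List[int]]:
    """Greedily pack event sizes, in arrival order, into batches of at most `limit` bytes."""
    batches: List[List[int]] = []
    current: List[int] = []
    used = 0
    for size in sizes:
        if size > limit:
            raise ValueError(f"event of {size} bytes exceeds the {limit}-byte limit")
        if current and used + size > limit:
            # The next event would overflow the batch, so close it and start a new one.
            batches.append(current)
            current, used = [], 0
        current.append(size)
        used += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    slide_events = [100_000, 20_000, 500_000, 200_000, 40_000, 20_000, 40_000]
    for batch in aggregate(slide_events * 2):
        print(len(batch), "events,", sum(batch), "bytes")
```

Each batch would then be sent as a single record, so the per-shard record-rate limit applies to batches rather than to individual events.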
![Page 11: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/11.jpg)
Apache Flume
Source & Sink
https://github.com/pdeyhim/flume-kinesis
Fluentd
Dynamic Partitioning Support
https://github.com/awslabs/aws-fluent-plugin-kinesis
Log4J & Log4Net
Included in Kinesis Samples
Kinesis Ecosystem - Ingest
Kinesis
![Page 12: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/12.jpg)
Best Practices for Partition Key
• A random partition key will give even distribution
• If events should be processed together, choose a relevant high cardinality partition key and monitor shard distribution
• If partial order is important use sequence number
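To see why cardinality matters, it helps to model how a partition key selects a shard. Kinesis hashes the key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value. The sketch below assumes the shards split the hash space evenly, which may not hold after resharding; `shard_for_key` and `distribution` are hypothetical helper names.

```python
import hashlib
import uuid
from collections import Counter

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Index of the shard whose (assumed evenly split) hash range holds the key."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    return hash_value * shard_count // HASH_SPACE


def distribution(keys, shard_count: int) -> Counter:
    """Count how many keys land on each shard."""
    return Counter(shard_for_key(key, shard_count) for key in keys)


if __name__ == "__main__":
    random_keys = [str(uuid.uuid4()) for _ in range(10_000)]
    print("random keys:", sorted(distribution(random_keys, 4).items()))

    # Only three distinct keys: at most three shards can ever receive data.
    skewed_keys = ["eu", "us", "ap"] * 3_000
    print("low-cardinality keys:", sorted(distribution(skewed_keys, 8).items()))
```

A key with few distinct values leaves shards idle no matter how many are provisioned, which is the reason to monitor shard distribution when a business key is used.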
![Page 13: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/13.jpg)
Getting Data Out
![Page 14: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/14.jpg)
KCL Libraries available for Java, Ruby, Node, Go, and a Multi-Lang Implementation with Native Python support
All State Management in DynamoDB
Kinesis Client Library
DynamoDB
![Page 15: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/15.jpg)
Client library for fault-tolerant, at-least-once, real-time processing
Kinesis Client Library (KCL) simplifies reading from the stream by abstracting your code from individual shards
Automatically starts a Worker Thread for each Shard
Increases and decreases Thread count as number of Shards changes
Uses checkpoints to keep track of a Thread’s location in the stream
Restarts Threads & Workers if they fail
Consuming Data - Kinesis Enabled Applications
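Checkpointing is what makes the processing at-least-once: a restarted worker resumes from the last checkpoint, so records handled after that checkpoint are seen again. The simulation below illustrates the behaviour with an in-memory stand-in for the checkpoint table. It does not use the KCL API; `CheckpointStore` and `run_worker` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional


class CheckpointStore:
    """In-memory stand-in for the table that holds per-shard checkpoints."""

    def __init__(self) -> None:
        self._positions: Dict[str, int] = {}

    def get(self, shard_id: str) -> int:
        return self._positions.get(shard_id, 0)

    def put(self, shard_id: str, position: int) -> None:
        self._positions[shard_id] = position


def run_worker(
    shard_id: str,
    records: List[int],
    store: CheckpointStore,
    handle: Callable[[int], None],
    checkpoint_every: int = 3,
    fail_at: Optional[int] = None,
) -> None:
    """Process records from the last checkpoint; optionally crash before index `fail_at`."""
    position = store.get(shard_id)
    while position < len(records):
        if fail_at is not None and position == fail_at:
            raise RuntimeError("simulated worker crash")
        handle(records[position])
        position += 1
        if position % checkpoint_every == 0:
            store.put(shard_id, position)
    store.put(shard_id, position)


if __name__ == "__main__":
    store, seen = CheckpointStore(), []
    records = list(range(8))
    try:
        run_worker("shard-1", records, store, seen.append, fail_at=5)
    except RuntimeError:
        pass  # a replacement worker takes over below
    run_worker("shard-1", records, store, seen.append)
    print(seen)  # records handled after the last checkpoint appear twice
```

Because replays are expected, the record handler should tolerate seeing the same record more than once.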
![Page 16: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/16.jpg)
Analytics Tooling Integration (github.com/awslabs/amazon-kinesis-connectors)
S3
Batch Write Files for Archive into S3
Sequence Based File Naming
Redshift
Once Written to S3, Load to Redshift
Manifest Support
User Defined Transformers
DynamoDB
BatchPut Append to Table
User Defined Transformers
ElasticSearch
Automatically index Stream Contents
Kinesis Connectors
Kinesis -> S3, DynamoDB, Redshift, ElasticSearch
![Page 17: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/17.jpg)
Connectors Architecture
![Page 18: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/18.jpg)
Apache Storm
Kinesis Spout
Automatic Checkpointing with Zookeeper
https://github.com/awslabs/kinesis-storm-spout
Kinesis Ecosystem - Storm
Storm
Kinesis
![Page 19: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/19.jpg)
Apache Spark
DStream Receiver runs KCL
One DStream per Shard
Checkpointed via KCL
Spark Natively Available on EMR
EMRFS overlay on HDFS
AMI 3.8.0
https://aws.amazon.com/elasticmapreduce/details/spark
Kinesis Ecosystem - Spark
![Page 20: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/20.jpg)
Distributed Event Processing Platform
Stateless JavaScript & Java functions run against an Event Stream
AWS SDK Built In
Configure RAM and Execution Timeout
Functions automatically invoked against a Shard
Community libraries for Python & Go
Access to underlying filesystem for read/write
Call other Lambda Functions
Consuming Data - AWS Lambda
Kinesis
Shard 1
Shard 2
Shard 3
Shard 4
Shard n
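A stateless function invoked against a shard receives a batch of records whose payloads are base64-encoded under `Records[].kinesis.data`. The handler below is a minimal sketch, written in Python to match the other examples here rather than the JavaScript or Java named on the slide. The JSON payload with a `type` field is an assumption made purely for illustration.

```python
import base64
import json


def handler(event, context):
    """Decode each Kinesis record in the batch and count events by type."""
    counts = {}
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        body = json.loads(payload)
        kind = body.get("type", "unknown")  # hypothetical payload field
        counts[kind] = counts.get(kind, 0) + 1
    return counts


def make_record(body: dict) -> dict:
    """Build a record shaped like a Kinesis event entry, for local testing only."""
    data = base64.b64encode(json.dumps(body).encode("utf-8")).decode("ascii")
    return {"kinesis": {"data": data, "partitionKey": "demo"}}


if __name__ == "__main__":
    sample = {"Records": [make_record({"type": "click"}), make_record({"type": "view"})]}
    print(handler(sample, None))
```

The function keeps nothing between invocations, so any aggregate that must outlive a batch belongs in an external store.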
![Page 21: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/21.jpg)
Why Kinesis? Durability
Regional Service
Synchronous Writes to Multiple AZs
Extremely High Durability
?
May be in-memory for Performance
Requirement to understand Disk Sync Semantics
User Managed Replication
Replication Lag -> RPO
![Page 22: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/22.jpg)
Why Kinesis? Performance
Perform continual processing on streaming big data. Processing latencies fall to under 1 second, compared with the minutes or hours associated with batch processing
?
Processing latencies < 1 second
Based on CPU & Disk Performance
Cluster Interruption -> Processing Outage
![Page 23: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/23.jpg)
Why Kinesis? Availability
Regional Service
Synchronous Writes to Multiple AZs
Extremely High Durability
AZ, Networking, & Chain Server Issues Transparent to Producers & Consumers
?
Many Depend on a CP Database
Lost Quorum can result in failure/inconsistency of the cluster
Highest Availability is determined by Availability of Cross-AZ Links or Availability of an AZ
![Page 24: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/24.jpg)
Why Kinesis? Operations
Managed service for real-time streaming data collection, processing and analysis. Simply create a new stream, set the desired level of capacity, and let the service handle the rest
?
Build Instances
Install Software
Operate Cluster
Manage Disk Space
Manage Replication
Migrate to new Stream on Scale Up
![Page 25: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/25.jpg)
Why Kinesis? Elasticity
Seamlessly scale to match your data throughput rate and volume. You can easily scale up to gigabytes per second. The service will scale up or down based on your operational or business needs
?
Fixed Partition Count up Front
Maximum Performance ~ 1 Partition/Core | Machine
Convert from 1 Stream to Another to Scale
Application Reconfiguration
![Page 26: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/26.jpg)
Scaling Streams
https://github.com/awslabs/amazon-kinesis-scaling-utils
![Page 27: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/27.jpg)
Why Kinesis? Cost
Cost-efficient for workloads of any scale. You can get started by provisioning a small stream, and pay low hourly rates only for what you use.
Scale Up/Down Dynamically
$.015/Hour/1MB
?
Run your Own EC2 Instances
Multi-AZ Configuration for increased Durability
Utilise Instance AutoScaling on Worker Lag from HEAD with Custom Metrics
![Page 28: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/28.jpg)
Why Kinesis? Cost
Price Dropped on 2nd June 2015, Restructured to support KPL
Old Pricing: $.028 / 1M Records PUT
New Pricing: $.014/1M 25KB “Payload Units”
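Scenario: 50,000 Events / Second, 512B / Event = 24.4 MB/Second

| | Old Pricing: Units | Old Pricing: Cost | New Pricing + KPL: Units | New Pricing + KPL: Cost |
|---|---|---|---|---|
| Shards | 50 | $558 | 25 | $279 |
| PutRecords | 4,320M Records | $120.96 | 2,648M Payload Units | $37.50 |
| Total | | $678.96 | | $316.50 |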
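The shard counts and the new-pricing costs in this scenario can be reproduced approximately with the arithmetic below. Two inputs are assumptions rather than figures from the slides: a 31-day month (744 hours) and 25KB taken as 25,600 bytes. Under those assumptions the payload-unit count comes out near 2,678M rather than the 2,648M shown, while the cost rounds to the same $37.50.

```python
import math

HOURS_PER_MONTH = 24 * 31                 # assumption: 31-day month
SHARD_PRICE_PER_HOUR = 0.015              # $ per shard-hour, from the slides
PAYLOAD_UNIT_BYTES = 25 * 1024            # assumption: 25KB = 25,600 bytes
PAYLOAD_UNIT_PRICE_PER_MILLION = 0.014    # $ per 1M payload units, from the slides

EVENTS_PER_SEC = 50_000
EVENT_BYTES = 512


def monthly_shard_cost(shards: int) -> float:
    """Cost of keeping `shards` shards open for the assumed month."""
    return shards * SHARD_PRICE_PER_HOUR * HOURS_PER_MONTH


# One record per event: the 1000 records/second/shard limit decides.
shards_unaggregated = math.ceil(EVENTS_PER_SEC / 1000)
# Events aggregated into large records: the 1MB/second/shard limit decides.
shards_aggregated = math.ceil(EVENTS_PER_SEC * EVENT_BYTES / (1024 * 1024))

# Fully packed 25KB payload units over the assumed month.
units_per_sec = EVENTS_PER_SEC * EVENT_BYTES / PAYLOAD_UNIT_BYTES
units_per_month = units_per_sec * 3600 * HOURS_PER_MONTH
payload_cost = units_per_month / 1e6 * PAYLOAD_UNIT_PRICE_PER_MILLION

if __name__ == "__main__":
    print("shards without aggregation:", shards_unaggregated,
          "cost:", round(monthly_shard_cost(shards_unaggregated), 2))
    print("shards with aggregation:", shards_aggregated,
          "cost:", round(monthly_shard_cost(shards_aggregated), 2))
    print("payload units (millions):", round(units_per_month / 1e6, 1),
          "cost:", round(payload_cost, 2))
```

The halving of the shard count comes entirely from aggregation: once events are packed, the byte-rate limit replaces the record-rate limit as the binding constraint.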
![Page 29: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/29.jpg)
![Page 30: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/30.jpg)
Kinesis – Consumer Application Best Practices
Tolerate Failure of Threads (consider Data Serialisation issues and Lease Stealing) and of Hardware (AutoScaling may add nodes as needed)
Scale Consumers up and down as the number of Shards increases or decreases
Don’t store data in memory in the workers. Use an elastic data store such as DynamoDB for State
Elastic Beanstalk provides all Best Practices in a simple to deploy, multi-version Application Container
KCL will automatically redistribute Workers to use new Instances
Logic implemented in Lambda doesn’t require any Servers at all!
![Page 31: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/31.jpg)
Managing Application State
![Page 32: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/32.jpg)
Consumer Local State Anti-Pattern
Consumer binds to a configured number of Partitions
Consumer stores the ‘state’ of a data structure, as defined by the event stream, on local storage
Read API can access that local storage as a ‘shard’ of the overall database
?
Consumer Consumer
Partition 1
Partition …
Partition P/2
Partition P/2+1
Partition …
Partition P
Local Disk Local Disk
Local Storage
Partitions P1..P/2
Local Storage
Partitions P/2+1..P
Read API
![Page 33: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/33.jpg)
Consumer Local State Anti-Pattern
But what happens when an instance fails?
![Page 34: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/34.jpg)
Consumer Local State Anti-Pattern
A new consumer process starts up for the required Partitions
Consumer must read from the beginning of the Stream to rebuild local storage
Complex, error prone, user constructed software
Long Startup Time
T0
THead
![Page 35: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/35.jpg)
External Highly Available State – Best Practice
Consumer Consumer
Shard 1
Shard …
Shard S/2
Shard S/2+1
Shard…
Shard S
Read API
Consumer binds to an even number of Shards based on number of Consumers
Consumer stores the ‘state’ in DynamoDB
DynamoDB is Highly Available, Elastic & Durable
Read API can access DynamoDB
DynamoDB
![Page 36: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/36.jpg)
![Page 37: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/37.jpg)
External Highly Available State – Best Practice
Read API
DynamoDB
AWS Lambda
Shard 1
Shard …
Shard S/2
Shard S/2+1
Shard…
Shard S
Consumer binds to an even number of Shards based on number of Consumers
Consumer stores the ‘state’ in DynamoDB
DynamoDB is Highly Available, Elastic & Durable
Read API can access DynamoDB
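The difference from the local-state anti-pattern is that a replacement consumer has nothing to rebuild, because the state never lived on the failed instance. The sketch below shows that shape with an in-memory class standing in for the DynamoDB table. `ExternalStateStore`, `Consumer` and `read_api` are hypothetical names, and no AWS calls are made.

```python
from typing import Dict, Iterable, Tuple


class ExternalStateStore:
    """In-memory stand-in for a highly available table such as DynamoDB."""

    def __init__(self) -> None:
        self._items: Dict[str, int] = {}

    def add(self, key: str, amount: int) -> int:
        self._items[key] = self._items.get(key, 0) + amount
        return self._items[key]

    def get(self, key: str) -> int:
        return self._items.get(key, 0)


class Consumer:
    """Holds no state of its own, so any instance can replace another."""

    def __init__(self, store: ExternalStateStore) -> None:
        self.store = store

    def process(self, events: Iterable[Tuple[str, int]]) -> None:
        for key, amount in events:
            self.store.add(key, amount)


def read_api(store: ExternalStateStore, key: str) -> int:
    """Reads go to the shared store, never to a consumer's local disk."""
    return store.get(key)


if __name__ == "__main__":
    store = ExternalStateStore()
    events = [("alice", 5), ("bob", 2), ("alice", 1), ("bob", 4)]

    first = Consumer(store)
    first.process(events[:2])
    del first                      # simulate the instance failing

    replacement = Consumer(store)  # starts immediately, with nothing to rebuild
    replacement.process(events[2:])
    print(read_api(store, "alice"), read_api(store, "bob"))
```

The replacement consumer only needs to know where to resume in the stream, which is the job of the checkpoint rather than of the state store.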
![Page 38: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/38.jpg)
Idempotency
![Page 39: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/39.jpg)
Property of a system whereby the repeated application of a function on a single input results in the same end state of the system
…
Exactly Once Processing
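A small contrast makes the definition concrete: applying the same record twice to an additive update changes the result, while a write keyed by a record identifier leaves the end state unchanged. The record shape (`id`, `amount`) and both function names are assumptions for illustration only.

```python
def apply_increment(state: dict, record: dict) -> None:
    """Not idempotent: replaying the record changes the end state."""
    state["total"] = state.get("total", 0) + record["amount"]


def apply_keyed(state: dict, record: dict) -> None:
    """Idempotent: the record id keys the write, so a replay overwrites itself."""
    state.setdefault("by_id", {})[record["id"]] = record["amount"]
    state["total"] = sum(state["by_id"].values())


if __name__ == "__main__":
    record = {"id": "r1", "amount": 5}

    additive, keyed = {}, {}
    for _ in range(2):  # the same record is delivered twice
        apply_increment(additive, record)
        apply_keyed(keyed, record)

    print(additive["total"], keyed["total"])
```

With at-least-once delivery, an idempotent handler of the second kind gives the same result as exactly-once processing would.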
![Page 40: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/40.jpg)
Idempotency – Writing Data
The Kinesis SDK & KPL may retry PUT in certain circumstances
Kinesis Record acknowledged with a Sequence Number is durable to Multiple Availability Zones…
But there could be a duplicate entry
My Application
PutRecord(s)
403 (Endpoint Redirect)
Storage
![Page 41: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/41.jpg)
Idempotency – Writing Data
My Application
PutRecord(s)
PutRecord(s)
200 (OK)
403 (Endpoint Redirect)
Storage
Storage
Write 123
Write 123
![Page 42: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/42.jpg)
Coming Soon…
![Page 43: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/43.jpg)
Idempotency – Rolling Idempotency Check
Kinesis will manage a rolling time window of Record IDs in DynamoDB
Record IDs are User Based
Duplicates in storage tier will be acknowledged as Successful
My Application
PutRecord(s)
Storage
DynamoDB
X Hour Rolling Window
![Page 44: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/44.jpg)
Idempotency – Rolling Idempotency Check
My Application
PutRecord(s)
Storage
Write Record ID
OK
DynamoDB
X Hour Rolling Window
![Page 45: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/45.jpg)
Idempotency – Rolling Idempotency Check
My Application
PutRecord(s)
403 (Endpoint Redirect)
Storage
Write Record ID
OK
DynamoDB
X Hour Rolling Window
![Page 46: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/46.jpg)
Idempotency – Rolling Idempotency Check
My Application
PutRecord(s)
PutRecord(s)
403 (Endpoint Redirect)
Storage
Storage
Write Record ID
OK
Write Record ID
DynamoDB
X Hour Rolling Window
ConditionalCheckFailedException
![Page 47: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/47.jpg)
Idempotency – Rolling Idempotency Check
My Application
PutRecord(s)
PutRecord(s)
200 (OK)
403 (Endpoint Redirect)
Storage
Storage
Write Record ID
OK
Write Record ID
DynamoDB
X Hour Rolling Window
ConditionalCheckFailedException
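The flow across these slides can be sketched as a conditional write against a table of recently seen Record IDs: the first write of an ID succeeds and the record is stored, while a retry with the same ID fails the condition and is still acknowledged as successful. The slides present this as an upcoming feature, so the code below illustrates the idea only and does not describe how Kinesis behaves. The class, the function names and the in-memory dictionary standing in for DynamoDB are all assumptions.

```python
import time
from typing import Dict, List, Optional, Tuple


class RollingIdempotencyCheck:
    """In-memory stand-in for conditional writes to a Record ID table."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._seen: Dict[str, float] = {}  # record id -> time of first write

    def _expire(self, now: float) -> None:
        """Drop IDs that have fallen out of the rolling window."""
        cutoff = now - self.window_seconds
        for record_id in [r for r, t in self._seen.items() if t <= cutoff]:
            del self._seen[record_id]

    def first_write(self, record_id: str, now: Optional[float] = None) -> bool:
        """True if the ID is new within the window, False for a duplicate."""
        now = time.time() if now is None else now
        self._expire(now)
        if record_id in self._seen:
            return False  # plays the role of a failed conditional write
        self._seen[record_id] = now
        return True


def put_record(
    storage: List[Tuple[str, bytes]],
    check: RollingIdempotencyCheck,
    record_id: str,
    data: bytes,
    now: Optional[float] = None,
) -> str:
    """Store the record once; acknowledge duplicates as successful."""
    if check.first_write(record_id, now):
        storage.append((record_id, data))
    return "200 (OK)"


if __name__ == "__main__":
    storage: List[Tuple[str, bytes]] = []
    check = RollingIdempotencyCheck(window_seconds=3600)
    print(put_record(storage, check, "123", b"payload", now=0))
    print(put_record(storage, check, "123", b"payload", now=10))  # retried PUT
    print(len(storage))
```

Deduplication only holds inside the window: a retry that arrives after the ID has expired would be stored again, so the window needs to exceed the longest plausible retry delay.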
![Page 48: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/48.jpg)
Easy Administration
Real-time Performance
High Durability
High Throughput, Elastic
S3, Redshift, & DynamoDB Integration
Large Ecosystem
Low Cost
In Short…
![Page 49: Amazon Kinesis](https://reader031.fdocuments.in/reader031/viewer/2022013115/55c8e6f3bb61ebdd4c8b465c/html5/thumbnails/49.jpg)