Modern data architectures for real time analytics and engagement
Transcript of Modern data architectures for real time analytics and engagement
Modern Data Architectures for Real-Time Analytics & Engagement
Russell Nash, APAC Solutions Architect, Amazon Web Services
SCALABLE FLEXIBLE MANAGEABLE COST EFFECTIVE
Modern Data Architecture
Ingest Serving
Speed (Real-time)
Scale (Batch)
Data analysts
Data scientists
Business users
Engagement platforms
Automation / events
Sources
Real-time Pipeline
Amazon Kinesis
Machines
Devices
Mobile
Clickstream
Amazon Kinesis Streams
Amazon Kinesis Firehose
Amazon Kinesis Analytics
Kinesis Family
Availability Zone
Availability Zone
Availability Zone
Amazon Kinesis
Stream
AWS Lambda
KCL App
Amazon EMR
Streaming
Logs
Alerts
Analysis
Dashboards
Predictions
Amazon Kinesis Stream
1 SHARD: write 1000 TPS or 1 MB/s; read 5 TPS or 2 MB/s
2 SHARDS: write 2000 TPS or 2 MB/s; read 10 TPS or 4 MB/s
3 SHARDS: write 3000 TPS or 3 MB/s; read 15 TPS or 6 MB/s
Retention: 24 hours to 7 Days
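The per-shard limits above can be turned into a rough sizing calculation. This is a minimal sketch, not an official sizing tool: it assumes the limits are per second, that every consumer reads every record, and the function name `shards_needed` is hypothetical.

```python
import math

# Per-shard limits as listed on the slide (assumed to be per second).
WRITE_RECORDS_PER_SEC = 1000
WRITE_MB_PER_SEC = 1
READ_MB_PER_SEC = 2


def shards_needed(records_per_sec, avg_record_kb, consumers=1):
    """Estimate the shard count from ingest rate, record size and consumer count."""
    write_mb = records_per_sec * avg_record_kb / 1024
    by_records = math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC)
    by_write_mb = math.ceil(write_mb / WRITE_MB_PER_SEC)
    # Assumption: each consumer application reads the full stream.
    by_read_mb = math.ceil(write_mb * consumers / READ_MB_PER_SEC)
    return max(1, by_records, by_write_mb, by_read_mb)
```

For example, 2500 small records per second is bounded by the record-count limit, while a few large records with several consumers is bounded by read throughput.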
Creating a Kinesis Stream
Amazon Kinesis Stream
SHARD
SHARD
SHARD
EVENT PRODUCERS
KinesisEndpoint
Specify Partition Key
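Why the partition key matters: Kinesis hashes the key with MD5 into a 128-bit integer, and each shard owns a range of that hash space. The sketch below illustrates the idea only. It assumes the hash space is split evenly across shards (real streams can have uneven ranges after splits and merges), and `shard_for_key` is a hypothetical helper, not part of any AWS SDK.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for_key(partition_key, shard_count):
    """Map a partition key to a shard index, assuming evenly split hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    range_size = HASH_SPACE // shard_count
    # Clamp so the top of the hash space falls into the last shard.
    return min(hash_value // range_size, shard_count - 1)
```

Records with the same partition key always land in the same shard, so a key with too few distinct values can leave some shards idle and others hot.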
Kinesis Producer Library
• Writes to one or more Amazon Kinesis Streams
• Retry mechanism
• Uses PutRecords
• Aggregates records
• Integrates with Amazon KCL to de-aggregate
• Submits Amazon CloudWatch metrics
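The batching and retry behaviour listed for the Kinesis Producer Library can be sketched in a few lines. This is not the KPL itself: `put_batch` is a hypothetical callable standing in for a PutRecords call that returns the records that failed, and the sketch does not show KPL aggregation (packing several user records into one Kinesis record). The 500-record cap per PutRecords request is the service limit.

```python
MAX_RECORDS_PER_CALL = 500  # PutRecords accepts at most 500 records per request


def chunk_records(records, max_per_call=MAX_RECORDS_PER_CALL):
    """Yield slices of the record list that fit in one PutRecords call."""
    for start in range(0, len(records), max_per_call):
        yield records[start:start + max_per_call]


def send_with_retry(records, put_batch, max_attempts=3):
    """Send records in batches and retry only the ones that failed.

    put_batch(batch) is a hypothetical stand-in for the real API call and
    must return the list of records that were not accepted.
    Returns whatever is still unsent after max_attempts.
    """
    pending = list(records)
    for _ in range(max_attempts):
        failed = []
        for batch in chunk_records(pending):
            failed.extend(put_batch(batch))
        if not failed:
            return []
        pending = failed
    return pending
```

Retrying only the failed records matters because PutRecords can partially succeed: some records in a batch may be throttled while the rest are accepted.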
Kinesis Agent
• Monitors files and sends new data records to your delivery stream
• Handles file rotation, checkpointing, and retry upon failures
• Delivers all data in a reliable, timely, and simple manner
• Emits Amazon CloudWatch metrics
Kinesis Data Out – Kinesis Client Library
SHARD 1
SHARD 2
SHARD 3
SHARD N
EC2 Instance
Worker 1
Worker 2
EC2 Instance
Worker 3
Worker N
KCL: Java, Node.js, Python, .NET, Ruby
twitter-trends.com
twitter-trends.com website
The solution: Local Top 10
My top-10
My top-10
My top-10
Global top-10
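The local top-10 to global top-10 idea can be sketched as follows. It assumes each worker counts hashtags for its own shard and that a hashtag's tweets all land in one shard (for example, by using the hashtag as the partition key); if a hashtag is spread across shards, merging truncated local lists is only approximate. The function names are hypothetical.

```python
from collections import Counter


def local_top(hashtags, n=10):
    """Top-n hashtags seen by one worker on its shard."""
    return Counter(hashtags).most_common(n)


def global_top(local_tops, n=10):
    """Merge the per-shard top-n lists into one global top-n."""
    total = Counter()
    for top in local_tops:
        total.update(dict(top))  # adds counts for each (hashtag, count) pair
    return total.most_common(n)
```

Each worker only ships n entries, so the merge step stays cheap no matter how many tweets flow through each shard.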
Challenges using the Kinesis API directly
Kinesis application
Manual creation of workers and assignment to shards
How many workers per EC2 instance?
How many EC2 instances?
Using the Kinesis Client Library
Kinesis application
Shard mgmt table
Elasticity and load balancing
Shard mgmt table
Auto scaling Group
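To illustrate what load balancing means when the Auto Scaling group adds or removes instances, here is a minimal sketch that spreads shards evenly over the current set of workers. It is not the KCL's actual algorithm (the KCL coordinates workers through the shard management table); `assign_shards` is a hypothetical helper.

```python
def assign_shards(shard_ids, worker_ids):
    """Spread shards over workers round-robin so counts differ by at most one."""
    if not worker_ids:
        raise ValueError("at least one worker is required")
    assignment = {worker: [] for worker in worker_ids}
    for index, shard in enumerate(sorted(shard_ids)):
        worker = worker_ids[index % len(worker_ids)]
        assignment[worker].append(shard)
    return assignment
```

Calling it again with a longer worker list shows the effect of scaling out: each worker ends up owning fewer shards.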
Fault tolerance support in KCL
Shard mgmt table
Availability Zone 1 (marked X)
Availability Zone 3
Checkpoint, replay design pattern
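[Diagram: records in Shard-i of the Kinesis stream, by sequence number: 2, 3, 5, 8, 10, 14, 17, 18, 21, 23]
Shard management table (Shard ID, Lock, Seq num): Shard-i; lock held by Host A (marked X), then Host B; Seq num values shown: 0, 10, 18
Local top-10 table (Shard ID, Local top-10): Shard-i; {#Movies: 10235, #Weather: 9835, …}, then {#Movies: 10235, #Weather: 9910, …}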
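The checkpoint and replay pattern can be sketched with an in-memory stand-in for the two tables. This is an illustration under assumptions, not KCL code: `ShardState` and `run_worker` are hypothetical names, the checkpoint interval is arbitrary, and the counts are saved together with the checkpoint so that replayed records are not counted twice.

```python
from collections import Counter


class ShardState:
    """Stand-in for one shard's row in the shard management and local top-10 tables."""

    def __init__(self):
        self.checkpoint = 0      # last checkpointed sequence number
        self.counts = Counter()  # hashtag counts as of that checkpoint


def run_worker(records, state, checkpoint_every=5, crash_after=None):
    """Process (seq, hashtag) records newer than the checkpoint.

    Returns False if the simulated crash happened; work done since the last
    checkpoint is then lost and must be replayed by the next worker.
    """
    counts = Counter(state.counts)
    start = state.checkpoint
    processed = 0
    last_seq = start
    for seq, hashtag in records:
        if seq <= start:
            continue  # already covered by the checkpoint
        if crash_after is not None and processed == crash_after:
            return False
        counts[hashtag] += 1
        processed += 1
        last_seq = seq
        if processed % checkpoint_every == 0:
            state.checkpoint, state.counts = seq, Counter(counts)
    state.checkpoint, state.counts = last_seq, counts
    return True
```

With the sequence numbers from the slide, a first host that checkpoints at 10 and fails after 18 leaves the checkpoint at 10; a second host then replays 14, 17 and 18 before continuing with 21 and 23.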
Kinesis & Lambda
SHARD 1
SHARD 2
SHARD 3
SHARD N
AWS Lambda: Node.js, Java, Python, C#
AWS Lambda
Lambda Blueprints
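A Lambda function attached to a Kinesis stream receives batches of records whose payload is base64-encoded under `Records[i].kinesis.data`. The handler below is a minimal sketch in Python; the JSON payload with a `hashtag` field is an assumption made for illustration and is not something the slides specify.

```python
import base64
import json


def handler(event, context):
    """Count hashtags in one batch of Kinesis records delivered to Lambda."""
    counts = {}
    for record in event["Records"]:
        # Kinesis record data arrives base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        message = json.loads(payload)
        tag = message["hashtag"]  # assumed payload field
        counts[tag] = counts.get(tag, 0) + 1
    return counts
```

The function can be exercised locally by building a fake event with the same shape before wiring it to a real stream.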
Spark Core
Spark SQL
Spark Streaming
SparkR
Spark ML
GraphX
Stream
Micro Batches
Results
Amazon Kinesis
Apache Kafka
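The stream to micro-batches to results flow can be illustrated without Spark. The sketch below is plain Python, not the Spark Streaming API: it groups timestamped events into fixed intervals, and each resulting batch can then be processed like a small batch job. `micro_batches` is a hypothetical helper.

```python
from collections import defaultdict


def micro_batches(events, interval_sec):
    """Group (timestamp, value) events into consecutive fixed-length batches."""
    batches = defaultdict(list)
    for timestamp, value in events:
        batches[int(timestamp // interval_sec)].append(value)
    # Return batches in time order; empty intervals are skipped.
    return [batches[key] for key in sorted(batches)]
```

A result per batch (a count, for instance) is then just `[len(batch) for batch in micro_batches(events, 2)]`.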
Data Prep
Prediction Model
Train
Test
Split
70%
30%
Near Real-time Data
Training Data
SQL
ML
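The 70% / 30% train and test split can be sketched in plain Python. This is an illustration only (on EMR the split would more likely be done with Spark); the fixed seed and the function name are assumptions.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle rows reproducibly and split them into train and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Shuffling before cutting avoids training only on the oldest records when the input is ordered by time.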
Amazon Kinesis
AWS Lambda
Application
Amazon EMR
Streaming
S3 (Log)
Amazon Elasticsearch (Dashboard)
Real-time Pipeline
Amazon Elasticsearch
• Search and Analytics
• Scalable
• Fully Managed
• Integrated – Logstash, Kibana
Amazon Kinesis
AWS Lambda
Application
Amazon EMR
Streaming
S3 (Logs)
Amazon Elasticsearch (Dashboards)
Amazon EMR (Predictions)
ML
Amazon SNS (Alerts)
Real-time Pipeline
Amazon Redshift (Analytics)
S3
Redshift
Elasticsearch
Amazon Kinesis Firehose
Auto provisioning
Auto partition keys
End to End Elastic
Batch
Compress
Encrypt
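The batch and compress steps can be sketched as a small buffer that collects records until a size threshold is reached, then gzips them into one object and hands it to a delivery callback. This is an illustration, not Firehose itself: `BatchingBuffer` and `deliver` are hypothetical, time-based flushing is left out, and encryption is not shown.

```python
import gzip


class BatchingBuffer:
    """Collect records and deliver them as one gzip-compressed object per batch."""

    def __init__(self, deliver, max_bytes):
        self.deliver = deliver      # hypothetical callback, e.g. a write to S3
        self.max_bytes = max_bytes  # flush once this many bytes are buffered
        self.records = []
        self.size = 0

    def put(self, record):
        """Buffer one record (bytes) and flush when the size threshold is hit."""
        self.records.append(record)
        self.size += len(record)
        if self.size >= self.max_bytes:
            self.flush()

    def flush(self):
        """Compress and deliver whatever is buffered, then reset the buffer."""
        if not self.records:
            return
        self.deliver(gzip.compress(b"".join(self.records)))
        self.records, self.size = [], 0
```

Batching trades a little latency for far fewer, larger objects at the destination, which is usually what downstream batch tools prefer.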
Kinesis Analytics
Data In: Stream or Firehose
SQL
Data Out: Stream or Firehose
Sonos
New X1 Instance - Tons of Memory
• Large-scale, in-memory applications
• Intel® Xeon® E7 8880 v3 Haswell processors
• Up to 2TB of memory
• Up to 128 vCPUs per instance
Intel® Processor Technologies
Intel® AVX – Dramatically increases performance for highly parallel HPC workloads such as life science engineering, data mining, financial analysis, media processing
Intel® AES-NI – Enhances security with new encryption instructions that reduce the performance penalty associated with encrypting/decrypting data
Intel® Turbo Boost Technology – Increases computing power with performance that adapts to spikes in workloads
Intel® Transactional Synchronization Extensions (TSX) – Enables execution of transactions that are independent to accelerate throughput
P state & C state control – provides granular performance tuning for cores and sleep states to improve overall application performance
twitter.com/awsawscloudseasia
facebook.com/amazonwebservices/
youtube.com/user/AmazonWebServices
slideshare.net/amazonwebservices
Thank you for joining us today. Please complete the survey & let us know what you think of the webinar.
REGISTER NOW: http://amzn.to/2jFt11N
Complimentary labs are available only till 31 March 2017.
Get hands-on experience working with AWS technology. Access the complimentary Big Data on AWS self-paced labs.
Q&A