Big Data is Dead, Long Live Business Intelligence?
![Page 1: Big Data is Dead, Long Live Business Intelligence?](https://reader034.fdocuments.in/reader034/viewer/2022051716/58a1a1b71a28abac578b91c3/html5/thumbnails/1.jpg)
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Michael Muckel, Head of Data Platform
Markus Schmidberger, Data Platform Architect
Glomex GmbH – A ProSiebenSat.1 Media SE company
Berlin, April 12th 2016
Big Data is Dead, Long Live Business Intelligence?
![Page 2: Big Data is Dead, Long Live Business Intelligence?](https://reader034.fdocuments.in/reader034/viewer/2022051716/58a1a1b71a28abac578b91c3/html5/thumbnails/2.jpg)
Glomex: A ProSiebenSat.1 company
![Page 3: Big Data is Dead, Long Live Business Intelligence?](https://reader034.fdocuments.in/reader034/viewer/2022051716/58a1a1b71a28abac578b91c3/html5/thumbnails/3.jpg)
Glomex – The Global Media Exchange
Publishers
Content providers
Video Value Platform
Media Delivery Platform
Media Exchange Platform
Glomex
External broadcasters
Web-only content owners
Non-P7S1 publishers
![Page 4: Big Data is Dead, Long Live Business Intelligence?](https://reader034.fdocuments.in/reader034/viewer/2022051716/58a1a1b71a28abac578b91c3/html5/thumbnails/4.jpg)
Glomex – Data Platform
Video Value Platform / Media Delivery Platform / Media Exchange Platform
Data Platform
Real-Time Monitoring / Batch Analytics / Machine Learning
![Page 5: Big Data is Dead, Long Live Business Intelligence?](https://reader034.fdocuments.in/reader034/viewer/2022051716/58a1a1b71a28abac578b91c3/html5/thumbnails/5.jpg)
Key Components of our New Data Platform
Content Discovery: Find the most relevant content for our customers and their users.
Real-Time Monitoring: Enable our development teams to serve our content to our users in the best possible quality.
Analytics: Give our teams access to the data to enable data-driven development of new features and products.
![Page 6: Big Data is Dead, Long Live Business Intelligence?](https://reader034.fdocuments.in/reader034/viewer/2022051716/58a1a1b71a28abac578b91c3/html5/thumbnails/6.jpg)
Lambda Architecture
Graphic provided by http://lambda-architecture.net
Lambda Architecture ≠ AWS Lambda
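The Lambda Architecture named on this slide is a design pattern, not the AWS Lambda service. As a minimal sketch of the pattern, a serving layer answers a query by merging a periodically recomputed batch view with a real-time view covering events since the last batch run. The view contents, key names and the `merge_views` function are illustrative assumptions, not taken from the talk.

```python
from collections import Counter

def merge_views(batch_view, speed_view):
    """Serving layer: combine batch and real-time counts per key."""
    merged = Counter(batch_view)
    merged.update(speed_view)  # adds counts key by key
    return dict(merged)

# Hypothetical video-play counts per video id (made-up values).
batch_view = {"video-a": 1000, "video-b": 250}  # recomputed periodically from the full dataset
speed_view = {"video-a": 12, "video-c": 3}      # incremental counts since the last batch run

result = merge_views(batch_view, speed_view)
print(result)
```

The batch view can be thrown away and rebuilt from raw data at any time, which is what makes reprocessing of historical datasets possible in this pattern.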
![Page 7: Big Data is Dead, Long Live Business Intelligence?](https://reader034.fdocuments.in/reader034/viewer/2022051716/58a1a1b71a28abac578b91c3/html5/thumbnails/7.jpg)
Simplify Data Processing
data → ingest / collect → store → process / analyze → visualize / serve → answers
Criteria: Time to Answer (Latency), Throughput, Cost
(more concrete numbers at the end)
![Page 8: Big Data is Dead, Long Live Business Intelligence?](https://reader034.fdocuments.in/reader034/viewer/2022051716/58a1a1b71a28abac578b91c3/html5/thumbnails/8.jpg)
Data Processing in Big Data World
[Diagram: reference pipeline with four stages: Collect, Store, Analyze, Consume]
Collect: applications (iOS, Android, Web Apps), logging (Logstash), IoT; data types: Transactional Data, File Data, Stream Data, Search Data
Store: Search (Amazon ES), SQL (Amazon RDS), NoSQL (Amazon DynamoDB), Cache (Amazon ElastiCache), Stream Storage (Apache Kafka, Amazon Kinesis), File Storage (Amazon S3, Amazon Glacier); labeled Hot / Warm / Cold
Analyze: Batch, Interactive, Stream Processing, ML (Amazon Redshift, Impala, Pig, Amazon ML, Streaming, Amazon Kinesis, AWS Lambda, Amazon Elastic MapReduce); labeled Fast / Slow; ETL
Consume: Analysis & Visualization (Amazon QuickSight), Notebooks, Predictions, Apps & APIs, Mobile Apps, IDE
![Page 9: Big Data is Dead, Long Live Business Intelligence?](https://reader034.fdocuments.in/reader034/viewer/2022051716/58a1a1b71a28abac578b91c3/html5/thumbnails/9.jpg)
Our Data Platform Architecture
[Diagram: Data Platform - MicroService Layout]
Stages: INGEST / STORE / PROCESS & ANALYSE / VISUALIZE & SERVE
Import services: AdProxy Log Import Service, Player Feedback Import Service, VAS Log Import Service, CDN Log Import Service, External Data Import Service, Content Import Service
Other services: Data Science Analytics Service, Technical Monitoring Service, Dev / Ops Analytics Service, Content Discovery Service, KPI & Analytics Service, Metadata Service, Data Platform Monitoring Service, Data Quality Service, Data Management Service
Data components: Data Layer, Data API, Data Lake, Content API
Interfaces: Data Platform Access, Real-Time Dashboards, Data Science UI
Other labels: Portal, Team, CDN files, data stream, other modules
![Page 10: Big Data is Dead, Long Live Business Intelligence?](https://reader034.fdocuments.in/reader034/viewer/2022051716/58a1a1b71a28abac578b91c3/html5/thumbnails/10.jpg)
Real-Time Player Monitoring
![Page 11: Big Data is Dead, Long Live Business Intelligence?](https://reader034.fdocuments.in/reader034/viewer/2022051716/58a1a1b71a28abac578b91c3/html5/thumbnails/11.jpg)
Monitoring Video-Streaming Experience
Focus on Metrics from the User's Perspective
From Server-Uptime To (anonymized) Real-User Monitoring
![Page 12: Big Data is Dead, Long Live Business Intelligence?](https://reader034.fdocuments.in/reader034/viewer/2022051716/58a1a1b71a28abac578b91c3/html5/thumbnails/12.jpg)
1. Analyze
2. Take Actions
3. Automate
![Page 13: Big Data is Dead, Long Live Business Intelligence?](https://reader034.fdocuments.in/reader034/viewer/2022051716/58a1a1b71a28abac578b91c3/html5/thumbnails/13.jpg)
Our Ingest Process
![Page 14: Big Data is Dead, Long Live Business Intelligence?](https://reader034.fdocuments.in/reader034/viewer/2022051716/58a1a1b71a28abac578b91c3/html5/thumbnails/14.jpg)
Kinesis Firehose is doing its job
Next session: “Streaming Data: The Opportunity and How to Work With It”
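As an illustration of feeding a Firehose delivery stream, the sketch below groups click-stream events into batches of at most 500 records, the per-call record limit of the Firehose `PutRecordBatch` API. The event fields and the `to_firehose_batches` helper are hypothetical, and the sketch ignores the per-call size limit. The talk's own architecture slide mentions an instance with the Kinesis Agent, so this is only one possible producer, not the speakers' implementation.

```python
import json

MAX_RECORDS_PER_BATCH = 500  # PutRecordBatch limit on records per call

def to_firehose_batches(events, batch_size=MAX_RECORDS_PER_BATCH):
    """Yield lists of Firehose-style records ({"Data": bytes}) as newline-delimited JSON."""
    batch = []
    for event in events:
        batch.append({"Data": (json.dumps(event) + "\n").encode("utf-8")})
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller batch
        yield batch

# Hypothetical player events.
events = [{"player_id": i, "event": "play"} for i in range(1200)]
batches = list(to_firehose_batches(events))
print([len(b) for b in batches])  # prints [500, 500, 200]

# Each batch could then be sent with boto3, for example:
# boto3.client("firehose").put_record_batch(DeliveryStreamName="<stream>", Records=batch)
```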
![Page 15: Big Data is Dead, Long Live Business Intelligence?](https://reader034.fdocuments.in/reader034/viewer/2022051716/58a1a1b71a28abac578b91c3/html5/thumbnails/15.jpg)
Data Facts
Click-stream data per day in Kinesis Firehose: 20 GB
Records processed per day: 5 Billion
Data freshness to S3: ~100 ms
![Page 16: Big Data is Dead, Long Live Business Intelligence?](https://reader034.fdocuments.in/reader034/viewer/2022051716/58a1a1b71a28abac578b91c3/html5/thumbnails/16.jpg)
Elasticsearch + Grafana for real-time analyses
Not AWS managed!
![Page 17: Big Data is Dead, Long Live Business Intelligence?](https://reader034.fdocuments.in/reader034/viewer/2022051716/58a1a1b71a28abac578b91c3/html5/thumbnails/17.jpg)
Elasticsearch on Spot Instances
![Page 18: Big Data is Dead, Long Live Business Intelligence?](https://reader034.fdocuments.in/reader034/viewer/2022051716/58a1a1b71a28abac578b91c3/html5/thumbnails/18.jpg)
CDN 'Batch Processing'
![Page 19: Big Data is Dead, Long Live Business Intelligence?](https://reader034.fdocuments.in/reader034/viewer/2022051716/58a1a1b71a28abac578b91c3/html5/thumbnails/19.jpg)
Processing CDN Logs
Zipped log files per day: 25 GB
Records processed per day: 300 Million
+ Normal challenges with external data sources: out-of-order delivery / data quality issues / varying file sizes / etc.
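One common way to cope with out-of-order delivery is to partition records by their own event time instead of by arrival order, so that a late file still lands in the right partition. The sketch below shows the idea; the record fields, the timestamp format and the partition key layout are assumptions for illustration, not details from the talk.

```python
from collections import defaultdict
from datetime import datetime

def partition_by_event_hour(records):
    """Group records by the hour of their own timestamp, not by arrival order."""
    partitions = defaultdict(list)
    for record in records:
        ts = datetime.strptime(record["timestamp"], "%Y-%m-%dT%H:%M:%S")
        partitions[ts.strftime("%Y-%m-%d/%H")].append(record)
    return dict(partitions)

# Hypothetical records arriving out of order.
arrived = [
    {"timestamp": "2016-04-12T10:05:00", "bytes": 1200},
    {"timestamp": "2016-04-12T09:59:30", "bytes": 800},  # late record from the previous hour
    {"timestamp": "2016-04-12T10:07:10", "bytes": 450},
]
partitions = partition_by_event_hour(arrived)
print(sorted(partitions))
```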
![Page 20: Big Data is Dead, Long Live Business Intelligence?](https://reader034.fdocuments.in/reader034/viewer/2022051716/58a1a1b71a28abac578b91c3/html5/thumbnails/20.jpg)
Requirements for our Data Processing Pipeline
Monitor Complete Pipeline
Enable Reprocessing of Historical Datasets
Be Ready to Scale
![Page 21: Big Data is Dead, Long Live Business Intelligence?](https://reader034.fdocuments.in/reader034/viewer/2022051716/58a1a1b71a28abac578b91c3/html5/thumbnails/21.jpg)
Our CDN Pipeline
![Page 22: Big Data is Dead, Long Live Business Intelligence?](https://reader034.fdocuments.in/reader034/viewer/2022051716/58a1a1b71a28abac578b91c3/html5/thumbnails/22.jpg)
• How to process an 800 MB gzipped log file?
• How to split compressed gzip files?
• Splitter using Amazon SQS and Amazon EC2 Spot Instances
AWS Lambda Limits
AWS Lambda timeout: 5 min
AWS Lambda temp disk: 512 MB
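A gzip stream cannot simply be cut at arbitrary byte offsets, because decompression has to start at the beginning of the stream. A splitter therefore reads the file sequentially and writes smaller, independently compressed chunks that fit within the Lambda limits. The sketch below shows that idea with the Python standard library only; the chunking by line count and the `split_gzip` name are assumptions, and the talk does not describe how the actual SQS and Spot Instance splitter is implemented.

```python
import gzip
import io

def split_gzip(stream, lines_per_chunk):
    """Read a gzip stream line by line and yield smaller, independently gzipped chunks."""
    def compress(lines):
        return gzip.compress(b"".join(lines))

    chunk = []
    # gzip.open decompresses incrementally, so the whole file is never held in memory.
    with gzip.open(stream, "rb") as reader:
        for line in reader:
            chunk.append(line)
            if len(chunk) == lines_per_chunk:
                yield compress(chunk)
                chunk = []
    if chunk:  # remaining lines form the last chunk
        yield compress(chunk)

# Small in-memory stand-in for a large CDN log file.
original = b"".join(b"log line %d\n" % i for i in range(10))
source = io.BytesIO(gzip.compress(original))

chunks = list(split_gzip(source, lines_per_chunk=4))
restored = b"".join(gzip.decompress(c) for c in chunks)
print(len(chunks), restored == original)
```

Splitting on line boundaries keeps every chunk a valid log file on its own, so each one can be handed to a separate worker.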
![Page 23: Big Data is Dead, Long Live Business Intelligence?](https://reader034.fdocuments.in/reader034/viewer/2022051716/58a1a1b71a28abac578b91c3/html5/thumbnails/23.jpg)
Our Meta Data Store
AWS Big Data Blog: https://blogs.aws.amazon.com/bigdata/post/Tx2YRX3Y16CVQFZ/Building-and-Maintaining-an-Amazon-S3-Metadata-Index-without-Servers
![Page 24: Big Data is Dead, Long Live Business Intelligence?](https://reader034.fdocuments.in/reader034/viewer/2022051716/58a1a1b71a28abac578b91c3/html5/thumbnails/24.jpg)
Our Meta Data Store
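A metadata store of this kind can be fed from S3 event notifications: each notification names the bucket and object, and a small function turns it into an index entry. The sketch below only shows that extraction step. The item layout, the bucket name and the key are made up, and writing the items to an actual index table is left out.

```python
from urllib.parse import unquote_plus

def index_items(s3_event):
    """Turn an S3 event notification into flat metadata items for an index table."""
    items = []
    for record in s3_event.get("Records", []):
        s3 = record["s3"]
        items.append({
            "bucket": s3["bucket"]["name"],
            "key": unquote_plus(s3["object"]["key"]),  # object keys are URL-encoded in events
            "size_bytes": s3["object"].get("size", 0),
            "event_time": record["eventTime"],
        })
    return items

# Trimmed example event; bucket and key are hypothetical.
event = {"Records": [{
    "eventTime": "2016-04-12T10:00:00.000Z",
    "s3": {"bucket": {"name": "example-cdn-logs"},
           "object": {"key": "cdn/2016-04-12/part+0001.gz", "size": 1048576}},
}]}
items = index_items(event)
print(items[0]["key"])
```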
![Page 25: Big Data is Dead, Long Live Business Intelligence?](https://reader034.fdocuments.in/reader034/viewer/2022051716/58a1a1b71a28abac578b91c3/html5/thumbnails/25.jpg)
Be serverless and serve data
AWS Lambda / AWS Lambda / Amazon API Gateway / Amazon Kinesis
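A minimal sketch of serving data without servers: a Lambda function behind API Gateway that returns a value as JSON. It assumes the API Gateway Lambda proxy integration (the function receives the request as `event` and returns `statusCode`, `headers` and `body`). The `KPIS` lookup table, the `video_id` path parameter and the values are hypothetical; a real function would read from a data store.

```python
import json

# Hypothetical precomputed KPI values (made-up numbers).
KPIS = {"video-a": {"plays": 1012}, "video-b": {"plays": 250}}

def handler(event, context):
    """Lambda entry point returning a response in the API Gateway proxy format."""
    video_id = (event.get("pathParameters") or {}).get("video_id")
    if video_id not in KPIS:
        return {"statusCode": 404, "body": json.dumps({"error": "not found"})}
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(KPIS[video_id]),
    }

# Local invocation with a trimmed proxy event.
response = handler({"pathParameters": {"video_id": "video-a"}}, None)
print(response["statusCode"], response["body"])
```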
![Page 26: Big Data is Dead, Long Live Business Intelligence?](https://reader034.fdocuments.in/reader034/viewer/2022051716/58a1a1b71a28abac578b91c3/html5/thumbnails/26.jpg)
CDN Batch Facts
Processing time: 600 rec/sec
Cost for 25 GB/day CDN processing: 1 $ / hour
Parallel AWS Lambda functions: 6
Average run-time of AWS Lambda: 2.3 min
Charts: AWS Lambda duration, Redshift CPU
![Page 27: Big Data is Dead, Long Live Business Intelligence?](https://reader034.fdocuments.in/reader034/viewer/2022051716/58a1a1b71a28abac578b91c3/html5/thumbnails/27.jpg)
Data Science Environment
![Page 28: Big Data is Dead, Long Live Business Intelligence?](https://reader034.fdocuments.in/reader034/viewer/2022051716/58a1a1b71a28abac578b91c3/html5/thumbnails/28.jpg)
Data Science Environment
Project Jupyter: http://jupyter.org/
![Page 29: Big Data is Dead, Long Live Business Intelligence?](https://reader034.fdocuments.in/reader034/viewer/2022051716/58a1a1b71a28abac578b91c3/html5/thumbnails/29.jpg)
Data Science Environment - Architecture
[Diagram: layers labeled Data Sources, Cluster, Technology, Development]
Components shown: Amazon Redshift, Amazon S3, Elasticsearch, Amazon EMR, Amazon Kinesis, Github
Two components are marked "In development".
![Page 30: Big Data is Dead, Long Live Business Intelligence?](https://reader034.fdocuments.in/reader034/viewer/2022051716/58a1a1b71a28abac578b91c3/html5/thumbnails/30.jpg)
Our Lambda Architecture on AWS
[Diagram: Data Platform - Lambda Architecture]
Layers: Batch Layer / Speed Layer / Serving Layer / Applications
Components: Instance with Kinesis Agent, Amazon Kinesis Firehose, S3, AWS Lambda (shown three times), EC2 with Elasticsearch, Amazon Redshift, Amazon Elastic MapReduce + Spark, Amazon API Gateway, EC2 with Jupyter, EC2 with Grafana, EC2 with Caravel
Other labels: data stream, CDN files, Portal, Team, other player modules
![Page 31: Big Data is Dead, Long Live Business Intelligence?](https://reader034.fdocuments.in/reader034/viewer/2022051716/58a1a1b71a28abac578b91c3/html5/thumbnails/31.jpg)
Key Takeaways
Lambda Architecture
Enrich your traditional, batch-driven BI-workflow with real-time analytics
Use Lambda-Architecture as a guiding principle and adapt it to your needs
![Page 32: Big Data is Dead, Long Live Business Intelligence?](https://reader034.fdocuments.in/reader034/viewer/2022051716/58a1a1b71a28abac578b91c3/html5/thumbnails/32.jpg)
Key Takeaways
AWS managed services provide a robust way to run complex big data infrastructures
Follow best practices provided by AWS and the community
Focus on feature development and robust pipelines, not on infrastructure management
![Page 33: Big Data is Dead, Long Live Business Intelligence?](https://reader034.fdocuments.in/reader034/viewer/2022051716/58a1a1b71a28abac578b91c3/html5/thumbnails/33.jpg)
Key Takeaways
Provide an open data environment
Structure your data so that it can be accessed in processed and raw form
Trust the creativity of your engineering teams to find insights in your datasets
Notebooks provide easy access to even large distributed datasets
![Page 34: Big Data is Dead, Long Live Business Intelligence?](https://reader034.fdocuments.in/reader034/viewer/2022051716/58a1a1b71a28abac578b91c3/html5/thumbnails/34.jpg)
Michael Muckel, Head of Data Platform
Markus Schmidberger, Data Platform Architect
Glomex GmbH – A ProSiebenSat.1 Media SE company
We are hiring …
• Data Scientists
• Data Engineers
• Project Managers