Sound cloud - User & Partner Conference - AT Internet
Transcript of Sound cloud - User & Partner Conference - AT Internet
![Page 1: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/1.jpg)
Big Data with Amazon Redshift and ATI
November 27th, 2013
![Page 2: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/2.jpg)
HI, I’M OLE
![Page 3: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/3.jpg)
SOUNDCLOUD IS THE WORLD’S LEADING AUDIO PLATFORM
![Page 4: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/4.jpg)
Every minute, creators upload
12hrs of audio
![Page 5: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/5.jpg)
reaching over
250m
people every month
![Page 6: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/6.jpg)
8% of the internet
![Page 7: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/7.jpg)
![Page 8: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/8.jpg)
FOO FIGHTERS
SNOOP LION
MADONNA
MACKLEMORE
PRESIDENT OBAMA
JOHN OLIVER (DAILY SHOW/BUGLE)
SKRILLEX
![Page 9: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/9.jpg)
![Page 10: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/10.jpg)
How's the sales funnel performing in Brazil, and what's the split between products?
![Page 11: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/11.jpg)
DATA DEMOCRATIZATION
• Avoid silos
• Remove unnecessary restrictions
• Provide simple tools
• Teach people how to use data
![Page 12: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/12.jpg)
DATA DEMOCRATIZATION
In one sentence:
Deliver the right information to the right person at the right time.
![Page 13: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/13.jpg)
DATA ANALYSIS AND REPORTING, 2010-2012
PRODUCTION DB
ANALYTICS DB
AT Internet
![Page 14: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/14.jpg)
DATA ANALYSIS AND REPORTING
Listens, Sounds, Users, Comments, Favorites, Shares, Reposts
Impressions, Clicks, Conversions, Suggestions, Downloads, Taggings, Uploads
![Page 15: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/15.jpg)
DATA ANALYSIS AND REPORTING
Listens
timestamp
duration
sound
owner
listener
API-key
(location)
country
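The fields on this slide suggest a flat record per listen. A minimal sketch of such a record in Python follows; the field names mirror the slide, while the types, the units, the treatment of anonymous listeners, and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Listen:
    """One listen event; names follow the slide, types are assumptions."""
    timestamp: datetime          # when the listen happened
    duration_ms: int             # length of the listen (unit is an assumption)
    sound_id: int                # the sound that was played
    owner_id: int                # the creator who uploaded the sound
    listener_id: Optional[int]   # None for an anonymous listener (assumption)
    api_key: str                 # identifies the client application
    country: str                 # country code; its relation to (location) is not stated


# Sample values are invented for illustration.
sample = Listen(
    timestamp=datetime(2013, 11, 27, 12, 0, tzinfo=timezone.utc),
    duration_ms=183_000,
    sound_id=1,
    owner_id=2,
    listener_id=None,
    api_key="hypothetical-key",
    country="BR",
)
```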
![Page 16: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/16.jpg)
DATA ANALYSIS AND REPORTING
additional metadata:
• location within sound
• context (location on site)
• segmentation
Listening creates >6000 events/s
BIG DATA
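To put the event rate in perspective, here is a rough back-of-the-envelope calculation. It assumes the quoted 6000 events/s is sustained around the clock, which the slide does not state, so the results are only indicative.

```python
EVENTS_PER_SECOND = 6000          # lower bound quoted on the slide
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400

# Assumes the rate holds all day, every day (not stated in the deck).
events_per_day = EVENTS_PER_SECOND * SECONDS_PER_DAY
events_per_year = events_per_day * 365

print(f"{events_per_day:,} events/day")    # 518,400,000 events/day
print(f"{events_per_year:,} events/year")  # 189,216,000,000 events/year
```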
![Page 17: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/17.jpg)
HADOOP TO THE RESCUE
2 data centers in AMS
200+ Nodes
![Page 18: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/18.jpg)
HADOOP TO THE RESCUE
listen data
listen metadata
search data
recommender data
product testing data
backend production data
backend logs
![Page 19: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/19.jpg)
HADOOP AND DATA DEMOCRATIZATION
Data is siloed on Hadoop
No data governance in place
Technical hurdles for access
Not realtime
Slow access
![Page 20: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/20.jpg)
AMAZON REDSHIFT
Fast, fully managed DW service
Optimized for datasets of a petabyte or more
Fast query and I/O performance
Columnar storage technology
![Page 21: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/21.jpg)
2013 BI INFRASTRUCTURE
Source Systems: Hadoop, MySQL (production db), MySQL, External Systems, AT Internet
Pig/Ruby Scripts (Amazon EMR)
COPY
Staging Area
ETL Scripts
DataWarehouse
ETL Scripts
Data Exploration
Job execution powered by: (logo)
![Page 22: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/22.jpg)
How's the sales funnel performing in Brazil, and what's the split between products?
![Page 23: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/23.jpg)
ATI Data Query
Create query:
1. Filter on funnel pages
2. Select metrics and dimension
3. Add REST URL to ETL pipeline
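The three steps on this slide can be sketched as a small URL builder. The deck does not show the actual query URL, so the endpoint, the parameter names, the page names, and the metric names below are hypothetical placeholders, not AT Internet's real API.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint and parameter names, for illustration only.
BASE_URL = "https://reporting.example.invalid/v1/data"


def build_query_url(pages, metrics, dimension, country):
    """Assemble a report URL from a page filter, metrics and one dimension."""
    params = {
        "filter_pages": ",".join(pages),     # step 1: filter on funnel pages
        "filter_country": country,
        "metrics": ",".join(metrics),        # step 2: select metrics ...
        "dimension": dimension,              # ... and a dimension
    }
    return f"{BASE_URL}?{urlencode(params)}"


funnel_url = build_query_url(
    pages=["pricing", "checkout", "confirmation"],
    metrics=["visits", "conversions"],
    dimension="product",
    country="BR",
)

# Step 3: the ETL pipeline keeps a list of such URLs to fetch on each run.
ETL_SOURCES = [funnel_url]
```

Keeping the query as a plain URL means the pipeline only needs an HTTP fetch step to pull the report alongside the other source systems.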
![Page 24: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/24.jpg)
BI INFRASTRUCTURE
Source Systems: Hadoop, MySQL (production db), MySQL, External Systems, AT Internet
Pig/Ruby Scripts (Amazon EMR)
COPY
Staging Area
ETL Scripts
DataWarehouse
ETL Scripts
Data Exploration
Job execution powered by: (logo)
![Page 25: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/25.jpg)
DATA EXPLORATION
Simple and fast access to data
More time for “deep dives” into data
Individualized Reporting
Allows interactivity between users
Integrated with Redshift
![Page 26: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/26.jpg)
DATA DEMOCRATIZATION
• Reports designed by end users
• Central repository for data analysis
• User interaction
• Data from one source only
• Scalable solution
• Data to the people!
![Page 27: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/27.jpg)
QUESTIONS?
![Page 28: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/28.jpg)
THANK YOU!
P.S. WE’RE HIRING.
SOUNDCLOUD.COM/JOBS
![Page 29: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/29.jpg)
APPENDIX
![Page 30: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/30.jpg)
IMPORT DATA FROM SOURCE SYSTEMS
First: Gather data from the several source systems into S3
Source systems: Hadoop, MySQL (production db), MySQL, External Systems
Full/Daily Imports
MapReduce for:
- Listens
- Plays
- Impressions
- Affiliations
- ...
![Page 31: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/31.jpg)
IMPORT DATA FROM SOURCE SYSTEMS
Second: Rebuild staging area tables for full imports
Staging Area: tracks, users, client applications, ...
Based on configuration files (YAML config files)
CREATE statements are generated
DISTKEYs and SORTKEYs are re-created
Full control over changes to the data model
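As an illustration of generating CREATE statements from configuration, the sketch below renders a Redshift `CREATE TABLE` with `DISTKEY` and `SORTKEY` from a Python dict that stands in for a parsed YAML file. The config layout, table name, and column names are hypothetical; the deck does not show the real files.

```python
# Stand-in for a parsed YAML file; table and column names are hypothetical.
TABLE_CONFIG = {
    "name": "staging_tracks",
    "columns": [
        {"name": "id", "type": "BIGINT"},
        {"name": "user_id", "type": "BIGINT"},
        {"name": "created_at", "type": "TIMESTAMP"},
    ],
    "distkey": "user_id",
    "sortkeys": ["created_at"],
}


def create_statement(config):
    """Render a Redshift CREATE TABLE statement from a table config."""
    columns = ",\n".join(
        f"    {col['name']} {col['type']}" for col in config["columns"]
    )
    sortkeys = ", ".join(config["sortkeys"])
    return (
        f"CREATE TABLE {config['name']} (\n{columns}\n)\n"
        f"DISTKEY ({config['distkey']})\n"
        f"SORTKEY ({sortkeys});"
    )


def rebuild_statements(config):
    """Drop and re-create the table so key changes take effect."""
    return [f"DROP TABLE IF EXISTS {config['name']};", create_statement(config)]


for statement in rebuild_statements(TABLE_CONFIG):
    print(statement)
```

Because the table is dropped and re-created on each full import, changing a distribution or sort key becomes a one-line edit to the config rather than a manual migration.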
![Page 32: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/32.jpg)
IMPORT DATA FROM SOURCE SYSTEMS
Third: Import the data from S3 to Redshift
Staging Area: tracks, users, client applications, ...
Full import: TRUNCATE & COPY
Daily import: COPY
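The difference between the two import modes can be sketched as a helper that returns the SQL for one staging table. The table names, bucket paths, and role ARN are placeholders, and authorizing `COPY` with `IAM_ROLE` is an assumption; the deck does not say how credentials were supplied or which tables use which mode.

```python
def import_statements(table, s3_path, iam_role, full):
    """Return the SQL to load one staging table from S3.

    A full import empties the table first; a daily import only appends.
    """
    copy = f"COPY {table} FROM '{s3_path}' IAM_ROLE '{iam_role}';"
    return [f"TRUNCATE {table};", copy] if full else [copy]


# Bucket, prefixes and role ARN are placeholders.
ROLE = "arn:aws:iam::123456789012:role/hypothetical-redshift-role"

full_import = import_statements(
    "staging_users", "s3://example-bucket/users/", ROLE, full=True
)
daily_import = import_statements(
    "staging_listens", "s3://example-bucket/listens/2013-11-27/", ROLE, full=False
)
```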
![Page 33: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/33.jpg)
ETL AND DW DATAMODEL
ETL scripts divided into layers:
- Layer 1: Staging -> DW (dimensions)
- Layer 2: Staging -> DW (fact tables - raw data)
- Layer 3: DW -> DW (aggregated fact tables)
- Layer 4: DW -> Reporting Data Cubes (reporting data)
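To make the step from Layer 2 to Layer 3 concrete, here is a small sketch that aggregates a raw fact table into a daily fact table. SQLite stands in for Redshift so the example runs anywhere; the table names, columns, and rows are invented for illustration.

```python
import sqlite3

# SQLite stands in for Redshift here; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Layer 2 output: one row per raw listen event.
    CREATE TABLE fact_listens (listen_date TEXT, country TEXT, sound_id INTEGER);
    INSERT INTO fact_listens VALUES
        ('2013-11-26', 'BR', 1),
        ('2013-11-26', 'BR', 2),
        ('2013-11-26', 'DE', 1),
        ('2013-11-27', 'BR', 1);

    -- Layer 3: aggregate the raw facts into a smaller DW table.
    CREATE TABLE fact_listens_daily AS
        SELECT listen_date, country, COUNT(*) AS listens
        FROM fact_listens
        GROUP BY listen_date, country;
    """
)

rows = conn.execute(
    "SELECT listen_date, country, listens FROM fact_listens_daily "
    "ORDER BY listen_date, country"
).fetchall()
for row in rows:
    print(row)
```

Reports in Layer 4 would then read from the aggregated table instead of scanning the raw events.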
![Page 34: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/34.jpg)
ETL AND DW DATAMODEL
Staging Area
ETL Layer 1 & 2: Data Cleaning, Data Transformation (Ruby/SQL Scripts)
DataWarehouse
ETL Layer 3: Data Aggregation (Ruby/SQL Scripts)
ETL Layer 4: Data Presentation (SQL)
Data Exploration
![Page 35: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/35.jpg)
JOB SCHEDULE AND EXECUTION
Job-scheduling tool developed internally
Set dependencies between jobs
Execution on multiple machines
Supports all the ETL layers
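The internal tool itself is not shown in the deck. As a rough sketch of dependency-aware scheduling in general, the example below uses Python's standard `graphlib.TopologicalSorter` to group jobs into batches that can run once their dependencies have finished. The job names and the dependency graph are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each job maps to the set of jobs it depends on; names are hypothetical.
JOBS = {
    "load_staging": set(),
    "dw_dimensions": {"load_staging"},               # layer 1
    "dw_facts": {"load_staging"},                    # layer 2
    "dw_aggregates": {"dw_dimensions", "dw_facts"},  # layer 3
    "reporting_cubes": {"dw_aggregates"},            # layer 4
}


def run_in_batches(jobs):
    """Yield batches of jobs whose dependencies have all finished.

    Jobs within one batch are independent of each other, so a real
    scheduler could hand them to different machines.
    """
    sorter = TopologicalSorter(jobs)
    sorter.prepare()
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        yield ready
        sorter.done(*ready)


batches = list(run_in_batches(JOBS))
for number, batch in enumerate(batches, start=1):
    print(number, batch)
```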
![Page 36: Sound cloud - User & Partner Conference - AT Internet](https://reader034.fdocuments.in/reader034/viewer/2022052600/558198cad8b42a417f8b5112/html5/thumbnails/36.jpg)
TIMELINE
Week 2, Week 4, Week 6, Week 8, Week 10, Week 12, Week 14, Week 16
Analysis Stage
Requirement Analysis
• Gap Analysis
• Business Exploration (requirements interviews)
• Information Mapping
Design
• Solution Design (Draft)
Milestone: End of Analysis Stage
Design & Build
• Define Infrastructure
• Design Data Model
Milestone: Infrastructure Ready!
• Build ETL
• Build Data Cubes
• Design Reports/Dashboards (Presentation Layer)
Milestone: BI 1.0 is built!
Test & Deploy
• System/Integration Tests
• User Acceptance
Milestone: BI 1.0 is tested!
• User Workshops
• BI 1.0 Evaluation
Milestone: BI 1.0 is ready to use!