Scaling the Data Infrastructure @ Spotify

matti@spotify.com

Matti Pehrs

> 25 years in IT

Emacs since 1987

Java since 1995

Spotify since 2013

Agenda

1. Data at Spotify

2. Summer of 2015

3. Challenges & Victory

○ Datamon

○ Styx

○ GABO

Data at Spotify

Hadoop

Data history through Hadoop

➔ Started with 5 servers in the office in 2007

➔ Tried Amazon Elastic MapReduce in 2010

➔ In 2012 we started using Hortonworks HDP

➔ We now run a 2000+ node cluster in London

➔ We are out of physical space!

Spotify is moving to the cloud

It’s all about focus.

Spotify’s core business is to serve music, not operate data centers.

Spotify big-data context

● Over 100 million monthly active users

● Over 30 million songs

● Over 2 billion playlists

● Active in 60 markets

Our growth in Data

Users

+50 TB/day

Developers

+60 TB/day

+10k M/R jobs

Data is at the heart of Spotify

In 2007

- Reporting

In 2017

- Reporting

- All features use big data in some form

Hadoop

Autonomy & Dependencies

Team A

Team B

Team C

Summer of 2015

Summer of Incidents

● A string of incidents

● War room

● Hadoop on its knees

● Event Delivery catch-up

● Reprocessing of data

● Hard-to-debug data issues

Challenges and the path to victory...

1. Early Warning: Datamon - Data monitoring

2. Debuggability & Control: Styx - Scheduling and control

3. Automate Capacity: GABO - Event Delivery

Early Warning - Datamon

● Unified view

○ Alignment between teams

● Ownership

○ Clear ownership of data

● SLA

○ Alert on late data

● Define terminology

● Provide metadata language

● Implement a Datamon service
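
The "alert on late data" idea can be sketched roughly as follows. This is only an illustration, not Datamon's actual design: the `DatasetSla` record, its fields, and the `late_datasets` function are hypothetical names standing in for the metadata language and service named on the slides.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class DatasetSla:
    """Hypothetical metadata record: a dataset, its owner, and its allowed delay."""
    dataset: str
    owner: str
    max_delay: timedelta  # how long after the partition ends the data may arrive


def late_datasets(
    slas: List[DatasetSla],
    arrivals: Dict[str, datetime],
    partition_end: datetime,
    now: datetime,
) -> List[Tuple[str, str]]:
    """Return (dataset, owner) pairs whose partition is missing past its deadline."""
    late = []
    for sla in slas:
        due = partition_end + sla.max_delay
        # Alert only when the deadline has passed and nothing has arrived yet.
        if sla.dataset not in arrivals and now > due:
            late.append((sla.dataset, sla.owner))
    return late
```

Keeping the owner next to the deadline means an alert can be routed to the team that owns the data, which is the link between the "Ownership" and "SLA" bullets.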

Debuggability & Control - Styx

● Execution control

○ Self service for data users

● Execution information

○ Expose debug information

● Execution isolation

○ Docker for data jobs

The river Styx

Debuggability & Control - Styx

● Execution control

○ Centralized execution API

○ Backfilling and reprocessing

● Execution information

○ Timeline

○ Google Cloud Logging

● Execution isolation

○ Docker
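
Backfilling through a centralized execution API might look something like the sketch below. It is not the Styx API: `backfill_partitions`, `run_backfill`, and the `trigger` callback are hypothetical, and daily partitions are assumed purely for illustration.

```python
from datetime import date, timedelta
from typing import Callable, List


def backfill_partitions(start: date, end: date) -> List[str]:
    """Return daily partition IDs for the half-open range [start, end)."""
    days = (end - start).days
    # An empty or reversed range yields no partitions.
    return [(start + timedelta(days=i)).isoformat() for i in range(max(days, 0))]


def run_backfill(start: date, end: date, trigger: Callable[[str], None]) -> int:
    """Call the (hypothetical) trigger once per partition; return how many ran."""
    partitions = backfill_partitions(start, end)
    for partition in partitions:
        trigger(partition)
    return len(partitions)
```

For example, `run_backfill(date(2017, 1, 1), date(2017, 1, 4), print)` would trigger the three partitions from 2017-01-01 through 2017-01-03. Routing every rerun through one entry point is what makes reprocessing something data users can do themselves.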

Automate Capacity - GABO/Event Delivery

● Complex and manual config

● Pubsub & Dataflow streaming

● Pubsub at scale

● Dataflow streaming :-(

● 2 microservices + 1 Map/Reduce job

● Autoscaling & The Stuffer
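
The slides do not describe how the autoscaling works or what the Stuffer does, so the following is only a generic sketch of backlog-driven autoscaling for a queue consumer. The function name, its parameters, and the sizing rule are all assumptions, not Spotify's implementation.

```python
import math


def desired_workers(
    backlog_msgs: int,
    incoming_per_sec: float,
    per_worker_per_sec: float,
    catch_up_secs: float,
    min_workers: int = 1,
    max_workers: int = 100,
) -> int:
    """Size a consumer pool to keep up with traffic and drain the backlog in time."""
    # Rate needed = live traffic + the backlog spread over the catch-up window.
    required_rate = incoming_per_sec + backlog_msgs / catch_up_secs
    wanted = math.ceil(required_rate / per_worker_per_sec)
    # Clamp so a bad measurement cannot scale to zero or without bound.
    return max(min_workers, min(max_workers, wanted))
```

Deriving capacity from measured backlog and throughput, rather than from hand-maintained configuration, is one way to replace the "complex and manual config" from the first bullet.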

Summary

● Make sure you have the right tools to deal with data incidents

○ Make sure you have time to implement the tools you need

● Remember that your capacity model can fail at larger scale

○ Keep track of your scale and automate, automate, automate...

Thank you!

matti@spotify.com

Want to join the band? http://spoti.fi/jobs