Replicated RocksDB at Pinterest @scale 2016 San Jose

Post on 16-Apr-2017

August 31, 2016

Pinterest Engineering

Bo Liu, Software Engineer, Serving Systems

Replicated RocksDB at Pinterest

Example 1: Online event tracking system

Kafka

Writes
• John saw Pin 1, Pin 2, … Pin K at Time T

Reads
• Fetch the last 1,000 Pins seen by John
• Fetch the number of Pins seen by John between Time T1 and T2
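The two reads in Example 1 map naturally onto range scans over an ordered key-value store such as RocksDB. The slides do not show the actual key schema, so the sketch below is only one possible layout: it assumes keys of the form (user, timestamp, pin) with integer timestamps, and it uses a sorted Python list as a stand-in for a RocksDB instance. The class and method names are hypothetical.

```python
import bisect


class EventStore:
    """Toy ordered key-value store standing in for one RocksDB instance."""

    def __init__(self):
        self._keys = []  # sorted list of (user, timestamp, pin) keys

    def record_seen(self, user, timestamp, pins):
        # One key per (user, time, pin); sort order groups a user's events by time.
        for pin in pins:
            bisect.insort(self._keys, (user, timestamp, pin))

    def last_seen(self, user, limit=1000):
        # Locate the user's key range, then walk it backwards from the newest event.
        lo = bisect.bisect_left(self._keys, (user,))
        hi = bisect.bisect_left(self._keys, (user, float("inf")))
        newest_first = self._keys[lo:hi][::-1]
        return [pin for _, _, pin in newest_first[:limit]]

    def count_seen_between(self, user, t1, t2):
        # Counts events with t1 <= timestamp <= t2 (assumes integer timestamps).
        lo = bisect.bisect_left(self._keys, (user, t1))
        hi = bisect.bisect_left(self._keys, (user, t2 + 1))
        return hi - lo
```

With this layout both reads touch only one contiguous key range per user, which is the access pattern an LSM-tree store handles well.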

Example 2: Board based Pin retrieving and ranking system

Kafka

Writes
• John just followed Board 1
• Pin 1 was just saved to Board 1

Reads
• Fetch the most relevant Pins followed by John
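As a rough illustration of Example 2, the sketch below keeps two mappings (user to followed boards, board to saved Pins) and answers the read by merging the Pins of all followed boards and taking the top k. The slides do not describe how relevance is computed, so the numeric `score` passed to `save_pin` is purely an assumption, as are all names here.

```python
import heapq
from collections import defaultdict


class BoardFeed:
    """Hypothetical in-memory model of the two write streams and the read."""

    def __init__(self):
        self.followed = defaultdict(set)     # user -> boards the user follows
        self.board_pins = defaultdict(list)  # board -> [(score, pin), ...]

    def follow(self, user, board):
        self.followed[user].add(board)

    def save_pin(self, board, pin, score):
        # `score` is a placeholder relevance value, not Pinterest's ranking.
        self.board_pins[board].append((score, pin))

    def top_pins(self, user, k):
        # Gather Pins from every followed board, keep the k highest scores.
        candidates = (
            entry
            for board in self.followed[user]
            for entry in self.board_pins[board]
        )
        return [pin for _, pin in heapq.nlargest(k, candidates)]
```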

Example 3: Distributed storage system with data structure support

Kafka

Writes
• Add u to HyperLogLog A
• Add e to List B

Reads
• Fetch List B
• Fetch the unique member # of HyperLogLog A
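For readers unfamiliar with HyperLogLog, the following is a minimal textbook-style sketch of the "add" and "count unique members" operations from Example 3. It is not taken from the talk: how the service encodes the structure inside RocksDB is not shown in the slides, and the hash function (SHA-1 truncated to 64 bits) and the default precision are assumptions made for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Approximate distinct counter using 2**precision small registers."""

    def __init__(self, precision=12):
        self.p = precision
        self.m = 1 << precision          # number of registers (needs m >= 128)
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")       # 64-bit hash value
        index = h >> (64 - self.p)                  # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1 bit in `rest`.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        estimate = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            estimate = self.m * math.log(self.m / zeros)
        return round(estimate)
```

Adding the same element twice leaves the registers unchanged, which is what makes the structure a good fit for a stream of "Add u to HyperLogLog A" writes.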

Common system architecture

Components in the diagram:
• Application API, Admin API
• Application Logic, Admin Logic
• RocksDB Replicator
• RocksDB (four instances)
• ZooKeeper
• Admin tool

Interactions, in the order the slides introduce them:
• Generate cluster config
• Load config at startup
• Create/Open DB
• Add/Remove DB for replication
• Data replication: local updates, remote updates
• Cluster management
• GetDB()
• Read/Write
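The "Generate cluster config" and "Load config when start" steps can be pictured with a small sketch. The slides do not show the config format or how it is stored, so everything below is an assumption: a JSON document mapping each DB name to a master host and slave hosts, produced by a hypothetical admin-tool function and read by a hypothetical startup function. It does not talk to ZooKeeper.

```python
import json


def generate_cluster_config(hosts, num_shards, replicas=2):
    """Assign each shard a master and slaves round-robin over the hosts.

    Assumes replicas <= len(hosts) so a shard never lands twice on one host.
    """
    config = {}
    for shard in range(num_shards):
        owners = [hosts[(shard + i) % len(hosts)] for i in range(replicas)]
        config["shard%05d" % shard] = {"master": owners[0], "slaves": owners[1:]}
    return json.dumps(config, sort_keys=True)


def dbs_to_open(config_json, my_host):
    """What a host would create or open at startup: {db_name: role}."""
    roles = {}
    for db_name, placement in json.loads(config_json).items():
        if placement["master"] == my_host:
            roles[db_name] = "MASTER"
        elif my_host in placement["slaves"]:
            roles[db_name] = "SLAVE"
    return roles
```

For example, with three hosts and three shards, each host ends up master of one DB and slave of another, which matches the idea of several RocksDB instances with different roles living in one process.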

RocksDB replicator design
• Support async Master-Slave replication only
• Replicate multiple RocksDBs in one process
• Replication role at RocksDB instance level
• Work reactively (AddDB(), RemoveDB())
• Low replication latency
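The design points above (several DBs per process, a role per DB instance, and a reactive AddDB()/RemoveDB() interface) could be modelled roughly as follows. Only the names AddDB and RemoveDB come from the slides; the Python class, its method signatures, and the `upstream` parameter are hypothetical stand-ins, not the library's API.

```python
import threading
from enum import Enum


class Role(Enum):
    MASTER = "master"
    SLAVE = "slave"


class Replicator:
    """Hypothetical registry: one replication role per DB in this process."""

    def __init__(self):
        self._lock = threading.Lock()
        self._dbs = {}  # db_name -> (role, upstream address or None)

    def add_db(self, db_name, role, upstream=None):
        if role is Role.SLAVE and upstream is None:
            raise ValueError("a slave needs an upstream address to pull from")
        with self._lock:
            if db_name in self._dbs:
                raise KeyError("%s is already being replicated" % db_name)
            self._dbs[db_name] = (role, upstream)

    def remove_db(self, db_name):
        with self._lock:
            del self._dbs[db_name]

    def role_of(self, db_name):
        with self._lock:
            return self._dbs[db_name][0]
```

The registry does nothing until the application calls it, which is one reading of "work reactively": the replicator makes no placement decisions of its own.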

RocksDB replicator implementation
• RocksDB WAL sequence # as global replication sequence #
• fbthrift for RPC
• Pull & Push
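To make the "WAL sequence # as global replication sequence #" idea concrete, here is a toy model in which every write gets the next sequence number and a slave asks for everything after its own latest number. RocksDB's C++ API does expose `DB::GetLatestSequenceNumber()` and `DB::GetUpdatesSince()`, but the slides do not say which calls the replicator uses, and the `ToyDB` class below is an in-memory stand-in, not RocksDB. It also simplifies by giving each write exactly one sequence number.

```python
class ToyDB:
    """In-memory stand-in for a RocksDB instance with a write-ahead log."""

    def __init__(self):
        self.data = {}
        self.wal = []  # wal[i] holds the write with sequence number i + 1

    def latest_seq(self):
        return len(self.wal)

    def put(self, key, value):
        self.wal.append((key, value))
        self.data[key] = value

    def updates_since(self, seq):
        # All writes with sequence number greater than `seq`, in order.
        return self.wal[seq:]


def pull_once(master, slave):
    """One replication round: fetch everything after the slave's latest sequence #."""
    updates = master.updates_since(slave.latest_seq())
    for key, value in updates:
        slave.put(key, value)  # applying in order keeps sequence numbers aligned
    return len(updates)
```

Because the slave's own latest sequence number is the only cursor, no separate replication offset has to be persisted in this model.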

RocksDB replicator workflow

Diagram: Thrift Server, Worker threads, DB1 (Master), DB2 (Slave, Upstream: ip_Port)

Slave DB2 pulling from its upstream
1. Latest SEQ #
2. Get updates since SEQ# for DB2
3. Updates since SEQ# for DB2
4. Apply updates

Master DB1 serving a request when updates are available
1. Get updates since SEQ# for DB1
2. Send request
3. Has updates since SEQ#?
4. Yes, this is the data
5. Response

Master DB1 serving a request when there are no new updates
1. Get updates since SEQ# for DB1
2. Send request
3. Has updates since SEQ#?
4. No, wait for my notification
5. Writes
6. These are the new updates
7. Response
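The "No, wait for my notification" path in the workflow is essentially a long poll: the request is held until a write arrives, then answered immediately. The sketch below imitates that with a condition variable inside one process. It is an illustration only: the real system uses fbthrift RPC between hosts, and the class name, method names, and the `max_wait_s` timeout are assumptions.

```python
import threading


class MasterLog:
    """Toy master-side log that can hold a request until new writes arrive."""

    def __init__(self):
        self._cond = threading.Condition()
        self._log = []  # entry i has sequence number i + 1

    def write(self, update):
        with self._cond:
            self._log.append(update)
            self._cond.notify_all()  # wake any request parked on this DB

    def get_updates_since(self, seq, max_wait_s=1.0):
        with self._cond:
            # "Has updates since SEQ#?" If not, wait for a notification or time out.
            self._cond.wait_for(lambda: len(self._log) > seq, timeout=max_wait_s)
            return self._log[seq:]  # empty list if the wait timed out


def replicate(master, applied, target_seq):
    """Slave loop: keep pulling until `target_seq` updates have been applied."""
    while len(applied) < target_seq:
        applied.extend(master.get_updates_since(len(applied)))
```

Holding the request open means the slave sees a new write after roughly one network hop instead of waiting for its next polling interval, which is consistent with the low-latency goal stated earlier.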

Performance
• Production load: 1MB/s, P99 12ms, Max 60ms
• Synthetic load: 76MB/s, P99 106ms, Max 224ms
• Developer velocity: Build a production quality real-time counter service in one week

Open source - coming soon

Thank you

Serving Systems Team @Pinterest: Bo Liu, Shu Zhang, Jian Fang, Jinru He, Linda Lo, Yongsheng Wu

Data Analytics Team @Pinterest: Bryant Xiao, Justin Mejorada Pier, Shuo Xiang, Qingxian Lai, Tien Nguyen, Chunyan Wang

Q&A