Databus - LinkedIn's Change Data Capture Pipeline
Databus
LinkedIn’s Change Data Capture Pipeline
Databus Team @ LinkedIn
Sunil Nagaraj
http://www.linkedin.com/in/sunilnagaraj
Eventbrite, May 07 2013
Talking Points
– Motivation and Use-Cases
– Design Decisions
– Architecture
– Sample Code
– Performance
– Databus at LinkedIn
– Review
The Consequence of Specialization in Data Systems
Data consistency is critical!
Data flow is essential.

Two Ways
1. Extract changes from the database commit log
– Tough, but possible
– Consistent!
2. Application code dual-writes to the database and a pub-sub system
– Easy on the surface
– Consistent?
Change Extract: Databus
[Diagram: updates go to the primary data store; Databus carries the data
change events through standardization stages into a search index, a graph
index, and read replicas.]
Example: External Indexes
Description
– Full-text and faceted search over profile data
Requirements
– Timeline consistency
– Guaranteed delivery
– Low latency
– User-space visibility
[Diagram: members update their skills on linkedin.com; change events flow
through Databus into the People Search Index, which serves search results to
recruiters on recruiter.linkedin.com.]
A brief history of Databus
2006-2010 : Databus became an established and vital piece of infrastructure for consistent data flow from Oracle
2011 : Databus (V2) addressed scalability and operability issues
2012 : Databus supported change capture from Espresso
2013 : Open Source Databus – https://github.com/linkedin/databus
Databus Eco-system: Participants
[Diagram: change data flows from the primary data store (source) through
Databus change data capture into the change event stream, and on as events
to consumer applications.]
Source
• Supports transactions
Databus
• Extracts the changed data of committed transactions
• Transforms it into 'user-space' events
• Preserves atomicity
Consumer
• Receives change events quickly
• Preserves consistency with the source
Databus Eco-System : Realities
[Diagram: consumers vary (a fast consumer wants the changes of the last 5
seconds, a new consumer wants the changes since last week, a slow consumer
wants every change) while source schemas evolve.]
• The source cannot be burdened by 'long look-back' extracts
• Applications cannot all be forced to move to the latest version of the schema at once
Key Design Decisions : Semantics
Change data capture uses logical clocks attached to the source (SCN)
– The change data stream is ordered by SCN
– Simplifies data portability: the change stream is f(SourceState, SCN)
Applications are idempotent
– At-least-once delivery
– Progress is tracked reliably (SCN)
– Timeline consistency
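The combination of at-least-once delivery and idempotent, SCN-tracking consumers can be sketched as follows. This is purely illustrative (the class and method names are not the actual Databus API): a consumer that skips any event at or below its last-applied SCN applies each change exactly once even when the stream redelivers after a restart.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (not the Databus API): an idempotent consumer that
// tracks progress via the last-applied SCN, so redelivered events under
// at-least-once delivery are skipped rather than applied twice.
public class IdempotentConsumer {
    private long lastAppliedScn = -1;           // would be persisted in practice
    private final List<String> applied = new ArrayList<>();

    // Each change event carries the logical clock (SCN) of its transaction.
    public void onEvent(long scn, String changeData) {
        if (scn <= lastAppliedScn) {
            return;                             // duplicate delivery: ignore
        }
        applied.add(changeData);                // apply the change exactly once
        lastAppliedScn = scn;                   // then advance the checkpoint
    }

    public long getLastAppliedScn() { return lastAppliedScn; }
    public List<String> getApplied() { return applied; }

    public static void main(String[] args) {
        IdempotentConsumer c = new IdempotentConsumer();
        c.onEvent(100, "update A");
        c.onEvent(101, "update B");
        c.onEvent(101, "update B");             // redelivery after a restart
        System.out.println(c.getApplied().size() + " events applied, SCN=" + c.getLastAppliedScn());
    }
}
```

Because the checkpoint is the SCN of the source's own logical clock, the consumer can also reason about how far behind the source it is.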
Key Design Decisions : Systems
Isolate fast consumers from slow consumers
– Workload separation between online (recent), catch-up (old), and bootstrap (all)
Isolate sources from consumers
– Schema changes
– Physical layout changes
– Speed mismatch
Schema-awareness
– Compatibility checks
– Filtering at the change stream
The Components of Databus
[Diagram: the Relay runs change capture against the DB and holds change data
in an in-memory event buffer; online changes flow to the Databus client
inside each application. The Bootstrap service (log store plus snapshot
store) consumes the same online changes, serving a consistent snapshot to
new applications and older changes to slow applications. Shared metadata
ties the components together.]
Change Data Capture
Contains the logic to extract changes from the source from a specified SCN.
Implementations
– Oracle: trigger-based; commit ordering; special instrumentation required
– MySQL: custom-storage-engine based
EventProducer API
– start(SCN) // capture changes from the specified SCN
– SCN getSCN() // return the latest SCN
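The EventProducer contract above is small: begin capture from a given SCN and report the latest SCN seen. A minimal sketch, with method names taken from the slide but all surrounding types illustrative rather than the real Databus interfaces:

```java
// Sketch of the change-capture contract from the slide (start(SCN),
// getSCN()); the toy in-memory implementation below is illustrative only.
public class EventProducerSketch {

    interface EventProducer {
        void start(long scn);   // capture changes from the specified SCN
        long getSCN();          // return the latest SCN captured so far
    }

    // Toy producer that simply advances its SCN as committed "changes" arrive,
    // in commit order, starting from the requested SCN.
    static class InMemoryProducer implements EventProducer {
        private long scn = -1;
        public void start(long fromScn) { scn = fromScn; }
        public long getSCN() { return scn; }
        void onCommit(long commitScn) { scn = Math.max(scn, commitScn); }
    }

    public static void main(String[] args) {
        InMemoryProducer p = new InMemoryProducer();
        p.start(10);            // resume capture from SCN 10
        p.onCommit(12);
        System.out.println("latest SCN: " + p.getSCN());
    }
}
```

A real implementation (triggers for Oracle, a custom storage engine for MySQL) does the hard part: reconstructing committed changes from the source in commit order.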
MySQL : Change Data Capture
[Diagram: a MySQL master replicates to a MySQL slave via MySQL replication;
the Relay attaches over a TCP channel.]
• MySQL replication takes care of
– bin-log parsing
– the protocol between master and slave
– handling restarts
• The Relay
– provides a TCP protocol interface to push events
– controls and manages the MySQL slave
Publish – Subscribe API
[Diagram: change data capture extracts (src, SCN) from the DB and publishes
into the in-memory event buffer; consumers subscribe with (src, SCN).]
EventBuffer
– startEvents() // e.g. new txn
– appendEvent(DbusEvent, ...) // DbusEvent(enc(schema, changeData), src, pk)
– endEvents(SCN) // e.g. end of txn; commit
– rollbackEvents() // abort this window
Consumer
– register(source, Callback)
– onStartConsumption() // once
– onStartDataEventSequence(SCN)
– onStartSource(src, Schema)
– onDataEvent(DbusEvent e, ...)
– onEndSource(src, Schema)
– onEndDataEventSequence(SCN)
– onRollback(SCN)
– onStopConsumption() // once
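The publish side of this API gives transactions their atomicity: events are appended into a window that becomes visible only at endEvents(SCN), or vanishes on rollback. A hedged sketch of that write protocol, using the method names above but an invented in-memory buffer rather than the real Databus EventBuffer:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the event-buffer write protocol from the slide:
// events of one transaction are appended inside a window that is made
// visible atomically at endEvents(SCN), or discarded by rollbackEvents().
public class EventBufferSketch {
    private final List<String> committed = new ArrayList<>(); // visible to consumers
    private final List<String> window = new ArrayList<>();    // current txn window
    private long lastScn = -1;

    public void startEvents() { window.clear(); }             // e.g. new txn
    public void appendEvent(String event) { window.add(event); }
    public void endEvents(long scn) {                         // end of txn: commit window
        committed.addAll(window);
        window.clear();
        lastScn = scn;
    }
    public void rollbackEvents() { window.clear(); }          // abort this window

    public List<String> getCommitted() { return committed; }
    public long getLastScn() { return lastScn; }

    public static void main(String[] args) {
        EventBufferSketch buf = new EventBufferSketch();
        buf.startEvents();
        buf.appendEvent("member:42 updated");
        buf.endEvents(100);                                   // committed txn
        buf.startEvents();
        buf.appendEvent("member:43 updated");
        buf.rollbackEvents();                                 // aborted txn leaves no trace
        System.out.println(buf.getCommitted().size() + " events at SCN " + buf.getLastScn());
    }
}
```

The consumer-side callbacks mirror this: onStartDataEventSequence/onEndDataEventSequence bracket a transaction window, and onRollback tells the consumer to discard partial work.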
The Databus Change Event Stream
• Provides APIs to obtain change events; a query specifies the logical clock (SCN) and the source: 'get change events greater than SCN'
• Filtering at the source is possible
– MOD and RANGE filter functions applied to the primary key of the event
• Batching/chunking guarantees progress
• Does not contain consumer state; contains references to metadata and schemas
• Implementation
– HTTP server exposing a REST API
– Persistent connections to clients
• Served by the Relay (online changes, from the in-memory event buffer) and by the Bootstrap service (log store and snapshot store)
Meta-data Management
Event definition, serialization, and transport
– Avro
Oracle, MySQL
– The table schema generates the Avro definition
Schema evolution
– Only backwards-compatible changes are allowed
– Applications are isolated from changes in the source schema
– Many versions of a source may be in use by applications, but only one version (the latest) of the change stream exists
The Databus Relay
[Diagram: the Relay hosts change capture and the in-memory event buffer,
along with database schemas and source metadata, behind the SCN store API.]
• Encapsulates the change capture logic and the change event stream
• Source-aware and schema-aware
• Multi-tenant: multiple event buffers representing the change events of different databases
• Optimizations
– An index on SCN quickly locates the physical offset in the event buffer
– The SCN per source is stored locally for efficient restarts
– Large event buffers are possible (> 2 GB)
Scaling Databus Relay
Peer relays, independent
• Each relay reads directly from the source DB
• Load on the source DB increases with each additional relay instance
Relays in a leader-follower cluster
• Only the leader reads from the DB; followers read from the leader
• Leadership is assigned dynamically
• There is a small period of stream unavailability during leadership transfer
The Bootstrap Service
Bridges the continuum between stream and batch systems
• A catch-all for slow and new consumers
• Isolates the source instance from large scans
• The snapshot store has to be seeded once from the database
• Optimizations
– Periodic merge
– Filtering pushed down to the store
– Catch-up versus full bootstrap
• Guaranteed progress for consumers via chunking
• Multi-tenant: can contain data from many different databases
• Implementations
– Database (MySQL)
– Raw files
[Diagram: online changes from the Relay feed the Bootstrap service's log
store and snapshot store, which serve bootstrap consumers.]
The Databus Client Library
• The glue between the Databus change stream and the business logic in the consumer
• Switches between relay and bootstrap as needed
• Optimizations
– Change events are written with a batch write API, without deserialization
• Periodically persists the SCN for lossless recovery
• Built-in support for parallelism
– Consumers need to be thread-safe
– Useful for scaling large batch processing (bootstrap)
[Diagram: the client reads from the Databus change stream into a local event
buffer; a dispatcher iterates over the events and invokes callbacks on
stream and bootstrap consumers, persisting progress via the SCN store API.]
Databus Applications
[Diagram: an application hosts consumers S1, S2, ..., Sn behind the Databus
client, each attached to its own change stream.]
• Applications can process multiple independent change streams
• Failure of one stream won't affect the others
• Different logic and configuration settings are possible for bootstrap and online consumption
• Processing can be tied to a particular version of a schema
• The SCN persisted by the client library can be overridden
Scaling Applications - I
[Diagram: the change stream is partitioned by i = pk MOD N; one client
application handles partitions i = 0..k-1, another handles i = k..N-1.]
• Databus clients consume partitioned streams
• Partitioning strategy: range or hash
• The partitioning function is applied at the source
• The number of partitions (N) and the list of partitions (i) are specified statically in configuration
– Not easy to add or remove nodes: this needs a configuration change on all nodes
• Client nodes are uniform: any node can process any partition(s)
• Clients distribute the processing load
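The MOD partitioning above can be sketched as a simple filter: an event with primary key pk belongs to partition pk MOD N, and each node statically accepts a contiguous range of partitions. The class and parameter names here are illustrative, not the actual Databus filter configuration:

```java
// Illustrative sketch of MOD partitioning: the filter is applied at the
// change stream, and each client node subscribes to a fixed range of the
// N partitions, where an event maps to partition pk MOD N.
public class ModPartitionFilter {
    private final int numPartitions;    // N, fixed in configuration
    private final int minPartition;     // this node handles [min, max]
    private final int maxPartition;

    public ModPartitionFilter(int numPartitions, int minPartition, int maxPartition) {
        this.numPartitions = numPartitions;
        this.minPartition = minPartition;
        this.maxPartition = maxPartition;
    }

    // True if the event with this primary key belongs to this node.
    public boolean accepts(long primaryKey) {
        long partition = Math.floorMod(primaryKey, (long) numPartitions);
        return partition >= minPartition && partition <= maxPartition;
    }

    public static void main(String[] args) {
        // Two nodes split N = 4 partitions: node A takes 0..1, node B takes 2..3.
        ModPartitionFilter nodeA = new ModPartitionFilter(4, 0, 1);
        ModPartitionFilter nodeB = new ModPartitionFilter(4, 2, 3);
        System.out.println(nodeA.accepts(5));  // 5 mod 4 = 1, so node A
        System.out.println(nodeB.accepts(7));  // 7 mod 4 = 3, so node B
    }
}
```

The static scheme on this slide bakes (N, min, max) into every node's configuration; the dynamic scheme on the next slide keeps N fixed but lets a cluster manager reassign the partition ranges as nodes join and leave.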
Scaling Applications - II
[Diagram: the Databus stream is partitioned by i = pk mod N; the N
partitions are distributed evenly amongst 'm' client nodes, with dynamically
allocated partitions, and SCNs are written to a central location.]
• Databus clients consume partitioned streams
• Partitioning strategy: MOD
• The partition function is applied at the source
• The number of partitions (N) and the cluster name are specified statically in configuration
• Easy to add or remove nodes
– Dynamic redistribution of partitions
– Fault tolerance for client nodes
Databus: Current Implementation
• OS: Linux; written in Java; runs on Java 6
• All components have HTTP interfaces
• Databus client: Java
– Other language bindings are possible
– All communication with the change stream is via HTTP
• Libraries
– Netty, for HTTP clients and servers
– Avro, for serialization of change events
– Helix, for cluster awareness
Sample Code: Simple Application
Sample Code - Consumer
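The code from these two slides is not preserved in this transcript. Below is a hedged reconstruction of the shape of a simple consumer, using the callback names listed in the Publish-Subscribe API slide; the interface and signatures are simplified stand-ins, and the real open-source client API (github.com/linkedin/databus) differs in detail:

```java
// Illustrative sketch of a Databus consumer; the interface below is a
// simplified stand-in for the client library's callback API, not the
// actual open-source signatures.
public class SampleConsumer {

    interface DatabusConsumer {
        void onStartConsumption();                    // once
        void onStartDataEventSequence(long scn);      // begin txn window
        void onDataEvent(String src, byte[] payload); // one change event
        void onEndDataEventSequence(long scn);        // txn window complete
        void onStopConsumption();                     // once
    }

    static class LoggingConsumer implements DatabusConsumer {
        int eventCount = 0;

        public void onStartConsumption() { System.out.println("consumption started"); }
        public void onStartDataEventSequence(long scn) { /* reset per-txn state */ }
        public void onDataEvent(String src, byte[] payload) {
            eventCount++;                             // business logic goes here
        }
        public void onEndDataEventSequence(long scn) {
            // Safe point: the whole transaction window has been delivered,
            // so progress (the SCN) may be persisted here.
        }
        public void onStopConsumption() { System.out.println("consumption stopped"); }
    }

    public static void main(String[] args) {
        // The real client library would drive these callbacks after
        // register(source, callback); here we simulate one txn window.
        LoggingConsumer c = new LoggingConsumer();
        c.onStartConsumption();
        c.onStartDataEventSequence(100);
        c.onDataEvent("member_profile", new byte[]{1, 2});
        c.onEndDataEventSequence(100);
        c.onStopConsumption();
        System.out.println("events: " + c.eventCount);
    }
}
```

In the real library the application only implements the callbacks and registers them; the dispatcher decides when each is invoked and whether events come from the relay or the bootstrap service.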
Databus Performance : Relay
• The relay saturates the network with low CPU utilization
– CPU utilization increases with more clients
– An increased poll interval (higher consumer latency) reduces CPU utilization
• Scales to hundreds of consumers (client instances)
Performance: Relay Throughput (5/10/13)
[Chart not included in the transcript.]
Databus Performance : Consumer
• Latency is primarily governed by the poll interval
• Low overhead of the library in event fetch
• Spikes in latency occur when the network saturates at the relay
• Scaling the number of consumers
– Use partitioned consumption (filtering at the relay): this reduces network utilization, with some increase in latency due to filtering
– Increase the poll interval and tolerate higher latencies
Performance: Consumer Throughput
[Chart not included in the transcript.]
Performance: End-End Latency
[Chart not included in the transcript.]
Databus Bootstrap : Performance
• Should we serve from the catch-up store or the snapshot store?
– It depends on where the traffic pattern falls in the spectrum from 'all updates' to 'all inserts'
– Tune the service depending on the fraction of updates and inserts
• Favour snapshot-based serving for update-heavy traffic
Bootstrap Performance: Snapshot vs Catch-up
[Chart not included in the transcript.]
Databus Service
[Diagram: managed Oracle and Espresso change event streams.]
• The Databus change stream is a managed service
• Applications discover/look up the coordinates of sources
• Multi-tenant, chained relays
• Many sources can be bootstrapped from SCN 0 (the beginning of time)
• Automated change stream provisioning is a work in progress
Databus at LinkedIn
Databus at LinkedIn : Monitoring
Available out of the box as JMX MBeans
Metrics for health
– The lag between the update time at the DB and the time at which the change was received by the application
– The time of last contact with the change event stream and the source
Metrics for capacity planning
– Event rate and size
– Request rate
– Threads and connections
Databus at LinkedIn: The Good
Source isolation: bootstrap benefits
– Typically, data is extracted from sources just once (seeding)
– The bootstrap service is used during the launch of new applications
– The primary data store is not subject to unpredictable high loads from lagging applications
Common data format
– Avro offers ease of use, flexibility, and performance improvements (larger retention periods for change events in the relay)
Partitioned stream consumption
– Applications are horizontally scaled to hundreds of instances
Databus at LinkedIn: Operational Niggles
Oracle change capture performance bottlenecks
– Complex joins
– BLOBs and CLOBs
– Contention on the trigger table driven by high update rates
Bootstrap: snapshot store seeding
– Consistent snapshot extraction from large sources
Semi-automated change stream provisioning
Quick Review
Specialization in data systems
– The CDC pipeline is a first-class infrastructure citizen, up there with stores and indexes
Source independence
– Change capture logic can be plugged in
Use of SCN, an external clock attached to the source
– Makes the change stream more 'portable'
– Easy for applications to reason about consistency with the source
The pub-sub API supports the atomicity semantics of transactions
Bootstrap service
– Isolates the source from abusive scans
– Serves both streaming and batch use-cases
Questions
Additional Slides
The Timeline Consistent Data Flow problem
Databus: First Attempt (2007)
Issues
– Source database pressure caused by slow consumers
– Brittle serialization