Change Data Capture with MongoDB and Kafka
By Dan Harvey

Transcript of Change data capture with MongoDB and Kafka.



High level stack

React.js - Website

Node.js - API Routing

Ruby on Rails + MongoDB - Core API

Java - Opinion Streams, Search, Suggestions

Redshift - SQL Analytics


Problems

• Keep user experience consistent

• Streams / search index need to update

• Keep developers efficient

• Loosely couple services

• Trust denormalisations


Use case

• User to User recommender

• Suggest “interesting” users to a user

• Update as soon as you make a new opinion

• Instant feedback for contributing content


Log transformation

(Diagram: the Rails API writes JSON/BSON to Mongo; the Optailer tails the oplog (change data capture) and produces Avro messages to Kafka topics (User topic, Opinion topic); Java services consume those topics (stream processing) to drive the User Recommender.)


Op(log)tailer

• Converts BSON/JSON to Avro

• Guarantees latest document in topic (eventually)

• Does not guarantee all changes

• Compacting Kafka topic (only keeps latest)
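
A rough sketch of how a tailer like this can work, using the MongoDB Java driver and the Kafka producer API. The collection, topic, and class names are illustrative, not the actual Optailer code, and the Avro conversion is stubbed out.

import com.mongodb.CursorType;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoCursor;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.bson.Document;

import java.util.Properties;

// Illustrative oplog tailer: reads the replica-set oplog with a tailable
// cursor and publishes the latest version of each document to a
// log-compacted Kafka topic, keyed by document id.
public class OplogTailerSketch {
  public static void main(String[] args) {
    MongoClient mongo = MongoClients.create("mongodb://localhost:27017");
    MongoCollection<Document> oplog =
        mongo.getDatabase("local").getCollection("oplog.rs");

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
    KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props);

    // A real tailer would also track the last seen "ts" field so it can
    // resume from where it left off after a restart.
    try (MongoCursor<Document> cursor =
             oplog.find().cursorType(CursorType.TailableAwait).iterator()) {
      while (cursor.hasNext()) {
        Document op = cursor.next();
        String ns = op.getString("ns");             // e.g. "app.opinions"
        if (!"app.opinions".equals(ns)) continue;   // hypothetical namespace

        // Insert ops carry the full document in "o"; updates would need a
        // re-read of the document (simplified here).
        Document doc = (Document) op.get("o");
        String key = String.valueOf(doc.get("_id"));
        byte[] value = toAvro(doc);                 // BSON/JSON -> Avro (omitted)

        // Compaction keeps only the latest value per key, matching the
        // "latest document, not every change" guarantee above.
        producer.send(new ProducerRecord<>("opinions", key, value));
      }
    }
  }

  static byte[] toAvro(Document doc) { /* schema mapping omitted */ return new byte[0]; }
}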


Avro Schemas

• Each Kafka topic has a schema

• Schemas evolve over time

• Readers and Writers will have different schemas

• Allows us to update services independently
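
For illustration, this is what compatible evolution looks like with the Avro Java API: the reader adds a field with a default, so it can still decode records written with the older schema. The field names here are hypothetical, not the real topic schemas.

import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class SchemaEvolutionSketch {
  // Writer schema: what an older producer was compiled against.
  static final Schema WRITER = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
      + "{\"name\":\"id\",\"type\":\"string\"},"
      + "{\"name\":\"name\",\"type\":\"string\"}]}");

  // Reader schema: a newer consumer adds a field with a default, so records
  // written with the old schema still resolve (backwards compatible).
  static final Schema READER = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
      + "{\"name\":\"id\",\"type\":\"string\"},"
      + "{\"name\":\"name\",\"type\":\"string\"},"
      + "{\"name\":\"bio\",\"type\":\"string\",\"default\":\"\"}]}");

  public static void main(String[] args) throws Exception {
    GenericRecord user = new GenericData.Record(WRITER);
    user.put("id", "u1");
    user.put("name", "Dan");

    // Encode with the writer schema...
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
    new GenericDatumWriter<GenericRecord>(WRITER).write(user, enc);
    enc.flush();

    // ...and decode with the reader schema; Avro fills "bio" from the default.
    GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(WRITER, READER);
    GenericRecord decoded =
        reader.read(null, DecoderFactory.get().binaryDecoder(out.toByteArray(), null));
    System.out.println(decoded);  // {"id": "u1", "name": "Dan", "bio": ""}
  }
}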


Schema Changes

• Schema to ID managed by Confluent registry

• Readers and writers discover schemas

• Avro deals with resolution to compiled schema

• Must be forwards and backwards compatible

Kafka message: byte[] = schema ID: int + message: byte[]
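
A sketch of what a consumer does with that layout: strip the schema ID off the front, look up the writer schema for that ID, and let Avro resolve it to the reader's compiled schema. The registry lookup is stubbed with a local map here rather than the Confluent registry client, and the real Confluent wire format also carries a leading magic byte.

import java.nio.ByteBuffer;
import java.util.Map;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DecoderFactory;

// Illustrative decode of the message layout above: a schema ID prefix
// followed by Avro-encoded bytes.
public class SchemaIdDecodeSketch {
  static GenericRecord decode(byte[] kafkaMessage,
                              Map<Integer, Schema> registry,  // stand-in for the registry lookup
                              Schema readerSchema) throws Exception {
    ByteBuffer buf = ByteBuffer.wrap(kafkaMessage);
    int schemaId = buf.getInt();                 // schema ID: int
    byte[] payload = new byte[buf.remaining()];  // message: byte[]
    buf.get(payload);

    Schema writerSchema = registry.get(schemaId);
    // Avro resolves writer -> reader differences (e.g. fields added with
    // defaults), which is what lets services be updated independently.
    GenericDatumReader<GenericRecord> reader =
        new GenericDatumReader<>(writerSchema, readerSchema);
    return reader.read(null, DecoderFactory.get().binaryDecoder(payload, null));
  }
}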


Search indexing

• User / Topic / Opinion search

• Re-use Kafka topics from before

• Index from Kafka to Elasticsearch

• Need to update quickly and reliably


Samza Indexers

• Index from Kafka to Elasticsearch

• Used Samza for transform and loading

• Far less code than Java Kafka consumers

• Stores offsets and state in Kafka
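
A minimal sketch of such an indexer with Samza's task API. The system and index names are placeholders; the "elasticsearch" system refers to the producer described on the next slide.

import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.system.OutgoingMessageEnvelope;
import org.apache.samza.system.SystemStream;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskCoordinator;

// Illustrative Samza task: consumes the compacted users topic and sends each
// document to an Elasticsearch system/stream for indexing. Offsets and task
// state are checkpointed back into Kafka by Samza itself.
public class UserIndexerTask implements StreamTask {
  private static final SystemStream USERS_INDEX =
      new SystemStream("elasticsearch", "users/user");  // index/type, hypothetical names

  @Override
  public void process(IncomingMessageEnvelope envelope,
                      MessageCollector collector,
                      TaskCoordinator coordinator) {
    Object userDoc = envelope.getMessage();  // Avro record from the users topic
    // Any transform (renaming fields, dropping internal ones) could go here.
    collector.send(new OutgoingMessageEnvelope(USERS_INDEX, envelope.getKey(), userDoc));
  }
}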


Elasticsearch Producer

• Samza consumers/producers deal with I/O

• Wrote new ElasticsearchSystemProducer

• Contributed back to Samza project

• Included in Samza 0.10.0 (released soon)


Samza Good/Bad

• Good API

• Simple transformations easy

• Simple ops: logging, metrics all built in

• Only depends on Kafka

• Inbuilt state management

• Joins tricky, need consistent partitioning (see the sketch after this list)

• Complex flows are hard (Flink/Spark better)
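
To illustrate the partitioning point above: for a join, both input topics need to be keyed the same way and have the same number of partitions, so that matching records arrive at the same Samza task. A producer-side sketch with hypothetical topic names:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Keying both topics by userId (with equal partition counts) means a user
// record and that user's opinions land in the same partition number, so one
// Samza task sees both sides of the join.
public class CoPartitioningSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

    try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
      String userId = "u1";
      producer.send(new ProducerRecord<>("users", userId, userBytes()));       // keyed by userId
      producer.send(new ProducerRecord<>("opinions", userId, opinionBytes())); // same key, same partition
    }
  }

  static byte[] userBytes() { return new byte[0]; }     // placeholder payloads
  static byte[] opinionBytes() { return new byte[0]; }
}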


Decoupling Good/Bad

• Easy to try out complex new services

• Easy to keep data stores in sync, low latency

• Started to duplicate core logic

• More overhead with more services

• Need high level framework for denormalisations

• Samza SQL being developed


Ruby Workers

• Ruby Kafka consumers not great…

• Optailer to AWS SQS (Shoryuken gem)

• No ordering guarantee, unlike Kafka topics

• But guaranteed to trigger on database writes

• Better for core data transformations


Future

• Segment.io user interaction logs to Kafka

• Use in product, view counts, etc…

• Fill Redshift for analytics (currently batch)

• Kafka CopyCat instead of our Optailer

• Avro transformation in Samza


Questions?

• email: [email protected]

• twitter: @danharvey