Simplifying Event Streaming: Tools for Location Transparency and Data Evolution, Paul Osman
Simplifying Event Streaming: Tools for Location Transparency & Data Evolution
Paul Osman - @paulosman - [email protected] - Staff Software Engineer, Under Armour Connected Fitness - Kafka Summit 2016
2
Introduction
• Paul Osman
• Staff Software Engineer - Under Armour Connected Fitness
• Formerly at PagerDuty, 500px, SoundCloud
• @paulosman
• [email protected]
6
Under Armour Connected Fitness
• November 2013 - Under Armour acquires MapMyFitness Inc
• February 2015 - Under Armour acquires MyFitnessPal
• February 2015 - Under Armour acquires Endomondo
• January 2016 - Under Armour announces HealthBox and Gemini 2 RE
7
Under Armour Connected Fitness
8
MyFitnessPal and Kafka
• MFP started as a Rails monolith
• Broken into microservices written in Scala and Ruby
• Data integration challenges
• Service dependencies difficult to manage
9
Solution
10
Pushing a data migration…
11
You broke my consumer!
12
My bad!
13
Fix it?
14
Other Challenges…
• Client libraries for non-JVM languages were of varying quality
• Developers needed to know about Kafka
• Wanted to federate Kafka clusters - no one team should have to maintain all clusters
15
MyFitnessPal joined Under Armour
17
Project Golden Gate
19
Challenges Recap
• Engineers needed to know a lot about Kafka clusters
• Data migrations broke consumer contracts
• Client libraries for non-JVM languages
• Management of Kafka clusters
• Data retention policies
20
Location Transparency
• A publishing client shouldn’t need to know about clusters, topics, or other Kafka details
• Need some kind of source of truth for event locations
21
Topology Service
• Each event has a namespace and event type (globally unique)
• The topology service instructs clients where to publish or consume those messages
• Introduces concept of “zones” which represent one or more clusters
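A minimal sketch of the lookup described above, assuming an in-memory table. The class and method names, broker addresses, and topic naming scheme are all hypothetical and are not Golden Gate's actual API; the point is only that a client asks for (namespace, event type) and gets back a zone and its clusters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Where a client should publish or consume one event type."""
    zone: str
    brokers: tuple
    topic: str


class TopologyService:
    """Hypothetical in-memory stand-in for the topology service."""

    def __init__(self, zones):
        # zones maps a zone name to the bootstrap brokers of its cluster(s).
        self._zones = zones
        self._routes = {}

    def register(self, namespace, event_type, zone):
        if zone not in self._zones:
            raise KeyError(f"unknown zone: {zone}")
        # (namespace, event_type) is globally unique, so it can serve as the key.
        self._routes[(namespace, event_type)] = zone

    def locate(self, namespace, event_type):
        zone = self._routes[(namespace, event_type)]
        # The topic naming scheme is an assumption made for this sketch.
        topic = f"{namespace}.{event_type}"
        return Location(zone, tuple(self._zones[zone]), topic)


# Broker addresses are made up for illustration.
topology = TopologyService({"mfp": ["kafka-mfp-1:9092", "kafka-mfp-2:9092"]})
topology.register("mfp", "FoodEntryCreated", zone="mfp")
location = topology.locate("mfp", "FoodEntryCreated")
```

A publisher written this way never hard-codes a cluster or topic, so events can be moved between zones by changing the table alone.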
22
Data Migrations
• Solved problem - use Schemas
• Confluent Schema Registry + Small Service to capture Metadata (event type and namespace)
• Confluent Schema Registry uses Avro, so we do too
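Schemas help here because a registry can refuse a change that existing consumers could not read. Below is a simplified sketch of one such rule for Avro records (a newly added field needs a default), assuming flat record schemas. The function name and the FoodEntryCreated fields are made up for illustration, and the real Schema Registry compatibility check covers more cases, such as type promotion.

```python
def backward_compatible(old_schema, new_schema):
    """Return a list of problems; an empty list means a consumer using
    new_schema can still read data written with old_schema."""
    old_fields = {f["name"]: f for f in old_schema["fields"]}
    problems = []
    for field in new_schema["fields"]:
        name = field["name"]
        if name not in old_fields:
            # Old data lacks this field, so the reader needs a default for it.
            if "default" not in field:
                problems.append(f"added field '{name}' has no default")
        elif old_fields[name]["type"] != field["type"]:
            # Simplification: any type change is rejected (no promotion rules).
            problems.append(f"field '{name}' changed type")
    return problems


# Hypothetical fields, for illustration only.
v1 = {"type": "record", "name": "FoodEntryCreated",
      "fields": [{"name": "food_id", "type": "string"}]}
v2_ok = {"type": "record", "name": "FoodEntryCreated",
         "fields": [{"name": "food_id", "type": "string"},
                    {"name": "calories", "type": ["null", "int"], "default": None}]}
v2_bad = {"type": "record", "name": "FoodEntryCreated",
          "fields": [{"name": "food_id", "type": "string"},
                     {"name": "calories", "type": "int"}]}

ok_problems = backward_compatible(v1, v2_ok)
bad_problems = backward_compatible(v1, v2_bad)
```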
23
Data Migrations
Schema status values: pending, available
{
  "event_type": "ActivityFeedStoryUpdate",
  "namespace": "mmf",
  "status": "pending",
  "confluent_subject": "mmf_activityfeedstoryupdate",
  "schema_id": "bb68e5381e88d52574b0f50a000fbe9b"
}
24
Registering a Schema
$> schema-registry register schema --event-type FoodEntryCreated --file-name FoodEntryCreated.avsc \
     --namespace mfp -p
$> schema-registry activate schema 20afc5a8f9c017c1f4e82757a7a88f5b -p
26
Publishing
27
JVM Languages
• Java and Scala client libraries
28
Non-JVM Languages
projects/GoldenGate $ curl -D - -H'Content-Type: application/json' http://localhost:3005/golden-gate-proxy/produce -d'@schemas/integ-message.json'
HTTP/1.1 202 Accepted
Server: spray-can/1.3.3
Date: Tue, 19 Apr 2016 17:34:21 GMT
Content-Type: application/json; charset=UTF-8
Content-Length: 419
{
  "items": [{
    "producer_id": "foo",
    "schema_id": "5635ce15a15213105c091d5e0945b0c2",
    "zone": "mfp",
    "payload": {
      "context": null,
      "email_address": "vneo",
      "email_source": "fadipwxfmotvav",
      "first_name": null,
      "last_name": null,
      "country": {"string": "US"}
    }
  }]
}
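The same call can be made from any non-JVM language with an HTTP client. This sketch builds the request in Python using only the standard library. It assumes the request body has the shape of the JSON shown above, which the slide does not confirm (that JSON may be the proxy's response), and the helper name is hypothetical.

```python
import json
import urllib.request

# URL taken from the curl example; everything else is illustrative.
PROXY_URL = "http://localhost:3005/golden-gate-proxy/produce"


def build_produce_request(producer_id, schema_id, zone, payload):
    """Build (but do not send) a POST to the produce endpoint."""
    body = {
        "items": [{
            "producer_id": producer_id,
            "schema_id": schema_id,
            "zone": zone,
            "payload": payload,
        }]
    }
    return urllib.request.Request(
        PROXY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_produce_request(
    producer_id="foo",
    schema_id="5635ce15a15213105c091d5e0945b0c2",
    zone="mfp",
    payload={
        "context": None,
        "email_address": "vneo",
        "email_source": "fadipwxfmotvav",
        "first_name": None,
        "last_name": None,
        "country": {"string": "US"},
    },
)

# Sending requires a running proxy, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```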
29
Consuming
30
JVM Languages
31
Non-JVM Languages
projects/GoldenGate $ ./gg-consumer-proxy --subscriptions=mmf/activity_feed_updated=http://localhost:3000/callback
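The subscription value above appears to pair an event (namespace/event type) with a callback URL, which the proxy would presumably call when a matching event arrives. That reading is inferred from the single example, not from documentation. A small sketch of parsing such a value, with hypothetical names:

```python
from typing import NamedTuple


class Subscription(NamedTuple):
    namespace: str
    event_type: str
    callback_url: str


def parse_subscription(spec):
    """Parse 'namespace/event_type=callback_url' (format assumed from the example)."""
    # Split on the first '=' only, since the callback URL may itself contain '='.
    event, sep, callback_url = spec.partition("=")
    namespace, slash, event_type = event.partition("/")
    if not (sep and slash and namespace and event_type and callback_url):
        raise ValueError(f"expected namespace/event_type=url, got: {spec!r}")
    return Subscription(namespace, event_type, callback_url)


sub = parse_subscription("mmf/activity_feed_updated=http://localhost:3000/callback")
```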
32
Federation of Kafka Clusters
[Diagram: Publisher, Topology Service, Kafka, Consumer]
33
Data Retention
• Archiving is now just a job for a specialized consumer
• Archiving is done “per-zone”. Data that shouldn’t be archived is only published to zones that are not archived (as determined per event type)
• In our case, data is stored in S3 and then accessed through a variety of tools for analysis, batch processing, etc.
Adoption Pains
35
Adoption Pains
Leaky Abstractions
36
Adoption Pains
• Publishers
• Consumers
• Schema Designers
37
Help Publishers - Avro Helper Library
• helpful-avro Scala library
• Adds a layer of robustness
• Tries a few tricks to make a payload validate against a schema
38
Example Schema
{
  "name": "Person",
  "namespace": "org.example",
  "type": "record",
  "fields": [
    {"name": "first_name", "type": "string"},
    {"name": "last_name", "type": "string"},
    {"name": "age", "type": ["null", "int"]},
    {"name": "username", "type": ["null", "string"]}
  ]
}
39
Nullable / Optional Fields
• In the example schema, age and username are unions with "null", so both fields are nullable (optional)
40
Nullable / Optional Fields
What a publisher naturally sends (age omitted, username not type annotated):
{ "first_name": "Paul", "last_name": "Osman", "username": "paulosman" }
What Avro's JSON encoding requires:
{ "first_name": "Paul", "last_name": "Osman", "age": null, "username": {"string": "paulosman"} }
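A sketch of the two tricks involved, written in Python for brevity. This is not helpful-avro's implementation (that library is in Scala and its code is not shown here), and the function names are hypothetical. It relies on one fact about Avro's JSON encoding: a null union value is written as plain null, and any other branch is wrapped in a one-key object naming the branch.

```python
PERSON = {
    "name": "Person", "namespace": "org.example", "type": "record",
    "fields": [
        {"name": "first_name", "type": "string"},
        {"name": "last_name", "type": "string"},
        {"name": "age", "type": ["null", "int"]},
        {"name": "username", "type": ["null", "string"]},
    ],
}

# Python type -> candidate Avro primitive names (primitives only in this sketch).
_AVRO_NAMES = [
    (bool, ("boolean",)),  # bool first: bool is a subclass of int in Python
    (int, ("int", "long")),
    (float, ("float", "double")),
    (str, ("string",)),
]


def branch_for(value, union):
    """Pick the union branch name that matches a plain Python value."""
    for py_type, names in _AVRO_NAMES:
        if isinstance(value, py_type):
            for name in names:
                if name in union:
                    return name
            break
    raise ValueError(f"no branch in {union} matches {value!r}")


def coerce(payload, schema):
    """Fill in omitted nullable fields and type-annotate union values."""
    result = {}
    for field in schema["fields"]:
        name, ftype = field["name"], field["type"]
        is_union = isinstance(ftype, list)
        if name not in payload:
            if is_union and "null" in ftype:
                result[name] = None  # trick 1: omitted nullable field becomes null
                continue
            raise ValueError(f"missing required field: {name}")
        value = payload[name]
        if is_union and value is not None and not isinstance(value, dict):
            # trick 2: wrap a bare value as {"branch": value}
            result[name] = {branch_for(value, ftype): value}
        else:
            result[name] = value
    return result


fixed = coerce(
    {"first_name": "Paul", "last_name": "Osman", "username": "paulosman"}, PERSON
)
```

The sketch treats any dict value as already annotated, which is only safe for schemas whose unions contain primitives.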
41
Observability
• Browser and CLI based tools that allow people to observe activity being published to a specific zone
• Give people a way to see their event go through the system
• End to end monitoring, monitoring of consumer lag
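Consumer lag is the distance between the newest offset in each partition and the offset a consumer group has committed. A minimal sketch of that arithmetic follows; the function name and offset numbers are made up, and in practice the two inputs would come from a Kafka client's end-offset and committed-offset lookups.

```python
def consumer_lag(end_offsets, committed_offsets):
    """Per-partition lag: log end offset minus the group's committed offset."""
    return {
        # Simplification: a partition with no committed offset counts from 0.
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in end_offsets.items()
    }


# Illustrative numbers only.
lag = consumer_lag(
    end_offsets={0: 1500, 1: 980},
    committed_offsets={0: 1450, 1: 980},
)
total_lag = sum(lag.values())
```

A total that keeps growing suggests the consumer is falling behind its publishers.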
42
Future Plans
• Make schema authoring and registration easier and more automated
• Extend helpful-avro to work with Case Classes and POJOs
• Further hide implementation details
43
CONFIDENTIAL & BUSINESS PROPRIETARY INFORMATION OF UNDER ARMOUR, INC. COPYRIGHT (C)2015
Thank You http://underarmour.jobs