Simplifying Event Streaming: Tools for Location Transparency and Data Evolution, Paul Osman

Simplifying Event Streaming Tools for Location Transparency & Data Evolution Paul Osman - @paulosman - [email protected] - Staff Software Engineer, Under Armour Connected Fitness - Kafka Summit 2016

Transcript of Simplifying Event Streaming: Tools for Location Transparency and Data Evolution, Paul Osman

Page 1

Simplifying Event Streaming: Tools for Location Transparency & Data Evolution

Page 2

Introduction

• Paul Osman

• Staff Software Engineer - Under Armour Connected Fitness

• Formerly at PagerDuty, 500px, SoundCloud

• @paulosman

• [email protected]

Page 6

Under Armour Connected Fitness

• November 2013 - Under Armour acquires MapMyFitness Inc

• February 2015 - Under Armour acquires MyFitnessPal

• February 2015 - Under Armour acquires Endomondo

• January 2016 - Under Armour announces HealthBox and Gemini 2 RE

Page 7

Under Armour Connected Fitness

Page 8

MyFitnessPal and Kafka

• MFP started as a Rails monolith

• Broken into microservices written in Scala and Ruby

• Data integration challenges

• Service dependencies difficult to manage

Page 9

Solution

Page 10

Pushing a data migration…

Page 11

You broke my consumer!

Page 12

My bad!

Page 13

Fix it?

Page 14

Other Challenges…

• Client libraries for non-JVM languages were of varying quality

• Developers needed to know about Kafka

• Wanted to federate Kafka clusters - no one team should have to maintain all clusters

Page 15

MyFitnessPal joined Under Armour

Page 17

Project Golden Gate

Page 19

Challenges Recap

• Engineers needed to know a lot about Kafka clusters

• Data migrations broke consumer contracts

• Client libraries for non-JVM languages

• Management of Kafka clusters

• Data retention policies

Page 20

Location Transparency

• A publishing client needn’t be concerned with things like clusters, topics, etc.

• Need some kind of source of truth for event locations

Page 21

Topology Service

• Each event has a namespace and event type (globally unique)

• The topology service instructs clients where to publish or consume those messages

• Introduces concept of “zones” which represent one or more clusters
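The lookup described above could be sketched roughly as follows. This is a minimal in-memory illustration, not the real service: the class names, broker addresses, and topic name are all hypothetical, since the talk does not show the topology service's API.

```python
from dataclasses import dataclass

# Hypothetical names throughout: Zone, Route, TopologyService, the broker
# addresses and the topic name are illustrative, not the real API.


@dataclass(frozen=True)
class Zone:
    name: str
    bootstrap_servers: tuple  # a zone represents one or more clusters


@dataclass(frozen=True)
class Route:
    zone: Zone
    topic: str


class TopologyService:
    """In-memory stand-in for a source of truth on event locations."""

    def __init__(self):
        self._routes = {}

    def register(self, namespace, event_type, zone, topic):
        # (namespace, event_type) is the globally unique event key.
        self._routes[(namespace, event_type)] = Route(zone, topic)

    def locate(self, namespace, event_type):
        # Clients ask where an event lives instead of hard-coding it.
        try:
            return self._routes[(namespace, event_type)]
        except KeyError:
            raise LookupError(f"no route for {namespace}/{event_type}") from None


topology = TopologyService()
mfp_zone = Zone("mfp", ("kafka-mfp-1:9092", "kafka-mfp-2:9092"))
topology.register("mfp", "FoodEntryCreated", mfp_zone, "mfp_foodentrycreated")

route = topology.locate("mfp", "FoodEntryCreated")
print(route.zone.name, route.topic)
```

The point of the indirection is that a publisher only names the event; moving it to another zone or cluster changes the registered route, not the client code.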

Page 22

Data Migrations

• Solved problem - use Schemas

• Confluent Schema Registry + a small service to capture metadata (event type and namespace)

• Confluent Schema Registry uses Avro, so we do too

Page 23

Data Migrations

Schema status: pending → available

{
  "event_type": "ActivityFeedStoryUpdate",
  "namespace": "mmf",
  "status": "pending",
  "confluent_subject": "mmf_activityfeedstoryupdate",
  "schema_id": "bb68e5381e88d52574b0f50a000fbe9b"
}
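One way to picture the metadata record and its status change is the sketch below. It is an assumption-laden illustration: the class and function names are hypothetical, and the subject naming rule (namespace, underscore, lowercased event type) is only inferred from the single example above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SchemaMetadata:
    """Hypothetical model of the metadata kept alongside the registry."""

    event_type: str
    namespace: str
    schema_id: str
    status: str = "pending"

    @property
    def confluent_subject(self):
        # Assumed naming rule, inferred from the one example on the slide.
        return f"{self.namespace}_{self.event_type.lower()}"


def activate(meta):
    """Move a schema from 'pending' to 'available'."""
    if meta.status != "pending":
        raise ValueError(f"cannot activate schema in status {meta.status!r}")
    return replace(meta, status="available")


meta = SchemaMetadata(
    event_type="ActivityFeedStoryUpdate",
    namespace="mmf",
    schema_id="bb68e5381e88d52574b0f50a000fbe9b",
)
active = activate(meta)
print(active.confluent_subject, active.status)
```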

Page 24

Registering a Schema

$> schema-registry register schema --event-type FoodEntryCreated --file-name FoodEntryCreated.avsc \
     --namespace mfp -p

$> schema-registry activate schema 20afc5a8f9c017c1f4e82757a7a88f5b -p

Page 26

Publishing

Page 27

JVM Languages

• Java and Scala client libraries

Page 28

Non-JVM Languages

projects/GoldenGate $ curl -D - -H'Content-Type: application/json' http://localhost:3005/golden-gate-proxy/produce -d'@schemas/integ-message.json'

HTTP/1.1 202 Accepted
Server: spray-can/1.3.3
Date: Tue, 19 Apr 2016 17:34:21 GMT
Content-Type: application/json; charset=UTF-8
Content-Length: 419

{
  "items": [{
    "producer_id": "foo",
    "schema_id": "5635ce15a15213105c091d5e0945b0c2",
    "zone": "mfp",
    "payload": {
      "context": null,
      "email_address": "vneo",
      "email_source": "fadipwxfmotvav",
      "first_name": null,
      "last_name": null,
      "country": {"string": "US"}
    }
  }]
}
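For a non-JVM service, the same call might look like the sketch below. It assumes the JSON document shown above is the request body accepted by the produce endpoint; the helper name build_produce_request is hypothetical, and the request is only built, not sent, so the sketch runs without a proxy.

```python
import json
import urllib.request

# URL copied from the curl example above.
PROXY_URL = "http://localhost:3005/golden-gate-proxy/produce"


def build_produce_request(producer_id, schema_id, zone, payload, url=PROXY_URL):
    """Build (but do not send) a POST that mirrors the curl call."""
    body = {
        "items": [
            {
                "producer_id": producer_id,
                "schema_id": schema_id,
                "zone": zone,
                "payload": payload,
            }
        ]
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_produce_request(
    producer_id="foo",
    schema_id="5635ce15a15213105c091d5e0945b0c2",
    zone="mfp",
    payload={
        "context": None,
        "email_address": "vneo",
        "email_source": "fadipwxfmotvav",
        "first_name": None,
        "last_name": None,
        "country": {"string": "US"},
    },
)

# Sending needs a running proxy; the slide shows it answering 202 Accepted:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

The publisher names a schema id and a zone, never a broker or topic, which is the location transparency the earlier slides ask for.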

Page 29

Consuming

Page 30

JVM Languages

Page 31

Non-JVM Languages

projects/GoldenGate $ ./gg-consumer-proxy --subscriptions=mmf/activity_feed_updated=http://localhost:3000/callback
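On the receiving side, the subscribed service only needs an HTTP endpoint at the callback URL. The sketch below assumes the proxy delivers each event as a JSON body in a POST request and treats a 2xx status as an acknowledgement; neither detail is stated in the talk, and the names record_event, CallbackHandler, and make_server are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # events seen so far, kept in memory for the sketch


def record_event(raw_body):
    """Decode one delivered event and remember it."""
    event = json.loads(raw_body)
    received.append(event)
    return event


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/callback":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            record_event(self.rfile.read(length))
        except ValueError:
            self.send_error(400, "body is not valid JSON")
            return
        self.send_response(204)  # assumed acknowledgement, no response body
        self.end_headers()


def make_server(port=3000):
    # Port 3000 matches the callback URL in the subscription above.
    return HTTPServer(("localhost", port), CallbackHandler)


# To run the endpoint: make_server().serve_forever()
```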

Page 32

Federation of Kafka Clusters

[Diagram: Publisher, Topology Service, Kafka, Consumer]

Page 33

Data Retention

• Archiving is now just a job for a specialized consumer

• Archiving is done per zone. Data that shouldn’t be archived is only published to zones that are not archived (decided per event type)

• In our case, data is stored in S3 and then accessed through a variety of tools for analysis, batch processing, etc.

Page 34

Adoption Pains

Page 35

Adoption Pains

Leaky Abstractions

Page 36

Adoption Pains

[Diagram: Publishers, Consumers, Schema Designers]

Page 37

Help Publishers - Avro Helper Library

• helpful-avro Scala library

• Adds a layer of robustness

• Tries a few tricks to make a payload validate against a schema

Page 38

Example Schema

{
  "name": "Person",
  "namespace": "org.example",
  "type": "record",
  "fields": [
    {"name": "first_name", "type": "string"},
    {"name": "last_name", "type": "string"},
    {"name": "age", "type": ["null", "int"]},
    {"name": "username", "type": ["null", "string"]}
  ]
}

Page 39

Nullable / Optional Fields

In the Person schema above, age and username are the optional fields: each is declared as a union of "null" and a concrete type.

Page 40

Nullable / Optional Fields

Payload as a publisher might write it (age omitted, username not type annotated):

{ "first_name": "Paul", "last_name": "Osman", "username": "paulosman" }

Payload as Avro's JSON encoding expects it:

{ "first_name": "Paul", "last_name": "Osman", "age": null, "username": {"string": "paulosman"} }
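The kind of repair a helper library can apply here might look like the sketch below. It is not the helpful-avro implementation (which is a Scala library and is not shown in the talk); it is a simplified illustration that only handles flat records whose fields are primitives or two-branch ["null", T] unions. It follows the Avro specification's JSON encoding, in which a null union value is written as plain JSON null and any other union value is wrapped in an object keyed by its type name.

```python
# Simplified illustration only; normalize() is a hypothetical helper.

PERSON_SCHEMA = {
    "name": "Person",
    "namespace": "org.example",
    "type": "record",
    "fields": [
        {"name": "first_name", "type": "string"},
        {"name": "last_name", "type": "string"},
        {"name": "age", "type": ["null", "int"]},
        {"name": "username", "type": ["null", "string"]},
    ],
}


def _is_nullable_union(avro_type):
    return isinstance(avro_type, list) and "null" in avro_type


def _already_annotated(value, avro_type):
    return (
        isinstance(value, dict)
        and len(value) == 1
        and next(iter(value)) in avro_type
    )


def normalize(payload, schema):
    """Fill omitted nullable fields and type-annotate union values."""
    result = {}
    for field in schema["fields"]:
        name, avro_type = field["name"], field["type"]
        value = payload.get(name)
        if _is_nullable_union(avro_type):
            if value is None:
                result[name] = None  # omitted or null: plain JSON null
            elif _already_annotated(value, avro_type):
                result[name] = value
            else:
                branch = next(t for t in avro_type if t != "null")
                result[name] = {branch: value}
        elif name in payload:
            result[name] = value
        else:
            raise ValueError(f"missing required field {name!r}")
    return result


loose = {"first_name": "Paul", "last_name": "Osman", "username": "paulosman"}
strict = normalize(loose, PERSON_SCHEMA)
print(strict)
```

Running normalize on an already normalized payload leaves it unchanged, so the helper is safe to apply to publishers that annotate their payloads correctly.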

Page 41

Observability

• Browser and CLI based tools that allow people to observe activity being published to a specific zone

• Give people a way to see their event go through the system

• End to end monitoring, monitoring of consumer lag
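Consumer lag is conventionally the distance between the end of a partition's log and the consumer group's committed offset. The sketch below shows that arithmetic on made-up offsets; the function name and the choice to count a partition with no committed offset from 0 are assumptions for illustration, not how the monitoring in the talk is implemented.

```python
def consumer_lag(end_offsets, committed_offsets):
    """Per-partition lag: log end offset minus last committed offset."""
    lag = {}
    for partition, end in end_offsets.items():
        # Assumption: a partition with nothing committed is counted from 0.
        committed = committed_offsets.get(partition, 0)
        lag[partition] = max(end - committed, 0)
    return lag


# Made-up offsets for a three-partition topic.
end_offsets = {0: 1500, 1: 980, 2: 42}
committed = {0: 1500, 1: 900}

lag = consumer_lag(end_offsets, committed)
total_lag = sum(lag.values())
print(lag, total_lag)
```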

Page 42

Future Plans

• Make schema authoring and registration easier and more automated

• Extend helpful-avro to work with case classes and POJOs

• Further hide implementation details

Page 43

Thank You http://underarmour.jobs