How Credit Karma Makes Real-Time Decisions For 60 Million Users With Akka Streams And Actors

Posted on 23-Jan-2018


Transcript of How Credit Karma Makes Real-Time Decisions For 60 Million Users With Akka Streams And Actors

1 Proprietary & Confidential

Using Akka Streams for Real-Time Decision Making

Dustin Lyons, Engineering Manager, Data Platform


Who I am

● Engineer turned Engineering Manager at Credit Karma
● Data & Analytics on the Platform team
● Build things that make decisions on where data should go
● Lover of science fiction, sushi, and electronic music


Credit Karma is a free financial assistant, helping over 60 million people make progress.


Agenda for today

1. Data Infrastructure at Credit Karma: Past and current
2. Mo’ data, mo’ problems
3. Akka Streams saves the day
4. Results and learnings
5. Q&A


Data scale (MB/min) @ Credit Karma


Credit Karma data platform: PHP days

PHP Scripts


New tools to help with scale


Credit Karma data platform: Scala in 2014

Data Warehouse Import


New tools to help with concurrency


Credit Karma data platform: Akka in 2015

Analytics Export Service + Data Warehouse Import


Analytics export service

Analytics Export Service: Coordinator, Data Transformer Workers, Kafka Importer Workers

HTTP Ingest Server


Analytics export service


Data warehouse import

Data Warehouse Import Service: Reader, Deduplicator, Processor, Extractors


Data warehouse import


Marble maze


1. Reading from file
2. Waiting for external service
3. Objects sit in heap
4. Database insert
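The four steps can be reduced to a toy, tick-based model that shows why step 3 grows without bound. This is an illustrative sketch only: the `simulateBacklog` function and its `readRate` and `serviceRate` parameters are hypothetical, and the rates are made up rather than measured.

```scala
object MarbleMaze {
  // Toy model of the four steps (hypothetical rates, not measured values):
  //   1. the reader pulls `readRate` records per tick from the file,
  //   2. the external service completes at most `serviceRate` records per tick,
  //   3. records read but not yet processed sit in memory (the backlog),
  //   4. completed records are assumed to go straight to the database insert.
  // Returns the backlog size after each tick.
  def simulateBacklog(ticks: Int, readRate: Int, serviceRate: Int): Vector[Int] = {
    var backlog = 0
    val history = Vector.newBuilder[Int]
    for (_ <- 1 to ticks) {
      backlog += readRate                            // step 1: read eagerly, nobody asks it to slow down
      val processed = math.min(backlog, serviceRate) // step 2: the service is the bottleneck
      backlog -= processed                           // step 3: the remainder stays on the heap
      history += backlog                             // (step 4 consumes `processed`)
    }
    history.result()
  }

  def main(args: Array[String]): Unit =
    // Reader 10x faster than the service: the backlog grows by 9 records every tick.
    println(simulateBacklog(ticks = 5, readRate = 10, serviceRate = 1)) // prints Vector(9, 18, 27, 36, 45)
}
```

Because nothing signals the reader to slow down, the backlog grows linearly for as long as reading outpaces the external service, which is exactly the heap buildup that backpressure is meant to prevent.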


Backpressure


What is backpressure?

Backpressure refers to the buildup of data at an I/O switch when buffers are full and not able to receive additional data.

No additional data packets are transferred until the bottleneck of data has been eliminated or the buffer has been emptied.
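To make the definition concrete outside of Akka, here is a minimal sketch that uses a bounded `java.util.concurrent.ArrayBlockingQueue` as the buffer. When the buffer is full, the producer's `put` blocks until the slow consumer takes an element, so no further data moves until there is room. The `BoundedBuffer` object and `transfer` method are illustrative names. Blocking a thread is the simplest form of backpressure; Akka Streams signals demand asynchronously instead of blocking.

```scala
import java.util.concurrent.ArrayBlockingQueue

object BoundedBuffer {
  // Moves `items` from a producer thread to a deliberately slow consumer through a
  // buffer that holds at most `capacity` elements. None marks the end of the stream.
  // Returns the received items and the largest buffer occupancy the producer saw.
  def transfer(items: Seq[Int], capacity: Int): (Vector[Int], Int) = {
    val buffer = new ArrayBlockingQueue[Option[Int]](capacity)
    var maxOccupancy = 0 // written only by the producer thread, read after join()

    val producer = new Thread(() => {
      items.foreach { item =>
        buffer.put(Some(item)) // blocks while the buffer is full: this is the backpressure
        maxOccupancy = math.max(maxOccupancy, buffer.size())
      }
      buffer.put(None)
    })
    producer.start()

    val received = Vector.newBuilder[Int]
    var next = buffer.take()
    while (next.isDefined) {
      Thread.sleep(1) // simulate a slow downstream, e.g. an external service call
      received += next.get
      next = buffer.take()
    }
    producer.join()
    (received.result(), maxOccupancy)
  }

  def main(args: Array[String]): Unit = {
    val (received, maxOccupancy) = transfer(1 to 50, capacity = 4)
    println(s"received ${received.size} items; buffer never held more than $maxOccupancy")
  }
}
```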


Analytics export service

Analytics Export Service: Coordinator, Data Transformer Workers, Kafka Importer Workers

HTTP Ingest Server


Data warehouse import

Data Warehouse Import Service: Reader, Deduplicator, Processor, Extractors


Akka Streams: Backpressure in action

Two actors connected by a stream: data flows downstream to the consumer, and demand flows back upstream to the producer.
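In Akka Streams the demand signal in this picture is handled by the library: a stage only emits after downstream has requested elements. A minimal sketch, assuming Akka 2.6 or later (where an implicit `ActorSystem` also supplies the materializer): an infinite iterator is wrapped as a source, and because `take(3)` only ever asks for three elements, the iterator is advanced only a handful of times instead of running forever. The `DemandDemo` object and the `produced` counter are illustrative instrumentation.

```scala
import java.util.concurrent.atomic.AtomicInteger
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

object DemandDemo {
  // Returns what reached the sink and how many elements the source actually produced.
  def run(): (Seq[Int], Int) = {
    implicit val system: ActorSystem = ActorSystem("demand-demo") // Akka 2.6+: also supplies the Materializer
    val produced = new AtomicInteger(0)
    try {
      // An infinite iterator; each call to next() is counted.
      val numbers = Source.fromIterator(() => Iterator.from(1).map { n => produced.incrementAndGet(); n })
      // take(3) signals demand for three elements and then cancels upstream.
      val firstThree = Await.result(numbers.take(3).runWith(Sink.seq), 5.seconds)
      (firstThree, produced.get())
    } finally {
      Await.result(system.terminate(), 5.seconds)
    }
  }

  def main(args: Array[String]): Unit = {
    val (elements, produced) = run()
    println(s"received $elements after producing only $produced elements")
  }
}
```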


Akka Streams: Creating a stream

Source → Flow → Sink
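A minimal sketch of the three building blocks, assuming Akka 2.6 or later (where an implicit `ActorSystem` is enough to run a stream). The stream itself is a toy example, not taken from the talk: a `Source` of numbers, a `Flow` that doubles them, and a `Sink` that sums them.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object FirstStream {
  // Source: emits 1..10. Flow: doubles each element. Sink: sums everything it receives.
  val numbers: Source[Int, NotUsed] = Source(1 to 10)
  val doubler: Flow[Int, Int, NotUsed] = Flow[Int].map(_ * 2)
  val sum: Sink[Int, Future[Int]] = Sink.fold[Int, Int](0)(_ + _)

  def run(): Int = {
    implicit val system: ActorSystem = ActorSystem("first-stream")
    try {
      // The pieces are only blueprints; nothing runs until runWith materializes them.
      val total: Future[Int] = numbers.via(doubler).runWith(sum)
      Await.result(total, 5.seconds)
    } finally {
      Await.result(system.terminate(), 5.seconds)
    }
  }

  def main(args: Array[String]): Unit = println(run()) // prints 110
}
```

Keeping the source, flow, and sink as separate values means each piece can be reused or tested on its own and recombined into different streams.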


Akka Streams: Built-in stages

• Built-in sources: actorRef, actorPublisher, fromIterator, fromFile, apply (from a Seq)
• Built-in processing stages: map, filter, grouped, drop/take, dropWhile/takeWhile, sliding
• Built-in sinks: head, last, seq, foreach, actorRef, actorSubscriber, reduce, fold
• Backpressure-aware stages: mapAsync, batch, buffer (Backpressure), buffer (Drop), buffer (Fail)

Reference: http://doc.akka.io/docs/akka/current/scala/stream/stages-overview.html
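A few of the built-in stages listed above, composed into small streams. This is a minimal sketch assuming Akka 2.6 or later; the numbers are arbitrary. The `buffer(4, OverflowStrategy.backpressure)` stage does not change the output here; it only bounds how many elements can queue at that point before upstream is slowed.

```scala
import akka.actor.ActorSystem
import akka.stream.OverflowStrategy
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

object BuiltInStages {
  def run(): (Seq[Seq[Int]], Seq[Seq[Int]]) = {
    implicit val system: ActorSystem = ActorSystem("built-in-stages")
    try {
      // filter + map + grouped: keep even numbers, scale them, emit batches of 3.
      val batches = Source(1 to 12)
        .filter(_ % 2 == 0)
        .map(_ * 10)
        .buffer(4, OverflowStrategy.backpressure) // stops requesting from upstream when 4 elements are waiting
        .grouped(3)
        .runWith(Sink.seq)

      // sliding: overlapping windows of size 3, advancing one element at a time.
      val windows = Source(1 to 5).sliding(3).runWith(Sink.seq)

      (Await.result(batches, 5.seconds), Await.result(windows, 5.seconds))
    } finally {
      Await.result(system.terminate(), 5.seconds)
    }
  }

  def main(args: Array[String]): Unit = {
    val (batches, windows) = run()
    println(batches) // two batches: (20, 40, 60) and (80, 100, 120)
    println(windows) // (1, 2, 3), (2, 3, 4), (3, 4, 5)
  }
}
```

Swapping in `OverflowStrategy.dropHead` or `OverflowStrategy.fail` gives the Drop and Fail variants of `buffer` from the slide.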


Analytics export service

Analytics Export Service: Coordinator, Data Transformer Workers, Kafka Importer Workers

HTTP Ingest Server


Analytics export service

Analytics Export Service: Coordinator, Akka Stream

HTTP Ingest Server
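One way to put a stream behind a coordinator like this is `Source.queue` with `OverflowStrategy.backpressure`: the ingest side offers events, and the `Future` returned by `offer` only completes once the stream has room, so a fast HTTP ingest cannot flood the heap. This is a hedged sketch, not Credit Karma's service. The `transform` function and the `publishToKafka` stub are hypothetical stand-ins (no real Kafka client is used), and Akka 2.6 or later is assumed.

```scala
import akka.actor.ActorSystem
import akka.stream.{OverflowStrategy, QueueOfferResult}
import akka.stream.scaladsl.{Keep, Sink, Source}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object ExportPipeline {
  // Hypothetical transformation step, standing in for the data transformer workers.
  def transform(event: String): String = event.trim.toUpperCase

  def run(events: Seq[String]): Seq[String] = {
    implicit val system: ActorSystem = ActorSystem("export-pipeline")
    import system.dispatcher
    // Hypothetical async publish; a real service would call a Kafka producer here.
    def publishToKafka(record: String): Future[String] = Future(record)

    try {
      val (queue, published) = Source
        .queue[String](16, OverflowStrategy.backpressure) // bounded entry point for the coordinator
        .map(transform)
        .mapAsync(parallelism = 4)(publishToKafka)        // at most 4 publishes in flight, order preserved
        .toMat(Sink.seq)(Keep.both)
        .run()

      // The coordinator offers one event at a time and waits until the queue accepts it.
      events.foreach { e =>
        val result = Await.result(queue.offer(e), 5.seconds)
        require(result == QueueOfferResult.Enqueued, s"offer failed: $result")
      }
      queue.complete()
      Await.result(published, 5.seconds)
    } finally {
      Await.result(system.terminate(), 5.seconds)
    }
  }
}
```

Waiting on each `offer` is what carries backpressure back to the ingest side. A non-blocking alternative is to complete the HTTP response only when the offer's `Future` completes.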


Analytics export service


Data warehouse import

Data Warehouse Import Service: Reader, Deduplicator, Processor, Extractors


Data warehouse import

Data Warehouse Import Service: Akka Stream, Extractors
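A similar hedged sketch for the import side: read records, drop duplicates, process them, and insert in batches, with every stage pulling only as fast as the insert step accepts. The `Record` type, the comma-separated input format, the `insertBatch` stub, and the batch size are assumptions for illustration. The input is an in-memory iterator rather than a real file, and the deduplicator keeps an unbounded `Set`, which a production service would need to bound. Akka 2.6 or later is assumed.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.collection.mutable
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object WarehouseImport {
  final case class Record(id: String, value: Int) // hypothetical record shape

  // Returns the number of rows written by each (stubbed) batch insert.
  def run(lines: Seq[String], batchSize: Int): Seq[Int] = {
    implicit val system: ActorSystem = ActorSystem("warehouse-import")
    import system.dispatcher
    // Hypothetical stand-in for a database batch insert.
    def insertBatch(batch: Seq[Record]): Future[Int] = Future(batch.size)

    try {
      val insertedPerBatch = Source
        .fromIterator(() => lines.iterator)      // Reader: stands in for reading a file
        .map { line =>                           // parse "id,value"
          val parts = line.split(",", 2)
          Record(parts(0).trim, parts(1).trim.toInt)
        }
        .statefulMapConcat { () =>               // Deduplicator: drops ids it has already seen
          val seen = mutable.Set.empty[String]
          record => if (seen.add(record.id)) List(record) else Nil
        }
        .map(r => r.copy(value = r.value * 100)) // Processor: placeholder transformation
        .grouped(batchSize)                      // collect rows into insert batches
        .mapAsync(parallelism = 1)(insertBatch)  // at most one insert in flight
        .runWith(Sink.seq)
      Await.result(insertedPerBatch, 5.seconds)
    } finally {
      Await.result(system.terminate(), 5.seconds)
    }
  }
}
```

With `parallelism = 1`, a slow database holds back `grouped`, which in turn stops pulling from the reader, so unprocessed rows never pile up beyond a few small buffers.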


Data warehouse import service


Analytics export service heap (before)

Chart: heap usage (GiB) over time, with 28 GiB marked on the heap axis. Red: heap space; blue: used heap space; purple: max heap space.


Analytics export service heap (after)

Chart: heap usage (GiB) over time, with 28 GiB marked on the heap axis.


Data warehouse import


Final results

• Akka Streams allowed us to move data with increased throughput and optimal performance
• We are no longer paged for JVM out-of-memory errors or spending time tuning our services
• Reduced the SLA for data delivery to our business stakeholders


Lessons learned

• Akka Actors: great for low latency
• Akka Streams: optimized for high throughput and for solving backpressure
• Akka Streams is built on top of Akka Actors
• Don’t try to build high-throughput systems with an actor system; you’ll just end up building Akka Streams
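The two lessons fit together when an actor sits inside a stream: the actor keeps its low-latency request/reply role, while the stream decides how many requests are outstanding. A minimal sketch, assuming Akka 2.6 or later and the classic `ask` pattern; the `Scorer` actor and its doubling logic are hypothetical.

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.pattern.ask
import akka.stream.scaladsl.{Sink, Source}
import akka.util.Timeout
import scala.concurrent.Await
import scala.concurrent.duration._

// Hypothetical low-latency actor: replies to each request with a computed score.
class Scorer extends Actor {
  def receive: Receive = { case n: Int => sender() ! n * 2 }
}

object ActorInStream {
  def run(inputs: Seq[Int]): Seq[Int] = {
    implicit val system: ActorSystem = ActorSystem("actor-in-stream")
    implicit val timeout: Timeout = Timeout(3.seconds)
    val scorer = system.actorOf(Props(new Scorer), "scorer")
    try {
      // mapAsync caps in-flight asks at 2, so the actor's mailbox is bounded by
      // stream demand rather than by how fast the source can produce.
      val scored = Source(inputs.toList)
        .mapAsync(parallelism = 2)(n => (scorer ? n).mapTo[Int])
        .runWith(Sink.seq)
      Await.result(scored, 5.seconds)
    } finally {
      Await.result(system.terminate(), 5.seconds)
    }
  }
}
```

Sending to the actor with fire-and-forget `!` from a loop would instead let its mailbox grow without limit, which is the situation the marble maze describes.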


Thank you!

Q&A

Dustin Lyons, Engineering Manager, Data Platform