Routing Trillions of Events Per Day @Twitter
#ApacheBigData 2017
Lohit VijayaRenu (@lohitvijayarenu) & Gary Steelman (@efsie)

Transcript of: Routing Trillions of Events Per Day @Twitter

Page 1

Routing Trillions of Events Per Day @Twitter
#ApacheBigData 2017
Lohit VijayaRenu (@lohitvijayarenu) & Gary Steelman (@efsie)

Page 2

In this talk
1. Event Logs at Twitter
2. Log Collection
3. Log Processing
4. Log Replication
5. The Future
6. Questions

Page 3

Overview

Page 4

Life of an Event
● Clients log events specifying a Category name, e.g. ads_view, login_event
● Events are grouped together across all clients into the Category
● Events are stored on the Hadoop Distributed File System, bucketed every hour into separate directories, as sketched below
  ○ /logs/ads_view/2017/05/01/23
  ○ /logs/login_event/2017/05/01/23
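For illustration, a minimal Java sketch of how an event's timestamp could map to its hourly bucket directory; the helper name and the UTC assumption are ours, not from the talk:

    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.TimeZone;

    public class HourlyBucket {
        // Hypothetical helper: maps a category and event time to its hourly HDFS directory.
        static String bucketPath(String category, long eventTimeMillis) {
            SimpleDateFormat fmt = new SimpleDateFormat("yyyy/MM/dd/HH");
            fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: buckets use UTC
            return "/logs/" + category + "/" + fmt.format(new Date(eventTimeMillis));
        }

        public static void main(String[] args) {
            // Prints something like /logs/ads_view/2017/05/01/23
            System.out.println(bucketPath("ads_view", System.currentTimeMillis()));
        }
    }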

[Diagram: Life of an Event. Clients and HTTP clients send events through client daemons and an HTTP endpoint; events are aggregated by Category and stored on HDFS.]

Page 5

Event Log Stats
● >1T: trillion events a day, across millions of clients
● ~3PB: of data a day, incoming uncompressed
● <1500: nodes, collocated with HDFS DataNodes
● >600: categories; events grouped by Category

Page 6

Event Log Architecture
[Diagram: Inside a datacenter, clients (and remote clients over HTTP) write to a local log collection daemon; aggregators group log events by Category and write to Storage (HDFS); the Log Processor writes processed output to Storage (HDFS); the Log Replicator copies data to other Storage (HDFS) clusters and to Storage (Streaming).]


Page 11

Event Log Architecture
[Diagram: Cross-datacenter view. Events inside DC1 land on RT Storage (HDFS) and are replicated to DW, Prod, and Cold Storage (HDFS) clusters; DC2 mirrors the same flow.]


Page 14

Collection

Page 15

Event Log Architecture (recap of the diagram on Page 6)

Page 16

Event Collection Overview
● Past: Scribe Client Daemon → Scribe Aggregator Daemons
● Present: Scribe Client Daemon → Flume Aggregator Daemons
● Future: Flume Client Daemon → Flume Aggregator Daemons

Page 17

Event Collection: Past
Challenges with Scribe
● Too many open file handles to HDFS
  ○ 600 categories x 1500 aggregators x 6 files per hour ≈ 5.4M files per hour
● High IO wait on DataNodes at scale
● Max limit on throughput per aggregator
● Difficult to track message drops
● No longer under active open source development

Page 18

Event Collection: Present
Apache Flume
● Well-defined interfaces
● Open source
● Concept of transactions
● Existing implementations of interfaces
[Diagram: A Flume Agent connects a Client to HDFS through Source → Channel → Sink.]
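The "concept of transactions" is what makes drops trackable: events move on and off a channel only inside a transaction, so a failed downstream write rolls back instead of silently losing data. A minimal sketch against Flume's Java API; the event body and default channel settings are illustrative:

    import org.apache.flume.Channel;
    import org.apache.flume.Context;
    import org.apache.flume.Event;
    import org.apache.flume.Transaction;
    import org.apache.flume.channel.MemoryChannel;
    import org.apache.flume.conf.Configurables;
    import org.apache.flume.event.EventBuilder;

    public class TransactionSketch {
        public static void main(String[] args) throws Exception {
            Channel channel = new MemoryChannel();
            Configurables.configure(channel, new Context()); // default capacity settings
            channel.start();

            Event event = EventBuilder.withBody("login_event payload".getBytes());
            Transaction tx = channel.getTransaction();
            tx.begin();
            try {
                channel.put(event); // visible to sinks only after commit
                tx.commit();
            } catch (Exception e) {
                tx.rollback();      // nothing is half-written on failure
                throw e;
            } finally {
                tx.close();
            }
        }
    }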

Page 19

Event Collection: Present
Category Group
● Combine multiple related categories into a category group
● Provide different properties per group
● A group contains events from multiple categories, generating fewer, combined sequence files
[Diagram: Agents 1-3 each send Categories 1-3 into a single Category Group.]

Page 20

Event Collection: Present
Aggregator Group
● A set of aggregators hosting the same set of category groups
● Makes it easy to manage a group of aggregators hosting a subset of categories
[Diagram: Agents 1-8 route Category Group 1 to Aggregator Group 1 and Category Group 2 to Aggregator Group 2.]

Page 21

Event Collection: Present
Flume features to support groups
● Extend Interceptor to multiplex events into groups (see the sketch below)
● Implement a memory channel group to have a separate memory channel per category group
● ZooKeeper registration per category group for service discovery
● Metrics per category group
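A hedged sketch of what extending Flume's Interceptor for multiplexing might look like. The header names ("category", "categoryGroup") and the static mapping are our assumptions; a multiplexing channel selector would then route each event on the group header:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import org.apache.flume.Context;
    import org.apache.flume.Event;
    import org.apache.flume.interceptor.Interceptor;

    public class CategoryGroupInterceptor implements Interceptor {
        private final Map<String, String> categoryToGroup = new HashMap<>();

        @Override public void initialize() {
            // Hypothetical static mapping; a real mapping could come from config or ZooKeeper.
            categoryToGroup.put("ads_view", "ads_group");
            categoryToGroup.put("ads_click", "ads_group");
            categoryToGroup.put("login_event", "login_group");
        }

        @Override public Event intercept(Event event) {
            // Tag each event with its category group so a multiplexing selector can route it.
            String category = event.getHeaders().get("category");
            String group = categoryToGroup.getOrDefault(category, "default_group");
            event.getHeaders().put("categoryGroup", group);
            return event;
        }

        @Override public List<Event> intercept(List<Event> events) {
            for (Event e : events) intercept(e);
            return events;
        }

        @Override public void close() {}

        public static class Builder implements Interceptor.Builder {
            @Override public Interceptor build() { return new CategoryGroupInterceptor(); }
            @Override public void configure(Context context) {}
        }
    }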

Page 22

Event Collection: Present
Flume performance improvements
● Increased HDFSEventSink batching for roughly 5x throughput, reducing spikes on the memory channel
● Implemented buffering in HDFSEventSink instead of using SpillableMemoryChannel
● Stream events at close to network speed
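The batching win is generic: taking many events per channel transaction and syncing once amortizes HDFS RPC and flush costs. A simplified sketch of a batched sink loop, modeled on (but not copied from) HDFSEventSink; the batch size and writer interface are illustrative:

    import org.apache.flume.Channel;
    import org.apache.flume.Event;
    import org.apache.flume.Transaction;

    public class BatchedSinkLoop {
        static final int BATCH_SIZE = 1000; // illustrative; larger batches cut per-event overhead

        // One drain cycle: take up to BATCH_SIZE events in a single transaction,
        // append them all, and sync once instead of once per event.
        static void drainOnce(Channel channel, HdfsWriter writer) throws Exception {
            Transaction tx = channel.getTransaction();
            tx.begin();
            try {
                int taken = 0;
                Event event;
                while (taken < BATCH_SIZE && (event = channel.take()) != null) {
                    writer.append(event.getBody());
                    taken++;
                }
                writer.sync(); // single flush for the whole batch
                tx.commit();
            } catch (Exception e) {
                tx.rollback(); // events stay on the channel for retry
                throw e;
            } finally {
                tx.close();
            }
        }

        // Hypothetical minimal writer standing in for the HDFS output stream.
        interface HdfsWriter {
            void append(byte[] body) throws Exception;
            void sync() throws Exception;
        }
    }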

Page 23

Processing

Page 24

Event Log Architecture (recap of the diagram on Page 6)

Page 25

Log Processor Stats: Processing Trillion Events per Day
● 8: wall clock hours to process one day of data
● >1PB: of data per day, the output of cleaning, compressing, consolidating, and converting
● 20-50%: disk space saved by processing Flume sequence files

Page 26

Log Processor Needs: Processing Trillion Events per Day
● Make processing log data easier for analytics teams
● Disk space is at a premium on analytics clusters
● Too many files still cause increased pressure on the NameNode
● Log data is read many times, and different teams all perform the same pre-processing steps on the same data sets

Page 27

Log Processor Steps
[Diagram: In Datacenter 1, demux jobs split category groups into individual categories: ads_group_demuxer splits ads_group/yyyy/mm/dd/hh into ads_click/yyyy/mm/dd/hh and ads_view/yyyy/mm/dd/hh; login_group_demuxer splits login_group/yyyy/mm/dd/hh into login_event/yyyy/mm/dd/hh.]


Page 30

Log Processor Steps
1. Decode: Base64 encoding from logged data
2. Demux: Category groups into individual categories for easier consumption by analytics teams
3. Clean: Corrupt, empty, or invalid records so data sets are more reliable
4. Compress: Logged data at the highest level to save disk space (from LZO level 3 to LZO level 7)
5. Consolidate: Small files to reduce pressure on the NameNode
6. Convert: Some categories into Parquet for fast use in ad-hoc exploratory tools

Page 31

Why Base64 Decoding? Legacy Choices
● Scribe's contract amounts to sending a binary blob to a port
● Scribe used newline characters to delimit records in a batch of binary blobs
● Valid records may themselves include newline characters
● So Scribe base64-encoded received binary blobs to avoid confusing record bytes with the record delimiter
● Base64 encoding is no longer necessary because we have moved to one serialized Thrift object per binary blob
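A small sketch of the delimiter problem: a record that itself contains a newline survives a newline-delimited batch only because base64's output alphabet contains no newline. This uses the standard java.util.Base64; the payload is made up:

    import java.nio.charset.StandardCharsets;
    import java.util.Base64;

    public class DelimiterDemo {
        public static void main(String[] args) {
            // A record whose bytes happen to contain the batch delimiter.
            byte[] record = "field1\nfield2".getBytes(StandardCharsets.UTF_8);
            // Base64 output never contains '\n', so the batch can be split safely.
            String encoded = Base64.getEncoder().encodeToString(record);
            String batch = encoded + "\n" + encoded; // newline-delimited batch of two records
            for (String part : batch.split("\n")) {
                byte[] decoded = Base64.getDecoder().decode(part);
                System.out.println(new String(decoded, StandardCharsets.UTF_8).replace("\n", "\\n"));
            }
        }
    }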

Page 32

Log Demux Visual
[Diagram: DEMUX reads /raw/ads_group/yyyy/mm/dd/hh/ads_group_1.seq and writes /logs/ads_click/yyyy/mm/dd/hh/1.lzo and /logs/ads_view/yyyy/mm/dd/hh/1.lzo.]


Page 35

Log Processor Daemon
● One log processor daemon per RT Hadoop cluster, where Flume aggregates logs
● Primarily responsible for demuxing category groups out of the Flume sequence files
● The daemon schedules Tez jobs every hour for every category group in a thread pool (sketched below)
● The daemon atomically presents processed category instances so partial data can't be read
● Processing proceeds according to criticality of data, or "tiers"
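A rough sketch of the daemon's scheduling shape under our assumptions; the pool size, group names, and job launcher are placeholders, and the real daemon submits Tez DAGs and handles tiering and retries:

    import java.util.Arrays;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class LogProcessorDaemonSketch {
        private final ScheduledExecutorService clock = Executors.newSingleThreadScheduledExecutor();
        private final ExecutorService workers = Executors.newFixedThreadPool(8); // pool size is illustrative
        private final List<String> categoryGroups;

        LogProcessorDaemonSketch(List<String> categoryGroups) {
            this.categoryGroups = categoryGroups;
        }

        void start() {
            // Every hour, submit one demux job per category group to the worker pool.
            clock.scheduleAtFixedRate(() -> {
                for (String group : categoryGroups) {
                    workers.submit(() -> runDemuxJob(group));
                }
            }, 0, 1, TimeUnit.HOURS);
        }

        private void runDemuxJob(String group) {
            // Placeholder: the real daemon launches a Tez DAG over this group's hourly partition.
            System.out.println("demuxing " + group);
        }

        public static void main(String[] args) {
            new LogProcessorDaemonSketch(Arrays.asList("ads_group", "login_group")).start();
        }
    }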

Page 36

Why Tez?
● Some categories are significantly larger than others (KBs vs. TBs)
● MapReduce demux? Each reducer handles a single category
● Streaming demux? Each spout or channel handles a single category
● Either way, massive skew in partitioning by category causes long-running tasks, which slows down job completion time
● Tez has relatively well understood fault tolerance semantics, similar to MapReduce, Spark, etc.

Page 37

Dynamic Partitioning
● Tez's dynamic hash partitioner adjusts partitions at runtime if necessary, allowing a large partition to be split further so that multiple tasks can process events for a single category
  ○ More info at TEZ-3209
  ○ Thanks to team member Ming Ma for the contribution!
● Easier horizontal scaling while providing more predictable processing times
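The idea (not Tez's actual API) can be shown with a salted hash: a hot category fans out across several partitions so no single task owns all of its events:

    import java.util.Collections;
    import java.util.Map;
    import java.util.Objects;

    public class SaltedPartitionSketch {
        // Illustrative only: a hot category is given a fan-out > 1, so its events
        // spread over several partitions instead of landing on one long-running task.
        static int partition(String category, long eventId, Map<String, Integer> fanOut, int numPartitions) {
            int salt = (int) (eventId % fanOut.getOrDefault(category, 1));
            return Math.floorMod(Objects.hash(category, salt), numPartitions);
        }

        public static void main(String[] args) {
            Map<String, Integer> fanOut = Collections.singletonMap("ads_view", 4); // hypothetical hot category
            for (long id = 0; id < 8; id++) {
                System.out.println("ads_view event " + id + " -> partition "
                    + partition("ads_view", id, fanOut, 16));
            }
        }
    }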

Page 38

Typical Partitioning Visual
[Diagram: Input File 1 is split across Task 1, Task 2, and Task 3.]

Page 39

Hash Partitioning Visual
[Diagram: Input File 1 is split across Tasks 1 through 5.]

Page 40

Replication

Page 41

Event Log Architecture (recap of the diagram on Page 6)

Page 42

Log Replication Stats: Replicating Trillion Events per Day
● >24k: copy jobs per day, across all analytics clusters
● >1PB: of data per day, replicated to analytics clusters
● ~10: analytics clusters

Page 43

Log Replication Needs: Replicating Trillion Events per Day
● Collocate logs with compute and disk capacity for analytics teams
  ○ Cross-datacenter reads are incredibly expensive
  ○ Cross-rack reads within a datacenter are still expensive
● Critical data set copies and backups in case of datacenter failures

Page 44

Log Replication Visual
[Diagram: Replication jobs (ads_click_repl, ads_view_repl, login_event_repl) copy hourly partitions such as ads_click/yyyy/mm/dd/hh, ads_view/yyyy/mm/dd/hh, and login_event/yyyy/mm/dd/hh from Datacenter 1 to Datacenter N.]


Page 55

Log Replication Steps: Distributing Trillion Events per Day
1. Copy: Logged data from all processing clusters to the target cluster
2. Merge: Copied data into one directory
3. Present: Data atomically, by renaming it to an accessible location
4. Publish: Metadata to the Data Abstraction Layer to notify analytics teams that data is ready for consumption
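Step 3 relies on the fact that a rename within a single HDFS namespace is atomic, so readers see either no hourly partition or a complete one. A minimal sketch; the staging and final paths are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class AtomicPresentSketch {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // Hypothetical layout: copy and merge land in a hidden staging directory first.
            Path staging = new Path("/logs/.staging/ads_click/2017/05/01/23");
            Path finalDir = new Path("/logs/ads_click/2017/05/01/23");
            // The rename is atomic within one HDFS namespace, so consumers never
            // observe a partially copied hour.
            if (!fs.rename(staging, finalDir)) {
                throw new IllegalStateException("failed to present " + finalDir);
            }
        }
    }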

Page 56

Log Replicator Daemon
● One log replicator daemon per DW, PROD, or COLD Hadoop cluster, where analytics users run queries and jobs
● Primarily responsible for copying category partitions out of the RT Hadoop clusters
● The daemons schedule Hadoop DistCp jobs every hour for every category (see the sketch below)
● The daemon atomically presents processed category instances so partial data can't be read
● Replication proceeds according to criticality of data, or "tiers"
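A hedged sketch of launching one hourly copy with the Hadoop 2.x DistCp Java API; the cluster URIs and paths are made up, and newer Hadoop versions build DistCpOptions via a builder instead:

    import java.util.Arrays;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.tools.DistCp;
    import org.apache.hadoop.tools.DistCpOptions;

    public class HourlyCopySketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical source (RT cluster) and staging target (DW cluster) for one category hour.
            Path source = new Path("hdfs://rt-cluster/logs/ads_click/2017/05/01/23");
            Path target = new Path("hdfs://dw-cluster/logs/.staging/ads_click/2017/05/01/23");
            DistCpOptions options = new DistCpOptions(Arrays.asList(source), target);
            // execute() submits the MapReduce copy job and, by default, blocks until it finishes.
            new DistCp(new Configuration(), options).execute();
        }
    }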

Page 57

The Future

Page 58

Future of Log Management
● Flume client improvements for tracing, SLAs, and throttling
● Flume client support for message validation before logging
● Centralized configuration management
● Processing and replication every 10 minutes instead of every hour

Page 59

Thank You! Questions?