Distilling Insights @ Appsflyer

Arnon Rotem-Gal-Oz, Chief Data Officer

[Architecture diagram]
Event sources: installs, clicks, in-app, launches; Accounts
Ingestion: Kafka, Secor
Storage tiers: Raw (sequence files), DW (Parquet files), DM (aggregations)
Processing: Spark ETL, Spark (aggregations), Spark ML
SQL access: SparkSQL (evaluating Drill, Presto)
Stores: columnar database (Redshift, evaluating Vertica); IMDG (Ignite, evaluating Geode)
Consumers: application dashboard, self-serve BI (TBD), internal tools
Other labels: latest events, scoring, exploration, agg. logic
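
As a rough illustration of the three storage tiers named in the diagram (Raw, DW, DM), the sketch below walks a handful of events through them in plain Python. The record fields (`app_id`, `event`, `ts`), the helper names `to_dw` and `to_dm`, and the daily roll-up are all assumptions made for illustration; the deck does not describe the actual schemas or the Spark jobs.

```python
import json
from collections import Counter
from datetime import datetime, timezone

# Hypothetical raw records, one JSON string each, as they might land from the queue.
RAW = [
    '{"app_id": "a1", "event": "install", "ts": 1420070400}',
    '{"app_id": "a1", "event": "click", "ts": 1420074000}',
    '{"app_id": "a2", "event": "click", "ts": 1420160400}',
    'not-json',  # malformed line, dropped in the DW step
]

def to_dw(raw_lines):
    """Raw -> DW: parse each line, skip malformed ones, add a UTC day column."""
    rows = []
    for line in raw_lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue
        day = datetime.fromtimestamp(rec["ts"], tz=timezone.utc).date().isoformat()
        rows.append({"app_id": rec["app_id"], "event": rec["event"], "day": day})
    return rows

def to_dm(dw_rows):
    """DW -> DM: count events per (app_id, day, event)."""
    return Counter((r["app_id"], r["day"], r["event"]) for r in dw_rows)

dw = to_dw(RAW)
dm = to_dm(dw)
```

Each tier here is derived only from the one before it, so the DM counts can always be rebuilt from the raw records if the aggregation logic changes.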

Data’s hierarchy of needs*

*With apologies to Maslow

Acted upon

Presented

Distilled

Usable

Accessible

Exist

Working off of RAW data

“Malting”: Just slap SQL on everything

Accessible
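
To make the "SQL on everything" idea concrete, here is a minimal sketch that loads a few raw records into a table and queries them. It uses Python's built-in `sqlite3` purely as a stand-in for the SQL engines named in the deck (SparkSQL, Drill, Presto); the table name, columns, and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical raw events: (app_id, event, unix timestamp).
raw_events = [
    ("a1", "install", 1420070400),
    ("a1", "click", 1420074000),
    ("a2", "click", 1420160400),
]

# An in-memory database is enough to show the idea.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (app_id TEXT, event TEXT, ts INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", raw_events)

# Once the data is behind SQL, ad hoc questions become one-liners.
rows = conn.execute(
    "SELECT event, COUNT(*) AS n FROM events GROUP BY event ORDER BY event"
).fetchall()
conn.close()
```

The point of the sketch is the access pattern rather than the engine: anyone who can write a `GROUP BY` can now explore the data without custom code.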

Fermenting

Usable

Distilling

Distilled

RT insights

Predictive

Prescriptive

Dashboards and whatnot

Presented

Sidetrack: On use of Spark

Hadoop & Mesos

Land data in a queue

All data is time-series

Enrich with foreign keys before persisting

Analyze and balance jobs

Not everything is big data
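
Two of the points above, treating all data as time-series and enriching with foreign keys before persisting, can be sketched together. In this minimal example each event is joined to its account once at ingest time and assigned an hourly partition path, so later jobs do not need to repeat the lookup. The lookup table, field names, function names, and path layout are assumptions for illustration, not the layout described in the deck.

```python
from datetime import datetime, timezone

# Hypothetical dimension data: app_id -> account_id (the foreign key to attach).
APP_TO_ACCOUNT = {"a1": "acct-10", "a2": "acct-20"}

def enrich(event, lookup=APP_TO_ACCOUNT):
    """Return a copy of the event with account_id attached (None if unknown)."""
    enriched = dict(event)
    enriched["account_id"] = lookup.get(event["app_id"])
    return enriched

def partition_path(event, root="events"):
    """Build an hourly, time-based path such as events/2015/01/01/00 (UTC)."""
    t = datetime.fromtimestamp(event["ts"], tz=timezone.utc)
    return f"{root}/{t:%Y/%m/%d/%H}"

def persist(events, store):
    """Enrich each event, then append it to its time partition in `store`."""
    for ev in events:
        row = enrich(ev)
        store.setdefault(partition_path(row), []).append(row)
    return store

# A dict stands in for the file system or object store.
store = persist(
    [
        {"app_id": "a1", "event": "install", "ts": 1420070400},
        {"app_id": "a2", "event": "click", "ts": 1420160400},
    ],
    {},
)
```

Doing the join at write time trades a little ingest work for simpler and cheaper reads, and the time-based paths let a job touch only the hours it needs.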

We’re hiring…

[email protected]
