Data Engineering @ Grab 15 July 2017 Geekcamp JKT

65 cities, 7 countries

>1.1 million drivers in network

Largest in SEA

>50 million downloads

#1 e-hailing in SEA

95% share of third-party taxi-hailing apps

>70% share of private cars & growing

6 services

Infrastructure Evolution of Analytics@Grab

MySQL Replica (2013 / 2014)

1 database, ~20 tables, < 50 reports, < 10 users, 1 engineer

Redshift (2015 / 2016)

20 databases, 100s of tables, 100s of reports, < 500 users, 3 engineers

Presto + EMR + S3 (now)

~20 databases + streams, 100s of tables, > 500 reports, > 500 users, 9 engineers

Redshift for Analytics@Grab

Daily ETL after midnight

Redshift serves multiple use cases

Data Lake@Grab

Pyrois Orchestrator

Hourly ETL

Data stored as Parquet and partitioned by time

Helios Data Lake in S3
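
As a rough illustration of time partitioning for an hourly ETL, the sketch below maps an event timestamp to an hour-level S3 prefix using the common Hive-style `key=value` layout. The bucket name, the table name, and the choice of year/month/day/hour keys are assumptions made for the example; the slides do not show the actual Helios layout.

```python
from datetime import datetime, timezone


def partition_prefix(table: str, ts: datetime,
                     root: str = "s3://example-data-lake") -> str:
    """Return the hour-level partition prefix for an event timestamp.

    `root` and the year/month/day/hour key names are hypothetical.
    """
    if ts.tzinfo is None:
        # Refuse naive timestamps so the partition never depends on local time.
        raise ValueError("timestamp must be timezone-aware")
    ts = ts.astimezone(timezone.utc)  # partition on UTC hours
    return (f"{root}/{table}/"
            f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/hour={ts:%H}/")


if __name__ == "__main__":
    event_time = datetime(2017, 7, 15, 9, 30, tzinfo=timezone.utc)
    print(partition_prefix("bookings", event_time))
```

With a layout like this, an hourly job writes its Parquet files under a single prefix, and a query engine that understands the partition keys can skip every hour outside the requested time range.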

Analytics@Grab Today

Marketing Analytics

User Trust

Data Science

Helios Data Lake in S3

Data Gateway

❖ Group based ACL

❖ Custom JDBC Driver

❖ Query Parser extracts Tables/Columns used

❖ Uses correct cluster based on permissions

❖ Access and Query Logs
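
The gateway behaviour listed above could be sketched roughly as follows: find the tables a query touches, check them against a group-based ACL, record the attempt, and return the cluster to run on. This is only a minimal illustration. The group names, table names, and cluster names are hypothetical, the regex is a naive stand-in for a real SQL parser (it does not extract columns), and none of it is taken from Grab's implementation.

```python
import re

# Hypothetical ACL and routing tables; the slides do not show the real ones.
GROUP_TABLES = {
    "marketing": {"campaigns", "bookings"},
    "data_science": {"campaigns", "bookings", "drivers"},
}
GROUP_CLUSTER = {
    "marketing": "presto-adhoc",
    "data_science": "presto-heavy",
}
QUERY_LOG = []  # in-memory stand-in for the access and query logs

# Naive: captures the identifier that follows FROM or JOIN.
_TABLE_RE = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def tables_used(sql):
    """Return the lower-cased table names referenced after FROM or JOIN."""
    return {name.lower() for name in _TABLE_RE.findall(sql)}


def route(group, sql):
    """Check the group's ACL, log the attempt, and return the cluster name."""
    if group not in GROUP_CLUSTER:
        raise PermissionError(f"unknown group: {group!r}")
    tables = tables_used(sql)
    denied = tables - GROUP_TABLES.get(group, set())
    QUERY_LOG.append(
        {"group": group, "tables": sorted(tables), "allowed": not denied}
    )
    if denied:
        raise PermissionError(f"group {group!r} may not read {sorted(denied)}")
    return GROUP_CLUSTER[group]


if __name__ == "__main__":
    sql = "SELECT * FROM bookings b JOIN campaigns c ON b.id = c.booking_id"
    print(route("marketing", sql))
```

Putting the check in front of the clusters, for example behind a custom JDBC driver as the slide mentions, means clients never connect to a cluster directly, so every query passes through the same ACL and logging path.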

Future

Helios Data Lake in S3

Real time streaming

Real time monitoring

We’re just getting started.