Even Faster: When Presto Meets Parquet @ Uber

Zhenxiao Luo, Software Engineer @ Uber


Page 1:

Even Faster: When Presto Meets Parquet @ Uber

Zhenxiao Luo

Software Engineer @ Uber

Page 2:

Agenda

● Mission

● Uber Business Highlights

● Analytics Infrastructure @ Uber

● Presto: Interactive SQL engine for Big Data

● Parquet: Columnar Storage for Big Data

● Parquet Optimizations for Presto

● Ongoing Work

Page 3:

Uber Mission

Transportation as reliable as running water, everywhere, for everyone

Page 4:

Uber Stats

6 Continents

73 Countries

450 Cities

12,000 Employees

10+ Million Avg. Trips/Day

40+ Million MAU Riders

1.5+ Million MAU Drivers

Page 5:

Analytics Infrastructure @ Uber

[Architecture diagram; labels recovered from the slide, in the order they appear]

● Kafka, Schemaless, MySQL, Postgres, Vertica

● Streamio, Raw Data, Raw Tables, Sqoop, Reports

● Hadoop

● Hive, Presto, Spark

● Notebook, Ad Hoc Queries, Real Time Applications, Machine Learning Jobs, Business Intelligence Jobs

● Cluster Management, All-Active, Observability, Security

● Vertica, Samza, Pinot, Flink, MemSQL

● Modeled Tables, Streaming, Warehouse, Real-time

Page 6:

Parquet @ Uber

Raw Tables

● No preprocessing

● Highly nested

● ~30 minutes ingestion latency

● Huge tables

Modeled Tables

● Preprocessing via Hive ETL

● Flattened

● ~12 hours ingestion latency

Page 7:

Scale of Presto @ Uber

● 2 clusters

○ Application cluster

■ Hundreds of machines

■ 100K queries per day

■ P90: 30s

○ Ad hoc cluster

■ Hundreds of machines

■ 20K queries per day

■ P90: 60s

● Access to both raw and modeled tables

○ 5 petabytes of data

● Total 120K+ queries per day

Page 8:

Applications of Presto @ Uber

● Marketplace pricing

○ Real-time driver incentives

● Communication platform

○ Driver quality and action platform

○ Rider/driver cohorting

○ Ops, comms, & marketing

● Growth marketing

○ BI dashboard for growth marketing

● Data science

○ Exploratory analytics using notebooks

● Data quality

○ Freshness and quality check

● Ad hoc queries

Page 9:

What is Presto: Interactive SQL Engine for Big Data

Interactive query speeds

Horizontally scalable

ANSI SQL

Battle-tested by Facebook, Uber, & Netflix

Completely open source

Access to petabytes of data in the Hadoop data lake

Page 10:

How Presto Works

Page 11:

Why Presto is Fast

● Data in memory during execution

● Pipelining and streaming

● Columnar storage & execution

● Bytecode generation

○ Inline virtual function calls

○ Inline constants

○ Rewrite inner loops

○ Rewrite type-specific branches

Page 12:

Resource Management

● Presto has its own resource manager

○ Not on YARN

○ Not on Mesos

● CPU Management

○ Priority queues

○ Short-running queries get higher priority

● Memory Management

○ Max memory per query per node

○ If query exceeds max memory limit, query fails

○ No OutOfMemory in Presto process
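The memory rules above can be illustrated with a minimal Python sketch. It is not Presto's implementation: `QueryMemoryContext`, `ExceededMemoryLimit`, and `run_query` are hypothetical names, and the limit is a plain byte count. It only shows the idea that a per-query, per-node limit is checked before each reservation, so the offending query fails while the process itself never runs out of memory.

```python
class ExceededMemoryLimit(Exception):
    """Raised to fail one query; the worker process itself keeps running."""


class QueryMemoryContext:
    """Tracks bytes reserved by one query on one node (hypothetical helper)."""

    def __init__(self, query_id, max_bytes_per_node):
        self.query_id = query_id
        self.max_bytes = max_bytes_per_node
        self.reserved = 0

    def reserve(self, nbytes):
        # Check the limit before allocating, so the query fails cleanly
        # instead of the process running out of memory.
        if self.reserved + nbytes > self.max_bytes:
            raise ExceededMemoryLimit(
                f"query {self.query_id} exceeded {self.max_bytes} bytes on this node"
            )
        self.reserved += nbytes

    def free(self, nbytes):
        self.reserved = max(0, self.reserved - nbytes)


def run_query(ctx, allocations):
    """Return 'FINISHED' or 'FAILED' for a sequence of allocation sizes."""
    try:
        for nbytes in allocations:
            ctx.reserve(nbytes)
        return "FINISHED"
    except ExceededMemoryLimit:
        return "FAILED"
    finally:
        # Release everything the query held, whether it finished or failed.
        ctx.free(ctx.reserved)
```

A query whose reservations stay under the limit finishes; one that crosses it is reported as failed, and in both cases its reservation is released.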

Page 13:

Limitations

● No fault tolerance

● Joins that do not fit in memory

○ Query fails

○ No OutOfMemory in Presto process

○ Try it on Hive

● Coordinator is a single point of failure

Page 14:

Presto Connectors

Page 15:

Parquet: Columnar Storage for Big Data

Page 16:

Parquet Optimizations for Presto

Example Query:

SELECT base.driver_uuid
FROM hdrone.mezzanine_trips
WHERE datestr = '2017-03-02' AND base.city_id IN (12)

Data:

● Up to 15 levels of nesting

● Up to 80 fields inside each struct

● Fields are added/deleted/updated inside struct

Page 17:

Old Parquet Reader

Page 18:

Nested Column Pruning
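As a rough illustration of the idea named in this slide title, the Python sketch below prunes a nested schema down to the leaves a query references. It is an assumption-laden toy, not the reader discussed in the talk: `prune_schema` and the sample schema are hypothetical, and only `base.driver_uuid` and `base.city_id` come from the example query. `datestr` is left out on the assumption that it is a partition column rather than a field stored in the file.

```python
def prune_schema(schema, required_paths):
    """Keep only the leaves of a nested schema that the query references.

    schema: dict mapping field name -> type string (leaf) or dict (struct).
    required_paths: dotted paths such as "base.driver_uuid".
    """
    pruned = {}
    for path in required_paths:
        src, dst = schema, pruned
        parts = path.split(".")
        for name in parts[:-1]:
            src = src[name]                 # descend into the struct
            dst = dst.setdefault(name, {})  # mirror it in the pruned schema
        dst[parts[-1]] = src[parts[-1]]
    return pruned


# Hypothetical, heavily simplified stand-in for a wide nested table.
schema = {
    "datestr": "varchar",
    "base": {
        "driver_uuid": "varchar",
        "city_id": "bigint",
        "fare": {"amount": "double", "currency": "varchar"},
    },
}

# Only the two leaves used by the example query survive; "fare" is dropped.
needed = prune_schema(schema, ["base.driver_uuid", "base.city_id"])
```

With structs of up to 80 fields, reading two leaves instead of the whole struct is where the saving would come from.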

Page 19:

Columnar Reads

Page 20:

Predicate Pushdown
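A minimal Python sketch of predicate pushdown in general terms, assuming each row group carries min/max statistics for `city_id`. `RowGroup`, `might_contain`, and `scan` are hypothetical names for illustration and do not reflect any real Parquet or Presto API. The predicate from the example query (`base.city_id IN (12)`) is checked against the statistics first, and row groups that cannot match are never read.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Hypothetical stand-in for a Parquet row group and its column statistics."""
    min_city_id: int
    max_city_id: int
    rows: list  # (driver_uuid, city_id) tuples


def might_contain(group, wanted_ids):
    # Min/max statistics can only rule a row group out; they never prove a match.
    return any(group.min_city_id <= v <= group.max_city_id for v in wanted_ids)


def scan(groups, wanted_ids):
    """Return matching driver_uuids and how many row groups were actually read."""
    out, groups_read = [], 0
    for g in groups:
        if not might_contain(g, wanted_ids):
            continue  # skipped using statistics alone
        groups_read += 1
        out.extend(uuid for uuid, city in g.rows if city in wanted_ids)
    return out, groups_read
```

The row-level filter is still applied inside each surviving row group, because a value range that includes 12 does not guarantee that 12 is present.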

Page 21:

Dictionary Pushdown
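A minimal Python sketch of the general idea behind dictionary pushdown, assuming a dictionary-encoded column chunk stores its distinct values once and the rows as indexes into that dictionary. The layout and the names `dictionary_may_match` and `scan_with_dictionary` are hypothetical. If no dictionary entry satisfies the predicate, the chunk can be skipped without decoding any rows, even when its min/max range would have included the value.

```python
def dictionary_may_match(dictionary, wanted):
    """True if any wanted value appears in the column chunk's dictionary."""
    return not wanted.isdisjoint(dictionary)


def scan_with_dictionary(chunks, wanted):
    """chunks: list of (dictionary, encoded_ids) pairs for one column (hypothetical layout)."""
    matches, chunks_decoded = [], 0
    for dictionary, encoded in chunks:
        if not dictionary_may_match(dictionary, wanted):
            continue  # no dictionary entry satisfies the predicate
        chunks_decoded += 1
        matches.extend(dictionary[i] for i in encoded if dictionary[i] in wanted)
    return matches, chunks_decoded


# The first chunk spans 1..30, so min/max statistics alone could not exclude 12,
# but its dictionary {1, 30} shows that 12 never occurs.
chunks = [([1, 30], [0, 1, 1, 0]), ([5, 12], [1, 0, 1])]
result = scan_with_dictionary(chunks, {12})
```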

Page 22:

Lazy Reads
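A minimal Python sketch of lazy reads in general terms, assuming the predicate column is evaluated first and the projected column is decoded only when at least one row matches. `LazyColumn` and `filter_batch` are hypothetical helpers, not Presto classes. The column names mirror the example query.

```python
class LazyColumn:
    """Defers decoding a column until its values are first requested."""

    def __init__(self, loader):
        self._loader = loader
        self._values = None
        self.loaded = False

    def values(self):
        if not self.loaded:
            self._values = self._loader()
            self.loaded = True
        return self._values


def filter_batch(city_ids, driver_uuids, wanted):
    """Evaluate the predicate column first; touch the projected column only on a hit."""
    hits = [i for i, c in enumerate(city_ids.values()) if c in wanted]
    if not hits:
        return []  # driver_uuids is never decoded for this batch
    uuids = driver_uuids.values()
    return [uuids[i] for i in hits]
```

For a selective filter such as a single city, most batches would return early, so the cost of decoding `driver_uuid` is paid only where it is needed.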

Page 23:

Benchmarking Results

Page 24:

Ongoing Work

● Multi-tenancy support

● High availability for coordinator

● Geospatial optimization

● Authentication & authorization

Page 25:

Thank you

Proprietary and confidential © 2016 Uber Technologies, Inc. All rights reserved. No part of this document may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval systems, without permission in writing from Uber. This document is intended only for the use of the individual or entity to whom it is addressed and contains information that is privileged, confidential or otherwise exempt from disclosure under applicable law. All recipients of this document are notified that the information contained herein includes proprietary and confidential information of Uber, and recipient may not make use of, disseminate, or in any way disclose this document or any of the enclosed information to any person other than employees of addressee to the extent necessary for consultations with authorized personnel of Uber.

We are hiring: https://www.uber.com/careers/list/27366/

Send resumes to: [email protected] or [email protected]

Interested in learning more about Uber Eng? Eng.uber.com

Follow us on Twitter: @UberEng