Presto - Analytical Database. Overview and use cases.
Wojciech Biela, Łukasz Osipiuk
https://prestodb.io
Who are we?
Center for Hadoop
History of Presto
FALL 2008: Facebook open sources Hive
FALL 2012: 6 developers start Presto development
SPRING 2013: Presto rolled out within Facebook
FALL 2013: Facebook open sources Presto
FALL 2014: 88 releases, 41 contributors, 3943 commits
FALL 2015: 132 releases, 105 contributors, 6300 commits
Teradata part of Presto community & offers support
What is Presto?
➔ 100% open source distributed ANSI SQL engine for Big Data
➔ Optimized for low latency, interactive querying
◆ Cross platform query capability, not only SQL on Hadoop
◆ Distributed under the Apache license, now supported by Teradata
◆ Used by a community of well known, well respected technology companies
◆ Modern code base
◆ Proven scalability
High level architecture
[Diagram: Client; Coordinator (Parser/analyzer, Planner, Scheduler); Workers; pluggable Metadata API, Data location API, and Data stream API]
Plan execution: Hive vs. Presto
[Diagram: Hive plan drawn as map and reduce stages with I/O steps; Presto plan drawn as a graph of tasks with I/O]
Presto Extensibility – connector interfaces
[Diagram: Coordinator (Parser/analyzer, Planner, Scheduler) and Worker; the Metadata API, Data location API, and Data stream API are each implemented by connectors: Hive, Cassandra, Kafka, MySQL, …]
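The three connector-facing APIs named on this slide can be pictured with a small sketch. This is not Presto's actual SPI (which is written in Java). The names `Connector`, `list_columns`, `get_splits`, `read_split`, `InMemoryConnector`, and `scan` are hypothetical, chosen only to illustrate how metadata, data location, and data streaming might be separated. The notes on which engine component calls which API follow the usual reading of the diagram and are an assumption.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Tuple

Row = Tuple[object, ...]


class Connector(ABC):
    """Hypothetical interface; names are illustrative, not Presto's SPI."""

    @abstractmethod
    def list_columns(self, table: str) -> List[str]:
        """Metadata API: assumed to serve the parser/analyzer and planner."""

    @abstractmethod
    def get_splits(self, table: str) -> List[int]:
        """Data location API: assumed to tell the scheduler which chunks exist."""

    @abstractmethod
    def read_split(self, table: str, split: int) -> Iterator[Row]:
        """Data stream API: assumed to let a worker stream rows of one chunk."""


class InMemoryConnector(Connector):
    """Toy connector that keeps each table as a list of row chunks."""

    def __init__(self, columns: Dict[str, List[str]],
                 chunks: Dict[str, List[List[Row]]]) -> None:
        self._columns = columns
        self._chunks = chunks

    def list_columns(self, table: str) -> List[str]:
        return list(self._columns[table])

    def get_splits(self, table: str) -> List[int]:
        # One split id per stored chunk.
        return list(range(len(self._chunks[table])))

    def read_split(self, table: str, split: int) -> Iterator[Row]:
        return iter(self._chunks[table][split])


def scan(connector: Connector, table: str) -> List[Row]:
    """Engine-side view: enumerate splits, then stream each one in order."""
    rows: List[Row] = []
    for split in connector.get_splits(table):
        rows.extend(connector.read_split(table, split))
    return rows
```

In this sketch a new data source only has to implement the three methods; the `scan` function never needs to know which backend it is talking to.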
Presto Extensibility – plugins
➔ Connectors
➔ Data types
➔ Extra functions
➔ Security providers
Some usage facts
➔ Facebook
  ◆ Multiple production clusters (100s of nodes total)
    ● Including 300PB Hadoop data warehouse
    ● Single cluster size order of 10s of nodes
  ◆ 1000s of internal daily active users
  ◆ Millions of queries each month
  ◆ Multiple PBs scanned every day
  ◆ Trillions of rows a day
  ◆ ORC format
➔ Netflix
  ◆ Over 250-node production cluster on EC2
  ◆ Over 15 PB in S3 (Parquet format)
  ◆ Over 300 users and 2.5K queries daily
  ◆ presto-cli, R, Python, BI tools
  ◆ 50% queries under 4s
Netflix Data Pipeline
[Diagram labels: TVs, mobile, laptop; events; dimensions; Suro / Kafka; Ursula; Cassandra; Aegisthus; Amazon S3; TD]
Presto use-cases at Facebook
➔ three use cases
◆ Data warehouse - big data
◆ User facing - small data
◆ User facing - medium data
Presto use-cases at Facebook (data warehouse)
HDFS data warehouse
➔ Multiple clusters
➔ O(10^3) users
➔ O(10^6) queries per month
➔ petabytes of data scanned every day
➔ 100s of concurrent queries
[Diagram labels: Client; Loader; Hive; Presto; M/R; Data Node]
[Diagram labels: Client; Dispatcher; multiple Presto boxes]
Presto use-cases at Facebook (realtime)
Real time user facing
Requirements
➔ User facing
➔ 0.1-5 seconds latency
➔ Support for data updates
➔ highly available
➔ 10-15 way joins
[Diagram labels: Client; Loader; Presto; mysql]
Presto use-cases at Facebook (semi realtime)
Requirements
➔ Large data sets (smaller than warehouse)
➔ seconds to minutes latency
➔ predictable performance
➔ 5-15 minutes load latency
➔ 100s concurrent queries
Raptor
[Diagram labels: Client; Raptor; Loader; Presto nodes with Flash; mysql; Kafka; Gluster backup tier]
INSERT INTO raptor_table SELECT * FROM kafka_table WHERE token BETWEEN ${last_token} AND ${next_token}
Mark load in progress in MySQL
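The load step annotated on the diagram can be sketched as a loop that advances a token window. This is a guess at the general shape of such a loader, not Facebook's implementation. `build_insert`, `load_windows`, `run_sql`, and the `checkpoint` dictionary that stands in for the MySQL bookkeeping are all hypothetical; only the INSERT statement itself comes from the slide.

```python
from typing import Callable, Dict, List

# Statement from the slide, with the ${...} placeholders turned into fields.
INSERT_TEMPLATE = (
    "INSERT INTO raptor_table SELECT * FROM kafka_table "
    "WHERE token BETWEEN {last_token} AND {next_token}"
)


def build_insert(last_token: int, next_token: int) -> str:
    """Fill the slide's statement with a concrete token window."""
    if next_token < last_token:
        raise ValueError("next_token must not be smaller than last_token")
    return INSERT_TEMPLATE.format(last_token=last_token, next_token=next_token)


def load_windows(start: int, end: int, step: int,
                 run_sql: Callable[[str], None],
                 checkpoint: Dict[str, object]) -> List[str]:
    """Load tokens start..end (inclusive) in windows of at most `step` tokens.

    `checkpoint` is a stand-in for the MySQL bookkeeping mentioned on the
    slide; here it is just a dict updated before and after each window.
    """
    statements: List[str] = []
    last = start
    while last <= end:
        nxt = min(last + step - 1, end)
        checkpoint["state"] = "IN_PROGRESS"  # mark load in progress
        sql = build_insert(last, nxt)
        run_sql(sql)
        statements.append(sql)
        checkpoint["state"] = "DONE"
        checkpoint["last_token"] = nxt
        last = nxt + 1  # BETWEEN is inclusive, so the next window starts one past
    return statements
```

Because `BETWEEN` is inclusive on both ends, the sketch starts each window one token past the previous one so that no row is loaded twice. Whether the original loader handled window boundaries this way is not stated on the slide.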
Extra features
➔ Physical data reorganization
➔ Fully fledged and atomic DDL
➔ Atomic data loading
➔ Tiered architecture
Presto = Performance
➔ Data stays in memory during execution and is pipelined across nodes MPP-style
➔ Vectorized columnar processing
➔ Presto is written in highly tuned Java
  ◆ Efficient in-memory data structures
  ◆ Very careful coding of inner loops
  ◆ Bytecode generation
➔ Optimized ORC reader
➔ Predicate push-down
➔ Query optimizer
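Predicate push-down, one of the items on this slide, can be illustrated with a toy example. It assumes a columnar layout in which each chunk of a column carries min/max statistics, so a range filter can skip whole chunks without reading them. This is a simplified illustration, not Presto's ORC reader; `Stripe` and `scan_between` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Stripe:
    """A non-empty chunk of one integer column plus its min/max statistics."""
    values: List[int]
    min_value: int = field(init=False)
    max_value: int = field(init=False)

    def __post_init__(self) -> None:
        # Statistics are computed once, when the chunk is written.
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def scan_between(stripes: List[Stripe], low: int, high: int) -> Tuple[List[int], int]:
    """Return values in [low, high] and the number of stripes actually read."""
    matches: List[int] = []
    stripes_read = 0
    for stripe in stripes:
        if stripe.max_value < low or stripe.min_value > high:
            continue  # statistics prove no value can match; skip the stripe
        stripes_read += 1
        matches.extend(v for v in stripe.values if low <= v <= high)
    return matches, stripes_read
```

For example, with stripes holding `[1, 2, 3]`, `[10, 20, 30]`, and `[5, 50]`, a filter for values between 4 and 20 never reads the first stripe, because its maximum is below the lower bound.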
How can I contribute?
www.github.com/facebook/presto
www.github.com/prestodb
Certified Distro: www.teradata.com/presto
Website: www.prestodb.io
Presto Users Group: www.groups.google.com/group/presto-users
Interested in joining Teradata?
● Presto development
● Other Hadoop related development and consulting
Contact our Recruitment Partner: Renata Rosłoniec (VBC), tel. 514 035 237, [email protected]