Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017
![Page 1: Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017](https://reader031.fdocuments.in/reader031/viewer/2022030306/58e4998b1a28abf5428b4acd/html5/thumbnails/1.jpg)
UNIFY DATA AT MEMORY SPEED Haoyuan (HY) Li, CEO @ Alluxio Inc. VAULT Conference 2017
March 2017
![Page 2: Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017](https://reader031.fdocuments.in/reader031/viewer/2022030306/58e4998b1a28abf5428b4acd/html5/thumbnails/2.jpg)
HISTORY
• Started at UC Berkeley AMPLab in summer 2012
• Originally named Tachyon
• Rebranded to Alluxio in early 2016
• Open sourced in 2013
• Apache License 2.0
• Latest stable release: Alluxio 1.4.0
• Alluxio 1.5.0 planned for Q2 2017
![Page 3: Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017](https://reader031.fdocuments.in/reader031/viewer/2022030306/58e4998b1a28abf5428b4acd/html5/thumbnails/3.jpg)
© 2017 Alluxio Confidential
BIG DATA ECOSYSTEM YESTERDAY
![Page 4: Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017](https://reader031.fdocuments.in/reader031/viewer/2022030306/58e4998b1a28abf5428b4acd/html5/thumbnails/4.jpg)
BIG DATA ECOSYSTEM TODAY
![Page 5: Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017](https://reader031.fdocuments.in/reader031/viewer/2022030306/58e4998b1a28abf5428b4acd/html5/thumbnails/5.jpg)
BIG DATA ECOSYSTEM ISSUES
![Page 6: Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017](https://reader031.fdocuments.in/reader031/viewer/2022030306/58e4998b1a28abf5428b4acd/html5/thumbnails/6.jpg)
BIG DATA ECOSYSTEM WITH ALLUXIO
Interfaces to compute frameworks: FUSE Compatible File System, Hadoop Compatible File System, Native Key-Value Interface, Native File System
Interfaces to storage systems: GlusterFS Interface, Amazon S3 Interface, Swift Interface, HDFS Interface
![Page 7: Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017](https://reader031.fdocuments.in/reader031/viewer/2022030306/58e4998b1a28abf5428b4acd/html5/thumbnails/7.jpg)
Enabling Applications to Access Data from Any Storage System at Memory Speed
![Page 8: Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017](https://reader031.fdocuments.in/reader031/viewer/2022030306/58e4998b1a28abf5428b4acd/html5/thumbnails/8.jpg)
![Page 9: Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017](https://reader031.fdocuments.in/reader031/viewer/2022030306/58e4998b1a28abf5428b4acd/html5/thumbnails/9.jpg)
![Page 10: Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017](https://reader031.fdocuments.in/reader031/viewer/2022030306/58e4998b1a28abf5428b4acd/html5/thumbnails/10.jpg)
FASTEST-GROWING BIG DATA PROJECT
![Page 11: Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017](https://reader031.fdocuments.in/reader031/viewer/2022030306/58e4998b1a28abf5428b4acd/html5/thumbnails/11.jpg)
• Formerly named Tachyon, born in the AMPLab
• 500+ contributors from 100+ organizations
• Running the world’s largest production clusters
![Page 12: Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017](https://reader031.fdocuments.in/reader031/viewer/2022030306/58e4998b1a28abf5428b4acd/html5/thumbnails/12.jpg)
WHY ALLUXIO
• Co-located compute and data with memory-speed access to data
• Virtualized across different storage systems under a unified namespace
• Scale-out architecture
• File system API, software only
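To illustrate the "unified namespace" idea, here is a minimal toy model in Python. It is not Alluxio's actual API: it assumes a simple mount table that maps paths in one logical namespace onto locations in different storage systems and resolves each path by its longest matching mount point. The class `MountTable`, its methods, and every mount point and URI below are hypothetical.

```python
# Toy model of a unified namespace (hypothetical; not Alluxio's API):
# a mount table maps paths in one logical namespace onto URIs in
# different storage systems.


class MountTable:
    def __init__(self) -> None:
        # logical mount point -> storage URI prefix
        self._mounts = {}

    def mount(self, logical_path: str, storage_uri: str) -> None:
        """Attach a storage location at a logical path."""
        key = logical_path.rstrip("/") or "/"
        self._mounts[key] = storage_uri.rstrip("/")

    def resolve(self, path: str) -> str:
        """Translate a logical path to a storage URI via the longest matching mount point."""
        best = None
        for mount_point in self._mounts:
            prefix = "/" if mount_point == "/" else mount_point + "/"
            if path == mount_point or path.startswith(prefix):
                if best is None or len(mount_point) > len(best):
                    best = mount_point
        if best is None:
            raise FileNotFoundError(f"no mount point covers {path}")
        # Keep the part of the path below the mount point.
        suffix = path if best == "/" else path[len(best):]
        return self._mounts[best] + suffix


# Hypothetical mounts: one logical tree spanning HDFS, S3, and Swift.
ns = MountTable()
ns.mount("/", "hdfs://namenode:9000/")
ns.mount("/logs", "s3://example-bucket/raw-logs")
ns.mount("/archive", "swift://container/archive")

print(ns.resolve("/logs/2017/03/app.log"))  # s3://example-bucket/raw-logs/2017/03/app.log
print(ns.resolve("/tables/users"))          # hdfs://namenode:9000/tables/users
```

Longest-prefix matching lets a more specific mount such as `/logs` take precedence over the root mount, so applications address a single tree of paths while each dataset stays in its original storage system.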
![Page 13: Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017](https://reader031.fdocuments.in/reader031/viewer/2022030306/58e4998b1a28abf5428b4acd/html5/thumbnails/13.jpg)
ALLUXIO BENEFITS
• Unification: new workflows across any data in any storage system
• Performance: orders-of-magnitude improvement in run time
• Flexibility: choice in compute and storage; grow each independently and buy only what is needed
![Page 14: Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017](https://reader031.fdocuments.in/reader031/viewer/2022030306/58e4998b1a28abf5428b4acd/html5/thumbnails/14.jpg)
ALLUXIO DEPLOYMENTS
![Page 15: Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017](https://reader031.fdocuments.in/reader031/viewer/2022030306/58e4998b1a28abf5428b4acd/html5/thumbnails/15.jpg)
ALLUXIO USE CASES
• On-demand analytics and accelerating I/O to and from remote storage
• Managing data across disparate storage systems
• Sharing data across workloads at memory speed
![Page 16: Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017](https://reader031.fdocuments.in/reader031/viewer/2022030306/58e4998b1a28abf5428b4acd/html5/thumbnails/16.jpg)
MANAGE DATA ACROSS STORAGE SYSTEMS
“We’ve been running in production for over 9 months. Alluxio has enabled different applications and frameworks to easily interact with data from different storage systems.”
RESULTS
• Alluxio provides efficient data sharing among Spark Streaming, Spark batch, and Flink jobs
• Improved the performance of Qunar’s system by 15x to 300x
• The tiered storage feature manages storage resources including memory, SSD, and disk
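The tiered storage idea can be sketched as a toy block store that keeps recently used blocks in the fastest tier and demotes the least recently used block one tier down whenever a tier fills. This is a simplified illustration under stated assumptions: `TieredStore`, capacities measured in blocks, the LRU policy, and promote-on-read behavior are all choices made for this sketch, not a description of Alluxio's actual allocation or eviction logic.

```python
from collections import OrderedDict


class TieredStore:
    """Toy tiered block store (hypothetical sketch, not Alluxio's implementation).

    Tiers are ordered fastest to slowest. Newly written and recently read
    blocks go to the top tier; when a tier exceeds its capacity, its least
    recently used block is demoted one tier down.
    """

    def __init__(self, tiers):
        # tiers: list of (name, capacity_in_blocks), fastest first
        self.names = [name for name, _ in tiers]
        self.capacity = [cap for _, cap in tiers]
        # one OrderedDict per tier: block_id -> data, oldest access first
        self.blocks = [OrderedDict() for _ in tiers]

    def _insert(self, level, block_id, data):
        if level == len(self.blocks):
            # Demoted past the slowest tier: discarded in this toy model.
            return
        tier = self.blocks[level]
        tier[block_id] = data
        tier.move_to_end(block_id)  # mark as most recently used
        if len(tier) > self.capacity[level]:
            victim_id, victim_data = tier.popitem(last=False)  # LRU block
            self._insert(level + 1, victim_id, victim_data)

    def put(self, block_id, data):
        for tier in self.blocks:
            tier.pop(block_id, None)  # drop any stale copy
        self._insert(0, block_id, data)

    def get(self, block_id):
        for tier in self.blocks:
            if block_id in tier:
                data = tier.pop(block_id)
                self._insert(0, block_id, data)  # promote on access
                return data
        raise KeyError(block_id)

    def location(self, block_id):
        for name, tier in zip(self.names, self.blocks):
            if block_id in tier:
                return name
        return None


store = TieredStore([("MEM", 2), ("SSD", 2), ("HDD", 4)])
for i in range(5):
    store.put(f"b{i}", f"data{i}")
print([store.location(f"b{i}") for i in range(5)])
# ['HDD', 'SSD', 'SSD', 'MEM', 'MEM']
print(store.get("b0"), store.location("b0"), store.location("b1"))
# data0 MEM HDD
```

Demoting rather than deleting keeps warm data on SSD or disk instead of forcing a reread from remote storage. In any real system, dropping a block from the slowest tier is only safe if a copy is persisted elsewhere; this toy ignores that.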
Qunar uses real-time machine learning for its website ads
• Deployment of 200+ nodes
• 6 billion logs (4.5 TB) daily
• Mix of memory + HDD
![Page 17: Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017](https://reader031.fdocuments.in/reader031/viewer/2022030306/58e4998b1a28abf5428b4acd/html5/thumbnails/17.jpg)
ON-DEMAND ANALYTICS & ACCELERATE I/O TO/FROM REMOTE STORAGE
“The performance was amazing. With Spark SQL alone, it took 100-150 seconds to finish a query; using Alluxio, where data may hit local or remote Alluxio nodes, it took 10-15 seconds.”
RESULTS
• Data queries are now 30x faster with Alluxio
• The Alluxio cluster runs stably, providing over 50 TB of RAM
• By using Alluxio, batch queries that usually lasted over 15 minutes were transformed into interactive queries taking less than 30 seconds
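The figures on this slide can be turned into speedup ratios (old run time divided by new run time). This small Python calculation uses only the numbers quoted above; the `speedup` helper is hypothetical. The Spark SQL quote works out to about 10x, while the batch-to-interactive figures imply at least 30x. The slide does not say which measurement the "30x faster" bullet refers to.

```python
# Speedup = old run time / new run time, using figures quoted on this slide.


def speedup(before_s: float, after_s: float) -> float:
    """Ratio of the old run time to the new run time (both in seconds)."""
    return before_s / after_s


# Spark SQL alone: 100-150 s per query; with Alluxio: 10-15 s.
spark_sql_low = speedup(100, 10)
spark_sql_high = speedup(150, 15)

# Batch queries took "over 15 minutes" and now take "less than 30 seconds".
# Both bounds are inequalities, so this is a lower bound on the speedup.
batch_min = speedup(15 * 60, 30)

print(spark_sql_low, spark_sql_high, batch_min)
# 10.0 10.0 30.0
```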
PMs run interactive queries to gain insights into their products and business
• Deployment of 200+ nodes
• 2+ petabytes of storage
• Mix of memory + HDD
Diagram: Alluxio deployed over Baidu File System
![Page 18: Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017](https://reader031.fdocuments.in/reader031/viewer/2022030306/58e4998b1a28abf5428b4acd/html5/thumbnails/18.jpg)
SHARE DATA ACROSS JOBS @ MEMORY SPEED
“Thanks to Alluxio, we now have the raw data immediately available at every iteration and can skip the costs of loading in terms of time waiting, network traffic, and RDBMS activity.”
RESULTS
• Barclays workflow iteration time decreased from hours to seconds
• Alluxio enabled workflows that were impossible before
• By keeping data only in memory, the I/O cost of loading and storing in Alluxio is now on the order of seconds
Barclays uses queries and machine learning to train models for risk management
• 6-node deployment
• 1 TB of storage
• Memory only
Diagram: Alluxio alongside a relational database (Teradata)