Alluxio (Formerly Tachyon): Unify Data At Memory Speed at Global Big Data Conference San Jose 2017
Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage
Transcript of Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage
![Page 1: Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage](https://reader035.fdocuments.in/reader035/viewer/2022062823/5875bb6f1a28ab33128b4661/html5/thumbnails/1.jpg)
Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage
October 2016
Gene Pang
![Page 2: Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage](https://reader035.fdocuments.in/reader035/viewer/2022062823/5875bb6f1a28ab33128b4661/html5/thumbnails/2.jpg)
About Me and Alluxio, Inc.
Gene Pang, Software Engineer, Alluxio Maintainer
• Ph.D. from UC Berkeley AMPLab
• Previously on Google F1 team
• Twitter: @unityxx
Alluxio Team
• Team members from Google, Palantir, Uber, Yahoo with years of distributed systems development experience
• Graduated from Stanford University, UC Berkeley, CMU, Peking University, and Tsinghua, with CS masters or PhDs
• Top 9 committers of the Alluxio open source project
Investors
• Andreessen Horowitz
![Page 3: Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage](https://reader035.fdocuments.in/reader035/viewer/2022062823/5875bb6f1a28ab33128b4661/html5/thumbnails/3.jpg)
AGENDA
• Alluxio Open Source Status and History
• Alluxio Overview
• Alluxio Use Cases
• What’s Next?
![Page 4: Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage](https://reader035.fdocuments.in/reader035/viewer/2022062823/5875bb6f1a28ab33128b4661/html5/thumbnails/4.jpg)
HISTORY
• Started at UC Berkeley AMPLab in Summer 2012
• Originally named Tachyon
• Open sourced in 2013
• Apache License 2.0
• Latest stable release: Alluxio 1.2.0
• Next release (Alluxio 1.3.0) soon!
• Rebranded as Alluxio in 2016
![Page 5: Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage](https://reader035.fdocuments.in/reader035/viewer/2022062823/5875bb6f1a28ab33128b4661/html5/thumbnails/5.jpg)
OPEN SOURCE ALLUXIO
• One of the fastest growing open-source projects in the big data ecosystem
• Currently over 300 contributors from over 100 organizations
• You are welcome to join our community!
[Chart: Popular Open Source Projects' Growth; labeled series: Alluxio]
![Page 6: Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage](https://reader035.fdocuments.in/reader035/viewer/2022062823/5875bb6f1a28ab33128b4661/html5/thumbnails/6.jpg)
BIG DATA ECOSYSTEM: YESTERDAY, TODAY, ISSUES, AND WITH ALLUXIO
Enabling any application to access data from any storage system at memory-speed
Application-facing interfaces:
• Native File System
• Hadoop Compatible File System
• Native Key-Value Interface
• FUSE Compatible File System
• …
Storage-facing interfaces:
• HDFS Interface
• Amazon S3 Interface
• Swift Interface
• GlusterFS Interface
• …
![Page 7: Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage](https://reader035.fdocuments.in/reader035/viewer/2022062823/5875bb6f1a28ab33128b4661/html5/thumbnails/7.jpg)
TECHNOLOGY TRENDS
• Memory is getting faster, larger, and cheaper
• Memory price is halving every 18 months
• Disk throughput is increasing slowly
Top left chart: https://lazure2.wordpress.com/2013/07/02/20-years-of-samsung-new-management-as-manifested-by-the-latest-june-20th-galaxy-ativ-innovations/
Top right chart: people.eecs.berkeley.edu/~istoica/classes/cs294/15/notes/02-TechnologyTrends.ppt
Bottom chart: jcmit.com/
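The halving claim on this slide can be turned into a simple projection. The following is a minimal sketch, assuming an idealized exponential decline with an 18-month half-life. The starting price of 8.0 per GB is a made-up number for illustration, not a figure from the slides.

```python
def projected_price(initial_price: float, months: float,
                    halving_months: float = 18.0) -> float:
    """Price after `months`, assuming it halves every `halving_months`.

    This is an idealized model of the trend stated on the slide,
    not a fit to real market data.
    """
    return initial_price * 0.5 ** (months / halving_months)


start = 8.0  # hypothetical price per GB, for illustration only
for years in (0, 3, 6):
    # 3 years is two halvings, 6 years is four halvings.
    print(years, "years:", projected_price(start, years * 12))
```

Under this assumption, three years means two halvings (one quarter of the starting price) and six years means four halvings (one sixteenth).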
![Page 8: Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage](https://reader035.fdocuments.in/reader035/viewer/2022062823/5875bb6f1a28ab33128b4661/html5/thumbnails/8.jpg)
ALLUXIO ATTRIBUTES
Memory-Speed Virtual Distributed Storage
• File System API
• Software Only
• Scale out architecture
• Virtualizes across different storage systems, providing a unified namespace
• Memory-speed access to data
![Page 9: Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage](https://reader035.fdocuments.in/reader035/viewer/2022062823/5875bb6f1a28ab33128b4661/html5/thumbnails/9.jpg)
ALLUXIO SOLUTION DEPLOYMENT
[Diagram: Servers A, B, C, and Z, each running Applications and Alluxio, with Storage A, B, C, and Z]
![Page 10: Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage](https://reader035.fdocuments.in/reader035/viewer/2022062823/5875bb6f1a28ab33128b4661/html5/thumbnails/10.jpg)
ALLUXIO BENEFITS
• Unification: New workflows across any data in any storage system
• Performance: High performance data access
• Flexibility: Work with the compute and storage frameworks of your choice
• Cost: Grow compute and storage systems independently
![Page 11: Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage](https://reader035.fdocuments.in/reader035/viewer/2022062823/5875bb6f1a28ab33128b4661/html5/thumbnails/11.jpg)
USE CASE 1 – Accelerate I/O to/from Remote Storage
• Compute and Storage Separation
• Advantages
  • Meet different compute and storage hardware requirements efficiently
  • Scale compute and storage independently
  • Store data in traditional filers/SANs and object stores cost effectively
  • Compute on data in existing storage via Big Data computational frameworks
• Disadvantage
  • Accessing data requires remote I/O
![Page 12: Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage](https://reader035.fdocuments.in/reader035/viewer/2022062823/5875bb6f1a28ab33128b4661/html5/thumbnails/12.jpg)
Use Case without Alluxio
[Diagram: Spark and Storage. Labels: "Low latency, memory throughput" and "High latency, network throughput"]
![Page 13: Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage](https://reader035.fdocuments.in/reader035/viewer/2022062823/5875bb6f1a28ab33128b4661/html5/thumbnails/13.jpg)
Use Case with Alluxio
[Diagram: Spark, Alluxio, and Storage]
Keeping data in Alluxio accelerates data access
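The idea on this slide, that data kept in a memory layer avoids repeated remote I/O, can be illustrated with a toy read-through cache. This is only a sketch of the general technique. The class names `RemoteStorage` and `MemoryCache` and the path `/logs/day1` are hypothetical and are not part of Alluxio's API.

```python
class RemoteStorage:
    """Stand-in for a remote store; counts how many reads reach it."""

    def __init__(self, files):
        self._files = dict(files)
        self.remote_reads = 0

    def read(self, path):
        self.remote_reads += 1
        return self._files[path]


class MemoryCache:
    """Toy read-through cache: the first read of a path goes to the
    remote store, later reads of the same path are served from memory."""

    def __init__(self, storage):
        self._storage = storage
        self._blocks = {}

    def read(self, path):
        if path not in self._blocks:
            self._blocks[path] = self._storage.read(path)
        return self._blocks[path]


storage = RemoteStorage({"/logs/day1": b"a,b,c"})  # hypothetical data
cache = MemoryCache(storage)
first = cache.read("/logs/day1")   # reaches remote storage
second = cache.read("/logs/day1")  # served from memory
print(storage.remote_reads)
```

In this model only the first read pays the "high latency, network throughput" cost. Every later read of the same path stays local.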
![Page 14: Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage](https://reader035.fdocuments.in/reader035/viewer/2022062823/5875bb6f1a28ab33128b4661/html5/thumbnails/14.jpg)
CASE STUDY
Baidu File System
The performance was amazing. With Spark SQL alone, it took 100-150 seconds to finish a query; using Alluxio, where data may hit local or remote Alluxio nodes, it took 10-15 seconds.
- Shaoshan Liu, Baidu
RESULTS
• Data queries are now 30x faster with Alluxio
• The Alluxio cluster runs stably, providing over 50TB of RAM space
• By using Alluxio, batch queries that usually lasted over 15 minutes were transformed into interactive queries taking less than 30 seconds
Accelerate Access to Remote Storage
• 200+ nodes deployment
• 2+ petabytes of storage
• Mix of memory + HDD
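The figures on this slide can be cross-checked with simple arithmetic. The sketch below assumes the low and high ends of the quoted ranges pair up with each other, and treats "over 15 minutes" and "less than 30 seconds" as exact before and after times, so the batch result is a lower bound.

```python
def speedup(before_seconds: float, after_seconds: float) -> float:
    """Ratio of the old running time to the new running time."""
    return before_seconds / after_seconds


# Quote: 100-150 s with Spark SQL alone, 10-15 s with Alluxio.
quote_low = speedup(100, 10)
quote_high = speedup(150, 15)

# Results: batch queries over 15 minutes became under 30 seconds.
batch = speedup(15 * 60, 30)

print(quote_low, quote_high, batch)
```

Under these assumptions the quoted query times correspond to about a 10x speedup, while the 30x figure lines up with the 15-minute batch queries dropping to 30 seconds.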
![Page 15: Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage](https://reader035.fdocuments.in/reader035/viewer/2022062823/5875bb6f1a28ab33128b4661/html5/thumbnails/15.jpg)
USE CASE 2 – Share Data Across Jobs at Memory Speed
• Architectures Requiring Shared Data
  • Pipelines: output of one job is input of the next job
  • Different applications, jobs, or contexts read the same data
• Disadvantage
  • Sharing data requires I/O
![Page 16: Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage](https://reader035.fdocuments.in/reader035/viewer/2022062823/5875bb6f1a28ab33128b4661/html5/thumbnails/16.jpg)
Use Case without Alluxio
[Diagram: Spark, MapReduce, and Spark jobs with Storage. Labels: "Network I/O" and "Disk I/O"]
I/O slows down sharing
![Page 17: Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage](https://reader035.fdocuments.in/reader035/viewer/2022062823/5875bb6f1a28ab33128b4661/html5/thumbnails/17.jpg)
Use Case with Alluxio
[Diagram: Spark, MapReduce, and Spark jobs, with Alluxio and Storage]
Sharing data with Alluxio via memory
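The contrast between the two diagrams can be sketched as a two-stage pipeline that hands data over either through a file or through a shared in-memory store. This is a single-process toy under stated assumptions: `SharedMemoryStore`, the job functions, and the path `/pipeline/stage1` are hypothetical. A real shared layer would work across processes and machines, which a Python dict does not.

```python
import json
import os
import tempfile


def job_one():
    """First stage: produce records."""
    return [{"user": "a", "clicks": 3}, {"user": "b", "clicks": 5}]


def job_two(records):
    """Second stage: aggregate the first stage's output."""
    return sum(r["clicks"] for r in records)


def share_via_disk(records):
    """Without a shared memory layer: serialize to a file, read it back."""
    fd, path = tempfile.mkstemp(suffix=".json")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(records, f)
        with open(path) as f:
            return json.load(f)
    finally:
        os.remove(path)


class SharedMemoryStore:
    """Toy stand-in for a shared in-memory layer keyed by path."""

    def __init__(self):
        self._data = {}

    def write(self, path, records):
        self._data[path] = records

    def read(self, path):
        return self._data[path]


store = SharedMemoryStore()
store.write("/pipeline/stage1", job_one())
total_memory = job_two(store.read("/pipeline/stage1"))
total_disk = job_two(share_via_disk(job_one()))
print(total_memory, total_disk)
```

Both paths produce the same result. The difference is that the disk path pays for serialization and file I/O on every handoff, which is the cost the slide labels "I/O slows down sharing".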
![Page 18: Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage](https://reader035.fdocuments.in/reader035/viewer/2022062823/5875bb6f1a28ab33128b4661/html5/thumbnails/18.jpg)
CASE STUDY
Thanks to Alluxio, we now have the raw data immediately available at every iteration and we can skip the costs of loading in terms of time waiting, network traffic, and RDBMS activity.
- Henry Powell, Barclays
RESULTS
• Barclays workflow iteration time decreased from hours to seconds
• Alluxio enabled workflows that were impossible before
• By keeping data only in memory, the I/O cost of loading and storing in Alluxio is now on the order of seconds
Relational Database
Share Data Across Jobs at Memory-Speed
• 6 node deployment
• 1TB of storage
• Memory only
![Page 19: Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage](https://reader035.fdocuments.in/reader035/viewer/2022062823/5875bb6f1a28ab33128b4661/html5/thumbnails/19.jpg)
USE CASE 3 - Transparently Manage Data Across Storage Systems
• Reasons
  • Most enterprises have multiple storage systems
  • New (better, faster, cheaper) storage systems arise
• Disadvantage
  • Managing data across systems can be difficult
![Page 20: Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage](https://reader035.fdocuments.in/reader035/viewer/2022062823/5875bb6f1a28ab33128b4661/html5/thumbnails/20.jpg)
Use Case Explained
[Diagram: Spark, MapReduce, and Spark jobs, with Alluxio and three Storage systems]
Flexible, simple: no application changes, new mount point
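The "no application changes, new mount point" idea can be illustrated with a toy mount table that maps paths in one namespace onto different storage systems. This is a sketch of the general technique only. The `MountTable` class and the storage URIs are hypothetical and are not Alluxio's actual API or configuration.

```python
class MountTable:
    """Toy unified namespace: maps namespace prefixes to storage URIs."""

    def __init__(self):
        self._mounts = {}

    def mount(self, mount_point: str, storage_uri: str) -> None:
        key = mount_point.rstrip("/") or "/"
        self._mounts[key] = storage_uri.rstrip("/")

    def resolve(self, path: str) -> str:
        # Longest-prefix match, so nested mount points win over parents.
        best = None
        for mount_point in self._mounts:
            prefix = "" if mount_point == "/" else mount_point
            if path == mount_point or path.startswith(prefix + "/"):
                if best is None or len(mount_point) > len(best):
                    best = mount_point
        if best is None:
            raise KeyError(f"no mount point covers {path}")
        if best == "/":
            relative = path.strip("/")
        else:
            relative = path[len(best):].strip("/")
        base = self._mounts[best]
        return f"{base}/{relative}" if relative else base


table = MountTable()
table.mount("/", "hdfs://namenode:9000/data")        # hypothetical URI
before = table.resolve("/archive/2016/logs.txt")

table.mount("/archive", "s3://example-bucket/archive")  # hypothetical URI
after = table.resolve("/archive/2016/logs.txt")
print(before)
print(after)
```

The application keeps using the same path, `/archive/2016/logs.txt`. Only the mount table changes, so the data can move to a different storage system without touching application code.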
![Page 21: Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage](https://reader035.fdocuments.in/reader035/viewer/2022062823/5875bb6f1a28ab33128b4661/html5/thumbnails/21.jpg)
CASE STUDY
We’ve been running Alluxio in production for over 9 months, resulting in 15x speedup on average, and 300x speedup at peak service times.
- Xueyan Li, Qunar
RESULTS
• Alluxio’s unified namespace enables different applications and frameworks to easily interact with their data from different storage systems
• Improved the performance of their system with 15x – 300x speedups
• Tiered storage feature manages various storage resources including memory, SSD and disk
Transparently Manage Data Across Different Storage Systems
• 200+ nodes deployment
• 6 billion logs (4.5 TB) daily
• Mix of Memory + HDD
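The tiered storage result mentioned above can be illustrated with a toy model in which new blocks land in the fastest tier and older blocks are demoted when a tier fills up. This is a sketch under stated assumptions, not Alluxio's actual eviction or promotion policy. The `TieredStore` class, the tier capacities, and the block names are hypothetical.

```python
from collections import OrderedDict


class TieredStore:
    """Toy tiered store: new blocks land in the top tier; when a tier is
    full, its least recently used block is demoted to the tier below.

    Assumes unique block ids and a capacity of at least 1 per tier.
    """

    def __init__(self, capacities):
        # capacities: list of (tier_name, max_blocks), fastest tier first.
        self._tiers = [(name, cap, OrderedDict()) for name, cap in capacities]

    def put(self, block_id, data, level=0):
        if level >= len(self._tiers):
            raise RuntimeError("all tiers are full")
        _name, cap, blocks = self._tiers[level]
        if len(blocks) >= cap:
            # Demote the least recently used block to the next tier.
            old_id, old_data = blocks.popitem(last=False)
            self.put(old_id, old_data, level + 1)
        blocks[block_id] = data

    def get(self, block_id):
        for name, _cap, blocks in self._tiers:
            if block_id in blocks:
                blocks.move_to_end(block_id)  # mark as recently used
                return name, blocks[block_id]
        raise KeyError(block_id)


store = TieredStore([("MEM", 2), ("SSD", 2), ("HDD", 4)])
for i in range(1, 6):
    store.put(f"b{i}", f"data-{i}")

for block in ("b1", "b3", "b5"):
    print(block, "is in", store.get(block)[0])
```

In this model a read never moves a block back up to a faster tier. A real system would also need a promotion policy, which is left out here to keep the sketch short.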
![Page 22: Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage](https://reader035.fdocuments.in/reader035/viewer/2022062823/5875bb6f1a28ab33128b4661/html5/thumbnails/22.jpg)
What’s Next?
![Page 23: Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage](https://reader035.fdocuments.in/reader035/viewer/2022062823/5875bb6f1a28ab33128b4661/html5/thumbnails/23.jpg)
• Contact: [email protected] or [email protected]
• Twitter: @Alluxio
• Websites: www.alluxio.com and www.alluxio.org
• Alluxio Github: www.github.com/Alluxio/alluxio
Thank you!