Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
![Page 1: Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli](https://reader035.fdocuments.in/reader035/viewer/2022070510/58abb4431a28ab04618b4d33/html5/thumbnails/1.jpg)
Virtualizing Analytics with Apache Spark
Arsalan Tavakoli-Shiraji
Spark Summit East 2017
![Page 2: Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli](https://reader035.fdocuments.in/reader035/viewer/2022070510/58abb4431a28ab04618b4d33/html5/thumbnails/2.jpg)
Enterprise aspirations: More data, more intelligence
![Page 3: Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli](https://reader035.fdocuments.in/reader035/viewer/2022070510/58abb4431a28ab04618b4d33/html5/thumbnails/3.jpg)
So what’s the formula for success?
![Page 4: Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli](https://reader035.fdocuments.in/reader035/viewer/2022070510/58abb4431a28ab04618b4d33/html5/thumbnails/4.jpg)
3 pillars of any data-driven use case
DATA
ANALYTICS
PEOPLE
![Page 5: Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli](https://reader035.fdocuments.in/reader035/viewer/2022070510/58abb4431a28ab04618b4d33/html5/thumbnails/5.jpg)
Data: Bigger, messier, more spread out
DATA
• Spread out into silos
• Varying types and structure
• Faster velocity
![Page 6: Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli](https://reader035.fdocuments.in/reader035/viewer/2022070510/58abb4431a28ab04618b4d33/html5/thumbnails/6.jpg)
Analytics: More variety and complexity
ANALYTICS
• Multiple approaches
• Iterative discovery
• Difficult to productionize
![Page 7: Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli](https://reader035.fdocuments.in/reader035/viewer/2022070510/58abb4431a28ab04618b4d33/html5/thumbnails/7.jpg)
People: Collaboration from start to finish
PEOPLE
• Many roles involved
• Diverse skill sets and goals
• Inefficient hand-offs
![Page 8: Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli](https://reader035.fdocuments.in/reader035/viewer/2022070510/58abb4431a28ab04618b4d33/html5/thumbnails/8.jpg)
Can we reuse existing technologies?
![Page 9: Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli](https://reader035.fdocuments.in/reader035/viewer/2022070510/58abb4431a28ab04618b4d33/html5/thumbnails/9.jpg)
First Generation: The Data Warehouse
Reporting on small data
DATA: Only structured data; costly to scale
ANALYTICS: SQL only
PEOPLE: Targeted at BI
![Page 10: Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli](https://reader035.fdocuments.in/reader035/viewer/2022070510/58abb4431a28ab04618b4d33/html5/thumbnails/10.jpg)
Second Generation: Hadoop + Data Lake
Capture data first, ETL later
DATA: Hard to centralize the data; limited value without ETL
ANALYTICS: Disparate and complex tools
PEOPLE: Limited to developers with big data expertise
![Page 11: Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli](https://reader035.fdocuments.in/reader035/viewer/2022070510/58abb4431a28ab04618b4d33/html5/thumbnails/11.jpg)
The New Paradigm
VIRTUAL ANALYTICS
• Decoupled compute and storage
• Uniform data management and security model
• Unified analytics engine
• Enterprise-wide collaboration
DATA: Data warehouses, cloud storage, Hadoop storage, and many others…
PEOPLE: Data science, data engineering, BI analysts, and many others…
![Page 12: Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli](https://reader035.fdocuments.in/reader035/viewer/2022070510/58abb4431a28ab04618b4d33/html5/thumbnails/12.jpg)
Is Spark the Answer?
DATA: Data warehouses, cloud storage, Hadoop storage, and many others…
PEOPLE: Data science, data engineering, BI analysts, and many others…
![Page 13: Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli](https://reader035.fdocuments.in/reader035/viewer/2022070510/58abb4431a28ab04618b4d33/html5/thumbnails/13.jpg)
Databricks + Apache Spark
• Managed Cloud Platform
• Integrated Workspace
• Production Workflow Automation
• Optimized Data Access Layer
• Databricks Enterprise Security
DATA: Data warehouses, cloud storage, Hadoop storage, and many others…
PEOPLE: Data science, data engineering, BI analysts, and many others…
![Page 14: Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli](https://reader035.fdocuments.in/reader035/viewer/2022070510/58abb4431a28ab04618b4d33/html5/thumbnails/14.jpg)
Case Study: Viacom
• Video quality: Real-time anomaly detection
• Viewer loyalty: Grow the Viacom audience
![Page 15: Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli](https://reader035.fdocuments.in/reader035/viewer/2022070510/58abb4431a28ab04618b4d33/html5/thumbnails/15.jpg)
The Road Ahead