The Little Warehouse That Couldn’t Or: How We Learned to Stop Worrying and Move to Spark
Yandu Oppacher (@yandu), Data Infrastructure, Shopify
Shopify Stores
ETL Warehouse Reporting
August 2013
Tiller, Ruby, Vertica
Why we had to move
• Data volume
• Data/Query complexity
• Performance issues
A couple of false starts
Pig + Luigi
Pig + Oozie
Platfora
“Without coding or ETL, data warehousing, BI tools, or breaking a sweat.”
– platfora.com
Enter Spark
• Fast
• Nice development model
• Python
The Good Book
GMV: A Case Study
165,000+ ACTIVE SHOPIFY MERCHANTS
$8 BILLION+ CUMULATIVE GMV
Growing pains
• Joins
• Groupings
• General data skew
• Getting to know Python’s performance quirks
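The slides do not say how the skew problems were handled. One common technique for skewed groupings is key salting: aggregate on (key, salt) first so a single hot key is split across several buckets, then combine the partial results per key. The following is a minimal pure-Python sketch of that idea, not the approach taken in the talk; `salted_sum`, `num_salts`, and the shop names are hypothetical and used only for illustration.

```python
import random
from collections import defaultdict


def salted_sum(records, num_salts=4, seed=0):
    """Sum values per key in two stages so hot keys are spread over several buckets.

    records: iterable of (key, value) pairs.
    Returns (totals_by_key, number_of_partial_buckets).
    """
    rng = random.Random(seed)

    # Stage 1: aggregate on (key, salt). A key with many rows is split
    # across up to num_salts partial buckets instead of landing in one.
    partial = defaultdict(int)
    for key, value in records:
        partial[(key, rng.randrange(num_salts))] += value

    # Stage 2: drop the salt and combine the partial sums for each key.
    totals = defaultdict(int)
    for (key, _salt), subtotal in partial.items():
        totals[key] += subtotal

    return dict(totals), len(partial)


# Hypothetical data: one very large shop and one small shop.
records = [("big_shop", 1)] * 1000 + [("small_shop", 5)]
totals, buckets = salted_sum(records)
```

In a distributed setting, the first stage would typically run as one shuffle keyed on (key, salt) and the second as a much smaller shuffle keyed on key alone.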
Starscream
• specialized joins
• resolvers
• range
• cassandra
• overby
• contracts
• incrementalized fact builds
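The slides list these features without showing Starscream's API. Assuming "range" refers to a range join (matching each row to the interval that contains its value), the idea can be sketched in plain Python with a sorted list and binary search. This is a generic illustration under that assumption, not Starscream code; `range_join` and its argument shapes are hypothetical.

```python
from bisect import bisect_right


def range_join(events, intervals):
    """Attach to each event the label of the interval [start, end) containing it.

    events: list of (timestamp, payload) pairs.
    intervals: list of (start, end, label) tuples, assumed non-overlapping.
    Events that fall in no interval are joined with None.
    """
    ordered = sorted(intervals)
    starts = [start for start, _end, _label in ordered]

    joined = []
    for ts, payload in events:
        # Index of the last interval whose start is <= ts (or -1 if none).
        i = bisect_right(starts, ts) - 1
        if i >= 0 and ts < ordered[i][1]:
            joined.append((ts, payload, ordered[i][2]))
        else:
            joined.append((ts, payload, None))
    return joined


# Hypothetical intervals and events.
intervals = [(0, 10, "a"), (10, 20, "b")]
events = [(5, "x"), (10, "y"), (25, "z")]
result = range_join(events, intervals)
```

Sorting the intervals once and binary-searching per event avoids comparing every event against every interval, which is the usual reason to special-case this kind of join.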
Our current stack
Kafka
OLTP
HDFS
Cassandra
Spark
Frontroom
Backroom
Redshift
Tableau
Thank you
Yandu Oppacher (@yandu), Data Infrastructure