2015-11-12 - Advanced Apache Spark Meetup @ Thumbtack
Uploaded by: natekupp
Category: Engineering
Transcript of 2015-11-12 - Advanced Apache Spark Meetup @ Thumbtack
THUMBTACK NOVEMBER 2015
HIRING LOCAL PROFESSIONALS IS STILL SHOCKINGLY HARD
Directories have moved online, but the process hasn’t changed in generations
THUMBTACK IS BUILDING THE BEST AND MOST TRUSTED WAY TO
HIRE A PROFESSIONAL FOR ANY PROJECT, ANYTIME, ANYWHERE
OUR MARKETPLACE CONNECTS CUSTOMERS AND PROFESSIONALS
Customers
• Interested, available and qualified professionals come to customers
• Customers have the confidence to know who is best and that they’re paying a fair price
• One-stop shop for all their service needs
• Free to use
Professionals
• A cost-effective, performance-based way to acquire new customers ($3–15 to submit each quote)
• Eliminates the need to spend time on outbound marketing
• Mobile platform to run their business
CUSTOMERS CAN GET FROM REQUEST TO HIRE IN < 1 HOUR
Customers tell us what they need
• 800 categories with questions customized to each service
• 8-10 unique questions per category
Receive multiple quotes from pros
• Up to 5 quotes
• Median quote within 1 hour
• Pricing and response customized to customer’s unique needs
Compare prices, reviews, profiles
• Detailed info on each pro
• Reviews tied to past work
• Licensing and other credentials
Hire the pro who’s right for them
• Customers can call or message pros to discuss the work before hiring
DATA INFRASTRUCTURE @ THUMBTACK
Democratizing access to data & building data products
USE CASES
• Experiments: A/B Testing
• Analytics: Ad-hoc SQL, BI, Dashboarding, Event analytics
• Data Products: Matching, Pricing
Icon Credit: Noun Project. Blake Thompson, Mister Pixel, Creative Stall
2014: WHERE WE CAME FROM
[Architecture diagram. Data stores: events, relational. Consumers: Analytics & BI; Event Analytics; A/B Testing & Experiments.]
KEY PROBLEMS TO SOLVE
• Analytics queries hitting production Postgres replica
• MongoDB not scaling with event data volume
• Single Python process on one machine running Mongo queries for event & experiment analysis
2015 DATA PLATFORM INFRASTRUCTURE
[Architecture diagram. Labels recovered from the slide:]
• Data sources: events, relational
• Production Cluster
• Sqoop, Custom ETL
• HA HDFS
• Parquet (Snappy), JSON
• Spark (1.5.1 on YARN): Spark Core, Spark SQL, MLlib
• Impala
• Airflow
• Analytics / BI: Looker, Mode Analytics
• Custom APIs
• A/B Testing, Matching, Pricing, Eng Clients, ...
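The Parquet (Snappy), JSON, and Spark SQL labels on this slide suggest a step that turns raw JSON events into compressed Parquet. The slides do not show that code, so the following is only a minimal sketch, assuming a Spark 1.5-style `SQLContext`. The function name and the paths in the comments are hypothetical.

```python
def events_json_to_parquet(sql_context, json_path, parquet_path):
    """Read JSON events and rewrite them as Snappy-compressed Parquet.

    `sql_context` is expected to be a pyspark.sql.SQLContext (Spark 1.5 API).
    This is an illustrative sketch, not the pipeline from the talk.
    """
    # Ask Spark SQL to compress Parquet output with Snappy.
    sql_context.setConf("spark.sql.parquet.compression.codec", "snappy")

    # Spark SQL infers a schema from the JSON records.
    events = sql_context.read.json(json_path)

    # Overwrite any earlier output so the job can be re-run safely.
    events.write.mode("overwrite").parquet(parquet_path)
    return events


# Usage on a cluster (hypothetical paths):
#   from pyspark import SparkContext
#   from pyspark.sql import SQLContext
#   sc = SparkContext(appName="events-to-parquet")
#   events_json_to_parquet(SQLContext(sc),
#                          "hdfs:///events/raw/2015-11-12",
#                          "hdfs:///events/parquet/2015-11-12")
```

Passing the context in as an argument keeps the function free of cluster setup, so the same code can run under `spark-submit` on YARN or against a local context.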
WHAT'S NEXT?
• Investigate Spark Streaming for event data, plus several additional use cases for MLlib
• Migrate Spark batch jobs from cron onto Airflow
• Move the event ETL pipeline onto Kinesis
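As a rough illustration of the cron-to-Airflow migration: Airflow accepts cron expressions as a DAG schedule, so each crontab entry can be split into a schedule and a shell command. The sketch below assumes standard five-field crontab lines. The helper name and the sample job are hypothetical, not taken from the talk.

```python
def split_cron_entry(line):
    """Split one crontab line into the two pieces an Airflow task needs.

    Returns a dict with a cron schedule string and the shell command.
    Assumes the standard five schedule fields followed by the command.
    """
    fields = line.strip().split(None, 5)
    if len(fields) != 6:
        raise ValueError("expected 5 schedule fields and a command: %r" % line)
    return {
        "schedule_interval": " ".join(fields[:5]),  # e.g. "0 3 * * *"
        "bash_command": fields[5],                  # e.g. the spark-submit call
    }


# Hypothetical nightly Spark job taken from a crontab:
entry = split_cron_entry("0 3 * * * spark-submit --master yarn-cluster etl.py")
print(entry["schedule_interval"])
print(entry["bash_command"])
```

The schedule string could then be given to an Airflow DAG and the command to a `BashOperator`, which lets Airflow add retries and task dependencies that plain cron does not provide.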