Instrumenting your Instruments
![Page 1: Instrumenting your Instruments](https://reader030.fdocuments.in/reader030/viewer/2022011722/58710fe61a28abac6d8b5817/html5/thumbnails/1.jpg)
INSTRUMENTING YOUR INSTRUMENTS
Premal Shah
Co-Founder @ 6sense
Hadoop Summit 2016
![Page 2: Instrumenting your Instruments](https://reader030.fdocuments.in/reader030/viewer/2022011722/58710fe61a28abac6d8b5817/html5/thumbnails/2.jpg)
AGENDA
• What does 6sense do?
• How do we do it?
• What does the pipeline look like?
• Where do we do it?
• What are the challenges?
• How are we planning to solve them?
![Page 3: Instrumenting your Instruments](https://reader030.fdocuments.in/reader030/viewer/2022011722/58710fe61a28abac6d8b5817/html5/thumbnails/3.jpg)
WHAT DOES 6SENSE DO?
• We find prospects that are in market to buy
• We empower marketing and sales teams
![Page 4: Instrumenting your Instruments](https://reader030.fdocuments.in/reader030/viewer/2022011722/58710fe61a28abac6d8b5817/html5/thumbnails/4.jpg)
SAMPLE OUTPUT

| Account Name | Buying Stage | Profile Fit |
| --- | --- | --- |
| ACME Corporation | Purchase | Strong |
| ABC Corp | Decision | Strong |
| XYZ Systems | Consideration | Medium |
| Doe Inc | Awareness | Strong |

Buying stages (funnel diagram): PURCHASE, DECISION, CONSIDERATION, AWARENESS
![Page 5: Instrumenting your Instruments](https://reader030.fdocuments.in/reader030/viewer/2022011722/58710fe61a28abac6d8b5817/html5/thumbnails/5.jpg)
HOW DO WE DO IT?
• 1st Party
─ Web
─ CRM
─ Marketing Automation
• 3rd Party
─ Web
─ Search
─ Ad Impressions
• Modelling & Scoring
• Actionable Data for the Customer
![Page 6: Instrumenting your Instruments](https://reader030.fdocuments.in/reader030/viewer/2022011722/58710fe61a28abac6d8b5817/html5/thumbnails/6.jpg)
WHAT DOES THE PIPELINE LOOK LIKE?
Customer Systems → Ingest → Process → Export → Customer Systems
![Page 7: Instrumenting your Instruments](https://reader030.fdocuments.in/reader030/viewer/2022011722/58710fe61a28abac6d8b5817/html5/thumbnails/7.jpg)
THE DAILY PROCESS GRAPH (DAG)
![Page 8: Instrumenting your Instruments](https://reader030.fdocuments.in/reader030/viewer/2022011722/58710fe61a28abac6d8b5817/html5/thumbnails/8.jpg)
THE REAL WORLD
![Page 9: Instrumenting your Instruments](https://reader030.fdocuments.in/reader030/viewer/2022011722/58710fe61a28abac6d8b5817/html5/thumbnails/9.jpg)
THE REAL WORLD * N
![Page 10: Instrumenting your Instruments](https://reader030.fdocuments.in/reader030/viewer/2022011722/58710fe61a28abac6d8b5817/html5/thumbnails/10.jpg)
PIPELINE COMPONENTS
• Hadoop Ecosystem
─ YARN
─ Hive
─ Presto
• Mesos World
─ Mesos
─ Chronos
─ Marathon
![Page 11: Instrumenting your Instruments](https://reader030.fdocuments.in/reader030/viewer/2022011722/58710fe61a28abac6d8b5817/html5/thumbnails/11.jpg)
WORKFLOW
Chronos → Queue → Marathon → Jobs (Hadoop, Hive, Presto, Python)
![Page 12: Instrumenting your Instruments](https://reader030.fdocuments.in/reader030/viewer/2022011722/58710fe61a28abac6d8b5817/html5/thumbnails/12.jpg)
WHERE DO WE DO IT?
• AWS
─ Elastic
─ Easy to experiment
─ No CAPEX
• Hadoop
─ Data Nodes are run separately from Node Managers
─ Most of the data sits in S3
![Page 13: Instrumenting your Instruments](https://reader030.fdocuments.in/reader030/viewer/2022011722/58710fe61a28abac6d8b5817/html5/thumbnails/13.jpg)
PROJECT RAVEN
![Page 14: Instrumenting your Instruments](https://reader030.fdocuments.in/reader030/viewer/2022011722/58710fe61a28abac6d8b5817/html5/thumbnails/14.jpg)
WHAT AFFECTS PERFORMANCE
• Hive
─ Joins
─ Non-Partitioned tables
─ Filters
─ Bucketing
• Hadoop
─ File format
─ Compression
─ Data Locality
![Page 15: Instrumenting your Instruments](https://reader030.fdocuments.in/reader030/viewer/2022011722/58710fe61a28abac6d8b5817/html5/thumbnails/15.jpg)
METRICS THAT MATTER
• # of Mappers
• # of Input Files
• # of Input Records
• # of Records passed on to the next stage
• Time taken in
─ Mappers
─ Copy
─ Shuffle
─ Reducers
• # of Reducers
• # of compressed vs uncompressed files
• File formats
• Etc.
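Two of the counts above become more useful once combined into ratios. The sketch below is an illustration, not the speaker's implementation: the function names `filter_ratio` and `phase_shares`, the phase keys, and the sample numbers are all assumptions made for the example.

```python
def filter_ratio(input_records, output_records):
    """Fraction of input records that were NOT passed on to the next stage."""
    if input_records == 0:
        return 0.0  # nothing was read, so nothing was filtered
    return 1.0 - output_records / input_records


def phase_shares(seconds_by_phase):
    """Share of total job time spent in each phase (map, copy, shuffle, reduce)."""
    total = sum(seconds_by_phase.values())
    if total == 0:
        return {phase: 0.0 for phase in seconds_by_phase}
    return {phase: secs / total for phase, secs in seconds_by_phase.items()}


# Hypothetical timings, in seconds, for a single job.
example = {"map": 120.0, "copy": 30.0, "shuffle": 30.0, "reduce": 60.0}
shares = phase_shares(example)
```

A job with a high filter ratio reads far more data than it passes on, which may point at a missing partition or an inefficient filter.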
![Page 16: Instrumenting your Instruments](https://reader030.fdocuments.in/reader030/viewer/2022011722/58710fe61a28abac6d8b5817/html5/thumbnails/16.jpg)
WHAT DO WE STORE?
• Job Name 1
─ Date 1
o Yarn Job # 1 Metrics
o Yarn Job # 2 Metrics
─ Date 2
o Repeat as above
• Job Name 2
─ Repeat as above
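The hierarchy above (job name, then date, then one metrics record per YARN job) maps naturally onto a nested dictionary. The following is a minimal in-memory sketch under that assumption. The slides do not say what storage backend is used, and the class names, field names, and sample job name are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date


@dataclass
class YarnJobMetrics:
    """Metrics for one YARN job; the field names are illustrative assumptions."""
    yarn_job_id: str
    mappers: int = 0
    reducers: int = 0
    input_files: int = 0
    input_records: int = 0
    output_records: int = 0  # records passed on to the next stage
    seconds_by_phase: dict = field(default_factory=dict)


class MetricsStore:
    """Laid out as job name -> date -> list of YARN job metrics."""

    def __init__(self):
        self._data = defaultdict(lambda: defaultdict(list))

    def add(self, job_name, run_date, metrics):
        self._data[job_name][run_date].append(metrics)

    def get(self, job_name, run_date):
        # .get avoids creating empty entries as a side effect of a lookup.
        return list(self._data.get(job_name, {}).get(run_date, []))

    def dates(self, job_name):
        """Dates with recorded runs for a job, oldest first."""
        return sorted(self._data.get(job_name, {}))


store = MetricsStore()
day1, day2 = date(2016, 6, 28), date(2016, 6, 29)
store.add("daily_scoring", day1, YarnJobMetrics("yarn_job_1", mappers=40, input_files=400))
store.add("daily_scoring", day1, YarnJobMetrics("yarn_job_2", mappers=8, input_files=16))
store.add("daily_scoring", day2, YarnJobMetrics("yarn_job_1", mappers=42, input_files=410))
```

Keying by job name first keeps every run of one pipeline job together, which is what per-job trend analysis needs.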
![Page 17: Instrumenting your Instruments](https://reader030.fdocuments.in/reader030/viewer/2022011722/58710fe61a28abac6d8b5817/html5/thumbnails/17.jpg)
WHAT DO WE USE THEM FOR?
• Finding the job that
─ Is the slowest
─ Processes the most files
─ Filters out the most data
─ Uses the most memory
• Observe trends over time in the above metrics
• Get alerted on changes in the trends, both up and down
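One simple way to act on these three uses is to compare each job's latest value against a trailing average and flag large moves in either direction. The sketch below assumes that approach. The job names, runtimes, window size, and 25% threshold are made-up values for illustration and are not taken from the talk.

```python
from statistics import mean

# Hypothetical daily runtimes in seconds per job, oldest first.
runtimes = {
    "ingest_web": [610, 600, 620, 605, 615, 900],
    "score_accounts": [1200, 1210, 1190, 1205, 1195, 1200],
    "export_crm": [300, 310, 305, 295, 300, 150],
}


def slowest_job(history):
    """Return the job whose most recent run took the longest."""
    return max(history, key=lambda job: history[job][-1])


def trend_alerts(history, window=5, threshold=0.25):
    """Flag jobs whose latest value moved away from the trailing mean, up or down."""
    alerts = {}
    for job, values in history.items():
        if len(values) <= window:
            continue  # not enough history to form a baseline
        baseline = mean(values[-window - 1:-1])  # the `window` runs before the latest
        if baseline == 0:
            continue
        change = (values[-1] - baseline) / baseline
        if abs(change) >= threshold:
            alerts[job] = "up" if change > 0 else "down"
    return alerts


print(slowest_job(runtimes))
print(trend_alerts(runtimes))
```

A downward alert matters as much as an upward one: a job that suddenly finishes in half the time may have received far less input than usual.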
![Page 18: Instrumenting your Instruments](https://reader030.fdocuments.in/reader030/viewer/2022011722/58710fe61a28abac6d8b5817/html5/thumbnails/18.jpg)
RECOMMENDATIONS
• Storage Format
• Compression Type
• Partition Columns
• Bucketing
• Etc.
![Page 19: Instrumenting your Instruments](https://reader030.fdocuments.in/reader030/viewer/2022011722/58710fe61a28abac6d8b5817/html5/thumbnails/19.jpg)
OPTIMIZATIONS
• Which job is causing the bottleneck?
• How many errors can we tolerate?
• Which job is the biggest offender?
• Which job fails the most?
• What did the latest release do?
![Page 20: Instrumenting your Instruments](https://reader030.fdocuments.in/reader030/viewer/2022011722/58710fe61a28abac6d8b5817/html5/thumbnails/20.jpg)
SCALING
• Can we scale the number of customers?
• What does it cost to add a customer?
• What does it cost to add a job to each customer’s pipeline?
![Page 21: Instrumenting your Instruments](https://reader030.fdocuments.in/reader030/viewer/2022011722/58710fe61a28abac6d8b5817/html5/thumbnails/21.jpg)
VENDOR SHOUT OUT
• ClusterK (now AWS Spot Fleet)
─ Allows us to use different instance types to load balance and reduce costs
• Sumo Logic
─ Detect variances in behavior over a custom time period
• OpsClarity
─ Collects, monitors and alerts on the following metrics
o AWS CloudWatch metrics (Queue length, S3 bucket size, etc.)
o Host metrics (CPU, Memory, Disk Space, etc.)
o Service metrics (YARN, HBase, Mesos, etc.)
o Container metrics (Docker)
o Custom metrics (anything else you want to send)
![Page 22: Instrumenting your Instruments](https://reader030.fdocuments.in/reader030/viewer/2022011722/58710fe61a28abac6d8b5817/html5/thumbnails/22.jpg)
THANK YOU
• premal at 6sense.com
• https://www.linkedin.com/in/premaljshah