Organizing for Data Science
Dan Mallinger, Data Science Practice Manager
September 2014
CONFIDENTIAL | 2
Dan Mallinger
• Data Science Practice Manager
  − Think Big Analytics
• Working with clients across:
  − Financial Services
  − Advertising
  − Manufacturing
  − Social
  − Network Providers
Today
• Define Data Science in the Organization
• Look at Current Perspectives on Organization
• Discuss Shortcomings
• Review a Real-World Solution
What Do We Hope to Do?
• Use Data to Improve Our Business
• Better Understand Customers
• Act Proactively, Not Reactively
Why Organize?
• Scale
• Robustness
• Repeatability
Perception: What Does Data Science Do?
• Revolutionizing Ad Targeting
• Automating Deals and Recommendations
• Alerting Admins to New Network Attacks
What Does It Take?
• Specific Data Expertise
• Exploratory Analysis
• Modeling
• Creativity
• Programming
• Big Data
• Communication
• Ability to Target Impact
• Unstructured Analysis
• Organizational Politics
• Visualization
• …
The New Toy: A Center of Excellence
• Centralized
  − Brings data, analysis, and processing together
  − Data scientists support one another
• Distributed
  − Data scientists close to business
  − Multiple models for rotating data scientists into lines of business
[Diagram: CoE alongside Line of Business A, Line of Business B, and Line of Business C]
What Does It Still Take?
• Specific Data Expertise
• Exploratory Analysis
• Modeling
• Creativity
• Programming
• Big Data
• Communication
• Ability to Target Impact
• Unstructured Analysis
• Organizational Politics
• Visualization
• …
If You Build It, They Will Come?
• Designed a great home for unicorns
• But they are still unicorns
Working with Horses, Not Unicorns
• Unravel Capability
• Map Activities to Functional Roles
• Align Functions with Process, Not Individuals
• Don’t Forget to Scale
CLIENT EXAMPLE: Clickstream Data in Action
• Identify Fraudulent Sessions
• Cross Channel Analysis
• Next Best Action
• Optimize Pathways
• Determine Session Interest
• Customizing Experience
• Proactive Outreach
• Search Analysis
• Content Optimization
CLIENT EXAMPLE: Scaling Data Science
• Billions of clicks
• Unstructured data
• How do we model it?!
• Model the SIGNAL, not the data
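The "model the signal, not the data" idea can be illustrated with a minimal Python sketch: instead of modeling billions of raw clicks, collapse each user's clicks into sessions and keep a few summary features per session. The record layout `(user_id, timestamp_seconds, page)`, the 30-minute inactivity gap, the names `sessionize` and `SessionSignal`, and the chosen features are all hypothetical assumptions for illustration, not the client's actual pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical raw click record: (user_id, timestamp_seconds, page)
Click = tuple[str, int, str]

# Assumption: 30 minutes of inactivity ends a session
SESSION_GAP = 30 * 60


@dataclass
class SessionSignal:
    """Compact per-session features (hypothetical choice of signal)."""
    user_id: str
    start: int
    clicks: int
    duration: int
    distinct_pages: int


def _summarize(user: str, session: list[tuple[int, str]]) -> SessionSignal:
    """Reduce one session's time-ordered clicks to a single feature row."""
    start, end = session[0][0], session[-1][0]
    return SessionSignal(
        user_id=user,
        start=start,
        clicks=len(session),
        duration=end - start,
        distinct_pages=len({page for _, page in session}),
    )


def sessionize(clicks: list[Click], gap: int = SESSION_GAP) -> list[SessionSignal]:
    """Group clicks by user, split on inactivity gaps, and summarize each session."""
    by_user: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for user, ts, page in clicks:
        by_user[user].append((ts, page))

    signals = []
    for user, events in by_user.items():
        events.sort()  # order each user's clicks by timestamp
        session = [events[0]]
        for event in events[1:]:
            if event[0] - session[-1][0] > gap:
                # Gap exceeded: close the current session and start a new one
                signals.append(_summarize(user, session))
                session = []
            session.append(event)
        signals.append(_summarize(user, session))
    return signals
```

For example, two clicks by one user a minute apart become a single row with `clicks=2` and `duration=60`, while a click by the same user more than 30 minutes later starts a new session. At production scale the same grouping would run as a distributed job; this in-memory version only shows the shape of the transformation.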
CLIENT EXAMPLE: Clickstream Data Science in Action
[Diagram labels: Hadoop 1.0; MPP; Web; Feature Selection & Dimensionality Reduction]
CLIENT EXAMPLE: Extracting Signal: Hadoop 1.0
• Feature Selection
  − Forests
  − Clustering
• Dimensionality Reduction
  − SVM
• Challenges
  − Job Latency
  − Limited Iterations
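One way to use clustering for feature selection is to group highly correlated features and keep one representative per group. This minimal Python sketch assumes numeric feature columns stored in a dict, uses absolute Pearson correlation with a hypothetical threshold of 0.9, and assigns each feature greedily to the first cluster whose lead feature it matches. It is a swapped-in illustration of the general technique, not the client's method, and it does not cover the forest-based selection also named on the slide; the feature names are made up.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length columns; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def cluster_features(columns: dict[str, list[float]], threshold: float = 0.9) -> list[list[str]]:
    """Greedily group features whose |correlation| with a cluster's first member meets threshold."""
    clusters: list[list[str]] = []
    for name, values in columns.items():
        for cluster in clusters:
            if abs(pearson(columns[cluster[0]], values)) >= threshold:
                cluster.append(name)
                break
        else:
            # No existing cluster is close enough: start a new one
            clusters.append([name])
    return clusters


def select_representatives(columns: dict[str, list[float]], threshold: float = 0.9) -> list[str]:
    """Keep the first feature of each correlation cluster, dropping redundant ones."""
    return [cluster[0] for cluster in cluster_features(columns, threshold)]


if __name__ == "__main__":
    columns = {
        "clicks": [1, 2, 3, 4, 5],
        "page_views": [2, 4, 6, 8, 10],  # a scaled copy of clicks
        "dwell_time": [5, 3, 4, 1, 2],
    }
    print(select_representatives(columns))  # ['clicks', 'dwell_time']
```

Here `page_views` joins the `clicks` cluster and is dropped, while `dwell_time` (correlation of -0.8 with `clicks`) stays as its own feature. Lowering the threshold merges more features into fewer clusters.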
CLIENT EXAMPLE: Extracting Signal: Hadoop 2.0
• Spark
  − Faster response in exploration
  − Better support for iterative models
• Genetic Algorithms
• Neural Networks
• Challenges
  − In memory: costly and limiting
  − MapReduce does not go away
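Why iterative models strain Hadoop 1.0 and benefit from Spark is easiest to see with an iterative algorithm such as k-means written as explicit map and reduce steps. This is plain Python, not Hadoop or Spark code: each pass of the loop corresponds to what would be one complete MapReduce job on Hadoop 1.0, re-reading its input from disk, whereas Spark can cache the points in memory across passes. The 2-D points, fixed starting centroids, and function names are illustrative assumptions; k-means is used only as a simple stand-in for the iterative models on the slide.

```python
from collections import defaultdict

Point = tuple[float, float]


def map_assign(points: list[Point], centroids: list[Point]) -> list[tuple[int, Point]]:
    """Map step: emit (index of nearest centroid, point) for every point."""
    out = []
    for p in points:
        dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
        out.append((dists.index(min(dists)), p))
    return out


def reduce_update(pairs: list[tuple[int, Point]], centroids: list[Point]) -> list[Point]:
    """Reduce step: average the points assigned to each centroid index."""
    groups: dict[int, list[Point]] = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    new = []
    for i, c in enumerate(centroids):
        members = groups.get(i)
        if not members:
            new.append(c)  # empty cluster: keep its centroid where it was
        else:
            new.append((sum(p[0] for p in members) / len(members),
                        sum(p[1] for p in members) / len(members)))
    return new


def kmeans(points: list[Point], centroids: list[Point], max_iter: int = 20) -> list[Point]:
    """Iterate map/reduce until centroids stop moving; each pass ~ one Hadoop 1.0 job."""
    for _ in range(max_iter):
        updated = reduce_update(map_assign(points, centroids), centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids
```

With points `(0, 0)`, `(0, 1)`, `(10, 10)`, `(10, 11)` and starting centroids `(0, 0)` and `(10, 10)`, the loop converges after two passes to `(0.0, 0.5)` and `(10.0, 10.5)`. A model that needs hundreds of passes multiplies the per-job startup and disk I/O cost accordingly, which is the job latency and limited-iterations problem noted for Hadoop 1.0.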
CLIENT EXAMPLE: Horses, Not Unicorns
• Focus on Technical Skills
  − EDA
  − Modeling
  − Programming / Big Data
• Communication Skills
  − Capturing signal needs
  − Iterating with stakeholders
CLIENT EXAMPLE: CoE Next Steps
• Continue to make signal available to analysts
  − Next up: extracting signal from text
• Act as a capability search party
  − Sprints of new insights and tools
• Finalize operating model
  − Funding structure
  − Engagement model with lines of business
Discussion Over Drinks