Transcript
Page 1: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

Organizing for Data Science

Dan Mallinger Data Science Practice Manager

September 2014

Page 2: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

CONFIDENTIAL | 2

Dan Mallinger

•  Data Science Practice Manager
    −  Think Big Analytics
•  Working with clients across
    −  Financial Services
    −  Advertising
    −  Manufacturing
    −  Social
    −  Network Providers

Page 3: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

Today

•  Define Data Science in the Organization
•  Look at Current Perspectives on Organization
•  Discuss Shortcomings
•  Review a Real World Solution

Page 4: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

What Do We Hope to Do?

•  Use Data to Improve Our Business
•  Better Understand Customers
•  Act Proactively, Not Reactively

Page 5: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

Why Organize?

•  Scale
•  Robustness
•  Repeatability

Page 6: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

Perception: What Does Data Science Do?

•  Revolutionizing Ad Targeting
•  Automating Deals and Recommendations
•  Alerting Admins to New Network Attacks

Page 7: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

What Does It Take?

•  Specific Data Expertise
•  Exploratory Analysis
•  Modeling
•  Creativity
•  Programming
•  Big Data
•  Communication
•  Ability to Target Impact
•  Unstructured Analysis
•  Organizational Politics
•  Visualization
•  …

Page 8: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

The New Toy: A Center of Excellence

•  Centralized
    −  Brings data, analysis, and processing together
    −  Data scientists support one another
•  Distributed
    −  Data scientists close to business
    −  Multiple models for rotating data scientists into lines of business

[Diagram labels: CoE; Line of Business A; Line of Business B; Line of Business C]

Page 9: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

What Does It Still Take?

•  Specific Data Expertise
•  Exploratory Analysis
•  Modeling
•  Creativity
•  Programming
•  Big Data
•  Communication
•  Ability to Target Impact
•  Unstructured Analysis
•  Organizational Politics
•  Visualization
•  …

Page 10: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

If You Build It, They Will Come?

•  Designed a great home for unicorns
•  But they are still unicorns

Page 11: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

Working with Horses, Not Unicorns

•  Unravel Capability
•  Map Activities to Functional Roles
•  Align Functions with Process, Not Individuals
•  Don’t Forget to Scale

Page 12: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

CLIENT EXAMPLE: Clickstream Data in Action

•  Identify Fraudulent Sessions
•  Cross Channel Analysis
•  Next Best Action
•  Optimize Pathways
•  Determine Session Interest
•  Customizing Experience
•  Proactive Outreach
•  Search Analysis
•  Content Optimization

Page 13: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

CLIENT EXAMPLE: Scaling Data Science

•  Billions of clicks
•  Unstructured data
•  How do we model it?!
•  Model the SIGNAL
•  Not the data
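One way to read "model the signal, not the data" is to reduce raw click events to a few session-level features before any modeling. The sketch below is only an illustration of that idea, not the client's pipeline: the `Click` record, the 30-minute inactivity cutoff, and the feature names are all assumptions introduced here.

```python
from collections import defaultdict
from dataclasses import dataclass

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity cutoff; not from the talk


@dataclass
class Click:
    """Hypothetical raw click event."""
    user_id: str
    timestamp: float  # seconds since some epoch
    page: str


def sessionize(clicks, gap=SESSION_GAP_SECONDS):
    """Group each user's clicks into sessions, splitting on inactivity gaps."""
    by_user = defaultdict(list)
    for click in clicks:
        by_user[click.user_id].append(click)

    sessions = []
    for user_clicks in by_user.values():
        user_clicks.sort(key=lambda c: c.timestamp)
        current = [user_clicks[0]]
        for click in user_clicks[1:]:
            if click.timestamp - current[-1].timestamp > gap:
                sessions.append(current)
                current = [click]
            else:
                current.append(click)
        sessions.append(current)
    return sessions


def session_signal(session):
    """Reduce one session's raw clicks to a small, model-ready feature row."""
    pages = [c.page for c in session]
    return {
        "user_id": session[0].user_id,
        "n_clicks": len(session),
        "n_distinct_pages": len(set(pages)),
        "duration_s": session[-1].timestamp - session[0].timestamp,
        "n_search_clicks": sum(1 for p in pages if p.startswith("/search")),
    }


clicks = [
    Click("u1", 0, "/home"),
    Click("u1", 40, "/search?q=loans"),
    Click("u1", 95, "/products/loan"),
    Click("u1", 10_000, "/home"),  # more than 30 minutes later: new session
    Click("u2", 5, "/home"),
]
rows = [session_signal(s) for s in sessionize(clicks)]
```

Here five raw clicks collapse into three feature rows. Analysts can then work with rows like these instead of the click log itself.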

Page 14: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

CLIENT EXAMPLE: Clickstream Data Science in Action

[Diagram labels: Hadoop 1.0; MPP; Web; Feature Selection & Dimensionality Reduction]

Page 15: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

CLIENT EXAMPLE: Extracting Signal: Hadoop 1.0

•  Feature Selection
    −  Forests
    −  Clustering
•  Dimensionality Reduction
    −  SVM
•  Challenges
    −  Job Latency
    −  Limited Iterations

Page 16: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

CLIENT EXAMPLE: Extracting Signal: Hadoop 2.0

•  Spark
    −  Faster response in exploration
    −  Better Support for Iterative Models
•  Genetic Algorithms
•  Neural Networks
•  Challenges
    −  In memory: costly and limiting
    −  MapReduce does not go away
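To see why iterative models such as genetic algorithms are sensitive to job latency, it helps to look at the loop itself. The sketch below is a toy, single-machine genetic algorithm for feature-subset selection in plain Python. It is not the client's implementation and uses no Spark API. The synthetic data, the fitness function, and every parameter value are assumptions made for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable
N_FEATURES = 6

# Toy data (an assumption): only features 0 and 3 drive the label.
rows = []
for _ in range(200):
    x = [rng.uniform(-1, 1) for _ in range(N_FEATURES)]
    rows.append((x, 1 if x[0] + x[3] > 0 else 0))


def fitness(mask):
    """Score a feature subset. Every call is one full pass over the data."""
    correct = 0
    for x, label in rows:
        score = sum(v for v, keep in zip(x, mask) if keep)
        predicted = 1 if score > 0 else 0
        if predicted == label:
            correct += 1
    accuracy = correct / len(rows)
    return accuracy - 0.01 * sum(mask)  # small penalty per selected feature


def evolve(generations=15, pop_size=20, mutation_rate=0.1):
    """Evolve bitmasks over the features; returns (best mask, best-per-generation)."""
    population = [[rng.randint(0, 1) for _ in range(N_FEATURES)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[: pop_size // 2]
        children = [ranked[0]]  # elitism: carry the best subset forward unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, N_FEATURES - 1)
            child = a[:cut] + b[cut:]  # single-point crossover
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]  # bit-flip mutation
            children.append(child)
        population = children
    return max(population, key=fitness), history


best_mask, history = evolve()
```

With these settings the loop scores 20 candidates in each of 15 generations, so it makes at least 300 full passes over the same rows. If each pass were a separate batch job, the per-job latency would be paid on every one of them. That matches the "Job Latency" and "Limited Iterations" challenges listed for Hadoop 1.0.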

Page 17: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

CLIENT EXAMPLE: Horses, Not Unicorns

•  Focus on Technical Skills
    −  EDA
    −  Modeling
    −  Programming / Big Data
•  Communication Skills
    −  Capturing signal needs
    −  Iterating with stakeholders

[Diagram label: Hadoop 1.0]

Page 18: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

CLIENT EXAMPLE: CoE Next Steps

•  Continue to make signal available to analysts
    −  Next up: Extracting signal from text
•  Act as a capability search party
    −  Sprints of new insights and tools
•  Finalize operating model
    −  Funding structure
    −  Engagement model with lines of business

Page 19: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

Discussion Over Drinks