Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

Organizing for Data Science Dan Mallinger Data Science Practice Manager September 2014

description

This talk will introduce a paradigm for enabling access to large, unstructured, and novel datasets in enterprises, while retaining value from existing tools and staff. By following a real-world example, the discussion will walk through how small, central data science teams can make data discoveries and data value accessible to others. We will also review the tools, data science approaches, and best practices for uncovering, polishing, and digesting signal in data to support analytics at the front lines of business.

Transcript of Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

Page 1: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

Organizing for Data Science

Dan Mallinger Data Science Practice Manager

September 2014

Page 2: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

Dan Mallinger

•  Data Science Practice Manager
   −  Think Big Analytics
•  Working with clients across
   −  Financial Services
   −  Advertising
   −  Manufacturing
   −  Social
   −  Network Providers

Page 3: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

Today

•  Define Data Science in the Organization
•  Look at Current Perspectives on Organization
•  Discuss Shortcomings
•  Review a Real World Solution

Page 4: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

What Do We Hope to Do?

•  Use Data to Improve Our Business
•  Better Understand Customers
•  Act Proactively, Not Reactively

Page 5: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

Why Organize?

•  Scale
•  Robustness
•  Repeatability

Page 6: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

Perception: What Does Data Science Do?

•  Revolutionizing Ad Targeting
•  Automating Deals and Recommendations
•  Alerting Admins to New Network Attacks

Page 7: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

What Does It Take?

•  Specific Data Expertise
•  Exploratory Analysis
•  Modeling
•  Creativity
•  Programming
•  Big Data
•  Communication
•  Ability to Target Impact
•  Unstructured Analysis
•  Organizational Politics
•  Visualization
•  …

Page 8: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

The New Toy: A Center of Excellence

•  Centralized
   −  Brings data, analysis, and processing together
   −  Data scientists support one another
•  Distributed
   −  Data scientists close to business
   −  Multiple models for rotating data scientists into lines of business

[Diagram labels: CoE; Line of Business A; Line of Business B; Line of Business C]

Page 9: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

What Does It Still Take?

•  Specific Data Expertise
•  Exploratory Analysis
•  Modeling
•  Creativity
•  Programming
•  Big Data
•  Communication
•  Ability to Target Impact
•  Unstructured Analysis
•  Organizational Politics
•  Visualization
•  …

Page 10: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

If You Build It, They Will Come?

•  Designed a great home for unicorns
•  But they are still unicorns

Page 11: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

Working with Horses, Not Unicorns

•  Unravel Capability
•  Map Activities to Functional Roles
•  Align Functions with Process, Not Individuals
•  Don’t Forget to Scale

Page 12: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

CLIENT EXAMPLE: Clickstream Data in Action

•  Identify Fraudulent Sessions
•  Cross Channel Analysis
•  Next Best Action
•  Optimize Pathways
•  Determine Session Interest
•  Customizing Experience
•  Proactive Outreach
•  Search Analysis
•  Content Optimization

Page 13: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

CLIENT EXAMPLE: Scaling Data Science

•  Billions of clicks
•  Unstructured data
•  How do we model it?!
•  Model the SIGNAL
•  Not the data
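One way to read "model the signal, not the data" is that raw clicks are first collapsed into a small per-session record, and analysts then model those records rather than the click log itself. The sketch below is only an illustration of that idea: the event layout, the `session_signals` helper, and the chosen features are all assumptions, not details from the client engagement.

```python
from collections import defaultdict

# Hypothetical raw click events: (session_id, timestamp_seconds, page).
CLICKS = [
    ("s1", 0, "/home"),
    ("s1", 12, "/search"),
    ("s1", 40, "/product/42"),
    ("s2", 5, "/home"),
    ("s2", 9, "/home"),
]


def session_signals(clicks):
    """Collapse raw clicks into one small feature record per session."""
    by_session = defaultdict(list)
    for session_id, ts, page in clicks:
        by_session[session_id].append((ts, page))

    signals = {}
    for session_id, events in by_session.items():
        events.sort()  # order each session's events by timestamp
        timestamps = [ts for ts, _ in events]
        pages = [page for _, page in events]
        signals[session_id] = {
            "n_clicks": len(events),
            "duration_s": timestamps[-1] - timestamps[0],
            "distinct_pages": len(set(pages)),
            "used_search": any(p.startswith("/search") for p in pages),
        }
    return signals


SIGNALS = session_signals(CLICKS)
```

The output has one row per session instead of one row per click, which is the kind of reduced table that existing analyst tools could plausibly consume.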

Page 14: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

CLIENT EXAMPLE: Clickstream Data Science in Action

[Diagram labels: Hadoop 1.0; MPP Web; Feature Selection & Dimensionality Reduction]

Page 15: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

CLIENT EXAMPLE: Extracting Signal: Hadoop 1.0

•  Feature Selection
   −  Forests
   −  Clustering
•  Dimensionality Reduction
   −  SVM
•  Challenges
   −  Job Latency
   −  Limited Iterations
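The slide lists clustering among the signal-extraction techniques without saying how it was applied. One plausible reading is that sessions are grouped and the cluster label becomes a compact derived feature. The sketch below is a plain k-means in standard-library Python under that assumption; the `kmeans` helper, the two session features, and the sample values are hypothetical, and a real job at clickstream scale would run on a distributed framework rather than in a single process.

```python
import math

# Hypothetical per-session features: (n_clicks, duration_seconds).
# Features are left unscaled here to keep the sketch short.
SESSIONS = [(3, 40), (2, 4), (4, 55), (1, 2), (3, 48)]


def kmeans(points, k, iterations=10):
    """Plain k-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


LABELS, CENTROIDS = kmeans(SESSIONS, k=2)
```

Each pass of the loop rescans every point, which is why the job latency and limited iterations noted on the slide matter for this family of methods.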

Page 16: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

CLIENT EXAMPLE: Extracting Signal: Hadoop 2.0

•  Spark
   −  Faster response in exploration
   −  Better Support for Iterative Models
•  Genetic Algorithms
•  Neural Networks
•  Challenges
   −  In memory: costly and limiting
   −  MapReduce does not go away
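To make the point about iterative models concrete, here is a toy genetic algorithm that searches for a feature subset. It is a sketch under stated assumptions, not the client's method and not Spark code: the `GAIN` scores, the `COST` penalty, and the `fitness` function are invented stand-ins for training and scoring a model on real data.

```python
import random

# Hypothetical per-feature usefulness scores and a flat per-feature cost.
# A real fitness function would train and score a model on each evaluation.
GAIN = [0.9, 0.1, 0.6, 0.05, 0.4, 0.02]
COST = 0.15


def fitness(mask):
    """Total gain of the selected features minus a cost per selected feature."""
    return sum(g - COST for g, keep in zip(GAIN, mask) if keep)


def evolve(generations=30, pop_size=12, seed=0):
    rng = random.Random(seed)
    n = len(GAIN)
    population = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        # Every generation re-evaluates the whole population; on real data
        # each evaluation is another pass over the dataset.
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]  # elitism: the best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            flip = rng.randrange(n)  # single-bit mutation
            child[flip] = not child[flip]
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return best, history


BEST_MASK, HISTORY = evolve()
```

Thirty generations of twelve candidates means hundreds of evaluations over the same data, which is the access pattern where keeping data in memory helps and where per-job latency hurts.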

Page 17: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

CLIENT EXAMPLE: Horses, Not Unicorns

•  Focus on Technical Skills
   −  EDA
   −  Modeling
   −  Programming / Big Data
•  Communication Skills
   −  Capturing signal needs
   −  Iterating with stakeholders

[Diagram label: Hadoop 1.0]

Page 18: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

CLIENT EXAMPLE: CoE Next Steps

•  Continue to make signal available to analysts
   −  Next up: Extracting signal from text
•  Act as a capability search party
   −  Sprints of new insights and tools
•  Finalize operating model
   −  Funding structure
   −  Engagement model with lines of business

Page 19: Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

Discussion Over Drinks