How to Identify, Train or Become a Data Scientist


Page 1: How to Identify, Train or Become a Data Scientist

The Briefing Room

How to Identify, Train or Become a Data Scientist

Page 2: How to Identify, Train or Become a Data Scientist

Twitter Tag: #briefr


Welcome

Host: Eric Kavanagh

[email protected]

Page 3: How to Identify, Train or Become a Data Scientist


Mission

•  Reveal the essential characteristics of enterprise software, good and bad

•  Provide a forum for detailed analysis of today’s innovative technologies

•  Give vendors a chance to explain their product to savvy analysts

•  Allow audience members to pose serious questions... and get answers!

Page 4: How to Identify, Train or Become a Data Scientist


Topics

This Month: ANALYTICS

October: DATA PROCESSING

November: DATA DISCOVERY & VISUALIZATION

Page 5: How to Identify, Train or Become a Data Scientist


Analytics

Page 6: How to Identify, Train or Become a Data Scientist


Analyst: Neil Raden

Neil Raden is the founder and Principal Analyst at Hired Brains Research. He is the co-author, with James Taylor, of “Smart (Enough) Systems: How To Deliver Competitive Advantage by Automating Hidden Decisions.” With 30 years of experience, he is a widely published writer, well-known speaker, analyst and consultant who has personally designed and implemented dozens of large analytical applications in finance, marketing, distribution, logistics, actuarial, intelligence, scientific, statistical and consumer products. As an industry analyst, he has published over 40 white papers and hundreds of articles, blogs and research reports. He welcomes your comments and can be reached at [email protected].

Page 7: How to Identify, Train or Become a Data Scientist


Actian

•  Actian is a database and software development company

•  Actian offers the ParAccel DataFlow Engine, a scalable parallel platform that provides visual access to complex data flows

•  The DataFlow Engine is designed to reduce cluster complexity, manage multiple petabytes of data, and scale with the size and dimensionality of the data

Page 8: How to Identify, Train or Become a Data Scientist


Guest: John Santaferraro

John Santaferraro is the Vice President of Product Marketing at Actian. Prior to joining Actian, Santaferraro was an independent industry analyst in the business intelligence and analytics market. Before that he developed and executed a vertical market strategy for Hewlett Packard's BI group, focusing on energy, communications, retail, healthcare and financial services; he was also instrumental in helping establish HP’s new BI business group with a combination of solutions, products and consulting. In 2000, John founded a marketing and sales consulting company, Ferraro Consulting, providing business acceleration strategy for technology companies.

Page 9: How to Identify, Train or Become a Data Scientist

Enabling the Business Scientist

John Santaferraro, Vice President of Marketing, ParAccel Platform Group

September 3, 2013

Page 10: How to Identify, Train or Become a Data Scientist

•  What is a “business scientist”?

•  Requirements of a “business scientist”

•  Tools of a “business scientist”

•  Creating a culture of “business science”

10 © 2013 Actian Corporation

Page 11: How to Identify, Train or Become a Data Scientist

The “Moneyball” Effect

Analytics Go Mainstream

•  Major League Baseball: hire the best team

•  NSA and Big Data: ???????????????

•  Target and Pregnancy: predicting pregnancies

Page 12: How to Identify, Train or Become a Data Scientist

What is a Data Scientist?


Page 13: How to Identify, Train or Become a Data Scientist

What is a Data Scientist?

A data scientist “…incorporates varying elements and builds on techniques and theories from many fields, including mathematics, statistics, data engineering, pattern recognition and learning, advanced computing, visualization, uncertainty modeling, data warehousing, and high performance computing with the goal of extracting meaning from data and creating data products.”

[Figure: a mash-up of the disciplines from which data science is derived. Created by Calvin Andrus, 13 July 2012. Source: http://en.wikipedia.org/wiki/Data_science]

Page 14: How to Identify, Train or Become a Data Scientist

What is a Business Scientist?

“A business scientist is an expert in the science of business, sitting between the business analyst and the data scientist, pulling together cross-functional expertise from data science, analytics, business applications, business processes, and business strategy.”

Page 15: How to Identify, Train or Become a Data Scientist

Business Science Practice Areas

Business Science

Sales

Marketing

Supply Chain

Logistics

Finance

Human Resources

Risk

Fraud


Page 16: How to Identify, Train or Become a Data Scientist

Business Science Skillset

Understand How Analytics Work

Understand Emerging Data Types

Understand Business Operations & Strategy

Learn Quickly

Think Outside the Box

Tell Compelling Stories


Page 17: How to Identify, Train or Become a Data Scientist

The Tools of the Business Scientist

Libraries of Analytic Functions Run at Extreme Speed

•  Transformational Analytics

•  Statistical Analytics

•  Machine Learning Analytics

•  Clustering Analytics

•  Discovery Analytics

Visual Framework for Data Discovery, Preparation and Analytics

•  Drag and Drop Interaction

•  Libraries of Data Preparation Operators

•  Libraries of Analytic Operators

•  High-Performance, Parallel Processing on Hadoop (or other file systems)

Page 18: How to Identify, Train or Become a Data Scientist

ParAccel Platform – Unconstrained Analytics

[Diagram labels: Business Intelligence and Reporting Tools; Advanced Analytics; Analytic Applications; Machine Data; Operational Data; 3rd Party Info Provider; Streaming Data; Logs; On-Demand Integration; On-Demand Integration Services; Enterprise Data Warehouse; Hadoop; Big Data Apps; Embedded Analytics; In-Database Analytics]

Page 19: How to Identify, Train or Become a Data Scientist

Accelerate Time to Value with Libraries of Analytic Functions

Statistical: Standard Deviation, Correlation, Covariance, etc.

Univariate: Gamma distribution, Maxwell distribution, Weibull, etc.

Multivariate: Normal Copula, Hypothesis Testing, Gumbel Copula, etc.

Data Mining: K-Means, Logistic Regression, Neural Networks, etc.

Mathematical: Trigonometric, Permutation / Combination, Exponential / Logarithm, etc.

Corporate Finance: Present Value Analysis, Stock Valuation, Asset Valuation, etc.

Options / Derivatives: Risk neutral valuation (with/without Black-Scholes), Greeks, etc.

Portfolio Management: Currency / Cross-currency derivatives, Merton Models, etc.

Fixed Income: Price and Yield, Duration, Convexity, etc.

Time Series Analysis: ARMA / ARIMA models, ARCH/GARCH model, Regime Switch, etc.

•  100+ pre-loaded SQL, window, and mathematical functions

•  500+ advanced analytics available for purchase

Page 20: How to Identify, Train or Become a Data Scientist

Business Analyst to Business Scientist


Unconstrained Analytics

Load and Go

Run Ad Hoc Queries

Query Any Time

Query Any Data

Query All Data

Run Any Analytics

Execute Sophisticated Analytics

Return Results Quickly

Iterate Quickly Through Discovery

Share Workloads With Any Platform

Support All Analysts

Run Many Applications

Create Analytic Services

Page 21: How to Identify, Train or Become a Data Scientist

ParAccel Dataflow & Hadoop Analytics


›  On-demand integration

›  Data and Application Integration

›  In-flight preparation

›  In-Hadoop preparation

›  Dataflow optimizations

›  Hadoop optimizations

›  In-Hadoop analytics

›  Non-Hadoop analytics

[Diagram labels: Business Intelligence; Analytics; Enterprise data (Applications, DW); New data (Social, www, Mobile, Machine Data); High-Performance BI; High-Performance Analytics; Connect, Prepare, Analyze, Optimize; Data Value]

A visual framework for high-performance data provisioning, ETL, and analytics on Hadoop (or other file systems), without requiring any knowledge of MapReduce or parallel programming

Page 22: How to Identify, Train or Become a Data Scientist

ParAccel Dataflow – Designer

•  Single UI for Data Preparation and Advanced Analytics

Page 23: How to Identify, Train or Become a Data Scientist

Dataflow Operator Libraries


Page 24: How to Identify, Train or Become a Data Scientist

ParAccel Platform in Action

[Diagram labels: HDFS; ParAccel Platform; Read, Write, Prepare, Analyze]

Page 25: How to Identify, Train or Become a Data Scientist

Creating a Culture for Business Science


Create Educational Opportunities

Provide Incentives for Participants

Reorganize to Support Business Science

Deploy Infrastructure to Support Analytics

Page 26: How to Identify, Train or Become a Data Scientist

Contact me at… [email protected] 408.373.7500

Visit Actian at…

www.actian.com


Page 27: How to Identify, Train or Become a Data Scientist


Perceptions & Questions

Analyst: Neil Raden

Page 28: How to Identify, Train or Become a Data Scientist

Analytic Types and Roles

Neil Raden
Founder, Hired Brains Research

Twitter: NeilRaden

Blog: http://hiredbrains.wordpress.com
Website: http://www.hiredbrains.com

Mail: [email protected]
LinkedIn: http://www.linkedin.com/in/neilraden

Copyright 2013 Neil Raden and Hired Brains Research LLC

Page 29: How to Identify, Train or Become a Data Scientist

No More Managing from Scarcity

Page 30: How to Identify, Train or Become a Data Scientist

Even Big Data Doesn’t Speak for Itself

•  Incomplete

•  Behaviors under-represented

•  Anonymizing disasters

•  Selection

•  ML still needs analyst

Not a crystal ball

Page 31: How to Identify, Train or Become a Data Scientist

Anscombe’s Quartet

•  Mean of x = 9

•  Variance of x = 11

•  Mean of y = 7.50

•  Variance of y = 4.122

•  Correlation between x and y = 0.816

•  Linear regression line: y = 3.00 + 0.500x
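The summary statistics on this slide can be checked against Anscombe's published 1973 dataset. The sketch below is not part of the presentation; the names `QUARTET` and `summarize` are illustrative, and it uses sample (n - 1) variance. It suggests that all four datasets give approximately the same figures (the values agree to about two decimal places, not exactly), even though their scatter plots look nothing alike.

```python
# Anscombe's quartet: four (x, y) datasets with near-identical summary
# statistics. Values are taken from Anscombe's 1973 paper, not from the slides.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (X4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return means, sample variances, Pearson correlation and the
    least-squares regression line (y = intercept + slope * x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)  # sum of squared deviations of x
    syy = sum((y - mean_y) ** 2 for y in ys)  # sum of squared deviations of y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "var_x": sxx / (n - 1),
        "mean_y": mean_y,
        "var_y": syy / (n - 1),
        "corr": sxy / (sxx * syy) ** 0.5,
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


for name, (xs, ys) in QUARTET.items():
    s = summarize(xs, ys)
    print(
        f"{name:>3}: mean_x={s['mean_x']:.2f} var_x={s['var_x']:.2f} "
        f"mean_y={s['mean_y']:.2f} var_y={s['var_y']:.3f} "
        f"corr={s['corr']:.3f} line: y = {s['intercept']:.2f} + {s['slope']:.3f}x"
    )
```

Because the numbers alone cannot distinguish the four datasets, plotting each one is the only way to see that they differ, which is the point the slide makes about analysts still being needed.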

Page 32: How to Identify, Train or Become a Data Scientist

Analytic Types

Types of Analysis

Type I
Descriptive title: Quantitative R&D
Quantitative sophistication/numeracy: PhD or equivalent
Sample roles: Creation of theory, development of algorithms. Academic/research. Work in business/government for very specialized roles.

Type II
Descriptive title: Data Scientist or Quantitative Analyst
Quantitative sophistication/numeracy: Advanced Math/Stat, not necessarily PhD
Sample roles: Internal expert in statistical and mathematical modelling and development, with solid business domain knowledge.

Type III
Descriptive title: Operational Analytics
Quantitative sophistication/numeracy: Good business domain, background in statistics optional
Sample roles: Running and managing analytical models. Strong skills in and/or project management of analytical systems implementation.

Type IV
Descriptive title: Business Intelligence/Discovery
Quantitative sophistication/numeracy: Data and numbers oriented, but no special advanced statistical skills
Sample roles: Reporting, dashboard, OLAP and visualization, some design, posterior analysis of results from quantitative methods. Spreadsheets, “business discovery tools”.

Page 33: How to Identify, Train or Become a Data Scientist

Questions

•  How would you describe the difference between a data scientist and a business scientist?

•  What tools are needed to support a business analyst?

•  What’s the career path for a business analyst?

•  Is big data suffering from hype?

Page 34: How to Identify, Train or Become a Data Scientist

Questions

•  Why do you think people are afraid of math?

•  Should universities prepare people for business science or should industry?

Page 35: How to Identify, Train or Become a Data Scientist


Page 36: How to Identify, Train or Become a Data Scientist


Upcoming Topics

www.insideanalysis.com

September: ANALYTICS

October: DATA PROCESSING

November: DATA DISCOVERY & VISUALIZATION

Page 37: How to Identify, Train or Become a Data Scientist


Thank You for Your Attention