The Zen of Data Science

Eugene Dubossarsky, Chief Data Scientist – Principal Founder –


Page 1: The Zen of Data Science

The Zen of Data Science

Eugene Dubossarsky, Chief Data Scientist – Principal Founder –

Page 2: The Zen of Data Science

Presentation Summary - Promised

- Key concepts, dos and don'ts of Data Science
- Science and engineering: very different!
- What are Data Scientists for?
- Where should Data Science sit in the business?
- How should data science be measured, managed, planned?
- Starting, nourishing and growing a successful Data Science function in your business
- Skills and experience
- Becoming an effective data scientist

Page 3: The Zen of Data Science

Presentation Summary – But Actually More Like...

- Shameless self-promotion
- Parables
- Metaphors
- Abstract Philosophical Stuff
- Surprises
- Challenges and Reframes
- You saying “This is relevant to my life how?”

Page 4: The Zen of Data Science

Presentation Summary

- Tools vs Ideas – Science vs Technology
- Finding vs Building – Science and Engineering
- Engagement
- Exploration – a legitimate, vital and strategic business activity
- Intelligence – a business function
- Mastery
- Apprenticeship

Page 5: The Zen of Data Science

The “Zen” bit

- The bare essence
- The kernel of truth
- The thing that isn't illusion
- The way (Tao) to enlightenment (Satori)
- Clarity and simplicity derived from meditation, possibly quite different to everyday experience

Page 6: The Zen of Data Science

Parable 1: Getting Airports Wrong

Everybody thinks that this is an airplane:

Page 7: The Zen of Data Science

Parable 1: Getting Airports Wrong

Imagine your job is to build an airport. You need to take the design of airplanes into account. The only problem is:

Page 8: The Zen of Data Science

Parable 1: Getting Airports Wrong

This is what is called a “fundamental category error”. Anything done with this misconception in place will be a waste of time, money and resources.

“Working around it” and “being realistic about the client's expectations” are a bit beside the point.

Page 9: The Zen of Data Science

Parable 1: Getting Airports Wrong

Most people probably want to focus on the aerodynamics of the “airplane” as currently conceived, the buzz around technology to support such “airplanes” and may see this as being “business focused”, while more fundamental discussions would be seen as “negative”, “academic” or too “challenging”.

Page 10: The Zen of Data Science

Parable 1: Getting Airports Wrong

Nevertheless, getting the fundamental issue sorted out would seem to be the first order of business, no matter how abstract, controversial, politically inconvenient or offensive to some quarters, or how many people have built careers managing, selling and practicing in this paradigm.

Page 11: The Zen of Data Science

Parable 1: Getting Airports Wrong

Because... Uh... Donkey?

Page 12: The Zen of Data Science

Data, Science, Tools and Definitions

Data Scientist =
- “Hadoop Guy”?
- “Guy Who Does Stuff with Data”?
- “Guy Who Does Stuff with Lots of Data”?
- “Guy Who Does Stuff with Big Data”?
- “Guy Who Does Stuff with Big Data That Sounds Cool or Businessy”?

(And what makes data “big” anyway?)

Page 13: The Zen of Data Science

Science and Engineering

Is there a difference? What is it? What is a “Data Engineer”? What is a “non-Data Engineer”?

Page 14: The Zen of Data Science

Science and Engineering

- Science and engineering are actually direct opposites
- Skills, positioning, personality types, appropriate management frameworks and place in the business are quite different

The confusion needs sorting out.

Page 15: The Zen of Data Science

Science and Engineering

Page 16: The Zen of Data Science

Now I've Lost You...

That's not “realistic” – most “data scientists” are actually “engineers” by this framework!

That sounds too “technical”, “academic” or not “relevant to business”

Page 17: The Zen of Data Science

Now I've Lost You...

That's not “realistic” – most “data scientists” are actually “engineers”

Yep.

That sounds too “technical”, “academic” or “not relevant to business”

Maybe, Too Bad and No

Page 18: The Zen of Data Science

Engineering

- Start with an identified idea, end with a design
- Build or maintain something to pre-defined parameters
- Uncertainty is the enemy (time, budget, resources, performance)

Page 19: The Zen of Data Science

Engineering

- Plans, Timeframes and Specifications, vs ongoing (loosely focused) discussion
- Delivers Products and pre-determined KPIs
- The Unexpected is a (usually unwelcome) exception
- Works to milestones and a specification
- Engaged with operational and technical management

Page 20: The Zen of Data Science

Engineers

- Outcomes are Things
- An Engineer may do more or less the same thing many times
- An Engineer performs “projects” and manages “processes”
- An Engineer is managed according to tight requirements

Page 21: The Zen of Data Science

Engineers

- Easier to identify
- Easier to manage
- Easier to understand
- Less stressful to deal with
- Easier to train
- More plentiful
- Easier to recruit

Page 22: The Zen of Data Science

Engineers And Data

- Data is a resource to move and manipulate
- Focus is on building and maintaining processes that do that
- Data is a “commodity” that flows through the system. The focus is on the system.

Page 23: The Zen of Data Science

Science and Scientists

- Start with reality – derive new insights
- Uncertainty is your job
- “Projects” and “processes” are anathema, and people who manage them don't help
- Explore and interrogate data
- No two jobs are the same
- No job can be specified too tightly
- Findings are inherently uncertain; otherwise, why bother?

Page 24: The Zen of Data Science

Scientists and Data

- Focused on The Data. Tools help but don't feature.
- Data is complex, an undiscovered country to explore.
- Data is not a commodity: it is complex, ever-changing and information-rich.

Page 25: The Zen of Data Science

Scientists and The CEO

Data is “The Last Frontier”, where dangers lurk and opportunities abound. The scientist is the guide.

The objective is to Tell the Story of the Data to someone who cares and matters (ideally the CEO), preferably as part of an ongoing conversation.

Page 26: The Zen of Data Science

Science and Engineering

- Scientists help you identify new risks and opportunities; they provide transformational insights
- Engineers make transformations tangible
- Scientists explore
- Engineers deliver and maintain
- The personality types are actually quite different

Page 27: The Zen of Data Science

Science and Engineering

- There is a lot of crossover
- It is good to be skilled in both
- Many of the tools used are the same
- The distinction is not obvious to most outsiders
- The distinction is crucial

Page 28: The Zen of Data Science

Why the Confusion?

- It's all “technical”, apparently
- It has the word “data” in it
- Some vendors like it that way
- Much of management likes it that way
- Much of management is out of its depth
- And almost all of HR and recruiting

Page 29: The Zen of Data Science

Science and Engineering

- Real Business Needs Both
- Pretend Business only needs Engineering (and maybe not even that)
- Science is crucial for real competition and risk
- Science is irrelevant otherwise
- Engineering is Delivery
- Science is Intelligence

Page 30: The Zen of Data Science

The Intelligence Function – Where Should Data Science Sit in the Business?

- Absent in most “enterprises”
- Present informally in most real businesses
- A strategic, secret asset not to be bragged about or shared
- “Data” is not just structured, electronic, concrete or even conscious

Page 31: The Zen of Data Science

The Intelligence Function

- Strategic, secret role
- Trusted, discreet, low-key advisor, mentor, guide
- A mix of Mr Spock, James Bond and Steve Jobs
- Many guises, many names
- Well understood by militaries at war, and by organisations with real challenges, risks and uncertainty
- Often next in line for CEO

Page 32: The Zen of Data Science

The Intelligence Function – Where Data Science Should Sit in the Business

- Not IT
- Not Operations
- Right near the CEO
- Reporting directly, discreetly, interactively
- Not managed by PRINCE2, waterfall or any other “project management” or “Business Analysis” methods
- Lean Startup, real Agile (see the Agile Manifesto) and the OODA loop are much more like it

Page 33: The Zen of Data Science

Data Science and Analytics Today

- Insights or Process?
- Tools or Outcomes?
- Transformation or BAU?
- Value or Compliance?
- Asset or Vanity?
- Engaged or Disengaged?
- Measured?

Page 35: The Zen of Data Science

Insights vs Process

Insights CANNOT be the same each time. But much of “analytics” can.

Deriving value from predictive targeting is a repeatable, mechanical process.

Deriving value from insights derived from the same model is not.
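To make the contrast concrete, here is a minimal Python sketch of the mechanical half, assuming a model has already produced a score per customer. The function name `select_targets`, the customer IDs and the scores are all hypothetical, for illustration only. Run it twice on the same scores and it returns the same list, which is what makes it a process rather than an insight.

```python
from typing import Dict, List


def select_targets(scores: Dict[str, float], budget: int) -> List[str]:
    """Rank customers by model score and return the top `budget` IDs.

    Hypothetical helper: ties are broken by customer ID, so the output
    is fully deterministic for a given set of scores.
    """
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [customer_id for customer_id, _ in ranked[:budget]]


# Hypothetical scores from an already-trained retention model.
scores = {"c001": 0.91, "c002": 0.15, "c003": 0.67, "c004": 0.91}
print(select_targets(scores, budget=2))  # ['c001', 'c004']
```

Nothing in this function says *why* c001 and c004 score highly; asking that question of the same model is the part that calls for a scientist.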

Page 36: The Zen of Data Science

Insights vs Process

Only one requires a scientist. Only one is valued by businesses that don't have real competitive, environmental and other change pressures.

Page 37: The Zen of Data Science

Data Science and Analytics Today

- Insights or Process?
- Tools?
- Transformation or BAU?
- Value or Compliance?
- Asset or Vanity?
- Engaged or Disengaged?
- Measured?

Page 38: The Zen of Data Science

Tools and Trinkets

Is “Hadoop” really the most important thing on a “data scientist's” resume?

Why or why not? What is missing?

Page 39: The Zen of Data Science

Data Science and Analytics Today

- Insights or Process?
- Tools?
- Transformation or BAU?
- Value or Compliance?
- Asset or Vanity?
- Engaged or Disengaged?
- Measured?

Page 40: The Zen of Data Science

Data Science and Analytics Today

- Insights or Process?
- Tools or Science?
- Transformation or BAU?
- Value or Compliance?
- Asset or Vanity?
- Engaged or Disengaged?
- Measured?

Page 42: The Zen of Data Science

Data Science and Analytics Today

- Insights or Process?
- Tools or Science?
- Transformation or BAU?
- Value or Compliance?
- Vital Asset or Vanity?
- Engaged or Disengaged?
- Measured?

Page 43: The Zen of Data Science

Value, Compliance or Vanity ?

What would happen to the business if the analytics/data science/data mining function disappeared overnight?

Who would care? Why? Why does the function exist in the business in the first place?

Science does not serve vanity well, and is not necessary for compliance.

Page 44: The Zen of Data Science

Data Science and Analytics Today

- Insights or Process?
- Tools or Science?
- Transformation or BAU?
- Value or Compliance?
- Vital Asset or Vanity?
- Leadership Engaged or Disengaged?
- Measured?

Page 45: The Zen of Data Science

Engagement in Parables

Is investing in data analytics like investing in stocks, or like investing in an education (or a gym membership)?

If analytics were a taxi, would the CEO think of the analytics function as the car mechanics, the drivers or the tour guides? Does he know? Does he care?

Page 46: The Zen of Data Science

Engagement in Extremes

- Analytics in a hedge fund
- Analytics in a bank
- Basel II compliance analytics in a bank

What are the KPIs? Does the CEO personally care about them? Can the organisation do without the analytics function? Can the organisation afford the CEO ignoring the analytics function?

Page 47: The Zen of Data Science

Data Science and Analytics Today

- Insights or Process?
- Tools or Science?
- Transformation or BAU?
- Value or Compliance?
- Vital Asset or Vanity?
- Leadership Engaged or Disengaged?
- Measured?

Page 48: The Zen of Data Science

Measurement

How many predictive analytics functions in banking, telco, insurance etc. are measured explicitly on improvement in predictive accuracy (retention, acquisition, risk, pricing models), with the CEO keeping an eye on this?

How many know/care about the predictive accuracy of their competitors?
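As a sketch of what being "measured explicitly on improvement in predictive accuracy" could look like, the Python below compares a current model with a challenger on the same hold-out set. It uses AUC (area under the ROC curve), computed by pairwise comparison; AUC is an assumption chosen for illustration, since the talk names no metric. The function name `auc`, the labels and both sets of scores are hypothetical.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly.

    A positive scored above a negative counts 1, a tie counts 0.5.
    O(n*m) pairwise version: fine for a sketch, slow for large data.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical hold-out data: 1 = customer churned, 0 = customer stayed.
labels = [1, 0, 1, 0, 0, 1]
current = [0.6, 0.5, 0.4, 0.7, 0.2, 0.8]     # model in production
challenger = [0.7, 0.3, 0.6, 0.5, 0.2, 0.9]  # candidate replacement

uplift = auc(labels, challenger) - auc(labels, current)
print(f"AUC uplift: {uplift:+.3f}")  # AUC uplift: +0.333
```

A single number like this, tracked release after release, is the kind of KPI a CEO could realistically keep an eye on.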

Page 49: The Zen of Data Science

Finding, Training and Managing Data Scientists

Not Easy

Page 50: The Zen of Data Science

Finding Data Scientists

- Data Scientists are part engineer, part entrepreneur and part hunter/gatherer – outcome-focused explorers!
- ADHD is an asset; the personality profile is not typically corporate
- Communication skills and lateral thinking are as important as technical skill
- Technical skills are DEEEEP and eclectic

Page 51: The Zen of Data Science

Finding Data Scientists

- Most recruiters are severely out of their depth
- Ditto most HR
- The best people are un-/under-/mis-employed!
- It takes one to know one

Page 52: The Zen of Data Science

Training Data Scientists

- Eclectic skill set
- Hard skills
  - Stats/Machine Learning/Computing/Psychology
  - Domain expertise
- Many “soft skills”
  - Conceptual
  - Communication
  - Science!
  - Agile/Lean Startup/Cynefin/OODA

Page 53: The Zen of Data Science

Training Data Scientists

- Experience is crucial
- Mistakes are valuable
- Apprenticeship is key!
- Courses help, but are not a substitute: they won't teach the soft skills and conceptual outlook

Page 54: The Zen of Data Science

Managing Data Scientists

Yes: Real Agile, Lean Startup, Cynefin, OODA loop

No: PRINCE2, Project Management, “Business Analysis”, Operational Management, the IT function.

Yes: someone who is engaged, empowered, interested.

No: Just about everyone actually doing this out there...

Page 55: The Zen of Data Science

So Who Needs Data Scientists?

Businesses facing real competition, real threats, real uncertainty and real change.

Page 56: The Zen of Data Science

Who Doesn't Really Need Data Scientists?

Everyone Else.