Data science governance : what and how

Transcript of Data science governance : what and how

Page 1: Data science governance : what and how

www.kensu.io

DATA SCIENCE GOVERNANCE

What and How

Page 2: Data science governance : what and how

ANDY PETRELLA

- CEO & Founder -

Mathematics & Computer Science MSc.

Creator of Spark Notebook

XAVIER TORDOIR

- CSO & Founder -

Physics PhD. Genomics & Quantitative Finance

KENSU & ME

Started in 2015 as Data Fellas, focused on Data Science consulting

Team of 10 engineers and scientists

Shifted toward a product company in 2016, renamed Kensu, with a focus on Data Science Governance

Accelerated by Alchemist Accelerator in San Francisco and The Faktory in Belgium

Page 3: Data science governance : what and how

TOPICS

1. Some thoughts on “Data Science”

2. Data Science Governance: What

3. Data Science Governance: How

4. GDPR: Accountability principle and transparency

Page 4: Data science governance : what and how

SOME THOUGHTS ON “DATA SCIENCE”

Page 5: Data science governance : what and how

MACHINE LEARNING

Pioneers in the 1950s

AI Winter in the 1970s due to pessimism

Resurgence in the 1980s

Machine Learning (and related techniques) has been in use since the 1990s (esp. SVM and RNN)

Deep learning sees widespread commercial use in the 2000s

Machine learning receives great publicity (read: buzz) in the 2010s

ref: https://en.wikipedia.org/wiki/Timeline_of_machine_learning

Page 6: Data science governance : what and how

DATA SCIENCE: +ENGINEERING

Claim: “Data Scientist” coined by DJ Patil in 2008.

Pretty much when Machine Learning became part of software

In a way, when we added “engineering” to the mix

Also, engineering is even more prominent with Big Data and distributed computing

Page 7: Data science governance : what and how

DATA SCIENCE: +EXPERIMENTATION

So much data available

So many tools, libraries, frameworks, …

So many things we can try

We have distributed computing now, right? => Let’s try everything

Discover new insights (and potentially new businesses)

Page 8: Data science governance : what and how

DATA SCIENCE: RECAP

Maths: stats, machine learning and so on

Engineering: ETL, databases, computing frameworks, software, platforms, …

Creativity: “From business intelligence to intelligent business” - Michael Fergusson

Data Science is an umbrella term covering all activities on data

Page 9: Data science governance : what and how

DATA SCIENCE GOVERNANCE: WHAT

Page 10: Data science governance : what and how

DATA PIPELINE

A data pipeline connects activities on data, potentially involving several technologies.

A pipeline is generally thought of as an end-to-end processing line that solves one problem.

But parts of pipelines are reused to save computation, storage, time, …

Thus, interdependency between pipeline segments grows with the number of initiatives

Page 11: Data science governance : what and how

GOAL: TAKE DECISIONS

Data Pipelines, connected together, aren’t created for the beauty of it.

The ultimate goal is always to take decisions.

Decisions are generally taken by, or linked to, humans with responsibilities (even for self-driving cars, in case of a problem).

Given that pipelines are cut-and-wired, interleaved, …

How can we not be anxious when deploying the last piece used by the decision maker?

Page 12: Data science governance : what and how

SOURCES OF ANXIETY

What if:

• one of the datasets used in the process suddenly shows different patterns?

• one of the tools, projects or similar is modified upstream?

• the insights deviate from reality?

• …
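The first worry in the list, a dataset whose patterns suddenly change, can be watched for by comparing summary statistics of each new batch against a reference batch. What follows is only a minimal sketch: the function names, the threshold of 3 standard deviations and the sample values are assumptions made for illustration, not something shown in the talk.

```python
from statistics import mean, stdev

def mean_shift(reference, batch):
    """Distance between the batch mean and the reference mean,
    expressed in reference standard deviations."""
    ref_mean = mean(reference)
    ref_std = stdev(reference)
    if ref_std == 0:
        # Constant reference: any change at all counts as an infinite shift.
        return 0.0 if mean(batch) == ref_mean else float("inf")
    return abs(mean(batch) - ref_mean) / ref_std

def has_drifted(reference, batch, threshold=3.0):
    """Flag the batch when its mean moved by more than `threshold` (assumed default)."""
    return mean_shift(reference, batch) > threshold

# Hypothetical values of one numeric field, before and after an upstream change.
reference = [10.0, 11.0, 9.0, 10.5, 9.5]
same_pattern = [10.2, 9.8, 10.1]
new_pattern = [25.0, 26.0, 24.0]

print(has_drifted(reference, same_pattern))
print(has_drifted(reference, new_pattern))
```

A check on the mean alone misses many kinds of change (variance, categories, missing values), so it is a starting point rather than a complete monitor.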

Page 13: Data science governance : what and how

DEBUGGING?

To reduce the anxiety or, actually, to reduce the risks, we need ways to debug.

In pure engineering, we have unit, functional, integration tests, … but

What do we do when the problems come from the data themselves?

We can’t generate all cases of data variations, right?

How to debug? Without the big picture, we may try to optimise a model for weeks for nothing

Page 14: Data science governance : what and how

DATA SCIENCE GOVERNANCE

Data Governance: control that data meets precise standards, which involves monitoring against production data.

Data Science Governance: control that data activity meets precise standards, which involves monitoring against production data activity.

A Data Activity is described by at least its technologies, users, systems, data and processing
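As an illustration, the description of a Data Activity can be written down as a small record type. This is a sketch under assumptions: the class name, the field names and the example values are hypothetical and do not come from the slides or from any Kensu product.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataActivity:
    """One recorded activity on data (all field names are illustrative)."""
    process: str          # the processing that was applied
    technology: str       # the engine or library that ran it
    user: str             # who launched it
    system: str           # where it ran
    inputs: tuple = ()    # data sources that were read
    outputs: tuple = ()   # data sources that were written

    def touches(self, data_source):
        """True when the activity reads or writes the given data source."""
        return data_source in self.inputs or data_source in self.outputs

# Hypothetical example record.
activity = DataActivity(
    process="train-churn-model",
    technology="spark",
    user="alice",
    system="cluster-a",
    inputs=("customers.parquet",),
    outputs=("churn-model.bin",),
)

print(activity.touches("customers.parquet"))
```

Keeping the record immutable (`frozen=True`) is a design choice for this sketch: a collected activity is a fact about the past and should not be edited afterwards.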

Page 15: Data science governance : what and how

GOVERNING DATA SCIENCE

Who does what on which data, and where?

What is the impact of a process on the global system?

What are the performance metrics (quality, execution,…) of the processes?

Page 16: Data science governance : what and how

CONTINUOUS INTEGRATION FOR DATA SCIENCE

Data Scientists/Citizens have a view on all the activities applied to the original sources used in their own processes.

They also have control over their own results in production.

They have the opportunity to analyse and debug a pipeline involving all activities:

• independently of the technologies

• involving several people in the enterprise

Page 17: Data science governance : what and how

DATA SCIENCE GOVERNANCE: HOW

Page 18: Data science governance : what and how

CHALLENGES

So many tools are using data!

The number of processes is growing impressively.

We have to take care of the legacy…

Page 19: Data science governance : what and how

GET THE DATA

As usual, we have to collect the right data to take the right decision.

First, run an assessment to create a high-level map of all the tools used in the company.

For each tool, do whatever it takes to collect information about the activities it generates.

This information includes metadata, lineage, statistics, accuracy measures, …
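To make the kind of information concrete, here is a minimal sketch of a collector that extracts metadata and statistics from a small in-memory dataset. The function name `describe`, the shape of the summary and the sample rows are assumptions for illustration; a real collector would hook into each tool rather than read Python dictionaries.

```python
from statistics import mean

def describe(rows):
    """Summarise a list of dict records: row count, schema and simple per-column statistics."""
    columns = sorted({name for row in rows for name in row})
    summary = {"row_count": len(rows), "columns": {}}
    for name in columns:
        # Missing keys and explicit None are both counted as nulls.
        values = [row[name] for row in rows if row.get(name) is not None]
        stats = {
            "types": sorted({type(v).__name__ for v in values}),
            "null_count": len(rows) - len(values),
        }
        numbers = [v for v in values
                   if isinstance(v, (int, float)) and not isinstance(v, bool)]
        if numbers:
            stats.update(min=min(numbers), max=max(numbers), mean=mean(numbers))
        summary["columns"][name] = stats
    return summary

# Hypothetical dataset.
rows = [
    {"age": 31, "country": "BE"},
    {"age": 45, "country": None},
    {"age": 38, "country": "US"},
]
report = describe(rows)
print(report)
```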

Page 20: Data science governance : what and how

CONNECT THE DATA

Data Science Governance needs the global picture.

To get it, we need to connect all the data that can be collected.

It then becomes possible to create a map of all ongoing processes.

This map tracks all data and their descendants
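A map that tracks data and their descendants can be sketched as a directed graph built from the collected activities. The representation below, (inputs, outputs) pairs, together with the function names and dataset names, is an assumption for illustration and not the way any particular tool stores lineage.

```python
from collections import defaultdict, deque

def build_lineage(activities):
    """Map each data source to the data sources directly derived from it.

    `activities` is an iterable of (inputs, outputs) pairs, one per collected activity.
    """
    children = defaultdict(set)
    for inputs, outputs in activities:
        for source in inputs:
            children[source].update(outputs)
    return children

def descendants(children, source):
    """Return every data source derived, directly or indirectly, from `source`."""
    seen = set()
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

# Hypothetical activities collected from three different tools.
activities = [
    (["raw_orders"], ["clean_orders"]),
    (["clean_orders", "customers"], ["features"]),
    (["features"], ["model", "report"]),
]
lineage = build_lineage(activities)
print(sorted(descendants(lineage, "raw_orders")))
```

Because the graph only needs the names of inputs and outputs, activities recorded from different technologies can be connected in the same map.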

Page 21: Data science governance : what and how

USE THE DATA

This is where the fun part starts… the map of data activities is an amazing source of information

Here are a few things you can think of when using this kind of data:

• impact analysis

• dependency analysis

• optimisation

• recommendation
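Three of these uses can be sketched on a map stored as (parent, child) links. Everything here is assumed for illustration: the function names, the edge-list representation and the dataset names. Recommendation is left out because the slides do not say how it would work.

```python
def upstream(edges, target):
    """Dependency analysis: every data source that `target` is built from."""
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found, stack = set(), [target]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

def downstream(edges, source):
    """Impact analysis: every data source affected by a change in `source`."""
    # Walking the reversed links upstream is the same as walking downstream.
    return upstream([(child, parent) for parent, child in edges], source)

def reused(edges, min_consumers=2):
    """Optimisation hint: data sources feeding at least `min_consumers` links."""
    counts = {}
    for parent, _child in edges:
        counts[parent] = counts.get(parent, 0) + 1
    return {name for name, n in counts.items() if n >= min_consumers}

# Hypothetical (parent, child) links taken from a map of data activities.
edges = [
    ("raw_orders", "clean_orders"),
    ("clean_orders", "features"),
    ("customers", "features"),
    ("features", "model"),
    ("features", "report"),
]

print(sorted(upstream(edges, "model")))
print(sorted(downstream(edges, "customers")))
print(sorted(reused(edges)))
```

A heavily reused source is a natural candidate for caching or closer monitoring, since a problem there spreads to every consumer.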

Page 22: Data science governance : what and how

GDPR

General Data Protection Regulation

Page 23: Data science governance : what and how

ACCOUNTABILITY PRINCIPLE

Implement appropriate technical and organisational measures that ensure and demonstrate that you comply. This may include internal data protection policies such as staff training, internal audits of processing activities, and reviews of internal HR policies.

Page 24: Data science governance : what and how

TRANSPARENCY

As well as your obligation to provide comprehensive, clear and transparent privacy policies, if your organisation has more than 250 employees, you must maintain additional internal records of your processing activities.

Page 25: Data science governance : what and how

ACCOUNTABILITY: DATA SCIENCE GOVERNANCE

To govern data science, we have to:

• collect activities

• connect activities

With this information, we can reliably and automatically create the process registry
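As a rough sketch of how collected activities could be turned into a process registry, the code below groups activity records by process. The record fields, the registry layout and the example values are assumptions; this is neither a legally complete record of processing nor a description of how Adalog does it.

```python
def build_registry(activities):
    """Group activity records by process into one registry entry per process."""
    registry = {}
    for act in activities:
        entry = registry.setdefault(
            act["process"],
            {"users": set(), "systems": set(), "data": set(), "runs": 0},
        )
        entry["users"].add(act["user"])
        entry["systems"].add(act["system"])
        entry["data"].update(act["inputs"], act["outputs"])
        entry["runs"] += 1
    return registry

# Hypothetical collected activities; the field names are assumptions.
activities = [
    {"process": "churn-scoring", "user": "alice", "system": "cluster-a",
     "inputs": ["customers"], "outputs": ["scores"]},
    {"process": "churn-scoring", "user": "bob", "system": "cluster-a",
     "inputs": ["customers", "orders"], "outputs": ["scores"]},
    {"process": "billing-export", "user": "carol", "system": "etl-1",
     "inputs": ["orders"], "outputs": ["invoices"]},
]
registry = build_registry(activities)

for name in sorted(registry):
    print(name, registry[name]["runs"])
```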

Page 26: Data science governance : what and how

TRANSPARENCY: DATA SCIENCE GOVERNANCE

To govern data science, seen as a continuous integration solution, we have to explain and measure activities independently of the technologies.

With this information we can reliably create transparent reports of activities across the whole chain of processing

Page 27: Data science governance : what and how

GUESS WHAT?

This is what Adalog, our product at Kensu, does!

Page 28: Data science governance : what and how

ADALOG

[Architecture diagram. Labels: Adalog Collectors, Adalog Service, HTTPS port only, Recommendation System, Data Process Registry, Impact Analyzer, Dashboard, Data Citizen, Data Protection Officer]

Page 29: Data science governance : what and how

WANT TO SEE MORE?

Request a demo on our website: http://kensu.io

Page 30: Data science governance : what and how

DATA SCIENCE GOVERNANCE

Andy Petrella

CEO Co Founder

0032 495 99 11 04

@noootsab

Xavier Tordoir

CSO Co Founder

0032 495 99 11 04

+1 (628) 236-9239

@xtordoir

@kensuio