Posted on 07-Sep-2019

Data Science ≠ Machine Learning

Prof. Dr. Jens Dittrich

bigdata.uni-saarland.de

daimond.ai

twitter.com/jensdittrich

April 19, 2018

Prof. Dr. Jens Dittrich, Data Science ≠ Machine Learning, 1 / 12

Data Science (One Possible View)

[Figure: diagram relating Data Science to Application Domain, Machine Learning, A.I., Big Data Management, Data Mining, and Statistics]


Data Science (Another View)

[Figure: diagram relating Data Science to Artificial Intelligence (including Machine Learning), Data Management, Data Mining, Statistics, Math, Programming, Visualization, and the Application Domain]


[Figure: cake illustration with parts labeled Artificial Intelligence/Machine Learning, Data Management, Data Mining, and Application Domain]

The Data Science Cake

Ingredients:
- 50 g statistics
- 120 g linear algebra
- 200 g programming
- 1 kg visualisation
- 300 g software engineering

Additional skills:
- creativity
- out-of-the-box thinking
- grit
- team spirit

© istock.com sasilsolutions

Data Science = Three Thirds

Definition (Data Science)

Data Science =

1/3 Artificial Intelligence (⊃ Machine Learning) +

1/3 Data Mining +

1/3 Data Management.

Preach and get involved!

It is our job as a community to spread the word about this and get involved! Otherwise we will once again witness the reinvention of the wheel (as happened with NoSQL).


Opportunity!

In Data Science there is tremendous opportunity for data management.

Translates to:

In Data Science there is tremendous opportunity for us!


The Data Science Pipeline/Waterfall Model

[Embedded slide: "Datenbanken im Wasserfallmodell?" (Databases in the Waterfall Model?), Prof. Dr. Jens Dittrich, Datenbanken, 2 / 9]

- data collection
- data acquisition
- data profiling, exploration & visualization
- data cleaning
- feature engineering
- modeling
- model training
- model testing
- result interpretation

Note: data transformation can take place at any point and is therefore not drawn as a separate step.

This is at the same time a process model and a dataflow.
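To make the "process model and dataflow" reading concrete, here is a minimal sketch, assuming each pipeline step is a plain Python function that receives the output of the previous step. The ordered list of steps plays the role of the process model; the values passed from one step to the next are the dataflow. The step functions (`collect`, `clean`, `engineer_features`, `run_pipeline`) and the toy records are hypothetical illustrations, not part of the original slides.

```python
from typing import Callable, List

Record = dict
Step = Callable[[List[Record]], List[Record]]


def collect() -> List[Record]:
    # Hypothetical raw input; in practice this would come from files, APIs, sensors, ...
    return [
        {"id": 1, "temp": "21.5"},
        {"id": 2, "temp": None},    # missing value
        {"id": 3, "temp": "19.0"},
        {"id": 3, "temp": "19.0"},  # exact duplicate
    ]


def clean(records: List[Record]) -> List[Record]:
    # Drop records with missing values and exact duplicates (first occurrence wins).
    seen = set()
    out = []
    for r in records:
        key = (r["id"], r["temp"])
        if r["temp"] is None or key in seen:
            continue
        seen.add(key)
        out.append(r)
    return out


def engineer_features(records: List[Record]) -> List[Record]:
    # Turn the raw string into a numeric feature without mutating the input.
    return [{**r, "temp_c": float(r["temp"])} for r in records]


def run_pipeline(steps: List[Step]) -> List[Record]:
    data = collect()
    for step in steps:
        data = step(data)  # the output of one step is the input of the next
    return data


result = run_pipeline([clean, engineer_features])
print(result)
```

Because every step has the same signature, a data transformation can be slotted in at any position of the `steps` list, which mirrors the note that transformation is not tied to one fixed step.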


AI/ML as the Goal

[Figure: the same pipeline, with an added element labeled "AI&ML"]

Here, the steps from data collection through data cleaning serve a single goal: enabling AI/ML.
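A minimal sketch of this view, assuming a toy regression task: everything up to cleaning exists only to hand clean `(x, y)` pairs to a model, which is then trained and tested. The data, the hold-out split, and the hand-written ordinary-least-squares `fit_line` are hypothetical stand-ins for a real ML library.

```python
from statistics import mean

# Data collection (toy values; one label is missing).
raw = [(1.0, 2.1), (2.0, None), (3.0, 6.2), (4.0, 7.9), (5.0, 10.1), (6.0, 11.8)]

# Data cleaning: drop rows with a missing label.
clean = [(x, y) for x, y in raw if y is not None]

# Hold out the last point for testing.
train, test = clean[:-1], clean[-1:]


def fit_line(points):
    """Ordinary least squares for y = a * x + b."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    a = sum((x - mx) * (y - my) for x, y in points) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b


a, b = fit_line(train)  # model training

# Model testing: mean squared error on the held-out point.
mse = mean((y - (a * x + b)) ** 2 for x, y in test)
print(f"slope={a:.2f} intercept={b:.2f} test MSE={mse:.4f}")
```

In this framing the pipeline's success is judged only by the model's test error, not by the quality of any intermediate dataset.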


Data Curation/OLAP as the Goal

[Figure: the same pipeline, with an added element labeled "Database Management System (DBMS)"]

Here, the steps from data collection through data cleaning serve a single goal: enabling data curation/OLAP.
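A minimal sketch of this alternative goal, assuming the cleaned output is a set of sales records: instead of training a model, the pipeline ends in a curated DBMS table that is queried with OLAP-style aggregations. Python's built-in `sqlite3` module stands in for the DBMS; the `sales` table, its columns, and the rows are hypothetical.

```python
import sqlite3

# Hypothetical output of the cleaning step.
cleaned_rows = [
    ("2018", "DE", 120.0),
    ("2018", "FR", 80.0),
    ("2019", "DE", 150.0),
    ("2019", "FR", 95.0),
    ("2019", "DE", 30.0),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (year TEXT, country TEXT, revenue REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)", cleaned_rows)

# Fine-grained aggregate: revenue per (year, country).
by_year_country = con.execute(
    "SELECT year, country, SUM(revenue) FROM sales "
    "GROUP BY year, country ORDER BY year, country"
).fetchall()

# Roll-up to the coarser level: revenue per year.
by_year = con.execute(
    "SELECT year, SUM(revenue) FROM sales GROUP BY year ORDER BY year"
).fetchall()

print(by_year_country)
print(by_year)
con.close()
```

SQLite has no `GROUP BY ROLLUP`, so the two aggregation levels are issued as separate queries here; a DBMS with cube/rollup support would express them in one statement.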


Alternative: DBMS as an Intermediate Tool

In principle, this is possible at any step.
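As an illustration of a DBMS used for just one intermediate step, here is a minimal sketch, assuming a Python pipeline that delegates data cleaning to SQL and then continues in Python. `sqlite3` stands in for the DBMS; `clean_with_dbms`, the `readings` table, and the toy rows are hypothetical.

```python
import sqlite3
from typing import List, Optional, Tuple

Row = Tuple[int, Optional[float]]


def clean_with_dbms(rows: List[Row]) -> List[Row]:
    """Load rows into a temporary DBMS table, clean them declaratively in SQL,
    and hand the result back to the surrounding Python pipeline."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE readings (sensor INTEGER, value REAL)")
        con.executemany("INSERT INTO readings VALUES (?, ?)", rows)
        # Remove missing values and duplicate rows in one declarative query.
        return con.execute(
            "SELECT DISTINCT sensor, value FROM readings "
            "WHERE value IS NOT NULL ORDER BY sensor, value"
        ).fetchall()
    finally:
        con.close()


raw = [(1, 20.5), (1, 20.5), (2, None), (3, 18.0)]
cleaned = clean_with_dbms(raw)                    # DBMS handles this step only
features = [value * 2 for _, value in cleaned]    # back in Python for the next step
print(cleaned, features)
```

The same wrapper pattern could be placed around any other step of the pipeline, which is the point of the slide: the DBMS is a tool inside the flow rather than its endpoint.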


Alternative: DBMS as an Intermediate Tool

Examples:
- DeepDive, HoloClean
- other steps: MonetDB/TensorFlow marriage
- any data transformation: relational-algebra-style, e.g. Pandas, stateless DBMS, Spark/Flink, etc.
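To show what "relational-algebra-style" and "stateless" mean for a single transformation step, here is a minimal sketch, assuming relations are represented as lists of dicts. The helpers `select`, `project`, and `natural_join` are hypothetical pure functions that never mutate their inputs; Pandas, Spark/Flink, or a DBMS provide production-grade versions of the same operators.

```python
from typing import Callable, Dict, List

Relation = List[Dict[str, object]]


def select(rel: Relation, pred: Callable[[Dict[str, object]], bool]) -> Relation:
    # Selection (sigma): keep the tuples that satisfy the predicate.
    return [t for t in rel if pred(t)]


def project(rel: Relation, attrs: List[str]) -> Relation:
    # Projection (pi): keep only the given attributes, removing duplicates.
    seen, out = set(), []
    for t in rel:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(zip(attrs, row)))
    return out


def natural_join(r: Relation, s: Relation) -> Relation:
    # Natural join on all shared attributes (simple nested-loop join).
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [{**t, **u} for t in r for u in s if all(t[a] == u[a] for a in common)]


customers = [{"cid": 1, "name": "Ada"}, {"cid": 2, "name": "Bob"}]
orders = [{"cid": 1, "amount": 30}, {"cid": 1, "amount": 5}, {"cid": 2, "amount": 12}]

big_orders = select(orders, lambda t: t["amount"] >= 10)
result = project(natural_join(customers, big_orders), ["name", "amount"])
print(result)
```

Since each operator only maps input relations to a new output relation, such steps can be composed and inserted anywhere in the pipeline without hidden state.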


Summary: Data Science = Three Thirds

Definition (Data Science)

Data Science =

1/3 Artificial Intelligence (⊃ Machine Learning) +

1/3 Data Mining +

1/3 Data Management.

Preach and get involved!

It is our job as a community to spread the word about this and get involved! Otherwise we will once again witness the reinvention of the wheel (as happened with NoSQL).
