
Data Science ≠ Machine Learning

Prof. Dr. Jens Dittrich

bigdata.uni-saarland.de
daimond.ai

twitter.com/jensdittrich

April 19, 2018

Prof. Dr. Jens Dittrich Data Science ≠ Machine Learning 1 / 12

Page 2: Data Science = Machine Learning - bigdata.uni-saarland.de Science and Machine... · Data Science (one possible View) Application Domain Machine Learning A.I. Big Data! Management

Data Science (one possible View)

Application Domain

Machine Learning

A.I.

Big Data Management

Data Science

Data Mining

Statistics


Data Science (another View)

Artificial Intelligence

- Machine Learning

Data Management

Data Science

Data Mining

Statistics

Math

Programming

Visualization

Application Domain


The Data Science Cake

[Figure: a cake whose parts are labeled Artificial Intelligence/Machine Learning, Data Management, Data Mining, and Application Domain]

Ingredients: 50g statistics, 120g linear algebra, 200g programming, 1kg visualisation, 300g software engineering

Additional skills: creativity, out of the box thinking, grit, team spirit

© istock.com sasilsolutions


Data Science = Three Thirds

Definition (Data Science)

Data Science =

1/3 Artificial Intelligence (⊃ Machine Learning) +

1/3 Data Mining +

1/3 Data Management.

Preach and get involved!

It is our job as a community to spread the word about this and get involved! Otherwise we will again witness the reinvention of the wheel (e.g. like in NoSQL).


Opportunity!

In Data Science there is tremendous opportunity for data management.

Translates to:

In Data Science there is tremendous opportunity for us!


The Data Science Pipeline/Waterfall Model

[Figure: "Databases in the waterfall model?", showing the following steps]

data collection
data acquisition
data profiling, exploration & visualization
data cleaning
feature engineering
modeling
model training
model testing
result interpretation

Note: data transformation can take place at any point in time and is therefore not drawn as a separate step here.

This is at the same time a process model and a dataflow.
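A minimal sketch of the "process model and dataflow" reading, using only the Python standard library. The stage functions, the tiny dataset, and the BMI feature are invented for illustration and cover only a few of the steps above; this is one possible way to express the idea, not the pipeline from the talk.

```python
from functools import reduce

def collect():
    # Stand-in for data collection/acquisition: a tiny made-up dataset.
    return [
        {"height_cm": 180.0, "weight_kg": 80.0},
        {"height_cm": None, "weight_kg": 70.0},   # incomplete record
        {"height_cm": 160.0, "weight_kg": 64.0},
    ]

def clean(rows):
    # Data cleaning: drop records that contain missing values.
    return [r for r in rows if all(v is not None for v in r.values())]

def add_features(rows):
    # Feature engineering: derive a new column (BMI) from the raw columns.
    return [
        {**r, "bmi": r["weight_kg"] / (r["height_cm"] / 100) ** 2}
        for r in rows
    ]

def run_pipeline(rows, stages):
    # The ordered list of stages is the process model; handing each
    # stage's output to the next stage is the dataflow.
    return reduce(lambda data, stage: stage(data), stages, rows)

result = run_pipeline(collect(), [clean, add_features])
```

Because every stage takes rows and returns rows, a transformation step can be inserted between any two stages, which matches the remark that data transformation is not tied to one position in the pipeline.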


AI/ML as the Goal

[Figure: the same pipeline as on the previous slide, from data collection to result interpretation, annotated with "AI&ML"]

Data collection through data cleaning has a single goal here: enable AI&ML.


Data Curation/OLAP as the Goal

[Figure: the same pipeline as on the previous slides, from data collection to result interpretation, annotated with "Database Management System (DBMS)"]

Data collection through data cleaning has a single goal here: enable data curation/OLAP.


Alternative: DBMS as an Intermediate Tool

In principle, this is possible at any step.
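As a hedged illustration of using a DBMS as an intermediate tool, the sketch below runs one step (data cleaning) inside a throwaway in-memory SQLite database and hands the result back to Python. The choice of SQLite, the table name `raw_people`, and the records are assumptions made for this example; the slides do not prescribe a particular system.

```python
import sqlite3

# Made-up raw records, as they might arrive from an earlier pipeline step.
raw = [
    ("alice", " 34"),
    ("bob", None),      # missing age
    ("alice", " 34"),   # exact duplicate
    ("carol", "29 "),
]

con = sqlite3.connect(":memory:")  # throwaway DBMS instance
con.execute("CREATE TABLE raw_people (name TEXT, age TEXT)")
con.executemany("INSERT INTO raw_people VALUES (?, ?)", raw)

# Data cleaning expressed declaratively: drop NULLs, trim whitespace,
# cast the age to an integer, and remove duplicates.
cleaned = con.execute(
    """
    SELECT DISTINCT name, CAST(TRIM(age) AS INTEGER) AS age
    FROM raw_people
    WHERE age IS NOT NULL
    ORDER BY name
    """
).fetchall()
con.close()

# `cleaned` is a plain list of tuples that the next step can consume.
```

The same pattern (load, query, fetch) could sit between any two steps, since the database holds no state that outlives the step.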


Alternative: DBMS as an Intermediate Tool

Examples: DeepDive, HoloClean
Other steps: MonetDB/TensorFlow marriage
Any data transformation: relational algebra-style, e.g. Pandas, stateless DBMS, Spark/Flink, etc.
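To make "relational algebra-style" concrete, here is a small sketch of selection, projection, and join written as pure Python functions over lists of dictionaries. It deliberately avoids Pandas and Spark/Flink to stay dependency-free; the function names and the sample data are hypothetical and only illustrate the operator style those tools offer.

```python
def select(rows, predicate):
    # Selection: keep the rows that satisfy the predicate.
    return [r for r in rows if predicate(r)]

def project(rows, columns):
    # Projection: keep only the named columns.
    return [{c: r[c] for c in columns} for r in rows]

def join(left, right, key):
    # Equi-join on a shared key, implemented as a simple hash join.
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

# Made-up input relations.
users = [{"uid": 1, "name": "ann"}, {"uid": 2, "name": "ben"}]
orders = [
    {"uid": 1, "total": 30},
    {"uid": 1, "total": 5},
    {"uid": 2, "total": 12},
]

# Operators compose like a query plan: join, then select, then project.
big_orders = project(
    select(join(users, orders, "uid"), lambda r: r["total"] >= 10),
    ["name", "total"],
)
```

Each operator takes relations in and returns a new relation without modifying its inputs, so such a transformation can be placed at any point of the pipeline.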


Summary: Data Science = Three Thirds

Definition (Data Science)

Data Science =

1/3 Artificial Intelligence (⊃ Machine Learning) +

1/3 Data Mining +

1/3 Data Management.

Preach and get involved!

It is our job as a community to spread the word about this and get involved! Otherwise we will again witness the reinvention of the wheel (e.g. like in NoSQL).
