Introduction - ETH Z · Introduction Evangelos Pournaras, Izabela Moise Evangelos Pournaras,...


Introduction

Evangelos Pournaras, Izabela Moise

Outline

1. Data Science

2. Course Description

Part 1 - Data Science

What is Data Science?

A collection of orchestrated methods from different scientific fields, e.g. statistics, computer science, etc., that provide understanding of domain data and result in data-based products and services.

Is Data Science about Big Data? I

Is Data Science about Big Data? II

It's more about using the right data and asking the right questions.

What about Techno-socio-economic Systems?

ICT & Techno-socio-economic Systems

• Embedded ICT systems in most societal domains

How:

• Internet of Things, pervasive/ubiquitous computing, advanced networking systems, inter-operability

Result:

• A new explosion of data sources

Opportunities:

• Understanding, improving, managing & sustaining our complex society

Threats:

• Privacy, discrimination, misinterpretations, over-fitting, etc.

Threats I

Threats II

Who is a Data Scientist?

• A statistician?

• A computer programmer?

• Both and more!

Tip: Domain knowledge can be more valuable than machine learning, data mining, etc.

Real-world Profile I

Real-world Profile II

More about Data Scientists

https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century

More about Data Scientists

Data scientists' most basic, universal skill is the ability to write code. This may be less true in five years' time, when many more people will have the title "data scientist" on their business cards. More enduring will be the need for data scientists to communicate in language that all their stakeholders understand, and to demonstrate the special skills involved in storytelling with data, whether verbally, visually, or, ideally, both.

More about Data Scientists

But we would say the dominant trait among data scientists is an intense curiosity: a desire to go beneath the surface of a problem, find the questions at its heart, and distill them into a very clear set of hypotheses that can be tested.

More about Data Scientists

A quantitative analyst can be great at analyzing data but not at subduing a mass of unstructured data and getting it into a form in which it can be analyzed.

A data management expert might be great at generating and organizing data in structured form but not at turning unstructured data into structured data, and also not at actually analyzing the data.

And while people without strong social skills might thrive in traditional data professions, data scientists must have such skills to be effective.

Part 2 - Course Description

Course Objectives

Qualify you with knowledge & skills to tackle real-world problems using data:

1. Acquiring domain knowledge and understanding

2. Better understanding and interpretation of data

3. Awareness of the applicability of different data science methods

4. Development of technical skills, e.g. programming, use of different tools, etc.

5. Presenting scientific results, both in writing and orally

Course Prerequisites

Some programming skills are required, e.g. skills for the material of this course:

1. Java/C++/Python

2. UNIX

Didn't you have an opportunity to practice this earlier?

No problem, this is a golden opportunity!

Tip: Programming skills will make you a more flexible and efficient data scientist.

Assessment

• Seminar thesis

• 100% of the grade, no exams

• Detailed illustration in a later lecture

Tip: Start early! Give your project and your skills the opportunity to develop during the course.

Lectures

• Every Monday, 17:15-19:00, at LFW B 1

• Participation is not obligatory but highly recommended

• 60-minute lectures followed by 40-minute interactive discussions

• Opportunity to discuss your project

• Lectures at: http://www.coss.ethz.ch/education/datascience.html

Subjects I

1. Computational Social Science Applications
– Smart Grids, geolocation, traffic systems, social sensing/mining, privacy
– Tools & platforms: Nervousnet, Twitter, GDELT

2. Data Science Fundamentals
– databases, data types, data collection, data pre-processing, plotting, visualization, etc.
– Tools: Java, AWK, MySQL, Gnuplot, Gephi, etc.

3. Data Mining and Machine Learning
– classification, clustering, prediction, neural networks, etc.
– Tools: Weka

4. Big Data Analytics

Subjects II

– MapReduce, parallel computing, data streaming, social media, etc.
– Tools: Hadoop, Spark, Mahout, Spark Streaming, Storm, etc.

5. Other
– Project presentations

Lectures Outline

Lecture 01 (20.02.16): Introduction & Coursework Outline
Lecture 02 (27.02.16): Data Mining, Machine Learning & Applications
Lecture 03 (06.03.16): Data Science Techniques & Applications
Lecture 04 (13.03.16): Data Mining, Machine Learning & Applications
Lecture 05 (20.03.16): Data Science Techniques & Applications
Lecture 06 (27.03.16): Big Data Analytics & Applications
Lecture 07 (03.04.16): Big Data Analytics & Applications
Lecture 08 (10.04.16): Data Science Techniques & Applications
Lecture 09 (08.05.16): Big Data Analytics & Applications
Lecture 10 (15.05.16): Data Science Techniques & Applications
Lecture 11 (22.05.16): Oral Presentations
Lecture 12 (29.05.16): Oral Presentations

How to contact us

Communication

• Discussion session in the course

• E-mail with subject [DATA-SCIENCE-COURSE-2016]<other info> to:

– Iza Moise: imoise@ethz.ch
– Evangelos Pournaras: epournaras@ethz.ch

Supervision - strictly for issues not addressed in the course

• Mondays 15:00-17:00, Clausiusstrasse 50 (CLU C 4), 8092 Zurich
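The e-mail subject convention above can be sketched as a small helper. This is only an illustration, not part of the course material: the function name `make_subject` is hypothetical, and the single space between the course tag and the free-text part is an assumption, since the slide does not specify a separator.

```python
# Course tag exactly as given on the slide.
COURSE_TAG = "[DATA-SCIENCE-COURSE-2016]"


def make_subject(other_info: str) -> str:
    """Build an e-mail subject: course tag followed by a short summary.

    The separating space is an assumption; the slide only shows the tag
    followed by a placeholder for additional information.
    """
    # Collapse runs of whitespace so the subject stays on one tidy line.
    summary = " ".join(other_info.split())
    if not summary:
        raise ValueError("other_info must contain some text")
    return f"{COURSE_TAG} {summary}"


if __name__ == "__main__":
    print(make_subject("Question about the seminar thesis topic"))
```

Keeping the tag in one constant means a mail filter on the receiving side can match on a single fixed prefix.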

Proposed Literature

B. Ellis. Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data. Wiley Publishing, 1st edition, 2014.

J. Han. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2005.

T. White. Hadoop: The Definitive Guide. O'Reilly Media, Inc., 2015.

I. H. Witten, E. Frank, and M. A. Hall. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 3rd edition, 2011.

What is next?

• Seminar thesis

• Examples and applications

Page 5: Introduction - ETH Z · Introduction Evangelos Pournaras, Izabela Moise Evangelos Pournaras, Izabela Moise 1. Outline 1.Data Science 2.Course Description ... What is Data Science?

Is Data Science about Big Data I

Evangelos Pournaras Izabela Moise 5

Is Data Science about Big Data II

Itrsquos more about using the right dataand asking the right questions

Evangelos Pournaras Izabela Moise 6

What about Techno-socio-economic Systems

Evangelos Pournaras Izabela Moise 7

ICT amp Techno-socio-economic Systems

bull Embedded ICT systems in most societal domains How

bull Internet of Things pervasiveubiquitous computing advancednetworking systems inter-operability Result

bull A new explosion of data sources Opportunities

bull Understanding improving managing amp sustaining our complexsociety Threats

bull Privacy discrimination misinterpretations over-fitting etc

Evangelos Pournaras Izabela Moise 8

Threats I

Evangelos Pournaras Izabela Moise 9

Threats II

Evangelos Pournaras Izabela Moise 10

Who is a Data Scientist

bull A statistician

bull A computer programmer

bull Both and More

TipDomain knowledge can be more valuable than machine learning datamining etc

Evangelos Pournaras Izabela Moise 11

Real-world Profile I

Evangelos Pournaras Izabela Moise 12

Real-world Profile II

Evangelos Pournaras Izabela Moise 13

More about Data Scientists

httpshbrorg201210data-scientist-the-sexiest-job-of-the-21st-century

Evangelos Pournaras Izabela Moise 14

More about Data Scientists

Data scientistsrsquo most basic universal skill is the ability to writecode This may be less true in five yearsrsquo time when many morepeople will have the title data scientist on their business cardsMore enduring will be the need for data scientists to communicate inlanguage that all their stakeholders understand-and to demonstratethe special skills involved in storytelling with data whetherverbally visually or - ideally both

Evangelos Pournaras Izabela Moise 15

More about Data Scientists

But we would say the dominant trait among data scientists is anintense curiosity-a desire to go beneath the surface of a problemfind the questions at its heart and distill them into a very clearset of hypotheses that can be tested

Evangelos Pournaras Izabela Moise 16

More about Data Scientists

A quantitative analyst can be great at analyzing data but not atsubduing a mass of unstructured data and getting it into a form inwhich it can be analyzed

A data management expert might be great at generating andorganizing data in structured form but not at turning unstructureddata into structured data-and also not at actually analyzing thedata

And while people without strong social skills might thrive intraditional data professions data scientists must have suchskills to be effective

Evangelos Pournaras Izabela Moise 17

Part 2 - Course Description

Evangelos Pournaras Izabela Moise 18

Course ObjectivesQualify you with knowledge amp skills to tackle real-world problemsusing data

1 Acquiring domain knowledge and understanding2 Better understanding and interpretation of data

3 Awareness about the applicability of different data sciencemethods

4 Development of technical skills eg programming use ofdifferent tools etc

5 Presenting scientific results both written and orally

Evangelos Pournaras Izabela Moise 19

Course Prerequisites

Some programming skills are required, e.g. skills for the material of this course:

1. Java/C++/Python
2. UNIX

Didn't you have an opportunity to practice this earlier?

No problem, this is a golden opportunity.

Tip: Programming skills will make you a more flexible and efficient data scientist.

Assessment

• Seminar thesis
• 100% of the grade, no exams
• Detailed illustration in a next lecture

Tip: Start early. Give your project and your skills the opportunity to develop during the course.

Lectures

• Every Monday, 17:15-19:00, at LFW B 1
• Participation is not obligatory but highly recommended
• 60-minute lectures followed by 40-minute interactive discussions
• Opportunity to discuss your project
• Lectures at http://www.coss.ethz.ch/education/datascience.html

Subjects I

1. Computational Social Science Applications
   – Smart Grids, geolocation, traffic systems, social sensing/mining, privacy
   – Tools & platforms: Nervousnet, Twitter, GDELT
2. Data Science Fundamentals
   – databases, data types, data collection, data pre-processing, plotting, visualization, etc.
   – Tools: Java, AWK, MySQL, Gnuplot, Gephi, etc.
3. Data Mining and Machine Learning
   – classification, clustering, prediction, neural networks, etc.
   – Tools: Weka
4. Big Data Analytics

Subjects II

4. Big Data Analytics (continued)
   – MapReduce, parallel computing, data streaming, social media, etc.
   – Tools: Hadoop, Spark, Mahout, Spark Streaming, Storm, etc.
5. Other
   – Project presentations

Lectures Outline

Lecture 01 (20.02.16): Introduction & Coursework Outline
Lecture 02 (27.02.16): Data Mining, Machine Learning & Applications
Lecture 03 (06.03.16): Data Science Techniques & Applications
Lecture 04 (13.03.16): Data Mining, Machine Learning & Applications
Lecture 05 (20.03.16): Data Science Techniques & Applications
Lecture 06 (27.03.16): Big Data Analytics & Applications
Lecture 07 (03.04.16): Big Data Analytics & Applications
Lecture 08 (10.04.16): Data Science Techniques & Applications
Lecture 09 (08.05.16): Big Data Analytics & Applications
Lecture 10 (15.05.16): Data Science Techniques & Applications
Lecture 11 (22.05.16): Oral Presentations
Lecture 12 (29.05.16): Oral Presentations

How to contact us

Communication

• Discussion session in the course
• E-mail with subject [DATA-SCIENCE-COURSE-2016]<otherinfo> to
   – Iza Moise: imoise@ethz.ch
   – Evangelos Pournaras: epournaras@ethz.ch

Supervision - strictly for issues not addressed in the course

• Mondays 15:00-17:00, Clausiusstrasse 50 (CLU C 4), 8092 Zurich

Proposed Literature

B. Ellis. Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data. Wiley Publishing, 1st edition, 2014.

J. Han. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2005.

T. White. Hadoop: The Definitive Guide. O'Reilly Media, Inc., 2015.

I. H. Witten, E. Frank, and M. A. Hall. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 3rd edition, 2011.

What is next?

• Seminar thesis
• Examples and applications


Real-world Profile I

Real-world Profile II

More about Data Scientists

https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century





Page 10: Introduction - ETH Z · Introduction Evangelos Pournaras, Izabela Moise Evangelos Pournaras, Izabela Moise 1. Outline 1.Data Science 2.Course Description ... What is Data Science?

Threats II

Evangelos Pournaras Izabela Moise 10

Who is a Data Scientist

bull A statistician

bull A computer programmer

bull Both and More

TipDomain knowledge can be more valuable than machine learning datamining etc

Evangelos Pournaras Izabela Moise 11

Real-world Profile I

Evangelos Pournaras Izabela Moise 12

Real-world Profile II

Evangelos Pournaras Izabela Moise 13

More about Data Scientists

httpshbrorg201210data-scientist-the-sexiest-job-of-the-21st-century

Evangelos Pournaras Izabela Moise 14

More about Data Scientists

Data scientistsrsquo most basic universal skill is the ability to writecode This may be less true in five yearsrsquo time when many morepeople will have the title data scientist on their business cardsMore enduring will be the need for data scientists to communicate inlanguage that all their stakeholders understand-and to demonstratethe special skills involved in storytelling with data whetherverbally visually or - ideally both

Evangelos Pournaras Izabela Moise 15

More about Data Scientists

But we would say the dominant trait among data scientists is anintense curiosity-a desire to go beneath the surface of a problemfind the questions at its heart and distill them into a very clearset of hypotheses that can be tested

Evangelos Pournaras Izabela Moise 16

More about Data Scientists

A quantitative analyst can be great at analyzing data but not atsubduing a mass of unstructured data and getting it into a form inwhich it can be analyzed

A data management expert might be great at generating andorganizing data in structured form but not at turning unstructureddata into structured data-and also not at actually analyzing thedata

And while people without strong social skills might thrive intraditional data professions data scientists must have suchskills to be effective

Evangelos Pournaras Izabela Moise 17

Part 2 - Course Description

Evangelos Pournaras Izabela Moise 18

Course ObjectivesQualify you with knowledge amp skills to tackle real-world problemsusing data

1 Acquiring domain knowledge and understanding2 Better understanding and interpretation of data

3 Awareness about the applicability of different data sciencemethods

4 Development of technical skills eg programming use ofdifferent tools etc

5 Presenting scientific results both written and orally

Evangelos Pournaras Izabela Moise 19

Course Prerequisites

Some programming skills are required eg skills for the material ofthis course

1 JavaC++Python

2 UNIX

Didnrsquot you have an opportunity to practice this earlier

No problem this is a golden opportunity

TipProgramming skills will make you more flexible and efficient datascientist

Evangelos Pournaras Izabela Moise 20

Assessment

bull Seminar thesis

bull 100 of the grade no exams

bull Detailed illustration in a next lecture

TipStart early Give the opportunity for your project and your skills todevelop during the course

Evangelos Pournaras Izabela Moise 21

Lectures

bull Every Monday 1715-1900 at LFW B 1

bull Participation is not obligatory but highly recommended

bull 60 minutes lectures followed by 40 minutes interactivediscussions

bull Opportunity to discuss your projectbull Lectures at

httpwwwcossethzcheducationdatasciencehtml

Subjects I

1. Computational Social Science Applications
   – Smart Grids, geolocation, traffic systems, social sensing/mining, privacy
   – Tools & platforms: Nervousnet, Twitter, GDELT
2. Data Science Fundamentals
   – databases, data types, data collection, data pre-processing, plotting, visualization, etc.
   – Tools: Java, AWK, MySQL, Gnuplot, Gephi, etc.
3. Data Mining and Machine Learning
   – classification, clustering, prediction, neural networks, etc.
   – Tools: Weka
4. Big Data Analytics

Subjects II

4. Big Data Analytics (continued)
   – MapReduce, parallel computing, data streaming, social media, etc.
   – Tools: Hadoop, Spark, Mahout, Spark Streaming, Storm, etc.
5. Other
   – Project presentations

Lectures Outline

Lecture 01 (20.02.16): Introduction & Coursework Outline
Lecture 02 (27.02.16): Data Mining, Machine Learning & Applications
Lecture 03 (06.03.16): Data Science Techniques & Applications
Lecture 04 (13.03.16): Data Mining, Machine Learning & Applications
Lecture 05 (20.03.16): Data Science Techniques & Applications
Lecture 06 (27.03.16): Big Data Analytics & Applications
Lecture 07 (03.04.16): Big Data Analytics & Applications
Lecture 08 (10.04.16): Data Science Techniques & Applications
Lecture 09 (08.05.16): Big Data Analytics & Applications
Lecture 10 (15.05.16): Data Science Techniques & Applications
Lecture 11 (22.05.16): Oral Presentations
Lecture 12 (29.05.16): Oral Presentations

How to contact us

Communication

• Discussion session in the course
• E-mail with subject "[DATA-SCIENCE-COURSE-2016] <otherinfo>" to:
  – Iza Moise: imoise@ethz.ch
  – Evangelos Pournaras: epournaras@ethz.ch

Supervision - strictly for issues not addressed in the course

• Mondays 15:00-17:00, Clausiusstrasse 50 (CLU C 4), 8092 Zurich

Proposed Literature

B. Ellis. Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data. Wiley Publishing, 1st edition, 2014.

J. Han. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2005.

T. White. Hadoop: The Definitive Guide. O'Reilly Media, Inc., 2015.

I. H. Witten, E. Frank, and M. A. Hall. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 3rd edition, 2011.

What is next?

• Seminar thesis
• Examples and applications

Page 16: Introduction - ETH Z · Introduction Evangelos Pournaras, Izabela Moise Evangelos Pournaras, Izabela Moise 1. Outline 1.Data Science 2.Course Description ... What is Data Science?

More about Data Scientists

But we would say the dominant trait among data scientists is anintense curiosity-a desire to go beneath the surface of a problemfind the questions at its heart and distill them into a very clearset of hypotheses that can be tested

Evangelos Pournaras Izabela Moise 16

More about Data Scientists

A quantitative analyst can be great at analyzing data but not atsubduing a mass of unstructured data and getting it into a form inwhich it can be analyzed

A data management expert might be great at generating andorganizing data in structured form but not at turning unstructureddata into structured data-and also not at actually analyzing thedata

And while people without strong social skills might thrive intraditional data professions data scientists must have suchskills to be effective

Evangelos Pournaras Izabela Moise 17

Part 2 - Course Description

Evangelos Pournaras Izabela Moise 18

Course ObjectivesQualify you with knowledge amp skills to tackle real-world problemsusing data

1 Acquiring domain knowledge and understanding2 Better understanding and interpretation of data

3 Awareness about the applicability of different data sciencemethods

4 Development of technical skills eg programming use ofdifferent tools etc

5 Presenting scientific results both written and orally

Evangelos Pournaras Izabela Moise 19

Course Prerequisites

Some programming skills are required for the material of this course, e.g.:

1. Java/C++/Python
2. UNIX

Didn't you have an opportunity to practice this earlier?

No problem, this is a golden opportunity.

Tip: Programming skills will make you a more flexible and efficient data scientist.

Assessment

• Seminar thesis
• 100% of the grade, no exams
• Detailed illustration in a later lecture

Tip: Start early. Give your project and your skills the opportunity to develop during the course.

Lectures

• Every Monday 17:15-19:00 at LFW B 1
• Participation is not obligatory but highly recommended
• 60-minute lectures followed by 40-minute interactive discussions
• Opportunity to discuss your project
• Lectures at http://www.coss.ethz.ch/education/datascience.html

Subjects I

1. Computational Social Science Applications
   – Smart Grids, geolocation, traffic systems, social sensing/mining, privacy
   – Tools & platforms: Nervousnet, Twitter, GDELT
2. Data Science Fundamentals
   – databases, data types, data collection, data pre-processing, plotting, visualization, etc.
   – Tools: Java, AWK, MySQL, Gnuplot, Gephi, etc.
3. Data Mining and Machine Learning
   – classification, clustering, prediction, neural networks, etc.
   – Tools: Weka
4. Big Data Analytics

Subjects II

   – MapReduce, parallel computing, data streaming, social media, etc.
   – Tools: Hadoop, Spark, Mahout, Spark Streaming, Storm, etc.
5. Other
   – Project presentations
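
As a small taste of the MapReduce topic listed under Big Data Analytics, the sketch below counts words with explicit map, shuffle and reduce steps. It is only an illustration in plain Python, not course material and not the Hadoop or Spark API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key.

    A real framework does this between map and reduce; here it is a dict.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: sum the counts collected for one word."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce over a list of text documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

For example, word_count(["big data", "Big Data analytics"]) returns {"big": 2, "data": 2, "analytics": 1}. In a distributed setting the map and reduce steps would run in parallel on different machines, which this single-process sketch does not attempt.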

Lectures Outline

Lecture 01 (20.02.16): Introduction & Coursework Outline
Lecture 02 (27.02.16): Data Mining, Machine Learning & Applications
Lecture 03 (06.03.16): Data Science Techniques & Applications
Lecture 04 (13.03.16): Data Mining, Machine Learning & Applications
Lecture 05 (20.03.16): Data Science Techniques & Applications
Lecture 06 (27.03.16): Big Data Analytics & Applications
Lecture 07 (03.04.16): Big Data Analytics & Applications
Lecture 08 (10.04.16): Data Science Techniques & Applications
Lecture 09 (08.05.16): Big Data Analytics & Applications
Lecture 10 (15.05.16): Data Science Techniques & Applications
Lecture 11 (22.05.16): Oral Presentations
Lecture 12 (29.05.16): Oral Presentations

How to contact us

Communication

• Discussion session in the course
• E-mail with subject "[DATA-SCIENCE-COURSE-2016] <otherinfo>" to:
   – Iza Moise: imoise@ethz.ch
   – Evangelos Pournaras: epournaras@ethz.ch

Supervision - strictly for issues not addressed in the course

• Mondays 15:00-17:00, Clausiusstrasse 50 (CLU C 4), 8092 Zurich

Proposed Literature

B. Ellis. Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data. Wiley Publishing, 1st edition, 2014.

J. Han. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2005.

T. White. Hadoop: The Definitive Guide. O'Reilly Media, Inc., 2015.

I. H. Witten, E. Frank, and M. A. Hall. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 3rd edition, 2011.

What is next?

• Seminar thesis
• Examples and applications

Page 25: Introduction - ETH Z · Introduction Evangelos Pournaras, Izabela Moise Evangelos Pournaras, Izabela Moise 1. Outline 1.Data Science 2.Course Description ... What is Data Science?

Lectures OutlineLecture 01 (200216)Introduction amp Coursework OutlineLecture 02 (270216)Data Mining Machine Learning ampApplicationsLecture 03 (060316)Data Science Techniques ampApplicationsLecture 04 (130316)Data Mining Machine Learning ampApplicationsLecture 05 (200316)Data Science Techniques ampApplicationsLecture 06 (270316)Big Data Analytics amp Applications

Lecture 07 (030416)Big Data Analytics amp ApplicationsLecture 08 (100416)Data Science Techniques ampApplicationsLecture 09 (080516)Big Data Analytics amp ApplicationsLecture 10 (150516)Data Science Techniques ampApplicationsLecture 11 (220516)Oral PresentationsLecture 12 (290516)Oral Presentations

Evangelos Pournaras Izabela Moise 25

How to contact us

Communication

bull Discussion session in the course

bull E-mail with subject[DATA-SCIENCE-COURSE-2016]ltotherinfogtto

ndash Iza Moise imoiseethzchndash Evangelos Pournaras epournarasethzch

Supervision - strictly for issues not addressed in the course

bull Mondays 1500-1700Clausiusstrasse 50 (CLU C 4) 8092 Zurich

Evangelos Pournaras Izabela Moise 26

Proposed Literature

B Ellis

Real-Time Analytics Techniques to Analyze and Visualize Streaming Data

Wiley Publishing 1st edition 2014

J Han

Data Mining Concepts and Techniques

Morgan Kaufmann Publishers Inc San Francisco CA USA 2005

T White

Hadoop The Definitive Guide

OrsquoReilly Media Inc 2015

I H Witten E Frank and M A Hall

Data Mining Practical Machine Learning Tools and Techniques

Morgan Kaufmann Publishers Inc San Francisco CA USA 3rd edition2011

Evangelos Pournaras Izabela Moise 27

What is next?

• Seminar thesis

• Examples and applications
