What is data analysis? 10-718 DATA ANALYSIS (lwehbe/10718_F19/files/Lecture_1.pdf)


10-718 DATA ANALYSIS WEEK -1, LECTURE 1

Leila Wehbe Machine Learning Department

Carnegie Mellon University

What is data analysis?

We will spend the entire semester answering this question.

Course Staff

Instructor: Leila Wehbe

Assistant instructor: Fabricio Flores

TA: Jacob Tyo

TA: Aria Wang

Welcome to 10-718

10-718 Data Analysis will have a new format this year.

• This will be a case study course.
• This course will be discussion and reading based.

Topics

We will cover many topics relevant to data analysis in the real world:

• How to define a research question
• How to explore a dataset
• The importance of domain experts
• How to evaluate an algorithm
• Reproducibility
• Fairness
• etc…

There is no assigned textbook for this class

The point of this class will be to create, together, a textbook.

Creating a textbook

A light “textbook” with 11 topics

• Each topic gets 1 week.
• A team of 4-5 students.
• 2 papers assigned (or 1 paper + 1 or more news articles).

Each group is responsible for:

• Meeting with the instructor 10 days before the first presentation.
• Preparing a blog-post-like chapter.
• Presenting the papers (Monday class).
• Presenting the blog-post outline (Wednesday class).
• Submitting a draft version for review, then a final version.

The blog-post requirements

• Thorough review of readings.
• Additional readings and materials (opportunity to use examples from own research).
• Presenting important issues and pitfalls.
• Opportunity to highlight what you think is important, and what solutions could be attempted.
• Good, clear, organized writing.

Presentation 1: Readings

• Ten days before, meet with Leila to finalize the readings.
• Starts with a short quiz (you just need to have read the paper).
• Present the readings like a journal club:
  - Prepare 30-40 min of material but expect to be interrupted by discussion.
  - Prepare discussion points (throughout the presentation and/or at the end).
• Part of the grade will be based on engaging other students in discussion.

Presentation 2: Blog post plan

• Prepare a summary of what you will include in the blog post.
• The blog post should be based on the readings and on additional papers.
  - But this presentation should not be a rehash of the previous presentation.
• The point is to solicit feedback from the class about:
  - What content to put in the blog post.
  - Specific discussion points that are important.
• Other students will fill out suggestion sheets after the presentation.

Blog post due dates

Work in Google Docs until the very last draft, at which point one student from the group will upload the post to WordPress.

Blog posts are private to the class for now; we can revisit this at the end of the class.

• First draft due: 9 days after the first presentation.
• 4-5 students will review the post in one week.
• Final draft due one week after that.
• Blogs can be further edited until the end of the class (but the grade will be based on the final draft).
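The deadlines above can be sketched as simple date arithmetic. This is only an illustration, not an official schedule: the function name `blog_post_timeline`, the dictionary keys, and the example date are hypothetical, and it assumes that the outline presentation falls on the Wednesday two days after the Monday readings presentation and that "one week" means exactly seven days.

```python
from datetime import date, timedelta


def blog_post_timeline(first_presentation: date) -> dict:
    """Derive a group's milestones from the date of its first (Monday) presentation.

    Assumption: the offsets are read literally from the slides, and the
    outline presentation happens two days after the first presentation.
    """
    first_draft_due = first_presentation + timedelta(days=9)
    reviews_due = first_draft_due + timedelta(weeks=1)
    final_draft_due = reviews_due + timedelta(weeks=1)
    return {
        "meet_instructor": first_presentation - timedelta(days=10),
        "readings_presentation": first_presentation,
        "outline_presentation": first_presentation + timedelta(days=2),
        "first_draft_due": first_draft_due,
        "reviews_due": reviews_due,
        "final_draft_due": final_draft_due,
    }


if __name__ == "__main__":
    # Hypothetical example date (a Monday), not taken from the syllabus.
    for milestone, day in blog_post_timeline(date(2019, 9, 9)).items():
        print(f"{milestone}: {day.isoformat()}")
```

Under these assumptions, a group's whole cycle spans a little over a month, from the instructor meeting to the final draft.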

Summary of graded components

Group work (50%)

• 15% presentations in class (including moderating the discussion)
• 35% blog post

Individual work (50%)

• 20% in-class quizzes (only the 8 best are counted)
• 20% in-class feedback forms (only the 8 best are counted)
• 10% reviewing one other blog post
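As a rough illustration of how these weights combine, here is a minimal sketch. The weights come from the breakdown above; everything else is an assumption made for the example: scores are expressed as fractions between 0 and 1, the 8 best quizzes (and forms) are averaged with equal weight, a student with fewer than 8 scores gets zeros for the missing ones, and the names `best_k_average` and `course_grade` are hypothetical. The actual grading procedure may differ.

```python
# Component weights taken from the grading breakdown.
WEIGHTS = {
    "presentations": 0.15,
    "blog_post": 0.35,
    "quizzes": 0.20,
    "feedback_forms": 0.20,
    "review": 0.10,
}


def best_k_average(scores, k=8):
    """Average of the k highest scores, each a fraction in [0, 1].

    Assumption: if fewer than k scores exist, the missing ones count as 0.
    """
    top = sorted(scores, reverse=True)[:k]
    return sum(top) / k


def course_grade(presentations, blog_post, quizzes, feedback_forms, review, k=8):
    """Weighted overall score in [0, 1] under the assumptions stated above."""
    components = {
        "presentations": presentations,
        "blog_post": blog_post,
        "quizzes": best_k_average(quizzes, k),
        "feedback_forms": best_k_average(feedback_forms, k),
        "review": review,
    }
    return sum(WEIGHTS[name] * value for name, value in components.items())


if __name__ == "__main__":
    # Hypothetical student: two missed quizzes are dropped by the best-8 rule.
    quizzes = [1.0, 0.8, 0.0, 0.9, 1.0, 0.0, 0.7, 1.0, 0.9, 1.0]
    forms = [1.0] * 9
    print(round(course_grade(0.9, 0.85, quizzes, forms, 1.0), 3))
```

The best-8 rule means that a couple of missed or weak quizzes or forms need not affect the final score.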

In-class quizzes

You need to have read the papers.

5-10 minute quiz.

Multiple choice.

In-class feedback forms

One-page form (optionally more).

Include comments about the blog post presentation.

Reviewing another blog post

Respectful and useful comments.

Both inline comments / edits and separate review paragraphs.


Screen policy

Laptops and phones distract you and others from the conversation.

• They are not allowed.
• If you have a valid reason, please reach out to staff.

If you think that the class is not engaging enough…

• … consider participating more :)

This is a 12-unit class, and a big part of it is participation.

Academic Integrity

No communication during quizzes.

Communication is allowed while filling out feedback forms; you can even talk to the presenters.

No plagiarism when writing the posts.

Any deviation from the rules will be dealt with according to the severity of the case. For example: blindly copying one solution from someone else will result in the maximum points that can be earned for that quiz becoming zero (maximum eligible grade becomes B); repeat occurrences will result in a failing grade for the course.

Accommodations

If you have a disability and are registered with the Office of Disability Resources:

• Use their online system to notify me as soon as possible.

Whenever a problem arises throughout the semester, please reach out right away so we can follow the appropriate procedure.

If you suspect that you may have a disability and would benefit from accommodations but are not yet registered, please contact the Office of Disability Resources.

Well-being

Remember to eat, sleep, exercise and socialize well!

If you have any problem, please reach out early; don’t wait.

Counseling and Psychological Services (CaPS) is here to help: call 412-268-2922 and visit http://www.cmu.edu/counseling

If you or someone you know is feeling suicidal or in danger of self-harm, call someone immediately, day or night (CaPS: 412-268-2922, Resolve Crisis Network: 888-796-8226). If the situation is life threatening, call the police (On-campus CMU Police: 412-268-2323, Off-campus Police: 911).

Questions?… or suggestions?

Data analysis

Statistics/Machine learning applied to real problems.

Anytime we look at a real problem, great care should be given to:

• defining the problem
• understanding the data
• understanding what the results mean.

There are great books and courses about this, such as:

• Exploratory Data Analysis by Tukey
• Case Studies in the Mathematical Statistics Course by Deborah Nolan
• Advanced Data Analysis from an Elementary Point of View by Cosma Shalizi (offered in Fall)
• (Some) Statistical methods for reproducibility (2019) by Aaditya Ramdas (currently offered).

Data analysis

Because of limited time in this class, we want to focus on applying Machine Learning algorithms:

• ML hype, increased use.
• Complex models are attractive.
• All the problems inherent in data analysis, plus many more due to additional degrees of freedom and a lack of understanding of what the ML model is doing.

Syllabus

Questions?… or suggestions?

Remember to sign up for Piazza and for groups / reviewing