Introduction to USC Viterbi School Computational Thinking ... · multimedia data. Students learn to...

9
USC Viterbi School of Engineering Introduction to Computational Thinking and Data Science INF 549 Term: Fall 2019 Syllabus Term: Fall 2019 Units: 4 Time: Mondays 9am-12:20pm Location: DEN Instructor: Dr. Gale Lucas Office Hours: Mondays 12:20pm-1:20pm (or as arranged by appointment) Office hours location: Skype Contact Info: [email protected] TA: Shengjia Wu Contact Info: [email protected] Catalogue Course Description Introduction to data analysis techniques and associated computing concepts for non-programmers. Topics include foundations for data analysis, visualization, parallel processing, metadata, provenance, and data stewardship. Expanded Course Description This course will teach non-programmers to think in computing terms about modern topics, and to approach real-world phenomena through data science. The course will enable students to: Acquire computational thinking skills that will enable students to represent and reason about complex problems in the digital arena Understand different kinds of data in terms of their possibilities and limitations to approach complex problems cast in terms of the emerging field of data science Become data science scholars through best practices in data documentation and dissemination The course is intended for students in disciplines outside of computer science, so no prior experience with computer science is assumed. The course topics will be particularly relevant to students interested in physical sciences and social sciences. This class will include ten quizzes, eight homework assignments and a final exam.

Transcript of Introduction to USC Viterbi School Computational Thinking ... · multimedia data. Students learn to...

Page 1: Introduction to USC Viterbi School Computational Thinking ... · multimedia data. Students learn to run multi-step analysis through a graphical workflow interface, and will experience

USC Viterbi School

of Engineering

Introduction to Computational Thinking and Data Science

INF 549 Term: Fall 2019

Syllabus Term: Fall 2019 Units: 4 Time: Mondays 9am-12:20pm Location: DEN

Instructor: Dr. Gale Lucas Office Hours: Mondays 12:20pm-1:20pm (or as arranged by appointment) Office hours location: Skype Contact Info: [email protected]

TA: Shengjia Wu Contact Info: [email protected]

Catalogue Course Description Introduction to data analysis techniques and associated computing concepts for non-programmers. Topics include foundations for data analysis, visualization, parallel processing, metadata, provenance, and data stewardship.

Expanded Course Description This course will teach non-programmers to think in computing terms about modern topics, and to approach real-world phenomena through data science. The course will enable students to:

Acquire computational thinking skills that will enable students to represent and reason about complex problems in the digital arena

Understand different kinds of data in terms of their possibilities and limitations to approach complex problems cast in terms of the emerging field of data science

Become data science scholars through best practices in data documentation and dissemination The course is intended for students in disciplines outside of computer science, so no prior experience with computer science is assumed. The course topics will be particularly relevant to students interested in physical sciences and social sciences. This class will include ten quizzes, eight homework assignments and a final exam.

Page 2: Introduction to USC Viterbi School Computational Thinking ... · multimedia data. Students learn to run multi-step analysis through a graphical workflow interface, and will experience

Syllabus for INF 549 Fall 2018, Page 2

Learning Objectives This course teaches non-programmers to think in computing terms about modern topics, and to approach real-world phenomena through data science. The course introduces different kinds of data and corresponding approaches to data analysis, including geospatial data, time series, networks, and multimedia data. Students learn to run multi-step analysis through a graphical workflow interface, and will experience first hand complex concepts in data science such as parallel computing, provenance, and visualization. Students also learn to use ontologies and logic representations to capture metadata and other knowledge about complex data. The course includes practical lessons to use workflow and ontology development toolkits, as well as best practices for data stewardship and dissemination.

Prerequisite(s): none Co-Requisite (s): none Recommended Preparation: Mathematics and logic undergraduate courses.

Software and Supplementary Readings All required software is freely available for students to install on their personal computers or to access through a web interface. There is no textbook. Students can find all the supplementary readings online. Supplementary readings include:

“Computational Thinking.” J. M. Wing. Communications of the ACM, viewpoint, vol. 49, no.3, March 2006.

“Data Science in the News: Advances and Challenges for the Era of Big Data.” Kate Musen, Alyssa Deng, Taylor Alarcon, Yolanda Gil. Technical Report ISI-TR-702, Information Sciences Institute, University of Southern California. August 24, 2015.

“Ten Simple Rules for the Care and Feeding of Scientific Data.” Goodman, A.; Pepe, A.; Blocker, A. W.; Borgman, C. L.; Cranmer, K.; Crosas, M.; Stefano, R. D.; Gil, Y.; Groth, P.; Hedstrom, M.; Hogg, D. W.; Kashyap, V.; Mahabal, A.; Siemiginowska, A.; and Slavkovic, A. PLOS Computational Biology, 10, 2014.

“Intelligent Workflow Systems and Provenance-Aware Software.” Y. Gil. Proceedings of the Seventh International Congress on Environmental Modeling and Software, San Diego, CA, 2014.

“Data Science for Business”, Foster Provost and Tom Fawcett. O’Reilly Media publishers, 2013.

“A Primer for the PROV Provenance Model.” Gil, Y.; Miles, S.; Belhajjame, K.; Deus, H.; Garijo, D.; Klyne, G.; Missier, P.; Soiland-Reyes, S.; and Zednik, S. World Wide Web Consortium (W3C) Technical Report, 2013.

“The Ethics of Data Sharing and Reuse in Biology.” Duke, C. S., & Porter, J. H. BioScience, 63(6), 483–489, 2013. doi:10.1525/bio.2013.63.6.10

Description and Assessment of Homework Assignments There will be a homework assignment every other week, with two exceptions where the assignment will be due in a week. The homeworks include a class project that will be developed by the students independently in 3 separate stages, getting feedback from the instructors at each stage. The assignments must be submitted individually and students will receive individual scores. Students may work in groups to complete the tasks. The homework assignments are expected to take 6-8 hours. Each assignment is graded on a scale of 0-100 and the grading criteria will be specified in each assignment. The homework topics are listed in the Course Schedule.

Page 3: Introduction to USC Viterbi School Computational Thinking ... · multimedia data. Students learn to run multi-step analysis through a graphical workflow interface, and will experience

Syllabus for INF 549 Fall 2019, Page 3

Syllabus and Class Schedule

Week Topic Material Covered Homework assigned

Section I: Introduction to Computational Thinking and Data Science

Week 1

Computational thinking and data science

What is computational thinking

Computational thinking for reasoning and analysis

What is data science

Data scientists

The context of data science

HW1: Project part 1 – Finding data

Data What is data

What is not (yet) data

Time series data

Networked data

Geospatial data

Text data

Labeled and annotated data

Big data

Week 2

NO CLASS Labor Day

Week 3

Data analysis software

Programs for data analysis

Inputs and Outputs

Program Parameters

Programming Languages

Programs as Black Boxes

Algorithms versus software

Multi-step data analysis as workflows

Building workflows by composing software

Pre-processing and post-processing data

Workflows for data analysis

Workflow inputs and parameters

Executing workflows

Exploring data through workflows

Workflows in practice

Section II: Data Analysis

Week 4

Logic and probability for statistics

Basic probability for statistics

Logic for statistics

Null hypothesis significance testing

Sampling distributions

Homework HW2: Exploring data analysis workflows

Basic statistics Descriptive statistics

Inferential statistics

Page 4: Introduction to USC Viterbi School Computational Thinking ... · multimedia data. Students learn to run multi-step analysis through a graphical workflow interface, and will experience

Syllabus for INF 549 Fall 2019, Page 4

o T-tests o ANOVAs o Chi-squared tests o Correlation

Week 5

Data analysis tasks (I)

Data analysis tasks in data mining, statistics, and machine learning

Supervised learning o Classification tasks o Classification algorithms o Evaluation of classifiers

Data analysis tasks (II)

Unsupervised learning o Clustering o Pattern detection o Anomaly detection

Simulation and prediction

Week 6

Data analysis tasks (III)

Causality o Probabilistic graphical

models o Bayesian networks o Causal models

Homework HW3: Analyzing data with workflows

Data analysis tasks (IV)

Networks o Network structure o Dynamic networks o Scale-free networks

Section III: Data Analysis in Practice

Week 7

Analyzing different kinds of data (I)

Analyzing time series data o Collecting time series data o Pre-processing time series

data o Event detection o Granger causality

Analyzing different kinds of data (II)

Analyzing text data o Pre-processing text o Document classification o Document clustering o Topic detection o Sentiment analysis

Week 8

Analyzing different kinds of data (II)

Analyzing multimedia data o Pre-processing images o Segmentation o Edge detection o Object detection o Video analysis

Analyzing geospatial data

Homework HW4: Data visualization

Page 5: Introduction to USC Viterbi School Computational Thinking ... · multimedia data. Students learn to run multi-step analysis through a graphical workflow interface, and will experience

Syllabus for INF 549 Fall 2019, Page 5

o Coordinate systems o GIS systems

Data visualization

Quality of visualizations

Major types of visualizations

Time series visualizations

Geospatial visualizations

Multi-dimensional spaces

Network visualizations

Section IV: User interfaces and user studies

Week 9

User experience, user interfaces, user studies

UX/UI Design Principles

AB testing

User study design

Week 10

Analysis for experiments

Advanced analysis for experiments

Appropriate statistical tests

Homework HW5: Project part 2 – Design of data analysis approach

Causal claims from user studies

Correlational research

Comparing correlational research to experiments

Ensuring internal validity

Section V: Data analysis at scale

Week 11

Parallel and distributed computing for big data (I)

Cost of computation

Divide and conquer

Speedup with parallel processing

Limits of speedup: Critical path

Amdahl’s law

When problems are not parallelizable

Homework HW6: Data analysis with parallel processing

Parallel and distributed computing for big data (II)

Multi-core computing

Distributed computing

Cluster computing

Cloud computing

Grid computing

Virtual machines

Web services

Practical concerns in distributed computing

Parallel programming languages

Section VI: Metadata

Week 12

Semantic metadata

What is metadata

Basic metadata versus semantic

Page 6: Introduction to USC Viterbi School Computational Thinking ... · multimedia data. Students learn to run multi-step analysis through a graphical workflow interface, and will experience

Syllabus for INF 549 Fall 2019, Page 6

metadata

Metadata about data collection

Metadata about data processing

Metadata for search and retrieval

Metadata standards

Domain metadata and ontologies

Ontologies (I) What is an ontology

Taxonomies and class inheritance

Properties

Logical constraints

Week 13

Ontologies (II) Logical reasoning and inference

Expressivity and computation

The Semantic Web

The PROTÉGÉ ontology editor

Homework HW7: Project part 3 – Final report

Provenance What is provenance

Provenance models

Provenance standards

Section VII: Responsible data science

Week 14

Data lifecycle Data collection and storage

Data cleaning

Data extraction and querying

Data preparation

Quality control

Data integration

Homework HW8: Data Science Scenarios

Data standards and data stewardship

Data formats and standards

Data repositories and services

Data sharing

Data identifiers

Licenses for data

Data citation and attribution

Software and other work products

Week 15

Privacy and ethics

Privacy

Sensitive data

Anonymization

Research ethics

Multi-disciplinary collaborations

Multidisciplinary collaborations

Final Exam The final exam will be proctored. As with the previous assessments (i.e., quizzes), DEN students will be asked to complete the final exam through the quiz function on D2L (desire to learn) and have it proctored via ProctorU. However, taking the exam at a proctoring center is an option; a student must let the professor

Page 7: Introduction to USC Viterbi School Computational Thinking ... · multimedia data. Students learn to run multi-step analysis through a graphical workflow interface, and will experience

Syllabus for INF 549 Fall 2019, Page 7

know during the first week of classes if he or she would like to switch to this option (otherwise the default option -using D2L and ProctorU- will be utilized).

Assignment Submission Policy Homework assignments are due at 11:59pm on the due date and should be submitted in Blackboard. Homework will be accepted up to one week late as long as the student requested a late submission ahead of the deadline, and in that case the assignment will be graded at 20% less than the possible points for the assignment. After one week, the assignment will not be graded.

Grading Breakdown Quizzes: There will be (almost) weekly quizzes based on the material from the week before. When there is no quiz for a week, the material from that previous week will also be on the next quiz. There is no mid-term for this class. Homework: There will be eight homework assignments throughout the course. Final Exam: There is a final exam at the end of the semester covering all of the material covered in the class. Grading Schema: Quizzes 28% Homework assignments 45% Final: 27% __________________________________________ Total 100% Grades will range from A through F. The following is the breakdown for grading: 94 - 100 = A 74 – 76.99 = C 90 – 93.99 = A- 70 – 73.99 = C- 87 – 89.99 = B+ 67 – 69.99 = D+ 84 – 86.99 = B 64 – 66.99 = D 80 – 83.99 = B- 60 – 63.99 = D- 77 – 79.99 =C+ 59.99 and below = F

Academic Conduct and Support Systems Honor Code In response to recommendations made by the Academic Integrity Task Force to the Dean, the USC Viterbi School of Engineering now has an Honor Code. The Code was developed by Viterbi students, and its text is as follows:

Engineering enables and empowers our ambitions and is integral to our identities. In the Viterbi community, accountability is reflected in all our endeavors.

Engineering+ Integrity. Engineering+ Responsibility. Engineering+ Community. Think good. Do better. Be great.

These are the pillars we stand upon as we address the challenges of society and enrich lives. During your time here at Viterbi, please know that academic and personal resources are available to help you:

Page 8: Introduction to USC Viterbi School Computational Thinking ... · multimedia data. Students learn to run multi-step analysis through a graphical workflow interface, and will experience

Syllabus for INF 549 Fall 2019, Page 8

The student-driven and student-written Honor Code is here: http://viterbi.usc.edu/academics/integrity/.

An introductory video is posted at https://myviterbi.usc.edu/ under the link "Academic Integrity Introduction" and serves as a reminder of the school’s emphasis in maintaining a high level of academic integrity.

Master’s and PhD students can contact the GAPP office in OHE 106 (https://gapp.usc.edu/) for other helpful resources.

The Viterbi Academic and Resource Center (VARC) (http://viterbi.usc.edu/students/undergrad/varc) has a variety of services available.

Academic Integrity The Viterbi School takes academic integrity violations seriously. Most of the violations that have been reported in the past fall into four categories: unauthorized collaboration, plagiarism, code sharing, and cheating on an exam. Specifically:

Unauthorized collaboration - Unauthorized collaboration on a project, homework or other assignment. (section 11.14.B) All homework assignments must be individually developed. Students that collaborate on assignments will be referred to the Academic Integrity Coordinator.

Plagiarism - presenting someone else’s ideas as your own, either verbatim or recast in your own words - is a serious academic offense with serious consequences.

Code sharing - Obtaining for oneself or providing for another person a solution to homework, a project or other assignment, without the knowledge and expressed consent of the instructor. (section 11.14.A)

Cheating in an exam - this may involve a number of violations, such as looking at class notes during the exam, looking at other student's exam, "texting" with other students during the exam. See the section titled Two Exams for a list of specific violations.

Please note that that these are only the basic violations that we have encountered in the past, and there are many more. Please familiarize yourself with the discussion of plagiarism in SCampus in Section B.11.00, Behavior Violating University Standards and Appropriate Sanctions available at https://scampus.usc.edu/b/11-00-behavior-violating-university-standards-and-appropriate-sanctions/.

All academic integrity violations will be referred to the Academic Integrity Coordinator of the Viterbi School of Engineering. The process for adjudicating these cases is available in SCampus, Part B, Section 13. Other Misconduct Other forms of academic dishonesty are equally unacceptable. See additional information in SCampus and university policies on scientific misconduct, http://policy.usc.edu/scientific-misconduct/.

Discrimination, sexual assault, and harassment are not tolerated by the university. You are encouraged to report any incidents to the Office of Equity and Diversity http://equity.usc.edu/ or to the Department of Public Safety http://capsnet.usc.edu/department/department-public-safety/online-forms/contact-us. This is important for the safety whole USC community. Another member of the university community - such as a friend, classmate, advisor, or faculty member - can help initiate the report, or can initiate the report on behalf of another person. The Center for Women and Men http://www.usc.edu/student-affairs/cwm/ provides 24/7 confidential support, and the sexual assault resource center webpage http://sarc.usc.edu describes reporting options and other resources. Support Systems A number of USC’s schools provide support for students who need help with scholarly writing. Check with your advisor or program staff to find out more. Students whose primary language is not English should check with the American Language Institute http://dornsife.usc.edu/ali which sponsors courses and workshops specifically for international graduate students. The Office of Disability Services and Programs http://sait.usc.edu/academicsupport/centerprograms/dsp/home_index.html provides certification for

Page 9: Introduction to USC Viterbi School Computational Thinking ... · multimedia data. Students learn to run multi-step analysis through a graphical workflow interface, and will experience

Syllabus for INF 549 Fall 2019, Page 9

students with disabilities and helps arrange the relevant accommodations. If an officially declared emergency makes travel to campus infeasible, USC Emergency Information http://emergency.usc.edu/ will provide safety and other updates, including ways in which instruction will be continued by means of blackboard, teleconferencing, and other technology. Diversity The diversity of the participants in this course is a valuable source of ideas, problem solving strategies, and engineering creativity. The instructors encourage and support the efforts of all of our students to contribute freely and enthusiastically. As members of an academic community, it is our shared responsibility to cultivate a climate where all students and individuals are valued and where both they and their ideas are treated with respect, regardless of their differences, visible or invisible. Students with Disabilities Any student requesting academic accommodations based on a disability is required to register with Disability Services and Programs (DSP) each semester. A letter of verification for approved accommodations can be obtained from DSP. Please be sure the letter is delivered to me (or to TA) as early in the semester as possible. DSP is located in STU 301 and is open 8:30 a.m. - 5:00 p.m., Monday through Friday. Website and contact information for DSP: http://sait.usc.edu/academicsupport/centerprograms/dsp/home_index.html, (213) 740-0776 (Phone), (213) 740-6948 (TDD only), (213) 740-8216 (FAX), [email protected]. Emergency Preparedness/Course Continuity in a Crisis In case of a declared emergency if travel to campus is not feasible, USC executive leadership will announce an electronic way for instructors to teach students in their residence halls or homes using a combination of Blackboard, teleconferencing, and other technologies.