15.071 THE ANALYTICS EDGE

SPRING 2015

Class Time

Section A

Lecture: Mondays and Wednesdays, 1:00pm – 2:30pm, Room E51-315

Recitation: Fridays, 2:00pm – 3:00pm, Room E51-335

Section B

Lecture: Mondays and Wednesdays, 2:30pm – 4:00pm, Room E51-315

Recitation: Fridays, 3:00pm – 4:00pm, Room E51-335

Instructors

Dimitris Bertsimas, E40-147, [email protected], (617) 253-4223

Allison O’Hair, E40-111, [email protected], (617) 452-2116

Teaching Assistants

TBA

Course Description

In the last decade, the amount of data available to organizations has reached unprecedented levels. Companies and individuals who can use this data together with analytics give themselves an edge over the competition. In this class, we examine real world examples of how analytics have been used to transform a business or industry. These examples include Moneyball, eHarmony, the Framingham Heart Study, Twitter, IBM Watson, and Netflix. Through these examples and many more, we cover the following analytics methods and how to implement them: linear regression, logistic regression, trees, text analytics, clustering, visualization, and optimization.

Readings

The readings are chapters from the following book:

Dimitris Bertsimas, Allison O’Hair and Bill Pulleyblank, The Analytics Edge, Dynamic Ideas, March 2015.

We refer to the book below as the AE book. Electronic copies of some of the book chapters are available on the Stellar course webpage (please do not distribute without permission from the authors). We will also provide a copy of the “Analytics Edge R Manual” on Stellar.

Contents

1. February 4, 2015 Lecture 1 – Introduction to the Analytics Edge

In the first lecture, we will discuss the logistics and goals of the class, the recent impact of analytics, and the examples that will be covered during the semester. We will then discuss analytics software, and start working in R. In preparation for this class, you will need to install R on your personal computer (instructions on Stellar) and download the datasets provided on Stellar. The reading for Lecture 1 is the first section of the Analytics Edge R Manual titled “Introduction to R”.

2. February 9, 2015 Lecture 2 – Predicting Wine Quality

We’ll review linear regression, discuss how linear regression can be used to predict the quality of wine, and cover linear regression in R. Download the dataset provided on Stellar so you can follow along in class. The readings for Lecture 2 are the first section of Chapter 1 of the AE book, titled “Predicting the Quality and Prices of Wine,” the first section of Chapter 21 of the AE book, titled “Multiple Linear Regression,” and the second section of the Analytics Edge R Manual titled “Linear Regression in R”.

3. February 11, 2015 Lecture 3 – Moneyball

We will discuss how the Oakland A’s used analytics to become a competitive baseball team, and how these techniques can be applied to other sports. The reading for Lecture 3 is Chapter 4 of the AE book, titled “How to Evaluate Championship Players.”

4. February 17, 2015 Lecture 4 – The Framingham Heart Study

(NOTE: This class is on Tuesday due to Presidents' Day)

We will discuss the Framingham Heart Study, which led to one of the top 10 cardiology advances of the 1900s, and paved the way for clinical decision rules. Through this example, we’ll start discussing the method of logistic regression, and we’ll use the original Framingham Heart Study data to build logistic regression models in R. The readings for Lecture 4 are Chapter 7 of the AE book titled “The Framingham Heart Study”, the second section of Chapter 21 of the AE book, titled “Logistic Regression”, and the “Logistic Regression in R” section of the Analytics Edge R Manual.

5. February 18, 2015 Lecture 5 – Quality of Healthcare

We will discuss how analytics can be used to model the expertise of a physician and predict the quality of healthcare. Through this example, we will continue to discuss the method of logistic regression. The reading for Lecture 5 is the second section of Chapter 1 of the AE book, titled “Assessing Quality in Healthcare”.

6. February 23, 2015 Lecture 6 – The Supreme Court

We discuss how a group of academics predicted the outcomes of the United States Supreme Court. Through this example, we will discuss the analytical methods of CART and Random Forests, and then use data for Supreme Court cases to build models in R. The readings for Lecture 6 are the third section of Chapter 1 of the AE book, titled “Forecasting Supreme Court Decisions,” the third section of Chapter 21 of the AE book, titled “CART and Random Forests” and the “Trees in R” section of the Analytics Edge R Manual.

7. February 25, 2015 Lecture 7 – D2Hawkeye

We will present the story of D2Hawkeye, a medical data mining company Dimitris Bertsimas was involved in from 2001 to 2009, and present how analytics methods, specifically CART, were used to predict medical knowledge for individual patients. The reading for Lecture 7 is Chapter 8 of the AE book, titled “Predicting Healthcare Costs.”

8. March 2, 2015 Lecture 8 – Twitter Sentiment Detection

We present how tweets on the social networking site Twitter can be used to understand public perception and analyze sentiment. Through this example, we’ll introduce the method of text analytics, and use tweets in R to build models.

9. March 4, 2015 Lecture 9 – The eDiscovery Problem

In Lecture 9, we discuss how text analytics is being used to find files relevant to a lawsuit. Specifically, we’ll discuss the story of Enron, and how analytics can be used to detect relevant emails and provide evidence for a legal case.

10. March 9, 2015 Lecture 10 – Netflix and Clustering

We will discuss the Netflix Prize and recommendation systems in general. As an example of a type of recommendation system, we introduce the method of clustering. The readings for Lecture 10 are Chapter 13 of the AE book, titled “Recommendations Worth a Million,” the fourth section of Chapter 21 of the AE book, titled “Clustering” and the “Clustering in R” section of the Analytics Edge R Manual.

11. March 11, 2015 Lecture 11 – Patterns of Heart Attacks

We present how analytics have been used to understand the patterns of heart attacks. The reading for Lecture 11 is Chapter 9 of the AE book, titled “Medical Monitoring and Predictive Diagnosing.”

NO CLASS from March 16 – March 27 due to SIP week and Spring Break.

12. March 30, 2015 Lecture 12 – Fraud Detection

This week, we will discuss examples that have successfully combined many different analytics methods to create an edge. We will first discuss how predictive methods and clustering have been used to construct sophisticated algorithms for fraud detection. The reading for Lecture 12 is Chapter 14 of the AE book, titled “Fraud Detection”.

13. April 1, 2015 Lecture 13 – IBM Watson

We will discuss how IBM built a computer that could beat the best human players at Jeopardy, a game known for testing human knowledge and reasoning. The reading for Lecture 13 is Chapter 3 of the AE book, titled “What is Watson?”.

14. April 6, 2015 Lecture 14 – The Power of Visualization

We will discuss the power of visualizations, specifically for WHO, the World Health Organization. Through this example, we’ll learn how to create visualizations in R.

15. April 8, 2015 Lecture 15 – Data-Driven Policing

We will discuss the use of analytics and visualization in policing. Specifically, we’ll create heat maps, or “hot spot” maps. These maps are currently being used by police departments all over the country to allocate resources. The reading for Lecture 15 is Chapter 15 of the AE book, titled “Predictive Policing”.

16. April 13, 2015 Lecture 16 – Sports Scheduling

We will discuss how professional sports use integer optimization to design sports schedules, and how analytics methods can significantly outperform human scheduling. Through this example, we’ll learn how to solve optimization models in a powerful modeling language.

17. April 15, 2015 Lecture 17 – Revenue Management

We will discuss how optimization is used for revenue management, and how airlines and casinos have relied on the power of analytics to create a competitive edge. The reading for Lecture 17 is Chapter 17 of the AE book.

18. April 22, 2015 Lecture 18 – eHarmony

We will discuss how the online dating site eHarmony uses logistic regression and optimization to predict the probability of love and find perfect matches. Through this example, we’ll see how the results of a predictive model can be used in an optimization model to make optimal decisions.
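
As a rough illustration of feeding predictions into an optimization step, the sketch below takes a small table of match probabilities and searches for the one-to-one pairing with the highest total. It is only a sketch under stated assumptions: the probabilities are made-up numbers standing in for the output of a logistic regression model, the brute-force search over permutations stands in for a real integer optimization solver, and nothing here is eHarmony's actual method. The course itself works in R; Python is used here purely for illustration.

```python
from itertools import permutations

# Hypothetical predicted probabilities (made-up numbers): match_prob[i][j]
# is the assumed probability that person i from group A and person j from
# group B are a good match, as a predictive model might output.
match_prob = [
    [0.20, 0.70, 0.40],
    [0.60, 0.30, 0.50],
    [0.10, 0.40, 0.80],
]


def best_matching(prob):
    """Return the one-to-one assignment maximizing total predicted probability.

    Tries every permutation, so it is only practical for tiny examples.
    """
    n = len(prob)
    best_score = float("-inf")
    best_assignment = None
    for perm in permutations(range(n)):
        # perm[i] is the index in group B assigned to person i in group A.
        score = sum(prob[i][perm[i]] for i in range(n))
        if score > best_score:
            best_score = score
            best_assignment = perm
    return best_assignment, best_score


assignment, total = best_matching(match_prob)
for i, j in enumerate(assignment):
    print(f"A{i} is matched with B{j} (predicted probability {match_prob[i][j]:.2f})")
print(f"Total predicted probability: {total:.2f}")
```

The point of the design is the separation of concerns: the predictive model only supplies the numbers in the table, and the optimization step decides which pairs to form given those numbers.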

19. April 27, 2015 Lecture 19 – The MIT Blackjack Team

We will discuss how a group of MIT students made millions playing blackjack, and how strategies were developed using data and simulation. The reading for Lecture 19 is Chapter 6 of the AE book, titled “The MIT Blackjack Team.”

20. April 29, 2015 Lecture 20 – Emergency Room Operations

We will discuss how simulations and analytics can be used to understand the operations in an emergency room, and to analyze the effects of different decisions on patient care and hospital efficiency. The reading for Lecture 20 is Chapter 18 of the AE book.

21. May 4, 2015 Lecture 21 – Social Networks

We will discuss social networks, specifically how the social networks of gangs can be used to better understand gang dynamics and combat crime. We will also discuss the use of social networks in other applications. The reading for Lecture 21 is Chapter 16 of the AE book.

22. May 6, 2015 Lecture 22 – Analytics in Finance

We will discuss the use of analytics in finance, including asset management and options pricing. The readings for Lecture 22 are Chapters 19 and 20 of the AE book.

23. May 11, 2015 Student Project Presentations

During this lecture, selected students will make 15 minute presentations of their projects.

24. May 13, 2015 Student Project Presentations

During this lecture, selected students will make 15 minute presentations of their projects.

Recitations:

Recitations will be held on Fridays in Room E51-335 (2pm – 3pm for Section A, and 3pm – 4pm for Section B).

The recitations will be interactive sessions, covering additional examples on the analytics methods learned in class, and how to create models in R. Attendance is strongly encouraged.

Assignments:

There will be seven homework assignments, and a final project in teams of two.

The following are tentative due dates and topics for the homework assignments:

• February 17: Data analysis and linear regression in R.
• February 23: Logistic Regression.
• March 2: CART and Random Forests.
• March 9: Text analytics.
• March 30: Clustering.
• April 13: Visualization.
• April 27: Optimization.

All homework assignments are due by the beginning of class on the date assigned.

For the final project, by March 11, each team will submit a one page proposal that outlines a plan to apply analytical methods to a problem you identify using some of the concepts and tools discussed in the course. It should include a description of: (1) the problem, (2) the data that you have or plan to collect to solve the problem, (3) which analytic techniques you plan to use, and (4) the impact or overall goal of the project (if you could build a perfect model, what would it be able to do?). The teaching staff will be available to answer questions over email, and will provide all students with electronic feedback by March 20.

The week of April 13, each project team will set up a meeting with a member of the teaching team to show your progress applying the analytical methods you have learned to your project topic. This meeting is intended to help you progress on your project.

The final project submission consists of a written report of at most 4 pages (not including appendices) that describes your analysis, as well as a 15-minute presentation (in PowerPoint or PDF format) of your project. Unfortunately, due to time constraints, we will not be able to have all student teams present in class. However, ALL TEAMS should be prepared to give a 15-minute presentation on May 11 or May 13, and all teams are required to submit their presentation for a grade.

To determine who will present on May 11 and May 13, by midnight on Thursday May 7, each team will electronically submit a) a 1 page abstract summarizing their project (including the scope and idea of your project, what analytical methods/models you used, and your results), and b) the presentation. The abstracts will be uploaded to the class website. Students will vote by the end of the day on Sunday, May 10 about which projects they would like to see presented in class. The teaching team will vote as well (taking the abstracts and presentations into account), and the presenters will be notified in real-time during class on May 11 and May 13.

Office Hours:

Allison: Mondays and Wednesdays, 9:30am – 10:30am in E40-111.

Teaching Assistants: TBD

We are also always available by appointment and email.

Policy on Individual Work:

In the case of homework assignments, your assignment must represent your own individual work. Although you may discuss homework problems with other students, assignments must represent your own work. Copying from another individual or from any outside source (including past homework solutions) constitutes a violation of the Policy on Individual Work. Any student who copies or knowingly allows his/her work to be copied will receive an F grade for the assignment. If there is a second offense, the student will receive an F grade in the course.

You may find it useful to discuss broad conceptual issues and general solution procedures with others. If this is the case, then we enthusiastically recommend that you do so. The objective here is to learn. In our opinion (and personal experiences), the material of this class is best learned through individual practice and exposure to a variety of application contexts.

Class Participation and Conduct:

Your class participation will be evaluated subjectively, but will rely upon measures of punctuality, attendance, familiarity with the readings, relevance and insight reflected in classroom questions, and commentary. Relative differences in technical background will not be a criterion. Although several lectures will be didactic, we will rely heavily upon interactive discussion within the class. Students will be expected to be familiar with the readings, even though they might not understand all of the material in advance. In general, questions and comments are encouraged.

We will require you to bring and use a personal laptop in some class sessions. However, if we are not using laptops together as a class, we expect your laptops to be closed or only used for class materials.

Grading:

Grades for the course will be based upon participation (10%), homework assignments (50%), and the final project (40%).
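
The weighting above can be sketched as a short calculation. This is a minimal sketch, assuming each component is scored on a 0 to 100 scale; the function name `course_grade` and the example scores are hypothetical, and only the three weights come from the grading policy.

```python
# Weights from the grading policy stated above.
WEIGHTS = {"participation": 0.10, "homework": 0.50, "project": 0.40}


def course_grade(scores):
    """Return the weighted course score, assuming each component is on a 0-100 scale."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Made-up scores for illustration only.
example_scores = {"participation": 90.0, "homework": 80.0, "project": 85.0}
print(f"Weighted course score: {course_grade(example_scores):.1f}")
```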

Prerequisites:

It is highly recommended that students have taken 15.060 (Data, Models and Decisions), or basic statistics and optimization courses.