Machine learning: lecture 1
Tommi S. Jaakkola
MIT AI Lab
6.867 Machine learning: administrivia
• Instructor: Prof. Tommi Jaakkola
• General info
– lectures TR 2.30-4pm
– tutorials/recitations (time/location tba)
• Grading
– midterm (15%), final (25%)
– 6 (≈ bi-weekly) problem sets (30%)
– final project (30%)
Tommi Jaakkola, MIT AI Lab 2
Broader context
• What is learning anyway?
• Principles of learning are “universal”
– society (e.g., scientific community)
– animal (e.g., human)
– machine
• Prediction is the key...
Prediction
• We make predictions all the time but rarely investigate the
processes underlying our predictions
• In carrying out scientific research we are also governed by
how theories are evaluated
• To automate the process of making predictions we also need to
understand how we search for and refine “theories”
Machine learning
• Statistical machine learning
– principles, methods, and algorithms for learning and
prediction on the basis of past experience
– already everywhere: speech recognition, hand-written
character recognition, information retrieval, operating
systems, compilers, fraud detection, security, defense
applications, ...
Example
• A classification problem: predict the grades for students
taking this course
• Key steps:
1. data: what “past experience” can we rely on?
2. assumptions: what can we assume about the students or
the course?
3. representation: how do we “summarize” a student?
4. estimation: how do we construct a map from students to
grades?
5. evaluation: how well are we predicting?
6. model selection: perhaps we can do even better?
Data
• The data we have available (in principle):
– names and grades of students in past years’ ML courses
– academic record of past and current students
• “training” data:
Student ML course 1 course 2 ...
Peter A B A ...
David B A A ...
• “test” data:
Student ML course 1 course 2 ...
Jack ? C A ...
Kate ? A A ...
• Anything else we could use?
Assumptions
• There are many assumptions we can make to facilitate
predictions
1. the course has remained roughly the same over the years
2. each student performs independently from others
Representation
• Academic records are rather diverse so we might limit the
summaries to a select few courses
• For example, we can summarize the ith student (say Pete)
with a vector
xi = [A C B]
where the grades correspond to (say) 18.06, 6.041, and
6.034.
• The available data in this representation
Training Test
Student ML grade Student ML grade
x1 A x′1 ?
x2 B x′2 ?
. . . . . . . . . . . .
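As a concrete sketch of this representation step, the grade vector can be encoded numerically so that distance-based methods apply later. The grade-to-number scale below is a hypothetical choice for illustration, not something fixed by the lecture:

```python
# Hypothetical grade scale for turning a transcript summary into numbers.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0}

def encode_student(grades):
    """Map a list of letter grades, e.g. ['A', 'C', 'B'], to a numeric
    feature vector that distance-based classifiers can use."""
    return [GRADE_POINTS[g] for g in grades]

x_i = encode_student(["A", "C", "B"])  # the slide's example student -> [4.0, 2.0, 3.0]
```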
Estimation
• Given the training data
Student ML grade
x1 A
x2 B
. . . . . .
we need to find a mapping from “input vectors” x to “labels”
y encoding the grades for the ML course.
• Possible solution (nearest neighbor classifier):
1. For any student x find the “closest” student xi in the
training set
2. Predict yi, the grade of the closest student
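The two steps above can be sketched as follows. This is a minimal illustration: the numeric encodings of the students and the use of squared Euclidean distance as the notion of “closest” are assumptions, since the slides do not fix a distance measure:

```python
def nearest_neighbor_predict(x, train_xs, train_ys):
    """Return the label of the training point closest to x, using squared
    Euclidean distance as the (assumed) notion of 'closest'."""
    best_i = min(range(len(train_xs)),
                 key=lambda i: sum((a - b) ** 2
                                   for a, b in zip(x, train_xs[i])))
    return train_ys[best_i]

# Hypothetical numeric encodings of the training students x1, x2:
train_xs = [[4.0, 2.0, 3.0], [3.0, 4.0, 4.0]]
train_ys = ["A", "B"]
pred = nearest_neighbor_predict([4.0, 3.0, 3.0], train_xs, train_ys)  # -> "A"
```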
Evaluation
• How can we tell how good our predictions are?
– we can wait till the end of this course...
– we can try to assess the accuracy based on the data we
already have (training data)
• Possible solution:
– divide the training set further into training and test sets
– evaluate the classifier constructed on the basis of only the
smaller training set on the new test set
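A sketch of that holdout procedure; the split fraction and the trivial majority-vote stand-in classifier are illustrative assumptions, not part of the lecture:

```python
import random

def holdout_accuracy(fit, data, test_fraction=0.3, seed=0):
    """Shuffle labeled (x, y) pairs, hold out a test portion, train on the
    rest, and return the classifier's accuracy on the held-out part."""
    rng = random.Random(seed)
    data = data[:]
    rng.shuffle(data)
    n_test = max(1, int(len(data) * test_fraction))
    test, train = data[:n_test], data[n_test:]
    predict = fit(train)
    return sum(predict(x) == y for x, y in test) / len(test)

def fit_majority(train):
    """Trivial stand-in classifier: always predict the most common label."""
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

data = [([1.0], "A"), ([2.0], "A"), ([3.0], "B"), ([4.0], "A"), ([5.0], "B")]
acc = holdout_accuracy(fit_majority, data)  # a number between 0 and 1
```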
Model selection
• We can refine
– the estimation algorithm (e.g., using a classifier other than
the nearest neighbor classifier)
– the representation (e.g., base the summaries on a different
set of courses)
– the assumptions (e.g., perhaps students work in groups)
etc.
• We have to rely on the method of evaluating the accuracy
of our predictions to select among the possible refinements
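Whatever the refinement, selection then reduces to comparing validation scores. A minimal sketch, where the candidate settings (values of k for a k-nearest-neighbor rule) and their accuracies are made up for illustration:

```python
def select_model(score, candidates):
    """Return the candidate setting with the highest validation score."""
    return max(candidates, key=score)

# Made-up validation accuracies for hypothetical k-nearest-neighbor rules:
validation_accuracy = {1: 0.71, 3: 0.78, 5: 0.74}
best_k = select_model(lambda k: validation_accuracy[k], [1, 3, 5])  # -> 3
```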
Types of learning problems
A rough (and somewhat outdated) classification of learning
problems:
• Supervised learning, where we get a set of training inputs
and outputs
– classification, regression
• Unsupervised learning, where we are interested in capturing
inherent organization in the data
– clustering, density estimation
• Reinforcement learning, where we only get feedback in the
form of how well we are doing (not what we should be doing)
– planning
Supervised learning: classification (again)
Example: digit recognition (8x8 binary digits)
[Figure: four example 8×8 binary digit images paired with target labels
“2”, “2”, “1”, “1”, . . . ]
• We wish to learn the mapping from digits to labels
Representation/assumptions
• A change in the representation can preclude learning even when it
preserves all the relevant information (e.g., applying the same
random permutation to the pixels of every digit image)
Supervised learning: regression
[Figure: training pairs (x, y) scattered over x ∈ [−2, 2], y ∈ [−5, 3],
with a curve fit through them]
• Given a set of training examples {(x1, y1), . . . , (xn, yn)}, we
want to learn a mapping f : X → Y such that
yi ≈ f(xi), i = 1, . . . , n
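For the simplest such f, a straight line fit by least squares, the requirement yi ≈ f(xi) has a closed-form solution. The line model is an assumption for illustration; the slide does not fix a function class:

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y ≈ a*x + b; returns the learned f."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return lambda x: a * x + b

f = fit_line([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])  # data lies on y = 2x + 1
```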
Unsupervised learning: data organization
The digits again...
• We’d like to understand the generation process of examples
(digits in this case)