ECE-271A Statistical Learning I · the course is an introductory level course in statistical...
Transcript of ECE-271A Statistical Learning I · the course is an introductory level course in statistical...
![Page 1: ECE-271A Statistical Learning I · the course is an introductory level course in statistical learning by introductory I mean that you will not need any previous exposure to the field,](https://reader033.fdocuments.in/reader033/viewer/2022043018/5f3acb194ebeaa19c814aad1/html5/thumbnails/1.jpg)
ECE-271AStatistical Learning I
Nuno Vasconcelos
ECE Department, UCSD
![Page 2: ECE-271A Statistical Learning I · the course is an introductory level course in statistical learning by introductory I mean that you will not need any previous exposure to the field,](https://reader033.fdocuments.in/reader033/viewer/2022043018/5f3acb194ebeaa19c814aad1/html5/thumbnails/2.jpg)
2
The coursethe course is an introductory level course in statistical learning
by introductory I mean that you will not need any previous exposure to the field, not that it is basic
we will cover the foundations of Bayesian or generative learning
271B is a follow-up course on discriminant learning, in alternating years
more on generative vs discriminant later
![Page 3: ECE-271A Statistical Learning I · the course is an introductory level course in statistical learning by introductory I mean that you will not need any previous exposure to the field,](https://reader033.fdocuments.in/reader033/viewer/2022043018/5f3acb194ebeaa19c814aad1/html5/thumbnails/3.jpg)
3
LogisticsExams: 1 mid-term - 35%
1 final – 45% (covers everything)
Homework (20%):• one problem set every week.
• will include a small computational problem. By small, I mean in terms of concepts, thinking, etc.
• some computational problems will require a fair amount of computer power, e.g. a few hours on a low-end PC.
• be sure to start early
• will count 20%, but almost impossible without it.
• will give you the hands-on experience needed to be able to claim that you really know learning!
![Page 4: ECE-271A Statistical Learning I · the course is an introductory level course in statistical learning by introductory I mean that you will not need any previous exposure to the field,](https://reader033.fdocuments.in/reader033/viewer/2022043018/5f3acb194ebeaa19c814aad1/html5/thumbnails/4.jpg)
4
Homework policies
homework is individual
OK to work on problems with someone else
but you have to:• write your own solution
• write down the names of who you collaborated with
homework is due one week after it is issued.
![Page 5: ECE-271A Statistical Learning I · the course is an introductory level course in statistical learning by introductory I mean that you will not need any previous exposure to the field,](https://reader033.fdocuments.in/reader033/viewer/2022043018/5f3acb194ebeaa19c814aad1/html5/thumbnails/5.jpg)
5
Cheetah
statistical learning only makes sense when you try it on data
we will test what we learn on a image processing problem• given the cheetah image, can we teach a
computer to segment it into object and foreground?
• the question will be answered with different techniques, typically one problem per week
• a total of 5 computer problems
try to keep an eye on the big picture,e.g. “did this improve over what we had done before?”
![Page 6: ECE-271A Statistical Learning I · the course is an introductory level course in statistical learning by introductory I mean that you will not need any previous exposure to the field,](https://reader033.fdocuments.in/reader033/viewer/2022043018/5f3acb194ebeaa19c814aad1/html5/thumbnails/6.jpg)
6
Resources
Course web page: http://www.svcl.ucsd.edu/~nuno• all handouts, problem sets, code will be available there
TA: TBA,
Me: Nuno Vasconcelos, [email protected], EBU1-5603
Office hours:• TA: TBA
• mine: Fridays, 9:30-10:30AM
• for homework talk to TA first, everything else see me
My assistant:• Travis Spackman ([email protected]), outside my office,
may sometimes be involved in administrative issues
![Page 7: ECE-271A Statistical Learning I · the course is an introductory level course in statistical learning by introductory I mean that you will not need any previous exposure to the field,](https://reader033.fdocuments.in/reader033/viewer/2022043018/5f3acb194ebeaa19c814aad1/html5/thumbnails/7.jpg)
7
Textsrequired:• “Pattern Classification”, Duda, Hart, and
Stork, John Willey and Sons, 2001
• will follow closely, hand-outs where needed
various other good, but optional, texts:• “Pattern Recognition and Machine Learning”, Bishop, 2006
• “Elements of Statistical Learning”, Hastie, Tibshirani, Fredman, 2001
• “Bayesian Data Analysis”, Gelman, Rubin, Stern, 2003.
• “A Probabilistic Theory of Pattern Recognition”, Devroye, Gyorfi, Lugosi, 1998 (more than what we need)
stuff you must know really well:• “Linear Algebra”, Gilbert Strang, 1988
• “Fundamentals of Applied Probability”, Drake, McGraw-Hill, 1967
![Page 8: ECE-271A Statistical Learning I · the course is an introductory level course in statistical learning by introductory I mean that you will not need any previous exposure to the field,](https://reader033.fdocuments.in/reader033/viewer/2022043018/5f3acb194ebeaa19c814aad1/html5/thumbnails/8.jpg)
8
The course
why statistical learning?
there are many processes in the world that are ruled by deterministic equations• e.g. f = ma; linear systems and convolution, Fourier, etc, various
chemical laws
• there may be some “noise”, “error”, “variability”, but we can leave with those
• we don’t need statistical learning
learning is needed when• there is a need for predictions about variables in the world, Y
• that depend on factors (other variables) X
• in a way that is impossible or too difficult to derive an equation for.
![Page 9: ECE-271A Statistical Learning I · the course is an introductory level course in statistical learning by introductory I mean that you will not need any previous exposure to the field,](https://reader033.fdocuments.in/reader033/viewer/2022043018/5f3acb194ebeaa19c814aad1/html5/thumbnails/9.jpg)
9
Examples
data-mining view:• large amounts of data that does not follow deterministic rules
• e.g. given an history of thousands of customer records and some questions that I can ask you, how do I predict that you will pay on time?
• impossible to derive a theorem for this, must be learned
while many associate learning with data-mining, it is by no means the only or more important application
signal processing view:• signals combine in ways that depend on “hidden structure” (e.g.
speech waveforms depend on language, grammar, etc.)
• signals are usually subject to significant amounts of “noise” (which sometimes means “things we do not know how to model”)
![Page 10: ECE-271A Statistical Learning I · the course is an introductory level course in statistical learning by introductory I mean that you will not need any previous exposure to the field,](https://reader033.fdocuments.in/reader033/viewer/2022043018/5f3acb194ebeaa19c814aad1/html5/thumbnails/10.jpg)
10
Examples (cont’d)
signal processing view:• e.g. the cocktail party problem,
• although there are all these people talking, I can figure everything out.
• how do I build a chip to separate the speakers?
• model the hidden dependence as
• a linear combination of independent sources
• noise
• many other examples in the areas of wireless, communications, signal restoration, etc.
![Page 11: ECE-271A Statistical Learning I · the course is an introductory level course in statistical learning by introductory I mean that you will not need any previous exposure to the field,](https://reader033.fdocuments.in/reader033/viewer/2022043018/5f3acb194ebeaa19c814aad1/html5/thumbnails/11.jpg)
11
Examples (cont’d)
perception/AI view:• it is a complex world, I cannot model
everything in detail
• rely on probabilistic models that explicitly account for the variability
• use the laws of probability to make inferences, e.g. what is
• P( burglar | alarm, no earthquake) is high
• P( burglar | alarm, earthquake) is low
• a whole field that studies “perception as Bayesian inference”
• perception really just confirms what you already know
• priors + observations = robust inference
![Page 12: ECE-271A Statistical Learning I · the course is an introductory level course in statistical learning by introductory I mean that you will not need any previous exposure to the field,](https://reader033.fdocuments.in/reader033/viewer/2022043018/5f3acb194ebeaa19c814aad1/html5/thumbnails/12.jpg)
12
communications view:• detection problems:
• I see Y, and know something about the statistics of the channel. What was X?
• this is the canonic detection problem that appears all over learning.
• for example, face detection in computer vision: “I see pixel array Y. Is it a face?”
Examples (cont’d)
channelX Y
![Page 13: ECE-271A Statistical Learning I · the course is an introductory level course in statistical learning by introductory I mean that you will not need any previous exposure to the field,](https://reader033.fdocuments.in/reader033/viewer/2022043018/5f3acb194ebeaa19c814aad1/html5/thumbnails/13.jpg)
13
Statistical learning
goal: given a function
and a collection of exampledata-points, learn what the function f(.) is.
this is called training.
two major types of learning:• unsupervised: only X is known, usually referred to as clustering;
• supervised: both are known during training, only X known at test time, usually referred to as classification or regression.
)(xfy =x(.)f
![Page 14: ECE-271A Statistical Learning I · the course is an introductory level course in statistical learning by introductory I mean that you will not need any previous exposure to the field,](https://reader033.fdocuments.in/reader033/viewer/2022043018/5f3acb194ebeaa19c814aad1/html5/thumbnails/14.jpg)
14
Supervised learning
X can be anything, but the type of Y dictates the type of supervised learning problem• Y in {0,1} referred to as detection
• Y in {0, ..., M-1} referred to as classification
• Y real referred to as regression
theory is quite similar, algorithms similar most of the time
we will emphasize classification, but will talk about regression when particularly insightful
![Page 15: ECE-271A Statistical Learning I · the course is an introductory level course in statistical learning by introductory I mean that you will not need any previous exposure to the field,](https://reader033.fdocuments.in/reader033/viewer/2022043018/5f3acb194ebeaa19c814aad1/html5/thumbnails/15.jpg)
15
Exampleclassifying fish:• fish roll down a conveyer belt
• camera takes a picture
• goal: is this a salmon or a sea-bass?
Q: what is X? What featuresdo I use to distinguish between the two fish?
this is somewhat of an art-form. Frequently, the best is to ask experts.
e.g. “obvious! use length and scale width!”
![Page 16: ECE-271A Statistical Learning I · the course is an introductory level course in statistical learning by introductory I mean that you will not need any previous exposure to the field,](https://reader033.fdocuments.in/reader033/viewer/2022043018/5f3acb194ebeaa19c814aad1/html5/thumbnails/16.jpg)
16
Classification/detection
two major types of classifiers:• discriminant: directly recover the
decision boundary that best separates the classes;
• generative: fit a probability model to each class and then “analyze” the models to find the border.
a lot more on this later!
focus will be on generative learning.
discriminant will be covered by 271B.
![Page 17: ECE-271A Statistical Learning I · the course is an introductory level course in statistical learning by introductory I mean that you will not need any previous exposure to the field,](https://reader033.fdocuments.in/reader033/viewer/2022043018/5f3acb194ebeaa19c814aad1/html5/thumbnails/17.jpg)
17
Caution
how do we know learning worked?
we care about generalization, i.e. accuracy outside training set
models that are too powerful canlead to over-fitting:• e.g. in regression I can always fit
exactly n pts with polynomial oforder n-1.
• is this good? how likely is the errorto be small outside the training set?
• similar problem for classification
fundamental LAW: only test set results matter!!!
![Page 18: ECE-271A Statistical Learning I · the course is an introductory level course in statistical learning by introductory I mean that you will not need any previous exposure to the field,](https://reader033.fdocuments.in/reader033/viewer/2022043018/5f3acb194ebeaa19c814aad1/html5/thumbnails/18.jpg)
18
Generalization
good generalization requires controlling the trade-off between training and test error• training error large, test error large
• training error smaller, test error smaller
• training error smallest, test error largest
this trade-off is known by many names
in the generative classification world it is usually due to the bias-variance trade-off of the class models
will look at this in detail
![Page 19: ECE-271A Statistical Learning I · the course is an introductory level course in statistical learning by introductory I mean that you will not need any previous exposure to the field,](https://reader033.fdocuments.in/reader033/viewer/2022043018/5f3acb194ebeaa19c814aad1/html5/thumbnails/19.jpg)
19
Class-modelingeach class is characterized by a probability density function (class conditional density)
a model is adopted, e.g. a Gaussian
training data used to estimate model parameters
overall the process is referred to as density estimation
the simplest example would be to use histograms
![Page 20: ECE-271A Statistical Learning I · the course is an introductory level course in statistical learning by introductory I mean that you will not need any previous exposure to the field,](https://reader033.fdocuments.in/reader033/viewer/2022043018/5f3acb194ebeaa19c814aad1/html5/thumbnails/20.jpg)
20
Density estimation
there are, however, much better models
usually, problem has two components:• selecting a model
• estimating model parameters
models:• we will cover the whole gamut
• from the exponential family (e.g. Gaussian)
• to kernel-based density estimates
• including mixture models
• and non-parametric approaches(nearest neighbors, histograms, etc.)
![Page 21: ECE-271A Statistical Learning I · the course is an introductory level course in statistical learning by introductory I mean that you will not need any previous exposure to the field,](https://reader033.fdocuments.in/reader033/viewer/2022043018/5f3acb194ebeaa19c814aad1/html5/thumbnails/21.jpg)
21
Parameter estimationtwo main camps:• maximum likelihood
• Bayesian estimates
ML will devote most attention to the quality of estimates• the bias variance/trade-off
a lot more emphasis on Bayes:• subjective probability
• what is really a prior?
• mechanics: predictive distribution, MAP estimates, etc.
• priors: conjugate, non-informative, improper
• why is the exponential family special?
![Page 22: ECE-271A Statistical Learning I · the course is an introductory level course in statistical learning by introductory I mean that you will not need any previous exposure to the field,](https://reader033.fdocuments.in/reader033/viewer/2022043018/5f3acb194ebeaa19c814aad1/html5/thumbnails/22.jpg)
22
Decision rules
given class models, Bayesiandecision theory provides us withoptimal rules for classification
optimal here means minimumprobability of error, for example
we will • study BDT in detail,
• establish connections to other decision principles (e.g. linear discriminants)
• show that Bayesian decisions are usually intuitive
derive optimal rules for a range of classifiers
![Page 23: ECE-271A Statistical Learning I · the course is an introductory level course in statistical learning by introductory I mean that you will not need any previous exposure to the field,](https://reader033.fdocuments.in/reader033/viewer/2022043018/5f3acb194ebeaa19c814aad1/html5/thumbnails/23.jpg)
24
Reasons to take the coursestatistical learning• tremendous amount of theory
• but things invariably go wrong
• too little data, noise, too many dimensions, training sets that do not reflect all possible variability, etc.
good learning solutions require:• knowledge of the domain (e.g. “these are the features to use”)
• knowledge of the available techniques, their limitations, etc. (e.g. here a Gaussian is enough for AB&C, but there I need a mixture)
in the absence of either of these you will fail!
we will cover the basics, but will talk about quite advanced concepts.
easier scenario in which to understand them
![Page 24: ECE-271A Statistical Learning I · the course is an introductory level course in statistical learning by introductory I mean that you will not need any previous exposure to the field,](https://reader033.fdocuments.in/reader033/viewer/2022043018/5f3acb194ebeaa19c814aad1/html5/thumbnails/24.jpg)
25
Reasons to take the course
theory together with hands-on experience• will cover all theory, every week 5-6
problems
• “hands-on” component: one computational problem per week
• this will center around cheetah segmentation
• allows evaluation of the benefits of more advance techniques as they are introduced
• forces you to deal with real, noisy, data
• exposes you to working on a new domain
![Page 25: ECE-271A Statistical Learning I · the course is an introductory level course in statistical learning by introductory I mean that you will not need any previous exposure to the field,](https://reader033.fdocuments.in/reader033/viewer/2022043018/5f3acb194ebeaa19c814aad1/html5/thumbnails/25.jpg)
26
![Page 26: ECE-271A Statistical Learning I · the course is an introductory level course in statistical learning by introductory I mean that you will not need any previous exposure to the field,](https://reader033.fdocuments.in/reader033/viewer/2022043018/5f3acb194ebeaa19c814aad1/html5/thumbnails/26.jpg)
27
Cheetah day
last class, we will have “Cheetah Day”
what:• 5 teams
• each team will write a report on the 5 cheetah problems
• each team will give a presentation on one of the problems
why: • to make sure that we get the “big picture”
out of all this work
• presenting is always good practice
![Page 27: ECE-271A Statistical Learning I · the course is an introductory level course in statistical learning by introductory I mean that you will not need any previous exposure to the field,](https://reader033.fdocuments.in/reader033/viewer/2022043018/5f3acb194ebeaa19c814aad1/html5/thumbnails/27.jpg)
28
Cheetah Dayhow much:• 10% of the final grade (5% report, 5%
presentation)
what to talk about:• report: comparative analysis of all solutions
of the problem
• as if you were writing a conference paper
• presentation: will be on one single problem• review what solution was
• what did this problem taught us about learning?
• what “tricks” did we learn solving it?
• how well did this solution do compared to others?
will talk about this in due time