Learning from Observations
Bishop, Ch. 1; Russell & Norvig, Ch. 18
Learning as source of knowledge
• Implicit models: In many domains, we cannot say how we manage to perform so well
• Unknown environment: After some effort, we can get a system to work for a finite environment, but it fails in new areas
• Model structures: Learning can reveal properties (regularities) of the system behaviour
– Modifies agent's decision models to reduce complexity and improve performance
Feedback in Learning
• Type of feedback:
  – Supervised learning: correct answers given for each example
    • Discrete (categories): classification
    • Continuous: regression
  – Unsupervised learning: correct answers not given
  – Reinforcement learning: occasional rewards
Inductive learning
• Simplest form: learn a function from examples
An example is a pair (x, y): x = data, y = outcome
Assume y is drawn from a target function f: y = f(x) + noise
Problem: find a hypothesis h such that h ≈ f, given a training set of examples
Note: this is a highly simplified model:
– Ignores prior knowledge: some h may be more likely than others
– Assumes plenty of examples are available
– Objective: maximize prediction accuracy on unseen data. Q: How?
Inductive learning method
• Construct/adjust h to agree with f on the training set
• (h is consistent if it agrees with f on all examples)
• E.g., curve fitting:
Regression vs Classification
y = f(x)
Regression: y is continuous
Classification: y takes values in a discrete set
e.g. classes C1, C2, C3... : y ∈ {1, 2, 3...}
Precision vs Recall
Precision: A / Retrieved Positives
Recall: A / Actual Positives
Regression
Polynomial Curve Fitting
Linear Regression
y = f(x) = Σ_i w_i φ_i(x)
φ_i(x): basis functions
w_i: weights
Linear: the function is linear in the weights
Quadratic error function → its derivative is linear in w
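As a runnable sketch of the point above (my addition, not from the slides; it assumes NumPy and a polynomial basis φ_i(x) = x^i): because the error is quadratic in w, the minimum is found by one linear least-squares solve.

```python
import numpy as np

def design_matrix(x, M):
    """Polynomial basis functions: phi_i(x) = x**i for i = 0..M."""
    return np.vander(x, M + 1, increasing=True)

def fit_linear(x, y, M):
    """Least-squares weights: the quadratic error has a derivative
    linear in w, so its minimum is a single linear solve."""
    w, *_ = np.linalg.lstsq(design_matrix(x, M), y, rcond=None)
    return w

x = np.linspace(0, 1, 10)
y = 1 + 2 * x + 3 * x**2            # noiseless quadratic target
w = fit_linear(x, y, M=2)           # recovers the coefficients exactly
```

The function names here are illustrative; any fixed set of basis functions (Gaussians, sigmoids, ...) would work the same way, since only linearity in w matters.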
Sum-of-Squares Error Function
0th Order Polynomial
1st Order Polynomial
3rd Order Polynomial
9th Order Polynomial
Over-fitting
Root-Mean-Square (RMS) Error:
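A small sketch of the over-fitting effect (my addition, not from the slides; assumes NumPy, a noisy sine target, and 10 training points): training RMS error falls as the polynomial order M rises, while test RMS error does not.

```python
import numpy as np

def rms_error(y_true, y_pred):
    """Root-mean-square error: sqrt of the mean squared residual."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + 0.2 * rng.standard_normal(10)
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)

train_err, test_err = {}, {}
for M in (1, 3, 9):
    w = np.polyfit(x_train, y_train, M)          # degree-M least-squares fit
    train_err[M] = rms_error(y_train, np.polyval(w, x_train))
    test_err[M] = rms_error(y_test, np.polyval(w, x_test))
# M = 9 interpolates the 10 training points (training error near zero),
# yet its test error stays large: over-fitting.
```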
Polynomial Coefficients
9th Order Polynomial
Data Set Size: 9th Order Polynomial
Data Set Size: 9th Order Polynomial
Regularization
Penalize large coefficient values
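A sketch of the penalty at work (my addition, not from the slides; assumes NumPy and a ridge/L2 penalty λ‖w‖², which has a closed-form solution):

```python
import numpy as np

def ridge_fit(Phi, y, lam):
    """Minimise ||y - Phi w||^2 + lam * ||w||^2 via the normal equations."""
    M = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(M), Phi.T @ y)

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(10)
Phi = np.vander(x, 10, increasing=True)        # 9th-order polynomial basis

w_free, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # unpenalised: wild coefficients
w_reg = ridge_fit(Phi, y, lam=1e-3)               # penalised: much smaller weights
```

The value λ = 1e-3 is illustrative; the slides' point is only that penalising ‖w‖² shrinks the large coefficients of the unregularised 9th-order fit.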
Polynomial Coefficients
Binary Classification
Regression vs Classification
y = f(x)
Regression: y is continuous
Classification: y takes discrete values, e.g. 0, 1, 2... for classes C0, C1, C2...
Binary classification: two classes, y ∈ {0, 1}
Binary Classification
Feature : Length
Feature : Lightness
Minimize Misclassification
Precision / Recall
C1 : class of interest
Which is higher: Precision, or Recall?
Precision / Recall
C1: class of interest (Positives)
Recall = TP / (TP + FN)
Precision / Recall
C1: class of interest
Precision = TP / (TP + FP)
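The two definitions in one runnable form (my addition; the function name and counts are illustrative):

```python
def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP): of everything retrieved as C1, how much is right.
    Recall = TP/(TP+FN): of everything actually in C1, how much was found."""
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical counts: 8 true positives, 2 false positives, 4 missed positives.
p, r = precision_recall(tp=8, fp=2, fn=4)    # p = 0.8, r = 2/3
```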
Decisions in Feature Space
– Feature selection: which feature is maximally discriminative?
  – Axis-oriented decision boundaries in feature space
  – Length, or Width, or Lightness?
– Feature discovery: construct g(), defined on the feature space, for better discrimination
Feature Selection: width / lightness
Lightness is more discriminative, but can we do better?
Select the most discriminative feature(s)
Feature Selection
– Feature selection: which feature is maximally discriminative?
  – Axis-oriented decision boundaries in feature space
  – Length, or Width, or Lightness?
– Feature discovery: discover a discriminative function g() on the feature space
  – combine aspects of length, width, lightness
Feature Discovery : Linear
Cross-Validation
Decision Surface: non-linear
Decision Surface : non-linear
overfitting!
Learning process
– Feature set: representative? complete?
– Sample size: training set vs test set
– Model selection:
  – Overfitting on unseen data?
  – Quality vs complexity
  – Computation vs performance
Best Feature set?
- Is it possible to describe the variation in the data in terms of a compact set of Features?
- Minimum Description Length
Probability Theory
Learning = discovering regularities
– Regularity: in repeated experiments, the outcome may not be fully predictable
outcome = "possible world"; set of all possible worlds = Ω
Probability Theory: Apples and Oranges
Sample Space
Sample ω = pick two fruits, e.g. Apple, then Orange
Sample space Ω = {(A,A), (A,O), (O,A), (O,O)} = all possible worlds
Event e = a set of possible worlds, e ⊆ Ω
• e.g. "the second fruit picked is an apple"
Learning = discovering regularities
– Regularity: in repeated experiments, the outcome may not be fully predictable
– Probability p(e): "the fraction of possible worlds in which e is true", i.e. in which the outcome lies in event e
– Frequentist view: p(e) = the limit of this fraction as the number of trials N → ∞
– Belief view: the odds (1−p):p at which one would accept either side of a wager on whether the outcome is in e
Axioms of Probability
– Non-negativity: p(e) ≥ 0
– Unit sum: p(Ω) = 1, i.e. there are no outcomes outside the sample space
– Additivity: if e1, e2 are disjoint events (no common outcome):
  p(e1 ∪ e2) = p(e1) + p(e2)
  In general: p(e1 ∨ e2) = p(e1) + p(e2) − p(e1 ∧ e2)
Why probability theory?
Several methodologies have been attempted for handling uncertainty:
– Fuzzy logic
– Multi-valued logic
– Non-monotonic reasoning
But probability theory has a unique property:
If you gamble using probabilities, you have the best chance in a wager. [de Finetti 1931]
⇒ if an opponent uses some other system, he is more likely to lose
Ramsey-de Finetti theorem (1931): if agent X's degrees of belief are rational, then X's degrees-of-belief function, defined by fair betting rates, is (formally) a probability function
Fair betting rates: the opponent decides which side one bets on
Proof: fair odds result in a function pr() that satisfies the Kolmogorov axioms:
– Normality: pr(S) ≥ 0
– Certainty: pr(T) = 1
– Additivity: pr(S1 ∨ S2 ∨ ...) = Σ_i pr(Si), for mutually exclusive Si
Joint vs. conditional probability
Marginal Probability
Joint Probability
Conditional Probability
Probability Theory
Sum Rule
Product Rule
Rules of Probability
Sum Rule
Product Rule
Example
A disease d occurs in 0.05% of the population. A test is 99% effective in detecting the disease, but 5% of cases test positive in the absence of d.
10,000 people are tested. How many are expected to test positive?
p(d) = 0.0005; p(t|d) = 0.99; p(t|¬d) = 0.05
p(t) = p(t,d) + p(t,¬d) [Sum Rule]
     = p(t|d)p(d) + p(t|¬d)p(¬d) [Product Rule]
     = 0.99 × 0.0005 + 0.05 × 0.9995 ≈ 0.0505, i.e. about 505 positives
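The same arithmetic, checked in code (my addition; variable names are illustrative):

```python
p_d = 0.0005              # disease prevalence: 0.05% of the population
p_t_given_d = 0.99        # test detects the disease 99% of the time
p_t_given_nd = 0.05       # 5% false-positive rate

# Product rule: p(t, d) = p(t|d) p(d).  Sum rule: p(t) = p(t, d) + p(t, ~d).
p_t = p_t_given_d * p_d + p_t_given_nd * (1 - p_d)
expected_positives = 10_000 * p_t     # about 505 of the 10,000 tested
```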
Bayes’ Theorem
posterior ∝ likelihood × prior
Bayes' Theorem
Thomas Bayes (c. 1750): how can we infer causes from effects? How can one learn the probability of a future event if one knew only how many times it had (or had not) occurred in the past?
As new evidence comes in → probabilistic knowledge improves.
e.g. throw a die: the first guess is poor (1/6). Throw the die again: is it higher or lower than the previous throw? The guess can improve. Throw the die repeatedly, and the probability of guessing correctly improves considerably.
Hence: initial estimate (prior belief P(h), not well formulated) + new evidence (support), from which we compute the likelihood P(data|h) → improved estimate (the posterior P(h|data))
![Page 57: Learning from Observations · Inductive learning • Simplest form: learn a function from examples An example is a pair (x, y) : x = data, y = outcome](https://reader034.fdocuments.in/reader034/viewer/2022042908/5f38848728ddfb16c20ef8f5/html5/thumbnails/57.jpg)
Example
A disease d occurs in 0.05% of the population. A test is 99% effective in detecting the disease, but 5% of cases test positive in the absence of d.
If you test positive, what is the probability that you have the disease?
p(d|t) = p(d) · p(t|d) / p(t); p(t) = 0.0505
p(d|t) = 0.0005 × 0.99 / 0.0505 ≈ 0.0098 (about 1%)
If 10,000 people take the test: E(d) = 5; FPs = 0.05 × 9,995 ≈ 500; TPs = 0.99 × 5 ≈ 5. Only about 5 of the 505 positives actually have d.
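The posterior computation in code (my addition; variable names are illustrative):

```python
p_d, p_t_given_d, p_t_given_nd = 0.0005, 0.99, 0.05
p_t = p_t_given_d * p_d + p_t_given_nd * (1 - p_d)   # evidence, ~0.0505

# Bayes' theorem: posterior = likelihood * prior / evidence
p_d_given_t = p_t_given_d * p_d / p_t
# ~0.0098: even with a "99% effective" test, a positive result
# means only about a 1% chance of having the disease.
```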
Probability Densities
Expectations
(both discrete and continuous)
discrete x / continuous x
Frequentist approximation with an unbiased sample
Variances and Covariances
Σ_x p(x) f(x,y) → a function of y
Gaussian Distribution
The Gaussian Distribution
Gaussian Mean and Variance
Central Limit Theorem
The distribution of the sum of N i.i.d. random variables becomes increasingly Gaussian as N grows.
Example: N uniform [0,1] random variables.
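A quick simulation of the slide's example (my addition; assumes NumPy, N = 50, and 100,000 repetitions): the mean of N uniform[0,1] variables approaches Normal(1/2, 1/(12N)).

```python
import numpy as np

rng = np.random.default_rng(42)
N = 50
# Mean of N uniform[0,1] variables, repeated 100,000 times.
means = rng.uniform(size=(100_000, N)).mean(axis=1)

# CLT prediction: mean 1/2, variance 1/(12N) (uniform variance 1/12, divided by N).
emp_mean = float(means.mean())
emp_var = float(means.var())
```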
Gaussian Parameter Estimation
Likelihood function
Observations are assumed to be independently drawn from the same distribution (i.i.d.)
Maximum (Log) Likelihood
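A sketch of the result on this slide (my addition; assumes NumPy and synthetic Gaussian data): maximising the Gaussian log-likelihood gives the sample mean and the 1/N (biased) sample variance in closed form.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(loc=2.0, scale=1.5, size=100_000)   # i.i.d. Gaussian samples

# Setting the derivative of the log-likelihood to zero gives closed forms:
# the sample mean, and the 1/N (biased) sample variance.
mu_ml = float(x.mean())
var_ml = float(((x - mu_ml) ** 2).mean())
# mu_ml ~ 2.0, var_ml ~ 1.5**2 = 2.25
```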
The Multivariate Gaussian
lines of equal probability density
Multivariate distribution
The joint distribution P(x,y) can vary considerably even though the marginals P(x), P(y) are identical
Estimating the joint distribution requires a much larger sample: O(n^k) vs O(nk) for k variables with n values each
Marginals and Conditionals
The marginals P(x), P(y) are Gaussian; the conditional P(x|y) is also Gaussian
Non-intuitive in high dimensions
As dimensionality increases, the bulk of the probability mass moves away from the center
Gaussian in polar coordinates; p(r)δr = probability mass inside an annulus of width δr at radius r
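A simulation of this effect (my addition; assumes NumPy and 10,000 samples per dimension): the typical distance of a standard Gaussian sample from the mean grows like √D.

```python
import numpy as np

rng = np.random.default_rng(3)
mean_radius = {}
for D in (1, 20, 400):
    # 10,000 samples from a D-dimensional standard Gaussian
    samples = rng.standard_normal((10_000, D))
    r = np.linalg.norm(samples, axis=1)          # distance from the mean
    mean_radius[D] = float(r.mean())
# The typical radius grows like sqrt(D): in high dimensions almost all
# probability mass lies in a thin shell far from the centre.
```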
Change of variable x=g(y)
Bernoulli Process
Successive trials, e.g. toss a coin three times:
HHH, HHT, HTH, HTT, THH, THT, TTH, TTT
Probability of k heads:
k:    0    1    2    3
P(k): 1/8  3/8  3/8  1/8
With probability of success p and failure q = 1 − p: P(k) = C(n,k) p^k q^(n−k)
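The coin-toss table above in code (my addition; the function name is illustrative):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k successes in n Bernoulli trials), failure probability q = 1 - p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Three fair coin tosses: eight equally likely outcomes HHH ... TTT.
probs = [binom_pmf(k, 3, 0.5) for k in range(4)]   # 1/8, 3/8, 3/8, 1/8
```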
Model Selection
Model Selection
Cross-Validation
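A sketch of model selection by cross-validation (my addition, not from the slides; assumes NumPy, k-fold splits, and polynomial models): pick the model order with the lowest average held-out error.

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Shuffle n sample indices and split them into k disjoint validation folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

def cross_val_rms(x, y, degree, k=5):
    """Average held-out RMS error of a degree-`degree` polynomial fit."""
    errors = []
    for fold in kfold_indices(len(x), k):
        train = np.ones(len(x), dtype=bool)
        train[fold] = False                      # hold this fold out
        w = np.polyfit(x[train], y[train], degree)
        resid = y[fold] - np.polyval(w, x[fold])
        errors.append(np.sqrt(np.mean(resid ** 2)))
    return float(np.mean(errors))

x = np.linspace(0, 1, 30)
y = 1 + x - 2 * x**2                             # noiseless quadratic
# A degree-2 model generalises perfectly here; a degree-0 model cannot.
```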
Curse of Dimensionality
Curse of Dimensionality
Polynomial curve fitting, M = 3
Gaussian Densities in higher dimensions
Performance measurement
• How do we know that h ≈ f?
1. Use theorems of computational/statistical learning theory
2. Try h on a new test set of examples (using the same distribution over the example space as the training set)
Learning curve = % correct on the test set as a function of training set size
Regression with Polynomials
Curve Fitting Re-visited
Maximum Likelihood
Determine the weights by minimizing the sum-of-squares error.
Predictive Distribution
MAP: A Step towards Bayes
Determine the weights by minimizing the regularized sum-of-squares error.
MAP = Maximum A Posteriori
Bayesian Curve Fitting
Bayesian Predictive Distribution
Information Theory
Twenty Questions
Knower: thinks of an object (a point in a probability space)
Guesser: asks the knower to evaluate random variables
Stupid approach:
Guesser: Is it my left big toe?
Knower: No.
Guesser: Is it Valmiki?
Knower: No.
Guesser: Is it Aunt Lakshmi? ...
![Page 88: Learning from Observations · Inductive learning • Simplest form: learn a function from examples An example is a pair (x, y) : x = data, y = outcome](https://reader034.fdocuments.in/reader034/viewer/2022042908/5f38848728ddfb16c20ef8f5/html5/thumbnails/88.jpg)
Expectations & Surprisal
Turn the key: expectation: lock will open
Exam paper about to be shown: could be 100, could be zero. Random variable: a function from the set of marks to the real interval [0, 1].
Interestingness ∝ unpredictability
surprisal(x) = −log₂ p(x)
 = 0 when p(x) = 1
 = 1 when p(x) = ½
 = ∞ when p(x) = 0
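The three cases above can be checked directly; a minimal sketch:

```python
import math

def surprisal(p):
    """Surprisal -log2 p(x): 0 bits for a certain event, more bits as p shrinks."""
    if p == 0:
        return math.inf  # an impossible event would be infinitely surprising
    return 0.0 if p == 1 else -math.log2(p)

print(surprisal(1.0))  # → 0.0 (no surprise: we expected it)
print(surprisal(0.5))  # → 1.0 (a fair coin flip carries one bit)
print(surprisal(0.0))
```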
![Page 89: Learning from Observations · Inductive learning • Simplest form: learn a function from examples An example is a pair (x, y) : x = data, y = outcome](https://reader034.fdocuments.in/reader034/viewer/2022042908/5f38848728ddfb16c20ef8f5/html5/thumbnails/89.jpg)
A: 00010001000100010001. . . 0001000100010001000100010001
B: 01110100110100100110. . . 1010111010111011000101100010
C: 00011000001010100000. . . 0010001000010000001000110000
Expectations in data
Structure in data easy to remember
![Page 90: Learning from Observations · Inductive learning • Simplest form: learn a function from examples An example is a pair (x, y) : x = data, y = outcome](https://reader034.fdocuments.in/reader034/viewer/2022042908/5f38848728ddfb16c20ef8f5/html5/thumbnails/90.jpg)
Entropy
Used in• coding theory• statistical physics• machine learning
![Page 91: Learning from Observations · Inductive learning • Simplest form: learn a function from examples An example is a pair (x, y) : x = data, y = outcome](https://reader034.fdocuments.in/reader034/viewer/2022042908/5f38848728ddfb16c20ef8f5/html5/thumbnails/91.jpg)
Entropy
![Page 92: Learning from Observations · Inductive learning • Simplest form: learn a function from examples An example is a pair (x, y) : x = data, y = outcome](https://reader034.fdocuments.in/reader034/viewer/2022042908/5f38848728ddfb16c20ef8f5/html5/thumbnails/92.jpg)
Entropy
In how many ways can N identical objects be allocated to M bins?
Entropy is maximized when all bins are equally likely (pᵢ = 1/M for all i)
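A quick numerical check of this claim; the two distributions below are made up for illustration:

```python
import math

def entropy(ps):
    """H = -sum p log2 p, in bits (terms with p = 0 contribute nothing)."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

uniform = [0.25] * 4
skewed = [0.7, 0.1, 0.1, 0.1]
print(entropy(uniform))  # → 2.0, the maximum for 4 outcomes (log2 4)
print(entropy(skewed))   # strictly less than 2.0
```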
![Page 93: Learning from Observations · Inductive learning • Simplest form: learn a function from examples An example is a pair (x, y) : x = data, y = outcome](https://reader034.fdocuments.in/reader034/viewer/2022042908/5f38848728ddfb16c20ef8f5/html5/thumbnails/93.jpg)
Entropy in Coding theory
x discrete with 8 possible states; how many bits to transmit the state of x?
If all states are equally likely, we need log₂ 8 = 3 bits
![Page 94: Learning from Observations · Inductive learning • Simplest form: learn a function from examples An example is a pair (x, y) : x = data, y = outcome](https://reader034.fdocuments.in/reader034/viewer/2022042908/5f38848728ddfb16c20ef8f5/html5/thumbnails/94.jpg)
Coding theory
![Page 95: Learning from Observations · Inductive learning • Simplest form: learn a function from examples An example is a pair (x, y) : x = data, y = outcome](https://reader034.fdocuments.in/reader034/viewer/2022042908/5f38848728ddfb16c20ef8f5/html5/thumbnails/95.jpg)
Entropy in Twenty Questions
Intuitively: try to ask a question whose answer is 50-50
Is the first letter between A and M?
question entropy = − p(Y) log₂ p(Y) − p(N) log₂ p(N)
When both answers are equiprobable: entropy = − ½ log₂(½) − ½ log₂(½) = 1.0

For p(Y) = 1/1024: entropy ≈ (1/1024) · 10 + ε ≈ 0.01
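Both values can be verified numerically; a small sketch:

```python
import math

def question_entropy(p_yes):
    """Entropy of a yes/no question's answer, in bits."""
    terms = [p_yes, 1.0 - p_yes]
    return -sum(p * math.log2(p) for p in terms if p > 0)

print(question_entropy(0.5))  # → 1.0: a perfect 50-50 question
print(round(question_entropy(1 / 1024), 3))  # ≈ 0.011: a near-certain 'No' tells us little
```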
![Page 96: Learning from Observations · Inductive learning • Simplest form: learn a function from examples An example is a pair (x, y) : x = data, y = outcome](https://reader034.fdocuments.in/reader034/viewer/2022042908/5f38848728ddfb16c20ef8f5/html5/thumbnails/96.jpg)
Learning Logical Rules: Decision Trees
Duda and Hart, Ch.1
Russell & Norvig Ch. 18
![Page 97: Learning from Observations · Inductive learning • Simplest form: learn a function from examples An example is a pair (x, y) : x = data, y = outcome](https://reader034.fdocuments.in/reader034/viewer/2022042908/5f38848728ddfb16c20ef8f5/html5/thumbnails/97.jpg)
Boolean Decision Trees
![Page 98: Learning from Observations · Inductive learning • Simplest form: learn a function from examples An example is a pair (x, y) : x = data, y = outcome](https://reader034.fdocuments.in/reader034/viewer/2022042908/5f38848728ddfb16c20ef8f5/html5/thumbnails/98.jpg)
Attribute-based representations
• Examples described by attribute values (Boolean, discrete, continuous)
• E.g., situations where I will/won't wait for a table:
• Classification of examples is positive (T) or negative (F)
![Page 99: Learning from Observations · Inductive learning • Simplest form: learn a function from examples An example is a pair (x, y) : x = data, y = outcome](https://reader034.fdocuments.in/reader034/viewer/2022042908/5f38848728ddfb16c20ef8f5/html5/thumbnails/99.jpg)
Learning decision trees
Problem: decide whether to wait for a table at a restaurant, based on the following attributes:
1. Alternate: is there an alternative restaurant nearby?
2. Bar: is there a comfortable bar area to wait in?
3. Fri/Sat: is today Friday or Saturday?
4. Hungry: are we hungry?
5. Patrons: number of people in the restaurant (None, Some, Full)
6. Price: price range ($, $$, $$$)
7. Raining: is it raining outside?
8. Reservation: have we made a reservation?
9. Type: kind of restaurant (French, Italian, Thai, Burger)
10. WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60)
![Page 100: Learning from Observations · Inductive learning • Simplest form: learn a function from examples An example is a pair (x, y) : x = data, y = outcome](https://reader034.fdocuments.in/reader034/viewer/2022042908/5f38848728ddfb16c20ef8f5/html5/thumbnails/100.jpg)
Continuous orthogonal domains
Classification and regression trees: CART [Breiman 84]; ID3 [Quinlan 86]
![Page 101: Learning from Observations · Inductive learning • Simplest form: learn a function from examples An example is a pair (x, y) : x = data, y = outcome](https://reader034.fdocuments.in/reader034/viewer/2022042908/5f38848728ddfb16c20ef8f5/html5/thumbnails/101.jpg)
Expressiveness
• Decision trees can express any function of the input attributes.
• E.g., for Boolean functions, truth table row → path to leaf:
• Trivially, there is a consistent decision tree for any training set, with one path to a leaf for each example (unless f is nondeterministic in x), but it probably won't generalize to new examples
• Prefer to find more compact decision trees
![Page 102: Learning from Observations · Inductive learning • Simplest form: learn a function from examples An example is a pair (x, y) : x = data, y = outcome](https://reader034.fdocuments.in/reader034/viewer/2022042908/5f38848728ddfb16c20ef8f5/html5/thumbnails/102.jpg)
Which attribute to use first?
Gain(S; A) = expected reduction in entropy due to sorting on attribute A
![Page 103: Learning from Observations · Inductive learning • Simplest form: learn a function from examples An example is a pair (x, y) : x = data, y = outcome](https://reader034.fdocuments.in/reader034/viewer/2022042908/5f38848728ddfb16c20ef8f5/html5/thumbnails/103.jpg)
Choosing an attribute
• Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative"
• Gain (Patrons)
• Gain (Type)
Gain(Patrons) = 1 − [ (2/12) B(0/2) + (4/12) B(4/4) + (6/12) B(2/6) ] = 0.541 bits

Gain(Type) = 1 − [ 1 · B(1/2) ] = 0 bits

Information Gain is higher for Patrons
![Page 104: Learning from Observations · Inductive learning • Simplest form: learn a function from examples An example is a pair (x, y) : x = data, y = outcome](https://reader034.fdocuments.in/reader034/viewer/2022042908/5f38848728ddfb16c20ef8f5/html5/thumbnails/104.jpg)
Decision trees
• One possible representation for hypotheses
• E.g., here is the “true” tree for deciding whether to wait:
![Page 105: Learning from Observations · Inductive learning • Simplest form: learn a function from examples An example is a pair (x, y) : x = data, y = outcome](https://reader034.fdocuments.in/reader034/viewer/2022042908/5f38848728ddfb16c20ef8f5/html5/thumbnails/105.jpg)
Information gain
• A chosen attribute A divides the training set E into subsets E1, … , Ev according to their values for A, where A has v distinct values.
• Information Gain (IG) or reduction in entropy from the attribute test:
• Choose the attribute with the largest IG
remainder(A) = Σ_{i=1}^{v} (pᵢ + nᵢ)/(p + n) · B( pᵢ / (pᵢ + nᵢ) )

IG(A) = B( p / (p + n) ) − remainder(A)
![Page 106: Learning from Observations · Inductive learning • Simplest form: learn a function from examples An example is a pair (x, y) : x = data, y = outcome](https://reader034.fdocuments.in/reader034/viewer/2022042908/5f38848728ddfb16c20ef8f5/html5/thumbnails/106.jpg)
Too many ways to order the tree
How many distinct decision trees with n Boolean attributes?
= number of Boolean functions
= number of distinct truth tables with 2^n rows = 2^(2^n)
• E.g., with 6 Boolean attributes, there are 18,446,744,073,709,551,616 trees
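The count is straightforward to reproduce:

```python
# Number of Boolean functions of n attributes = number of truth tables
# with 2**n rows, each row labelled T or F: 2**(2**n).
def num_boolean_functions(n):
    return 2 ** (2 ** n)

print(num_boolean_functions(6))  # → 18446744073709551616
```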
![Page 107: Learning from Observations · Inductive learning • Simplest form: learn a function from examples An example is a pair (x, y) : x = data, y = outcome](https://reader034.fdocuments.in/reader034/viewer/2022042908/5f38848728ddfb16c20ef8f5/html5/thumbnails/107.jpg)
Hypothesis spaces
How many distinct decision trees with n Boolean attributes?
= number of Boolean functions
= number of distinct truth tables with 2^n rows = 2^(2^n)
• E.g., with 6 Boolean attributes, there are 18,446,744,073,709,551,616 trees
How many purely conjunctive hypotheses (e.g., Hungry ∧ ¬Rain)?• Each attribute can be in (positive), in (negative), or out
⇒ 3^n distinct conjunctive hypotheses• More expressive hypothesis space
– increases chance that target function can be expressed– increases number of hypotheses consistent with training set
⇒ may get worse predictions
![Page 108: Learning from Observations · Inductive learning • Simplest form: learn a function from examples An example is a pair (x, y) : x = data, y = outcome](https://reader034.fdocuments.in/reader034/viewer/2022042908/5f38848728ddfb16c20ef8f5/html5/thumbnails/108.jpg)
Information gain
For the training set, p = n = 6, I(6/12, 6/12) = 1 bit
Consider the attributes Patrons and Type (and others too):
Patrons has the highest IG of all attributes and so is chosen by the DTL algorithm as the root
IG(Patrons) = 1 − [ (2/12) I(0, 1) + (4/12) I(1, 0) + (6/12) I(2/6, 4/6) ] = 0.541 bits

IG(Type) = 1 − [ (2/12) I(1/2, 1/2) + (2/12) I(1/2, 1/2) + (4/12) I(2/4, 2/4) + (4/12) I(2/4, 2/4) ] = 0 bits
![Page 109: Learning from Observations · Inductive learning • Simplest form: learn a function from examples An example is a pair (x, y) : x = data, y = outcome](https://reader034.fdocuments.in/reader034/viewer/2022042908/5f38848728ddfb16c20ef8f5/html5/thumbnails/109.jpg)
Decision tree learning
• Aim: find a small tree consistent with the training examples
• Idea: (recursively) choose "most significant" attribute as root of (sub)tree
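The recursive idea above can be sketched in a few lines of Python. This is a minimal illustration, not R&N's exact pseudocode; examples are assumed to be (attribute-dict, label) pairs, and the names are illustrative:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def dtl(examples, attributes, default=None):
    """Recursively choose the attribute with the largest information gain
    as the root of each (sub)tree; leaves hold (majority) labels."""
    if not examples:
        return default
    labels = [y for _, y in examples]
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]  # leaf
    def gain(a):
        rem = 0.0
        for v in {x[a] for x, _ in examples}:
            subset = [y for x, y in examples if x[a] == v]
            rem += len(subset) / len(examples) * entropy(subset)
        return entropy(labels) - rem
    best = max(attributes, key=gain)
    majority = Counter(labels).most_common(1)[0][0]
    tree = {}
    for v in {x[best] for x, _ in examples}:
        sub = [(x, y) for x, y in examples if x[best] == v]
        tree[v] = dtl(sub, [a for a in attributes if a != best], default=majority)
    return (best, tree)
```

A tree comes back as a nested `(attribute, {value: subtree})` pair, with labels at the leaves.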
![Page 110: Learning from Observations · Inductive learning • Simplest form: learn a function from examples An example is a pair (x, y) : x = data, y = outcome](https://reader034.fdocuments.in/reader034/viewer/2022042908/5f38848728ddfb16c20ef8f5/html5/thumbnails/110.jpg)
Example contd.
• Decision tree learned from the 12 examples:
• Substantially simpler than the "true" tree: a more complex hypothesis isn't justified by the small amount of data
![Page 111: Learning from Observations · Inductive learning • Simplest form: learn a function from examples An example is a pair (x, y) : x = data, y = outcome](https://reader034.fdocuments.in/reader034/viewer/2022042908/5f38848728ddfb16c20ef8f5/html5/thumbnails/111.jpg)
Decision Theory
![Page 112: Learning from Observations · Inductive learning • Simplest form: learn a function from examples An example is a pair (x, y) : x = data, y = outcome](https://reader034.fdocuments.in/reader034/viewer/2022042908/5f38848728ddfb16c20ef8f5/html5/thumbnails/112.jpg)
Decision Theory
Inference step
Determine either p(x, Cₖ) or p(Cₖ | x).
Decision step
For given x, determine the optimal t.
![Page 113: Learning from Observations · Inductive learning • Simplest form: learn a function from examples An example is a pair (x, y) : x = data, y = outcome](https://reader034.fdocuments.in/reader034/viewer/2022042908/5f38848728ddfb16c20ef8f5/html5/thumbnails/113.jpg)
Minimum Expected Loss
Example: classify medical images as ‘cancer’ or ‘normal’
Loss matrix L (rows = truth, columns = decision)
![Page 114: Learning from Observations · Inductive learning • Simplest form: learn a function from examples An example is a pair (x, y) : x = data, y = outcome](https://reader034.fdocuments.in/reader034/viewer/2022042908/5f38848728ddfb16c20ef8f5/html5/thumbnails/114.jpg)
Minimum Expected Loss
Decision regions are chosen to minimize the expected loss
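For a single input this reduces to picking the class with the smallest posterior-weighted loss. A small sketch of the rule for the cancer/normal example; the loss values and names are illustrative assumptions, not from the slides:

```python
# Loss matrix: keys are (truth, decision) pairs; missing a cancer is costly.
LOSS = {('cancer', 'cancer'): 0, ('cancer', 'normal'): 1000,
        ('normal', 'cancer'): 1, ('normal', 'normal'): 0}

def best_decision(p_cancer):
    """Choose the class minimizing expected loss, given p(cancer | x)."""
    posterior = {'cancer': p_cancer, 'normal': 1 - p_cancer}
    def expected_loss(decision):
        return sum(posterior[truth] * LOSS[(truth, decision)]
                   for truth in posterior)
    return min(['cancer', 'normal'], key=expected_loss)

print(best_decision(0.01))    # even a 1% cancer probability triggers 'cancer'
print(best_decision(0.0005))  # only a tiny probability justifies 'normal'
```

The asymmetric losses shift the decision boundary far from the 50-50 point, which is exactly why the loss matrix matters.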
![Page 115: Learning from Observations · Inductive learning • Simplest form: learn a function from examples An example is a pair (x, y) : x = data, y = outcome](https://reader034.fdocuments.in/reader034/viewer/2022042908/5f38848728ddfb16c20ef8f5/html5/thumbnails/115.jpg)
Reject Option
![Page 116: Learning from Observations · Inductive learning • Simplest form: learn a function from examples An example is a pair (x, y) : x = data, y = outcome](https://reader034.fdocuments.in/reader034/viewer/2022042908/5f38848728ddfb16c20ef8f5/html5/thumbnails/116.jpg)
Why Separate Inference and Decision?• Minimizing risk (loss matrix may change over time)• Reject option• Unbalanced class priors• Combining models
![Page 117: Learning from Observations · Inductive learning • Simplest form: learn a function from examples An example is a pair (x, y) : x = data, y = outcome](https://reader034.fdocuments.in/reader034/viewer/2022042908/5f38848728ddfb16c20ef8f5/html5/thumbnails/117.jpg)
Decision Theory for Regression
Inference step
Determine p(t | x).
Decision step
For given x, make the optimal prediction, y(x), for t.
Loss function: L(t, y(x))
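Under squared loss L(t, y) = (y − t)², the optimal prediction for a given x is the conditional mean E[t | x]. A quick empirical check with made-up samples of t for some fixed x:

```python
samples = [1.0, 2.0, 3.0, 10.0]  # illustrative draws of t for a fixed x
mean = sum(samples) / len(samples)

def avg_sq_loss(y):
    """Average squared loss of predicting y for every sampled t."""
    return sum((y - t) ** 2 for t in samples) / len(samples)

# The mean incurs no more average loss than any other candidate prediction.
assert all(avg_sq_loss(mean) <= avg_sq_loss(y) for y in [0.0, 2.0, 4.0, 10.0])
print(mean)  # → 4.0
```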
![Page 118: Learning from Observations · Inductive learning • Simplest form: learn a function from examples An example is a pair (x, y) : x = data, y = outcome](https://reader034.fdocuments.in/reader034/viewer/2022042908/5f38848728ddfb16c20ef8f5/html5/thumbnails/118.jpg)
The Squared Loss Function