Andrew Ng: Linear regression with one variable, Model representation (Machine Learning).
Model representation
[Figure: Housing Prices (Portland, OR). Scatter plot of Price (in 1000s of dollars) vs. Size (feet^2); sizes roughly 500–3000 feet^2, prices roughly $100,000–$500,000.]
Supervised Learning: given the "right answer" for each example in the data.
Regression problem: predict a real-valued output. (Classification: discrete-valued output.)
Example from the plot: for a house of size 1250 feet^2, the fitted line predicts a price of about 220 (in $1000s).
Notation:
m = number of training examples
x = "input" variable / features
y = "output" variable / "target" variable

Training set of housing prices (Portland, OR):

Size in feet^2 (x) | Price ($) in 1000's (y)
2104               | 460
1416               | 232
1534               | 315
852                | 178
…                  | …

(x, y) – one training example
(x^(i), y^(i)) – the i-th training example
x^(1) = 2104, x^(2) = 1416, y^(1) = 460; m is the number of rows in the training set.
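As a sketch, the notation above might look like this in Python. This is an illustration, not course code; the helper `example` is hypothetical, and only the four rows shown above are used.

```python
# The four training examples shown above (the full Portland dataset is larger).
x = [2104, 1416, 1534, 852]   # sizes in feet^2 ("input" variable)
y = [460, 232, 315, 178]      # prices in $1000s ("output"/"target" variable)

m = len(x)                    # m = number of training examples

# (x^(i), y^(i)) is the i-th training example. The slides index from 1,
# while Python lists index from 0, so we shift by one.
def example(i):
    return (x[i - 1], y[i - 1])

print(m)              # 4
print(example(1))     # (2104, 460)
print(example(2)[0])  # x^(2) = 1416
```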
Training Set → Learning Algorithm → h

h (the "hypothesis") maps from x's to y's: size of house (x) → h → estimated price (estimated value of y).

How do we represent h? For linear regression with one variable:
h_θ(x) = θ0 + θ1·x
This model is called univariate linear regression ("univariate" = one variable).
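A minimal Python sketch of such a hypothesis function. The parameter values in the usage line are illustrative only (they reappear later in the housing example).

```python
# Hypothesis for univariate linear regression: h_theta(x) = theta0 + theta1 * x.
def h(x, theta0, theta1):
    """Predicted price (in $1000s) for a house of size x (in feet^2)."""
    return theta0 + theta1 * x

# Illustrative parameters: theta0 = 50, theta1 = 0.06.
print(h(1250, 50, 0.06))  # 125.0
```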
Cost function
Training set: the housing data above (Size in feet^2 (x), Price ($) in 1000's (y)).
Hypothesis: h_θ(x) = θ0 + θ1·x
θ0, θ1: the parameters of the model.
How do we choose the θ's?
[Figure: three example hypotheses, each plotted on axes from 0 to 3.]
h(x) = 1.5 + 0·x   (θ0 = 1.5, θ1 = 0: a horizontal line)
h(x) = 0.5·x       (θ0 = 0, θ1 = 0.5)
h(x) = 1 + 0.5·x   (θ0 = 1, θ1 = 0.5)
Idea: choose θ0, θ1 so that h_θ(x) is close to y for our training examples (x^(i), y^(i)), where h_θ(x) = θ0 + θ1·x.

J(θ0, θ1) = (1/2m) · Σ_{i=1..m} (h_θ(x^(i)) − y^(i))^2

Goal: minimize J(θ0, θ1) over θ0, θ1.
J is the cost function; this particular J is called the squared error function.
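The cost function can be sketched in Python as follows. This is an illustration under stated assumptions: `xs`/`ys` are the four table rows shown earlier, and evaluating at θ0 = θ1 = 0 is an arbitrary choice to show the function running.

```python
def h(x, theta0, theta1):
    # Hypothesis: h_theta(x) = theta0 + theta1 * x
    return theta0 + theta1 * x

def J(theta0, theta1, xs, ys):
    # Squared error cost: (1/2m) * sum over i of (h(x^(i)) - y^(i))^2
    m = len(xs)
    return sum((h(x, theta0, theta1) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

xs = [2104, 1416, 1534, 852]
ys = [460, 232, 315, 178]

# With theta0 = theta1 = 0 every prediction is 0, so the cost is large.
print(J(0, 0, xs, ys))
```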
Cost function intuition I
Hypothesis: h_θ(x) = θ0 + θ1·x
Parameters: θ0, θ1
Cost function: J(θ0, θ1) = (1/2m) · Σ_{i=1..m} (h_θ(x^(i)) − y^(i))^2
Goal: minimize J(θ0, θ1).

Simplified version: fix θ0 = 0, so h_θ(x) = θ1·x and the cost is a function of a single parameter, J(θ1).
[Figure: left, the toy dataset (1,1), (2,2), (3,3) with the line h_θ(x) = x; right, J(θ1) plotted against θ1. For fixed θ1, h_θ(x) is a function of x; J is a function of the parameter θ1.]

For θ1 = 1:
J(1) = (1/2m) · Σ_{i=1..m} (h_θ(x^(i)) − y^(i))^2
     = (1/(2·3)) · (0^2 + 0^2 + 0^2)
     = 0
So J(1) = 0: this line passes through every training point.
For θ1 = 0.5 (the line h_θ(x) = 0.5·x, which underestimates each y^(i)):
J(0.5) = (1/(2·3)) · ((0.5 − 1)^2 + (1 − 2)^2 + (1.5 − 3)^2)
       = (1/6) · (3.5) ≈ 0.58
For θ1 = 0 (a horizontal line through the origin):
J(0) = (1/(2·3)) · (1^2 + 2^2 + 3^2) = 14/6 ≈ 2.33

Plotting these values (and others) against θ1 traces out a bowl-shaped curve whose minimum is at θ1 = 1.
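The three worked values can be checked numerically. A sketch, assuming the toy dataset (1,1), (2,2), (3,3) implied by the plots and the reported values 0, 0.58, and 2.3:

```python
# Toy dataset from the intuition slides, with theta0 fixed at 0.
xs, ys = [1, 2, 3], [1, 2, 3]
m = len(xs)

def J(theta1):
    # Simplified cost: J(theta1) = (1/2m) * sum of (theta1*x^(i) - y^(i))^2
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

print(J(1.0))            # 0.0
print(round(J(0.5), 2))  # 0.58
print(round(J(0.0), 2))  # 2.33
```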
Cost function intuition II
Hypothesis: h_θ(x) = θ0 + θ1·x
Parameters: θ0, θ1
Cost function: J(θ0, θ1) = (1/2m) · Σ_{i=1..m} (h_θ(x^(i)) − y^(i))^2
Goal: minimize J(θ0, θ1).
[Figure: left, the housing data (Price ($) in 1000's vs. Size in feet^2) with the line h_θ(x) for θ0 = 50, θ1 = 0.06; right, the 3D surface of J(θ0, θ1) over both parameters. For fixed θ0, θ1, h_θ(x) is a function of x; J is a function of the parameters θ0, θ1.]
Contour plots

[Figure: a contour plot of J(θ0, θ1); each ellipse is a set of (θ0, θ1) values with equal cost J.]

Example: h(x) = 360 + 0·x, i.e. θ0 = 360, θ1 = 0, is a single point on the contour plot; the corresponding horizontal line is a poor fit to the housing data.

[Further figures: as the chosen point (θ0, θ1) moves toward the center of the contours, J decreases and the corresponding line fits the data better.]
Gradient descent
Have some function J(θ0, θ1).
Want: min over θ0, θ1 of J(θ0, θ1).

Outline:
• Start with some θ0, θ1.
• Keep changing θ0, θ1 to reduce J(θ0, θ1) until we hopefully end up at a minimum.
[Figures: the 3D surface of J(θ0, θ1). Starting from different initial points, gradient descent can descend to different local minima.]
Gradient descent algorithm:

repeat until convergence {
    θj := θj − α · ∂/∂θj J(θ0, θ1)    (for j = 0 and j = 1)
}

Here ":=" denotes assignment (a := b), and α is the learning rate.

Correct (simultaneous update):
    temp0 := θ0 − α · ∂/∂θ0 J(θ0, θ1)
    temp1 := θ1 − α · ∂/∂θ1 J(θ0, θ1)
    θ0 := temp0
    θ1 := temp1

Incorrect: assigning θ0 first and then using the updated θ0 when computing θ1's update.
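The simultaneous-update rule can be sketched as follows. This is an illustration, not course code: it approximates the partial derivatives numerically with central differences, and uses the toy cost J(θ0, θ1) = θ0² + θ1² (neither appears in the slides).

```python
def partial(Jfun, j, t0, t1, eps=1e-6):
    # Central-difference approximation of dJ/dtheta_j at (t0, t1).
    if j == 0:
        return (Jfun(t0 + eps, t1) - Jfun(t0 - eps, t1)) / (2 * eps)
    return (Jfun(t0, t1 + eps) - Jfun(t0, t1 - eps)) / (2 * eps)

def step(Jfun, t0, t1, alpha):
    # Correct: compute BOTH updates from the old values, then assign.
    temp0 = t0 - alpha * partial(Jfun, 0, t0, t1)
    temp1 = t1 - alpha * partial(Jfun, 1, t0, t1)
    return temp0, temp1
    # Incorrect would be: update t0 first, then use the NEW t0 for t1's update.

# Toy cost with its minimum at (0, 0).
Jfun = lambda t0, t1: t0**2 + t1**2
t0, t1 = 1.0, 2.0
for _ in range(100):
    t0, t1 = step(Jfun, t0, t1, alpha=0.1)
print(t0, t1)  # both near 0
```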
Gradient descent intuition
Gradient descent algorithm (one parameter): θ1 := θ1 − α · d/dθ1 J(θ1)
α is the learning rate; d/dθ1 J(θ1) is the derivative term.
θ1 := θ1 − α · (d/dθ1 J(θ1))
If the slope is positive (d/dθ1 J(θ1) ≥ 0), the update decreases θ1, moving it left toward the minimum.
If the slope is negative (d/dθ1 J(θ1) ≤ 0), the update increases θ1, moving it right toward the minimum.
If α is too small, gradient descent can be slow.
If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
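A tiny numerical illustration of these two failure modes, using the hypothetical one-parameter cost J(θ) = θ² (derivative 2θ), which is not from the slides:

```python
def run(alpha, steps=10, theta=1.0):
    # Gradient descent on J(theta) = theta^2, whose derivative is 2*theta.
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return theta

print(run(0.01))  # too small: after 10 steps, still far from the minimum at 0
print(run(0.4))   # moderate: very close to 0
print(run(1.1))   # too large: |theta| grows each step -> diverges
```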
At a local optimum the derivative is zero, so the update θ1 := θ1 − α·0 leaves the current value of θ1 unchanged.
Gradient descent can converge to a local minimum, even with the learning rate α fixed.
As we approach a local minimum, gradient descent will automatically take smaller steps. So, no need to decrease α over time.
Gradient descent for linear regression
Gradient descent algorithm, applied to the linear regression model:
h_θ(x) = θ0 + θ1·x,   J(θ0, θ1) = (1/2m) · Σ_{i=1..m} (h_θ(x^(i)) − y^(i))^2

Working out the derivative term:
∂/∂θ0 J(θ0, θ1) = (1/m) · Σ_{i=1..m} (h_θ(x^(i)) − y^(i))
∂/∂θ1 J(θ0, θ1) = (1/m) · Σ_{i=1..m} (h_θ(x^(i)) − y^(i)) · x^(i)
Gradient descent algorithm:

repeat until convergence {
    θ0 := θ0 − α · (1/m) · Σ_{i=1..m} (h_θ(x^(i)) − y^(i))
    θ1 := θ1 − α · (1/m) · Σ_{i=1..m} (h_θ(x^(i)) − y^(i)) · x^(i)
}   (update θ0 and θ1 simultaneously)
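Putting the two update rules together gives batch gradient descent for the housing data. A sketch in Python, assuming the four rows from earlier with sizes rescaled to thousands of feet² (a hypothetical scaling, chosen so a simple fixed α converges):

```python
# Four housing examples; sizes divided by 1000 for conditioning (illustrative).
xs = [2.104, 1.416, 1.534, 0.852]  # size in 1000s of feet^2
ys = [460.0, 232.0, 315.0, 178.0]  # price in $1000s
m = len(xs)

theta0, theta1 = 0.0, 0.0
alpha = 0.1
for _ in range(10000):
    # Prediction errors h_theta(x^(i)) - y^(i) for all m examples.
    err = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    grad0 = sum(err) / m                             # dJ/dtheta0
    grad1 = sum(e * x for e, x in zip(err, xs)) / m  # dJ/dtheta1
    # Simultaneous update of both parameters.
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

print(theta0, theta1)  # close to the least-squares fit for this data
```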
[Figure: the 3D surface of J(θ0, θ1) for linear regression.]

For linear regression, the squared-error cost J is a convex (bowl-shaped) function, so it has a single global minimum and no other local optima; gradient descent with a suitable α always converges to that global minimum.
[Figure sequence: successive steps of gradient descent. In each frame, the left panel shows h_θ(x) fitted to the housing data (for fixed θ0, θ1, a function of x); the right panel shows the corresponding point on the contour plot of J(θ0, θ1) (a function of the parameters) moving toward the minimum.]
“Batch” Gradient Descent
“Batch”: Each step of gradient descent uses all the training examples.