Sedat Bikent NN-Part2sedat/CS484_555/Sedat_Bikent_NN... · 2020. 2. 29. · Title:...
Transcript of Sedat Bikent NN-Part2sedat/CS484_555/Sedat_Bikent_NN... · 2020. 2. 29. · Title:...
Introduction to Computer Vision
Dr. Sedat Ozer
Introduction to Neural Networks: Part2
Use of Python
• Both GPU and CPU has parallelization instructions and many Python built-in functions supports that. • Use built-in functions for matrix (or vector) operations and avoid
using for loops in your code whenever you can!
2Lecture Notes for Computer Vision Sedat Ozer
Cost function and GD implementation
3
! ", $ =1'()*+
,
ℒ(/0 ) , 0()))
2ℒ(3,4)
25+
= 61(7 − 0)
2ℒ(3,4)
29= 7 − 0
2ℒ(3,4)
25:
= 62(7 − 0)
<! ", $<"1
=1'()*+
,<ℒ a
(>), 0(>)
<"1
<! ", $<"2
=1'()*+
,<ℒ a
(>), 0(>)
<"2
<! ", $<$
=1'()*+
,<ℒ a
(>), 0(>)
<$
Remember the loss for single sample:
Cost Function:
Final derivatives to be used:! d"1 d"2 d$ ∝ =0.00001
i +E>= FGH
i+ $
7>= J E
(>)
! += −[0())log 7 ) (1 − 0())) log(1 − 7 ) )]dE
>= 7
>−0())
d"1 += 6+()) dE
>
d"2 += 6:()) dE
>
d$ += dE>
! ! d"1 d"1
d"2 d"2 d$ d$"1 "1 ∝ d"1"2 "2 ∝ d"2$ b ∝ db
Implement all that:
x2
x1
/0w1
w2
b
z=wTx + b J (z)
Lecture Notes for Computer Vision Sedat Ozer
Lets Re-implement the Logistic Regression
4
Xnxm=x1 xm… features
samples
Remember:
Z= z1 zm…= #$% + [(, (, … , (]
1xm
Implementation for Z: Z = ) ( +
A= a1 am…=-(Z)
Y= y1, y2
, …,ym
Note that in this particular example, the output is a vector. Therefore, the matrix Z (thus, the matrix A) becomes a vector.
Lecture Notes for Computer Vision Sedat Ozer
(Training) Data
A Better Implementation in Python
5
! d#1 d#2 d&
i +()= +,-
i+ &
/)= 0((
()))
! += −[5(6)log / 6 (1 − 5(6)) log(1 − / 6 )]
d()= /
)−5(6)
d#1 += ;<(6)d(
)
d#2 += ;=(6)d(
)
d& += d()
! ! d#1 d#1
d#2 d#2 d& d&
#1 #1 ∝ d#1
#2 #2 ∝ d#2
& b ∝ db
@ = +,A + &A= 0(@)
dZ = A– Y
d +=(1/m)X dZT
dK =(1/m)np.sum(dZ)
+ + ∝ d+
K K ∝ dK
One iteration of gradient descent One iteration of gradient descent
Lecture Notes for Computer Vision Sedat Ozer
Neural “Networks”
6Lecture Notes for Computer Vision Sedat Ozer
Neural Networks
• So far, we have seen only a “single neuron” (with two models)!• Linear model• Logistic regression model
• The performance of a single unit (single neuron) is limited!• Higher performance can be achieved by forming a network of
multiple neurons.
7Lecture Notes for Computer Vision Sedat Ozer
!"!#!$
%&
x
w
b
' = )*+ + - . = /(') ℒ(., &)
x
4["]
7["]'["] = 4["]+ + 7["] 8["] = /(9["]) '[#] = 4[#]:["] + -[#] .[#] = /('[#]) ℒ(.[#], &)
!"!#!$
%&
;[#]
-[#]
A Neuron vs. A Neural Network
<=>.?@
<=>.?@Lecture Notes for Computer Vision Sedat Ozer
A 2-layer FC-NN Example:
9
!"
!#
!$
!%
&'!(
Input Layer(L0)
Hidden Layer(L1)
Output Layer(L2)
a[1]x = a[0]
&' = a[2]
Each layer has its own weights and bias values. So… the kth layer would have: W[k] and b[k]
W[1] is a (3x5) matrix,b[1] is a (3x1) vector
W[2] is a (1x3) matrix (i.e., a row vector),b[2] is a (1x1) vector
&' = a[2]= )"[#]
1x1
&'11x1
=
-"["]=
.""
.#"
.("
.$"
.%"5x1
a vector
Layer
Unit
(Reads: the weight vector of the 1st unit at the first layer)
W[1] =."" .#" .(" .$" .%"."# .## .(# .$# .%#."( .#( .(( .$( .%(
1st unit in L1 (-"" /)
2nd unit in L1 (-#" /)
3rd unit in L1 (-(" /)
3x5
The Weight Matrix for the entire layer 1:
a[1]=
)"["]
)#["])(["]
3x1
Lecture Notes for Computer Vision Sedat Ozer
[1]
What if I have more than 2 classes?• In logistic regression we assumed that we had only two classes: Class 0 & Class 1.• What if I have more than two classes?• One typical approach is: using one vs. all approach.
• (where, for example, you first consider Class 0 as one class and the combination of all the other classes as the “other” class. Then you consider Class 1 as one class and the combination of all the other classes as the “other class”,...)
• Another approach might be using Softmax Regression instead of logistic regression.• Define a new cost function and derive all the weight update rules according to that cost
function.
10Lecture Notes for Computer Vision Sedat Ozer
Softmax Regression
• Remember Logistic Regression: We had only two classes (Class 0 and Class 1, i.e., y(i)∈{0,1}).• Softmax Regression is the case where we have K classes ( K > 2 ) such
that: y(i)∈{1,…,K}. • Sigmoid function is no longer being used.• Now the output can take K different values rather than just two.
11
" #, % = 1()
*+,
-ℒ(a(1), 3(1))ℒ a, 3 = −)
*+,
K31567(8*)89 = :39 = 7(;9) =
<=>∑*+,@ <=A
Softmax Loss Function Softmax Cost FunctionSoftmax Activation Function
Lecture Notes for Computer Vision Sedat Ozer
Lets have a look at an example:
13
!"#$%&"' (&) : +,
&'#"-. − 0.$!%ℎ: +2
ℎ"34. 4&(.: +5# 7.89""-4: +:
# 7$%ℎ9""-4: +; <$-&!= 4&(.
n: total number of features = 6
!"#$! $-.'&%&.4
4#ℎ""! >3$!&%=
0$!?$7&!&%=
)")3!$%&"': +@ A=1(-$9?.% )")3!$9&%=)Discrete value: popular/ not
Lecture Notes for Computer Vision Sedat Ozer
A Fully Connected (FC) Neural Network (6 inputs, 1 output)
14
!"
!#
!$!%
&'
!(
n: total number of features = 6
!)
Lecture Notes for Computer Vision Sedat Ozer
A Fully Connected (FC) Neural Network (6 inputs, 2 outputs)
15
!"
!#
!$!%
&'1
!(
n: total number of features = 6
!)&'2
a “unit” (a neuron)
Lecture Notes for Computer Vision Sedat Ozer
A 2-layer FC-NN Example:
16
!"
!#
!$
!%
&'!(
Input Layer(L0)
Hidden Layer(L1)
Output Layer(L2)
a[1]x = a[0]
&' = a[2]
a[1] =
)"["]
)#["])(["]
3x1
&' = a[2]= )"[#]1x1
&'11x1
=
W[1] =-"" -#" -(" -$" -%"-"# -## -(# -$# -%#-"( -#( -(( -$( -%(
3x5
The Weight Matrix for the entire layer 1:
=.(0"["]).(0#["]).(0(["])
3x1
= .(z[1]) 2"["]
2#["]2(["]
-"" -#" -(" -$" -%"-"# -## -(# -$# -%#-"( -#( -(( -$( -%(
!1!2!3!4!5
+ =0"["]
0#["]0(["] 3x13x1
5x13x5
x
z[1]= W[1] x + b[1] =
Lecture Notes for Computer Vision Sedat Ozer
17
!"
!#
!$
!%
&'!(
Input Layer(L0)
Hidden Layer(L1)
Output Layer(L2)
a[1]x = a[0]
&' = a[2]
a[1] =
)"["]
)#["])(["]
3x1
&' = a[2]= )"[#]1x1
&'11x1
=
-"["]
-#["]-(["]
z[1]= W[1] x + b[1] =."" .#" .(" .$" .%"."# .## .(# .$# .%#."( .#( .(( .$( .%(
!1!2!3!4!5
+ =3"["]
3#["]3(["] 3x13x1
5x13x5
=4(3"["])4(3#["])4(3(["])
3x1
= 4(z[1])
x
z[1] =W[1] x + b[1]
a[1] = 4(z[1])
z[2] = W[2] a[1] + b[2]
a[2] = 4(z[2])
Steps to compute the output for logistic regression for one input sample:
Lecture Notes for Computer Vision Sedat Ozer
A 2-layer FC-NN Example:
Computation with less for-loop
18
for i = 1 to m! " ($) = ' " (($) + * "
+ " ($) = ,(! " $ )- . ($) = ' . + " ($) + / .
0 . ($) = ,(- . $ )
1 " = 2 " 3 + * "
4 " = ,(1 " )1 . = 2 . 4 " + * .
4 . = ,(1 . )
…5 = 6(") 6(.) 6(7)
0["](.)A["] = 0["](") 0["](7)…
m = number of training samples
6"6.6;
<=
Algorithm 1:
Algorithm 2:
(> " has the same shape as A["]) Lecture Notes for Computer Vision Sedat Ozer
Another 2 layer FC NN Example
19
!"
!#
!$
!%
&"["]
)*1
!+
!,
)*2
Input LayerWith 6 features
Hidden Layerwith 4 units
Output Layerwith 2 units
a[1]x = a[0]
)- = a[2]
&#["]
&,["]
&$["]
&"[#]
&#[#]
A 2 layer NN with 6 inputs and 2 outputs
a[1]=
&"["]
&#["]&,["]
&$["]4x1
)- = a[2]= &"[#]
&#[#]2x1
)*1)*2
2x1
=
QUESTION: What are the dims of W[1] and W[2]?Dims = (a x b); a=? b=?
a=4 and b=6 for W[1]
Lecture Notes for Computer Vision Sedat Ozer
20
zSigmoid:
z
z
z
Common Activation Functions
ReLU:
tanh: Leaky ReLU:
Lecture Notes for Computer Vision Sedat Ozer
a(z)a(z)
! = # $ = max(0, $)
! = # $ = ,- − ,/-,- + ,/-
! = # $ = max(0.01$, $)
! = # $ = 11 + ,/-
a(z)a(z)
Activation Function as: g(z)
21
! " = $ " % + ' "
( " = )(! " )! , = $ , ( " + ' ,
( , = )(! , )
Algorithm 2:
! " = $ " % + ' "
( " = -(! " )! , = $ , ( " + ' ,
( , = -(! , )
Algorithm 2:
Lecture Notes for Computer Vision Sedat Ozer
With or Without the Activation Function
22
Lets have a look at the case where we do not use any activation function. (That is also equivalent to setting !(z[1]) = z[1] )
#$ = a[2]= %&[(]1x1
#$11x1
=
z[1] =W[1] x + b[1]
a[1] =!(z[1])
z[2] = W[2] a[1] + b[2]
a[2] =!(z[2])
#$ = z[2]= W x + b
z[1] =W[1] x + b[1]
a[1] = z[1]
z[2] = W[2] z[1] + b[2]
a[2] = z[2] = W[2] z[1] + b[2]
= W[2] [W[1] x + b[1]]+ b[2]
= W[2] W[1] x + W[2] b[1]+ b[2]
= W x + b
The output is always a linear function of the input!
Lecture Notes for Computer Vision Sedat Ozer
+&+(+,
#$
• Remember that the updating process of the parameters depends on the derivatives!• That also depends on the derivative of the chosen activation function!• (we used sigmoid function previously in our logistic regression
implementation).
23Lecture Notes for Computer Vision Sedat Ozer
Derivatives for the Activation Functions
24
a(z)
z
z
a(z)
z
a(z)
z
a(z)Derivatives for the Activation Functions
ReLU: ! = # $ = max(0, $)
tanh: ! = # $ = ,- − ,/-,- + ,/-
Leaky ReLU: ! = # $ = max(0.01$, $)
Sigmoid: ! = # $ = 11 + ,/-
#3 - = 4# $4$ = # $ (1 − # $ ) = !(1 − !)
#3 - = 4# $4$ == (1 − (tanh $ )2) = (1 − !2)
#3 - = 4# $4$ =
0, 9: $ < 01, 9: $ > 0
=>4,:9>,4, 9: $ = 0
#3 - = 4# $4$ = 0.01, 9: $ < 0
1, 9: $ ≥ 0Lecture Notes for Computer Vision Sedat Ozer
25
![#]% + '[#] = ([#]
%
![#]
'[#]
) ( # = +[#] ℒ(+[.], y)![.]+[#] + '[.] = ([.] ) ( . = +[.]
![.]
'[.]
2([.] = +[.] − 4
2![.] = 2([.]+ # 5
2'[.] = 2([.]
2([#] = ! . 62([.] ∗ 8[#]′(z # )
2![#] = 2([#]%6
2'[#] = 2([#]
2;[.] = <[.] − =
2![.] =1?2;[.]< # 5
2'[.] =1?@A. CD?(2; . , +%EC = 1, FGGA2E?C = HIDG)
2;[#] = ! . 62;[.] ∗ 8[#]′(Z # )
2![#] =1?2;[#]K6
2'[#] =1?@A. CD?(2; # , +%EC = 1, FGGA2E?C = HIDG)
Lecture Notes for Computer Vision Sedat Ozer
%#%.%L
M4