Page 1:

Introduction to Computer Vision

Dr. Sedat Ozer

Introduction to Neural Networks: Part 2

Page 2:

Use of Python

• Both GPUs and CPUs have parallelization instructions, and many Python built-in functions support them.
• Use built-in functions for matrix (or vector) operations and avoid using for loops in your code whenever you can!
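As a quick illustration (my own example, not from the slides), the snippet below computes the same dot product with an explicit Python loop and with NumPy's built-in np.dot; the built-in call runs in optimized native code and is typically orders of magnitude faster:

import numpy as np

n = 1_000_000
a = np.random.rand(n)
b = np.random.rand(n)

# Loop version: interpreted Python, one multiply-add at a time.
c_loop = 0.0
for i in range(n):
    c_loop += a[i] * b[i]

# Vectorized version: a single call into optimized, parallelized code.
c_vec = np.dot(a, b)

print(np.isclose(c_loop, c_vec))  # same result, up to float rounding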


Page 3:

Cost function and GD implementation

Remember the loss for a single sample:

L(a, y) = −[y log(a) + (1 − y) log(1 − a)]

Cost function:

J(w, b) = (1/m) Σ_{i=1..m} L(ŷ(i), y(i))

Derivatives of the loss (single sample):

∂L(a, y)/∂z = a − y
∂L(a, y)/∂w1 = x1(a − y)
∂L(a, y)/∂w2 = x2(a − y)

Derivatives of the cost function:

∂J(w, b)/∂w1 = (1/m) Σ_{i=1..m} ∂L(a(i), y(i))/∂w1
∂J(w, b)/∂w2 = (1/m) Σ_{i=1..m} ∂L(a(i), y(i))/∂w2
∂J(w, b)/∂b = (1/m) Σ_{i=1..m} ∂L(a(i), y(i))/∂b

Final derivatives to be used: dw1, dw2, db (with learning rate α = 0.00001).

One iteration of gradient descent:

J = 0; dw1 = 0; dw2 = 0; db = 0
for i = 1 to m:
    z(i) = wᵀx(i) + b
    a(i) = σ(z(i))
    J += −[y(i) log(a(i)) + (1 − y(i)) log(1 − a(i))]
    dz(i) = a(i) − y(i)
    dw1 += x1(i) dz(i)
    dw2 += x2(i) dz(i)
    db += dz(i)
J /= m; dw1 /= m; dw2 /= m; db /= m
w1 := w1 − α·dw1
w2 := w2 − α·dw2
b := b − α·db

Implement all that:

[Diagram: a single neuron with inputs x1 and x2, weights w1 and w2, and bias b, computing z = wᵀx + b and ŷ = σ(z).]


Page 4:

Let's Re-implement Logistic Regression

(Training) data:

X (n×m) = [x(1) … x(m)]   (each column is a sample; each row is a feature)

Remember:

Z = [z(1) … z(m)] = wᵀX + [b, b, …, b]   (1×m)

Implementation for Z: Z = np.dot(w.T, X) + b

A = [a(1) … a(m)] = σ(Z)

Y = [y(1), y(2), …, y(m)]

Note that in this particular example the output is a vector; therefore the matrix Z (and thus the matrix A) becomes a (1×m) row vector.


Page 5:

A Better Implementation in Python

For-loop version (one iteration of gradient descent):

J = 0; dw1 = 0; dw2 = 0; db = 0
for i = 1 to m:
    z(i) = wᵀx(i) + b
    a(i) = σ(z(i))
    J += −[y(i) log(a(i)) + (1 − y(i)) log(1 − a(i))]
    dz(i) = a(i) − y(i)
    dw1 += x1(i) dz(i)
    dw2 += x2(i) dz(i)
    db += dz(i)
J /= m; dw1 /= m; dw2 /= m; db /= m
w1 := w1 − α·dw1; w2 := w2 − α·dw2; b := b − α·db

Vectorized version (one iteration of gradient descent):

Z = wᵀX + b
A = σ(Z)
dZ = A − Y
dw = (1/m) X dZᵀ
db = (1/m) np.sum(dZ)
w := w − α·dw
b := b − α·db
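The vectorized update translates almost line-for-line into working NumPy. A sketch under my own naming assumptions (X is (n, m) with samples as columns, Y is (1, m)):

import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def gd_iteration_vectorized(X, Y, w, b, alpha=0.00001):
    m = X.shape[1]
    Z = np.dot(w.T, X) + b              # (1, m): all z(i) at once
    A = sigmoid(Z)                      # (1, m): all activations at once
    dZ = A - Y                          # (1, m)
    dw = (1.0 / m) * np.dot(X, dZ.T)    # (n, 1)
    db = (1.0 / m) * np.sum(dZ)         # scalar
    w = w - alpha * dw                  # gradient-descent updates
    b = b - alpha * db
    return w, b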


Page 6:

Neural “Networks”


Page 7:

Neural Networks

• So far, we have seen only a "single neuron" (with two models):
  • Linear model
  • Logistic regression model
• The performance of a single unit (a single neuron) is limited!
• Higher performance can be achieved by forming a network of multiple neurons.


Page 8:

!"!#!$

%&

x

w

b

' = )*+ + - . = /(') ℒ(., &)

x

4["]

7["]'["] = 4["]+ + 7["] 8["] = /(9["]) '[#] = 4[#]:["] + -[#] .[#] = /('[#]) ℒ(.[#], &)

!"!#!$

%&

;[#]

-[#]

A Neuron vs. A Neural Network

<=>.?@

<=>.?@Lecture Notes for Computer Vision Sedat Ozer

Page 9:

A 2-layer FC-NN Example:

[Diagram: input layer (L0) with x1…x5, hidden layer (L1) with 3 units, output layer (L2) with 1 unit producing ŷ.]

x = a[0]; the hidden layer produces a[1]; and ŷ = a[2].

Each layer has its own weights and bias values. So… the kth layer would have W[k] and b[k]:

W[1] is a (3×5) matrix, b[1] is a (3×1) vector.
W[2] is a (1×3) matrix (i.e., a row vector), b[2] is a (1×1) vector.

The weight vector of the 1st unit at the first layer (subscript: unit; bracketed superscript: layer):

w1[1] = [w11, w21, w31, w41, w51]ᵀ   (5×1, a vector)

The weight matrix for the entire layer 1 (one row per unit):

W[1] = | w11 w21 w31 w41 w51 |   ← 1st unit in L1 (w1[1]ᵀ)
       | w12 w22 w32 w42 w52 |   ← 2nd unit in L1 (w2[1]ᵀ)
       | w13 w23 w33 w43 w53 |   ← 3rd unit in L1 (w3[1]ᵀ)
(3×5)

a[1] = [a1[1], a2[1], a3[1]]ᵀ   (3×1)

ŷ = a[2] = a1[2]   (1×1)


Page 10:

What if I have more than 2 classes?

• In logistic regression we assumed that we had only two classes: Class 0 and Class 1.
• What if I have more than two classes?
• One typical approach is the one-vs-all approach, where, for example, you first consider Class 0 as one class and the combination of all the other classes as the "other" class; then you consider Class 1 as one class and the combination of all the other classes as the "other" class, and so on. (A minimal sketch follows this list.)
• Another approach might be using Softmax Regression instead of logistic regression: define a new cost function and derive all the weight update rules according to that cost function.
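A minimal one-vs-all sketch (illustrative code of mine; train_logistic is a hypothetical binary trainer returning (w, b), not something defined on the slides):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def one_vs_all_train(X, y, num_classes, train_logistic):
    models = []
    for k in range(num_classes):
        y_k = (y == k).astype(float)      # class k vs. "all the others"
        models.append(train_logistic(X, y_k))
    return models

def one_vs_all_predict(X, models):
    # Pick the class whose binary classifier is most confident.
    scores = np.stack([sigmoid(np.dot(X, w) + b) for (w, b) in models])
    return np.argmax(scores, axis=0)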


Page 11:

Softmax Regression

• Remember logistic regression: we had only two classes (Class 0 and Class 1, i.e., y(i) ∈ {0, 1}).
• Softmax Regression is the case where we have K classes (K > 2) such that y(i) ∈ {1, …, K}.
• The sigmoid function is no longer used.
• Now the output can take K different values rather than just two.

" #, % = 1()

*+,

-ℒ(a(1), 3(1))ℒ a, 3 = −)

*+,

K31567(8*)89 = :39 = 7(;9) =

<=>∑*+,@ <=A

Softmax Loss Function Softmax Cost FunctionSoftmax Activation Function
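A small NumPy sketch of these three pieces (illustrative; subtracting max(z) before exponentiating is a standard numerical-stability trick I have added, not something stated on the slide):

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))         # shift-invariant, avoids overflow
    return e / np.sum(e)              # a_k = e^(z_k) / sum_j e^(z_j)

def softmax_loss(a, y_onehot):
    # L(a, y) = -sum_k y_k log(a_k), with y one-hot over the K classes.
    return -np.sum(y_onehot * np.log(a))

# Example with K = 3 classes, true class at index 1:
z = np.array([2.0, 1.0, 0.1])
a = softmax(z)
print(a, softmax_loss(a, np.array([0.0, 1.0, 0.0])))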


Page 12:

Let's have a look at an example:

Features (n: total number of features = 6):

• location (zip): x1
• income/wealth: x2
• house size: x3
• # bedrooms: x4
• # bathrooms: x5
• population: x6

Intermediate notions (what hidden units might capture): family size, local amenities, school quality, walkability.

Output: y indicates market popularity (a discrete value: popular / not popular).


Page 13:

A Fully Connected (FC) Neural Network (6 inputs, 1 output)

[Diagram: a fully connected network with six inputs x1…x6 and a single output ŷ; n = total number of features = 6.]


Page 14:

A Fully Connected (FC) Neural Network (6 inputs, 2 outputs)

[Diagram: a fully connected network with six inputs x1…x6 and two outputs ŷ1 and ŷ2; each circle is a "unit" (a neuron); n = total number of features = 6.]


Page 15:

A 2-layer FC-NN Example:

[Diagram: input layer (L0) with x1…x5, hidden layer (L1) with 3 units, output layer (L2) producing ŷ.]

x = a[0],  a[1] = [a1[1], a2[1], a3[1]]ᵀ (3×1),  ŷ = a[2] = a1[2] (1×1)

The weight matrix for the entire layer 1:

W[1] = | w11 w21 w31 w41 w51 |
       | w12 w22 w32 w42 w52 |   (3×5)
       | w13 w23 w33 w43 w53 |

z[1] = W[1]x + b[1]:

| w11 w21 w31 w41 w51 |   | x1 |          | z1[1] |
| w12 w22 w32 w42 w52 | · | x2 | + b[1] = | z2[1] |
| w13 w23 w33 w43 w53 |   | x3 |          | z3[1] |
        (3×5)             | x4 |  (3×1)    (3×1)
                          | x5 |
                          (5×1)

a[1] = [σ(z1[1]), σ(z2[1]), σ(z3[1])]ᵀ = σ(z[1])   (3×1)


Page 16:

A 2-layer FC-NN Example:

[Diagram: input layer (L0) with x1…x5, hidden layer (L1) with 3 units, output layer (L2) producing ŷ.]

x = a[0],  a[1] = [a1[1], a2[1], a3[1]]ᵀ (3×1),  ŷ = a[2] = a1[2] (1×1)

z[1] = W[1]x + b[1]: a (3×5) matrix times a (5×1) vector, plus a (3×1) bias, giving the (3×1) vector [z1[1], z2[1], z3[1]]ᵀ.

a[1] = [σ(z1[1]), σ(z2[1]), σ(z3[1])]ᵀ = σ(z[1])   (3×1)

Steps to compute the output (a logistic-regression unit at each layer) for one input sample:

z[1] = W[1]x + b[1]
a[1] = σ(z[1])
z[2] = W[2]a[1] + b[2]
a[2] = σ(z[2])
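These four steps transcribe directly into NumPy. A sketch (the 5-3-1 layer sizes follow the example above; the random initialization is my own illustrative choice):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 5)), np.zeros((3, 1))  # layer 1: (3x5), (3x1)
W2, b2 = rng.normal(size=(1, 3)), np.zeros((1, 1))  # layer 2: (1x3), (1x1)

x = rng.normal(size=(5, 1))     # one input sample, (5x1)

z1 = np.dot(W1, x) + b1         # z[1] = W[1]x + b[1], (3x1)
a1 = sigmoid(z1)                # a[1] = sigma(z[1])
z2 = np.dot(W2, a1) + b2        # z[2] = W[2]a[1] + b[2], (1x1)
a2 = sigmoid(z2)                # y_hat = a[2]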

Page 17:

Computation with fewer for-loops

m = number of training samples.

Algorithm 1 (loop over the m samples):

for i = 1 to m:
    z[1](i) = W[1]x(i) + b[1]
    a[1](i) = σ(z[1](i))
    z[2](i) = W[2]a[1](i) + b[2]
    a[2](i) = σ(z[2](i))

Algorithm 2 (vectorized over the m samples):

Z[1] = W[1]X + b[1]
A[1] = σ(Z[1])
Z[2] = W[2]A[1] + b[2]
A[2] = σ(Z[2])

where X = [x(1) x(2) … x(m)] stacks the samples as columns, A[1] = [a[1](1) a[1](2) … a[1](m)], and Z[1] has the same shape as A[1].

[Diagram: a small network with inputs x1, x2, x3 and output ŷ.]

Page 18:

Another 2-Layer FC-NN Example

[Diagram: input layer with 6 features (x1…x6), hidden layer with 4 units (a1[1]…a4[1]), output layer with 2 units producing ŷ1 and ŷ2.]

A 2-layer NN with 6 inputs and 2 outputs:

x = a[0]

a[1] = [a1[1], a2[1], a3[1], a4[1]]ᵀ   (4×1)

ŷ = a[2] = [a1[2], a2[2]]ᵀ = [ŷ1, ŷ2]ᵀ   (2×1)

QUESTION: What are the dims of W[1] and W[2]? Dims = (a × b); a = ?, b = ?

a = 4 and b = 6 for W[1]; by the same rule, W[2] is (2×4).
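A quick shape check of that answer (illustrative code; the general rule is that W[k] has shape (number of units in layer k) × (number of units in layer k−1)):

import numpy as np

n_in, n_hidden, n_out = 6, 4, 2      # 6 features, 4 hidden units, 2 outputs
W1 = np.zeros((n_hidden, n_in))      # (4, 6)
W2 = np.zeros((n_out, n_hidden))     # (2, 4)
x = np.zeros((n_in, 1))

# Biases and activations omitted: this only checks that the dims line up.
a1 = np.dot(W1, x)                   # (4, 1)
a2 = np.dot(W2, a1)                  # (2, 1): two outputs
print(a1.shape, a2.shape)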


Page 19:

Common Activation Functions

Sigmoid:     a = g(z) = 1 / (1 + e^(−z))
tanh:        a = g(z) = (e^z − e^(−z)) / (e^z + e^(−z))
ReLU:        a = g(z) = max(0, z)
Leaky ReLU:  a = g(z) = max(0.01z, z)

[Plots: a(z) versus z for each of the four functions.]

Page 20:

Activation Function as: g(z)

Algorithm 2 (with σ as the activation):

z[1] = W[1]x + b[1]
a[1] = σ(z[1])
z[2] = W[2]a[1] + b[2]
a[2] = σ(z[2])

Algorithm 2 (with a generic activation g):

z[1] = W[1]x + b[1]
a[1] = g(z[1])
z[2] = W[2]a[1] + b[2]
a[2] = g(z[2])

Page 21:

With or Without the Activation Function

Let's have a look at the case where we do not use any activation function. (That is also equivalent to setting g(z[1]) = z[1].)

With an activation function:

z[1] = W[1]x + b[1]
a[1] = g(z[1])
z[2] = W[2]a[1] + b[2]
a[2] = g(z[2])

Without one (a[1] = z[1]):

z[1] = W[1]x + b[1]
a[1] = z[1]
z[2] = W[2]z[1] + b[2]
a[2] = z[2] = W[2]z[1] + b[2]
            = W[2][W[1]x + b[1]] + b[2]
            = W[2]W[1]x + W[2]b[1] + b[2]
            = Wx + b   (with W = W[2]W[1] and b = W[2]b[1] + b[2])

So ŷ = a[2] = z[2] = Wx + b: the output is always a linear function of the input!
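A tiny numeric check of this collapse (my own illustration): two stacked affine layers with no activation agree exactly with the single collapsed layer.

import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 5)), rng.normal(size=(3, 1))
W2, b2 = rng.normal(size=(1, 3)), rng.normal(size=(1, 1))
x = rng.normal(size=(5, 1))

# Two stacked linear layers (no activation)...
two_layer = np.dot(W2, np.dot(W1, x) + b1) + b2

# ...equal one linear layer with W = W2 W1 and b = W2 b1 + b2.
W = np.dot(W2, W1)
b = np.dot(W2, b1) + b2
one_layer = np.dot(W, x) + b

print(np.allclose(two_layer, one_layer))   # True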


Page 22:

Derivatives for the Activation Functions

• Remember that the updating process of the parameters depends on the derivatives!
• That also depends on the derivative of the chosen activation function!
• (We used the sigmoid function previously in our logistic regression implementation.)

Page 23:

Derivatives for the Activation Functions

Sigmoid: a = g(z) = 1 / (1 + e^(−z))
g′(z) = dg(z)/dz = g(z)(1 − g(z)) = a(1 − a)

tanh: a = g(z) = (e^z − e^(−z)) / (e^z + e^(−z))
g′(z) = dg(z)/dz = 1 − (tanh z)² = 1 − a²

ReLU: a = g(z) = max(0, z)
g′(z) = dg(z)/dz = 0 if z < 0; 1 if z > 0; undefined if z = 0

Leaky ReLU: a = g(z) = max(0.01z, z)
g′(z) = dg(z)/dz = 0.01 if z < 0; 1 if z ≥ 0

[Plots: a(z) versus z for each of the four functions.]
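These pair up naturally in code. A sketch (function names are mine) returning each activation together with its derivative:

import numpy as np

def sigmoid(z):
    a = 1.0 / (1.0 + np.exp(-z))
    return a, a * (1.0 - a)            # g'(z) = a(1 - a)

def tanh(z):
    a = np.tanh(z)
    return a, 1.0 - a ** 2             # g'(z) = 1 - a^2

def relu(z):
    # Derivative taken as 0 at z = 0 by convention (undefined there).
    return np.maximum(0.0, z), (z > 0).astype(float)

def leaky_relu(z):
    return np.maximum(0.01 * z, z), np.where(z < 0, 0.01, 1.0)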

Page 24:

Forward pass (computation graph):

W[1]x + b[1] = z[1] → σ(z[1]) = a[1] → W[2]a[1] + b[2] = z[2] → σ(z[2]) = a[2] → L(a[2], y)

Backward pass (one sample):

dz[2] = a[2] − y
dW[2] = dz[2] a[1]ᵀ
db[2] = dz[2]
dz[1] = W[2]ᵀ dz[2] ∗ g[1]′(z[1])   (elementwise product)
dW[1] = dz[1] xᵀ
db[1] = dz[1]

Backward pass (vectorized over m samples):

dZ[2] = A[2] − Y
dW[2] = (1/m) dZ[2] A[1]ᵀ
db[2] = (1/m) np.sum(dZ[2], axis=1, keepdims=True)
dZ[1] = W[2]ᵀ dZ[2] ∗ g[1]′(Z[1])   (elementwise product)
dW[1] = (1/m) dZ[1] Xᵀ
db[1] = (1/m) np.sum(dZ[1], axis=1, keepdims=True)
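A sketch tying this vectorized backward pass to the forward pass from earlier (names are mine; g1_prime is the derivative of whichever hidden activation g[1] was chosen):

import numpy as np

# Cached from the forward pass: X (n0, m), Z1 and A1 (n1, m),
# A2 and Y (1, m); W2 has shape (1, n1).
def backward(X, Y, Z1, A1, A2, W2, g1_prime):
    m = X.shape[1]
    dZ2 = A2 - Y                                          # (1, m)
    dW2 = (1.0 / m) * np.dot(dZ2, A1.T)                   # (1, n1)
    db2 = (1.0 / m) * np.sum(dZ2, axis=1, keepdims=True)  # (1, 1)
    dZ1 = np.dot(W2.T, dZ2) * g1_prime(Z1)                # (n1, m), elementwise
    dW1 = (1.0 / m) * np.dot(dZ1, X.T)                    # (n1, n0)
    db1 = (1.0 / m) * np.sum(dZ1, axis=1, keepdims=True)  # (n1, 1)
    return dW1, db1, dW2, db2

# Example for tanh hidden units: g1_prime = lambda Z: 1.0 - np.tanh(Z) ** 2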
