Output Units and Cost Function in FNN


Transcript of Output Units and Cost Function in FNN

Page 1: Output Units and Cost Function in FNN

Deep Neural Network: Cost Functions and Output Units

Jiaming Lin
[email protected]
DATALab@III, NetDBLab@NTU

January 9, 2017

Page 2: Output Units and Cost Function in FNN

Outline

1 Introduction

2 Output Units and Cost Functions
– Binary
– Multinoulli

3 Deterministic and Generic Model

4 Conclusions and Discussions

Page 3: Output Units and Cost Function in FNN

Introduction

In neural network learning...

The selection of the output unit depends on the learning problem.
– Classification: sigmoid, softmax or linear.
– Linear regression: linear.

Determine and analyse the cost function.
– Is the cost function †analytic?
– Can the learning progress well (first-order derivative)?

Deterministic and Generic Model.
– Data is more complicated in many cases.

Note: †For simplicity, we say a function is analytic to mean it is infinitely differentiable on its domain.




Page 8: Output Units and Cost Function in FNN

Binary

index | x1    | · · · | xn    | target
1     | 0     | · · · | 1     | Class A
2     | 1     | · · · | 0     | Class B
3     | 1     | · · · | 1     | Class A
· · · | · · · | · · · | · · · | · · ·
m     | 0     | · · · | 0     | Class B

Page 9: Output Units and Cost Function in FNN

Binary

The prediction is ŷ = S(z), where
S is the sigmoid function, and
z is the input of the output layer:

z = w⊤h + b (1)

with w the weight, h the output of the hidden layer, and b the bias.
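As a minimal sketch (not from the slides), the binary output unit ŷ = S(z) with z = w⊤h + b can be written in NumPy; the hidden output h, weight w and bias b below are made-up values for illustration:

```python
import numpy as np

def sigmoid(z):
    # S(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical hidden-layer output, weight and bias
h = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, 0.3])
b = 0.2

z = w @ h + b        # z = w^T h + b, equation (1)
y_hat = sigmoid(z)   # prediction, always in (0, 1)
print(z, y_hat)
```

The sigmoid squashes the real-valued score z into (0, 1), so ŷ can be read as the probability of Class A.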

Page 10: Output Units and Cost Function in FNN

Cost Function

A cost function can be derived in many ways; we discuss two of the most common:

Mean Square Error

Cross Entropy

Page 11: Output Units and Cost Function in FNN

Cost Function

Mean Square Error

Let y^(i) denote the data label, and ŷ^(i) = S(z^(i)) the prediction. We may define the cost function C_mse by

C_mse = (1/m) Σ_{i=1}^{m} (ŷ^(i) − y^(i))²  (2)

where m is the data size, and z^(i), y^(i) and ŷ^(i) are real numbers.
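Equation (2) on toy values can be sketched as follows; the labels and predictions are made up for illustration:

```python
import numpy as np

def mse_cost(y, y_hat):
    # C_mse = (1/m) * sum_i (y_hat^(i) - y^(i))^2, equation (2)
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.mean((y_hat - y) ** 2)

y     = [1, 0, 1, 0]          # labels
y_hat = [0.9, 0.2, 0.8, 0.1]  # sigmoid outputs
print(mse_cost(y, y_hat))     # small when predictions match labels
```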

Page 12: Output Units and Cost Function in FNN

Cost Function

Cross Entropy

Adapting the symbols above, the cost function defined by cross entropy is

C_ce = (1/m) Σ_{i=1}^{m} [ y^(i) ln(ŷ^(i)) + (1 − y^(i)) ln(1 − ŷ^(i)) ]  (2)

where m is the data size, and z^(i), y^(i) and ŷ^(i) are real numbers.

Page 13: Output Units and Cost Function in FNN

Comparison between MSE and Cross Entropy

Problem: which one is better?

Analyticity (infinitely differentiable)

Learning ability (first-order derivatives)

Page 14: Output Units and Cost Function in FNN

Comparison between MSE and Cross Entropy

Analyticity:

C_mse = (1/m) Σ_{i=1}^{m} (ŷ^(i) − y^(i))²

C_ce = (1/m) Σ_{i=1}^{m} [ y^(i) ln(ŷ^(i)) + (1 − y^(i)) ln(1 − ŷ^(i)) ]

Computationally, the value of ŷ^(i) = S(z^(i)) can overflow to 1 or underflow to 0 when z^(i) is very positive or very negative. Therefore, given a fixed y^(i) ∈ {0, 1},

C_ce is undefined when ŷ^(i) is 0 or 1.

C_mse is polynomial in ŷ^(i) and thus analytic everywhere.

Page 15: Output Units and Cost Function in FNN

Comparison between MSE and Cross Entropy

Learning Ability: compare the gradients

∂C_mse/∂w = [S(z) − y][1 − S(z)]S(z)h,  (3)

∂C_ce/∂w = [y − S(z)]h  (4)

respectively, where S is the sigmoid and z = w⊤h + b.

Page 16: Output Units and Cost Function in FNN

Comparison between MSE and Cross Entropy

MSE: [S(z) − y][1 − S(z)]S(z)h          Cross Entropy: [y − S(z)]h

If y = 1 and ŷ → 1, steps → 0           If y = 1 and ŷ → 1, steps → 0
If y = 1 and ŷ → 0, steps → 0           If y = 1 and ŷ → 0, steps → 1
If y = 0 and ŷ → 1, steps → 0           If y = 0 and ŷ → 1, steps → −1
If y = 0 and ŷ → 0, steps → 0           If y = 0 and ŷ → 0, steps → 0

In the case of Mean Square Error, progress gets stuck when z is very positive or very negative.
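The table above can be checked numerically. A sketch comparing the two gradient magnitudes from equations (3) and (4) for a confidently wrong prediction (y = 1 with very negative z; h is set to 1 for simplicity):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_mse(z, y, h=1.0):
    # [S(z) - y][1 - S(z)] S(z) h, equation (3)
    s = sigmoid(z)
    return (s - y) * (1 - s) * s * h

def grad_ce(z, y, h=1.0):
    # [y - S(z)] h, equation (4)
    return (y - sigmoid(z)) * h

# Confidently wrong answers: y = 1 but z very negative
for z in [-2.0, -10.0, -30.0]:
    print(z, grad_mse(z, 1.0), grad_ce(z, 1.0))
# The MSE gradient vanishes as z -> -inf (learning stalls),
# while the cross-entropy gradient approaches 1 (learning proceeds).
```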


Page 18: Output Units and Cost Function in FNN

The Unstable Issue in Cross Entropy

We have mentioned the unstable issue of cross entropy. Precisely,

ŷ = S(z) can underflow to 0 when z is very negative,

ŷ = S(z) can overflow to 1 when z is very positive.

Therefore, given a fixed y ∈ {0, 1}, the function

C = y ln ŷ + (1 − y) ln(1 − ŷ)

could be undefined when z is very positive or very negative.


Page 20: Output Units and Cost Function in FNN

The Unstable Issue in Cross Entropy

Alternatively, regarding z as the variable of the cross entropy,

C = y ln S(z) + (1 − y) ln(1 − S(z))  (5)
  = −ζ(−z) + z(y − 1),  (6)

where ζ is the softplus function and z is a real number.
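Equation (6) is the basis of the numerically stable "cross entropy from logits" form found in practice. A sketch of the negative of (6) as a loss, using the stable softplus identity ζ(u) = max(u, 0) + ln(1 + e^(−|u|)) so that no exponential overflows for any z:

```python
import numpy as np

def softplus(u):
    # zeta(u) = ln(1 + e^u), computed stably as max(u, 0) + ln(1 + e^{-|u|})
    return np.maximum(u, 0.0) + np.log1p(np.exp(-np.abs(u)))

def stable_bce_from_logits(z, y):
    # Negative of C in equation (6): loss = zeta(-z) + z(1 - y)
    # which expands to max(z, 0) - z*y + ln(1 + e^{-|z|})
    return softplus(-z) + z * (1.0 - y)

z = np.array([-50.0, -1.0, 0.0, 1.0, 50.0])
y = np.array([  1.0,  1.0, 0.0, 0.0,  0.0])
print(stable_bce_from_logits(z, y))  # finite even for extreme z
```

For moderate z this agrees with the naive −[y ln S(z) + (1 − y) ln(1 − S(z))], but it stays finite where the naive form would take ln of 0.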

Page 21: Output Units and Cost Function in FNN

The Unstable Issue in Cross Entropy

We may obtain the analyticity of C by showing that dC/dz is a product of analytic functions.

Page 22: Output Units and Cost Function in FNN

The Unstable Issue in Cross Entropy

In the cases of the right answer:

y = 1 and ŷ = S(z) → 1 ⇒ z → ∞, C → 0,

y = 0 and ŷ = S(z) → 0 ⇒ z → −∞, C → 0.

In the cases of the wrong answer:

y = 1 and ŷ = S(z) → 0 ⇒ z → −∞, ∇C → 1,

y = 0 and ŷ = S(z) → 1 ⇒ z → ∞, ∇C → −1.


Page 24: Output Units and Cost Function in FNN

Multinoulli: Output Unit and Cost Function

Generalize the binary case to multiple classes.
Linear output units and #(output units) = #(classes).
Cost function evaluated by cross entropy.

Cost Function in Multinoulli Problems

Suppose the size of the dataset is m and there are K classes. Then we can obtain the cost function from cross entropy:

C(w) = − Σ_{i=1}^{m} Σ_{k=1}^{K} 1{y^(i) = k} ln [ exp(z_k^(i)) / Σ_{j=1}^{K} exp(z_j^(i)) ]  (7)

where z_k^(i) = w_k⊤ h^(i) + b_k and h^(i) is the output of the hidden layer corresponding to example x^(i).
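A sketch of equation (7) for a couple of examples (the logits are made up); the softmax shifts by the maximum logit so the exponentials never overflow:

```python
import numpy as np

def softmax(z):
    # Subtract max(z) first so np.exp never overflows
    e = np.exp(z - np.max(z))
    return e / e.sum()

def multinoulli_cost(Z, labels):
    # C(w) = -sum_i ln softmax(z^(i))[y^(i)], equation (7);
    # the indicator 1{y^(i) = k} simply picks out the true class.
    return -sum(np.log(softmax(z)[k]) for z, k in zip(Z, labels))

Z = np.array([[2.0, 0.5, -1.0],   # logits z^(i) for each example
              [0.1, 0.2,  3.0]])
labels = [0, 2]                   # true classes y^(i)
print(multinoulli_cost(Z, labels))
```

The cost is near 0 when the true class dominates the logits, and grows as the model assigns it less probability.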



Page 27: Output Units and Cost Function in FNN

A Lemma for Simplifying the Cost Function

Analyticity (infinitely differentiable)

Learning ability (first-order derivatives)

To claim the above properties, we should first show a lemma.

Lemma 1

For the output z = w⊤h + b with z = [z1, . . . , zK], we have

ln Σ_{j=1}^{K} exp(z_j) ≈ max_j {z_j}.  (8)

Page 28: Output Units and Cost Function in FNN

A Lemma for Simplifying the Cost Function

Proof.

Without loss of generality, we may assume z1 > . . . > zK; the remaining work is to show that, for all ε > 0,

ln [ e^{z1} (1 + Σ_{j=2}^{K} e^{z_j − z1}) ] = z1 + ln (1 + Σ_{j=2}^{K} e^{z_j − z1}) ≤ z1 + ε.

Intuitively, ln Σ_{j=1}^{K} exp(z_j) can be well approximated by max_j {z_j}.
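Lemma 1 is the familiar log-sum-exp trick. A sketch showing that ln Σ exp(z_j) can be computed via max_j z_j without overflow, exactly as in the factorization used in the proof:

```python
import numpy as np

def logsumexp(z):
    # ln sum_j exp(z_j) = m + ln sum_j exp(z_j - m), with m = max_j z_j;
    # every exponent is <= 0, so nothing overflows.
    m = np.max(z)
    return m + np.log(np.sum(np.exp(z - m)))

z = np.array([1000.0, 999.0, 990.0])
print(logsumexp(z))              # finite; naive np.log(np.sum(np.exp(z))) overflows
print(logsumexp(z) - np.max(z))  # the gap to max_j z_j is small
```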

Page 29: Output Units and Cost Function in FNN

Analyticity

We may rewrite the cost function as

C(w) = − Σ_{i=1}^{m} Σ_{k=1}^{K} 1{y^(i) = k} [ z_k^(i) − ln Σ_{j=1}^{K} exp(z_j^(i)) ].

Each summand is a subtraction of analytic functions and thus analytic, and the term 1{y^(i) = k} is actually a constant. The total cost is a sum of analytic functions and thus analytic.

Page 30: Output Units and Cost Function in FNN

Learning Ability

Property 2

By the rule of sums in derivatives, we may simplify (7) as follows:

C^(i) = Σ_{k=1}^{K} 1{y = k} [ z_k − ln Σ_{j=1}^{K} exp(z_j) ],  (8)

the cost contributed by the example x^(i) to the total cost C.

1 If the model gives the right answer, then the error is close to 0.

2 If the model gives the wrong answer, then the learning can progress well.

Page 31: Output Units and Cost Function in FNN

Learning Ability

Proof (The Right Answer).

Suppose the true label is class n. By the assumption, we know z_n is the maximal. Then

−ε ≤ Σ_{k=1}^{K} 1{y = k} [ z_k − ln Σ_{j=1}^{K} exp(z_j) ]
   = z_n − ln Σ_{j=1}^{K} exp(z_j)
   < z_n − max_j {z_j} = 0.

This shows that −ε ≤ C^(i) < 0 for an arbitrarily small ε.

Page 32: Output Units and Cost Function in FNN

Learning Ability

Proof (The Wrong Answer).

Suppose the true label is class n. By assumption, the prediction z_n given by the model is not the maximal. Using the fact that

z_n ≠ max_j {z_j} ⇒ softmax(z_n) ≪ 1,

there exists a sufficiently large δ > 0 such that | softmax(z_n) − 1 | > δ.

Page 33: Output Units and Cost Function in FNN

Learning Ability

Proof (The Wrong Answer, Cont.)

Then

∂C^(i)/∂z_n = ∂/∂z_n [ z_n − ln Σ_{j=1}^{K} e^{z_j} ] = 1 − softmax(z_n) > δ.

This shows the gradient is sufficiently large and also predictable (bounded by 1); therefore the learning can progress well.


Page 35: Output Units and Cost Function in FNN

Learning Processes Overview

         Deterministic                    Generic
Step 1   Model function                   Probability distribution
         – Linear                         – Gaussian
         – Sigmoid                        – Bernoulli
Step 2   Design error evaluations         Maximum Likelihood Estimate
         – MSE
         – Cross Entropy
Step 3   Learning one statistic           Learning the full distribution
         – Mean
         – Median

To describe some complicated data, it is easier to build a model with the generic method.


Page 37: Output Units and Cost Function in FNN

Generic Modeling for Binary Classification

Step 1: Use the Bernoulli distribution as the likelihood function.

p(y | x) = p^y (1 − p)^{1−y}
         = S(z)^y (1 − S(z))^{1−y}

Step 2: Minimize the negative log-likelihood, where

ln p(y | x^(i)) = y ln S(z) + (1 − y) ln(1 − S(z))

Step 3: We can learn the full distribution.

p(y | x′) = S(z′)^y (1 − S(z′))^{1−y},

where we denote z′ = w⊤x′ + b and S is the sigmoid.


Page 39: Output Units and Cost Function in FNN

Generic Modeling for Linear Regression: Step 1

Given a training feature x, use the Gaussian distribution as the likelihood function

p(y | x) = (1/√(2πσ²)) exp( −(µ − y)² / (2σ²) ),

where, denoting the output of the hidden layer as h_x, the weights w = [w1, w2] and biases b = [b1, b2],

µ = w1⊤ h_x + b1
σ = w2⊤ h_x + b2

Intuitively, µ and σ are two linear output units; they are functions of x.
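Step 1's two linear output heads can be sketched as follows; the hidden output h_x, weights and biases are made-up values for illustration:

```python
import numpy as np

# Hypothetical hidden-layer output and the two heads' parameters
h_x = np.array([0.3, -0.7, 1.2])
w1, b1 = np.array([0.5, 0.1, -0.2]), 0.0   # head for mu
w2, b2 = np.array([0.2, -0.3, 0.1]), 0.5   # head for sigma

mu    = w1 @ h_x + b1   # mean of the Gaussian likelihood
sigma = w2 @ h_x + b2   # scale; note a linear unit can go negative,
                        # one motivation for the reparameterization in Step 2
print(mu, sigma)
```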

Page 40: Output Units and Cost Function in FNN

Generic Modeling for Linear Regression: Step 2

Recall that the maximum likelihood estimate is equivalent to minimizing the negative log-likelihood, that is

(µ, σ) = argmin_{(µ,σ)} ( − Σ_x ln p(y | x) )  (8)

However, for each summand,

C_x = ln p(y | x) = −(1/2) [ ln(2πσ²) + (µ − y)²/σ² ]

∂C_x/∂σ = −σ⁻¹ + (µ − y)² σ⁻³

the gradients and errors become unstable when σ is close to 0.


Page 42: Output Units and Cost Function in FNN

Generic Modeling for Linear Regression: Step 2

To prevent the gradients and errors from being unstable, we may substitute the term 1/(2σ²) with v; then for each summand in the log-likelihood

C_x = −(1/2) ln(π/v) − (µ − y)² v,

∂C_x/∂µ = −2v(µ − y),

∂C_x/∂v = 1/(2v) − (µ − y)².

Note that this substitution is valid only when the variance is not too large.
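The substitution v = 1/(2σ²) and the gradients above can be sketched on toy values (µ, v and y below are made up); the gradients stay finite as long as v > 0:

```python
import numpy as np

def cx_and_grads(mu, v, y):
    # C_x = -(1/2) ln(pi / v) - (mu - y)^2 * v   (log-likelihood summand)
    cx      = -0.5 * np.log(np.pi / v) - (mu - y) ** 2 * v
    dcx_dmu = -2.0 * v * (mu - y)                # d C_x / d mu
    dcx_dv  = 1.0 / (2.0 * v) - (mu - y) ** 2    # d C_x / d v
    return cx, dcx_dmu, dcx_dv

cx, gmu, gv = cx_and_grads(mu=1.5, v=0.8, y=1.0)
print(cx, gmu, gv)
```

A finite-difference check on µ confirms the analytic gradient matches the formula.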

Page 43: Output Units and Cost Function in FNN

Generic Modeling for Linear Regression: Step 2

If the variance σ is fixed and chosen by the user, then by comparing the log-likelihood and MSE, we can see that minimizing the NLL is equivalent to minimizing the MSE.

C_mse = (1/m) Σ_{i=1}^{m} ‖ŷ^(i) − y^(i)‖²

C_nll = Σ_{i=1}^{m} C_x^(i) = −(1/2) [ m ln(2πσ²) + Σ_{i=1}^{m} ‖µ_x^(i) − y^(i)‖² / σ² ]


Page 45: Output Units and Cost Function in FNN

Generic Modeling for Linear Regression: Step 3

Full distribution from Generic: µ and σ in this case.

Single statistic from Deterministic: µ in this case.

Experiment (ref): generate random data based on the formula

y = x + 7.0 sin(0.75x) + ε

where ε is Gaussian noise with µ = 0, σ = 1.

Page 46: Output Units and Cost Function in FNN

Generic Modeling for Linear Regression: Step 3

FNN config: #(hidden layer) = 1, width = 20, and the hidden unit is tanh.

(Figure: fitted results, Generic vs. Deterministic.)

Page 47: Output Units and Cost Function in FNN

More Complicated Cases

Complicated data distributions.

In some cases, it is almost impossible to describe the data via deterministic methods.

Generic methods might perform better in complicated cases.

Page 48: Output Units and Cost Function in FNN

Mixture Density Network

Generate random data based on the formula

x = y + 7.0 sin(0.75y) + ε

where ε is Gaussian noise with µ = 0, σ = 1.

Page 49: Output Units and Cost Function in FNN

Mixture Density Network

First, we try using MSE to define the cost function, with one hidden layer of width = 20 and tanh hidden units.

Page 50: Output Units and Cost Function in FNN

Mixture Density Network

The reason is that minimizing MSE is equivalent to minimizing the negative log-likelihood for a simple Gaussian.

Page 51: Output Units and Cost Function in FNN

Mixture Density Network

The mixture density network: the Gaussian mixture with n components is defined by the conditional probability distribution

p(y | x) = Σ_{i=1}^{n} p(c = i | x) N(y; µ^(i)(x), Σ^(i)(x)).  (9)

Network configuration:

1 The number of components n needs to be fine-tuned (trial and error).

2 3 × n output units.
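The mixture negative log-likelihood for equation (9) can be sketched for one-dimensional y; the mixture weights, means and scales below stand in for what the 3 × n output units would produce (values are made up, with π from a softmax and σ > 0 assumed):

```python
import numpy as np

def mdn_nll(y, pi, mu, sigma):
    # p(y|x) = sum_i pi_i * N(y; mu_i, sigma_i^2), equation (9);
    # return -ln p(y|x) for one scalar target y.
    comp = pi * np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return -np.log(np.sum(comp))

# Hypothetical outputs of a 3-component head
pi    = np.array([0.2, 0.5, 0.3])   # mixture weights, sum to 1
mu    = np.array([-1.0, 0.0, 2.0])  # component means
sigma = np.array([0.5, 1.0, 0.8])   # component scales
print(mdn_nll(0.1, pi, mu, sigma))
```

In practice the inner sum is computed with the log-sum-exp trick from Lemma 1 to avoid underflow when all components are far from y.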

Page 52: Output Units and Cost Function in FNN

Mixture Density Network

Experiment (ref):

#(components) = 24,

two hidden layers with width = 24 and tanh activation,

#(output units) = 3 × 24 and they are linear.


Page 54: Output Units and Cost Function in FNN

Conclusions and Discussions

In classification problems, cross entropy is a naturally better error measure than the other methods.

An improvement to cross entropy avoids numerical instability.
– The MNIST example from TensorFlow.

Determining whether a cost function is good or not:
– Is the cost function analytic?
– Can the learning progress well?

Deterministic vs. Generic
– Deterministic learns a single statistic while generic learns the full distribution.
– When the data distribution is not normal (high kurtosis or fat tails), generic might be better.
– Generic methods are easier to apply to complicated cases.


Page 58: Output Units and Cost Function in FNN

Thank you.