
The detailed derivation of the derivatives in Table 2 of "Marginalized Denoising Auto-encoders for Nonlinear Representations" by M. Chen, K. Weinberger, F. Sha, and Y. Bengio
(http://www.cse.wustl.edu/~mchen/papers/deepmsda.pdf)

Tomonari MASADA @ Nagasaki University

October 14, 2014

The derivative $\partial z_h / \partial \tilde{x}_d$ can be obtained as follows:

$$
z = \sigma(W\tilde{x} + b) = \frac{1}{1 + \exp(-W\tilde{x} - b)} \tag{1}
$$

$$
\begin{aligned}
\therefore\ \frac{\partial z_h}{\partial \tilde{x}_d}
&= \frac{\partial}{\partial \tilde{x}_d}\,
   \frac{1}{1 + \exp\!\left(-\sum_d w_{hd}\tilde{x}_d - b_h\right)} \\
&= \frac{w_{hd}\exp\!\left(-\sum_d w_{hd}\tilde{x}_d - b_h\right)}
        {\left\{1 + \exp\!\left(-\sum_d w_{hd}\tilde{x}_d - b_h\right)\right\}^2} \\
&= \frac{1}{1 + \exp\!\left(-\sum_d w_{hd}\tilde{x}_d - b_h\right)}
   \cdot\left\{1 - \frac{1}{1 + \exp\!\left(-\sum_d w_{hd}\tilde{x}_d - b_h\right)}\right\}
   \cdot w_{hd} \\
&= z_h(1 - z_h)\,w_{hd}\,. \tag{2}
\end{aligned}
$$
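As a quick numerical sanity check (not in the original note), the closed form (2) can be compared against central finite differences. Below is a minimal NumPy sketch; the dimensions `H`, `D`, the helper `sigmoid`, and all random values are illustrative assumptions.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
H, D = 4, 6                        # hidden / input dimensions (arbitrary)
W = rng.normal(size=(H, D))        # W[h, d] = w_hd
b = rng.normal(size=H)
x_tilde = rng.normal(size=D)

z = sigmoid(W @ x_tilde + b)

# Closed form (2): dz_h / dx~_d = z_h (1 - z_h) w_hd
analytic = (z * (1.0 - z))[:, None] * W

# Central finite differences, one input coordinate at a time
eps = 1e-6
numeric = np.empty((H, D))
for d in range(D):
    e = np.zeros(D); e[d] = eps
    numeric[:, d] = (sigmoid(W @ (x_tilde + e) + b)
                     - sigmoid(W @ (x_tilde - e) + b)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))  # ~1e-10, i.e. (2) checks out
```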

For the cross-entropy loss, we obtain the following:

$$
\begin{aligned}
\ell\big(x, f_\theta(\tilde{x})\big)
&= -x^\top \log\sigma(W^\top z + b')
   - (1 - x)^\top \log\left\{1 - \sigma(W^\top z + b')\right\} \\
&= -x^\top \log\left\{\frac{1}{1 + \exp(-W^\top z - b')}\right\}
   - (1 - x)^\top \log\left\{\frac{\exp(-W^\top z - b')}{1 + \exp(-W^\top z - b')}\right\} \\
&= x^\top \log\left\{1 + \exp(-W^\top z - b')\right\}
   - (1 - x)^\top(-W^\top z - b')
   + (1 - x)^\top \log\left\{1 + \exp(-W^\top z - b')\right\} \\
&= -(1 - x)^\top(-W^\top z - b')
   + \mathbf{1}^\top \log\left\{1 + \exp(-W^\top z - b')\right\} \\
&= -\sum_d (1 - x_d)\Big(-\sum_h w_{hd} z_h - b'_d\Big)
   + \sum_d \log\Big\{1 + \exp\Big(-\sum_h w_{hd} z_h - b'_d\Big)\Big\} \tag{3}
\end{aligned}
$$
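The last line of (3) is just the first line with $x^\top + (1-x)^\top = \mathbf{1}^\top$ collected; it is also the numerically safer softplus form. A small sketch (not from the note; all names and values are illustrative) confirming that the two expressions agree:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(1)
H, D = 4, 6
W = rng.normal(size=(H, D))
b_prime = rng.normal(size=D)
z = rng.uniform(size=H)       # hidden activations in (0, 1)
x = rng.uniform(size=D)       # targets in (0, 1)

u = W.T @ z + b_prime         # pre-activation of the reconstruction
y = sigmoid(u)

# First line of (3): the cross-entropy as usually written
direct = -x @ np.log(y) - (1 - x) @ np.log(1 - y)

# Last line of (3): the simplified softplus form
simplified = -np.sum((1 - x) * (-u)) + np.sum(np.log1p(np.exp(-u)))

print(abs(direct - simplified))  # ~1e-15: the rewriting in (3) is exact
```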

$$
\therefore\ \frac{\partial \ell}{\partial z_h}
= \sum_d (1 - x_d) w_{hd}
  - \sum_d \frac{w_{hd}\exp\!\left(-\sum_h w_{hd} z_h - b'_d\right)}
               {1 + \exp\!\left(-\sum_h w_{hd} z_h - b'_d\right)} \tag{4}
$$

$$
\begin{aligned}
\therefore\ \frac{\partial^2 \ell}{\partial z_h^2}
&= -\frac{\partial}{\partial z_h}
   \sum_d \frac{w_{hd}\exp\!\left(-\sum_h w_{hd} z_h - b'_d\right)}
              {1 + \exp\!\left(-\sum_h w_{hd} z_h - b'_d\right)} \\
&= \sum_d \frac{w_{hd}^2 \exp\!\left(-\sum_h w_{hd} z_h - b'_d\right)}
              {1 + \exp\!\left(-\sum_h w_{hd} z_h - b'_d\right)}
 - \sum_d \frac{w_{hd}^2 \left\{\exp\!\left(-\sum_h w_{hd} z_h - b'_d\right)\right\}^2}
              {\left\{1 + \exp\!\left(-\sum_h w_{hd} z_h - b'_d\right)\right\}^2} \\
&= \sum_d \frac{w_{hd}^2 \exp\!\left(-\sum_h w_{hd} z_h - b'_d\right)}
              {\left\{1 + \exp\!\left(-\sum_h w_{hd} z_h - b'_d\right)\right\}^2} \\
&= \sum_d \left(\frac{1}{1 + \exp\!\left(-\sum_h w_{hd} z_h - b'_d\right)}\right)
          \left(1 - \frac{1}{1 + \exp\!\left(-\sum_h w_{hd} z_h - b'_d\right)}\right) w_{hd}^2 \\
&= \sum_d y_d(1 - y_d)\,w_{hd}^2\,, \tag{5}
\end{aligned}
$$

where $y_d = \sigma\!\left(\sum_h w_{hd} z_h + b'_d\right)$ denotes the $d$-th component of the reconstruction $f_\theta(\tilde{x}) = \sigma(W^\top z + b')$.
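Since $\exp(-u)/(1+\exp(-u)) = 1 - \sigma(u)$, the gradient (4) collapses to $\sum_d (y_d - x_d)\,w_{hd}$, which is the convenient form in code. A hedged NumPy sketch (dimensions, names, and the `loss` helper are assumptions) checking (4) and (5) by finite differences:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def loss(z, W, b_prime, x):
    # Cross-entropy between x and the reconstruction sigma(W^T z + b')
    y = sigmoid(W.T @ z + b_prime)
    return -x @ np.log(y) - (1 - x) @ np.log(1 - y)

rng = np.random.default_rng(2)
H, D = 4, 6
W = rng.normal(size=(H, D))
b_prime = rng.normal(size=D)
z = rng.uniform(size=H)
x = rng.uniform(size=D)

y = sigmoid(W.T @ z + b_prime)
grad = W @ (y - x)                      # (4), in simplified form
hess_diag = (W ** 2) @ (y * (1 - y))    # (5)

eps = 1e-4
for h in range(H):
    e = np.zeros(H); e[h] = eps
    g = (loss(z + e, W, b_prime, x) - loss(z - e, W, b_prime, x)) / (2 * eps)
    c = (loss(z + e, W, b_prime, x) - 2 * loss(z, W, b_prime, x)
         + loss(z - e, W, b_prime, x)) / eps ** 2
    print(h, abs(g - grad[h]), abs(c - hess_diag[h]))  # both ~1e-6 or smaller
```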


For the squared loss, we obtain the following:

$$
\ell\big(x, f_\theta(\tilde{x})\big)
= \left\|x - (W^\top z + b')\right\|^2
= \sum_d \Big\{x_d - \Big(\sum_h w_{hd} z_h + b'_d\Big)\Big\}^2 \tag{6}
$$

$$
\therefore\ \frac{\partial \ell}{\partial z_h}
= \frac{\partial}{\partial z_h}
  \sum_d \Big\{x_d - \Big(\sum_h w_{hd} z_h + b'_d\Big)\Big\}^2
= -2\sum_d w_{hd}\Big\{x_d - \Big(\sum_h w_{hd} z_h + b'_d\Big)\Big\} \tag{7}
$$

$$
\therefore\ \frac{\partial^2 \ell}{\partial z_h^2}
= -\frac{\partial}{\partial z_h}\,
  2\sum_d w_{hd}\Big\{x_d - \Big(\sum_h w_{hd} z_h + b'_d\Big)\Big\}
= 2\sum_d w_{hd}^2\,. \tag{8}
$$
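The same finite-difference check works for (7) and (8); because the loss is quadratic in $z$, the second derivative $2\sum_d w_{hd}^2$ is constant. A minimal sketch under the same illustrative assumptions as above:

```python
import numpy as np

def loss(z, W, b_prime, x):
    # Squared loss between x and the linear reconstruction W^T z + b'
    r = x - (W.T @ z + b_prime)
    return r @ r

rng = np.random.default_rng(3)
H, D = 4, 6
W = rng.normal(size=(H, D))
b_prime = rng.normal(size=D)
z = rng.uniform(size=H)
x = rng.normal(size=D)

grad = -2 * W @ (x - (W.T @ z + b_prime))   # (7)
hess_diag = 2 * np.sum(W ** 2, axis=1)      # (8): independent of z

eps = 1e-3
for h in range(H):
    e = np.zeros(H); e[h] = eps
    g = (loss(z + e, W, b_prime, x) - loss(z - e, W, b_prime, x)) / (2 * eps)
    c = (loss(z + e, W, b_prime, x) - 2 * loss(z, W, b_prime, x)
         + loss(z - e, W, b_prime, x)) / eps ** 2
    print(h, abs(g - grad[h]), abs(c - hess_diag[h]))  # ~1e-8: exact for a quadratic
```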
