Cost-aware Pre-training for Multiclass Cost-sensitive Deep Learning


Transcript of Cost-aware Pre-training for Multiclass Cost-sensitive Deep Learning

Page 1: Cost-aware Pre-training for Multiclass Cost-sensitive Deep Learning (poster, people.csail.mit.edu/andyyuan/docs/ijcai-16.csdnn.poster.pdf)


Cost-aware Pre-training for Multiclass Cost-sensitive Deep Learning

Yu-An Chung¹  Hsuan-Tien Lin¹  Shao-Wen Yang²

¹ Dept. of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
² Intel Labs, Intel Corporation, USA

Cost-sensitive Classification

What is the status of the patient?

H1N1-infected Cold-infected Healthy

• Cost of each kind of mis-prediction:

C (rows: actual class, columns: predicted class):

             Predicted
  Actual     H1N1     Cold     Healthy
  H1N1          0     1000      100000
  Cold        100        0        3000
  Healthy     100       30           0

Predict H1N1-infected as Healthy: very high cost!

Predict Cold-infected as Healthy: high cost

Predict correctly: no cost

• Input: a training set S = {(x_n, y_n)}_{n=1}^N and a cost matrix C, where x_n ∈ 𝒳, y_n ∈ 𝒴 = {1, 2, …, K}, and C(i, j) is the cost of classifying a class-i example as class j

• Goal: use S and C to train a classifier g: 𝒳 → 𝒴 such that the expected cost C(y, g(x)) on a test example (x, y) is minimal
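The setting above can be made concrete with a short sketch. The matrix below is the poster's H1N1/Cold/Healthy example, and `average_cost` is a hypothetical helper (not from the paper) that evaluates a classifier's predictions under C:

```python
import numpy as np

# Cost matrix from the poster: rows = actual class, columns = predicted
# class, with class order 0 = H1N1, 1 = Cold, 2 = Healthy.
C = np.array([
    [0,    1000, 100000],  # actual H1N1
    [100,  0,    3000],    # actual Cold
    [100,  30,   0],       # actual Healthy
])

def average_cost(y_true, y_pred, C):
    """Average of C[y, g(x)] over a labelled test set."""
    return C[np.asarray(y_true), np.asarray(y_pred)].mean()

# One H1N1 patient mis-predicted as Healthy pays 100000;
# the three correct predictions pay 0.
y_true = [0, 1, 2, 2]
y_pred = [2, 1, 2, 2]
print(average_cost(y_true, y_pred, C))  # 100000 / 4 = 25000.0
```

Note that regular (cost-insensitive) accuracy would rate this classifier at 75%, while the expected cost makes the single H1N1-as-Healthy mistake dominate.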

Our Goal & Contributions

                                  Shallow Models (e.g., SVM)   Deep Learning
  Regular (Cost-insensitive)      Well-studied                 Popular and undergoing
  Cost-sensitive Classification   Well-studied                 Our work lies here!

• First work to study cost-sensitive deep learning thoroughly:

1) a novel cost-sensitive loss function for any deep model

2) a Cost-sensitive Autoencoder (CAE) equipped with the loss function for pre-training a fully-connected deep model

3) a combination of 1) and 2) as a complete cost-sensitive deep learning (CSDNN) solution

The Input-to-Cost Regression Network

• Regression network: estimates the costs

• Training the regression network:

  - any end-to-end loss function for regression (e.g., the MSE used in linear regression) could be applied

  - a loss function built on top of [Tu and Lin, 2010] is derived in this work: given a training set S = {(x_n, y_n)}_{n=1}^N and C, we define

      δ_{n,k} ≡ ln(1 + exp(z_{n,k} · (r_k(x_n) − C(y_n, k)))),

    where z_{n,k} ≡ 2⟦C(y_n, k) = 0⟧ − 1, i.e., z_{n,k} = +1 on zero-cost entries and −1 otherwise

  - train the regression network by minimizing the derived Cost-Sensitive Loss (CSL) over the training set S:

      L_CSL = Σ_{n=1}^{N} Σ_{k=1}^{K} δ_{n,k}

• Prediction: g(x) ≡ argmin_{1≤k≤K} r_k(x)
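As a sanity check, the derived loss can be sketched in a few lines of NumPy. This is an illustrative re-implementation, not the authors' code; it assumes a cost matrix with zero diagonal, so that z_{n,k} = +1 exactly on the correct-class entries, and uses `np.logaddexp` to compute ln(1 + exp(·)) without overflow:

```python
import numpy as np

def csl_loss(R, C_rows):
    """Smooth one-sided regression loss L_CSL (illustrative sketch).

    R      : (N, K) estimated costs r_k(x_n) from the regression network
    C_rows : (N, K) true costs, C_rows[n, k] = C(y_n, k)
    """
    # z_{n,k} = +1 on zero-cost entries (the correct class, for a cost
    # matrix with zero diagonal), -1 elsewhere.
    z = np.where(C_rows == 0, 1.0, -1.0)
    # delta_{n,k} = ln(1 + exp(z_{n,k} * (r_k(x_n) - C(y_n, k))))
    delta = np.logaddexp(0.0, z * (R - C_rows))
    return delta.sum()

def predict(R):
    """g(x) = argmin_{1<=k<=K} r_k(x), as row indices 0..K-1."""
    return np.argmin(R, axis=1)
```

The loss is one-sided: for the zero-cost class it shrinks as r_k falls below the true cost, and for the other classes as r_k rises above it, so even a perfect estimate R = C_rows still pays ln 2 per entry.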

Cost-sensitive Autoencoder (CAE)

• Autoencoder (AE): pre-training a fully-connected neural network (FCNN) for regular classification

• Cost-sensitive Autoencoder (CAE): pre-training the DNN for cost-sensitive classification

Autoencoder (AE)

  - Goal: reconstruct the original input x
  - Reconstruction error measured by the cross-entropy loss L_CE

Cost-sensitive Autoencoder (CAE)

  - Goal: reconstruct both the original input x and the cost information C(y, ·)
  - Mixture of reconstruction errors: L_CAE(S) = (1 − α) · L_CE + α · L_CSL
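A minimal sketch of the mixture loss, assuming the same zero-cost convention for z_{n,k} as in the CSL section and a cross-entropy reconstruction error for inputs scaled to [0, 1]; `cae_loss` and its argument names are illustrative, not from the paper:

```python
import numpy as np

def cae_loss(x, x_hat, c, c_hat, alpha):
    """L_CAE = (1 - alpha) * L_CE + alpha * L_CSL (illustrative sketch).

    x, x_hat : original input and its reconstruction, values in [0, 1]
    c, c_hat : cost vector C(y, .) and its reconstruction
    alpha    : mixture weight; alpha = 0 recovers the ordinary AE loss
    """
    eps = 1e-12
    x_hat = np.clip(x_hat, eps, 1.0 - eps)
    # Cross-entropy reconstruction error L_CE on the input part
    l_ce = -(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat)).sum()
    # Cost-sensitive term L_CSL on the reconstructed cost information
    z = np.where(c == 0, 1.0, -1.0)
    l_csl = np.logaddexp(0.0, z * (c_hat - c)).sum()
    return (1.0 - alpha) * l_ce + alpha * l_csl
```

With α = 0 the CAE degenerates to a plain AE; larger α makes the pre-trained representation carry more of the cost information.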

Conclusions

• CSL: makes any deep model cost-sensitive (see the paper for a CNN with CSL)

• CSDNN = CAE pre-training + CSL training: both techniques lead to significant improvements

Cost-aware Experiments

• FCNN: the traditional fully-connected neural network for regular classification

• FCNN_CSL: the fully-connected regression network trained with the loss function L_CSL

• CSDNN: the proposed Cost-sensitive Deep Neural Network