Hierarchical Classification Rongcheng Lin Computer Science Department.


Page 1: Hierarchical Classification Rongcheng Lin Computer Science Department.

Hierarchical Classification

Rongcheng Lin

Computer Science Department

Page 2: Hierarchical Classification Rongcheng Lin Computer Science Department.

Contents

Motivation, Definition & Problem

Review of SVM

Hierarchical Classification

Path-based Approaches

Regularization-based Approaches

Page 3: Hierarchical Classification Rongcheng Lin Computer Science Department.

Motivation

Real-world classes are structured, and they are often hierarchically related. Examples:

Gene function prediction

Document categorization

Image search

…

Hierarchies or taxonomies offer a clear advantage in supporting tasks such as browsing, searching, or visualization. Examples:

International Patent Classification scheme

Yahoo! Web catalogs

…

Prior knowledge about class relationships can improve classification performance, especially for tasks with a large number of classes.

Page 6: Hierarchical Classification Rongcheng Lin Computer Science Department.

Definition and Problem

Automatically categorize data into predefined topic hierarchies or taxonomies.

Supervised learning

Structured output

Page 7: Hierarchical Classification Rongcheng Lin Computer Science Department.

DAG and Tree Structure

Page 8: Hierarchical Classification Rongcheng Lin Computer Science Department.

Definition and Problem

Problem and solution?

Page 9: Hierarchical Classification Rongcheng Lin Computer Science Department.

Definition and Problem

Incorporate the inter-class relationships (the hierarchy) into classification.

Redefine the problem:

Lower-level categories are more detailed, while upper-level categories are more general: redefine the margin.

Different classification mistakes have different severity: redefine the loss function.

Page 12: Hierarchical Classification Rongcheng Lin Computer Science Department.

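Review: Binary SVM

Binary classification: $f(x) = \operatorname{sign}(w^T x + b)$

Decision boundary: $w^T x + b = 0$, with $w^T x + b > 0$ on one side and $w^T x + b < 0$ on the other

Margin: $r_i = \frac{w^T x_i + b}{\lVert w \rVert}$

Loss function: $L(f(x), y)$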

Page 13: Hierarchical Classification Rongcheng Lin Computer Science Department.

Review: Binary SVM

General form:

$J(w) = R(w) + \sum_{i=1}^{n} L(w, x_i, y_i)$
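
A minimal Python sketch of the general form, assuming the hinge loss for $L$ and a squared L2 norm for $R$. Both are illustrative choices, since the slide leaves them unspecified. The names `hinge`, `l2_regularizer`, `objective`, and the parameter `lam` are hypothetical, and the bias is assumed to be folded into `w`.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def hinge(w, x, y):
    """L(w, x, y): hinge loss for a label y in {-1, +1} (assumed choice of L)."""
    return max(0.0, 1.0 - y * dot(w, x))


def l2_regularizer(w, lam=1.0):
    """R(w): squared L2 norm scaled by lam / 2 (assumed choice of R)."""
    return 0.5 * lam * sum(wi * wi for wi in w)


def objective(w, data, lam=1.0):
    """J(w) = R(w) + sum over i of L(w, x_i, y_i)."""
    return l2_regularizer(w, lam) + sum(hinge(w, x, y) for x, y in data)


# Toy data: two points on the correct side of the margin, one on the wrong side.
data = [([2.0, 0.0], 1), ([-1.0, 0.0], -1), ([0.5, 0.0], -1)]
w = [1.0, 0.0]
print(objective(w, data))  # regularizer 0.5 plus a single hinge violation of 1.5
```

Only the third point contributes to the loss term here, which is the usual picture for SVM-style objectives: points beyond the margin cost nothing.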

Page 14: Hierarchical Classification Rongcheng Lin Computer Science Department.

Review: Multiclass SVM

1) One-vs-the-rest

2) Crammer & Singer (pairwise)

Page 15: Hierarchical Classification Rongcheng Lin Computer Science Department.

Review: Multiclass SVM

Dedicated Loss Function

Page 16: Hierarchical Classification Rongcheng Lin Computer Science Department.

Review: Multiclass SVM

Dedicated Loss Function

Margin: $\gamma_i(w) = w_{y_i}^T X_i - w_k^T X_i$ for $k \neq y_i$

Page 17: Hierarchical Classification Rongcheng Lin Computer Science Department.

Review: Hinge Loss Function

The more you violate the margin, the higher the penalty.

Page 18: Hierarchical Classification Rongcheng Lin Computer Science Department.

Loss Function

Page 19: Hierarchical Classification Rongcheng Lin Computer Science Department.

Hierarchical Classifiers

Path-based Approaches

Large Margin Hierarchical Classification

Hierarchical Document Categorization with Support Vector Machines

On Large Margin Hierarchical Classification with Multiple Paths

Regularization-based Approaches

Tree-Guided Group Lasso for Multi-Task Regression

Hierarchical Multitask Structured Output Learning for Large-Scale Segmentation

Page 20: Hierarchical Classification Rongcheng Lin Computer Science Department.

Tree Distance

A given hierarchy induces a metric over the set of classes: the tree distance, or tree-induced error.

$\gamma(y, \hat{y})$ is defined to be the number of edges along the (unique) path from $y$ to $\hat{y}$.

Page 21: Hierarchical Classification Rongcheng Lin Computer Science Department.

Tree Distance

Example: in the tree on the slide, the path from the true label $y$ to the predicted label $\hat{y}$ has four edges, so $\gamma(y, \hat{y}) = 4$.
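
The tree distance can be computed by walking both labels up toward the root until the paths meet. A minimal sketch, assuming the hierarchy is stored as a child-to-parent dictionary; the toy taxonomy and the function names are hypothetical and not taken from the slides.

```python
def path_to_root(node, parent):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(u, v, parent):
    """Number of edges on the unique path between u and v in the tree."""
    steps_from_u = {n: i for i, n in enumerate(path_to_root(u, parent))}
    for steps_from_v, n in enumerate(path_to_root(v, parent)):
        if n in steps_from_u:  # first shared ancestor is the lowest one
            return steps_from_u[n] + steps_from_v
    raise ValueError("nodes are not in the same tree")


# Hypothetical two-level taxonomy, stored as child -> parent.
parent = {
    "sports": "root", "science": "root",
    "soccer": "sports", "tennis": "sports",
    "physics": "science", "biology": "science",
}

print(tree_distance("soccer", "physics", parent))  # path goes through the root
print(tree_distance("soccer", "tennis", parent))   # siblings
```

Confusing two siblings costs less than confusing labels from different subtrees, which is the property the hierarchical losses build on.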

Page 22: Hierarchical Classification Rongcheng Lin Computer Science Department.

Tree Distance

(Figure: a tree with numbered nodes, with the true label $y$ and the predicted label $\hat{y}$ marked.)

$D(y, \hat{y}) = f_4 C_4 + f_1 C_1 + f_3 C_3$

Page 23: Hierarchical Classification Rongcheng Lin Computer Science Department.

Loss Functions

(Plot: zero-one loss, hinge loss, and hierarchical hinge loss plotted against the score difference $f_y(x) - f_{\hat{y}}(x)$, with axis marks at $1$ and $D(\hat{y}, y)$.)
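
The three losses can be compared as functions of the score difference $f_y(x) - f_{\hat{y}}(x)$. The sketch below assumes that the hierarchical hinge loss replaces the unit margin of the ordinary hinge loss with the tree-based distance $D(\hat{y}, y)$, as the axis marks at $1$ and $D(\hat{y}, y)$ suggest; the exact form on the slide is not preserved, and the function names are hypothetical.

```python
def zero_one_loss(diff):
    """1 if the wrong label scores at least as high as the true one, else 0."""
    return 0.0 if diff > 0 else 1.0


def hinge_loss(diff):
    """Ordinary hinge: asks for a score difference of at least 1."""
    return max(0.0, 1.0 - diff)


def hierarchical_hinge_loss(diff, distance):
    """Assumed form: asks for a score difference of at least D(y_hat, y)."""
    return max(0.0, distance - diff)


distance = 3.0  # hypothetical D(y_hat, y) for a distant wrong label
for diff in (-1.0, 0.5, 3.5):
    print(diff, zero_one_loss(diff), hinge_loss(diff),
          hierarchical_hinge_loss(diff, distance))
```

Under this assumption, a wrong label that is far away in the tree must be beaten by a larger score difference before its loss drops to zero.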

Page 24: Hierarchical Classification Rongcheng Lin Computer Science Department.

Path-based Approaches

Path-based approaches try to find the most likely path from the root.

Only the parameters of the misclassified nodes in the tree need to be updated.

Page 25: Hierarchical Classification Rongcheng Lin Computer Science Department.

Large margin hierarchical classifier

Score difference: $f_y(x) - f_{\hat{y}}(x)$

Note: $y$ is the correct label and $\hat{y} \neq y$.

Required margin: $\sqrt{\gamma(y, \hat{y})}$

Page 26: Hierarchical Classification Rongcheng Lin Computer Science Department.

Training Algorithm
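
The algorithm shown on this slide is not preserved in the transcript. The sketch below is one possible online update in the spirit of the large margin hierarchical classifier, under stated assumptions: every node holds a weight vector, a label's score is the sum of $w_v \cdot x$ over the nodes on its path from the root, and on a mistake only the nodes on which the true and predicted paths differ are moved, by just enough to reach a margin of $\sqrt{\gamma(y, \hat{y})}$. The toy taxonomy and all names are hypothetical.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def path_from_root(label, parent):
    """Nodes on the path from the root down to `label`, inclusive."""
    path = [label]
    while label in parent:
        label = parent[label]
        path.append(label)
    return path[::-1]


def score(label, x, weights, parent):
    """f_label(x): sum of w_v . x over the nodes v on the path to `label`."""
    return sum(dot(weights[v], x) for v in path_from_root(label, parent))


def predict(x, weights, parent, labels):
    return max(labels, key=lambda c: score(c, x, weights, parent))


def update(x, y, weights, parent, labels):
    """One online step; returns the loss suffered before the update."""
    y_hat = predict(x, weights, parent, labels)
    if y_hat == y:
        return 0.0
    true_nodes = set(path_from_root(y, parent))
    pred_nodes = set(path_from_root(y_hat, parent))
    gamma = len(true_nodes ^ pred_nodes)  # equals the tree distance
    loss = max(0.0, score(y_hat, x, weights, parent)
               - score(y, x, weights, parent) + math.sqrt(gamma))
    tau = loss / (gamma * dot(x, x))  # assumes x is not the zero vector
    for v in true_nodes - pred_nodes:  # pull the true path toward x
        weights[v] = [w + tau * xi for w, xi in zip(weights[v], x)]
    for v in pred_nodes - true_nodes:  # push the wrong path away from x
        weights[v] = [w - tau * xi for w, xi in zip(weights[v], x)]
    return loss


parent = {"sports": "root", "science": "root", "soccer": "sports",
          "tennis": "sports", "physics": "science", "biology": "science"}
labels = ["soccer", "tennis", "physics", "biology"]
weights = {n: [0.0, 0.0] for n in ["root", "sports", "science", *labels]}

x, y = [1.0, 0.0], "physics"
loss = update(x, y, weights, parent, labels)
print(loss, predict(x, weights, parent, labels))
```

Nodes shared by both paths, such as the root in this example, are never touched, which matches the remark that only misclassified nodes need updating.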

Page 27: Hierarchical Classification Rongcheng Lin Computer Science Department.

HSVM

Page 28: Hierarchical Classification Rongcheng Lin Computer Science Department.

HSVM

(Plot residue: axis $f_y(x) - f_{\hat{y}}(x)$, with marks at $1$ and $\Delta(y_i, y)$.)

Page 30: Hierarchical Classification Rongcheng Lin Computer Science Department.

Regularization-based Approaches

K individual classification tasks

Use an additional regularization term to penalize the disagreement between the individual models.
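
One simple way to express such a term, given here only as an illustrative assumption because the slide does not state its form, is the summed squared distance of each task's weight vector from the mean of all task vectors. The function name `disagreement_penalty` is hypothetical.

```python
def disagreement_penalty(models):
    """Sum over tasks of the squared distance to the mean weight vector.

    `models` is a list of K equal-length weight vectors, one per task.
    """
    k = len(models)
    dim = len(models[0])
    mean = [sum(m[j] for m in models) / k for j in range(dim)]
    return sum((m[j] - mean[j]) ** 2 for m in models for j in range(dim))


identical = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
different = [[1.0, 0.0], [3.0, 0.0]]
print(disagreement_penalty(identical))  # no disagreement, so no penalty
print(disagreement_penalty(different))  # grows as the models drift apart
```

Adding this quantity, scaled by a trade-off constant, to the sum of the K task losses couples the otherwise independent models.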

Page 31: Hierarchical Classification Rongcheng Lin Computer Science Department.

Multitask Learning

Multiple tasks are learned simultaneously to capture their intrinsic relatedness.

Page 32: Hierarchical Classification Rongcheng Lin Computer Science Department.
Page 33: Hierarchical Classification Rongcheng Lin Computer Science Department.

L1-Norm, L2-Norm

Penalize model complexity to avoid overfitting

The L1 norm gives sparser estimates than the L2 norm.
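
The difference in sparsity can be seen from the proximal (shrinkage) step associated with each penalty. This is a minimal sketch with hypothetical function names: the L1 step is soft-thresholding, which sets small coefficients exactly to zero, while the squared L2 step only rescales them.

```python
def prox_l1(v, lam):
    """Proximal step for lam * ||w||_1 (soft-thresholding)."""
    out = []
    for vi in v:
        if vi > lam:
            out.append(vi - lam)
        elif vi < -lam:
            out.append(vi + lam)
        else:
            out.append(0.0)  # small coefficients are removed entirely
    return out


def prox_l2_squared(v, lam):
    """Proximal step for (lam / 2) * ||w||_2^2 (uniform shrinkage)."""
    return [vi / (1.0 + lam) for vi in v]


v = [3.0, 0.5, -0.2, -2.0]
print(prox_l1(v, 1.0))          # the two small entries become exactly zero
print(prox_l2_squared(v, 1.0))  # every entry is halved, none becomes zero
```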

Page 34: Hierarchical Classification Rongcheng Lin Computer Science Department.

Group Lasso and Sparse Group Lasso

Page 35: Hierarchical Classification Rongcheng Lin Computer Science Department.

HMTL: Hierarchical Multitask Learning

A trade-off parameter determines the contribution of regularization toward the origin versus toward the parent node’s parameters (i.e., the strength of coupling between the node and its parent).
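
A minimal sketch of such a coupled regularizer, assuming a convex combination of the two squared distances. The exact expression on the slide is not preserved, so the formula, the parameter name `b`, and the function name are assumptions made for illustration.

```python
def hmtl_regularizer(w, w_parent, b):
    """Assumed form: 0.5 * (1 - b) * ||w||^2 + 0.5 * b * ||w - w_parent||^2.

    b = 0 regularizes toward the origin only (no coupling);
    b = 1 regularizes toward the parent only (full coupling).
    """
    if not 0.0 <= b <= 1.0:
        raise ValueError("b must lie in [0, 1]")
    to_origin = sum(wi * wi for wi in w)
    to_parent = sum((wi - pi) ** 2 for wi, pi in zip(w, w_parent))
    return 0.5 * (1.0 - b) * to_origin + 0.5 * b * to_parent


w, w_parent = [1.0, 2.0], [1.0, 1.0]
for b in (0.0, 0.5, 1.0):
    print(b, hmtl_regularizer(w, w_parent, b))
```

Because this child is closer to its parent than to the origin, the penalty shrinks as the coupling `b` increases.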

Page 36: Hierarchical Classification Rongcheng Lin Computer Science Department.

HMTL

Page 37: Hierarchical Classification Rongcheng Lin Computer Science Department.

Tree-Guided Group Lasso for Multi-Task Regression with Structured Sparsity

Original Approach:

New Approach:

Note:

Page 38: Hierarchical Classification Rongcheng Lin Computer Science Department.

Tree-Guided Group Lasso for Multi-Task Regression with Structured Sparsity

Each leaf node is a class.

Each inner node is a group of classes.
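
The leaf and inner-node structure translates directly into overlapping groups: every node defines the group of classes beneath it. The sketch below computes a tree-structured group penalty as a weighted sum of L2 norms over these groups, one feature at a time. The per-node weights are supplied by the caller and set uniformly here as a placeholder, not derived as in the original method; the tree and all names are hypothetical.

```python
import math


def leaves_under(node, children):
    """Classes (leaf nodes) in the subtree rooted at `node`."""
    kids = children.get(node, [])
    if not kids:
        return [node]
    found = []
    for kid in kids:
        found.extend(leaves_under(kid, children))
    return found


def tree_group_penalty(coef, children, node_weight):
    """Sum over features and nodes of weight * L2 norm of the node's group.

    `coef` holds one dict per feature, mapping class name -> coefficient.
    """
    total = 0.0
    for row in coef:
        for node, weight in node_weight.items():
            group = leaves_under(node, children)
            total += weight * math.sqrt(sum(row[c] ** 2 for c in group))
    return total


# Hypothetical tree: root -> {a, b}, a -> {c1, c2}; the leaves are c1, c2, b.
children = {"root": ["a", "b"], "a": ["c1", "c2"]}
node_weight = {n: 1.0 for n in ["root", "a", "b", "c1", "c2"]}  # placeholder
coef = [
    {"c1": 3.0, "c2": 4.0, "b": 0.0},  # feature used by the classes under a
    {"c1": 0.0, "c2": 0.0, "b": 0.0},  # feature unused by every class
]
print(tree_group_penalty(coef, children, node_weight))
```

A feature that is zero for a whole subtree contributes nothing for that node, so the penalty favors dropping features jointly for related classes.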

Page 39: Hierarchical Classification Rongcheng Lin Computer Science Department.
Page 40: Hierarchical Classification Rongcheng Lin Computer Science Department.

Tree-Guided Group Lasso

Page 41: Hierarchical Classification Rongcheng Lin Computer Science Department.
Page 42: Hierarchical Classification Rongcheng Lin Computer Science Department.
Page 43: Hierarchical Classification Rongcheng Lin Computer Science Department.
Page 45: Hierarchical Classification Rongcheng Lin Computer Science Department.

Advantages and Drawbacks

Assume the children are good: Tree-Guided Group Lasso

Assume the parent is good: HMTL

Assume neither is good: path-based approaches

It depends!