
Experimental Setup, Multi-class vs. Multi-label Classification, and Evaluation

CMSC 678, UMBC


Central Question: How Well Are We Doing?

the task: what kind of problem are you solving?

Classification:
• Precision, Recall, F1
• Accuracy
• Log-loss
• ROC-AUC
• …

Regression:
• (Root) Mean Square Error
• Mean Absolute Error
• …

Clustering:
• Mutual Information
• V-score
• …

This does not have to be the same thing as the loss function you optimize.

Outline

Experimental Design: Rule 1

Multi-class vs. Multi-label classification

Evaluation: Regression Metrics, Classification Metrics

Experimenting with Machine Learning Models

All your data, split into: Training Data | Dev Data | Test Data

Rule #1


Experimenting with Machine Learning Models

What is “correct?” What is working “well?”

Training Data | Dev Data | Test Data

Learn model parameters from the training set.

Set hyper-parameters; evaluate the learned model on dev with that hyperparameter setting.

Perform the final evaluation on test, using the hyperparameters that optimized dev performance, and retraining the model.

Rule 1: DO NOT ITERATE ON THE TEST DATA
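Below is a minimal sketch (not from the slides) of this workflow in scikit-learn, with made-up data and a hypothetical grid of regularization strengths; the point is only that dev picks the hyperparameter and test is touched exactly once.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: 300 examples, 5 features, binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)

# One fixed split into train / dev / test.
X_tr, y_tr = X[:200], y[:200]
X_dev, y_dev = X[200:250], y[200:250]
X_te, y_te = X[250:], y[250:]

best_C, best_dev_acc = None, -1.0
for C in [0.01, 0.1, 1.0, 10.0]:                          # hyperparameter candidates
    model = LogisticRegression(C=C).fit(X_tr, y_tr)       # learn parameters on train
    dev_acc = model.score(X_dev, y_dev)                   # evaluate on dev
    if dev_acc > best_dev_acc:
        best_C, best_dev_acc = C, dev_acc

# Retrain with the chosen hyperparameter (here on train; train+dev is also common),
# then evaluate on test exactly once.
final_model = LogisticRegression(C=best_C).fit(X_tr, y_tr)
print("test accuracy:", final_model.score(X_te, y_te))
```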

On-board Exercise

Produce dev and test tables for a linear regression model with learned weights and a set/fixed (non-learned) bias.

Outline

Experimental Design: Rule 1

Multi-class vs. Multi-label classification

Evaluation: Regression Metrics, Classification Metrics


Multi-class Classification

Given input 𝑥, predict discrete label 𝑦

If 𝑦 ∈ {0,1} (or 𝑦 ∈ {True, False}), then it is a binary classification task.

If 𝑦 ∈ {0,1, … , 𝐾 − 1} (for finite K), then it is a multi-class classification task.

Q: What are some examples of multi-class classification?

A: Many possibilities. See A2, Q{1,2,4-7}


Multi-label Classification

(Multi-class classification has a single output; multi-label classification is multi-output.)

Given input 𝑥, predict multiple discrete labels 𝑦 = (𝑦₁, … , 𝑦_L)

If multiple 𝑦ᵢ are predicted, then it is a multi-label classification task

Each 𝑦ᵢ could be binary or multi-class
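As a concrete (hypothetical) illustration of how the targets differ in shape:

```python
import numpy as np

# Multi-class: one label per example, y_i in {0, ..., K-1}.
y_multiclass = np.array([2, 0, 1, 2])            # shape (N,)

# Multi-label: several labels per example, here encoded as a binary indicator
# vector over L possible labels (each entry could itself be binary or multi-class).
y_multilabel = np.array([[1, 0, 1],              # example 0 carries labels 0 and 2
                         [0, 1, 0],
                         [1, 1, 1],
                         [0, 0, 0]])             # shape (N, L)
```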

Multi-Label Classification…

Will not be a primary focus of this course

Many of the single output classification methods apply to multi-label classification

Predicting “in the wild” can be trickier

Evaluation can be trickier


We’ve only developed binary classifiers so far…

Option 1: Develop a multi-class version

Option 2: Build a one-vs-all (OvA) classifier

Option 3: Build an all-vs-all (AvA) classifier

(there can be others)

Loss function may (or may not) need to be extended & the model structure may need to change (big or small)

Common change: instead of a single weight vector 𝑤, keep a weight vector 𝑤^(c) for each class c.

Compute class-specific scores, e.g., ŷ^(c) = 𝑤^(c)ᵀ 𝑥 + 𝑏^(c)
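A minimal numpy sketch of that per-class change, with hypothetical dimensions: stack the per-class weight vectors into a matrix and take the argmax of the class-specific scores.

```python
import numpy as np

C, d = 4, 10                        # number of classes, feature dimension (hypothetical)
W = np.random.randn(C, d)           # one weight vector w^(c) per class c, stacked as rows
b = np.random.randn(C)              # one bias b^(c) per class c
x = np.random.randn(d)

scores = W @ x + b                  # class-specific scores y_hat^(c) = w^(c)^T x + b^(c)
predicted_class = int(np.argmax(scores))
```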

Multi-class Option 1: Linear Regression/Perceptron

[Diagram: input 𝑥 feeds a single weight vector 𝐰 to produce output 𝑦]

𝑦 = 𝐰ᵀ𝑥 + 𝑏

output: if y > 0: class 1; else: class 2

Multi-class Option 1: Linear Regression/Perceptron: A Per-Class View

[Diagram: the binary model 𝑦 = 𝐰ᵀ𝑥 + 𝑏 alongside a per-class model with weight vectors 𝐰₁ and 𝐰₂ producing scores 𝑦₁ and 𝑦₂]

𝑦₁ = 𝐰₁ᵀ𝑥 + 𝑏₁
𝑦₂ = 𝐰₂ᵀ𝑥 + 𝑏₂

binary output: if y > 0: class 1; else: class 2
per-class output: i = argmax {y₁, y₂}; predict class i

The binary version is a special case.

Multi-class Option 1: Linear Regression/Perceptron: A Per-Class View (alternative)

[Diagram: the same per-class model written with one concatenated weight vector [𝐰₁; 𝐰₂]]

𝑦₁ = [𝐰₁; 𝐰₂]ᵀ [𝑥; 𝟎] + 𝑏₁
𝑦₂ = [𝐰₁; 𝐰₂]ᵀ [𝟎; 𝑥] + 𝑏₂

([a; b] denotes concatenation)

output: i = argmax {y₁, y₂}; predict class i

Q: (For discussion) Why does this work?


Option 2: One-vs-all (OvA)

With C classes:

Train C different binary classifiers γ_c(x)

γ_c(x) predicts 1 if x is likely class c, 0 otherwise

To test/predict a new instance z:
Get scores s_c = γ_c(z)
Output the class with the max score: ŷ = argmax_c s_c
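A hedged sketch of OvA on hypothetical data, rolling the C binary classifiers by hand; scikit-learn's sklearn.multiclass.OneVsRestClassifier packages the same idea.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data with C = 3 classes.
X = np.random.randn(200, 5)
y = np.random.randint(0, 3, size=200)

classifiers = []
for c in range(3):
    # gamma_c: class c vs. all other classes.
    clf = LogisticRegression().fit(X, (y == c).astype(int))
    classifiers.append(clf)

z = np.random.randn(1, 5)                                       # new instance
scores = [clf.decision_function(z)[0] for clf in classifiers]   # s_c = gamma_c(z)
y_hat = int(np.argmax(scores))                                  # argmax over classes
```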


Option 3: All-vs-all (AvA)

With C classes:

Train C(C − 1)/2 different binary classifiers γ_{c₁,c₂}(x)

γ_{c₁,c₂}(x) predicts 1 if x is likely class c₁, 0 otherwise (likely class c₂)

To test/predict a new instance z:
Get scores or predictions s_{c₁,c₂} = γ_{c₁,c₂}(z)
Multiple options for the final prediction:
(1) count the # of times a class c was predicted
(2) a margin-based approach
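A similar sketch for AvA on hypothetical data, combining the pairwise classifiers by vote counting (option 1); sklearn.multiclass.OneVsOneClassifier implements this scheme.

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import LogisticRegression

# Hypothetical data with C = 4 classes.
X = np.random.randn(300, 5)
y = np.random.randint(0, 4, size=300)

pairwise = {}
for c1, c2 in combinations(range(4), 2):            # C*(C-1)/2 classifiers
    mask = (y == c1) | (y == c2)                    # keep only the two classes
    clf = LogisticRegression().fit(X[mask], (y[mask] == c1).astype(int))
    pairwise[(c1, c2)] = clf

z = np.random.randn(1, 5)                           # new instance
votes = np.zeros(4)
for (c1, c2), clf in pairwise.items():
    votes[c1 if clf.predict(z)[0] == 1 else c2] += 1   # option (1): count predictions
y_hat = int(np.argmax(votes))
```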



Q: (to discuss)

Why might you want to use option 1, or options OvA/AvA?

What are the benefits of OvA vs. AvA?

What if you start with a balanced dataset, e.g., 100 instances per class?

Outline

Experimental Design: Rule 1

Multi-class vs. Multi-label classification

Evaluation: Regression Metrics, Classification Metrics


Regression Metrics

(Root) Mean Square Error:
RMSE = √( (1/N) Σᵢ (𝑦ᵢ − ŷᵢ)² )

Mean Absolute Error:
MAE = (1/N) Σᵢ |𝑦ᵢ − ŷᵢ|

Q: How can these reward/punish predictions differently?

A: RMSE punishes outlier predictions more harshly.
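For example, computed directly with numpy on hypothetical predictions (sklearn.metrics.mean_squared_error and mean_absolute_error give the same quantities):

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # hypothetical targets
y_pred = np.array([2.5,  0.0, 2.0, 8.0])   # hypothetical predictions

rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))   # squaring punishes large errors more
mae = np.mean(np.abs(y_true - y_pred))
print(rmse, mae)
```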

Outline

Experimental Design: Rule 1

Multi-class vs. Multi-label classification

Evaluation: Regression Metrics, Classification Metrics

Training Loss vs. Evaluation Score

In training, compute loss to update parameters

Sometimes the loss is a computational compromise: a surrogate loss.

The loss you use might not be as informative as you’d like

Binary classification example: 90 of 100 training examples are +1 and 10 are −1, so a classifier that always predicts +1 already reaches 90% accuracy.

Some Classification Metrics

Accuracy

Precision

Recall

AUC (Area Under Curve)

F1

Confusion Matrix


Classification Evaluation: the 2-by-2 contingency table

                           Actually Correct      Actually Incorrect
Selected/Guessed           True Positive (TP)    False Positive (FP)
Not selected/not guessed   False Negative (FN)   True Negative (TN)


Classification Evaluation: Accuracy, Precision, and Recall

Accuracy: % of items correct
Accuracy = (TP + TN) / (TP + FP + FN + TN)

Precision: % of selected items that are correct
Precision = TP / (TP + FP)

Recall: % of correct items that are selected
Recall = TP / (TP + FN)

Min: 0 ☹  Max: 1 😀
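Computed from hypothetical counts of the four cells above (sklearn.metrics.accuracy_score, precision_score, and recall_score compute the same quantities from label vectors):

```python
# Hypothetical counts from the 2-by-2 contingency table.
TP, FP, FN, TN = 30, 10, 5, 55

accuracy = (TP + TN) / (TP + FP + FN + TN)   # fraction of all items that are correct
precision = TP / (TP + FP)                   # fraction of selected items that are correct
recall = TP / (TP + FN)                      # fraction of correct items that are selected
print(accuracy, precision, recall)
```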


Precision and Recall Present a Tradeoff

[Plot: precision (y-axis, 0 to 1) vs. recall (x-axis, 0 to 1); each plotted point is a model]

Q: Where do you want your ideal model?

Q: You have a model that always identifies correct instances. Where on this graph is it?

Q: You have a model that only makes correct predictions. Where on this graph is it?

Idea: measure the tradeoff between precision and recall.

Remember those hyperparameters: each point is a differently trained/tuned model.

Improve the overall model: push the curve toward the top right.


Measure this Tradeoff: Area Under the Curve (AUC)

AUC measures the area under this tradeoff curve.

1. Computing the curve
You need true labels & predicted labels with some score/confidence estimate.
Threshold the scores and, for each threshold, compute precision and recall.

2. Finding the area
How to implement: trapezoidal rule (& others).
In practice: use an external library like the sklearn.metrics module.

[Plot: precision vs. recall, both axes 0 to 1; improving the overall model pushes the curve toward the top right]

Min AUC: 0 ☹  Max AUC: 1 😀
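A small sklearn sketch with hypothetical labels and scores: precision_recall_curve does the thresholding, and auc applies the trapezoidal rule.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, auc

y_true = np.array([0, 0, 1, 1, 1, 0, 1])                    # hypothetical true labels
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.65, 0.2, 0.9])    # hypothetical confidences

# Sweep thresholds over the scores, computing precision and recall at each one.
precision, recall, thresholds = precision_recall_curve(y_true, scores)

pr_auc = auc(recall, precision)   # trapezoidal area under the precision-recall curve
print(pr_auc)
```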

Measure a Slightly Different Tradeoff: ROC-AUC

Main variant: ROC-AUC. Same idea as before, but with some flipped metrics.

1. Computing the curve
You need true labels & predicted labels with some score/confidence estimate.
Threshold the scores and, for each threshold, compute the metrics.

2. Finding the area
How to implement: trapezoidal rule (& others).
In practice: use an external library like the sklearn.metrics module.

[Plot: true positive rate (y-axis) vs. false positive rate (x-axis), both 0 to 1; improving the overall model pushes the curve toward the top left]

Min ROC-AUC: 0.5 ☹  Max ROC-AUC: 1 😀
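The ROC variant on the same kind of hypothetical labels/scores, via sklearn.metrics.roc_curve and roc_auc_score:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true = np.array([0, 0, 1, 1, 1, 0, 1])                    # hypothetical true labels
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.65, 0.2, 0.9])    # hypothetical confidences

fpr, tpr, thresholds = roc_curve(y_true, scores)   # the "flipped" metrics: FPR vs. TPR
roc_auc = roc_auc_score(y_true, scores)            # 0.5 is chance level, 1.0 is perfect
print(roc_auc)
```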


A combined measure: F

Weighted (harmonic) average of Precision & Recall:

F = 1 / ( α·(1/P) + (1 − α)·(1/R) ) = (1 + β²)·P·R / (β²·P + R)

(the second form is just algebra, with β² = (1 − α)/α; not important)

Balanced F1 measure: β = 1

F1 = 2·P·R / (P + R)
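Plugging hypothetical precision/recall values into these formulas (sklearn.metrics.f1_score and fbeta_score compute the same from label vectors):

```python
# Hypothetical precision and recall.
P, R = 0.75, 0.60
beta = 1.0

f_beta = (1 + beta**2) * P * R / (beta**2 * P + R)   # general F_beta
f1 = 2 * P * R / (P + R)                              # beta = 1 gives the balanced F1
print(f_beta, f1)
```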

P/R/F in a Multi-class Setting: Micro- vs. Macro-Averaging

If we have more than one class, how do we combine multiple performance measures into one quantity?

Macroaveraging: compute performance for each class, then average.

Microaveraging: collect decisions for all classes, compute the contingency table, evaluate.

Sec. 15.2.4


microprecision = Σ_c TP_c / ( Σ_c TP_c + Σ_c FP_c )

macroprecision = (1/|C|) Σ_c TP_c / (TP_c + FP_c) = (1/|C|) Σ_c precision_c

When should you prefer the macroaverage? When should you prefer the microaverage?

Micro- vs. Macro-Averaging: Example (Sec. 15.2.4)

Class 1:
                  Truth: yes   Truth: no
Classifier: yes       10           10
Classifier: no        10          970

Class 2:
                  Truth: yes   Truth: no
Classifier: yes       90           10
Classifier: no        10          890

Micro Ave. Table (pooled counts):
                  Truth: yes   Truth: no
Classifier: yes      100           20
Classifier: no        20         1860

Macroaveraged precision: (0.5 + 0.9)/2 = 0.7
Microaveraged precision: 100/120 ≈ 0.83
The microaveraged score is dominated by the score on frequent classes.
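The same numbers reproduced with numpy, reading TP and FP for each class off the tables above:

```python
import numpy as np

# Per-class counts from the example tables (class 1, class 2).
TP = np.array([10, 90])
FP = np.array([10, 10])

macro_precision = np.mean(TP / (TP + FP))            # (0.5 + 0.9) / 2 = 0.7
micro_precision = TP.sum() / (TP.sum() + FP.sum())   # 100 / 120 ≈ 0.83
print(macro_precision, micro_precision)
```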

Confusion Matrix: Generalizing the 2-by-2 contingency table

Correct Value vs. Guessed Value:

#   #   #
#   #   #
#   #   #

Confusion Matrix: Generalizing the 2-by-2 contingency table

Correct Value vs. Guessed Value:

80    9   11
 7   86    7
 2    8    9

Q: Is this a good result?

Confusion Matrix: Generalizing the 2-by-2 contingency table

Correct Value vs. Guessed Value:

30   40   30
25   30   50
30   35   35

Q: Is this a good result?

Confusion Matrix: Generalizing the 2-by-2 contingency table

Correct Value vs. Guessed Value:

 7    3   90
 4    8   88
 3    7   90

Q: Is this a good result?
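In practice such a matrix can be built with sklearn; a small sketch with hypothetical labels (sklearn's own convention is rows = true class, columns = predicted class):

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 2, 2, 2]   # hypothetical gold labels
y_pred = [0, 1, 1, 1, 2, 0, 2]   # hypothetical predictions

cm = confusion_matrix(y_true, y_pred)   # rows: true class, columns: predicted class
print(cm)
```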

Some Classification Metrics

Accuracy

Precision

Recall

AUC (Area Under Curve)

F1

Confusion Matrix

Outline

Experimental Design: Rule 1

Multi-class vs. Multi-label classification

Evaluation: Regression Metrics, Classification Metrics