Learning from Imbalanced Data
Prof. Haibo He
Electrical Engineering University of Rhode Island, Kingston, RI 02881
Computational Intelligence and Self-Adaptive Systems (CISA) Laboratory
http://www.ele.uri.edu/faculty/he/ Email: [email protected]
These lecture notes are based on the following paper: H. He and E. A. Garcia, “Learning from Imbalanced Data,” IEEE Trans. Knowledge and Data Engineering, vol. 21, issue 9, pp. 1263-1284, 2009
Source: H. He and E. A. Garcia, “Learning from Imbalanced Data,” IEEE Trans. Knowledge and Data Engineering, vol. 21, issue 9, pp. 1263-1284, 2009
Learning from Imbalanced Data
1. The problem: Imbalanced Learning
2. The solutions: State-of-the-art
3. The evaluation: Assessment Metrics
4. The future: Opportunities and Challenges
The Nature of the Imbalanced Learning Problem
The Nature of Imbalance Learning
The Problem
✓ Explosive availability of raw data
✓ Well-developed algorithms for data analysis
Requirement?
• Balanced distribution of data
• Equal costs of misclassification
What about data in reality?
Imbalance is Everywhere
• Between-class
• Within-class
• Intrinsic and extrinsic
• Relativity and rarity
• Imbalance and small sample size
Growing interest
The Nature of Imbalance Learning
Mammography Data Set: An example of between-class imbalance

| | Negative/healthy | Positive/cancerous |
|---|---|---|
| Number of cases | 10,923 | 260 |
| Category | Majority | Minority |
| Imbalanced accuracy | ≈ 100% | 0-10% |

Imbalance can be on the order of 100 : 1 up to 10,000 : 1!
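As a quick check of the figures in the mammography table, the following sketch computes the class ratio and the accuracy of a trivial classifier that labels every case as negative. The baseline classifier and the variable names are illustrative assumptions, not part of the lecture; the point is only that overall accuracy can look excellent while no minority case is detected.

```python
# Class counts taken from the mammography table.
n_negative = 10_923  # majority (healthy)
n_positive = 260     # minority (cancerous)

imbalance_ratio = n_negative / n_positive

# Hypothetical baseline: predict "negative" for every case.
overall_accuracy = n_negative / (n_negative + n_positive)
minority_accuracy = 0 / n_positive  # no positive case is ever detected

print(f"imbalance ratio   : {imbalance_ratio:.1f} : 1")
print(f"overall accuracy  : {overall_accuracy:.1%}")
print(f"minority accuracy : {minority_accuracy:.1%}")
```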
Intrinsic and extrinsic imbalance
Intrinsic:
• Imbalance due to the nature of the dataspace
Extrinsic:
• Imbalance due to time, storage, and other factors
• Example: Data transmission over a specific interval of time with interruption
[Figure: x(t) plotted against time t, with t from 0 to 100 and x(t) from -2 to 2]
Data Complexity
The Nature of Imbalance Learning
Relative imbalance and absolute rarity
• The minority class may be outnumbered, but not necessarily rare
• Therefore it can be accurately learned with little disturbance
Imbalanced data with small sample size
• Data with high dimensionality and small sample size
  • Face recognition, gene expression
• Challenges with small sample size:
  1. Embedded absolute rarity and within-class imbalances
  2. Failure of generalizing inductive rules by learning algorithms
     • Difficulty in forming a good classification decision boundary over more features but fewer samples
     • Risk of overfitting
The Solutions to the Imbalanced Learning Problem
Solutions to imbalanced learning
Sampling methods
Cost-sensitive methods
Kernel and Active Learning methods
Sampling methods
If data is imbalanced…
• Modify the data distribution
• Create a balanced dataset
Create balance through sampling
Sampling methods
Random Sampling
S: training data set; S_min: set of minority class samples; S_maj: set of majority class samples; E: generated samples
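Using the notation above, random oversampling adds a randomly drawn set E of minority samples to S, while random undersampling removes a randomly chosen set E from S_maj. The sketch below is a minimal illustration that balances the classes exactly; the function names and the toy data are assumptions for demonstration only.

```python
import random

def random_oversample(s_min, s_maj, rng):
    """Add a set E of minority samples, drawn with replacement, until classes match."""
    e = [rng.choice(s_min) for _ in range(len(s_maj) - len(s_min))]
    return s_min + e, s_maj

def random_undersample(s_min, s_maj, rng):
    """Keep a random subset of the majority class the same size as the minority class."""
    kept = rng.sample(s_maj, k=len(s_min))
    return s_min, kept

rng = random.Random(0)
s_min = [("min", i) for i in range(5)]   # toy minority samples
s_maj = [("maj", i) for i in range(50)]  # toy majority samples

over_min, over_maj = random_oversample(s_min, s_maj, rng)
under_min, under_maj = random_undersample(s_min, s_maj, rng)
print(len(over_min), len(over_maj))    # 50 50
print(len(under_min), len(under_maj))  # 5 5
```

Oversampling only replicates existing minority samples, and undersampling discards majority samples that may carry useful information, which motivates the informed and synthetic variants on the following slides.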
Sampling methods
Informed Undersampling
• EasyEnsemble
  • Unsupervised: use random subsets of the majority class to create balance and form multiple classifiers
• BalanceCascade
  • Supervised: iteratively create balance and pull out redundant samples in majority class to form a final classifier
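The EasyEnsemble idea can be sketched as follows: draw several random majority subsets, each the size of the minority class, train one classifier per balanced subset, and combine them. This is a simplified illustration under stated assumptions: a toy one-dimensional threshold learner stands in for whatever base classifier the original method uses, and the classifiers are combined by a plain majority vote. All names and data are hypothetical.

```python
import random

def easy_ensemble_subsets(s_min, s_maj, n_subsets, rng):
    """Pair the full minority set with n_subsets random majority subsets of equal size."""
    return [
        (list(s_min), rng.sample(s_maj, k=len(s_min)))
        for _ in range(n_subsets)
    ]

def train_threshold(minority, majority):
    """Toy 1-D learner (stand-in for a real classifier): midpoint of the class means."""
    mean_min = sum(minority) / len(minority)
    mean_maj = sum(majority) / len(majority)
    threshold = (mean_min + mean_maj) / 2
    minority_is_high = mean_min > mean_maj
    return lambda x: (x > threshold) == minority_is_high  # True means "minority"

def vote(classifiers, x):
    """Majority vote over the ensemble."""
    return sum(clf(x) for clf in classifiers) > len(classifiers) / 2

rng = random.Random(0)
s_maj = [rng.gauss(0.0, 1.0) for _ in range(500)]  # toy majority data
s_min = [rng.gauss(4.0, 1.0) for _ in range(20)]   # toy minority data

subsets = easy_ensemble_subsets(s_min, s_maj, n_subsets=5, rng=rng)
ensemble = [train_threshold(mn, mj) for mn, mj in subsets]
print(vote(ensemble, 4.0), vote(ensemble, 0.0))
```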
Sampling methods
Informed Undersampling
Sampling methods
Synthetic Sampling with Data Generation
Sampling methods
Synthetic Sampling with Data Generation
• Synthetic minority oversampling technique (SMOTE)
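SMOTE creates a synthetic minority sample by interpolating between a minority point x_i and one of its K nearest minority neighbours x̂_i: x_new = x_i + δ · (x̂_i − x_i), with δ drawn uniformly from [0, 1]. The sketch below is a minimal pure-Python illustration of that rule; the function name, the toy data, and the brute-force neighbour search are assumptions, not a reference implementation.

```python
import math
import random

def smote(s_min, n_new, k, rng):
    """Generate n_new synthetic minority points.

    Each point is x_i + delta * (x_hat - x_i), where x_hat is one of the
    k nearest minority neighbours of x_i and delta is uniform in [0, 1).
    """
    synthetic = []
    for _ in range(n_new):
        x_i = rng.choice(s_min)
        neighbours = sorted(
            (p for p in s_min if p is not x_i),
            key=lambda p: math.dist(p, x_i),
        )[:k]
        x_hat = rng.choice(neighbours)
        delta = rng.random()
        synthetic.append(tuple(a + delta * (b - a) for a, b in zip(x_i, x_hat)))
    return synthetic

rng = random.Random(0)
s_min = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.7)]  # toy data
for point in smote(s_min, n_new=3, k=2, rng=rng):
    print(tuple(round(v, 3) for v in point))
```

Every synthetic point lies on the line segment between two existing minority points, so the new samples stay inside the region already occupied by the minority class.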
Sampling methods
Adaptive Synthetic Sampling
Sampling methods
Adaptive Synthetic Sampling
• Overcomes overgeneralization in the SMOTE algorithm
  • Borderline-SMOTE
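Borderline-SMOTE restricts synthesis to minority points near the class boundary. A minority point is treated as borderline ("in danger") when, among its m nearest neighbours in the whole data set, at least m/2 but fewer than m belong to the majority class; points with all m neighbours in the majority class are regarded as noise. The sketch below only identifies that set and is an illustration under these assumptions, with hypothetical names and toy data.

```python
import math

def danger_set(s_min, s_maj, m):
    """Return minority points whose m nearest neighbours (over the whole data
    set) contain at least m/2 but fewer than m majority points."""
    labelled = [(p, "min") for p in s_min] + [(p, "maj") for p in s_maj]
    danger = []
    for x in s_min:
        neighbours = sorted(
            ((p, lab) for p, lab in labelled if p is not x),
            key=lambda item: math.dist(item[0], x),
        )[:m]
        n_maj = sum(1 for _, lab in neighbours if lab == "maj")
        if m / 2 <= n_maj < m:
            danger.append(x)
    return danger

s_maj = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.9, 0.9)]
s_min = [(2.0, 2.0), (2.1, 2.2), (2.2, 2.0), (1.0, 1.0), (1.1, 1.2)]
print(danger_set(s_min, s_maj, m=3))  # [(1.0, 1.0)]
```

Only the point (1.0, 1.0) has two majority points among its three nearest neighbours, so it is the only seed that would be used for synthetic sample generation here.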
Sampling methods
Adaptive Synthetic Sampling
Sampling methods
Sampling with Data Cleaning
Sampling methods
Sampling with Data Cleaning
Sampling methods
Cluster-based oversampling (CBO) method
Sampling methods
CBO Method
Sampling methods
Integration of Sampling and Boosting
Sampling methods
Integration of Sampling and Boosting
Cost-Sensitive Methods
Instead of modifying data…
• Consider the cost of misclassification
• Utilize cost-sensitive methods for imbalanced learning
Cost-Sensitive Learning Framework
Cost-Sensitive methods
Cost-Sensitive methods
Cost-Sensitive Dataspace Weighting with Adaptive Boosting
Cost-Sensitive methods
Cost-Sensitive Dataspace Weighting with Adaptive Boosting
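One way to picture cost-sensitive dataspace weighting is to insert a per-example cost factor C_i into the AdaBoost weight update. The sketch below follows the AdaC2-style form, as we understand it from the cited survey, in which the cost sits outside the exponent: D_{t+1}(i) = C_i · D_t(i) · exp(−α_t h_t(x_i) y_i) / Z_t. The cost values, labels, and the fixed α are hypothetical, and the cost-aware computation of α_t is deliberately left out.

```python
import math

def adac2_update(weights, costs, y_true, y_pred, alpha):
    """One AdaC2-style weight update: the cost factor sits outside the exponent.

    D_{t+1}(i) = C_i * D_t(i) * exp(-alpha * h(x_i) * y_i) / Z
    Labels and predictions are in {-1, +1}.
    """
    raw = [
        c * d * math.exp(-alpha * h * y)
        for d, c, y, h in zip(weights, costs, y_true, y_pred)
    ]
    z = sum(raw)  # normalisation constant
    return [r / z for r in raw]

y_true = [+1, +1, -1, -1, -1, -1]          # +1 = minority class
y_pred = [-1, +1, -1, -1, +1, -1]          # one error in each class
costs = [1.0, 1.0, 0.2, 0.2, 0.2, 0.2]     # hypothetical: minority errors cost more
weights = [1 / 6] * 6

new_weights = adac2_update(weights, costs, y_true, y_pred, alpha=0.5)
print([round(w, 3) for w in new_weights])
```

After the update, the misclassified minority example carries more weight than the misclassified majority example, so the next weak learner is pushed toward the costly class.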
Cost-Sensitive methods
Cost-Sensitive Decision Trees
1. Cost-sensitive adjustments for the decision threshold
   • The final decision threshold shall yield the most dominant point on the ROC curve
2. Cost-sensitive considerations for split criteria
   • The impurity function shall be insensitive to unequal costs
3. Cost-sensitive pruning schemes
   • The probability estimate at each node needs improvement to reduce removal of leaves describing the minority concept
   • Laplace smoothing method and Laplace pruning techniques
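To make the Laplace smoothing item concrete, the sketch below compares the raw frequency estimate k/n at a leaf with the Laplace estimate (k + 1)/(n + c) for c classes. The leaf counts are hypothetical; the aim is only to show how smoothing tempers the overconfident estimates that small leaves produce.

```python
def leaf_probability(n_class, n_total, n_classes=2, laplace=True):
    """Estimate P(class | leaf) from the training counts at a leaf."""
    if laplace:
        return (n_class + 1) / (n_total + n_classes)
    return n_class / n_total

# Hypothetical leaves: (minority count, total count)
for k, n in [(2, 2), (50, 50), (0, 3)]:
    raw = leaf_probability(k, n, laplace=False)
    smooth = leaf_probability(k, n)
    print(f"k={k:2d} n={n:2d}  raw={raw:.3f}  laplace={smooth:.3f}")
```

A leaf with 2 of 2 minority examples and a leaf with 50 of 50 both have a raw estimate of 1.0, but the smoothed estimates (0.750 and about 0.981) reflect how much evidence supports each one.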
Cost-Sensitive methods
Cost-Sensitive Neural Network
Four ways of applying cost sensitivity in neural networks
• Modifying probability estimate of outputs
  • Applied only at testing stage
  • Maintain original neural networks
• Altering outputs directly
  • Bias neural networks during training to focus on expensive class
• Modify learning rate
  • Set η higher for costly examples and lower for low-cost examples
• Replacing error-minimizing function
  • Use expected cost minimization function instead
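The first option in the list, adjusting the decision at test time while leaving the trained network unchanged, can be sketched as a minimum-expected-cost rule: predict the class i that minimises Σ_j P(j | x) · C(i, j). The cost matrix and output probabilities below are hypothetical values chosen for illustration.

```python
def min_expected_cost_class(probs, cost):
    """Pick the class i minimising sum_j P(j | x) * cost[i][j],
    where cost[i][j] is the cost of predicting i when the truth is j."""
    n = len(probs)
    risks = [sum(probs[j] * cost[i][j] for j in range(n)) for i in range(n)]
    return min(range(n), key=risks.__getitem__), risks

# Hypothetical costs: class 0 = majority, class 1 = minority.
# Missing a minority example (predict 0, truth 1) costs 10; a false alarm costs 1.
cost = [[0, 10],
        [1, 0]]

probs = [0.8, 0.2]  # network output for one test example
label, risks = min_expected_cost_class(probs, cost)
print(label, risks)
```

Although the network assigns only 0.2 probability to the minority class, the expected cost of predicting the majority class (2.0) exceeds that of predicting the minority class (0.8), so the example is labelled as minority.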
Kernel-Based Methods
Kernel-based learning framework
• Based on statistical learning and Vapnik-Chervonenkis (VC) dimensions
• Problems with kernel-based support vector machines (SVMs)
  1. Support vectors from the minority concept may contribute less to the final hypothesis
  2. The optimal hyperplane is also biased toward the majority class
• Minimizing the total error biases the hyperplane toward the majority
Kernel-Based Methods
Integration of Kernel Methods with Sampling Methods
1. SMOTE with Different Costs (SDCs) method
2. Ensembles of over/under-sampled SVMs
3. SVM with asymmetric misclassification cost
4. Granular Support Vector Machines-Repetitive Undersampling (GSVM-RU) algorithm
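For item 3, a common way to write an SVM with asymmetric misclassification cost is to give each class its own penalty on the slack variables. The formulation below is the standard soft-margin objective with separate penalties; the symbols C^{+} and C^{-} are notation assumed here, not taken from the slides.

```latex
\min_{w,\,b,\,\xi}\quad
  \frac{1}{2}\lVert w \rVert^{2}
  + C^{+} \sum_{i:\,y_i = +1} \xi_i
  + C^{-} \sum_{i:\,y_i = -1} \xi_i
\qquad \text{s.t.}\quad
  y_i \bigl( w^{\top} \phi(x_i) + b \bigr) \ge 1 - \xi_i,
  \quad \xi_i \ge 0 .
```

Choosing C^{+} > C^{-} for the minority (positive) class makes margin violations on minority examples more expensive, which counteracts the pull of the hyperplane toward the majority class.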
Kernel-Based Methods
Kernel Modification Methods
1. Kernel classifier construction
   • Orthogonal forward selection (OFS) and regularized orthogonal weighted least squares (ROWLSs) estimator
2. SVM class boundary adjustment
   • Boundary movement (BM), biased penalties (BP), class-boundary alignment (CBA), kernel-boundary alignment (KBA)
3. Integrated approach
   • Total margin-based adaptive fuzzy SVM (TAF-SVM)
4. K-category proximal SVM (PSVM) with Newton refinement
5. Support cluster machines (SCMs), kernel neural gas (KNG), P2PKNNC algorithm, hybrid kernel machine ensemble (HKME) algorithm, AdaBoost relevance vector machine (RVM), …
Active Learning Methods
• SVM-based active learning
• Active learning with sampling techniques
  • Undersampling and oversampling with active learning for the word sense disambiguation (WSD) imbalanced learning
• New stopping mechanisms based on maximum confidence and minimal error
• Simple active learning heuristic (SALH) approach
Additional methods
• One-class learning/novelty detection methods
  • Mahalanobis-Taguchi System
• Combination of imbalanced data and the small sample size problem
  • Rank metrics and multitask learning
• Multiclass imbalanced learning
  • AdaC2.M1
  • Rescaling approach for multiclass cost-sensitive neural networks
  • The ensemble knowledge for imbalance sample sets (eKISS) method
The Evaluation of the Imbalanced Learning Problem
Assessment Metrics
How to evaluate the performance of imbalanced learning algorithms?
1. Singular assessment metrics
2. Receiver operating characteristics (ROC) curves
3. Precision-Recall (PR) Curves
4. Cost Curves
5. Assessment Metrics for Multiclass Imbalanced Learning
Assessment Metrics
Singular assessment metrics
• Limitations of accuracy – sensitivity to data distributions
Singular assessment metrics
Assessment Metrics
• Insensitive to data distributions
Singular assessment metrics
Assessment Metrics
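The singular metrics discussed on these slides can all be computed from a binary confusion matrix. The sketch below uses the standard definitions of accuracy, precision, recall, F-measure, and G-mean with the minority class as the positive class; the confusion-matrix counts are hypothetical and only loosely modelled on the mammography class sizes.

```python
import math

def singular_metrics(tp, fn, fp, tn, beta=1.0):
    """Common single-number metrics from a binary confusion matrix
    (positive = minority class)."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0        # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true negative rate
    denom = beta**2 * precision + recall
    f_measure = (1 + beta**2) * precision * recall / denom if denom else 0.0
    g_mean = math.sqrt(recall * specificity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "g_mean": g_mean,
    }

# Hypothetical result on 260 positives and 10,923 negatives.
metrics = singular_metrics(tp=130, fn=130, fp=200, tn=10_723)
for name, value in metrics.items():
    print(f"{name:9s} {value:.3f}")
```

Accuracy stays above 0.97 even though half of the minority cases are missed, while recall, F-measure, and G-mean expose the weak minority performance.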
Receiver Operating Characteristics (ROC) curves
Assessment Metrics
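An ROC curve plots the true positive rate against the false positive rate as the decision threshold is swept over the classifier scores. The sketch below builds the curve points and the area under the curve (AUC) for a small set of hypothetical scores; it assumes distinct scores and uses the trapezoidal rule.

```python
def roc_points(scores, labels):
    """Sweep the decision threshold from high to low and return (FPR, TPR) points.
    labels: 1 = positive (minority), 0 = negative. Assumes distinct scores."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in sorted(zip(scores, labels), reverse=True):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

scores = [0.95, 0.85, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20]  # hypothetical scores
labels = [1, 1, 0, 1, 0, 0, 0, 0]
pts = roc_points(scores, labels)
print(pts)
print(round(auc(pts), 3))  # 0.933
```

The AUC of 14/15 equals the fraction of positive-negative pairs that the scores rank in the correct order: only the pair (0.60 positive, 0.70 negative) is misordered.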
Assessment Metrics
Precision-Recall (PR) curves
• Plotting the precision rate over the recall rate
• A curve dominates in ROC space (resides in the upper-left hand) if and only if it dominates (resides in the upper-right hand) in PR space
• PR space has all the analogous benefits of ROC space
• Provides more informative representations of performance assessment under highly imbalanced data
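A small calculation shows why PR space can be more informative under heavy imbalance. The sketch below fixes one hypothetical ROC operating point (TPR = 0.8, FPR = 0.1) and computes the precision it implies as the number of negatives grows; the function name and the class sizes are assumptions for illustration.

```python
def precision_at(tpr, fpr, n_pos, n_neg):
    """Precision implied by an ROC operating point for given class sizes."""
    tp = tpr * n_pos
    fp = fpr * n_neg
    return tp / (tp + fp)

for n_neg in (100, 1_000, 10_000):
    p = precision_at(tpr=0.8, fpr=0.1, n_pos=100, n_neg=n_neg)
    print(f"negatives={n_neg:6d}  ROC point=(0.1, 0.8)  precision={p:.3f}")
```

The ROC point does not move, yet precision falls from about 0.889 to about 0.074 as the imbalance grows from 1 : 1 to 100 : 1, a change that only the PR curve makes visible.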
Cost Curves
Assessment Metrics
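In a cost curve, each point (FPR, TPR) of an ROC curve becomes a straight line giving the normalized expected cost as a function of the probability cost function PCF(+): NE[C] = (1 − TPR − FPR) · PCF(+) + FPR. The sketch below evaluates that line for one hypothetical operating point; the function and parameter names are assumptions.

```python
def normalized_expected_cost(tpr, fpr, pcf_pos):
    """Cost-curve line for one ROC point:
    NE[C] = (1 - TPR - FPR) * PCF(+) + FPR,
    where PCF(+) in [0, 1] is the probability cost function of the positive class."""
    return (1 - tpr - fpr) * pcf_pos + fpr

tpr, fpr = 0.8, 0.1  # hypothetical ROC point
for pcf in (0.0, 0.25, 0.5, 0.75, 1.0):
    cost = normalized_expected_cost(tpr, fpr, pcf)
    print(f"PCF(+)={pcf:.2f}  NE[C]={cost:.3f}")
```

The line runs from the false positive rate at PCF(+) = 0 to the false negative rate (1 − TPR) at PCF(+) = 1, so classifiers can be compared directly over the range of class priors and misclassification costs.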
The Future of the Imbalanced Learning Problem
Opportunities and Challenges
• Understanding the Fundamental Problem
• Need of a Uniform Benchmark Platform
• Need of Standardized Evaluation Practices
• Semi-supervised Learning from Imbalanced Data
Opportunities and Challenges
Understanding the Fundamental Problem
1. What kind of assumptions will make imbalanced learning algorithms work better compared to learning from the original distributions?
2. To what degree should one balance the original data set?
3. How do imbalanced data distributions affect the computational complexity of learning algorithms?
4. What is the general error bound given an imbalanced data distribution?
Opportunities and Challenges
Need of a Uniform Benchmark Platform
1. Lack of a uniform benchmark for standardized performance assessments
2. Lack of data sharing and data interoperability across different disciplinary domains
3. Increased procurement costs, such as time and labor, for the research community as a whole, since each research group is required to collect and prepare its own data sets
Opportunities and Challenges
Need of Standardized Evaluation Practices
• Establish the practice of using the curve-based evaluation techniques
• A standardized set of evaluation practices for proper comparisons
Opportunities and Challenges
Incremental Learning from Imbalanced Data Streams
1. How can we autonomously adjust the learning algorithm if an imbalance is introduced in the middle of the learning period?
2. Should we consider rebalancing the data set during the incremental learning period? If so, how can we accomplish this?
3. How can we accumulate previous experience and use this knowledge to adaptively improve learning from new data?
4. How do we handle the situation when newly introduced concepts are also imbalanced (i.e., the imbalanced concept drifting issue)?
Opportunities and Challenges
Semi-supervised Learning from Imbalanced Data
1. How can we identify whether an unlabeled data example came from a balanced or imbalanced underlying distribution?
2. Given imbalanced training data with labels, what are the effective and efficient methods for recovering the unlabeled data examples?
3. What kind of biases may be introduced in the recovery process (through the conventional semi-supervised learning techniques) given imbalanced, labeled data?
Reference: These lecture notes are based on the following paper:
H. He and E. A. Garcia, “Learning from Imbalanced Data,” IEEE Trans. Knowledge and Data Engineering, vol. 21, issue 9, pp. 1263-1284, 2009
Should you have any comments or suggestions regarding these lecture notes, please feel free to contact Dr. Haibo He at [email protected] Web: http://www.ele.uri.edu/faculty/he/