Taming the Learning Zoo


SUPERVISED LEARNING ZOO

Bayesian learning (find parameters of a probabilistic model)
- Maximum likelihood
- Maximum a posteriori

Classification
- Decision trees (discrete attributes, few relevant)
- Support vector machines (continuous attributes)

Regression
- Least squares (known structure, easy to interpret)
- Neural nets (unknown structure, hard to interpret)

Nonparametric approaches
- k-nearest neighbors
- Locally-weighted averaging / regression

AGENDA

Quantifying learner performance
- Cross-validation
- Error vs. loss
- Confusion matrix
- Precision & recall

Computational learning theory

CROSS-VALIDATION

ASSESSING PERFORMANCE OF A LEARNING ALGORITHM

Fresh samples from the input distribution X are typically unavailable, so:
- Take out some of the training set
- Train on the remaining training set
- Test on the excluded instances

This is cross-validation.

CROSS-VALIDATION

Split the original set of examples and train on one part.

[Figure: labeled examples D (+/-) are split; the training portion is fed ("Train") into hypothesis space H]

CROSS-VALIDATION

Evaluate the hypothesis on the testing set.

[Figure: the held-out +/- examples form the testing set; the hypothesis from H is applied ("Test") to predict their labels]

CROSS-VALIDATION

Compare the true concept against the prediction on the testing set.

[Figure: true +/- labels vs. predicted labels on the testing set: 9/13 correct]

COMMON SPLITTING STRATEGIES

- k-fold cross-validation
- Leave-one-out (n-fold cross-validation)

[Figure: dataset partitioned into train and test portions, rotated across folds]

COMPUTATIONAL COMPLEXITY

k-fold cross-validation requires:
- k training steps on n(k-1)/k datapoints
- k testing steps on n/k datapoints

(There are efficient ways of computing leave-one-out estimates for some nonparametric techniques, e.g. nearest neighbors.)

Average results are reported.
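The splitting strategy is short to express in code. Below is a minimal sketch of k-fold cross-validation in Python; the learner interface (a function from a training set to a hypothesis h, with h(x) returning a predicted label) is an assumed convention, not something fixed by the slides.

    import random

    def k_fold_cross_validation(learner, examples, k=10):
        """Estimate error by averaging over k train/test splits.

        `learner` is an assumed interface: it takes a training set of
        (x, y) pairs and returns a hypothesis h with h(x) -> label.
        """
        examples = examples[:]              # copy before shuffling
        random.shuffle(examples)
        folds = [examples[i::k] for i in range(k)]
        errors = []
        for i in range(k):
            test = folds[i]
            train = [ex for j, f in enumerate(folds) if j != i for ex in f]
            h = learner(train)              # k trainings on n(k-1)/k points
            wrong = sum(1 for x, y in test if h(x) != y)
            errors.append(wrong / len(test))  # k tests on n/k points
        return sum(errors) / k              # the average result is reported

Setting k = len(examples) gives leave-one-out cross-validation: each fold holds a single example.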

BOOTSTRAPPING

A similar technique for estimating the confidence in the model parameters θ.

Procedure:
1. Draw k hypothetical datasets from the original data, either via cross-validation or by sampling with replacement.
2. Fit the model to each dataset, yielding parameters θ1, …, θk.
3. Return the standard deviation of θ1, …, θk (or a confidence interval).

Can also estimate confidence in a prediction y = f(x).

EXAMPLE: AVERAGE OF N NUMBERS

Data D = {x(1), …, x(N)}; the model is a constant θ.
Learning: minimize E(θ) = Σi (x(i) - θ)², which gives θ = the average.

Repeat for j = 1, …, k:
- Randomly sample a subset x(1)', …, x(N)' from D
- Learn θj = 1/N Σi x(i)'

Return the histogram of θ1, …, θk.
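A sketch of this example, assuming the sampling-with-replacement variant of step 1:

    import random
    import statistics

    def bootstrap_average(data, k=1000):
        """Bootstrap the constant model: theta_j = average of dataset j.

        Each of the k hypothetical datasets is drawn from `data` by
        sampling len(data) points with replacement.
        """
        n = len(data)
        thetas = [statistics.mean(random.choices(data, k=n))
                  for _ in range(k)]
        # The spread of theta_1, ..., theta_k estimates the confidence
        # in the learned parameter.
        return statistics.mean(thetas), statistics.stdev(thetas)

For example, bootstrap_average([0.2, 0.5, 0.7, 0.4]) returns the average together with its estimated standard deviation; the histogram of the thetas is what the slide plots.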

[Figure: bootstrap average with lower/upper range vs. |data set| (10 to 10,000, log scale); the interval around ~0.5 narrows as the dataset grows]

BEYOND ERROR RATES

BEYOND ERROR RATE

Predicting security risk:
- Predicting "low risk" for a terrorist is far worse than predicting "high risk" for an innocent bystander (but maybe not for 5 million of them)

Searching for images:
- Returning irrelevant images is worse than omitting relevant ones

BIASED SAMPLE SETS

Often there are orders of magnitude more negative examples than positive ones. E.g., among all images on Facebook, if I classify every image as "not Mark Wilson" I'll have >99.99% accuracy.

Examples of Mark should count much more than non-Mark!

FALSE POSITIVES

[Figure: true concept vs. learned concept in the (x1, x2) plane]

FALSE POSITIVES

[Figure: a new query falls inside the learned concept but outside the true concept]

An example incorrectly predicted to be positive.

FALSE NEGATIVES

[Figure: a new query falls outside the learned concept but inside the true concept]

An example incorrectly predicted to be negative.

PRECISION VS. RECALL

Precision = # of relevant documents retrieved / # of total documents retrieved

Recall = # of relevant documents retrieved / # of total relevant documents

Both are numbers between 0 and 1.

PRECISION VS. RECALL

Precision = # true positives / (# true positives + # false positives)

Recall = # true positives / (# true positives + # false negatives)

A precise classifier is selective; a classifier with high recall is inclusive.
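Both definitions come down to the same arithmetic on the confusion counts. A minimal sketch (the parallel boolean-list interface and the treatment of empty denominators are assumed conventions):

    def precision_recall(predicted, actual):
        """Precision and recall from parallel lists of boolean labels."""
        tp = sum(1 for p, a in zip(predicted, actual) if p and a)
        fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
        fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
        precision = tp / (tp + fp) if tp + fp else 1.0  # selective
        recall = tp / (tp + fn) if tp + fn else 1.0     # inclusive
        return precision, recall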

OPTION 1: CLASSIFICATION THRESHOLDS

Many learning algorithms (e.g., linear models, neural nets, Bayes nets, SVMs) give a real-valued output v(x) that needs thresholding for classification:
- v(x) > t => positive label given to x
- v(x) < t => negative label given to x

May want to tune the threshold t to get fewer false positives or false negatives; a sketch follows.
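A minimal sketch of threshold tuning, assuming a scoring function v, a validation set of (x, label) pairs, and a hypothetical per-example loss (e.g., 10 for a false negative, 1 for a false positive, 0 when correct):

    def classify(v, x, t):
        """v(x) > t => positive label; otherwise negative."""
        return v(x) > t

    def tune_threshold(v, validation, candidates, loss):
        """Pick the threshold from `candidates` with the lowest total loss.

        Raising t gives fewer false positives; lowering it gives fewer
        false negatives.
        """
        return min(candidates,
                   key=lambda t: sum(loss(classify(v, x, t), y)
                                     for x, y in validation))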

REDUCING FALSE POSITIVE RATE

[Figure: the learned concept is tightened relative to the true concept, so fewer negatives are labeled positive]

REDUCING FALSE NEGATIVE RATE

[Figure: the learned concept is loosened relative to the true concept, so fewer positives are missed]

LOSS FUNCTIONS & WEIGHTED DATASETS

General learning problem: "Given data D and loss function L, find the best hypothesis from hypothesis class H."

Loss functions: L contains weights to favor accuracy on positive or negative examples, e.g. L = 10·E+ + 1·E-, where errors on positive examples count ten times as much as errors on negative ones.

Weighted datasets: attach a weight w to each example to indicate how important it is. Or construct a resampled dataset D' where each example is duplicated proportionally to its w (see the sketch below).
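Both options fit in a few lines. A hedged sketch: weighted_error implements the L = 10·E+ + 1·E- loss above, and resample builds D' for learners that only accept unweighted data (the rounding convention is an assumption):

    def weighted_error(h, examples):
        """L = 10*E+ + 1*E-: errors on positive examples count 10x more."""
        e_pos = sum(1 for x, y in examples if y and h(x) != y)
        e_neg = sum(1 for x, y in examples if not y and h(x) != y)
        return 10 * e_pos + 1 * e_neg

    def resample(examples, weights):
        """Build D' by duplicating each example proportionally to its weight.

        A standard learner trained on D' approximately minimizes the
        weighted loss; weights are assumed nonnegative, and an example
        with weight w appears round(w) times.
        """
        resampled = []
        for ex, w in zip(examples, weights):
            resampled.extend([ex] * round(w))
        return resampled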

PRECISION-RECALL CURVES

Measure precision vs. recall as the tolerance (or weighting) is tuned.

[Figure: precision vs. recall axes; a perfect classifier sits at the top-right corner, and actual performance traces a curve below it]

PRECISION-RECALL CURVES

[Figure: along the curve, one end penalizes false negatives, the other penalizes false positives, with equal weight in between]


PRECISION-RECALL CURVES

[Figure: two precision-recall curves; the one closer to the top-right corner reflects better learning performance]
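Tracing such a curve is just the threshold sweep from before, recording one (recall, precision) point per setting. A minimal sketch, assuming a scoring function v and boolean labels:

    def pr_curve(v, examples, thresholds):
        """One (recall, precision) point per threshold t.

        A perfect classifier's curve passes through recall = precision = 1;
        the empty-denominator convention (1.0) is an assumption.
        """
        points = []
        for t in thresholds:
            tp = sum(1 for x, y in examples if v(x) > t and y)
            fp = sum(1 for x, y in examples if v(x) > t and not y)
            fn = sum(1 for x, y in examples if v(x) <= t and y)
            precision = tp / (tp + fp) if tp + fp else 1.0
            recall = tp / (tp + fn) if tp + fn else 1.0
            points.append((recall, precision))
        return points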

MODEL SELECTION

COMPLEXITY VS. GOODNESS OF FIT

More complex models can fit the data better, but can overfit.

Model selection: enumerate several possible hypothesis classes of increasing complexity; stop when the cross-validated error levels off.

Regularization: explicitly define a metric of complexity and penalize it in addition to the loss.

MODEL SELECTION WITH K-FOLD CROSS-VALIDATION

Parameterize the learner by a complexity level C.

Model selection pseudocode:
  For increasing levels of complexity C:
    errT[C], errV[C] = Cross-Validate(Learner, C, examples)
    If errT has converged:
      Find the value Cbest that minimizes errV[C]
      Return Learner(Cbest, examples)
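A direct Python rendering of the pseudocode; cross_validate is passed in and assumed to behave like the k-fold sketch earlier, returning (average training error, average validation error) for a given complexity:

    def select_model(learner, examples, complexities, cross_validate,
                     tol=1e-4):
        """Return the learner trained at the best complexity level.

        `learner(C, examples)` and `cross_validate(learner, C, examples)`
        are assumed interfaces matching the pseudocode above.
        """
        err_t, err_v = {}, {}
        prev = None
        for C in complexities:
            err_t[C], err_v[C] = cross_validate(learner, C, examples)
            # Stop once training error has converged (stops improving).
            if prev is not None and abs(err_t[C] - prev) < tol:
                break
            prev = err_t[C]
        c_best = min(err_v, key=err_v.get)  # lowest validation error
        return learner(c_best, examples)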

REGULARIZATION

Minimize: Cost(h) = Loss(h) + Complexity(h)

Example with linear models y = θᵀx:
- L2 error: Loss(θ) = Σi (y(i) - θᵀx(i))²
- Lq regularization: Complexity(θ) = Σj |θj|^q

L2 and L1 are the most popular in linear regularization:
- L2 regularization leads to a simple computation of the optimal θ (sketched below)
- L1 is more complex to optimize, but produces sparse models in which many coefficients are 0!
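The "simple computation" for L2 is a closed form: setting the gradient of Σi (y(i) - θᵀx(i))² + λ Σj θj² to zero gives θ = (XᵀX + λI)⁻¹ Xᵀy. A NumPy sketch, where lam is the weight on the complexity term (the slide leaves this weight implicit):

    import numpy as np

    def ridge_fit(X, y, lam=1.0):
        """L2-regularized least squares, solved in closed form.

        Minimizes sum_i (y_i - theta^T x_i)^2 + lam * sum_j theta_j^2
        via theta = (X^T X + lam*I)^{-1} X^T y.
        """
        d = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

No such closed form exists for L1: the |θj| penalty is not differentiable at 0, so the optimum is found iteratively, and that kink at 0 is exactly what pushes many coefficients to exactly 0.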

OTHER TOPICS IN MACHINE LEARNING

Unsupervised learning
- Dimensionality reduction
- Clustering

Reinforcement learning
- An agent that acts and learns how to act in an environment by observing rewards

Learning from demonstration
- An agent that acts and learns how to act in an environment by observing demonstrations from an expert

ISSUES IN PRACTICE

The distinctions between learning algorithms diminish when you have a lot of data.

The web has made it much easier to gather large-scale datasets than in the early days of ML.

Understanding data with many more attributes than examples is still a major challenge! Do humans just have really great priors?

PROJECT MIDTERM REPORT

Due 11/10: ~1 page description of current progress, challenges, and changes in direction.

NEXT LECTURES

- Intelligent agents (R&N 2)
- Decision-theoretic planning
- Reinforcement learning
- Applications of AI