6. Linear & logistic regressions
Foundations of Machine Learning
CentraleSupélec — Fall 2017
Chloé-Agathe Azencott, Centre for Computational Biology, Mines ParisTech
chloe-agathe.azencott@mines-paristech.fr
Learning objectives
● Density estimation:
– Define parametric methods.
– Define the maximum likelihood estimator and compute it for Bernoulli, multinomial and Gaussian densities.
– Define the Bayes estimator and compute it for normal priors.
● Supervised learning:
– Compute the maximum likelihood estimator / least-squares fit solution for linear regression.
– Compute the maximum likelihood estimator for logistic regression.
Parametric methods
● Sample drawn from a density: $X = \{x^t\}_{t=1}^N$, $x^t \sim p(x)$
● Parametric estimation:
– assume a form for $p(x \mid \theta)$, e.g. $p(x \mid \theta) = \mathcal{N}(\mu, \sigma^2)$ with $\theta = (\mu, \sigma^2)$
– Goal: estimate $\theta$ using $X$
– usually assume that the $x^t$ are independent and identically distributed (i.i.d.)
Density estimation
Maximum likelihood estimation
● Find $\theta$ such that $X$ is the most likely to be drawn.
● Likelihood of $\theta$ given the i.i.d. sample $X$: $\ell(\theta \mid X) = \prod_{t=1}^N p(x^t \mid \theta)$
● Log likelihood: $\mathcal{L}(\theta \mid X) = \log \ell(\theta \mid X) = \sum_{t=1}^N \log p(x^t \mid \theta)$
● Maximum likelihood estimation (MLE): $\hat{\theta}_{MLE} = \arg\max_{\theta} \mathcal{L}(\theta \mid X)$
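To make the argmax concrete, here is a minimal numpy sketch (toy sample and unit-variance Gaussian model chosen purely for illustration) that scores candidate values of θ on a grid and keeps the best one:

```python
import numpy as np

def log_likelihood(theta, x):
    """Log likelihood of theta for an i.i.d. sample x under a
    unit-variance Gaussian model: sum_t log p(x^t | theta)."""
    return np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * (x - theta) ** 2)

x = np.array([1.2, 0.8, 1.5, 0.9, 1.1])      # toy i.i.d. sample
grid = np.linspace(-3, 3, 601)               # candidate values of theta
scores = np.array([log_likelihood(t, x) for t in grid])
theta_mle = grid[np.argmax(scores)]          # argmax over the grid
print(theta_mle)                             # ≈ 1.1, the sample mean
```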
Bernoulli density
● Two states: failure ($x = 0$) / success ($x = 1$), with $P(\text{success}) = p_0$:
$p(x) = p_0^x (1 - p_0)^{1 - x}, \quad x \in \{0, 1\}$
● MLE estimate of $p_0$?
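Setting the derivative of the log likelihood to zero gives the classical answer: $\hat{p}_0 = \frac{1}{N} \sum_{t=1}^N x^t$, the empirical success frequency. A one-line check in numpy (toy sample made up for the example):

```python
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1])  # toy i.i.d. Bernoulli sample
p0_mle = x.mean()                 # MLE = empirical success frequency
print(p0_mle)                     # 4/6 ≈ 0.667
```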
Multinomial density
● Consider K mutually exclusive and exhaustive classes
– Each class occurs with probability $p_k$
– $x_1, x_2, \dots, x_K$ indicator variables: $x_k = 1$ if the outcome is class k and 0 otherwise
● The MLE of $p_k$ is $\hat{p}_k = \frac{1}{N} \sum_{t=1}^N x_k^t$
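In code this is just the vector of empirical class frequencies; a minimal sketch (class labels invented for the example, one-hot encoded to match the indicator notation):

```python
import numpy as np

labels = np.array([0, 2, 1, 0, 0, 2])  # toy outcomes, K = 3 classes
X = np.eye(3)[labels]                  # one-hot indicator variables x_k^t
p_mle = X.mean(axis=0)                 # p̂_k = (1/N) Σ_t x_k^t
print(p_mle)                           # [0.5, 0.1667, 0.3333]
```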
Gaussian distribution
● Gaussian distribution = normal distribution:
$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$, noted $\mathcal{N}(\mu, \sigma^2)$
● MLE estimates of $\mu$ and $\sigma$?
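Maximizing the log likelihood gives the standard answers $\hat{\mu} = \frac{1}{N}\sum_t x^t$ and $\hat{\sigma}^2 = \frac{1}{N}\sum_t (x^t - \hat{\mu})^2$ (note the 1/N rather than 1/(N−1): the MLE of the variance is biased). A minimal numpy sketch on a toy sample:

```python
import numpy as np

x = np.array([1.2, 0.8, 1.5, 0.9, 1.1])  # toy i.i.d. sample
mu_mle = x.mean()                         # μ̂ = sample mean
var_mle = np.mean((x - mu_mle) ** 2)      # σ̂² with 1/N (ddof=0), the MLE
print(mu_mle, np.sqrt(var_mle))
```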
Bias-variance tradeoff
● Mean squared error of the estimator:
$\text{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta_0)^2] = \underbrace{(E[\hat{\theta}] - \theta_0)^2}_{\text{bias}^2} + \underbrace{E[(\hat{\theta} - E[\hat{\theta}])^2]}_{\text{variance}}$
A biased estimator may achieve better MSE than an unbiased one.
[Figure: an estimator $\hat{\theta}$ of $\theta_0$; bias = distance between $\theta_0$ and $E[\hat{\theta}]$, variance = spread of $\hat{\theta}$ around $E[\hat{\theta}]$.]
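A small simulation (sample size and distribution chosen arbitrarily) illustrating the claim: the biased 1/N variance estimator can achieve lower MSE than the unbiased 1/(N−1) one:

```python
import numpy as np

rng = np.random.default_rng(0)
true_var = 1.0                # σ² of the sampled N(0, 1) data
n, trials = 10, 100_000

samples = rng.normal(0.0, 1.0, size=(trials, n))
var_mle = samples.var(axis=1, ddof=0)  # biased 1/N estimator (the MLE)
var_unb = samples.var(axis=1, ddof=1)  # unbiased 1/(N-1) estimator

for name, est in [("MLE (1/N)", var_mle), ("unbiased (1/(N-1))", var_unb)]:
    bias = est.mean() - true_var
    mse = np.mean((est - true_var) ** 2)
    print(f"{name}: bias={bias:+.3f}, MSE={mse:.4f}")  # MLE has lower MSE
```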
Bayes estimator
● Treat $\theta$ as a random variable with prior $p(\theta)$
● Bayes rule: $p(\theta \mid X) = \frac{p(X \mid \theta)\, p(\theta)}{p(X)}$
● Density estimation: $p(x \mid X) = \int p(x \mid \theta)\, p(\theta \mid X)\, d\theta$
● Maximum likelihood estimate (MLE): $\hat{\theta}_{MLE} = \arg\max_{\theta} p(X \mid \theta)$
● Bayes estimate: $\hat{\theta}_{Bayes} = E[\theta \mid X] = \int \theta\, p(\theta \mid X)\, d\theta$
Bayes estimator: Normal prior
● n data points (i.i.d.): $x^t \sim \mathcal{N}(\theta, \sigma^2)$, with prior $\theta \sim \mathcal{N}(\mu_0, \sigma_0^2)$
● MLE of $\theta$: the sample mean $m = \frac{1}{n} \sum_{t=1}^n x^t$
● Bayes estimator of $\theta$: ?
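For this conjugate setup the posterior is again normal, and the Bayes estimator (posterior mean) is the precision-weighted average $E[\theta \mid X] = \frac{n/\sigma^2}{n/\sigma^2 + 1/\sigma_0^2}\, m + \frac{1/\sigma_0^2}{n/\sigma^2 + 1/\sigma_0^2}\, \mu_0$: it shrinks the MLE towards the prior mean, and the shrinkage vanishes as n grows. A minimal sketch with made-up numbers:

```python
import numpy as np

x = np.array([2.1, 1.8, 2.4, 2.0])   # toy i.i.d. sample, known noise std
sigma, mu0, sigma0 = 0.5, 0.0, 1.0   # noise std, prior mean and prior std
n, m = len(x), x.mean()              # m is the MLE of θ

# Precision-weighted average of the sample mean and the prior mean.
w = (n / sigma**2) / (n / sigma**2 + 1 / sigma0**2)
theta_bayes = w * m + (1 - w) * mu0
print(m, theta_bayes)                # Bayes estimate is pulled towards μ0
```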
Linear regression
Linear regression: MLE
● Assume the error is Gaussian distributed: $y = g(x) + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \sigma^2)$, i.e. $p(y \mid x) = \mathcal{N}(g(x), \sigma^2)$
● Replace $g$ with its estimator $f$: maximizing the likelihood is then equivalent to minimizing the residual sum of squares $\text{RSS}(f) = \sum_{t=1}^n (y^t - f(x^t))^2$
[Figure: regression line $E[y \mid x] = \beta x + \beta_0$; at a given $x^*$, $y$ follows the Gaussian $p(y \mid x^*)$ centered at $E[y \mid x^*]$.]
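A minimal sketch of the least-squares / maximum-likelihood fit on made-up 1-d data, using numpy's closed-form solver (np.linalg.lstsq solves $\min_\beta \|X\beta - y\|^2$):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, size=50)  # y = βx + β0 + Gaussian noise

X = np.column_stack([x, np.ones_like(x)])  # design matrix with intercept column
beta, beta0 = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta, beta0)                         # close to the true (2.0, 1.0)
```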
Gauss-Markov Theorem
● Under the assumption that $y = X\beta + \varepsilon$ with $E[\varepsilon] = 0$ and $\text{Var}(\varepsilon) = \sigma^2 I$ (uncorrelated, homoscedastic errors),
the least-squares estimator of $\beta$ is its (unique) best linear unbiased estimator.
● Best Linear Unbiased Estimator (BLUE): $\text{Var}(\hat{\beta}) \preceq \text{Var}(\beta^*)$ for any $\beta^*$ that is a linear unbiased estimator of $\beta$ (the difference is positive semidefinite).
Proof idea: any linear unbiased estimator can be written $\beta^* = \left((X^\top X)^{-1} X^\top + D\right) y$; then $\text{Var}(\beta^*) = \text{Var}(\hat{\beta}) + \sigma^2 D D^\top$, which is psd and minimal for $D = 0$.
Correlated variables
● If the variables are decorrelated:
– Each coefficient can be estimated separately;
– Interpretation is easy: “a change of 1 in $x_j$ is associated with a change of $\beta_j$ in $y$, while everything else stays the same.”
● Correlations between variables cause problems (see the simulation below):
– The variance of all coefficients tends to increase;
– Interpretation is much harder: when $x_j$ changes, so does everything else.
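A small simulation (design and noise levels invented for the example) showing the variance inflation: the same regression is refit on many resampled datasets, once with independent features and once with strongly correlated ones, and the spread of the estimated coefficients is compared:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials, beta = 100, 2000, np.array([1.0, -1.0])

def coef_std(rho):
    """Std of the least-squares coefficients over repeated fits,
    with feature correlation rho."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    coefs = np.empty((trials, 2))
    for i in range(trials):
        X = rng.multivariate_normal([0, 0], cov, size=n)
        y = X @ beta + rng.normal(0, 1.0, size=n)
        coefs[i] = np.linalg.lstsq(X, y, rcond=None)[0]
    return coefs.std(axis=0)

print("rho=0.0: ", coef_std(0.0))   # small coefficient std
print("rho=0.95:", coef_std(0.95))  # much larger std: variance inflation
```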
What about classification?
● Model Pr(Y = 1 | X) as a linear function? Problem: a linear function of $x$ is unbounded, while a probability must lie in [0, 1].
Logistic regression
● Model the log-odds as a linear function instead: $\log \frac{p(x)}{1 - p(x)} = \beta_0 + \beta^\top x$, i.e. $p(x) = \Pr(Y = 1 \mid X = x) = \frac{1}{1 + e^{-(\beta_0 + \beta^\top x)}}$
Maximum likelihood estimation of logistic regression coefficients
● Log likelihood for n observations (writing $p(x) = \Pr(Y = 1 \mid X = x)$ and absorbing $\beta_0$ into $\beta$ via a constant feature):
$\mathcal{L}(\beta) = \sum_{t=1}^n \left[ y^t \log p(x^t) + (1 - y^t) \log(1 - p(x^t)) \right]$
● Gradient of the log likelihood:
$\nabla_\beta \mathcal{L}(\beta) = \sum_{t=1}^n (y^t - p(x^t))\, x^t$
No closed-form solution: maximize iteratively, e.g. by gradient ascent (see the sketch below).
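A minimal sketch of this MLE by gradient ascent on made-up 2-d data (step size and iteration count chosen arbitrarily; the intercept is absorbed into β via a constant feature):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary data: two overlapping Gaussian blobs in 2-d.
X = np.vstack([rng.normal(-1.0, 1.0, size=(50, 2)),
               rng.normal(+1.0, 1.0, size=(50, 2))])
X = np.column_stack([X, np.ones(len(X))])   # constant feature = intercept β0
y = np.concatenate([np.zeros(50), np.ones(50)])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

beta = np.zeros(X.shape[1])
lr, n_iter = 0.5, 2000                      # arbitrary step size / iterations
for _ in range(n_iter):
    p = sigmoid(X @ beta)                   # Pr(Y=1 | x^t) under current beta
    beta += lr * X.T @ (y - p) / len(y)     # ascend the mean gradient
print(beta)
```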
Summary
● MAP estimate: $\hat{\theta}_{MAP} = \arg\max_{\theta} p(\theta \mid X)$
● MLE: $\hat{\theta}_{MLE} = \arg\max_{\theta} p(X \mid \theta)$
● Bayes estimate: $\hat{\theta}_{Bayes} = E[\theta \mid X]$
● Assuming Gaussian error, maximizing the likelihood is equivalent to minimizing the RSS.
● Linear regression MLE: closed form, $\hat{\beta} = (X^\top X)^{-1} X^\top y$.
● Logistic regression MLE: no closed form; solve with gradient descent.
References
● A Course in Machine Learning. http://ciml.info/dl/v0_99/ciml-v0_99-all.pdf
– Least-squares regression: Chap. 7.6
● The Elements of Statistical Learning. http://web.stanford.edu/~hastie/ElemStatLearn/
– Least-squares regression: Chap. 2.2.1, 3.1, 3.2.1
– Gauss-Markov theorem: Chap. 3.2.3