Introduction to Machine Learning
Statistical Machine Learning
Varun Chandola
Computer Science & Engineering
State University of New York at Buffalo, Buffalo, NY, USA
chandola@buffalo.edu
Chandola@UB CSE 474/574 1 / 30
Outline
Statistical Machine Learning - Introduction
Introduction to Probability
Random Variables
Bayes Rule
Continuous Random Variables
Different Types of Distributions
Handling Multivariate Distributions
Statistical Machine Learning
Functional Methods
- y = f(x)
- Learn f() using training data
- y∗ = f(x∗) for a test data instance

Statistical/Probabilistic Methods
- Calculate the conditional probability of the target being y, given that the input is x
- Assume that y|x is a random variable generated from a probability distribution
- Learn the parameters of the distribution using training data
What is Probability? [2, 1]
- Probability that a coin will land heads is 50%*
- What does this mean?

*Dr. Persi Diaconis showed that a coin is 51% likely to land facing the same way up as it started.
Frequentist Interpretation
- The fraction of times an event is observed over n trials
- What if the event can only occur once?
  - My winning next month's Powerball.
  - Polar ice caps melting by the year 2020.
Bayesian Interpretation
- Uncertainty of the event
- Used for making decisions
  - Should I put in an offer for a sports car?
What is a Random Variable (X )?
- Can take any value from X
  - Discrete Random Variable: X is finite or countably infinite
    - Categorical?
  - Continuous Random Variable: X is infinite (uncountable)
- P(X = x) or P(x) is the probability of X taking value x
  - An event
- What is a distribution?
Examples
1. Coin toss (X = {heads, tails})
2. Six-sided die (X = {1, 2, 3, 4, 5, 6})
Notation, Notation, Notation
- X: a random variable (X if multivariate)
- x: a specific value taken by the random variable (x if multivariate)
- P(X = x) or P(x) is the probability of the event X = x
- p(x) is either the probability mass function (discrete) or the probability density function (continuous) of the random variable X at x
  - Probability mass (or density) at x
Basic Rules - Quick Review
- For two events A and B:
  - P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
- Joint Probability
  - P(A, B) = P(A ∧ B) = P(A|B)P(B)
  - Also known as the product rule
- Conditional Probability
  - P(A|B) = P(A, B) / P(B)
Chain Rule of Probability
- Given D random variables {X1, X2, . . . , XD}:

P(X1, X2, . . . , XD) = P(X1) P(X2|X1) P(X3|X1, X2) · · · P(XD|X1, X2, . . . , XD−1)
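The chain rule is easy to sanity-check numerically. A minimal sketch with a made-up joint distribution over two binary variables: deriving P(X1) and P(X2|X1) from the joint and multiplying them back recovers the joint exactly.

```python
# Hypothetical joint distribution over two binary variables X1, X2.
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

# Marginal P(X1) and conditional P(X2 | X1) derived from the joint.
p_x1 = {a: sum(p for (x1, _), p in joint.items() if x1 == a) for a in (0, 1)}
p_x2_given_x1 = {(a, b): joint[(a, b)] / p_x1[a] for (a, b) in joint}

# Chain rule: P(X1, X2) = P(X1) * P(X2 | X1) for every outcome.
for (a, b), p in joint.items():
    assert abs(p - p_x1[a] * p_x2_given_x1[(a, b)]) < 1e-12
```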
Marginal Distribution
- Given P(A, B), what is P(A)?
  - Sum P(A, B) over all values of B:

P(A) = ∑_b P(A, B = b) = ∑_b P(A|B = b) P(B = b)

- Known as the sum rule
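The sum rule in code, as a minimal sketch on a made-up joint P(A, B) over two binary variables:

```python
# Hypothetical joint distribution P(A, B).
joint = {(0, 0): 0.15, (0, 1): 0.25, (1, 0): 0.35, (1, 1): 0.25}

# Sum rule: marginalize out B, i.e. P(A = a) = sum over b of P(A = a, B = b).
p_a = {a: sum(p for (av, b), p in joint.items() if av == a) for a in (0, 1)}

assert abs(p_a[0] - 0.40) < 1e-12
assert abs(p_a[1] - 0.60) < 1e-12
```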
Bayes Rule or Bayes Theorem
- Computing P(X = x|Y = y):
Bayes Theorem
P(X = x|Y = y) = P(X = x, Y = y) / P(Y = y)
             = P(X = x) P(Y = y|X = x) / ∑_{x′} P(X = x′) P(Y = y|X = x′)
Example
- Medical Diagnosis
- Random event 1: a test is positive or negative (X)
- Random event 2: a person has cancer (Y), yes or no
- What we know:
1. The test has an accuracy of 80%
2. The fraction of times the test is positive when the person has cancer:
P(X = 1|Y = 1) = 0.8
3. Prior probability of having cancer is 0.4%
P(Y = 1) = 0.004
Question?
If I test positive, does it mean that I have 80% rate of cancer?
Base Rate Fallacy
- We ignored the prior information
- What we need is: P(Y = 1|X = 1) = ?
- More information:
  - False positive (alarm) rate for the test
  - P(X = 1|Y = 0) = 0.1

P(Y = 1|X = 1) = P(X = 1|Y = 1) P(Y = 1) / [P(X = 1|Y = 1) P(Y = 1) + P(X = 1|Y = 0) P(Y = 0)]
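Plugging the slides' numbers into Bayes' rule shows how strongly the low prior dominates; a minimal sketch:

```python
# Numbers from the slides: sensitivity, prior, and false-positive rate.
p_pos_given_cancer = 0.8      # P(X=1 | Y=1)
p_cancer = 0.004              # P(Y=1)
p_pos_given_healthy = 0.1     # P(X=1 | Y=0)

# Bayes' rule: P(Y=1 | X=1).
numer = p_pos_given_cancer * p_cancer
denom = numer + p_pos_given_healthy * (1 - p_cancer)
posterior = numer / denom
print(round(posterior, 4))  # 0.0311
```

So a positive test raises the probability of cancer to only about 3%, not 80%.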
Classification Using Bayes Rule
- Given an input example x, find the true class:

P(Y = c|X = x)

- Y is the random variable denoting the true class
- Assume that the class-conditional probability is known:

P(X = x|Y = c)

- Applying Bayes Rule:

P(Y = c|X = x) = P(Y = c) P(X = x|Y = c) / ∑_{c′} P(Y = c′) P(X = x|Y = c′)
Independence
- One random variable does not depend on another
- A ⊥ B ⟺ P(A, B) = P(A)P(B)
  - The joint is written as a product of marginals
- Conditional Independence:

A ⊥ B | C ⟺ P(A, B|C) = P(A|C) P(B|C)
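The factorization test can be run directly on any finite joint distribution. A minimal sketch (the two toy joints are made up): one joint that does factor into its marginals, and one that does not.

```python
def is_independent(joint):
    """Check P(A, B) = P(A) P(B) for every (a, b) in a finite joint."""
    p_a, p_b = {}, {}
    for (a, b), p in joint.items():
        p_a[a] = p_a.get(a, 0.0) + p
        p_b[b] = p_b.get(b, 0.0) + p
    return all(abs(p - p_a[a] * p_b[b]) < 1e-9 for (a, b), p in joint.items())

# Two fair coins tossed separately: independent.
fair = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}
# A coin paired with its own copy: perfectly dependent.
copied = {(0, 0): 0.5, (1, 1): 0.5, (0, 1): 0.0, (1, 0): 0.0}

print(is_independent(fair), is_independent(copied))  # True False
```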
Continuous Random Variables
- X is continuous
- Can take any value
- How does one define probability?
  - ∑_x P(X = x) = 1?
- Probability that X lies in an interval (a, b]?
  - P(a < X ≤ b) = P(X ≤ b) − P(X ≤ a)
  - F(q) = P(X ≤ q) is the cumulative distribution function
  - P(a < X ≤ b) = F(b) − F(a)
Probability Density
Probability Density Function
p(x) = d/dx F(x)

P(a < X ≤ b) = ∫_a^b p(x) dx
- Can p(x) be greater than 1?
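Yes: p(x) is a density, not a probability, so it can exceed 1 as long as it integrates to 1. A minimal sketch using the uniform distribution on [0, 0.1]:

```python
# A density can exceed 1: Uniform(0, 0.1) has p(x) = 10 on its support,
# yet the total probability still integrates to 1.
lo, hi = 0.0, 0.1
pdf = 1.0 / (hi - lo)          # density value on [lo, hi]: 10, greater than 1
total = pdf * (hi - lo)        # area under the density
print(pdf, total)  # 10.0 1.0
```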
Expectation
- Expected value of a random variable: E[X]
- What do we expect to happen, on average, in terms of X?
- For discrete variables:

E[X] ≜ ∑_{x∈X} x P(X = x)

- For continuous variables:

E[X] ≜ ∫_X x p(x) dx

- The mean of X (µ)
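The discrete definition applied to the fair six-sided die from the earlier slide, as a minimal sketch using exact rational arithmetic:

```python
from fractions import Fraction

# Expected value of a fair six-sided die: E[X] = sum over x of x * P(X = x).
faces = range(1, 7)
mean = sum(x * Fraction(1, 6) for x in faces)
print(mean)  # 7/2, i.e. 3.5
```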
Expectation of Functions of Random Variable
- Let g(X) be a function of X
- If X is discrete:

E[g(X)] ≜ ∑_{x∈X} g(x) P(X = x)

- If X is continuous:

E[g(X)] ≜ ∫_X g(x) p(x) dx
Properties
- E[c] = c, for a constant c
- If X ≤ Y, then E[X] ≤ E[Y]
- E[X + Y] = E[X] + E[Y]
- E[aX] = aE[X]
- var[X] = E[(X − µ)²] = E[X²] − µ²
- cov[X, Y] = E[XY] − E[X]E[Y]
- Jensen's inequality: if ϕ(X) is convex, then ϕ(E[X]) ≤ E[ϕ(X)]
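Two of these identities can be checked exactly for a fair die (a minimal sketch; the die is just an illustrative choice):

```python
from fractions import Fraction

# Fair die: verify var[X] = E[X^2] - (E[X])^2 and Jensen's inequality exactly.
p = Fraction(1, 6)
faces = range(1, 7)
e_x = sum(x * p for x in faces)                  # E[X] = 7/2
e_x2 = sum(x * x * p for x in faces)             # E[X^2] = 91/6
var = sum((x - e_x) ** 2 * p for x in faces)     # var[X] = 35/12

assert var == e_x2 - e_x ** 2
# Jensen with the convex function phi(x) = x^2: phi(E[X]) <= E[phi(X)].
assert e_x ** 2 <= e_x2
```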
What is a Probability Distribution?
Discrete
- Binomial, Bernoulli
- Multinomial, Multinoulli
- Poisson
- Empirical

Continuous
- Gaussian (Normal)
- Degenerate pdf
- Laplace
- Gamma
- Beta
- Pareto
Binomial Distribution
- X = number of heads observed in n coin tosses
- Parameters: n, θ
- X ∼ Bin(n, θ)
- Probability mass function (pmf):

Bin(k|n, θ) ≜ (n choose k) θ^k (1 − θ)^(n−k)

Bernoulli Distribution
- Binomial distribution with n = 1
- Only one parameter (θ)
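The pmf translates directly into code; a minimal sketch (the 10-toss example is illustrative):

```python
from math import comb

def binom_pmf(k, n, theta):
    """Bin(k | n, theta) = C(n, k) * theta^k * (1 - theta)^(n - k)."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

# Probability of exactly 7 heads in 10 tosses of a fair coin.
print(round(binom_pmf(7, 10, 0.5), 4))  # 0.1172

# Bernoulli is the n = 1 special case: the pmf at k = 1 is just theta.
assert binom_pmf(1, 1, 0.3) == 0.3
```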
Multinomial Distribution
- Simulates a K-sided die
- Random variable x = (x1, x2, . . . , xK)
- Parameters: n, θ
  - θ ∈ R^K
  - θj: the probability that the j-th side shows up

Mu(x|n, θ) ≜ (n choose x1, x2, . . . , xK) ∏_{j=1}^K θj^{xj}

Multinoulli Distribution
- Multinomial distribution with n = 1
- x is a vector of 0s and 1s with only one bit set to 1
- Only one parameter (θ)
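The multinomial pmf, written out with the multinomial coefficient; a minimal sketch (the ten-roll count vector is illustrative):

```python
from math import factorial, prod

def multinom_pmf(x, n, theta):
    """Mu(x | n, theta): multinomial coefficient times prod_j theta_j^x_j."""
    assert sum(x) == n
    coef = factorial(n) // prod(factorial(xj) for xj in x)
    return coef * prod(t**xj for t, xj in zip(theta, x))

# Ten rolls of a fair die with counts (2, 2, 2, 2, 1, 1) for the six sides.
p = multinom_pmf((2, 2, 2, 2, 1, 1), 10, [1/6] * 6)

# Multinoulli is the n = 1 case: a one-hot x picks out a single theta_j.
assert multinom_pmf((0, 1, 0, 0, 0, 0), 1, [1/6] * 6) == 1/6
```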
Gaussian (Normal) Distribution
N(x|µ, σ²) ≜ (1/√(2πσ²)) e^(−(x−µ)²/(2σ²))

- Parameters:
  1. µ = E[X]
  2. σ² = var[X] = E[(X − µ)²]
- X ∼ N(µ, σ²) ⇔ p(X = x) = N(x|µ, σ²)
- X ∼ N(0, 1): X is a standard normal random variable
- Cumulative distribution function:

Φ(x; µ, σ²) ≜ ∫_{−∞}^x N(z|µ, σ²) dz
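Both the density and Φ can be written with only the standard library, using the error function for the CDF; a minimal sketch:

```python
from math import erf, exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma2=1.0):
    """N(x | mu, sigma^2) density."""
    return exp(-(x - mu) ** 2 / (2 * sigma2)) / sqrt(2 * pi * sigma2)

def normal_cdf(x, mu=0.0, sigma2=1.0):
    """Phi(x; mu, sigma^2) via the error function."""
    return 0.5 * (1 + erf((x - mu) / sqrt(2 * sigma2)))

# Standard normal: density peaks at 1/sqrt(2*pi), and about 68% of the
# probability mass lies within one standard deviation of the mean.
print(round(normal_pdf(0.0), 4))                     # 0.3989
print(round(normal_cdf(1.0) - normal_cdf(-1.0), 4))  # 0.6827
```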
Joint Probability Distributions
- Multiple related random variables
- p(x1, x2, . . . , xD) for D > 1 variables (X1, X2, . . . , XD)
- Discrete random variables?
- Continuous random variables?
- What do we measure?

Covariance
- How does X vary with respect to Y?
- For a linear relationship:

cov[X, Y] ≜ E[(X − E[X])(Y − E[Y])] = E[XY] − E[X]E[Y]
Covariance and Correlation
- X is a d-dimensional random vector

cov[X] ≜ E[(X − E[X])(X − E[X])^⊤]

    [ var[X1]       cov[X1, X2]  · · ·  cov[X1, Xd] ]
  = [ cov[X2, X1]   var[X2]      · · ·  cov[X2, Xd] ]
    [ ...           ...          . . .  ...         ]
    [ cov[Xd, X1]   cov[Xd, X2]  · · ·  var[Xd]     ]

- Covariance magnitudes are unbounded (anywhere between 0 and ∞)
- Normalized covariance ⇒ correlation
Correlation
- Pearson Correlation Coefficient:

corr[X, Y] ≜ cov[X, Y] / √(var[X] var[Y])

- What is corr[X, X]?
- −1 ≤ corr[X, Y] ≤ 1
- When is corr[X, Y] = 1?
  - Y = aX + b (with a > 0)
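A minimal sketch with made-up samples, computing the coefficient straight from the definition; an exact linear relationship with positive slope gives correlation 1:

```python
def corr(xs, ys):
    """Pearson correlation of two equal-length samples, from the definition."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 5 for x in xs]        # Y = aX + b with a > 0
print(round(corr(xs, ys), 6))       # 1.0
assert round(corr(xs, xs), 6) == 1.0  # corr[X, X] is always 1
```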
Multivariate Gaussian Distribution
- The most widely used joint probability distribution:

N(x|µ, Σ) ≜ (1 / ((2π)^(D/2) |Σ|^(1/2))) exp[−(1/2)(x − µ)^⊤ Σ^(−1) (x − µ)]
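For D = 2 the density can be written out explicitly without any linear-algebra library; a minimal sketch with the 2×2 determinant and inverse hand-coded:

```python
from math import exp, pi, sqrt

def mvn_pdf_2d(x, mu, cov):
    """N(x | mu, Sigma) for D = 2, with Sigma given as [[a, b], [b, c]]."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    # Inverse of a symmetric 2x2 matrix, written out explicitly.
    inv = [[c / det, -b / det], [-b / det, a / det]]
    d = [x[0] - mu[0], x[1] - mu[1]]
    quad = (d[0] * (inv[0][0] * d[0] + inv[0][1] * d[1])
            + d[1] * (inv[1][0] * d[0] + inv[1][1] * d[1]))
    return exp(-0.5 * quad) / (2 * pi * sqrt(det))

# Standard bivariate normal at its mean: the density is 1 / (2 pi).
print(round(mvn_pdf_2d([0, 0], [0, 0], [[1, 0], [0, 1]]), 4))  # 0.1592
```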
References
[1] E. Jaynes and G. Bretthorst. Probability Theory: The Logic of Science. Cambridge University Press, Cambridge, 2003.

[2] L. Wasserman. All of Statistics: A Concise Course in Statistical Inference (Springer Texts in Statistics). Springer, Oct. 2004.