
Christopher M. Bishop

PATTERN RECOGNITION AND MACHINE LEARNING

Polynomial Curve Fitting

Sum-of-Squares Error Function

$E(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\{y(x_n,\mathbf{w}) - t_n\}^2$

0th Order Polynomial

1st Order Polynomial

3rd Order Polynomial

9th Order Polynomial

Over-fitting

Root-Mean-Square (RMS) Error: $E_{\mathrm{RMS}} = \sqrt{2E(\mathbf{w}^\star)/N}$
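To make the over-fitting behaviour concrete, here is a minimal NumPy sketch (not from the slides; the $\sin(2\pi x)$ data, noise level 0.3, and set sizes are assumed to mirror Bishop's running example). It fits polynomials of order M = 0, 1, 3, 9 and compares training and test RMS error:

```python
# A minimal sketch of polynomial curve fitting on synthetic sin(2*pi*x) data,
# comparing training and test RMS error as the order M grows.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(0.0, 1.0, n)
    t = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, n)  # noisy targets
    return x, t

x_train, t_train = make_data(10)
x_test, t_test = make_data(100)

def rms_error(w, x, t):
    # E_RMS = sqrt(2 E(w*) / N), i.e. the root of the mean squared residual
    y = np.polyval(w, x)
    return np.sqrt(np.mean((y - t) ** 2))

for M in (0, 1, 3, 9):
    w = np.polyfit(x_train, t_train, deg=M)   # least-squares fit
    print(M, rms_error(w, x_train, t_train), rms_error(w, x_test, t_test))
```

With M = 9 and only 10 points the training error drops to zero while the test error blows up, which is exactly the over-fitting pattern the slides plot.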

Polynomial Coefficients (table of $\mathbf{w}^\star$ values for increasing polynomial order $M$)

Data Set Size: $N = 15$

9th Order Polynomial

Data Set Size: $N = 100$

9th Order Polynomial

Regularization

Penalize large coefficient values: $\widetilde{E}(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\{y(x_n,\mathbf{w}) - t_n\}^2 + \frac{\lambda}{2}\|\mathbf{w}\|^2$

Regularization: $\ln\lambda = -18$

Regularization: $\ln\lambda = 0$

Regularization: $E_{\mathrm{RMS}}$ vs. $\ln\lambda$
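A hedged sketch of the regularized fit: ridge regression on a degree-9 polynomial design matrix, sweeping $\ln\lambda$ as in the $E_{\mathrm{RMS}}$-vs-$\ln\lambda$ plot (the data generator repeats the assumed $\sin(2\pi x)$ example above):

```python
# Closed-form ridge regression on polynomial features: minimizing
# ||Phi w - t||^2 + lam * ||w||^2 shrinks the coefficients as lambda grows.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, 10)

def ridge_fit(x, t, M, lam):
    Phi = np.vander(x, M + 1, increasing=True)   # columns x^0 ... x^M
    A = Phi.T @ Phi + lam * np.eye(M + 1)        # regularized normal equations
    return np.linalg.solve(A, Phi.T @ t)

for ln_lam in (-18.0, 0.0):
    w = ridge_fit(x, t, M=9, lam=np.exp(ln_lam))
    print(ln_lam, np.abs(w).max())   # the largest |w_j| shrinks as lambda grows
```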

Polynomial Coefficients (table of $\mathbf{w}^\star$ values for varying regularization $\lambda$)

The Gaussian Distribution

$\mathcal{N}(x\mid\mu,\sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}}\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}$

Gaussian Parameter Estimation

Likelihood function: $p(\mathbf{x}\mid\mu,\sigma^2) = \prod_{n=1}^{N}\mathcal{N}(x_n\mid\mu,\sigma^2)$

Maximum (Log) Likelihood

Properties of $\mu_{\mathrm{ML}}$ and $\sigma^2_{\mathrm{ML}}$: under the true distribution, $\mathbb{E}[\mu_{\mathrm{ML}}] = \mu$ but $\mathbb{E}[\sigma^2_{\mathrm{ML}}] = \frac{N-1}{N}\sigma^2$, so the ML variance is biased.

Curve Fitting Re-visited

$p(t\mid x,\mathbf{w},\beta) = \mathcal{N}\left(t\mid y(x,\mathbf{w}),\beta^{-1}\right)$

Maximum Likelihood

Determine $\mathbf{w}_{\mathrm{ML}}$ by minimizing the sum-of-squares error, $E(\mathbf{w})$.

Predictive Distribution: $p(t\mid x,\mathbf{w}_{\mathrm{ML}},\beta_{\mathrm{ML}}) = \mathcal{N}\left(t\mid y(x,\mathbf{w}_{\mathrm{ML}}),\beta_{\mathrm{ML}}^{-1}\right)$

MAP: A Step towards Bayes

Determine $\mathbf{w}_{\mathrm{MAP}}$ by minimizing the regularized sum-of-squares error, $\widetilde{E}(\mathbf{w})$.

Bayesian Curve Fitting

Bayesian Predictive Distribution

Model Selection

Cross-Validation
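A minimal sketch of S-fold cross-validation for model selection (the helper s_fold_cv and the data are assumptions, not from the slides): each point is used for validation exactly once, and we pick the polynomial order with the lowest average validation error.

```python
# S-fold cross-validation: split the data into S folds, train on S-1 folds,
# validate on the held-out fold, and average the validation errors.
import numpy as np

def s_fold_cv(x, t, S, fit, error):
    """Average validation error over S folds; `fit` and `error` are callables."""
    idx = np.arange(len(x))
    folds = np.array_split(idx, S)
    errs = []
    for k in range(S):
        val = folds[k]
        train = np.concatenate([folds[j] for j in range(S) if j != k])
        model = fit(x[train], t[train])
        errs.append(error(model, x[val], t[val]))
    return float(np.mean(errs))

# Example: choose the polynomial order M with the lowest CV error.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 20)
t = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, 20)
for M in (1, 3, 9):
    err = s_fold_cv(
        x, t, S=4,
        fit=lambda xs, ts, M=M: np.polyfit(xs, ts, M),
        error=lambda w, xs, ts: np.sqrt(np.mean((np.polyval(w, xs) - ts) ** 2)),
    )
    print(M, err)
```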

Parametric Distributions

Basic building blocks: simple parametric densities $p(\mathbf{x}\mid\boldsymbol\theta)$ used as components of more complex models.

Need to determine $\boldsymbol\theta$ given observations $\{\mathbf{x}_1,\dots,\mathbf{x}_N\}$.

Representation: a point estimate $\boldsymbol\theta^\star$ or a posterior distribution $p(\boldsymbol\theta\mid\mathcal{D})$?

Recall Curve Fitting

Binary Variables (1)

Coin flipping: heads=1, tails=0

Bernoulli Distribution

$\mathrm{Bern}(x\mid\mu) = \mu^x(1-\mu)^{1-x}$, with $\mathbb{E}[x] = \mu$ and $\mathrm{var}[x] = \mu(1-\mu)$

Binary Variables (2)

$N$ coin flips: the number $m$ of heads follows the

Binomial Distribution

$\mathrm{Bin}(m\mid N,\mu) = \binom{N}{m}\mu^m(1-\mu)^{N-m}$, with $\mathbb{E}[m] = N\mu$ and $\mathrm{var}[m] = N\mu(1-\mu)$

Binomial Distribution (histogram of $\mathrm{Bin}(m\mid N{=}10,\,\mu{=}0.25)$)
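A small self-contained check of the binomial pmf built directly from the Bernoulli product (the N = 10, μ = 0.25 setting is assumed to match the histogram slide):

```python
# Binomial pmf Bin(m | N, mu): probability of m heads in N independent flips.
from math import comb

def binomial_pmf(m, N, mu):
    return comb(N, m) * mu**m * (1 - mu)**(N - m)

N, mu = 10, 0.25
pmf = [binomial_pmf(m, N, mu) for m in range(N + 1)]
print(sum(pmf))                                  # sums to 1.0 (up to rounding)
print(max(range(N + 1), key=lambda m: pmf[m]))   # mode near N * mu (here 2)
```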

Parameter Estimation (1)

ML for Bernoulli. Given $\mathcal{D} = \{x_1,\dots,x_N\}$ with $m$ heads, maximizing $\ln p(\mathcal{D}\mid\mu) = \sum_{n=1}^{N}\{x_n\ln\mu + (1-x_n)\ln(1-\mu)\}$ gives $\mu_{\mathrm{ML}} = \frac{m}{N}$.

Parameter Estimation (2)

Example: $\mathcal{D} = \{1,1,1\}$ gives $\mu_{\mathrm{ML}} = 1$.

Prediction: all future tosses will land heads up

Overfitting to $\mathcal{D}$

Beta Distribution

Distribution over $\mu\in[0,1]$: $\mathrm{Beta}(\mu\mid a,b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\mu^{a-1}(1-\mu)^{b-1}$, with $\mathbb{E}[\mu] = \frac{a}{a+b}$.

Bayesian Bernoulli

The Beta distribution provides the conjugate prior for the Bernoulli distribution.

Beta Distribution: plots of $\mathrm{Beta}(\mu\mid a,b)$ for various settings of $a$ and $b$.

Prior ∙ Likelihood = Posterior: with $m$ heads and $l = N - m$ tails, $p(\mu\mid m,l,a,b) \propto \mu^{m+a-1}(1-\mu)^{l+b-1}$, i.e. $\mathrm{Beta}(\mu\mid a+m,\,b+l)$.

Properties of the Posterior

As the size of the data set, $N$, increases, the posterior becomes more sharply peaked: $\mathbb{E}[\mu] \to \mu_{\mathrm{ML}}$ and $\mathrm{var}[\mu] \to 0$.

Prediction under the Posterior

What is the probability that the next coin toss will land heads up? Averaging over the posterior, $p(x{=}1\mid\mathcal{D}) = \int_0^1 \mu\,p(\mu\mid\mathcal{D})\,\mathrm{d}\mu = \frac{m+a}{m+a+l+b}$.
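A sketch of the conjugate Beta-Bernoulli update and the resulting predictive probability; the prior pseudo-counts a = b = 2 and the all-heads data are assumed for illustration:

```python
# Conjugate Beta-Bernoulli update: prior Beta(mu | a, b) plus m heads and
# l tails gives posterior Beta(mu | a + m, b + l); the predictive probability
# of heads is its mean, (m + a) / (m + a + l + b).
def update_beta(a, b, m, l):
    return a + m, b + l

def predict_heads(a, b):
    return a / (a + b)   # posterior mean of mu

a, b = 2.0, 2.0          # prior pseudo-counts (assumed values)
m, l = 3, 0              # observed: three heads, zero tails
a_post, b_post = update_beta(a, b, m, l)
print(predict_heads(a_post, b_post))   # 5/7, not 1: no ML-style overfitting
```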

Multinomial Variables

1-of-K coding scheme: e.g. $\mathbf{x} = (0,0,1,0,0,0)^{\mathrm{T}}$, with $p(\mathbf{x}\mid\boldsymbol\mu) = \prod_{k=1}^{K}\mu_k^{x_k}$ and $\sum_k \mu_k = 1$.

ML Parameter Estimation

Given: $\mathcal{D} = \{\mathbf{x}_1,\dots,\mathbf{x}_N\}$, the likelihood depends only on the counts $m_k = \sum_n x_{nk}$.

To ensure $\sum_k \mu_k = 1$, use a Lagrange multiplier, $\lambda$; solving gives $\mu_k^{\mathrm{ML}} = \frac{m_k}{N}$.

The Multinomial Distribution

$\mathrm{Mult}(m_1,\dots,m_K\mid\boldsymbol\mu,N) = \binom{N}{m_1\,m_2\,\dots\,m_K}\prod_{k=1}^{K}\mu_k^{m_k}$

The Dirichlet Distribution

Conjugate prior for the multinomial distribution: $\mathrm{Dir}(\boldsymbol\mu\mid\boldsymbol\alpha) = \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_1)\cdots\Gamma(\alpha_K)}\prod_{k=1}^{K}\mu_k^{\alpha_k-1}$, where $\alpha_0 = \sum_k\alpha_k$.

Bayesian Multinomial (1)

Multiplying the Dirichlet prior by the multinomial likelihood gives the posterior $\mathrm{Dir}(\boldsymbol\mu\mid\boldsymbol\alpha+\mathbf{m})$, where $\mathbf{m} = (m_1,\dots,m_K)^{\mathrm{T}}$.

Bayesian Multinomial (2)
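Likewise for the multinomial case, a short sketch of the conjugate Dirichlet update (the prior concentrations and counts are assumed values):

```python
# Conjugate Dirichlet-multinomial update: prior Dir(mu | alpha) plus observed
# counts m gives posterior Dir(mu | alpha + m).
import numpy as np

alpha = np.array([2.0, 2.0, 2.0])      # prior concentration (assumed values)
m = np.array([5, 1, 0])                # observed 1-of-K counts
alpha_post = alpha + m                 # conjugate update
print(alpha_post / alpha_post.sum())   # posterior mean E[mu_k]
print(m / m.sum())                     # ML estimate m_k / N, for comparison
```

Note how the zero count no longer forces a zero probability, unlike the ML estimate.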

The Gaussian Distribution

$\mathcal{N}(\mathbf{x}\mid\boldsymbol\mu,\boldsymbol\Sigma) = \frac{1}{(2\pi)^{D/2}|\boldsymbol\Sigma|^{1/2}}\exp\left\{-\frac{1}{2}(\mathbf{x}-\boldsymbol\mu)^{\mathrm{T}}\boldsymbol\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu)\right\}$

Maximum Likelihood for the Gaussian (1)

Given i.i.d. data $\mathbf{X} = (\mathbf{x}_1,\dots,\mathbf{x}_N)^{\mathrm{T}}$, the log-likelihood function is given by

$\ln p(\mathbf{X}\mid\boldsymbol\mu,\boldsymbol\Sigma) = -\frac{ND}{2}\ln(2\pi) - \frac{N}{2}\ln|\boldsymbol\Sigma| - \frac{1}{2}\sum_{n=1}^{N}(\mathbf{x}_n-\boldsymbol\mu)^{\mathrm{T}}\boldsymbol\Sigma^{-1}(\mathbf{x}_n-\boldsymbol\mu)$

Sufficient statistics: $\sum_{n=1}^{N}\mathbf{x}_n$ and $\sum_{n=1}^{N}\mathbf{x}_n\mathbf{x}_n^{\mathrm{T}}$

Maximum Likelihood for the Gaussian (2)

Set the derivative of the log-likelihood function to zero,

$\frac{\partial}{\partial\boldsymbol\mu}\ln p(\mathbf{X}\mid\boldsymbol\mu,\boldsymbol\Sigma) = \sum_{n=1}^{N}\boldsymbol\Sigma^{-1}(\mathbf{x}_n-\boldsymbol\mu) = 0,$

and solve to obtain $\boldsymbol\mu_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N}\mathbf{x}_n$.

Similarly $\boldsymbol\Sigma_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N}(\mathbf{x}_n-\boldsymbol\mu_{\mathrm{ML}})(\mathbf{x}_n-\boldsymbol\mu_{\mathrm{ML}})^{\mathrm{T}}$.

Maximum Likelihood for the Gaussian (3)

Under the true distribution, $\mathbb{E}[\boldsymbol\mu_{\mathrm{ML}}] = \boldsymbol\mu$ but $\mathbb{E}[\boldsymbol\Sigma_{\mathrm{ML}}] = \frac{N-1}{N}\boldsymbol\Sigma$.

Hence define the unbiased estimator $\widetilde{\boldsymbol\Sigma} = \frac{1}{N-1}\sum_{n=1}^{N}(\mathbf{x}_n-\boldsymbol\mu_{\mathrm{ML}})(\mathbf{x}_n-\boldsymbol\mu_{\mathrm{ML}})^{\mathrm{T}}$.
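A quick Monte Carlo check (not from the slides) that the ML variance is biased by the factor $(N-1)/N$, using the univariate case for simplicity:

```python
# Averaging the ML variance over many repeated samples approximates
# E[sigma2_ML], which should be (N - 1)/N * sigma2; ddof=1 removes the bias.
import numpy as np

rng = np.random.default_rng(0)
N, mu, sigma2 = 5, 0.0, 1.0
trials = rng.normal(mu, np.sqrt(sigma2), size=(100_000, N))

var_ml = trials.var(axis=1, ddof=0).mean()        # ~ (N - 1)/N * sigma2
var_unbiased = trials.var(axis=1, ddof=1).mean()  # ~ sigma2
print(var_ml, (N - 1) / N * sigma2, var_unbiased)
```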

Bayesian Inference for the Gaussian (1)

Assume $\sigma^2$ is known. Given i.i.d. data $\mathbf{X} = \{x_1,\dots,x_N\}$, the likelihood function for $\mu$ is given by

$p(\mathbf{X}\mid\mu) = \prod_{n=1}^{N}p(x_n\mid\mu) = \frac{1}{(2\pi\sigma^2)^{N/2}}\exp\left\{-\frac{1}{2\sigma^2}\sum_{n=1}^{N}(x_n-\mu)^2\right\}$

This has a Gaussian shape as a function of $\mu$ (but it is not a distribution over $\mu$).

Bayesian Inference for the Gaussian (2)

Combined with a Gaussian prior over $\mu$, $p(\mu) = \mathcal{N}(\mu\mid\mu_0,\sigma_0^2)$,

this gives the posterior $p(\mu\mid\mathbf{X}) \propto p(\mathbf{X}\mid\mu)\,p(\mu)$.

Completing the square over $\mu$, we see that $p(\mu\mid\mathbf{X}) = \mathcal{N}(\mu\mid\mu_N,\sigma_N^2)$ …

Bayesian Inference for the Gaussian (3)

… where $\mu_N = \frac{\sigma^2}{N\sigma_0^2+\sigma^2}\,\mu_0 + \frac{N\sigma_0^2}{N\sigma_0^2+\sigma^2}\,\mu_{\mathrm{ML}}$ and $\frac{1}{\sigma_N^2} = \frac{1}{\sigma_0^2} + \frac{N}{\sigma^2}$.

Note: as $N \to 0$ the posterior reduces to the prior, and as $N \to \infty$ it concentrates on $\mu_{\mathrm{ML}}$ with $\sigma_N^2 \to 0$.

Bayesian Inference for the Gaussian (4)

Example: $p(\mu\mid\mathbf{X})$ for $N$ = 0, 1, 2 and 10.
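A sketch reproducing this N = 0, 1, 2, 10 progression numerically; the prior, noise variance, and true mean are assumed values chosen for illustration:

```python
# Posterior over mu with sigma^2 known: mu_N interpolates between the prior
# mean mu0 and mu_ML, while sigma_N^2 shrinks as data accumulate.
import numpy as np

sigma2 = 0.1               # known noise variance (assumed)
mu0, sigma02 = 0.0, 1.0    # prior N(mu | mu0, sigma0^2) (assumed)
rng = np.random.default_rng(0)
data = rng.normal(0.8, np.sqrt(sigma2), 10)   # true mean 0.8 (assumed)

for N in (0, 1, 2, 10):
    x = data[:N]
    mu_ml = x.mean() if N > 0 else 0.0
    sigma_N2 = 1.0 / (1.0 / sigma02 + N / sigma2)
    mu_N = sigma_N2 * (mu0 / sigma02 + N * mu_ml / sigma2)
    print(N, mu_N, sigma_N2)   # mean moves toward mu_ML, variance shrinks
```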

Bayesian Inference for the Gaussian (5)

Sequential Estimation

The posterior obtained after observing $N-1$ data points becomes the prior when we observe the $N$th data point.

Bayesian Inference for the Gaussian (6)

Now assume $\mu$ is known. The likelihood function for $\lambda = 1/\sigma^2$ is given by

$p(\mathbf{X}\mid\lambda) = \prod_{n=1}^{N}\mathcal{N}(x_n\mid\mu,\lambda^{-1}) \propto \lambda^{N/2}\exp\left\{-\frac{\lambda}{2}\sum_{n=1}^{N}(x_n-\mu)^2\right\}$

This has a Gamma shape as a function of $\lambda$.

Bayesian Inference for the Gaussian (7)

The Gamma distribution: $\mathrm{Gam}(\lambda\mid a,b) = \frac{1}{\Gamma(a)}\,b^a\lambda^{a-1}\exp(-b\lambda)$, with $\mathbb{E}[\lambda] = \frac{a}{b}$ and $\mathrm{var}[\lambda] = \frac{a}{b^2}$.

Bayesian Inference for the Gaussian (8)

Now we combine a Gamma prior, $\mathrm{Gam}(\lambda\mid a_0,b_0)$, with the likelihood function for $\lambda$ to obtain

$p(\lambda\mid\mathbf{X}) \propto \lambda^{a_0-1}\,\lambda^{N/2}\exp\left\{-b_0\lambda - \frac{\lambda}{2}\sum_{n=1}^{N}(x_n-\mu)^2\right\}$

which we recognize as $\mathrm{Gam}(\lambda\mid a_N,b_N)$ with $a_N = a_0 + \frac{N}{2}$ and $b_N = b_0 + \frac{1}{2}\sum_{n=1}^{N}(x_n-\mu)^2 = b_0 + \frac{N}{2}\sigma_{\mathrm{ML}}^2$.
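A short numerical sketch of this conjugate Gamma update for the precision (the prior values a0 = b0 = 1 and the data are assumptions):

```python
# Conjugate Gamma update for lambda = 1/sigma^2 with mu known:
# a_N = a0 + N/2, b_N = b0 + (N/2) * sigma2_ML.
import numpy as np

rng = np.random.default_rng(0)
mu = 0.0                              # known mean
x = rng.normal(mu, 2.0, 50)           # true precision = 1/4 (assumed data)

a0, b0 = 1.0, 1.0                     # Gamma prior (assumed values)
N = len(x)
sigma2_ml = np.mean((x - mu) ** 2)
a_N = a0 + N / 2
b_N = b0 + 0.5 * N * sigma2_ml
print(a_N / b_N)                      # posterior mean of lambda, near 0.25
```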

Bayesian Inference for the Gaussian (9)

If both $\mu$ and $\lambda$ are unknown, the joint likelihood function is given by

$p(\mathbf{X}\mid\mu,\lambda) = \prod_{n=1}^{N}\left(\frac{\lambda}{2\pi}\right)^{1/2}\exp\left\{-\frac{\lambda}{2}(x_n-\mu)^2\right\}$

We need a prior with the same functional dependence on $\mu$ and $\lambda$.

Bayesian Inference for the Gaussian (10)

The Gaussian-gamma distribution

$p(\mu,\lambda) = \mathcal{N}\left(\mu\mid\mu_0,(\beta\lambda)^{-1}\right)\mathrm{Gam}(\lambda\mid a,b)$

• Quadratic in $\mu$.
• Linear in $\lambda$.
• Gamma distribution over $\lambda$.
• Independent of $\mu$.

Bayesian Inference for the Gaussian (11)

The Gaussian-gamma distribution (contour plot)

Bayesian Inference for the Gaussian (12)

Multivariate conjugate priors:

• $\boldsymbol\mu$ unknown, $\boldsymbol\Lambda$ known: $p(\boldsymbol\mu)$ Gaussian.

• $\boldsymbol\Lambda$ unknown, $\boldsymbol\mu$ known: $p(\boldsymbol\Lambda)$ Wishart,

$\mathcal{W}(\boldsymbol\Lambda\mid\mathbf{W},\nu) = B\,|\boldsymbol\Lambda|^{(\nu-D-1)/2}\exp\left(-\frac{1}{2}\mathrm{Tr}(\mathbf{W}^{-1}\boldsymbol\Lambda)\right)$

where the normalization constant $B$ depends on $\mathbf{W}$ and $\nu$.

• $\boldsymbol\Lambda$ and $\boldsymbol\mu$ unknown: $p(\boldsymbol\mu,\boldsymbol\Lambda)$ Gaussian-Wishart,

$p(\boldsymbol\mu,\boldsymbol\Lambda\mid\boldsymbol\mu_0,\beta,\mathbf{W},\nu) = \mathcal{N}\left(\boldsymbol\mu\mid\boldsymbol\mu_0,(\beta\boldsymbol\Lambda)^{-1}\right)\mathcal{W}(\boldsymbol\Lambda\mid\mathbf{W},\nu)$

Student's t-Distribution

Marginalizing the precision of a Gaussian under a Gamma prior,

$p(x\mid\mu,a,b) = \int_0^{\infty}\mathcal{N}\left(x\mid\mu,\tau^{-1}\right)\mathrm{Gam}(\tau\mid a,b)\,\mathrm{d}\tau = \mathrm{St}(x\mid\mu,\lambda,\nu)$

with $\nu = 2a$ and $\lambda = a/b$: an infinite mixture of Gaussians.

Student's t-Distribution

$\mathrm{St}(x\mid\mu,\lambda,\nu) = \frac{\Gamma(\nu/2+1/2)}{\Gamma(\nu/2)}\left(\frac{\lambda}{\pi\nu}\right)^{1/2}\left[1+\frac{\lambda(x-\mu)^2}{\nu}\right]^{-\nu/2-1/2}$

Student’s t-Distribution

Robustness to outliers: Gaussian vs t-distribution.
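A small demo of this robustness, assuming SciPy's generic ML fitter stats.t.fit (the data and outlier values are made up): adding gross outliers drags the Gaussian mean but barely moves the fitted t location.

```python
# Robustness comparison: Gaussian ML mean vs. the location parameter of a
# Student's t fitted by maximum likelihood (scipy's numerical optimizer).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, 200)
corrupted = np.concatenate([clean, np.full(10, 15.0)])  # gross outliers

for data in (clean, corrupted):
    mu_gauss = data.mean()                    # Gaussian ML mean
    df, loc, scale = stats.t.fit(data)        # t fit: (df, loc, scale)
    print(round(mu_gauss, 3), round(loc, 3))  # Gaussian shifts, t barely moves
```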

Student’s t-Distribution

The D-variate case:

$\mathrm{St}(\mathbf{x}\mid\boldsymbol\mu,\boldsymbol\Lambda,\nu) = \frac{\Gamma(D/2+\nu/2)}{\Gamma(\nu/2)}\frac{|\boldsymbol\Lambda|^{1/2}}{(\pi\nu)^{D/2}}\left[1+\frac{\Delta^2}{\nu}\right]^{-D/2-\nu/2}$

where $\Delta^2 = (\mathbf{x}-\boldsymbol\mu)^{\mathrm{T}}\boldsymbol\Lambda(\mathbf{x}-\boldsymbol\mu)$.

Properties: $\mathbb{E}[\mathbf{x}] = \boldsymbol\mu$ (for $\nu > 1$), $\mathrm{cov}[\mathbf{x}] = \frac{\nu}{\nu-2}\boldsymbol\Lambda^{-1}$ (for $\nu > 2$), $\mathrm{mode}[\mathbf{x}] = \boldsymbol\mu$.

The Exponential Family (1)

The exponential family is the set of distributions of the form

$p(\mathbf{x}\mid\boldsymbol\eta) = h(\mathbf{x})\,g(\boldsymbol\eta)\exp\left\{\boldsymbol\eta^{\mathrm{T}}\mathbf{u}(\mathbf{x})\right\}$

where $\boldsymbol\eta$ is the natural parameter and

$g(\boldsymbol\eta)\int h(\mathbf{x})\exp\left\{\boldsymbol\eta^{\mathrm{T}}\mathbf{u}(\mathbf{x})\right\}\mathrm{d}\mathbf{x} = 1$

so $g(\boldsymbol\eta)$ can be interpreted as a normalization coefficient.

The Exponential Family (2.1)

The Bernoulli Distribution

$p(x\mid\mu) = \mathrm{Bern}(x\mid\mu) = \mu^x(1-\mu)^{1-x} = (1-\mu)\exp\left\{x\ln\left(\frac{\mu}{1-\mu}\right)\right\}$

Comparing with the general form we see that $\eta = \ln\left(\frac{\mu}{1-\mu}\right)$

and so $\mu = \sigma(\eta) = \frac{1}{1+\exp(-\eta)}$

Logistic sigmoid

The Exponential Family (2.2)

The Bernoulli distribution can hence be written as $p(x\mid\eta) = \sigma(-\eta)\exp(\eta x)$

where $u(x) = x$, $h(x) = 1$, and $g(\eta) = \sigma(-\eta)$.

The Exponential Family (3.1)

The Multinomial Distribution

$p(\mathbf{x}\mid\boldsymbol\mu) = \prod_{k=1}^{K}\mu_k^{x_k} = \exp\left\{\sum_{k=1}^{K}x_k\ln\mu_k\right\} = \exp\left(\boldsymbol\eta^{\mathrm{T}}\mathbf{x}\right)$

where $\eta_k = \ln\mu_k$, $\mathbf{u}(\mathbf{x}) = \mathbf{x}$, $h(\mathbf{x}) = 1$, and $g(\boldsymbol\eta) = 1$.

NOTE: the $\eta_k$ parameters are not independent, since the corresponding $\mu_k$ must satisfy $\sum_{k=1}^{K}\mu_k = 1$.

The Exponential Family (3.2)

Let $\mu_K = 1-\sum_{k=1}^{K-1}\mu_k$. This leads to

$p(\mathbf{x}\mid\boldsymbol\mu) = \exp\left\{\sum_{k=1}^{K-1}x_k\ln\left(\frac{\mu_k}{1-\sum_{j=1}^{K-1}\mu_j}\right) + \ln\left(1-\sum_{k=1}^{K-1}\mu_k\right)\right\}$

and $\eta_k = \ln\left(\frac{\mu_k}{1-\sum_{j=1}^{K-1}\mu_j}\right)$.

Here the $\eta_k$ parameters are independent. Note that

$\mu_k = \frac{\exp(\eta_k)}{1+\sum_{j=1}^{K-1}\exp(\eta_j)}$

Softmax (the normalized exponential)
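A minimal softmax sketch in NumPy, with the usual max-subtraction trick for numerical stability (an implementation detail not on the slide):

```python
# Softmax: mu_k = exp(eta_k) / sum_j exp(eta_j). Subtracting the max first
# avoids overflow; softmax is invariant to a constant shift of eta.
import numpy as np

def softmax(eta):
    z = eta - eta.max()
    e = np.exp(z)
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))   # entries are positive, sum to 1
```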

The Exponential Family (3.3)

The Multinomial distribution can then be written as

$p(\mathbf{x}\mid\boldsymbol\eta) = \left(1+\sum_{k=1}^{K-1}\exp(\eta_k)\right)^{-1}\exp\left\{\sum_{k=1}^{K-1}\eta_k x_k\right\}$

where $\mathbf{u}(\mathbf{x}) = \mathbf{x}$, $h(\mathbf{x}) = 1$, and $g(\boldsymbol\eta) = \left(1+\sum_{k=1}^{K-1}\exp(\eta_k)\right)^{-1}$.

The Exponential Family (4)

The Gaussian Distribution

$p(x\mid\mu,\sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}}\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\} = h(x)\,g(\boldsymbol\eta)\exp\left\{\boldsymbol\eta^{\mathrm{T}}\mathbf{u}(x)\right\}$

where $\boldsymbol\eta = \left(\mu/\sigma^2,\;-1/2\sigma^2\right)^{\mathrm{T}}$, $\mathbf{u}(x) = \left(x,\;x^2\right)^{\mathrm{T}}$, $h(x) = (2\pi)^{-1/2}$, and $g(\boldsymbol\eta) = (-2\eta_2)^{1/2}\exp\left(\frac{\eta_1^2}{4\eta_2}\right)$.

ML for the Exponential Family (1)

From the definition of $g(\boldsymbol\eta)$ we get

$\nabla g(\boldsymbol\eta)\int h(\mathbf{x})\exp\left\{\boldsymbol\eta^{\mathrm{T}}\mathbf{u}(\mathbf{x})\right\}\mathrm{d}\mathbf{x} + g(\boldsymbol\eta)\int h(\mathbf{x})\exp\left\{\boldsymbol\eta^{\mathrm{T}}\mathbf{u}(\mathbf{x})\right\}\mathbf{u}(\mathbf{x})\,\mathrm{d}\mathbf{x} = 0$

Thus $-\nabla\ln g(\boldsymbol\eta) = \mathbb{E}[\mathbf{u}(\mathbf{x})]$.

ML for the Exponential Family (2)

Given a data set, $\mathbf{X} = \{\mathbf{x}_1,\dots,\mathbf{x}_N\}$, the likelihood function is given by

$p(\mathbf{X}\mid\boldsymbol\eta) = \left(\prod_{n=1}^{N}h(\mathbf{x}_n)\right)g(\boldsymbol\eta)^N\exp\left\{\boldsymbol\eta^{\mathrm{T}}\sum_{n=1}^{N}\mathbf{u}(\mathbf{x}_n)\right\}$

Thus we have $-\nabla\ln g(\boldsymbol\eta_{\mathrm{ML}}) = \frac{1}{N}\sum_{n=1}^{N}\mathbf{u}(\mathbf{x}_n)$.

Sufficient statistic: the ML estimate depends on the data only through $\sum_n \mathbf{u}(\mathbf{x}_n)$.
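A sketch of this general ML recipe instantiated for the Bernoulli, where $u(x) = x$, so matching $\mathbb{E}[u(x)]$ to the empirical mean gives $\mu_{\mathrm{ML}}$ (the data values are assumed):

```python
# General exponential-family ML: the estimate depends on the data only through
# (1/N) * sum_n u(x_n). For the Bernoulli, u(x) = x and E[u(x)] = mu, so
# moment matching gives mu_ML = mean(x); the natural parameter is the logit.
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1, 1, 1])   # assumed coin-flip data
suff = x.mean()                           # (1/N) sum_n u(x_n)
mu_ml = suff                              # matches E[u(x)] = mu
eta_ml = np.log(mu_ml / (1 - mu_ml))      # eta = ln(mu / (1 - mu))
print(mu_ml, eta_ml)
```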

Conjugate priors

For any member of the exponential family, there exists a conjugate prior

$p(\boldsymbol\eta\mid\boldsymbol\chi,\nu) = f(\boldsymbol\chi,\nu)\,g(\boldsymbol\eta)^{\nu}\exp\left\{\nu\boldsymbol\eta^{\mathrm{T}}\boldsymbol\chi\right\}$

Combining with the likelihood function, we get the posterior

$p(\boldsymbol\eta\mid\mathbf{X},\boldsymbol\chi,\nu) \propto g(\boldsymbol\eta)^{\nu+N}\exp\left\{\boldsymbol\eta^{\mathrm{T}}\left(\sum_{n=1}^{N}\mathbf{u}(\mathbf{x}_n)+\nu\boldsymbol\chi\right)\right\}$

The prior corresponds to $\nu$ pseudo-observations with value $\boldsymbol\chi$.