Bayes = (Prior, Posterior, Evidence, Probability) & Curve Fitting


Transcript of Bayes = (Prior, Posterior, Evidence, Probability) & Curve Fitting

  • 8/9/2019, Slide 1/63

    Using Probability Theory

    Curve Fitting & Multisensory Integration

    Hannah Chen, Rebecca Roseman and Adam Rule | COGS 202 | 4.14.2

  • Slide 2/63

    Overheard at Porters

    “You’re optimizing for the wrong thing!”

  • Slide 3/63

    What would you say?

    What makes for a good model?

    Best performance? Fewest assumptions? Most elegant? Coolest name?

  • Slide 4/63

    Agenda

    Motivation: Finding Patterns

    Tools: Probability Theory

    Applications: Curve Fitting, Multimodal Sensory Integration

  • Slide 5/63

    FINDING PATTERNS: The Motivation

  • Slide 6/63

    You have data... now find patterns

  • Slide 7/63

    You have data... now find patterns

    Unsupervised: x = data (training); y(x) = model

    Examples: clustering, density estimation

  • Slide 8/63

    You have data... now find patterns

    Unsupervised: x = data (training); y(x) = model

    Examples: clustering, density estimation

    Supervised: x = data (training), t = target vector; y(x) = model

    Examples: classification, regression

  • Slide 9/63

    You have data... now find patterns

    Unsupervised: x = data (training); y(x) = model

    Examples: clustering, density estimation

    Supervised: x = data (training), t = target vector; y(x) = model

    Examples: classification, regression

  • Slide 10/63

    Important Questions

    What kind of model is appropriate? What makes a model accurate? Can a model be too accurate?

    What are our prior beliefs about the model?

  • Slide 11/63

    PROBABILITY THEORY: The Tools

  • Slide 12/63

    Properties of a distribution

    x = event; p(x) = probability of the event

    1. p(x) ≥ 0 for every event x
    2. Σ_x p(x) = 1 (the probabilities sum to 1; for a density, the integral is 1)

  • Slide 13/63

    Rules

    Sum rule: p(X) = Σ_Y p(X, Y)

    Product rule: p(X, Y) = p(Y | X) p(X)

  • Slide 14/63

    Example (Discrete)
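    The discrete example on this slide is shown as an image; the sum rule, product rule, and Bayes' rule can be sketched with a small, made-up joint distribution (the numbers below are illustrative, not from the slide):

    ```python
    import numpy as np

    # Hypothetical joint distribution p(X, Y) over X in {0,1,2}, Y in {0,1}
    # (made-up numbers for illustration; rows index X, columns index Y)
    joint = np.array([[0.10, 0.20],
                      [0.15, 0.25],
                      [0.05, 0.25]])

    # Sum rule: p(X) = sum_Y p(X, Y)
    p_x = joint.sum(axis=1)
    p_y = joint.sum(axis=0)

    # Product rule rearranged: p(Y | X) = p(X, Y) / p(X)
    p_y_given_x = joint / p_x[:, None]

    # Bayes' rule: p(X | Y) = p(Y | X) p(X) / p(Y)
    p_x_given_y = (p_y_given_x * p_x[:, None]) / p_y[None, :]
    ```

    Each conditional distribution sums to 1 over the variable being conditioned on, which is a quick sanity check that the rules were applied correctly.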

  • Slide 15/63

    Bayes' Rule (review)

    p(w | D) = p(D | w) p(w) / p(D)

    posterior = likelihood × prior / evidence

  • Slide 16/63

    Probability Density vs. Mass Function

    PDF: continuous. PMF: discrete.

    PDF intuition: f(x) measures how much probability is concentrated near x per unit length dx, i.e., how dense the probability is near x.

    PMF intuition: probability mass has the same interpretation but from a discrete point of view: f(x) is the probability of each individual point, whereas for a PDF, f(x) dx is the probability of an interval of width dx.

    Notation: p(x) for a density, P(x) for a mass function.

    A density p(x) can be greater than 1, as long as its integral over the entire domain is 1.

  • Slide 17/63

    Expectation & Covariance
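    The definitions on this slide are shown as images; the standard quantities (E[x] = Σ p(x) x, var[x] = E[x²] − E[x]², cov[x, y] = E[xy] − E[x]E[y]) can be sketched numerically with illustrative, made-up numbers:

    ```python
    import numpy as np

    # Discrete distribution over x values (illustrative numbers)
    x = np.array([1.0, 2.0, 3.0])
    p = np.array([0.2, 0.5, 0.3])

    # Expectation: E[x] = sum_x p(x) * x
    e_x = np.sum(p * x)

    # Variance: var[x] = E[x^2] - E[x]^2
    var_x = np.sum(p * x**2) - e_x**2

    # Covariance estimated from paired samples of correlated Gaussians:
    # cov[x, y] = E[x y] - E[x] E[y]; true covariance here is 0.8
    samples = np.random.default_rng(0).multivariate_normal(
        mean=[0.0, 0.0], cov=[[1.0, 0.8], [0.8, 1.0]], size=100_000)
    cov_est = np.cov(samples.T)[0, 1]
    ```

    With enough samples, the empirical covariance lands close to the true value used to generate the data.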

  • Slide 18/63

    CURVE FITTING: Application

  • Slide 19/63

    Observed Data: given n points (x, t)

    Curve Fitting

  • Slide 20/63

    Curve Fitting

    Observed Data: given n points (x, t). (Textbook example: generate x uniformly from the range [0, 1] and calculate target data t with the sin(2πx) function plus noise drawn from a Gaussian distribution.)

    Why?

  • Slide 21/63

    Curve Fitting

    Observed Data: given n points (x, t). (Textbook example: generate x uniformly from the range [0, 1] and calculate target data t with the sin(2πx) function plus noise drawn from a Gaussian distribution.)

    Why? Real data sets typically have an underlying regularity that we are trying to learn.
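    The textbook-style data generation described above can be sketched in a few lines (the sample size and noise standard deviation are assumptions, chosen for illustration):

    ```python
    import numpy as np

    rng = np.random.default_rng(42)

    # Textbook-style synthetic data: x uniform on [0, 1],
    # targets t = sin(2*pi*x) plus Gaussian noise (std 0.3 is an assumption)
    n = 10
    x = rng.uniform(0.0, 1.0, size=n)
    t = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=n)
    ```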

  • Slide 22/63

    Curve Fitting

    Observed Data: given n points (x, t)

    Goal: use the observed data to predict new target values t′ for new values x′

  • Slide 23/63

    Curve Fitting

    Observed Data: given n points (x, t). Can we fit a polynomial function to this data?

    What values of w (the coefficients) and M (the polynomial order) fit this data well?

  • Slide 24/63

    How to measure goodness of fit?

    Minimize an error function, e.g., the sum-of-squares error:

    E(w) = (1/2) Σ_{n=1}^{N} { y(x_n, w) − t_n }²
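    Minimizing the sum-of-squares error for a polynomial has a closed-form least-squares solution; here is a minimal sketch (the synthetic data and the choice M = 3 are assumptions for illustration):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 1, 20)
    t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 20)

    def design_matrix(x, M):
        """Polynomial features x^0 ... x^M for each data point."""
        return np.vander(x, M + 1, increasing=True)

    def sse(w, x, t):
        """Sum-of-squares error E(w) = 1/2 * sum (y(x_n, w) - t_n)^2."""
        y = design_matrix(x, len(w) - 1) @ w
        return 0.5 * np.sum((y - t) ** 2)

    # Closed-form least-squares solution minimizes E(w)
    M = 3
    Phi = design_matrix(x, M)
    w_star, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    ```

    Because w_star is the global minimizer of E(w), perturbing any coefficient can only increase the error.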

  • Slide 25/63

    Overfitting

  • Slide 26/63

    Combatting Overfitting

    1. Increase data points

  • Slide 27/63

    Combatting Overfitting

    1. Increase data points

    Data observations may simply have been noisy.

    With more data, we can see whether variation is due to noise or is part of the underlying relationship between observations.

  • Slide 28/63

    Combatting Overfitting

    1. Increase data points

    Data observations may simply have been noisy.

    With more data, we can see whether variation is due to noise or is part of the underlying relationship between observations.

    2. Regularization

    Introduce penalty term

    Trade off between good fit and penalty

    The hyperparameter λ is an input to the model; the regularized error is

    Ẽ(w) = (1/2) Σ_n { y(x_n, w) − t_n }² + (λ/2) ‖w‖²

    Increasing λ reduces overfitting, in turn reducing variance and increasing bias (the difference between the estimated and true target).
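    The effect of the penalty term can be sketched with a high-order polynomial fit: a larger penalty shrinks the coefficient magnitudes. (The data, the choice M = 9, and the λ values are assumptions for illustration.)

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(0, 1, 10)
    t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 10)

    M = 9                                    # high-order polynomial, prone to overfitting
    Phi = np.vander(x, M + 1, increasing=True)

    def ridge_fit(Phi, t, lam):
        """Minimize (1/2)||Phi w - t||^2 + (lam/2)||w||^2 via the closed form."""
        A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
        return np.linalg.solve(A, Phi.T @ t)

    w_light = ridge_fit(Phi, t, lam=1e-3)    # mild penalty
    w_heavy = ridge_fit(Phi, t, lam=10.0)    # strong penalty shrinks w further
    ```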

  • Slide 29/63

    How to check for overfitting?

    Split the data into training and validation subsets. Heuristic: # of data points > (some multiple) × (# of parameters).

    Training vs. testing: don't touch the test set until you are actually evaluating the experiment!

  • Slide 30/63

    Cross-validation

    1. Use a portion, (S−1)/S, of the data for training (white regions in the figure)

    2. Assess performance on the held-out portion (red regions)

    3. Repeat for each run
    4. Average the performance scores

    (Figure: 4-fold cross-validation)
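    The four steps above can be sketched as an S-fold cross-validation loop for the polynomial fit (the data, the degree M, and S = 4 are assumptions for illustration):

    ```python
    import numpy as np

    def s_fold_cv(x, t, M, S):
        """S-fold cross-validation of a degree-M polynomial least-squares fit.

        Returns the mean held-out sum-of-squares error over the S folds.
        """
        idx = np.arange(len(x))
        folds = np.array_split(idx, S)
        scores = []
        for k in range(S):
            val = folds[k]
            train = np.concatenate([folds[j] for j in range(S) if j != k])
            # Fit on the (S-1)/S training portion
            Phi_tr = np.vander(x[train], M + 1, increasing=True)
            w, *_ = np.linalg.lstsq(Phi_tr, t[train], rcond=None)
            # Assess on the held-out fold
            Phi_val = np.vander(x[val], M + 1, increasing=True)
            scores.append(0.5 * np.sum((Phi_val @ w - t[val]) ** 2))
        return float(np.mean(scores))

    rng = np.random.default_rng(2)
    x = rng.uniform(0, 1, 40)
    t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 40)
    score = s_fold_cv(x, t, M=3, S=4)       # 4-fold cross-validation
    ```

    For this sinusoidal data, a cubic should score noticeably better than a constant (M = 0) model.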

  • Slide 31/63

    Cross-validation

    When to use?

  • Slide 32/63

    Cross-validation

    When to use? When the validation set would otherwise be small. If there is very little data, use S = the number of observed data points (leave-one-out).

  • Slide 33/63

    Cross-validation

    When to use? When the validation set would otherwise be small. If there is very little data, use S = the number of observed data points (leave-one-out).

    Limitations:
    ● Computationally expensive: the number of training runs increases by a factor of S
    ● With multiple complexity parameters, the number of training runs needed may be exponential in the number of parameters

  • Slide 34/63

    Alternative Approach

    We want an approach that depends only on the size of the training data rather than on the number of training runs.

    e.g., the Akaike information criterion (AIC) and the Bayesian information criterion (BIC, Sec 4.4)

  • Slide 35/63

    Akaike Information Criterion (AIC)

    Choose the model with the largest value of

    ln p(D | w_ML) − M

    where ln p(D | w_ML) is the best-fit log likelihood and M is the number of adjustable model parameters.

  • Slide 36/63

    Gaussian Distribution

    N(x | μ, σ²) = (1 / √(2πσ²)) exp( −(x − μ)² / (2σ²) )

    This satisfies the two properties of a probability density! (What are they?)
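    The two density properties the slide alludes to (nonnegativity, and integrating to 1) can be checked numerically for the standard univariate Gaussian; a minimal sketch:

    ```python
    import numpy as np

    def gaussian_pdf(x, mu, sigma2):
        """N(x | mu, sigma^2) = (1/sqrt(2*pi*sigma^2)) * exp(-(x-mu)^2 / (2*sigma^2))"""
        return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

    # Property 1: nonnegative everywhere (checked on a dense grid)
    xs = np.linspace(-10.0, 10.0, 200_001)
    pdf = gaussian_pdf(xs, 0.0, 1.0)

    # Property 2: integrates to 1 (simple Riemann-sum approximation)
    area = float(np.sum(pdf) * (xs[1] - xs[0]))

    # A density can exceed 1 pointwise when sigma^2 is small,
    # as long as the total area is still 1
    peak = gaussian_pdf(0.0, 0.0, 0.01)
    ```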

  • Slide 37/63

  • Slide 38/63

    Curve Fitting (ver. 1)

    Assumptions:
    1. Given a value of x, the corresponding target value t has a Gaussian distribution with mean y(x, w):
    p(t | x, w, β) = N( t | y(x, w), β⁻¹ )
    2. The data { x, t } are drawn independently from this distribution, so

    likelihood = p(t | x, w, β) = Π_{n=1}^{N} N( t_n | y(x_n, w), β⁻¹ )

  • Slide 39/63

    Maximum Likelihood Estimation (MLE)

    Log likelihood:

    ln p(t | x, w, β) = −(β/2) Σ_{n=1}^{N} { y(x_n, w) − t_n }² + (N/2) ln β − (N/2) ln(2π)

    What does maximizing the log likelihood look similar to?

  • Slide 40/63

    Maximum Likelihood Estimation (MLE)

    Log likelihood:

    ln p(t | x, w, β) = −(β/2) Σ_{n=1}^{N} { y(x_n, w) − t_n }² + (N/2) ln β − (N/2) ln(2π)

    What does maximizing the log likelihood look similar to?

    With respect to w: only the first term depends on w, so maximizing the log likelihood is equivalent to minimizing the sum-of-squares error.
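    The equivalence claimed on this slide can be verified numerically: the least-squares coefficients also maximize the Gaussian log likelihood with respect to w. A minimal sketch (the data and the noise precision β are assumptions for illustration):

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.uniform(0, 1, 30)
    t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 30)

    M, beta = 3, 25.0                       # beta = noise precision (assumed known)
    Phi = np.vander(x, M + 1, increasing=True)

    def log_likelihood(w):
        """ln p(t | x, w, beta) for the Gaussian noise model."""
        n = len(t)
        resid = Phi @ w - t
        return (-beta / 2 * np.sum(resid ** 2)
                + n / 2 * np.log(beta) - n / 2 * np.log(2 * np.pi))

    # Least-squares solution: minimizes the sum-of-squares error
    w_ls, *_ = np.linalg.lstsq(Phi, t, rcond=None)

    # Maximizing the log likelihood w.r.t. w is the same problem,
    # so any other w gives a lower log likelihood than w_ls
    ```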

  • Slide 41/63

    Maximum Posterior (MAP)

    Simpler example: use a Gaussian prior over w of the form

    p(w | α) = N( w | 0, α⁻¹ I )

    With Bayes' rule we can calculate the posterior:

    p(w | x, t, α, β) ∝ p(t | x, w, β) p(w | α)

  • Slide 42/63

    Maximum Posterior (MAP)

    Determine w by maximizing the posterior distribution. This is equivalent to minimizing the negative log of the posterior, combining the Gaussian prior and the log-likelihood function from earlier...

  • Slide 43/63

    Maximum Posterior (MAP)

    Maximizing the posterior is equivalent to finding the minimum of

    (β/2) Σ_{n=1}^{N} { y(x_n, w) − t_n }² + (α/2) wᵀw

    What does this look like?

  • Slide 44/63

    Maximum Posterior (MAP)

    Maximizing the posterior is equivalent to finding the minimum of

    (β/2) Σ_{n=1}^{N} { y(x_n, w) − t_n }² + (α/2) wᵀw

    What does this look like?

    What is the regularization parameter?

  • Slide 45/63

    Maximum Posterior (MAP)

    Maximizing the posterior is equivalent to finding the minimum of

    (β/2) Σ_{n=1}^{N} { y(x_n, w) − t_n }² + (α/2) wᵀw

    What does this look like? The regularized sum-of-squares error.

    What is the regularization parameter? λ = α/β.

  • Slide 46/63

    Bayesian Curve Fitting

    However…

    MLE and MAP are not fully Bayesian because they involve using point estimates for w.

  • Slide 47/63

    Curve Fitting (ver. 2)

    Given training data { x, t } and a new point x, predict the target value t.

    Assume the parameters are fixed.

    Evaluate the predictive distribution: p(t | x, x, t)

  • Slide 48/63

    Bayesian Curve Fitting

    A fully Bayesian approach requires integrating over all values of w, by applying the sum and product rules of probability.

  • Slide 49/63

    Bayesian Curve Fitting

    Marginalization: p(t | x, x, t) = ∫ p(t | x, w) p(w | x, t) dw

    Posterior: p(w | x, t)

    This posterior is Gaussian and can be evaluated analytically (Sec 3.3).

  • Slide 50/63

    Bayesian Curve Fitting

    The predictive distribution is a Gaussian of the form

    p(t | x, x, t) = N( t | m(x), s²(x) )

    with mean m(x) = β φ(x)ᵀ S Σ_n φ(x_n) t_n, variance s²(x) = β⁻¹ + φ(x)ᵀ S φ(x), and matrix S⁻¹ = α I + β Σ_n φ(x_n) φ(x_n)ᵀ.

  • Slide 51/63

    Bayesian Curve Fitting

    Need to define the basis vector φ(x), with elements φ_i(x) = xⁱ for i = 0, …, M.

    The mean and variance depend on x as a result of marginalization (not the case in MLE/MAP).
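    The predictive mean and variance can be sketched directly from these formulas for a polynomial basis (the data and the hyperparameter values α, β are assumptions for illustration):

    ```python
    import numpy as np

    rng = np.random.default_rng(4)
    x = rng.uniform(0, 1, 10)
    t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 10)

    M, alpha, beta = 9, 5e-3, 11.1          # hyperparameters assumed fixed

    def phi(x):
        """Polynomial basis vectors: row n has elements phi_i(x_n) = x_n^i, i = 0..M."""
        return np.power.outer(np.atleast_1d(x), np.arange(M + 1))

    Phi = phi(x)                            # N x (M+1) design matrix
    S_inv = alpha * np.eye(M + 1) + beta * Phi.T @ Phi
    S = np.linalg.inv(S_inv)

    def predictive(x_new):
        """Mean m(x) and variance s^2(x) of the Gaussian predictive distribution."""
        p = phi(x_new)                      # rows are phi(x_new)
        mean = beta * p @ S @ Phi.T @ t     # m(x) = beta * phi(x)^T S sum_n phi(x_n) t_n
        var = 1.0 / beta + np.sum(p @ S * p, axis=1)  # s^2(x) = 1/beta + phi(x)^T S phi(x)
        return mean, var

    m, v = predictive(np.array([0.25, 0.75]))
    ```

    Note the variance always exceeds the noise floor β⁻¹, since S is positive definite; this x-dependent uncertainty is what marginalization adds over MLE/MAP.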

  • Slide 52/63

    MULTIMODAL SENSORY INTEGRATION: Application

  • Slide 53/63

    Two Models: Visual Capture

    (Figure: probability vs. location, with visual (v) and auditory (a) curves)

  • Slide 54/63

    Two Models: Visual Capture vs. MLE

    (Figure: probability vs. location, with visual (v) and auditory (a) curves)

  • Slide 55/63

    Procedure


  • Slide 56/63

    Final Result

    (Figure: probability vs. location, with visual (v) and auditory (a) curves)

  • Slide 57/63

    The Math (MLE Model)

    Likelihood


  • Slide 58/63

    The Math (MLE Model)

    Likelihood
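    The likelihood math on these slides is shown as images; as a sketch, the standard MLE cue-combination result from the multisensory literature (precision-weighted averaging of two Gaussian cues; the specific numbers below are assumptions for illustration) is:

    ```python
    # Standard MLE cue combination: combine visual and auditory Gaussian
    # likelihoods; each cue is weighted by its inverse variance (precision)
    def combine(mu_v, var_v, mu_a, var_a):
        """Precision-weighted average of two Gaussian cues."""
        w_v = (1 / var_v) / (1 / var_v + 1 / var_a)
        w_a = 1 - w_v
        mu = w_v * mu_v + w_a * mu_a
        var = 1 / (1 / var_v + 1 / var_a)   # combined variance < either cue's variance
        return mu, var

    # Reliable visual cue at 0, noisier auditory cue at 2 (illustrative values)
    mu, var = combine(mu_v=0.0, var_v=1.0, mu_a=2.0, var_a=4.0)
    ```

    The combined estimate sits closer to the more reliable (lower-variance) cue, and the combined variance is smaller than either cue alone, which is the signature prediction of the MLE model.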


  • Slide 59/63

    The Math (“Bayesian” Model)

    Likelihood × Prior

    (Priors shown: Uniform and Inverse Gamma)

  • Slide 60/63

    The Math (Empirical)

    Components: logistic function, likelihood, location estimates, weight constraint

  • Slide 61/63

    Final Result

  • Slide 62/63

    QUESTIONS?


  • Slide 63/63

    Two Models (Prediction)

    Visual Capture vs. MLE

    (Figures: visual weight vs. visual noise for each model)