MCMC Algorithms - Saquib


Transcript of MCMC Algorithms - Saquib

  • 7/31/2019 MCMC Algorithms - Saquib

    1/25

MCMC algorithms: Metropolis-Hastings and its variants

    Data Mining Seminar Fall 2012

    Nazmus Saquib

  • 2/25

    Motivation

    Metropolis is among the top 10 algorithms in science and engineering.

    Used in statistics, econometrics, physics, and computing science.

    Example: high-dimensional problems such as computing the volume of a convex body in d dimensions.

  • 3/25

    Motivation

    Normalizing factor in Bayes' theorem: p(θ | x) = p(x | θ) p(θ) / ∫ p(x | θ) p(θ) dθ. The integral in the denominator is typically intractable in high dimensions.

    Statistical mechanics: the partition function Z, a sum over an exponentially large state space, plays the same role.

  • 4/25

    Back to Monte Carlo

    Monte Carlo simulation: draw an i.i.d. set of N samples {x_(i)} from p(x) and approximate expectations by the empirical average (1/N) Σ_i f(x_i).

    The estimate converges almost surely (strong law of large numbers), and the central limit theorem quantifies the error, which shrinks as 1/√N.
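The estimator above can be sketched in a few lines of Python; the integrand, target distribution, and sample size below are illustrative choices, not from the slides:

```python
import random

def mc_estimate(f, sampler, n):
    """Plain Monte Carlo: average f over n i.i.d. draws from sampler()."""
    return sum(f(sampler()) for _ in range(n)) / n

random.seed(0)
# Illustrative example: E[x^2] for x ~ Uniform(0, 1); the exact value is 1/3.
est = mc_estimate(lambda x: x * x, random.random, 100_000)
```

With N = 100,000 draws the CLT predicts an error on the order of 1/√N, so the estimate lands well within 0.01 of 1/3.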

  • 5/25

    Rejection Sampling

    Sample from another, easy-to-use distribution q(x) that satisfies p(x) ≤ M q(x) for some constant M; accept a draw x ~ q with probability p(x) / (M q(x)).
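A minimal sketch of the accept/reject loop; the Beta(2, 2)-shaped target and uniform proposal are illustrative choices, not from the slides:

```python
import random

def rejection_sample(p_tilde, q_sample, q_pdf, M):
    """Draw one sample from p proportional to p_tilde, given p_tilde(x) <= M * q(x)."""
    while True:
        x = q_sample()
        if random.random() * M * q_pdf(x) <= p_tilde(x):
            return x  # accepted with probability p_tilde(x) / (M q(x))

random.seed(1)
# Illustrative target: p(x) proportional to x(1 - x) on [0, 1], i.e. Beta(2, 2);
# proposal q = Uniform(0, 1) with envelope constant M = 0.25.
samples = [rejection_sample(lambda x: x * (1 - x), random.random,
                            lambda x: 1.0, 0.25) for _ in range(20_000)]
mean = sum(samples) / len(samples)  # Beta(2, 2) has mean 1/2
```

Note the loop runs until acceptance, so a loose envelope constant M directly wastes samples: the expected number of trials per accepted draw is M divided by the normalizing constant of p_tilde.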

  • 6/25

    Importance Sampling
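Importance sampling estimates E_p[f(x)] from draws of an easier proposal q(x), weighting each sample by the ratio p(x)/q(x). A minimal sketch, with p, q, and f as illustrative choices:

```python
import random

def importance_estimate(f, p_pdf, q_pdf, q_sample, n):
    """Estimate E_p[f(x)] using n draws from q, each weighted by p(x)/q(x)."""
    total = 0.0
    for _ in range(n):
        x = q_sample()
        total += f(x) * p_pdf(x) / q_pdf(x)
    return total / n

random.seed(2)
# Illustrative choices: p = Beta(2, 2) with pdf 6x(1-x), q = Uniform(0, 1),
# f(x) = x; the exact answer is E_p[x] = 1/2.
est = importance_estimate(lambda x: x, lambda x: 6 * x * (1 - x),
                          lambda x: 1.0, random.random, 50_000)
```

Unlike rejection sampling, no draw is discarded, but the variance of the estimate blows up when q puts little mass where p is large.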

  • 7/25

    Why MCMC?

    Rejection and importance sampling waste resources when the proposal is a poor match for the target: we need to spend more time on the region (e.g., the tail) that overlaps with the event E of interest.

  • 8/25

    MCMC Principles

    Even with adaptation, it is often impossible to obtain proposal distributions that are easy to sample from and good approximations of the target at the same time.

    A Markov chain is used to explore the state space X.

    Transition matrices (kernels) are constructed so that the chain spends more time in the important regions.

  • 9/25

    MCMC Principles

    For any starting point, the chain converges to the invariant distribution p(x), as long as the stochastic transition matrix T is:

    Irreducible: the transition graph should be connected.

    Aperiodic: the chain should not get trapped in cycles.
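The aperiodicity requirement can be seen on a tiny illustrative example (not from the slides): a two-state chain that deterministically swaps states is periodic and never converges, while a "lazy" version with self-loops does:

```python
def step(p, T):
    """One chain step: multiply the row vector p by the transition matrix T."""
    n = len(T)
    return [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]

# Periodic chain: states 0 and 1 swap deterministically, so the
# distribution oscillates forever instead of converging.
T_periodic = [[0.0, 1.0], [1.0, 0.0]]
p = [1.0, 0.0]
seen = []
for _ in range(4):
    p = step(p, T_periodic)
    seen.append(tuple(p))

# Lazy (aperiodic) variant: self-loops break the cycle, and the
# distribution settles at (0.5, 0.5) immediately.
T_lazy = [[0.5, 0.5], [0.5, 0.5]]
q = [1.0, 0.0]
for _ in range(4):
    q = step(q, T_lazy)
```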

  • 10/25

    Detailed Balance (reversibility) Condition

    p(x) T(x, x') = p(x') T(x', x) for all x, x'.

    One way to design an MCMC sampler is to satisfy this condition. However, convergence speed plays a more crucial role in practice.

  • 11/25

    Spectral Theory and Convergence

    (brief review)

    Note that p(x) is the left eigenvector of the transition matrix T with corresponding eigenvalue 1 (Perron-Frobenius theorem): p T = p.

    The remaining eigenvalues have modulus less than 1.

    The second-largest eigenvalue therefore determines the rate of convergence, and should be as small as possible.
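The role of the second eigenvalue can be seen on a small two-state chain where both eigenvalues are known in closed form; the matrix below is an illustrative example, not from the slides:

```python
def step(p, T):
    """One chain step: multiply the row vector p by the transition matrix T."""
    n = len(T)
    return [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]

# Illustrative two-state chain: the eigenvalues of T are 1 and 0.7, and the
# stationary distribution (left eigenvector for eigenvalue 1) is (2/3, 1/3).
T = [[0.9, 0.1],
     [0.2, 0.8]]
p = [1.0, 0.0]       # start entirely in state 0
for _ in range(50):  # the error shrinks like 0.7**t (second eigenvalue)
    p = step(p, T)
```

After 50 steps the distance to the stationary distribution is about 0.7^50, i.e. negligible; a chain with second eigenvalue closer to 1 would need far more steps.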

  • 12/25

    Application: PageRank (Google)

    T = L + E, where L is a large link matrix.

    L_(i,j) = normalized number of links from website i to website j.

    E = a uniform random matrix of small magnitude added to L to ensure irreducibility and aperiodicity (an addition of noise).

    The PageRank vector is the stationary distribution of this chain: p [L + E] = p.

    Transition matrix as kernel: different kernels can be designed to introduce bias etc. and make the results more interesting.
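A sketch of PageRank-style power iteration on a hypothetical three-page web; the link structure and the damping factor 0.85 are illustrative assumptions, not values from the slides:

```python
def pagerank(links, damping=0.85, iters=100):
    """Power iteration for the stationary p of damping*L + (1-damping)*E."""
    n = len(links)
    p = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n  # the uniform "noise" matrix E
        for i, outs in enumerate(links):
            share = damping * p[i] / (len(outs) if outs else n)
            for j in (outs if outs else range(n)):  # dangling pages spread evenly
                new[j] += share
        p = new
    return p

# Hypothetical web: page 0 links to 1 and 2, page 1 to 2, page 2 to 0.
ranks = pagerank([[1, 2], [2], [0]])
```

Page 2, which receives links from both other pages, ends up with the highest rank, and the rank vector stays a probability distribution at every iteration.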

  • 13/25

    Mathematical Representation

    Depending on the kernel, different kinds of Markov chain algorithms are possible.

    The most celebrated is the Metropolis-Hastings algorithm.

  • 14/25

    Metropolis-Hastings Algorithm

  • 15/25

    Metropolis-Hastings Algorithm
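The standard Metropolis-Hastings loop can be sketched as follows; the standard-normal target and Gaussian random-walk proposal are illustrative choices, not from the slides:

```python
import math
import random

def metropolis_hastings(log_p, propose, log_q, x0, n):
    """Generic MH: propose x' ~ q(.|x) and accept with probability
    min(1, p(x') q(x|x') / (p(x) q(x'|x))), computed in log space."""
    x, chain = x0, []
    for _ in range(n):
        x_new = propose(x)
        log_alpha = (log_p(x_new) + log_q(x, x_new)
                     - log_p(x) - log_q(x_new, x))
        if random.random() < math.exp(min(0.0, log_alpha)):
            x = x_new  # accept; otherwise keep the current state
        chain.append(x)
    return chain

random.seed(3)
# Illustrative target: standard normal, via its unnormalized log density.
# The Gaussian random-walk proposal is symmetric, so log_q is constant.
chain = metropolis_hastings(lambda x: -0.5 * x * x,
                            lambda x: x + random.gauss(0.0, 1.0),
                            lambda a, b: 0.0, 0.0, 20_000)
mean = sum(chain) / len(chain)
```

Only the unnormalized log density is needed, since the normalizing constant cancels in log_alpha.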

  • 16/25

    Metropolis-Hastings Algorithm

    (properties)

    Kernel: K(x, x') = q(x' | x) A(x, x') for x' ≠ x, where A(x, x') = min{1, [p(x') q(x | x')] / [p(x) q(x' | x)]} is the acceptance probability.

    Rejection Term: the chain stays at x with probability r(x) = 1 - ∫ q(x' | x) A(x, x') dx'.

    Detailed Balance: p(x) K(x, x') = p(x') K(x', x), so p(x) is the invariant distribution.

  • 17/25

    Independent Sampler Algorithm

    The proposal is independent of the current state: q(x' | x) = q(x').

    The algorithm is close to importance sampling, but now the samples are correlated, since each results from comparing one sample to the previous one.
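A sketch of the independence sampler; the Beta(2, 2) target and uniform proposal are illustrative choices, not from the slides. Note how the acceptance ratio compares the importance weights p/q of the two points:

```python
import random

def independence_sampler(p_pdf, q_sample, q_pdf, x0, n):
    """MH where the proposal ignores the current state: q(x'|x) = q(x').
    The acceptance ratio becomes [p(x') q(x)] / [p(x) q(x')], i.e. the
    ratio of the importance weights w = p/q of the two points."""
    x, chain = x0, []
    for _ in range(n):
        x_new = q_sample()
        ratio = (p_pdf(x_new) * q_pdf(x)) / (p_pdf(x) * q_pdf(x_new))
        if random.random() < min(1.0, ratio):
            x = x_new
        chain.append(x)
    return chain

random.seed(7)
# Illustrative target Beta(2, 2) (pdf 6x(1-x)), proposal Uniform(0, 1).
chain = independence_sampler(lambda x: 6 * x * (1 - x), random.random,
                             lambda x: 1.0, 0.5, 20_000)
mean = sum(chain) / len(chain)
```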

  • 18/25

    Metropolis Algorithm

    Assumes a symmetric random walk proposal: q(x' | x) = q(x | x'), so the acceptance probability simplifies to min{1, p(x') / p(x)}.

  • 19/25

    Metropolis Algorithm

    The normalizing constant of the target distribution is not required (it cancels in the acceptance ratio).

    Parallelization: several independent chains can be simulated in parallel.

    Success or failure depends on the parameters selected for the proposal distribution (e.g., the step size).
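The sensitivity to the proposal parameters can be demonstrated by measuring the acceptance rate of random-walk Metropolis for different step sizes; the standard-normal target and the step sizes below are illustrative choices:

```python
import math
import random

def acceptance_rate(sigma, n=20_000):
    """Random-walk Metropolis on a standard normal target; returns the
    fraction of proposals accepted for step size sigma."""
    x, accepted = 0.0, 0
    for _ in range(n):
        x_new = x + random.gauss(0.0, sigma)
        # Symmetric proposal: the ratio reduces to p(x_new) / p(x).
        if random.random() < math.exp(min(0.0, 0.5 * (x * x - x_new * x_new))):
            x = x_new
            accepted += 1
    return accepted / n

random.seed(4)
tiny = acceptance_rate(0.1)   # tiny steps: almost always accepted, slow mixing
huge = acceptance_rate(50.0)  # huge steps: almost always rejected
```

Neither extreme explores the target efficiently: tiny steps are accepted but barely move, while huge steps are nearly always rejected, so the chain stalls either way.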

  • 20/25

    Metropolis Algorithm

  • 21/25

    Simulated Annealing

    Global optimization: find the mode of p(x).

    It could be estimated by arg max p(x_i) over the samples x_i, i = 1..N.

    This is inefficient, because random samples rarely come from the vicinity of the mode (blind sampling, unless the distribution has large probability mass around the mode).

    Simulated Annealing is a variant of MCMC/Metropolis-Hastings that solves this problem.

  • 22/25

    Simulated Annealing

  • 23/25

    Simulated Annealing
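A minimal simulated annealing sketch in the Metropolis style, with a geometric cooling schedule; the test function and all schedule parameters are illustrative assumptions, not from the slides:

```python
import math
import random

def simulated_annealing(f, x0, n_iters, t0=5.0, cooling=0.999, step=0.5):
    """Metropolis with a decreasing temperature T: uphill moves in f are
    accepted with probability exp(-(f(x') - f(x)) / T), and T -> 0."""
    x, fx = x0, f(x0)
    best, best_f = x, fx
    t = t0
    for _ in range(n_iters):
        x_new = x + random.gauss(0.0, step)
        f_new = f(x_new)
        if f_new < fx or random.random() < math.exp((fx - f_new) / t):
            x, fx = x_new, f_new
            if fx < best_f:
                best, best_f = x, fx
        t *= cooling  # geometric cooling schedule
    return best, best_f

random.seed(5)
# Illustrative test function: a local minimum near x = 1.4 (f about -2.6)
# and the global minimum near x = -1.47 (f about -5.4).
best, best_f = simulated_annealing(lambda x: x**4 - 4 * x**2 + x, 0.0, 20_000)
```

Early on, the high temperature lets the chain cross between basins; as T shrinks, the chain behaves like greedy descent and settles into a minimum.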

  • 24/25

    Other Methods

    Mixture of kernels! Could be very useful when the target distribution has many peaks.

    Can incorporate global proposals to explore vast regions of the state space (the global proposal locks onto peaks).

    Local proposals to discover finer details (explore the space around peaks).
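A sketch of such a mixture kernel: a proposal that usually takes small local steps but occasionally makes a wide global jump, letting the chain hop between the two modes of an illustrative bimodal target (all parameters below are assumptions for the example):

```python
import math
import random

def log_p(x):
    """Unnormalized log density of an equal mixture of N(-5, 1) and N(5, 1)."""
    a, b = -0.5 * (x + 5.0) ** 2, -0.5 * (x - 5.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def mixture_chain(n, p_global=0.2):
    """Metropolis with a mixture proposal: small local steps most of the
    time, occasionally a wide global jump. Since the mixture weights do not
    depend on the current state, the combined proposal is still symmetric,
    so the plain Metropolis acceptance ratio applies."""
    x, chain = -5.0, []
    for _ in range(n):
        sigma = 10.0 if random.random() < p_global else 0.5
        x_new = x + random.gauss(0.0, sigma)
        if random.random() < math.exp(min(0.0, log_p(x_new) - log_p(x))):
            x = x_new
        chain.append(x)
    return chain

random.seed(6)
chain = mixture_chain(50_000)
right = sum(1 for x in chain if x > 0) / len(chain)  # time spent near +5
```

A purely local kernel started at -5 would almost never reach the mode at +5; with the occasional global jump, the chain splits its time roughly evenly between the two modes, as the symmetric target requires.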

  • 25/25

    Gibbs Sampling etc..

    Parasaran..

    Thank you!