QMDA


Transcript of QMDA

  • QMDA Review Session

  • Things you should remember

  • 1. Probability & Statistics

  • the Gaussian or normal distribution

    p(x) = (1/(√(2π) σ)) exp{ -(x - x̄)² / 2σ² }

    with expected value x̄ and variance σ²

  • Properties of the normal distribution

    Expectation = Median = Mode = x̄

    95% of the probability lies within 2σ of the expected value
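A quick numerical check of these properties (a minimal numpy/scipy sketch; the sample size and parameters are arbitrary, not from the slides):

```python
import numpy as np
from scipy import stats

xbar, sigma = 5.0, 2.0
x = np.random.normal(xbar, sigma, size=1_000_000)

print(x.mean(), np.median(x))                    # both approach xbar
print(np.mean(np.abs(x - xbar) < 2 * sigma))     # ~0.95 of samples within 2 sigma
print(stats.norm.cdf(2) - stats.norm.cdf(-2))    # exact value: about 0.954
```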

  • Multivariate Distributions

    The Covariance Matrix, C, is very important

    its elements are C_ij, the covariance of x_i and x_j

    the diagonal elements give the variance of each x_i:

    σ_xi² = C_ii

  • The off-diagonal elements of C indicate whether pairs of x's are correlated. E.g.

    C_12 > 0: positive correlation between x_1 and x_2; C_12 < 0: negative correlation

  • the multivariate normal distribution

    p(x) = (2π)^(-N/2) |C_x|^(-1/2) exp{ -(1/2) (x - x̄)^T C_x^(-1) (x - x̄) }

    has expectation x̄,

    covariance C_x,

    and is normalized to unit area

  • if y is linearly related to x, y = Mx, then

    ȳ = M x̄ (rule for means)

    C_y = M C_x M^T (rule for propagating error)

    These rules work regardless of the distribution of x
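A minimal numpy sketch verifying both rules by Monte Carlo (the particular M and C_x are arbitrary examples):

```python
import numpy as np

Cx = np.array([[2.0, 0.5],
               [0.5, 1.0]])
M = np.array([[1.0, 2.0],
              [0.0, 3.0]])

Cy_rule = M @ Cx @ M.T                            # rule for propagating error

x = np.random.multivariate_normal([0.0, 0.0], Cx, size=500_000)
y = x @ M.T                                       # y = M x, applied row-wise
print(Cy_rule)
print(np.cov(y, rowvar=False))                    # approaches Cy_rule
```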

  • 2. Least Squares

  • Simple Least Squares

    Linear relationship between data, d, and model, m:

    d = Gm

    Minimize the prediction error E = e^T e with e = d_obs - Gm:

    m_est = [G^T G]^(-1) G^T d

    If the data are uncorrelated with variance σ_d², then

    C_m = σ_d² [G^T G]^(-1)
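A minimal numpy sketch of simple least squares on synthetic straight-line data (the example and variable names are mine, not from the slides):

```python
import numpy as np

t = np.linspace(0.0, 10.0, 50)
sigma_d = 0.5
d = 1.0 + 2.0 * t + np.random.normal(0.0, sigma_d, t.size)   # synthetic line + noise

G = np.column_stack([np.ones_like(t), t])    # data kernel for d = m1 + m2*t
mest = np.linalg.solve(G.T @ G, G.T @ d)     # [G^T G]^(-1) G^T d
Cm = sigma_d**2 * np.linalg.inv(G.T @ G)     # covariance of the estimates

print(mest)                  # near the true [1, 2]
print(np.sqrt(np.diag(Cm)))  # standard errors of m1 and m2
```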

  • Least Squares with prior constraints

    Given data, uncorrelated with variance σ_d², that satisfy a linear relationship d = Gm,

    and prior information, with variance σ_m², that satisfies a linear relationship h = Dm,

    the best estimate of the model parameters, m_est, is the least-squares solution of the augmented system

    [ G  ]      [ d  ]
    [ εD ] m =  [ εh ]     with ε = σ_d/σ_m

    Previously, we discussed only the special case h = 0.
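A sketch of the augmented system in numpy, assuming the ε = σ_d/σ_m weighting above (the helper name lsq_with_prior and the toy numbers are mine):

```python
import numpy as np

def lsq_with_prior(G, d, D, h, sigma_d, sigma_m):
    """Solve [G; eps*D] m = [d; eps*h] by least squares, with eps = sigma_d/sigma_m."""
    eps = sigma_d / sigma_m
    A = np.vstack([G, eps * D])
    b = np.concatenate([d, eps * h])
    return np.linalg.solve(A.T @ A, A.T @ b)

# toy example: three data equations plus the prior that m1 - m2 = 0
G = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
d = np.array([1.0, 3.0, 4.2])
D = np.array([[1.0, -1.0]])
h = np.zeros(1)
print(lsq_with_prior(G, d, D, h, sigma_d=0.1, sigma_m=1.0))
```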

  • Newton's Method for Non-Linear Least-Squares Problems

    Given data that satisfy a non-linear relationship d = g(m),

    guess a solution m^(k) with k = 0 and linearize around it:

    Δm = m - m^(k) and Δd = d - g(m^(k)), with Δd = G Δm

    where G_ij = ∂g_i/∂m_j evaluated at m^(k)

    Then iterate, m^(k+1) = m^(k) + Δm with Δm = [G^T G]^(-1) G^T Δd,

    hoping for convergence
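A minimal numpy sketch of this iteration; the exponential-decay example, and all names, are illustrative assumptions:

```python
import numpy as np

def newton_lsq(g, jac, d, m0, niter=20):
    """Iterate m(k+1) = m(k) + [G^T G]^(-1) G^T (d - g(m(k))), G = jac(m(k))."""
    m = np.asarray(m0, dtype=float)
    for _ in range(niter):
        G = jac(m)                       # G_ij = dg_i/dm_j at the current m
        dd = d - g(m)                    # data misfit, Delta d
        m = m + np.linalg.solve(G.T @ G, G.T @ dd)
    return m

# example: fit d_i = m1 * exp(m2 * t_i)
t = np.linspace(0.0, 1.0, 20)
d = 2.0 * np.exp(-1.5 * t)
g = lambda m: m[0] * np.exp(m[1] * t)
jac = lambda m: np.column_stack([np.exp(m[1] * t), m[0] * t * np.exp(m[1] * t)])
print(newton_lsq(g, jac, d, m0=[1.0, -1.0]))   # converges near [2, -1.5]
```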

  • 3. Bootstraps

  • Investigate the statistics of y by

    creating many datasets y′ and examining their statistics

    each y′ is created through random sampling, with replacement, of the original dataset y

  • Example: statistics of the mean of y, given N data

    Draw N random integers, uniformly distributed between 1 and N, and use them to select N resampled data from the original dataset (the same datum may be selected more than once)

    Compute the estimate, here the mean N^(-1) Σ_i y_i

    Now repeat a gazillion times and examine the resulting distribution of estimates
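A minimal numpy sketch of this recipe for the mean (the helper name and sample data are mine):

```python
import numpy as np

def bootstrap_means(y, nreps=10_000):
    """Resample y with replacement nreps times; return each resample's mean."""
    N = len(y)
    idx = np.random.randint(0, N, size=(nreps, N))   # random integers in 1..N (0-based here)
    return y[idx].mean(axis=1)

y = np.random.normal(10.0, 2.0, size=100)
est = bootstrap_means(y)
print(est.mean(), est.std())         # bootstrap distribution of the mean
print(y.std() / np.sqrt(len(y)))     # compare: textbook standard error of the mean
```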

  • 4. Interpolation and Splines

  • linear splines

    in the interval (x_i, x_(i+1)): y(x) = y_i + (y_(i+1) - y_i)(x - x_i)/(x_(i+1) - x_i)

    the 1st derivative is discontinuous at the nodes

  • cubic splines

    a cubic, a + bx + cx² + dx³, in this interval, and a different cubic in the next interval

    the 1st and 2nd derivatives are continuous at the nodes
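A sketch of both spline types using scipy's interpolators (the sample knots are arbitrary):

```python
import numpy as np
from scipy.interpolate import CubicSpline, interp1d

xi = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
yi = np.array([0.0, 0.8, 0.9, 0.1, -0.8])

linear = interp1d(xi, yi)      # linear spline: 1st derivative jumps at the nodes
cubic = CubicSpline(xi, yi)    # cubic spline: 1st and 2nd derivatives continuous

x = np.linspace(0.0, 4.0, 9)
print(linear(x))
print(cubic(x))
print(cubic(x, 1))             # the spline's 1st derivative, evaluated at x
```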

  • 5. Hypothesis Testing

  • The Null Hypothesis

    always a variant of this theme:

    the results of an experiment differ from the expected value only because of random variation

  • Test of Significance of Results

    say, to 95% significance:

    the Null Hypothesis would generate the observed result less than 5% of the time

  • Four important distributions

    Normal distribution: distribution of x_i

    Chi-squared distribution: distribution of χ² = Σ_(i=1)^N x_i²

    Student's t-distribution: distribution of t = x_0 / √( N^(-1) Σ_(i=1)^N x_i² )

    F-distribution: distribution of F = { N^(-1) Σ_(i=1)^N x_i² } / { M^(-1) Σ_(i=1)^M x_(N+i)² }

  • 5 tests

    μ_obs = μ_prior, when μ_prior and σ_prior are known: normal distribution

    σ_obs = σ_prior, when μ_prior and σ_prior are known: chi-squared distribution

    μ_obs = μ_prior, when μ_prior is known but σ_prior is unknown: t-distribution

    σ1_obs = σ2_obs, when μ1_prior and μ2_prior are known: F-distribution

    μ1_obs = μ2_obs, when σ1_prior and σ2_prior are unknown: modified t-distribution
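As an example, the third test (is μ_obs = μ_prior when σ_prior is unknown?) can be run with scipy's one-sample t test; the data here are synthetic:

```python
import numpy as np
from scipy import stats

# null hypothesis: the observed mean equals mu_prior; sigma is unknown
mu_prior = 5.0
d = np.random.normal(5.3, 1.0, size=25)        # synthetic data

t_stat, p_value = stats.ttest_1samp(d, mu_prior)
print(t_stat, p_value)
print("reject at 95%" if p_value < 0.05 else "cannot reject")
```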

  • 6. filters

  • Filtering operation g(t) = f(t)*h(t)

    convolution:

    g(t) = ∫_(-∞)^(t) f(t-τ) h(τ) dτ    g_k = Δt Σ_(p=-∞)^(k) f_(k-p) h_p

    or alternatively:

    g(t) = ∫_(0)^(∞) f(τ) h(t-τ) dτ    g_k = Δt Σ_(p=0)^(∞) f_p h_(k-p)

  • How to do convolution by hand

    Given x = [x_0, x_1, x_2, x_3, x_4, ...]^T and y = [y_0, y_1, y_2, y_3, y_4, ...]^T

    Reverse one time-series, line the two up, and multiply overlapping terms:

    [x*y]_1 = x_0 y_0 (the first element of x*y)

    Then slide the reversed series over by one, multiply overlapping terms, and add:

    [x*y]_2 = x_0 y_1 + x_1 y_0 (the second element of x*y)

    And so on.
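The same recipe is what np.convolve computes; a quick check on a small example:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

# first element x0*y0 = 4, second x0*y1 + x1*y0 = 13, ...
print(np.convolve(x, y))     # [ 4. 13. 28. 27. 18.]
```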

  • Matrix formulations of g(t) = f(t)*h(t):

    g = F h

    and

    g = H f

  • Least-squares equation for g = H f:  [H^T H] f = H^T g

    [ A(0)  A(1)   A(2)   ... ] [ f_0 ]   [ X(0) ]
    [ A(1)  A(0)   A(1)   ... ] [ f_1 ]   [ X(1) ]
    [ A(2)  A(1)   A(0)   ... ] [  .  ] = [ X(2) ]
    [  .     .      .         ] [  .  ]   [  .   ]
    [ A(N)  A(N-1) A(N-2) ... ] [ f_N ]   [ X(N) ]

    [H^T H] is the autocorrelation, A, of h; [H^T g] is the cross-correlation, X, of h and g
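A minimal numpy sketch of this least-squares deconvolution (the filter h and signal f_true are arbitrary examples):

```python
import numpy as np

# recover f from g = h * f by solving the least-squares equation [H^T H] f = H^T g
h = np.array([1.0, 0.5, 0.25])
f_true = np.array([0.0, 1.0, 0.0, -1.0, 0.0])
g = np.convolve(h, f_true)               # data: g = H f

# build the convolution matrix H (columns are shifted copies of h)
N = len(f_true)
H = np.zeros((len(g), N))
for j in range(N):
    H[j:j + len(h), j] = h

print(H.T @ H)                            # Toeplitz: built from the autocorrelation of h
print(H.T @ g)                            # cross-correlation of h and g
f_est = np.linalg.solve(H.T @ H, H.T @ g)
print(f_est)                              # recovers f_true
```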

  • A_i and X_i

    Auto-correlation of a time-series, T(t):

    A(τ) = ∫_(-∞)^(+∞) T(t) T(t-τ) dt    A_i = Σ_j T_j T_(j-i)

    Cross-correlation of two time-series, T^(1)(t) and T^(2)(t):

    X(τ) = ∫_(-∞)^(+∞) T^(1)(t) T^(2)(t-τ) dt    X_i = Σ_j T^(1)_j T^(2)_(j-i)
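A minimal numpy sketch of both sums using np.correlate (the two short series are arbitrary):

```python
import numpy as np

T1 = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
T2 = np.array([1.0, 2.0, 1.0, 0.0, 0.0])

A = np.correlate(T1, T1, mode='full')    # autocorrelation, lags -(N-1) .. +(N-1)
X = np.correlate(T1, T2, mode='full')    # cross-correlation of T1 and T2
lags = np.arange(-(len(T1) - 1), len(T1))
print(lags)
print(A)    # symmetric, peaks at zero lag
print(X)    # peaks at the lag that best aligns T1 with T2
```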

  • 7. fourier transforms and spectra

  • Integral transforms:

    C(ω) = ∫_(-∞)^(+∞) T(t) exp(-iωt) dt

    T(t) = (1/2π) ∫_(-∞)^(+∞) C(ω) exp(+iωt) dω

    Discrete transforms (DFT):

    C_k = Σ_(n=0)^(N-1) T_n exp(-2πikn/N), with k = 0, ..., N-1

    T_n = N^(-1) Σ_(k=0)^(N-1) C_k exp(+2πikn/N), with n = 0, ..., N-1

    Frequency step: Δω Δt = 2π/N

    Maximum (Nyquist) frequency: f_max = 1/(2Δt), i.e. ω_max = π/Δt
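A minimal numpy sketch of these DFT conventions (the 12.5 Hz sinusoid is an arbitrary example):

```python
import numpy as np

N, dt = 256, 0.01
t = np.arange(N) * dt
T = np.sin(2 * np.pi * 12.5 * t)            # a 12.5 Hz sinusoid

C = np.fft.fft(T)                           # C_k = sum_n T_n exp(-2*pi*i*k*n/N)
f = np.fft.fftfreq(N, dt)                   # frequency (in Hz) of each coefficient
print(f[np.argmax(np.abs(C[:N // 2]))])     # peak at 12.5 Hz
print(1.0 / (2 * dt))                       # Nyquist frequency f_max = 50 Hz
```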

  • Aliasing and cyclicity

    in a digital world, ω_(n+N) = ω_n

    and, since time and frequency play symmetrical roles in exp(-iωt),

    t_(k+N) = t_k

  • One FFT that you should know:

    C(ω) = ∫_(-∞)^(+∞) δ(t) exp(-iωt) dt = exp(0) = 1

    The FFT of a spike at t = 0 is a constant

  • Error Estimates for the DFT

    Assume uncorrelated, normally-distributed data, d_n = T_n, with variance σ_d².

    The matrix G in Gm = d is G_nk = N^(-1) exp(+2πikn/N).

    The problem Gm = d is linear, so the unknowns, m_k = C_k (the coefficients of the complex exponentials), are also normally-distributed.

    Since the exponentials are orthogonal, G^H G = N^(-1) I is diagonal, and C_m = σ_d² [G^H G]^(-1) = N σ_d² I is diagonal, too.

    Apportioning variance equally between the real and imaginary parts of each C_k, each part has variance σ² = N σ_d²/2.

    The spectrum s_m² = C_r² + C_i² (squared real plus squared imaginary part) is the sum of the squares of two uncorrelated, normally-distributed random variables and is thus χ²_2-distributed.

    The 95% value of χ²_2 is about 5.9, so to be significant a peak must exceed 5.9 N σ_d²/2.
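The 5.9 comes from the chi-squared distribution with 2 degrees of freedom; a one-line check with scipy:

```python
from scipy import stats

print(stats.chi2.ppf(0.95, df=2))   # about 5.99, the "5.9" quoted above
```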

  • Convolution Theorem

    transform[ f(t)*g(t) ] = transform[ f(t) ] × transform[ g(t) ]

  • Power spectrum of a stationary time-series

    T(t) = stationary time series

    C(ω) = ∫_(-T/2)^(+T/2) T(t) exp(-iωt) dt

    S(ω) = lim_(T→∞) T^(-1) |C(ω)|²

    S(ω) is called the power spectral density: the spectrum normalized by the length of the time series.

  • Relationship of power spectral density to the DFT

    To compute the Fourier transform, C(ω), you multiply the DFT coefficients, C_k, by Δt.

    So to get the power spectral density:

    T^(-1) |C(ω)|² = (NΔt)^(-1) |Δt C_k|² = (Δt/N) |C_k|²

    You multiply the DFT spectrum, |C_k|², by Δt/N.
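A minimal numpy sketch of this scaling; for unit-variance white noise the two-sided density should come out near Δt (a standard result, not from the slides):

```python
import numpy as np

N, dt = 1024, 0.01
T = np.random.normal(0.0, 1.0, N)        # unit-variance white noise

Ck = np.fft.fft(T)                       # DFT coefficients
S = (dt / N) * np.abs(Ck) ** 2           # power spectral density: (dt/N) |C_k|^2
print(S.mean())                          # ~ dt = 0.01 for unit-variance white noise
```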

  • Windowed Time-series

    The Fourier transform of a windowed time-series

    is the Fourier transform of the long time-series

    convolved with the Fourier transform of the windowing function

  • Window Functions

    Boxcar: its Fourier transform is a sinc function, which has a narrow central peak but large side lobes

    Hanning (cosine) taper: its Fourier transform has a somewhat wider central peak but much smaller side lobes
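A minimal numpy sketch comparing the two windows' transforms (the window length and zero-padding are arbitrary):

```python
import numpy as np

N = 64
windows = {'boxcar': np.ones(N), 'hanning': np.hanning(N)}

for name, w in windows.items():
    W = np.abs(np.fft.fft(w, 4096))      # zero-padded transform, densely sampled
    W /= W[0]                            # normalize the central peak to 1
    # amplitude well outside the central lobe: large for the boxcar, small for Hanning
    print(name, W[150:2048].max())
```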

  • 8. EOFs and factor analysis

  • Representation of samples as a linear mixing of factors

    S = C F

    S: the N×M matrix of samples (the amount of each element A, B, C, ... in each sample s_1, ..., s_N)

    C: the N×M matrix of coefficients (the amount of each factor f_1, ..., f_M in each sample)

    F: the M×M matrix of factors (the amount of each element in each factor)

  • Data approximated with only the most important factors

    S ≈ C_selected F_selected

    C_selected: the N×p selected coefficients; F_selected: the p×M selected factors (e.g., ignore f_3)

    the p most important factors = those with the biggest coefficients

  • Singular Value Decomposition (SVD)

    Any N×M matrix S can be written as the product of three matrices:

    S = U Λ V^T

    where U is N×N and satisfies U^T U = U U^T = I,

    V is M×M and satisfies V^T V = V V^T = I,

    and Λ is an N×M diagonal matrix of singular values, λ_i

  • SVD decomposition of S

    S = U Λ V^T

    write it as

    S = U Λ V^T = [U Λ] [V^T] = C F

    So the coefficients are C = U Λ

    and the factors are

    F = V^T

    The factors with the biggest λ_i are the most important
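A minimal numpy sketch of this factorization (the 4×3 sample matrix is an arbitrary, nearly rank-1 example):

```python
import numpy as np

# samples-by-elements matrix S (here N=4 samples of M=3 elements)
S = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.1, 6.0],
              [0.5, 1.0, 1.6],
              [3.0, 6.2, 9.1]])

U, lam, VT = np.linalg.svd(S, full_matrices=False)
C = U * lam                          # coefficients C = U Lambda
F = VT                               # factors F = V^T
print(lam)                           # one singular value dominates

p = 1                                # keep only the p most important factors
S_approx = C[:, :p] @ F[:p, :]
print(np.abs(S - S_approx).max())    # small approximation error
```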

  • Transformations of Factors

    If you choose the p most important factors, they define both a subspace in which the samples must lie and a set of coordinate axes for that subspace. The choice of axes is not unique, and could be changed through a transformation, T:

    F_new = T F_old

    A requirement is that T^(-1) exists, else F_new will not span the same subspace as F_old

    S = C F = C I F = (C T^(-1)) (T F) = C_new F_new

    So you could try to implement the desirable factors by designing an appropriate transformation matrix, T

  • 9. Metropolis Algorithm and Simulated Annealing

  • Metropolis Algorithm

    a method to generate a vector x of realizations of the distribution p(x)

  • The process is iterative

    start with an x, say x^(i)

    then randomly generate another x in its neighborhood, say x^(i+1), using a distribution Q(x^(i+1)|x^(i))

    then test whether you will accept the new x^(i+1)

    if it passes, you append x^(i+1) to the vector x that you are accumulating

    if it fails, you append x^(i) again

  • a reasonable choice for Q(x^(i+1)|x^(i)): a normal distribution with mean x^(i) and a variance σ_x² that quantifies the sense of "neighborhood"

    The acceptance test is as follows: first compute the quantity

    α = p(x^(i+1)) / p(x^(i))

    If α ≥ 1, always accept x^(i+1)

    If α < 1, accept x^(i+1) with probability α
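A minimal numpy sketch of the algorithm as described (the target density and tuning values are arbitrary examples):

```python
import numpy as np

def metropolis(p, x0, sigma_x, nsamples):
    """Generate realizations of p(x); Q(x'|x) is normal with mean x, std sigma_x."""
    x, out = x0, []
    for _ in range(nsamples):
        x_new = np.random.normal(x, sigma_x)   # propose in the neighborhood of x
        a = p(x_new) / p(x)                    # the acceptance quantity alpha
        if a >= 1 or np.random.rand() < a:     # accept always if a>=1, else with prob. a
            x = x_new                          # pass: keep the new x
        out.append(x)                          # fail: the old x is appended again
    return np.array(out)

# example: sample a unit normal; p need not be normalized
samples = metropolis(lambda x: np.exp(-0.5 * x**2), x0=0.0, sigma_x=1.0, nsamples=50_000)
print(samples.mean(), samples.std())           # approach 0 and 1
```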

  • Simulated Annealing

    Application of Metropolis to Non-linear optimization

    find m that minimizes E(m) = e^T e, where e = d_obs - g(m)

  • Based on using the Boltzmann distribution for p(m) in the Metropolis Algorithm:

    p(m) = exp{ -E(m)/T }

    where temperature, T, is slowly decreased during the iterations
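A minimal numpy sketch, assuming an exponential cooling schedule (one common choice; the slides do not specify one) and a toy g(m):

```python
import numpy as np

def simulated_annealing(E, m0, sigma_m, nsteps=20_000, T0=1.0, Tend=1e-3):
    """Metropolis on p(m) = exp(-E(m)/T) while the temperature T is slowly lowered."""
    m = np.asarray(m0, dtype=float)
    best = m.copy()
    for k in range(nsteps):
        T = T0 * (Tend / T0) ** (k / nsteps)       # exponential cooling schedule
        m_new = m + np.random.normal(0.0, sigma_m, m.shape)
        # accept with probability min(1, exp(-(E_new - E_old)/T))
        if np.random.rand() < np.exp(min(0.0, -(E(m_new) - E(m)) / T)):
            m = m_new
            if E(m) < E(best):
                best = m.copy()
    return best

# toy example: d_obs = g(m) = [m0^2 + m1, m0 - m1]
dobs = np.array([2.0, 0.0])
g = lambda m: np.array([m[0]**2 + m[1], m[0] - m[1]])
E = lambda m: np.sum((dobs - g(m))**2)
print(simulated_annealing(E, m0=[0.0, 0.0], sigma_m=0.3))   # near [1, 1] or [-2, -2]
```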

  • 10. Some final words

  • Start Simple!

    Examine a small subset of your data and look it over carefully

    Build processing scripts incrementally, checking intermediate results at each stage

    Make lots of plots and look them over carefully

    Do reality checks