1 MAXIMUM LIKELIHOOD ESTIMATION Recall general discussion on Estimation, definition of Likelihood...


MAXIMUM LIKELIHOOD ESTIMATION

• Recall the general discussion on Estimation and the definition of the Likelihood function $L(\theta \mid x)$ for a vector of parameters $\theta$ and a set of values $x$. Finding the most likely value of $\theta$ amounts to maximising the Likelihood function. Also defined were the Log-likelihood (Support function $S(\theta)$) and its derivative, the Score, together with the Information content per observation, which for a single-parameter likelihood is given by

$$I(\theta) = E\left[\left(\frac{\partial}{\partial\theta}\log L(\theta \mid x)\right)^{2}\right] = -E\left[\frac{\partial^{2}}{\partial\theta^{2}}\log L(\theta \mid x)\right]$$

• Why bother with MLE? (Need knowledge of underlying distribution)

Consistency; sufficiency; asymptotic efficiency (linked to variance); unique maximum; invariance property and, as a consequence, most convenient parameterisation; usually MVUE; conventional optimisation methods.


Estimator Comparison in brief.

• Classical - uses objective probabilities, intuitive estimators, additional assumptions for sampling distributions, good properties for some estimators. (See LSE)

• Moment - less calculation, loss of efficiency. Not that widely used in genomic analysis, even though moment estimators usually have analytical solutions and low bias, because of poorer asymptotic properties; even simple solutions may not be unique.

• Bayesian - subjective prior knowledge plus sample info.; close to MLE under certain conditions - see earlier.

• LSE - if assumptions OK, the $\hat{\theta}$'s are unbiased and variances are obtained from $(X^{T}X)^{-1}$. Assumptions needed on distributions of response variables are just expectations and variance-covariance structure (unlike MLE, where the joint prob. distribution of the variables must be specified). But additional assumptions are needed for sampling distns. Some computational advantage. Close to MLE if assumptions met, e.g. in "Likelihood form", LSE conditions give

$$\hat{\theta} \sim N\left(\theta,\ I^{-1}(\theta)\right), \qquad I = \frac{1}{\sigma^{2}}X^{T}X$$


VARIANCE, BIAS and CONFIDENCE INTERVALS

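• Variance of an Estimator - usual form

$$\sigma^{2}_{\hat{\theta}} = E\{[\hat{\theta} - E(\hat{\theta})]^{2}\}$$

or, for $k$ independent estimates,

$$\hat{\sigma}^{2}_{\hat{\theta}} = \frac{1}{k-1}\left[\sum_{i=1}^{k}\hat{\theta}_i^{2} - \frac{1}{k}\left(\sum_{i=1}^{k}\hat{\theta}_i\right)^{2}\right]$$

• For a large sample, the variance of an MLE can be approximated by

$$\sigma^{2}_{\hat{\theta}} \approx \frac{1}{nI(\theta)}$$

It can also be estimated empirically, using re-sampling techniques.

• Variance of a linear function of several estimates - common in statistical genomics, see earlier.

• Recall the Bias of the Estimator, $E(\hat{\theta}) - \theta$; then the Mean Square Error is defined to be

$$MSE = E\{(\hat{\theta} - \theta)^{2}\}$$

which expands to

$$E\{[\hat{\theta} - E(\hat{\theta})] + [E(\hat{\theta}) - \theta]\}^{2} = \sigma^{2}_{\hat{\theta}} + [E(\hat{\theta}) - \theta]^{2}$$

i.e. variance plus squared bias (the cross term has zero expectation), so we have the basis for C.I. and tests of hypothesis.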


COMMONLY-USED METHODS of obtaining MLE

• Analytical - solving $dL/d\theta = 0$ or $dS/d\theta = 0$ when simple solutions exist

• Grid search or likelihood profile approach

• Newton-Raphson iteration methods

• EM (expectation and maximisation) algorithm

N.B. Work with the Log-likelihood, because:

- its maximum is at the same value of $\theta$ as that of the Likelihood

- it is easier to compute

- there is a close relationship between the statistical properties of the MLE and the Log-likelihood


METHODS in brief

Analytical: recall the Binomial example earlier

$$Score = \frac{dS}{d\theta} = \frac{x}{\theta} - \frac{n - x}{1 - \theta} = 0 \quad\Rightarrow\quad \hat{\theta} = \frac{x}{n}$$

• Example: For the Normal, the MLEs of the mean and variance (taking derivatives w.r.t. mean and variance separately) are the sample mean and the actual variance (i.e. divisor $N$). The variance MLE is unbiased if the mean is known, biased if not.

• Invariance: One-to-one relationships preserved

• Used: when MLE has a simple solution


Methods for MLE's contd.

Grid Search: MLE obtained from plots of likelihood / log-likelihood vs parameter.

• Relative Likelihood = Likelihood / Max. Likelihood (so that the maximum is set = 1). The peak of the R.L. can be identified visually or by a searching algorithm. E.g. suppose

$$S(\theta) = \log[\theta^{20}(1-\theta)^{80} + \theta^{80}(1-\theta)^{20}]$$

- Plotting the likelihood over the parameter space range $0 \le \theta \le 1$ gives 2 peaks, symmetrical around $\theta = 0.5$: the likelihood profile for the well-known mixed linkage phase problem in linkage analysis. If we constrain $0 \le \theta \le 0.5$, the MLE $\hat{\theta} = 0.2$ = R.F. between genes (possible mixed linkage phase).

• Graphic/numerical implementation - start from an initial estimate of $\theta$; the direction of search is determined by evaluating the likelihood on both sides of $\theta$. The search takes the direction of increase. Initial search increments are large, e.g. 0.1; when the likelihood starts to decrease, stop and refine the increment.

• Multiple peaks - can miss the global maximum; computationally intensive

• Multiple Parameters - grid search still possible, but interpretation of Likelihood profiles can be difficult.
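The coarse-then-refined search described above can be sketched in a few lines of Python. This is only an illustration for the mixed linkage phase support function; the helper names (`support`, `grid_search`) and the particular increments (0.1, then 0.001) are assumptions made for the sketch, not part of the lecture.

```python
import math

def support(theta):
    """S(theta) = log[theta^20 (1-theta)^80 + theta^80 (1-theta)^20]."""
    return math.log(theta**20 * (1 - theta)**80 + theta**80 * (1 - theta)**20)

def grid_search(f, lo, hi, step):
    """Evaluate f on an evenly spaced grid over [lo, hi]; return the best grid point."""
    n_steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n_steps + 1)]
    return max(grid, key=f)

# Coarse pass over the constrained range 0 < theta <= 0.5 (increment 0.1) ...
coarse = grid_search(support, 0.1, 0.5, 0.1)
# ... then a refined pass around the coarse maximum (increment 0.001).
theta_hat = grid_search(support, max(coarse - 0.1, 0.001), min(coarse + 0.1, 0.5), 0.001)

# Relative likelihood (maximum scaled to 1) at a few parameter values.
s_max = support(theta_hat)
relative = {t: math.exp(support(t) - s_max) for t in (0.1, 0.2, 0.3, 0.8)}
print(round(theta_hat, 3), relative)
```

The relative likelihood at 0.8 comes out equal to that at 0.2, which is the second, symmetric peak that the constraint $\theta \le 0.5$ excludes.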


Example

• Recall Exercises 2, ex. 8. Data used to show a linkage relationship between a marker and a "rust-resistant" gene. Escapes = individuals who are susceptible, but show no disease (rust) phenotype under experimental conditions. So define $\alpha, \theta$ as the proportion of escapes and the R.F. respectively. $1 - \alpha$ is the penetrance for the disease trait, i.e. P{individual with susceptible genotype has disease phenotype}. The purpose of this type of experiment is typically to estimate the R.F. between marker and gene.

• Support function

$$S(\alpha,\theta) = 168\log(1-\theta+\alpha\theta) + 3\log(\theta-\alpha\theta) + 52\log(\theta+\alpha-\alpha\theta) + 163\log(1-\theta-\alpha+\alpha\theta)$$

Setting the first derivatives w.r.t. $\alpha$ and $\theta$ = 0 gives no simple analytical solution.

• Using grid search, the likelihood reaches its maximum at $\hat{\alpha} = 0.22$, $\hat{\theta} = 0.02$.

• In general, this type of experiment tests H0: Independence between marker and gene $(\theta = 0.5)$ and no escapes $(\alpha = 0)$ using Likelihood Ratio Test statistics.

• N.B.: for Moment estimates (ex. 7), a pair of equations linking $\hat{\alpha}$, $\hat{\theta}$ and the class counts $c$ is solved instead; the result is not the same as the MLE.
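A two-parameter grid search for this example might look like the sketch below. The four class probabilities are an assumption: they are written for a backcross-type model in which susceptible genotypes escape with probability $\alpha$ and marker and gene recombine with probability $\theta$, matched to the counts 168, 3, 52, 163. The function names and grid increments are likewise illustrative.

```python
import math

COUNTS = (168, 3, 52, 163)  # class counts appearing in the support function

def support(alpha, theta):
    """S(alpha, theta) under the assumed four-class model (common factor 0.5 dropped)."""
    probs = (
        1 - theta + alpha * theta,          # class with 168 individuals
        theta - alpha * theta,              # class with 3 individuals
        theta + alpha - alpha * theta,      # class with 52 individuals
        1 - theta - alpha + alpha * theta,  # class with 163 individuals
    )
    return sum(c * math.log(p) for c, p in zip(COUNTS, probs))

def grid_max(a_lo, a_hi, t_lo, t_hi, step):
    """Return the (alpha, theta) grid point with the largest support."""
    n_a = int(round((a_hi - a_lo) / step))
    n_t = int(round((t_hi - t_lo) / step))
    points = ((a_lo + i * step, t_lo + j * step)
              for i in range(n_a + 1) for j in range(n_t + 1))
    return max(points, key=lambda p: support(*p))

# Coarse pass (increment 0.01), then a refined pass (increment 0.001) around it.
a0, t0 = grid_max(0.01, 0.99, 0.01, 0.50, 0.01)
alpha_hat, theta_hat = grid_max(max(a0 - 0.01, 0.001), min(a0 + 0.01, 0.999),
                                max(t0 - 0.01, 0.001), min(t0 + 0.01, 0.5), 0.001)
print(round(alpha_hat, 2), round(theta_hat, 2))
```

Under these assumed class probabilities the refined search lands on the same two-decimal estimates as quoted in the example; a different parameterisation of the classes would need its own `probs` tuple.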


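Methods contd.

Newton-Raphson Iteration

Have Score($\theta$) = 0 from previously. N-R consists of replacing the Score by the linear terms of its Taylor expansion, so if $\theta''$ is a solution and $\theta'$ a first guess,

$$\frac{dS(\theta'')}{d\theta} \approx \frac{dS(\theta')}{d\theta} + (\theta'' - \theta')\frac{d^{2}S(\theta')}{d\theta^{2}} = 0$$

$$\Rightarrow\quad \theta'' = \theta' - \frac{dS(\theta')/d\theta}{d^{2}S(\theta')/d\theta^{2}}$$

Repeat with $\theta''$ replacing $\theta'$.

Each iteration fits a parabola to the L.F.

• Problems - Multiple peaks, zero Information, extreme estimates

• Multiple parameters - matrix notation, where the $S$ matrix, for example, has elements = derivatives of $S(\theta_1, \theta_2)$ w.r.t. $\theta_1$ and $\theta_2$ respectively. Similarly, the Information matrix has terms of the form

$$-E\left[\frac{\partial^{2}S}{\partial\theta_1^{2}}\right],\quad -E\left[\frac{\partial^{2}S}{\partial\theta_1\,\partial\theta_2}\right],\ \text{etc.}$$

Estimates are

$$\theta'' = \theta' + N^{-1}I^{-1}S$$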
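As a concrete illustration of the single-parameter iteration, the sketch below applies Newton-Raphson to the Binomial score from the analytical example, where the answer $x/n$ is known in advance. The data ($x = 20$, $n = 100$), the starting value and the tolerance are hypothetical choices for the sketch.

```python
def score(theta, x, n):
    """Binomial score dS/dtheta."""
    return x / theta - (n - x) / (1 - theta)

def score_derivative(theta, x, n):
    """Second derivative d2S/dtheta2 (always negative on (0, 1))."""
    return -x / theta**2 - (n - x) / (1 - theta)**2

def newton_raphson(x, n, theta=0.1, tol=1e-8, max_iter=50):
    """Iterate theta'' = theta' - S'(theta') / S''(theta') until the step is below tol."""
    for iteration in range(1, max_iter + 1):
        new_theta = theta - score(theta, x, n) / score_derivative(theta, x, n)
        if not 0 < new_theta < 1:
            # One of the listed problems: the step produces an extreme estimate.
            raise ValueError("Newton-Raphson step left the interval (0, 1)")
        if abs(new_theta - theta) < tol:
            return new_theta, iteration
        theta = new_theta
    raise RuntimeError("Newton-Raphson did not converge")

theta_hat, n_iter = newton_raphson(x=20, n=100)
print(round(theta_hat, 6), n_iter)
```

The range check is there because a poor starting value can throw the iterate outside the parameter space, which is one of the problems noted above.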


Methods contd.

Expectation-Maximisation Algorithm - Iterative. Incomplete data.

(Much genomic data fits this situation, e.g. linkage analysis with marker genotypes of F2 progeny: usually 9 categories are observed for a 2-locus, 2-allele model, but 16 = complete info., while 14 give info. on linkage. Some are hidden, but if the linkage parameter is known, expected frequencies can be predicted and the complete data restored by expectation.)

• Steps - Expectation estimates the statistics of the complete data, given the observed incomplete data. Maximisation uses the estimated complete data to give the MLE. Iterate till it converges.

• Implementation - An initial guess $\theta'$ is chosen (e.g. $\theta' = 0.25$, say, for R.F.). Taking this as "true", the complete data are estimated through distributional statements, e.g. P(individual is recombinant given observed genotype) for R.F. estimation. The MLE estimate $\theta''$ is then computed. This, for R.F., is the sum of recombinants/N. Thus the MLE, for $f_i$ the observed count, is

$$\theta'' = \frac{1}{N}\sum_i f_i\,P(R \mid G_i)$$

Convergence: $\theta'' = \theta'$ or $|\theta'' - \theta'| <$ tolerance (e.g. 0.00001).
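The update $\theta'' = \frac{1}{N}\sum_i f_i P(R \mid G_i)$ can be sketched for the 9-class F2 situation mentioned above. Everything specific here is an assumption made for illustration: a coupling-phase F2 with co-dominant markers, $P(R \mid G_i)$ taken as the expected proportion of recombinant gametes in genotype class $i$ (only the double heterozygote depends on $\theta$), and hypothetical counts chosen to equal the expected frequencies at $\theta = 0.2$ with $N = 100$.

```python
# Hypothetical observed counts f_i for the 9 observable F2 genotype classes.
COUNTS = {"AABB": 16, "AABb": 8, "AAbb": 1, "AaBB": 8, "AaBb": 34,
          "Aabb": 8, "aaBB": 1, "aaBb": 8, "aabb": 16}

# Proportion of recombinant gametes for the classes where it is fully determined.
FIXED = {"AABB": 0.0, "aabb": 0.0,
         "AABb": 0.5, "AaBB": 0.5, "Aabb": 0.5, "aaBb": 0.5,
         "AAbb": 1.0, "aaBB": 1.0}

def prob_recombinant(genotype, theta):
    """P(R | G): expected proportion of recombinant gametes given the genotype."""
    if genotype == "AaBb":
        # Hidden data: both gametes parental, or both recombinant.
        return theta**2 / ((1 - theta)**2 + theta**2)
    return FIXED[genotype]

def em_recombination_fraction(counts, theta=0.25, tol=1e-5, max_iter=1000):
    """E-step and M-step combined: theta'' = sum_i f_i P(R | G_i) / N."""
    n = sum(counts.values())
    for iteration in range(1, max_iter + 1):
        new_theta = sum(f * prob_recombinant(g, theta) for g, f in counts.items()) / n
        if abs(new_theta - theta) < tol:
            return new_theta, iteration
        theta = new_theta
    raise RuntimeError("EM did not converge")

theta_hat, n_iter = em_recombination_fraction(COUNTS)
print(round(theta_hat, 4), n_iter)
```

Starting from 0.25, the estimate moves towards 0.2, the value used to build the hypothetical counts; with real data only `COUNTS` would change.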


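LIKELIHOOD for C.I. and H.T.

• Likelihood Ratio Test - cf. with $\chi^{2}$. Principal advantage of $G$ is Power, where unknown parameters are involved in the hypothesis test.

Have the likelihood of $\theta$ taking a value $\theta_A$ which maximises it, i.e. the MLE, and the likelihood under H0: $\theta = \theta_N$, e.g. $\theta_N = 0.5$.

• Form of L.R. Test Statistic: based on the ratio $L(\theta_A \mid x) / L(\theta_N \mid x)$,

$$G = 2\log\frac{L(\theta_A \mid x)}{L(\theta_N \mid x)}$$

or, conventionally,

$$G = -2\log\frac{L(\theta_N \mid x)}{L(\theta_A \mid x)}$$

In practice - interpretation issue - choose to use the first form.

• Distribution of $G$ ~ approx. $\chi^{2}$, with dof = difference in dimension of the parameter spaces for $L(\theta_A)$, $L(\theta_N)$

• Goodness of Fit - notation as for $\chi^{2}$, $G \sim \chi^{2}_{n-1}$:

$$G = 2\sum_{i=1}^{n} O_i \log\frac{O_i}{E_i}$$

• Independence - notation and dof as for $\chi^{2}$:

$$G = 2\sum_{i=1}^{r}\sum_{j=1}^{c} O_{ij} \log\frac{O_{ij}}{E_{ij}}$$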


Example

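• To test H0: $\theta = 0.5$ (estimated parameter of Binomial) against H1: $\theta \ne 0.5$

$$G = 2\log\frac{L(\theta_A \mid x)}{L(\theta_N \mid x)} = 2[S(\hat{\theta}) - S(0.5)] = 2[x\log\hat{\theta} + (n - x)\log(1 - \hat{\theta}) - n\log 0.5]$$

where $\hat{\theta}$ is the MLE of the Binomial parameter. If $\hat{\theta}$ and $x$ are replaced with their expectations or parametric values ($\theta$ and $n\theta$),

$$E\{G\} = n\{2[\theta\log\theta + (1 - \theta)\log(1 - \theta) - \log 0.5]\}$$

i.e. the expected Likelihood Ratio test statistic for sample size $n$ and parameter $\theta$, where the part in the bracket { } is the ELRTS from a single observation.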


Power - Example extended

• Under H0: $\theta = 0.5$,

$$E\{G\} = n\{2[0.5\log 0.5 + 0.5\log 0.5 - \log 0.5]\} = 0$$

• At level of significance $\alpha = 0.05$, suppose the true $\theta = \theta_1 = 0.2$, so if $n = 25$

$$E\{G\} = 50\{0.2\log 0.2 + 0.8\log 0.8 - \log 0.5\} = 9.6$$

(In genomics this might apply where R.F. = 0.2 between two genes, as opposed to 0.5. Natural logs are used, though either is possible in practice; hence the generic form "Log" rather than Ln here. Assume Ln throughout unless otherwise indicated.)

• Rejection region at the 0.05 level is $G > \chi^{2}_{1} = 3.84$

• If we sketch the curves, P{LRTS falls in the acceptance region} = 0.13 = $\beta$, the probability of a false negative when the actual value of $\theta$ = 0.2

• If the sample size is increased, e.g. $n = 50$, $E\{G\} = 19$ and it is easy to show that P{False negative} = 0.01

• Generally, Power for these tests is given by

$$1 - \beta = P\{\chi^{2}_{df,\ nE\{G\}_{unit}} > \chi^{2}_{\alpha,\ df}\}$$

where the left-hand $\chi^{2}$ is non-central, with non-centrality $nE\{G\}_{unit}$.
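The figures in this example can be checked with a short sketch using only the Python standard library. It assumes the 1-dof case, where a non-central $\chi^{2}$ variable with non-centrality $\lambda$ can be written as $(Z + \sqrt{\lambda})^{2}$ for a standard Normal $Z$; the function names are illustrative.

```python
import math
from statistics import NormalDist

def unit_elrts(theta):
    """E{G} from a single Binomial observation when testing H0: theta = 0.5."""
    return 2 * (theta * math.log(theta) + (1 - theta) * math.log(1 - theta)
                - math.log(0.5))

def expected_g(n, theta):
    """Expected LRTS for sample size n."""
    return n * unit_elrts(theta)

def power_one_dof(noncentrality, alpha=0.05):
    """P{non-central chi-square (1 dof) exceeds the critical value at level alpha}."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)   # square root of 3.84
    shift = math.sqrt(noncentrality)
    return (1 - z.cdf(crit - shift)) + z.cdf(-crit - shift)

for n in (25, 50):
    e_g = expected_g(n, 0.2)
    beta = 1 - power_one_dof(e_g)
    print(n, round(e_g, 1), round(beta, 2))
```

For more than one degree of freedom the Normal shortcut no longer applies and a non-central $\chi^{2}$ routine from a statistics library would be needed instead.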


Likelihood Confidence Intervals - method

• Example: Consider the following Likelihood function

$$L(\theta) = (1 - \theta)^{a}\,\theta^{b}$$

where $\theta$ is the unknown parameter and $a$, $b$ are observed counts.

• If four sets of data are observed, A: (a,b) = (8,2), B: (a,b) = (16,4), C: (a,b) = (80,20), D: (a,b) = (400,100), Likelihood estimates can be plotted vs possible parameter values, with MLE = peak value. For example, MLE $\hat{\theta} = 0.2$, with $L_{max} = 0.0067$ for A, $= 0.000045$ for B etc.

A: Log $L_{max}$ - Log $L$ = Log(0.0067) - Log(0.00091) = 2 gives the 95% C.I., $\theta$ = (0.035, 0.496), corresponding to $L = 0.00091$. Similarly, manipulating this expression, the Likelihood value corresponding to the 95% confidence interval is given as

$$L = e^{-2}L_{max} = L_{max} / 7.389$$

• Usually plot Log-likelihood vs parameter, rather than Likelihood

• As sample size increases, the interval becomes narrower and more symmetric
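One way to find the two parameter values at which the Log-likelihood has dropped 2 units below its maximum is a simple bisection on each side of the MLE, sketched below. The bisection routine, its tolerance and the function names are assumptions for illustration; the limits returned for data set A should be close to, though not necessarily identical with, the interval quoted in the example.

```python
import math

def log_lik(theta, a, b):
    """Log of L(theta) = (1 - theta)^a * theta^b."""
    return a * math.log(1 - theta) + b * math.log(theta)

def bisect_root(f, lo, hi, tol=1e-10):
    """Root of f between lo and hi, assuming f changes sign exactly once there."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2

def likelihood_interval(a, b, drop=2.0):
    """MLE and the two values of theta where Log L = Log Lmax - drop."""
    mle = b / (a + b)
    target = log_lik(mle, a, b) - drop

    def gap(theta):
        return log_lik(theta, a, b) - target

    lower = bisect_root(gap, 1e-9, mle)
    upper = bisect_root(gap, mle, 1 - 1e-9)
    return mle, lower, upper

mle, lower, upper = likelihood_interval(8, 2)   # data set A
print(round(mle, 3), round(lower, 3), round(upper, 3))
```

Calling `likelihood_interval` with the larger counts shows the interval narrowing and becoming more symmetric about 0.2 as the sample size grows.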


Example - sample size

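• For $L(\theta) = (1 - \theta)^{a}\,\theta^{b}$, the expected Log-LRTS and average Info. content (per observation) are

$$E\{G\}_{unit} = 2[(1 - \theta)\log(1 - \theta) + \theta\log\theta - \log 0.5]; \qquad I(\theta) = \frac{1}{\theta(1 - \theta)}$$

• If true parameter values $\theta$ = 0.05, 0.1, 0.2, 0.3 respectively, then $E\{G\}_{unit}$, $I(\theta)$ and the sample size for power 90% ($1 - \beta = 0.9$) and $\alpha = 0.05$ follow from

$$n = \frac{\lambda_{(0.9,\ 1,\ 3.84)}}{E\{G\}_{unit}} = \frac{10.5}{E\{G\}_{unit}}$$

where $\lambda$ is the non-centrality needed for power 0.9 with 1 dof and critical value 3.84, so have

theta    E{G}unit    I(theta)    Sample size n
0.05     0.99        21.1        11
0.10     0.74        11.1        15
0.20     0.39        6.3         28
0.30     0.16        4.8         64

or, if want, say, a given range ($d$) of the C.I. about the true value of the parameter $\theta$, the sample size follows from

$$n = \frac{(2 \times 1.96)^{2}}{d^{2}\,I(\theta)}$$

- c.f. classical form.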
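The sample-size column can be reproduced with the sketch below. It assumes the non-centrality for 90% power is approximated by $(z_{0.975} + z_{0.90})^{2} \approx 10.5$ (the far tail of the non-central distribution is ignored) and that sample sizes are rounded up; the function names and the value of `d` in the last line are hypothetical.

```python
import math
from statistics import NormalDist

def unit_elrts(theta):
    """Expected LRTS per observation for H0: theta = 0.5."""
    return 2 * ((1 - theta) * math.log(1 - theta) + theta * math.log(theta)
                - math.log(0.5))

def information(theta):
    """Average information per observation, I(theta) = 1 / [theta (1 - theta)]."""
    return 1 / (theta * (1 - theta))

def sample_size_for_power(theta, power=0.9, alpha=0.05):
    """Smallest n with n * E{G}unit at least the required non-centrality (1 dof)."""
    z = NormalDist()
    noncentrality = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2  # about 10.5
    return math.ceil(noncentrality / unit_elrts(theta))

def sample_size_for_ci_range(theta, d, z_value=1.96):
    """Smallest n giving a C.I. of total width d: n = (2 z)^2 / (d^2 I(theta))."""
    return math.ceil((2 * z_value) ** 2 / (d ** 2 * information(theta)))

for theta in (0.05, 0.10, 0.20, 0.30):
    print(theta, round(unit_elrts(theta), 2), round(information(theta), 1),
          sample_size_for_power(theta))

print(sample_size_for_ci_range(0.2, d=0.2))  # hypothetical target width d = 0.2
```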


Multiple Populations: Extensions to G -Example

• Recall Mendel's data - Week 3 and Extensions to $\chi^{2}$ - Week 8.

In brief:

                 Round             Wrinkled
Plant            O      E          O      E          G      dof   p-value
1                45     42.75      12     14.25      0.49   1     0.49
2                                                    0.09   1     0.77
3                                                    0.10   1     0.75
4                                                    1.30   1     0.26
5                                                    0.01   1     0.93
6                                                    0.71   1     0.40
7                                                    0.79   1     0.38
8                                                    0.63   1     0.43
9                                                    1.06   1     0.30
10                                                   0.17   1     0.68
Total            336               101               5.34   10
Pooled           336    327.75     101    109.25     0.85   1     0.36
Heterogeneity                                        4.50   9     0.88
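The $G$ values for the two rows with full counts (Plant 1 and Pooled) can be recomputed directly from $G = 2\sum O\log(O/E)$, as in the sketch below. The function names are illustrative, and the p-value helper is valid for 1 dof only.

```python
import math

def g_statistic(observed, expected):
    """G = 2 * sum of O * log(O / E) over classes (empty classes contribute 0)."""
    return 2 * sum(o * math.log(o / e)
                   for o, e in zip(observed, expected) if o > 0)

def p_value_one_dof(value):
    """Upper-tail probability of a chi-square variable with 1 dof."""
    return math.erfc(math.sqrt(value / 2))

# (Round, Wrinkled) counts against the 3:1 expectation.
g_plant1 = g_statistic((45, 12), (42.75, 14.25))
g_pooled = g_statistic((336, 101), (327.75, 109.25))

print(round(g_plant1, 2), round(g_pooled, 2), round(p_value_one_dof(g_pooled), 2))
```

The heterogeneity $G$ is then the sum of the ten per-plant values minus `g_pooled`, which needs the per-plant counts that the table leaves out.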


Multiple Populations - summary

• Parallels $\chi^{2}$

• Partitions, therefore

$$G_{Total} = 2\sum_{i=1}^{p}\sum_{j=1}^{n} O_{ij}\log\frac{O_{ij}}{E_{ij}}$$

$$G_{Pooled} = 2\sum_{j=1}^{n}\left(\sum_{i=1}^{p} O_{ij}\right)\log\frac{\sum_{i=1}^{p} O_{ij}}{\sum_{i=1}^{p} E_{ij}}$$

and $G_{heterogeneity} = G_{Total} - G_{Pooled}$ ($n$ = no. classes, $p$ = no. populations)

• Example in brief: Recall Backcross (AaBb x aabb) - Goodness of fit etc. (2-locus model), Week 3. For each of the four crosses, a Total GoF statistic can be calculated according to the expected segregation ratio 1:1:1:1, which assumes no segregation distortion for both loci and no linkage between loci. For each locus, a GoF statistic is calculated using marginal counts, assuming the two genotypes segregate 1:1. The difference between the Total and the 2 individual locus GoF statistics is the L-LRTS (or chi-squared statistic) contributed by association/linkage between the 2 loci.


Class Exercise solutions

• Mendel's Peas Week 3 - $\chi^{2}$ - extensions, Week 8

In brief:

                 Round             Wrinkled
Plant            O      E          O      E          chi-sq  dof   p-value
1                45     42.75      12     14.25      0.47    1     0.49
2                                                    0.09    1     0.77
3                                                    0.10    1     0.75
4                                                    1.39    1     0.24
5                                                    0.01    1     0.93
6                                                    0.67    1     0.41
7                                                    0.76    1     0.38
8                                                    0.67    1     0.41
9                                                    0.98    1     0.32
10                                                   0.17    1     0.68
Total            336               101               5.30    10
Pooled           336    327.75     101    109.25     0.83    1     0.36
Heterogeneity                                        4.47    9     0.88

No significant departure from the expected frequencies is detected for any of the 10 plants or for the pooled frequencies. The heterogeneity $\chi^{2}$ is also not significant. Notes - separate H0. Some differences in $\chi^{2}$, compared to the G values (Lecture).


Class Examples contd.

• Two-way ANOVA/Additive Design, Week 8 - solution in lecture

• Backcross (Wk 3 & referred to Wk 10) - Complete GoF etc. $\chi^{2}$ analysis (p-values in brackets)

Cross            Total    Locus A        Locus B        Linkage
1                2.13     0.06 (0.86)    0.01 (0.91)    2.09 (0.15)
2                6.60     0.03 (0.86)    0.03 (0.86)    6.53 (0.01)
3                66.00    0.33 (0.56)    0.33 (0.56)    65.33 (<0.0001)
4                11.60    0.27 (0.61)    0.07 (0.80)    11.27 (0.0008)
Total            86.33    0.66           0.45           85.22
Pooled           61.86    0.15 (0.70)    0.33 (0.56)    61.38 (<0.0001)
Heterogeneity    24.47    0.51 (0.92)    0.12 (0.99)    23.84 (<0.0001)

Each cross ~ $\chi^{2}_{1}$ for Locus A, Locus B and Linkage; Total = Sum of 4 crosses.

Pooled - uses the marginal frequency of the 4 genotypic classes over the 4 crosses (assumes no heterogeneity in Segregation Ratio among the 4 crosses, for each locus and for the linkage relationship between them).

Heterogeneity - Locus A, B and Linkage ~ $\chi^{2}_{3}$ under (different) H0. Heterogeneity overall ~ $\chi^{2}_{9}$, where dof from (4-1)(4-1), under H0.

CONCLUSIONS:

- No S.R. distortion for 2 loci (all 4 crosses)

- Significant linkage in 3 crosses (2,3,4)*

- Significant Heterogeneity among 4 crosses found for linkage relationship between 2 loci.

- Sig. GoF statistic for heterogeneity mainly from Cross 1 compared with the others, thus linkage p-value for heterogeneity GoF from 2,3,4 as above*

Experimentally, Cross 1 is biologically different from the others, so linkage between loci A and B could not be detected using the Cross 1 data.


Outstanding class exercises

• Likelihood C.I. for data sets B,C,D - Lectures Week 10

• Sample size calculations for range true parameter values given - Lectures Week 10

• Backcross example - to complete for G to compare with $\chi^{2}$ results (Week 3, Week 8 and current)