MAXIMUM LIKELIHOOD ESTIMATION
• Recall general discussion on Estimation, definition of the Likelihood function $L(\theta \mid x)$ for a vector of parameters $\theta$ and set of values x. Finding the most likely value of $\theta$ is equivalent to maximising the Likelihood function. Also defined the Log-likelihood (Support function $S(\theta)$) and its derivative, the Score, together with the Information content per observation, which for a single-parameter likelihood is given by

$$I(\theta) = E\left[\left(\frac{\partial}{\partial\theta}\log L(\theta \mid x)\right)^2\right] = -E\left[\frac{\partial^2}{\partial\theta^2}\log L(\theta \mid x)\right]$$

• Why bother with MLE? (Need knowledge of underlying distribution.)
Consistency; sufficiency; asymptotic efficiency (linked to variance); unique maximum; invariance property and, as a consequence, most convenient parameterisation; usually MVUE; conventional optimisation methods.
2
Estimator Comparison in brief.
• Classical - uses objective probabilities, intuitive estimators, additional assumptions for sampling distributions, good properties for some estimators. (See LSE.)
• Moment - less calculation, loss of efficiency. Not that widely used in genomic analysis, even though moment estimators usually have analytical solutions and low bias, because of poorer asymptotic properties, and even simple solutions may not be unique.
• Bayesian - subjective prior knowledge plus sample info.; close to MLE under certain conditions - see earlier.
• LSE - if assumptions OK, $\hat\beta$'s unbiased and variances obtained from $(X^T X)^{-1}$. Assumptions needed on distributions of response variables are just expectations and variance-covariance structure (unlike MLE, where need to specify joint prob. distribution of variables), but additional assumptions needed for sampling distns. Some computational advantage. Close to MLE if assumptions met, e.g. in "Likelihood form", LSE conditions
$$\hat\theta \sim N(\theta,\ I^{-1}), \qquad I^{-1} = \sigma^2 (X^T X)^{-1}$$
3
VARIANCE, BIAS and CONFIDENCE INTERVALS
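• Variance of an Estimator - usual form

$$\sigma^2_{\hat\theta} = E\{[\hat\theta - E(\hat\theta)]^2\}$$

or, for k independent estimates,

$$\hat\sigma^2_{\hat\theta} = \frac{1}{k-1}\left[\sum_{i=1}^{k}\hat\theta_i^2 - \frac{1}{k}\left(\sum_{i=1}^{k}\hat\theta_i\right)^2\right]$$

• For a large sample, variance of an MLE can be approximated by

$$\hat\sigma^2_{\hat\theta} \approx \frac{1}{n\,I(\theta)}$$

It can also be estimated empirically, using re-sampling techniques.
• Variance of a linear function of several estimates - common in statistical genomics, see earlier.
• Recall Bias of the Estimator, $E(\hat\theta) - \theta$; then the Mean Square Error is defined to be

$$MSE = E\{(\hat\theta - \theta)^2\}$$

which expands to

$$E\{[\hat\theta - E(\hat\theta)] + [E(\hat\theta) - \theta]\}^2 = \sigma^2_{\hat\theta} + [E(\hat\theta) - \theta]^2$$

so we have the basis for C.I. and tests of hypothesis.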
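The large-sample approximation and the re-sampling alternative mentioned on this slide can be compared numerically. The sketch below is illustrative only: it assumes Binomial-type data (20 successes in 100 trials, hypothetical), uses the per-observation information $1/(\theta(1-\theta))$ for that case, and uses an ordinary bootstrap as one possible re-sampling technique. The function name `bootstrap_variance` is an assumption, not part of the lecture.

```python
import random

def bootstrap_variance(data, estimator, replicates=2000, seed=1):
    """Empirical variance of an estimator from bootstrap re-samples."""
    rng = random.Random(seed)
    n = len(data)
    # Each replicate re-samples n observations with replacement.
    estimates = [estimator(rng.choices(data, k=n)) for _ in range(replicates)]
    mean = sum(estimates) / replicates
    # Sample variance of the k = replicates estimates (divisor k - 1).
    return sum((e - mean) ** 2 for e in estimates) / (replicates - 1)

# Hypothetical data: 20 "successes" out of n = 100.
data = [1] * 20 + [0] * 80
theta_hat = sum(data) / len(data)

# Large-sample approximation 1 / (n I(theta)), with I = 1 / (theta (1 - theta)).
info = 1 / (theta_hat * (1 - theta_hat))
approx_var = 1 / (len(data) * info)

boot_var = bootstrap_variance(data, lambda s: sum(s) / len(s))
print(approx_var, boot_var)
```

The two values should be close but not identical, since the bootstrap figure carries Monte Carlo error.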
4
COMMONLY-USED METHODS of obtaining MLE
• Analytical - solving $\frac{dL}{d\theta} = 0$ or $\frac{dS}{d\theta} = 0$ when simple solutions exist
• Grid search or likelihood profile approach
• Newton-Raphson iteration methods
• EM (expectation and maximisation) algorithm
N.B. Log-likelihood is used because:
  its maximum occurs at the same value of $\theta$ as that of the Likelihood;
  it is easier to compute;
  there is a close relationship between the statistical properties of the MLE and the Log-likelihood.
5
METHODS in brief
Analytical: recall Binomial example earlier

$$\text{Score} = \frac{dS}{d\theta} = \frac{x}{\theta} - \frac{n-x}{1-\theta} = 0 \quad\Rightarrow\quad \hat\theta = \frac{x}{n}$$

• Example: For the Normal, MLEs of mean and variance (taking derivatives w.r.t. mean and variance separately) are equivalent to the sample mean and the actual variance (i.e. divisor N); the variance estimate is unbiased if the mean is known, biased if not.
• Invariance: one-to-one relationships preserved
• Used: when MLE has a simple solution
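As a small numerical illustration of the Normal example, the sketch below computes the MLEs for a made-up sample and sets the divisor-N variance beside the usual divisor-(N-1) version. The data and the function name `normal_mle` are hypothetical.

```python
def normal_mle(sample):
    """Return (mean, MLE variance with divisor N, variance with divisor N - 1)."""
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations
    return mean, ss / n, ss / (n - 1)

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
mean, var_mle, var_unbiased = normal_mle(sample)
print(mean, var_mle, var_unbiased)
```

With the mean estimated from the same sample, the divisor-N value is the smaller of the two, which is the bias referred to above.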
6
Methods for MLE's contd.
Grid Search: MLE from plots of likelihood / log-likelihood vs parameter.
• Relative Likelihood = Likelihood / Max. Likelihood (maximum set = 1). Peak of R.L. can be visually identified or found by a searching algorithm. E.g. suppose

$$S(\theta) = \text{Log}\left[\theta^{20}(1-\theta)^{80} + \theta^{80}(1-\theta)^{20}\right]$$

Plotting the likelihood over the parameter space range $0 \le \theta \le 1$ gives 2 peaks, symmetrical around $\theta = 0.5$: the likelihood profile for the well-known mixed linkage phase problem in linkage analysis. If constrain $0 \le \theta \le 0.5$, MLE $\hat\theta = 0.2$ = R.F. between genes (possible mixed linkage phase).
• Graphic/numerical implementation - initial estimate of $\theta$; direction of search determined by evaluating likelihood at both sides of $\theta$. Search takes direction of increase. Initial search increments large, e.g. 0.1, then when likelihood starts to decrease, stop and refine increment.
• Multiple peaks - may miss global maximum; computationally intensive
• Multiple parameters - grid search. Interpretation of likelihood profiles can be difficult.
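The grid search idea can be sketched for the mixed linkage phase support function, taking it to be $S(\theta) = \text{Log}[\theta^{20}(1-\theta)^{80} + \theta^{80}(1-\theta)^{20}]$ and constraining the search to $\theta \le 0.5$. This is a minimal two-pass version (coarse increments, then a refined increment around the coarse peak) rather than the exact directional search described above. The function names and increments are assumptions.

```python
import math

def support(theta):
    """Log-likelihood S(theta) for the mixed linkage phase example."""
    return math.log(theta**20 * (1 - theta)**80 + theta**80 * (1 - theta)**20)

def grid_search(f, lo, hi, steps):
    """Return (theta, f(theta)) at the grid point in [lo, hi] maximising f."""
    best_theta, best_val = None, -math.inf
    for i in range(steps + 1):
        theta = lo + (hi - lo) * i / steps
        val = f(theta)
        if val > best_val:
            best_theta, best_val = theta, val
    return best_theta, best_val

# Coarse pass over the constrained range (increment 0.05; theta = 0 is
# excluded because the likelihood is zero there).
coarse, _ = grid_search(support, 0.05, 0.5, 9)
# Refined pass around the coarse peak (increment 0.001).
theta_hat, s_max = grid_search(support, coarse - 0.05, coarse + 0.05, 100)

def relative_likelihood(theta):
    """Likelihood divided by the maximum likelihood (peak value = 1)."""
    return math.exp(support(theta) - s_max)

print(theta_hat, relative_likelihood(0.3))
```

Running the coarse pass over the full range 0 to 1 instead would find two equally high peaks, which is the multiple-peak problem noted above.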
7
Example
• Recall Exercises 2, ex. 8. Data used to show a linkage relationship between a marker and a "rust-resistant" gene. Escapes = individuals who are susceptible, but show no disease (rust) phenotype under experimental conditions. So define two parameters, the proportion of escapes and the R.F.; one minus the proportion of escapes is the penetrance for the disease trait, i.e. P{individual with susceptible genotype has disease phenotype}. Purpose of this type of experiment is typically to estimate the R.F. between marker and gene.
• Support function: a sum of four log terms in the two parameters, with coefficients 168, 3, 52 and 163,

$$S = 168\log(1\,\cdots) + 3\log(\cdots) + 52\log(\cdots) + 163\log(1\,\cdots)$$

Setting first derivatives w.r.t. each parameter = 0 gives no simple analytical solution.
• Using grid search, the likelihood reaches its maximum at parameter estimates of 0.02 and 0.22.
• In general, this type of experiment tests H0: independence between marker and gene (R.F. = 0.5) and no escapes (proportion of escapes = 0), using Likelihood Ratio Test statistics.
• N.B.: for Moment estimates (ex. 7), solve the moment equations - the solution is not the same as the MLE.
8
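Methods contd.
Newton-Raphson Iteration
Have Score($\theta$) = 0 from previously. N-R consists of replacing the Score by the linear terms of its Taylor expansion, so if $\theta''$ is a solution and $\theta'$ a first guess,

$$\left.\frac{dS}{d\theta}\right|_{\theta''} = \left.\frac{dS}{d\theta}\right|_{\theta'} + (\theta'' - \theta')\left.\frac{d^2S}{d\theta^2}\right|_{\theta'} = 0$$

$$\Rightarrow\quad \theta'' = \theta' - \frac{dS(\theta')/d\theta}{d^2S(\theta')/d\theta^2}$$

Repeat with $\theta''$ replacing $\theta'$.
Each iteration fits a parabola to the L.F.
• Problems - multiple peaks, zero Information, extreme estimates
• Multiple parameters - matrix notation, where the S matrix, for example, has elements = derivatives of $S(\theta_1, \theta_2)$ w.r.t. $\theta_1$ and $\theta_2$ respectively. Similarly, the Information matrix has terms of the form

$$-E\left[\frac{\partial^2 S(\theta_1, \theta_2)}{\partial\theta_1^2}\right],\quad -E\left[\frac{\partial^2 S(\theta_1, \theta_2)}{\partial\theta_1\,\partial\theta_2}\right],\ \text{etc.}$$

Estimates are

$$\theta'' = \theta' + N^{-1} I^{-1} S$$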
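A minimal single-parameter sketch of the Newton-Raphson update, applied to the Binomial score $x/\theta - (n-x)/(1-\theta)$. The counts x = 20, n = 100, the starting value and the tolerance are illustrative assumptions, and `newton_raphson` is a hypothetical helper name.

```python
def newton_raphson(score, d_score, theta, tol=1e-8, max_iter=50):
    """Iterate theta'' = theta' - score(theta') / d_score(theta') until stable."""
    for _ in range(max_iter):
        theta_new = theta - score(theta) / d_score(theta)
        if abs(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    raise RuntimeError("Newton-Raphson did not converge")

x, n = 20, 100  # hypothetical Binomial data

def score(t):
    """First derivative of the Binomial support function."""
    return x / t - (n - x) / (1 - t)

def d_score(t):
    """Second derivative of the Binomial support function."""
    return -x / t**2 - (n - x) / (1 - t)**2

theta_hat = newton_raphson(score, d_score, theta=0.3)
print(theta_hat)
```

The iteration here agrees with the analytical solution x/n. A poor starting value (for example one very close to 0 or 1) can throw the update outside the parameter space, which is one form of the "extreme estimates" problem listed above.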
9
Methods contd.
Expectation-Maximisation Algorithm - Iterative. Incomplete data
(Much genomic data fits this situation, e.g. linkage analysis with marker genotypes of F2 progeny: usually 9 categories observed for a 2-locus, 2-allele model, but 16 = complete info., while 14 give info. on linkage. Some are hidden, but if the linkage parameter is known, expected frequencies can be predicted and the complete data restored by expectation.)
• Steps - Expectation estimates statistics of the complete data, given the observed incomplete data. Maximisation uses the estimated complete data to give the MLE. Iterate till converges.
• Implementation - An initial guess $\theta'$ is chosen (e.g. $\theta' = 0.25$, say, for R.F.). Taking this as "true", the complete data are estimated by distributional statements, e.g. P(individual is recombinant given observed genotype) for R.F. estimation. The MLE estimate $\theta''$ is then computed; for R.F., this is the sum of recombinants / N. Thus the MLE, for $f_i$ the observed count of genotype $G_i$, is

$$\theta'' = \frac{1}{N}\sum_i f_i\,P(R \mid G_i)$$

Convergence: $\theta'' = \theta'$ or $|\theta'' - \theta'| <$ tolerance (e.g. 0.00001).
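The E and M steps can be sketched for a simplified F2-type situation. Everything specific here is an assumption for illustration: the four genotype classes, their made-up counts, and the coupling-phase model in which the double heterozygote is the only class with hidden recombination status. P(R | G) is read as the expected proportion of recombinant gametes in an individual of that class.

```python
def em_recombination(counts, theta=0.25, tol=1e-5, max_iter=1000):
    """EM estimate of the recombination fraction from grouped genotype counts.

    counts maps a (hypothetical) genotype class to its observed count f_i:
      'none'      - both gametes non-recombinant, P(R|G) = 0
      'single'    - exactly one recombinant gamete, P(R|G) = 0.5
      'double'    - both gametes recombinant, P(R|G) = 1
      'ambiguous' - double heterozygote: both or neither gamete recombinant
    """
    n = sum(counts.values())
    for _ in range(max_iter):
        # E-step: with the current theta taken as "true", the hidden class is
        # doubly recombinant with probability theta^2 / (theta^2 + (1-theta)^2).
        p_ambiguous = theta**2 / (theta**2 + (1 - theta)**2)
        p_rec = {'none': 0.0, 'single': 0.5, 'double': 1.0,
                 'ambiguous': p_ambiguous}
        # M-step: theta'' = (1/N) * sum_i f_i * P(R | G_i)
        theta_new = sum(f * p_rec[g] for g, f in counts.items()) / n
        if abs(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    raise RuntimeError("EM did not converge")

# Made-up counts, for illustration only.
counts = {'none': 60, 'single': 30, 'double': 2, 'ambiguous': 58}
theta_hat = em_recombination(counts)
print(round(theta_hat, 4))
```

The starting value 0.25 and the tolerance 0.00001 follow the values suggested in the text.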
10
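LIKELIHOOD for C.I. and H.T.
• Likelihood Ratio Test - cf. with $\chi^2$. Principal advantage of G is power, where unknown parameters are involved in the hypothesis test.
Have the likelihood of $\theta$ taking a value $\theta_A$ which maximises it, i.e. the MLE, and the likelihood under H0: $\theta_N$, e.g. $\theta_N = 0.5$, giving the ratio

$$\frac{L(\theta_A \mid x)}{L(\theta_N \mid x)}$$

• Form of L.R. Test Statistic

$$G = 2\,\text{Log}\frac{L(\theta_A \mid x)}{L(\theta_N \mid x)}$$

or, conventionally,

$$G = -2\,\text{Log}\frac{L(\theta_N \mid x)}{L(\theta_A \mid x)}$$

In practice - interpretation issue - choose to use the first form.
• Distribution of G ~ approx. $\chi^2$, with dof = difference in dimension of parameter spaces for $L(\theta_A)$, $L(\theta_N)$
• Goodness of Fit - notation as for $\chi^2$, $G \sim \chi^2_{n-1}$

$$G = 2\sum_{i=1}^{n} O_i\,\text{Log}\frac{O_i}{E_i}$$

• Independence - notation and dof as for $\chi^2$

$$G = 2\sum_{i=1}^{r}\sum_{j=1}^{c} O_{ij}\,\text{Log}\frac{O_{ij}}{E_{ij}}$$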
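The goodness-of-fit form of G can be checked on the Mendel plant 1 counts quoted later in these notes (45 round, 12 wrinkled, expected 3:1). The sketch uses natural logs; the helper names are assumptions, and the 1-dof upper tail uses the standard identity P{$\chi^2_1$ > x} = erfc($\sqrt{x/2}$), not a formula from the lecture.

```python
import math

def g_statistic(observed, expected):
    """G = 2 * sum O_i * ln(O_i / E_i); empty classes contribute zero."""
    return 2 * sum(o * math.log(o / e)
                   for o, e in zip(observed, expected) if o > 0)

def chi2_upper_tail_1dof(x):
    """P{chi-square with 1 dof > x}."""
    return math.erfc(math.sqrt(x / 2))

observed = [45, 12]                  # round, wrinkled (plant 1)
total = sum(observed)
expected = [0.75 * total, 0.25 * total]   # 3:1 ratio gives 42.75 and 14.25

g = g_statistic(observed, expected)
p_value = chi2_upper_tail_1dof(g)
print(round(g, 2), round(p_value, 2))
```

With 2 classes there is 1 dof, so the same tail function applies to each single-plant test in the Mendel table.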
11
Example
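• To test H0: $\theta = 0.5$ (estimated parameter of Binomial)
H1: $\theta \ne 0.5$

$$G = 2\,\text{Log}\frac{L(\theta_A \mid x)}{L(\theta_N \mid x)} = 2[S(\hat\theta) - S(0.5)] = 2[x\,\text{Log}\,\hat\theta + (n-x)\,\text{Log}(1-\hat\theta) - n\,\text{Log}\,0.5]$$

where $\hat\theta$ is the MLE of the Binomial parameter. If $\hat\theta$ and x are replaced with their expectations or parametric values,

$$E\{G\} = n\{2[\theta\,\text{Log}\,\theta + (1-\theta)\,\text{Log}(1-\theta) - \text{Log}\,0.5]\}$$

i.e. the expected Likelihood Ratio test statistic for sample size n and parameter $\theta$, where the part in the bracket { } is the ELRTS from a single observation.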
12
Power - Example extended
• Under H0: $\theta = 0.5$,

$$E\{G\} = 2n\{0.5\,\text{Log}\,0.5 + 0.5\,\text{Log}\,0.5 - \text{Log}\,0.5\} = 0$$

• At level of significance $\alpha = 0.05$, suppose true $\theta = \theta_1 = 0.2$, so if n = 25

$$E\{G\} = 50\{0.2\,\text{Log}\,0.2 + 0.8\,\text{Log}\,0.8 - \text{Log}\,0.5\} = 9.6$$

(In genomics this might apply where R.F. = 0.2 between two genes, as opposed to 0.5. Natural logs used, though either possible in practice; hence the generic form "Log" rather than Ln here. Assume Ln throughout unless otherwise indicated.)
• Rejection region at 0.05 level is $\chi^2_1 \ge 3.84$
• If sketch the curves, P{LRTS falls in the acceptance region} = 0.13 = $\beta$, the probability of a false negative when the actual value of $\theta$ = 0.2
• If sample size increased, e.g. n = 50, E{G} = 19 and easy to show that P{False negative} = 0.01
• Generally, power for these tests is given by

$$1 - \beta = P\{\chi^2_{df,\ nE\{G\}_{unit}} \ge \chi^2_{\alpha,\,df}\}$$

where the $\chi^2$ on the left is non-central, with non-centrality $nE\{G\}_{unit}$.
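The figures in this example can be reproduced with a short calculation. The sketch assumes that G under the alternative behaves like a 1-dof non-central chi-square with non-centrality n E{G}unit, and uses the standard fact that such a variable is $(Z + \sqrt{\lambda})^2$ with Z standard normal; that representation and the function names are assumptions, not taken from the slides.

```python
import math

def expected_g_unit(theta):
    """Expected LRT statistic per observation for H0: theta = 0.5 (natural logs)."""
    return 2 * (theta * math.log(theta)
                + (1 - theta) * math.log(1 - theta)
                - math.log(0.5))

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def false_negative_rate(n, theta, critical=3.84):
    """P{G < critical} for G ~ non-central chi-square, 1 dof, ncp = n * E{G}unit."""
    root_ncp = math.sqrt(n * expected_g_unit(theta))
    c = math.sqrt(critical)
    # G = (Z + root_ncp)^2 < critical  <=>  -c < Z + root_ncp < c
    return normal_cdf(c - root_ncp) - normal_cdf(-c - root_ncp)

eg_25 = 25 * expected_g_unit(0.2)
eg_50 = 50 * expected_g_unit(0.2)
beta_25 = false_negative_rate(25, 0.2)
beta_50 = false_negative_rate(50, 0.2)
print(round(eg_25, 1), round(beta_25, 2), round(eg_50, 1), round(beta_50, 2))
```

Power is 1 minus the false-negative rate, so it can be tabulated over a range of n to see where a target such as 90% is reached.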
13
Likelihood Confidence Intervals -method
• Example: Consider the following Likelihood function

$$L(\theta) = (1-\theta)^a\,\theta^b$$

where $\theta$ is the unknown parameter and a, b are observed counts.
• If four sets of data are observed, A: (a,b) = (8,2), B: (a,b) = (16,4), C: (a,b) = (80,20), D: (a,b) = (400,100), the Likelihood can be plotted vs possible parameter values, with MLE = peak value. For example, MLE = 0.2, Lmax = 0.0067 for A, = 0.000045 for B, etc.
• A: Log Lmax - Log L = Log(0.0067) - Log(0.00091) = 2 gives the 95% C.I.; $\theta$ = (0.035, 0.496) corresponds to L = 0.00091, so this is the 95% C.I. for A. Similarly, manipulating this expression, the Likelihood value corresponding to the 95% confidence interval is given as L = Lmax / 7.389 (since $e^2$ = 7.389).
• Usually plot Log-likelihood vs parameter, rather than Likelihood
• As sample size increases, interval narrower and more symmetric
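A likelihood interval of this kind can be found by scanning a grid and keeping the parameter values whose log-likelihood lies within 2 units of the maximum. The sketch takes the likelihood to be $(1-\theta)^a\theta^b$, consistent with MLE = 0.2 for data set A; the function names and grid resolution are assumptions, and the endpoints are only accurate to the grid spacing.

```python
import math

def log_lik(theta, a, b):
    """Log of L(theta) = (1 - theta)^a * theta^b."""
    return a * math.log(1 - theta) + b * math.log(theta)

def likelihood_interval(a, b, drop=2.0, steps=10000):
    """Return (MLE, Lmax, lower, upper) for a 'drop'-unit support interval."""
    theta_hat = b / (a + b)
    s_max = log_lik(theta_hat, a, b)
    inside = [i / steps for i in range(1, steps)
              if s_max - log_lik(i / steps, a, b) <= drop]
    return theta_hat, math.exp(s_max), min(inside), max(inside)

theta_hat, l_max, lower, upper = likelihood_interval(8, 2)   # data set A
print(theta_hat, round(l_max, 4), lower, upper)

# The same call handles the larger data sets, e.g. B, C and D:
for a, b in [(16, 4), (80, 20), (400, 100)]:
    print((a, b), likelihood_interval(a, b)[2:])
```

Working on the log scale avoids underflow for the larger data sets, where the likelihood itself becomes extremely small.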
14
Example - sample size
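• For $L(\theta) = (1-\theta)^a\,\theta^b$, expected Log-LRTS and average Info. content (per observation)

$$E\{G\}_{unit} = 2[(1-\theta)\log(1-\theta) + \theta\log\theta - \log 0.5]; \qquad I(\theta) = \frac{1}{\theta(1-\theta)}$$

• If true parameter values $\theta$ = 0.05, 0.1, 0.2, 0.3 respectively, then

θ       E{G}unit   I(θ)
0.05    0.99       21.1
0.10    0.74       11.1
0.20    0.39       6.3
0.30    0.17       4.8

and sample size for power 90% (1 - β = 0.9) and α = 0.05 from

$$n = \frac{10.5}{E\{G\}_{unit}}$$

where 10.5 is the non-centrality value for power 0.9, 1 dof and critical value $\chi^2_1$ = 3.84, so have

θ       Size
0.05    11
0.10    15
0.20    28
0.30    64

or, if want, say, range (d) of the C.I. = true value of parameter (i.e. d = θ), c.f. classical form

$$n = \frac{(2 \times 1.96)^2}{d^2\,I(\theta)}$$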
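The two tables on this slide can be regenerated from the formulas. This is a sketch with assumed function names; the non-centrality value 10.5 is taken from the slide rather than derived, and sample sizes are rounded up to the next whole observation.

```python
import math

NONCENTRALITY = 10.5  # from the slide: power 0.9, 1 dof, critical value 3.84

def expected_g_unit(theta):
    """Expected log-LRT statistic per observation against theta = 0.5."""
    return 2 * ((1 - theta) * math.log(1 - theta)
                + theta * math.log(theta)
                - math.log(0.5))

def information(theta):
    """Average information content per observation."""
    return 1 / (theta * (1 - theta))

def size_for_power(theta):
    """Sample size n = 10.5 / E{G}unit, rounded up."""
    return math.ceil(NONCENTRALITY / expected_g_unit(theta))

def size_for_ci_range(theta, d):
    """Classical form n = (2 * 1.96)^2 / (d^2 * I(theta)), rounded up."""
    return math.ceil((2 * 1.96) ** 2 / (d ** 2 * information(theta)))

thetas = [0.05, 0.10, 0.20, 0.30]
sizes = [size_for_power(t) for t in thetas]
for t, n in zip(thetas, sizes):
    print(t, round(expected_g_unit(t), 2), round(information(t), 1), n)

# Sample size if the C.I. range d is set equal to the true parameter value:
ci_size = size_for_ci_range(0.2, d=0.2)
```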
15
Multiple Populations: Extensions to G -Example
• Recall Mendel's data - Week 3 and Extensions to $\chi^2$ - Week 8.
In brief:

                Round             Wrinkled
Plant           O      E          O      E          G      dof   p-value
1               45     42.75      12     14.25      0.49   1     0.49
2                                                   0.09   1     0.77
3                                                   0.10   1     0.75
4                                                   1.30   1     0.26
5                                                   0.01   1     0.93
6                                                   0.71   1     0.40
7                                                   0.79   1     0.38
8                                                   0.63   1     0.43
9                                                   1.06   1     0.30
10                                                  0.17   1     0.68
Total           336               101               5.34   10
Pooled          336    327.75     101    109.25     0.85   1     0.36
Heterogeneity                                       4.50   9     0.88
16
Multiple Populations - summary
• Parallels $\chi^2$
• Partitions, therefore

$$G_{Total} = 2\sum_{i=1}^{p}\sum_{j=1}^{n} O_{ij}\log\frac{O_{ij}}{E_{ij}}$$

$$G_{Pooled} = 2\sum_{j=1}^{n}\left(\sum_{i=1}^{p} O_{ij}\right)\log\frac{\sum_{i=1}^{p} O_{ij}}{\sum_{i=1}^{p} E_{ij}}$$

and $G_{heterogeneity} = G_{Total} - G_{Pooled}$ (n = no. classes, p = no. populations)
• Example in brief: Recall Backcross (AaBb x aabb) - Goodness of fit etc. (2-locus model), Week 3. For each of the four crosses, a Total GoF statistic can be calculated according to the expected segregation ratio 1:1:1:1, which assumes no segregation distortion for both loci and no linkage between loci. For each locus, GoF is calculated using marginal counts, assuming the two genotypes segregate 1:1. The difference between the Total and the 2 individual-locus GoF statistics is the L-LRTS (or chi-squared statistic) contributed by association/linkage between the 2 loci.
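The partition can be illustrated with the Mendel figures tabulated earlier. Only the pooled counts (336 round, 101 wrinkled, expected 327.75 and 109.25) and the ten per-plant G values are available in these notes, so the sketch recomputes the pooled G from counts and takes the total as the sum of the tabulated per-plant values. Function and variable names are assumptions.

```python
import math

def g_statistic(observed, expected):
    """G = 2 * sum O * ln(O / E) over classes."""
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected))

# Per-plant G values as tabulated (each with 1 dof).
per_plant_g = [0.49, 0.09, 0.10, 1.30, 0.01, 0.71, 0.79, 0.63, 1.06, 0.17]
g_total = sum(per_plant_g)                      # dof = 10

# Pooled counts over the 10 plants (round, wrinkled), expected 3:1.
pooled_observed = [336, 101]
pooled_expected = [327.75, 109.25]
g_pooled = g_statistic(pooled_observed, pooled_expected)   # dof = 1

g_heterogeneity = g_total - g_pooled            # dof = 10 - 1 = 9
print(round(g_total, 2), round(g_pooled, 2), round(g_heterogeneity, 2))
```

Because the per-plant values are already rounded to two decimals, the total and heterogeneity figures can differ from the tabulated ones in the last digit.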
17
Class Exercise solutions
• Mendel's Peas, Week 3 - $\chi^2$ - extensions, Week 8
In brief:

                Round             Wrinkled
Plant           O      E          O      E          χ²     dof   p-value
1               45     42.75      12     14.25      0.47   1     0.49
2                                                   0.09   1     0.77
3                                                   0.10   1     0.75
4                                                   1.39   1     0.24
5                                                   0.01   1     0.93
6                                                   0.67   1     0.41
7                                                   0.76   1     0.38
8                                                   0.67   1     0.41
9                                                   0.98   1     0.32
10                                                  0.17   1     0.68
Total           336               101               5.30   10
Pooled          336    327.75     101    109.25     0.83   1     0.36
Heterogeneity                                       4.47   9     0.88

No significant departure from the expected frequencies detected for each of the 10 plants or for the pooled frequencies. The heterogeneity $\chi^2$ is also not significant. Notes - separate H0. Some differences in $\chi^2$ values compared to the G values (Lecture).
18
Class Examples contd.
• Two-way ANOVA/Additive Design, Week 8 - solution in lecture
• Backcross (Wk 3 & referred to Wk 10) - complete GoF etc. $\chi^2$ analysis (p-values in brackets)

Cross           Total    Locus A        Locus B        Linkage
1               2.13     0.06 (0.86)    0.01 (0.91)    2.09 (0.15)
2               6.60     0.03 (0.86)    0.03 (0.86)    6.53 (0.01)
3               66.00    0.33 (0.56)    0.33 (0.56)    65.33 (<0.0001)
4               11.60    0.27 (0.61)    0.07 (0.80)    11.27 (0.0008)
Total           86.33    0.66           0.45           85.22
Pooled          61.86    0.15 (0.70)    0.33 (0.56)    61.38 (<0.0001)
Heterogeneity   24.47    0.51 (0.92)    0.12 (0.99)    23.84 (<0.0001)

Each cross ~ $\chi^2_1$; Total = sum of 4 crosses.
Pooled - uses marginal frequency of 4 genotypic classes over 4 crosses (assumes no heterogeneity in Segregation Ratio among 4 crosses, for each locus and for the linkage relationship between them).
Heterogeneity - Locus A, B and Linkage ~ $\chi^2_3$ under (different) H0.
Heterogeneity overall ~ $\chi^2_9$, where dof from (4-1)(4-1), under H0.
CONCLUSIONS:
- No S.R. distortion for 2 loci (all 4 crosses)
- Significant linkage in 3 crosses (2, 3, 4)*
- Significant heterogeneity among 4 crosses found for linkage relationship between 2 loci.
- Sig. GoF statistic for heterogeneity mainly from Cross 1 compared with others, thus linkage p-value for heterogeneity GoF from 2, 3, 4 as above*
Experimentally, Cross 1 is biologically different from the others, so linkage between loci A and B could not be detected using cross 1 data.
19
Outstanding class exercises
• Likelihood C.I. for data sets B,C,D - Lectures Week 10
• Sample size calculations for range true parameter values given - Lectures Week 10
• Backcross example - to complete for G, to compare with $\chi^2$ results (Week 3, Week 8 and current)