Combining averages and single measurements in a lognormal model

Combining averages and single measurements in a lognormal model Dr. Nagaraj K. Neerchal and Justin Newcomer Department of Mathematics and Statistics University of Maryland, Baltimore County 1000 Hilltop Circle, Baltimore, MD 21250


Page 1: Combining averages and single measurements in a lognormal model

Combining averages and single measurements in a lognormal model

Dr. Nagaraj K. Neerchal and Justin Newcomer

Department of Mathematics and Statistics, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250

Page 2

Motivating Example

Goal: to develop a protocol (methodology) for obtaining confidence bounds for the “Mean Emissions” of each welding process and rod type combination, incorporating all of the data.

Three welding processes

Three rod types

Multiple sources of data: some report individual measurements; some report only averages, without the original observations.

Page 3

The Data

NTESTS is the number of tests used in reporting each Chromium value (1 = a single measurement, 3 = an average of three tests).

Welding Process   Rod Type   NTESTS   Chromium
SMAW              E308       3        0.384
SMAW              E309       1        0.8193
SMAW              E309       1        0.8604
SMAW              E309       1        0.1995
SMAW              E309       3        0.484
SMAW              E308       1        0.9316
SMAW              E308       1        1.011
SMAW              E308       3        0.494
SMAW              E308       3        0.584
GMAW              E309       1        4.76
GMAW              E309       1        6.51
GMAW              E308       3        0.324
GMAW              E309       1        0.7285
GMAW              E308       1        0.898
GMAW              E308       1        1.3
GMAW              E308       3        0.532
FCAW              E309       1        2.42
FCAW              E309       1        2.82
FCAW              E309       1        2.86
FCAW              E308       1        1.86
FCAW              E308       1        3.04
FCAW              E308       3        0.265
FCAW              E308       3        1.205

Page 4

Traditional Approaches

Assume normality?

Sample sizes are very small for certain combinations.

Here the bounds obtained by assuming normality give meaningless results (e.g. negative bounds).

Transform the data to normality?

In environmental studies, particularly with concentration measurements, the data most often tend to be skewed; therefore there is a temptation to use the lognormal model.

It is hard to transform the confidence bounds back to the original scale (the mean of the log is not the same as the log of the mean!).
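Both pitfalls can be seen in the data table. The sketch below (Python, standard library only) uses the three single GMAW/E309 measurements: a normal-theory two-sided 95% t interval on so few points has a negative lower bound, and exponentiating the mean of the logs (the geometric mean) lands below the arithmetic mean, so naive back-transformation does not recover the mean. The choice of subgroup and of a two-sided t interval are illustrative assumptions; the t quantile for 2 degrees of freedom is hard-coded to avoid a SciPy dependency.

```python
import math
import statistics

# Single measurements (NTESTS = 1) for GMAW with rod type E309, from the data table.
gmaw_e309 = [4.76, 6.51, 0.7285]

T_975_DF2 = 4.302653  # 97.5th percentile of Student's t with 2 degrees of freedom

def normal_t_interval(x, t_quantile):
    """Two-sided normal-theory confidence interval for the mean."""
    m = statistics.mean(x)
    half_width = t_quantile * statistics.stdev(x) / math.sqrt(len(x))
    return m - half_width, m + half_width

lower, upper = normal_t_interval(gmaw_e309, T_975_DF2)
print(f"95% t interval: ({lower:.2f}, {upper:.2f})")  # the lower bound is negative

# exp(mean of logs) is the geometric mean, which never exceeds the arithmetic mean:
# the mean of the log is not the log of the mean.
geometric = math.exp(statistics.mean(math.log(v) for v in gmaw_e309))
arithmetic = statistics.mean(gmaw_e309)
print(f"exp(mean of logs) = {geometric:.3f}, mean = {arithmetic:.3f}")
```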

Page 5

Traditional Approaches

Weighted regression?

The estimates have good properties in general; for example, they are best linear unbiased estimates (BLUE).

But the confidence bounds are sensitive to the normality assumption, especially when the sample sizes are small, as in our case.

Nonparametric approaches?

Nonparametric approaches usually use ranks. When only averages are reported, we completely lose the information regarding ranks. Therefore, means cannot be incorporated into nonparametric approaches.
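As a minimal sketch of the weighted-regression idea, assume each reported value is the average of NTESTS independent measurements with a common variance σ², so its variance is σ²/NTESTS. Weighted least squares for a common mean then weights each value by its NTESTS, which gives the BLUE of the mean. The function name is hypothetical; the numbers are the SMAW/E308 rows of the data table.

```python
def weighted_mean(values, ntests):
    """BLUE of a common mean when values[i] is the average of ntests[i] measurements.

    Weighted least squares with weights proportional to 1/Var = ntests[i] / sigma^2.
    """
    total_weight = sum(ntests)
    return sum(n * v for v, n in zip(values, ntests)) / total_weight

# SMAW / E308 rows from the data table: (Chromium, NTESTS)
smaw_e308 = [(0.384, 3), (0.9316, 1), (1.011, 1), (0.494, 3), (0.584, 3)]
values = [v for v, _ in smaw_e308]
ntests = [n for _, n in smaw_e308]
est = weighted_mean(values, ntests)
print(round(est, 4))
```

The point estimate itself does not rely on normality; the concern on this slide is with the confidence bounds built around it.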

Page 6

The Data – In General

Group   Individual data points            Sample mean   Sample variance
0       X_{01}, X_{02}, …, X_{0n_0}       X̄_0           S_0²
1       X_{11}, X_{12}, …, X_{1n_1}       X̄_1           S_1²
2       X_{21}, X_{22}, …, X_{2n_2}       X̄_2           S_2²
⋮       ⋮                                 ⋮             ⋮
k       X_{k1}, X_{k2}, …, X_{kn_k}       X̄_k           S_k²

Not available: the individual data points for groups 1, …, k; only their sample means are reported.


Page 7

The Setup

Our goal is to estimate the mean and variance of a population of lognormal random variables under the following setup.

Consider k + 1 groups of observations

$$X_{j1}, X_{j2}, \ldots, X_{jn_j}, \qquad j = 0, 1, \ldots, k,$$

where each observation has the lognormal density

$$f(x_{ji}) = \frac{1}{x_{ji}\,\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2\sigma^2}\left(\ln x_{ji} - \mu\right)^2\right\}, \qquad x_{ji} > 0.$$

The observations for the first group (j = 0) are available, but for the remaining k groups only the average of the observations (i.e. $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_k$) is available.
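To experiment with the estimators on the following slides, it helps to generate data with exactly this structure. The sketch below (Python with NumPy) is a hypothetical helper, not part of the original method: group 0 keeps its n_0 individual lognormal observations, while groups 1, …, k are reduced to their averages. NumPy's `mean` and `sigma` arguments are the log-scale μ and σ.

```python
import numpy as np

def simulate_setup(mu, sigma, n0, group_sizes, rng):
    """Simulate the setup: n0 individual lognormal observations (group 0) plus
    one average for each group j = 1..k, where group j has group_sizes[j-1] members.

    mu and sigma are the mean and standard deviation of ln(X).
    """
    group0 = rng.lognormal(mean=mu, sigma=sigma, size=n0)
    averages = np.array([rng.lognormal(mean=mu, sigma=sigma, size=n).mean()
                         for n in group_sizes])
    return group0, averages

rng = np.random.default_rng(2024)
x0, xbar = simulate_setup(mu=0.0, sigma=0.5, n0=5, group_sizes=[3, 3, 4], rng=rng)
print(x0.round(3), xbar.round(3))
```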

Page 8

Normality Approach – Large Sample

In practice it is common to assume normality when the sample sizes are large.

In this case the sample means and sample variances are sufficient statistics, and therefore the individual observations are not needed.

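Page 9

Normality Approach – Large Sample

Assume the $n_j$'s are large. Then, approximately,

$$\bar{X}_j \sim \text{Normal}\left(\mu_X, \frac{\sigma_X^2}{n_j}\right), \qquad j = 1, \ldots, k,$$

where

$$\mu_X = E(X) = e^{\mu + \sigma^2/2} \quad\text{and}\quad \sigma_X^2 = \operatorname{Var}(X) = e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right).$$

Treating the individual observations $x_{01}, \ldots, x_{0n_0}$ as Normal$(\mu_X, \sigma_X^2)$ as well, the likelihood then reduces to

$$L(\mu_X, \sigma_X^2) = (2\pi)^{-n_0/2}(\sigma_X^2)^{-n_0/2} \exp\left\{-\frac{1}{2\sigma_X^2}\sum_{i=1}^{n_0}(x_{0i} - \mu_X)^2\right\} \times (2\pi)^{-k/2}(\sigma_X^2)^{-k/2}(n_1 n_2 \cdots n_k)^{1/2} \exp\left\{-\frac{1}{2\sigma_X^2}\sum_{j=1}^{k} n_j(\bar{x}_j - \mu_X)^2\right\}.$$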

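Page 10

Normality Approach – Large Sample

This gives us the following normal equations in $\mu_X = E(X)$ and $\sigma_X^2 = \operatorname{Var}(X)$:

$$\frac{\partial \ln L}{\partial \mu_X} = \frac{1}{\sigma_X^2}\left[\sum_{i=1}^{n_0}(x_{0i} - \mu_X) + \sum_{j=1}^{k} n_j(\bar{x}_j - \mu_X)\right] = 0$$

$$\frac{\partial \ln L}{\partial \sigma_X} = -\frac{n_0 + k}{\sigma_X} + \frac{1}{\sigma_X^3}\left[\sum_{i=1}^{n_0}(x_{0i} - \mu_X)^2 + \sum_{j=1}^{k} n_j(\bar{x}_j - \mu_X)^2\right] = 0$$

which give us the following MLEs:

$$\hat{\mu}_{X,\text{mle}} = \frac{\sum_{i=1}^{n_0} x_{0i} + \sum_{j=1}^{k} n_j \bar{x}_j}{n_0 + \sum_{j=1}^{k} n_j}$$

$$\hat{\sigma}^2_{X,\text{mle}} = \frac{1}{n_0 + k}\left[\sum_{i=1}^{n_0}(x_{0i} - \hat{\mu}_{X,\text{mle}})^2 + \sum_{j=1}^{k} n_j(\bar{x}_j - \hat{\mu}_{X,\text{mle}})^2\right]$$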
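The closed-form MLEs of the large-sample normal approach are straightforward to compute. The sketch below is a hypothetical helper in plain Python: `x0` holds the individual observations, and `xbar` and `n` hold the reported averages and their group sizes. The usage line borrows the GMAW/E308 rows of the data table purely for illustration, even though groups this small are exactly where this approximation is questionable.

```python
def normal_large_sample_mle(x0, xbar, n):
    """MLEs of mu_X = E(X) and sigma_X^2 = Var(X) under the large-sample normal model.

    x0:   individual observations (group 0)
    xbar: reported averages for groups 1..k
    n:    group sizes n_1..n_k matching xbar
    """
    n0, k = len(x0), len(xbar)
    mu_hat = (sum(x0) + sum(nj * m for nj, m in zip(n, xbar))) / (n0 + sum(n))
    ss = sum((x - mu_hat) ** 2 for x in x0) + sum(
        nj * (m - mu_hat) ** 2 for nj, m in zip(n, xbar))
    sigma2_hat = ss / (n0 + k)  # note the divisor n0 + k, not n0 + sum(n)
    return mu_hat, sigma2_hat

# GMAW / E308: two single measurements and two averages of 3 tests (data table)
mu_hat, sigma2_hat = normal_large_sample_mle([0.898, 1.3], [0.324, 0.532], [3, 3])
print(round(mu_hat, 4), round(sigma2_hat, 4))
```

With no averages, the estimates reduce to the usual sample mean and the variance with divisor $n_0$.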

Page 11

Normality Approach – Large Sample

Remarks

Although this method works well for large samples, in practice sample means are often based on a small number of observations, such as n = 2, 3, 4.

In this case, when the original data follow a lognormal distribution, the sample mean is not well approximated by a normal distribution.

Our goal then becomes finding the distribution of the sample mean of a random sample of lognormal random variables.
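One way to quantify the problem: if X is lognormal with log-scale standard deviation σ, its skewness is $(e^{\sigma^2} + 2)\sqrt{e^{\sigma^2} - 1}$, and the skewness of the mean of n independent copies is that value divided by $\sqrt{n}$. The sketch below (plain Python; the σ values are illustrative choices, not estimates from the data) shows that for n = 2, 3, 4 the sample mean stays clearly skewed, whereas any normal distribution has skewness 0.

```python
import math

def lognormal_skewness(sigma):
    """Skewness of a lognormal variable whose log has standard deviation sigma."""
    w = math.exp(sigma ** 2)
    return (w + 2) * math.sqrt(w - 1)

def skewness_of_mean(sigma, n):
    """Skewness of the mean of n iid lognormal variables (skewness scales as 1/sqrt(n))."""
    return lognormal_skewness(sigma) / math.sqrt(n)

for sigma in (0.5, 1.0):
    print(sigma, [round(skewness_of_mean(sigma, n), 2) for n in (2, 3, 4)])
```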

Page 12

Assume Lognormal – Naïve

In practice, a common naïve approach is to assume that the sample means are lognormal random variables.

This would imply that

$$\ln(X_i) \sim \text{Normal}(\mu, \sigma^2) \quad\Longrightarrow\quad \ln(\bar{X}) \sim \text{Normal}\left(\mu, \frac{\sigma^2}{n}\right).$$

However, this does not hold… Why?
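One way to answer the question, given here as a worked check rather than as the authors' own argument: averaging does not change the mean, so $E(\bar{X})$ must equal $E(X_i)$, but the naïve model implies a smaller mean whenever $n > 1$.

```latex
E(\bar{X}) = E(X_i) = e^{\mu + \sigma^2/2},
\qquad\text{whereas}\qquad
\ln(\bar{X}) \sim \text{Normal}\!\left(\mu, \tfrac{\sigma^2}{n}\right)
\;\Rightarrow\;
E(\bar{X}) = e^{\mu + \sigma^2/(2n)} < e^{\mu + \sigma^2/2}
\quad (n > 1,\ \sigma^2 > 0).
```

The naïve distribution is exact for the geometric mean $(X_1 X_2 \cdots X_n)^{1/n}$, whose logarithm is the average of the $\ln X_i$; it is the arithmetic mean $\bar{X}$ that is not lognormal.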

Page 13

Direct Approach

The exact approach to this problem is to derive the distribution of $\bar{X}_j$ by convolving the distributions of $X_{j1}, X_{j2}, \ldots, X_{jn_j}$, $j = 1, \ldots, k$.

Hence, we can write the likelihood function as

$$L(\mu, \sigma^2) = \prod_{i=1}^{n_0} f(x_{0i} \mid \mu, \sigma^2) \times \prod_{j=1}^{k} f_j(\bar{x}_j \mid \mu, \sigma^2),$$

where $f_j$ is the probability distribution of $\bar{X}_j$, $j = 1, 2, \ldots, k$.

The problem is that the distribution of the sum of lognormal random variables does not have a closed form, and therefore $L(\mu, \sigma^2)$ does not have a closed form either.

Page 14

Numerical Approximation

We can approximate the convolution numerically by replacing the integral with a sum. For two observations, with $W = X_1 + X_2$,

$$f_W(w) = \int_0^{w} f_{X_1}(x_1)\, f_{X_2}(w - x_1)\, dx_1 \;\approx\; \sum_{x_1} f_{X_1}(x_1)\, f_{X_2}(w - x_1)\, \Delta x_1,$$

and repeating this step gives the density of $W = X_1 + X_2 + \cdots + X_n$, from which $\bar{X} = W/n$ has density $f_{\bar{X}}(\bar{x}) = n\, f_W(n\bar{x})$.

For small samples, n = 2, 3, 4, it can be seen that the plot of $f_{\bar{X}}$ is better approximated by a lognormal distribution with an adjusted mean and variance than by a normal distribution.
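A minimal NumPy sketch of the discretized convolution, assuming an equally spaced grid on [0, upper) with step h, where `upper` is chosen large enough that the lognormal tail mass beyond it is negligible. The function names and the default grid settings are hypothetical choices, not the authors' implementation. Each `np.convolve(...) * h` call is the Riemann sum above, and the last line applies the change of variables from W to $\bar{X} = W/n$.

```python
import numpy as np

def lognormal_pdf(x, mu, sigma):
    """Lognormal density with log-scale parameters (mu, sigma); 0 for x <= 0."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    pos = x > 0
    out[pos] = np.exp(-(np.log(x[pos]) - mu) ** 2 / (2 * sigma ** 2)) / (
        x[pos] * sigma * np.sqrt(2 * np.pi))
    return out

def mean_density_by_convolution(mu, sigma, n, h=0.01, upper=15.0):
    """Approximate density of the mean of n iid lognormals on a grid."""
    grid = np.arange(0.0, upper, h)
    f = lognormal_pdf(grid, mu, sigma)
    f_w = f.copy()
    for _ in range(n - 1):
        # f_W(w) ~ sum over x of f_W'(x) f(w - x) h, evaluated at w = m * h
        f_w = np.convolve(f_w, f) * h
    w = np.arange(len(f_w)) * h
    # Change of variables: xbar = w / n has density n * f_W(n * xbar)
    return w / n, n * f_w

xbar, dens = mean_density_by_convolution(mu=0.0, sigma=0.5, n=3)
dx = xbar[1] - xbar[0]
print((dens * dx).sum(), (xbar * dens * dx).sum())  # total mass and mean
```

The mass should be close to 1 and the mean close to $e^{\mu + \sigma^2/2}$; plotting `dens` against `xbar` gives the kind of curve compared with the normal and adjusted lognormal approximations.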

Page 15

Numerical Approximation

Page 16

Numerical Approximation

Remarks

Here a separate approximation must be performed for each sample mean.

Therefore this approach can become computationally intensive, since the numerical approximations must be recomputed at every iteration of the likelihood maximization.

The simulations show that a lognormal model with an adjusted mean and variance is a good fit when the sample sizes are small.

Page 17

Adjusted Lognormal Distribution

Here we assume that $\bar{X}_j$, $j = 1, \ldots, k$, approximately follows a lognormal distribution with parameters

$$(\mu^*, \sigma^{*2}) = \left(g_1(\mu, \sigma^2),\ g_2(\mu, \sigma^2)\right).$$

We then have

$$E(\bar{X}_j) = e^{\mu^* + \sigma^{*2}/2}, \qquad \operatorname{Var}(\bar{X}_j) = e^{2\mu^* + 2\sigma^{*2}} - e^{2\mu^* + \sigma^{*2}}.$$

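Page 18

Adjusted Lognormal Distribution

Also, since the original sample comes from a lognormal distribution, we have

$$E(\bar{X}_j) = e^{\mu + \sigma^2/2}, \qquad \operatorname{Var}(\bar{X}_j) = \frac{e^{2\mu + 2\sigma^2} - e^{2\mu + \sigma^2}}{n_j}.$$

Equating the expected values and variances gives us

$$\mu^* = g_1(\mu, \sigma^2) = (2\mu + \sigma^2) - \frac{1}{2}\ln\left(\frac{e^{2\mu + 2\sigma^2} - e^{2\mu + \sigma^2}}{n_j} + e^{2\mu + \sigma^2}\right)$$

$$\sigma^{*2} = g_2(\mu, \sigma^2) = \ln\left(\frac{e^{2\mu + 2\sigma^2} - e^{2\mu + \sigma^2}}{n_j} + e^{2\mu + \sigma^2}\right) - (2\mu + \sigma^2)$$

Note that $g_1$ and $g_2$ depend on the group size $n_j$; equivalently, $\sigma^{*2} = \ln\left(1 + (e^{\sigma^2} - 1)/n_j\right)$ and $\mu^* = \mu + \sigma^2/2 - \sigma^{*2}/2$.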
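The moment matching is easy to check numerically. The sketch below (plain Python; the function names and the values μ = 0.3, σ² = 0.64, n = 3 are illustrative assumptions) computes $g_1$ and $g_2$ as written above and confirms that a lognormal with parameters $(\mu^*, \sigma^{*2})$ has the same mean as X and variance Var(X)/n.

```python
import math

def adjusted_lognormal_params(mu, sigma2, n):
    """Moment-matched (mu*, sigma*^2) = (g_1, g_2) for the mean of n iid
    lognormal(mu, sigma2) observations."""
    a = 2 * mu + sigma2
    inner = (math.exp(2 * mu + 2 * sigma2) - math.exp(a)) / n + math.exp(a)
    g1 = a - 0.5 * math.log(inner)
    g2 = math.log(inner) - a
    return g1, g2

def lognormal_mean_var(mu, sigma2):
    """Mean and variance of a lognormal variable with log-scale parameters (mu, sigma2)."""
    mean = math.exp(mu + sigma2 / 2)
    var = math.exp(2 * mu + 2 * sigma2) - math.exp(2 * mu + sigma2)
    return mean, var

mu, sigma2, n = 0.3, 0.64, 3
g1, g2 = adjusted_lognormal_params(mu, sigma2, n)
mean_x, var_x = lognormal_mean_var(mu, sigma2)
mean_bar, var_bar = lognormal_mean_var(g1, g2)
print(round(g1, 4), round(g2, 4))
print(math.isclose(mean_bar, mean_x), math.isclose(var_bar, var_x / n))
```

With n = 1 the adjustment disappears and $(g_1, g_2) = (\mu, \sigma^2)$, as it should.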

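Page 19

Adjusted Lognormal Distribution

Therefore we have

$$\ln(X_{0i}) \sim \text{Normal}(\mu, \sigma^2), \qquad i = 1, 2, \ldots, n_0,$$

$$\ln(\bar{X}_j) \sim \text{Normal}\left(g_1(\mu, \sigma^2),\ g_2(\mu, \sigma^2)\right), \qquad j = 1, 2, \ldots, k,$$

which gives us the following likelihood function (writing $\mu^*_j$ and $\sigma^{*2}_j$ for $g_1$ and $g_2$ evaluated with group size $n_j$):

$$L(\mu, \sigma^2) = (2\pi)^{-n_0/2}(\sigma^2)^{-n_0/2}\left(\prod_{i=1}^{n_0} x_{0i}\right)^{-1} \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n_0}\left(\ln x_{0i} - \mu\right)^2\right\} \times (2\pi)^{-k/2}\left(\prod_{j=1}^{k} \sigma^{*2}_j\right)^{-1/2}\left(\prod_{j=1}^{k} \bar{x}_j\right)^{-1} \exp\left\{-\sum_{j=1}^{k}\frac{\left(\ln \bar{x}_j - \mu^*_j\right)^2}{2\sigma^{*2}_j}\right\}$$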

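Page 20

Adjusted Lognormal Distribution

Since $\mu^*_j$ and $\sigma^{*2}_j$ are both functions of $\mu$ and $\sigma^2$, setting the partial derivatives of $\ln L$ to zero gives us the following normal equations:

$$\frac{1}{\sigma^2}\sum_{i=1}^{n_0}(\ln x_{0i} - \mu) + \sum_{j=1}^{k}\left[\frac{\ln \bar{x}_j - \mu^*_j}{\sigma^{*2}_j}\,\frac{\partial \mu^*_j}{\partial \mu} + \left(\frac{(\ln \bar{x}_j - \mu^*_j)^2}{2\sigma^{*4}_j} - \frac{1}{2\sigma^{*2}_j}\right)\frac{\partial \sigma^{*2}_j}{\partial \mu}\right] = 0$$

$$-\frac{n_0}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n_0}(\ln x_{0i} - \mu)^2 + \sum_{j=1}^{k}\left[\frac{\ln \bar{x}_j - \mu^*_j}{\sigma^{*2}_j}\,\frac{\partial \mu^*_j}{\partial \sigma^2} + \left(\frac{(\ln \bar{x}_j - \mu^*_j)^2}{2\sigma^{*4}_j} - \frac{1}{2\sigma^{*2}_j}\right)\frac{\partial \sigma^{*2}_j}{\partial \sigma^2}\right] = 0$$

The numerical solutions of these equations give the MLEs of $\mu$ and $\sigma^2$, and hence the MLEs of the lognormal mean $e^{\mu + \sigma^2/2}$ and variance $e^{2\mu + \sigma^2}(e^{\sigma^2} - 1)$ (by the invariance property).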
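A minimal sketch of the numerical step, assuming SciPy is available. Instead of solving the normal equations directly, it maximizes the same log-likelihood with SciPy's Nelder-Mead optimizer, which leads to the same MLEs when the optimizer converges. The parameterization $(\mu, \ln\sigma^2)$, which keeps $\sigma^2$ positive, the starting values, the helper names, and the simulated data in the usage lines are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def adjusted_params(mu, sigma2, n):
    """g_1, g_2 for each group size in the array n (moment-matched lognormal)."""
    a = 2 * mu + sigma2
    inner = (np.exp(2 * mu + 2 * sigma2) - np.exp(a)) / n + np.exp(a)
    return a - 0.5 * np.log(inner), np.log(inner) - a

def neg_log_lik(theta, x0, xbar, n):
    """Negative log-likelihood under the adjusted lognormal model, dropping terms
    that do not depend on (mu, sigma^2). theta = (mu, log sigma^2)."""
    mu, sigma2 = theta[0], np.exp(theta[1])
    ll0 = -0.5 * len(x0) * np.log(sigma2) - np.sum((np.log(x0) - mu) ** 2) / (2 * sigma2)
    mu_star, s2_star = adjusted_params(mu, sigma2, n)  # one (mu*_j, sigma*_j^2) per group
    ll1 = np.sum(-0.5 * np.log(s2_star) - (np.log(xbar) - mu_star) ** 2 / (2 * s2_star))
    return -(ll0 + ll1)

def fit_adjusted_lognormal(x0, xbar, n):
    x0, xbar, n = (np.asarray(v, dtype=float) for v in (x0, xbar, n))
    logs = np.log(np.concatenate([x0, xbar]))
    start = np.array([logs.mean(), np.log(logs.var() + 1e-6)])
    res = minimize(neg_log_lik, start, args=(x0, xbar, n), method="Nelder-Mead")
    mu_hat, sigma2_hat = res.x[0], np.exp(res.x[1])
    # Invariance: MLEs of the lognormal mean and variance on the original scale
    mean_hat = np.exp(mu_hat + sigma2_hat / 2)
    var_hat = np.exp(2 * mu_hat + sigma2_hat) * (np.exp(sigma2_hat) - 1)
    return mu_hat, sigma2_hat, mean_hat, var_hat

# Simulated check: true mu = 0.2, sigma^2 = 0.25; 200 single values, 300 averages of 3
rng = np.random.default_rng(7)
n = np.array([3] * 300)
x0 = rng.lognormal(0.2, 0.5, size=200)
xbar = np.array([rng.lognormal(0.2, 0.5, size=m).mean() for m in n])
mu_hat, sigma2_hat, mean_hat, var_hat = fit_adjusted_lognormal(x0, xbar, n)
print(mu_hat, sigma2_hat, mean_hat, var_hat)
```

Because $g_1$ and $g_2$ are closed-form, each likelihood evaluation is cheap; this is the advantage over the convolution approach noted on the next slide.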

Page 21

Adjusted Lognormal Distribution

Remarks

This method works well when dealing with small sample sizes, n = 2, 3, 4.

The likelihood becomes quite complicated, and therefore numerical methods must be employed to obtain the MLEs of the parameters.

It has an advantage over the convolution approach, since the approximations do not need to be recomputed at each iteration.

Page 22

Conclusions

The distribution of the mean of lognormal observations does not have a useful closed-form expression.

Approximations, either by a normal distribution when the sample size is large or by a lognormal distribution (with appropriately chosen parameters) when the sample size is small, can be used to obtain estimates of the population parameters.

Page 23

Future Work

Implementation of these methods within standard software packages, such as PROC NLIN in SAS

Performing simulation studies, such as Monte Carlo experiments, to explore the efficiency of these methods

Exploring other numerical methods, such as the EM algorithm, for obtaining the MLEs

Generalizing these methods to other standard power transformations

Page 24

Bootstrapping

What is bootstrapping?

Resampling the observed data.

It is a simulation-type method in which the observed data (not a mathematical model) are repeatedly resampled to generate representative data sets.

The only indispensable assumption is that “observations are a random sample from a single population”.

Some fixes are available when the single-population assumption is violated, as in our case.

It can be implemented in quite a few software packages, e.g. S-PLUS and SAS.

Millard and Neerchal (2000) give S-PLUS code.

Page 25

Bootstrapping – The Details

Data: X = (X_1, X_2, X_3, …, X_n)        Statistic: T = T(X)

rep #1:   X*1 = (X*_1, X*_2, X*_3, …, X*_n)   T*1 = T(X*1)
rep #2:   X*2 = (X*_1, X*_2, X*_3, …, X*_n)   T*2 = T(X*2)
…
rep #B:   X*B = (X*_1, X*_2, X*_3, …, X*_n)   T*B = T(X*B)

Bootstrap inference is based on the distribution of the replicated values of the statistic, T*1, T*2, …, T*B. For example, a bootstrap 95% upper confidence bound based on T is given by the 95th percentile of the distribution of the T*'s.
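The replication scheme above can be sketched in a few lines of standard-library Python. This is a minimal percentile bootstrap, assuming the sample mean as the statistic T, B = 2000 replicates, and the three single FCAW/E309 measurements from the data table as illustrative input; the function name and the exact percentile convention are assumptions, not the S-PLUS code cited above.

```python
import math
import random
import statistics

def bootstrap_ucb(data, statistic=statistics.mean, B=2000, level=0.95, seed=None):
    """Percentile-bootstrap upper confidence bound for statistic(data)."""
    rng = random.Random(seed)
    n = len(data)
    # T*_b = T(X*_b), where X*_b is drawn with replacement from the observed data
    reps = sorted(statistic(rng.choices(data, k=n)) for _ in range(B))
    # Empirical level-quantile: smallest T* with at least level * B replicates at or below it
    return reps[math.ceil(level * B) - 1]

fcaw_e309 = [2.42, 2.82, 2.86]  # FCAW / E309 single measurements (data table)
ucb = bootstrap_ucb(fcaw_e309, seed=1)
print(ucb)
```

With only three observations the bootstrap distribution takes just a handful of distinct values, so such a bound is rough; this is part of why the data are pooled as described on the next slide.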

Page 26

Bootstrapping the Combined Data

Group the data points according to the number of tests used in reporting the average, within each welding process and rod type combination. Then bootstrap within each such group.

For example, for GMAW and E316:

Note: on the original slide, each color represents a separate group.

Source of Data   Welding Process   Rod Type   NTESTS   Chromium   Chromium 6 (g/kg)
NSRP 0587        GMAW              E316       1        0.898      0.0457
NSRP 0587        GMAW              E316       1        1.3        0.0169
NSRP 0587        GMAW              E316       1        0.899      0.0074
CARB             GMAW              E316       3                   0.025
AP-42            GMAW              E316       3        0.532      0.007
CARB             GMAW              E316       4                   0.0086

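A minimal sketch of bootstrapping within groups, in standard-library Python. Each replicate resamples with replacement inside each NTESTS group (so group sizes stay fixed) and then pools the groups. The slides do not say which statistic is computed on the pooled replicate; the NTESTS-weighted mean used here is an assumption, as are the function name, B = 2000, and the use of the GMAW/E308 rows of the data table as input.

```python
import math
import random
from collections import defaultdict

def stratified_bootstrap_ucb(records, B=2000, level=0.95, seed=None):
    """Percentile-bootstrap upper bound, resampling within NTESTS groups.

    records: (ntests, value) pairs for one welding process / rod type combination.
    The pooled statistic (an NTESTS-weighted mean) is an illustrative assumption.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for ntests, value in records:
        groups[ntests].append(value)
    reps = []
    for _ in range(B):
        num = den = 0.0
        for ntests, values in groups.items():
            # resample only within this group, keeping its size fixed
            for v in rng.choices(values, k=len(values)):
                num += ntests * v
                den += ntests
        reps.append(num / den)
    reps.sort()
    return reps[math.ceil(level * B) - 1]

# GMAW / E308 from the data table: (NTESTS, Chromium)
gmaw_e308 = [(1, 0.898), (1, 1.3), (3, 0.324), (3, 0.532)]
ucb = stratified_bootstrap_ucb(gmaw_e308, seed=1)
print(ucb)
```

Resampling within groups keeps averages and single measurements from being swapped for one another, which is the "fix" for the single-population assumption that the grouping is meant to provide.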