Chapter 6 Statistics: Part 1 Parameter Estimation

Notes and figures are based on or taken from materials in the course textbook: Probability, Statistics and Random Processes for Engineers, 4th ed., Henry Stark and John W. Woods, Pearson Education, Inc., 2012.

B.J. Bazuin, Fall 2016, ECE 3800

Henry Stark and John W. Woods, Probability, Statistics, and Random Processes for Engineers, 4th ed., Pearson Education Inc., 2012. ISBN: 978-0-13-231123-6

Chapter 6 Statistics: Part 1 Parameter Estimation

Sections

6.1 Introduction
    Independent, Identically Distributed (i.i.d.) Observations
    Estimation of Probabilities
6.2 Estimators
6.3 Estimation of the Mean
    Properties of the Mean-Estimator Function (MEF)
    Procedure for Getting a δ-Confidence Interval on the Mean of a Normal Random Variable When σX Is Known
    Confidence Interval for the Mean of a Normal Distribution When σX Is Not Known
    Procedure for Getting a δ-Confidence Interval Based on n Observations on the Mean of a Normal Random Variable When σX Is Not Known
    Interpretation of the Confidence Interval
6.4 Estimation of the Variance and Covariance
    Confidence Interval for the Variance of a Normal Random Variable
    Estimating the Standard Deviation Directly
    Estimating the Covariance
6.5 Simultaneous Estimation of Mean and Variance
6.6 Estimation of Non-Gaussian Parameters from Large Samples
6.7 Maximum Likelihood Estimators
6.8 Ordering, More on Percentiles, Parametric Versus Nonparametric Statistics
    The Median of a Population Versus Its Mean
    Parametric Versus Nonparametric Statistics
    Confidence Interval on the Percentile
    Confidence Interval for the Median When n Is Large
6.9 Estimation of Vector Means and Covariance Matrices
    Estimation of μ
    Estimation of the Covariance K
6.10 Linear Estimation of Vector Parameters
Summary
Problems
References
Additional Reading



6.1 Introduction

Statistics Definition: The science of assembling, classifying, tabulating, and analyzing data or facts:

Descriptive statistics – collecting, grouping, and presenting data in a way that can be easily understood or assimilated.

Inductive statistics or statistical inference – using data to draw conclusions about, or estimate parameters of, the environment from which the data came.

Theoretical Areas:

Sampling Theory – selecting samples from a collection of data that is too large to be examined completely.

Estimation Theory – concerned with making estimates or predictions based on the data that are available.

Hypothesis Testing – attempts to decide which of two or more hypotheses about the data is true.

Curve fitting and regression – attempt to find mathematical expressions that best represent the data. (Shown in Chap. 4)

Analysis of Variance – attempt to assess the significance of variations in the data and the relation of these variances to the physical situations from which the data arose. (Modern term ANOVA)

We will focus on parameter estimation (Chap. 6) and hypothesis testing (Chap. 7)



Sampling Theory – The Sample Mean

How many samples are required to find a representative sample set that provides confidence in the results?

Defect testing, opinion polls, infection rates, etc.

Definitions

Population: the collection of data being studied. N is the size of the population.

Sample: a random sample is the part of the population selected. All members of the population must be equally likely to be selected! n is the size of the sample.

Sample Mean: the average of the numerical values that make up the sample.

Population: N

Sample set: $S = \{x_1, x_2, x_3, x_4, x_5, \ldots, x_n\}$

Sample Mean

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

To generalize, describe the statistical properties of arbitrary random samples rather than those of any particular sample.

Sample Mean

$$\hat{\bar{X}} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

where the $X_i$ are random variables with a pdf.

Notice that for a pdf the true mean, $\bar{X}$, can be computed, while for a sample data set the sample mean above, $\hat{\bar{X}}$, is computed.



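As may be noted, the sample mean is a combination of random variables and, therefore, can also be considered a random variable. As a result, the hoped-for result can be derived as:

$$E\left[\hat{\bar{X}}\right] = E\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{n}\sum_{i=1}^{n} \bar{X} = \bar{X}$$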

If and when this is true, the estimate is said to be an unbiased estimate.

Though the sample mean may be unbiased, the sample mean may still not provide a good estimate.

What is the “variance” of the computation of the sample mean?

Variance of the sample mean (the mean itself, not the value of X)

You would expect the sample mean to have some variance about the “probabilistic” or actual mean; therefore, it is also desirable to know something about the fluctuations around the mean. As a result, computation of the variance of the sample mean is desired.

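For N >> n, or N → ∞ (or even a known pdf), the variance of the sample mean follows from the prior definition of variance: the 2nd moment minus the square of the mean.

$$\mathrm{Var}\left(\hat{\bar{X}}\right) = E\left[\hat{\bar{X}}^2\right] - \left(E\left[\hat{\bar{X}}\right]\right)^2 = E\left[\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)^2\right] - \bar{X}^2$$

$$\mathrm{Var}\left(\hat{\bar{X}}\right) = E\left[\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)\left(\frac{1}{n}\sum_{j=1}^{n} X_j\right)\right] - \bar{X}^2$$

$$\mathrm{Var}\left(\hat{\bar{X}}\right) = E\left[\frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} X_i X_j\right] - \bar{X}^2$$

$$\mathrm{Var}\left(\hat{\bar{X}}\right) = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} E\left[X_i X_j\right] - \bar{X}^2$$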

For $X_i$ independent (measurements are independent of each other)

$$E[X_i X_j] = E[X_i^2] = \overline{X^2}, \quad \text{for } i = j$$

$$E[X_i X_j] = E[X_i]\,E[X_j] = \bar{X}^2, \quad \text{for } i \ne j$$



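As a result, we can split the double summation into the terms where i = j and the terms where i ≠ j:

$$\mathrm{Var}\left(\hat{\bar{X}}\right) = \frac{1}{n^2}\sum_{i=1}^{n}\left(E[X_i X_i] + \sum_{j=1, j \ne i}^{n} E[X_i X_j]\right) - \bar{X}^2$$

$$\mathrm{Var}\left(\hat{\bar{X}}\right) = \frac{1}{n^2}\left(n\,E[X_i^2] + (n^2 - n)\,E[X_i]^2\right) - \bar{X}^2$$

$$\mathrm{Var}\left(\hat{\bar{X}}\right) = \frac{1}{n}\overline{X^2} + \frac{n^2 - n}{n^2}\bar{X}^2 - \bar{X}^2$$

$$\mathrm{Var}\left(\hat{\bar{X}}\right) = \frac{1}{n}\overline{X^2} - \frac{1}{n}\bar{X}^2 = \frac{\overline{X^2} - \bar{X}^2}{n} = \frac{\sigma^2}{n}$$

where $\sigma^2$ is the true variance (probabilistic) of the random variable, X.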

Therefore, as n approaches infinity, this variance in the sample mean estimate goes to zero! Thus a larger sample size leads to a better estimate of the population mean.

Note: this variance is developed based on “sampling with replacement”.

When based on sampling without replacement …

Destructive testing or sampling without replacement in a finite population results in another expression:

$$\mathrm{Var}\left(\hat{\bar{X}}\right) = \frac{\sigma^2}{n}\cdot\frac{N - n}{N - 1}$$

Note that when all the samples are tested (N=n) the variance necessarily goes to 0. And … all the samples have been removed from the population?!

The variance in the mean between the population and the sample set must be zero as the entire population has been measured!



Example: How many samples of an infinitely long time waveform would be required to ensure the standard deviation of the sample mean is within 1% of the true (probabilistic) mean value? For this relationship, let

$$\mathrm{Var}\left(\hat{\bar{X}}\right) = \left(0.01 \cdot \bar{X}\right)^2 = (0.01 \cdot 10)^2$$

Infinite set, therefore assume that you use the “with replacement” equation:

$$\mathrm{Var}\left(\hat{\bar{X}}\right) = \frac{\sigma^2}{n}$$

Assume that the true mean is 10 and that the true variance is 9, so that the mean +/- one standard deviation would be $10 \pm 3$. Then,

$$\mathrm{Var}\left(\hat{\bar{X}}\right) = \frac{9}{n} = (0.01 \cdot 10)^2$$

$$\frac{9}{n} = 0.1^2 = 0.01$$

$$n = 900$$

A very large sample set size to “estimate” the mean within the 1% desired bound!
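The arithmetic of this example can be checked with a few lines of Python. This is only a sketch under the example's assumptions (sampling with replacement, so the variance of the sample mean is σ²/n); the function name `samples_for_mean_tolerance` is hypothetical and not from the course materials.

```python
import math


def samples_for_mean_tolerance(true_mean, true_variance, fraction):
    """Smallest n such that the standard deviation of the sample mean
    is at most `fraction` of the true mean, using Var = sigma^2 / n."""
    target_variance = (fraction * true_mean) ** 2
    # The small offset guards against floating-point round-off before ceil().
    return math.ceil(true_variance / target_variance - 1e-9)


# Values from the example: mean 10, variance 9, 1% tolerance.
n_required = samples_for_mean_tolerance(10.0, 9.0, 0.01)
print(n_required)
```

Halving the tolerance to 0.5% quadruples the required sample size, since n scales with the inverse square of the tolerance.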



Central Limit Theorem Estimate

Thinking of the characterization after using a very large number of samples …

Using the central limit theorem (assume a Gaussian distribution) to estimate the probability that the sample mean is within a prescribed bound of the true mean (1% from the previous example):

$$\Pr\left(9.9 \le \hat{\bar{X}} \le 10.1\right) = F(10.1) - F(9.9)$$

Assume that the density function of the statistical measurement has become Gaussian, centered around 10 with a standard deviation of 1% of the mean (that is, $\bar{X} = 10$ and $\sigma = 0.1$). We can use Gaussian/Normal tables to determine the probability …

$$\Pr\left(9.9 \le \hat{\bar{X}} \le 10.1\right) = \Phi\left(\frac{10.1 - 10}{0.1}\right) - \Phi\left(\frac{9.9 - 10}{0.1}\right)$$

$$\Pr\left(9.9 \le \hat{\bar{X}} \le 10.1\right) = \Phi(1) - \Phi(-1) = \Phi(1) - \left(1 - \Phi(1)\right) = 2\,\Phi(1) - 1$$

$$\Pr\left(9.9 \le \hat{\bar{X}} \le 10.1\right) = 2 \cdot 0.8413 - 1 = 0.6826$$

This implies that, after taking this many measurements to form an estimate, there is a 68.3% chance the estimate is within 1% of the mean

or

that there is a 1-0.6826 or 31.74% probability that the estimate of the population mean is more than 1% away from the true population mean.
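The table lookup can be reproduced numerically. The sketch below assumes, as the notes do, that the sample mean is Gaussian with mean 10 and standard deviation 0.1; the helper names `phi` and `prob_between` are illustrative only.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_between(lower, upper, mean, std):
    """Probability that a Gaussian R.V. with the given mean and
    standard deviation falls between lower and upper."""
    return phi((upper - mean) / std) - phi((lower - mean) / std)


p_inside = prob_between(9.9, 10.1, mean=10.0, std=0.1)
p_outside = 1.0 - p_inside
print(round(p_inside, 4), round(p_outside, 4))
```

The result differs from the table value 0.6826 only in the fourth decimal place, because the table entry for Φ(1) is rounded to four digits.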

Summary: as the number of sample measurements increases, the density function of the estimated mean about the true (probabilistic) mean takes on a Gaussian characteristic (based on the central limit theorem). Given the variance of the sample mean (which depends on the number of samples), the probability that the measured mean lies within a given distance of the probabilistic mean can be computed from Gaussian statistics.

We will be dealing with Gaussian/Normal distributions because sums of a large number of random variables have density functions that approach the Gaussian (Central Limit Theorem).



Example #2: A smaller sample size

Population: 100 transistors

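Find the mean value of the current gain, $\beta$. Assume that the true population mean is $\bar{X} = 120$ and the true population variance is $\sigma^2 = 25$.

How large a sample is required to obtain a sample mean that has a standard deviation of 1% of the true mean? Therefore, we want

$$\mathrm{Var}\left(\hat{\bar{X}}\right) = (0.01 \cdot 120)^2 = 1.2^2 = 1.44$$

For a smaller sample size drawn from a finite population, the sample mean variance can be computed as

$$\mathrm{Var}\left(\hat{\bar{X}}\right) = \frac{\sigma^2}{n}\cdot\frac{N - n}{N - 1}$$

Determining the number of samples needed to meet tolerance …

$$\frac{25}{n}\cdot\frac{100 - n}{100 - 1} = 1.44$$

$$\frac{100 - n}{n} = \frac{1.44 \cdot 99}{25}$$

$$n = \frac{100}{1 + \dfrac{1.44 \cdot 99}{25}} = \frac{100}{6.7024} = 14.92 \approx 15$$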
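The same solution can be written in closed form by solving the without-replacement variance expression for n. This is a sketch under the example's stated values (N = 100, variance 25, target variance 1.44); the function name is hypothetical.

```python
import math


def samples_without_replacement(pop_size, true_variance, target_variance):
    """Solve (sigma^2 / n) * (N - n) / (N - 1) = target for n.

    Rearranging gives n = N * sigma^2 / (target * (N - 1) + sigma^2).
    Returns the exact (fractional) solution and the next whole sample size.
    """
    n_exact = pop_size * true_variance / (
        target_variance * (pop_size - 1) + true_variance
    )
    return n_exact, math.ceil(n_exact)


target = (0.01 * 120) ** 2  # standard deviation of 1% of the mean, squared
n_exact, n_needed = samples_without_replacement(100, 25.0, target)
print(round(n_exact, 2), n_needed)
```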

A rule-of-thumb is offered to define “large vs. small” sample sizes, the threshold given is 30. The ultimate goal is to have enough samples to achieve a near-Gaussian probability distribution.



Sampling Theory – The Sample Variance

When dealing with probability, the mean and the variance provide valuable information about the “DC” and “AC” operating conditions: the mean gives the operating point (the value that is expected), and the variance gives the power (squared value) of the fluctuation about that operating point.

Therefore, we are also interested in the sample variance as compared to the true data variance.

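The sample variance of the population (stdevp) is defined as:

$$S^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \hat{\bar{X}}\right)^2$$

and continuing (as shown in the coming pages) until

$$E\left[S^2\right] = \frac{n - 1}{n}\,\sigma^2$$

where $\sigma^2$ is the true variance of the random variable.

Note: the expected value of the sample variance is not equal to the true variance; it is a biased estimate!

To create an unbiased estimator, scale by the biasing factor to compute (stdev):

$$E\left[\tilde{S}^2\right] = \frac{n}{n - 1}E\left[S^2\right] = \frac{n}{n - 1}E\left[\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \hat{\bar{X}}\right)^2\right] = E\left[\frac{1}{n - 1}\sum_{i=1}^{n}\left(X_i - \hat{\bar{X}}\right)^2\right] = \sigma_X^2$$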

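When the population is not large, the biased estimate becomes

$$E\left[S^2\right] = \frac{N}{N - 1}\cdot\frac{n - 1}{n}\,\sigma^2$$

and the unbiased estimate is

$$E\left[\tilde{S}^2\right] = \frac{N - 1}{N}\cdot\frac{n}{n - 1}\,E\left[S^2\right]$$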



Additional notes: MATLAB and MS Excel

Simulation and statistical software packages allow for either biased or unbiased computations.

In MS Excel there are two distinct functions, stdev and stdevp.

stdev uses (n-1) - http://office.microsoft.com/en-us/excel-help/stdev-function-HP010335660.aspx

stdevp uses (n) - https://support.office.com/en-US/article/STDEVP-function-1F7C1C88-1BEC-4422-8242-E9F7DC8BB195

In MATLAB, there is an additional flag associated with the std function.

$$\mathrm{std}(X) = \sqrt{\mathrm{var}(X)} = \sqrt{\frac{1}{n - 1}\sum_{j=1}^{n}\left(x_j - \hat{\bar{X}}\right)^2}$$, flag implied as 0

$$\mathrm{std}(X, 1) = \sqrt{\mathrm{var}(X, 1)} = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(x_j - \hat{\bar{X}}\right)^2}$$, flag specified as 1

>> help std
  std Standard deviation.
    For vectors, Y = std(X) returns the standard deviation. For matrices, Y is a row vector containing the standard deviation of each column. For N-D arrays, std operates along the first non-singleton dimension of X.

    std normalizes Y by (N-1), where N is the sample size. This is the sqrt of an unbiased estimator of the variance of the population from which X is drawn, as long as X consists of independent, identically distributed samples.

    Y = std(X,1) normalizes by N and produces the square root of the second moment of the sample about its mean. std(X,0) is the same as std(X).
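For readers working outside MATLAB or Excel, the Python standard library draws the same distinction. The sketch below uses `statistics.pvariance`/`pstdev` (divide by n) and `statistics.variance`/`stdev` (divide by n-1); the data values are the ten box weights that appear in a later exercise of these notes and are used here only as sample input.

```python
import statistics

# Ten sample measurements (the cereal-box weights used later in these notes).
data = [19, 18, 21, 21, 18, 22, 17, 19, 20, 17]
n = len(data)

# Biased form: divides by n (like Excel stdevp or MATLAB std(X,1)).
biased_var = statistics.pvariance(data)
biased_std = statistics.pstdev(data)

# Unbiased form: divides by n-1 (like Excel stdev or MATLAB std(X)).
unbiased_var = statistics.variance(data)
unbiased_std = statistics.stdev(data)

# The two variances differ exactly by the factor n / (n - 1).
scale = n / (n - 1)
print(biased_var, unbiased_var, biased_var * scale)
```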



Sampling Theory – The Sample Variance - Proof

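The sample variance of the population is defined as

$$S^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \hat{\bar{X}}\right)^2$$

$$S^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \frac{1}{n}\sum_{j=1}^{n} X_j\right)^2$$

Determining the expected value

$$E\left[S^2\right] = E\left[\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \frac{1}{n}\sum_{j=1}^{n} X_j\right)^2\right]$$

$$E\left[S^2\right] = E\left[\frac{1}{n}\sum_{i=1}^{n}\left(X_i^2 - \frac{2}{n}X_i\sum_{j=1}^{n} X_j + \frac{1}{n^2}\left(\sum_{j=1}^{n} X_j\right)^2\right)\right]$$

$$E\left[S^2\right] = \frac{1}{n}\sum_{i=1}^{n}\left(E\left[X_i^2\right] - \frac{2}{n}E\left[X_i\sum_{j=1}^{n} X_j\right] + \frac{1}{n^2}E\left[\sum_{j=1}^{n} X_j\sum_{k=1}^{n} X_k\right]\right)$$

$$E\left[S^2\right] = \frac{1}{n}\sum_{i=1}^{n}E\left[X_i^2\right] - \frac{2}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}E\left[X_i X_j\right] + \frac{1}{n^3}\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n}E\left[X_j X_k\right]$$

Separating the terms with equal indices from the terms with unequal indices (independent samples):

$$E\left[S^2\right] = E\left[X^2\right] - \frac{2}{n^2}\left(n\,E\left[X^2\right] + (n^2 - n)\,E[X]^2\right) + \frac{1}{n^3}\sum_{i=1}^{n}\left(n\,E\left[X^2\right] + (n^2 - n)\,E[X]^2\right)$$

$$E\left[S^2\right] = E\left[X^2\right] - \frac{2}{n}E\left[X^2\right] - \frac{2(n - 1)}{n}E[X]^2 + \frac{1}{n}E\left[X^2\right] + \frac{n - 1}{n}E[X]^2$$

$$E\left[S^2\right] = E\left[X^2\right]\left(1 - \frac{2}{n} + \frac{1}{n}\right) + E[X]^2\left(\frac{n - 1}{n} - \frac{2(n - 1)}{n}\right)$$

$$E\left[S^2\right] = \frac{n - 1}{n}E\left[X^2\right] - \frac{n - 1}{n}E[X]^2$$

$$E\left[S^2\right] = \frac{n - 1}{n}\left(E\left[X^2\right] - E[X]^2\right)$$

Therefore,

$$E\left[S^2\right] = \frac{n - 1}{n}\,\sigma^2$$

To create an unbiased estimator, scale by the (un-) biasing factor to compute:

$$E\left[\tilde{S}^2\right] = \frac{n}{n - 1}E\left[S^2\right] = \sigma^2$$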

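Variance of the variance

As before, the variance of the variance can be computed. (Instead of deriving the values, it is given.) It is defined as

$$\mathrm{Var}\left(S^2\right) = \frac{\mu_4 - \sigma^4}{n}$$

where $\mu_4$ is the fourth central moment of the population and is defined by

$$\mu_4 = E\left[\left(X - \bar{X}\right)^4\right]$$

Proof for extra homework credit? …

For the unbiased variance, the result is

$$\mathrm{Var}\left(\tilde{S}^2\right) = \frac{n^2}{(n - 1)^2}\mathrm{Var}\left(S^2\right) = \frac{n^2}{(n - 1)^2}\cdot\frac{\mu_4 - \sigma^4}{n} = \frac{n\left(\mu_4 - \sigma^4\right)}{(n - 1)^2}$$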



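Example: the random time samples problem (first example) previously used, where the true mean is 10 and the true variance is 9. Then,

$$\mathrm{Var}\left(\hat{\bar{X}}\right) = \frac{\sigma^2}{n} = \frac{9}{n}$$

and for n = 900, $\mathrm{Var}\left(\hat{\bar{X}}\right) = \frac{9}{900} = 0.01$

$$\mathrm{Var}\left(\tilde{S}^2\right) = \frac{n\left(\mu_4 - \sigma^4\right)}{(n - 1)^2}$$

For a Gaussian random variable, the 4th central moment is $\mu_4 = 3\sigma^4$. Therefore

$$\mathrm{Var}\left(\tilde{S}^2\right) = \frac{n\left(3\sigma^4 - \sigma^4\right)}{(n - 1)^2} = \frac{2n\sigma^4}{(n - 1)^2}$$

$$\mathrm{Var}\left(\tilde{S}^2\right) = \frac{2 \cdot 900 \cdot 9^2}{(900 - 1)^2} = \frac{145800}{808201} = 0.1804$$

$$\sqrt{\mathrm{Var}\left(\tilde{S}^2\right)} = 0.4247$$

The variance estimate would then be $9 \pm \sqrt{\mathrm{Var}\left(\tilde{S}^2\right)}$, or within $100 \cdot \frac{\sqrt{\mathrm{Var}\left(\tilde{S}^2\right)}}{9}\,\% = 4.72\%$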

While 900 was selected to provide a mean estimate that was within 1%, the variance estimate is not nearly as close at 4.72%. More samples are required to improve the variance estimate.
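The numbers in this example can be reproduced with a short calculation. The sketch assumes a Gaussian population (so the fourth central moment is 3σ⁴) and uses the variance-of-the-variance expression given in the notes; the function name is hypothetical.

```python
import math


def var_of_unbiased_variance_gaussian(sigma2, n):
    """Var of the unbiased sample variance, n*(mu4 - sigma^4)/(n-1)^2,
    with mu4 = 3*sigma^4 for a Gaussian population."""
    mu4 = 3.0 * sigma2 ** 2
    return n * (mu4 - sigma2 ** 2) / (n - 1) ** 2


sigma2 = 9.0
n = 900
var_s2 = var_of_unbiased_variance_gaussian(sigma2, n)
std_s2 = math.sqrt(var_s2)
percent = 100.0 * std_s2 / sigma2
print(round(var_s2, 4), round(std_s2, 4), round(percent, 2))
```

Because the relative spread of the variance estimate scales roughly as sqrt(2/n), reaching 1% would take on the order of 20,000 samples under the same assumptions.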



Statistical Mean and Variance Summary

For taking samples and estimating the mean and variance …

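Mean (an unbiased estimate)

    The estimate: $\hat{\bar{X}} = \frac{1}{n}\sum_{i=1}^{n} X_i$

    Expected value: $E\left[\hat{\bar{X}}\right] = E[X] = \bar{X}$

    Variance of the estimate: $\mathrm{Var}\left(\hat{\bar{X}}\right) = \frac{\sigma_X^2}{n}$

Variance, biased (a biased estimate)

    The estimate: $S^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \hat{\bar{X}}\right)^2$

    Expected value: $E\left[S^2\right] = \frac{n - 1}{n}\,\sigma_X^2$

    Variance of the estimate: $\mathrm{Var}\left(S^2\right) = \frac{\mu_4 - \sigma_X^4}{n}$, with $\mu_4 = E\left[\left(X - \bar{X}\right)^4\right]$

Variance, unbiased (an unbiased estimate)

    The estimate: $\tilde{S}^2 = \frac{n}{n - 1}\,S^2$

    Expected value: $E\left[\tilde{S}^2\right] = E\left[\left(X - \bar{X}\right)^2\right] = \sigma_X^2$

    Variance of the estimate: $\mathrm{Var}\left(\tilde{S}^2\right) = \frac{n\left(\mu_4 - \sigma_X^4\right)}{(n - 1)^2}$, with $\mu_4 = E\left[\left(X - \bar{X}\right)^4\right]$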



Bounds on the estimates

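Using the Chebyshev Inequality

$$P\left(\left|X - \bar{X}\right| \ge \delta\right) \le \frac{\sigma_X^2}{\delta^2}$$

Bounding the estimated mean value

$$P\left(\left|\hat{\bar{X}} - \bar{X}\right| \ge \delta\right) \le \frac{\sigma_X^2}{n\,\delta^2}$$

if we let n → ∞

$$\lim_{n \to \infty} P\left(\left|\hat{\bar{X}} - \bar{X}\right| \ge \delta\right) \le \lim_{n \to \infty}\frac{\sigma_X^2}{n\,\delta^2} = 0$$

Therefore, for any value $\delta > 0$

$$\lim_{n \to \infty} P\left(\left|\hat{\bar{X}} - \bar{X}\right| \ge \delta\right) = 0$$

The probability that the estimated (statistical) mean differs from the probabilistic mean by any fixed amount goes to zero! Therefore the two must be identical for the infinite sample case!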



Building a confidence interval

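From Chapter 4, the discrete derivation of the Chebyshev Inequality stated

$$\Pr\left(|X - \mu| \ge \epsilon\right) \le \frac{\mathrm{Var}(X)}{\epsilon^2}$$

Let X be an arbitrary R.V. with known mean $\mu_X$ and variance $\sigma_X^2$. Then for any $\delta > 0$

$$P\left(\left|X - \mu_X\right| \ge \delta\right) \le \frac{\sigma_X^2}{\delta^2}$$

Derivation

$$\sigma_X^2 = E\left[\left(X - \mu_X\right)^2\right] = \int_{-\infty}^{\infty}\left(x - \mu_X\right)^2 f_X(x)\,dx$$

Then

$$\int_{-\infty}^{\infty}\left(x - \mu_X\right)^2 f_X(x)\,dx \ge \int_{|x - \mu_X| \ge \delta}\left(x - \mu_X\right)^2 f_X(x)\,dx$$

and

$$\int_{|x - \mu_X| \ge \delta}\left(x - \mu_X\right)^2 f_X(x)\,dx \ge \delta^2\int_{|x - \mu_X| \ge \delta} f_X(x)\,dx = \delta^2\,P\left(\left|X - \mu_X\right| \ge \delta\right)$$

Result #1:

$$P\left(\left|X - \mu_X\right| \ge \delta\right) \le \frac{\sigma_X^2}{\delta^2}$$

It may be convenient to define delta in terms of a multiple of the standard deviation.

$$\delta = k\,\sigma_X$$

$$P\left(\left|X - \mu_X\right| \ge k\,\sigma_X\right) \le \frac{\sigma_X^2}{k^2\sigma_X^2} = \frac{1}{k^2}$$

Flipping the bounds on the inequality

$$P\left(\left|X - \mu_X\right| < k\,\sigma_X\right) \ge 1 - \frac{1}{k^2}$$

Expanding the absolute value and adding the mean

$$P\left(\mu_X - k\,\sigma_X < X < \mu_X + k\,\sigma_X\right) \ge 1 - \frac{1}{k^2}$$

A confidence interval for the statistical average becomes

$$P\left(\mu_X - k\frac{\sigma_X}{\sqrt{n}} < \hat{\bar{X}} < \mu_X + k\frac{\sigma_X}{\sqrt{n}}\right) \ge 1 - \frac{1}{k^2}$$

If we assume that the mean value has a Gaussian distribution, an exact value can be computed for this probability

$$P\left(-k < \frac{\hat{\bar{X}} - \mu_X}{\sigma_X / \sqrt{n}} < k\right) = \Phi(k) - \Phi(-k) = 2\,\Phi(k) - 1$$

A confidence interval often chosen is 95% or 0.95.

$$0.95 = 2\,\Phi(k) - 1$$

$$\Phi(k) = \frac{1 + 0.95}{2} = 0.975$$

$$k = 1.96$$

and

$$P\left(-1.96 < \frac{\hat{\bar{X}} - \mu_X}{\sigma_X / \sqrt{n}} < 1.96\right) = 0.95$$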

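Using Example 6.3-1: effect of sample size on the estimated mean

If the actual mean is 0 and actual standard deviation is 3, what are the 95% confidence bounds?

For $\mu_X = 0$ and $\sigma_X = 3$

For a two-sided interval, $0.95 = 2\,\Phi(k) - 1$

$$\Phi(k) = \frac{1 + 0.95}{2} = 0.975$$ and k = 1.96

$$P\left(-1.96 < \frac{\hat{\bar{X}} - 0}{3 / \sqrt{n}} < 1.96\right) = 0.95$$

$$P\left(-\frac{5.88}{\sqrt{n}} < \hat{\bar{X}} < \frac{5.88}{\sqrt{n}}\right) = 0.95$$

If we were hoping to be within +/-0.1, how many samples are needed?

$$P\left(-0.1 < \hat{\bar{X}} < 0.1\right) = 0.95$$

If and only if $\frac{5.88}{\sqrt{n}} \le 0.1$

or $\sqrt{n} \ge 58.8$ or $n \ge 3457.44$

If we only used n=64 samples, the probability of being within the 95% interval is ???

$$P\left(-\frac{5.88}{\sqrt{64}} < \hat{\bar{X}} < \frac{5.88}{\sqrt{64}}\right) = ???$$

$$P\left(-0.735 < \hat{\bar{X}} < 0.735\right) = ???$$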

26.0163.021735.02735.0ˆ735.0 XP

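We could pick a different confidence interval … say 50%.

$$0.50 = 2\,\Phi(k) - 1$$

$$\Phi(k) = \frac{1 + 0.50}{2} = 0.75$$

$k = 0.67$ and

$$P\left(-0.67 < \frac{\hat{\bar{X}} - 0}{3 / \sqrt{n}} < 0.67\right) = 0.50$$

$$P\left(-\frac{2}{\sqrt{n}} < \hat{\bar{X}} < \frac{2}{\sqrt{n}}\right) = 0.50$$

If we only used n=64 samples, $P\left(-0.25 < \hat{\bar{X}} < 0.25\right) = 0.50$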
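The steps of this example (find k from the confidence level, then solve for n) can be sketched in Python. The code assumes a Gaussian sample mean with known standard deviation, as in the example; it uses `statistics.NormalDist` from the standard library (Python 3.8+) in place of the printed tables, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def two_sided_k(confidence):
    """Value k with 2*Phi(k) - 1 = confidence (e.g. 0.95 -> about 1.96)."""
    return NormalDist().inv_cdf((1.0 + confidence) / 2.0)


def samples_for_bound(sigma, bound, confidence):
    """Smallest n with k * sigma / sqrt(n) <= bound."""
    k = two_sided_k(confidence)
    return math.ceil((k * sigma / bound) ** 2)


k95 = two_sided_k(0.95)
k50 = two_sided_k(0.50)
n_needed = samples_for_bound(sigma=3.0, bound=0.1, confidence=0.95)
print(round(k95, 3), round(k50, 3), n_needed)
```

The computed sample size rounds the example's n ≥ 3457.44 up to the next whole sample.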



Gaussian Confidence Intervals (CI):

For a “two sided” confidence interval, we want

$$P\left(-k < \frac{\hat{\bar{X}} - \mu_X}{\sigma_X / \sqrt{n}} < k\right) = \Phi(k) - \Phi(-k) = 2\,\Phi(k) - 1 = C.I.$$

(A: textbook steps 1 & 2) Find the appropriate value k for the confidence interval selected.

(B: textbook steps 3 & 4) Based on the known Gaussian mean and variance for the “estimated” R.V. and the known number of samples, compute the bounds on the inequality.

Alternate solution type.

If you know the bounds on the probability inequality and the Gaussian statistics, compute the number of samples needed.

$$k\frac{\sigma_X}{\sqrt{n}} \le \text{bound}$$

$$n \ge \left(\frac{k\,\sigma_X}{\text{bound}}\right)^2$$

for $2\,\Phi(k) - 1 = C.I.$

Summary for using Gaussian C.I. (known mean and variance)

$$P\left(-\text{bound} = -k\frac{\sigma_X}{\sqrt{n}} < \hat{\bar{X}} - \mu_X < k\frac{\sigma_X}{\sqrt{n}} = \text{bound}\right) = 2\,\Phi(k) - 1 = C.I.$$

Compute what you need …

Looking at the numbers …

The higher the confidence level, the wider the bounds.

The tighter the bounds, the lower the confidence level.

Gaussian “confidence”

    +/- one standard deviation: 68.3%
    +/- two standard deviations: 95.45%
    +/- three standard deviations: 99.73%

    90%: k = +/-1.64 standard deviations
    95%: k = +/-1.96 standard deviations
    99%: k = +/-2.58 standard deviations



More Gaussian

    Confidence Interval (in %)    Two Tail Bounds        k or $z_c$: $-z_c \le z \le z_c$
    99.99%                        0.005% to 99.995%      3.89
    99.9%                         0.05% to 99.95%        3.29
    99%                           0.5% to 99.5%          2.58
    95%                           2.5% to 97.5%          1.96
    90%                           5% to 95%              1.64
    80%                           10% to 90%             1.28
    50%                           25% to 75%             0.675

see Sec4_4_Gaussian.m

There are “one-sided” bounds that we have not discussed. For a Gaussian R.V.

$$\Phi(z_c) = q \quad \text{for } z \le z_c$$

$$P\left(\hat{\bar{X}} - \mu_X \le k\frac{\sigma_X}{\sqrt{n}} = \text{bound}\right) = \Phi(k) = C.I.$$

[Figure: Gaussian q values. Density f(x) versus x from -5 to 5, with two-sided bounds marked for q = 50.00% (k = 0.674), q = 90.00% (k = 1.645), q = 95.00% (k = 1.960), and q = 99.00% (k = 2.576).]



Confidence Intervals when we do not know the actual variance …

We have the statistically computed, non-biased variance estimate.

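Define the new “estimated random variable mean function” as

$$T = \frac{\hat{\bar{X}} - \mu_X}{\tilde{S} / \sqrt{n}}$$

and define

$$T = \frac{\hat{\bar{X}} - \mu_X}{\sqrt{\dfrac{1}{n(n - 1)}\sum_{i=1}^{n}\left(X_i - \hat{\bar{X}}\right)^2}}$$

To simplify the textbook description involving the chi-squared distribution, this is the basis for Student’s t-distribution with n-1 degrees of freedom.

The Student’s t probability density function (letting v=n-1, the degrees of freedom) is defined as

$$f_T(t) = \frac{\Gamma\left(\frac{v + 1}{2}\right)}{\sqrt{v\pi}\;\Gamma\left(\frac{v}{2}\right)}\left(1 + \frac{t^2}{v}\right)^{-\frac{v + 1}{2}}$$

where $\Gamma(\cdot)$ is the gamma function.

The gamma function can be computed as

$$\Gamma(k + 1) = k! \quad \text{for } k \text{ an integer}$$

$$\Gamma(k + 1) = k\,\Gamma(k) \quad \text{for any } k$$

and

$$\Gamma\left(\tfrac{1}{2}\right) = \sqrt{\pi}$$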

(1) Note that when evaluating the Student’s t-density function, all arguments of the gamma function are integers or an integer plus ½.

(2) Note that: The distribution depends on ν, but not μ or σ; the lack of dependence on μ and σ is what makes the t-distribution important in both theory and practice.
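The density can be evaluated directly from the gamma-function form. The sketch below uses `math.gamma` from the Python standard library; the function names are illustrative and are not the `students_t.m` routine mentioned elsewhere in these notes.

```python
import math


def students_t_pdf(t, v):
    """Student's t density with v degrees of freedom, written with
    the gamma function as in the definition of f_T(t)."""
    coeff = math.gamma((v + 1) / 2.0) / (
        math.sqrt(v * math.pi) * math.gamma(v / 2.0)
    )
    return coeff * (1.0 + t * t / v) ** (-(v + 1) / 2.0)


def standard_gaussian_pdf(z):
    """Zero-mean, unit-variance Gaussian density for comparison."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)


# Compare peak and tail heights for a few degrees of freedom.
for v in (1, 2, 8):
    print(v, round(students_t_pdf(0.0, v), 4), round(students_t_pdf(3.0, v), 4))
print("gauss", round(standard_gaussian_pdf(0.0), 4), round(standard_gaussian_pdf(3.0), 4))
```

For small v the t density has a lower peak and heavier tails than the Gaussian, which is why t-based confidence bounds are wider for the same confidence level.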



http://en.wikipedia.org/wiki/Student's_t-distribution

Student's distribution arises when (as in nearly all practical statistical work) the population standard deviation is unknown and has to be estimated from the data.

Note that: The distribution depends on ν = n-1, but not μ or σ; the lack of dependence on μ and σ is what makes the t-distribution important in both theory and practice.

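T-distribution confidence interval

For a “two sided” confidence interval, we want

$$P\left(-t < \frac{\hat{\bar{X}} - \mu_X}{\tilde{S} / \sqrt{n}} < t\right) = F_{T_{n-1}}(t) - F_{T_{n-1}}(-t) = 2\,F_{T_{n-1}}(t) - 1 = C.I.$$

(A: textbook steps 1 & 2) Find the appropriate value t based on the value v=n-1 for the confidence interval selected. (Hint: the table says x and n, but you are looking up v=n-1 (not n) and finding t=x based on $F_T$.)

(B: textbook steps 3 & 4) Based on the computed variance for the “estimated” R.V. and the known number of samples, compute the bounds on the inequality.

$$P\left(-t\frac{\tilde{S}}{\sqrt{n}} < \hat{\bar{X}} - \mu_X < t\frac{\tilde{S}}{\sqrt{n}}\right) = C.I.$$

Or the bounds on the true mean, based on the confidence interval, are

$$P\left(\hat{\bar{X}} - t\frac{\tilde{S}}{\sqrt{n}} < \mu_X < \hat{\bar{X}} + t\frac{\tilde{S}}{\sqrt{n}}\right) = C.I.$$

$$\frac{CI}{100} = \int_{-t_c}^{t_c} f_T(t)\,dt = F_T(t_c) - F_T(-t_c) \quad \text{for } -t_c \le t \le t_c \text{, 2-sided}$$

There are “one-sided” bounds that we have not discussed. For a T-distribution R.V.

$$\frac{CI}{100} = \int_{-\infty}^{t_c} f_T(t)\,dt = F_T(t_c) \quad \text{for } t \le t_c \text{, “right-tail”}$$

$$P\left(\hat{\bar{X}} - \mu_X \le t\frac{\tilde{S}}{\sqrt{n}}\right) = C.I.$$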



Comparing the density functions: Student’s t and Gaussian

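See StudentsT_Plot.m and function students_t.m

Student’s t

$$f_T(t) = \frac{\Gamma\left(\frac{v + 1}{2}\right)}{\sqrt{v\pi}\;\Gamma\left(\frac{v}{2}\right)}\left(1 + \frac{t^2}{v}\right)^{-\frac{v + 1}{2}}$$

Gaussian

$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_X}\exp\left(-\frac{\left(x - \mu_X\right)^2}{2\sigma_X^2}\right)$$

[Figure: Students t and Gaussian Densities. Density function F_T(t) versus t from -4 to 4, for the Gaussian and for T with v = 1, v = 2, and v = 8.]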



HW 4-4.2 A very large population of bipolar transistors has a current gain with a mean value of 120 and a standard deviation of 10. The values of current gain may be assumed to be independent Gaussian random variables.

a) Find the confidence limits for a confidence level of 90% on the sample mean if it is computed from a sample size of 150.

$$\mu_X - k\frac{\sigma_X}{\sqrt{n}} \le \hat{\bar{X}} \le \mu_X + k\frac{\sigma_X}{\sqrt{n}}$$

Two sided test at 90% means that k = 1.645.

$$k\frac{\sigma_X}{\sqrt{n}} = 1.645 \cdot \frac{10}{\sqrt{150}} = 1.343$$

$$120 - 1.343 \le \hat{\bar{X}} \le 120 + 1.343$$

b) Repeat part (a) if the sample size is 21.

$$k\frac{\sigma_X}{\sqrt{n}} = 1.645 \cdot \frac{10}{\sqrt{21}} = 3.590$$

$$120 - 3.590 \le \hat{\bar{X}} \le 120 + 3.590$$



HW 4-4.3 Repeat Problem 4-4.2 for a one-sided confidence interval. Restating the problem …

Find the value of the current gain above which 90% of the sample means would lie.

$$\mu_X - k\frac{\sigma_X}{\sqrt{n}} \le \hat{\bar{X}}$$

(a) 150 sample size

One sided test at 90% means that

$$\Phi(k) = 0.9$$

or $1 - Q(k) = 0.9$

Therefore, k = 1.2816 and

$$k\frac{\sigma_X}{\sqrt{n}} = 1.2816 \cdot \frac{10}{\sqrt{150}} = 1.046$$

$$118.95 = \mu_X - k\frac{\sigma_X}{\sqrt{n}} \le \hat{\bar{X}}$$

(b) 21 sample size

One sided test at 90% means k = 1.2816 and

$$k\frac{\sigma_X}{\sqrt{n}} = 1.2816 \cdot \frac{10}{\sqrt{21}} = 2.797$$

$$117.20 = \mu_X - k\frac{\sigma_X}{\sqrt{n}} \le \hat{\bar{X}}$$

    One Tail Bounds    Confidence Interval (in %)    k or $z_c$: $z \le z_c$
    99.99%             99.99%                        3.7190
    99.9%              99.9%                         3.0902
    99%                99%                           2.3263
    95%                95%                           1.6449
    90%                90%                           1.2816
    80%                80%                           0.8416
    75%                75%                           0.6745
    50%                50%                           0

Examples of use:



Exercise 4-4.2

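A very large population of resistor values has a true mean of 100 ohms and a sample standard deviation of 4 ohms. Find the confidence interval on the sample mean for a confidence level of 95% if it is computed from:

a) a sample size of 100.

v = 99. Using v=60 (no 100 given) and F=0.975 (2 sided test) on p. G-4, t=2.00. Therefore

$$\mu_X - t\frac{\tilde{S}}{\sqrt{n}} \le \hat{\bar{X}} \le \mu_X + t\frac{\tilde{S}}{\sqrt{n}}$$

$$t\frac{\tilde{S}}{\sqrt{n}} = 2.00 \cdot \frac{4}{\sqrt{100}} = 0.8$$

$$100 - 0.8 \le \hat{\bar{X}} \le 100 + 0.8$$

$$99.2 \le \hat{\bar{X}} \le 100.8$$

Using v=120 (no 100 given) and F=0.975 (2 sided test) on p. G-4, t=1.98.

$$t\frac{\tilde{S}}{\sqrt{n}} = 1.98 \cdot \frac{4}{\sqrt{100}} = 0.792$$

$$99.208 \le \hat{\bar{X}} \le 100.792$$

b) a sample size of 9.

v = 8. Using v=8 and F=0.975 (2 sided test) on p. G-4, t=2.306. Therefore

$$\mu_X - t\frac{\tilde{S}}{\sqrt{n}} \le \hat{\bar{X}} \le \mu_X + t\frac{\tilde{S}}{\sqrt{n}}$$

$$t\frac{\tilde{S}}{\sqrt{n}} = 2.306 \cdot \frac{4}{\sqrt{9}} = 3.075$$

$$100 - 3.075 \le \hat{\bar{X}} \le 100 + 3.075$$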



HW 4-4.2 A very large population of bipolar transistors has a current gain with a mean value of 120 and a standard deviation of 10. The values of current gain may be assumed to be independent Gaussian random variables.

b) Repeat part (a) if the sample size is 21.

$$k\frac{\sigma_X}{\sqrt{n}} = 1.645 \cdot \frac{10}{\sqrt{21}} = 3.590$$

$$120 - 3.590 \le \hat{\bar{X}} \le 120 + 3.590$$

If the variance was an estimated variance … instead of a known variance.

$$\mu_X - t\frac{\tilde{S}}{\sqrt{n}} \le \hat{\bar{X}} \le \mu_X + t\frac{\tilde{S}}{\sqrt{n}}$$

$$t\frac{\tilde{S}}{\sqrt{n}} = 1.725 \cdot \frac{10}{\sqrt{21}} = 3.764$$

$$120 - 3.764 \le \hat{\bar{X}} \le 120 + 3.764$$

Notice that using an estimated variance results in a greater range of values (differences in the density functions).



Skill 17-2 A cereal vendor’s quality control department has just tested a random sample of 10 “20 ounce” boxes of Oat Flakes by weighing them in order to see if their 20 ounce claim is to be believed. Their report, to be forwarded to management, must include a 95% confidence interval as to the population mean.

a) Find the unbiased mean and standard deviation

b) Determine the 95% confidence interval of the mean (by using the Student’s-t table).

c) In general, if the confidence interval becomes tighter (smaller), would the confidence level increase or decrease?

Measurement Data: 19, 18, 21, 21, 18, 22, 17, 19, 20, and 17.

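a) Sample Mean

$$\hat{\bar{X}} = \frac{1}{n}\sum_{i=1}^{n} X_i$$, where the $X_i$ are random variables with a pdf.

$$\hat{\bar{X}} = \frac{1}{10}\left(19 + 18 + 21 + 21 + 18 + 22 + 17 + 19 + 20 + 17\right) = \frac{192}{10} = 19.2$$

Unbiased variance

$$\tilde{S}^2 = \frac{1}{n - 1}\sum_{i=1}^{n}\left(X_i - \hat{\bar{X}}\right)^2$$

$$\tilde{S}^2 = \frac{1}{9}\left(0.2^2 + 1.2^2 + 1.8^2 + 1.8^2 + 1.2^2 + 2.8^2 + 2.2^2 + 0.2^2 + 0.8^2 + 2.2^2\right) = \frac{27.6}{9} = 3.067$$

$$\tilde{S} = \sqrt{\frac{27.6}{9}} = \sqrt{3.067} = 1.751$$

b) With v = n-1 = 9 and F = 0.975 (2 sided test), the Student’s t table gives t = 2.262.

$$\hat{\bar{X}} - t\frac{\tilde{S}}{\sqrt{n}} \le \mu_X \le \hat{\bar{X}} + t\frac{\tilde{S}}{\sqrt{n}}$$

$$t\frac{\tilde{S}}{\sqrt{n}} = 2.262 \cdot \frac{1.751}{\sqrt{10}} = 1.252$$

$$19.2 - 1.252 \le \mu_X \le 19.2 + 1.252$$

$$17.948 \le \mu_X \le 20.452$$

(c) As the confidence interval becomes tighter (smaller) [p% going down!], the confidence level decreases.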
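The calculation for this exercise can be scripted as a check. The sketch uses the measurement data from the problem and takes t = 2.262 (v = 9, F = 0.975) from the Student's t table as a given constant, because the Python standard library has no t-distribution inverse; the variable names are illustrative.

```python
import math
import statistics

# Measurement data from the problem statement (ounces).
weights = [19, 18, 21, 21, 18, 22, 17, 19, 20, 17]
n = len(weights)

# Table value, assumed: Student's t for v = n - 1 = 9 and F = 0.975.
T_TABLE_V9_F975 = 2.262

sample_mean = statistics.mean(weights)
unbiased_std = statistics.stdev(weights)  # divides by n - 1

half_width = T_TABLE_V9_F975 * unbiased_std / math.sqrt(n)
lower = sample_mean - half_width
upper = sample_mean + half_width

print(round(sample_mean, 3), round(unbiased_std, 3))
print(round(lower, 3), round(upper, 3))
```

The claimed 20 ounces falls inside the resulting interval, so this sample does not contradict the label at the 95% level.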



6.2 Definitions

Estimator: a function of the observations vector that estimates a particular parameter.

Unbiased estimator: an estimator is unbiased if its expected value equals the true value of the parameter.

Biased estimator: an estimator is biased if its expected value differs from the true value, for example by an offset or a gain factor.

Linear estimation: the estimate is a linear combination of the sample points … $\hat{\mathbf{x}} = \mathbf{H}\mathbf{X} + \mathbf{b}$

Consistent the estimator is consistent if the estimate converges to the appropriate value as the number of samples goes to infinity.

There are estimators that minimize the variance in the estimate from the sample values.

There are estimators that minimize the mean-squared error for the sample values.



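6.6 Estimation of non-Gaussian Parameters.

$$P\left(\left|X - \mu_X\right| \ge \delta\right) \le \frac{\sigma_X^2}{\delta^2}$$

The confidence interval for the statistical average became

$$P\left(\mu_X - k\frac{\sigma_X}{\sqrt{n}} < \hat{\bar{X}} < \mu_X + k\frac{\sigma_X}{\sqrt{n}}\right) \ge 1 - \frac{1}{k^2}$$

But based on the central limit theorem, the sample mean is a sum R.V. whose distribution becomes Gaussian, with a mean and variance prescribed by the original density function.

$$P\left(-k < \frac{\hat{\bar{X}} - \mu_X}{\sigma_X / \sqrt{n}} < k\right) = \Phi(k) - \Phi(-k)$$

or

$$P\left(\hat{\bar{X}} - k\frac{\sigma_X}{\sqrt{n}} < \mu_X < \hat{\bar{X}} + k\frac{\sigma_X}{\sqrt{n}}\right) = \Phi(k) - \Phi(-k)$$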

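When the initial distributions of the summed random variables are non-Gaussian, the mean and variance may be related, for example in the exponential distribution.

Example 6.6-1: exponential distribution

$$f_X(x) = \lambda e^{-\lambda x}\,u(x)$$

where $\mu_X = \frac{1}{\lambda}$ and $\sigma_X^2 = \frac{1}{\lambda^2}$

We wish to estimate bounds for lambda

$$P\left(\hat{\bar{X}} - k\frac{\sigma_X}{\sqrt{n}} < \mu_X < \hat{\bar{X}} + k\frac{\sigma_X}{\sqrt{n}}\right) = \Phi(k) - \Phi(-k)$$

$$P\left(-k\frac{\sigma_X}{\sqrt{n}} < \mu_X - \hat{\bar{X}} < k\frac{\sigma_X}{\sqrt{n}}\right) = \Phi(k) - \Phi(-k)$$

$$P\left(-\frac{k}{\sqrt{n}}\cdot\frac{1}{\lambda} < \frac{1}{\lambda} - \hat{\bar{X}} < \frac{k}{\sqrt{n}}\cdot\frac{1}{\lambda}\right) = \Phi(k) - \Phi(-k)$$

$$P\left(\frac{1}{\lambda}\left(1 - \frac{k}{\sqrt{n}}\right) < \hat{\bar{X}} < \frac{1}{\lambda}\left(1 + \frac{k}{\sqrt{n}}\right)\right) = \Phi(k) - \Phi(-k)$$

$$P\left(\frac{1}{\hat{\bar{X}}}\left(1 - \frac{k}{\sqrt{n}}\right) < \lambda < \frac{1}{\hat{\bar{X}}}\left(1 + \frac{k}{\sqrt{n}}\right)\right) = \Phi(k) - \Phi(-k)$$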



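Example 6.6-2: exponential distribution

Determine the 95% confidence interval for 64 samples when the data estimated mean is 3.5.

$$P\left(\frac{1}{\hat{\bar{X}}}\left(1 - \frac{k}{\sqrt{n}}\right) < \lambda < \frac{1}{\hat{\bar{X}}}\left(1 + \frac{k}{\sqrt{n}}\right)\right) = \Phi(k) - \Phi(-k)$$

Then,

$$k = 1.96$$

$$P\left(\frac{1}{3.5}\left(1 - \frac{1.96}{\sqrt{64}}\right) < \lambda < \frac{1}{3.5}\left(1 + \frac{1.96}{\sqrt{64}}\right)\right) = 0.95$$

$$P\left(\frac{0.755}{3.5} < \lambda < \frac{1.245}{3.5}\right) = 0.95$$

$$P\left(0.216 < \lambda < 0.356\right) = 0.95$$

Note that the estimated $\lambda$ is based on a mean of 3.5,

$$\hat{\lambda} = \frac{1}{\hat{\bar{X}}} = 0.286$$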
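The bounds on lambda can be computed directly from the final inequality of Example 6.6-1. This is a sketch under the same large-sample (CLT) assumption; the function name `exponential_rate_interval` is hypothetical.

```python
import math


def exponential_rate_interval(sample_mean, n, k):
    """Bounds on lambda from (1 - k/sqrt(n))/mean < lambda < (1 + k/sqrt(n))/mean.

    Valid only when k < sqrt(n), so that the lower bound stays positive.
    """
    scale = k / math.sqrt(n)
    lower = (1.0 - scale) / sample_mean
    upper = (1.0 + scale) / sample_mean
    return lower, upper


lam_lower, lam_upper = exponential_rate_interval(sample_mean=3.5, n=64, k=1.96)
lam_hat = 1.0 / 3.5
print(round(lam_lower, 3), round(lam_hat, 3), round(lam_upper, 3))
```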

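Example 6.6-3: Bernoulli distribution

$$p_X(k) = q\,\delta(k) + p\,\delta(k - 1), \quad \text{for } q = 1 - p$$

$$\mu_X = E[X] = p$$

$$\sigma_X^2 = \mathrm{VAR}[X] = p\,(1 - p) = p\,q$$

The statistical summation based on the CLT should provide

$\hat{\bar{X}} = \hat{p} \approx \mu_X = p$ and $\mathrm{Var}\left(\hat{\bar{X}}\right) = \frac{\sigma_X^2}{n} = \frac{p\,q}{n}$

$$P\left(-k\frac{\sigma_X}{\sqrt{n}} < \hat{\bar{X}} - \mu_X < k\frac{\sigma_X}{\sqrt{n}}\right) = \Phi(k) - \Phi(-k)$$

$$P\left(-k\sqrt{\frac{p\,q}{n}} < \hat{p} - p < k\sqrt{\frac{p\,q}{n}}\right) = \Phi(k) - \Phi(-k)$$

The range of the bounds becomes

$$P\left(\left|\hat{p} - p\right| < k\sqrt{\frac{p\,q}{n}}\right) = \Phi(k) - \Phi(-k)$$

Determining the interval bounds and width: solve for p in terms of all the other variables! Remember that q=1-p

$$\left(\hat{p} - p\right)^2 = k^2\,\frac{p\,q}{n}$$

$$\hat{p}^2 - 2\hat{p}\,p + p^2 = \frac{k^2}{n}\,p\,(1 - p)$$

$$p^2\left(1 + \frac{k^2}{n}\right) - p\left(2\hat{p} + \frac{k^2}{n}\right) + \hat{p}^2 = 0$$

$$p^2 - p\,\frac{2n\hat{p} + k^2}{n + k^2} + \frac{n\,\hat{p}^2}{n + k^2} = 0$$

Complete the “squared” term on the left

$$\left(p - \frac{2n\hat{p} + k^2}{2\left(n + k^2\right)}\right)^2 = \left(\frac{2n\hat{p} + k^2}{2\left(n + k^2\right)}\right)^2 - \frac{n\,\hat{p}^2}{n + k^2}$$

$$p = \frac{2n\hat{p} + k^2}{2\left(n + k^2\right)} \pm \sqrt{\left(\frac{2n\hat{p} + k^2}{2\left(n + k^2\right)}\right)^2 - \frac{n\,\hat{p}^2}{n + k^2}}$$

Simplify as best possible

$$p = \frac{2n\hat{p} + k^2 \pm \sqrt{\left(2n\hat{p} + k^2\right)^2 - 4n\hat{p}^2\left(n + k^2\right)}}{2\left(n + k^2\right)}$$

$$p = \frac{2n\hat{p} + k^2 \pm \sqrt{4n^2\hat{p}^2 + 4n\hat{p}k^2 + k^4 - 4n^2\hat{p}^2 - 4nk^2\hat{p}^2}}{2\left(n + k^2\right)}$$

The $4n^2\hat{p}^2$ terms cancel, leaving $k^4 + 4nk^2\hat{p}\left(1 - \hat{p}\right)$ under the square root, so

$$p_{1,2} = \frac{2n\hat{p} + k^2 \pm k\sqrt{k^2 + 4n\hat{p}\left(1 - \hat{p}\right)}}{2\left(n + k^2\right)}$$

The distance between the two solutions goes to zero as n increases …

$$p_1 - p_2 = \frac{k\sqrt{k^2 + 4n\hat{p}\left(1 - \hat{p}\right)}}{n + k^2}$$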

Note: I have no clue what the textbook did …
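As a numerical cross-check, the quadratic obtained from (p̂ - p)² = k² p (1 - p) / n can be solved with the quadratic formula and the roots substituted back into that equation. The sketch below is not taken from the textbook; the function name is hypothetical, and the inputs (47 heads in 100 tosses, k = 1.96) are borrowed from the coin example in these notes.

```python
import math


def bernoulli_p_bounds(p_hat, n, k):
    """Roots of (p_hat - p)^2 = k^2 * p * (1 - p) / n, solved for p.

    Written as (n + k^2) p^2 - (2 n p_hat + k^2) p + n p_hat^2 = 0.
    """
    center = 2.0 * n * p_hat + k * k
    spread = k * math.sqrt(k * k + 4.0 * n * p_hat * (1.0 - p_hat))
    denom = 2.0 * (n + k * k)
    return (center - spread) / denom, (center + spread) / denom


p_low, p_high = bernoulli_p_bounds(p_hat=0.47, n=100, k=1.96)
width_100 = p_high - p_low
w_low, w_high = bernoulli_p_bounds(p_hat=0.47, n=1200, k=1.96)
width_1200 = w_high - w_low
print(round(p_low, 4), round(p_high, 4), round(width_100, 4), round(width_1200, 4))
```

The interval width shrinks roughly as 1/sqrt(n), consistent with the statement that the two solutions approach each other as n increases.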



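Example 6.6-4: Is it a fair coin?

If we get 47 heads after tossing a coin 100 times, is it fair within a 95% confidence interval?

$$P\left(-k\sqrt{\frac{0.5 \cdot 0.5}{100}} < \hat{p} - p < k\sqrt{\frac{0.5 \cdot 0.5}{100}}\right) = \Phi(k) - \Phi(-k) = 0.95$$

$$k = 1.96$$

$$P\left(-1.96 \cdot \frac{0.5}{10} < \hat{p} - p < 1.96 \cdot \frac{0.5}{10}\right) = 0.95$$

$$P\left(-0.098 < \hat{p} - p < 0.098\right) = 0.95$$

$$P\left(0.5 - 0.098 < \hat{p} < 0.5 + 0.098\right) = 0.95$$

$$P\left(0.402 < \hat{p} < 0.598\right) = 0.95$$

Therefore, 0.47 is within the acceptable range. Alternately

$$P\left(\hat{p} - k\sqrt{\frac{0.5 \cdot 0.5}{100}} < p < \hat{p} + k\sqrt{\frac{0.5 \cdot 0.5}{100}}\right) = 0.95$$

$$P\left(0.47 - 0.098 < p < 0.47 + 0.098\right) = 0.95$$

$$P\left(0.372 < p < 0.568\right) = 0.95$$

If the number of coin flips were 1200 with the same proportional results …

$$P\left(-1.96 \cdot \frac{0.5}{34.64} < \hat{p} - p < 1.96 \cdot \frac{0.5}{34.64}\right) = 0.95$$

$$P\left(-0.0283 < \hat{p} - p < 0.0283\right) = 0.95$$

$$P\left(0.4717 < \hat{p} < 0.5283\right) = 0.95$$ (0.47 is not in the range)

Alternately

$$P\left(0.4417 < p < 0.4983\right) = 0.95$$ (0.5 is not in the range)

We would have to say the coin is biased. The values are not within the 95% confidence intervals!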



6.7 Maximum Likelihood Estimators

The likelihood function can be “properly” defined as

$$L(\theta) = L\left(\theta; x_1, x_2, \ldots, x_n\right) = \prod_{i=1}^{n} f_X\left(x_i \mid \theta\right)$$

We are interested in finding the value of theta, $\theta$, that maximizes this function!

To solve … take the derivative, set it to zero, solve, and determine the minima and maxima. Pick the global maximum!

Easy … right ?!

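Example 6.7-1: Bernoulli RV of an unknown probability p.

What is the maximum likelihood estimate of p if, after flipping coins n times, we have k1 heads?

$$\Pr(k) = \binom{n}{k} p^k \left(1 - p\right)^{n - k}, \quad \text{for } k = 0, 1, 2, \ldots, n$$

We define a likelihood function

$$\Pr\left(Y = k_1 \mid p\right) = \binom{n}{k_1} p^{k_1}\left(1 - p\right)^{n - k_1}$$

Determine the derivative

$$\frac{d}{dp}\Pr\left(Y = k_1 \mid p\right) = \binom{n}{k_1}\left[k_1\,p^{k_1 - 1}\left(1 - p\right)^{n - k_1} - \left(n - k_1\right)p^{k_1}\left(1 - p\right)^{n - k_1 - 1}\right] = 0$$

$$p^{k_1 - 1}\left(1 - p\right)^{n - k_1 - 1}\left[k_1\left(1 - p\right) - \left(n - k_1\right)p\right] = 0$$

$$p^{k_1 - 1}\left(1 - p\right)^{n - k_1 - 1}\left[k_1 - k_1 p - n p + k_1 p\right] = 0$$

$$p^{k_1 - 1}\left(1 - p\right)^{n - k_1 - 1}\left[k_1 - n p\right] = 0$$

The roots are at

$$p = 0,\; 1,\; \frac{k_1}{n}$$

Two are minima (p=0 and p=1), therefore the ML probability is

$$p_{ML} = \frac{k_1}{n}$$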
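The closed-form result can be sanity-checked by brute force: evaluate the log of the binomial likelihood on a grid of p values and pick the largest. This is only an illustrative sketch (a grid search is not how the notes derive the estimate); the function names and the sample counts (47 heads in 100 flips) are assumptions for the demonstration.

```python
import math


def bernoulli_log_likelihood(p, n, k1):
    """Log of C(n, k1) * p^k1 * (1 - p)^(n - k1), for 0 < p < 1."""
    return (
        math.log(math.comb(n, k1))
        + k1 * math.log(p)
        + (n - k1) * math.log(1.0 - p)
    )


def grid_mle(n, k1, steps=1000):
    """Grid point in (0, 1) with the largest log-likelihood."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda p: bernoulli_log_likelihood(p, n, k1))


p_ml_grid = grid_mle(n=100, k1=47)
p_ml_closed_form = 47 / 100
print(p_ml_grid, p_ml_closed_form)
```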



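Example 6.7-2: Determine the mean of a Gaussian R.V. with known variance.

For

$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_X}\exp\left(-\frac{\left(x - \mu\right)^2}{2\sigma_X^2}\right)$$

The joint probability likelihood estimate for n sample trials becomes

$$L(\mu) = \prod_{i=1}^{n} f_X\left(x_i \mid \mu\right) = \left(\frac{1}{\sqrt{2\pi}\,\sigma_X}\right)^n \exp\left(-\frac{\sum_{i=1}^{n}\left(x_i - \mu\right)^2}{2\sigma_X^2}\right)$$

A simplification often performed is to use the log-likelihood function or

$$\log L(\mu) = \log\prod_{i=1}^{n} f_X\left(x_i \mid \mu\right) = \sum_{i=1}^{n}\log f_X\left(x_i \mid \mu\right)$$

$$\log L(\mu) = n\log\left(\frac{1}{\sqrt{2\pi}\,\sigma_X}\right) + \log\exp\left(-\frac{\sum_{i=1}^{n}\left(x_i - \mu\right)^2}{2\sigma_X^2}\right)$$

$$\log L(\mu) = n\log\left(\frac{1}{\sqrt{2\pi}\,\sigma_X}\right) + \sum_{i=1}^{n}\log\exp\left(-\frac{\left(x_i - \mu\right)^2}{2\sigma_X^2}\right)$$

$$\log L(\mu) = n\log\left(\frac{1}{\sqrt{2\pi}\,\sigma_X}\right) - \frac{1}{2\sigma_X^2}\sum_{i=1}^{n}\left(x_i - \mu\right)^2$$

Taking the derivative

$$\frac{d}{d\mu}\log L(\mu) = \frac{d}{d\mu}\,n\log\left(\frac{1}{\sqrt{2\pi}\,\sigma_X}\right) - \frac{1}{2\sigma_X^2}\sum_{i=1}^{n}\frac{d}{d\mu}\left(x_i - \mu\right)^2 = 0$$

$$\frac{1}{2\sigma_X^2}\sum_{i=1}^{n} 2\left(x_i - \mu\right) = 0$$

$$\sum_{i=1}^{n}\mu = \sum_{i=1}^{n} x_i$$

$$\mu_{ML} = \frac{1}{n}\sum_{i=1}^{n} x_i$$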



Maximum Likelihood Estimator Properties

Invariance: if an MLE is found for theta, $\hat{\theta}_{ML}$, then the MLE of a function of theta is the function of the MLE.

For $\hat{\theta}_{ML}$

Then $y = h(\theta)$ has $\hat{y}_{ML} = h\left(\hat{\theta}_{ML}\right)$

from https://en.wikipedia.org/wiki/Maximum_likelihood

Consistency: the sequence of MLEs converges in probability to the value being estimated.

Asymptotic normality: as the sample size increases, the distribution of the MLE tends to the Gaussian distribution with mean $\theta$ and covariance matrix equal to the inverse of the Fisher information matrix.

Efficiency, i.e., it achieves the Cramér–Rao lower bound when the sample size tends to infinity. This means that no consistent estimator has lower asymptotic mean squared error than the MLE (or other estimators attaining this bound).

Second-order efficiency after correction for bias.



6.8 Ordering, Ranking and Percentiles

Percentile: https://en.wikipedia.org/wiki/Percentile

“A percentile (or a centile) is a measure used in statistics indicating the value below which a given percentage of observations in a group of observations fall. For example, the 20th percentile is the value (or score) below which 20 percent of the observations may be found.”

Textbook:

The u-th percentile of X is the number $x_u$ such that $F_X(x_u) = u$.

$$u = F_X\left(x_u\right)$$

One would say that the result $x_u$ is in the u-th percentile.

Example 6.8-1 Assume a person’s IQ is distributed as N(100,100). That is, a Gaussian (normal) distribution with a mean of 100, a variance of 100, and a standard deviation of 10.

Then an IQ of 115 would be at what percentile of the population?

$$z = \frac{115 - 100}{10} = 1.5$$

$$\Phi(1.5) = 0.9332$$

The individual is in the 93rd percentile for IQ.
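The table lookup in this example can be reproduced with `statistics.NormalDist` from the Python standard library (Python 3.8+). This is a small sketch; the variable names are illustrative.

```python
from statistics import NormalDist

# IQ modeled as Gaussian with mean 100 and standard deviation 10, as in the example.
iq = NormalDist(mu=100, sigma=10)

# Percentile of an IQ of 115: u = F_X(x_u).
u_115 = iq.cdf(115)

# Going the other way: the IQ value x_u at the 50th percentile (the median).
x_median = iq.inv_cdf(0.5)

print(round(u_115, 4), round(x_median, 1))
```

For the Gaussian the median returned by the inverse CDF coincides with the mean, which is not true of every distribution.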

Median: The median of a population is defined as the value for which half of the population is above and half is below.

$$F_X\left(x_{median}\right) = 0.5$$

For some of the distributions described, the mean and the median are not equal!

For example, the exponential distribution.

$$x_{median} = \frac{-\ln(0.5)}{\lambda} = \frac{0.69}{\lambda}$$ whereas $$\mu_X = \frac{1}{\lambda}$$