Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and...

391
Chapter . Introduction to R, and Descriptive Data Analysis What is R: an environment for data analysis and graphics based on S language a full-featured programming language freely available to everyone (with complete source code) Easier access to the means of handling BigData such as parallel computation, Hadoop, distributed computation. ocial homepage: http://www.R-project.org

Transcript of Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and...

Page 1: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Chapter 1. Introduction to R, and Descriptive Data Analysis

What is R: an environment for data analysis and graphics based on Slanguage

• a full-featured programming language

• freely available to everyone (with complete source code)

• Easier access to the means of handling BigData such as parallelcomputation, Hadoop, distributed computation.

• ocial homepage: http://www.R-project.org

Page 2: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

1.1 Installation

Installing R: R consists of twomajor parts: the base system and a collectionof (over 8.5K) user contributed add-on packages, all available from theabove website.

To install the base system, Windows users may follow the link

http://CRAN.R-project.org/bin/windows/base/release.htm

Note. The base distribution comes with some high-priority add-on pack-ages such as graphic systems, linear models etc.

After the installation, one may start R in the PC by going to Start ->Statistics -> R, or simply doubt-click the logo ‘R’ on your desktop. AnR-console will pops up with a prompt character like ‘>’.

Page 3: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

R may be used as a calculators. Of course it can do much more. Try out

> sqrt(9)/3 -1

To quit R, type at the prompt ‘q()’.

It is strongly advised to use RStudio instead of R. You may nd it with thelink

https://www.rstudio.com/

FromWikipedia: RStudio is a free and open-source integrated developmentenvironment (IDE) for R, a programming language for statistical computingand graphics. RStudio was founded by JJ Allaire, creator of the programminglanguage ColdFusion. Hadley Wickham is the Chief Scientist at RStudio.

We assume you use RStudio throughout the course.

Page 4: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

To dene a vector x consisting of integers 1, 2, · · · , 100

> x <- 1:100> x

[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18[19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36[37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54[55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72[73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90[91] 91 92 93 94 95 96 97 98 99 100> sum(x)> [1] 5050

Or we may also try> y <- (1:100)^2> y

[1] 1 4 9 16 25 36 49 64 81 100 121 144[13] 169 196 225 256 289 324 361 400 441 484 529 576[25] 625 676 729 784 841 900 961 1024 1089 1156 1225 1296[37] 1369 1444 1521 1600 1681 1764 1849 1936 2025 2116 2209 2304[49] 2401 2500 2601 2704 2809 2916 3025 3136 3249 3364 3481 3600

Page 5: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

[61] 3721 3844 3969 4096 4225 4356 4489 4624 4761 4900 5041 5184[73] 5329 5476 5625 5776 5929 6084 6241 6400 6561 6724 6889 7056[85] 7225 7396 7569 7744 7921 8100 8281 8464 8649 8836 9025 9216[97] 9409 9604 9801 10000

> y[14] # print out the 14-th element of vector y[1] 196

One may also try x+y, (x+y)/(x+y), help(log), log(x) etc.

Additional packages can be installed directly from the R prompt. Informa-tion on the available packages is available at

http://cran.r-project.org/web/views/http://cran.r-project.org/web/packages/

For example, one may install HSAUR2 – A Handbook of Statistical AnalysisUsing R (2nd edition):

Page 6: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

> install.packages("HSAUR2")> library("HSAUR2") # To load all the objects in the package\\

# into the current session

You may start an R help manual using command help.start(). By clickingPackages in the manual, you will see HSAUR2 is listed among the installedpackages.

Page 7: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

1.2 Help and documentation

To start a manual page of R: help.start()

Alternatively we may access online manual at

http://cran.r-project.org/manuals.html

To access a manual for function ‘mean’: help(mean), or ?mean

To access the info on an added-on package: help(package="HSAUR2")

To access the info on a data set or a function in the installed package:help(package="HSAUR2", men1500m)

Page 8: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

To load all the functions in an added-on package: library("HSAUR2")

To load a data set from the installed package into the current session:data(men1500m, package="HSAUR2")

Type men1500m to print out all the info in the data set ‘men1500m’.

Two other useful sites:

R Newsletter: http://cran.r-project.org/doc/Rnews/

R FAQ: http://cran.r-project.org/faqs.html

You may also simply follow the links on the main page of the R project

Page 9: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

http://www.R-project.org

Last but not least, google whatever questions often leads to most helpfulanswers

Page 10: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

1.3 Data Import/Export

The easiest form of data to import into R is a simple text le. The primaryfunction to import from a text le is scan. You may check out what ‘scan’can do: > ?scan

Create a plain text le ‘simpleData’, in the folder ‘statsI’ in your Drive D, asfollow:

This is a simple data file, created for illustrationof importing data in text files into R1 2 3 45 6 7 89 10 11 12

Page 11: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

It has two lines of explanation and 3 lines numbers. The R session belowimports it into R as a vector x and 3 × 4 matrix y, perform some simpleoperations. Note the ag skip=2 instructs R to ignore the rst two lines inthe le.

Note. R ignores anything after ‘#’ in a command line.

> x <- scan("D:/statsI/simpleData.txt", skip=2)> x # print out vector x[1] 1 2 3 4 5 6 7 8 9 10 11 12

> length(x)[1] 12> mean(x); range(x) # write 2 commands in one line to save space[1] 6.5[1] 1 12> summary(x) # a very useful command!

Min. 1st Qu. Median Mean 3rd Qu. Max.1.00 3.75 6.50 6.50 9.25 12.00

Page 12: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

> y <- matrix(scan("D:/statsI/simpleData.txt", skip=2), byrow=T,ncol=4)

> y # print out matrix y[,1] [,2] [,3] [,4]

[1,] 1 2 3 4[2,] 5 6 7 8[3,] 9 10 11 12> dim(y) # size of matrix y[1] 3 4> y[1,] # 1st row of y[1] 1 2 3 4> y[,2] # 2nd column of y[1] 2 6 10> y[2,4] # the (2,4)-th element of matrix y[1] 8

Page 13: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

A business school sent a questionnaire to its graduates in past 5 years andreceived 253 returns. The data are stored in a plain text le ‘Jobs’ whichhas 6 columns:C1: ID numberC2: Job type, 1 - accounting, 2 - nance, 3 - management, 4 - marketing

and sales, 5 -othersC3: Sex, 1 - male, 2 - femaleC4: Job satisfaction, 1 - very satised, 2 - satised, 3 - not satisedC5: Salary (in thousand pounds)C6: No. of jobs after graduation

IDNo. JobType Sex Satisfaction Salary Search1 1 1 3 51 12 4 1 3 38 23 5 1 3 51 44 1 2 2 52 5... ...

Page 14: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

We import data into R using command read.table

> jobs <- read.table("D:/statsI/Jobs.txt"); jobsV1 V2 V3 V4 V5 V6

1 IDNo. JobType Sex Satisfaction Salary Search2 1 1 1 3 51 13 2 4 1 3 38 24 3 5 1 3 51 4

... ...> dim(jobs)[1] 254 6> jobs[1,]

V1 V2 V3 V4 V5 V61 IDNo. JobType Sex Satisfaction Salary Search

We repeat the above again by taking the 1st row as the names of vari-ables (header=T) and the entries in 1st column as the names of the rows(row.names =1).

Page 15: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

> jobs <- read.table("D:/statsI/Jobs.txt", header=T, row.names=1)> dim(jobs)[1] 253 5> names(jobs)[1] "JobType" "Sex" "Satisfaction" "Salary" "Search"> class(jobs)[1] "data.frame"> class(jobs[,1]); class(jobs[,2]); class(jobs[,3]);

class(jobs[,4]); class(jobs[,5])[1] "integer"[1] "integer"[1] "integer"[1] "integer"[1] "integer"

Since the rst three variables are nominal, we may specify them as ‘factor’,while "Salary" can be specied as ‘numeric’:

> jobs <- read.table("D:/statsI/Jobs.txt", header=T, row.names=1,

Page 16: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

colClasses = c("factor", "factor", "factor","numeric", "integer"))

> class(jobs[,1]); class(jobs[,2]); class(jobs[,3]);class(jobs[,4]); class(jobs[,5])

[1] "factor"[1] "factor"[1] "factor"[1] "numeric"[1] "integer"

Note. we need to specify the class for the row name variable (i.e. 1stcolumn) as well.

Now we do some simple descriptive statistical analysis for this data.

> table(jobs[,1])1 2 3 4 5

Page 17: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

73 52 36 64 28 # No. of graduates with 5 different JobTypes> t <-table(jobs[,2], jobs[,1], deparse.level=2) # store table in t> t

jobs[, 1]jobs[, 2] 1 2 3 4 5

1 35 30 18 39 13 # No. of males with 5 different JobTypes2 38 22 18 25 15 # No. of females with 5 different JobTypes

> 100*t[1,]/sum(t[1,])1 2 3 4 5

25.92593 22.22222 13.33333 28.88889 9.62963# Percentages of males with 5 different JobTypes

> 100*t[2,]/sum(t[2,])1 2 3 4 5

32.20339 18.64407 15.25424 21.18644 12.71186# Percentages of females with 5 different JobTypes

> barplot(t, main="No. of graduates in 5 different job categories",legend.text=c("male", "female"), names.arg=c("accounting","finance", "management", "marketing", "others")) # draw a bar-plot

Page 18: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

accounting finance management marketing others

femalemale

No. of graduates in 5 different job categories0

1020

3040

5060

70

Page 19: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

The barplot shows the dierence in job distribution due to gender. Wemay also draw pie-plots, which are regarded as less eective.

> pie(t[1,]+t[2,],label=c("accounting","finance","management","marketing","others")); text(0,1, "Total", cex=2)

> pie(t[1,],label=c("accounting","finance","management","marketing","others")); text(0,1, "Male", cex=2)

> pie(t[2,],label=c("accounting","finance","management","marketing","others")); text(0,1, "Female", cex=2)

Page 20: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

accounting

finance

management

marketing

others

Total

accountingfinance

management

marketing

others

Maleaccounting

finance

management

marketing

others

Female

Page 21: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Now let look at the salary (jobs[,4]) distribution, and the impact due togender.

> mSalary <- jobs[,4][jobs[,2]==1]# extract the salary data from male

> fSalary <- jobs[,4][jobs[,2]==2]# extract the salary data from female

> summary(jobs[,4]); summary(mSalary); summary(fSalary)Min. 1st Qu. Median Mean 3rd Qu. Max.26.00 43.00 47.00 47.13 52.00 65.00Min. 1st Qu. Median Mean 3rd Qu. Max.34.00 44.00 48.00 48.11 53.00 65.00Min. 1st Qu. Median Mean 3rd Qu. Max.26.00 42.25 46.00 46.00 51.00 61.00

> hist(jobs[,4], col="gray", nclass=15, xlim=c(25,66),main="Histogram of Salaries (Total)")

# plot the histogram of salary data> hist(mSalary, col="blue", nclass=15, xlim=c(25,66),

main="Histogram of Salaries (Male)")

Page 22: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

> hist(fSalary, col="red", nclass=15, xlim=c(25,66),main="Histogram of Salaries (Female)")

You may also try stem-and-leaf plot: stem(jobs[,4])

Page 23: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Histogram of Salaries (Total)

jobs[, 4]

30 40 50 60

010

2030

Histogram of Salaries (Male)

mSalary

30 40 50 60

05

1015

Histogram of Salaries (Female)

30 40 50 60

05

1015

Page 24: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

To export data from R, use write.table or write.

To write jobs into a plain text le ‘Jobs1.txt’:

> write.table(jobs, "Jobs1.txt")

which retains both the row and column names. Note the dierent entriesin the le are separated by spaces.

We may also use

> write.table(jobs, "Jobs2.txt", row.names=F, col.names=F),> write.table(jobs, "Jobs3.txt", sep=",")

Page 25: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Compare the three output les.

Note that the values of factor variables are recorded with “ ". To recordall the levels of factor variables as numerical values, we need to dene apure numerical data.frame rst:

> t <- data.frame(as.numeric(jobs[,1]), as.numeric(jobs[,2]),as.numeric(jobs[,3]), jobs[,4], jobs[,5])

> write.table(t, "Jobs4.txt")

The le "Jobs4.txt" contains purely numerical values.

Note. (i) Working directory — all exported les are saved in ‘My Documents’by default. You may change your working directory by clicking

File -> Change dir...

Page 26: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

in the RGui window. For example, I create on my laptop D:\statsI as myworking directory for this course.

(ii) Saving a session — when you quit an R session q(), you will be oeredan option to ‘save workspace image’. By clicking on "yes", you will save allthe objects (including data sets, loaded functions from added-on packagesetc) in your R session. You may continue to work on this session by directlydouble-clicking on the image le in your working directory.

A useful tip: Create a separate working directory for each of your R projects.

1.4 Organising an Analysis

An R analysis typically consists of executing several commands. Instead oftyping each of those commands on the R prompt, we may collect them

Page 27: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

into a plain text le. For example, the le "jobsAnalysis.r" in my workingdirectory reads like:

jobs <- read.table("Jobs.txt", header=T, row.names=1)# File "Jobs.txt" is in the working directory now

mSalary <- jobs[,4][jobs[,2]==1]fSalary <- jobs[,4][jobs[,2]==2]summary(jobs[,4])summary(mSalary)summary(fSalary)par(mfrow=c(3,1)) # display 3 figures in one columnhist(jobs[,4], col="gray", nclass=15, xlim=c(25,66),

main="Histogram of Salaries (Total)")hist(mSalary, col="blue", nclass=15, xlim=c(25,66),

main="Histogram of Salaries (Male)")hist(fSalary, col="red", nclass=15, xlim=c(25,66),

main="Histogram of Salaries (Female)")

You may carry out the project by sourcing the le into an R session:

Page 28: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

> source("jobAnalysis.r", echo=T)

Also try source("jobAnalysis.r").

1.5 Writing functions in R

For some repeated task, it is convenient to dene a function in R. Weillustrate this idea by an example.

Consider the famous ‘Birthday Coincidences’ problem: In a class of kstudents, what is the probability that at least two students have the samebirthday?

Let us make some assumptions to simplify the problems:

Page 29: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

(i) only 365 days in every year,(ii) every day is equally likely to be a birthday,(iii) students’ birthdays are independent with each other.

With k people, the total possibilities is (365)k .

Consider the complementary event: all k birthdays are dierent. The totalsuch possibility is

365 × 364 × 363 × · · · × (365 − k + 1) =365!

(365 − k )!So the probability that at least two students have the same birthday is

p(k ) = 1 −365!

(365 − k )!(365)k.

Page 30: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Wemay use R to compute p(k ). Unfortunately factorials are often too large,e.g. 52! = 8.065525e + 67, and often cause overow in computer. We adoptthe alternative formula

p(k ) = 1 − explog(365!) − log((365 − k )!) − k log(365).

We dene a R-function pBirthday to perform this calculation for dierentk .

> pBirthday <- function(k)+ 1 - exp(lfactorial(365) - lfactorial(365-k) - k*log(365))

# lfactorial(n) returns log(n!)> pBirthday(100)[1] 0.9999997 # probability with a class of 100 students> x <- c(20, 30, 40, 50, 60)> pBirthday(x)[1] 0.4114384 0.7063162 0.8912318 0.9703736 0.9941227

Page 31: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

With 20 students in class, the probability of having overlapping birthdaysis about 0.41. But with 60 students, the probability is almost 1, i.e. itis almost always true that at least 2 out of 60 students have the samebirthday.

Note. The expression in a function may have several lines. In this case theexpression is enclosed in curly braces , and the nal line determinesthe return value.

Page 32: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Another Example — The capture and recapture problem

To estimate the number of whitesh in a lake, 50 whitesh are caught,tagged and returned to the lake. Some time later another 50 are caughtand only 3 are tagged ones. Find a reasonable estimate for the number ofwhitesh in the lake.

Suppose there are n whitesh in the lake. Catching 50 sh can be done

in(n50

)= n !50!(n−50)! ways, while catching 3 tagged ones and 47 untagged

can be done in(503

) (n − 5047

)ways. Therefore the probability for the

latter event to occur is

Pn =

(503

) (n − 5047

) / (n50

).

Page 33: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Therefore, a reasonable estimate for n should be the value at which Pnobtains its maximum. We use R to compute Pn and to nd the estimate.

> Pn <- function(n) + tmp <- choose(50,3)*choose(n-50,47)+ tmp/choose(n,50)+ # Definition for function Pn ends here> n <- 97:2000 # as there are at least 97 fish in the lake> plot(n, Pn(n), type='l')

It produces the plot of Pn against n :

Page 34: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

500 1000 1500 2000

0.00

0.05

0.10

0.15

0.20

n

Pn

To nd the maximum:> m <- max(Pn(n)); m[1] 0.2382917> n[Pn(n)==m][1] 833

Hence the estimatednumber of sh in thelake is 833.

Page 35: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

1.6 Control structure: loops and conditionals

An if statement has the formif (condition) expression1 else expression2

It executes ‘expression1’ if ‘condition’ is true, and ‘expression2’ otherwise.When ‘condition’ contains several lines, they should be enclosed in curlybraces . The same applies to expressions.

The above statement can be compactly written in the formifelse(condition, expression1, expression2)

When the else-part is not present:if (condition) expression

It executes ‘expression’ if ‘condition’ is true, and does nothing otherwise.

Page 36: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

A for loop allows a statement to be iterated as a variables assumes valuesin a specied sequence. It has the form:

for(variable in sequence) statement

A while loop does not use an explicit loop variable:while (condition) expression

It repeats ‘expression’ as long as ‘condition’ holds. This makes it dierentlyfrom the “if-statement” above.

We illustrate those control commands by examining a simple ‘doubling’strategy in gambling.

You go to a casino to play a simple 0-1 game: you bet x dollars and ipa coin. You win 2x dollars and keep your bet if ‘Head’, and lose x dollarsif ‘Tail’. You start 1 dollar in rst game, and double your bet in each newgames, i.e. you bet 2i−1 dollars in the i -th game, i = 1, 2, · · · .

Page 37: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

With this strategy, once you win, say, at the (k +1)-th game, you will recoverall your losses in your previous games plus a prot of 2k + 1 dollars, as

2 × 2k >k∑i=1

2i−1 = 2k − 1.

Hence as long as (i) the probability p of the occurrence of ‘Head’ is positive(no matter how small), and (ii) you have enough capital to keep you in thegames, you may win handsomely at the end — is it really true?

Condition (ii) is not trivial, as the maximum loss in 20 games is 220 − 1 =1, 048, 575 dollars!

Plan A: Suppose you could aord to lose maximum n games and, therefore,decide to play n games. We dene the R -function nGames below to simulateyour nal earning/loss (after n games).

Page 38: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

nGames <- function(n,p) # n is the No. of games to play# p is the prob of winning each game

x <- 0 # earning after each gamefor(i in 1:n) ifelse(runif(1)<p, x <- x+2^i, x <- x-2^(i-1))

# runif(1) returns a random number from uniform dist on (0, 1)x # print out your final earning/loss

To play n = 20 games with p = 0.1:

> nGames(20, 0.1)[1] -999411> nGames(20, 0.1)[1] -1048575> nGames(20, 0.1)[1] 524289> nGames(20, 0.1)

Page 39: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

[1] -655263> nGames(20, 0.1)[1] -1016895

We repeated the experience 5 times above, with 5 dierent results.

One way to assess this gameplan is to repeat a large number of times andlook at the average earning/loss:

> x = vector(length=5000)> for(i in 1:5000) x[i] <- nGames(20, 0.1)> mean(x)[1] -733915

In fact, this mean -733915 is stable measure reecting the average loss ofthis gameplan.

Page 40: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Plan B: Play the maximum n , but quit as soon as winning one game. TheR-function winStop simulates the earning/loss.

winStop <- function(n,p) # n -- maximum No. of games, p -- prob of winning each game

i <- 1ifelse((runif(1)<p), x<- 2, x<- -1) # play 1st gamewhile((x<0)&(i<n)) i <- i+1 # i records the no. of games played

ifelse(runif(1)<p, x <- x+2^i, x <- x-2^(i-1))

x

Set n = 20, p = 0.1, we repeat the experience a few times:

>winStop(20, 0.1)[1] 2

Page 41: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

> winStop(20, 0.1)[1] 17> winStop(20, 0.1)[1] 129> winStop(20, 0.1)[1] -1048575> winStop(20, 0.1)[1] 16385

To assess the gameplan:

> x<- 1:5000> for(i in 1:5000) x[i] <- winStop(20, 0.1)> mean(x)[1] -112672.9 # This indicates "Plan B" is better than "Plan A"> for(i in 1:5000) x[i] <- winStop(80, 0.1)

# the maximum no. of games is 80 now> mean(x)

Page 42: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

[1] -7.22886e+20> for(i in 1:5000) x[i] <- winStop(90, 0.1)

# the maximum no. of games is 90 now> mean(x)3.790896e+18

With p as small as 0.1, you need a huge capital in order to play about 90games to generate the positive returns in average.

The best and the most eective way to learn R: use it!

Hands-on experience is the most illuminating.

Page 43: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Chapter 2. Probability

Probability: a number between 0 and 1 to quantifying uncertainty in amathematical manner.

2.1 Sample space and events

Sample Space Ω: a set of possible outcomes of an experiment.

Sample outcome, realization or element: a point in a sample space, de-noted by ω ∈ Ω.

Event or random event: a subset of Ω, i.e. an assemble of some sampleoutcomes

Page 44: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 1. Experiment – Toss a coin two times.

Sample space = HH ,HT ,T H ,TT .

A ≡ HH ,HT = 1st toss is head is an event.

What is the sample space if we toss a coin for ever? — the Bernoulli trial.

Background of tossing a coin: success or failure, up or down, better orworse, boy or girl, 1 or 0 and etc.

Page 45: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 2. Find the sample space in each of the following cases

• number of insect damaged leaves on a plant

• lifetime (in hours) of a light bulb

• weight of a 10-hour old infant

• exchange rate of pounds sterling to US-dollars today next year

• directional movement S&P500 index price tomorrow

Page 46: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Complement of event A: Ac = ω ∈ Ω : ω < A. Obviously Ωc = ∅ (theempty set).

Union of events A and B : A ∪ B = ω ∈ Ω : ω ∈ A or ω ∈ B. ThenA ∪ B = B ∪ A, A ∪ Ac = Ω.

Intersection of events A and B : A ∩ B ≡ AB = ω ∈ Ω : ω ∈ A and ω ∈ B.Then A ∩ B = B ∩ A, A ∩ Ac = ∅.

If A1,A2, · · · is a sequence of events,∞⋃i=1

Ai = ω ∈ Ω : ω ∈ Ai for at least one i ,

∞⋂i=1

Ai = ω ∈ Ω : ω ∈ Ai for all i .

Page 47: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Dierence of events A and B : A − B = ω ∈ Ω : ω ∈ A and ω < B.Obviously A − B , B − A.

Page 48: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Inclusion: occurrence of event A implies that of B , we say A ⊂ B .

Summary of Terminology

Ω Sample space, true event (always true)∅ null event (always false)ω outcome, realization or elementAc complement of A (not A)A ∪ B union (A or B)A ∩ B or AB intersection (A and B)A − B or A\B set dierenceA ⊂ B set inclusion

Mutually exclusive or disjoint: A and B are mutually exclusive if A∩B = ∅.Obviously A and Ac are mutually exclusive.

Page 49: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Partition of Ω: a sequence disjoint events A1,A2, · · · such that

∞⋃i=1

Ai = Ω.

Indicator of A: IA ≡ IA(ω) — a function dened on ω ∈ Ω:

IA =

1 if A occurs0 otherwise, or equivalently IA(ω) =

1 if ω ∈ A0 if ω < A.

Limits of a sequence of monotonic events:(i) A sequence A1,A2, · · · is monotone increasing if A1 ⊂ A2 ⊂ · · · . Wedene limn→∞An = ∪∞i=1Ai .(ii) A sequence A1,A2, · · · is monotone decreasing if A1 ⊃ A2 ⊃ · · · . Wedene limn→∞An = ∩∞i=1Ai .In both cases, we may write An → A, where A denotes its limit.

Page 50: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 3. Let Ω = (−∞,∞), Ai = [0, 1/i ). Then

∪∞i=1Ai = [0, 1), ∩∞i=1Ai = 0.

If we change to Ai = (0, 1/i ), then ∪∞i=1Ai = (0, 1) and ∩∞i=1Ai = ∅.

For Ai = (−i , i ), ∪∞i=1Ai = Ω.

Example 4. Let Ω be the salaries earned by the graduates from a BusinessSchool. We may choose Ω = [0,∞). Based on the dataset "Jobs.txt", weextract some interesting events/subsets. Recall the info of the dataset:C1: ID numberC2: Job type, 1 - accounting, 2 - nance, 3 - management, 4 - marketing

and sales, 5 -othersC3: Sex, 1 - male, 2 - femaleC4: Job satisfaction, 1 - very satised, 2 - satised, 3 - not satisedC5: Salary (in thousand pounds)

Page 51: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

C6: No. of jobs after graduation

We have dened salaries of male and female graduates respectively asfollows:

> jobs <- read.table("Jobs.txt", header=T, row.names=1)> mSalary <- jobs[,4][jobs[,2]==1]> fSalary <- jobs[,4][jobs[,2]==2]

Similarly we may extract the salaries from nance sector or accounting:

> finSalary <- jobs[,4][jobs[,1]==2]; summary(finSalary)Min. 1st Qu. Median Mean 3rd Qu. Max.37.00 46.00 53.00 52.08 58.00 65.00

> accSalary <- jobs[,4][jobs[,1]==1]; summary(accSalary)Min. 1st Qu. Median Mean 3rd Qu. Max.40.00 47.00 51.00 50.45 54.00 62.00

Page 52: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

According to this dataset, nance pays slightly higher than accounting. Wemay also extract the salaries for males (females) in accounting:

> maccSalary <- jobs[,4][(jobs[,1]==1) & (jobs[,2]==1)]# &' stands for logic operation and'

> summary(mfinSalary)Min. 1st Qu. Median Mean 3rd Qu. Max.44.00 48.00 51.00 51.31 55.00 62.00

> faccSalary<- jobs[,4][(jobs[,1]==1) & (jobs[,2]==2)]> summary(ffinSalary)

Min. 1st Qu. Median Mean 3rd Qu. Max.40.00 45.25 49.50 49.66 53.00 61.00

To extract the salaries for males in both nance and accounting:

> mfinaccSalary <- jobs[,4][ (jobs[,2]==1) & ( (jobs[,1]==1) |(jobs[,1]==2) ) ] # |' stands for logic operation or'

Page 53: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

To remove (unwanted) objects:

> rm(mSalary, fSalary, accSalary, finSalary, maccSalary,faccSalary, mfinaccSalary)

Page 54: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

2.2 Probability

Denition. Probability a function P that assigns a real number P(A) to eachevent in a sample space, which satises the three conditions:

(i) P (A) ≥ 0 for any event A,(ii) P (Ω) = 1, and(iii) For disjoint events A1,A2, · · · , P

( ⋃∞i=1Ai ) =

∑∞i=1 P (Ai ).

Let A1 = Ω, A2 = A3 = · · · = ∅. By (iii) and (ii), P (∅) = 0.Hence for any disjoint A and B , P (A ∪ B) = P (A) + P (B).

More properties of probability:

1. P (Ac) = 1 − P (A).

2. If A ⊂ B , P (B) = P (A) + P (B − A) ≥ P (A).

Page 55: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

3. P (A ∪ B) = P (A) + P (B) − P (A ∩ B) ≤ P (A) + P (B).

4. Boole inequality: P (⋃ni=1Ai ) ≤

∑ni=1 P (Ai ).

5. If An → A, P (An) → P (A).

Proof. 1. P (A) + P (Ac) = P (A ∪ Ac) = P (Ω) = 1.

3. A ∪ B = (AB) ∪ (ABc) ∪ (AcB), and the 3 events on the RHS are disjoint.Hence

P (A ∪ B) = P (AB) + P (ABc) + P (AcB).

Since A = (AB) ∪ (ABc), P (A) = P (AB) + P (ABc). Similarly P (B) = P (AB) +P (AcB). ThereforeP (A ∪ B) = P (AB) + P (A) − P (AB) + P (B) − P (AB) = P (A) + P (B) − P (AB).

4. is obtained by applying 3. repeatedly.

The proof of 5. is a bit more involved, we refer to p.7 of Wasserman (2004).

Page 56: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 5. Toss a fair 6-sided die, there are 6 possible outcomes eachwith probability 1/6. If we toss it two times, the sample space is Ω = (i , j ) :i , j = 1, · · · , 6. Since each outcome is equally likely,

P (A) =No. of elements in A

36, A ∈ Ω.

For example, P (A) = 2/36 = 1/18 for A = the sum is 3, and P (A) = 3/36 =1/12 for A = the sum is 4.

2.3 Independence

Denition. k events A1, · · · ,Ak are independent if

P (Ai1Ai2 · · ·Aij ) = P (Ai1)P (Ai2) · · · P (Aij )

for any 1 ≤ i1 < i2 < · · · < ij ≤ k and 2 ≤ j ≤ k .

Page 57: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Intuition. If A and B are independent, the occurrence of A has nothing todo with the occurrence of B . For example, two persons toss two coins: twooutcomes are independent with each other.

Page 58: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 6. Toss a fair coin 10 times. Let A = “at least one head". Dene Tjbe the event that tail occurs on the j-th toss. Then

P (A) = 1 − P (Ac) = 1 − P (T1 · · ·T10) = 1 − P (T1)P (T2) · · · P (T10)

= 1 − (0.5)10 ≈ 0.9999.

Example 7. John and Peter play each other in the nal of a tennis tourna-ment. Whoever wins 2 out of 3 games will win the tournament. Supposethat John is higher ranked player who beats Peter in a single game withprobability 0.6, and each game will be played independently. Find theprobability that John will win the tournament.

Let Ai =“John wins the i-th game", and A =“John wins the tournament".Then

A = (A1A2) ∪ (A1Ac2A3) ∪ (A

c1A2A3),

Page 59: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

and the 3 events on the RHS are disjoint. Hence

P (A) = P (A1A2) + P (A1Ac2A3) + P (A

c1A2A3)

= (0.6)2 + 2 × (0.6)2 × 0.4 = 0.648,

which is greater than the probability for John to win a single game.

Question. Would John prefer to play the maximum 5 (instead of 3) gamesin the nal?

2.4 Conditional Probability

Example 8. Five people take one ball each out of a bag containing 4 whiteballs and one red ball.

Page 60: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Obviously the Probability for the 1st person to take the red ball is 1/5.What is the probability for the 2nd, 3rd, 4th or the last person to take thered ball?

Denition. If P (B) > 0, the conditional probability of A given B is

P (A |B) = P (AB)/P (B),

which is the probability of event A given the condition that event B occursalready.

Remark. (i) If A and B are independent, P (A |B) = P (A).

(ii) In general P (AB) = P (A|B)P (B).

Page 61: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 8. (Continue)

P (2nd person takes R) = P (1st person takes W, 2nd Person take R)= P (1st person takes W) × P (2nd person takes R|1st person takes W)

=4

5×1

4= 1/5,

which is the same as the probability for the 1st person to take the red.

Page 62: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Let A1, · · · ,Ak be a partition of Ω.

Law of Total Probability. For any event B ,

P (B) = P (BA1) + · · · + P (BAk ).

Proof. B = BΩ = B(∪iAi ) = ∪i (BAi ). Since BA1, · · · ,BAk are disjoint, thelaw holds.

Bayes’ Formula. Let P (B) > 0 and P (Ai ) > 0 for i = 1, · · · , k . Then

P (Aj |B) = P (B |Aj )P (Aj )/ k∑i=1

P (B |Ai )P (Ai ).

Proof. P (Aj |B) = P (AjB)/P (B) = P (B |Aj )P (Aj )

/P (B). Replacing P (B)

using the law of total probability, we obtain Bayes’ Formula.

Page 63: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 9. Larry divides his emails into 3 categories: A1 =“spam", A2 =“lowpriority" and A3 =“high priority". From previous experience he concludes

P (A1) = 0.7, P (A2) = 0.2, P (A3) = 0.1.

Let B be the event that an email contains the word ‘free’. Again based onprevious experience,

P (B |A1) = 0.9, P (B |A2) = 0.1, P (B |A3) = 0.1.

He receives a new email with word ‘free’. What is the probability that it isspam?

By Bayes’ theorem,

P (A1 |B) =P (B |A1)P (A1)∑3i=1

P (B |Ai )P (Ai )= 0.955.

Page 64: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Chapter 3. Random Variables and Distributions

Basic idea of introducing random variables: represent outcomes and/orrandom events by numbers.

3.1 Random variables and Distributions.

Denition. A random variable is a function dened on the sample spaceΩ, which assigns a real number X (ω) to each outcome ω ∈ Ω.

Example 1. Flip a coin 10 times. We may dene random variables (r.v.s) asfollows:

X1 = no. of heads,X2 = no. of ips required to have the rst head,

Page 65: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

X3 = no. of ‘HT’-pairs,X4 = no. of tails.

For ω = HHT HHT HHTT , X1(ω) = 6, X2(ω) = 1, X3(ω) = 3, X4(ω) = 4.

Note X1 ≡ 10 − X4.

Remark. The values of a r.v. varies and cannot be pre-determined beforean outcome occurs.

Denition. For any r.v. X , its (cumulative) distribution function (CDF) isdened as FX (x ) = P (X ≤ x ).

Example 2. Toss a fair coin twice and let X be the number of heads. Then

P (X = 0) = P (X = 2) = 1/4, P (X = 1) = 1/2.

Page 66: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Hence its CDF is FX (x ) =

0 x < 0,1/4 0 ≤ x < 1,3/4 1 ≤ x < 2,1 x ≥ 2.

Note. (i) FX (x ) is right continuous, non-decreasing, and dened for allx ∈ (−∞,∞). For example, FX (1.1) = 0.75.

(ii) The CDF is a non-random function.

(iii) If F (·) is the CDF of r.v. X , we simply write X ∼ F .

Properties of CDF. A function F (·) is a CDF i

(i) F is non-decreasing: x1 < x2 implies F (x1) ≤ F (x2),(ii) F is normalized: limx→−∞ F (x ) = 0, limx→∞ F (x ) = 1,(iii) F is right continuous: limy↓x F (y ) = F (x ).

Page 67: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Probabilities from CDF

(a) P (X > x ) = 1 − F (x )(b) P (x < X ≤ y ) = F (y ) − F (x )(c) P (X < x ) = limh↓0 F (x − h) ≡ F (x−)(d) P (X = x ) = F (x ) − F (x−).

Note. It is helpful for understanding (c) & (b) to revisit Example 2.

Page 68: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

3.2 Discrete random variables

If r.v. X only takes some isolated values, X is called a discrete r.v. Its CDFis called a discrete distribution.

Denition. For a discrete r.v. X taking values x1, x2, · · · , we dene theprobability function (or probability mass function) as

fX (xi ) = P (X = xi ), i = 1, 2, · · · .

Obviously, fX (xi ) ≥ 0 and∑i fX (xi ) = 1.

It is often more convenient to list a probability function in a table:

X x1 x2 · · · · · ·

Probability fX (x1) fX (x2) · · · · · ·

Page 69: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 2 (continue). The probability function is be tabulated:

X 0 1 2Probability 1/4 1/2 1/4

Page 70: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Expectation or Mean EX or E (X ): a measure for the ‘center’, ‘averagevalue’ of a r.v. X , and is often denoted by µ.

For a discrete r.v. X with probability function fX (x ),

µ = EX =∑i

xi fX (xi ).

Variance Var(X ): a measure for variation, uncertainty or ‘risk’ of a r.v. X , isoften denoted by σ2, while σ is called standard deviation of X .

For a discrete r.v. X with probability function fX (x ),

σ2 = Var(X ) =∑i

(xi − µ)2fX (xi ) =

∑i

x2i fX (xi ) − µ2.

Page 71: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

The k -th moment of X : µk ≡ E (X k ) =∑i xkifX (xi ), k = 1, 2, · · · .

Obviously, µ = µ1, and σ2 = µ2 − µ21.

Page 72: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Some important discrete distributions

Convention. We often use upper case letters X ,Y , Z , · · · to denote r.v.s,and lower case letters x , y , z , · · · to denote the values of r.v.s. In contrastletters a, b or A,B are often used to denote (non-random) constants.

Degenerate distribution: X ≡ a , i.e. FX (x ) = 1 for any x ≥ a , and 0

otherwise.

It is easy to see that µ = a and σ2 = 0.

Bernoulli distribution: X is binary, P (X = 1) = p and P (X = 0) = 1 − p ,where p ∈ [0, 1] is a constant. It represents the outcome of ipping a coin.

µ = 1 · p + 0 · (1 − p) = p, σ2 = p(1 − p).

Note. Bernoulli trial refers to an experiment of ipping a coin repeatedly.

Page 73: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Binomial distribution Bin(n, p): X takes values 0, 1, · · · , n only with theprobability function

fX (x ) =n !

x !(n − x )!px (1 − p)n−x , x = 0, 1, · · · , n .

Theorem. If we toss a coin n times, let X be the number of heads. ThenX ∼ Bin(n, p), where p is the probability that head occurs in tossing thecoin once.

Proof. Let ω = HT HHT · · ·H denote an outcome of n tosses. Then X = k

i there are k ‘H’ and (n − k ) ‘T’ in ω. Therefore the probability of such aω is pk (1 − p)n−k . Since those k H’s may occur in any n positions of the

sequence, there are( nk

)such ω’s. Hence

P (X = k ) =

(nk

)pk (1 − p)n−k =

n !k !(n − k )!p

k (1 − p)n−k , k = 0, 1, · · · , n .

Page 74: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Let us check if the probability function above is well dened. ObviouslyP (X = k ) ≥ 0, furthermore

n∑k=0

P (X = k ) =n∑k=0

(nk

)pk (1 − p)n−k = p + (1 − p)n = 1n = 1.

Let us work out the mean and the variance for X ∼ Bin(n, 1 − p).

µ =n∑k=0

kn !

k !(n − k )!pk (1 − p)n−k =

n∑k=1

kn !

k !(n − k )!pk (1 − p)n−k

=n−1∑j=0

np(n − 1)!

j !(n − 1 − j )!pj (1 − p)n−1−j

= npm∑j=0

m!j !(m − j )!p

j (1 − p)m−j = np .

Page 75: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Note that σ2 = E (X 2) − µ2 = E X (X − 1) + µ − µ2. We need to work out

E X (X − 1) =n∑k=0

k (k − 1)n !

k !(n − k )!pk (1 − p)n−k

=n∑k=2

n(n − 1)p2(n − 2)!

(k − 2)!(n − 2) − (k − 2)pk−2(1 − p)(n−2)−(k−2)

= n(n − 1)p2n−2∑j=0

(n − 2)!j !(n − 2) − j )p

j (1 − p)(n−2)−j = n(n − 1)p2.

This gives σ2 = n(n − 1)p2 + np − (np)2 = np(1 − p).

By the above theorem, we can see immediately

(i) If X ∼ Bin(n, p), n − X ∼ Bin(n, 1 − p).(ii) If X ∼ Bin(n, p),Y ∼ Bin(m, p), and X andY are independent,then X +Y ∼ Bin(n +m, p).

Page 76: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Furthermore, let Yi = 1 if the i -th toss yields H, and 0 otherwise. ThenY1, · · · ,Yn are independent Bernoulli r.v.s with mean p and variance p(1−p).Since X =Y1 + · · · +Yn , we notice

EX =n∑i=1

EYi = np, Var(X ) =n∑i=1

Var(Yi ) = np(1 − p).

This is a much easier way to derived the means and variances for binomialdistributions, which is based on the following general properties.

(i) For any r.v.s ξ1, · · · , ξn , and any constants a1, · · · , an ,

E( n∑i=1

ai ξi)=

n∑i=1

aiE (ξi ).

Page 77: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

(ii) If, in addition, ξ1, · · · , ξn are independent,

Var( n∑i=1

ai ξi)=

n∑i=1

a2i Var(ξi ).

Page 78: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Independence of random variables. The r.v.s ξ1, · · · , ξn are independent if

P (ξ1 ≤ x1, · · · , ξn ≤ xn) = P (ξ1 ≤ x1) × · · · × P (ξn ≤ xn)

for any x1, · · · , xn .

Moment generate function (MGF) of r.v. X :

ψX (t ) = E (etX ), t ∈ (−∞,∞).

(i) It is easy to see that ψ′X(0) = E (X ) = µ. In general ψ(k )

X(0) = E (X k ) = µk .

(ii) IfY = a + bX , ψY (t ) = E (e(a+bX )t ) = eatψX (bt ).

(iii) If X1, · · · ,Xn are independent, ψ∑i Xi(t ) =

∏ni=1ψXi (t ), and vice versa

If X is discrete, ψX (t ) =∑i exi t fX (xi ).

Page 79: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

To generate a r.v. from Bin(n, p), we can ip a coin (with p-probability forH) n times, and count the number of heads. However R can do the ippingfor us much more eciently:

> rbinom(10, 100, 0.1) # generate 10 random numbers from \Bin(100, 0.1)[1] 8 11 9 9 18 7 5 5 3 7

> rbinom(10, 100, 0.1) # do it again, obtain different numbers[1] 11 13 6 7 11 9 9 9 12 10

> x <- rbinom(10, 100, 0.7); x; mean(x)[1] 66 77 67 66 64 68 70 68 72 72[1] 69 # mean close to np=70> x <- rbinom(10, 100, 0.7); x; mean(x)[1] 70 73 72 70 68 69 70 66 79 71[1] 70.8

Page 80: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Note that rbinom(10000, 1, 0.5) is equivalent to toss a fair coin 10000times:

> y <- rbinom(10000, 1, 0.5); length(y); table(y)[1] 10000y

0 14990 5010 # about a half times with head

You may try with smaller sample size, such as

> y <- rbinom(10, 1, 0.5); length(y); table(y)[1] 10y0 13 7 # 7 heads and 3 tails

Page 81: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Also try out pbinom (CDF), dbinom (probability function), qbinom (quantile)for Binomial distributions.

Geometric Distribution Geom(p): X takes all positive integer values withprobability function

P (X = k ) = (1 − p)k−1p, k = 1, 2, · · · .

Obviously, X is the number of tosses required in a Bernoulli trial to obtainthe rst head.

µ =∞∑k=1

k (1 − p)k−1p = −pd

dp

∞∑k=1

(1 − p)k = −pd

dp(1/p) = 1/p,

and it can be shown that σ2 = (1 − p)/p2.

Using the MGF provides an alternative way to nd mean and variance: for

Page 82: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

t < − log(1 − p) (i.e. e t (1 − p) < 1),

ψX (t ) = E (e tX ) =∞∑i=1

e t i (1 − p)i−1p =p

1 − p

∞∑i=1

e t (1 − p)i

=p

1 − p

e t (1 − p)

1 − e t (1 − p)=

pe t

1 − e t (1 − p)=

p

e−t − 1 + p.

Now µ = ψ′X(0) =

[ pe−t

(e−t−1+p)2

]t=0 = 1/p , and

µ2 = ψ′′X (0) =

[ 2pe−2t

(e−t − 1 + p)3−

pe−t

(e−t − 1 + p)2

]t=0 = 2/p

2 − 1/p .

Hence σ2 = µ2 − µ2 = (1 − p)/p2.

The R functions for Geom(p): rgeom, dgeom, pgeom and qgeom.

Page 83: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Poisson Distribution Poisson(λ): X takes all non-negative integers withprobability function

P (X = k ) =λk

k ! e−λ, k = 0, 1, 2, · · · ,

where λ > 0 is a constant, called parameter.

The MGF X ∼ Poisson(λ):

ψX (t ) =∞∑k=0

ek tλk

k ! e−λ = e−λ

∞∑k=0

(e tλ)k

k ! = e−λeetλ = expλ(e t − 1).

Hence

µ = ψ′X (0) = [expλ(et − 1)λe t ]t=0 = λ,

µ2 = ψ′′X (0) = [expλ(e

t − 1)λe t + expλ(e t − 1)(λe t )2]t=0 = λ + λ2.

Page 84: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Therefore σ2 = µ2 − µ2 = λ.

Remark. For Poisson distributions, µ = σ2.

The R functions for Poisson(λ): rpois, dpois, ppois and qpois.

To understand the role of the parameter λ, we plot the probability functionof Poisson(λ) for dierent values of λ.

> x <- c(0:20)> plot(x,dpois(x,1),type='o',xlab='k', ylab='probability function')> text(2.5,0.35, expression(lambda==1))> lines(x,dpois(x,2),typ='o',lty=2, pch=2)> text(3.5,0.25, expression(lambda==2))> lines(x,dpois(x,8),typ='o',lty=3, pch=3)> text(7.5,0.16, expression(lambda==8))> lines(x,dpois(x,15),typ='o',lty=4, pch=4)> text(14.5, 0.12, expression(lambda==15))

Page 85: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Plots of λk e−λ/k ! against k

0 5 10 15 20

0.0

0.1

0.2

0.3

k

prob

abili

ty fu

nctio

n

λ = 1

λ = 2

λ = 8

λ = 15

Page 86: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Three ways of computing probability and distribution functions:

• calculators — for simple calculation• statistical tables — for, e.g. the nal exam• R — for serious tasks such as real application

Page 87: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

3.2 Continuous random variables

A r.v. X is continuous if there exists a function fX (·) ≥ 0 such that

P (a < X < b) =

∫ b

afX (x )dx , [a < b .

We fX (·) the probability density function (PDF) or, simply, density function.Obviously

FX (x ) =

∫ x

−∞fX (u)du .

Properties of continuous random variables

(i) FX (x ) = P (X ≤ x ) = P (X < x ), i.e. P (X = x ) = 0 , fX (x ).

Page 88: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

(ii) The PDF fX (·) ≥ 0, and∫ ∞−∞

fX (x )dx = 1.

(iii) µ = E (X ) =∫ ∞−∞

xfX (x )dx ,

σ2 = Var(X ) =∫ ∞−∞(x − µ)2fX (x )dx =

∫x2fX (x )dx − µ

2.

Furthermore the MGF of X is equal to

ψX (t ) =

∫ ∞−∞

e t xfX (x )dx .

Some important continuous distributions

Uniform distribution U (a, b): X takes any values between a and b equallylikely. Its PDF is

f (x ) =

1b−a a ≤ x ≤ b

0 otherwise.

Page 89: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Then the CDF is

F (x ) =

∫ x

−∞f (u)du =

0 x < a,1b−a

∫ xadu = x−a

b−a a ≤ x ≤ b,

1 x > b .

,

and

µ =

∫ b

a

xdx

b − a=a + b

2, µ2 =

∫ b

a

x2dx

b − a=b3 − a3

3(b − a)=a2 + ab + b2

3

Hence σ2 = µ2 − µ2 = (b − a)2/12.

R-functions related to uniformdistributions: runif, dunif, punif, qunif.

Quantile. For a given CDF F (·), its quantile function is dened as

F −1(p) = infx : F (x ) ≥ p, p ∈ [0, 1]

Page 90: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

> x <- c(1, 2.5, 4)> punif(x, 2, 3) # CDF of U(2, 3) at 1, 2.5 and 4[1] 0 0.5 1> dunif(x, 2, 3) # PDF of U(2, 3) at 1, 2.5 and 4[1] 0 1 0> qunif(0.5, 2, 3) # quantile of U(2, 3) at p=0.5[1] 2.5

Page 91: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Normal Distribution N (µ,σ2): the PDF is

f (x ) =1√2πσ2

exp−

1

2σ2(x − µ)2

, −∞ < x < ∞,

where µ ∈ (−∞,∞) is the ‘centre’ (or mean) of the distribution, and σ > 0is the ‘spread’ (standard deviation).

Remarks. (i) The most important distribution in statistics: Many phenom-ena in nature have approximately normal distributions. Furthermore, itprovides asymptotic approximations for the distributions of sample means(Central Limit Theorem).

(ii) If X ∼ N (µ,σ2), EX = µ, Var(X ) = σ2, and ψX (t ) = eµt+σ2t 2/2.

Page 92: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

We compute ψX (t ) below, the idea is applicable in general.

ψX (t ) =1√2πσ

∫e t xe

− 12σ2(x−µ)2

dx =1√2πσ

∫e− 12σ2(x2−2µx−2t xσ2+µ2)

dx

=1√2πσ

∫e− 12σ2[x−(µ+t σ2)2−(µ+t σ2)2+µ2]

dx

= e12σ2(µ+t σ2)2−µ2 1

√2πσ

∫e− 12σ2x−(µ+t σ2)2

dx

= e12σ2(µ+t σ2)2−µ2

= eµt+t2σ2/2

(iii) Standard normal distribution: N (0, 1).If X ∼ N (µ,σ2), Z ≡ (X − µ)/σ ∼ N (0, 1). Hence

P (a < X < b) = P(a − µσ< Z <

b − µ

σ

)= Φ

(b − µσ

)− Φ

(a − µσ

),

Page 93: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

where

Φ(x ) = P (Z < x ) =1√2π

∫ x

−∞e−u

2/2du

is the CDF of N (0, 1). Its values are tabulated in all statistical tables.

Example 3. Let X ∼ N (3, 5).

P (X > 1) = 1 − P (X < 1) = 1 − P(Z <

1 − 3√5

)= 1 − Φ(−0.8944) = 0.81.

Now nd x = Φ−1(0.2), i.e. x satises the equation

0.2 = P (X < x ) = P(Z <

x − 3√5

).

From the normal table, Φ(−0.8416) = 0.2. Therefore (x − 3)/√5 = −0.8416,

leading to the solution x = 1.1181.

Page 94: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Note. You may check the answers using R:1 - pnorm(1, 3, sqrt(5)),qnorm(0.2, 3, sqrt(5))

Page 95: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Density functions of N (0,σ2)

−4 −2 0 2 4

0.0

0.2

0.4

0.6

0.8

x

Nor

mal

den

sity

σ = 1

σ = 2

σ = 0.5

Page 96: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Density functions of N (µ, 1)

−4 −2 0 2 4

0.0

0.1

0.2

0.3

0.4

x

Nor

mal

den

sity

μ = 0 μ = 2μ = − 2

Page 97: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Below are the R codes which produce the two normal density plots.

x <- seq(-4, 4, 0.1) # x = (-4, -3.9, -3.8, ..., 3.9, 4)plot(x, dnorm(x, 0, 1), type='l', xlab='x', ylab='Normal density',

ylim=c(0, 0.85))text(0,0.43, expression(sigma==1))lines(x, dnorm(x, 0, 2), lty=2)text(0,0.23, expression(sigma==sqrt(2)))lines(x, dnorm(x, 0, 0.5), lty=4)text(0,0.83, expression(sigma==sqrt(0.5)))

x <- seq(-3, 3, 0.1)plot(x, dnorm(x, 0, 1), type='l', xlab='x', ylab='Normal density',

xlim=c(-5, 5))text(0,0.34, expression(mu==0))lines(x+2, dnorm(x+2, 2, 1), lty=2)text(2,0.34, expression(mu==2))lines(x-2, dnorm(x-2, -2, 1), lty=4)text(-2,0.34, expression(mu==-2))

Page 98: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Exponential Distribution Exp(λ): X ∼ Exp(λ) if X has the PDF

f (x ) =

1λ e−x/λ x > 0

0 otherwise,where λ > 0 is a parameter.

E (X ) = λ, Var(X ) = λ2, ψX (t ) = 1/(1 − tλ).

Background. Exp(λ) is used tomodel the lifetime of electronic componentsand the waiting times between rare events.

Gamma Distribution Gamma(α , β ): X ∼ Gamma(α , β ) if X has the PDF

f (x ) =

1

βαΓ(α) xα−1e−x/β x > 0

0 otherwise,

Page 99: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

where α , β > 0 are two parameters, and Γ(α) =∫ ∞0 yα−1e−ydy .

E (X ) = αβ , Var(X ) = αβ2, ψX (t ) = (1 − t β )−α .

Note. Gamma(1, β ) = Exp(β ).

Cauchy Distribution: the PDF of the Cauchy distribution is

f (x ) =1

π(1 + x2), x ∈ (−∞,∞).

As E (|X |) = ∞, the mean and variance of the Cauchy distribution do notexist. Cauchy Distribution is particularly useful to model the data withexcessively large, or negatively large outliers.

Page 100: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Chapter 4. Multivariate Distributions

4.1 Bivariate Distributions.

For a pair r.v.s (X ,Y ), the Joint CDF is dened as

FX ,Y (x , y ) = P (X ≤ x ,Y ≤ y ).

Obviously, the marginal distributions may be obtained easily from thejoint distribution:

FX (x ) = P (X ≤ x ) = P (X ≤ x ,Y < ∞) = FX ,Y (x ,∞),

and FY (y ) = FX ,Y (∞, y ).

Covariance and correlation of X andY :

Cov(X ,Y ) = E (X − EX )(Y − EY ) = E (XY ) − (EX )(EY ),Corr(X ,Y ) = Cov(X ,Y )

/√Var(X )Var(Y ).

Page 101: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Discrete bivariate distributions

If X takes discrete values x1, · · · , xm andY takes discrete values y1, · · · , yn ,their joint probability function may be presented in a table:

X \Y y1 y2 · · · ynx1 p11 p12 · · · p1n p1·x2 p21 p22 · · · p2n p2·

· · · · · ·

xm pm1 p22 · · · pmn pm ·p ·1 p ·2 · · · p ·n

where pi j = P (X = xi ,Y = yj ), and

pi · = P (X = xi ) =n∑j=1

P (X = xi ,Y = yj ) =∑j

pi j ,

p ·j = P (Y = yj ) =m∑i=1

P (X = xi ,Y = yj ) =∑i

pi j .

Page 102: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

In general, pi j , pi · × p ·j . However if pi j = pi · × p ·j for all i and j , X andYare independent, i.e.

P (X = xi ,Y = yj ) = P (X = xi ) × P (Y = yj ), [i , j .

For independent X andY , Cov(X ,Y ) = 0.

Example 1. Flip a fair coin two times. Let X = 1 if H occurs in the rst ip,and 0 if T occurs in the rst ip. LetY = 1 if the outcomes in the two ipsare the same, and 0 if the two outcomes are dierent. The joint probabilityfunction is

X \Y 1 01 1/4 1/4 1/20 1/4 1/4 1/2

1/2 1/2

It is easy to see that X andY are independent, which is a bit anti-intuitive.

Page 103: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Continuous bivariate distribution

If the CDF FX ,Y can be written as

FX ,Y (x , y ) =

∫ y

−∞

∫ x

−∞fX ,Y (u,v )dudv for any x and y ,

where fX ,Y ≥ 0, (X ,Y ) has a continuous joint distribution, and fX ,Y (x , y )is the joint PDF.

As FX ,Y (∞,∞) = 1, it holds that∫ ∞−∞

∫ ∞−∞

fX ,Y (u,v )dudv = 1.

In fact any non-negative function satisfying this condition is a PDF. Fur-thermore for any subset A in R2,

P (X ,Y ) ∈ A =

∫AfX ,Y (x , y )dxdy .

Page 104: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Also

Cov(X ,Y ) =∫(x − EX )(y − EY )fX ,Y (x , y )dxdy

=

∫x yfX ,Y (x , y )dxdy − EX EY .

Note that

FX (x ) = FX ,Y (x ,∞) =

∫ ∞−∞

∫ x

−∞fX ,Y (u,v )dudv =

∫ x

−∞

∫ ∞−∞

fX ,Y (u,v )dvdu,

hence the marginal PDF of X can be derived from the joint PDF as follows

fX (x ) =

∫ ∞−∞

fX ,Y (x , y )dy .

Similarly, fy (y ) =∫ ∞−∞

fX ,Y (x , y )dx .

Page 105: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Note. Dierent from discrete cases, it is not always easy to work outmarginal PDFs from joint PDFs, especially when PDFs are discontinuous.

When fX ,Y (x , y ) = fX (x )fy (y ) for any x and y , X andY are independent,as then

P (X ≤ x ,Y ≤ y ) =

∫ y

−∞

∫ x

−∞fX ,Y (u,v )dudv =

∫ y

−∞

∫ x

−∞fX (u)fY (v )dudv

=

∫ x

−∞fX (u)du

∫ y

−∞fY (v )dv = P (X ≤ x )P (Y ≤ y ),

and also Cov(X ,Y ) = 0.

Example 2. Uniform distribution on unit square – U [0, 1]2.

f (x , y ) =

1 0 ≤ x ≤ 1, 0 ≤ y ≤ 10 otherwise.

Page 106: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

This is well-dened PDF, as f ≥ 0 and∫ ∫

f (x , y )dxdy = 1. It is easy to seethat X andY are independent. Let us calculate some probabilities

P (X < 1/2,Y < 1/2) = F (1/2, 1/2) =

∫ 1/2

−∞

∫ 1/2

−∞fX ,Y (x , y )dxdy

=

∫ 1/2

0

∫ 1/2

0dxdy = 1/4.

P (X +Y < 1) =

∫x+y<1

fX ,Y (x , y )dxdy =

∫x>0, y>0, x+y<1

dxdy

=

∫ 1

0dy

∫ 1−y

0dx =

∫ 1

0(1 − y )dy = 1/2.

Example 3. Let (X ,Y ) have the joint PDF

f (x , y ) =

x2 + x y/3 0 ≤ x ≤ 1, 0 ≤ y ≤ 2,0 otherwsie.

Page 107: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Calculate P (0 < X < 1/2, 1/4 < Y < 3) and P (X < Y ). Are X and Yindependent with each other?

P (0 < X < 1/2, 1/4 < Y < 3) = P (0 < X < 1/2, 1/4 < Y < 2)

=

∫ 2

1/4dy

∫ 1/2

0(x2 +

x y

3)dx =

∫ 2

1/4

1 + y

24dy =

1.75

24+y 2

48

21/4 = 0.155.

P (X < Y ) =

∫ 1

0dx

∫ 2

x(x2+

x y

3)dy =

∫ 1

0(2

3x+2x2−

7

6x3)dx = 17/24 = 0.708.

fX (x ) =

∫ 2

0(x2 +

x y

3)dy = 2x2 +

2x

3, fY (y ) =

∫ 1

0(x2 +

x y

3)dx =

1

3+y

6.

Both fX (x ) and fY (y ) are well-dened PDFs.

But f (x , y ) , fX (x )fY (y ), hence they are not independent.

Page 108: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

4.2 Conditional Distributions

If X andY are not independent, knowing X should be helpful in determin-ingY , as X may carry some information onY . Therefore it makes senseto dene the distribution of Y given, say, X = x . This is the concept ofconditional distributions.

If both X andY are discrete, the conditional probability function is simplya special case of conditional probabilities:

P (Y = y |X = x ) = P (Y = y , X = x )/P (X = x ).

However this denition does not extend to continuous r.v.s, as then P (X =

x ) = 0.

Page 109: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Denition (Conditional PDF). For continuous r.v.s X andY , the conditionalPDF ofY given X = x is

fY |X (·|x ) = fX ,Y (x , ·)/fX (x ).

Remark. (i) As a function of y , fY |X (y |x ) is a PDF:

P (Y ∈ A |X = x ) =

∫AfY |X (y |x )dy ,

while x is treated as a constant (i.e. not a variable).

(ii) E (Y |X = x ) =∫yfY |X (y |x )dy is a function of x , and

Var(Y |X = x ) =

∫y − E (Y |X = x )2fY |X (y |x )dy .

Page 110: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

(iii) If X andY are independent, fY |X (y |x ) = fY (y ).

(iv) fX ,Y (x , y ) = fX (x )fY |X (y |x ) = fX |Y (x |y )fY (y ), which oers alternativeways to determine the joint PDF.

(v) E E (Y |X ) = E (Y ) — This in fact holds for any r.v.s X andY . We give aproof here for continuous r.v.s only:

E E (Y |X ) =

∫ ∫yfY |X (y |x )dy

fX (x )dx =

∫ ∫yfX ,Y (x , y )dxdy

=

∫y ∫

fX ,Y (x , y )dxdy =

∫yfY (y )dy = EY .

Example 4. Let fX ,Y (x , y ) = e−y for 0 < x < y < ∞, and 0 otherwise. FindfY |X (y |x ), fX |Y (x |y ) and Cov(X ,Y ).

Page 111: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

We need to nd fX (x ), fY (y ) rst:

fX (x ) =

∫fX ,Y (x , y )dy =

∫ ∞x

e−ydy = e−x x ∈ (0,∞),

fY (y ) =

∫fX ,Y (x , y )dx =

∫ y

0e−ydx = ye−y y ∈ (0,∞).

Hence

fY |X (y |x ) =fX ,Y (x , y )

fX (x )= e−(y−x ) y ∈ (x ,∞),

fX |Y (x |y ) =fX ,Y (x , y )

fY (y )= 1/y x ∈ (0, y ).

Note that givenY = y , X ∼ U (0, y ), i.e. the uniform distribution on (0, y ).

Page 112: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

To nd Cov(X ,Y ), we compute EX , EY and E (XY ) rst.

EX =

∫xfX (x )dx =

∫ ∞0

xe−xdx = −e−x (1 + x )∞0 = 1,

EY =

∫yfY (y )dy =

∫ ∞0

y 2e−ydy = −y 2e−y∞0 + 2

∫ ∞0

ye−ydy = 2,

E (XY ) =

∫x yfX ,Y (x , y )dxdy =

∫ ∞0

dy

∫ y

0x ye−ydx =

1

2

∫ ∞0

y 3e−ydy

= −1

2y 3e−y

∞0 +

3

2

∫ ∞0

y 2e−ydy = 3.

Hence Cov(X ,Y ) = E (XY ) − (EX )(EY ) = 3 − 2 = 1.

Page 113: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

4.3 Multivariate Distributions

Let X = (X1, · · · ,Xn)′ be a random vector (r.v.) consisting of n r.v.s. Thejoint CDF is dened as

F (x1, · · · , xn) ≡ FX1,··· ,Xn (x1, · · · , xn) = P (X1 ≤ x1, · · · , Xn ≤ xn).

If X is continuous, its PDF f satises

F (x1, · · · , xn) =

∫ xn

−∞· · ·

∫ x1

−∞f (u1, · · · ,un)du1 · · · dun .

In general, the PDF admits the factorisation

f (x1, · · · , xn) = f (x1)f (x2 |x1)f (x3 |x1, x2) · · · f (xn |x1, · · · , xn−1),

where f (xj |x1, · · · , xj−1) denotes the conditional PDF of Xj given X1 =x1, · · · ,Xj−1 = xj .

Page 114: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

However, when X1, · · · ,Xn are independent,

fX1,··· ,Xn (x1, · · · , xn) = fX1(x1) · · · fXn (xn).

IID Samples. If X1, · · · ,Xn are independent and each has the same CDF F ,we say that X1, · · · ,Xn are IID (independent and identically distributed)and write

X1, · · · ,Xn ∼i i d F .

We also call X1, · · · ,Xn a sample or a random sample.

4.3 Two important multivariate distributions

Multinomial DistributionMultinomial(n, p1, · · · , pk )—an extension of Bin(n, p).

Page 115: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Suppose we threw a k -sided die n times, record Xi as the number of timesended with the i -th side, i = 1, · · · , k . Then

(X1, · · · ,Xk ) ∼ Multinomial(n, p1, · · · , pk ),

where pi is the probability of the event that the i -th side occurs in onethrew. Obviously pi ≥ 0 and

∑i pi = 1.

We may immediately make the following observation from the abovedenition.

(i) X1 + · · · + Xk ≡ n , therefore X1, · · · ,Xn are not independent.

(ii) Xi ∼ Bin(n, pi ), hence EXi = npi and Var(Xi ) = npi (1 − pi ).

Page 116: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

The joint probability function for Multinomial(n, p1, · · · , pk ):

For any j1, · · · , jk ≥ 0 and j1 + · · · + jk = n ,

P (X1 = j1, · · · ,Xk = jk ) =n !

j1! · · · jk !pj11· · · p

jkk.

Page 117: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Multivariate Normal DistributionN (µ, Σ): a k -variable r.v. X = (X1, · · · ,Xk )′

is normal with mean µ and covariance matrix Σ if its PDF is

f (x) = 1

(2π)k /2 |Σ |1/2exp

−1

2(x − µ)′Σ−1(x − µ)

x ∈ R k ,

where µ is k -vector, and Σ is a k × k positive-denite matrix.

Some properties of N (µ, Σ): Let µ = (µ1, · · · , µk )′ and Σ ≡ (σi j ), then

(i) EX = µ, and the covariance matrix

Cov(X) = E (X − µ)(X − µ)′ = Σ,

and

σi j = Cov(Xi ,Xj ) = E (Xi − µi )(Xj − µj ).

Page 118: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

(ii) When σi j = 0 for all i , j , i.e. the components of X are uncorrelated,Σ = diag(σ11, · · · ,σk k ), |Σ | =

∏i σi i . Hence the PDF admits a simple form

f (x) =k∏i=1

1√2πσi i

exp− 1

2σi i(xi − µi )

2.

Thus X1, · · · ,Xn are independent when σi j = 0 for all i , j .

(iii) Let Y = AX + b, where A is a constant matrix and b is a constant vector.Then Y ∼ N (Aµ + b, AΣA′).

(iv) Xi ∼ N (µi ,σi i ). For any constant k -vector a, a′X is a scale r.v. anda′X ∼ N (a′µ, a′Σa).

(v). Standard Normal Distribution: N (0, Ik), where Ik is the k × k identitymatrix.

Page 119: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 5. Let X1,X2,X3 be jointly normal with the common mean 0,variance 1 and

Corr(Xi ,Xj ) = 0.5, 1 ≤ i , j ≤ 3.

Find the probability P (|X1 | + |X2 | + |X3 | ≤ 2).

It is dicult to calculate this probability by the integration of the joint PDF.We provide an estimate by simulation. (The justication will be in Chapter6). We solve a general problem rst.

Let X ∼ N (µ, Σ), X has p component. For any set A ⊂ Rp , we may estimatethe probability P (X ∈ A) by the relative frequency

#1 ≤ i ≤ n : Xi ∈ A/n,

where n is a large integer, and X1, · · · , Xn are n vectors generated indepen-dently from N (µ, Σ).

Page 120: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Note

X = µ + Σ1/2Z,

where Z ∼ N (0, Ip) is standard normal, and Σ1/2 ≥ 0 and Σ1/2Σ1/2 = Σ.We generate Z by rnorm(p), and apply the above linear transformation toobtain X.

Σ1/2 may be obtained by an eigenanalysis for Σ using R-function eigen.Since Σ ≥ 0, it holds that

Σ = ΓΛΓ′,

where Γ is an orthogonal matrix (i.e. Γ′Γ = Ip), Λ = diag(λ1, · · · , λp) is adiagonal matrix. Then

Σ1/2 = ΓΛ1/2Γ′, where Λ1/2 = diag(√λ1, · · · ,

√λp

).

The R function rMNorm below generate random vectors from N (µ, Σ).

Page 121: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

rMNorm <- function(n, p, mu, Sigma) # generate n p-vectors from N(mu, Sigma)# mu is p-vector of mean, Sigma >=0 is pxp matrix

t <- eigen(Sigma, symmetric=T) # eigenanalysis for Sigmaev <- sqrt(t$values) # square-roots of the eigenvaluesG <- as.matrix(t$vectors) # line up eigenvectors into a matrix GD <- G*0; for(i in 1:p) D[i,i] <- ev[i]; # D is diagonal matrixP <- G%*%D%*%t(G) # P=GDG' is the required transformation matrixZ <- matrix(rnorm(n*p), byrow=T, ncol=p)

# Z is nxp matrix with elements drawn from N(0,1)Z <- Z%*%P # Now each row of Z is N(0, Sigma)X <- matrix(rep(mu, n), byrow=T, ncol=p) + Z

# each row of X is N(mu, Sigma)

This function is saved in the le ‘rMNorm.r’. We may use it to perform therequired task:

source("rMNorm.r")

Page 122: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

mu <- c(0, 0, 0)Sigma <- matrix(c(1,0.5,0.5,0.5,1,0.5,0.5,0.5,1), byrow=T, ncol=3)X <- rMNorm(20000, 3, mu, Sigma)dim(X) # check the size of Xt <- abs(X[,1]) + abs(X[,2]) + abs(X[,3])cat("Estimated probability:", length(t[t<=2])/20000, "\n")

It returned the value:

Estimated probability: 0.446

I repeated it a few more times and obtained the estimates 0.439, 0.445,0.441 etc.

Page 123: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

4.4 Transformations of random variables

Let a random vector X have PDF fX. We are interested in the distributionof a scalar function of X, say,Y = r (X). We introduce a general procedurerst.

Three steps to nd the PDF ofY = r (X):

(i) For each y , nd the set Ay = x : r (x) ≤ y (ii) Find the CDF

FY (y ) = P (Y ≤ y ) = P r (X) ≤ y =∫AyfX(x)dx.

(iii) fY (y ) = ddy FY (y ).

Page 124: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 6. Let X ∼ fX (x ) (X is a scalar). Find the PDF ofY = eX .

Ay = x : ex ≤ y = x : x ≤ log y . Hence

FY (y ) = P (Y ≤ y ) = P eX ≤ y = P (X ≤ log y ) = FX (log y ).

Hence

fY (y ) =d

dyFX (log y ) = fX (log y )

d log ydy

= y−1fX (log y ).

Note that y = ex and log y = x , dydx = ex = y . The above result can bewritten as

fY (y ) = fX (x )/dydx, or fY (y )dy = fX (x )dx .

Page 125: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

For 1-1 transformation Y = r (X ) (i.e. the inverse function X = r−1(Y ) isuniquely dened), it holds that

fY (y ) = fX (x )/|r′(x )| = fX (x )

dxdy

.Note. You should replace all x in the above by x = r−1(y ).

Page 126: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 7. Let X ∼ U (−1, 3). Find the PDF ofY = X 2. Now this is not a 1-1transformation. We have to use the general 3-step procedure.

Note thatY takes values in (0, 9). Consider two cases:

(i) For y ∈ (0, 1), Ay = (−√y ,√y ), Fy (y ) =

∫AyfX (x )dx = 0.5

√y . Hence

fY (y ) = F′Y(y ) = 0.25/

√y .

(ii) For y ∈ [1, 9), Ay = (−1,√y ), Fy (y ) =

∫AyfX (x )dx = 0.25(

√y + 1). Hence

fY (y ) = F′Y(y ) = 0.125/

√y .

Collectively we have

fY (y ) =

0.25/

√y 0 < y < 1

0.125/√y 1 ≤ y < 9

0 otherwise.

Page 127: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Chapter 5. Inequalities

Inequalities are useful tools in establishing various properties of statisticalinference methods. They may also provide estimates for probabilities withlittle assumption on probability distributions.

5.1 Probability inequalities

Markov’s inequality. Let X be a non-negative r.v. and EX < ∞. Then forany t > 0, P (X > t ) ≤ EX

/t .

Page 128: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

An immediate corollary of Markov’s inequality: For any r.v. X and anyconstant t > 0,

P (|X | > t ) ≤E |X |

tprovided E |X | < ∞,

P (|X | > t ) ≤E |X |k

t kprovided E |X |k < ∞. (1)

The tail-probability P (|X | > t ) is a useful measure in insurance and riskmanagement in nance. (1) implies that the more moments X has, thesmaller the tail probabilities are.

Page 129: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Proof of Markov’s inequality. Since X > 0,

EX =

∫ ∞0

xf (x )dx =

∫ t

0xf (x )dx +

∫ ∞t

xf (x )dx

∫ ∞t

xf (x )dx ≥ t

∫ ∞t

f (x )dx = t P (X > t ).

Page 130: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Chebyshev’s inequality. Suppose a r.v. X have mean µ and variance σ2 ∈(0,∞). Then P (|X − µ | ≥ t ) ≤ σ2/t 2 for any t > 0.

Remarks. (i) Chebyshev’s inequality follows from (1) with X replaced byX − µ and k = 2.

(ii) Replacing t by t σ , we have P (|Z | > t ) ≤ 1/t 2, where Z = (X − µ)/σ is astandardization of X .

(iii) For any r.v. X with mean 0 and variance 1, it holds that

P (|X | > 2) ≤ 1/4, P (|X | > 3) ≤ 1/9.

Page 131: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 1. We ipped a coin n times with Head occurred k (< n) times.Therefore a natural estimate for the probability p = P (Head) is k /n . Whatis the error k /n − p in this estimation?

Let Xi = 1 if Head occurred in the i -th ip, and 0 otherwise. Then k =∑ni=1Xi , and k /n = n

−1∑ni=1Xi ≡ Xn . Note k , therefore also Xn , may take

dierent value if we repeat the experiment. Hence it makes sense toquantify the estimation error in probability such as P (|Xn − p | > ε) forsome small constant ε > 0.

Note E (Xn) = n−1∑EXi = p , Var(Xn) = n−2

∑Var(Xi ) = n−1Var(X1)

= n−1p(1 − p). It follows from Chebyshev’s inequality that

P (|Xn − p | > ε) ≤p(1 − p)

nε2≤

1

4nε2.

Let ε = 0.1 and n = 500, P (|Xn − p | > 0.1) ≤ 1/(20) = 0.05.

Page 132: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

5.2 Inequalities for expectations

Cauchy-Schwartz inequality. Let E (X 2) < ∞ and E (Y 2) < ∞. Then E |XY | ≤E (X 2)E (Y 2)1/2.

A function g is convex if for any x , y and any α ∈ [0, 1],

g (αx + (1 − α)y ) ≤ αg (x ) + (1 − α)g (y ).

If g ′′(x ) ≥ 0 for all x , g is convex. Examples of convex functions includeg (x ) = x2 and g (x ) = ex .

A function g is concave if −g is convex. Examples of concave functions areg (x ) = −x2 and g (x ) = log(x ).

Page 133: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Jensen’s inequality. If g is convex, E g (X ) ≥ g (EX ).

From Jensen’s inequality, we have E (X 2) ≥ (EX )2. If X ≥ 0, E (1/X ) ≥1/EX and E (logX ) ≤ log(EX ).

Page 134: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Chapter 6. Convergence of Random Variables andMonte Carlo Methods

6.1 Type of convergence

The two main types of convergence are dened as follows.

Let X1,Xn, · · · be a sequence of r.v.s, and X be another r.v.

1. Xn converges to X in probability, denoted by XnP−→ X , if for any

constant ε > 0, P (|Xn − X | > ε) → 0 as n →∞.

2. Xn converges toX in distribution, denoted byXnD−→ X , if limn FXn (x ) =

FX (x ) for any x (at which FX is continuous).

Page 135: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Remarks. (i) X may be a constant (as a constant is a r.v. with probabilitymass concentrated on a single point.)

(ii) If XnP−→ X , it also holds that Xn

D−→ X , but not visa versa.

Example 1. Let X ∼ N (0, 1) and Xn = −X for all n ≥ 1. Then FXn ≡ FX .Hence Xn

D−→ X . But Xn 6

P−→ X , as for any ε > 0

P (|Xn − X | > ε) = P (2|X | > ε) = P (|X | > ε/2) > 0.

However if XnD−→ c and c is a constant, it holds that Xn

P−→ c.

Page 136: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Note. We need the two types of convergence.

For example, let θn = h(X1, · · · ,Xn) be an estimator for θ.

Naturally we require θnP−→ θ.

But θn is a random variable, it takes dierent values with dierent sam-ples. To consider how good it is as an estimator for θ, we hope that thedistribution of (θn − θ) becomes more concentrated around 0 when nincreases.

Page 137: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

(iii) It is sometimes more convenient to consider the mean square conver-gence:

E (Xn − X )2 → 0 as n →∞,

denoted by Xnm .s .−→ X . It follows from Markov’s inequality that

P (|Xn − X | > ε) = P (|Xn − X |2 > ε2) ≤

E |Xn − X |2

ε2.

Hence if Xnm .s .−→ X , it also holds that Xn

P−→ X , but not visa versa.

Example 2. Let U ∼ U (0, 1) and Xn = nIU<1/n. Then P (|Xn | > ε) ≤ P (U <

1/n) = 1/n → 0, hence XnP−→ 0. However

E (X 2n ) = n

2P (U < 1/n) = n →∞.

Hence Xn 6m .s .−→ 0.

Page 138: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

(iv) XnP−→ X does not imply EXn → EX .

Example 3. Let Xn = n with probability 1/n and 0 with probability 1 − 1/n .Then Xn

P−→ 0. However EXn = 1 6→ 0.

(v) When XnD−→ X , we also write Xn

D−→ FX , where FX is the CDF of X .

However the notation XnP−→ FX does not make sense!

Page 139: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Slutsky’s Theorem. Let Xn,Yn,X ,Y be r.v.s, g be a continuous function,and c is a constant.(i) If Xn

P−→ X and Yn

P−→ Y , then Xn +Yn

P−→ X +Y , XnYn

P−→ XY , and

g (Xn)P−→ g (X ).

(ii) If XnD−→ X and Yn

D−→ c, then Xn +Yn

D−→ X + c, XnYn

D−→ cX , and

g (Xn)D−→ g (X ).

Note. XnD−→ X andYn

D−→Y does not in general imply Xn +Yn

D−→ X +Y .

Slutzky’s theorem is very useful, as it implies, e.g., X 2n

P−→ µ2, and Xn/Sn

P−→

µ/σ (see Exercise 4.3).

Page 140: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Recall the limits of sequences of real numbers x1, x2, · · · : if limn→∞ xn = x(or, simply, xn → x ), we mean |xn − x | → 0 as n →∞.

For a sequence of r.v.s X1,X2, · · · , we say X is the limit of Xn if |Xn−X | →0. Now there are some subtle issues here:

(i) |Xn−X | is a r.v., it takes dierent values in the sample space Ω. Therefore|Xn −X | → 0 should hold (almost) on the entirely sample space. This callsfor some probability statement.

(ii) Since r.v.s have distributions, we may also consider FXn (x ) → FX (x ) forall x .

Page 141: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Recall two simple facts: for any r.v.sY1, · · · ,Yn and constants a1, · · · , an ,

E( n∑i=1

aiYi)=

n∑i=1

aiEYi , (2)

and ifY1, · · · ,Yn are uncorrelated (i.e. Cov(Yi ,Yj ) = 0 [ i , j )

Var( n∑i=1

aiYi)=

n∑i=1

a2i Var(Yi ). (3)

Page 142: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Proof for (3). First note that for any r.v. U , Var(U ) = Var(U − EU ). Becauseof (2), we may assume EYi = 0 for all i . Thus

Var( n∑i=1

aiYi)= E

( n∑i=1

aiYi)2 = E ( n∑

i=1

a2i Y2i +

∑i,j

ai ajYiYj)

=n∑i=1

a2i E (Y2i ) +

∑i,j

ai ajE (YiYj ) =n∑i=1

a2i Var(Yi ) +∑i,j

ai aj (EYi )(EYj )

=n∑i=1

a2i Var(Yi ).

Page 143: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

6.2 Two important limit theorems: LLN and CLT

Let X1,X2, · · · be IID with mean µ and variance σ2 ∈ (0,∞). Let Xn denotethe sample mean:

Xn =1

n(X1 + · · · + Xn), n = 1, 2, · · · .

We recall two simple facts:

E Xn = µ, Var(Xn) = σ2/n .

The (weak) Law of Large Numbers (LLN):As n →∞, Xn

P−→ µ.

Page 144: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

The LLN is very natural: When the sample size increases, the sample meanbecomes more and more close to the population mean. Furthermore, thedistribution of Xn degenerates to a single point distribution at µ.

Proof. It follows from Chebyshev’s inequality directly.

Page 145: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

To visualize the LLN, we simulate the sample paths for

> x <- rexp(10000) # generate 10000 random numbers from Exp(1)> summary(x)

Min. 1st Qu. Median Mean 3rd Qu. Max.0.0001666 0.2861000 0.7098000 1.0220000 1.4230000 8.6990000> n <- 1:10000> ms <- n> for(i in 1:10000) ms[i] <- mean(x[1:i])> plot(n, ms, type='l', ylab=expression(bar(Xn)),

main='Sample means of Exponential Distribution')> abline(1,0,lty=2) # draw a horizontal line at y=1

Page 146: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

0 2000 4000 6000 8000 10000

0.2

0.4

0.6

0.8

1.0

1.2

Sample means of Exponential Distribution

n

Xn

Page 147: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

We repeat this exercise for Poisson(2):

0 2000 4000 6000 8000 10000

1.0

1.5

2.0

2.5

Sample means of Poisson(2)

n

Xn

Page 148: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

The Central Limit Theorem (CLT):As n →∞,

√n(Xn − µ)/σ

D−→ N (0, 1).

Note the CLT can be expressed as

P Xn − µ√

σ2/n

≤ x→

1√2π

∫ x

−∞e−u

2/2du = Φ(x )

for any x , as n →∞, i.e. the standardized sample mean is approximatelystandard normal when the sample size is large. Hence in addition to√n(Xn − µ)/σ ≈ N (0, 1), we also see the expressions such as

Xn ≈ N (µ,σ2/n), Xn − µ ≈ N (0,σ

2/n),√n(Xn − µ) ≈ N (0,σ

2).

Note. The CLT is one of the reasons why normal distribution is the mostuseful and important distribution in statistics.

Page 149: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 4. If we take a sample X1, · · · ,Xn from U (0, 1), the standardizedhistogram will resemble the density function f (x ) = I(0,1)(x ), and thesample mean Xn = n−1

∑i Xi will be close to µ = EXi = 0.5, provided n is

suciently large.

However the CLT implies Xn ≈ N (0.5, 1/(12n)) as Var(Xi ) = 1/12. What doesthis mean?

If we take many samples of size n and compute the sample mean for eachsample, we then obtain many sample means. The standardized histogramof those samples means resembles the PDF of N (0.5, 1/(12n)) provided nis suciently large.

> x <- runif(50000) # generate 50,000 random numbers from U(0,1)> hist(x, probability=T) # plot histogram of the 50,000 data

Page 150: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

> z <- seq(0,1,0.01)> lines(z,dunif(z)) # superimpose the PDF of U(0,1)> x <- matrix(x, ncol=500) # line up x into a 100x500 matrix

# each column represents a sample of size 100> par(mar=c(3,3,3,2),mfrow=c(2,2)) # plot 4 figures together> meanx <- 1:500> for(i in 1:500) meanx[i] <- mean(x[1:5,i])

# compute the mean of the first 5 data in each column> hist(meanx, probability=T, nclass=20, main='n=5')> lines(z,dnorm(z,1/2,sqrt(1/(12*5))))

# superimpose the PDF of N(.5, 1/(12*5))> for(i in 1:500) meanx[i] <- mean(x[1:20,i])> hist(meanx, probability=T, nclass=20, main='n=20')> lines(z,dnorm(z,1/2,sqrt(1/(12*20))))> for(i in 1:500) meanx[i] <- mean(x[1:60,i])> hist(meanx, probability=T, nclass=20, main='n=60')> lines(z,dnorm(z,1/2,sqrt(1/(12*60))))> for(i in 1:500) meanx[i] <- mean(x[,i])> hist(meanx, probability=T, nclass=20, main='n=100')> lines(z,dnorm(z,1/2,sqrt(1/(12*100))))

Page 151: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Uniform(0, 1)

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

n=5

0.2 0.4 0.6 0.8

0.0

0.5

1.0

1.5

2.0

2.5

3.0

n=20

0.4 0.5 0.6 0.7

01

23

45

6

n=100

0.40 0.45 0.50 0.55 0.60

05

1015

Page 152: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 5. Suppose a large box contains 10,000 poker chips distributedas follows

Values of chips $5 $10 $15 $30No. of chips 5000 3000 1000 1000

Take one chip randomly from the box, let X be its nomination. Then itsprobability function is

X 5 10 15 30probability 0.5 0.3 0.1 0.1

Furthermore µ = EX = 10 and σ2 = Var(X ) = 55.

We draw 500 samples from this distribution, compute the sample meansXn . When n is suciently large, we expect Xn ≈ N (10, 55/n).

Page 153: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

We create a plain text le ‘porkerChip.r’ as below, which illustrate thecentral limiting phenomenon for the samples from this simple distribution.

y<- runif(50000) # generate 50,000 U(0,1) random numbersx<- yfor(i in 1:50000)

if(y[i]<0.5) x[i]<-5 else if(y[i]<0.8) x[i]<-10 else

ifelse(y[i]<0.9, x[i]<-15, x[i]<-30)

# By now x are random numbers from the required distribution# of the poker chips

cat("mean", mean(x), "\n")cat("variance", var(x), "\n")

x <- matrix(x, ncol=500) # line up x into a 100x500 matrix# each column represents a sample of size 100

par(mar=c(3,3,3,2),mfrow=c(2,2)) # plot 4 figures together

meanx <- 1:500

Page 154: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

z<-seq(5,25,0.1)

for(i in 1:500) meanx[i] <- mean(x[1:5,i])# compute the mean of the first 5 data in each column

hist(meanx, probability=T, main='n=5')lines(z,dnorm(z,10,sqrt(55/5)))

# draw N(10, 55/n) together with the histogram

for(i in 1:500) meanx[i] <- mean(x[1:20,i])# compute the mean of the first 20 data in each column

hist(meanx, probability=T, main='n=20')lines(z,dnorm(z,10,sqrt(55/20)))

for(i in 1:500) meanx[i] <- mean(x[1:60,i])# compute the mean of the first 60 data in each column

hist(meanx, probability=T, main='n=60')lines(z,dnorm(z,10,sqrt(55/60)))

for(i in 1:500) meanx[i] <- mean(x[,i])# compute the mean of the whole 100 data in each column

hist(meanx, probability=T, main='n=100')

Page 155: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

lines(z,dnorm(z,10,sqrt(55/100)))

Page 156: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

n=5

5 10 15 20

0.00

0.05

0.10

0.15

n=20

6 8 10 12 14 16

0.00

0.05

0.10

0.15

0.20

0.25

n=60

8 9 10 11 12 13

0.0

0.1

0.2

0.3

0.4

0.5

n=100

8 9 10 11 12

0.0

0.1

0.2

0.3

0.4

0.5

0.6

Page 157: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 6. Suppose X1, · · · ,Xn is an IID sample. A natural estimator forthe population mean µ = EXi is the sample mean Xn . By the CLT, we caneasily gauge the error of this estimation as follows:

P (|Xn − µ | > ε) = P(√n |Xn − µ |/σ >

√nε/σ

)≈ P |N (0, 1)| >

√nε/σ

= 2P N (0, 1) >√nε/σ = 21 − Φ(

√nε/σ).

With ε, n given, we can nd the value Φ(√nε/σ) from the table for standard

normal distribution, if we know σ .

Page 158: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Remarks. (i) Let ε = 2σ/√n = 2× STD(Xn) (as Var(Xn) = σ2/n), P (|Xn − µ | <

2σ/√n) ≈ 2Φ(2) − 1 = 0.954. Hence

If one estimates µ by Xn and repeats it a large number times,about the 95% of times µ is within 2 × STD(Xn)-distance from Xn .

(ii) Typically σ2 = Var(Xi ) is unknown in practice. We estimate it using thesample variance

S2n =1

n − 1

n∑i=1

(Xi − X )2.

In fact it still holds that√n(Xn − µ)/Sn

D−→ N (0, 1), as n →∞.

Page 159: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Similar to Example 6 above, we have now

P (|Xn − µ | > ε) ≈ 21 − Φ(√nε/Sn)

Let ε = Sn/√n , P (|Xn − µ | > ε) ≈ 21 − Φ(1) = 0.317, or

P (|Xn − µ | < Sn/√n) ≈ 1 − 0.317 = 0.683.

Let ε = 2Sn/√n , we obtain:

P (|Xn − µ | < 2Sn/√n) ≈ 0.954.

Hence

If one estimates µ by Xn and repeats it a large number times,about the 95% of times the true value is within (2Sn/

√n)-distance

from Xn .

Page 160: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Standard Error: SE(Xn) ≡ Sn/√n is called the standard error of the sample

mean. Note

SE(Xn) = 1

n(n − 1)

n∑i=1

(Xi − Xn)21/2.

Page 161: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

The Delta Method. If√n(Yn − µ)/σ

D−→ N (0, 1) and g is a dierentiable

function and g ′(µ) , 0. Then√ng (Yn) − g (µ)

|g ′(µ)|σ

D−→ N (0, 1).

Hence ifYn ≈ N (µ,σ2/n), then g (Yn) ≈ N (g (µ), (g ′(µ))2σ2/n).

Example 7. Suppose√n(Xn − µ)/σ

D−→ N (0, 1) andWn = e

Xn = g (Xn) withg (x ) = ex . Since g ′(x ) = ex , the Delta method impliesWn ≈ N (e

µ, e2µσ2/n).

Page 162: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

6.3 Monte Carlo methods

6.3.1 Basic Monte Carlo integration

The LLN may be interpreted as

1

n

n∑i=1

XiP−→

∫xf (x )dx

if X1, · · · ,Xn is a sample from the distribution with PDF f .

In general, for any function h, we apply the LLN to the sample Hi ≡ h(Xi )(i = 1, · · · , n), leading to

Hn ≡1

n

n∑i=1

h(Xi )P−→ E h(X1) =

∫h(x )f (x )dx . (4)

Page 163: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Monte Carlo integration method: generate a sample X1, · · · ,Xn fromPDF f , then the integral on the RHS of (4) may be approximated by themean Hn .

Tomeasure the accuracy of this Monte Carlo approximation, wemay use thestandard deviation σ/

√n (if we know σ2 = Var(H1)), or the standard error:( 1

n(n − 1)

n∑i=1

h(Xi ) − Hn2)1/2.

Page 164: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 8. (Area of the quarter circle) The area of a quarter of the unitcircle is π/4 = 0.7854.

Suppose we do not know the answer. It can be written as

J ≡

∫ 1

0

√1 − x2dx .

However it is not obvious how to solve this integral. We provide a MonteCarlo solution. Let

h(x ) =√1 − x2, f (x ) = I(0,1)(x ).

Then f is the PDF of U (0, 1) and

J =

∫h(x )f (x )dx = E h(X ),

Page 165: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

where X ∼ U (0, 1). Hence we generate a sample from U (0, 1) and estimateJ by

J =1

n

n∑i=1

√1 − X 2

i , SE = 1

n(n − 1)

n∑i=1

(

√1 − X 2

i − J )21/2.

The STD of J is σ/√n , where

σ2 = Var(√1 − X 2

1

)= E (1 − X 2

1 ) −(π4

)2 = 23−

(π4

)2 = 0.0498.The R-function ‘quartercircle.r’ below perform this Monte Carlo calculation.It is used to produce the table

n 1000 2000 4000 8000J .7950 .7834 .7841 .7858STD .0071 .0050 .0035 .0025SE .0072 .0050 .0036 .0025

Page 166: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

R-function ‘quartercircle.r’:

quartercircle<-function(n)# This function calculates the area of the quarter circle# using Monte Carlo method# The true value is \pi/4 = 0.7854# n is the sample size

x <- runif(n)h <- sqrt(1-x*x)list(quarterarea=mean(h), STD=sqrt(.0498/n),

SE=sqrt(var(h)/n), SampleSize=n)# use 'list' to keep more than one outputs

Page 167: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

You may call the function to perform the simulation:

> source("quartercircle.r")> t=quartercircle(2000)> summary(t)

Length Class Modequarterarea 1 -none- numericSTD 1 -none- numericSE 1 -none- numericSampleSize 1 -none- numeric> t$quarterarea[1] 0.7913048$STD[1] 0.00498999$SE[1] 0.004946009$SampleSize[1] 2000> t$quarterarea[1] 0.7913048

Page 168: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

6.3.2 Composition (Sequential sampling)

Let X ∼ fX (·),Y |X ∼ fY |X (·|X ). To obtain

Y1, · · · ,Yn ∼i i d fY (·) ≡

∫fY |X (·|x )fX (x )dx ,

we may repeat the composition below n times:

Step 1. Draw Xi from fX (·),Step 2. DrawYi from fY |X (·|Xi ).

Then (Xi ,Yi ), 1 ≤ i ≤ n are i.i.d. from the joint density

fX ,Y (x , y ) = fY |X (y |x )fX (x ).

HenceY1, · · · ,Yn are i.i.d. from its marginal density fY (·).

Page 169: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Remarks.(a) This method is applied when it is dicult to sample directly from fY (·).(b) WithY1, · · · ,Yn ∼i i d fY (y ), we may estimate E (Y ) by n−1

∑i Yi . In gen-

eral we estimate E ψ(Y ), for a known ψ(·), by

ψ ≡1

n

n∑i=1

ψ(Yi ),

with the standard error

1√n(n − 1)

[ n∑i=1

ψ(Yi ) − ψ2]1/2.

(c) The density function fY (·) may be estimated by

fY (y ) =1

n

n∑i=1

fY |X (y |Xi ).

It also provides an estimate for EY :∫y fY (y )dy .

Page 170: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 9. Let Y = X1 + · · · + XT , where X1,X2, · · · are IID Bernoulli(p),T ∼ Poisson(λ), and T and Xi ’s are independent. Then a sample from thedistribution ofY can be drawn as follows:

(i) Draw T1, · · · ,Tn independently from Poisson(λ),(ii) DrawYi ∼ Bin(Ti , p), i = 1, · · · , n , independently.

Example 10. Mixture of Normal distributions:

p N (µ1,σ21 ) + (1 − p)N (µ0,σ

20 ), p ∈ (0, 1),

(i.e. with PDF pσ1ϕ(x−µ1σ1) +

1−pσ0ϕ(x−µ0σ0).)

A sample X1, · · · ,Xn can be drawn as follows:

(i) I1, · · · , In ∼ Bernoulli(p) independently,(ii) Xi ∼ N (µIi ,σ

2Ii), i = 1, · · · , n , independently.

Page 171: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 11. The lifetime X of a product follows the exponential distri-bution with mean e1+U/4, where U is a quality index of the raw materialsused in producing the product and U ∼ N (µ,σ2). Find the mean, varianceand the PDF of X when µ = 1 and σ2 = 2.

As X |U ∼ Exp(e1+U/4) and U ∼ N (µ,σ2), we have

fX (x ) =

∫fX |U (x |u)fU (u)du,

fX |U (x |u) = e−(1+u/4) exp−xe−(1+u/4) for x > 0.

We use Monte Carlo simulation as follows:

1. Draw U1, · · · ,Un from N (µ,σ2)

2. Draw Xi from Exp(e1+Ui /4), i = 1, · · · , n .

Page 172: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Then the estimated mean for X is Xn = n−1∑i Xi with the standard error

σ/√n , where

σ2 =1

n − 1

n∑i=1

(Xi − Xn)2

is an estimator for the variance of X . The estimated PDF is

fX (x ) =1

n

n∑i=1

fX |U (x |Ui ) =1

n

n∑i=1

e−(1+Ui /4) exp−xe−(1+Ui /4)

Page 173: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

We write R-function lifetimeMeanVar to simulate EX and Var(X ), andlifetimePDF to produce the PDF fX and also EX .

lifetimeMeanVar <- function(n, mu, sigma2) u <- rnorm(n, mu, sqrt(sigma2))

# generate n random numbers from N(mu, sigma2)x <- ufor(i in 1:n) x[i] <- rexp(1, 1/exp(1+u[i]/4))# x[i] is a random number from Exponential# distribution with mean e^1+u[i]/4

vx <- var(x)list(Mean=mean(x), Min=min(x), Max=max(x),

StandardError=sqrt(vx/n), Var=vx)

Page 174: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

The function is saved in the le ‘lifetimeMeanVar.r’, we source it into R andproduce the required results:

> source("lifetimeMeanVar.r")> outcome <- lifetimeMeanVar(500,1,2)> outcome$Mean[1] 3.763913> outcome$Min[1] 0.02139847> outcome$Max[1] 50.12281> outcome$StandardError[1] 0.1906219> outcome$Var[1] 18.16836

You may also try summary(outcome).

Page 175: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

The function lifetimePDF produces the PDF curve of X in the given range(xmin, xmax). It also computes EX according to the estimated PDF.

lifetimePDF <- function(n,xmin,xmax,mu,sigma2) u <- rnorm(n, mu, sqrt(sigma2))eu <- exp(-1-u/4)h <- (xmax-xmin)/400x <- seq(xmin, xmax, h)fx <- xfor(i in 1:401) fx[i] <- mean(eu*exp(-x[i]*eu))m <- sum(x*fx*h) # calculate the meanplot(x, fx, type='l', main="PDF of Lifetime")abline(0,0) # abline(a,b) draw the straight line y=a+bxcat("Mean", m, "\n") # print out the mean

# Definition of function lifetimePDF' ends here

Page 176: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Source it into R to produce therequired results:> source("lifetimePDF.r")> lifetimePDF(500,0,55,1,2)> Mean 3.779971

0 10 20 30 40 50

0.00

0.10

0.20

0.30

PDF of Lifetime

x

fx

Page 177: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

6.3.3 Importance sampling

Let us consider the composition method discussed in section 6.3.2: Toobtain an estimate for

fY (·) =

∫fY |X (·|x )fX (x )dx

or to obtain a sample from fY (·), we need to draw a sample X1, · · · ,Xnfrom fX (·).

However sometimes we cannot directly sample from fX (·). Importancesampling oers an indirect way to achieve this goal via an appropriatelyselected PDF p(·).

Page 178: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Let p(·) be a density satisfying:

(a) the support of p contains the support of fX ,i.e. p(x) = 0 implies fX (x) = 0, and(b) it is easy to sample from p(·).

Importance sampling method for approximating

J ≡ E h(X ) =

∫h(x )fX (x )dx

(i) Draw X1, · · · ,Xn ∼i .i .d . p(·)(ii) Compute the estimator

J =n∑i=1

wih(Xi )/ n∑i=1

wi ,

where wi = fX (Xi )/p(Xi ).

Page 179: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Importance sampling places weights greater than 1 on the regions wherefX (x ) > p(x ), and downweights the regions where fX (x ) < p(x ).

Choice of p(·): as close to fX (·) as possible among all PDF satisfying (a)and (b) in the previous page.

The standard error of J is[ n∑i=1

h(Xi ) − J 2w2i

]1/2/ n∑i=1

wi .

which is inated when p(·) poorly approximates fX (·).

Note.∑ni=1wi can be viewed as a version of the eective sample size in

the importance sampling. When p(·) diers substantially from fX (·), all wiare small. Hence the sampling is inecient.

Page 180: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Remark. In the above calculation, we may replace the PDF fX (·) by g (·) ≡C0fX (·), where C0 > 0 is an unknown constant. The algorithm stays thesame but with the weights

wi = g (Xi )/p(Xi ).

For example, fX (x ) = C−10 e−x2/(|x |+2), where the normalised constant C0 =∫

e−x2/(|x |+2)dx is not easy to compute. In this case we may use g (x ) =

e−x2/(|x |+2) instead of fX (x ) in importance sampling.

Page 181: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Proof of Remark. By the LLN, as n →∞,

1

n

n∑i=1

wiP−→

∫g (x )

p(x )p(x )dx =

∫g (x )dx = C0

∫fX (x )dx = C0,

and

1

n

n∑i=1

wih(Xi )P−→

∫g (x )

p(x )h(x )p(x )dx

=

∫g (x )h(x )dx = C0

∫fX (x )h(x )dx = C0E h(X ).

Hence, by Slutzky’s theorem,n∑i=1

wih(Xi )/ n∑i=1

wiP−→ E h(X ).

Page 182: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Application to sequential sampling: fY (·) =∫fY |X (·|x )fX (x )dx

(i) Draw X1, · · · ,XN ∼i .i .d . p(·),(ii) DrawYi ∼ fY |X (·|Xi ), i = 1, · · · , n , independently.

Let wi = g (Xi )/p(Xi ) and µy = E (Y ), then

fY (y ) =n∑i=1

wi fY |X (y |Xi )/ n∑i=1

wi ,

µy =n∑i=1

wiYi/ n∑i=1

wi ,

which is guaranteed by the fact (Xi ,Yi ) ∼i .i .d . p(x )fY |X (y |x ).

Note. Importance sampling does not yield correct samples, as

Xi / fX (·), Yi / fY (·)

Page 183: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 11 (Continue). Suppose now the quality index of the rawmaterialsU follows a generalised normal distribution with PDF

fU (u) ∝ exp−1

2

u − µσ

ν ≡ g (u)where ν > 0 is a constant. Recall

fX |U (x |u) = e−(1+u/4) exp−xe−(1+u/4) for x > 0.

We adopt an importance sampling scheme as follows:

1. DrawU1, · · · ,Un fromN (µ,σ2), compute theweightwi = g (Ui )/φ(Ui−µσ ),

where φ denotes the PDF of N (0, 1).

2. Draw Xi from Exp(e1+Ui /4), i = 1, · · · , n .

Page 184: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Then the estimated mean for X is

Xn =n∑i=1

wiXi/ n∑i=1

wi .

The estimated PDF is

fX (x ) =

∑ni=1wi fX |U (x |Ui )∑n

i=1wi=

∑ni=1wi e

−(1+Ui /4) exp−xe−(1+Ui /4)∑ni=1wi

.

Page 185: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

The R-function lifetimeMeanIS implements the above scheme for calcu-lating EX :

lifetimeMeanIS <- function(n, mu, sigma2, nu) u=rnorm(n, mu, sqrt(sigma2)) #generate n numbers from N(mu, sigam2)w=exp(-0.5*abs((u-mu)/sqrt(sigma2))^nu)/dnorm((u-mu)/sqrt(sigma2))

# compute the weights w_ix<-ufor(i in 1:n) x[i]<-rexp(1, 1/exp(1+u[i]/4))list(Mean=sum(x*w)/sum(w), Min=min(x), Max=max(x))

Page 186: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

The results for µ = 1, σ2 = 2 and ν = 0.5 or 3 are as follows:

> source("lifetimeMeanIS.r")> lifetimeMeanIS(5000,1,2,0.5)$Mean[1] 0.8827147$Min[1] 0.0003652474$Max[1] 57.21467> lifetimeMeanIS(10000,1,2,3)$Mean[1] 1.616474$Min[1] 0.00125402$Max[1] 56.77547

Page 187: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

The R-function lifetimePDF.IS implements the above scheme for esti-mating PDF fX and E (X ):

lifetimePDF.IS <- function(n,xmin,xmax,mu,sigma2,nu) u <- rnorm(n, mu, sqrt(sigma2))Eu <- exp(-(1+u/4)) # Eu=e^-(1+u/4)w=exp(-0.5*abs((u-mu)/sqrt(sigma2))^nu)/dnorm((u-mu)/sqrt(sigma2))

# compute the weights w_isumw <- sum(w)h <- (xmax-xmin)/400x <- seq(xmin, xmax, h)fx <- xt <- 1:nm <- 0for(i in 1:401)

t <- Eu*exp(-x[i]*Eu)# t = PDF of Exp(1/e^(1+u/4)) at x=x[i] --- THIS IS MOREfx[i] <- sum(t*w)/sumwm <- m+x[i]*fx[i]*h # calculate the mean

plot(x, fx, type='l', main="PDF of Lifetime")

Page 188: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

abline(0,0) # abline(a,b) draw the straight line y=a+bxcat("Mean", m, "\n") # print out the mean

You may source it in, and try lifetimePDF.IS(5000,0,60,1,2,0.5) etc.

Page 189: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Importance of using appropriate sampling distributions

An alternative measure for the eective sample size (ESS) is dened asn/1 + cv (w ), where cv (w ) is the sample coecient of variation of theweights

cv (w ) = 1

n − 1

n∑i=1

(wi − w )21/2/w , w =

1

n

n∑i=1

wi .

We illustrate the importance of choosing ‘correct’ p(·) in the example below.

Example 12. Estimate µ for N (µ, 1) based on the importance samplingmethod using N (0, 1) as the sampling distribution p(·). The table below isproduced by R-function effectN with n = 1000.

µ 0 1 2 3 4 5Estimated µ -0.022 1.026 1.756 2.806 2.873 3.325

ESS 1000 448.9 246.1 113.4 65.7 33.8

Page 190: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

effectN=function(n, mu) x=rnorm(n)w=dnorm(x,mu,1)/dnorm(x) # sampling weightsmuhat=mean(w*x)/mean(w) # estimate for mu by importance samplingess=n/(1+sqrt(var(w))/mean(w)) # effective sample sizelist(SampleSize=n, Mean=mu, EstimatedMean=muhat, ESS=ess)

Page 191: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Chapter 7. Introduction to Statistical Inference

7.1. What is Statistics: a scientic subject on collecting and analyzingdata.

collecting: designing experiments/questionnaires, designing samplingschemes, administration of data collection

analyzing: estimation, testing and forecasting

Statistics is an application-oriented subject, is particularly useful or help-ful in answering questions such as:

Those questions are dicult to study in laboratory, and admit no self-evident axioms.

Page 192: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

What to learn in Statistics: basic ideas, methods (including computation)and theory.

Some guidelines for learning/applying statistics:

• Understand what data say in each specic context. All the methodsare just tools to help to understand data• Concentrate on what to do and why, rather than concrete calculationand graphing• It may take a while to catch the basic idea of statistics – Keep thinking!!!

Page 193: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

7.2 Population, Sample and Parametric Models

Two practical situations:

• A new type of tyre was designed to increase the lifetime. The manu-facturer tested 120 new tyres and obtained the average lifetime (overthose 120 tyres) 35,391 miles. So it claims that the mean lifetime ofthe new tyres is 35,391 miles.

• A newspaper sampled 1000 potential voters, and 350 of them wereDemocratic party supporters. It claims that the proportion of theDemocratic voters in the whole Country is 350/1000=35%.

Page 194: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

In both cases, the conclusion is drawn on a population (i.e. all the objectsconcerned) based on the information from a sample (i.e. a subset ofpopulation).

In the rst case, it is impossible to measure the whole population. Forthe second case, it is not economic to measure the whole population.Therefore, errors are inevitable!

Page 195: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Population is an entire set of the objects concerned, and those objectsare typically represented by some numbers. We do not know the entirepopulation in practice.

For the tyre example, the population consists of the lifetimes of all thetyres, including those to be produced in the future.

For the opinion poll examples, the population consists of many ‘1’ and‘0’, where each ‘1’ represents a voter for Democratic party, and each ‘0’represents a voter for other parties.

A Sample is a (randomly) selected subset of a population, and is a set ofknown data in practice.

Page 196: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Population is unknown. We represent a population by a probability distri-bution.

> jobs <- read.table("Jobs.txt", header=T, row.name=1)> dim(jobs)[1] 253 5> mean(jobs[,4]); var(jobs[,4])[1] 47.12648[1] 46.83315> hist(jobs[,4], probability=T, nclass=15, xlab="Salary",

main="Histogram of Salary data")> range(jobs[,4])[1] 26 65> x <- seq(26, 65, 0.1)> lines(x, dnorm(x, 47.12648, sqrt(46.83315)))

# superimpose the PDF of N(47.12648, 46.83315)

Page 197: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Histogram of Salary data

Salary

Den

sity

30 40 50 60

0.00

0.01

0.02

0.03

0.04

0.05

0.06

0.07

The blue curve is the PDF of N (Xn, S2n).

n = 253, Xn = 47.126, andS2n = 46.833, Sn = 6.843.

Xn ± Sn = (40.283, 53.969)

171 points are in this interval:

171/253 = 67.58%.

Xn±1.96Sn = (33.714, 60.538)

242 points are in this interval:

242/253 = 95.65%

Suggesting N (Xn, S2n)!!!

Page 198: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Parametric Models. For a given problem, we typically assume a populationto be a probability distribution F (·; θ), where the form of distribution Fis known (such as normal, Poisson etc), and θ denotes some unknowncharacteristics (such as mean, variance etc) and is called a parameter.Such an assumed distribution is often called a parametric model.

For the tyre lifetime example, the population may be assumed to beN (µ,σ2) with θ = (µ,σ2), where µ is the ‘true’ lifetime. Let

X = the lifetime of a tyre.

Then X ∼ N (µ,σ2).

Page 199: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

For the opinion poll example, the population is a Bernoulli distribution:

P (X = 1) = P ( a Democratic voter ) = π,

P (X = 0) = P ( a Republican voter ) = 1 − π,

where

π = the proportion of Democratic supporters in the USA= the probability of a voter to be a Democratic supporter

Page 200: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

A Sample: a set of data or random variables? – A Duality

A sample of size n : X1, · · · ,Xn, is also called a random sample. It consistsof n concrete numbers in a practical problem.

The word ‘random’ captures the character that samples (of the same size)taken by dierent people or at dierent times may be dierent, as theyare dierent subsets of a population.

Furthermore, a sample is also viewed as n independent and identicallydistributed (i.i.d.) random variables, when we assess the performance of astatistical method.

For the tyre lifetime example, the sample (of size n = 120) used gives thesample mean

X =1

n

n∑i=1

Xi = 35, 391

Page 201: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

A dierent sample may give a dierent sample mean, say, 36,721.

Question: Is the sample mean X a good estimator for the unknown ‘true’lifetime µ?

Obviously we cannot use the concrete number 35,391 to assess how goodthis estimator is, as a dierent sample may give a dierent average value,say, 36,721.

Key idea: By treating X1, · · · ,Xn as random variables, X is also a randomvariable. If the distribution of X concentrates closely around (unknown)µ, X is a good estimator for µ.

Statistic. Any known function of a random sample is called a statistic.

Page 202: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Statistic is used for statistical inference such as estimation, testing etc.

Example. Let X1, · · · ,Xn be a sample from population N (µ,σ2). Then

X =1

n

n∑i=1

Xi , X1 + X2n , sin(X3) + 6

are all statistics. But

(X1 − µ)/σ

is not a statistic, as it depends on unknown quantities µ and σ2.

Note. It is often to denote a random sample as x1, · · · , xn , indicating theyare n concrete numbers. They are seen as a realization or an instance of ni.i.d. random variables X1, · · · ,Xn . But we do not make this dierence, asit makes statements laboursome from time to time.

Page 203: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Unknown Real World

PopulationDistributionFX(·; θ)

A sample

X1, · · · , Xn

i.i.d. random variables

X1, · · · , Xn

SamplingInferenceSamplingInference

Math. Model

Math. Model

Known Information/Data

θ is called a parameter.

A known function of X1, · · · ,Xn is called a statistic.

Page 204: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Dierence between Probability and Statistics

Probability: a mathematical subject

Statistics: an application oriented subject (which uses Probability heavily)

Example. Let

X = No. of StatsI lectures attended by a student

Then X ∼ Bin(17, p), i.e.

P (X = k ) =17!

k !(17 − k )!pk (1 − p)17−k , k = 0, 1, · · · , 17.

Page 205: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Probability questions: treating p as known

• what is E(X)? (the average lectures attended)• what is P (X ≥ 14)? (the proportion of the students attending at least14 lectures)• what is P (X ≤ 8)? (the proportion of the students attending fewerthan the half of lectures)

Statistics questions:

• what is p? (the average attendance rate)• Is p not smaller than 0.9?• Is p smaller than 0.5?

Page 206: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

7.3 Fundamental concepts in statistical inference

Let X1, · · · ,Xn be a sample from a population F (·, θ). Most inferenceproblems can be identied as one of the three types: (point) estimation,condence sets and hypothesis testing for parameter θ.

7.3.1 Point estimation: Provide a single “best guess” of θ, based on obser-vations X1, · · · ,Xn . Formally we may write

θ ≡ θn = g (X1, · · · ,Xn)

as a point estimator for θ, where g (X1, · · · ,Xn) is a statistic.

For example, a natural point estimator for the mean µ = EX1 is the samplemean µ = Xn = n−1

∑ni=1Xi .

Remark. Parameters to be estimated are unknown constants. Their esti-mators are viewed as r.v.s, although in practice θ, µ admit some concretevalues.

Page 207: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

A good estimator should make |θ − θ | as small as possible. However

(i) θ is unknown,(ii) the value of θ changes with the sample observed.

Hence we seek for an estimator θ whichmakes the MSE as small as possiblefor all possible values of θ.

Page 208: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

The mean square error of the estimator θ is dened as

MSEθ(θ) = Eθ(θ − θ)2 = Biasθ(θ)2 + Varθ(θ), (5)

where Biasθ(θ) = Eθ(θ) − θ is called the bias.

When Biasθ(θ) = 0 for all possible values of θ, θ is called an unbiasedestimator.

Note. The subscript ‘θ’ in Eθ etc indicates that the expectation etc aretaken with θ being the true value.

Page 209: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

The standard error of the estimator θ: SE(θ) = Varθ(θ)1/2

Note. The standard deviation of θ Varθ(θ)1/2 may depend on the un-known θ. The standard error SE(θ) is known, and is an estimator for thestandard deviation of θ.

Page 210: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 1. LetY1, · · · ,Yn be a sample from Bernoulli(p). Let p ≡ pn = Yn =∑i Yi

/n . Then

E (p) =1

n

n∑i=1

EYi = p, Var(p) = 1

n2

n∑i=1

Var(Yi ) =p(1 − p)

n.

Therefore pn is an unbiased estimator for p with the standard deviation√p(1 − p)/n , and SE(p) =

√p(1 − p)/n .

For example, if n = 10 and Yn = 0.3, we have p = 0.3 and SE(p) = 0.1449

while the standard deviation of p is√p(1 − p)/10 unknown.

Proof of (5).

MSE(θ) = E [(θ − E θ) + (E θ − θ)2]= E (θ − E θ)2 + (E θ − θ)2 + 2(E θ − θ)E (θ − E θ)

= Var(θ) + Bias(θ)2 + 0.

Page 211: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Consistency. θn is a consistent estimator for θ if θnP−→ θ as n →∞.

The consistency is a natural condition for a reasonable estimator, as θnshould converge to θ if we have innity amount of information. Thereforea non-consistent estimator should not be used in practice!

If MSE(θn) → 0, θ m .s .−→ θ. Therefore θ P−→ θ, i.e. θ is a consistent estimator

for θ.

In Example 1 above, MSE(p) = Var(p) = p(1 − p)/n → 0. Hence p is consis-tent.

Page 212: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Asymptotic Normality. An estimator θn is asymptotically normal if

(θ − θ)/SE(θ) D−→ N (0, 1).

Remark. Many good estimators such as MLE, LSE and MME are asymptot-ically normal under some mild conditions, such as nite moments andsmooth likelihood function (as function of parameters)

Page 213: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

7.3.2 Condence sets

A point estimator is simple to construct and to use. But it is not very infor-mative. For example, it does not reect the uncertainty in the estimation.

Condence Interval is the most commonly used condence set, is moreinformative than a point estimator.

Example2. Let us start with a simple example. A random sampleX1, · · · ,Xnare drawn from N (µ, 1). Then

√n(X − µ) ∼ N (0, 1). Hence

P (−1.96 ≤√n(X − µ) ≤ 1.96) = 0.95,

or

P (X − 1.96/√n < µ < X + 1.96/

√n) = 0.95.

Page 214: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

So a 95% condence interval for µ is

(X − 1.96/√n, X + 1.96/

√n).

Suppose n = 4, X = 2.25. Then a 95% C.I. is (2.25 − 0.98, 2.25 + 0.98) =(1.27, 3.23).

Question: what is P (1.27 < µ < 3.23)? — Note µ is a unknown constant!

Answer: (1.27, 3.23) is one instance of the random interval (X − 0.98, X +

0.98) which covers µ with probability 0.95.

If one draw 10,000 samples, with size n = 4 each, to construct 10,000intervals of the form (X − 0.98, X + 0.98), about 9,500 intervals cover thetrue value of µ.

Page 215: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Denition. If L ≡ L(X1, · · · ,Xn) and U ≡ U (X1, · · · ,Xn) are two statisticsfor which

P L(X1, · · · ,Xn) < θ < U (X1, · · · ,Xn) = 1 − α ,

(L, U ) is called a 100(1 − α)% condence interval for θ.

Remark. 1 − α is called the condence level, which is usually set at 0.90,0.95 or 0.99. Naturally for given α , we shall search for the interval with theshortest length U − L, which gives the most accurate estimation.

Page 216: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Approximate condence interval based on an asymptotically normal es-timator: If (θ − θ)/SE(θ) D

−→ N (0, 1). Then θ ± Zα/2SE(θ) is an approximate1 − α condence interval for θ, where Zα is the top-α point of N (0, 1), i.e.P N (0, 1) > Zα = α .

Forα = 0.05, Zα/2 = 1.96 ≈ 2, one of the most used 95% condence intervalis

θ ± 2 × SE(θ) =(θ − 2 × SE(θ), θ + 2 × SE(θ)

).

Page 217: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 1 (continue). Let Y1, · · · ,Yn be a sample from Bernoulli(p). Letp ≡ pn = Yn =

∑i Yi

/n . By the CLT, (pn − p)/SE(pn) ∼ N (0, 1) asymptotically.

Hence an approximate 1 − α condence interval for p is

pn ± Zα/2SE(pn) = pn ± Zα/2√pn(1 − pn)/n .

For example, if n = 10, pn = 0.3, an approximate 95% condence intervalfor p is

0.3 ± 2√0.3(1 − 0.3)/10 = 0.3 ± 0.145 = (0.155, 0.445).

However if now n = 100 and pn = 0.3, an approximate 95% condenceinterval for p is

0.3 ± 2√0.3(1 − 0.3)/100 = 0.3 ± 0.046 = (0.254, 0.346).

Page 218: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Remark. The point estimator pn unchanged with n = 10 or 100. Howeverthe condence interval is much shorter when n = 100, giving much moreaccurate estimation.

7.3.3 Hypothesis testing: We start with some default statement – calleda null hypothesis denoted by H0. We ask if the date provide signicantevidence to reject the null hypothesis.

For example wemay test if a coin is fair by using the hypothesis H0 : p = 0.5.

Remark. (i) Estimation and testing address dierent needs in practice.

(ii) A statistical test often takes binary decision: ‘reject H0’ or ‘not reject H0’.However technically a testing problem is more complex that an estimationproblem.

Page 219: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

7.4 Nonparametric models and Empirical distribution functions

The outcome of a statistical inference depends on two factors: data andassumption.

The data are objective, while the assumption is more subjective. We wouldlike to let data speak as much as possible.

The classical statistical inference is typically based on an assumption ofa parametric model: X1, · · · ,Xn is a sample from the distribution F (·, θ),where the form of the distribution F is known (such as Normal, Exponentialetc), and the parameter θ is unknown. The inference is on either estimationor testing of parameter θ.

Page 220: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Nonparametric model: let X1, · · · ,Xn is a sample from a distribution Fbelong to a class of distributions F . For example, F may consist of allcontinuous distributions on (−∞,∞). The statistical inference is either toestimate or to test some characteristics of F , or F itself.

Parametric models limit the tasks of statistical inference. It facilitates moreecient inference if the assume parametric form is correct. Nonparametricmodels impose less model-bias, however its statistical inference is morechallenging.

Page 221: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Empirical Distribution Functions. Let X1, · · · ,Xn be a sample from CDF F .A natural estimator for F is dened as

F (x ) =No. of X ′

is not greater than x

n=1

n

n∑i=1

I (Xi ≤ x ),

where I (A) = 1 if A occurs, and 0 otherwise. F (·) is a well-dened CDF, andis called the empirical distribution function.

Note I (Xi ≤ x ) is a sequence of Bernoulli r.v.s with p = F (x ). Hence,

E F (x ) = F (x ), VarF (x ) = F (x )1 − F (x )/n,

and F (x ) m .s .−→ F (x ). In fact it also holds that

supx|F (x ) − F (x )|

P−→ 0, P sup

x|F (x ) − F (x )| > ε ≤ 2e−2nε

2.

Page 222: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Chapter 8. Nonparametric bootstrap

Bootstrap is a computational method for estimating standard errors andcondence intervals.

Let X1, · · · ,Xn ∼i i d F . We use statistic

T = g (X1, · · ·Xn)

for inference (i.e. estimation or testing). It is important to know, e.g. thestandard deviation or the standard error of T .

Page 223: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Bootstrap idea: Let Fn(x ) = n−1∑i I (Xi ≤ x ).

Real world: F −→ X1, · · · ,Xn −→ T = g (X1, · · ·Xn)

Bootstrap world: Fn −→ X ∗1 , · · · ,X∗n −→ T ∗ = g (X ∗1 , · · ·X

∗n )

Although we do not know F , Fn is known. Therefore we know the distri-bution of T ∗ (in principle), which is taken as an approximation for thedistribution of T . We compute the distribution of T ∗ by simulation.

Page 224: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

8.1 Bootstrap variance estimation

Suppose we need to know variance v = Var(T ) = Varg (X1, · · ·Xn). Thebootstrap scheme below provides an estimator v ∗ for v .

1. Draw X ∗1 , · · ·X∗n independently from Fn .

2. Compute T ∗ = g (X ∗1 , · · ·X∗n ).

3. Repeat Steps 1 & 2 B times, to obtain T ∗1 , · · · ,T∗B.

4. Compute the sample variance v ∗ = (B − 1)−1∑1≤i≤B (T

∗i− T ∗)2,

where T ∗ = B−1∑1≤i≤B T

∗i.

Remark. Step 1 can be easily implemented in R. Let X be n-vector (X1, · · · ,Xn),then a bootstrap sample is obtained using sample as follows:

> Xstar <- sample(X, n, replace=T)

Page 225: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Bootstrap MSE estimation. Let T = g (X1, · · ·Xn) be an estimator for θ =θ(F ). Let

m = MSE(T ) = E (T − θ)2 = Var(T ) + (ET − θ)2.

The bootstrap scheme below provides an estimator m∗ for m.

1. Draw X ∗1 , · · ·X∗n independently from Fn .

2. Compute T ∗ = g (X ∗1 , · · ·X∗n ).

3. Repeat Steps 1 & 2 B times, to obtain T ∗1 , · · · ,T∗B.

4. Compute the sample MSE

m∗ =1

B

B∑i=1

T ∗i − θ(Fn)2,

where Fn(x ) = n−1∑i I (Xi ≤ x ) is the empirical distribution.

Page 226: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 1. Consider the daily returns of the Shanghai Stock ExchangeComposite Index in December 1994 – September 2010

The data are saved in the le shanghaiSECI.txt. The sample size is n =3839.

> x <- read.table("shanghaiSECI.txt", skip=3, header=T)> x[1:4,] # print out the first 4 rows

idxcd idxnmabbr date idxdret1 8 SSE-Composite-Index 1994-12-08 -0.01652 8 SSE-Composite-Index 1994-12-09 -0.00143 8 SSE-Composite-Index 1994-12-12 -0.00854 8 SSE-Composite-Index 1994-12-13 0.0000> dim(x)[1] 3839 4> y <- x[,4]*100 # daily return in percentages> summary(y)

Min. 1st Qu. Median Mean 3rd Qu. Max.

Page 227: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

-12.95000 -0.89000 0.01000 0.04994 0.96000 33.04000> var(y)[1] 4.111112> hist(y, nclass=40, prob=T, main='Histogram of SSECI Returns')> x <- seq(-12, 33, 0.1)> lines(x, dnorm(x, mean(y), sqrt(var(y))), col='blue', lwd=2)

# superimpose a normal PDF with the same mean/var onto the histogram> qqnorm(y, xlab="Normal quantiles", ylab="Quantiles of returns")

# quantiles of the empirical distribution# vs quantiles of Normal distribution

> qqline(y, col="blue") # add a line passing through 1st and 3rd# quartiles

Page 228: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Histogram of SSECI Returns

Returns

Den

sity

−10 0 10 20 30

0.00

0.05

0.10

0.15

0.20

0.25

−2 0 2

−10

010

2030

Normal Q−Q Plot

Normal quantiles

Qua

ntile

s of

ret

urns

Page 229: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

The histogram shows that the returns do not follow a normal distribution,as the peak around 0 is much higher.

The Q-Q plot shows that both the tails of the return distribution are muchheavier than the tails of normal distributions.

Recall: For any univariate CDF F (·), the quantile of F is dened as F −1(α) =infx : F (x ) ≥ α , α ∈ [0, 1].

Q-Q plot of two distributions F and G : the curves (G−1(α),F −1(α)), α ∈ [0, 1].

Lemma 1. Let F ,G are two univariate CDFs, b > 0 and a are two constants.Then G (x ) = F (x−ab ) for any x i G

−1(α) = a + bF −1(α) for any α ∈ [0, 1].

Hence, a Q-Q plot is a straight line i the two distributions are of the sameform (i.e. one is a scale-location transformation of the other).

R-functions: qqnorm, qqline, qqplot

Page 230: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

We introduce two measures related to the 3rd and 4th moments, whichare often used as the measures for non-Gaussianality. Let X ∼ F andE (X 4) < ∞. Write µ = EX and σ2 = Var(X ).

Skewness of F : γ = E (X − µ)3/σ3.Kurtosis of F : κ = E (X − µ)4/σ4.

Remark. (i) The skewness is a measure for symmetry of distributions. If Fis symmetric w.r.t the mean µ (such as N (µ,σ2)), γ = 0.

(ii) The kurtosis is a measure for tail-heaviness (i.e. fat-tails). For N (µ,σ2),κ = 3. When κ > (<)3, we say that the tails of F are heavier (lighter) thannormal distributions.

Page 231: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

(iii) Estimators for Skewness and Kurtosis: Let X and S2 be the samplemean and the sample variance. Then

γ =1

nS3

n∑i=1

(Xi − X )3, κ =

1

nS4

n∑i=1

(Xi − X )4.

Page 232: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 1 (Continue). We compute the estimates for skewness and kurto-sis for the Shanghai SECI returns:

> mean((y-mean(y))^3) /var(y)^(1.5)[1] 1.204415 # estimated skewness> mean((y-mean(y))^4) /var(y)^2[1] 25.05686 # estimated kurtosis

Since γ = 1.204415 > 0, the distribution is skewed to the right. Thedistribution is also heavy-tailed, since κ = 25.05686.

How accurate are those estimates? — use bootstrap to nd the standarderrors of the estimators.

> skew <- 1:1000> kurt<- 1:1000

Page 233: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

> for(i in 1:1000) + ystar <- sample(y, 3839, replace=T)+ skew[i] <- mean((ystar-mean(ystar))^3) /var(ystar)^(1.5)+ kurt[i] <- mean((ystar-mean(ystar))^4) /var(ystar)^2+ > sqrt(var(skew)); sqrt(var(kurt))[1] 0.9514143 # bootstrap estimate for SE(estimated skewness)[1] 13.96478 # bootstrap estimate for SE(estimated kurtosis)

Hence, the estimated skewness is 1.2044 with the standard error 0.9514,the estimated kurtosis is 25.06 with the standard error 13.97.

In the above we draw B = 1000 bootstrap samples. For this example, theresults are insensitive for B ≥ 100.

The analysis indicates that the returns are skewed to its right (unusual!)and heavy-tailed. Certainly their distribution is not normal.

Page 234: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

8.2 Bootstrap condence intervals

8.2.1 Approximate normal intervals

If (θ−θ)/Var(θ)1/2 D−→ N (0, 1), an approximate (1−α) condence interval

for θ is

θ ± Zα/2Var(θ)1/2,

where Zα/2 is the top-α/2 point of N (0, 1).

However Var(θ) is often unknown. Replacing it by its bootstrap estimate(see section 8.1 above), we obtain a bootstrap interval:

θ ± Zα/2Var(θ∗)1/2.In practice, we repeat bootstrap sampling B times, obtaining bootstrap es-timates θ∗1, · · · , θ

∗B. We take the sample variance of θ∗1, · · · , θ

∗B as Var(θ∗).

Page 235: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

8.2.2 Pivotal intervals

Let X1, · · · ,Xn be a sample from distribution F . We are interested inestimating a characteristics θ = θ(F ) (such as mean, skewness etc). Letθ = g (X1, · · · ,Xn) = θ(Fn) be the estimator for θ. Let rα be the α-thpercentile of the pivotal θ − θ, i.e.

α = P (θ − θ ≤ rα ).

Then

P (rα/2 < θ − θ ≤ r1−α/2) = 1 − α .

This gives a (1 − α)-th condence interval of θ:

(θ − r1−α/2, θ − rα/2).

This is a valid interval estimation if rα does not depend on θ, i.e. thedistribution of the pivotal θ − θ does not depend on θ. However thisrequirement is not necessary if we adopt a bootstrap approach.

Page 236: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Under some standard conditions,

P (θ − θ < r ) ≈ P (θ∗ − θ < rX1, · · · ,Xn)

when n is large, where θ∗ = g (X ∗1 , · · · ,X∗n ). Thus we may replace rα/2 and

r1−α/2 by their bootstrap counterparts as follows:

Repeat bootstrap sampling B times to form estimates θ∗1, · · · , θ∗B.

Let θ∗α be the [Bα]-th smallest value among θ∗1, · · · , θ∗B, where

[Bα] denotes the integer part of Bα (i.e. [a] is the largest integersmaller than a). Then

r ∗α/2 = θ∗α/2 − θ, r ∗1−α/2 = θ

∗1−α/2 − θ.

The (1 − α) bootstrap pivotal interval for θ is:(2θ − θ∗

1−α/2, 2θ − θ∗

α/2)

Page 237: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

8.2.3 Percentile intervals

The (1 − α) bootstrap percentile interval for θ is:(θ∗α/2, θ∗

1−α/2)

Example 2. We calculate the three bootstrap intervals for the medianof the salary for the graduates in a business school based on data inJobs.txt using the following R function:

jobsMedianCIs <- function(alpha, B) jobs <- read.table("Jobs.txt", header=T, row.names=1)y <- jobs[,4] # salary datacat("Point estimate for median of salaries:", median(y), "\n\n")my <- 1:Bfor(i in 1:B)

Page 238: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

ystar <- sample(y, 253, replace=T) # draw bootstrap samplemy[i] <- median(ystar) # bootstrap estimate for medianmy <- sort(my) # sort bootstrap estimates in ascending orderi <- as.integer(alpha*B/2) # i=[B x alpha/2]cat(1-alpha, "Bootstrap confidence intervals for median of salaries", "\n")cat("Normal interval:", median(y)-qnorm(1-alpha/2)*sqrt(var(my)),

median(y)+qnorm(1-alpha/2)*sqrt(var(my)), "\n")cat("Pivotal interval:", 2*median(y)-my[B-i], 2*median(y)-my[i], "\n")cat("Percentile interval:", my[i], my[B-i], "\n")

Calling jobsMedianCIs(0.05, 5000), we obtain the results below. Notethat the three intervals for this example are very similar.Point estimate for median of salaries: 47

0.95 Bootstrap confidence intervals for median of salariesNormal interval: 45.72511 48.27489Pivotal interval: 46 48Percentile interval: 46 48

Page 239: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 1 (Continue). We calculate the three bootstrap intervals for theskewness using the following R-function:

SSECIbootstrapCIs <- function(B) x <- read.table("shanghaiSECI.txt", skip=3, header=T)y <- x[,4]*100skew0 <- mean((y-mean(y))^3) /var(y)^(1.5)cat("Point estimate for skewness:", skew0, "\n\n")skew <- 1:Bfor(i in 1:B) ystar <- sample(y, 3839, replace=T) # draw bootstrap sampleskew[i] <- mean((ystar-mean(ystar))^3) /var(ystar)^(1.5)skew <- sort(skew) # sort the data in ascending orderi <- as.integer(0.025*B) # i =[0.025B]cat("95% Bootstrap confidence intervals for skewness", "\n")cat("Normal interval:", skew0-2*sqrt(var(skew)),

skew0+2*sqrt(var(skew)), "\n")cat("Pivotal interval:", 2*skew0-skew[B-i], 2*skew0-skew[i], "\n")cat("Percentile interval:", skew[i], skew[B-i], "\n")

Page 240: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Call SSECIbootstrapCIs(1000), yielding the following output:

Point estimate for skewness: 1.204415

95% Bootstrap confidence intervals for skewnessNormal interval: -0.6486737 3.057503Pivotal interval: -0.6067615 2.577141Percentile interval: -0.1683116 3.015591

Final Remark. All the bootstrap intervals work well when θ is asymptoticallynormal.

Page 241: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Chapter 9. Point Estimation

Let X1, · · · ,Xn be a random sample from a population F (·, θ), where theform of F is known but the parameter θ is unknown, and it has p compo-nents. Often we may specify θ ∈ Θ, where Θ is called the parameter space.

For N (µ,σ2), p = 2 and Θ = (−∞, ∞) × (0, ∞). For P oi sson(λ), p = 1 andΘ = (0, ∞).

Goal: to nd a (point) estimator for θ.

9.1 Method of Moments Estimation

Let µk ≡ µk (θ) = E (X k1 ) denote the k -th moment of the population, k =1, 2, · · · . Then µk depends on unknown parameter θ, as everything else on

Page 242: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

the distribution F (·, θ) are known. Denote the k -th sample moment by

Mk =1

n

n∑i=1

X ki =1

n(X k1 + · · · + X

kn ).

The MM estimator θ for θ is the solution of the p equations

µ1(θ) = M1, µ2(θ) = M2, · · · , µp(θ) = Mp .

Example 1. Let X1, · · · ,Xn be a sample from a population with mean µand variance σ2 < ∞. Find the MM estimator for (µ,σ2).

There are two unknown parameters. Let

µ = µ1 = M1, µ2 = M2 =1

n

n∑i=1

X 2i .

Page 243: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

This gives us µ = M1 = X . Since σ2 = µ2 − µ21,

σ2 = M2 −M21 =

1

n

n∑i=1

X 2i − X

2 =1

n

n∑i=1

(Xi − X )2.

Note. E (σ2) = E (X 21 ) − E (X

2) = σ2 + µ2 − (σ2/n + µ2) = n−1n σ

2. We callE (σ2) − σ2 = −σ2/n the estimation bias. The sample variance

S2 =1

n − 1

n∑i=1

(Xi − X )2

is a more frequently used estimator for σ2, and it has zero-bias.

Page 244: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Theorem1. Under somemild regularity conditions, theMME θ is a consistentestimator in the sense that as n →∞, θ P

−→ θ, i.e.

P | |θ − θ | | > ε → 0 for any ε > 0.

Further it is asymptotically normal, i.e.√n(θ − θ) converges in distribution

to a p-dimensional normal distribution.

Page 245: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

9.2. Maximum likelihood estimation

9.2.1 Likelihood

Likelihood is one of the most fundamental concepts in all types of statis-tical inference.Denition 1 Suppose that X has density function or probability functionf (x; θ). We have observed X = x. Then the likelihood function with obser-vation x is dened as

L(θ) ≡ L(θ; x) = f (x; θ).

Density/probability function: a function of x, specifying the distributionof random variable X

Likelihood: a function of θ, reecting information on θ contained in ob-servation x

Page 246: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Note. A likelihood function represents the uncertainty on a unknownnon-random constant θ, and it is not a density or probability function! Itprovides

a rational degree of belief, oran order of preferences

on possible values of the parameter θ. This can be seen more clearly inthe simple example on next slide.

In fact, a likelihood function is often dened up to a positive constant— the constant here refers to a quantity independent of θ. But it maydepending on x. (Note x is a given constant.)

Page 247: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 2. Suppose that x is the number of successes from a knownnumber n of independent trials with unknown probability of success π .The probability function, and so the likelihood function is

L(π) = f (x ; π) =(nx

)πx (1 − π)n−x .

The likelihood function L(π ; x ) can be graphed as a function of π . It changesshape for dierent values of x . A likelihood function for a x = 3 whenn = 10 is shown in the Figure below.

Page 248: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

0

0.05

0.1

0.15

0.2

0.25

0 0.2 0.4 0.6 0.8 1

Page 249: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Notice that the likelihood function shown above is not a density function.It does not have an area of 1 below it.

We use the likelihood function to compare the plausibility of dierentpossible parameter values. For instance, the likelihood is much larger forπ = 0.3 than for π = 0.8, that is the data x = 3 have a greater probabilityof being observed if π = 0.3 than if π = 0.8. This makes π = 0.3much morelikely as the true value for π than 0.8.

Note. In the above argument, we do not need to calculate exact prob-abilities under dierent values of θ. Only the order of those quantitiesmatters!

Page 250: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Let X1, · · · ,Xn be i.i.d. with PDF f (·, θ). Write X = (X1, · · · ,Xn)′. Then thelikelihood function is

L(θ) = L(θ;X) =n∏i=1

f (Xi , θ),

which is a product of n terms. Then the log-likelihood function is

l (θ) = l (θ;X) ≡ logL(θ;X) =n∑i=1

logf (Xi , θ),

which is a sum of n terms.

This explains why log-likelihood functions are often usedwith independentobservations.

Page 251: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

9.2.3 Maximum likelihood estimator (MLE)

The MLE is by far the most popular estimator.

Denition 2 — MLEA Maximum Likelihood Estimator (MLE), θ = θ(X) ∈ Θ, of parameter θ is anestimator satisfying

L(θ;X) ≥ L(θ;X) for all θ ∈ Θ, or equivalently l (θ;X) ≥ l (θ;X) forall θ ∈ Θ.

Obviously, a maximum likelihood estimator is the most plausible valuefor θ as judged by the likelihood function. In many cases where Θ iscontinuous and the maximum does not occur at a boundary of Θ, θ isoften the solution of the equation

s(θ;X) = ∂

∂θl (θ;X) = 0.

We call s(θ) ≡ s(θ;X) a score function.

Page 252: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 3. Suppose thatY1,Y2, . . . ,Yn is a random sample from N (µ,σ2)

where neither µ or σ2 is known. Then we can nd the maximum likelihoodestimator from the log-likelihood

l (µ,σ2) = −n log√2π − n/2 logσ2 −

n∑1

(Yi − µ)2/(2σ2)

= −n log√2π − n/2 logσ2 −

n∑1

(Yi − Y )2/(2σ2) − n(Y − µ)2/(2σ2).

This is maximised by choosing µ = Y , so µ = Y is the MLE for µ. It is easyto see

E (µ) =1

n

n∑i=1

E (Xi ) = µ.

Such a estimator is called unbiased.

Page 253: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

The prole log-likelihood remaining is

l (µ,σ2) = −n log√2π + (n/2)(logσ−2 − σ2σ−2),

where σ2 =∑n1(Yi − Y )

2/n . By the lemma below, the MLE for σ2 is σ2. Notethat the MLE of σ2 is biased since

E (σ2) = (1 − 1/n)σ2 , σ2.

Lemma. Dene L(x ) = log(x−1) − b/x , where b > 0 are constants. Then L(b) ≥ L(x ) for allx > 0.

Example 4. Let X1, · · · ,Xn be i.i.d. Bernoulli(π). Then

L(π) =n∏i=1

πXi (1 − π)1−Xi = πnX (1 − π)n(1−X ).

l (π) = nX log π + n(1 − X ) log(1 − π).Let s(π) = ∂

∂π l (π) = 0, leading to π = X .

Page 254: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 5. Suppose thatY1,Y2, . . . ,Yn is a random sample from an expo-nential distribution with density function e−(y−θ) for y ≥ θ. This is theusual exponential distribution shifted to start at θ. The Likelihood is

L(θ;Y) = e−n(Y−θ)I(θ,∞)(Y(1)),whereY(1) is the smallest observation. This likelihood is zero for θ > Y(1)and increases in θ for θ ≤ Y(1). So the MLE θ = Y(1), which is a boundarymaximum.

theta

L(th

eta)

0.0 0.5 1.0 1.5 2.0 2.5

01

23

4

Y(1)

Page 255: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Invariance property of MLEs

Suppose X ∼ f (x, θ), and ψ = g (θ). Let θ be the MLE for θ, i.e.

θ = argmaxθf (X, θ).

It is obvious to see that the MLE for ψ is ψ = g (θ).

If ψ = g (θ) is a 1-1 transform and ψ is the MLE for ψ , θ ≡ g−1(ψ) is theMLE for θ.

Page 256: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

9.2.4 Numerical computation of MLEs

In modern statistical applications, it is typically dicult to nd explicitanalytic forms for the maximum likelihood estimators. These estimatorsare found more often by iterative procedures built into computer software.An iterative scheme starts with some guess at the MLE and then steadilyimproves it with each iteration. The estimator is considered found whenit has become numerically stable. Sometimes the iterative proceduresbecome trapped at a local maximum which is not a global maximum. Theremay be a very large number of parameters in a model, which makes suchlocal entrapment much more common.

Page 257: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Newton-Raphson Scheme

Suppose that the log-likelihood function function l (θ) is suciently smooth.Then

s(θ) = 0,

where θ is the MLE and s(θ) = ∂∂θ l (θ) is the score function. Let

Ûs(θ) = Ül (θ) =∂2

∂θ∂θ′l (θ) =

(∂2l (θ)

∂θi ∂θj

).

Suppose θ is close to the true value θ0. By a simple Taylor expansion,

s(θ0) = Ûs(θ0)(θ0 − θ) + op(| |θ − θ0 | |).

Page 258: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

This leads to the approximation

θ ≈ θ0 − Ûs(θ0)−1s(θ0).

Since θ0 is unknown, we use iterative estimators

θk+1 = θk − Ûs(θk )−1s(θk ) (6)

for k = 1, 2, · · · , where θ0 is a prescribed initial value. We dene θ = θ j ifθ j and θ j−1 dier by a small amount.

Example 6. Let X1, · · · ,Xn be a sample from Cauchy distribution with PDF

f (x , θ) =1

π1 + (x − θ)2,

where θ is the location parameter. The log-likelihood is

l (θ) = −n∑i=1

log1 + (Xi − θ)2 − n log π .

Page 259: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

The MLE is the solution of s(θ) = 0, where

s(θ) = 2n∑i=1

Xi − θ

1 + (Xi − θ)2.

Since s(θ) = 0 does not admit an explicit solution, we adopt a Newton-Raphson scheme: θk+1 = θk − s(θk )/ Ûs(θk ), where

Ûs(θ) = 2n∑i=1

(Xi − θ)2 − 1

1 + (Xi − θ)22.

The R-function below implements the above scheme.

cauchyMLE <- function(n, theta, init, Tiny) x <- rcauchy(n, theta) # x is a samplei <- 0 # No. of iterationstheta0 <- init +10*Tinytheta1 <- initwhile(abs(theta1-theta0)>Tiny)

theta0 <- theta1

Page 260: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

x2 <- x-theta0x22 <- x2*x2t1 <- mean(x2/(x22+1)) # s(theta0)/(2n)t2 <- mean((x22-1)/(x22+1)^2) # derivation of s(theta0)/(2n)theta1 <- theta0 - t1/t2i <- i+1cat(i, "iteration:", theta1, "\n") # print out iteration values

cat("MLE:", theta1, "No. of iterations:", i, "\n")

By calling cauchyMLE(100,10,11.12,0.01), we excute the above iterativealgorithm as follows:

> source("cauchyMLE.r")> cauchyMLE(100, 10, 11.12, 0.01)1 iteration: 9.5948352 iteration: 10.087873 iteration: 10.07524 iteration: 10.07521

Page 261: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

MLE: 10.07521 No. of iterations: 4

Note the initial value is important: the iterations will not converge ifθ0 < 8.75 or θ0 > 11.2 on my PC.

Choosing a good initial value is always important. For this example, thePDF is symmetric around θ, it makes sense to consider either the samplemean or sample median as an initial estimate. However E (X1) is not well-dened, so the sample mean may not be a good estimator θ. Thus we mayuse the sample median as the initial value for our algorithm.

Page 262: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

The Fisher Scoring method: replace Ûs(θk ) in (6) by Eθ Ûs(θ) under θ = θk .So the algorithm is now

θk+1 = θk − [E Ûs(θk )]−1s(θk ) = θk + I(θk )

−1s(θk ),

where

I(θ) = Vars(θ) = −E Ûs(θ)

is the Fisher information.

Example 6 (continue). It can be shown that

E Ûs(θ) =2n

π

∫ ∞−∞

(x − θ)2 − 1

1 + (x − θ)23dx = −n/2.

Hence the Fisher scoring method is

θk+1 = θk +4

n

n∑i=1

Xi − θk

1 + (Xi − θk )2.

Page 263: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

cauchyMLEscoring <- function(n, theta, init, Tiny) # Fisher scoring MLE for Cauchy(theta)# n is sample size, Tiny is the tolerance limit controls# iteration estimates, init is the initial value used iteration

x <- rcauchy(n, theta)i <- 0 # No. of iterationstheta0 <- init +10*Tinytheta1 <- initwhile(abs(theta1-theta0)>Tiny)

theta0 <- theta1x2 <- x-theta0t1 <- mean(x2/(x2*x2+1)) # s(theta0)/(2n)theta1 <- theta0 + 4*t1i <- i+1cat(i, "iteration:", theta1, "\n") # print out iteration values

cat("\n", "MLE:", theta1, "No. of iterations:", i, "\n")

Calling cauchyMLEscoring(100, 10, 15, 0.1) yieldsMLE: 10.14528 No. of iterations: 7

Page 264: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Note now that the range of valid initial values is much bigger.

Page 265: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Like most iterative algorithms, the choice of appropriate initial values isimportant to ensure the convergence to right limits. In practice multipleinitial values are often used.

The dierences between the Newton-Raphson and Fisher scoring methodsare subtle. We make observations below

• The convergence of the Newton-Raphson algorithm is often fasterwhen both algorithms converge

• The radius of convergence for the Fisher scoring method is often larger,making the choice of initial values less important for the scoringmethod.

Page 266: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

9.2.5 EM algorithms

Goal: to nd the MLE θ = θ(Y) for θ from the likelihood based on data Y:

L(θ;Y) = fY(Y, θ),

while the ‘complete’ data X′ = (Y′, Z′) contain a ‘missing’ component Z. Thelikelihood based on the complete data is

L(θ;X) = fX(X, θ).

EM (Expectation and Maximisation) algorithms

• E-step: compute the conditional expectation

Q (θ) = Q (θ |Y, θ0) ≡ E logL(θ;X)|Y, θ0

Page 267: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

• M-step: maximise Q (θ) to give an updated value θ1

then go to the E-step using θ0 = θ1, and keep iterating until convergence.The limit of θ0 is taken as θ(Y).

Page 268: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 7. The genetic example from (Rao 1973, p.396) assumes that thephenotype data

Y = (Y1,Y2,Y3,Y4)′ ∼

Multinomial(4; 12+θ

4,1 − θ

4,1 − θ

4,θ

4

),

where θ ∈ (0, 1). Then log-likelihood isl (θ,Y) =Y1 log(2 + θ) + (Y2 +Y3) log(1 − θ) +Y4 log θ + C ,

which does not yield a closed form θ.

Now we treat Y as incomplete data from X = (X1, . . . ,X5)′ with multinomialprobabilities (1

2,θ

4,1 − θ

4,1 − θ

4,θ

4

).

Then

Y1 = X1 + X2, Yi = Xi+1 for i = 2, 3, 4.

Page 269: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

The log-likelihood of based on X isl (θ, X) = (X2 + X5) log θ + (X3 + X4) log(1 − θ) + C ,

which readily yields

θ(X) = X2 + X5X2 + X3 + X4 + X5

.

Now the E-step is to ndQ (θ) = E l (θ, X)|Y, θ0

= log θE (X2 + X5 |Y, θ0) + log(1 − θ)E (X3 + X4 |Y, θ0)= (X2 +Y4) log θ + (Y2 +Y3) log(1 − θ),

where X2 = E (X2 |Y, θ0). Since the conditional distribution of X2 givenY1(= X1 + X2) is a binomial distribution with n =Y1 and

p =θ0/4

1/2 + θ0/4=

θ02 + θ0

.

Page 270: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Hence

X2 = np =Y1θ02 + θ0

. (7)

The M-step leads to

θ1 =X2 +Y4

X2 +Y4 +Y2 +Y3. (8)

ForY = (125, 18, 20, 34) and the initial value

θ0 = 4 × 34/(125 + 18 + 20 + 34)

which is a relative frequency estimate, the rst 5 iterations between (7) and (8) are 0.690,0.635, 0.628, 0.627 and 0.627, giving the MLE θ = 0.627.

Page 271: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

What can be said about convergence properties of the EM algorithm?

Let θ0 be an arbitrary initial value and θ1 be the updated value obtainedfrom applying the iteration once. Then it can be shown that

L(θ1;Y) ≥ L(θ0;Y).

Unfortunately, it does not imply that the iterations will always lead to theMLE eventually.

It is important to choose appropriate initial values to ensure the algorithmconverges to the MLE. (This is also true for both Gaussian-Raphson andscore methods!) In practice, it is a good idea to use a variety of initialvalues.

Further discussion on the convergence of EM algorithms, see Wu (1983)Annals of Statistics, Vol.11, pp.95-103 and §12.4 of Pawitan (2001).

Page 272: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

General comments on EM algorithms:

• The EM algorithm is a general procedure for computing MLEs. It is not really anumerical algorithm. The calculation of M-Step typically involves other numericalalgorithms such as Newton-Raphson and the score methods.

• It can be applied when some data are genuinely missing. It can also be appliedwhen the missing information is merely a concept based on which we transform adicult optimisation problem into a sequence of easier problems; see Example 7above. This is particularly relevant when θ(Y) is dicult to calculate while θ(X) iseasier to obtain.

• The convergence of EM algorithm may be very slow, depending on the amount ofmissing information.

Page 273: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 8. Mixture distributionsLetY1, · · · ,Yn be i.i.d. with a mixture PDF

f (y ) =k∑j=1

αj fj (y ,λj ),

where fj are PDFs, αj ≥ 0 and∑j αj = 1. The parameter θ contains all αj

and λj . This model becomes important because

1. it represents heterogeneous data well since each fj representsone heterogeneous component, and2. it provides very good approximations to a large class of distri-butions.

The likelihood based on Y = (Y1, · · · ,Yn)′ is

L(θ;Y) =n∏i=1

k∑j=1

αj fj (Yi ,λj ).

Page 274: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Maximising this likelihood is dicult, due to the presence of the summa-tions, which reect the fact that we are typically lacking in knowledge ofwhich component any particular sample value comes from. This is themissing information!

Let X′i= (Yi , Z′i ), where Zi = (Zi1, · · · , Zi k )

′ is a k × 1 vector with a 1 in theposition corresponding to the component of the mixture that Yi comesfrom, and 0 elsewhere.

Let ej be the k × 1 vector with the j -th component 1 and all the othercomponents 0. The joint probability-density function for (Z1,Y1) is

P Z1 = e` , Y1 ∈ [y1, y1 + dy1)/dy1= P (Z1 = e` ) P Y1 ∈ [y1, y1 + dy1)|Z1 = e` /dy

= α`f` (y1,λ` ) =k∏j=1

αe`jjfj (y1,λj )

e`j ,

Page 275: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

where e` = (e`1, · · · , e`k )′. Let X′ = (X′1, · · · , X′n).

L(θ;X) =n∏i=1

k∏j=1

αZi jjfj (Yi ,λj )

Zi j ,

l (θ;X) =n∑i=1

Z′i

©­«logα1...

logαk

ª®¬ + ©­«log f1(Yi ,λ1)

...log fk (Yi ,λk )

ª®¬ .

Note that

E (Zi |Y, θ0) =( α01f1(Yi ,λ

01)∑k

`=1 α0`f`(Yi ,λ

0`), · · · ,

α0kfk (Yi ,λ

0k )∑k

`=1 α0`f`(Yi ,λ

0`)

)′≡

(b1(Yi , θ0), · · · , bk (Yi , θ0)

)′,

which are constants as far as the M-step is concerned.

Page 276: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Now the E-step implies that

Q (θ) =n∑i=1

E (Z′i |Y, θ0)©­«

logα1...

logαk

ª®¬ + ©­«log f1(Yi ,λ1)

...log fk (Yi ,λk )

ª®¬ ,

The M-step requires to maximise Q (θ), i.e.

θ1 = argmaxθQ (θ).

Note that λ1j and α1

j can be evaluated separately, which is much easier

than minimizing l (θ,Y) directly. For example,

α1j =

∑ki=1 bj (Yi , θ0)∑k

`=1

∑ki=1 b` (Yi , θ0)

, j = 1, · · · , k .

When fj is a normal PDF, λ1j admits an explicit formula.

Page 277: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Let θ0 = θ1, keep iterating between E-step and M-step until two successivevalues of θ1 dier by a small amount.

Page 278: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

9.3 Evaluating estimation

To measure the accuracy of an MLE or, more general, any estimation pro-cedure, we need to dene some measures for the goodness (or badness)of an estimator.

Let θ = θ(X) be an estimator of θ, and θo be the (unknown) true value ofθ. Note that

(i) exact estimation error θ − θo is unknown, and(ii) θ is a random variable

we have to gauge the error

(i) in terms of a probability average, and(ii) for all possible values of θo ∈ Θ.

Page 279: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Let Pθ , Eθ and Varθ denote the probability distribution, expectation andvariance under θo = θ.

Bias: Biasθ(θ) = Eθ(θ) − θ

Variance: Varθ(θ)

Standard deviation: Varθ(θ)1/2

Standard error: Varθ(θ)1/2

Mean square error (MSE): Eθ(θ − θ)2

Mean absolute error (MAE): Eθ |θ − θ |

Note that

• standard error is ameaningful measure of accuracy for (approximately)unbiased estimators only, and

Page 280: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

• MSE (or its squared-root) should be used in general as

MSEθ(θ) = Biasθ(θ)2 + Varθ(θ).

Ideally we would seek for the estimator which minimises MSE or MAE forall θ ∈ Θ over all possible candidate estimators. Unfortunately such aglobal optimum rarely exists. However if we conne to some subclass ofestimators, the MLE is often optimal or asymptotically optimal.

The MSE is most frequently used largely due to is technical tractabilitywhile the MAE leads to estimators which is more robust against outliers inobservations.

Page 281: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Fisher Information

Suppose X ∼ f (x, θ). The score function is

s(θ) = Ûl (θ;X) = ∂

∂θlogf (X, θ).

We assume certain regularity conditions so that we can take derivativesunder the integral sign.

Mean of s(θ):

Eθs(θ) =

∫s(θ)f (x, θ)dx =

∫∂

∂θlogf (x, θ)f (x, θ)dx

=

∫∂

∂θf (x, θ)dx = ∂

∂θ

∫f (x, θ)dx = 0.

Page 282: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Variance of s(θ) — Fisher information matrix:

I(θ) = IX(θ) = Varθs(θ) = Eθs(θ)s(θ)′ = −Eθ[ ∂2

∂θ∂θ′logf (X, θ)

],

because

Eθ ∂2

∂θ∂θ′l (θ)

=

∫ÜL(θ)L(θ) − ÛL(θ) ÛL(θ)′

L(θ)dx =

∫ÜL(θ)dx −

∫ÛL(θ) ÛL(θ)′

L(θ)dx

= −

∫ÛL(θ) ÛL(θ)′

L(θ)dx = −

∫s(θ)s(θ)′f (x, θ)dx = −Eθs(θ)s(θ)′.

Fisher information I(θ) measures the information on θ contained in dataX. Further if X = (X1, · · · ,Xn)′, and X1, · · · ,Xn are IID,

I(θ) = IX(θ) =n∑j=1

IXj (θ) = nIX1,

Page 283: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

i.e. the information is additive.

For θ = θ is a scalar, the Fisher information is

I(θ) = Eθs(θ)2 = −EθÜl (θ).

Theorem 2. (Cramér-Rao inequality)Let X ∼ f (·, θ) which satisfying some regularity conditions. Let T = T (X) bea statistic with g (θ) = Eθ(T ). Then for any θ ∈ Θ,

Varθ(T ) ≥ Ûg (θ)2/I(θ).

The Cramér-Rao inequality species a lower bound for any unbiased esti-mator for the parameter g (θ). When the equality holds, T is the minimumvariance unbiased estimator (MVUE) of g (θ).

Page 284: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Important case: For any unbiased estimator θ = θ(X),

Var(θ) ≥ 1/I(θ).

Multivariate case: For any unbiased estimator θ = θ(X), Var(θ) − I(θ)−1

is a non-negative denite matrix. Hence Var(θj ) ≥ I j j (θ), where θj is thej -th component of θ, and I j j (θ) is the (j , j )-th element of I(θ)−1.

Page 285: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 9. Let X1, · · · ,Xn be a sample from N (µ,σ2). We consider estima-tors for µ, treating σ2 as known. The score function (for one observation)is

s(µ;X1) =∂

∂µlog[e−

12σ2(X1−µ)

2

/

√2πσ2]

=∂

∂µ[−

1

2σ2(X1 − µ)

2] = (X1 − µ)/σ2.

Note Ül (µ) = Ûs(µ) = −σ−2. Hence the Fisher information based on a singleobservation is IX1(µ) = σ

−2. Therefore

I(µ) = IX1,··· ,Xn (µ) = n/σ2.

For any unbiased estimator µ for µ, it holds that

Varµ(µ) ≥ σ2/n,

which is the variance of X . Hence X is the MVUE for µ.

Page 286: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Asymptotic properties of MLEs

Let X1, · · · ,Xn be i.i.d. with PDF f (·, θ). Write

l (θ) = l (θ;X) =n∑j=1

log f (Xj , θ).

Let θ be the MLE which maximises l (θ). Suppose f fulls certain regularityconditions.

(a) Consistency.The MLE is consistent in the sense that as n →∞,

Pθ| |θ − θ | | > ε → 0

for any ε > 0.

Page 287: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Consistency requires that an estimator converges to the parameter to beestimated. It is a very mild and modest condition that any reasonableestimator should full. The consistency condition is often used to rule outbad estimators.

Page 288: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

(b) Asymptotic normalityAs n →∞,

n1/2(θ − θ)D−→ N

(0, IX1(θ)

−1) .For large n , it holds approximately that

θ ∼ N(θ, IX1(θ)

−1/n).

Therefore asymptotically the MLE is unbiased and attains the Cramér-Raolower bound. Any estimator fullling this condition is called ecient.

An approximate standard error of the j -th component of θ is the square-root of the (j , j )-th element of IX1(θ)

−1 divided by n1/2.

Page 289: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Bootstrapping MSEs — Parametric bootstrap

An MSE provides a measure for the accuracy of the estimator. But it is notalways feasible to derive an explicit expression for MSE. Furthermore itdepends on the unknown parameter. Alternatively we may estimate theMSE by Bootstrapping.

Let X1, · · · ,Xn be i.i.d. with PDF f (·, θ), where θ is a scalar. Let

θ = T (X1, · · · ,Xn)

be an estimator. The goal here is to estimate

ν ≡ MSEθo (θ)1/2,

where θo is the true value.

Page 290: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

If we knew f (·, θo) completely, ν is known in principle, andmay be estimatedeasily via a repeated sampling as follows. We draw B independent samplesof size n from f (·, θo). For each sample, we calculate θ, obtaining θ1, · · · , θB .Then the sample root-MSE 1

B

B∑b=1

(θb − θo)21/2

is a reasonable estimator for ν. By the LLN, this estimator converges to νas B →∞.

The basic idea of parametric bootstrap is to adopt the above samplingprocedure in the so-called bootstrap world: now the population is f (·, θ)which is known. We draw a sample denoted as (X ∗1 , · · · ,X

∗n ) from this

distribution. Dene the bootstrap version of the estimator

θ∗ = T (X ∗1 , · · · ,X∗n ).

Page 291: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Then the quantity

ν∗ = MSEθ(θ∗)1/2

is known in principle since the distribution f (·, θ) is completely known.Dene ν∗ as a bootstrap estimator for ν.

In practice, we draw B sets samples from f (·, θ), forming B bootstrapversions of estimator θ∗1, · · · , θ

∗B. The ν∗ is calculated as

ν∗ = 1B

B∑j=1

(θ∗j − θ)21/2.

Remark. The bootstrap methods introduced in Chapter 8 are in the cate-gory of Nonparametric Bootstrap. If we know the form of the underlyingdistribution, parametric bootstrap methods are typically more ecient.

Page 292: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Real World Bootstrap World

Populationf(·, θo)unknown

Populationf(·, θ)known

Estimatorθ = T (X1, · · · , Xn)

Estimatorθ∗ = T (X∗

1, · · · , X∗

n)

Errorθ − θo

unknown

Errorθ∗ − θ

known

Estimating

Bootstrap is a powerful tool for statistical inference. It has dierent formsfor dierent applications. The diagram above indicates that the basic ideaof a parametric bootstrap method.

Page 293: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Chapter 10. Hypothesis Testing (I)

Hypothesis Testing, together with statistical estimation, are the two mostfrequently used statistical inference methods. It addresses a dierenttype of practical problems from statistical estimation.

10.1 Basic idea, p-values

Based on the data, a (statistical) test is to make a binary decision on awell-dened hypothesis, denoted as H0:

Reject H0 or Not reject H0

Page 294: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Consider a simple experiment: toss a coin n times.

Let X1, · · · ,Xn be the outcomes: Head – Xi = 1, Tail – Xi = 0

Probability distribution: P (Xi = 1) = π = 1 − P (Xi = 0), π ∈ (0, 1)

Estimation: π = X = (X1 + · · · + Xn)/n .

Test: to assess if a hypothesis such as “a fair coin” is true or not, whichmay be formally represented as

H0 : π = 0.5.

The answer cannot be resulted from the estimator π

Page 295: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

If π = 0.9, H0 is unlikely to be true

If π = 0.45, H0 may be true (and also may be untrue)

If π = 0.7, what to do then?

Page 296: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

A customer complaint: the amount of coee in a Hilltop coee bottle isless than the advertised weight 3 pounds.

Sample 20 bottles, yielding the average 2.897

Is this sucient to substantiate the complaint?

Again statistical estimation cannot provide a satisfactory answer, due torandom uctuation among dierent samples

We cast the problem into a hypothesis testing problem:

Let the weight of coee be a normal random variable X ∼ N (µ,σ2). Weneed to test the hypothesis µ < 3. In fact, we use the data to test thehypothesis

H0 : µ = 3 (or H0 : µ ≥ 3)If we could reject H0, the customer complaint will be vindicated.

Page 297: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Suppose one is interested in estimating the mean income of a community.Suppose the income population is normal N (µ, 25) and a random sampleof n = 25 observations is taken, yielding the sample mean X = 17.

Three expert economists give their own opinions as follows:

• Mr A claims the mean income µ = 16

• Mr B claims the mean income µ = 15

• Mr C claims the mean income µ = 14

How would you assess those experts’ statements?

Note. X ∼ N (µ,σ2/n) = N (µ, 1) — we assess the statements based on thisdistribution.

Page 298: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

If Mr A’s claim were correct,X ∼ N (16, 1).

The observed value X = 17 isone standard deviation away fromµ, and may be regarded as a typi-cal observation from the distribution.

Little inconsistency between theclaim and the data evidence.

Page 299: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

If Mr B’s claim were correct,X ∼ N (15, 1).

The observed value X = 17 beginsto look a bit extreme, as it is twostandard deviation away from µ.

Inconsistency between the claim andthe data evidence.

Page 300: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

If Mr C’s claim were correct,X ∼ N (14, 1).

The observed value X = 17 is ex-treme indeed, as it is three standarddeviation away from µ.

Strong inconsistency between theclaim and the data evidence.

Page 301: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

A measure of the discrepancy between the hypothesised (claimed) valuefor µ and the observed value X = x is the probability of observing X = x

or more extreme values. This probability is called the p-value. That is

• under H0 : µ = 16,P (X ≥ 17) + P (X ≤ 15) = P (|X − 16| ≥ 1) = 0.317

• under H0 : µ = 15,P (X ≥ 17) + P (X ≤ 13) = P (|X − 15| ≥ 2) = 0.046

• under H0 : µ = 14,P (X ≥ 17) + P (X ≤ 11) = P (|X − 14| ≥ 3) = 0.003

In summary, we reject the hypothesis µ = 15 or µ = 14, as, for example, ifthe hypothesis µ = 14 is true, the probability of observing X = 17 or moreextreme values is merely 0.003. We are comfortable with this decision, asa small probability event would not occur in a single experiment.

Page 302: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

On the other hand, we cannot reject the hypothesis µ = 16.

But this does not imply that this hypothesis is necessarily true, as, forexample, µ = 17 or 18 are at least as likely as µ = 16.

Not Reject , Accept

A statistical test is incapable to accept a hypothesis.

Page 303: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

p-value: the probability of theevent that a test statistic takesthe observed value or moreextreme (i.e. more unlikely)values under H0

It is a measure of the discrepancybetween a hypothesis and data.p-value small: hypothesis is notsupported by datap-value large: hypothesis is notinconsistent with data

p-value may be seen as arisk measure of rejecting hy-pothesis H0

Page 304: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

General setting of hypothesis test

Let X1, · · · ,Xn be a random sample from a distribution F (·, θ). We areinterested in testing the hypotheses

H0 : θ = θ0 vs H1 : θ ∈ Θ1,where θ0 is a xed value, Θ1 is a set, and θ0 < Θ1.

• H0 is called a null hypothesis

• H1 is called an alternative hypothesis

Signicance level α : a small number between 0 and 1 selected subjectively.

Often we choose α = 0.1, 0.05 or 0.01, i.e. tests are often conducted asthe signicance levels 10%, 5% or 1%.

Decision: Reject H0 if p-value ≤ α

Page 305: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Statistical testing procedure:

Step 1. Find a test statistic T = T (X1, · · · ,Xn). Denote T0 the value of Twith the given sample of observations.

Step 2. Compute the p-value, i.e.

p = Pθ0(T = T0 or more extreme values ),

where Pθ0 denotes the probability distribution with θ = θ0.

Step 3. If p ≤ α , reject H0. Otherwise, H0 is not rejected.

Page 306: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Remarks. 1. The alternative hypothesis H1 is helpful to identify powerfultest statistic T .

2. The signicance level α controls how small is small for p-values.

3. “More extreme values" refers to those more unlikely values (than T0)under H0 in favour of H1.

Page 307: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 1. Let X1, · · · ,X20, taking values either 1 or 0, be the outcomesof an experiment of tossing a coin 20 times, i.e.

P (Xi = 1) = π = 1 − P (Xi = 0), π ∈ (0, 1).

We are interested in testing

H0 : π = 0.5 against H1 : π , 0.5.

Suppose there are 17 X ′is taking value 1, and 3 taking value 0. Will you

reject the null hypothesis at the signicance level 5%?

LetY = X1 + · · · + X20. ThenY ∼ Bi n(20, π). We useY as the test statistic.

With the given sample, we observe Y = 17. What are the more extremevalues forY if H0 is true?

Page 308: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Under H0, EY = nπ0 = 10. Hence 3 is as extreme as 17, and the moreextreme values are

18, 19, 20, and 0, 1, 2.

Thus the p-value is( 3∑i=0

+20∑i=17

)PH0(Y = i )

=( 3∑i=0

+20∑i=17

) 20!i !(20 − i )!(0.5)

i (1 − 0.5)20−i

= 2 × (0.5)203∑i=0

20!i !(20 − i )!

= 2 × (0.5)20 × 1 + 20 + 20 × 19/2 + 20 × 19 × 18/(2 × 3)

= 0.0026.

Page 309: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Hence we reject the hypothesis of a fair coin at the signicance level 1%.

Impact of H1

In the above example, if we test

H0 : π = 0.5 against H1 : π > 0.5.

We should only rejectH0 if there is strong evidence againstH0 in favour of H1.Having observedY = 17, the more extreme values are 18, 19 and 20. There-fore the p-value is

∑17≤i≤20 PH0(Y = i ) = 0.0013. Now the evidence against

H0 is even stronger.

Page 310: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

On the other hand, if we test

H0 : π = 0.5 against H1 : π < 0.5.

The observation Y = 17 is more in favour of H0 rather than H1 now. Wecannot reject H0, as the p-value now is

∑i≤17 PH0(Y = i ) = 1 − 0.0013 =

0.9987.

Remark. We only reject H0 if there is signicance evidence in favour of H1.

Page 311: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Two types of errors

Statistical tests are often associated with two kinds of errors, which aredisplayed in the table below.

Decision MadeH0 not rejected H0 rejected

True State H0 Correct decision Type I Errorof Nature H1 Type II Error Correct decision

Remarks. 1. Ideally we would like to have a test that minimises theprobabilities of making both types of errors, which unfortunately is notfeasible.

2. The probability of making Type I error is the p-value and is not greaterthan α – the signicance level. Hence it is under control.

Page 312: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

3. We do not have an explicit control on the probability of Type II error. Fora given signicance level α , we choose a test statistic such that, hopefully,the probability of Type II error is small.

4. Power. The power function of the test is dened as

β (θ) = Pθ H0 is rejected, θ ∈ Θ1,

i.e. β (θ) = 1− Probability of Type II error.

5. Asymmetry: null hypothesis H0 and alternative hypothesis H1 are nottreated equally in a statistical test. The choice of H0 is based on the subjectmatter concerned and/or technical convenience.

6. It is more conclusive to end a test with H0 rejected, as the decision of“Not Reject" does not imply that H0 is accepted.

Page 313: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

10.2 The Wald test

Suppose we would like to test H0 : θ = θ0, and θ = θ(X1, · · · ,Xn) is anestimator and is asymptotically normal, i.e.

(θ − θ)/SE(θ) D−→ N (0, 1), as n →∞.

Then under H0, (θ − θ0)/SE(θ) ∼ N (0, 1) approximately.

Page 314: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

The Wald test at the signicance levet α : LetT = (θ − θ0)/SE(θ) be the teststatistic. We reject H0 against

H1 : θ , θ0 if |T | > zα/2 (i.e. the p-value < α ), orH1 : θ > θ0 if T > zα (i.e. the p-value < α ), orH1 : θ < θ0 if T < −zα (i.e. the p-value < α ),

where zα is the top-α point of N (0, 1), i.e. P N (0, 1) > zα = α .

Remark. Since the Wald test is based on the asymptotic normality, it onlyworks for reasonably large n .

Page 315: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 2. To deal with the customer complaint that the amount of coeein a Hilltop coee bottle is less than the advertised 3 pounds, 20 bottleswere weighed, yielding observations

2.82, 3.01, 3.11, 2.71, 2.93, 2.68, 3.02, 3.01, 2.93, 2.56,2.78, 3.01, 3.09, 2.94, 2.82, 2.81, 3.05, 3.01, 2.85, 2.79

The sample mean and standard deviation:

X = 2.897, S = 0.148

Hence SE(X ) = 0.148/√20 = 0.033. By the CLT, (X − µ)/SE(X ) D

−→ N (0, 1).

To test H0 : µ = 3 vs H1 : µ < 3, we apply the Wald test with T = (X −3)/SE (X ) = −3.121 < −z0.01 = −2.326. Hence we reject H0 : µ = 3 at the1% signicance level.

We conclude that there is signicant evidence which supports the claimthat the coee in a Hilltop coee bottle is less than 3 pounds.

Page 316: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

10.3 χ2-distribution and t -distribution

χ2-Distributions

Background. χ2-distribution is one of the important distributions in statis-tics. It is closely linked with normal, t - and F -distributions. Inference forvariance parameter σ2 relies on χ2-distributions. More importantly mostgoodness-of-t tests are based on χ2-distributions.

Denition. Let X1, · · · ,Xk be independent N (0, 1) r.v.s. Let

Z = X 21 + · · · + X

2k =

k∑i=1

X 2i .

The distribution of Z is called the χ2-distribution with k degrees of free-dom, denoted by χ2(k ) or χ2

k.

Page 317: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

We list some properties of the distribution χ2kas follows.

1. χ2kis a continuous distribution on [0,∞).

2. Mean: E Z = kE (X 21 ) = k .

3. Variance: Var(Z ) = 2k .

Due to the independence among Xi ’s,

Var(Z ) = kVar(X 21 ) = k [E (X

41 ) − E (X

21 )

2] = k E (X 41 ) − 1.

Page 318: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

E (X 41 ) =

1√2π

∫ ∞−∞

x4e−x2/2dx =

1√2π

∫ ∞−∞

x3e−x2/2d (x2/2)

= −x3√2πe−x

2/2∞−∞

+3√2π

∫ ∞−∞

x2e−x2/2dx =

3√2π

∫ ∞−∞

x2e−x2/2dx

=3√2π

∫ ∞−∞

xe−x2/2d (x2/2) = −

3x√2πe−x

2/2∞−∞

+3√2π

∫ ∞−∞

e−x2/2dx

=3√2π

∫ ∞−∞

e−x2/2dx = 3.

4. If Z1 ∼ χ2k , Z2 ∼ χ2p , and Z1 and Z2 are independent, then Z1 + Z2 ∼

χ2k+p

.

Page 319: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

According to the denition, we may write

Z1 =k∑i=1

X 2i , Z2 =

k+p∑j=k+1

X 2j ,

where all Xi ’s are independent N (0, 1) r.v.s. Hence

Z1 + Z2 =

k+p∑i=1

X 2i ∼ χ

2k+p .

Page 320: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

5. The probability density function of χ2kis

f (x ) =

1

2k /2Γ(k /2)xk /2−1e−x/2 x > 0,

0 otherwise,

where

Γ(y ) =

∫ ∞0

uy−1e−udu .

For any integer k , Γ(k ) = (k − 1)!.

Hence χ22 is the exponential distribution with mean 2, as its pdf is

f (x ) =

12e−x/2 x > 0,

0 otherwise.

Page 321: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Probability density functions of χ2k-distributions

0 2 4 6 8

0.0

0.1

0.2

0.3

0.4

0.5

0.6

k=1k=2k=4k=6

0 10 20 30 40 500.

00.

020.

040.

060.

080.

10

k=10k=20k=30k=40

Page 322: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

6. The values of distribution functions of χ2k-distributions (for dierent k )

have been tabulated, and can also be easily obtained from statisticalpackages such as R.

Page 323: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

LetY1, · · · ,Yn be independent N (µ, σ2) r.v.s. Then

(Yi − µ)/σ ∼ N (0, 1).

Hence1

σ2

n∑i=1

(Yi − µ)2 ∼ χ2n .

Note that1

σ2

n∑i=1

(Yi − µ)2 =

1

σ2

n∑i=1

(Yi − Y )2 +

n

σ2(Y − µ)2. (9)

Since Y ∼ N (µ, σ2/n), nσ2(Y − µ)2 ∼ χ21 . It may be proved that

1

σ2

n∑i=1

(Yi − Y )2 ∼ χ2n−1.

Thus decomposition (9) may be formally written asχ2n = χ

2n−1 + χ

21 .

Page 324: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Condence Interval for σ2

Let X1, · · · ,Xn be a random sample from population N (µ,σ2).

Let M =∑ni=1(Xi − X )

2. Then M /σ2 ∼ χ2n−1.

For any given small α ∈ (0, 1), we may nd 0 < K1 < K2 such that

P (χ2n−1 < K1) = P (χ2n−1 > K2) = α/2,

where χ2n−1 stands for a r.v. with χ

2n−1-distribution. Then

1 − α = P (K1 < M /σ2 < K2) = P (M /K2 < σ

2 < M /K1)

Hence an 100(1 − α)% condence interval for σ2 is

(M /K2, M /K1).

Page 325: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Suppose n = 15 and the sample variance S2 = 24.5. Let α = 0.05.

From a table of χ2-distributions, we may nd

P (χ214 < 5.629) = P (χ214 > 26.119) = 0.025.

Hence a 95% condence interval for σ2 is

(M /26.119, M /5.629) = (14S2/26.119, 14S2/5.629)

= (0.536S2, 2.487S2) = (13.132, 60.934).

In the above calculation, we have used the formula

S2 =1

n − 1

∑i

(Xi − X )2 =

1

n − 1M = M /14.

Page 326: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Student’s t -distribution

Background. Another important distribution in statistics

• The t -test is perhaps the most frequently used statistical test in ap-plication.

• Condence intervals for normal mean with unknown variance may beaccurately constructed based on t -distribution.

Historical note. The t -distribution was rst studied by W.S. Gosset (1876-1937), who worked as a statistician for Guinness, writing under the pen-name ‘Student’.

Page 327: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Denition. Suppose X ∼ N (0, 1) and Z ∼ χ2k, and X and Z are indepen-

dent. Then the distribution of the random variable

T = X/√Z /k

is called the t -distribution with k degrees of freedom, denoted by tk ort (k ).

We now list some properties of the tk distribution below.

1. tk is a continuous and symmetric distribution on (−∞,∞).

(T and −T share the same distribution.)

2. E (T ) = 0 provided E |T | < ∞.

Page 328: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

3. Heavy tails. If T ∼ tk , E |T |k = ∞. For X ∼ N (µ,σ2), E |X |p < ∞for any p > 0. Therefore, t -distributions have heavier tails. This is auseful properties in modelling abnormal phenomena in nancial orinsurance data.

Note. E |T |k−ε < ∞ for any small constant ε > 0.

4. As k → ∞, the distribution of tk converges to the distribution ofN (0, 1).

For Z ∼ χ2k, Z = X 2

1 + · · · + X2k, where X1, · · · ,Xk are i.i.d. N (0, 1). By

the LLN, Z /k → E (X 21 ) = 1. Thus T = X /

√Z /k → X ∼ N (0, 1).

5. The probability density function of tk :

f (x ) =Γ(k+12 )√k πΓ(k2 )

(1 +

x2

k

)−k+12 .

Page 329: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Probability density functions of tk -distributions

−2 0 2

0.0

0.1

0.2

0.3

0.4

N(0,1)k=1k=3k=8k=20

Page 330: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

An important property of normal samples

Theorem. Let X1, · · · ,Xn be a sample from N (µ, σ2). Let

X =1

n

n∑i=1

Xi , S2 =1

n − 1

n∑i=1

(Xi − X )2, SE(X ) = S

√n.

Then(i) X ∼ N (µ, σ2/n), and (n − 1)S2/σ2 ∼ χ2n−1,(ii) X and S2 are independent, and therefore

√n(X − µ)

S=X − µ

SE(X )∼ tn−1.

The t -interval — an accurate (1 − α) condence interval for µ:(X − tα/2,n−1

S√n, X + tα/2,n−1

S√n

)= (X − tα/2,n−1 · SE(X ), X + tα/2,n−1 · SE(X )),

where tα/2,n−1 is a constant such that P (tn−1 > tα/2,n−1) = α/2.

Page 331: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Proof of Theorem. LetYi = (Xi − µ)/σ . Then Y = (X − µ)/σ , and

S2y ≡1

n − 1

n∑i=1

(Yi − Y ) = S2/σ2.

Hence we only need to show that (a) (n − 1)S2y ∼ χ2n−1, and (b) Y and S2yare independent.

As Y ≡ (Y1, · · · ,Yn)′ ∼ N (0, In), it also holds that

Z ≡ (Z1, · · · , Zn)′ ≡ ΓY ∼ N (0, In) for any orthogonal Γ.

Let ( 1√n, · · · , 1√

n) be the rst row of Γ. Then Z1 =

√nY . Hence

(n − 1)S2y =n∑i=1

Y 2i − nY2 =

n∑i=1

Z 2i − nY2 =

n∑i=2

Z 2i ∼ χ2n−1,

and it is independent of Z1 =√nY .

Page 332: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

The t -distributions with dierent degrees of freedom have been tabulatedin all statistical tables.

The table below lists some values of Cα dened by the equation

P (tk > Cα ) = α

α = 0.05 α = 0.025 α = 0.005k = 1 6.314 12.706 63.657k = 2 2.593 4.303 9.925k = 3 2.353 3.182 5.841k = 10 1.812 2.228 3.169k = 20 1.725 2.086 2.845k = 120 1.658 1.980 2.617· · · · · ·

N (0, 1) 1,645 1.960 2.576

Remark. When k ≥ 120, tk ≈ N (0, 1).

Page 333: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

10.4 t -tests – one of the most frequently used tests in practice.

10.4.1 Tests for normal means – One-sample problems

Let X1, · · · ,Xn be a sample from N (µ, σ2), where both µ and σ2 > 0 areunknown. Test the hypotheses

H0 : µ = µ0 against H1 : µ , µ0,where µ0 is known.

The famous t -statistic:

T =√n(X − µ0)/S =

√n(X − µ0)

/ 1

n − 1

n∑i=1

(Xi − X )21/2 = X − µ0

SE(X ),

where X = n−1∑i Xi and S2 = 1

n−1

∑i (Xi −X )

2. Note that under hypothesisH0,

√n(X − µ0)/σ ∼ N (0, 1), (n − 1)S2/σ2 ∼ χ2n−1.

Page 334: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Therefore

T =

√n(X − µ0)

S=

√n(X − µ0)/σ√

S2/σ2∼ tn−1 under H0.

Hence we reject H0 if |T | > tα/2,n−1, where α is the signicance level ofthe test, and tα ,k is the top-α point of tk -distribution, i.e. P (tk > tα ,k ) = α .

Remark. H0 : µ = µ0 is rejected against H1 : µ , µ0 at the α signicancelevel i µ0 lies outside the (1 − α) t -interval X ± tα/2,n−1SE(X ).

Page 335: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 2. (Continue) We use t -test to re-examine this data set. Recall

n = 20, X = 2.897, S = 0.148, SE(X ) = 0.033,

we are interested in testing hypotheses

H0 : µ = 3, H1 : µ < 3.

We reject H0 at the level α if T < −tα ,19. Since T = (X − 3)/SE(X ) =−3.121 < −t0.01,19 = −2.539, we reject the null hypothesis H0 : µ = 3 at 1%signicance level.

Page 336: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

10.4.2 Tests for normal means – two-sample problems

Available two independent samples: X1, · · · ,Xnx from N (µx ,σ2x ) and

Y1, · · · ,Yny from N (µy ,σ2y ). We are interested in testing

H0 : µx − µy = δ against H1 : µx − µy , δ (or µx − µy > δ etc),where δ is a known constant. Let

X =1

nx

nx∑i=1

Xi , S2x =1

nx − 1

nx∑i=1

(Xi − X )2,

Y =1

ny

ny∑i=1

Yi , S2y =1

ny − 1

ny∑i=1

(Yi − Y )2.

Then

X − Y ∼ N (µx − µy ,σ2xnx

+σ2y

ny), (nx − 1)

S2x

σ2x+ (ny − 1)

S2y

σ2y∼ χ2nx+ny−2

.

Page 337: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

With an addition assumption σ2x = σ2y , it holds that√nx + ny − 2

1/nx + 1/ny

X − Y − (µx − µy )√(nx − 1)S

2x + (ny − 1)S

2y

∼ tnx+ny−2

Dene a t -statistic

T =

√nx + ny − 2

1/nx + 1/ny

X − Y − δ√(nx − 1)S

2x + (ny − 1)S

2y

The null hypothesis H0 : µx − µy = δ is rejected againstH1 : µx − µy , δ if |T | > tα/2, nx+ny−2, orH1 : µx − µy > δ if T > tα , nx+ny−2, orH1 : µx − µy < δ if T < −tα , nx+ny−2,

where tα , k is the top-α point of the tk -distribution.

Page 338: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 3. Two types of razor, A and B, were compared using 100 men inan experiment. Each man shaved one side, chosen at random, of his faceusing one razor and the other side using the other razor. The times takento shave, Xi andYi minutes, i = 1, · · · , 100, corresponding to the razors Aand B respectively, were recorded, yielding

X = 2.84, S2x = 0.48, Y = 3.02, S2y = 0.42.

Also available is the sample variance of the dierences Zi ≡ Xi − Yi :S2z = 0.6.

Test, at the 5% signicance level, if the two razors lead to dierent shavingtimes. State clearly the assumptions used in the test.

Page 339: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Assumption. Suppose Xi and Yi are two samples from, respectively,N (µx , σ

2x ) and N (µy , σ2y ).

The problem requires to test hypotheses

H0 : µx = µy vs H1 : µx , µy .

There are three approaches: a pairwise comparison method, two two-sample comparisons based on dierent assumptions. Since the data arerecorded pairwisely, the pairwise comparison ismost relevant and eectiveto analyse this data

Page 340: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Method I: Pairwise comparison — one sample t -test

Note Zi = Xi −Yi ∼ N (µz , σ2z ) with µz = µx − µy . We test

H0 : µz = 0 vs H1 : µz , 0.

This is the standard one-sample t -test,

√nZ − µzSz

=X − Y − (µx − µy )

Sz/√n

∼ tn−1.

H0 is rejected if |T | > t0.025, 99 = 1.98, where

T =√nZ

/Sz =

√100(X − Y )/Sz .

With the given data, we observe T = 10(2.84 − 3.02)/√0.6 = −2.327. Hence

we reject the hypothesis that the two razors lead to the same shaving time.

Page 341: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

A 95% condence interval for µx − µy :

X − Y ± t0.025, n−1Sz/√n = −0.18 ± 0.154 = (−0.334, −0.026).

Remark. (i) Zero is not in the condence interval for µx − µy .

(ii) t0.025, 99 = 1.98 is pretty close to z0.025 = 1.96. Indeed when n is large,the t -test and the Wald test are almost the same.

Page 342: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Method II: Two sample t -test with equal but unknown variance

Additional assumption: two samples are independent, σ2x = σ2y .

Now X − Y ∼ N (µx − µy , σ2x/50), 99(S2x + S2y )/σ2x ∼ χ2198. Hence√50X − Y − (µx − µy )√

99(S2x + S2y )/198

= 10 ×X − Y − (µx − µy )√

S2x + S2y

∼ t198

Hence we reject H0 if |T | > t0.025, 198 = 1.97 where

T = 10(X − Y )/√

S2x + S2y .

For the given data, T = −1.897. Hence we cannot reject H0.

A 95% condence interval for µx − µy contains 0:

(X − Y ) ±t0.025, 198

10

√S2x + S

2y = −0.18 ± 0.1870 = (−0.367, 0.007),

Page 343: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Method III: The Wald test — The normality assumption is not required. Butthe two samples are assumed to be independent. Note

SE(X − Y ) =√S2x/n1 + S

2y/n2.

Hence it holds approximately that

X − Y − (µx − µy )/SE(X − Y ) ∼ N (0, 1).

Hence, we reject H0 when |T | > 1.96 at the 95% signicance level, where

T = (X − Y )/√

S2x/100 + S2y/100.

For the given data, T = −0.18/√0.009 = −1.9. Hence we cannot reject H0.

An approximate 95% condence interval for µx − µy is

X − Y ± 1.96 ×√S2x/100 + S

2y/100 = −0.18 ± 0.186 = (−0.366, 0.006).

The value 0 is contained in the interval now.

Page 344: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Remarks. (i) Dierent methods lead to dierent but not contradictoryconclusions, as

Not reject , Accept(ii) The pairwise comparison is intuitively most relevant, and leads tomost conclusive inference (i.e. rejection). It also produces the shortestcondence interval.(iii) Methods II and III ignore the pairing of the data, and therefore fail totake into account the variation due to the dierent individuals. Conse-quently the inference is less conclusive and less accurate.(iv) A general observation: H0 is rejected i the hypothesized value by H0is not in the corresponding condence interval.(v) It is much more challenging to compare two normal means with un-known and dierent variances, which is not discussed in this course. Onthe other hand, the Wald test provides an easy alternative when both nxand ny are large.

Page 345: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

10.4.3. t -tests with R

The R-function t.test performs one-sample, or two-sample t -tests withone-sided or two-sided alternatives. We illustrate it via an example.

Example 4. The daily returns of the Shanghai Stock Exchange CompositeIndex in 1999 and 2009: two subsets of the data analysed in Example 1 ofChapter 8.

(i) First we extract the two subsets and conduct some preliminarydata analysis.

> x <- read.table("shanghaiSECI.txt", skip=3, header=T)> y <- x[,4]*100 # daily returns in percentages> y1999 <- y[1005:1243] # extract daily returns in 1999> y2009 <- y[3415:3658] # extract daily returns in 2009> par(mar=c(4,4,2,1),mfrow=c(1,2))

Page 346: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

> plot(y1999, type='l', xlab='day', ylab='return',main='Returns in 1999')

> plot(y2009, type='l', xlab='day', ylab='return',main='Returns in 2009')

> length(y1999); length(y2009)[1] 239 # sample size of returns in 1999[1] 244 # sample size of returns in 2009> summary(y1999)

Min. 1st Qu. Median Mean 3rd Qu. Max.-8.8100 -0.8800 -0.1300 0.1037 0.7400 7.6400> summary(y2009)

Min. 1st Qu. Median Mean 3rd Qu. Max.-5.6600 -0.8150 0.3650 0.2561 1.4780 7.2900> var(y1999); var(y2009)[1] 3.493598[1] 3.922712

Page 347: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

0 50 100 150 200

−5

05

Returns in 1999

day

retu

rn

0 50 100 150 200 250

−6

−4

−2

02

46

Returns in 2009

day

retu

rn

Page 348: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

(ii) One sample t -test. Let Xi denote the returns in 1999, andYidenote the returns in 2009. Then nx = 239, ny = 244, and

X = 0.1037, Y = 0.2561, S2x = 3.4936, S2y = 3.9227.

We test H0 : µx = 0 vs H1 : µx > 0 rst.> t.test(y1999, alternative='greater', mu=0, conf.level=0.95)

# use alternative='two.sided' for two-sided alternativeOne Sample t-test

data: y1999t = 0.8576, df = 238, p-value = 0.196alternative hypothesis: true mean is greater than 095 percent confidence interval:-0.09596304 Inf

sample estimates:mean of x0.103682

Since the p-value is 0.196, we cannot reject H0 : µx = 0, i.e. the returns in1999 are not signicantly dierent from 0.

Page 349: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Corresponding the one-sided alternative, R also gives a correspondingone-sided condence interval for µ: (−0.096,∞), which contains 0. (Notethat the setting indicates that we believe µ is either 0 or positive. Thereforereasonable condence intervals are in the form (a,∞).)

Now we test H0 : µy = 0 vs H1 : µy > 0.> t.test(y2009, alternative='greater', mu=0, conf.level=0.99)

One Sample t-testdata: y2009t = 2.0202, df = 243, p-value = 0.02223alternative hypothesis: true mean is greater than 099 percent confidence interval:-0.04077725 Inf

sample estimates:mean of x0.2561475

For the returns in 2009, the p-value of the t -test is 0.022. Hence we rejectH0 : µy = 0 at the 5% signicance level, but cannot reject H0 at the 1%

Page 350: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

level. We conclude that there exists evidence indicating that the returns in2009 tend to greater than 0, although the evidence is not overwhelming.

Remark. With the sample sizes over 200, the above t -tests yield practicallythe same results as the Wald test.

(iii) Two-sample t -tests. We now test H0 : µx − µy = 0 againstH1 : µx − µy , 0 or H1 : µx − µy < 0.

> t.test(y1999, y2009, mu=0, alternative='two.sided', var.equal=T)# without flag "var.equal=T", the WelchâĂŞSatterthwaite approximate# test will be used insteadTwo Sample t-test

data: y1999 and y2009t = -0.8697, df = 481, p-value = 0.3849alternative hypothesis: true difference in means is not equal to 095 percent confidence interval:

Page 351: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

-0.4969197 0.1919887sample estimates:mean of x mean of y0.1036820 0.2561475

> t.test(y1999, y2009, mu=0, alternative='less', var.equal=T)Two Sample t-test

data: y1999 and y2009t = -0.8697, df = 481, p-value = 0.1924alternative hypothesis: true difference in means is less than 095 percent confidence interval:

-Inf 0.1364386sample estimates:mean of x mean of y0.1036820 0.2561475

Both the tests indicate that there is no signicant evidence against thehypothesis that the average returns in the two years are the same.

Page 352: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

10.5 Most Powerful Tests and Neyman-Pearson Lemma

Ideally we would choose, among those tests of size α , the test whichminimises the probability of Type II error, i.e. that maximises the powerβ (θ) over θ ∈ Θ1. If such a test exists, it is called the most powerful test(MPT).Neyman-Pearson Lemma. If a test of size α for

H0 : θ = θ0 against H1 : θ = θ1rejects H0 when

L(θ1; x) > KL(θ0; x),

and does not reject H0 when

L(θ1; x) < KL(θ0; x),

then it is a most powerful test of size α , where K > 0 is a constant.

Page 353: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Note. Both H0 and H1 are simple hypotheses.

Example 5 Let X1,X2, . . . ,Xn be a sample from N (µ, 1). To test

H0 : µ = 0 against H1 : µ = 5,

the likelihood ratio is

LR =

(1√2π

)nexp

(−

∑ni=1(Xi − 5)

2/2)

(1√2π

)nexp

(−

∑ni=1X

2i/2

) ∝ exp(5nX ).

Thus LR > K is equivalent to X > K1, K1 is determined by the size of thetest. Thus the MPT of size α rejects H0 i X > zα/

√n , where zα is a top-α

point of N (0, 1).

Question: If we change the alternative hypothesis to H1 : µ = 10, what isthe MPT then?

Page 354: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Uniformly Most Powerful Tests

Suppose that the MPT for testing

H0 : θ = θ0 vs H1 : θ = θ1does not change its form for all θ1 ∈ Θ1. Then it is the Uniformly MostPowerful Test (UMPT) for testing

H0 : θ = θ0 vs H1 : θ ∈ Θ1.

Note. Typically, Θ1 = (−∞, θ0) or Θ1 = (θ0,∞).

Example 5 (continue). For

H0 : µ = 0 against H1 : µ > 0,

the UMPT of size α rejects H0 i X > zα .

Page 355: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

A more general case

Let X = (X1, · · · ,Xn)T be random variables with joint pdf f (x, θ). We testthe hypotheses

H0 : θ ≤ θ0 vs H1 : θ > θ0. (10)

Denoted by T ≡ T (X) the MPT of size α for simple hypotheses

H0 : θ = θ0 vs H1 : θ = θ1,exists, where θ1 > θ0.

Then T is the UMPT of the same size α for hypotheses (10) provided that(i) T remains unchaged for all values of θ1 > θ0, and(ii) Pθ(T rejects H0) ≤ Pθ0(T rejects H0) = α for all θ < θ0.

Note. For hypotheses H0 : θ ≥ θ0 vs H1 : θ < θ0, the UMPT may be obtainedin the similar manner.

Page 356: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 6. Let (X1, · · · ,Xn) be a random sample from an exponentialdistribution with mean 1/λ. We are interested in testing

H0 : λ ≤ λ0 vs H1 : λ > λ0.

For

H0 : λ = λ0 vs H1 : λ = λ1,

the MPT rejects H0 i∑ni=1Xi ≤ K for any λ1 > λ0, where K is determined

by Pλ0∑ni=1Xi < K = α .

It is easy to verify that for λ < λ0, Pλ∑ni=1Xi < K < α .

Hence the MPT for the simple null hypothesis against simple alternativeis also the UMPT for the composite hypotheses.

Page 357: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Chapter 11. Hypothesis Testing (II)

11.1 LikelihoodRatio Tests—one of themost popular ways of constructingtests when both null and alternative hypotheses are composite (i.e. not asingle point).

Let X ∼ f (·, θ). Consider hypotheses

H0 : θ ∈ Θ0 vs H1 : θ ∈ Θ − Θ0.

The likelihood ratio testwill reject H0 for the large values of the statistic

LR = LR (X) ≡supθ∈Θ f (X, θ)supθ∈Θ0 f (X, θ)

= f (X, θ)/f (X, θ),

where θ the (unconstrained) MLE, and θ is the constrained MLE underhypothesis H0.

Page 358: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Remark. (i) It is easy to see that LR ≥ 1.

(ii) The exact sampling distributions of LR are usually unknown, except ina few special cases.

Example 1. (One-sample t -test)Let X = (X1, · · · ,Xn)τ be a random sample from N (µ,σ2). We are interestedin testing hypotheses

H0 : µ = µ0 against H1 : µ , µ0,

where µ0 is given, and σ2 is unknown and is a nuisance parameter. Nowboth H0 and H1 are composite. The likelihood function is

L(µ,σ2) = Cσ−n exp−

1

2σ2

n∑j=1

(Xi − µ)2.

Page 359: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

The unconstrained MLEs are

µ = X , σ2 =1

n

n∑j=1

(Xi − X )2,

and the constrained MLE is

σ2 =1

n

n∑j=1

(Xi − µ0)2.

The LR-ratio statistic is then

LR =L(µ, σ2)

L(µ0, σ2)= (σ2/σ2)n/2.

Since

nσ2 = nσ2 + n(X − µ0)2,

Page 360: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

it holds that σ2/σ2 = 1 +T 2/(n − 1), where

T =√n(X − µ0)

/ 1

n − 1

n∑j=1

(Xi − X )21/2.

Note thatT ∼ tn−1 under H0. The LRT will reject H0 i |T | > tn−1,α/2, wheretk ,α is the upper α-point of the t -distribution with k degrees of freedom.

Page 361: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Asymptotic Distribution of Likelihood ratio test statisticLet X = (X1, · · · ,Xn)

τ , and assume certain regularity conditions. Thenas n → ∞, the distribution of 2 log(LR ) under H0 converges to the χ2-distribution with d − d0 degrees of freedom, where d is the ‘dimension’ ofΘ and d0 is the ‘dimension’ of Θ0.

To make the computation of ‘dimension’ easy, reparametrisation is oftenadopted. Suppose that the parameter θ may be written in two parts

θ = (ψ,λ)

whereψ is k ×1 parameter of interest, and λ is of little interest and is callednuisance parameters. The hypotheses to be tested may be expressed as

H0 : ψ = ψ0 vs H1 : ψ , ψ0.

Page 362: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Now the LR-statistic is of the form

LR =L(ψ, λ;X)L(ψ0, λ;X)

,

where (ψ, λ) is unconstrained MLE while λ is the constrained MLE of λsubject to ψ = ψ0. Then as n →∞,

2 log(LR ) D−→ χ2k under H0.

Example 2. Let X1, · · · ,Xn be independent, and Xj ∼ N (µj , 1). Considerthe null hypothesis

H0 : µ1 = · · · = µn .

Page 363: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

The likelihood function is

L(µ1, · · · , µn) = C exp−1

2

n∑j=1

(Xj − µj )2,

where C > 0 is a constant independent of µj . Then the unconstrained MLEare µj = Xj and the constrained MLE is µ = X . Hence

LR =L(µ1, · · · , µn)

L(µ, · · · , µ)= exp

12

n∑j=1

(Xj − X )2.

Hence

2 log(LR ) =n∑j=1

(Xj − X )2 ∼ χ2n−1 under H0,

which is true for any nite n as well.

How to calculate the degree of freedom?

Page 364: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Since d = n, d0 = 1, the d.f. is d − d0 = n − 1.

Alternatively we may adopt the following reparametrisation:

µj = µ1 +ψj for 2 ≤ j ≤ n .

Then the null hypothesis can be expressed as

H0 : ψ2 = · · · = ψn = 0.

Therefore ψ = (ψ2, · · · ,ψn)τ has n − 1 component, i.e. k = n − 1.

11.2 The permutation test — a nonparametric method for testing if twodistributions are the same. It is particularly appealing when sample sizesare small, as it does not rely on any asymptotic theory.

Page 365: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Let X1, · · · ,Xm be sample from distribution Fx andY1, · · · ,Yn be a samplefrom distribution Fy . We are interested in testing

H0 : Fx = Fy versus H1 : Fx , Fy .

Key idea: under H0, X1, · · · ,Xm,Y1, · · · ,Yn form a sample of size m + n

from a single distribution.

Choose a test statistic

T = T (X1, · · · ,Xm,Y1, · · · ,Yn)

which is capable to tell the dierence between the two distribution, e.g.T = |X − Y |, or T = |X − Y |2 + |S2x − S

2y |.

Consider all (m + n)! permutations of (X1, · · · ,Xm,Y1, · · · ,Yn), compute thetest statistic T for each permutation, yielding the values T1, · · · ,T(m+n)!.

Page 366: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

The p-value of the test is dened as

p =1

(m + n)!

(m+n)!∑j=1

I (Tj > tobs ),

where tobs = T (X1, · · · ,Xm,Y1, · · · ,Yn). We reject H0 at the signicancelevel α if p ≤ α .

Note. When H0 holds, all those (m + n)! Tj ’s are on the equal footing, andtobs = T (X1, · · · ,Xm,Y1, · · · ,Yn) is one of them. Therefore tobs is unlikelyto be an extreme value among Tj ’s.

Page 367: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Algorithm for Permutation Tests:

1. Compute tobs = T (X1, · · · ,Xm,Y1, · · · ,Yn).

2. Randomly permute the data. Compute T again using the permuteddate.

3. Repeat Step 2 B times, and let T1, · · · ,TB denote the resulting values.

4. The approximate p-value is B−1∑1≤j≤B I (Tj > tobs ).

Remark. Let Z = (X1, · · · ,Xm,Y1, · · · ,Yn) (Z <- c(X,Y)). A permutation ofZ may be obtained in R asZp <- sample(Z, n+m)

You may also use the R-function sample.int:k <- sample.int(n+m, n+m)Now k is a permutation of 1, 2, · · · , n +m.

Page 368: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example 3. Class A was taught using detailed PowerPoint slides. The marksin the nal exam are

45, 55, 39, 60, 64, 85, 80, 64, 48, 62, 75, 77, 50.

Students in Class B were required to read books and answer questions inclass discussions. The marks in the nal exam are

45, 59, 48, 74, 73, 78, 66, 69, 79, 81, 60, 52.

Can we infer that the marks from the two classes are signicantly dierent?

We conduct the permutation test using the test statistic T = |X − Y | in R:

> x <- c(45, 55, 39, 60, 64, 85, 80, 64, 48, 62, 75, 77, 50)> y <- c(45, 59, 48, 74, 73, 78, 66, 69, 79, 81, 60, 52)> length(x); length(y)[1] 13

Page 369: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

[1] 12> summary(x)

Min. 1st Qu. Median Mean 3rd Qu. Max.39.00 50.00 62.00 61.85 75.00 85.00

> summary(y)Min. 1st Qu. Median Mean 3rd Qu. Max.45.00 57.25 67.50 65.33 75.00 81.00

> Tobs <- abs(mean(x)-mean(y))> z <- c(x,y)> k <- 0> for(i in 1:5000) + zp <- sample(z, 25) # zp is a permutation of z+ T <- abs(mean(zp[1:13])-mean(zp[14:25]))+ if(T>Tobs) k <- k+1+ cat("p-value:", k/5000, "\n")p-value: 0.5194

Since p-value is 0.5194, we cannot reject the null-hypothesis that the markdistributions of the two classes are the same.

Page 370: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

We also apply the t -sample, obtaining the similar results:

> t.test(x, y, var.equal=T) # mu=0 is the defaultTwo Sample t-test

data: x and yt = -0.6472, df = 23, p-value = 0.5239alternative hypothesis: true difference in means is not equal to 095 percent confidence interval:-14.632967 7.658608

Page 371: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

11.3 χ2-tests

11.3.1 Goodness-of-t tests: to test if a given distribution ts the data.

Let X1, · · · ,Xn be a random sample from a discrete distribution of kcategories denoted by 1, · · · , k . Denote the probability function

pj = P (Xi = j ), j = 1, · · · , k .

Then pj ≥ 0 and∑kj=1 pj = 1.

Typically n >> k . Therefore the data are often compressed into a table:

Category 1 2 · · · k

Frequency Z1 Z2 · · · Zk

Page 372: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

where

Zj = No. of Xi ’s equal to j , j = 1, · · · , k .

Obviously∑kj=1 Zj = n .

To test the null hypothesis

H0 : pi = pi (θ), i = 1, · · · , k ,

where the function forms of pi (θ) are known but the parameter θ is un-known. For example, pi (θ) = θi−1e−θ/(i − 1)! (i.e. Poisson distribution).

We rst estimate θ by, for example, its MLE θ. The expected frequenciesunder H0 are

Ei = npi (θ), i = 1, · · · , k .

Page 373: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Listing them together with observed frequencies, we have

Category 1 2 · · · k

Frequency Z1 Z2 · · · ZkExpected frequency E1 E2 · · · Ek

If H0 is true, we expect Zj ≈ Ej = npj (θ) when n is large, as, by the LLN, itholds

Zj

n=1

n

n∑i=1

I (Xi = j ) → E I (Xi = j ) = P (Xi = j ) = pj (θ).

Test statistic: T =∑kj=1(Zj − Ej )

2/Ej .Theorem. Under H0, T

D−→ χ2

k−1−das n → ∞, where d is the number of

components in θ.

Page 374: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Remark. (i) It is important that Ei ≥ 5 at least. This may be achieved bycombining together the categories with smaller expected frequencies.

(ii) When pj are completely specied (i.e. known) under H0, d = 0.

Example 4. A supermarket recorded the numbers of arrivals over 100one-minute intervals. The data were summarized as follows

No. of arrivals 0 1 2 3 4 5 7Frequency 13 29 32 20 4 1 1

Do the data match a Poisson distribution?

The null hypothesis is H0 : pi = λi e−λ/i ! for i = 0, 1, · · · . We nd the MLEfor λ rst.

Page 375: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

The likelihood function: L(λ) =∏100i=1

λXiXi !e−λ ∝ λ

∑100i=1

Xi e−100λ.

The log-likelihood function: l (λ) = log(λ)∑100i=1

Xi − 100λ.

Let ddλ l (λ) = 0, leading to λ =1100

∑100i=1

Xi = X .

Since we are only given the counts Zj instead of Xi , we need to computeX from Zj . Recall Zj = no. of Xi equal to j . Hence

X =1

n

n∑i=1

Xi =1

n

k∑j=1

j · Zj

=1

100(0 × 13 + 1 × 29 + 2 × 32 + 3 × 20 + 4 × 4

+5 × 1 + 7 × 1) = 1.81.

Page 376: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

With λ = 1.81, the expected frequencies are

Ei = n · pi (λ) = 100 ×(1.81)i

i ! e−1.81, i = 0, 1, · · · .

We combine the last three categories to make sure Ei ≥ 5.No. of arrivals 0 1 2 3 ≥ 4 TotalFrequency Zi 13 29 32 20 6 100pi (λ) = λ

i e−λ/i ! 0.164 0.296 0.268 0.162 0.110 1Expected frequency Ei 16.4 29.6 26.8 16.2 11.0 100Dierence Zi − Ei -3.4 -0.6 5.2 3.8 -5 0(Zi − Ei )

2/Ei 0.705 0.012 1.01 0.891 2.273 4.891

Note under H0, T =∑4i=0(Zi − Ei )

2/Ei ∼ χ25−1−1 = χ

23 . Since T = 4.891 <

χ20.10, 3 = 6.25, we cannot reject the assumption that the data follow aPoisson distribution.

Page 377: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Remark. (i) The goodness-of-t test has been widely used in practice.However we should bear in mind that when H0 cannot be rejected, we arenot in the position to conclude that the assumed distribution is true, as

“not reject” , “accept"

(ii) The above test may be used to test the goodness-of-t of a continuousdistribution via discretization. However there exist more appropriatemethods such as Kolmogorov-Smirnov test and Cramér-von Mises test,which deal with the goodness-of-t for continuous distributions directly.

Page 378: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

11.3.2 Tests for contingency tables

Tests for independence of two discrete random variables

Let (X ,Y ) be two discrete random variables, and X have r categories andY have c categories. Let

pi j = P (X = i , Y = j ), i = 1, · · · , r , j = 1, · · · , c .

Then pi j ≥ 0 and∑i ,j pi j ≡

∑ri=1

∑cj=1 pi j = 1.

Let pi · = P (X = i ) and p ·j = P (Y = j ). It is easy to see that

pi · =c∑j=1

P (X = i , Y = j ) =c∑j=1

pi j =∑j

pi j

Similarly, p ·j =∑i pi j

Page 379: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

X andY are independent ipi j = pi ·p ·j for i = 1, · · · , r and j = 1, · · · , c.

Suppose we have n pairs of observations from (X ,Y ). The data are pre-sented in a contingency table below

Y1 2 · · · c

1 Z11 Z12 · · · Z1cX 2 Z21 Z22 · · · Z2c

... ... ... ... ...r Zr 1 Zr 2 · · · Zr c

where Zi j = no. of the pairs equal to (i , j ).

Page 380: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

It is often useful to add the marginals into the table:

Zi · =c∑j=1

Zi j , Z ·j =r∑i=1

Zi j , Z ·· =r∑i=1

Zi · =c∑j=1

Z ·j = n

Y1 2 · · · c

1 Z11 Z12 · · · Z1c Z1·X 2 Z21 Z22 · · · Z2c Z2·

... ... ... ... ... ...r Zr 1 Zr 2 · · · Zr c Zr ·

Z ·1 Z ·2 · · · Z ·c Z ·· = n

Page 381: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

We are interested in testing the independenceH0 : pi j = pi ·p ·j , i = 1, · · · , r , j = 1, · · · , c.

Under H0, a natural estimator for pi j is

pi j = pi ·p ·j =Zi ·n

Z ·j

n

Hence the expected frequency at the (i , j )-th cell is

Ei j = npi j = Zi ·Z ·j /n = Zi ·Z ·j /Z ··, i = 1, · · · , r , j = 1, · · · , c .

If H0 is true, we expect Zi j ≈ Ei j . The goodness-of-t test statistic isdened as

T =r∑i=1

c∑j=1

(Zi j − Ei j )2/Ei j .

We reject H0 for large values of T .

Page 382: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Under H0, T ∼ χ2p−d , where• p = no. of cells -1 = r c − 1• d = no. of estimated ‘free’ parameters = r + c − 2.

Note. 1. The sum of r × c counts Zi j is n xed. So knowing r c − 1 of them,the other one is also known. This is why p = r c − 1.

2. The estimated parameters are pi · and p ·j . But∑ri=1 pi · = 1 and

∑cj=1 p ·j =

1. Hence d = (r − 1) + (c − 1) = r + c − 2.

3. For testing independence, it always holds that

Zi · − Ei · = 0 and Z ·j − E·j = 0.

Those are useful facts in checking for computational errors. The proofsare simple, as, for example,

Zi · − Ei · = Zi · −∑j

Ei j = Zi · −∑j

Zi ·Z ·j

Z ··= Zi · −

Zi ·Z ··Z ··

= 0.

Page 383: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Theorem. Under H0, the limiting distribution of T is χ2 with (r − 1)(c − 1)degrees of freedom, as n →∞.

Page 384: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Example. The table below lists the counts on the beer preference andgender of beer drinker from randomly selected 150 individuals. Test at the5% signicance level the hypothesis that the preference is independentof the gender.

Beer preferenceLight ale Lager Bitter Total

Gender Male 20 40 20 80Female 30 30 10 70Total 50 70 30 150

The expected frequencies are:

E11 =80 · 50

150= 26.67, E12 =

80 · 70

150= 37.33, E13 =

80 · 30

150= 16,

E21 =70 · 50

150= 23.33, E22 =

70 · 70

150= 32.67, E33 =

70 · 30

150= 14.

Page 385: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Ei j26.67 37.33 16 8023.33 32.67 14 70

50 70 30 150

Zi j − Ei j-6.67 2.67 4 06.67 -2.67 -4 0

0 0 0 0

(Zi j − Ei j )2/Ei j

1.668 0.191 1.000 2.8591.907 0.218 1.142 3.267

6.126

Under the null hypothesis of independence, T =∑i ,j (Zi j − Ei j )

2/Ei j ∼ χ22 .

Note the degree freedom is (2 − 1)(3 − 1) = 2.

Since T = 6.126 > χ20.05, 2 = 5.991, we reject the null hypothesis, i.e. thereis signicant evidence from the data indicating that the beer preferenceand the gender of beer drinker are not independent.

Page 386: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Tests for several binomial distributions

Consider a real example: Three independent samples of sizes 80, 120 and100 are taken respectively from single, married, and widowed or divorcedpersons. Each individual was asked to if “friends and social life” or “joband primary activity” contributes most to their general well-being. Thecounts from the three samples are summarized in the table below.

Single Married Widowed ordivorced

Friends andsocial life 47 59 56

Job or primaryactivity 33 61 44

Total 80 120 100

Page 387: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Conditional Inference: Sometimes we conduct inference under the as-sumption that all the row (or column) margins are xed.

Dierent from the tables for independent tests, now

Z1j ∼ Bi n(Z ·j , p1j ), j = 1, 2, 3,

where Z ·j are xed constants — sample sizes. Furthermore, p2j = 1 − p1j .

We are interested in testing hypothesis

H0 : p11 = p12 = p13.

Under H0, the three independent samples may be seen from the samepopulation. Furthermore,

Z11 + Z12 + Z13 ∼ Bi n(Z ·1 + Z ·2 + Z ·3, p),

Page 388: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

where p denotes the common value of p11, p12 and p13.

Therefore the MLE is

p =Z11 + Z12 + Z13Z ·1 + Z ·2 + Z ·3

=47 + 59 + 56

80 + 120 + 100= 0.54.

The expected frequencies are

E1j = pZ ·j and E2j = Z ·j − E1j , j = 1, 2, 3.

Ei j43.2 64.8 54.036.8 55.2 46.0

Total 80 120 100

Zi j − Ei j3.8 -5.8 2.0-3.8 5.8 -2.0

Total 0 0 0

Page 389: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

(Zi j − Ei j )2/Ei j

0.334 0.519 0.0740.392 0.609 0.087

Total 0.726 1.128 0.161 2.015

Page 390: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Under H0, T =∑i ,j (Zi j − Ei j )

2/Ei j ∼ χ2p−d

= χ22 , where

• p = no. of free counts Zi j = 3

• d = no. of estimated free parameters = 1.

Since T = 2.015 < χ20.10,2 = 4.605, we cannot reject H0, i.e. there is nosignicant dierence among the three populations in terms of choosingbetween F&SL and J&PA as the more important factor towards their generalwell-being.

Remark. Similar to the independence tests, it holds that Zi · − Ei · = 0 andZ ·j − E·j = 0.

Page 391: Chapter 1. Introduction to R, and Descriptive Data ... · PDF fileIntroduction to R, and Descriptive Data ... The easiest form of data to import into R is a simple text ... hist(fSalary,

Tests for r × c tables – a general description

In general, we may test for dierent types of the structure in a r × c table,for example, the symmetry (pi j = pj i ).

The key is to compute expected frequencies Ei j under null hypothesis H0.

Under H0, the test statistic

T =r∑i=1

c∑j=1

(Zi j − Ei j )2

Ei j∼ χ2p−d ,

• p = no. of ‘free’ counts among Zi j ,

• d = no. of the estimated ‘free’ parameters.

We reject H0 if T > χ2α , p−d .

Remark. The R-function chisq.test performs both the goodness-of-ttest and the contingency table test.