Sample Geometry and Random Sampling
-
Upload
rebekah-joyner -
Category
Documents
-
view
58 -
download
0
description
Transcript of Sample Geometry and Random Sampling
11
Sample Geometry and Sample Geometry and Random SamplingRandom Sampling
Shyh-Kang JengShyh-Kang JengDepartment of Electrical Engineering/Department of Electrical Engineering/Graduate Institute of Communication/Graduate Institute of Communication/
Graduate Institute of Networking and MultiGraduate Institute of Networking and Multimediamedia
22
Array of DataArray of Data
npnknn
jpjkjj
pk
pk
xxxx
xxxx
xxxx
xxxx
21
21
222221
111211
X
*a sample of size n from a p-variate population
33
Row-Vector ViewRow-Vector View
'
'
'2
'1
21
21
222221
111211
n
j
npnknn
jpjkjj
pk
pk
xxxx
xxxx
xxxx
xxxx
x
x
x
x
X
44
Example 3.1Example 3.1
53
31
14
X
55
Column-Vector ViewColumn-Vector View
]|||[ 21
21
21
222221
111211
p
npnknn
jpjkjj
pk
pk
xxxx
xxxx
xxxx
xxxx
yyyX
66
Example 3.2Example 3.2
53
31
14
X
77
Geometrical Interpretation of Geometrical Interpretation of Sample Mean and DeviationSample Mean and Deviation
ini
ii
ii
iii
ii
iniii
i
xx
xx
xx
xn
x
xn
xxx
nn
2
1
'
21'
1
1)
1(
]1,,1,1['
1yd
1y
1111y
1
88
Decomposition of Column VectorsDecomposition of Column Vectors
99
Example 3.3Example 3.3
]'2,0,2[]'3,3,3[]'5,3,1[
]'1,3,2[]'2,2,2[]'3,1,4[
]'3,3,3[]'1,1,1[3
]'2,2,2[]'1,1,1[2
3,2,
53
31
14
222
111
2
1
21
1yd
1yd
1
1
X
x
x
x
x
xx
1010
Lengths and Angles of Lengths and Angles of Deviation VectorsDeviation Vectors
ik
kkii
ikik
ik
n
jiji
n
jiji
ik
ikkjk
n
jijiki
n
jiiijiii
rss
s
xxxx
LL
nsxxxx
nsxxL
ki
i
cos
cos
cos
1
2
1
2
1
'
1
2'2
dd
d
dd
dd
1111
Example 3.4Example 3.4
1189.0
189.01,
3/83/2
3/23/14
189.0
32
38,314
]'2,0,2[,]'1,3,2[
53
31
14
2211
1212
122'1
222'2111
'1
21
RS
dd
dddd
dd
X
n
ss
sr
s
ss
1212
Random MatrixRandom Matrix
'
'2
'1
21
22221
11211
nnpnn
p
p
XXX
XXX
XXX
X
X
X
X
1313
Random SampleRandom Sample
Row vectors Row vectors XX11’, ’, XX22’, …, ’, …, XXnn’ repre’ represent independent observations from sent independent observations from a common joint distribution with dea common joint distribution with density function nsity function ff((xx)=)=ff((xx11, , xx22, …, , …, xxpp))
Mathematically, the joint density funMathematically, the joint density function of ction of XX11’, ’, XX22’, …, ’, …, XXnn’ is’ is
),,,()(
)()()(
21
21
jpjjj
n
xxxff
fff
x
xxx
1414
Random SampleRandom SampleMeasurements of a single trial, such as Measurements of a single trial, such as XXjj’=’=[[XXj1j1,X,Xj2j2,…,X,…,Xjpjp]], will usually be correlate, will usually be correlateddThe measurements from different trials The measurements from different trials must be independentmust be independentThe independence of measurements froThe independence of measurements from trial to trial may not hold when the vam trial to trial may not hold when the variables are likely to drift over timeriables are likely to drift over time
1515
Geometric Interpretation of Geometric Interpretation of Randomness Randomness
Column vector Column vector YYkk’’=[=[XX1k1k,,XX2k2k,…,,…,XXnknk]] regarded as regarded as a point in n dimensionsa point in n dimensionsThe location is determined by the joint probabThe location is determined by the joint probability distribution ility distribution ff((yykk)) = = ff((xx1k1k, , xx2k2k,…,,…,xxnknk))
For a random sample, For a random sample, ff((yykk)=)=ffkk((xx1k1k))fkfk((xx2k2k)…)…ffkk((xxnknk))
Each coordinate Each coordinate xxjkjk contributes equally to the l contributes equally to the location through the same marginal distributioocation through the same marginal distribution n ffkk((xxjkjk))
1616
Result 3.1Result 3.1
) of estimatepoint unbiasedan as 1
(
)1
(
1)(,
1)Cov(
) of estimatepoint unbiasedan as (,)(
then,matrix
covariance and r mean vecto hason that distributi
joint a from sample random a are ,,, 21
ΣSS
ΣS
ΣSΣX
μXμX
Σ
μ
XXX
n
n
n
n
n
nn
nE
n
nE
n
E
1717
Proof of Result 3.1Proof of Result 3.1
'1)')(()Cov(
'1
'11
)')((
)(1
)(1
)(1
)111
()(
1 12
1 12
11
21
21
μXμXμXμXX
μXμX
μXμXμXμX
μXXX
XXXX
n
j
n
j
n
j
n
j
nn
jj
n
n
En
E
n
nn
En
En
En
nnnEE
1818
Proof of Result 3.1Proof of Result 3.1
')'()(
''
'
)'1
()(
11)')((
1)(Cov
ce.independen of becausefor 0))((
'
1
'
11
'
1
1
21
2
μμΣμμXμμXXX
XXXXXXXXXX
XXXX
XXXXS
ΣΣμXμXX
μXμX
jjjj
n
jjj
n
jj
n
jjj
j
n
jj
j
n
jjn
n
jjj
j
EE
n
nEE
nn
nE
n
jE
1919
Proof of Result 3.1Proof of Result 3.1
n
n
nnn
n
XXnXXEn
SE
nXXEXXE
n
jjjn
1'
1'
1
)'(1
)(
'1
)'()'(
1
'
2020
Some Other EstimatorsSome Other Estimators
large moderately is size if ignored be
usuallycan )( and )( Biases
)(,)(
)1
1()
1(
1 ofentry ),( theofn expectatio The
1
n
rEsE
rEsE
XXXXn
Esn
nE
n
nthki
ikikiiii
ikikiiii
ikkjk
n
jijiik
n
S
2121
Generalized Sample VarianceGeneralized Sample Variance
487.26
67.12343.68
43.6804.252
in US firms
publishinglargest 16for employeeper
profits and Employees :3.7 Example
Variance Sample dGeneralize
S
S
S
2222
Geometric Interpretation for Geometric Interpretation for Bivariate CaseBivariate Case
22
2122211
22122211
12221111
2122211
221
22211
1
211
2
222111
)1/()(
)1(
)1()1(,cos
)1(,)1(
cos1sin is
,vectorsdeviation by two generated Area
21
2121
narea
rsssrss
rsss
rssnarear
snxxLsnxxL
LLLLarea
xx
ik
n
jjd
n
jjd
dddd
S
1yd1yd
2323
Generalized Sample Variance for Generalized Sample Variance for Multivariate CasesMultivariate Cases
2)()1( volumen pS
2424
Interpretation in Interpretation in pp-space Scatter Plot-space Scatter Plot
Equation for points within a Equation for points within a constant distance constant distance cc from the from the sample meansample mean
variancedgeneralize large
a toscorrespond volumelargeA
)()'( of Volume
)()'(
2/1
21
21
pp ck
c
c
S
xxSxx
xxSxx
2525
Example 3.8: Scatter PlotsExample 3.8: Scatter Plots
2626
Example 3.8: Sample Mean and Example 3.8: Sample Mean and Variance-Covariance MatricesVariance-Covariance Matrices
cases threeallfor 9],1,2['
8.0,54
45
0,30
03
8.0,54
45
Sx
S
S
S
r
r
r
2727
Example 3.8: Example 3.8: Eigenvalues and EigenvectorsEigenvalues and Eigenvectors
]2/1,2/1[],2/1,2/1[
1,9:54
45
]1,0[],0,1[
3,3:30
03
]2/1,2/1[],2/1,2/1[
1,9:54
45
'2
'1
21
'2
'1
21
'2
'1
21
ee
ee
ee
2828
Example 3.8: Example 3.8: Mean-Centered EllipseMean-Centered Ellipse
nsobservatio 95%ely approximatcover to99.5 Choose
),(
, rseigenvecto ;1
,1
seigenvalue :
)('
)('
2
22
11
'2
'1
2
1
1
2121
1
2
22
1
211
21
c
xx
xx
y
y
yy
c
e
e
eSeeSe
eeS
xxSxx
xxSxx
2929
Example 3.8: Example 3.8: Semi-major and Semi-minor Axes Semi-major and Semi-minor Axes
99.5,99.53,54
45
99.53,99.53,30
03
99.5,99.53,54
45
ba
ba
ba
S
S
S
3030
Example 3.8:Example 3.8:Scatter Plots with Major AxesScatter Plots with Major Axes
3131
Result 3.2Result 3.2
The generalized variance is zero The generalized variance is zero when the columns of the following when the columns of the following matrix are linear dependentmatrix are linear dependent
'
'
'
'
2211
2222121
1212111
'
'2
'1
x1X
xx
xx
xx
pppnn
pp
pp
nxxxxxx
xxxxxx
xxxxxx
3232
Proof of Result 3.2Proof of Result 3.2
''
'
'
'''
)'()''(
)'()''()1(
0,)'(
)'(col)'(col0
1
'
'2
'1
''2
'1
11
xxxx
xx
xx
xx
xxxxxx
x1Xx1X
x1Xx1XS
aax1X
x1Xx1X
j
n
jj
n
p
pp
n
aa
3333
Proof of Result 3.2Proof of Result 3.2
0)(0
0
)'()''()1(0
0such that ,0 if
00)(col)(col
0)'()''()1(
2)'(
11
a'x1X
)a'x1(X)''x1(Xa'
ax1Xx1XSa
SaaS
SSS
ax1Xx1XS
ax1XL
n
aa
an
pp
3434
Example 3.9Example 3.9
0
12/10
2/112/3
02/33
:check
02
]1,1,0[],1,0,1[],1,1,2[
111
101
012
',5,1,3',
404
614
521
213
'3
'2
'1
SS
Sddd
ddd
x1XxX
3535
Example 3.9Example 3.9
3636
Examples Cause Zero Examples Cause Zero Generalized VarianceGeneralized Variance
Example 1Example 1– Data are test scoresData are test scores– Included variables that are sum of Included variables that are sum of
othersothers– e.g., algebra score and geometry score e.g., algebra score and geometry score
were combined to total math scorewere combined to total math score– e.g., class midterm and final exam e.g., class midterm and final exam
scores summed to give total pointsscores summed to give total points
Example 2Example 2– Total weight of chemicals was included Total weight of chemicals was included
along with that of each componentalong with that of each component
3737
Example 3.10Example 3.10
0)()(1)(1
]1,1,1[
S of seigenvalue zero toingcorrespondr Eigenvecto
00
0.55.25.2
5.25.20
5.205.2
,
110
022
101
321
312
',
14113
1385
12102
16124
1091
332211
xxxxxx
'
jjj
a
SaS
Sx1XX
3838
Result 3.3Result 3.3
If the sample size is less than or If the sample size is less than or equal to the number of variables equal to the number of variables ( ) then |( ) then |SS| = 0 for all | = 0 for all samplessamples
pn
3939
Proof of Result 3.3Proof of Result 3.3
)''(row)()''(row)(
)'(col)''()(col)1(
),'()''()1( Since
of because ,1 toequalor than less i.e.,
,1 toequalor than less is ofrank theThus
because
vectorzero the tosum of vectorsrown The
11
1 1
x1Xx1X
x1Xx1XS
x1Xx1XS
x1X
x1X
nknkkk
kk
n
j
n
jkjk
xxxx
n
n
pnp
n'-
xx
'-
4040
Proof of Result 3.3Proof of Result 3.3
0 || matrix, by a is Since
.1 toequalor than less
i.e., ,1 toequalor than less thusis ofrank The
vectorsrow of transposeoft independenlinear
1most at ofn combinatiolinear a is )(col
vectorsrow remaining
theofn combinatiolinear a is )''(row1
SS
S
x1X
pp
p-
n-
nSk
4141
Result 3.4Result 3.4Let the Let the pp by by 11 vectors vectors xx11, , xx22, …, , …, xxnn, where , where xxjj’ is t’ is the he jjth row of the data matrix th row of the data matrix XX, be realizations , be realizations of the independent random vectors of the independent random vectors XX11, , XX22, …, , …, XXnn. . If the linear combination If the linear combination aa’’XXjj has positive var has positive variance for each non-zero constant vector iance for each non-zero constant vector aa, the, then, provided that n, provided that pp < < nn, , SS has full rank with prob has full rank with probability 1 and |ability 1 and |SS| > 0| > 0If, with probability 1, If, with probability 1, aa’’XXjj is a constant is a constant cc for al for all l jj, then |, then |SS| = 0| = 0
4242
Proof of Part 2 of Result 3.4Proof of Part 2 of Result 3.4
0||0
''
''
)'(
'/
isit for mean sample The . allfor '
1,y probabilit with '
1
1
11
111
1
12211
2211
S
xaxa
xaxa
x1X
xa
xa
Xa
cc
cc
xx
xx
a
xx
xx
aa
cnxaxaxa
jc
cXaXaXa
n
pnp
pp
p
n
n
jjppjj
j
jppjjj
4343
Generalized Sample Variance of Generalized Sample Variance of Standardized VariablesStandardized Variables
1-or 1nearly are moreor onewhen
small is and zero,nearly are all when large is
,)()1(
'1
variablesedstandardiz the
of variancesample dGeneralize
22112
21
ik
ik
ppp
ii
ini
ii
ii
ii
ii
ii
ii
r
r
sssvolumen
s
xx
s
xx
s
xx
s
xy
| |
R
RSR
R
4444
Volume Generated by Deviation Volume Generated by Deviation Vectors of Standardized VariablesVectors of Standardized Variables
4545
Example 3.11Example 3.11
RSRS
RS
332211
332211
,18
7,14
1,9,4
13/22/1
3/212/1
2/12/11
,
121
293
134
sss
sss
4646
Total Sample VarianceTotal Sample Variance
5 variancesample Total
12/10
2/112/3
02/33
:3.9 Example
71.375 variancesample Total
67.12343.68
43.6804.252 :3.7 Example
vectorsresidual theofn orientatio thetoattention no Pays
Variance Sample Total 2211
S
S
sss pp
4747
Sample Mean as Matrix OperationSample Mean as Matrix Operation
1X
1y
1y
1y
x
'1
1
1
1
1
/
/
/
21
22212
12111
'
'2
'1
2
1
n
xxx
xxx
xxx
n
n
n
n
x
x
x
nppp
n
n
pp
4848
Covariance as Matrix OperationCovariance as Matrix Operation
X11X
X11x1
'
'1
'
2211
2222121
1212111
21
21
21
pnpnn
pp
pp
p
p
p
xxxxxx
xxxxxx
xxxxxx
n
xxx
xxx
xxx
4949
Covariance as Matrix OperationCovariance as Matrix Operation
X11XX11X
S
'1
''1
)1(
2211
2222121
1212111
21
22222212
11121111
nn
xxxxxx
xxxxxx
xxxxxx
xxxxxx
xxxxxx
xxxxxx
n
pnpnn
pp
pp
pnppppp
n
n
5050
Covariance as Matrix OperationCovariance as Matrix Operation
X11IXS
1111I
11111111I11I11I
X11I11IX
X11XX11X
)'1
('1
1
)'('1
''1
'1
'1
)'1
()''1
(
)'1
()''1
('
'1
''1
2
nn
nn
nnnnn
nn
nn
5151
Sample Standard Deviation MatrixSample Standard Deviation Matrix
2/12/1
2/12/1
22
2
11
1
22
2
2222
22
1122
21
11
1
2211
12
1111
11
22
11
2/122
11
2/1
/100
0/10
00/1
,
00
00
00
RDDS
SDDR
DD
pppp
pp
pp
p
pp
p
pp
p
pp
p
pppp
ss
s
ss
s
ss
s
ss
s
ss
s
ss
s
ss
s
ss
s
ss
s
s
s
s
s
s
s
5252
Result 3.5Result 3.5
ScbXcXb
SbbXb
xbXb
Xc
Xb
'' and ' of covariance Sample
'' of varianceSample
'' ofmean Sample
'
'
2211
2211
pp
pp
XcXcXc
XbXbXb
5353
Proof of Result 3.5Proof of Result 3.5
Sbbbxxxxb
xbxb
bxxxxbxbxb
xbxbxbxb
xb
')')(('1
1
''1
1 varianceSample
)')((')''(
''''
mean Sample
'
1
1
2
2
21
2211
n
jjj
n
jj
jjj
n
jppjjj
n
n-
n
xbxbxb
5454
Proof of Result 3.5Proof of Result 3.5
Scbcxxxxb
xcxcxbxb
')')(('1
1
''''1
1 covariance Sample
1
1
n
jjj
n
jjj
n
n-
5555
Result 3.6Result 3.6
' matrix covariance Sample
ofmean Sample
2
1
21
22221
11211
ASA
xAAX
AX
pqpqq
p
p
X
X
X
aaa
aaa
aaa