Empirical Modeling
Dongsup Kim
Department of Biosystems, KAIST, Fall 2004
Empirical modeling
Moore's law: Gordon Moore made his famous observation in 1965, just four years after the first planar integrated circuit was invented. The press called it "Moore's Law" and the name has stuck. In his original paper, Moore observed an exponential growth in the number of transistors per integrated circuit and predicted that this trend would continue.
From http://www.intel.com/research/silicon/mooreslaw.htm
Covariance and correlation
Consider n pairs of measurements on variables x and y:
\((x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\)
A measure of linear association between the measurements of variables x and y is the "sample covariance":
\(s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})\)
– If \(s_{xy} > 0\): positively correlated
– If \(s_{xy} < 0\): negatively correlated
– If \(s_{xy} = 0\): uncorrelated
Sample linear correlation coefficient ("Pearson's product moment correlation coefficient"):
\(r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}\)
with \(-1 \le r_{xy} \le 1\).
Correlation

X (Years Experience)   Y (Salary in $1000s)
1                      20
16                     83
21                     90
11                     59
6                      43
3                      36
13                     72
9                      64
8                      57
3                      30

\(\bar{x} = 9.1,\quad \bar{y} = 55.4,\quad s_x = 6.315,\quad s_y = 22.979,\quad r_{xy} = 0.972\)
Strong relationship.
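The statistics in this example can be reproduced with a short script (a minimal sketch using NumPy; `ddof=1` selects the n−1 "sample" normalization used in the formulas above):

```python
import numpy as np

# Years of experience (X) and salary in $1000s (Y) from the example table
x = np.array([1, 16, 21, 11, 6, 3, 13, 9, 8, 3], dtype=float)
y = np.array([20, 83, 90, 59, 43, 36, 72, 64, 57, 30], dtype=float)

# Sample covariance: s_xy = sum((x_i - x̄)(y_i - ȳ)) / (n - 1)
s_xy = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

# Pearson correlation: r_xy = s_xy / (s_x * s_y)
r_xy = s_xy / (x.std(ddof=1) * y.std(ddof=1))

print(round(r_xy, 3))  # 0.972
```

This matches the slide's \(\bar{x} = 9.1\), \(\bar{y} = 55.4\), and \(r_{xy} = 0.972\).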
Covariance & Correlation matrix
Given n measurements on p variables, the sample covariance is
\(s_{ij} = \frac{1}{n-1}\sum_{k=1}^{n}(x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j),\quad i = 1, 2, \ldots, p,\ j = 1, 2, \ldots, p\)
and the covariance matrix is
\[\mathbf{S} = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1p} \\ s_{21} & s_{22} & \cdots & s_{2p} \\ \vdots & & & \vdots \\ s_{p1} & s_{p2} & \cdots & s_{pp} \end{pmatrix}\]
The sample correlation coefficient for the ith and jth variables is
\(r_{ij} = \frac{s_{ij}}{\sqrt{s_{ii}}\,\sqrt{s_{jj}}} = \frac{\sum_{k=1}^{n}(x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j)}{\sqrt{\sum_{k=1}^{n}(x_{ki} - \bar{x}_i)^2}\,\sqrt{\sum_{k=1}^{n}(x_{kj} - \bar{x}_j)^2}}\)
and the correlation matrix is
\[\mathbf{R} = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1p} \\ r_{21} & 1 & \cdots & r_{2p} \\ \vdots & & & \vdots \\ r_{p1} & r_{p2} & \cdots & 1 \end{pmatrix}\]
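In NumPy these matrices come directly from `np.cov` and `np.corrcoef` (a sketch; `rowvar=False` treats columns as variables, and `np.cov` uses the n−1 normalization by default):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))      # n = 100 measurements on p = 3 variables

S = np.cov(X, rowvar=False)        # p x p sample covariance matrix
R = np.corrcoef(X, rowvar=False)   # p x p correlation matrix

# R_ij = S_ij / sqrt(S_ii * S_jj), with ones on the diagonal
d = np.sqrt(np.diag(S))
assert np.allclose(R, S / np.outer(d, d))
assert np.allclose(np.diag(R), 1.0)
```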
In two dimensions
Fitting a line to data
When the correlation coefficient is large, it indicates a dependence of one variable on the other. The simplest relationship is the straight line: \(y = \beta_0 + \beta_1 x\).
Criterion for a best-fit line: least squares. The resulting equation is called the "regression equation", and its graph is called the "regression line".
The sum of squares of the error SS:
\(SS = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \sum_{i=1}^{n}\left[y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)\right]^2\)
Least-squares equations: setting \(\partial SS/\partial\hat{\beta}_0 = 0\) and \(\partial SS/\partial\hat{\beta}_1 = 0\) gives
\(\hat{\beta}_1 = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} = \frac{s_{xy}}{s_x^2} = \frac{SS_{xy}}{SS_x}\)
\(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}\)
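Applied to the experience/salary data from the correlation example, the closed-form coefficients work out as follows (a sketch; the numerical results are my own computation, not taken from the slides):

```python
import numpy as np

x = np.array([1, 16, 21, 11, 6, 3, 13, 9, 8, 3], dtype=float)
y = np.array([20, 83, 90, 59, 43, 36, 72, 64, 57, 30], dtype=float)
n = len(x)

# slope: β̂1 = (n Σx_iy_i − Σx_i Σy_i) / (n Σx_i² − (Σx_i)²)
b1 = (n * np.sum(x * y) - x.sum() * y.sum()) / (n * np.sum(x**2) - x.sum()**2)
# intercept: β̂0 = ȳ − β̂1 x̄
b0 = y.mean() - b1 * x.mean()

print(b1, b0)  # roughly 3.54 and 23.21
```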
A measure of fit
Suppose we have data points (x_i, y_i) and modeled (or predicted) points (x_i, ŷ_i) from the model ŷ = f(x).
Data {y_i} have two types of variation: (i) variation explained by the model and (ii) variation not explained by the model.
Residual sum of squares (variation not explained by the model):
\(SS_{Res} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2\)
Regression sum of squares (variation explained by the model):
\(SS_{Reg} = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2\)
Total variation in y = variation explained by the model + unexplained variation (error).
The coefficient of determination:
\(R^2 = \frac{SS_{Reg}}{SS_{Reg} + SS_{Res}}\)
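For the straight-line fit of the salary example, \(R^2\) can be computed directly from these two sums of squares (a sketch; `np.polyfit` supplies the least-squares coefficients):

```python
import numpy as np

x = np.array([1, 16, 21, 11, 6, 3, 13, 9, 8, 3], dtype=float)
y = np.array([20, 83, 90, 59, 43, 36, 72, 64, 57, 30], dtype=float)

b1, b0 = np.polyfit(x, y, deg=1)          # slope, intercept of the LS line
y_hat = b0 + b1 * x

ss_res = np.sum((y - y_hat) ** 2)         # variation not explained
ss_reg = np.sum((y_hat - y.mean()) ** 2)  # variation explained
r2 = ss_reg / (ss_reg + ss_res)
```

For simple linear regression, \(R^2\) equals the squared correlation coefficient, so here it is roughly \(0.972^2 \approx 0.945\).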
Principal Component Analysis (PCA)
PCA selects a new set of axes for the data by moving and rotating the coordinate system in such a way that the dependency between the variables is removed in the new, transformed coordinate system.
The first principal axis points in the direction of the maximum variation in the data.
The second principal axis is orthogonal to the first one and is in the direction of the maximum variation in the remaining allowable directions, and so on.
It can be used to:
– Reduce the number of dimensions in data.
– Find patterns in high-dimensional data.
– Visualize data of high dimensionality.
PCA, II
Assume X is an n × p matrix and is "centered" (zero mean).
Let a be the p × 1 column vector of projection weights (unknown at this point) that results in the largest variance when the data X are projected along a.
We can express the projected values of all data vectors in X onto a as Xa.
Now define the variance along a as
\(\sigma_a^2 = \frac{1}{n}(\mathbf{Xa})^T(\mathbf{Xa}) = \mathbf{a}^T\mathbf{S}\mathbf{a}\)
We wish to maximize the variance under the constraint that \(\mathbf{a}^T\mathbf{a} = 1\): optimization with constraints → method of Lagrange multipliers:
\(u = \mathbf{a}^T\mathbf{S}\mathbf{a} - \lambda(\mathbf{a}^T\mathbf{a} - 1)\)
\(\frac{\partial u}{\partial \mathbf{a}} = 2\mathbf{S}\mathbf{a} - 2\lambda\mathbf{a} = 0 \;\Rightarrow\; \mathbf{S}\mathbf{a} = \lambda\mathbf{a}\)
Example, 2D
Covariance matrix:
\[\mathbf{S} = \begin{pmatrix} 0.9701 & 0.9596 \\ 0.9596 & 1.2049 \end{pmatrix}\]
Decomposition:
\(\lambda_1 = 2.0592,\ \mathbf{v}_1 = \begin{pmatrix} 0.66 \\ 0.74 \end{pmatrix};\qquad \lambda_2 = 0.12,\ \mathbf{v}_2 = \begin{pmatrix} 0.74 \\ -0.66 \end{pmatrix}\)
PCA:
\(z_1 = 0.66x + 0.74y,\qquad z_2 = 0.74x - 0.66y\)
From CMU 15-385 Computer Vision by Tai Sing Lee
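The decomposition above can be reproduced with NumPy's symmetric eigensolver (a sketch; `np.linalg.eigh` returns eigenvalues in ascending order, so they are reversed here, and eigenvector signs are arbitrary):

```python
import numpy as np

S = np.array([[0.9701, 0.9596],
              [0.9596, 1.2049]])

vals, vecs = np.linalg.eigh(S)          # ascending eigenvalues for symmetric S
vals, vecs = vals[::-1], vecs[:, ::-1]  # reorder: largest eigenvalue first

print(np.round(vals, 2))                # approximately [2.05, 0.12]
```

The small mismatch with the slide's λ₁ = 2.0592 presumably comes from rounding in the printed covariance entries.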
PCA, III
If \(\mathbf{S} = \{s_{ik}\}\) is the p×p sample covariance matrix with eigenvalue–eigenvector pairs \((\lambda_1, \mathbf{v}_1), (\lambda_2, \mathbf{v}_2), \ldots, (\lambda_p, \mathbf{v}_p)\), the ith principal component is given by
\(y_i = \mathbf{v}_i^T\mathbf{x} = \sum_{j=1}^{p} v_{ij} x_j,\quad i = 1, 2, \ldots, p\)
where \(\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0\) and x is the p-dimensional vector formed by the random variables \(x_1, x_2, \ldots, x_p\).
Also:
\(\mathrm{var}[y_i] = \lambda_i,\quad i = 1, 2, \ldots, p\)
\(\mathrm{cov}[y_i, y_j] = 0,\quad i \ne j\)
Total variance \(= \sum_{i=1}^{p} s_{ii} = \lambda_1 + \lambda_2 + \cdots + \lambda_p\)
Proportion of total variance explained by the ith principal component \(= \dfrac{\lambda_i}{\lambda_1 + \lambda_2 + \cdots + \lambda_p}\)
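These properties are easy to verify numerically: projecting centered data onto the eigenvectors of S gives components whose covariance matrix is diagonal, with the eigenvalues on the diagonal (a sketch on synthetic correlated data):

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic correlated data: n = 500 observations of p = 3 variables
X = rng.normal(size=(500, 3)) @ np.array([[2.0, 0.5, 0.0],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.5]])
X = X - X.mean(axis=0)                  # center the data

S = np.cov(X, rowvar=False)
vals, vecs = np.linalg.eigh(S)
vals, vecs = vals[::-1], vecs[:, ::-1]  # sort largest eigenvalue first

Y = X @ vecs                            # principal components y_i = v_i^T x
C = np.cov(Y, rowvar=False)

# var[y_i] = λ_i, cov[y_i, y_j] = 0, and total variance is preserved
assert np.allclose(C, np.diag(vals))
assert np.isclose(np.trace(S), vals.sum())

print(np.round(vals / vals.sum(), 3))   # proportion of variance per component
```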
Applications
– Dimensionality reduction
– Image compression
– Pattern recognition
– Gene expression data analysis
– Molecular dynamics simulation
– …
Dimensional reduction
We can throw v3 away and keep w = [v1 v2], and can still represent the information almost equally well.
v1 and v2 also provide good dimensions in which different objects/textures form nice clusters in this 2D space.
From CMU 15-385 Computer Vision by Tai Sing Lee
Image compression, I
A set of N images, I₁, I₂, …, I_N, each of which has n pixels.
– Dataset of N dimensions and n observations
– Corresponding pixels form vectors of intensities
Expand each of them as a series,
\(I_i = \sum_{j=1}^{N} c_{ij}\mathbf{v}_j\)
where the optimal set of basis vectors is chosen to minimize the reconstruction error
\(error = \sum_{i=1}^{N}\left(I_i - \sum_{j=1}^{k} c_{ij}\mathbf{v}_j\right)^2,\quad \text{where } k < N\)
The principal components of the set form the optimal basis.
– PCA produces N eigenvectors and eigenvalues.
– Compress: choose a limited number (k < N) of components.
– Information loss when recreating the original data.
Image compression, II
Given a large set of 8×8 image patches, convert each image patch into a vector by stacking the columns together into one column vector.
Compute the covariance matrix S.
Transform into a set of new bases by PCA. Since the eigenvalues of S drop rapidly, we can represent the image more efficiently in this new coordinate system with the eigenvectors (principal components) v₁, …, v_k, where k << 64, as bases (k ≈ 10).
Then I = a₁v₁ + a₂v₂ + … + a_kv_k.
The idea is that now you only store 10 code words, each an 8×8 image basis; then you can transmit the image with only 10 numbers instead of 64.
From CMU 15-385 Computer Vision by Tai Sing Lee
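The patch-compression pipeline can be sketched end to end (an illustrative sketch on random synthetic patches, not the CMU data; the patch size and k follow the slide):

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic stand-in for a large set of 8x8 patches, as 64-element vectors
patches = rng.normal(size=(1000, 64)) @ rng.normal(size=(64, 64)) * 0.1

mean = patches.mean(axis=0)
X = patches - mean                      # center the patch vectors

S = np.cov(X, rowvar=False)             # 64 x 64 covariance matrix
vals, vecs = np.linalg.eigh(S)
vecs = vecs[:, ::-1]                    # eigenvectors, largest eigenvalue first

k = 10
V = vecs[:, :k]                         # keep k principal components as bases
codes = X @ V                           # 10 numbers per patch instead of 64
reconstructed = codes @ V.T + mean      # I ≈ a1*v1 + ... + ak*vk (+ mean)

err_k = np.mean((patches - reconstructed) ** 2)
```

As k grows toward 64 the reconstruction error shrinks to zero; the slide's point is that a small k already captures most of the variance.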
Applications
Representation:
– N × N pixel image → X = (x₁ … x_{N²})
– x_i is an intensity value
PCA for pattern identification:
– Perform PCA on a matrix of M images.
– Given a new image, which original image is most similar?
– Traditionally: difference the original image and the new image.
– PCA: difference the PCA data and the new image.
– Advantage: PCA data reflects similarities and differences in the image data.
– Omitted dimensions → still good performance.
PCA for image compression:
– M images, each containing N² pixels
– Dataset of M dimensions and N² observations
– Corresponding pixels form vectors of intensities
– PCA produces M eigenvectors and eigenvalues
– Compress: choose a limited number of components
– Information loss when recreating the original data
Interpolation & Extrapolation
Numerical Recipes, Chapter 3
Consider n pairs of data of variables x and y,
\((x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\)
where we don't know an analytic expression for y = f(x). The task is to estimate f(x) for arbitrary x by drawing a smooth curve through the x_i's.
– Interpolation: if x is in between the largest and smallest of the x_i's.
– Extrapolation: if x is outside of that range (more dangerous; example: the stock market).
Methods:
– Polynomials, rational functions
– Trigonometric interpolation: Fourier methods
– Spline fit
Order: the number of points (minus one) used in an interpolation.
– Increasing the order does not necessarily increase the accuracy.
Polynomial interpolation, I
Straight-line interpolation:
– Given two points (x₁, y₁) and (x₂, y₂), use a straight line joining the two points to find all the missing values in between:
\(y = P_1(x) = y_1 + \frac{y_2 - y_1}{x_2 - x_1}(x - x_1)\)
Lagrange interpolation:
– First order:
\(y = P_1(x) = \frac{x - x_2}{x_1 - x_2}\,y_1 + \frac{x - x_1}{x_2 - x_1}\,y_2\)
– Second order:
\(y = P_2(x) = \frac{(x - x_2)(x - x_3)}{(x_1 - x_2)(x_1 - x_3)}\,y_1 + \frac{(x - x_1)(x - x_3)}{(x_2 - x_1)(x_2 - x_3)}\,y_2 + \frac{(x - x_1)(x - x_2)}{(x_3 - x_1)(x_3 - x_2)}\,y_3\)
Polynomial interpolation, I
In general, the interpolating polynomial of degree N−1 through the N points y₁ = f(x₁), y₂ = f(x₂), …, y_N = f(x_N) is
\(y = P_{N-1}(x) = \frac{(x - x_2)(x - x_3)\cdots(x - x_N)}{(x_1 - x_2)(x_1 - x_3)\cdots(x_1 - x_N)}\,y_1 + \frac{(x - x_1)(x - x_3)\cdots(x - x_N)}{(x_2 - x_1)(x_2 - x_3)\cdots(x_2 - x_N)}\,y_2 + \cdots + \frac{(x - x_1)(x - x_2)\cdots(x - x_{N-1})}{(x_N - x_1)(x_N - x_2)\cdots(x_N - x_{N-1})}\,y_N\)
Example, I

x     y
1.1   10.6
1.7   15.2
3.0   20.3

Coefficients \(C_i = \dfrac{y_i}{\prod_{j \ne i}(x_i - x_j)}\):
\(C_1 = \frac{10.6}{(1.1 - 1.7)(1.1 - 3.0)} = 9.2983\)
\(C_2 = \frac{15.2}{(1.7 - 1.1)(1.7 - 3.0)} = -19.4872\)
\(C_3 = \frac{20.3}{(3.0 - 1.1)(3.0 - 1.7)} = 8.2186\)
The values are evaluated:
P(x) = 9.2983(x − 1.7)(x − 3.0) − 19.4872(x − 1.1)(x − 3.0) + 8.2186(x − 1.1)(x − 1.7)
P(2.3) = 9.2983(2.3 − 1.7)(2.3 − 3.0) − 19.4872(2.3 − 1.1)(2.3 − 3.0) + 8.2186(2.3 − 1.1)(2.3 − 1.7) = 18.3813
[Plot: Lagrange interpolation curve through the three data points, x from 1 to 3, y from 0 to 25.]
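The computation in this example can be written as a small Lagrange interpolation routine (a sketch of the coefficient form used above):

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the Lagrange interpolating polynomial through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        # term_i = y_i * prod_{j != i} (x - x_j) / (x_i - x_j)
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

xs = [1.1, 1.7, 3.0]
ys = [10.6, 15.2, 20.3]
print(round(lagrange_eval(xs, ys, 2.3), 4))  # 18.3814
```

This agrees with the slide's P(2.3) = 18.3813 up to rounding of the intermediate coefficients.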
Example, II
What happens if we increase the number of data points? With five points, the coefficient for the second point, for example, becomes
\(C_2 = \frac{y_2}{(x_2 - x_1)(x_2 - x_3)(x_2 - x_4)(x_2 - x_5)}\)

x     y      C_i
1.1   10.6   28.1765
1.7   15.2   129.9145
3.0   20.3   6.4208
1.4   13.4   -116.319
2.2   18.7   -53.125

The coefficients now create a P₄(x) polynomial, which can be compared with the original P₂(x).
The problem: adding additional points can create "bulges" in the graph.
[Plot: Lagrange interpolation with five data points, x from 1 to 3, y from 0 to 25.]
Rational Function Interpolation
A rational function interpolant is a quotient of two polynomials,
\(P(x) = \frac{p_0 + p_1 x + \cdots + p_\mu x^\mu}{q_0 + q_1 x + \cdots + q_\nu x^\nu}\)
which can model functions with poles that polynomials handle poorly.

x     y
-1    0.0385
-0.5  0.1379
0     1
0.5   0.1379
1     0.0385

(These values are consistent with f(x) = 1/(1 + 25x²).)
Cubic Spline Interpolation
Cubic spline interpolation uses only the data points, maintains the desired smoothness of the function, and is piecewise continuous.
Given a function f defined on [a, b] and a set of nodes a=x0<x1<…<xn=b, a cubic spline interpolation S for f is
– S(x) is a cubic polynomial, denoted Sj(x), on the subinterval [xj, xj+1] for each j=0, 1, …, n-1;
– Sj(xj) = f(xj) for j = 0, 1, …, n;
– Sj+1(xj+1) = Sj(xj+1) for j = 0, 1, …, n-2;
– S’j+1(xj+1) = S’j(xj+1) for j = 0, 1, …, n-2;
– S’’j+1(xj+1) = S’’j(xj+1) for j = 0, 1, …, n-2;
– Boundary conditions: S’’(a)= S’’(b)= 0
\(S_j(x) = a_j + b_j(x - x_j) + c_j(x - x_j)^2 + d_j(x - x_j)^3\)
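The conditions above pin down the coefficients via a tridiagonal linear system. A minimal natural-spline sketch (assuming the S″(a) = S″(b) = 0 boundary conditions and the standard textbook tridiagonal construction):

```python
import numpy as np

def natural_cubic_spline(x, y):
    """Natural cubic spline coefficients (S''(a) = S''(b) = 0).
    Returns per-interval (a_j, b_j, c_j, d_j) for
    S_j(t) = a_j + b_j*(t-x_j) + c_j*(t-x_j)**2 + d_j*(t-x_j)**3."""
    x, a = np.asarray(x, float), np.asarray(y, float)
    n = len(x) - 1
    h = np.diff(x)
    # Tridiagonal system for the c_j
    A = np.zeros((n + 1, n + 1))
    rhs = np.zeros(n + 1)
    A[0, 0] = A[n, n] = 1.0                 # natural boundary conditions
    for j in range(1, n):
        A[j, j - 1], A[j, j], A[j, j + 1] = h[j - 1], 2 * (h[j - 1] + h[j]), h[j]
        rhs[j] = 3 * (a[j + 1] - a[j]) / h[j] - 3 * (a[j] - a[j - 1]) / h[j - 1]
    c = np.linalg.solve(A, rhs)
    b = (a[1:] - a[:-1]) / h - h * (2 * c[:-1] + c[1:]) / 3
    d = (c[1:] - c[:-1]) / (3 * h)
    return a[:-1], b, c[:-1], d

def spline_eval(x, coeffs, t):
    """Evaluate the piecewise cubic at t."""
    a, b, c, d = coeffs
    j = min(max(np.searchsorted(x, t, side="right") - 1, 0), len(a) - 1)
    dt = t - x[j]
    return a[j] + b[j] * dt + c[j] * dt**2 + d[j] * dt**3

xk = np.array([0.0, 1.0, 2.0, 3.0])
yk = np.array([1.0, 2.0, 0.0, 1.0])
coeffs = natural_cubic_spline(xk, yk)
# The spline reproduces every node value exactly
assert all(abs(spline_eval(xk, coeffs, t) - v) < 1e-9 for t, v in zip(xk, yk))
```

An equivalent result comes from `scipy.interpolate.CubicSpline(xk, yk, bc_type='natural')` if SciPy is available.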