Empirical Modeling Dongsup Kim Department of Biosystems, KAIST Fall, 2004.


Page 1:

Empirical Modeling

Dongsup Kim

Department of Biosystems, KAIST
Fall, 2004

Page 2:

Empirical modeling

Moore's law: Gordon Moore made his famous observation in 1965, just

four years after the first planar integrated circuit was discovered. The press called it "Moore's Law" and the name has stuck. In his original paper, Moore observed an exponential growth in the number of transistors per integrated circuit and predicted that this trend would continue.

 

From http://www.intel.com/research/silicon/mooreslaw.htm

Page 3:

Covariance and correlation

Consider n pairs of measurements on each of two variables, x and y:

$(x_1, y_1),\ (x_2, y_2),\ \ldots,\ (x_n, y_n)$

A measure of linear association between the measurements of variables x and y is the "sample covariance":

$s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$

– If s_xy > 0: positively correlated

– If s_xy < 0: negatively correlated

– If s_xy = 0: uncorrelated

Sample linear correlation coefficient ("Pearson's product moment correlation coefficient"):

$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\ \sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}, \qquad -1 \le r_{xy} \le 1$
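As a quick illustration (not part of the original slides), here is a minimal Python sketch of these two formulas; the function names are made up for this example:

```python
import math

def sample_covariance(x, y):
    """s_xy = sum((x_i - xbar) * (y_i - ybar)) / (n - 1)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)

def pearson_r(x, y):
    """r_xy = s_xy / (s_x * s_y); always lies in [-1, 1]."""
    sxy = sample_covariance(x, y)
    sx = math.sqrt(sample_covariance(x, x))
    sy = math.sqrt(sample_covariance(y, y))
    return sxy / (sx * sy)
```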

Page 4:

Correlation

X (Years Experience)    Y (Salary in $1000s)
 1                       20
16                       83
21                       90
11                       59
 6                       43
 3                       36
13                       72
 9                       64
 8                       57
 3                       30

$\bar{X} = 9.1, \quad \bar{Y} = 55.4, \quad s_X = 6.315, \quad s_Y = 22.979, \quad r_{XY} = 0.972$

Strong relationship

Page 5:

Covariance & correlation matrix

Given n measurements on p variables, the sample covariance between the ith and jth variables is

$s_{ij} = \frac{1}{n-1}\sum_{k=1}^{n}(x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j), \qquad i = 1, 2, \ldots, p,\quad j = 1, 2, \ldots, p$

and the covariance matrix is

$\mathbf{S} = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1p} \\ s_{21} & s_{22} & \cdots & s_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ s_{p1} & s_{p2} & \cdots & s_{pp} \end{pmatrix}$

The sample correlation coefficient for the ith and jth variables is

$r_{ij} = \frac{s_{ij}}{\sqrt{s_{ii}}\sqrt{s_{jj}}} = \frac{\sum_{k=1}^{n}(x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j)}{\sqrt{\sum_{k=1}^{n}(x_{ik} - \bar{x}_i)^2}\ \sqrt{\sum_{k=1}^{n}(x_{jk} - \bar{x}_j)^2}}$

and the correlation matrix is

$\mathbf{R} = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1p} \\ r_{21} & 1 & \cdots & r_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ r_{p1} & r_{p2} & \cdots & 1 \end{pmatrix}$
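A short NumPy sketch (added here for illustration, assuming NumPy is available) showing that the definitions above agree with numpy's built-in np.cov and np.corrcoef when the variables are stored in columns:

```python
import numpy as np

# X: n x p data matrix (rows = observations, columns = variables)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

Xc = X - X.mean(axis=0)                  # center each column
S = Xc.T @ Xc / (X.shape[0] - 1)         # sample covariance matrix (p x p)
d = np.sqrt(np.diag(S))
R = S / np.outer(d, d)                   # correlation matrix

# np.cov / np.corrcoef expect variables in columns when rowvar=False
assert np.allclose(S, np.cov(X, rowvar=False))
assert np.allclose(R, np.corrcoef(X, rowvar=False))
```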

Page 6:

In two dimensions

Page 7:

Fitting a line to data

When the correlation coefficient is large, it indicates a dependence of one variable on the other. The simplest relationship is the straight line: $y = \beta_0 + \beta_1 x$

Criterion for a best-fit line: least squares. The resulting equation is called the "regression equation", and its graph is called the "regression line".

The sum of squares of the error:

$SS = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \sum_{i=1}^{n}\left[y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)\right]^2$

Least-squares equations: setting $\partial SS/\partial\hat{\beta}_0 = 0$ and $\partial SS/\partial\hat{\beta}_1 = 0$ gives

$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} = \frac{SS_{xy}}{SS_x} = \frac{s_{xy}}{s_x^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$
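A minimal Python sketch of these least-squares formulas (not from the slides), applied to the years-of-experience vs. salary data from the Correlation slide; the helper name fit_line is made up:

```python
import numpy as np

def fit_line(x, y):
    """Least-squares estimates (beta0_hat, beta1_hat) for y = b0 + b1*x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    b1 = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x**2) - n * x.mean()**2)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

# Years of experience vs. salary data from the Correlation slide
x = [1, 16, 21, 11, 6, 3, 13, 9, 8, 3]
y = [20, 83, 90, 59, 43, 36, 72, 64, 57, 30]
b0, b1 = fit_line(x, y)    # roughly b1 ~ 3.54, b0 ~ 23.2
```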

Page 8:

A measure of fit

Suppose we have data points (x_i, y_i) and modeled (or predicted) points (x_i, ŷ_i) from the model ŷ = f(x).

Data {y_i} have two types of variation: (i) variation explained by the model and (ii) variation not explained by the model.

Residual sum of squares: variation not explained by the model

Regression sum of squares: variation explained by the model

The coefficient of determination R2

$SS_{\mathrm{Res}} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$

$SS_{\mathrm{Reg}} = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2$

Total variation in y = variation explained by the model + unexplained variation (error)

$R^2 = \frac{SS_{\mathrm{Reg}}}{SS_{\mathrm{Reg}} + SS_{\mathrm{Res}}}$
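A small Python sketch of the R² definition above (illustrative only); for a least-squares straight line with an intercept, this value equals r_xy²:

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination R^2 = SS_reg / (SS_reg + SS_res)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    ss_res = np.sum((y - y_hat) ** 2)          # unexplained variation
    ss_reg = np.sum((y_hat - y.mean()) ** 2)   # variation explained by the model
    return ss_reg / (ss_reg + ss_res)
```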

Page 9:

Principal Component Analysis (PCA)

PCA selects a new set of axes for the data by moving and rotating the coordinate system in such a way that the dependency between the variables is removed in the transformed coordinate system.

The first principal axis points in the direction of maximum variation in the data.

The second principal axis is orthogonal to the first one and points in the direction of maximum variation among the remaining allowable directions, and so on.

It can be used to:
– Reduce the number of dimensions in data.
– Find patterns in high-dimensional data.
– Visualize data of high dimensionality.

Page 10:

PCA, II

Assume X is an n × p matrix and is "centered" (zero mean).

Let a be the p × 1 column vector of projection weights (unknown at this point) that result in the largest variance when the data X are projected along a.

We can express the projected values onto a of all data vectors in X as Xa.

Now define the variance along a as

$s_a^2 = \frac{1}{n-1}(Xa)^T(Xa) = a^T\left(\frac{1}{n-1}X^T X\right)a = a^T S a$

(since X is centered, $\frac{1}{n-1}X^T X$ is the sample covariance matrix S).

We wish to maximize the variance under the constraint $a^T a = 1$: optimization with constraints → the method of Lagrange multipliers,

$u = a^T S a - \lambda(a^T a - 1), \qquad \frac{\partial u}{\partial a} = 2Sa - 2\lambda a = 0 \;\Rightarrow\; Sa = \lambda a$
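The eigenvalue equation Sa = λa says that the direction of maximum variance is the leading eigenvector of S. A minimal NumPy sketch of PCA along these lines (illustrative; the random test matrix and the function name pca are made up, not from the slides):

```python
import numpy as np

def pca(X):
    """PCA via eigendecomposition of the sample covariance matrix of X (n x p)."""
    Xc = X - X.mean(axis=0)                  # center the data
    S = Xc.T @ Xc / (X.shape[0] - 1)         # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)     # solves S a = lambda a (S symmetric)
    order = np.argsort(eigvals)[::-1]        # sort by decreasing variance
    return eigvals[order], eigvecs[:, order]

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) @ np.array([[2.0, 0.3, 0.0],
                                          [0.3, 1.0, 0.0],
                                          [0.0, 0.0, 0.2]])
lam, V = pca(X)
scores = (X - X.mean(axis=0)) @ V            # projections of the data onto the axes
```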

Page 11:

Example, 2D

Covariance matrix:

$S = \begin{pmatrix} 0.9701 & 0.9596 \\ 0.9596 & 1.2049 \end{pmatrix}$

Decomposition (eigenvalue, eigenvector pairs):

$\lambda_1 = 2.0592,\; v_1 = \begin{pmatrix} 0.66 \\ 0.74 \end{pmatrix}, \qquad \lambda_2 = 0.12,\; v_2 = \begin{pmatrix} 0.74 \\ -0.66 \end{pmatrix}$

PCA:

$z_1 = 0.66x + 0.74y, \qquad z_2 = 0.74x - 0.66y$

From CMU 15-385 Computer Vision by Tai Sing Lee

Page 12:

PCA, III

If S = {s_ik} is the p × p sample covariance matrix with eigenvalue–eigenvector pairs (λ_1, v_1), (λ_2, v_2), …, (λ_p, v_p), the ith principal component is given by

$y_i = v_i^T x = \sum_{j=1}^{p} v_{ij} x_j, \qquad i = 1, 2, \ldots, p$

where λ_1 ≥ λ_2 ≥ … ≥ λ_p ≥ 0 and x is the p-dimensional vector formed by the random variables x_1, x_2, …, x_p.

Also

$\mathrm{var}[y_i] = \lambda_i, \quad i = 1, 2, \ldots, p \qquad\qquad \mathrm{cov}[y_i, y_j] = 0, \quad i \neq j$

Total variance $= \sum_{i=1}^{p} s_{ii} = \sum_{i=1}^{p} \lambda_i = \lambda_1 + \lambda_2 + \cdots + \lambda_p$

Proportion of total variance due to the kth principal component $= \dfrac{\lambda_k}{\lambda_1 + \lambda_2 + \cdots + \lambda_p}$
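These properties are easy to check numerically. A short NumPy sketch (illustrative, with made-up test data) verifying that var[y_i] = λ_i and that the eigenvalues sum to the total variance:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 4)) * np.array([3.0, 2.0, 1.0, 0.5])

Xc = X - X.mean(axis=0)
S = np.cov(Xc, rowvar=False)
lam, V = np.linalg.eigh(S)
lam, V = lam[::-1], V[:, ::-1]                       # decreasing eigenvalues

Y = Xc @ V                                            # principal components y_i
assert np.allclose(np.var(Y, axis=0, ddof=1), lam)    # var[y_i] = lambda_i
assert np.isclose(lam.sum(), np.trace(S))             # sum of eigenvalues = total variance
proportion = lam / lam.sum()                          # proportion per component
```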

Page 13:

Applications

– Dimensional reduction
– Image compression
– Pattern recognition
– Gene expression data analysis
– Molecular dynamics simulation
– …

Page 14:

Dimensional reduction

We can throw v3 away and keep w = [v1 v2], and still represent the information almost equally well.

v1 and v2 also provide good dimensions in which different objects/textures form nice clusters in this 2D space.

From CMU 15-385 Computer Vision by Tai Sing Lee

Page 15:

Image compression, I

A set of N images, I_1, I_2, …, I_N, each of which has n pixels.
– Dataset of N dimensions and n observations
– Corresponding pixels form vectors of intensities

Expand each of them as a series,

$I_i = \sum_{j=1}^{N} c_{ij} v_j$

where the optimal set of basis vectors is chosen to minimize the reconstruction error,

$\mathrm{error} = \frac{1}{N}\sum_{i=1}^{N}\left(I_i - \sum_{j=1}^{k} c_{ij} v_j\right)^2, \qquad k < N$

Principal components of the set form the optimal basis.
– PCA produces N eigenvectors and eigenvalues.
– Compress: choose a limited number (k < N) of components.
– Information loss when recreating the original data.
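A small NumPy sketch of this truncated expansion (illustrative only; the random "images", the use of the SVD to obtain the principal components, and the mean-centering step are assumptions, not taken from the slides):

```python
import numpy as np

rng = np.random.default_rng(3)
I = rng.normal(size=(50, 64))            # N=50 "images", each with n=64 pixels

mean = I.mean(axis=0)
Ic = I - mean
U, s, Vt = np.linalg.svd(Ic, full_matrices=False)   # rows of Vt: principal basis vectors v_j

k = 10
C = Ic @ Vt[:k].T                        # coefficients c_ij for the first k bases
I_hat = mean + C @ Vt[:k]                # truncated reconstruction with k < N components
error = np.mean(np.sum((I - I_hat) ** 2, axis=1))   # average squared reconstruction error
```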

Page 16:

Image compression, II

Given a large set of 8×8 image patches, convert each patch into a vector by stacking its columns into one column vector.

Compute the covariance matrix, then transform into a set of new bases by PCA. Since the eigenvalues of S drop rapidly, we can represent the image more efficiently in this new coordinate system with the eigenvectors (principal components) v_1, …, v_k, where k << 64, as bases (k ≈ 10).

Then I = a_1 v_1 + a_2 v_2 + … + a_k v_k.

The idea is that you only store 10 code words, each an 8×8 image basis; then you can transmit the image with only 10 numbers instead of 64. From CMU 15-385 Computer Vision by Tai Sing Lee

Page 17:

Applications

Representation
– N × N pixel image X = (x_1 ... x_{N²})
– x_i is an intensity value

PCA for pattern identification
– Perform PCA on the matrix of M images
– Given a new image: which original image is most similar?
– Traditionally: difference between the original images and the new image
– PCA: difference between the PCA data and the new image
– Advantage: PCA data reflect similarities and differences in the image data
– Omitted dimensions: still good performance

PCA for image compression
– M images, each containing N² pixels
– Dataset of M dimensions and N² observations
– Corresponding pixels form vectors of intensities
– PCA produces M eigenvectors and eigenvalues
– Compress: choose a limited number of components
– Information loss when recreating the original data

Page 18:

Interpolation & Extrapolation

(Numerical Recipes, Chapter 3)

Consider n pairs of data on variables x and y, $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$,

and we don’t know an analytic expression for y=f(x). The task is to estimate f(x) for arbitrary x by drawing a smooth

curve through xi’s.

– Interpolation: if x is in between the largest and smallest of xi’s.

– Extrapolation: if x is outside of the range (more dangerous, example: stock market)

Methods
– Polynomials, rational functions
– Trigonometric interpolation: Fourier methods
– Spline fit

Order: the number of points (minus one) used in an interpolation

– Increasing order does not necessarily increase the accuracy.


Page 19:

Polynomial interpolation, I

Straight line interpolation
– Given two points (x_1, y_1) and (x_2, y_2), use a straight line joining the two points to find all the missing values in between:

$y = P_1(x) = y_1 + (y_2 - y_1)\frac{x - x_1}{x_2 - x_1}$

Lagrange interpolation
– First order:

$y = P_1(x) = \frac{x - x_2}{x_1 - x_2}\,y_1 + \frac{x - x_1}{x_2 - x_1}\,y_2$

– Second order polynomials:

$y = P_2(x) = \frac{(x - x_2)(x - x_3)}{(x_1 - x_2)(x_1 - x_3)}\,y_1 + \frac{(x - x_1)(x - x_3)}{(x_2 - x_1)(x_2 - x_3)}\,y_2 + \frac{(x - x_1)(x - x_2)}{(x_3 - x_1)(x_3 - x_2)}\,y_3$

Page 20:

Polynomial interpolation, II

In general, the interpolating polynomial of degree N−1 through the N points y_1 = f(x_1), y_2 = f(x_2), …, y_N = f(x_N) is

$y = P_{N-1}(x) = \frac{(x - x_2)(x - x_3)\cdots(x - x_N)}{(x_1 - x_2)(x_1 - x_3)\cdots(x_1 - x_N)}\,y_1 + \frac{(x - x_1)(x - x_3)\cdots(x - x_N)}{(x_2 - x_1)(x_2 - x_3)\cdots(x_2 - x_N)}\,y_2 + \cdots + \frac{(x - x_1)(x - x_2)\cdots(x - x_{N-1})}{(x_N - x_1)(x_N - x_2)\cdots(x_N - x_{N-1})}\,y_N$
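A minimal Python sketch of this general Lagrange formula (not from the slides; the function name is made up):

```python
def lagrange(xs, ys, x):
    """Evaluate the degree N-1 Lagrange interpolating polynomial at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)   # product of (x - x_j)/(x_i - x_j)
        total += term
    return total

# The three-point example on the next page, evaluated at x = 2.3
print(lagrange([1.1, 1.7, 3.0], [10.6, 15.2, 20.3], 2.3))   # ~18.38
```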

Page 21:

Example, I

x     y
1.1   10.6
1.7   15.2
3     20.3

$C_1 = \frac{y_1}{(x_1 - x_2)(x_1 - x_3)} = \frac{10.6}{(1.1 - 1.7)(1.1 - 3.0)} = 9.298$

$C_2 = \frac{15.2}{(1.7 - 1.1)(1.7 - 3.0)} = -19.4872$

$C_3 = \frac{20.3}{(3.0 - 1.1)(3.0 - 1.7)} = 8.2186$

With these coefficients,

P(x) = 9.2983(x − 1.7)(x − 3.0) − 19.4872(x − 1.1)(x − 3.0) + 8.2186(x − 1.1)(x − 1.7)

P(2.3) = 9.2983(2.3 − 1.7)(2.3 − 3.0) − 19.4872(2.3 − 1.1)(2.3 − 3.0) + 8.2186(2.3 − 1.1)(2.3 − 1.7) = 18.3813

[Plot: Lagrange interpolation of the three points; x values 1 to 3, y values 0 to 25]

Page 22:

Example, II

What happens if we increase the number of data points?

x     y      C_i
1.1   10.6     28.1765
1.7   15.2    129.9145
3     20.3      6.4208
1.4   13.4   -116.319
2.2   18.7    -53.125

For example, the coefficient for the second point is

$C_2 = \frac{y_2}{(x_2 - x_1)(x_2 - x_3)(x_2 - x_4)(x_2 - x_5)}$

Note that these coefficients create a P_4(x) polynomial, which can be compared with the original P_2(x) from the previous slide.

The problem: adding additional points creates "bulges" in the graph.

[Plot: Lagrange interpolation with the five points; x values 1 to 3, y values 0 to 25]

Page 23:

Rational Function Interpolation

A rational function interpolant passes a ratio of two polynomials through the data points,

$R(x) = \frac{p_0 + p_1 x + \cdots + p_\mu x^\mu}{q_0 + q_1 x + \cdots + q_\nu x^\nu}$

with the numerator and denominator coefficients chosen so that R(x_i) = y_i at every data point (for the five points below, one can take, e.g., μ = ν = 2).

x      y
-1     0.0385
-0.5   0.1379
0      1
0.5    0.1379
1      0.0385
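One possible way to fit such an interpolant, sketched in Python for illustration (the choice μ = ν = 2 and the normalization q_0 = 1 are assumptions, not from the slides): the interpolation conditions become a linear system in the coefficients.

```python
import numpy as np

# Fit R(x) = (p0 + p1*x + p2*x^2) / (1 + q1*x + q2*x^2) through the five points
# by solving p0 + p1*x + p2*x^2 - y*q1*x - y*q2*x^2 = y at each data point.
x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
y = np.array([0.0385, 0.1379, 1.0, 0.1379, 0.0385])

A = np.column_stack([np.ones_like(x), x, x**2, -y * x, -y * x**2])
p0, p1, p2, q1, q2 = np.linalg.solve(A, y)
# The solution is close to R(x) = 1 / (1 + 25 x^2), i.e. p0 ~ 1, q2 ~ 25.
```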

Page 24:

Cubic Spline Interpolation

Cubic spline interpolation uses only the data points to maintain the desired smoothness of the function and is piecewise continuous.

Given a function f defined on [a, b] and a set of nodes a=x0<x1<…<xn=b, a cubic spline interpolation S for f is

– S(x) is a cubic polynomial, denoted Sj(x), on the subinterval [xj, xj+1] for each j=0, 1, …, n-1;

– Sj(xj) = f(xj) for j = 0, 1, …, n;

– Sj+1(xj+1) = Sj(xj+1) for j = 0, 1, …, n-2;

– S’j+1(xj+1) = S’j(xj+1) for j = 0, 1, …, n-2;

– S’’j+1(xj+1) = S’’j(xj+1) for j = 0, 1, …, n-2;

– Boundary conditions: S’’(a)= S’’(b)= 0

$S_j(x) = a_j + b_j(x - x_j) + c_j(x - x_j)^2 + d_j(x - x_j)^3$
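For reference (not part of the slides), SciPy's CubicSpline with bc_type="natural" imposes exactly the boundary conditions S''(a) = S''(b) = 0; a minimal usage sketch on the data from Example, I, assuming SciPy is available:

```python
import numpy as np
from scipy.interpolate import CubicSpline

x = np.array([1.1, 1.7, 3.0])
y = np.array([10.6, 15.2, 20.3])
spline = CubicSpline(x, y, bc_type="natural")   # natural spline: S''(a) = S''(b) = 0

print(spline(2.3))    # spline estimate at x = 2.3
print(spline.c)       # piecewise coefficients per interval, ordered d_j, c_j, b_j, a_j
```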