Empirical Modeling
Dongsup Kim
Department of Biosystems, KAIST, Fall 2004
Empirical modeling
Moore's law: Gordon Moore made his famous observation in 1965, just four years after the first planar integrated circuit was invented. The press called it "Moore's Law" and the name has stuck. In his original paper, Moore observed an exponential growth in the number of transistors per integrated circuit and predicted that this trend would continue.
From http://www.intel.com/research/silicon/mooreslaw.htm
Covariance and correlation
Consider n pairs of measurements on variables x and y:
\((x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\)
A measure of linear association between the measurements of variables x and y is the "sample covariance":
\(s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})\)
– If \(s_{xy} > 0\): positively correlated
– If \(s_{xy} < 0\): negatively correlated
– If \(s_{xy} = 0\): uncorrelated
Sample linear correlation coefficient ("Pearson's product moment correlation coefficient"):
\(r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}\)
with \(-1 \le r_{xy} \le 1\).
Correlation

X (Years Experience)   Y (Salary in $1000s)
1                      20
16                     83
21                     90
11                     59
6                      43
3                      36
13                     72
9                      64
8                      57
3                      30

\(\bar{x} = 9.1,\quad \bar{y} = 55.4,\quad s_x = 6.315,\quad s_y = 22.979,\quad r_{xy} = 0.972\)
Strong relationship.
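The statistics in this example can be reproduced with a short script (a minimal sketch using NumPy; `ddof=1` selects the n−1 "sample" normalization used in the formulas above):

```python
import numpy as np

# Years of experience (X) and salary in $1000s (Y) from the example table
x = np.array([1, 16, 21, 11, 6, 3, 13, 9, 8, 3], dtype=float)
y = np.array([20, 83, 90, 59, 43, 36, 72, 64, 57, 30], dtype=float)

# Sample covariance: s_xy = sum((x_i - x̄)(y_i - ȳ)) / (n - 1)
s_xy = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

# Pearson correlation: r_xy = s_xy / (s_x * s_y)
r_xy = s_xy / (x.std(ddof=1) * y.std(ddof=1))

print(round(r_xy, 3))  # 0.972
```

This matches the slide's \(\bar{x} = 9.1\), \(\bar{y} = 55.4\), and \(r_{xy} = 0.972\).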
Covariance & Correlation matrix
Given n measurements on p variables, the sample covariance is
\(s_{ij} = \frac{1}{n-1}\sum_{k=1}^{n}(x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j),\quad i = 1, 2, \ldots, p,\ j = 1, 2, \ldots, p\)
and the covariance matrix is
\[\mathbf{S} = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1p} \\ s_{21} & s_{22} & \cdots & s_{2p} \\ \vdots & & & \vdots \\ s_{p1} & s_{p2} & \cdots & s_{pp} \end{pmatrix}\]
The sample correlation coefficient for the ith and jth variables is
\(r_{ij} = \frac{s_{ij}}{\sqrt{s_{ii}}\,\sqrt{s_{jj}}} = \frac{\sum_{k=1}^{n}(x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j)}{\sqrt{\sum_{k=1}^{n}(x_{ki} - \bar{x}_i)^2}\,\sqrt{\sum_{k=1}^{n}(x_{kj} - \bar{x}_j)^2}}\)
and the correlation matrix is
\[\mathbf{R} = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1p} \\ r_{21} & 1 & \cdots & r_{2p} \\ \vdots & & & \vdots \\ r_{p1} & r_{p2} & \cdots & 1 \end{pmatrix}\]
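In NumPy these matrices come directly from `np.cov` and `np.corrcoef` (a sketch; `rowvar=False` treats columns as variables, and `np.cov` uses the n−1 normalization by default):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))      # n = 100 measurements on p = 3 variables

S = np.cov(X, rowvar=False)        # p x p sample covariance matrix
R = np.corrcoef(X, rowvar=False)   # p x p correlation matrix

# R_ij = S_ij / sqrt(S_ii * S_jj), with ones on the diagonal
d = np.sqrt(np.diag(S))
assert np.allclose(R, S / np.outer(d, d))
assert np.allclose(np.diag(R), 1.0)
```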
In two dimensions
Fitting a line to data
When the correlation coefficient is large, it indicates a dependence of one variable on the other. The simplest relationship is the straight line: \(y = \beta_0 + \beta_1 x\).
Criterion for a best-fit line: least squares. The resulting equation is called the "regression equation", and its graph is called the "regression line".
The sum of squares of the error SS:
\(SS = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \sum_{i=1}^{n}\left[y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)\right]^2\)
Least-squares equations: setting \(\partial SS/\partial\hat{\beta}_0 = 0\) and \(\partial SS/\partial\hat{\beta}_1 = 0\) gives
\(\hat{\beta}_1 = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} = \frac{s_{xy}}{s_x^2} = \frac{SS_{xy}}{SS_x}\)
\(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}\)
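Applied to the experience/salary data from the correlation example, the closed-form coefficients work out as follows (a sketch; the numerical results are my own computation, not taken from the slides):

```python
import numpy as np

x = np.array([1, 16, 21, 11, 6, 3, 13, 9, 8, 3], dtype=float)
y = np.array([20, 83, 90, 59, 43, 36, 72, 64, 57, 30], dtype=float)
n = len(x)

# slope: β̂1 = (n Σx_iy_i − Σx_i Σy_i) / (n Σx_i² − (Σx_i)²)
b1 = (n * np.sum(x * y) - x.sum() * y.sum()) / (n * np.sum(x**2) - x.sum()**2)
# intercept: β̂0 = ȳ − β̂1 x̄
b0 = y.mean() - b1 * x.mean()

print(b1, b0)  # roughly 3.54 and 23.21
```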
A measure of fit
Suppose we have data points (x_i, y_i) and modeled (or predicted) points (x_i, ŷ_i) from the model ŷ = f(x).
Data {y_i} have two types of variation: (i) variation explained by the model and (ii) variation not explained by the model.
Residual sum of squares (variation not explained by the model):
\(SS_{Res} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2\)
Regression sum of squares (variation explained by the model):
\(SS_{Reg} = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2\)
Total variation in y = variation explained by the model + unexplained variation (error).
The coefficient of determination:
\(R^2 = \frac{SS_{Reg}}{SS_{Reg} + SS_{Res}}\)
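For the straight-line fit of the salary example, \(R^2\) can be computed directly from these two sums of squares (a sketch; `np.polyfit` supplies the least-squares coefficients):

```python
import numpy as np

x = np.array([1, 16, 21, 11, 6, 3, 13, 9, 8, 3], dtype=float)
y = np.array([20, 83, 90, 59, 43, 36, 72, 64, 57, 30], dtype=float)

b1, b0 = np.polyfit(x, y, deg=1)          # slope, intercept of the LS line
y_hat = b0 + b1 * x

ss_res = np.sum((y - y_hat) ** 2)         # variation not explained
ss_reg = np.sum((y_hat - y.mean()) ** 2)  # variation explained
r2 = ss_reg / (ss_reg + ss_res)
```

For simple linear regression, \(R^2\) equals the squared correlation coefficient, so here it is roughly \(0.972^2 \approx 0.945\).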
Principal Component Analysis (PCA)
PCA selects a new set of axes for the data by moving and rotating the coordinate system in such a way that the dependency between the variables is removed in the new, transformed coordinate system.
The first principal axis points in the direction of the maximum variation in the data.
The second principal axis is orthogonal to the first one and is in the direction of the maximum variation in the remaining allowable directions, and so on.
It can be used to:
– Reduce the number of dimensions in data.
– Find patterns in high-dimensional data.
– Visualize data of high dimensionality.
PCA, II
Assume X is an n × p matrix and is "centered" (zero mean).
Let a be the p × 1 column vector of projection weights (unknown at this point) that results in the largest variance when the data X are projected along a.
We can express the projected values of all data vectors in X onto a as Xa.
Now define the variance along a as
\(\sigma_a^2 = \frac{1}{n}(\mathbf{Xa})^T(\mathbf{Xa}) = \mathbf{a}^T\mathbf{S}\mathbf{a}\)
We wish to maximize the variance under the constraint that \(\mathbf{a}^T\mathbf{a} = 1\): optimization with constraints → method of Lagrange multipliers:
\(u = \mathbf{a}^T\mathbf{S}\mathbf{a} - \lambda(\mathbf{a}^T\mathbf{a} - 1)\)
\(\frac{\partial u}{\partial \mathbf{a}} = 2\mathbf{S}\mathbf{a} - 2\lambda\mathbf{a} = 0 \;\Rightarrow\; \mathbf{S}\mathbf{a} = \lambda\mathbf{a}\)
Example, 2D
Covariance matrix:
\[\mathbf{S} = \begin{pmatrix} 0.9701 & 0.9596 \\ 0.9596 & 1.2049 \end{pmatrix}\]
Decomposition:
\(\lambda_1 = 2.0592,\ \mathbf{v}_1 = \begin{pmatrix} 0.66 \\ 0.74 \end{pmatrix};\qquad \lambda_2 = 0.12,\ \mathbf{v}_2 = \begin{pmatrix} 0.74 \\ -0.66 \end{pmatrix}\)
PCA:
\(z_1 = 0.66x + 0.74y,\qquad z_2 = 0.74x - 0.66y\)
From CMU 15-385 Computer Vision by Tai Sing Lee
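The decomposition above can be reproduced with NumPy's symmetric eigensolver (a sketch; `np.linalg.eigh` returns eigenvalues in ascending order, so they are reversed here, and eigenvector signs are arbitrary):

```python
import numpy as np

S = np.array([[0.9701, 0.9596],
              [0.9596, 1.2049]])

vals, vecs = np.linalg.eigh(S)          # ascending eigenvalues for symmetric S
vals, vecs = vals[::-1], vecs[:, ::-1]  # reorder: largest eigenvalue first

print(np.round(vals, 2))                # approximately [2.05, 0.12]
```

The small mismatch with the slide's λ₁ = 2.0592 presumably comes from rounding in the printed covariance entries.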
PCA, III
If \(\mathbf{S} = \{s_{ik}\}\) is the p×p sample covariance matrix with eigenvalue–eigenvector pairs \((\lambda_1, \mathbf{v}_1), (\lambda_2, \mathbf{v}_2), \ldots, (\lambda_p, \mathbf{v}_p)\), the ith principal component is given by
\(y_i = \mathbf{v}_i^T\mathbf{x} = \sum_{j=1}^{p} v_{ij} x_j,\quad i = 1, 2, \ldots, p\)
where \(\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0\) and x is the p-dimensional vector formed by the random variables \(x_1, x_2, \ldots, x_p\).
Also:
\(\mathrm{var}[y_i] = \lambda_i,\quad i = 1, 2, \ldots, p\)
\(\mathrm{cov}[y_i, y_j] = 0,\quad i \ne j\)
Total variance \(= \sum_{i=1}^{p} s_{ii} = \lambda_1 + \lambda_2 + \cdots + \lambda_p\)
Proportion of total variance explained by the ith principal component \(= \dfrac{\lambda_i}{\lambda_1 + \lambda_2 + \cdots + \lambda_p}\)
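These properties are easy to verify numerically: projecting centered data onto the eigenvectors of S gives components whose covariance matrix is diagonal, with the eigenvalues on the diagonal (a sketch on synthetic correlated data):

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic correlated data: n = 500 observations of p = 3 variables
X = rng.normal(size=(500, 3)) @ np.array([[2.0, 0.5, 0.0],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.5]])
X = X - X.mean(axis=0)                  # center the data

S = np.cov(X, rowvar=False)
vals, vecs = np.linalg.eigh(S)
vals, vecs = vals[::-1], vecs[:, ::-1]  # sort largest eigenvalue first

Y = X @ vecs                            # principal components y_i = v_i^T x
C = np.cov(Y, rowvar=False)

# var[y_i] = λ_i, cov[y_i, y_j] = 0, and total variance is preserved
assert np.allclose(C, np.diag(vals))
assert np.isclose(np.trace(S), vals.sum())

print(np.round(vals / vals.sum(), 3))   # proportion of variance per component
```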
Applications
– Dimensionality reduction
– Image compression
– Pattern recognition
– Gene expression data analysis
– Molecular dynamics simulation
– …
Dimensional reduction
We can throw v3 away and keep w = [v1 v2], and can still represent the information almost equally well.
v1 and v2 also provide good dimensions in which different objects/textures form nice clusters in this 2D space.
From CMU 15-385 Computer Vision by Tai Sing Lee
Image compression, I
A set of N images, I₁, I₂, …, I_N, each of which has n pixels.
– Dataset of N dimensions and n observations
– Corresponding pixels form vectors of intensities
Expand each of them as a series,
\(I_i = \sum_{j=1}^{N} c_{ij}\mathbf{v}_j\)
where the optimal set of basis vectors is chosen to minimize the reconstruction error
\(error = \sum_{i=1}^{N}\left(I_i - \sum_{j=1}^{k} c_{ij}\mathbf{v}_j\right)^2,\quad \text{where } k < N\)
The principal components of the set form the optimal basis.
– PCA produces N eigenvectors and eigenvalues.
– Compress: choose a limited number (k < N) of components.
– Information loss when recreating the original data.
Image compression, II
Given a large set of 8×8 image patches, convert each image patch into a vector by stacking the columns together into one column vector.
Compute the covariance matrix S.
Transform into a set of new bases by PCA. Since the eigenvalues of S drop rapidly, we can represent the image more efficiently in this new coordinate system with the eigenvectors (principal components) v₁, …, v_k, where k << 64, as bases (k ≈ 10).
Then I = a₁v₁ + a₂v₂ + … + a_kv_k.
The idea is that now you only store 10 code words, each an 8×8 image basis; then you can transmit the image with only 10 numbers instead of 64.
From CMU 15-385 Computer Vision by Tai Sing Lee
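The patch-compression pipeline can be sketched end to end (an illustrative sketch on random synthetic patches, not the CMU data; the patch size and k follow the slide):

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic stand-in for a large set of 8x8 patches, as 64-element vectors
patches = rng.normal(size=(1000, 64)) @ rng.normal(size=(64, 64)) * 0.1

mean = patches.mean(axis=0)
X = patches - mean                      # center the patch vectors

S = np.cov(X, rowvar=False)             # 64 x 64 covariance matrix
vals, vecs = np.linalg.eigh(S)
vecs = vecs[:, ::-1]                    # eigenvectors, largest eigenvalue first

k = 10
V = vecs[:, :k]                         # keep k principal components as bases
codes = X @ V                           # 10 numbers per patch instead of 64
reconstructed = codes @ V.T + mean      # I ≈ a1*v1 + ... + ak*vk (+ mean)

err_k = np.mean((patches - reconstructed) ** 2)
```

As k grows toward 64 the reconstruction error shrinks to zero; the slide's point is that a small k already captures most of the variance.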
Applications
Representation:
– N × N pixel image → X = (x₁ … x_{N²})
– x_i is an intensity value
PCA for pattern identification:
– Perform PCA on a matrix of M images.
– Given a new image, which original image is most similar?
– Traditionally: difference the original image and the new image.
– PCA: difference the PCA data and the new image.
– Advantage: PCA data reflects similarities and differences in the image data.
– Omitted dimensions → still good performance.
PCA for image compression:
– M images, each containing N² pixels
– Dataset of M dimensions and N² observations
– Corresponding pixels form vectors of intensities
– PCA produces M eigenvectors and eigenvalues
– Compress: choose a limited number of components
– Information loss when recreating the original data
Interpolation & Extrapolation
Numerical Recipes, Chapter 3
Consider n pairs of data of variables x and y,
\((x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\)
where we don't know an analytic expression for y = f(x). The task is to estimate f(x) for arbitrary x by drawing a smooth curve through the x_i's.
– Interpolation: if x is in between the largest and smallest of the x_i's.
– Extrapolation: if x is outside of that range (more dangerous; example: the stock market).
Methods:
– Polynomials, rational functions
– Trigonometric interpolation: Fourier methods
– Spline fit
Order: the number of points (minus one) used in an interpolation.
– Increasing the order does not necessarily increase the accuracy.
Polynomial interpolation, I
Straight-line interpolation:
– Given two points (x₁, y₁) and (x₂, y₂), use a straight line joining the two points to find all the missing values in between:
\(y = P_1(x) = y_1 + \frac{y_2 - y_1}{x_2 - x_1}(x - x_1)\)
Lagrange interpolation:
– First order:
\(y = P_1(x) = \frac{x - x_2}{x_1 - x_2}\,y_1 + \frac{x - x_1}{x_2 - x_1}\,y_2\)
– Second order:
\(y = P_2(x) = \frac{(x - x_2)(x - x_3)}{(x_1 - x_2)(x_1 - x_3)}\,y_1 + \frac{(x - x_1)(x - x_3)}{(x_2 - x_1)(x_2 - x_3)}\,y_2 + \frac{(x - x_1)(x - x_2)}{(x_3 - x_1)(x_3 - x_2)}\,y_3\)
Polynomial interpolation, I
In general, the interpolating polynomial of degree N−1 through the N points y₁ = f(x₁), y₂ = f(x₂), …, y_N = f(x_N) is
\(y = P_{N-1}(x) = \frac{(x - x_2)(x - x_3)\cdots(x - x_N)}{(x_1 - x_2)(x_1 - x_3)\cdots(x_1 - x_N)}\,y_1 + \frac{(x - x_1)(x - x_3)\cdots(x - x_N)}{(x_2 - x_1)(x_2 - x_3)\cdots(x_2 - x_N)}\,y_2 + \cdots + \frac{(x - x_1)(x - x_2)\cdots(x - x_{N-1})}{(x_N - x_1)(x_N - x_2)\cdots(x_N - x_{N-1})}\,y_N\)
Example, I

x     y
1.1   10.6
1.7   15.2
3.0   20.3

Coefficients \(C_i = \dfrac{y_i}{\prod_{j \ne i}(x_i - x_j)}\):
\(C_1 = \frac{10.6}{(1.1 - 1.7)(1.1 - 3.0)} = 9.2983\)
\(C_2 = \frac{15.2}{(1.7 - 1.1)(1.7 - 3.0)} = -19.4872\)
\(C_3 = \frac{20.3}{(3.0 - 1.1)(3.0 - 1.7)} = 8.2186\)
The values are evaluated:
P(x) = 9.2983(x − 1.7)(x − 3.0) − 19.4872(x − 1.1)(x − 3.0) + 8.2186(x − 1.1)(x − 1.7)
P(2.3) = 9.2983(2.3 − 1.7)(2.3 − 3.0) − 19.4872(2.3 − 1.1)(2.3 − 3.0) + 8.2186(2.3 − 1.1)(2.3 − 1.7) = 18.3813
[Plot: Lagrange interpolation curve through the three data points, x from 1 to 3, y from 0 to 25.]
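The computation in this example can be written as a small Lagrange interpolation routine (a sketch of the coefficient form used above):

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the Lagrange interpolating polynomial through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        # term_i = y_i * prod_{j != i} (x - x_j) / (x_i - x_j)
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

xs = [1.1, 1.7, 3.0]
ys = [10.6, 15.2, 20.3]
print(round(lagrange_eval(xs, ys, 2.3), 4))  # 18.3814
```

This agrees with the slide's P(2.3) = 18.3813 up to rounding of the intermediate coefficients.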
Example, II
What happens if we increase the number of data points? With five points, the coefficient for the second point, for example, becomes
\(C_2 = \frac{y_2}{(x_2 - x_1)(x_2 - x_3)(x_2 - x_4)(x_2 - x_5)}\)

x     y      C_i
1.1   10.6   28.1765
1.7   15.2   129.9145
3.0   20.3   6.4208
1.4   13.4   -116.319
2.2   18.7   -53.125

The coefficients now create a P₄(x) polynomial, which can be compared with the original P₂(x).
The problem: adding additional points can create "bulges" in the graph.
[Plot: Lagrange interpolation with five data points, x from 1 to 3, y from 0 to 25.]
Rational Function Interpolation
A rational function interpolant is a quotient of two polynomials,
\(P(x) = \frac{p_0 + p_1 x + \cdots + p_\mu x^\mu}{q_0 + q_1 x + \cdots + q_\nu x^\nu}\)
which can model functions with poles that polynomials handle poorly.

x     y
-1    0.0385
-0.5  0.1379
0     1
0.5   0.1379
1     0.0385

(These values are consistent with f(x) = 1/(1 + 25x²).)
Cubic Spline Interpolation
Cubic spline interpolation uses only the data points, maintains the desired smoothness of the function, and is piecewise continuous.
Given a function f defined on [a, b] and a set of nodes a=x0<x1<…<xn=b, a cubic spline interpolation S for f is
– S(x) is a cubic polynomial, denoted Sj(x), on the subinterval [xj, xj+1] for each j=0, 1, …, n-1;
– Sj(xj) = f(xj) for j = 0, 1, …, n;
– Sj+1(xj+1) = Sj(xj+1) for j = 0, 1, …, n-2;
– S’j+1(xj+1) = S’j(xj+1) for j = 0, 1, …, n-2;
– S’’j+1(xj+1) = S’’j(xj+1) for j = 0, 1, …, n-2;
– Boundary conditions: S’’(a)= S’’(b)= 0
\(S_j(x) = a_j + b_j(x - x_j) + c_j(x - x_j)^2 + d_j(x - x_j)^3\)
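The conditions above pin down the coefficients via a tridiagonal linear system. A minimal natural-spline sketch (assuming the S″(a) = S″(b) = 0 boundary conditions and the standard textbook tridiagonal construction):

```python
import numpy as np

def natural_cubic_spline(x, y):
    """Natural cubic spline coefficients (S''(a) = S''(b) = 0).
    Returns per-interval (a_j, b_j, c_j, d_j) for
    S_j(t) = a_j + b_j*(t-x_j) + c_j*(t-x_j)**2 + d_j*(t-x_j)**3."""
    x, a = np.asarray(x, float), np.asarray(y, float)
    n = len(x) - 1
    h = np.diff(x)
    # Tridiagonal system for the c_j
    A = np.zeros((n + 1, n + 1))
    rhs = np.zeros(n + 1)
    A[0, 0] = A[n, n] = 1.0                 # natural boundary conditions
    for j in range(1, n):
        A[j, j - 1], A[j, j], A[j, j + 1] = h[j - 1], 2 * (h[j - 1] + h[j]), h[j]
        rhs[j] = 3 * (a[j + 1] - a[j]) / h[j] - 3 * (a[j] - a[j - 1]) / h[j - 1]
    c = np.linalg.solve(A, rhs)
    b = (a[1:] - a[:-1]) / h - h * (2 * c[:-1] + c[1:]) / 3
    d = (c[1:] - c[:-1]) / (3 * h)
    return a[:-1], b, c[:-1], d

def spline_eval(x, coeffs, t):
    """Evaluate the piecewise cubic at t."""
    a, b, c, d = coeffs
    j = min(max(np.searchsorted(x, t, side="right") - 1, 0), len(a) - 1)
    dt = t - x[j]
    return a[j] + b[j] * dt + c[j] * dt**2 + d[j] * dt**3

xk = np.array([0.0, 1.0, 2.0, 3.0])
yk = np.array([1.0, 2.0, 0.0, 1.0])
coeffs = natural_cubic_spline(xk, yk)
# The spline reproduces every node value exactly
assert all(abs(spline_eval(xk, coeffs, t) - v) < 1e-9 for t, v in zip(xk, yk))
```

An equivalent result comes from `scipy.interpolate.CubicSpline(xk, yk, bc_type='natural')` if SciPy is available.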