Human Growth: From data to functions


[Figure: Height (cm) versus Age, with ages from 0 to 18 and heights from 60 to 200 cm.]

Challenges to measuring growth

We need repeated and regular access to subjects for up to 20 years.

Height changes over the day, and must be measured at a fixed time.

Height is measured in supine position in infancy, followed by standing height. The change involves an adjustment of about 1 cm.

Measurement error is about 0.5 cm in later years, but is rather larger in infancy.

Challenges to functional modeling

We want smooth curves that fit the data as well as is reasonable.

We will want to look at velocity and acceleration, so we want to differentiate twice and still be smooth.

In principle the curves should be monotone; i.e., have a positive derivative.

The monotonicity problem

The tibia of a newborn measured daily shows us that over the short term growth takes place in spurts.

This baby’s tibia grows as fast as 2 mm/day!

How can we fit a smooth monotone function?

Weighted sums of basis functions

We need a flexible method for constructing curves to fit the data.

We begin with a set of basic functional building blocks $\phi_k(t)$, called basis functions.

Our fitting function x(t) is a weighted sum of these:

$$x(t) = \sum_k c_k \phi_k(t)$$
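The weighted-sum idea can be sketched in a few lines. This is a minimal illustration in Python/NumPy rather than the Matlab toolbox used in these slides; the function names `fit_basis` and `evaluate`, the monomial basis, and the test data are all assumptions chosen for the example.

```python
import numpy as np

def fit_basis(t, y, basis_funcs):
    """Least-squares weights c_k for x(t) = sum_k c_k * phi_k(t)."""
    # Column k of Phi holds phi_k evaluated at every observation time.
    Phi = np.column_stack([phi(t) for phi in basis_funcs])
    c, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return c

def evaluate(t, c, basis_funcs):
    """Evaluate the weighted sum of basis functions at the points t."""
    Phi = np.column_stack([phi(t) for phi in basis_funcs])
    return Phi @ c

# Hypothetical basis for illustration only: the monomials 1, t, t^2.
monomials = [lambda t: np.ones_like(t), lambda t: t, lambda t: t**2]

t = np.linspace(0.0, 1.0, 11)
y = 2.0 + 3.0 * t - 1.5 * t**2          # noise-free data that the basis can reproduce
c = fit_basis(t, y, monomials)
x_hat = evaluate(t, c, monomials)
```

Swapping in a different list of basis functions changes the family of curves but not the fitting step, which is the appeal of the basis-function approach.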

What are the main choices for basis functions?

Fourier series: a constant term, then a sine/cosine pair at a fixed base frequency, followed by a series of sine/cosine pairs at integer multiples of the base frequency.

Fourier series are best for periodic data.
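As a rough sketch of how such a basis is laid out, the following Python/NumPy function builds the constant term followed by sine/cosine pairs. The name `fourier_basis`, its arguments, and the sine-before-cosine column ordering are assumptions made for this example, not the toolbox's interface.

```python
import numpy as np

def fourier_basis(t, n_basis, period):
    """Columns: 1, sin(wt), cos(wt), sin(2wt), cos(2wt), ... with w = 2*pi/period."""
    t = np.asarray(t, dtype=float)
    omega = 2.0 * np.pi / period          # base frequency
    cols = [np.ones_like(t)]              # constant term
    k = 1
    while len(cols) < n_basis:
        cols.append(np.sin(k * omega * t))
        if len(cols) < n_basis:
            cols.append(np.cos(k * omega * t))
        k += 1                            # next integer multiple of the base frequency
    return np.column_stack(cols)

t = np.linspace(0.0, 1.0, 101)
Phi = fourier_basis(t, n_basis=5, period=1.0)   # five basis functions on [0, 1]
```

Every column takes the same value at both ends of the period, which is why this basis suits periodic data and not a growth curve.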

Five Fourier basis functions

[Figure: five Fourier basis functions plotted for t from 0 to 1, with values between -1 and 1.]

B-splines

B-spline basis functions are piecewise polynomials defined by a set of discrete values called knots.

The order of the polynomials (degree + 1) controls their smoothness.

Each basis function is nonzero only over a number of contiguous inter-knot intervals equal to the order.

Polynomials are a special type of B-spline, and are thus included within the system.
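For readers who want to see B-spline basis functions computed, here is a minimal Python/NumPy sketch using the standard Cox-de Boor recursion. The function name `bspline_basis`, the convention of repeating each end point `order` times, and the equally spaced example ages are assumptions for illustration; the slides' Matlab toolbox may organize this differently.

```python
import numpy as np

def bspline_basis(t, breaks, order):
    """Evaluate all B-spline basis functions of a given order at the points t.

    breaks: increasing break points, including both end points.
    Returns an array of shape (len(t), order + number of interior knots).
    """
    t = np.asarray(t, dtype=float)
    breaks = np.asarray(breaks, dtype=float)
    # Assumed convention: each end point appears `order` times in the knot sequence.
    knots = np.concatenate([np.repeat(breaks[0], order - 1), breaks,
                            np.repeat(breaks[-1], order - 1)])
    # Order 1: indicator functions of the non-empty knot intervals.
    n_int = len(knots) - 1
    B = np.zeros((len(t), n_int))
    for j in range(n_int):
        if knots[j] < knots[j + 1]:
            B[:, j] = (t >= knots[j]) & (t < knots[j + 1])
    # Close the last non-empty interval on the right so the end point is covered.
    last = np.nonzero(knots[:-1] < knots[1:])[0][-1]
    B[t == knots[-1], last] = 1.0
    # Cox-de Boor recursion: build order k from order k - 1.
    for k in range(2, order + 1):
        n_new = len(knots) - k
        B_new = np.zeros((len(t), n_new))
        for j in range(n_new):
            d1 = knots[j + k - 1] - knots[j]
            d2 = knots[j + k] - knots[j + 1]
            if d1 > 0:
                B_new[:, j] += (t - knots[j]) / d1 * B[:, j]
            if d2 > 0:
                B_new[:, j] += (knots[j + k] - t) / d2 * B[:, j + 1]
        B = B_new
    return B

t = np.linspace(0.0, 1.0, 201)
B2 = bspline_basis(t, np.linspace(0.0, 1.0, 5), order=2)   # 3 interior knots
B4 = bspline_basis(t, np.linspace(0.0, 1.0, 6), order=4)   # 4 interior knots

# Hypothetical stand-in for 31 measurement ages (the real ages are not listed here).
ages = np.linspace(1.0, 18.0, 31)
B6 = bspline_basis(ages, ages, order=6)
```

The number of columns always equals the order plus the number of interior knots, and at every point the basis functions sum to one.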

When should I use B-splines?

B-splines are the basis of choice for most non-periodic data.

They give complete control over flexibility, allowing more flexibility where needed and less where not needed.

Computing with B-splines is extremely efficient.

Five order 2 B-spline basis functions: A basis for polygonal lines

[Figure: five order 2 B-spline basis functions plotted for t from 0 to 1, with values between 0 and 1.]

Eight order 4 B-spline basis functions: A basis for twice-differentiable functions

[Figure: eight order 4 B-spline basis functions $B_{j4}(t)$ plotted for t from 0 to 1, with values between 0 and 1.]

B-splines for growth data

We use order 6 B-splines because we want to differentiate the result at least twice. Order 4 splines look smooth, but their second derivatives are rough.

We place a knot at each of the 31 ages.

The total number of basis functions = order + number of interior knots. The 31 ages give two end points and 29 interior knots, so there are 6 + 29 = 35 basis functions in this case.

Isn’t using 35 basis functions to fit 31 observations a problem?

Yes. We will fit each observation exactly.

This will ignore the fact that the measurement error is typically about 0.5 cm.

But we’ll fix this up later, when we look at roughness penalties.

Okay, let’s see what happens

These two Matlab commands define the basis and fit the data:

    hgtbasis = create_bspline_basis([1,18], 35, 6, age);

    hgtfd = data2fd(hgtfmat, age, hgtbasis);

Why we need to smooth

Noise in the data has a huge impact on derivative estimates.

[Figure: Velocity (cm/yr) versus Age, with ages from 2 to 18 and velocities from 0 to 20 cm/yr.]
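A back-of-the-envelope calculation shows why derivative estimates are so sensitive to noise. The sketch below is Python/NumPy and assumes independent measurement errors with a common standard deviation, plus simple difference quotients in place of the spline derivatives used in the slides. The spacings `h` are hypothetical, and the function names are invented for this example.

```python
import numpy as np

def velocity_error_sd(sigma, h):
    """SD of (y2 - y1) / h when y1 and y2 carry independent errors with SD sigma."""
    return np.sqrt(2.0) * sigma / h

def acceleration_error_sd(sigma, h):
    """SD of (y3 - 2*y2 + y1) / h**2 under the same independence assumption."""
    return np.sqrt(6.0) * sigma / h**2

sigma = 0.5                      # measurement error in cm, as quoted earlier
for h in (1.0, 0.5, 0.25):       # hypothetical spacings between measurements, in years
    v_sd = velocity_error_sd(sigma, h)
    a_sd = acceleration_error_sd(sigma, h)
    print(f"h = {h:4.2f} yr: velocity SD = {v_sd:5.2f} cm/yr, "
          f"acceleration SD = {a_sd:6.2f} cm/yr^2")
```

Halving the spacing doubles the error in velocity and quadruples it in acceleration, so an exact fit to closely spaced noisy heights gives very rough derivatives.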

Please let me smooth the data!

This command sets up 12 B-spline basis functions defined by equally spaced knots. This gives us about the right amount of fitting power given the error level.

    hgtbasis = create_bspline_basis([1,18], 12, 6);

These velocities are much better.

They go negative on the right, though.

[Figure: Velocity (cm/yr) versus Age, with ages from 2 to 18 and velocities from 0 to 20 cm/yr.]

Let’s see some accelerations

These acceleration curves are too unstable at the ends.

We need something better.

[Figure: Acceleration (cm/yr^2) versus Age, with ages from 2 to 18 and accelerations from -35 to 10 cm/yr^2.]

A measure of roughness

What do we mean by “smooth”?

A function that is smooth has limited curvature.

Curvature depends on the second derivative. A straight line is completely smooth.

Total curvature

We can measure the roughness of a function x(t) by integrating its squared second derivative.

The second derivative notation is $D^2 x(t)$.

$$R(x) = \int [D^2 x(t)]^2 \, dt$$
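To make the roughness measure concrete, the integral of the squared derivative can be approximated on a fine grid. This Python/NumPy sketch uses finite differences and the trapezoidal rule as stand-ins for exact spline calculus; the function name `roughness` and the two test functions are assumptions for the example.

```python
import numpy as np

def roughness(x, t, deriv=2):
    """Approximate the integral of [D^deriv x(t)]^2 dt from samples x on a fine grid t."""
    d = np.asarray(x, dtype=float)
    for _ in range(deriv):
        # Finite-difference derivative, second-order accurate at the end points too.
        d = np.gradient(d, t, edge_order=2)
    integrand = d**2
    # Trapezoidal rule written out by hand.
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t)))

t = np.linspace(0.0, 1.0, 1001)
r_line = roughness(3.0 + 2.0 * t, t)     # straight line: no curvature
r_parabola = roughness(t**2, t)          # D^2 x = 2, so the exact integral over [0, 1] is 4
```

Calling `roughness(x, t, deriv=4)` applies the same idea to the fourth derivative, although repeated finite differencing becomes numerically delicate on noisy samples.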

Total curvature of acceleration

Since we want acceleration to be smooth, we measure roughness at the level of acceleration:

$$R(x) = \int [D^4 x(t)]^2 \, dt$$

The penalized least squares criterion

We strike a compromise between fitting the data and keeping the fit smooth.

$$\mathrm{PENSSE}(x) = \sum_i [y_i - x(t_i)]^2 + \lambda R(x)$$
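A short worked step may help here. Assuming x(t) is a basis expansion with coefficients $c_k$ and basis functions $\phi_k(t)$, and writing λ ≥ 0 for the weight on the roughness term, the criterion becomes a quadratic in the coefficients with a closed-form minimizer. The symbols $\Phi$, $P$ and $m$ (the order of the derivative being penalized) are introduced only for this illustration, and the matrix being inverted is assumed to be nonsingular.

```latex
\begin{align*}
\Phi_{ik} &= \phi_k(t_i), \qquad
P_{kl} = \int D^m\phi_k(t)\, D^m\phi_l(t)\, dt, \\
\mathrm{PENSSE}(\mathbf{c}) &= \|\mathbf{y} - \Phi\mathbf{c}\|^2
  + \lambda\, \mathbf{c}^\top P\, \mathbf{c}, \\
\hat{\mathbf{c}} &= \left(\Phi^\top\Phi + \lambda P\right)^{-1}\Phi^\top\mathbf{y}.
\end{align*}
```

With λ = 0 this reduces to ordinary least squares; increasing λ gives the penalty matrix more weight relative to the data.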

How does this control roughness?

Smoothing parameter λ controls roughness.

When λ = 0, only fitting the data matters.

But as λ increases, we place more and more emphasis on penalizing roughness.

As λ → ∞, only roughness matters, and only functions having zero roughness are used.
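The effect of λ can be seen in a small experiment. The sketch below is Python/NumPy and swaps in a simpler technique than the one in these slides: it penalizes squared second differences of the fitted values at the observation points (a discrete analogue of the roughness penalty) instead of the integrated derivative of a B-spline expansion. The data are synthetic and the function names are invented for this example.

```python
import numpy as np

def penalized_fit(y, lam, diff_order=2):
    """Minimize sum_i (y_i - x_i)^2 + lam * sum of squared diff_order-th differences of x."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    D = np.diff(np.eye(n), n=diff_order, axis=0)     # discrete difference operator
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

def discrete_roughness(x, diff_order=2):
    """Sum of squared differences, the discrete stand-in for R(x)."""
    return float(np.sum(np.diff(x, n=diff_order) ** 2))

# Synthetic data for illustration only: a smooth rising curve plus noise with SD 0.5.
rng = np.random.default_rng(0)
t = np.linspace(1.0, 18.0, 31)
y = 80.0 + 100.0 * (1.0 - np.exp(-0.15 * (t - 1.0))) + rng.normal(0.0, 0.5, t.size)

x_zero = penalized_fit(y, lam=0.0)      # only fitting the data matters
x_mid = penalized_fit(y, lam=1.0)       # a compromise
x_big = penalized_fit(y, lam=1e6)       # roughness dominates: close to a straight line

for name, x in (("lambda = 0", x_zero), ("lambda = 1", x_mid), ("lambda = 1e6", x_big)):
    print(f"{name:>12}: roughness = {discrete_roughness(x):.3g}, "
          f"residual SS = {np.sum((y - x) ** 2):.3g}")
```

At λ = 0 the fit reproduces the data exactly, and as λ grows the roughness falls while the residual sum of squares rises.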

We can either smooth at the data fitting step, or smooth a rough function.

This Matlab command smooths the fit to the data obtained using knots at the ages. The penalty is placed on the fourth derivative, which controls the roughness of the acceleration.

    lambda = 0.01;

    hgtfd = smooth_fd(hgtfd, lambda, 4);

Accelerations using a roughness penalty

These accelerations are much less variable at the extremes.

The corresponding velocities

How did you choose λ?

This inevitably involves judgment.

We smooth just enough to obtain tolerable roughness in the estimated curves (accelerations in this case), but not so much as to lose interesting variation.

There are data-driven methods for choosing λ, but they offer only a reasonable place to begin exploring.

What about monotonicity?

The growth curves should be monotonic.

The velocities should be non-negative.

It’s hard to prevent linear combinations of anything from breaking the rules.

We need an indirect approach to constructing a monotonic model.

A differential equation for monotonicity

Any strictly monotonic function x(t) must satisfy a simple linear differential equation:

$$D^2 x(t) = w(t) \, Dx(t)$$

The reason is simple: because of strict monotonicity, the first derivative Dx(t) will never be 0, and function w(t) is therefore simply $D^2 x(t)/Dx(t)$.
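The step from this equation to an explicit expression for the velocity can be written out as follows. The derivation assumes Dx(t) is continuous and never zero, as stated above, and takes 0 as the origin of integration.

```latex
\begin{align*}
D^2x(t) = w(t)\,Dx(t)
  &\;\Longrightarrow\; D\,\ln|Dx(t)| = \frac{D^2x(t)}{Dx(t)} = w(t) \\
  &\;\Longrightarrow\; \ln|Dx(t)| = \ln|Dx(0)| + \int_0^t w(v)\,dv \\
  &\;\Longrightarrow\; Dx(t) = Dx(0)\,\exp\!\left[\int_0^t w(v)\,dv\right].
\end{align*}
```

Because the exponential is always positive, Dx(t) keeps the sign of Dx(0) whatever values w(t) takes, so w(t) itself needs no constraint. Integrating once more recovers x(t).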

The solution of the differential equation

Consequently, any strictly monotonic function x(t) must be expressible in the form

$$x(t) = \beta_0 + \beta_1 \int_0^t \exp\left[ \int_0^u w(v) \, dv \right] du$$

This suggests that we transform the monotone smoothing problem into one of estimating function w(t), and constants $\beta_0$ and $\beta_1$.
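The formula can be checked numerically: whatever w(t) is chosen, the resulting x(t) is monotone. This Python/NumPy sketch evaluates the double integral with the trapezoidal rule on a fine grid starting at 0. The function names and the particular w(t) are assumptions for illustration, and nothing is estimated from data here.

```python
import numpy as np

def cumulative_trapezoid(f, t):
    """Cumulative trapezoidal integral of samples f over grid t, equal to 0 at t[0]."""
    steps = 0.5 * (f[1:] + f[:-1]) * np.diff(t)
    return np.concatenate([[0.0], np.cumsum(steps)])

def monotone_from_w(w, t, beta0, beta1):
    """x(t) = beta0 + beta1 * int_0^t exp( int_0^u w(v) dv ) du on a grid with t[0] = 0."""
    inner = cumulative_trapezoid(w, t)               # int_0^u w(v) dv
    return beta0 + beta1 * cumulative_trapezoid(np.exp(inner), t)

t = np.linspace(0.0, 1.0, 2001)
w = 3.0 * np.sin(12.0 * t) - 1.0                     # hypothetical, unconstrained w(t)
x = monotone_from_w(w, t, beta0=2.0, beta1=1.5)

# With w(t) = 0 the formula reduces to the straight line beta0 + beta1 * t.
x_line = monotone_from_w(np.zeros_like(t), t, beta0=2.0, beta1=1.5)
```

A positive `beta1` gives an increasing function and a negative one a decreasing function, so the monotonicity constraint never has to be imposed on w(t) directly.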

What we have learned

B-spline bases are a good choice for fitting non-periodic functions; Fourier series are right for periodic situations.

We can control smoothness by either using a restricted number of basis functions, or by imposing a roughness penalty.

Roughness penalty methods generally work better.

Differential equations can play a useful role when fitting constrained functions to data.

More information

Ramsay & Silverman (1997), Chs. 3, 4, 13.

Ramsay & Silverman (2002), Ch. 6.

The long-term growth data are from the Berkeley growth study.

The infant growth data were collected by Michael Hermanussen.