
Smoothing Spline ANOVA

Nathaniel E. Helwig

Assistant Professor of Psychology and Statistics
University of Minnesota (Twin Cities)

Updated 04-Jan-2017


Copyright

Copyright © 2017 by Nathaniel E. Helwig

Outline of Notes

1) Introduction
   Parametric regression
   Nonparametric regression
   Smoothing splines

2) Background Theory
   Averaging operators
   Hilbert spaces
   Reproducing kernels

3) Estimation & Inference
   Penalized least squares
   Smoothing parameter selection
   Bayesian confidence intervals

4) SSANOVA in Practice
   One-way SSANOVA
   Two-way SSANOVA (additive)
   Two-way SSANOVA (interactive)

For a thorough treatment see:

Gu, C. (2013). Smoothing Spline ANOVA Models (2nd ed.). New York: Springer-Verlag.


Introduction


Parametric Regression Model: Scalar Form

The multiple linear regression model has the form

$y_i = \sum_{j=1}^{p} b_j x_{ij} + e_i$

for $i \in \{1, \ldots, n\}$ where
$y_i \in \mathbb{R}$ is the real-valued response for the $i$-th observation
$b_j \in \mathbb{R}$ is the $j$-th predictor's regression slope
$x_{ij} \in \mathbb{R}$ is the $j$-th predictor for the $i$-th observation
$e_i \overset{\text{iid}}{\sim} N(0, \sigma^2)$ is Gaussian measurement error

Implies that $(y_i \mid x_{i1}, \ldots, x_{ip}) \overset{\text{ind}}{\sim} N\big(\sum_{j=1}^{p} b_j x_{ij},\, \sigma^2\big)$

Parametric Regression Model: Matrix Form

The multiple linear regression model has the form

$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$

where
$\mathbf{y} = (y_1, \ldots, y_n)' \in \mathbb{R}^n$ is the $n \times 1$ response vector
$\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_p] \in \mathbb{R}^{n \times p}$ is the $n \times p$ design matrix, where $\mathbf{x}_j = (x_{1j}, \ldots, x_{nj})' \in \mathbb{R}^n$ is the $j$-th predictor vector ($n \times 1$)
$\mathbf{b} = (b_1, \ldots, b_p)' \in \mathbb{R}^p$ is the $p \times 1$ vector of coefficients
$\mathbf{e} = (e_1, \ldots, e_n)' \in \mathbb{R}^n$ is the $n \times 1$ error vector

Implies that $(\mathbf{y} \mid \mathbf{x}) \sim N(\mathbf{X}\mathbf{b}, \sigma^2 \mathbf{I}_n)$

Ordinary Least Squares Solution

The ordinary least squares (OLS) problem is

$\min_{\mathbf{b} \in \mathbb{R}^p} \frac{1}{n}\|\mathbf{y} - \mathbf{X}\mathbf{b}\|^2 \;\longleftrightarrow\; \min_{\mathbf{b} \in \mathbb{R}^p} \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$

where $\|\cdot\|$ denotes the Euclidean norm and $\hat{y}_i = \sum_{j=1}^{p} b_j x_{ij}$.

The OLS solution has the form

$\hat{\mathbf{b}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$

and the fitted values corresponding to $\hat{\mathbf{b}}$ are given by

$\hat{\mathbf{y}} = \mathbf{X}\hat{\mathbf{b}} = \mathbf{H}\mathbf{y}$

where $\mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ is the hat matrix.
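To connect the formulas to computation, here is a small illustrative R sketch (not part of the original slides; the simulated data and variable names are arbitrary) that computes the OLS solution, the hat matrix, and the MSE estimate of $\sigma^2$:

set.seed(1)
n = 50; p = 3
X = matrix(rnorm(n*p), n, p)
b = c(1, -2, 0.5)
y = X %*% b + rnorm(n)
bhat = solve(crossprod(X), crossprod(X, y))   # (X'X)^{-1} X'y
H = X %*% solve(crossprod(X)) %*% t(X)        # hat matrix H = X (X'X)^{-1} X'
yhat = H %*% y                                # fitted values
max(abs(yhat - X %*% bhat))                   # ~0: both routes agree
sigma2 = sum((y - yhat)^2) / (n - p)          # MSE estimate of sigma^2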

Summary of Results

Using the model assumption $(\mathbf{y} \mid \mathbf{x}) \sim N(\mathbf{X}\mathbf{b}, \sigma^2 \mathbf{I}_n)$, we have

$\hat{\mathbf{b}} \sim N(\mathbf{b}, \sigma^2 (\mathbf{X}'\mathbf{X})^{-1})$
$\hat{\mathbf{y}} \sim N(\mathbf{X}\mathbf{b}, \sigma^2 \mathbf{H})$
$\hat{\mathbf{e}} \sim N(\mathbf{0}, \sigma^2 (\mathbf{I}_n - \mathbf{H}))$

where $\hat{\mathbf{e}} = \mathbf{y} - \hat{\mathbf{y}}$ is the residual vector.

Typically $\sigma^2$ is unknown, so we use the MSE $\hat{\sigma}^2 = \frac{1}{n-p}\sum_{i=1}^{n} \hat{e}_i^2$.

Nonparametric Regression Model

The Gaussian nonparametric regression model has the form

$y_i = \eta(\mathbf{x}_i) + e_i$

for $i \in \{1, \ldots, n\}$ where
$y_i \in \mathbb{R}$ is the real-valued response for the $i$-th observation
$\mathbf{x}_i \in \mathbb{R}^p$ is the predictor vector for the $i$-th observation
$\eta: \mathbb{R}^p \to \mathbb{R}$ is an unknown smooth function
$e_i \overset{\text{iid}}{\sim} N(0, \sigma^2)$ is Gaussian measurement error

Implies that $(y_i \mid x_{i1}, \ldots, x_{ip}) \overset{\text{ind}}{\sim} N(\eta(\mathbf{x}_i), \sigma^2)$

Additive versus Interactive Models

Suppose that $\mathbf{x}_i = (x_{i1}, x_{i2})$ with $x_{i1} \in \mathcal{X}_1$ and $x_{i2} \in \mathcal{X}_2$.

We could fit one of two possible models:

Additive: $\eta(\mathbf{x}_i) = \eta_0 + \eta_1(x_{i1}) + \eta_2(x_{i2})$
Interaction: $\eta(\mathbf{x}_i) = \eta_0 + \eta_1(x_{i1}) + \eta_2(x_{i2}) + \eta_{12}(x_{i1}, x_{i2})$

where
$\eta_0$ is a constant function
$\eta_1$ is the main effect of the first predictor
$\eta_2$ is the main effect of the second predictor
$\eta_{12}$ is the interaction effect

Example 1: Continuous and Nominal Covariates

$\mathbf{x}_i = (x_{i1}, x_{i2})$ with $x_{i1} \in [0,1]$ and $x_{i2} \in \{a, b\}$.

[Figure: two panels, "Additive" and "Interaction", plotting y versus x1 on [0,1], each with curves for x2 = a (solid) and x2 = b (dashed).]

Example 1: R Code

addfun = function(x1, x2){
  funval = sin(2*pi*x1)
  idx = which(x2 == "a")
  funval[idx] = funval[idx] + 2
  funval
}
intfun = function(x1, x2){
  funval = sin(2*pi*x1)
  idx = which(x2 == "a")
  funval[idx] = funval[idx] + 2 + sin(4*pi*x1[idx])
  funval
}

dev.new(width=12, height=6, noRStudioGD=TRUE)
par(mfrow=c(1,2))
x1 = seq(0, 1, length=200)
plot(x1, addfun(x1, rep("a",200)), type="l", ylim=c(-2,4), main="Additive",
     ylab="y", cex.axis=1.25, cex.lab=1.5, cex.main=3)
lines(x1, addfun(x1, rep("b",200)), lty=2)
legend("bottomleft", legend=c(expression(x[2]*" = "*a), expression(x[2]*" = "*b)),
       lty=1:2, bty="n", cex=1.5)
plot(x1, intfun(x1, rep("a",200)), type="l", ylim=c(-2,4), main="Interaction",
     ylab="y", cex.axis=1.25, cex.lab=1.5, cex.main=3)
lines(x1, intfun(x1, rep("b",200)), lty=2)
legend("bottomleft", legend=c(expression(x[2]*" = "*a), expression(x[2]*" = "*b)),
       lty=1:2, bty="n", cex=1.5)

Example 2: Two Continuous Covariates

$\mathbf{x}_i = (x_{i1}, x_{i2})$ with $x_{i1}, x_{i2} \in [0,1]$.

[Figure: two image plots, "Additive" and "Interaction", showing η over the unit square with x1 on the horizontal axis and x2 on the vertical axis.]

Example 2: R Code

addfun = function(x1, x2){
  sin(2*pi*x1) + cos(4*pi*x2*(1-x2))
}
intfun = function(x1, x2){
  sin(2*pi*x1) + cos(4*pi*x2*(1-x2)) + 2*sin(pi*(x1-x2))
}

xs = seq(0, 1, length=50)
xg = expand.grid(xs, xs)
dev.new(width=12, height=6, noRStudioGD=TRUE)
par(mfrow=c(1,2))
zmat = matrix(addfun(xg[,1], xg[,2]), 50, 50)
image(xs, xs, zmat, xlab="x1", ylab="x2", main="Additive",
      cex.axis=1.25, cex.lab=1.5, cex.main=3)
zmat = matrix(intfun(xg[,1], xg[,2]), 50, 50)
image(xs, xs, zmat, xlab="x1", ylab="x2", main="Interaction",
      cex.axis=1.25, cex.lab=1.5, cex.main=3)

Smoothing Splines on {1, . . . , K}

Suppose $x_i \in \{1, \ldots, K\}$ and note that $\eta_f$ can be identified with a vector of length $K$:
$\mathbf{f} = (f_1, \ldots, f_K)' \in \mathbb{R}^K$ is the vector corresponding to $\eta_f$, i.e., $\eta_f(1) = f_1,\ \eta_f(2) = f_2,\ \ldots,\ \eta_f(K) = f_K$
Let $\bar{\eta}_f = \sum_{x=1}^{K} \eta_f(x)/K$ denote the mean

A nominal smoothing spline is the $\eta_\lambda \in \mathbb{R}^K$ that minimizes

$\frac{1}{n}\sum_{i=1}^{n}(y_i - \eta_f(x_i))^2 + \lambda J(\eta_f)$

where $\lambda \geq 0$ is the smoothing parameter and $J(\eta_f)$ is the roughness penalty:
$J(\eta_f) = \sum_{x=1}^{K}(\eta_f(x) - \bar{\eta}_f)^2$ to shrink towards a constant
$J(\eta_f) = \sum_{x=1}^{K}\eta_f(x)^2$ to shrink towards zero
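To make the criterion concrete, here is a minimal illustrative sketch (not from the original slides; all names are arbitrary) that minimizes the nominal-spline objective with the shrink-toward-the-mean penalty. Both terms are quadratic in $\mathbf{f}$, so the minimizer is a ridge-type linear solve:

set.seed(1)
K = 4; n = 200
x = sample(1:K, n, replace=TRUE)
f.true = c(-1, 0, 1, 3)
y = f.true[x] + rnorm(n)
Z = model.matrix(~ factor(x, levels=1:K) - 1)  # n x K indicator matrix
M = diag(K) - matrix(1/K, K, K)                # penalty: shrink toward the mean
lambda = 0.1
# minimize (1/n)||y - Z f||^2 + lambda * f' M f  =>  (Z'Z/n + lambda M) f = Z'y/n
fhat = solve(crossprod(Z)/n + lambda*M, crossprod(Z, y)/n)
cbind(f.true, fhat, group.means = tapply(y, x, mean))  # fhat shrinks toward the mean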

Polynomial Smoothing Splines on [0,1]

Suppose $x_i \in [0,1]$ and let $\mathcal{C}^{(m)}[0,1] = \{\eta : \eta^{(m)} \in \mathcal{L}_2[0,1]\}$.
$\eta^{(m)} = \frac{d^m \eta}{dx^m}$ denotes the $m$-th derivative of $\eta$
$\mathcal{L}_2[0,1] = \{\eta : \int_0^1 \eta^2\, dx < \infty\}$

A polynomial smoothing spline is the $\eta_\lambda \in \mathcal{C}^{(m)}[0,1]$ that minimizes

$\frac{1}{n}\sum_{i=1}^{n}(y_i - \eta(x_i))^2 + \lambda \int_0^1 (\eta^{(m)})^2\, dx$

where $\lambda \geq 0$ is the smoothing parameter and $m$ is the spline order.
Related to the natural spline in the numerical analysis literature.

Cubic Smoothing Splines

Setting $m = 2$ results in the classic cubic smoothing spline.
$x_{(1)} < x_{(2)} < \cdots < x_{(q)}$ are "knots" (distinct $x_i$ values)
$\eta_\lambda$ is a piecewise cubic polynomial, and is linear beyond $x_{(1)}$ and $x_{(q)}$
$\eta_\lambda$ has two continuous derivatives, and its 3rd derivative jumps at the knots
As $\lambda \to 0$, $\eta_\lambda$ approaches the minimum-curvature interpolant
As $\lambda \to \infty$, $\eta_\lambda$ approaches the simple linear regression line

Can also view the cubic smoothing spline as the solution to

$\min \frac{1}{n}\sum_{i=1}^{n}(y_i - \eta(x_i))^2 \quad \text{subject to} \quad \int_0^1 (\eta'')^2\, dx \leq \rho$

for some $\rho \geq 0$, which is least squares with a soft constraint.

Example with R's spline Function

$y_i = \sin(2\pi x_i) + e_i$ where $x_i = i/20$ for $i \in \{0, 1, 2, \ldots, 20\}$ and
No Noise: $e_i = 0\ \forall i$
Some Noise: $e_i \overset{\text{iid}}{\sim} N(0, 0.15^2)$
More Noise: $e_i \overset{\text{iid}}{\sim} N(0, 0.25^2)$

[Figure: three panels ("No Noise", "Some Noise", "More Noise") plotting y versus x on [0,1], each showing the data points, the interpolating natural spline (solid), and the true function (dashed).]

spline Function (R code)

dev.new(width=12, height=4, noRStudioGD=TRUE)
par(mfrow=c(1,3))
x = seq(0, 1, length=21)
y = sin(2*pi*x)
mysp = spline(x, y, method="natural")
plot(x, y, main="No Noise", ylim=c(-1.5,1.5))
lines(x, sin(2*pi*x), lty=2)
lines(mysp)

set.seed(1)
x = seq(0, 1, length=21)
y = sin(2*pi*x) + rnorm(21, sd=0.15)
mysp = spline(x, y, method="natural")
plot(x, y, main="Some Noise", ylim=c(-1.5,1.5))
lines(x, sin(2*pi*x), lty=2)
lines(mysp)

set.seed(1)
x = seq(0, 1, length=21)
y = sin(2*pi*x) + rnorm(21, sd=0.25)
mysp = spline(x, y, method="natural")
plot(x, y, main="More Noise", ylim=c(-1.5,1.5))
lines(x, sin(2*pi*x), lty=2)
lines(mysp)

Same Example with R's smooth.spline Function

$y_i = \sin(2\pi x_i) + e_i$ where $x_i = i/20$ for $i \in \{0, 1, 2, \ldots, 20\}$ and
No Noise: $e_i = 0\ \forall i$
Some Noise: $e_i \overset{\text{iid}}{\sim} N(0, 0.15^2)$
More Noise: $e_i \overset{\text{iid}}{\sim} N(0, 0.25^2)$

[Figure: three panels ("No Noise", "Some Noise", "More Noise") plotting y versus x on [0,1], each showing the data points, the smooth.spline fit (solid), and the true function (dashed).]

smooth.spline Function (R code)

dev.new(width=12, height=4, noRStudioGD=TRUE)
par(mfrow=c(1,3))
set.seed(1)
x = seq(0, 1, length=21)
y = sin(2*pi*x)
mysp = smooth.spline(x, y)
plot(x, y, main="No Noise", ylim=c(-1.5,1.5))
lines(x, sin(2*pi*x), lty=2)
lines(x, mysp$y)

set.seed(1)
x = seq(0, 1, length=21)
y = sin(2*pi*x) + rnorm(21, sd=0.15)
mysp = smooth.spline(x, y)
plot(x, y, main="Some Noise", ylim=c(-1.5,1.5))
lines(x, sin(2*pi*x), lty=2)
lines(x, mysp$y)

set.seed(1)
x = seq(0, 1, length=21)
y = sin(2*pi*x) + rnorm(21, sd=0.25)
mysp = smooth.spline(x, y)
plot(x, y, main="More Noise", ylim=c(-1.5,1.5))
lines(x, sin(2*pi*x), lty=2)
lines(x, mysp$y)

Background Theory

One-Way ANOVA Decomposition

Consider the standard one-way ANOVA model

$y_{ij} = \mu_j + e_{ij}$

for $i \in \{1, \ldots, n_j\}$ and $j \in \{1, \ldots, K\}$.

Typically, we want to decompose the treatment effects such as

$\mu_j = \mu + \alpha_j$

where $\mu$ is the overall mean and $\alpha_j$ is the treatment effect such that
$\alpha_1 = 0$ if the first group is the control
$\sum_{j=1}^{K} \alpha_j = 0$ if using effect coding

One-Way ANOVA and Averaging Operators

Consider the standard one-way ANOVA model using a smoothing spline on $x_i \in \{1, \ldots, K\}$

$y_i = \eta(x_i) + e_i$

for $i \in \{1, \ldots, n\}$ where $n = \sum_{j=1}^{K} n_j$.

The ANOVA decomposition $\mu_j = \mu + \alpha_j$ can be written as

$\eta = A\eta + (I - A)\eta = \eta_0 + \eta_c$

where $A$ "averages out" $\eta$ to return a constant $\eta_0$.
$\alpha_1 = 0$ corresponds to $A\eta = \eta(1)$
$\sum_{j=1}^{K} \alpha_j = 0$ corresponds to $A\eta = \sum_{x=1}^{K} \eta(x)/K$

Averaging Operators on Continuous Domains

For a continuous domain $\mathcal{X} = [a,b]$ we can decompose $\eta$ such as

$\eta = A\eta + (I - A)\eta = \eta_0 + \eta_c$

where $A$ "averages out" $\eta$ to return a constant $\eta_0$.
Need an averaging operator $A$ defined such that $A(A\eta) = A\eta = \eta_0$
Need an identity operator $I$ defined such that $I\eta = \eta$

Note that $\eta_0$ is the overall constant, and $\eta_c$ is the treatment (contrast) effect.

For a function defined on $\mathcal{X} = [0,1]$, we could define
$A\eta = \eta(0)$
$A\eta = \int_0^1 \eta(z)\, dz$

Two-Way ANOVA Decomposition

Consider the standard two-way ANOVA model

$y_{ijk} = \mu_{jk} + e_{ijk}$

for $i \in \{1, \ldots, n_{jk}\}$, $j \in \{1, \ldots, a\}$, and $k \in \{1, \ldots, b\}$.

Typically, we want to decompose the treatment effects such as

$\mu_{jk} = \mu + \alpha_j + \beta_k + \gamma_{jk}$

where $\mu$ is the overall mean and
$\alpha_j$ is the main effect of Factor A such that $\sum_{j=1}^{a} \alpha_j = 0$
$\beta_k$ is the main effect of Factor B such that $\sum_{k=1}^{b} \beta_k = 0$
$\gamma_{jk}$ is the interaction effect such that $\sum_{j=1}^{a} \gamma_{jk} = \sum_{k=1}^{b} \gamma_{jk} = 0\ \forall j, k$

Two-Way ANOVA and Averaging Operators

Consider the standard two-way ANOVA model using a smoothing spline on $\mathbf{x}_i = (x_{i1}, x_{i2}) \in \mathcal{X}_1 \times \mathcal{X}_2 = \{1, \ldots, a\} \times \{1, \ldots, b\}$

$y_i = \eta(\mathbf{x}_i) + e_i$

for $i \in \{1, \ldots, n\}$ where $n = \sum_{j=1}^{a}\sum_{k=1}^{b} n_{jk}$.

The ANOVA decomposition $\mu_{jk} = \mu + \alpha_j + \beta_k + \gamma_{jk}$ can be written as

$\eta = [A_{\mathcal{X}_1} + (I - A_{\mathcal{X}_1})][A_{\mathcal{X}_2} + (I - A_{\mathcal{X}_2})]\eta = \underbrace{A_{\mathcal{X}_1}A_{\mathcal{X}_2}\eta}_{\eta_0} + \underbrace{(I - A_{\mathcal{X}_1})A_{\mathcal{X}_2}\eta}_{\eta_1} + \underbrace{A_{\mathcal{X}_1}(I - A_{\mathcal{X}_2})\eta}_{\eta_2} + \underbrace{(I - A_{\mathcal{X}_1})(I - A_{\mathcal{X}_2})\eta}_{\eta_{12}}$

where $A_{\mathcal{X}_1}$ and $A_{\mathcal{X}_2}$ are averaging operators such that
$A_{\mathcal{X}_1}(A_{\mathcal{X}_1}\eta) = A_{\mathcal{X}_1}\eta$ is constant for all $x_{i1} \in \mathcal{X}_1$
$A_{\mathcal{X}_2}(A_{\mathcal{X}_2}\eta) = A_{\mathcal{X}_2}\eta$ is constant for all $x_{i2} \in \mathcal{X}_2$

Linear Spaces and Functionals

Suppose that $\eta, \phi \in \mathcal{L}$ where the set $\mathcal{L}$ satisfies:
$\eta + \phi \in \mathcal{L}$
$a\eta \in \mathcal{L}$ for any scalar $a$

If these two conditions are met, we say that $\mathcal{L}$ is a linear space.

A functional $L$ in $\mathcal{L}$ operates on $\eta \in \mathcal{L}$ and returns a real number.
Linear functional: $L(\eta + \phi) = L\eta + L\phi$ and $L(a\eta) = aL\eta$
Bilinear functional: a functional $J(\cdot,\cdot)$ that is linear in each of its two arguments:
  $J(a\eta + b\phi, \psi) = aJ(\eta, \psi) + bJ(\phi, \psi)$
  $J(\eta, a\phi + b\psi) = aJ(\eta, \phi) + bJ(\eta, \psi)$
Symmetric: $J(\eta, \phi) = J(\phi, \eta)$ for all $\eta, \phi \in \mathcal{L}$
Positive definite: $J(\eta) = J(\eta, \eta) > 0$ for all nonzero $\eta \in \mathcal{L}$
Non-negative definite: $J(\eta) = J(\eta, \eta) \geq 0$ for all $\eta \in \mathcal{L}$
Quadratic: bilinear, symmetric, and non-negative definite

Inner Products and Norms

In a linear space $\mathcal{L}$, an inner product is a positive definite bilinear form. We will use the notation $\langle\cdot,\cdot\rangle$ to denote an inner product.

The inner product defines a norm in $\mathcal{L}$, which provides a metric to measure the distance between two objects $\eta, \phi \in \mathcal{L}$.
We will use the notation $\|\eta\| = \sqrt{\langle\eta,\eta\rangle}$ to denote the norm of $\eta$.
We will use the notation $D[\eta, \phi] = \|\eta - \phi\|$ to denote the distance between $\eta$ and $\phi$ in $\mathcal{L}$.

In any inner-product space $\mathcal{L}$ we have the following two rules:
Cauchy–Schwarz: $|\langle\eta,\phi\rangle| \leq \|\eta\|\|\phi\|$
Triangle: $\|\eta + \phi\| \leq \|\eta\| + \|\phi\|$

Null Spaces, Semi-Inner Products, and Semi-Norms

The null space of a non-negative definite bilinear form $J$ in a linear space $\mathcal{L}$ is defined as $\mathcal{N}_J = \{\eta : J(\eta, \eta) = 0,\ \eta \in \mathcal{L}\}$, and note that
$\mathcal{N}_J = \{0\}$ if $J$ is positive definite
$\mathcal{N}_J$ contains 0 and nonzero elements otherwise

A non-negative definite bilinear form $J$ in a linear space $\mathcal{L}$ defines a semi-inner-product in $\mathcal{L}$.
It induces a semi-norm $\sqrt{J(\eta)} = \sqrt{J(\eta, \eta)}$ in $\mathcal{L}$.
Similar to a norm, but $J(\eta) = 0$ does not imply $\eta = 0$.

Hilbert Spaces and Projections

A Hilbert space $\mathcal{H}$ is a complete inner-product linear space.
A sequence where $\lim_{m,n\to\infty}\|\eta_m - \eta_n\| = 0$ is a Cauchy sequence.
A linear space $\mathcal{L}$ is complete if every Cauchy sequence in $\mathcal{L}$ converges to some element in $\mathcal{L}$.

Any closed linear subspace of $\mathcal{H}$ (denoted $\mathcal{G} \subset \mathcal{H}$) is a Hilbert space.
The distance between $\eta \in \mathcal{H}$ and $\mathcal{G}$ is $D[\eta, \mathcal{G}] = \inf_{\phi \in \mathcal{G}} \|\eta - \phi\|$
There exists $\eta_\mathcal{G} \in \mathcal{G}$ such that $D[\eta, \mathcal{G}] = \|\eta - \eta_\mathcal{G}\|$
$\eta_\mathcal{G}$ is the unique projection of $\eta$ in the space $\mathcal{G}$

Tensor Sum Decompositions

Given $\eta \in \mathcal{H}$ and $\mathcal{G} \subset \mathcal{H}$, we have that $\langle \eta - \eta_\mathcal{G}, \phi \rangle = 0$ for all $\phi \in \mathcal{G}$.
$\mathcal{G}^c = \{\eta : \langle\eta,\phi\rangle = 0,\ \forall \phi \in \mathcal{G}\}$ is the orthogonal complement of $\mathcal{G}$
Tensor sum decomposition: $\mathcal{H} = \mathcal{G} \oplus \mathcal{G}^c$ and $\eta = \eta_\mathcal{G} + \eta_{\mathcal{G}^c}$

If $\mathcal{H}^n$ and $\mathcal{H}^c$ are Hilbert spaces with inner products $\langle\cdot,\cdot\rangle_n$ and $\langle\cdot,\cdot\rangle_c$, and if $\mathcal{H}^n \cap \mathcal{H}^c = \{0\}$, then $\mathcal{H} = \mathcal{H}^n \oplus \mathcal{H}^c$ is a Hilbert space with inner product $\langle\cdot,\cdot\rangle = \langle\cdot,\cdot\rangle_n + \langle\cdot,\cdot\rangle_c$.

Consider a null space $\mathcal{N}_J$ corresponding to a semi-inner-product $J$ in the space $\mathcal{H}$, and define $\tilde{J}(\cdot,\cdot)$ such that
1. $\tilde{J}(\cdot,\cdot)$ defines a full inner product in the space $\mathcal{N}_J$
2. $(\forall \eta \in \mathcal{H})(\exists \phi \in \mathcal{N}_J)$ such that $\tilde{J}(\eta - \phi) = 0$

Then $(J + \tilde{J})(\eta, \phi)$ defines a full inner product in $\mathcal{H}$.

Hilbert Space Example: $\mathbb{R}^K$

Note that a Hilbert space is a generalization of Euclidean space $\mathbb{R}^K$.

For any vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^K$, the inner product is $\langle\mathbf{x},\mathbf{y}\rangle = \mathbf{x}'\mathbf{y} = \sum_{i=1}^{K} x_i y_i$

$\langle\mathbf{x},\mathbf{y}\rangle = \langle\mathbf{x},\mathbf{y}\rangle_n + \langle\mathbf{x},\mathbf{y}\rangle_c = \mathbf{x}'\left[\tfrac{1}{K}\mathbf{1}_K\mathbf{1}_K' + \left(\mathbf{I}_K - \tfrac{1}{K}\mathbf{1}_K\mathbf{1}_K'\right)\right]\mathbf{y}$

$\mathcal{H}^n = \{\eta : \eta(1) = \cdots = \eta(K)\}$ and $\mathcal{H}^c = \{\eta : \sum_{x=1}^{K} \eta(x) = 0\}$

This corresponds to the classic one-way ANOVA decomposition

$\mu_j = \mu + \alpha_j$

with the constraint $\sum_j \alpha_j = 0$.
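A quick numerical check of this decomposition (illustrative R code, not from the original slides): the two projection matrices below are idempotent, orthogonal to each other, and sum to the identity, so every vector splits into a constant (mean) part plus a sum-to-zero contrast part.

K = 5
J = matrix(1/K, K, K)           # projection onto the constant functions (H^n)
C = diag(K) - J                 # projection onto sum-to-zero contrasts (H^c)
x = rnorm(K); y = rnorm(K)
sum(x*y)                                    # <x,y>
sum(x * (J %*% y)) + sum(x * (C %*% y))     # <x,y>_n + <x,y>_c (same value)
xn = J %*% x; xc = C %*% x
all.equal(as.numeric(xn + xc), x)           # TRUE: x = xn + xc
sum(xc)                                     # ~0: contrast part sums to zero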

Riesz Representation Theorem

For every $\phi$ in a Hilbert space $\mathcal{H}$, the functional $L_\phi\eta = \langle\phi,\eta\rangle$ defines a continuous linear functional $L_\phi$.
$L$ is continuous if $\lim_{n\to\infty} L\eta_n = L\eta$ whenever $\lim_{n\to\infty} \eta_n = \eta$

Every continuous linear functional $L$ in $\mathcal{H}$ has a representation $L\eta = \langle\phi_L, \eta\rangle$ for some $\phi_L \in \mathcal{H}$, which is called the representer of $L$.

Theorem. For every continuous linear functional $L$ in a Hilbert space $\mathcal{H}$, there exists a unique $\phi_L \in \mathcal{H}$ such that $L\eta = \langle\phi_L, \eta\rangle$ for all $\eta \in \mathcal{H}$.

Reproducing Kernel Hilbert Spaces

To estimate an SSANOVA, we need to evaluate $\eta$ for different $x \in \mathcal{X}$.
Need continuity of the evaluation functional: $[x]\eta = \eta(x)$

Consider a Hilbert space $\mathcal{H}$ of functions on the domain $\mathcal{X}$.
If the evaluation functional $[x]\eta = \eta(x)$ is continuous in $\mathcal{H}$ for all $x \in \mathcal{X}$, then we say that $\mathcal{H}$ is a reproducing kernel Hilbert space (RKHS).
By the Riesz Representation Theorem, there exists $\rho_x \in \mathcal{H}$, which is the representer of the evaluation functional $[x]\eta = \eta(x)$.
The symmetric bivariate function $\rho(x, y) = \rho_x(y) = \langle\rho_x, \rho_y\rangle$ has the reproducing property $\langle\rho(x, \cdot), \eta(\cdot)\rangle = \eta(x)$.
Consequently, $\rho$ is called the reproducing kernel (RK) of the space $\mathcal{H}$.

Examples of Reproducing Kernel Hilbert Spaces

Consider the Euclidean space $\mathbb{R}^K$, which is an RKHS.
The inner product is defined as $\langle\mathbf{x},\mathbf{y}\rangle = \sum_{i=1}^{K} x_i y_i$
The RK is defined as $\rho(x, y) = I_{\{x=y\}}$, the indicator function

Consider the space $\mathcal{L}_2[0,1] = \{\eta : \int_0^1 \eta^2\, dx < \infty\}$.
Elements of $\mathcal{L}_2[0,1]$ are defined via equivalence classes (not via individual functions).
NOT an RKHS, because the evaluation functional is not well-defined.

Consider the space $\mathcal{C}^{(m)}[0,1] = \{\eta : \eta^{(m)} \in \mathcal{L}_2[0,1]\}$.
Elements of $\mathcal{C}^{(m)}[0,1]$ are defined via individual functions.
The evaluation functional is continuous, so we have an RKHS.

Tensor Sum Decompositions of RKHS

Given the tensor sum decomposition $\mathcal{H} = \mathcal{H}^n \oplus \mathcal{H}^c$, we have

$\rho = \rho_n + \rho_c$

where $\rho$ is the RK of $\mathcal{H}$, $\rho_n$ is the RK of $\mathcal{H}^n$, and $\rho_c$ is the RK of $\mathcal{H}^c$.

Furthermore, if $\rho$ is the RK of $\mathcal{H}$ and if $\rho = \rho_n + \rho_c$ where
$\rho_n, \rho_c \in \mathcal{H}$ are non-negative definite for all $x \in \mathcal{X}$
$\langle\rho_n(x, \cdot), \rho_c(y, \cdot)\rangle = 0$ for all $x, y \in \mathcal{X}$

then the spaces $\mathcal{H}^n$ and $\mathcal{H}^c$ form a tensor sum decomposition of $\mathcal{H}$.

Reproducing Kernel for Nominal Smoothing Splines

Suppose that $x_i \in \mathcal{X} = \{1, \ldots, K\}$ and $\eta \in \mathcal{H} = \mathbb{R}^K$.

For any elements $\eta, \phi \in \mathcal{H}$, we have
$\langle\eta,\phi\rangle = \eta'\phi = \sum_{x=1}^{K} \eta(x)\phi(x)$
$\rho(x, y) = I_{\{x=y\}}$ where $I_{\{\cdot\}}$ is the indicator function

Using the averaging operator $A\eta = \sum_{x=1}^{K} \eta(x)/K$:
$\langle\eta,\phi\rangle = \langle\eta,\phi\rangle_n + \langle\eta,\phi\rangle_c = \eta'\left[\tfrac{1}{K}\mathbf{1}_K\mathbf{1}_K' + \left(\mathbf{I}_K - \tfrac{1}{K}\mathbf{1}_K\mathbf{1}_K'\right)\right]\phi$
$\rho(x, y) = \rho_n(x, y) + \rho_c(x, y) = \tfrac{1}{K} + \left(I_{\{x=y\}} - \tfrac{1}{K}\right)$
$\mathcal{H}^n = \{\eta : \eta(1) = \cdots = \eta(K)\}$ and $\mathcal{H}^c = \{\eta : \sum_{x=1}^{K} \eta(x) = 0\}$
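To see the reproducing property in matrix terms, here is a small illustrative check (not from the original slides): taking $R_c = \mathbf{I}_K - \tfrac{1}{K}\mathbf{1}_K\mathbf{1}_K'$ as the Gram matrix of $\rho_c$, the inner product of $\rho_c(x, \cdot)$ with any sum-to-zero $\eta$ returns $\eta(x)$.

K = 4
Rn = matrix(1/K, K, K)                  # Gram matrix of rho_n
Rc = diag(K) - Rn                       # Gram matrix of rho_c
eta = rnorm(K); eta = eta - mean(eta)   # an element of H^c (sums to zero)
# reproducing property in H^c: <rho_c(x,.), eta> = eta(x) for each x
as.numeric(Rc %*% eta)                  # equals eta (since eta has mean zero)
eta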

Reproducing Kernel for Polynomial Smoothing Splines

Suppose that $x_i \in \mathcal{X} = [0,1]$ and $\eta \in \mathcal{H} = \mathcal{C}^{(m)}[0,1]$.

Using the averaging operator $A\eta = \int_0^1 \eta\, dx$:
$\langle\eta,\phi\rangle = \langle\eta,\phi\rangle_n + \langle\eta,\phi\rangle_c = \sum_{\nu=0}^{m-1}\left(\int_0^1 \eta^{(\nu)} dx\right)\left(\int_0^1 \phi^{(\nu)} dx\right) + \int_0^1 \eta^{(m)}\phi^{(m)} dx$
$\rho(x, y) = \rho_n(x, y) + \rho_c(x, y) = \sum_{\nu=0}^{m-1} k_\nu(x)k_\nu(y) + (-1)^{m-1}k_{2m}(|x - y|)$, where $k_\nu(x)$ is a scaled Bernoulli polynomial
$\mathcal{H}^n = \{\eta : \eta^{(m)} = 0\}$ and $\mathcal{H}^c = \{\eta : \int_0^1 \eta^{(\nu)} dx = 0,\ \nu = 0, \ldots, m-1,\ \eta^{(m)} \in \mathcal{L}_2[0,1]\}$

Using the averaging operator $A\eta = \eta(0)$:
$\langle\eta,\phi\rangle = \langle\eta,\phi\rangle_n + \langle\eta,\phi\rangle_c = \sum_{\nu=0}^{m-1} \eta^{(\nu)}(0)\phi^{(\nu)}(0) + \int_0^1 \eta^{(m)}\phi^{(m)} dx$
$\rho(x, y) = \rho_n(x, y) + \rho_c(x, y) = \sum_{\nu=0}^{m-1} \frac{x^\nu}{\nu!}\frac{y^\nu}{\nu!} + \int_0^1 \frac{(x-u)_+^{m-1}}{(m-1)!}\frac{(y-u)_+^{m-1}}{(m-1)!} du$
$\mathcal{H}^n = \{\eta : \eta^{(m)} = 0\}$ and $\mathcal{H}^c = \{\eta : \eta^{(\nu)}(0) = 0,\ \nu = 0, \ldots, m-1,\ \eta^{(m)} \in \mathcal{L}_2[0,1]\}$

Tensor Product RKHS

Suppose that $\mathbf{x}_i \in \mathcal{X}$ where $\mathcal{X} = \mathcal{X}_1 \times \cdots \times \mathcal{X}_p$ is a product domain.
Suppose $\mathcal{H}_{\mathcal{X}_j}$ is an RKHS of functions with RK $\rho_{\mathcal{X}_j}$ for all $x_j \in \mathcal{X}_j$.
Note that the marginal RKs have the form $\rho_{\mathcal{X}_j} = \rho_{n_j} + \rho_{c_j}$.

We can define $\rho_\mathcal{X} = \prod_{j=1}^{p} \rho_{\mathcal{X}_j} = \prod_{j=1}^{p} (\rho_{n_j} + \rho_{c_j})$
$\rho_\mathcal{X}$ is non-negative definite for all $\mathbf{x} \in \mathcal{X}$
$\rho_\mathcal{X}$ is the RK of the tensor product RKHS $\mathcal{H} = \mathcal{H}_{\mathcal{X}_1} \otimes \cdots \otimes \mathcal{H}_{\mathcal{X}_p}$

We can form functional spaces for any number of covariates.
Can constrain and/or remove subspaces to fit different models.

Need for Additional Smoothing Parameters

Given a tensor product RKHS $\mathcal{H} = \mathcal{H}_{\mathcal{X}_1} \otimes \cdots \otimes \mathcal{H}_{\mathcal{X}_p}$ we have that. . .
$\mathcal{H} = \otimes_{j=1}^{p}(\mathcal{H}^{n_j} \oplus \mathcal{H}^{c_j}) = \oplus_{k=1}^{s} \mathcal{H}^k$ is a tensor sum decomposition
Each subspace $\mathcal{H}^k$ has inner product $\langle\cdot,\cdot\rangle_k$ and RK $\rho_k$
Inner products and RKs have different metrics

Can introduce additional smoothing parameters into the inner product:

$\langle\cdot,\cdot\rangle = \sum_{k=1}^{s} \theta_k^{-1}\langle\cdot,\cdot\rangle_k$

which corresponds to the tensor product RK

$\rho = \sum_{k=1}^{s} \theta_k \rho_k$
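For intuition, the sketch below (illustrative, not from the original slides) builds the $\theta$-weighted contrast-space kernel for two nominal covariates: the marginal RKs $\rho_{n_j}$ and $\rho_{c_j}$ combine through elementwise (Hadamard) products, giving one $\rho_k$ per subspace (two main effects and one interaction), and $\rho_c = \sum_k \theta_k \rho_k$.

a = 2; b = 3   # numbers of levels for the two nominal covariates
rk.nom = function(K) list(n = matrix(1/K, K, K), c = diag(K) - matrix(1/K, K, K))
r1 = rk.nom(a); r2 = rk.nom(b)
g = expand.grid(x1 = 1:a, x2 = 1:b)           # all (x1, x2) cells
rho1 = r1$c[g$x1, g$x1] * r2$n[g$x2, g$x2]    # main effect of x1
rho2 = r1$n[g$x1, g$x1] * r2$c[g$x2, g$x2]    # main effect of x2
rho12 = r1$c[g$x1, g$x1] * r2$c[g$x2, g$x2]   # interaction effect
theta = c(1, 1, 0.5)                          # subspace smoothing parameters
rho = theta[1]*rho1 + theta[2]*rho2 + theta[3]*rho12  # contrast-space RK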

Estimation and Inference

Tensor Product Smoothing Spline

Given $\mathbf{x}_i \in \mathcal{X} = \mathcal{X}_1 \times \cdots \times \mathcal{X}_p$, a tensor product smoothing spline is the $\eta_\lambda \in \mathcal{H} = \mathcal{H}_{\mathcal{X}_1} \otimes \cdots \otimes \mathcal{H}_{\mathcal{X}_p}$ that minimizes

$\frac{1}{n}\sum_{i=1}^{n}(y_i - \eta(\mathbf{x}_i))^2 + \lambda J(\eta)$

where
$\lambda \geq 0$ is the overall (global) smoothing parameter
$J$ is a quadratic functional quantifying the roughness of $\eta$
Additional smoothing parameters $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_s)$ exist within $J$

Representation of η

Let $\mathcal{H} = \mathcal{H}^n \oplus \mathcal{H}^c$ denote the tensor sum decomposition of the tensor product RKHS $\mathcal{H} = \otimes_{j=1}^{p}\mathcal{H}_{\mathcal{X}_j}$.
Note that $\mathcal{H}$ has RK $\rho = \rho_n + \rho_c$ where $\rho_c = \sum_{k=1}^{s} \theta_k \rho_k$.

Given fixed smoothing parameters $\boldsymbol{\theta}$, the $\eta \in \mathcal{H}$ that minimizes the penalized least squares functional can be written as

$\eta(\mathbf{x}) = \sum_{v=1}^{m} d_v \phi_v(\mathbf{x}) + \sum_{i=1}^{n} c_i \rho_c(\mathbf{x}_i, \mathbf{x}) \qquad (1)$

where $\{\phi_v\}_{v=1}^{m}$ is a set of known functions spanning $\mathcal{H}^n$, $\rho_c$ is the reproducing kernel (RK) of $\mathcal{H}^c$, and $\mathbf{d} \equiv \{d_v\}_{m \times 1}$ and $\mathbf{c} \equiv \{c_i\}_{n \times 1}$ are the (unknown) basis function coefficient vectors.

Penalty of η

Given the tensor sum decomposition $\mathcal{H} = \mathcal{H}^n \oplus \mathcal{H}^c$ for some tensor product RKHS $\mathcal{H} = \otimes_{j=1}^{p}\mathcal{H}_{\mathcal{X}_j}$, we define the penalty functional

$J(\eta) = \langle\eta,\eta\rangle_c$

which is a semi-inner-product with null space $\mathcal{H}^n$.

Using the representation of $\eta(\mathbf{x})$ on the previous slide, we have

$\langle\eta,\eta\rangle_c = \left\langle \sum_{v=1}^{m} d_v\phi_v + \sum_{i=1}^{n} c_i\rho_c(\mathbf{x}_i, \cdot),\; \sum_{v=1}^{m} d_v\phi_v + \sum_{i=1}^{n} c_i\rho_c(\mathbf{x}_i, \cdot) \right\rangle_c$
$= \left\langle \sum_{i=1}^{n} c_i\rho_c(\mathbf{x}_i, \cdot),\; \sum_{i=1}^{n} c_i\rho_c(\mathbf{x}_i, \cdot) \right\rangle_c$
$= \sum_{i=1}^{n}\sum_{j=1}^{n} c_i c_j \langle\rho_c(\mathbf{x}_i, \cdot), \rho_c(\mathbf{x}_j, \cdot)\rangle_c = \sum_{i=1}^{n}\sum_{j=1}^{n} c_i c_j \rho_c(\mathbf{x}_i, \mathbf{x}_j)$

where the $\phi_v$ terms drop out because they lie in $\mathcal{H}^n$, the null space of $\langle\cdot,\cdot\rangle_c$.

Penalized Least Squares Problem

Using $\{\mathbf{x}_u^*\}_{u=1}^{q} \subset \{\mathbf{x}_i\}_{i=1}^{n}$ as knots, the penalized least squares functional can be approximated as

$\|\mathbf{y} - \mathbf{K}\mathbf{d} - \mathbf{J}_\theta\mathbf{c}\|^2 + \lambda n \mathbf{c}'\mathbf{Q}_\theta\mathbf{c}$

where
$\mathbf{y} = (y_1, \ldots, y_n)'$ is the response vector
$\mathbf{K} = \{\phi_v(\mathbf{x}_i)\}_{n \times m}$ is the null space basis function matrix
$\mathbf{J}_\theta = \{\rho_c(\mathbf{x}_i, \mathbf{x}_u^*)\}_{n \times q}$ is the contrast space basis function matrix; note $\mathbf{J}_\theta = \sum_{k=1}^{s} \theta_k \mathbf{J}_k$ where $\mathbf{J}_k = \{\rho_k(\mathbf{x}_i, \mathbf{x}_u^*)\}_{n \times q}$
$\mathbf{Q}_\theta = \{\rho_c(\mathbf{x}_t^*, \mathbf{x}_u^*)\}_{q \times q}$ is the penalty matrix; note $\mathbf{Q}_\theta = \sum_{k=1}^{s} \theta_k \mathbf{Q}_k$ where $\mathbf{Q}_k = \{\rho_k(\mathbf{x}_t^*, \mathbf{x}_u^*)\}_{q \times q}$
$\mathbf{d} = (d_1, \ldots, d_m)'$ and $\mathbf{c} = (c_1, \ldots, c_q)'$ are unknown coefficients

Coefficients and Smoothing Matrix

The coefficients minimizing the penalized least squares function are

$\begin{pmatrix}\hat{\mathbf{d}}\\ \hat{\mathbf{c}}\end{pmatrix} = \begin{pmatrix}\mathbf{K}'\mathbf{K} & \mathbf{K}'\mathbf{J}_\theta\\ \mathbf{J}_\theta'\mathbf{K} & \mathbf{J}_\theta'\mathbf{J}_\theta + \lambda n\mathbf{Q}_\theta\end{pmatrix}^{\dagger}\begin{pmatrix}\mathbf{K}'\\ \mathbf{J}_\theta'\end{pmatrix}\mathbf{y}$

where $(\cdot)^{\dagger}$ denotes the Moore–Penrose pseudoinverse.

The fitted values are given by $\hat{\mathbf{y}} = \mathbf{K}\hat{\mathbf{d}} + \mathbf{J}_\theta\hat{\mathbf{c}} = \mathbf{S}_\lambda\mathbf{y}$ where

$\mathbf{S}_\lambda = \begin{pmatrix}\mathbf{K} & \mathbf{J}_\theta\end{pmatrix}\begin{pmatrix}\mathbf{K}'\mathbf{K} & \mathbf{K}'\mathbf{J}_\theta\\ \mathbf{J}_\theta'\mathbf{K} & \mathbf{J}_\theta'\mathbf{J}_\theta + \lambda n\mathbf{Q}_\theta\end{pmatrix}^{\dagger}\begin{pmatrix}\mathbf{K}'\\ \mathbf{J}_\theta'\end{pmatrix}$

is the smoothing matrix, which depends on $\boldsymbol{\lambda} = (\lambda/\theta_1, \ldots, \lambda/\theta_s)$.
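Here is a minimal end-to-end sketch (illustrative, not from the original slides): for a cubic spline ($m = 2$) with the averaging operator $A\eta = \eta(0)$, the null space basis is $\{1, x\}$ and $\rho_c(x, y) = \int_0^1 (x-u)_+(y-u)_+\, du$, which has the closed form coded below; we then build $\mathbf{K}$, $\mathbf{J}$, $\mathbf{Q}$, solve the penalized system via the pseudoinverse (using MASS::ginv), and form the fitted values.

set.seed(1)
n = 100; q = 20; lambda = 1e-4
x = sort(runif(n))
y = sin(2*pi*x) + rnorm(n, sd=0.3)
rc = function(x, y){ s = pmin(x, y); x*y*s - (x+y)*s^2/2 + s^3/3 }  # rho_c for m=2
xk = quantile(x, probs = seq(0, 1, length=q))    # knots
K = cbind(1, x)                                  # null space basis {1, x}
J = outer(x, xk, rc)                             # n x q contrast basis matrix
Q = outer(xk, xk, rc)                            # q x q penalty matrix
M = rbind(cbind(crossprod(K), crossprod(K, J)),
          cbind(crossprod(J, K), crossprod(J) + n*lambda*Q))
dc = MASS::ginv(M) %*% rbind(t(K), t(J)) %*% y   # coefficients (d, c)
yhat = cbind(K, J) %*% dc                        # fitted values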

Smoothing Parameter Goldilocks Phenomenon

The selection of λ is the crucial step when fitting an SSANOVA model!

If $\lambda_k \equiv \lambda/\theta_k$ is too large, the penalty corresponding to $\mathcal{H}^k$ will be too severe, making it difficult to estimate $\eta_k$.
Oversmooths the $k$-th contrast space.

If $\lambda_k \equiv \lambda/\theta_k$ is too small, the penalty corresponding to $\mathcal{H}^k$ will be too lenient, making it difficult to estimate $\eta_k$ (assuming noisy data).
Undersmooths the $k$-th contrast space.

Cross-Validation

If $\sigma^2$ is unknown, a reasonable loss function for selecting $\lambda$ is the cross-validated loss function

$\mathrm{CV}(\lambda \mid \mathbf{y}, \mathbf{X}, \mathbf{w}) = (1/n)\sum_{i=1}^{n} w_i\big(y_i - \eta_\lambda^{[i]}(\mathbf{x}_i)\big)^2$

where $w_i > 0$ is some weight, and $\eta_\lambda^{[i]}$ is the function $\phi \in \mathcal{H}$ that minimizes the delete-the-$i$-th-observation functional:

$(1/n)\sum_{j \neq i}(y_j - \phi(\mathbf{x}_j))^2 + \lambda J(\phi)$

Cross-Validation (continued)

The form of the CV loss function might suggest that it is necessary to fit n different models (to obtain $\eta_\lambda^{[i]}$ for $i \in \{1, \ldots, n\}$).

However, the CV function can be rewritten as

$\mathrm{CV}(\lambda \mid \mathbf{y}, \mathbf{X}, \mathbf{w}) = (1/n)\sum_{i=1}^{n} \frac{w_i\big(y_i - \eta_\lambda(\mathbf{x}_i)\big)^2}{(1 - s_{ii}(\lambda))^2}$

where $s_{ii}(\lambda)$ is the $i$-th diagonal of the smoothing matrix $\mathbf{S}_\lambda$, which implies that the CV function can be minimized using the results of the full model.
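The identity is easy to verify numerically for a linear smoother. The sketch below (illustrative, not from the original slides) uses a simple ridge smoother, for which the leave-one-out shortcut is exact, and compares brute-force leave-one-out fits against the diagonal shortcut with $w_i = 1$:

set.seed(1)
n = 30
x = seq(0, 1, length=n)
y = sin(2*pi*x) + rnorm(n, sd=0.3)
X = cbind(1, x, x^2, x^3)      # any linear smoother works; here a ridge fit
lam = 0.01
S = X %*% solve(crossprod(X) + lam*diag(4), t(X))   # smoothing ("hat") matrix
yhat = S %*% y
cv.short = mean(((y - yhat)/(1 - diag(S)))^2)       # shortcut formula
cv.brute = mean(sapply(1:n, function(i){            # refit without observation i
  fi = X[i,] %*% solve(crossprod(X[-i,]) + lam*diag(4), crossprod(X[-i,], y[-i]))
  (y[i] - fi)^2
}))
c(cv.short, cv.brute)   # the two values agree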

Generalized Cross-Validation

Defining $w_i \equiv (1 - s_{ii}(\lambda))^2/[n^{-1}\mathrm{tr}(\mathbf{I}_n - \mathbf{S}_\lambda)]^2$ replaces each $s_{ii}(\lambda)$ with its average value, producing the generalized cross-validation (GCV) criterion of Craven and Wahba (1979):

$\mathrm{GCV}(\lambda \mid \mathbf{y}, \mathbf{X}) = (1/n)\sum_{i=1}^{n} \frac{\big(y_i - \eta_\lambda(\mathbf{x}_i)\big)^2}{[n^{-1}\mathrm{tr}(\mathbf{I}_n - \mathbf{S}_\lambda)]^2} = \frac{(1/n)\|(\mathbf{I}_n - \mathbf{S}_\lambda)\mathbf{y}\|^2}{[1 - \mathrm{tr}(\mathbf{S}_\lambda)/n]^2} \qquad (2)$

The λ that minimizes the GCV score produces good estimates of η.
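GCV needs only the residuals and the trace of $\mathbf{S}_\lambda$, so a grid search is a few lines. This illustrative sketch (not from the original slides) reuses the toy ridge smoother from above:

set.seed(1)
n = 30
x = seq(0, 1, length=n)
y = sin(2*pi*x) + rnorm(n, sd=0.3)
X = cbind(1, x, x^2, x^3)            # same toy linear smoother as above
gcv = function(lam, X, y){
  S = X %*% solve(crossprod(X) + lam*diag(ncol(X)), t(X))
  mean(((diag(nrow(S)) - S) %*% y)^2) / (1 - sum(diag(S))/nrow(S))^2
}
lams = 10^seq(-8, 2, length=50)
scores = sapply(lams, gcv, X=X, y=y)
lams[which.min(scores)]              # GCV-selected smoothing parameter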

Gaussian Process Definition

A Gaussian process is a stochastic process $\{\eta(\mathbf{x}) : \mathbf{x} \in \mathcal{X}\}$ such that $\eta(\mathbf{x}) \sim N(\mu_\mathbf{x}, \sigma_\mathbf{x}^2)$ for all $\mathbf{x} \in \mathcal{X}$, where
$\mu_\mathbf{x} = E(\eta(\mathbf{x}))$ is the mean function
$\gamma_{\mathbf{x},\mathbf{x}'} = \mathrm{Cov}(\eta(\mathbf{x}), \eta(\mathbf{x}'))$ is the covariance function
$\sigma_\mathbf{x}^2 = \mathrm{Cov}(\eta(\mathbf{x}), \eta(\mathbf{x}))$ is the variance function

Note that $\eta(\mathbf{x})$ is a random variable that is normally distributed for all $\mathbf{x} \in \mathcal{X}$.
We use the notation $\eta(\mathbf{x}) \sim N(\mu_\mathbf{x}, \sigma_\mathbf{x}^2)$ for all $\mathbf{x} \in \mathcal{X}$.
The mean and variance differ for each $\mathbf{x} \in \mathcal{X}$.

Bayesian Interpretation of Smoothing Splines

Let $\eta = \eta_n + \eta_c$ denote the null and contrast space functions and assume the following prior distributions:
$\eta_n$ has a diffuse (vague) prior with mean zero
$\eta_c$ is a zero-mean Gaussian process with covariance function proportional to $\rho_c$

Using these prior assumptions. . .
$\hat{\eta}$ can be interpreted as the posterior mean of $\eta$ given the data $\mathbf{y}$
we can derive the posterior variance $\mathrm{Var}(\eta \mid \mathbf{y})$

Bayesian Confidence Intervals

Using the Bayesian interpretation, we can form confidence intervals

$\hat{\eta}(\mathbf{x}) \pm Z_{\alpha/2}\sqrt{\mathrm{Var}(\eta \mid \mathbf{y})}$

where $Z_{\alpha/2}$ is the critical value from the standard normal distribution.

Bayesian CIs have approximate "across-the-function coverage" when the smoothing parameters are selected according to GCV.
On average they contain $100(1-\alpha)\%$ of the true function's realizations.

SSANOVA in Practice

Unidimensional Smoothing Splines in R

Many options for unidimensional smoothing splines in R:
smooth.spline function (in stats package)
bigspline function (in bigsplines package)
bigssa function (in bigsplines package)
ssanova function (in gss package)
gam function (in mgcv package)

For unidimensional smoothing, we will focus on the smooth.spline and bigspline functions, which have simple syntax.

smooth.spline: Overview

> set.seed(1)
> x = seq(0,1,length=100)
> eta = 2 + x + sin(2*pi*x)
> y = eta + rnorm(100)
> plot(x,y)
> smsp = smooth.spline(x,y)
> lines(x,smsp$y)
> lines(x,eta,lty=2)

[Figure: scatterplot of y versus x with the smooth.spline fit (solid) and the true function (dashed).]

smooth.spline: Changing Smoothing Parameter

[Figure: three panels showing smooth.spline fits with spar=0.25, spar=0.75, and spar=1.]

R code for the leftmost plot:
> smsp = smooth.spline(x,y,spar=0.25)
> plot(x,y,main="spar=0.25")
> lines(x,smsp$y)
> lines(x,eta,lty=2)

smooth.spline: Changing Number of Knots

[Figure: three panels showing smooth.spline fits with spar=0.5 and nknots=10, 20, and 30.]

R code for the leftmost plot:
> smsp = smooth.spline(x,y,spar=0.5,nknots=10)
> plot(x,y,main="spar=0.5, nknots=10")
> lines(x,smsp$y)
> lines(x,eta,lty=2)

smooth.spline: CV versus GCV

[Figure: two panels showing fits with nknots=20, using ordinary CV (cv=TRUE) and GCV (cv=FALSE).]

R code for the leftmost plot:
> smsp = smooth.spline(x,y,nknots=20,cv=TRUE)
> plot(x,y,main="nknots=20, cv=TRUE")
> lines(x,smsp$y)
> lines(x,eta,lty=2)

smooth.spline: Number of Knots (revisited)

[Figure: three panels showing GCV fits (cv=FALSE) with nknots=10, 20, and 30.]

R code for the leftmost plot:
> smsp = smooth.spline(x,y,nknots=10)
> plot(x,y,main="cv=FALSE, nknots=10")
> lines(x,smsp$y)
> lines(x,eta,lty=2)

smooth.spline: Predicting for New Data

Given $\hat{\eta}$ we can predict for a new sequence of data:

> set.seed(1)
> x = seq(0,1,length=100)
> eta = 2 + x + sin(2*pi*x)
> y = eta + rnorm(100)
> plot(x,y,main="Prediction")
> smsp = smooth.spline(x,y)
> newdata = seq(0,1,length=200)
> yhat = predict(smsp,newdata)
> lines(yhat)
> lines(x,eta,lty=2)

[Figure: scatterplot with predictions at the new x values (solid) and the true function (dashed).]

bigspline: Overview

For smoothing large samples. . .

> set.seed(1)
> x = seq(0,1,length=100)
> eta = 2 + x + sin(2*pi*x)
> y = eta + rnorm(100)
> plot(x,y)
> bigsp = bigspline(x,y)
> lines(x,bigsp$fitted)
> lines(x,eta,lty=2)

[Figure: scatterplot of y versus x with the bigspline fit (solid) and the true function (dashed).]

bigspline: Changing Smoothing Parameter

[Figure: three panels showing bigspline fits with lambdas=10^-9, lambdas=10^-5, and lambdas=1.]

R code for the leftmost plot:
> bigsp = bigspline(x,y,lambdas=10^-9)
> plot(x,y,main="lambdas=10^-9")
> lines(x,bigsp$fitted)
> lines(x,eta,lty=2)

bigspline: Changing Number of Knots

[Figure: three panels showing bigspline fits with lambdas=10^-5 and nknots=10, 20, and 30.]

R code for the leftmost plot:
> bigsp = bigspline(x,y,lambdas=10^-5,nknots=10)
> plot(x,y,main="lambdas=10^-5, nknots=10")
> lines(x,bigsp$fitted)
> lines(x,eta,lty=2)

bigspline: Number of Knots (revisited)

[Figure: three panels showing GCV-tuned bigspline fits with nknots=10, 20, and 30.]

R code for the leftmost plot:
> bigsp = bigspline(x,y,nknots=10)
> plot(x,y,main="GCV, nknots=10")
> lines(x,bigsp$fitted)
> lines(x,eta,lty=2)

bigspline: Predicting for New Data

Given $\hat{\eta}$ we can predict for a new sequence of data:

> set.seed(1)
> x = seq(0,1,length=100)
> eta = 2 + x + sin(2*pi*x)
> y = eta + rnorm(100)
> plot(x,y,main="Prediction")
> bigsp = bigspline(x,y)
> newdata = seq(0,1,length=200)
> yhat = predict(bigsp,newdata)
> lines(newdata,yhat)
> lines(x,eta,lty=2)

[Figure: scatterplot with predictions at the new x values (solid) and the true function (dashed).]

bigspline: Predicting Linear and Non-Linear Effects

[Figure: three panels, "Full Prediction" (data with the fitted curve), "Linear Effect" (2 + x), and "Non-Linear Effect" (sin(2πx)), each showing the estimate (solid) and the truth (dashed).]

R code for the center and rightmost plots:
> newdata = seq(0,1,length=200)
> plot(x,2+x,main="Linear Effect",type="l",lty=2)
> yhat = predict(bigsp,newdata,effect="0") + predict(bigsp,newdata,effect="lin")
> lines(newdata,yhat)
> plot(x,sin(2*pi*x),main="Non-Linear Effect",type="l",lty=2)
> yhat = predict(bigsp,newdata,effect="non")
> lines(newdata,yhat)

bigspline: Bayesian Confidence Intervals

> dev.new(width=6,height=6,noRStudioGD=TRUE)
> set.seed(1)
> x = seq(0,1,length=100)
> eta = 2 + x + sin(2*pi*x)
> y = eta + rnorm(100)
> bigsp = bigspline(x,y,se.fit=TRUE)
> cilo = bigsp$fit - qnorm(0.975)*bigsp$se
> cihi = bigsp$fit + qnorm(0.975)*bigsp$se
> plot(x,y)
> lines(x,eta)
> lines(bigsp$xunique,cilo,lty=2)
> lines(bigsp$xunique,cihi,lty=2)
> sum(eta>=cilo & eta<=cihi)/length(x)
[1] 1

[Figure: scatterplot with the true function (solid) and the 95% Bayesian confidence interval (dashed).]

Comparing smooth.spline and bigspline

Consider the function $y_i = 2 + x_i + \sin(2\pi x_i) + e_i$ where $e_i \overset{\text{iid}}{\sim} N(0, \sigma^2)$.
Suppose that $x_i = i/n$ for $i \in \{0, \ldots, n\}$ and $\sigma^2 = 1$ so that $e_i \overset{\text{iid}}{\sim} N(0, 1)$.

Median true MSE $= \frac{1}{n}\sum_{i=1}^{n}\big(\hat{\eta}(x_i) - \eta(x_i)\big)^2$ using q = 20 knots:

n               100      1000     10000    1e+05   1e+06
smooth.spline   0.13836  0.00504  0.00113  1e-04   2e-05
bigspline       0.14030  0.00497  0.00110  1e-04   2e-05

Median runtimes (seconds) using q = 20 knots:

n               100    1000   10000  1e+05   1e+06
smooth.spline   0.001  0.002  0.021  0.1965  2.233
bigspline       0.009  0.009  0.011  0.0120  0.094

R Code for Simulation (on previous slide)

nsamp = 10^c(2:6)
simresults = NULL
xnew = seq(0,1,length=200)
set.seed(1)
for(j in 1:5){
  for(k in 1:10){
    x = seq(0,1,length=nsamp[j])
    eta = 2 + x + sin(2*pi*x)
    y = eta + rnorm(nsamp[j])

    tic = proc.time()
    ssmod = smooth.spline(x,y,nknots=20)
    toc = proc.time() - tic
    tmse = sum( (ssmod$y - eta)^2 ) / nsamp[j]
    simsp = data.frame(method="smsp",n=nsamp[j],time=toc[3],tmse=tmse,row.names=k)

    tic = proc.time()
    ssmod = bigspline(x,y,nknots=20)
    toc = proc.time() - tic
    tmse = sum( (predict(ssmod) - eta)^2 ) / nsamp[j]
    simbig = data.frame(method="big",n=nsamp[j],time=toc[3],tmse=tmse,row.names=k+1)

    simresults = rbind(simresults,simsp,simbig)
  }
}

round(tapply(simresults$tmse,list(simresults$method,simresults$n),median),5)
round(tapply(simresults$time,list(simresults$method,simresults$n),median),5)

bigspline: Linear and Non-Linear Effects (revisited)

[Figure: panels showing the estimated linear effect (2 + x) and non-linear effect (sin(2πx)) plotted against xnew for n = 100, 1000, 10000, and 1e+05, with the estimate (solid) and the truth (dashed) in each panel.]

Multidimensional Smoothing Splines in R

A few options for multidimensional smoothing splines in R:
bigssa function (in bigsplines package)
bigssp function (in bigsplines package)
ssanova function (in gss package)
gam function (in mgcv package)

We will focus on the ssanova and bigssa (or bigssp) functions, which fit tensor product smoothing splines.

Note that the gam function handles interactions in a different manner.

Additive Function: Definition

Suppose we have the following function defined for $\mathbf{x} = (x_1, x_2) \in [0,1] \times \{a, b\}$:

addfun = function(x1, x2){
  funval = sin(2*pi*x1)
  idx = which(x2 == "a")
  funval[idx] = funval[idx] + 2
  funval
}

Note that the function is
$\eta(x_1, x_2) = 2 + \sin(2\pi x_1)$ if $x_2 = a$
$\eta(x_1, x_2) = \sin(2\pi x_1)$ if $x_2 \neq a$

Additive Function: Visualization

[Figure: η(x1, x2) versus x1 on [0,1] with curves for x2 = a (solid) and x2 = b (dashed).]

Additive Function: bigssa fitting

> n = 100
> set.seed(55455)
> x1v = seq(0,1,length=n)
> x2v = factor(sample(letters[1:2],n,replace=TRUE))
> eta = addfun(x1v,x2v)
> y = eta + rnorm(n)
> idx = binsamp(cbind(x1v,x2v),nmbin=c(20,2))
> ssint = bigssa(y~x1v*x2v, type=list(x1v="cub",x2v="nom"), nknots=idx)
> sum((ssint$fitted-eta)^2) / length(eta)
[1] 0.04605668
> ssadd = bigssa(y~x1v+x2v, type=list(x1v="cub",x2v="nom"), nknots=idx)
> sum((ssadd$fitted-eta)^2) / length(eta)
[1] 0.03305623
> fitstats = rbind(ssint$info,ssadd$info)
> rownames(fitstats) = c("int","add")
> fitstats
         gcv       rsq      aic      bic
int 1.441561 0.6258559 319.3529 344.6341
add 1.386159 0.6134636 316.0127 332.6986

Additive Function: bigssa prediction

> dev.new(width=12,height=6,noRStudioGD=TRUE)
> par(mfrow=c(1,2))
> newdata = expand.grid(x1v=seq(0,1,length=100),x2v=c("a","b"))
> yint = predict(ssint,newdata)
> yadd = predict(ssadd,newdata)
> plot(newdata[1:100,1],yint[1:100],main="Interaction",
+      type="l",ylim=c(-2,4))
> lines(newdata[101:200,1],yint[101:200],lty=2)
> plot(newdata[1:100,1],yadd[1:100],main="Additive",
+      type="l",ylim=c(-2,4))
> lines(newdata[101:200,1],yadd[101:200],lty=2)

[Figure: two panels ("Interaction" and "Additive") showing the predicted curves for x2 = a (solid) and x2 = b (dashed).]

Additive Function: ssanova fitting

> n = 100
> set.seed(55455)
> x1v = seq(0,1,length=n)
> x2v = factor(sample(letters[1:2],n,replace=TRUE))
> eta = addfun(x1v,x2v)
> y = eta + rnorm(n)
> idx = binsamp(cbind(x1v,x2v),nmbin=c(20,2))
> ssint = ssanova(y~x1v*x2v,type=list(x1v="cubic",x2v="nominal"),
+                 id.basis=idx)
> newdata = data.frame(x1v=x1v,x2v=x2v)
> sum((predict(ssint,newdata)-eta)^2) / length(eta)
[1] 0.01449173
> ssadd = ssanova(y~x1v+x2v,type=list(x1v="cubic",x2v="nominal"),
+                 id.basis=idx)
> sum((predict(ssadd,newdata)-eta)^2) / length(eta)
[1] 0.01432404

Additive Function: ssanova prediction

> dev.new(width=12,height=6,noRStudioGD=TRUE)
> par(mfrow=c(1,2))
> newdata = expand.grid(x1v=seq(0,1,length=100),x2v=c("a","b"))
> yint = predict(ssint,newdata)
> yadd = predict(ssadd,newdata)
> plot(newdata[1:100,1],yint[1:100],main="Interaction",
+      type="l",ylim=c(-2,4))
> lines(newdata[101:200,1],yint[101:200],lty=2)
> plot(newdata[1:100,1],yadd[1:100],main="Additive",
+      type="l",ylim=c(-2,4))
> lines(newdata[101:200,1],yadd[101:200],lty=2)

[Figure: two panels ("Interaction" and "Additive") showing the predicted curves for x2 = a (solid) and x2 = b (dashed).]

Interaction Function: Definition

Suppose we have the following function defined for $\mathbf{x} = (x_1, x_2) \in [0,1] \times \{a, b\}$:

intfun = function(x1, x2){
  funval = sin(2*pi*x1)
  idx = which(x2 == "a")
  funval[idx] = funval[idx] + 2 + sin(4*pi*x1[idx])
  funval
}

Note that the function is
$\eta(x_1, x_2) = 2 + \sin(2\pi x_1) + \sin(4\pi x_1)$ if $x_2 = a$
$\eta(x_1, x_2) = \sin(2\pi x_1)$ if $x_2 \neq a$

Interaction Function: Visualization

[Figure: η(x1, x2) versus x1 on [0,1] with curves for x2 = a (solid) and x2 = b (dashed).]

Interaction Function: bigssa fitting

> n = 100
> set.seed(55455)
> x1v = seq(0,1,length=n)
> x2v = factor(sample(letters[1:2],n,replace=TRUE))
> eta = intfun(x1v,x2v)
> y = eta + rnorm(n)
> idx = binsamp(cbind(x1v,x2v),nmbin=c(20,2))
> ssint = bigssa(y~x1v*x2v,type=list(x1v="cub",x2v="nom"),nknots=idx)
> sum((ssint$fitted-eta)^2) / length(eta)
[1] 0.1081747
> ssadd = bigssa(y~x1v+x2v,type=list(x1v="cub",x2v="nom"),nknots=idx)
> sum((ssadd$fitted-eta)^2) / length(eta)
[1] 0.1858098
> fitstats = rbind(ssint$info,ssadd$info)
> rownames(fitstats) = c("int","add")
> fitstats
         gcv       rsq      aic      bic
int 1.522061 0.6509204 324.0861 356.6680
add 1.510616 0.6097741 324.5034 343.1147

Interaction Function: bigssa prediction

> dev.new(width=12,height=6,noRStudioGD=TRUE)
> par(mfrow=c(1,2))
> newdata = expand.grid(x1v=seq(0,1,length=100),x2v=c("a","b"))
> yint = predict(ssint,newdata)
> yadd = predict(ssadd,newdata)
> plot(newdata[1:100,1],yint[1:100],main="Interaction",
+      type="l",ylim=c(-2,4))
> lines(newdata[101:200,1],yint[101:200],lty=2)
> plot(newdata[1:100,1],yadd[1:100],main="Additive",
+      type="l",ylim=c(-2,4))
> lines(newdata[101:200,1],yadd[101:200],lty=2)

[Figure: two panels ("Interaction" and "Additive") showing the predicted curves for x2 = a (solid) and x2 = b (dashed); the additive fit misses the extra sin(4πx1) wiggle in the x2 = a curve.]

Interaction Function: ssanova fitting

> n = 100
> set.seed(55455)
> x1v = seq(0,1,length=n)
> x2v = factor(sample(letters[1:2],n,replace=TRUE))
> eta = intfun(x1v,x2v)
> y = eta + rnorm(n)
> idx = binsamp(cbind(x1v,x2v),nmbin=c(20,2))
> ssint = ssanova(y~x1v*x2v,type=list(x1v="cubic",x2v="nominal"),
+                 id.basis=idx)
> newdata = data.frame(x1v=x1v,x2v=x2v)
> sum((predict(ssint,newdata)-eta)^2) / length(eta)
[1] 0.1624814
> ssadd = ssanova(y~x1v+x2v,type=list(x1v="cubic",x2v="nominal"),
+                 id.basis=idx)
> sum((predict(ssadd,newdata)-eta)^2) / length(eta)
[1] 0.1802812

Interaction Function: ssanova prediction

> dev.new(width=12,height=6,noRStudioGD=TRUE)
> par(mfrow=c(1,2))
> newdata = expand.grid(x1v=seq(0,1,length=100),x2v=c("a","b"))
> yint = predict(ssint,newdata)
> yadd = predict(ssadd,newdata)
> plot(newdata[1:100,1],yint[1:100],main="Interaction",
+      type="l",ylim=c(-2,4))
> lines(newdata[101:200,1],yint[101:200],lty=2)
> plot(newdata[1:100,1],yadd[1:100],main="Additive",
+      type="l",ylim=c(-2,4))
> lines(newdata[101:200,1],yadd[101:200],lty=2)

[Figure: two panels ("Interaction" and "Additive") showing the predicted curves for x2 = a (solid) and x2 = b (dashed).]

Interaction Function: Fitting with More Data

> n = 1000
> set.seed(55455)
> x1v = seq(0,1,length=n)
> x2v = factor(sample(letters[1:2],n,replace=TRUE))
> eta = intfun(x1v,x2v)
> y = eta + rnorm(n)
> idx = binsamp(cbind(x1v,x2v),nmbin=c(20,2))
> ssint = bigssa(y~x1v*x2v,type=list(x1v="cub",x2v="nom"),nknots=idx)
> sum((ssint$fitted-eta)^2) / length(eta)
[1] 0.03178251
> ssadd = bigssa(y~x1v+x2v,type=list(x1v="cub",x2v="nom"),nknots=idx)
> sum((ssadd$fitted-eta)^2) / length(eta)
[1] 0.1311356
> fitstats = rbind(ssint$info,ssadd$info)
> rownames(fitstats) = c("int","add")
> fitstats
         gcv       rsq      aic      bic
int 1.016522 0.6479397 2854.060 2923.757
add 1.081167 0.6236793 2915.779 2973.402

Interaction Function: Predicting with More Data

> dev.new(width=12,height=6,noRStudioGD=TRUE)
> par(mfrow=c(1,2))
> newdata = expand.grid(x1v=seq(0,1,length=100),x2v=c("a","b"))
> yint = predict(ssint,newdata)
> yadd = predict(ssadd,newdata)
> plot(newdata[1:100,1],yint[1:100],main="Interaction",
+      type="l",ylim=c(-2,4))
> lines(newdata[101:200,1],yint[101:200],lty=2)
> plot(newdata[1:100,1],yadd[1:100],main="Additive",
+      type="l",ylim=c(-2,4))
> lines(newdata[101:200,1],yadd[101:200],lty=2)

[Figure: two panels ("Interaction" and "Additive") showing the predicted curves for x2 = a (solid) and x2 = b (dashed).]