Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and...

34
Topology and Statistics Application Algebraic topology and statistics Peter Bubenik Cleveland State University August 4, 2009 NSF/CBMS conference: Algebraic topology in applied mathematics Peter Bubenik Algebraic topology and statistics

Transcript of Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and...

Page 1: Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and statistics Peter Bubenik Cleveland State University August 4, 2009 NSF/CBMS conference:

Topology and Statistics Application

Algebraic topology and statistics

Peter Bubenik

Cleveland State University

August 4, 2009NSF/CBMS conference:

Algebraic topology in applied mathematics

Peter Bubenik Algebraic topology and statistics

Page 2: Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and statistics Peter Bubenik Cleveland State University August 4, 2009 NSF/CBMS conference:

Topology and Statistics Application

Introduction

Goal

Compare topological and statistical approaches to analyzing dataand show how they can be combined.

Outline:

1 Sampled points

2 Sampled points and values

3 Simplifying the calculations

4 An application to brain imaging

Peter Bubenik Algebraic topology and statistics

Page 3: Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and statistics Peter Bubenik Cleveland State University August 4, 2009 NSF/CBMS conference:

Topology and Statistics Application Points Functions Simplifying the calculations

Sampled points

The setup

By experiment, we measure n points x1, . . . , xn (the sample) on amanifold.

Assumption

There is an underlying object (compact submanifold, probabilitydensity, . . .) generating the data.

Goal

Recover some (global) information about this object from thesample.

Peter Bubenik Algebraic topology and statistics

Page 4: Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and statistics Peter Bubenik Cleveland State University August 4, 2009 NSF/CBMS conference:

Topology and Statistics Application Points Functions Simplifying the calculations

Topological approach

Replace each point with a ball of some fixed radius.

Take the union of these balls.

Peter Bubenik Algebraic topology and statistics

Page 5: Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and statistics Peter Bubenik Cleveland State University August 4, 2009 NSF/CBMS conference:

Topology and Statistics Application Points Functions Simplifying the calculations

Topological approach

Replace each point with a ball of some fixed radius.

Take the union of these balls.

Use these balls to construct a simplicial complex whose vertices arethe sample.

Peter Bubenik Algebraic topology and statistics

Page 6: Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and statistics Peter Bubenik Cleveland State University August 4, 2009 NSF/CBMS conference:

Topology and Statistics Application Points Functions Simplifying the calculations

Topological approach

Replace each point with a ball of some fixed radius.

Take the union of these balls.

Use these balls to construct a simplicial complex whose vertices arethe sample.

Vary the radius to get a filtered simplicial complex whose verticesare the sample.

Calculate its persistent homology.

Peter Bubenik Algebraic topology and statistics

Page 7: Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and statistics Peter Bubenik Cleveland State University August 4, 2009 NSF/CBMS conference:

Topology and Statistics Application Points Functions Simplifying the calculations

Statistical approach

Replace each point with a bump function, called a kernel.

Take the sum of the bump functions.

Peter Bubenik Algebraic topology and statistics

Page 8: Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and statistics Peter Bubenik Cleveland State University August 4, 2009 NSF/CBMS conference:

Topology and Statistics Application Points Functions Simplifying the calculations

Statistical approach

Replace each point with a bump function, called a kernel.

Take the sum of the bump functions.

Calculate the persistent homology (this will be explained soon).

Peter Bubenik Algebraic topology and statistics

Page 9: Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and statistics Peter Bubenik Cleveland State University August 4, 2009 NSF/CBMS conference:

Topology and Statistics Application Points Functions Simplifying the calculations

Advantages and disadvantages

Balls

Good: relatively easy to use theoretically and computationally

Bad: if errors are not bounded then as n → ∞, outliers causeproblems

Kernels

Good: outliers are not a problem as n → ∞

Bad: harder to use theoretically and computationally

Peter Bubenik Algebraic topology and statistics

Page 10: Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and statistics Peter Bubenik Cleveland State University August 4, 2009 NSF/CBMS conference:

Topology and Statistics Application Points Functions Simplifying the calculations

Sampled points and values

The setup

By experiment, we measure n pairs (x1, y1), . . . , (xn, yn) (thesample) where xi is a point on a manifold M and yi ∈ R.

Assumption

There is an underlying object (a function on the manifold)generating the data (yi = f (xi ) + εi ).

Goal

Recover some (global) information about this object from the data.

Peter Bubenik Algebraic topology and statistics

Page 11: Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and statistics Peter Bubenik Cleveland State University August 4, 2009 NSF/CBMS conference:

Topology and Statistics Application Points Functions Simplifying the calculations

Topological approach

Use linear interpolation to extend the sample to a function on themanifold.

Calculate this function’s persistent homology.

Small persistent homology classes are assumed to be due toexperimental error.

Peter Bubenik Algebraic topology and statistics

Page 12: Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and statistics Peter Bubenik Cleveland State University August 4, 2009 NSF/CBMS conference:

Topology and Statistics Application Points Functions Simplifying the calculations

Statistical approach

Replace each data point with a bump function (kernel).

Sum these kernel to get an estimator of the function.

Precisely: given kernel functions Kxi(x) centered at xi , take the

kernel weighted average

f (x) =

∑i Kxi

(x)yi∑i Kxi

(x).

Peter Bubenik Algebraic topology and statistics

Page 13: Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and statistics Peter Bubenik Cleveland State University August 4, 2009 NSF/CBMS conference:

Topology and Statistics Application Points Functions Simplifying the calculations

Statistical approach

Replace each data point with a bump function (kernel).

Sum these kernel to get an estimator of the function.

Precisely: given kernel functions Kxi(x) centered at xi , take the

kernel weighted average

f (x) =

∑i Kxi

(x)yi∑i Kxi

(x).

Calculate this function’s persistent homology.

Peter Bubenik Algebraic topology and statistics

Page 14: Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and statistics Peter Bubenik Cleveland State University August 4, 2009 NSF/CBMS conference:

Topology and Statistics Application Points Functions Simplifying the calculations

Advantages and disadvantages

Interpolation

Good: Relatively easy to use computationally

Bad: If the errors are unbounded then outliers cause problems asn → ∞.

Kernels

Good: Outliers are not a problem, get better and better estimator asn → ∞.

Bad: Calculating critical points of the estimator is difficult.

Peter Bubenik Algebraic topology and statistics

Page 15: Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and statistics Peter Bubenik Cleveland State University August 4, 2009 NSF/CBMS conference:

Topology and Statistics Application Points Functions Simplifying the calculations

Simplifying the calculations

Both topologists and statisticians have methods that simplify theirconstructions when the sample is large.

Peter Bubenik Algebraic topology and statistics

Page 16: Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and statistics Peter Bubenik Cleveland State University August 4, 2009 NSF/CBMS conference:

Topology and Statistics Application Points Functions Simplifying the calculations

Topology: Landmark points

As the size of the sample increases, the Cech complex andVietoris–Rips complex become increasingly expensive to compute.

One solution, is the choose a small set of points, called landmarkpoints from which to build a smaller simplicial complex.

Peter Bubenik Algebraic topology and statistics

Page 17: Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and statistics Peter Bubenik Cleveland State University August 4, 2009 NSF/CBMS conference:

Topology and Statistics Application Points Functions Simplifying the calculations

Statistics: Design points

As the size of the sample increases, the kernel estimatorapproaches the function. However, calculating the critical points ofthe estimator become increasingly expensive to compute.

One solution, is to use the kernel estimator f to construct asimpler estimator.

Choose a small set of points called design points.

Evaluate the kernel estimator at these design points.

Use linear interpolation to obtain a simpler estimator f .

Peter Bubenik Algebraic topology and statistics

Page 18: Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and statistics Peter Bubenik Cleveland State University August 4, 2009 NSF/CBMS conference:

Topology and Statistics Application Points Functions Simplifying the calculations

A theorem

Let M be a compact d-dimensional Riemannian manifold.Assume that there exists a function f : M → R such that

y = f (x) + ǫ, x ∈ M

where ǫ is a normal random variable with mean zero and varianceσ2 > 0. Assume f is in a Lipschitz class of functions.

Theorem (B–P.Kim–Z.Luo)

Given a sample (x1, y1), . . . , (xn, yn), there is an estimator f(constructed as above) such that

EdB(Dp(f ),Dp(f )) ≤ Cψn

as n → ∞.

Peter Bubenik Algebraic topology and statistics

Page 19: Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and statistics Peter Bubenik Cleveland State University August 4, 2009 NSF/CBMS conference:

Topology and Statistics Application

Application to Brain Imaging

Peter Bubenik Algebraic topology and statistics

Page 20: Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and statistics Peter Bubenik Cleveland State University August 4, 2009 NSF/CBMS conference:

Topology and Statistics Application

MRI

Peter Bubenik Algebraic topology and statistics

Page 21: Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and statistics Peter Bubenik Cleveland State University August 4, 2009 NSF/CBMS conference:

Topology and Statistics Application

MRI

Peter Bubenik Algebraic topology and statistics

Page 22: Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and statistics Peter Bubenik Cleveland State University August 4, 2009 NSF/CBMS conference:

Topology and Statistics Application

Cortical surface

Peter Bubenik Algebraic topology and statistics

Page 23: Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and statistics Peter Bubenik Cleveland State University August 4, 2009 NSF/CBMS conference:

Topology and Statistics Application

Cortex thickness

Peter Bubenik Algebraic topology and statistics

Page 24: Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and statistics Peter Bubenik Cleveland State University August 4, 2009 NSF/CBMS conference:

Topology and Statistics Application

Cortex thickness

Peter Bubenik Algebraic topology and statistics

Page 25: Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and statistics Peter Bubenik Cleveland State University August 4, 2009 NSF/CBMS conference:

Topology and Statistics Application

Constructing our estimator

Construct an estimator:First, smooth the data on S2 using the kernel

Kxi(x) = max(1 − κ arccos(x t

i x), 0),

and the kernel function estimator

f (x) =

∑i yiKxi

(x)∑

i Kxi(x)

. (1)

Peter Bubenik Algebraic topology and statistics

Page 26: Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and statistics Peter Bubenik Cleveland State University August 4, 2009 NSF/CBMS conference:

Topology and Statistics Application

Constructing our estimator

Construct an estimator:First, smooth the data on S2 using the kernel

Kxi(x) = max(1 − κ arccos(x t

i x), 0),

and the kernel function estimator

f (x) =

∑i yiKxi

(x)∑

i Kxi(x)

. (1)

Next, choose design points from a triangulation of the sphere:take an iterated subdivision of the icosahedron, which has 1280faces and 642 vertices.

Peter Bubenik Algebraic topology and statistics

Page 27: Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and statistics Peter Bubenik Cleveland State University August 4, 2009 NSF/CBMS conference:

Topology and Statistics Application

Constructing our estimator

Construct an estimator:First, smooth the data on S2 using the kernel

Kxi(x) = max(1 − κ arccos(x t

i x), 0),

and the kernel function estimator

f (x) =

∑i yiKxi

(x)∑

i Kxi(x)

. (1)

Next, choose design points from a triangulation of the sphere:take an iterated subdivision of the icosahedron, which has 1280faces and 642 vertices.Define f on vertices using (1) and extend by linear interpolation.

Peter Bubenik Algebraic topology and statistics

Page 28: Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and statistics Peter Bubenik Cleveland State University August 4, 2009 NSF/CBMS conference:

Topology and Statistics Application

Triangulated sphere

Peter Bubenik Algebraic topology and statistics

Page 29: Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and statistics Peter Bubenik Cleveland State University August 4, 2009 NSF/CBMS conference:

Topology and Statistics Application

Cortex thickness estimator

Peter Bubenik Algebraic topology and statistics

Page 30: Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and statistics Peter Bubenik Cleveland State University August 4, 2009 NSF/CBMS conference:

Topology and Statistics Application

Calculating Persistent Homology

Remark

Critical points only occur at vertices.

The values of the estimator at the vertices, induce a filtrationof the triangulation of the sphere.

The persistent homology of this filtered complex is identical tothe persistent homology of the estimator.

Use Plex to calculate the persistent homology of the filteredcomplex.

Peter Bubenik Algebraic topology and statistics

Page 31: Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and statistics Peter Bubenik Cleveland State University August 4, 2009 NSF/CBMS conference:

Topology and Statistics Application

Persistence diagrams

2 2.5 3 3.5 4 4.5 5 5.5 62

2.5

3

3.5

4

4.5

5

5.5

6Persistence Diagram in degree 1

Peter Bubenik Algebraic topology and statistics

Page 32: Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and statistics Peter Bubenik Cleveland State University August 4, 2009 NSF/CBMS conference:

Topology and Statistics Application

Cumulative Persistence diagrams

2 2.5 3 3.5 4 4.5 5 5.5 62

2.5

3

3.5

4

4.5

5

5.5

6Persistence Diagram in degree 1

11 control16 autistic

Peter Bubenik Algebraic topology and statistics

Page 33: Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and statistics Peter Bubenik Cleveland State University August 4, 2009 NSF/CBMS conference:

Topology and Statistics Application

Wasserstein distance

−3 −2 −1 0 1 2 3−1.5

−1

−0.5

0

0.5

1

1.5

2

2.5

3CMDS of Wasserstein distance

autisticcontrol

Peter Bubenik Algebraic topology and statistics

Page 34: Algebraic topology and statisticsTopology and Statistics Application Algebraic topology and statistics Peter Bubenik Cleveland State University August 4, 2009 NSF/CBMS conference:

Topology and Statistics Application

Summary

Both topologists and statisticians replace a point with an extendedobject (disk/kernel).

Topologists take unions, statisticians sum.

Both topologists and statisticians simplify their constructions bychoosing a small set of points.

From a function on a manifold, we can consider the persistenthomology of its lower excursion sets.

There is a statistical estimator for such functions from which onecan calculate the persistent homology combinatorially.

Peter Bubenik Algebraic topology and statistics