APPLICATIONS OF PERSISTENCE

August 13, 2018

Lori Ziegelmeier

Tutorial on Multiparameter Persistence, Computation, and Applications

Institute for Mathematics and its Applications

INTRODUCTION

MOTIVATION:

Overview:

Topological data analysis has been applied in a wide variety of settings. This talk samples a few:

# Sensor networks
# Signal analysis
# Natural imagery
# Country development statistics
# Simulations of partial differential equations
# Biological aggregations

DATA COMES IN MANY FORMS

Data Type: Topological Methods

Point Cloud (PC): Vietoris-Rips complex, α-complex, Čech complex, witness complex

Function: Sub- or super-level set persistence

Image: Cubical complex; treat as surface (sub- or super-level set persistence); treat pixels as points in a PC; embed as points in a Grassmannian

Time Series: Time-delay embedding into a PC

Family of Time Series: Filtration tied to pairwise correlations

Time Varying Data: Zig-zag persistence; CROCKER plot

WHAT CAN WE DO ONCE WE’VE RUN PERSISTENCE?

# Verify that intuition about the topological space was correct
# Use a union-find algorithm to find the elements of clusters
# Look at generators of homology classes
# Compute confidence intervals to separate noise from signal
# Compute distances between persistence diagrams
# Transform a persistence diagram into a new space for machine learning tasks

CLASSIC APPLICATIONS OF PERSISTENT HOMOLOGY

SENSOR NETWORKS

(de Silva, Ghrist 2007) Coverage in Sensor Networks via Persistent Homology

Determine when static nodes with minimal and localized sensing capabilities completely cover a bounded domain of unknown topological type.

SIGNAL ANALYSIS

(Perea, Harer 2013) Sliding Windows and Persistence: An Application of Topological Methods to Signal Analysis

A sliding window point cloud for signal f:

\[ SW_{M,\tau} f(t) = \begin{bmatrix} f(t) \\ f(t + \tau) \\ \vdots \\ f(t + M\tau) \end{bmatrix} \]

creates the time-delayed embedding \{ SW_{M,\tau} f(t_1), \ldots, SW_{M,\tau} f(t_S) \}, a finite point cloud in \mathbb{R}^{M+1}.

1-dimensional persistence provides a measure of periodicity in the signal.
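The sliding window construction above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the talk: the function name `sliding_window` and the sample signal are assumptions chosen for the example.

```python
import numpy as np

def sliding_window(f, t_values, M, tau):
    """Return the point cloud {SW_{M,tau} f(t)} for t in t_values.

    f is a vectorised callable signal. Row s of the result is
    (f(t_s), f(t_s + tau), ..., f(t_s + M * tau)), a point in R^(M+1).
    """
    t_values = np.asarray(t_values, dtype=float)
    offsets = tau * np.arange(M + 1)          # 0, tau, ..., M * tau
    return f(t_values[:, None] + offsets[None, :])

# Hypothetical example: sample a cosine at S = 50 times with M = 2, tau = 1.
cloud = sliding_window(np.cos, np.linspace(0.0, 2.0 * np.pi, 50), M=2, tau=1.0)
print(cloud.shape)
```

For a periodic signal the rows trace out a closed curve, which is why a long-lived 1-dimensional class is read as evidence of periodicity.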


NATURAL IMAGERY

(Carlsson et al. 2008) On the Local Behavior of Spaces of Natural Images

Images on the next few slides are from a presentation by Henry Adams.

NATURAL IMAGERY PERSISTENCE

Densest patches according to a global estimate

Interpretation: nature prefers linearity



Densest patches according to an intermediate estimate

Interpretation: nature prefers horizontal and vertical directions



Densest patches according to a local estimate


ANALYZING GLOBAL DEVELOPMENT

GAPMINDER PROJECT

The Gapminder project set out to use statistics to dispel simplistic notions about global development.

Goal:

Use TDA to understand global development by looking at the topological structure of health and wealth statistics in an unbiased approach.

(Banman, Z. 2018) Mind the Gap: A Study in Global Development through Persistent Homology

HEALTH AND WEALTH DATA

Development indicators:

# 194 countries with most recently available data (usually from 2015 or 2016, with others as early as 2005)

Indicator     Max      Min     Median   Mean    Stand Dev   Scaled Mean
GDP           148374   599     11903    18972   21523       -0.476
Life Exp.     84.8     48.86   74.5     72.56   7.74        0.296
Infant Mor.   96       1.5     23.89    15      21.9        0.528
GNI           87030    350     8360     13596   15399       -0.431

Preprocessing:

# Modulate value of GDP outliers to two standard deviations from mean

# Re-scale each indicator to [−1, 1]
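The two preprocessing steps can be sketched as follows. The slide does not say how outliers are modulated or how the rescaling is done, so this sketch assumes symmetric clipping at two population standard deviations and a min-max map onto [−1, 1]; the function names and the toy data are hypothetical.

```python
import numpy as np

def clip_outliers(values, n_std=2.0):
    """Pull values farther than n_std standard deviations from the mean
    back to that threshold (assumed reading of 'modulate')."""
    values = np.asarray(values, dtype=float)
    mean, std = values.mean(), values.std()
    return np.clip(values, mean - n_std * std, mean + n_std * std)

def rescale(values, lo=-1.0, hi=1.0):
    """Min-max rescale so the smallest value maps to lo and the largest to hi."""
    values = np.asarray(values, dtype=float)
    vmin, vmax = values.min(), values.max()
    return lo + (hi - lo) * (values - vmin) / (vmax - vmin)

# Toy indicator (not Gapminder data): nine small values and one outlier.
toy = np.array([1.0] * 9 + [100.0])
clipped = clip_outliers(toy)
scaled = rescale(clipped)
print(clipped.max(), scaled.min(), scaled.max())
```

Clipping before rescaling keeps a single extreme value from compressing every other country into a narrow band near −1.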


TWO ANALYSIS METHODS

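1. Analyze clusters of ‘health’ and ‘wealth’.
◦ Define the distance between countries with indicators I as

\[ d_I : \mathbb{R}^{|I|} \times \mathbb{R}^{|I|} \to \mathbb{R}, \qquad d_I(x, y) = \sqrt{\sum_{i \in I} (x_i - y_i)^2} \]

2. Add geographic structure to the data by constructing a weighted graph over the countries and their borders.
◦ Define the adjacency matrix A

\[ A_{i,j} = \begin{cases} 1 & \text{if countries } i, j \text{ share a border,} \\ 0 & \text{if countries } i, j \text{ do not share a border} \end{cases} \]

◦ Define the distance between countries as

\[ D_{i,j} = \begin{cases} d(i, j) & \text{if } A_{i,j} = 1, \\ \infty & \text{if } A_{i,j} = 0 \end{cases} \]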
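Both distances can be assembled with NumPy as in the sketch below. The function names are hypothetical, and setting the diagonal of D to zero is an assumption, since the slide does not define A(i, i). The four rows reuse the scaled (GDP, LE) values listed for Chile, Peru, Bolivia and Brazil in the South America table of this talk.

```python
import numpy as np

def indicator_distance(X):
    """Euclidean distance d_I between all pairs of rows of X
    (one row per country, one column per indicator)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def border_distance(X, A):
    """D[i, j] = d(i, j) if countries i and j share a border, else infinity."""
    d = indicator_distance(X)
    D = np.where(np.asarray(A) == 1, d, np.inf)
    np.fill_diagonal(D, 0.0)   # assumption: a country is at distance 0 from itself
    return D

# Rows: Chile, Peru, Bolivia, Brazil; columns: scaled GDP, scaled LE.
X = np.array([[-0.29, 0.71], [-0.63, 0.72], [-0.81, 0.37], [-0.52, 0.43]])
# Chile borders Peru and Bolivia but not Brazil; Brazil borders Peru and Bolivia.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]])
D = border_distance(X, A)
print(D)
```

A Rips filtration built on D only ever joins countries through chains of shared borders, which is what lets cycles such as the South America one appear.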


CLUSTERING OF DEVELOPMENT GROUPS

(a) ε = 0.08, with 54, 52, 14, 10, 8 countries among 41 total clusters

(b) ε = 0.10, with 132, 18, 10, 6, 2 countries among 25 total clusters

(c) ε = 0.12, with 164, 6, 2, 2, 2 countries among 19 total clusters

(d) ε = 0.14, with 170, 8, 3, 2, 2 countries among 13 total clusters

Figure: Five largest clusters at varying scale parameters
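Cluster membership at a fixed scale ε can be read off with a union-find structure, as mentioned earlier in the talk. The sketch below is an illustration under the assumption that two countries are joined whenever their distance is at most ε; the function names and the toy distance matrix are hypothetical.

```python
def find(parent, i):
    """Return the root of i, compressing the path along the way."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def clusters_at_scale(D, eps):
    """Group indices into connected components of the graph with an edge
    between i and j whenever D[i][j] <= eps. Largest clusters come first."""
    n = len(D)
    parent = list(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if D[i][j] <= eps:
                ri, rj = find(parent, i), find(parent, j)
                if ri != rj:
                    parent[rj] = ri
    groups = {}
    for i in range(n):
        groups.setdefault(find(parent, i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)

# Toy example: four points on a line at positions 0, 0.05, 0.1 and 0.5.
positions = [0.0, 0.05, 0.1, 0.5]
D = [[abs(a - b) for b in positions] for a in positions]
print(clusters_at_scale(D, eps=0.08))
```

The number of groups returned is the 0-dimensional Betti number at that scale, and the group sizes correspond to the cluster counts quoted in the captions.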



Table: Countries (138) comprising the 6 largest clusters at ε = 0.08 and means of scaled indicators, GDP/capita (GDP) and life expectancy (LE).

Countries (ISO2): Bangladesh, Kyrgyzstan, Cambodia, Mauritania, Micronesia Fed. Sts., Nepal, Syria, Gambia, Comoros, Myanmar, Sudan, Sao Tome and Principe, India, Laos, Marshall Islands, Guyana, Pakistan, Ghana, Nigeria, Yemen Rep., Djibouti, Kenya, Senegal, Tanzania, Vanuatu, Haiti, Liberia, Madagascar, Solomon Islands, Ethiopia, Rwanda, Benin, Kiribati, Burkina Faso, Burundi, Congo Dem. Rep., Niger, Papua New Guinea, Togo, Uganda, Zimbabwe, Eritrea, Mali, Malawi, Guinea, Cote d’Ivoire, Cameroon, Sierra Leone, Mozambique, Chad, Zambia, South Sudan, Guinea-Bissau, Fiji
GDP: -0.93   LE: -0.15

Countries (ISO2): Albania, Bosnia and Herzegovina, Colombia, Jordan, Sri Lanka, Tunisia, Peru, Macedonia FYR, Barbados, China, Dominican Rep., Algeria, Ecuador, Montenegro, Serbia, Thailand, Bulgaria, Brazil, Iran, Venezuela, Mauritius, Mexico, Romania, Argentina, Saint Lucia, Armenia, Jamaica, Paraguay, El Salvador, Morocco, Vietnam, Bolivia, Bhutan, Cape Verde, Georgia, Guatemala, Honduras, Moldova, Samoa, Belize, Ukraine, Indonesia, Philippines, Saint Vincent and the Grenadines, Egypt, Grenada, Tonga, Uzbekistan, Tajikistan, Korea Dem. Rep., Timor-Leste, Palestine
GDP: -0.69   LE: 0.44

Countries (ISO2): Antigua and Barbuda, Croatia, Uruguay, Cuba, Panama, Turkey, Lebanon
GDP: -0.37   LE: 0.63

Countries (ISO2): Estonia, Poland, Slovak Republic, Hungary, Latvia, Malaysia, Lithuania, Seychelles
GDP: -0.19   LE: 0.53

Countries (ISO2): Cyprus, Malta, Slovenia, Israel, Spain, Italy, Korea Rep., New Zealand, Portugal, Greece
GDP: -0.02   LE: 0.83

Countries (ISO2): Austria, Australia, Canada, Germany, Denmark, Netherlands, Sweden, Belgium, Taiwan, Finland, France, United Kingdom, Bahrain, Ireland
GDP: 0.38   LE: 0.80

GEOGRAPHIC DEVELOPMENT PATTERNS (WEIGHTED NETWORK)

GEOGRAPHIC DEVELOPMENT PATTERNS

Table: South America cycle with interval [0.34, 0.62)

Country     GDP     LE
Chile       -0.29   0.71
Peru        -0.63   0.72
Bolivia     -0.81   0.37
Brazil      -0.52   0.43
Argentina   -0.45   0.55

Table: North Africa cycle with interval [0.85, 0.97)

Country      GDP     LE
Libya        -0.46   0.36
Niger        -0.99   -0.31
Mali         -0.96   -0.36
Mauritania   -0.89   0.17
Algeria      -0.58   0.54

ENCODING A PERSISTENCE DIAGRAM IN A NEW SPACE

PERSISTENCE DIAGRAMS AS A METRIC SPACE

The space of persistence diagrams (PDs) can be endowed with a metric.

[Figure: persistence diagram with axes birth and death]

Definition

The p-Wasserstein distance between two PDs B and B′ is given by

\[ W_p(B, B') = \inf_{\gamma : B \to B'} \left( \sum_{u \in B} \| u - \gamma(u) \|_\infty^p \right)^{1/p}, \]

where 1 ≤ p < ∞ and γ ranges over bijections between B and B′.
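For small diagrams the infimum over bijections can be computed exactly as an assignment problem. The sketch below assumes the usual convention that each diagram is augmented with diagonal points, so that an off-diagonal point may be matched to the diagonal at L∞ cost (death − birth) / 2; the slide does not spell this out. It uses `scipy.optimize.linear_sum_assignment`, and the function name is hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def wasserstein_distance(B, B2, p=2.0):
    """p-Wasserstein distance between two diagrams given as (birth, death) pairs."""
    B = np.asarray(B, dtype=float).reshape(-1, 2)
    B2 = np.asarray(B2, dtype=float).reshape(-1, 2)
    n, m = len(B), len(B2)
    if n + m == 0:
        return 0.0
    # Rows: n points of B, then m diagonal slots. Columns: m points of B2, then n slots.
    cost = np.zeros((n + m, n + m))
    if n and m:
        linf = np.abs(B[:, None, :] - B2[None, :, :]).max(axis=2)
        cost[:n, :m] = linf ** p                      # point matched to point
    cost[:n, m:] = (((B[:, 1] - B[:, 0]) / 2.0) ** p)[:, None]    # B point to diagonal
    cost[n:, :m] = (((B2[:, 1] - B2[:, 0]) / 2.0) ** p)[None, :]  # B2 point to diagonal
    # The diagonal-to-diagonal block stays at zero cost.
    rows, cols = linear_sum_assignment(cost)
    return float(cost[rows, cols].sum() ** (1.0 / p))

print(wasserstein_distance([(0.0, 2.0)], [(0.0, 3.0)], p=1.0))
```

The cost matrix is square with side n + m, so this approach is meant for illustration on small diagrams rather than as an efficient implementation.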


Definition

The bottleneck distance between two PDs B and B′ is given by

\[ W_\infty(B, B') = \inf_{\gamma : B \to B'} \sup_{u \in B} \| u - \gamma(u) \|_\infty, \]

where γ ranges over bijections between B and B′.

TRANSFORMATIONS OF PERSISTENCE DIAGRAMS

# Rouse et al. (2015) create a vector representation by superimposing a grid over a PD and counting the number of points in each bin.

# Carriere et al. (2015) develop a stable vector representation by rearranging the entries of the distance matrix between points in a PD.

# Reininghaus et al. (2015) produce a stable surface from a PD by taking the sum of a positive Gaussian centered on each PD point together with a negative Gaussian centered on its reflection below the diagonal.

# Bubenik (2015) develops the persistence landscape (PL), a stable functional representation of a PD that lies in a Banach space.

PERSISTENCE IMAGE

Persistence image, a transformation of a persistence diagram that lies in R^n

[Figure: pipeline from data, to diagram B (axes birth, death), to transformed diagram T(B) (axes birth, persistence), to surface, to image]

(Adams et al. 2017) Persistence Images: A Stable Vector Representation of Persistent Homology
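A rough sketch of the pipeline in the figure follows. It is a simplification, not the reference implementation: the surface is evaluated at pixel centres rather than integrated over each pixel, the weighting is taken to be linear in persistence as one possible choice, and the function name, `extent` argument and defaults are hypothetical. The resolution 10 × 10 and variance 0.01 match the values quoted later in the talk.

```python
import numpy as np

def persistence_image(diagram, resolution=10, variance=0.01,
                      extent=(0.0, 1.0, 0.0, 1.0)):
    """Map (birth, death) pairs to a vector of length resolution ** 2."""
    diagram = np.asarray(diagram, dtype=float).reshape(-1, 2)
    birth = diagram[:, 0]
    pers = diagram[:, 1] - diagram[:, 0]        # T(b, d) = (b, d - b)
    b_min, b_max, p_min, p_max = extent
    # Pixel centres along the birth axis (columns) and persistence axis (rows).
    bx = b_min + (np.arange(resolution) + 0.5) * (b_max - b_min) / resolution
    py = p_min + (np.arange(resolution) + 0.5) * (p_max - p_min) / resolution
    BX, PY = np.meshgrid(bx, py)
    image = np.zeros((resolution, resolution))
    for b, p in zip(birth, pers):
        gauss = np.exp(-((BX - b) ** 2 + (PY - p) ** 2) / (2.0 * variance))
        gauss /= 2.0 * np.pi * variance
        image += p * gauss                      # weight each Gaussian by persistence
    return image.ravel()

vec = persistence_image([(0.25, 0.70)])
print(vec.shape, int(vec.argmax()))
```

Because the weight vanishes at zero persistence, points near the diagonal contribute little to the vector, which is what makes the representation tolerant of small noisy features.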


ANISOTROPIC KURAMOTO-SIVASHINSKY (AKS) EQUATION

ANISOTROPIC KURAMOTO-SIVASHINSKY (AKS) EQUATION

# Partial differential equation for a function u(x, y, t) of spatial variables x, y and time t.

# The Kuramoto-Sivashinsky equation is derived in problems involving pattern formation, such as surface nanopatterning by ion-beam erosion, epitaxial growth, and solidification from a melt.

# The anisotropic Kuramoto-Sivashinsky (aKS) equation is given by

\[ \frac{\partial}{\partial t} u = -\nabla^2 u - \nabla^2 \nabla^2 u + r \left( \frac{\partial}{\partial x} u \right)^2 + \left( \frac{\partial}{\partial y} u \right)^2, \]

where \nabla^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}, and the real parameter r controls the degree of anisotropy.

# For a fixed time t*, u(x, y, t*) is a patterned surface (periodic in both x and y) defined over the (x, y)-plane.
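Since the surface is periodic in both x and y, the right-hand side of the aKS equation can be evaluated with FFT-based derivatives. The sketch below only evaluates the right-hand side for a given snapshot; it is not the simulation code behind the talk, and the function name, the square domain of side L and the grid size are assumptions.

```python
import numpy as np

def aks_rhs(u, r, L=2.0 * np.pi):
    """Evaluate -lap(u) - lap(lap(u)) + r * (u_x)^2 + (u_y)^2 on a periodic grid.

    u has shape (n_y, n_x) and samples a square domain of side L.
    """
    n_y, n_x = u.shape
    kx = 2.0 * np.pi * np.fft.fftfreq(n_x, d=L / n_x)   # angular wavenumbers in x
    ky = 2.0 * np.pi * np.fft.fftfreq(n_y, d=L / n_y)   # angular wavenumbers in y
    KX, KY = np.meshgrid(kx, ky)                        # both of shape (n_y, n_x)
    k2 = KX ** 2 + KY ** 2
    u_hat = np.fft.fft2(u)
    lap = np.real(np.fft.ifft2(-k2 * u_hat))
    bilap = np.real(np.fft.ifft2(k2 ** 2 * u_hat))
    u_x = np.real(np.fft.ifft2(1j * KX * u_hat))
    u_y = np.real(np.fft.ifft2(1j * KY * u_hat))
    return -lap - bilap + r * u_x ** 2 + u_y ** 2

# Check on u = cos(x): the linear terms cancel and the result is r * sin(x)^2.
grid = np.linspace(0.0, 2.0 * np.pi, 32, endpoint=False)
X, Y = np.meshgrid(grid, grid)
rhs = aks_rhs(np.cos(X), r=1.5)
print(np.abs(rhs - 1.5 * np.sin(X) ** 2).max())
```

The check also shows how r enters: it scales only the x-gradient term, so r = 1 is the isotropic case and larger r favours structure along one direction.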


PARAMETER OF THE AKS EQUATION

Figure: Plots of u(x, y, ·) from simulations of the aKS equation. Columns represent parameters r = 1, 1.25, 1.5, 1.75 and 2. Rows represent time: t = 3 (top) and t = 5 (bottom).

Figure: Surfaces u(x, y, 3) for r = 1.75 or r = 2. Can you group the images by eye?

Answer: (from left) r = 1.75, 2, 1.75, 2, 2.


AKS CLASSIFICATION METHODS

Goal:

Classify 150 trials (30 for each parameter) of the aKS equation by parameter (5 values) using snapshots of the surfaces u(x, y, ·) as they evolve in time (5 values).

Methods of classification:

1. Surfaces viewed as points in R^266144. Reduce resolution to 10 × 10 by coarsening the discretization of the spatial domain. Classify with Subspace Discriminant Ensemble.

2. Parameter r influences the mean and amplitude of the pattern. Use a normal distribution-based classifier built on the variances of the surface heights.

3. Sublevel set filtration PD. Generate PIs with resolution 10 × 10 and variance 0.01. Classify H0, H1 and concatenated PIs using Subspace Discriminant Ensemble.

[Figure: panels (a) and (b), densities of surface-height variance for r = 1.00, 1.25, 1.50, 1.75, 2.00; horizontal axis: variance, vertical axis: density]

AKS EQUATION CLASSIFICATION ACCURACY

Classification Approach                             Time t=3   Time t=5   Time t=10
Subspace Discriminant Ensemble, Resized Surfaces    26.0%      19.3%      19.3%
Variance Normal Distribution Classifier             20.74%     75.2%      77.62%
Subspace Discriminant Ensemble, H0 PIs              58.3%      96.0%      94.7%
Subspace Discriminant Ensemble, H1 PIs              67.7%      87.3%      93.3%
Subspace Discriminant Ensemble, H0 and H1 PIs       72.7%      95.3%      97.3%

BIOLOGICAL AGGREGATIONS: TOPOLOGY EVOLVING IN TIME

BIOLOGICAL AGGREGATIONS

In many natural systems, particles, organisms, or agents interact locally according to rules that produce aggregate behavior.

CLASSIC WAY TO ANALYZE BIOLOGICAL AGGREGATIONS

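Alignment Order Parameter:

\[ \varphi(t) = \frac{1}{N v_0} \left\| \sum_{i=1}^{N} \vec{v}_i(t) \right\| \]

[Figure: snapshot of particle positions, panel (B), both axes from 0 to 7]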
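The alignment order parameter is a one-line computation once the velocities are stored as an array. In this sketch the function name and the toy velocities are assumptions; `v0` is the common particle speed from the formula.

```python
import numpy as np

def alignment_order_parameter(velocities, v0):
    """phi = |sum_i v_i| / (N * v0) for an (N, 2) array of velocity vectors."""
    velocities = np.asarray(velocities, dtype=float)
    n_particles = len(velocities)
    return np.linalg.norm(velocities.sum(axis=0)) / (n_particles * v0)

# Fully aligned particles give phi = 1; opposite headings cancel to phi = 0.
aligned = alignment_order_parameter([[0.5, 0.0], [0.5, 0.0]], v0=0.5)
opposed = alignment_order_parameter([[0.5, 0.0], [-0.5, 0.0]], v0=0.5)
print(aligned, opposed)
```

The value depends only on headings, so two configurations with very different spatial structure can share the same ϕ, which is the gap the topological summaries aim to fill.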

Goal:

Use topology to analyze the collective behavior of interacting agents’ positions and velocities as time evolves.

COMPUTE PERSISTENT HOMOLOGY


EVOLVE IN TIME

Compute the kth Betti number b_k(ε, t) as a function of proximity parameter ε and time t.

CROCKER plot

Contour Realization Of Computed K-dimensional hole Evolution in the Rips complex (CROCKER)

(Topaz, Z., Halverson 2015) Topological Data Analysis of Biological Aggregation Models
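For k = 0 the matrix behind a CROCKER plot can be computed without a homology library, because b_0 of the Rips complex equals the number of connected components of the ε-neighbourhood graph. The sketch below assumes an edge whenever two points are at distance at most ε (conventions differ by a factor of two), uses plain Euclidean distance with no periodic wrapping, and uses hypothetical function names and toy frames.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial.distance import pdist, squareform

def betti0(points, eps):
    """Number of connected components of the graph joining points within eps."""
    D = squareform(pdist(points))
    graph = csr_matrix((D <= eps).astype(int))
    n_components, _ = connected_components(graph, directed=False)
    return n_components

def crocker_b0(frames, eps_values):
    """Matrix of b_0(eps, t): one row per eps, one column per time frame."""
    return np.array([[betti0(P, eps) for P in frames] for eps in eps_values])

# Two toy frames: a spread-out configuration, then a tight one.
frames = [np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]]),
          np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]])]
crocker = crocker_b0(frames, eps_values=[0.6, 1.5])
print(crocker)
```

Drawing contours of this matrix over the (t, ε) plane gives the plot itself; higher k would need a persistent homology package instead of a component count.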


VICSEK MODEL

VICSEK MODEL

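# Highly cited dynamical system in discrete time and continuous space.

# Describes motion of interacting point particles in a square with periodic boundary conditions.

# Model written as:

\[ \theta_i(t + \Delta t) = \frac{1}{N} \left( \sum_{|\vec{x}_i - \vec{x}_j| \le R} \theta_j(t) \right) + U(-\eta/2, \eta/2) \]

\[ \vec{v}_i(t + \Delta t) = v_0 \left( \cos \theta_i(t + \Delta t), \sin \theta_i(t + \Delta t) \right) \]

\[ \vec{x}_i(t + \Delta t) = \vec{x}_i(t) + \vec{v}_i(t + \Delta t) \, \Delta t \]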
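One update of the model can be sketched as below. This assumes that the 1/N factor denotes an average over the neighbours within radius R (the particle itself included), that angles are averaged arithmetically exactly as written, and that distances respect the periodic boundary. The function name and all default parameter values are hypothetical.

```python
import numpy as np

def vicsek_step(x, theta, rng, v0=0.03, R=1.0, eta=0.1, dt=1.0, box=25.0):
    """Advance positions x (N, 2) and headings theta (N,) by one time step."""
    diff = x[:, None, :] - x[None, :, :]
    diff -= box * np.round(diff / box)            # shortest displacement on the torus
    dist = np.linalg.norm(diff, axis=2)
    neighbours = (dist <= R).astype(float)        # each particle neighbours itself
    mean_theta = neighbours @ theta / neighbours.sum(axis=1)
    noise = rng.uniform(-eta / 2.0, eta / 2.0, size=len(theta))
    theta_new = mean_theta + noise
    v_new = v0 * np.column_stack((np.cos(theta_new), np.sin(theta_new)))
    x_new = (x + v_new * dt) % box                # wrap back into the square
    return x_new, theta_new, v_new

rng = np.random.default_rng(0)
x0 = rng.uniform(0.0, 25.0, size=(100, 2))
theta0 = rng.uniform(-np.pi, np.pi, size=100)
x1, theta1, v1 = vicsek_step(x0, theta0, rng)
print(x1.shape, theta1.shape)
```

Repeating the step and recording positions and headings at each time gives the frames from which order parameters and CROCKER plots are computed.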


TOPOLOGY OF INITIAL CONDITION

Three-torus T^3: b = (1, 3, 3, 1, 0, ...)

[Figure: (A) initial particle positions, both axes from 0 to 25; (B), (C) panels with horizontal axis Proximity Parameter ε; (D) Starting ε versus Ending ε, with points marked by Betti number 0 or 1]

VICSEK SIMULATION A ANALYSIS

[Figure: (A) Order Parameter ϕ versus Simulation Time t (0 to 3000); (B), (C) contour plots of Proximity Parameter ε (0 to 2) versus Simulation Time t with contour levels 1 to 5. Region labels: b0 ≥ 5, b0 = 1, b0 = 2; b1 = 0, b1 = 1. Inset: particle positions for simulation (A), both axes from 0 to 25]

VICSEK SIMULATION B ANALYSIS

[Figure: (A) Order Parameter ϕ versus Simulation Time t (0 to 600); (B), (C) contour plots of Proximity Parameter ε (0 to 2) versus Simulation Time t with contour levels 1 to 5. Region labels: b0 ≥ 5, b0 = 1; b1 = 0, b1 ≥ 5, b1 = 2, b1 = 3. Inset: particle positions for simulation (B), both axes from 0 to 7]

VICSEK SIMULATION C ANALYSIS

[Figure: (A) Order Parameter ϕ versus Simulation Time t (0 to 300); (B), (C) contour plots of Proximity Parameter ε (0 to 2) versus Simulation Time t with contour levels 1 to 5. Region labels: b0 ≥ 5, b0 = 1, b0 = 2; b1 = 2, b1 ≥ 5. Inset: particle positions for simulation (C), both axes from 0 to 5]

IDENTIFYING PARAMETERS WITH TDA AND MACHINE LEARNING

Goal:

Given simulated data from the Vicsek model, can we use machine learning algorithms to recover the unknown underlying noise parameter?

# Generate 100 simulations of different parameter choices
# Compute alignment order parameter and H0 and H1 CROCKER plots
# Compare pairwise (Euclidean) distances between simulations
# Cluster with K-medoids
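The last two steps can be sketched with NumPy alone. This is a simple alternating K-medoids with a deterministic farthest-point initialisation, chosen here for reproducibility; the talk does not say which K-medoids variant or initialisation was used, and the function names and toy features are hypothetical.

```python
import numpy as np

def pairwise_euclidean(features):
    """Euclidean distances between flattened feature arrays (e.g. CROCKER matrices)."""
    X = np.array([np.ravel(f) for f in features], dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def k_medoids(D, k, max_iter=100):
    """Cluster from a distance matrix D; returns (labels, medoid indices)."""
    # Start from the most central point, then repeatedly add the farthest point.
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        medoids.append(int(np.argmax(D[:, medoids].min(axis=1))))
    medoids = np.array(medoids)
    for _ in range(max_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                within = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return labels, medoids

# Toy stand-ins for simulation features: two well-separated groups.
features = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
D = pairwise_euclidean(features)
labels, medoids = k_medoids(D, k=2)
print(labels, medoids)
```

Working from a distance matrix rather than coordinates is what allows the same clustering step to be applied to order-parameter time series and to CROCKER matrices alike.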


EXPERIMENT 1

Noise parameters: η = 0.01, 0.1, 1

Pairwise Distance Matrices:

[Figure: pairwise distance matrices for Alignment Order Parameter, H0 CROCKER, H1 CROCKER, H0&1 CROCKER]

EXPERIMENT 2

Noise parameters:

η = 0.01, 0.02, 0.03, 0.05, 0.1, 0.19, 0.2, 0.21, 0.3, 0.5, 1, 1.5, 1.9, 1.99, 2

Pairwise Distance Matrices:

[Figure: pairwise distance matrices for H0 CROCKER, H1 CROCKER, H0&1 CROCKER]

EXPERIMENT 3

Noise parameters:

η = 0.01, 0.5, 1, 1.5, 2

Pairwise Distance Matrices:

[Figure: pairwise distance matrices for H0 CROCKER, H1 CROCKER, H0&1 CROCKER]

K-MEDOIDS CLUSTERING RESULTS

                   Exp 1            Exp 2            Exp 3
                   H0&1    Align    H0&1    Align    H0&1    Align
Accuracy           77.0%   63.3%    23.3%   14.3%    99.6%   63.0%
Silhouette Width   0.61    0.45     0.43    0.3      0.77    0.46

CONCLUSION

A SAMPLING OF OTHER APPLICATIONS

# (Bendich et al. 2016) Persistent Homology of Brain Artery Trees
# (Giusti et al. 2015) Clique Topology Reveals Intrinsic Geometric Structure in Neural Correlations
# (Zhu 2013) Persistent Homology: An Introduction and a New Text Representation for Natural Language Processing
# (Freedman and Chen 2009) Algebraic Topology for Computer Vision
# (Singh et al. 2008) Topological Analysis of Population Activity in Visual Cortex
# (Chepushtanova et al. 2016) Persistent Homology on Grassmann Manifolds for Analysis of Hyperspectral Movies
# (Zeppelzauer et al. 2016) Topological Descriptors for 3d Surface Analysis
# (Dabaghian et al. 2012) A Topological Paradigm for Hippocampal Spatial Map Formation Using Persistent Homology
# (Stolz et al. 2016) The Topological “Shape” of Brexit
# (Betancourt et al. 2018) Pseudo-multidimensional Persistence and Its Applications
# (Lee et al. 2017) Integrated Multimodal Network Approach to PET and MRI Based on Multidimensional Persistent Homology

CONCLUSION

# Topology is a useful way to analyze data.

# Topology can reveal structural information about data that cannot be seen by other measures.

# Combining persistent homology with machine learning can aid in classification results.

THANK YOU!

Questions?

Country Development: Andrew Banman

Persistence Images: Henry Adams, Sofya Chepushtanova, Tegan Emerson, Eric Hanson, Michael Kirby, Francis Motta, Rachel Neville, Chris Peterson, and Patrick Shipman

CROCKER Plots and Vicsek Model: Henry Adams, Tom Halverson, Chad Topaz, and Lu Xian

Lori Ziegelmeier
[email protected]
Department of Mathematics, Statistics, and Computer Science
