Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata...

91
Persistent homology for multivariate data visualization Bastian Rieck Interdisciplinary Center for Scientific Computing Heidelberg University

Transcript of Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata...

Page 1: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Persistent homology for multivariate datavisualization

Bastian Rieck

Interdisciplinary Center for Scientific ComputingHeidelberg University

Page 2: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

MotivationUnderstanding the ‘shape’ of data

Attributes

Inst

ance

s

Scatterplot matrix Parallel coordinatesUnstructured data

u.ve

loci

tyv.

velo

city

air.t

empe

ratu

rem

ean.

sea.

leve

l.pre

ssur

esu

rface

.tem

pera

ture

tota

l.pre

cipi

tatio

n

u.velocity v.velocity air.temperature mean.sea.level.pressure surface.temperature total.precipitation

−10

0

10

20

Corr:0.134

Corr:−0.096

Corr:−0.104

Corr:−0.0927

Corr:0.0791

−10

0

10

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●●

●● ●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

● ●

●●●

● ●●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●● ●●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

● ●

●● ●

●●

●●

● ●●

● ●

●● ●

●●

●●

●●

●●

● ●

●●

●●

●●●

●●

●●

●●

●●

●●

● ●

●●

● ●●

●●

●●

●●

●●

● ●●

●●

●●

●●

●●

●●

●●

● ●

●●

● ●

●●

●● ●●

●●

● ●

●●

●●

●●

● ●

●●

●●

●●

●●●

●●

●●

●●

● ●

●●

●●

●●

● ●

●●

●●

● ●

● ●

●●

●●

●●

●●

● ●

●●

● ●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●●

●●

● ●●

●●●

●●

●●

●●

●●

●● Corr:−0.0525

Corr:0.00273

Corr:−0.0626

Corr:−0.0639

240

260

280

300

●●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

● ●●

●●

●●

●●

●●

●●

●●

● ●●

●●

● ●

●●

●●●

●●

●●

●●●●

●●

●●

●●

●● ●

●●

●●

● ●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

● ●●

●●

●●

●●

● ●

● ●

● ●

●●

●●

●●

●●

●● ●

●●

●●

● ●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

● ●

●●

●●●

●●

● ●

●●

●●

● ●

●●

●●

●●

●●

●●

● ●

●●

● ●

●●

●●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

● ●

● ●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

● ●●

●●

●●

●●

●●

● ●

●●

● ●●

●●

●●

●●

●●●

●●

●●

● ● ●●

●●

●●

●●

●●●

●●

●●

● ●

●●

● ●

●●

●●

● ●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●● ●

●●

●●●

● ●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

● ●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●●

●●

● ●

● ●

●●

● ●

●●

●●

●●

●●

●●

● ●

●●

● ●

●●

Corr:0.196

Corr:0.997

Corr:0.171

98000

100000

102000

104000

●●

●●

● ●

● ●●

●●

●●

● ●

●●

●●

● ●

●●

●●

● ●

●●

● ●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

● ●

● ●

●●

●●

●●

●●

●●

●●

● ●●

●●

●●

●●

●●

●●

● ●●

●●

● ●●

●●

●●

●●

● ●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

● ●

●● ●

● ●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●● ●

●●●

●●

●●●

●●

●●

●●

●●

● ●●

● ●

●●

● ●

●●

●●

●●

● ●

●● ●

●●

●●

●●

● ●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●● ●

●●

● ●

●●

●●

● ●

●●

●●

● ●●

● ●

● ●

●●

●●

● ●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

● ●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

● ●

● ●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

● ●●

●●

●●●

● ●

●●

●●

●●

●●

●●

● ●

●●

●●

● ●

●●

●●

●●

●●

●●●

●●

●●

● ●

●●

● ●

●●

● ●

●●

●●

● ●

●●

●● ●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

● ●●

●● ●

● ●

● ●●

● ●

●●

●●

●●

●●●

● ●

●●

● ●

●●

●●

●●

● ●

●● ●

●●

●●

●●

● ●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

● ●

●●

●●

● ●●

● ●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

● ●

● ●

●●

●●

●●

●●

●●

●●

●●

● ●●

●●

●●

●●

●●

●●

●●●

●●

● ●●

●●

●●

●●

● ●

●●

●●●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

● ●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

● ●●

●●●

●●

● ●●

● ●

●●

●●

●●

● ●●

●●

●●

●●

●●

● ●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●

●●

●●●

Corr:0.195

Corr:−0.147

220

240

260

280

300

●●

●●●

●●

●●

●●

●●

●●

● ●●

●●

● ●

●●

●●

● ●

●●

●● ●

●●

●●

●●

●●

●●

●●

●●●

●●

● ●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

● ●●

●●

● ●

●●

●●●

●●

●●

●●

●●●●

●●

●●

●● ●

●●

●●

● ●

●●

●●

●●

● ●

● ●

●●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●●

● ●●

●●

●●●

● ●

●●

●●

●●

● ●

● ●

●●

●●

●●

●●

●●

● ●

●● ●

●●

●●

● ●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●● ●●

●●

● ●

●● ●

●●

●●●

●●

● ●

● ●

●●

●●●

● ●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●● ●

●●

● ●

● ●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

● ●

● ●

● ●●

●●

●●

●●

●●●

●●

●●

● ●

● ● ●●

●●

●●

●●●

●●

●●

●●

● ●

●●

●●

●●

● ●

●●●

●●

●●

● ●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●● ●

●●●

●●

●● ●

●●

● ●

●●

● ●

●●

● ●

●●

●●

●●

●●

●●

● ●

● ●●

● ●

●●

●●

● ●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●● ●●

●●

● ●

●●●

●●

●●●

●●

● ●

● ●

● ●

● ● ●

● ●

● ●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●

● ●

●●●

●●

●●●

●●

●●

●●

● ●

● ●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●●● ●

●●

●●

●●

● ●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●● ●

●●

● ●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

● ●

●● ●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●●

● ●

●●

● ●

●●

● ●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●●

●●

●●●

● ●

● ●

●●

● ●

● ●

●●

●●

●●

●●

●●

●●

●●

● ●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●

●●●

● ●

●● ●

●●

● ●

●●

●●

●● ●

●●

● ●

●●

● ●

●●

●●

●●

●●

●●

●●

Corr:0.173

0e+00

2e−04

4e−04

6e−04

−10 0 10 20

●● ● ●●●

●●● ●●

●● ●

●● ●

●●● ● ●● ●●

●●●

● ●●

●●

● ●●● ●●

●● ●● ●● ● ●● ●● ●●● ●

●●

●●● ● ●●●● ●●

● ●● ● ●● ●● ●●● ●

●●● ●● ●●● ●●●

●● ● ●● ● ●●●

● ●

●●

●●

● ●● ● ●●●

● ●●● ●●● ● ●● ●●

●●●●

● ● ●

●●●●

●●

●●

●●

●●

●● ●●

● ●● ●●

●● ●● ● ● ●●

● ●●

●●●● ●● ●●●●

●●

●● ●● ●●

●●●

● ●●●● ● ●● ● ●

●● ● ●● ●● ●●●● ●●●

●●● ●●

●●●● ●● ●● ●●

●●●●● ●●

● ●● ●●●● ●●● ●● ●●

●● ●● ●●●● ●●

● ●●● ● ● ●

● ●●●● ●●

●● ●●● ●● ● ●● ●● ● ● ●●●●

●● ● ●●● ●● ●●

● ●●

●● ●●●

● ●

● ●● ●● ●●●●●

● ●●●●●●●● ●● ●●

●●●

●●● ●● ●● ●●●

●●●

●●● ● ●●● ● ●●● ●●●● ●●● ●

● ●● ●●● ●●● ●● ●●

●● ● ●● ●

● ●●●● ●●●

●●●●● ●●●●●

●●

●●●

●●● ●●● ●● ●

● ●●●●

● ●●●● ●●

●●●

●● ● ●●

●●●

●● ● ●

●●

●●

● ●● ● ●

●●

●● ●● ● ●

●●● ●

●●

● ●●●

● ●●● ●● ●● ●●● ● ●●● ● ●●● ●●● ●●

●● ●● ●● ●● ●

● ●

●● ● ●● ●●

● ● ●● ●● ●

● ●

●● ●● ●

●● ●●

●●

●●●

● ● ●

● ●●● ● ●●●● ●●● ●● ●● ● ●● ● ●●●● ●

●●●● ●

●●●● ● ●●

● ●● ● ●●

●●●

●●

● ●● ●● ●● ●●

●● ●

●● ●●●●

● ●

●●●

● ● ●

●● ●

●●

●● ●● ●●●●

●● ●

●● ●

● ●●● ● ●

● ●●

● ●●

●●

●●●●

● ● ●●●●

● ● ●●● ●● ●●●● ●●

●●

● ●●●● ● ● ●

●●●

● ●● ●●●

● ●●●● ●● ●● ●● ●● ●●●●● ●

●●

●● ●●●● ●●● ●●●● ● ●●● ●●

●●

●●● ●●

●● ●●● ●●● ●

●●● ● ● ●●●● ●● ● ●●

● ●

●●●

●●●● ● ● ●●●●●

●●●

● ●●

●●●● ●●

●●● ●

●●

● ●●●●●

●●

●● ●● ●● ●●● ● ●● ●●●

●●● ●●●

●● ●● ●●

● ●● ● ●

●● ●●

● ●●●

●● ●

●● ●● ●

●● ●

● ● ●● ●●●●

●● ●● ●● ●●

−10 0 10

●●●● ●●

●●● ●●

●● ●

● ●●

●●● ●●● ●●

●● ●

● ●●

●●

● ●●● ●●

● ●●● ●● ●●● ●●●● ●●

●●

● ●●● ●●●● ●●

● ●● ●●●●● ●● ● ●

●● ●● ●●● ●●● ●

● ●● ●●● ● ● ●

●●

●●

●●

● ●● ●● ●●

●●●●●● ● ●●● ●●

●●● ●

● ● ●

● ● ●●

●●

●●

● ●

●●

●● ●●

● ● ●●●

● ●●● ● ●●●

● ●●

●● ●●● ●● ●●●

● ●

●●●● ●●●

●●

● ●●● ●● ● ●● ●

●●●● ●● ●● ●● ● ●●●

● ●● ●●

● ● ●●●● ●● ●●

●●● ●● ●●

●●● ●● ●●● ●●● ●●●

●●● ●●●●● ●●

●●●●●● ●

●●● ●● ●●

● ●●● ●● ● ●●● ●● ●● ●● ●●

●● ● ●●● ●● ●●

●●●

●● ●●●

● ●

●● ●●●●● ● ●●

● ●●●●●● ●● ●●●●

●● ●

●●● ● ● ●● ● ●●

● ●●

● ●●●● ● ●●● ●● ●● ● ●●● ● ●

● ●●● ●●●● ●●● ●●

●●● ●● ●

● ●●

● ●●● ●

●●●● ●● ●● ●●

●●

●● ●

●●● ● ●● ●●●

●●●●●

● ● ●●● ● ●

● ●●

●● ●● ●

●●●

● ●● ●

●●

●●

●●● ● ●

●●

● ●●● ●●

●●● ●

●●

●● ●●

● ●●●●●● ●● ●●● ●●●● ● ●●●●●●●

●●● ●● ●●● ●

● ●

● ●● ● ●●●

●●●● ●●●

● ●

● ●●● ●

●●●●

● ●

●●●

●● ●

●● ● ●●●● ●● ●●● ●●● ● ●●●● ●● ●●●

●● ●●●

●●●● ●●●

●● ●● ●●

●●●

●●

●●● ● ●● ●●●● ● ●

● ● ●● ● ●

●●

●●●

●●●

●● ●

●●

●● ● ● ●●●●

● ● ●

● ●●

● ●●● ●●

●●●

● ●●

● ●

●● ●●

●●●●●●●● ●● ● ●● ●● ● ●●

●●

●● ● ●● ●●● ●

●●●

●●● ●●●

● ●●● ●● ●●● ●● ● ● ●●●

● ●●

●●

● ●● ● ●● ● ●●● ●● ● ●●●●●●

●●

●●●●

●●●● ●● ●●● ●

●●● ● ● ●●●●● ● ●●●

● ●

●●●

●● ●●● ● ●●●● ●

●● ●

● ●●

● ●●● ●●

● ● ●●

●●

●●● ●●●

●●

● ●● ●●●●● ●● ●● ● ●●

●●●● ● ●● ●●● ●●

● ●● ●●

●●● ●

● ●●●

● ●●

●● ●● ●

●●●

●● ●● ● ●●●

● ● ●● ●●●●

240 260 280 300

●● ●● ●●

●● ●●●

● ●●

●● ●

●● ●●● ● ●●

● ● ●

● ●●

● ●

●● ●●●●

●●● ●● ●●● ●● ●●●●●

●●

●● ●● ●●● ●●●

● ● ●●● ●● ● ● ●●●

●● ●●● ●● ● ●● ●

● ● ●●● ● ● ● ●

●●

● ●

● ●

● ●●●● ●●

●●● ●●● ● ●●●●●●

●● ●●● ●

● ●● ●

● ●

●●

● ●

●●

● ●●●

● ● ●●●

● ●● ●●●●●

●●●

●● ● ●● ●● ●●●

● ●

● ●●● ●●

●● ●

●● ●●● ● ●●● ●

●●● ●● ●● ● ●●● ●● ●

● ● ●●●

●●● ● ●●● ●●●

●● ● ●●● ●

●●● ●●● ●● ●● ●●●●

●● ●●● ● ● ●●●

● ● ●●● ●●

●● ●● ●●●

● ● ●● ●●● ●● ●● ●●● ●●●●

●●●● ● ●●●● ●

●● ●

●● ●●●

●●

● ●● ●● ● ●●●●

● ● ● ●●●● ●●● ● ● ●

●● ●

●● ●● ●● ●●●●

● ●●

●● ●●●● ●●● ●● ● ●●● ●●●●

● ●● ●● ●● ●●● ●● ●

●● ●●● ●

● ●●●● ●● ●

● ●●●●● ●●●●

●●

● ●●

●●● ●●●● ●●

●● ●●●

●●●●● ●●

●●●

● ●● ●●

●●●

● ●●●

●●

●●

●● ●●●

● ●

●●●●●●

●● ●●

●●

●● ●●

● ●● ●●●● ●● ●● ●●● ●● ●●●● ● ●●●

● ●● ●● ●●●●

● ●

●●●● ●●●

●● ● ●●● ●

●●

● ●●● ●

● ●●●

●●

●●●

● ● ●

●● ●●●●●●● ● ●● ●● ●● ●● ●● ●●● ●●

●●●● ●

● ●● ●● ● ●

●●●● ●●

●●●

●●

●● ● ●● ● ●●●

● ●●

●●●● ●●

● ●

●●●

●●●

●● ●

● ●

●● ●● ●● ●●

● ●●

●● ●

● ●●●● ●

● ●●

●●●

●●

● ●●●

● ●● ●●●

● ●●● ● ●●●●● ●●●

●●

●●● ●● ●●●

●● ●

●● ●●●●

● ● ● ●● ●● ●●● ●●● ● ●●

●●●

●●

●● ● ● ●●●●●●● ●●●●● ●● ●

●●

●● ●●

●● ●● ● ●● ●●●

● ●●●● ●●● ●● ●●●●

●●

●●●

●●●● ●●● ●● ● ●

●●●

●●●

● ● ●●●●

● ●●●

●●

●● ● ●●●

●●

●●● ● ● ●● ●● ●● ●●●●

● ●●●● ●●● ●●● ●

●● ●● ●

●● ●●

● ● ●●

● ●●

● ●● ●●

●● ●

●● ●●● ●● ●

●●●●● ●●●

98000 100000 102000 104000

●●●● ● ●

●● ●● ●

● ● ●

●● ●

● ●●●● ●●●

● ● ●

●● ●

●●

●● ● ●●●

● ●●● ● ●● ● ●●●●● ●●

●●

●● ●●●●● ● ● ●

● ● ●●● ●●● ●● ● ●

●●●●● ●● ●●●●

● ●●● ●● ● ●●

●●

● ●

● ●

●● ●●● ●●

● ●● ● ●● ● ●●● ●●

●●● ●

●● ●

● ●●●

●●

●●

●●

●●

● ●●●

●● ●●●

● ●● ●● ●● ●

●●●

●●● ●● ●● ● ●●

● ●

● ● ●● ●●

●● ●

●● ●●● ●●●● ●

●● ●● ● ●●● ●●● ●●●

● ●● ●●

●● ●● ●●● ●●●

●● ●●●● ●

● ●● ●●● ●● ●● ●●●●●● ●●●● ●●● ●

● ●●●● ●●

● ●● ●●●●

● ●●● ●●● ●● ●●● ●●●●●●

●●● ●● ●●●● ●

● ●●

●● ●● ●

● ●

●●●● ● ●●● ●●

●● ●● ●● ● ● ● ●●● ●

●● ●

●●● ● ●● ●● ●●

●●●

● ●●●● ●●●● ●● ●●●● ●●●●

●●●● ● ●● ●● ●●● ●

● ●● ● ●●

● ●●

● ● ●● ●

● ●● ●●● ● ● ●●

●●

● ●●

●●● ● ●●● ●●

●●● ●●

●●● ● ●●●

● ●●

●●● ●●

●●●

●●● ●

●●

●●

● ● ● ●●

● ●

●●● ●●●

●●● ●

●●

●●●●

●●● ●● ● ●●● ●● ● ●● ● ●● ●● ●●● ●●

● ● ●●● ● ●●●

●●

● ● ●● ● ●●

●● ●●●● ●

●●

●●●● ●

● ●● ●

●●

●●●

●● ●

● ●● ●●●●●● ●●● ●● ●● ● ●●● ● ● ●● ●

●●●● ●

●●● ●● ●●

●●● ● ●●

●●●

● ●

●● ● ●● ●●●●

● ●●

●●● ● ●●

● ●

●●●

● ●●

●● ●

●●

●● ●●● ●●●

●●●

●● ●

● ●●● ●●

● ●●

● ●●

●●

●● ●●

● ●● ● ●●

● ●● ●●●● ●● ● ●●●

●●

●●● ● ●●●●

●● ●

● ●●● ● ●

● ● ●●● ●●● ●● ●●● ●●●

● ●●

●●

●●● ● ●●●●●● ●●●●● ● ● ● ●

●●

●●●●

●● ●● ●● ●● ●●

●● ● ●● ● ●● ●● ● ●● ●

●●

●●●

●● ●● ●●● ●● ●●

●● ●

● ●●

●●●● ●●

●● ●●

●●

●● ●● ●●

●●

● ●● ● ●● ●●● ●● ●●●●

● ● ● ●● ●●● ●●● ●

●● ● ●●

● ●●●

● ●●●

● ● ●

● ●● ●●

● ●●

●● ● ●● ●●●

● ●●●● ●●●

220 240 260 280 300

●● ●● ●●

●● ●●●

● ●●

●● ●

●● ●●● ● ●●

● ● ●

● ●●

● ●

●● ● ●●●

●●● ●● ●●● ●● ●●●●●

●●

●● ●● ●●● ●●●

● ● ● ●●●● ●● ●●●

●● ●●● ●● ● ●● ●

● ● ●●● ● ● ● ●

●●

● ●

● ●

● ●●●● ●●

●●● ●●● ● ●●●●●

●●● ●

●● ●

● ●● ●

● ●

●●

● ●

●●

● ●●●

● ● ●●●

● ●● ●●●●●

●●●

●● ● ●●●● ●●●

● ●

● ●●● ●●

●● ●

●● ●●● ● ●●● ●

●●● ●● ●● ● ●●● ●● ●

● ● ●● ●

●●● ● ●●● ●●●

●● ● ●●● ●

●●● ●●● ●● ●● ●●●●

●● ●●● ● ● ●●●

● ● ●●● ●●

●● ●● ●●●

● ● ●● ●●● ●● ●● ●●● ●●●●

●●●● ● ●●●● ●

●● ●

●● ●●●

●●

● ●● ●● ● ●●●●

● ● ● ●●●● ●●● ● ● ●

●● ●

●● ●● ●● ●●●●

● ●●

●● ●●●● ●●● ●● ● ●●● ●●●●

● ●● ●● ●● ●●● ●● ●

●● ●●● ●

● ●●●● ●● ●

● ●●●●● ●●●●

●●

● ●●

●●● ●●●● ●●

●● ●●●

●●●●● ●●

●●●

● ●● ●●

●●●

● ●●●

●●

●●

●● ●●●

● ●

●●●●●●

●● ●●

●●

●● ●●

● ●● ●●●● ●● ●● ●●● ●● ●●●● ● ●●●

● ●● ●● ●●●●

● ●

●●●● ●●●

●● ●●●● ●

●●

● ●●● ●

● ●●●

●●

●●●

● ● ●

●● ●●●●●●● ● ●● ●● ●● ●● ●● ●●● ●●

●●●● ●

● ●● ●● ● ●

●●●● ●●

●●●

●●

●● ● ●● ● ●●●

● ●●

●●●● ●●

● ●

●●●

●●●

●● ●

● ●

●● ●● ●● ●●

● ●●

●● ●

● ●●●● ●

● ●●

●●●

●●

● ●●●

● ●● ●●●

● ●●● ●●●●●● ●●●

●●

●●● ●● ●●●

●● ●

●● ●●●●

● ● ● ●● ●● ●●● ●●● ● ●●

●●●

●●

●● ● ● ●●●●●●● ● ●●●● ●● ●

●●●

● ●●●

● ●● ● ●● ●●●

● ●●●● ●●● ●● ●●●●

●●

●●●

●●●● ●●● ●● ●●

●●●

●●●

● ● ●●●●

● ●●●

●●

●● ● ●●●●

●●●● ● ● ●● ●● ●● ●●●

●● ●●●● ●

●● ●●● ●

●● ●● ●

●● ●●

● ● ●●

● ●●

● ●● ●●

●● ●

●● ●●● ●● ●

●●●●● ●●●

0e+00 2e−04 4e−04 6e−04

0.00

0.25

0.50

0.75

1.00

u.velocity v.velocity air.temperature mean.sea.level.pressure surface.temperature total.precipitationvariable

valu

e

Bastian Rieck Persistent homology for multivariate data visualization 1

Page 3: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Agenda

1 Theory: Algebraic topology

2 Theory: Persistent homology

3 Applications

Bastian Rieck Persistent homology for multivariate data visualization 2

Page 4: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Part I

Theory: Algebraic topology

Page 5: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Algebraic topology

Algebraic topology is the branch of mathematics that usestools from abstract algebra to studymanifolds. The basicgoal is to find algebraic invariants that classify topologicalspaces up to homeomorphism.

Adapted from https://en.wikipedia.org/wiki/Algebraic_topology.

Bastian Rieck Persistent homology for multivariate data visualization 3

Page 6: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Manifolds

A d-dimensional Riemannian manifoldM in someRn, with d � n,is a space where every point p ∈M has a neighbourhood that‘locally looks’ likeRd.

A 2-dimensional manifold

Bastian Rieck Persistent homology for multivariate data visualization 4

Page 7: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Homeomorphisms

A homeomorphism between two spaces X and Y is a continuousfunction f : X → Y whose inverse f−1 : Y → X exists and iscontinuous as well.

Intuitively, we may stretch, bend—but not tear and glue the twospaces.

Bastian Rieck Persistent homology for multivariate data visualization 5

Page 8: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Algebraic invariants

An invariant is a property of an object that remains unchangedupon transformations such as scaling or rotations.

Example

Dimension is a simple invariant: R2 6= R3 because 2 6= 3.

In general

LetM be the family of manifolds. An invariant permits us to definea function f : M×M → {0, 1} that tells us whether two manifoldsare different or ‘equal’ (with respect to that invariant).

No invariant is perfect—there will be objects that have the sameinvariant even though they are different.

Bastian Rieck Persistent homology for multivariate data visualization 6

Page 9: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Betti numbersA useful topological invariant

Informally, they count the number of holes in different dimensionsthat occur in a data set.

β0 Connected componentsβ1 Tunnelsβ2 Voids...

...

Space β0 β1 β2

Point 1 0 0Circle 1 1 0Sphere 1 0 1Torus 1 2 1

Bastian Rieck Persistent homology for multivariate data visualization 7

Page 10: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Signature property

If βXi 6= βY

i , we know that X 6∼= Y. The converse is not true,unfortunately:

Space β0 β1

X 1 1Y 1 1

We have β0 = 1 and β1 = 1 for X and Y, but still X 6∼= Y.

Bastian Rieck Persistent homology for multivariate data visualization 8

Page 11: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Simplicial complexes

0-simplex 1-simplex 2-simplex 3-simplex

Valid Invalid

Bastian Rieck Persistent homology for multivariate data visualization 9

Page 12: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Simplicial complexesExample

The simplicial complex representation is compact and permits thecalculation of the Betti numbers using an efficient matrix reductionscheme.

Bastian Rieck Persistent homology for multivariate data visualization 10

Page 13: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Basic ideaCalculating boundaries

The boundary of the triangle is:

∂2{a, b, c} = {b, c}+ {a, c}+ {a, b}

The set of edges does not have boundary:

∂1 ({b, c}+ {a, c}+ {a, b})

= {c}+ {b}+ {c}+ {a}+ {b}+ {a}

= 0

Bastian Rieck Persistent homology for multivariate data visualization 11

Page 14: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Fundamental lemma

For all p, we have ∂p−1 ◦ ∂p = 0: Boundaries do not have a boundarythemselves.

This permits us to calculate Betti numbers of simplicial complexesby reducing a boundarymatrix to its Smith Normal Form usingGaussian elimination.

Bastian Rieck Persistent homology for multivariate data visualization 12

Page 15: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Summary

We want to differentiate between different objects.

This endeavour requires algebraic invariants.

One invariant, the Betti numbers, measures intuitive aspects ofour data.

Their calculation requires a simplicial complex and a boundaryoperator.

Bastian Rieck Persistent homology for multivariate data visualization 13

Page 16: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Summary

We want to differentiate between different objects.

This endeavour requires algebraic invariants.

One invariant, the Betti numbers, measures intuitive aspects ofour data.

Their calculation requires a simplicial complex and a boundaryoperator.

Bastian Rieck Persistent homology for multivariate data visualization 13

Page 17: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Summary

We want to differentiate between different objects.

This endeavour requires algebraic invariants.

One invariant, the Betti numbers, measures intuitive aspects ofour data.

Their calculation requires a simplicial complex and a boundaryoperator.

Bastian Rieck Persistent homology for multivariate data visualization 13

Page 18: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Summary

We want to differentiate between different objects.

This endeavour requires algebraic invariants.

One invariant, the Betti numbers, measures intuitive aspects ofour data.

Their calculation requires a simplicial complex and a boundaryoperator.

Bastian Rieck Persistent homology for multivariate data visualization 13

Page 19: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Summary

We want to differentiate between different objects.

This endeavour requires algebraic invariants.

One invariant, the Betti numbers, measures intuitive aspects ofour data.

Their calculation requires a simplicial complex and a boundaryoperator.

Bastian Rieck Persistent homology for multivariate data visualization 13

Page 20: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Part II

Theory: Persistent homology

Page 21: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Real-world multivariate data

Unstructured point clouds

n items withD attributes; n×Dmatrix

Non-random sample fromRD

Manifold hypothesis

There is an unknown d-dimensionalmanifoldM ⊆ RD, with d � D, from whichour data have been sampled.

2-manifold inR3

Bastian Rieck Persistent homology for multivariate data visualization 14

Page 22: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Agenda

1 Convert our input data into a simplicial complex K.

2 Calculate the Betti numbers of K.

3 Use the Betti numbers to compare data sets.

(Fair warning: It won’t be so simple)

Bastian Rieck Persistent homology for multivariate data visualization 15

Page 23: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Converting unstructured data into a simplicial complex

Require: Distance measure (e.g. Euclidean distance), maximumscale threshold ε.

Construct the Vietoris–Rips complex Vε by adding a k-simplexwhenever all of its (k− 1)-dimensional faces are present.

Bastian Rieck Persistent homology for multivariate data visualization 16

Page 24: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Calculating Betti numbers directly from VεUnstable behaviour

Bastian Rieck Persistent homology for multivariate data visualization 17

Page 25: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Calculating persistent Betti numbersPersistent homology

Bastian Rieck Persistent homology for multivariate data visualization 18

Page 26: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

How does this work in practice?An example for β0

Have a function f : Vε → R on the vertices of the Vietoris–Ripscomplex.

Extend it to a function on the whole simplicial complex bysetting f(σ) = max{f(v) | v ∈ σ}.

Analyse the connectivity changes in the sublevel sets of f, i.e.sets of the form

L−α(f) = {v | f(v) 6 α}.

This can be done by traversing the values of f in increasingorder and stopping at ‘critical points’.

Bastian Rieck Persistent homology for multivariate data visualization 19

Page 27: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

How does this work in practice?An example for β0

Have a function f : Vε → R on the vertices of the Vietoris–Ripscomplex.

Extend it to a function on the whole simplicial complex bysetting f(σ) = max{f(v) | v ∈ σ}.

Analyse the connectivity changes in the sublevel sets of f, i.e.sets of the form

L−α(f) = {v | f(v) 6 α}.

This can be done by traversing the values of f in increasingorder and stopping at ‘critical points’.

Bastian Rieck Persistent homology for multivariate data visualization 19

Page 28: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

How does this work in practice?An example for β0

Have a function f : Vε → R on the vertices of the Vietoris–Ripscomplex.

Extend it to a function on the whole simplicial complex bysetting f(σ) = max{f(v) | v ∈ σ}.

Analyse the connectivity changes in the sublevel sets of f, i.e.sets of the form

L−α(f) = {v | f(v) 6 α}.

This can be done by traversing the values of f in increasingorder and stopping at ‘critical points’.

Bastian Rieck Persistent homology for multivariate data visualization 19

Page 29: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

How does this work in practice?An example for β0

Have a function f : Vε → R on the vertices of the Vietoris–Ripscomplex.

Extend it to a function on the whole simplicial complex bysetting f(σ) = max{f(v) | v ∈ σ}.

Analyse the connectivity changes in the sublevel sets of f, i.e.sets of the form

L−α(f) = {v | f(v) 6 α}.

This can be done by traversing the values of f in increasingorder and stopping at ‘critical points’.

Bastian Rieck Persistent homology for multivariate data visualization 19

Page 30: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Persistent homology & persistence diagramsOne-dimensional example

Bastian Rieck Persistent homology for multivariate data visualization 20

Page 31: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Persistent homology & persistence diagramsOne-dimensional example

Bastian Rieck Persistent homology for multivariate data visualization 20

Page 32: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Persistent homology & persistence diagramsOne-dimensional example

Bastian Rieck Persistent homology for multivariate data visualization 20

Page 33: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Persistent homology & persistence diagramsOne-dimensional example

Bastian Rieck Persistent homology for multivariate data visualization 20

Page 34: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Persistent homology & persistence diagramsOne-dimensional example

Bastian Rieck Persistent homology for multivariate data visualization 20

Page 35: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Persistent homology & persistence diagramsOne-dimensional example

Bastian Rieck Persistent homology for multivariate data visualization 20

Page 36: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Persistent homology & persistence diagramsOne-dimensional example

Bastian Rieck Persistent homology for multivariate data visualization 20

Page 37: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Persistent homology & persistence diagramsOne-dimensional example

Bastian Rieck Persistent homology for multivariate data visualization 20

Page 38: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Persistent homology & persistence diagramsOne-dimensional example

Bastian Rieck Persistent homology for multivariate data visualization 20

Page 39: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Persistent homology & persistence diagramsOne-dimensional example

Bastian Rieck Persistent homology for multivariate data visualization 20

Page 40: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Persistent homology & persistence diagramsOne-dimensional example

Bastian Rieck Persistent homology for multivariate data visualization 20

Page 41: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Uses for persistence diagrams

A persistence diagram is a multi-scale summary of topologicalactivity in a data set. But the diagrams go well and beyond a simplecomparison of Betti numbers!

L’algèbre est généreuse, elle donne souvent plus qu’on luidemande.

—D’Alembert

Bastian Rieck Persistent homology for multivariate data visualization 21

Page 42: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Distance calculations

W2(X, Y) =

√inf

η : X→Y

∑x∈X

‖x− η(x)‖2∞

Bastian Rieck Persistent homology for multivariate data visualization 22

Page 43: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Distance calculations

W2(X, Y) =

√inf

η : X→Y

∑x∈X

‖x− η(x)‖2∞

Bastian Rieck Persistent homology for multivariate data visualization 22

Page 44: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Distance calculations

W2(X, Y) =

√inf

η : X→Y

∑x∈X

‖x− η(x)‖2∞

Bastian Rieck Persistent homology for multivariate data visualization 22

Page 45: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Sensitivity of distancesWasserstein versus function space distances

Only the Wasserstein distance does not distort the ‘shape’ of noisein the data.

Bastian Rieck Persistent homology for multivariate data visualization 23

Page 46: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Sensitivity of distancesWasserstein versus function space distances

Only the Wasserstein distance does not distort the ‘shape’ of noisein the data.

Bastian Rieck Persistent homology for multivariate data visualization 23

Page 47: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Part III

Applications

Page 48: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Published projects

Persistence rings Simplicial chain graphs

bagged trees

regression trees

enet

enet tuned

knn

lm tuned

lm

m5

mars

plspls tuned

random forest

ridge

ridge tuned

rlm pca

rlm

rpart

svm

svm tuned

cubist

Model landscapesRieck, Mara, Leitte: Multivariate data analysis using persistence-based �ltering and topological signatures

Rieck, Leitte: Structural analysis of multivariate point clouds using simplicial chains Rieck, Leitte: Enhancing comparative model analysis using persistent homology

MDS1.65

t-SNE2.52

Isomapk=8, 3.55

Evaluating embeddingsRieck, Leitte: Persistent homology for the evaluation of dimensionality reduction schemes

Agreement analysisRieck, Leitte: Agreement analysis of quality measures for dimensionality reduction

Data descriptor landscapesRieck, Leitte: Comparing dimensionality reduction methods using data descriptor landscapes

Bastian Rieck Persistent homology for multivariate data visualization 24

Page 49: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Published projects

Persistence rings Simplicial chain graphs

bagged trees

regression trees

enet

enet tuned

knn

lm tuned

lm

m5

mars

plspls tuned

random forest

ridge

ridge tuned

rlm pca

rlm

rpart

svm

svm tuned

cubist

Model landscapesRieck, Mara, Leitte: Multivariate data analysis using persistence-based �ltering and topological signatures

Rieck, Leitte: Structural analysis of multivariate point clouds using simplicial chains Rieck, Leitte: Enhancing comparative model analysis using persistent homology

MDS1.65

t-SNE2.52

Isomapk=8, 3.55

Evaluating embeddingsRieck, Leitte: Persistent homology for the evaluation of dimensionality reduction schemes

Agreement analysisRieck, Leitte: Agreement analysis of quality measures for dimensionality reduction

Data descriptor landscapesRieck, Leitte: Comparing dimensionality reduction methods using data descriptor landscapes

Bastian Rieck Persistent homology for multivariate data visualization 24

Page 50: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Qualitative visualizations: Persistence rings

0.5 1

0.5

1

Bastian Rieck Persistent homology for multivariate data visualization 25

Page 51: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Qualitative visualizations: Persistence rings

0.5 1

0.5

1

Bastian Rieck Persistent homology for multivariate data visualization 25

Page 52: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Qualitative visualizations: Persistence rings

0.5 1

0.5

1

Bastian Rieck Persistent homology for multivariate data visualization 25

Page 53: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Qualitative visualizations: Simplicial chain graphs

MotivationAnalyse the connectivity of topological features for ensembles ofdata sets: Different runs of an experiment, different times at whichmeasurements are being taken…

Bastian Rieck Persistent homology for multivariate data visualization 26

Page 54: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Central idea

Obtain geometrical descriptions of topological features (‘holes’)while calculating persistent homology.

This is known as the ‘localization problem’ in persistent homology.

Bastian Rieck Persistent homology for multivariate data visualization 27

Page 55: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Solving the localization problem

1 Define a geodesic ball in a simplicial complex.

2 Solve all-pairs-shortest-paths problem to find possible sites.

3 Branch-and-bound strategy to improve performance.

Bastian Rieck Persistent homology for multivariate data visualization 28

Page 56: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Simplicial chain graphBasic example

Advantage: The space in which we localize the features usually hasa high dimensions, but the graph will always be drawn inR2.

Bastian Rieck Persistent homology for multivariate data visualization 29

Page 57: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

ResultsData: Tropical Atmosphere Ocean Array

1995 199719961994

Bastian Rieck Persistent homology for multivariate data visualization 30

Page 58: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Lessons learned

1 We can obtain features via persistent homology that permit acomparative analysis.

2 Visualizing these features becomes abstract very quickly.

3 Need more ‘quantitative’ topological visualizations.

Bastian Rieck Persistent homology for multivariate data visualization 31

Page 59: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Quantitative visualizations: Model landscapes

Example: Solubility analysis

19 different mathematical models

1267 chemical compounds, described by228-dimensional feature vectors

Measured ground truth (solubility values)

Each model is a function f : D→ R. How to evaluate similarities &differences between the models?

Bastian Rieck Persistent homology for multivariate data visualization 32

Page 60: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

State-of-the-art

Existingmeasures (RMSE or R2) only focus on values of amodel.

The structure/shape is not being used!

Shortcomings: Sensitivity to noise, ‘masking’ the influence ofoutliers…

Bastian Rieck Persistent homology for multivariate data visualization 33

Page 61: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Our approach

1 Calculate Vietoris–Rips complex Vε on the molecular descriptors.

2 Use model values & ground truth to obtain a set of functions on Vε.

3 Calculate persistence diagrams for each function.

4 Compare diagrams using the Wasserstein distance.

5 Absolute comparison with ground truth diagram.

6 Relative comparison with all diagrams.

Bastian Rieck Persistent homology for multivariate data visualization 34

Page 62: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Our approach

1 Calculate Vietoris–Rips complex Vε on the molecular descriptors.

2 Use model values & ground truth to obtain a set of functions on Vε.

3 Calculate persistence diagrams for each function.

4 Compare diagrams using the Wasserstein distance.

5 Absolute comparison with ground truth diagram.

6 Relative comparison with all diagrams.

Bastian Rieck Persistent homology for multivariate data visualization 34

Page 63: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Our approach

1 Calculate Vietoris–Rips complex Vε on the molecular descriptors.

2 Use model values & ground truth to obtain a set of functions on Vε.

3 Calculate persistence diagrams for each function.

4 Compare diagrams using the Wasserstein distance.

5 Absolute comparison with ground truth diagram.

6 Relative comparison with all diagrams.

Bastian Rieck Persistent homology for multivariate data visualization 34

Page 64: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Our approach

1 Calculate Vietoris–Rips complex Vε on the molecular descriptors.

2 Use model values & ground truth to obtain a set of functions on Vε.

3 Calculate persistence diagrams for each function.

4 Compare diagrams using the Wasserstein distance.

5 Absolute comparison with ground truth diagram.

6 Relative comparison with all diagrams.

Bastian Rieck Persistent homology for multivariate data visualization 34

Page 65: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Our approach

1 Calculate Vietoris–Rips complex Vε on the molecular descriptors.

2 Use model values & ground truth to obtain a set of functions on Vε.

3 Calculate persistence diagrams for each function.

4 Compare diagrams using the Wasserstein distance.

5 Absolute comparison with ground truth diagram.

6 Relative comparison with all diagrams.

Bastian Rieck Persistent homology for multivariate data visualization 34

Page 66: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Our approach

1 Calculate Vietoris–Rips complex Vε on the molecular descriptors.

2 Use model values & ground truth to obtain a set of functions on Vε.

3 Calculate persistence diagrams for each function.

4 Compare diagrams using the Wasserstein distance.

5 Absolute comparison with ground truth diagram.

6 Relative comparison with all diagrams.

Bastian Rieck Persistent homology for multivariate data visualization 34

Page 67: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Visualization of relative model differences

bagged trees

regression trees

enet

enet tuned

knn

lm tuned

lm

m5

mars

plspls tuned

random forest

ridge

ridge tuned

rlm pca

rlm

rpart

svm

svm tuned

cubist

Bastian Rieck Persistent homology for multivariate data visualization 35

Page 68: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Why is m5 rated differently?Comparison with cubist

Bastian Rieck Persistent homology for multivariate data visualization 36

Page 69: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Summary

Topological methods yield quantitative and qualitative informationabout data sets—often, this goes well and beyond the scope ofregular geometric approaches.

Future work

1 Performance improvements: Smaller complexes, otherdistance measures, …

2 Ensemble data & ‘average’ topological structures

3 Connection to geometric features in data

Thank you for your attention!

Bastian Rieck Persistent homology for multivariate data visualization 37

Page 70: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Summary

Topological methods yield quantitative and qualitative informationabout data sets—often, this goes well and beyond the scope ofregular geometric approaches.

Future work

1 Performance improvements: Smaller complexes, otherdistance measures, …

2 Ensemble data & ‘average’ topological structures

3 Connection to geometric features in data

Thank you for your attention!

Bastian Rieck Persistent homology for multivariate data visualization 37

Page 71: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Summary

Topological methods yield quantitative and qualitative informationabout data sets—often, this goes well and beyond the scope ofregular geometric approaches.

Future work

1 Performance improvements: Smaller complexes, otherdistance measures, …

2 Ensemble data & ‘average’ topological structures

3 Connection to geometric features in data

Thank you for your attention!

Bastian Rieck Persistent homology for multivariate data visualization 37

Page 72: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Part IV

Bonus slides

Page 73: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Quantitative visualizations: Dimensionality reduction

Which of these embeddings are suitable for my data?

Bastian Rieck Persistent homology for multivariate data visualization 38

Page 74: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Data descriptorsDefinition & examples

Any function f : D ⊆ Rn → R is an admissible data descriptor. fshould measure ‘salient properties’ of a multivariate data set.

Density Eccentricity Local linearity

Low High

Bastian Rieck Persistent homology for multivariate data visualization 39

Page 75: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Data descriptorsDefinition & examples

Any function f : D ⊆ Rn → R is an admissible data descriptor. fshould measure ‘salient properties’ of a multivariate data set.

Density

Eccentricity Local linearity

Low High

Bastian Rieck Persistent homology for multivariate data visualization 39

Page 76: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Data descriptorsDefinition & examples

Any function f : D ⊆ Rn → R is an admissible data descriptor. fshould measure ‘salient properties’ of a multivariate data set.

Density Eccentricity

Local linearity

Low High

Bastian Rieck Persistent homology for multivariate data visualization 39

Page 77: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Data descriptorsDefinition & examples

Any function f : D ⊆ Rn → R is an admissible data descriptor. fshould measure ‘salient properties’ of a multivariate data set.

Density Eccentricity Local linearity

Low High

Bastian Rieck Persistent homology for multivariate data visualization 39

Page 78: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Workflow

Persistenthomology

Datadescriptors

Embeddings

Original data

W2

W2

Bastian Rieck Persistent homology for multivariate data visualization 40

Page 79: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Visualizing global quality

QualityEmbedding = W2

(DOriginal, DEmbedding

)HLLE

1.66

t-SNE2.26

Isomap2.26

PCA3.42

SPE10.67

Bastian Rieck Persistent homology for multivariate data visualization 41

Page 80: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Correspondence to our intuitionGlobal quality information

HLLE1.66

t-SNE2.26

Isomap2.26

PCA3.42

SPE10.67

t-SNE

Isomap PCA SPE

HLLEOriginal Data

Bastian Rieck Persistent homology for multivariate data visualization 42

Page 81: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Local quality information

Quality

GoodMediumBad

Bastian Rieck Persistent homology for multivariate data visualization 43

Page 82: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

ResultsIsomap faces

Image source: A Global Geometric Framework for Nonlinear Dimensionality Reduction (Joshua B. Tenenbaum, Vin de Silva,John C. Langford), 2000.

Bastian Rieck Persistent homology for multivariate data visualization 44

Page 83: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

ResultsIsomap faces: Parameter study

MDS1.65

t-SNE2.52

Isomapk=8, 3.55

Isomapk=16, 3.62

Isomapk=10, 5.07

Bastian Rieck Persistent homology for multivariate data visualization 45

Page 84: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Quantitative visualizations: Multiple data descriptors

Data descriptors

Dat

a se

tEm

be

dd

ing

s

Bastian Rieck Persistent homology for multivariate data visualization 46

Page 85: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Feature vector approach

Embedding Data descriptors

Feature vector

Data W2 W2 W2

Bastian Rieck Persistent homology for multivariate data visualization 47

Page 86: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Data descriptor landscapes

Bastian Rieck Persistent homology for multivariate data visualization 48

Page 87: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Data descriptor landscapes

density

eccentricity

linearity

Bastian Rieck Persistent homology for multivariate data visualization 48

Page 88: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Data descriptor landscapes

density

eccentricity

linea

rity

Isomap (k=8)

Isomap (k=9)

HLLE LTSA

LLE

PCA

Low (better) High (worse)

Norm

Bastian Rieck Persistent homology for multivariate data visualization 48

Page 89: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Data descriptor landscapes

density

eccentricity

linea

rity

Isomap (k=8)

Isomap (k=9)

HLLE

LLE

PCA

Low (better) High (worse)

Norm

LTSA

Bastian Rieck Persistent homology for multivariate data visualization 48

Page 90: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Data descriptor landscapesGrouping behaviour

SPE (k=20)

PCA

FA (n=1000)

LLE (k=30)density

eccentricity

linea

rity

FA (n=2000)

FA (n=5000)

LLE (k=25)

LLE (k=50)

LLE (k=45)

LLE (k=55)

LLE (k=99)

Bastian Rieck Persistent homology for multivariate data visualization 49

Page 91: Persistent homology for multivariate data …...Motivation Understandingthe‘shape’ofdata Instances Attributes Unstructured dataScatterplot matrix Parallel coordinates u.velocity

Data descriptor landscapesGrouping behaviour

PCA

LLE (k=30)density

eccentricity

linea

rity

FA (n=2000)

FA (n=5000)

LLE (k=25)

LLE (k=50)

LLE (k=45)

LLE (k=99)

FA (n=1000)

LLE (k=55)

SPE (k=20)

Bastian Rieck Persistent homology for multivariate data visualization 49