Tutorial of topological_data_analysis_part_1(basic)

Tutorial of Topological Data Analysis Tran Quoc Hoan @k09ht haduonght.wordpress.com/ Hasegawa lab., Tokyo The University of Tokyo Part I - Basic Concepts

Transcript of Tutorial of topological_data_analysis_part_1(basic)

Page 1: Tutorial of topological_data_analysis_part_1(basic)

Tutorial of Topological Data Analysis

Tran Quoc Hoan

@k09ht haduonght.wordpress.com/

Hasegawa lab., Tokyo

The University of Tokyo

Part I - Basic Concepts

Page 2: Tutorial of topological_data_analysis_part_1(basic)

My road to TDA (Topological Data Analysis)

TDA Road

Part I - Basic concepts & applications

Part II - Advanced computation

Part III - Mapper Algorithm

Part IV - Software Roadmap

Part V - Applications in…

Part VI - Applications in…

He is following me

Page 3: Tutorial of topological_data_analysis_part_1(basic)

Outline

TDA - Basic Concepts

1. Topology and holes

2. Simplicial complexes

3. Definition of holes

4. Persistent homology

5. Some applications


Page 5: Tutorial of topological_data_analysis_part_1(basic)

Topology

I - Topology and Holes

Topology studies the properties of a space that are preserved under continuous deformations, such as stretching and bending, but not tearing or gluing.

[Figure: shapes that are equivalent (≅) under continuous deformation]

Page 6: Tutorial of topological_data_analysis_part_1(basic)

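Invariant

Question: what are the invariant things in topology?

[Figure: example shapes grouped by equivalence (≅) under continuous deformation]

Number of connected components, rings, and cavities for each example shape:

Connected components   Rings   Cavities
1                      0       0
2                      0       0
1                      1       0
1                      1       0

I - Topology and Holes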

Page 7: Tutorial of topological_data_analysis_part_1(basic)

Holes and dimension

Topology: consider continuous deformations, which preserve the holes of each dimension

✤ What matters for the form of a shape: connected components, rings, cavities

• 0-dimensional “hole” = connected component

• 1-dimensional “hole” = ring

• 2-dimensional “hole” = cavity

How to define a “hole”?

Use an “algebraic” tool: the homology group

I - Topology and Holes

Page 8: Tutorial of topological_data_analysis_part_1(basic)

Homology group

✤ For a geometric object X, the homology H_q satisfies:

k_0: number of connected components

k_1: number of rings

k_2: number of cavities

k_q: number of q-dimensional holes

The numbers k_q are the Betti numbers.

I - Topology and Holes

Image source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf


Page 10: Tutorial of topological_data_analysis_part_1(basic)

Simplicial complexes

Simplicial complex: a set of vertices, edges, triangles, tetrahedra, … that is closed under taking faces and has no improper intersections

k-simplex: vertex (0-dimensional), edge (1-dimensional), triangle (2-dimensional), tetrahedron (3-dimensional)

[Figure: an example of a simplicial complex and an example that is not a simplicial complex]

2 - Simplicial complexes

Page 11: Tutorial of topological_data_analysis_part_1(basic)

Simplex

n-simplex: the convex hull (the smallest convex set containing them) of n+1 affinely independent points

vertex (0-dimensional), edge (1-dimensional), triangle (2-dimensional), tetrahedron (3-dimensional), n-simplex

$\sigma = |v_0 v_1 \dots v_n| = \{\lambda_0 v_0 + \lambda_1 v_1 + \dots + \lambda_n v_n \mid \lambda_0 + \dots + \lambda_n = 1,\ \lambda_i \geq 0\}$

An m-face of $\sigma$ is the convex hull $\tau = |v_{i_0} \dots v_{i_m}|$ of a non-empty subset of $\{v_0, v_1, \dots, v_n\}$ (it is proper if the subset is not the entire set), written $\tau \preceq \sigma$

2 - Simplicial complexes

Page 12: Tutorial of topological_data_analysis_part_1(basic)

Simplex

Orientation of a simplex: two orderings of the vertices give the same orientation when they differ by an even permutation of <i0 i1 … in>

[Figure: oriented 1-simplex, 2-simplex, and 3-simplex]

2 - Simplicial complexes

Page 13: Tutorial of topological_data_analysis_part_1(basic)

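Simplicial complex

Definition: A simplicial complex is a finite collection of simplices K such that

(1) If $\sigma \in K$, then every face $\tau \preceq \sigma$ satisfies $\tau \in K$

(2) If $\sigma, \tau \in K$ and $\sigma \cap \tau \neq \emptyset$, then $\sigma \cap \tau \preceq \sigma$ and $\sigma \cap \tau \preceq \tau$

The maximum dimension of a simplex in K is the dimension of K

$K_2 = \{|v_0v_1v_2|, |v_0v_1|, |v_0v_2|, |v_1v_2|, |v_0|, |v_1|, |v_2|\}$

$K = K_2 \cup \{|v_3v_4|, |v_3|, |v_4|\}$

[Figure: one collection of simplices that is a simplicial complex (YES) and one that is not (NOT)]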

2 - Simplicial complexes
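Condition (1) of the definition can be checked mechanically once simplices are stored as vertex sets. The sketch below is only an illustration: the function names are made up here, and it treats K as an abstract complex, so condition (2), which concerns the geometric position of the simplices, is not checked.

```python
from itertools import combinations

def proper_faces(simplex):
    """All non-empty proper faces of a simplex given as a set of vertices."""
    verts = sorted(simplex)
    return {frozenset(c)
            for k in range(1, len(verts))
            for c in combinations(verts, k)}

def is_closed_under_faces(simplices):
    """Condition (1): every face of every simplex belongs to the collection."""
    K = {frozenset(s) for s in simplices}
    return all(f in K for s in K for f in proper_faces(s))

def dimension(simplices):
    """Maximum dimension of a simplex (a k-simplex has k+1 vertices)."""
    return max(len(s) for s in simplices) - 1

# The example K from the slide: a filled triangle plus a separate edge.
K2 = [("v0", "v1", "v2"), ("v0", "v1"), ("v0", "v2"), ("v1", "v2"),
      ("v0",), ("v1",), ("v2",)]
K = K2 + [("v3", "v4"), ("v3",), ("v4",)]

closed = is_closed_under_faces(K)   # True
dim_K = dimension(K)                # 2
```

Dropping any face from the list, for example the vertex ("v1",), makes the closure check fail.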

Page 14: Tutorial of topological_data_analysis_part_1(basic)

Simplicial complexes

14

Hemoglobin simplicial complex

Image source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf

2 - Simplicial complexes

Page 15: Tutorial of topological_data_analysis_part_1(basic)

Nerve

✤ Let $\Phi = \{B_i \mid i = 1, \dots, m\}$ be a covering of $X = \bigcup_{i=1}^{m} B_i$

✤ The nerve of $\Phi$ is a simplicial complex $\mathcal{N}(\Phi) = (V, \Sigma)$

2 - Simplicial complexes

Page 16: Tutorial of topological_data_analysis_part_1(basic)

Nerve theorem

✤ If $X \subset \mathbb{R}^N$ is covered by a collection of convex closed sets $\Phi = \{B_i \mid i = 1, \dots, m\}$, then $X$ and $\mathcal{N}(\Phi)$ are homotopy equivalent

2 - Simplicial complexes
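As a rough illustration of the nerve, the sketch below uses the standard construction: one vertex per covering set, and a simplex for every group of sets with a common point. The cover here consists of finite sets standing in for three overlapping arcs of a circle, which is an assumption made for the example; finite sets are not convex closed sets, so this only illustrates the combinatorics, not the hypothesis of the nerve theorem.

```python
from itertools import combinations

def nerve(cover, max_dim=2):
    """Simplices (tuples of set indices) whose covering sets share a point."""
    simplices = []
    for k in range(1, max_dim + 2):          # k sets span a (k-1)-simplex
        for idx in combinations(range(len(cover)), k):
            common = set(cover[idx[0]])
            for i in idx[1:]:
                common &= set(cover[i])
            if common:                        # non-empty common intersection
                simplices.append(idx)
    return simplices

# A circle sampled at 6 points, covered by three overlapping arcs.
cover = [{0, 1, 2}, {2, 3, 4}, {4, 5, 0}]
N = nerve(cover)
# N == [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
```

The three arcs intersect pairwise but have no common point, so the nerve is a hollow triangle: one connected component and one ring, like the circle it covers.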

Page 17: Tutorial of topological_data_analysis_part_1(basic)

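Čech complex

$P = \{x_i \in \mathbb{R}^N \mid i = 1, \dots, m\}$

$B_r(x_i) = \{x \in \mathbb{R}^N \mid \|x - x_i\| \leq r\}$: ball with radius r

✤ The Čech complex $C(P, r)$ is the nerve of $\Phi = \{B_r(x_i) \mid x_i \in P\}$

✤ From the nerve theorem: $X_r = \bigcup_{i=1}^{m} B_r(x_i) \simeq C(P, r)$

✤ Filtration: $C(P, r) \subset C(P, r')$ for $r \leq r'$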

2 - Simplicial complexes
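For points in the plane, membership in the Čech complex can be tested up to triangles with elementary geometry. The sketch below assumes 2D points and uses the fact that closed balls of radius r around a set of points have a common point exactly when the smallest ball enclosing those points has radius at most r. The function names are illustrative only.

```python
import math
from itertools import combinations

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def miniball_radius(a, b, c):
    """Radius of the smallest circle enclosing three points in the plane."""
    x, y, z = sorted([dist(a, b), dist(b, c), dist(a, c)])  # z is the longest side
    if x * x + y * y <= z * z:
        # Right, obtuse, or degenerate triangle: the longest side is a diameter.
        return z / 2
    s = (x + y + z) / 2
    area = math.sqrt(s * (s - x) * (s - y) * (s - z))       # Heron's formula
    return x * y * z / (4 * area)                           # circumradius

def cech_complex(points, r):
    """Vertices, edges, and triangles of C(P, r) for 2D points."""
    n = len(points)
    vertices = [(i,) for i in range(n)]
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if dist(points[i], points[j]) <= 2 * r]
    triangles = [(i, j, k) for i, j, k in combinations(range(n), 3)
                 if miniball_radius(points[i], points[j], points[k]) <= r]
    return vertices, edges, triangles

# Equilateral triangle with side 1: circumradius is 1/sqrt(3), about 0.577.
pts = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
_, edges_small, triangles_small = cech_complex(pts, 0.55)  # 3 edges, no triangle
_, edges_large, triangles_large = cech_complex(pts, 0.60)  # 3 edges, 1 triangle
```

At r = 0.55 the three balls overlap pairwise but not all together, so the complex is a ring; at r = 0.60 the triangle fills in and the ring disappears. This is the filtration behaviour that persistent homology tracks.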

Page 18: Tutorial of topological_data_analysis_part_1(basic)

Čech complex

✤ The weighted Čech complex $C(P, R)$ is the nerve of $\Phi = \{B_{r_i}(x_i) \mid x_i \in P\}$ (balls with different radii)

✤ Computations to check the intersections of balls are not easy → Alpha complex

2 - Simplicial complexes

Page 19: Tutorial of topological_data_analysis_part_1(basic)

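Voronoi diagrams and Delaunay complex

✤ $P = \{x_i \in \mathbb{R}^N \mid i = 1, \dots, m\}$

Voronoi cell: $V_i = \{x \in \mathbb{R}^N \mid \|x - x_i\| \leq \|x - x_j\|,\ j \neq i\}$

Voronoi decomposition: $\mathbb{R}^N = \bigcup_{i=1}^{m} V_i$

✤ $\Phi = \{V_i \mid i = 1, \dots, m\}$

Delaunay complex: $D(P) = \mathcal{N}(\Phi)$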

2 - Simplicial complexes

Page 20: Tutorial of topological_data_analysis_part_1(basic)

General position

✤ $x_1, \dots, x_{N+2} \in \mathbb{R}^N$ are in general position if there is no $x \in \mathbb{R}^N$ s.t. $\|x - x_1\| = \dots = \|x - x_{N+2}\|$

✤ If every combination of N+2 points in P is in general position, then P is in general position

✤ If P is in general position, then

• the dimensions of the Delaunay simplices are <= N

• the geometric representation of D(P) can be embedded in $\mathbb{R}^N$

2 - Simplicial complexes

Page 21: Tutorial of topological_data_analysis_part_1(basic)

Alpha complex

✤ The alpha complex is the nerve of $\Psi = \{B_r(x_i) \cap V_i \mid x_i \in P\}$

$\alpha(P, r) = \mathcal{N}(\Psi)$

✤ From the nerve theorem: $X_r \simeq \alpha(P, r)$

2 - Simplicial complexes

Page 22: Tutorial of topological_data_analysis_part_1(basic)

Alpha complex

22

✤ The weighted alpha complex is defined with different radii

if P is in a general position

filtration of alpha complexes

2 - Simplicial complexes

Page 23: Tutorial of topological_data_analysis_part_1(basic)

Alpha complex

23

✤ Computations are much easier than for Čech complexes

✤ Software: CGAL

• Constructs alpha complexes of point cloud data in R^N with N <= 3

[Figure: filtration of an alpha complex]

Image source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf

2 - Simplicial complexes


Page 25: Tutorial of topological_data_analysis_part_1(basic)

Definition of holes

Simplicial complex → Chain complex → Homology group

Geometrical object → Algebraic object → Algebraic holes

3 - Definition of Holes

Page 26: Tutorial of topological_data_analysis_part_1(basic)

What is a hole?

✤ 1-dimensional hole: ring

[Figure: a graph with boundary (not a ring) and a graph without boundary (a ring)]

Ring = 1-dimensional graph without boundary?

However, NOT: a 1-dimensional graph without boundary that is the boundary of a 2-dimensional graph is not a ring

Ring = 1-dimensional graph without boundary that is not the boundary of a 2-dimensional graph

3 - Definition of Holes

Page 27: Tutorial of topological_data_analysis_part_1(basic)

What is a hole?

✤ 2-dimensional hole: cavity

[Figure: a surface with boundary (not a cavity) and a surface without boundary (a cavity)]

Cavity = 2-dimensional graph without boundary?

However, NOT: a 2-dimensional graph without boundary that is the boundary of a 3-dimensional graph is not a cavity

Cavity = 2-dimensional graph without boundary that is not the boundary of a 3-dimensional graph

3 - Definition of Holes

Page 28: Tutorial of topological_data_analysis_part_1(basic)

Hole and boundary

q-dimensional hole = q-dimensional graph without boundary that is not the boundary of a (q+1)-dimensional graph

We try to make this precise in “algebraic” language

3 - Definition of Holes

Page 29: Tutorial of topological_data_analysis_part_1(basic)

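Chain complexes

Definition: Let K be a simplicial complex with dimension n. The group of q-chains is defined as below:

$C_q(K) := \left\{ \sum_i \alpha_i \langle v_{i_0} \dots v_{i_q} \rangle \;\middle|\; \alpha_i \in \mathbb{R},\ \langle v_{i_0} \dots v_{i_q} \rangle \text{ is a q-simplex in } K \right\}$, if $0 \leq q \leq n$

$C_q(K) := 0$, if $q < 0$ or $q > n$

An element of $C_q(K)$ is called a q-chain.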

3 - Definition of Holes

Page 30: Tutorial of topological_data_analysis_part_1(basic)

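Boundary

Definition: The boundary of a q-simplex is the sum of its (q-1)-dimensional faces.

$\partial_q \langle v_{i_0} \dots v_{i_q} \rangle := \sum_{l=0}^{q} (-1)^l \langle v_{i_0} \dots \hat{v}_{i_l} \dots v_{i_q} \rangle$ ($\hat{v}_{i_l}$: $v_{i_l}$ is omitted)

Ignoring orientation: $\partial |v_0v_1v_2| := |v_0v_1| + |v_1v_2| + |v_0v_2|$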

3 - Definition of Holes

Page 31: Tutorial of topological_data_analysis_part_1(basic)

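Boundary

Fundamental lemma: $\partial_{q-1} \circ \partial_q = 0$

[Figure: the case q = 2, applying $\partial_2$ and then $\partial_1$ to a triangle]

In general:

• For a q-simplex τ, the boundary $\partial_q \tau$ consists of all (q-1)-faces of τ.

• Every (q-2)-face of τ belongs to exactly two (q-1)-faces, with opposite orientations.

Hence $\partial_{q-1} \partial_q \tau = 0$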

3 - Definition of Holes
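The fundamental lemma is easy to check numerically. The sketch below assumes the standard alternating-sum boundary of an oriented simplex and stores a chain as a dictionary from vertex tuples to coefficients; this representation is chosen for the example and is not taken from the slides.

```python
from collections import defaultdict

def boundary(chain):
    """Boundary of a chain {oriented simplex (tuple of vertices): coefficient}."""
    result = defaultdict(int)
    for simplex, coeff in chain.items():
        if len(simplex) == 1:
            continue                              # the boundary of a vertex is zero
        for l in range(len(simplex)):
            face = simplex[:l] + simplex[l + 1:]  # omit the l-th vertex
            result[face] += (-1) ** l * coeff
    return {s: c for s, c in result.items() if c != 0}

triangle = {(0, 1, 2): 1}
d_triangle = boundary(triangle)     # {(1, 2): 1, (0, 2): -1, (0, 1): 1}
dd_triangle = boundary(d_triangle)  # {} : every vertex appears twice with opposite signs

tetrahedron = {(0, 1, 2, 3): 1}
dd_tetrahedron = boundary(boundary(tetrahedron))  # {}
```

Each vertex of the triangle is a face of exactly two edges and enters with opposite signs, which is the cancellation described in the lemma.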

Page 32: Tutorial of topological_data_analysis_part_1(basic)

Hole and boundary

q-dimensional hole = q-dimensional graph without boundary (1) that is not the boundary of a (q+1)-dimensional graph (2)

(1) $Z_q(K) := \ker \partial_q$ (cycle group)

(2) $B_q(K) := \operatorname{im} \partial_{q+1}$ (boundary group)

$B_q(K) \subset Z_q(K) \subset C_q(K)$, since $\partial_q \circ \partial_{q+1} = 0$

3 - Definition of Holes

Page 33: Tutorial of topological_data_analysis_part_1(basic)

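Hole and boundary

q-dimensional hole = q-dimensional graph without boundary (1) that is not the boundary of a (q+1)-dimensional graph (2)

(1) $Z_q(K) := \ker \partial_q$, (2) $B_q(K) := \operatorname{im} \partial_{q+1}$

The holes are the elements of $Z_q(K)$ that remain after $B_q(K)$ is made zero

This operator is defined as Q

For $z' = z + b$ with $z, z' \in \ker \partial_q$ and $b \in \operatorname{im} \partial_{q+1}$: $Q(z') = Q(z) + Q(b) = Q(z)$

(z and z′ are equivalent in $\ker \partial_q$ with respect to $\operatorname{im} \partial_{q+1}$)

q-dimensional hole = an equivalence class of vectors in $\ker \partial_q / \operatorname{im} \partial_{q+1}$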

3 - Definition of Holes

Page 34: Tutorial of topological_data_analysis_part_1(basic)

Homology group

Homology groups: the q-th homology group $H_q$ is defined as

$H_q = \ker \partial_q / \operatorname{im} \partial_{q+1} = \{z + \operatorname{im} \partial_{q+1} \mid z \in \ker \partial_q\} = \{[z] \mid z \in \ker \partial_q\}$

It is a quotient group with the operation [z] + [z’] = [z + z’]

Betti numbers: the q-th Betti number is defined as the dimension of $H_q$

$b_q = \dim(H_q)$

$H_0(K)$: connected component, $H_1(K)$: ring, $H_2(K)$: cavity

3 - Definition of Holes

Page 35: Tutorial of topological_data_analysis_part_1(basic)

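Computing Homology

[Figure: a cycle of four edges on the vertices v0, v1, v2, v3]

All vectors in the column space of $\ker \partial_0$ are equivalent with respect to $\operatorname{im} \partial_1$

$b_0 = \dim(H_0) = 1$

$\operatorname{im} \partial_2$ has only the zero vector

$H_1 = \{\alpha(|v_0v_1| + |v_1v_2| + |v_2v_3| + |v_3v_0|)\}$

$b_1 = \dim(H_1) = 1$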

3 - Definition of Holes

Page 36: Tutorial of topological_data_analysis_part_1(basic)

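Computing Homology

[Figure: a cycle of four oriented edges on the vertices v0, v1, v2, v3]

$H_1 = \{\alpha(\langle v_0v_1 \rangle + \langle v_1v_2 \rangle + \langle v_2v_3 \rangle - \langle v_0v_3 \rangle)\}$

All vectors in the column space of $\ker \partial_0$ are equivalent with respect to $\operatorname{im} \partial_1$

$b_0 = \dim(H_0) = 1$

$\operatorname{im} \partial_2$ has only the zero vector

$b_1 = \dim(H_1) = 1$

3 - Definition of Holes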
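The Betti numbers of the four-edge cycle can be reproduced with a small rank computation. The sketch below assumes rational coefficients and uses $b_q = \dim \ker \partial_q - \operatorname{rank} \partial_{q+1}$ together with $\dim \ker \partial_q = \dim C_q - \operatorname{rank} \partial_q$. The helper names are illustrative, and the code only handles complexes up to dimension 2.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix over the rationals by Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in matrix]
    rows = len(m)
    cols = len(m[0]) if m else 0
    rk = 0
    for c in range(cols):
        pivot = next((r for r in range(rk, rows) if m[r][c] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rows):
            if r != rk and m[r][c] != 0:
                factor = m[r][c] / m[rk][c]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk

def boundary_matrix(faces, simplices):
    """Rows: (q-1)-simplices, columns: q-simplices, entries: signs (-1)^l."""
    index = {f: i for i, f in enumerate(faces)}
    mat = [[0] * len(simplices) for _ in faces]
    for j, s in enumerate(simplices):
        for l in range(len(s)):
            mat[index[s[:l] + s[l + 1:]]][j] = (-1) ** l
    return mat

def betti_numbers(vertices, edges, triangles):
    """(b0, b1) of a complex of dimension at most 2."""
    r1 = rank(boundary_matrix(vertices, edges))
    r2 = rank(boundary_matrix(edges, triangles))
    b0 = len(vertices) - r1            # the boundary of a vertex is 0, so ker = C_0
    b1 = (len(edges) - r1) - r2
    return b0, b1

# The cycle of four edges from the slide (vertex v_i is written i).
vertices = [(0,), (1,), (2,), (3,)]
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
square = betti_numbers(vertices, edges, [])                  # (1, 1)

# Filling the square with two triangles removes the ring.
filled = betti_numbers(vertices, edges + [(0, 2)],
                       [(0, 1, 2), (0, 2, 3)])               # (1, 0)
```

With the edge ⟨v0v3⟩ oriented as above, the kernel of the edge boundary matrix is spanned by ⟨v0v1⟩ + ⟨v1v2⟩ + ⟨v2v3⟩ − ⟨v0v3⟩, the generator of H1 shown on the slide.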


Page 38: Tutorial of topological_data_analysis_part_1(basic)

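Persistent Homology

✤ Consider a filtration of finite type

$\mathcal{K}: K^0 \subset K^1 \subset \dots \subset K^t \subset \dots$

$\exists\, \Theta$ s.t. $K^j = K^\Theta,\ \forall j \geq \Theta$

✤ $K = \bigcup_{t \geq 0} K^t$: total simplicial complex

$K_k$: all k-simplices in K

$K_k^t$: all k-simplices in K at time t

$T(\sigma) = t$ for $\sigma \in K^t \setminus K^{t-1}$: birth time of the simplex $\sigma$

[Figure: the filtration growing along the time axis]

Persistent homology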

Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf

Page 39: Tutorial of topological_data_analysis_part_1(basic)

Persistent Homology

39

✤ $\mathbb{Z}_2$-vector space

✤ $\mathbb{Z}_2[x]$-graded module

✤ Inclusion map

✤ $C_k(K)$ is a free $\mathbb{Z}_2[x]$-module with a base given by the k-simplices of K

Persistent homology

Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf

Page 40: Tutorial of topological_data_analysis_part_1(basic)

Persistent Homology

40

✤ Boundary map

✤ From the graded structure

✤ Persistent homology

(graded homomorphism)

face of σ

Persistent homology

Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf

Page 41: Tutorial of topological_data_analysis_part_1(basic)

Persistent Homology

41

✤ From the structure theorem of $\mathbb{Z}_2[x]$ (PID)

✤ Persistence interval

✤ Persistence diagram

$I_i(b)$: inf of $I_i$, $I_i(d)$: sup of $I_i$

Persistent homology

Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf

Page 42: Tutorial of topological_data_analysis_part_1(basic)

Persistent Homology

42

[Figure: persistence diagram; axes: birth time, death time]

✤ A “hole” that appears close to the diagonal may be “noise”

✤ A “hole” that appears far from the diagonal is likely a robust structure

✤ Detect the “structural holes”

Persistent homology Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
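To make birth and death times concrete, here is a minimal sketch of the standard column-reduction algorithm for persistence over $\mathbb{Z}_2$. It is not taken from the slides: the input format (a list of birth time and simplex pairs) and the function name are assumptions made for the example, and intervals of zero length are dropped.

```python
def persistence_intervals(filtration):
    """filtration: list of (birth_time, simplex), simplex = tuple of vertices.
    Returns sorted (dimension, birth, death); death is None if the hole never dies."""
    # Order simplices by birth time, then by dimension, so faces come first.
    order = sorted(filtration, key=lambda ts: (ts[0], len(ts[1]), ts[1]))
    index = {frozenset(s): i for i, (_, s) in enumerate(order)}
    # Column j holds the indices of the codimension-1 faces of simplex j (Z2 coefficients).
    columns = [set() if len(s) == 1 else {index[frozenset(s) - {v}] for v in s}
               for _, s in order]
    pivot_of = {}          # lowest row index -> column that owns it
    paired = set()
    intervals = []
    for j, col in enumerate(columns):
        while col and max(col) in pivot_of:
            col ^= columns[pivot_of[max(col)]]   # add an earlier column mod 2
        if col:
            i = max(col)                         # simplex i creates, simplex j destroys
            pivot_of[i] = j
            paired.update((i, j))
            birth, death = order[i][0], order[j][0]
            if birth < death:
                intervals.append((len(order[i][1]) - 1, birth, death))
    for i, (t, s) in enumerate(order):
        if i not in paired:                      # created a hole that never dies
            intervals.append((len(s) - 1, t, None))
    return sorted(intervals, key=lambda x: (x[0], x[1], x[2] is None, x[2] or 0))

# Three vertices at time 0, three edges at time 1, the filled triangle at time 2.
filtration = [(0, (0,)), (0, (1,)), (0, (2,)),
              (1, (0, 1)), (1, (0, 2)), (1, (1, 2)),
              (2, (0, 1, 2))]
intervals = persistence_intervals(filtration)
# [(0, 0, 1), (0, 0, 1), (0, 0, None), (1, 1, 2)]
```

In this toy filtration two connected components merge at time 1, one component lives forever, and a ring is born at time 1 and dies at time 2 when the triangle is filled. Plotting each (birth, death) pair gives the persistence diagram.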

Page 43: Tutorial of topological_data_analysis_part_1(basic)

Outline

TDA - Basic Concepts

1. Topology and holes

2. Simplicial complexes

3. Definition of holes

4. Persistent homology

5. Some applications

See more in Part II of the tutorial

Page 44: Tutorial of topological_data_analysis_part_1(basic)

Applications

5 - Some applications

• Persistence to protein compressibility

Marcio Gameiro et al. (Japan J. Indust. Appl. Math. (2015) 32:1-17)

Page 45: Tutorial of topological_data_analysis_part_1(basic)

Protein Structure

Persistence to protein compressibility 45

[Figure: the 1-dim structure of a protein (amino acid 1 and amino acid 2 joined by a peptide bond) folds into the 3-dim structure of hemoglobin]

Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf

Page 46: Tutorial of topological_data_analysis_part_1(basic)

Protein Structure

Persistence to protein compressibility 46

✤ Van der Waals radius of an atom (in Å)

H: 1.2, C: 1.7, N: 1.55, O: 1.52, S: 1.8, P: 1.8

[Figure: van der Waals ball model of hemoglobin]

Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf

Page 47: Tutorial of topological_data_analysis_part_1(basic)

Alpha Complex for Protein Modeling

Persistence to protein compressibility 47

Notation defined on the slide: the position of the atoms, the radius of the i-th atom, the weighted Voronoi decomposition, the power distance, and the ball with radius r_i

Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
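The formulas on this slide did not survive extraction, so the sketch below falls back on the usual definition of the power distance, $\pi_i(x) = \|x - x_i\|^2 - r_i^2$, and assigns a point to the weighted Voronoi cell of the atom that minimizes it. Treat this as an assumption about what the slide shows. The function names and the two-atom example are hypothetical; the radii for H and C are the van der Waals values listed earlier.

```python
def power_distance(x, center, radius):
    """Power distance from point x to the ball (center, radius)."""
    return sum((a - b) ** 2 for a, b in zip(x, center)) - radius ** 2

def weighted_voronoi_cell(x, atoms):
    """Index of the atom whose power distance to x is smallest."""
    return min(range(len(atoms)),
               key=lambda i: power_distance(x, atoms[i][0], atoms[i][1]))

# Two hypothetical atoms on the x-axis: H (radius 1.2) and C (radius 1.7), in Å.
atoms = [((0.0, 0.0, 0.0), 1.2),   # H
         ((3.0, 0.0, 0.0), 1.7)]   # C

midpoint = (1.5, 0.0, 0.0)
cell_mid = weighted_voronoi_cell(midpoint, atoms)             # 1: the larger atom wins

on_h_surface = (1.2, 0.0, 0.0)
pd_surface = power_distance(on_h_surface, *atoms[0])          # 0 on the ball surface
cell_surface = weighted_voronoi_cell(on_h_surface, atoms)     # 0
```

The midpoint is equally far from both centers, yet it falls in the cell of the carbon atom because its ball is larger. This is how the weighted decomposition differs from the ordinary Voronoi decomposition.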

Page 48: Tutorial of topological_data_analysis_part_1(basic)

Alpha Complex for Protein Modeling

Persistence to protein compressibility 48

Alpha complex: nerve

k-simplex

Nerve lemma

Changing the radius to form a filtration (by w)

Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf

Page 49: Tutorial of topological_data_analysis_part_1(basic)

Topology of Ovalbumin

Persistence to protein compressibility 49

[Figure: persistence diagrams PD1 (1st Betti plot) and PD2 (2nd Betti plot); axes: birth time, death time]

Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf

Page 50: Tutorial of topological_data_analysis_part_1(basic)

Compressibility

Persistence to protein compressibility 50

[Diagram relating: 3-dim structure; functionality (softness, compressibility); experiments; quantification (difficult); persistence diagrams; holes]

Select generators and fit parameters to the experimental compressibility

Page 51: Tutorial of topological_data_analysis_part_1(basic)

Denoising

Persistence to protein compressibility 51

[Figure: persistence diagram; axes: birth time, death time]

✤ Topological noise

✤ Non-robust topological features depend on the state of the fluctuations

✤ The quantification should not depend on the state of the fluctuations

Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf

Page 52: Tutorial of topological_data_analysis_part_1(basic)

Holes with Sparse or Dense Boundary

Persistence to protein compressibility 52

✤ A sparse hole structure is deformable to a much larger extent than a dense hole → greater compressibility

✤ Effective sparse holes

[Figure: holes bounded by van der Waals balls and by enlarged balls, with the corresponding persistence diagram; axes: birth time, death time]

Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf

Page 53: Tutorial of topological_data_analysis_part_1(basic)

# of generators v.s. compressibility

Persistence to protein compressibility 53

[Figure: plot of compressibility against the topological measurement Cp]

Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf

Page 54: Tutorial of topological_data_analysis_part_1(basic)

Applications

5 - Some of applications 54

• Persistence to Phylogenetic Trees

Page 55: Tutorial of topological_data_analysis_part_1(basic)

Protein Phylogenetic Tree

Persistence to Phylogenetic Trees 55

✤ A phylogenetic tree is defined by a distance matrix for a set of species (human, dog, frog, fish, …)

✤ The distance matrix is calculated by a score function based on the similarity of amino acid sequences

[Figure: amino acid sequences of fish, frog, and human hemoglobin, and the distance matrix of hemoglobin for fish, frog, human, and dog]

Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf

Page 56: Tutorial of topological_data_analysis_part_1(basic)

Persistence Distance and Classification of Proteins

Persistence to Phylogenetic Trees 56

✤ The score function based on amino acid sequences does not contain information about the 3-dim structure of proteins

✤ The Wasserstein distance (of degree p) on persistence diagrams (Cohen-Steiner, Edelsbrunner, Harer, and Mileyko, FCM, 2010) reflects the similarity of the persistence diagrams (3-dim structures) of proteins

Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf

Page 57: Tutorial of topological_data_analysis_part_1(basic)

Persistence Distance and Classification of Proteins

Persistence to Phylogenetic Trees 57

[Figure: two persistence diagrams (axes: birth time, death time) are overlaid and matched by a bijection, which gives the Wasserstein distance]

Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
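The formula for the distance is not preserved in this transcript, so the following sketch uses the usual definition: the p-th root of the minimal total cost over bijections between the two diagrams, where points may also be matched to the diagonal, with costs measured in the $L_\infty$ norm. The brute-force search over permutations is an illustration for very small diagrams only, and the function name is an assumption.

```python
from itertools import permutations

def _cost(a, b):
    """L-infinity cost of matching a to b; None stands for the diagonal."""
    if a is None and b is None:
        return 0.0
    if a is None:
        return (b[1] - b[0]) / 2          # distance from b to the diagonal
    if b is None:
        return (a[1] - a[0]) / 2
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def wasserstein_distance(d1, d2, p=2):
    """Degree-p Wasserstein distance between two small persistence diagrams,
    each given as a list of (birth, death) pairs."""
    # Pad each diagram with diagonal slots so that every point can be unmatched.
    A = list(d1) + [None] * len(d2)
    B = list(d2) + [None] * len(d1)
    best = min(sum(_cost(a, B[j]) ** p for a, j in zip(A, perm))
               for perm in permutations(range(len(B))))
    return best ** (1 / p)

same = wasserstein_distance([(0, 4)], [(0, 4)])                  # 0.0
to_empty = wasserstein_distance([(0, 4)], [])                    # 2.0
noisy = wasserstein_distance([(0, 4), (1, 1.2)], [(0, 5)], p=1)  # about 1.1
```

In the last example the long-lived point (0, 4) is matched to (0, 5) at cost 1, and the short-lived point (1, 1.2) is sent to the diagonal at cost 0.1, so noise close to the diagonal contributes little to the distance.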

Page 58: Tutorial of topological_data_analysis_part_1(basic)

Distance between persistence diagrams

Persistence to Phylogenetic Trees 58

Persistence of sublevel sets

Stability Theorem (Cohen-Steiner et al., 2010)

[Figure: persistence diagram; axes: birth time, death time]

Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf

Page 59: Tutorial of topological_data_analysis_part_1(basic)

Phylogenetic Tree by Persistence

Persistence to Phylogenetic Trees 59

✤ Apply the distance on persistence diagrams to classify proteins

The persistence diagrams use the same noise band as in the compressibility computations

3DHT

3D1A

1QPW

3LQD

1FAW

1C40

2FZB

Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf

Page 60: Tutorial of topological_data_analysis_part_1(basic)

Future work

TDA - Basic Concepts 60

✤ Principle to de-noise fluctuations in persistence diagrams (NMR experiments)

✤ Finding minimum generators to identify specific regions in a protein (e.g., a region inducing high compressibility, hereditarily important regions)

✤ Zigzag persistence for robust topological features among a specific group of proteins (quiver representation)

✤ Multi-dimensional persistence (PID → Gröbner basis)

Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf

Page 61: Tutorial of topological_data_analysis_part_1(basic)

More applications in part … of the tutorials

5 - Some applications

✤ Robotics

✤ Computer vision

✤ Sensor networks

✤ Concurrency & databases

✤ Visualization

Prof. Robert Ghrist, Department of Mathematics, University of Pennsylvania: one of the pioneers in applications

Michael Farber, Edelsbrunner, Mischaikow, Gaucher, Bubenik, Zomorodian, Carlsson

Page 62: Tutorial of topological_data_analysis_part_1(basic)

Software

TDA - Basic Concepts 62

• Alpha complex by CGAL

http://www.cgal.org/

• Persistence diagrams by Perseus (coded by Vidit Nanda)

http://www.sas.upenn.edu/~vnanda/perseus/index.html

• CHomP project

http://chomp.rutgers.edu/Project.html

Page 63: Tutorial of topological_data_analysis_part_1(basic)

Reference links

TDA - Basic Concepts 63

• Homepage of Associate Professor Yasuaki Hiraoka

http://www2.math.kyushu-u.ac.jp/~hiraoka/site/About_Me.html

http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf

• Applications in sensor networks

www.msys.sys.i.kyoto-u.ac.jp/~kazunori/paper/nist20081219.pdf