Tutorial of topological_data_analysis_part_1(basic)

Tutorial of Topological Data Analysis

Tran Quoc Hoan

@k09ht haduonght.wordpress.com/

Hasegawa lab., Tokyo

The University of Tokyo

Part I - Basic Concepts

My TDA (Topological Data Analysis) road

TDA Road

Part I - Basic concepts & applications

Part II - Advanced computation

Part III - Mapper Algorithm

Part IV - Software Roadmap

Part V - Applications in…

Part VI - Applications in…

He is following me

Outline

TDA - Basic Concepts

1. Topology and holes

2. Simplicial complexes

3. Definition of holes

4. Persistent homology

5. Some applications
1 - Topology and Holes

Topology

Topology studies the properties of a space that are preserved under continuous deformations, such as stretching and bending, but not tearing or gluing.

[Figure: examples of shapes that are topologically equivalent (≅) and shapes that are not]

Invariant

Question: what are invariant things in topology?

[Figure: example shapes, with topologically equivalent ones marked ≅]

Number of:

Connected components   Rings   Cavities
1                      0       0
2                      0       0
1                      1       0
1                      1       0

Holes and dimension

In topology, shapes are compared up to continuous deformations, and these deformations preserve the holes of each dimension.

✤ The form of a shape is characterized by its connected components, rings, and cavities

• 0-dimensional “hole” = connected component

• 1-dimensional “hole” = ring

• 2-dimensional “hole” = cavity

How do we define a “hole”?

Use the “algebraic” notion of homology groups


Homology group

✤ For a geometric object X, the homology groups $H_q(X)$ satisfy $\dim H_q(X) = b_q$, where

$b_0$: number of connected components

$b_1$: number of rings

$b_2$: number of cavities

$b_q$: number of q-dimensional holes

These $b_q$ are the Betti numbers.

Image source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf

2 - Simplicial complexes

Simplicial complexes

Simplicial complex: a set of vertices, edges, triangles, tetrahedra, … that is closed under taking faces and has no improper intersections

vertex (0-dimensional)

edge (1-dimensional)

triangle (2-dimensional)

tetrahedron (3-dimensional)

[Figure: an example that is a simplicial complex and an example that is not a simplicial complex]

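k-simplex

n-simplex: the convex hull (the “smallest” convex set containing them) of n+1 affinely independent points $v_0, v_1, \dots, v_n$

vertex (0-simplex)

edge (1-simplex)

triangle (2-simplex)

tetrahedron (3-simplex), …, n-simplex

$\sigma = |v_0 v_1 \cdots v_n| = \{\lambda_0 v_0 + \lambda_1 v_1 + \cdots + \lambda_n v_n \mid \lambda_0 + \cdots + \lambda_n = 1,\ \lambda_i \ge 0\}$

An m-face of σ is the convex hull $\tau = |v_{i_0} \cdots v_{i_m}|$ of a non-empty subset of $\{v_0, v_1, \dots, v_n\}$ (it is proper if the subset is not the entire set). We write $\tau \le \sigma$, and $\tau < \sigma$ if τ is proper.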
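Combinatorially, a face of $\sigma$ is fixed by a non-empty subset of its vertices. A minimal Python sketch that enumerates them, assuming a simplex is represented as a tuple of vertex labels (an illustrative choice, not from the slides):

```python
from itertools import combinations

def faces(simplex):
    """All faces of a simplex given as a tuple of vertex labels.

    Each face corresponds to a non-empty subset of the vertices, so an
    n-simplex has 2**(n + 1) - 1 faces, including itself."""
    return [face
            for k in range(1, len(simplex) + 1)
            for face in combinations(simplex, k)]

def proper_faces(simplex):
    """Faces whose vertex subset is not the entire vertex set."""
    return [f for f in faces(simplex) if len(f) < len(simplex)]

triangle = ("v0", "v1", "v2")
print(faces(triangle))
# [('v0',), ('v1',), ('v2',), ('v0', 'v1'), ('v0', 'v2'), ('v1', 'v2'), ('v0', 'v1', 'v2')]
print(len(proper_faces(triangle)))  # 6
```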


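Orientation of a simplex

Orientation: an ordering of the vertices. Two orderings give the same orientation if and only if they differ by an even permutation:

$\langle v_{i_0} v_{i_1} \cdots v_{i_n} \rangle = \operatorname{sgn}(i_0 i_1 \cdots i_n)\, \langle v_0 v_1 \cdots v_n \rangle$

[Figure: oriented 1-simplex, 2-simplex, and 3-simplex]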
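The orientation rule can be checked mechanically, since the sign of a permutation is $(-1)^{\text{number of inversions}}$. A small Python sketch (the function name `permutation_sign` is illustrative):

```python
def permutation_sign(order):
    """Sign of the permutation (i0, i1, ..., in) of (0, 1, ..., n).

    Counts inversions (pairs that appear in the wrong order); an even count
    means an even permutation, i.e. the same orientation."""
    inversions = sum(1 for a in range(len(order))
                     for b in range(a + 1, len(order))
                     if order[a] > order[b])
    return 1 if inversions % 2 == 0 else -1

# <v_{i0} v_{i1} ... v_{in}> = sign(i0 i1 ... in) * <v0 v1 ... vn>
print(permutation_sign((1, 0)))        # -1: <v1 v0> = -<v0 v1>
print(permutation_sign((1, 2, 0)))     # 1: a cyclic shift keeps a triangle's orientation
print(permutation_sign((0, 2, 1, 3)))  # -1: one swap flips a tetrahedron
```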


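Simplicial complex

Definition: A simplicial complex is a finite collection K of simplices such that

(1) If $\sigma \in K$ and $\tau \le \sigma$ (τ is a face of σ), then $\tau \in K$

(2) If $\sigma, \tau \in K$ and $\sigma \cap \tau \neq \emptyset$, then $\sigma \cap \tau \le \sigma$ and $\sigma \cap \tau \le \tau$

The maximum dimension of a simplex in K is called the dimension of K.

$K_2 = \{|v_0 v_1 v_2|, |v_0 v_1|, |v_0 v_2|, |v_1 v_2|, |v_0|, |v_1|, |v_2|\}$

$K = K_2 \cup \{|v_3 v_4|, |v_3|, |v_4|\}$

[Figure: examples labelled NOT (not a simplicial complex) and YES (a simplicial complex)]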
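Condition (1) is purely combinatorial and easy to test. A minimal sketch that checks it for the examples $K_2$ and $K$, assuming simplices are given as tuples of vertex labels; condition (2) depends on the geometric embedding and is not checked here:

```python
from itertools import combinations

def is_closed_under_faces(K):
    """Check condition (1): every face of every simplex in K is also in K.

    Simplices are compared as frozensets, so |v0v1| and |v1v0| coincide.
    Condition (2) concerns geometric intersections and is not checked."""
    K = {frozenset(s) for s in K}
    for simplex in K:
        for k in range(1, len(simplex)):
            for face in combinations(simplex, k):
                if frozenset(face) not in K:
                    return False
    return True

def dimension(K):
    """Maximum dimension of a simplex in K (a q-simplex has q + 1 vertices)."""
    return max(len(s) for s in K) - 1

K2 = [("v0", "v1", "v2"), ("v0", "v1"), ("v0", "v2"), ("v1", "v2"),
      ("v0",), ("v1",), ("v2",)]
K = K2 + [("v3", "v4"), ("v3",), ("v4",)]

print(is_closed_under_faces(K), dimension(K))  # True 2
print(is_closed_under_faces([("v0", "v1")]))    # False: the vertices are missing
```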


Simplicial complexes


Hemoglobin simplicial complex

Image source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf



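Nerve

✤ Let $\Phi = \{B_i \mid i = 1, \dots, m\}$ be a covering of $X = \bigcup_{i=1}^{m} B_i$

✤ The nerve of Φ is the simplicial complex $N(\Phi) = (V, \Sigma)$ with vertex set $V = \{1, \dots, m\}$ and simplices $\Sigma = \{\{i_0, \dots, i_k\} \mid B_{i_0} \cap \cdots \cap B_{i_k} \neq \emptyset\}$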
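For a finite cover in which each set is represented by the finite set of sample points it contains (a simplifying assumption for illustration), the nerve can be computed by testing intersections directly:

```python
from itertools import combinations

def nerve(cover, max_dim=None):
    """Nerve of a finite cover given as a list of sets B_1, ..., B_m.

    Vertices are the indices i; {i0, ..., ik} is a simplex iff
    B_i0 ∩ ... ∩ B_ik is non-empty. max_dim optionally caps the dimension."""
    m = len(cover)
    top = m if max_dim is None else max_dim + 1   # number of vertices per simplex
    simplices = []
    for k in range(1, top + 1):
        for idx in combinations(range(m), k):
            if set.intersection(*(cover[i] for i in idx)):
                simplices.append(idx)
    return simplices

# Hypothetical data: 6 sample points around a circle, covered by 3 overlapping arcs.
cover = [{0, 1, 2}, {2, 3, 4}, {4, 5, 0}]
print(nerve(cover))
# [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
```

The three arcs overlap pairwise but have no common point, so the nerve is a hollow triangle, which has the same single ring as the circle.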


Nerve theorem

✤ If $X \subset \mathbb{R}^N$ is covered by a collection $\Phi = \{B_i \mid i = 1, \dots, m\}$ of convex closed sets, then X and $N(\Phi)$ are homotopy equivalent


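Cech complex

✤ $P = \{x_i \in \mathbb{R}^N \mid i = 1, \dots, m\}$

✤ $B_r(x_i) = \{x \in \mathbb{R}^N \mid \|x - x_i\| \le r\}$: ball with radius r

✤ The Cech complex C(P, r) is the nerve of $\Phi = \{B_r(x_i) \mid x_i \in P\}$

✤ From the nerve theorem: $X_r = \bigcup_{i=1}^{m} B_r(x_i) \simeq C(P, r)$

✤ Filtration: $C(P, r) \subseteq C(P, s)$ for $r \le s$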
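For points in the plane, a set of balls of radius r has a common point exactly when the smallest disk enclosing their centers has radius at most r. A sketch of C(P, r) up to dimension 2 built on that test; restricting to $\mathbb{R}^2$ and to triangles is an assumption to keep it short:

```python
import math
from itertools import combinations

def min_enclosing_radius(p, q, s):
    """Radius of the smallest disk containing three points in the plane."""
    a, b, c = math.dist(q, s), math.dist(p, s), math.dist(p, q)
    longest = max(a, b, c)
    # Right, obtuse, or degenerate triangle: the longest side is a diameter.
    if 2 * longest ** 2 >= a ** 2 + b ** 2 + c ** 2:
        return longest / 2
    # Acute triangle: the circumcircle is the smallest enclosing disk.
    area2 = abs((q[0] - p[0]) * (s[1] - p[1]) - (q[1] - p[1]) * (s[0] - p[0]))
    return a * b * c / (2 * area2)   # circumradius = abc / (4 * area)

def cech_complex(points, r):
    """Cech complex C(P, r) up to dimension 2 for points in the plane."""
    n = len(points)
    simplices = [(i,) for i in range(n)]
    # Two balls of radius r meet iff their centers are at most 2r apart.
    simplices += [(i, j) for i, j in combinations(range(n), 2)
                  if math.dist(points[i], points[j]) <= 2 * r]
    # Three balls share a point iff the enclosing disk has radius <= r.
    simplices += [(i, j, k) for i, j, k in combinations(range(n), 3)
                  if min_enclosing_radius(points[i], points[j], points[k]) <= r]
    return simplices

# Hypothetical data: the four corners of a unit square.
P = [(0, 0), (1, 0), (1, 1), (0, 1)]
for r in (0.4, 0.5, 0.75):
    print(r, cech_complex(P, r))
```

Here r = 0.4 gives only vertices, r = 0.5 gives the four sides (a ring with no triangles), and r = 0.75 adds both diagonals and all four triangles, filling the ring. The triple test is the part that becomes expensive in higher dimensions.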


Cech complex

✤ The weighted Cech complex C(P, R) is the nerve of $\Phi = \{B_{r_i}(x_i) \mid x_i \in P\}$, i.e. balls with different radii $R = \{r_i\}$

✤ Checking whether balls intersect is computationally difficult → Alpha complex


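Voronoi diagrams and Delaunay complex

✤ $P = \{x_i \in \mathbb{R}^N \mid i = 1, \dots, m\}$

Voronoi cell: $V_i = \{x \in \mathbb{R}^N \mid \|x - x_i\| \le \|x - x_j\|,\ \forall j \neq i\}$

Voronoi decomposition: $\mathbb{R}^N = \bigcup_{i=1}^{m} V_i$

✤ Delaunay complex: $D(P) = N(\Phi)$, the nerve of $\Phi = \{V_i \mid i = 1, \dots, m\}$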
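Since D(P) is the nerve of the Voronoi cells, a triangle of planar points in general position belongs to D(P) exactly when its circumcircle contains no other point of P (the empty-circle property). A brute-force sketch using exact rational arithmetic; it takes $O(m^4)$ time and is meant only for small illustrative inputs:

```python
from fractions import Fraction
from itertools import combinations

def orient(a, b, c):
    """Twice the signed area of triangle abc (> 0 if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def in_circle(a, b, c, d):
    """> 0 iff d is strictly inside the circle through a, b, c (abc counter-clockwise)."""
    m = [(p[0] - d[0], p[1] - d[1], (p[0] - d[0]) ** 2 + (p[1] - d[1]) ** 2)
         for p in (a, b, c)]
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def delaunay_triangles(points):
    """Triangles (i, j, k) whose circumcircle contains no other input point."""
    pts = [tuple(map(Fraction, p)) for p in points]   # exact arithmetic
    result = []
    for i, j, k in combinations(range(len(pts)), 3):
        a, b, c = pts[i], pts[j], pts[k]
        o = orient(a, b, c)
        if o == 0:
            continue                 # collinear: no circumcircle
        if o < 0:
            b, c = c, b              # make abc counter-clockwise
        if all(in_circle(a, b, c, pts[l]) <= 0
               for l in range(len(pts)) if l not in (i, j, k)):
            result.append((i, j, k))
    return result

print(delaunay_triangles([(0, 0), (2, 0), (1, 2), (1, -1)]))  # [(0, 1, 2), (0, 1, 3)]
```

If four points are cocircular (for example the corners of a square), all four triangles pass the test and overlap, which is one reason the general-position assumption is needed.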


General position

✤ Points $x_1, \dots, x_{N+2} \in \mathbb{R}^N$ are in general position if there is no $x \in \mathbb{R}^N$ such that $\|x - x_1\| = \cdots = \|x - x_{N+2}\|$

✤ If every combination of N+2 points in P is in general position, then P is in general position

✤ If P is in general position, then

• the dimensions of Delaunay simplices are ≤ N

• the geometric representation of D(P) can be embedded in $\mathbb{R}^N$
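In the plane (N = 2) the condition says that no N + 2 = 4 points lie on a common circle. A sketch of that check with exact arithmetic, assuming distinct input points. The determinant below vanishes exactly when four points lie on a common circle or a common line, and four points on a line have no equidistant point, so that case is excluded:

```python
from fractions import Fraction
from itertools import combinations

def lift_det(a, b, c, d):
    """Determinant that vanishes iff a, b, c, d lie on a common circle or line."""
    m = [(p[0] - d[0], p[1] - d[1], (p[0] - d[0]) ** 2 + (p[1] - d[1]) ** 2)
         for p in (a, b, c)]
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def collinear(a, b, c):
    return (b[0] - a[0]) * (c[1] - a[1]) == (b[1] - a[1]) * (c[0] - a[0])

def in_general_position_2d(points):
    """True iff no 4 of the (distinct) planar points have a common equidistant
    point x, i.e. no 4 of them lie on one circle."""
    pts = [tuple(map(Fraction, p)) for p in points]
    for a, b, c, d in combinations(pts, 4):
        on_one_line = collinear(a, b, c) and collinear(a, b, d)
        if lift_det(a, b, c, d) == 0 and not on_one_line:
            return False             # the four points are cocircular
    return True

print(in_general_position_2d([(0, 0), (1, 0), (1, 1), (0, 1)]))   # False
print(in_general_position_2d([(0, 0), (2, 0), (1, 2), (1, -1)]))  # True
```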


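Alpha complex

✤ $X_r = \bigcup_{i=1}^{m} B_r(x_i) = \bigcup_{i=1}^{m} \left(B_r(x_i) \cap V_i\right)$

✤ $\Phi = \{B_r(x_i) \cap V_i \mid i = 1, \dots, m\}$

✤ The alpha complex is the nerve of Φ: $\alpha(P, r) = N(\Phi)$

✤ From the nerve theorem: $X_r \simeq \alpha(P, r)$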


Alpha complex

✤ The weighted alpha complex is defined in the same way, with balls of different radii

✤ If P is in general position, $\alpha(P, r) \subseteq \alpha(P, s) \subseteq D(P)$ for $r \le s$: a filtration of alpha complexes


Alpha complex

✤ Computations are much easier than for Cech complexes

✤ Software: CGAL

• Constructs alpha complexes of point cloud data in $\mathbb{R}^N$ with N ≤ 3

[Figure: filtration of alpha complex] Image source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf



3 - Definition of Holes

Definition of holes

Simplicial complex (geometrical object) → Chain complex (algebraic object) → Homology group (algebraic holes)

What is a hole?

✤ 1-dimensional hole: ring

[Figure: graphs with boundary have no ring; a graph without boundary has a ring]

Is a ring simply a 1-dimensional graph without boundary?

Not quite: a 1-dimensional graph without boundary can itself be the boundary of a 2-dimensional graph (for example, the three edges of a filled triangle), and then it encloses no ring.

Ring = a 1-dimensional graph without boundary that is not the boundary of a 2-dimensional graph


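What is a hole?

✤ 2-dimensional hole: cavity

[Figure: 2-dimensional graphs with boundary have no cavity; a 2-dimensional graph without boundary has a cavity]

Is a cavity simply a 2-dimensional graph without boundary?

Not quite: a 2-dimensional graph without boundary can itself be the boundary of a 3-dimensional graph (for example, the four triangles of a solid tetrahedron), and then it encloses no cavity.

Cavity = a 2-dimensional graph without boundary that is not the boundary of a 3-dimensional graph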


Hole and boundary

q-dimensional hole = a q-dimensional graph without boundary that is not the boundary of a (q+1)-dimensional graph

We make this precise in “algebraic” language.


Chain complexes

Let K be a simplicial complex of dimension n.

Definition: The group of q-chains is

$C_q(K) := \left\{ \sum_i \alpha_i \langle v_{i_0} \cdots v_{i_q} \rangle \;\middle|\; \alpha_i \in \mathbb{R},\ \langle v_{i_0} \cdots v_{i_q} \rangle: q\text{-simplex in } K \right\}, \quad 0 \le q \le n$

$C_q(K) := 0$ if $q < 0$ or $q > n$

An element of $C_q(K)$ is called a q-chain.


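Boundary

The boundary of a q-simplex is the (signed) sum of its (q−1)-dimensional faces.

Definition: $\partial_q \langle v_0 v_1 \cdots v_q \rangle := \sum_{l=0}^{q} (-1)^l \langle v_0 \cdots \hat{v}_l \cdots v_q \rangle$, where $\hat{v}_l$ means that $v_l$ is omitted; $\partial_q: C_q(K) \to C_{q-1}(K)$ is extended linearly.

Example: $\partial_2 \langle v_0 v_1 v_2 \rangle = \langle v_1 v_2 \rangle - \langle v_0 v_2 \rangle + \langle v_0 v_1 \rangle$; ignoring orientation (coefficients in $\mathbb{Z}_2$), $\partial |v_0 v_1 v_2| = |v_0 v_1| + |v_1 v_2| + |v_0 v_2|$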
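The boundary formula translates directly into code. A minimal sketch over integer coefficients, assuming a chain is stored as a dictionary from vertex tuples to coefficients and that each simplex is always written with the same vertex order (for example sorted):

```python
from collections import defaultdict

def boundary(chain):
    """Boundary of a q-chain given as {oriented simplex (tuple): coefficient}.

    Implements ∂<v0 ... vq> = Σ_l (-1)**l <v0 ... (v_l omitted) ... vq>,
    extended linearly; vertices map to 0 because C_{-1}(K) = 0."""
    result = defaultdict(int)
    for simplex, coeff in chain.items():
        if len(simplex) == 1:
            continue
        for l in range(len(simplex)):
            face = simplex[:l] + simplex[l + 1:]   # drop the l-th vertex
            result[face] += (-1) ** l * coeff
    return {s: c for s, c in result.items() if c != 0}

print(boundary({("v0", "v1", "v2"): 1}))
# {('v1', 'v2'): 1, ('v0', 'v2'): -1, ('v0', 'v1'): 1}
print(boundary(boundary({("v0", "v1", "v2"): 1})))  # {}
```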


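Boundary

Fundamental lemma: $\partial_{q-1} \circ \partial_q = 0$

[Figure: applying $\partial_2$ and then $\partial_1$ for q = 2]

In general:

• For a q-simplex τ, the boundary $\partial_q \tau$ consists of all (q−1)-faces of τ.

• Every (q−2)-face of τ belongs to exactly two (q−1)-faces, with opposite orientations, so its two contributions cancel:

$\partial_{q-1} \partial_q \tau = 0$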
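Over $\mathbb{Z}_2$ the signs disappear and a chain is simply a set of simplices, with + acting as symmetric difference. In that setting the lemma says that every (q−2)-face is hit an even number of times. A sketch checking this on a tetrahedron (representing simplices as frozensets is an illustrative assumption):

```python
def boundary_z2(chain):
    """Z2 boundary of a chain given as a set of simplices (frozensets of vertices).

    Over Z2 a chain is a set of simplices and + is symmetric difference."""
    result = set()
    for simplex in chain:
        if len(simplex) == 1:
            continue                      # ∂_0 = 0
        for v in simplex:
            result ^= {simplex - {v}}     # add the face without v (mod 2)
    return result

tetra = frozenset({"v0", "v1", "v2", "v3"})
print(len(boundary_z2({tetra})))          # 4 triangles
print(boundary_z2(boundary_z2({tetra})))  # set(): every edge appears twice and cancels
```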


Hole and boundary

q-dimensional hole: (1) a q-dimensional graph without boundary that (2) is not the boundary of a (q+1)-dimensional graph

(1) $Z_q(K) := \ker \partial_q$ (cycle group)

(2) $B_q(K) := \operatorname{im} \partial_{q+1}$ (boundary group)

Since $\partial_q \circ \partial_{q+1} = 0$: $B_q(K) \subset Z_q(K) \subset C_q(K)$


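Hole and boundary

q-dimensional hole: (1) a q-dimensional graph without boundary that (2) is not the boundary of a (q+1)-dimensional graph, where (1) $Z_q(K) := \ker \partial_q$ and (2) $B_q(K) := \operatorname{im} \partial_{q+1}$

We keep the elements of $Z_q(K)$ that remain after every element of $B_q(K)$ is made zero. This operation is the quotient map $Q: \ker \partial_q \to \ker \partial_q / \operatorname{im} \partial_{q+1}$.

For $z' = z + b$ with $z, z' \in \ker \partial_q$ and $b \in \operatorname{im} \partial_{q+1}$: $Q(z') = Q(z) + Q(b) = Q(z)$

(z and z' are equivalent in $\ker \partial_q$ with respect to $\operatorname{im} \partial_{q+1}$)

q-dimensional hole = an equivalence class of vectors in $\ker \partial_q / \operatorname{im} \partial_{q+1}$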


Homology group

Homology groups: the q-th homology group is defined as

$H_q = \ker \partial_q / \operatorname{im} \partial_{q+1} = \{z + \operatorname{im} \partial_{q+1} \mid z \in \ker \partial_q\} = \{[z] \mid z \in \ker \partial_q\}$

with the group operation $[z] + [z'] = [z + z']$

Betti numbers: the q-th Betti number is the dimension of $H_q$:

$b_q = \dim(H_q)$

$H_0(K)$: connected components, $H_1(K)$: rings, $H_2(K)$: cavities
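Combining $b_q = \dim(H_q)$ with the rank-nullity theorem for $\partial_q: C_q(K) \to C_{q-1}(K)$ gives a formula that needs only the ranks of boundary matrices:

```latex
b_q = \dim H_q
    = \dim \ker \partial_q - \dim \operatorname{im} \partial_{q+1}
    = \bigl(\dim C_q(K) - \operatorname{rank} \partial_q\bigr) - \operatorname{rank} \partial_{q+1}
```

For instance, for a cycle graph with 4 vertices, 4 edges and no triangles: $\dim C_0 = \dim C_1 = 4$, $\partial_0 = 0$, $\operatorname{rank} \partial_1 = 3$ (4 vertices minus 1 connected component) and $\partial_2 = 0$, so $b_0 = 4 - 0 - 3 = 1$ and $b_1 = 4 - 3 - 0 = 1$.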


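Computing Homology

[Figure: square with vertices v0, v1, v2, v3 and edges |v0v1|, |v1v2|, |v2v3|, |v3v0|]

All vertices, i.e. all vectors in $\ker \partial_0 = C_0(K)$, are equivalent with respect to $\operatorname{im} \partial_1$, so $b_0 = \dim(H_0) = 1$

$\operatorname{im} \partial_2$ has only the zero vector (there are no triangles), so $b_1 = \dim(H_1) = 1$ with $H_1 = \{\lambda (|v_0 v_1| + |v_1 v_2| + |v_2 v_3| + |v_3 v_0|) \mid \lambda \in \mathbb{R}\}$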


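Computing Homology

[Figure: the same square with oriented edges ⟨v0v1⟩, ⟨v1v2⟩, ⟨v2v3⟩, ⟨v0v3⟩]

All vectors in $\ker \partial_0$ are equivalent with respect to $\operatorname{im} \partial_1$, so $b_0 = \dim(H_0) = 1$

$\operatorname{im} \partial_2$ has only the zero vector, so $b_1 = \dim(H_1) = 1$ with

$H_1 = \{\lambda (\langle v_0 v_1 \rangle + \langle v_1 v_2 \rangle + \langle v_2 v_3 \rangle - \langle v_0 v_3 \rangle) \mid \lambda \in \mathbb{R}\}$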
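The square example can be checked with the rank formula $b_q = \dim C_q - \operatorname{rank} \partial_q - \operatorname{rank} \partial_{q+1}$, which follows from rank-nullity applied to $\ker \partial_q$. A sketch using exact rational Gaussian elimination; the vertex labels 0 to 3 stand for $v_0, \dots, v_3$ and the helper names are illustrative:

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    cols = len(m[0]) if m else 0
    for c in range(cols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def boundary_matrix(faces, simplices):
    """Matrix of ∂ from C_q (columns: q-simplices) to C_{q-1} (rows: faces)."""
    index = {f: i for i, f in enumerate(faces)}
    mat = [[0] * len(simplices) for _ in faces]
    for j, s in enumerate(simplices):
        for l in range(len(s)):
            mat[index[s[:l] + s[l + 1:]]][j] += (-1) ** l
    return mat

# The square: edges <v0v1>, <v1v2>, <v2v3>, <v0v3> and no triangles.
V = [(0,), (1,), (2,), (3,)]
E = [(0, 1), (1, 2), (2, 3), (0, 3)]
rank_d1 = rank(boundary_matrix(V, E))
rank_d2 = 0                       # no 2-simplices, so im ∂_2 = 0
b0 = len(V) - 0 - rank_d1         # ∂_0 = 0
b1 = len(E) - rank_d1 - rank_d2
print(b0, b1)  # 1 1
```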

4 - Persistent Homology

Persistent homology

✤ Consider a filtration of finite type

$\mathbb{K}: K_0 \subset K_1 \subset \cdots \subset K_t \subset \cdots$, where $\exists\, \Theta$ s.t. $K_j = K_\Theta$ for all $j \ge \Theta$

✤ $K = \bigcup_{t \ge 0} K_t$: total simplicial complex

$K^k$: all k-simplices in K

$K_t^k$: all k-simplices in K at time t

$T(\sigma) = t \iff \sigma \in K_t \setminus K_{t-1}$: birth time of the simplex σ

[Figure: filtration along time]

Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
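A filtration can be stored as the birth-time map $\sigma \mapsto T(\sigma)$; each $K_t$ is then $\{\sigma : T(\sigma) \le t\}$, which is a simplicial complex exactly when no simplex is born before its faces. A minimal sketch with a hypothetical filtration of the square (function names are illustrative):

```python
def check_filtration(birth):
    """birth maps each simplex (sorted tuple of vertices) to its birth time T(σ).

    Returns True iff every codimension-1 face of every simplex exists and is
    born no later than the simplex, so each K_t is a simplicial complex."""
    for simplex, t in birth.items():
        for l in range(len(simplex)):
            face = simplex[:l] + simplex[l + 1:]
            if face and (face not in birth or birth[face] > t):
                return False
    return True

def complex_at(birth, t, k=None):
    """K_t, or K_t^k (only the k-simplices of K_t) when k is given."""
    return sorted(s for s, b in birth.items()
                  if b <= t and (k is None or len(s) == k + 1))

# Hypothetical filtration: vertices at t = 0, then the four sides at t = 1, 2, 3, 4.
birth = {(0,): 0, (1,): 0, (2,): 0, (3,): 0,
         (0, 1): 1, (1, 2): 2, (2, 3): 3, (0, 3): 4}
print(check_filtration(birth))    # True
print(complex_at(birth, 2, k=1))  # [(0, 1), (1, 2)]
```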


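Persistent Homology

✤ $C_k(K_t)$: $\mathbb{Z}_2$-vector space spanned by the k-simplices of $K_t$

✤ $C_k(\mathbb{K}) = \bigoplus_{t \ge 0} C_k(K_t)$: $\mathbb{Z}_2[x]$-graded module

✤ Inclusion map $C_k(K_t) \to C_k(K_{t+1})$, which defines the action of x

✤ $C_k(\mathbb{K})$ is a free $\mathbb{Z}_2[x]$-module with a basis given by the k-simplices $K^k$ (each σ placed in degree $T(\sigma)$)

Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf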


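Persistent Homology

✤ Boundary map $\partial_k: C_k(\mathbb{K}) \to C_{k-1}(\mathbb{K})$, $\partial_k \sigma = \sum_{\tau} x^{T(\sigma) - T(\tau)} \tau$, where τ runs over the (k−1)-faces of σ (a graded homomorphism)

✤ From the graded structure, $\ker \partial_k$ and $\operatorname{im} \partial_{k+1}$ are graded submodules

✤ Persistent homology: $H_k(\mathbb{K}) = \ker \partial_k / \operatorname{im} \partial_{k+1}$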



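Persistent Homology

✤ From the structure theorem for finitely generated modules over the PID $\mathbb{Z}_2[x]$:

$H_k(\mathbb{K}) \cong \bigoplus_{i=1}^{p} (x^{b_i}) \oplus \bigoplus_{j=p+1}^{p+q} (x^{b_j}) / (x^{d_j})$

✤ Persistent interval: $I_i = [b_i, d_i)$, with $d_i = \infty$ for $i = 1, \dots, p$

✤ Persistence diagram: $D_k = \{(b_i, d_i) \mid i = 1, \dots, p+q\}$

$b_i$: inf of $I_i$ (birth time), $d_i$: sup of $I_i$ (death time)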
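The intervals can be computed without writing down the module structure, by reducing the boundary matrix of the whole filtration over $\mathbb{Z}_2$ (the standard column-reduction algorithm, used here as an illustrative sketch rather than as the method behind the slides). The filtration below is hypothetical:

```python
def persistence_intervals(birth):
    """Persistence intervals over Z2 by standard boundary-matrix column reduction.

    birth: {simplex (sorted tuple of vertices): birth time T(σ)}, assumed to be a
    filtration. Returns (dimension, birth, death) triples; death = inf for classes
    that never die. Intervals with birth == death are kept."""
    # Filtration order: by birth time, faces before cofaces at equal times.
    order = sorted(birth, key=lambda s: (birth[s], len(s), s))
    index = {s: i for i, s in enumerate(order)}
    # Column j = set of row indices of the (k-1)-faces of simplex j (Z2 entries).
    columns = [({index[s[:l] + s[l + 1:]] for l in range(len(s))} if len(s) > 1 else set())
               for s in order]
    pivot_owner = {}   # lowest row index -> column whose reduced form ends there
    intervals = []
    for j, col in enumerate(columns):
        # Add earlier columns (mod 2) until the lowest entry is new or col is zero.
        while col and max(col) in pivot_owner:
            col ^= columns[pivot_owner[max(col)]]
        if col:
            i = max(col)
            pivot_owner[i] = j
            # Simplex i created a class that simplex j kills.
            intervals.append((len(order[i]) - 1, birth[order[i]], birth[order[j]]))
    paired = set(pivot_owner) | set(pivot_owner.values())
    for i, s in enumerate(order):
        if i not in paired:   # creator that is never killed: essential class
            intervals.append((len(s) - 1, birth[s], float("inf")))
    return intervals

# Hypothetical filtration: the square v0 v1 v2 v3, then a diagonal and two triangles.
birth = {(0,): 0, (1,): 0, (2,): 0, (3,): 0,
         (0, 1): 1, (1, 2): 2, (2, 3): 3, (0, 3): 4,
         (0, 2): 5, (0, 1, 2): 6, (0, 2, 3): 7}
for dim, b, d in sorted(persistence_intervals(birth)):
    print(f"H{dim}: [{b}, {d})")
```

For this filtration the output is the H0 intervals [0, 1), [0, 2), [0, 3) and [0, inf), plus the H1 intervals [4, 7) and [5, 6): the ring closed at t = 4 survives until the second triangle fills it at t = 7, while the younger ring created by the diagonal at t = 5 dies first, at t = 6.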



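Persistent Homology

[Figure: persistence diagram with axes birth time and death time]

✤ A “hole” that appears close to the diagonal has a short lifetime and may be “noise”

✤ A “hole” that appears far from the diagonal persists over a long range of the parameter and is likely a genuine feature

✤ This is how we detect “structural holes”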
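Since the Euclidean distance from a point (b, d) to the diagonal is $(d - b)/\sqrt{2}$, thresholding the persistence d − b is a simple way to separate short-lived features from long-lived ones. A sketch with made-up diagram points and a hypothetical threshold parameter:

```python
def split_by_persistence(diagram, threshold):
    """Split diagram points (birth, death) into likely noise and likely structure.

    Persistence d - b is proportional to the distance from the diagonal;
    `threshold` is a user-chosen noise band (hypothetical parameter)."""
    noise = [(b, d) for b, d in diagram if d - b <= threshold]
    structure = [(b, d) for b, d in diagram if d - b > threshold]
    return noise, structure

pd1 = [(0.1, 0.15), (0.2, 1.4), (0.5, 0.55), (0.3, 2.0)]   # made-up points
noise, structure = split_by_persistence(pd1, threshold=0.2)
print(structure)  # [(0.2, 1.4), (0.3, 2.0)]
```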



See more in Part II of the tutorial.

5 - Some applications

Applications

• Persistence and protein compressibility: Marcio Gameiro et al., Japan J. Indust. Appl. Math. (2015) 32:1-17

Protein Structure

[Figure: amino acid 1 and amino acid 2 joined by a peptide bond; the 1-dim structure of a protein folds into the 3-dim structure of hemoglobin]



Protein Structure

✤ Van der Waals radius of an atom (Å):

H: 1.2, C: 1.7, N: 1.55, O: 1.52, S: 1.8, P: 1.8

[Figure: Van der Waals ball model of hemoglobin]


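Alpha Complex for Protein Modeling

✤ $P = \{x_i \in \mathbb{R}^3 \mid i = 1, \dots, m\}$: positions of atoms; $r_i$: radius of the i-th atom

✤ $B_i = \{x \in \mathbb{R}^3 \mid \|x - x_i\| \le r_i\}$: ball with radius $r_i$

✤ Weighted Voronoi decomposition: $V_i = \{x \in \mathbb{R}^3 \mid \|x - x_i\|^2 - r_i^2 \le \|x - x_j\|^2 - r_j^2,\ \forall j \neq i\}$, where $\|x - x_i\|^2 - r_i^2$ is the power distance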
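A point x lies in the weighted Voronoi cell $V_i$ of the atom that minimizes the power distance $\|x - x_i\|^2 - r_i^2$. A sketch using the van der Waals radii listed above; the two-atom coordinates and the function names are hypothetical:

```python
import math

# Van der Waals radii in Å, as listed on the Protein Structure slide.
VDW_RADIUS = {"H": 1.2, "C": 1.7, "N": 1.55, "O": 1.52, "S": 1.8, "P": 1.8}

def power_distance(x, center, radius):
    """Power distance ||x - c||^2 - r^2 (negative inside the ball)."""
    return math.dist(x, center) ** 2 - radius ** 2

def weighted_voronoi_cell(x, atoms):
    """Index i of the weighted Voronoi cell V_i containing x, i.e. the atom
    minimizing the power distance. atoms: list of (element, (x, y, z))."""
    return min(range(len(atoms)),
               key=lambda i: power_distance(x, atoms[i][1], VDW_RADIUS[atoms[i][0]]))

# Hypothetical fragment: a carbon and a hydrogen 2.0 Å apart on the x-axis.
atoms = [("C", (0.0, 0.0, 0.0)), ("H", (2.0, 0.0, 0.0))]
print(weighted_voronoi_cell((1.0, 0.0, 0.0), atoms))  # 0
```

The midpoint of the two centers is equidistant in the ordinary sense, but it falls in the carbon cell because of the larger radius; along the axis the cell boundary sits at x = 1.3625 Å.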



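Alpha Complex for Protein Modeling

✤ Alpha complex: the nerve of $\{B_i \cap V_i\}$; a k-simplex $\{i_0, \dots, i_k\}$ belongs to it iff $\bigcap_{j=0}^{k} (B_{i_j} \cap V_{i_j}) \neq \emptyset$

✤ Nerve lemma: $\bigcup_i B_i$ is homotopy equivalent to the alpha complex

✤ Changing the radius (by a parameter w) forms a filtration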



Topology of Ovalbumin

[Figure: persistence diagrams PD1 (1st Betti) and PD2 (2nd Betti) of ovalbumin, birth time vs. death time]



Compressibility

✤ 3-dim structure → functionality; softness is measured by compressibility

✤ Compressibility is obtained from experiments, but quantifying it from the structure is difficult

✤ Approach: compute persistence diagrams (holes), select generators, and fit parameters to the experimental compressibility

Denoising

[Figure: persistence diagram, birth time vs. death time]

✤ Topological noise: non-robust topological features depend on the state of fluctuations of the structure

✤ The quantification should not depend on the state of fluctuations



Holes with Sparse or Dense Boundary

✤ A sparse hole structure can be deformed to a much larger extent than a dense one → greater compressibility

✤ Effective sparse holes

[Figure: van der Waals balls vs. enlarged balls, and the corresponding persistence diagram (birth time vs. death time)]



# of generators vs. compressibility

[Figure: topological measurement Cp vs. compressibility]



Applications

• Persistence and phylogenetic trees

Protein Phylogenetic Tree

✤ A phylogenetic tree is built from a distance matrix for a set of species (human, dog, frog, fish, …)

✤ The distance matrix is calculated by a score function based on the similarity of amino acid sequences

[Figure: amino acid sequences of fish, frog, and human hemoglobin, and the distance matrix of hemoglobin for fish, frog, human, and dog]



Persistence Distance and Classification of Proteins

✤ The score function based on amino acid sequences contains no information about the 3-dim structure of proteins

✤ The Wasserstein distance (of degree p) on persistence diagrams (Cohen-Steiner, Edelsbrunner, Harer, and Mileyko, FCM, 2010) reflects the similarity of the persistence diagrams (3-dim structures) of proteins



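Persistence Distance and Classification of Proteins

[Figure: persistence diagrams (birth time vs. death time) whose points are matched by a bijection; the Wasserstein distance is the cost of the best matching]

$W_p(D, E) = \left( \inf_{\gamma: D \to E} \sum_{x \in D} \|x - \gamma(x)\|_\infty^p \right)^{1/p}$, where γ ranges over bijections and points may also be matched to the diagonal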
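For small diagrams the infimum over bijections can be found by brute force: each diagram is augmented with the diagonal projections of the other diagram's points, and diagonal-to-diagonal matches cost nothing. A sketch assuming finite points only and the $L^\infty$ ground metric; its running time is factorial, so it is for illustration only (practical tools use optimal-matching algorithms):

```python
from itertools import permutations

def wasserstein(D1, D2, p=2):
    """Brute-force degree-p Wasserstein distance between two small persistence
    diagrams (lists of finite (birth, death) points), L-infinity ground metric."""
    def diag(x):
        # Nearest point on the diagonal in the L-infinity metric.
        m = (x[0] + x[1]) / 2
        return (m, m)
    # Augment each diagram with the diagonal projections of the other's points;
    # the flag marks diagonal points.
    A = [(x, False) for x in D1] + [(diag(y), True) for y in D2]
    B = [(y, False) for y in D2] + [(diag(x), True) for x in D1]

    def cost(a, b):
        if a[1] and b[1]:
            return 0.0                       # diagonal-to-diagonal is free
        return max(abs(a[0][0] - b[0][0]), abs(a[0][1] - b[0][1])) ** p

    best = min(sum(cost(a, B[j]) for a, j in zip(A, perm))
               for perm in permutations(range(len(B))))
    return best ** (1 / p)

print(wasserstein([(0, 4)], [(0, 3)]))  # 1.0: match the two points directly
print(wasserstein([(0, 4)], []))        # 2.0: send the point to the diagonal
```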



Distance between persistence diagrams

Persistence of sublevel sets

Stability Theorem (Cohen-Steiner et al., 2010)

[Figure: persistence diagram, birth time vs. death time]



Phylogenetic Tree by Persistence

✤ Apply the distance on persistence diagrams to classify proteins

The persistence diagrams use the same noise band as in the computation of compressibility

[Figure: phylogenetic tree of the proteins 3DHT, 3D1A, 1QPW, 3LQD, 1FAW, 1C40, 2FZB]



Future work

✤ Principles for de-noising fluctuations in persistence diagrams (NMR experiments)

✤ Finding minimal generators to identify specific regions in a protein (e.g., a region inducing high compressibility, hereditarily important regions)

✤ Zigzag persistence for robust topological features among a specific group of proteins (quiver representations)

✤ Multi-dimensional persistence (PID → Gröbner bases)



More applications in Part … of the tutorial

✤ Robotics

✤ Computer vision

✤ Sensor networks

✤ Concurrency & databases

✤ Visualization

Prof. Robert Ghrist, Department of Mathematics, University of Pennsylvania: one of the pioneers in applications

Michael Farber, Edelsbrunner, Mischaikow, Gaucher, Bubenik, Zomorodian, Carlsson

Software

• Alpha complex by CGAL: http://www.cgal.org/

• Persistence diagrams by Perseus (coded by Vidit Nanda): http://www.sas.upenn.edu/~vnanda/perseus/index.html

• CHomP project: http://chomp.rutgers.edu/Project.html

Reference links

• Yasuaki Hiraoka (associate professor) homepage: http://www2.math.kyushu-u.ac.jp/~hiraoka/site/About_Me.html

• Applications in sensor networks: http://www.msys.sys.i.kyoto-u.ac.jp/~kazunori/paper/nist20081219.pdf
