Tutorial of topological_data_analysis_part_1(basic)
Tutorial of Topological Data Analysis
Tran Quoc Hoan
@k09ht haduonght.wordpress.com/
Hasegawa lab., Tokyo
The University of Tokyo
Part I - Basic Concepts
My TDA (Topological Data Analysis) road
Part I - Basic concepts & applications
Part II - Advanced computation
Part III - Mapper Algorithm
Part IV - Software Roadmap
Part V - Applications in…
Part VI - Applications in…
He is following me
Outline
1. Topology and holes
2. Simplicial complexes
3. Definition of holes
4. Persistent homology
5. Some applications
Topology
I - Topology and Holes
Topology studies the properties of a space that are preserved under continuous deformations such as stretching and bending, but not tearing or gluing.
[Figure: examples of topologically equivalent (≅) shapes]
Invariant
Question: which quantities are invariant in topology?
[Figure: groups of topologically equivalent (≅) shapes with their hole counts]
Number of:   Connected components   Rings   Cavities
             1                      0       0
             2                      0       0
             1                      1       0
             1                      1       0
Holes and dimension
Topology: consider continuous deformations that preserve the holes of each dimension
✤ What matters for the form of a shape: connected components, rings, cavities
• 0-dimensional “hole” = connected component
• 1-dimensional “hole” = ring
• 2-dimensional “hole” = cavity
How to define a “hole”?
Use the “algebraic” homology group
Homology group
✤ For a geometric object X, the homology group $H_q(X)$ determines the numbers $k_q$:
$k_0$: number of connected components
$k_1$: number of rings
$k_2$: number of cavities
$k_q$: number of q-dimensional holes
The numbers $k_q$ are the Betti numbers.
Image source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
Simplicial complexes
Simplicial complex: a set of vertices, edges, triangles, tetrahedra, … that is closed under taking faces and has no improper intersections
[Figure: vertex (0-dimensional), edge (1-dimensional), triangle (2-dimensional), tetrahedron (3-dimensional); one example that is a simplicial complex and one that is not]
2 - Simplicial complexes
n-simplex
n-simplex: the convex hull (the smallest convex set containing them) of n+1 affinely independent points
[Figure: vertex (0-dimensional), edge (1-dimensional), triangle (2-dimensional), tetrahedron (3-dimensional), n-simplex]
$\sigma = |v_0 v_1 \dots v_n| = \{\lambda_0 v_0 + \lambda_1 v_1 + \dots + \lambda_n v_n \mid \lambda_0 + \dots + \lambda_n = 1,\ \lambda_i \ge 0\}$
An m-face of $\sigma$ is the convex hull $\tau = |v_{i_0} \dots v_{i_m}|$ of a non-empty subset of $\{v_0, v_1, \dots, v_n\}$ (it is proper if the subset is not the entire set). Notation: $\tau \prec \sigma$
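Orientation of a simplex
Orientation of a simplex: an ordering $\langle v_{i_0} v_{i_1} \dots v_{i_n} \rangle$ of its vertices; two orderings give the same orientation when they differ by an even permutation
[Figure: oriented 1-simplex, 2-simplex, and 3-simplex]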
Simplicial complex
Definition: A simplicial complex is a finite collection $K$ of simplices such that
(1) If $\sigma \in K$ and $\tau \prec \sigma$ is a face of $\sigma$, then $\tau \in K$
(2) If $\sigma, \tau \in K$ and $\sigma \cap \tau \neq \emptyset$, then $\sigma \cap \tau \prec \sigma$ and $\sigma \cap \tau \prec \tau$
The maximum dimension of a simplex in $K$ is the dimension of $K$
$K_2 = \{|v_0v_1v_2|, |v_0v_1|, |v_0v_2|, |v_1v_2|, |v_0|, |v_1|, |v_2|\}$
$K = K_2 \cup \{|v_3v_4|, |v_3|, |v_4|\}$
[Figure: one example that is not a simplicial complex and one that is]
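Condition (1) of the definition can be checked mechanically. The following is a minimal sketch, assuming each simplex is encoded as a tuple of integer vertex labels (so $v_i$ becomes `i`); the function names are illustrative and not from the slides. In this purely combinatorial encoding condition (2) is not tested, because the intersection of two vertex sets is automatically a common face once the collection is closed under faces.

```python
from itertools import combinations

def faces(simplex):
    """Return all non-empty faces of a simplex given as an iterable of vertex labels."""
    verts = sorted(simplex)
    return {frozenset(c) for k in range(1, len(verts) + 1)
            for c in combinations(verts, k)}

def is_closed_under_faces(simplices):
    """Check condition (1): every face of every simplex is also in the collection."""
    K = {frozenset(s) for s in simplices}
    return all(faces(s) <= K for s in K)

def dimension(simplices):
    """Dimension of K: the largest number of vertices in a simplex, minus one."""
    return max(len(s) for s in simplices) - 1

# K2 and K from the example above, with vertex v_i written as the integer i
K2 = [(0, 1, 2), (0, 1), (0, 2), (1, 2), (0,), (1,), (2,)]
K = K2 + [(3, 4), (3,), (4,)]

# A collection that violates condition (1): the edges |v0v2| and |v1v2| are missing
not_closed = [(0, 1, 2), (0, 1), (0,), (1,), (2,)]
```

Here `is_closed_under_faces(K)` holds and `dimension(K)` is 2, while `not_closed` fails the check.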
Simplicial complexes
[Figure: simplicial complex of hemoglobin]
Image source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
Nerve
✤ Let $\Phi = \{B_i \mid i = 1, \dots, m\}$ be a covering of $X = \bigcup_{i=1}^{m} B_i$
✤ The nerve of $\Phi$ is a simplicial complex $\mathcal{N}(\Phi) = (V, \Sigma)$
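As a small illustration of the nerve, the sketch below uses the standard construction: one vertex per covering set, and a simplex on a group of indices whenever the corresponding sets share a point. It assumes the covering sets are finite Python sets, which is a simplification; the function name and the `max_dim` parameter are hypothetical.

```python
from itertools import combinations

def nerve(cover, max_dim=2):
    """Nerve of a cover given as a list of finite sets.

    Vertex i stands for cover[i]; a tuple of indices is a simplex
    whenever the corresponding sets have a common element.
    """
    simplices = []
    for k in range(1, max_dim + 2):            # k = number of vertices
        for idx in combinations(range(len(cover)), k):
            common = set.intersection(*(set(cover[i]) for i in idx))
            if common:
                simplices.append(idx)
    return simplices

# Three sets that meet pairwise but have no common point:
# the nerve is a hollow triangle (3 vertices, 3 edges, no 2-simplex).
hollow = nerve([{"a", "b"}, {"b", "c"}, {"c", "a"}])

# Three sets sharing the point "a": the nerve is a filled triangle.
filled = nerve([{"a", "b"}, {"a", "c"}, {"a"}])
```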
Nerve theorem
✤ If $X \subset \mathbb{R}^N$ is covered by a collection $\Phi = \{B_i \mid i = 1, \dots, m\}$ of convex closed sets, then $X$ and $\mathcal{N}(\Phi)$ are homotopy equivalent
Čech complex
✤ $P = \{x_i \in \mathbb{R}^N \mid i = 1, \dots, m\}$
✤ $B_r(x_i) = \{x \in \mathbb{R}^N \mid \|x - x_i\| \le r\}$: ball with radius r
✤ The Čech complex $C(P, r)$ is the nerve of $\Phi = \{B_r(x_i) \mid x_i \in P\}$
✤ From the nerve theorem: $X_r = \bigcup_{i=1}^{m} B_r(x_i) \simeq C(P, r)$
✤ Filtration: $C(P, r) \subset C(P, s)$ for $r < s$
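The lowest-dimensional part of the Čech complex is easy to compute, which the following sketch illustrates. It only builds vertices and edges, using the fact that two closed balls of radius r intersect exactly when their centres are at most 2r apart. Higher simplices need a genuine common-intersection test for three or more balls, which is not attempted here. The function name is hypothetical.

```python
from itertools import combinations
from math import dist

def cech_one_skeleton(points, r):
    """Vertices and edges of the Cech complex C(P, r).

    Two closed balls of radius r intersect exactly when their centres
    are at distance at most 2r, so the edge test is a distance check.
    """
    vertices = [(i,) for i in range(len(points))]
    edges = [(i, j) for i, j in combinations(range(len(points)), 2)
             if dist(points[i], points[j]) <= 2 * r]
    return vertices, edges

# Corners of the unit square: growing r adds edges, giving a filtration.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
_, edges_small = cech_one_skeleton(square, 0.4)   # balls do not touch
_, edges_mid = cech_one_skeleton(square, 0.5)     # the four sides appear
_, edges_large = cech_one_skeleton(square, 0.75)  # the diagonals appear too
```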
Čech complex
✤ The weighted Čech complex $C(P, R)$ is the nerve of $\Phi = \{B_{r_i}(x_i) \mid x_i \in P\}$ (balls with different radii)
✤ Checking the intersections of balls is computationally not easy, which motivates the alpha complex
Voronoi diagrams and Delaunay complex
✤ $P = \{x_i \in \mathbb{R}^N \mid i = 1, \dots, m\}$
✤ Voronoi cell: $V_i = \{x \in \mathbb{R}^N \mid \|x - x_i\| \le \|x - x_j\|,\ j \neq i\}$
✤ Voronoi decomposition: $\mathbb{R}^N = \bigcup_{i=1}^{m} V_i$
✤ Delaunay complex: $D(P) = \mathcal{N}(\Phi)$ with $\Phi = \{V_i \mid i = 1, \dots, m\}$
General position
✤ $x_1, \dots, x_{N+2} \in \mathbb{R}^N$ are in general position if there is no $x \in \mathbb{R}^N$ s.t. $\|x - x_1\| = \dots = \|x - x_{N+2}\|$
✤ If every combination of N+2 points in P is in general position, then P is in general position
✤ If P is in general position, then
• The dimensions of the Delaunay simplices are $\le N$
• The geometric representation of $D(P)$ can be embedded in $\mathbb{R}^N$
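Alpha complex
✤ The alpha complex $\alpha(P, r) = \mathcal{N}(\Psi)$ is the nerve of the cover $\Psi = \{B_r(x_i) \cap V_i \mid i = 1, \dots, m\}$ of $X_r$ (each ball restricted to its Voronoi cell)
✤ From the nerve theorem: $X_r \simeq \alpha(P, r)$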
Alpha complex
✤ The weighted alpha complex is defined using a different radius for each point
(if P is in general position)
(filtration of alpha complexes)
Alpha complex
✤ Computations are much easier than for Čech complexes
✤ Software: CGAL
• Constructs alpha complexes of point cloud data in $\mathbb{R}^N$ with $N \le 3$
[Figure: filtration of an alpha complex]
Image source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
Definition of holes
Simplicial complex (geometric object) → Chain complex (algebraic object) → Homology group → Algebraic holes
3 - Definition of Holes
What is a hole?
✤ 1-dimensional hole: ring
[Figure: shapes with and without a ring, each labeled as having a boundary or being without boundary]
Ring = 1-dimensional graph without boundary?
However, NOT always: a 1-dimensional graph without boundary can itself be the boundary of a 2-dimensional graph
Ring = 1-dimensional graph without boundary that is not the boundary of a 2-dimensional graph
What is a hole?
✤ 2-dimensional hole: cavity
[Figure: shapes with and without a cavity, each labeled as having a boundary or being without boundary]
Cavity = 2-dimensional graph without boundary?
However, NOT always: a 2-dimensional graph without boundary can itself be the boundary of a 3-dimensional graph
Cavity = 2-dimensional graph without boundary that is not the boundary of a 3-dimensional graph
Hole and boundary
q-dimensional hole = q-dimensional graph without boundary that is not the boundary of a (q+1)-dimensional graph
We make this precise in “algebraic” language
Chain complexes
Definition: Let $K$ be a simplicial complex of dimension $n$. The group of q-chains is defined as
$C_q(K) := \{\sum_i \alpha_i \langle v_{i_0} \dots v_{i_q} \rangle \mid \alpha_i \in \mathbb{R},\ \langle v_{i_0} \dots v_{i_q} \rangle \text{ a q-simplex in } K\}$ if $0 \le q \le n$
$C_q(K) := 0$ if $q < 0$ or $q > n$
An element of $C_q(K)$ is called a q-chain.
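Boundary
The boundary of a q-simplex is the sum of its (q-1)-dimensional faces.
Definition: $\partial_q \langle v_{i_0} \dots v_{i_q} \rangle := \sum_{l=0}^{q} (-1)^l \langle v_{i_0} \dots \hat{v}_{i_l} \dots v_{i_q} \rangle$, where $\hat{v}_{i_l}$ means that $v_{i_l}$ is omitted
Ignoring orientation: $\partial |v_0v_1v_2| := |v_0v_1| + |v_1v_2| + |v_0v_2|$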
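The boundary of a single simplex can be sketched in a few lines. This assumes the standard alternating-sign convention for oriented simplices, under which the face that omits the l-th vertex gets coefficient $(-1)^l$; dropping the signs gives the unoriented sum of faces written above. The function name and the dictionary representation of a chain are illustrative choices.

```python
def boundary(simplex):
    """Boundary of an oriented simplex given as a tuple of vertices.

    Returns a dict mapping each (q-1)-face to its coefficient (-1)**l,
    where l is the position of the omitted vertex.
    """
    if len(simplex) == 1:
        return {}                      # the boundary of a vertex is zero
    chain = {}
    for l in range(len(simplex)):
        face = simplex[:l] + simplex[l + 1:]
        chain[face] = chain.get(face, 0) + (-1) ** l
    return chain

triangle = boundary(("v0", "v1", "v2"))
edge = boundary(("v0", "v1"))
```

For the triangle this gives coefficient +1 on `("v1", "v2")`, -1 on `("v0", "v2")`, and +1 on `("v0", "v1")`.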
Boundary
Fundamental lemma: $\partial_{q-1} \circ \partial_q = 0$
[Figure: the case q = 2, applying $\partial_2$ and then $\partial_1$]
In general:
• For a q-simplex $\tau$, the boundary $\partial_q \tau$ consists of all (q-1)-faces of $\tau$.
• Every (q-2)-face of $\tau$ belongs to exactly two (q-1)-faces, with opposite orientations.
Hence $\partial_{q-1} \partial_q \tau = 0$
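The case q = 2 can be written out explicitly. The computation below assumes the standard alternating-sign convention for oriented simplices, so that $\partial_1 \langle a\, b \rangle = \langle b \rangle - \langle a \rangle$; each vertex then appears once with each sign and cancels.

```latex
\begin{aligned}
\partial_1 \partial_2 \langle v_0 v_1 v_2 \rangle
  &= \partial_1 \bigl( \langle v_1 v_2 \rangle - \langle v_0 v_2 \rangle + \langle v_0 v_1 \rangle \bigr) \\
  &= \bigl( \langle v_2 \rangle - \langle v_1 \rangle \bigr)
   - \bigl( \langle v_2 \rangle - \langle v_0 \rangle \bigr)
   + \bigl( \langle v_1 \rangle - \langle v_0 \rangle \bigr) \\
  &= 0
\end{aligned}
```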
Hole and boundary
q-dimensional hole = q-dimensional graph without boundary (1) that is not the boundary of a (q+1)-dimensional graph (2)
(1) $Z_q(K) := \ker \partial_q$ (cycle group)
(2) $B_q(K) := \operatorname{im} \partial_{q+1}$ (boundary group)
$B_q(K) \subset Z_q(K) \subset C_q(K)$, because $\partial_q \circ \partial_{q+1} = 0$
Hole and boundary
q-dimensional hole = q-dimensional graph without boundary (1: $\ker \partial_q$) that is not the boundary of a (q+1)-dimensional graph (2: $\operatorname{im} \partial_{q+1}$)
The holes are the elements of $Z_q(K)$ that remain after the elements of $B_q(K)$ are set to zero
Denote this operation by $Q$
For $z' = z + b$ with $z, z' \in \ker \partial_q$ and $b \in \operatorname{im} \partial_{q+1}$: $Q(z') = Q(z) + Q(b) = Q(z)$
($z$ and $z'$ are equivalent in $\ker \partial_q$ with respect to $\operatorname{im} \partial_{q+1}$)
q-dimensional hole = an equivalence class of vectors in $\ker \partial_q / \operatorname{im} \partial_{q+1}$
Homology group
Homology groups: the q-th homology group $H_q$ is defined as
$H_q = \ker \partial_q / \operatorname{im} \partial_{q+1} = \{z + \operatorname{im} \partial_{q+1} \mid z \in \ker \partial_q\} = \{[z] \mid z \in \ker \partial_q\}$
The classes form a group under the operation $[z] + [z'] = [z + z']$
Betti numbers: the q-th Betti number is defined as the dimension of $H_q$: $b_q = \dim(H_q)$
$H_0(K)$: connected components, $H_1(K)$: rings, $H_2(K)$: cavities
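Computing Homology
[Figure: square with vertices v0, v1, v2, v3 and unoriented edges]
All vertices in $\ker \partial_0$ are equivalent with respect to $\operatorname{im} \partial_1$, so $b_0 = \dim(H_0) = 1$
$\operatorname{im} \partial_2$ contains only the zero vector, so $b_1 = \dim(H_1) = 1$
$H_1 = \{\lambda(|v_0v_1| + |v_1v_2| + |v_2v_3| + |v_3v_0|)\}$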
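Computing Homology
[Figure: square with vertices v0, v1, v2, v3 and oriented edges]
$H_1 = \{\lambda(\langle v_0v_1 \rangle + \langle v_1v_2 \rangle + \langle v_2v_3 \rangle - \langle v_0v_3 \rangle)\}$
All vertices in $\ker \partial_0$ are equivalent with respect to $\operatorname{im} \partial_1$, so $b_0 = \dim(H_0) = 1$
$\operatorname{im} \partial_2$ contains only the zero vector, so $b_1 = \dim(H_1) = 1$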
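The Betti numbers of the square can be reproduced numerically from the ranks of the boundary maps, using $b_q = \dim C_q - \operatorname{rank} \partial_q - \operatorname{rank} \partial_{q+1}$. The sketch below is one possible implementation, not the method of the slides. It works with $\mathbb{Z}_2$ coefficients as a simplifying assumption, which for this example gives the same Betti numbers as real coefficients. Function names are hypothetical.

```python
def rank_mod2(rows):
    """Rank over Z2 of a matrix whose rows are given as integer bitmasks."""
    pivots = {}                       # highest set bit -> reduced row
    for row in rows:
        while row:
            high = row.bit_length() - 1
            if high not in pivots:
                pivots[high] = row
                break
            row ^= pivots[high]       # cancel the leading bit
    return len(pivots)

def betti_numbers(simplices):
    """Betti numbers of a face-closed complex given as sorted vertex tuples."""
    by_dim = {}
    for s in simplices:
        by_dim.setdefault(len(s) - 1, []).append(tuple(s))
    top = max(by_dim)
    ranks = {0: 0}                    # the boundary map on vertices is zero
    for q in range(1, top + 1):
        index = {s: i for i, s in enumerate(by_dim.get(q - 1, []))}
        rows = []
        for s in by_dim.get(q, []):
            mask = 0
            for l in range(len(s)):   # mark every (q-1)-face of s
                mask |= 1 << index[s[:l] + s[l + 1:]]
            rows.append(mask)
        ranks[q] = rank_mod2(rows)
    return [len(by_dim.get(q, [])) - ranks[q] - ranks.get(q + 1, 0)
            for q in range(top + 1)]

# The square v0 v1 v2 v3 from the slide: four vertices and four edges
square = [(0,), (1,), (2,), (3,), (0, 1), (1, 2), (2, 3), (0, 3)]

# The same square filled with two triangles along the diagonal v0 v2
filled = square + [(0, 2), (0, 1, 2), (0, 2, 3)]
```

For `square` this yields one connected component and one ring; filling the square removes the ring.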
Persistent Homology
✤ Consider a filtration of finite type
$K: K^0 \subset K^1 \subset \dots \subset K^t \subset \dots$
$\exists\, \Theta$ s.t. $K^j = K^\Theta,\ \forall j \ge \Theta$
✤ $K = \bigcup_{t \ge 0} K^t$: total simplicial complex
$K_k$: all k-simplices in $K$
$K_k^t$: all k-simplices in $K$ at time $t$
$T(\sigma) = t \iff \sigma \in K^t \setminus K^{t-1}$: birth time of the simplex $\sigma$
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
Persistent Homology
✤ $\mathbb{Z}_2$-vector space
✤ $\mathbb{Z}_2[x]$-graded module
✤ Inclusion map
✤ $C_k(K)$ is a free $\mathbb{Z}_2[x]$-module with the base
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
Persistent Homology
✤ Boundary map (graded homomorphism)
face of σ
✤ From the graded structure
✤ Persistent homology
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
Persistent Homology
✤ From the structure theorem of $\mathbb{Z}_2[x]$ (PID)
✤ Persistent interval
$I_i(b)$: inf of $I_i$, $I_i(d)$: sup of $I_i$
✤ Persistent diagram
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
Persistent Homology
[Figure: persistence diagram; axes: birth time, death time]
✤ A “hole” that appears close to the diagonal may be “noise”
✤ A “hole” that appears far from the diagonal may be a robust “structure”
✤ Detect the “structural holes”
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
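Birth and death times can be computed for a small filtration with the standard column-reduction algorithm over $\mathbb{Z}_2$. The sketch below is a generic textbook-style implementation offered for illustration, not the procedure used in the slides; the function name and the input format (a list of simplex/time pairs in which every face precedes its simplex) are assumptions.

```python
def persistence_intervals(filtration):
    """Persistence intervals over Z2 by the standard column reduction.

    `filtration` is a list of (simplex, time) pairs, with simplices as sorted
    vertex tuples, ordered so that every face precedes the simplex itself.
    Returns (dimension, birth, death) triples; death is None for classes
    that never die.
    """
    order = {s: i for i, (s, _) in enumerate(filtration)}
    columns = []                     # reduced boundary columns as sets of row indices
    pivot_owner = {}                 # lowest row index -> column that owns it
    paired = set()
    intervals = []
    for j, (s, t) in enumerate(filtration):
        col = set()
        if len(s) > 1:
            col = {order[s[:l] + s[l + 1:]] for l in range(len(s))}
        while col and max(col) in pivot_owner:
            col ^= columns[pivot_owner[max(col)]]   # add columns mod 2
        columns.append(col)
        if col:                      # simplex j kills the class born at simplex i
            i = max(col)
            pivot_owner[i] = j
            paired.update((i, j))
            birth = filtration[i][1]
            if t > birth:            # skip zero-length intervals
                intervals.append((len(filtration[i][0]) - 1, birth, t))
    for i, (s, t) in enumerate(filtration):
        if i not in paired:          # created a class that is never killed
            intervals.append((len(s) - 1, t, None))
    return intervals

# Three vertices at time 0, two edges at time 1, the closing edge at time 2
# (a ring is born), and the triangle at time 3 (the ring dies).
filtration = [((0,), 0), ((1,), 0), ((2,), 0),
              ((0, 1), 1), ((1, 2), 1), ((0, 2), 2),
              ((0, 1, 2), 3)]
intervals = persistence_intervals(filtration)
```

In the persistence diagram each triple becomes the point (birth, death); the ring here corresponds to the point (2, 3), and the connected component that survives has no death time.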
See more in Part II of the tutorial.
Applications
5 - Some applications
• Persistence to protein compressibility: Marcio Gameiro et al., Japan J. Indust. Appl. Math. (2015) 32:1-17
Protein Structure
Persistence to protein compressibility
[Figure: amino acid 1 and amino acid 2 joined by a peptide bond (1-dim structure of a protein), folding into the 3-dim structure of hemoglobin]
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
Protein Structure
✤ Van der Waals radius of an atom (in Å)
H: 1.2, C: 1.7, N: 1.55, O: 1.52, S: 1.8, P: 1.8
[Figure: van der Waals ball model of hemoglobin]
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
Alpha Complex for Protein Modeling
✤ Position of atoms
✤ Radius of the i-th atom
✤ Ball with radius $r_i$
✤ Power distance
✤ Weighted Voronoi decomposition
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
Alpha Complex for Protein Modeling
✤ Alpha complex: nerve, k-simplex
✤ Nerve lemma
✤ Changing the radius to form a filtration (by w)
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
Topology of Ovalbumin
[Figure: 1st and 2nd Betti plots and persistence diagrams PD1 and PD2 of ovalbumin; axes: birth time, death time]
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
Compressibility
✤ 3-dim structure → functionality
✤ Softness → compressibility
✤ Experiments: difficult
✤ Quantification: persistence diagrams (holes)
✤ Select generators and fit parameters against experimental compressibility
Persistence to protein compressibility 51
birth timede
ath
time
✤ Topological noise
✤ Non-robust topological features depend on a status of fluctuations
✤ The quantification should not be dependent on a status of fluctuations
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
Holes with Sparse or Dense Boundary
✤ A sparse hole structure is deformable to a much larger extent than a dense hole → greater compressibility
✤ Effective sparse holes
[Figure: van der Waals balls and enlarged balls; persistence diagram with axes birth time, death time]
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
# of generators vs. compressibility
[Figure: compressibility plotted against the topological measurement Cp]
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
Applications
• Persistence to phylogenetic trees
Protein Phylogenetic Tree
Persistence to Phylogenetic Trees
✤ A phylogenetic tree is defined by a distance matrix for a set of species (human, dog, frog, fish, …)
✤ The distance matrix is calculated by a score function based on the similarity of amino acid sequences
[Figure: amino acid sequences of fish, frog, and human hemoglobin; distance matrix of hemoglobin for fish, frog, human, dog]
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
Persistence Distance and Classification of Proteins
✤ The score function based on amino acid sequences does not contain information about the 3-dim structure of proteins
✤ The Wasserstein distance (of degree p) on persistence diagrams (Cohen-Steiner, Edelsbrunner, Harer, and Mileyko, FCM, 2010) reflects the similarity of the persistence diagrams (3-dim structures) of proteins
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
Persistence Distance and Classification of Proteins
[Figure: two persistence diagrams (axes: birth time, death time) matched by a bijection that defines the Wasserstein distance]
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
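For very small diagrams, the bijection-based distance can be computed by brute force. The sketch below assumes the commonly used form $W_p(D_1, D_2) = \inf_{\gamma} \bigl( \sum_{x \in D_1} \|x - \gamma(x)\|_\infty^p \bigr)^{1/p}$, where $\gamma$ ranges over bijections and each diagram is extended by points on the diagonal. The choice of the $L_\infty$ norm, the function name, and the padding scheme are assumptions of this sketch. It tries every bijection, so it is only meant for a handful of points.

```python
from itertools import permutations

def wasserstein(diagram_a, diagram_b, p=2):
    """Degree-p Wasserstein distance between two small persistence diagrams.

    Diagrams are lists of (birth, death) pairs. Each diagram is padded with
    diagonal slots (None) so that any point may be matched to the diagonal;
    all bijections are then tried, so this is only practical for tiny inputs.
    """
    def cost(x, y):
        if x is None and y is None:
            return 0.0                               # diagonal to diagonal
        if x is None or y is None:
            b, d = y if x is None else x
            return (d - b) / 2                       # L-infinity distance to the diagonal
        return max(abs(x[0] - y[0]), abs(x[1] - y[1]))

    a = list(diagram_a) + [None] * len(diagram_b)
    b = list(diagram_b) + [None] * len(diagram_a)
    best = min(sum(cost(x, y) ** p for x, y in zip(a, perm))
               for perm in permutations(b))
    return best ** (1 / p)

same = wasserstein([(0, 4)], [(0, 4)])        # identical diagrams
shifted = wasserstein([(0, 4)], [(1, 5)])     # point moved by 1 in both coordinates
to_empty = wasserstein([(0, 4)], [])          # the point is matched to the diagonal
```

A point close to the diagonal is cheap to match to the diagonal, which is why short-lived features contribute little to the distance.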
Distance between persistence diagrams
✤ Persistence of sublevel sets
✤ Stability Theorem (Cohen-Steiner et al., 2010)
[Figure: persistence diagram; axes: birth time, death time]
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
Phylogenetic Tree by Persistence
✤ Apply the distance on persistence diagrams to classify proteins
✤ The persistence diagrams use the same noise band as in the compressibility computations
[Figure: phylogenetic tree of the proteins 3DHT, 3D1A, 1QPW, 3LQD, 1FAW, 1C40, 2FZB]
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
Future work
✤ Principle to de-noise fluctuations in persistence diagrams (NMR experiments)
✤ Finding minimum generators to identify specific regions in a protein (e.g., a region inducing high compressibility, hereditarily important regions)
✤ Zigzag persistence for robust topological features among a specific group of proteins (quiver representation)
✤ Multi-dimensional persistence (PID → Gröbner basis)
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
Applications: more in part … of the tutorials
✤ Robotics
✤ Computer vision
✤ Sensor networks
✤ Concurrency & databases
✤ Visualization
One of the pioneers in applications: Prof. Robert Ghrist, Department of Mathematics, University of Pennsylvania
Other researchers: Michael Farber, Edelsbrunner, Mischaikow, Gaucher, Bubenik, Zomorodian, Carlsson
Software
• Alpha complexes with CGAL: http://www.cgal.org/
• Persistence diagrams with Perseus (coded by Vidit Nanda): http://www.sas.upenn.edu/~vnanda/perseus/index.html
• CHomP project: http://chomp.rutgers.edu/Project.html
Reference links
• Yasuaki Hiraoka (associate professor) homepage
http://www2.math.kyushu-u.ac.jp/~hiraoka/site/About_Me.html
http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
• Applications in sensor networks
www.msys.sys.i.kyoto-u.ac.jp/~kazunori/paper/nist20081219.pdf