De la 'Data Analysis' la 'Topologie

28
De la "Data Analysis" la "Topologie " Dan Burghelea Department of Mathematics Ohio State University, Columbuus, OH Pitesti 2019 D.Burghelea De la "Data Analysis" la "Topologie

Transcript of De la 'Data Analysis' la 'Topologie

Page 1: De la 'Data Analysis' la 'Topologie

De la "Data Analysis" la "Topologie "

Dan Burghelea

Department of MathematicsOhio State University, Columbuus, OH

Pitesti 2019

D.Burghelea De la "Data Analysis" la "Topologie

Page 2: De la 'Data Analysis' la 'Topologie

Data

Topology

New Invariants

Mutual benefits

D.Burghelea De la "Data Analysis" la "Topologie

Page 3: De la 'Data Analysis' la 'Topologie

Data Analysis

1 What is Data?

2 How the Data is obtained?

3 What do we want from Data?

4 How the Topology enter the picture ?

D.Burghelea De la "Data Analysis" la "Topologie

Page 4: De la 'Data Analysis' la 'Topologie

What is "Data"

1. Mathematically Data (≡ Point Cloud Data) is

a finite metric space (X ,d)

possibly equipped with

a map f : X → R or f : X → S1

D.Burghelea De la "Data Analysis" la "Topologie

Page 5: De la 'Data Analysis' la 'Topologie

How is "Data" obtained

a. sampling (of a shape in three or higher dimensional Euclidean space or a

probability distribution)⇒ a collection of points (three coordinates)

b. scanning of a 2 dimensional picture

c. collection of 2D pictures (black-white) of a 3D environmenttaken by a camera from different angles; each 2D picture regarded as a vector in the

pixel space with a gray scale coordinate for each pixel. If camera has 100× 100 pixels

this is a collection of vectors in R10000

d. list of measurements for a collection of objects/individuals;for example observations on the patients in a hospital ( EXAMPLE 2) suspected of

diabetes by measuring: insulin response, glucose tolerance, relative weight , blood

pressure, pulse, hemoglobin level = (A1C)

D.Burghelea De la "Data Analysis" la "Topologie

Page 6: De la 'Data Analysis' la 'Topologie

What one wants to extract from "Data"

In case of sampled geometrical objects

a) derive topological features number of components, number of

holes, the number of tubes transversing the object

without reconstructing the object entirelyb) reconstruct a continuous shape from a sampling.

In case of experimental observationsa) discover patterns and unexpected features,b) detect missing blocks of data and notice clusterings

determine the relevant number of (local ) parameters datadepends on

D.Burghelea De la "Data Analysis" la "Topologie

Page 7: De la 'Data Analysis' la 'Topologie

Example 1.Lung cancer imaging.

- 3D radiological images of cancerous lungs shows both tumorsand blood vessels as areas of increased density.

- Blood vessels show up as long tunnels in the image

- Tumors show up as balls.

Question: How to distinguish automatically between tumorsand blood vessels ?

D.Burghelea De la "Data Analysis" la "Topologie

Page 8: De la 'Data Analysis' la 'Topologie

Example 2. Diabetes Patients(after Miller-Reaven study mentioned in [?])

- Study carried out in 1976 on 145 patients at Stanford Hospital;Most of patients had symptoms of diabetes although somewere normal

- For each patient 6 metabolic variables (involving insulinresponse, glucose tolerence, relative weight, A1C level , bloodpressure .....) were measured and recorded in a 6 dimensionalspace. Hence a point cloud of 145 points in R6

Questions: Find the relevant number of metabolic variablesneeded to detect the diabetes. Find qualitative features (type ofdiabetes, etc).

D.Burghelea De la "Data Analysis" la "Topologie

Page 9: De la 'Data Analysis' la 'Topologie

What Topology does for "Data"

GEOMERTIZE DATA

i.e. converts data into nice topological spaces / possiblyequipped with real or angle valued maps with computablequalitative features:

1. Simplicial complexes (components, tubes, holes arecounted by Betti numbers),

2. Smooth manifolds equipped with a Riemannian metric(curvature)

D.Burghelea De la "Data Analysis" la "Topologie

Page 10: De la 'Data Analysis' la 'Topologie

TOPOLOGY

1. converts a finite metric space into "nice topologicalspace " ;simplicial complex orsimplicial complexs + real / angle- valued maps orsimplicial complex + filtration.

2. uses Homology ( Betti numbers, EP characteristic) tomake mathematically precise and algorithmicallycomputable "qualitative features" of the associated shape.

D.Burghelea De la "Data Analysis" la "Topologie

Page 11: De la 'Data Analysis' la 'Topologie

Why simplicial complex?1. any resonable shape can is homeomorhic to a geometricsimplicial complex2. A finite simplicial complex can be input in a computer as alarge matrix

D.Burghelea De la "Data Analysis" la "Topologie

Page 12: De la 'Data Analysis' la 'Topologie

SIMPLICIAL COMPLEXES

A solid k− simplex is the convex hull of (k + 1) linearlyindependent points .

A geometric simplicial complex K is a "nice subspace ofan Euclidean space " = a union of solid simplicies whichintersect each other in faces .

An abstract simplicial complex is a pair (X ,Σ) with:X a finite set,Σ a family of nonempty subsets of X , covering X s.t.

σ ⊆ τ ∈ Σ⇒ σ ∈ Σ .

The subsets of cardinality (k + 1) are denoted by Xkandreferred to as the k−simplexes.

D.Burghelea De la "Data Analysis" la "Topologie

Page 13: De la 'Data Analysis' la 'Topologie

An abstract simplicial complex determines a geometricsimplicial complex and vice versa.

A simplicial complex is determined by its incidence matrixwhich can be fed in as input of an algorithm.

D.Burghelea De la "Data Analysis" la "Topologie

Page 14: De la 'Data Analysis' la 'Topologie

VIETORIS RIPS complex

To a finite metric space (X ,d) and ε > 0 one asociates: theabstract complex Rε(X ,d) := (X ,Σε = ∪kXk ).

X0 = X ,

Xk := {(x1, x2, · · · xk+1)| iff d(xi , xj) < ε}.

One hasIf ε < ε′ Rε(X ,d) ⊆ Rε′(X ,d).

A map f : X → R provides the simplicial mapsf : Rε(X ,d)→ R

If ε < π a map f : X → S1 provides the simplicial mapsf : Rε(X ,d)→ S1.

D.Burghelea De la "Data Analysis" la "Topologie

Page 15: De la 'Data Analysis' la 'Topologie

If the data is a sample of a compact manifold embedded in theEuclidean space then:

TheoremThere exists α > 0 so that for any ε− dense sample (X ,d),ε < α, the MV-Rips complex Rε(X ,d) is homotopy equivalent tothe manifold.

D.Burghelea De la "Data Analysis" la "Topologie

Page 16: De la 'Data Analysis' la 'Topologie

Ilustration

A fixed set of points can be completed to a Cech complex Cε or to a Rips complex Rε based on a proximity parameter ε. This Cech complex has the homotopy type of the ε/2 cover (S1 ∨ S1 ∨ S1), while the Rips complex has a different homotopy type (S1 ∨ S2).D.Burghelea De la "Data Analysis" la "Topologie

Page 17: De la 'Data Analysis' la 'Topologie

The topology of the ε complexes differ, for different ε’s .

It is therefore desirable to consider all these complexes.

A sequence of Rips complexes for a point cloud data set representing an annulus. Upon increasing ε, holes appear and disappear. Which holes are real and which are noise?

D.Burghelea De la "Data Analysis" la "Topologie

Page 18: De la 'Data Analysis' la 'Topologie

New algebraic topological invariants

For1 A simplicial complex and a simplicial map f : X → R whose

sub levels f−1(−∞, t ] change the homology for finitelymany real values t0 < t1 < t2, · · · < tN ,

2 A simplicial complex and a simplicial map f : X → R orf : X → S1 whose levels f−1(t) change the homology forfinitely many (real or angle values) t0 < t2 < · · · tN ∈ R,

3 A simplicial complex X with a filtrationX0 ⊂ X1 ⊂ · · ·XN−1 ⊂ XN = X ;

D.Burghelea De la "Data Analysis" la "Topologie

Page 19: De la 'Data Analysis' la 'Topologie

Topological persistence- alternative computableinvariants for the classical topological invariants

Inspired from Morse theory/ Morse–Novikov theory oneassociates for any r = 0,1, · · ·

1 to f : X → R, based on changes in homology of sub levelsf−1(−∞,a], a collection of sub level bar codes = intervals[a,b], [a,∞)

2 to f : X → R or f : X → S1, based on changes in thehomology of the levels f−1(t), a collection of four types oflevel bar codes = intervals [a,b], (a,b), [a,b), (a,b] Jordancells {(λ, k) | λ ∈ C \ 0, k ∈ Z≥1}

D.Burghelea De la "Data Analysis" la "Topologie

Page 20: De la 'Data Analysis' la 'Topologie

In (1.) the ends of the interval are values where the topology ofcritical sub level a change.

In (2.) the ends of the interval are values where the topology oflevel a changes.

The Jordan cells (for angle-value map) appear from Jordandecomposition of matrtces which describe the way the topologyof a level θ is related to itself when θ moves along the circle .

D.Burghelea De la "Data Analysis" la "Topologie

Page 21: De la 'Data Analysis' la 'Topologie

D.Burghelea De la "Data Analysis" la "Topologie

Page 22: De la 'Data Analysis' la 'Topologie

One denotes by Bcr (f ), Bo

r (f ), Bc,or (f ) and Bo,c

r (f ) the set ofclosed, open, closed-open, and open-closed r−bar codesof f .

One collects the sets Bcr (f ) and Bo

r−1(f ) as theconfiguration Cr (f ) of points in C for f real-valued and inC \ 0 for f angle valued.

One collects the sets Bc,or (f ) and Bo,c

r (f ) as theconfiguration cr (f ) of points in C \∆},∆ := {z ∈ C | <z = =z} for f real valued and inC \ (S1 t 0) for f angle valued.

D.Burghelea De la "Data Analysis" la "Topologie

Page 23: De la 'Data Analysis' la 'Topologie

X

X

X

X

X

X

X

X

X

X

Y

x

x

x

x

x

x

x

Y

The bar code with ends a, b, a≤b and closed at a is represented

as a point a+ ib while the bar code with ends a, b, a < b open at

a is represented as a point b + ia.

D.Burghelea De la "Data Analysis" la "Topologie

Page 24: De la 'Data Analysis' la 'Topologie

for f angle-valued replace [a,b] and [a,b)by z = eia+(b−a) and (a,b) and (a,b] by z = eib+(a−b) and obtain

To the configuration Cr (f ) resp. cr (f ) one associates thepolynomial P f

r (z) resp. pfr (z) given by the product∏

(z − zi)

D.Burghelea De la "Data Analysis" la "Topologie

Page 25: De la 'Data Analysis' la 'Topologie

Main results

TheoremBetti numbers can be calculated from the degree of thepolynomial P f

r (z) and the cardinality of J ′r (f )s. (see [?]) .

Theorem

The assignment f P fr (z and f pf

r (z) from the space ofmaps to the space of polynomials (with the appropriatetopology) is continuous

D.Burghelea De la "Data Analysis" la "Topologie

Page 26: De la 'Data Analysis' la 'Topologie

TheoremIf X = Mn a closed topological manifold then

PC,fr (z) = PC,f

n−r (τ(z))

andpC,f

r (z) = pC,fn−r−1(τ(z))

where for f real valued τz = iz and and for f angle valuedτ(z) = 1/z.

D.Burghelea De la "Data Analysis" la "Topologie

Page 27: De la 'Data Analysis' la 'Topologie

Algorithms

A1. From Point Cloud data to incidence matrix of a simplicialcomplex + the simplicial map

A2. From the incidence matrix and the simplicial map f tobarcodes and Jordan cells

D.Burghelea De la "Data Analysis" la "Topologie

Page 28: De la 'Data Analysis' la 'Topologie

Dan Burghelea, New topological invariants for real- andangle-valurd maps; an alternative to Morse-Novikov theoryWord scientific Publishing Co. Pte. Ltd, 2017

G. Carlsson, Topology and Data Bull. Amer.Math.Soc, 46pp255-3086.

D.Burghelea De la "Data Analysis" la "Topologie