Using Topological Data Analysis on your BigData

168
Shape and Meaning An Introduction to Topological Data Analysis Anthony Bak

description

Synopsis: Topological Data Analysis (TDA) is a framework for data analysis and machine learning and represents a breakthrough in how to effectively use geometric and topological information to solve 'Big Data' problems. TDA provides meaningful summaries (in a technical sense to be described) and insights into complex data problems. In this talk, Anthony will begin with an overview of TDA and describe the core algorithm that is utilized. This talk will include both the theory and real world problems that have been solved using TDA. After this talk, attendees will understand how the underlying TDA algorithm works and how it improves on existing “classical” data analysis techniques as well as how it provides a framework for many machine learning algorithms and tasks. Speaker: Anthony Bak, Senior Data Scientist, Ayasdi Prior to coming to Ayasdi, Anthony was at Stanford University where he did a postdoc with Ayasdi co-founder Gunnar Carlsson, working on new methods and applications of Topological Data Analysis. He completed his Ph.D. work in algebraic geometry with applications to string theory at the University of Pennsylvania and ,along the way, he worked at the Max Planck Institute in Germany, Mount Holyoke College in Germany, and the American Institute of Mathematics in California.

Transcript of Using Topological Data Analysis on your BigData

Page 1: Using Topological Data Analysis on your BigData

Shape and MeaningAn Introduction to Topological Data Analysis

Anthony Bak

Page 2: Using Topological Data Analysis on your BigData

Goals

For this talk I want to:

I Show you how TDA provides a framework for many machine learning/dataanalysis techniques

I Demonstrate how Ayasdi provides insights into the data.

Caveats: I am only talking about the strain of TDA done by Ayasdi

Page 3: Using Topological Data Analysis on your BigData

Goals

For this talk I want to:

I Show you how TDA provides a framework for many machine learning/dataanalysis techniques

I Demonstrate how Ayasdi provides insights into the data.

Caveats: I am only talking about the strain of TDA done by Ayasdi

Page 4: Using Topological Data Analysis on your BigData

Goals

For this talk I want to:

I Show you how TDA provides a framework for many machine learning/dataanalysis techniques

I Demonstrate how Ayasdi provides insights into the data.

Caveats: I am only talking about the strain of TDA done by Ayasdi

Page 5: Using Topological Data Analysis on your BigData

Goals

For this talk I want to:

I Show you how TDA provides a framework for many machine learning/dataanalysis techniques

I Demonstrate how Ayasdi provides insights into the data.

Caveats: I am only talking about the strain of TDA done by Ayasdi

Page 6: Using Topological Data Analysis on your BigData

The Data Problem

How do we extract meaning from Complex Data?

I Data is complex because it’s "Big Data"I Or has very rich features (eg. Genetic Data >500,000 features, complicated

interdependencies)I Or both!

The Problem in both cases is that there isn’t a single story happening in your data.

TDA will be the tool that summarizes out the irrelevant stories to get at somethinginteresting.

Page 7: Using Topological Data Analysis on your BigData

The Data Problem

How do we extract meaning from Complex Data?

I Data is complex because it’s "Big Data"

I Or has very rich features (eg. Genetic Data >500,000 features, complicatedinterdependencies)

I Or both!

The Problem in both cases is that there isn’t a single story happening in your data.

TDA will be the tool that summarizes out the irrelevant stories to get at somethinginteresting.

Page 8: Using Topological Data Analysis on your BigData

The Data Problem

How do we extract meaning from Complex Data?

I Data is complex because it’s "Big Data"I Or has very rich features (eg. Genetic Data >500,000 features, complicated

interdependencies)

I Or both!

The Problem in both cases is that there isn’t a single story happening in your data.

TDA will be the tool that summarizes out the irrelevant stories to get at somethinginteresting.

Page 9: Using Topological Data Analysis on your BigData

The Data Problem

How do we extract meaning from Complex Data?

I Data is complex because it’s "Big Data"I Or has very rich features (eg. Genetic Data >500,000 features, complicated

interdependencies)I Or both!

The Problem in both cases is that there isn’t a single story happening in your data.

TDA will be the tool that summarizes out the irrelevant stories to get at somethinginteresting.

Page 10: Using Topological Data Analysis on your BigData

The Data Problem

How do we extract meaning from Complex Data?

I Data is complex because it’s "Big Data"I Or has very rich features (eg. Genetic Data >500,000 features, complicated

interdependencies)I Or both!

The Problem in both cases is that there isn’t a single story happening in your data.

TDA will be the tool that summarizes out the irrelevant stories to get at somethinginteresting.

Page 11: Using Topological Data Analysis on your BigData

The Data Problem

How do we extract meaning from Complex Data?

I Data is complex because it’s "Big Data"I Or has very rich features (eg. Genetic Data >500,000 features, complicated

interdependencies)I Or both!

The Problem in both cases is that there isn’t a single story happening in your data.

TDA will be the tool that summarizes out the irrelevant stories to get at somethinginteresting.

Page 12: Using Topological Data Analysis on your BigData

Data Has Shape

And Shape Has Meaning

⇒ In this talk I will focus on how we extract meaning.

Page 13: Using Topological Data Analysis on your BigData

Data Has Shape

And Shape Has Meaning

⇒ In this talk I will focus on how we extract meaning.

Page 14: Using Topological Data Analysis on your BigData

But first... What is shape?

I Shape is the global realization of local constraints.I For a given problem determined by a choice of

I Features, columns or properties to measure.I A metric on the columns.

I But not necessarily so. There are more relaxed definitions of shape and wecan use those too.

The goal of TDA is to understand (for us, summarize) the shape with nopreconceived model of what it should be.

Page 15: Using Topological Data Analysis on your BigData

But first... What is shape?

I Shape is the global realization of local constraints.

I For a given problem determined by a choice of

I Features, columns or properties to measure.I A metric on the columns.

I But not necessarily so. There are more relaxed definitions of shape and wecan use those too.

The goal of TDA is to understand (for us, summarize) the shape with nopreconceived model of what it should be.

Page 16: Using Topological Data Analysis on your BigData

But first... What is shape?

I Shape is the global realization of local constraints.I For a given problem determined by a choice of

I Features, columns or properties to measure.I A metric on the columns.

I But not necessarily so. There are more relaxed definitions of shape and wecan use those too.

The goal of TDA is to understand (for us, summarize) the shape with nopreconceived model of what it should be.

Page 17: Using Topological Data Analysis on your BigData

But first... What is shape?

I Shape is the global realization of local constraints.I For a given problem determined by a choice of

I Features, columns or properties to measure.I A metric on the columns.

I But not necessarily so. There are more relaxed definitions of shape and wecan use those too.

The goal of TDA is to understand (for us, summarize) the shape with nopreconceived model of what it should be.

Page 18: Using Topological Data Analysis on your BigData

But first... What is shape?

I Shape is the global realization of local constraints.I For a given problem determined by a choice of

I Features, columns or properties to measure.I A metric on the columns.

I But not necessarily so. There are more relaxed definitions of shape and wecan use those too.

The goal of TDA is to understand (for us, summarize) the shape with nopreconceived model of what it should be.

Page 19: Using Topological Data Analysis on your BigData

Math World

To show you how we extract insight from shape we start in "Math World"

I We’ll draw the data as a smooth manifold.I Functions that appear are smooth or continuous.

⇒ We will not need either of these assumptions once we’re in "Data World".

⇒ Even more importantly, data in the real world is never like this

Page 20: Using Topological Data Analysis on your BigData

Math World

To show you how we extract insight from shape we start in "Math World"

I We’ll draw the data as a smooth manifold.

I Functions that appear are smooth or continuous.

⇒ We will not need either of these assumptions once we’re in "Data World".

⇒ Even more importantly, data in the real world is never like this

Page 21: Using Topological Data Analysis on your BigData

Math World

To show you how we extract insight from shape we start in "Math World"

I We’ll draw the data as a smooth manifold.I Functions that appear are smooth or continuous.

⇒ We will not need either of these assumptions once we’re in "Data World".

⇒ Even more importantly, data in the real world is never like this

Page 22: Using Topological Data Analysis on your BigData

Math World

To show you how we extract insight from shape we start in "Math World"

I We’ll draw the data as a smooth manifold.I Functions that appear are smooth or continuous.

⇒ We will not need either of these assumptions once we’re in "Data World".

⇒ Even more importantly, data in the real world is never like this

Page 23: Using Topological Data Analysis on your BigData

Math World

To show you how we extract insight from shape we start in "Math World"

I We’ll draw the data as a smooth manifold.I Functions that appear are smooth or continuous.

⇒ We will not need either of these assumptions once we’re in "Data World".

⇒ Even more importantly, data in the real world is never like this

Page 24: Using Topological Data Analysis on your BigData

Math World

Data

Page 25: Using Topological Data Analysis on your BigData

Math World

f

p

Data

Page 26: Using Topological Data Analysis on your BigData

Math World

f

pf−1(p)

Data

Page 27: Using Topological Data Analysis on your BigData

Math World

f

pf−1(p)

=⇒

Data

Page 28: Using Topological Data Analysis on your BigData

Math World

f

pf−1(p)

=⇒

q

Data

Page 29: Using Topological Data Analysis on your BigData

Math World

f

pf−1(p)

=⇒

q

Data

Page 30: Using Topological Data Analysis on your BigData

Math World

f

pf−1(p)

=⇒

q

Data

Page 31: Using Topological Data Analysis on your BigData

Math World

f

pf−1(p)

=⇒

q

g

Data

Page 32: Using Topological Data Analysis on your BigData

Math World

f

pf−1(p)

=⇒

q

g

Data

p′

Page 33: Using Topological Data Analysis on your BigData

Math World

f

pf−1(p)

=⇒

q

g

Data

p′

Page 34: Using Topological Data Analysis on your BigData

Math World

f

pf−1(p)

=⇒

q

g

Data

p′ q′

Page 35: Using Topological Data Analysis on your BigData

Math World

f

pf−1(p)

=⇒

q

g

Data

p′ q′

Page 36: Using Topological Data Analysis on your BigData

Math World

f

pf−1(p)

=⇒

q

g

Data

r′

Page 37: Using Topological Data Analysis on your BigData

Math World

f

pf−1(p)

=⇒

q

g

Data

r′

Page 38: Using Topological Data Analysis on your BigData

Math World

f

pf−1(p)

=⇒

q

g

Data

Page 39: Using Topological Data Analysis on your BigData

Exercise:

What is the summary if we use both lenses, g and f at the same time?

(g, f )

=⇒ We recover the original space

Page 40: Using Topological Data Analysis on your BigData

Exercise:

What is the summary if we use both lenses, g and f at the same time?

(g, f )

=⇒ We recover the original space

Page 41: Using Topological Data Analysis on your BigData

Exercise:

What is the summary if we use both lenses, g and f at the same time?

(g, f )

p

=⇒ We recover the original space

Page 42: Using Topological Data Analysis on your BigData

Exercise:

What is the summary if we use both lenses, g and f at the same time?

(g, f )

p

=⇒ We recover the original space

Page 43: Using Topological Data Analysis on your BigData

Exercise:

What is the summary if we use both lenses, g and f at the same time?

(g, f )

p

=⇒ We recover the original space

Page 44: Using Topological Data Analysis on your BigData

Exercise:

What is the summary if we use both lenses, g and f at the same time?

(g, f )

p

=⇒ We recover the original space

Page 45: Using Topological Data Analysis on your BigData

Exercise:

What is the summary if we use both lenses, g and f at the same time?

(g, f )

p

=⇒ We recover the original space

Page 46: Using Topological Data Analysis on your BigData

What did the exercise tell us?

I With a rich enough set of functions (lenses) we can recover the original spaceI Of course this leaves us no better off then where we started.

⇒ Instead we select a set of functions to tune in to the signal we want.

Page 47: Using Topological Data Analysis on your BigData

What did the exercise tell us?

I With a rich enough set of functions (lenses) we can recover the original space

I Of course this leaves us no better off then where we started.

⇒ Instead we select a set of functions to tune in to the signal we want.

Page 48: Using Topological Data Analysis on your BigData

What did the exercise tell us?

I With a rich enough set of functions (lenses) we can recover the original spaceI Of course this leaves us no better off then where we started.

⇒ Instead we select a set of functions to tune in to the signal we want.

Page 49: Using Topological Data Analysis on your BigData

What did the exercise tell us?

I With a rich enough set of functions (lenses) we can recover the original spaceI Of course this leaves us no better off then where we started.

⇒ Instead we select a set of functions to tune in to the signal we want.

Page 50: Using Topological Data Analysis on your BigData

This is what Ayasdi does:

f

pf−1(p)

=⇒

q

g

Data

Modulo some details....

Page 51: Using Topological Data Analysis on your BigData

This is what Ayasdi does:

f

pf−1(p)

=⇒

q

g

Data

Modulo some details....

Page 52: Using Topological Data Analysis on your BigData

Why is this useful?

⇒ We get "easy" understanding of the localizations of quantities of interest.

Page 53: Using Topological Data Analysis on your BigData

Why is this useful?

⇒ We get "easy" understanding of the localizations of quantities of interest.

Page 54: Using Topological Data Analysis on your BigData

Why is this useful?

f

g

Page 55: Using Topological Data Analysis on your BigData

Why is this useful?

f

g

Page 56: Using Topological Data Analysis on your BigData

Why is this useful?

f

g

Page 57: Using Topological Data Analysis on your BigData

Why is this useful?

f

g

Page 58: Using Topological Data Analysis on your BigData

Why is this useful?

f

g

Page 59: Using Topological Data Analysis on your BigData

Why is this useful?

f

g

Page 60: Using Topological Data Analysis on your BigData

Why is this useful?

f

g

Page 61: Using Topological Data Analysis on your BigData

Why is this useful?

f

g

Page 62: Using Topological Data Analysis on your BigData

Why is this useful?

f

g

Page 63: Using Topological Data Analysis on your BigData

Why is this useful?

I Lenses inform us where in the space to look for phenomena.I For easy localizations many different lenses will be informative.I For hard ( = geometrically distributed) localizations we have to be more

careful.

But even then, we frequently get incremental knowledge even from apoorly chosen lens.

Page 64: Using Topological Data Analysis on your BigData

Why is this useful?

I Lenses inform us where in the space to look for phenomena.

I For easy localizations many different lenses will be informative.I For hard ( = geometrically distributed) localizations we have to be more

careful.

But even then, we frequently get incremental knowledge even from apoorly chosen lens.

Page 65: Using Topological Data Analysis on your BigData

Why is this useful?

I Lenses inform us where in the space to look for phenomena.I For easy localizations many different lenses will be informative.

I For hard ( = geometrically distributed) localizations we have to be morecareful.

But even then, we frequently get incremental knowledge even from apoorly chosen lens.

Page 66: Using Topological Data Analysis on your BigData

Why is this useful?

I Lenses inform us where in the space to look for phenomena.I For easy localizations many different lenses will be informative.I For hard ( = geometrically distributed) localizations we have to be more

careful.

But even then, we frequently get incremental knowledge even from apoorly chosen lens.

Page 67: Using Topological Data Analysis on your BigData

Why is this useful?

I Lenses inform us where in the space to look for phenomena.I For easy localizations many different lenses will be informative.I For hard ( = geometrically distributed) localizations we have to be more

careful. But even then, we frequently get incremental knowledge even from apoorly chosen lens.

Page 68: Using Topological Data Analysis on your BigData

Modulo Details....

We want to move from this mathematical model to a data driven setup.

Page 69: Using Topological Data Analysis on your BigData

Step 1

I Replace points in the range with an open covering of the range.I Connect nodes when their corresponding sets intersect.

f

⇒ The output is now a graph.

Page 70: Using Topological Data Analysis on your BigData

Step 1

I Replace points in the range with an open covering of the range.

I Connect nodes when their corresponding sets intersect.

f

((

()

))

U1

U2

U3

⇒ The output is now a graph.

Page 71: Using Topological Data Analysis on your BigData

Step 1

I Replace points in the range with an open covering of the range.

I Connect nodes when their corresponding sets intersect.

f

((

()

))

U1

U2

U3

⇒ The output is now a graph.

Page 72: Using Topological Data Analysis on your BigData

Step 1

I Replace points in the range with an open covering of the range.

I Connect nodes when their corresponding sets intersect.

f

((

()

))

U1

U2

U3

⇒ The output is now a graph.

Page 73: Using Topological Data Analysis on your BigData

Step 1

I Replace points in the range with an open covering of the range.

I Connect nodes when their corresponding sets intersect.

f

((

()

))

U1

U2

U3

⇒ The output is now a graph.

Page 74: Using Topological Data Analysis on your BigData

Step 1

I Replace points in the range with an open covering of the range.

I Connect nodes when their corresponding sets intersect.

f

((

()

))

U1

U2

U3

⇒ The output is now a graph.

Page 75: Using Topological Data Analysis on your BigData

Step 1

I Replace points in the range with an open covering of the range.I Connect nodes when their corresponding sets intersect.

f

((

()

))

U1

U2

U3

⇒ The output is now a graph.

Page 76: Using Topological Data Analysis on your BigData

Step 1

I Replace points in the range with an open covering of the range.I Connect nodes when their corresponding sets intersect.

f

((

()

))

U1

U2

U3

⇒ The output is now a graph.

Page 77: Using Topological Data Analysis on your BigData

Step 1

I Replace points in the range with an open covering of the range.I Connect nodes when their corresponding sets intersect.

f

((

()

))

U1

U2

U3

⇒ The output is now a graph.

Page 78: Using Topological Data Analysis on your BigData

New Parameters

We’ve introduced new parameters into the construction:

I The resolution is the number of open sets in the range.I The gain is the amount of overlap of these intervals.

Roughly speaking, the resolution controls the number of nodes in the output andthe ’size’ of feature you can pick out, while the gain controls the number of edgesand the ’tightness’ of the graph.

Page 79: Using Topological Data Analysis on your BigData

New Parameters

We’ve introduced new parameters into the construction:I The resolution is the number of open sets in the range.

I The gain is the amount of overlap of these intervals.Roughly speaking, the resolution controls the number of nodes in the output andthe ’size’ of feature you can pick out, while the gain controls the number of edgesand the ’tightness’ of the graph.

Page 80: Using Topological Data Analysis on your BigData

New Parameters

We’ve introduced new parameters into the construction:I The resolution is the number of open sets in the range.I The gain is the amount of overlap of these intervals.

Roughly speaking, the resolution controls the number of nodes in the output andthe ’size’ of feature you can pick out, while the gain controls the number of edgesand the ’tightness’ of the graph.

Page 81: Using Topological Data Analysis on your BigData

New Parameters

We’ve introduced new parameters into the construction:I The resolution is the number of open sets in the range.I The gain is the amount of overlap of these intervals.

Roughly speaking, the resolution controls the number of nodes in the output andthe ’size’ of feature you can pick out, while the gain controls the number of edgesand the ’tightness’ of the graph.

Page 82: Using Topological Data Analysis on your BigData

Resolution: A closer look

f

()

()

()

()

U1

U2

U3

U4

Page 83: Using Topological Data Analysis on your BigData

Resolution: A closer look

f

()

()

()

()

U1

U2

U3

U4

Page 84: Using Topological Data Analysis on your BigData

Resolution: A closer look

f

()

()

()

()

U1

U2

U3

U4

()

()

Page 85: Using Topological Data Analysis on your BigData

Resolution: A closer look

f

()

()

()

()

U1

U2

U3

U4

()

()

Page 86: Using Topological Data Analysis on your BigData

Step 2: Clustering as π0

We need to make a final adjustment to the algorithm to bring it into data world.

I We replace "connected component of the inverse image" is with "clusters inthe inverse image".

I We connect clusters (nodes) with an edge if they share points in common.

Page 87: Using Topological Data Analysis on your BigData

Step 2: Clustering as π0

We need to make a final adjustment to the algorithm to bring it into data world.

I We replace "connected component of the inverse image" is with "clusters inthe inverse image".

I We connect clusters (nodes) with an edge if they share points in common.

Page 88: Using Topological Data Analysis on your BigData

Step 2: Clustering as π0

We need to make a final adjustment to the algorithm to bring it into data world.

I We replace "connected component of the inverse image" is with "clusters inthe inverse image".

I We connect clusters (nodes) with an edge if they share points in common.

Page 89: Using Topological Data Analysis on your BigData

Step 2: Clustering as π0

f

I Nodes are clusters of data pointsI Edges represent shared points between the clusters

Page 90: Using Topological Data Analysis on your BigData

Step 2: Clustering as π0

f

I Nodes are clusters of data pointsI Edges represent shared points between the clusters

Page 91: Using Topological Data Analysis on your BigData

Step 2: Clustering as π0

f

U1

I Nodes are clusters of data pointsI Edges represent shared points between the clusters

Page 92: Using Topological Data Analysis on your BigData

Step 2: Clustering as π0

f

U1

U2

I Nodes are clusters of data pointsI Edges represent shared points between the clusters

Page 93: Using Topological Data Analysis on your BigData

Step 2: Clustering as π0

f

U1

U2

I Nodes are clusters of data pointsI Edges represent shared points between the clusters

Page 94: Using Topological Data Analysis on your BigData

Step 2: Clustering as π0

f

U1

U2

I Nodes are clusters of data pointsI Edges represent shared points between the clusters

Page 95: Using Topological Data Analysis on your BigData

Step 2: Clustering as π0

f

U1

U2

I Nodes are clusters of data points

I Edges represent shared points between the clusters

Page 96: Using Topological Data Analysis on your BigData

Step 2: Clustering as π0

f

U1

U2

I Nodes are clusters of data pointsI Edges represent shared points between the clusters

Page 97: Using Topological Data Analysis on your BigData

Step 2: Clustering as π0

f

U1

U2

I Nodes are clusters of data pointsI Edges represent shared points between the clusters

Page 98: Using Topological Data Analysis on your BigData

Step 2: Clustering as π0

f

U1

U2

I Nodes are clusters of data pointsI Edges represent shared points between the clusters

Page 99: Using Topological Data Analysis on your BigData

Step 2: Clustering as π0

f

U1

U2

I Nodes are clusters of data pointsI Edges represent shared points between the clusters

Page 100: Using Topological Data Analysis on your BigData

Step 2: Clustering as π0

f

U1

U2

I Nodes are clusters of data pointsI Edges represent shared points between the clusters

Page 101: Using Topological Data Analysis on your BigData

Step 2: Clustering as π0

f

U1

U2

I Nodes are clusters of data pointsI Edges represent shared points between the clusters

Page 102: Using Topological Data Analysis on your BigData

Step 2: Clustering as π0

f

U1

U2

I Nodes are clusters of data pointsI Edges represent shared points between the clusters

Page 103: Using Topological Data Analysis on your BigData

Step 2: Clustering as π0

f

U1

U2

I Nodes are clusters of data pointsI Edges represent shared points between the clusters

Page 104: Using Topological Data Analysis on your BigData

Step 2: Clustering as π0

f

U1

U2

I Nodes are clusters of data pointsI Edges represent shared points between the clusters

Page 105: Using Topological Data Analysis on your BigData

Step 2: Clustering as π0

f

U1

U2

I Nodes are clusters of data pointsI Edges represent shared points between the clusters

Page 106: Using Topological Data Analysis on your BigData

Step 2: Clustering as π0

f

U1

U2

I Nodes are clusters of data pointsI Edges represent shared points between the clusters

Page 107: Using Topological Data Analysis on your BigData

That’s It

Ok not quite...

Page 108: Using Topological Data Analysis on your BigData

That’s It

Ok not quite...

Page 109: Using Topological Data Analysis on your BigData

Lenses: Where do they come from

The technique rests on finding good lenses.

⇒ Luckily lots of people have worked on this problem

Page 110: Using Topological Data Analysis on your BigData

Lenses: Where do they come from

The technique rests on finding good lenses.

⇒ Luckily lots of people have worked on this problem

Page 111: Using Topological Data Analysis on your BigData

Lenses: Where do they come from

I Standard data analysis functionsI Geometry and TopologyI Modern StatisticsI Domain Knowledge / Data Modeling

A Non Exhaustive Table of Lenses

Statistics Geometry Machine Learning Data DrivenMean/Max/Min Centrality PCA/SVD AgeVariance Curvature Autoencoders Datesn-Moment Harmonic Cycles Isomap/MDS/TSNE ...Density ... SVM Distance from Hyperplane... Error/Debugging Info

...

Page 112: Using Topological Data Analysis on your BigData

Lenses: Where do they come from

I Standard data analysis functions

I Geometry and TopologyI Modern StatisticsI Domain Knowledge / Data Modeling

A Non Exhaustive Table of LensesStatistics

Geometry Machine Learning Data DrivenMean/Max/Min Centrality PCA/SVD AgeVariance Curvature Autoencoders Datesn-Moment Harmonic Cycles Isomap/MDS/TSNE ...Density ... SVM Distance from Hyperplane... Error/Debugging Info

...

Page 113: Using Topological Data Analysis on your BigData

Lenses: Where do they come from

I Standard data analysis functions

I Geometry and TopologyI Modern StatisticsI Domain Knowledge / Data Modeling

A Non Exhaustive Table of LensesStatistics

Geometry Machine Learning Data Driven

Mean/Max/Min

Centrality PCA/SVD AgeVariance Curvature Autoencoders Datesn-Moment Harmonic Cycles Isomap/MDS/TSNE ...Density ... SVM Distance from Hyperplane... Error/Debugging Info

...

Page 114: Using Topological Data Analysis on your BigData

Lenses: Where do they come from

I Standard data analysis functions

I Geometry and TopologyI Modern StatisticsI Domain Knowledge / Data Modeling

A Non Exhaustive Table of LensesStatistics

Geometry Machine Learning Data Driven

Mean/Max/Min

Centrality PCA/SVD Age

Variance

Curvature Autoencoders Datesn-Moment Harmonic Cycles Isomap/MDS/TSNE ...Density ... SVM Distance from Hyperplane... Error/Debugging Info

...

Page 115: Using Topological Data Analysis on your BigData

Lenses: Where do they come from

I Standard data analysis functions

I Geometry and TopologyI Modern StatisticsI Domain Knowledge / Data Modeling

A Non Exhaustive Table of LensesStatistics

Geometry Machine Learning Data Driven

Mean/Max/Min

Centrality PCA/SVD Age

Variance

Curvature Autoencoders Dates

n-Moment

Harmonic Cycles Isomap/MDS/TSNE ...Density ... SVM Distance from Hyperplane... Error/Debugging Info

...

Page 116: Using Topological Data Analysis on your BigData

Lenses: Where do they come from

I Standard data analysis functions

I Geometry and TopologyI Modern StatisticsI Domain Knowledge / Data Modeling

A Non Exhaustive Table of LensesStatistics

Geometry Machine Learning Data Driven

Mean/Max/Min

Centrality PCA/SVD Age

Variance

Curvature Autoencoders Dates

n-Moment

Harmonic Cycles Isomap/MDS/TSNE ...

Density

... SVM Distance from Hyperplane... Error/Debugging Info

...

Page 117: Using Topological Data Analysis on your BigData

Lenses: Where do they come from

I Standard data analysis functions

I Geometry and TopologyI Modern StatisticsI Domain Knowledge / Data Modeling

A Non Exhaustive Table of LensesStatistics

Geometry Machine Learning Data Driven

Mean/Max/Min

Centrality PCA/SVD Age

Variance

Curvature Autoencoders Dates

n-Moment

Harmonic Cycles Isomap/MDS/TSNE ...

Density

... SVM Distance from Hyperplane

...

Error/Debugging Info...

Page 118: Using Topological Data Analysis on your BigData

Lenses: Where do they come from

I Standard data analysis functionsI Geometry and Topology

I Modern StatisticsI Domain Knowledge / Data Modeling

A Non Exhaustive Table of LensesStatistics Geometry

Machine Learning Data Driven

Mean/Max/Min

Centrality PCA/SVD Age

Variance

Curvature Autoencoders Dates

n-Moment

Harmonic Cycles Isomap/MDS/TSNE ...

Density

... SVM Distance from Hyperplane

...

Error/Debugging Info...

Page 119: Using Topological Data Analysis on your BigData

Lenses: Where do they come from

I Standard data analysis functionsI Geometry and Topology

I Modern StatisticsI Domain Knowledge / Data Modeling

A Non Exhaustive Table of LensesStatistics Geometry

Machine Learning Data Driven

Mean/Max/Min Centrality

PCA/SVD Age

Variance

Curvature Autoencoders Dates

n-Moment

Harmonic Cycles Isomap/MDS/TSNE ...

Density

... SVM Distance from Hyperplane

...

Error/Debugging Info...

Page 120: Using Topological Data Analysis on your BigData

Lenses: Where do they come from

I Standard data analysis functionsI Geometry and Topology

I Modern StatisticsI Domain Knowledge / Data Modeling

A Non Exhaustive Table of LensesStatistics Geometry

Machine Learning Data Driven

Mean/Max/Min Centrality

PCA/SVD Age

Variance Curvature

Autoencoders Dates

n-Moment

Harmonic Cycles Isomap/MDS/TSNE ...

Density

... SVM Distance from Hyperplane

...

Error/Debugging Info...

Page 121: Using Topological Data Analysis on your BigData

Lenses: Where do they come from

I Standard data analysis functionsI Geometry and Topology

I Modern StatisticsI Domain Knowledge / Data Modeling

A Non Exhaustive Table of LensesStatistics Geometry

Machine Learning Data Driven

Mean/Max/Min Centrality

PCA/SVD Age

Variance Curvature

Autoencoders Dates

n-Moment Harmonic Cycles

Isomap/MDS/TSNE ...

Density

... SVM Distance from Hyperplane

...

Error/Debugging Info...

Page 122: Using Topological Data Analysis on your BigData

Lenses: Where do they come from

I Standard data analysis functionsI Geometry and Topology

I Modern StatisticsI Domain Knowledge / Data Modeling

A Non Exhaustive Table of LensesStatistics Geometry

Machine Learning Data Driven

Mean/Max/Min Centrality

PCA/SVD Age

Variance Curvature

Autoencoders Dates

n-Moment Harmonic Cycles

Isomap/MDS/TSNE ...

Density ...

SVM Distance from Hyperplane

...

Error/Debugging Info...

Page 123: Using Topological Data Analysis on your BigData

Lenses: Where do they come from

I Standard data analysis functionsI Geometry and TopologyI Modern Statistics

I Domain Knowledge / Data Modeling

A Non Exhaustive Table of LensesStatistics Geometry Machine Learning

Data Driven

Mean/Max/Min Centrality

PCA/SVD Age

Variance Curvature

Autoencoders Dates

n-Moment Harmonic Cycles

Isomap/MDS/TSNE ...

Density ...

SVM Distance from Hyperplane

...

Error/Debugging Info...

Page 124: Using Topological Data Analysis on your BigData

Lenses: Where do they come from

I Standard data analysis functionsI Geometry and TopologyI Modern Statistics

I Domain Knowledge / Data Modeling

A Non Exhaustive Table of LensesStatistics Geometry Machine Learning

Data Driven

Mean/Max/Min Centrality PCA/SVD

Age

Variance Curvature

Autoencoders Dates

n-Moment Harmonic Cycles

Isomap/MDS/TSNE ...

Density ...

SVM Distance from Hyperplane

...

Error/Debugging Info...

Page 125: Using Topological Data Analysis on your BigData

Lenses: Where do they come from

I Standard data analysis functionsI Geometry and TopologyI Modern Statistics

I Domain Knowledge / Data Modeling

A Non Exhaustive Table of LensesStatistics Geometry Machine Learning

Data Driven

Mean/Max/Min Centrality PCA/SVD

Age

Variance Curvature Autoencoders

Dates

n-Moment Harmonic Cycles

Isomap/MDS/TSNE ...

Density ...

SVM Distance from Hyperplane

...

Error/Debugging Info...

Page 126: Using Topological Data Analysis on your BigData

Lenses: Where do they come from

I Standard data analysis functionsI Geometry and TopologyI Modern Statistics

I Domain Knowledge / Data Modeling

A Non Exhaustive Table of LensesStatistics Geometry Machine Learning

Data Driven

Mean/Max/Min Centrality PCA/SVD

Age

Variance Curvature Autoencoders

Dates

n-Moment Harmonic Cycles Isomap/MDS/TSNE

...

Density ...

SVM Distance from Hyperplane

...

Error/Debugging Info...

Page 127: Using Topological Data Analysis on your BigData

Lenses: Where do they come from

I Standard data analysis functionsI Geometry and TopologyI Modern Statistics

I Domain Knowledge / Data Modeling

A Non Exhaustive Table of LensesStatistics Geometry Machine Learning

Data Driven

Mean/Max/Min Centrality PCA/SVD

Age

Variance Curvature Autoencoders

Dates

n-Moment Harmonic Cycles Isomap/MDS/TSNE

...

Density ... SVM Distance from Hyperplane...

Error/Debugging Info...

Page 128: Using Topological Data Analysis on your BigData

Lenses: Where do they come from

I Standard data analysis functionsI Geometry and TopologyI Modern Statistics

I Domain Knowledge / Data Modeling

A Non Exhaustive Table of LensesStatistics Geometry Machine Learning

Data Driven

Mean/Max/Min Centrality PCA/SVD

Age

Variance Curvature Autoencoders

Dates

n-Moment Harmonic Cycles Isomap/MDS/TSNE

...

Density ... SVM Distance from Hyperplane... Error/Debugging Info

...

Page 129: Using Topological Data Analysis on your BigData

Lenses: Where do they come from

I Standard data analysis functionsI Geometry and TopologyI Modern Statistics

I Domain Knowledge / Data Modeling

A Non Exhaustive Table of LensesStatistics Geometry Machine Learning

Data Driven

Mean/Max/Min Centrality PCA/SVD

Age

Variance Curvature Autoencoders

Dates

n-Moment Harmonic Cycles Isomap/MDS/TSNE

...

Density ... SVM Distance from Hyperplane... Error/Debugging Info

...

Page 130: Using Topological Data Analysis on your BigData

Lenses: Where do they come from

I Standard data analysis functionsI Geometry and TopologyI Modern StatisticsI Domain Knowledge / Data Modeling

A Non Exhaustive Table of LensesStatistics Geometry Machine Learning Data DrivenMean/Max/Min Centrality PCA/SVD

Age

Variance Curvature Autoencoders

Dates

n-Moment Harmonic Cycles Isomap/MDS/TSNE

...

Density ... SVM Distance from Hyperplane... Error/Debugging Info

...

Page 131: Using Topological Data Analysis on your BigData

Lenses: Where do they come from

I Standard data analysis functionsI Geometry and TopologyI Modern StatisticsI Domain Knowledge / Data Modeling

A Non Exhaustive Table of LensesStatistics Geometry Machine Learning Data DrivenMean/Max/Min Centrality PCA/SVD AgeVariance Curvature Autoencoders

Dates

n-Moment Harmonic Cycles Isomap/MDS/TSNE

...

Density ... SVM Distance from Hyperplane... Error/Debugging Info

...

Page 132: Using Topological Data Analysis on your BigData

Lenses: Where do they come from

I Standard data analysis functionsI Geometry and TopologyI Modern StatisticsI Domain Knowledge / Data Modeling

A Non Exhaustive Table of LensesStatistics Geometry Machine Learning Data DrivenMean/Max/Min Centrality PCA/SVD AgeVariance Curvature Autoencoders Datesn-Moment Harmonic Cycles Isomap/MDS/TSNE

...

Density ... SVM Distance from Hyperplane... Error/Debugging Info

...

Page 133: Using Topological Data Analysis on your BigData

Lenses: Where do they come from

I Standard data analysis functionsI Geometry and TopologyI Modern StatisticsI Domain Knowledge / Data Modeling

A Non Exhaustive Table of LensesStatistics Geometry Machine Learning Data DrivenMean/Max/Min Centrality PCA/SVD AgeVariance Curvature Autoencoders Datesn-Moment Harmonic Cycles Isomap/MDS/TSNE ...Density ... SVM Distance from Hyperplane... Error/Debugging Info

...

Page 134: Using Topological Data Analysis on your BigData

Interperability and Meaning

But what about insight? meaning?

Page 135: Using Topological Data Analysis on your BigData

Interperability and Meaning

f=⇒Complex Data

The units on the lens give interperability/meaning to the topological summary.

Page 136: Using Topological Data Analysis on your BigData

Interperability and Meaning

f=⇒Complex Data f is gaussian density

The units on the lens give interperability/meaning to the topological summary.

Page 137: Using Topological Data Analysis on your BigData

Interperability and Meaning

f=⇒Complex Data

f is gaussian density

⇒ The data is bi-modal.

The units on the lens give interperability/meaning to the topological summary.

Page 138: Using Topological Data Analysis on your BigData

Interperability and Meaning

f=⇒Complex Data f is centrality

The units on the lens give interperability/meaning to the topological summary.

Page 139: Using Topological Data Analysis on your BigData

Interperability and Meaning

f=⇒Complex Data

f is centrality

⇒ The data has two ways ofbeing abnormal.

The units on the lens give interperability/meaning to the topological summary.

Page 140: Using Topological Data Analysis on your BigData

Interperability and Meaning

f=⇒Complex Data f is mean

The units on the lens give interperability/meaning to the topological summary.

Page 141: Using Topological Data Analysis on your BigData

Interperability and Meaning

f=⇒Complex Data

f is mean

⇒ Two groups of high meandata.

The units on the lens give interperability/meaning to the topological summary.

Page 142: Using Topological Data Analysis on your BigData

Interperability and Meaning

f=⇒Complex Data f is error

The units on the lens give interperability/meaning to the topological summary.

Page 143: Using Topological Data Analysis on your BigData

Interperability and Meaning

f=⇒Complex Data

f is error

⇒ Two types of error.

The units on the lens give interperability/meaning to the topological summary.

Page 144: Using Topological Data Analysis on your BigData

Interperability and Meaning

f=⇒Complex Data

f is error

⇒ Two types of error.

The units on the lens give interperability/meaning to the topological summary.

Page 145: Using Topological Data Analysis on your BigData

Interperability and Meaning

Another way to think about lenses is as a kind of ’geometric query’.

Examples

1. Heart disease study

I Stratification by age without making arbitrary cutoffs.

2. Heavy machinery

I Use mean a variance as a lens to find what operating regimes lead to failure ofmechanical components.

Page 146: Using Topological Data Analysis on your BigData

Interperability and Meaning

Another way to think about lenses is as a kind of ’geometric query’.

Examples

1. Heart disease study

I Stratification by age without making arbitrary cutoffs.

2. Heavy machinery

I Use mean a variance as a lens to find what operating regimes lead to failure ofmechanical components.

Page 147: Using Topological Data Analysis on your BigData

Interperability and Meaning

Another way to think about lenses is as a kind of ’geometric query’.

Examples

1. Heart disease studyI Stratification by age without making arbitrary cutoffs.

2. Heavy machinery

I Use mean a variance as a lens to find what operating regimes lead to failure ofmechanical components.

Page 148: Using Topological Data Analysis on your BigData

Interperability and Meaning

Another way to think about lenses is as a kind of ’geometric query’.

Examples

1. Heart disease studyI Stratification by age without making arbitrary cutoffs.

2. Heavy machinery

I Use mean a variance as a lens to find what operating regimes lead to failure ofmechanical components.

Page 149: Using Topological Data Analysis on your BigData

Interperability and Meaning

Another way to think about lenses is as a kind of ’geometric query’.

Examples

1. Heart disease studyI Stratification by age without making arbitrary cutoffs.

2. Heavy machineryI Use mean a variance as a lens to find what operating regimes lead to failure of

mechanical components.

Page 150: Using Topological Data Analysis on your BigData

Some generalizations and extensions

MetricsI We don’t need a metric, just a notion of similarity - or perhaps a clustering

mechanism.LensesI Lenses don’t need to be continuous - just "sensible".I We can use multiple lenses at the same time.I Lenses can map to space other then R.I In fact, can work with "open covers" of the space (Here taken to mean

overlapping partitions). Don’t need a lens at all.Data

I Input space can be anything with a topology. Typically we work withrow/column numeric/categorical data but, for example, graphs are ok.

OutputI The output of the algorithm isn’t just a graph but is an abstract simplicial

complex (swept under the rug in this presentation).

Page 151: Using Topological Data Analysis on your BigData

Some generalizations and extensionsMetrics

I We don’t need a metric, just a notion of similarity - or perhaps a clusteringmechanism.

LensesI Lenses don’t need to be continuous - just "sensible".I We can use multiple lenses at the same time.I Lenses can map to space other then R.I In fact, can work with "open covers" of the space (Here taken to mean

overlapping partitions). Don’t need a lens at all.Data

I Input space can be anything with a topology. Typically we work withrow/column numeric/categorical data but, for example, graphs are ok.

OutputI The output of the algorithm isn’t just a graph but is an abstract simplicial

complex (swept under the rug in this presentation).

Page 152: Using Topological Data Analysis on your BigData

Some generalizations and extensionsMetricsI We don’t need a metric, just a notion of similarity - or perhaps a clustering

mechanism.

LensesI Lenses don’t need to be continuous - just "sensible".I We can use multiple lenses at the same time.I Lenses can map to space other then R.I In fact, can work with "open covers" of the space (Here taken to mean

overlapping partitions). Don’t need a lens at all.Data

I Input space can be anything with a topology. Typically we work withrow/column numeric/categorical data but, for example, graphs are ok.

OutputI The output of the algorithm isn’t just a graph but is an abstract simplicial

complex (swept under the rug in this presentation).

Page 153: Using Topological Data Analysis on your BigData

Some generalizations and extensionsMetricsI We don’t need a metric, just a notion of similarity - or perhaps a clustering

mechanism.Lenses

I Lenses don’t need to be continuous - just "sensible".I We can use multiple lenses at the same time.I Lenses can map to space other then R.I In fact, can work with "open covers" of the space (Here taken to mean

overlapping partitions). Don’t need a lens at all.Data

I Input space can be anything with a topology. Typically we work withrow/column numeric/categorical data but, for example, graphs are ok.

OutputI The output of the algorithm isn’t just a graph but is an abstract simplicial

complex (swept under the rug in this presentation).

Page 154: Using Topological Data Analysis on your BigData

Some generalizations and extensionsMetricsI We don’t need a metric, just a notion of similarity - or perhaps a clustering

mechanism.LensesI Lenses don’t need to be continuous - just "sensible".

I We can use multiple lenses at the same time.I Lenses can map to space other then R.I In fact, can work with "open covers" of the space (Here taken to mean

overlapping partitions). Don’t need a lens at all.Data

I Input space can be anything with a topology. Typically we work withrow/column numeric/categorical data but, for example, graphs are ok.

OutputI The output of the algorithm isn’t just a graph but is an abstract simplicial

complex (swept under the rug in this presentation).

Page 155: Using Topological Data Analysis on your BigData

Some generalizations and extensionsMetricsI We don’t need a metric, just a notion of similarity - or perhaps a clustering

mechanism.LensesI Lenses don’t need to be continuous - just "sensible".I We can use multiple lenses at the same time.

I Lenses can map to space other then R.I In fact, can work with "open covers" of the space (Here taken to mean

overlapping partitions). Don’t need a lens at all.Data

I Input space can be anything with a topology. Typically we work withrow/column numeric/categorical data but, for example, graphs are ok.

OutputI The output of the algorithm isn’t just a graph but is an abstract simplicial

complex (swept under the rug in this presentation).

Page 156: Using Topological Data Analysis on your BigData

Some generalizations and extensionsMetricsI We don’t need a metric, just a notion of similarity - or perhaps a clustering

mechanism.LensesI Lenses don’t need to be continuous - just "sensible".I We can use multiple lenses at the same time.I Lenses can map to space other then R.

I In fact, can work with "open covers" of the space (Here taken to meanoverlapping partitions). Don’t need a lens at all.

DataI Input space can be anything with a topology. Typically we work with

row/column numeric/categorical data but, for example, graphs are ok.Output

I The output of the algorithm isn’t just a graph but is an abstract simplicialcomplex (swept under the rug in this presentation).

Page 157: Using Topological Data Analysis on your BigData

Some generalizations and extensionsMetricsI We don’t need a metric, just a notion of similarity - or perhaps a clustering

mechanism.LensesI Lenses don’t need to be continuous - just "sensible".I We can use multiple lenses at the same time.I Lenses can map to space other then R.I In fact, can work with "open covers" of the space (Here taken to mean

overlapping partitions). Don’t need a lens at all.

DataI Input space can be anything with a topology. Typically we work with

row/column numeric/categorical data but, for example, graphs are ok.Output

I The output of the algorithm isn’t just a graph but is an abstract simplicialcomplex (swept under the rug in this presentation).

Page 158: Using Topological Data Analysis on your BigData

Some generalizations and extensionsMetricsI We don’t need a metric, just a notion of similarity - or perhaps a clustering

mechanism.LensesI Lenses don’t need to be continuous - just "sensible".I We can use multiple lenses at the same time.I Lenses can map to space other then R.I In fact, can work with "open covers" of the space (Here taken to mean

overlapping partitions). Don’t need a lens at all.Data

I Input space can be anything with a topology. Typically we work withrow/column numeric/categorical data but, for example, graphs are ok.

OutputI The output of the algorithm isn’t just a graph but is an abstract simplicial

complex (swept under the rug in this presentation).

Page 159: Using Topological Data Analysis on your BigData

Some generalizations and extensionsMetricsI We don’t need a metric, just a notion of similarity - or perhaps a clustering

mechanism.LensesI Lenses don’t need to be continuous - just "sensible".I We can use multiple lenses at the same time.I Lenses can map to space other then R.I In fact, can work with "open covers" of the space (Here taken to mean

overlapping partitions). Don’t need a lens at all.Data

I Input space can be anything with a topology. Typically we work withrow/column numeric/categorical data but, for example, graphs are ok.

OutputI The output of the algorithm isn’t just a graph but is an abstract simplicial

complex (swept under the rug in this presentation).

Page 160: Using Topological Data Analysis on your BigData

Some generalizations and extensionsMetricsI We don’t need a metric, just a notion of similarity - or perhaps a clustering

mechanism.LensesI Lenses don’t need to be continuous - just "sensible".I We can use multiple lenses at the same time.I Lenses can map to space other then R.I In fact, can work with "open covers" of the space (Here taken to mean

overlapping partitions). Don’t need a lens at all.Data

I Input space can be anything with a topology. Typically we work withrow/column numeric/categorical data but, for example, graphs are ok.

Output

I The output of the algorithm isn’t just a graph but is an abstract simplicialcomplex (swept under the rug in this presentation).

Page 161: Using Topological Data Analysis on your BigData

Some generalizations and extensionsMetricsI We don’t need a metric, just a notion of similarity - or perhaps a clustering

mechanism.LensesI Lenses don’t need to be continuous - just "sensible".I We can use multiple lenses at the same time.I Lenses can map to space other then R.I In fact, can work with "open covers" of the space (Here taken to mean

overlapping partitions). Don’t need a lens at all.Data

I Input space can be anything with a topology. Typically we work withrow/column numeric/categorical data but, for example, graphs are ok.

OutputI The output of the algorithm isn’t just a graph but is an abstract simplicial

complex (swept under the rug in this presentation).

Page 162: Using Topological Data Analysis on your BigData

Demo

Page 163: Using Topological Data Analysis on your BigData

Online FraudFraud Score

Page 164: Using Topological Data Analysis on your BigData

Online FraudCharge Back (Ground Truth)

Page 165: Using Topological Data Analysis on your BigData

Online FraudTime On Page

Page 166: Using Topological Data Analysis on your BigData

Online FraudNo Flash

Page 167: Using Topological Data Analysis on your BigData

Online FraudNo Javascript

Page 168: Using Topological Data Analysis on your BigData

Parkinson’s Detection with Mobile Phone