Survey of the Mathematics of Big Data - KSU Web...

23
Survey of the Mathematics of Big Data Philippe B. Laval KSU September 12, 2014 Philippe B. Laval (KSU) Math & Big Data September 12, 2014 1 / 23

Transcript of Survey of the Mathematics of Big Data - KSU Web...

Page 1: Survey of the Mathematics of Big Data - KSU Web Homeksuweb.kennesaw.edu/~plaval/BigData/mathsurvey_slides.pdfSurvey of the Mathematics of Big Data ... 4 Data Analysis ... knowledge

Survey of the Mathematics of Big Data

Philippe B. Laval

KSU

September 12, 2014

Philippe B. Laval (KSU) Math & Big Data September 12, 2014 1 / 23

Page 2: Survey of the Mathematics of Big Data - KSU Web Homeksuweb.kennesaw.edu/~plaval/BigData/mathsurvey_slides.pdfSurvey of the Mathematics of Big Data ... 4 Data Analysis ... knowledge

Introduction

We survey some mathematical techniques used with Big Data. The goalhere is to make you aware of these techniques rather than giving you detailabout them. That task would take several semesters for each technique.

Philippe B. Laval (KSU) Math & Big Data September 12, 2014 2 / 23

Page 3: Survey of the Mathematics of Big Data - KSU Web Homeksuweb.kennesaw.edu/~plaval/BigData/mathsurvey_slides.pdfSurvey of the Mathematics of Big Data ... 4 Data Analysis ... knowledge

Outline

1 How Big is Big Data?2 How Can Mathematics Help

Types of DataMathematics to the Rescue!

3 Acquisition of Data

Information ContentCompressed Sensing

4 Data Analysis

High Dimensional DataImaging Data

5 Conclusion

Philippe B. Laval (KSU) Math & Big Data September 12, 2014 3 / 23

Page 4: Survey of the Mathematics of Big Data - KSU Web Homeksuweb.kennesaw.edu/~plaval/BigData/mathsurvey_slides.pdfSurvey of the Mathematics of Big Data ... 4 Data Analysis ... knowledge

How Big is Big Data?

We live in a digital world, which generates a lot of data.

New technologies produce enormous amount of data.

We are gathering more data than ever, even from old technologies.

Problem: Acquisition/storage, analysis and transmission of data.

Total data generated > total storage.Increase in generation rate >> increase in communication rate.Analysis can be very complex.

Problem: Data is noisy, unstructured and dynamic.

Philippe B. Laval (KSU) Math & Big Data September 12, 2014 4 / 23

Page 5: Survey of the Mathematics of Big Data - KSU Web Homeksuweb.kennesaw.edu/~plaval/BigData/mathsurvey_slides.pdfSurvey of the Mathematics of Big Data ... 4 Data Analysis ... knowledge

How Can Mathematics Help?

One answer is given in a video from a previous lecture, rememberingthat many of the skills mathematicians have are needed inprogramming.

But also, mathematics ...

...allows us to formalize both the data and the problem.

...provides a big "chest" of tools or methodologies.

...allows validation of the methodologies (proof of functionality).

Philippe B. Laval (KSU) Math & Big Data September 12, 2014 5 / 23

Page 6: Survey of the Mathematics of Big Data - KSU Web Homeksuweb.kennesaw.edu/~plaval/BigData/mathsurvey_slides.pdfSurvey of the Mathematics of Big Data ... 4 Data Analysis ... knowledge

How Can Mathematics Help? - Types of Data

Signals. Can be represented by a function f : R→ R

Philippe B. Laval (KSU) Math & Big Data September 12, 2014 6 / 23

Page 7: Survey of the Mathematics of Big Data - KSU Web Homeksuweb.kennesaw.edu/~plaval/BigData/mathsurvey_slides.pdfSurvey of the Mathematics of Big Data ... 4 Data Analysis ... knowledge

How Can Mathematics Help? - Types of Data

Images. Can be represented by a function f : R2 → R

Philippe B. Laval (KSU) Math & Big Data September 12, 2014 7 / 23

Page 8: Survey of the Mathematics of Big Data - KSU Web Homeksuweb.kennesaw.edu/~plaval/BigData/mathsurvey_slides.pdfSurvey of the Mathematics of Big Data ... 4 Data Analysis ... knowledge

How Can Mathematics Help? - Types of Data

Data on manifolds. Can be represented by a function f : S2 → R

Philippe B. Laval (KSU) Math & Big Data September 12, 2014 8 / 23

Page 9: Survey of the Mathematics of Big Data - KSU Web Homeksuweb.kennesaw.edu/~plaval/BigData/mathsurvey_slides.pdfSurvey of the Mathematics of Big Data ... 4 Data Analysis ... knowledge

Acquisition of Data

As noted above, the main problem is the size of the data.However, this cannot be solved by increasing storage capacity.

Possible solutions:

Search for better ways to compress.Acquire less data.

Philippe B. Laval (KSU) Math & Big Data September 12, 2014 9 / 23

Page 10: Survey of the Mathematics of Big Data - KSU Web Homeksuweb.kennesaw.edu/~plaval/BigData/mathsurvey_slides.pdfSurvey of the Mathematics of Big Data ... 4 Data Analysis ... knowledge

Acquisition of Data - Better Ways to Compress

Data Compression saves storage but also decreases bandwidth.

General ideas behind compression:

Take advantage of redundant information.Only keep the most relevant information.

Einstein: "Not everything that can be counted counts, and noteverything that counts can be counted."

Philippe B. Laval (KSU) Math & Big Data September 12, 2014 10 / 23

Page 11: Survey of the Mathematics of Big Data - KSU Web Homeksuweb.kennesaw.edu/~plaval/BigData/mathsurvey_slides.pdfSurvey of the Mathematics of Big Data ... 4 Data Analysis ... knowledge

Acquisition of Data - Better Ways to Compress

The standard for data compression was JPEG. It is based on thediscrete cosine transform (DCT) (see Fourier analysis, Fouriertransform, fast Fourier transform (FFT) and the discrete fast Fouriertransform (DFFT). It expresses the image in terms of building blocks,the functions cos

[πn

(i + 1

2

)k].

DefinitionThe DCT is a linear, invertible function f : Rn → Rn. It transforms thenumbers (x0, x1, ..., xn−1) into (X0,X1, ...,Xn−1) by the formula

Xk =n−1∑i=0

xi cos[π

n

(i +

12

)k]for k = 0, 1, ..., n − 1

Philippe B. Laval (KSU) Math & Big Data September 12, 2014 11 / 23

Page 12: Survey of the Mathematics of Big Data - KSU Web Homeksuweb.kennesaw.edu/~plaval/BigData/mathsurvey_slides.pdfSurvey of the Mathematics of Big Data ... 4 Data Analysis ... knowledge

Acquisition of Data - Better Ways to Compress

The latest standard for data compression is JPEG2000. It is similarto JPEG, but it uses wavelets instead of the discrete cosinetransform.

The branch of mathematics which studies wavelets is calledHarmonic Analysis.Like the DCT, wavelets decompose the image in terms of buildingblocks. But the building blocks are not limited to the cosine function.

Both the DCT and wavelets use the divide and conquer method.

Decompose an image in terms of its building blocks.Only keep the most relevant building blocks.

Philippe B. Laval (KSU) Math & Big Data September 12, 2014 12 / 23

Page 13: Survey of the Mathematics of Big Data - KSU Web Homeksuweb.kennesaw.edu/~plaval/BigData/mathsurvey_slides.pdfSurvey of the Mathematics of Big Data ... 4 Data Analysis ... knowledge

Acquisition of Data - Better Ways to Compress

Problem: The raw data still has to be stored firstSolution: Compressed sensing (Donoho, Candès, Romberg, Tao (2006)

Philippe B. Laval (KSU) Math & Big Data September 12, 2014 13 / 23

Page 14: Survey of the Mathematics of Big Data - KSU Web Homeksuweb.kennesaw.edu/~plaval/BigData/mathsurvey_slides.pdfSurvey of the Mathematics of Big Data ... 4 Data Analysis ... knowledge

Acquisition of Data - Compressed Sensing

Main idea:

Assumption: Only a small percentage of the data to be captured isreally relevant (sparse data).

Problem: We don’t know which.Solution: Use random building blocks.

This is pushing the sampling theorem beyond its limits!

Goal:

Goal: To capture a signal with as few points as possible. Thus theraw data will be already compressed.

Requires a lot of linear algebra, solving sparse systems.

Philippe B. Laval (KSU) Math & Big Data September 12, 2014 14 / 23

Page 15: Survey of the Mathematics of Big Data - KSU Web Homeksuweb.kennesaw.edu/~plaval/BigData/mathsurvey_slides.pdfSurvey of the Mathematics of Big Data ... 4 Data Analysis ... knowledge

Applications of Compressed Sensing

Business

Compression

Astronomy

Radar

Biology

Communications

Imaging Science

Geology

Medicine

Information theory

Remote sensing

Optics

Philippe B. Laval (KSU) Math & Big Data September 12, 2014 15 / 23

Page 16: Survey of the Mathematics of Big Data - KSU Web Homeksuweb.kennesaw.edu/~plaval/BigData/mathsurvey_slides.pdfSurvey of the Mathematics of Big Data ... 4 Data Analysis ... knowledge

Data Analysis

There are many techniques, statistical and mathematical whichalready exist.

Problem:

The analysis maybe be too complex for current computing power.Data can be noisy, unstructured and dynamic.

However, these problems cannot be simply solved by more computingpower.

Possible solutions include:

Graph Theory.TDA (Topological Data Analysis).

Philippe B. Laval (KSU) Math & Big Data September 12, 2014 16 / 23

Page 17: Survey of the Mathematics of Big Data - KSU Web Homeksuweb.kennesaw.edu/~plaval/BigData/mathsurvey_slides.pdfSurvey of the Mathematics of Big Data ... 4 Data Analysis ... knowledge

Data Analysis - High-Dimensional Data

Strategy:

Detection of inner structure.

Reduction of dimension

Examples:

A circle can be approximated bylines.

A sphere can be approximatedby flat surfaces.

Philippe B. Laval (KSU) Math & Big Data September 12, 2014 17 / 23

Page 18: Survey of the Mathematics of Big Data - KSU Web Homeksuweb.kennesaw.edu/~plaval/BigData/mathsurvey_slides.pdfSurvey of the Mathematics of Big Data ... 4 Data Analysis ... knowledge

Data Analysis - High-Dimensional Data

Philippe B. Laval (KSU) Math & Big Data September 12, 2014 18 / 23

Page 19: Survey of the Mathematics of Big Data - KSU Web Homeksuweb.kennesaw.edu/~plaval/BigData/mathsurvey_slides.pdfSurvey of the Mathematics of Big Data ... 4 Data Analysis ... knowledge

Data Analysis - High-Dimensional Data

Question: How do we identify inner structure?

By using topology. The link will open a website. Scroll down to playthe video.

Philippe B. Laval (KSU) Math & Big Data September 12, 2014 19 / 23

Page 20: Survey of the Mathematics of Big Data - KSU Web Homeksuweb.kennesaw.edu/~plaval/BigData/mathsurvey_slides.pdfSurvey of the Mathematics of Big Data ... 4 Data Analysis ... knowledge

Data Analysis - Imaging Data

Tasks:

Removing Noise.

Pattern recognition.

Reconstructing missing data.

...

Tools:

Statistics.

Fourier and harmonic analysis.

Partial differential equations.

Compressed sensing.

Linear Algebra.

Philippe B. Laval (KSU) Math & Big Data September 12, 2014 20 / 23

Page 21: Survey of the Mathematics of Big Data - KSU Web Homeksuweb.kennesaw.edu/~plaval/BigData/mathsurvey_slides.pdfSurvey of the Mathematics of Big Data ... 4 Data Analysis ... knowledge

Conclusion

Big Data is here to stay. We need to tame it!

The techniques presented are fairly diffi cult and require a lotknowledge in various branches of mathematics, statistics and science.

Not everybody needs to be a mathematician. But chances are youwill work with one. Knowing at least some mathematics will make thecommunication and the teamwork easier.

Questions?

Philippe B. Laval (KSU) Math & Big Data September 12, 2014 21 / 23

Page 22: Survey of the Mathematics of Big Data - KSU Web Homeksuweb.kennesaw.edu/~plaval/BigData/mathsurvey_slides.pdfSurvey of the Mathematics of Big Data ... 4 Data Analysis ... knowledge

Some Questions

1 Name some new technologies which generate data which were notmentioned during the talk.

2 Give examples of unstructured data not mentioned during the talk.3 Why is removing noise from data (images as well as other data)important?

4 Give applications of pattern recognition in images.

Philippe B. Laval (KSU) Math & Big Data September 12, 2014 22 / 23

Page 23: Survey of the Mathematics of Big Data - KSU Web Homeksuweb.kennesaw.edu/~plaval/BigData/mathsurvey_slides.pdfSurvey of the Mathematics of Big Data ... 4 Data Analysis ... knowledge

Online Articles to Read

1 The Mathematical Shape of Things to Come from Quanta Magazine.

2 The Real Secret to Unlocking Big Data? Math.

3 The New Shape of Big Data.

4 New Mathematical Method May Help Tame Big Data fromCommunications of the ACM.

5 Better Way to Make Sense of Big Data from Science Daily.

Philippe B. Laval (KSU) Math & Big Data September 12, 2014 23 / 23