Data, Signals, and Systems Data as a Signal and Correlation · 2020. 7. 9. · are signals in time...

27
Data as a Signal and Correlation Data, Signals, and Systems Ikhlaq Sidhu Chief Scientist & Founding Director, Sutardja Center for Entrepreneurship & Technology IEOR Emerging Area Professor Award, UC Berkeley

Transcript of Data, Signals, and Systems Data as a Signal and Correlation · 2020. 7. 9. · are signals in time...

Page 1: Data, Signals, and Systems Data as a Signal and Correlation · 2020. 7. 9. · are signals in time • Stock market • Temperature • Instrument readings Continuous signals x(t)

Data as a Signal and Correlation Data, Signals, and Systems

Ikhlaq SidhuChief Scientist & Founding Director,Sutardja Center for Entrepreneurship & TechnologyIEOR Emerging Area Professor Award, UC Berkeley

Page 2: Data, Signals, and Systems Data as a Signal and Correlation · 2020. 7. 9. · are signals in time • Stock market • Temperature • Instrument readings Continuous signals x(t)

Converting From Time Sequence Data to Features

Of course, not all data has a time property, but lets start with this type.For example( key1, value 1),( key 2, value 2)… in this case, the keys are indexed by time.

Page 3: Data, Signals, and Systems Data as a Signal and Correlation · 2020. 7. 9. · are signals in time • Stock market • Temperature • Instrument readings Continuous signals x(t)

Converting From Time Sequence Data to Features

Many Types of data are signals in time

• Stock market• Temperature• Instrument readings

Continuous signals x(t) Sampled signals (data) x(nT)

Sometimes we sample them, record at intervals of T

We get a list in a table, array, or vector

Discrete data xn = x1, x2, x3, …

(might lose time reference)

What we want (for now): features and characteristics

For example:

• Means

• Variances

• Patten matches

• Changes

• accumulation

• Frequency

Page 4: Data, Signals, and Systems Data as a Signal and Correlation · 2020. 7. 9. · are signals in time • Stock market • Temperature • Instrument readings Continuous signals x(t)

What is the Correlation of the table?

A B C D

Leads to question:

What does it mean for one row to be similar to another?

Is what is the Correlation (A, B)

Correlation Matrix: Or how is every column related to every other column:

AA AB AC ADBA BB BC BDCA CB CC CDDA DB BC DD

Page 5: Data, Signals, and Systems Data as a Signal and Correlation · 2020. 7. 9. · are signals in time • Stock market • Temperature • Instrument readings Continuous signals x(t)

Correlation and Correlation Matrices

Page 6: Data, Signals, and Systems Data as a Signal and Correlation · 2020. 7. 9. · are signals in time • Stock market • Temperature • Instrument readings Continuous signals x(t)

Correlation and Covariance:

A practical example

Data Table

X Y

------------------

5 7

8 10

14 7

15 12

…. …

… …

… …

… …

… …

Page 7: Data, Signals, and Systems Data as a Signal and Correlation · 2020. 7. 9. · are signals in time • Stock market • Temperature • Instrument readings Continuous signals x(t)

Correlation and Covariance:

A practical example

Data Table

X Y

------------------

5 7

8 10

14 7

15 12

…. …

… …

… …

Page 8: Data, Signals, and Systems Data as a Signal and Correlation · 2020. 7. 9. · are signals in time • Stock market • Temperature • Instrument readings Continuous signals x(t)

Example:

What is the correlation of X,Y?

How Do we Find It?

X Y

1 0.5

2 2

3 3.5

Page 9: Data, Signals, and Systems Data as a Signal and Correlation · 2020. 7. 9. · are signals in time • Stock market • Temperature • Instrument readings Continuous signals x(t)

Example:

What is the correlation of X,Y?

How Do we Find It?

X Y

1 0.5

2 2

3 3.5

Page 10: Data, Signals, and Systems Data as a Signal and Correlation · 2020. 7. 9. · are signals in time • Stock market • Temperature • Instrument readings Continuous signals x(t)

Example:

What is the correlation of X,Y?

How Do we Find It?

X Y

1 0.5

2 2

3 3.5

Mean 2.00 2.00

Standard Deviation 1 1.5

Variance 1 2.25

Page 11: Data, Signals, and Systems Data as a Signal and Correlation · 2020. 7. 9. · are signals in time • Stock market • Temperature • Instrument readings Continuous signals x(t)

Example: What is the correlation of X,Y?

One way to do it:

ExampleX Y X*Y E[X]E[Y]1 0.5 0.5 42 2 4 43 3.5 10.5 4

mean 2.00 2.00 5.00 4.00stdev 1 1.5var 1 2.25

E[XY] - E[X]E[Y] stdev(X) * stdev(Y)

= E[XY] - E[X]E[Y] / 1.5

= 5 – 4 / 1.5

= 1 / 1.5 = .67

Corr (X,Y)=

Page 12: Data, Signals, and Systems Data as a Signal and Correlation · 2020. 7. 9. · are signals in time • Stock market • Temperature • Instrument readings Continuous signals x(t)

Example: What is the correlation of X,Y

The other way to do it

X Y X-ux Y-uy (X-ux)(Y-uy)

1 0.5 -1 -1.5 1.5

2 2 0 0 0

3 3.5 1 1.5 1.5

mean 2.00 2.00 0.00 0.00 1.00

st.dev 1 1.5

Cor(X,Y) = E[(X-ux)(Y-uy)] / 1.5

[(1-2)(0.5-2) + (2-2)(2-2) + (3-2)(3.5-2)]/3= 1.5

= 1.5 + 0 + 1*1.5 / (3* 1.5) = 1/1.5 = .67

Page 13: Data, Signals, and Systems Data as a Signal and Correlation · 2020. 7. 9. · are signals in time • Stock market • Temperature • Instrument readings Continuous signals x(t)

Correlation Matrix

x1

x2

xn

Many samples, or a sequence in timex1(t), x2(t), .. xn(t)

To estimate from data:a) Use all samples ever collectedb) Use window size of W samples of each to estimate a recent Corr Matrix

Window WMultiple Sources

Table

x1 x2 … xnSamples123n..N+W

Samples from Window of W

Page 14: Data, Signals, and Systems Data as a Signal and Correlation · 2020. 7. 9. · are signals in time • Stock market • Temperature • Instrument readings Continuous signals x(t)

Correlation Matrix

Corr(x1,x1) …….. Corr(x1,xn)

Corr(xn,x1) Corr(xn,xn)

To estimate from data:a) Use all samples ever collectedb) Use window size of W samples of each to estimate recent Corr Matrix

Corr(x1,x1) = 1

Corr(x2, x1) = a number from -1 to 1x1

x2

xn

Window WMultiple Sources

Page 15: Data, Signals, and Systems Data as a Signal and Correlation · 2020. 7. 9. · are signals in time • Stock market • Temperature • Instrument readings Continuous signals x(t)

Code Example:

Correlations of Rows with Numpy

• Here each row is a vector of length 5

• There are 4 vectors

• Correlation matrix is 4 x 4

Import numpy as np

# ignore line formattingx = np.array( [[0.1, .32, .2, 0.4, 0.8],

[.23, .18, .56, .61, .12], [.9, .3, .6, .5, .3], [.34, .75, .91, .19, .21]])

np.corrcoef(x)Out[4]: array([[ 1. , -0.35153114, -0.74736506, -0.48917666], [-0.35153114, 1. , 0.23810227, 0.15958285], [-0.74736506, 0.23810227, 1. , -0.03960706], [-0.48917666, 0.15958285, -0.03960706, 1. ]])

• If you want the correlation of the columns, just use transpose

• np.corrcoef ( np.transpose(x) )

• For a window, use a slice:• window = x[0:4,3:5] for the last • two columns

Page 16: Data, Signals, and Systems Data as a Signal and Correlation · 2020. 7. 9. · are signals in time • Stock market • Temperature • Instrument readings Continuous signals x(t)

Correlation of Featuresfrom Different Sources

Pandas TableUse corr() method

dataframe.corr()

Page 17: Data, Signals, and Systems Data as a Signal and Correlation · 2020. 7. 9. · are signals in time • Stock market • Temperature • Instrument readings Continuous signals x(t)

Correlation Types: Pearson, Kendal, Spearman

Pearson: Understanding Correlation in a different way

X • Y = |X| |Y| Cos ΘUse n dimensions

Data Table

X Y---------------5 78 10

14 715 12

… …… …… …… …… …

f2

f1

X Y

Page 18: Data, Signals, and Systems Data as a Signal and Correlation · 2020. 7. 9. · are signals in time • Stock market • Temperature • Instrument readings Continuous signals x(t)

Pandas will create a correlation matrix with “columns”

Page 19: Data, Signals, and Systems Data as a Signal and Correlation · 2020. 7. 9. · are signals in time • Stock market • Temperature • Instrument readings Continuous signals x(t)

Kendall Correlation

List of (x,y) points

No X Y1 2 32 4 63 3 84 9 12

N(N -1) / 2 pairs of x,y points

• Concordant pairs: for

• Disconcordant pairs: when the above is not true

or

N points

Page 20: Data, Signals, and Systems Data as a Signal and Correlation · 2020. 7. 9. · are signals in time • Stock market • Temperature • Instrument readings Continuous signals x(t)

Spearman CorrelationData (x=IQ,y=TV)

Order rows by X and Index X and Y in increasing order

x y rgx rgy

Then find Pearson Correlationof (rgx,rgy)

Page 21: Data, Signals, and Systems Data as a Signal and Correlation · 2020. 7. 9. · are signals in time • Stock market • Temperature • Instrument readings Continuous signals x(t)

Correlation with Time Series Data Sources

Page 22: Data, Signals, and Systems Data as a Signal and Correlation · 2020. 7. 9. · are signals in time • Stock market • Temperature • Instrument readings Continuous signals x(t)

Correlation Matrix with multiple sources and time segments

Suppose this is x1 as an array of numbers 0 0 1 1..1 0 0 0

Suppose this is x2

What is np.corr(x1,x2[n:n+w])?

0

1

1

100

Page 23: Data, Signals, and Systems Data as a Signal and Correlation · 2020. 7. 9. · are signals in time • Stock market • Temperature • Instrument readings Continuous signals x(t)

Approaches to the Data Sequences from Multiple Sources in Tables

Discrete data for each source A, B, C.. xn = x1, x2, x3, …

A

D

.

.

One row for each source A, B, C, D..

Eg Numpy arrays

A: x1, x2, x3, …B: x1, x2, x3, …C: x1, x2, x3, …D: x1, x2, x3, …

One column for each series: A , B, C, D..

eg Pandas(add rows with every new time sample)

Ax1x2x3..

Bx1x2x3..

Cx1x2x3..

123..

Time ->

Page 24: Data, Signals, and Systems Data as a Signal and Correlation · 2020. 7. 9. · are signals in time • Stock market • Temperature • Instrument readings Continuous signals x(t)

Discrete data for each source A, B, C.. xn = x1, x2, x3, …

A

D

.

.

One row for each source A, B, C, D..

Eg Numpy arrays

A: x1, x2, x3, …B: x1, x2, x3, …C: x1, x2, x3, …D: x1, x2, x3, …

One column for each series: A , B, C, D..

eg Pandas(add rows with every new time sample)

Ax1x2x3..

Bx1x2x3..

Cx1x2x3..

123..

Time ->

Approaches to the Data Sequences in Tables

Preprocess:Filter/

ConvolutionCorrelationMean, var,

stats

Windows of time

If ML:Table of (X,Y)

forPrediction

Classification

Data Input and Storage Preprocessing ML for Decisions

Features

Page 25: Data, Signals, and Systems Data as a Signal and Correlation · 2020. 7. 9. · are signals in time • Stock market • Temperature • Instrument readings Continuous signals x(t)

Discrete data for each source A, B, C.. xn = x1, x2, x3, …

A

D

.

.

One row for each source A, B, C, D..

Eg Numpy arrays

A: x1, x2, x3, …B: x1, x2, x3, …C: x1, x2, x3, …D: x1, x2, x3, …

One column for each series: A , B, C, D..

eg Pandas(add rows with every new time sample)

Ax1x2x3..

Bx1x2x3..

Cx1x2x3..

123..

Time ->

Approaches to the Data Sequences in Tables

Data Input and Storage Example: pre-processed statistics canbe used for in ML predictions

LabeledCor(w/A) Cor(w/B) St.Dev. Cor(w/A[n-20]) Condition

A # # # # DangerB # # # # SafeC # # # # SafeD # # # # Warning

# = number obtained from preprocessing

Page 26: Data, Signals, and Systems Data as a Signal and Correlation · 2020. 7. 9. · are signals in time • Stock market • Temperature • Instrument readings Continuous signals x(t)

A High-Level Framework

Objects

Events/Experiments

People/Customers

Products

Stocks

In Real Life

Features, but also loss of information

In Sample

Out of Sample

Person 1

Person 2

Person 3

.

.

.

Person N

CharacteristicsPatterns Models

Predictions SimilaritiesDifferencesDistance

Some data has observed results

Page 27: Data, Signals, and Systems Data as a Signal and Correlation · 2020. 7. 9. · are signals in time • Stock market • Temperature • Instrument readings Continuous signals x(t)

END OF SECTION