Page 1: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Y. Matsubara et al. 1

FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Yasuko Matsubara, Yasushi Sakurai (Kumamoto University)

Willem G. van Panhuis (University of Pittsburgh)

Christos Faloutsos (CMU)

SIGKDD 2014

Page 2: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Motivation

Given: Large set of epidemiological data, e.g., measles cases in the U.S.

[Figure: weekly measles cases in the U.S., linear scale; annotations built up over slides 2-5: yearly periodicity; vaccine effect; shocks, e.g., 1941]

Page 6: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Motivation

Goal: summarize all the epidemic time series, “fully automatically”

Page 7: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Data description

Project Tycho: infectious diseases in the U.S.

[Figure: tensor X with axes 56 diseases, 50 states, and time (weekly, from 1888; > 125 years)]

Page 8: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Data description

Element x of X: # of cases, e.g., ‘measles’, ‘NY’, ‘April 1-7, 1931’, ‘4000’
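A minimal sketch of how such a disease x state x time tensor could be assembled, assuming the raw data arrive as (disease, state, week, cases) records. The record layout, the variable names, and the sample values other than the measles/NY example are hypothetical and are not taken from Project Tycho's actual file format.

```python
import numpy as np

# Hypothetical records: (disease, state, week index, case count).
records = [
    ("measles", "NY", 0, 4000),
    ("measles", "FL", 0, 120),
    ("influenza", "NY", 1, 950),
]

diseases = sorted({r[0] for r in records})
states = sorted({r[1] for r in records})
n_weeks = 1 + max(r[2] for r in records)

d_idx = {name: i for i, name in enumerate(diseases)}
s_idx = {name: j for j, name in enumerate(states)}

# Unreported entries stay NaN so they are not confused with zero cases.
X = np.full((len(diseases), len(states), n_weeks), np.nan)
for disease, state, week, cases in records:
    X[d_idx[disease], s_idx[state], week] = cases
```

Keeping unreported weeks as NaN rather than 0 is a design choice of this sketch; it leaves room to treat missing values separately from genuine zero counts.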

Page 9: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Problem definition

Given: Tensor X (disease x state x time)

Find: Compact description of X, “automatically”

[Figure: X -> FUNNEL -> P1, P2, P3, P4, P5, M, E; annotation: discontinuities]

NO magic numbers! Parameter-free!

Page 13: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Roadmap

- Motivation
- Modeling power of FUNNEL
- Overview – main ideas
- Proposed model – idea #1
- Algorithm – idea #2
- Experiments
- Discussion
- Conclusions

Page 14: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Modeling power of FUNNEL

Questions about epidemics: Q1, Q2, Q3, Q4, Q5

Page 15: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Questions

Q1: Are there any periodicities? If yes, when is the peak season?

Page 16: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Answers

P1: Seasonality

FUNNEL polar plot:
- Angle: peak season
- Radius: seasonality strength
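To make the polar encoding concrete, here is a hedged sketch that derives an (angle, radius) pair from a weekly series. It uses a plain first-harmonic projection at a 52-week period, which is a stand-in technique and not FUNNEL's fitted model parameters; the function name, the period of 52, and the synthetic series are assumptions.

```python
import numpy as np

def yearly_harmonic(weekly_cases, period=52.0):
    """Return (strength, peak_week) of the yearly sinusoid in a weekly series."""
    x = np.asarray(weekly_cases, dtype=float)
    t = np.arange(len(x))
    mean = x.mean()
    # Project the centred series onto one complex sinusoid with a yearly period.
    coef = 2.0 * np.mean((x - mean) * np.exp(-2j * np.pi * t / period))
    strength = np.abs(coef) / mean  # radius: amplitude relative to the mean level
    peak_week = (-np.angle(coef) * period / (2 * np.pi)) % period  # angle: peak position
    return strength, peak_week

# Synthetic series peaking in week 6 of every year (hypothetical data).
t = np.arange(52 * 10)
series = 100 + 40 * np.cos(2 * np.pi * (t - 6) / 52)
strength, peak_week = yearly_harmonic(series)
```

A series with no yearly component gives a strength near zero, so it would sit at the centre of the plot; a strongly seasonal one sits far from the centre at the angle of its peak week.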

Page 18: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Answers (P1: Seasonality)

Q: Does influenza have seasonality? If yes, when?

Page 20: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Answers (P1: Seasonality)

Influenza in Feb. (strong seasonality). Detected by FUNNEL!

Page 22: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Answers (P1: Seasonality)

Q: How about measles?

Page 24: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Answers (P1: Seasonality)

Measles (children’s) in spring. Detected!

Page 26: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Answers (P1: Seasonality)

Q: Which disease peaks in summer?

Page 28: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Answers (P1: Seasonality)

Lyme disease (tick-borne) in summer. Detected!

Page 30: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Answers (P1: Seasonality)

Q: Which disease has no periodicity?

Page 32: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Answers (P1: Seasonality)

Gonorrhea (STD): no periodicity. Detected!

Page 33: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Questions

Q2: Can we see any discontinuities?

Page 34: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Answers

P2: Disease reduction effect

Measles: vaccine licensure in 1963; reduction detected by FUNNEL in 1965. Detected!

Page 35: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Questions

Q3: What’s the difference between measles in NY and in FL?

Page 36: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Answers

P3: Area sensitivity

FUNNEL’s guess of susceptibles (measles), by state:
- CA
- TX
- NY, PA (more children)
- FL (fewer children)

Detected!

Page 37: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Questions

Q4: Are there any external shock events, like wars?

Page 38: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Answers

P4: External shock events

FUNNEL can detect external shocks “fully automatically”!

[Figure: scarlet fever, with World War II marked; shock detected by FUNNEL]

Page 39: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Questions

Q5: How can we remove mistakes and incorrect values?

[Figure annotation: TYPO!]

Page 40: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Answers

P5: Mistakes

It can also detect typos “automatically”!

[Figure: typhoid fever cases, with missing values and a mistake marked; detected]

Page 41: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Modeling power of FUNNEL

Our model can capture 5 properties:
- P1: Seasonality
- P2: Disease reductions
- P3: Area sensitivity
- P4: External events
- P5: Mistakes

Page 42: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Roadmap

- Motivation ✔
- Modeling power of FUNNEL ✔
- Overview – main ideas
- Proposed model – idea #1
- Algorithm – idea #2
- Experiments
- Discussion
- Conclusions

Page 44: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Two main ideas

Idea #1: Grey-box model

Idea #2: MDL for fitting

NO magic numbers! (parameter-free)

Page 45: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Two main ideas

Idea #1: Grey-box model - domain knowledge

(SIRS+): 6 parameters

[Figure labels: Shocks, Vaccine]

Page 46: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Two main ideas

Idea #2: Fitting with MDL -> parameter-free!

NO magic numbers

Page 47: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Roadmap

- Motivation ✔
- Modeling power of FUNNEL ✔
- Overview – main ideas ✔
- Proposed model – idea #1
- Algorithm – idea #2
- Experiments
- Discussion
- Conclusions

Page 48: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Proposed model: FUNNEL

[Figure: tensor X (diseases x states x time)]

(a) FUNNEL-single: single epidemic

(b) FUNNEL-full: multi-evolving epidemics

Page 50: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

FUNNEL – with a single epidemic

Given: “single” epidemic sequence, e.g., measles in NY

Find: nonlinear equation, model parameters

Page 51: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

FUNNEL – with a single epidemic

With a single epidemic: Funnel-RE

People of 3 classes:
- S(t): Susceptible
- I(t): Infected
- V(t): Vigilant/vaccinated

[Figure: S(t), I(t), V(t) over time; I(t) shown in linear and log scale]

[Diagram: transitions among S, I, and V]

Model parameters (Details):
- strength of infection (yearly periodic function)
- healing rate
- disease reduction effect
- temporal susceptible rate
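The slide equations did not survive in this transcript, so the following is only a sketch of a standard discrete-time S-I-V recursion consistent with the four parameter descriptions above. Every symbol and default value (`beta0`, `amp`, `shift`, `delta`, `gamma`, `theta`, `t_vaccine`, `N`) is an assumption for illustration, not the authors' notation or their exact Funnel-RE equations.

```python
import math

def simulate_siv(n_steps, N=1000.0, beta0=0.5, amp=0.3, shift=0.0,
                 delta=0.4, gamma=0.05, theta=0.0, t_vaccine=None, period=52):
    """Iterate an S-I-V recursion; returns a list of (S, I, V) tuples."""
    S, I, V = N - 1.0, 1.0, 0.0
    trajectory = []
    for t in range(n_steps):
        # Strength of infection: a yearly periodic function (assumed cosine form).
        beta = beta0 * (1.0 + amp * math.cos(2.0 * math.pi * (t + shift) / period))
        # Disease reduction (e.g., vaccination) switches on at t_vaccine.
        vac = theta if (t_vaccine is not None and t >= t_vaccine) else 0.0
        new_inf = beta * S * I / N   # S -> I
        healed = delta * I           # I -> V (healing rate)
        relapsed = gamma * V         # V -> S (temporal susceptible rate)
        vaccinated = vac * S         # S -> V (disease reduction effect)
        S = S - new_inf + relapsed - vaccinated
        I = I + new_inf - healed
        V = V + healed - relapsed + vaccinated
        trajectory.append((S, I, V))
    return trajectory

# Ten years of weekly steps, with the reduction effect starting in year six.
traj = simulate_siv(520, theta=0.05, t_vaccine=260)
```

The recursion only moves people between classes, so S + I + V stays equal to N at every step, which is a convenient sanity check when experimenting with parameter values.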

Page 57: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Proposed model: FUNNEL-full

[Figure: tensor X (diseases x states x time, length n), plus tensors M and E]

- (P1, P2): global/country
- (P3): local/state
- (P4, P5): extra - E: shocks & M: mistakes

Page 58: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Proposed model: FUNNEL-full (Details)

Global (P1, P2):
- P1: Base matrix (d x 6)
- P2: Disease reduction matrix (d x 2)

Page 59: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Proposed model: FUNNEL-full (Details)

Local (P3):
- P3: Geo-disease matrix (d x l): potential population of disease i in state j

Page 60: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Proposed model: FUNNEL-full (Details)

Extra (P4, P5):
- P4: External shock tensor E
- P5: Mistake tensor M
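As a rough illustration of how these five components fit together, the container below records the shapes listed on the slides. The class name, field names, and the use of plain lists for shocks and mistakes are hypothetical; the released FUNNEL code may organize its parameters differently.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class FunnelParams:
    """Hypothetical container; field names are not from the slides."""
    base: np.ndarray       # (d, 6) global base matrix (P1)
    reduction: np.ndarray  # (d, 2) disease reduction matrix (P2)
    geo: np.ndarray        # (d, l) potential population per disease and state (P3)
    shocks: list           # external shocks E (P4)
    mistakes: list         # mistakes M (P5)

    def check_shapes(self):
        """True if the three matrices agree on the number of diseases d."""
        d = self.base.shape[0]
        return (self.base.shape == (d, 6)
                and self.reduction.shape == (d, 2)
                and self.geo.ndim == 2
                and self.geo.shape[0] == d)

# Shapes for the dataset described earlier: d = 56 diseases, l = 50 states.
params = FunnelParams(np.zeros((56, 6)), np.zeros((56, 2)), np.zeros((56, 50)), [], [])
```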

Page 61: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Roadmap

- Motivation ✔
- Modeling power of FUNNEL ✔
- Overview – main ideas ✔
- Proposed model – idea #1 ✔
- Algorithm – idea #2
- Experiments
- Discussion
- Conclusions

Page 62: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Challenges

Q1. How to automatically
- find “external shocks”?
- ignore “mistakes” (i.e., typos)?

Idea (1): Model description cost

Q2. How to efficiently estimate model parameters?

Idea (2): Multi-layer optimization (linear)

Page 64: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Roadmap

- Motivation ✔
- Modeling power of FUNNEL ✔
- Overview – main ideas ✔
- Proposed model – idea #1 ✔
- Algorithm – idea #2 ✔
- Experiments
- Discussion
- Conclusions

Page 65: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Roadmap

- Motivation ✔
- Modeling power of FUNNEL ✔
- Overview – main ideas ✔
- Proposed model – idea #1 ✔
- Algorithm – idea #2 ✔
- Experiments ✔
- Discussion
- Conclusions

Page 67: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

(a) FUNNEL at work - forecasting

Forecasting future epidemics

- Train: first 2/3 of each sequence
- Forecast: the following 1/3 (later years)

FUNNEL can capture future epidemics (AR: fail)
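The evaluation protocol above can be sketched as follows, with a least-squares autoregressive (AR) model as the baseline. The helper names, the AR order of 2, and the synthetic series are assumptions. The slides report that AR fails on the real epidemic data; the clean sinusoid used here is easy for AR and only demonstrates the train/forecast split.

```python
import numpy as np

def fit_ar(train, order):
    """Least-squares AR(order) coefficients (intercept first) for a 1-D series."""
    y = train[order:]
    lags = np.column_stack(
        [train[order - k: len(train) - k] for k in range(1, order + 1)]
    )
    A = np.column_stack([np.ones(len(y)), lags])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def forecast_ar(train, coef, steps):
    """Roll the AR model forward, feeding predictions back in as inputs."""
    order = len(coef) - 1
    history = list(train[-order:])
    out = []
    for _ in range(steps):
        recent = history[::-1][:order]  # most recent value first
        nxt = coef[0] + float(np.dot(coef[1:], recent))
        out.append(nxt)
        history.append(nxt)
    return np.array(out)

t = np.arange(52 * 9)
series = 100 + 40 * np.cos(2 * np.pi * t / 52)  # synthetic weekly series
split = (2 * len(series)) // 3                  # train on the first 2/3
train, test = series[:split], series[split:]
coef = fit_ar(train, order=2)
pred = forecast_ar(train, coef, steps=len(test))
rmse = float(np.sqrt(np.mean((pred - test) ** 2)))
```

Swapping `series` for a real sequence with vaccine effects or shocks is where a purely linear baseline would be expected to struggle, which is the contrast the slide draws.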

Page 70: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

(b) Generality of FUNNEL

Epidemics on computer networks

FUNNEL is general: it fits computer virus data very well!

[Figure: computer virus activity over 10 years; panels: spread via email attachment, spread through corporate networks]

Page 71: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Roadmap

- Motivation ✔
- Modeling power of FUNNEL ✔
- Overview – main ideas ✔
- Proposed model – idea #1 ✔
- Algorithm – idea #2 ✔
- Experiments ✔
- Discussion ✔
- Conclusions

Page 72: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Conclusions

FUNNEL has the following advantages:
- Sense-making: captures all essential aspects (P1, P2, P3, P4, P5)
- Fully automatic: no training set
- Scalable: it scales linearly
- General: real epidemics (+ computer virus)

Page 73: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Thank you!

Data: http://www.tycho.pitt.edu/

Code: http://www.cs.kumamoto-u.ac.jp/~yasuko/software.html

Page 75: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Proposed model: FUNNEL-full (Details)

P4, P5: E vs. M. What’s the difference? External shock vs. mistake

[Figure: two giardiasis panels, one for P4 and one for P5]

Page 76: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Proposed model: FUNNEL-full (Details)

P4, P5: E vs. M. What’s the difference? External shock vs. mistake

[Figure labels: independent, influenced]

Page 77: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Proposed model: FUNNEL-full (Details)

P4: External shock tensor E, with shocks (e1), (e2), ..., (ek)

E is described by three matrices: disease matrix, time matrix, state matrix

Page 78: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Idea (1): Model description cost

Q1. How should we
- find “external shocks”?
- ignore “mistakes” (i.e., typos)?

Idea (1): Model description cost
- Minimize coding cost
- Find the “optimal” # of externals/mistakes
- “automatically”

Page 79: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Idea (1): Model description cost

Idea: Minimize encoding cost!

Good compression, good description

[Figure: Cost_M, Cost_C, and Cost_T vs. # of externals/mistakes (|E| + |M|), from 1 to 10]

$\min \left( \mathrm{Cost}_M(F) + \mathrm{Cost}_C(X \mid F) \right)$, where $\mathrm{Cost}_M(F)$ is the model cost and $\mathrm{Cost}_C(X \mid F)$ is the coding cost

Page 81: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Idea (1): Model description cost

Total cost of tensor X, given F (Details):
- Dimensions of X
- Model description cost of F
- Coding cost of X given F
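The trade-off between model cost and coding cost can be illustrated with a toy selection problem: how many extreme points of a series should be stored explicitly as "extras"? This is a simplified stand-in, not FUNNEL's actual cost function. The Gaussian coding cost, the flat `bits_per_extra` charge, and all names below are assumptions.

```python
import numpy as np

def coding_cost_bits(residuals, floor=1e-3):
    """Bits to encode residuals under a Gaussian with their own variance."""
    r = np.asarray(residuals, dtype=float)
    var = max(r.var(), floor)
    return 0.5 * len(r) * np.log2(2 * np.pi * np.e * var)

def choose_num_extras(x, max_k=10, bits_per_extra=32.0):
    """Pick the number of explicitly stored outliers that minimises total cost."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(-np.abs(x - np.median(x)))  # most extreme points first
    costs = []
    for k in range(max_k + 1):
        keep = np.ones(len(x), dtype=bool)
        keep[order[:k]] = False                    # these k points become "extras"
        residuals = x[keep] - x[keep].mean()
        cost_m = k * bits_per_extra                # model cost
        cost_c = coding_cost_bits(residuals)       # coding cost of the rest
        costs.append(cost_m + cost_c)
    return int(np.argmin(costs)), costs

rng = np.random.default_rng(0)
x = rng.normal(100.0, 1.0, size=500)
x[[50, 200, 350]] += 80.0                          # three injected spikes (synthetic)
best_k, costs = choose_num_extras(x)
```

On this synthetic series the total cost is lowest at three extras, matching the three injected spikes: each extra costs a fixed number of bits, so it pays off only when removing it shrinks the residual variance enough. No threshold on spike size has to be chosen by hand.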

Page 82: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Idea (2): Multi-layer optimization

Q2. How to efficiently estimate model parameters?

Idea (2): Multi-layer optimization

Find the “optimal” solution w.r.t.
- Global-level parameters: P1, P2, P4, P5
- Local-level parameters: P3, P4, P5

Page 85: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Idea (2): Multi-layer optimization

Multi-layer fitting algorithm, applied to X:
- Global fitting: P1, P2, P4, P5
- Local fitting: P3, P4, P5
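The alternation between a global step and a local step can be sketched with a much simpler model than FUNNEL's: each disease has one shared time profile (global) and one scale per state (local), fitted by alternating least squares. This only illustrates the two-layer alternation; FUNNEL's real steps fit nonlinear epidemic parameters, shocks, and mistakes. The function name and the model form are assumptions.

```python
import numpy as np

def fit_global_local(X, n_iter=20):
    """Alternating least squares for X[i, j, t] ~ scale[i, j] * shape[i, t]."""
    d, l, _ = X.shape
    scale = np.ones((d, l))  # local level: one value per disease and state
    shape = None             # global level: one profile per disease
    for _ in range(n_iter):
        # Global step: best shared profile given the current local scales.
        shape = np.einsum("ij,ijt->it", scale, X) / (scale ** 2).sum(axis=1, keepdims=True)
        # Local step: best per-state scale given the current global profiles.
        scale = np.einsum("it,ijt->ij", shape, X) / (shape ** 2).sum(axis=1, keepdims=True)
    return scale, shape

# Synthetic data that follows the assumed model exactly.
rng = np.random.default_rng(1)
true_scale = rng.uniform(1, 5, size=(2, 3))
true_shape = rng.uniform(1, 10, size=(2, 50))
X_demo = true_scale[:, :, None] * true_shape[:, None, :]
scale, shape = fit_global_local(X_demo)
```

Each step is a closed-form least-squares update over the whole tensor, so one pass costs time proportional to the number of entries. The sketch assumes no disease has an all-zero profile, since that would make the local update divide by zero.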

Page 86: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Experiments

We answer the following questions…

Q1. Sense-making: Can it help us understand the given epidemics?

Q2. Accuracy: How well does it match the data?

Q3. Scalability: How does it scale in terms of computational time?

Page 87: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Q1. Sense-making

Our preliminary observations:

- P1: yearly periodicity
- P2: disease reduction effects
- P3: area specificity and sensitivity
- P4: external shock events
- P5: mistakes, incorrect values

Page 88: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Q1. Sense-making: Disease seasonality (P1)

[Figure: polar plot. Radius: seasonality strength; angle: peak season]

Page 89: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Q1. Sense-making: Disease seasonality (P1)

- Influenza: in Feb.
- Children’s diseases: in spring
- Tick-borne diseases: in summer
- Gonorrhea: no periodicity

Page 90: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Q1. Sense-making: Disease reduction effect (P2)

Measles (vaccine licensure: 1963)

[Figure: measles in linear and log scale; years marked: 1967, 1969]

Page 91: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Q1. Sense-making: Area specificity and sensitivity (P3)

Potential population of susceptibles (measles), by state:
- CA
- TX
- NY, PA
- FL (fewer children)

Page 92: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Q1. Sense-making: Area specificity and sensitivity (P3)

[Figure: measles in NY and PA, linear and log scale]

Page 93: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Q1. Sense-making: External shock events (P4)

We can detect external shocks “automatically”!

[Figure: linear and log scale]

Page 94: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Q1. Sense-making: Mistakes, incorrect values (P5)

We can also detect typos “automatically”!

[Figure: linear and log scale, with missing values and a mistake marked]

Page 95: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Q2. Accuracy

Fitting accuracy for sequences (lower is better)

[Figure: fitting error on X, global and local panels]

Page 96: FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

Q3. Scalability

Wall clock time vs. # of diseases, # of states, and time (duration)

[Figure: three panels, one per dimension of X]

FunnelFit is linear w.r.t. data size: O(dln)