Thank you!

Large Graph Mining. Data Mining for fun (and profit). C. Faloutsos, CMU.

Transcript of Thank you!

Page 1: Thank you!

CMU SCS

Thank you!

C. Faloutsos

CMU

Page 2: Thank you!

CMU SCS

Large Graph Mining

C. Faloutsos

CMU

Page 3: Thank you!

CMU SCS

C. Faloutsos

CMU

Page 4: Thank you!

CMU SCS

KDD'10 C. Faloutsos 4

Outline

• Credit where credit is due

• Technical part – Data mining
  – Can it be automated?
  – Research challenges

• Non-technical part: ‘Listen’
  – To the data
  – To non-experts

Page 5: Thank you!

CMU SCS

Nominator

• Jian Pei

KDD'10 C. Faloutsos 5

Page 6: Thank you!

CMU SCS

Endorsers

• Charu C. Aggarwal (IBM Research)
• Ricardo Baeza-Yates (Yahoo! Research)
• Albert-Laszlo Barabasi (Northeastern University)
• Denilson Barbosa (University of Alberta)
• Yixin Chen (Washington University at St. Louis)

KDD'10 C. Faloutsos 6

Page 7: Thank you!

CMU SCS

Endorsers, cont’d

• William Cohen (Carnegie Mellon University)
• Diane J. Cook (Washington State University)
• Gautam Das (University of Texas at Arlington)
• Inderjit S. Dhillon (University of Texas at Austin)
• Chris H. Q. Ding (University of Texas at Arlington)

KDD'10 C. Faloutsos 7

Page 8: Thank you!

CMU SCS

Endorsers, cont’d

• Petros Drineas (Rensselaer Polytechnic Institute)
• Tina Eliassi-Rad (Lawrence Livermore National Laboratory)
• Greg Ganger (Carnegie Mellon University)
• Minos Garofalakis (Technical University of Crete)
• James Garrett (Carnegie Mellon University)

KDD'10 C. Faloutsos 8

Page 9: Thank you!

CMU SCS

Endorsers, cont’d

• Dimitrios Gunopulos (University of Athens)
• Xiaofei He (Zhejiang University)
• Panagiotis G. Ipeirotis (New York University)
• Eamonn Keogh (UCR)
• Hiroyuki Kitagawa (University of Tsukuba)
• Tamara Kolda (Sandia Nat. Labs)

KDD'10 C. Faloutsos 9

Page 10: Thank you!

CMU SCS

Endorsers, cont’d

• Flip Korn (AT&T Research)
• Nick Koudas (University of Toronto)
• Hans-Peter Kriegel
• Ravi Kumar (Yahoo! Research)
• Laks Lakshmanan (UBC)
• Jure Leskovec (Stanford University)

KDD'10 C. Faloutsos 10

Page 11: Thank you!

CMU SCS

Endorsers, cont’d

• Nikos Mamoulis (Hong Kong University)
• Heikki Mannila (Aalto University)
• Dharmendra S. Modha (IBM Research)
• Mario Nascimento (University of Alberta)
• Jennifer Neville (Purdue University)
• Beng Chin Ooi (National University of Singapore)

KDD'10 C. Faloutsos 11

Page 12: Thank you!

CMU SCS

Endorsers, cont’d

• Dimitris Papadias (Hong Kong University of Science and Technology)
• Spiros Papadimitriou (IBM Research)
• Jian Pei (Simon Fraser University)
• Foster Provost (New York University)
• Oliver Schulte (Simon Fraser University)
• Dennis Shasha (New York University)
• Srinivasan Parthasarathy (OSU)

KDD'10 C. Faloutsos 12

Page 13: Thank you!

CMU SCS

Endorsers, cont’d

• Jimeng Sun (IBM Research)
• Dacheng Tao (Nanyang Technological University)
• Yufei Tao (The Chinese University of Hong Kong)
• Evimaria Terzi (Boston University)
• Alex Thomo (University of Victoria)
• Andrew Tomkins (Google Research)

KDD'10 C. Faloutsos 13

Page 14: Thank you!

CMU SCS

Endorsers, cont’d

• Caetano Traina (University of Sao Paulo)
• Vassilis Tsotras (University of California, Riverside)
• Alex Tuzhilin (New York University)
• Haixun Wang (Microsoft Research)

KDD'10 C. Faloutsos 14

Page 15: Thank you!

CMU SCS

Endorsers, cont’d

• Wei Wang (University of North Carolina at Chapel Hill)
• Philip S. Yu (University of Illinois, Chicago)
• Zhongfei Zhang (Binghamton University, State University of New York)

KDD'10 C. Faloutsos 15

Page 16: Thank you!

CMU SCS

KDD committee

• Ramasamy Uthurusamy, Chair

• Robert Grossman (University of Illinois at Chicago)

• Jiawei Han (University of Illinois at Urbana-Champaign)

• Tom Mitchell (Carnegie Mellon University)

• Gregory Piatetsky-Shapiro (KDnuggets)

KDD'10 C. Faloutsos 16

Page 17: Thank you!

CMU SCS

KDD committee, cont’d

• Raghu Ramakrishnan (Yahoo! Research)

• Sunita Sarawagi (Indian Institute of Technology, Bombay)

• Padhraic Smyth (University of California at Irvine)

• Ramakrishnan Srikant (Google Research)

KDD'10 C. Faloutsos 17

Page 18: Thank you!

CMU SCS

KDD committee, cont’d

• Xindong Wu (University of Vermont)

• Mohammed J. Zaki (Rensselaer Polytechnic Institute)

KDD'10 C. Faloutsos 18

Page 19: Thank you!

CMU SCS

Family

• Parents Nikos & Sophia

• Siblings Michalis*, Petros*, Maria

• Wife Christina#

(*): and co-authors
(#): and research impact evaluator (‘grandpa’ test - see later…)

KDD'10 C. Faloutsos 19

Page 20: Thank you!

CMU SCS

KDD'10 C. Faloutsos 20

Academic ‘parents’

• Christodoulakis, Stavros (T.U.C.)

• Sevcik, Ken (U of T)

• Roussopoulos, Nick (UMD)

Page 21: Thank you!

CMU SCS

KDD'10 C. Faloutsos 21

Academic ‘children’

• King-Ip (David) Lin
• Ibrahim Kamel
• Flip Korn
• Byoung-Kee Yi
• Leejay Wu
• Deepayan Chakrabarti

Page 22: Thank you!

CMU SCS

KDD'10 C. Faloutsos 22

Academic ‘children’

• Jia-Yu (Tim) Pan

• Spiros Papadimitriou

• Jimeng Sun

• Jure Leskovec

• Hanghang Tong

Page 23: Thank you!

CMU SCS

KDD'10 C. Faloutsos 23

Academic ‘children’

• Mary McGlohon
• Fan Guo
• Lei Li
• Leman Akoglu
• Duen Horng (Polo) Chau
• Aditya Prakash
• U Kang

Page 24: Thank you!

CMU SCS

KDD'10 C. Faloutsos 24

CMU colleagues

• Tom Mitchell
• Garth Gibson
• Greg Ganger
• M. (Satya) Satyanarayanan
• Howard Wactlar
• Jeannette Wing
• + +

Page 25: Thank you!

CMU SCS

KDD'10 C. Faloutsos 25

Co-authors

• [dblp 7/2010:] All 300 of you

• Agma J. M. Traina (22)
• Caetano Traina Jr. (20)
• …

Page 26: Thank you!

CMU SCS

Funding agencies

• NSF (Maria Zemankova, Frank Olken, ++)

• DARPA, LLNL, PITA

• IBM, MS, HP, INTEL, Y!, Google, Symantec, Sony, Fujitsu, …

KDD'10 C. Faloutsos 26

Page 27: Thank you!

CMU SCS

KDD'10 C. Faloutsos 27

Outline

• Credit where credit is due

• Technical part – Data mining
  – Can it be automated?
  – Research challenges

• Non-technical part: ‘Listen’
  – To the data
  – To non-experts

Page 28: Thank you!

CMU SCS

Data mining = compression & …

KDD'10 C. Faloutsos 28

Christos Faloutsos, Vasileios Megalooikonomou: On data mining, compression, and Kolmogorov complexity. Data Min. Knowl. Discov. 15(1): 3-20 (2007)

Page 29: Thank you!

CMU SCS

Data mining = compression & …

KDD'10 C. Faloutsos 29

Christos Faloutsos, Vasileios Megalooikonomou: On data mining, compression, and Kolmogorov complexity. Data Min. Knowl. Discov. 15(1): 3-20 (2007)

Page 30: Thank you!

CMU SCS

Data mining = compression & …

KDD'10 C. Faloutsos 30

But: how can compression
• do forecasting?
• spot outliers?
• do classification?

Page 31: Thank you!

CMU SCS

Data mining = compression & …

KDD'10 C. Faloutsos 31

OK – then, isn’t compression a solved problem (gzip, LZ)?

Page 32: Thank you!

CMU SCS

… compression is undecidable!

Theorem*: for an arbitrary string x, computing its Kolmogorov complexity K(x) is undecidable

KDD'10 C. Faloutsos 32

(*) E.g., [T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley and Sons, 1991, section 7.7]

A.N. Kolmogorov

EVEN WORSE than NP-hard!
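One way to make the theorem concrete: any real compressor gives only an upper bound on K(x), never K(x) itself. The sketch below is an illustration added here, not code from the talk. It uses Python's standard `zlib` module as a stand-in for "gzip, LZ"; the function names and the two test strings are assumptions chosen for the example.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib (LZ77 + Huffman) encoding of `data`.

    Up to an additive constant, this is a computable UPPER bound on the
    Kolmogorov complexity of `data`; the true value is not computable.
    """
    return len(zlib.compress(data, 9))


def regular_string(n: int) -> bytes:
    """A highly regular input: 'ab' repeated up to n bytes (illustrative)."""
    return (b"ab" * (n // 2 + 1))[:n]


def noisy_string(n: int, seed: int = 0) -> bytes:
    """Pseudo-random bytes from a seeded generator (illustrative)."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))


if __name__ == "__main__":
    n = 10_000
    print("regular:", compressed_size(regular_string(n)), "bytes")
    print("noisy:  ", compressed_size(noisy_string(n)), "bytes")
```

Note that the "noisy" string is produced by a very short program (a seed plus a generator), so its true complexity is small; zlib simply lacks the right model to find that pattern. A better model would compress it better, which is the point of the slide.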

Page 33: Thank you!

CMU SCS

… compression is undecidable!

…which means there will always be better data mining tools/models/patterns to be discovered

-> job security

-> job satisfaction

KDD'10 C. Faloutsos 33

Page 34: Thank you!

CMU SCS

Let’s see some examples of models

KDD'10 C. Faloutsos 34

[Plot: response to new drug vs. body weight]

Page 35: Thank you!

CMU SCS

Let’s see some examples of models

KDD'10 C. Faloutsos 35

[Plot: $ spent vs. income]

Page 36: Thank you!

CMU SCS

Let’s see some examples of models

KDD'10 C. Faloutsos 36

[Plot: $ spent vs. income]

Page 37: Thank you!

CMU SCS

Let’s see some examples of models

KDD'10 C. Faloutsos 37

[Plot: $ spent vs. income]

Page 38: Thank you!

CMU SCS

Let’s see some examples of models

KDD'10 C. Faloutsos 38

[Plot: $ spent vs. income; annotation on plot: 3/4]

Page 39: Thank you!

CMU SCS

KDD'10 C. Faloutsos 39

Let’s see some examples of models

[Plot: metabolic rate vs. mass; annotation on plot: 3/4]

http://universe-review.ca/R10-35-metabolic.htm

Page 40: Thank you!

CMU SCS

KDD'10 C. Faloutsos 40

Kleiber’s law

[Plot: metabolic rate vs. mass; annotation on plot: 3/4]

http://universe-review.ca/R10-35-metabolic.htm
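Kleiber’s law says metabolic rate grows roughly as mass to the power 3/4, so the points fall on a straight line of slope 3/4 when both axes are logarithmic. A minimal sketch of recovering such an exponent by least squares on the logarithms follows. The data are synthetic and noise-free, generated from the law itself with an arbitrary constant; they are not measurements, and `fit_power_law` is a name made up for this example.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of log(y) = log(c) + a*log(x); returns (a, c)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    a = sxy / sxx                 # slope on the log-log plot = exponent
    c = math.exp(my - a * mx)     # intercept on the log-log plot = log(c)
    return a, c


# Synthetic data (NOT measurements): rate = 3.4 * mass**0.75.
# The constant 3.4 is arbitrary and only fixes the vertical offset.
masses = [0.02, 0.3, 4.0, 70.0, 600.0, 4000.0]
rates = [3.4 * m ** 0.75 for m in masses]

exponent, constant = fit_power_law(masses, rates)
```

On real, noisy data the fitted exponent would only approximate 3/4, and a plain least-squares fit on logarithms is just one of several possible estimators.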

Page 41: Thank you!

CMU SCS

KDD'10 C. Faloutsos 41

Outline

• Credit where credit is due

• Technical part – Data mining
  – Can it be automated? NO!
    • Always room for better models
  – Research challenges

• Non-technical part: ‘Listen’
  – To the data
  – To non-experts

Page 42: Thank you!

CMU SCS

Always room for better models

• E.g.: clustering – k-means (or our favorite clustering algorithm)

• How many clusters are in the Sierpinski triangle?

KDD'10 C. Faloutsos 42

Page 43: Thank you!

CMU SCS

Always room for better models

KDD'10 C. Faloutsos 43

Page 44: Thank you!

CMU SCS

Always room for better models

KDD'10 C. Faloutsos 44

K=3 clusters?

Page 45: Thank you!

CMU SCS

Always room for better models

KDD'10 C. Faloutsos 45

K=3 clusters?

K=9 clusters?

Page 46: Thank you!

CMU SCS

Always room for better models

KDD'10 C. Faloutsos 46

Piece-wise flat

Mixture of (Gaussian) clusters

Page 47: Thank you!

CMU SCS

Always room for better models

KDD'10 C. Faloutsos 47

Piece-wise flat

Mixture of (Gaussian) clusters

¾ power law

??

Page 48: Thank you!

CMU SCS

Always room for better models

KDD'10 C. Faloutsos 48

Piece-wise flat

Mixture of (Gaussian) clusters

¾ power law

ONE, but self-similar ‘cluster’

Page 49: Thank you!

CMU SCS

Always room for better models

KDD'10 C. Faloutsos 49

ONE, but self-similar ‘cluster’

• Barnsley’s method of IFS (iterated function systems) can easily generate it [Barnsley+Sloan, BYTE, 1988]

~100 lines of C code: www.cs.cmu.edu/~christos/www/SRC/ifs.tar
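The idea can be sketched in a few lines. The code below is not the C program linked above; it is an independent illustration of the random-iteration ("chaos game") form of an IFS, using a right-angled Sierpinski triangle with corners (0,0), (1,0), (0,1) as an assumed layout. It also counts occupied grid cells to show the self-similarity: halving the cell size triples the number of occupied cells, giving a box-counting dimension of log 3 / log 2 ≈ 1.585 rather than 2.

```python
import math
import random

# Three contractions, each halving all distances.
# Their fixed points are the triangle corners (0,0), (1,0), (0,1).
MAPS = [
    lambda x, y: (x / 2, y / 2),
    lambda x, y: (x / 2 + 0.5, y / 2),
    lambda x, y: (x / 2, y / 2 + 0.5),
]


def sierpinski_points(n, seed=0):
    """Apply a randomly chosen map n times, recording each point."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0  # start at a corner, which lies on the attractor
    pts = []
    for _ in range(n):
        x, y = rng.choice(MAPS)(x, y)
        pts.append((x, y))
    return pts


def occupied_boxes(pts, k):
    """Number of cells of a 2^k x 2^k grid that contain a point."""
    side = 2 ** k
    return len({(int(x * side), int(y * side)) for x, y in pts})


def box_dimension(pts, k1, k2):
    """Slope of log(#occupied cells) against log(grid resolution)."""
    n1, n2 = occupied_boxes(pts, k1), occupied_boxes(pts, k2)
    return math.log(n2 / n1) / ((k2 - k1) * math.log(2))
```

For example, `box_dimension(sierpinski_points(20000), 4, 5)` estimates the dimension from two grid resolutions. The estimate degrades once the grid is fine enough that the sample no longer fills every occupied cell.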

Page 50: Thank you!

CMU SCS

Always room for better models

KDD'10 C. Faloutsos 50

• But, does self-similarity appear in real life?

Page 51: Thank you!

CMU SCS

Real, self similar dataset

KDD'10 C. Faloutsos 51

Page 52: Thank you!

CMU SCS

Real, self similar dataset

KDD'10 C. Faloutsos 52

Page 53: Thank you!

CMU SCS

Real, self similar dataset

KDD'10 C. Faloutsos 53

Page 54: Thank you!

CMU SCS

Real, self similar dataset

KDD'10 C. Faloutsos 54

Page 55: Thank you!

CMU SCS

KDD'10 C. Faloutsos 55

• the red is true
• origin: Norway
• but most other coastlines are ‘self-similar’, too!

Page 56: Thank you!

CMU SCS

How can we find better models?

• Obviously, an art (‘undecidable’!)

• Helps if we
  – Listen to domain experts and
  – Listen to the data (next)

KDD'10 C. Faloutsos 56

Page 57: Thank you!

CMU SCS

KDD'10 C. Faloutsos 57

Outline

• Credit where credit is due

• Technical part – Data mining
  – Can it be automated? NO!
  – Research challenges
    • Listen to the data (the more, the better!)

• Non-technical part: ‘Listen’
  – To the data
  – To non-experts

Page 58: Thank you!

CMU SCS

KDD'10 C. Faloutsos 58

Scalability

• Google: > 450,000 processors in clusters of ~2000 processors each [Barroso, Dean, Hölzle, “Web Search for a Planet: The Google Cluster Architecture” IEEE Micro 2003]

• Yahoo: ~5 PB of data [Fayyad’07]

• ‘M45’: 4K processors, 3 TB RAM, 1.5 PB disk

Page 59: Thank you!

CMU SCS

Promising research direction: scalability

• challenges
  – Vast amounts of data; storing; cooling (!); …

• … and opportunities:
  – DATA: Easier to collect (clickstreams, sensors, etc.)
  – S/W: Hadoop, HBase, Pig, …: open source
  – H/W: 1 TB disk: ~US$ 100

KDD'10 C. Faloutsos 59

Page 60: Thank you!

CMU SCS

Promising research direction

• The more data, the more subtle patterns we may discover

• Examples of subtle patterns:

KDD'10 C. Faloutsos 60

Page 61: Thank you!

CMU SCS

More data, more subtle patterns

KDD'10 C. Faloutsos 61

Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec: Mobile call graphs: beyond power-law and lognormal distributions. KDD 2008: 596-604

Duration (log scale)

PDF: fraction of customers (log scale)

Page 62: Thank you!

CMU SCS

More data, more subtle patterns

KDD'10 C. Faloutsos 62

Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec: Mobile call graphs: beyond power-law and lognormal distributions. KDD 2008: 596-604

Duration (log scale)

PDF: fraction of customers (log scale)

(mixture of) Gaussians

Page 63: Thank you!

CMU SCS

More data, more subtle patterns

KDD'10 C. Faloutsos 63

Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec: Mobile call graphs: beyond power-law and lognormal distributions. KDD 2008: 596-604

Page 64: Thank you!

CMU SCS

More data, more subtle patterns

KDD'10 C. Faloutsos 64

Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec: Mobile call graphs: beyond power-law and lognormal distributions. KDD 2008: 596-604

Zipf (Pareto, power-law)

Page 65: Thank you!

CMU SCS

More data, more subtle patterns

KDD'10 C. Faloutsos 65

Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec: Mobile call graphs: beyond power-law and lognormal distributions. KDD 2008: 596-604

Page 66: Thank you!

CMU SCS

More data, more subtle patterns

KDD'10 C. Faloutsos 66

Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec: Mobile call graphs: beyond power-law and lognormal distributions. KDD 2008: 596-604

lognormal

Page 67: Thank you!

CMU SCS

More data, more subtle patterns

KDD'10 C. Faloutsos 67

Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec: Mobile call graphs: beyond power-law and lognormal distributions. KDD 2008: 596-604

Page 68: Thank you!

CMU SCS

More data, more subtle patterns

KDD'10 C. Faloutsos 68

Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec: Mobile call graphs: beyond power-law and lognormal distributions. KDD 2008: 596-604

dPln (= doubly Pareto lognormal)

Page 69: Thank you!

CMU SCS

So, dPln is the answer?

KDD'10 C. Faloutsos 69

Page 70: Thank you!

CMU SCS

So, dPln is the answer?

KDD'10 C. Faloutsos 70

Yes, for the moment…

Page 71: Thank you!

CMU SCS

So, dPln is the answer?

KDD'10 C. Faloutsos 71

With more data, who knows?!

Page 72: Thank you!

CMU SCS

KDD'10 C. Faloutsos 72

Outline

• Credit where credit is due

• Technical part – Data mining
  – Can it be automated? NO!
  – Research challenges
    • Listen to the data (the more, the better!)

• Non-technical part: ‘Listen’
  – To the data
  – To non-experts

Page 73: Thank you!

CMU SCS

Listen to non-experts

• Explain ‘why’, to a non-expert (‘grandpa’)

• And, even harder, explain ‘how’, e.g.:
  – Perron-Frobenius theorem for an irreducible Markov chain (MC)

KDD'10 C. Faloutsos 73

Page 74: Thank you!

CMU SCS

Listen to non-experts

• Explain ‘why’, to a non-expert (‘grandpa’)

• And, even harder, explain ‘how’, e.g.:
  – Perron-Frobenius theorem for an irreducible Markov chain (MC) -> PageRank -> random surfer
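The chain of explanations can be checked numerically on a toy example. The sketch below is an illustration added here, not code from the talk: `pagerank` finds the stationary distribution by power iteration (the "expert" view), and `random_surfer` simply simulates someone clicking links and occasionally jumping to a random page (the "grandpa" view). The 3-page graph, the damping value 0.85 and the function names are assumptions for the example, and every page is assumed to have at least one out-link.

```python
import random


def pagerank(out_links, damping=0.85, iters=100):
    """Power iteration on the random-surfer Markov chain.

    out_links[i] lists the pages that page i links to. Every page is
    assumed to have at least one out-link (no dangling pages).
    """
    n = len(out_links)
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n      # random-jump share
        for i, targets in enumerate(out_links):
            share = damping * rank[i] / len(targets)
            for j in targets:                # link-following share
                new[j] += share
        rank = new
    return rank


def random_surfer(out_links, damping=0.85, steps=200_000, seed=0):
    """Fraction of time a simulated surfer spends on each page."""
    rng = random.Random(seed)
    n = len(out_links)
    visits = [0] * n
    page = 0
    for _ in range(steps):
        if rng.random() < damping:
            page = rng.choice(out_links[page])   # follow a link
        else:
            page = rng.randrange(n)              # jump to a random page
        visits[page] += 1
    return [v / steps for v in visits]


# Hypothetical 3-page web: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
GRAPH = [[1, 2], [2], [0]]
```

Calling `pagerank(GRAPH)` and `random_surfer(GRAPH)` should give closely matching vectors: the random jump makes the chain irreducible, so the long-run visit frequencies converge to the unique stationary distribution.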

KDD'10 C. Faloutsos 74

Page 75: Thank you!

CMU SCS

Summary

• Data mining = compression = undecidable = job security

• Hence: always room for better models/patterns
  – Listen to the data (GB, TB and PB of them!)
  – Listen to domain experts (e.g., ¾ Kleiber’s law)

• Listen to non-experts (‘explain to grandpa’)

KDD'10 C. Faloutsos 75

Page 76: Thank you!

CMU SCS

Compression, fun, recursion

• The shortest, recursive joke:

• There are 3 types of data miners

KDD'10 C. Faloutsos 76

Page 77: Thank you!

CMU SCS

Compression, fun, recursion

• The shortest, recursive joke:

• There are 3 types of data miners
  – Those who can count

KDD'10 C. Faloutsos 77

Page 78: Thank you!

CMU SCS

Compression, fun, recursion

• The shortest, recursive joke:

• There are 3 types of data miners
  – Those who can count
  – And those who cannot

KDD'10 C. Faloutsos 78

Page 79: Thank you!

CMU SCS

Thank you!

For the honor, and for making this wonderful research community

KDD'10 C. Faloutsos 79