Amy Langville, Associate Professor of Mathematics, The College of Charleston in South Carolina at...

4 Consulting Projects from this past year. September 19, 2014. Machine Learning 2014. Amy Langville, Mathematics Department, College of Charleston.

description

My talk will cover four ranking and clustering projects that I consulted on this past year. The projects range from ranking Olympic athletes, mixed martial arts fighters, and cell phone carriers to clustering sentences to rank individuals by how much humility they evidence in their written language. For each project, I will address the particular data challenges and the solutions and techniques we proposed.

Transcript of Amy Langville, Associate Professor of Mathematics, The College of Charleston in South Carolina at...

Page 1: Amy Langville, Associate Professor of Mathematics, The College of Charleston in South Carolina at MLconf ATL

4 Consulting Projects from this past year

September 19, 2014

Machine Learning 2014

Amy Langville
Mathematics Department
College of Charleston

Page 2: Amy Langville, Associate Professor of Mathematics, The College of Charleston in South Carolina at MLconf ATL

4 Consulting Projects from this past year

Tyler Perini
Mathematics Department
College of Charleston

Amy Langville
Mathematics Department
College of Charleston

Page 4: Amy Langville, Associate Professor of Mathematics, The College of Charleston in South Carolina at MLconf ATL

Outline

2 Books generate questions

US Olympic Projects

CageRank

Ranking Cell Phone Carriers

The Humility Project

Page 7: Amy Langville, Associate Professor of Mathematics, The College of Charleston in South Carolina at MLconf ATL

2 Books generate questions

1232-1315

Chapter 7 talks about . . . but I need to . . . Any advice?

I really enjoyed your book, but my problem is . . ., which you don’t mention. How do I solve it?

Page 10: Amy Langville, Associate Professor of Mathematics, The College of Charleston in South Carolina at MLconf ATL

Project 1: from U.S. Olympic Committee

Problem 1: Your book talks a lot about ranking in head-to-head contests (and that was helpful), but we need to rank multi-competitor sports like downhill skiing and gymnastics.

Solution 1: TRUESKILL

μ = average skill

σ = uncertainty
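
A minimal sketch of how a TrueSkill-style rating, with its μ (average skill) and σ (uncertainty), could be updated from a multi-competitor result. It assumes the standard two-player Gaussian update without draws, applied to each adjacent pair of finishers; the full TrueSkill algorithm uses message passing on a factor graph instead. The names `Rating`, `update_pair`, and `update_race`, and the starting constants (μ = 25, σ = 25/3, β = 25/6), are illustrative assumptions and are not taken from the talk.

```python
import math
from dataclasses import dataclass

BETA = 25.0 / 6.0  # assumed performance noise (a conventional default)


@dataclass(frozen=True)
class Rating:
    mu: float = 25.0           # average skill
    sigma: float = 25.0 / 3.0  # uncertainty about that skill


def _pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def update_pair(winner, loser):
    """Two-player update with no draws: return (new_winner, new_loser)."""
    c2 = 2.0 * BETA ** 2 + winner.sigma ** 2 + loser.sigma ** 2
    c = math.sqrt(c2)
    t = (winner.mu - loser.mu) / c
    v = _pdf(t) / _cdf(t)  # size of the mean shift
    w = v * (v + t)        # size of the variance shrinkage, between 0 and 1
    new_winner = Rating(
        winner.mu + winner.sigma ** 2 / c * v,
        winner.sigma * math.sqrt(1.0 - winner.sigma ** 2 / c2 * w),
    )
    new_loser = Rating(
        loser.mu - loser.sigma ** 2 / c * v,
        loser.sigma * math.sqrt(1.0 - loser.sigma ** 2 / c2 * w),
    )
    return new_winner, new_loser


def update_race(ratings, finish_order):
    """Simplification: treat each adjacent pair of finishers as one game.

    ratings maps competitor name to Rating; finish_order lists names, best first.
    """
    updated = dict(ratings)
    for better, worse in zip(finish_order, finish_order[1:]):
        updated[better], updated[worse] = update_pair(updated[better], updated[worse])
    return updated


if __name__ == "__main__":
    # Hypothetical competitors, all starting from the same prior.
    field = {name: Rating() for name in ("skier_a", "skier_b", "skier_c")}
    for name, r in update_race(field, ["skier_a", "skier_b", "skier_c"]).items():
        print(name, round(r.mu, 2), round(r.sigma, 2))
```

A surprising result (a low-rated competitor finishing ahead of a high-rated one) moves μ more than an expected one, and every observed result shrinks σ.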

Page 12: Amy Langville, Associate Professor of Mathematics, The College of Charleston in South Carolina at MLconf ATL

Project 1: from U.S. Olympic Committee

[Figure labels: 1st, 3rd, 2nd]

Page 14: Amy Langville, Associate Professor of Mathematics, The College of Charleston in South Carolina at MLconf ATL

Project 1: from U.S. Olympic Committee

[Figure labels: 2nd, 3rd, 1st]

Page 15: Amy Langville, Associate Professor of Mathematics, The College of Charleston in South Carolina at MLconf ATL

Project 1: from U.S. Olympic Committee

Problem 2: Your book talks a lot about ranking in head-to-head contests where there are multiple matches between competitors, but our data is sparse. Any advice?

Page 17: Amy Langville, Associate Professor of Mathematics, The College of Charleston in South Carolina at MLconf ATL

Project 2: CageRank

Problem: You talk a lot about ranking head-to-head contests, like ours [MMA fights], but our data is really sparse. How do we deal with that?

Solution: FIND SIMILAR FIGHTERS to densify the graph

Page 19: Amy Langville, Associate Professor of Mathematics, The College of Charleston in South Carolina at MLconf ATL

UFC 163: Phil Davis vs. Lyoto Machida

had never fought each other

Page 20: Amy Langville, Associate Professor of Mathematics, The College of Charleston in South Carolina at MLconf ATL

College football vs. UFC

Page 21: Amy Langville, Associate Professor of Mathematics, The College of Charleston in South Carolina at MLconf ATL

UFC 163: Phil Davis vs. Lyoto Machida

Find 10 most similar fighters to each

Similar by? Fightmetric stats, SVD SIGNS

Phil Davis
1 Rashad Evans
2 Ryan Bader
3 Alexander Gustafsson
4 Antonio Rogerio Nogueira
5 Quinton “Rampage” Jackson
6 Chael Sonnen
7 Matt Hamill
8 James Te-Huna
9 Dan Henderson
10 Vladimir Matyushenko

Lyoto Machida
1 Ricardo Arona
2 Jason Brilz
3 Ryan Bader
4 Stephan Bonnar
5 Randy Couture
6 Trevor Prangley
7 Tito Ortiz
8 Mark Coleman
9 Ovince St. Preux
10 Chael Sonnen
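
A minimal sketch of the "find the most similar fighters" idea, under stated assumptions. The slide names Fightmetric stats and SVD signs as the basis for similarity; this sketch swaps in plain cosine similarity on made-up stat vectors, so it illustrates the densifying step and not the method used in the project. The fighter names, stat values, and the helpers `most_similar` and `proxy_record` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length stat vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(stats, fighter, k=10):
    """Return the k fighters whose stat vectors are closest to `fighter`."""
    target = stats[fighter]
    scored = [
        (cosine(target, vec), name)
        for name, vec in stats.items()
        if name != fighter
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


def proxy_record(fighter, opponent, stats, results, k=10):
    """Wins and losses of `fighter` against the k fighters most like `opponent`.

    results is a list of (winner, loser) pairs from past fights. The proxy
    fights stand in for the head-to-head fights that never happened.
    """
    proxies = set(most_similar(stats, opponent, k))
    wins = sum(1 for w, l in results if w == fighter and l in proxies)
    losses = sum(1 for w, l in results if l == fighter and w in proxies)
    return wins, losses


if __name__ == "__main__":
    # Hypothetical stat vectors, e.g. (striking rate, takedown rate).
    stats = {
        "fighter_a": [1.0, 0.0],
        "fighter_b": [0.9, 0.1],
        "fighter_c": [0.0, 1.0],
        "fighter_d": [0.1, 0.9],
    }
    results = [("fighter_c", "fighter_b"), ("fighter_b", "fighter_c")]
    print(most_similar(stats, "fighter_a", k=1))
    print(proxy_record("fighter_c", "fighter_a", stats, results, k=1))
```

Comparing each fighter's record against the other's look-alikes gives two numbers to set side by side even when the two fighters have never met.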

Page 23: Amy Langville, Associate Professor of Mathematics, The College of Charleston in South Carolina at MLconf ATL

UFC 163: Phil Davis vs. Lyoto Machida

12

6

Question: is the goal to predict the winner or generate buzz?

Project 3: Ranking Cell Phone Carriers

Problem: Rather than individual games between carriers, we have a distribution of game scores for each carrier. How do we use this data to rank carriers?

Solution: SIMULATE HEAD-TO-HEAD GAMES BY RANDOM DRAWS FROM DATA, then rank aggregate by BORDA COUNT (#carriers each carrier outranks).

New Problem: data is loaded with ties!
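
A minimal sketch of the simulate-then-aggregate idea, assuming each carrier's data is simply a list of observed scores where higher is better. In each simulated round one score is drawn at random per carrier, and a carrier earns one Borda point for every carrier it strictly outranks. The carrier names, the scores, and the choice to award nothing for a tie are assumptions made for illustration.

```python
import random


def simulate_borda(scores, n_rounds=10_000, seed=0):
    """Rank carriers by Borda points accumulated over simulated rounds.

    scores maps carrier name to a list of observed scores (higher is better).
    Returns (ranking, points), with ranking listed best first.
    """
    rng = random.Random(seed)
    carriers = list(scores)
    points = {c: 0 for c in carriers}
    for _ in range(n_rounds):
        # One random draw per carrier stands in for one head-to-head "game".
        draw = {c: rng.choice(scores[c]) for c in carriers}
        for c in carriers:
            # Borda count: number of carriers this one strictly outranks.
            # A tie awards no point to either side (an assumption).
            points[c] += sum(draw[c] > draw[o] for o in carriers if o != c)
    ranking = sorted(carriers, key=lambda c: points[c], reverse=True)
    return ranking, points


if __name__ == "__main__":
    # Hypothetical score samples for three made-up carriers.
    data = {
        "carrier_x": [7, 8, 8, 9],
        "carrier_y": [6, 8, 8, 8],
        "carrier_z": [5, 6, 8, 8],
    }
    ranking, points = simulate_borda(data)
    print(ranking, points)
```

With many repeated scores, as in the hypothetical data above, a large share of draws end in ties and award no points, which is the difficulty the slide raises.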

Page 28: Amy Langville, Associate Professor of Mathematics, The College of Charleston in South Carolina at MLconf ATL

Project 3: Ranking Cell Phone Carriers

MARKOV CHAIN

Question: what makes a model good?

Stability in the face of small data changes

Explainability to public
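
A minimal sketch of a Markov chain ranking, under stated assumptions. The slide names the Markov chain but not how it was built; one common construction has each carrier "vote" for the carriers that beat it, in proportion to how often it lost to them, and ranks carriers by the chain's stationary distribution. The `wins` table, the damping value of 0.85, and the function name `markov_rank` are illustrative assumptions.

```python
def markov_rank(wins, damping=0.85, iters=200):
    """Stationary-distribution ratings from a table of pairwise wins.

    wins[i][j] is the number of (simulated) games in which i beat j.
    Returns a dict mapping each name to its rating; ratings sum to 1.
    """
    names = list(wins)
    n = len(names)
    rating = {c: 1.0 / n for c in names}
    for _ in range(iters):
        # Damping mixes in a uniform jump so the chain has a unique answer.
        new = {c: (1.0 - damping) / n for c in names}
        for loser in names:
            # The loser spreads its current rating over the carriers that beat it.
            lost_to = {w: wins[w].get(loser, 0) for w in names if w != loser}
            total = sum(lost_to.values())
            for w in names:
                if total == 0:
                    share = 1.0 / n  # never lost: spread the vote uniformly
                else:
                    share = lost_to.get(w, 0) / total
                new[w] += damping * rating[loser] * share
        rating = new
    return rating


if __name__ == "__main__":
    # Hypothetical win counts between three made-up carriers.
    wins = {
        "carrier_x": {"carrier_y": 8, "carrier_z": 9},
        "carrier_y": {"carrier_x": 2, "carrier_z": 7},
        "carrier_z": {"carrier_x": 1, "carrier_y": 3},
    }
    ratings = markov_rank(wins)
    for name in sorted(ratings, key=ratings.get, reverse=True):
        print(name, round(ratings[name], 3))
```

One way to probe the slide's stability question is to perturb a few win counts and check whether the resulting order changes.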

Page 30: Amy Langville, Associate Professor of Mathematics, The College of Charleston in South Carolina at MLconf ATL

Project 4: Humility Project

Problem: We’re trying to analyze a person’s writing to predict his/her humility, but we lost our data guy. Can you help us?

Solution: NON-NEGATIVE MATRIX FACTORIZATION (NMF) to find hidden clusters in text.
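
A minimal sketch of NMF used for clustering, assuming the text has already been turned into a non-negative term-by-sentence count matrix A. It factors A ≈ WH with the classic multiplicative update rules and assigns each sentence to the component with the largest weight in its column of H. The tiny matrix, the rank k = 2, and the helper names are made up for illustration; the talk does not say which NMF algorithm or preprocessing the project used.

```python
import random


def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def transpose(X):
    return [list(row) for row in zip(*X)]


def nmf(A, k, iters=500, seed=0, eps=1e-9):
    """Factor a non-negative m x n matrix A into W (m x k) and H (k x n)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T A) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        # W <- W * (A H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return W, H


def cluster_labels(H):
    """Assign each column (sentence) to its strongest component."""
    k, n = len(H), len(H[0])
    return [max(range(k), key=lambda i: H[i][j]) for j in range(n)]


if __name__ == "__main__":
    # Hypothetical counts: rows are terms, columns are sentences.
    A = [
        [2, 1, 0, 0],
        [1, 2, 0, 0],
        [0, 0, 1, 2],
        [0, 0, 2, 1],
    ]
    W, H = nmf(A, k=2)
    print(cluster_labels(H))
```

Because W and H stay non-negative, each column of W can be read as a weighted list of terms describing one cluster, which helps when explaining the clusters to non-specialists.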

Page 38: Amy Langville, Associate Professor of Mathematics, The College of Charleston in South Carolina at MLconf ATL

Conclusions

We need you. You open our eyes to problems we never would have thought about.

Iterative Collaboration

Many GREAT ALGORITHMS exist. Some just need tweaking.

Future Work. . . (you tell me)