Amy Langville, Professor of Mathematics, The College of Charleston in South Carolina & Tim Chartier,...


Page 1: Amy Langville, Professor of Mathematics, The College of Charleston in South Carolina & Tim Chartier, Chief Researcher, Tresata at MLconf ATL 2016

Learning to Play Sports
ML in sports analytics

Dr. Tim Chartier
Tresata
Davidson College
tichartier@davidson.edu
@timchartier

Dr. Amy Langville
College of Charleston
Dept. of Math
[email protected]

Page 2:

Outline of talk

Page 3:

Play from bench

data availability

general interest (a.k.a. cool factor)

domain knowledge

Page 4:

Application 1: Ranking

Page 5:

Apply here

Picture credit: http://orlandonest.files.wordpress.com/2011/03/2011-march-madness-bracket.gif

Page 6:

How do we do?
• ESPN Tournament Challenge: > 4 million brackets!
• 1st round correct choice = 10 points
• nth round correct choice = 2*(previous round)
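
The scoring rule above can be sketched in a few lines. This is an illustration only: the 64-team, 6-round bracket layout and the function names are assumptions, not something stated on the slide.

```python
# Sketch of the scoring rule: 10 points per correct pick in round 1,
# doubling in every later round.
# Assumption (not on the slide): a 64-team bracket with 6 rounds.

GAMES_PER_ROUND = [32, 16, 8, 4, 2, 1]


def points_per_pick(round_number: int) -> int:
    """Points for one correct pick in the given round (1-indexed)."""
    return 10 * 2 ** (round_number - 1)


def bracket_score(correct_per_round: list[int]) -> int:
    """Total score given the number of correct picks in each round."""
    return sum(
        correct * points_per_pick(rnd)
        for rnd, correct in enumerate(correct_per_round, start=1)
    )


print(bracket_score(GAMES_PER_ROUND))  # perfect bracket: 1920
```

Under these assumptions every round is worth the same 320 points in total, so a single late-round pick carries as much weight as many early-round picks.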

Page 7:

finding ideal weight for prediction

Page 8:

Method 1: crowd source
• 2009 – best bracket – 97%
• 2010 – best bracket – 99%
• 2014 – national media led to thousands of brackets on: marchmathness.davidson.edu

Page 9:

Method 2: learn sports

• vary parameter weights to optimize ESPN score or prediction rate

• subtlety: not all seasons are equally predictive
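
A toy sketch of varying a weight to optimize prediction rate. Everything here is hypothetical: the game data, the single `late_weight` parameter, and the weighted-win rating are stand-ins chosen for brevity, not the speakers' model.

```python
# Toy sketch: tune one weighting parameter by grid search.
# The data, the rating rule, and all names are hypothetical.

# (winner, loser, is_late_season)
season = [
    ("A", "B", False), ("A", "C", False), ("B", "C", False),
    ("C", "A", True), ("C", "B", True), ("B", "A", True),
]
# (winner, loser) for the games we want to predict
tournament = [("C", "A"), ("C", "B"), ("B", "A")]


def ratings(games, late_weight):
    """Weighted win count: a late-season win counts `late_weight` times."""
    score = {}
    for winner, loser, late in games:
        score.setdefault(loser, 0.0)
        score[winner] = score.get(winner, 0.0) + (late_weight if late else 1.0)
    return score


def prediction_rate(score, games):
    """Fraction of games in which the higher-rated team won."""
    hits = sum(score[w] > score[l] for w, l in games)
    return hits / len(games)


def best_weight(candidates):
    """First candidate weight that achieves the highest prediction rate."""
    return max(
        candidates,
        key=lambda w: prediction_rate(ratings(season, w), tournament),
    )


print(best_weight([0.5, 1.0, 1.5, 2.0]))  # 1.5
```

In practice the weights would be tuned on some seasons and checked on others, which is where the subtlety about seasons not being equally predictive comes in.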

Page 10:

Method 3: mad web

10 years, 50,000 games

Page 11:

Application 2: Cats Stats
Analytics for college teams to support coaching.

Page 12:

sports analytics keys
• coachable
• consumable
• understandable (informed opinion)

Page 13:

impact: coaching

“It kind of blew us away…it really opened our eyes...” – Matt McKillop, NYT

Page 14:

impact: off-season

Player          Poss.  TO%    OR%    EFG%   2P%    3P%
Brian Sullivan  77     14.3%  20.0%  65.6%  67.4%  40.0%
without         56     23.2%  20.8%  55.3%  42.9%  47.1%

Page 15:

Application 3: Lotsa data

Page 16:

missile tech

25 frames/sec

Page 17:

Filtered for Warriors regular season

Page 18:

data we have

Page 19:

SportVU-like data

Page 20:

MasseyRatings.com
column 1 = date of game, measured in days since 1/1/0000
column 2 = date in YYYYMMDD format
column 3 = team 1 index
column 4 = team 1 home field (1 = home, -1 = away, 0 = neutral)
column 5 = team 1 score
column 6 = team 2 index
column 7 = team 2 home field (1 = home, -1 = away, 0 = neutral)
column 8 = team 2 score
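
A minimal sketch of reading rows with this eight-column layout. It assumes the rows are comma-separated with no header line, which the slide does not state; the `Game` record, the function name, and the sample rows are made up for illustration.

```python
import csv
import io
from datetime import date, datetime
from typing import NamedTuple


class Game(NamedTuple):
    day_number: int   # column 1: days since 1/1/0000
    played_on: date   # column 2: parsed from YYYYMMDD
    team1: int        # column 3: team 1 index
    home1: int        # column 4: 1 = home, -1 = away, 0 = neutral
    score1: int       # column 5: team 1 score
    team2: int        # column 6: team 2 index
    home2: int        # column 7: 1 = home, -1 = away, 0 = neutral
    score2: int       # column 8: team 2 score


def parse_games(text: str) -> list[Game]:
    """Parse comma-separated rows (an assumption) into Game records."""
    games = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        f = [field.strip() for field in row]
        games.append(Game(
            int(f[0]),
            datetime.strptime(f[1], "%Y%m%d").date(),
            *(int(value) for value in f[2:8]),
        ))
    return games


# Made-up sample rows, not real results.
SAMPLE = """\
736375,20160215,12,1,78,40,-1,71
736376,20160216,40,0,65,7,0,80
"""

games = parse_games(SAMPLE)
print(games[0].played_on, games[0].score1 - games[0].score2)
```

Keeping the home-field flag on each record makes it easy to filter neutral-site games later, which matters for tournament play.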

Page 21:

Tresata Data
For network analysis, Tresata added:
• seed
• coach’s Madness history
• kenpom.com statistics
• every season game (and added game stats)
What can we learn from about 50,000 games?

Page 22:

Data needed
• ESPN bracket challenge scores for past years
• injuries for every game
• score with 2 or 4 minutes left
• learn from Vegas odds
• biometric data

Page 23:

media ?’s

• If we remove a team and it highly affects reranking, what can we learn about such a team for March Madness?

• How can Buddy Hield light up March Madness?

• Compare Jack Gibbs to Stephen Curry in college play.
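
One way to make the first question concrete is to drop a team, rerank, and count how many pairs of remaining teams switch order. The sketch below is only an illustration: it swaps in plain win percentage as the rating, and the game results and function names are made up.

```python
from itertools import combinations

# Made-up results: (winner, loser)
games = [("A", "B"), ("A", "C"), ("B", "C"),
         ("C", "D"), ("D", "B"), ("A", "D")]


def win_pct(games):
    """Win percentage per team (a simple stand-in for a real rating)."""
    wins, played = {}, {}
    for w, l in games:
        wins[w] = wins.get(w, 0) + 1
        wins.setdefault(l, 0)
        played[w] = played.get(w, 0) + 1
        played[l] = played.get(l, 0) + 1
    return {team: wins[team] / played[team] for team in played}


def ranking(games):
    """Teams sorted best first; ties are broken alphabetically."""
    pct = win_pct(games)
    return sorted(pct, key=lambda team: (-pct[team], team))


def swapped_pairs(rank_a, rank_b):
    """Number of team pairs ordered differently in the two rankings."""
    common = [team for team in rank_a if team in rank_b]
    pos_b = {team: i for i, team in enumerate(rank_b)}
    return sum(pos_b[x] > pos_b[y] for x, y in combinations(common, 2))


def removal_impact(games, team):
    """Pairs that change order once `team` and its games are removed."""
    kept = [(w, l) for w, l in games if team not in (w, l)]
    return swapped_pairs(ranking(games), ranking(kept))


for team in "ABCD":
    print(team, removal_impact(games, team))
```

A team whose removal reshuffles many pairs is one the rest of the ranking leans on heavily.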

Page 24:

Page 25:

New Work
How rankable is this dataset?

Page 26:

Rankability

Data

Apps
• Amazon products
• Netflix movies
• Financial networks
• Teams

Page 27:

Intuitive Ideas

one extreme

Dominance graph (very rankable)

Random graph (less rankable)

other extreme

Page 28:

Inconsistency
Uparcs in a rank-ordered graph

5 uparcs

Page 29:

Inconsistency
Uparcs in a rank-ordered graph
Minimum Violations Ranking

3 uparcs
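
A small sketch of counting uparcs for a given ordering and of finding a minimum violations ranking by brute force. The graph is made up, and the convention that an uparc is a game won by the lower-placed team is an assumption about the slide's picture. Brute force over all orderings only works for a handful of teams.

```python
from itertools import permutations

# Made-up results: (winner, loser)
games = [("A", "B"), ("B", "C"), ("C", "A"),
         ("A", "D"), ("D", "C"), ("B", "D")]


def count_uparcs(order, games):
    """Number of games won by the team placed lower in `order`."""
    position = {team: i for i, team in enumerate(order)}  # 0 = top
    return sum(position[w] > position[l] for w, l in games)


def minimum_violations_ranking(games):
    """Try every ordering and keep the first one with the fewest uparcs."""
    teams = sorted({team for game in games for team in game})
    return min(
        permutations(teams),
        key=lambda order: count_uparcs(order, games),
    )


worst_guess = ["D", "C", "B", "A"]
best = minimum_violations_ranking(games)
print(count_uparcs(worst_guess, games))  # 4
print(best, count_uparcs(best, games))   # ('A', 'B', 'D', 'C') 1
```

The same set of games yields 4 uparcs under one ordering and 1 under another, so the uparc count describes a ranking of the data rather than the data alone.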

Page 30:

Inconsistency

BUT this measure of rankability is tied to the ranking.

March Madness 2008 sorted by Massey rating
uparcs = 27.2%

March Madness 2014 sorted by Massey rating
uparcs = 26.9%

Page 31:

k-cycles

Goal: Create a rankability measure that is independent of ranking.

2-cycles: 1-2-1, 2-1-2

5-cycles: 1-2-3-4-5-1, 2-3-4-5-1-2, 3-4-5-1-2-3, 4-5-1-2-3-4, 5-1-2-3-4-5

Page 32:

k-cycles

Goal: Create a rankability measure that is independent of ranking.

2-cycles: 1-2-1, 2-1-2

5-cycles: 1-2-3-4-5-1, 2-3-4-5-1-2, 3-4-5-1-2-3, 4-5-1-2-3-4, 5-1-2-3-4-5

4-paths: 1-2-1-2-1, 2-1-2-1-2

Page 33:

k-cycles

Goal: Create a rankability measure that is independent of ranking.

2-cycles: 1-2-1, 2-1-2

5-cycles: 1-2-3-4-5-1, 2-3-4-5-1-2, 3-4-5-1-2-3, 4-5-1-2-3-4, 5-1-2-3-4-5

4-paths: 1-2-1-2-1, 2-1-2-1-2
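
The lists above count each cycle once per starting node, which is what the trace of a power of the adjacency matrix counts: closed walks of length k. The sketch below illustrates that standard fact on made-up graphs. It is not presented as the speakers' rankability measure, and the function names are hypothetical.

```python
def mat_mul(x, y):
    """Product of two square matrices stored as lists of rows."""
    n = len(x)
    return [
        [sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def closed_walks(adj, k):
    """trace(adj**k): closed walks of length k, one per starting node."""
    power = adj
    for _ in range(k - 1):
        power = mat_mul(power, adj)
    return sum(power[i][i] for i in range(len(adj)))


# adj[i][j] = 1 means there is an arc from node i+1 to node j+1.
two_cycle = [[0, 1], [1, 0]]                          # 1 -> 2 -> 1
five_cycle = [[1 if j == (i + 1) % 5 else 0 for j in range(5)]
              for i in range(5)]                      # 1 -> 2 -> ... -> 5 -> 1
dominance = [[1 if i < j else 0 for j in range(4)]
             for i in range(4)]                       # every arc points down

print(closed_walks(two_cycle, 2))   # 2: 1-2-1 and 2-1-2
print(closed_walks(two_cycle, 4))   # 2: 1-2-1-2-1 and 2-1-2-1-2
print(closed_walks(five_cycle, 5))  # 5
print(closed_walks(dominance, 3))   # 0: a dominance graph has no cycles
```

No ordering of the nodes enters the computation, so a count like this does not depend on any particular ranking.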

Page 34:

Future Work
If a dataset is not very rankable, which edges should we add to the graph to improve its rankability?

Page 35:

Learn to play sports

data questions applications

Page 36:

questions?

Picture credit: http://www.trendir.com/ultra-modern/