Post on 06-Dec-2014
description
1
Headline Goes HereSpeaker Name or Subhead Goes Here
DO NOT USE PUBLICLY PRIOR TO 10/23/12
Doing Data Science on theNFL Play by Play DatasetJesse Anderson | Curriculum Developer and Instructor July 2013 v2
2
Plays
• Advanced NFL stats released all Play by Play since 2002 season• 2,898 total games• 471,392 plays
3
Full Play Entry20121119_CHI@SF,3,17,48,SF,CHI,3,2,76,20,0,(2:48) C.Kaepernick pass short right to M.Crabtree to SF 25 for 1 yard (C.Tillman). Caught at SF 25. 0-yds YAC,0,3,0,27,7 ,2012
4
Play Description
(2:48) C.Kaepernick pass short right to M.Crabtree to SF 25 for 1 yard (C.Tillman). Caught at SF 25. 0-yds YAC
5
There's A Chart for That
6
There's A Custom MapReduce Behind Thatpublic class IncompletesMapper extends Mapper<LongWritable, Text, Text, PassWritable> {
@Overridepublic void map(LongWritable key, Text value, Context context) throws
IOException, InterruptedException {String line = value.toString();
if (line.contains("incomplete")) {Matcher matcher = incompletePass.matcher(line);
if (matcher.find()) {context.write(new Text(matcher.group(1) +
"-" + matcher.group(2)), new PassWritable(1,Integer.parseInt(matcher.group(3))));
7
The Hive Story
Enter the Query
8
Queryable Data
Give me every run play by New Orleans in the 2010 season
9
From the Data: Fourth Downs
15% of 4th downplays weren't kicks
10
Play by Play Pieces
(2:48) C.Kaepernick pass short right to M.Crabtree to SF 25 for 1 yard (C.Tillman). Caught at SF 25. 0-yds YAC
11
From the Data: Sacks
QB sacks and scrambles
double on 3rd downs
12
Hive
• Abstraction on top of MapReduce• Allows queries using a SQL-like
language
13
Hive Query
Give me every run by New Orleans in the 2010 season:
SELECT * FROM playbyplay WHERE playtype = "RUN"and year = 2010and game like "%NO%";
14
From the Data: Yards to Go
With 1 yard to go, 65% of plays are runs
15
Lost in data
Algorithm Alone
16
Data Janitorial
17
From the Data: Number of Plays By Yard Line
9% 41% 28% 18%
3%
Direction of Offense
18
Stadium
19
Figuring Out Stadium
20121119_CHI@SF
Date Played Away Team Home Team
20
From the Data: Stadium Attendance
Stadiums with the smallest capacities average the best
scores 20.55-17.79
21
Stadium DataStadium The capacity of the stadium
Expanded Capacity The expanded capacity of the stadium
Location The location of the stadium
Playing Surface The type of grass, etc that the stadium has
Is Artificial Is the playing surface artificial
Team The name of the team that plays at the stadium
Roof Type The type of roof in the stadium (None, Retractable, Dome)
Elevation The elevation of the stadium
22
From the Data: Stadium Elevation
There is a 1% increase in passes at Mile High versus sea
level stadiums
23
Weather
1,015 games had weather
24
From the Data: Fumble
Games with weather have a fumble 93%
of the timecompared to 56%
without
25
Weather Data
STATION Station identifier
STATION NAME Station location name
READING DATE Date of reading
PRCP Precipitation
AWND Average daily wind speed
WV20 Fog, ice fog, or freezing fog (may include heavy fog)
TMAX Maximum temperature
TMIN Minimum temperature
26
From the Data: Home Field Advantage
Baltimore has the biggest weather advantage 22-14
27
Arrests
28
Arrest Data
Season Player Arrested in (February to February)
Team Team person played on
Player Name of player Arrested
Player Arrested Was a player in the play arrested that season
Offense Player Arrested Offense had player arrested in season
Defense Player Arrested Defense had player arrested in season
Home Team Player Arrested Home Team had player arrested in season
Away Team Player Arrested Away Team had player arrested in season
29
Whenever there are arrests either in the home team, away team or both,
the home team
From 2002 to 2012, each team had many arrests. From to a low in 2002 of
56% to a
91%HIGH OF57%WINS
Arrest = Win?
30
“ ”
The data didn't bear out some of my preconceived notions
31
“ ”
If we had every piece of data about a game, could we determine its outcome?
32
The Low Downs
• /me - http://www.jesse-anderson.com• @jessetanderson• Code - https://github.com/eljefe6a/nfldata
*I am not in any way affiliated with the NFL or any Team
33
34
From the Data: Weather
Wind had the most effect on gamesAt calm winds 41% pass and 37% runAt >30 MPH 34% pass and 46% run
35
From the Data: Field Goals
Weather only increases misses by %114% of Field Goals are missed21% of Field Goals are missed 30-39 MPH average winds