A recurring theme in
the mathematics of
sports
Doug Ensley
Shippensburg University
April is Mathematics Awareness Month!
April is also…
•Alcohol Awareness Month
•National Oral Health Month
•Stress Awareness Month
•Jazz Appreciation Month
•Train Safety Month
Esoterica(pedia)
The longest known singles tennis game was one of 80
points between Anthony Fawcett (Rhodesia) and Keith
Glass (Great Britain) in the first round of the Surrey,
Great Britain Championships on 26 May 1975.
QUESTION:
What is the probability of this happening by chance?
What assumptions on a model of a tennis game can
account for this freakish phenomenon?
Scoring in tennis
Essentially a game in tennis is won by the first
player with 4 points, but that player must win by 2
points.
When the score is tied 3-3, 4-4, etc., we say the
score is at deuce.
After a deuce score, when the server is up one
point, we say the score is ad in, and when the
receiver is up one point, the score is ad out.
The 1975 Fawcett-Glass game had deuce 37 times.
Expected Value
Problem. What is the expected length of a tennis game which begins tied at deuce and in which player A wins a point with probability p?
Background: For a random (quantitative, discrete) variable X (e.g., number of points in a tennis game), the expected value of X is a weighted average of the possible values of X; specifically, if the possible values of X are v_0, v_1, v_2, …, then
\[ E(X) = \sum_{k} v_k \cdot \Pr(X = v_k) \]
Aside: Average Value
Suppose for the experiment of “choosing a
random member of the Ensley family” on
04/09/2010, we define the variable X = the
age of the person chosen. The following
table shows the four possible values of X
as well as the probability each is chosen.
Value 14 17 45 46
Pr(X=Value) 0.25 0.25 0.25 0.25
What is the average age of people in the
Ensley house today?
\[ E(X) = (14)(0.25) + (17)(0.25) + (45)(0.25) + (46)(0.25) = 30.5 \]
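The weighted-average definition can be sketched in a few lines of Python. The function name `expected_value` is a hypothetical helper chosen for illustration; the ages and probabilities are the ones in the table.

```python
def expected_value(values, probs):
    """Weighted average of the values, each weighted by its probability."""
    if abs(sum(probs) - 1.0) > 1e-12:
        raise ValueError("probabilities must sum to 1")
    return sum(v * p for v, p in zip(values, probs))

ages = [14, 17, 45, 46]      # possible values of X
weights = [0.25] * 4         # each family member is equally likely
average_age = expected_value(ages, weights)
print(average_age)
```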
Tennis, anyone?
Problem. What is the expected length of a tennis game which begins tied at deuce and in which player A wins a point with probability p?
Let X = the number of points that are played after
deuce. What is the set of all possible values of X?
According to the definition, the expected value is the
infinite series
\[ E(X) = \sum_{k=0}^{\infty} k \cdot \Pr(X = k) \]
NOTE: X takes only the even values 2, 4, 6, …, and the number of two-point rounds, X/2, follows what is called a geometric distribution
in probability theory.
Solution. Let S be the set of all outcomes of this
experiment. That is,
S = {AA, BB, ABAA, ABBB, BAAA, BABB, ABBAAA, …}
Hence, every element of S is either
• AA alone, or
• BB alone, or
• of the form AB____ , or
• of the form BA____ ,
where the blank is filled by any element of S.
Solution. S = {AA, BB, ABAA, ABBB, BAAA, BABB, …}
Let L be the average length of a string in S.
Elements of S        Probability      Length
• AA alone           p^2              2
• BB alone           (1 – p)^2        2
• AB____             p (1 – p)        2 + L
• BA____             (1 – p) p        2 + L
The average length L of elements of S satisfies the
equation
\[ L = 2p^2 + 2(1-p)^2 + 2p(1-p)(2+L) \]
which has solution
\[ L = \frac{2}{2p^2 - 2p + 1} = \frac{2}{p^2 + (1-p)^2} \]
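The closed form for L can be sanity-checked by simulation. This is a minimal sketch, assuming points are independent; the function names, the seed, the trial count, and the choice p = 0.6 are all illustrative assumptions rather than anything from the talk.

```python
import random

def expected_length_from_deuce(p):
    """Closed form L = 2 / (p^2 + (1 - p)^2)."""
    return 2.0 / (p * p + (1.0 - p) * (1.0 - p))

def simulate_length_from_deuce(p, rng):
    """Play points from deuce until one player leads by two; return points played."""
    lead = 0        # A's points minus B's points since deuce
    points = 0
    while abs(lead) < 2:
        lead += 1 if rng.random() < p else -1
        points += 1
    return points

rng = random.Random(2010)    # fixed seed so the run is repeatable
p = 0.6
trials = 200_000
simulated_mean = sum(simulate_length_from_deuce(p, rng) for _ in range(trials)) / trials
exact_mean = expected_length_from_deuce(p)
print(f"simulated {simulated_mean:.3f}  exact {exact_mean:.3f}")
```

The two numbers should agree to within sampling error, which shrinks as `trials` grows.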
Average length of a tennis game
beyond “deuce”
\[ L = \frac{2}{p^2 + (1-p)^2} \]
NOTE: Probability theory tells us that the variance of the distribution
of X is given by 8p(1-p)/(p^2+(1-p)^2)^2, which has maximum value of 8 (at p = 1/2).
A more general problem
In tennis a deuce point is always served from the right-hand
service court; an ad point is always served from the left-hand
service court. Tennis broadcasts often present data on players as if
there is no difference. While this might be sound at the highest
levels of tennis, it is certainly not true for amateur players. We will
try the previous solution method, allowing the probability p of winning
a deuce point and the probability q of winning an ad point to differ.
Seriously?
A more general problem
Problem. What is the expected length of a tennis game which begins tied at deuce and in which player A wins a deuce point with probability p and an ad point with probability q?
Solution. S = {AA, BB, ABAA, ABBB, BAAA, BABB, …}
Every element of S is either
• AA alone, or
• BB alone, or
• of the form AB____ , or
• of the form BA____ ,
where the blank is filled by any element of S.
Elements of S        Probability          Length
• AA alone           p∙q                  2
• BB alone           (1 – p)∙(1 – q)      2
• AB____             p∙(1 – q)            2 + L
• BA____             (1 – p)∙q            2 + L
The average length L of elements of S satisfies the
equation
\[ L = 2pq + 2(1-p)(1-q) + p(1-q)(2+L) + (1-p)q(2+L) \]
which has solution
\[ L = \frac{2}{2pq - p - q + 1} = \frac{2}{pq + (1-p)(1-q)} \]
As a function of p and q:
\[ f(p,q) = \frac{2}{pq + (1-p)(1-q)} \]
Examples
For q < 0.5, L is largest at p = 1, where L = 2/q and the variance of the game length is 4(1 – q)/q^2.
When q = 0.40, the maximum L is 5 with variance = 15.0.
When q = 0.30, the maximum L is 6.7 with variance = 31.1.
When q = 0.20, the maximum L is 10 with variance = 80.0.
When q = 0.10, the maximum L is 20 with variance = 360.0.
Even in the last, extreme case a 74-point game is about 2.8 standard deviations
above the mean.
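The mean and spread in examples like these can be recomputed directly. The sketch below assumes independent points, so that each two-point round ends the game with probability r = pq + (1 – p)(1 – q) and the number of rounds is geometric; the function name `deuce_length_stats` is a hypothetical helper.

```python
def deuce_length_stats(p, q):
    """Mean and variance of the number of points played after deuce.

    p: probability A wins a deuce-court point
    q: probability A wins an ad-court point
    The number of two-point rounds is geometric with success probability r,
    and the game length is twice the number of rounds.
    """
    r = p * q + (1.0 - p) * (1.0 - q)
    mean = 2.0 / r
    variance = 4.0 * (1.0 - r) / (r * r)
    return mean, variance

for q in (0.40, 0.30, 0.20, 0.10):
    mean, variance = deuce_length_stats(1.0, q)   # p = 1 maximizes L when q < 0.5
    z = (74 - mean) / variance ** 0.5
    print(f"q={q:.2f}  L={mean:.1f}  variance={variance:.1f}  "
          f"74 points is {z:.1f} sd above the mean")
```

With p = q the variance expression reduces to 8p(1 – p)/(p^2 + (1 – p)^2)^2.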
Theorem. The expected length L of a tennis game which begins tied at deuce and in which player A wins a deuce point with probability p and an ad point with probability q is given by
\[ L = \frac{2}{2pq - p - q + 1} = \frac{2}{pq + (1-p)(1-q)} \]
Alternative game scoring
Some tennis matches or leagues employ "No-Ad" scoring.
Each game proceeds as in regular tennis scoring, but if
the score reaches deuce, then the winner of the next
point, the seventh in the game, wins the game. The
receiver selects which court to receive in. No-ad scoring
is most notably used in World Team Tennis, in many
recreational leagues, and some Major Mixed doubles
events.
Note: This scoring system assumes that the server is not
equally effective from deuce and ad courts.
Other problems to approach
Problem. What is the probability that player A (who has probability p of winning a point) wins a tennis game that begins tied at deuce?
Solution. Let t represent the probability of player A winning once the game is tied at deuce. Use recursive thinking to justify the equation
\[ t = p^2 + p(1-p)\,t + (1-p)\,p\,t \]
Solving this equation yields
\[ t = \frac{p^2}{p^2 + (1-p)^2} \]
Other problems
Probability of winning a tennis game
\[ \Pr(A) = \frac{p^2}{p^2 + (1-p)^2} \]
What does the data say?
In the 2009 Wimbledon Championship, Andy
Roddick won 71% of his service points and
Roger Federer won 78% of his service points.
Federer won 95% (35 of 37) of his service
games and Roddick won 97% (37 of 38) of his
service games. (There were two tie-breakers
played, split between the two players.)
Based on these point probabilities, the model
predicts 86% of service games won by Roddick
and 93% by Federer.
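The prediction can be recomputed from the quoted service-point percentages. This sketch assumes the model in question is the from-deuce formula t = p^2/(p^2 + (1 – p)^2), which treats every service game as if it started at deuce; `win_from_deuce` is a hypothetical helper name.

```python
def win_from_deuce(p):
    """Probability the server wins from deuce: t = p^2 / (p^2 + (1 - p)^2)."""
    return p * p / (p * p + (1.0 - p) * (1.0 - p))

service_point_rates = {"Roddick": 0.71, "Federer": 0.78}       # quoted point percentages
observed_game_rates = {"Roddick": 37 / 38, "Federer": 35 / 37}  # quoted service games

for player, p in service_point_rates.items():
    predicted = win_from_deuce(p)
    observed = observed_game_rates[player]
    print(f"{player}: predicted {predicted:.0%}, observed {observed:.0%}")
```

Because t increases with p, the player with the higher service-point percentage always gets the higher predicted share of service games.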
Tennis as a gambling problem
Suppose two players A and B have $2 each,
and they play a sequence of games with
$1 at stake each time until someone is out
of money. This is also known as the
Gambler’s Ruin – it generalizes nicely to
other starting values.
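The link to tennis can be checked with exact arithmetic: starting with $2 each is the same situation as deuce, where a player must get two ahead. The sketch below uses the standard Gambler's Ruin formula; both function names are hypothetical, and p = 2/3 is just an example value.

```python
from fractions import Fraction

def ruin_win_probability(p, stake_a, stake_b):
    """Chance A ends with all the money when A wins each $1 round with probability p."""
    total = stake_a + stake_b
    if p == Fraction(1, 2):
        return Fraction(stake_a, total)
    ratio = (1 - p) / p
    return (1 - ratio ** stake_a) / (1 - ratio ** total)

def win_from_deuce(p):
    """Probability A wins a tennis game from deuce."""
    return p * p / (p * p + (1 - p) * (1 - p))

p = Fraction(2, 3)
print(ruin_win_probability(p, 2, 2), win_from_deuce(p))
```

Changing the two stakes gives the generalization to other starting values.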
Tennis as a board game
State 1: A wins game
State 2: A up 1 point
State 3: DEUCE
State 4: B up 1 point
State 5: B wins game
Markov chains
Transition Matrix.
Suppose the probability of A winning any point is 2/3. The transition matrix M gives the probabilities of moving between states in one point: the entry in Row i, Column j is the probability of moving from State i to State j.
\[
M = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
2/3 & 0 & 1/3 & 0 & 0 \\
0 & 2/3 & 0 & 1/3 & 0 \\
0 & 0 & 2/3 & 0 & 1/3 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
\]
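One way to write this matrix down in code is as a list of rows, with exact fractions so nothing is lost to rounding. The function name `deuce_transition_matrix` is a hypothetical helper; the state order follows the list of five states.

```python
from fractions import Fraction

def deuce_transition_matrix(p):
    """Transition matrix over the states
    1: A wins, 2: A up 1, 3: deuce, 4: B up 1, 5: B wins,
    where p is the probability that A wins any single point."""
    lose = 1 - p
    return [
        [1, 0, 0,    0,    0],     # A has won: absorbing state
        [p, 0, lose, 0,    0],     # A up 1: A wins the game, or back to deuce
        [0, p, 0,    lose, 0],     # deuce: A up 1, or B up 1
        [0, 0, p,    0,    lose],  # B up 1: back to deuce, or B wins the game
        [0, 0, 0,    0,    1],     # B has won: absorbing state
    ]

M = deuce_transition_matrix(Fraction(2, 3))
for row in M:
    print([str(entry) for entry in row])
```

Each row sums to 1, since from any state the game must move somewhere on the next point.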
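Markov chains
Matrix multiplication
\[
M \cdot M =
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
2/3 & 0 & 1/3 & 0 & 0 \\
0 & 2/3 & 0 & 1/3 & 0 \\
0 & 0 & 2/3 & 0 & 1/3 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
2/3 & 0 & 1/3 & 0 & 0 \\
0 & 2/3 & 0 & 1/3 & 0 \\
0 & 0 & 2/3 & 0 & 1/3 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
\]
Row 3 times Column 3 …
\[
\begin{pmatrix} 0 & 2/3 & 0 & 1/3 & 0 \end{pmatrix}
\begin{pmatrix} 0 \\ 1/3 \\ 0 \\ 2/3 \\ 0 \end{pmatrix}
= (0)(0) + (2/3)(1/3) + (0)(0) + (1/3)(2/3) + (0)(0) = 4/9
\]
… gives the probability of going from State 3 to State 3 in
two moves.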
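Markov chains
Matrix multiplication
\[
M^2 =
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
2/3 & 0 & 1/3 & 0 & 0 \\
0 & 2/3 & 0 & 1/3 & 0 \\
0 & 0 & 2/3 & 0 & 1/3 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}^{2}
=
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
2/3 & 2/9 & 0 & 1/9 & 0 \\
4/9 & 0 & 4/9 & 0 & 1/9 \\
0 & 4/9 & 0 & 2/9 & 1/3 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
\]
The entry in Row i, Column j of M^2 is the probability of the game
progressing from State i to State j in exactly 2 moves.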
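The two-step matrix can be reproduced with a short, dependency-free matrix product. This is a sketch with exact fractions; `mat_mul` and the variable names are illustrative choices.

```python
from fractions import Fraction

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

win, lose = Fraction(2, 3), Fraction(1, 3)   # A wins / loses any single point
M = [
    [1,   0,   0,    0,    0],
    [win, 0,   lose, 0,    0],
    [0,   win, 0,    lose, 0],
    [0,   0,   win,  0,    lose],
    [0,   0,   0,    0,    1],
]
M2 = mat_mul(M, M)
for row in M2:
    print([str(entry) for entry in row])
```

Python indexes from 0, so Row 3, Column 3 of the matrix is `M2[2][2]`.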
Markov chains
General Matrix Powers
If M is a transition matrix for a game, then
the entry in Row i, Column j of M^k is
the probability of the game progressing
from State i to State j in exactly k
moves.
This allows us to compute the probability that a
game lasts a specified number of points!
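Markov chains
\[
M^{74} =
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
2/3 & 0 & 1/3 & 0 & 0 \\
0 & 2/3 & 0 & 1/3 & 0 \\
0 & 0 & 2/3 & 0 & 1/3 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}^{74}
\approx
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0.933 & 5 \times 10^{-14} & 0 & 2 \times 10^{-14} & 0.067 \\
0.800 & 0 & 9 \times 10^{-14} & 0 & 0.200 \\
0.533 & 9 \times 10^{-14} & 0 & 5 \times 10^{-14} & 0.467 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
\]
This shows that the probability that the game is still going on after 74 points is about 10^-13.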
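A matrix power like this one is quick to compute by repeated squaring. The sketch below uses plain floating-point lists rather than any matrix library; `mat_mul`, `mat_pow`, and `still_playing` are illustrative names.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(m, k):
    """k-th power of a square matrix by repeated squaring."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    while k > 0:
        if k & 1:
            result = mat_mul(result, m)
        m = mat_mul(m, m)
        k >>= 1
    return result

p = 2 / 3          # probability that A wins any single point
M = [
    [1.0, 0.0, 0.0,   0.0,   0.0],
    [p,   0.0, 1 - p, 0.0,   0.0],
    [0.0, p,   0.0,   1 - p, 0.0],
    [0.0, 0.0, p,     0.0,   1 - p],
    [0.0, 0.0, 0.0,   0.0,   1.0],
]
M74 = mat_pow(M, 74)
# Start at deuce (State 3); the game is unfinished if it sits in States 2, 3 or 4.
still_playing = sum(M74[2][1:4])
print(f"{still_playing:.1e}")
```

The first and last entries of the same row, `M74[2][0]` and `M74[2][4]`, are the probabilities that A or B has already won by then.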
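Probability of long games
With p = q = 0.5 …
\[
M^{74} =
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
1/2 & 0 & 1/2 & 0 & 0 \\
0 & 1/2 & 0 & 1/2 & 0 \\
0 & 0 & 1/2 & 0 & 1/2 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}^{74}
\approx
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0.750 & 4 \times 10^{-12} & 0 & 4 \times 10^{-12} & 0.250 \\
0.500 & 0 & 7 \times 10^{-12} & 0 & 0.500 \\
0.250 & 4 \times 10^{-12} & 0 & 4 \times 10^{-12} & 0.750 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
\]
This shows that the probability that the game is still going on after 74 points is about 10^-11. This is the largest value this probability can take in the case where p = q.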
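Probability of long games
With p = 0.67 and q = 0.25 …
\[
M^{74} =
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
1/4 & 0 & 3/4 & 0 & 0 \\
0 & 2/3 & 0 & 1/3 & 0 \\
0 & 0 & 1/4 & 0 & 3/4 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}^{74}
\approx
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0.550 & 2 \times 10^{-9} & 0 & 9 \times 10^{-10} & 0.450 \\
0.400 & 0 & 2 \times 10^{-9} & 0 & 0.600 \\
0.100 & 6 \times 10^{-10} & 0 & 3 \times 10^{-10} & 0.900 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
\]
This shows that the probability that the game is still going on after 74 points is about 10^-9.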
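Probability of long games
With p = 0.90 and q = 0.05 …
\[
M^{74} =
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
1/20 & 0 & 19/20 & 0 & 0 \\
0 & 9/10 & 0 & 1/10 & 0 \\
0 & 0 & 1/20 & 0 & 19/20 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}^{74}
\approx
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0.354 & 4 \times 10^{-3} & 0 & 4 \times 10^{-4} & 0.642 \\
0.320 & 0 & 4 \times 10^{-3} & 0 & 0.676 \\
0.02 & 2 \times 10^{-4} & 0 & 2 \times 10^{-5} & 0.984 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
\]
This shows that the probability that the game is still going on after 74 points is about 0.004.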
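For a game that starts at deuce, the same survival probabilities can be cross-checked without any matrices: the game is undecided after 74 points only if each of the 37 two-point rounds returned to deuce. The sketch below assumes independent points; `still_at_play` is a hypothetical helper name, and the four (p, q) pairs are the cases considered in these matrix examples.

```python
def still_at_play(p, q, points=74):
    """Probability that a game starting at deuce is undecided after `points` points.

    p: probability A wins a deuce point; q: probability A wins an ad point.
    Each two-point round returns to deuce with probability p(1 - q) + (1 - p)q.
    """
    rounds = points // 2
    back_to_deuce = p * (1 - q) + (1 - p) * q
    return back_to_deuce ** rounds

cases = [(2 / 3, 2 / 3), (0.5, 0.5), (2 / 3, 0.25), (0.90, 0.05)]
for p, q in cases:
    print(f"p={p:.2f}  q={q:.2f}  still playing: {still_at_play(p, q):.1e}")
```

The closer p(1 – q) + (1 – p)q is to 1, that is, the more lopsided the two service courts are in opposite directions, the more likely a very long game becomes.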
Other sports with recurrence
Baseball
A game cannot end in a tie so additional whole innings are played until there is
a winner. The longest professional baseball game was a 33-inning affair
played in 1981 at McCoy Stadium in Pawtucket, Rhode Island.
Notes.
Cal Ripken, Jr. was 2 for 13 for Rochester in this game.
Wade Boggs went 4 for 12 for Pawtucket.
Other sports with recurrence
Baseball
An “at bat” can last any number of pitches. We can list the possible states as
counts of 0-0, 0-1, 1-0, 1-1, 0-2, 2-0, 1-2, 2-1, 3-0, 2-2, 3-1 or 3-2, base hit,
strike out, or base on balls. We can then relate the probability p of getting a
hit on any given pitch with the official batting average.
There are no official records for number of pitches in an “at bat,” but here is
some baseball lore:
• Alex Cora had an 18-pitch at bat against Matt Clement in 2004.
• Roy Thomas (1901) supposedly had a 29-pitch at bat. His ability to foul
away pitches supposedly brought about a rule change re: foul balls.
• Luke Appling supposedly fouled off 17 straight pitches before hitting a
triple.
• Phillies’ pitcher Brett Myers had a 9-pitch at bat against CC Sabathia in the
2008 playoffs.
More tennis esoterica
Most games in a singles match before the introduction of the tiebreaker:
In 1969 at Wimbledon, Pancho Gonzales took 112 games to defeat Charlie
Pasarell in the first round 22–24, 1–6, 16–14, 6–3, 11–9.
Most games in a singles match after the introduction of the tiebreaker: In
2003 at the Australian Open, Andy Roddick took 83 games to defeat Younes
El Aynaoui in the quarterfinals 4–6, 7–6(5), 4–6, 6–4, 21–19.
Most games in a doubles match before the introduction of the tiebreaker:
In the American Zone Final of the 1973 Davis Cup, the United States team
of Stan Smith and Erik Van Dillen took 122 games to defeat the Chile team
of Patricio Cornejo and Jaime Fillol 7–9, 37–39, 8–6, 6–1, 6–3.
Most games in a doubles match after the introduction of the tiebreaker:
In 2007 at Wimbledon, the team of Marcelo Melo and André Sá took 102
games to defeat the team of Paul Hanley and Kevin Ullyett 5–7, 7–6(4), 4–6,
7–6(7), 28–26.
References
Math Awareness Month at http://www.mathaware.org
Tennis Statistics at http://www.atpworldtour.com/
Baseball Statistics at http://www.baseball-reference.com/
Doug Ensley