“May the Best Man Win!” Simulation optimization for match-making in e-sports
Ilya O. Ryzhov (Robert H. Smith School of Business, University of Maryland, College Park, MD 20742)
Awais Tariq and Warren B. Powell (Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544)
INFORMS Annual Meeting, November 15, 2011
Outline
1 Introduction
2 TrueSkill™ model for learning skill levels
  Learning with moment-matching
  The DrawChance policy
3 Match-making with knowledge gradients
4 Moving on: Targeting and selection
5 Conclusions
Motivation: e-sports
The term “e-sports” refers to competitive multi-player online gaming
Thousands of players simultaneously log on to networks such as Xbox Live or Battle.net
Motivation: e-sports
Revenues of South Korean game company NCSoft, 2000-2004 (Huhh 2008).
E-sports have become culturally significant and very profitable
Top players from around the world compete professionally
In 2005, Xbox Live had over 2 million subscribers; Battle.net has over 3 million registered players for a single game
Ranking and competition in e-sports
Game services and outside organizations create rankings of players
Casual players are matched up automatically according to their skill level
Simulation optimization for match-making
We would like to create fair and challenging games by matching players of similar skill level
The TrueSkill™ system used by Xbox Live views this as a Bayesian learning problem in which we sequentially learn players’ skills
Unlike e.g. multi-armed bandit problems (Gittins 1989), the goal is to match a target rather than find the most skilled player
We compare a value-of-information procedure to the greedy policy used by Microsoft
Mathematical model
Player $i = 0, 1, \ldots, M$ has an underlying skill level $s_i$, unknown to the game master.
Our uncertainty about $s_i$ is expressed as
$$s_i \sim N\!\left(\mu_i^0, \left(\sigma_i^0\right)^2\right).$$
The performance of player $i$ in a game is expressed as
$$p_i \sim N\!\left(s_i, \sigma_\varepsilon^2\right).$$
We assume that performances and skill levels are independent.
Non-conjugacy of Bayesian model
We say that player $i$ beats player $j$ if $p_i > p_j$ in a game between these players.
Unfortunately, we never observe the exact values of $p_i$ or $p_j$, only which player won.
Thus, the posterior belief
$$P(s_i \in ds \mid p_i > p_j) = \frac{P(p_i > p_j \mid s_i = s)\, P(s_i \in ds)}{P(p_i > p_j)}$$
is non-normal.
Conjugacy is forced using moment-matching (Minka 2001): plug the mean and variance of the posterior into a normal distribution.
Moment-matching for approximate conjugacy
Given the beliefs at time $n$, and the outcome of game $n+1$ between $i$ and $j$, update (Dangauthier et al. 2007)
$$\mu_i^{n+1} = \begin{cases} \mu_i^n + \dfrac{(\sigma_i^n)^2}{\bar\sigma_{ij}^n}\, v\!\left(\dfrac{\mu_i^n - \mu_j^n}{\bar\sigma_{ij}^n}\right) & \text{if } p_i^{n+1} > p_j^{n+1}, \\[6pt] \mu_i^n - \dfrac{(\sigma_i^n)^2}{\bar\sigma_{ij}^n}\, v\!\left(\dfrac{\mu_j^n - \mu_i^n}{\bar\sigma_{ij}^n}\right) & \text{if } p_i^{n+1} < p_j^{n+1}, \end{cases}$$
$$\left(\sigma_i^{n+1}\right)^2 = \begin{cases} (\sigma_i^n)^2 \left(1 - \dfrac{(\sigma_i^n)^2}{\bar\sigma_{ij}^n}\, w\!\left(\dfrac{\mu_i^n - \mu_j^n}{\bar\sigma_{ij}^n}\right)\right) & \text{if } p_i^{n+1} > p_j^{n+1}, \\[6pt] (\sigma_i^n)^2 \left(1 - \dfrac{(\sigma_i^n)^2}{\bar\sigma_{ij}^n}\, w\!\left(\dfrac{\mu_j^n - \mu_i^n}{\bar\sigma_{ij}^n}\right)\right) & \text{if } p_i^{n+1} < p_j^{n+1}, \end{cases}$$
with $v(x) = \frac{\phi(x)}{\Phi(x)}$, $w(x) = v(x)\left(v(x) + x\right)$, and
$$\bar\sigma_{ij}^n = (\sigma_i^n)^2 + \left(\sigma_j^n\right)^2 + 2\sigma_\varepsilon^2.$$
Intuitively: increase our skill estimate for the winning player.
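The update above translates directly into code. Below is a minimal sketch that follows the formulas exactly as stated on this slide (note that $\bar\sigma_{ij}^n$ is defined here as a sum of variances); the function name and prior values are illustrative, not part of the original.

```python
from statistics import NormalDist

_nd = NormalDist()

def v(x):
    # v(x) = phi(x) / Phi(x)
    return _nd.pdf(x) / _nd.cdf(x)

def w(x):
    # w(x) = v(x) * (v(x) + x)
    return v(x) * (v(x) + x)

def update_player(mu_i, var_i, mu_j, var_j, var_eps, i_wins):
    """Moment-matched update of player i's beliefs after one game."""
    sbar = var_i + var_j + 2.0 * var_eps   # sigma_bar: sum of variances
    d = (mu_i - mu_j) if i_wins else (mu_j - mu_i)
    sign = 1.0 if i_wins else -1.0
    mu_new = mu_i + sign * (var_i / sbar) * v(d / sbar)
    var_new = var_i * (1.0 - (var_i / sbar) * w(d / sbar))
    return mu_new, var_new

# Illustrative priors for two evenly matched players
mu, var = update_player(25.0, 69.0, 25.0, 69.0, 9.0, i_wins=True)
```

As the slide says, a win pushes the mean estimate up; the variance shrinks in either case, reflecting the information gained.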
Choosing an opponent
In Dangauthier et al. (2007), a game between $i$ and $j$ ends in a draw if
$$|p_i - p_j| < \delta$$
for some small $\delta > 0$.
The draw probability
After $n$ games, our prediction that the $(n+1)$st game will end in a draw is
$$P^n\!\left(\left|p_i^{n+1} - p_j^{n+1}\right| < \delta\right) = \mathbb{E}^n P^n\!\left(\left|p_i^{n+1} - p_j^{n+1}\right| < \delta \mid s_i, s_j\right).$$
For very small $\delta$,
$$P^n\!\left(\left|p_i^{n+1} - p_j^{n+1}\right| < \delta \mid s_i, s_j\right) \approx \frac{1}{\sqrt{2\pi\left(2\sigma_\varepsilon^2\right)}}\, e^{-\frac{(s_i - s_j)^2}{4\sigma_\varepsilon^2}}\, \delta.$$
We take $\delta \to 0$ and define the draw probability as
$$q_{ij}^n = \mathbb{E}^n\!\left[\frac{1}{\sqrt{2\pi\left(2\sigma_\varepsilon^2\right)}}\, e^{-\frac{(s_i - s_j)^2}{4\sigma_\varepsilon^2}}\right].$$
Choosing an opponent
Thus, the “probability” of a draw between players $i$ and $j$ is
$$q_{ij}^n = \frac{1}{\sqrt{2\pi}}\, \frac{1}{\sqrt{(\sigma_i^n)^2 + (\sigma_j^n)^2 + 2\sigma_\varepsilon^2}}\, \exp\!\left(-\frac{(\mu_i^n - \mu_j^n)^2}{2\left((\sigma_i^n)^2 + (\sigma_j^n)^2 + 2\sigma_\varepsilon^2\right)}\right).$$
We expect the game to be more competitive when this quantity is higher.
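The closed form is just a Gaussian density evaluated at a mean gap of zero, so it is one line of code. The following sketch (function name ours, parameter values illustrative) confirms the intuition that closer means yield a higher draw probability:

```python
from math import sqrt, exp, pi

def draw_prob(mu_i, var_i, mu_j, var_j, var_eps):
    # q^n_ij: Gaussian density with total variance
    # (sigma_i^n)^2 + (sigma_j^n)^2 + 2*sigma_eps^2, evaluated at mu_i - mu_j
    s = var_i + var_j + 2.0 * var_eps
    return exp(-(mu_i - mu_j) ** 2 / (2.0 * s)) / sqrt(2.0 * pi * s)
```

Note that the formula is symmetric in $i$ and $j$, as a match-quality score should be.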
Connection to DrawChance
The DrawChance formula given in Herbrich et al. (2006) is
$$\tilde q_{ij}^n = \sqrt{\frac{2\sigma_\varepsilon^2}{(\sigma_i^n)^2 + (\sigma_j^n)^2 + 2\sigma_\varepsilon^2}}\, \exp\!\left(-\frac{(\mu_i^n - \mu_j^n)^2}{2\left((\sigma_i^n)^2 + (\sigma_j^n)^2 + 2\sigma_\varepsilon^2\right)}\right),$$
which is identical to $q_{ij}^n$ up to a constant scale factor.
The DrawChance policy used by Xbox Live greedily selects the match-up with the highest draw probability.
Simulation optimization for match-making
We interpret the match-making problem as online simulation optimization with the objective
$$\sup_\pi \; \sum_{n=0}^{N} q^n_{0,\, X^\pi(\mu^n, \sigma^n)},$$
where $\pi$ is a policy for choosing opponents for a fixed player 0.
The concept of value of information (Chick 2006) looks ahead to the outcome of the next decision.
This approach can be adapted to many types of objective functions (Frazier et al. 2008; Ryzhov & Powell 2011a).
Prediction of game outcome
Our Bayesian beliefs provide us with an (approximate) estimate of the outcome of a hypothetical game between $i$ and $j$:

Proposition. Under the normality assumption, the probability that player $i$ beats player $j$ in game $n+1$ is given by
$$P^n\!\left(p_i^{n+1} > p_j^{n+1}\right) = \Phi\!\left(\frac{\mu_i^n - \mu_j^n}{\sqrt{(\sigma_i^n)^2 + (\sigma_j^n)^2 + 2\sigma_\varepsilon^2}}\right).$$
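The proposition is a single call to the normal CDF; a minimal sketch (function name ours, values illustrative):

```python
from math import sqrt
from statistics import NormalDist

def win_prob(mu_i, var_i, mu_j, var_j, var_eps):
    # P^n(i beats j) = Phi((mu_i - mu_j) / sqrt(var_i + var_j + 2*var_eps))
    return NormalDist().cdf((mu_i - mu_j) / sqrt(var_i + var_j + 2.0 * var_eps))
```

Evenly matched players get probability 1/2, and win_prob(i, j) + win_prob(j, i) = 1, as the formula requires.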
Prediction of game outcome
Proof.
We compute
$$P^n\!\left(p_i^{n+1} > p_j^{n+1}\right) = \mathbb{E}^n \Phi\!\left(\frac{s_i - s_j}{\sqrt{2\sigma_\varepsilon^2}}\right) = \int_{-\infty}^{\infty} \Phi\!\left(\frac{x}{\sqrt{2\sigma_\varepsilon^2}}\right) \frac{1}{\sqrt{2\pi\left((\sigma_i^n)^2 + (\sigma_j^n)^2\right)}}\, e^{-\frac{\left(x - (\mu_i^n - \mu_j^n)\right)^2}{2\left((\sigma_i^n)^2 + (\sigma_j^n)^2\right)}}\, dx,$$
and recast the last line as $P(X \le Y)$, where $X \sim N\!\left(0, 2\sigma_\varepsilon^2\right)$ and $Y \sim N\!\left(\mu_i^n - \mu_j^n, (\sigma_i^n)^2 + (\sigma_j^n)^2\right)$ are independent.
Value of information in match-making
Let
$$k^{w,n+1} = \left(\mu^{w,n+1}, \sigma^{w,n+1}\right), \qquad k^{l,n+1} = \left(\mu^{l,n+1}, \sigma^{l,n+1}\right)$$
be the beliefs that we would have at time $n+1$ if player 0 wins (or loses) against $j$.
Similarly, let $q^{w,n+1}_{0i}$ (or $q^{l,n+1}_{0i}$) be the draw probabilities if player 0 wins (or loses).
The greedy policy would arrange the next game by computing
$$F^{w,n+1} = \max_i q^{w,n+1}_{0i}, \qquad F^{l,n+1} = \max_i q^{l,n+1}_{0i},$$
depending on what happens now.
Value of information in match-making
If we stop learning after the next game, the optimal match-up is
$$X^n = \arg\max_j \; q^n_{0j} + (N - n)\, F^n_j,$$
where
$$F^n_j = P^n(0 \text{ beats } j)\, F^{w,n+1} + P^n(j \text{ beats } 0)\, F^{l,n+1}$$
is the expected value (pre-game) of the highest draw probability (post-game).
If the total number $N$ of games is unknown, use
$$X^n = \arg\max_j \; q^n_{0j} + \frac{\gamma}{1 - \gamma}\, F^n_j,$$
where $\gamma$ is tunable.
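The one-step lookahead can be sketched end to end. The code below is a simplified illustration under one stated assumption: for brevity it updates only player 0's beliefs after the hypothetical game (the full procedure would also update opponent $j$'s beliefs). All function names are ours.

```python
from math import sqrt, exp, pi
from statistics import NormalDist

_nd = NormalDist()

def v(x):
    return _nd.pdf(x) / _nd.cdf(x)

def w(x):
    return v(x) * (v(x) + x)

def update0(mu0, var0, mu_j, var_j, var_eps, wins):
    # Moment-matched update of player 0's beliefs (formulas of slide 11)
    sbar = var0 + var_j + 2.0 * var_eps
    d = (mu0 - mu_j) if wins else (mu_j - mu0)
    sign = 1.0 if wins else -1.0
    return (mu0 + sign * (var0 / sbar) * v(d / sbar),
            var0 * (1.0 - (var0 / sbar) * w(d / sbar)))

def draw_prob(mu_i, var_i, mu_j, var_j, var_eps):
    s = var_i + var_j + 2.0 * var_eps
    return exp(-(mu_i - mu_j) ** 2 / (2.0 * s)) / sqrt(2.0 * pi * s)

def win_prob(mu_i, var_i, mu_j, var_j, var_eps):
    return _nd.cdf((mu_i - mu_j) / sqrt(var_i + var_j + 2.0 * var_eps))

def choose_opponent(mu, var, var_eps, n, N):
    """One-step lookahead: argmax_j q_0j + (N - n) * F_j.
    mu[0], var[0] are beliefs about player 0; indices 1..M are opponents."""
    best_j, best_val = None, float("-inf")
    for j in range(1, len(mu)):
        q0j = draw_prob(mu[0], var[0], mu[j], var[j], var_eps)
        # Post-game beliefs about player 0 if 0 wins / loses against j
        mw, vw = update0(mu[0], var[0], mu[j], var[j], var_eps, True)
        ml, vl = update0(mu[0], var[0], mu[j], var[j], var_eps, False)
        # Highest draw probability attainable after the game
        Fw = max(draw_prob(mw, vw, mu[i], var[i], var_eps)
                 for i in range(1, len(mu)))
        Fl = max(draw_prob(ml, vl, mu[i], var[i], var_eps)
                 for i in range(1, len(mu)))
        p = win_prob(mu[0], var[0], mu[j], var[j], var_eps)
        Fj = p * Fw + (1.0 - p) * Fl
        val = q0j + (N - n) * Fj
        if val > best_val:
            best_j, best_val = j, val
    return best_j
```

With two opponents, one evenly matched and one much stronger, the policy prefers the even match: the immediate draw probability dominates and the lookahead terms are nearly identical.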
Experimental results: draw probabilities
In simulations, our method behaved more aggressively than DrawChance...
Experimental results: difference in true skills
...pursued tougher opponents early on, but found better matches later...
Experimental results: errors of estimates
...produced better estimates of player 0’s true skill...
Experimental results: win/loss ratios
...and came closer to a 0.5 win/loss ratio.
...but that’s not the end!
In simulation optimization, we might tune a simulator to see how performance of a system could be improved
But before the simulator can be optimized, we need to make sure that it is a good model of reality
Targeting and selection: which simulation model most closely matches data from the field?
Targeting and selection
Let $c$ be a deterministic target (e.g. average historical performance) and consider $M$ simulation models.
The mean output $s_i$ of model $i$ matches the target if
$$|s_i - c| < \delta.$$
We can simulate system $i$ to obtain a noisy observation
$$p_i \sim N\!\left(s_i, \sigma_\varepsilon^2\right),$$
and apply Bayesian updating, with no moment-matching required.
The “draw probability” in this context is given by
$$q_i^n = \frac{1}{\sqrt{2\pi}}\, \frac{1}{\sqrt{(\sigma_i^n)^2 + \sigma_\varepsilon^2}}\, \exp\!\left(-\frac{(\mu_i^n - c)^2}{2\left((\sigma_i^n)^2 + \sigma_\varepsilon^2\right)}\right).$$
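This is the same Gaussian form as the match-making draw probability, with the stochastic opponent replaced by the deterministic target $c$; a minimal sketch (function name ours):

```python
from math import sqrt, exp, pi

def target_match_prob(mu_i, var_i, c, var_eps):
    # q_i^n: Gaussian density with variance (sigma_i^n)^2 + sigma_eps^2,
    # evaluated at the gap between the belief mean and the target c
    s = var_i + var_eps
    return exp(-(mu_i - c) ** 2 / (2.0 * s)) / sqrt(2.0 * pi * s)
```

The score rises both as the mean estimate approaches the target and, on target, as the belief variance shrinks.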
The value of information
Our goal is to maximize
$$\sup_\pi \; \mathbb{E} \max_i q_i^N,$$
or its online equivalent, if we are refining an existing simulator.
Bayesian analysis tells us that, conditional on our beliefs at time $n$,
$$\mu_i^{n+1} \sim N\!\left(\mu_i^n, (\tilde\sigma_i^n)^2\right), \quad \text{where } (\tilde\sigma_i^n)^2 = (\sigma_i^n)^2 - \left(\sigma_i^{n+1}\right)^2.$$
The knowledge gradient approach simulates the system
$$X^n = \arg\max_i \; \mathbb{E}_i^n \max_j q_j^{n+1},$$
which is expected to yield the best result after the simulation.
Issues for further work
The quantity $\mathbb{E}_i^n \max_j q_j^{n+1}$ can be computed in closed form
If $i$ is believed to be suboptimal with high precision, one simulation may yield no information (see also Ryzhov & Powell 2011b)
Conclusions
We have studied online match-making through the framework of online optimal learning
In simulations, a look-ahead policy offers some improvement over a greedy policy
The formulation of the problem has interesting implications for future work in simulation optimization (Ryzhov 2011)
References
Chick, S.E. (2006) “Subjective probability and Bayesian methodology.” In Handbooks of Operations Research and Management Science 13, 225–258.
Dangauthier, P., Herbrich, R., Minka, T. & Graepel, T. (2007) “TrueSkill through time: revisiting the history of chess.” In Advances in Neural Information Processing Systems 20, 337–344.
Frazier, P.I., Powell, W. & Dayanik, S. (2008) “A knowledge-gradient policy for sequential information collection.” SIAM J. on Control and Optimization 47:5, 2410–2439.
Gittins, J. (1989) Multi-armed Bandit Allocation Indices. John Wiley and Sons.
Herbrich, R., Minka, T. & Graepel, T. (2006) “TrueSkill™: a Bayesian skill rating system.” In Advances in Neural Information Processing Systems 19, 569–576.
Huhh, J. (2008) “Culture and business of PC bangs in Korea.” Games and Culture 3:1, 26–37.
Minka, T. (2001) “A family of algorithms for approximate Bayesian inference.” Ph.D. thesis, MIT.
Ryzhov, I.O. (2011) “Targeting and selection: a new approach to simulation validation.” In preparation.
Ryzhov, I.O. & Powell, W.B. (2011a) “Information collection on a graph.” Operations Research 59:1, 188–201.
Ryzhov, I.O. & Powell, W.B. (2011b) “The value of information in multi-armed bandits with exponentially distributed rewards.” Proceedings of the 2011 International Conference on Computational Science, 1363–1372.