Algorithmic Trading with Learning
Ryerson University
Damir Kinzebulatov1
(Fields Institute)
joint work with
Alvaro Cartea (University College London) and
Sebastian Jaimungal (University of Toronto)
1 www.math.toronto.edu/dkinz
Asset price St
Suppose that at time t < T the trader has a prediction about S_T .
The prediction specifies the distribution of the random variable S_T ,
e.g. in High Frequency trading, using Data Analysis algorithms:
S_T − S_0 =
  +2 · 10⁻²  with prob 0.10
  +10⁻²      with prob 0.20
   0         with prob 0.55
  −10⁻²      with prob 0.10
  −2 · 10⁻²  with prob 0.05
Naive strategy:
if E[S_T] > S_t ⇒ buy
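A quick numerical check of the naive rule against the five-point prediction above (the starting and current midprices are illustrative assumptions):

```python
# Naive rule: buy if E[S_T] > S_t.
# Five-point prediction for S_T - S_0 from the earlier slide.
outcomes = [0.02, 0.01, 0.0, -0.01, -0.02]
probs = [0.10, 0.20, 0.55, 0.10, 0.05]

S0 = 1.00   # illustrative starting midprice (assumption)
St = 1.00   # illustrative current midprice (assumption)

E_ST = S0 + sum(x * p for x, p in zip(outcomes, probs))
print("buy" if E_ST > St else "do nothing")   # prints "buy"
```

Here E[S_T] − S_0 = 0.002 > 0, so the naive rule says buy regardless of how the realized path behaves.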
Advanced strategy:
– would incorporate prediction ST in the asset price process St
– would learn from the realized dynamics of the asset price
– incorporate prediction ST in the asset price process St . . .
A three-point prediction... S_T = −5, 0, 5 with prob 0.7, 0.2, 0.1
[Figure: simulated midprice path for the three-point prediction; axes: Time vs. Midprice.]
Story 1: Asset price as a randomized Brownian bridge
Recall:
A Brownian bridge β_{tT} is a Gaussian process such that

β_{0T} = β_{TT} = 0,   β_{tT} ∼ N(0, t(T − t)/T)
Algorithmic trading with learning – our model
S_t is a “randomized Brownian bridge” (RBB)

S_t = S_0 + σ β_{tT} + (t/T) D

D – random change in the asset price (the distribution of D is known a priori)
β_{tT} – Brownian bridge (‘noise’) independent of D
Thus, S_T = S_0 + D
t ↑ T ⇒ trader learns the realized value of D
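A minimal simulation of the randomized Brownian bridge; S_0, σ, and the grid are illustrative assumptions, and D is drawn from the five-point prediction used earlier:

```python
import numpy as np

rng = np.random.default_rng(0)

T, n = 1.0, 1000
t = np.linspace(0.0, T, n + 1)
S0, sigma = 1.0, 0.01          # illustrative values (assumptions)

# Draw D from the five-point prediction of the earlier slide.
D = rng.choice([0.02, 0.01, 0.0, -0.01, -0.02],
               p=[0.10, 0.20, 0.55, 0.10, 0.05])

# Standard Brownian bridge on [0, T]: beta_t = W_t - (t/T) W_T.
dW = rng.normal(0.0, np.sqrt(T / n), size=n)
W = np.concatenate([[0.0], np.cumsum(dW)])
beta = W - (t / T) * W[-1]

# Randomized Brownian bridge: S_t = S_0 + sigma*beta_t + (t/T)*D.
S = S0 + sigma * beta + (t / T) * D
```

By construction the path is pinned at S_0 at time 0 and at S_0 + D at time T, so the terminal value reveals D exactly.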
Insider trading is not possible
Let F_t = σ((S_u)_{u ≤ t})
Trader has access only to filtration Ft (but not to the filtration of βtT )
⇒ trader can’t distinguish between noise βtT and D
What about the standard model?
S_t = S_0 + σW_t (“arithmetic BM”)

corresponds to the choice D ∼ N(0, σ²T)
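A quick sanity check of this correspondence: with D ∼ N(0, σ²T) independent of the bridge, the variance of σβ_{tT} + (t/T)D is σ²t(T − t)/T + (t/T)²σ²T = σ²t for every t, exactly the variance of σW_t (σ illustrative):

```python
import numpy as np

# Variance decomposition of the RBB with Gaussian D ~ N(0, sigma^2 T):
# Var(S_t - S_0) = Var(sigma*beta_tT) + Var((t/T)*D) = sigma^2 * t,
# the variance of an arithmetic Brownian motion at time t.
sigma, T = 0.01, 1.0
t = np.linspace(0.0, T, 101)

var_bridge = sigma**2 * t * (T - t) / T     # Var(sigma * beta_tT)
var_drift  = (t / T)**2 * sigma**2 * T      # Var((t/T) * D)

assert np.allclose(var_bridge + var_drift, sigma**2 * t)
```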
Proposition: The asset price S_t satisfies

dS_t = A(t, S_t) dt + σ dW_t,   S_t|_{t=0} = S_0,

where W_t is an F_t-Brownian motion,

A(t, S) = ( E[D | S_t = S] + S_0 − S ) / (T − t)

and

E[D | S_t = S] = ∫ x exp( x(S − S_0)/(σ²(T − t)) − x² t/(2σ²T(T − t)) ) μ_D(dx)
               / ∫ exp( x(S − S_0)/(σ²(T − t)) − x² t/(2σ²T(T − t)) ) μ_D(dx) .
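For a discrete prior μ_D the integrals reduce to sums. A sketch of the posterior mean and the drift A(t, S), with illustrative parameters and the two-point prior used later in the example:

```python
import numpy as np

# Posterior mean E[D | S_t = S] for a discrete prior mu_D, following the
# formula in the Proposition; parameter values are illustrative.
def posterior_mean_D(S, t, S0=1.0, sigma=0.01, T=1.0,
                     atoms=(-0.02, 0.02), probs=(0.2, 0.8)):
    x = np.asarray(atoms)
    p = np.asarray(probs)
    logw = x * (S - S0) / (sigma**2 * (T - t)) \
         - x**2 * t / (2 * sigma**2 * T * (T - t))
    w = p * np.exp(logw - logw.max())    # stabilized weights
    return float((x * w).sum() / w.sum())

def drift_A(S, t, S0=1.0, T=1.0, **kw):
    # A(t, S) = (E[D | S_t = S] + S0 - S) / (T - t)
    return (posterior_mean_D(S, t, S0=S0, T=T, **kw) + S0 - S) / (T - t)

# At t = 0 the posterior mean is the prior mean 0.8*0.02 - 0.2*0.02 = 0.012.
assert abs(posterior_mean_D(1.0, 0.0) - 0.012) < 1e-12
```

Higher observed midprices tilt the posterior toward the larger atom of D, which is the learning mechanism the rest of the talk builds on.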
Story 2: Trader’s optimization problem
(high-frequency trading)
Market microstructure: Limit Order Book
Oxford Centre for Industrial and Applied Mathematics:
An order matching a sell limit order is called a buy market order (not shown, because it is executed immediately!)
Market microstructure: Limit Order Book
To summarize:
– use buy market orders (MO) ⇒ pay higher prices
– use buy limit orders (LO) ⇒ pay lower prices, but have to wait . . .
(similarly for sell LO and sell MO)
Trader’s optimization problem: Strategy
Simplifying assumptions (not crucial)
– at each t post LOs & MOs for 0 or 1 units of asset, at best bid/ask price
⇒ trader’s strategy has 4 components:
ℓ⁺_t ∈ {0, 1} (sell LO)
ℓ⁻_t ∈ {0, 1} (buy LO)
m⁻_t ∈ {0, 1} (buy MO)
m⁺_t ∈ {0, 1} (sell MO)
– the spread is constant
Key quantities
Inventory:

Q_t = −∫_0^t ℓ⁺_u dN⁺_u + ∫_0^t ℓ⁻_u dN⁻_u − m⁺_t + m⁻_t

where the Poisson processes N⁺_t, N⁻_t count the number of filled sell and buy LOs.

Cash process:

X_t = −∫_0^t (S_u − Δ/2) ℓ⁻_u 1{Q_u < Q_max} dN⁻_u
    + ∫_0^t (S_u + Δ/2) ℓ⁺_u 1{Q_u > Q_min} dN⁺_u
    − ∫_0^t (S_u + Δ/2 + ε) 1{Q_u < Q_max} dm⁻_u
    + ∫_0^t (S_u − Δ/2 − ε) 1{Q_u > Q_min} dm⁺_u

where Δ = spread, ε = transaction fee for a market order, S_t = midprice.
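A toy bookkeeping sketch of how limit-order fills update (Q_t, X_t), finishing with a market-order liquidation: Bernoulli fills stand in for the Poisson processes N±, and all numbers (fill probability, spread, fee, price noise) are assumptions:

```python
import numpy as np

# Discrete-time bookkeeping sketch: a filled buy LO pays S - spread/2, a
# filled sell LO earns S + spread/2, and market orders cross the spread and
# pay the fee eps.  All parameters are illustrative.
rng = np.random.default_rng(1)

spread, eps = 0.002, 0.0005
Q_min, Q_max = -20, 20
Q, X = 0, 0.0
S = 1.0

for _ in range(1000):
    S += rng.normal(0.0, 0.001)          # toy midprice move
    ell_buy, ell_sell = 1, 1             # post both LOs (a trivial strategy)

    if ell_buy and Q < Q_max and rng.random() < 0.3:    # buy LO filled
        Q += 1
        X -= S - spread / 2
    if ell_sell and Q > Q_min and rng.random() < 0.3:   # sell LO filled
        Q -= 1
        X += S + spread / 2

# Enforce Q_T = 0: close the残 position with market orders at T.
X += Q * (S - np.sign(Q) * (spread / 2 + eps))
Q = 0
```

The indicator guards mirror the 1{Q < Q_max}, 1{Q > Q_min} factors in the cash process, and the final liquidation mirrors the Q_T = 0 constraint on the next slide.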
Constraints on inventory:
Q_min ≤ Q_t ≤ Q_max and Q_T = 0
[Figure: a sample midprice path S_t (top) and the corresponding inventory path Q_t (bottom) on [0, 1].]
Trader’s optimization problem: Goal
Goal: find

sup_{ {ℓ±_t}_{t≤T}, {m±_t}_{t≤T} } E[ X_T + Q_T ( S_T − (Δ/2) sgn(Q_T) − α Q_T ) ]   (1)

– 1st term: cash from trading
– 2nd term: profit/cost from closing the position at T

So far the midprice S_t was any process . . . We want the RBB

S_t = S_0 + σ β_{tT} + (t/T) D
Dynamic programming
Since RBB St satisfies an SDE
dSt = A(t, St) dt+ σ dWt
we can use Dynamic Programming to solve the optimization problem
Dynamic programming
Goal: find the value function
H(t, S, Q, X) = sup_{ℓ±, m±} E[ X_T + Q_T ( S_T − (Δ/2) sgn(Q_T) − α Q_T ) | S_t = S, Q_t = Q, X_t = X ]
Dynamic programming
The value function H admits the representation

H(t, X, S, Q) = X + QS + g(t, S, Q)

where g solves (in the viscosity sense) the system of non-linear PDEs

0 = max{ ∂_t g + (1/2)σ² ∂_SS g + A(t, S)(Q + ∂_S g) − φQ²
       + 1{Q < Q_max} max_{ℓ⁻ ∈ {0,1}} λ⁻ [ ℓ⁻ Δ/2 + g(t, S, Q + ℓ⁻) − g ]
       + 1{Q > Q_min} max_{ℓ⁺ ∈ {0,1}} λ⁺ [ ℓ⁺ Δ/2 + g(t, S, Q − ℓ⁺) − g ] ;
       max{ −Δ/2 − ε + g(t, S, Q + 1) − g,
            −Δ/2 − ε + g(t, S, Q − 1) − g, 0 } }

subject to the terminal condition

g(T, S, Q) = −(Δ/2)|Q| − αQ², Q_min ≤ Q ≤ Q_max
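One way to sketch a numerical solution is an explicit upwind finite-difference scheme marching backward from the terminal condition, applying the market-order impulse maximum after each step. Everything here (fill intensities λ±, penalty φ, grid sizes, and the cap on the drift, which blows up as t ↑ T) is an illustrative assumption, not the authors' numerical method:

```python
import numpy as np

# Backward explicit upwind sketch for g(t, S, Q); parameters illustrative.
T, sigma, Delta, eps, alpha, phi = 1.0, 0.01, 0.002, 0.0005, 0.001, 1e-5
lam_p = lam_m = 10.0                     # LO fill intensities (assumptions)
Q_min, Q_max = -5, 5
S0 = 1.0
atoms, probs = np.array([-0.02, 0.02]), np.array([0.2, 0.8])

nS, nt = 81, 2000
S = np.linspace(0.96, 1.04, nS)
dS, dt = S[1] - S[0], T / nt
Qs = np.arange(Q_min, Q_max + 1)

def drift(t, S):
    # A(t, S) = (E[D | S_t = S] + S0 - S) / (T - t), two-point prior
    x = atoms[:, None]
    logw = x*(S - S0)/(sigma**2*(T - t)) - x**2*t/(2*sigma**2*T*(T - t))
    w = probs[:, None]*np.exp(logw - logw.max(axis=0))
    ED = (x*w).sum(axis=0)/w.sum(axis=0)
    return (ED + S0 - S)/(T - t)

# terminal condition g(T, S, Q) = -(Delta/2)|Q| - alpha*Q^2
g = np.tile((-(Delta/2)*np.abs(Qs) - alpha*Qs**2)[:, None], (1, nS))

for k in range(nt - 1, 0, -1):
    t = k*dt
    A = np.clip(drift(t, S), -1.5, 1.5)  # cap drift blow-up near T (sketch)
    g_new = g.copy()
    for i, Q in enumerate(Qs):
        gi = g[i]
        gSS = np.zeros(nS)
        gSS[1:-1] = (gi[2:] - 2*gi[1:-1] + gi[:-2])/dS**2
        fwd = np.zeros(nS); fwd[:-1] = (gi[1:] - gi[:-1])/dS
        bwd = np.zeros(nS); bwd[1:] = (gi[1:] - gi[:-1])/dS
        gS = np.where(A > 0, fwd, bwd)   # upwind difference in S
        rhs = 0.5*sigma**2*gSS + A*(Q + gS) - phi*Q**2
        if Q < Q_max:                    # post a buy LO when it adds value
            rhs = rhs + lam_m*np.maximum(0.0, Delta/2 + g[i+1] - gi)
        if Q > Q_min:                    # post a sell LO when it adds value
            rhs = rhs + lam_p*np.maximum(0.0, Delta/2 + g[i-1] - gi)
        g_new[i] = gi + dt*rhs
    # market-order impulses (one sweep): jump to Q +- 1 at cost Delta/2 + eps
    for i, Q in enumerate(Qs):
        if Q < Q_max:
            g_new[i] = np.maximum(g_new[i], -Delta/2 - eps + g_new[i+1])
        if Q > Q_min:
            g_new[i] = np.maximum(g_new[i], -Delta/2 - eps + g_new[i-1])
    g = g_new
```

The optimal controls are read off from which branch attains the maximum at each grid point: post an LO when the bracketed λ± gain is positive, and fire an MO when the impulse branch dominates.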
Example
Example
Informed trader (IT) believes that
D =
  +0.02 with prob 0.8
  −0.02 with prob 0.2
Compare the performance of the IT with
– uninformed trader (UT) who views
D ∼ N(0, σ2T )
(i.e. St is an arithmetic BM)
– uninformed with learning (UL) who believes
D = 0.02,−0.02 with prob 0.5, 0.5
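The only difference between IT and UL is the prior on D; both update it from the observed midprice. A sketch of the Bayesian update, with illustrative σ and a hypothetical mid-horizon observation:

```python
import numpy as np

# Posterior probability that D = +0.02 given S_t, for the informed prior
# (0.2/0.8) versus the symmetric learning prior (0.5/0.5).  Parameters and
# the observed midprice are illustrative assumptions.
sigma, T, S0 = 0.01, 1.0, 1.0
atoms = np.array([-0.02, 0.02])

def prob_up(S, t, prior):
    logw = atoms*(S - S0)/(sigma**2*(T - t)) \
         - atoms**2*t/(2*sigma**2*T*(T - t))
    w = np.asarray(prior)*np.exp(logw - logw.max())
    return w[1]/w.sum()

# Midway through a path that has drifted up, both traders lean toward
# D = +0.02, but the informed prior leans harder.
p_IT = prob_up(S=1.008, t=0.5, prior=[0.2, 0.8])
p_UL = prob_up(S=1.008, t=0.5, prior=[0.5, 0.5])
assert p_IT > p_UL > 0.5
```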
Example
[Figure: sample midprice path S_t (top) and inventory path Q_t (bottom).]
The strategy of UT, who views the midprice as a Brownian motion
Example
[Figure: sample midprice path S_t (top) and inventory path Q_t (bottom).]
The strategy of UL, who views D = −0.02, 0.02 with prob 0.5, 0.5
Example
[Figure: sample midprice path S_t (top) and inventory path Q_t (bottom).]
The strategy of IT, who views D = −0.02, 0.02 with prob 0.2, 0.8
Note: for large volatility IT stops learning.
Example
[Figure: mean P&L vs. std of P&L; one curve each for IwL, UwL, UwoL; bounds on inventory are increasing along each curve.]
Risk-Reward profiles for the three types of agents as the inventory bound increases
Example
[Figure: mean executed orders (l.o. buy, l.o. sell, m.o. buy, m.o. sell) per time interval.]
UT: the mean executed Limit and Market orders
Example
[Figure: mean executed orders (l.o. buy, l.o. sell, m.o. buy, m.o. sell) per time interval.]
UL: the mean executed Limit and Market orders
Example
[Figure: mean executed orders (l.o. buy, l.o. sell, m.o. buy, m.o. sell) per time interval.]
IT: the mean executed Limit and Market orders
Multiple assets
Multiple assets
Asset midprices S are randomized Brownian bridges

S(i)_t = S(i)_0 + σ(i) β(i)_{tT} + (t/T) D(i)

β(i)_{tT} – mutually independent standard Brownian bridges
D(i) – the random changes in asset prices; may be dependent across assets

– asset prices interact non-linearly through D = (D(i))
– IT may trade in an asset that has high volatility, and in which they are marginally uninformed, but can learn joint information from a second, less volatile, asset
Multiple assets
For illustration purposes...

Probability of outcomes:

                 D(1) = −0.02   D(1) = +0.02
D(2) = −0.02         0.45           0.05
D(2) = +0.02         0.05           0.45

σ(1) = 0.02 and σ(2) = 0.01

Observing solely S(1) or S(2), the agent is uninformed (each marginal is ±0.02 with prob 0.5).
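A sketch of why observing S(2) helps: with the joint prior above, the posterior on D(1) given both midprices tilts away from 1/2 once the less volatile asset drifts, while a path where neither asset has moved leaves it at 1/2 (the midprice observations are hypothetical):

```python
import numpy as np

# Two-asset learning sketch: joint posterior on (D1, D2) from independent
# bridge likelihoods, using the joint prior table above.
sigma = np.array([0.02, 0.01])
S0 = np.array([1.0, 1.0])
T = 1.0

# joint prior over (D1, D2)
outcomes = [(-0.02, -0.02), (-0.02, 0.02), (0.02, -0.02), (0.02, 0.02)]
prior = [0.45, 0.05, 0.05, 0.45]

def posterior_D1_up(S, t):
    """P(D1 = +0.02 | S(1)_t, S(2)_t), independent-bridge likelihoods."""
    logw = []
    for (d1, d2) in outcomes:
        x = np.array([d1, d2])
        logw.append(np.sum(x*(S - S0)/(sigma**2*(T - t))
                           - x**2*t/(2*sigma**2*T*(T - t))))
    w = np.array(prior)*np.exp(np.array(logw) - max(logw))
    w /= w.sum()
    return w[2] + w[3]          # outcomes with D1 = +0.02

# S(1) is ambiguous on its own, but S(2) has drifted up:
p_both = posterior_D1_up(np.array([1.000, 1.006]), t=0.5)
p_only1 = posterior_D1_up(np.array([1.000, 1.000]), t=0.5)
assert p_both > p_only1
```

Because the prior correlates D(1) and D(2), an upward drift in the low-volatility asset sharpens the belief about D(1) even though S(1) itself is uninformative.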
Multiple assets
[Figure: sample midprice path S_t (top) and inventory path Q_t (bottom).]
The strategy of a trader who excludes Asset 2 from their info
Multiple assets
[Figure: sample midprice path S_t (top) and inventory path Q_t (bottom).]
The strategy of a trader who includes Asset 2 in their info
Conclusions
– Agents who have info can outperform other traders
– We show how to trade when info is uncertain
– Optimal strategy learns from midprice dynamics and outperforms naive strategies
– Including info from other assets can add value in assets in which learning does not help
Thank you!
www.math.toronto.edu/dkinz