Dynamic Optimization and Learning for Renewal Systems
Transcript of “Dynamic Optimization and Learning for Renewal Systems”
Dynamic Optimization and Learning for Renewal Systems --
With applications to Wireless Networks and Peer-to-Peer Networks
Michael J. Neely, University of Southern California
[Figure: a network of transmitter/receiver (T/R) nodes with a Network Coordinator handling Tasks 1–3 over renewal frames T[0], T[1], T[2].]
Outline:
• Optimization of Renewal Systems
• Application 1: Task Processing in Wireless Networks; Quality-of-Information (ARL CTA project); the task “deluge” problem
• Application 2: Peer-to-Peer Networks; social networks (ARL CTA project); Internet and wireless
References:
General Theory and Application 1:
• M. J. Neely, Stochastic Network Optimization with Application to Communication and Queueing Systems, Morgan & Claypool, 2010.
• M. J. Neely, “Dynamic Optimization and Learning for Renewal Systems,” Proc. Asilomar Conf. on Signals, Systems, and Computers, Nov. 2010.
Application 2 (Peer-to-Peer):
• M. J. Neely and L. Golubchik, “Utility Optimization for Dynamic Peer-to-Peer Networks with Tit-for-Tat Constraints,” Proc. IEEE INFOCOM, 2011.
These works are available at:
http://www-bcf.usc.edu/~mjneely/
A General Renewal System
[Figure: timeline of frames T[0], T[1], T[2] with penalty vectors y[0], y[1], y[2].]
• Renewal frames r in {0, 1, 2, …}.
• π[r] = policy chosen on frame r.
• P = abstract policy space (π[r] in P for all r).
• Policy π[r] affects the frame size and penalty vector on frame r. These are random functions of π[r] (their distributions depend on π[r]):
• y[r] = [y0(π[r]), y1(π[r]), …, yL(π[r])]
• T[r] = T(π[r]) = frame duration
A General Renewal System (sample realizations)
Three example frames, each drawn under the chosen policy π[r]:
• y[r] = [1.2, 1.8, …, 0.4], with frame duration T[r] = 8.1
• y[r] = [0.0, 3.8, …, -2.0], with frame duration T[r] = 12.3
• y[r] = [1.7, 2.2, …, 0.9], with frame duration T[r] = 5.6
Example 1: Opportunistic Scheduling
• All frames = 1 slot.
• S[r] = (S1[r], S2[r], S3[r]) = channel states for slot r.
• Policy π[r]: on frame r, first observe S[r], then choose a channel to serve (i.e., in {1, 2, 3}).
• Example objectives: throughput, energy, fairness, etc.
Example 2: Convex Programs (Deterministic Problems)
Minimize: f(x1, x2, …, xN)
Subject to: gk(x1, x2, …, xN) ≤ 0 for all k in {1,…, K}
(x1, x2, …, xN) in A
Example 2: Convex Programs (Deterministic Problems)
• All frames = 1 slot.
• Policy π[r] = (x1[r], x2[r], …, xN[r]) in A.
• Time average: f̄ = lim_{R→∞} (1/R) ∑_{r=0}^{R-1} f(x[r])
Minimize: f(x1, x2, …, xN)
Subject to: gk(x1, x2, …, xN) ≤ 0 for all k in {1,…, K}
(x1, x2, …, xN) in A
Equivalent to:
Minimize: f(x1[r], x2[r], …, xN[r])
Subject to: gk(x1[r], x2[r], …, xN[r]) ≤ 0 for all k in {1,…, K}
(x1[r], x2[r], …, xN[r]) in A for all frames r
Example 2: Convex Programs (Deterministic Problems)
Jensen’s Inequality: The time average of the dynamic solution (x1[r], x2[r], …, xN[r]) solves the original convex program!
Minimize: f(x1, x2, …, xN)
Subject to: gk(x1, x2, …, xN) ≤ 0 for all k in {1,…, K}
(x1, x2, …, xN) in A
Equivalent to:
Minimize: f(x1[r], x2[r], …, xN[r])
Subject to: gk(x1[r], x2[r], …, xN[r]) ≤ 0 for all k in {1,…, K}
(x1[r], x2[r], …, xN[r]) in A for all frames r
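The Jensen step can be checked numerically. This is a minimal sketch with a hypothetical convex objective f(x) = x², showing that the time average of any per-frame decision sequence satisfies f(x̄) ≤ (time average of f(x[r])):

```python
# Numerical check of the Jensen step (hypothetical convex f):
# f(time-average of x[r]) <= time-average of f(x[r]), so averaging a
# dynamic per-frame solution can only help a convex objective.
def f(x):
    return x * x                        # convex objective f(x) = x^2

xs = [0.2, 1.4, 0.9, 1.1, 0.4]          # per-frame decisions x[r]
x_bar = sum(xs) / len(xs)               # time average
f_of_avg = f(x_bar)
avg_of_f = sum(f(x) for x in xs) / len(xs)
print(f_of_avg, avg_of_f)               # f_of_avg is the smaller one
```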
Example 3: Markov Decision Problems
• M(t) = recurrent Markov chain (continuous or discrete).
• Renewals are defined as recurrences to state 1.
• T[r] = random inter-renewal frame size (frame r).
• y[r] = penalties incurred over frame r.
• π[r] = policy that affects transition probabilities over frame r.
• Objective: minimize the time average of one penalty subject to time average constraints on the others.
[Figure: a four-state Markov chain with states 1–4.]
Example 4: Task Processing over Networks
[Figure: five transmitter/receiver (T/R) nodes and a Network Coordinator handling Tasks 1–3.]
• Infinite sequence of tasks, e.g., query sensors and/or perform computations.
• Renewal frame r = processing time for frame r.
• Policy types:
  • Low level: {specify transmission decisions over the network}
  • High level: {Backpressure1, Backpressure2, Shortest Path}
• Example objective: maximize quality of information per unit time subject to per-node power constraints.
Quick Review of Renewal-Reward Theory (Pop Quiz Next Slide!)
Define the frame-average for y0[r]:
The time-average for y0[r] is then:
*If behavior is i.i.d. over frames, then by the law of large numbers this is the same as E{y0}/E{T}.
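The renewal-reward fact (and the distinction the upcoming quiz turns on) shows up in a quick simulation; the frame distributions below are made up for illustration:

```python
import random

# Renewal-reward check with made-up frame distributions: over i.i.d.
# frames, (total energy)/(total time) converges to E{y0}/E{T}, which
# generally differs from the frame-average of the per-frame ratio y0/T.
random.seed(1)

def sample_frame():
    T = random.choice([1.0, 4.0])       # frame duration
    y0 = 2.0 * T + random.random()      # energy spent on the frame
    return y0, T

R = 200_000
total_y = total_T = ratio_sum = 0.0
for _ in range(R):
    y0, T = sample_frame()
    total_y += y0
    total_T += T
    ratio_sum += y0 / T

time_avg_power = total_y / total_T      # -> E{y0}/E{T} = 5.5/2.5 = 2.2
frame_avg_ratio = ratio_sum / R         # -> E{y0/T} = 2.3125 (different!)
print(time_avg_power, frame_avg_ratio)
```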
Pop Quiz: (10 points)
• Let y0[r] = energy expended on frame r.
• Time average power = (total energy use)/(total time).
• Suppose (for simplicity) behavior is i.i.d. over frames.
To minimize time average power, which one should we minimize?
(a) (b)
Two General Problem Types:
1) Minimize time average subject to time average constraints:
2) Maximize concave function φ(x1, …, xL) of time average:
Solving the Problem (Type 1):
Define a “Virtual Queue” for each inequality constraint:
[Figure: virtual queue Zl[r] with arrivals yl[r] and service clT[r].]
Zl[r+1] = max[Zl[r] – clT[r] + yl[r], 0]
Lyapunov Function and “Drift-Plus-Penalty Ratio”:
• Scalar measure of queue sizes: L[r] = Z1[r]² + Z2[r]² + … + ZL[r]²
• Frame-based Lyapunov drift: Δ(Z[r]) = E{L[r+1] – L[r] | Z[r]}
• Algorithm technique: every frame r, observe Z1[r], …, ZL[r]. Then choose a policy π[r] in P to minimize the “drift-plus-penalty ratio”:
(Δ(Z[r]) + V E{y0[r] | Z[r]}) / E{T[r] | Z[r]}
The Algorithm Becomes:
• Observe Z[r] = (Z1[r], …, ZL[r]). Choose π[r] in P to minimize:
(Δ(Z[r]) + V E{y0[r] | Z[r]}) / E{T[r] | Z[r]}
• Then update the virtual queues:
Zl[r+1] = max[Zl[r] – clT[r] + yl[r], 0]
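A minimal sketch of this loop on a made-up policy space (three policies with deterministic frame lengths and penalties; one constraint, time-average y1 ≤ c1; all numbers hypothetical):

```python
# Frame-based drift-plus-penalty-ratio sketch: minimize time-average y0
# subject to time-average y1 <= c1, over a toy deterministic policy set.
policies = {                # name: (T, y0, y1) -- hypothetical numbers
    "fast":  (1.0, 3.0, 1.0),
    "slow":  (4.0, 4.0, 1.0),
    "cheap": (2.0, 1.0, 3.0),
}
c1, V = 0.9, 50.0
Z = 0.0                     # virtual queue for the y1 constraint
tot_T = tot_y0 = tot_y1 = 0.0

for r in range(5000):
    # pick pi[r] minimizing the ratio [Z*(y1 - c1*T) + V*y0] / T
    T, y0, y1 = min(
        policies.values(),
        key=lambda p: (Z * (p[2] - c1 * p[0]) + V * p[1]) / p[0],
    )
    Z = max(Z - c1 * T + y1, 0.0)       # virtual queue update
    tot_T += T
    tot_y0 += y0
    tot_y1 += y1

print(tot_y0 / tot_T, tot_y1 / tot_T)   # time-avg penalty, time-avg y1
```

With these made-up numbers the averages should settle near the optimal time-sharing of the “cheap” and “slow” policies while the queue Z stays bounded, so the constraint is met at the boundary.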
Theorem: Assume the constraints are feasible. Then under this algorithm, for all frames r in {1, 2, 3, …}, we achieve:
DPP ratio: (Δ(Z[r]) + V E{y0[r] | Z[r]}) / E{T[r] | Z[r]}
(a)
(b)
Application 1 – Task Processing:
[Figure: five T/R nodes with a Network Coordinator; each frame r consists of a setup phase, a transmit phase, and an idle period I[r].]
• Every task reveals random task parameters η[r]: η[r] = [(qual1[r], T1[r]), (qual2[r], T2[r]), …, (qual5[r], T5[r])]
• Choose π[r] = [which node to transmit, how much idle time] in {1, 2, 3, 4, 5} × [0, Imax].
• Transmissions incur power.
• We use a quality distribution that tends to be better for higher-numbered nodes.
• Maximize quality/time subject to pav ≤ 0.25 for all nodes.
Minimizing the Drift-Plus-Penalty Ratio:
• Minimizing a pure expectation, rather than a ratio, is typically easier (see Bertsekas and Tsitsiklis, Neuro-Dynamic Programming).
•Define:
•“Bisection Lemma”:
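The point of the lemma is that the optimal ratio θ* is the root of h(θ) = min over π of E{N(π) – θ·D(π)}, which is nonincreasing in θ, so bisection applies. A toy sketch over a finite policy set (the (numerator, denominator) pairs are hypothetical):

```python
# Bisection sketch for minimizing a ratio E{N(pi)}/E{D(pi)} over a
# finite policy set. The optimal ratio theta* is the root of
# h(theta) = min_pi E{N(pi) - theta * D(pi)}, a decreasing function.
policies = [(3.0, 1.0), (4.0, 4.0), (1.0, 2.0)]   # (E{N}, E{D}) pairs

def h(theta):
    return min(n - theta * d for n, d in policies)

lo, hi = 0.0, 10.0            # bracket chosen so h(lo) >= 0 >= h(hi)
for _ in range(60):           # bisect until theta pins down the ratio
    mid = (lo + hi) / 2.0
    if h(mid) >= 0.0:
        lo = mid
    else:
        hi = mid

theta_star = (lo + hi) / 2.0
print(theta_star)             # equals the minimum of E{N}/E{D} over pi
```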
Learning via Sampling from the past:
•Suppose randomness characterized by: {η1, η2, ..., ηW} (past random samples)
•Want to compute (over unknown random distribution of η):
•Approximate this via W samples from the past:
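A sketch of the idea with a made-up per-frame cost: the expectation over the unknown distribution of η is replaced by an empirical average over the W stored samples, and the policy is chosen against that average.

```python
import random

# "Learning by sampling the past": approximate E{cost(pi, eta)} by the
# empirical mean over W stored samples of eta. The cost function and the
# sample distribution below are hypothetical stand-ins.
random.seed(0)

def cost(pi, eta):
    return (pi - eta) ** 2          # toy per-frame cost

history = [random.gauss(1.0, 0.3) for _ in range(1000)]   # past samples

def best_policy(candidates, samples):
    # choose pi minimizing the empirical mean of cost(pi, eta)
    return min(candidates,
               key=lambda pi: sum(cost(pi, e) for e in samples) / len(samples))

candidates = [0.0, 0.5, 1.0, 1.5]
pi_W10 = best_policy(candidates, history[:10])      # small sample size W
pi_W1000 = best_policy(candidates, history)         # large sample size W
print(pi_W10, pi_W1000)
```

Larger W gives a better approximation of the true expectation, which is the trade-off studied in the simulation on the next slide.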
Simulation:
[Figure: Quality of Information per unit time vs. sample size W, comparing the drift-plus-penalty ratio algorithm with bisection against an alternative algorithm with time averaging.]
Concluding Sims (values for W=10):
[Figure: the task-processing network (five T/R nodes, Network Coordinator, Tasks 1–3); each frame r consists of setup, transmit, and idle I[r] phases.]
“Application 2” – Peer-to-Peer Wireless Networking:
[Figure: a network cloud connecting nodes 1–5.]
• N nodes.
• Each node n has a download social group Gn; Gn is a subset of {1, …, N}.
• Each file f is stored at some subset of nodes Nf.
• Each node n can request download of a file f from any node in Gn ∩ Nf.
• Transmission rates (µab(t)) between nodes are chosen in some (possibly time-varying) set G(t).
“Internet Cloud” Example 1:
[Figure: nodes 1–5 attached to a network cloud; node 1 has uplink capacity C1^uplink.]
• G(t) = constant (no time variation).
• ∑b µnb(t) ≤ Cn^uplink for all nodes n.
This example assumes uplink capacity is the bottleneck.
“Internet Cloud” Example 2:
[Figure: nodes 1–5 attached to a network cloud.]
• G(t) specifies a single supportable rate vector (µab(t)).
There are no “transmission rate decisions”: the allowable rates (µab(t)) are given to the peer-to-peer system by some underlying transport and routing protocol.
“Wireless Basestation” Example 3:
[Figure legend: base stations and wireless devices.]
• Wireless device-to-device transmission increases capacity.
• (µab(t)) chosen in G(t).
• Transmissions coordinated by the base station.
“Commodities” for Request Allocation
• Multiple file downloads can be active.
• Each file corresponds to a subset of nodes.
• Queueing files according to subsets would result in O(2^N) queues (complexity explosion!).
Instead, without loss of optimality, we use the following alternative commodity structure…
“Commodities” for Request Allocation
• Use subset info to determine the decision set.
• Choose which node will help download.
• That node queues the request: Qmn(t+1) = max[Qmn(t) + Rmn(t) - µmn(t), 0]
• The subset info can now be thrown away.
[Figure: node n with arrivals (An(t), Nn(t)) routes its request to helper m in Gn ∩ Nn(t), which queues it in Qmn(t).]
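The queueing step can be sketched as follows; the helper-selection rule (shortest backlog) and the numbers are illustrative only, not the paper's allocation rule (which appears on a later slide).

```python
from collections import defaultdict

# Commodity-structure sketch: a request of size R is queued at one
# helper m chosen from Gn ∩ Nn(t); afterward the file-subset info is
# discarded and only per-link queues Q[(m, n)] remain (O(N^2) queues,
# not O(2^N)). Helper choice and sizes here are illustrative.
Q = defaultdict(float)                  # Q[(m, n)] = pending work m -> n

def enqueue_request(n, helpers, size):
    # illustrative rule: pick the helper with the least backlog toward n
    m = min(helpers, key=lambda m: Q[(m, n)])
    Q[(m, n)] += size                   # queue the request at node m
    return m

def serve(m, n, mu):
    # serve mu units: Q_mn <- max[Q_mn - mu, 0]
    # (the R_mn arrivals were already added at enqueue time)
    Q[(m, n)] = max(Q[(m, n)] - mu, 0.0)

helper = enqueue_request(n=1, helpers=[2, 3], size=5.0)
serve(helper, 1, mu=2.0)
print(helper, Q[(helper, 1)])
```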
Stochastic Network Optimization Problem:
Maximize: ∑n gn(∑a ran)   [gn = concave utility function; ran = time-average request rate]
Subject to: (1) Qmn < ∞  (queue stability constraint)
(2) α ∑a ran ≤ β + ∑b rnb for all n  (tit-for-tat constraint: α × download rate ≤ β + upload rate)
Solution Technique for the INFOCOM Paper
• Use “Drift-Plus-Penalty” framework in a new “Universal Scheduling” scenario.
• We make no statistical assumptions on the stochastic processes [S(t); (An(t), Nn(t))].
Resulting Algorithm:
• (Auxiliary Variables) For each n, choose an auxiliary variable γn(t) in the interval [0, Amax] to maximize:
Vgn(γn(t)) – Hn(t)γn(t)
• (Request Allocation) For each n, observe the following value for all m in Gn ∩ Nn(t):
-Qmn(t) + Hn(t) + (Fm(t) – αFn(t))
Give An(t) to the queue m with the largest non-negative value; drop An(t) if all the above values are negative.
• (Scheduling) Choose (µab(t)) in G(t) to maximize:
∑nb µnb(t)Qnb(t)
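The request-allocation step can be sketched directly from the displayed score; the constants and queue values below are hypothetical.

```python
# Request-allocation sketch: node n scores each candidate helper m by
# -Q_mn(t) + H_n(t) + (F_m(t) - alpha*F_n(t)), sends its arrivals A_n(t)
# to the best non-negative score, and drops them if all are negative.
# All numbers here are made up for illustration.
alpha = 0.8
H_n, F_n = 4.0, 2.0
candidates = {            # m: (Q_mn, F_m)
    2: (3.0, 1.0),
    3: (6.0, 0.5),
}

def allocate(candidates, H_n, F_n, alpha):
    best_m, best_score = None, 0.0      # only scores >= 0 are eligible
    for m, (Q_mn, F_m) in candidates.items():
        score = -Q_mn + H_n + (F_m - alpha * F_n)
        if score >= best_score:
            best_m, best_score = m, score
    return best_m                       # None means: drop A_n(t)

print(allocate(candidates, H_n, F_n, alpha))
```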
How the Incentives Work for node n:
[Figure: virtual reputation queue Fn(t) with arrivals α × ReceiveHelp(t) and service β + HelpOthers(t).]
Node n can only request downloads from others if it finds a node m with a non-negative value of:
-Qmn(t) + Hn(t) + (Fm(t) – αFn(t))
Fn(t) = “node n reputation” (good reputation = low value)
Concluding Theorem: For any arbitrary sample path [S(t); (An(t), Nn(t))], we guarantee:
a) Qmn(t) ≤ Qmax = O(V) for all t and all (m, n).
b) All tit-for-tat constraints are satisfied.
c) For any T > 0:
liminf_{K→∞} [Achieved Utility(KT)] ≥ liminf_{K→∞} (1/K) ∑_{i=1}^{K} [“T-Slot-Lookahead-Utility[i]”] – BT/V
[Figure: the timeline 0, T, 2T, 3T partitioned into Frames 1, 2, 3.]
Conclusions for the Peer-to-Peer Problem:
• A framework for posing peer-to-peer networking as a stochastic network optimization problem.
• Can compute the optimal solution in polynomial time.
Conclusions Overall:
• The renewal optimization framework can be viewed as “generalized linear programming.”
• Variable-length scheduling modes.
• Many applications (task processing, peer-to-peer networks, Markov decision problems, linear programs, convex programs, the stock market, smart grid, energy harvesting, and many more).
Solving the Problem (Type 2):
We reduce it to a problem with the structure of Type 1 via:
• Auxiliary variables γ[r] = (γ1[r], …, γL[r]).
• The following variation on Jensen’s inequality: for any concave function φ(x1, …, xL) and any (arbitrarily correlated) vector of random variables (X1, X2, …, XL, T), where T > 0, we have:
E{T φ(X1, …, XL)} / E{T} ≤ φ( E{T X1}/E{T}, …, E{T XL}/E{T} )
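One way to see this variation: since T > 0, the tilted operator Ẽ{X} = E{T X}/E{T} is itself a valid expectation, and ordinary Jensen applies under it:

```latex
% Jensen under the tilted expectation \tilde{E}\{X\} \triangleq E\{TX\}/E\{T\}:
\frac{E\{T\,\phi(X_1,\dots,X_L)\}}{E\{T\}}
  = \tilde{E}\{\phi(X_1,\dots,X_L)\}
  \le \phi\big(\tilde{E}\{X_1\},\dots,\tilde{E}\{X_L\}\big)
  = \phi\!\left(\frac{E\{T X_1\}}{E\{T\}},\dots,\frac{E\{T X_L\}}{E\{T\}}\right)
```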
The Algorithm (Type 2) Becomes:
• On frame r, observe Z[r] = (Z1[r], …, ZL[r]).
• (Auxiliary Variables) Choose γ1[r], …, γL[r] to maximize the deterministic problem below:
• (Policy Selection) Choose π[r] in P to minimize:
• Then update the virtual queues:
Zl[r+1] = max[Zl[r] – clT[r] + yl[r], 0],  Gl[r+1] = max[Gl[r] + γl[r]T[r] - yl[r], 0]