Outline - Hong Kong University of Science and Technology

Outline

  Introduction and Motivation

  Survey of Existing Approaches

  Example: Distributive Delay-Optimal Control for Uplink OFDMA via Localized Stochastic Learning and Auction Game

  Convergence Analysis

  Asymptotic Optimality

  Conclusion

Introduction and Motivation

  Why is delay performance important?

  “WHAT??!! He is stuck in the air!!!$*(&#%*!(!”

  “You must be kidding me! Buffering at such an important moment!!??”

Introduction and Motivations

  We may have multiple delay-sensitive wireless applications running on different devices:

  –  Keep track of a game
  –  Play a multi-player game
  –  Keep talking to some friends

Related Works

  OFDMA Joint Power and Subband Design for PHY Performance

  [Yu’02], [Hoo’04], [Seong’06], etc.
  –  Selects the strongest user per subband
  –  Time-frequency water-filling power allocation
  –  Assumes knowledge of perfect CSIT

  [Lau’05], [Wong’09], [Brah’07], etc.
  –  Robust power and subband control with limited feedback or outdated CSIT (packet errors)

Introduction and Motivations

  Challenges to incorporate QSI and CSI in adaptation

When Shannon meets Kleinrock…

Claude Shannon        Leonard Kleinrock

Existing Approaches to Deal with Delay-Optimal Control

  Various approaches dealing with delay problems

[Figure: buffer partitioning. Labels: buffer states; regulate the buffer state towards 1/v; regions S < 1/v and S > 1/v, marked v and -v.]

Related Works

  Various approaches dealing with delay problems

Approach II [Yeh’01 PhD], [Yeh’03 ISIT]

  –  Symmetric and homogeneous users in multi-access fading channels
  –  Using stochastic majorization theory, the authors showed that the longest queue highest possible rate (LQHPR) policy is delay-optimal

[Figure: capacity region with points A and B. Labels: longer queue for user 1; higher rate for user 1.]


Technical Challenges To Be Solved

Uplink OFDMA System Model

OFDMA PHY Model

  OFDMA Physical Layer Model

[Figure: Mobile 1, …, Mobile K and the BS; CSI; OFDMA subband allocation and power control.]

$E[XX^H] = I$

$H = \{H_{k,n}\}$

OFDMA PHY subcarrier and power allocation; data rate $R_k$

Source Model and System States

[Figure: G-MAP packets and YouTube packets arrive at the MAC layer (MAC state: QSI). The cross-layer controller (BS) takes the CSI (PHY state) and the QSI and sets the power and subband allocation at the PHY layer. A time axis shows packet arrivals, PHY frames, and the scheduling time slot.]

  Channel is quasi-static in a slot
  i.i.d. between slots

OFDMA Queue Dynamics

  Time domain is partitioned into scheduling slots

  CSI $H(t)$ remains quasi-static within a slot and is i.i.d. between slots

  Packet arrivals $A(t) = (A_1(t), \ldots, A_K(t))$, where $A_k(t)$ is i.i.d. according to a general distribution $P(A)$

  $N_k(t)$ denotes the random packet size, i.i.d.

  $Q_k(t)$ denotes the number of packets waiting in the k-th buffer at the t-th slot

  Global System State (CSI, QSI)

  Total number of bits transmitted in the t-th slot
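The queue update equation on this slide did not survive extraction. As a rough illustration only, the sketch below assumes a common finite-buffer form, Q(t+1) = min([Q(t) - U(t)]^+ + A(t), N_Q), where U(t) (packets served in the slot) and N_Q (buffer size) are hypothetical names that do not come from the slides. Arrivals are Bernoulli purely for simplicity.

```python
import random


def step_queue(q, served, arrivals, capacity):
    """One scheduling slot: serve first, then admit arrivals up to the buffer size."""
    remaining = max(q - served, 0)            # [Q(t) - U(t)]^+
    return min(remaining + arrivals, capacity)


def simulate_queue(slots, capacity, arrival_prob, served_per_slot, seed=0):
    """Simulate one user's buffer with Bernoulli arrivals (an assumption)."""
    rng = random.Random(seed)
    q, history = 0, []
    for _ in range(slots):
        arrivals = 1 if rng.random() < arrival_prob else 0
        q = step_queue(q, served_per_slot, arrivals, capacity)
        history.append(q)
    return history
```

Packets that arrive when the buffer is full are dropped in this sketch, which is what makes a packet drop rate constraint meaningful.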

OFDMA Delay-Optimal Formulation

Stationary Power and Subband Allocation Control Policy

  A mapping from the system state to power and subband allocation actions.

(Power Constraint)

(Subband Allocation Constraint)

(Packet Drop Rate Constraint)


Definitions: Average Delay, Power and Packet Drop Constraints under a control policy

Little's Law: average no. of packets = average arrival rate × average delay

  –  the average delay (in terms of seconds)
  –  the average queue length
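Little's Law can be rearranged to read the average delay off the average queue length, which is why minimizing average queue length also minimizes average delay for a fixed arrival rate. The helper below is a small worked example; the function name and the numbers are illustrative, not from the slides.

```python
def average_delay(avg_queue_length, avg_arrival_rate):
    """Little's Law rearranged: delay = queue length / arrival rate."""
    if avg_arrival_rate <= 0:
        raise ValueError("arrival rate must be positive")
    return avg_queue_length / avg_arrival_rate


# Example: 6 packets queued on average, 20 packets arriving per second.
delay_seconds = average_delay(6.0, 20.0)
```

With these numbers the average delay is 0.3 seconds.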


Problem Formulation: Find the optimal control policy that minimizes

“Positive Weighting Factor”

Pareto-optimal delay boundary

  Why is the optimization problem difficult?
  –  Huge dimension of variables involved (policy = set of actions over all system state realizations)
  –  K queues are coupled together: exponentially large state space
  –  In general, we cannot have an explicit closed-form expression of how the objective function (average delay) is related to the control variables (policy).
  –  The problem is not convex

Solution: Markov Decision Problem (MDP)

Overview of Markov Decision Problem Formulation

  Specification of an Infinite Horizon Markov Decision Problem

  –  Decisions are made at points of time: decision epochs
  –  System State and Control Action Space:
     –  At the t-th decision epoch, the system occupies a state
     –  The controller observes the current state and applies an action
  –  Per-stage Reward & Transition Probability
     –  By choosing an action, the system receives a reward
     –  The system state at the next epoch is determined by a transition probability kernel
  –  Stationary Control Policy:
     –  The set of actions for all system state realizations
  –  The Optimization Problem:
     –  Average Reward
     –  Optimal Policy

$R^* = \max_{\pi} \lim_{T \to \infty} \frac{1}{T} E\left[ \sum_{t=1}^{T} R(S_t, A_t) \right], \quad A_t = \pi(S_t)$

  Solution of a Markov Decision Problem

  Optimal average reward

  Optimal policy (Fixed Point Problem on Functional Space)
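To make the fixed-point view concrete, here is a minimal sketch of relative value iteration on a toy two-state, two-action average-reward MDP. The transition matrices and rewards are made up for illustration, and this is a generic textbook method, not the algorithm proposed in the talk.

```python
def relative_value_iteration(P, R, iters=1000, tol=1e-10):
    """Relative value iteration for an average-reward MDP.

    P[a][s][j]: probability of moving from state s to j under action a.
    R[s][a]:    per-stage reward for taking action a in state s.
    Returns (gain, potential, policy); potential[0] is pinned to 0.
    """
    n_states, n_actions = len(R), len(P)
    V = [0.0] * n_states
    gain, q = 0.0, []
    for _ in range(iters):
        # One Bellman backup for every (state, action) pair.
        q = [[R[s][a] + sum(P[a][s][j] * V[j] for j in range(n_states))
              for a in range(n_actions)] for s in range(n_states)]
        backed_up = [max(row) for row in q]
        gain = backed_up[0]                   # state 0 is the reference state
        new_V = [x - gain for x in backed_up]
        change = max(abs(x - y) for x, y in zip(new_V, V))
        V = new_V
        if change < tol:
            break
    policy = [max(range(n_actions), key=lambda a: q[s][a])
              for s in range(n_states)]
    return gain, V, policy


# Toy model (illustrative numbers): action 0 tends to stay, action 1 tends to switch.
P = [[[0.9, 0.1], [0.1, 0.9]],
     [[0.1, 0.9], [0.9, 0.1]]]
R = [[1.0, 0.0],
     [2.0, 0.5]]
gain, potential, policy = relative_value_iteration(P, R)
```

For this toy model the optimal average reward is 1.8: switch out of state 0, then stay in state 1. The table `q` has one entry per state-action pair, which is exactly what becomes infeasible when the state is the joint CSI and QSI of K users.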


Constrained Markov Decision Problem Formulation

Lagrangian approach to the Constrained MDP:

CMDP Formulation: Find the optimal control policy that minimizes

Optimal Solution

  Infinite Horizon Average Reward MDP

  Given a stationary control policy, the random process evolves like a Markov chain with transition kernel:

  Solution is given by the “Bellman Equation”

“Potential function” (contribution of the state i to the average reward)

“Optimal value”

Equations and unknowns

  Centralized solution?

  Obtain knowledge of global QSI from K users (uplink)?

  Heavy signaling load to deliver the QSI from the mobiles to the BS

  Must have a distributive solution!
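The Bellman equation itself was an image on this slide and is not recoverable. For orientation only, the textbook average-reward form is given below. The symbols are assumptions rather than the slide's notation: $\theta$ is the optimal value, $V(i)$ the potential of state $i$, and $P(j \mid i, a)$ the transition kernel.

```latex
\theta + V(i) = \max_{a} \Big[ R(i, a) + \sum_{j} P(j \mid i, a)\, V(j) \Big]
\quad \text{for every state } i
```

In this form $V$ is determined only up to an additive constant, so one entry is usually pinned to a reference value.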

Optimal Solution – Online Learning

  How to determine the potential function?

  Brute-force solution of the Bellman equation? (Value iteration):

  Too complicated: exponential complexity and memory requirement

  Online stochastic learning!

  Iteratively estimate the potential function based on real-time observation of CSI and QSI (online value iteration)

  Distributive Solution:

  –  Per-user Potential and LMs Initialization
  –  Online Policy Improvement Based on Per-subband Auction
  –  Online Per-user Potential and LMs Update [Local CSI, Local QSI]
  –  Termination
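The idea of online value iteration can be sketched with a generic stochastic-approximation update that learns the potential of a fixed policy from observed transitions, without ever using the transition matrix in the update. This is a standard relative-value learning rule, not the per-user update from the talk. The two-state chain, the costs, and the 1/n step size are all illustrative assumptions.

```python
import random


def learn_potential(P, cost, steps=200_000, ref=0, seed=1):
    """Estimate the potential V of a fixed policy from a sample path.

    P[s] is used only to simulate the next state; the update itself sees
    just (state, cost, next state). At the fixed point, V[ref] equals the
    average cost and V[s] - V[ref] is the relative potential of state s.
    """
    rng = random.Random(seed)
    n = len(cost)
    V = [0.0] * n
    visits = [0] * n
    s = 0
    for _ in range(steps):
        nxt = rng.choices(range(n), weights=P[s])[0]
        visits[s] += 1
        step = 1.0 / visits[s]                # diminishing step size
        V[s] += step * (cost[s] + V[nxt] - V[ref] - V[s])
        s = nxt
    return V


# Toy chain: the next state is uniform; state 1 costs 2 per slot, state 0 costs 0.
V = learn_potential([[0.5, 0.5], [0.5, 0.5]], [0.0, 2.0])
```

For this chain the exact answer is V = [1, 3] (average cost 1, potential gap 2), and the estimate should settle close to it. Memory grows with the number of states actually tracked, which is the motivation for keeping a per-user potential rather than one over the joint state.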

Decentralized Solution (I)

  Online Per-user Primal-Dual Potential Learning Algorithm via Stochastic Approximation

Remark (Comparison to the deterministic NUM)
  –  Deterministic NUM: Iterative updates are performed within the CSI coherence time, which limits the number of iterations and the performance.
  –  Proposed online algorithm: Iterative updates evolve on the same time scale as the CSI and QSI and converge to a better solution (no longer limited by the coherence time of CSI).

  Both the per-user potential and 2 LMs are updated simultaneously.

  New observation at the beginning of the (l+1)-th slot
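The update equations on this slide were images and are not recoverable. As a hedged illustration of the dual half of a primal-dual scheme, the sketch below shows a projected stochastic-approximation step for a single Lagrange multiplier. The function name, the constraint form (observed usage versus a budget), and the step size are assumptions, not the authors' formulas.

```python
def update_multiplier(gamma, observed, budget, step):
    """One projected dual step: raise the price when the constraint is violated.

    gamma    : current Lagrange multiplier (non-negative)
    observed : resource used in the current slot (for example, transmit power)
    budget   : average constraint level
    step     : step size for this slot
    """
    return max(gamma + step * (observed - budget), 0.0)


# The price rises after overspending and is clipped at zero after underspending.
raised = update_multiplier(0.5, observed=3.0, budget=2.0, step=0.1)
clipped = update_multiplier(0.05, observed=0.0, budget=2.0, step=0.1)
```

In a simultaneous scheme, a step like this would run in the same slot as the potential update, each with its own step-size sequence.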

Decentralized Solution (II)

  Per-stage auction with K bidders (MSs) and one auctioneer (BS)

  Low-complexity Scalarized Per-Subband Auction

  –  Bidding: Each user submits a bid
  –  Subband allocation:
  –  Power allocation:
  –  Charging:

  Lemma: The per-stage social optimal scalarized bid (CSI, QSI) is

  Water level depends on QSI (via potential function)
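The bid, allocation, and charging formulas were images and are missing here. The sketch below shows only the general shape of a scalarized per-subband auction with water-filling, under assumptions of my own: each user k maximizes w_k ln(1 + g p) - gamma_k p on a subband with gain g, where w_k is a hypothetical queue-dependent weight standing in for the potential term and gamma_k is the user's power price. The charging step is omitted.

```python
import math


def water_filling_power(weight, price, gain):
    """Maximizer of weight*ln(1 + gain*p) - price*p over p >= 0."""
    return max(weight / price - 1.0 / gain, 0.0)   # water level = weight/price


def scalar_bid(weight, price, gain):
    """Bid = net benefit the user would obtain if granted the subband."""
    p = water_filling_power(weight, price, gain)
    return weight * math.log(1.0 + gain * p) - price * p


def run_auction(weights, prices, gains):
    """gains[k][n] is user k's gain on subband n; returns (winner, power) per subband."""
    allocation = []
    for n in range(len(gains[0])):
        bids = [scalar_bid(weights[k], prices[k], gains[k][n])
                for k in range(len(weights))]
        winner = max(range(len(bids)), key=lambda k: bids[k])
        power = water_filling_power(weights[winner], prices[winner],
                                    gains[winner][n])
        allocation.append((winner, power))
    return allocation


# Two users with identical channels; user 1 has the larger queue weight.
allocation = run_auction([1.0, 3.0], [1.0, 1.0],
                         [[2.0, 0.5], [2.0, 0.5]])
```

Each user needs only its own weight, price, and channel gains to compute its bids, and the BS compares one scalar per user per subband. In this example user 1 wins both subbands because its larger weight raises its water level.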

Decentralized Solution

Theorem (Convergence of online per-user learning) Under some mild conditions, the distributive learning converges almost surely.

Theorem (Asymptotically Global Optimal) For large K, the online per-user learning algorithm is asymptotically globally optimal, and the summation of the per-user potentials approaches (w.p.1) the solution of the centralized Bellman equation.

Remark (Comparison to conventional stochastic learning)
  –  Conventional SL: (1) for unconstrained MDP only, or the LMs for the CMDP are determined offline by simulation; (2) designed for a centralized solution with the control action determined entirely from the potential update. The convergence proof is based on the standard “contraction mapping” and fixed-point theorem argument.
  –  Proposed SL: (1) simultaneous update of the LMs and the potential function; (2) the control action is determined by all the users’ potentials via a per-stage auction. The per-user potential update is NOT a contraction mapping, so the standard proof does not apply.

Numerical Results

[Figure: Average delay per user vs. SNR]

  Close-to-optimal performance even for a small number of users

  Huge gain in delay performance compared with Modified Largest Weighted Delay First (M-LWDF), which is queue-length-weighted throughput maximization.

[Figure: Average delay per user vs. number of users]

  The distributive solution has a huge gain in delay performance compared with 3 baselines.

[Figure: Illustration of convergence property: potential function vs. the scheduling slot index (K=10)]

Conclusion

  Online Per-user Learning:
  –  Simultaneous update of LMs and potentials
  –  Almost sure convergence
  –  Asymptotically globally optimal for large K

  Optimal Strategy for the Auction Game:
  –  Delay-Optimal Power Control: Multi-Level Water-Filling (QSI: water level; CSI: instantaneous allocation)
  –  Delay-Optimal Subband Allocation: User selection based on (QSI, CSI)

References

• V. K. N. Lau, Y. Chen, “Delay-Optimal Precoder Design for Multi-Stream MIMO System”, to appear in IEEE Transactions on Wireless Communications, May 2009.

• V. K. N. Lau, Y. Cui, “Delay Optimal Power and Subcarrier Allocation for OFDMA System via Stochastic Approximation”, submitted to IEEE Transactions on Wireless Communications, 2008.

• K. B. Huang, V. K. N. Lau, “Stability and Delay of Zero-Forcing SDMA with Limited Feedback”, submitted to IEEE Transactions on Information Theory, Feb. 2009.

• L. Z. Ruan, V. K. N. Lau, “Multi-level Water-Filling Power Control for Delay-Optimal SDMA Systems”, submitted to IEEE Transactions on Wireless Communications, 2008.
