
Sophisticated Online Learning Scheme for Green Resource Allocation in 5G Heterogeneous Cloud Radio Access Networks

Ismail AlQerm, Student Member, IEEE, Basem Shihada, Senior Member, IEEE

Abstract—5G is the upcoming evolution of current cellular networks that aims at satisfying the future demand for data services. Heterogeneous cloud radio access networks (H-CRANs) are envisioned as a new trend of 5G that exploits the advantages of heterogeneous and cloud radio access networks to enhance spectral and energy efficiency. Remote radio heads (RRHs) are small cells utilized to provide high data rates for users with high quality of service (QoS) requirements, while high-power macro base stations (BSs) are deployed for coverage maintenance and for serving low-QoS users. Inter-tier interference between macro BSs and RRHs and energy efficiency are critical challenges that accompany resource allocation in H-CRANs. Therefore, we propose an efficient resource allocation scheme using online learning, which mitigates interference and maximizes energy efficiency while maintaining QoS requirements for all users. The resource allocation includes resource blocks (RBs) and power. The proposed scheme is implemented using two approaches: centralized, where the resource allocation is processed at a controller integrated with the baseband processing unit, and decentralized, where macro BSs cooperate to achieve the optimal resource allocation strategy. To foster the performance of such a sophisticated scheme with model-free learning, we consider users' priority in RB allocation and a compact state representation learning methodology to improve the speed of convergence and account for the curse of dimensionality during the learning process. The proposed scheme, including both approaches, is implemented using a software defined radio testbed. The obtained testbed and simulation results confirm that the proposed resource allocation solution in H-CRANs increases energy efficiency significantly and maintains users' QoS.

Index Terms—Resource allocation, Energy efficiency, Testbeds, Online learning, H-CRANs


1 INTRODUCTION

The wireless industry is witnessing an avalanche of mobile traffic fueled by the increasing interest in high data rate services such as video conferencing and online high-definition video streaming, and by the increasing popularity of handheld devices. 5G is envisioned as the potential solution to handle this extraordinary data rate demand [1] [2]. Cloud radio access networks (CRANs) are recognized to reduce operating expenditures, manage inter-cell interference, and provide high data rates with considerable energy efficiency performance [3] [4] [5]. A CRAN consists of remote radio heads (RRHs), which act as relays that forward user equipment (UE) data to the centralized baseband unit (BBU) pool for processing through wired/wireless fronthaul links. However, the limited capacity and long delays of the fronthaul links are limiting factors that degrade the performance of CRANs [6]. The energy efficiency of CRANs has been studied in the literature while designing resource allocation schemes. The authors in [7] formulated joint RRH selection and power consumption minimization, subject to user QoS requirements and the RRH power budget, as a group sparse beamforming problem. However, the proposed scheme ignored the fact that the fronthaul links have a limited capacity. The work in [8] aimed at optimizing the end-to-end TCP throughput of Mobile Cloud Computing (MCC) users in a CRAN through topology configuration and rate allocation. One limitation of this work is that it did not constrain the capacity consumption of individual links. The authors in [9] minimized the total network power consumption in a CRAN subject to users' QoS for both secure communication and efficient wireless power transfer, limited backhaul capacity, and power budget constraints. However, this proposal considers a single-tier network only.

Consequently, it is necessary to decouple data and control signals in order to alleviate the influence of fronthaul links on energy efficiency and dedicate RRHs to providing high data rates without handling control functions [10]. On the other hand, high-power macro base stations (BSs) in heterogeneous networks support coverage and guarantee backward compatibility with traditional cellular networks, since small cells focus only on boosting the data rate in special zones [11] [12] [13]. Despite the fact that heterogeneous networks are able to improve coverage and capacity, inter-tier interference and the cumulative power consumption of the small cells are critical challenges that must be considered [14] [15]. Energy efficiency oriented resource allocation for heterogeneous networks has attracted the researchers' attention in the literature. In [16], distributed power allocation for multi-cell OFDMA networks taking both energy efficiency and inter-cell interference mitigation into account was investigated, where a bi-objective problem was formulated and solved using multi-objective


optimization. The power allocation, resource block (RB) allocation, and relay selection were optimized in [17] with the goal of maximizing energy efficiency. An optimal power allocation algorithm using equivalent conversion was proposed in [18] to maximize energy efficiency under interference constraints. The authors in [19] explored a system framework of cooperative green heterogeneous networks for 5G wireless communication systems. A joint efficient resource allocation was proposed in [20]. To realize the 5G vision, heterogeneous networks with massive densification of small cells and CRANs are combined in one network structure called heterogeneous cloud radio access networks (H-CRANs) to improve spectral efficiency, resource management, and energy efficiency [21] [22].

In this paper, we propose a green resource allocation scheme for the downlink of H-CRANs between RRHs and their associated UEs that aims at maximizing energy efficiency while satisfying the UE QoS requirements and mitigating inter-tier interference. Allocation includes RB assignment and power allocation. The scheme is developed using centralized and decentralized enhanced online learning approaches. The centralized allocation is performed at a designated controller integrated with the BBU, aided by the macro BS, which is interfaced to the BBU pool to coordinate inter-tier interference and exchange resource allocation control signals. This alleviates the capacity and time delay constraints on the fronthaul and supports bursty traffic efficiently. The decentralized approach relies on macro BS cooperation and local information analysis to let the macro BSs make decisions that reach the appropriate allocation. The proposed learning methodology functions in conjunction with an enhanced spectrum partitioning that classifies the available RBs according to the users' traffic priority and location. For example, UEs with high QoS traffic that are located at the cell edge consume RBs that belong to the RB set dedicated to high-priority users. Moreover, compact state representation and Q-value approximation are employed to handle the curse of dimensionality caused by the large state space during the learning process. To the best of our knowledge, there are no solutions in the literature for resource allocation in H-CRANs that consider energy efficiency using machine learning techniques. One of the advantages of online learning is that it is model-free, which facilitates its usage in dynamic heterogeneous networks.

The key contributions of this paper are summarized as follows,

• We develop an enhanced spectrum partitioning model that divides the available spectrum into two RB sets, where each set is dedicated to a certain group of users according to their location and QoS requirements.

• A centralized joint RB and power allocation scheme for RRHs and their associated UEs is proposed, which relies on a single controller integrated with the BBU. This controller acquires the state information and selects the most appropriate actions that enhance energy efficiency and guarantee QoS requirements for UEs from different tiers.

• To eliminate the possibility of a single point of failure and exploit the environment awareness capability of the macro BSs, we propose a decentralized resource allocation scheme, where the macro BSs utilize their local information and the BBU pool data to select the most appropriate resource allocation actions.

• The proposed online learning model incorporates compact state representation and Q-value approximation to handle the curse of dimensionality and augment the algorithm's convergence.

• We implement both resource allocation schemes using a software defined radio testbed to demonstrate the schemes' capability to perform efficient resource allocation in terms of energy efficiency, BS capacity, and the encountered bit error rate (BER).

The paper is organized as follows: the motivation for this paper is emphasized in Section 2, while Section 3 presents the system model and the problem formulation. The learning model for resource allocation is described in Section 4. The approximated-learning-based centralized approach is illustrated in Section 5, while Section 6 describes the decentralized multi-agent resource allocation methodology. The testbed implementation setup along with the achieved results are presented in Section 7. Numerical results are presented in Section 8, and the paper concludes in Section 9.

2 MOTIVATION

H-CRANs aim at exploiting the advantages of both heterogeneous networks and CRANs while overcoming their drawbacks. Compared with CRANs and heterogeneous networks, H-CRANs have been shown to exhibit significant performance gains, though advanced collaborative signal processing and radio resource allocation are still challenging. Inter-tier interference between the macro BSs and RRHs has a severe impact on energy efficiency. Unlike in traditional CRANs, inter-tier interference should be controlled by an advanced processing technique, and the interference to the macro UEs must be kept at low levels with sophisticated power allocation techniques. The intra-tier interference between the RRHs is another factor that degrades energy efficiency in H-CRANs, in addition to the inter-tier interference, because of the fronthaul capacity constraints. In addition, UEs usually prefer to associate with RRHs because lower transmission power is needed and more resources are allocated compared to association with macro BSs. This is reflected in the energy efficiency of RRHs. Advanced algorithms for heterogeneous networks that consider cell association and fractional frequency reuse (FFR) have been proposed in [23] and [24], respectively, to mitigate inter-tier interference and improve spectral and energy efficiency. The EE-HCRAN scheme in [24] investigates the joint optimization problem of RB and


power allocation in H-CRANs under the constraint of inter-tier interference while maintaining the system capacity at a certain level. The optimization problem is formulated as a non-convex problem with the goal of maximizing energy efficiency. To deal with the non-convexity, an equivalent convex problem is created, based on which an iterative algorithm is proposed to achieve the optimal solution. In [25], a power and sub-channel allocation problem for OFDM heterogeneous networks is investigated by considering all users' rate requirements and inter-cell interference, and a non-convex problem is formulated for power saving purposes. An energy efficiency oriented power and beamforming vector allocation problem for OFDMA heterogeneous networks is proposed in [26], where, by decomposing the main problem into multiple single-constraint problems, a non-convex problem is solved. The authors in [27] proposed a joint optimization of RB and power allocation subject to interference constraints in H-CRANs. The work in [28] formulated a joint optimization problem of relay selection, power allocation, and network selection to maximize the energy efficiency in H-CRANs. However, the proposed schemes are limited to their specified optimization models, which is not practical in such a dynamic environment. Thus, inter-tier interference with QoS guarantees for each tier's users, and resource allocation including RBs and power in an energy efficient fashion in H-CRANs, is still a challenging problem that is not well explored.

3 SYSTEM MODEL AND PROBLEM FORMULATION

The architecture of the considered H-CRANs system is presented in Fig. 1, in which a two-tier cellular network is shown: the macro tier and the small cell tier. The macro BSs, denoted by $u \in U$, are underlaid with small cells (RRHs), denoted by $s \in S$, and the BBU pool performs all baseband processing functionalities. The resource allocation process for RRHs is executed at the controller integrated with the BBU pool in the centralized approach, and it is achieved through cooperation between macro BSs in the distributed approach. The inter-RRH interference is jointly coordinated at the BBU pool. The network users in this model are classified into two categories: MUEs, denoted by index $m \in M$, are the users associated with the macro BS, while RUEs, with index $n$, are the users associated with an RRH. The backhaul interface is utilized to link the macro BSs with the BBU for control exchange. All the RRHs are connected to the BBU pool by the fronthaul links to facilitate data processing, transmissions, and cloud computing. The system RBs are denoted by $k \in K$ with total bandwidth $B$. To enhance the spectral efficiency and mitigate the inter-tier interference, the available spectrum is partitioned into two sets of RBs: the first set, denoted by $\Gamma_1$, incorporates RBs dedicated to RUEs with high QoS requirements or located at their corresponding cell edge. The second set, denoted by $\Gamma_2$, comprises RBs to be shared among MUEs and RUEs with low QoS or located at the cell center [29]. This specific partitioning improves the spectrum efficiency compared to traditional schemes and avoids random exploration of machine learning in RB allocation, which expedites the learning speed of convergence. The QoS requirement is defined as the minimum data rate of the RUEs. The sets $N = \{1, 2, \ldots, N\}$ and $q = \{N+1, \ldots, N+q\}$ represent the users associated with an RRH and occupying the sets $\Gamma_1$ and $\Gamma_2$, respectively. Let us assume that $a^k_n$ is the RB allocation indicator: $a^k_n = 1$ if RB $k$ is allocated to RUE $n$ associated with RRH $s$, and 0 otherwise. For an RUE $n$ served by RRH $s$ on RB $k$, the received signal can be written as,

$$\eta^k_{s,n} = P^k_{s,n}\, g^k_{s,n}\, \rho^k_n + I^k_n + N_0 \tag{1}$$

where $P^k_{s,n}$ is the transmission power of RRH $s$ allocated to RUE $n$ on RB $k$, $\rho^k_n$ is the transmitted information symbol for RUE $n$ on RB $k$, $N_0$ is the noise power, and $g^k_{s,n} = H^k_{s,n}\, l_{s,n}$ is the channel gain between RRH $s$ and RUE $n$ on RB $k$. Note that $H^k_{s,n}$ and $l_{s,n}$ refer to the fast fading coefficient and the path loss between RRH $s$ and RUE $n$, respectively. The encountered interference, denoted by $I^k_n$, is calculated according to the category of the RUE $n$, determined by the set of RBs it accesses. Therefore, $I^k_n$ is calculated as follows,

$$I^k_n = \begin{cases} \sum_{i=1, i \neq n}^{N} \sum_{r \in S, r \neq s} P^k_{r,i}\, g^k_{r,i}, & k \in \Gamma_1 \\ \sum_{m=1}^{M} P^k_{u,m}\, g^k_{u,m} + \sum_{i=N, i \neq n}^{N+q} \sum_{r \in S, r \neq s} P^k_{r,i}\, g^k_{r,i}, & k \in \Gamma_2 \end{cases} \tag{2}$$

where $P^k_{u,m}$ is the transmission power of macro BS $u$ allocated to MUE $m$ on RB $k$, and $g^k_{u,m} = H^k_{u,m}\, l_{u,m}$ is the channel gain between macro BS $u$ and MUE $m$ on RB $k$. Note that the index $i$ represents the other RUEs served by another RRH $r$. The signal to interference and noise ratio (SINR) for the $n$th RUE occupying the $k$th RB is given by,

$$\gamma^k_{s,n} = \frac{P^k_{s,n}\, g^k_{s,n}}{I^k_n + N_0} \tag{3}$$

Similarly, the received signal of an MUE $m$ served by macro BS $u$ is written as follows,

$$\eta^k_{u,m} = P^k_{u,m}\, g^k_{u,m}\, \rho^k_m + I^k_m + N_0 \tag{4}$$

where $\rho^k_m$ is the received information symbol and $I^k_m$ is the interference encountered by the MUE $m$ allocated to RB $k$, evaluated as follows,

$$I^k_m = \sum_{i=N}^{N+q} \sum_{s \in S} P^k_{s,i}\, g^k_{s,i} \tag{5}$$

The SINR achieved by MUE $m$ utilizing RB $k$ and associated with macro BS $u$ is found as follows,

$$\gamma^k_{u,m} = \frac{P^k_{u,m}\, g^k_{u,m}}{I^k_m + N_0} \tag{6}$$
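To make the interference model concrete, the following minimal Python sketch evaluates (2) together with the SINR of (3)/(6) for a toy layout. All names and array shapes (P, g, P_macro, g_macro, co_rues) are hypothetical renderings of the quantities defined above, not part of the paper's implementation.

```python
import numpy as np

def rue_interference(k, n, s, P, g, P_macro, g_macro, co_rues, in_gamma1):
    """Interference I_n^k seen by RUE n (served by RRH s) on RB k, per eq. (2).

    P[r, i, k]    : power RRH r allocates to RUE i on RB k
    g[r, i, k]    : channel gain from RRH r toward RUE n on RB k
    P_macro[m, k] : power macro BS u allocates to MUE m on RB k
    g_macro[k]    : channel gain from macro BS u toward RUE n on RB k
    co_rues       : indices of the RUEs sharing the same RB set
                    (the set N for a Gamma_1 RB, the set q for a Gamma_2 RB)
    """
    S = P.shape[0]
    # Intra-tier term: other RRHs r != s serving co-channel RUEs i != n.
    intra = sum(P[r, i, k] * g[r, i, k]
                for r in range(S) if r != s
                for i in co_rues if i != n)
    if in_gamma1:
        return intra                                    # Gamma_1: RRH tier only
    return intra + np.sum(P_macro[:, k]) * g_macro[k]   # Gamma_2: add macro tier

def sinr(p_tx, gain, interference, n0):
    """SINR of eqs. (3) and (6): desired received power over I + N0."""
    return p_tx * gain / (interference + n0)
```

For a $\Gamma_1$ RB, co_rues would hold the indices of the set $N$; for a $\Gamma_2$ RB, those of $q$, with the macro-tier term added per (2).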


Fig. 1. H-CRANs Architecture. (Figure: RRHs connect to the BBU pool through fronthaul links; macro BSs connect through the X2/S1 backhaul and a gateway to the Internet; the network serves RUEs and MUEs.)

The capacity for all RUEs associated with RRH $s$ is expressed as follows,

$$C_s = \sum_{n=1}^{N+q} \sum_{k=1}^{K} a^k_{s,n}\, B \log_2(1 + \gamma^k_{s,n}) \tag{7}$$

Similarly, the capacity of all MUEs associated with the macro BS $u$ is found as follows,

$$C_u = \sum_{m=1}^{M} \sum_{k=1}^{K} a^k_{u,m}\, B \log_2(1 + \gamma^k_{u,m}) \tag{8}$$

where $a^k_{u,m}$ is the RB allocation indicator, analogous to $a^k_{s,n}$. The power consumption of the H-CRANs system is composed of the power consumed by the RRHs and their related links and the power consumed by the macro BSs [30]. The power consumption of RRH $s$ is evaluated as follows,

$$PC_s = P_{ct} + P_f + P_e + \sum_{n=1}^{N+q} \sum_{k=1}^{K} P^k_{s,n}\, a^k_{s,n} \tag{9}$$

where $P_{ct}$ is the circuit power, $P_f$ is the power consumption of the fronthaul links, and $P_e$ is the power consumed in signal exchange.

The energy efficiency (EE) of the considered H-CRANs system is defined as follows,

$$EE = \frac{S \times C_s + C_u}{S \times PC_s + PC_u} \tag{10}$$

where $PC_u$ is the power consumed by the macro tier. However, H-CRANs are built on dense RRH deployment (i.e., the number of RRHs is much larger than the number of macro BSs). The inter-tier interference from RRHs to MUEs remains nearly constant when the density of RRHs is sufficiently high. Thus, the downlink spectral efficiency and energy efficiency of the macro BSs can be assumed to be stable [24] [30]. Therefore, with sufficiently large $S$, the energy efficiency can be defined as,

$$EE = \frac{C_s}{PC_s} \tag{11}$$

The allocation of RBs $a^k_{s,n}$ and transmission power $P^k_{s,n}$ with the objective of EE maximization in the downlink of H-CRANs, subject to QoS requirements for all RUEs and MUEs and the fronthaul link constraints, is formulated as follows,

$$\max_{P^k_{s,n},\, a^k_{s,n}} EE \quad \text{s.t.} \tag{12}$$

$$\text{C1: } \sum_{k=1}^{K} a^k_{s,n} \leq z_n, \quad \forall s \in S$$

$$\text{C2: } a^k_{s,n} \in \{0, 1\}, \quad \forall s \in S, \forall k \in K$$

$$\text{C3: } \sum_{k=1}^{K} a^k_{s,n}\, B \log_2(1 + \gamma^k_{s,n}) \geq \theta, \quad \forall n \in N$$

$$\text{C4: } \sum_{k=1}^{K} a^k_{s,n}\, B \log_2(1 + \gamma^k_{s,n}) \geq \theta^*, \quad \forall n \in q$$

$$\text{C5: } \sum_{k=1}^{K} a^k_{u,m}\, B \log_2(1 + \gamma^k_{u,m}) \geq \delta, \quad \forall m \in M$$

$$\text{C6: } P^k_{s,n} \leq a^k_{s,n}\, P_{s,max}, \quad \forall s \in S, \forall n \in N \cup q, \forall k \in K$$

$$\text{C7: } \sum_{n=1}^{N+q} \phi\Big(\sum_{k=1}^{K} P^k_{s,n}\Big) \leq \chi_{max}, \quad \forall s \in S$$

The constraint C1 limits the number of RBs allocated to RUE $n$ to $z_n$, which prevents the cloud from greedily allocating all the available RBs to its RUEs, leaving some RBs for the other network tiers. C2 indicates that $a^k_{s,n}$ is a binary variable. The capacity constraints C3 and C4 ensure that the achieved capacity for RUEs with high QoS requirements and with low QoS requirements stays above the thresholds $\theta$ and $\theta^*$, respectively. C5 is the constraint that guarantees QoS for MUE $m$. The transmit power on an unallocated RB is forced to be zero, and the maximum allowed power for each RRH is indicated in C6 as $P_{s,max}$. Finally, C7 is a fronthaul constraint, which limits the number of baseband signals transmitted on the fronthaul link between the cloud and


RRH $s$. $\chi_{max}$ is defined as the maximum number of signals transmitted on the fronthaul link between the cloud and RRH $s$, and $\phi$ is a step function that takes value 1 if $\sum_{k=1}^{K} P^k_{s,n} > 0$ and 0 otherwise. $\sum_{k=1}^{K} P^k_{s,n} = 0$ indicates that RRH $s$ does not serve RUE $n$ and that the fronthaul link between the cloud and RRH $s$ does not carry the baseband signal for RUE $n$. On the other hand, if there is at least one RB $k$ such that $P^k_{s,n} > 0$, then RRH $s$ serves RUE $n$ and the fronthaul link between the cloud and RRH $s$ is active. The resource allocation problem in (12) is tackled using enhanced online learning to reach the ultimate allocation of RBs and transmission power, guaranteeing capacity in the designated sub-bands and inter-tier interference mitigation.
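As a sanity check on the formulation, the sketch below scores a candidate allocation for a single RRH: it computes the EE objective of (11) from (7) and (9), and verifies C1, C3, C4, C6, and C7 (C2 is implicit in the 0/1 indicator, and C5 is the analogous macro-tier check). Array shapes and names are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def energy_efficiency(a, P, sinr, B, P_ct, P_f, P_e):
    """EE of eq. (11): RRH-tier capacity (7) over RRH power consumption (9).

    a[n, k], P[n, k], sinr[n, k] hold the RB indicator, transmit power, and
    SINR of RUE n on RB k for one RRH (hypothetical array layout).
    """
    C_s = np.sum(a * B * np.log2(1.0 + sinr))       # eq. (7)
    PC_s = P_ct + P_f + P_e + np.sum(P * a)         # eq. (9)
    return C_s / PC_s                               # eq. (11)

def constraints_ok(a, P, sinr, B, z, theta, theta_star, P_max, chi_max, high_qos):
    """Rough check of C1, C3, C4, C6, and C7 for one RRH.

    high_qos[n] marks RUEs in Gamma_1 (threshold theta) vs. Gamma_2 (theta_star).
    """
    rate = np.sum(a * B * np.log2(1.0 + sinr), axis=1)     # per-RUE capacity
    thr = np.where(high_qos, theta, theta_star)            # C3 / C4 thresholds
    return (np.all(np.sum(a, axis=1) <= z)                 # C1: at most z RBs per RUE
            and np.all(rate >= thr)                        # C3, C4: QoS rates
            and np.all(P <= a * P_max)                     # C6: power only on owned RBs
            and np.sum(np.sum(P, axis=1) > 0) <= chi_max)  # C7: fronthaul streams (phi)
```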

4 ONLINE LEARNING MODEL FOR RESOURCE ALLOCATION

It is challenging to determine the exact state transition model required by a model-based dynamic programming algorithm, for the following reasons: the spectrum-partitioning-based RB allocation considered in this work is not only location dependent but also considers QoS requirements. Moreover, it is not trivial to list all the state-action pairs needed to migrate from one state to another, and it is not practical to predefine the state transition model when solving the problem. For these reasons, we choose online Q-learning [31] for resource allocation in the specified H-CRANs model. The basic concept of online learning is described as follows: when the network is in state $x^t$ at time step $t$, one of a finite number of possible actions $y^t$, which are elements of the action space $Y$, can be selected. As a result, a reward is received, which is the network's feedback for the action selected at state $x^t$. In this section, we describe the learning model employed to achieve resource allocation that maximizes the network energy efficiency. We suppose that the resource allocation action is chosen by the network controller in the centralized approach, and that it results from the cooperative decision made by the macro BSs in the distributed one, within a slotted time step $t$. The considered learning model for resource allocation is defined as $\zeta = (N, q, \gamma^k_{s,n}, \gamma^k_{u,m}, a^k_{s,n}, P^k_{s,n}, EE)$.

The online learning parameters are defined as follows:

• State: the environment state at a certain time step $t$ is defined as $x^t_n = (n, \text{RUE location}, \theta, \theta^*, \delta, \gamma^k_{s,n}, \gamma^k_{u,m})$. The state information is acquired from the BBU and from the macro BSs, which are assumed to be aware of the small cells operating under their coverage.

• Action: the action $y^t_n = (a^k_{s,n}, P^k_{s,n})$ is defined as the allocation of the RB and transmission power that the RRH assigns to its associated RUEs.

• Reward: the reward function is the energy efficiency and is defined as,

$$R(x, y) = EE(x, y) \tag{13}$$

This indicates that the reward is achieved only if the conditions C1 to C7 are satisfied.

• Transition Function: for a given resource allocation strategy $\pi \in \Pi$, the state transition probability is defined as follows,

$$T(x, y, x') = \Pr(x^{t+1} = x' \mid x^t = x, y^t = y) \tag{14}$$

Basically, the strategy $\pi$ is defined as the probability of selecting action $y$ at state $x$.

The optimal Q-value of the online learning model is defined as the current expected reward plus a discounted future reward, as follows,

$$Q^*(x, y) = E[EE(x, y)] + \beta \sum_{x' \in X} T_{x,x'}(y) \max_{y' \in Y} Q^*(x', y') \tag{15}$$

where $T_{x,x'}(\cdot)$ is the state transition probability and $\beta$ is the discount factor. The optimal Q-value $Q^*_n(x, y)$ is learned by updating the Q-value function on the transition from state $x$ to state $x'$ under the action $y$ at time step $t$ as follows,

$$Q^{t+1}(x, y) = (1 - \alpha^t)\, Q^t(x, y) + \alpha^t \big[EE(x, y) + \beta \max_{y' \in Y} Q^t(x', y')\big] \tag{16}$$

where $\alpha^t \in (0, 1]$ is the learning rate. The initial Q-value for all $(x, y)$ is arbitrary. The online learning model considered here is a stochastic approximation method that solves the Bellman optimality equation associated with the discrete-time Markovian decision process (DTMDP). Online learning does not require an explicit state transition probability model, and it converges with probability one to an optimal solution if $\sum_{t=1}^{\infty} \alpha^t$ is infinite, $\sum_{t=1}^{\infty} (\alpha^t)^2$ is finite, and all state-action pairs are visited infinitely often [32]. Balancing exploration and exploitation is an essential issue in the stochastic learning process. Exploration tries new allocation strategies, so the learner does not only apply the strategies it already knows to be good but also discovers new ones. Exploitation is the process of using well-established strategies. The most common technique to balance exploration and exploitation is ε-greedy selection [33], where ε is the fraction of the time that an agent takes a randomly selected action rather than the action that is most likely to maximize its reward given what it knows so far. Learning usually starts with a lot of exploration (i.e., a higher value of ε). Over time, as the agent learns more about the environment and which actions yield the most long-term reward, it steadily reduces ε as it settles into exploiting what it knows. However, ε-greedy explores uniformly among the available actions (i.e., the worst action is as likely to be chosen as the best one). To overcome this drawback, the action selection probabilities are varied as a graded function of the Q-value: the best power level is given the highest selection probability, while all other levels are ranked according to their Q-values. The learning algorithm exploits the Boltzmann probability distribution [34] to determine the probability of the resource allocation action that fulfills the energy efficiency


maximization constraints in C1 to C7. Thus, the action $y$ in state $x$ is selected at time $t$ with the following probability,

$$\pi^t_n(x, y) = \frac{e^{Q^t(x,y)/\tau}}{\sum_{y' \in Y} e^{Q^t(x,y')/\tau}} \tag{17}$$

where $\tau$ is a positive parameter that controls the selection probabilities. With a high value of $\tau$, the action probabilities become nearly equal, while a low value of $\tau$ causes a big difference in selection probabilities for actions with different Q-values. One issue to report is that the 5G H-CRANs system has a large state space. The curse of dimensionality therefore increases the required computations and makes it infeasible to use the typical online learning methodology of maintaining a Q-value for each state-action pair, which slows the system convergence.
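A compact sketch of the two ingredients just described, under assumed toy dimensions: the Boltzmann (softmax) action selection of (17) and the tabular Q-value update of (16). States, rewards, and sizes are placeholders, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def boltzmann_select(q_row, tau):
    """Pick an action index with the softmax probabilities of eq. (17)."""
    logits = q_row / tau
    logits = logits - logits.max()      # shift for numerical stability
    p = np.exp(logits)
    p = p / p.sum()
    return rng.choice(len(q_row), p=p)

def q_update(Q, x, y, reward, x_next, alpha, beta):
    """One tabular Q-value update of eq. (16) on the transition (x, y) -> x_next."""
    Q[x, y] = (1 - alpha) * Q[x, y] + alpha * (reward + beta * Q[x_next].max())

# Toy usage: 4 states, 3 actions (all numbers illustrative).
Q = np.zeros((4, 3))
y = boltzmann_select(Q[0], tau=0.5)
q_update(Q, x=0, y=y, reward=1.2, x_next=1, alpha=0.5, beta=0.9)
```

Note how a larger tau flattens the selection probabilities while a smaller tau concentrates them on the highest Q-value, exactly the behaviour described above.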

5 CENTRALIZED APPROXIMATED ONLINE LEARNING RESOURCE ALLOCATION SCHEME

In this approach, the resource allocation process is performed at a dedicated controller integrated with the BBU pool, and the macro BSs act as brokers between the controller and the RRHs for control exchange, with the goal of inter-tier interference mitigation and energy efficiency maximization. Resource allocation and data processing signals from the controller to the macro BSs are sent through the X2 and S1 interfaces, respectively, which are adopted from the Third-Generation Partnership Project (3GPP) standards [35]. All the entities in the network, including RUEs, MUEs, and RRHs, report the channel state information and the RUE locations to the macro BS under whose coverage they operate, in a hierarchical manner. The channel state information includes the path loss and channel gains from the serving RRH and the macro BS to the RUEs and MUEs. All the macro BSs provide this information to the controller through the control exchange interface. The controller exploits the reported information and QoS requirements to select the proper RB and transmission power using online learning. The allocation decision made by the controller is sent to the RRH through the macro BS. Note that the SINR is the state information exploited in action selection, and it is determined from the reported channel information and the allocated transmission power.

The computational complexity of the system increases with the size of the state and action spaces. A simple look-up table, where a separate Q-value is maintained for each state-action pair, is not feasible in a large space with a massive number of states like our system. Therefore, we propose a compact representation of the Q-values, in which they are approximated as a function of a much smaller set of variables to account for the curse of dimensionality. The compact representation of the Q-value focuses on a countable state space $X^*$ using the function $Q' : X^* \times Y$, which is referred to as a function approximator. The parameter vector $\xi = \{\xi_z\}_{z=1}^{Z}$ is adopted to approximate the Q-value by minimizing a metric of the difference between $Q^*(x, y)$ and $Q'(x, y, \xi)$ for all $(x, y) \in X^* \times Y$. Thus, the approximated value $Q'$ is formalized as follows,

$$Q'(x, y, \xi) = \sum_{z=1}^{Z} \xi_z \psi_z(x, y) = \xi\, \psi^T(x, y) \tag{18}$$

where $T$ denotes the transpose operator, $\psi(x, y) = [\psi_z(x, y)]_{z=1}^{Z}$ is a vector whose scalar components $\psi_z(x, y)$ are the basis functions (BFs) defined over $X^* \times Y$, and $\xi_z$ $(z = 1, \ldots, Z)$ are the associated weights. The gradient $\psi(x, y)$, which is the vector of partial derivatives of $\xi^t \psi^T(x, y)$ with respect to the elements of $\xi^t$, is used to combine the typical online learning model defined in (16) with the proposed linearly parametrized approximated online learning.

The Q-value update rule in (16) is reconstructed to include the parameter vector updates as follows,

$$\xi^{t+1} \psi^T(x, y) = \Big\{ (1 - \alpha^t)\, \xi^t \psi^T(x, y) + \alpha^t \big[ EE(x, y) + \beta \max_{y' \in Y} \xi^t \psi^T(x', y') \big] \Big\}\, \psi(x, y) \tag{19}$$

The probability of selecting a certain action, presented in (17), is updated with the Q-value approximation as follows,

$$\pi^t(x, y) = \frac{e^{\xi^t \psi^T(x,y)/\tau}}{\sum_{y' \in Y} e^{\xi^t \psi^T(x,y')/\tau}} \tag{20}$$

The online learning process with the approximated Q-value is illustrated in Algorithm 1. The algorithm takes the QoS

Algorithm 1 Centralized approximated online learning algorithm for resource allocation

Data: $\pi^t(x, y)$, $t = 1$, $\delta$, $\theta$, $\theta^*$, $\gamma^k_{s,n}$, $\gamma^k_{u,m}$
Result: RB and $P^k_{s,n}$ allocation for RUEs

initialization of learning
for each $(x, y \in Y')$ do
    initialize resource allocation strategy $\pi^t(x, y)$;
    initialize approximated Q-value $\xi^t \psi^T(x, y)$;
end
while (true) do
    evaluate the state $x = x^t$;
    select action $y$ according to $\pi^t(x, y)$ in (20);
    if (C1 to C7 are satisfied) then
        $R(x, y)$ is achieved
    else
        $R(x, y) = 0$
    end
    update $\xi^{t+1} \psi^T(x, y)$ according to (19);
    update $\pi^{t+1}(x, y)$ according to (20);
    $x = x^{t+1}$;
    $t = t + 1$;
end

requirements for RUEs and MUEs as input to check the quality of the selected strategies, which are compared against the capacity achieved by the different BSs. The algorithm selects action strategies according to (20). If the conditions C1 to C7 are satisfied, then the reward is achieved. Finally, the Q-value and the resource allocation strategy are updated according to (19) and (20), respectively, and the new state is observed.
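The following sketch mirrors one iteration of Algorithm 1 with the linear approximator of (18): action selection via (20) and a standard semi-gradient rendering of the parameter update in (19). Basis-function values, rewards, and dimensions are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def select_action(xi, psi, tau):
    """Boltzmann selection over approximated Q-values Q' = xi . psi, eq. (20)."""
    q = psi @ xi                        # eq. (18) for every candidate action
    p = np.exp((q - q.max()) / tau)
    p = p / p.sum()
    return rng.choice(len(q), p=p)

def fa_update(xi, psi_xy, reward, psi_next, alpha, beta):
    """Semi-gradient rendering of the parameter update in eq. (19):
    step xi along the gradient psi(x, y) of the temporal-difference error."""
    target = reward + beta * (psi_next @ xi).max()
    td = target - psi_xy @ xi
    return xi + alpha * td * psi_xy

# One learning step with Z basis functions and A candidate allocation actions.
Z, A = 8, 5
xi = np.zeros(Z)
psi = rng.random((A, Z))                # BF values psi(x, y) for the current state
a = select_action(xi, psi, tau=0.5)
psi_next = rng.random((A, Z))           # BF values for the observed next state
reward = 1.0                            # EE(x, y) when C1-C7 hold, 0 otherwise
xi = fa_update(xi, psi[a], reward, psi_next, alpha=0.5, beta=0.9)
```

The gradient step along $\psi(x, y)$ reflects the fact that $\psi(x, y)$ is the vector of partial derivatives of $\xi \psi^T(x, y)$ with respect to $\xi$, as noted above.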


To demonstrate the convergence of Algorithm 1, we derive the necessary conditions for convergence of the proposed approximated learning resource allocation. To start the proof, we introduce the following definition and assumptions [36] [37].

Definition 1: Let $\Psi = E[\psi^T(x, y)\, \psi(x, y)]$. For the parameter vector $\xi$ and a particular network state $x \in X^*$, we define a vector $\psi(x, \xi) = [\psi_z(x, y)]$ for $z = 1, \ldots, Z$, where $y \in \aleph = \{y = \arg\max_{y' \in Y} \xi \psi^T(x, y')\}$ is the set of optimal joint resource allocation actions for $x$. We define the following $\xi$-dependent matrix:

$$\Psi' = E[\psi^T(x, \xi)\, \psi(x, \xi)] \tag{21}$$

Assumption 1: The basis functions $\psi_z(x, y)$ are linearly independent for all $(x, y)$, and all the properties of $Q^t(x, y)$ in the previous discussion apply to the dot product $\xi^t \psi^T(x, y)$.

Assumption 2: For every $z = 1, 2, \ldots, Z$, $\psi_z(x, y)$ is bounded, which means $E\{\psi_z^2(x, y)\} < \infty$, and the reward function satisfies $E\{R^2(x, y)\} < \infty$.

Assumption 3: The learning rate satisfies $\sum_{t=1}^{\infty} \alpha^t = \infty$ and $\sum_{t=1}^{\infty} (\alpha^t)^2 < \infty$.

Proposition 1: Under Assumptions 1-3 and Definition 1, the approximated online learning algorithm converges with probability (w.p.) 1 if

$$\Psi' > \Psi, \quad \forall \xi \tag{22}$$

Proof: The proof of convergence requires finding stable fixed points of the ordinary differential equation (ODE) associated with the update rule in (19), which can be written as,

$$\dot{\xi}^t = E\big[\big(EE(x, y) + \beta \xi^t \psi^T(x', \xi^t) - \xi^t \psi^T(x, y)\big)\, \psi(x, y)\big] \tag{23}$$

where $\dot{\xi}^t = \frac{\partial \xi}{\partial t}$ as $\alpha \to 0$. We define two trajectories of the ODE, $\xi^t_1$ and $\xi^t_2$, that have different initial conditions and satisfy $\xi^t_0 = \xi^t_1 - \xi^t_2$. Then, we have

$$\frac{\partial \|\xi^t_0\|^2}{\partial t} = 2(\dot{\xi}^t_1 - \dot{\xi}^t_2)(\xi^t_0)^T = 2\beta E\big[\xi^t_1 \psi^T(x', \xi^t_1)\, \psi(x, y) (\xi^t_0)^T - \xi^t_2 \psi^T(x', \xi^t_2)\, \psi(x, y) (\xi^t_0)^T\big] - 2\, \xi^t_0 \Psi (\xi^t_0)^T \tag{24}$$

From the definition of $\psi(x, \xi)$ in Definition 1, we can deduce the following two inequalities,

$$\xi^t_1 \psi^T(x', \xi^t_1) \geq \xi^t_1 \psi^T(x', \xi^t_2) \tag{25}$$

$$\xi^t_2 \psi^T(x', \xi^t_2) \geq \xi^t_2 \psi^T(x', \xi^t_1) \tag{26}$$

As the expectation $E$ in (24) is taken over different states and different actions, we can define the two sets $\Lambda^+ = \{(x, y) \in X \times Y \mid \xi^t_0 \psi^T(x, y) > 0\}$ and $\Lambda^- = X \times Y - \Lambda^+$. If we combine (25) and (26) in (24), we get,

$$\frac{\partial \|\xi^t_0\|^2}{\partial t} \geq 2\beta \Big( E\big[\xi^t_0 \psi^T(x', \xi^t_2)\, \psi(x, y) (\xi^t_0)^T \,\big|\, \Lambda^+\big] + E\big[\xi^t_0 \psi^T(x', \xi^t_1)\, \psi(x, y) (\xi^t_0)^T \,\big|\, \Lambda^-\big] \Big) - 2\, \xi^t_0 \Psi (\xi^t_0)^T \tag{27}$$

After applying Hölder's inequality [38] to the expectations in (27), we get,

$$\frac{\partial \|\xi^t_0\|^2}{\partial t} \geq 2\beta \Big( \sqrt{E\big[(\xi^t_0 \psi^T(x', \xi^t_2))^2 \,\big|\, \Lambda^+\big]} \times \sqrt{E\big[(\psi(x, y)(\xi^t_0)^T)^2 \,\big|\, \Lambda^+\big]} + \sqrt{E\big[(\xi^t_0 \psi^T(x', \xi^t_1))^2 \,\big|\, \Lambda^-\big]} \times \sqrt{E\big[(\psi(x, y)(\xi^t_0)^T)^2 \,\big|\, \Lambda^-\big]} \Big) - 2\, \xi^t_0 \Psi (\xi^t_0)^T$$

$$\geq 2\beta \Big( \sqrt{E\big[(\xi^t_0 \psi^T(x', \xi^t_2))^2\big]} \times \sqrt{E\big[(\psi(x, y)(\xi^t_0)^T)^2 \,\big|\, \Lambda^+\big]} + \sqrt{E\big[(\xi^t_0 \psi^T(x', \xi^t_1))^2\big]} \times \sqrt{E\big[(\psi(x, y)(\xi^t_0)^T)^2 \,\big|\, \Lambda^-\big]} \Big) - 2\, \xi^t_0 \Psi (\xi^t_0)^T$$

If we apply the definition of $\Psi'$ in Definition 1, we get,

$$\frac{\partial \|\xi^t_0\|^2}{\partial t} \geq 2\beta \sqrt{\max\big[\xi^t_0 \Psi'_1 (\xi^t_0)^T,\; \xi^t_0 \Psi'_2 (\xi^t_0)^T\big]} \times \sqrt{E\big[(\psi(x, y)(\xi^t_0)^T)^2\big]} - 2\, \xi^t_0 \Psi (\xi^t_0)^T \tag{28}$$

According to the condition in (22), we can state that,

$$\frac{\partial \|\xi^t_0\|^2}{\partial t} \geq 2\beta\, \xi^t_0 \Psi (\xi^t_0)^T - 2\, \xi^t_0 \Psi (\xi^t_0)^T = (2\beta - 2)\, \xi^t_0 \Psi (\xi^t_0)^T < 0 \tag{29}$$

which means that $\xi^t_0$ converges to the origin, and this confirms that there exists a stable point of the ODE in (23). Thus, the proposed online learning with Q-value approximation converges w.p. 1.

Consequently, the stable point $\xi^*$ of the ODE in (23) satisfies,

$$0 = E\big[\big(EE(x, y) + \beta \xi^* \psi^T(x', \xi^*) - \xi^* \psi^T(x, y)\big)\, \psi(x, y)\big] \tag{30}$$

and $\xi^*$ can be found as follows,

$$\xi^* = E\big[\big(EE(x, y) + \beta \xi^* \psi^T(x', \xi^*)\big)\, \psi(x, y)\big]\, \Psi^{-1} \tag{31}$$

As a result, the approximated online Q-function is stated as follows,

$$Q'(x, y, \xi^*) = \xi^* \psi^T(x, y) \tag{32}$$

6 DECENTRALIZED MULTI-AGENT ONLINE LEARNING RESOURCE ALLOCATION SCHEME

Even with the Q-function approximation using the compact representation, the number of actions can grow exponentially with the number of cells deployed in the network. Thus, having a centralized controller for resource allocation is not the best practice. As the macro BSs can locally manage their operations, there is a possibility to deploy a decentralized resource allocation scheme using online learning that functions through the macro BSs. All the macro BSs can learn in a cooperative manner how to make local resource allocation decisions for the RUEs associated with RRHs. In this way, the resource allocation task is achieved using multi-agent online learning, where the macro BSs represent the


agents. The macro BSs estimate the SINR according to the channel state information reported by the network nodes and the allocated power, and execute the online learning algorithm for resource allocation.

We assume that the macro BSs learn in a team Markov game defined as $G_E = \{U, X', Y, T, R\}$, with the common goal of finding a joint resource allocation strategy $\pi$ that mitigates the inter-tier interference and maximizes energy efficiency, where $U$ is the set of the macro BSs. The optimal Q-value $Q^*(x, y)$ for all $(x, y) \in X' \times Y$ defines the optimal resource allocation strategy and captures the Markovian game structure. For each network state $x \in X'$, the action in the team Markov game is generated by the $U$ independent macro BSs in a decentralized fashion. The decided action $y$ at state $x$ is considered optimal if $Q^*(x, y) \geq Q^*(x, y')$ for all $y' \in Y$. The macro BSs are assumed to learn using the compact representation model stated in (19). Consequently, we can deduce the following proposition.

Proposition 2: For the Markovian game $G_E$, the decentralized multi-agent online learning algorithm converges w.p. 1 if the condition in Proposition 1 holds.

Proof: If we consider each macro BS as a single controller that follows a stationary resource allocation strategy, then the Markovian game is a DTMDP as in Section 5. As a result, the proof follows the same procedure as the proof of Proposition 1.

To proceed with the multi-agent learning, the following assumptions are made,

Assumption 4: The resource allocation strategies of different macro BSs do not alter significantly in similar network states.

Assumption 5: The initial network state $x^t$ evolves following a Harris recurrent Markov chain [39].

According to Assumption 4, each macro BS can conjecture the allocation strategy of the other macro BSs without explicit cooperation, through the use of historical knowledge, if it encounters the same network state. The similarity between network states can be measured in terms of the Hamming distance [40], denoted by $D_H(x, x')$. We define the historical knowledge up to time step $t$ using a σ-algebra as follows,

$$F(t) = \sigma\big(\{x(b), y(b)\}_{b=1}^{t},\, \{R(x(b), y(b))\}_{b=1}^{t-1}\big) \tag{33}$$

where the information of each experienced network state $x(b)$, each performed joint action $y(b)$, and the network energy efficiency (the reward) $R(x(b), y(b))$ can be obtained from the BBU. At each time step $t$, each macro BS checks the Hamming distance between the current state $x(t)$ and each state $x(b)$ in $F(t)$. Then, it creates a sample set $X_F(x^t, F(t))$, which includes the $F$ most recent observations from $F(t)$ that have a minimum value of $\sum_{f=1}^{F} D_H(x^t, x^b)$. Now, the common reward $R_c(x, y)$ that all macro BSs receive after they perform a joint resource allocation action $y \in Y$ is set to 1 if $y = \arg\max_{y' \in Y} Q'(x, y', \xi^*)$ and 0 otherwise. Moreover, we define $Y_u$ for each macro BS as the set of joint actions that give reward 1 in state $x^t$. The decentralized multi-agent learning process for resource allocation in H-CRANs is illustrated in Algorithm 2; a small sketch of the state-similarity selection follows below.
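A minimal sketch of the state-similarity step, assuming network states are discretized into small tuples: it ranks the records of F(t) by Hamming distance to the current state and keeps the F closest among recent observations. The record layout and the scan window are assumptions for illustration only.

```python
def hamming(x1, x2):
    """D_H(x, x'): number of positions where two discretized states differ."""
    return sum(a != b for a, b in zip(x1, x2))

def similar_states(x_t, history, F):
    """Return the F records of F(t) closest to x_t in Hamming distance.

    history: list of (state, joint_action, reward) tuples, oldest first.
    Only a bounded window of recent records is scanned (an assumption).
    """
    recent = history[-4 * F:]
    ranked = sorted(recent, key=lambda rec: hamming(x_t, rec[0]))
    return ranked[:F]

# Toy usage: states are small discrete tuples.
hist = [((0, 1, 0), "y1", 1.0), ((1, 1, 0), "y2", 0.0), ((0, 1, 1), "y3", 1.0)]
print(similar_states((0, 1, 0), hist, F=2))
```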

Algorithm 2 Decentralized online learning algorithm for resource allocation

Data: $\pi^t(x, y)$, $t = 1$, $\delta$, $\theta$, $\theta^*$, $\gamma^k_{s,n}$, $\gamma^k_{u,m}$, and $1 \leq \mu \leq F \leq \nu$
Result: RB and $P^k_{s,n}$ allocation for RUEs

initialization of learning
for each $(x, y \in Y')$ do
    initialize resource allocation strategy $\pi^t(x, y)$;
    initialize approximated Q-value $\xi^t \psi^T(x, y)$;
end
while (true) do
    evaluate the state $x = x^t$;
    if ($t < \nu + 1$) then
        select action $y$ according to $\pi^t(x, y)$ in (20);
        if (C1 to C7 are satisfied) then
            $R(x, y)$ is achieved
        else
            $R(x, y) = 0$
        end
    else
        update $Y_u(x^t) = \{y \mid R_c(x, y) = 1\}$ for $x^t$;
        randomly select $Y_F(X_F(x^t, F(t)))$ out of the $F$ joint actions associated with $X_F(x^t, F(t))$;
        calculate $R'_c(x, y_j)$ according to (34) and populate $Y'_u(x^t)$;
        if (C8 and C9 hold) then
            select the action from $Y_F(X_F(x^t, F(t))) \cap Y_u(x^t)$
        else
            select an action from $Y'_u(x^t)$
        end
    end
    update $\xi^{t+1} \psi^T(x, y)$ according to (19);
    update $\pi^{t+1}(x, y)$ according to (20);
    $x = x^{t+1}$;
    $t = t + 1$;
end

Note that $\mu$ and $\nu$ are two integers that satisfy $1 \leq \mu \leq F \leq \nu$. The algorithm states that when $t < \nu$, the selection probability in (20) is engaged to determine the resource allocation strategy. From $t = \nu + 1$, each macro BS selects $\mu$ records of $Y_F(X_F(x^t, F(t)))$ from the $F$ joint actions for $X_F(x^t, F(t))$. If the following conditions C8 and C9 are met, the macro BS $u$ selects the allocation action $y_u(b^*)$, where $y_u(b^*) = \max_b \{b \mid y(b) \in Y_F(X_F(x^t, F(t))) \cap Y_u(x^t)\}$.

• C8: there exists an allocation action $y = (y_u, y_{-u}) \in Y_u(x^t)$ such that $y'_{-u} = y_{-u}$ for all $y' = (y'_u, y'_{-u}) \in Y_F(X_F(x^t, F(t)))$

• C9: there exists at least one action $y \in Y_F(X_F(x^t, F(t))) \cap Y_u(x^t)$

However, if C8 and C9 are not met, macro BS $u$ will select an action from $Y'_u(x^t) = \{y_u \mid y_u = \arg\max_{y'_u} R'_c(x^t, y'_u)\}$, where

$$R'_c(x^t, y_u) = \sum_{y_{-u}} R_c(x^t, y)\, \frac{T^t_u(x^t, y_{-u})}{\nu} \tag{34}$$

$\nu$ is the number of cases randomly drawn from the $F$ most recent actions, and $T^t_u(x^t, y_{-u})$ is the number of times the conjectured action


of the other macro BSs, $y_{-u}$, is performed in state $x^t$.

The convergence of $\{\xi^t\}$ to the optimal $\xi^*$ arises as a result of Proposition 2. With Assumptions 4 and 5, the team stage game $G_E$ is reduced to a team game under the network states $X_F(x^t, F(t))$. According to Theorem 1 in [41], the $U$ macro BSs coordinate the resource allocation strategy for all $x^t$ as long as $\nu \leq F/(\rho_{G_E} + 2)$, where $\rho_{G_E}$ is the length of the shortest path in the best response graph of the team stage game $G_E$ [42]. This confirms that the decentralized resource allocation algorithm converges with probability one.
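The conjecture step can be sketched as follows. Assuming ν sampled records (own action, other BSs' joint behaviour, common reward) are available for the current state, the estimator below computes R'_c of (34) by weighting each common reward with the empirical count T^t_u(x^t, y_{-u}). Names and record formats are hypothetical.

```python
from collections import Counter

def conjectured_reward(y_u, samples, nu):
    """R'_c of eq. (34): score own action y_u by weighting the common reward
    R_c of the joint action (y_u, y_-u) with how often the other macro BSs'
    behaviour y_-u was observed in this state (T_u^t(x^t, y_-u) / nu).

    samples: nu records (own_action, y_minus_u, common_reward) for the
    current state -- a hypothetical layout for illustration.
    """
    counts = Counter(y_minus for _, y_minus, _ in samples)   # T_u^t(x^t, y_-u)
    # Common reward of the joint action (y_u, y_minus), taken from the samples.
    rc = {y_minus: r for own, y_minus, r in samples if own == y_u}
    return sum(rc.get(y_minus, 0) * c for y_minus, c in counts.items()) / nu

# Toy usage: score two candidate own-actions against four sampled records.
recs = [("a", "p", 1), ("a", "q", 0), ("b", "p", 1), ("a", "p", 1)]
print(conjectured_reward("a", recs, nu=4))   # 0.75: y_-u = "p" seen 3 times
print(conjectured_reward("b", recs, nu=4))   # 0.75
```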

7 TESTBED IMPLEMENTATION

The H-CRANs system is implemented using a radio front-end testbed to demonstrate the performance of the proposed resource allocation scheme. The logical architecture of the considered H-CRANs testbed, for both centralized and decentralized resource allocation, consists of four main components: a centralized BBU pool for baseband processing, macro BSs, RRHs represented by low-power pico BSs, and the resource allocation controller integrated with the BBU pool. The heterogeneous network, including the macro and pico BSs, is connected to the BBU pool through wired links.

As the SINR is utilized by the online learning model as state information, a communication protocol that facilitates the acquisition of channel state information is needed. The communication protocol performs three tasks: synchronization, SINR estimation, and resource allocation. Therefore, the resource allocation frame structure is divided into three stages:

• Synchronization Stage: in this stage, each macro BS synchronizes with the BSs operating under its coverage by periodically broadcasting a beacon frame to compensate for any misalignment in the clock frequencies of each node. Moreover, the beacon frame includes some information about the network, such as the number of time steps in each stage and the time step occupancy in the following stages.

• Acquisition Stage: this is a TDMA-based stage in which each UE reports its channel state information to its corresponding pico BS or macro BS. It is composed of three time slots: the first and second slots are reserved for the macro BS and its associated MUEs to acquire their channel state information and find the channel gains of all links for each RB. The third slot is dedicated to the pico BSs to report their channel state information, while the other pico BSs estimate the channel gain with that pico BS on the same RB.

• Resource Allocation Stage: in this stage, the online learning-based resource allocation algorithms are executed, whether at the controller as in the centralized approach or through the macro BSs in the decentralized approach.

7.1 Implementation Setup

We exploit GNU Radio [43] as a software defined radio (SDR) development platform to create digital signals. GNU Radio is an open-source software development toolkit that provides signal processing blocks to implement wireless protocols. As GNU Radio can only handle digital data, RF front ends are required to shift the baseband signal to the desired center frequency. The USRP N210 from Ettus Research [29] is utilized as the RF front end in our setup, with a CBX daughterboard that can operate in the frequency range of 1 GHz to 6 GHz with one antenna. The USRP N210 provides high-bandwidth, high-dynamic-range processing capability and is designed for demanding communications applications requiring this type of rapid development. The product architecture comprises a Xilinx Spartan 3A-DSP 3400 FPGA, a 100 MS/s dual ADC, a 400 MS/s dual DAC, and Gigabit Ethernet connectivity to stream data to and from host processors. In addition, we add two Dell servers as SDR processors that run GNU Radio to perform the baseband processing and host the controller implementation for resource allocation. The centralized and decentralized resource allocation topologies considered in the testbed implementation are presented in Fig. 2 and Fig. 3, respectively. We have two macro BSs, four pico BSs, four RUEs, and two MUEs, where each of them is represented by one USRP N210. In the centralized resource allocation approach, only one Dell server is utilized as the controller and the unit for baseband processing, whereas two servers are used in the distributed resource allocation approach, each processing the resource allocation algorithm implemented in one macro BS.

Fig. 2. Centralized testbed implementation topology. (Figure: one Dell server hosts the controller and BBU; each macro BS, pico BS, RUE, and MUE is a USRP-N210; RUE 1 is the user with high QoS requirements and RUE 2 the user with low QoS requirements.)

Fig. 3. Decentralized testbed implementation topology. (Figure: the BBU runs on two Dell servers; each macro BS, pico BS, RUE, and MUE is a USRP-N210; RUE 1 is the user with high QoS requirements and RUE 2 the user with low QoS requirements.)


The conceptual model of each node in the network, for both the centralized and decentralized resource allocation approaches, is presented in Fig. 4. This model consists of the following modules: the communication protocol module, which is common to all nodes and performs the tasks mentioned before; the online learning algorithm module, which includes the implementation of the resource allocation process in Algorithms 1 and 2 and resides in the controller in the centralized approach and at the macro BS in the distributed methodology; the GNU Radio PHY module, which performs the physical layer functionality supported by the physical layer blocks in the GNU Radio software; and finally, the SINR estimation module, which relies on the probe block in GNU Radio, where the probe performs an average magnitude-square process on the samples acquired during the sensing time. The results acquired from the probe block are further used to estimate the signal and noise powers: the signal power is estimated by averaging the acquired samples over a certain time, while the noise power is estimated from the variance of the acquired samples. Table 1 presents the parameters used for the testbed

implementation, including the communication protocol, configuration, and online learning parameters.

Fig. 4. Conceptual model of the resource allocation system. (Figure: every node runs the communication protocol, GNU Radio PHY, and SINR estimation modules; the online learning algorithm resides in the controller in the centralized approach and in the macro BS in the decentralized approach.)

TABLE 1
System Parameters

Communication protocol parameters:
  Acquisition stage time slots: 3
  Sensing duration: 10 ms
  Modulation scheme: GMSK

Configuration parameters:
  Macro BS Tx power: 20 dBm
  System bandwidth: 10 MHz
  Carrier frequency: 2.4 GHz
  Sampling rate: 10 MHz
  Antenna gain: 3 dBi

Online learning parameters:
  Learning rate α: 0.5
  Discount factor β: 0.9
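A sketch of the probe-based SINR estimation described above: the average magnitude square of samples captured during the sensing time gives the signal-plus-noise power, and the variance of noise-only samples gives the noise floor. This is a NumPy illustration of the idea, not the actual GNU Radio probe-block wiring; all names are assumptions.

```python
import numpy as np

def estimate_sinr_db(samples_signal, samples_noise):
    """Estimate SINR from complex baseband samples (illustrative names):
    average |x|^2 of active-link samples minus the noise floor, over the
    noise floor taken as the variance of noise-only samples."""
    p_total = np.mean(np.abs(samples_signal) ** 2)   # avg magnitude square
    p_noise = np.var(samples_noise)                  # noise power from variance
    p_signal = max(p_total - p_noise, 1e-12)         # guard against negatives
    return 10.0 * np.log10(p_signal / p_noise)

# Toy usage with synthetic complex baseband samples.
rng = np.random.default_rng(2)
noise = (rng.normal(size=4096) + 1j * rng.normal(size=4096)) / np.sqrt(2)
signal = 3.0 * np.exp(2j * np.pi * 0.01 * np.arange(4096)) + noise
print(f"estimated SINR: {estimate_sinr_db(signal, noise):.1f} dB")
```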

7.2 Testbed Experimental Results

In this section, we present the performance evaluation of our proposed centralized and decentralized approaches compared to the standard equal power allocation scheme. The logical structures in Fig. 2 and Fig. 3 are used as the network setup, where the H-CRANs incorporate two macro BSs. Each macro BS has two small BSs operating under its coverage and one MUE. One of the small cells communicates with RUE 1, which has high QoS requirements, while the other cell is connected to the low-QoS RUE 2. The RUE 1 traffic type is video file streaming, while RUE 2 and the MUE receive a small TCP file. The evaluation of the network operation targets the UEs' achieved capacity, the network energy efficiency, and the UEs' BER. Fig. 5, Fig. 6, and Fig. 7 present the average capacity achieved by the small cell with the high-QoS RUE, the small cell with the low-QoS RUE, and the macro BS with one MUE, respectively. We notice that our proposed scheme, in

Fig. 5. Average high QoS pico BS capacity. (Plot: average capacity in bits/sec/Hz vs. time step for the standard, centralized, and decentralized schemes against the capacity threshold.)

Fig. 6. Average low QoS pico BS capacity. (Plot: average capacity in bits/sec/Hz vs. time step for the standard, centralized, and decentralized schemes against the capacity threshold.)

both approaches, outperforms the standard, where fixed power allocation is assumed, in the achieved capacities for all users. Moreover, the proposed scheme maintains the capacity above the threshold. In Fig. 5, the centralized approach for resource allocation converges to a capacity of 20 bits/sec/Hz, while the decentralized approach saturates at 18.5 bits/sec/Hz. The capacity of the low-QoS RUE is 14 bits/sec/Hz and 12.8 bits/sec/Hz for the centralized and distributed resource allocation approaches, respectively, as in Fig. 6. Finally, the capacity for the macro UE is measured to be 15 bits/sec/Hz for the centralized approach and 14


Fig. 7. Average macro BS capacity. (Plot: average capacity in bits/sec/Hz vs. time step for the standard, centralized, and decentralized schemes against the capacity threshold.)

bits/sec/Hz for the distributed approach, as in Fig. 7. The convergence time for the centralized scheme is 300 time steps, while it is 330 for the decentralized scheme, where the time duration of each step is 0.065 ms. The BER evaluations for RUE 1, RUE 2, and the MUE are plotted in Fig. 8, Fig. 9, and Fig. 10, respectively. The proposed resource allocation scheme in both approaches succeeds in keeping the BER below the BER threshold, which is 0.001 for RUE 1, as in Fig. 8, and below the BER threshold for RUE 2, which is 0.005, as in Fig. 9. In addition, the recorded BER for the MUE using the decentralized and centralized approaches is maintained at 0.00038 and 0.00022, respectively, as in Fig. 10. Thus, our scheme reduces the BER by a factor of 10 compared to the standard approach. The average

Fig. 8. Average BER experienced by high QoS RUE. (Plot: average BER vs. time in ms for the standard, centralized, and decentralized schemes against the BER threshold.)

energy efficiency for the small cell that supports RUE 1 and the small cell that supports RUE 2 is plotted in Fig. 11 and Fig. 12, respectively, as a function of the time step, with a step duration of 0.065 ms. Fig. 11 shows that our proposed scheme achieves an efficiency of 1.9 bps/Hz/W and 1.75 bps/Hz/W using the centralized and decentralized approaches, respectively, compared to 0.98 bps/Hz/W achieved by the standard. On the other hand, our scheme achieves an energy efficiency of 1.45 bps/Hz/W and 1.33 bps/Hz/W with the centralized and decentralized approaches, respectively, for RUE 2. Fig. 11 and Fig. 12 also show the convergence time for both approaches, which is 300 time steps for

Fig. 9. Average BER experienced by low QoS RUE (average BER vs. time in ms; curves: Standard, Decentralized Resource Allocation, Centralized Resource Allocation, BER Threshold)

Fig. 10. Average BER experienced by MUE (average BER, ×10⁻³, vs. time in ms; curves: Standard, Centralized Resource Allocation, Decentralized Resource Allocation, BER Threshold)

Fig. 11. Average energy efficiency for high QoS pico BS (average energy efficiency in bps/Hz/W vs. time step; curves: Centralized Resource Allocation, Decentralized Resource Allocation, Standard)

The system evaluation in Fig. 5 to Fig. 12 demonstrates the capability of the implemented sophisticated online learning in resource allocation, reflected by the superior results recorded.

Moreover, we notice that the proposed scheme converges within a short time thanks to the approximation of the online learning Q-value, which reduces the gap between the approximated Q-value and the optimal one. Another point to report from the evaluation is that the decentralized approach converges at a rate close to that of the centralized approach.
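To make this concrete, the following minimal Python sketch (not the paper's exact formulation) keeps the Q-value as a linear function of a compact feature vector instead of a full table, which is the essence of such an approximation; the feature map phi, the learning rate alpha, and the discount factor gamma are illustrative assumptions:

import numpy as np

def phi(state, action, dim=8):
    # Hypothetical compact feature map: hashes the (state, action)
    # pair into a small fixed-length feature vector.
    rng = np.random.default_rng(abs(hash((state, action))) % (2**32))
    return rng.standard_normal(dim)

class ApproxQ:
    # Q-value stored as w . phi(s, a) rather than a table, so memory and
    # updates scale with the feature dimension, not the state space size.
    def __init__(self, actions, dim=8, alpha=0.1, gamma=0.9):
        self.w = np.zeros(dim)
        self.actions = actions
        self.alpha, self.gamma = alpha, gamma

    def q(self, s, a):
        return float(self.w @ phi(s, a))

    def update(self, s, a, reward, s_next):
        # Temporal-difference update: move w . phi toward the bootstrapped target.
        target = reward + self.gamma * max(self.q(s_next, b) for b in self.actions)
        self.w += self.alpha * (target - self.q(s, a)) * phi(s, a)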


Fig. 12. Average energy efficiency for low QoS pico BS (average energy efficiency in bps/Hz/W vs. time step; curves: Centralized Resource Allocation, Decentralized Resource Allocation, Standard)

This is because of the conjecture concept exploited in the decentralized learning of Algorithm 2, which does not require explicit information exchange between the macro BSs but instead estimates the other BSs' actions from historical knowledge of similar network states. This highlights a significant potential for deploying H-CRANs with a distributed resource allocation approach, despite the small loss in achieved performance compared to the centralized approach: the centralized approach requires a dedicated controller for resource allocation, which increases the complexity of the network topology and exposes it to sudden failure if the controller goes down.
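A minimal sketch of one plausible reading of this conjecture mechanism follows; the state keys, the frequency-based estimate, and the default action are illustrative assumptions, not the exact procedure of Algorithm 2:

from collections import defaultdict, Counter

class ConjectureEstimator:
    # Instead of exchanging messages, a macro BS predicts a peer's action
    # from the empirical distribution of actions that peer was observed to
    # take in similar past network states.
    def __init__(self):
        self.history = defaultdict(Counter)  # state -> Counter of peer actions

    def record(self, state, peer_action):
        self.history[state][peer_action] += 1

    def conjecture(self, state, default_action):
        seen = self.history.get(state)
        if not seen:
            return default_action  # no similar state observed yet
        return seen.most_common(1)[0][0]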

8 NUMERICAL RESULTS

In this section, we verify the performance of the proposed scheme in terms of energy efficiency, spectral efficiency, and QoS. The evaluation environment consists of three macro BSs with 21 MUEs and 15 RRHs, with 47 RUEs accessing Γ1 and 28 RUEs sharing the spectrum with the MUEs in Γ2. The path-loss model is 31.5 + 40 log10(d) for the RRH-to-RUE link and 31.5 + 35 log10(d) for the macro-BS-to-RUE and RRH-to-MUE links, where d is the distance between the transmitter and the receiver. Fast-fading coefficients are all generated as independent and identically distributed Rayleigh random variables with unit variance.

The data rate thresholds for the two types of RUEs and for the MUEs, θ, θ∗, and δ, are assumed to be 2 Mbps, 512 Kbps, and 1.2 Mbps, respectively. We assume that the number of BFs equals the number of RBs within the corresponding sub-band. The rest of the simulation parameters are presented in Table 2. The evaluation of the proposed scheme, covering energy efficiency, spectral efficiency, and data rate, is presented in the following sections.
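For concreteness, a minimal sketch of the stated channel model, combining the two path-loss expressions with unit-variance Rayleigh fast fading, could look as follows; the function name and the dB bookkeeping are illustrative:

import numpy as np

def channel_gain_db(d, link="rrh_rue", rng=np.random.default_rng()):
    # Path loss per the stated models; d is the transmitter-receiver
    # distance in meters.
    if link == "rrh_rue":
        pl = 31.5 + 40 * np.log10(d)        # RRH -> RUE
    else:
        pl = 31.5 + 35 * np.log10(d)        # macro BS -> RUE, RRH -> MUE
    # Rayleigh fast fading with E[|h|^2] = 1 (unit variance).
    h = rng.rayleigh(scale=np.sqrt(0.5))
    return -pl + 20 * np.log10(h)           # overall power gain in dB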

8.1 Energy Efficiency Evaluation

In this evaluation, we are interested in measuring the speed of convergence and the achieved energy efficiency for both types of RUEs. Thus, we conduct three simulations to evaluate the system convergence, the achieved energy efficiency with a variable maximum transmission power of the RRHs, and the energy efficiency against the SINR threshold of the MUEs.

TABLE 2
System Parameters

Parameter                           Value
User distribution                   uniform
Number of RBs                       50
Total bandwidth                     10 MHz
Ratio of Γ1 and Γ2                  0.6 and 0.4
Thermal noise power                 -112 dBm
Macro BS transmission power         43 dBm
Back-haul power consumption Pbh     23 dBm
Antenna gain for macro/RRH          17/6 dB
Pico maximum transmission power     25 dBm
Trials per experiment               1000

First, we plot the energy efficiency as a function of the number of time steps in Fig. 13. The performance of our centralized and decentralized resource allocation is compared to two schemes: the standard with fixed power allocation, in which the same power is allocated to all RBs, and the scheme proposed in [24], which tackles the energy efficiency problem in H-CRANs and is denoted EE-HCRAN. Moreover, we include a typical online learning resource allocation, which is online Q-learning without the proposed compact-representation enhancement in which Q-values are approximated as a function of a much smaller set of variables.

Fig. 13. Energy efficiency convergence (energy efficiency for RUEs in bps/Hz/W vs. time step; curves: EE-HCRAN (S), Decentralized Resource Allocation (S), Standard (S), Typical Online Learning (S), Centralized Resource Allocation (S), Centralized Resource Allocation (M), Decentralized Resource Allocation (M))

This typical online learning is included to demonstrate the advantage of the approximated online learning in terms of convergence speed. From Fig. 13, we notice that our scheme converges after 300 iterations, faster than both typical online learning and EE-HCRAN, and achieves the highest energy efficiency.

Second, we plot the achieved energy efficiency against the maximum transmission power of the RRH (Pmax) in Fig. 14. It is observed that the average energy efficiency is a monotonically non-decreasing function of Pmax. For small values of Pmax, the energy efficiency increases until it saturates at a Pmax of 23 dBm. This is because the compared schemes aim to balance the system energy efficiency and the power consumption: beyond this point, a further increase in transmission power would degrade the energy efficiency. We notice that both versions of our scheme achieve higher energy efficiency than EE-HCRAN.
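This saturation behavior can be reproduced with a toy energy efficiency model: in the sketch below, a Shannon rate is divided by the transmit power plus a fixed circuit power, with g_over_n0 and p_c as purely illustrative values rather than the paper's actual utility:

import numpy as np

g_over_n0 = 5.0   # assumed channel-gain-to-noise ratio
p_c = 1.0         # assumed static circuit power (W)

def ee(p_tx):
    rate = np.log2(1 + g_over_n0 * p_tx)    # bps/Hz
    return rate / (p_tx + p_c)              # bps/Hz/W

# EE rises with transmit power, peaks, then falls; a scheme that caps
# its power at the EE-optimal point gains nothing from a larger Pmax.
p_grid = np.linspace(0.01, 10, 1000)
p_star = p_grid[np.argmax(ee(p_grid))]
print(f"EE-optimal transmit power ~ {p_star:.2f} W")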


Fig. 14. Energy efficiency with variable Pmax (energy efficiency for RUEs in bps/Hz/W vs. Pmax in dBm; curves: Decentralized Resource Allocation, EE-HCRAN, Centralized Resource Allocation)

The third evaluation considers the system with a variable SINR threshold for the MUEs. Fig. 15 presents the achieved energy efficiency versus the SINR threshold of the MUEs accessing Γ2, with Pmax = 25 dBm. We notice that the proposed scheme outperforms both the EE-HCRAN and the standard schemes. Fig. 15 also reveals that when the SINR threshold is not large, the energy efficiency remains stable as the threshold increases, because the inter-tier interference is not severe thanks to the proposed RB allocation strategy in conjunction with the sophisticated online learning. This indicates that the proposed solution mitigates the inter-tier interference and provides higher bit rates for the MUEs than the other schemes.

Fig. 15. Energy efficiency with variable SINR threshold for MUEs (energy efficiency for RUEs in bps/Hz/W vs. SINR threshold of MUE in dB; curves: Centralized Resource Allocation, Decentralized Resource Allocation, EE-HCRAN, Standard)

8.2 Spectral Efficiency and QoS Evaluation

In this section, we evaluate the performance of the proposed scheme in terms of spectral efficiency and QoS, represented by the data rates achieved by the RUEs accessing Γ1 and Γ2 and by the MUEs. Thus, we plot the average system spectral efficiency against the time steps and against the maximum transmission power of the RRH in Fig. 16 and Fig. 17, respectively. Fig. 16 emphasizes the speed of convergence achieved by our scheme compared to the others: our scheme is the fastest and attains the highest spectral efficiency.

In Fig. 17, the spectral efficiency with variable maximum transmission power shows that both approaches of our scheme record the best spectral efficiency compared to the other schemes.

Fig. 16. Spectral efficiency convergence (average spectral efficiency in bps/Hz vs. time step; curves: Standard (S), EE-HCRAN (S), Centralized Resource Allocation (S), Decentralized Resource Allocation (S), Decentralized Resource Allocation (M), Centralized Resource Allocation (M))

Fig. 17. Spectral efficiency with variable maximum RRH power (average spectral efficiency in bps/Hz vs. maximum transmission power in dBm; curves: Standard, Centralized Resource Allocation, Decentralized Resource Allocation, EE-HCRAN)

The QoS requirements stated in C2, C3, and C4 are investigated in this evaluation. Fig. 18, Fig. 19, and Fig. 20 present the cumulative distribution function (CDF) of the data rate for the RUEs accessing sub-bands Γ1 and Γ2, and the CDF of the data rate achieved by the MUEs, respectively. We notice that our scheme (Centralized RA and Decentralized RA) is the only one that manages to keep more than 97% of the users above the specified thresholds θ, θ∗, and δ, compared to EE-HCRAN, which records 78% above threshold, and the standard with 65%. This evaluation demonstrates the capability of the online learning scheme to allocate RBs and power efficiently while maintaining the users' QoS at the maximum level.
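As an illustration of this QoS check, the short sketch below computes the share of users above a rate threshold from an empirical sample; the sample itself is synthetic:

import numpy as np

def fraction_above(rates_mbps, threshold_mbps):
    # 1 - F(threshold) of the empirical CDF: the share of users whose
    # achieved data rate exceeds the QoS threshold.
    return float(np.mean(np.asarray(rates_mbps) > threshold_mbps))

# Hypothetical sample of rates for RUEs accessing Gamma_1, theta = 2 Mbps.
sample = np.random.default_rng(0).uniform(1.5, 8.0, size=200)
print(f"{100 * fraction_above(sample, 2.0):.1f}% of users above threshold")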

8.3 Other Evaluation Aspects and Discussion

We consider the case of dynamic users who join and leave the system frequently. These users may be viewed as randomly distributed. Their arrivals are modeled as a homogeneous Poisson point process with intensity λ = 3, and different users stay in the system for independent durations.
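A minimal sketch of this arrival model is given below; the observation horizon and the mean dwell time are illustrative assumptions, since the text only states the Poisson intensity and the independence of the stay durations:

import numpy as np

rng = np.random.default_rng(1)
LAMBDA = 3.0      # stated arrival intensity
HORIZON = 100.0   # assumed observation window (time units)

# Homogeneous Poisson arrivals: exponential inter-arrival times with rate
# LAMBDA; each user stays for an independent (assumed exponential) duration.
arrivals = np.cumsum(rng.exponential(1 / LAMBDA, size=int(2 * LAMBDA * HORIZON)))
arrivals = arrivals[arrivals < HORIZON]
departures = arrivals + rng.exponential(5.0, size=arrivals.size)  # assumed mean stay

# Number of users present at an arbitrary instant t:
t = 50.0
print(int(np.sum((arrivals <= t) & (departures > t))), "users active at t =", t)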


Fig. 18. Data rate CDF for RUEs accessing Γ1 (empirical CDF vs. data rate in Mbps; curves: Standard (S), EE-HCRAN (S), Decentralized RA (S), Centralized RA (S), Decentralized RA (M), Centralized RA (M), Threshold)

Fig. 19. Data rate CDF for RUEs accessing Γ2 (empirical CDF vs. data rate in Mbps; curves: Standard (S), EE-HCRAN (S), Decentralized RA (S), Centralized RA (S), Decentralized RA (M), Centralized RA (M), Threshold)

Fig. 20. Data rate CDF for MUEs accessing Γ2 (empirical CDF vs. data rate in Mbps; curves: Standard (S), EE-HCRAN (S), Centralized RA (S), Decentralized RA (S), Decentralized RA (M), Centralized RA (M), Threshold)

We notice the impact of user mobility on the achieved energy efficiency, spectral efficiency, and data rate in the results of Fig. 13, Fig. 16, and Fig. 18 to Fig. 20. Fig. 13 and Fig. 16 reveal a drop of 27% in the energy efficiency and the spectral efficiency as a result of user mobility, comparing the performance of our scheme for stationary users (S) and mobile users (M). However, our scheme suffers a drop of only 14% in the data rate achieved for both types of RUEs and for the MUEs, and all the achieved rates remain above the thresholds, as in Fig. 18 to Fig. 20. The performance degradation in the mobile user case is due to the difficulty of acquiring the state information, as it changes frequently.

TABLE 3
Testbed and numerical results comparison

Evaluation Metric                    Centralized    Decentralized    Centralized (S)    Decentralized (S)
                                     Testbed        Testbed          Simulation         Simulation
Energy efficiency, high QoS RUEs     2 bps/Hz/W     1.7 bps/Hz/W     2.15 bps/Hz/W      1.92 bps/Hz/W
Energy efficiency, low QoS RUEs      1.45 bps/Hz/W  1.25 bps/Hz/W    1.62 bps/Hz/W      1.34 bps/Hz/W
Spectral efficiency                  285 bps/Hz     260 bps/Hz       302 bps/Hz         275 bps/Hz
Average data rate, high QoS RUEs     5.6 Mbps       5 Mbps           6 Mbps             5.55 Mbps
Average data rate, low QoS RUEs      2 Mbps         1.6 Mbps         2.2 Mbps           1.78 Mbps
Average data rate, MUEs              2.1 Mbps       1.73 Mbps        2.3 Mbps           1.9 Mbps
Convergence speed (time steps)       300            325              290                310

From the presented results, we notice that our proposed scheme, with both approaches, outperforms the EE-HCRAN scheme in terms of achieved energy efficiency, with gains of 16% for the decentralized resource allocation and 24% for the centralized resource allocation.

The use of machine learning for resource allocation is superior to the simple convex optimization used in EE-HCRAN, as online learning does not require a specific model of the network environment. This is a significant factor, since 5G H-CRANs form a dynamic environment that cannot be tied to a specific model. Therefore, online learning, which adopts a learning-from-experience approach, is a good fit for the resource allocation problem. In addition, the compact representation of states and the approximation of the Q-value contribute to the improved convergence speed of our scheme. Another point to note is that the decentralized approach achieves comparable results in the evaluation, which reflects the advantage of the conjecture feature of the exploited learning approach. This feature eliminates the need for explicit cooperation between the macro BSs to exchange the information that improves the quality of the selected action. In addition, it relieves the BBU pool from the resource allocation responsibility and reduces the signal processing overhead. Another factor behind the superior performance is the sophisticated RB allocation mechanism, which considers both the location and the QoS of the RUEs. Moreover, appropriate power allocation contributes considerably to the achieved results.

Furthermore, we compare the testbed results and the numerical results in terms of the energy efficiency for high QoS RUEs, the energy efficiency for low QoS RUEs, the system spectral efficiency, the data rates for both types of RUEs and for the MUEs, and the convergence speed. The results are compared on the basis of the optimal (maximum) value reached for energy efficiency and spectral efficiency, and on the average value for the data rate. Table 3 presents the comparison results.


We notice that the numerical results record better values than the testbed implementation. The reason for this slight advantage is real-time interference sources, such as other wireless devices in the lab. In addition, hardware limitations on processing and on synchronization with the controlling unit may degrade the achieved results.

Finally, we study the impact of the spectrum partitioning on the achieved energy efficiency and spectral efficiency. This impact is evaluated in Fig. 21 and Fig. 22, respectively, under the assumption that each RUE is allocated at least one RB to guarantee its minimum QoS requirements. Fig. 21 and Fig. 22 show that the energy efficiency and the spectral efficiency increase linearly with the ratio of Γ1 to ΓA (where ΓA is the total set of RBs in Γ1 and Γ2). Moreover, the centralized resource allocation achieves better efficiency than the decentralized approach. Increasing the ratio of Γ1 to ΓA at the expense of Γ2 provides more RBs for the RRHs and reduces the inter-tier interference, as the shared RBs in Γ2 are decreased.

In addition, increasing the Γ1 ratio enhances the achieved SINR of the RUEs and allows the RUEs that access Γ2 to raise their transmission power, since the inter-tier interference is reduced. This further enhances the energy efficiency of the H-CRAN. Although increasing Γ1 enhances the achieved energy and spectral efficiencies, it affects the fairness of the resource allocation, which opens directions for future research on balancing this tradeoff.
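To illustrate the partitioning knob, the following sketch splits the 50 RBs of Table 2 between Γ1 and Γ2 for the ratios swept in Fig. 21 and Fig. 22; it is bookkeeping only, not the allocation algorithm itself:

TOTAL_RBS = 50  # from Table 2

def partition(r):
    # r = |Gamma_1| / |Gamma_A|: Gamma_1 grows at the expense of the
    # shared sub-band Gamma_2.
    n_gamma1 = round(r * TOTAL_RBS)   # RBs reserved for RRHs only
    n_gamma2 = TOTAL_RBS - n_gamma1   # RBs shared between RRHs and macro BSs
    return n_gamma1, n_gamma2

for r in (0.4, 0.5, 0.6, 0.7, 0.8):
    g1, g2 = partition(r)
    print(f"r = {r}: Gamma_1 = {g1} RBs, Gamma_2 = {g2} RBs (shared)")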

Fig. 21. Energy efficiency for different ratios of Γ1 to ΓA (energy efficiency of RUEs in bps/Hz/W vs. ratio, 0.4-0.8; curves: Centralized Resource Allocation (S), Decentralized Resource Allocation (S))

Fig. 22. Spectral efficiency for different ratios of Γ1 to ΓA (average spectral efficiency in bps/Hz vs. ratio, 0.4-0.8; curves: Centralized Resource Allocation (S), Decentralized Resource Allocation (S))

9 CONCLUSION

In this paper, a green resource allocation scheme for H-CRANs, following online learning based centralized and decentralized approaches, is proposed. RBs and transmission power are allocated subject to inter-tier interference and capacity constraints. The centralized approach places a dedicated controller integrated with the BBU pool to perform resource allocation, while the decentralized approach leverages the macro BSs' awareness of the RRHs operating under their coverage and assigns them the task of resource allocation in a distributed fashion. Sophisticated frequency partitioning is proposed to account for interference and support better online learning exploration. An approximated online learning methodology is established to perform joint allocation of RBs and transmission power. Testbed implementation and simulation results, covering energy efficiency, spectral efficiency, BER, and data rate, have demonstrated the capability of the proposed scheme to allocate resources in a green and interference-suppressing fashion.

REFERENCES

[1] M. Agiwal, A. Roy, and N. Saxena, "Next generation 5G wireless networks: A comprehensive survey", IEEE Communications Surveys & Tutorials, vol. 18, no. 3, pp. 1617-1655, Third Quarter 2016.
[2] N. Panwar, S. Sharma, and A. K. Singh, "A survey on 5G: The next generation of mobile communication", Physical Communication, vol. 18, Part 2, pp. 64-84, 2016, Special Issue on Radio Access Network Architectures and Resource Management for 5G.
[3] C. L. I, C. Rowell, S. Han, Z. Xu, G. Li, and Z. Pan, "Toward green and soft: A 5G perspective", IEEE Communications Magazine, vol. 52, no. 2, pp. 66-73, February 2014.
[4] M. Peng and W. Wang, "Technologies and standards for TD-SCDMA evolutions to IMT-Advanced", IEEE Communications Magazine, vol. 47, no. 12, pp. 50-58, December 2009.
[5] Y. Cai, F. R. Yu, and S. Bu, "Dynamic operations of cloud radio access networks (C-RAN) for mobile cloud computing systems", IEEE Transactions on Vehicular Technology, vol. 65, no. 3, pp. 1536-1548, March 2016.
[6] M. Peng, C. Wang, V. Lau, and H. V. Poor, "Fronthaul-constrained cloud radio access networks: Insights and challenges", IEEE Wireless Communications, vol. 22, no. 2, pp. 152-160, April 2015.
[7] Y. Shi, J. Zhang, and K. B. Letaief, "Group sparse beamforming for green Cloud-RAN", IEEE Transactions on Wireless Communications, vol. 13, no. 5, pp. 2809-2823, May 2014.
[8] Y. Cai, F. R. Yu, and S. Bu, "Dynamic operations of cloud radio access networks (C-RAN) for mobile cloud computing systems", IEEE Transactions on Vehicular Technology, vol. 65, no. 3, pp. 1536-1548, March 2016.
[9] D. W. K. Ng and R. Schober, "Secure and green SWIPT in distributed antenna networks with limited backhaul capacity", IEEE Transactions on Wireless Communications, vol. 14, no. 9, pp. 5082-5097, September 2015.
[10] M. Peng, Y. Li, T. Q. S. Quek, and C. Wang, "Device-to-device underlaid cellular networks under Rician fading channels", IEEE Transactions on Wireless Communications, vol. 13, no. 8, pp. 4247-4259, August 2014.
[11] M. Peng, D. Liang, Y. Wei, J. Li, and H. H. Chen, "Self-configuration and self-optimization in LTE-Advanced heterogeneous networks", IEEE Communications Magazine, vol. 51, no. 5, pp. 36-45, May 2013.
[12] R. Q. Hu, Y. Qian, S. Kota, and G. Giambene, "HetNets - a new paradigm for increasing cellular capacity and coverage [Guest Editorial]", IEEE Wireless Communications, vol. 18, no. 3, pp. 8-9, June 2011.


[13] S. Sun, K. Adachi, P. H. Tan, Y. Zhou, J. Joung, and C. K. Ho, "Heterogeneous network: An evolutionary path to 5G", in 2015 21st Asia-Pacific Conference on Communications (APCC), October 2015, pp. 174-178.
[14] Y. Dong, Z. Chen, P. Fan, and K. B. Letaief, "Mobility-aware uplink interference model for 5G heterogeneous networks", IEEE Transactions on Wireless Communications, vol. 15, no. 3, pp. 2231-2244, March 2016.
[15] T. Han, G. Mao, Q. Li, L. Wang, and J. Zhang, "Interference minimization in 5G heterogeneous networks", Mobile Networks and Applications, vol. 20, no. 6, pp. 756-762, 2015.
[16] Z. Fei, C. Xing, N. Li, and J. Kuang, "Adaptive multiobjective optimisation for energy efficient interference coordination in multicell networks", IET Communications, vol. 8, no. 8, pp. 1374-1383, May 2014.
[17] C. Y. Ho and C. Y. Huang, "Energy efficient subcarrier-power allocation and relay selection scheme for OFDMA-based cooperative relay networks", in 2011 IEEE International Conference on Communications (ICC), June 2011, pp. 1-6.
[18] Y. Wang, W. Xu, K. Yang, and J. Lin, "Optimal energy-efficient power allocation for OFDM-based cognitive radio networks", IEEE Communications Letters, vol. 16, no. 9, pp. 1420-1423, September 2012.
[19] R. Q. Hu and Y. Qian, "An energy efficient and spectrum efficient wireless heterogeneous network framework for 5G systems", IEEE Communications Magazine, vol. 52, no. 5, pp. 94-101, May 2014.
[20] I. AlQerm and B. Shihada, "A cooperative online learning scheme for resource allocation in 5G systems", in 2016 IEEE International Conference on Communications (ICC), May 2016, pp. 1-7.
[21] M. Peng, Y. Li, J. Jiang, J. Li, and C. Wang, "Heterogeneous cloud radio access networks: A new perspective for enhancing spectral and energy efficiencies", IEEE Wireless Communications, vol. 21, no. 6, pp. 126-135, December 2014.
[22] M. A. Marotta, N. Kaminski, I. Gomez-Miguelez, L. Z. Granville, J. Rochol, L. DaSilva, and C. B. Both, "Resource sharing in heterogeneous cloud radio access networks", IEEE Wireless Communications, vol. 22, no. 3, pp. 74-82, June 2015.
[23] D. Lopez-Perez, X. Chu, A. V. Vasilakos, and H. Claussen, "Power minimization based resource allocation for interference mitigation in OFDMA femtocell networks", IEEE Journal on Selected Areas in Communications, vol. 32, no. 2, pp. 333-344, February 2014.
[24] M. Peng, K. Zhang, J. Jiang, J. Wang, and W. Wang, "Energy-efficient resource assignment and power allocation in heterogeneous cloud radio access networks", IEEE Transactions on Vehicular Technology, vol. 64, no. 11, pp. 5275-5287, November 2015.
[25] X. Sun and S. Wang, "Resource allocation scheme for energy saving in heterogeneous networks", IEEE Transactions on Wireless Communications, vol. 14, no. 8, pp. 4407-4416, August 2015.
[26] J. Tang, D. K. C. So, E. Alsusa, K. A. Hamdi, and A. Shojaeifard, "Resource allocation for energy efficiency optimization in heterogeneous networks", IEEE Journal on Selected Areas in Communications, vol. 33, no. 10, pp. 2104-2117, October 2015.
[27] W. Pramudito and E. Alsusa, "A hybrid resource management technique for energy and QoS optimization in fractional frequency reuse based cellular networks", IEEE Transactions on Communications, vol. 61, no. 12, pp. 4948-4960, December 2013.
[28] Y. Zhang, Y. Wang, and W. Zhang, "Energy efficient resource allocation for heterogeneous cloud radio access networks with user cooperation and QoS guarantees", in 2016 IEEE Wireless Communications and Networking Conference, April 2016, pp. 1-6.
[29] N. Saquib, E. Hossain, and D. I. Kim, "Fractional frequency reuse for interference management in LTE-Advanced HetNets", IEEE Wireless Communications, vol. 20, no. 2, pp. 113-122, April 2013.
[30] D. Feng, C. Jiang, G. Lim, L. J. Cimini, G. Feng, and G. Y. Li, "A survey of energy-efficient wireless communications", IEEE Communications Surveys & Tutorials, vol. 15, no. 1, pp. 167-178, First Quarter 2013.
[31] R. S. Sutton and A. G. Barto, Introduction to Reinforcement Learning, MIT Press, Cambridge, MA, USA, 1st edition, 1998.
[32] C. J. C. H. Watkins and P. Dayan, "Q-learning", Machine Learning, vol. 8, no. 3, pp. 279-292, 1992.
[33] X. Zhou, R. Zhang, and C. K. Ho, "Wireless information and power transfer: Architecture design and rate-energy tradeoff", IEEE Transactions on Communications, vol. 61, no. 11, pp. 4754-4767, November 2013.
[34] A. D. Tijsma, M. M. Drugan, and M. A. Wiering, "Comparing exploration strategies for Q-learning in random stochastic mazes", in 2016 IEEE Symposium Series on Computational Intelligence (SSCI), December 2016, pp. 1-8.
[35] A. Checko, H. L. Christiansen, Y. Yan, L. Scolari, G. Kardaras, M. S. Berger, and L. Dittmann, "Cloud RAN for mobile networks: A technology overview", IEEE Communications Surveys & Tutorials, vol. 17, no. 1, pp. 405-426, First Quarter 2015.
[36] E. Even-Dar and Y. Mansour, "Convergence of optimistic and incremental Q-learning", in Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic (NIPS'01), Cambridge, MA, USA, 2001, pp. 1499-1506, MIT Press.
[37] V. A. Papavassiliou and S. Russell, "Convergence of reinforcement learning with general function approximators", in Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI'99) - Volume 2, San Francisco, CA, USA, 1999, pp. 748-755, Morgan Kaufmann Publishers Inc.
[38] W. Dinkelbach, "On nonlinear fractional programming", Management Science, vol. 13, no. 7, pp. 492-498, 1967.
[39] G. O. Roberts and J. S. Rosenthal, "Harris recurrence of Metropolis-within-Gibbs and trans-dimensional Markov chains", The Annals of Applied Probability, vol. 16, no. 4, pp. 2123-2139, 2006.
[40] L. N. de Castro and F. J. Von Zuben, "Learning and optimization using the clonal selection principle", IEEE Transactions on Evolutionary Computation, vol. 6, no. 3, pp. 239-251, June 2002.
[41] Y. S. Soh, T. Q. S. Quek, M. Kountouris, and H. Shin, "Energy efficient heterogeneous cellular networks", IEEE Journal on Selected Areas in Communications, vol. 31, no. 5, pp. 840-850, May 2013.
[42] F. Gagliardi, "Institutions and economic change: A critical survey of the new institutional approaches and empirical evidence", Journal of Behavioral and Experimental Economics (formerly The Journal of Socio-Economics), vol. 37, no. 1, pp. 416-443, February 2008.
[43] GNU Radio, [Online]. Available: http://gnuradio.org/trac.

Ismail AlQerm is a PhD candidate in computer science at King Abdullah University of Science and Technology (KAUST) and a senior member of the networking lab led by Prof. Basem Shihada. Ismail joined KAUST in 2009 as a founding master student in Electrical Engineering and was among the recipients of the KAUST Provost Award. Prior to joining KAUST, he received a bachelor's degree in Computer Engineering from King Fahd University of Petroleum and Minerals (KFUPM).

His research interests include cognitive radio, resource allocation in heterogeneous cellular networks, machine learning techniques for resource allocation in wireless networks, and software defined radio prototypes.

Basem Shihada is an Associate and Founding Professor of computer science and electrical engineering in the Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division at King Abdullah University of Science and Technology (KAUST). Before joining KAUST in 2009, he was a visiting faculty at the Computer Science Department at Stanford University. His current research covers a range of topics in energy and resource allocation in wired and wireless communication networks, including wireless mesh, wireless sensor, multimedia, and optical networks. He is also interested in SDNs, IoT, and cloud computing. In 2012, he was elevated to the rank of Senior Member of IEEE.