Mobility management in HetNets: a learning-based perspective

Florida International University
FIU Digital Commons
Electrical and Computer Engineering Faculty Publications, College of Engineering and Computing

2-14-2015

Mobility management in HetNets: a learning-based perspective

Meryem Simsek, Vodafone Chair Mobile Communications Systems, Technische Universität Dresden
Mehdi Bennis, University of Oulu
Ismail Guvenc, Department of Electrical and Computer Engineering, Florida International University, iguvenc@fiu.edu

Follow this and additional works at: https://digitalcommons.fiu.edu/ece_fac
Part of the Electrical and Computer Engineering Commons

This work is brought to you for free and open access by the College of Engineering and Computing at FIU Digital Commons. It has been accepted for inclusion in Electrical and Computer Engineering Faculty Publications by an authorized administrator of FIU Digital Commons. For more information, please contact dcc@fiu.edu.

Recommended Citation
Simsek, M., Bennis, M. & Guvenc, I. J Wireless Com Network (2015) 2015:26. doi:10.1186/s13638-015-0244-2



Simsek et al. EURASIP Journal on Wireless Communications and Networking (2015) 2015:26, DOI 10.1186/s13638-015-0244-2

RESEARCH Open Access

Mobility management in HetNets: a learning-based perspective

Meryem Simsek1*†, Mehdi Bennis2† and Ismail Guvenc3

Abstract

Heterogeneous networks (HetNets) are expected to be a key feature of long-term evolution (LTE)-advanced networks and beyond and are essential for providing ubiquitous broadband user throughput. However, due to different coverage ranges of base stations (BSs) in HetNets, the handover performance of a user equipment (UE) may be significantly degraded, especially in scenarios where high-velocity UE traverse through small cells. In this article, we propose a context-aware mobility management (MM) procedure for small cell networks, which uses reinforcement learning techniques and inter-cell coordination for improving the handover and throughput performance of UE. In particular, the BSs jointly learn their long-term traffic loads and optimal cell range expansion and schedule their UE based on their velocities and historical data rates that are exchanged among the tiers. The proposed approach is shown not only to outperform the classical MM in terms of throughput but also to enable better fairness. Using the proposed learning-based MM approaches, the UE throughput is shown to improve by 80% on the average, while the handover failure probability is shown to reduce up to a factor of three.

Keywords: Cell range expansion; HetNets; Load balancing; Mobility management; Reinforcement learning; Context-aware scheduling

1 Introduction
To cope with the wireless traffic demand within the next decade, operators are underlaying their macro-cellular networks with low-power base stations (BSs) [1]. Such networks are typically referred to as heterogeneous networks (HetNets), and their deployment entails a number of challenges in terms of capacity, coverage, mobility management (MM), and mobility load balancing across multiple network tiers [2]. Mobility management, in particular, is essential to ensure continuous connectivity to mobile user equipment (UE) while maintaining satisfactory quality of service (QoS). Conversely, poor mobility management may lead to handover failures (HOFs), radio link failures, as well as unnecessary handovers, typically referred to as ping-pong (PP) events. Such deficiencies result in low resource utilization efficiency and poor user experience. To solve such problems, mobility parameters in each cell need to be dynamically optimized according

*Correspondence: [email protected]
†Equal contributors
1Vodafone Chair Mobile Communications Systems, Technische Universität Dresden, Dresden, Germany
Full list of author information is available at the end of the article

to cell traffic loads, coverage areas of different cells, and velocities of the UE.

With the advent of HetNets, recent studies have demonstrated that HOFs and PPs can be serious problems due to small cell sizes [2,3]. MM mechanisms, which have been included in the first release of the long-term evolution (LTE) standard (Rel-8), were originally developed for networks that only involve macrocells [4]. The defined MM procedures for the macrocell-only scenarios have been widely discussed in the literature, e.g., in [5-19]. It has been shown that MM for macrocell-only scenarios yields highly reliable handover execution, where HOFs and PPs can typically be avoided due to large cell sizes [20]. However, the deployment of a large number of small cells (e.g., femtocells, picocells, etc.) increases the complexity of MM in HetNets, since mobile UE may trigger frequent handovers when they traverse the coverage area of a small cell. This leads to less reliable handover execution in HetNets.

While MM carries critical importance for HetNets to

minimize HOFs and PPs, mobility load balancing is also crucial to balance the traffic load among different network tiers. In HetNets, the load among tiers is unbalanced due

© 2015 Simsek et al.; licensee Springer. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.


to significant differences in transmit power levels. UE tend to connect to the macrocell even when the path loss conditions between the small cell and the UE are better, because the handover decision is based on the highest reference signal received power (RSRP) measured at a UE [21]. As a remedy, the 3rd Generation Partnership Project (3GPP) standardized the concept of cell range expansion to virtually increase a small cell's coverage area by adding a bias value to its RSRP, which leads to traffic offloading from the macrocell. To enhance the overall system performance, not only cell-specific handover parameter optimization, such as range expansion bias (REB) value adaptation, but also scheduling and resource allocation must be performed in a coordinated manner across different tiers. A survey of these and various other existing mobility management and load balancing approaches considering such aspects for small cells in LTE-advanced networks is provided in [22].

In this article, a joint MM and UE scheduling approach

is proposed using tools from reinforcement learning. The proposed MM approach utilizes parameter adaptation both in the long term and the short term. Hereby, macro- and picocells learn how to optimize their long-term traffic load, whereas in the short term, the UE association process is carried out based on history- and velocity-based scheduling. We propose multi-armed bandit (MAB) and satisfaction-based MM learning techniques as long-term load balancing approaches aiming at improving the overall system throughput while at the same time reducing the HOF and PP probabilities. A context-aware scheduler is proposed as a short-term UE scheduling approach considering fairness.

The rest of the article is organized as follows. Section 2

provides a brief review of recent mobility management works in the literature and summarizes our contribution in this paper. Section 3 describes our system model. In Section 4, the problem formulation for MM is presented. In Section 5, the context-aware scheduler is described. In Section 6, we introduce our learning-based MM approaches. Section 7 presents system-level simulation results, and finally, Section 8 concludes the article.

2 Related work and contribution
In this section, we first summarize some of the recent studies on mobility management in LTE-advanced HetNets and use these to highlight the challenges and open problems that should be further addressed by the research community. Subsequently, a typical HetNet scenario with macro- and picocells deployed in the same frequency band is considered to describe our contribution in this paper.

2.1 Literature review
The handover process requires a smooth transfer of a UE's active connection when moving from one cell to another,

while still maintaining the guaranteed QoS. The objective is to have mobility procedures resulting in a low probability of experiencing radio link failures, HOFs, and PP events. Mobility solutions meeting those objectives are often said to be robust. The enhancement of handover robustness in HetNet LTE networks has been subject to recent interest. In LTE Rel-11, mobility enhancements in HetNets have been investigated through a dedicated study item [2]. In this study item and cited work items therein, mobility performance enhancement solutions for co-channel HetNets are analyzed taking into account mobility robustness improvement. Proposed solutions are related to optimizing the handover procedure by dynamically adapting handover parameters for different cell sizes and UE velocities.

Mobility management techniques for HetNets have

been recently investigated in the literature, e.g., [23-28]. In [23], the authors evaluate the effect of different combinations of various MM parameter settings for HetNets. The conclusions are aligned with the HetNet mobility performance evaluations in 3GPP [2], i.e., HetNet mobility performance strongly depends on the cell size and the UE speed. The simulation results in [23] assume that all UE have the same velocity in each simulation setup. Further results on the effects of MM parameters are presented in [24], where the authors propose a fuzzy-logic-based controller. This controller adaptively modifies handover parameters for handover optimization by considering the system load and UE speed in a macrocell-only network.

In [25], the authors evaluate the mobility performance

of HetNets considering almost blank subframes in the presence of cell range expansion and propose a mobility-based inter-cell interference coordination scheme. Hereby, picocells configure coordinated resources by muting on certain subframes, so that macrocells can schedule their high-velocity UE in these resources, which are free of co-channel interference from the picocells. However, the proposed approach only considers three broad classes of UE velocities: low, medium, and high. Moreover, no adaptation of the REB has been taken into account. In [26], the authors propose a hybrid solution for HetNet MM, where MM in a macrocell is network controlled while UE-autonomous in a small cell. In the scenario in [26], macrocells and small cells operate on different component carriers, which allows the MM to be maintained on the macrocell layer while enabling small cells to enhance the user-plane capacity. In addition to these approaches, a fairness-based MM solution is discussed in [27]. Here, the cell selection problem in HetNets is formulated as a network-wide proportional fairness optimization problem by jointly considering the long-term channel conditions and the distribution of user load among different tiers. While the proposed method enhances the cell-edge UE performance, no results related to mobility performance are presented.


In [28], the authors propose a MAB-based inter-cell interference coordination approach that aims at maximizing the throughput and handover performance by selecting subbands for transmission in a small-cell-only network. The proposed approach deals with the trade-off between increasing the subband size to maximize throughput and handover success rate, and decreasing the subband size as far as possible to minimize interference. The only parameter which is optimized is the number of subbands, based on some signal-to-interference-plus-noise ratio (SINR) thresholds. While the HOF rate is decreased by the proposed approach, the PP probability is not analyzed. To the best of our knowledge, there is no previous work related to learning-based HetNet MM in the literature which jointly considers handover performance, load balancing, and fairness.

2.2 Contribution
In this paper, we propose MM approaches for network capacity maximization and HOF reduction, which also maintain user fairness across the network. In Figure 1a,b, we depict the basic idea of the classical MM and the proposed MM approaches, respectively. In the classical MM approach, there is no information exchange among tiers in case of UE handover, and traffic offloading might be achieved by picocell range expansion. In the proposed MM approaches, instead, each cell individually optimizes its own MM strategy based on limited coordination among tiers. Hereby, macro and pico BSs learn how to optimize their REB in the long term. On the other

Figure 1 Learning-based MM framework with velocity- and history (average rate)-based resource allocation. (a) Classical MM: without picocell coverage optimization. (b) Learning-based MM: macro-/picocell coverage optimization, with macro/pico coordination (the average rate of a UE is exchanged in case of handover).

hand, in the short term, they carry out user scheduling based on each UE's velocity and average rate via coordination among the macrocell and picocell tiers. We propose two learning-based MM approaches: MAB- and satisfaction-based MM. The major difference between MAB- and satisfaction-based learning is that MAB aims at maximizing the overall capacity, while satisfaction-based learning aims at satisfying the network in terms of capacity. The contributions of this article can be summarized as follows:

• In the proposed MM approaches, we focus on both short-term and long-term solutions. In the long term, a traffic load balancing procedure in a HetNet scenario is proposed, while in the short term, the UE association process is solved.

• To implement the long-term load balancing method, we propose two learning-based MM approaches by using reinforcement learning techniques: a MAB-based and a satisfaction-based MM approach.

• The short-term UE association process is based on a proposed context-aware scheduler considering a UE's throughput history and velocity to enable fair scheduling and enhanced cell association.

3 System model
We focus on the downlink transmission of a two-layer HetNet, where layer 1 is modeled as macrocells and layer 2 as picocells. The HetNet consists of a set of BSs K = {1, . . . , K} with a set M = {1, . . . , M} of macrocells underlaid by a set P = {1, . . . , P} of picocells, where K = M ∪ P. Macro BSs are dropped following a hexagonal layout including three sectors. Within each macro sector m, p ∈ P picocells are randomly positioned, and a set U = {1, . . . , U} of UE is randomly dropped within a circle around each picocell p (referred to as a hotspot region). UE associated to macrocells are referred to as macro UE U(m) = {1(m), . . . , U(m)} ∈ U, and UE served by picocells are referred to as pico UE U(p) = {1(p), . . . , U(p)} ∈ U, where U(p) ≠ U(m). Each UE i(k) with k ∈ {m, p} has a randomly selected velocity vi(k) ∈ V km/h and a random direction of movement within an angle of [0; 2π]. This is a slightly modified version of the mobility model described in Section 5.3.1, case 2, in [2]. The only difference from the 3GPP mobility model is that we do not consider any bouncing circle in our simulations. Hereby, considering a random direction of movement and no bouncing circle enables different types of handover (macro-to-macro and pico(macro)-to-macro(pico)). A co-channel deployment is considered, in which picocells and macrocells operate in a system with a bandwidth B consisting of r = {1, . . . , R} resource blocks (RBs). At every time instant tn = nTs with n = [1, . . . , N] and Ts = 1 ms, each BS k decides how to expand its coverage area by learning its REB βk = {βm, βp}


with βm = {0; 3; 6} dB and βp = {0; 3; 6; 9; 12; 15} dB^a. Both macro and pico BSs select their REB to decide which UE i(k) to schedule on which RB based on the UE's context parameters. These context parameters are defined as the UE's velocity vi(k), its instantaneous rate φi(k)(tn) when associated to BS k, and its average rate φ̄i(k)(tn) defined as

$$\bar{\phi}_{i(k)}(t_n) = \frac{1}{T} \sum_{n=1}^{N} \phi_{i(k)}(t_n), \qquad (1)$$

whereby T = N Ts is a time window. The instantaneous rate φi(k)(tn) is given by

$$\phi_{i(k)}(t_n) = B_{i(k)} \log\left(1 + \gamma_{i(k)}(t_n)\right), \qquad (2)$$

with γi(k)(tn) being the SINR of UE i(k) at time tn, which is defined as

$$\gamma_{i(k)}(t_n) = \frac{p_k\, g_{i(k),k}(t_n)}{\sum_{j \in \mathcal{K},\, j \neq k} p_j\, g_{i(k),j}(t_n) + \sigma^2}, \qquad (3)$$

where pk is the transmit power of BS k, σ² is the noise variance, and gi(k),k(tn) is the channel gain from cell k to UE i(k) associated to BS k. The bandwidth Bi(k) in Equation 2 is the bandwidth which is allocated to UE i(k) by BS k at time tn.
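The rate and SINR definitions in Equations 1 to 3 can be sketched in a few lines. The function names are illustrative, and a base-2 logarithm (for a rate in bit/s) is an assumption, since the paper leaves the log base unspecified.

```python
import math

def sinr(p_k, g_serv, interferers, noise_var):
    # Eq. (3): serving-cell received power over interference plus noise.
    # `interferers` is a list of (transmit power, channel gain) pairs, linear scale.
    interference = sum(p_j * g_j for p_j, g_j in interferers)
    return (p_k * g_serv) / (interference + noise_var)

def inst_rate(bandwidth, gamma):
    # Eq. (2): Shannon-type instantaneous rate; log base 2 assumed for bit/s.
    return bandwidth * math.log2(1.0 + gamma)

# Example: one serving macro BS, one interfering pico BS, one 180-kHz LTE RB.
gamma = sinr(p_k=40.0, g_serv=0.05, interferers=[(1.0, 0.02)], noise_var=0.1)
rate = inst_rate(180e3, gamma)
```

With no interferers and unit noise variance, the SINR reduces to the received power itself, which makes the functions easy to sanity-check.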

3.1 Handover procedure
According to the 3GPP standard, the handover mechanism involves the use of RSRP measurements, the filtering of measured RSRP samples, a handover hysteresis margin, and a time-to-trigger (TTT) parameter. Hereby, TTT is a time window which starts after the handover condition is fulfilled; a UE does not transmit its measurement report to its serving cell before the TTT timer expires. This helps to ensure that ping-pongs due to fluctuations in the link qualities from different cells are minimized. The main steps of a typical handover process in a HetNet scenario are illustrated in Figure 2. First, a UE performs RSRP measurements and waits until the biased RSRP from a target cell (e.g., a picocell) is larger than the biased RSRP from its serving cell (e.g., a macrocell) plus a hysteresis margin (step 1). Hence, a handover is executed if the target cell's (biased) RSRP (plus hysteresis margin) is larger than the source cell's (biased) RSRP, i.e., the handover condition for a UE i(k) to BS k is defined as

$$P_l(i(l)) + \beta_l < P_k(i(k)) + \beta_k + m_{\text{hist}}, \qquad (4)$$

with {l, k} ∈ K, mhist [dB] the UE- or cell-specific hysteresis margin, βk (βl) [dB] the REB of BS k (l), and Pk(i(k)) (or Pl(i(l))) [dBm] the i(k)-th (i(l)-th) UE's RSRP from BS k (l). Even when the condition in (4) is satisfied, the UE waits for a duration of TTT before sending a measurement report to its serving cell (step 2). If the condition in step 1 is still satisfied after TTT, the UE sends

Figure 2 Illustration of the HOF problem in HetNets due to small cell size. (1) Hysteresis threshold; (2) TTT running; (3) measurement report (uplink); (4) handover command (downlink).

the measurement report to its serving cell in the uplink (step 3), and the serving cell finalizes the handover by sending a handover command to the UE in the downlink (step 4). In order to filter out the effects of fast fading and shadowing, a UE obtains each RSRP sample as the linear average over the power contribution of all resource elements that carry the common reference signal within one subframe (i.e., 1 ms) and in a pre-defined measurement bandwidth (e.g., 6 RBs) [2]. This linear averaging is done in the layer 1 filter. As illustrated in Figure 3, layer 1 filtering is performed every 40 ms and averaged over five samples. The layer 1 filtered measurement is then updated through a first-order infinite impulse response filter in layer 3 every 200 ms [2].
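The handover trigger of Equation 4 together with the TTT wait can be sketched as follows. The function names and the per-millisecond boolean condition history are illustrative assumptions, not the 3GPP signalling procedure itself; all power quantities are in dB/dBm so the bias and hysteresis add directly.

```python
def handover_condition(p_serving, beta_serving, p_target, beta_target, m_hist):
    # Eq. (4): handover is triggered when the target cell's biased RSRP plus
    # hysteresis exceeds the serving cell's biased RSRP (all values in dB/dBm).
    return p_serving + beta_serving < p_target + beta_target + m_hist

def ttt_expired(condition_history, ttt_ms, ts_ms=1):
    # The UE sends its measurement report only after the handover condition
    # has held continuously for the whole TTT window.
    need = ttt_ms // ts_ms
    return len(condition_history) >= need and all(condition_history[-need:])

# Example: pico target at -98 dBm with a 6-dB REB vs. serving macro at -95 dBm.
triggered = handover_condition(-95.0, 0.0, -98.0, 6.0, 2.0)
```

A UE would append one boolean per subframe to `condition_history` and report once `ttt_expired` returns true, mirroring steps 1 to 3 of the procedure above.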

4 Problem formulation for throughput maximization
In this section, we describe our optimization approach for maximizing the total rate of each cell k as a long-term load balancing process. To provide a better overview, we first present the interaction of the proposed long-term and short-term processes in Figure 4. The long-term load balancing optimization approach is solved by the proposed learning-based MM approaches presented in Section 6.1 and Section 6.2. As illustrated in Figure 4, both MAB- and satisfaction-based MM result in REB

Figure 3 Processing of the RSRP measurements through layer 1 (every 40 ms) and layer 3 (every 200 ms) filtering at a UE [2].
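The two-stage filtering chain of Figure 3 might be sketched as below. The five-sample layer 1 window follows the text; the layer 3 IIR coefficient `a` is an assumption for illustration, since its value is a configurable network parameter not fixed in this article.

```python
def l1_filter(samples):
    # Layer 1: linear average over the last five RSRP samples (one every 40 ms).
    window = samples[-5:]
    return sum(window) / len(window)

def l3_filter(prev, meas, a=0.5):
    # Layer 3: first-order IIR update, applied every 200 ms.
    # The coefficient a = 0.5 is an illustrative assumption.
    return (1.0 - a) * prev + a * meas

# Example: feed a layer 1 average into the layer 3 filter state.
l3_state = l3_filter(prev=-100.0, meas=l1_filter([-99.0, -98.0, -97.0, -96.0, -95.0]))
```

The handover condition of Equation 4 is then evaluated on the layer 3 filtered values rather than on raw RSRP samples.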


Figure 4 Flow chart of the proposed learning-based MM approaches: long-term load estimation (MAB-/satisfaction-based learning) yields the REB value selection βk and the load φk,tot(tn); the short-term UE scheduling process (history- and velocity-based scheduling) yields each UE's instantaneous rate φi(k)(tn).

βk value optimization and in load balancing φk,tot(tn). Based on the estimated instantaneous load, the context-aware scheduler selects, in the short term, for each RB a UE by considering its history and velocity, as described in Section 5. This results in each UE's instantaneous rate φi(k)(tn) and the RB allocation vector αi(k)(tn) = [αi(k),1, . . . , αi(k),R] containing binary variables αi(k),r indicating whether UE i(k) of BS k is allocated RB r or not. The inter-relation between the selected context parameters (UE's history and velocity), the scheduling function, the described optimization formulation, and the rationale behind the short-term and long-term MM approaches can be summarized as follows. Within the proposed MM approach, we carry out load balancing in the long term by optimizing the REB values and carry out history-based UE scheduling in the short term by means of the proposed context-aware scheduler. The combination of both procedures yields the HOF and PP probability reduction via the optimal REB value selection and the proposed context-aware scheduler. Here, the load balancing procedure yielding the optimal REB value incurs wideband SINR enhancement and HOF reduction. The context-aware scheduler, on the other hand, schedules UE based on the highest estimated achievable rate of each UE according to its instantaneous channel condition and its history, which leads to long-term fairness among UE. Both approaches, i.e., load balancing and history-based scheduling, yield throughput enhancement. Additionally, the velocity-based ranking property of the context-aware scheduler reduces the PP probability, since low-velocity UE are prioritized over high-velocity UE.

The optimization approach aims at maximizing the

total rate φk,tot(tn) of each cell k in the long term, i.e., for tn = [t1, . . . , tN], through dynamically changing the RB allocation αk(tn) and REB βk, whereby the total rate is defined as

$$\phi_{k,\text{tot}}(t_n) = \sum_{i(k) \in \mathcal{U}_k} \sum_{r=1}^{R} \alpha_{i(k),r}(t_n)\, \phi_{i(k),r}(t_n), \qquad (5)$$

where φi(k),r(tn) is the instantaneous rate of UE i(k) at RB r. At each time instant tn, each BS k performs the following optimization:

$$\max_{\alpha_{i(k)}(t_n),\, \beta_k} \sum_{n=1}^{N} \phi_{k,\text{tot}}(t_n) \qquad (6)$$

$$= \max_{\alpha_{i(k)}(t_n),\, \beta_k} \sum_{n=1}^{N} \sum_{i(k) \in \mathcal{U}_k} \sum_{r=1}^{R} \alpha_{i(k),r}(t_n)\, \phi_{i(k),r}(t_n) \qquad (7)$$

$$= \max_{\alpha_{i(k)}(t_n),\, \beta_k} \sum_{n=1}^{N} \sum_{i(k) \in \mathcal{U}_k} B_{i(k)} \sum_{r=1}^{R} \alpha_{i(k),r}(t_n) \log\left(1 + \gamma_{i(k),r}(t_n)\right) \qquad (8)$$

subject to: (9)

$$\alpha_{i(k),r}(t_n) \in \{0, 1\} \qquad (10)$$

$$\sum_{i(k) \in \mathcal{U}_k} \alpha_{i(k),r} = 1 \quad \forall r, \forall k, \qquad (11)$$

$$p_k \leq p_{k,\max} \qquad (12)$$

$$\phi_{i(k)}(t_n) \geq \phi_{k,\min} \qquad (13)$$

The condition in (12) implies that the total transmit power over all RBs does not exceed the maximum transmission power of BS k. Since our optimization approach in (6) aims at maximizing the total rate, the last condition (13) dictates that the instantaneous rate be larger than a minimum rate for BS k. Due to the distributed nature of the optimization problem in (6), we will investigate two reinforcement learning techniques in this paper, as will be discussed in Sections 6.1 and 6.2.
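To make Equation 5 and constraint (11) concrete, the sketch below evaluates a cell's total rate for a given per-UE, per-RB rate matrix, assigning each RB to exactly one UE. The simple max-rate assignment is purely illustrative; the paper's actual per-RB assignment uses the context-aware scheduler of Section 5.

```python
def cell_total_rate(rates):
    # rates[i][r]: achievable instantaneous rate of UE i on RB r.
    # Constraint (11): each RB is assigned to exactly one UE; here we pick
    # the max-rate UE per RB purely for illustration, then sum as in Eq. (5).
    num_ue = len(rates)
    num_rb = len(rates[0])
    total = 0.0
    for r in range(num_rb):
        total += max(rates[i][r] for i in range(num_ue))
    return total

# Two UE, two RBs: UE 1 is best on RB 0, UE 0 on RB 1.
load = cell_total_rate([[1.0, 2.0], [3.0, 1.0]])
```

The resulting value plays the role of the load φk,tot(tn) that the long-term learning approaches of Section 6 use as their reward.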

5 Short-term solution: a context-aware scheduler
The proposed MM approach considers a fairness-based context-aware scheduling mechanism in the short term. At each RB r, a UE i(k)* is selected to be served by BS k according to the following scheduling criterion:

$$i(k)^*_r = \text{sort}_{\min}\left(v_{i(k)}\right)\left(\arg\max_{i(k) \in \mathcal{U}_k} \frac{\phi_{i(k),r}(t_n)}{\bar{\phi}_{i}(t_n)}\right), \qquad (14)$$

where sortmin(vi(k)) sorts the candidate UE according to their velocity, starting with the slowest UE. After the sorting operation, if more than one UE can be selected for RB r, the UE with minimum velocity is selected. The rationale behind introducing a sorting/ranking function for candidate UE according to their velocity is that high-velocity UE will not be favored over slow-moving UE. This has two advantages: 1) high-velocity UE might pass through the picocell quickly and should therefore not be favored, to avoid PPs, and 2) the channel conditions of low-velocity UE change slowly, which may result in poor rates, especially for slow-moving cell-edge UE, if they are not allocated many RBs.
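A minimal sketch of the scheduling rule in Equation 14: candidates are ranked by the proportional-fair metric and ties are broken in favor of the slowest UE. The tuple layout and the small floor on the average rate (to guard against division by zero) are assumptions for illustration.

```python
def schedule_rb(candidates, eps=1e-9):
    # Eq. (14): each candidate is (velocity, inst_rate_on_rb, avg_rate).
    # Pick the UE maximizing inst_rate / avg_rate; among tied maximizers,
    # sort_min selects the slowest UE.
    def metric(c):
        velocity, inst_rate_rb, avg_rate = c
        return inst_rate_rb / max(avg_rate, eps)

    best = max(metric(c) for c in candidates)
    tied = [c for c in candidates if abs(metric(c) - best) < 1e-12]
    return min(tied, key=lambda c: c[0])  # slowest UE among the tied best

# Two UE share the best PF metric; the 3 km/h UE wins over the 30 km/h UE.
winner = schedule_rb([(30.0, 2.0, 1.0), (3.0, 2.0, 1.0), (10.0, 1.0, 1.0)])
```

Running this per RB, with average rates updated after each allocation, reproduces the short-term scheduling loop described above.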


The scheduler defined in (14) would allocate many (or even all) resources to a newly handed-over UE, since its average rate in the target cell is zero. To avoid this and enable a fair resource allocation among all UE in a cell, we propose a history-based scheduling approach as follows. Via the X2 interface, macro- and picocells coordinate, so that once a macro UE i(m) is handed over to a picocell p, the UE's target cell p and source cell m exchange information. In particular, UE i(m)'s rate history at time instant tn is provided to picocell p in terms of the average rate φ̄i(m)(tn), such that the UE's (which is named i(p) after the handover) average rate at picocell p becomes

$$\bar{\phi}_{i(p)}(t_n + T_s) = \frac{T\, \bar{\phi}_{i(m)}(t_n) + \phi_{i(p)}(t_n + T_s)}{T + 1}. \qquad (15)$$

In (15), a moving average rate is carried over from the macrocell to the picocell, whereas in the classical MM approach, a UE's rate history is not considered and is equal to zero. In other words, in the classical proportional fair scheduler, the average rate φ̄i(tn) in (14) is φ̄i(tn) = φ̄i(k)(tn) = 0 when a UE is handed over to cell k, whereas we redefine it according to (15), i.e., φ̄i(tn) = φ̄i(p)(tn + Ts). The proposed MM approach, instead, considers the historical rate when UE i(m) was associated to the macrocell m in the past. The incorporation of a UE's history enables the scheduler to perform fair resource allocation even in the presence of a sequence of handovers. Since handovers occur more frequently in HetNets due to small cell sizes, such a history-based scheduler leads to fair frequency resource allocation among the UE of a cell. More specifically, UE recently joining a cell will not be preferred over other UE of the cell, since their historical average rate will be taken into account.
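The history transfer of Equation 15 is a one-line update; the function name is an illustrative assumption.

```python
def avg_rate_after_handover(T, avg_rate_source, inst_rate_target):
    # Eq. (15): the target cell seeds the UE's moving-average rate with the
    # history received over the X2 interface, instead of restarting from zero.
    return (T * avg_rate_source + inst_rate_target) / (T + 1)

# A UE averaging 1.0 over a window of T = 4 samples gets 6.0 in its first
# target-cell slot; its PF average becomes 2.0 rather than jumping to 6.0.
seeded = avg_rate_after_handover(4, 1.0, 6.0)
```

Feeding `seeded` into the denominator of Equation 14 is what prevents a freshly handed-over UE from monopolizing the target cell's RBs.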

6 Long-term solution: learning-based mobility management techniques
To solve the optimization approach defined in Section 4, we rely on the self-organizing capabilities of HetNets and propose an autonomous solution for load balancing by using tools from reinforcement learning [29]. Hereby, each cell develops its own MM strategy to perform optimal load balancing based on the proposed learning approaches presented in Sections 6.1 and 6.2 and schedules its UE according to the context-aware scheduler presented in Section 5. An overview of the inter-relation of the short-term and long-term MM approaches is provided in Algorithm 2 (see Appendix). To realize this, we consider the game G = {K, {Ak}k∈K, {uk}k∈K}, in which each cell autonomously learns its optimal MM strategy. Hereby, the set K = {M ∪ P} represents the set of players (i.e., BSs), and for all k ∈ K, the set Ak = {βk} represents the set of actions player k can adopt. For all k ∈ K, uk is the utility function of player k. The action definition implies that the BSs learn at each time instant tn to optimize their traffic load in the long term using the cell range expansion process. Each BS learns its action selection strategy based on the MAB- or satisfaction-based learning approaches as presented in Sections 6.1 and 6.2, respectively.

6.1 Multi-armed bandit-based learning approach
The first learning-based MM approach is based on the MAB procedure, which aims to maximize the overall system performance. MAB is a machine learning technique based on an analogy with the traditional slot machine (one-armed bandit) [30]. When pulled at time tn, each machine/player provides a reward. The objective is to maximize the collected reward through iterative pulls, i.e., learning iterations. The crucial trade-off the player faces at each trial is between exploitation of the action that has the highest expected utility and exploration of new actions to get more information about the expected utility performance of the other actions. The player selects its actions based on a decision function reflecting this exploration-exploitation trade-off.

The set of actions, the learning strategy, and the utility function for our MAB-based MM approach are defined as follows:

• Actions: Ak = {βk} with βm = [0, 3, 6] dB and βp = [0, 3, 6, 9, 12, 15, 18] dB is the CRE bias. We consider higher bias values for picocells due to their low transmit power. The considered bias values are selected considering the results in [31] and also based on our extensive simulation studies.

• Strategy:

1. Every BS k learns its optimum CRE bias value on a long-term basis considering its load as defined in (5). This is interrelated with the handover triggering by defining the cell border of each cell. The problem formulation defined in Section 4 optimizes this load in the long term, i.e., over time tn.

2. A UE is handed over to BS k if it fulfills condition (4).

3. RB-based scheduling is performed based on the context-aware scheduler defined in Section 5.

• Utility function: The utility function is a decision function in MAB learning and is composed of an exploitation term, represented by player k's mean reward for selecting an action until time tn, and an exploration term that considers the number of times an action has been selected so far. Player k selects its action aj(k)(tn) ∈ Ak at time tn by maximizing a decision function dk,aj(k)(tn), which is defined as

$$d_{k,a_{j(k)}}(t_n) = \bar{u}_{k,a_{j(k)}}(t_n) + \sqrt{\frac{2 \log\left(\sum_{i=1}^{|\mathcal{A}_k|} n_{k,a_{i(k)}}(t_n)\right)}{n_{k,a_{j(k)}}(t_n)}}, \qquad (16)$$


where ūk,aj(k)(tn) is the mean reward of player k at time tn for action aj(k), nk,aj(k)(tn) is the number of times action aj(k) has been selected by player k until time tn, and | · | represents the cardinality operation.

The decision function in the form as in (16) was proposed by Auer et al. in [30]. The main advantage of Auer's solution is that the decision function does not rely on the regret, which is the loss incurred because the globally optimal policy (usually not known in practical wireless networks) is not followed in each learning iteration. The regret increases with time, since the globally optimal policy cannot be followed by the player; thus, MAB-based strategies need to bound the growth of the regret, at least asymptotically. Auer et al. defined an upper confidence bound (UCB) within which the regret remains. The UCB considers the player's average reward and the number of times an action has been selected until tn. Relying on these assumptions, we define our decision function in Equation 16.

During the first tn = |Ak| iterations, player k selects each action once in a random order to initialize the learning process by receiving a reward for each action. For the following iterations tn > |Ak|, action selection is performed according to Algorithm 1. In each learning iteration, the action a*j(k) that maximizes the decision function in (16) is selected by player k. Then, the parameters sk,aj(k)(tn), nk,aj(k)(tn), and uk,aj(k)(tn) are updated, where sk,aj(k)(tn) is the cumulated reward of player k after playing action aj(k) and 1i=j is equal to 1 if i = j and zero otherwise.

Algorithm 1 MAB-based mobility management algorithm.
1: for tn > |Ak| do
2:   for i = 1 : |Ak| do
3:     Select action a∗j(k) according to:
4:     a∗j(k) = argmax_{aj(k) ∈ Ak} ( dk,aj(k)(tn) )
5:     Update parameters according to:
6:     Update the cumulated reward when player k selects action aj(k):
7:     sk,aj(k)(tn + Ts) = sk,aj(k)(tn) + 1i=j φk,tot(tn)
8:     Update the number of times action aj(k) has been selected by player k:
9:     nk,aj(k)(tn + Ts) = nk,aj(k)(tn) + 1i=j
10:    Update the mean reward for selecting action aj(k):
11:    uk,aj(k)(tn + Ts) = sk,aj(k)(tn + Ts) / nk,aj(k)(tn + Ts)
12:  end for
13:  tn = tn + Ts
14: end for
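The selection and update steps of Algorithm 1 follow the standard UCB1 pattern from [30]. The following minimal Python sketch illustrates this pattern; the class name, the reward values, and the candidate action set (here, REB values in dB) are our own illustrative choices, and the reward standing in for φk,tot would come from the system simulation.

```python
import math
import random

class UCBPlayer:
    """UCB1-style learner for one BS ("player") choosing a CRE bias action.

    Sketch of Algorithm 1 / Eq. (16): exploitation term (mean reward)
    plus exploration term sqrt(2 log(total plays) / n_a).
    """

    def __init__(self, actions):
        self.actions = list(actions)           # A_k, e.g. candidate REB values in dB
        self.s = {a: 0.0 for a in actions}     # cumulated rewards s_{k,a}
        self.n = {a: 0 for a in actions}       # selection counts n_{k,a}

    def select(self):
        # Initialization phase: play each action once in random order.
        untried = [a for a in self.actions if self.n[a] == 0]
        if untried:
            return random.choice(untried)
        total = sum(self.n.values())
        # Decision function d = mean reward + exploration bonus, Eq. (16).
        return max(self.actions,
                   key=lambda a: self.s[a] / self.n[a]
                                 + math.sqrt(2.0 * math.log(total) / self.n[a]))

    def update(self, action, reward):
        # Accumulate reward and count for the played action (steps 7 and 9).
        self.s[action] += reward
        self.n[action] += 1
```

A caller would loop `a = player.select()`, observe the cell reward, and call `player.update(a, reward)` once per learning iteration Ts.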

6.2 Satisfaction-based learning approach

The idea of satisfaction equilibrium was introduced in [32], where players having partial or no knowledge about their environment and other players are solely interested in the satisfaction of some individual performance constraints instead of individual performance optimization. Here, we consider a player to be satisfied if its cell reaches a certain minimum level of total rate and if at least 90% of the UE in the cell obtain a certain average rate. The rationale behind these satisfaction conditions is to guarantee a minimum rate for each individual UE, while at the same time improving the total rate of the cell.

To enable a fair comparison, the set of players and the corresponding set of actions in the proposed satisfaction-based MM approach are considered to be the same as those in the MAB-based MM approach. The utility function of player k at time tn is defined as the load according to (5). In the satisfaction-based learning approach, the actions are selected according to a probability distribution πk(tn) = [πk,1(tn), . . . , πk,|Ak|(tn)], where πk,j(tn) is the probability with which BS k chooses its action aj(k)(tn) at time tn. The following learning steps are performed in each learning iteration:

1. In the first learning iteration tn = 1, the probability of each action is equal and an action is selected randomly.

2. In the following learning iterations tn > 1, the player changes its action selection strategy only if the received utility does not satisfy the cell, i.e., if the satisfaction condition is not fulfilled.

3. If the satisfaction condition is not fulfilled, player k selects its action aj(k)(tn) according to the probability distribution πk(tn).

4. Each player k receives a reward φk,tot(tn) based on the selected actions.

5. The probability πk,j(tn) of action aj(k)(tn) is updated according to the linear reward-inaction scheme:

\pi_{k,j}(t_n) = \pi_{k,j}(t_n - T_s) + \lambda b_k(t_n) \left( \mathbb{1}_{a_{j(k)}(t_n) = a_{i(k)}(t_n)} - \pi_{k,j}(t_n - T_s) \right),   (17)

whereby \mathbb{1}_{a_{j(k)}(t_n) = a_{i(k)}(t_n)} = 1 for the selected action and zero for the non-selected actions. Moreover, bk(tn) is defined as follows:

b_k(t_n) = \frac{u_{k,\max} + \phi_{k,\text{tot}}(t_n) - u_{k,\min}}{2 u_{k,\max}},   (18)

where uk,max is the maximum rate in the case of a single UE and uk,min = (1/2) uk,max. Hereby, λ = 1/(100 tn + Ts) is the learning rate.
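The update in step 5 can be sketched in a few lines of Python. This is an illustrative implementation of the linear reward-inaction rule in (17)-(18); the function name and argument names are ours, and the reward value standing in for φk,tot(tn) is a placeholder the simulation would supply. Note the update preserves a valid probability distribution, since the changes across actions sum to λ·b·(1 − Σπ) = 0.

```python
def lri_update(pi, chosen, reward, t_n, T_s, u_max):
    """Linear reward-inaction update, Eqs. (17)-(18) (sketch).

    pi:     dict mapping action -> selection probability pi_{k,j}
    chosen: the action selected in this iteration
    reward: the received reward phi_{k,tot}(t_n)
    u_max:  maximum single-UE rate u_{k,max}
    """
    u_min = 0.5 * u_max                              # u_{k,min} = u_{k,max}/2
    b = (u_max + reward - u_min) / (2.0 * u_max)     # Eq. (18)
    lam = 1.0 / (100.0 * t_n + T_s)                  # learning rate lambda
    for a in pi:
        indicator = 1.0 if a == chosen else 0.0      # 1 for the selected action
        pi[a] += lam * b * (indicator - pi[a])       # Eq. (17)
    return pi
```

In use, the probability of the selected action drifts upward (and the others downward) whenever the update is applied, i.e., whenever the satisfaction condition is not met.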


7 Simulation results

The scenario used in the system-level simulations is based on the configuration #4b HetNet scenario in [21]. We consider a macrocell consisting of three sectors and P = {1, 2, 3} pico BSs per macro sector, randomly distributed within the macrocellular environment as illustrated in Figure 5. It has to be pointed out that the proposed MM approaches can be applied to any number of macrocells. We present in this section the results for one macrocell (three macro sectors) due to large computation times. In each macro sector, U = 30 mobile UE are randomly dropped within a 60-m radius of each pico BS. The rationale behind dropping all UE around pico BSs is to obtain a large number of handovers within a short time, in order to avoid large computation times due to the complexity of our system-level simulations. Each UE i(k) has a randomly selected velocity vi(k) of V = {3; 30; 60; 120} km/h and a random direction of movement within [0; 2π]. We consider fast-fading and shadowing effects in our simulations that are based on 3GPP assumptions [21]. Further simulation parameters are summarized in Table 1.

To compare our results with other approaches, we consider a baseline MM approach as defined in [2]. In the baseline MM approach, handover is performed based on layer 1 and layer 3 filtering as described in Section 3.1. For the baseline MM approach, we consider proportional fair-based scheduling, with no information exchange between macro and pico BSs. This baseline approach is referred to as the classical HO approach. In the following, we will compare our proposed MM approaches with this baseline MM approach, which is aligned with the handover procedure defined in 3GPP [2].

Figure 5 Illustration of the simulated HetNet scenario (macro BS, pico BS, hotspot, picocell coverage, MUE, PUE).

Table 1 Simulation parameters

Cellular layout: Hexagonal grid, three sectors per cell, reuse 1
Carrier frequency: 2 GHz
System bandwidth: 10 MHz
Subframe duration: 1 ms
Number of RBs: 50
Number of macrocells: 1
Number of PBSs per macrocell: P = {1, 2, 3}
Max. macro (pico) BS transmit power: PMmax = 46 dBm (PPmax = 30 dBm)
Macro path loss model: 128.1 + 37.6 log10(R) dB (R in km)
Pico path loss model: 140.7 + 36.7 log10(R) dB (R in km)
Traffic model: Full buffer
Scheduling algorithm: Proportional fair
Transmission mode: Transmit diversity
Min. dist. MBS-PBS: 75 m
Min. dist. PBS-PBS: 40 m
Min. dist. MBS-MUE: 35 m
Min. dist. PBS-PUE: 10 m
Hotspot radius: 60 m
Thermal noise density: −174 dBm/Hz
Macro BS antenna gain: 14 dBi
Pico BS antenna gain: 5 dBi

7.1 UE throughput and sum-rate

Figure 6 depicts the cumulative distribution function (CDF) of the UE throughput for the classical, MAB-, and satisfaction-based MM approaches. For the classical approach, we present results for different picocell REB values βp. For the MAB- and satisfaction-based MM approaches, we distinguish between the long-term-only MM approach (with a proportional fair scheduler instead of the proposed context-aware scheduler) and the long-term and short-term MM approach, shown in dashed and solid lines, respectively, to demonstrate the impact of the proposed context-aware scheduler. In the selected scenario, the CRE of picocells does influence the cell-edge (5th percentile) UE throughput, which is zoomed in Figure 6. This is because in the simulation scenario all UE are dropped within a radius of 60 m around the picocell, so that many macrocell UE are close to the picocell and suffer from intercell interference. Compared to the classical approach, the MAB- and satisfaction-based approaches lead to an improvement of up to 39% and 80% for cell-edge UE for the long-term and short-term approaches, respectively. In the case of the long-term-only MM approach, the MAB-based approach yields similar results as the classical approach for cell-edge UE. A similar gain is achieved in terms of average (50th percentile) UE throughput. Here, the MAB- and satisfaction-based approaches yield 43% and 75% improvement compared to the classical approach.
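The path loss models listed in Table 1 can be evaluated directly. The sketch below is our own illustration (function name and signature are hypothetical); it simply encodes the two log-distance formulas from the table.

```python
import math

def path_loss_db(distance_km, is_macro=True):
    """Path loss models from Table 1 (distance R in km).

    Macro: 128.1 + 37.6 log10(R) dB
    Pico:  140.7 + 36.7 log10(R) dB
    """
    if is_macro:
        return 128.1 + 37.6 * math.log10(distance_km)
    return 140.7 + 36.7 * math.log10(distance_km)
```

For example, a UE 100 m from a macro BS sees 128.1 − 37.6 = 90.5 dB of path loss; at the same distance a pico link has a higher path loss but also far lower transmit power, which is what makes range expansion biases relevant.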


Figure 6 CDF of the UE throughput for 30 UE and 1 pico BS per macrocell and TTT = 480 ms; the inset zooms into the cell-edge UE throughput, with classical curves for βp = {0; 3; 6; 9} dB.

In the case of the long-term-only MM approaches, the improvement in average UE throughput is 12% and 37% for the MAB- and satisfaction-based MM approaches, respectively. Hence, the satisfaction-based approach outperforms the other MM approaches in terms of average and cell-edge UE throughput. In the case of the cell-center UE throughput, which is defined as the 95th percentile throughput, the opposite behavior is obtained. In this case, an improvement of 124% and 80% is achieved for the MAB- and satisfaction-based approaches, respectively. The reason is that the satisfaction-based MM approach only aims at satisfying the network in terms of rate and does not update its learning strategy once satisfaction is achieved in the network, i.e., no actions that may lead to improvements are explored. The MAB-based approach, on the other hand, aims at maximizing the network performance, which is reflected in the improved cell-center UE throughput.

Figure 7 Cell-edge (5th percentile) UE throughput for 30 UE and 1 pico BS per macrocell and TTT = 480 ms; classical curves shown for βp = {0; 3; 6; 9} dB.

To show the effect of different REB values in the case of classical MM, we depict in Figure 7 the cell-edge UE throughput for different numbers of UE per macrocell. With increasing REB value for picocells, the cell-edge UE throughput is enhanced. However, the proposed learning-based MM approaches outperform the classical approach by up to five times for 10 UE per macrocell and up to 4.5 times for 50 UE per macrocell. Interestingly, the MAB-based MM yields higher cell-edge UE throughput than the satisfaction-based MM approach for a smaller number of UE. This can be interpreted as follows: In a scenario with 10 UE and a bandwidth of 10 MHz, the probability of having unsatisfied UE is very low. Since most of the UE (or even all UE) are satisfied, the satisfaction-based learning approach does not change its strategy to further enhance each UE's performance, whereas the MAB-based learning procedure adapts its strategy disregarding the satisfaction condition to maximize the UE's performance. As a result, the cell-edge UE performance is enhanced, too. With an increasing number of UE, the probability of unsatisfied UE in the cell increases, and the satisfaction-based learning approach adapts its strategy to enhance especially the cell-edge UE performance, i.e., to satisfy all UE in the cell. That is why the cell-edge UE performance is larger for the satisfaction-based MM approach for an increasing number of UE. In addition to the results for different REB values, we present in Figure 8 the sum-rate versus UE density per macrocell for TTT = {40, 480} ms. For both TTT values, the classical approach yields very low sum-rates, while the proposed approaches lead to significant improvements of up to 81% and 85% for TTT values of 40 ms and 480 ms, respectively. The proposed MM approaches converge to significantly larger sum-rates than the classical MM approach. Interestingly, the sum-rate performance of the proposed MM approaches depends on the TTT. The reason for this lies in the convergence behavior of the learning algorithms. For smaller TTT values, handover is executed faster and the BS has to adapt its REB strategy to the new cell load before convergence. To demonstrate the statistical properties of the results presented in Figure 8, we present confidence intervals of the sum-rates for a confidence level of 95% in Table 2. It can be seen that the presented sum-rate results lie within the confidence intervals.

Figure 8 Sum-rate vs. number of UE per macrocell with 1 pico BS and TTT = {40, 480} ms.

Table 2 Confidence intervals of the sum-rates for a confidence level of 95%

MM approach: 10 UE / 20 UE / 30 UE / 50 UE per macrocell
Classical MM: [45.74, 56.33] / [58.77, 67.15] / [62.61, 71.94] / [67.51, 76.81]
MAB MM: [92.04, 113.92] / [116.79, 132.7] / [124.4, 142.33] / [124.4, 142.33]
Satisfaction MM: [92.37, 114.11] / [116.07, 132.84] / [124.04, 142.2] / [128.64, 145.77]

Confidence intervals of simulation results for sum-rate [Mbps] for 1 pico BS per macrocell, different numbers of UE, and TTT = 480 ms.

Figure 9 depicts the effect of the long-term strategy on the derived REB values. Since the long term is related to the load estimation, which impacts the average UE throughput, we depict it over the REB values. In the case of the learning-based MM approaches, we would like to emphasize that each BS selects its own REB value according to the selected learning approach. In the simulations, this leads to the fact that in the REB optimization process some REB values are favored over others and a mix of REB value selections across all BSs is achieved, so that the average UE throughput is depicted over mixed REB values. This is due to the fact that, based on the self-organizing capability of our proposed learning approaches, the REB values are selected autonomously by each BS according to its learning strategy. Hence, the selection of each REB value is controlled by the learning approach and not by our simulation settings, so that it is not possible to present results separately for each REB value. It can be seen that a fixed REB value selection leads to lower average UE throughput than an autonomous REB value selection by each BS, which is the case for the proposed learning-based MM approaches.

Figure 9 Average UE throughput over REB values for 30 UE and 1 pico BS per macrocell and TTT = 480 ms.

To compare the performance of the proposed approaches for different numbers of picocells per macrocell, Figure 10 plots the sum-rate versus the number of pico BSs per macrocell. For the different numbers of pico BSs, the proposed MM approaches yield gains of around 70% to 80% for TTT = 480 ms.

7.2 HOF and PP probabilities

Mobility management approaches must not only enhance the network's performance in terms of UE throughput and sum-rate but also reduce handover failure rates and PP probabilities. Here, we present results for the HOF probability and the PP event probability. We modify our simulation settings by using the same velocity for each UE per simulation, so that we can present results for each velocity separately. Figure 11 depicts the HOF probability for different TTT values. As can be seen, our proposed learning-based approaches yield, besides the gains in terms of rate,

Figure 10 Sum-rate vs. number of pico BSs per macrocell and TTT = 480 ms; annotated gains of 76%, 71%, and 80%.


Figure 11 HOF probability vs. UE velocity for different TTT values, for 30 UE and 1 pico BS per macrocell: (a) TTT = 40 ms, (b) TTT = 80 ms, (c) TTT = 160 ms, and (d) TTT = 480 ms.

improvements in terms of HOF probability. Compared to the classical MM approach, the proposed methods yield the same HOF probability for UE at 3 km/h. For higher velocities, in which more HOFs are expected, the HOF probability obtained by the proposed approaches is significantly lower than in the case of classical MM. Interestingly, the proposed methods lead to almost constant HOF probabilities for velocities larger than 30 km/h. For UE at 120 km/h, the HOF probability of the MAB-based approach is twice the HOF probability of satisfaction-based learning but only one third of that of the classical approach for TTT = 40 ms. For increasing TTT values, the trend between the proposed MM approaches and the classical MM approach remains similar. This is because the picocell coverage in the classical approach without range expansion is small, and thus the macrocell UE quickly run deep inside the picocell coverage area before the TTT expires, significantly degrading the signal quality of the macrocell UE before the handover is completed. In this case, HOFs are alleviated with smaller TTT values.

Reducing TTT values may decrease the HOF probability but increase the PP probability. Hence, HOFs and PPs must be studied jointly. We depict in Figure 12 the PP probability for the same simulation settings as in Figure 11. It can be observed that the number of PPs is reduced with larger TTT values. In addition, for lower velocities, all MM approaches yield similar PP probabilities for all TTT values. For higher velocities, the PP probability is decreased by the proposed MM approaches by up to a factor of two (TTT = 40 ms).

Figure 12 PP probability for 30 UE and 1 pico BS per macrocell and TTT = {40, 80, 160, 480} ms.

7.3 Convergence behavior of MAB- and satisfaction-based MM

One of the major concerns with learning-based approaches is their convergence behavior in dynamic systems. In Figure 13, we depict the convergence of the proposed


Figure 13 Convergence of cell-center (95th percentile) UE throughput for 30 UE and 1 pico BS per macrocell and TTT = 480 ms.

learning-based MM approaches in terms of cell-center UE throughput. It can be seen that the MAB-based MM approach converges more slowly than the satisfaction-based MM approach, but it converges to a larger cell-center UE throughput since it aims at system performance maximization. Hence, depending on the system requirements, either the MAB-based approach can be applied if a large cell-center UE throughput is targeted, or the satisfaction-based approach can be preferred if better cell-edge UE throughput (see Figure 7) and fast convergence are expected.

8 Conclusions

We propose two learning-based MM approaches and a history-based context-aware scheduling method for HetNets. The first approach is based on MAB learning and aims at system performance maximization. The second method aims at satisfying each cell and each UE of a cell and is based on satisfaction-based learning. System-level simulations demonstrate the performance enhancement of the proposed approaches compared to the classical MM method. While gains of up to 80% are achieved on average for UE throughput, the HOF probability is significantly reduced by the proposed learning-based MM approaches.

Endnote

a We consider lower REB values for macro BSs to avoid overloaded macrocells due to their large transmission power.

Appendix

Algorithm 2 presents the interrelation between the short-term and long-term MM approaches.

Algorithm 2 Simulation flow for long- and short-term mobility management algorithm.
1: while tn < Tsim do
2:   Update UE positions based on each UE's velocity vi(k) and direction of movement
3:   Update channel conditions
4:   Begin short-term MM approach:
5:   for each UE i(k), k ∈ K do
6:     Check handover condition in (4):
7:     if pl(i(l)) + βl > pk(i(k)) + βk + mhist then
8:       Handover UE i(k) to BS l
9:       Forward average rate φ̄i(tn) of UE i to BS l
10:    else
11:      No handover
12:    end if
13:    Perform context-aware scheduling according to (14)
14:  end for
15:  Begin long-term MM approach:
16:  for each BS k, k ∈ K do
17:    Select βk according to MAB/satisfaction-based MM approach
18:  end for
19:  n = n + 1
20: end while
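The handover test in line 7 of Algorithm 2 compares biased received powers with a history-based margin. A minimal Python sketch of this check follows; the function and argument names are ours, assuming received powers in dBm and biases/margin in dB.

```python
def should_handover(p_serving_dbm, bias_serving_db,
                    p_target_dbm, bias_target_db, m_hist_db):
    """Handover condition (4): hand the UE over to candidate BS l if its
    biased received power exceeds the serving BS k's biased power plus
    the history-based margin m_hist.
    """
    return (p_target_dbm + bias_target_db
            > p_serving_dbm + bias_serving_db + m_hist_db)
```

The REB terms make an expanded-range picocell win this comparison earlier, while mhist damps ping-pong handovers between neighboring cells.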

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

The general ideas presented in this article have been developed by all the authors at a high level. MS developed the algorithms presented, implemented them into the system-level simulator, and ran the simulations. The paper has been revised collaboratively by all authors. All authors read and approved the final manuscript.

Acknowledgements

This research was supported in part by the SHARING project under the Finland grant 128010 and by the U.S. National Science Foundation under the grants CNS-1406968 and AST-1443999.

Author details

1 Vodafone Chair Mobile Communications Systems, Technische Universität Dresden, Dresden, Germany. 2 Centre for Wireless Communications, University of Oulu, Oulu, Finland. 3 Department of Electrical & Computer Engineering, Florida International University, Miami, FL, USA.

Received: 11 August 2014 Accepted: 5 January 2015

References
1. Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2013-2018. Cisco Public Information (2014)
2. 3GPP, Evolved Universal Terrestrial Radio Access (E-UTRA); Mobility Enhancements in Heterogeneous Networks. Technical Report 3GPP TR 36.839 V11.1.0 (Oct. 2012)
3. Y Peng, YZW Yang, Y Zhu, in Proc. IEEE Symposium on Personal Indoor and Mobile Radio Communications (PIMRC). Mobility Performance Enhancements for LTE-Advanced Heterogeneous Networks (Sydney, Sept 2012)
4. H Holma, A Toskala, LTE-Advanced: 3GPP Solution for IMT-Advanced (John Wiley & Sons, Ltd, UK, 2012)
5. J Alonso-Rubio, in Proc. IEEE Network Operations and Management Symp. Self-optimization for handover oscillation control in LTE, (2010), pp. 950-953
6. G Hui, P Legg, in Proc. IEEE Vehic. Technol. Conf. (VTC). LTE handover optimisation using uplink ICIC, (2011), pp. 1-5
7. K Kitagawa, T Komine, in Proc. IEEE Int. Symp. Personal Indoor Mobile Radio Commun. (PIMRC). A handover optimization algorithm with mobility robustness for LTE systems, (2011), pp. 1647-1651
8. K Ghanem, H Alradwan, in Proc. IEEE Int. Symp. Commun. Syst., Networks and Digital Signal Proc. Reducing ping-pong handover effects in intra EUTRA networks, (2012), pp. 1-5
9. H Zhang, X Wen, B Wang, W Zheng, Z Lu, in Proc. IEEE Int. Conf. on Research Challenges in Computer Science. A novel self-optimizing handover mechanism for multi-service provisioning in LTE-Advanced, vol. 1, (2009), pp. 221-224
10. T Jansen, I Balan, J Turk, I Moerman, T Kurner, in Proc. IEEE Vehic. Technol. Conf. (VTC). Handover parameter optimization in LTE self-organizing networks, (2010), pp. 1-5
11. K Dimou, M Wang, Y Yang, M Kazmi, A Larmo, J Pettersson, W Muller, Y Timner, in Proc. IEEE Vehic. Technol. Conf. (VTC). Handover within 3GPP LTE: design principles and performance, (2009), pp. 1-5
12. W Li, X Duan, S Jia, L Zhang, Y Liu, J Lin, in Proc. IEEE Vehic. Technol. Conf. (VTC). A dynamic hysteresis-adjusting algorithm in LTE self-organization networks, (2012), pp. 1-5
13. T-H Kim, Q Yang, J-H Lee, S-G Park, Y-S Shin, in Proc. IEEE Vehic. Technol. Conf. (VTC). A mobility management technique with simple handover prediction for 3G LTE systems, (2007), pp. 259-263
14. M Carvalho, P Vieira, in Proc. Int. Symp. on Wireless Personal Multimedia Commun. An enhanced handover oscillation control algorithm in LTE self-optimizing networks, (2011)
15. W Luo, X Fang, M Cheng, X Zhou, in Proc. IEEE Int. Workshop on Signal Design and Its Applications in Commun., (2011), pp. 193-196
16. I Balan, T Jansen, B Sas, in Proc. Future Network and Mobile Summit. Enhanced weighted performance based handover optimization in LTE, (2011), pp. 1-8
17. T Jansen, I Balan, S Stefanski, I Moerman, T Kurner, in Proc. IEEE Vehic. Technol. Conf. (VTC). Weighted performance based handover parameter optimization in LTE, (2011), pp. 1-5
18. H-D Bae, B Ryu, N-H Park, in Australasian Telecommunication Networks and Applications Conference (ATNAC). Analysis of handover failures in LTE femtocell systems, (2011), pp. 1-5
19. Y Lei, Y Zhang, in Proc. IEEE Int. Conf. Computer Theory and Applications. Enhanced mobility state detection based mobility optimization for femtocells in LTE and LTE-Advanced networks, (2011), pp. 341-345
20. MP Wiley-Green, T Svensson, in Proc. IEEE GLOBECOM. Throughput, Capacity, Handover and Latency Performance in a 3GPP LTE FDD Field Trial (FL, USA, Dec. 2010)
21. 3GPP, Evolved Universal Terrestrial Radio Access (E-UTRA); Further Advancements for E-UTRA Physical Layer Aspects. Technical Report 3GPP TR 36.814 V9.0.0 (Oct. 2010)
22. D Xenakis, N Passas, L Merakos, C Verikoukis, Mobility management for femtocells in LTE-Advanced: key aspects and survey of handover decision algorithms. IEEE Comm. Surveys Tutorials. 16(1) (2014)
23. S Barbera, PH Michaelsen, K Pedersen, in Proc. IEEE Wireless Comm. and Networking Conf. (WCNC). Mobility Performance of LTE Co-Channel Deployment of Macro and Pico Cells (France, Apr. 2012)
24. P Munoz, R Barco, I de la Bandera, On the potential of handover parameter optimization for self-organizing networks. IEEE Trans. Vehicular Technol. 62(5), 1895-1905 (2013)
25. D Lopez-Perez, I Guvenc, X Chu, Mobility management challenges in 3GPP heterogeneous networks. IEEE Comm. Mag. 50(12) (2012)
26. KI Pedersen, PH Michaelsen, C Rosa, S Barbera, Mobility enhancements for LTE-Advanced multilayer networks with inter-site carrier aggregation. IEEE Comm. Mag. 51(5) (2013)
27. J Wang, J Liu, D Wang, J Pang, G Shen, in Proc. IEEE Vehic. Technol. Conf. (VTC). Optimized Fairness Cell Selection for 3GPP LTE-A Macro-Pico HetNets (CA, Sept. 2011)
28. A Feki, V Capdevielle, L Roullet, AG Sanchez, in 11th Int. Symp. on Modeling & Optimization in Mobile, Ad Hoc & Wireless Networks (WiOpt). Handover Aware Interference Management in LTE Small Cells Networks, (2013)
29. ME Harmon, SS Harmon, Reinforcement Learning: A Tutorial. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.33.2480&rep=rep1&type=pdf
30. P Auer, N Cesa-Bianchi, P Fischer, Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47, 235-256 (2002)
31. 3GPP, Performance Study on ABS with Reduced Macro Power. Technical Report 3GPP R1-113806 (Nov. 2011)
32. S Ross, B Chaib-draa, in Proc. 19th Canadian Conf. on Artificial Intelligence. Satisfaction equilibrium: achieving cooperation in incomplete information games, (2006)
