Low-Latency Networking: Where Latency Lurks and How to Tame Itmodiano/papers/CV_J_108.pdf ·...

27
Low-Latency Networking: Where Latency Lurks and How to Tame It This article presents a holistic analysis and classification of the main design principles and enabling technologies for the deployment of low-latency wireless networks. By XIAOLIN J IANG , Student Member IEEE,HOSSEIN SHOKRI -GHADIKOLAEI , Student Member IEEE, GABOR FODOR, Senior Member IEEE,EYTAN MODIANO, Fellow IEEE, ZHIBO PANG, Senior Member IEEE,MICHELE ZORZI , Fellow IEEE, AND CARLO FISCHIONE, Member IEEE ABSTRACT | While the current generation of mobile and fixed communication networks has been standardized for mobile broadband services, the next generation is driven by the vision of the Internet of Things and mission-critical communi- cation services requiring latency in the order of milliseconds or submilliseconds. However, these new stringent require- ments have a large technical impact on the design of all layers of the communication protocol stack. The cross-layer interactions are complex due to the multiple design princi- ples and technologies that contribute to the layers’ design and fundamental performance limitations. We will be able to develop low-latency networks only if we address the problem of these complex interactions from the new point of view of submilliseconds latency. In this paper, we propose a holistic analysis and classification of the main design principles and enabling technologies that will make it possible to deploy low-latency wireless communication networks. We argue that Manuscript received January 9, 2018; revised June 26, 2018; accepted July 27, 2018. Date of publication August 30, 2018; date of current version January 22, 2019. The work of H. S. Ghadikolaei was supported in part by the Swedish Research Council under Grant 2018-00820. The work of Z. Pang was suported in part by Vinnova (Swedish Innovation Agency) through Grants 2015-06548 and 2017-02822. G. Fodor acknowledges that part of this work has been performed in the framework of the H2020 project 5GCAR cofounded by the EU. The views expressed are those of the authors and do not necessarily represent the views of the 5GCAR project. (Corresponding author: Xiaolin Jiang.) X. Jiang, H. Shokri-Ghadikolaei, and C. Fischione are with the Department of Network and Systems Engineering, School of Engineering, KTH Royal Institute of Technology, 67100 Stockholm, Sweden (e-mail: [email protected]). G. Fodor is with Ericsson Research, Kista, Sweden. E. Modiano is with the Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA 02139 USA. Z. Pang is with ABB AB, Corporate Research, 721 78 Västerås, Sweden. M. Zorzi is with the Department of Information Engineering, University of Padova, 35131 Padua, Italy. Digital Object Identifier 10.1109/JPROC.2018.2863960 these design principles and enabling technologies must be carefully orchestrated to meet the stringent requirements and to manage the inherent tradeoffs between low latency and tra- ditional performance metrics. We also review currently ongoing standardization activities in prominent standards associations, and discuss open problems for future research. KEYWORDS | Internet of Things; low-latency communications; mission critical services; ultrareliable communications I. INTRODUCTION A series of major technological revolutions have pushed the development of communication networks to the current state of the art that includes internet, pervasive broadband wireless access, and low data rate Internet of Things (IoT). One of the first crucial steps of this series of revolutions is the global Public Switched Telephone Network (PSTN), the aggregate of the world’s nationwide circuit switched tele- phone networks, which was designed to deliver arguably the most demanded and revenue generating services in the history of communication networks, namely circuit switched voice. As noted in [1], the PSTN not only ensures the latency requirements imposed by voice communica- tion services, but also responds to randomly fluctuating demands and failures by rerouting traffic and reallocating communication resources. Due to its ability to reliably meet human-centered latency requirements and deliver the popular voice service over very long distances, even in the presence of fluctuating traffic demands and link failures, the PSTN is a technology to which Mark Weiser’s observation truly applies: “The most profound technologies are those that disappear. They weave themselves into the fabric 280 PROCEEDINGS OF THE IEEE | Vol. 107, No. 2, February 2019 0018-9219 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Transcript of Low-Latency Networking: Where Latency Lurks and How to Tame Itmodiano/papers/CV_J_108.pdf ·...

Page 1: Low-Latency Networking: Where Latency Lurks and How to Tame Itmodiano/papers/CV_J_108.pdf · 2019-04-12 · Low-Latency Networking: Where Latency Lurks and How to Tame It This article

Low-Latency Networking:Where Latency Lurksand How to Tame ItThis article presents a holistic analysis and classification of the main design principlesand enabling technologies for the deployment of low-latency wireless networks.

By XIAOLIN JIANG , Student Member IEEE, HOSSEIN SHOKRI-GHADIKOLAEI, Student Member IEEE,GABOR FODOR, Senior Member IEEE, EYTAN MODIANO, Fellow IEEE,ZHIBO PANG, Senior Member IEEE, MICHELE ZORZI, Fellow IEEE,AND CARLO FISCHIONE, Member IEEE

ABSTRACT | While the current generation of mobile and fixed

communication networks has been standardized for mobile

broadband services, the next generation is driven by the

vision of the Internet of Things and mission-critical communi-

cation services requiring latency in the order of milliseconds

or submilliseconds. However, these new stringent require-

ments have a large technical impact on the design of all

layers of the communication protocol stack. The cross-layer

interactions are complex due to the multiple design princi-

ples and technologies that contribute to the layers’ design

and fundamental performance limitations. We will be able to

develop low-latency networks only if we address the problem

of these complex interactions from the new point of view of

submilliseconds latency. In this paper, we propose a holistic

analysis and classification of the main design principles and

enabling technologies that will make it possible to deploy

low-latency wireless communication networks. We argue that

Manuscript received January 9, 2018; revised June 26, 2018; accepted July 27,2018. Date of publication August 30, 2018; date of current version January 22,2019. The work of H. S. Ghadikolaei was supported in part by the SwedishResearch Council under Grant 2018-00820. The work of Z. Pang was suported inpart by Vinnova (Swedish Innovation Agency) through Grants 2015-06548 and2017-02822. G. Fodor acknowledges that part of this work has been performedin the framework of the H2020 project 5GCAR cofounded by the EU. The viewsexpressed are those of the authors and do not necessarily represent the views ofthe 5GCAR project. (Corresponding author: Xiaolin Jiang.)

X. Jiang, H. Shokri-Ghadikolaei, and C. Fischione are with the Departmentof Network and Systems Engineering, School of Engineering, KTH Royal Instituteof Technology, 67100 Stockholm, Sweden (e-mail: [email protected]).

G. Fodor is with Ericsson Research, Kista, Sweden.

E. Modiano is with the Laboratory for Information and Decision Systems,Massachusetts Institute of Technology, Cambridge, MA 02139 USA.

Z. Pang is with ABB AB, Corporate Research, 721 78 Västerås, Sweden.

M. Zorzi is with the Department of Information Engineering, University ofPadova, 35131 Padua, Italy.

Digital Object Identifier 10.1109/JPROC.2018.2863960

these design principles and enabling technologies must be

carefully orchestrated to meet the stringent requirements and

to manage the inherent tradeoffs between low latency and tra-

ditional performancemetrics. We also review currently ongoing

standardization activities in prominent standards associations,

and discuss open problems for future research.

KEYWORDS | Internet of Things; low-latency communications;

mission critical services; ultrareliable communications

I. I N T R O D U C T I O N

A series of major technological revolutions have pushed thedevelopment of communication networks to the currentstate of the art that includes internet, pervasive broadbandwireless access, and low data rate Internet of Things (IoT).One of the first crucial steps of this series of revolutions isthe global Public Switched Telephone Network (PSTN), theaggregate of the world’s nationwide circuit switched tele-phone networks, which was designed to deliver arguablythe most demanded and revenue generating services inthe history of communication networks, namely circuitswitched voice. As noted in [1], the PSTN not only ensuresthe latency requirements imposed by voice communica-tion services, but also responds to randomly fluctuatingdemands and failures by rerouting traffic and reallocatingcommunication resources. Due to its ability to reliablymeet human-centered latency requirements and deliverthe popular voice service over very long distances, evenin the presence of fluctuating traffic demands and linkfailures, the PSTN is a technology to which Mark Weiser’sobservation truly applies:

“The most profound technologies are those thatdisappear. They weave themselves into the fabric

280 PROCEEDINGS OF THE IEEE | Vol. 107, No. 2, February 2019

0018-9219 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Page 2: Low-Latency Networking: Where Latency Lurks and How to Tame Itmodiano/papers/CV_J_108.pdf · 2019-04-12 · Low-Latency Networking: Where Latency Lurks and How to Tame It This article

Jiang et al.: Low-Latency Networking: Where Latency Lurks and How to Tame It

of everyday life until they are indistinguishablefrom it” [2], [3].

The second step in the communication networks revolu-tions has made PSTN indistinguishable from our everydaylife. Such step is the Global System for Mobile (GSM)communication standards suite. In the beginning of 2000,GSM has become the most widely spread mobile commu-nications system, thanks to the support for users mobility,subscriber identity confidentiality, subscriber authentica-tion as well as confidentiality of user traffic and signal-ing [4]. The PSTN and its extension via the GSM wirelessaccess networks have been a tremendous success in termsof Weiser’s vision, and also paved the way for new businessmodels built around mobility, high reliability, and latencyas required from the perspective of voice services.

The success of PSTN and GSM was ultimately due tothe development of packet switching and wireless accessmethods. The circuit switching technology dedicates com-munication resources along the end-to-end path of pairsof transmitters and receivers on a coarse time scale,which leads to poor resource utilization. In contrast, withpacket switching, a communication network is designed toshare a single link between multiple pairs of transmittersand receivers. Although this idea boosts communicationresource utilization, it may lead to packet congestionand consequent increase of latency. In fact, the internetemploys packet switching and Voice over IP (VoIP) tech-nologies, and the interval between the delivery of voiceand data packets at the receiving end is not deterministic:packets can arrive out of order, have variable latencies(jitter), or even get lost. To devise mechanisms thathelp VoIP systems to keep latency low and to compensatefor jitter, VoIP endpoints use buffers to delay incomingpackets so as to create a steady stream. For voice servicesdelivered by VoIP technologies, the standard answer tothe natural question “how much latency is too much?” isdictated by the ability of humans to adapt the cadenceof a conversation, and is typically quantified as around200–250 ms.

The boundary of 200–250 ms has been further pushedby the third and fourth steps of the communication net-work revolutions. As more powerful wireless access gen-erations succeeded GSM, the 3rd Generation PartnershipProject has designed the third and fourth generationsof wireless cellular networks to meet 150 ms and sub-sequently an order of magnitude lower latency require-ments in the wireless access part [5]. However, these thirdand fourth generations of wireless cellular networks weremostly driven by the need of high data rates and coverage.The requirements of low-latency for mission critical appli-cations, such as telesurgery, virtual reality over networks,autonomous control of vehicles and smart grids, were notthe concern of these generations.

The fifth step in the communication network revolutionshas recently started. It arguably envisions a new formof proximity-aware networking or ubiquitous computing[3], [6]. The vision is founded on the implementation of a

“wireless sense” using low-latency and highly reliable prox-imal communications via device-to-device and short-rangecommunication technologies [7], thereby making the IoTa technological possibility. The “things,” including sensorsand actuators, smart meters, radio-frequency identificationtags and other devices, are interconnected together eitherdirectly or through a gateway node and the global inter-net. In such emerging IoT communication systems, lowlatency and high reliability in communicating messageswill have a large impact on all elements of informationand communication technology, ranging from mobile net-works, consumer broadband, video, wireless sense-basedproximal and cloud services. Ultimately, the fifth step of thecommunication network revolution is expected to ensureend-to-end communication latencies below 1 ms.

There are many use cases demanding below 1-mslow-latency communications. The development of infor-mation and communication technology for healthcare,industrial processes, transport services, or entertainmentapplications generates new business opportunities for net-work operators [8]. Part of this vision is grounded onthe future availability of very low-latency communicationnetworks to build the tactile internet that will extendtouch and skills, and help realize real-time virtual andaugmented reality experience [9]. The transmission ofmultisensorial signals, including the sense of touch (hap-tics), will contribute to the overall experience of real-timeremote interactions. The healthcare industry is experi-menting with remote diagnosis with haptic feedback, whileremote robotic surgery with haptic feedback representsa potential future application of major impact for lowlatency communications. The transport sector is testingdriver assistance applications and self-driving cars thatwill benefit from remote monitoring. The entertainmentindustry expects that immersive entertainment and onlinegaming incorporating augmented reality (AR) will opennew revenue streams, while the manufacturing industryexpects significant productivity increase due to remotecontrol with AR applications. Remote control applicationscan help improve the safety of personnel and reduce thecost of managing the onsite work force for hazardousenvironments such as mines or construction sites.

The use cases mentioned above, from the perspec-tive of the wireless access network design, are expectedto be addressed by the fifth step of the network rev-olutions, which is commonly called the fifth genera-tion (5G). 5G is driven by the vision of the networkedsociety [10], for which two generic communication modesof machine-type communication will be supported [11]:ultrareliable low-latency communication (URLLC) andmassive machine-type communication. URLLC is an innov-ative feature of 5G networks, as it will be used for a rangeof mission critical communication scenarios. Thereby,URLLC is expected to create near-term new business andservice opportunities. Therefore, there is a great interestin technologies that will enable proximal communicationlinks as well as global networks to operate with very

Vol. 107, No. 2, February 2019 | PROCEEDINGS OF THE IEEE 281

Page 3: Low-Latency Networking: Where Latency Lurks and How to Tame Itmodiano/papers/CV_J_108.pdf · 2019-04-12 · Low-Latency Networking: Where Latency Lurks and How to Tame It This article

Jiang et al.: Low-Latency Networking: Where Latency Lurks and How to Tame It

low-latency (below milliseconds) while ensuring high reli-ability. As emphasized in [12], communication networkswith latency below milliseconds will have a direct impacton network and proximal communication technology mon-etization, because latency performance is decisive for anumber of revenue generating services, including highcapacity cloud services, mission-critical machine-type com-munication services and high-resolution video and stream-ing services. Indeed, the latency performance will bethe determining factor between winning and losing busi-ness since meeting latency requirements directly impactslatency-sensitive, high capacity proximal as well as globalinternet services.

Unfortunately, the vision of low-latency communicationnetworks appears as a problem of formidable complexity.Ensuring that audio, visual, and haptic feeds are sent withsufficiently low latency is a challenge, since the end-to-end route may incorporate multiple wireless access, localarea and core network domains. All services delivered overproximal or long-distance communication networks aresubject to latency, which is a function of several factors,such as link sharing, medium-access control (MAC) andnetworking technologies, competing service and trafficdemands, or service-processing algorithms. Within a singlenetwork, there are several components that contribute tolatency, such as at the physical, link, and routing layers.If we try to ensure low latency only at one layer, we mayhave nonnegligible latency components at other layers.Moreover, the optimization at a single layer may haveundesired effects at other layers. To make the problemworse, complexity is not limited to individual networks.The many use cases demanding low latency as describedabove, use end-to-end connections that may be supportednot only over a single network, but often over multidomainnetworks. Here, “domain” refers to a part of the end-to-end communication network that is under the control of anetwork operator in terms of dimensioning and managingcommunication resources. Examples of multidomain net-works include the global PSTN, internetworks consistingof multiple networks owned and operated by multipleinternet service providers, and cellular networks connect-ing cellular subscribers served by different mobile networkoperators. Such a complexity is largely unexplored.

In this paper, we conduct an analysis and classificationof the most prominent design principles and enablingtechnologies that are needed to meet the stringent require-ments of low-latency networks. While it seems probablethat such networks will be initially delivered in the prox-imity of communicating entities [3], [6], we also arguethat there are strong business cases to deliver missioncritical services even over long distances, such as forindustrial remote operations, health care and intelligenttransportation systems. Future low-latency services will beprovided not only in limited geographical areas, but alsoover multidomain networks. Therefore, it will be usefulto analyze the latency and reliability requirements of themost important use cases for low latency and the sources

of end-to-end latency for those cases. There is a need ofsubstantial research, standardization, and development oftechnology enablers that are applicable at the physical,MAC, network, and transport layers in all segments ofcommunication networks, including the access, core, andservice networks. Our analysis will greatly help in layingthe foundation for developing technologies that can realizenetworks supporting services below 1-ms latency. Althoughall of these use cases require low latency, they may posedifferent level of reliability requirement, e.g., the reliabilityrequirement of decentralized environmental notificationmessages for vehicular communications is more relaxedcompared to that of tactile internet. More relaxed relia-bility requirements may render more options of the tech-niques to achieve the same level of latency performance.

The remainder of this paper is structured as follows.Section II examines the latency requirements imposedby latency-critical applications and services. Then, basedon this examination, the causes of latency componentsin single-hop, multihop, and multidomain networks areexamined and formally defined in Section III. Section IVsurveys technology enablers applicable within a singledomain, while Section V discusses interdomain latencyreduction and control techniques. Next, Section VI pro-vides an overview of the most important related stan-dardization activities. Section VII discusses open researchquestions, and Section VIII concludes the paper.

II. L AT E N C Y R E Q U I R E M E N T S O F U S EC A S E S I N S I N G L E H O P, M U LT I H O PA N D M U LT I D O M A I N N E T W O R K S

In this section, we analyze the latency requirements offuture use cases of major societal impact, which will beenabled by the availability of networks capable to offerbelow 1-ms latencies. We argue that some of these usecases require a single-domain network, whereas the restshould be supported by multidomain networks.

The characterization of end-to-end latency and theidentification of latency requirements associated withlatency-critical applications and services are necessarysteps to understand and compare promising technologyenablers. We note that the definition of communicationlatency is not unique, but it depends on the use cases.Due to the stochastic nature of end-to-end latency, latencyrequirements may be typically specified in the form ofstochastic measures, such as the cumulative distributionfunction or its moments [13], [14], and a probability ofexceeding a predefined latency value. For mission-criticaland industrial process control systems, for example, thelatency requirements can specify that a predefined latencyvalue of a few milliseconds should be kept with proba-bility 10−8 [13]. Alternatively, latency requirements mayrequire that the expected value and the variance of thelatency must remain under predefined thresholds [14].The latency requirements of prominent latency-sensitiveapplication scenarios are listed in Table 1. In the following,we give a short description of each.

282 PROCEEDINGS OF THE IEEE | Vol. 107, No. 2, February 2019

Page 4: Low-Latency Networking: Where Latency Lurks and How to Tame Itmodiano/papers/CV_J_108.pdf · 2019-04-12 · Low-Latency Networking: Where Latency Lurks and How to Tame It This article

Jiang et al.: Low-Latency Networking: Where Latency Lurks and How to Tame It

Table 1 Requirement for Low-Latency Applications (Reliability Marked With High Is Due to No Availability of Precise Numbers)

A. Smart Grid

Smart grid is proposed to solve the deficiencies in theold power system such poor controllability to the powergeneration utilities and slow response to the change ofthe power consumption [26]. Besides the power grid,the smart grid also has a control network with monitor-ing, communicating, and computing abilities. This con-trol network should support low-latency communicationsto enable mission-critical applications such as substationautomation and distributed energy resources [27]. Specif-ically, substation automation refers to monitoring, pro-tection, and control functions performed on substationand feeder equipment. The substation automations areextremely latency-sensitive as the time-critical messagewill be irrelevant if not delivered within a specific timein the order of several millisecond. Distributed energyresources are small sources of power generation and/orstorage that are located close to the load they serveand connected to the distribution grid. To integratethe distributed energy resources smoothly, automatic andremote monitoring, control, manipulation, and coordina-tion should be performed in real time. Protection andcontrol traffics are the most latency sensitive in smart gridapplications, with latency requirements of 10 and 100 ms,respectively [22]. IEC 61850 is the international standardthat ensures interoperability between the control networksof smart grids [28].

B. Industrial Automation

The forthcoming Industry 4.0 paradigm is expectedto substantially boost interoperability in manufactur-ing by enabling machines, devices, sensors, and peopleto connect and communicate with one another. Indus-trial manufacturing requires high reliability and stringentlatency guarantees. By supervising the production activ-ities, process automation aims to support more efficientand safe operation of industries such as paper, min-ing, and cement [24], [25]. The latency requirementsare related with the sampling times of different appli-cations and are in the order of 100 ms to 1 s. Factoryautomation includes time-constrained operational applica-tions, such as those used for motion control and certainpower electronics applications, which requires the latencybetween 1 and 10 ms [23]. Ethernet-based solutions are

gaining popularity in industrial communications, due totheir capability of guaranteeing latency. [29]. The wirelesssolutions are also favored in many industrial scenarios withharsh environments.

C. Medical Applications

Telediagnosis and telesurgery are promising trends formedical care. Not constrained by the geographical dis-tances anymore, experienced surgeons will be able to diag-nose or even perform surgeries along with audio–visualand haptic feedback by a robot [16]. Telemonitoring isanother emerging application that enables the experiencedsurgeon at the remote side to watch both the local surgeonto perform the surgery, and the patient’s conditions, andprovide real-time guidance and suggestions. These usecases require, in general, latencies of some millisecondsand at most 2% packet loss rates to realize real-time andreliable audio, visual, and haptic feedback [21].

Medical applications of low-latency networks go waybeyond telesurgery. The exoskeletons are new supportiveprotheses that enable to help aging people move indepen-dently, promote patient rehabilitation following an injury,and allow workers to carry heavier loads. Exoskeletonsuse sensors on the skin to detect voltage change of thesignal at the muscles sent by the brain [19]. Prosthetichand is another artificial device, designed for those thatlost their own hand. Most of the existing prosthetic handscan only perform actions such as bracing and holding,which are control actions that can be performed with somehundreds of milliseconds’ communication latency betweenthe touching and the actuation. A prosthetic hand thatenables the users to feel has been recently reported in [20].The tactile and position sensors in such a prosthetic handcollect and send the data to a control unit that translatesthem into a neural code, which is then applied to thenerves in the user’s arm. Meanwhile, the data are alsoprocessed to interpret the user’s movement intention andsend commands to the prosthetic hand. To provide theuser with safety and a better experience wearing theexoskeletons and prosthetic hands, the latency should beno more than a few milliseconds.

D. Virtual Reality and Augmented Reality

Virtual reality (VR) and AR are revolutionary interfacesthat provide unprecedented experiences and enable new

Vol. 107, No. 2, February 2019 | PROCEEDINGS OF THE IEEE 283

Page 5: Low-Latency Networking: Where Latency Lurks and How to Tame Itmodiano/papers/CV_J_108.pdf · 2019-04-12 · Low-Latency Networking: Where Latency Lurks and How to Tame It This article

Jiang et al.: Low-Latency Networking: Where Latency Lurks and How to Tame It

applications. VR provides an immersive experience for alive concert, sports match or interactive game for usersjust sitting on the couch at home. AR augments reality byenabling better learning and working modes. For example,in a natural history museum, on top of the specimen ofa dinosaur, some vivid 3-D dinosaur projection providesa better conceptual learning environment. The real-timeaudio, visual, and haptic feedback by VR and AR requireslatency from action to reaction below 1 ms to avoidnausea [15]. Within the world of musical instruments andmusic industry, the vision of internet of musical instru-ments has recently been proposed. According to such avision, any musical instrument will be connected in thefuture to internet via wireless communications, providedthat the end-to-end latency is around 5 ms [30]–[32].

E. Intelligent Transport Systems

Vehicular communications refer to the communica-tion among vehicles that can improve driving safety,reduce traffic congestion and traffic accidents, improvefuel consumption efficiency (e.g., by platooning), sup-port high-quality entertainment, and ultimately enabledriver-less cars [33]–[36]. AR can also be used in intel-ligent transport systems to create a bird view of thereal-time traffic information. Vehicular cloud architecturesare also proposed to support applications that requirelarge computation and storage capacities [35]. To supportall these functionalities, communication networks haveto support latencies of only a few milliseconds [16].To ensure safety, a reliability as high as 99.99999% isalso needed [17]. Dedicated short-range communicationsbased on IEEE 802.11p [37] and cellular vehicle-to-everything (C-V2X) communication are promising candi-dates to support vehicle-to-vehicle and vehicle-to-roadsidecommunication [38], [39].

F. Financial Markets

Time equals money in the financial market, and havinglow-latency between the placing of an order and its execu-tion is essential to achieve the transaction at the desiredprice before the price changes. It has been stated thata 1-ms advantage in trading applications can be worth$100 million a year to a major brokerage firm [40]. Therequired latency in the financial trading environment is inthe order of several milliseconds [18]. A high availabilityis also required to support a great number of onlineclients.

We conclude this section by noting that the major usecases mentioned above do not have a common definition ofthe latency requirement. Moreover, in some instances, theyrequire proximal communications, and in other instances,they require remote communication services. In Section III,we will deepen the technical definition of latency and ofdelay components for proximal and remote communica-tion services.

III. C A U S E S A N D D E F I N I T I O N S O FL AT E N C Y C O M P O N E N T S I NS I N G L E-H O P, M U LT I H O P,A N D M U LT I D O M A I N N E T W O R K S

In Section II, we have seen that the use cases of major soci-etal impact enabled by low-latency networks will have bothproximal and remote communication sessions. As latencyis a complex function of several factors, it is useful toexamine the components of end-to-end latency in proximalcommunication scenarios, as well as over long distancesinvolving wired and wireless access networks and multido-main internetworks, as illustrated by Fig. 1, which we willexplain shortly in this section. We will often refer to thisfigure in the rest of the paper, with the purpose of analyz-ing which design principle and which enabling technologyare responsible for the various latency components.

In the generic multidomain scenario depicted in Fig. 1,information messages are transported by several net-works employing diverse technologies. For example, whenstreaming an online video on a cellular phone, the cor-responding service request may first be sent through asingle-hop wireless connection to the serving base sta-tion (BS). Subsequently, it may go through a series ofwired/wireless connections in the backhaul network anda set of intermediate networks to an application server.As another example, communications among hundreds ofsensors and actuators inside a vehicle and between vehi-cles and vulnerable road users and road-side infrastruc-ture equipment are inherently local and mission critical.Although both examples impose latency requirements, it isclear that the requirements imposed by mission-criticalservices can be order(s) of magnitude lower than quasireal-time entertainment services.

We can formally define latency as the time durationbetween the generation of a packet and its correct recep-tion at the destination. This message may go throughseveral networks as shown in Fig. 1, sometimes referredto as domains, which may not be handled by the sameoperator. It is difficult to guarantee a predefined totallatency, as different networks may introduce random laten-cies whose exact values or even statistical distributions aretypically not known a priori. In other words, even thoughthe latency can be measured or estimated within a singlenetwork, meeting end-to-end latency guarantees remainsa challenging task, since no single network operator ornetwork entity has control over the end-to-end latency.

To gain a precise insight, consider Fig. 2, which illus-trates the most important components of latency. Oncea packet is generated, it is typically placed in a queue,and waits for transmission. Analogously, when a packetarrives at the receiver, it may be placed in a queue at alow layer of the protocol stack, waiting to be processedand delivered to higher layers. We define the sum ofthese queuing latency as Tq. To facilitate link sharingbetween multiple traffic classes, modern access and corenetworks provide differential treatments of packets, and

284 PROCEEDINGS OF THE IEEE | Vol. 107, No. 2, February 2019

Page 6: Low-Latency Networking: Where Latency Lurks and How to Tame Itmodiano/papers/CV_J_108.pdf · 2019-04-12 · Low-Latency Networking: Where Latency Lurks and How to Tame It This article

Jiang et al.: Low-Latency Networking: Where Latency Lurks and How to Tame It

Fig. 1. General network architecture. The end devices are connected to base stations or access points, which further communicate with

one another through router switches mostly by wired networks.

implement quality-of-service (QoS)-dependent packet han-dling, including, for example, QoS-aware scheduling andpriority queueing mechanisms. When multiple QoS classeswith different latency priorities are supported, the packetswith higher priority have lower Tq. iSimilarly, the totalprocessing time during the end-to-end communication canbe conveniently characterized by the aggregate processinglatency Tpr. The processing latency Tpr, different from Tq,is a function of physical, link layer and hardware tech-nologies, node processing capacities, and signal processingalgorithms.

Apart from Tq and Tpr, for a single hop, latency incor-porates the time it takes for a packet to get access to the—typically shared—medium (denoted by Ta), including thetime taken by technology-dependent control signaling. Forexample, successfully transmitting and receiving request-to-send and clear-to-send messages of the popular IEEE802.11 protocol family or processing scheduling grant

Fig. 2. The latency formation of single-hop and multihop

communication.

messages of the 3GPP long term evolution (LTE) protocolsuite contribute to the latency over a single hop of anend-to-end path. Once the (wired or wireless) mediumis accessed for the delivery of a packet, we need toaccount for the transmission of that packet, which typicallyincludes a packet header and payload, amounting to atransmission time of Tt = Th + Tp.

When the end-to-end path involves multiple hops—eachof which is carried over dedicated or shared resources—the end-to-end latency is determined by the sum of theassociated medium access (Ta) and packet transmission(Tt) times. In addition, for multihop communications, thetime required for routing packets to the right outgoinginterface (Tr) adds to the end-to-end latency, as illustratedin Fig. 2. The routing (Tr) latency may be zero when thepacket is forwarded by an entity that does not performrouting, which is the case of a wireless relay or other entitythat operates at the MAC layer.

Recall that the main sources of latency variations—in addition to unpredictable interference levels, ran-dom appearance of shadowing objects, and other factorsappearing on the wireless interface—include the randomtraffic load along shared links and media as well asthe variations of the fast fading wireless channels. Theserandom factors make the end-to-end latency notoriouslydifficult to control and predict since the transmission ofa tagged data stream is affected by the fluctuating loadpattern of simultaneously delivered traffic streams [41].In Sections IV and V, different techniques for intranetworksand internetworks as shown in Fig. 3 will be investigated.

Vol. 107, No. 2, February 2019 | PROCEEDINGS OF THE IEEE 285

Page 7: Low-Latency Networking: Where Latency Lurks and How to Tame Itmodiano/papers/CV_J_108.pdf · 2019-04-12 · Low-Latency Networking: Where Latency Lurks and How to Tame It This article

Jiang et al.: Low-Latency Networking: Where Latency Lurks and How to Tame It

Fig. 3. Enabling technologies for low-latency communication, each of them may affect single or multiple delay components.

IV. I N T R A N E T W O R K T E C H N I Q U E SA N D T E C H N O L O G I E S

As illustrated in Fig. 2, the end-to-end latency is anaggregate of the latency components that are associatedwith queueing and processing (Tq + Tpr), medium access(Ta), transmission (Tt = Th + Tp), and routing (Tr).In low-latency networks, techniques at the physical, MAC,network, and transport layers will have to be designedto reduce these components (see Table 2), also in across-layer collaboration. These techniques will have tobe tailored to reduce a specific component or deal witha combination of them.

In this section, we discuss physical, MAC, network, andtransport layer techniques that can be used to reducelatency, and at the same time we take into consideration

constraints related to spectral efficiency, energy efficiency,peak data rate, and capacity. Also, this section discussestechnology enablers that are not specifically designed todeliver latency-critical services, but can be potentially usedto meet latency targets.

A. Physical Layer Techniques

Physical layer (PHY) techniques are fundamental instriking a good engineering tradeoff between latency, reli-ability, spectral and energy efficiency, and communicationrange. To quantify the effects of PHY techniques on latency,recall that, assuming a fixed-length packet, an highertransmission rate implies lower transmission time (Tt).In a single-antenna system, the maximum transmissionrate between a transmitter and its intended receiver is

Table 2 Potential Methods to Achieve Low-Latency in a Single Network

286 PROCEEDINGS OF THE IEEE | Vol. 107, No. 2, February 2019

Page 8: Low-Latency Networking: Where Latency Lurks and How to Tame Itmodiano/papers/CV_J_108.pdf · 2019-04-12 · Low-Latency Networking: Where Latency Lurks and How to Tame It This article

Jiang et al.: Low-Latency Networking: Where Latency Lurks and How to Tame It

upper bounded by

C = B log2(1 + SINR) [bps] (1)

where B is the bandwidth, and the instantaneous signal-to-interference-plus-noise ratio (SINR) is

SINR =|h|2Pt

N0B + PI(2)

where h is the instantaneous channel gain, Pt is thetransmit power, N0 denotes the noise spectral density, andPI is the received interference power.

Adaptive modulation and coding is a powerful techniqueto increase the spectral efficiency or decrease the biterror rate (BER). Generally speaking, for sufficiently highSINR, higher modulation orders together with light cod-ing schemes boosts the transmission rate and reduces Tt.However, when the channel becomes poor or the receivedinterference is high, the transmitter should reduce thetransmission rate (by adopting lower modulation orders orstronger coding schemes) to maintain a target BER [42].

From (1) it is straightforward to see that the bandwidth(B) acts as a multiplication factor in the transmission ratecalculation, thus increasing the bandwidth is an effectiveway to increase the transmission rate and thereby toreduce latency. As B also affects SINR from (2), the capac-ity is not a linear function of the bandwidth. However,we focus on the range where capacity increment withbandwidth is not negligible, i.e., it has not yet saturated.Despite that this is the range in which the reliability is notvery high, there could be an interesting tradeoff amongcapacity, latency, and reliability. Taking millimeter-wave(mmWave) communication as an example, the abundantbandwidth at mmWave frequency bands enables it toachieve multigigabit transmission rates—even with a verylow order of modulation—at the expense of loweringspectral efficiency.

1) Adaptive Modulation and Coding Schemes: The ran-domness of the fading wireless channel, |h| in (2), causesthe SINR at the receiver to fluctuate. However, when thetransmitter is able to acquire channel state information(CSI), it can adapt its transmission parameters, whichhelps to achieve high data rates, which in turn helpsto reduce Tt by adaptively setting the transmit power,modulation scheme, coding rate, or the combinations ofthese parameters. Indeed, as was shown in [43], adaptivetransmission schemes provide higher average transmissionrates compared to nonadaptive transmission schemes. Thisis because in order for nonadaptive schemes to operatewith acceptable BER, they need to be designed for theworst case channel conditions, and therefore operate withlow spectral efficiency even when the channel condition isfavorable. The seminal work in [43] presented optimal andsuboptimal adaptation policies, including the total channel

inversion scheme that adjusts the transmission power tomaintain a constant received power, and truncated channelinversion scheme that only compensates for fading abovea certain cutoff fade depth (see [44] for details). TheCSI feedback of adaptive modulation and coding takesresources and increases the latency. However, the overheadby the CSI feedback may be compensated for by the gainof the average transmission rate. For bidirectional com-munication with a packet exchange period shorter thanthe required CSI feedback interval, the CSI feedback canalways be piggybacked in the packets sent back to thetransmitter, further reducing the CSI overhead.

2) Waveform Design: Once a message (and the corre-sponding data packet) is mapped onto symbols with anappropriate modulation and coding scheme, selecting aproper waveform modulation scheme has a great impacton the transmission time. Indeed, recognizing the impor-tance of waveform selection, the research and standard-ization community spent a large effort on analyzing theadvantages and disadvantages of waveform candidates forthe next generation of wireless systems [45].

Single-carrier (SC) and multicarrier (MC) modulationsare two categories of waveform modulations. With SC,each signal is spread over the whole available bandwidth.The transmission time for each signal is short, but with-out time guard protection, the delay spread in wirelesschannels may cause intersymbol interference (ISI) to sub-sequent symbols. To combat ISI, MC modulation dividesthe available bandwidth into multiple narrow subcarriers,and maps one modulated signal to each subcarrier. Thetransmission time of each signal increases, making thedelay spread relatively short and presenting a higher SINRat the receiver side. However, the overall transmission timeof the payload of MC is comparable to that of SC.

Orthogonal frequency-division multiplexing (OFDM) isa widely used MC modulation technique employed by IEEEand 3GPP systems. It employs orthogonal subcarriers toachieve bandwidth efficiency. A brief comparison betweenOFDM and SC in terms of latency is given below. First, thetransmission rate of each subcarrier in OFDM is smallerthan that provided by SC modulation. However, the aggre-gate transmission rate over all subcarriers is comparable tothat of SC. In OFDM, a cyclic prefix (CP) is used to increaserobustness against ISI in multipath propagation environ-ments. Although the overhead introduced by the CP, whichmust be greater than the maximum delay spread of thechannel, is not negligible, it is still more economical thanSC modulation where one guard time interval is neededfor each symbol [46]. Apart from the high achievablespectral efficiency in multipath propagation environments,OFDM has the flexibility of adaptively assigning differenttransmission powers and symbol constellations to eachsubcarrier, which helps strike a good balance betweentransmission rate and reliability in frequency selectiveenvironments. In terms of processing latency, OFDM leadsto somewhat longer latency than block processing at both

Vol. 107, No. 2, February 2019 | PROCEEDINGS OF THE IEEE 287

Page 9: Low-Latency Networking: Where Latency Lurks and How to Tame Itmodiano/papers/CV_J_108.pdf · 2019-04-12 · Low-Latency Networking: Where Latency Lurks and How to Tame It This article

Jiang et al.: Low-Latency Networking: Where Latency Lurks and How to Tame It

the transmitter and the receiver, but its equalization canbe done efficiently on a subcarrier basis using a simpleone-tap equalizer. A more detailed analysis can be foundin [47] and the references therein.

The orthogonality of the subcarriers is essential forOFDM to avoid intercarrier interference, which requireshigh precision synchronization between the transmitterand the receiver. Thus, traditional OFDM with long syn-chronization is not ideal for meeting stringent latencyrequirements in a spectral efficient manner when trans-mitting short packets. The recently proposed wireless com-munication for high performance aims to support latencyin the order of microseconds for industrial control appli-cations using short packets [48]. Wireless communicationfor high performance reduces the PHY header of OFDMby taking advantage of the predictive and periodic trafficpattern of industrial applications.

The research community has developed new waveformsthat help relax the strict synchronization requirement,e.g., universal filtered multicarrier, filter bank multicarrier,generalized frequency-division multiplexing (GFDM) [45],and filter-OFDM [49]. These waveforms are nonorthogo-nal and thus intercarrier interference must be kept undercontrol by using filters to suppress the out-of-band emis-sion. Among the above asynchronous waveforms, GFDMis regarded as the most suitable for low-latency commu-nication [50], [51]. Compared with OFDM’s division onlyin the frequency dimension, GFDM has a block framestructure composed by M subsymbols and K subcarriers,and each subcarrier is modulated and filtered individually.

For low-latency communications, GFDM acts as a com-parable candidate to OFDM for several reasons. First,using GFDM may achieve lower Th, as it requires lesssynchronization accuracy compared to OFDM. Second, theduration of the GFDM symbol is more flexible than inOFDM, and may achieve lower Tp. To combat ISI, a CPis appended to each OFDM symbol, while nonorthogonalGFDM needs a single CP to protect the information con-tained in M subsymbols. To compare the T ′

ps of OFDM andGFDM, set the number of subcarriers of OFDM N as theproduct of the numbers of subcarriers and of subsymbols(N = M · K), which suggests that the subcarrier spacingof GFDM is M times that of OFDM, and the subsymbolduration of GFDM is 1/M that of OFDM. Given a CPduration, when mapping N bits of data, the T ′

ps of OFDMand GFDM are the same. However, if N +1 bits of data aresent, GFDM can easily be adapted to (M + 1) subsymbolswhile still using a single CP. On the other hand, OFDM canchange the subcarrier size to K resulting in (M +1) OFDMsymbols, each having one CP. This is equivalent to M

extra CP durations compared to GFDM. Third, distortionaccumulation can grow without bounds in OFDM whenorthogonality is not perfectly maintained, so OFDM musthave proportional subcarrier spacing to guarantee orthogo-nality [45]. GFDM inherently deals with nonorthogonality,and can therefore use nonproportional subcarrier spacing,and noncontinuous spectrum aggregation. Thus, GFDM

can also be used to improve the transmission rate. On thenegative side, GFDM requires high transmitter filter orderto achieve sharp filter edge, which increases both complex-ity and processing latency. To summarize, the ideal wave-form should be determined by the design criterion andoperational environment, including the tradeoff betweenreducing the transmission time and increasing complexityand processing latency.

3) Sending Short Packets: It is intuitive to think thatshort packets are needed for short transmission times inlatency sensitive communications. The use of short pack-ets brings a difference to the maximum coding rate andthe packet error probability [52]. Specifically, for a givenpacket error probability and for a finite packet length n,the maximum coding rate reduces by a factor proportionalto 1/

√n [53]. The maximum coding rate for finite packet

length with multiple antennas when considering the trade-off of diversity, multiplexing, and channel estimation isinvestigated in [54]and [55]. The maximum coding rateaffects the minimum packet transmission time Tt, andfurther affects the end-to-end latency.

With traditional long packets, the payload data are muchlonger than the metadata (control information), and thusthe metadata are coded with a low rate to be robust.As the metadata account for only a small fraction of thewhole packet, the overhead caused by the low codingrate can be neglected since it virtually does not affect thelatency performance. With short packets, however, the sizeof metadata Th is comparable to the size of payload dataTp. In this case, the overhead introduced by the metadatacannot be neglected, rendering coding metadata with alow rate inefficient. Therefore, to increase the spectralefficiency, metadata and payload should be coded jointly,the details of which is an open research issue [52].

B. MAC Layer Techniques

The MAC layer is responsible for synchronization, initialaccess, interference management, scheduling, and rateadaptation. While (1) governs the maximum achievablerate, inefficient initial access, queue management, andchannel access strategies may substantially reduce theeffective transmission rate of individual devices.

Define the one-hop access latency for every packet as thetime from the instant the node starts sending that packetfor the first time until the beginning of its successful trans-mission. It includes the processing and queuing latencies,which we do not cover in this paper, the association latency(only in the initial access phase), delay-to-access (latencyuntil the start of the scheduled time slot in contention-freemanner or that before the first transmission attempt in thecontention-based manner), and the retransmission delayin the case of decoding failure. In the following, we reviewmain techniques used to reduce these latency components.As we will see, this reduction comes (usually) at the priceof a higher computational and signaling overheads, andalso a penalty in energy and spectral efficiencies.

288 PROCEEDINGS OF THE IEEE | Vol. 107, No. 2, February 2019

Page 10: Low-Latency Networking: Where Latency Lurks and How to Tame Itmodiano/papers/CV_J_108.pdf · 2019-04-12 · Low-Latency Networking: Where Latency Lurks and How to Tame It This article

Jiang et al.: Low-Latency Networking: Where Latency Lurks and How to Tame It

1) Initial Access and Association: Initial access and asso-ciation are among the most important MAC layers func-tions that specify how a new device should connect tothe network. This is usually handled by a synchroniza-tion process and then a random access phase, by whichthe network registers the device as active. The latencycaused by the association procedure may be tolerablewhen the devices have to be connected all the time (e.g.,mobile phones), the data size is large (e.g., camera sensornetworks), or the handover time is negligible; see alsoSection IV-B3. In many IoT applications with massivenumber of wakeup radios each having just a few bits ofdata, however, the association latency may become muchlonger than the data transmission time. Unfortunately,most of the currently used standards are not capableof supporting a low-latency initial access procedure: theinitial access deadline of 10 ms for 3GPP [56] or 20 ms forITU [57]. That is why the existing standards only consider“connected users” for low-latency services and assume“normal” initial access. Designing more efficient synchro-nization and initial access procedures for low-latency net-works seems to be widely open areas.

2) Delay-to-Access and Retransmission: The MAC layerscheduling is responsible for the delay-to-access compo-nent. A MAC protocol is contention free if messages donot collide during its execution, which is usually guar-anteed by the allocation of orthogonal communicationresources to different devices. The orthogonality of con-tention free can be realized in the time domain [e.g.,time-division multiple access (TDMA)], in the frequencydomain [e.g., frequency-division multiple access (FDMA)],in the code domain [e.g., code-division multiple access(CDMA)], in the spatial domain [e.g., multiuser multiple-input–multiple-output (MU–MIMO)] or any combinationof those domains. In contention-based MAC protocols,devices contend to access the channel, and as a result,some messages are lost due to inevitable strong interfer-ence (also called collision).

Contention-free MAC protocols can guarantee certainlatency and jitter at the price of signaling and computa-tional overheads, which may not be tolerable or efficientin the use cases of low-latency networks with a largenumber of devices and each just has a few bits of data.Consider the example of massive IoT devices with wakeupradios. After being associated to the network, the radiosshould send a channel access request to a local coordinator(e.g., AP or BS), wait for the channel access notification,and follow the instruction to transmit their few bits ofinformation. For the TDMA strategy applied to N devices,the average latency is N/2 × Tt, where Tt is the time slotduration for each device. Clearly, this delay is not scalablewith the number of devices. FDMA, CDMA, and MU–MIMOdo not have this problem but the devices should stillregister their channel access requests and wait for theinstruction (carrier frequencies in FDMA, codes in CDMA,and beamforming vectors in MU–MIMO), which could be

problematic in massive wireless access scenarios [58]; seealso Section IV-E.

In contention-based MAC protocols, there is no con-troller that governs the scheduling. This class of proto-cols have low signaling and computational overheads, butthey impose a random latency for channel establishment.Slotted-ALOHA is a very simple contention-based schedul-ing, in which a device tries to access the channel at the startof the next slot as soon as it gets a new packet [59]. In thecase of collision, it retransmits the packet after a randombackoff whose average increases after every new collisionevent. It has a good delay and throughput performancewhen the offered load (traffic arrived per unit time) islow. The devices do not need to have any signaling toreserve the channel, unlike the contention-free protocols.When the offered load increases, however, the delay andthroughput performance of slotted-ALOHA degrades dra-matically due to continuous backoff caused by repeatedcollisions.

Carrier-sense multiple access (CSMA) protocol intro-duces channel assessment that allows each device to checkwhether the channel is idle before transmission. Thistechnique, together with the random backoff procedure,substantially reduces the collision probability, leading to abetter performance in high-load scenarios. Yang et al. [60]studied the delay distribution of slotted-ALOHA and CSMAand showed good delay performance of both schemesin low traffic regimes. CSMA is known to have hiddenand exposed node problems, which may cause collisionand unnecessary deferred transmission [61]. Carrier-sensemultiple access with collision avoidance uses request tosend and clear to send and clear to send signals to reservethe channel in a distributed fashion so as to reduce colli-sion probability. However, the extra signaling may reducethe throughput by as much as 30% [62] and increase thedelay. This inefficiency becomes more prominent when wego to high data rate technologies like mmWave networkswhere the transmission rate of data signals is 100–1000xhigher than that of control signals. In that case, theoverhead of the collision avoidance mechanisms can dropthe link performance to only 10% [63]. Fig. 4(a) showsthe latency (due to delay-to-access and retransmission)performance of slotted-ALOHA, CSMA, and TDMA for anetwork of single-antenna devices containing 100 trans-mitters. In all scenarios, increasing the input trafficsincreases both network throughput and average delay,almost linearly. Once the network operates close to itscapacity, increasing the input traffic would not improvethe network throughput but substantially increases thedelay. In contention-based algorithms, there is a criticalvalue of the input traffic after which the network becomescongested, and adding more traffic decreases the networkthroughput and increases the delay exponentially. Fromthis figure, slotted-ALOHA and CSMA are preferable forlight traffics, carrier-sense multiple access with collisionavoidance (not shown in this figure) for medium traffics,and TDMA for intensive traffics. Note that the overhead

Vol. 107, No. 2, February 2019 | PROCEEDINGS OF THE IEEE 289

Page 11: Low-Latency Networking: Where Latency Lurks and How to Tame Itmodiano/papers/CV_J_108.pdf · 2019-04-12 · Low-Latency Networking: Where Latency Lurks and How to Tame It This article

Jiang et al.: Low-Latency Networking: Where Latency Lurks and How to Tame It

Fig. 4. Delay versus throughput of slotted-ALOHA, CSMA, and

TDMA. Two numbers of users, 10 and 100, are set for TDMA. The unit

of delay is the packet transmission time. (a) Omnidirectional

antenna. (b) Directional antenna (based on [64]).

of TDMA channel access request is not shown in thisfigure.

When we deal with devices with multiple anten-nas, the traditional tradeoffs and design constraints maychange dramatically [64]–[68]. Directional communica-tion alleviates the hidden and exposed node problemsand reduces the interference footprint. The collisionavoidance mechanism are less important and CSMA andslotted-ALOHA substantially outperform TDMA in terms ofthroughput and delay [64]. The analysis in [64] showsthat directional communication is key to support lowlatency in massive wireless access scenarios. In par-ticular, higher directionality levels reduce the need forcomplicated scheduling approaches with high signalingand computational overheads. Fig. 4(b) shows the delayperformance of TDMA, slotted-ALOHA, and CSMA for a

network of multiple-antenna devices. Now, slotted-ALOHAand CSMA are preferable for almost any traffic, giventhat obtaining beamforming vectors is not time consum-ing. Nitsche et al. [69] use dual radio wherein oneradio is responsible for obtaining directional informa-tion all the time to be fed to the other radio thatis responsible for the data transmission. This techniquedecreases the delay to find proper beamforming vectors.Olfat et al. [70] relax the dual-radio requirement anddevelops a reinforcement-learning approach to design theoptimal beamforming vectors with a minimal number ofpilots. However, the research area of low-overhead beam-forming is widely open and further research is required.

An advanced technique to utilize correlatable symbolsequences in place of the traditional control messages isproposed in 802.11ec [62]. Correlatable symbol sequencesare predefined pseudonoise binary codewords that retainthe statistical properties of a sampled white noise, thecross correlation of which obtains spike value only with amatching copy. These sequences can be correctly decodedat very high transmission rates due to the robustness ofthe receiver. Shokri-Ghadikolaei et al. [63] extend the useof these sequences to the mmWave networks to addressthe imbalance transmission rates of the control and dataplanes, and improve the efficiency of distributed channelreservation for ad hoc mmWave networks. Interestingly,the extended approach can substantially alleviate the needfor retransmission of collided channel reservation packetsas different packets (of different types such as requestto send/clear to send or of the same type transmittedby different devices) can be recognized at the receivereven when they arrive at the same time. This techniqueproves to be suitable for low-latency communications inad hoc networks. Grant-free communication schemes forshort packets are a new approach that is proposed in[71]and [72]. These schemes tolerate collisions by send-ing replica packets at the transmitter side and applyingsuccessive interference cancellation at the receiver side.In [73], Narasimha Swamy et al. propose a frame-basedcontention-free protocol for star networks that exploitsthe cooperative multiuser diversity. Specifically, in eachframe, the devices that have successfully sent their datahelp the remaining unsuccessful devices by relaying theirdata toward the access point (AP). This multiuser diversityapproach is also combined with network coding to furtherimprove the reliability and delay performance [74].

3) Mobility Management: Mobility management andhandover are essential procedures of a mobile network tomaintain a connection. During the handover, the mobileterminal switches the set of its serving APs/BSs based onsome performance measures (e.g., received signal strengthindicator). When a terminal is capable of communicatingto more than one AP at a time, soft handover eliminatesthe need for breaking radio links. For other technologieswhere no such capability is supported, the mobile terminalcan be served in parallel by up to one AP and therefore

290 PROCEEDINGS OF THE IEEE | Vol. 107, No. 2, February 2019

Page 12: Low-Latency Networking: Where Latency Lurks and How to Tame Itmodiano/papers/CV_J_108.pdf · 2019-04-12 · Low-Latency Networking: Where Latency Lurks and How to Tame It This article

Jiang et al.: Low-Latency Networking: Where Latency Lurks and How to Tame It

has to break its communication with its current AP beforeestablishing a connection with a new one. Traditionally,reducing the signaling overhead and algorithmic complex-ity of handover were the primary design concerns of almostall existing commercial systems.

Choi et al. [75]–[77] proposed various algorithms forfast hard handover algorithms in cellular networks. Thegeneral idea of all these schemes is to anticipate handoverevents and perform some operations prior to the breakof the radio link. More precisely, these references focusedon mobile broadband wireless access (in particular IEEE802.16e) and introduced different mobility managementmessages that enable receiving data packets during thehandover process. Li et al. [78] generalized the idea of fasthandover to mobile IPv6 to maintain the connectivity ofany mobile node to the internet with low latency. In thisapproach, previous AP and new AP will coordinate to speedup the reassociation process of the mobile device to thenew AP. Dimopoulou et al. [79] provide a thorough reviewof the existing fast handover strategies for wireless localarea network networks.

A common drawback of these approaches is that theprediction of the handover event is only based on thelocation of the mobile terminal. In a wireless network withdirectional communications, however, even fixed terminalsmay need to execute the handover procedure if an obsta-cle appears on the communication link. When handoverbecomes frequent, e.g., in mmWave networks [80], softhandover strategies seem to be a better option to supportlow-latency services. Semiari et al. [81] considered a densenetwork of dual-mode APs that integrate both mmWaveand microwave frequencies. Each mobile terminal cancache their requested contents by exploiting high capacityof mmWave connectivity whenever available, and conse-quently avoid handover while passing cells with relativelysmall size. Xu et al. [82] introduced a distributed solutionapproach that reserves some portions of the capacity ofeach AP to ensure fast handover in mmWave networks.In particular, the proposed algorithm maintains two listsfor every AP: “active” and “potential” terminals. In thecase of handover event for potential terminals, the old andnew AP coordinate to eliminate the registration latencyand support seamless handover.

4) Energy-Delay-Spectral Efficiency Tradeoffs: There areapplications that impose less stringent latency require-ments than URLLC use cases, but require long life-time. Sensors packaged in certain products and sensorsimplanted in the human body, for example, typically tol-erate a couple of hundreds of milliseconds latency, butrequire a battery lifetime of years or decades. For suchapplications, the energy-delay-spectral efficiency tradeoffarises as an important issue. Modern wireless devicesare equipped with rate-adaptive capabilities which allowthe transmitter to adjust the transmission rate over time.Associated with a rate, there is a corresponding powerexpenditure that is governed by the power-rate function.

Specifically, such a function is a relationship that gives theamount of transmission power that would be required totransmit at a certain rate. For most encoding schemes, therequired power is a convex function of the rate, whichimplies that transmitting data at a low rate and over alonger duration has less energy cost compared to a fastrate transmission. This fact, first observed in [83], inducesa tradeoff between transmission energy and delay. In par-ticular, in [83], the problem of rate adaptation subjectto deadline constraints on packets was studied, and itwas shown that a “lazy” packet scheduling scheme thattransmits at the lowest possible rate to meet the deadlineis energy optimal. This idea was extended in [84] to gen-eral QoS constraints, where packets have different dead-lines, and using techniques from network calculus [85],an energy-optimal rate adaptation policy was devised.Moreover, algorithms for minimum energy transmissionover time-varying channels were developed in [86]. A com-prehensive overview of energy-delay tradeoffs in wirelessnetworks can be found in [87].

C. Routing Layer Techniques

To ensure a desired end-to-end latency in a multihopnetwork, the routing policy plays an important role in end-to-end latency. In this section, we first focus on the routingalgorithms that are proposed for fixed topologies and thendiscuss those developed for dynamic and ad hoc networks.

1) Routing Algorithms Based on Back-Pressure Scheduling:Throughput optimality is at the heart of most existing rout-ing algorithms. Back-pressure (BP) routing algorithm [88]is among the most celebrated ones that is provablythroughput optimal, but has quadratic end-to-end queuingdelay in terms of the number of hops. When implementingthe traditional back-pressure scheduling, each node keepsa separate queue for each flow. Then, in each time slot,each link checks the queues for all the flows passingthrough it, finds the flow with the maximum differentialbacklog, and uses this value as the weight for that link.At last, the transmission scheme with the highest weightunder the interference constraints will be scheduled.

Bui et al. [89] modified the BP algorithm to make theend-to-end delay grow linearly, instead of quadratically,with the number of hops. The price is a minor through-put loss, which might be of secondary importance forlow-latency networks. Their proposed algorithm uses theconcept of shadow queues to allocate service rates to eachflow on each link with a substantially lower number of realqueues.

Another limitation of the BP routing algorithm is thatevery device in the network should be able to applythe BP algorithm. In a real network, however, we maybe able to upgrade just a few routers. To address thisproblem, Jones et al. [90] considered an overlay archi-tecture in which a subset of the nodes act as overlaynodes and perform the BP routing, while the other nodescan perform legacy routing algorithms such as shortest

Vol. 107, No. 2, February 2019 | PROCEEDINGS OF THE IEEE 291

Page 13: Low-Latency Networking: Where Latency Lurks and How to Tame Itmodiano/papers/CV_J_108.pdf · 2019-04-12 · Low-Latency Networking: Where Latency Lurks and How to Tame It This article

Jiang et al.: Low-Latency Networking: Where Latency Lurks and How to Tame It

path [91], [92]. Simulation results show that with a smallportion of overlay nodes, the overlay network can achievethe full throughput, and outperforms the BP algorithm interms of delay. A decentralized overlay routing algorithmwas proposed to satisfy the average end-to-end delay fordelay-sensitive traffic [93].

In a wireless network, the delay performance ofany scheduling policy is largely affected by the mutualinterference, caused by the concurrent transmissions.Gupta et al. [94] derived a lower bound for the averagedelay of a multihop wireless network with a fixed routebetween each source–destination pair. The authors pro-posed a variant of BP to optimize the energy-delay-spectralefficiency tradeoffs and to perform close to the delay lowerbound.

2) Non-Back-Pressure-Based Routing: The BP routingalgorithm and its variants utilize the queue length as themetric for the delay [88]–[90], [94]. To keep the queuesstable and achieve throughput optimality, BP algorithmsprioritize highly loaded queues. When a flow has lighttraffic, this queue length metric can lead to a huge delay asa short queue gets a small chance to be served. To addressthis problem, head-of-line delays can be used to act as thelink weights instead of queue lengths [95], [96]. For amultihop network with a contention-based MAC protocol,Di Marco et al. [97] develop a new metric, called Q-metric,to jointly optimize the routing and MAC layer parameterswith the objective of optimizing the energy-delay tradeoff.

Recently, a new throughput optimal routing andscheduling algorithm, known as universal max-weight(UMW), was proposed in [98]. Unlike BP, UMW usessource routing, where each packet is routed along a short-est path and the cost of each edge is given by the queuebacklog in a corresponding virtual network. Since UMWuses source routing, it can choose a “shortest path” for eachpacket and avoid the looping problem that is inherit to BP.Consequently, UMW results in a significant delay reductionas compared to BP and its variants [98].

The surveyed routing algorithms need a fixed andknown topology a priori. In an ad hoc network, however,the topology may be unknown and dynamic, and the costof topology discovery may not be negligible when weconsider low-latency services. In the following, we reviewsome new classes of routing algorithms that suit ad hocnetworks.

3) Ad Hoc Network Routing: The ad hoc network isrepresentative for scenarios without static structure, likevehicular networks. The seminal work of [99] shows howthe delay scales with the number of devices in the network.For static random networks with n nodes, the optimalthroughput per node scales as T (n) = Θ(1/

√n log n), and

the delay scales as D(n) = Θ(n/√

log n).1 The resultssuggest that there is a tradeoff between the throughput

1Notation: i) f(n) = O(g(n)) means that there exists a constantc and an integer N such that f(n) ≤ cg(n) for n > N . ii) f(n) =Θ(g(n)) means that f(n) = O(g(n)) and g(n) = O(f(n)).

and the delay by varying the transmission power, i.e.,T (n) = Θ(D(n)/n). Higher transmission power increasesthe transmission range and can decrease the delay, dueto the possibly reduced number of hops. However, it alsoincreases the interference, leading to a drop in thethroughput performance. Yi et al. [100] extended theseresults to the case of directional antennas and showed that,under ideal beamforming assumptions, the throughputgain scales with θ−1

t θ−1r where θt and θr are the antenna

beamwidths of the transmitter and the receiver, receptively.Grossglauser and Tse [101] showed that mobility improvestransmission range and achievable throughput at the priceof higher delay. Specifically, the achievable throughputcan be improved to T (n) = Θ(1), and delay scales asD(n) = Θ(n log n). Note that these works do not considerthe overhead of channel estimation, nor signaling andcomputational overheads, so that the actual delay can belarger.

The delay and capacity tradeoff for multicast routingwith mobility in ad hoc networks is studied in [102]. Theresults are based on two-hop relay algorithms without andwith redundancy (redundancy is used to denote trans-mitting redundant packets through multiple paths withdifferent relay nodes). When a packet is sent to k desti-nations, the fundamental tradeoff ratio is D(n)/T (n) ≥O(n log k). The delay of the two-hop relay algorithm withredundancy is less than that without redundancy, but willcause decrease to the capacity. More recently, in [103], thescaling of capacity and delay for broadcast transmissionin a highly mobile cell partitioned network was studied.Using an independent mobility model as in [104], it wasshown that, in a dense wireless network (number of nodesper cell increases with n), the broadcast capacity scales as1/n, while the delay scales as log log n. Surprisingly, it isalso shown that both throughput and delay have worsescaling when the network is sparse.

SPEED [105] proposed a routing algorithm to supportreal-time communications via providing per-hop delayguarantees and bounding the number of hops from sourceto destination. SPEED is extended in [106] to satisfymultiple QoS levels measured in terms of reliability andlatency. To this end, this approach employees a localizedgeographic packet forwarding augmented with dynamiccompensation for local decision inaccuracies as a packettravels toward its destination. Therefore, there is no needfor any global information or a centralized entity to designthe routing, making this algorithm suitable for large anddynamic wireless sensor networks.

Zhang et al. [107] propose a variation of the cele-brated proactive distance vector routing algorithm [108]to guarantee end-to-end latency. In particular, each nodeestablishes and maintains routing tables containing thedistance (number of hops) and next hop along the short-est path (measured by the minimal number of hops) toevery destination. To support a delay-sensitive service, thetransmitter probes the destination along the shortest pathto test its suitability. If this path meets the delay constraint,

292 PROCEEDINGS OF THE IEEE | Vol. 107, No. 2, February 2019

Page 14: Low-Latency Networking: Where Latency Lurks and How to Tame Itmodiano/papers/CV_J_108.pdf · 2019-04-12 · Low-Latency Networking: Where Latency Lurks and How to Tame It This article

Jiang et al.: Low-Latency Networking: Where Latency Lurks and How to Tame It

the destination returns an ACK packet to the source, whichreserves link-layer resources along the path. Otherwise,the destination initiates a flood search by broadcastinga route-request packet. The flooding is controlled by thedelay constraint, namely an intermediate node forwardsthe route-request packet only if the total delay to thedestination is less than the threshold. When a copy ofthis packet reaches the source with a path that meets thedelay constraint, the route discovery process is complete.Lin [109] proposed a similar idea for mobile networkswhere the latency requirement will be met by reservingsufficient link-layer resources along the shortest path.

There are several survey papers on the QoS routing pro-tocols for general ad hoc networks [110] and for vehicularnetworks [111], [112] where the topology is dynamic.

D. Transport Layer Techniques

The transport layer is responsible for the flow andcongestion controls, and affects the queueing latency Tq.In general, the communication ends may belong to differ-ent network operators. However, when they both belong tothe same network, we can optimize the transport layer forlow-latency services. Transmission control protocol (TCP)and user datagram protocol are the two main protocolsin this layer. TCP provides reliable end-to-end communi-cations, independent of the underlying PHY,2 while userdatagram protocol does not have such a reliability guaran-tee [113]. When observing a packet loss at the receiver,TCP assumes that the underlying network is congestedand intermediate queues are dropping packets. Therefore,it reduces the transmission rate at the transmitter to asmall baseline rate to alleviate the congestion in a distrib-uted fashion, of course at the expense of a higher end-to-end delay. This congestion control approach is problematicin the presence of some faulty PHYs (e.g., wireless links) inthe end-to-end connection, because by every packet loss,which may happen frequently, TCP drops the transmissionrate dramatically. Moreover, the measure of TCP to detectcongestion cannot help early congestion avoidance andleads to the well-known “full buffer problem” [114]. In thelast decade, there have been tremendous efforts to addressthese problems with active queue management, and manyof them are surveyed in [115].

To reduce the end-to-end delay, the existing solutionsessentially change the congestion indicators, feedbacktypes, and control functions at every intermediate node.For instance, controlled delay (CoDel) [116] changes thecongestion indicator to a target queue latency experiencedby packets in an interval, which is on the order of a worstcase round trip time. CoDel starts dropping packets whenthe expected queue latency exceeds the threshold. Theauthors showed that CoDel absorbs packet bursts in a short

2Note that packets may traverse several networks and thereforedifferent PHY technologies. For example, to watch a YouTube videoon our mobile phone, the packets will be transmitted over some fibers(YouTube server to our serving BS/AP) and also wireless links (BS/APto our phone).

term manner, while keeping buffers far from fully occupiedin the long term, to guarantee low-latency performance.

To reduce the latency in a mesh network,Alizadeh et al. [117] propose the high-bandwidthultralow-latency (HULL) architecture. The main idea ofHULL is similar to that of CoDel, i.e., keeping the buffersof intermediate nodes largely unoccupied. HULL tradesbandwidth for buffer space, as low-latency packets requireessentially no buffering in the network. HULL uses acounter, called a phantom queue in [117], to simulatequeue buildup for a virtual egress link with a slower ratethan the actual physical link (e.g., 95% of the link rate).The counter is put in series with the switch egress port,and is incremented by every new received packet, anddecremented according to the virtual drain rate (e.g., 95%of the link rate). When the counter exceeds the threshold,it will send explicit congestion notifications to adjustthe contention window size adaptively. Essentially, HULLsends early congestion signals before the saturation ofthe queues and reserves some portion of the link capacityfor latency-sensitive traffics to avoid buffering, and theassociated large delays.

E. Transmission Capacity Boosting andSharing Techniques

In the following, we review some recent techniques thatwere originally proposed for boosting the transmissioncapacity, but have the potential to reduce the end-to-endlatency.

1) Mu-MIMO: MU-MIMO and beamforming are essen-tial elements of almost all modern wireless systemsincluding LTE-A and IEEE 802.11ac. Notably, these tech-niques can help steer the radiated/received energy beamstoward the intended locations while minimizing interfer-ence, thereby improving the capacity region of the system[118]–[120]. We can now serve multiple users in the sametime-frequency channel, which can substantially reducethe delay-to-access component of the MAC layer latency.However, MU-MIMO requires the knowledge of CSI at thetransmitter and at the receiver.

As investigated in [121] and [122], increasing the num-ber of antennas simplifies the design of beamforming.In the asymptotic regime where the number of trans-mit antennas goes to infinity, very simple beamformingschemes, like matched filters [123], become optimal, as thewireless channels among the transmitter (e.g., AP or BS)and different devices become quasi-orthogonal [122].However, the price of increasing the number of antennasis a higher CSI acquisition delay, which may limit theapplicability of this technique for low-latency services.This problem is exacerbated in mmWave communications,as we discuss in Section IV-E2. Beamforming design formultiple-input–multiple-output networks has a very richliterature with focus on spectral efficiency [124]–[126],energy efficiency [125], [127], [128], and interferencecancellation [129]–[131], among others. However, sur-

Vol. 107, No. 2, February 2019 | PROCEEDINGS OF THE IEEE 293

Page 15: Low-Latency Networking: Where Latency Lurks and How to Tame Itmodiano/papers/CV_J_108.pdf · 2019-04-12 · Low-Latency Networking: Where Latency Lurks and How to Tame It This article

Jiang et al.: Low-Latency Networking: Where Latency Lurks and How to Tame It

prisingly, designing beamforming for low-latency MIMOnetworks is a largely open problem.

2) Millimeter Wave: mmWave systems operate on a largebandwidth and employ large antenna arrays to supportextremely high-data rate services, including the 8 Gb/speak data rate of IEEE 802.11ad and 100 Gb/s ofIEEE 802.11ay3 for a single link [132]. In a multiusermmWave network, the use of large antenna arrays dras-tically reduces the interference footprint and boosts thethroughput, as shown in [64], [67], and [133]–[138]. As aresult, mmWave networks experience almost negligibletransmission latency, but they may suffer from delay-to-access which includes pilot transmission and beamformingdesign. [139] overviews existing approaches for beam-forming design in mmWave networks. Most of the existingapproaches are based on some iterations (or, equivalently,a huge number of pilot signals) among transmitters andreceivers, which could be very time consuming giventhe low transmission rate of the control signals [63].To reduce the beamforming setup delay, Zhang et al. [140]and Zhao et al. [141] augmented the beamforming partby a tracking algorithm based on extended Kalman fil-ters, which track the second-order statistics of the chan-nel. However, those approaches need a mobility model.Olfat et al. [70] alleviate that assumption by proposinga model-free (data-driven) approach based on reinforce-ment learning to drastically reduce the number of pilots.Still, the area of low-overhead beamforming design forlarge antenna arrays is in its infancy and needs furtherresearch.

Ford et al. [142] studied the feasibility of supportinglow-latency services in mmWave cellular networks, partic-ularly 20 Gb/s data rate and 1-ms delay as specified byIMT 2020 [143]. The authors focused on beamformingaspects, MAC layer, congestion control at the transportlayer, and core network architecture,4 and proposed a setof solution approaches to meet the delay and throughputrequirements. In particular, the authors concluded thatdigital beamforming using low resolution analog-to-digital(A/D) converters is a good choice for reducing beamform-ing delay and control channel overhead. Moreover, a flex-ible transmission time interval exhibits a better latencyperformance at the MAC layer than the conventional fixedtransmission time interval.

3) Full Duplex: Full-duplex techniques enable transmis-sion and reception at the same time-frequency resources,leading to a substantial improvement in the transmis-sion capacity and therefore a reduction in the end-to-endlatency [144]. The challenge is the strong self-interference,which can be alleviated by passive suppression (mainlyrelated to antenna design) and active suppression (signal

3Detailed information about this project can be found athttp://www.ieee802.org/11/Reports/ng60_update.htm.

4Due to the strong connection to the internetwork latency com-ponents, we discuss low-latency core network architectures and edgecomputing in the next section.

processing and beamforming). The latter increases theprocessing delay.

The low-latency merits of full duplex are in threedomains. First, it decreases the transmission time (Tt) byincreasing the spectral efficiency. Second, it decreases theMAC layer latency, as it reduces the potential contention,especially for the star topology where the central coordi-nator can send a message to one node while receiving anuplink message from another one at the same time. Last,it can decrease the routing latency (Tr). With full duplex,the route selection algorithms may activate adjacent hopssimultaneously where an intermediate (i.e., relay) nodeoperates in both downlink and uplink directions, which cansubstantially reduce the routing delay.

To harvest the gains of full duplex, good cancellationtechniques as well as a good MAC layer design are nec-essary and other research problems need to be addressed.The interested readers can refer to [145] and the refer-ences therein.

4) Device-to-Device (D2D): D2D communication enablesdirect communication between devices without goingthrough the core of a cellular network [146], [147]. D2Dcommunication is a promising solution for the increasingnumber of connected devices and data rate. It helps toreduce the latency in two perspectives. First, D2D enablesthe devices to communicate with each other in singlehop or fewer hops than communication via a BS, thusreducing the routing delay Tr [148]. Second, the localtraffic is separated from the global network (local trafficoffloading). Thus, the D2D mode reduces medium-accessdelay Ta, as the devices in D2D mode retrieving from thelocal source devices are fewer than those communicatingwith BS, and will access the channel faster. Non-D2D modedevices retrieving contents from the core network alsobenefits with lower Ta, because D2D mode offloads partof the contending devices.

If the D2D mode shares the same resources with thecore network communication, resource allocation shouldbe used to mitigate interference and guarantee low latencyto both D2D and non-D2D users. The integration ofmmWave and D2D can substantially alleviate the resourceallocation problem [149], thanks to the small interferencefootprint of mmWave networks [64], [67], [133]–[138].Niu et al. [150] showed that we can add many D2D linksto an mmWave cellular network so as to substantiallyboost the network capacity. The authors have also devel-oped a simple scheduling scheme to activate concurrenttransmissions to support low-latency content download-ing. The main challenges are the device discovery and theexchange of control signals, which are usually much harderin mmWave networks [80].

5) Cloud RAN and Mobile Fronthaul: A radio BS con-sists of a baseband unit (BBU) and a radio-frequencyunit. The concept of cloud radio access network con-sists in breaking the fixed topology between BBUs andremote radio heads (RRHs), and to form a virtual BBU

294 PROCEEDINGS OF THE IEEE | Vol. 107, No. 2, February 2019

Page 16: Low-Latency Networking: Where Latency Lurks and How to Tame Itmodiano/papers/CV_J_108.pdf · 2019-04-12 · Low-Latency Networking: Where Latency Lurks and How to Tame It This article

Jiang et al.: Low-Latency Networking: Where Latency Lurks and How to Tame It

pool for centralized control and processing [151]. Cloudradio access network supports intercell communicationand joint processing, which can reduce the routing latency(Tr) as well as the processing latency (Tpr). Mobilefronthaul is a novel optical access method that con-nects a centralized BBU to a number of RRHs with fiberlinks in mobile networks [152]. A SDN-controlled opticaltopology-reconfigurable fronthaul architecture is proposedfor 5G mobile networks [153]. The SDN-controlled fron-thaul architecture is responsible for the dynamic config-uration of the BBUs and RRHs connections to supportcoordinated multipoint and low-latency intercell D2D con-nectivity. The experimental results show that with 10-Gb/speak data rate, submillisecond end-to-end D2D connectiv-ity is achievable [153].

In [154], a combination of fiber and mmWave is pro-posed for efficient fronthauling to lower the cost andsupport mobility in small cells and moving cells. mmWaveis used for transmission between the large number of RRHsor moving RRHs and a remote antenna unit, and thefiber optic is used for the connection between the remoteantenna unit and BBU pool. To reduce the latency causedby the conversion between optic signal and mmWave sig-nal, analog waveform transmission is used for eliminatingthe digital processing, and a unitraveling carrier photodi-ode optical-to-electrical converter is used to provide fastconversion.

F. Cross-Layer Techniques

1) Forward Error Correction and Hybrid Automatic RepeatRequest Techniques: Automatic repeat request is a sim-ple error-control method for data transmission that usesacknowledgements and timeouts to achieve reliable datatransmission over an unreliable PHY [155]. Acknowledg-ments are short messages sent by the receiver indicat-ing that it has correctly received a packet, and timeoutis a predefined latency allowed to receive an acknowl-edgment. If the sender does not receive an acknowl-edgment before the timeout, it usually retransmits theframe/packet until the sender receives an acknowledg-ment or exceeds a predefined number of retransmis-sions. Block acknowledgement—initially defined in IEEE802.11e—is a simple way to reduce the MAC layer latency,especially for high-throughput devices, by sending oneacknowledge packet for multiple data packets [156].

Hybrid automatic repeat request combines high-rate for-ward error-correcting coding and automatic repeat requesterror control to flexibly perform retransmission of incre-mental redundancy or a complete new retransmissionaccording to different channel states. If the received packetcannot be decoded correctly due to high noise, incremen-tal redundancy can be retransmitted for joint decodingtogether with the previously received packets. If stronginterference is the reason, a new start of transmission isneeded. There is a tradeoff between the transmission timeTt and medium-access delay Ta. With lower coding rate,

Tt gets longer, while Ta is shorter as the message can bereceived successfully with a higher probability and possiblyfewer retransmissions.

2) MAC-Aware Routing Algorithms: The delay and relia-bility performance interaction between the MAC and rout-ing of the protocol IEEE 802.15.4 is analytically studiedin [97]. Therein, it has been shown that the MAC para-meters will influence the performance of different routingpaths, and, in turn, the traffic distribution determined bythe routing will also affect the MAC parameters. For a giventopology, the MAC parameters (affecting Ta) can be tunedto satisfy a certain reliability and latency requirement byusing the Q-metric proposed in [97]. The Q-metric mea-sures the contention level without measuring the queues,and adapt the routing patterns (affecting Tr). While theback-pressure is proved to be throughput optimal, it isefficient only when the forwarded traffic is high, and itcannot capture the contention of low traffic. So Q-metric ismore efficient in latency-sensitive wireless sensor networksand other cases where the traffic is low.

A TDMA-based MAC for wireless sensor networks thathave latency requirements is delay guaranteed routingand MAC [157]. However, unlike traditional TDMA MACsthat require a separate routing mechanism, the routesof delay guaranteed routing and MAC as well as themedium-access slot schedule are determined and fixedaccording to the position of each node. The algorithm isTDMA based and no retransmissions are permitted. As longas the transmission interval of two successive packets ofeach node is larger than the TDMA superframe duration,a deterministic upper delay bound and minimal packet losscan be guaranteed.

G. Network Function Virtualization andSoftware-Defined Networking

Network function virtualization (NFV) and software-defined networking (SDN) cannot reduce the latency bythemselves alone, but they are the premise of some algo-rithms that help achieve low latency, so we also study themtogether as an enabling technique.

NFV can virtualize the network node functions intobuilding blocks that may connect, or chain together, to cre-ate communication services. With the necessary hardwaresupport, the system can change flexibly by employingdifferent NFV blocks. For instance, different asynchronouswaveforms can be generated by combining different filtersand modulation schemes [158], which affects the trans-mission time (Tt).

Though NFV can act alone, it is generally workingtogether with SDN. SDN decouples the control plane anddata plane, promoting centralized network control and theability to program the network [159]. SDN can be used toimplement and manage the NFV infrastructure by combin-ing and tuning parameters of multiple NFV blocks, suchas to enable flexible configuration of the virtual networkslices for different latency priorities [142]. By tuning

Vol. 107, No. 2, February 2019 | PROCEEDINGS OF THE IEEE 295

Page 17: Low-Latency Networking: Where Latency Lurks and How to Tame Itmodiano/papers/CV_J_108.pdf · 2019-04-12 · Low-Latency Networking: Where Latency Lurks and How to Tame It This article

Jiang et al.: Low-Latency Networking: Where Latency Lurks and How to Tame It

the parameters of different NFV blocks, which furthervirtualize different techniques and affects different latencycomponents, SDN and NFV may have an ample effecton various latency components. On the other hand, SDNwill introduce overheads from the control plane and flowsetup latency as it is flow based. Software-based NFV alsotends to increase the processing latency (Tpr) comparedto the pure hardware operation. The effect of these onthe delay performance should be carefully controlled andminimized.

The ETSI NFV architecture [160] acts as a referencestandard, whose performance and security have been wellstudied [161], [162]. Siddiqui et al. [163] propose a 5Garchitecture for low-latency and secure applications basedon ETSI NFV architecture. To achieve low latency withshared resources, the architecture employs the scalabilityenabled by SDN and NFV to perform on-demand cachingand switching, which can guarantee low medium-accesslatency (Ta), routing latency (Tr), and queuing latency(Tq) by flexibly allocating different resources based ondifferent traffic load and requirements including latency.Moreover, it also proposes to use a smart network inter-face card with NFV acceleration capability to mitigate theprocessing latency (Tpr) caused by NFV. A backbone net-work that provides high-performance connectivity in Japanis equipped with the capabilities of NFV and SDN [164].The NFV orchestrator creates virtual network appliances,such as virtual routers, virtual firewalls, and virtual loadbalancers. SDN enables users to establish connections withspecified bandwidth and demand, which flexibly scale upor down the virtual network appliances. In this way, SDNaffects all the latency components whose parameters hasbeen dynamically tuned, such as transmission time (Tt),medium-access latency (Ta), and routing latency (Tr).The latency is reduced by 20% compared to the previousversion network, and the autohealing time of a virtualnetwork functions also drops to 30 s from 6 min from theprevious version.

V. I N T E R N E T W O R K T E C H N I Q U E SA N D T E C H N O L O G I E S

Many modern end-to-end services require packets to tra-verse multiple networks, usually handled by different oper-ators. Unfortunately, in most practical scenarios, the exactlatencies of the intermediate networks are not known,and the service provider may have access only to thedelay probability distributions. As a result, controlling thelatency in a multidomain scenario is significantly morecomplicated than managing latency in single-domain sce-narios.

In this section, we address techniques that can be usedto control internetwork (also called multidomain) latency.We first review some multidomain routing algorithms thatguarantee end-to-end latency, though they may not meetthe tight requirements of low-latency services. We thenhighlight the importance of content placement and edge

caching to facilitate multidomain routing and substantiallyreduce latency.

A. Interdomain Routing

In Section IV-C, we have discussed various routing tech-niques that can reduce the latency within a domain. Bor-der gateway protocol (BGP) is a celebrated interdomainrouting protocol used throughout the internet to exchangerouting information among different domains [165].In BGP, the edge routers frequently send path vector mes-sages to advertise the reachability of networks. Each edgerouter that receives a path vector update message shouldverify the advertised path according to the policy of itsdomain. If it complies with its policy, the router modifies itsrouting table, adds itself to the path vector message, andsends the new message to the neighbor domain. As a result,edge routers of every domain maintain only one path perdestination.

Although BGP has evolved for many years, its cur-rent implementation is based on the number of hops.Arins [166] proposes a method to measure the latency,share it with the edge routers of the neighboring domains,and modify the BGP routing decisions throughout theentire network. Egilmez et al. [167] and Lin et al. [168]generalize this idea to incorporate a logical SDN in amultidomain network. Egilmez et al. [167] formulated aconstrained shortest path problem to minimize a convexcombination of packet loss and jitter subject to an end-to-end latency constraint. Paolucci et al. [169] demonstrateda multidomain SDN orchestrator to select the shortestroutes based on end-to-end latency, where the delay sta-tistics are captured by the proposed segment routing mon-itoring system.

Lui et al. [170] extend BGP to allow for path-diversityin the sense that multiple paths are advertised by any edgerouter, multiple QoS metrics including latency are prop-agated in the BGP update message, and multiple routesfor any source–destination pair are selected. The authorsshowed that path diversity substantially improves loadbalancing throughout the multidomain network and canreduce the end-to-end latency. Singh and Modiano [171]developed a dynamic routing policy for delay-sensitivetraffics in an overlay network. In particular, the authorsreplaced the average end-to-end latency requirement byan upper bound on the average queue length of everyflow on every link. Then, considering an underlying legacynetwork whose latency characteristics are unknown to theoverlay network, they formulated a constrained Markovdecision process that keeps the average queue lengthsbounded, and proposed a distributed algorithm to solvethat problem. Although this paper targets a single domain,the main idea can be extended to the case of multidomainnetworks. Classical stochastic shortest path [172] and itsonline variation [173] are highly relevant to the problem ofrouting design in multidomain networks when only imper-fect knowledge of latency within each domain is available.

296 PROCEEDINGS OF THE IEEE | Vol. 107, No. 2, February 2019

Page 18: Low-Latency Networking: Where Latency Lurks and How to Tame Itmodiano/papers/CV_J_108.pdf · 2019-04-12 · Low-Latency Networking: Where Latency Lurks and How to Tame It This article

Jiang et al.: Low-Latency Networking: Where Latency Lurks and How to Tame It

Multidomain routing design with limited domain-specificinformation is a very interesting and wide open area forfuture research.

B. Edge Caching and Content Placement

Define the feasible latency region as the set of all pos-sible latency values that can be achieved by some routingpolicy. In Section V-A, we observed that optimizing inter-domain routing reduces the end-to-end latency. However,when the content is located far from the terminal, thefeasible latency region may not include the desired valueof the low-latency service; namely, there is no routingalgorithm that can lead to the target latency value. As asimple example, consider a line network of ten domains(including source and destination ones), and assume thateach adds at least 10-ms latency. Therefore, the roundtriplatency is lower bounded by 200 ms, which may be waybeyond the tolerable latency. Caching popular contents atthe edge routers of local domains is a promising approachto bring the contents closer to the devices so as to improvethe feasible latency region.

Edge caching brings the content closer to the end termi-nals, which substantially reduces the interdomain routingdelay. From the local server perspective, fewer terminals(only local ones) contend to access that content, whichimproves the service rate to those terminals. Moreover,this technique offloads parts of the traffics of the mainserver, freeing the capacity for other terminals and furtherreducing the latency. Altogether, edge caching is especiallybeneficial for services with stringent latency requirements.

The design of efficient caching strategies involves abroad range of problems, such as accurate predictionof demands, intelligent content placement, and optimaldimensioning of caches. Moreover, as the caches arephysically scattered across the network and the userrequests are generated almost everywhere in the net-work, caching policy favors low-complexity distributedalgorithms. Borst et al. [174] propose a lightweight coop-erative content placement algorithm that maximizes thetraffic volume served from caches and thus minimizes thebandwidth cost. Maddah-Ali and Niesen [175], [176] focuson the problem of caching within a network where popularcontent are pre-fetched into the end-user memories tobypass a shared link to the server and showed that thereis a tradeoff among the local cache size (i.e., the mem-ory available at each individual user), aggregated globalcache size (i.e., the cumulative memory available at allusers), and the achievable rate (and latency) of individualterminals. The authors develop a simple coded cachingscheme in [175] and its decentralized variant in [176]that substantially improves the memory-rate tradeoff. Theanalysis and proposed algorithms, however, are limitedto the single-domain single-server case. Doan et al. pro-pose a novel popularity predicting-caching procedure forbackhaul offloading in cellular network, and an optimalcache placement policy to minimize the backhaul load

is computed by taking both published and unpublishedvideos as input [177].

Tan and Massoulié [178] focus on a video-on-demandapplication over a network with two modes of operations:peer-to-peer and data center. In particular, video requestsare first submitted to the peer-to-peer system; if they areaccepted, uplink bandwidth is used to serve them at thevideo streaming rate (potentially via parallel substreamsfrom different peers). They are rejected if their acceptancewould require the disruption of an ongoing request service.Rejected requests are then handled by the data center.The authors developed a probabilistic content cachingstrategy that enables downloaders to maximally use thepeers’ uplink bandwidth, and hence maximally offloadthe servers in the data centers. Golrezaei et al. [179]extended the model of [178] to the scenario of video-on-demand streaming to mobile terminals from internet-basedservers and proposed a distributed caching network toreduce the download latency. In this setting, local cacheswith low-rate backhaul but high storage capacity storepopular video files. If the file is not available in the localcache, it will be transmitted by the cellular network. Theauthors analyzed the optimal assignment of files to thecaches in order to minimize the expected downloadingtime for files. They showed that caching the coded datacan substantially improve the performance in terms of bothcomputational complexity and aggregated storage require-ment. Bastug et al. [180] proposed a caching strategy thatpredicts the demand pattern by the users and caches them,in a proactive manner, in local BS during the off-peakhours. When the users actually make the demands, thecontents can be retrieved with high probability directlyfrom the cache instead of waiting for the backhaul networkand interdomain routing latency.

Bhattacharjee et al. [181] proposed a self-organizingcache scheme in which every router of the network main-tains a small cache and applies an active caching manage-ment strategy to organize the cache contents. Li et al. [182]focused on the optimal placement of M web proxies in N

potential sites with the objective of minimizing the over-all latency of searching a target web server for a givennetwork topology. The authors formulated this problem asa dynamic program, and obtained the optimal solution inpolynomial time.

C. Fog Computing

With IoT, billions of previously unconnected devicesare generating more than two exabytes of data each day,and 50 billion devices are estimated to connected to theinternet [183]. Such large amount of data cannot beprocessed fast enough by the cloud. Fog computing isproposed to extend cloud RAN further to the edge, suchthat any device with computing, storage, and networkconnectivity can be a fog node [184], [185]. The cloudand fog nodes merge into a new entity, referred to ascloud+fog [186]. In a cloud+fog architecture, critical IoT

Vol. 107, No. 2, February 2019 | PROCEEDINGS OF THE IEEE 297

Page 19: Low-Latency Networking: Where Latency Lurks and How to Tame Itmodiano/papers/CV_J_108.pdf · 2019-04-12 · Low-Latency Networking: Where Latency Lurks and How to Tame It This article

Jiang et al.: Low-Latency Networking: Where Latency Lurks and How to Tame It

data with stringent latency requirement can be processedat the closest fog node to minimize latency, while lessdelay-sensitive data can be passed to the aggregation nodeor the cloud. Fog computing also offloads gigabytes ofnetwork traffic from the core network. Similar to edgecaching, fog computing reduces the routing latency andchannel establishment latency to both data processed bythe fog nodes and by the core network.

Two main challenges in realizing fog computing’s fullpotential are to balance load distribution between fogand cloud, and to integrate heterogeneous devices into acommon computing platform [187]. To evaluate resourcemanagement and scheduling policies across fog and cloudresources, an open source simulator called iFogSim isdeveloped, which can model and simulate the perfor-mance in terms of latency, energy consumption, networkcongestion, and operational costs [188]. An architectureof smart gateway with fog computing is presented in[189], where the smart gateway can collect, preprocess,filter, reconstruct, and only upload necessary data to thecloud. To handle heterogeneous data collected from het-erogeneous devices, transcoding and interoperability areeither achieved by equipping the smart gateway with moreintelligence or through the fog computing resources.

VI. S TA N D A R D S F O R L O W-L AT E N C Y A N DU LT R A R E L I A B L E C O M M U N I C AT I O N S

In this section, we analyze how intranetworks and inter-network can be supported by current and emerging com-munication standards. We will review the low-latencycharacteristics of standards in cellular networks, industrialcommunication, and WLAN group. One or a combinationof these standards are used for the uses cases discussed inSection II.

A. The 3GPP New Radio and 5G Initiative

5G of mobile communication aims to support 1-mslatency, 10-Gb/s peak speed, and, at a global level,100 billion connections. 5G has three main classes of usecases: URLLC, enhanced mobile broadband, and massivemachine type communication. URLLC has the most strin-gent requirement for very low latency and high reliability,which suits applications such as factory automation, smartgrid, and intelligent transportation. With a vision to pro-vide a unified infrastructure for different use cases, 5G willinclude both the evolution of 4G and a new radio accesstechnology.

5G is designed to operate in a wide range of spectrumincluding frequency bands below 1 GHz up to 100 GHz.The wide bandwidth at higher frequencies includingmmWave band can effectively boost the transmissionrate and reduce the transmission time (Tt). Moreover,flexible deployment of more micro/pico sites at traf-fic dense spots eases the medium-access pressure, andreduces medium-access time (Ta). At the PHY, 5G supports

various modulations from quadrature phase-shift keying(QPSK), 16 quadrature amplitude modulation (16 QAM)to 1024 QAM to support different transmission rates andtransmission time (Tt) for different use cases. The cur-rent candidate waveform is OFDM with scalable numerol-ogy and adjustable subcarrier spacing, CP duration andOFDM symbol duration, which supports adjustable trans-mission time (Tt). A detailed assessment of OFDM in 5Gcan be found in [190]. MAC scheduling in 5G followsthe time-slotted framework of 4G, and the transmissioncan only start at the beginning of the scheduled slot.To improve the access efficiency of short packets forURLLC, the time slot in 4G can be divided into multipleminislots (the length can be as short as one OFDM symbol)to enable lower delay to access, which can greatly reducethe medium-access time (Ta). Multiple antennas will alsobe supported in 5G for MU–MIMO, which also helps reducethe medium-access time (Ta). With the increase of thecarrier frequency, the number of antennas and the multi-plexing order will increase. Meanwhile, the complexity toobtain CSI for beamforming also increases. A highly flexi-ble but unified CSI framework is supported by 5G, whichenables different antenna deployments corresponding todifferent CSI settings.

B. Industrial Communication Networks

Industry 4.0 is currently seen as the most advancedindustrial automation trend, where one of the impor-tant aspects is real-time communication between indus-trial modules. Wireless networks offer simple deployment,mobility, and low cost, and are gaining popularity in theindustrial sites [191]. The requirements in industrial appli-cations to support high reliability, high data rates, and lowlatency pose difficulties to wireless networks deployment,where the bottleneck mostly lies in the latency. To addressthese challenges, there have been some proposals, such asWirelessHP and IEEE 802.15.4e.

The recently proposed wireless high performance (Wire-less HP) aims to provide a PHY solution to supportmulti-gigabits-per-second aggregate data rate, very highreliability level ranging from 10−6 to 10−9, and packettransmission time (Tt) lower than 1 μs [48]. Takingthe advantage of deterministic and periodic traffic inthe latency-sensitive industrial applications, WirelessHPreduces the PHY preamble length (part of Th) (while stillensuring reliable packet decoding) and optimizes OFDMparameters to reduce the inefficiencies that affect shortpacket transmission.

IEEE 802.15.4 is a successful protocol that also forms thebasis for the first open standard WirelessHart for processautomation applications in the industrial field. However,the drawbacks of low reliability and unbounded packetlatency limit its deployment in the industrial applicationsthat have stringent requirements for latency and relia-bility [192]. To overcome these limitations, the recentlyreleased IEEE 802.15.4e amendment introduces MAC layer

298 PROCEEDINGS OF THE IEEE | Vol. 107, No. 2, February 2019

Page 20: Low-Latency Networking: Where Latency Lurks and How to Tame Itmodiano/papers/CV_J_108.pdf · 2019-04-12 · Low-Latency Networking: Where Latency Lurks and How to Tame It This article

Jiang et al.: Low-Latency Networking: Where Latency Lurks and How to Tame It

enhancements in three different MAC modes. Despite theindividual features among the different modes, here wefocus on the modifications in terms of latency performanceimprovement. To guarantee bounded medium-access time(Ta), channel access is time slotted and included bothcontention-free and contention-based modes for periodicand aperiodic traffic, respectively. The number of timeslots can be flexibly tuned according to the traffic load,which avoids the inefficiency of idle slots incurred in fixedframing structures. Channel hopping is applied to combatfading and improve reliability, which in turn will lowerthe retransmission latency, thus reducing medium-accesstime (Ta).

C. WLAN Group Standardization

IEEE 802.11 is a set of MAC and PHY specifications toimplement WLAN in the 0.9-, 2.4-, 3.6-, 5-, and 60-GHzfrequency bands. The key parameter index of the IEEE802.11 family focuses on data rate, coverage range, con-nectivity. Though latency performance is not specified inthe protocols, the protocol IEEE 802.11ak is designed tosupport industrial control equipment, and latency shouldbe bounded in this scenario. Moreover, the ability andinfrastructure of IEEE 802.11 to support high data ratemake it an indispensable part in the ecosystem to achievelow latency. IEEE 802.11ax, which is due to be publiclyreleased in 2019, is designed to improve spectral effi-ciency at 2.4 and 5 GHz. The technical highlights are themodulation support of up to 1024 QAM and multiusersupport in both frequency and spatial domains by the com-bination of OFDMA and MU–MIMO, which are effectiveto decrease the transmission time Tt and medium-accesslatency Ta, respectively. IEEE 802.11ay is the follow-upof 802.11ad working at 60 GHz. Compared to 802.11adwith 2.16-GHz bandwidth, 802.11ay has four times thebandwidth by channel bonding. Moreover, MIMO is addedwith a maximum of four streams with a per-stream linkrate of 44 Gb/s, which can substantially decrease thetransmission time Tt for heavy traffic by using the largebandwidth at higher frequency band.

D. Other Standardization Activities

Among the whole wide spectrum, only a small portionis regulated as licensed spectrum. While the licensed spec-trum has better performance due to less interference, theincreasing number of connected devices drives the neces-sity to use unlicensed spectrum. License Assisted Access,LTE in Unlicensed Spectrum, and Multefire [193] are threerepresentatives to explore LTE services in the unlicensed5-GHz band [194]. License Assisted Access and LTE inUnlicensed Spectrum use the unlicensed band by offload-ing traffic to boost data rate and reduce transmission time(Tt), while the control signals stay in the licensed band.Multefire takes a step even further, with no anchor atthe licensed band, and both the control signal and thedata traffic are transmitted in the unlicensed band. Thus,

Multefire not only helps to reduce the transmission time(Tt) thanks to the boosted data rate, but it also reducesthe access latency (Ta) with more access resources.

IEEE 802 Time-Sensitive Networking aims to deliverdeterministic latency over Ethernet networks [195]. Possi-ble applications include converged networks with real-timeaudio/video streaming and real-time control streamswhich are used in automotive or industrial control facil-ities. The objective latency for short messages per hopis set to be 4 μs or less with 1-Gb/s transmission rate.To guarantee deterministic delay for the time-sensitivedata, the switches are used to schedule the data trans-mission. The switches should be aware of the cycle timeof these latency-sensitive data, and during the windowexpected for the arrival of time-sensitive data, the switchwill block non-time-sensitive interfering traffic to eliminatequeueing latency Tq. To enhance the reliability, more thanone path is used simultaneously.

VII. F U R T H E R D I S C U S S I O N S

A. Short Packets

In Section IV-A3, we described research topic aboutsending short packets when the sizes of payload andheader are comparable; how to jointly code the payloadand metadata in an efficient way is of great importanceto improve the spectral efficiency and to reduce the trans-mission time Tt. Moreover, for short packets, channelestablishment delay Ta can be much longer than the trans-mission time, which is very inefficient, so the MAC controloverhead should be modified. Another open issue is to getthe optimal value of the maximum coding rate for finitepacket length and finite packet error probability, which isan NP-hard problem with exponential complexity [52].

B. The Tradeoff for Low Latency

In Sections IV-B and IV-C, we described the researchactivities around the combinations of different techniquesand different parameter settings have different latency per-formance and other performance indicators. Expectedly,there are tradeoffs between the latency performance andother performance metric such as throughput, reliability,and energy consumption. There is a tradeoff betweenlatency and throughput for different MAC schedulingschemes [196]; the tradeoff between latency and reliabilityof slotted ALOHA and CSMA is shown in [60] and thetradeoff between delay and energy consumption in [83].The tradeoff between latency and throughput in ad hocnetwork routing is given in [99]and [101]. The trade-offs may be different for different regions (i.e., differentthroughput or reliability). When the performance otherthan latency does not exceed a boundary, the latency growsmildly, while the latency increases sharply when exceedingthat boundary. For different techniques, such tradeoffs areexpected to exist, and the related boundaries should bewell determined.

Vol. 107, No. 2, February 2019 | PROCEEDINGS OF THE IEEE 299

Page 21: Low-Latency Networking: Where Latency Lurks and How to Tame Itmodiano/papers/CV_J_108.pdf · 2019-04-12 · Low-Latency Networking: Where Latency Lurks and How to Tame It This article

Jiang et al.: Low-Latency Networking: Where Latency Lurks and How to Tame It

C. mmWave

In Section IV-E2, we discussed mmWave as an enablingtechnology for low-latency communication, and it is also aresearch direction that attracts much attention with manyopen problems to be investigated.

First, beamforming is the premise to support the direc-tional transmission and the very high data rate to helpreduce latency, however, the delay caused by beamformingmay also prohibit mmWave techniques from working inlow-latency communications. Moreover, as the coherencetime in mmWave band is shorter than that in microwaveband, efficient beamforming algorithms in mmWave proveto be a very prominent research direction in mmWaveand are essential for mmWave techniques to be used inlow-latency scenarios. Giordani et al. surveyed some exist-ing beamforming and beam-tracking techniques for 3GPPNew Radio at the mmWave frequencies [197].

Second, although mmWave communication is mostlynoise limited, the SINR at the receiver side might probablydegrade greatly when the beam at the receiver side isalso aligned to undesired transmitting beams besides thedesired one. Thus, multilevel HARQ is needed to indicateretransmission for certain frames corrupted by randomstrong noise, or a new transmission when experiencinghigh interference [198]. When the high interference lastsfor a long time, retransmission may not be useful, so effec-tive and efficient solutions are needed.

Another problem occurs in the case of blockage in themain beam aligned link. As the PHY frame duration ismuch smaller compared to the time for the vehicle orpeople to move away, under this case the device shouldfall back to search other mmWave BSs in the local areainstead of waiting. And if no suitable mmWave BSs areavailable, UE may also fall back to use microwave. Then,whether to reassociate in the mmWave band or to changeto microwave band during blockage is an open problem.An efficient way to determine the reason for performancedegradation is of practical importance to reduce the delayin mmWave.

D. Combination With PHY Parameters

In Sections IV-B and IV-C, we discussed the effect ofPHY parameters on the corresponding latency compo-nents. For contention-based MACs, the probability of asuccessful transmission is often calculated by the prob-ability when there are no more than one node trans-mitting at the same time. This practice holds for theomnidirectional transmission. When we use directionaltransmission with a large number of antenna elements,e.g., in mmWave systems, some simultaneous transmis-sions may not cause interference to each other. Due tothis reason, the contention-based MACs need to be revisedfor certain scenarios. Moreover, given a reliability require-ment, retransmissions due to poor SINR may be neededbesides collision during the access phase, so the PHYparameters should be combined to determine the MAC

parameters, and more effective and efficient cross-layerdesign to achieve low-latency need to be investigated.

At the routing layer, as the topologies are often unknownand may change in time, it is difficult to determine the effi-cient interference model. However, in directional commu-nications, multiple links can be activated without causinginterference, and this suggests new research topics at therouting level. The tradeoff between the time to establishdirectional communication and the improvement of therouting delay should be investigated.

E. Unified Communication Network forLow-Latency Applications

In Sections VI and IV-G, we described the standard-ization activities, NFV and SDN separately; here we willdiscuss the open research that arise under the vision of 5G.5G aims at providing a unified infrastructure for a widevariety of use cases from media sharing (requiring hightransmission date), massive machine-type communication(requiring access availability), to applications that requirelow latency. Moreover, for the latency-sensitive applica-tions, different requirements in terms of latency, reliability,transmission rate, and energy consumption are needed.When considering different applications as verticals, theyoperate on top of horizontal communication infrastruc-tures, resources, and techniques [29]. A unified communi-cation ecosystem should figure out how to share horizontalinfrastructures and resources tailored to different verticals.NFV and SDN are necessary to enable flexible configura-tion of the communication services. When designing proto-cols, interfaces between different communities should alsobe taken into consideration to ensure easy and efficientcombination of the horizontal techniques.

Traffic offloading is another key technique to promote aunified communication network. By 2020, 5G is expectedto increase the area capacity 1000-fold, and to be ableto connect 100 billion devices. Despite the remarkablegrowth in capacity, it is still difficult to support such alarge number of connections. Even with different prior-ities, latency-sensitive transmission may not be guaran-teed. Traffic offloading may greatly relieve the burdenof the mobile network by using other communicationforms to share the traffic. For instance, local traffic can bedirectly exchanged by D2D communication mode withoutgoing through the infrastructure. An architecture unifyingthe coverage-centric 4G mobile networks and data-centricfiber-wireless net broadband networks was proposed in[154]. This architecture uses fiber as backhaul sharing andWiFi to offload the mobile user traffic from the 4G mobilenetworks, and as a result the end-to-end delay decreasesdramatically on the order of 1 ms.

F. Age of Information

Age of Information (AoI) is a new performance metricthat measures the amount of time that elapsed since themost recently received packet was generated at its source.

300 PROCEEDINGS OF THE IEEE | Vol. 107, No. 2, February 2019

Page 22: Low-Latency Networking: Where Latency Lurks and How to Tame Itmodiano/papers/CV_J_108.pdf · 2019-04-12 · Low-Latency Networking: Where Latency Lurks and How to Tame It This article

Jiang et al.: Low-Latency Networking: Where Latency Lurks and How to Tame It

As such, AoI measures the ”freshness of information” fromthe perspective of the destination. AoI has been receivingincreasing attention in the literature, particularly for appli-cations that generate time-sensitive data such as position,command and control, or sensor data. AoI is a functionof packet delay and packet interdelivery time. Thus, lowdelay alone many not be sufficient to achieve good AoIperformance. For example, an M/M/1 queue with a lowarrival rate and a high service rate may have low queueingdelay but high AoI because the packet interarrival timesare large. Thus, achieving good AoI performance involves abalance between maintaining low delays and small packetinterarrival times. To better understand these phenom-ena, Kaul et al. [199] modeled the network between thesource and destination as a single first-in–first-out (FIFO)queue, and proved that there is indeed an optimal rateat which AoI is minimized. Since then, most of the workon AoI has focused on single queue models. Age forFIFO M/M/1, M/D/1, and D/M/1 queues was analyzedin [199]; multiclass FIFO M/G/1 and G/G/1 queues werestudied in [200]. However, the problem of minimizingAoI in a wireless network with interference constraintshas received limited attention. In [201], Kadota et al.developed scheduling algorithms for minimizing AoI overa wireless base station, where only one node can transmitat a time, and in [202], the authors studied the AoI mini-mization problem in a wireless network subject to generalinterference constraints. The approach was generalized tomultihop wireless networks in [203].

G. Hardware and Smart Devices

Due to the limitation of our expertise, we mainly discussthe issue from a technique perspective; there are alsomany open research problems about hardware and smartdevices which enable the functioning of different tech-niques. When we seek to exploit the wide bandwidth athigher frequencies such as mmWave, the correct hardwarethat operates in the high bands is needed, and technologiesin the higher frequencies can be widely used to achievegood performance when the price of hardware becomeslower.

Today, the devices are becoming smarter in terms ofbetter processing capability and larger caching size. Thissmartness not only enables installing and running applica-tions smoothly, but also helps to achieve low latency. Withlarger caching capability, the devices can proactively cachepopular contents by an analysis of the user preference fromthe previous content the user has browsed [180]. Then,when the user actually makes the request to this cachedcontent, the device can display it with local cache insteadof an access through the network.

Another direction is to enable the devices to supportD2D communication from both the hardware and protocolperspective [204]. Then, the devices can communicatewith others in proximity using lower power and probablyin a single hop, instead of communicating through the BS

going through multiple hops or using strong transmittingpower. However, the tradeoff between the extra overheadfor control and channel estimation and the delay savedby communicating in the D2D mode still needs furtherresearch.

H. Security

Currently, the packet overheads due to security orprivacy methods substantially contribute to the delayin both the communication of information itself andthe decoding/processing. The classic approach to pri-vacy and security is cryptography [205]. However, cryp-tography introduces formidable overheads and heavycoordination over the transmitters/receivers that areinvolved, which substantially makes it impossible toachieve very low latencies. Existing alternative privacymethods for low-latency networking demand substan-tial investigation. Alternative methods to cryptography—namely, information-theoretic secrecy [206], differentialprivacy [207], k-anonymity [208], and signal processingsecurity methods [209]—present shortcomings when itcomes to their use for low-latency networking. Like cryp-tography, information-theoretic secrecy prevents an eaves-dropper acquiring information from two communicatingnodes. However, the channel acquisition phase that isneeded for these methods includes substantial delays thatare in contrast with low latencies. Differential privacy,k-anonymity, and signal processing methods perturb theoriginal data to make data analysis, and as such it is notclear how they can be used to ensure private or securelow-latency communications.

VIII. C O N C L U S I O N

Low-latency communications are arguably the most impor-tant direction in the next generation of communicationnetworks. In this paper, we showed that the most promi-nent use cases demanding low latency are supported bycomplex network interactions, including internetwork andintranetwork interactions. To realize low-latency networks,it is important to determine where and how latency occursand what methods can help to reduce it. We investi-gated how the delay accumulates from PHY to transportlayer, and we showed how to characterize the end-to-enddelay into several components. Then, we discussed howdifferent techniques may influence one or multiple delaycomponents. We argued that these techniques should beoptimized together to reduce the delay while satisfyingother requirements such as reliability and throughput.MU–MIMO, mmWave, and full duplex, which can greatlyimprove the data rate, were considered as three enablingtechnologies to support low-latency communication, buteach of the three also poses challenges, e.g., MU–MIMOand mmWave will introduce beamforming delay before thestart of the transmission. Finally, sending short packets, thetradeoff between latency and other network performance

Vol. 107, No. 2, February 2019 | PROCEEDINGS OF THE IEEE 301

Page 23: Low-Latency Networking: Where Latency Lurks and How to Tame Itmodiano/papers/CV_J_108.pdf · 2019-04-12 · Low-Latency Networking: Where Latency Lurks and How to Tame It This article

Jiang et al.: Low-Latency Networking: Where Latency Lurks and How to Tame It

indexes, using mmWave bands, design with physical para-meters, hardware and smart devices design, and traffic

offloading are some of the most promising research areasthat will need substantial future research developments.�

R E F E R E N C E S[1] F. P. Kelly, “Network routing,” Philos. Trans. Roy.

Soc. London A, Math., Phys. Eng. Sci., vol. 337,no. 1647, pp. 343–367, 1991.

[2] M. Weiser, “The computer for the 21st century,”Sci. Amer., vol. 265, no. 3, pp. 94–104, 1991.

[3] M. S. Corson, R. Laroia, J. Li, V. Park,T. Richardson, and G. Tsirtsis, “Towardproximity-aware internetworking,” IEEE WirelessCommun., vol. 17, no. 6, pp. 26–33, Dec. 2010.

[4] N. Katugampala, K. T. Al-Naimi, S. Villette, andA. M. Kondoz, “Real-time end-to-end secure voicecommunications over GSM voice channel,” inProc. 13th Eur. Signal Process. Conf., Sep. 2005,pp. 1–4.

[5] T. Chen, G. Charbit, K. Ranta-Aho, O. Fresan, andT. Ristaniemi, “VoIP end-to-end performance inHSPA with packet age aided HSDPA scheduling,”in Proc. IEEE 19th Int. Symp. Pers., Indoor MobileRadio Commun., Sep. 2008, pp. 1–5.

[6] M. S. Corson et al., “FlashLinQ: Enabling a mobileproximal Internet,” IEEE Wireless Commun.,vol. 20, no. 5, pp. 110–117, Oct. 2013.

[7] N. Brahmi, O. N. C. Yilmaz, K. W. Helmersson,S. A. Ashraf, and J. Torsner, “Deploymentstrategies for ultra-reliable and low-latencycommunication in factory automation,” in Proc.Globecom Workshops, Dec. 2015, pp. 1–6.

[8] M. A. Lema et al., “Business case and technologyanalysis for 5G low latency applications,” IEEEAccess, vol. 5, pp. 5917–5935, 2017.

[9] M. Simsek, A. Aijaz, M. Dohler, J. Sachs, andG. Fettweis, “5G-enabled tactile Internet,” IEEEJ. Sel. Areas Commun., vol. 34, no. 3,pp. 460–473, Mar. 2016.

[10] A. A. Zaidi, “Designing for the future: The 5G NRphysical layer,” in Proc. Ericsson Rev., 2017.

[11] J. Torsner et al., “Industrial remote operation: 5Grises to the challenge,” in Proc. Ericsson Rev., 2015.

[12] I. Redpath, “Monetizing high-performance,low-latency networks,” in Proc. Ovum, 2017.

[13] M. Weiner, M. Jorgovanovic, A. Sahai, andB. Nikolié, “Design of a low-latency,high-reliability wireless communication system forcontrol applications,” in Proc. IEEE Int. Conf.Commun., Jun. 2014, pp. 3829–3835.

[14] K. Yamamoto, F. Ichihara, K. Hasegawa,M. Tukuda, and I. Omura, “60 GHz wireless signaltransmitting gate driver for IGBT,” in Proc. IEEE27th Int. Symp. Power Semiconductor Devices IC’s,May 2015, pp. 133–136.

[15] M. Maier, M. Chowdhury, B. P. Rimal, andD. P. Van, “The tactile Internet: Vision, recentprogress, and open challenges,” IEEE Commun.Mag., vol. 54, no. 5, pp. 138–145, May 2016.

[16] G. P. Fettweis, “The tactile Internet: Applicationsand challenges,” IEEE Veh. Technol. Mag., vol. 9,no. 1, pp. 64–70, Mar. 2014.

[17] A. Osseiran, J. Sachs, and M. Puleri,“Manufacturing reengineered: Robots, 5G and theIndustrial IoT,” in Proc. Ericsson Bus. Rev., 2015.

[18] J. Hasbrouck and G. Saar, “Low-latency trading,”J. Financial Markets, vol. 16, no. 4, pp. 646–679,2013.

[19] R. Bogue, “Exoskeletons and robotic prosthetics: Areview of recent developments,” Ind. Robot, Int. J.,vol. 36, no. 5, pp. 421–427, 2009.

[20] D. J. Tyler, “Restoring the human touch:Prosthetics imbued with haptics give their wearersfine motor control and a sense of connection,”IEEE Spectrum, vol. 53, no. 5, pp. 28–33,May 2016.

[21] M. E. Tozal, Y. Wang, E. Al-Shaer, K. Sarac,B. Thuraisingham, and B.-T. Chu, “Adaptiveinformation coding for secure and reliablewireless telesurgery communications,” MobileNetw. Appl., vol. 18, no. 5, pp. 697–711, 2013.

[22] Machine-to-Machine Communications (M2M);Applicability of M2M Architecture to Smart GridNetworks; Impact of Smart Grids on M2M

Platform, document ETRI TR 102 935 V2.1.1,Sep. 2012.

[23] D. Orfanus, R. Indergaard, G. Prytz, and T. Wien,“EtherCAT-based platform for distributed controlin high-performance industrial applications,” inProc. IEEE 18th Conf. Emerg. Technol. FactoryAutom., Sep. 2013, pp. 1–8.

[24] Y. Liu, R. Candell, and N. Moayeri, “Effects ofwireless packet loss in industrial process controlsystems,” ISA Trans., vol. 68, pp. 412–424, May2017.

[25] Z. Pang, M. Luvisotto, and D. Dzung, “Highperformance wireless communications for criticalcontrol applications,” IEEE Ind. Electron. Mag.,to be published.

[26] V. C. Güngör et al., “Smart grid technologies:Communication technologies and standards,” IEEETrans. Ind. Informat., vol. 7, no. 4, pp. 529–539,Nov. 2011.

[27] R. H. Khan and J. Y. Khan, “A comprehensivereview of the application characteristics andtraffic requirements of a smart gridcommunications network,” Comput. Netw.,vol. 57, no. 3, pp. 825–845, 2013.

[28] IEC, IEC Standard 61850. [Online]. Available:http://www.iec.ch/

[29] M. Wollschlaeger, T. Sauter, and J. Jasperneite,“The future of industrial communication:Automation networks in the era of the Internet ofThings and industry 4.0,” IEEE Ind. Electron. Mag.,vol. 11, no. 1, pp. 17–27, Mar. 2017.

[30] L. Turchet, A. McPherson, and C. Fischione,“Smart instruments: Towards an ecosystem ofinteroperable devices connecting performers andaudiences,” in Proc. Sound Music Comput. Conf.,2016.

[31] L. Turchet, M. Benincaso, and C. Fischione,“Examples of use cases with smart instruments,”in Proc. 12th Int. Audio Mostly Conf. AugmentedParticipatory Sound Music Exp., 2017, p. 47.

[32] L. Turchet, C. Fischione, and M. Barthet, “Towardsthe Internet of musical things,” in Proc. SoundMusic Comput. Conf., 2017, pp. 13–20.

[33] J. Santa and A. F. Gomez-Skarmeta, “Sharingcontext-aware road and safety information,” IEEEPervasive Comput., vol. 8, no. 3, pp. 58–65,Jul./Sep. 2009.

[34] M. Alsabaan, K. Naik, and T. Khalifa,“Optimization of fuel cost and emissions usingV2V communications,” IEEE Trans. Intell. Transp.Syst., vol. 14, no. 3, pp. 1449–1461,Sep. 2013.

[35] R. Yu, Y. Zhang, S. Gjessing, W. Xia, and K. Yang,“Toward cloud-based vehicular networks withefficient resource management,” IEEE Netw.,vol. 27, no. 5, pp. 48–55, Sep./Oct. 2013.

[36] M. Gerla, E.-K. Lee, G. Pau, and U. Lee, “Internetof vehicles: From intelligent grid to autonomouscars and vehicular clouds,” in Proc. IEEE WorldForum Internet Things, Mar. 2014, pp. 241–246.

[37] R. A. Uzcátegui, A. J. De Sucre, andG. Acosta-Marum, “Wave: A tutorial,” IEEECommun. Mag., vol. 47, no. 5, pp. 126–133, May2009.

[38] 5G PPP, “5G automotive vision,” White Paper, Oct.2015. [Online]. Available: https://5g-ppp.eu/wp-content/uploads/2014/02/5G-PPP-White-Paper-on-Automotive-Vertical-Sectors.pdf

[39] A. Festag, “Standards for vehicularcommunication—From IEEE 802.11p to 5G,” E IElektrotechn. Informationstechn., vol. 132, no. 7,pp. 409–416, Nov. 2015.

[40] R. Martin, “Wall street’s quest to process data atthe speed of light,” Inf. Week, vol. 4, no. 21, p. 7,2007.

[41] L. Massoulié and J. Roberts, “Bandwidth sharing:Objectives and algorithms,” in Proc. IEEE 18thAnnu. Joint Conf. IEEE Comput. Commun. Soc.,vol. 3, Mar. 1999, pp. 1395–1403.

[42] O. N. C. Yilmaz, Y.-P. E. Wang, N. A. Johansson,N. Brahmi, S. A. Ashraf, and J. Sachs, “Analysis ofultra-reliable and low-latency 5G communicationfor a factory automation use case,” in Proc. IEEEInt. Conf. Commun. Workshop, Jun. 2015,pp. 1190–1195.

[43] A. J. Goldsmith and S.-G. Chua, “Variable-ratevariable-power MQAM for fading channels,” IEEETrans. Commun., vol. 45, no. 10, pp. 1218–1230,Oct. 1997.

[44] A. J. Goldsmith and P. P. Varaiya, “Capacity offading channels with channel side information,”IEEE Trans. Inf. Theory, vol. 43, no. 6,pp. 1986–1992, Nov. 1997.

[45] G. Wunder et al., “5GNOW: Non-orthogonal,asynchronous waveforms for future mobileapplications,” IEEE Commun. Mag., vol. 52, no. 2,pp. 97–105, Feb. 2014.

[46] P. Banelli, S. Buzzi, G. Colavolpe, A. Modenini,F. Rusek, and A. Ugolini, “Modulation formats andwaveforms for 5G networks: Who will be the heirof OFDM?: An overview of alternative modulationschemes for improved spectral efficiency,” IEEESignal Process. Mag., vol. 31, no. 6, pp. 80–93,Nov. 2014.

[47] F. Pancaldi, G. M. Vitetta, R. Kalbasi, N. Al-Dhahir,M. Uysal, and H. Mheidat, “Single-carrierfrequency domain equalization,” IEEE SignalProcess. Mag., vol. 25, no. 5, pp. 37–56, Sep. 2008.

[48] M. Luvisotto, Z. Pang, D. Dzung, M. Zhan, andX. Jiang, “Physical layer design ofhigh-performance wireless transmission for criticalcontrol applications,” IEEE Trans. Ind. Informat.,vol. 13, no. 6, pp. 2844–2854, Dec. 2017.

[49] J. Abdoli, M. Jia, and J. Ma, “Filtered OFDM: Anew waveform for future wireless systems,” inProc. IEEE 16th Int. Workshop Signal Process. Adv.Wireless Commun., Jun./Jul. 2015, pp. 66–70.

[50] A. Farhang, N. Marchetti, F. Figueiredo, andJ. P. Miranda, “Massive MIMO and waveformdesign for 5th generation wireless communicationsystems,” in Proc. 1st Int. Conf. 5G UbiquitousConnectivity, Nov. 2014, pp. 70–75.

[51] G. Fettweis, M. Krondorf, and S. Bittner,“GFDM—Generalized frequency divisionmultiplexing,” in Proc. IEEE 69th Veh. Technol.Conf., Apr. 2009, pp. 1–4.

[52] G. Durisi, T. Koch, and P. Popovski (2015).“Towards massive, ultra-reliable, and low-latencywireless communication with short packets.”[Online]. Available:https://arxiv.org/abs/1504.06526

[53] Y. Polyanskiy, H. V. Poor, and S. Verdú, “Channelcoding rate in the finite blocklength regime,” IEEETrans. Inf. Theory, vol. 56, no. 5, pp. 2307–2359,May 2010.

[54] J. Östman, W. Yang, G. Durisi, and T. Koch,“Diversity versus multiplexing at finiteblocklength,” in Proc. 11th Int. Symp. WirelessCommun. Syst., Aug. 2014, pp. 702–706.

[55] G. Durisi, T. Koch, J. Ostman, Y. Polyanskiy, andW. Yang, “Short-packet communications withmultiple antennas: Transmit diversity, spatialmultiplexing, and channel estimation overhead,”CoRR, 2014.

[56] D. Soldani, Y. J. Guo, B. Barani, P. Mogensen,C.-L. I, and S. K. Das, “5G for ultra-reliablelow-latency communications,” IEEE Network,vol. 32, no. 2, pp. 6–7, Mar./Apr. 2018.

[57] S. Parkvall, E. Dahlman, A. Furuskar, andM. Frenne, “NR: The new 5G radio accesstechnology,” IEEE Commun. Standards Mag.,vol. 1, no. 4, pp. 24–30, Dec. 2017.

[58] Y. Gu, T. He, M. Lin, and J. Xu, “Spatiotemporaldelay control for low-duty-cycle sensor networks,”in Proc. 30th IEEE Real-Time Syst. Symp., Dec.2009, pp. 127–137.

[59] L. G. Roberts, “ALOHA packet system with andwithout slots and capture,” ACM SIGCOMM

302 PROCEEDINGS OF THE IEEE | Vol. 107, No. 2, February 2019

Page 24: Low-Latency Networking: Where Latency Lurks and How to Tame Itmodiano/papers/CV_J_108.pdf · 2019-04-12 · Low-Latency Networking: Where Latency Lurks and How to Tame It This article

Jiang et al.: Low-Latency Networking: Where Latency Lurks and How to Tame It

Comput. Commun. Rev., vol. 5, no. 2, pp. 28–42,1975.

[60] Y. Yang and T. S. P. Yum, “Delay distributions ofslotted ALOHA and CSMA,” IEEE Trans. Commun.,vol. 51, no. 11, pp. 1846–1857, Nov. 2003.

[61] K. Sohraby, D. Minoli, and T. Znati, Wireless SensorNetworks: Technology, Protocols, and Applications.Hoboken, NJ, USA: Wiley, 2007.

[62] E. Magistretti, O. Gurewitz, and E. W. Knightly,“802.11ec: Collision avoidance without controlmessages,” IEEE/ACM Trans. Netw., vol. 22, no. 6,pp. 1845–1858, Dec. 2014.

[63] H. Shokri-Ghadikolaei, C. Fischione, P. Popovski,and M. Zorzi, “Design aspects of short-rangemillimeter-wave networks: A MAC layerperspective,” IEEE Network, vol. 30, no. 3,pp. 88–96, May 2016.

[64] H. Shokri-Ghadikolaei and C. Fischione, “Thetransitional behavior of interference in millimeterwave networks and its impact on medium accesscontrol,” IEEE Trans. Commun., vol. 62, no. 2,pp. 723–740, Feb. 2016.

[65] R. Ramanathan, J. Redi, C. Santivanez,D. Wiggins, and S. Polit, “Ad hoc networking withdirectional antennas: A complete systemsolution,” IEEE J. Sel. Areas Commun., vol. 23,no. 3, pp. 496–506, Mar. 2005.

[66] I. K. Son, S. Mao, M. X. Gong, and Y. Li, “Onframe-based scheduling for directional mmWaveWPANs,” in Proc. IEEE Int. Conf. Comput.Commun., Mar. 2012, pp. 2149–2157.

[67] S. Singh, R. Mudumbai, and U. Madhow,“Interference analysis for highly directional60-GHz mesh networks: The case for rethinkingmedium access control,” IEEE/ACM Trans. Netw.,vol. 19, no. 5, pp. 1513–1527, Oct. 2011.

[68] T. Stahlbuhk, B. Shrader, and E. Modiano,“Topology control for wireless networks withhighly-directional antennas,” in Proc. IEEE Int.Symp. Modeling Optim. Mobile, Ad Hoc, WirelessNetw., May 2016, pp. 1–8.

[69] T. Nitsche, A. B. Flores, E. W. Knightly, andJ. Widmer, “Steering with eyes closed: Mm-Wavebeam steering without in-band measurement,” inProc. IEEE Conf. Comput. Commun., Apr./May2015, pp. 2416–2424.

[70] E. Olfat, H. Shokri-Ghadikolaei,N. N. Najari-Moghadam, M. Bengtsson, andC. Fischione, “Learning-based pilot precoding andcombining for wideband millimeter-wavenetworks,” in Proc. IEEE Int. Workshop Comput.Adv. Multi-Sensor Adapt. Process., Dec. 2017,pp. 1–5.

[71] A. Azari, P. Popovski, G. Miao, and C. Stefanovic(2017). “Grant-free radio access for short-packetcommunications over 5G networks.” [Online].Available: https://arxiv.org/abs/1709.02179

[72] F. Clazzer, C. Kissling, and M. Marchese (2016).“Exploiting combination techniques in randomaccess MAC protocols: Enhanced contentionresolution ALOHA.” [Online]. Available:https://arxiv.org/abs/1602.07636

[73] V. Narasimha Swamy et al., “Cooperativecommunication for high-reliability low-latencywireless control,” in Proc. IEEE Int. Conf. Commun.(ICC), Jun. 2015, pp. 4380–4386.

[74] V. N. Swamy, P. Rigge, G. Ranade, A. Sahai, andB. Nikolic, “Network coding for high-reliabilitylow-latency wireless control,” in Proc. IEEEWireless Commun. Netw. Conf., Apr. 2016, pp. 1–7.

[75] S. Choi, G.-H. Hwang, T. Kwon, A.-R. Lim, andD.-H. Cho, “Fast handover scheme for real-timedownlink services in IEEE 802.16e BWA system,”in Proc. IEEE 61st Veh. Technol. Conf. (VTC-Spring),vol. 3, May 2005, pp. 2028–2032.

[76] D. H. Lee, K. Kyamakya, and J. P. Umondi, “Fasthandover algorithm for IEEE 802.16e broadbandwireless access system,” in Proc. 1st Int. Symp.Wireless Pervasive Comput., Jan. 2006, pp. 1–6.

[77] W. Jiao, P. Jiang, and Y. Ma, “Fast handoverscheme for real-time applications in mobileWiMAX,” in Proc. IEEE Int. Conf. Commun.,Jun. 2007, pp. 6038–6042.

[78] R. Li, J. Li, K. Wu, Y. Xiao, and J. Xie, “Anenhanced fast handover with low latency for

mobile IPv6,” IEEE Trans. Wireless Commun.,vol. 7, no. 1, pp. 334–342, Jan. 2008.

[79] L. Dimopoulou, G. Leoleis, and I. O. Venieris, “Fasthandover support in a WLAN environment:Challenges and perspectives,” IEEE Network,vol. 19, no. 3, pp. 14–20, May 2005.

[80] H. Shokri-Ghadikolaei et al., “Millimeter wavecellular networks: A MAC layer perspective,” IEEETrans. Commun., vol. 63, no. 10, pp. 3437–3458,Oct. 2015.

[81] O. Semiari, W. Saad, M. Bennis, and B. Maham,“Caching meets millimeter wave communicationsfor enhanced mobility management in 5Gnetworks,” IEEE Trans. Wireless Commun., vol. 17,no. 2, pp. 779–793, Feb. 2018.

[82] Y. Xu, H. Shokri-Ghadikolaei, and C. Fischione,“Distributed association and relaying with fairnessin millimeter wave networks,” IEEE Trans. WirelessCommun., vol. 15, no. 12, pp. 7955–7970, Dec.2016.

[83] B. Prabhakar, E. U. Biyikoglu, and A. El Gamal,“Energy-efficient transmission over a wireless linkvia lazy packet scheduling,” in Proc. 12th Annu.Joint Conf. IEEE Comput. Commun. Soc.,Apr. 2001, pp. 386–394.

[84] M. A. Zafer and E. Modiano, “A calculus approachto energy-efficient data transmission withquality-of-service constraints,” IEEE Trans. Netw.,vol. 17, no. 3, pp. 898–911, Jun. 2009.

[85] R. L. Cruz, “A calculus for network delay.I. Network elements in isolation,” IEEE Trans. Inf.Theory, vol. 37, no. 1, pp. 114–131, Jan. 1991.

[86] M. Zafer and E. Modiano, “Optimal rate controlfor delay-constrained data transmission over awireless channel,” IEEE Trans. Inf. Theory, vol. 54,no. 9, pp. 4020–4039, Sep. 2008.

[87] R. Berry, M. Zafer, and E. Modiano, Energy EfficientWireless Transmission With Delay Constraints. SanRafael, CA, USA: Morgan & Claypool, 2012.

[88] L. Tassiulas and A. Ephremides, “Stabilityproperties of constrained queueing systems andscheduling policies for maximum throughput inmultihop radio networks,” IEEE Trans. Autom.Control, vol. 37, no. 12, pp. 1936–1948, Dec.1992.

[89] L. X. Bui, R. Srikant, and A. Stolyar, “A novelarchitecture for reduction of delay and queueingstructure complexity in the back-pressurealgorithm,” IEEE/ACM Trans. Netw., vol. 19,no. 6, pp. 1597–1609, Dec. 2011.

[90] N. M. Jones, G. S. Paschos, B. Shrader, andE. Modiano, “An overlay architecture forthroughput optimal multipath routing,” IEEE/ACMTrans. Netw., vol. 25, no. 5, pp. 2615–2628, Oct.2017.

[91] R. Bellman, “On a routing problem,” Quart. Appl.Math., vol. 16, no. 1, pp. 87–90, 1958.

[92] E. W. Dijkstra, “A note on two problems inconnexion with graphs,” Numer. Math., vol. 1,no. 1, pp. 269–271, 1959.

[93] R. Singh and E. Modiano (2017). “Optimalrouting for delay-sensitive traffic in overlaynetworks.” [Online]. Available:https://arxiv.org/abs/1703.07419

[94] G. R. Gupta and N. Shroff, “Delay analysis formulti-hop wireless networks,” in Proc. IEEE Int.Conf. Comput. Commun., Apr. 2009,pp. 2356–2364.

[95] N. McKeown, A. Mekkittikul, V. Anantharam, andJ. Walrand, “Achieving 100% throughput in aninput-queued switch,” IEEE Trans. Commun.,vol. 47, no. 8, pp. 1260–1267, Aug. 1999.

[96] M. J. Neely, “Delay-based network utilitymaximization,” IEEE/ACM Trans. Netw., vol. 21,no. 1, pp. 41–54, Feb. 2013.

[97] P. Di Marco, G. Athanasiou, P.-V. Mekikis, andC. Fischione, “MAC-aware routing metrics for theInternet of Things,” Comput. Commun., vol. 74,pp. 77–86, Jan. 2016.

[98] A. Sinha and E. Modiano, “Optimal control forgeneralized network-flow problems,” in Proc. IEEEInt. Conf. Comput. Commun., May 2017,pp. 1–9.

[99] A. El Gamal, J. Mammen, B. Prabhakar, andD. Shah, “Optimal throughput-delay scaling in

wireless networks—Part I: The fluid model,” IEEETrans. Inf. Theory, vol. 52, no. 6, pp. 2568–2592,Jun. 2006.

[100] S. Yi, Y. Pei, and S. Kalyanaraman, “On thecapacity improvement of ad hoc wireless networksusing directional antennas,” in Proc. 4th ACM Int.Symp. Mobile Ad Hoc Netw. Comput., 2003,pp. 108–116.

[101] M. Grossglauser and D. N. C. Tse, “Mobilityincreases the capacity of ad-hoc wirelessnetworks,” in Proc. 12th Annu. Joint Conf. IEEEComput. Commun. Soc., vol. 3, Apr. 2001,pp. 1360–1369.

[102] X. Wang, W. Huang, S. Wang, J. Zhang, and C. Hu,“Delay and capacity tradeoff analysis formotioncast,” IEEE/ACM Trans. Netw., vol. 19,no. 5, pp. 1354–1367, Oct. 2011.

[103] R. Talak, S. Karaman, and E. Modiano, “Capacityand delay scaling for broadcast transmission inhighly mobile wireless networks,” in Proc. ACMMobiHoc, 2017, p. 8.

[104] M. J. Neely and E. Modiano, “Capacity and delaytradeoffs for ad hoc mobile networks,” IEEE Trans.Inf. Theory, vol. 51, no. 6, pp. 1917–1937,Jun. 2005.

[105] T. He, J. A. Stankovic, C. Lu, and T. Abdelzaher,“SPEED: A real-time routing protocol for sensornetworks,” Dept. Comput. Sci.,Univ. Virginia,Charlottesville, VA, USA, Tech. Rep., 2002.

[106] E. Felemban, C.-G. Lee, and E. Ekici, “MMSPEED:Multipath multi-SPEED protocol for QoSguarantee of reliability and timeliness in wirelesssensor networks,” IEEE Trans. Mobile Comput.,vol. 5, no. 6, pp. 738–754, Jun. 2006.

[107] B. Zhang and H. T. Mouftah, “QoS routing forwireless ad hoc networks: Problems, algorithms,and protocols,” IEEE Commun. Mag., vol. 43,no. 10, pp. 110–117, Oct. 2005.

[108] C. E. Perkins and E. M. Royer, “Ad-hoc on-demanddistance vector routing,” in Proc. Workshop MobileComput. Syst. Appl. (WMCSA), Feb. 1999,pp. 90–100.

[109] C. R. Lin, “On-demand QoS routing in multihopmobile networks,” in Proc. 20th Annu. Joint Conf.IEEE Comput. Commun. Soc., vol. 3, Apr. 2001,pp. 1735–1744.

[110] L. Hanzo and R. Tafazolli, “A survey of QoSrouting solutions for mobile ad hoc networks,”IEEE Commun. Surveys Tuts., vol. 9, no. 2,pp. 50–70, 2nd Quart., 2007.

[111] B. Paul and M. J. Islam, “Survey over VANETrouting protocols for vehicle to vehiclecommunication,” IOSR J. Comput. Eng.,pp. 0661–2278, 2012.

[112] M. Altayeb and I. Mahgoub, “A survey of vehicularad hoc networks routing protocols,” Int.J. InNov. Appl. Stud., vol. 3, no. 3, pp. 829–846,2013.

[113] D. Comer, Internetworking with TCP/IP: Principles,Protocols, and Architecture, vol. 1. EnglewoodCliffs, NJ, USA: Prentice-Hall, 2006.

[114] S. Floyd, “TCP and explicit congestionnotification,” ACM Comput. Commun. Rev.,vol. 24, no. 5, pp. 8–23, Oct. 1994.

[115] R. Adams, “Active queue management: A survey,”IEEE Commun. Surveys Tuts., vol. 15, no. 3,pp. 1425–1476, 3rd Quart., 2013.

[116] K. Nichols and V. Jacobson, “Controlling queuedelay,” Commun. ACM, vol. 55, no. 7, pp. 42–50,Jul. 2012.

[117] M. Alizadeh, A. Kabbani, T. Edsall, B. Prabhakar,A. Vahdat, and M. Yasuda, “Less is more: Tradinga little bandwidth for ultra-low latency in the datacenter,” in Proc. 9th USENIX Conf. Netw. Syst.Design Implement.. Berkeley, CA, USA: USENIXAssociation, 2012, p. 19.

[118] A. Goldsmith, S. A. Jafar, N. Jindal, andS. Vishwanath, “Capacity limits of MIMOchannels,” IEEE J. Sel. Areas Commun., vol. 21,no. 5, pp. 684–702, Jun. 2003.

[119] Q. H. Spencer, A. L. Swindlehurst, and M. Haardt,“Zero-forcing methods for downlink spatialmultiplexing in multiuser MIMO channels,” IEEETrans. Signal Process., vol. 52, no. 2, pp. 461–471,Feb. 2004.

Vol. 107, No. 2, February 2019 | PROCEEDINGS OF THE IEEE 303

Page 25: Low-Latency Networking: Where Latency Lurks and How to Tame Itmodiano/papers/CV_J_108.pdf · 2019-04-12 · Low-Latency Networking: Where Latency Lurks and How to Tame It This article

Jiang et al.: Low-Latency Networking: Where Latency Lurks and How to Tame It

[120] T. Yoo and A. Goldsmith, “Capacity and powerallocation for fading MIMO channels with channelestimation error,” IEEE Trans. Inf. Theory, vol. 52,no. 5, pp. 2203–2214, May 2006.

[121] T. L. Marzetta, “Noncooperative cellular wirelesswith unlimited numbers of base station antennas,”IEEE Trans. Wireless Commun., vol. 9, no. 11,pp. 3590–3600, Nov. 2010.

[122] F. Rusek et al., “Scaling up MIMO: Opportunitiesand challenges with very large arrays,” IEEE SignalProcess. Mag., vol. 30, no. 1, pp. 40–60, Jan. 2013.

[123] D. Tse and P. Viswanath, Fundamentals of WirelessCommunication. Cambridge, U.K.: CambridgeUniv. Press, 2005.

[124] H. Huh, A. M. Tulino, and G. Caire, “NetworkMIMO with linear zero-forcing beamforming:Large system analysis, impact of channelestimation, and reduced-complexity scheduling,”IEEE Trans. Inf. Theory, vol. 58, no. 5,pp. 2911–2934, May 2012.

[125] K. T. K. Cheung, S. Yang, and L. Hanzo, “Spectraland energy spectral efficiency optimization ofjoint transmit and receive beamforming basedmulti-relay MIMO-OFDMA cellular networks,”IEEE Trans. Wireless Commun., vol. 13, no. 11,pp. 6147–6165, Nov. 2014.

[126] W. Shin, M. Vaezi, B. Lee, D. J. Love, J. Lee, andH. V. Poor, “Coordinated beamforming formulti-cell MIMO-NOMA,” IEEE Commun. Lett.,vol. 21, no. 1, pp. 84–87, Jan. 2017.

[127] S. He, Y. Huang, L. Yang, B. Ottersten, andW. Hong, “Energy efficient coordinatedbeamforming for multicell system: Duality-basedalgorithm design and massive MIMO transition,”IEEE Trans. Commun., vol. 63, no. 12,pp. 4920–4935, Dec. 2015.

[128] H. Q. Ngo, L.-N. Tran, T. Q. Duong, M. Matthaiou,and E. G. Larsson, “On the total energy efficiencyof cell-free massive MIMO,” IEEE Trans. GreenCommun. Netw., vol. 2, no. 1, pp. 25–39, Mar.2018.

[129] J. Zhang and J. G. Andrews, “Adaptive spatialintercell interference cancellation in multicellwireless networks,” IEEE J. Sel. Areas Commun.,vol. 28, no. 9, pp. 1455–1468, Dec. 2010.

[130] K. Hosseini, W. Yu, and R. S. Adve, “Large-scaleMIMO versus network MIMO for multicellinterference mitigation,” IEEE J. Sel. Topics SignalProcess., vol. 8, no. 5, pp. 930–941, Oct. 2014.

[131] S. Huberman and T. Le-Ngoc, “MIMO full-duplexprecoding: A joint beamforming andself-interference cancellation structure,” IEEETrans. Wireless Commun., vol. 14, no. 4,pp. 2205–2217, Apr. 2015.

[132] Y. Ghasempour, C. R. C. M. da Silva, C. Cordeiro,and E. W. Knightly, “IEEE 802.11ay:Next-generation 60 GHz communicationfor 100 Gb/s Wi-Fi,” IEEE Commun. Mag., vol. 55,no. 12, pp. 186–192, Dec. 2017.

[133] H. Shokri-Ghadikolaei, L. Gkatzikis, andC. Fischione, “Beam-searching and transmissionscheduling in millimeter wave communications,”in Proc. IEEE Int. Conf. Commun., Jun. 2015,pp. 1292–1297.

[134] H. Shokri-Ghadikolaei, F. Boccardi, C. Fischione,G. Fodor, and M. Zorzi, “Spectrum sharing inmmWave cellular networks via cell association,coordination, and beamforming,” IEEE J. Sel.Areas Commun., vol. 34, no. 11, pp. 2902–2917,Nov. 2016.

[135] Y. Niu, Y. Li, D. Jin, L. Su, and D. Wu, “Blockagerobust and efficient scheduling for directionalmmWave WPANs,” IEEE Trans. Veh. Technol.,vol. 64, no. 2, pp. 728–742, Feb. 2015.

[136] A. Kinalis, S. Nikoletseas, D. Patroumpa, andJ. Rolim, “Biased sink mobility with adaptive stoptimes for low latency data collection in sensornetworks,” Inf. Fusion, vol. 15, pp. 56–63,Jan. 2014.

[137] F. Boccardi et al., “Spectrum pooling in mmWavenetworks: Opportunities, challenges, andenablers,” IEEE Commun. Mag., vol. 54, no. 11,pp. 33–39, Nov. 2016.

[138] V. Petrov, M. Komarov, D. Moltchanov, J. M. Jornet,and Y. Koucheryavy, “Interference and SINR in

millimeter wave and terahertz communicationsystems with blocking and directional antennas,”IEEE Trans. Wireless Commun., vol. 16, no. 3,pp. 1791–1808, Mar. 2017.

[139] R. W. Heath, N. González-Prelcic, S. Rangan,W. Roh, and A. M. Sayeed, “An overview of signalprocessing techniques for millimeter wave MIMOsystems,” IEEE J. Sel. Topics Signal Process.,vol. 10, no. 3, pp. 436–453, Apr. 2016.

[140] C. Zhang, D. Guo, and P. Fan, “Tracking angles ofdeparture and arrival in a mobile millimeter wavechannel,” in Proc. IEEE ICC, May 2016, pp. 1–6.

[141] J. Zhao, F. Gao, W. Jia, S. Zhang, S. Jin, andH. Lin, “Angle domain hybrid precoding andchannel tracking for millimeter wave massiveMIMO systems,” IEEE Trans. Wireless Commun.,vol. 16, no. 10, pp. 6868–6880, Oct. 2017.

[142] R. Ford, M. Zhang, M. Mezzavilla, S. Dutta,S. Rangan, and M. Zorzi, “Achieving ultra-lowlatency in 5G millimeter wave cellular networks,”IEEE Commun. Mag., vol. 55, no. 3, pp. 196–203,Mar. 2017.

[143] D. Soldani and A. Manzalini, “Horizon 2020 andbeyond: On the 5G operating system for a truedigital society,” IEEE Veh. Technol. Mag., vol. 10,no. 1, pp. 32–42, Mar. 2015.

[144] J. I. Choi, M. Jain, K. Srinivasan, P. Levis, andS. Katti, “Achieving single channel, full duplexwireless communication,” in Proc. 16th Annu. Int.Conf. Mobile Comput. Netw., 2010, pp. 1–12.

[145] Z. Zhang, K. Long, A. V. Vasilakos, and L. Hanzo,“Full-duplex wireless communications:Challenges, solutions, and future researchdirections,” Proc. IEEE, vol. 104, no. 7,pp. 1369–1409, Jul. 2016.

[146] Z. Li et al., “Overview on initial METIS D2Dconcept,” in Proc. 1st Int. Conf. 5G UbiquitousConnectivity, Nov. 2014, pp. 203–208.

[147] Z. Li et al., “Overview on initial METIS D2Dconcept,” in Proc. 1st Int. Conf. 5G UbiquitousConnectivity (5GU), Nov. 2014, pp. 203–208.

[148] P. Gandotra and R. K. Jha, “Device-to-devicecommunication in cellular networks: A survey,”J. Netw. Comput. Appl., vol. 71, pp. 99–117,Aug. 2016.

[149] J. Qiao, X. Shen, J. Mark, Q. Shen, Y. He, andL. Lei, “Enabling device-to-device communicationsin millimeter-wave 5G cellular networks,” IEEECommun. Mag., vol. 53, no. 1, pp. 209–215,Jan. 2015.

[150] Y. Niu et al., “Exploiting device-to-devicecommunications to enhance spatial reuse forpopular content downloading in directionalmmWave small cells,” IEEE Trans. Veh. Technol.,vol. 65, no. 7, pp. 5538–5550, Jul. 2016.

[151] C. Chen, “C-RAN: The road towards green radioaccess network,” White Paper, 2011.

[152] P. Chanclou et al., “Optical fiber solution formobile fronthaul to achieve cloud radio accessnetwork,” in Proc. Future Netw. Mobile Summit,Jul. 2013, pp. 1–11.

[153] N. Cvijetic, A. Tanaka, K. Kanonakis, and T. Wang,“SDN-controlled topology-reconfigurable opticalmobile fronthaul architecture for bidirectionalCoMP and low latency inter-cell D2D in the 5Gmobile era,” Opt. Express, vol. 22, no. 17,pp. 20809–20815, 2014.

[154] H. Beyranvand, M. Lévesque, M. Maier, andJ. A. Salehi, “FiWi enhanced LTE-A HetNetswith unreliable fiber backhaul sharing andWiFi offloading,” in Proc. IEEE Int. Conf.Comput. Commun., Apr./May 2015,pp. 1275–1283.

[155] A. S. Tanenbaum and D. J. Wetherall, ComputerNetworks. Englewood Cliffs, NJ, USA:Prentice-Hall, 1996.

[156] I. Tinnirello and S. Choi, “Efficiency analysis ofburst transmissions with block ACK incontention-based 802.11e WLANs,” in Proc. IEEEInt. Conf. Commun. (ICC), vol. 5, May 2005,pp. 3455–3460.

[157] C. Shanti and A. Sahoo, “DGRAM: A delayguaranteed routing and MAC protocol for wirelesssensor networks,” IEEE Commun. Mobile Comput.,vol. 9, no. 10, pp. 1407–1423, Oct. 2010.

[158] C. L. I, S. Han, Z. Xu, S. Wang, Q. Sun, andY. Chen, “New paradigm of 5G wireless Internet,”IEEE J. Sel. Areas Commun., vol. 34, no. 3,pp. 474–482, Mar. 2016.

[159] D. Kreutz, F. Ramos, P. E. Veríssimo,C. E. Rothenberg, S. Azodolmolky, and S. Uhlig,“Software-defined networking: A comprehensivesurvey,” Proc. IEEE, vol. 103, no. 1, pp. 14–76,Jan. 2015.

[160] “Network functions virtualisation (NFV);architectural framework,” ETSI, Sophia Antipolis,France, Tech. Rep., 2016.

[161] N. F. Virtualisation, “NFV performance &portability best practises,” ETSI, Sophia Antipolis,France, Tech. Rep., 2014.

[162] B. Briscoe, “Network functions virtualisation(NFV)-NFV security: Problem statement,” ETSI,Sophia Antipolis, France, Tech. Rep., 2014.

[163] M. S. Siddiqui et al., “Hierarchical, virtualised anddistributed intelligence 5G architecture forlow-latency and secure applications,” Trans.Emerg. Telecommun. Technol., vol. 27, no. 9,pp. 1233–1241, Sep. 2016.

[164] T. Kurimoto et al., “SINET5: A low-latency andhigh-bandwidth backbone network for SDN/NFVEra,” in Proc. IEEE Int. Conf. Commun., May 2017,pp. 1–7.

[165] Y. Rekhter, T. Li, and S. Hares, “A border gatewayprotocol 4 (BGP-4),” Tech. Rep., 2005.

[166] A. Arins, “Latency factor in worldwide IP routednetworks,” in Proc. IEEE 2nd Workshop Adv. Inf.,Electron. Elect. Eng., Nov. 2014, pp. 1–4.

[167] H. E. Egilmez, S. Civanlar, and A. M. Tekalp, “Adistributed QoS routing architecture for scalablevideo streaming over multi-domain OpenFlownetworks,” in Proc. 19th IEEE Int. Conf. ImageProcess., Sep./Oct. 2012, pp. 2237–2240.

[168] P. Lin et al., “A west-east bridge based SDNinter-domain testbed,” IEEE Commun. Mag.,vol. 53, no. 2, pp. 190–197, Feb. 2015.

[169] F. Paolucci et al., “Interoperable multi-domaindelay-aware provisioning using segment routingmonitoring and BGP-LS advertisement,” in Proc.42nd Eur. Conf. Opt. Commun., Sep. 2016, pp. 1–3.

[170] K.-S. Lui, K. Nahrstedt, and S. Chen, “Routingwith topology aggregation in delay-bandwidthsensitive networks,” IEEE/ACM Trans. Netw.,vol. 12, no. 1, pp. 17–29, Feb. 2004.

[171] R. Singh and E. Modiano (Mar. 2017). “Optimalrouting for delay-sensitive traffic in overlaynetworks.” [Online]. Available:https://arxiv.org/abs/1703.07419

[172] X. Ji, “Models and algorithm for stochasticshortest path problem,” Appl. Math. Comput.,vol. 170, no. 1, pp. 503–514, Nov. 2005.

[173] M. S. Talebi, Z. Zou, R. Combes, A. Proutiere, andM. Johansson, “Stochastic online shortest pathrouting: The value of feedback,” IEEE Trans.Autom. Control, vol. 63, no. 4, pp. 915–930,Apr. 2018.

[174] S. Borst, V. Gupta, and A. Walid, “Distributedcaching algorithms for content distributionnetworks,” in Proc. IEEE INFOCOM, Mar. 2010,pp. 1–9.

[175] M. A. Maddah-Ali and U. Niesen, “Fundamentallimits of caching,” IEEE Trans. Inf. Theory, vol. 60,no. 5, pp. 2856–2867, May 2014.

[176] M. A. Maddah-Ali and U. Niesen, “Decentralizedcoded caching attains order-optimal memory-ratetradeoff,” IEEE/ACM Trans. Netw., vol. 23, no. 4,pp. 1029–1040, Aug. 2014.

[177] K. N. Doan, T. Van Nguyen, T. Q. S. Quek, andH. Shin, “Content-aware proactive caching forbackhaul offloading in cellular network,” IEEETrans. Wireless Commun., vol. 17, no. 5,pp. 3128–3140, May 2018.

[178] B. Tan and L. Massoulié, “Optimal contentplacement for peer-to-peer video-on-demandsystems,” IEEE/ACM Trans. Netw., vol. 21, no. 2,pp. 566–579, Apr. 2013.

[179] N. Golrezaei, K. Shanmugam, A. G. Dimakis,A. F. Molisch, and G. Caire, “FemtoCaching:Wireless video content delivery throughdistributed caching helpers,” in Proc. IEEE Int.Conf. Comput. Commun., Mar. 2012,

304 PROCEEDINGS OF THE IEEE | Vol. 107, No. 2, February 2019

Page 26: Low-Latency Networking: Where Latency Lurks and How to Tame Itmodiano/papers/CV_J_108.pdf · 2019-04-12 · Low-Latency Networking: Where Latency Lurks and How to Tame It This article

Jiang et al.: Low-Latency Networking: Where Latency Lurks and How to Tame It

pp. 1107–1115.[180] E. Bastug, M. Bennis, and M. Debbah, “Living on

the edge: The role of proactive caching in 5Gwireless networks,” IEEE Commun. Mag., vol. 52,no. 8, pp. 82–89, Aug. 2014.

[181] S. Bhattacharjee, K. L. Calvert, and E. W. Zegura,“Self-organizing wide-area network caches,” inProc. 17th Annu. Joint Conf. IEEE Comput.Commun. Soc. (INFOCOM), vol. 2, Mar. 1998,pp. 600–608.

[182] B. Li, M. J. Golin, G. F. Italiano, X. Deng, andK. Sohraby, “On the optimal placement of Webproxies in the Internet,” in Proc. IEEE INFOCOM,vol. 3. New York, NY, USA, Mar. 1999,pp. 1282–1290.

[183] Fog Computing and the Internet of Things: Extendthe Cloud to Where the Things Are, 2015.

[184] F. Bonomi, R. Milito, J. Zhu, and S. Addepalli,“Fog computing and its role in the Internet ofThings,” in Proc. 1st Ed. Workshop Mobile CloudComput., 2012, pp. 13–16.

[185] F. Bonomi, R. Milito, P. Natarajan, and J. Zhu, “Fogcomputing: A platform for Internet of Things andanalytics,” in Big Data and Internet of Things: ARoadmap for Smart Environments. New York, NY,USA: Springer-Verlag, 2014, pp. 169–186.

[186] Q. Li, H. Niu, A. Papathanassiou, and G. Wu,“Edge cloud and underlay networks: Empowering5G cell-less wireless architecture,” in Proc. 20thEur. Wireless Conf. Eur. Wireless, May 2014,pp. 1–6.

[187] N. Bessis and C. Dobre, Eds., Big Data and Internetof Things: A Roadmap for Smart Environments,vol. 546. New York, NY, USA: Springer-Verlag,2014.

[188] H. Gupta, A. V. Dastjerdi, S. K. Ghosh, andR. Buyya, “iFogSim: A toolkit for modeling andsimulation of resource management techniques inthe Internet of Things, Edge and Fog computingenvironments,” Softw., Pract. Exper., vol. 47, no. 9,pp. 1275–1296, Sep. 2017.

[189] M. Aazam and E.-N. Huh, “Fog computing and

smart gateway based communication for cloud ofthings,” in Proc. Int. Conf. Future Internet ThingsCloud (FiCloud), Aug. 2014, pp. 464–470.

[190] A. A. Zaidi et al., “Waveform and numerology tosupport 5G services and requirements,” IEEECommun. Mag., vol. 54, no. 11, pp. 90–98,Nov. 2016.

[191] S. Mumtaz, A. Alsohaily, Z. Pang, A. Rayes,K. F. Tsang, and J. Rodriguez, “Massive Internet ofThings for industrial applications: Addressingwireless IIoT connectivity challenges andecosystem fragmentation,” IEEE Ind. Electron.Mag., vol. 11, no. 1, pp. 28–33, Mar. 2017.

[192] D. De Guglielmo, S. Brienza, and G. Anastasi,“IEEE 802.15.4e: A survey,” Comput. Commun.,vol. 88, pp. 1–24, Aug. 2016.

[193] LTE-Like Performance With Wi-Fi-Like Simplicity,MulteFire, 2015.

[194] M. Labib, V. Marojevic, J. H. Reed, andA. I. Zaghloul (2017). “Extending LTE into theunlicensed spectrum: Technical analysis of theproposed variants.” [Online]. Available:https://arxiv.org/abs/1709.04458

[195] IEEE TSN (Time-Sensitive Networking): ADeterministic Ethernet Standard, TTTech, Vienna,Austria, 2015.

[196] S. Tasaka, “Multiple-access protocols for satellitepacket communication networks: A performancecomparison,” Proc. IEEE, vol. 72, no. 11,pp. 1573–1582, Nov. 1984.

[197] M. Giordani, M. Polese, A. Roy, D. Castor, andM. Zorzi (2018). “A tutorial on beam managementfor 3GPP NR at mmWave frequencies.” [Online].Available: https://arxiv.org/abs/1804.01908

[198] T. Levanen, J. Pirskanen, and M. Valkama, “Radiointerface design for ultra-low latencymillimeter-wave communications in 5G Era,” inProc. Globecom Workshops, Dec. 2014,pp. 1420–1426.

[199] S. Kaul, R. Yates, and M. Gruteser, “Real-timestatus: How often should one update?” in Proc.IEEE Int. Conf. Comput. Commun., Mar. 2012,

pp. 2731–2735.[200] L. Huang and E. Modiano, “Optimizing

age-of-information in a multi-class queueingsystem,” in Proc. IEEE Int. Symp. Inf. Theory,Jun. 2015, pp. 1681–1685.

[201] I. Kadota, E. Uysal-Biyikoglu, R. Singh, andE. Modiano, “Minimizing the age of informationin broadcast wireless networks,” in Proc. 54thAnnu. Allerton Conf. Commun., Control, Comput.,Sep. 2016, pp. 844–851.

[202] Optimizing Information Freshness in WirelessNetworks Under General Interference Constraints,ACM, New York, NY, USA, 2018.

[203] R. Talak, S. Karaman, and E. Modiano,“Minimizing age-of-information in multi-hopwireless networks,” in Proc. 55th Annu. AllertonConf. Commun., Control, Comput. (Allerton), Oct.2017, pp. 486–493.

[204] F. Boccardi, R. W. Heath, A. Lozano, T. L. Marzetta,and P. Popovski, “Five disruptive technologydirections for 5G,” IEEE Commun. Mag., vol. 52,no. 2, pp. 74–80, Feb. 2014.

[205] O. Goldreich, Foundations of Cryptography: BasicApplications, vol. 2. Cambridge, U.K.: CambridgeUniv. Press, 2009.

[206] W. K. Harrison, J. Almeida, M. R. Bloch,S. W. McLaughlin, and J. Barros, “Coding forsecrecy: An overview of error-control codingtechniques for physical-layer security,” IEEE SignalProcess. Mag., vol. 30, no. 5, pp. 41–50, Sep. 2013.

[207] C. Dwork and A. Roth, “The algorithmicfoundations of differential privacy,” Found. TrendsTheor. Comput. Sci., vol. 9, nos. 3–4, pp. 211–407,2014.

[208] L. Sweeney, “K-anonymity: A model for protectingprivacy,” Int. J. Uncertainty, Fuzziness Knowl.-BasedSyst., vol. 10, no. 5, pp. 557–570, 2002.

[209] L. Sankar, W. Trappe, K. Ramchandran, H. V. Poor,and M. Debbah, “The role of signal processing inmeeting privacy challenges: An overview,” IEEESignal Process. Mag., vol. 30, no. 5, pp. 95–106,Sep. 2013.

A B O U T T H E A U T H O R S

Xiaolin Jiang (Student Member, IEEE)received the M.Sc. degree in informa-tion and communication engineering fromHarbin Institute of Technology, Harbin,China, in 2015. Currently, she is workingtoward the Ph.D. degree at the Departmentof Network and Systems Engineering, RoyalInstitute of Technology (KTH), Stockholm,Sweden.Her research interests include industrial wireless communica-

tion networks, low-latency communication, and millimeter-wavecommunication.

Hossein Shokri-Ghadikolaei (StudentMember, IEEE) received the B.Sc. degree inelectrical engineering from Iran Universityof Science and Technology, Tehran, Iran,in 2009, the M.Sc. degree in electricalengineering from Sharif University ofTechnology, Tehran, Iran, in 2011, and thePh.D. degree in electrical engineering andcomputer science from Royal Institute ofTechnology (KTH), Stockholm, Sweden, in 2018.His research interests include optimization and machine learning,

with applications in communication networks.Dr. Shokri-Ghadikolaei was a recipient of the IEEE Communica-

tions Society Stephen O. Rice Prize (2018), the Premium Awardfor Best Paper in IET Communications (2014), the Program ofExcellence award from KTH (2013), and the Best Paper Award fromthe Iranian Student Conference of Electrical Engineering (2011).

He was selected as an Exemplary Reviewer for the IEEETRANSACTIONS ON COMMUNICATIONS in 2017 and 2018. He is a mem-ber of the working group 1900.1 in the IEEE Dynamic SpectrumAccess Networks Standards Committee (DySPAN-SC).

Gabor Fodor (Senior Member, IEEE)received the M.Sc. and Ph.D. degrees inelectrical engineering (major in commu-nications engineering) from the BudapestUniversity of Technology and Economicsin 1988 and 1998, respectively.In 1998, he joined Ericsson Research,

Stockholm, Sweden, where he is currentlya Master Researcher specializing in theperformance analysis and design of wireless networks. Since 2017,he has also been an Adjunct Professor at the Royal Instituteof Technology (KTH), Stockholm, Sweden. He has publishedover 100 papers in peer-reviewed conference proceedings andjournals, and is a coauthor of over 80 international patentapplications (pending or granted). His current research interestsinclude physical layer and medium-access control layer algorithmsfor cellular networks, multiple-antenna systems, and full-duplexcommunications.Prof. Fodor was a corecipient of the IEEE Communications

Society Stephen O. Rice prize in 2018. He is an Editor of the IEEETRANSACTIONS ON WIRELESS COMMUNICATIONS.

Vol. 107, No. 2, February 2019 | PROCEEDINGS OF THE IEEE 305

Page 27: Low-Latency Networking: Where Latency Lurks and How to Tame Itmodiano/papers/CV_J_108.pdf · 2019-04-12 · Low-Latency Networking: Where Latency Lurks and How to Tame It This article

Jiang et al.: Low-Latency Networking: Where Latency Lurks and How to Tame It

Eytan Modiano (Fellow, IEEE) received theB.S. degree in electrical engineering andcomputer science from the University ofConnecticut, Storrs, CT, USA, in 1986, andthe M.S. and Ph.D. degrees in electrical engi-neering from the University of Maryland,College Park, MD, USA, in 1989 and 1992,respectively.He was a Naval Research Laboratory Fel-

low between 1987 and 1992 and a National Research Council Post-doctoral Fellow during 1992–1993. Between 1993 and 1999, he waswith MIT Lincoln Laboratory, Cambridge, MA, USA. Since 1999, hehas been on the faculty at MIT, where he is a Professor andAssociate Department Head at the Department of Aeronautics andAstronautics, and Associate Director of the Laboratory for Informa-tion and Decision Systems (LIDS). His research is on communicationnetworks and protocols with emphasis on satellite, wireless, andoptical networks.Prof. Modiano is the corecipient of the Infocom 2018 Best

Paper Award, the MobiHoc 2018 Best Paper Award, the Mobi-Hoc 2016 Best Paper Award, the Wiopt 2013 Best Paper Award,and the Sigmetrics 2006 Best Paper Award. He is the Editor-in-Chief of the IEEE/ACM TRANSACTIONS ON NETWORKING, and servedas an Associate Editor of the IEEE TRANSACTIONS ON INFORMATIONTHEORY and the IEEE/ACM TRANSACTIONS ON NETWORKING. He wasthe Technical Program Co-Chair for IEEE Wiopt 2006, IEEE Infocom2007, ACM MobiHoc 2007, and DRCN 2015. He is an AssociateFellow of the American Institute of Aeronautics and Astronautics(AIAA), and served on the IEEE Fellows committee.

Zhibo Pang (Senior Member, IEEE)received the B.Eng. degree in electronicengineering from Zhejiang University,Hangzhou, China, in 2002, the MBAdegree in Innovation and Growth from theUniversity of Turku, Turku, Finland, in 2012,and the Ph.D. degree in electronic andcomputer systems from the Royal Instituteof Technology (KTH), Stockholm, Sweden,in 2013.He is currently a Principal Scientist on Wireless Communications

at ABB Corporate Research, Västeras, Sweden, leading researchon digitalization solutions for smart buildings and homes, roboticsand factories, healthcare and logistics, power electronics, andpower systems. He is also Affiliated Faculty at Royal Institute ofTechnology (KTH), and taking adjunct roles at Tsinghua University,University of Sydney, and Zhejiang University. Before joined ABB,he was cofounder and CTO of startups such as Ambigua Medito AB.Dr. Pang is a Co-Chair of the Technical Committee on Industrial

Informatics. He is an Associate Editor of the IEEE TRANSACTIONS ON

INDUSTRIAL INFORMATICS and the IEEE JOURNAL OF BIOMEDICAL ANDHEALTH INFORMATICS, and a Guest Editor of the PROCEEDINGS OF THE

IEEE, the IEEE INTERNET OF THINGS JOURNAL, and IEEE REVIEWS IN

BIOMEDICAL ENGINEERING. He was awarded the 2016 Inventor of theYear Award by ABB Corporate Research Sweden, the only award forindividuals, with only one winner per year out of 300+ researchers.

Michele Zorzi (Fellow, IEEE) receivedthe Laurea and Ph.D. degrees in electricalengineering from the University ofPadova, Padova, Italy, in 1990 and 1994,respectively.From 1992 to 1993, he was on leave

from the University of California at SanDiego (UCSD), La Jolla, CA, USA. In 1993,he joined the Faculty of the Dipartimentodi Elettronica e Informazione, Politecnico di Milano, Milan, Italy.He was with the Center for Wireless Communications, UCSD,for three years. In 1998, he joined the School of Engineering,University of Ferrara, Ferrara, Italy, where he became a Professor in2000. Since 2003, he has been with the faculty of the InformationEngineering Department, University of Padova. His currentresearch interests include performance evaluation in mobilecommunications systems, WSN and Internet-of-Things, cognitivecommunications and networking, 5G mmWave cellular systems,and underwater communications and networks.Prof. Zorzi is currently the Editor-in-Chief of the IEEE

TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING,and was the Editor-in-Chief of IEEE WIRELESS COMMUNICATIONSfrom 2003 to 2005 and the IEEE TRANSACTIONS ON COMMUNICATIONSfrom 2008 to 2011. He served as a Member-at-Large of the Board ofGovernors of the IEEE Communications Society from 2009 to 2011,and as its Director of Education from 2014 to 2015.

Carlo Fischione (Member, IEEE) receivedthe Laurea degree in electronic engineering(Laurea, summa cum laude, 5/5 years) andthe Ph.D. degree in electrical and informa-tion engineering (3/3 years) from the Univer-sity of L’Aquila, Italy, in May 2005 and April2001, respectively.He has held research positions at the

Massachusetts Institute of Technology, Cam-bridge, MA, USA (2015, Visiting Professor); Harvard University,Cambridge, MA, USA (2015, Associate); and the University of Cali-fornia at Berkeley, Berkeley, CA, USA (2004–2005, Visiting Scholar,and 2007–2008, Research Associate). He is currently a tenuredAssociate Professor at the Electrical Engineering and ComputerScience, Department of Network and Systems Engineering, RoyalInstitute of Technology (KTH), Stockholm, Sweden. He also hasoffered his advice as a consultant to numerous technology com-panies such as ABB Corporate Research, Berkeley Wireless SensorNetwork Lab, Ericsson Research, Synopsys, and United TechnologyResearch Center. He is cofunder and Scientific Director of MINDMusic Labs. He has coauthored over 180 publications, including abook, book chapters, international journals and conferences, andinternational patents. His research interests include optimizationwith applications to networks, wireless and sensor networks, Inter-net of Things, and digital systems.Prof. Fischione received or coreceived a number of awards, such

as the IEEE Communication Society S. O. Rice Best Paper Awardof 2018 for the best IEEE TRANSACTIONS ON COMMUNICATIONS paper,the Best Paper Award of the IEEE TRANSACTIONS ON INDUSTRIALINFORMATICS (2007), the Best Paper Awards at the IEEE InternationalConference on Mobile Ad-hoc and Sensor System 2005 and 2009(IEEE MASS 2005 and IEEE MASS 2009), the Best Paper Award ofthe IEEE Sweden VT-COM-IT Chapter (2014), the Best Business Ideaawards from VentureCup East Sweden (2010) and from StockholmInnovation and Growth (STING) Life Science in Sweden (2014), the“Ferdinando Filauro” award from University of L’Aquila, Italy (2003),the “Higher Education” award from Abruzzo Region Government,Italy (2004), the Junior Research award from the Swedish ResearchCouncil (2007), the “Silver Ear of Wheat” award in history fromthe Municipality of Tornimparte, Italy (2012). He is an Editor of theIEEE TRANSACTIONS ON COMMUNICATIONS and an Associate Editor ofIFAC Automatica. He is an Ordinary Member of DASP (the ItalianAcademy of History Deputazione Abruzzese di Storia Patria).

306 PROCEEDINGS OF THE IEEE | Vol. 107, No. 2, February 2019