
Application-Aware Acceleration for Wireless Data Networks: Design Elements and Prototype Implementation

Zhenyun Zhuang, Student Member, IEEE, Tae-Young Chang, Student Member, IEEE,

Raghupathy Sivakumar, Senior Member, IEEE, and Aravind Velayutham, Member, IEEE

Abstract—A tremendous amount of research has been done toward improving transport-layer performance over wireless data networks. The improved transport layer protocols are typically application-unaware. In this paper, we argue that the behavior of applications can and does dominate the actual performance experienced. More importantly, we show that for practical applications, application behavior all but completely negates any improvement achievable through better transport layer protocols. In this context, we motivate an application-aware, but application transparent, solution suite called A3 (application-aware acceleration) that uses a set of design principles realized in an application-specific fashion to overcome the typical behavioral problems of applications. We demonstrate the performance of A3 through both emulations using realistic application traffic traces and implementations using the NetFilter utility.

Index Terms—Wireless networks, application-aware acceleration.


1 INTRODUCTION

A significant amount of research has been done toward the development of better transport layer protocols that can alleviate the problems the Transmission Control Protocol (TCP) exhibits in wireless environments [12], [16], [17], [26]. Such protocols, and several more, have novel and unique design components that are indeed important for tackling the unique characteristics of wireless environments. However, in this paper, we ask a somewhat orthogonal question in the very context the above protocols were designed for: How does the application's behavior impact the performance deliverable to wireless users?

Toward answering this question, we explore the impact of typical wireless characteristics on the performance experienced by several very popularly used real-world applications, including the File Transfer Protocol (FTP), the Common Internet File System (CIFS) protocol [1], the Simple Mail Transfer Protocol (SMTP), and the Hypertext Transfer Protocol (HTTP). Through our experiments, we arrive at an impactful result: Except for FTP, which has a simple application layer behavior, for all other applications considered, not only is the performance experienced when using vanilla TCP-NewReno much worse than for FTP, but the applications see negligible or no performance enhancements even when they are made to use the wireless-aware protocols.

We delve deeper into the above observation and identify several common behavioral characteristics of the applications that fundamentally limit the performance achievable when operating over wireless data networks. Such characteristics stem from the design of the applications, which is typically tailored for operation in substantially higher quality local area network (LAN) environments. Hence, we pose the question: if application behavior is a major cause for performance degradation as observed through the experiments, what can be done to improve the end-user application performance?

In answering the above question, we present a new solution called Application-Aware Acceleration (A3, pronounced as "A-cube"), which is a middleware that offsets the typical behavioral problems of real-life applications through an effective set of principles and design elements. A3's design rests on five underlying design principles: transaction prediction, prioritized fetching, redundant and aggressive retransmissions, application-aware encoding, and infinite buffering. The design principles are derived explicitly with the goal of addressing the aforementioned application layer behavioral problems. We present A3 as a platform solution requiring entities at both ends of the end-to-end communication, but also describe a variation of A3 called A3· (pronounced as "A-cube dot"), which is a point solution but is not as effective as A3. One of the keystone aspects of the A3 design is that it is application-aware, but application transparent.

The rest of the paper is organized as follows: Section 2 presents the motivation results for A3. Section 3 presents the key design elements underlying the A3 solution.


. Z. Zhuang is with the College of Computing, Georgia Institute of Technology, 350333 Georgia Tech Station, Atlanta, GA 30332. E-mail: [email protected].

. T.-Y. Chang is with Xiocom Wireless, 3505 Koger Boulevard, Suite 400, Duluth, GA 30096. E-mail: [email protected].

. R. Sivakumar is with the School of Electrical and Computer Engineering, Georgia Institute of Technology, 5164 Centergy, 75 Fifth Street NW, Atlanta, GA 30308. E-mail: [email protected].

. A. Velayutham is with Asankya, Inc., 75 Fifth Street NW, Atlanta, GA 30308. E-mail: [email protected].

Manuscript received 9 Feb. 2008; revised 19 Nov. 2008; accepted 10 Feb. 2009; published online 19 Feb. 2009. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TMC-2008-02-0045. Digital Object Identifier no. 10.1109/TMC.2009.52.


Section 4 describes the realization of A3 for specific applications. Section 5 evaluates A3 and presents a proof-of-concept prototype of A3 using the NetFilter utility. Section 6 discusses related work, and Section 7 concludes the paper.

2 MOTIVATION

The focus of this work is entirely on applications that require reliable, in-sequence packet delivery. In other words, we consider only applications that are traditionally developed with the assumption of using the TCP transport layer protocol.

2.1 Evaluation Model

We now briefly present the setting and methodology employed for the results presented in the rest of the section.

2.1.1 Applications

For the results presented in this section, we consider four different applications: FTP, CIFS, SMTP, and HTTP.

. CIFS: Common Internet File System is a platform-independent network protocol used for sharing files, printers, and other communication abstractions between computers. While originally developed by Microsoft, CIFS is currently an open technology that is used for all Windows workgroup file sharing, NT printing, and the Linux Samba server.1

. SMTP: Simple Mail Transfer Protocol is used for the exchange of e-mails either between mail servers or between a client and its server. Most e-mail systems that use the Internet for communication use SMTP.

. HTTP: Hypertext Transfer Protocol is the underlying protocol used by the World Wide Web (WWW).

2.1.2 Traffic Generator

We use IxChariot [18] to generate accurate application-specific traffic patterns. IxChariot is a commercial tool for emulating most real-world applications. It consists of the IxChariot console (for control), performance endpoints (for traffic generation and reception), and IxProfile (for characterizing performance).

2.1.3 Testbed

We use a combination of a real testbed and emulation to construct the testbed for the results presented in this section. Since IxChariot is a software tool that generates actual application traffic, it is hosted on the sending and receiving machines as shown in Fig. 10. The path from the sender to the receiver goes through a node running the Network Simulator (NS2) [27] in emulation mode. The network emulator is configured to represent the desired topologies, including the different types of wireless technologies. More information on the testbed is presented in Section 5.

2.1.4 Transport Protocols

Since we consider wireless LANs (WLANs), wireless WANs (WWANs), and wireless satellite area networks (SATs), we use transport layer protocols proposed in the related literature for each of these environments. Specifically, we use NewReno with Explicit Loss Notification (TCP-ELN) [12], Wide-area Wireless TCP (WTCP) [26], and the Satellite Transport Protocol (STP) [16] as enhanced transport protocols for WLANs, WWANs, and SATs, respectively.

2.1.5 Parameters

We use average RTT values of 5, 200, and 1,000 ms, average loss rates of 1, 8, and 3 percent, and average bandwidths of 5, 0.1, and 1 Mbps for WLANs, WWANs, and SATs, respectively. We simulate wireless channels by introducing these link parameters into packet-level traffic with NS2 emulation. The default Ethernet LAN MAC protocol is used. The purpose of such a simplified wireless setup is to better isolate the impact of application behavior from the effects of complicated wireless MAC protocols. We use application-perceived throughput as the key metric of interest. Each data point is taken as an average of 10 different experimental runs.
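
For concreteness, the three link profiles above can be written down as a small configuration table. The following Python snippet simply restates the values from the text; the dictionary layout is ours and is not the authors' emulation configuration format:

    # Link profiles used to drive the NS2 emulation (values from Section 2.1.5).
    WIRELESS_PROFILES = {
        "WLAN": {"rtt_ms": 5,    "loss": 0.01, "bw_mbps": 5.0},
        "WWAN": {"rtt_ms": 200,  "loss": 0.08, "bw_mbps": 0.1},
        "SAT":  {"rtt_ms": 1000, "loss": 0.03, "bw_mbps": 1.0},
    }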

2.2 Quantitative Analysis

Fig. 1a presents the performance results for FTP under varying loss conditions in WLAN, WWAN, and SAT environments. The tailored protocols uniformly show considerable performance improvements. The results illustrate that the design of enhancement protocols such as TCP-ELN, WTCP, and STP is sufficient to deliver considerable improvements in performance for wireless data networks when FTP is the application. In the rest of the section, we discuss the impact of using such protocols for other applications such as CIFS, SMTP, and HTTP.

Figs. 1b, 1c, and 1d show the performance experienced by CIFS, SMTP, and HTTP, respectively, under varying loss conditions for the different wireless environments. It can be observed that the performance improvements demonstrated by the enhancement protocols for FTP do not carry over to these three applications. It can also be observed that the maximum performance improvement delivered by the enhancement protocols is less than 5 percent across all scenarios.

While the trend evident from the results discussed above is that the enhanced wireless transport protocols do not provide performance improvements for three very popularly used applications, we argue in the rest of the section that this is not due to any fundamental limitations of the transport protocols themselves, but due to the specifics of the behavior of the three applications under consideration.

2.3 Impact of Application Behavior

We now explain the lack of performance improvements when using enhanced wireless transport protocols with applications such as CIFS, SMTP, and HTTP. We use the conceptual application traffic patterns for the three applications shown in Fig. 2 for most of our reasoning.

2.3.1 Thin Session Control Messages

All three applications, as observed in Fig. 2, use thin session control message exchanges before the actual data transfer occurs, and thin request messages during the actual data transfer phase as well. We use the term "thin" to refer to the fact that such messages are almost always contained in a single packet of maximum segment size (MSS).


1. Samba uses SMB on which CIFS is based.

The observation above has two key consequences:

. When a loss occurs to a thin message, an entire round-trip time (RTT) is taken to recover from such a loss. When the round-trip time is large, as in WWANs and SATs, this can considerably inflate the overall transaction time for the applications. Note that a loss during the data phase will not have such an adverse impact, as the recovery from that loss can be multiplexed with other new data transmissions, whereas for thin message losses, no other traffic can be sent anyway.

. Most protocols, including TCP, rely on the arrival of out-of-order packets to infer packet losses, and hence, trigger loss recovery. In the case of thin messages, since there are no packets following the lost message, the only means for loss detection is the expiry of the retransmission timer. Retransmission timers typically have coarse minimum values to keep overheads low. TCP, for example, typically uses a minimum Retransmission Time Out (RTO) value of one second.2

2.3.2 Block-Based Data Fetches

Another characteristic of the applications, especially CIFS and HTTP, is that although the total amount of data to be fetched can be large, the data transfer is performed in blocks, with each block involving a "request-response" exchange. CIFS uses its request-data-block message to send the block requests, with each request typically asking for only 16 to 32 KB of data.

Such block-based fetching of data has two implications for performance: 1) When the size of the requested data is smaller than the bandwidth-delay product (BDP), there is a gross underutilization of the available resources. For example, when the SAT network has a BDP of 128 KB and CIFS uses a 16 KB request size, the utilization is only 12.5 percent.


Fig. 2. Application traffic patterns. (a) CIFS. (b) SMTP. (c) HTTP (single connection case).

Fig. 1. Impact of wireless environment characteristics on application throughput. (a) FTP (WLAN, WWAN, and SAT). (b) CIFS (WLAN, WWAN, and SAT). (c) SMTP (WLAN, WWAN, and SAT). (d) HTTP (WLAN, WWAN, and SAT).

2. While newer Linux releases have lower minimum RTO values, they are still on the order of several hundred milliseconds.

2) Independent of the size of each requested data block, one RTT is spent in sending the next request once the currently requested data arrive. When the RTT of the path is large, as in WWANs and SATs, this can inflate the overall transaction time, and hence, lower throughput performance.
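
To see how severely block-based fetching can inflate transfer times on long-delay paths, the following back-of-the-envelope model (a sketch of ours, not the authors' analysis) charges one RTT plus serialization time per block. With the SAT parameters from Section 2.1.5 (1 s RTT, 1 Mbps), a 1 MB file fetched in 16 KB blocks takes roughly 72 s, dominated by 64 request RTTs, versus roughly 9 s for a single bulk transfer; likewise, a 16 KB request against a 128 KB BDP fills only 16/128 = 12.5 percent of the pipe.

    def block_fetch_time(file_kb, block_kb, rtt_s, bw_kbps):
        """Rough transfer-time model for request-response (block-based) fetching.
        Ignores slow start, losses, and pipelining; illustrative only."""
        blocks = -(-file_kb // block_kb)                 # ceiling division
        per_block = rtt_s + (block_kb * 8) / bw_kbps     # one RTT plus serialization per block
        return blocks * per_block

    blocked = block_fetch_time(1024, 16, 1.0, 1000)      # ~72 s over a SAT-like path
    bulk    = block_fetch_time(1024, 1024, 1.0, 1000)    # ~9 s for one large transfer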

2.3.3 Flow-Control Bottlenecked Operations

Flow control is an important function in communication that helps prevent the source from overwhelming the receiver. In a mobile/wireless setting, flow control can kick in and become the bottleneck for the connection's progress for two reasons: 1) If the application on the mobile device reads slowly or is temporarily halted for some other reason, the receiver buffer fills up and the source is eventually frozen until the buffer empties. 2) When there are losses in the network and the receiver buffer size is of the same order as the BDP (which is typically true), flow control can prevent new data transmissions even when techniques such as fast recovery are used, due to the unavailability of buffer space at the receiver. With fast recovery, the sender inflates the congestion window to compensate for the new ACKs received. However, this inflation may be curbed by the flow-control mechanism if there is no buffer space on the receiver side.

2.3.4 Other Reasons

While the reasons discussed above are behavioral "acts of commission" by the applications that result in lowered performance, we now discuss two more reasons that can be seen as behavioral "acts of omission." These are techniques that the applications could have used to address conditions in a wireless environment, but do not.

Nonprioritization of data. For all three applications considered, no explicit prioritization of the data to be fetched is performed, and hence, all the data to be fetched are given equal importance. However, for certain applications, prioritizing data in a meaningful fashion can have a profound impact on the performance experienced by the end system or user. For example, consider the case of HTTP used for browsing on a small-screen PDA. When a Web page URL request is issued, HTTP fetches all the data for the Web page with equal importance. However, the data corresponding to the visible portion of the Web page on the PDA's screen are obviously of more importance and will have a higher impact on the performance perceived by the end user. Thus, leveraging some means of prioritization can help deliver better performance to the user. Without such prioritization, HTTP performance is bounded by the original data size and the low bandwidths of the wireless environment.

Nonuse of data reduction techniques. Finally, another issue is that applications do not use knowledge specific to their content or behavior to employ effective data reduction techniques. For example, considering the SMTP application, the "e-mail vocabulary" of users has evolved over the last couple of decades to be quite independent of users' traditional "writing vocabulary" and "verbal vocabulary." Hence, it is an interesting question whether SMTP can use e-mail vocabulary-based techniques to reduce the actual content transferred between SMTP servers, or between an SMTP server and a client. Not leveraging such aspects proves to be of more significance in wireless environments, where the baseline performance is poor to start with.

3 DESIGN

Having outlined several behavioral problems with applications in Section 2, an obvious question to ask is: "Why not change the applications to address these problems?" We believe that is indeed one possible solution. Hence, we structure the presentation of the A3 solution into two distinct components: 1) the key design elements or principles that underlie A3 and 2) the actual realization of the design elements for specific applications in the form of an optimization middleware that is application-aware, but application transparent. The design elements generically present strategies to improve application behavior and can be used by application developers to improve performance by incorporating changes into the applications directly. In the rest of this section, we outline the five design principles in the A3 solution.

3.1 Transaction Prediction (TP)

TP is an approach to deterministically predict future application data requests to the server and issue them ahead of time. Note that this is different from techniques such as "opportunistic prefetching," where content is heuristically fetched to speed up later access but is not guaranteed to be used.3 In TP, A3 is fully aware of application semantics and knows exactly what data to fetch and that the data will be used. TP helps in conditions where the BDP is larger than the default application block fetch size, and where the RTT is very large. In both cases, the overall throughput improves when TP is used. Fig. 3a shows the throughput performance of CIFS when fetching files of varying sizes over a 100 Mbps LAN.


3. We discuss this issue further in Section 7.

Fig. 3. Motivation for TP and RAR. (a) Throughput of FTP and CIFS. (b) Number of requests of CIFS. (c) Throughput of SMTP.

It can be seen that the performance is substantially lower than that of FTP, and this is due to the block-based fetching mechanism described in Section 2. Fig. 3b shows the number of transactions it takes CIFS to fetch a single file; it can be observed that the number of transactions increases linearly with the file size. Under such conditions, TP "parallelizes" the transactions, and hence, improves throughput performance. Good examples of applications that will benefit from using TP include CIFS and HTTP, for reasons outlined in Section 2.

3.2 Redundant and Aggressive Retransmissions (RARs)

RAR is an approach to better protect thin session control and data request messages from losses. The technique involves recognizing thin application messages and using a combination of packet-level redundancy and aggressive retransmissions to protect such messages. RAR helps address both issues with thin messages identified in Section 2. The redundant transmissions reduce the probability of message losses, and the aggressive retransmissions, which operate on tight RTT-granularity time-outs, reduce the loss recovery time. The key challenge in RAR is to recognize thin messages in an application-aware fashion. Note that only thin messages require RAR, for the reasons outlined in Section 2. Regular data messages should not be subjected to RAR, both because their loss recovery can be masked in the overall transaction time by performing the recovery simultaneously with other data packet transmissions, and because the overheads of performing RAR would become untenable when applied to large-volume messages such as the data. Fig. 3c shows the throughput performance of SMTP under lossy conditions in a WWAN setup. The 35 percent drop in throughput performance for a loss-rate increase from 0 to 7 percent is much larger than the 15 percent drop in FTP performance for the same loss-rate increase. Typical applications that can benefit from RAR include CIFS, SMTP, and HTTP.
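
A rough way to see why a small redundancy factor is enough, assuming independent packet losses (the function and the independence assumption are ours):

    def residual_loss(p, redundancy):
        """Probability that a thin message and all of its redundant copies are lost,
        assuming independent losses. With p = 0.10 and redundancy = 2 (two extra
        copies), this is 0.1 ** 3 = 0.001, i.e., about 0.1 percent."""
        return p ** (1 + redundancy)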

3.3 Prioritized Fetching (PF)

PF is an approach to prioritize subsets of the data to be fetched as being more important than others and to fetch the higher priority data faster than the lower priority data. A simple approach to achieve the dual-rate fetching is to use default TCP-like congestion control for the high-priority data, but congestion control like that in TCP-LP [19] for low-priority data. An important consideration in PF is to devise a strategy to prioritize data intelligently and on the fly. Fig. 4a shows the average transfer sizes per screen, as well as for the entire Web page, for the top 50 accessed Web pages on the World Wide Web [2]. It can be seen that nearly 80 percent of the data (belonging to screens 2 and higher) do not directly impact the response time experienced by the user, and hence, can be deprioritized in relation to the data pertaining to the first screen. Note that the results are for a laptop screen with a 1,024 x 768 resolution, and will, in fact, be better for smaller screen devices such as PDAs. Good examples of applications that can benefit from PF include HTTP and SMTP.

3.4 Infinite Buffering (IB)

IB is an approach that prevents flow control from throttling the progression of a network connection terminating at the mobile wireless device. IB prevents flow control from impacting performance by giving the sender the impression of an infinite buffer at the receiver. Secondary storage is used to realize such an infinite buffer, with the main rationale being that reading the data from secondary storage, once space is created in the actual connection buffer at a later point, will be faster than fetching it from the sender over the wireless network. With typical hard disk data transfer rates today being around 250 Mbps [5], the above-mentioned rationale is well justified for wireless environments. Note that the trigger for using IB can be either the application reading slowly or temporarily not reading from the connection buffer, or losses on the wireless path. Figs. 4b and 4c show the throughput performance of SMTP under both conditions. Note that the ideal scenarios correspond to an upper bound on the throughput.4 It can be observed that for both scenarios, flow control drastically lowers performance compared to what is achievable. Due to lack of space, in the rest of the paper we focus on IB specifically in the context of the more traditional trigger for flow control—the application reading bottleneck. Typical applications that can benefit from IB include CIFS, SMTP, and HTTP—essentially, any application that may attempt to transfer more than a BDP worth of data.

3.5 Application-Aware Encoding (AE)

AE is an approach that uses application-specific information to better encode or compress data during communication. Traditional compression tools such as zip operate on a given content in isolation,


Fig. 4. Motivation for PF (a) and IB (b, c). (a) Transfer size per screen. (b) Impact of application reading rate. (c) Impact of loss increase.

4. In Fig. 4c, the throughput drop is caused by both flow-control and congestion-control-related mechanisms, and the flow-control mechanism contributes significantly.

without any context for the application corresponding to the content. AE, on the other hand, explicitly uses this contextual information to achieve better performance. Note that AE is not a better compression algorithm; rather, it is a better way of identifying the data sets that a given compression algorithm should operate on. Table 1 shows the average e-mail vocabulary characteristics of 10 different graduate students, based on 100 e-mails sent by each person during two weeks. Two characteristics in the results are interesting: 1) the e-mail vocabulary size across the 10 people is relatively small (a few thousand words), and 2) even a simple encoding using this knowledge results in every word being encoded with only 10 to 12 bits, which is substantially lower than the 40 to 48 bits required using standard binary encoding. In Section 5, we show that such vocabulary-based encoding can considerably outperform standard compression tools such as zip as well. Moreover, further benefits can be attained if more sophisticated compression schemes such as Huffman encoding are employed instead of a simple binary encoding. Typical applications that can benefit from using AE include SMTP and HTTP.
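
The 10 to 12 bit figure follows directly from the vocabulary sizes in Table 1: indexing a vocabulary of V words needs about log2(V) bits per word. A minimal sketch (ours):

    import math

    def bits_per_word(vocabulary_size):
        """Bits needed to index one word in a per-user e-mail vocabulary.
        A few thousand distinct words need only 10-12 bits per word, versus
        roughly 40-48 bits for a 5-6 character word at 8 bits per character."""
        return math.ceil(math.log2(vocabulary_size))

    bits_per_word(1000)   # 10
    bits_per_word(4000)   # 12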

4 SOLUTION

4.1 Deployment Model and Architecture

The A3 deployment model is shown in Fig. 5. Since A3 is a platform solution, it requires two A3-aware entities, one at either end of the communication session. At the mobile device, A3 is a software module installed in user space. At the server side, while A3 can be deployed as a software module on all servers, a more elegant solution is to deploy a packet-processing network appliance that processes all content flowing from the servers to the wide area network. We assume the latter model for our discussions. However, note that A3 can be deployed in either fashion, as it is purely a software solution.

This deployment model helps in any communication between a server behind the A3 server and the mobile device running the A3 module. However, if the mobile device communicates with a non-A3-enabled server, two options exist: 1) as we discuss later in the paper, A3 can be used as a point solution with lesser effectiveness, or 2) the A3 server can be brought closer to the mobile device, perhaps within the wireless network provider's access network. In the rest of the paper, we do not delve into the latter option. However, we do revisit the point-solution mode of operation of A3.

We present an A3 implementation that resides in user space and uses the NetFilter utility in Linux to capture outgoing and incoming traffic at the mobile device. NetFilter is a Linux-specific packet capture tool that has hooks at multiple points in the Linux kernel. The A3 hooks are registered at the Local-In and Local-Out stages of the chain of hooks in NetFilter. While our discussion is Linux-centric, it can be mapped onto the Windows operating system through the use of the Windows Packet Filtering interface, or wrappers such as PktFilter that are built around that interface. Fig. 6a shows the A3 deployment on the mobile device using NetFilter.

The A3 software architecture is shown in Fig. 6b. Since the design elements in A3 are to a large extent independent of each other, a simple chaining of the elements in an appropriate fashion results in an integrated A3 architecture. The specific order in which the elements are chained in the A3 realization is TP, RAR, PF, IB, and AE.


TABLE 1. Statistics of 100 E-Mails Sent by 10 Users

Fig. 5. Deployment model.

Fig. 6. A3 deployment model with NetFilter and software architecture. (a) Deployment with Netfilter. (b) Software architecture.

While RAR protects the initial session control exchanges and the data requests, it operates on traffic after TP, given that TP can generate new requests for data. PF manipulates the priority with which different requests are served, and IB ensures that data responses are not throttled by flow control. Finally, AE compresses any outgoing data and decompresses any incoming data.
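
Because the elements are largely independent, the chaining described above can be pictured as a simple handler pipeline. The sketch below is ours: the class names and the choice to traverse the chain in reverse for incoming traffic are assumptions for illustration, not the A3 code.

    class Element:
        """Base class for a chained A3 design element (TP, RAR, PF, IB, or AE)."""
        def outbound(self, msg):   # client -> network direction
            return [msg]
        def inbound(self, msg):    # network -> client direction
            return [msg]

    class A3Chain:
        def __init__(self, elements):
            # Chained in the order used by A3: [TP(), RAR(), PF(), IB(), AE()]
            self.elements = elements

        def send(self, msg):
            msgs = [msg]
            for e in self.elements:                  # outgoing traffic flows TP -> ... -> AE
                msgs = [out for m in msgs for out in e.outbound(m)]
            return msgs

        def receive(self, msg):
            msgs = [msg]
            for e in reversed(self.elements):        # assumed: incoming traffic unwinds the chain
                msgs = [out for m in msgs for out in e.inbound(m)]
            return msgs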

4.2 Application Overviews

Since we describe the actual operations of the mechanisms in A3 in the context of one of the three applications, we now briefly comment on the specific message types involved in typical transactions of those applications. We then refer to these specific message types when describing the operations of A3 subsequently.

Due to lack of space, instead of presenting all message types again, we refer readers back to Fig. 2 to observe the message exchanges for the three applications. Labels such as CIFS-x refer to particular message types in CIFS and will be referred to in the A3 realization descriptions that follow.

CIFS, also sometimes known as Server Message Block (SMB), is a platform-independent protocol for file sharing. The typical message exchanges in a CIFS session are shown in Fig. 2a. Overall, TP manipulates the CIFS-11 message, RAR operates on CIFS-1 through CIFS-11, and IB aids in CIFS-12.

SMTP is Internet’s standard host-to-host mail transportprotocol and traditionally operates over TCP. The typicalmessage exchanges in an SMTP session are shown in Fig. 2b.Overall, RAR operates on SMTP-1 through SMTP-8 andSMTP-12 through SMTP-14, IB and AE operate on SMTP-9and SMTP-10.

The HTTP message exchange standard is relatively simple and typically consists of the messages shown in Fig. 2c. A typical HTTP session consists of multiple objects as well as the main HTML file, and hence, appears as a sequence of overlapping exchanges of the above format. Overall, RAR operates on HTTP-1, 2, 3, and 5, and PF operates on HTTP-3 and 5.

4.3 A3 Realization

In the rest of the section, we take one design element at a time and walk through the algorithmic details of the element with respect to a single application. Note that A3 is an application-aware solution, and hence, its operations will be application specific. Since we describe each element in isolation, we assume that the element resides between the application and the network. In an actual usage of A3, the elements will have to be chained as discussed earlier.

4.3.1 Transaction Prediction

Fig. 7a shows the flowchart for the implementation of TP for CIFS at the A3 client. When A3 receives a message from the application, it checks to see if the message is CIFS-9 and, if so, records state for the file transfer in its File-TP-States data structure. It then passes the message through. If the message is a request, TP checks to see whether the request is for a locally cached block or for a new block. If the latter, it expands the request to cover more blocks, stores information about the generated predicted requests in the Predicted-Request-States data structure, and forwards the requests.

In the reverse direction, when data come in from the network, TP checks to see if the data are for a predicted request. If so, it caches the data in secondary storage and updates its state information; otherwise, it forwards the data to the application.

The number of additional blocks to request is an interesting design decision. For file transfer scenarios, TP generates requests asking for the entire file.5 The file size information can be retrieved from the CIFS-10 message. If an incoming application message is for an earlier retrieved block, TP retrieves the block from secondary storage and provides it to the application.

While CIFS servers accept multiple data requests from the same client simultaneously, it is possible that for some applications the server might not be willing to accept multiple data requests simultaneously. In such an event, the A3 server lets only one of the client requests go through to the server at any point in time, and sends the other requests one at a time as the previous requests are served.

4.3.2 Redundant and Aggressive Retransmissions

Fig. 7b shows the flowchart for the implementation of RAR for CIFS. When A3 receives a message from the application, it checks to see if it is a thin message. The way A3 performs the check is to see if the message is one of the messages between CIFS-1 and CIFS-11. All such messages are interpreted as thin messages.


Fig. 7. TP and RAR (shaded blocks are storage space and timer, white blocks are operations). (a) Transaction prediction. (b) Redundant and aggressive retransmissions.

5. We discuss this issue further in Section 7.

If the incoming message is not a thin one, A3 lets it through as-is. Otherwise, A3 creates redundant copies of the message, notes the current time, starts a retransmission alarm, and sends out the copies in a staggered fashion. When a response arrives, A3 checks the time stamp for the corresponding request and updates its estimated RTT. A3 then passes on the message to the application. If the alarm expires for a particular thin message, the message is again subjected to the redundant transmissions. The A3 server is responsible for filtering out the successful arrivals of redundant copies of the same message.

The key issues of interest in the RAR implementation are: 1) How many redundant transmissions are performed? Since packet loss rates in wireless data networks rarely exceed 10 percent, even a redundancy factor of two (two additional copies created) reduces the effective loss rate to about 0.1 percent. Hence, A3 uses a redundancy factor of two. 2) How should the redundant messages be staggered? The answer to this question lies in the specific channel characteristics experienced by the mobile device. At the same time, the staggering delay should not exceed the round-trip time of the connection, as otherwise the mechanism would lose its significance by unnecessarily delaying the recovery of losses. Hence, A3 uses a staggering delay of RTT/10 between any two copies of the same message. This ensures that within 20 percent of the RTT duration, all copies are sent out at the mobile device. 3) How is the aggressive time-out value determined? Note that while the aggressive time-out mechanism helps under conditions when all copies of a message are lost, the total message overhead of such aggressive loss recovery is negligible when compared to the overall amount of data transferred by the application. Hence, A3 uses a time-out value of RTTavg + Δ, where Δ is a small guard constant and RTTavg is the average RTT observed so far. This simple setting ensures that the time-out values are tight, and at the same time, the mechanism adapts to changes in network characteristics.
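
The sender-side timing rules above can be summarized in a few lines. The sketch below is ours: the EWMA weights used to maintain RTTavg and the message/send interfaces are assumptions, while the redundancy factor of two, the RTT/10 staggering, and the RTTavg + Δ timeout follow the text.

    import time

    REDUNDANCY = 2                     # two additional copies of every thin message

    class RarSender:
        def __init__(self):
            self.rtt_avg = 0.5         # seconds; updated from observed exchanges
            self.guard = 0.05          # the small guard constant (Δ) added to the timeout
            self.pending = {}          # msg_id -> time the first copy was sent

        def send_thin(self, msg, send_raw):
            stagger = self.rtt_avg / 10.0           # copies spaced RTT/10 apart
            for i in range(1 + REDUNDANCY):
                send_raw(msg)
                if i < REDUNDANCY:
                    time.sleep(stagger)
            self.pending[msg.id] = time.time()
            return self.rtt_avg + self.guard        # aggressive retransmission timeout

        def on_response(self, msg_id):
            sent = self.pending.pop(msg_id, None)
            if sent is not None:                    # update the RTT estimate (EWMA weights are ours)
                sample = time.time() - sent
                self.rtt_avg = 0.875 * self.rtt_avg + 0.125 * sample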

4.3.3 Prioritized Fetching

Fig. 8a shows the flowchart for the implementation of PF in the context of HTTP. Once again, the key goal in PF for HTTP is to quickly retrieve the Web objects required for the display of the visible portion of the Web page, at the expense of the objects on the page that are not visible.

Unlike the other mechanisms, PF cannot be implemented without some additional interaction with the application itself. Fortunately, browser applications have well-defined interfaces for querying the state of the browser, including the current window focus, scrolling information, etc. Hence, the implementation of PF relies on a separate module called the application state monitor (ASM), akin to a browser plug-in, to coordinate its operations.

When a message comes in from the application, PF checks to see if the message is a request. If it is not, it is let through. Otherwise, PF checks with the ASM to see if the requested contents are immediately required. The ASM classifies the requested objects as being of immediate need (i.e., the visible portion of the Web page) or as not immediately required. PF then sends out fetch requests immediately for the first category of objects and uses a low-priority fetching mechanism for the remaining objects.

Since A3 is a platform solution, all PF modules have to inform the A3 server that certain objects are of low priority through A3-specific piggybacked information. The A3 server then deprioritizes the transmission of those objects in preference to those that are of higher priority. Note that the relative prioritization is used not only among the content of a single end device, but also across end devices, to improve overall system performance. Approaches such as TCP-LP [19] are candidates for the relative prioritization between TCP flows, although A3 currently uses a simple priority queuing scheme within the same TCP flow at the A3 server.

Note that while the ASM might classify objects in a particular fashion, changes in the application (e.g., scrolling down) will result in a corresponding reprioritization of the objects. Hence, the ASM has the capability of gratuitously informing PF about priority changes. Such changes are immediately notified to the A3 server through appropriate requests.
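
A compact sketch of the client-side PF logic follows. The callback names (asm_is_visible, send_request, notify_server) and the priority field are ours; they stand in for the ASM query interface and the A3-specific piggybacked priority hints described above.

    class PrioritizedFetcher:
        def __init__(self, asm_is_visible, send_request, notify_server):
            self.asm_is_visible = asm_is_visible   # ASM query: is this object on the visible screen?
            self.send_request = send_request
            self.notify_server = notify_server     # carries the piggybacked priority hint to the A3 server
            self.deferred = set()                  # URLs currently marked low priority

        def from_application(self, msg):
            if not msg.is_request():
                return [msg]                       # non-requests pass through untouched
            if self.asm_is_visible(msg.url):
                msg.priority = "high"              # needed for the visible portion of the page
            else:
                msg.priority = "low"               # the A3 server will deprioritize this response
                self.deferred.add(msg.url)
            self.send_request(msg)
            return []

        def on_reprioritize(self, urls_now_visible):
            """Called gratuitously by the ASM when the user scrolls or changes focus."""
            for url in urls_now_visible & self.deferred:
                self.deferred.discard(url)
                self.notify_server(url, "high")    # ask the A3 server to promote the object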

4.3.4 Infinite Buffering

Fig. 8b shows the flowchart for the implementation of IB in the context of SMTP. IB keeps track of the TCP connection status and monitors all ACKs that are sent out by the TCP connection serving the SMTP application for SMTP-9 and SMTP-10. If the advertised window in an ACK is less than the maximum possible, IB immediately resets the advertised window to the maximum value and appropriately


Fig. 8. PF and IB (shaded blocks are storage space and timer, white blocks are operations). (a) Prioritized fetching. (b) Infinite buffering.

updates its current knowledge of the connection's buffer occupancy and maximum in-sequence ACK information.

Hence, IB prevents anything less than the maximum buffer size from being advertised. When data packets arrive from the network, IB receives the packets and checks to see if the connection buffer can accommodate more packets. If so, IB delivers the packets to the application directly. If the disk cache is nonempty, which means that the connection buffer is full, the incoming packet is added directly to the cache. In this case, IB generates a proxy ACK back to the server. Then, whenever the connection buffer has space in it, packets are retrieved from the disk cache and given to the application until the buffer becomes full again. When the connection sends an ACK for a packet already ACKed by IB, IB suppresses the ACK. When the connection state is torn down for the SMTP application, IB resets its state accordingly.
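
A receiver-side sketch of this flowchart is shown below. The packet and callback interfaces (buffer_has_room, deliver, send_proxy_ack, end_seq) are ours; the window rewriting, disk spill, proxy ACKs, and ACK suppression follow the description above.

    class InfiniteBuffer:
        def __init__(self, max_window, send_proxy_ack):
            self.max_window = max_window
            self.send_proxy_ack = send_proxy_ack
            self.disk_cache = []          # secondary-storage overflow buffer
            self.acked_upto = 0           # highest sequence number already proxy-ACKed by IB

        def on_outgoing_ack(self, ack):
            if ack.window < self.max_window:       # never advertise less than the maximum buffer size
                ack.window = self.max_window
            if ack.seq <= self.acked_upto:         # connection ACKs a packet IB already ACKed
                return None                        # suppress it
            return ack

        def on_incoming_data(self, pkt, buffer_has_room, deliver):
            if self.disk_cache or not buffer_has_room():
                self.disk_cache.append(pkt)        # connection buffer full: spill to disk
                self.acked_upto = max(self.acked_upto, pkt.end_seq)
                self.send_proxy_ack(pkt)           # proxy ACK keeps the sender from stalling
            else:
                deliver(pkt)
            while self.disk_cache and buffer_has_room():   # drain the cache as buffer space frees up
                deliver(self.disk_cache.pop(0))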

4.3.5 Application-Aware Encoding

Fig. 9 shows the flowchart for the implementation of AE for SMTP. When AE receives data (SMTP-9) from the SMTP application, it uses its application vocabulary table to compress the data, marks the message as compressed, and forwards it to the network. The marking is done to inform the A3 server about the need to perform decompression. Similarly, when incoming data arrive for the SMTP server and the data are marked as compressed, AE performs the necessary decompression.

The mechanisms used for the actual creation and manipulation of the vocabulary tables are of importance to AE. In A3, the SMTP vocabulary tables are created and maintained purely on a user-pairwise basis. Not only are the tables created in this fashion, but the data sets over which the vocabulary tables are built are also restricted to this pairwise model. In other words, if A is the sender and B is the receiver, A uses its earlier e-mails to B as the data set from which the A-B vocabulary table is created, and then uses this table for encoding. B, having the data set already (since the e-mails were sent to B), can exactly recreate the table on its side, and hence, decode any compressed data. This essentially precludes the need for exchanging tables frequently and also takes advantage of changes in vocabulary that might occur based on the recipient. Though the tables are created on both sides implicitly and are synchronized in most cases, a backup mechanism to explicitly synchronize the tables is also needed. The synchronization action is triggered by a mismatch of the table hashes on the two sides; the hash is sent along with each new e-mail and updated when the table changes.
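
The pairwise table construction and hash-based synchronization can be sketched as follows; the wire format, the choice of SHA-1 for the table hash, and the word-level tokenization are our assumptions.

    import hashlib

    class VocabularyCodec:
        """Pairwise (sender, recipient) vocabulary table for SMTP AE. Both sides
        build the table from the same earlier e-mails, so the codes agree."""

        def __init__(self, previous_emails):
            words = sorted({w for mail in previous_emails for w in mail.split()})
            self.words = words                                  # code -> word
            self.index = {w: i for i, w in enumerate(words)}    # word -> code

        def table_hash(self):
            """Hash sent along with each e-mail; a mismatch triggers explicit resynchronization."""
            return hashlib.sha1("\n".join(self.words).encode()).hexdigest()

        def encode(self, text):
            return [self.index.get(w, w) for w in text.split()]   # unknown words pass through literally

        def decode(self, codes):
            return " ".join(self.words[c] if isinstance(c, int) else c for c in codes)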

4.4 A3 Point Solution—A3·

While the A3 deployment model assumed so far is a platform model requiring participation by A3-enabled devices at both the client and server ends, in this section, we describe how A3 can be used as a point solution, albeit with somewhat limited capabilities. We refer to the point-solution version of A3 as A3·.

Of the five design elements in A3, the only design element for which the platform model is mandatory is the application-aware encoding mechanism. Since compression or encoding is an end-to-end process, A3· cannot be used with AE. However, each of the other four principles can be employed with minimal changes in A3·.

TP involves the generation of predictive data requests, and hence, can be performed in A3· as long as the application server can accept multiple simultaneous requests. For CIFS and HTTP, the servers do accept simultaneous requests. IB is purely a flow-control avoidance mechanism and can be realized in A3·. RAR involves redundant transmissions of messages, and hence, can be implemented in A3· as long as application servers are capable of filtering duplicate messages. If the application servers are not capable of doing so (e.g., HTTP servers, which would respond to each request), the redundant transmissions have to be performed at the granularity of transport layer segments as opposed to application layer messages, since protocols such as TCP provide redundant packet filtering. Finally, PF can be accomplished in A3· in terms of classifying requests and treating the requests differently. However, the slow fetching of data not required immediately has to be realized through coarser receiver-based mechanisms, such as delayed requests, as opposed to the best possible strategy of slowing down responses as in A3.

5 EVALUATION

In this section, we evaluate the performance of A3. The evaluation is performed with application-specific traffic generators, which are modeled based on traffic traces generated by the IxChariot emulator and on documented standards for the application protocols. Since each application protocol in the study has various software implementations, and different implementations may differ in certain aspects of the protocol standards, we believe that such simulations with abstracted traffic generators can help capture the trend of the performance enhancement delivered by A3.

In addition to the emulation, we also build a proof-of-concept prototype of A3. The prototype implements all five A3 design principles and works with the following applications: Secure Copy (SCP), Internet Explorer, Samba, and SendMail. The primary goal of building such a prototype is to prove that the proposed A3 architecture does indeed work with real applications. We also use the prototype to obtain system-level insights into the A3 implementation.


Fig. 9. Application-aware encoding.

5.1 Setup

5.1.1 Emulation

The experimental setup for the emulation is shown in Fig. 10. The setup consists of three desktop machines running the Fedora Core 4 operating system with the Linux 2.6 kernel. All the machines are connected using a 100 Mbps LAN.

An application-emulator (AppEm) module runs on both end machines. The AppEm module is a custom-built user-level module that generates traffic patterns and content for three different application protocols: CIFS, SMTP, and HTTP. The AppEm module also generates traffic content based on both real-life input data sets (for e-mail and Web content) and random data sets (for file transfer).6 The traffic patterns shown in Fig. 2 are representative of the traffic patterns generated by AppEm.

The system connecting the two end systems runs the emulators for both A3 (A3Em) and the wireless network (WNetEm). Both emulators are implemented within the framework of the ns2 simulator, with ns2 running in emulation mode. Running ns2 in its emulation mode allows for the capture and processing of live network traffic. The emulator object in ns2 taps directly into the device driver of the interface cards to capture and inject real packets into the network. All five A3 mechanisms are implemented in the A3Em module, and each mechanism can be enabled either independently or in tandem with the other mechanisms. The WNetEm module is used for emulating different wireless network links representing the WLAN, WWAN, and SAT environments. The specific characteristics used to represent the wireless network environments are the same as those presented in Section 2.

The primary metrics monitored are throughput, response time (for HTTP), and confidence intervals for the throughput and response time. Each data point is the average of 20 simulation runs, and in addition, we show the 90 percent confidence intervals. The results of the evaluation study are presented in two stages. We first present the results of the performance evaluation of the A3 principles in isolation. Then, we discuss the combined performance improvements delivered by A3.

5.1.2 Proof-of-Concept Prototype

The prototype runs on a testbed consisting of five PCs connected in a linear topology. The first four PCs run Fedora Core 5 with the 2.6.15 Linux kernel. The fifth PC dual-boots Fedora Core 5 and Windows 2000. All machines are equipped with a 1 GHz CPU and 256 MB of memory. The implementation utilizes the NetFilter [7] utility on the Linux platform. The first PC works as the application server for SMTP (Sendmail server), CIFS (Samba server), and SCP (SCP server). The A3 server module and client module are installed on the second and fourth PCs, respectively. The third PC works as a WAN emulator, and the fifth PC runs the e-mail client, smbclient, SCP client, and Internet Explorer.

The prototype implementation makes use of two netfilter libraries: libnfnetlink (version 0.0.16) and libnetfilter_queue (version 0.0.12) [7]. The registered hook point that we use is NF_IP_FORWARD. For all the hooks, the registered target is NF_QUEUE, which queues packets and allows user-space processing. After appropriate processing, the queued packets are passed, dropped, or altered by the modules.
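
The A3 prototype itself is written in C against libnetfilter_queue. Purely to illustrate the NF_QUEUE pattern (packets queued by the kernel, verdicts issued from user space), the following sketch uses the python-netfilterqueue binding; the binding, the queue number, and the iptables rule here are our assumptions and are not part of the prototype.

    # Send forwarded packets to user space first (run as root):
    #   iptables -A FORWARD -j NFQUEUE --queue-num 1
    from netfilterqueue import NetfilterQueue

    def verdict(pkt):
        raw = pkt.get_payload()      # raw IP packet bytes, available for A3-style processing
        # ... inspect or rewrite the payload here, then pkt.set_payload(new_raw) if altered ...
        pkt.accept()                 # pass the packet on; pkt.drop() would discard it

    nfq = NetfilterQueue()
    nfq.bind(1, verdict)             # queue number must match the iptables rule
    try:
        nfq.run()
    finally:
        nfq.unbind()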

5.2 Transaction Prediction

We use CIFS as the application traffic for evaluating the performance of Transaction Prediction. The results of the TP evaluation are shown in Fig. 11. The x-axis of each graph shows the size of the transferred file in megabytes and the y-axis shows the application throughput in megabits per second. The results show several trends:

1. With wireless-aware transport layer protocols (such as ELN, WTCP, and STP), the increase in throughput is negligible. This trend is consistent with the results in Section 2.

2. Transaction Prediction improves CIFS application throughput significantly. In the SAT network, for instance, TP improves CIFS throughput by more than 80 percent when transferring a 10-Mbyte file.

3. The improvement achieved by TP increases with the file size. This is because TP eliminates more request-response interactions as the file size increases.

4. TP achieves the highest improvement in the SAT network.

5.2.1 Proof-of-Concept Results

The prototype works with an smbclient and a Samba server. The scenario considered in the prototype is that of the smbclient requesting files of various sizes from the Samba server. The CIFS implementation works with the SMB protocol running directly above TCP (i.e., by connecting to port 445 of the Samba server) instead of over NetBIOS sessions. The Samba server version is 3.0.23 and the smbclient version is 2.0.7.

One of the nontrivial issues faced while implementing the prototype is TCP sequence manipulation. The issue is caused by the TP acceleration requests generated by A3. SMB sessions use TCP to request and send data; thus, the Samba server always expects TCP packets with correct TCP sequence numbers. An acceleration request has to predict not only the block offset for the SMB session but also the TCP sequence number, failing which the Samba server would see a TCP packet with an incorrect TCP sequence number and behave unexpectedly. The prototype implementation addresses this problem by also keeping track of TCP state and using the appropriate TCP sequence numbers.


Fig. 10. Simulation network.

6. While the IxChariot emulator can generate representative traffic traces, it does not allow specific data sets to be used for the content; hence the need for the custom-built emulator.

This is an indication that the application awareness of A3 may need to be extended to include transport layer awareness as well. Since some of the principles in A3 are directly transport layer dependent (e.g., infinite buffering), we believe that this extension still falls within the scope of A3.
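
As a concrete illustration of the sequence-number bookkeeping described above, an injected (predicted) request must carry the sequence number the server expects next on the client's byte stream; a minimal sketch (the function name and per-connection tracking interface are ours):

    def injected_request_seq(last_seq_sent, last_payload_len):
        """TCP sequence number a TP-injected request must carry so the Samba server
        sees an in-order stream: the previous client sequence number advanced by the
        payload bytes already sent, modulo 2**32."""
        return (last_seq_sent + last_payload_len) % (1 << 32)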

The proof-of-concept results are shown in Fig. 13a. TP helps deliver more improvement for larger files, and the throughput improvement achieved when requesting a 5-Mbyte file is up to 500 percent.

5.3 Redundant and Aggressive Retransmissions

We evaluate the effectiveness of RAR using the CIFS application protocol. The results of the RAR evaluation are presented in Fig. 12. The x-axis in the graphs is the requested file size in megabytes and the y-axis is the CIFS application throughput in megabits per second. We observe that RAR delivers better performance when compared to both TCP-NewReno and the tailored transport protocols, delivering up to an 80 percent improvement in throughput performance for SATs. RAR is able to reduce the chances of experiencing a time-out when a wireless packet loss occurs. The reduction of TCP time-outs leads to better performance with RAR.

5.3.1 Proof-of-Concept Results

The prototype implementation of RAR includes components for performing the following functions: recognizing session control messages, retransmission control, and redundancy removal. On the sender side, the retransmission control component maintains current RTT values, sets timers, and retransmits the possibly lost messages when timers expire. The transmission of the redundant messages is done using raw sockets. On the receiver side, the redundancy removal component identifies redundant messages when a retransmission is a false alert, i.e., when the original message was not actually lost but RAR retransmitted it aggressively.

The prototype is built with a SendMail server and an e-mail client. An e-mail of 5.2 KB is sent to the SendMail server over a network with a 200 ms RTT, 100 Kbps bandwidth, and varying loss rates. The throughput with and without RAR is shown in Fig. 13b. We observe a 250 percent throughput improvement when the loss rate is 8 percent. More interestingly, the throughput with RAR is not affected much by the loss rate, since RAR effectively hides the losses.

5.4 Infinite Buffering

The effectiveness of IB is evaluated using CIFS traffic and the results are shown in Fig. 14. The x-axis is the requested file size in megabytes and the y-axis is the application throughput in megabits per second. We can see that: 1) Transferring larger amounts of data with IB achieves higher throughput. This is because IB helps most during the actual data transfer phase, and will not help when the amount of data to be transferred is less than a few times the BDP of the network. 2) IB performs much better in the SAT network than in the other two networks, delivering almost a 400 percent improvement in performance. Again, the results are as expected because IB's benefits are higher when the BDP of the network is higher.


Fig. 13. Prototype results of TP and RAR. (a) TP (Samba server). (b) RAR (SendMail).

Fig. 12. Simulation results of redundant and aggressive retransmissions (CIFS). (a) WLAN. (b) WWAN. (c) SAT.

Fig. 11. Simulation results of transaction prediction (CIFS). (a) WLAN. (b) WWAN. (c) SAT.

5.4.1 Proof-of-Concept Results

We choose an SCP implementation to build the prototype of IB. The IB component on the client side provides virtual buffers, i.e., local storage, in user space. It stores data on behalf of the data sender (i.e., the SCP server). On the other hand, it supplies stored data to the data receiver (i.e., the SCP client) whenever it receives TCP ACKs from it.
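The following is a minimal sketch of the virtual-buffer idea described above: bytes arriving from the data sender are parked in local user-space storage and released to the local receiver on demand. The buffer size, the function names, and the omission of disk spill-over are our own simplifications, not the prototype's actual implementation.

/*
 * Illustrative IB sketch: a client-side virtual buffer accepts data on
 * behalf of the sender so the sender never stalls on the receiver's small
 * TCP buffer, and releases data whenever the local receiver asks for more.
 */
#include <stdio.h>
#include <string.h>

#define VBUF_CAP (1 << 20)   /* 1 MB of local storage; size is an assumption */

struct vbuf {
    unsigned char data[VBUF_CAP];
    size_t head, tail;        /* head: next read, tail: next write */
};

/* Accept bytes on behalf of the data sender (e.g., the SCP server). */
static size_t vbuf_store(struct vbuf *b, const unsigned char *p, size_t n) {
    size_t room = VBUF_CAP - b->tail;
    if (n > room) n = room;                 /* spill-to-disk handling omitted */
    memcpy(b->data + b->tail, p, n);
    b->tail += n;
    return n;
}

/* Supply stored bytes to the data receiver (e.g., the SCP client). */
static size_t vbuf_release(struct vbuf *b, unsigned char *out, size_t want) {
    size_t avail = b->tail - b->head;
    if (want > avail) want = avail;
    memcpy(out, b->data + b->head, want);
    b->head += want;
    return want;
}

int main(void) {
    static struct vbuf b;                   /* zero-initialized: empty buffer */
    unsigned char in[8192] = {0}, out[4096];
    printf("stored %zu bytes\n",   vbuf_store(&b, in, sizeof in));
    printf("released %zu bytes\n", vbuf_release(&b, out, sizeof out));
    printf("still buffered: %zu bytes\n", b.tail - b.head);
    return 0;
}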

In the experiments, a 303-Kbyte file is sent from the SCP server to the SCP client over a network with 100 Mbps bandwidth and varying RTTs. The tests are performed to learn the impact of RTT on the performance improvement. The results are shown in Fig. 16a. We see that considerable improvements are achieved by IB. An interesting observation is that IB delivers more improvement with small RTT values. As the RTT increases, the actual throughput of SCP also decreases even with IB enabled.

5.5 Prioritized Fetching

The performance of PF is evaluated with HTTP traffic and the results are shown in Fig. 15. We consider the Top 50 Web Sites as representatives of typical Web pages and measure their Web characteristics. We then use the obtained Web statistics to generate the workload. The x-axis in the graphs is the requested Web page size in kilobytes and the y-axis is the response time in seconds for the initial screen. In the figure, it can be seen that as a user accesses larger Web pages, the response time difference between default content fetching and PF increases. PF consistently delivers a 15 to 30 percent improvement in the response time performance. PF reduces aggressive traffic volumes by deprioritizing the out-of-sequence fetching of the offscreen objects. Note that PF, while improving the response time, does not improve raw throughput performance. In other words, only the effective throughput, as experienced by the end user, increases when using PF.

5.5.1 Proof-of-Concept Results

PF is a client-side solution, which does not require any modification at the server side, but requires integration with the application at the client side. In the prototype, we use WinAPI with Internet Explorer 6.0 on the Windows operating system (running on PC-5).

The PF prototype consists of three main components. The first component is the location-based object prioritization. The current prototype initially turns off the display option of multimedia objects by changing the associated registry values. After the initial rendering is completed without downloading multimedia objects, it calculates the location of all the objects. The second component is the priority-based object fetching and displaying. The current prototype uses the basic on-off model, which fetches the high-priority objects first and then fetches the other objects. If the pixel-size information of an object is inconsistent with the definition in the main document file, the prototype performs the reflow process that renders the entire document layout again. The third component is the reprioritization. When a user moves the current focus in the application window, PF detects the movement of the window and performs the reprioritization for the objects that are supposed to appear in the newly accessed area.
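To illustrate the location-based prioritization and reprioritization steps, the sketch below marks objects as high priority when their computed layout position overlaps the currently visible screen area, and it recomputes priorities when the user scrolls. It is illustrative only; the object fields, the pixel values, and the two-level priority model are assumptions, and the WinAPI/Internet Explorer integration is not shown.

/*
 * Illustrative PF sketch: an object is high priority (fetch first) if any
 * part of it falls within the currently visible vertical range; scrolling
 * triggers reprioritization.
 */
#include <stdio.h>

struct web_object {
    const char *url;
    int top_px, height_px;   /* layout position computed after initial render */
    int priority;            /* 0 = fetch first, 1 = fetch later */
};

/* Mark objects overlapping the visible window [view_top, view_top + view_h). */
static void pf_prioritize(struct web_object *objs, int n, int view_top, int view_h) {
    for (int i = 0; i < n; i++) {
        int bottom = objs[i].top_px + objs[i].height_px;
        int onscreen = bottom > view_top && objs[i].top_px < view_top + view_h;
        objs[i].priority = onscreen ? 0 : 1;
    }
}

int main(void) {
    struct web_object objs[] = {
        { "logo.png",   0,    120, 1 },
        { "hero.jpg",   300,  500, 1 },
        { "footer.gif", 2400,  80, 1 },
    };
    int n = 3;

    pf_prioritize(objs, n, 0, 768);          /* initial screen */
    for (int i = 0; i < n; i++)
        printf("%-12s priority %d\n", objs[i].url, objs[i].priority);

    pf_prioritize(objs, n, 2000, 768);       /* user scrolled down: reprioritize */
    for (int i = 0; i < n; i++)
        printf("%-12s priority %d (after scroll)\n", objs[i].url, objs[i].priority);
    return 0;
}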


Fig. 14. Emulation results of infinite buffering (CIFS). (a) WLAN. (b) WWAN. (c) SAT.

Fig. 16. Prototype results of IB and AE. (a) IB (SCP). (b) AE (SendMail).

Fig. 15. Emulation results of prioritized fetching (HTTP). (a) WLAN. (b) WWAN. (c) SAT.

The Web clients are connected to the Internet and access two Web sites: www.amazon.com and www.cnn.com. To highlight the features of PF, we show the results for both transferred size and response time for the first screen in Figs. 17a and 17b, respectively. Note that PF can reduce the response time by prioritizing data on the Web pages and transferring only high-priority data first. PF sees about a 30 percent improvement on both metrics.

5.6 Application-Aware Encoding

AE is designed primarily to accelerate e-mail delivery using SMTP, and hence, we evaluate the effectiveness of AE for SMTP traffic. In the evaluation, e-mails of sizes ranging from 1 to 10 Kbytes (around 120 to 1,200 words) are used. We show the results in Fig. 18, where the x-axis is the e-mail size in kilobytes and the y-axis is the application throughput in megabits per second. Varying degrees of throughput improvement are achieved, and in the WWAN, an increase of 80 percent is observed when transferring a 10-Kbyte e-mail. We can see that AE achieves the highest improvement in the WWAN due to its relatively low bandwidth.

We also show the effectiveness of AE in terms of compression ratio in Fig. 20. In the figure, the results of 10 persons' e-mails using three compression estimators (WinRAR, WinZip, and AE) are shown. We can see that WinRAR and WinZip can compress an e-mail by a factor of 2 to 3, while AE can achieve a compression ratio of about 5.

5.6.1 Proof-of-Concept Results

The prototype of AE maintains a coding table on either side, and these two tables are synchronized in order to provide the encoding and decoding functions. AE monitors the DATA message in the SMTP protocol to locate the e-mail contents. The e-mail content is textual in nature and is expressed using the US-ASCII standard. AE uses the Extended ASCII codes to provide encoding. We employ a simplified Huffman-style coding mechanism to keep the operational complexity low. The total coding space size is 5,008.

The AE component scans an incoming e-mail, and if a word is contained in the coding table, it is replaced by the corresponding tag. If several consecutive words are covered by the coding tables, their codes are concatenated, and the necessary padding is added to the codes to form full bytes. Words that are not covered by the coding tables stay unchanged in their ASCII representations. Since the e-mail vocabulary of a user may change with time, AE incorporates a table updating mechanism and performs an updating operation every 500 e-mails. To maintain table consistency between the client and the server, a table synchronization mechanism is employed. Since a user's e-mail vocabulary is expected to change slowly, AE performs incremental synchronization rather than copying the entire table.
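The word-by-word encoding described above can be sketched as follows: words found in the coding table are replaced by compact tags, and all other words keep their ASCII representation. For brevity, the sketch emits a fixed two-byte tag per covered word instead of concatenating variable-length codes with bit padding; the table contents, the escape byte, and all names are illustrative assumptions.

/*
 * Illustrative AE sketch: table-driven word encoding. Covered words are
 * replaced by a short tag; uncovered words pass through as plain ASCII.
 * Bit-level code concatenation and padding are intentionally omitted.
 */
#include <stdio.h>
#include <string.h>

static const char *coding_table[] = { "meeting", "tomorrow", "regards", "attached" };
#define TABLE_SIZE (sizeof coding_table / sizeof coding_table[0])
#define TAG_ESCAPE 0x80   /* assumed marker byte from the extended ASCII range */

/* Return the table index of a word, or -1 if the word is not covered. */
static int ae_lookup(const char *word) {
    for (size_t i = 0; i < TABLE_SIZE; i++)
        if (strcmp(coding_table[i], word) == 0) return (int)i;
    return -1;
}

/* Encode one word into out; returns the number of bytes written. */
static size_t ae_encode_word(const char *word, unsigned char *out) {
    int idx = ae_lookup(word);
    if (idx >= 0) {                 /* covered word: emit compact tag */
        out[0] = TAG_ESCAPE;
        out[1] = (unsigned char)idx;
        return 2;
    }
    size_t n = strlen(word);        /* uncovered word: keep ASCII representation */
    memcpy(out, word, n);
    return n;
}

int main(void) {
    const char *words[] = { "see", "you", "tomorrow", "regards" };
    unsigned char buf[64];
    size_t total = 0;
    for (int i = 0; i < 4; i++)
        total += ae_encode_word(words[i], buf + total);
    printf("encoded 4 words into %zu bytes\n", total);
    return 0;
}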

SendMail is used to build the prototype. Purely text-based e-mails of various sizes are sent from an e-mail client to the SendMail server, and the network is configured with 100 ms RTT and 50 Kbps bandwidth. The results are shown in Fig. 16b. Every data point is the average value over five e-mails of similar sizes. The throughput is improved by 80 percent with AE.

5.7 Integrated Performance Evaluation

We now present the results of the combined effectiveness of all applicable principles for the three application protocols, CIFS, SMTP, and HTTP. We employ RAR, TP, and IB on the CIFS traffic in the emulation setup. For SMTP, the RAR, AE, and IB principles are used. For HTTP, the A3 principles applied are RAR, PF, and IB. As expected, the throughput of the applications (CIFS and SMTP) when using the integrated A3 principles is higher than when any individual principle is employed in isolation, while the response time of HTTP is lower than with any individual principle. The results are shown in Fig. 19, with A3 delivering performance improvements of approximately 70, 110, and 30 percent for CIFS, SMTP, and HTTP, respectively.

6 RELATED WORK

6.1 Wireless-Aware Middleware and Applications

The Wireless Application Protocol (WAP) is a protocol developed to allow efficient transmission of WWW content to handheld wireless devices.


Fig. 18. Simulation results of application-aware encoding (SMTP). (a) WLAN. (b) WWAN. (c) SAT.

Fig. 17. Prototype results of prioritized fetching (Internet Explorer). (a) Transferred size. (b) Response time.

The transport layer protocols in WAP consist of the Wireless Transaction Protocol and the Wireless Datagram Protocol, which are designed for use over narrow-band bearers in wireless networks and are not compatible with TCP. WAP is highly WWW-centric and does not aim to optimize any of the application behavioral patterns identified earlier in the paper.

Browsers such as Pocket Internet Explorer (PIE) [8] are developed with capabilities that can address resource constraints on mobile devices. However, they do not optimize communication performance, which is the focus of A3. The work of Mohomed et al. [20] aims to save bandwidth and power by adapting the contents based on user semantics and contexts. The adaptations, however, are exposed to the end applications and users. This is different from the A3 approach, which is application-transparent.

The Odyssey project [21] focuses on system support for collaboration between the operating system and individual applications by letting them both be aware of the wireless environment, and thus, adapt their behaviors. Comparatively, A3 does not rely on the redesign of the OS or protocol stack for its operation and is totally transparent both to the underlying OS and the applications. The Coda file system [25] is based on the Andrew File System (AFS), but supports disconnected operations for mobile hosts. When the client is connected to the network, it hoards files for later use during disconnected operations. During disconnections, Coda emulates the server, serving files from its local cache. Coda's techniques are specific to file systems and require applications to have changed semantics for the data that they use.

6.2 Related Design Principles

Several related works in the literature propose to accelerate applications with various mechanisms [14], [15]. We present a few of them here and identify the differences vis-a-vis A3.

1. TP-related: Czerwinski and Joseph [13] propose to "upload" clients' tasks to the server side, thus eliminating many RTTs required for applications like SMTP. This approach is different from the A3 approach in terms of the application protocols applied and the overall mechanism.

2. RAR-related: Mechanisms like Forward Error Control (FEC) use error control coding for digital communication systems. A link-layer retransmission approach to improve TCP performance is proposed in [22]. Another work [28] proposes an aggressive retransmission mechanism that encourages legitimate clients to behave more aggressively in order to fight attacks against servers. Compared to these approaches, A3 only applies RAR to control messages in application protocols, and it does so by retransmitting the control messages when a maintained timer expires. We present arguments earlier in the paper as to why protecting control message exchanges is a major factor affecting application performance.

3. PF-related: To improve Web-access performance, a tremendous amount of work has been done [24], [8], [6], [11]. The work in [20] proposes out-of-order transmission of HTTP objects above UDP and breaks the in-order delivery of an object. However, unlike the A3 framework, it requires the cooperation of both the client and server sides.

4. IB-related: Mechanisms such as that given in [23] and the TCP Performance Enhancing Proxy (TCP PEP) [9] are proposed to shield the undesired characteristics of various networks, particularly wireless networks. IB is different from these approaches and aims at fully utilizing the network resources by removing the buffer-length constraint. IB specifically applies to applications with bulk data transfer, such as FTP, and is meant to counter the impact of flow control. Some works also observe that applications suffer from poor performance over high-latency links due to flow control. For example, Rapier and Bennett [23] propose to change the SSH implementation to remove the bottleneck caused by the receive buffer.

5. AE-related: Companies like Converged [3] provide application-aware compression solutions by compressing the data for some applications based on priority and application nature. These mechanisms share the property of being application-aware, meaning that only a subset of applications will be compressed.


Fig. 19. Integrated A3 results in WWAN. (a) CIFS. (b) SMTP. (c) HTTP.

Fig. 20. Effectiveness of AE.

However, AE has the additional property of being user-aware, that is, taking user-specific information into consideration, and thus can achieve better performance.

6.3 Commercial WAN Optimizers

Several companies, such as Riverbed [10] and Juniper [4], sell WAN-optimization and application-acceleration products. However: 1) almost all the commercial solutions are proprietary; 2) A3 principles such as RAR, IB, AE, and PF are not seen in commercial solutions; and 3) many of the techniques used in commercial solutions, such as bit-level caching and compression, are hardware-based approaches and require large amounts of storage. These properties render the commercial solutions inapplicable for environments where easy deployment is required. A3 is a middleware approach requiring only small amounts of storage.

7 CONCLUSIONS AND DISCUSSION

In this paper, we motivate the need for application acceleration for wireless data networks and present the A3 solution, which is application-aware but application-transparent. We discuss a few remaining issues in the rest of this section.

7.1 Insights into A3 Principles

In this work, we present a set of five A3 principles. We realize that this set is not an exhaustive set of all A3 principles. Hence, we further explore the design space of a general application acceleration framework. Specifically, we argue that the general A3 framework consists of at least five orthogonal dimensions of principles, namely Provisioning, Protocol Optimization, Prediction, Compression, and QoS. In this context, RAR and IB belong to the Protocol Optimization dimension, TP belongs to the Prediction dimension, AE belongs to the Compression dimension, and PF belongs to the QoS dimension. More principles, as well as more dimensions, are left as part of our future work.

The principles of RAR, IB, and AE are application independent, meaning that they can be used to accelerate any application, while PF and TP are application specific and can only help certain applications. We believe that such classifications can help gain more insight into the A3 design so that the A3 principles can be incorporated into the design of new applications.

7.2 TP versus Opportunistic Prefetching

TP is designed to do deterministic prefetching rather than opportunistic prefetching. Opportunistic prefetching techniques aggressively request data that might be used by the end user in the future. For example, some Web-access products (e.g., Web browsers) prefetch data by requesting Web contents based on some hints. Our design goal for TP is deterministic prefetching, since otherwise the design would incur the overhead of requesting unnecessary content.

Ensuring deterministic prefetching is nontrivial. We now present several approaches to this problem and will explore other approaches in future work. One simple approach is to apply TP only to file transfer operations, where users always request a file in its entirety. The second approach is to let TP be fully aware of the application software being accelerated and only prefetch data that are definitely needed. In other words, TP can be designed to be sufficiently intelligent so that it can recognize the specific application implementations and avoid unnecessary data fetching. For example, the CIFS protocol has various software implementations; some software may support range-locking functions, while others may not. If TP is aware of these differences, it can act accordingly to ensure its deterministic behavior. The downside of this approach is the design overhead required for such intelligence; in practical deployment, the overhead is affordable only if the benefits gained are larger than its cost. An alternative approach is to relax the strictness of deterministic prefetching by tolerating some degree of opportunistic prefetching. The corresponding solution is "constrained acceleration." With constrained acceleration, instead of prefetching the entire file, TP prefetches a "chunk," which is larger than a block. Thus, even if some portion of the prefetched chunk is not used, the cost is constrained. The chunk size is defined by the acceleration degree, the design of which requires further work. In our proof-of-concept prototype, we adopted such an approach with a fixed value of the acceleration degree.
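A minimal sketch of the constrained-acceleration idea follows: TP prefetches at most one chunk of acceleration_degree blocks beyond the client's current read position, so the cost of unused prefetched data is bounded. The block size, the field names, and the single-outstanding-chunk policy are illustrative assumptions rather than the prototype's exact behavior.

/*
 * Illustrative constrained-acceleration sketch: bound the prefetch window
 * to acceleration_degree blocks past the client's last requested offset.
 */
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE 4096   /* assumed block granularity */

struct constrained_tp {
    unsigned acceleration_degree;  /* number of blocks per prefetched chunk */
    uint64_t client_pos;           /* last offset actually requested by the client */
    uint64_t prefetched_up_to;     /* exclusive end of data already prefetched */
};

/* Decide how many more bytes to prefetch, keeping at most one chunk outstanding. */
static uint64_t tp_prefetch_len(const struct constrained_tp *t) {
    uint64_t window_end = t->client_pos +
                          (uint64_t)t->acceleration_degree * BLOCK_SIZE;
    return window_end > t->prefetched_up_to ? window_end - t->prefetched_up_to : 0;
}

int main(void) {
    struct constrained_tp t = { .acceleration_degree = 8,
                                .client_pos = 0, .prefetched_up_to = 0 };
    printf("prefetch %llu bytes\n", (unsigned long long)tp_prefetch_len(&t));

    t.prefetched_up_to = 8 * BLOCK_SIZE;       /* first chunk already fetched */
    t.client_pos = 2 * BLOCK_SIZE;             /* client has consumed two blocks */
    printf("prefetch %llu more bytes\n", (unsigned long long)tp_prefetch_len(&t));
    return 0;
}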

7.3 Preliminary Complexity Analysis

One of the important issues when considering the deployment of a technique is its complexity. A3 can be deployed and realized in multiple ways. For instance, it can be realized in either user space or kernel space, and it can be deployed as either a full platform model or a point model. Different deployment or realization models are associated with different degrees of complexity and performance trade-offs.

We now present a preliminary complexity analysis in terms of lines of code, memory usage, and computation overhead. Our prototype implements the A3 framework in user space and is deployed as a platform solution. 1) The prototype is implemented in about 4,500 lines of C code. Specifically, PF and TP each have about 1,000 lines of code, and the other elements each have about 600 to 900 lines. 2) The memory usage varies across the A3 elements. TP uses more memory than the other elements since it needs to temporarily hold the returned data corresponding to the accelerated requests; its memory footprint is therefore a function of the acceleration degree and the receiver's consumption rate. IB also stores application data temporarily to compensate for the receiver's TCP buffer, and its memory usage depends on the receiver buffer size and the receiver's reading rate. AE needs to allocate space for the coding table, so its memory usage is proportional to the table size. RAR and PF use relatively less memory than the other three elements since they maintain little application data and state. 3) In terms of computation overhead, we observe little change in CPU usage when running the prototype. AE uses relatively more CPU since it needs to perform data compression. For PF, the CPU usage is higher when the user scrolls up or down, since PF needs to reprioritize the objects.


ACKNOWLEDGMENTS

An earlier version of this paper appeared at ACM MobiCom 2006 [29].

REFERENCES

[1] CIFS: A Common Internet File System, http://www.microsoft.com/mind/1196/cifs.asp, 2009.
[2] Comscore Media Metrix Top 50 Online Property Ranking, http://www.comscore.com/press/release.asp?press=547, 2009.
[3] Converged Access WAN Optimization, http://www.convergedaccess.com/, 2008.
[4] Juniper Networks, http://www.juniper.net/, 2009.
[5] Linux Magazine, http://www.linux-magazine.com/issue/15/, 2009.
[6] Minimo, a Small, Simple, Powerful, Innovative Web Browser for Mobile Devices, http://www.mozilla.org/projects/minimo/, 2009.
[7] Netfilter Project, http://www.netfilter.org/, 2009.
[8] Pocket Internet Explorer, http://www.microsoft.com/windowsmobile/, 2009.
[9] RFC 3135: Performance Enhancing Proxies Intended to Mitigate Link-Related Degradations, http://www.ietf.org/rfc/rfc3135.txt, 2009.
[10] Riverbed Technology, http://www.riverbed.com/, 2009.
[11] T. Armstrong, O. Trescases, C. Amza, and E. de Lara, "Efficient and Transparent Dynamic Content Updates for Mobile Clients," Proc. Fourth Int'l Conf. Mobile Systems, Applications and Services (MobiSys '06), pp. 56-68, 2006.
[12] H. Balakrishnan and R. Katz, "Explicit Loss Notification and Wireless Web Performance," Proc. IEEE Conf. Global Comm. (GLOBECOM '98) Global Internet, Nov. 1998.
[13] S. Czerwinski and A. Joseph, "Using Simple Remote Evaluation to Enable Efficient Application Protocols in Mobile Environments," Proc. First IEEE Int'l Symp. Network Computing and Applications, 2001.
[14] E. de Lara, D. Wallach, and W. Zwaenepoel, "Puppeteer: Component-Based Adaptation for Mobile Computing (Poster Session)," SIGOPS Operating Systems Rev., vol. 34, no. 2, p. 40, 2000.
[15] E. de Lara, D.S. Wallach, and W. Zwaenepoel, "Hats: Hierarchical Adaptive Transmission Scheduling," Proc. Multimedia Computing and Networking Conf. (MMCN '02), 2002.
[16] T. Henderson and R. Katz, "Transport Protocols for Internet-Compatible Satellite Networks," IEEE J. Selected Areas in Comm. (JSAC '99), vol. 17, no. 2, pp. 345-359, Feb. 1999.
[17] H.-Y. Hsieh, K.-H. Kim, Y. Zhu, and R. Sivakumar, "A Receiver-Centric Transport Protocol for Mobile Hosts with Heterogeneous Wireless Interfaces," Proc. ACM MobiCom, pp. 1-15, 2003.
[18] IXIA, http://www.ixiacom.com/, 2009.
[19] A. Kuzmanovic and E.W. Knightly, "TCP-LP: Low-Priority Service via End-Point Congestion Control," IEEE/ACM Trans. Networking, vol. 14, no. 4, pp. 739-752, Aug. 2006.
[20] I. Mohomed, J.C. Cai, S. Chavoshi, and E. de Lara, "Context-Aware Interactive Content Adaptation," Proc. Fourth Int'l Conf. Mobile Systems, Applications and Services (MobiSys '06), pp. 42-55, 2006.
[21] B.D. Noble, M. Satyanarayanan, D. Narayanan, J.E. Tilton, J. Flinn, and K.R. Walker, "Agile Application-Aware Adaptation for Mobility," Proc. 16th ACM Symp. Operating System Principles, 1997.
[22] S. Paul, E. Ayanoglu, T.F.L. Porta, K.-W.H. Chen, K.E. Sabnani, and R.D. Gitlin, "An Asymmetric Protocol for Digital Cellular Communications," Proc. IEEE INFOCOM, vol. 3, p. 1053, 1995.
[23] C. Rapier and B. Bennett, "High Speed Bulk Data Transfer Using the SSH Protocol," Proc. 15th ACM Mardi Gras Conf. (MG '08), pp. 1-7, 2008.
[24] P. Rodriguez, S. Mukherjee, and S. Rangarajan, "Session Level Techniques for Improving Web Browsing Performance on Wireless Links," Proc. 13th Int'l Conf. World Wide Web (WWW '04), pp. 121-130, 2004.
[25] M. Satyanarayanan, J.J. Kistler, P. Kumar, M.E. Okasaki, E.H. Siegel, and D.C. Steere, "Coda: A Highly Available File System for a Distributed Workstation Environment," IEEE Trans. Computers, vol. 39, no. 4, pp. 447-459, Apr. 1990.
[26] P. Sinha, N. Venkitaraman, R. Sivakumar, and V. Bharghavan, "WTCP: A Reliable Transport Protocol for Wireless Wide Area Networks," Proc. ACM MobiCom, pp. 231-241, 1999.
[27] The Network Simulator—ns-2, http://www.isi.edu/nsnam/ns, 2009.
[28] M. Walfish, H. Balakrishnan, D. Karger, and S. Shenker, "DoS: Fighting Fire with Fire," Proc. Fourth ACM Workshop Hot Topics in Networks (HotNets '05), 2005.
[29] Z. Zhuang, T.-Y. Chang, R. Sivakumar, and A. Velayutham, "A3: Application-Aware Acceleration for Wireless Data Networks," Proc. ACM MobiCom, pp. 194-205, 2006.

Zhenyun Zhuang received the BE degree in information engineering from the Beijing University of Posts and Telecommunications and the MS degree in computer science from Tsinghua University, China. He is currently working toward the PhD degree in the College of Computing at the Georgia Institute of Technology. His research interests are wireless networking, distributed systems, Web-based systems, and application acceleration techniques. He is a student member of the IEEE.

Tae-Young Chang received the BE degree in electronic engineering in 1999 and the MS degree in telecommunication system technology in 2001 from Korea University, and the PhD degree from the School of Electrical and Computer Engineering at the Georgia Institute of Technology in 2008. He works at Xiocom Wireless, and his research interests are in wireless networks and mobile computing. He is a student member of the IEEE.

Raghupathy Sivakumar received the BE degree in computer science from Anna University, India, in 1996, and the MS and PhD degrees in computer science from the University of Illinois at Urbana-Champaign in 1998 and 2000, respectively. Currently, he is an associate professor in the School of Electrical and Computer Engineering at the Georgia Institute of Technology. He leads the GNAN Research Group, where he and his students do research in the areas of wireless networking, mobile computing, and computer networks. He is a senior member of the IEEE.

Aravind Velayutham received the BE degree in computer science and engineering from Anna University, India, in 2002, and the MS degree in electrical and computer engineering from the Georgia Institute of Technology in 2005. Currently, he is the director of development at Asankya, Inc. His research interests are wireless networks and mobile computing. He is a member of the IEEE.

