Labyrinth: Visually Configurable Data-leakage Detection in Mobile Applications

(Invited Industrial Paper)

Marco Pistoia, Omer Tripp

IBM T. J. Watson Research Center

Yorktown Heights, New York, USA

Email: {pistoia,otripp}@us.ibm.com

Paolina Centonze

Iona College

New Rochelle, New York, USA

Email: [email protected]

Joseph W. Ligman

IBM T. J. Watson Research Center

Yorktown Heights, New York, USA

Email: [email protected]

Abstract: Mobile devices have revolutionized many aspects of our lives. We use smartphones and tablets as portable computers and, often without realizing it, we run various types of security-sensitive programs on them, such as personal and enterprise email and instant-messaging applications, as well as social, banking, insurance and retail programs. These applications access and transmit over the network numerous pieces of private information, including our geographical location, device ID, contacts, calendar events, passwords, and health records, as well as credit-card, social-security, and bank-account numbers. Guaranteeing that no private information is exposed to unauthorized observers is very challenging given the level of complexity that these applications have reached. Furthermore, using program-analysis tools with out-of-the-box configurations in order to detect confidentiality violations may not yield the desired results because only a few pieces of private data, such as the device’s ID and geographical location, are obtained from standard sources. The majority of confidentiality sources (such as credit-card and bank-account numbers) are application-specific and require careful configuration.

This paper presents Labyrinth, a run-time privacy enforcement system that automatically detects leakage of private data originating from standard as well as application-specific sources. Labyrinth features several novel contributions: (i) it allows for visually configuring, directly atop the application’s User Interface (UI), the fields that constitute custom sources of private data; (ii) it does not require operating-system instrumentation, but relies only on application-level instrumentation and on a proxy that intercepts the communication between the mobile device and the back-end servers; and (iii) it performs an enhanced form of value-similarity analysis to detect data leakage even when sensitive data (such as a password) has been encoded or hashed. Labyrinth supports both Android and iOS. We have evaluated Labyrinth experimentally, and in this paper we report results on production-level applications.

I. INTRODUCTION

Mobile applications have access to different categories of private information residing on our smartphones and tablets, such as the unique device ID, current geographical location, calendar events and contacts. Furthermore, mobile applications often receive as input security-sensitive information, including user IDs and passwords, as well as credit-card, social-security and bank-account numbers. Guaranteeing that this private data is not exposed to unintended observers is an essential security requirement. Application providers are encouraged to test their applications thoroughly and to use static and dynamic program-analysis tools in order to discover leakage of private data. Unfortunately, using such tools effectively is very challenging when dealing with confidentiality issues. The problem is that in mobile applications, only a few pieces of private data (such as the device ID, geographical location, contacts and calendar events) are accessed using well-known standard libraries. Using the out-of-the-box configuration of a program-analysis tool may indeed help in detecting leakage of such data. However, commercial and enterprise mobile applications deal with large amounts of application-specific private data, such as user IDs and passwords, health records, and credit-card, bank-account and social-security numbers. The default configuration of a program-analysis tool is likely to miss unauthorized release of custom sensitive data. On the other hand, configuring these tools to detect application-specific data leakage is a non-trivial task, which may require accessing the source code of the application to infer the specific program points through which private data enters and exits the application. In order to perform this advanced configuration, one has to be not only versed in security, but also intimately familiar with the application’s source code and functionality.

This paper presents Labyrinth, a privacy enforcement system for the mobile environment. Labyrinth advances the state of the art in usable security because it allows for visually configuring, directly on the application running on the mobile device, the application-specific sources of confidentiality, through which private data enters the application. Thanks to this visual approach, the security configuration of the run-time enforcement policy can be completed without requiring access to the application’s source code. Instead, the visual configuration for application-specific confidentiality sources is automatically integrated with the default security configuration, which accounts for private data entering the application through well-known libraries, such as the Android LocationManager.

Labyrinth respects the Bring Your Own Device (BYOD) paradigm because it does not require instrumenting the entire operating system of the mobile device. Instead, once an application’s security configuration is completed, Labyrinth instruments only that application so that, at run time, the value of each field that was labeled as a source of private data is logged. Furthermore, Labyrinth automatically instruments the application in order to log other security-sensitive data that is not associated with a visual field in the UI. Such data includes the device’s unique ID, Subscriber Identity Module (SIM) card ID, current geographical location, user contacts, calendar events, photographs, and audio files.

2015 16th IEEE International Conference on Mobile Data Management

978-1-4799-9972-9/15 $31.00 © 2015 IEEE

DOI 10.1109/MDM.2015.69

The enforcement aspect of the Labyrinth system includes a man-in-the-middle proxy that captures and logs all the data that the application exchanges with its back-end servers. The data captured by the proxy is then scanned in order to detect whether any of the fields that had been configured as private on the application is transmitted in the clear, thereby becoming observable to unauthorized entities.

Oftentimes, an application may reformat the data it receives as input, including private data, and transmit over the network a modified version of the original value entered by the user or retrieved from the device. For example, a social-security number entered by the user as 123-45-6789 may be reformatted and transmitted over the network without dashes as 123456789. Rather than looking for exact matches, Labyrinth incorporates a detection algorithm that discovers similarity between values with configurable distance.

Furthermore, Labyrinth accounts for other operations that obfuscate private data without ensuring proper confidentiality. For example, developers typically write application login routines in such a way that a hash of the user’s password, rather than the password itself, is transmitted over the network to the authentication server. To authenticate the user, the server simply has to compare the password hash received from the client with the one computed locally. This prevents the password from having to be transmitted in the clear. Nevertheless, this approach is not safe unless the entire communication is encrypted using, for example, the Secure Sockets Layer (SSL) protocol. In fact, without proper encryption, an attacker could steal the user’s password hash and transmit it over the network to the server along with the corresponding user ID, thereby impersonating that user. Labyrinth instruments the authentication libraries used by the application client-side code in order to log not only the password entered by the user but also its hash, which enables detecting whether the hash has been transmitted in the clear.

We have implemented the Labyrinth system in full (including the visual configuration, client-side instrumentation and logging, server-side proxy and similarity-detection tool) for both the Android and the iOS platforms. This has allowed us to evaluate Labyrinth on a number of production-level commercial and enterprise applications. In this paper, we report the confidentiality violations detected by using Labyrinth on those applications.

The remainder of this paper is organized as follows. Section II introduces the general architecture of the Labyrinth framework. Section III describes how Labyrinth instruments mobile applications in order to collect user-defined and environment-specific sensitive data at run time. This section also presents a novel approach for visual security configuration, which allows even non-developers to define what Labyrinth should track at run time. The security analysis itself is the subject of Section IV. Section V illustrates the results obtained by applying Labyrinth to a number of production-level applications. Section VI compares Labyrinth with other solutions in the area of data-leakage detection for mobile applications. Finally, Section VII concludes this paper.

II. ARCHITECTURE

The Labyrinth design is geared toward low overhead and high portability. Labyrinth does not require instrumenting the entire operating system of a mobile device, but acts at the level of individual applications. The architecture of the Labyrinth system is shown in Figure 1. At run time, Labyrinth uses a Packet Analyzer to collect all the data that an application exchanges with any remote Application Server. Once the data is collected, Labyrinth’s security analyzer detects if private data has been exchanged in the clear, which constitutes a confidentiality violation. Labyrinth will dynamically terminate any unauthorized communication of private data in the clear between the client and the server via the proxy interface.

In order to establish what information has to be considered private, Labyrinth is equipped with an integrated Visual Configuration Framework that gets compiled into the application itself. On Android, the Visual Configuration Framework is embedded directly into the application at the level of the Dalvik code. This operation does not even require accessing the application’s source code. On iOS, the source code is necessary in order to compile the framework into the application, but no source-code editing is required. Therefore, the Visual Configuration Framework allows even non-developers to instrument an application with a security configuration that reflects application-specific confidentiality properties. At run time, the private data accessed by the application either via user input or through standard libraries is collected by the instrumentation layer and stored in a database. Subsequently, that data is compared with the data collected at run time by the Packet Analyzer. If a match is found, a confidentiality warning is reported.

[Figure 1: block diagram with the components Mobile Application, Visual Configuration Framework, Application Logic, Data Collector, Database, Packet Analyzer, Application Server, Data Analyzer, Similarity Analyzer, and Reports.]

Fig. 1. Architecture of the Labyrinth System

Oftentimes, applications reformat the data entered by the end users or retrieved through standard sources, as explained in Section I. In such cases, an analysis looking for a straight match between the data entered at the application level and the data that the application communicates to the remote server would fail at reporting a vulnerability. For this reason, Labyrinth is augmented with a Similarity Analyzer that is capable of detecting data leakages even when data has been reformatted.

According to well-known secure-coding guidelines, an application should never send passwords in the clear to the remote server, even when the communication is encrypted. Rather, the application should be written in such a way that the user’s password is first hashed using a standard message-digest cryptographic algorithm, and only the password hash is sent over the network to the server. At that point, the server can authenticate the user by simply comparing hashes.

Unfortunately, mere hashing of passwords is not sufficient to guarantee confidentiality unless the communication is also encrypted, because an attacker sniffing the network could extract the password hash, available in the clear, and resubmit it as part of a fake authentication process, thereby impersonating the user. The Similarity Analyzer would not be capable of detecting any match between the original password entered by the user and the hash subsequently transmitted over the network. Therefore, Labyrinth instruments the message-digest libraries in order to log not only the original password entered by the user, but also its hash.

We have implemented Labyrinth for both the Android and the iOS platforms. On the client side, Labyrinth is based on application instrumentation. The instrumentation code serves both to visually configure the security policy of the analysis and to collect any security-sensitive data entered by the user or gathered from the environment at run time, as will be described in detail in Section III. On the proxy-server side, Labyrinth exposes an extension point that enables integration with any third-party Packet Analyzer. We chose to use Charles,1 an HTTP proxy / HTTP monitor / reverse proxy that enables a developer to view all of the HTTP and SSL / HTTPS traffic between their computer or mobile device and the Internet. This includes requests, responses and the HTTP headers (which contain the cookies and caching information). Last, the details of the security data-leakage analysis that we have implemented are presented in Section IV.

III. APPLICATION INSTRUMENTATION

Labyrinth embeds instrumented classes into the subject application that load at run time. This is accomplished in a platform-dependent manner. The foundations of Labyrinth are instrumented classes that extend standard Application Programming Interfaces (APIs) with enterprise features. For example, Labyrinth replaces standard UI classes such as views, buttons and form fields with instrumented versions provided as part of the Labyrinth framework.

The most basic extensions execute standard class behavior, but have code that allows Labyrinth to log all the private data entered by end users, such as user ID, password, and credit-card number, as well as data that the application retrieves from the environment, such as the device’s location. The data logged by the Labyrinth framework is stored in a database for subsequent comparison with the data logged at run time by the Packet Analyzer. Figure 2 illustrates how Labyrinth augments a mobile application with a visually configurable instrumentation framework.

Embedding the instrumentation layer into an application is done in a platform-specific manner. Labyrinth provides different implementations for Android- and iOS-based applications.

1 http://www.charlesproxy.com.

[Figure 2: layered diagram of a Mobile Application showing the Visual Configuration Framework, Application Logic, Application Program Interfaces, Automatic Instrumentation, Device-specific Capability Capture, and Transport Layer.]

Fig. 2. Application Instrumentation through Visual Configuration

For Android, the instrumentation layer is backed by a dynamic class loader that, at launch time, for each class requiring specialized behavior, loads a subclass that implements the additional logging logic. The Labyrinth Android class loader can be injected into the application bundle even after compilation, directly inside the Android Application Package (APK) file of the application, even when the application’s source code is not available. This is achieved through Dalvik bytecode rewriting.
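The idea of loading a subclass that keeps the standard behavior and adds logging can be illustrated in plain Java, outside of Android. This is only a sketch under stated assumptions: `FormField`, `LoggingFormField` and `ClientLog` are hypothetical names invented for illustration, not Labyrinth or Android SDK classes, and the sketch does not show the class loader or the Dalvik bytecode rewriting.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a UI form field; not an Android class. */
class FormField {
    private final String name;
    private String text = "";

    FormField(String name) { this.name = name; }

    String getName() { return name; }

    String getText() { return text; }

    void setText(String text) { this.text = text; }
}

/** Hypothetical in-memory client log of "field=value" entries. */
final class ClientLog {
    private final List<String> entries = new ArrayList<>();

    void record(String field, String value) { entries.add(field + "=" + value); }

    List<String> entries() { return entries; }
}

/** Subclass that preserves the standard behavior and additionally logs the value. */
final class LoggingFormField extends FormField {
    private final ClientLog log;

    LoggingFormField(String name, ClientLog log) {
        super(name);
        this.log = log;
    }

    @Override
    void setText(String text) {
        super.setText(text);          // unchanged standard behavior
        log.record(getName(), text);  // additional logging step
    }
}
```

Code that holds a `FormField` reference keeps working unchanged when it is handed a `LoggingFormField`, which is what makes substitution at load time transparent to the application logic.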

The iOS platform, on the other hand, does not use class loading. For this reason, Labyrinth relies on the category features of Objective-C and Swift, which allow any class to be augmented with additional features. This is accomplished through method swizzling, the iOS-specific process of changing the implementation of an existing selector. Method swizzling is enabled by the fact that method invocations in Objective-C and Swift can be modified at run time by changing how selectors are mapped to underlying functions in the class’s dispatch table. Instrumenting an iOS application does not require editing its source code. Nevertheless, the source code is necessary in order to compile the additional instrumentation logic into the application bundle.

As shown in Figure 2, the Labyrinth instrumentation framework supports visual configuration via a Device-specific Capability Capture module, which enables pop-up context menus for enabling or disabling the logging of custom user data. The context menus are displayed in response to standard gestures provided by the platform and allow the selection of source inputs.

With Labyrinth, any administrator, even one lacking knowledge of the internal structure or the code, can easily create an instrumentation configuration, which specifies the features to be tracked at run time. Such features can be application-specific or environment-dependent. Figures 3 and 4 show what the visual configuration of application-specific features looks like, on iOS and Android, respectively, while Figure 5 demonstrates the visual configuration of environment-specific features to track at run time on the iOS platform. The resulting configuration is then used to create a customized instrumented version of the application that is subsequently deployed in place of the original application. This approach greatly improves the usability of the security administration of a mobile application.

Fig. 3. Application-specific Visual Security Configuration on iOS

Fig. 4. Application-specific Visual Security Configuration on Android

Fig. 5. Environment-specific Visual Security Configuration on iOS

IV. SECURITY ANALYSIS

The core of Labyrinth is real-time analysis of the instrumented application. This step involves three components:

1) Client Log: The client log is a product of the instrumentation code. As described in Section I, it records both fields that are generally regarded as sensitive and custom fields specified by the user (via visual configuration) as sensitive.

2) Proxy Log: The proxy server intercepts outbound traffic from the application.

3) Analysis Server: The analysis server matches the fields from the client log against the proxy log. This is done in real time as the proxy intercepts traffic from the device.

There are two main challenges in executing the above scheme:

1) Robustness: Precise matching of sensitive fields against the proxy log is brittle. The application may send out variants of these fields (for example, by removing dashes or hashing the value), in which case matching fails and a false negative occurs.

2) Efficiency: Real-time matching requires high efficiency. Otherwise the slowdown may affect usability and, even more severely, harm the intended functionality of applications on the device because the outgoing requests they send are delayed.

These two challenges are the subject of the remainder of this section.

A. Robustness

The example below is taken from the Flight Customer Service Agent iOS application, which is one of the applications we used to evaluate Labyrinth. This example illustrates how sensitive data is sometimes sent out having undergone certain transformations:

client log : [email protected]
proxy log : {"password":"***","username":"johndoe"}

In this particular case, [email protected] is logged as sensitive user input per the visual configuration. The application then extracts and sends only johndoe (that is, the prefix up to @) as the value of the "username" field. This application-specific transformation defeats naive searching for sensitive inputs in the proxy log.

We address this challenge by enabling a measure of fuzzy matching that is robust to data transformations such as the one described above. In particular, instead of equality (or containment) checking, Labyrinth performs string-similarity analysis. This yields a quantitative notion of similarity between client-logged and proxy-logged values.
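One way to move from containment checking to a quantitative score is approximate substring search: compute the smallest number of single-character edits (insertions, deletions, substitutions) that separates the sensitive value from any substring of a proxy-log entry, and report a leak when that number is within a chosen threshold. The sketch below is an illustration under assumptions, not Labyrinth's actual algorithm; the class `FuzzyFinder`, its methods and the `maxDistance` threshold are hypothetical names.

```java
/** Hypothetical helper: approximate substring search by edit distance. */
final class FuzzyFinder {
    private FuzzyFinder() {}

    /** Smallest edit distance between {@code pattern} and any substring of {@code text}. */
    static int bestDistance(String pattern, String text) {
        int m = pattern.length();
        int n = text.length();
        // Row for the empty pattern prefix is all zeros: a match may start anywhere in text.
        int[] prev = new int[n + 1];
        int[] curr = new int[n + 1];
        for (int i = 1; i <= m; i++) {
            curr[0] = i; // i pattern characters against an empty substring
            for (int j = 1; j <= n; j++) {
                int cost = pattern.charAt(i - 1) == text.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(prev[j] + 1, curr[j - 1] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        // A match may also end anywhere in text: take the minimum over the last row.
        int best = prev[0];
        for (int j = 1; j <= n; j++) {
            best = Math.min(best, prev[j]);
        }
        return best;
    }

    /** True when the sensitive value occurs in the logged entry within maxDistance edits. */
    static boolean leaks(String sensitive, String logged, int maxDistance) {
        return bestDistance(sensitive, logged) <= maxDistance;
    }
}
```

With this sketch, a dashed number such as `123-45-6789` is found in `ssn=123456789&x=1` at distance 2 (the two removed dashes), so a threshold of 2 flags it while a threshold of 1 does not. Choosing the threshold trades false negatives against false positives.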

String metrics are appropriate, as private data often manifests as strings of American Standard Code for Information Interchange (ASCII) characters [1, 2, 3]. These include device identifiers (such as the International Mobile Equipment Identity (IMEI) and International Mobile Subscriber Identity (IMSI) numbers), Global Positioning System (GPS) coordinates, and Inter-process Communication (IPC) parameters.

Many string metrics have been proposed to date [4]. Two simple and popular metrics, which are also efficiently computable, are the following:

a) Hamming Distance: This metric assumes that the strings are of equal length. The Hamming distance between two strings is equal to the number of positions at which the corresponding symbols are different, as indicated by the indicator function $\delta_{c_1 \neq c_2}$:

\[
\mathrm{ham}(a, b) = \sum_{0 \le i < |a|} \delta_{c_1 \neq c_2}\bigl(a(i), b(i)\bigr)
\]

In another view, Hamming distance measures the number of substitutions required to change one string into the other.
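The definition translates almost directly into code. The following is a minimal sketch; the class name `Hamming` is chosen here for illustration, and rejecting strings of different lengths with an exception is an assumption about how the equal-length precondition might be enforced.

```java
/** Illustrative Hamming distance over Java strings (compared char by char). */
final class Hamming {
    private Hamming() {}

    static int distance(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("Hamming distance requires equal-length strings");
        }
        int d = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                d++; // the indicator function contributes 1 at each differing position
            }
        }
        return d;
    }
}
```

For instance, `Hamming.distance("karolin", "kathrin")` is 3, because the strings differ at three positions. The single pass makes the cost linear in the string length.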

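b) Levenshtein Distance: The Levenshtein string metric computes the distance between strings a and b as $\mathrm{lev}_{a,b}(|a|, |b|)$ (abbreviated as $\mathrm{lev}(|a|, |b|)$), where

\[
\mathrm{lev}(i, j) =
\begin{cases}
\max(i, j) & \text{if } \min(i, j) = 0 \\[4pt]
\min \begin{pmatrix}
\mathrm{lev}(i-1, j) + 1 \\
\mathrm{lev}(i, j-1) + 1 \\
\mathrm{lev}(i-1, j-1) + \delta_{a_i \neq b_j}
\end{pmatrix} & \text{otherwise}
\end{cases}
\]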

Informally, lev(|a|, |b|) is the minimum number of single-character edits (insertion, deletion or substitution) needed to transform string a into string b. An efficient algorithm for computing the Levenshtein distance is bottom-up dynamic programming [5]. The asymptotic complexity is O(|a| · |b|).
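The recurrence can be evaluated bottom-up while keeping only two rows of the table. The following sketch assumes nothing beyond the definition above; the class name `Levenshtein` is illustrative, and the two-row layout is a common space optimization rather than something the paper prescribes.

```java
/** Illustrative bottom-up Levenshtein distance using two rolling rows. */
final class Levenshtein {
    private Levenshtein() {}

    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // lev(0, j) = j
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // lev(i, 0) = i
            for (int j = 1; j <= b.length(); j++) {
                int subst = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(prev[j] + 1,      // deletion
                                 curr[j - 1] + 1), // insertion
                        prev[j - 1] + subst);      // substitution or match
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

As a usage example, `Levenshtein.distance("kitten", "sitting")` is 3, and a number with two dashes removed is at distance 2 from its original. Time is O(|a| · |b|) as stated above, while memory drops to O(|b|).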

Another dimension of robustness is to handle string transformations beyond plain text. Private information is sometimes released following standard hashing or encoding transformations, such as the Base64 scheme, as illustrated in Figure 6. These are beyond the discriminative power of string metrics. Fortunately, the transformations that commonly manifest in leakage scenarios are all standard, and there is a small number of such transformations [2].

To account for such transformations, Labyrinth applies each of them to the values extracted from the client log, thereby exploding private values into multiple representations. This is done lazily according to the standard encoding and hashing functions exercised by the client application.
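A small sketch of exploding one private value into several representations is shown below. The chosen set of transformations (Base64, hexadecimal, MD5, SHA-256) is an assumption made for illustration, since the text does not enumerate them, and the class name `Representations` is hypothetical. For brevity the sketch computes every representation eagerly, whereas the text describes doing so lazily, driven by the functions the application actually exercises.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that derives encoded and hashed forms of a private value. */
final class Representations {
    private Representations() {}

    /** Maps a transformation name to the transformed value (assumed transformation set). */
    static Map<String, String> explode(String value) {
        byte[] raw = value.getBytes(StandardCharsets.UTF_8);
        Map<String, String> out = new LinkedHashMap<>();
        out.put("plain", value);
        out.put("base64", Base64.getEncoder().encodeToString(raw));
        out.put("hex", toHex(raw));
        out.put("md5", toHex(digest("MD5", raw)));
        out.put("sha-256", toHex(digest("SHA-256", raw)));
        return out;
    }

    private static byte[] digest(String algorithm, byte[] data) {
        try {
            return MessageDigest.getInstance(algorithm).digest(data);
        } catch (NoSuchAlgorithmException e) {
            // MD5 and SHA-256 are required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Each resulting string can then be searched for in the proxy log like any other sensitive value, so that a Base64-encoded or hashed form of the value is recognized even though it shares no characters with the original.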

    TelephonyManager tm =
        getSystemService(TELEPHONY_SERVICE);
    String imei = tm.getDeviceId(); // source
    String encodedIMEI =
        Base64Encoder.encode(imei);
    Log.i(encodedIMEI); // sink

Fig. 6. Adaptation of the DroidBench Loop1 Benchmark, Which Releases the Device ID Following Base64 Encoding

B. Efficiency

While Hamming and Levenshtein are efficient string metrics, applying them to a large volume of data can still potentially create performance challenges. An important observation that we leverage is that both metrics, and in particular Levenshtein (which is the metric used by Labyrinth by default), are more efficient at solving multiple small queries than a single big query.

We exploit this by decomposing private values into indivisible information units. These are components of the private value that cannot be broken down further. In our specification, the phone, IMEI and IMSI numbers consist of only one unit of information. Location, on the other hand, is an example of a data structure that consists of several distinct information units. These include the longitude and latitude values, and for each of these values, their integral and fractional parts. Beyond serving as a significant performance optimization, decomposition also enables more granular, and thus more accurate, comparisons.
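The decomposition of a location into information units can be sketched as follows. The class `InformationUnits`, its method names, and the choice to represent coordinates as decimal strings are assumptions made for illustration; the paper does not specify the data representation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical decomposition of private values into indivisible information units. */
final class InformationUnits {
    private InformationUnits() {}

    /** Identifiers such as phone, IMEI or IMSI numbers form a single unit. */
    static List<String> ofIdentifier(String id) {
        return Collections.singletonList(id);
    }

    /** A location yields the integral and fractional parts of latitude and longitude. */
    static List<String> ofLocation(String latitude, String longitude) {
        List<String> units = new ArrayList<>();
        addParts(latitude, units);
        addParts(longitude, units);
        return units;
    }

    private static void addParts(String coordinate, List<String> units) {
        int dot = coordinate.indexOf('.');
        if (dot < 0) {
            units.add(coordinate); // no fractional part present
            return;
        }
        units.add(coordinate.substring(0, dot));
        String fraction = coordinate.substring(dot + 1);
        if (!fraction.isEmpty()) {
            units.add(fraction);
        }
    }
}
```

For a latitude of `41.2709` and a longitude of `-73.7776`, this produces the four units `41`, `2709`, `-73` and `7776`. Each unit is short, so every similarity query against the proxy log stays small.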

A second performance optimization, necessary for Labyrinth’s role as a real-time privacy-enforcement solution, is differential (or incremental) comparison between the client and proxy logs. Log analysis is triggered only when new entries are created in the client and/or proxy logs. When this happens, (i) newly introduced sensitive values are matched against the entire proxy log, and (ii) all sensitive values are matched against newly introduced proxy log entries.
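The two triggers (i) and (ii) can be sketched with a small stateful matcher. Everything named here (`IncrementalMatcher`, `onSensitiveValue`, `onProxyEntry`) is hypothetical, and the matching predicate is plain containment purely as a placeholder; a similarity metric with a threshold could be substituted for it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical incremental comparison between a client log and a proxy log. */
final class IncrementalMatcher {
    private final List<String> sensitiveValues = new ArrayList<>();
    private final List<String> proxyEntries = new ArrayList<>();

    /** (i) A new sensitive value is matched against the entire proxy log seen so far. */
    List<String> onSensitiveValue(String value) {
        List<String> hits = new ArrayList<>();
        for (String entry : proxyEntries) {
            if (matches(value, entry)) {
                hits.add(value + " -> " + entry);
            }
        }
        sensitiveValues.add(value);
        return hits;
    }

    /** (ii) All known sensitive values are matched against the new proxy entry only. */
    List<String> onProxyEntry(String entry) {
        List<String> hits = new ArrayList<>();
        for (String value : sensitiveValues) {
            if (matches(value, entry)) {
                hits.add(value + " -> " + entry);
            }
        }
        proxyEntries.add(entry);
        return hits;
    }

    /** Placeholder predicate (exact containment), standing in for a similarity check. */
    private static boolean matches(String value, String entry) {
        return entry.contains(value);
    }
}
```

Because each call compares only the new item against the stored items of the other log, every (value, entry) pair is examined exactly once, no matter how the two logs interleave.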

V. EVALUATION

We implemented Labyrinth for both the Android and iOS platforms, and evaluated this technology on six iOS and six Android applications, for a total of twelve applications. We picked these applications from a sample of enterprise and commercial applications based on the amount of input they collect, either directly from the user, through the UI, or indirectly, from the device. We noticed that applications that are very intensive in collecting input are also more likely to release information to unauthorized observers. The results of the evaluation are shown in Table I.

Of the six iOS applications, five are enterprise applications and the remaining one is a commercial application.

The first five applications in the table are written in Swift and developed for industrial purposes.2 These applications are all compatible with iOS 8 and designed for use within enterprises. As Table I shows, the categories of these first five applications are related to travel, retail, finance and government. The evaluation of these five applications was performed before they were delivered to customers. Numerous vulnerabilities were found in these enterprise applications, and they were subsequently corrected. This serves as evidence that even sensitive applications, written with caution and subjected to thorough code reviews, still suffer from subtle instances of unauthorized information release.

2 Due to their sensitive role in industry, we were advised not to disclose details of these applications.

TABLE I. EXPERIMENTAL RESULTS

| Application | Type | Platform | Category | Data Leaked |
|---|---|---|---|---|
| A | Enterprise | iOS | Travel | user name, password, first name, last name, photograph, employee ID |
| B | Enterprise | iOS | Travel | user name, password, first name, last name, home airport code, employee ID |
| C | Enterprise | iOS | Government | none |
| D | Enterprise | iOS | Retail | none |
| E | Enterprise | iOS | Financial | user name, password hash |
| SwiftWeather | Commercial | iOS | Weather | location |
| Realtor | Commercial | Android | Lifestyle | location |
| Business Plan & Start Startup | Commercial | Android | Business | user name, password, first name, last name, email address |
| myHomework Student Planner | Commercial | Android | Education | none |
| Stock Maniac Hong Kong | Commercial | Android | Puzzle | user name, password, first name, last name, email address, age, gender |
| Ringtone Scanner | Commercial | Android | Tools | device ID |
| British Bingo | Commercial | Android | Game | device ID |

The sixth application, SwiftWeather, is an open-source iOS application developed in the Swift language.3 This application supports the iPhone 4, 4S, 5, 5S, 6 and 6 Plus, and uses Apple’s Today widgets.

All six Android applications reported in Table I are commercial applications, each belonging to a different category. The first five are available on Google Play, and the sixth one on Android Freeware.4

3 https://github.com/JakeLin/SwiftWeather

4 http://androidfreeware.net/

Out of the six iOS applications in Table I, four exhibited serious confidentiality problems. A and B revealed the user name, password, first and last name, and employee ID. In addition, A exposed the user’s photograph, while B gave away the home location of the user by revealing the closest airport to the user’s home. C and D did not exhibit any confidentiality issue since they correctly encrypt the entire client/server communication. E leaked the user name and the password hash, both in the clear. Finally, SwiftWeather exposed the user’s geographic location, both in the request made by the client to the server and in the response from the server to the client. For Labyrinth to detect both flows of communication, it was essential to configure the Packet Analyzer to capture not only the packets sent by the client to the server but also those sent by the server to the client.
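To make the two observations above concrete (a leaked password hash, and leakage visible in both directions of the traffic), the following sketch derives a few candidate representations of a sensitive value and searches for them in the request and the response. It is an illustration under stated assumptions: the set of encodings and hash functions, the function names, and the exact substring search are all hypothetical choices, and they stand in for Labyrinth's value-similarity analysis rather than reproduce it.

```python
import base64
import hashlib
from typing import Dict, List, Tuple
from urllib.parse import quote


def candidate_forms(value: str) -> Dict[str, str]:
    """Representations under which a sensitive value might travel.

    The list is an assumption for this sketch; a real deployment
    would choose the encodings and hash functions it cares about.
    """
    raw = value.encode("utf-8")
    candidates = (
        ("plain", value),
        ("url", quote(value, safe="")),
        ("base64", base64.b64encode(raw).decode("ascii")),
        ("hex", raw.hex()),
        ("sha1", hashlib.sha1(raw).hexdigest()),
        ("sha256", hashlib.sha256(raw).hexdigest()),
    )
    forms: Dict[str, str] = {}
    for name, text in candidates:
        if text not in forms.values():  # skip forms identical to an earlier one
            forms[name] = text
    return forms


def find_leaks(value: str, request_body: str,
               response_body: str) -> List[Tuple[str, str]]:
    """Report (direction, form) pairs where a candidate form appears."""
    hits = []
    forms = candidate_forms(value)
    for direction, body in (("request", request_body),
                            ("response", response_body)):
        for form, text in forms.items():
            if text in body:
                hits.append((direction, form))
    return hits


# Usage: a hashed password in the request, a plain value in the response.
password = "hunter2"
request = "user=bob&pw=" + hashlib.sha256(password.encode("utf-8")).hexdigest()
leaks = find_leaks(password, request, "welcome back")
```

Checking the response as well as the request is what allows a finding such as the SwiftWeather location echo to be reported; inspecting only outbound packets would miss it.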

Except for myHomework Student Planner, all the Android applications we analyzed exhibit confidentiality problems. The Realtor application leaks the user’s location. Ringtone Scanner and British Bingo expose the device ID, a vulnerability that can no longer affect iOS applications, since on iOS the device ID has become programmatically inaccessible. Business Plan & Start Startup and Stock Maniac Hong Kong allow the client to communicate with the backend on an unencrypted channel, thereby letting private information, such as the user name, password, first name, last name and email address of the user, become accessible to unauthorized users. In addition, Stock Maniac Hong Kong reveals the user’s age and gender. The fact that these applications are available through Google Play (the official application store for Android, operated by Google) shows how common it can be for an end user to download and run mobile applications and, by doing so, inadvertently expose private information to unauthorized parties.

The overhead of introducing instrumentation is relatively small. For example, on Android, the application footprint usually increases by less than 100 KB when it is instrumented. At run time, our measurements indicate that the heap allocation and number of objects increase by 20% or less, while consumption of CPU cycles increases by 2.9%.

VI. RELATED WORK

This paper makes contributions in several fields, particularly in the fields of mobile data security and mobile application management. In this section, we compare our work with the state of the art in these fields.


A. Mobile Data Security

The state-of-the-art system for real-time privacy monitoring is TaintDroid [1]. TaintDroid features tolerable run-time overhead of about 10%, and can track taint flow not only through variables and methods but also through files and messages passed between applications.

TaintDroid has been used, extended and customized by several follow-up research projects. Jung et al. [6] enhance TaintDroid to track additional sources, including contacts, camera, and microphone. They use the enhanced version in a field study, which revealed 129 of the 223 applications they studied as vulnerable; 30 out of 257 alarms were judged to be false positives. The Kynoid system [7] extends TaintDroid with user-defined security policies, which include, for example, temporal constraints on data processing as well as restrictions on destinations to which data is released. Besides the fact that Labyrinth supports both Android and iOS, and not just Android, the main difference between Labyrinth and the approaches above is that Labyrinth exercises fuzzy reasoning, in the form of statistical classification, rather than enforcing a clear-cut criterion. As part of this, Labyrinth factors into the privacy judgment the data values flowing into the Packet Analyzer. This provides additional evidence beyond data flow.

From this point of view, Labyrinth is more similar to BayesDroid [8], with several important differences: Labyrinth (i) also supports the iOS platform, (ii) does not rely on operating-system instrumentation, but only on application-level instrumentation, (iii) accounts for custom confidentiality sources via a visual configuration tool, (iv) bypasses the challenge of capturing code-level release points (which may occur, for example, in native code) by relying on a proxy to intercept outbound communication, and (v) uses the proxy to further account for incoming communication, which provides additional leverage, as illustrated in the case of the SwiftWeather application, discussed in Section V.

Zhou et al. [9] have designed and implemented a system that detects instances where Android applications without any permission may acquire sensitive information about the user, including the user’s identity, healthcare records, geolocation and driving routes. This motivates the notion of user-centric privacy enforcement, as embodied in Labyrinth, which is not part of Zhou et al.’s contributions. The authors also present a mitigation mechanism to overcome at least some of the privacy violations detected by their system. Their mechanism, similar to packet padding [10, 11], is demonstrated to be effective, though no evidence is provided for the absence of side effects.

Nadkarni and Enck [12] observe that modern mobile operating systems, such as Android, iOS and Windows Mobile, allow tasks to be completed by stringing together a collection of purpose-specific user applications, potentially causing accidental disclosure of private information. To address this problem, they introduce the Aquifer framework, which lets the developer define secrecy restrictions that protect the entire UI workflow defining the user task. Unlike Aquifer, however, Labyrinth allows end users to visually configure the privacy policy of an application.

AppIntent [13] also aims at detecting privacy violations, though this system is limited to Android mobile applications and does not support iOS. There are some similarities with Labyrinth. Most notably, AppIntent outputs a sequence of GUI interactions that lead to transmission of sensitive data, thus helping an analyst determine whether that behavior was intended. This is reminiscent of Labyrinth’s UI extraction process, though our goal with UI interaction is to capture views and link them with sensitive information release.

AppFence [14] retrofits the Android operating system to implement the following two controls for use with unmodified applications: (i) covertly substituting shadow data in place of data that the user wishes to keep private; and (ii) blocking network transmissions that contain data the user made available to the application for on-device use only. While both AppFence and Labyrinth perform privacy enforcement via data mocking, Labyrinth acts at the application level and does not require modifications to the platform, which is key for portability.

Like Labyrinth, the Aurasium system [15] also bypasses the need to modify the Android OS while providing much of the security and privacy that users desire. Aurasium automatically repackages applications with additional user-level sandboxing and policy enforcement code, which closely monitors the application’s behavior for security and privacy violations. Still, the notion of user-centric privacy enforcement (and its visual configuration) is absent from Aurasium.

To the best of our knowledge, Labyrinth is the first mobile security analysis tool that significantly enhances the research field of usable security by providing a visual tool for security-policy configuration that allows administrators to define the confidentiality rules for a mobile application by interacting directly with the application’s UI at run time. This is done by enabling administrators to use the mobile platform’s native gesture capabilities in order to select application-specific security fields as well as environment-dependent private data.

B. Mobile Application Management

Labyrinth enforces data privacy at the level of each individual mobile application as opposed to the granularity of the entire mobile device. From this point of view, Labyrinth has goals similar to those of MyExperience [16], LiveLab [17], and SystemSens [18] in providing a comprehensive system for monitoring and logging application usage and context. For example, MyExperience is a logging platform for Microsoft Windows-based smartphones that provides a sensor/trigger/action abstraction for monitoring applications in situ. LiveLab uses a background monitoring component that can be reprogrammed and reconfigured after deployment but can only be run on jail-broken iPhones. The SystemSens monitoring platform has goals similar to those of MyExperience and LiveLab, namely gaining a better understanding of mobile usage by monitoring application parameters as well as additional application context, including location and wireless network conditions. It includes a server-side data collection architecture and can be reconfigured. However, Labyrinth goes beyond these systems and provides a general framework for adding enterprise security tracking and enforcement. Labyrinth application modeling is tailored towards the creation of a visual model, which is more lightweight than the other frameworks.

VII. CONCLUSION AND FUTURE WORK

In this paper, we presented Labyrinth, a run-time privacy enforcement system that automatically detects leakage of private data originating from standard as well as application-specific sources. Labyrinth features several novel contributions: (i) it allows for visually configuring, directly atop the application’s UI, the fields that constitute custom sources of private data; (ii) it does not require operating-system instrumentation, but relies only on application-level instrumentation and on a proxy that intercepts the communication between the mobile device and the back-end servers; and (iii) it performs an enhanced form of value-similarity analysis to detect data leakage even when sensitive data (such as a password) has been encoded or hashed. Labyrinth supports both Android and iOS. We have evaluated Labyrinth experimentally, and in this paper we reported results on production-level applications.

In the future, we intend to extend Labyrinth and make it part of commercial and enterprise production environments. This will require pre-instrumenting enterprise applications with the Labyrinth framework and pre-defining, for those applications, the relevant security policies, which label both application-specific and environment-dependent confidentiality sources. The Packet Analyzer proxy will dynamically detect matches between the values communicated between the client and the server and any of the security-sensitive values configured on the application. In addition, the Packet Analyzer will correct any vulnerability occurring at run time by obfuscating confidential data, partially or in its entirety, before it is transmitted, in order to prevent leakage of security-sensitive values.
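The planned obfuscation step could take a form along the following lines. Since this capability is future work, everything in the sketch is hypothetical: the function names, the `keep_last` parameter that controls partial versus full masking, and the use of exact string replacement in place of the similarity-based matching.

```python
from typing import Iterable


def mask(value: str, keep_last: int = 0, fill: str = "*") -> str:
    """Hide a value entirely, or all but its last `keep_last` characters."""
    if keep_last <= 0:
        return fill * len(value)
    visible = value[-keep_last:]
    return fill * (len(value) - len(visible)) + visible


def obfuscate_payload(payload: str, sensitive_values: Iterable[str],
                      keep_last: int = 0) -> str:
    """Mask every exact occurrence of a configured sensitive value.

    Longer values are handled first so that a value containing a
    shorter one is masked as a whole.
    """
    for value in sorted(set(sensitive_values), key=len, reverse=True):
        if value:  # an empty string would match everywhere
            payload = payload.replace(value, mask(value, keep_last))
    return payload


# Usage: partial obfuscation of a card number in an outgoing payload.
outgoing = "card=4111111111111111&name=Ann"
masked = obfuscate_payload(outgoing, ["4111111111111111"], keep_last=4)
```

Masking preserves the length of the payload, which keeps fixed-width fields well formed; whether a back-end server tolerates masked values is a separate question that such a mechanism would need to address.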

We also intend to develop specialized capabilities to handle encrypted communication with unauthorized third parties, such as analytics and advertising servers. This will require usage of statistical learning methods and/or databases of signatures.

REFERENCES

[1] W. Enck, P. Gilbert, B. Chun, L. P. Cox, J. Jung, P. McDaniel, and A. N. Sheth, “TaintDroid: An Information-flow Tracking System for Real-time Privacy Monitoring on Smartphones,” in OSDI, 2010.

[2] P. Hornyack, S. Han, J. Jung, S. Schechter, and D. Wetherall, “These Aren’t the Droids You’re Looking for: Retrofitting Android to Protect Data from Imperious Applications,” in CCS, 2011.

[3] Z. Yang, M. Yang, Y. Zhang, G. Gu, P. Ning, and X. S. Wang, “AppIntent: Analyzing Sensitive Data Transmission in Android for Privacy Leakage Detection,” in CCS, 2013.

[4] J. Piskorski and M. Sydow, “String Distance Metrics for Reference Matching and Search Query Correction,” in BIS, 2007.

[5] R. A. Wagner and M. J. Fischer, “The String-to-String Correction Problem,” J. ACM, vol. 21, no. 1, pp. 168–173, 1974.

[6] J. Jung, S. Han, and D. Wetherall, “Short Paper: Enhancing Mobile Application Permissions with Run-time Feedback and Constraints,” in SPSM, 2012.

[7] D. Schreckling, J. Posegga, J. Köstler, and M. Schaff, “Kynoid: Real-Time Enforcement of Fine-grained, User-defined, and Data-centric Security Policies for Android,” in WISTP, 2012.

[8] O. Tripp and J. Rubin, “A Bayesian Approach to Privacy Enforcement in Smartphones,” in USENIX Security, 2014.

[9] X. Zhou, S. Demetriou, D. He, M. Naveed, X. Pan, X. Wang, C. A. Gunter, and K. Nahrstedt, “Identity, Location, Disease and More: Inferring Your Secrets from Android Public Resources,” in CCS, 2013.

[10] S. Chen, R. Wang, X. Wang, and K. Zhang, “Side-Channel Leaks in Web Applications: A Reality Today, a Challenge Tomorrow,” in S&P, 2010.

[11] Q. Sun, D. R. Simon, Y. Wang, W. Russell, V. N. Padmanabhan, and L. Qiu, “Statistical Identification of Encrypted Web Browsing Traffic,” in S&P, 2002.

[12] A. Nadkarni and W. Enck, “Preventing Accidental Data Disclosure in Modern Operating Systems,” in CCS, 2013.

[13] Z. Yang, M. Yang, Y. Zhang, G. Gu, P. Ning, and X. S. Wang, “AppIntent: Analyzing Sensitive Data Transmission in Android for Privacy Leakage Detection,” in CCS, 2013.

[14] P. Hornyack, S. Han, J. Jung, S. E. Schechter, and D. Wetherall, “These Aren’t the Droids You’re Looking for: Retrofitting Android to Protect Data from Imperious Applications,” in CCS, 2011.

[15] R. Xu, H. Saïdi, and R. Anderson, “Aurasium: Practical Policy Enforcement for Android Applications,” in USENIX Security, 2012.

[16] J. Froehlich, M. Y. Chen, S. Consolvo, B. L. Harrison, and J. A. Landay, “MyExperience: A System for in situ Tracing and Capturing of User Feedback on Mobile Phones,” in MobiSys, 2007.

[17] C. Shepard, A. Rahmati, C. Tossell, L. Zhong, and P. T. Kortum, “LiveLab: Measuring Wireless Networks and Smartphone Users in the Field,” SIGMETRICS Performance Evaluation Review, vol. 38, no. 3, 2010.

[18] D. E. H. Falaki, R. Mahajan, “SystemSens: A Tool for Monitoring Usage in Smartphone Research Deployments,” in MobiArch, 2011.
