PrIA: A Private Intelligent Assistant

6
PrIA: A Private Intelligent Assistant Aruna Balasubramanian, Niranjan Balasubramanian, Supriyo Chakraborty*, Shashank Jain, and Vivek Tiwari Stony Brook University, *IBM T. J. Watson Research Center ABSTRACT Personalized services such as news recommendations are be- coming an integral part of our digital lives. The problem is that they extract a steep cost in terms of privacy. The service providers collect and analyze user’s personal data to provide the service, but can infer sensitive information about the user in the process. In this work we ask the ques- tion “How can we provide personalized news recommenda- tion without sharing sensitive data with the provider?” We propose a local private intelligence assistance frame- work (PrIA), which collects user data and builds a profile about the user and provides recommendations, all on the user’s personal device. It decouples aggregation and per- sonalization: it uses the existing aggregation services on the cloud to obtain candidate articles but makes the personal- ized recommendations locally. Our proof-of-concept imple- mentation and small scale user study shows the feasibility of a local news recommendation system. In building a private profile, PrIA avoids sharing sensitive information with the cloud-based recommendation service. However, the trade- off is that unlike cloud-based services, PrIA cannot leverage collective knowledge from large number of users. We quan- tify this trade-off by comparing PrIA with Google’s cloud- based recommendation service. We find that the average precision of PrIA’s recommendation is only 14% lower than that of Google’s service. Rather than choose between pri- vacy or personalization, this result motivates further study of systems that can provide both with acceptable trade-offs. 1. INTRODUCTION Personalized intelligence assistance comes at a steep cost to user privacy. Today, there are a range of personalized ser- vices including recommending news articles, recommending music, setting up appointments, answering questions, and much more. Most require the user to fork over their data such as click history, search queries, voice commands. While essential for effective personalization, this data can reveal sensitive information about the users which many would like Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full cita- tion on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or re- publish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. HotMobile ’17, February 21-22, 2017, Sonoma, CA, USA c 2017 ACM. ISBN 978-1-4503-4907-9/17/02. . . $15.00 DOI: http://dx.doi.org/10.1145/3032970.3032988 to keep private. Unfortunately, there is no viable alterna- tive today. A privacy-conscious user has to choose between getting these IA services and privacy. In this work, we pro- pose a privacy focused solution for one particular IA ser- vice, personalizing news recommendation and discuss issues in scaling the approach to a broader class of IA services. To illustrate the privacy issues in news recommendation consider Alice, a typical news consumer. She has many var- ied interests and uses a cloud-based news recommendations service, such as Google News, to follow news from different sources. Alice logs onto Google’s service; the service in-turn tracks news articles read by Alice and builds a user pro- file for her. The cloud-based service also aggregates articles from large number of news sources, and then uses a recom- mendation system to selects articles for Alice according to her user profile. The user profile built for Alice could reveal sensitive information about her, for example, pertaining to her political preference. We envision PrIA a Private Intelligent Assistant which provides personalized news recommendation similar to ser- vices such as Google News. However, different from these cloud-based services, the main idea in PrIA is to obtain ag- gregated news from the cloud but personalize it locally. PrIA tracks the user activities, builds a profile, and make recom- mendations to the user all from within the user’s personal device and thus does not divulge the user’s interactions to the aggregation service. In effect, PrIA decouples news ag- gregation from personalization. PrIA trades off knowledge of collective user behavior for more personalized local knowledge. Typically, intelligent as- sistant services gather data from a large number of users to build global models on the population at large. These global models are known to significantly improve personal- ization using techniques such as collaborative filtering [6]. By virtue of being private, PrIA does not have access to the global user model. However, we posit that PrIA can build a richer profile about the individual user, by combining textual data across the user’s email, social networks, browsing ac- tivities, and other contextual information. Cloud providers are limited to a smaller slice pertaining to interactions on their specific service or on their ecosystem. For example, the user’s Facebook posts on the Facebook application cannot be used by Google for their recommendation. By building a better user profile with more data, we hypothesize that PrIA can improve its personalization despite not having access to global user behavior. We show the feasibility of a local news recommendation system through a proof-of-concept implementation. This

Transcript of PrIA: A Private Intelligent Assistant

PrIA: A Private Intelligent Assistant

Aruna Balasubramanian, Niranjan Balasubramanian, Supriyo Chakraborty*,Shashank Jain, and Vivek Tiwari

Stony Brook University, *IBM T. J. Watson Research Center

ABSTRACTPersonalized services such as news recommendations are be-coming an integral part of our digital lives. The problemis that they extract a steep cost in terms of privacy. Theservice providers collect and analyze user’s personal datato provide the service, but can infer sensitive informationabout the user in the process. In this work we ask the ques-tion “How can we provide personalized news recommenda-tion without sharing sensitive data with the provider?”

We propose a local private intelligence assistance frame-work (PrIA), which collects user data and builds a profileabout the user and provides recommendations, all on theuser’s personal device. It decouples aggregation and per-sonalization: it uses the existing aggregation services on thecloud to obtain candidate articles but makes the personal-ized recommendations locally. Our proof-of-concept imple-mentation and small scale user study shows the feasibility ofa local news recommendation system. In building a privateprofile, PrIA avoids sharing sensitive information with thecloud-based recommendation service. However, the trade-off is that unlike cloud-based services, PrIA cannot leveragecollective knowledge from large number of users. We quan-tify this trade-off by comparing PrIA with Google’s cloud-based recommendation service. We find that the averageprecision of PrIA’s recommendation is only 14% lower thanthat of Google’s service. Rather than choose between pri-vacy or personalization, this result motivates further studyof systems that can provide both with acceptable trade-offs.

1. INTRODUCTIONPersonalized intelligence assistance comes at a steep cost

to user privacy. Today, there are a range of personalized ser-vices including recommending news articles, recommendingmusic, setting up appointments, answering questions, andmuch more. Most require the user to fork over their datasuch as click history, search queries, voice commands. Whileessential for effective personalization, this data can revealsensitive information about the users which many would like

Permission to make digital or hard copies of all or part of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor profit or commercial advantage and that copies bear this notice and the full cita-tion on the first page. Copyrights for components of this work owned by others thanACM must be honored. Abstracting with credit is permitted. To copy otherwise, or re-publish, to post on servers or to redistribute to lists, requires prior specific permissionand/or a fee. Request permissions from [email protected].

HotMobile ’17, February 21-22, 2017, Sonoma, CA, USAc© 2017 ACM. ISBN 978-1-4503-4907-9/17/02. . . $15.00

DOI: http://dx.doi.org/10.1145/3032970.3032988

to keep private. Unfortunately, there is no viable alterna-tive today. A privacy-conscious user has to choose betweengetting these IA services and privacy. In this work, we pro-pose a privacy focused solution for one particular IA ser-vice, personalizing news recommendation and discuss issuesin scaling the approach to a broader class of IA services.

To illustrate the privacy issues in news recommendationconsider Alice, a typical news consumer. She has many var-ied interests and uses a cloud-based news recommendationsservice, such as Google News, to follow news from differentsources. Alice logs onto Google’s service; the service in-turntracks news articles read by Alice and builds a user pro-file for her. The cloud-based service also aggregates articlesfrom large number of news sources, and then uses a recom-mendation system to selects articles for Alice according toher user profile. The user profile built for Alice could revealsensitive information about her, for example, pertaining toher political preference.

We envision PrIA a Private Intelligent Assistant whichprovides personalized news recommendation similar to ser-vices such as Google News. However, different from thesecloud-based services, the main idea in PrIA is to obtain ag-gregated news from the cloud but personalize it locally. PrIAtracks the user activities, builds a profile, and make recom-mendations to the user all from within the user’s personaldevice and thus does not divulge the user’s interactions tothe aggregation service. In effect, PrIA decouples news ag-gregation from personalization.

PrIA trades off knowledge of collective user behavior formore personalized local knowledge. Typically, intelligent as-sistant services gather data from a large number of usersto build global models on the population at large. Theseglobal models are known to significantly improve personal-ization using techniques such as collaborative filtering [6].By virtue of being private, PrIA does not have access to theglobal user model. However, we posit that PrIA can build aricher profile about the individual user, by combining textualdata across the user’s email, social networks, browsing ac-tivities, and other contextual information. Cloud providersare limited to a smaller slice pertaining to interactions ontheir specific service or on their ecosystem. For example, theuser’s Facebook posts on the Facebook application cannotbe used by Google for their recommendation. By building abetter user profile with more data, we hypothesize that PrIAcan improve its personalization despite not having access toglobal user behavior.

We show the feasibility of a local news recommendationsystem through a proof-of-concept implementation. This

PrIA implementation logs information about the user fromtheir browser history and their Facebook and Twitter feedsto create a user profile that represents the user’s core in-terests. The user profile is represented as a graph whichcontains the articles, and the key entities and topics ex-tracted from the articles. Given a set of candidate articlesfrom which to recommend, PrIA ranks the articles based ontheir similarity to the user profile graph by measuring thecentrality of each candidate article within the profile graph.PrIA obtains publicly available candidate news articles, ageneric collection without providing login credentials anddoes not reveal which of these articles that the user likesor reads. Thus, no sensitive information is revealed to theaggregation site.

We conduct a small scale user study with six subjectsover a period of around ten days to show that such a localrecommendation can be provided from laptop-class devices.Based on user feedback, we compare PrIA’s local recommen-dation with Google’s news recommendation service. Unsur-prisingly, PrIA’s recommendations have lower precision interms of the number of recommendations that were usefulto the user. However, despite not having access to collectiveuser knowledge, the average precision of PrIA is only 14%lower compared to Google’s service. In other words, provid-ing the news recommendation service with privacy guaran-tees only requires a modest cost in terms of personalizationeffectiveness. We believe that PrIA’s recommendations canbe made even more precise by tuning our NLP algorithms.

We envision that the PrIA framework can be used to sup-port a large number of IA services starting from news/eventrecommendation systems to question answering to contextaware services. We discuss the privacy, NLP, and systemschallenges in supporting these services through PrIA.

2. THREAT MODELNews aggregation sites which personalize recommenda-

tions do so by recording sensitive data that a user may wantto keep private. Every time the user signs in, the sites logwhich articles she reads, what she shares, and how she ratesthe suggested articles. All this information is used to builda user profile which is then used to make new recommenda-tions. While useful for personalization, this profile containsmany bits of sensitive information about the user such asher political inclinations, social habits, and medical condi-tions [9], all of which is now available to the recommendationservice provider in their cloud storage. A privacy focuseduser would like to avoid sharing this information with thenews aggregation sites.

Formally, we define the adversary as news aggregation ser-vices that provide personalized news recommendations tologged-in users. For example, Google’s news recommenda-tion service provides news recommendations and require theuser to sign-in to receive the service. Prior work on link-age attacks have demonstrated that personally identifiableattributes (e.g., user IDs) and quasi-identifiable attributes(e.g., age, zip code) can be exploited to uniquely identifyindividuals [11]. There are two additional scenarios to con-sider: (i) Individual news sites may also provide personaliza-tion requiring the use of a login. If the content they provideis available publicly otherwise, the individual sites are alsoadversaries that PrIA targets. (ii) News sites, individual oraggregation, may not require a login but can still track userinterests through fingerprinting or other similar techniques.

We do not consider these as adversaries. We expect that onecan use standard approaches to avoid fingerprinting (e.g..,Prevaricator [12]) or use tor [1] to avoid tracking.

2.1 Utility and Privacy metricsWe now define the utility and the privacy metrics for the

news recommendation system as follows. Let X be the set of

the top-k news articles that are recommended by PrIA and Xbe the subset of articles in X that the user found useful. Also,let Ωs be the set of sensitive behaviors that the user wouldlike to keep private. For example, a possible instantiation ofΩs = Political view, Medical condition, Stock market plan.Finally, let M denote the information shared by the user viaPrIA. For example, in case of the news recommendation sys-tem, M is the request for downloading all the generic RSSfeeds from Google news page.• Utility: We define the utility metric Ψutil as the fraction

of the recommended articles that the user found useful.While for now we consider a binary rating of articles, thesystem can be easily extended to incorporate a more fine-grained measure of rating. Formally,

Ψutil =|X||X| (1)

Ψutil is same as the Precision at k metric (§4).• Privacy: Intuitively, after observing the messages M from

PrIA, we do not want the adversarial belief about a par-ticular sensitive behavior of the user, as defined in Ωs,to increase beyond a particular user-specified threshold,δ. In other words, let the prior adversarial belief about asensitive behavior ω ∈ Ωs be given by P(ω). The poste-rior probability after observing the messagesM is given byP(ω|M). We want the difference of these two probabilitiesto be bounded by δ for every element in Ωs. Formally,

Ψpriv = P(ω|M)− P(ω) (2)

Eqn. 2, ensures that the knowledge gain of an adversary, isbounded by 0 ≤ δ ≤ 1 even after observing the messages Moutput from PrIA.

Thus, the design objective of PrIA is to maximize Ψutil

while ensuring that Ψpriv ≤ δ for all ω ∈ Ωs.

3. PrIA RECOMMENDATION MODELThe PrIA news recommendation system involves: (i) build-

ing a user profile to model user interest, (ii) obtaining a can-didate set of news articles from which to recommend and (ii)designing a ranking algorithm to score and rank the candi-date articles based on how well they match the user profile.

3.1 ArchitectureFigure 1 shows PrIA’s news recommendation architecture.

We assume that user’s personal device (PD) performs mostof the heavy lifting. The personal device is within the user’strust zone, and could be a personal computer or even a de-vice in the user’s private cloud. In our user study, the user’spersonal laptops act as the PD.

The PD builds a profile about the user’s interests based ondifferent sources of information obtained from the user’s dig-ital history starting from the user’s interactions with newsarticles, general browsing data, social media interactions,and other contextual information. The PD obtains the can-didate news article from an aggregation site and recom-

cnn.com

nytimes

reuters

washington post

news aggregation servers

Personal Device (PD)

PrIA App

recommendations

request news data

RSS data feed

No Sign-In required

Figure 1: The PrIA News Recommendation System

mends news articles. recommended news articles are sentto the recommendation app on the user’s phone.

We build the model on a laptop device rather than thesmartphone because news recommendation is essentially apush-based service and is not latency sensitive. We laterdiscuss how more latency-sensitive applications can be sup-ported in the PrIA framework (§5).

3.2 Creating User ProfilesA basic content-based recommendation algorithm uses a

simple bag-of-words [10] representation of the user’s digitalhistory. A given candidate article is recommended if thepairwise similarity between the candidate article and everyarticle in the user’s digital history is high [10].

The problem is that often the bag-of-words model alonedoes not effectively capture the key semantics of the do-main. It does not adequately capture important conceptsand tends to include a large number of words to representarticles. In the news domain, the topics of the articles (e.g.finance, sports) and the entities mentioned in them (e.g.,people, places, and organizations) reflect the important as-pects of user interest. For example, if the user reads severalarticles about Roger Federer, a tennis player, the user maybe interested in the entity Roger Federer, and the topic oftennis. Therefore, we first process the news articles to an-notate them with the topics and entities mentioned in thearticle, to augment the bag-of-words representation.

For extracting topics, PrIA uses Latent Dirichlet Alloca-tion (LDA) [3], a widely used topic modeling technique toautomatically discover the topics in the user history. Foreach article, LDA assigns a probability distribution over aset of K topics i.e., the probability of the article belonging toeach of the K topics. To extract entities, PrIA uses an off-the-shelf named entity tagger [5], which identifies mentionsof people, places, and organizations in each article.

PrIA uses these extracted topics and entities to create agraph based user profile. The nodes in the graph consist ofthe articles in the user’s history, and the entities and topicsrepresenting the user’s interests. The edges in the graph con-nect each article with the corresponding entities and topicsextracted from the article. This graph-based representationscales easily to large number of nodes and is well-suited torun efficient ranking algorithms.

3.3 Obtaining a candidate set of news articlesIn the current instantiation of PrIA, we download all news

articles from existing news aggregation services (in our im-plementation, we use Google’s news aggregation service).In effect, PrIA decouples news aggregation from personal-ization. News aggregation services such as Google aggregatenews articles, but also provide personalized recommendationto signed-in users. In contrast, PrIA does not require theuser to be signed-in, and the news articles we download aregeneric and not personalized to a user.

PrIA can work with any publicly available news sourceto obtain candidate news articles. In the absence of aggre-gations sites, one could use other sources such as RSS feedsubscriptions or even download news articles from variousnews sources using techniques to avoid fingerprinting [12].

3.4 Recommendation AlgorithmGiven a candidate pool of news articles, the recommenda-

tion task is to score each candidate article based on how wellit matches the user profile. To this end, PrIA uses a variantof the popular page rank technique, known as personalizedpage rank [7].

The key intuition in PrIA is that a candidate article thatis well connected to the user profile graph is likely to be ofinterest to the user. An article that mentions many entitiesand topics of interest to the user is going to be well connectedwithin the user profile graph. The Personalized Page Rankalgorithm measures this connectedness of candidate articles.

Figure 2 shows how the personalized page rank algorithmis applied to the user profile graph. The graph has nodesrepresenting articles, topics, and entities, and edges repre-senting the connection between the articles and the top-ics/entities. For recommendation, the candidate articlesare first directly added as nodes to the user profile graph.Then, the articles are processed to extract topics and enti-ties. If the topics and entities in the candidate article arealso present in the original user profile graph, a new edge isdrawn between the article and corresponding topic/entity.

Once the candidate articles are added, we score each can-didate article by its centrality in the graph (described in de-tail below). We note that such graph based ideas have beenexplored in the context of collaborative search, but existingwork only considers the direct neighbors of the candidatearticles in the user profile graph [8]. In PrIA, we propose amore general approach that uses all nodes in the user pro-file graph via random walks between the candidate articleand the rest of the graph. This technique ensures that weare able to take into account for the global structure of thegraph and the long distance influences when using the rec-ommendation.

Personalized Page Rank (PPR): Our goal is to scorenews articles based on how well they are connected to theuser profile graph. Page Rank is a measure that formalizesthe notion of connectedness on a graph. It measures con-nectedness of each node as the probability that a randomwalker would land on that node after taking long randomwalk over the graph following edges with a probability pro-portional to their weights.

PrIA uses the Personalized Page Rank [7]1 variant thatallows us to explicitly model a notion of core interests forthe user, rather than treat all nodes in the graph uniformly.Given a set of nodes that represent these core interests, PPR

1The word Personalized in the name here refers to searchpersonalization task used in the original paper and is notrelated to the news personalization we are concerned with.

Topics

Entities

Articles Candidate News Articles

User Profile Graph

Figure 2: User profile graph with topic, entities andarticles. The candidate news articles are augmentedto the existing user profile graph and the personal-ized page rank is run on the augmented graph.

modifies the page rank computation such that nodes that arewell connected to these seed nodes score higher. From therandom walk perspective, the walker randomly restarts froma seed node with some seed probability based on the impor-tance of the nodes. The effect is that the candidate articlesthat more easily reachable from the seed nodes receive higherscores. The formal PPR scoring function is shown below.

PPR(v) = αv + (1− αv)∑

v:e(u,v)∈E

P (v|u)PPR(u)

where (u, v) is an edge in the graph, v is a vertex, P (v|u) isthe transition probability, i.e., the probability with which arandom walker at node u will move to node v, and αv is theseed probability.

Transition Probabilities The transition probabilitiesP (v|u) are estimated based on the similarity or affinity be-tween two nodes: (i) Article node -Topic node edges arescored by the fraction of overlap between representative wordsin the topic and the article. (ii) Article node -Entity nodeedges are scored by the number of mentions of the entitywithin the article. (iii) Article node -Article node edges arescored by the fraction of overlapping words in the articlepair.

Seed Probabilities In the random walk, αv is the prob-ability with which the surfer restarts a random walk fromnode v. We estimate these seed probabilities iteratively atthe end of each day to track the user’s changing interests.Initially, we set the seed weights uniformly. At the end ofeach day, we run PPR until convergence and select the topscoring nodes as core nodes and renormalize their weights toassign seed probabilities. The rest of the nodes get a smalldefault probability.

As the user interacts with new data the user profile graphgrows and the user interests change with current events andhappenings [6]. Processing all data to extract topics and en-tities will consume resources. At the end of each day we uni-formly sample a subset of articles visited that day and add itto the profile graph. To avoid recalculating entire topic dis-tributions every day, we use an online version of LDA whichincrementally updates its topic distributions. Further, weadopt a simple hard decay strategy that drops articles thathave low page rank scores and are more than k days old.

3.5 Utility and privacyThe PPR algorithm above ensures that the articles are

ranked based on changing interests of the user. Thus, the top

PrIANews

WebBrowser Facebook Twi4er

PrIALoggerPersonalizedPageRank

LDAEn>tyExtrac>on

ProfileGraphBuilder

Non-personalizedAggregatedNewsAr>cles

Figure 3: The applications in PrIA for news recom-mendation.

k− candidate articles selected, are more likely to cover thetopics that are of interest to the user, maximizing the utilitymetric Ψutil. Further, since every PrIA user downloads thesame set of articles from the aggregation site, the users donot give any personal information to the aggregation site.In the absence of fingerprinting and any collusion betweenthe aggregation site and other individual news sources, PrIAprovides strong privacy guarantees.

4. IMPLEMENTATION AND USER STUDYWe implemented PrIA as described in the previous sec-

tion and conduct a user study with 6 subjects to comparethe performance of PrIA with existing cloud-based news rec-ommendation systems. The user study was approved by ourInstitutional Review Board.

Figure 3 shows the components in the implementation.The PrIA news recommendation runs on the user’s laptop(which acts as their personal device) and the user’s smart-phone. We use a server to facilitate communication betweenthe user’s laptop and the smartphone over a secure channel.

The implementation consists of four applications:1. Logger – The logger collects the web browsing history

from Chrome, and the user’s Twitter and Facebook feedsthrough the respective APIs. The logger runs on both thelaptop and smartphone and the user profile graph is builtover this collected history. Logged data from the smart-phone is encrypted and sent securely over to the user’slaptop via a server. The encryption key is known only tothe laptop/smartphone user so that the data cannot bedecrypted by the interim server. The server deletes anydata once the transfer is complete.

2. Profile Builder – The user profile graph is built at theuser’s laptop. The Stanford Named Entity tagger [5] isused to extract entities and the well known LDA algo-rithm [3] is used to extract the topics. We choose the top20 topics to add to the profile graph.

3. PPR Recommendation – The recommendation algorithmperiodically downloads all headline articles (40 to 50 ar-ticles) from Google News, a popular news aggregationsite. These articles are obtained without sign-in; there-fore the news articles are not personalized for the user.In other words, we use the Google News service in-lieuof a generic news aggregation service. PrIA runs the rec-ommendation algorithm over the candidate articles andpushes the recommended articles to the smartphone.

All computation over the user’s personal data is performedwithin the trust domain of the user and the interactions out-side the trust domain relates to either retrieving the articlesfrom the individual providers or obtaining non-personalizedcandidate articles from aggregation sites.

Slide right if you find the article useful. Slide left if you do not find the article useful of if you have already read the article. Please provide your feedback for both tabs. If the tab hangs, move to the next tab (the problem will resolve itself)

Is Wayne Rooney Way Past His Prime?

Wayne Rooney the Golden Boy of Manchester United has been missing a lot of questions locally and th The 100: Jason Rotherberg teases the Season 3 trailor is on its way!| melty.com Published on 2 Dec 2015 20:00:08 Now that we have a premiere data for Season 3 of the 100 we t Amazon rolls out ‘blue shade’ tool for Fire tablets to allow people to read at n 36 shares Take your Kindle to bed: Amazon rolls out blue shade tool for its tablets to allow Ronda Rousey shows her face in public for first time following UFC 193 knockout By Andre Vergara

Figure 4: The PrIA app that shows news recom-mendations by Google and PrIA in two tabs.

4.1 User Study resultsWe conduct a small scale user study with six subjects

summarized in Table 1. PrIA is downloaded on each userslaptop and Android phone. The users provided feedback onthe news recommendation provided by PrIA and an alter-native cloud-based news recommendation service. The userfeedback was collected for an average of 10 days for eachsubject, ranging from 6 to 15 days.

Comparisons with a Cloud-based news recommen-dation service: We compare PrIA with personalized newsrecommendation service provided by Google. The subjectsin the user study sign-in to obtain this personalized recom-mendation. Note that this is for comparisons alone. In thereal deployment, the user will not have to be logged in, sincePrIA does not require cloud support. We choose the top 20Google’s suggested news articles everyday. The recommen-dations from Google’s service and PrIA are presented in 2tabs to the user as shown in Figure 4. The tabs randomlyalternate between showing Google’s news recommendationsand PrIA news recommendation.

User feedback: The recommendation feedback providedby the subjects is binary: the subject marks if the recom-mended article is useful or not. The users ratings are sentto our server for evaluation. The rating only consists of theranks of each articles that the user marked as useful, anddoes not contain any details about the article itself. In ourcurrent instantiation, we do not keep track of how long andhow often the user engages with the news articles. Trackingthe real utility of the recommended news article is beyondthe scope of this work.

4.2 User study resultsFirst we analyze the feasibility of running PrIA on user

devices. Table below shows the amount of data collected fora single user.

Even though the actual size of articles that the user browsescan be hundreds of MBs, PrIA prunes the graph periodicallyby removing older articles and topics/entities that occur lessoften in the user’s history. Building the model the first timeis compute intensive, but on subsequent days, PrIA spendsless than 10 minutes processing the data and updating theuser profile graph. Together these numbers suggest promisefor the feasibility of implementing PrIA on commodity per-sonal user devices.

Study Participant Details# Users 6# Days 6-15# Articles rated 1956Statistics from a sample user

Average Profile size 28.7MBAverage # Nodes 3546Avgerage # Edges 29652Average time per day (after first day) 8.7 mins

Table 1: User study statistics.

The users rated a total of 1956 articles in total and found754 articles useful. Table below compares the news recom-mendation provided by PrIA and Google’s news recommen-dation service in terms of three metrics: Fraction of relevantarticles (i.e. those the users marked as useful) in the top five(precision@5), and top ten (precision@10) recommendationsand average precision (AP). AP is the average of precisionsat ranks with relevant articles. Given two rankings withsame number of relevant articles, AP is higher for the rank-ing in which the relevant articles are ranked higher thannon-relevant ones .

Recomm. Alg. Precision@10 Precision@5 APGoogle 0.45 0.44 0.51PrIA 0.38 0.35 0.44

Table 2: Comparing the precision of Google’s newsrecommendation and PrIA news recommendationacross three different metrics.

Since this is a small scale study, it is difficult to drawstrong conclusions. However, the average precision of PrIAdrops only by 14% relative to Google’s news recommenda-tion, which is surprising considering that there is no col-lective knowledge in PrIA. Further, our implementation iswithout any extensive tuning of the out-of-the-box compo-nents, a common necessity when using NLP algorithms.

5. DISCUSSIONOur work targets cloud applications that build models on

user’s textual data and use this model to provide personal-ized services.Personalizing from large publicly available resources:News aggregation sites have access to large amounts of arti-cles from which they recommend. In this work, we assumedthat PrIA only downloads a snapshot of the articles thatGoogle News displays once every day, which may miss somerelevant articles. Downloading all aggregated news articlesto local devices is not a scalable approach due to various rea-sons including data size, storage and bandwidth constraints,or some limits set by the public data source. One strategy toaddress this is to download subsets of articles that are likelyto be of interest to the user and then make recommendationsfrom this subset.

A key challenge of course is: what strategies can be usedto query a public data source to download a slice of thearticles both efficiently and without leaking sensitive in-formation? Building the user profile graph is a first step:the profile can help guide the selection of articles to down-load. Further, Private Information Retrieval (PIR) [4] tech-niques have been used to maintain the privacy of a client

that queries a database, without letting the database infersensitive information about the client. Similar to PIR tech-niques, we can devise strategies, that for a given word querywould generate multiple synonymous word queries to maskthe intended query from the news provider.Personalization over private data: Another class of ser-vices are intelligent assistance over user’s private data. Userstoday collect a large amount of personal data from severalsources and require assistance in querying and understand-ing the data. Imagine Alice read an email earlier about anevent. Later, she is trying to recollect the event but doesnot remember many details about the event. She should beable to use simple natural language to ask questions to anIA system to retrieve this information.

The first step towards performing these tasks to buildquestion-answering capabilities on the user’s own personaldata. In our earlier work, we show how search can be per-formed locally on the phone [2]. We envision a step furthertowards personalization that not only searches local contentbut also answers natural language questions. Systems suchas Alexa and Siri provide a similar service, but require thecloud infrastructure.

A key challenge in supporting services such as question-answering on local data is in running the models and storingdata locally on the phone. For push-based applications suchas recommendation services, it is sufficient for PrIA to modelthe service in a trusted domain and push the recommen-dation to the smartphone periodically. However, question-answering services are pull-based where responsiveness is keyand smartphone may not always be connected to the user’slaptop or private cloud. Recent works have shown the fea-sibility of running machine learning models on phones, forsimilar privacy concerns [13]. Such approaches can alleviatethe challenges in supporting personalization of private data.Personalization and Context based services (public+ private data): Finally, a class of IA applications can bethought of as using a hybrid model, that use both privateand public data. To illustrate a use case for a hybrid IAsystem, let says Alice likes the Harry Potter book series.She is visiting London where J.K.Rowling (the author ofthe Harry Potter series) is giving a talk. The IA systemshould notify Alice that she might be interested in going toRowling’s talk. Notice that several pieces of information isrequired for this notification. The IA service needs to knowthat Alice will be visiting London on certain days, and theservice needs to know the public information about whereJ.K.Rowling is giving a talk.

Such applications again fit within the PrIA framework,and the challenges in enabling these services are the sum ofthe problems in providing personalization over public andprivate data.

6. CONCLUSIONSAs the scope and performance of personalized IA services

improve, they become more indispensible, and it becomesever more critical to characterize the privacy trade-offs andinvestigate privacy preserving solutions. As a case in point,news recommendation services reveal news browsing history,which can be used to infer many sensitive personal informa-tion about the users. A local private news recommendationprovides an interesting trade off: it trades collective knowl-edge for a much bigger slice of personal data, which is keptprivate. This work shows the feasibility of building a local

recommendation system, which minimizes leaking personalinformation to the service provider. Looking ahead, for scal-ing to the larger class of intelligence assistance problems, weidentify significant challenges in modeling privacy, and ad-dressing algorithmic and system challenges in getting theseservices to run on constrained devices, all ripe areas for fu-ture investigation.

AcknowledgementsWe thank our shepherd Landon Cox for his insightful com-ments that helped improve the paper. We thank the anony-mous reviewers for their constructive feedback. We thankour 6 user study participants for downloading PrIA and pro-viding feedback on the recommendations. This work waspartially supported by CEWIT (Center for Excellence inWireless and Information Technology), Stony Brook.

7. REFERENCES[1] Tor Project. https://www.torproject.org/.

[2] A. Balasubramanian, N. Balasubramanian, S. Huston,D. Metlzer, and D. Wetherall. FindAll: A LocalSearch Engine for Mobile Phones. In Proc. ACMCoNext, December 2012.

[3] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latentdirichlet allocation. Journal of machine Learningresearch, 3(Jan):993–1022, 2003.

[4] B. Chor, E. Kushilevitz, O. Goldreich, and M. Sudan.Private information retrieval. J. ACM, 45(6), Nov.1998.

[5] J. R. Finkel, T. Grenager, and C. D. Manning.Incorporating non-local information into informationextraction systems by gibbs sampling. In ACL, 2005.

[6] F. Garcin, K. Zhou, B. Faltings, and V. Schickel.Personalized news recommendation based oncollaborative filtering. In Web Intelligence, 2012.

[7] T. H. Haveliwala. Topic-sensitive pagerank. In WWW,2002.

[8] L. Li and T. Li. News recommendation via hypergraphlearning: encapsulation of user behavior and newscontent. In Proceedings of the sixth ACM internationalconference on Web search and data mining, pages305–314. ACM, 2013.

[9] J. Liu, P. Dolan, and E. R. Pedersen. Personalizednews recommendation based on click behavior. In IUI,2010.

[10] Y. Lv, T. Moon, P. Kolari, Z. Zheng, X. Wang, andY. Chang. Learning to model relatedness for newsrecommendation. In Proceedings of the 20thinternational conference on World wide web, pages57–66. ACM, 2011.

[11] A. Narayanan and V. Shmatikov. Myths and fallaciesof personally identifiable information.Communications of the ACM, 53(6):24–26, 2010.

[12] N. Nikiforakis, W. Joosen, and B. Livshits.Privaricator: Deceiving fingerprinters with little whitelies. In Proceedings of the 24th InternationalConference on World Wide Web, pages 820–830.ACM, 2015.

[13] O. Riva, S. Nath, and E. Fernandes. Appstract:On-the-fly app content semantics with better privacy.In Proc. ACM Mobicom, 2016.