

Information Systems Frontiers 5:1, 15–28, 2003
© 2003 Kluwer Academic Publishers. Manufactured in The Netherlands.

Shock: Aggregating Information While Preserving Privacy

Eytan Adar and Rajan Lukose
Information Dynamics Lab, HP Laboratories, 1501 Page Mill Road, Palo Alto, CA 94304, USA
E-mail: [email protected]
E-mail: [email protected]

Caesar Sengupta
Encentuate Pte. Ltd., 151 North Buona Vista Road #02-45, Singapore 139347, Republic of Singapore
E-mail: [email protected]

Josh Tyler
Information Dynamics Lab, HP Laboratories, 1501 Page Mill Road, Palo Alto, CA 94304, USA
E-mail: [email protected]

Nathaniel Good
School of Information Management and Systems, UC Berkeley, 102 South Hall, Berkeley, CA 94720
E-mail: [email protected]

Abstract. An important problem facing large, distributed organizations is the efficient management and distribution of information, knowledge, and expertise. In this paper we present the design and implementation of a low-cost, extensible, flexible, and dynamic peer-to-peer (P2P) knowledge network that helps address this problem. This system, known as Shock, is designed to protect the privacy of users’ personal information, such as email, web browsing habits, etc., while making that information available for knowledge management applications. It reduces participation costs for such applications as expert-finding, allows highly targeted messaging, and enables novel kinds of ad hoc conversation and anonymous messaging. The system is tightly integrated with users’ email clients, taking advantage of email as habitat.

Key Words. privacy, peer-to-peer networks, expertise location, anonymous messaging, recommendation systems, collaborative filtering, knowledge management, computer-supported cooperative work

1. Introduction

The pervasiveness of information technology in the work practices of organizations has raised two issues that are in tension with each other. The first issue arises out of the recognition that work in many organizations is information and knowledge intensive and that much of an organization’s continuing value resides in its so-called intellectual capital or knowledge assets. As a result, knowledge management applications such as expertise location (Streeter and Lochbaum, 1988), capturing an organization’s memory (Ackerman and Halverson, 1998), etc., have received much attention. The effectiveness of these applications is in part determined by the information made available to them. An organization interested in maximizing the value of its knowledge assets must make as much information as possible available for knowledge management applications.

The second issue is the workplace privacy of users. With so much of our work practice being in the form of email, web browsing, instant messaging, etc., the possibilities for easy electronic monitoring are manifold. While employee privacy rights in the United States are limited (though still uncertain (Lewis, 2001)), European laws are much more strict in this regard and constrain what information an employer may gather (Brittenden, http://www.ictur.labournet.org/Online.htm). Regardless of the legal implications, employee-company relationships and employee morale depend on a company’s privacy policy (Weisband and Reinig, 1995).

From the point of view of knowledge management, however, a user’s electronic trails hold great potential value. For example, information about keywords or phrases in a user’s email, what web-based resources and documents they access, and even with whom they communicate and how frequently they do it can aid in the location of expertise and implicit knowledge within the organization.

Thus, in order for knowledge management solutions to be effective, organizations must balance these two seemingly conflicting demands. On the one hand they must have as much information as possible available to them, especially user-specific data closely associated with work practice. On the other hand the organization must provide users with a solution they can be comfortable with, and make allowances for privacy concerns. One potential solution is to ask users to screen the information themselves. This solution has very high participation costs for users. A much better solution would be to have a system designed to protect privacy from the beginning, eliminating the need for explicit screening, and as a result, radically reducing participation costs.

It is worth emphasizing that with respect to privacy, we are not here concerned with protecting the privacy of users from monitoring by those who administer the infrastructure. This is not possible in practice. Instead, we are concerned with protecting the privacy of users by making use of (potentially private) information without revealing that information to other users in the organization or to a centralized store of data. As previous research has argued (Ackerman, 1994; Grinter, 1997), collaborative systems must pay heed to the needs and concerns of users if they are to be used, and so we are attempting to make users feel comfortable, through a technological solution, that their privacy is protected.

In this paper, we will describe the architecture and implementation of a system called Shock (Social Harvesting of Community Knowledge) that takes as one of its primary design goals the protection of user privacy while retaining the power afforded by local observation of users. Shock is client software, downloaded onto users’ computers, that automatically forms a detailed profile of a user’s behavior and environment and stores it locally on the user’s computer. The clients are able to talk to each other in a decentralized manner through a peer-to-peer network architecture. The resulting system forms a low-cost, extensible, flexible, and dynamic knowledge network within an organization that allows users to find others with specific expertise, send highly targeted messages, and engage in novel kinds of ad hoc conversations with the option of true anonymity.

We will argue that Shock improves upon many prior expert-finding systems by increasing the detail and richness of data available to the system and by reducing participation costs through automatic profiling. Both of these benefits derive from the privacy-preserving peer-to-peer architecture since users would be reluctant to allow access to their detailed information or automatic profiling without it. In addition, Shock’s architecture allows some novel features such as truly anonymous and secure questions, responses, and conversations. Another important feature of our implementation is tight integration with our user base’s current tools, specifically the Microsoft Outlook client, which further reduces participation costs by taking advantage of how users treat their email client as their habitat (Ducheneaut and Bellotti, 2001).

In the next section, we discuss related work in more detail and show how Shock compares to other systems. We then present a system walk-through, followed by a presentation of the design rationale and a description of the main system components. Next, we present a preliminary review of the system based on a large pilot test currently underway, and we conclude with a summary and directions for future work.

2. Related Work

All expert-finding systems must operate by taking a description of the expertise sought and matching it against profile information associated with possible experts. Expert-finding systems most often differ in the kinds of information available for the experts’ profiles and the way that profile information is gathered. These choices strongly impact user participation costs and the richness of available data for profile generation.

Most prior systems have a centralized architecture, in which profile storage and matching occur at central servers. For example, expert databases, such as Microsoft’s SPUD (Davenport and Prusak, 1998), HP’s CONNEX¹ and SAGE², contain repositories of manually entered expertise data. While providing a repository of expert knowledge, manual entry requires constant maintenance and overhead to maintain usefulness. In addition, these databases generally contain knowledge that is too broad to answer the specific queries that are frequently used in expert finding searches.

Automated expert finding systems such as Ackerman’s Answer Garden (Ackerman and Malone, 1990), ER (McDonald and Ackerman, 2000), P2Pq,³ Askme,⁴ Tacit,⁵ MIT’s Expert Finder (Vivacqua and Lieberman, 2000), and also MITRE’s Expert Finder (Mattox, Maybury, and Morey, 1998), are solutions to the high costs of maintaining expert databases. Such systems typically use some algorithm to classify information and/or expertise and then distribute this to the users seeking answers. Despite their automation, expert finding systems tend to limit themselves to mining very specific data sets. For example, MIT’s Expert Finder recommends Java-programming experts based on profiles created from the users’ source code by comparing Java class usage to an external model. Similarly, the commercial Tacit product mines publicly available documents and e-mail available on centralized servers (Microsoft Exchange, for example). Referral Web (Kautz, Selman, and Shah, 1997) utilizes the “six degrees of separation” phenomenon to capitalize on existing social networks inside of an organization to discover people who are experts and the paths of people to them by mining co-authorship graphs for published papers.

Each of these automated systems tends to use information for profile generation that is already available within the organization. For example, the ER system used data from the host organization’s software development system and the call center help system. Systems that go further by using more personal data, such as email (e.g. Tacit), must address sensitive privacy concerns (Schwartz and Wood, 1992), usually through explicit user screening of published data. This explicit screening entails higher user participation costs. Furthermore, since they are centralized systems, they do not have access to other personal data such as web browsing or file viewing habits, which are easily had at the client.

Centralized expert systems also tend to remove control from the user about when they should be contacted. While systems such as P2Pq act as question brokers and route questions to users, the more traditional systems such as MITRE’s Expert Finder and CONNEX simply list which experts a user can communicate with.

One of the main benefits of modern knowledge management systems over other solutions such as public mailing lists and newsgroups is the reduction of information overload. The attempt to connect individuals to the right information or information source is largely the subject of recommender systems (Konstan et al., 1997; Terveen et al., 1997; Shardanand and Maes, 1995). Our own system draws inspiration from the techniques used in recommender systems to reduce disruptions and increase productivity for individuals. Recommender systems have successfully employed various metrics to determine the information most relevant for a given user, for example link analysis (Terveen et al., 1997) and explicit (Konstan et al., 1997; Shardanand and Maes, 1995) and implicit (Morita and Shinoda, 1994) recommendations. While they provide valuable lessons, the specific techniques of such systems traditionally depend on public participation, active communities and sustained interest in a given topic to alleviate problems associated with data sparsity and bootstrapping. In addition, recommender systems utilize a centralized repository and thus are not compatible with our approach. Recent work in collaborative filtering and e-commerce looks into solutions to privacy issues (Ackerman, Cranor, and Reagle, 1999; Canny, 2002), but has yet to be implemented and tested in live systems.

Shock is similar to decentralized systems such as DEMOIR (Yimam, 2000) and Yenta (Foner, 1997). Although privacy concerns are addressed and incorporated into their architectures, they are fundamentally different from Shock in their design. Yenta (Foner, 1997) is a system created primarily for a passive user to gain value from a distributed network of users, by implicitly determining what items that user may find interesting. DEMOIR (Yimam, 2000) is a hybrid architecture proposal that allows users to share information, but lacks the ability to transfer and target expertise in a P2P manner or to provide the level of privacy control and guarantees that Shock provides. For example, the proposed DEMOIR architecture does not make explicit allowances for anonymous questions and answers.

Finally, the majority of the systems mentioned require use of a secondary user interface. Shock, by contrast, is embedded in the user’s email client, and provides rapid integration into existing tools and knowledge bases within an organization. Prior work has shown that email is a habitat for most office workers, and in designing Shock we sought to incorporate the tools for finding expert knowledge into the tools that people use on a daily basis.

While the prior work addresses pieces of the larger problem of finding experts to solve problems, we believe that no system exists that preserves privacy and provides targeted profiling while being cost effective and simple to maintain.

Fig. 1. Outlook integration.

3. Usage Scenario (Walk-through)

To demonstrate the Shock system, we present a walk-through of a likely scenario. A user, Alice, wishes to find out the experiences of others in her organization with the peer-to-peer system called Freenet. Fig. 1 shows how her Outlook client looks with Shock installed. A Shock toolbar, labeled (1) in Fig. 1, appears below the standard Outlook toolbar. In the left pane is a special folder labeled “Shock” (2) that contains sent and received questions. In this view, a received question is selected for viewing, and the message view pane has a column labeled “score” (3) which shows the relevance of the received questions. By asking a Shock question, Alice will be starting a conversation that will appear in her Sent Questions folder (2).

The current system allows many different kinds of questions, each with different features enabled in the interface (the implementation allows easy extensibility and flexibility for defining new types). After clicking the “Ask A Question” toolbar item, Alice selects a “Survey/Poll” type and is presented with the screen shown in Fig. 2. She enters a descriptive question (1), defines her multiple choices (2), allows responders the ability to reply with text comments in addition to their poll choice (3), decides not to be anonymous (4), and sets the message to expire in one week (5). In addition, she decides to target her message by including a “Filter” in it. Filters, which we discuss in detail later, allow the sender of a Shock message to (optionally) specify highly detailed criteria to help target the message. She decides that only users who have “Freenet” installed should receive her message (6).

Clicking on the send button results in the message being sent over the Shock network according to a protocol we describe in Section 4.2. Other Shock clients on the network score the message (described in Section 4.4). The score will depend on the text of the question as well as the filters. Each client has a threshold value, set by the user, which Shock message scores must exceed in order to be presented in the interface.


Fig. 2. Asking a Shock question with a poll and a software filter.

If a user, Bob, is presented a message, it appears as an Outlook item in the Received Questions folder (Fig. 1, label 2). The user can respond to the message through the interface shown in Fig. 3. Here the user selects an item from the list, enters optional comments, decides if the response should be anonymous, and also whether the reply is to be encrypted so that only the sender can see it. Note that the sender, Alice, will never know that the question, with its filters, was presented to Bob unless a reply is made and Bob chooses to explicitly divulge his identity in that reply. Back at Alice’s client, we see a partial view of the results of several responses (Fig. 4). This dynamically generated page appears when she clicks on the corresponding Outlook item in her “Sent Questions” folder. Notice the graphical poll summary and the threaded conversation. To facilitate conversations, all users whose clients do not filter out the initial message will have this view and can participate in the discussion (excepting those messages which are encrypted for someone else).

4. System Architecture

The Shock architecture was designed to simultaneously address various (and traditionally conflicting) design principles. Specifically, clients were to be easily installed, maintained and used; privacy and anonymity were to be respected; automation was to be used where possible; and flexibility for additional features was to be built in. The most likely candidate that satisfied these various concerns was a P2P or P2P/server hybrid system where the bulk of work was done on each client.

This design allows Shock clients to automatically collect highly detailed information from the user’s behavior while still maintaining privacy for that profile through local storage. That is, all data collected is as secure as the source data. Anonymity is optionally provided to users through the decentralized network topology and randomized laundering of messages. This is similar to the way Gnutella⁶ and Crowds (Reiter and Rubin, 1999) operate and essentially implements an anonymous bulletin board.

Fig. 3. Responding to a Shock question with a poll.

Fig. 5 abstractly illustrates a Shock client and the Shock network. All user interactions with the system are done through a UI module that includes both a web style interface as well as a Microsoft Outlook interface. The client serves the role of generating user profiles through Observer and Bootstrapping modules. These automate the process of building the user profile through indexing documents the user interacts with and cataloging other facts that are known about them or their systems (installed programs, for example). Additionally, the Shock client determines which incoming messages are to be shown to the user through a Scoring module. The Network module serves to route questions to and from other peers as well as interacting with the Shock message server. Each module is described further below.

The client software is primarily written in Java with some code in Visual Basic (VB) to interface to Microsoft Outlook and Internet Explorer. While the software is currently being tested on Windows OS machines, we expect to port it to other systems in the future.

4.1. System UI
In keeping with Shock’s design goal of encouraging usage by reducing participation cost, Shock’s user interface has been designed to seamlessly integrate into a normal user’s work practice. The present Shock user interface contains functions for generating and responding to messages as well as general controls for Shock.

Since one of Shock’s key functions is messaging, we decided to integrate Shock with the most common messaging platform being used by Shock’s target demographic, Microsoft Outlook. Therefore, the present version of Shock adds a special folder called ‘Shock’ to Microsoft Outlook (Fig. 1). All of Shock’s messages are stored in this folder and this folder behaves like a regular Outlook folder (i.e. the user can view, sort, or delete messages in the same way). As mentioned earlier in the walk-through, this folder contains two subfolders called “Sent Questions” and “Received Questions”. The messages in the Received Questions folder contain an additional property called “Score” which indicates the relevance of the particular question for that user. (Scoring will be described in detail in Section 4.4.)

Shock messages are inherently threaded in nature and hence each instance in the Shock folder is in reality a message thread. When the message is opened, it displays the complete discussion thread for that message (Fig. 4). This view is similar to that of a web-based news group. In practice, we have found this hybrid, email-newsgroup message representation model to be highly suited to Shock’s nature.

Shock adds a special toolbar called “Shock” to Outlook (Fig. 1). This toolbar allows the user to easily send and reply to Shock messages and to change the various configuration options. Like many other Microsoft Windows products, Shock also presents an icon to the user in the System Tray. This icon changes to notify the user of new messages and can be used to access a special menu, which gives the user quick access to most Shock configuration parameters. Shock also provides a browser-based interface that can be used by users who prefer not to use the Outlook interface.

The use of these established and familiar models in Shock gives the novice user many affordances that help reduce the learning curve involved in using a new system and hence encourages usage.

In order to provide an interface between users who have Shock installed and those that do not, the system allows users to send Shock messages as email. Next to the “send” button in the interface is a “Send as Email” button (see Fig. 2). A Shock message sent this way is embedded in an email message that can be sent to any user. When the message arrives at a user’s mail client, those with Shock installed will have the message automatically filtered and moved to the appropriate Shock folder. Users who do not have Shock installed will be presented with a regular email message that contains the question and a link to the Shock install website. This feature encourages the growth of the user base, and can also facilitate the explicit creation of groups since only users who received the first message will be aware of and be able to participate in that message thread. In addition, this feature, combined with the robust profiling mechanism, offers a flexible implementation of computational email (Borenstein, 1992).

Fig. 4. Partial view of a conversation summary for a Shock question. The summary dynamically tallies the associated poll, and shows the threaded group conversation.

Fig. 5. The Shock architecture.

4.2. Shock network architecture
As discussed above, maintaining privacy and anonymity is a cornerstone of the Shock system design. In order to achieve these properties, Shock can function as a purely decentralized P2P system. However, we have created a hybrid design that extends the pure P2P design and provides these same features while adding scalability and persistence for messages.

4.2.1. Peer-to-peer networking. In a peer-to-peer network, each node connects directly to other nodes; there is no central server. The most well known examples of true P2P networks are the file-sharing programs Gnutella⁷ and Freenet (Clarke et al., 2000). In a typical P2P system, when a client is added to the network, it connects randomly to other peers. Messages are then sent through these links, eventually propagating to the other peers through a series of “hops” from peer to peer. In Shock, we randomize the way messages move from one peer to the next, making it practically impossible to detect a message’s origin or destination. This randomization is similar to the technique used by the Crowds system for anonymous web browsing (Reiter and Rubin, 1999).

In addition, a peer-to-peer network has the advantage that it has no single point of failure. If a client’s connection to another client is terminated, the client will find a new client to which it can connect. This constantly adapting network topology ensures that an organization’s knowledge base is never offline, and accommodates a mobile and dynamic organization.
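The reconnection behavior described above can be sketched as follows. This is a minimal illustration in Python (the Shock client itself is described as Java), not the authors' implementation: the function name `repair_links`, the `known_peers` list, and the default `target` of three links are all assumptions made for the example, and the source does not say how a client discovers candidate peers.

```python
import random


def repair_links(links, known_peers, lost_peer, target=3, rng=None):
    """Drop a lost neighbor and top the link list back up to `target`.

    links: the client's current neighbor list.
    known_peers: peers the client could connect to (discovery is assumed).
    lost_peer: the neighbor whose connection was terminated.
    """
    rng = rng or random.Random()
    # Remove the terminated connection.
    links = [p for p in links if p != lost_peer]
    # Candidates are known peers that are neither linked already nor lost.
    candidates = [p for p in known_peers if p not in links and p != lost_peer]
    rng.shuffle(candidates)
    # Connect to randomly chosen replacements until the target is reached.
    while len(links) < target and candidates:
        links.append(candidates.pop())
    return links
```

For instance, `repair_links(["b", "c", "d"], ["b", "c", "d", "e", "f"], "c")` keeps `b` and `d` and adds one of `e` or `f` at random.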

Peer-to-peer networks have some limitations, however. In particular, they typically have difficulty scaling effectively to large numbers of users (Yang and Garcia-Molina, 2001). The inherent message redundancy and multiple retransmissions of each message (one for each “hop”) add network overhead. The second problem with such a pure P2P model is the lack of message persistence. The network lacks the ability to remember messages that were sent and received, and as clients enter and leave the network, information is lost. While Shock can function in this mode, the preferred method is through the hybrid model illustrated in Fig. 5 and described in the following section.

4.2.2. Implementation details. Shock clients maintain both links to each other (typically 3–5 connections) as well as knowledge about a centralized server that acts as a simple message store. Periodically clients can retrieve information from the server (Fig. 5, link 2). Messages arrive at the server in one of two ways. The first is through non-anonymous message transmission. In this mode the user sends a non-anonymous message and the client simply delivers the message (1) to the server. The second method of delivery leverages the P2P aspect of the system to deliver anonymous messages. In this mode a client sending a message will pass the message to a randomly chosen neighbor client (3). That neighbor will randomly choose to transmit the message to the server or to another neighbor. The message is then passed through a number of clients (4 and 5), finally ending up at the server (6). Note that neither the intermediate clients nor the final server is able to determine from where the message originated. The initial client may query the server to ensure that the message arrived (if not, it can be resent).
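The anonymous delivery path described above can be simulated in a few lines. The sketch below is in Python rather than the Java of the actual client, and it is only an illustration under stated assumptions: `deliver_anonymously`, the `neighbors` dictionary, and the `forward_prob` parameter are hypothetical names, and the source does not state what probability a relay uses when choosing between the server and another neighbor.

```python
import random


def deliver_anonymously(origin, neighbors, forward_prob=0.5, rng=None):
    """Return the chain of relays a message passes through on its way
    from `origin` to the server.

    neighbors: dict mapping each client to its list of neighbor clients.
    forward_prob: assumed chance that a relay hands the message to another
        neighbor instead of delivering it to the server.
    """
    rng = rng or random.Random()
    # The sender never contacts the server itself; it always hands the
    # message to a randomly chosen neighbor first.
    current = rng.choice(neighbors[origin])
    path = [current]
    # Each relay then randomly decides: forward again, or deliver.
    while rng.random() < forward_prob:
        current = rng.choice(neighbors[current])
        path.append(current)
    # The last client in `path` delivers the message to the server.
    return path
```

Because every relay behaves the same way whether it received the message from the originator or from another relay, a client or server that receives the message cannot tell whether its predecessor wrote it or merely passed it along.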

For additional reliability, in the case where the server is not part of the Shock network for some reason (e.g. it has crashed or is being rebooted), the clients will switch to operating in a purely P2P mode and the system will function as before.

4.2.3. Security. A final feature of the network architecture is the ability to provide private (and anonymous) communication between two clients (although this can be scaled to groups). To provide this feature, Shock automatically generates a public/private key pair for each new message. The public key is included with the message, and the private key is retained on the client’s machine. When another user desires to send a secure response to a message, the response is encrypted with the original message’s public key, so that it can only be decrypted by the owner of the message’s private key, namely, the creator of the message. Key pairs are not reused so that a message sender cannot be identified by the user’s public key, thus preserving anonymity.
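The per-message key-pair flow can be illustrated with a deliberately toy example. The source does not name the public-key algorithm Shock uses, so the sketch below assumes textbook RSA over a tiny fixed pool of Mersenne primes purely to stay self-contained; it is insecure and is not the authors' implementation. The names `new_keypair`, `encrypt`, and `decrypt` are hypothetical, and the code needs Python 3.8 or later for the modular inverse via `pow`.

```python
import random

# Toy prime pool (Mersenne primes). ASSUMPTION for illustration only:
# a real client would generate fresh large random primes with a vetted
# cryptographic library.
_TOY_PRIMES = [2**61 - 1, 2**89 - 1, 2**107 - 1, 2**127 - 1]
_E = 65537  # public exponent, coprime to (p - 1)(q - 1) for every pair above


def new_keypair():
    """Create a (public, private) pair for ONE message; never reuse it."""
    p, q = random.SystemRandom().sample(_TOY_PRIMES, 2)
    n = p * q
    d = pow(_E, -1, (p - 1) * (q - 1))  # private exponent
    return (n, _E), (n, d)


def encrypt(public, text):
    """Responder side: encrypt a short reply with the message's public key."""
    n, e = public
    m = int.from_bytes(text.encode("utf-8"), "big")
    if m >= n:
        raise ValueError("toy scheme only handles short messages")
    return pow(m, e, n)


def decrypt(private, cipher):
    """Asker side: only the holder of the private key can read the reply."""
    n, d = private
    m = pow(cipher, d, n)
    return m.to_bytes((m.bit_length() + 7) // 8, "big").decode("utf-8")
```

In this sketch the asker would call `new_keypair()` when composing a question, attach the public half to the outgoing message, and keep the private half locally; a responder calls `encrypt` with the attached key, and only the asker's `decrypt` recovers the reply.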

By using both the encryption and message anonymization features, Shock uniquely provides support for secure, anonymous interactions. The public-key encryption scheme enables each party to guarantee that messages are sent only to the intended recipient, who can remain anonymous nevertheless. Thus, the two parties can communicate back and forth without revealing their identities. This feature helps complete the system’s ability to allow the full spectrum of possibilities for conversations, from fully identified to completely anonymous and private.

Fig. 6. A sample message.

4.2.4. Message format. There are two primary message types in Shock, Introduction and Response. Introductions are sent to ask a question or start a conversation, and Responses are the messages that follow. Fig. 6 illustrates a simple Introduction. The message contains three types of fields: general headers, conditionals, and form objects. Response messages follow the same general structure but do not ordinarily contain conditionals. General headers (a) specify information such as the message subject, time stamp, a globally unique identifier (GUID), and an identifier for the user who sent the message (this may be “anonymous”). Messages include the previously mentioned public key in order to support private communications. Introductions may also include an expiration date (not depicted), which a user may attach to a question to indicate when it should be removed from the network. Response objects contain two additional fields: the GUID of the message to which this is a response (responses to responses are allowed for message threading purposes) and the GUID of the general thread (this must be the GUID of an Introduction message).

The form object section (b) of the message specifies the fields into which users can respond. This is analogous to an HTML form, where the form programmer describes the different types of desired information. Specifically, in this instance the user is asking that a free form answer be filled out (the BasicForm line) and that one item be selected from a list (the SingleSelectionForm). It is up to the interface to determine the best way to represent these fields to the user. In our case, basic forms tend to take the form of text input boxes and single selection forms are drop down selection lists. A response object will contain a version of the form object that holds the response value.

Finally, the conditional section (c) specifies the rules on which a message should be scored and filtered. This message specifically states that the user must have the program “freenet” installed (the ProgramConditional) and that their profile should score high enough on the question (ProfileMatchConditional). It is the responsibility of the Shock profiling infrastructure to construct this profile (see Section 4.3). Additional conditionals and the exact scoring mechanism are described below (see Section 4.4).
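As a rough illustration of the structure described in this section, the sketch below models Introduction and Response messages as Python dataclasses. The concrete syntax shown in Fig. 6 is not reproduced here; the field names (`in_reply_to`, `thread`, `expires`, and so on), the helper `reply`, and the choice of Python are assumptions made for this sketch, chosen only to mirror the headers, form objects, and conditionals named in the text.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Union
import uuid


def new_guid() -> str:
    return str(uuid.uuid4())


@dataclass
class BasicForm:                 # free form text answer
    label: str
    value: Optional[str] = None


@dataclass
class SingleSelectionForm:       # exactly one item chosen from a list
    label: str
    options: List[str]
    value: Optional[str] = None


@dataclass
class ProgramConditional:        # boolean: is the program installed?
    program: str
    required: bool = True


@dataclass
class ProfileMatchConditional:   # fuzzy: profile score against the question
    required: bool = False


@dataclass
class Introduction:
    subject: str
    sender: str = "anonymous"
    guid: str = field(default_factory=new_guid)
    timestamp: datetime = field(default_factory=datetime.now)
    public_key: Optional[bytes] = None
    expires: Optional[datetime] = None
    forms: list = field(default_factory=list)
    conditionals: list = field(default_factory=list)


@dataclass
class Response:
    subject: str
    in_reply_to: str             # GUID of the message being answered
    thread: str                  # GUID of the Introduction that began the thread
    sender: str = "anonymous"
    guid: str = field(default_factory=new_guid)
    timestamp: datetime = field(default_factory=datetime.now)
    public_key: Optional[bytes] = None
    forms: list = field(default_factory=list)


def reply(parent: Union[Introduction, Response], subject: str) -> Response:
    """Build a Response whose thread GUID always names the Introduction."""
    thread = parent.guid if isinstance(parent, Introduction) else parent.thread
    return Response(subject=subject, in_reply_to=parent.guid, thread=thread)
```

A reply to a reply keeps the original thread GUID while pointing `in_reply_to` at its immediate parent, which is what allows a client to reconstruct the conversation tree.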

4.3. Profiling

One of Shock’s key features is its ability to generate rich user profiles. These profiles are generated by tapping multiple data points, by capturing static user data during bootstrapping, and by explicit user declarations. However, keeping these user profiles constantly updated while ensuring that they are rich enough to encompass a user’s implicit knowledge is a non-trivial task.

Shock’s solution to this problem is to provide an extensible architecture for the addition of Observers and Bootstrapping modules and to allow the user to explicitly declare their interests. The primary role of the Observer modules is to plug into various dynamic data sources and receptors and to tap into the data flowing through these points. Bootstrapping modules are used to capture static pieces of data that are unlikely to change often and to bootstrap the user’s profile.

The current implementation of Shock includes Observers that tap into Microsoft Outlook and Internet Explorer to capture new information. In addition to these Observers, Shock provides corresponding Bootstrapping modules for pre-existing data (e.g. existing email and browsing history). Other Bootstrapping modules discover the programs installed on the user’s machine and capture an employee’s organizational profile (affiliations, location, etc.) from the employee database. Shock also allows its users to augment their automatically generated profiles by explicitly declaring their expertise in a Self Declared Profile.

Initial user feedback as well as existing research (Herlocker, Konstan, and Riedl, 2000) suggested that users desire control over their profiles. However, this desire is often in direct opposition to the need to maintain unbiased profiles. Shock facilitates user control of profiles while at the same time preserving the independent and autonomous nature of the Shock profiler. Specifically, Shock allows users to (a) create a Self Declared Profile, (b) turn off the profiler at will, (c) remove information from the profile through a search and delete interface, and, most drastically, (d) delete their entire profile.

Privacy has been one of Shock’s key design criteria. Each user’s profile is stored locally on her computer and never physically leaves it. The only information that leaves the user’s local machine does so through that user’s explicit responses to questions that matched the profile. The choice of responding to these messages rests with the user, and the ability to reply to messages anonymously ensures that users are able to participate on the Shock network while keeping their profiles completely private. These features, combined with Shock’s architecture, provide the user with complete control over her profile’s privacy.

4.4. Scoring and conditionals

As described above, Shock messages contain rules by which messages are scored and filtered. The user who originates the message may set conditions and, depending on the number of conditions that are met and as a function of the independent scores, a total message score is generated. If the message score exceeds a certain threshold set by the user, the message will be displayed in the user’s interface.

Conditionals, which are the filter rules described above, come in two main varieties, boolean and fuzzy. A boolean conditional returns a score of either 0 or 1. A fuzzy conditional returns a number between 0 and 1 (inclusive). Furthermore, conditionals may be required or not required.

4.4.1. Fuzzy matching. The profile conditional is one of the most frequently used due to its versatility and ease of use. Documents that the user accesses are indexed in a full text index. The question text is then used to search the full text index for likely matches. Each matching document is then independently scored against the question text (using standard TFIDF (Salton, 1988) metrics), and the results are combined and normalized. This method attempts to model the likelihood of a user’s interest in a question based on the number of matching documents while also taking into account the user’s other interests. Such a solution is necessary because Shock clients do not have a global view and cannot compare one user absolutely to another. Additionally, Shock provides fuzzy matching against declared profiles.
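One way such a profile match could be computed is sketched below. The text says only that per-document TFIDF scores are "combined and normalized", so the specific choices here are assumptions: a smoothed inverse document frequency, cosine similarity per document, and normalization by the total number of locally indexed documents (so that many unrelated documents lower the score). The function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def profile_match(question, documents):
    """Return a score in [0, 1] for a question against local documents."""
    if not documents:
        return 0.0
    doc_terms = [Counter(tokenize(d)) for d in documents]
    n_docs = len(doc_terms)

    # Document frequency: in how many local documents each term appears.
    df = Counter()
    for terms in doc_terms:
        df.update(terms.keys())

    def idf(term):
        # Smoothed IDF (an assumption); always positive.
        return math.log((1 + n_docs) / (1 + df[term])) + 1.0

    def vector(term_counts):
        return {t: c * idf(t) for t, c in term_counts.items()}

    def cosine(u, v):
        dot = sum(w * v[t] for t, w in u.items() if t in v)
        norm = (math.sqrt(sum(w * w for w in u.values()))
                * math.sqrt(sum(w * w for w in v.values())))
        return dot / norm if norm else 0.0

    q_vec = vector(Counter(tokenize(question)))
    # Score each document independently, then combine and normalize.
    total = sum(cosine(q_vec, vector(terms)) for terms in doc_terms)
    return min(1.0, total / n_docs)


if __name__ == "__main__":
    docs = [
        "freenet anonymous storage network",
        "freenet routing keys",
        "lunch menu for friday",
    ]
    print(profile_match("freenet routing", docs))
```

Everything in this computation uses only the local index, which is consistent with the constraint that a client has no global view of other users.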

4.4.2. Boolean matching. Currently Shock provides three boolean conditionals that allow targeting of users who have visited specific web sites, who have matching fields in the enterprise directory (department, location, etc.), or who have e-mailed a specific domain or user. These are abstractly similar in operation, so we describe only the web conditional in more detail.

Through this conditional, a user may specify which web sites or specific pages the recipient of the question should have seen. The result is simply a 0 or 1. Possible extensions of the boolean query include allowing the asker to specify that the recipient not have seen a website (e.g. “Please look at page x if you haven’t seen it yet and tell me what you think.”), and employing recency and frequency (e.g. “users who often visit web site x”).

4.4.3. Scoring. The total score generated is a function that combines the score of each independent conditional. The scoring mechanism takes into account whether conditionals are required or not. If all required conditionals score above 0, the combined score is compared against the threshold (otherwise the message is filtered out since a requirement was not met).
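The filtering rule can be sketched as follows. The text does not state which function combines the individual scores, so the arithmetic mean used here is an assumption, as are the names `Result`, `message_score`, and `is_displayed` and the treatment of a message with no conditionals.

```python
from typing import List, NamedTuple, Optional


class Result(NamedTuple):
    name: str
    score: float      # 0 or 1 for boolean conditionals, [0, 1] for fuzzy ones
    required: bool


def message_score(results: List[Result]) -> Optional[float]:
    """Combined score, or None if a required conditional was not met."""
    if any(r.required and r.score <= 0 for r in results):
        return None
    if not results:
        return 1.0    # ASSUMPTION: an unconditioned message matches everyone
    # ASSUMPTION: combine by arithmetic mean; the paper leaves this open.
    return sum(r.score for r in results) / len(results)


def is_displayed(results: List[Result], threshold: float) -> bool:
    score = message_score(results)
    return score is not None and score > threshold


if __name__ == "__main__":
    results = [
        Result("ProgramConditional", 1.0, required=True),
        Result("ProfileMatchConditional", 0.4, required=False),
    ]
    print(message_score(results), is_displayed(results, threshold=0.5))
```

Note that a failed required conditional removes the message regardless of how well the other conditionals score, whereas a failed optional conditional only lowers the combined score.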

Because of our object-oriented implementation strategy, new conditionals are easily and constantly added to the system. Additionally, we are currently experimenting with alternative scoring mechanisms, including manipulation of scores in response not only to local scores but also to global behavior. For example, questions for which answers are observed on the network will have their score reduced (multiple users need not answer the same question). Alternatively, questions that receive no answers may have their scores boosted.

Fig. 7. Anonymous discussion on sensitive issues.

While Shock is highly flexible in the filtering options available to users, we have designed the user interface to provide a more abstract view, which gives users access only to the filter fields appropriate for a given task. For example, the “software announcement” question screen will only display the program conditional. Again, because of our object-oriented implementation, new Macros, as we call them, can easily be constructed for specific tasks.

5. Pilot Study

At the time of this writing we are in the midst of a 3-month pilot study to test the usefulness of Shock. The pilot study is being conducted within a subgroup of Hewlett-Packard that consists of security consultants, support personnel, and administrators from all over the world. A large number of people have been contacted to participate in the pilot study, from over 20 countries and at least 6 different departments worldwide.

A baseline survey indicates that the users currently use over 10 systems to access expert knowledge, including 6 in-house systems that are similar to expert databases. Despite this, the most popular methods of finding expert knowledge are through personal networks, as supported by Ackerman’s field studies (McDonald and Ackerman, 1998), or by running a web search. The survey also showed that users asked questions frequently and found that access to expert information was important to their job. These results help justify the need for easy and natural approaches to finding expert knowledge.

At the conclusion of this study we hope to gain a better understanding of usage patterns, and to converge on metrics for judging whether or not the pilot was successful. We are also planning to perform experiments comparing Shock to existing solutions to determine its impact on the pilot community.

It is still too early to report on how Shock is performing, but we are able to provide some evidence of Shock’s usefulness. Specifically, we have seen users utilizing the unique features of Shock to achieve certain goals. For example, one user, interested in public transportation to work, targeted her question to users in a specific geographical location. A user in that area received her message and gave her advice.

Although most messages are not anonymous, preliminary observation of usage shows that people do use the anonymous features. As expected, the anonymous messages tend to be about more sensitive material, such as opinions on company policies. Fig. 7 shows a conversation within the company on a sensitive company issue. Shock helped users maintain a sense of security and privacy, while encouraging conversation on a sensitive topic.

At this early stage of the pilot study, Shock appears to be accomplishing its goals of providing a private, low-cost, flexible means for people within an organization to find expert knowledge and to discuss sensitive issues with others in the organization without fear of reprisal or backlash. We are encouraged by our initial success, and look forward to further studies of Shock’s effectiveness within the pilot group.

6. Conclusion and Future Work

Shock is a system designed to help resolve the tension between privacy and increased access to user information for knowledge management applications. It does this by employing locally stored automatic profiling for access to valuable information about users, close to their work practice. Shock allows users to make some use of the profile information of others by assembling Shock clients into a peer-to-peer network. Users can exchange a wide variety of message types and form ad hoc community discussions around specific topics. Shock also gives users the full ability to manage their identity by allowing truly anonymous messaging through the architecture. The resulting platform is low-cost, highly flexible, and can support a variety of knowledge management applications, in a context that is tightly integrated with users’ current email application.

We plan to analyze and evaluate Shock usage from the results of the ongoing pilot study. We are interested in how effective this system is at uncovering hidden knowledge in an organization. We plan to explore issues of message quality and trust in anonymous or private settings, as well as the issues of public and private rewards for contributing to the network. We also hope to investigate the effects of anonymity on the facilitation of conversation inside an organization.

In the future, we plan to build upon this platform by including reputation management features and by exploring other applications, such as privacy-preserving collaborative filtering, as well as novel economic incentive mechanisms.

In addition, Shock may become an unobtrusive way to gather interesting data relevant for understanding the social networks within organizations, while preserving the privacy of participants.

Acknowledgments

We would like to thank Lada Adamic, Bernardo Huberman, and Marie-Jo Fremont for their ideas and encouragement.

Notes

1. http://www.carrozza.com/atwork/connex/.
2. http://sage.fiu.edu/MegaSource.htm.
3. http://www.p2pq.net.
4. http://www.askme.com.
5. http://www.tacit.com.
6. http://www.gnutelliums.com/.
7. http://www.gnutelliums.com/.

References

Ackerman M. Augmenting the Organizational Memory: A Field Study of Answer Garden. Irvine, CA: Department of Information and Computer Science, University of California, 1994.

Ackerman MS, Cranor LF, Reagle J. Privacy in e-commerce: Examining user scenarios and privacy preferences. In: Proceedings of the First ACM Conference on Electronic Commerce, Denver, Colorado, United States, 1999:1–8.

Ackerman MS, Halverson C. Considering an organization’s memory. In: CSCW, ACM, 1998.

Ackerman M, Malone T. Answer Garden: A tool for growing organizational memory. ACM SIGOIS Bulletin 1990;11(2):31–39.

Borenstein NS. Computational mail as network infrastructure for computer-supported cooperative work. In: Conference Proceedings on Computer-Supported Cooperative Work, Nov. 01–04, Toronto, Ontario, Canada, 1992:67–74.

Brittenden S. (ed.) Online Rights—The Law in Europe. http://www.ictur.labournet.org/Online.htm.

Canny J. Collaborative filtering with privacy. In: IEEE Conf. on Security and Privacy, Oakland, CA, May 2002.

Clarke I, Sandberg O, Wiley B, Hong TW. Freenet: A distributed anonymous information storage and retrieval system. In: Proceedings of the ICSI Workshop on Design Issues in Anonymity and Unobservability, 2000.

Davenport TH, Prusak L. Working Knowledge: How Organizations Manage What They Know. Boston, MA: Harvard Business School Press, 1998.

Ducheneaut N, Bellotti V. Email as habitat: An exploration of embedded personal information management. Interactions 2001;8(5):30–38.

Foner L. Yenta: A multi-agent, referral based matchmaking system. In: Proceedings of the First International Conference on Autonomous Agents (Agents ’97), Marina del Rey, California, United States, 1997:301–307.

Grinter RE. From workplace to development. In: Proceedings of the International ACM SIGGROUP Conference on Supporting Group Work: The Integration Challenge, Phoenix, Arizona, United States, 1997:231–240.

Herlocker J, Konstan J, Riedl J. Explaining collaborative filtering recommendations. In: Proceedings of the ACM 2000 Conference on Computer Supported Cooperative Work, Dec. 2–6, 2000.

Kautz H, Selman B, Shah M. Referral Web. Communications of the ACM 1997;40(3):63–65.

Konstan J, Miller B, Maltz D, Herlocker J, Gordon L, Riedl J. GroupLens: Applying collaborative filtering to usenet news. Communications of the ACM 1997;40(3):77–87.

Lewis NA. Plan for web monitoring in courts dropped. New York Times, 2001 (http://www.nytimes.com/2001/09/09/technology/09COUR.html).

Mattox D, Maybury M, Morey D. Enterprise Expert Knowledge and Discovery. MITRE Corporation, 1998.

McDonald D, Ackerman M. Just Talk To Me: A Field Study of Expertise Location. In: Proceedings of the ACM Conference on Computer Supported Cooperative Work (CSCW ’98), 1998:315–324.

McDonald D, Ackerman M. Expertise Recommender: A Flexible Recommendation System and Architecture. In: CSCW, 2000:231–240.

Morita M, Shinoda Y. Information filtering based on user behavior analysis and best match text retrieval. In: Proceedings of SIGIR Conference on Research and Development, 1994:272–281.

Reiter M, Rubin A. Anonymous web transactions with Crowds. Communications of the ACM 1999;42(2):32–38.

Salton G. Automatic Text Processing: The Transformation, Analysis and Retrieval of Information by Computer. Reading, MA: Addison-Wesley, 1988.

Schwartz MF, Wood DCM. Discovering shared interests among people using graph analysis of global electronic mail traffic. Communications of the Association for Computing Machinery, 1992.

Shardanand U, Maes P. Social information filtering. In: Conference Proceedings on Human Factors in Computing Systems, Denver, Colorado, United States, 1995:210–217.

Streeter LA, Lochbaum KE. Who knows: A system based on automatic representation of semantic structure. In: RIAO ’88, MIT, Cambridge, 1988.

Terveen L, Hill W, Amento B, McDonald D, Creter J. PHOAKS. Communications of the ACM 1997;40(3):59–62.

Vivacqua A, Lieberman H. Agents to assist in finding help. In: ACM Conference on Human Factors in Computing Systems (CHI 2000), 2000:65–72.

Weisband SP, Reinig BA. Managing user perceptions of email privacy. Communications of the ACM 1995;38(12):40–47.

Yang B, Garcia-Molina H. Comparing hybrid peer-to-peer systems. In: Proceedings of the 27th VLDB Conference, Roma, Italy, 2001.

Yimam D. Expert finding systems for organizations: Domain analysis and the demoir approach. In: Beyond Knowledge Management: Sharing Expertise, Boston: MIT Press, 2000.

Eytan Adar is a research scientist in the Information Dynamics Lab at Hewlett-Packard Laboratories, Palo Alto, CA. He holds a BS and MEng in Computer Science from the Massachusetts Institute of Technology.

Rajan Lukose is a research scientist in the Information Dynamics Lab at Hewlett-Packard Laboratories, Palo Alto, CA. He holds a PhD in physics from Stanford University, granted in 1999.

Caesar Sengupta is a member of technical staff at Encentuate Inc. in Singapore. He holds a Masters in Computer Science with Distinction in Research from Stanford University.

Josh Tyler is a member of research staff at Hewlett-Packard Labs in Palo Alto, CA. He holds a Masters in Computer Science from Stanford with a specialization in Human-Computer Interaction and a BS in Computer Science from Washington University.

Nathaniel Good is a graduate student in SIMS at the University of California, Berkeley. He holds a BS in Computer Science from the University of Minnesota.