© Copyright IBM Corporation 2010

IBM Research

On the Quality of Inferring Interests From Social Neighbors

Zhen Wen, Ching-Yung Lin

IBM T. J. Watson Research Center

|

Motivation

Modeling user interests enables personalized services

– More relevant search/recommendation results

– More targeted advertising

Data about users are sparse

– Many user profiles are static, incomplete and/or outdated

– <10% of employees actively participate in social software [Brzozowski2009]

Inferring user interests from neighbors can be a solution

– It also raises the concern of exposing users’ private information

How true are “You are who you know” and “Birds of a feather flock together”?

|

Challenges in Observing Users

Diverse types of media

– Public social media (friending, blogs, etc.)

Data are public but limited (esp. in enterprises)

– Private communication media (email, instant messaging, face-to-face meetings, etc.)

Much more data

Privacy is a major issue

|

Example of Diverse Types of Media

Number of people who participated in the top 3 media in an enterprise with 400K employees

Number of entries:

– Social bookmarking: 400K

– Electronic communication: 20M

– File sharing: 140K

|

Our Goals

How well can a user’s interests be inferred from his/her social neighbors?

Can the diverse types of media be combined to improve inferring user interests from social neighbors?

Can the quality of the inference be predicted based on features of social neighbors?

– Only sufficiently accurate inference may help personalized services

|

Our Approach

Infer user interests from social neighbors

– Model user interests based on multiple types of information they accessed

– Construct employee social network from communication data

– Infer using social influence model

Study the relationship between inference quality and network characteristics

– Identify effective factors to ensure high quality results for applications

|

SmallBlue: Unlock the Power of Business Networks & Protect Privacy

Expertise: Search for people who know “xyz” in my networks

Ego: Show my personal network evolution and social capital

Net: See how experts or communities connect

Reach: Helps me understand a person and my formal and informal paths to reach him

Whisper: Social-network-enabled personalized live recommender

Productivity: Social Network Analysis Service helps a company understand how to enhance productivity

Synergy: Personalized Search

[Diagram labels: distributed crawling; streams; DBs & feeds]

20,000,000 emails & SameTime messages

1,000,000 Learning click data

14,000,000 KnowledgeView, SalesOne, …, access data

1,000,000 Lotus Connections (blogs, file sharing, bookmark) data

200,000 people’s consulting financial databases

400,000 IBMers organization/demographic data

400,000 webpages and knowledge assets

Social Network Analysis & Visualization, Expertise Mining, and Multi-Channel Human Network/Behavior Analysis

Live Data

|

Privacy as Fundamental Human Rights and Global Privacy Laws

(United Nations) Universal Declaration of Human Rights [1948]

Article 12: No one shall be subjected to arbitrary interference with his privacy, family, home or correspondence, nor to attacks upon his honor and reputation. Everyone has the right to the protection of the law against such interference or attacks.

[Map legend: Existing Private Sector Privacy Laws; Emerging Private Sector Privacy Laws]

European Union: European Data Protection Directive (1995)

Canada: PIPEDA (2001 - 2004)

U.S. (sectoral)

– Children’s privacy: COPPA (1999)

– Financial sector: GLB (2001)

– Health sector: HIPAA (2002)

– California privacy (2005)

Taiwan: Computer-Processed PD Protection Law (1995)

South Korea: Info & Comm Network Util. & Info Protection Law (2000)

Japan: Personal Data Protection Act (2005)

APEC: Guidelines (2004)

Russia: Federal law on Pers Data (January 2007)

Australia: Privacy Amendment Act (2001)

New Zealand: Privacy Act (1993)

Chile: Protection of Private Life Law (1999)

Argentina: Protection of PD Law (2000)

Dubai: Data Protection Law (January 2007)

EU Directive 95/46/EC Article 2 (a):

– Personal data shall mean any information relating to an identified or identifiable natural person

EU Directive 95/46/EC Article 7:

– Personal data may be processed only if:

The data subject has unambiguously given his consent; or

for the performance of a contract to which the data subject is party or in order to take steps at the request of the data subject prior to entering into a contract; or

for compliance with a legal obligation to which the controller is subject; or…

|

Dataset

25315 users’ contributed content

– 20M email/chats

– 400K social bookmarks

– 20K shared public files

– Profile information

Job role, division, news categories of interest, etc.

Infer social network based on email/chats

X’: number of emails
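The step from message logs to a social network can be sketched roughly as follows. This is a minimal illustration only: the deck's actual tie-strength formula did not survive extraction, so the edge weight here is simply the raw message count, and the function name `build_network` and the sample data are hypothetical.

```python
from collections import Counter


def build_network(messages):
    """Build a directed, weighted network from (sender, recipient) pairs.

    ASSUMPTION: edge weight is the raw message count; the talk's real
    tie-strength definition is not recoverable from the transcript.
    """
    counts = Counter()
    for sender, recipient in messages:
        if sender != recipient:  # ignore self-addressed messages
            counts[(sender, recipient)] += 1

    network = {}
    for (sender, recipient), n in counts.items():
        network.setdefault(sender, {})[recipient] = n
    return network


# Made-up sample log of email/chat events.
messages = [("ann", "bob"), ("ann", "bob"), ("bob", "ann"), ("bob", "cat")]
network = build_network(messages)
```

A nested dictionary keeps the sketch dependency-free; a graph library would be a natural replacement at the scale of 20M messages.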

|

User Interests Model – Implicit Interests

Model users’ interests implicitly indicated by their contributed content

– Extract latent topics from the multiple types of content using LDA

– Select top-N distinct topics as the implicit interests model of a user

The degree to which the user is interested

The similarity of topics
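One plausible reading of "top-N distinct topics" is a greedy pass that ranks topics by the user's degree of interest and skips any topic too similar to one already chosen. The sketch below assumes that reading; the cosine similarity measure, the `max_sim` threshold, and all names and numbers are illustrative assumptions, not the authors' stated procedure. The LDA step itself is taken as already done.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_n_distinct(user_weights, topic_words, n, max_sim=0.8):
    """Greedy selection: visit topics by descending user weight and keep a
    topic only if it is not too similar to any topic already kept.

    ASSUMPTION: similarity is cosine over topic-word distributions and
    max_sim is an arbitrary cutoff chosen for illustration.
    """
    chosen = []
    for topic in sorted(user_weights, key=user_weights.get, reverse=True):
        if all(cosine(topic_words[topic], topic_words[c]) < max_sim for c in chosen):
            chosen.append(topic)
        if len(chosen) == n:
            break
    return chosen


# Toy topic-word distributions (as LDA might produce) and user weights.
topic_words = {
    "t0": [0.70, 0.20, 0.10],
    "t1": [0.68, 0.22, 0.10],  # near-duplicate of t0
    "t2": [0.10, 0.10, 0.80],
}
user_weights = {"t0": 0.5, "t1": 0.3, "t2": 0.2}
selected = top_n_distinct(user_weights, topic_words, n=2)
```

Here `t1` is skipped despite its higher weight because it nearly duplicates `t0`, so the selection uses both criteria listed on the slide.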

|

User Interests Model – Explicit Interests

29% of users manually specify interests in their profile

– A list of selected terms

From a static 1120-term taxonomy related to work

Compare implicit and explicit interests

– Explicit interests models are more limited

Implicit interests cover 60.4% of explicit interests

Explicit interests cover 2.2% of implicit interests

|

Infer Interests Based on Social Influence

Social influence model

– Network autocorrelation model [Leenders02]

Social influence represented as a weighted combination of neighbors’ attributes

The weight is an exponential function of the social distance
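The weighted-combination idea can be illustrated with a small sketch. It assumes social distance is the hop count in the network, the weight is exp(-decay × distance), and the result is normalized by the total weight; the `decay` and `max_hops` parameters and all names are assumptions for illustration, and this is not the full network autocorrelation model of [Leenders02].

```python
import math
from collections import deque


def social_distances(graph, source, max_hops):
    """Hop counts from source to reachable users (BFS), up to max_hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    del dist[source]
    return dist


def infer_interests(graph, interests, user, max_hops=2, decay=1.0):
    """Weighted average of neighbors' interest vectors.

    ASSUMPTION: weight = exp(-decay * hop distance), normalized to sum to 1.
    """
    scores, total = {}, 0.0
    for nbr, d in social_distances(graph, user, max_hops).items():
        if nbr not in interests:
            continue  # neighbor has no observed interests
        w = math.exp(-decay * d)
        total += w
        for topic, value in interests[nbr].items():
            scores[topic] = scores.get(topic, 0.0) + w * value
    return {t: s / total for t, s in scores.items()} if total else {}


# Toy network: u -- a -- b, with made-up interests for a and b.
graph = {"u": ["a"], "a": ["u", "b"], "b": ["a"]}
interests = {"a": {"cloud": 1.0}, "b": {"security": 1.0}}
inferred = infer_interests(graph, interests, "u")
```

In this toy case the direct neighbor's topic receives the larger share, since its weight exp(-1) exceeds the two-hop weight exp(-2).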

|

Inference Quality

Implicit interests: how close the inferred top-20 topics are to the ground truth

Condition                          Max     Mean    St. Deviation
Using social bookmark data only    59.4%   19.2%   10.7%
Using file sharing data only       44.9%   12.7%   7.2%
Using email/IM data only           62.1%   29.6%   14.1%
Using all three data sources       100%    45.1%   21.7%

– Significant advantage in combining multiple sources

– Large variance can affect practical applications, so we need to predict when to infer interests

Explicit interests: precision and recall of inferred terms

Measure      Mean    St. Deviation
Precision    30.1%   26.9%
Recall       61.5%   27.6%

– Much better recall than precision
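For reference, precision and recall of inferred terms against a user's self-specified terms can be computed as in the sketch below. The term lists are made-up examples, and treating terms as exact-match sets is an assumption about how the comparison was done.

```python
def precision_recall(inferred, truth):
    """Precision and recall of an inferred term set against ground truth.

    ASSUMPTION: terms are compared by exact match.
    """
    inferred, truth = set(inferred), set(truth)
    hits = len(inferred & truth)
    precision = hits / len(inferred) if inferred else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


# Made-up example: four inferred terms, three terms in the user's profile.
inferred_terms = ["cloud", "security", "analytics", "storage"]
profile_terms = ["cloud", "analytics", "mobile"]
precision, recall = precision_recall(inferred_terms, profile_terms)
```

Inferring more terms than a profile lists tends to raise recall at the expense of precision, which fits the pattern reported on the slide.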

|

Can Inference Quality be Predicted?

Hypothesis: inference quality can be predicted from social network properties

– User activeness: the amount of contribution

– In-degree

– Out-degree

– Betweenness

– User management role

Use Support Vector Regression to perform prediction

Evaluate prediction

– Precision/recall of the prediction (10-fold cross validation)

– Use prediction to improve inference

Only infer when the predicted quality is high
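The "only infer when predicted quality is high" idea can be sketched as a simple gate on the regressor's output. The sketch assumes predicted quality scores are already available (for example from a Support Vector Regression model, which is not reproduced here); the threshold, the function names, and the sample numbers are all hypothetical.

```python
def evaluate_gate(predicted, actual, threshold):
    """Score the call 'predicted quality >= threshold' against the truth.

    Returns (precision, recall) of the high-quality prediction.
    """
    flagged = [p >= threshold for p in predicted]
    good = [a >= threshold for a in actual]
    true_pos = sum(f and g for f, g in zip(flagged, good))
    precision = true_pos / sum(flagged) if any(flagged) else 0.0
    recall = true_pos / sum(good) if any(good) else 0.0
    return precision, recall


def mean_quality_when_inferring(predicted, actual, threshold):
    """Average actual quality over users for whom the gate allows inference."""
    kept = [a for p, a in zip(predicted, actual) if p >= threshold]
    return sum(kept) / len(kept) if kept else 0.0


# Made-up predicted vs. actual inference quality for four users.
predicted = [0.8, 0.4, 0.7, 0.2]
actual = [0.9, 0.3, 0.5, 0.6]
gate_precision, gate_recall = evaluate_gate(predicted, actual, threshold=0.6)
gated_mean = mean_quality_when_inferring(predicted, actual, threshold=0.6)
```

In this toy data the gated mean quality (0.7) is higher than the ungated mean (0.575), which is the effect the gate is meant to produce.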

|

Quality Prediction Results

Precision/recall of prediction

[Figure: precision/recall of prediction; panels: Implicit Interests, Explicit Interests]

Improve inference

Measure      Improved to   Improvement (%)
Precision    60.5%         101%
Recall       85.7%         39.3%
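The improvement percentages appear to be relative gains over the explicit-interest baselines reported on the Inference Quality slide (precision 30.1%, recall 61.5%). That pairing is an inference from the numbers rather than something the slide states; the short check below shows the arithmetic is consistent with it.

```python
def relative_improvement(before, after):
    """Relative gain of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


# ASSUMPTION: baselines are the explicit-interest means from the
# Inference Quality slide (precision 30.1%, recall 61.5%).
precision_gain = relative_improvement(30.1, 60.5)
recall_gain = relative_improvement(61.5, 85.7)
```

Under that assumption the computed gains round to the reported 101% and 39.3%.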

|

Feature Comparison

“Leave-one-feature-out” comparisons of prediction results

Most social influence comes from 1- and 2-degree neighbors

Your neighbors decide how well you can be inferred

Your neighbors’ network positions may be even more important than how active they are

– Formal organizational properties

Manager neighbors are more important in inference

– i.e., more social influence (about 5% more)
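A leave-one-feature-out comparison can be sketched generically: evaluate with all features, then re-evaluate with each feature removed and record the drop. The `evaluate` callable stands in for "train and cross-validate the quality predictor"; it, the feature names, and the toy contribution numbers are hypothetical and are not results from the talk.

```python
def leave_one_feature_out(features, evaluate):
    """Return, per feature, the score lost when that feature is removed."""
    baseline = evaluate(features)
    drops = {}
    for name in features:
        reduced = [f for f in features if f != name]
        drops[name] = baseline - evaluate(reduced)
    return drops


# Hypothetical stand-in for training and cross-validating a predictor:
# each feature adds a fixed, made-up amount to the score.
toy_contribution = {"activeness": 0.05, "in_degree": 0.10, "betweenness": 0.20}


def toy_evaluate(features):
    return 0.5 + sum(toy_contribution[f] for f in features)


features = list(toy_contribution)
drops = leave_one_feature_out(features, toy_evaluate)
most_important = max(drops, key=drops.get)
```

The feature whose removal costs the most is read as the most important one, which is the kind of comparison this slide summarizes.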

|

Related Work

User modeling

– Use behavioral data of the Ego

[Shepitsen08, Song05, Stoyanovic08, Teevan05]

– Use data of 1-degree neighbors

Issued the same query ([Piwowarski07, White09])

Collaborative filtering ([Goldberg92])

Social influence and correlation

– Correlation and related factors in social networks

[Singla08, Blei03, Crandall08, Anagnostopoulos08, Tang09]

– Infer user profiles in online communities

[Mislove2010]

|

Conclusion

There’s large variance in the quality of inferring user interests from social neighbors

The “recall” of the inference is much better than “precision”

The inference quality can be predicted from social network properties

|

Questions?