It’s not in their tweets: Modeling topical expertise of Twitter users

Claudia Wagner, Vera Liao, Peter Pirolli, Les Nelson and Markus Strohmaier

Amsterdam, 16.4.2012

with…

Vera Liao

Markus Strohmaier

Les Nelson

Peter Pirolli

Motivation

On Twitter, information consumption is mainly driven by social networks

Users need to decide whom to follow in order to get trustworthy and relevant information about the topics they are interested in

Evidence from real-life

Search online for evidence

Searching for evidence on a Twitter user’s profile page

Bio

Tweets and Retweets

List Memberships

Research Questions

How useful are different types of user-related data for humans to inform their expertise judgments of Twitter users?

How useful are different types of user-related data for learning computational expertise models of users?

User Study: Expertise Judgments of Humans

16 participants

Task: Rate (1-5) expertise level of selected Twitter users (with high and low expertise) for the topic „semanticweb“

3 Conditions under which the user accounts were presented to subjects:

Condition 1: Tweets, Retweets, List, Bio

Condition 2: Only Tweets and Retweets are shown

Condition 3: Only List and Bio are shown

For each condition and expertise level we have 4 Twitter pages (4 replicates)

4 * 3 * 2 = 24 pages to rate per subject
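
The page count above follows from crossing the three factors of the design. A minimal sketch of that enumeration is given here; the condition labels and the function name `build_rating_pages` are illustrative assumptions, not names from the study materials.

```python
from itertools import product

# Labels are hypothetical shorthand for the three presentation conditions.
CONDITIONS = ("tweets+retweets+lists+bio", "tweets+retweets only", "lists+bio only")
EXPERTISE_LEVELS = ("high", "low")
REPLICATES = range(1, 5)  # 4 Twitter pages per condition and expertise level


def build_rating_pages():
    """Enumerate every page a subject rates: condition x expertise x replicate."""
    return [
        {"condition": c, "expertise": e, "replicate": r}
        for c, e, r in product(CONDITIONS, EXPERTISE_LEVELS, REPLICATES)
    ]


if __name__ == "__main__":
    print(len(build_rating_pages()))
```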

User Study: Expertise Judgments of Humans

2-way ANOVA

Within-subject variables:

• Twitter user expertise (high/low)

• 3 conditions

Interaction between conditions and Twitter user expertise is significant (F(2) = 8.326, p < 0.01)

A post-hoc test shows that participants’ ability to correctly judge the expertise of Twitter users differs significantly between conditions 1 and 2 and between conditions 2 and 3.
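
For readers who want to see how an interaction F statistic of this kind is obtained, the following is a rough sketch of the interaction term in a two-way repeated-measures ANOVA. It assumes one value per subject and cell (for example, the mean rating over the 4 replicates), applies no sphericity correction, and does not compute a p-value; the function name `interaction_f` and the toy data are illustrative and not taken from the study.

```python
def interaction_f(data):
    """F statistic for the A x B interaction in a fully within-subject design.

    data[s][a][b] is the value of subject s at level a of factor A
    (e.g. expertise) and level b of factor B (e.g. condition).
    Returns (F, df_interaction, df_error).
    """
    n, A, B = len(data), len(data[0]), len(data[0][0])

    def mean(xs):
        return sum(xs) / len(xs)

    cells = [(s, a, b) for s in range(n) for a in range(A) for b in range(B)]
    grand = mean([data[s][a][b] for s, a, b in cells])
    m_s = [mean([data[s][a][b] for a in range(A) for b in range(B)]) for s in range(n)]
    m_a = [mean([data[s][a][b] for s in range(n) for b in range(B)]) for a in range(A)]
    m_b = [mean([data[s][a][b] for s in range(n) for a in range(A)]) for b in range(B)]
    m_ab = [[mean([data[s][a][b] for s in range(n)]) for b in range(B)] for a in range(A)]
    m_sa = [[mean(data[s][a]) for a in range(A)] for s in range(n)]
    m_sb = [[mean([data[s][a][b] for a in range(A)]) for b in range(B)] for s in range(n)]

    # Sum of squares of the A x B interaction.
    ss_ab = n * sum(
        (m_ab[a][b] - m_a[a] - m_b[b] + grand) ** 2
        for a in range(A) for b in range(B)
    )
    # Error term: the A x B x subject interaction.
    ss_err = sum(
        (data[s][a][b] - m_ab[a][b] - m_sa[s][a] - m_sb[s][b]
         + m_a[a] + m_b[b] + m_s[s] - grand) ** 2
        for s, a, b in cells
    )
    df_ab = (A - 1) * (B - 1)
    df_err = df_ab * (n - 1)
    return (ss_ab / df_ab) / (ss_err / df_err), df_ab, df_err


if __name__ == "__main__":
    # Toy data: 2 subjects, 2 x 2 design (made-up numbers).
    toy = [[[1, 2], [3, 8]], [[2, 2], [3, 6]]]
    print(interaction_f(toy))
```

With 16 subjects, 2 expertise levels and 3 conditions, this formulation gives 2 degrees of freedom for the interaction and 30 for the error term.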

Research Questions

How useful are different types of user-related data for humans to inform their expertise judgments of Twitter users?

How useful are different types of user-related data for learning computational expertise models of users?

Dataset

10 topics: semanticweb, biking, wine, democrat, republican, medicine, surfing, dogs, nutrition and diabetes

We use Wefollow directories as a manually created proxy ground truth for expertise

Top 150 users per Wefollow directory

Excluded users who appear in more than one of the 10 directories and users who mainly tweet in languages other than English
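
The selection and exclusion steps above can be sketched as follows. This is only an illustration under stated assumptions: directories are given as ranked lists of user names, overlap is checked among the top-ranked users, and the set of mainly non-English accounts is supplied by the caller. The function name `filter_users` and its parameters are hypothetical.

```python
from collections import Counter


def filter_users(directories, top_n=150, non_english=frozenset()):
    """Keep the top_n users per directory, then drop ambiguous or non-English users.

    directories: dict mapping a topic to a ranked list of user names.
    non_english: user names that mainly tweet in languages other than English
                 (assumed to be determined elsewhere).
    """
    top = {topic: users[:top_n] for topic, users in directories.items()}
    # Count in how many directories each user appears.
    counts = Counter(u for users in top.values() for u in set(users))
    return {
        topic: [u for u in users if counts[u] == 1 and u not in non_english]
        for topic, users in top.items()
    }


if __name__ == "__main__":
    toy = {"wine": ["a", "b", "c"], "dogs": ["b", "d"]}
    print(filter_users(toy, non_english={"d"}))
```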

Dataset

1145 users

Most recent 1000 tweets and retweets

Most recent 300 user lists

Bio info

Information on Twitter is sparse

Extend URLs in Tweets, RTs and bio

Use list names as search query terms

Use top 5 search query result snippets obtained from Yahoo BOSS to enrich list information

Computational Expertise Models: Methodology

Learn latent semantic structures (topics) from Twitter communication by fitting an LDA model

Figure: top 20 stemmed words of 3 randomly selected topics (T1, T2, T3) learned by an LDA model with T=50
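
To make the LDA step more concrete, here is a small collapsed Gibbs sampler. It is a teaching sketch only: the slides do not say which inference procedure or hyperparameters were used, so the sampler, the values of `alpha` and `beta`, and the function name `lda_gibbs` are all assumptions, and the toy documents are made up.

```python
import random


def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA with collapsed Gibbs sampling.

    docs: list of token lists. Returns (theta, topic_word_counts, vocab),
    where theta[d][k] is the estimated share of topic k in document d.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    n_dk = [[0] * K for _ in docs]      # topic counts per document
    n_kw = [[0] * V for _ in range(K)]  # word counts per topic
    n_k = [0] * K                       # total tokens per topic
    z = []                              # topic assignment of every token

    # Random initialisation of topic assignments.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token, then resample its topic from the conditional.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    return theta, n_kw, vocab


if __name__ == "__main__":
    toy_docs = [["wine", "grape", "wine"], ["dog", "puppy", "dog"], ["wine", "dog"]]
    theta, _, _ = lda_gibbs(toy_docs, num_topics=2, iters=50)
    for row in theta:
        print([round(p, 2) for p in row])
```

Each row of `theta` is a topic distribution for one document. In the setting of the slides, a "document" would be the aggregated text of one type of user-related data.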

Computational Expertise Models: Methodology

Associate users with topics by using statistical inference based on different types of user-related data; the result is the user’s topical expertise profile

Figure: topic distributions (T1, T2, T3) inferred separately from a user’s bio, lists, tweets and RTs, and the topical similarity between lists/bio/tweets/RTs
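
One way to compare two topic distributions of the same user is the Jensen-Shannon divergence. The slides do not name the similarity measure that was used, so this choice is an assumption made for illustration, as are the function names and the example distributions.

```python
import math


def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions.

    0 means identical distributions, 1 means no overlap at all.
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


if __name__ == "__main__":
    # Made-up topic distributions over T1, T2, T3 for one user.
    bio_topics = normalize([6, 3, 1])
    list_topics = normalize([1, 2, 7])
    print(js_divergence(bio_topics, list_topics))
```

Lower values indicate that two data types lead to more similar topic annotations for the user.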

Types of User Lists

Manual inspection of user lists

Selected 10 users at random and inspected their user list memberships (455 user lists)

We found 3 main classes of user lists:

Personal judgments (e.g., “great people”, “geeks”)

Personal relationships (e.g., “my family”,“colleagues”)

Topical Lists (e.g., “science”, “researcher”, “healthcare”)

Value of User Lists

3 human raters judged if a list (label and/or description) belongs to the class Topical Lists

77.67% of user lists were topical lists

Inter-rater agreement: Kappa = 0.62
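
With three raters, one common agreement statistic is Fleiss' kappa. The slides only report "Kappa", so treating it as Fleiss' kappa is an assumption; the sketch below shows how such a value can be computed from per-list rating counts, using made-up counts.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a fixed number of raters per item.

    counts[i][j] is the number of raters who put item i into category j,
    e.g. [topical, not topical]. Every row must sum to the same rater count.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Share of all ratings that fall into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per item.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_items       # mean observed agreement
    p_e = sum(p * p for p in p_j)    # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Made-up ratings of 4 lists by 3 raters: [topical, not topical].
    print(fleiss_kappa([[3, 0], [2, 1], [0, 3], [3, 0]]))
```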

Quantify the Value of Lists/Bio/Tweets/RTs

Which type of information reflects best the topical expertise of a user?

Information-Theoretic Evaluation

Which type of topic distribution reflects best the underlying category information of the user?

Normalized Mutual Information (NMI) between user’s topic distributions and user’s Wefollow directory
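
A rough sketch of an NMI computation follows. It makes two assumptions that the slides do not confirm: each user's topic distribution is reduced to its dominant topic before comparison, and mutual information is normalized by the geometric mean of the two entropies (other normalizations exist). All function names and the toy labels are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of discrete labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def nmi(xs, ys):
    """Normalized mutual information between two labelings of the same users."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    mi = sum(
        c / n * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )
    hx, hy = entropy(xs), entropy(ys)
    if hx == 0 or hy == 0:
        return 0.0
    return mi / math.sqrt(hx * hy)


def dominant_topic(theta):
    """Index of the most probable topic in one topic distribution."""
    return max(range(len(theta)), key=theta.__getitem__)


if __name__ == "__main__":
    # Made-up topic distributions and directory labels for 4 users.
    thetas = [[0.8, 0.2], [0.7, 0.3], [0.1, 0.9], [0.4, 0.6]]
    directories = ["wine", "wine", "dogs", "dogs"]
    print(nmi([dominant_topic(t) for t in thetas], directories))
```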

Task-based Evaluation

Which type of topic distributions are most useful for classifying users into their Wefollow directories?

F1-score of classification models

Information-Theoretic Evaluation of Computational Expertise Models

Task-based Evaluation of Computational Expertise Models

Compare topic distributions inferred via different types of user-related data within a classification task

Objective: Classifying users into Wefollow directories by using topic distribution as features

Classification Task:

Train Partial Least Squares classifier with topic distributions inferred via different types of user-related data as features

Perform 5-fold cross-validation

Use F-measure (harmonic mean of precision and recall) to compare classifiers’ performance
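
The evaluation loop described above can be sketched without the classifier itself. The helpers below split item indices into folds and compute a macro-averaged F1 score. The slides do not state how scores were averaged across directories or how folds were formed, so macro-averaging and the interleaved split are assumptions, and training the Partial Least Squares model is left out.

```python
def k_fold_indices(n_items, k=5):
    """Yield (train_indices, test_indices) for k interleaved folds."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def macro_f1(y_true, y_pred):
    """Average of the per-class F1 scores (harmonic mean of precision and recall)."""
    scores = []
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    for train, test in k_fold_indices(10, k=5):
        print(test)
    print(macro_f1(["wine", "wine", "dogs", "dogs"], ["wine", "dogs", "dogs", "dogs"]))
```

In a full pipeline, a classifier would be fitted on the `train` indices of each fold and `macro_f1` applied to its predictions on the `test` indices.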


Figure (T=300): the x-axis shows reference values and the y-axis shows predictions

Conclusions

Different types of user-related data lead to different topic annotations

List-based topic annotations are most distinct from all others

Bio-, tweet- and retweet-based topic annotations are quite similar

For creating topical expertise profiles of users, information about their list memberships is most useful

For informing humans’ expertise judgments about Twitter users, contextual information (users’ bio and list memberships) is most useful

Implications & Limitations

User Interface

Make user lists and bio information more prominent

Incentives for people to use lists more heavily

E.g. provide weekly list summaries

Search and Recommender Systems could benefit from exploiting user list information

Results are biased towards users with high Wefollow rank

Experimental Setup

THANK YOU

[email protected]

http://claudiawagner.info

src: http://adobeairstream.com/green/a-natural-predicament-sustainability-in-the-21st-century/

Bio and User Lists are useful for judging topical expertise
