From Research of Social Media to Socially Mediated Research

2010 HCIL Symposium Workshop - UMD

Government Applications of Social Media

Networks and Communities

May 28, 2010

Natasa Milic-Frayling

Microsoft Research Cambridge

Outline

Microsoft Research. Integrated Systems team, research areas and approach

‘Social’ as a research topic: Modelling Human to Human Interaction in Technology Mediated Communities

‘Social’ as facilitator of research: Leveraging Communities of Practice

[Chart: number of PhD researchers at MSR, 1991 to 2008; vertical axis from 0 to 1000]

Microsoft Research (MSR)

MSR Sites
– Redmond, Washington (September 1991)
– San Francisco, California (June 1995)
– Cambridge, United Kingdom (July 1997)
– Beijing, China (November 1998)
– Silicon Valley, California (July 2001)
– Bangalore, India (January 2005)
– Cambridge, Massachusetts (July 2008)

MSR New England

MSR Asia

MSR India

Redmond

MSR Cambridge

Silicon Valley

WEB AND ON-LINE COMMUNITIES

CONTENT ANALYSIS AND RICH UI

MOBILE AND CROSS PLATFORM MEDIA

Information retrieval & NLP

Academic Disciplines

Research Areas

Machine Learning and Statistics

Mathematical Modelling

Graph Theory and Analysis

HCI and Design






Team

Research Areas




HCI and Design

Gabriella Janez Annika Rachel Gerard Natasa Eduarda

Gavin Jamie







Gavin Jamie Vinay

Aleks Ignjatovic

Ben Shneiderman

Elizabeth Bonsignore

Cody Dunne

Dana Rotman

Marc Smith

Derek Hansen

Tom Lee

Team




Research Areas

InSite Live: Web site structure analysis and decomposition into subsites

Social Footprints: Analysis of social interaction in online communities

NodeXL: Interactive graph analysis and visualization

Research Desktop: Research in information management and tagging practices in the Desktop environment

Social IR: Extension of IR models with social network and models of approval, trust and reputation

weConnect: Investigating narrow-cast of personalized content in close relationships and potential for mobile advertising

VideoSnaps: Investigating concepts and services for cross platform media editing and streaming

Projects

Methodology – how to develop mobile and social applications.

Integration with the ecosystem – pre-requisites for adoption

Research Platforms





Connect the quantitative analyses with the qualitative analyses.

Principles, mechanisms, and tools for knowledge management.

Trust and reputation.

Shared summaries and overviews.

INTERACTIONS IN TECHNOLOGY MEDIATED COMMUNITIES

social as a research topic

Community Question-Answering

[Figure: timeline of community question-answering sites, 2002 to 2008]

Community Question-Answering

Question

Answers

Content Organization, Browsing and Search

Topic categories

Tags

100 Most Frequent Tags on Live QnA


Politics


Fun, Life, People, Philosophy

Community Analysis and Health Index

Towards a sustainable community

Support novice users in becoming active community participants

Support frequent users in increasing the volume and quality of their content contributions

Promote high quality contributions (for external exploitation – through search).

85% of new users start with a question

72% never ask a question again

5% will engage in answering

61% of questions from new users don’t get more than 1 answer (23% get 0 answers)

Example: Investigate QnA Voting Practice

Approach:

Statistical analysis of the user logs

Manual inspection of the content

– Taxonomy of the users’ intent; to be evolved by the community of practice

Define the basic features of the individuals and governing assumptions

Derive a mathematical model of the voters’ metric.

Observe its properties with regard to irregular voting behaviour: random voting or collusion.

[Diagram: a question (Q) with answers (A), comments (C) and votes (V), linked by ‘answer to’, ‘comment to’ and ‘vote on’ edges]

Social network activities:
– Answer to a question
– Comment on an answer
– Vote on the best answer

Which Answer to Vote On? Different ‘best answer’ connotations

The notion of the ‘best answer’ thus depends on the context and nature of the answers - from correctness and usefulness to entertainment value

Social bias

Assignment of votes may be influenced by social and personal ties, voter’s perception, familiarity, and preferential treatment of familiar community members

“Microsoft or Apple? Feel free to argue and point out their good and bad points. Also feel free to rebut or debate on other people's standpoint. Best argument/answer will get my friends’ and my "best answer" reward.”

Self-promotion

Individuals’ aspirations to excel in their social status can adversely affect the quality of their contribution to the community.

Reliability as Conformity?

Reliability of a voter

Relative reliability of two voters is determined by the proportion of all the voters who made the same choice of the best answer:

The reliability scores represent a fixed-point for the function F – apply Brouwer Fixed Point Theorem.
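The formula for F did not survive on this slide, so the sketch below is only a generic illustration of a fixed-point reliability score, not the function used in the talk. It assumes a hypothetical update rule: a voter's score is the reliability-weighted share of voters who chose the same answer, summed over the questions they voted on and rescaled so the top score is 1. Note that Brouwer's theorem guarantees that a fixed point exists for a continuous map on a compact convex set; it does not by itself guarantee that simple iteration converges.

```python
from collections import defaultdict


def fixed_point_reliability(votes, max_iter=200, tol=1e-9):
    """Iterate reliability scores until they stop changing.

    votes maps (voter, question) -> chosen answer.
    The update rule is an assumption made for illustration only.
    """
    by_question = defaultdict(list)
    for (voter, question), answer in votes.items():
        by_question[question].append((voter, answer))

    rel = {voter: 1.0 for voter, _ in votes}  # start with equal reliability
    for _ in range(max_iter):
        new = {voter: 0.0 for voter in rel}
        for ballots in by_question.values():
            total = sum(rel[v] for v, _ in ballots)
            weight = defaultdict(float)
            for v, a in ballots:
                weight[a] += rel[v]
            for v, a in ballots:
                # Reliability-weighted share of voters agreeing with v.
                new[v] += weight[a] / total
        top = max(new.values())
        new = {v: s / top for v, s in new.items()}  # rescale to (0, 1]
        delta = max(abs(new[v] - rel[v]) for v in rel)
        rel = new
        if delta < tol:
            break
    return rel


def weighted_winner(votes, rel):
    """Per question, pick the answer with the largest total reliability."""
    totals = defaultdict(lambda: defaultdict(float))
    for (voter, question), answer in votes.items():
        totals[question][answer] += rel[voter]
    return {q: max(w, key=w.get) for q, w in totals.items()}


# Toy data: voters a, b, c agree on both questions; voter d dissents.
example_votes = {
    ("a", "q1"): "x", ("b", "q1"): "x", ("c", "q1"): "x", ("d", "q1"): "y",
    ("a", "q2"): "x", ("b", "q2"): "x", ("c", "q2"): "x", ("d", "q2"): "y",
}
example_rel = fixed_point_reliability(example_votes)
print(example_rel)
print(weighted_winner(example_votes, example_rel))
```

Under this assumed rule a consistent dissenter's score shrinks on every pass, which is the "reliability as conformity" effect the slide title questions.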

Real Data Analysis

Vote Count

FP Method

‘FUN’ ‘PHILOSOPHY’

Random Voting

Simulate random voting by drawing votes from a uniform distribution in place of Zipf’s Law

We vary the percentage of affected questions (from 1% to 10%) and the percentage of voters who voted randomly (from 1% to 10%).
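The experiment described above can be sketched roughly as follows. This is a minimal simulation under stated assumptions, not the code behind the slides: the numbers of questions, voters and answers, the Zipf exponent, and the function names are all hypothetical, and only plurality voting is scored.

```python
import random
from collections import Counter


def zipf_weights(n_answers, s=1.0):
    """Zipf-like preference: the answer at rank r gets weight 1 / r**s."""
    return [1.0 / (rank ** s) for rank in range(1, n_answers + 1)]


def plurality(ballots):
    """Answer with the most votes; ties go to the lowest answer id."""
    counts = Counter(ballots)
    return max(counts, key=lambda a: (counts[a], -a))


def simulate(n_questions=200, n_voters=50, n_answers=5,
             frac_questions=0.10, frac_voters=0.10, seed=0):
    """Count questions whose plurality winner changes under random voting."""
    rng = random.Random(seed)
    answers = list(range(n_answers))
    weights = zipf_weights(n_answers)
    # Baseline: every voter picks an answer with Zipf-like preference.
    baseline = [rng.choices(answers, weights=weights, k=n_voters)
                for _ in range(n_questions)]
    affected = set(rng.sample(range(n_questions),
                              int(frac_questions * n_questions)))
    random_voters = rng.sample(range(n_voters), int(frac_voters * n_voters))
    changed = 0
    for q, ballots in enumerate(baseline):
        perturbed = list(ballots)
        if q in affected:
            for v in random_voters:
                perturbed[v] = rng.choice(answers)  # uniform instead of Zipf
        if plurality(ballots) != plurality(perturbed):
            changed += 1
    return changed


# Sweep both percentages from 1% to 10%, as on the slide.
for pct in (1, 5, 10):
    print(pct, simulate(frac_questions=pct / 100, frac_voters=pct / 100))
```

Replacing `plurality` with a reliability-weighted winner would allow the two scoring methods to be compared on the same perturbed ballots.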

The number of best answers changed is lower for fixed point scoring (right) than for plurality voting (left)

Simulate the collusion: fix the number of involved voters (‘stuffers’, here 4 and 10) and the percentage of questions affected (here 50%)

Both majority voting and fixed point scoring are susceptible to ballot stuffing

Fixed point scoring flags the outliers and helps identify collusion

Ballot Stuffing

Detecting Sybil Attack - Leveraging Social Networks

• Social networks are fast mixing
– Random walks quickly converge to the stationary distribution
• Sybil attacks induce a bottleneck cut
– Fast mixing is disrupted
• Knowledge of an a priori honest node
– Breaks symmetry

Honest Nodes

Sybil Nodes

Attack edges
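The bottleneck effect can be illustrated with a toy graph. The sketch below is an assumption-laden example, not the detection method from the talk: it builds two hypothetical 10-node cliques (honest and Sybil) joined by a single attack edge, starts a random walk at a known honest node, and compares the probability mass that reaches the Sybil region after a few steps with that region's share of the stationary distribution.

```python
def complete_graph(nodes):
    """Adjacency sets for a clique over the given nodes."""
    return {u: {v for v in nodes if v != u} for u in nodes}


def walk_distribution(adj, start, steps):
    """Exact distribution of a simple random walk after a number of steps."""
    dist = {u: 0.0 for u in adj}
    dist[start] = 1.0
    for _ in range(steps):
        nxt = {u: 0.0 for u in adj}
        for u, p in dist.items():
            if p:
                share = p / len(adj[u])  # move to each neighbour equally
                for v in adj[u]:
                    nxt[v] += share
        dist = nxt
    return dist


honest = [f"h{i}" for i in range(10)]
sybil = [f"s{i}" for i in range(10)]
adj = complete_graph(honest)
adj.update(complete_graph(sybil))
adj["h0"].add("s0")  # the single attack edge
adj["s0"].add("h0")

dist = walk_distribution(adj, start="h1", steps=5)
sybil_mass = sum(dist[s] for s in sybil)

# Stationary probability of a node is proportional to its degree.
total_degree = sum(len(neigh) for neigh in adj.values())
stationary_sybil = sum(len(adj[s]) for s in sybil) / total_degree

print(f"Sybil mass after 5 steps: {sybil_mass:.4f}")
print(f"Sybil mass at stationarity: {stationary_sybil:.4f}")
```

Inside either clique the walk mixes within a step or two, but the mass on the Sybil side stays far below its stationary share because everything has to cross the one attack edge.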

LEVERAGING COMMUNITIES OF PRACTICE

social as facilitator of research

Issue: the Scale and the Limitations of Humans

We require user input in order to inform the systems’ design and verify our hypotheses

In search we build test collections:

– A set of topics, a corpus of documents, and relevance judgements for documents in the corpus
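As a rough illustration of the three parts of a test collection, the sketch below stores topics, a corpus and relevance judgements, and scores one ranked result list. The class name, field names and sample data are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class SearchTestCollection:
    topics: dict   # topic id -> topic statement
    corpus: dict   # document id -> document text
    # (topic id, document id) -> True if judged relevant
    judgements: dict = field(default_factory=dict)

    def relevant(self, topic_id):
        """Documents judged relevant for one topic."""
        return {doc for (topic, doc), is_rel in self.judgements.items()
                if topic == topic_id and is_rel}


def precision_at_k(ranking, relevant_docs, k):
    """Fraction of the top k ranked documents that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant_docs) / k


collection = SearchTestCollection(
    topics={"t1": "history of the printing press"},
    corpus={"d1": "...", "d2": "...", "d3": "..."},
    judgements={("t1", "d1"): True, ("t1", "d2"): False, ("t1", "d3"): True},
)
print(precision_at_k(["d1", "d2", "d3"], collection.relevant("t1"), k=2))
```

Every entry in `judgements` costs a human assessment, which is why the size of the unit being judged matters in what follows.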

Question: how do we build test collections for books?

– Search over Web pages involves low cost of inspection of individual Web pages

– Search over Book collections increases the cost due to the size and the coherence of topics across pages.

Web scenario

Book scenario

…

DATA STORE AND SEARCHABLE INDEX

Read’n Play

Architecture comprises four functional layers

Implemented using Web services - no client based interaction with the content

Can be repurposed for other research projects

SEARCH AND NAVIGATION SUPPORT

USER ANNOTATIONS

SOCIAL GAME SUPPORT

Image Database - Scanned Document Page

OCR Text Database

Text and Metadata Index

Social game

• Explorers: reward for finding relevant content
• Reviewers: reward for finding mistakes in explorers’ work
• Reward for re-assessment (agreement is not necessary)
• Conflicts: penalty

Explore

Pilot Study

Participants
– Open to everyone
– 48 registered + 81 INEX participants
– 17 contributed assessments (16 INEX participants)

Collected data

Relevance assessments
– 3,478 judged books with
– 23,098 judged pages from
– 29 topics

Log data
– 32,112 navigational events
– 45,126 judgement events
– 2,970 ‘search inside a book’ events

Incentives for participation
• Tangible, e.g., monetary
– Winners: Microsoft hardware and software
– All: access to collected data
• Intangible reward, e.g., fun, social gain
– Leader board: social status

Feasibility

Averages across the 17 assessors

7.2 days with activity, out of 42

11.4 hours judging time

220 judged books

Average effort

7.3 minutes per relevant book, 2.7 minutes per irrelevant book (comparable to INEX 2003 ad hoc track)

37 seconds per relevant page, 22 seconds per irrelevant page

Extrapolated statistics

Judging 1000 books takes 52.7 hours at a 1 : 9 ratio of relevant : irrelevant books

33.3 days to judge one topic, with 95 minutes a day

70 topics, 200 books per topic with 20 judges takes 36.9 days

737 judges to complete task in one hour
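The extrapolated figures can be reproduced from the per-book averages. The calculation below is a sketch under two assumptions that the slide does not state explicitly: that "one topic" means 1000 books, and that the 20-judge figure assumes each judge works about one hour a day (the arithmetic gives 36.9 hours per judge).

```python
MIN_PER_RELEVANT_BOOK = 7.3
MIN_PER_IRRELEVANT_BOOK = 2.7


def judging_minutes(n_books, relevant_share=0.1):
    """Total judging time in minutes at a 1 : 9 relevant : irrelevant ratio."""
    relevant = n_books * relevant_share
    irrelevant = n_books - relevant
    return (relevant * MIN_PER_RELEVANT_BOOK
            + irrelevant * MIN_PER_IRRELEVANT_BOOK)


minutes_1000 = judging_minutes(1000)        # 100 * 7.3 + 900 * 2.7 = 3160
hours_1000 = minutes_1000 / 60              # about 52.7 hours
days_one_topic = minutes_1000 / 95          # about 33.3 days at 95 min a day

total_hours = judging_minutes(70 * 200) / 60  # about 737 judge-hours
hours_per_judge = total_hours / 20            # about 36.9 hours for 20 judges

print(round(hours_1000, 1), round(days_one_topic, 1))
print(round(total_hours), round(hours_per_judge, 1))
```

The last line of the slide follows directly: roughly 737 judge-hours of work means about 737 judges would be needed to finish within one hour.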

Productivity Games

Summary

Understanding social media requires a cross-disciplinary approach and new methods of study.

Defining the characteristics and metrics of ‘healthy communities’ is a challenging task.

‘Social’ is increasing its role as an enabler for large scale experiments

Generally, we need to reflect on the methods and approaches we take when studying online communities.

Thank you

Microsoft Research Cambridge

https://research.microsoft.com/
