From Research of Social Media to Socially Mediated Research
2010 HCIL Symposium Workshop - UMD
Government Applications of Social Media
Networks and Communities
May 28, 2010
Natasa Milic-Frayling, Microsoft Research Cambridge
Outline
Microsoft Research: Integrated Systems team, research areas and approach
‘Social’ as a research topic: Modelling Human-to-Human Interaction in Technology-Mediated Communities
‘Social’ as a facilitator of research: Leveraging Communities of Practice
[Chart: # PhD Researchers at Microsoft Research, 1991 to 2008]
Microsoft Research (MSR)
MSR Sites
– Redmond, Washington (September 1991)
– San Francisco, California (June 1995)
– Cambridge, United Kingdom (July 1997)
– Beijing, China (November 1998)
– Silicon Valley, California (July 2001)
– Bangalore, India (January 2005)
– Cambridge, Massachusetts (July 2008)
[Map of MSR labs: MSR New England, MSR Asia, MSR India, Redmond, MSR Cambridge, Silicon Valley]
Research Areas
– WEB AND ON-LINE COMMUNITIES
– CONTENT ANALYSIS AND RICH UI
– MOBILE AND CROSS PLATFORM MEDIA
Academic Disciplines
– Information Retrieval & NLP
– Machine Learning and Statistics
– Mathematical Modelling
– Graph Theory and Analysis
– HCI and Design
Team
Gabriella, Janez, Annika, Rachel, Gerard, Natasa, Eduarda, Gavin, Jamie, Vinay
Aleks Ignjatovic, Ben Shneiderman, Elizabeth Bonsignore, Cody Dunne, Dana Rotman, Marc Smith, Derek Hansen, Tom Lee
Research Areas
– WEB AND ON-LINE COMMUNITIES
– CONTENT ANALYSIS AND RICH UI
– MOBILE AND CROSS PLATFORM MEDIA
Projects
– InSite Live: Web site structure analysis and decomposition into subsites
– Social Footprints: Analysis of social interaction in online communities
– NodeXL: Interactive graph analysis and visualization
– Research Desktop: Research in information management and tagging practices in the desktop environment
– Social IR: Extension of IR models with social networks and models of approval, trust and reputation
– weConnect: Investigating narrowcasting of personalized content in close relationships and the potential for mobile advertising
– VideoSnaps: Investigating concepts and services for cross-platform media editing and streaming
Research Platforms
– Methodology: how to develop mobile and social applications
– Integration with the ecosystem: prerequisites for adoption
Connect the quantitative analyses with the qualitative analyses.
Principles, mechanisms, and tools for knowledge management.
Trust and reputation.
Shared summaries and overviews.
INTERACTIONS IN TECHNOLOGY-MEDIATED COMMUNITIES
‘Social’ as a research topic
Community Question-Answering
[Figure: timeline of community question-answering services, 2002 to 2008]
Community Question-Answering
[Screenshot: a question and its answers]
Content Organization, Browsing and Search
Topic categories and tags
[Figure: 100 Most Frequent Tags on Live QnA; highlighted: Politics; Fun, Life, People, Philosophy]
Community Analysis and Health Index
Towards a sustainable community
Support novice users in becoming active community participants
Support frequent users in increasing the volume and quality of their content contributions
Promote high-quality contributions (for external exploitation through search)
85% of new users start with a question
72% never ask a question again
5% will engage in answering
61% of questions from new users don’t get more than 1 answer (23% get 0 answers)
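The new-user figures above come from log analysis. A minimal sketch of how such funnel statistics could be computed, assuming a hypothetical event log of `(user, action, question_id)` tuples in time order, that every user in the log is new, and that "questions from new users" means each user's first question; `new_user_funnel` and the result keys are illustrative names, not part of any Live QnA tooling.

```python
from collections import Counter


def new_user_funnel(events):
    """Funnel statistics over a hypothetical Q&A event log.

    events: list of (user, action, question_id) tuples in time order,
    where action is "ask" or "answer" (question_id is the question asked
    or answered). Assumes every user in the log is a new user.
    """
    first_action = {}          # user -> type of their first action
    ask_count = Counter()      # user -> number of questions asked
    answerers = set()          # users who answered at least once
    first_question = {}        # user -> id of their first question
    answer_count = Counter()   # question_id -> answers received

    for user, action, qid in events:
        first_action.setdefault(user, action)
        if action == "ask":
            ask_count[user] += 1
            first_question.setdefault(user, qid)
        elif action == "answer":
            answerers.add(user)
            answer_count[qid] += 1

    n_users = len(first_action)
    starters = [u for u, a in first_action.items() if a == "ask"]
    firsts = list(first_question.values())
    return {
        "start_with_question": len(starters) / n_users,
        "never_ask_again": sum(ask_count[u] == 1 for u in starters) / len(starters),
        "engage_in_answering": len(answerers) / n_users,
        "first_q_at_most_one_answer": sum(answer_count[q] <= 1 for q in firsts) / len(firsts),
        "first_q_zero_answers": sum(answer_count[q] == 0 for q in firsts) / len(firsts),
    }


events = [
    ("u1", "ask", "q1"), ("u2", "answer", "q1"), ("u3", "ask", "q2"),
    ("u3", "ask", "q3"), ("u4", "ask", "q4"), ("u2", "answer", "q4"),
    ("u1", "answer", "q4"),
]
print(new_user_funnel(events))
```

Reading "72% never ask a question again" as a share of question-first users is itself an assumption; other denominators are equally plausible from the slide.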
Example: Investigating QnA Voting Practice
Approach:
Statistical analysis of the user logs
Manual inspection of the content
– Taxonomy of the users’ intent, to be evolved by the community of practice
Define the basic features of the individuals and the governing assumptions
Derive a mathematical model of the voter metric
Observe the model’s properties with regard to irregular voting behaviour: random voting or collusion
Social network activities:
– Answer to a question
– Comment on an answer
– Vote on the best answer
[Figure: network of question (Q), answer (A), comment (C) and vote (V) activities, linked by ‘answer to’, ‘comment to’ and ‘vote on’ edges]
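These activity types induce a user-to-user network. A minimal sketch, assuming hypothetical input structures (`questions`, `answers`, `comments`, `votes`) and an assumed edge convention: each answerer is linked to the asker, and each commenter and voter is linked to the author of the answer they acted on.

```python
from collections import Counter


def build_user_network(questions, answers, comments, votes):
    """Induce a directed, labelled user-to-user network from Q&A activity.

    questions: {question_id: asker}
    answers:   {answer_id: (answerer, question_id)}
    comments:  [(commenter, answer_id), ...]
    votes:     [(voter, answer_id), ...]   # best-answer votes
    Returns a Counter mapping (source_user, target_user, label) -> count.
    """
    edges = Counter()
    for answerer, qid in answers.values():
        edges[(answerer, questions[qid], "answer to")] += 1
    for commenter, aid in comments:
        edges[(commenter, answers[aid][0], "comment to")] += 1
    for voter, aid in votes:
        edges[(voter, answers[aid][0], "vote on")] += 1
    return edges


net = build_user_network(
    questions={"q1": "alice"},
    answers={"a1": ("bob", "q1"), "a2": ("carol", "q1")},
    comments=[("alice", "a1")],
    votes=[("alice", "a2"), ("dave", "a2")],
)
for (src, dst, label), n in sorted(net.items()):
    print(f"{src} -[{label} x{n}]-> {dst}")
```

Keeping the label in the edge key lets the same pair of users carry separate answer, comment and vote ties, which matters when asking whether votes follow social ties.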
Which Answer to Vote On?
Different ‘best answer’ connotations
The notion of the ‘best answer’ thus depends on the context and the nature of the answers, ranging from correctness and usefulness to entertainment value
Social bias
Assignment of votes may be influenced by social and personal ties, the voter’s perception, familiarity, and preferential treatment of familiar community members
“Microsoft or Apple? Feel free to argue and point out their good and bad points. Also feel free to rebut or debate on other people's standpoint. Best argument/answer will get my friends’ and my "best answer" reward.”
Self-promotion
Individuals’ aspirations to raise their social status can adversely affect the quality of their contributions to the community.
Reliability as Conformity?
Reliability of a voter
Relative reliability of two voters is determined by the proportion of all the voters who made the same choice of the best answer:
The reliability scores are a fixed point of the function F; the Brouwer Fixed Point Theorem guarantees that such a fixed point exists.
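The slide’s formula for F did not survive extraction, so the sketch below uses one illustrative conformity-based F, an assumption rather than the model in the talk: a voter’s reliability is the average, over the questions they voted on, of the reliability-weighted share of voters who chose the same answer, and the map is iterated until it stops changing. Brouwer guarantees that a fixed point exists, not that plain iteration reaches it, so the loop is capped at `max_iter`. Function names are hypothetical.

```python
def fixed_point_reliability(ballots, tol=1e-9, max_iter=1000):
    """Illustrative conformity-based voter reliability (an assumed F).

    ballots: {question_id: {voter: chosen_answer}}
    support_q(a) = (sum of r_v over voters v who chose a on q)
                   / (sum of r_v over all voters on q)
    r_v          = mean of support_q(choice of v) over v's questions
    Starting from r_v = 1, apply this map until the largest change < tol.
    """
    voters = {v for votes in ballots.values() for v in votes}
    r = {v: 1.0 for v in voters}
    for _ in range(max_iter):
        total_support = {v: 0.0 for v in voters}
        n_questions = {v: 0 for v in voters}
        for votes in ballots.values():
            mass = {}
            for v, a in votes.items():
                mass[a] = mass.get(a, 0.0) + r[v]
            total = sum(mass.values())  # > 0 because every r_v stays > 0
            for v, a in votes.items():
                total_support[v] += mass[a] / total
                n_questions[v] += 1
        new_r = {v: total_support[v] / n_questions[v] for v in voters}
        if max(abs(new_r[v] - r[v]) for v in voters) < tol:
            return new_r
        r = new_r
    return r


def weighted_best_answers(ballots, r):
    """Best answer per question by reliability-weighted votes (ties: smallest label)."""
    best = {}
    for q, votes in ballots.items():
        score = {}
        for v, a in votes.items():
            score[a] = score.get(a, 0.0) + r[v]
        best[q] = max(sorted(score), key=score.get)
    return best


ballots = {
    "q1": {"x": "a", "y": "a", "z": "b"},
    "q2": {"x": "c", "y": "c", "z": "c"},
}
r = fixed_point_reliability(ballots)
print(r, weighted_best_answers(ballots, r))
```

Here the dissenting voter `z` ends up with lower reliability than `x` and `y`, which is the "reliability as conformity" behaviour the slide questions.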
Real Data Analysis
[Figure: Vote Count vs. FP (fixed-point) Method, for the ‘FUN’ and ‘PHILOSOPHY’ categories]
Random Voting
Simulate random voting by using a uniform distribution in place of Zipf’s law
We vary the percentage of affected questions (from 1% to 10%) and the percentage of voters who vote randomly (from 1% to 10%).
The number of changed best answers is lower for fixed-point scoring (right) than for plurality voting (left)
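A sketch of the random-voting experiment for the plurality baseline only, under stated assumptions: each question has `n_answers` answers whose honest votes follow Zipf-like weights 1/rank, and on a fraction of questions a fraction of the voters are replaced by voters choosing uniformly at random. All parameter names and defaults are hypothetical; a reliability-weighted winner could be swapped in for `plurality_winner` to compare the two schemes.

```python
import random


def plurality_winner(votes, n_answers):
    """Index of the answer with most votes; ties go to the lowest index."""
    counts = [0] * n_answers
    for a in votes:
        counts[a] += 1
    return max(range(n_answers), key=counts.__getitem__)


def simulate_random_voting(n_questions=1000, n_answers=5, n_voters=100,
                           frac_questions=0.05, frac_random=0.05, seed=0):
    """Count questions whose plurality best answer changes under random voting.

    Assumed model: honest votes over a question's answers follow Zipf-like
    weights 1/rank; on each affected question, a fraction of the voters
    is replaced by voters choosing uniformly at random.
    """
    rng = random.Random(seed)
    answers = range(n_answers)
    zipf = [1.0 / rank for rank in range(1, n_answers + 1)]
    changed = 0
    for _ in range(n_questions):
        honest = rng.choices(answers, weights=zipf, k=n_voters)
        if rng.random() >= frac_questions:
            continue  # this question is not affected
        perturbed = list(honest)
        for i in rng.sample(range(n_voters), round(frac_random * n_voters)):
            perturbed[i] = rng.randrange(n_answers)  # uniform random vote
        if plurality_winner(perturbed, n_answers) != plurality_winner(honest, n_answers):
            changed += 1
    return changed


for pct in (1, 5, 10):
    print(pct, simulate_random_voting(frac_questions=pct / 100, frac_random=pct / 100))
```

A fixed seed makes each run reproducible, so different scoring rules can be compared on identical simulated ballots.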
Ballot Stuffing
Simulate collusion: fix the number of colluding voters (‘stuffers’, here 4 and 10) and the percentage of questions affected (here 50%)
Both majority voting and fixed-point scoring are susceptible to ballot stuffing
Fixed-point scoring flags the outliers and helps identify collusion
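To make the stuffing scenario concrete, the sketch below injects colluding votes and shows plurality winners flipping. For detection it swaps in a deliberately simple pairwise co-voting check (flag voter pairs who agree on nearly all shared questions) instead of the fixed-point scoring described above; the thresholds are hypothetical, and honest voters with very similar tastes could be flagged too.

```python
from collections import Counter
from itertools import combinations


def plurality(votes):
    """votes: {voter: answer}. Most-voted answer; ties -> smallest label."""
    counts = Counter(votes.values())
    return max(sorted(counts), key=counts.__getitem__)


def stuff_ballots(ballots, stuffers, targets):
    """Copy of ballots in which every stuffer votes for targets[q] on question q."""
    stuffed = {q: dict(votes) for q, votes in ballots.items()}
    for q, target in targets.items():
        for s in stuffers:
            stuffed[q][s] = target
    return stuffed


def suspicious_pairs(ballots, min_shared=3, min_agreement=0.9):
    """Voter pairs that share >= min_shared questions and agree on most of them."""
    shared, agreed = Counter(), Counter()
    for votes in ballots.values():
        for u, v in combinations(sorted(votes), 2):
            shared[(u, v)] += 1
            agreed[(u, v)] += votes[u] == votes[v]
    return sorted(p for p, n in shared.items()
                  if n >= min_shared and agreed[p] / n >= min_agreement)


honest = {
    "q1": {"h1": "a", "h2": "a", "h3": "b"},
    "q2": {"h1": "c", "h2": "d", "h3": "c"},
    "q3": {"h1": "e", "h2": "f", "h3": "f"},
}
stuffed = stuff_ballots(honest, ["s1", "s2"], {"q1": "b", "q2": "d", "q3": "e"})
print({q: (plurality(honest[q]), plurality(stuffed[q])) for q in honest})
print(suspicious_pairs(stuffed))
```

Two stuffers are enough to flip all three plurality winners here, while the pairwise check singles out only the colluding pair.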
Detecting Sybil Attacks: Leveraging Social Networks
• Social networks are fast mixing
– Random walks quickly converge to the stationary distribution
• Sybil attacks induce a bottleneck cut
– Fast mixing is disrupted
• Knowledge of an a priori honest node
– Breaks symmetry
[Figure: Honest Nodes and Sybil Nodes, joined by Attack edges]
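The three properties above suggest propagating trust from a known honest node. A sketch of early-terminated random-walk trust propagation in the spirit of SybilRank-style methods, named plainly as a swapped-in technique rather than the one presented in the talk: trust spreads from the seed for about log2(n) steps, too few to leak far across the sparse attack-edge cut, and is then normalised by degree. Function and variable names are hypothetical.

```python
import math


def early_terminated_trust(graph, seeds, n_steps=None):
    """Propagate trust from trusted seeds by a short random-walk power iteration.

    graph: {node: set of neighbours}, undirected (both directions listed)
    seeds: nodes known a priori to be honest
    Returns degree-normalised trust; low values suggest possible Sybils.
    """
    nodes = list(graph)
    if n_steps is None:
        n_steps = max(1, math.ceil(math.log2(len(nodes))))  # stop before mixing
    trust = {v: 0.0 for v in nodes}
    for s in seeds:
        trust[s] = 1.0 / len(seeds)
    for _ in range(n_steps):
        nxt = {v: 0.0 for v in nodes}
        for u in nodes:
            for w in graph[u]:
                nxt[w] += trust[u] / len(graph[u])  # walk to a random neighbour
        trust = nxt
    return {v: (trust[v] / len(graph[v]) if graph[v] else 0.0) for v in nodes}


def clique(names):
    return {a: {b for b in names if b != a} for a in names}


honest_nodes = [f"h{i}" for i in range(5)]
sybil_nodes = [f"s{i}" for i in range(5)]
g = {**clique(honest_nodes), **clique(sybil_nodes)}
g["h4"].add("s0")  # the single attack edge
g["s0"].add("h4")
scores = early_terminated_trust(g, seeds=["h0"])
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

If the walk ran to convergence, trust would approach the stationary distribution and the honest/Sybil gap would shrink; stopping early is what exploits the bottleneck cut.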
LEVERAGING COMMUNITIES OF PRACTICE
‘Social’ as a facilitator of research
Issue: the Scale and the Limitations of Humans
We require user input to inform the design of our systems and to verify our hypotheses
In search, we build test collections:
– A set of topics, a corpus of documents, and relevance judgements for documents in the corpus
Question: how do we build test collections for books?
– Search over Web pages involves a low cost of inspecting individual Web pages
– Search over book collections increases the cost, due to the size of books and the coherence of topics across pages
[Figure: Web scenario vs. Book scenario]
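A test collection pairs topics and a corpus with relevance judgements. A minimal sketch of that structure plus precision at k, assuming graded relevance stored as integers and unjudged items counted as non-relevant (a common convention in test-collection evaluation); the class, method names and example topic are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class EvalCollection:
    """Topics + corpus + relevance judgements (qrels)."""
    topics: dict          # topic_id -> topic statement
    corpus: set           # ids of judgeable items (books or pages)
    qrels: dict = field(default_factory=dict)  # topic_id -> {item_id: grade}

    def judge(self, topic_id, item_id, grade):
        """Record a relevance grade (0 = not relevant) for one item."""
        if topic_id not in self.topics or item_id not in self.corpus:
            raise KeyError((topic_id, item_id))
        self.qrels.setdefault(topic_id, {})[item_id] = grade

    def precision_at_k(self, topic_id, ranking, k):
        """Share of the top-k ranked items judged relevant; unjudged count as 0."""
        judged = self.qrels.get(topic_id, {})
        return sum(judged.get(item, 0) > 0 for item in ranking[:k]) / k


books = EvalCollection(topics={"t1": "navigation at sea"},
                       corpus={"b1", "b2", "b3", "b4"})
books.judge("t1", "b1", 2)
books.judge("t1", "b3", 0)
print(books.precision_at_k("t1", ["b1", "b2", "b3"], k=2))
```

For books, the same structure could hold judgements at book level and page level by using page ids such as `(book_id, page_no)` in the corpus.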
Read’n Play
Architecture comprises four functional layers:
– DATA STORE AND SEARCHABLE INDEX: Image Database (scanned document pages), OCR Text Database, Text and Metadata Index
– SEARCH AND NAVIGATION SUPPORT
– USER ANNOTATIONS
– SOCIAL GAME SUPPORT
Implemented using Web services, with no client-based interaction with the content
Can be repurposed for other research projects
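The four-layer design can be sketched as four objects, each built on the one below. This is a toy in-memory illustration with hypothetical class and method names, not the Read’n Play code, which is exposed through Web services.

```python
class DataStore:
    """Layer: page images, OCR text and a simple inverted index."""
    def __init__(self):
        self.page_images = {}   # (book_id, page_no) -> image bytes
        self.ocr_text = {}      # (book_id, page_no) -> OCR text
        self.index = {}         # lower-cased term -> set of (book_id, page_no)

    def add_page(self, book_id, page_no, image, text):
        key = (book_id, page_no)
        self.page_images[key] = image
        self.ocr_text[key] = text
        for term in text.lower().split():
            self.index.setdefault(term, set()).add(key)


class SearchNavigation:
    """Layer: term search and page-to-page navigation over the store."""
    def __init__(self, store):
        self.store = store

    def search(self, term):
        return sorted(self.store.index.get(term.lower(), set()))

    def next_page(self, book_id, page_no):
        key = (book_id, page_no + 1)
        return key if key in self.store.ocr_text else None


class UserAnnotations:
    """Layer: per-user relevance labels on pages."""
    def __init__(self, nav):
        self.nav = nav
        self.labels = {}        # (user, (book_id, page_no)) -> bool

    def label(self, user, page, relevant):
        self.labels[(user, page)] = relevant


class SocialGameSupport:
    """Layer: turns annotations into game activity (here, a judgement count)."""
    def __init__(self, annotations):
        self.annotations = annotations
        self.judgements = {}    # user -> number of judgements made

    def judge(self, user, page, relevant):
        self.annotations.label(user, page, relevant)
        self.judgements[user] = self.judgements.get(user, 0) + 1


store = DataStore()
store.add_page("book1", 1, b"", "navigation at sea")
store.add_page("book1", 2, b"", "chronometers and clocks")
game = SocialGameSupport(UserAnnotations(SearchNavigation(store)))
hits = game.annotations.nav.search("Navigation")
game.judge("ann", hits[0], True)
```

Because each layer only talks to the one beneath it, the game layer could be replaced while the store, search and annotation layers are reused, in line with the slide’s point that the platform can be repurposed.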
Social game
• Explorers: reward for finding relevant content
• Reviewers: reward for finding mistakes in explorers’ work
• Reward for re-assessment (agreement is not necessary)
• Conflicts: penalty
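One way to turn the game rules into scores, as a sketch under assumptions: the slide names the rewards and the penalty but gives no point values, so the constants below are hypothetical, and charging the conflict penalty to both the explorer and the reviewer is our reading rather than the documented rule.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical point values: the slide names the rewards and the penalty
# but gives no numbers.
REWARD_RELEVANT = 3    # explorer finds relevant content
REWARD_MISTAKE = 2     # reviewer finds a mistake in an explorer's judgement
REWARD_REASSESS = 1    # explorer re-assesses a disputed judgement
PENALTY_CONFLICT = -2  # disagreement persists after re-assessment


@dataclass
class Judgement:
    explorer: str
    explorer_label: bool                      # True = judged relevant
    reviewer: Optional[str] = None
    reviewer_label: Optional[bool] = None
    reassessed_label: Optional[bool] = None   # explorer's label after review


def game_scores(judgements):
    points = {}

    def add(user, delta):
        points[user] = points.get(user, 0) + delta

    for j in judgements:
        if j.explorer_label:
            add(j.explorer, REWARD_RELEVANT)
        if j.reviewer is None or j.reviewer_label is None:
            continue  # not reviewed
        if j.reviewer_label != j.explorer_label:
            add(j.reviewer, REWARD_MISTAKE)
            if j.reassessed_label is not None:
                add(j.explorer, REWARD_REASSESS)  # paid whether or not they agree
                if j.reassessed_label != j.reviewer_label:
                    # Assumption: an unresolved conflict penalises both sides.
                    add(j.explorer, PENALTY_CONFLICT)
                    add(j.reviewer, PENALTY_CONFLICT)
    return points


print(game_scores([
    Judgement("ann", True),
    Judgement("ann", True, "rev", False, True),
    Judgement("bob", False, "rev", True, True),
]))  # {'ann': 5, 'rev': 2, 'bob': 1}
```

Paying for re-assessment regardless of the outcome mirrors the slide’s "agreement is not necessary", so explorers are not pushed to simply concede to reviewers.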
Pilot Study
Participants
– Open to everyone
– 48 registered + 81 INEX participants
– 17 contributed assessments (16 INEX participants)
Collected data
– Relevance assessments: 3,478 judged books with 23,098 judged pages, from 29 topics
– Log data: 32,112 navigational events, 45,126 judgement events, 2,970 ‘search inside a book’ events
Incentives for participation
• Tangible, e.g., monetary
– Winners: Microsoft hardware and software
– All: access to collected data
• Intangible reward, e.g., fun, social gain
– Leader board: social status
Feasibility
Averages across the 17 assessors
– 7.2 days with activity, out of 42
– 11.4 hours judging time
– 220 judged books
Average effort
– 7.3 minutes per relevant book, 2.7 minutes per irrelevant book (comparable to the INEX 2003 ad hoc track)
– 37 seconds per relevant page, 22 seconds per irrelevant page
Extrapolated statistics
– 1000 books take 52.7 hours, at a 1 : 9 ratio of relevant to irrelevant books
– 33.3 days to judge one topic, at 95 minutes a day
– 70 topics, with 200 books per topic and 20 judges, take 36.9 days
– 737 judges are needed to complete the task in one hour
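The extrapolated figures follow from the per-book averages. A short check, assuming the 1 : 9 relevant-to-irrelevant ratio holds in every batch of books; the constant and function names are ours. With these inputs the 20-judge case works out to about 36.9 hours of judging per judge, which matches the slide’s 36.9 days if each judge works one hour a day.

```python
MIN_PER_RELEVANT_BOOK = 7.3
MIN_PER_IRRELEVANT_BOOK = 2.7
RELEVANT_SHARE = 1 / 10          # 1 : 9 relevant : irrelevant


def judging_minutes(n_books):
    """Expected minutes to judge n_books at the pilot's average effort."""
    n_relevant = n_books * RELEVANT_SHARE
    return (n_relevant * MIN_PER_RELEVANT_BOOK
            + (n_books - n_relevant) * MIN_PER_IRRELEVANT_BOOK)


one_topic = judging_minutes(1000)                                     # about 3160 minutes
print(round(one_topic / 60, 1), "hours")                              # 52.7 hours
print(round(one_topic / 95, 1), "days at 95 min/day")                 # 33.3 days at 95 min/day
all_topics = 70 * judging_minutes(200)                                # 70 topics x 200 books
print(round(all_topics / 20 / 60, 1), "hours per judge, 20 judges")   # 36.9 hours per judge, 20 judges
print(round(all_topics / 60), "judges to finish in one hour")         # 737 judges to finish in one hour
```

The 737 figure is simply the total judging time in hours (about 737.3) spread one hour per judge.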
Productivity Games
Summary
Understanding social media requires a cross-disciplinary approach and new methods of study
Defining the characteristics and metrics of ‘healthy communities’ is a challenging task.
‘Social’ is increasingly an enabler of large-scale experiments
Generally, we need to be reflective about the methods and approaches we take when studying online communities.
Thank you
Microsoft Research Cambridge
https://research.microsoft.com/