Applications of Voting Theory to Information Mashups
ICSC 2008 Julia Grace, IBM Almaden Research
Alfredo Alba, Varun Bhagwan, Julia Grace, Daniel Gruhl, Kevin Haas,
Meenakshi Nagarajan, Jan Pieper, Christine Robson, Nachiketa Sahoo
Overview
• BBC approached IBM in 2007
  – Goal: Create a better music chart that is more reflective of current tastes and trends in popular music
• Billboard charts are no longer relevant
  – Do not reflect music listened to and purchased online
• Looked to online music communities for data
  – Page views, music listens, blog posts
• We needed a way of combining these sources
  – Y.A.M.? (Yet another mashup?)
Overview
• Traditional Mashup
  – Google Maps + Craigslist
  – Music Mashups
    • Interweaving 2 tracks
  – Always same modalities
    • Similar, homogeneous data sets
  Combine “like” data by simple summation
• Information Mashup
  – Data from disparate online music communities
  – Different modalities [views, listens, posts]
Overview
• New means of combining/mashing our data
  – New methodology for mashups
• Our Approach
  – Voting theory
    • Think of our data sources as constituents in an election
Music Mashup
• End Goal: Gauge Popularity
• Challenging
  – Diverse data silos
  – Different sites have different demographics and user bases
  – Data volumes vary widely
    • MySpace: 13,697,565
    • Bebo: 10,194
  – Data itself comes in different flavors
How do you represent the “voices” of each of these music communities in a single Top-10 list?
Voting Theory
• Voting Systems
  – Designed to combine many “voices” into a single decision that is representative of all communities
Different voting systems have different priorities, resulting in different outcomes
• You have to choose the voting system that is right for your circumstances
• We are not going to invent a new voting system
  – Examine several well-known systems
Example: US Presidential Election
• US Presidential Election uses a Delegate System
  – Designed so that states with larger populations do not always drastically sway elections
  – This methodology was chosen because that was the priority at the time it was implemented
• Bush vs. Gore 2000 Presidential Election
How to Choose a Voting System?
• Voting theory: how “good” a voting system is varies from person to person and situation to situation
• A metric is needed to gauge the quality of a voting system for your circumstances
How to Choose a Voting System?
• Example:
  – Delegate System in United States
  – Equal voice for each state by population was the priority
• How to evaluate the quality of a Top-10 list?
  – Ideally we would create lists and perform a massive user study to determine the best voting system
  – This is not feasible and does not scale
  – So we need some heuristics to gauge the quality of our lists
Fortunately, this is a solved problem… Voting theory employs a “Social Welfare Function” to gauge the quality of a voting system
Social Welfare Functions
• What is a Social Welfare Function? Definition:
  “Mathematical means to quantify the attributes that you prioritize in a voting system (i.e. all communities have a voice, most popular candidate wins)”
• Simple example
  – Situation: People only care if their first choice wins the election
  – Resulting Social Welfare Function: Measure how many people had their first choice picked
• We will use a Social Welfare Function to determine the best voting system for combining the data in our mashup into our Top-10 list of music
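The "simple example" above can be sketched in a few lines of Python. This is only an illustration: the function name `first_choice_welfare`, the ballots, and "Artist X" are hypothetical and do not come from the talk.

```python
def first_choice_welfare(ballots, winner):
    """Fraction of voters whose first choice is the elected winner."""
    if not ballots:
        return 0.0
    satisfied = sum(1 for ballot in ballots if ballot and ballot[0] == winner)
    return satisfied / len(ballots)

# Hypothetical ballots: each list is one voter's ranking, best first.
ballots = [
    ["Rihanna", "Coldplay", "Artist X"],
    ["Coldplay", "Rihanna", "Artist X"],
    ["Rihanna", "Artist X", "Coldplay"],
    ["Artist X", "Rihanna", "Coldplay"],
]

print(first_choice_welfare(ballots, "Rihanna"))   # 0.5
print(first_choice_welfare(ballots, "Coldplay"))  # 0.25
```

A higher score means more voters got what this particular function cares about; a different priority would call for a different function.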
Well-established Social Welfare Functions
• Spearman Footrule: Christine
  – Type A personality
  – Preservation of position in the rankings
  – Entire ranking reflected accurately
    • Middle-range artists should be in the middle, low-range towards the end, etc.
  – For example:
    • Christine ranked Coldplay #2, so she will be happy if Coldplay is #2 in the final list
• Precision Optimal Aggregation: Julia
  – Representation (not rank)
  – For example:
    • Julia had Rihanna in her list, so she would like Rihanna to be in the final list
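A minimal Python sketch of the two viewpoints, under stated assumptions: the footrule is computed only over the artists in the source list, an artist absent from the final list is assumed to sit one position past its end, and "precision" is taken to mean the share of a source's artists that appear in the final list. The function names, the sample lists, and "Artist A"/"Artist B" are hypothetical, and the exact formulas used in the talk may differ.

```python
def footrule_distance(source, final):
    """Sum of absolute position differences (Christine's view); lower is better."""
    missing = len(final) + 1  # assumed position for artists absent from final
    final_pos = {artist: i for i, artist in enumerate(final, start=1)}
    return sum(
        abs(i - final_pos.get(artist, missing))
        for i, artist in enumerate(source, start=1)
    )

def precision(source, final):
    """Share of the source's artists present in the final list (Julia's view)."""
    if not source:
        return 0.0
    return len(set(source) & set(final)) / len(source)

# Hypothetical lists, best first.
christine = ["Artist A", "Coldplay", "Rihanna"]
final = ["Rihanna", "Coldplay", "Artist B"]

print(footrule_distance(christine, final))  # 5  (3 + 0 + 2)
print(precision(christine, final))          # 2 of 3 artists are represented
```

Coldplay keeps position #2 and so adds nothing to the footrule distance, while precision ignores positions entirely and only asks whether an artist made the list.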
Voting Systems
• We evaluated 8 well-established voting systems
• Important to keep in mind
  – We are not electing a single candidate, we are creating a rank-ordered list
  – Position of artists matters just as much as who is #1
Voting Systems
• Total vote (i.e. election by popular vote)
  – Tally counts, listens, etc. regardless of “type” of data
  – Easy to understand, very transparent
  – Modalities with very large amounts of data tend to dominate the vote
• Weighted votes
  – Use a multiplier so that postings count more than listens, delegate, count rank
• Semi-Proportional
  – Each source gets the same number of votes regardless of how many people vote
• Delegates
  – Each source gets a set number of votes, decided in advance
• Simple Rank (Nauru)
  – Every candidate gets a position vote; the candidate with the smallest number is the winner
• Inverse Rank
  – Close to Rank, except use 1/number and the biggest number wins; gives more weight to being close to the top of the list
• Run-off
  – When ½ the sources agree on a candidate, that candidate is elected
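To make the count-based systems concrete, here is a rough Python sketch of Total vote, Weighted votes, and Semi-Proportional. It assumes that "the same number of votes" per source means scaling each source's counts so they sum to 1. The source names, counts, and weights below are invented for illustration and are not data from the project.

```python
from collections import defaultdict

def rank_by_score(scores):
    """Order artists by descending score, breaking ties alphabetically."""
    return sorted(scores, key=lambda artist: (-scores[artist], artist))

def total_vote(sources, weights=None):
    """Sum raw counts across sources; optional per-source multipliers."""
    weights = weights or {}
    scores = defaultdict(float)
    for name, counts in sources.items():
        w = weights.get(name, 1.0)
        for artist, n in counts.items():
            scores[artist] += w * n
    return rank_by_score(scores)

def semi_proportional(sources):
    """Give every source one vote in total, split by its own counts."""
    scores = defaultdict(float)
    for counts in sources.values():
        source_total = sum(counts.values())
        if source_total == 0:
            continue  # a source with no data casts no vote
        for artist, n in counts.items():
            scores[artist] += n / source_total
    return rank_by_score(scores)

# Hypothetical counts: one very large source and two small ones.
sources = {
    "video_site": {"Rihanna": 900_000, "Coldplay": 100_000},
    "blog_site": {"Rihanna": 10, "Coldplay": 90},
    "radio_site": {"Rihanna": 40, "Coldplay": 60},
}

print(total_vote(sources))         # ['Rihanna', 'Coldplay']
print(semi_proportional(sources))  # ['Coldplay', 'Rihanna']
```

With raw sums the large source decides the outcome on its own; once every source carries equal weight, the two smaller communities outvote it. Passing `weights={"blog_site": 20000}` to `total_vote` is one way to mimic a multiplier for postings.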
Election Setup
1. Data preparation: Crawled, extracted, cleaned, mined, analyzed…
2. Applied a voting system
   • Total Vote, Nauru, Run-off, etc.
   • Output: Top-10 list of popular artists
3. Tested Top-10 list against SWF
   1. Is Julia happy?
   2. Is Christine happy?
Total Votes
• Total Votes: simple summing
[Figure: Top-10 list (#1 to #10) under Total Votes. Key: Precision Optimal Aggregation SWF; Spearman Footrule SWF; contribution to the combined ranking for each artist from each source]
• YouTube and Bebo are nearly sole contributors to Rihanna being #1
• Explanation: YouTube dominates all other music communities; it was coincidental that Bebo was also able to contribute to the rankings
• YouTube video view counts are so high they dominate all other communities
Nauru
• Election system used on the Pacific Island nation of Nauru
[Figure: Top-10 list (#1 to #10) under Nauru. Key: Precision Optimal Aggregation SWF; Spearman Footrule SWF; contribution to the combined ranking for each artist from each source]
• Significantly more even distribution of sources
• Nauru maximized the Precision Optimal Aggregation SWF
• All communities contribute!
Run-off
• From the top, select artists one at a time from each source in a fixed order
[Figure: Top-10 list (#1 to #10) under Run-off. Key: Precision Optimal Aggregation SWF; Spearman Footrule SWF; contribution to the combined ranking for each artist from each source]
• Significantly more even distribution of sources
• Run-off maximized the Spearman Footrule SWF
• All communities contribute!
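One possible reading of "select artists one at a time from each source in a fixed order" is a round-robin pick, sketched below in Python. This is an interpretation, not the authors' implementation; the function name `round_robin_select`, the choice to skip artists already picked, the site names, and the "Artist X/Y/Z" entries are all assumptions.

```python
def round_robin_select(rankings, order, k=10):
    """Visit sources in a fixed order, taking each one's best unpicked artist."""
    final = []
    remaining = {name: list(rankings[name]) for name in order}
    while len(final) < k and any(remaining.values()):
        for name in order:
            queue = remaining[name]
            # Skip artists that an earlier source already placed in the list.
            while queue and queue[0] in final:
                queue.pop(0)
            if queue and len(final) < k:
                final.append(queue.pop(0))
    return final

# Hypothetical per-source rankings, best first.
rankings = {
    "site_a": ["Rihanna", "Coldplay", "Artist X"],
    "site_b": ["Coldplay", "Artist Y", "Rihanna"],
    "site_c": ["Artist Z", "Rihanna", "Coldplay"],
}

print(round_robin_select(rankings, ["site_a", "site_b", "site_c"], k=4))
# ['Rihanna', 'Coldplay', 'Artist Z', 'Artist X']
```

Each source places its own top pick early in the list, which is consistent with the even distribution of sources noted on the slide, though the result depends on the visiting order.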
http://www.bbc.co.uk/soundindex/
Lessons Learned
• Choosing a voting methodology depends on what you prioritize
• Think hard about your Social Welfare Function
  – Deciding factor in how to combine data
  – How you measure the success of your mashup
Conclusion
• Novel approach to mashups
• We feel this is the future of information mashups from different modalities
Thank you
• Any questions?
• Julia Grace ([email protected])