Crowdsourcing Insights with Opinion Space
Ken Goldberg, IEOR, School of Information, EECS, UC Berkeley
“We’re moving from an Information Age to an Opinion Age.” – Warren Sack, UCSC
Motivation
Goals
• Engage community
• Understand community
  – Solicit input
  – Understand the distribution of viewpoints
  – Discover insightful comments

Goals of Community Members
• Understand relationship to other community members
• Participate, express ideas, and be heard
• Encounter a diversity of viewpoints
Motivation
Classic approach: surveys, polls
Drawbacks: limited samples, slow, doesn’t increase engagement
Modern approach: online forums, comment lists
Drawbacks: data deluge, cyberpolarization, hard to discover insights
Approach: Visualization
Approach: Level the Playing Field
Approach: Wisdom of Crowds
Related Work: Visualization
Clockwise, starting from top left:
Morningside Analytics, MusicBox, Starry Night
Related Work: Politics
Clockwise, starting from top left:
EU Profiler, Poligraph, the “How Progressive Are You?” quiz
Related Work: Opinion Sharing
• Polling & Opinion Mining
  – Fishkin, 1991: deliberative polling
  – Dahlgren, 2005: Internet & the public sphere
  – Berinsky, 1999: understanding public opinion
  – Pang & Lee, 2008: sentiment analysis
• Increasing Participation
  – Bishop, 2007: theoretical framework
  – Brandtzaeg & Heim: user study
  – Ludford et al., 2004: uniqueness & group dissimilarity
Related Work: Info Filtering
• K. Goldberg et al., 2001: Eigentaste
• E. Bitton, 2009: spatial model
• Polikar, 2006: ensemble learning
Opinion Space: Live Demonstration
Six 50-minute Learning Object Modules, preparation materials, slides for in-class lectures, discussion ideas, hands-on activities, and homework assignments.
To try it: google “opinion space”
Contact us: http://goldberg.berkeley.edu
Dimensionality Reduction
[Figure: the same 2D point cloud projected onto a low-variance direction vs. the maximal-variance direction]
Dimensionality Reduction
Principal Component Analysis (PCA)
• Assumes independence and linearity
• Minimizes squared error
• Scalable: compute position of new user in constant time
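A minimal sketch of this projection step, assuming a users × propositions ratings matrix; the data here are randomly generated and the variable names are illustrative. It shows why placing a new user is constant time: once the top-2 principal directions are fixed, a new position is one small matrix-vector product.

```python
import numpy as np

# Illustrative ratings matrix: n users x k propositions (values are made up).
rng = np.random.default_rng(0)
ratings = rng.uniform(-1.0, 1.0, size=(500, 20))

# Center the data and take the top-2 principal directions via SVD.
mean = ratings.mean(axis=0)
centered = ratings - mean
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
components = Vt[:2]                      # 2 x k projection matrix

# 2D map coordinates for existing users (minimizes squared reconstruction error).
positions = centered @ components.T

# A new user is placed in constant time: one k-dimensional projection.
new_user = rng.uniform(-1.0, 1.0, size=20)
new_position = (new_user - mean) @ components.T
```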
Canonical Correlation Analysis
• 2-view PCA
• Assume:
  – Each data point has a latent low-dimensional canonical representation z
  – Observe two different representations of each data point (e.g. numerical ratings and text)
• Learn MLEs for low-rank projections A and B
• Equivalently, pick the projection that maximizes correlation between views
[Figure: graphical model for CCA, with latent z generating the two observed views x and y]
x = Az + ε,  y = Bz + ε
z = A⁻¹x = B⁻¹y
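A minimal sketch of the two-view model on synthetic data, using scikit-learn's CCA. The view dimensions (10 and 50), the noise level, and the random A and B used to generate the data are assumptions made only for illustration.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)

# Latent 2D canonical representation z for 300 synthetic data points.
z = rng.normal(size=(300, 2))

# Two observed views generated from z plus noise, mirroring x = Az + eps, y = Bz + eps
# (A and B here are random matrices chosen only to create the two views).
A = rng.normal(size=(2, 10))
B = rng.normal(size=(2, 50))
x = z @ A + 0.1 * rng.normal(size=(300, 10))
y = z @ B + 0.1 * rng.normal(size=(300, 50))

# CCA learns projections of each view that maximize the correlation between them,
# recovering a shared low-dimensional representation.
cca = CCA(n_components=2)
x_scores, y_scores = cca.fit_transform(x, y)   # canonical coordinates per view
```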
CCA on Opinion Space
• Each user is a data point
  – x_i = user i's responses to propositions
  – y_i = vector representation of textual comment
• Run CCA to find A and B; use A⁻¹ to find the 2D representation
• Position of users reflects both the rating vector and the textual response
• Ignores ratings that are not correlated with text, and vice versa
• Given text, can predict ratings (using B)
[Figure: graphical model for CCA, repeated from the previous slide]
x = Az + ε,  y = Bz + ε
z = A⁻¹x = B⁻¹y
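A hypothetical sketch of how the two views could be assembled: proposition ratings as one view and a TF-IDF vector of each comment as the other. All ratings and comment texts below are invented, and TF-IDF is just one possible choice of vector representation; the slides do not specify the text featurization.

```python
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical Opinion Space data: slider responses to 5 propositions (x_i)
# and one free-text comment per user (basis of y_i). All values are invented.
ratings = np.array([
    [0.9, 0.1, 0.8, 0.2, 0.7],
    [0.8, 0.2, 0.9, 0.1, 0.6],
    [0.1, 0.9, 0.2, 0.8, 0.3],
    [0.2, 0.8, 0.1, 0.9, 0.2],
    [0.5, 0.5, 0.6, 0.4, 0.5],
    [0.6, 0.4, 0.5, 0.5, 0.6],
])
comments = [
    "strongly support the proposal",
    "support the proposal with minor reservations",
    "strongly oppose the proposal",
    "oppose the proposal as written",
    "undecided but open to more discussion",
    "need more information before deciding",
]

# y_i: one possible vector representation of the textual comment (TF-IDF).
vectorizer = TfidfVectorizer()
text = vectorizer.fit_transform(comments).toarray()

# Fit CCA with text as X and ratings as Y, so predict() maps text -> ratings.
cca = CCA(n_components=2)
text_scores, rating_scores = cca.fit_transform(text, ratings)

# 2D map coordinates reflecting both views (either view's canonical scores work);
# rating dimensions uncorrelated with the text contribute little, and vice versa.
positions = rating_scores

# Given a new comment, predict the corresponding proposition ratings.
new_comment = vectorizer.transform(["support with some reservations"]).toarray()
predicted_ratings = cca.predict(new_comment)
```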
Multidimensional Scaling
• Goal: rearrange objects in a low-dimensional space so as to reproduce their distances in the higher-dimensional space
• Strategy: rearrange & compare solutions, maximizing goodness of fit:
• Can use any kind of similarity function
• Pros
  – Data need not be normal, relationships need not be linear
  – Tends to yield fewer factors than factor analysis (FA)
• Con: slow, not scalable
Σ_{i,j} (d_ij − f(δ_ij))²
[Figure: dissimilarity δ_ij between points i and j in the original space, and the corresponding distance d_ij in the low-dimensional map]
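A minimal sketch with scikit-learn's MDS on a precomputed dissimilarity matrix, to illustrate that any similarity function can be plugged in; the cosine dissimilarity and the random 20-dimensional data are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(0)
high_dim = rng.normal(size=(100, 20))        # points in a 20-dimensional space

# Any similarity/dissimilarity function can be used; cosine distance is one choice.
delta = pairwise_distances(high_dim, metric="cosine")

# MDS iteratively rearranges points in 2D so that pairwise distances d_ij reproduce
# the given dissimilarities delta_ij (minimizing stress), which is why it is slower
# and less scalable than PCA.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
low_dim = mds.fit_transform(delta)
print(mds.stress_)                            # residual stress: goodness of fit
```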
Kernel-based Nonlinear PCA
• Intuition: in general, can’t linearly separate n points in d < n dim, but can almost always do so in d ≥ n dim
• Method: compute covariance matrix after transforming data into higher dim space
• Kernel trick used to improve complexity
• If Φ is the identity, Kernel PCA = PCA
C = (1/m) Σ_{j=1..m} Φ(x_j) Φ(x_j)ᵀ
Kernel-based Nonlinear PCA
• Pro: good for finding clusters with arbitrary shape
• Cons: need to choose an appropriate kernel (no unique solution); does not preserve distance relationships
[Figure: input data (left) and Kernel PCA output with a Gaussian kernel (right)]
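A minimal sketch contrasting linear PCA with Kernel PCA on concentric circles, a standard example of clusters with arbitrary, non-linearly-separable shape; the RBF kernel and the gamma value are illustrative choices, not prescribed by the slides.

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# Two concentric circles: clusters that no linear projection can separate.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# Linear PCA only rotates the data, so the rings stay intermixed along any axis.
X_pca = PCA(n_components=2).fit_transform(X)

# Kernel PCA with a Gaussian (RBF) kernel implicitly maps the points into a
# higher-dimensional space where the rings become separable; the kernel trick
# avoids ever forming that space explicitly. With a linear kernel (Phi = identity),
# this reduces to ordinary PCA.
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10.0)
X_kpca = kpca.fit_transform(X)
```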
Stochastic Neighbor Embedding
• Converts Euclidean distances to conditional probabilities
• p_{j|i} = Pr(x_i would pick x_j as its neighbor | neighbors picked according to their density under a Gaussian centered at x_i)
• Compute a similar probability q_{j|i} in the lower-dimensional space
• Goal: minimize the mismatch between p_{j|i} and q_{j|i}:
• Cons: tends to crowd points in the center of the map; difficult to optimize
C = Σ_i KL(P_i ‖ Q_i) = Σ_i Σ_j p_{j|i} log(p_{j|i} / q_{j|i})
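A minimal NumPy sketch of the quantities above: the conditional probabilities p_{j|i} and the KL-divergence cost C. It assumes a single shared σ for all points, whereas SNE proper tunes σ_i per point via a perplexity target, and it only evaluates the cost for a given embedding rather than optimizing it.

```python
import numpy as np

def conditional_probs(points, sigma):
    """p_{j|i}: probability that point i would pick point j as its neighbor,
    under a Gaussian centered at point i (self-probabilities are zero)."""
    sq_dists = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    logits = -sq_dists / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)              # exclude j == i
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

def sne_cost(high_dim, low_dim, sigma=1.0):
    """C = sum_i KL(P_i || Q_i) = sum_i sum_j p_{j|i} log(p_{j|i} / q_{j|i})."""
    p = conditional_probs(high_dim, sigma)
    q = conditional_probs(low_dim, sigma=np.sqrt(0.5))  # fixed variance 1/2 in the map
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))     # high-dimensional points
Y = rng.normal(size=(50, 2))      # a candidate low-dimensional embedding
print(sne_cost(X, Y))             # the quantity SNE minimizes by gradient descent
```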
Metavid
Opinion Space: Crowdsourcing Insights
Scalability: n Participants, n Viewpoints, n² Peer-to-Peer Reviews
Viewpoints are k-Dimensional
Dim. Reduction: 2D Map of Affinity/Similarity
Insight vs. Agreement: Nonlinear Scoring
Ken Goldberg, UC Berkeley
Alec Ross, U.S. State Dept

Opinion Space
Wisdom of Crowds: Insights are Rare
Scalable, Self-Organizing, Spatial Interface
Visualize Diversity of Viewpoints
Incorporate Position into Scoring Metrics
Ken Goldberg
UC Berkeley