Political Speech/Text - Detecting bias and...

28
Political Speech/Text Detecting bias and partisanship in political speeches and blogs Julie Medero UW-EE March 1, 2011 Julie Medero (UW-EE) Political Speech/Text March 1, 2011 1 / 19

Transcript of Political Speech/Text - Detecting bias and...

Page 1: Political Speech/Text - Detecting bias and …ssli.ee.washington.edu/courses/ee517/discussTalks/JM...Political Speech/Text Detecting bias and partisanship in political speeches and

Political Speech/TextDetecting bias and partisanship in political speeches and blogs

Julie Medero

UW-EE

March 1, 2011

Julie Medero (UW-EE) Political Speech/Text March 1, 2011 1 / 19

Page 2: Political Speech/Text - Detecting bias and …ssli.ee.washington.edu/courses/ee517/discussTalks/JM...Political Speech/Text Detecting bias and partisanship in political speeches and

Today’s Papers

Feature Selection and Evaluation for Identifying the Content ofPolitical Conflict

Burt Monroe, Michael Colaresi, Kevin Quinn (political scientists)Survey methods of lexical feature selection and evaluationQualitative comparison of output features

Shedding (a Thousand Points of) Light on Biased LanguageTae Yano, Philip Resnik, Noah Smith (computer scientists)Use feature selection to target sentences for human annotationQuantitative analysis of annotation distributions

Julie Medero (UW-EE) Political Speech/Text March 1, 2011 2 / 19

Page 3: Political Speech/Text - Detecting bias and …ssli.ee.washington.edu/courses/ee517/discussTalks/JM...Political Speech/Text Detecting bias and partisanship in political speeches and

Outline

1 Feature Selection and Evaluation Methods

2 ApplicationsToy ExamplesAnalyzing Bias in Blogs

3 Discussion

Julie Medero (UW-EE) Political Speech/Text March 1, 2011 3 / 19

Page 4: Political Speech/Text - Detecting bias and …ssli.ee.washington.edu/courses/ee517/discussTalks/JM...Political Speech/Text Detecting bias and partisanship in political speeches and

Outline

1 Feature Selection and Evaluation Methods

2 ApplicationsToy ExamplesAnalyzing Bias in Blogs

3 Discussion

Julie Medero (UW-EE) Political Speech/Text March 1, 2011 4 / 19

Page 5: Political Speech/Text - Detecting bias and …ssli.ee.washington.edu/courses/ee517/discussTalks/JM...Political Speech/Text Detecting bias and partisanship in political speeches and

Feature Selection and Evaluation

Difference Between Selection and EvaluationFeature Selection Binary, in-or-out DecisionFeature Evaluation Quantify usefulness of features (e.g. weighting)

Task DescriptionIdentifying partisan words in Congressional speeches on a topic

Three Ways to Frame the ProblemClassification TaskNon-Model ApproachesModel Approaches

Julie Medero (UW-EE) Political Speech/Text March 1, 2011 5 / 19

Page 6: Political Speech/Text - Detecting bias and …ssli.ee.washington.edu/courses/ee517/discussTalks/JM...Political Speech/Text Detecting bias and partisanship in political speeches and

Feature Selection and Evaluation

Difference Between Selection and EvaluationFeature Selection Binary, in-or-out DecisionFeature Evaluation Quantify usefulness of features (e.g. weighting)

Task DescriptionIdentifying partisan words in Congressional speeches on a topic

Three Ways to Frame the ProblemClassification TaskNon-Model ApproachesModel Approaches

Julie Medero (UW-EE) Political Speech/Text March 1, 2011 5 / 19

Page 7: Political Speech/Text - Detecting bias and …ssli.ee.washington.edu/courses/ee517/discussTalks/JM...Political Speech/Text Detecting bias and partisanship in political speeches and

Feature Selection and Evaluation

Difference Between Selection and EvaluationFeature Selection Binary, in-or-out DecisionFeature Evaluation Quantify usefulness of features (e.g. weighting)

Task DescriptionIdentifying partisan words in Congressional speeches on a topic

Three Ways to Frame the ProblemClassification TaskNon-Model ApproachesModel Approaches

Julie Medero (UW-EE) Political Speech/Text March 1, 2011 5 / 19

Page 8: Political Speech/Text - Detecting bias and …ssli.ee.washington.edu/courses/ee517/discussTalks/JM...Political Speech/Text Detecting bias and partisanship in political speeches and

Classification Approach to Feature Selection

Approach SummaryUse standard ML tools (SVM, AdaBoost, Random Forests)Find words that predict partisanship: c : w → p

ProblemsPartisanship is not a function of word choice – it’s (if anything) theother way aroundTreating partisanship as an unknown doesn’t make sense – it’sobservable

Julie Medero (UW-EE) Political Speech/Text March 1, 2011 6 / 19

Page 9: Political Speech/Text - Detecting bias and …ssli.ee.washington.edu/courses/ee517/discussTalks/JM...Political Speech/Text Detecting bias and partisanship in political speeches and

Non-Model Approaches

Methods(Normalized) Frequency Difference

H

(Log) Odds Ratio

L S

TF-IDF

S

WordScore

L H (normalized)

Problems

H Over-emphasis on high-frequency wordsStop lists can take out useful words, leave high-frequencynon-function words (e.g. Senate)L Over-emphasis on low-frequency wordsFrequency threshold just pushes problem off to words just overthe thresholdS SmoothingNeed to handle case of zero count in one set

Julie Medero (UW-EE) Political Speech/Text March 1, 2011 7 / 19

Page 10: Political Speech/Text - Detecting bias and …ssli.ee.washington.edu/courses/ee517/discussTalks/JM...Political Speech/Text Detecting bias and partisanship in political speeches and

Non-Model Approaches

Methods(Normalized) Frequency Difference H(Log) Odds Ratio

L S

TF-IDF

S

WordScore

L

H (normalized)

ProblemsH Over-emphasis on high-frequency wordsStop lists can take out useful words, leave high-frequencynon-function words (e.g. Senate)

L Over-emphasis on low-frequency wordsFrequency threshold just pushes problem off to words just overthe thresholdS SmoothingNeed to handle case of zero count in one set

Julie Medero (UW-EE) Political Speech/Text March 1, 2011 7 / 19

Page 11: Political Speech/Text - Detecting bias and …ssli.ee.washington.edu/courses/ee517/discussTalks/JM...Political Speech/Text Detecting bias and partisanship in political speeches and

Non-Model Approaches

Methods(Normalized) Frequency Difference H(Log) Odds Ratio L

S

TF-IDF

S

WordScore L H (normalized)

ProblemsH Over-emphasis on high-frequency wordsStop lists can take out useful words, leave high-frequencynon-function words (e.g. Senate)L Over-emphasis on low-frequency wordsFrequency threshold just pushes problem off to words just overthe threshold

S SmoothingNeed to handle case of zero count in one set

Julie Medero (UW-EE) Political Speech/Text March 1, 2011 7 / 19

Page 12: Political Speech/Text - Detecting bias and …ssli.ee.washington.edu/courses/ee517/discussTalks/JM...Political Speech/Text Detecting bias and partisanship in political speeches and

Non-Model Approaches

Methods(Normalized) Frequency Difference H(Log) Odds Ratio L STF-IDF SWordScore L H (normalized)

ProblemsH Over-emphasis on high-frequency wordsStop lists can take out useful words, leave high-frequencynon-function words (e.g. Senate)L Over-emphasis on low-frequency wordsFrequency threshold just pushes problem off to words just overthe thresholdS SmoothingNeed to handle case of zero count in one set

Julie Medero (UW-EE) Political Speech/Text March 1, 2011 7 / 19

Page 13: Political Speech/Text - Detecting bias and …ssli.ee.washington.edu/courses/ee517/discussTalks/JM...Political Speech/Text Detecting bias and partisanship in political speeches and

Model-Based Approach

Generative model of word choice, P(w |p)

Assume a multinomial distributionML Estimate of odds ratio is the smoothed log odds-ratioBonus: generative model means we can estimate (and scale by)sample variance

Choosing a PriorAn informed prior can help keep function words from beingweighted too heavilyDirichlet prior works well in practice and yields closed-formsolutionsLaplace priors force more parameters to zero, but makescomputation messy.

Julie Medero (UW-EE) Political Speech/Text March 1, 2011 8 / 19

Page 14: Political Speech/Text - Detecting bias and …ssli.ee.washington.edu/courses/ee517/discussTalks/JM...Political Speech/Text Detecting bias and partisanship in political speeches and

Model-Based Approach

Generative model of word choice, P(w |p)

Assume a multinomial distributionML Estimate of odds ratio is the smoothed log odds-ratioBonus: generative model means we can estimate (and scale by)sample variance

Choosing a PriorAn informed prior can help keep function words from beingweighted too heavilyDirichlet prior works well in practice and yields closed-formsolutionsLaplace priors force more parameters to zero, but makescomputation messy.

Julie Medero (UW-EE) Political Speech/Text March 1, 2011 8 / 19

Page 15: Political Speech/Text - Detecting bias and …ssli.ee.washington.edu/courses/ee517/discussTalks/JM...Political Speech/Text Detecting bias and partisanship in political speeches and

Outline

1 Feature Selection and Evaluation Methods

2 ApplicationsToy ExamplesAnalyzing Bias in Blogs

3 Discussion

Julie Medero (UW-EE) Political Speech/Text March 1, 2011 9 / 19

Page 16: Political Speech/Text - Detecting bias and …ssli.ee.washington.edu/courses/ee517/discussTalks/JM...Political Speech/Text Detecting bias and partisanship in political speeches and

Gender Differences in Senate Speeches

Julie Medero (UW-EE) Political Speech/Text March 1, 2011 10 / 19

Page 17: Political Speech/Text - Detecting bias and …ssli.ee.washington.edu/courses/ee517/discussTalks/JM...Political Speech/Text Detecting bias and partisanship in political speeches and

Issue Framing Over Time

Julie Medero (UW-EE) Political Speech/Text March 1, 2011 11 / 19

Page 18: Political Speech/Text - Detecting bias and …ssli.ee.washington.edu/courses/ee517/discussTalks/JM...Political Speech/Text Detecting bias and partisanship in political speeches and

Polarization of Issues Over Time

Julie Medero (UW-EE) Political Speech/Text March 1, 2011 12 / 19

Page 19: Political Speech/Text - Detecting bias and …ssli.ee.washington.edu/courses/ee517/discussTalks/JM...Political Speech/Text Detecting bias and partisanship in political speeches and

Analyzing Bias in Blogs

What is Bias?Tendency or preference towards a particular perspective, ideologyor resultDifferent from subjectivity

Task DescriptionLook at distribution of biased sentences in liberal and conservativeblogsBase labels on human annotationsUse word feature selection to pick sentences likely to have bias

Julie Medero (UW-EE) Political Speech/Text March 1, 2011 13 / 19

Page 20: Political Speech/Text - Detecting bias and …ssli.ee.washington.edu/courses/ee517/discussTalks/JM...Political Speech/Text Detecting bias and partisanship in political speeches and

Analyzing Bias in Blogs

What is Bias?Tendency or preference towards a particular perspective, ideologyor resultDifferent from subjectivity

Task DescriptionLook at distribution of biased sentences in liberal and conservativeblogsBase labels on human annotationsUse word feature selection to pick sentences likely to have bias

Julie Medero (UW-EE) Political Speech/Text March 1, 2011 13 / 19

Page 21: Political Speech/Text - Detecting bias and …ssli.ee.washington.edu/courses/ee517/discussTalks/JM...Political Speech/Text Detecting bias and partisanship in political speeches and

Data Set

SourcesExtracted 260k sentences from 13k blog posts:

Liberal: Digby, Think Progress, Talking Points MemoConservative: American Thinker, Hot Air, Michelle Malkin

Sentence SelectionLimit to 8-40 tokens longChoose sentences from 3 categories:

Partisan bigrams selected using log-likelihood ratio + stop word listPennebaker emotion words (Negative Emotion, Positive Emotion,Causation, Anger)Kill verbs (kill, slaughter, assassinate, shoot, poison, strangle,smother, choke, drown, suffocate, starve)

Julie Medero (UW-EE) Political Speech/Text March 1, 2011 14 / 19

Page 22: Political Speech/Text - Detecting bias and …ssli.ee.washington.edu/courses/ee517/discussTalks/JM...Political Speech/Text Detecting bias and partisanship in political speeches and

Data Set

SourcesExtracted 260k sentences from 13k blog posts:

Liberal: Digby, Think Progress, Talking Points MemoConservative: American Thinker, Hot Air, Michelle Malkin

Sentence SelectionLimit to 8-40 tokens longChoose sentences from 3 categories:

Partisan bigrams selected using log-likelihood ratio + stop word listPennebaker emotion words (Negative Emotion, Positive Emotion,Causation, Anger)Kill verbs (kill, slaughter, assassinate, shoot, poison, strangle,smother, choke, drown, suffocate, starve)

Julie Medero (UW-EE) Political Speech/Text March 1, 2011 14 / 19

Page 23: Political Speech/Text - Detecting bias and …ssli.ee.washington.edu/courses/ee517/discussTalks/JM...Political Speech/Text Detecting bias and partisanship in political speeches and

Annotators

Amazon Mechanical TurkU.S. citizens with 90% task approval ratingAvg. pairwise κ = .55, .50 for 50 most frequent workersOne annotator discarded for low agreementPaid $0.02-$0.04 per sentence (total cost $212)

User Information CollectedWho voted for in 2008 presidential election?Liberal or conservative for social issues?Liberal or conservative for fiscal issues?

Julie Medero (UW-EE) Political Speech/Text March 1, 2011 15 / 19

Page 24: Political Speech/Text - Detecting bias and …ssli.ee.washington.edu/courses/ee517/discussTalks/JM...Political Speech/Text Detecting bias and partisanship in political speeches and

Annotators

Amazon Mechanical TurkU.S. citizens with 90% task approval ratingAvg. pairwise κ = .55, .50 for 50 most frequent workersOne annotator discarded for low agreementPaid $0.02-$0.04 per sentence (total cost $212)

User Information CollectedWho voted for in 2008 presidential election?Liberal or conservative for social issues?Liberal or conservative for fiscal issues?

Julie Medero (UW-EE) Political Speech/Text March 1, 2011 15 / 19

Page 25: Political Speech/Text - Detecting bias and …ssli.ee.washington.edu/courses/ee517/discussTalks/JM...Political Speech/Text Detecting bias and partisanship in political speeches and

Annotation Task

Julie Medero (UW-EE) Political Speech/Text March 1, 2011 16 / 19

Page 26: Political Speech/Text - Detecting bias and …ssli.ee.washington.edu/courses/ee517/discussTalks/JM...Political Speech/Text Detecting bias and partisanship in political speeches and

Results

Overall distribution of Labels

Distribution of Labels by Source

Julie Medero (UW-EE) Political Speech/Text March 1, 2011 17 / 19

Page 27: Political Speech/Text - Detecting bias and …ssli.ee.washington.edu/courses/ee517/discussTalks/JM...Political Speech/Text Detecting bias and partisanship in political speeches and

Outline

1 Feature Selection and Evaluation Methods

2 ApplicationsToy ExamplesAnalyzing Bias in Blogs

3 Discussion

Julie Medero (UW-EE) Political Speech/Text March 1, 2011 18 / 19

Page 28: Political Speech/Text - Detecting bias and …ssli.ee.washington.edu/courses/ee517/discussTalks/JM...Political Speech/Text Detecting bias and partisanship in political speeches and

Discussion

Things that Surprised MeHow new the application of NLP to political texts is – the 2008article puts “text-as-data” in quotes!

QuestionsIs the Bias paper measuring the content of the blogs, what theirsentence-selection features choose, or some combination?What sort of consistency did the Bias study get on the wordevidence question? How did that correlate with theirsentence-selection features?Aren’t you really just sticking your threshold into your priorparameters with the Laplace prior?

Julie Medero (UW-EE) Political Speech/Text March 1, 2011 19 / 19