University of California, Los Angeles · helper.ipam.ucla.edu/publications/caws3/caws3_13063.pdf ·...

Matching Methods for High-Dimensional Data with Applications to Text

Molly Roberts, Brandon Stewart and Rich Nielsen

UCSD, Princeton and MIT

May 11, 2016

Roberts (UCSD) Text Matching May 11, 2016 1 / 32

Introduction

How do people react to online repression?

- Lots of governments try to control online information
- Censoring the whole internet is hard (# of bloggers ≫ # of censors)
- Given limited external enforcement, governments scare people into self-policing
- Governments might jail some bloggers to scare people
- Then encourage self-censorship by signaling off-limits topics

- But this could turn out one of two ways:
  - Bloggers might take cues to self-censor, OR
  - Bloggers might hate censorship and rebel
- Understanding self-censorship is important for understanding how censorship works
- BUT self-censorship is very hard to measure

Roberts (UCSD) Text Matching 28 April 2016 3 / 32

Introduction

The perfect experiment

1. Be the Chinese government

2. Randomly assign censorship

3. See what bloggers write after censorship

Problem 1: unethical

Problem 2: we aren’t the Chinese government

Roberts (UCSD) Text Matching 28 April 2016 4 / 32

Introduction

How Can We Measure Deterrence?

The best approximation:

Find two bloggers with

✓ similar users,
✓ similar censorship histories,
✓ similar numbers of posts,
✓ similar previous post sensitivity, ...

and very similar posts written on the same day.

Only one is censored: a censorship ‘mistake’.

Does the censored blogger’s behavior change?

Does the censored blogger stay away from the topic?

Does the censored blogger pursue the topic?
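The matched-pair idea above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' procedure: the `Post` record, the whitespace tokenizer, cosine similarity on raw word counts, and the `min_sim` threshold of 0.8 are all hypothetical choices, and the blogger-level checks (censorship history, number of posts, previous post sensitivity) are left out for brevity.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Post:
    # Hypothetical record for one blog post.
    blogger: str
    day: str
    text: str
    censored: bool


def bag_of_words(text: str) -> Counter:
    # Naive lowercase whitespace tokenization (an assumption; real text needs a proper tokenizer).
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two word-count vectors; 0.0 if either is empty.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_same_day(posts, min_sim=0.8):
    """Pair each censored post with the most similar uncensored post written
    on the same day by a different blogger (matching with replacement)."""
    controls = [p for p in posts if not p.censored]
    pairs = []
    for treated in (p for p in posts if p.censored):
        treated_bow = bag_of_words(treated.text)
        candidates = [
            (cosine(treated_bow, bag_of_words(c.text)), c)
            for c in controls
            if c.day == treated.day and c.blogger != treated.blogger
        ]
        if not candidates:
            continue
        sim, best = max(candidates, key=lambda pair: pair[0])
        if sim >= min_sim:  # keep only near-duplicate posts: plausible censorship "mistakes"
            pairs.append((treated, best, sim))
    return pairs


posts = [
    Post("a", "2012-01-01", "protest in the city square today", True),
    Post("b", "2012-01-01", "protest in the city square today again", False),
    Post("c", "2012-01-01", "my cat likes fish", False),
    Post("d", "2012-01-02", "protest in the city square today", False),
]
for treated, control, sim in match_same_day(posts):
    print(treated.blogger, control.blogger, round(sim, 3))
```

This prints `a b 0.926`: post `d` is textually identical to `a` but was written on a different day, so it is never a candidate. Comparing what `a` and `b` write afterwards would then address the questions above.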

Roberts (UCSD) Text Matching 28 April 2016 5 / 32

Introduction

Text Matching

- Text as a pre-treatment confounder: a surprisingly frequent problem
- Applications:
  - Does censorship change a blogger’s behavior?
  - Do targeted killings of Islamic extremists create interest in their work?
  - In International Relations, are women cited less frequently than men?
  - Control for letters of recommendation, trade treaties, Congressional bills, etc.
- BUT existing matching methods are impossible to apply to high-dimensional data:
  - You can’t possibly match on every word! (And you wouldn’t want to.)
  - We care about controlling for covariates predictive of treatment
  - But with text, we don’t know what predicts treatment

There has been very little work on this.
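A toy illustration of why exact matching on every word breaks down. The two tiny corpora and the hand-picked `keywords` list are hypothetical; the point is only that exact matching on the full vocabulary finds no matches, while matching on a few chosen words finds matches but requires knowing in advance which words matter.

```python
from collections import Counter


def doc_key(doc, vocab):
    # Represent a document by its word counts restricted to `vocab` (naive whitespace tokenization).
    counts = Counter(doc.lower().split())
    return tuple(counts[w] for w in vocab)


def exact_match_rate(treated, control, vocab):
    """Share of treated documents that have some control document with
    identical counts on every word in `vocab`."""
    control_keys = {doc_key(d, vocab) for d in control}
    hits = sum(1 for d in treated if doc_key(d, vocab) in control_keys)
    return hits / len(treated)


treated = ["the protest was large", "officials denied the report"]
control = ["the protest was small", "officials denied the story"]

full_vocab = sorted({w for d in treated + control for w in d.lower().split()})
keywords = ["protest", "officials"]  # hypothetical hand-picked coarsening

print(exact_match_rate(treated, control, full_vocab))  # 0.0: every word must agree
print(exact_match_rate(treated, control, keywords))    # 1.0: only because we chose the words
```

With realistic vocabularies of thousands of words the first rate collapses to zero almost immediately, while the second depends entirely on a choice of words that, with text, we do not know how to make.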

Roberts (UCSD) Text Matching 28 April 2016 6 / 32

Introduction

Our Approach to Text Matching

1. Construct analogs to current methods
   - Propensity score matching → Multinomial Inverse Regression
   - Coarsened exact matching → Topically Coarsened Matching
2. Identify benefits and drawbacks of each
3. Create a new method, Topical Inverse Regression Matching (TIRM), by combining the two


Outline of Talk

- A Quick Review of Matching
- Text Analogs to Current Matching Methods
- Topical Inverse Regression Matching
- Applications


Matching Methods for Text

Previous Approaches to Matching

- Goal: $t_i \perp\!\!\!\perp \{y_i(1), y_i(0)\} \mid \vec{x}_i$
- Many approaches: propensity score matching, coarsened exact matching, genetic matching, covariate balancing propensity scores, entropy balancing, synthetic matching, Mahalanobis matching, exact matching, subclassification matching, nearest-neighbor matching, full matching, ...
- Today, two of these strategies:
  1. model $p(t_i \mid \vec{x}_i)$ → propensity scores
  2. match on all of $\vec{x}_i$ → coarsened exact matching
- Both strategies scale poorly with high-dimensional covariates.
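A minimal pure-Python sketch of the second strategy (coarsened exact matching), assuming equal-width bins for every covariate. The helper names `coarsened_exact_match` and `share_matched` and the uniform-noise data are hypothetical and serve only as illustration. The sketch also shows why the approach scales poorly. With `n_bins` bins per covariate there are `n_bins ** n_cov` possible strata, so as covariates are added nearly every unit ends up alone in its stratum and gets pruned.

```python
import random
from collections import defaultdict

def coarsened_exact_match(X, t, n_bins=3):
    """Coarsen every covariate into n_bins equal-width bins, group units by
    their bin signature, and keep only strata that hold both treated (t=1)
    and control (t=0) units. Equal-width bins are an assumption; analysts
    usually choose substantively meaningful cut points instead."""
    n_cov = len(X[0])
    lows = [min(row[j] for row in X) for j in range(n_cov)]
    highs = [max(row[j] for row in X) for j in range(n_cov)]

    def bin_of(value, j):
        width = (highs[j] - lows[j]) / n_bins or 1.0  # guard constant columns
        return min(int((value - lows[j]) / width), n_bins - 1)

    strata = defaultdict(lambda: {0: [], 1: []})
    for i, row in enumerate(X):
        signature = tuple(bin_of(value, j) for j, value in enumerate(row))
        strata[signature][t[i]].append(i)
    # Strata lacking either group are dropped, so their units go unmatched.
    return {sig: g for sig, g in strata.items() if g[0] and g[1]}

def share_matched(n_cov, n_units=200, seed=0):
    """Fraction of units kept by CEM when covariates are uniform noise."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(n_cov)] for _ in range(n_units)]
    t = [i % 2 for i in range(n_units)]
    kept = coarsened_exact_match(X, t)
    return sum(len(g[0]) + len(g[1]) for g in kept.values()) / n_units

for n_cov in (1, 5, 50):
    print(n_cov, share_matched(n_cov))
```

With 200 simulated units, essentially all units are kept when there is one covariate. With 50 covariates (3^50 possible strata), essentially none are. A document-term matrix with thousands of word columns is far more extreme.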


Propensity Scores: An Analog for Text

- Classical approach
  - fit logistic regression $\hat{\pi}_i = p(t_i \mid \vec{x}_i)$
  - match units with similar probability of treatment
  - pros: units are matched on a scalar ($\hat{\pi}_i$) instead of a long vector ($\vec{x}_i$)
  - cons: only produces balance in expectation
- Problem: high-dimensional confounders
  - $X$ is $N \times V$ (# of documents by # of words in the vocabulary)
  - we can only estimate $\hat{\pi}_i$ well when $N \gg V$, which isn’t the case!
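The classical approach can be sketched in pure Python under simplifying assumptions. It fits an unpenalized logistic regression by batch gradient ascent (`lr` and `n_iter` are arbitrary choices), then performs 1:1 nearest-neighbor matching on $\hat{\pi}_i$ with replacement. The helper names and toy data are hypothetical. The sketch targets the low-dimensional case. With a document-term matrix where $V \geq N$, the data can typically be separated perfectly, so an unpenalized maximum-likelihood estimate like this one does not exist. That is the problem the slide raises.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, t, lr=0.5, n_iter=2000):
    """Unpenalized logistic regression of t on X by batch gradient ascent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, ti in zip(X, t):
            err = ti - sigmoid(b + sum(wj * xj for wj, xj in zip(w, row)))
            grad_b += err
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
        b += lr * grad_b / n
        w = [wj + lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b

def propensity_match(X, t):
    """Return pi_hat and a map from each treated unit to the control unit
    with the closest estimated propensity score (1:1, with replacement)."""
    w, b = fit_logistic(X, t)
    pi_hat = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) for row in X]
    controls = [i for i, ti in enumerate(t) if ti == 0]
    pairs = {i: min(controls, key=lambda c: abs(pi_hat[i] - pi_hat[c]))
             for i, ti in enumerate(t) if ti == 1}
    return pi_hat, pairs

# Toy data: one covariate that raises the chance of treatment.
rng = random.Random(1)
X = [[rng.random()] for _ in range(200)]
t = [1 if rng.random() < sigmoid(3 * x[0] - 1.5) else 0 for x in X]
pi_hat, pairs = propensity_match(X, t)
```

Matching on the scalar $\hat{\pi}_i$ collapses the search to one dimension. Two matched units can still differ on $\vec{x}_i$, which is why balance holds only in expectation.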


Propensity Scores: An Analog for Text

- Solution: Multinomial Inverse Regression (Cook 2007, Taddy 2013)
  - assume $\vec{x}_i \sim \text{Multinomial}(\vec{q}_i, m_i = \sum_v x_{i,v})$
  - where $q_{i,v} \propto \exp(\alpha_v + t_i \phi_v)$
  - $\phi_v$ measures the relationship between treatment and word $v$
  - the projection $z_i = \Phi'(\vec{x}_i / m_i)$ is a sufficient reduction ($X \perp\!\!\!\perp T \mid Z$) → estimate $\hat{\pi}_i$ with the projection
- Match on $z_i$ or $\hat{\pi}_i$
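The following is a hedged sketch of MNIR for a binary treatment. It relies on one property of the model above. With a single binary regressor and no penalty, the multinomial MLE sets $\text{softmax}(\alpha)$ to the control group's pooled word shares and $\text{softmax}(\alpha + \phi)$ to the treated group's pooled word shares. As a result, $\phi_v$ is a log ratio of shares, identified only up to an additive constant. The pseudo-count smoothing is an assumption. This is a simplification and not the penalized estimator of Taddy (2013). The vocabulary, counts, and helper names are hypothetical.

```python
import math

def mnir_phi(counts, t, pseudo=0.5):
    """Estimate phi_v for a binary treatment t under
    q_{i,v} proportional to exp(alpha_v + t_i * phi_v).

    Unpenalized MLE: phi_v = log(treated share of word v) - log(control share),
    identified only up to an additive constant. `pseudo` is an assumed
    smoothing count so that words absent from one group do not give log(0).
    """
    n_words = len(counts[0])
    totals = {0: [pseudo] * n_words, 1: [pseudo] * n_words}
    for row, ti in zip(counts, t):
        for v, c in enumerate(row):
            totals[ti][v] += c
    shares = {g: [c / sum(totals[g]) for c in totals[g]] for g in (0, 1)}
    return [math.log(shares[1][v] / shares[0][v]) for v in range(n_words)]

def mnir_projection(counts, phi):
    """z_i = Phi'(x_i / m_i): the phi-weighted average word frequency of
    document i. Assumes every document has at least one word (m_i > 0)."""
    return [sum(p * c for p, c in zip(phi, row)) / sum(row) for row in counts]

def match_on_projection(z, t):
    """1:1 nearest-neighbor match (with replacement) of treated to control units on z."""
    controls = [i for i, ti in enumerate(t) if ti == 0]
    return {i: min(controls, key=lambda c: abs(z[i] - z[c]))
            for i, ti in enumerate(t) if ti == 1}

# Hypothetical vocabulary and word counts (rows = documents).
vocab = ["protest", "police", "weather", "food"]
counts = [
    [5, 3, 1, 1],  # treated
    [4, 4, 0, 2],  # treated
    [1, 0, 6, 3],  # control
    [0, 1, 3, 6],  # control
    [3, 2, 2, 3],  # control
]
t = [1, 1, 0, 0, 0]

phi = mnir_phi(counts, t)
z = mnir_projection(counts, phi)
matches = match_on_projection(z, t)
```

Each $\vec{x}_i / m_i$ sums to one. Shifting every $\phi_v$ by a constant therefore shifts every $z_i$ by the same constant, and the matches on $z_i$ do not change. Because $z_i$ is a single number, $\hat{\pi}_i$ can then be estimated with an ordinary one-covariate logistic regression of $t_i$ on $z_i$.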


Problems with MNIR Matching

Posts equally likely to be treated are not always semantically similar:
- wouldn’t be a problem in expectation, BUT
- hard to assess balance in the text case
- could be more efficient if matches were more similar
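Here is a small worked example of this point. It uses hypothetical word weights $\phi_v$ that are assumed rather than estimated. Three documents share no words at all, yet every one has the same projection $z = 0$. Matching on $z_i$ (or on a $\hat{\pi}_i$ computed from $z_i$) therefore treats them as interchangeable.

```python
import math

# Hypothetical vocabulary and word weights phi_v (assumed, not estimated).
vocab = ["protest", "arrest", "weather", "recipe", "football", "music"]
phi = [1.0, 1.0, -1.0, -1.0, 0.0, 0.0]

def projection(x, phi):
    """MNIR-style projection z = phi . (x / m) for one document."""
    return sum(p * c for p, c in zip(phi, x)) / sum(x)

def cosine(a, b):
    """Cosine similarity of two word-count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

doc_a = [0, 0, 0, 0, 10, 0]  # only "football"
doc_b = [5, 0, 5, 0, 0, 0]   # "protest" and "weather" in equal measure
doc_c = [0, 0, 0, 0, 0, 10]  # only "music"

for doc in (doc_a, doc_b, doc_c):
    print(projection(doc, phi))
print(cosine(doc_a, doc_b), cosine(doc_a, doc_c), cosine(doc_b, doc_c))
```

A treated post like `doc_b` could thus be paired with a control like `doc_a` even though the two share no vocabulary. Such a pairing makes balance on the text hard to check.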


Page 97: University of California, Los Angeleshelper.ipam.ucla.edu/publications/caws3/caws3_13063.pdf · Introduction How do people react to online repression? ILots of governments try to

Matching Methods for Text

Problems with MNIR Matching

Posts equally likely to be treated are not always semantically similar:

I wouldn’t be a problem in expectation BUT

I hard to assess balance in the text case

I could be more efficient if matches were more similar

Roberts (UCSD) Text Matching 28 April 2016 12 / 32

Page 98: University of California, Los Angeleshelper.ipam.ucla.edu/publications/caws3/caws3_13063.pdf · Introduction How do people react to online repression? ILots of governments try to

Matching Methods for Text

Problems with MNIR Matching

Posts equally likely to be treated are not always semantically similar:

I wouldn’t be a problem in expectation BUT

I hard to assess balance in the text case

I could be more efficient if matches were more similar

Roberts (UCSD) Text Matching 28 April 2016 12 / 32

Page 99: University of California, Los Angeleshelper.ipam.ucla.edu/publications/caws3/caws3_13063.pdf · Introduction How do people react to online repression? ILots of governments try to

Matching Methods for Text

Coarsened Exact Matching: An Analog for Text

- Classical approach
  - coarsen each variable into natural categories, e.g. years of education → {elementary school, high school, college}
  - exactly match on the coarsened variables
  - pros: bounds imbalance on each variable
- Problem: high-dimensional confounder
  - thousands of variables: even if we coarsen, there is no exact match
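Both halves of the slide can be illustrated with a minimal Python sketch of coarsened exact matching. The education cutpoints (8 and 12 years), the helper names and the toy documents are all assumptions for illustration only. With one coarsened variable, most units find a match. When every vocabulary word is its own variable, each document lands in its own stratum.

```python
from collections import defaultdict

def coarsen_education(years):
    """Coarsen years of schooling; the cutpoints (8 and 12 years) are illustrative assumptions."""
    if years <= 8:
        return "elementary school"
    if years <= 12:
        return "high school"
    return "college"

def cem(keys, treated):
    """Exact-match on coarsened keys; keep strata that contain both treated and control units."""
    strata = defaultdict(list)
    for i, key in enumerate(keys):
        strata[key].append(i)
    return {key: members for key, members in strata.items()
            if any(treated[i] for i in members) and not all(treated[i] for i in members)}

# One low-dimensional confounder: matching works
years = [6, 11, 12, 16, 18, 9]
treated = [True, True, False, True, False, False]
matched = cem([coarsen_education(y) for y in years], treated)
# {"high school": [1, 2, 5], "college": [3, 4]}; unit 0 is dropped (no control in its stratum)

# High-dimensional confounder: one binary indicator per vocabulary word
docs = ["tax income rise", "tariff on imports", "income tax cut", "new tariff announced"]
vocab = sorted({w for d in docs for w in d.split()})
word_keys = [tuple(int(w in d.split()) for w in vocab) for d in docs]
word_matches = cem(word_keys, [True, False, True, False])
# {}: every document has its own word profile, so no stratum holds both groups
```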

Roberts (UCSD) Text Matching 28 April 2016 13 / 32


Matching Methods for Text

Coarsened Exact Matching: An Analog for Text

- Solution: topically coarsened matching
  - innovation: coarsen across variables
    - simple example: “tax”, “income”, “tariff” → “economics”
  - topics, rather than words, must be equivalent across matched documents
  - bounds imbalance across groups of stochastically equivalent words
- Estimate a topic model
- Match on the topic density rather than raw word counts
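Topically coarsened matching can be sketched in Python as follows. This assumes the topic proportions $\theta$ have already been estimated by some topic model; the values below are hypothetical. It also assumes each proportion is binned at the made-up cutpoints 0.1 and 0.5. Documents then exact-match on their binned topic profile rather than on raw word counts.

```python
import numpy as np
from collections import defaultdict

def coarsen_topics(theta, cutpoints=(0.1, 0.5)):
    """Bin each topic proportion into low/medium/high; the cutpoints are an assumption."""
    return [tuple(int(b) for b in np.digitize(row, cutpoints)) for row in np.asarray(theta)]

def exact_match(keys, treated):
    """Keep only strata (identical keys) with at least one treated and one control document."""
    strata = defaultdict(list)
    for i, key in enumerate(keys):
        strata[key].append(i)
    return {key: members for key, members in strata.items()
            if any(treated[i] for i in members) and not all(treated[i] for i in members)}

# Hypothetical topic proportions for topics ("economics", "sports"); rows sum to 1
theta = np.array([[0.80, 0.20],    # e.g. a post about "tax" and "income"
                  [0.70, 0.30],    # e.g. a post about "tariff"
                  [0.05, 0.95],
                  [0.30, 0.70]])
treated = [True, False, True, False]
matched = exact_match(coarsen_topics(theta), treated)
# {(2, 1): [0, 1]}: documents 0 and 1 match on topics even if they share no words
```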

Roberts (UCSD) Text Matching 28 April 2016 14 / 32


Matching Methods for Text

Problems with Topical CEM

Topics aren’t always the most important predictor of treatment:

Roberts (UCSD) Text Matching 28 April 2016 15 / 32


Matching Methods for Text

Topical Inverse Regression Matching (TIRM)

- We need something that:
  1. Bounds imbalance between documents
  2. Doesn’t leave out important words
- TIRM: jointly estimate the probability of treatment and topic density
  - Match on topic proportions & topic-specific probability of treatment
    - topical bounding properties
    - estimates which words are associated with treatment
- Ingredients:
  - Structural Topic Model (Roberts, Stewart, Tingley et al. 2014)
    - with treatment as a content covariate

Roberts (UCSD) Text Matching 28 April 2016 16 / 32


Matching Methods for Text

Structural Topic Model

- STM adds “structure” to Latent Dirichlet Allocation (Blei, Ng and Jordan 2003) via a prior
  - Replace the topic prevalence prior → (heuristically) a GLM with arbitrary covariates (Blei and Lafferty 2006, Mimno and McCallum 2008)
    - Documents have different expected topic proportions based on observed covariates.
  - Replace the distribution over words → multinomial logit (Eisenstein, Ahmed and Xing 2011)
    - Topics are now deviations from a baseline distribution.

$$P(\text{word} \mid \text{topic}, \text{doc}) \propto \exp\left(\kappa^{(m)} + \text{topic} \cdot \kappa^{(k)} + \text{covariate}_{\text{doc}} \cdot \kappa^{(c)} + \text{topic} \cdot \text{covariate}_{\text{doc}} \cdot \kappa^{(\text{int})}\right)$$

$\kappa^{(c)}$ and $\kappa^{(\text{int})}$ capture how words are related to treatment.
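The word distribution above can be evaluated directly in a small Python sketch with hypothetical $\kappa$ values. This does not fit an STM; in practice the $\kappa$ terms are estimated, e.g. with the authors' `stm` R package. Indexing by topic and covariate level stands in for the indicator products in the formula. The shapes and the function name `stm_word_distribution` are assumptions for illustration.

```python
import numpy as np

def stm_word_distribution(kappa_m, kappa_k, kappa_c, kappa_int, topic, covariate):
    """P(word | topic, doc) ∝ exp(kappa_m + kappa_k[topic] + kappa_c[covariate] + kappa_int[topic, covariate]).

    Shapes: kappa_m (V,), kappa_k (K, V), kappa_c (C, V), kappa_int (K, C, V).
    Indexing by topic/covariate plays the role of the indicator products in the formula.
    """
    eta = kappa_m + kappa_k[topic] + kappa_c[covariate] + kappa_int[topic, covariate]
    eta = eta - eta.max()              # subtract the max for numerical stability
    p = np.exp(eta)
    return p / p.sum()

V, K, C = 3, 2, 2                         # vocabulary size, topics, covariate levels
kappa_m = np.log([0.5, 0.3, 0.2])         # hypothetical baseline word frequencies
kappa_k = np.zeros((K, V))
kappa_c = np.zeros((C, V))
kappa_c[1] = [0.0, 0.0, 1.0]              # hypothetical: treated docs (c = 1) use word 2 more
kappa_int = np.zeros((K, C, V))

beta_control = stm_word_distribution(kappa_m, kappa_k, kappa_c, kappa_int, topic=0, covariate=0)
beta_treated = stm_word_distribution(kappa_m, kappa_k, kappa_c, kappa_int, topic=0, covariate=1)
# beta_control ≈ [0.5, 0.3, 0.2]; beta_treated[2] ≈ 0.40
```

Because every deviation is zero for control documents, their distribution collapses to the baseline $\kappa^{(m)}$. The single nonzero entry in $\kappa^{(c)}$ is exactly the kind of word-treatment association TIRM later projects onto.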

Roberts (UCSD) Text Matching 28 April 2016 17 / 32


Matching Methods for Text

TIRM

Match on:

1. $\theta$: estimated topic proportions ($K$ covariates)
2. proj:
   - let $\vec{x}_i / m_i$ be the share of document $i$ made up of each word
   - $(\kappa^{(c)})'(\vec{x}_i / m_i)$: covariate-only projection
   - $(\kappa^{(c)})'(\vec{x}_i / m_i) + \frac{1}{m_i} \sum_v x_{i,v} \left( (\kappa^{(\text{int})}_v)' \theta_i \right)$: topic-covariate projection
3. Any other covariates you think are important
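Both projections can be computed with a short Python sketch. It assumes a binary treatment, so that $\kappa^{(c)}$ reduces to one coefficient per word and $\kappa^{(\text{int})}$ to a $V \times K$ matrix. It also assumes the coefficients and $\theta$ are taken from an already-fitted STM; the numbers below are hypothetical. The function name `tirm_projections` is illustrative, not an existing API.

```python
import numpy as np

def tirm_projections(X, theta, kappa_c, kappa_int):
    """Covariate-only and topic-covariate projections for each document.

    X: (N, V) word counts; theta: (N, K) topic proportions;
    kappa_c: (V,) treatment coefficient per word; kappa_int: (V, K) topic-treatment interactions.
    """
    X = np.asarray(X, dtype=float)
    share = X / X.sum(axis=1, keepdims=True)                   # x_i / m_i
    cov_only = share @ np.asarray(kappa_c, dtype=float)         # (kappa^(c))'(x_i / m_i)
    # entry [i, v] of theta @ kappa_int.T is (kappa_v^(int))' theta_i
    per_word = np.asarray(theta, dtype=float) @ np.asarray(kappa_int, dtype=float).T
    # (1/m_i) * sum_v x_{i,v} * (kappa_v^(int))' theta_i
    topic_cov = cov_only + (share * per_word).sum(axis=1)
    return cov_only, topic_cov

X = np.array([[2, 1, 1],
              [0, 2, 2]])
theta = np.array([[0.75, 0.25],
                  [0.50, 0.50]])
kappa_c = np.array([1.0, 0.0, -1.0])       # hypothetical, as if read off a fitted STM
kappa_int = np.array([[0.4, 0.0],
                      [0.0, 0.0],
                      [0.0, 0.8]])
cov_only, topic_cov = tirm_projections(X, theta, kappa_c, kappa_int)
# cov_only ≈ [0.25, -0.5]; topic_cov ≈ [0.45, -0.3]
```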

We generally use CEM to match, but other matching methods could be used.
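The final matching step can be sketched as CEM over the TIRM covariates. Two choices here are assumptions: coarsening every covariate into three equal-width bins, and the hypothetical values of $\theta$ and the projection. Neither is necessarily the default of any CEM software. The example shows the topical bound at work: two documents with the same projection but different topic profiles are not matched.

```python
import numpy as np
from collections import defaultdict

def cem_equal_width(Z, treated, n_bins=3):
    """Coarsen each column of Z into n_bins equal-width bins, then exact-match on the bin codes.

    Returns {bin signature: [row indices]} for strata with both treated and control rows."""
    Z = np.asarray(Z, dtype=float)
    codes = np.empty(Z.shape, dtype=int)
    for j in range(Z.shape[1]):
        edges = np.linspace(Z[:, j].min(), Z[:, j].max(), n_bins + 1)
        codes[:, j] = np.digitize(Z[:, j], edges[1:-1])   # interior cutpoints only
    strata = defaultdict(list)
    for i, row in enumerate(codes):
        strata[tuple(int(c) for c in row)].append(i)
    return {key: members for key, members in strata.items()
            if any(treated[i] for i in members) and not all(treated[i] for i in members)}

# Hypothetical matching covariates: [theta_1, topic-covariate projection] per document
# (with K = 2 topics, theta_2 = 1 - theta_1 adds no information)
Z = np.array([[0.80, 0.45],
              [0.75, 0.40],
              [0.20, -0.30],
              [0.70, -0.30]])
treated = [True, False, True, False]
matched = cem_equal_width(Z, treated)
# {(2, 2): [0, 1]}: document 3 shares document 2's projection but not its topics, so 2 stays unmatched
```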

Limitations of TIRM

- New: relies on a parametric method (the STM) to reduce dimensions

- Old: requires SUTVA and that all relevant covariates are observed

Roberts (UCSD) Text Matching 28 April 2016 18 / 32

Page 136: University of California, Los Angeleshelper.ipam.ucla.edu/publications/caws3/caws3_13063.pdf · Introduction How do people react to online repression? ILots of governments try to

Matching Methods for Text

TIRM

Match on:

1. θ: Estimated topic proportion (K covariates)

2. proj :

I let (xi/mi ) % of document i that is word xI (κ(c))′(xi/mi )

covariate-only projection

I (κ(c))′(xi/mi ) + 1mi

∑v xi,v

((κ

(int)v

)′θi

)

topic-covariate projection

3. Any other covariates you think are important

We generally use CEM to match but other methods could be used.

Limitations of TIRM

I New: relies on a parametric method to reduce dimensions

I Old: requires SUTVA, relevant covariates

Roberts (UCSD) Text Matching 28 April 2016 18 / 32

Page 137: University of California, Los Angeleshelper.ipam.ucla.edu/publications/caws3/caws3_13063.pdf · Introduction How do people react to online repression? ILots of governments try to

Matching Methods for Text

TIRM

Match on:

1. θ: Estimated topic proportion (K covariates)

2. proj :I let (xi/mi ) % of document i that is word x

I (κ(c))′(xi/mi )

covariate-only projection

I (κ(c))′(xi/mi ) + 1mi

∑v xi,v

((κ

(int)v

)′θi

)

topic-covariate projection

3. Any other covariates you think are important

We generally use CEM to match but other methods could be used.

Limitations of TIRM

I New: relies on a parametric method to reduce dimensions

I Old: requires SUTVA, relevant covariates

Roberts (UCSD) Text Matching 28 April 2016 18 / 32

Page 138: University of California, Los Angeleshelper.ipam.ucla.edu/publications/caws3/caws3_13063.pdf · Introduction How do people react to online repression? ILots of governments try to

Matching Methods for Text

TIRM

Match on:

1. θ: Estimated topic proportion (K covariates)

2. proj :I let (xi/mi ) % of document i that is word xI (κ(c))′(xi/mi )

covariate-only projection

I (κ(c))′(xi/mi ) + 1mi

∑v xi,v

((κ

(int)v

)′θi

)

topic-covariate projection

3. Any other covariates you think are important

We generally use CEM to match but other methods could be used.

Limitations of TIRM

I New: relies on a parametric method to reduce dimensions

I Old: requires SUTVA, relevant covariates

Roberts (UCSD) Text Matching 28 April 2016 18 / 32

Page 139: University of California, Los Angeleshelper.ipam.ucla.edu/publications/caws3/caws3_13063.pdf · Introduction How do people react to online repression? ILots of governments try to

Matching Methods for Text

TIRM

Match on:

1. θ: Estimated topic proportion (K covariates)

2. proj :I let (xi/mi ) % of document i that is word xI (κ(c))′(xi/mi ) covariate-only projection

I (κ(c))′(xi/mi ) + 1mi

∑v xi,v

((κ

(int)v

)′θi

)

topic-covariate projection

3. Any other covariates you think are important

We generally use CEM to match but other methods could be used.

Limitations of TIRM

I New: relies on a parametric method to reduce dimensions

I Old: requires SUTVA, relevant covariates

Roberts (UCSD) Text Matching 28 April 2016 18 / 32


Matching Methods for Text

Simulations

Set up:

1. Simulate 200 draws of outcome and treatment, with confounding through both topics and words
2. Estimate an STM (structural topic model)
3. Condition on topics and projection

[Figure: densities of the estimated effect (x-axis: Estimated Effect, −2.0 to 0.0; y-axis: Density) for the Naive Estimator, Topics Only, and Topics and Projection, with the True effect marked]

Roberts (UCSD) Text Matching 28 April 2016 19 / 32
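To make the logic of this simulation concrete, here is a deliberately simplified sketch. Unlike the setup above, it skips the STM: a single hypothetical confounder `z` (think of it as the share of one topic) is observed directly, and stratifying on binned `z` stands in for conditioning on topics and projection. The true effect, sample size, noise level, and bin count are arbitrary assumptions.

```python
import numpy as np

def simulate_once(rng, n=1000, true_effect=-1.0):
    z = rng.uniform(0.0, 1.0, n)                  # hypothetical confounding topic share
    t = rng.uniform(0.0, 1.0, n) < z              # treatment more likely when z is high
    y = true_effect * t + 2.0 * z + rng.normal(0.0, 0.5, n)  # z also raises the outcome
    return z, t, y

def naive(t, y):
    # Difference in means, ignoring the confounder
    return y[t].mean() - y[~t].mean()

def stratified(z, t, y, n_bins=10):
    # Within-bin differences in means, averaged with weights = treated count per bin
    bins = np.minimum((z * n_bins).astype(int), n_bins - 1)
    diffs, wts = [], []
    for b in range(n_bins):
        in_b = bins == b
        if t[in_b].any() and (~t[in_b]).any():
            diffs.append(y[in_b & t].mean() - y[in_b & ~t].mean())
            wts.append(np.sum(in_b & t))
    return np.average(diffs, weights=wts)

rng = np.random.default_rng(2016)
estimates = np.array([
    (naive(t, y), stratified(z, t, y))
    for z, t, y in (simulate_once(rng) for _ in range(200))
])
naive_mean, strat_mean = estimates.mean(axis=0)
print(f"naive: {naive_mean:.2f}  stratified: {strat_mean:.2f}  true: -1.00")
```

Because treated units have higher `z` on average, the naive estimate is pulled away from the true effect, while comparing treated and control units with similar `z` largely removes that bias.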


Estimating Censorship Effects

Example 1: How do bloggers react to censorship?

• Data: 593 bloggers over 6 months spanning 2011 and 2012
• 150,000 posts
• Return to the blogs later to measure censorship
• Find censors' mistakes: two similar blogs, different censorship
• Also match on date, previous censorship, and previous sensitivity
• How do 'treated' bloggers react to censorship?
• Outcome: bloggers' writings after censorship:
  ◦ censorship rate after
  ◦ sensitivity of blog text after (estimated by TIRM)
  ◦ topical content of blogs after

Roberts (UCSD) Text Matching 28 April 2016 20 / 32


Estimating Censorship Effects

TIRM Finds Almost Identical Posts

[Figure: histograms of String Kernel Similarity (x-axis, 0.0 to 1.0) against Frequency (y-axis), one panel each for Original Data, TIRM, Topic Match, and MNIR Match]

Roberts (UCSD) Text Matching 28 April 2016 21 / 32
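The slides do not say which string kernel produced these similarity scores. As an illustration only, a normalized p-spectrum kernel (counting shared character substrings of length p, with p = 3 an arbitrary choice) yields scores between 0 and 1, where identical texts score 1 and texts with no shared substrings score 0. The function names are hypothetical.

```python
import math
from collections import Counter

def spectrum_counts(text, p):
    """Count every contiguous character substring of length p."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))

def string_kernel_similarity(a, b, p=3):
    """Normalized p-spectrum kernel: k(a, b) / sqrt(k(a, a) * k(b, b)), in [0, 1]."""
    ca, cb = spectrum_counts(a, p), spectrum_counts(b, p)
    k_ab = sum(n * cb[s] for s, n in ca.items())   # inner product of substring counts
    k_aa = sum(n * n for n in ca.values())
    k_bb = sum(n * n for n in cb.values())
    if k_aa == 0 or k_bb == 0:                      # a text shorter than p has no substrings
        return 0.0
    return k_ab / math.sqrt(k_aa * k_bb)
```

Computing such a score for each treated post and its matched control, then plotting a histogram, gives the kind of comparison shown above: mass near 1 indicates near-identical texts.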


Estimating Censorship Effects

Results

• We find 46 matched blogs (censors' mistakes)
• Nearly perfect matches
• Most matched posts are about the Bo Xilai incident and Maoist protests
• 5 posts before treatment:
  ◦ No statistically significant difference in actual censorship
  ◦ No statistically significant difference in TIRM-predicted censorship
  ◦ (Not surprising: we are matching on these!)
• 5 posts after treatment:
  ◦ Treated group: 20% censorship; control group: 7% censorship
  ◦ TIRM estimates the treated group's text to be significantly more sensitive than the control group's
  ◦ Treated group writes significantly more about the Bo Xilai incident after censorship than the control group
  ◦ Treated group writes significantly more about CCP history/Mao after censorship than the control group

Roberts (UCSD) Text Matching 28 April 2016 22 / 32
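The slide reports several differences as significant without stating which test was used. As one possible illustration, assuming the matches form treated/control pairs and each blogger's outcome is the share of their 5 post-treatment posts that were censored, a paired sign-flip permutation test could look like this. The function name, its parameters, and the test inputs are hypothetical, not the study's data or method.

```python
import numpy as np

def sign_flip_test(treated_y, control_y, n_perm=10_000, seed=0):
    """Paired sign-flip permutation test for the mean treated-minus-control difference.

    Under the null of no treatment effect, each pair's difference is equally likely
    to take either sign, so the observed mean difference is compared with the means
    obtained by flipping signs at random.
    """
    d = np.asarray(treated_y, dtype=float) - np.asarray(control_y, dtype=float)
    observed = d.mean()
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=(n_perm, d.size))
    null_means = (signs * d).mean(axis=1)
    # Two-sided p-value; the +1 terms count the observed assignment itself
    p_value = (1 + np.sum(np.abs(null_means) >= abs(observed))) / (n_perm + 1)
    return observed, p_value
```

Working with within-pair differences respects the matched design: each treated blogger is compared only with their own matched control.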


Estimating the Effect of Gender on Citations in IR

Example 2: Does gender affect citations in Political Science?

• Maliniak, Powers, and Walter (2013): women get cited less than men in IR
• Problem: women write about different topics than men
• Maliniak et al.'s solution: code articles into (many) categories
• Our solution: text matching!
• Data: 3,201 journal articles from the top 12 IR journals, 1980–2006
• Code lots of variables, including gender, article age, tenure, etc.
• Treatment: all-female authorship; control: co-ed or all-male authorship
• Our motive: find similar articles and see how differently they are cited

Roberts (UCSD) Text Matching 28 April 2016 23 / 32
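The treatment rule above is easy to misread for co-authored work, so here is a small hypothetical helper that encodes it. It assumes each article's author genders are available as a list of the strings "female" and "male"; those labels and the function name are assumptions for illustration.

```python
from typing import Sequence

def gender_treatment(author_genders: Sequence[str]) -> bool:
    """True (treated) only if every author is female; co-ed and all-male are control."""
    if not author_genders:
        raise ValueError("an article must have at least one author")
    return all(g == "female" for g in author_genders)
```

Under this rule a single female co-author on an otherwise male team places the article in the control group, together with all-male articles.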


Estimating Effects of Gender Citations in IR

Example 2: Does gender affect citations in PoliticalScience?

I Maliniak, Powers, Walter (2013): women get cited less than men in IR

I Problem: women write about different topics than men

I Maliniak et al solution: Code articles into (many) categories

I Our solution: Text matching!

I Data: 3,201 journal articles from top 12 IR journals, 1980-2006.

I Code lots of variables, including gender, article age, tenure, etc.

I Treatment: all-female Control: co-ed/all-male

I Our motive: Find similar articles, see how they are cited differently.

Roberts (UCSD) Text Matching 28 April 2016 23 / 32

Page 186: University of California, Los Angeleshelper.ipam.ucla.edu/publications/caws3/caws3_13063.pdf · Introduction How do people react to online repression? ILots of governments try to

Estimating Effects of Gender Citations in IR

Example 2: Does gender affect citations in PoliticalScience?

I Maliniak, Powers, Walter (2013): women get cited less than men in IR

I Problem: women write about different topics than men

I Maliniak et al solution: Code articles into (many) categories

I Our solution: Text matching!

I Data: 3,201 journal articles from top 12 IR journals, 1980-2006.

I Code lots of variables, including gender, article age, tenure, etc.

I Treatment: all-female Control: co-ed/all-male

I Our motive: Find similar articles, see how they are cited differently.

Roberts (UCSD) Text Matching 28 April 2016 23 / 32

Page 187: University of California, Los Angeleshelper.ipam.ucla.edu/publications/caws3/caws3_13063.pdf · Introduction How do people react to online repression? ILots of governments try to

Estimating Effects of Gender Citations in IR

Example 2: Does gender affect citations in PoliticalScience?

I Maliniak, Powers, Walter (2013): women get cited less than men in IR

I Problem: women write about different topics than men

I Maliniak et al solution: Code articles into (many) categories

I Our solution: Text matching!

I Data: 3,201 journal articles from top 12 IR journals, 1980-2006.

I Code lots of variables, including gender, article age, tenure, etc.

I Treatment: all-female Control: co-ed/all-male

I Our motive: Find similar articles, see how they are cited differently.

Roberts (UCSD) Text Matching 28 April 2016 23 / 32

Page 188: University of California, Los Angeleshelper.ipam.ucla.edu/publications/caws3/caws3_13063.pdf · Introduction How do people react to online repression? ILots of governments try to

Estimating Effects of Gender Citations in IR

Example 2: Does gender affect citations in PoliticalScience?

I Maliniak, Powers, Walter (2013): women get cited less than men in IR

I Problem: women write about different topics than men

I Maliniak et al solution: Code articles into (many) categories

I Our solution: Text matching!

I Data: 3,201 journal articles from top 12 IR journals, 1980-2006.

I Code lots of variables, including gender, article age, tenure, etc.

I Treatment: all-female Control: co-ed/all-male

I Our motive: Find similar articles, see how they are cited differently.

Roberts (UCSD) Text Matching 28 April 2016 23 / 32

Page 189: University of California, Los Angeleshelper.ipam.ucla.edu/publications/caws3/caws3_13063.pdf · Introduction How do people react to online repression? ILots of governments try to

Words men and women use differently in IR

Figure panels: Original Data; Topic Matching; TIRM.

Roberts (UCSD) Text Matching 28 April 2016 24 / 32


TIRM Reduces Topical Differences

Figure: mean topic difference (Women − Men), on an axis from −0.10 to 0.10, for each topic under five samples: Full Data Set (Unmatched), TIRM, MNIR, Topic Matching, and Human Coding Matched. Topics, listed by their top stemmed words in the order plotted:

- Topic 2: model, variabl, data, effect, measur
- Topic 9: nuclear, weapon, arm, forc, defens
- Topic 6: game, will, cooper, can, strategi
- Topic 14: war, conflict, state, disput, democraci
- Topic 1: state, power, intern, system, polit
- Topic 7: polici, foreign, public, polit, decis
- Topic 12: soviet, militari, war, forc, defens
- Topic 13: trade, econom, polici, bank, intern
- Topic 8: polit, parti, polici, govern, vote
- Topic 15: war, israel, peac, conflict, arab
- Topic 3: polit, conflict, group, ethnic, state
- Topic 4: econom, develop, industri, countri, world
- Topic 10: state, china, unit, foreign, polici
- Topic 11: intern, state, organ, institut, law
- Topic 5: polit, social, one, theori, world

Roberts (UCSD) Text Matching 28 April 2016 25 / 32
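The quantity on this figure's x-axis, mean topic difference (Women − Men), is a balance diagnostic: a matching method that works should pull every topic's difference toward zero. A minimal sketch, assuming topic proportions `theta` have already been estimated (e.g. by a topic model) and that a matched sample is represented by per-document weights (1 = kept, 0 = pruned). The function name and the weighting convention are assumptions for illustration.

```python
import numpy as np

def mean_topic_difference(theta, treated, weights=None):
    """Treated-minus-control difference in mean topic proportions, one value per topic.

    theta:   (n_docs, n_topics) topic proportions, each row summing to 1.
    treated: boolean array, True for treated (e.g. all-female) documents.
    weights: optional matching weights; None means the full, unmatched sample."""
    theta = np.asarray(theta, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    w = np.ones(len(theta)) if weights is None else np.asarray(weights, dtype=float)
    mean_t = np.average(theta[treated], axis=0, weights=w[treated])
    mean_c = np.average(theta[~treated], axis=0, weights=w[~treated])
    return mean_t - mean_c
```

Computing this once with no weights and once with the weights from each matching method gives one row of points per topic, as in the figure.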


Results

- Maliniak et al.: women receive 80% of the citations that men receive
- In our data: women receive fewer citations, and this holds across matching methods
- Final match: women receive 40–60% of the citations that men receive
- We are still investigating why our results are more extreme
- The difference may be concentrated among articles with very high citation counts

Roberts (UCSD) Text Matching 28 April 2016 26 / 32
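The percentages above are ratios of women's citations to men's. The hypothetical numbers below, invented for illustration and not taken from the study, show why the last bullet matters: a single very highly cited article can pull a ratio of means far from a ratio of medians. Which summary statistic the authors actually used is not stated on the slide.

```python
import numpy as np

def citation_ratio(female_cites, male_cites, stat=np.mean):
    """Women's citations as a fraction of men's under a chosen summary statistic."""
    f = np.asarray(female_cites, dtype=float)
    m = np.asarray(male_cites, dtype=float)
    return float(stat(f) / stat(m))

# Hypothetical matched samples: one highly cited article among the controls.
female = [10, 12, 8, 10]
male = [9, 11, 10, 50]

print(citation_ratio(female, male))             # ratio of means: 10 / 20 = 0.5
print(citation_ratio(female, male, np.median))  # ratio of medians: 10 / 10.5, about 0.95
```

Reporting both statistics, or trimming the upper tail, is one way to check whether a gap is driven by a few outliers.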


Ex. 3: Did killing Bin Laden make his ideas less popular?

“His death will serve as a global clarion call for another generation of jihadists.” – Ed Husain (CFR)

“al-Qaida may emerge even more radical, and more closely united under the banner of an iconic martyr.” – Abdel Bari Atwan (The Guardian)

Roberts (UCSD) Text Matching 28 April 2016 27 / 32


Ex. 3: Did killing Bin Laden make his ideas less popular?

“The idea that Obama made a strategic misstep by killing a man responsible for the death of thousands of U.S. citizens and committed to killing thousands more is absurd. Rather than making him a martyr, Bin Laden’s killing demonstrated that he was, like the rest of us, mortal.” – Robert Simcox (LA Times)

Roberts (UCSD) Text Matching 28 April 2016 28 / 32


Ex. 3: Did killing Bin Laden make his ideas less popular?

We don’t really know.

Figure: Usama Bin Laden (5/2/2011), Anwar al-Awlaki (9/30/2011), Abu Yahya al-Libi (6/5/2012)

Roberts (UCSD) Text Matching 28 April 2016 29 / 32


Empirical Strategy

- View-count data from a jihadist website, scraped over time
- Does the targeted killing of Bin Laden increase views of his work?
- TIRM matching plus matching on pre-treatment page views
- The quantity of interest (QOI) is the ATT, so we use nearest-neighbor matching instead of CEM
- Validation: the matches accord with the website's sub-pages

Roberts (UCSD) Text Matching 28 April 2016 30 / 32
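A minimal sketch of an ATT estimate from nearest-neighbor matching, as named on this slide. Assumptions not taken from the slides: covariates (topic proportions plus pre-treatment page views) are already assembled into one matrix, distance is standardized Euclidean, and each treated document is matched 1-to-1 to a control with replacement. The function name is hypothetical.

```python
import numpy as np

def att_nearest_neighbor(X, treated, y):
    """ATT via 1-nearest-neighbor matching with replacement.

    X:       (n, k) covariates, e.g. topic proportions plus pre-treatment page views.
    treated: boolean array, True for treated documents.
    y:       outcomes, e.g. post-treatment page views.
    Each treated unit is paired with the control closest in standardized
    Euclidean distance; the ATT is the mean treated-minus-matched-control outcome."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0)
    Z = (X - X.mean(axis=0)) / np.where(sd == 0, 1.0, sd)  # put columns on one scale
    treated = np.asarray(treated, dtype=bool)
    y = np.asarray(y, dtype=float)
    Zt, Zc = Z[treated], Z[~treated]
    # Squared distances, shape (n_treated, n_control).
    d = ((Zt[:, None, :] - Zc[None, :, :]) ** 2).sum(axis=2)
    matched_y = y[~treated][d.argmin(axis=1)]
    return float(np.mean(y[treated] - matched_y))
```

Standardizing keeps page views, which can run into the thousands, from swamping topic proportions that lie in [0, 1]. Because every treated unit is kept and receives a match, the estimand stays the ATT, whereas CEM may prune treated units that fall in strata without controls.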


Martyr Effect: Clear short-term increase in page views

Figure panel "Matching on topics and page views": estimated increase in page views per document (y-axis, −3000 to 3000) by date (x-axis, 2011 to 2014); additional ticks span 2011-05-02 to 2011-11-02 and 0 to 500.

Figure: Estimated effects of Usama Bin Laden’s death (on May 2, 2011) on subsequent page views of his documents on a large jihadist web library.

Roberts (UCSD) Text Matching 28 April 2016 31 / 32


Conclusion

- Many applications measure pre-treatment confounders with text
- No methods have yet been developed to do this
- We develop a new method, Topical Inverse Regression Matching (TIRM):
  - Matching on a topical density estimate bounds differences between topics
  - Matching on the probability of treatment balances words related to treatment
- Future work:
  - Develop the theoretical properties of TIRM
  - Extend to high-dimensional cases other than text
  - Create an R package

Roberts (UCSD) Text Matching 28 April 2016 32 / 32
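The claim that matching on a topical density estimate "bounds differences between topics" can be illustrated with coarsened exact matching (CEM): if each topic proportion is cut into equal-width bins and only documents sharing every bin are matched, matched documents differ by at most one bin width on each topic. A minimal sketch, assuming topic proportions and an estimated probability of treatment are already available; the bin count and function names are hypothetical, and this is a simplified stand-in, not the authors' TIRM implementation.

```python
from collections import defaultdict

import numpy as np

def coarsened_strata(theta, propensity, n_bins=5):
    """Group documents whose topic proportions and estimated probability of
    treatment fall into the same equal-width bins on [0, 1].

    Two documents in the same stratum differ by at most 1 / n_bins on every
    topic proportion and on the estimated probability of treatment."""
    features = np.column_stack([np.asarray(theta, dtype=float),
                                np.asarray(propensity, dtype=float)])
    # Floor each value to its bin index; clip so a value of exactly 1.0 lands in the top bin.
    bins = np.minimum((features * n_bins).astype(int), n_bins - 1)
    strata = defaultdict(list)
    for i, row in enumerate(bins):
        strata[tuple(int(b) for b in row)].append(i)
    return dict(strata)

def matched_strata(strata, treated):
    """CEM pruning: keep only strata that contain both treated and control documents."""
    return {key: idx for key, idx in strata.items()
            if any(treated[i] for i in idx) and not all(treated[i] for i in idx)}
```

Including the coarsened probability of treatment as an extra matching dimension is one simple way to reflect the second bullet; narrower bins tighten the bound but leave fewer strata with both treated and control documents.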
