Computer-aided content analysis of digitally enabled movements

Post on 24-Jan-2015

Computer-aided content analysis of digitally-enabled movements

Alexander Hanna

University of Wisconsin-Madison

@alexhanna

Agenda

1. Focus on methods of prior digitally-mediated movement research

2. Outline application to specific case, April 6 Youth Movement

3. Describe and walk through coding procedure

Alexander Hanna, Wisconsin (@alexhanna)

The rise of digitally-enabled movements

Digitally-enabled movement: a movement that has incorporated some aspect of online activity through information and communication technologies (ICTs)

Movements leave traces which are both 1) records of events and 2) movement activity in and of themselves

Digitally-enabled movement activity

Coordinating people for other (usually offline) activities (e-mobilization in Earl and Kimport 2012)

Online mobilizations (e-tactics; ibid.)

Issue discussion and development of discourses (analogous to “free spaces” [Evans and Boyte 1986; Polletta 1999] or counterpublics)

Persuasion and micromobilization (Snow et al. 1986)

Previous types of analysis

Case study (e.g. Gurak 1997, Eaton 2010)

Network analysis (e.g. Garrido and Halavais 2003; Bennett, Foot, and Xenos 2011)

Volume and group properties (e.g. Caren, Jowers, and Gaby 2012)

Problems with these methods

Need to focus on content, but too much data for manual content analysis

Hand coding is cost- and time-prohibitive

Too many Datas.

A solution - computer-aided content analysis

Computer-aided content analysis (also called automated content analysis; textual analysis)

- Goal: extracting information out of text
- Includes word search, statistical machine learning, language modeling

Note: this supplements deep case knowledge, encourages both inductive and deductive approaches

Case: April 6 Youth Movement

Facebook group created as solidarity action with Egyptian workers

Proposed actions on April 6, strike date, and May 4, Mubarak’s birthday

Classification method of current study

Classification of documents into a fixed set of categories as a possible tool for the study of digitally-enabled movements

Supervised machine learning, reporting proportions of categories in a body of texts (Hopkins and King 2010)

Categories derived from theory, interviews, and coding process

Supervised machine learning

“Training set” is handcoded

“Test set” is uncoded, to be coded by the algorithm

“Supervised” means 1) categories known a priori; 2) involves handcoding

Think of supervised machine learning like regression analysis
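
The training-set and test-set split described above can be sketched with a small classifier. This is a minimal illustration only: it uses a multinomial Naive Bayes model, which is a stand-in chosen for brevity and not the method used in the study, and the messages and the function names `train_naive_bayes` and `classify` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(training_set):
    """Fit a multinomial Naive Bayes model on hand-coded (text, category) pairs."""
    category_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, category in training_set:
        category_counts[category] += 1
        for word in text.lower().split():
            word_counts[category][word] += 1
            vocabulary.add(word)
    return category_counts, word_counts, vocabulary


def classify(text, model):
    """Return the most probable category for an uncoded message."""
    category_counts, word_counts, vocabulary = model
    total_docs = sum(category_counts.values())
    best_category, best_score = None, float("-inf")
    for category, doc_count in category_counts.items():
        # Start from the log prior of the category.
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[category].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in the training set
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[category][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical hand-coded training set (categories known a priori).
training_set = [
    ("get to tahrir square at noon", "offline coordination"),
    ("meet at the square tomorrow", "offline coordination"),
    ("change your profile picture today", "internet action"),
    ("change your profile status", "internet action"),
]
model = train_naive_bayes(training_set)

# Hypothetical uncoded test set, to be coded by the algorithm.
test_set = ["everyone meet at tahrir", "new profile picture"]
predicted = [classify(message, model) for message in test_set]
```

As with regression, the model is fit on observations whose outcome is known and then used to predict the outcome for new observations.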

Categories for classification

Offline coordination (e-mobilization)
- Example: “get to Tahrir Square”

Internet action (e.g. “e-tactics”)
- Example: changing profile pictures

Media and press
- Example: links to BBC, al-Jazeera

Reporting on events
- Example: citizen journalism, pictures of events

Request for information
- Example: “What is happening right now in Tahrir?”

Expectations

1. Increased offline coordination directly before mobilization dates

2. Increased reporting and press on mobilization dates

...but not clear what will go afterward.

Analysis process

1. Data collection

2. Coding training set

3. Reliability testing of training set

4. Data preprocessing

5. Validation

6. Applying analysis across dataset

Data collection, coding, preprocessing

Data collection
- Scraping of FB group page, March-May 2008
- 64,197 messages, 3,841 unique users
- Messages in Arabic, English, and “Franco”

Human coding of “training set”
- 638 messages, assessed intercoder reliability

Data preprocessing
- Stemming

Generating different parameters per language
- Focusing only on Arabic, Franco
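
Intercoder reliability on the hand-coded training set can be checked with an agreement statistic. The slides do not say which statistic was used, so the sketch below assumes Cohen's kappa for two coders. The labels and the function name `cohens_kappa` are hypothetical.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: agreement between two coders, corrected for chance."""
    n = len(coder_a)
    # Share of messages on which the two coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's category frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders used a single identical category
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned by two coders to the same four messages.
coder_a = ["coordination", "media", "media", "reporting"]
coder_b = ["coordination", "media", "reporting", "reporting"]
kappa = cohens_kappa(coder_a, coder_b)
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance.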

Validation

k-fold cross validation as a common method of validation

Source: http://www.imtech.res.in/raghava/gpsr/Evaluation_Bioinformatics_Methods.htm
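
A minimal sketch of how k-fold cross-validation partitions a hand-coded set: each item is held out for validation exactly once, and the model is trained on the remaining folds. The function name `k_fold_splits` is hypothetical. In practice the items are usually shuffled first, which is omitted here to keep the example deterministic.

```python
def k_fold_splits(n_items, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_items))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


splits = list(k_fold_splits(n_items=10, k=5))
```

Averaging the validation accuracy over the k folds gives an estimate of how well the classifier would perform on uncoded messages.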

Validation results

Following Hopkins and King (2010), split the dataset in half, using one half to estimate the other
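
A split-half check of this kind compares the category proportions estimated for the held-out half with the proportions from hand coding. The sketch below is a simplification: Hopkins and King's method estimates proportions directly, whereas this example aggregates per-message labels. All labels and function names are hypothetical.

```python
from collections import Counter


def category_proportions(labels):
    """Return the share of each category in a list of labels."""
    counts = Counter(labels)
    total = len(labels)
    return {category: count / total for category, count in counts.items()}


def mean_absolute_error(estimated, actual):
    """Average absolute gap between two sets of category proportions."""
    categories = set(estimated) | set(actual)
    gaps = [abs(estimated.get(c, 0.0) - actual.get(c, 0.0)) for c in categories]
    return sum(gaps) / len(gaps)


# Hypothetical hand-coded labels for the held-out half, and hypothetical
# labels produced for the same messages by a model trained on the other half.
hand_coded = ["coordination", "media", "media", "reporting"]
machine_coded = ["coordination", "media", "reporting", "reporting"]

actual = category_proportions(hand_coded)
estimated = category_proportions(machine_coded)
error = mean_absolute_error(estimated, actual)
```

A small error indicates that the estimated proportions track the hand-coded ones, even if some individual messages are misclassified.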

Automated analysis

Mostly no mobilization

Peaks in offline coordination before days of mobilization

Language matters in requests for information

Discussion

Expectations
- Coordination increased before action
- ...but no other categories did

Possible avenues for error
- Coder misclassification in the training set
- Insufficient information in training set

Conclusion

Rise of “big data” necessitates new methods, development of “computational social science” (Lazer et al. 2010)

Drawing on computer-aided methods for content analysis of digitally-enabled movement texts

The process requires extensive prep, theory- and data-informed categories, sufficient case knowledge, validation

Necessary to integrate these methods with existing quantitative and qualitative ones

Supplementary material

Methods of automated textual analysis

Supervised methods

“Supervised” because humans do manual coding

Categories are defined a priori to coding

Analogous to regression

Unsupervised methods

“Unsupervised” because machine defines categories from statistical co-occurrence

Analogous to factor analysis

Multilingual stemming

Consists, consisting, consistent -> consist

العربية، عربية -> عربي

al 3rabiyya, 3rabiyya -> 3rabi
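
The English and Franco examples above can be reproduced with a toy affix stripper that keeps a separate rule set per language. This is an illustrative sketch only: the prefix and suffix lists are hypothetical and far smaller than those of a real stemmer, and the function name `toy_stem` is not from the original study.

```python
# Illustrative affix lists only; real stemmers use far richer rules.
SUFFIXES = {
    "english": ["ing", "ent", "s"],
    "franco": ["yya"],
}
PREFIXES = {
    "english": [],
    "franco": ["al "],
}


def toy_stem(token, language):
    """Strip at most one known prefix and one known suffix from a token."""
    stem = token.lower()
    for prefix in PREFIXES[language]:
        if stem.startswith(prefix):
            stem = stem[len(prefix):]
            break
    for suffix in SUFFIXES[language]:
        # Keep at least three characters so short words are not emptied.
        if stem.endswith(suffix) and len(stem) > len(suffix) + 2:
            stem = stem[:-len(suffix)]
            break
    return stem


english_stems = [toy_stem(w, "english") for w in ["Consists", "consisting", "consistent"]]
franco_stems = [toy_stem(w, "franco") for w in ["al 3rabiyya", "3rabiyya"]]
```

Collapsing inflected forms to one stem reduces the number of distinct features the classifier has to learn from a small training set.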

Dataset features

Word search analysis

Coordination terms low throughout

Media terms peak on days of action
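
A word search analysis of this kind can be sketched as the daily share of messages containing at least one term from a category's term list. The slides do not list the actual search terms, so the terms, messages, and function name `daily_term_share` below are all hypothetical.

```python
from collections import defaultdict

# Hypothetical search terms per category; not the terms used in the study.
SEARCH_TERMS = {
    "coordination": {"meet", "square", "march"},
    "media": {"bbc", "jazeera"},
}


def daily_term_share(messages):
    """Return, per day and category, the share of messages containing a search term."""
    totals = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for day, text in messages:
        totals[day] += 1
        words = set(text.lower().split())
        for category, terms in SEARCH_TERMS.items():
            if words & terms:  # at least one search term appears in the message
                hits[day][category] += 1
    return {
        day: {category: hits[day][category] / totals[day] for category in SEARCH_TERMS}
        for day in totals
    }


# Hypothetical (date, message) pairs.
messages = [
    ("2008-04-05", "meet at the square tomorrow"),
    ("2008-04-06", "bbc is covering the strike"),
    ("2008-04-06", "watch al jazeera now"),
]
shares = daily_term_share(messages)
```

Plotting these shares by date would show whether a category's terms rise around the days of action.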
