Computer-aided content analysis of digitally enabled movements

Computer-aided content analysis of digitally-enabled movements. Alexander Hanna, University of Wisconsin-Madison, @alexhanna


Transcript of Computer-aided content analysis of digitally enabled movements

Page 1: Computer-aided content analysis of digitally enabled movements

Computer-aided content analysis of digitally-enabled movements
Alexander Hanna
University of Wisconsin-Madison
@alexhanna

Page 2: Computer-aided content analysis of digitally enabled movements

Agenda

1. Focus on methods of prior digitally-mediated movement research

2. Outline application to specific case, April 6 Youth Movement

3. Describe and walk through coding procedure

Alexander Hanna, Wisconsin (@alexhanna)

Page 3: Computer-aided content analysis of digitally enabled movements

The rise of digitally-enabled movements

Digitally-enabled movement: a movement that has incorporated some aspect of online activity through information and communication technologies (ICTs)

Movements leave traces which are both 1) records of events and 2) movement activity in and of themselves


Page 4: Computer-aided content analysis of digitally enabled movements

Digitally-enabled movement activity

Coordinating people for other (usually offline) activities (e-mobilization in Earl and Kimport 2012)

Online mobilizations (e-tactics; ibid.)

Issue discussion and development of discourses (analogous to “free spaces” [Evans and Boyte 1986; Polletta 1999] or counterpublics)

Persuasion and micromobilization (Snow et al. 1986)


Page 5: Computer-aided content analysis of digitally enabled movements

Previous types of analysis

Case study (e.g. Gurak 1997, Eaton 2010)

Network analysis (e.g. Garrido and Halavais 2003; Bennett, Foot, and Xenos 2011)

Volume and group properties (e.g. Caren, Jowers, and Gaby 2012)


Page 6: Computer-aided content analysis of digitally enabled movements

Problems with these methods

Need to focus on content, but too much data for manual content analysis

Cost and time prohibitive to code by hand


Too many Datas.

Page 7: Computer-aided content analysis of digitally enabled movements

A solution - computer-aided content analysis

Computer-aided content analysis (also called automated content analysis; textual analysis)

- Goal: extracting information out of text
- Includes word search, statistical machine learning, language modeling

Note: this supplements deep case knowledge, encourages both inductive and deductive approaches


Page 8: Computer-aided content analysis of digitally enabled movements

Case: April 6 Youth Movement

Facebook group created as solidarity action with Egyptian workers

Proposed actions on April 6, strike date, and May 4, Mubarak’s birthday


Page 9: Computer-aided content analysis of digitally enabled movements

Classification method of current study

Classification of documents for a number of set categories as a possible tool for study of digitally-enabled movements

Supervised machine learning, reporting proportions of categories in a body of texts (Hopkins and King 2010)

Categories derived from theory, interviews, and coding process


Page 10: Computer-aided content analysis of digitally enabled movements

Supervised machine learning

“Training set” is handcoded

“Test set” is uncoded, to be coded by the algorithm


“Supervised” means 1) categories known a priori; 2) involves handcoding

Think of supervised machine learning like regression analysis
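
As a rough illustration of the training set / test set idea, the sketch below fits a small multinomial naive Bayes classifier on a handcoded training set and then codes an uncoded test set. Everything here is an assumption made for the example: the slides do not say which algorithm the study used, and the messages, category labels, and function names are invented.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(training_set):
    """Fit a multinomial naive Bayes model on (text, category) pairs."""
    category_counts = Counter()          # number of handcoded messages per category
    word_counts = defaultdict(Counter)   # word frequencies within each category
    vocabulary = set()
    for text, category in training_set:
        category_counts[category] += 1
        for word in text.lower().split():
            word_counts[category][word] += 1
            vocabulary.add(word)
    return category_counts, word_counts, vocabulary


def classify(model, text):
    """Return the most probable category for an uncoded message."""
    category_counts, word_counts, vocabulary = model
    total_docs = sum(category_counts.values())
    best_category, best_score = None, float("-inf")
    for category, n_docs in category_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[category].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # words never seen in training carry no information
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[category][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Invented, handcoded "training set" (hypothetical messages and labels).
training_set = [
    ("get to tahrir square at noon", "offline_coordination"),
    ("meet at the square tomorrow", "offline_coordination"),
    ("change your profile picture today", "internet_action"),
    ("share this picture on your profile", "internet_action"),
]
# Uncoded "test set", to be coded by the algorithm.
test_set = ["everyone meet at tahrir square", "please change your picture"]

model = train_naive_bayes(training_set)
predictions = [classify(model, message) for message in test_set]
print(predictions)
```

The regression analogy carries over: the handcoded categories play the role of the dependent variable, and the word counts play the role of the predictors.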

Page 11: Computer-aided content analysis of digitally enabled movements

Categories for classification

Offline coordination (e-mobilization)
- Example: “get to Tahrir Square”

Internet action (e.g. “e-tactics”)
- Example: changing profile pictures

Media and press
- Example: links to BBC, al-Jazeera

Reporting on events
- Example: citizen journalism, pictures of events

Request for information
- Example: “What is happening right now in Tahrir?”


Page 12: Computer-aided content analysis of digitally enabled movements

Expectations

1. Increased offline coordination directly before mobilization dates

2. Increased reporting and press on mobilization dates

...but it is not clear what will happen afterward.


Page 13: Computer-aided content analysis of digitally enabled movements

Analysis process

1. Data collection
2. Coding training set
3. Reliability testing of training set
4. Data preprocessing
5. Validation
6. Applying analysis across dataset


Page 14: Computer-aided content analysis of digitally enabled movements

Data collection, coding, preprocessing

Data collection
- Scraping of FB group page, March-May 2008
- 64,197 messages, 3,841 unique users
- Messages in Arabic, English, and “Franco”

Human coding of “training set”
- 638 messages, assessed intercoder reliability

Data preprocessing
- Stemming
- Generating different parameters per language
- Focusing only on Arabic, Franco
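
The slide mentions assessing intercoder reliability but does not name a statistic. As one common option, and purely as an assumption, the sketch below computes Cohen's kappa for two coders who labeled the same messages. The labels and the function name are invented for illustration.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: agreement between two coders, corrected for chance."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected agreement if each coder labeled at random with their own marginals.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels from two coders on four messages.
coder_a = ["coordination", "coordination", "media", "media"]
coder_b = ["coordination", "media", "media", "media"]
kappa = cohens_kappa(coder_a, coder_b)
print(round(kappa, 3))
```

Here the coders agree on 3 of 4 messages (0.75) while chance agreement is 0.5, so kappa is 0.5.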


Page 15: Computer-aided content analysis of digitally enabled movements

Validation


k-fold cross validation as a common method of validation

Source: http://www.imtech.res.in/raghava/gpsr/Evaluation_Bioinformatics_Methods.htm
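
A minimal sketch of how k-fold cross validation partitions a handcoded set: the items are shuffled, split into k folds, and each fold is held out once while the rest is used for training. The function name, the seed, and the choice of k = 5 are assumptions for illustration; 638 is the training set size reported in the slides.

```python
import random


def k_fold_indices(n_items, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and the number of items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)         # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]    # k roughly equal-sized folds
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train_idx, test_idx


# 638 handcoded messages, split into 5 folds (k = 5 is an arbitrary choice).
splits = list(k_fold_indices(638, 5))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

Each message is held out exactly once, so every handcoded label contributes to the accuracy estimate.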

Page 16: Computer-aided content analysis of digitally enabled movements

Validation results


Following Hopkins and King (2010), split dataset in half, using one half to estimate other
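
The sketch below shows one way such a split-half check could be scored: compare the estimated category proportions for the held-out half against the handcoded proportions. It does not implement the Hopkins and King estimator. The labels, category names, and the use of mean absolute error as the score are assumptions for illustration.

```python
from collections import Counter


def category_proportions(labels, categories):
    """Share of documents falling in each category."""
    counts = Counter(labels)
    return {c: counts[c] / len(labels) for c in categories}


def mean_absolute_error(estimated, actual):
    """Average absolute gap between estimated and actual proportions."""
    return sum(abs(estimated[c] - actual[c]) for c in actual) / len(actual)


categories = ["coordination", "media", "none"]

# Hypothetical held-out half: handcoded labels vs. labels produced by a model.
hand_coded = ["none", "none", "coordination", "media"]
estimated_labels = ["none", "none", "none", "media"]

actual = category_proportions(hand_coded, categories)
estimated = category_proportions(estimated_labels, categories)
error = mean_absolute_error(estimated, actual)
print(actual, estimated, round(error, 3))
```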

Page 17: Computer-aided content analysis of digitally enabled movements

Automated analysis


Page 18: Computer-aided content analysis of digitally enabled movements

Automated analysis

Mostly no mobilization


Page 19: Computer-aided content analysis of digitally enabled movements

Automated analysis

Mostly no mobilization

Peaks in offline coordination before days of mobilization


Page 20: Computer-aided content analysis of digitally enabled movements

Automated analysis

Mostly no mobilization

Peaks in offline coordination before days of mobilization

Language matters in requests for information


Page 21: Computer-aided content analysis of digitally enabled movements

Discussion

Expectations
- Coordination increased before action
- ...but no other categories did

Possible avenues for error
- Coder misclassification in the training set
- Insufficient information in training set


Page 22: Computer-aided content analysis of digitally enabled movements

Conclusion

Rise of “big data” necessitates new methods, development of “computational social science” (Lazer et al. 2010)

Drawing on computer-aided methods for content analysis of digitally-enabled movement texts

The process requires extensive prep, theory- and data-informed categories, sufficient case knowledge, validation

Necessary to integrate these methods with existing quantitative and qualitative ones


Page 23: Computer-aided content analysis of digitally enabled movements

Supplementary material


Page 24: Computer-aided content analysis of digitally enabled movements

Methods of automated textual analysis


Page 25: Computer-aided content analysis of digitally enabled movements

Supervised methods

“Supervised” because humans do manual coding

Categories are defined a priori to coding

Analogous to regression


Page 26: Computer-aided content analysis of digitally enabled movements

Unsupervised methods

“Unsupervised” because machine defines categories from statistical co-occurrence

Analogous to factor analysis


Page 27: Computer-aided content analysis of digitally enabled movements

Multilingual stemming

Consists, consisting, consistent -> consist

العربیة، عربیة -> عربي

al 3rabiyya, 3rabiyya -> 3rabi
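
To make the stemming examples concrete, here is a toy rule-based "light" stemmer that strips one prefix and one suffix per language. The rule lists are invented to cover only the English and Franco examples on this slide; they are not the stemmers used in the study, and a real stemmer for either language would need far more rules.

```python
# Hypothetical affix rules, just enough to reproduce the slide's examples.
RULES = {
    "english": {"prefixes": (), "suffixes": ("ing", "ent", "s")},
    "franco": {"prefixes": ("al ",), "suffixes": ("yya",)},
}

MIN_STEM_LENGTH = 3  # never strip a suffix if fewer than 3 characters would remain


def light_stem(token, language):
    """Strip at most one known prefix and one known suffix from a token."""
    rules = RULES[language]
    stem = token.lower().strip()
    for prefix in rules["prefixes"]:
        if stem.startswith(prefix):
            stem = stem[len(prefix):]
            break
    for suffix in rules["suffixes"]:
        if stem.endswith(suffix) and len(stem) - len(suffix) >= MIN_STEM_LENGTH:
            stem = stem[:-len(suffix)]
            break
    return stem


english = [light_stem(w, "english") for w in ("Consists", "consisting", "consistent")]
franco = [light_stem(w, "franco") for w in ("al 3rabiyya", "3rabiyya")]
print(english, franco)
```

Keeping a separate rule set per language mirrors the idea of generating different preprocessing parameters for each language.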


Page 28: Computer-aided content analysis of digitally enabled movements

Dataset features


Page 29: Computer-aided content analysis of digitally enabled movements

Word search analysis

Coordination terms low throughout

Media terms peak on days of action
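
A word search analysis of this kind can be sketched as counting, for each day, the share of messages that contain at least one term from a category's word list. The term list, the messages, and the function name below are invented for illustration and are not the study's data or dictionaries.

```python
from collections import Counter


def daily_term_share(messages, terms):
    """Map each day to the share of its messages containing any of the terms.

    messages: iterable of (date_string, text) pairs
    terms: set of lowercase search terms
    """
    totals, hits = Counter(), Counter()
    for day, text in messages:
        totals[day] += 1
        if set(text.lower().split()) & terms:
            hits[day] += 1
    return {day: hits[day] / totals[day] for day in totals}


# Hypothetical media-related search terms and messages.
media_terms = {"bbc", "jazeera"}
messages = [
    ("2008-04-05", "meet at the square"),
    ("2008-04-06", "BBC report on the strike"),
    ("2008-04-06", "see the jazeera video"),
]

shares = daily_term_share(messages, media_terms)
print(shares)
```

Plotting these daily shares over time gives the kind of series in which peaks on days of action would show up.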
