Computer-aided content analysis of digitally-enabled movements
Alexander Hanna
University of Wisconsin-Madison
@alexhanna
Agenda
1. Focus on methods of prior digitally-mediated movement research
2. Outline application to specific case, April 6 Youth Movement
3. Describe and walk through coding procedure
Alexander Hanna, Wisconsin (@alexhanna)
The rise of digitally-enabled movements
Digitally-enabled movement: a movement that has incorporated some aspect of online activity through information and communication technologies (ICTs)
Movements leave traces which are both 1) records of events and 2) movement activity in and of themselves
Digitally-enabled movement activity
Coordinating people for other (usually offline) activities (e-mobilization in Earl and Kimport 2012)
Online mobilizations (e-tactics, ibid)
Issue discussion and development of discourses (analogous to “free spaces” [Evans and Boyte 1986; Polletta 1999] or counterpublics)
Persuasion and micromobilization (Snow et al. 1986)
Previous types of analysis
Case study (e.g. Gurak 1997, Eaton 2010)
Network analysis (e.g. Garrido and Halavais 2003; Bennett, Foot, and Xenos 2011)
Volume and group properties (e.g. Caren, Jowers, and Gaby 2012)
Problems with these methods
Need to focus on content, but too much data for manual content analysis
Cost and time prohibitive to code by hand
Too many Datas.
A solution - computer-aided content analysis
Computer-aided content analysis (also called automated content analysis; textual analysis)
- Goal: extracting information out of text
- Includes word search, statistical machine learning, language modeling
Note: this supplements deep case knowledge, encourages both inductive and deductive approaches
Case: April 6 Youth Movement
Facebook group created as solidarity action with Egyptian workers
Proposed actions on April 6, strike date, and May 4, Mubarak’s birthday
Classification method of current study
Classifying documents into a fixed set of categories as a possible tool for studying digitally-enabled movements
Supervised machine learning, reporting proportions of categories in a body of texts (Hopkins and King 2010)
Categories derived from theory, interviews, and coding process
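A rough sketch of the identity behind proportion estimation in Hopkins and King (2010), as it is usually summarized; the notation here is an assumption and does not come from these slides. Let S be a word-stem profile and D the category of a document, with J categories:

```latex
P(S) = \sum_{j=1}^{J} P(S \mid D = j)\, P(D = j)
```

P(S) can be tabulated from the uncoded messages and P(S | D = j) from the hand-coded training set, so the category proportions P(D = j) are the unknowns to solve for. No individual message has to be classified to get the proportions.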
Supervised machine learning
“Training set” is handcoded
“Test set” is uncoded, to be coded by the algorithm
“Supervised” means 1) categories are known a priori; 2) the process involves handcoding
Think of supervised machine learning like regression analysis
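To make the training set / test set split concrete, here is a minimal sketch of a supervised text classifier. It uses a small multinomial Naive Bayes model with add-one smoothing, which is a stand-in chosen for brevity and not the estimator used in this study. The category labels and all example messages except "get to Tahrir Square" are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Count words per category from (text, category) pairs (the hand-coded training set)."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for text, category in labeled_docs:
        tokens = text.lower().split()
        word_counts[category].update(tokens)
        doc_counts[category] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the category with the highest log-probability for an uncoded message."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, float("-inf")
    for category in doc_counts:
        # Log prior: share of training messages in this category.
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # words never seen in training carry no information
            # Add-one smoothing avoids log(0) for words unseen in this category.
            count = word_counts[category][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical hand-coded training set (toy data, not from the study).
training_set = [
    ("get to tahrir square at noon", "offline_coordination"),
    ("meet at the square tomorrow", "offline_coordination"),
    ("change your profile picture today", "internet_action"),
    ("share this picture on your profile", "internet_action"),
]
model = train(training_set)

# An uncoded "test set" message, coded by the algorithm.
predicted = classify("everyone meet at tahrir", model)
```

The regression analogy carries over: the hand-coded messages play the role of observed outcomes, the word counts are the predictors, and the fitted model is then applied to cases where the outcome is unknown.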
Categories for classification
Offline coordination (e-mobilization)
- Example: “get to Tahrir Square”
Internet action (e.g. “e-tactics”)
- Example: changing profile pictures
Media and press
- Example: links to BBC, al-Jazeera
Reporting on events
- Example: citizen journalism, pictures of events
Request for information
- Example: “What is happening right now in Tahrir?”
Expectations
1. Increased offline coordination directly before mobilization dates
2. Increased reporting and press on mobilization dates
...but it is not clear what will happen afterward.
Analysis process
1. Data collection
2. Coding training set
3. Reliability testing of training set
4. Data preprocessing
5. Validation
6. Applying analysis across dataset
Data collection, coding, preprocessing
Data collection
- Scraping of FB group page, March-May 2008
- 64,197 messages, 3,841 unique users
- Messages in Arabic, English, and “Franco”
Human coding of “training set”
- 638 messages, assessed intercoder reliability
Data preprocessing
- Stemming
Generating different parameters per language
- Focusing only on Arabic, Franco
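The intercoder reliability check on the hand-coded messages can be illustrated with Cohen's kappa, which corrects raw agreement between two coders for agreement expected by chance. The slides do not say which reliability statistic was used, so kappa is an assumption here, and the codes below are invented toy data.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labeled the same messages in the same order."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must label the same non-empty set of messages")
    n = len(coder_a)
    # Observed agreement: share of messages where both coders chose the same category.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of each coder's marginal category shares, summed.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes from two coders for four messages.
coder_a = ["coordination", "coordination", "media", "media"]
coder_b = ["coordination", "media", "media", "media"]
kappa = cohens_kappa(coder_a, coder_b)
```

Here the coders agree on 3 of 4 messages (0.75) while chance agreement is 0.5, so kappa works out to 0.5.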
Validation
k-fold cross-validation is a common method of validation
Source: http://www.imtech.res.in/raghava/gpsr/Evaluation_Bioinformatics_Methods.htm
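A minimal sketch of how k-fold cross-validation partitions a hand-coded set: the items are divided into k folds, and each fold takes one turn as the held-out test set while the remaining folds are used for training. The function name and the round-robin fold assignment are illustrative choices; in practice the items would usually be shuffled first.

```python
def k_fold_splits(n_items, k):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and the number of items")
    indices = list(range(n_items))
    # Round-robin assignment: item i goes to fold i % k.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train_idx, test_idx


# Ten hand-coded messages split into five folds.
splits = list(k_fold_splits(10, 5))
```

Every item appears in exactly one test fold, so each hand-coded message is predicted once by a model that never saw it during training.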
Validation results
Following Hopkins and King (2010), split dataset in half, using one half to estimate other
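A sketch of how a split-half check can be scored when the quantity of interest is the proportion of messages per category instead of individual labels. The hand codes for the held-out half give the actual proportions; the estimate comes from whatever method was trained on the other half. Both lists below are invented, and the per-message "estimated" labels are only a stand-in, since a Hopkins and King style estimator produces proportions directly.

```python
from collections import Counter


def proportions(labels, categories):
    """Share of labels falling in each category."""
    counts = Counter(labels)
    return {c: counts[c] / len(labels) for c in categories}


def mean_abs_error(estimated, actual):
    """Average absolute gap between estimated and actual category shares."""
    return sum(abs(estimated[c] - actual[c]) for c in actual) / len(actual)


categories = ["coordination", "media", "none"]

# Hypothetical hand codes for the held-out half.
held_out = ["none", "none", "coordination", "media",
            "none", "none", "coordination", "none"]
# Hypothetical output of a model trained on the other half.
estimated = ["none", "none", "coordination", "none",
             "none", "none", "coordination", "none"]

actual_props = proportions(held_out, categories)
estimated_props = proportions(estimated, categories)
error = mean_abs_error(estimated_props, actual_props)
```

In this toy case the estimate misses the single media message, so the shares for "media" and "none" are each off by 0.125.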
Automated analysis
Mostly no mobilization
Peaks in offline coordination before days of mobilization
Language matters in requests for information
Discussion
Expectations
- Coordination increased before action
- ...but no other categories did
Possible avenues for error
- Coder misclassification in the training set
- Insufficient information in training set
Conclusion
Rise of “big data” necessitates new methods, development of “computational social science” (Lazer et al. 2010)
Drawing on computer-aided methods for content analysis of digitally-enabled movement texts
The process requires extensive prep, theory- and data-informed categories, sufficient case knowledge, validation
Necessary to integrate these methods with existing quantitative and qualitative ones
Supplementary material
Methods of automated textual analysis
Supervised methods
“Supervised” because humans do manual coding
Categories are defined a priori to coding
Analogous to regression
Unsupervised methods
“Unsupervised” because the machine defines categories from statistical co-occurrence
Analogous to factor analysis
Multilingual stemming
Consists, consisting, consistent -> consist
العربیة، عربیة -> عربي
al 3rabiyya, 3rabiyya -> 3rabi
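The English and Franco examples above can be reproduced with a toy rule-based stemmer. This is only a sketch: the affix lists are hypothetical and cover just these examples, real stemmers such as Porter or Snowball use far more elaborate rules, and the Arabic-script case is left out.

```python
# Hypothetical affix lists, chosen only to cover the slide's examples.
ENGLISH_SUFFIXES = ("ing", "ent", "s")
FRANCO_SUFFIXES = ("yya",)
FRANCO_ARTICLES = {"al"}


def strip_suffix(token, suffixes, min_stem=3):
    """Remove the longest matching suffix, keeping at least min_stem characters."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem:
            return token[: -len(suffix)]
    return token


def stem_english(text):
    """Lowercase, split on whitespace, and strip English suffixes."""
    return [strip_suffix(t, ENGLISH_SUFFIXES) for t in text.lower().split()]


def stem_franco(text):
    """Drop the standalone article and strip Franco suffixes."""
    tokens = [t for t in text.lower().split() if t not in FRANCO_ARTICLES]
    return [strip_suffix(t, FRANCO_SUFFIXES) for t in tokens]


english_stems = stem_english("Consists consisting consistent")
franco_stems = stem_franco("al 3rabiyya")
```

Keeping separate rule sets per language mirrors the idea of generating different parameters for each language in the dataset.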
Dataset features
Word search analysis
Coordination terms low throughout
Media terms peak on days of action
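A word search analysis of this kind can be sketched as the daily share of messages containing at least one term from a category dictionary. The function name, the term list, and the messages below are all invented for illustration; only the April 6, 2008 date comes from the case.

```python
from collections import defaultdict


def daily_term_share(messages, terms):
    """Share of messages per day that contain at least one dictionary term.

    messages: iterable of (day, text) pairs.
    """
    terms = {t.lower() for t in terms}
    totals = defaultdict(int)
    hits = defaultdict(int)
    for day, text in messages:
        totals[day] += 1
        # Whole-word match against the whitespace-tokenized message.
        if terms & set(text.lower().split()):
            hits[day] += 1
    return {day: hits[day] / totals[day] for day in sorted(totals)}


# Hypothetical messages; the media dictionary is a toy example.
messages = [
    ("2008-04-05", "meet at the square tomorrow"),
    ("2008-04-05", "good night everyone"),
    ("2008-04-06", "bbc report on the strike"),
    ("2008-04-06", "aljazeera covering the strike now"),
]
media_share = daily_term_share(messages, ["bbc", "aljazeera"])
```

Plotting these daily shares over the collection window gives the kind of term-frequency time series summarized on this slide.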