Towards Automatic Moderation of Online Hate Speech - Emily Spahn, March 2016
Online Hate Speech
Towards automated moderation
Emily Y. Spahn
Galvanize Data Science Immersive, Seattle, Mar 2016
What is Hate Speech?
Speech advocating incitement to harm based on the target's membership in a group
Definition
The Problem with Hate Speech Online
Alienates Users & Costs Time
AUTOMATE!
⇒ Build a model to predict if a comment is hate speech, and if so, against what group.
What can we do about online hate speech?
DATA SOURCE: May 2015 Reddit comments
Data Used: subreddits & data labeling
Reddit Comments from May 2015
54.5+ million comments with metadata, from 50,138 subreddits
Hateful Subreddits
11 hateful subreddits
565,494 hateful comments:
● 56% body size
● 33.6% gender
● 9.4% race
● 1% religion
Not Hateful Subreddits
13 not hateful subreddits
1,012,052 not hateful comments:
● 75% sometimes controversial but well-moderated subreddits
● 11.2% gender
● 7.7% religion
● 5.4% body size
● 0.4% race
Tools Used
Computing & Analysis
Natural Language Processing & Classification Modeling
NLTK
Modeling
TF-IDF on 1.1 million comments
XGBoost multi-class classifier
Word2Vec for word embeddings
TF-IDF: Term Frequency-Inverse Document Frequency
words in comments
Image from http://brandonrose.org/clustering
matrix of numbers
i: the word
j: the document
Bag of words + factor to weight rarely occurring words more than common ones
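The weighting described on this slide can be sketched in a few lines. This is a minimal illustration of the textbook TF-IDF formula (term frequency times the log of inverse document frequency), not the implementation used in the talk; library versions typically add smoothing and normalization. The function name `tf_idf` and the three sample comments are made up for the example.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: weight} dict per document (textbook TF-IDF)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        row = {}
        for word, count in counts.items():
            tf = count / len(tokens)                # how often the word occurs here
            idf = math.log(n_docs / doc_freq[word])  # rarer words get a larger factor
            row[word] = tf * idf
        weights.append(row)
    return weights


# Made-up sample comments, for illustration only.
comments = [
    "the game was fun",
    "the weather was cold",
    "the game ran late",
]
weights = tf_idf(comments)
```

A word that appears in every comment ("the") gets weight zero, while a word unique to one comment ("fun") outweighs one shared by two comments ("game").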
Gradient Boosted Trees Classifier
From XGBoost Documentation
Decision trees:
Tree Ensembles
Gradient Boosted Trees Classifier
Working on labeled data:
● Create one tree & run model
● Find residuals (differences between model result & labeled data)
● Create 2nd tree to fit to the residuals
● New results = results from 1st tree + those from 2nd tree
● Find new residuals
● Repeat, adding a tree to the model each time to fit the residuals, until you reach a cutoff criterion.
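The steps above can be sketched with one-split trees ("stumps") on a single numeric feature. This is a simplified illustration under stated assumptions: it fits raw residuals with squared error on a numeric target, whereas a classifier such as XGBoost works from the gradient of a classification loss and adds regularization. The helper names, the toy data, and the cutoff (`max_trees`, `tolerance`) are all made up for the example.

```python
def fit_stump(xs, residuals):
    """Fit a one-split tree: pick the threshold on x that best fits the residuals."""
    best = None
    # Candidate thresholds exclude the largest x so both sides stay non-empty.
    # (Assumes xs holds at least two distinct values.)
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def predict(trees, x):
    """Model output = sum of the outputs of all trees added so far."""
    return sum(left if x <= threshold else right
               for threshold, left, right in trees)


def squared_error(trees, xs, ys):
    return sum((y - predict(trees, x)) ** 2 for x, y in zip(xs, ys))


def boost(xs, ys, max_trees=10, tolerance=1e-9):
    """Add one tree per round, each fitted to the current residuals."""
    trees = []
    for _ in range(max_trees):                      # cutoff 1: tree budget
        residuals = [y - predict(trees, x) for x, y in zip(xs, ys)]
        if sum(r * r for r in residuals) <= tolerance:
            break                                   # cutoff 2: residuals are gone
        trees.append(fit_stump(xs, residuals))
    return trees


# Toy labeled data, for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 1, 0, 1, 1]
one_tree = boost(xs, ys, max_trees=1)
ten_trees = boost(xs, ys, max_trees=10)
```

Comparing `squared_error(one_tree, xs, ys)` with `squared_error(ten_trees, xs, ys)` shows the error shrinking as trees are added, since each new tree only has to correct what the earlier ones got wrong.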
ROC Curve: Examine classification model success
Most important features
fat
like
peopl
just
white
dont
fuck
im
becaus
game
jew
women
weight
say
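For the ROC curve named on the evaluation slide, the following sketch shows how such a curve and its area can be computed for one class treated as the positive class. How the talk handled the multi-class case is not stated, so the one-class framing is an assumption, and the labels and scores below are made up.

```python
def roc_curve(labels, scores):
    """Return (false positive rate, true positive rate) points,
    lowering the decision threshold one distinct score at a time."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after all comments sharing this score are counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Made-up labels (1 = hateful) and model scores, for illustration only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
curve = roc_curve(labels, scores)
area = auc(curve)
```

An area of 1.0 means every hateful comment scored above every non-hateful one; 0.5 is what random scores would give.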
Potential Use Cases for the Predictive Model
More time for the mods!
User posts hateful comment
Model flags comment as hateful
Comment is in limbo until a human moderator reads it
Human evaluates comment and publishes or deletes
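The four-step flow above can be sketched as a small state machine. Everything named here is hypothetical: the `Comment` class, the status strings, the 0.5 threshold, and `toy_model`, which merely stands in for the trained classifier.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    text: str
    status: str = "submitted"  # later: "published", "pending_review" or "deleted"


def submit(comment, predict_hate_probability, threshold=0.5):
    """Hold the comment for review if the model flags it; otherwise publish."""
    if predict_hate_probability(comment.text) >= threshold:
        comment.status = "pending_review"   # comment is in limbo
    else:
        comment.status = "published"
    return comment


def review(comment, moderator_approves):
    """A human moderator publishes or deletes a held comment."""
    if comment.status != "pending_review":
        raise ValueError("only held comments need review")
    comment.status = "published" if moderator_approves else "deleted"
    return comment


def toy_model(text):
    # Hypothetical stand-in for the classifier: flags one placeholder word.
    return 0.9 if "badword" in text else 0.1


ok = submit(Comment("nice post"), toy_model)
held = submit(Comment("some badword here"), toy_model)
```

Moderators then only read the held comments, calling `review(held, moderator_approves=False)` to delete or `True` to publish.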
Power to the People!
Indicate via user icons or status information those who have a recent history of hateful comments.
Let site users decide if they want to read what this person has to say.
Word2Vec: Most Similar Words
“fat”
skinny
ugly
lazy
lard
fatshit
fatass
slenderman
gtbanned
stupid
hamplanet
skinny
overweight
obese
underweight
and
muscular
that
body
is
anorexic
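"Most similar" lists like the ones above are usually produced by ranking words by the cosine similarity of their embedding vectors. The sketch below shows that ranking on made-up three-dimensional vectors; real Word2Vec embeddings are learned from the comments and have many more dimensions, and the function names here are illustrative rather than any library's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def most_similar(word, embeddings, top_n=3):
    """Rank all other words by cosine similarity to the query word."""
    query = embeddings[word]
    scored = [(other, cosine_similarity(query, vector))
              for other, vector in embeddings.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Made-up vectors, for illustration only.
toy_embeddings = {
    "fat": [0.9, 0.1, 0.0],
    "obese": [0.8, 0.2, 0.1],
    "overweight": [0.7, 0.3, 0.0],
    "game": [0.0, 0.1, 0.9],
}
neighbours = most_similar("fat", toy_embeddings, top_n=2)
```

Training the embeddings on different sets of comments changes the vectors, and therefore which neighbours a word such as "fat" ends up with.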
Thank You!
Emily Y. Spahn
@eyspahn
https://github.com/eyspahn/OnlineHateSpeech
Clip art in the presentation from https://openclipart.org/
Example Comment
Data Used: subreddits
Hateful Subreddits
Subreddit Name    Comment Count    Hate Type
CoonTown          51979            Race
WhiteRights       1352             Race
Transfags         2362             Gender
SlutJustice       209              Gender
TheRedPill        59145            Gender
KotakuInAction    128156           Gender
IslamUnveiled     110              Religion
GasTheKikes       919              Religion
AntiPOZi          4740             Religion
fatpeoplehate     311183           Size
TalesofFatHate    5239             Size
Not Hateful Subreddits
Subreddit Name
politics           DebateReligion
worldnews          religion
history            islam
blackladies        Judaism
lgbt               BodyAcceptance
TransSpace         fatlogic
TwoXChromosomes    women
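The tables suggest that each comment inherits its label from the subreddit it was posted in. A minimal sketch of that labeling step, assuming exactly that rule, is shown below; the label strings and the function name are hypothetical, and only a few subreddits from each table are included.

```python
# A few entries from the hateful-subreddit table, mapped to their hate type.
HATE_TYPE_BY_SUBREDDIT = {
    "CoonTown": "race",
    "TheRedPill": "gender",
    "IslamUnveiled": "religion",
    "fatpeoplehate": "size",
}

# A few entries from the not-hateful list.
NOT_HATEFUL_SUBREDDITS = {"politics", "history", "lgbt", "fatlogic"}


def label_comment(subreddit):
    """Return the training label for a comment, based only on its subreddit."""
    if subreddit in HATE_TYPE_BY_SUBREDDIT:
        return HATE_TYPE_BY_SUBREDDIT[subreddit]
    if subreddit in NOT_HATEFUL_SUBREDDITS:
        return "not_hateful"
    return None  # subreddit is in neither list, so the comment is left out


labels = [label_comment(name) for name in ["fatpeoplehate", "politics", "aww"]]
```

Labeling by subreddit is cheap at this scale, at the cost of some noise: not every comment in a hateful subreddit is hateful, and the reverse also holds.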