Artificial Intelligence for Policing
Transcript of Artificial Intelligence for Policing
Oxford Internet Institute 23rd November 2017
Artificial Intelligence for Policing
Presenting: Miriam Fernandez, Knowledge Media Institute
Lots of other faces behind this work! @miriam_fs
fernandezmiriam
@miriamfs
What do you think when you hear “Artificial Intelligence for Policing”?
https://www.youtube.com/watch?v=lG7DGMgfOb8 (2002 movie)
https://www.facebook.com/financialtimes/videos/vb.8860325749/10155507438890750/?type=2&theater Financial Times 2017
Drones!
Autonomous Weapons!
Surveillance!
Washington Post (October 2016)
Three lines of work presented in this talk
• Policing Engagement via Social Media
• Detecting Grooming Behaviour on Social Media
• Radicalisation detection on Social Media
Policing Engagement via Social Media
Miriam Fernandez, Tom Dickinson, and Harith Alani. "An analysis of UK policing engagement via social media." International Conference on Social Informatics. Springer International Publishing, 2017.
Miriam Fernandez, A. Elizabeth Cano, and Harith Alani. "Policing engagement via social media." International Conference on Social Informatics. Springer International Publishing, 2014.
Policing Engagement via Social Media
• Policing organisations use social media to spread the word on crime, severe weather, missing people, …
• Many forces have staff dedicated to this purpose and to improve the spreading of key messages to wider social media communities
• Research shows that exchanges between police and citizens are infrequent
Goal
• Understand what attracts citizens to social media policing content
  – What are the characteristics of the content that generates higher attention levels?
    • Writing style
    • Time of posting
    • Topics
  – Help police forces to identify actions and recommendations to increase public engagement
Context: UK Policing
Corporate / Non-corporate
Understanding Engagement
• Social media engagement has been studied
  – Through multiple lenses (marketing, social sciences, computer science)
  – In multiple scenarios (product selling, elections, campaigns, etc.)
• Study the literature of social media engagement
  – [Ariely] Very clear message with a very concrete action
    • Patrol, missing persons, incidents, emergencies, local authorities? What can/should I do?
  – [Vaynerchuk] Need to differentiate each social medium (context)
    • What happens in the world? To whom is the message targeted?
• Study the literature of social media police engagement
  – Works mainly focus on studying the different social media strategies that police forces use to interact with the public
    • [Denef] UK Riots 2011. Instrumental vs. expressive approach
Barriers of Social Media Police Engagement (I)
• Legitimacy
  The police need the trust and confidence of the communities they serve
Barriers of Social Media Police Engagement (II)
• Reputation
• Official communication channels (911)
• Surveillance
• Variety of topics
• Budget
Approach (I)
• Data Collection
  – 154,679 posts from 48 corporate Twitter accounts
  – 1,300,070 posts from 2,450 non-corporate Twitter accounts
  – January 2017
• Engagement Indicators
  – Retweets
    • % of tweets retweeted
    • Average number of retweets per tweet
  – Favourites (likes)
    • % of tweets favourited (liked)
    • Average number of likes per tweet
  – Replies
    • At the time of analysis, the Twitter API did not allow collecting replies per tweet
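The retweet and favourite indicators listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Tweet` record, its field names and the sample values are hypothetical and are not taken from the study's dataset or code.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    # Hypothetical record; field names are assumptions for illustration.
    account: str
    retweets: int  # number of times the tweet was retweeted
    likes: int     # number of times the tweet was favourited (liked)


def engagement_indicators(tweets):
    """Compute the four indicators for a non-empty list of tweets."""
    n = len(tweets)
    return {
        "pct_retweeted": sum(t.retweets > 0 for t in tweets) / n,
        "avg_retweets": sum(t.retweets for t in tweets) / n,
        "pct_liked": sum(t.likes > 0 for t in tweets) / n,
        "avg_likes": sum(t.likes for t in tweets) / n,
    }


# Made-up sample for a single hypothetical account.
sample = [
    Tweet("example_force", retweets=12, likes=3),
    Tweet("example_force", retweets=0, likes=1),
    Tweet("example_force", retweets=4, likes=0),
    Tweet("example_force", retweets=0, likes=0),
]
indicators = engagement_indicators(sample)
print(indicators)
```

Grouping the tweets by account before calling `engagement_indicators` would give one set of indicators per account, which is the granularity the charts in this part of the talk use.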
Just for some fun! :) How am I doing?
Engagement Indicators (I)
• Most accounts have more than 60% of tweets retweeted
  – Top 5: MET, Nottinghamshire, Northumbria, Northamptonshire, Cumbria
[Figure: bar chart of % tweets retweeted (axis from 0 to 1.2) for the 48 corporate accounts: northumbriapol, nottspolice, JerseyPolice, DurhamPolice, NYorksPolice, Cumbriapolice, swpolice, policescotland, SuffolkPolice, DC_Police, CityPolice, NWPolice, StaffsPolice, HertsPolice, NCA_UK, ClevelandPolice, HantsPolice, Humberbeat, kent_police, DyfedPowys, gwentpolice, CambsCops, LancsPolice, leicspolice, WMerciaPolice, cheshirepolice, sussex_police, warkspolice, WMPolice, PoliceServiceNI, EssexPoliceUK, ThamesVP, NorthantsPolice, bedspolice, metpoliceuk, NorfolkPolice, Glos_Police, ASPolice, dorsetpolice, wiltshirepolice, WestYorksPolice, lincspolice, MerseyPolice, SurreyPolice, gmpolice, iompolice, syptweet, DerbysPolice]
Engagement Indicators (II)
• Most accounts receive on average 10 retweets per tweet
  – Top 5: MET, Jersey, National Crime Agency, West Midlands, Scotland
[Figure: bar chart of the average number of retweets (axis from 0 to 70) per corporate police Twitter account]
Engagement Indicators (III)
• Some organisations retweet from others rather than originating discussions
  – Northumbria, Nottinghamshire, Jersey, Durham, North Yorkshire
[Figure: bar chart of the ratio of non-original tweets (axis from 0 to 0.7) per corporate police Twitter account]
Non-Corporate accounts (I)
• 50% of the accounts have more than 60% of tweets retweeted
• Top 47 accounts have a higher ratio of retweets than corporate organisations (around 80%)
[Figure: % of tweets retweeted per non-corporate account (axis from 0 to 1)]
Approach (II)
• Feature Extractors
  – Describe tweets in terms of their characteristics
  – Content Features
    • Length / Readability / Informativeness / Complexity / Sentiment
    • Media / mentions / hashtags / URLs
    • Time in the day
  – User Features
    • Network: In-degree / out-degree
    • Activity: Post count / post rate / age in the system
  – Semantic Features
    • Use knowledge bases to extract entities and concepts
      – Persons / Organisations / Locations
• Use Machine Learning techniques to determine the characteristic “patterns” of those tweets receiving higher engagement levels
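A few of the content features listed above (length, mentions, hashtags, URLs, media, time of day) can be sketched as below. This is a minimal illustration, not the feature extractor used in the study: the function name, the regular expressions and the example tweet are assumptions, and features such as readability, informativeness and sentiment are left out because the slides do not say how they were computed.

```python
import re
from datetime import datetime


def content_features(text, posted_at, has_media=False):
    """Describe one tweet through simple, surface-level content features."""
    return {
        "length_words": len(text.split()),
        "n_mentions": len(re.findall(r"@\w+", text)),
        "n_hashtags": len(re.findall(r"#\w+", text)),
        "n_urls": len(re.findall(r"https?://\S+", text)),
        "has_media": has_media,
        "hour_of_day": posted_at.hour,
    }


# Made-up example tweet from a hypothetical account.
features = content_features(
    "Missing person appeal: please share #missing @example_force https://example.org/appeal",
    datetime(2017, 1, 15, 9, 30),
    has_media=True,
)
print(features)
```

Each tweet becomes one feature dictionary; a table of such rows, labelled by engagement level, is the kind of input a standard classifier can learn patterns from.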
Results (I)
• Tweets receiving higher engagement are:
  – Longer, easier to read, more informative, lower complexity (avoid complex terms), include media items (images, videos).
  – In terms of user features they tend to be posted by accounts with a high number of followers (corporate) or with a high post rate and a high in-out degree ratio (non-corporate).
[Figure: four box plots comparing the neg and pos classes: length, readability, informativeness, polarity]
Results (II)
• Tweets receiving higher engagement talk about
  – Weather / roads and infrastructures / events / missing persons
  – Raise awareness (domestic abuse, hate crime, modern slavery)
  – Tend to mention locations
• Tweets receiving lower engagement talk about
  – Crime updates: such as burglary, assault or driving under the influence of alcohol
  – Following requests (#ff)
  – Advice to stay safe
Results (III)
• Non-corporate accounts generate on average higher engagement
  – Offer help, ask for help, advise on local issues, reassure safety, etc. (#wearehereforyou)
• Three additional ingredients
  – They retweet messages about relevant events and popular users
  – They engage more closely with the communities (direct messages and mentions to citizens)
  – They are fun!
Engagement Guidelines
• Focus
  – Consider the key goal to achieve / the audience to engage (general public, local communities, teenagers) & provide a clear message with a concrete set of actions associated with it
• Be clear
  – Complex messages with police jargon are difficult to understand. Messages should be simple, informative and useful. Use images/videos and humour to enhance dissemination
• Interact
  – Engage with the communities rather than only broadcast. Identify highly engaging police staff members and community leaders and involve them
• Stay active
  – Engagement is a long-term commitment. Accounts active for a longer time receive higher engagement.
• Be respectful
  – Reputation and legitimacy are extremely important. Post polite, safe and respectful content
Detecting Grooming Behaviour on Social Media
Cano, A. E.; Fernandez, M.; and Alani, H. (2014). Detecting child grooming behaviour patterns on social media. The 6th International Conference on Social Informatics (SocInfo), Barcelona, Spain.
Slides provided by Harith Alani (Professor of Web Science, Knowledge Media Institute) @halani
Child Grooming
Premeditated behaviour intending to secure the trust of a minor as a first step towards future engagement in sexual conduct.
Choo, K-K R. Responding to online child sexual grooming: an industry perspective, Trends & issues in crime and criminal justice, no. 379. July 2009
Claire Lilley, Ruth Ball, Heather Vernon, The experiences of 11-16 year olds on social networking sites, NSPCC 2014
“findings show that approximately 190,000 UK children (1 in 58) will suffer contact sexual abuse by a non-related adult before turning 18, with approximately 10,000 new child victims of contact sexual abuse being reported in the UK each year.”
“50% of all 11 and 12 year-olds in the UK use a social networking site, according to our research. This is because it's easy for children to access sites intended for older users.”
https://www.nspcc.org.uk/preventing-abuse/keeping-children-safe/share-aware/
https://www.statista.com/statistics/271348/facebook-users-in-the-united-kingdom-uk-by-age/
Children’s use of mobile phones - A special report 2014. http://www.gsma.com/publicpolicy/wp-content/uploads/2012/03/GSMA_Childrens_use_of_mobile_phones_2014.pdf
https://www.thinkuknow.co.uk/parents/articles/Online-grooming/
Online Grooming
https://www.thinkuknow.co.uk/14_plus/Need-advice/Online-grooming/
Signs of Online Grooming
Predator: hey whats up?…
Predator: I like your pic, very cute
Predator: so you're in san diego?
13-yr-old-girl: not far
Predator: ok, you like older guys?
13-yr-old-girl: thers nice or bad ppl all ages
Predator: have some pics if you want to see
Predator: do your parents look on your computer?
Predator: so are you by yourself or is someone else there with you?
Predator: so it should just be us, our little secret
Predator: so have you ever snuck out?
13-yr-old-girl: not rlly lol
Predator: yeah, what about tonight?
Predator: think you could sneak out tonight?
Predator: well if the wrong person found out then I'd be screwed
13-yr-old-girl: im not a teller lol
Predator: I know, just wouldn't want your dad to find out
Predator: if you are still up why not sneak out for a few minutes
Predator: but that's the fun of it
13-yr-old-girl: fun to sneak?
Predator: yes
Predator: so your dad doesn't know
Predator: would take a nap but I leave for bible study around 6:30
Predator: I know I'm bad, going to bible study and talking about sex with you
Predator: yeah, there's nothing wrong with us being friends, we have the same lord remember ;)
Predator: would take me like an hour and a half to get there
Predator: see you in a little while
~700 messages
Over a 5 month period
Grooming in Action
Olson, L. N., Daggs, J. L., Ellevold, B. L. and Rogers, T. K. K. (2007), Entrapping the Innocent: Toward a Theory of Child Sexual Predators’ Luring Communication. Communication Theory, 17: 231–251
Olson’s Theory of Luring Communication (LCT)
[The conversation above, annotated with stage labels: Approach, Grooming, Trust Development, Isolation, Physical Approach]
Can we automatically identify these stages?
Identifying Grooming Stages
“think you could sneak out tonight?”
Automatic classifiers: Grooming | Trust Development | Physical Approach | Other
Classifier output: Yes | No | No | No
Dataset
• 50 transcripts of conversations between convicted predators and volunteers who posed as minors
• Conversations range from 83 to 12K lines.
• Each predator line manually labelled by two annotators.
• Annotation labels: 1) Trust development, 2) Grooming, 3) Seek physical approach, 4) Other.
Sentences per label:
Trust Dev. | Grooming | Phys. Approach | Other
1225 | 3304 | 2700 | 3304
Processing Chat Text
• Challenges in processing chat-room conversations
  – Use of irregular and ill-formed words.
  – Use of chat slang and teen-lingo
  – Use of emoticons.
Generated a list of over 1K terms and definitions:
Chat term | Translation | Emoticon | Translation
ASLP | Age, sex, location, picture | :’-( | I’m crying
AWGTHTHTTA | Are we going to have to go through this again? | o/\o | High five
BRB | Be right back | @_@ | I’m tired, trying to stay awake
CWOT | Complete waste of time | ( ‘}{‘ ) | kiss
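A dictionary lookup is one simple way to apply such a list of terms and definitions before feature extraction. The sketch below is an assumption about how the translation step could work, not the pipeline used in the paper: the function name and whitespace tokenisation are illustrative choices, and only a handful of entries from the table above are included.

```python
# A few entries from the table above; the real list has over 1K terms.
CHAT_LEXICON = {
    "aslp": "age, sex, location, picture",
    "brb": "be right back",
    "cwot": "complete waste of time",
}
EMOTICONS = {
    "@_@": "I'm tired, trying to stay awake",
    "o/\\o": "high five",
}


def normalise(message, lexicon=CHAT_LEXICON, emoticons=EMOTICONS):
    """Replace known chat terms and emoticons by their translations."""
    out = []
    for token in message.split():
        if token in emoticons:
            # Emoticons are matched exactly, since case and symbols matter.
            out.append(emoticons[token])
        else:
            # Chat terms are matched case-insensitively; unknown tokens pass through.
            out.append(lexicon.get(token.lower(), token))
    return " ".join(out)


normalised = normalise("BRB mom calling @_@")
print(normalised)
```

Whitespace tokenisation keeps the sketch short; a real chat log would also need handling for punctuation attached to tokens and for misspelled variants.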
Analysis Features and Results
Feature | Description
N-gram | Word combinations extracted from text (N=1,2,3)
Part-of-speech tagging | Noun, verb, adjective, plural, etc.
Sentiment | Average sentiment of terms in sentence
Length | Number of words in sentence
Psycho-linguistic patterns | 62 psycho-linguistic patterns in English (swearing, sexual, agreement, etc.)
Semantic frames | Type of event, relation, or entity in text, e.g., secrecy, desirability, emotion, kinship

Results with all features:
Measure | Trust Development | Grooming | Phys. Approach | Average
Precision | 79.2% | 87.6% | 87.2% | 84.7%
Recall | 82.3% | 88.8% | 88.7% | 86.6%
F1 | 80.7% | 88.2% | 87.9% | 85.6%
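As a quick consistency check on the results table, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes it from the reported precision and recall values; the variable names are illustrative, and reading the "average" column as an unweighted mean over the three stages is an assumption that the numbers happen to support.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) in percent, copied from the table above.
reported = {
    "Trust Development": (79.2, 82.3, 80.7),
    "Grooming": (87.6, 88.8, 88.2),
    "Phys. Approach": (87.2, 88.7, 87.9),
}

recomputed = {stage: round(f1(p, r), 1) for stage, (p, r, _) in reported.items()}

# Unweighted means over the three stages (assumed meaning of "average").
avg_precision = round(sum(p for p, _, _ in reported.values()) / 3, 1)
avg_recall = round(sum(r for _, r, _ in reported.values()) / 3, 1)
avg_f1 = round(sum(f for _, _, f in reported.values()) / 3, 1)

print(recomputed, avg_precision, avg_recall, avg_f1)
```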
Radicalisation detection on Social Media
Saif, H., Fernandez, M., Dickinson, T., Kastler, L. & Alani, H. A Semantic Graph-based Approach for Radicalisation Detection on Social Media. ESWC 2017.
Saif, H., Fernandez, M., Rowe, M. & Alani, H. On the Role of Semantics for Detecting pro-ISIS stances on social media. ISWC 2016.
Rowe, M. & Saif, H. Mining Pro-ISIS Radicalisation Signals from Social Media Users. ICWSM 2016. Nominated for best paper award!
Slides by Hassan Saif
Online Radicalisation
• Is the process by which individuals are introduced to ideological messages and belief systems that encourage movement from mainstream beliefs toward extreme views, primarily through the use of online media [International Association of Chiefs of Police and United States of America]
Islamic State in Iraq and Syria (ISIS)
Social Media Propaganda & Recruiting
ISIS on Social Media
Research Questions and Objectives
• RQ1: How can we detect when a user has adopted a pro-ISIS stance?
• RQ2: What happens to Twitter users before and after they exhibit radicalised behaviour?
• RQ3: What influences users to adopt pro-ISIS language?
• RQ4: Can we automatically identify users that have adopted pro- vs. anti-ISIS stances?
Data Collection and Analysis
[Figure, from O’Callaghan et al. 2014] Fig. 1: Syrian account network (652 nodes, 3,260 edges). Four major categories: Jihadist (gold, right), Kurdish (red, top), Pro-Assad (purple, left), and Secular/Moderate opposition (blue, center). Black nodes are members of multiple communities. Visualization was performed with the OpenOrd layout in Gephi.
625 Users
2.4M Users
154K EU Users
104M Tweets
English 43%
Arabic 41%
Others 16%
Identifying Signals of Radicalisation
Lexicon- and Network-based Approach
H1 – Sharing Incitement Material
H2 – Using Extremist Language
Radicalization Lexicon (example terms): دولة الخلافة (Caliphate State), ISIS, Shirk, Caliphate, Islamic State, ارهاب (terrorism)
25.5K Suspended ISIS Accounts
Activation Points (RQ1)
• Increase in users activated between May 2014 and November 2014 coincides with the execution of 6 hostages by ISIS and the videos of these executions posted via social media
• The majority of users share content from pro-ISIS accounts before going on to post pro-ISIS terms themselves
Table 2: Significant events involving ISIS/ISIL and the West.
Date | Description
08-04-2013 | ISIS expand into Syria
04-01-2014 | Fallujah captured by ISIS
15-01-2014 | ISIL retake Ar-Raqqah
01-05-2014 | ISIS carry out public executions in Ar-Raqqah
09-06-2014 | Mosul falls under ISIS control
02-09-2014 | Hostage Steven Sotloff executed
13-09-2014 | Hostage David Haines executed
22-09-2014 | Hostage Samira Salih al-Nuaimi executed
03-10-2014 | Hostage Alan Henning executed
07-10-2014 | Abu Bakr al-Baghdadi injured in US air strike
16-10-2014 | Hostage Peter Kassig executed
14-01-2015 | Christopher Lee Cornell arrested for bomb plot
25-01-2015 | Hostage Haruna Yukawa executed
31-01-2015 | Hostage Kenji Goto executed
06-02-2015 | Hostage Kayla Mueller killed in air strike
26-02-2015 | Jihadi John is identified as Mohammed Emwazi
18-03-2015 | ISIS responsible for Tunisia museum attack
15-05-2015 | Abu Sayyaf killed by US special forces
30-06-2015 | Alaa Saadeh arrested for attempts to aid ISIS
11-07-2015 | Maher Meshaal killed in coalition air strike
… Figure 2(a) and Figure 2(b) show the number of users who are activated on each day according to each hypothesis. We note that the span of activations of H1 users is shorter than that of H2 users, as the former requires sharing content from banned or pro-ISIS accounts, while the latter looks at the use of pro-ISIS terms. One thing that is immediately apparent from the plots is that there is a large surge in activity from May 2014 onwards, for both H1 and H2 activations. To investigate why this surge occurs, we identified a series of key events related to ISIS/ISIL from 2013 onwards; these are shown in Table 2. As noted, the increase in activations between May 2014 and November 2014 coincides with the execution of 6 hostages by ISIS and the videos of these executions posted via social media. Although we cannot discern causation (of activation) from correlation here, there does appear to be an association between such information (of executions) appearing in the public domain and users either sharing pro-ISIS content (Figure 2(a)) or adopting pro-ISIS language (Figure 2(b)).
In order to examine whether there was a link between users sharing content from pro-ISIS accounts (via retweeting) and then posting pro-ISIS content themselves, we derived the $\Delta(a_{h2} - a_{h1})$-distribution using all users that fall within the intersection of the H1 and H2 users’ sets. For each user in this intersection set ($u \in U_{H1} \cap U_{H2}$) we measured the difference (in days) between their H2 activation point ($a_{h2}$), i.e. when they first post pro-ISIS rhetoric themselves, and their H1 activation point ($a_{h1}$), i.e. when they first shared content from pro-ISIS accounts. Figure 2(c) presents the distribution of $\Delta(a_{h2} - a_{h1})$. We note that this distribution has a right skew, indicating that the majority of users post pro-ISIS terms before then going on to share content from pro-ISIS accounts; note that we only have 64 users within the intersection of the H1 and H2 users.
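The two activation points and their difference in days can be sketched as follows. This is a hedged illustration only: the timeline format, the account name `account_a` and the lexicon entries `term_x` and `term_y` are hypothetical placeholders, and the actual matching rules used in the paper are not given in this excerpt.

```python
from datetime import date

# Hypothetical placeholders, not real accounts or lexicon terms.
PRO_ISIS_ACCOUNTS = {"account_a"}
LEXICON = {"term_x", "term_y"}


def activation_points(timeline):
    """Return (a_h1, a_h2) for one user.

    timeline: list of (day, retweeted_account_or_None, text) tuples.
    a_h1: first day the user shares content from a listed account (H1).
    a_h2: first day the user posts a lexicon term themselves (H2).
    """
    a_h1 = a_h2 = None
    for day, source, text in sorted(timeline, key=lambda item: item[0]):
        if a_h1 is None and source in PRO_ISIS_ACCOUNTS:
            a_h1 = day
        if a_h2 is None and LEXICON & set(text.lower().split()):
            a_h2 = day
    return a_h1, a_h2


# Made-up timeline for one user.
timeline = [
    (date(2014, 6, 24), None, "post with term_x"),
    (date(2014, 6, 1), None, "ordinary post"),
    (date(2014, 6, 10), "account_a", "shared item"),
]
a_h1, a_h2 = activation_points(timeline)
delta_days = (a_h2 - a_h1).days  # difference a_h2 - a_h1, in days
print(a_h1, a_h2, delta_days)
```

Only users with both activation points defined contribute a value; collecting `delta_days` over those users gives the kind of distribution the excerpt refers to as Figure 2(c).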
Detecting Behaviour Divergence
Having detected the activation points of users within both the H1 and H2 hypotheses’ sets, we then moved on to examine what happens once users have become activated: RQ2: What happens to Twitter users before they exhibit radicalised behaviour, and also after such exhibition? As behaviour is a fairly abstract concept, we operationalise its measurement through three dimensions: (i) the lexical terms used by a user (i.e. non-stop word terms published in his/her tweets), (ii) the users whose content the user has shared (i.e. propagated through his network), and (iii) the users that the user has mentioned. Each dimension, which we refer to as lexical, sharing, and interactions respectively, in essence forms a discrete probability distribution that we can derive from a given half-closed time interval (i.e. $[t, t') : t < t'$). Each distribution is then derived from the relative frequency distribution of the user’s behaviour within the allotted time window: for instance, the lexical dimension’s distribution ($P^{L}_{[t,t')}$) is the relative frequency distribution of terms used within the user’s tweets within the time window (see footnote 5). As we are dealing with both Arabic and English tweets, we ran a process of transliteration on the former to convert Arabic script to English unicode characters, thereby allowing for both languages to be handled using the same base language.
In order to examine whether a user’s behaviour has changed once activated we computed the relative entropy (aka Kullback-Leibler/KL divergence) over three time windows. Each time window has a midpoint ($m$); this midpoint then forms the boundary from which a given behaviour dimension has two probability distributions computed (one before the midpoint, and one after the midpoint). Let $P_{[t,m)}$ denote the distribution prior to $m$, and $Q_{[m,t')}$ denote the distribution on and after $m$; then the relative entropy is computed using $P$ and $Q$ as follows:

$$H(Q \,\|\, P) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)} \qquad (1)$$
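Equation (1) can be sketched for the lexical dimension as below, with $P$ built from the terms used before the midpoint and $Q$ from the terms used on and after it. Two things here are assumptions rather than details from the excerpt: the add-one smoothing over a shared vocabulary, which keeps $Q(i)$ from being zero, and the toy token lists.

```python
import math
from collections import Counter


def relative_frequencies(tokens, vocabulary, alpha=1.0):
    """Smoothed relative frequency distribution over a shared vocabulary.

    The add-alpha smoothing is an assumption made so that no probability is zero.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocabulary)
    return {term: (counts[term] + alpha) / total for term in vocabulary}


def relative_entropy(p, q):
    """Sum over i of P(i) * log(P(i) / Q(i)), as in Equation (1)."""
    return sum(p[i] * math.log(p[i] / q[i]) for i in p if p[i] > 0)


# Toy token lists standing in for one user's terms before and after the midpoint.
before = "syria egypt politics syria".split()
after = "quran allah syria allah".split()
vocabulary = set(before) | set(after)

P = relative_frequencies(before, vocabulary)
Q = relative_frequencies(after, vocabulary)
divergence = relative_entropy(P, Q)
print(round(divergence, 4))
```

The same function applies unchanged to the sharing and interactions dimensions, with shared-from accounts or mentioned users taking the place of terms.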
As mentioned above, we measured the relative entropy over three windows, as follows:
1. Activation Window: the midpoint ($m$) of the window is the given user’s activation point (i.e. $a_{h1}$ or $a_{h2}$), and we set the bounds of the window by going back $k$ days from $m$.
2. Pre-Control Window: the midpoint of the window is $2k$ days back from the activation point of the user, and the bounds are set to $[a - 3k, a - k)$.
3. Post-Control Window: the midpoint of the window is $2k$ days forward from the activation point of the user, and the bounds of the window are set to $[a + k, a + 3k)$.
Hence, our experimental setting provides three non-overlapping time windows over which we could compute the relative entropy of user behaviour (lexical, sharing, interactions). For users labelled as pro-ISIS by H1 and H2 we computed their three relative entropy values over the three …
Footnote 5: The sharing and interactions distributions are computed in the same manner, using the relative frequencies of users whose content is shared and users mentioned respectively.
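The three windows can be written down as half-closed date intervals around an activation point $a$. In this sketch the pre- and post-control bounds follow the text, while the activation window is assumed to be $[a - k, a + k)$; the excerpt only says its bounds are set by going back $k$ days from the midpoint, so the forward bound is an assumption chosen to keep the three windows adjacent and non-overlapping.

```python
from datetime import date, timedelta


def analysis_windows(activation, k):
    """Return three half-closed [start, end) windows around an activation date."""
    step = timedelta(days=k)
    return {
        "pre_control": (activation - 3 * step, activation - step),
        # Assumed symmetric window around the activation point.
        "activation": (activation - step, activation + step),
        "post_control": (activation + step, activation + 3 * step),
    }


# Made-up activation date and window size.
windows = analysis_windows(date(2014, 9, 2), k=7)
for name, (start, end) in windows.items():
    print(name, start, end)
```

Within each window, the midpoint splits the user's activity into the "before" and "after" halves from which the two distributions are built.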
Behaviour Before/After Activation (RQ2)
• Users exhibit a large divergence in their language once activated
  – Before activation the majority of topics users discuss focus on politics, where words like Syria, Israel and Egypt are mentioned in a negative context and with high frequency
  – After activation religious words (e.g. Allah, muslims, quran) become more popular.
[Figure panels: Pre-Activation, Activation, Post-Activation]
Influencing Pro-ISIS Term Adoption (RQ3)
• We study the effect of
  – Lexical Homophily: similarity in language
  – Sharing Homophily: diffusion of information from the same accounts
  – Interaction Homophily: common communications
Social dynamics play a strong role in term uptake. Subcommunities act as bridges between the radicalised user and the future adopter
[Figure labels: pro-ISIS User, Potential Adopter]
56!
Oxford Internet Institute 23rd November 2017
Detecting Pro-ISIS Subcommunities
Objective: detect sub-communities of users from whom radicalised content is shared
Fig. 1: Syrian account network (652 nodes, 3,260 edges). Four major categories: Jihadist (gold, right), Kurdish (red, top), Pro-Assad (purple, left), and Secular/Moderate opposition (blue, center). Black nodes are members of multiple communities. Visualization was performed with the OpenOrd layout in Gephi.
contrast with the polarization analyzed in certain studies of mainstream political activism [3], [10], the three communities selected consist of two polar opposites, jihadist and secular revolutionary, with the third community considerably moderate in comparison. The analysis process includes the generation of rankings of the preferred YouTube channels for each community, where these channels and corresponding Freebase topics assigned by YouTube are used to assist interpretation while also providing a certain level of validation (footnote 2). We also consider online activity surrounding “real world” events, such as YouTube video responses to the Ghouta chemical weapon attack on 21 August 2013 [11]. The insights revealed in this study confirm that alternative analytical approaches can play a key role in studies of online activity where prior knowledge may be scarce or unreliable.
ANALYZING ONLINE POLITICAL ACTIVISM
In this paper, we consider online activity associated with the Syria conflict within the context of other studies of online political activism that have focused upon relatively static, often mainstream groupings about which a considerable level of prior knowledge is available. This includes situations featuring a polarization effect, or others where multiple groupings are in existence. For example, the study of US liberal and conservative blogs by Adamic and Glance [3] found clear separation between both communities, with noticeable behavioral differences in terms of network density based on links between blogs, blog content itself, and interaction with mainstream media. They did not focus on “other” blogs, such as those of a libertarian, independent or moderate nature (and found few references to these from the liberal and conservative blogs), but suggested that they could be considered in future analysis. Progressive and conservative polarization on Twitter was investigated by Conover et al., where hashtags were used to gather data leading to two network representations based on Twitter retweets and mentions [10]. By specifically requesting the detection of exactly two communities, polarization was clearly observable in the retweet network. This was not the case with analogous two-community detection within the corresponding mentions network, where the authors suggested that this feature may foster cross-ideological interactions of some nature. In both cases, increasing the number of target communities beyond two revealed smaller politically heterogeneous communities rather than those of a more fine-grained ideological structure.
2 http://www.freebase.com/
Mustafaraj et al. analyzed the vocal minority (prolific tweeters) and silent majority (accounts that tweeted only once) within US Democrat and Republican Twitter supporters, gathering data by searching for tweets containing the names of two Massachusetts senate candidates [12]. They also found similar polarized retweet communities in the vocal minority, while at the same time, the activity of both of these communities was consistently different to the silent majority at the opposite end of the spectrum. The machine learning framework proposed by Pennacchiotti and Popescu for the classification of Twitter accounts was evaluated using three gold standard data sets, including one associated with political affiliation that was generated from lists of users who classified themselves as either Democrat or Republican in the Twitter directories WeFollow and Twellow [13]. Similar political affiliation on Twitter was studied by Wong et al., where they proposed a method to quantify US political leaning that focused on tweets
O’Callaghan et al. 2014
[Diagram labels: 625 Users; 2.4M Users; 154K EU Users; 104M Tweets; Sharing Incitement Material; Using Extremist Language; 566 pro-ISIS users; 566 anti-ISIS users]
Pro and anti-ISIS Stances (RQ4)
[Figure: Pipeline for detecting pro-ISIS stances using semantic sub-graph mining-based feature extraction. Stage labels: Tweets; Conceptual Semantics Extraction; DBpedia; Semantic Graph Representation; Frequent Semantic Subgraph Mining; Classifier Training]
• Extract and use the semantic interdependencies and relations between words to learn patterns of radicalisation.
Entities: ISIS, Syria
Concepts: Jihadist Group, Country
Semantic Relations: (Military Intervention Against ISIL, place, Syria)
Semantic Graph-based Approach for Pro-ISIS Stance Detection
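To make the idea of pattern features concrete, the sketch below is a deliberate simplification of frequent semantic subgraph mining. It mines only single-edge patterns (one generalised triple each) rather than general subgraphs. The `CONCEPT_OF` lookup is a hypothetical stand-in for the DBpedia extraction step, and all function names are illustrative assumptions.

```python
from collections import Counter

# Hypothetical entity-to-concept lookup; in the pipeline this role is
# played by DBpedia. The two entries mirror the slide's example.
CONCEPT_OF = {"ISIS": "Jihadist Group", "Syria": "Country"}


def generalise(triple):
    """Replace known entities in a (subject, relation, object) triple by concepts."""
    s, r, o = triple
    return (CONCEPT_OF.get(s, s), r, CONCEPT_OF.get(o, o))


def frequent_patterns(user_triples, min_support):
    """Keep generalised triples that occur for at least min_support users."""
    support = Counter()
    for triples in user_triples.values():
        # A set ensures each pattern is counted once per user.
        support.update({generalise(t) for t in triples})
    return sorted(p for p, n in support.items() if n >= min_support)


def feature_vector(triples, patterns):
    """Binary vector: 1 if the user's triples contain the pattern, else 0."""
    present = {generalise(t) for t in triples}
    return [1 if p in present else 0 for p in patterns]


if __name__ == "__main__":
    # Toy input; the first triple is the relation shown on the slide.
    user_triples = {
        "u1": [("Military Intervention Against ISIL", "place", "Syria")],
        "u2": [
            ("Military Intervention Against ISIL", "place", "Syria"),
            ("ISIS", "opponent", "Syria"),  # made-up relation for illustration
        ],
    }
    patterns = frequent_patterns(user_triples, min_support=2)
    print(patterns, feature_vector(user_triples["u2"], patterns))
```

The resulting binary vectors could then be passed to any supervised classifier. A real subgraph miner would also consider multi-edge patterns.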
[Bar chart: per-stance classification performance of the five feature sets (Unigrams, Sentiment, Topics, Network, Semantics) for anti-ISIS and pro-ISIS users; vertical axis from 80 to 94. Bar values in extraction order: 86.3, 86.3, 84.8, 86, 91.7 and 84.4, 84.4, 81, 87.1, 92.8]
radicalisation classification, i.e., classifying users in our dataset according to their stance as pro-ISIS or anti-ISIS. Hence, our experimental setup requires the selection of (i) an annotated dataset of Twitter users (pro-ISIS and anti-ISIS) together with their timelines, (ii) baseline features for cross-comparison and (iii) a supervised classification method. These elements are explained in the following subsections.
4.1 Dataset of pro-ISIS and anti-ISIS Twitter users
Our approach relies on a training dataset of 1,132 European Twitter users (together with their timelines) collected in our previous work [14]. In that work the pro-ISIS stance of 727 Twitter users was determined based on their sharing of incitement material from known pro-ISIS accounts and on their use of extremist language. By the time of conducting this research, 161 of these Twitter accounts had been suspended or had changed their privacy setting to protected, preventing us from accessing their profile information. As such, we removed them from the original set, resulting in 566 pro-ISIS users in total. To balance our dataset, we added 566 anti-ISIS users, whose stance is determined by the use of anti-ISIS rhetoric. Table 2 shows the total number and distribution of tweets and words for each user group. As we can observe, both the number of tweets and the number of words for anti-ISIS users are significantly higher than those for pro-ISIS users. We refer the reader to the body of our work [14] for more details about the construction and annotation of this dataset.
                                     pro-ISIS Users   anti-ISIS Users
Total number of Tweets                      602,511         1,368,827
Average Number of Tweets per User             1,065             2,418
Total number of Words                     3,945,815         9,375,841
Average Number of Words per User              6,971            16,570

Table 2: Statistics of the Twitter dataset used for evaluation
4.2 Baseline Features
Unigrams Features: Word unigrams are features traditionally used for various classification tasks on tweet data. For example, in the context of a sentiment analysis task, models trained from word unigrams were shown to outperform random classifiers by 20% [1]. We generate the user’s unigram vector $t^{u}_{unig} = (w_1, w_2, \ldots, w_m)$ from the words in their timeline. Note that stopwords, non-English words and special characters are removed from the timeline prior to building $t^{u}_{unig}$ in order to reduce its dimensionality.
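The unigram vector described above can be sketched as follows. This is a minimal illustration under stated assumptions. The stopword list is a tiny placeholder, the vector entries are raw term counts (the excerpt does not say whether counts, binary indicators or weights were used), and non-English word filtering is omitted.

```python
import re
from collections import Counter

# Illustrative stopword list only; the source does not say which list was used.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}


def tokenize(tweet: str) -> list:
    """Lowercase a tweet and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", tweet.lower()) if t not in STOPWORDS]


def build_vocabulary(timelines: dict) -> list:
    """Sorted list of all distinct tokens (w_1 ... w_m) across every timeline."""
    words = {
        token
        for tweets in timelines.values()
        for tweet in tweets
        for token in tokenize(tweet)
    }
    return sorted(words)


def unigram_vector(tweets: list, vocabulary: list) -> list:
    """Term counts for one user's timeline, aligned with the vocabulary order."""
    counts = Counter(token for tweet in tweets for token in tokenize(tweet))
    return [counts[w] for w in vocabulary]


if __name__ == "__main__":
    # Toy timelines for illustration only.
    timelines = {
        "u1": ["The news in Syria!"],
        "u2": ["Syria, Syria and Egypt"],
    }
    vocab = build_vocabulary(timelines)
    print(vocab, unigram_vector(timelines["u2"], vocab))
```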
Sentiment Features: Sentiment features denote the sentiment orientation (positive, negative, neutral) of users in our dataset. The rationale behind using these features is that the sentiment conveyed by the users’ posts may help discriminate between pro- and anti-ISIS stances. To extract these features for a given user u, we first extracted the sentiment orientation of each tweet in the user’s timeline. To this end, we used SentiStrength [17], a lexicon-based sentiment detection method for the social web. To construct the sentiment vector $t^{u}_{sentiment}$ for user u, we augment the unigrams feature vector $t^{u}_{unig}$ with the extracted sentiment orientation of tweets as: $t^{u}_{sentiment} = (w_1, w_2, \ldots, w_m, p_{pos}, p_{neg}, p_{neu})$, where $p_{pos}$, $p_{neg}$ and $p_{neu}$ are the
Results
Questions?