
Using Big Data and Network Centrality Metrics to Identify Influencers

Caitlin Rowe, Jed Lee, Max Nugmanov & Ruben Vargas

Advisor: David Beskow


Agenda

• Introduce Regionally Aligned Units

• Problem Statement

• Framing the Problem

• Data

• Model

• Results


Recent Ebola Virus Outbreak

• October 2014: U.S. Army Africa deployed 3,200 soldiers to Monrovia, Liberia, as Joint Force Command for Operation United Assistance

  – Support interagency humanitarian efforts

  – Supervise construction of Ebola treatment units

• These Soldiers/Leaders deployed into an austere and unfamiliar environment and problem set

• Important that soldiers understand the operating environment

  – Influential actors

  – Key infrastructure

Difficult Environments

Humanitarian Assistance

Natural Disaster

Regional Conflict

Symmetric & Asymmetric Threats

Problem Statement

We will fuse open-source data to identify influential state and non-state actors and infrastructure in order to help tactical leaders of regionally aligned units understand and engage their areas of interest before and during temporary deployments to austere environments.

Influential People? Important Places?

Framing the Problem: The Operations Process

Build and Maintain Situational Understanding

Visualize

Taken from ADRP 5-0

Framing the Problem: OSINT – Open Source Intelligence

• OSINT: intelligence that is produced from publicly available information and is collected, exploited, and disseminated in a timely manner to an appropriate audience for the purpose of addressing a specific intelligence requirement (Congressional findings)

• Open source data contains a wealth of information that can help military leaders understand these complex environments.

• Creation of the Open Source Center (OSC) under the CIA

• Working to get OSINT on equal ground with all other intelligence sources


Data: Global Knowledge Graph

• Global Knowledge Graph (GKG) is part of the GDELT** Project

• GDELT is built using an enhanced TABARI algorithm

• GKG is an open source database that “connects the world's people, organizations, locations, themes, counts, and emotions into a single holistic network over the entire planet”

**GDELT is the Global Database of Events, Language, and Tone

Data: TABARI Algorithm

• Text Analysis By Augmented Replacement Instructions

• Machine coding of international event data using pattern recognition and simple grammatical parsing

• Designed to extract information from short news articles

  – Applied to ~80k English articles daily


GKG Data

GKG record fields: Date, Counts, Themes, Locations, Persons, Organizations, Tone, Source URLs

Example Persons field: john kerry;barack obama;chris christie;george w bush

• Persons found in the same article are listed in the Persons field, separated by a semicolon

• We assumed persons found in the same article are “connected”

The Global Network (example nodes: barack obama, john kerry)
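The "connected" assumption on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than the team's actual code: the function name `cooccurrence_edges` is hypothetical, and it assumes each input string is one article's Persons field with names separated by semicolons.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(persons_fields):
    """Count undirected person-person links from GKG-style Persons fields.

    Each element of `persons_fields` is one article's Persons field:
    names separated by semicolons. (Hypothetical helper for illustration.)
    """
    edges = Counter()
    for field in persons_fields:
        # Deduplicate and sort so each pair has a single canonical ordering.
        names = sorted({name.strip() for name in field.split(";") if name.strip()})
        # Every pair of persons in the same article is treated as "connected".
        for a, b in combinations(names, 2):
            edges[(a, b)] += 1
    return edges


articles = [
    "john kerry;barack obama;chris christie;george w bush",
    "barack obama;john kerry",
]
edges = cooccurrence_edges(articles)
print(edges[("barack obama", "john kerry")])  # 2
```

The edge count acts as a simple weight: two people who appear together in many articles get a heavier link than a pair seen together once.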


GDELT

From Kalev Leetaru in “Foreign Policy”


Identifying Influential People

Step 1 (Linux Query): Unix code queries the GKG data set to find a geographic or topical subset (e.g., the Bangladesh subset).

Step 2 (Network Centrality): Use network centrality models in R to identify influential individuals by name.

Step 3 (Wiki Merge): Use Python to programmatically search Wikipedia for two-sentence descriptions.
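Steps 1 and 3 might look roughly like the sketch below. It is an illustration under stated assumptions, not the project's code: `filter_records`, `first_sentences`, and `attach_descriptions` are hypothetical names, the records are made-up tab-separated strings, and `lookup` stands in for whatever Wikipedia search the team used (here a plain dictionary, so the example runs offline).

```python
import re


def filter_records(lines, keyword):
    """Keep lines containing `keyword`, case-insensitively (roughly `grep -i`)."""
    needle = keyword.lower()
    return [line for line in lines if needle in line.lower()]


def first_sentences(text, count=2):
    """Return the first `count` sentences, split naively after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(parts[:count])


def attach_descriptions(ranked_names, lookup):
    """Pair each ranked name with a short description returned by `lookup`.

    `lookup` is any callable mapping a name to descriptive text (or None).
    """
    rows = []
    for rank, name in enumerate(ranked_names, start=1):
        text = lookup(name) or ""
        rows.append((rank, name, first_sentences(text)))
    return rows


# Made-up records and descriptions, for illustration only.
records = [
    "20150301\tDhaka, Bangladesh\tperson a;person b",
    "20150301\tAbuja, Nigeria\tperson c",
]
subset = filter_records(records, "bangladesh")

toy_lookup = {"person a": "First sentence. Second sentence. Third sentence."}.get
rows = attach_descriptions(["person a", "person b"], toy_lookup)
for row in rows:
    print(row)
```

The naive sentence split will cut early on abbreviations such as "U.S.", so a production version would need a more careful splitter.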


Measuring Network Centrality

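Betweenness: $C_B(n_i) = \sum_{j<k} g_{jk}(n_i) / g_{jk}$

Closeness: $C_C(n_i) = \left[ \sum_{j=1}^{g} d(n_i, n_j) \right]^{-1}$

Degree: $C_D(n_i) = d(n_i) = \sum_j X_{ij}$

Eigenvector: $A\nu = \lambda\nu$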


Degree Centrality

• Number of neighbors, or edges, for a node

• Counts connections with other nodes

• Ignores any directions on the edges

• Measures local centrality

• Computationally fast

$C_D(n_i) = d(n_i) = \sum_j X_{ij}$
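As a small worked example, degree centrality can be computed directly from an edge list. This sketch assumes an undirected graph stored as a dictionary of neighbor sets; the helper names `build_adjacency` and `degree_centrality` and the toy edges are illustrative, not taken from the project.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbors} from edge pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def degree_centrality(adjacency):
    """Degree of each node: the number of distinct neighbors (edge direction ignored)."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}


toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
degrees = degree_centrality(build_adjacency(toy_edges))
print(degrees["c"])  # 3
```

Each node is visited once and only its own neighbor set is inspected, which is why this measure stays cheap on large networks.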


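Closeness Centrality

• Closeness is a measure of a node's distance to all other nodes.

• Closeness measures how easy it is for information to spread sequentially from a node to all other nodes

$C_C(n_i) = \left[ \sum_{j=1}^{g} d(n_i, n_j) \right]^{-1}$

where $d(n_i, n_j)$ is the geodesic (shortest-path) distance between nodes $i$ and $j$, and $g$ is the number of nodes.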
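A minimal sketch of closeness on an unweighted graph, using breadth-first search for the distances. The function names and the toy graph are illustrative. One assumption is worth flagging: the formula presumes a connected network, so for a disconnected one this sketch sums distances over reachable nodes only, which is a simplification and not necessarily what the project did.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Shortest-path (hop) distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closeness_centrality(adjacency):
    """Inverse of the summed distances from each node to the nodes it can reach."""
    scores = {}
    for node in adjacency:
        total = sum(bfs_distances(adjacency, node).values())
        # Isolated nodes have no distances to sum; score them 0 (an assumption).
        scores[node] = 1.0 / total if total > 0 else 0.0
    return scores


# Toy path graph a - b - c.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
closeness = closeness_centrality(path)
print(closeness["b"])  # 0.5
```

The middle node scores highest because its total distance to the others (1 + 1) is smaller than an end node's (1 + 2).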


Betweenness Centrality

• Measures the number of shortest paths that cross a node

• Shows when one particular person is located in a strategically central position of the group

$C_B(n_i) = \sum_{j<k} g_{jk}(n_i) / g_{jk}$

$g_{jk}$ = the number of geodesics connecting $j$ and $k$; $g_{jk}(n_i)$ = the number of those geodesics that actor $i$ is on.
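The definition above can be evaluated directly on a small graph. The sketch below is a straightforward, unoptimized reading of the formula (cubic in the number of nodes), not the algorithm used in R; the function names and toy graph are illustrative. It relies on the fact that node i lies on a geodesic from j to k exactly when d(j, i) + d(i, k) = d(j, k).

```python
from collections import deque
from itertools import combinations


def shortest_path_counts(adjacency, source):
    """BFS from `source`: hop distance and number of geodesics to each reachable node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                sigma[neighbor] = 0
                queue.append(neighbor)
            # Every shortest path to `node` extends to `neighbor` one level down.
            if dist[neighbor] == dist[node] + 1:
                sigma[neighbor] += sigma[node]
    return dist, sigma


def betweenness_centrality(adjacency):
    """Sum over pairs j < k of g_jk(n_i) / g_jk, for every node i."""
    info = {node: shortest_path_counts(adjacency, node) for node in adjacency}
    scores = {node: 0.0 for node in adjacency}
    for j, k in combinations(list(adjacency), 2):
        dist_j, sigma_j = info[j]
        if k not in dist_j:
            continue  # j and k are in different components: no geodesic
        for i in adjacency:
            if i in (j, k) or i not in dist_j:
                continue
            dist_i, sigma_i = info[i]
            if k in dist_i and dist_j[i] + dist_i[k] == dist_j[k]:
                scores[i] += sigma_j[i] * sigma_i[k] / sigma_j[k]
    return scores


# Toy graph: a 4-cycle a-b-d-c-a with a pendant node e attached to a.
toy = {
    "a": {"b", "c", "e"},
    "b": {"a", "d"},
    "c": {"a", "d"},
    "d": {"b", "c"},
    "e": {"a"},
}
betweenness = betweenness_centrality(toy)
print(betweenness["a"])  # 3.5
```

Node a scores highest because every shortest path from e to the rest of the graph passes through it.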


Eigenvector Centrality

• Measures the importance of a node in a network, using adjacency matrix

• Assigns relative scores to nodes, giving more “points” for connections to nodes that are themselves influential

• A node's score depends on the number of its connections and the relative scores of its neighbors

• Connections to more influential people contribute more to a person's own influence

$A\nu = \lambda\nu$, where $A$ = adjacency matrix of the graph, $\lambda$ = constant (eigenvalue), $\nu$ = eigenvector
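One common way to approximate the leading eigenvector is power iteration, sketched below. The slides do not say which method the R models used, so this is an assumption chosen for brevity. The update uses A + I instead of A, a standard shift that leaves the eigenvectors unchanged but avoids oscillation on bipartite graphs; the function name, iteration limit, and tolerance are illustrative.

```python
import math


def eigenvector_centrality(adjacency, iterations=200, tol=1e-10):
    """Approximate the leading eigenvector of the adjacency matrix by power iteration.

    `adjacency` maps each node to the set of its neighbors (undirected, non-empty).
    """
    nodes = list(adjacency)
    score = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Shifted update (A + I) v: own score plus the scores of all neighbors.
        new = {
            node: score[node] + sum(score[nb] for nb in adjacency[node])
            for node in nodes
        }
        norm = math.sqrt(sum(value * value for value in new.values()))
        new = {node: value / norm for node, value in new.items()}
        if max(abs(new[node] - score[node]) for node in nodes) < tol:
            return new
        score = new
    return score


# Toy graph: triangle a-b-c with a pendant node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
eigen = eigenvector_centrality(toy)
print(max(eigen, key=eigen.get))  # c
```

On a disconnected network the iteration concentrates weight on the component with the largest eigenvalue, so scores in smaller components shrink toward zero.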


Our Primary Algorithm

UNIX grep command for query

Bootstrap for Large Datasets

Combination of 2

Calculate Network Centrality
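The slide does not spell out how the bootstrap step works. One possible reading, sketched here purely as an assumption, is to resample article records with replacement, compute a cheap centrality (degree) on each resample, and average the results so that a very large record set never has to be loaded into one network. All names (`degree_from_records`, `bootstrap_degree`) and parameters below are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def degree_from_records(records):
    """Degree of each person in the co-occurrence network built from `records`."""
    neighbors = {}
    for field in records:
        names = sorted({n.strip() for n in field.split(";") if n.strip()})
        for a, b in combinations(names, 2):
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)
    return {name: len(nbs) for name, nbs in neighbors.items()}


def bootstrap_degree(records, sample_size, replicates, seed=0):
    """Average degree per person over `replicates` resamples drawn with replacement."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    totals = Counter()
    for _ in range(replicates):
        sample = [rng.choice(records) for _ in range(sample_size)]
        for name, degree in degree_from_records(sample).items():
            totals[name] += degree
    # A person absent from a resample contributes a degree of 0 to the average.
    return {name: total / replicates for name, total in totals.items()}


toy_records = ["a;b;c", "a;d", "a;e"]
averaged = bootstrap_degree(toy_records, sample_size=3, replicates=50, seed=1)
print(max(averaged, key=averaged.get))
```

Whether the project averaged scores, averaged ranks, or used the resamples in some other way is not stated, so this should be read as one illustrative option.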


Nigeria Case Study (1 of 3)

• Boko Haram – Extremist group formed in 2009 that seeks establishment of an Islamic state in Nigeria.

• In addition to Nigeria, they also operate in Chad, Cameroon, and Niger

• Opposes Westernization of Nigerian society

• Pledged allegiance to Islamic State of Iraq and Syria (ISIS) on March 7, 2015.

• They have destabilized Nigeria, killing over 13,000 civilians and causing 1.5 million people to flee

What if US Regionally Aligned Forces deployed to Nigeria to advise and assist in the fight against Boko Haram?


Nigeria Case Study (2 of 3)

Abuja, Nigeria: 13,527 People in Network, 321,206 Connections

• Given data for the entire world for the last 30 days, we selected only the data related to Abuja, Nigeria

• This gives us a network with:

  – 13,527 people

  – 321,206 connections

  – Density of 0.004

• This is a disconnected network
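The density figure can be checked from the two counts on this slide, assuming the usual definition for an undirected simple graph, 2E / (N(N - 1)); the slide does not state which formula was used. The component counter is included only to illustrate what "disconnected" means, and both function names are illustrative.

```python
def density(num_nodes, num_edges):
    """Density of an undirected simple graph: 2E / (N * (N - 1))."""
    if num_nodes < 2:
        return 0.0
    return 2.0 * num_edges / (num_nodes * (num_nodes - 1))


def count_components(adjacency):
    """Number of connected components; more than one means the network is disconnected."""
    seen = set()
    components = 0
    for start in adjacency:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
    return components


abuja_density = density(13527, 321206)
print(round(abuja_density, 4))  # 0.0035

# Toy disconnected graph: one linked pair plus an isolated node.
toy = {"a": {"b"}, "b": {"a"}, "c": set()}
print(count_components(toy))  # 2
```

Under this definition the value is about 0.0035, which agrees with the slide's 0.004 when rounded to three decimal places.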


Nigeria Case Study (3 of 3)

Rank | Name | Description

1 | goodluck jonathan | Goodluck Ebele Azikiwe Jonathan, GCFR, BNER, GCON (born 20 November 1957) is a Nigerian politician who has been President of Nigeria since 2010. Prior to his role as President, he served as Governor of Bayelsa State from 2005 to 2007 and as Vice-President of Nigeria from 2007 to 2010. Jonathan is a member of the ruling People's Democratic Party (PDP).

2 | muhammadu buhari | Muhammadu Buhari (born 17 December 1942) is a Nigerian politician and a retired Major General in the Nigerian Army who ruled Nigeria from 31 December 1983 to 27 August 1985, after taking power in a military coup d'état. The term Buharism is ascribed to the Buhari military government.

3 | attahiru jega | Professor Attahiru Muhammadu Jega is a Nigerian academic and Vice-Chancellor of Bayero University, Kano.

4 | olusegun obasanjo | Olusẹgun Mathew Okikiọla Obasanjo (born circa 5 March 1938) is a former Nigerian Army general who was President of Nigeria from 1999 to 2007. A Nigerian of Yoruba descent, Obasanjo was a career soldier before serving twice as his nation's head of state, as a military ruler from 13 February 1976 to 1 October 1979 and as a democratically elected president from 29 May 1999 to 29 May 2007.

5 | sambo dasuki | Sambo Dasuki, a retired Colonel of the Nigerian Army, is the current National Security Adviser (NSA) to President Goodluck Jonathan of Nigeria. He was appointed NSA on June 22, 2012 following the removal of General Owoye Andrew Azazi, retired.

6 | rotimi amaechi | Chibuike Rotimi Amaechi (born 27 May 1965) is the current Governor of Rivers State, Nigeria, since 2007. He was re-elected for a second term on 26 April 2011. Amaechi was a member of the People's Democratic Party before defecting to the All Progressives Congress on 27 November 2013.

7 | garba shehu | Kabir Garba Marafa is a Nigerian politician who was elected Senator for Zamfara Central in Zamfara State, Nigeria in the April 2011 elections, running on the All Nigeria People's Party (ANPP) ticket. Engineer Kabir Garba Marafa was formerly the Commissioner for Water Resources in Zamfara State.

8 | ayo fayose | Peter Ayodele Fayose (born 15 November 1960) is the current governor of Ekiti State in Nigeria.

9 | ibrahim babangida | General Ibrahim Badamasi Babangida (born August 17, 1941), also known as IBB, is a retired Nigerian Army officer who was the military ruler of Nigeria. He ruled Nigeria from 27 August 1985, when he overthrew Major General Muhammadu Buhari in a coup, until his departure from office on 27 August 1993, having annulled the elections held on June 12 that year.

10 | olisa metuh | Olisa Metuh is a Nigerian lawyer, politician, and the National Publicity Secretary of the People's Democratic Party.



Web App

http://data-analytics.net/Apps/fusionNet/


Current Status and Way Forward

• Tactical units and leaders are testing the model and web-app

• 95th Civil Affairs Brigade and the Communications-Electronics Research, Development and Engineering Center will continue to develop the tool next year

• The model has sparked interest in the OSINT Community at the Intelligence and Security Command at Fort Belvoir


Feedback from Users

“This is an awesome concept! It would definitely be helpful in preparing to visit a country or conduct an engagement”

“This also broadens the lens beyond just military considerations to a fuller geo-political/social spectrum.”

“This could be used for: a) Prepping Senior leader for country travel and engagements, b) Regional Engagement Directorates keeping tabs on country development, c) Intel analyst looking at open source trends”


Questions?

The authors would like to thank the Data Tactics Corporation for their support and collaboration throughout this project.



Back-up Slides



Local Networks