Finding Insights In Connected Data: Using Graph Databases In Journalism
Finding Insights in Connected Data: Graph Databases in Journalism
NICAR 2016, Denver
William Lyon (@lyonwj)
Agenda
• What is a graph database?
• Why graphs in journalism?
• Demo1: Graphing US Congress
• Demo2: Hillary email dataset
What is a graph?
[Figures: a chart contrasted with a graph, followed by example graphs: one with VIEWED and BOUGHT relationships; one with MANAGE, LEADS, and REGION relationships; and one linking account holders to ACCOUNT, CREDIT CARD, BANK ACCOUNT, BANK, ADDRESS, PHONE NUMBER, SSN, LOAN, and UNSECURE LOAN nodes]
Graph Databases in Journalism
Graph databases: software that stores and queries data as a graph.
Graph Database
• Property graph data model
• Nodes and relationships
• Native graph processing
• Cypher query language
neo4j.com
Why graph databases in journalism?
bills.csv
committees.csv
votes.csv
https://www.govtrack.us/developers
SELECT l.name, c.jurisdiction
FROM legislators l
LEFT JOIN committee c ON c.member_ID = l.thomasID
WHERE c.thomasID = 'HSAP'
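To try the relational query, the sketch below runs it against an in-memory SQLite database. The table and column names come from the slide; the rows and the column types are hypothetical placeholders, not GovTrack data.

```python
import sqlite3

# In-memory database so the example leaves nothing behind.
conn = sqlite3.connect(":memory:")

# Schema inferred from the column names in the slide's query (types assumed).
conn.executescript("""
    CREATE TABLE legislators (thomasID TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE committee (thomasID TEXT, jurisdiction TEXT, member_ID TEXT);

    -- Hypothetical placeholder rows, for illustration only.
    INSERT INTO legislators VALUES ('L1', 'Legislator One'), ('L2', 'Legislator Two');
    INSERT INTO committee VALUES
        ('HSAP', 'placeholder jurisdiction', 'L1'),
        ('XXXX', 'another placeholder', 'L2');
""")

# The query from the slide: names of legislators on committee HSAP.
rows = conn.execute("""
    SELECT l.name, c.jurisdiction
    FROM legislators l
    LEFT JOIN committee c ON c.member_ID = l.thomasID
    WHERE c.thomasID = 'HSAP'
""").fetchall()

print(rows)
```

Each additional hop (legislator to committee to bill to vote) would add another join, which is the friction the deck contrasts with the graph model.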
SQL
ER Diagrams
Relational Versus Graph Models
[Figure: the same friendship data in two models. Relational model: Person, Friend, and Person-Friend tables. Graph model: Andreas, Tobias, Mica, and Delia as nodes connected by KNOWS relationships.]
Graph Database
Relational Database
A way of representing data
Property Graph Model
The Whiteboard Model Is the Physical Model
Property Graph Model Components
Nodes
• The objects in the graph
• Can have name-value properties
• Can be labeled
Relationships
• Relate nodes by type and direction
• Can have name-value properties
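[Figure: example property graph. Two PERSON nodes (name: "Dan", born: May 29, 1970, twitter: "@dan"; name: "Ann", born: Dec 5, 1975) and a CAR node (brand: "Volvo", model: "V70"), connected by LOVES, LIVES WITH, DRIVES, and OWNS relationships; one relationship carries the property since: Jan 10, 2011.]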
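The components above can be sketched in plain Python as a rough mental model; this is not how Neo4j stores data. The `Node` and `Relationship` classes and the `outgoing` helper are hypothetical names. The property values come from the slide, but the second LOVES relationship and the endpoints of DRIVES are assumptions, since the transcript does not say which nodes they connect.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    labels: set                                      # e.g. {"Person"}
    properties: dict = field(default_factory=dict)   # name-value pairs

@dataclass
class Relationship:
    start: Node                                      # direction runs start -> end
    type: str                                        # e.g. "LOVES"
    end: Node
    properties: dict = field(default_factory=dict)   # relationships carry properties too

dan = Node({"Person"}, {"name": "Dan", "born": "May 29, 1970", "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann", "born": "Dec 5, 1975"})
car = Node({"Car"}, {"brand": "Volvo", "model": "V70"})

relationships = [
    Relationship(dan, "LOVES", ann),
    # Assumption: the reverse LOVES and the DRIVES endpoints are guesses.
    Relationship(ann, "LOVES", dan),
    Relationship(dan, "DRIVES", car, {"since": "Jan 10, 2011"}),
]

def outgoing(node, rel_type):
    """Return the nodes reached from `node` over relationships of `rel_type`."""
    return [r.end for r in relationships if r.start is node and r.type == rel_type]

loved_by_dan = [n.properties["name"] for n in outgoing(dan, "LOVES")]
print(loved_by_dan)
```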
Cypher Query Language
SQL for graphs
Cypher: Powerful and Expressive Query Language
CREATE (:Person {name: "Dan"})-[:LOVES]->(:Person {name: "Ann"})
[Figure: the CREATE pattern annotated: two nodes (Dan, Ann), each with a label and a property, joined by a LOVES relationship]
MATCH (boss)-[:MANAGES*0..3]->(sub),
      (sub)-[:MANAGES*1..3]->(report)
WHERE boss.name = "John Doe"
RETURN sub.name AS Subordinate, count(report) AS Total
Express Complex Queries Easily with Cypher
Find all direct reports and how many people they manage, up to 3 levels down
Cypher Query
SQL Query
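What the variable-length patterns `*0..3` and `*1..3` compute can be approximated in plain Python. This is only a sketch of the query's logic, assuming the MANAGES relationships form a tree (so each path corresponds to exactly one person); the `manages` data and every name other than "John Doe" are hypothetical.

```python
# Hypothetical management tree: manager -> direct reports.
manages = {
    "John Doe": ["Alice", "Bob"],
    "Alice": ["Carol", "Dave"],
    "Carol": ["Erin"],
    "Bob": [],
}

def within(start, min_hops, max_hops):
    """Yield everyone reached from `start` by min_hops..max_hops MANAGES hops."""
    frontier = [start]
    for depth in range(max_hops + 1):
        if depth >= min_hops:
            yield from frontier
        # Move one level further down the tree.
        frontier = [r for person in frontier for r in manages.get(person, [])]

totals = {}
for sub in within("John Doe", 0, 3):          # (boss)-[:MANAGES*0..3]->(sub)
    count = sum(1 for _ in within(sub, 1, 3))  # (sub)-[:MANAGES*1..3]->(report)
    if count:                                  # MATCH drops subs with no reports
        totals[sub] = count

print(totals)
```

Because the first pattern allows zero hops, the boss also appears in the result, and anyone who manages nobody is left out, as in the Cypher query.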
Graphing US Congress
Demo
https://github.com/legis-graph/legis-graph
LOAD CSV WITH HEADERS FROM "file:///legislators.csv" AS line
MERGE (l:Legislator {thomasID: line.thomasID})
SET l = line
MERGE (s:State {code: line.state})<-[:REPRESENTS]-(l)
…
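The import statement above reads one CSV row at a time and creates a node only if it does not exist yet. A simplified Python reading of that idea follows, treating each MERGE as get-or-create on its key property. Neo4j's actual MERGE matches the whole pattern, so this is an approximation of the intent rather than of the engine. The CSV rows and the `name` column are hypothetical stand-ins for legislators.csv.

```python
import csv
import io

# Hypothetical stand-in for legislators.csv (the real file has other columns).
data = io.StringIO(
    "thomasID,name,state\n"
    "001,Legislator One,CO\n"
    "002,Legislator Two,CO\n"
    "003,Legislator Three,MT\n"
)

legislators = {}    # thomasID -> properties of a Legislator node
states = {}         # state code -> properties of a State node
represents = set()  # (thomasID, state code) pairs, i.e. REPRESENTS relationships

for line in csv.DictReader(data):
    # MERGE (l:Legislator {thomasID: line.thomasID}) SET l = line
    legislators[line["thomasID"]] = dict(line)
    # MERGE (s:State {code: line.state}): reuse the state if already seen
    states.setdefault(line["state"], {"code": line["state"]})
    # ...<-[:REPRESENTS]-(l)
    represents.add((line["thomasID"], line["state"]))

print(len(legislators), len(states), len(represents))
```

Three rows yield three legislators but only two states, because the second Colorado row reuses the existing state entry.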
US Congress
http://legis-graph.github.io/legis-graph-spatial/
contributions
committees
candidates
https://gist.github.com/johnymontana/02ae47fc0a29719db045
Takeaway: Graph data models are easy to evolve!
Hillary Clinton Emails
Demo
Clinton email graph model
Data munging
http://graphics.wsj.com/hillary-clinton-email-documents/
Data munging
https://github.com/OpenRefine/OpenRefine/wiki/Faceting
LOAD CSV - Cypher
http://www.developeradvocate.com/2015/11/graphing-hillary-clinton-email/
Content mining
“Networks give structure to the conversation while content mining gives meaning.”
- Preriit Souda
http://breakthroughanalysis.com/2015/10/08/ltapreriitsouda/
Extracting topics from email text
http://www.markhneedham.com/blog/2015/02/13/neo4j-building-a-topic-graph-with-prismatic-interest-graph-api/
Clinton email graph model
http://bit.ly/1R1ybyu
Resources
Visualization
https://linkurio.us/
http://visjs.org/
http://neo4j.com/developer/guide-data-visualization/
Data analysis with Neo4j
Py2neo http://py2neo.org/2.0/
IPython Notebook https://github.com/versae/ipython-cypher
R-lang http://neo4j.com/developer/r/
ICIJ Case Study: Swiss Leaks
https://youtu.be/4__ni4aC8gI
http://neo4j.com/case-studies/icij/
graphdatabases.com