Easier than Excel: Social Network Analysis of DocGraph with Gephi
Janos G. Hajagos, Stony Brook School of Medicine
Fred Trotter, fredtrotter.com

DocGraph
- Based on a FOIA request to CMS by Fred Trotter
- Pre-released at Strata Rx 2012
- Medicare providers (more than doctors)
- CY 2011 dates of service
- Edges link providers who share 11 or more patients within a 30-day forward window
- Initial access restricted to MedStartr funders
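
One way to read the edge rule above is sketched here. This is an interpretation for illustration only: the record layout, the function name `shared_patient_edges`, and the handling of same-day visits are assumptions, not the method CMS used to build the data set.

```python
from collections import defaultdict
from datetime import date, timedelta


def shared_patient_edges(visits, window_days=30, min_patients=11):
    """Build directed provider-to-provider edges from visit records.

    visits: iterable of (patient_id, provider_id, visit_date) tuples
            (a hypothetical layout, not the CMS claims format).
    An edge A -> B counts the distinct patients who saw B on the same day as,
    or up to `window_days` after, a visit to A. Pairs with fewer than
    `min_patients` shared patients are dropped.
    """
    by_patient = defaultdict(list)
    for patient, provider, day in visits:
        by_patient[patient].append((day, provider))

    shared = defaultdict(set)  # (provider_a, provider_b) -> patient ids
    window = timedelta(days=window_days)
    for patient, events in by_patient.items():
        for day_a, prov_a in events:
            for day_b, prov_b in events:
                if prov_a != prov_b and timedelta(0) <= day_b - day_a <= window:
                    shared[(prov_a, prov_b)].add(patient)

    return {pair: len(patients) for pair, patients in shared.items()
            if len(patients) >= min_patients}


# Made-up visits: p1 sees B nine days after A; p2 sees B two months after A.
visits = [
    ("p1", "A", date(2011, 1, 1)), ("p1", "B", date(2011, 1, 10)),
    ("p2", "A", date(2011, 1, 1)), ("p2", "B", date(2011, 3, 1)),
]
edges = shared_patient_edges(visits, min_patients=1)
```

With the threshold lowered to 1 for this toy input, only the A -> B edge survives, which shows why the resulting graph is directed.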

DocGraph by the numbers
- Directed graph
- Average total degree 52.8
- 940,492 providers (graph nodes/vertices)
- 49,685,810 shared edges
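
As a quick check, the degree figure quoted above matches the edge count divided by the node count. A minimal sketch of that arithmetic (the variable names are illustrative):

```python
# Figures quoted on the slide.
nodes = 940_492
edges = 49_685_810

# In a directed graph every edge adds one out-degree and one in-degree,
# so edges / nodes is the mean out-degree (and also the mean in-degree).
edges_per_node = edges / nodes
print(round(edges_per_node, 1))  # 52.8
```

Counting both the incoming and outgoing edges of each node would give twice this value, about 105.7, so the slide's 52.8 appears to be edges per node.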

Geographic visualization
http://isurfsoftware.com/blog/2012/12/13/visualizing-geographic-connections-between-us-doctors/

DocGraph data

NPPES
- National Plan and Provider Enumeration System
- Source of NPI (National Provider Identifier)
- No cost download
- Information is entered and updated by the provider
  - Data quality ranges from good to poor
- CSV file with 314 columns
- A custom MySQL load script is used to normalize the database
- BloomAPI: open source project to make the data easier to access
  - http://www.bloomapi.com/

Tabular data

Things we can do with tabular data

Graph data
- Relation between authors and MeSH terms from PubMed
http://dx.doi.org/10.6084/m9.figshare.94595

Graph types
- Undirected graph
  - Facebook friendships
- Directed graph
  - Twitter: follow and be followed
- Bipartite graph
- Multipartite
  - RDF graph model
  - Property graph model
- Allow parallel edges
  - RDF graph model

Components of a network/graph

Graphs in healthcare
- Prescriber and patient (bipartite)
  - NCPDP data with NPI
- Referral data sets
- Shared patients
  - DocGraph
- Social networks
  - Tweeting about a disease
- Limited by imagination

Generating GraphML
- XML-based file format for graphs
- Readable by a large number of tools
  - Gephi
  - Mathematica
  - igraph (R)
- NetworkX, a Python library for graphs, can export to GraphML
- GraphML is not a file format for really large graphs
- GraphML is not readable by d3.js
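
To show what a GraphML file contains, here is a minimal sketch that writes one with Python's standard library instead of NetworkX. The function name `build_graphml`, the `label` and `weight` attributes, and the provider ids are made up for illustration; the DocGraph export script may structure its files differently.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def build_graphml(nodes, edges):
    """Return a GraphML document (as a string) for a small directed graph.

    nodes: dict mapping node id -> label
    edges: list of (source id, target id, weight) tuples
    """
    root = ET.Element("graphml", xmlns=GRAPHML_NS)
    # Declare the attributes carried by nodes and edges.
    ET.SubElement(root, "key", {"id": "label", "for": "node",
                                "attr.name": "label", "attr.type": "string"})
    ET.SubElement(root, "key", {"id": "weight", "for": "edge",
                                "attr.name": "weight", "attr.type": "int"})
    graph = ET.SubElement(root, "graph", id="G", edgedefault="directed")
    for node_id, label in nodes.items():
        node = ET.SubElement(graph, "node", id=node_id)
        ET.SubElement(node, "data", key="label").text = label
    for source, target, weight in edges:
        edge = ET.SubElement(graph, "edge", source=source, target=target)
        ET.SubElement(edge, "data", key="weight").text = str(weight)
    return ET.tostring(root, encoding="unicode")


# Two hypothetical providers sharing 42 patients.
document = build_graphml({"n1": "Provider A", "n2": "Provider B"},
                         [("n1", "n2", 42)])
```

Because every node and edge is a separate XML element, file size grows quickly with graph size, which is consistent with the slide's caution about really large graphs.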

GraphML can be loaded into Mathematica

Gephi
- Java-based open source tool
- Focused on interactivity
  - Fast graphics
  - Multi-threaded
  - Visual updates
- Strong graph analytics
- Graphs stored in memory
  - Upper limit is about 100,000 nodes
- NetBeans plugin architecture
  - Integration with Neo4j
  - Additional layout algorithms

Downloading sample files
https://dl.dropboxusercontent.com/u/21690634/DocGraph/docgraph_tutorial_examples.zip

Subsets are generated using a Python script
python extract_providers_to_graphml.py "npi='1750499653'" sterrence Leaf-edges
Opening connection referral
Configuration
Selection criteria for subset graph: npi='1750499653'
Referral table name: referral.referral2011
NPI detail table name: referral.npi_summary_primary_taxonomy
Nodes will be labeled by: provider_name
Leaf-to-leaf edges will be exported? False
…
Imported 1 nodes
…
Imported 986 nodes
…
Imported 1724 edges
Edge types imported {'core-to-leaf': 866, 'leaf-to-core': 856, None: 2}
Leaf-to-leaf edges were not selected for export
Writing GraphML file

Generating a subset: some concepts
- Core nodes
- Adding leaf nodes
- Connecting core nodes
- Connecting to leaf nodes
- Connecting leaf nodes
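
The core and leaf vocabulary can be made concrete with a small sketch. It assumes that core nodes are the providers matching the selection criteria and that leaf nodes are any other providers directly linked to a core node. The function name `extract_subset` and the `core-to-core` label are illustrative and are not taken from extract_providers_to_graphml.py.

```python
def extract_subset(edges, core, include_leaf_to_leaf=False):
    """Label the edges of a core-and-leaf subset of a directed graph.

    edges: iterable of (source, target) pairs from the full graph
    core:  set of node ids that match the selection criteria
    Returns a dict mapping each kept edge to a label such as 'core-to-leaf'.
    """
    edges = list(edges)

    # Leaf nodes: non-core nodes that share an edge with a core node.
    leaves = set()
    for source, target in edges:
        if source in core and target not in core:
            leaves.add(target)
        elif target in core and source not in core:
            leaves.add(source)

    def kind(node):
        if node in core:
            return "core"
        if node in leaves:
            return "leaf"
        return None  # outside the subset

    labeled = {}
    for source, target in edges:
        source_kind, target_kind = kind(source), kind(target)
        if source_kind is None or target_kind is None:
            continue
        label = f"{source_kind}-to-{target_kind}"
        if label == "leaf-to-leaf" and not include_leaf_to_leaf:
            continue
        labeled[(source, target)] = label
    return labeled


# Toy graph: A is the only core node, D is not linked to the core at all.
toy_edges = [("A", "B"), ("B", "A"), ("C", "A"), ("B", "C"), ("C", "D")]
without_leaf_edges = extract_subset(toy_edges, core={"A"})
with_leaf_edges = extract_subset(toy_edges, core={"A"}, include_leaf_to_leaf=True)
```

Leaving leaf-to-leaf edges out keeps the subset focused on the selected providers; including them shows how the surrounding providers connect to one another, at the cost of a larger graph.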

Sample files
- jamestown_core_provider_graph.graphml
  - Providers selected with practice addresses in Jamestown, NY
  - Small city in far western New York (approximately 30,000 residents)
  - 179 nodes with 5,560 edges
- jamestown_core_and_leaf_provider_graph.graphml
  - Includes providers above and those who are linked to them
  - 1,322 nodes with 12,457 edges
- albany_core_provider_graph.graphml
  - Providers selected with practice addresses in Albany, NY
  - A small city in New York (approximately 100,000 residents)
  - 1,368 nodes with 44,711 edges

Sample files (continued)
- bronx_core_provider_graph.graphml
  - Providers selected with practice addresses in Bronx, NY
  - Urban community (1.4 million residents)
  - 3,268 nodes and 53,828 edges

Opening a graph file

Import report

Force directed layout of the graph

Results of the layout

ForceAtlas 2 works well for larger graphs

Navigating the graph
- Best experience with a three button mouse with a scroll wheel
  - Right click and hold to pan
  - Scroll wheel to zoom in and out
  - Left click to select
  - Right click for context menus
- MacBook users
  - Command key plus click and hold on the trackpad to pan
  - Two fingers on the trackpad to zoom
  - Click on the trackpad to select
  - Control-click for context menus

Coloring the graph (partitioning)

Varying node size based on importance
- Step 1: Select a measure for node importance
  - Degree
  - PageRank
  - Eigenvector centrality
- Step 2: Run the measure against the graph
- Step 3: Open the Ranking tab and choose “Size/Weight”
- Step 4: Set the size range

Graph measures
- Degree
  - In-degree
  - Out-degree
- Graph structure measures
  - Clustering (global and local)
  - Network diameter
- Centrality measures
  - Eigenvector centrality
  - PageRank (Google search)
- Community measures
- And more ...
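
Two of the measures listed above are simple enough to sketch in plain Python. The code below counts in-degree and out-degree and runs a basic power-iteration PageRank on an unweighted directed graph. It is an illustration under simplifying assumptions (fixed iteration count, no edge weights) and is not Gephi's implementation; the function names and the toy graph are made up.

```python
from collections import defaultdict


def degrees(edges):
    """Return (in_degree, out_degree) dicts for a list of directed edges."""
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    for source, target in edges:
        out_deg[source] += 1
        in_deg[target] += 1
    return dict(in_deg), dict(out_deg)


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank; scores sum to 1 across all nodes."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1 - damping + damping * dangling) / len(nodes)
        new_rank = {node: base for node in nodes}
        for source, targets in out_links.items():
            if not targets:
                continue
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Toy graph: three nodes point at C, and C points back at A.
toy_edges = [("A", "C"), ("B", "C"), ("D", "C"), ("C", "A")]
in_degree, out_degree = degrees(toy_edges)
scores = pagerank(toy_edges)
top_node = max(scores, key=scores.get)
```

In this toy graph C has the highest in-degree and also the highest PageRank, while A ranks second despite a single incoming edge because that edge comes from C.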

Interactively viewing node attributes
- Click the “T” icon at the bottom to turn on node labeling

Data Laboratory

Selecting visible fields

Viewing edge attributes

Saving your graph
- Save your graph in .gephi format
  - XML-based format
  - Preserves layout, size, and color
- Save in GraphML format for use with outside programs

Filtering nodes by attributes

Hints for filtering nodes
- Drag field filter “is_physician” from the top pane to the lower pane
- Set the value to filter on
  - Value should equal 1
  - 1 is equivalent to true
- Click “Filter” to apply
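
Outside Gephi, the same kind of attribute filter can be expressed in a few lines. This sketch assumes node attributes are held in a plain dictionary and keeps only the edges whose two endpoints both pass the filter; the function name and the sample providers are hypothetical, and only the `is_physician` field comes from the slides.

```python
def filter_nodes(nodes, edges, attribute, value):
    """Keep nodes whose attribute equals value, plus the edges among them.

    nodes: dict mapping node id -> dict of attributes
    edges: list of (source, target) pairs
    """
    kept = {node_id for node_id, attrs in nodes.items()
            if attrs.get(attribute) == value}
    kept_edges = [(s, t) for s, t in edges if s in kept and t in kept]
    return kept, kept_edges


# Made-up providers; is_physician uses 1 for true, as in the sample files.
providers = {
    "n1": {"is_physician": 1},
    "n2": {"is_physician": 0},
    "n3": {"is_physician": 1},
}
links = [("n1", "n2"), ("n1", "n3"), ("n3", "n1"), ("n2", "n3")]
physicians, physician_links = filter_nodes(providers, links, "is_physician", 1)
```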

Producing a final graph
- We need to rescale the edge weights in the graph
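
The slides do not say how the weights were rescaled. One common choice is a linear min-max rescale, which maps the smallest weight to a thin line and the largest to a thick one. The sketch below assumes that approach; the function name, the target range of 1 to 10, and the sample weights are illustrative.

```python
def rescale_weights(weights, new_min=1.0, new_max=10.0):
    """Linearly map edge weights onto the range [new_min, new_max].

    weights: dict mapping (source, target) -> raw weight
    """
    low, high = min(weights.values()), max(weights.values())
    if high == low:
        # Every edge has the same weight, so draw them all at new_min.
        return {edge: new_min for edge in weights}
    factor = (new_max - new_min) / (high - low)
    return {edge: new_min + (weight - low) * factor
            for edge, weight in weights.items()}


# Made-up shared-patient counts; 11 is the smallest count DocGraph reports.
raw = {("A", "B"): 11, ("B", "C"): 61, ("A", "C"): 511}
scaled = rescale_weights(raw)
```

A few very heavy edges can flatten the rest under a linear mapping, so a logarithmic rescale may be worth trying when the weights are highly skewed.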

Producing a final graph after scaling

Bronx core provider graph

Challenge questions
- Which institution is the most “important” provider for the Bronx?
  - Hint: try a centrality measure
- Can you determine if geography plays a role in patient sharing in the Bronx?
  - Which parameter could be used to partition the graph?
- Can you filter the graph to show only radiologists?
- Which radiologist has the highest “authority” in the graph?

Other tools for graph analysis
- NetworkX
  - Python
  - Lots of algorithms
- igraph
  - R and Python
- Gremlin: graph traversal and manipulation
  - Groovy shell
  - Gremlin interface is implemented for Neo4j
- And more ...

Scaling the analysis to the entire DocGraph
- Most healthcare graphs will be big (millions of nodes)
- What we learn at the local level can be applied at the global level
  - Importance of geography
  - Supernodes (radiologists, ER docs, pathologists, transportation, …)
- Many graph measures don’t scale well
  - Maximal cliques
- Currently exploring how to use Faunus to scale the analysis with Hadoop

Links
- http://strata.oreilly.com/2012/11/docgraph-open-social-doctor-data.html (information)
- https://github.com/jhajagos/DocGraph (code)
- http://notonlydev.com/docgraph-data/ (open source; $1 covers bandwidth fees)
- https://groups.google.com/forum/#!forum/docgraph (mailing list)

Questions
- Try to publish your own healthcare dataset as a graph!