The Hitchhiker’s Guide to Graph Exchange Formats

Prof. Matthew Roughan
[email protected]
http://www.maths.adelaide.edu.au/matthew.roughan/

Work with Jono Tuke

UoA
June 4, 2015

M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 1 / 31



Graphs

Graph: G(N, E)
- N = set of nodes (vertices)
- E = set of edges (links)

Often we have additional information, e.g.,
- link distance
- node type
- graph name
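To make the definition concrete, here is a minimal Python sketch of G(N, E) carrying the kinds of extra information listed above (graph name, node type, link distance). The class, node names and attribute keys are hypothetical, chosen for illustration only; they do not follow any particular exchange format.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """A graph G(N, E) plus optional attributes (illustrative only)."""
    name: str = ""                                # graph-level attribute
    nodes: dict = field(default_factory=dict)     # node id -> attribute dict (N)
    edges: dict = field(default_factory=dict)     # (u, v) -> attribute dict (E)

    def add_node(self, n, **attrs):
        self.nodes.setdefault(n, {}).update(attrs)

    def add_edge(self, u, v, **attrs):
        # Make sure both endpoints are in N before the edge goes into E.
        self.add_node(u)
        self.add_node(v)
        self.edges.setdefault((u, v), {}).update(attrs)


g = Graph(name="toy-backbone")                    # hypothetical example graph
g.add_node("ADL", type="router")
g.add_node("MEL", type="router")
g.add_edge("ADL", "MEL", distance_km=650)
print(sorted(g.nodes), g.edges)
```

Even this toy shows the core problem an exchange format has to solve: the structure (N, E) is simple, but the attached information is open-ended.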


Why?

To represent data where “connections” are first-class objects in their own right:
- storing the data in the right format improves access, processing, ...
- it’s natural, elegant, efficient, ...

Many, many datasets


ISPs: Internode: layer 3

http://www.internode.on.net/pdf/network/internode-domestic-ip-network.pdf


ISPs: Level 3 (NA)

http://www.fiberco.org/images/Level3-Metro-Fiber-Map4.jpg


Telegraph submarine cables

http://en.wikipedia.org/wiki/File:1901_Eastern_Telegraph_cables.png


Electricity grid


Bus network (Adelaide CBD)


French Rail

http://www.alleuroperail.com/europe-map-railways.htm


Protocol relationships


Food web


Network of Musicians (last.fm)

http://sixdegrees.hu/last.fm/


Exchange

Open research requires exchange of data:
- replication of, and comparison with, results
- comparisons between datasets

Working on data in closed formats is bad:
- vendor lock-in
- not free
- not portable (and sometimes not even backward compatible)
- often black boxes

Exchange formats are designed to facilitate the open exchange of data.


Portability

The main requirement is portability.

Portability between software:
- graph entry
- graph analysis
- graph visualisation
- ...

Portability between architectures:
- OS (Mac, Linux, Windows, FreeBSD, ...)
- hardware (big-endian vs little-endian)

Requirements:
- open format
- documented
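Hardware portability mostly comes down to byte order in binary formats. A minimal Python sketch, assuming a hypothetical binary edge list stored as pairs of unsigned 32-bit node ids: a writer and reader that agree on a documented byte order round-trip correctly, while a reader that assumes the other order does not fail, it silently returns wrong node ids.

```python
import struct

edges = [(1, 2), (1, 3)]


def pack_edges(edges, byte_order):
    """Serialise edges as pairs of unsigned 32-bit ints.
    byte_order is a struct prefix: '>' big-endian, '<' little-endian."""
    return b"".join(struct.pack(byte_order + "II", u, v) for u, v in edges)


def unpack_edges(data, byte_order):
    """Read back 8-byte records using the given byte order."""
    return [struct.unpack_from(byte_order + "II", data, off)
            for off in range(0, len(data), 8)]


blob = pack_edges(edges, ">")     # the (hypothetical) spec says big-endian
print(unpack_edges(blob, ">"))    # [(1, 2), (1, 3)]
print(unpack_edges(blob, "<"))    # [(16777216, 33554432), (16777216, 50331648)]
```

This is why "documented" sits next to "open format" in the requirements: the bytes alone do not say which interpretation is right.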


Graph exchange file formats

| # | Format | X | Description |
|---|--------|---|-------------|
| 1 | bintsv4 | | bintsv4 (GraphLab) |
| 2 | BioGRID TAB | X | BioGRID TAB 2.0 Format |
| 3 | BLAG, GDToolkit | | Batch layout generator (GDToolkit) |
| 4 | BVGraph | X | Boldi-Vigna graph compression |
| 5 | Chaco | X | Chaco graph format |
| 6 | Cluto | | Cluto/Metis/Graclus format |
| 7 | DGS | X | Dynamic GraphStream Format |
| 8 | DGML | | Directed Graph Markup Language |
| 9 | DIMACS | | DIMACS graph format |
| 10 | Dot | | Graphviz Dot Language |
| 11 | DotML | | Dot Markup Language |
| 12 | DyNetML | | DyNetML XML |
| 13 | GAMFF | | A Graph and Matrix Format |
| 14 | GDF | | Guess Data Format |
| 15 | GDL | | Graph Description Language |
| 16 | GEDCOM | | Genealogical data |
| 17 | GEXF | X | Graph Exchange XML Format |
| 18 | GML | X | Graph Modelling Language |
| 19 | Graph6 | X | Graph6 |
| 20 | Graph::Easy | X | Perl Graph::Easy format |
| 21 | GraphEd | | GraphEd simple format |
| 22 | GraphJSON | | Graph JSON |
| 23 | GraphML | X | Graph Markup Language |
| 24 | GraphSON | | TinkerPop’s JSON-based Graph format |
| 25 | GraphXML | X | XML-Based Graph Description Language |
| 26 | GraX | | GraX |
| 27 | GRXL | | XML Specification for Grrr Program |
| 28 | GT-ITM | | Georgia Tech Internetwork Topology Models |
| 29 | GXL | X | Graph eXchange Language |
| 30 | Harwell-Boeing | | Harwell-Boeing sparse (adjacency) matrix |
| 31 | Inet | | Inet Topology Generator file |
| 32 | ITDK | X | CAIDA Internet Topology Data Kit |
| 33 | JSON Graph | | json-graph-specification |
| 34 | LEDA | | LEDA format |
| 35 | LGF | | LEMON Graph Format |
| 36 | LGL | | Large Graph Layout |
| 37 | LibSea | X | CAIDA LibSea format |
| 38 | KrackPlot | X | KrackPlot data format |
| 39 | Matlab | | Matlab saved workspace |
| 40 | Matrix | | Matrix Market sparse matrix |
| 41 | Mivia | | Mivia ARG database format |
| 42 | MultiNet | | MultiNet |
| 43 | Netdraw VNA | | Netdraw VNA |
| 44 | NetML | | Network Markup Language |
| 45 | Ncol | | Large Graph Layout |
| 46 | NNF | | Nested Network Format |
| 47 | Nod | | KrackPlot Node format |
| 48 | NOS | | Neo Org Stat format |
| 49 | ns-tcl | | ns-2 Tcl network definition |
| 50 | OGDL | X | Ordered Graph Data Language |
| 51 | OGML | | Open Graph Markup Language |
| 52 | Osprey | | Osprey file format |
| 53 | Otter | | Otter’s native format |
| 54 | Pajek (.net) | X | Pajek Tool’s .net format |
| 55 | Pajek (.paj) | X | Pajek Tool project (.clu, .vec, .per, ...) |
| 56 | Planar | X | Plantri Planar Code and edgeCode |
| 57 | PSI MI | | Proteomics Standards Initiative Molecular Interaction |
| 58 | RSF | | Rigi Standard Format |
| 59 | Rocketfuel | | Rocketfuel ISP Maps |
| 60 | Rutherford-Boeing | | Rutherford-Boeing sparse (adjacency) matrix |
| 61 | SGB | | Stanford GraphBase |
| 62 | SGF | | Structured Graph Format |
| 63 | S-Dot | | S-Dot (lisp interface to Graphviz) |
| 64 | SIF | | Simple Interaction Format |
| 65 | SNAP | X | Stanford Network Analysis Platform |
| 66 | SoNIA | X | SoNIA Son format |
| 67 | Sparse6 | X | Sparse6 |
| 68 | StOCNET | | StOCNET native format |
| 69 | TEI | | Text Encoding Initiative Graph Format (XML-compatible) |
| 70 | TGF, TGF | X | Trivial Graph Format, and other simple edge lists (CSV, TSV, Excel, ...) |
| 71 | Tulip TLP | | Tulip graph format |
| 72 | UCINET DL | | UCINET Data Language |
| 73 | XGMML | | eXtensible Graph Markup and Modeling Language |
| 74 | XMLBIF | | XML-based BayesNets Interchange Format |
| 75 | XTND | | XML Transition Network Definition |
| 76 | YGF | X | Y Graph Format |


Graph exchange file formats

Over 100 considered; 76 analysed
- aimed at exchange formats
  - not all were designed for exchange, but some became de facto exchange formats
- sought a minimal level of documentation
  - not all exchange formats are documented (still)

[Figure: count of formats by year, 1990 to 2015, broken down by structure (BNF, Intermediate, JSON, Other, Simple, XML)]


Descriptors

Many possibilities (26 considered)

File type
- encoding
- representation
- meta-data
- compression
- ...

Graph types
- directed, multi-, hyper-, ...

Attributes
- what attributes can be stored with the graph
- extensibility

General
- extensibility


Encoding Types

[Figure: formats by storage type (ascii, ascii/binary, binary, ISO 8859, unicode, UTF-8) and by structure (BNF, Intermediate, JSON, Other, Simple, XML)]
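The storage-type split matters as soon as labels leave ASCII. A minimal Python sketch, using a hypothetical node label, of how the same string is encoded under UTF-8 and under ISO 8859-1, and what a reader that guesses the wrong encoding gets back:

```python
label = "Zürich"                       # hypothetical node label with a non-ASCII character

utf8 = label.encode("utf-8")           # ü becomes two bytes: 0xC3 0xBC
latin1 = label.encode("iso-8859-1")    # ü becomes one byte: 0xFC

print(len(utf8), len(latin1))          # 7 6
# Decoding UTF-8 bytes as ISO 8859-1 does not raise an error;
# it silently corrupts the label instead.
print(utf8.decode("iso-8859-1"))       # ZÃ¼rich
```

The failure is silent, so a format that states its character encoding (as XML formats can in their declaration) is easier to exchange safely than one that leaves it implicit.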


Graph Representations

List of links: explicitly give
- N: e.g., N = {1, 2, 3}
- E: e.g., E = {(1, 2), (1, 3)}

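Adjacency matrix: define connectivity through a (0,1) matrix A defined by

$$
A_{ij} = \begin{cases} 1, & \text{if } (i, j) \in E, \\ 0, & \text{otherwise,} \end{cases}
$$

e.g., treating the links above as undirected (so A is symmetric),

$$
A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix}
$$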
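The example matrix is symmetric, so it treats the links as undirected. A minimal Python sketch of converting the list-of-links form into the adjacency matrix under that assumption; the `directed` flag is a hypothetical option added for contrast.

```python
def adjacency_matrix(nodes, edges, directed=False):
    """Build the (0,1) adjacency matrix A for an edge list.
    Rows and columns follow the order of `nodes`."""
    index = {n: i for i, n in enumerate(nodes)}
    A = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        A[index[u]][index[v]] = 1
        if not directed:
            A[index[v]][index[u]] = 1   # undirected: mirror the entry so A is symmetric
    return A


N = [1, 2, 3]
E = [(1, 2), (1, 3)]
for row in adjacency_matrix(N, E):
    print(row)
# [0, 1, 1]
# [1, 0, 0]
# [1, 0, 0]
```

The matrix always needs |N|² entries however few links there are, which is why sparse-matrix formats (Matrix Market, Harwell-Boeing) exist for large graphs.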

List of neighbour lists:
- for each node, list its neighbours
- e.g.,
  1: 2, 3
  2:
  3:
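Reading the example as listing each link once, under its first node, a minimal Python sketch of building neighbour lists from the list of links and converting back. The `directed=False` option, which lists each link under both endpoints, is a hypothetical addition for contrast.

```python
from collections import defaultdict


def neighbour_lists(nodes, edges, directed=True):
    """Return {node: [neighbours]} for every node, including isolated ones."""
    nbrs = defaultdict(list)
    for u, v in edges:
        nbrs[u].append(v)
        if not directed:
            nbrs[v].append(u)            # list the link under both endpoints
    return {n: nbrs[n] for n in nodes}


def edges_from_neighbour_lists(nbrs):
    """Invert the representation back to a list of links."""
    return [(u, v) for u, vs in nbrs.items() for v in vs]


N = [1, 2, 3]
E = [(1, 2), (1, 3)]
print(neighbour_lists(N, E))                    # {1: [2, 3], 2: [], 3: []}
print(neighbour_lists(N, E, directed=False))    # {1: [2, 3], 2: [1], 3: [1]}
```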

Paths

Constructive and/or procedural


Graph Representations

[Figure: formats by representation type (legend: edge, edge/const/proc, edge/matrix, edge/neigh, edge/neigh/matrix, edge/path, edge/paths, edge/procedural, matrix, matrix/smatrix, neigh, neigh/edge/matrix, smatrix, smatrix/matrix)]


Descriptors: Graph Types and Generalisations

directed/undirected

acyclic/trees

multigraph or pseudograph: has multiple parallel links between two nodes
- e.g., it’s easy to have two links between two routers
- also allows self-loops

hypergraph: links connect more than two nodes
- e.g., where you have a connective medium (rather than a wire), for instance in a wireless network

hierarchy
- nodes have subgraphs

meta-graph
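A plain list of distinct node pairs cannot express all of these generalisations. A minimal Python sketch of how the underlying data structure has to change for each; the node names are hypothetical, and this is not the encoding used by any particular format.

```python
from collections import Counter

# Multigraph: the same (ordered) pair may appear more than once,
# and self-loops are allowed.
multi_edges = [("r1", "r2"), ("r1", "r2"), ("r2", "r2")]
multiplicity = Counter(multi_edges)
print(multiplicity[("r1", "r2")])   # 2 parallel links
print(multiplicity[("r2", "r2")])   # 1 self-loop

# Hypergraph: each link is a set of nodes, possibly more than two
# (e.g. all stations sharing one wireless channel).
hyper_edges = [frozenset({"a", "b", "c"}), frozenset({"c", "d"})]
print(max(len(e) for e in hyper_edges))   # 3

# Hierarchy: a node can carry its own subgraph.
hierarchy = {"AS1": {"nodes": ["r1", "r2"], "edges": multi_edges}}
```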


Descriptors: Attributes

|      | Static                    | Dynamic                   |
|------|---------------------------|---------------------------|
| Node | name, location, type, ... | up/down, ...              |
| Edge | distance, capacity, ...   | up/down, utilisation, ... |


Descriptors: Attributes

Single number/weight (per edge)

Multiple
- fixed vs extensible

Defaults
- multiple inheritance

Visualization
- not really attributes of the graph
- used for drawing it
- color, shape, layout, ...

Ports

Temporal dynamics
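Defaults save space because a value only needs to be written where it differs from the inherited one. A minimal Python sketch of a default lookup chain using the standard library's `collections.ChainMap`; the three levels (edge, edge defaults, graph defaults) and the attribute names are assumptions for illustration, not the inheritance rules of any specific format.

```python
from collections import ChainMap

graph_defaults = {"colour": "black", "capacity_gbps": 1}   # hypothetical graph-wide defaults
edge_defaults = {"capacity_gbps": 10}                      # hypothetical override for all edges


def edge_attrs(own):
    """Look an attribute up on the edge itself, then in the edge defaults,
    then in the graph defaults (first match wins)."""
    return ChainMap(own, edge_defaults, graph_defaults)


e1 = edge_attrs({"distance_km": 650})
e2 = edge_attrs({"capacity_gbps": 40})
print(e1["capacity_gbps"], e1["colour"])   # 10 black
print(e2["capacity_gbps"])                 # 40
```

With more than one parent per level (true multiple inheritance), a format also has to specify the order in which parents are searched.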


Descriptors: General

- extensible
- schema checking
- checksums
- multiple graphs
- external references
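A checksum lets a reader detect a truncated or corrupted file, but the same graph can be serialised in many orders. A minimal Python sketch, assuming a hypothetical canonical form (sorted, tab-separated edge lines) hashed with SHA-256, so that the checksum depends on the graph rather than on the order the edges were written:

```python
import hashlib


def graph_checksum(edges):
    """SHA-256 over a canonical text form of an edge list.
    Sorting first makes the checksum independent of edge order
    (assumes node ids are mutually comparable)."""
    canonical = "\n".join(f"{u}\t{v}" for u, v in sorted(edges))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = graph_checksum([(1, 2), (1, 3)])
b = graph_checksum([(1, 3), (1, 2)])   # same graph, different order
c = graph_checksum([(1, 2), (2, 3)])   # different graph
print(a == b, a == c)                  # True False
```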


Common Attributes

[Figure: proportion of formats with each attribute (x-axis: proportion with attribute, 0.0 to 0.6). General attributes: multiple attributes, built-in compression, checksums, multiple inheritance, incremental specifications, temporal data dynamics, external data references, ports, hypergraph, multiple graphs, default values, extensible, multigraph, hierarchy, schema checking, checked, visualisation data, edge weights. Representation types: constructive, procedural, smatrix, matrix, neigh, edge.]


Attribute Correlations

[Figure: attribute pairs and their correlation p-values (x-axis: p-value, 0 to 0.00012)]
- multi graph : checked
- structure : proc
- edge weights : multiple attributes
- default values : ports
- multi graph : ports
- multiple attributes : schema checking
- multi graph : default values
- default values : visualisation data
- multiple attributes : edge
- hyper graph : ports
- integral metadata : schema checking
- multi graph : multiple attributes
- integral metadata : multiple attributes
- structure : schema checking


Lots of other considerations

Software support
- documentation
- maintenance
- issue of partial support

Public data
- how many people already use it

Efficiency

Human readability
- self-describing


What’s missing?

A lot of DB concepts

Most are designed to be read in one piece:
- no data partitioning
- no parallelisation
- no serialisation
- no random access

Most are not designed with data curation in mind:
- no support for editing
- no support for versioning, ...

Graph DBs handle these
- but not the portability/exchange issues
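Random access needs some kind of index, which a format read in one piece does not provide. A minimal Python sketch of the idea, assuming a hypothetical line-per-node adjacency file (`node: n1 n2 ...`): one pass records the byte offset of each line, after which a single node's neighbours can be read with a `seek` instead of parsing the whole graph. This illustrates the missing feature; it is not a feature of any particular format.

```python
import io


def build_offset_index(f):
    """Record the byte offset at which each node's line starts."""
    index, offset = {}, 0
    for line in f:
        node = line.split(b":", 1)[0].decode()
        index[node] = offset
        offset += len(line)
    return index


def neighbours(f, index, node):
    """Seek straight to one node's line and parse only that line."""
    f.seek(index[node])
    _, rest = f.readline().split(b":", 1)
    return rest.decode().split()


data = io.BytesIO(b"1: 2 3\n2: 1\n3: 1\n")   # stands in for a file on disk
idx = build_offset_index(data)
print(neighbours(data, idx, "3"))           # ['1']
```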


What’s missing?

Compression

Storing large graphs
- similar to storing images
- nice to have native compression

Only one (BVGraph) treats compression seriously
- a couple of others take a little care, but aren’t true compression schemes

Most have XML-like bloat
- the file compresses OK, but what about read/write performance?

There are a few papers on graph compression, but little work on formats that support it.
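One starting point for BVGraph-style compression is that sorted successor lists contain small gaps that need few bits. A much simplified Python sketch of that idea, assuming gap encoding plus a generic byte-oriented variable-length integer code; BVGraph itself uses bit-level codes and further techniques (such as referencing similar lists), none of which are shown here.

```python
def gaps(successors):
    """Replace a sorted successor list by its first value and successive differences."""
    s = sorted(successors)
    if not s:
        return []
    return [s[0]] + [b - a for a, b in zip(s, s[1:])]


def varint(n):
    """Unsigned LEB128-style variable-length integer: 7 bits per byte."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode(successors):
    return b"".join(varint(g) for g in gaps(successors))


def decode(data):
    values, n, shift, total = [], 0, 0, 0
    for byte in data:
        n |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:           # last byte of this integer
            total += n                # undo the gap encoding
            values.append(total)
            n, shift = 0, 0
    return values


succ = [5, 7, 8, 200, 1000]
blob = encode(succ)
print(gaps(succ), len(blob))          # [5, 2, 1, 192, 800] 7
```

Five 32-bit ids would take 20 bytes; the gap-plus-varint form takes 7, and it can still be decoded sequentially without decompressing a whole file first.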


Conclusion

Graph Exchange Formats
- very many
- lots of overlap
- lots of variety
- no “one true” format

Perhaps we need three:
- TGF (Trivial Graph Format)
- GraphML (feature-rich, extensible, ...)
- a format that can cope with really big graphs

Maybe a container format is what we really need?
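Part of TGF's appeal is that a complete reader and writer fit in a few lines: node lines `id label`, a line containing only `#`, then edge lines `source target label`. A minimal Python sketch, assuming ids contain no whitespace and labels are optional; it is not a reference implementation and does no validation.

```python
def write_tgf(nodes, edges):
    """nodes: {id: label}; edges: [(source, target, label)], label may be ''."""
    lines = [f"{n} {label}".rstrip() for n, label in nodes.items()]
    lines.append("#")                                  # separates nodes from edges
    lines += [f"{u} {v} {label}".rstrip() for u, v, label in edges]
    return "\n".join(lines) + "\n"


def read_tgf(text):
    nodes, edges, in_edges = {}, [], False
    for line in text.splitlines():
        if not line.strip():
            continue                                   # skip blank lines
        if line.strip() == "#":
            in_edges = True
            continue
        if in_edges:
            parts = line.split(maxsplit=2)             # source, target, optional label
            edges.append((parts[0], parts[1], parts[2] if len(parts) > 2 else ""))
        else:
            parts = line.split(maxsplit=1)             # id, optional label
            nodes[parts[0]] = parts[1] if len(parts) > 1 else ""
    return nodes, edges


text = write_tgf({"1": "Adelaide", "2": "Melbourne"}, [("1", "2", "650km")])
print(text)
```

The same brevity is also TGF's limit: there is one free-text label per node and per edge, with no typed attributes, defaults or schema, which is where a richer format like GraphML comes in.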
