An Introduction to Graph Analytics Platforms
M. Tamer Ozsu
University of Waterloo
David R. Cheriton School of Computer Science
© M. Tamer Ozsu Dagstuhl Spring School (2016/03/07–09) 1 / 59
Graph Data are Very Common
Internet
Social networks
Trade volumes and connections
Biological networks
Linked data (as of September 2011)
[Figure: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net/. Each node is a published dataset (e.g., DBpedia, GeoNames, UniProt, MusicBrainz); the legend groups the datasets into media, geographic, publications, government, cross-domain, life sciences, and user-generated content.]
Outline
1 Introduction – Graph Types
2 Property Graph Processing
    Classification
    Online querying
    Offline analytics
3 Graph Analytics Computational Models
    Vertex-Centric
    Block-Centric
    MapReduce-Based
    Modified MapReduce
Graph Types

Property graph
Vertices and their properties:
    film 2014: (initial_release_date, “1980-05-23”), (label, “The Shining”)
    books 0743424425: (rating, 4.7)
    offers 0743424425amazonOffer
    geo 2635167: (name, “United Kingdom”), (population, 62348447)
    actor 29704: (actor_name, “Jack Nicholson”)
    film 3418: (label, “The Passenger”)
    film 1267: (label, “The Last Tycoon”)
    director 8476: (director_name, “Stanley Kubrick”)
    film 2685: (label, “A Clockwork Orange”)
    film 424: (label, “Spartacus”)
    actor 30013
Edge labels: (relatedBook), (hasOffer), (based_near), (actor), (director)
Workload: Online queries and analytic workloads
Query execution: Varies

RDF graph
[Figure: the same data as an RDF graph. Resources (mdb:film/2014, mdb:film/3418, mdb:film/1267, mdb:film/2685, mdb:film/424, mdb:actor/29704, mdb:actor/30013, mdb:director/8476, bm:books/0743424425, bm:offers/0743424425amazonOffer, geo:2635167) and literals (“The Shining”, “1980-05-23”, 4.7, “United Kingdom”, 62348447, ...) are nodes; predicates (rdfs:label, movie:initial_release_date, movie:actor, movie:actor_name, movie:director, movie:director_name, movie:relatedBook, scom:hasOffer, foaf:based_near, rev:rating, gn:name, gn:population) are edge labels.]
Workload: SPARQL queries
Query execution: subgraph matching by homomorphism
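The relationship between the two graph types can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory `PropertyGraph` class (not taken from any library or from the slides) and flattens it into (subject, predicate, object) triples: vertex properties become triples with literal objects, and labelled edges become triples between vertices. The vertex identifiers and the film-to-director edge direction follow the example data above.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    vid: str
    props: dict = field(default_factory=dict)


@dataclass
class PropertyGraph:
    """Hypothetical minimal property graph: vertices carry key/value
    properties, edges carry a single label."""
    vertices: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (source id, label, target id)

    def add_vertex(self, vid, **props):
        self.vertices[vid] = Vertex(vid, dict(props))

    def add_edge(self, src, label, dst):
        self.edges.append((src, label, dst))


def to_triples(pg):
    """Flatten a property graph into (subject, predicate, object) triples."""
    triples = []
    for v in pg.vertices.values():
        for key, value in v.props.items():
            triples.append((v.vid, key, value))   # property -> literal object
    for src, label, dst in pg.edges:
        triples.append((src, label, dst))          # edge -> resource object
    return triples


pg = PropertyGraph()
pg.add_vertex("film/2014", label="The Shining",
              initial_release_date="1980-05-23")
pg.add_vertex("director/8476", director_name="Stanley Kubrick")
pg.add_edge("film/2014", "director", "director/8476")

triples = to_triples(pg)
for t in triples:
    print(t)
```

The sketch ignores prefixes and URIs entirely; it only shows that the property graph keeps properties inside vertices, while the RDF view turns every property and every edge into a separate triple.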
RDF Introduction
Everything is a uniquely named resource
    http://data.linkedmdb.org/resource/actor/JN29704
Prefixes can be used to shorten the names
    xmlns:y=http://data.linkedmdb.org/resource/actor/
    y:JN29704
Properties of resources can be defined
    y:JN29704:hasName “Jack Nicholson”
    y:JN29704:BornOnDate “1937-04-22”
    y:TS2014:title “The Shining”
    y:TS2014:releaseDate “1980-05-23”
Relationships with other resources can be defined
    y:TS2014    JN29704:movieActor
Resource descriptions can be contributed by different people/groups and can be located anywhere in the web
Integrated web “database”
RDF Data Model
Triple: Subject, Predicate (Property), Object (s, p, o)
Subject: the entity that is described (URI or blank node)
Predicate: a feature of the entity (URI)
Object: value of the feature (URI, blank node or literal)
(s, p, o) ∈ (U ∪ B) × U × (U ∪ B ∪ L)
U: set of URIs
B: set of blank nodes
L: set of literals
Set of RDF triples is called an RDF graph
[Figure: a Subject node (U or B) connected by a Predicate edge (U) to an Object node (U, B or L)]

| Subject | Predicate | Object |
| --- | --- | --- |
| http://...imdb.../film/2014 | rdfs:label | “The Shining” |
| http://...imdb.../film/2014 | movie:releaseDate | “1980-05-23” |
| http://...imdb.../29704 | movie:actor_name | “Jack Nicholson” |
| ... | ... | ... |
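The membership condition (s, p, o) ∈ (U ∪ B) × U × (U ∪ B ∪ L) can be read as a type check on each position of a triple. The following Python sketch shows one way to express it; the classes `URI`, `BlankNode` and `Literal` and the function `is_valid_triple` are illustrative names chosen here, not part of any RDF library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    value: object


def is_valid_triple(s, p, o):
    """Check (s, p, o) ∈ (U ∪ B) × U × (U ∪ B ∪ L)."""
    return (
        isinstance(s, (URI, BlankNode))               # subject: URI or blank node
        and isinstance(p, URI)                        # predicate: URI only
        and isinstance(o, (URI, BlankNode, Literal))  # object: anything
    )


film = URI("http://data.linkedmdb.org/resource/film/2014")
label = URI("http://www.w3.org/2000/01/rdf-schema#label")

good = (film, label, Literal("The Shining"))
bad = (Literal("The Shining"), label, film)  # literal in subject position

# An RDF graph is a set of triples; keep only the well-formed ones.
rdf_graph = {t for t in [good, bad] if is_valid_triple(*t)}
print(len(rdf_graph))
```

Literals are rejected as subjects and as predicates, which is exactly the asymmetry the formula encodes.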
RDF Example Instance
Prefixes: mdb=http://data.linkedmdb.org/resource/; geo=http://sws.geonames.org/; bm=http://wifo5-03.informatik.uni-mannheim.de/bookmashup/; lexvo=http://lexvo.org/id/; wp=http://en.wikipedia.org/wiki/

| Subject | Predicate | Object |
| --- | --- | --- |
| mdb:film/2014 | rdfs:label | “The Shining” |
| mdb:film/2014 | movie:initial_release_date | “1980-05-23” |
| mdb:film/2014 | movie:director | mdb:director/8476 |
| mdb:film/2014 | movie:actor | mdb:actor/29704 |
| mdb:film/2014 | movie:actor | mdb:actor/30013 |
| mdb:film/2014 | movie:music_contributor | mdb:music_contributor/4110 |
| mdb:film/2014 | foaf:based_near | geo:2635167 |
| mdb:film/2014 | movie:relatedBook | bm:0743424425 |
| mdb:film/2014 | movie:language | lexvo:iso639-3/eng |
| mdb:director/8476 | movie:director_name | “Stanley Kubrick” |
| mdb:film/2685 | movie:director | mdb:director/8476 |
| mdb:film/2685 | rdfs:label | “A Clockwork Orange” |
| mdb:film/424 | movie:director | mdb:director/8476 |
| mdb:film/424 | rdfs:label | “Spartacus” |
| mdb:actor/29704 | movie:actor_name | “Jack Nicholson” |
| mdb:film/1267 | movie:actor | mdb:actor/29704 |
| mdb:film/1267 | rdfs:label | “The Last Tycoon” |
| mdb:film/3418 | movie:actor | mdb:actor/29704 |
| mdb:film/3418 | rdfs:label | “The Passenger” |
| geo:2635167 | gn:name | “United Kingdom” |
| geo:2635167 | gn:population | 62348447 |
| geo:2635167 | gn:wikipediaArticle | wp:United_Kingdom |
| bm:books/0743424425 | dc:creator | bm:persons/Stephen+King |
| bm:books/0743424425 | rev:rating | 4.7 |
| bm:books/0743424425 | scom:hasOffer | bm:offers/0743424425amazonOffer |
| lexvo:iso639-3/eng | rdfs:label | “English” |
| lexvo:iso639-3/eng | lvont:usedIn | lexvo:iso3166/CA |
| lexvo:iso639-3/eng | lvont:usesScript | lexvo:script/Latn |

Subjects and predicates are URIs; objects are either URIs or literals.
RDF Graph
[Figure: the example triples drawn as a directed, edge-labelled graph. Resources such as mdb:film/2014, mdb:director/8476, mdb:actor/29704, bm:books/0743424425 and geo:2635167, and literals such as “The Shining” and “Stanley Kubrick”, are nodes; predicates such as rdfs:label, movie:director, movie:actor, movie:relatedBook, foaf:based_near, scom:hasOffer and rev:rating are edge labels.]
RDF Query Model – SPARQL
Query Model - SPARQL Protocol and RDF Query Language
Given U (set of URIs), L (set of literals), and V (set of variables), a SPARQL expression is defined recursively:
an atomic triple pattern, which is an element of (U ∪ V) × (U ∪ V) × (U ∪ V ∪ L)
    ?x rdfs:label “The Shining”
P FILTER R, where P is a graph pattern expression and R is a built-in SPARQL condition (i.e., analogous to a SQL predicate)
    ?x rev:rating ?p FILTER(?p > 3.0)
P1 AND/OPT/UNION P2, where P1 and P2 are graph pattern expressions
Example:
    SELECT ?name
    WHERE {
      ?m rdfs:label ?name . ?m movie:director ?d .
      ?d movie:director_name "Stanley Kubrick" .
      ?m movie:relatedBook ?b . ?b rev:rating ?r .
      FILTER(?r > 4.0)
    }
SPARQL Queries
The example query can be drawn as a query graph whose nodes are variables and constants and whose edges are the triple patterns:
    ?m rdfs:label ?name
    ?m movie:director ?d
    ?d movie:director_name “Stanley Kubrick”
    ?m movie:relatedBook ?b
    ?b rev:rating ?r
    FILTER(?r > 4.0)
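Evaluating such a query amounts to finding mappings from the query graph into the RDF graph. The Python sketch below is a naive nested-loop evaluator over an in-memory list of triples, written only to illustrate matching by homomorphism (two variables may bind to the same node); it is not how a SPARQL engine is implemented. The helper names `is_var`, `match_pattern` and `evaluate` are made up for this example, and the book is written as bm:books/0743424425 in both triples that mention it, an assumption made here so that the join on ?b succeeds.

```python
TRIPLES = [
    ("mdb:film/2014", "rdfs:label", "The Shining"),
    ("mdb:film/2014", "movie:director", "mdb:director/8476"),
    ("mdb:film/2014", "movie:relatedBook", "bm:books/0743424425"),
    ("mdb:film/2685", "rdfs:label", "A Clockwork Orange"),
    ("mdb:film/2685", "movie:director", "mdb:director/8476"),
    ("mdb:film/424", "rdfs:label", "Spartacus"),
    ("mdb:film/424", "movie:director", "mdb:director/8476"),
    ("mdb:director/8476", "movie:director_name", "Stanley Kubrick"),
    ("bm:books/0743424425", "rev:rating", 4.7),
]


def is_var(term):
    """Variables are strings starting with '?', as in SPARQL."""
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` maps onto `triple`, else None."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind the variable, or check consistency with an earlier binding.
            if new.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new


def evaluate(patterns, triples, filters=()):
    """Join the triple patterns one by one, then apply the FILTER conditions."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            ext
            for b in bindings
            for t in triples
            if (ext := match_pattern(pattern, t, b)) is not None
        ]
    return [b for b in bindings if all(f(b) for f in filters)]


patterns = [
    ("?m", "rdfs:label", "?name"),
    ("?m", "movie:director", "?d"),
    ("?d", "movie:director_name", "Stanley Kubrick"),
    ("?m", "movie:relatedBook", "?b"),
    ("?b", "rev:rating", "?r"),
]
filters = [lambda b: b["?r"] > 4.0]

names = [b["?name"] for b in evaluate(patterns, TRIPLES, filters)]
print(names)  # ['The Shining']
```

All three Kubrick films match the first three patterns, but only “The Shining” has a related book, so the remaining patterns and the filter leave a single answer.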
Classification [Ammar and Ozsu, 2016]
Graph Dynamism: the focus is on the dynamism of the graphs, i.e., whether or not they change and how they change.
    Static Graphs: graphs do not change, or we are not interested in their changes; only a snapshot is considered.
    Dynamic Graphs: graphs change and we are interested in their changes.
    Streaming Graphs: dynamic graphs with high-velocity changes; it is not possible to see the entire graph at once.
    Evolving Graphs: dynamic graphs with unknown changes; these require re-discovery of the graph (e.g., LOD).
Algorithm Types: the focus is on how algorithms behave as their input changes.
    Offline, Online, Streaming, Incremental, Dynamic, Batch Dynamic
Workload Types: the types of workloads that the approaches are designed to handle.
    Online Queries, Analytics Workloads
Classification [Ammar and Ozsu, 2016]
Workload Types
    Online Queries: computation accesses a portion of the graph and the results are computed for a subset of vertices; e.g., point-to-point shortest path, subgraph matching, reachability, SPARQL.
    Analytics Workloads: computation accesses the entire graph and may require multiple iterations; e.g., PageRank, clustering, graph colouring, all pairs shortest path.
© M. Tamer Ozsu Dagstuhl Spring School (2016/03/07–09) 14 / 59
![Page 36: An Introduction to Graph Analytics Platforms · NDL subjects ndlna my Experi-ment Italian Museums medu-cator MARC Codes List Man-chester Reading Lists Lotico ... Bio-graphie data](https://reader034.fdocuments.in/reader034/viewer/2022042307/5ed3e5e7e6b26c6781533345/html5/thumbnails/36.jpg)
Classification [Ammar and Ozsu, 2016]
Graph Dynamism
StaticGraphs
DynamicGraphs
StreamingGraphs
EvolvingGraphs
Algorithm Types
Offline Online
Streaming Incremental
Dynamic
BatchDynamic
Workload Types
OnlineQueries
AnalyticsWorkloads
© M. Tamer Ozsu Dagstuhl Spring School (2016/03/07–09) 14 / 59
![Page 37: An Introduction to Graph Analytics Platforms · NDL subjects ndlna my Experi-ment Italian Museums medu-cator MARC Codes List Man-chester Reading Lists Lotico ... Bio-graphie data](https://reader034.fdocuments.in/reader034/viewer/2022042307/5ed3e5e7e6b26c6781533345/html5/thumbnails/37.jpg)
Classification [Ammar and Ozsu, 2016]
Graph Dynamism
StaticGraphs
DynamicGraphs
StreamingGraphs
EvolvingGraphs
Algorithm Types
Offline Online
Streaming Incremental
Dynamic
BatchDynamic
Workload Types
OnlineQueries
AnalyticsWorkloads
Sees the en-
tire input in
advance.
© M. Tamer Ozsu Dagstuhl Spring School (2016/03/07–09) 14 / 59
![Page 38: An Introduction to Graph Analytics Platforms · NDL subjects ndlna my Experi-ment Italian Museums medu-cator MARC Codes List Man-chester Reading Lists Lotico ... Bio-graphie data](https://reader034.fdocuments.in/reader034/viewer/2022042307/5ed3e5e7e6b26c6781533345/html5/thumbnails/38.jpg)
Classification [Ammar and Ozsu, 2016]
Graph Dynamism
StaticGraphs
DynamicGraphs
StreamingGraphs
EvolvingGraphs
Algorithm Types
Offline Online
Streaming Incremental
Dynamic
BatchDynamic
Workload Types
OnlineQueries
AnalyticsWorkloads
Sees the en-
tire input in
advance.
Sees the input
piece-meal as it
executes.
© M. Tamer Ozsu Dagstuhl Spring School (2016/03/07–09) 14 / 59
![Page 39: An Introduction to Graph Analytics Platforms · NDL subjects ndlna my Experi-ment Italian Museums medu-cator MARC Codes List Man-chester Reading Lists Lotico ... Bio-graphie data](https://reader034.fdocuments.in/reader034/viewer/2022042307/5ed3e5e7e6b26c6781533345/html5/thumbnails/39.jpg)
Classification [Ammar and Ozsu, 2016]
Graph Dynamism
StaticGraphs
DynamicGraphs
StreamingGraphs
EvolvingGraphs
Algorithm Types
Offline Online
Streaming Incremental
Dynamic
BatchDynamic
Workload Types
OnlineQueries
AnalyticsWorkloads
Sees the en-
tire input in
advance.
Sees the input
piece-meal as it
executes.
One-pass on-
line algorithm
with limited
memory.
© M. Tamer Ozsu Dagstuhl Spring School (2016/03/07–09) 14 / 59
![Page 40: An Introduction to Graph Analytics Platforms · NDL subjects ndlna my Experi-ment Italian Museums medu-cator MARC Codes List Man-chester Reading Lists Lotico ... Bio-graphie data](https://reader034.fdocuments.in/reader034/viewer/2022042307/5ed3e5e7e6b26c6781533345/html5/thumbnails/40.jpg)
Classification [Ammar and Ozsu, 2016]
Graph Dynamism
StaticGraphs
DynamicGraphs
StreamingGraphs
EvolvingGraphs
Algorithm Types
Offline Online
Streaming Incremental
Dynamic
BatchDynamic
Workload Types
OnlineQueries
AnalyticsWorkloads
Sees the en-
tire input in
advance.
Sees the input
piece-meal as it
executes.
One-pass on-
line algorithm
with limited
memory.
Online algo-
rithm with
some info
about forth-
coming input.
© M. Tamer Ozsu Dagstuhl Spring School (2016/03/07–09) 14 / 59
![Page 41: An Introduction to Graph Analytics Platforms · NDL subjects ndlna my Experi-ment Italian Museums medu-cator MARC Codes List Man-chester Reading Lists Lotico ... Bio-graphie data](https://reader034.fdocuments.in/reader034/viewer/2022042307/5ed3e5e7e6b26c6781533345/html5/thumbnails/41.jpg)
Classification [Ammar and Ozsu, 2016]
Graph Dynamism
StaticGraphs
DynamicGraphs
StreamingGraphs
EvolvingGraphs
Algorithm Types
Offline Online
Streaming Incremental
Dynamic
BatchDynamic
Workload Types
OnlineQueries
AnalyticsWorkloads
Sees the en-
tire input in
advance.
Sees the input
piece-meal as it
executes.
One-pass on-
line algorithm
with limited
memory.
Online algo-
rithm with
some info
about forth-
coming input.
Sees the en-
tire input
in advance,
which may
change; an-
swers computed
as change oc-
curs.
© M. Tamer Ozsu Dagstuhl Spring School (2016/03/07–09) 14 / 59
![Page 42: An Introduction to Graph Analytics Platforms · NDL subjects ndlna my Experi-ment Italian Museums medu-cator MARC Codes List Man-chester Reading Lists Lotico ... Bio-graphie data](https://reader034.fdocuments.in/reader034/viewer/2022042307/5ed3e5e7e6b26c6781533345/html5/thumbnails/42.jpg)
Classification [Ammar and Ozsu, 2016]
Graph Dynamism
StaticGraphs
DynamicGraphs
StreamingGraphs
EvolvingGraphs
Algorithm Types
Offline Online
Streaming Incremental
Dynamic
BatchDynamic
Workload Types
OnlineQueries
AnalyticsWorkloads
Sees the en-
tire input in
advance.
Sees the input
piece-meal as it
executes.
One-pass on-
line algorithm
with limited
memory.
Online algo-
rithm with
some info
about forth-
coming input.
Sees the en-
tire input
in advance,
which may
change; an-
swers computed
as change oc-
curs.
Similar to dynamic,
but computation
happens in batches
of changes.
© M. Tamer Ozsu Dagstuhl Spring School (2016/03/07–09) 14 / 59
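To make the "streaming" algorithm type concrete, here is a minimal sketch (not taken from the talk) of a one-pass algorithm that answers connectivity queries over an edge stream. It assumes an insert-only stream and memory proportional to the number of vertices seen rather than the number of edges; the class name `StreamingConnectivity` and its methods are hypothetical.

```python
class StreamingConnectivity:
    """One-pass connectivity over an insert-only edge stream (union-find)."""

    def __init__(self):
        self.parent = {}  # vertex -> parent in the union-find forest

    def _find(self, v):
        # Register unseen vertices lazily, then walk to the root (path halving).
        self.parent.setdefault(v, v)
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]
            v = self.parent[v]
        return v

    def add_edge(self, u, v):
        # Each edge is seen exactly once and is never stored.
        ru, rv = self._find(u), self._find(v)
        if ru != rv:
            self.parent[ru] = rv

    def connected(self, u, v):
        return self._find(u) == self._find(v)


# Hypothetical stream; the query is re-answered after every arriving edge.
stream = [("a", "b"), ("c", "d"), ("b", "c")]
sc = StreamingConnectivity()
answers = []
for u, v in stream:
    sc.add_edge(u, v)
    answers.append(sc.connected("a", "d"))
print(answers)
```

An offline algorithm would instead load all three edges first and compute components once; the sketch only ever holds the per-vertex forest.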
Example Design Points
[Figure: the classification diagram (graph dynamism, algorithm types, workload types), shown once per statement below.]
Compute the query result/perform analytic computation over the graph as it exists.
Compute the query result/perform analytic computation over the graph as it is revealed.
Compute the query result/perform analytic computation on each snapshot from scratch.
Continuously compute the query result/perform analytic computation as the input changes.
Compute the query result/perform analytic computation after a batch of input changes.
Example Design Points – Not all alternatives make sense
Dynamic (or batch-dynamic) algorithms do not make sense for static graphs.
Graph Processing Systems

| System | Memory/Disk | Architecture | Computing paradigm | Supported Workloads |
|---|---|---|---|---|
| Hadoop | Disk | Parallel/Distributed | MapReduce | Analytical |
| Haloop | Disk | Parallel/Distributed | MapReduce | Analytical |
| Pegasus | Disk | Parallel/Distributed | MapReduce | Analytical |
| GraphX | Disk | Parallel/Distributed | MapReduce (Spark) | Analytical |
| Pregel/Giraph | Memory | Parallel/Distributed | Vertex-Centric | Analytical |
| GraphLab | Memory | Parallel/Distributed | Vertex-Centric | Analytical |
| GraphChi | Disk | Single machine | Vertex-Centric | Analytical |
| X-Stream | Disk | Single machine | Edge-Centric | Analytical |
| Trinity | Memory | Parallel/Distributed | Flexible using K-V store on DSM | Online & Analytical |
| Titan | Disk | Parallel/Distributed | K-V store (Cassandra) | Online |
| Neo4J | Disk | Single machine | Procedural/Linked-list | Online |
Graph Workloads
Online graph querying
  Reachability
  Single source shortest-path
  Subgraph matching
  SPARQL queries
Offline graph analytics
  PageRank
  Clustering
  Strongly connected components
  Diameter finding
  Graph colouring
  All pairs shortest path
  Graph pattern mining
  Machine learning algorithms (Belief propagation, Gaussian non-negative matrix factorization)
Outline
1 Introduction – Graph Types
2 Property Graph Processing
  Classification
  Online querying
  Offline analytics
3 Graph Analytics Computational Models
  Vertex-Centric
  Block-Centric
  MapReduce-Based
  Modified MapReduce
Reachability Queries
[Figure: example property graph. Vertices and their properties:]
film 2014: (initial release date, “1980-05-23”), (label, “The Shining”)
books 0743424425: (rating, 4.7)
offers 0743424425amazonOffer
geo 2635167: (name, “United Kingdom”), (population, 62348447)
actor 29704: (actor name, “Jack Nicholson”)
film 3418: (label, “The Passenger”)
film 1267: (label, “The Last Tycoon”)
director 8476: (director name, “Stanley Kubrick”)
film 2685: (label, “A Clockwork Orange”)
film 424: (label, “Spartacus”)
actor 30013
Edge labels: relatedBook, hasOffer, based near, actor, director
Can you reach film 1267 from film 2014?
Is there a book whose rating is > 4.0 associated with a film that was directed by Stanley Kubrick?
Think of Facebook graph and finding friends of friends.
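The first question on the slide can be answered with a plain breadth-first search. The following is a minimal sketch under stated assumptions: the transcript preserves the vertex and edge labels of the figure but not which vertices each edge connects, so the `EDGES` list below is hypothetical, and edges are treated as undirected. The helper names `build_adjacency` and `reachable` are also illustrative.

```python
from collections import deque

# Hypothetical edges: assumed for illustration, treated as undirected.
EDGES = [
    ("film 2014", "actor", "actor 29704"),
    ("film 3418", "actor", "actor 29704"),
    ("film 1267", "actor", "actor 29704"),
    ("film 2014", "director", "director 8476"),
    ("film 2685", "director", "director 8476"),
    ("film 424", "director", "director 8476"),
    ("film 2014", "relatedBook", "books 0743424425"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring edge labels."""
    adj = {}
    for src, _label, dst in edges:
        adj.setdefault(src, set()).add(dst)
        adj.setdefault(dst, set()).add(src)
    return adj


def reachable(adj, source, target):
    """Breadth-first search that stops as soon as the target is found."""
    seen = {source}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            return True
        for w in adj.get(v, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return False


ADJ = build_adjacency(EDGES)
print(reachable(ADJ, "film 2014", "film 1267"))
```

The search visits only the vertices it needs before it finds the target, which is what makes reachability an online query rather than an analytics workload in the classification above.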
Subgraph Matching
[Figure: a query graph matched against the RDF data graph.]
Query graph vertices: ?m, ?d, ?name, ?b, ?r, “Stanley Kubrick”
Query graph edge labels: movie:director, rdfs:label, movie:relatedBook, movie:director_name, rev:rating
Query filter: FILTER(?r > 4.0)
Data graph vertices and their literals:
mdb:film/2014: movie:initial_release_date “1980-05-23”; rdfs:label “The Shining”
bm:books/0743424425: rev:rating 4.7
bm:offers/0743424425amazonOffer
geo:2635167: gn:name “United Kingdom”; gn:population 62348447
mdb:actor/29704: movie:actor_name “Jack Nicholson”
mdb:film/3418: rdfs:label “The Passenger”
mdb:film/1267: rdfs:label “The Last Tycoon”
mdb:director/8476: movie:director_name “Stanley Kubrick”
mdb:film/2685: rdfs:label “A Clockwork Orange”
mdb:film/424: rdfs:label “Spartacus”
mdb:actor/30013
Data graph edge labels between resources: movie:relatedBook, scam:hasOffer, foaf:based_near, movie:actor, movie:director
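Subgraph matching can be sketched as backtracking over triple patterns. The example below is an illustration, not the method of any particular system: the `TRIPLES` list is a hypothetical fragment of the data graph (the transcript does not preserve edge endpoints), the `QUERY` patterns are one plausible reading of the query graph's labels, and the function names `unify` and `match` are made up for this sketch.

```python
# Hypothetical triples: assumed for illustration only.
TRIPLES = [
    ("mdb:film/2014", "rdfs:label", "The Shining"),
    ("mdb:film/2014", "movie:director", "mdb:director/8476"),
    ("mdb:film/2014", "movie:relatedBook", "bm:books/0743424425"),
    ("mdb:director/8476", "movie:director_name", "Stanley Kubrick"),
    ("bm:books/0743424425", "rev:rating", 4.7),
    ("mdb:film/2685", "rdfs:label", "A Clockwork Orange"),
    ("mdb:film/2685", "movie:director", "mdb:director/8476"),
]

# One plausible reading of the query graph; terms starting with "?" are variables.
QUERY = [
    ("?m", "rdfs:label", "?name"),
    ("?m", "movie:director", "?d"),
    ("?d", "movie:director_name", "Stanley Kubrick"),
    ("?m", "movie:relatedBook", "?b"),
    ("?b", "rev:rating", "?r"),
]


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def unify(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if new.setdefault(p, t) != t:
                return None  # variable already bound to a different value
        elif p != t:
            return None      # constant mismatch
    return new


def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    for triple in triples:
        extended = unify(patterns[0], triple, binding)
        if extended is not None:
            yield from match(patterns[1:], triples, extended)


# Apply the FILTER(?r > 4.0) condition after matching.
results = [b["?name"] for b in match(QUERY, TRIPLES) if b["?r"] > 4.0]
print(results)
```

With these assumed triples, only the film that has both a Kubrick director edge and a related book rated above 4.0 survives; the second film fails at the `movie:relatedBook` pattern.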
PageRank Computation
A web page is important if it is pointed to by other important pages.
[Figure: example web graph with six pages, P1 to P6.]

$$r(P_i) = \sum_{P_j \in B_{P_i}} \frac{r(P_j)}{|F_{P_j}|}$$

$$r(P_2) = \frac{r(P_1)}{2} + \frac{r(P_3)}{3}$$

$$r_{k+1}(P_i) = \sum_{P_j \in B_{P_i}} \frac{r_k(P_j)}{|F_{P_j}|}$$

$B_{P_i}$: in-neighbours of $P_i$
$F_{P_i}$: out-neighbours of $P_i$

| Page | Iteration 0 | Iteration 1 | Iteration 2 | Rank at Iter. 2 |
|---|---|---|---|---|
| P1 | r0(P1) = 1/6 | r1(P1) = 1/18 | r2(P1) = 1/36 | 5 |
| P2 | r0(P2) = 1/6 | r1(P2) = 5/36 | r2(P2) = 1/18 | 4 |
| P3 | r0(P3) = 1/6 | r1(P3) = 1/12 | r2(P3) = 1/36 | 5 |
| P4 | r0(P4) = 1/6 | r1(P4) = 1/4 | r2(P4) = 17/72 | 1 |
| P5 | r0(P5) = 1/6 | r1(P5) = 5/36 | r2(P5) = 11/72 | 3 |
| P6 | r0(P6) = 1/6 | r1(P6) = 1/6 | r2(P6) = 14/72 | 2 |

Iterative processing.
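The iteration can be checked with exact fractions. This is a minimal sketch of the simplified formula on the slide (no damping factor). The figure's edges are lost in the transcript, so the `OUT_LINKS` map below is an assumption: it was chosen because it agrees with the worked example for r(P2) and with the iteration values listed on the slide.

```python
from fractions import Fraction

# Assumed out-links: the figure is lost; this edge list is consistent with
# the slide's worked example and iteration values.
OUT_LINKS = {
    "P1": ["P2", "P3"],
    "P2": [],
    "P3": ["P1", "P2", "P5"],
    "P4": ["P5", "P6"],
    "P5": ["P4", "P6"],
    "P6": ["P4"],
}


def pagerank_step(ranks, out_links):
    """Apply r_{k+1}(P_i) = sum over in-neighbours P_j of r_k(P_j) / |F_{P_j}|."""
    nxt = {page: Fraction(0) for page in out_links}
    for page, targets in out_links.items():
        for target in targets:
            # Each page splits its current rank evenly over its out-links.
            nxt[target] += ranks[page] / len(targets)
    return nxt


# Iteration 0: uniform ranks of 1/6.
ranks = {page: Fraction(1, len(OUT_LINKS)) for page in OUT_LINKS}
history = [ranks]
for _ in range(2):
    ranks = pagerank_step(ranks, OUT_LINKS)
    history.append(ranks)

for page in OUT_LINKS:
    print(page, [str(h[page]) for h in history])
```

`Fraction` prints values in lowest terms, so 14/72 appears as 7/36. Every step reads the ranks of all pages, which is why PageRank is classed as an analytics workload.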
Some Alternative Computational Models for Offline Analytics
Vertex-centric (Scatter-Gather)
  Specify (a) computation at each vertex, and (b) communication with neighbour vertices
  Synchronous – Pregel [Malewicz et al., 2010], Giraph
  Asynchronous – GraphLab [Low et al., 2012]
Block-centric
  Similar to vertex-centric but on blocks for communication
    Block: connected subgraph of the graph
  Blogel [Yan et al., 2014]
MapReduce
  Need to save in HDFS intermediate results of each iteration – both good and bad
  Hadoop, Haloop [Bu et al., 2012]
Modified MapReduce
  Based on Spark [Zaharia et al., 2010; Zaharia, 2016]
    Keep intermediate states in memory
    Provide fault-tolerance by keeping lineage
  GraphX [Gonzalez et al., 2014]
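The MapReduce entry can be illustrated with one rank-propagation iteration written as a map phase and a reduce phase. This is a single-process sketch under stated assumptions, not Hadoop code: the functions `map_phase` and `reduce_phase`, the record layout, and the two-vertex graph are all hypothetical, and the grouping loop stands in for the shuffle.

```python
from collections import defaultdict


def map_phase(records):
    """records: (vertex, (rank, out_neighbours)) -> list of (key, value) pairs."""
    pairs = []
    for vertex, (rank, neighbours) in records:
        # The graph structure must be re-emitted so the next iteration has it.
        pairs.append((vertex, ("structure", neighbours)))
        for n in neighbours:
            pairs.append((n, ("rank", rank / len(neighbours))))
    return pairs


def reduce_phase(pairs):
    grouped = defaultdict(list)
    for key, value in pairs:  # stands in for the shuffle
        grouped[key].append(value)
    output = []
    for vertex, values in grouped.items():
        neighbours = next(v for tag, v in values if tag == "structure")
        rank = sum(v for tag, v in values if tag == "rank")
        output.append((vertex, (rank, neighbours)))
    return output


# Hypothetical two-vertex cycle: the ranks swap on every iteration.
records = [("a", (0.25, ["b"])), ("b", (0.75, ["a"]))]
for _ in range(3):
    # In a Hadoop job chain, `records` would be written to HDFS at the end of
    # one iteration and read back at the start of the next.
    records = reduce_phase(map_phase(records))
print(records)
```

The comment in the loop marks the step the slide calls "both good and bad": materialising every iteration's output gives a recovery point but adds I/O to each pass, which is the cost the Spark-based approach avoids by keeping intermediate state in memory.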
Vertex-Centric Computation
“Think like a vertex”
vertex_scatter(vertex v)
  Push local computation to neighbours on the out-bound edges
vertex_gather(vertex v)
  Gather local computation from neighbours on the in-bound edges
Continue until all vertices are inactive
Vertex state machine
  Active → Inactive: Vote halt
  Inactive → Active: Message received
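The scatter/gather cycle and the vertex state machine can be sketched as a small single-process simulation. The example below propagates the maximum value to every vertex, a task commonly used to introduce Pregel-style systems. It is an illustration only: the function `run_supersteps`, the four-vertex `GRAPH`, and the initial `VALUES` are assumptions, and no real framework API is used.

```python
def run_supersteps(out_edges, values):
    """Propagate the maximum value to every vertex in synchronous supersteps.

    out_edges: vertex -> list of out-neighbours; values: vertex -> initial value.
    """
    values = dict(values)
    active = set(out_edges)             # every vertex starts active
    inbox = {v: [] for v in out_edges}  # messages visible in this superstep
    superstep = 0
    while active:
        outbox = {v: [] for v in out_edges}
        for v in active:
            # Gather: combine messages that arrived on in-bound edges.
            best = max(inbox[v], default=values[v])
            changed = best > values[v]
            values[v] = max(values[v], best)
            # Scatter: push the value along out-bound edges if it is news.
            if superstep == 0 or changed:
                for w in out_edges[v]:
                    outbox[w].append(values[v])
            # The vertex then votes to halt (it leaves the active set below).
        # Barrier: messages become visible only in the next superstep, and
        # receiving a message reactivates a vertex.
        inbox = outbox
        active = {v for v, msgs in inbox.items() if msgs}
        superstep += 1
    return values, superstep


# Hypothetical input graph and values.
GRAPH = {"a": ["b"], "b": ["a", "c"], "c": ["d"], "d": ["c"]}
VALUES = {"a": 3, "b": 6, "c": 2, "d": 1}
final_values, supersteps = run_supersteps(GRAPH, VALUES)
print(final_values, supersteps)
```

The loop ends when no vertex has pending messages, which matches "continue until all vertices are inactive". Because `outbox` is swapped in only at the end of a superstep, the order in which active vertices are processed does not affect the result.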
Synchronous Vertex-Centric Computation
[Figure: Machines 1 to 3 executing Superstep 1, Superstep 2 and Superstep 3; labels: Computation, Communication, Barrier.]
Each machine performs vertex-centric computation on its graph partition
Asynchronous Vertex-Centric Computation
No communication barriers. ✓
Uses the most recent vertex values. ✓
Implemented via distributed locking
[Figure: asynchronous execution on Machines 1 to 3 over a graph with vertices v0 to v4.]
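The two properties above (no barriers, most recent values) can be illustrated with a worklist-driven version of maximum-value propagation. This is a single-threaded sketch under stated assumptions: the function `run_async` and the input graph are hypothetical, and the distributed locking that a real asynchronous system needs to protect concurrent neighbour updates is deliberately left out.

```python
from collections import deque


def run_async(out_edges, values):
    """Propagate the maximum value with no supersteps or barriers."""
    values = dict(values)
    worklist = deque(out_edges)  # vertices whose value must still be pushed
    queued = set(out_edges)
    updates = 0
    while worklist:
        v = worklist.popleft()
        queued.discard(v)
        for w in out_edges[v]:
            # The write is visible immediately, so vertices processed later
            # already read the most recent value.
            if values[v] > values[w]:
                values[w] = values[v]
                updates += 1
                if w not in queued:
                    queued.add(w)
                    worklist.append(w)
    return values, updates


# Hypothetical input graph and values.
GRAPH = {"a": ["b"], "b": ["a", "c"], "c": ["d"], "d": ["c"]}
VALUES = {"a": 3, "b": 6, "c": 2, "d": 1}
final_values, updates = run_async(GRAPH, VALUES)
print(final_values, updates)
```

In a distributed setting each `values[w]` write would have to be serialised against other machines touching `w`, which is where the locking mentioned on the slide comes in.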
Summary of an Experiment [Han et al., 2014]
A large study comparing Giraph, GraphLab, GPS, Mizan.
1 Giraph scales better across graphs; GraphLab scales better across more machines.
2 Distributed locking for asynchronous execution is not scalable: performance degrades as more machines are used, due to lock contention, the termination scheme, and the lack of message batching.
3 Graph storage should be memory and mutation efficient.
4 Message processing optimizations are very important.
5 Workloads have different resource demands
Data for point 1:

| 64 machines | TW | UK |
|---|---|---|
| Giraph (byte array) | 5.8GB | 7.0GB |
| GraphLab (sync) | 4.5GB | 14GB |

| TW | 16 machines | 128 machines |
|---|---|---|
| Giraph (byte array) | 8.5GB | 5.8GB |
| GraphLab (sync) | 11GB | 3.3GB |
Data for point 3:

No Mutations

| | Time | Memory |
|---|---|---|
| Byte array | ✓ | ✓ |
| Hash map | ✗ | ✗ |

With Mutations (DMST)

| | Time | Memory |
|---|---|---|
| Byte array | ✗✗ | ✓ |
| Hash map | ✓ | ✗ |
Data for point 5:

| Algorithm | CPU | Memory | Network |
|---|---|---|---|
| PageRank | Medium | Medium | High |
| SSSP | Low | Low | Low |
| WCC | Low | Medium | Medium |
| DMST | High | High | Medium |
Outline
1 Introduction – Graph Types
2 Property Graph Processing
  Classification
  Online querying
  Offline analytics
3 Graph Analytics Computational Models
  Vertex-Centric
  Block-Centric
  MapReduce-Based
  Modified MapReduce
Block-Centric Computation
Blogel [Yan et al., 2014]: “Think like a block”; also “think like a graph” [Tian et al., 2013]
Vertex-centric assumes all vertices communicate over the network; this is not efficient
Real-world graphs have skewed vertex degree distribution
  Common in power-law graphs
  Problem: imbalanced communication workloads
Real-world graphs have large diameters
  Common in road networks, web graphs, terrain meshes
  Problem: one superstep per hop ⇒ too many supersteps
Real-world graphs have high average vertex degree
  Common in social networks, mobile communication networks
  Problem: heavy average communication workloads
Blogel Principles
Exploit the partitioning of the graph
Message exchanges only among blocks
Block: a connected subgraph of the graph
Within a block, run a serial in-memory algorithm; no need to follow a BSP model
Benefits of Block-Centric Computation
High-degree vertices inside a block send no messages
Fewer supersteps
Fewer blocks than vertices
Example: Weakly Connected Component
Algorithm exchanges vertex ids with neighbours
$\mathrm{id}(v_i) \leftarrow \min\{v_i, v_j, \ldots, v_k\}$, where $v_j, \ldots, v_k$ are neighbours of $v_i$
Vertex-centric requires that every vertex sends to its neighbours until every vertex is reached
Block-centric needs two iterations:
1 All vertices in partition A exchange ids; X and Y send ids to neighbours in partition B
2 All vertices in partition B exchange ids
[Figure: a graph split into partitions A and B, with labelled vertices 0, X, and Y]
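The vertex-centric side of this example can be sketched in a few lines of Python. This is only an illustration of the min-id rule, run sequentially on one machine; the function name `wcc_min_label` and the toy path graph are assumptions made here, not part of any system discussed in the slides. Edges are treated as undirected, as weak connectivity requires.

```python
def wcc_min_label(adjacency):
    """Synchronous min-label propagation over an undirected adjacency map.

    Returns (labels, rounds): the final component label of every vertex and
    the number of rounds in which at least one label changed.
    """
    labels = {v: v for v in adjacency}  # every vertex starts with its own id
    rounds = 0
    while True:
        # One superstep: each vertex takes the minimum over itself and its neighbours.
        updated = {
            v: min([labels[v]] + [labels[u] for u in adjacency[v]])
            for v in adjacency
        }
        if updated == labels:  # fixpoint: no label changed in this round
            return labels, rounds
        labels = updated
        rounds += 1


# A path 0 - 1 - 2 - 3 - 4: the smallest id has to travel one hop per round.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
labels, rounds = wcc_min_label(path)
print(labels, rounds)
```

On the path graph the number of rounds equals the distance the smallest id has to travel, which is why large-diameter graphs need many supersteps in the vertex-centric model.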
Block Construction
The partitioning algorithm needs to maximize the number of vertices that have all their edges in the same partition
Hash partitioning is not suitable because many vertices will probably have at least one cut-edge
URL partitioner
For web graphs: based on domain names of web page nodes
2D partitioner
For spatial networks: based on coordinates of node
Graph Voronoi diagram partitioner
For general graphs
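The partitioning goal above can be made concrete with a small measurement: count the vertices whose edges all stay inside one partition. The sketch below is a toy under stated assumptions, not Blogel's partitioner. The helper names (`interior_vertices`, `by_domain`, `by_hash`) and the six-page web graph are hypothetical, and CRC32 stands in for a generic hash function so that runs are reproducible.

```python
import zlib
from urllib.parse import urlparse


def interior_vertices(edges, partition_of):
    """Return the vertices whose edges all stay inside their own partition."""
    vertices = {v for edge in edges for v in edge}
    cut = set()
    for u, v in edges:
        if partition_of(u) != partition_of(v):
            cut.update((u, v))  # both endpoints touch a cut-edge
    return vertices - cut


def by_domain(url):
    """URL-style partitioner: one block per host name."""
    return urlparse(url).netloc


def by_hash(url, num_partitions=2):
    """Hash partitioner; CRC32 is used only to keep the result reproducible."""
    return zlib.crc32(url.encode("utf-8")) % num_partitions


# Hypothetical toy web graph: dense links inside each site, one link between sites.
edges = [
    ("http://a.org/1", "http://a.org/2"),
    ("http://a.org/2", "http://a.org/3"),
    ("http://a.org/1", "http://a.org/3"),
    ("http://b.org/1", "http://b.org/2"),
    ("http://b.org/2", "http://b.org/3"),
    ("http://b.org/1", "http://b.org/3"),
    ("http://a.org/3", "http://b.org/1"),
]
domain_interior = interior_vertices(edges, by_domain)
hash_interior = interior_vertices(edges, by_hash)
print(len(domain_interior), len(hash_interior))
```

With the domain-based partitioner only the two endpoints of the cross-site link touch a cut-edge; the hash-based count depends on how the hash happens to scatter the pages.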
MapReduce Basics [Li et al., 2014]
For data analysis of very large data sets
  Highly dynamic, irregular, schemaless, etc.
  SQL too heavy
“Embarrassingly parallel problems”
New, simple parallel programming model
Data structured as (key, value) pairs
  E.g. (doc-id, content), (word, count), etc.
Functional programming style with two functions to be given:
  Map(k1, v1) → list(k2, v2)
  Reduce(k2, list(v2)) → list(v3)
Implemented on a distributed file system (e.g., Google File System) on very large clusters
MapReduce Processing
[Figure: the input data set is split across several Map tasks, which emit (k1, v) and (k2, v) pairs; the pairs are grouped by key (Group by k) into lists such as (k1, (v, v, v)); each group is passed to a Reduce task, and the Reduce tasks write the output data set]
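The map, group-by-key, and reduce stages can be imitated sequentially to see how the two user-supplied functions fit together. The sketch below is a single-process stand-in, assuming the (doc-id, content) and (word, count) pairs mentioned earlier; `run_mapreduce`, `map_fn`, and `reduce_fn` are names chosen here for illustration and do not belong to any MapReduce library.

```python
from collections import defaultdict


def map_fn(doc_id, content):
    """Map(k1, v1) -> list(k2, v2): emit (word, 1) for every word."""
    return [(word.lower(), 1) for word in content.split()]


def reduce_fn(word, counts):
    """Reduce(k2, list(v2)) -> list(v3): sum the counts of one word."""
    return [sum(counts)]


def run_mapreduce(records, map_fn, reduce_fn):
    """Sequential stand-in for the map, group-by-key, and reduce phases."""
    groups = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in map_fn(k1, v1):
            groups[k2].append(v2)  # "Group by k"
    return {k2: reduce_fn(k2, values) for k2, values in groups.items()}


docs = [("d1", "graph data are common"), ("d2", "graph analytics on graph data")]
result = run_mapreduce(docs, map_fn, reduce_fn)
print(result["graph"], result["data"])
```

In a real deployment the map calls and the reduce calls run on different workers and the grouping involves a shuffle over the network; only the data flow is reproduced here.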
MapReduce Architecture
[Figure: a Master running the Scheduler coordinates the Workers. Each map-side Worker runs a Map Process made of an Input Module, a Map Module, a Combine Module, and a Partition Module; each reduce-side Worker runs a Reduce Process made of a Group Module, a Reduce Module, and an Output Module]
Execution Flow with Architecture [Dean and Ghemawat, 2008]
MapReduce: Simplified Data Processing on Large Clusters
7. When all map tasks and reduce tasks have been completed, the master wakes up the user program. At this point, the MapReduce call in the user program returns back to the user code.
After successful completion, the output of the mapreduce execution is available in the R output files (one per reduce task, with file names specified by the user). Typically, users do not need to combine these R output files into one file; they often pass these files as input to another MapReduce call or use them from another distributed application that is able to deal with input that is partitioned into multiple files.
3.2 Master Data Structures
The master keeps several data structures. For each map task and reduce task, it stores the state (idle, in-progress, or completed) and the identity of the worker machine (for nonidle tasks).
The master is the conduit through which the location of intermediate file regions is propagated from map tasks to reduce tasks. Therefore, for each completed map task, the master stores the locations and sizes of the R intermediate file regions produced by the map task. Updates to this location and size information are received as map tasks are completed. The information is pushed incrementally to workers that have in-progress reduce tasks.
3.3 Fault Tolerance
Since the MapReduce library is designed to help process very large amounts of data using hundreds or thousands of machines, the library must tolerate machine failures gracefully.
Handling Worker Failures
The master pings every worker periodically. If no response is received from a worker in a certain amount of time, the master marks the worker as failed. Any map tasks completed by the worker are reset back to their initial idle state and therefore become eligible for scheduling on other workers. Similarly, any map task or reduce task in progress on a failed worker is also reset to idle and becomes eligible for rescheduling.
Completed map tasks are reexecuted on a failure because their output is stored on the local disk(s) of the failed machine and is therefore inaccessible. Completed reduce tasks do not need to be reexecuted since their output is stored in a global file system.
When a map task is executed first by worker A and then later executed by worker B (because A failed), all workers executing reduce tasks are notified of the reexecution. Any reduce task that has not already read the data from worker A will read the data from worker B.
MapReduce is resilient to large-scale worker failures. For example, during one MapReduce operation, network maintenance on a running cluster was causing groups of 80 machines at a time to become unreachable for several minutes. The MapReduce master simply reexecuted the work done by the unreachable worker machines and continued to make forward progress, eventually completing the MapReduce operation.
Semantics in the Presence of Failures
When the user-supplied map and reduce operators are deterministic functions of their input values, our distributed implementation produces the same output as would have been produced by a nonfaulting sequential execution of the entire program.
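The worker-failure rules quoted above amount to a small state update on the master's task table. The following sketch shows one way that bookkeeping could look; the `Task` record and `handle_worker_failure` are hypothetical names introduced here, and the real master also handles pinging, scheduling, and notifying reduce workers, none of which is modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    kind: str                     # "map" or "reduce"
    state: str = "idle"           # "idle", "in-progress", or "completed"
    worker: Optional[str] = None  # worker identity, kept for non-idle tasks


def handle_worker_failure(tasks, failed_worker):
    """Reset the tasks that must be rerun once `failed_worker` is marked failed."""
    rescheduled = []
    for task in tasks:
        if task.worker != failed_worker:
            continue
        # Completed map output lives on the failed machine's local disk, so it is lost.
        lost_map_output = task.kind == "map" and task.state == "completed"
        if task.state == "in-progress" or lost_map_output:
            task.state, task.worker = "idle", None
            rescheduled.append(task)
        # Completed reduce tasks are left alone: their output is in the global file system.
    return rescheduled


tasks = [
    Task("map", "completed", "A"),
    Task("map", "in-progress", "B"),
    Task("reduce", "completed", "A"),
    Task("reduce", "in-progress", "A"),
]
rescheduled = handle_worker_failure(tasks, "A")
print([(t.kind, t.state, t.worker) for t in tasks])
```

Only the completed map task and the in-progress reduce task of worker A go back to idle; the completed reduce task keeps its state, matching the asymmetry between local-disk and global-file-system output described in the excerpt.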
[Fig. 1. Execution overview: (1) the User Program forks the Master and the workers; (2) the Master assigns map and reduce tasks to workers; (3) map workers read the input splits (split 0 to split 4); (4) map workers write intermediate files to local disks; (5) reduce workers read the intermediate files remotely; (6) reduce workers write output file 0 and output file 1. Stages, left to right: Input files, Map phase, Intermediate files (on local disks), Reduce phase, Output files.]
Source: Communications of the ACM, January 2008, Vol. 51, No. 1, p. 109
Hadoop
Most popular MapReduce implementation – developed by Yahoo!
Two components
  Processing engine
  HDFS: Hadoop Distributed File System – others possible
  Can be deployed on the same machine or on different machines
Processes
  Job tracker: hosted on the master node and implements the schedule
  Task tracker: hosted on the worker nodes; accepts tasks from the job tracker and executes them
HDFS
  Name node: stores how data are partitioned and the data dictionary, and monitors the status of data nodes
  Data node: stores and manages the data chunks assigned to it
[Figure: two layers across the cluster nodes. MapReduce layer: Task Tracker, Job Tracker, Task Tracker. HDFS layer: Data Node, Name Node, Data Node. Node labels: Worker 1, Name Node, Worker n]
HaLoop [Bu et al., 2012]
Overcome MapReduce shortcomings for iterative jobs
  Having to save data in HDFS between iterations
  Checking the fixpoint requires a new job at each iteration
Scheduler change: assign to the same machine the map & reducetasks that occur in different iterations but access the same data
Cache invariant data
Cache reduce output to easily check for fixpoint
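The two caching ideas can be illustrated with a plain iterative loop: the loop-invariant input is loaded once and reused, and the previous output is kept so that the fixpoint test is a local comparison. This is a single-process sketch under those assumptions and says nothing about HaLoop's actual API; `iterate_to_fixpoint`, `sssp_step`, and the four-edge graph are hypothetical.

```python
def iterate_to_fixpoint(invariant, state, step, max_iterations=100):
    """Repeat `step` until its output stops changing.

    `invariant` is passed unchanged to every iteration (the role played by the
    cache of loop-invariant data); `state` holds the previous output, so the
    fixpoint test needs no extra job.
    """
    for iteration in range(1, max_iterations + 1):
        new_state = step(invariant, state)
        if new_state == state:
            return state, iteration
        state = new_state
    raise RuntimeError("no fixpoint within max_iterations")


def sssp_step(edges, dist):
    """One round of shortest-path relaxation over (source, target, weight) edges."""
    new_dist = dict(dist)
    for u, v, w in edges:
        if u in dist and dist[u] + w < new_dist.get(v, float("inf")):
            new_dist[v] = dist[u] + w
    return new_dist


# The edge list never changes between iterations; only the distances do.
edges = [("s", "a", 1), ("a", "b", 2), ("s", "b", 5), ("b", "c", 1)]
dist, iterations = iterate_to_fixpoint(edges, {"s": 0}, sssp_step)
print(dist, iterations)
```

In plain MapReduce each iteration would reread `edges` from the distributed file system and a separate job would compare consecutive outputs; keeping both in memory is what the sketch stands in for.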
Spark System
MapReduce does not perform well in iterative computations
  Workflow model is acyclic
  Have to write to HDFS after each iteration and have to read from HDFS at the beginning of the next iteration
Spark objectives
  Better support for iterative programs
  Provide a complete ecosystem
  Similar abstraction (to MapReduce) for programming
  Maintain MapReduce fault-tolerance and scalability
Fundamental concepts
  RDD: Resilient Distributed Datasets
  Caching of working set
  Maintaining lineage for fault-tolerance
Spark Ecosystem [Michiardi, 2015]
[Figure: components built on top of Apache Spark: native Spark apps, Spark SQL, Spark Streaming, MLlib (machine learning), and GraphX (graph processing)]
Spark Programming Model [Zaharia et al., 2010, 2012]
[Figure: flowchart. HDFS → Create RDD → RDD → Cache? (Yes: Cache) → Transform RDD? (Yes: Transform; No: Process) → HDFS]
Each transform generates a new RDD that may also be cached or processed
An RDD is created from HDFS or from parallelized arrays, is partitioned across worker machines, and may be made persistent lazily
Processing is done on one of the RDDs, in parallel across workers
The first processing on an RDD is from disk; subsequent processing of the same RDD is from cache
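The behaviour described on this slide (transforms that produce new datasets, lazy persistence, and reuse of cached results) can be mimicked in a few lines of plain Python. The class below is a toy, single-process stand-in written for this note; `MiniRDD`, its `computations` counter, and the sample log lines are hypothetical and are not Spark's API.

```python
class MiniRDD:
    """Toy single-process stand-in for an RDD (hypothetical, not Spark's API)."""

    def __init__(self, source, parent=None, op="source"):
        self._source = source    # zero-argument function that yields the elements
        self.parent = parent     # lineage: the dataset this one was derived from
        self.op = op
        self._use_cache = False
        self._cached = None
        self.computations = 0    # how many times the elements were (re)computed

    def map(self, fn):
        return MiniRDD(lambda: (fn(x) for x in self._iterate()), self, "map")

    def filter(self, pred):
        return MiniRDD(lambda: (x for x in self._iterate() if pred(x)), self, "filter")

    def cache(self):
        self._use_cache = True   # lazy: nothing is materialised yet
        return self

    def _iterate(self):
        if self._use_cache:
            if self._cached is None:
                self.computations += 1
                self._cached = list(self._source())
            return iter(self._cached)
        self.computations += 1
        return self._source()

    def count(self):             # an action: forces evaluation
        return sum(1 for _ in self._iterate())

    def lineage(self):
        node, ops = self, []
        while node is not None:
            ops.append(node.op)
            node = node.parent
        return list(reversed(ops))


log = ["ERROR\tdb\tfoo failed", "INFO\tweb\tok", "ERROR\tweb\tbar failed"]
lines = MiniRDD(lambda: iter(log))
errors = lines.filter(lambda line: line.startswith("ERROR"))
messages = errors.map(lambda line: line.split("\t")[2]).cache()
foo_count = messages.filter(lambda m: "foo" in m).count()
bar_count = messages.filter(lambda m: "bar" in m).count()
print(foo_count, bar_count, lines.computations, messages.lineage())
```

The source is scanned only for the first action; the second action reads the cached messages. The recorded lineage is the information that would allow a lost cached dataset to be rebuilt from its source.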
Example – Log Mining [Zaharia et al., 2010, 2012]
Load log messages from a file system, create a new file by filtering the error messages, read this file into memory, then interactively search for various patterns
val lines = spark.textFile("hdfs://...")            // Create RDD
val errors = lines.filter(_.startsWith("ERROR"))     // Transform RDD
val messages = errors.map(_.split('\t')(2))          // Another transform
val cachedMsgs = messages.cache()                    // Cache results
cachedMsgs.filter(_.contains("foo")).count           // Action
cachedMsgs.filter(_.contains("bar")).count           // Another action; accesses cache
[Figure: the Driver sends Tasks to three Workers and receives Results; each Worker reads one block (Block 1, Block 2, Block 3) and keeps its own Cache]
Example – Log Mining [Zaharia et al., 2010, 2012]
Load log messages from a file system, create a new file by filtering theerror messages, read this file into memory, then interactively search forvarious patternslines = spark.textFile(hdfs://...)
CreateRDD
errors = lines.filter( .startsWith(ERROR))
Transform RDD
messages = errors.map( .split(‘\t ’)(2))
Another transform
cachedMsgs = messages.cache()
Cache results
cachedMsgs.filter( .contains(foo)).count
Action
cachedMsgs.filter( .contains(bar)).count
Another Action
accesses cache
Driver
WorkerWorkerWorker
Block 1 Block 2 Block 3
TasksResults
Cache Cache Cache
© M. Tamer Ozsu Dagstuhl Spring School (2016/03/07–09) 48 / 59
![Page 117: An Introduction to Graph Analytics Platforms · NDL subjects ndlna my Experi-ment Italian Museums medu-cator MARC Codes List Man-chester Reading Lists Lotico ... Bio-graphie data](https://reader034.fdocuments.in/reader034/viewer/2022042307/5ed3e5e7e6b26c6781533345/html5/thumbnails/117.jpg)
Example – Log Mining [Zaharia et al., 2010, 2012]
Load log messages from a file system, create a new file by filtering theerror messages, read this file into memory, then interactively search forvarious patternslines = spark.textFile(hdfs://...)
CreateRDD
errors = lines.filter( .startsWith(ERROR))
Transform RDD
messages = errors.map( .split(‘\t ’)(2))
Another transform
cachedMsgs = messages.cache()
Cache results
cachedMsgs.filter( .contains(foo)).count
Action
cachedMsgs.filter( .contains(bar)).count
Another Action
accesses cache
Driver
WorkerWorkerWorker
Block 1 Block 2 Block 3
TasksResults
Cache Cache Cache
© M. Tamer Ozsu Dagstuhl Spring School (2016/03/07–09) 48 / 59
![Page 118: An Introduction to Graph Analytics Platforms · NDL subjects ndlna my Experi-ment Italian Museums medu-cator MARC Codes List Man-chester Reading Lists Lotico ... Bio-graphie data](https://reader034.fdocuments.in/reader034/viewer/2022042307/5ed3e5e7e6b26c6781533345/html5/thumbnails/118.jpg)
RDD and Processing
HDFS
lines = spark.textFile("hdfs://...")
lines: Error, msg1 | Warn, msg2 | Error, msg1 | Info, msg8 | Warn, msg2 | Info, msg8 | Error, msg3 | Info, msg5 | Info, msg5 | Error, msg4 | Warn, msg9 | Error, msg1
errors = lines.filter(_.startsWith("ERROR"))
errors: Error, msg1 | Error, msg1 | Error, msg3 | Error, msg4 | Error, msg1
messages = errors.map(_.split('\t')(2))
messages: msg1 | msg1 | msg3 | msg4 | msg1
These are not yet generated.
![Page 119: An Introduction to Graph Analytics Platforms · NDL subjects ndlna my Experi-ment Italian Museums medu-cator MARC Codes List Man-chester Reading Lists Lotico ... Bio-graphie data](https://reader034.fdocuments.in/reader034/viewer/2022042307/5ed3e5e7e6b26c6781533345/html5/thumbnails/119.jpg)
RDD and Processing
Driver: messages.filter(_.contains("foo")).count
lines → errors → messages
messages: msg1 | msg1 | msg3 | msg4 | msg1
Now the RDDs are materialized; command not yet executed.
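The two slides above make the point that defining `errors` and `messages` generates nothing, and that data is produced only when the driver issues an action. A rough illustration follows, using lazy views from the Scala standard library rather than Spark. The split of the slide's sample log into four blocks, the `", "` separator, and the names `PartitionedCountSketch`, `linesScanned`, `task` and `countMatching` are assumptions made for this sketch.

```scala
object PartitionedCountSketch {
  // Hypothetical split of the slide's sample log into four blocks (partitions).
  val blocks: Vector[Vector[String]] = Vector(
    Vector("Error, msg1", "Warn, msg2", "Error, msg1"),
    Vector("Info, msg8", "Warn, msg2", "Info, msg8"),
    Vector("Error, msg3", "Info, msg5", "Info, msg5"),
    Vector("Error, msg4", "Warn, msg9", "Error, msg1")
  )

  var linesScanned = 0 // incremented whenever a log line is actually examined

  // Transformations are described per block as lazy views:
  // defining them does not scan any line.
  def messages(block: Vector[String]) =
    block.view
      .filter { l => linesScanned += 1; l.startsWith("Error") }
      .map(_.split(", ")(1))

  // One "task" per block computes a partial count ...
  def task(block: Vector[String], pattern: String): Int =
    messages(block).count(_.contains(pattern))

  // ... and the "driver" adds up the partial results returned by the tasks.
  def countMatching(pattern: String): Int =
    blocks.map(b => task(b, pattern)).sum

  def main(args: Array[String]): Unit = {
    val pending = blocks.map(b => messages(b)) // nothing is scanned yet
    println(linesScanned)
    println(countMatching("msg1")) // the action triggers the scan
    println(linesScanned)
  }
}
```

Unlike the cached example on the log-mining slide, this sketch keeps nothing in memory between actions, so every call to `countMatching` scans all blocks again.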
![Page 120: An Introduction to Graph Analytics Platforms · NDL subjects ndlna my Experi-ment Italian Museums medu-cator MARC Codes List Man-chester Reading Lists Lotico ... Bio-graphie data](https://reader034.fdocuments.in/reader034/viewer/2022042307/5ed3e5e7e6b26c6781533345/html5/thumbnails/120.jpg)
GraphX [Gonzalez et al., 2014]
Built on top of Spark
Objective is to combine data analytics with graph processing
Unify computation on tables and graphs
Carefully convert graph to tabular representation
Offers a native GraphX API and can also accommodate vertex-centric computation
[Figure: the Apache Spark stack. Native Spark apps, Spark SQL, Spark Streaming, MLlib (machine learning), and GraphX (graph processing) sit on top of Apache Spark; apps use GraphX either directly or through a vertex-centric API.]
![Page 125: An Introduction to Graph Analytics Platforms · NDL subjects ndlna my Experi-ment Italian Museums medu-cator MARC Codes List Man-chester Reading Lists Lotico ... Bio-graphie data](https://reader034.fdocuments.in/reader034/viewer/2022042307/5ed3e5e7e6b26c6781533345/html5/thumbnails/125.jpg)
GraphX: Representation of Graphs as Tables
[Figure: example graph with vertices A–J, split into Partition 1 and Partition 2 by edge-disjoint partitioning.]
Vertex Table (RDD); v-prop: vertex property
Machine 1: A v-prop | B v-prop | ... | I v-prop
Machine 2: D v-prop | E v-prop | F v-prop | J v-prop
Edge Table (RDD); e-prop: edge property
Machine 1: A e-prop B | A e-prop C | ... | F e-prop G
Machine 2: A e-prop D | A e-prop E | ... | E e-prop F
Joining vertices and edges: move vertices to edges
![Page 126: An Introduction to Graph Analytics Platforms · NDL subjects ndlna my Experi-ment Italian Museums medu-cator MARC Codes List Man-chester Reading Lists Lotico ... Bio-graphie data](https://reader034.fdocuments.in/reader034/viewer/2022042307/5ed3e5e7e6b26c6781533345/html5/thumbnails/126.jpg)
GraphX: Representation of Graphs as Tables
Routing Table (RDD): for each vertex, the edge partitions in which it appears
Machine 1: A 1 2 | B 1 | ... | I 1
Machine 2: F 1 2 | D 2 | E 2 | J 2
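The relationship between the three tables can be sketched in plain Scala. This is only an illustration, not GraphX's implementation: `GraphTablesSketch` and its members are hypothetical, only the edges legible on the slide are included, and the vertex table is derived from those edges, so vertices without a listed edge (such as H, I, J) are missing.

```scala
object GraphTablesSketch {
  type VertexId = String
  final case class Edge(src: VertexId, dst: VertexId, prop: String)

  // Edge table: every edge lives in exactly one partition (edge-disjoint
  // partitioning). Rows elided on the slide ("...") are left out here.
  val edgeTable: Map[Int, Vector[Edge]] = Map(
    1 -> Vector(Edge("A", "B", "e-prop"), Edge("A", "C", "e-prop"), Edge("F", "G", "e-prop")),
    2 -> Vector(Edge("A", "D", "e-prop"), Edge("A", "E", "e-prop"), Edge("E", "F", "e-prop"))
  )

  // Vertex table: one row per vertex with its property (placeholder value).
  val vertexTable: Map[VertexId, String] =
    edgeTable.values.flatten
      .flatMap(e => Seq(e.src, e.dst))
      .map(v => v -> "v-prop")
      .toMap

  // Routing table: for each vertex, the edge partitions that mention it.
  val routingTable: Map[VertexId, Set[Int]] =
    edgeTable.toSeq
      .flatMap { case (pid, edges) => edges.flatMap(e => Seq(e.src -> pid, e.dst -> pid)) }
      .groupBy(_._1)
      .map { case (v, pairs) => v -> pairs.map(_._2).toSet }

  // "Move vertices to edges": the partitions a vertex property must be sent to.
  def partitionsNeeding(v: VertexId): Set[Int] =
    routingTable.getOrElse(v, Set.empty[Int])

  def main(args: Array[String]): Unit =
    routingTable.toSeq.sortBy(_._1).foreach { case (v, ps) =>
      println(s"$v -> ${ps.toSeq.sorted.mkString(" ")}")
    }
}
```

For the edges listed, the derived rows for A, B, D, E and F agree with the routing table on the slide: A and F are needed in both partitions, the others in one.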
![Page 128: An Introduction to Graph Analytics Platforms · NDL subjects ndlna my Experi-ment Italian Museums medu-cator MARC Codes List Man-chester Reading Lists Lotico ... Bio-graphie data](https://reader034.fdocuments.in/reader034/viewer/2022042307/5ed3e5e7e6b26c6781533345/html5/thumbnails/128.jpg)
GraphX: Computation Model
Vertex Table
Machine 1: A v-prop | B v-prop | ... | I v-prop
Machine 2: D v-prop | E v-prop | F v-prop | J v-prop
Edge Table
Machine 1: A e-prop B | A e-prop C | ... | F e-prop G
Machine 2: A e-prop D | A e-prop E | ... | E e-prop F
First Phase: Join Vertex table on Edge table
Triples View
A v-prop e-prop B v-prop
A v-prop e-prop C v-prop
C v-prop e-prop G v-prop
...
E v-prop e-prop G v-prop
J v-prop e-prop G v-prop
![Page 129: An Introduction to Graph Analytics Platforms · NDL subjects ndlna my Experi-ment Italian Museums medu-cator MARC Codes List Man-chester Reading Lists Lotico ... Bio-graphie data](https://reader034.fdocuments.in/reader034/viewer/2022042307/5ed3e5e7e6b26c6781533345/html5/thumbnails/129.jpg)
GraphX: Computation Model
Second Phase: Compute neighbourhood (group-by aggregate over the Triples View)
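The two phases can be written down directly on ordinary Scala collections. The sketch below is illustrative only: `TripletsSketch`, `aggregate`, and the numeric property values are assumptions (the slides show only the placeholders v-prop and e-prop), and the edge list is the set of triples legible on the slide.

```scala
object TripletsSketch {
  final case class Edge(src: String, dst: String, prop: Double)
  final case class Triplet(src: String, srcProp: Double, edgeProp: Double,
                           dst: String, dstProp: Double)

  // Hypothetical numeric properties standing in for v-prop and e-prop.
  val vertexTable: Map[String, Double] =
    Map("A" -> 1.0, "B" -> 0.0, "C" -> 2.0, "E" -> 3.0, "G" -> 0.0, "J" -> 4.0)
  val edgeTable: Vector[Edge] = Vector(
    Edge("A", "B", 1.0), Edge("A", "C", 1.0),
    Edge("C", "G", 1.0), Edge("E", "G", 1.0), Edge("J", "G", 1.0)
  )

  // First phase: join the vertex table onto the edge table (triples view).
  def triplets: Vector[Triplet] =
    edgeTable.map(e => Triplet(e.src, vertexTable(e.src), e.prop, e.dst, vertexTable(e.dst)))

  // Second phase: emit one message per triplet, then combine the messages
  // addressed to the same vertex (group-by aggregate).
  def aggregate(msg: Triplet => (String, Double),
                combine: (Double, Double) => Double): Map[String, Double] =
    triplets
      .map(msg)
      .groupBy(_._1)
      .map { case (v, ms) => v -> ms.map(_._2).reduce(combine) }

  def main(args: Array[String]): Unit = {
    // Sum of in-neighbour properties, weighted by the edge property.
    println(aggregate(t => (t.dst, t.srcProp * t.edgeProp), _ + _))
    // In-degree: every incoming edge contributes 1.
    println(aggregate(t => (t.dst, 1.0), _ + _))
  }
}
```

Changing only the message function and the combiner gives different neighbourhood computations over the same join, which is the pattern the two-phase model describes.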
![Page 130: An Introduction to Graph Analytics Platforms · NDL subjects ndlna my Experi-ment Italian Museums medu-cator MARC Codes List Man-chester Reading Lists Lotico ... Bio-graphie data](https://reader034.fdocuments.in/reader034/viewer/2022042307/5ed3e5e7e6b26c6781533345/html5/thumbnails/130.jpg)
GraphX: Operators
Table transform operators – inherited from Spark
map(func): Return a new RDD formed by passing each element of the source through a function func
filter(func): Return a new RDD formed by selecting those elements of the source on which func returns true
flatMap(func): Similar to map, but each input item can be mapped to 0 or more output items
mapPartitions(func): Similar to map, but runs separately on each partition (block) of the RDD, so func must be of type Iterator<T> => Iterator<U>
sample(repl, fraction, seed): Sample a fraction fraction of the data, with or without replacement (set repl accordingly), using a given random number generator seed
union(otherDataset), intersection(): Return a new RDD containing the union/intersection of the elements in the source RDD and the argument
groupByKey(): Operates on an RDD of (K, V) pairs, returns an RDD of (K, Iterable<V>) pairs
reduceByKey(func, ...): Operates on an RDD of (K, V) pairs, returns an RDD of (K, V) pairs where the values for each key are aggregated using the given reduce function func
Graph operators
Graph(vertex coll, edge coll): Logically binds together a pair of vertex and edge property collections into a property graph; verifies that each vertex occurs only once and that edges connect existing vertices
triplets(vertex coll, vertex coll, edge coll): Returns the triplets view of the graph
mrTriplets(map, reduce): MapReduce triplets; encodes the two-stage process of a join to create the triplets followed by a group-by
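The semantics of a few of the table operators listed above can be tried on ordinary in-memory collections. The sketch below uses only the Scala standard library, not Spark; `PairOperatorsSketch` and its helper functions are hypothetical analogues that return a `Map` instead of a distributed dataset.

```scala
object PairOperatorsSketch {
  val pairs: Vector[(String, Int)] =
    Vector("a" -> 1, "b" -> 2, "a" -> 3, "b" -> 4, "c" -> 5)

  // groupByKey analogue: (K, V) pairs => one (K, collection of V) entry per key.
  def groupByKey[K, V](kvs: Seq[(K, V)]): Map[K, Seq[V]] =
    kvs.groupBy(_._1).map { case (k, group) => k -> group.map(_._2) }

  // reduceByKey analogue: (K, V) pairs => one (K, V) entry per key, with the
  // values for each key combined by func.
  def reduceByKey[K, V](kvs: Seq[(K, V)])(func: (V, V) => V): Map[K, V] =
    groupByKey(kvs).map { case (k, vs) => k -> vs.reduce(func) }

  // flatMap: each input item is mapped to zero or more output items.
  def words(lines: Seq[String]): Seq[String] =
    lines.flatMap(_.split(" ").filter(_.nonEmpty).toSeq)

  def main(args: Array[String]): Unit = {
    println(groupByKey(pairs))
    println(reduceByKey(pairs)(_ + _))
    println(words(Seq("a b", "", "c")))
  }
}
```

The empty input line in the `words` example contributes no output at all, which is the difference between flatMap and map.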
![Page 132: An Introduction to Graph Analytics Platforms · NDL subjects ndlna my Experi-ment Italian Museums medu-cator MARC Codes List Man-chester Reading Lists Lotico ... Bio-graphie data](https://reader034.fdocuments.in/reader034/viewer/2022042307/5ed3e5e7e6b26c6781533345/html5/thumbnails/132.jpg)
Acknowledgements
This presentation draws upon collaborative research and discussions with the following colleagues:
Khaled Ammar, U. Waterloo
Khuzaima Daudjee, U. Waterloo
Young Han, U. Waterloo
![Page 133: An Introduction to Graph Analytics Platforms · NDL subjects ndlna my Experi-ment Italian Museums medu-cator MARC Codes List Man-chester Reading Lists Lotico ... Bio-graphie data](https://reader034.fdocuments.in/reader034/viewer/2022042307/5ed3e5e7e6b26c6781533345/html5/thumbnails/133.jpg)
Thank you!
Research supported by
![Page 135: An Introduction to Graph Analytics Platforms · NDL subjects ndlna my Experi-ment Italian Museums medu-cator MARC Codes List Man-chester Reading Lists Lotico ... Bio-graphie data](https://reader034.fdocuments.in/reader034/viewer/2022042307/5ed3e5e7e6b26c6781533345/html5/thumbnails/135.jpg)
References I
Ammar, K. and Ozsu, M. T. (2016). Approaches to graph processing – an overview. In preparation.
Bu, Y., Howe, B., Balazinska, M., and Ernst, M. D. (2012). The HaLoop approach to large-scale iterative data analysis. VLDB J., 21(2):169–190.
Dean, J. and Ghemawat, S. (2008). MapReduce: Simplified data processing on large clusters. Commun. ACM, 51(1):107–113.
Gonzalez, J. E., Xin, R. S., Dave, A., Crankshaw, D., Franklin, M. J., and Stoica, I. (2014). GraphX: Graph processing in a distributed dataflow framework. In Proc. 11th USENIX Symp. on Operating System Design and Implementation, pages 599–613.
Han, M., Daudjee, K., Ammar, K., Ozsu, M. T., Wang, X., and Jin, T. (2014). An experimental comparison of Pregel-like graph processing systems. Proc. VLDB Endowment, 7(12):1047–1058.
Li, F., Ooi, B. C., Ozsu, M. T., and Wu, S. (2014). Distributed data management using MapReduce. ACM Comput. Surv., 46(3):Article No. 31.
Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., and Hellerstein, J. M. (2012). Distributed GraphLab: A framework for machine learning in the cloud. Proc. VLDB Endowment, 5(8):716–727.
![Page 136: An Introduction to Graph Analytics Platforms · NDL subjects ndlna my Experi-ment Italian Museums medu-cator MARC Codes List Man-chester Reading Lists Lotico ... Bio-graphie data](https://reader034.fdocuments.in/reader034/viewer/2022042307/5ed3e5e7e6b26c6781533345/html5/thumbnails/136.jpg)
References II
Malewicz, G., Austern, M. H., Bik, A. J. C., Dehnert, J. C., Horn, I., Leiser, N., and Czajkowski, G. (2010). Pregel: A system for large-scale graph processing. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 135–146.
Michiardi, P. (2015). Introduction to Spark internals. Slideshare. Available from: http://www.slideshare.net/michiard/introduction-to-spark-internals?qid=511145e7-79d7-41d8-a133-9e705d4933c3&v=qf1&b=&from_search=11 [Last retrieved: 9 July 2015].
Tian, Y., Balmin, A., Corsten, S. A., Tatikonda, S., and McPherson, J. (2013). From “think like a vertex” to “think like a graph”. Proc. VLDB Endowment, 7(3):193–204.
Yan, D., Cheng, J., Lu, Y., and Ng, W. (2014). Blogel: A block-centric framework for distributed computation on real-world graphs. Proc. VLDB Endowment, 7(14):1981–1992.
Zaharia, M. (2016). An Architecture for Fast and General Data Processing on Large Clusters. ACM Books. Forthcoming.
Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., Franklin, M. J., Shenker, S., and Stoica, I. (2012). Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proc. 9th USENIX Symp. on Networked Systems Design & Implementation, pages 2–2.
![Page 137: An Introduction to Graph Analytics Platforms · NDL subjects ndlna my Experi-ment Italian Museums medu-cator MARC Codes List Man-chester Reading Lists Lotico ... Bio-graphie data](https://reader034.fdocuments.in/reader034/viewer/2022042307/5ed3e5e7e6b26c6781533345/html5/thumbnails/137.jpg)
References III
Zaharia, M., Chowdhury, M., Franklin, M. J., Shenker, S., and Stoica, I. (2010). Spark: Cluster computing with working sets. In Proc. 2nd USENIX Workshop on Hot Topics in Cloud Computing, pages 10–10.