Multi-Relational Graph Structures: From Algebra to Application
Multi-Relational Graph Structures:
From Algebra to Application
Marko A. Rodriguez
T-5, Center for Nonlinear Studies
Los Alamos National Laboratory
http://markorodriguez.com
October 27, 2009
Abstract
In a single-relational graph, all edges share the same meaning. In contrast, a multi-relational graph represents a heterogeneous set of edges, where each edge is labeled to denote the type of relationship that exists between the two vertices it connects. While less prevalent than the single-relational graph, the multi-relational graph structure is beginning to see widespread adoption in both academia and industry. An algebra for manipulating multi-relational graph structures and the realization of this algebra in various application scenarios is presented in this talk.
MIT Lincoln Laboratory Lecture – Lexington, Massachusetts – October 27, 2009
My Computer Eco-System
• Articles/Lectures: LaTeX, OmniGraffle, LaTeXiT
• Software Development: Java, R Statistics
• Large-Scale Data Management: MySQL, Neo4j, Linked Process
• Graph/Network Analysis: iGraph, rPath, Confluence, JUNG
• Web of Data/Semantic Web: Open Sesame (SAIL), Protege
• 3D Modeling/Programming: Java Monkey Engine, Blender, Gimp
• Audio Synthesis/Processing: Max/MSP, ProTools
Outline
• Introduction to Graph Structures
  * The Single-Relational Graph
  * The Multi-Relational Graph
• A Multi-Relational Path Algebra
• Application to Recommender Systems
A Single-Relational Graph Example
[Figure: a directed graph over the vertices Article A, Article B, Article C, Article D, Article E, and Article F.]
An article citation graph. Each vertex is an article and each edge denotes that the tail
article cites the head article.
Single-Relational Graph Notation
• Homogeneous set of vertex and edge types (unless the graph is bipartite).
• There are undirected and directed forms, where V is the set of vertices and E is an unordered or ordered set of edges, respectively.
  * Undirected: $G = (V, E \subseteq \{V \times V\})$
  * Directed: $G = (V, E \subseteq (V \times V))$ (we will focus on directed graphs in this talk).
• There is an adjacency matrix representation $A \in \{0,1\}^{n \times n}$, where $n = |V|$ and
\[
A_{i,j} =
\begin{cases}
1 & \text{if } (i,j) \in E \\
0 & \text{otherwise.}
\end{cases}
\]
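As a minimal sketch of the adjacency matrix definition on this slide, the following Python builds A from a directed edge set. The function name `adjacency_matrix` and the three-vertex example graph are illustrative assumptions, not part of the talk.

```python
def adjacency_matrix(vertices, edges):
    """Return the n x n 0/1 adjacency matrix of a directed graph.

    Entry [i][j] is 1 exactly when the ordered pair (i, j) is an edge.
    """
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    A = [[0] * n for _ in range(n)]
    for tail, head in edges:
        A[index[tail]][index[head]] = 1
    return A


# Hypothetical example: a -> b -> c.
vertices = ["a", "b", "c"]
edges = {("a", "b"), ("b", "c")}
A = adjacency_matrix(vertices, edges)
```

Rows index tail vertices and columns index head vertices, which matches the ordered-pair reading of a directed edge.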
The Use of Single-Relational Graphs in Research
• Most common graph structure used in 90’s and 00’s research.
  * scholarly graphs: citations, coauthorship relationships, article/journal usage, acknowledgements, funding sources.
  * technological graphs: software dependencies, Internet architecture, web citations.
  * communication graphs: email correspondence, cell phone calls, micro-blog “following.”
• Numerous algorithms have been developed for analyzing such structures.
  * geodesics: radius, diameter, eccentricity, closeness, betweenness.
  * spectral: eigenvector centrality, PageRank, spreading activation.
  * community detection: walktrap, edge betweenness, leading eigenvector, spin-glass.
My Work with Single-Relational Graphs
• Articles of mine that make use of the single-relational graph structure.
  * Bollen, J., Van de Sompel, H., Hagberg, A., Bettencourt, L.M.A., Chute, R., Rodriguez, M.A., Balakireva, L.L., “Clickstream Data Yields High-Resolution Maps of Science,” PLoS One, 4(3), e4803, 2009.
  * Bollen, J., Van de Sompel, H., Rodriguez, M.A., “Towards Usage-Based Impact Metrics: First Results from the MESUR Project,” Joint Conference on Digital Libraries (JCDL), 2008.
  * Rodriguez, M.A., Pepe, A., “On the Relationship Between the Structural and Socioacademic Communities of a Coauthorship Network,” Journal of Informetrics, 2(3), pp. 195–201, 2008.
  * Rodriguez, M.A., Bollen, J., “An Algorithm to Determine Peer-Reviewers,” Conference on Information and Knowledge Management (CIKM), pp. 319–328, 2008.
  * Rodriguez, M.A., Bollen, J., Van de Sompel, H., “Mapping the Bid Behavior of Conference Referees,” Journal of Informetrics, 1(1), pp. 62–82, 2007.
  * Bollen, J., Rodriguez, M.A., Van de Sompel, H., “Journal Status,” Scientometrics, 69(3), pp. 669–687, 2006.
  * Rodriguez, M.A., Bollen, J., Van de Sompel, H., “The Convergence of Digital Libraries and the Peer-Review Process,” Journal of Information Science, 32(2), pp. 149–159, 2006.
  * Rodriguez, M.A., Steinbock, D.J., “A Social Network for Societal-Scale Decision-Making Systems,” Proceedings of the North American Association for Computational Social and Organizational Science Conference, 2004.
• They focus on supporting/analyzing/ranking/visualizing the scholarly community and large-scale decision support systems (i.e. governance systems).
Studying the Reading Behavior of Scholars
Bollen, J., Van de Sompel, H., Hagberg, A., Bettencourt, L.M.A, Chute, R., Rodriguez, M.A., Balakireva, L.L., “Clickstream
Data Yields High-Resolution Maps of Science,” PLoS One, 4(3), e4803, 2009.
Studying Characteristics that Lead to Coauthorship
Rodriguez, M.A., Pepe, A., “On the Relationship Between the Structural and Socioacademic Communities of a Coauthorship
Network,” Journal of Informetrics, 2(3), pp. 195–201, 2008.
Predicting Referees Based on Coauthorship Patterns
[Figure: a coauthorship network whose vertices are labeled with author surnames: SOMPEL, NELSON, LAGOZE, ARMS, MARCHIONINI, JESUROGA, FOO, LIM, SUGIMOTO, BORGMAN, FOX, MARSHALL, LEGGETT, CHEN, GOLOVCHINSKY, FURUTA, WITTEN, CUNNINGHAM, FUHR, NEUHOLD, SOLVBERG, TAYLOR, SUMNER, FULKER, WRIGHT, JANEE, THANOS, KHOO, GIERSCH, ALLEN, SANCHEZ, RASMUSSEN, LYNCH, BAKER, MOORE, RAY, CASSEL, TSE, CASTELLI, RECKER, BISHOFF.]
Rodriguez, M.A., Bollen, J., Van de Sompel, H., “Mapping the Bid Behavior of Conference Referees,” Journal of Informetrics,
1(1), pp. 62–82, 2007.
A Multi-Relational Graph Example
[Figure: a directed, labeled graph over the vertices Person A, Article B, Article C, Article D, Person E, and Article F, with edges labeled authored, cites, peer-reviewed, and acknowledges.]
A scholarly graph. Each vertex is a scholarly artifact and each edge denotes the type of
directed relationship that exists between the two scholarly artifacts it connects.
Multi-Relational Graph Notation
• Heterogeneous set of vertex types and a heterogeneous set of edge types.
• This data structure is becoming more prevalent due to both the Semantic Web/Web of Data movement and the graph database movement.
• $G = (V, E = \{E_0, E_1, \ldots, E_m \subseteq (V \times V)\})$, where E is a family of typed edge sets of length m. For example, $E_0$ is the “authored” adjacency matrix, $E_1$ is the “cites” adjacency matrix, etc.
• There is a three-way tensor representation $A \in \{0,1\}^{n \times n \times m}$, where
\[
A^k_{i,j} =
\begin{cases}
1 & \text{if } (i,j) \in E_k : k \leq m \\
0 & \text{otherwise.}
\end{cases}
\]
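The tensor definition on this slide can be sketched as a list of adjacency matrices, one slice per typed edge set. The function name `adjacency_tensor`, the vertex names, and the two small edge sets below are hypothetical and chosen only for illustration.

```python
def adjacency_tensor(vertices, typed_edge_sets):
    """Return a list of n x n 0/1 matrices, one slice per typed edge set.

    tensor[k][i][j] == 1 exactly when (i, j) is in the k-th edge set.
    """
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    tensor = []
    for edges in typed_edge_sets:
        slice_k = [[0] * n for _ in range(n)]
        for tail, head in edges:
            slice_k[index[tail]][index[head]] = 1
        tensor.append(slice_k)
    return tensor


# Hypothetical scholarly graph: two edge types over three vertices.
vertices = ["person_a", "article_b", "article_c"]
authored = {("person_a", "article_b")}
cites = {("article_b", "article_c")}
A = adjacency_tensor(vertices, [authored, cites])
```

Indexing the outer list by edge type keeps each slice usable as an ordinary single-relational adjacency matrix.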
A Three-Way Tensor Representation of a Multi-Relational Graph
[Figure: the tensor $A \in \{0,1\}^{n \times n \times m}$ drawn as a stack of $n \times n$ adjacency matrices ($|V| = n$, $|E| = m$), one slice per edge type (authored, cites, ...).]
My Work with Multi-Relational Graphs
• Articles of mine that make use of the multi-relational graph structure.
  * Rodriguez, M.A., Shinavier, J., “Exposing Multi-Relational Networks to Single-Relational Network Analysis Algorithms,” Journal of Informetrics, in press, 2009. [Presented in the second part of this presentation.]
  * Rodriguez, M.A., Geldart, J., “An Evidential Path Logic for Multi-Relational Networks,” Proceedings of the Association for the Advancement of Artificial Intelligence Spring Symposium: Technosocial Predictive Analytics Symposium, volume SS-09-09, pp. 114–119, 2009.
  * Rodriguez, M.A., Bollen, J., Van de Sompel, H., “Automatic Metadata Generation using Associative Networks,” ACM Transactions on Information Systems, 27(2), pp. 1–20, 2009.
  * Rodriguez, M.A., “Grammar-Based Random Walkers in Semantic Networks,” Knowledge-Based Systems, 21(7), pp. 727–739, 2008. [Presented in the third part of this presentation.]
  * Rodriguez, M.A., “Social Decision Making with Multi-Relational Networks and Grammar-Based Particle Swarms,” Hawaii International Conference on Systems Science (HICSS), pp. 39–49, 2007.
  * Bollen, J., Rodriguez, M.A., Van de Sompel, H., Balakireva, L.L., Hagberg, A., “The Largest Scholarly Semantic Network...Ever.,” ACM World Wide Web Conference, 2007.
  * Rodriguez, M.A., “A Multi-Relational Network to Support the Scholarly Communication Process,” International Journal of Public Information Systems, 2007(1), pp. 13–29, 2007.
• They focus on multi-relational graph algorithms, logic, information retrieval, decision support systems, bibliometrics, recommender systems.
Resource Description Framework Graph
[Figure: a directed, labeled graph over the vertices lanl:person_a, lanl:article_b, lanl:article_c, lanl:article_d, lanl:person_e, and lanl:article_f, with edges labeled lanl:authored, lanl:cites, lanl:peer_reviewed, and lanl:acknowledges. The prefix lanl: abbreviates http://lanl.gov#.]
A scholarly graph. Each vertex and edge type is identified by a Uniform Resource
Identifier and thus, encoded in the address space of the World Wide Web.
Resource Description Framework Graph
• Vertices and edge labels are identified by Uniform Resource Identifiers (URI). Thus, there is a single address space where the world’s data can be interrelated.
• $G \subseteq (U \cup B) \times U \times (U \cup B \cup L)$, where U is the set of all URIs, B is the set of all blank nodes, and L is the set of all literals.
• There exist various implementations of this standard model.
  * Open Sesame (http://openrdf.org/).
  * AllegroGraph (http://www.franz.com/agraph/allegrograph/).
  * OWLim (http://www.ontotext.com/owlim/).
  * Jena (http://jena.sourceforge.net/).
Linked Data and the Web of Data
[Figure: a fragment of the Web of Data around http://dbpedia.org/resource/Albert_Einstein. Vertices: dbpedia:Albert_Einstein, dbpedia:United_States, dbpedia:Alfred_Kleiner, flickr:Albert_Einstein (http://www4.wiwiss.fu-berlin.de/flickrwrappr/photos/Albert_Einstein), and the images http://farm4.static.flickr.com/3408/3547607847_65abfd03a5_m.jpg and http://farm1.static.flickr.com/60/170621225_661c705eb4_m.jpg. Edge labels: dbpedia:citizenship, dbpedia:doctoralAdvisor, dbpprop:hasPhotoCollection, and foaf:depiction.]
My Work with Resource Description Framework Graphs
• Articles of mine that make use of RDF/Web of Data/Semantic Web.
  * Rodriguez, M.A., “Interpretations of the Web of Data,” Data Management in the Semantic Web, eds. H. Jin and Z. Lv, Nova, in press, 2009.
  * Rodriguez, M.A., “A Reflection on the Structure and Process of the Web of Data,” Bulletin of the American Society for Information Science and Technology, 35(6), pp. 38–43, 2009.
  * Rodriguez, M.A., “A Graph Analysis of the Linked Data Cloud,” http://arxiv.org/abs/0903.0194, February 2009.
  * Rodriguez, M.A., Allen, D.W., Shinavier, J., Ebersole, G., “A Recommender System to Support the Scholarly Communication Process,” KRS-2009-02, 2009. [Presented in the third part of this presentation.]
  * Rodriguez, M.A., Watkins, J., “Faith in the Algorithm, Part 2: Computational Eudaemonics,” Lecture Notes in Artificial Intelligence, eds. Velásquez, J.D., Howlett, R.J., and Jain, L.C., volume 5712, pp. 813–820, 2009.
  * Rodriguez, M.A., “General-Purpose Computing on a Semantic Network Substrate,” Emergent Web Intelligence, Advanced Information and Knowledge Processing series, eds. R. Chbeir, A. Hassanien, A. Abraham, and Y. Badr, in press, 2008.
  * Rodriguez, M.A., Pepe, A., Shinavier, J., “The Dilated Triple,” Emergent Web Intelligence, Advanced Information and Knowledge Processing series, eds. R. Chbeir, A. Hassanien, A. Abraham, and Y. Badr, in press, 2008.
• They focus on graph algorithms, distributed computing, graph-based computing, recommender systems.
The Web of Data as of March 2009
[Figure: the Linked Data cloud as of March 2009, drawn as a graph whose vertices are data sets such as dbpedia, freebase, geonames, uniprot, pubmed, musicbrainz, and foafprofiles.]
Rodriguez, M.A., “A Graph Analysis of the Linked Data Cloud,” http://arxiv.org/abs/0903.0194, February 2009.
The Web of Data as of March 2009
data set           domain       data set           domain       data set             domain
audioscrobbler     music        govtrack           government   pubguide             books
bbclatertotp       music        homologene         biology      qdos                 social
bbcplaycountdata   music        ibm                computer     rae2001              computer
bbcprogrammes      media        ieee               computer     rdfbookmashup        books
budapestbme        computer     interpro           biology      rdfohloh             social
chebi              biology      jamendo            music        resex                computer
crunchbase         business     laascnrs           computer     riese                government
dailymed           medical      libris             books        semanticweborg       computer
dblpberlin         computer     lingvoj            reference    semwebcentral        social
dblphannover       computer     linkedct           medical      siocsites            social
dblprkbexplorer    computer     linkedmdb          movie        surgeradio           music
dbpedia            general      magnatune          music        swconferencecorpus   computer
doapspace          social       musicbrainz        music        taxonomy             reference
drugbank           medical      myspacewrapper     social       umbel                general
eurecom            computer     opencalais         reference    uniref               biology
eurostat           government   opencyc            general      unists               biology
flickrexporter     images       openguides         reference    uscensusdata         government
flickrwrappr       images       pdb                biology      virtuososponger      reference
foafprofiles       social       pfam               biology      w3cwordnet           reference
freebase           general      pisa               computer     wikicompany          business
geneid             biology      prodom             biology      worldfactbook        government
geneontology       biology      projectgutenberg   books        yago                 general
geonames           geographic   prosite            biology      . . .
Application Development on the Web of Data
[Figure: two panels, a and b. In each panel, Application 1, Application 2, and Application 3 run at 127.0.0.1, 127.0.0.2, and 127.0.0.3 and are drawn with their structures and processes; panel b additionally shows the Web of Data.]
a.) standard model b.) Web of Data model: public data changes the development paradigm.
A Key/Value Graph Example
[Figure: a directed graph over the vertices A, B, C, D, E, and F in which every vertex and every edge carries a key/value map.]
Vertex maps shown in the figure:
  * type = person, name = Marko, age = 29
  * type = person, name = Johan, age = 37
  * type = article, name = "Algori...", created = 1/1/09
  * type = article, name = "Network...", created = 2/1/08
  * type = article, name = "A Distributed...", created = 12/1/07
  * type = article, name = "Linked...", created = 1/30/09
Edge maps shown in the figure:
  * type = cites, weight = 1.0 (two edges)
  * type = authored, weight = 1.0 (two edges)
  * type = authored, weight = 0.5
  * type = acknowledges, weight = 1.0
  * type = peer-reviewed, weight = -1.0
A scholarly graph. Both vertices and edges maintain a key/value pair map that allows metadata to be
attached to them.
Key/Value Graph
• $G = (V, E \subseteq (V \times V), \lambda : (V \cup E) \times \Omega \to \Sigma)$, where $\Omega$ is the set of keys and $\Sigma$ is the set of values.
• Has a convenient representation in object-oriented programming languages and is used by various standards and graph packages.
  * GraphML (http://graphml.graphdrawing.org/).
  * Neo4j (http://neo4j.org).
  * NetworkX (http://networkx.lanl.gov).
  * Confluence (http://markorodriguez.com/docs/conf/api/).
  * iGraph (http://igraph.sourceforge.net/).
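One way to read the key/value graph definition is as a plain dictionary that maps (element, key) pairs to values. The sketch below assumes a hypothetical two-vertex graph; which vertex carries which map, and the helper names `properties` and `edges_of_type`, are illustrative and not taken from the talk or from any of the listed packages.

```python
# Hypothetical key/value graph: two vertices and one directed edge.
V = {"A", "B"}
E = {("A", "B")}

# lam plays the role of the labeling function: (element, key) -> value,
# where an element is either a vertex or an edge.
lam = {
    ("A", "type"): "person",
    ("A", "name"): "Marko",
    ("A", "age"): 29,
    ("B", "type"): "article",
    (("A", "B"), "type"): "authored",
    (("A", "B"), "weight"): 1.0,
}


def properties(element):
    """Collect the key/value map attached to one vertex or edge."""
    return {key: value for (elem, key), value in lam.items() if elem == element}


def edges_of_type(edge_type):
    """Return the edges whose 'type' value equals edge_type, sorted."""
    return sorted(e for e in E if lam.get((e, "type")) == edge_type)
```

Storing the edge label under an ordinary "type" key shows how a key/value graph can subsume a multi-relational graph.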
Outline
• Introduction to Graph Structures
  * The Single-Relational Graph
  * The Multi-Relational Graph
• A Multi-Relational Path Algebra
• Application to Recommender Systems
Problem Statement
• There is a need to port all the known single-relational graph analysis algorithms over to the multi-relational domain.
  * Why?: There is a large body of algorithms in the domain of single-relational graph analysis.
  * Why?: Multi-relational graph structures are becoming more prevalent and can be used to model more complex structures.
• The set of single-relational graph analysis algorithms should not be “blindly” applied to multi-relational graphs.
  * Why?: For example, ⟨marko, knows, johan⟩ says more about social communication than ⟨marko, livesInSameCityAs, bob⟩.
  * Why?: Multi-relational graph analysis algorithms must respect the meaning of the edges.
Solution Statement
• Provide an algebra to map a multi-relational graph to a “semantically-rich” single-relational graph that can be subjected to all the known single-relational graph analysis algorithms.
Rodriguez M.A., Shinavier, J., “Exposing Multi-Relational Networks to
Single-Relational Network Analysis Algorithms,” Journal of Informetrics,
ISSN:1751-1577, Elsevier, doi:10.1016/j.joi.2009.06.004,
http://arxiv.org/abs/0806.2274, LA-UR-08-03931, in press, 2009.
A Three-Way Tensor Representation of a Multi-Relational Graph
As stated previously, a three-way tensor can be used to represent a multi-relational graph. If
\[
G = (V, E = \{E_0, E_1, \ldots, E_m \subseteq (V \times V)\})
\]
is a multi-relational graph, then $A \in \{0,1\}^{n \times n \times m}$ and
\[
A^k_{i,j} =
\begin{cases}
1 & \text{if } (i,j) \in E_k : k \leq m \\
0 & \text{otherwise.}
\end{cases}
\]
A is the three-way tensor representation of the multi-relational graph.
The General Purpose of the Path Algebra
• Map a multi-relational tensor $A \in \{0,1\}^{n \times n \times m}$ to a single-relational path matrix $Z \in \mathbb{R}_+^{n \times n}$; this path matrix is a weighted single-relational graph.
[Figure: the tensor $A \in \{0,1\}^{n \times n \times m}$ is mapped to a path matrix $Z \in \mathbb{R}_+^{n \times n}$ with entries such as 72, 1, 15.3, 23, 24, 4, and 12; Z is also drawn as a weighted directed graph on the vertices 1 to 5.]
• The created single-relational graph’s edges are loaded with meaning. For example,
given the right tensor, it is possible to create a coauthorship graph for scholars from
the same university who are not on the same project, but share a graduate student.
• The theorems of the algebra can be used to manipulate your operation to a more
efficient form.
The Elements of the Path Algebra
• $A \in \{0,1\}^{n \times n \times m}$: a three-way tensor representation of a multi-relational graph.
• $Z \in \mathbb{R}_+^{n \times n}$: a path matrix derived by means of operations applied to A.
--------------------------------------------------
• $C_j \in \{0,1\}^{n \times n}$: a “to” path filter.
• $R_i \in \{0,1\}^{n \times n}$: a “from” path filter.
• $E_{i,j} \in \{0,1\}^{n \times n}$: an entry path filter.
• $I \in \{0,1\}^{n \times n}$: the identity matrix as a self-loop filter.
• $\mathbf{1} \in 1^{n \times n}$: a matrix in which all entries are equal to 1.
• $\mathbf{0} \in 0^{n \times n}$: a matrix in which all entries are equal to 0.
The Operations of the Path Algebra
• $A \cdot B$: ordinary matrix multiplication determines the number of (A,B)-paths between vertices.
• $A^\top$: matrix transpose inverts path directionality.
• $A \circ B$: Hadamard, entry-wise multiplication applies a filter to selectively exclude paths.
• $n(A)$: not generates the complement of a $\{0,1\}^{n \times n}$ matrix.
• $c(A)$: clip generates a $\{0,1\}^{n \times n}$ matrix from a $\mathbb{R}_+^{n \times n}$ matrix.
• $v^\pm(A)$: vertex generates a $\{0,1\}^{n \times n}$ matrix from a $\mathbb{R}_+^{n \times n}$ matrix, where only certain rows or columns contain non-zero values.
• $\lambda A$: scalar multiplication weights the entries of a matrix.
• $A + B$: matrix addition merges paths.
Example Scholarly Tensor Used in the Remainder of the Presentation
• $A^1$ authored: human → article
• $A^2$ cites: article → article
• $A^3$ contains: journal → article
• $A^4$ category: journal → subject category
• $A^5$ developed: human → program/software.
The Traverse Operation
• An interesting aspect of the single-relational adjacency matrix $A \in \{0,1\}^{n \times n}$ is that when it is raised to the kth power, the entry $A^{(k)}_{i,j}$ is equal to the number of paths of length k that connect vertex i to vertex j.
• Given, by definition, that $A^{(1)}_{i,j}$ (i.e. $A_{i,j}$) represents the number of paths that go from i to j of length 1 (i.e. a single edge) and by the rules of ordinary matrix multiplication,
\[
A^{(k)}_{i,j} = \sum_{l \in V} A^{(k-1)}_{i,l} \cdot A_{l,j} : k \geq 2.
\]
[Figure: a $3 \times 3$ adjacency matrix over the vertices a, b, and c multiplied by itself; the product contains a single non-zero entry, showing that there is a path of length 2 from a to c.]
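The path-counting property of matrix powers can be checked with a few lines of plain Python. The helper names `matmul` and `matrix_power` and the three-vertex chain a → b → c are assumptions made for this sketch.

```python
def matmul(X, Y):
    """Ordinary matrix product of two lists-of-rows matrices."""
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]


def matrix_power(A, k):
    """Return A raised to the k-th power for k >= 1."""
    result = A
    for _ in range(k - 1):
        result = matmul(result, A)
    return result


# Hypothetical chain a -> b -> c (rows and columns ordered a, b, c).
A = [[0, 1, 0],
     [0, 0, 1],
     [0, 0, 0]]
A2 = matrix_power(A, 2)
A3 = matrix_power(A, 3)
```

Here the only non-zero entry of `A2` is the one for (a, c), and `A3` is all zeros because the chain has no path of length 3.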
The Traverse Operation
\[
Z = A^1 \cdot A^2 \cdot A^{1\top},
\]
$Z_{i,j}$ defines the number of paths from vertex i to vertex j such that a path goes from author i to one of the articles he or she has authored, from that article to one of the articles it cites, and finally, from that cited article to its author j. Semantically, Z is an author-citation single-relational path matrix.
[Figure: Human A authored Article B ($A^1$), Article B cites Article C ($A^2$), and Article C was authored by Human D ($A^{1\top}$); the resulting author-citation edge from Human A to Human D is labeled Z.]
• NOTE: All diagrams are with respect to a “source” vertex (the blue vertex) in order to preserve clarity. In reality, the operations operate on all vertices in parallel.
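The author-citation traversal can be sketched on a four-vertex graph that mirrors the diagram: Human A authored Article B, Article B cites Article C, and Human D authored Article C. The vertex ordering and helper names are assumptions of this sketch.

```python
def matmul(X, Y):
    """Ordinary matrix product of two lists-of-rows matrices."""
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]


def transpose(X):
    """Matrix transpose, which inverts path directionality."""
    return [list(col) for col in zip(*X)]


# Assumed vertex order: 0 = Human A, 1 = Article B, 2 = Article C, 3 = Human D.
A1 = [[0, 1, 0, 0],   # authored: Human A -> Article B
      [0, 0, 0, 0],
      [0, 0, 0, 0],
      [0, 0, 1, 0]]   # authored: Human D -> Article C
A2 = [[0, 0, 0, 0],
      [0, 0, 1, 0],   # cites: Article B -> Article C
      [0, 0, 0, 0],
      [0, 0, 0, 0]]

# Z = A1 . A2 . A1^T: author -> article -> cited article -> its author.
Z = matmul(matmul(A1, A2), transpose(A1))
```

The single non-zero entry `Z[0][3]` records that Human A cites Human D through exactly one path.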
The Filter Operation
Various path filters can be defined and applied using the entry-wise Hadamard matrix product, denoted $\circ$, where
\[
A \circ B =
\begin{bmatrix}
A_{1,1} \cdot B_{1,1} & \cdots & A_{1,m} \cdot B_{1,m} \\
\vdots & \ddots & \vdots \\
A_{n,1} \cdot B_{n,1} & \cdots & A_{n,m} \cdot B_{n,m}
\end{bmatrix}.
\]
[Figure: Path Matrix $\circ$ Path Filter = Filtered Path Matrix. A weighted path matrix is multiplied entry-wise by a 0/1 path filter, and only the entries selected by the filter survive in the filtered path matrix.]
The Filter Operation
• $A \circ \mathbf{1} = A$
• $A \circ \mathbf{0} = \mathbf{0}$
• $A \circ B = B \circ A$
• $A \circ (B + C) = (A \circ B) + (A \circ C)$
• $A^\top \circ B^\top = (A \circ B)^\top$.
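The filter identities listed on this slide can be spot-checked numerically. The following sketch uses small hypothetical $2 \times 2$ matrices; the helper names `hadamard`, `add`, and `transpose` are assumptions, and a passing check on one example is of course not a proof.

```python
def hadamard(X, Y):
    """Entry-wise (Hadamard) product of two equally sized matrices."""
    return [[x * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def add(X, Y):
    """Entry-wise matrix sum."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def transpose(X):
    return [list(col) for col in zip(*X)]


# Hypothetical path matrix and two 0/1 filters.
A = [[72, 0], [1, 4]]
B = [[1, 0], [0, 1]]
C = [[0, 1], [1, 0]]
ones = [[1, 1], [1, 1]]
zeros = [[0, 0], [0, 0]]

filtered = hadamard(A, B)  # keeps only the entries of A that B selects
```

With these values, `filtered` keeps the diagonal entries 72 and 4 and zeroes the rest.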
The Not Filter
The not filter is useful for excluding a set of paths to or from a vertex.
\[
n : \{0,1\}^{n \times n} \to \{0,1\}^{n \times n}
\]
with a function rule of
\[
n(A)_{i,j} =
\begin{cases}
1 & \text{if } A_{i,j} = 0 \\
0 & \text{otherwise.}
\end{cases}
\]
[Figure: a 0/1 matrix and its complement under n, in which every 0 becomes 1 and every 1 becomes 0.]
The Not Filter
If $A \in \{0,1\}^{n \times n}$, then
• $n(n(A)) = A$
• $A \circ n(A) = \mathbf{0}$
• $n(A) \circ n(A) = n(A)$.
The Not Filter
A coauthorship path matrix is
\[
Z = A^1 \cdot A^{1\top} \circ n(I)
\]
[Figure: Human A and Human C both authored Article B. Traversing $A^1$ and then $A^{1\top}$ yields coauthor paths, including a coauthor self-loop on Human A; $n(I)$ filters out the self-loop, leaving the coauthor edge from Human A to Human C in Z.]
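A minimal sketch of the coauthorship path matrix follows, assuming a three-vertex graph in which Human A and Human C both authored Article B. The vertex ordering and the helper names (`not_filter` stands in for n) are assumptions of this example.

```python
def matmul(X, Y):
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def hadamard(X, Y):
    return [[x * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def not_filter(X):
    """n(X): 1 where the entry is 0, and 0 otherwise."""
    return [[1 if x == 0 else 0 for x in row] for row in X]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


# Assumed vertex order: 0 = Human A, 1 = Article B, 2 = Human C.
A1 = [[0, 1, 0],   # authored: Human A -> Article B
      [0, 0, 0],
      [0, 1, 0]]   # authored: Human C -> Article B

unfiltered = matmul(A1, transpose(A1))               # includes self-loops
Z = hadamard(unfiltered, not_filter(identity(3)))    # self-loops removed
```

Before filtering, each author is trivially a coauthor of himself or herself; applying n(I) leaves only the symmetric A–C pair.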
The Clip Filter
The general purpose of clip is to take a path matrix and “clip,” or normalize, it to a $\{0,1\}^{n \times n}$ matrix.
\[
c : \mathbb{R}_+^{n \times n} \to \{0,1\}^{n \times n}
\]
\[
c(Z)_{i,j} =
\begin{cases}
1 & \text{if } Z_{i,j} > 0 \\
0 & \text{otherwise.}
\end{cases}
\]
[Figure: a weighted path matrix and its image under c, in which every positive entry becomes 1 and every zero entry stays 0.]
The Clip Filter
If $A, B \in \{0,1\}^{n \times n}$ and $Y, Z \in \mathbb{R}_+^{n \times n}$, then
• $c(A) = A$
• $c(n(A)) = n(c(A)) = n(A)$
• $c(Y \circ Z) = c(Y) \circ c(Z)$
• $n(A \circ B) = c(n(A) + n(B))$
• $n(A + B) = n(A) \circ n(B)$
The Clip Filter
Suppose we want to create an author citation path matrix that does not allow self citation or coauthor citations.
\[
Z = \underbrace{\left(A^1 \cdot A^2 \cdot A^{1\top}\right)}_{\text{cites}} \circ \underbrace{n\left(c\left(A^1 \cdot A^{1\top} \circ n(I)\right)\right)}_{\text{no coauthors}} \circ \underbrace{n(I)}_{\text{no self}}
\]
[Figure: Human A authored Article B, which cites Article C; Human D and Human E appear as authors reached through $A^{1\top}$. The author-citation edge Z from Human A to Human E is kept, while the coauthor relation is excluded by $n(c(A^1 \cdot A^{1\top} \circ n(I)))$ and self citation is excluded by $n(I)$.]
The Clip Filter
However, using various theorems of the algebra,
\[
Z = \underbrace{\left(A^1 \cdot A^2 \cdot A^{1\top}\right)}_{\text{cites}} \circ \underbrace{n\left(c\left(A^1 \cdot A^{1\top} \circ n(I)\right)\right)}_{\text{no coauthors}} \circ \underbrace{n(I)}_{\text{no self}}
\]
becomes
\[
Z = \left(A^1 \cdot A^2 \cdot A^{1\top}\right) \circ n\left(c\left(A^1 \cdot A^{1\top}\right)\right) \circ n(I).
\]
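The equivalence of the two forms can be checked on a small example. The five-vertex graph below is hypothetical (Human D is assumed to coauthor Article B with Human A and Article C with Human E), and the helper names are assumptions of this sketch; agreement on one graph illustrates the simplification but does not prove it.

```python
def matmul(X, Y):
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def hadamard(X, Y):
    return [[x * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def not_filter(X):   # n(X)
    return [[1 if x == 0 else 0 for x in row] for row in X]

def clip(X):         # c(X)
    return [[1 if x > 0 else 0 for x in row] for row in X]

def from_edges(n, edges):
    M = [[0] * n for _ in range(n)]
    for i, j in edges:
        M[i][j] = 1
    return M

# Assumed order: 0 = Human A, 1 = Article B, 2 = Article C, 3 = Human D, 4 = Human E.
N = 5
A1 = from_edges(N, [(0, 1), (3, 1), (3, 2), (4, 2)])  # authored
A2 = from_edges(N, [(1, 2)])                          # cites
not_I = not_filter(from_edges(N, [(i, i) for i in range(N)]))

cites = matmul(matmul(A1, A2), transpose(A1))
coauth = matmul(A1, transpose(A1))

# Original form: the coauthor filter is built from the self-loop-free matrix.
Z_full = hadamard(hadamard(cites, not_filter(clip(hadamard(coauth, not_I)))), not_I)
# Simplified form: the inner n(I) is dropped.
Z_simple = hadamard(hadamard(cites, not_filter(clip(coauth))), not_I)
```

The inner n(I) only affects diagonal entries, and those are zeroed by the outer n(I) in either form, so both expressions leave the single A-to-E citation.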
The Vertex Filter
In many cases, it is important to filter out particular paths to and from a vertex.
\[
v^- : \mathbb{R}_+^{n \times n} \times \mathbb{N} \to \{0,1\}^{n \times n},
\qquad
v^-(Z)_{i,j} =
\begin{cases}
1 & \text{if } \sum_{k \in V} Z_{k,j} > 0 \\
0 & \text{otherwise}
\end{cases}
\]
turns a non-zero column into an all 1-column and
\[
v^+ : \mathbb{R}_+^{n \times n} \times \mathbb{N} \to \{0,1\}^{n \times n},
\qquad
v^+(Z)_{i,j} =
\begin{cases}
1 & \text{if } \sum_{k \in V} Z_{i,k} > 0 \\
0 & \text{otherwise}
\end{cases}
\]
turns a non-zero row into an all 1-row.
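The two vertex filters can be sketched as follows. This example follows the slide's prose description, in which $v^-$ turns every non-zero column into an all-1 column and $v^+$ turns every non-zero row into an all-1 row; the function names `v_minus` and `v_plus` and the sample matrix are assumptions.

```python
def transpose(X):
    return [list(col) for col in zip(*X)]


def v_minus(Z):
    """Column form: entry [i][j] is 1 when column j of Z has a positive sum."""
    n = len(Z)
    col_nonzero = [sum(Z[k][j] for k in range(n)) > 0 for j in range(n)]
    return [[1 if col_nonzero[j] else 0 for j in range(n)] for _ in range(n)]


def v_plus(Z):
    """Row form: entry [i][j] is 1 when row i of Z has a positive sum."""
    n = len(Z)
    row_nonzero = [sum(Z[i]) > 0 for i in range(n)]
    return [[1 if row_nonzero[i] else 0 for _ in range(n)] for i in range(n)]


# Hypothetical path matrix with non-zero columns 0 and 2 and non-zero rows 1 and 2.
Z = [[0, 0, 0],
     [5, 0, 0],
     [0, 0, 2]]

columns = v_minus(Z)
rows = v_plus(Z)
```

Because the two filters differ only in orientation, each can be obtained from the other by transposing before and after.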
The Vertex Filter
[Figure: a path matrix and its image under $v^-$, in which every non-zero column becomes an all-1 column and every other entry is 0.]
$v^+$ is not diagrammed, but acts the same except that it makes 1-rows. Two important filters are the column and row filters, $C \in \{0,1\}^{n \times n}$ and $R \in \{0,1\}^{n \times n}$, respectively.
[Figure: the column filter $C_2$ and the row filter $R_3$, 0/1 matrices with a single all-1 column and a single all-1 row, respectively.]
The Vertex Filter
• $v^-(C_i) = C_i$
• $v^+(R_j) = R_j$
• $v^-(Z) = v^+(Z^\top)^\top$
• $v^+(Z) = v^-(Z^\top)^\top$.
The Vertex Filter
Assume that vertex 1 is the social science subject category vertex and we want to create a journal citation graph for social science journals only.
\[
Z = \underbrace{\left[v^+\left(C_1 \circ A^4\right) \circ A^3\right]}_{\text{soc.sci. journal articles}} \cdot A^2 \cdot \underbrace{\left[A^{3\top} \circ v^-\left(R_1 \circ A^{4\top}\right)\right]}_{\text{articles in soc.sci. journals}}.
\]
[Figure: the Social Science category vertex (vertex 1) is linked by category edges to journals; Journal A, Journal E, and Journal F contain Article B, Article C, and Article D, which are connected by cites edges. The factors $v^+(C_1 \circ A^4) \circ A^3$, $A^2$, and $A^{3\top} \circ v^-(R_1 \circ A^{4\top})$ are overlaid on the corresponding edges, and the resulting social-science journal citation edge is labeled Z.]
The Vertex Filter
[Figure: a worked example of the factor $v^+(C_1 \circ A^4) \circ A^3$ (soc.sci. journal articles) over the vertices S, J-A, J-E, J-F, A-B, A-C, and A-D, showing in turn the matrices $C_1$, $A^4$, $C_1 \circ A^4$, $v^+(C_1 \circ A^4)$, $A^3$, and $v^+(C_1 \circ A^4) \circ A^3$.]
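The Vertex Filter
\[
Z = \underbrace{\left[v^+\left(C_1 \circ A^4\right) \circ A^3\right]}_{\text{soc.sci. journal articles}} \cdot A^2 \cdot \underbrace{\left[A^{3\top} \circ v^-\left(R_1 \circ A^{4\top}\right)\right]}_{\text{articles in soc.sci. journals}}.
\]
However,
\begin{align*}
v^-\left(R_1 \circ A^{4\top}\right) &= v^-\left(\left(C_1 \circ A^4\right)^\top\right) && C_x = R_x^\top \\
&= v^+\left(C_1 \circ A^4\right)^\top && v^+(Z) = v^-(Z^\top)^\top.
\end{align*}
Therefore, because $A^\top \circ B^\top = (A \circ B)^\top$,
\[
Z = \underbrace{\left[v^+\left(C_1 \circ A^4\right) \circ A^3\right]}_{\text{reused}} \cdot A^2 \cdot \underbrace{\left[v^+\left(C_1 \circ A^4\right) \circ A^3\right]}_{\text{reused}}{}^\top.
\]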
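The reuse identity can be illustrated on a small hypothetical graph in which Journal A and Journal E belong to the social science category and Journal F does not. The sketch is 0-indexed, so the category vertex is index 0 rather than vertex 1, and $v^+$ and $v^-$ are taken in the row and column senses described in the prose of the earlier slide; all names and data are assumptions.

```python
def matmul(X, Y):
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def hadamard(X, Y):
    return [[x * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def v_plus(Z):   # non-zero rows become all-1 rows
    return [[1 if sum(row) > 0 else 0] * len(Z) for row in Z]

def v_minus(Z):  # non-zero columns become all-1 columns
    return transpose(v_plus(transpose(Z)))

def from_edges(n, edges):
    M = [[0] * n for _ in range(n)]
    for i, j in edges:
        M[i][j] = 1
    return M

# Assumed order: 0 = S, 1 = J-A, 2 = J-E, 3 = J-F, 4 = A-B, 5 = A-C, 6 = A-D.
N = 7
A2 = from_edges(N, [(4, 5), (4, 6)])           # cites
A3 = from_edges(N, [(1, 4), (2, 5), (3, 6)])   # contains
A4 = from_edges(N, [(1, 0), (2, 0)])           # category
C = from_edges(N, [(i, 0) for i in range(N)])  # "to" filter for vertex S
R = transpose(C)                               # "from" filter for vertex S

left = hadamard(v_plus(hadamard(C, A4)), A3)
right = hadamard(transpose(A3), v_minus(hadamard(R, transpose(A4))))
Z = matmul(matmul(left, A2), right)
Z_reused = matmul(matmul(left, A2), transpose(left))
```

Both forms yield one journal-citation entry, from J-A to J-E; the citation into the article of J-F is dropped because J-F has no social science category edge.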
The Weight and Merge Operations
• $\lambda Z$: scalar multiplication weights paths.
• $Y + Z$: matrix addition merges paths.
[Figure: two weighted path matrices are added entry-wise, $Y + Z$, to give a merged path matrix in which overlapping entries are summed.]
The Weight and Merge Operations
\[
Z = \underbrace{0.6\left(A^1 \cdot A^{1\top} \circ n(I)\right)}_{\text{coauthorship}} + \underbrace{0.4\left(A^5 \cdot A^{5\top} \circ n(I)\right)}_{\text{co-development}}
\]
merges the article and software program collaboration path matrices as specified by their respective weights of 0.6 and 0.4. The semantics of the resultant is a software program and article collaboration path matrix that favors article collaboration over software program collaboration. A simplification of the previous composition is
\[
Z = \left[0.6\left(A^1 \cdot A^{1\top}\right) + 0.4\left(A^5 \cdot A^{5\top}\right)\right] \circ n(I).
\]
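The weighted merge and its simplification can be sketched on a hypothetical graph with three humans, one article, and one program. The vertex ordering, the data, and the helper names (`scale`, `add`, `close`) are assumptions of this example.

```python
def matmul(X, Y):
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def hadamard(X, Y):
    return [[x * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(lam, X):
    """Scalar multiplication: weight every entry of X by lam."""
    return [[lam * x for x in row] for row in X]

def add(X, Y):
    """Matrix addition: merge two path matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def from_edges(n, edges):
    M = [[0] * n for _ in range(n)]
    for i, j in edges:
        M[i][j] = 1
    return M

def close(X, Y, tol=1e-12):
    """Entry-wise comparison with a small floating-point tolerance."""
    return all(abs(x - y) < tol for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

# Assumed order: 0, 1, 2 = humans; 3 = an article; 4 = a program.
N = 5
A1 = from_edges(N, [(0, 3), (1, 3)])  # authored: humans 0 and 1 share the article
A5 = from_edges(N, [(1, 4), (2, 4)])  # developed: humans 1 and 2 share the program
not_I = [[0 if i == j else 1 for j in range(N)] for i in range(N)]

coauth = matmul(A1, transpose(A1))
codev = matmul(A5, transpose(A5))

Z = add(scale(0.6, hadamard(coauth, not_I)), scale(0.4, hadamard(codev, not_I)))
Z_simple = hadamard(add(scale(0.6, coauth), scale(0.4, codev)), not_I)
```

In this example the coauthor pair (0, 1) receives weight 0.6 and the co-developer pair (1, 2) receives weight 0.4, and both forms agree because the Hadamard product distributes over addition.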
Outline
• Introduction to Graph Structures
  * The Single-Relational Graph
  * The Multi-Relational Graph
• A Multi-Relational Path Algebra
• Application to Recommender Systems
kReef: A Scholarly Recommendation Engine
1. The scholarly community is modeled using a multi-relational graph.
2. A “walker”-version of the path algebra is applied to the graph to support scholars.
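[Figure: the kReef architecture, with components labeled Multi-Relational Graph Database (ontology, instances), Grammar Walker Engine, Translators, Analytics Engine, and Graphical User Interface; the markers 1 and 2 refer to the two points above.]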
Rodriguez, M.A., Allen, D.W., Shinavier, J., Ebersole, G., “A Recommender System to Support the Scholarly Communication
Process,” KRS-2009-02, http://arxiv.org/abs/0905.1594, 2009.
kReef: Ontology Classes
[Figure: class hierarchy diagram; each node is drawn with a two-letter abbreviation of its class name.]
Classes shown:
• core:Reefsource, core:Agent, core:Item, core:Event
• core:Person, core:Group, core:Organization, core:Project
• core:Academic, core:Commerical, core:Government
• core:Call, core:CallForChapters, core:CallForPapers, core:CallForProposals, core:CallForTutorials, core:CallForWorkshops
• core:Collection, core:Book, core:Journal, core:Library, core:Magazine, core:Newspaper, core:Proceedings
• core:Document, core:Article, core:Viewgraph, core:Webpage
• core:Media, core:Audio, core:Image, core:Video
• core:Dataset, core:FundingOpportunity, core:Software
• core:Conference, core:Course, core:Meeting, core:Panel, core:Presentation, core:Session, core:SocialEvent, core:Tutorial, core:Workshop, core:Keynote
• NOTE: All edges denote an rdfs:subClassOf relationship (either directly or inferred).
kReef: Ontology Properties
Table 1: core:Reefsource rdf:Property relations
rdf:Property        rdfs:domain       rdfs:range
core:title          core:Reefsource   xsd:string
core:abstract       core:Reefsource   xsd:string
core:guid           core:Reefsource   xsd:string

Table 2: core:Agent rdf:Property relations
rdf:Property        rdfs:domain       rdfs:range
core:attends        core:Agent        core:Event
core:created        core:Agent        core:Item
core:member         core:Group        core:Person
core:subGroup       core:Group        core:Group
core:firstName      core:Person       xsd:string
core:lastName       core:Person       xsd:string
core:occupation     core:Person       xsd:string
core:sex            core:Person       core:Gender

Table 3: core:Item rdf:Property relations
rdf:Property        rdfs:domain       rdfs:range
core:cites          core:Item         core:Item
core:containedIn    core:Item         core:Collection
core:creationTime   core:Item         xsd:dateTime
core:doi            core:Item         xsd:anyURI
core:publisher      core:Item         core:Group
core:dueDate        core:Call         xsd:dateTime
core:callFor        core:Call         core:Reefsource
core:contains       core:Collection   core:Item
core:editor         core:Collection   core:Agent
core:isbn           core:Collection   xsd:anyURI
core:issn           core:Collection   xsd:anyURI
core:oaipmh         core:Library      xsd:anyURI
core:startPage      core:Article      xsd:int
core:endPage        core:Article      xsd:int
core:number         core:Article      xsd:int
core:volume         core:Article      xsd:int

Table 4: core:Event rdf:Property relations
rdf:Property        rdfs:domain       rdfs:range
core:startTime      core:Event        xsd:dateTime
core:endTime        core:Event        xsd:dateTime
core:presents       core:Event        core:Item
core:organizedBy    core:Event        core:Agent
core:subEvent       core:Event        core:Event
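One way to read the domain and range columns is as a typing discipline on edges. The sketch below checks a proposed edge against a few rows copied from Tables 2 and 3, walking up the class hierarchy so that subclasses are accepted. The subclass edges in `SUBCLASS_OF` are assumptions made for illustration (the hierarchy figure is not recoverable from the transcript), and the function names are hypothetical. Treating rdfs:domain and rdfs:range as validation constraints is also a simplification: under RDFS semantics they license type inference rather than rejection.

```python
# ASSUMED subclass edges, for illustration only; not taken from the slides.
SUBCLASS_OF = {
    "core:Person": "core:Agent",
    "core:Group": "core:Agent",
    "core:Document": "core:Item",
    "core:Article": "core:Document",
}

# (domain, range) pairs copied from Tables 2 and 3.
PROPERTIES = {
    "core:created": ("core:Agent", "core:Item"),
    "core:cites": ("core:Item", "core:Item"),
    "core:member": ("core:Group", "core:Person"),
}

def is_a(cls, ancestor):
    """True if cls equals ancestor or reaches it by following subclass edges."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False

def edge_is_valid(subject_cls, prop, object_cls):
    """Check an edge's endpoint classes against the property's domain and range."""
    domain, range_ = PROPERTIES[prop]
    return is_a(subject_cls, domain) and is_a(object_cls, range_)

ok = edge_is_valid("core:Person", "core:created", "core:Article")
reversed_edge = edge_is_valid("core:Article", "core:created", "core:Person")
```

Under the assumed hierarchy, a person creating an article passes the check, while the reversed edge fails because an article is not an agent.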
kReef: Instance Data Ingestion
[Figure: instance data flows into the multi-relational graph database (ontology, instances) from arXiv, CiteULike, CiteSeer, CrossRef, BibSonomy, CogPrints, Connotea, and publishers such as ACM, IEEE, IOP, Springer, Blackwell, Elsevier, etc.]
kReef: Grammar Walker Engine Overview
• A walker-based implementation of the path algebra is applied to the scholarly model in order to support scholars in their professional lives. The path description is known as a “grammar” because it can be modeled as a finite state machine embedded in the walker.
  ⋆ identify articles related to some interesting resource.
  ⋆ identify collaborators for a funding opportunity.
  ⋆ identify a publication venue for a newly created article.
  ⋆ identify referees to review an article.
  ⋆ identify resources of interest in one’s community.
Rodriguez, M.A., “Grammar-Based Random Walkers in Semantic Networks,” Knowledge-Based Systems, 21(7), pp. 727–739,
http://arxiv.org/abs/0803.4355, 2008.
kReef: Grammar Walker Engine Algorithm, Part 1
• First, when trying to solve a recommendation problem, determine which abstract path should be searched to find a solution. The choice usually starts as a hunch and is then validated using real-world data.
  ⋆ For example, what makes a good peer-reviewer/referee for an article? Someone who is cited by the article, along with that person's coauthors. Moreover, the referees should not include the authors of the article or their coauthors one step away in the coauthorship network (conflict of interest).
• Let us denote the path description/grammar/constraint ψ.
Rodriguez, M.A., Bollen, J., “An Algorithm to Determine Peer-Reviewers,” Conference on Information and Knowledge
Management (CIKM), pp. 319–328, http://arxiv.org/abs/cs/0605112, 2008.
kReef: Grammar Walker Engine Algorithm, Part 2
• Program a collection of discrete walkers to traverse the abstract path defined by ψ. Each walker starts at some vertex i ∈ V with an energy value ε ∈ R. As it walks the graph, its energy decays.
  ⋆ Given the peer-review/referee example, the source vertex is the article that requires a set of referees.
[Figure: walkers leaving source vertex i, shown at time steps t = 1, t = 2, and t = 3.]
kReef: Grammar Walker Engine Algorithm, Part 3
• The solution to the problem is where the highest energy flow in the network exists after k time steps.
  ⋆ Given the peer-review example, the highest energy vertices are those people most competent to review the article in question.
In short,
$$\Psi \times \mathcal{P}(V) \to \omega,$$
where Ψ is the set of all grammars, P(V) is the set of all sets of source vertices, and ω : V → R is the resultant energy flow for each vertex in the graph. Or,
$$\underbrace{\texttt{Grammar}}_{\text{path description}} \times \underbrace{\texttt{Set<Vertex>}}_{\text{source vertices}} \to \underbrace{\texttt{Map<Vertex, Double>}}_{\text{ranked results}}.$$
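The three parts of the algorithm can be illustrated with a small sketch. It is not the kReef engine: it assumes the grammar is a plain sequence of edge labels (the simplest finite state machine, one label per time step), replaces a swarm of random walkers with deterministic energy splitting, and omits the conflict-of-interest exclusion from the referee example. The function name `walk`, the `decay` parameter, the `authoredBy` label, and the toy graph are all hypothetical.

```python
from collections import defaultdict

def walk(edges, grammar, sources, energy=1.0, decay=0.5):
    """Propagate energy from the source vertices along the labels in grammar.

    edges:   iterable of (tail, label, head) triples
    grammar: list of edge labels, one per time step
    sources: set of start vertices, each given the same initial energy
    Returns a dict mapping each reached vertex to its accumulated energy.
    """
    out = defaultdict(list)
    for tail, label, head in edges:
        out[(tail, label)].append(head)

    current = {v: energy for v in sources}
    for label in grammar:
        nxt = defaultdict(float)
        for vertex, e in current.items():
            heads = out.get((vertex, label), [])
            # Energy decays each step and is split evenly over matching edges;
            # a vertex with no matching edge loses its energy.
            for head in heads:
                nxt[head] += decay * e / len(heads)
        current = dict(nxt)
    return current

# Toy referee search: article "a" cites "b" and "c"; follow citations, then authorship.
edges = [
    ("a", "cites", "b"), ("a", "cites", "c"),
    ("b", "authoredBy", "p1"), ("b", "authoredBy", "p2"),
    ("c", "authoredBy", "p2"),
]
ranking = walk(edges, ["cites", "authoredBy"], {"a"})
best = max(ranking, key=ranking.get)
```

Here `p2` authored both cited articles and therefore collects more energy than `p1`, which mirrors the claim that the highest-energy vertices are the recommended results. The returned dictionary plays the role of `Map<Vertex, Double>`.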
Other Application Scenarios
• Populating metadata-poor resources with data propagated from metadata-rich resources. Walkers take particular paths, pick up metadata from rich resources, and attach metadata to atrophied resources.
  ⋆ Rodriguez, M.A., Bollen, J., Van de Sompel, H., “Automatic Metadata Generation using Associative Networks,” ACM Transactions on Information Systems, 27(2), pp. 1–20, http://arxiv.org/abs/0807.0023, 2009.
• Generate a context-sensitive representative decision-making structure that reflects the voting behavior of the full population even as the actual voting population wanes in size.
  ⋆ Rodriguez, M.A., “Social Decision Making with Multi-Relational Networks and Grammar-Based Particle Swarms,” Hawaii International Conference on Systems Science (HICSS), pp. 39–49, http://arxiv.org/abs/cs/0609034, 2007.
Future Work in this Area
• Further develop the path algebra. Explore other matrix and tensor operations and determine if they are meaningful in the context of manipulating multi-relational graphs.
• Develop a programming language (Turing complete?) to easily represent path descriptions for walkers. Make it easier for developers to deploy swarms of walkers within a multi-relational network for various application scenarios.
  ⋆ Recommender systems
  ⋆ Vertex and edge ranking systems
  ⋆ Information retrieval systems
  ⋆ General graph analysis
Conclusion
• Thank you for your time...
  ⋆ My homepage: http://markorodriguez.com
  ⋆ Linked Process: http://linkedprocess.org
  ⋆ Neno/Fhat: http://neno.lanl.gov
  ⋆ Collective Decision Making Systems: http://cdms.lanl.gov
  ⋆ Faith in the Algorithm: http://faithinthealgorithm.net
  ⋆ MESUR: http://www.mesur.org