Graph Analysis of Candidate GQL Features · 2/28/2019


Graph Analysis of Candidate GQL Features
Graph Query Language Project

Existing Languages Working Group
Thomas Frisendal

thomasf@tf-informatik.dk, @VizDataModeler
2019-02-26

The "Existing Languages Working Group"

• In preparation for the start of planning for GQL, interested parties drawn from industry (Neo4j, Oracle, Redis Labs and TigerGraph), the community (a noted data modelling expert and published technical author), and academia (the University of Talca in Chile) formed an informal working group called the "Existing Languages Working Group".

• We have worked incrementally on systematically identifying, surveying, analysing and comparing graph query language features drawn from the following existing query languages:
  • Cypher
  • PGQL
  • GSQL
  • SQL PGQ [Framework:2020, Foundation:2020, SQL/PGQ IWD, ERF-035]
  • G-CORE

• We aim to compile a catalogue of:
  • the groups of features
  • the extent to which (if at all) these are supported in each language
  • exemplar syntax
  • supplementary artifacts to aid in understanding the underlying semantics
  • grammar constructs
  • any additional details of interest.

• The idea is to have this landscape of existing query languages in order to inform the design and development of GQL by way of a well-informed work plan, helping to lead to a more robust outcome; i.e. this would help us to have clear and meaningful discussions on scope and priorities, and will facilitate clear and unambiguous design choices. Moreover, this will help us to identify areas of consolidation, innovation and opportunities for language interoperation in GQL (for example, with SPARQL).

Combatting Complexity: The ELWG Graph Database

• Establishing an analytical graph database for all 5 languages across all 212 features
• Down to the keyword level for each feature of each language across 5 descriptive (text/syntax) dimensions
• Now in its 3rd edition
• Methodology:
  • Consolidate all sheets into one
  • Generate MERGE commands for the features tree and the 5 languages (by way of Excel formulas)
  • Some manual intervention (remove CRs and change ;'s to §'s)
  • Load into Neo4j
  • Connect all components
  • Build tags for Descriptors, Grammar Tags and Syntax Tags
  • Build a Keyword tag tree based on all 3 of the above
  • Do some reporting (this ppt and some Excel sheets)
• Will be made available to phase 2 and in the GQL design work (for analysis)
• Ambition: a pragmatic, analytical support tool, not a normative source
• Errare humanum est: please report errors and omissions (a few known issues already)
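The MERGE-generation and clean-up steps of the methodology can be sketched in Python. This is only an illustration under stated assumptions, not the Excel formulas the group actually used: the node labels `Feature` and `FeatureGroup` appear in the deck's statistics, but the `name` property, the `HAS_FEATURE` relationship type and the sample rows are hypothetical. The deck does not say why semicolons are replaced; presumably it is because `;` ends a statement when a Cypher script is loaded.

```python
def sanitize(text: str) -> str:
    """Drop CR/LF characters and swap ';' for '§', as in the manual clean-up step."""
    for line_break in ("\r\n", "\r", "\n"):
        text = text.replace(line_break, " ")
    return text.replace(";", "§")


def cypher_literal(text: str) -> str:
    """Render text as a single-quoted Cypher string literal."""
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def merge_statement(group: str, feature: str) -> str:
    """Build one MERGE statement linking a feature group to a feature.

    The `name` property and the HAS_FEATURE relationship type are hypothetical.
    """
    g = cypher_literal(sanitize(group))
    f = cypher_literal(sanitize(feature))
    return (
        f"MERGE (g:FeatureGroup {{name: {g}}}) "
        f"MERGE (f:Feature {{name: {f}}}) "
        "MERGE (g)-[:HAS_FEATURE]->(f);"
    )


# Hypothetical spreadsheet rows: (feature group, feature name).
rows = [
    ("Lists", "Flattening a list (transform a list into a series of rows; transpose)"),
    ("Operators", "Equality"),
]
statements = [merge_statement(group, feature) for group, feature in rows]
print("\n".join(statements))
```

Because MERGE matches an existing node before creating one, generating one statement per spreadsheet row does not duplicate a group or feature that several rows mention.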

Current Meta Model

Statistics

Node types          Count   Min rels   Max rels
Feature             212     6          14
FeatureArea         6       1          17
FeatureGroup        30      2          27
InclDoc             5       80         549
InclLang            1306    4          4
Language            5       208        311
GCOREFeature        212     2          18
GSQLFeature         212     2          30
OpenCypherFeature   212     2          29
PGQLFeature         212     1          25
SQLFeature          212     2          29
DescriptorTag       401     1          22
GrammarTag          299     1          424
KeywordTag          659     1          247
SyntaxTag           214     1          247

The Features Tree

Comparison of Planned or Implemented Features

[Figure panels: GCORE, GSQL, OpenCypher, PGQL, SQL]

Implementation Status (Not = 'X')

GCORE: 72, GSQL: 152, Cypher: 168, PGQL: 113, SQL: 140

Implementation Status Not Supported ('X')

GCORE: 118, GSQL: 54, Cypher: 43, PGQL: 99, SQL: 71

The Descriptor Tags

The Grammar Tags

Function Invocation (Cypher)

Not Defined (SQL)

The Syntax Graph

Part of the Syntax Graph

Zooming in on a "Word" in the Syntax Graph

Even More Tags in the Keyword Graph

Essentially the Syntax Tags, enhanced with keywords extracted from the Descriptor and Grammar Tags

Collected Keywords per Feature and Language

Using a Graph Algorithm to Measure Similarity of Expression (Jaccard)

Feature Name                                                Avg Sim
And                                                         1,00
Comparing values (equality)                                 1,00
Equality                                                    1,00
Greater than                                                1,00
Greater than or equal to                                    1,00
Inequality                                                  1,00
Less than                                                   1,00
Less than or equal to                                       1,00
Negation                                                    1,00
Or                                                          1,00
Type coercions (i.e. implicit type conversions)             1,00
approximate 32-bit binary decimal number                    1,00
approximate 64-bit binary decimal number                    1,00
Edge directions: l-to-r                                     0,87
Specifying a conditional value                              0,87
date                                                        0,83
localtime                                                   0,83
Check if a property exists on a node or an edge             0,80
Edge directions: r-to-l                                     0,79
Edge pattern with disjunction of labels                     0,79
MATCH with more than one node/edge/path pattern             0,75
  (i.e. allowing for 'star'-shaped patterns etc.).
  Essentially this can also be used to obtain a
  cross product
Edge pattern with direction                                 0,75
Subtraction                                                 0,74
Edge directions: any direction                              0,73

Feature Name                                                Avg Sim
Dynamic property access                                     -
  (accessing a property of a node or edge by using a
  dynamically-computed string value as the key § e.g.
  allowing for the key to be passed in as a parameter)
Escaping characters                                         -
Flattening a list                                           -
  (transform a list into a series of rows § transpose)
Get all the elements of a list/collection/array             -
  excluding the first element
Get all the labels for a node                               -
Get the identifier of a node or edge                        -
Node pattern with label negation                            -
interval                                                    -
multidimensional array                                      -
Obtain the current date/time                                0,06
Get all the nodes in a path                                 0,07
List/collection/array concatenation                         0,07
Get all the edges in a path                                 0,08
Determine whether or not a value is a member of a multiset  0,08
Input graph specification                                   0,08
List equality                                               0,08
Create an edge                                              0,09
Get the edge label as a string                              0,09
Subtraction operator for temporal types and durations       0,11
Create a node                                               0,11
Get the first element in a list/collection/array            0,11
Replace                                                     0,11
Checking if a pattern exists                                0,12
Amalgamate multiple values into a single list               0,13

[Chart: AvgSim per feature. Value axis: 0,00 to 1,20. Feature labels run from "And" to "Get all the elements of a…".]
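The Jaccard measure named in the slide title compares two sets A and B as |A ∩ B| / |A ∪ B|. The sketch below shows one way an average similarity per feature could be derived from the keyword sets collected per language. It is only an illustration: the deck does not say how AvgSim is aggregated, so averaging over all language pairs is an assumption, as are the sample keyword sets and the choice to score two empty sets as 0.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    if not union:
        # Assumed convention: two empty keyword sets score 0 rather than 1.
        return 0.0
    return len(a & b) / len(union)


def avg_pairwise_similarity(keywords_by_language: dict) -> float:
    """Average the Jaccard score over every pair of languages (an assumption)."""
    pairs = list(combinations(keywords_by_language.values(), 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical keyword sets for one feature in three languages.
same_keyword = {"Cypher": {"AND"}, "PGQL": {"AND"}, "SQL": {"AND"}}
# Hypothetical keyword sets with only partial overlap.
partial_overlap = {"LangA": {"x", "y"}, "LangB": {"y", "z"}}

print(avg_pairwise_similarity(same_keyword))
print(avg_pairwise_similarity(partial_overlap))
```

Under these assumptions, a feature that every language expresses with the same keywords scores 1, while features with little shared vocabulary fall towards 0, which matches the shape of the two tables.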

10 Data Extracts in Excel (ELWG_reports_20190228.zip)

• CandidateFeatures_20190228
• DescriptorTags_20190228
• FeaturesNotSupported_20190228
• FeatureSyntaxSimilarity_20190228
• GrammarTags_20190228
• KeywordTagsAcrossLanguages_20190228
• KeyWordTagsCollections_20190228
• SyntaxSummary_20190228
• SyntaxTags_20190228
• SyntaxXref_20190228

Contact information:

Thomas Frisendal (Copenhagen, Denmark)

thomasf@tf-informatik.dk
@VizDataModeler
linkedin.com/in/thomas-frisendal-19a56a