BioinfRes SoSe 17
Bioinformatics Resources - NoSQL 2
Lecture & Exercises: Prof. B. Rost, Dr. L. Richter, J. Reeb
Institut für Informatik I12

Preliminary Schedule
Apr. 28th: Intro, General Overview (1. sh.)
May 5th: Sequence Databases (2. sh.)
May 12th: Sequence Databases (3. sh.)
May 19th: Structure Databases (4. sh.)
May 26th: No Lecture
Jun 2nd: SQL (5. sh.)
Jun 9th: SQL, NoSQL (6. sh.)
Jun 16th: No Lecture
Jun 23rd: NoSQL 2 (7. sh.)
Jun 30th: MongoDB, JavaScript (8. sh.)
Jul 7th: PredictProtein (9. sh.)
Jul 14th: JavaScript/Node.js Applications
Jul 21st: Wrap Up, Q&A
Jul 28th: Exam
* These exercises can earn you a bonus

Evaluation
● Lectures are evaluated between June 19th and 30th
● Please take 15 min to complete the survey
● The necessary information was sent to the students registered for the lecture
● This lecture is: 0000002112 Bioinformatische Ressourcen (IN2321), lecturers: Dr. Richter, M.Sc. Reeb

Organization - Exam Date
● Exam scheduled for Friday, Jul 28th
● Time: 16:30-18:00
● Room: MW0350 Egbert-von-Hoyer Lecture Hall (Mechanical Engineering Building)
● Registration is MANDATORY
● So far, 13 students have registered

Short SQL Recap
● schema
● typed data
● tables
● defined layout
● space consumption is computable

Short SQL Recap (continued)
● well-defined theory
● relational algebra
● ACID principle
● standardized query language
● fast access with indices
● well supported by software vendors

NoSQL
● in principle known for a long time
● Ken Thompson, 1978: key/value system
● big push in the 2000s: Web 2.0
● Map/Reduce, BigTable databases
● data volumes in the range of terabytes (TB) and petabytes (PB)
● growing relational databases becomes more and more difficult on commodity hardware
● http://www.w3resource.com/mongodb/nosql.php

Definition
● non-relational data model
● enables distributed, horizontal scalability
● open source
● no schema or only a simple schema
● support for simple data replication
● simple API
● different consistency model (usually weaker than ACID)
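
Issues with Relational DBs
● if the schema is bad, the queries are bad as well
● queries are based on strings and therefore susceptible to typos
● errors in queries are not detected at compile time
● queries embedded as strings cannot be refactored (e.g., by IDE tools)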

Categories of NoSQL Systems
● Wide Column Stores / Column Family Systems
● Document Stores
● Key/Value / Tuple Stores
● Graph Databases

Key/Value Systems
● at most a very simple schema: key and value
● keys can be grouped in namespaces and databases
● values can be complex; besides simple strings there are:
  - hashes
  - sets
  - lists
● queries are mostly limited to the API
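A minimal in-memory sketch of the key/value idea may help. It assumes a hypothetical `KeyValueStore` class (not the API of any real product): keys live inside namespaces, values can be plain strings, lists, sets or hashes, and the only way to query is a lookup by namespace and key.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory key/value store illustrating the concept (not a product API).
public class KeyValueStore {
    // namespace -> (key -> value); a value may be a String, a List, a Set or a Map (hash)
    private final Map<String, Map<String, Object>> namespaces = new HashMap<>();

    public void put(String namespace, String key, Object value) {
        namespaces.computeIfAbsent(namespace, ns -> new HashMap<>()).put(key, value);
    }

    // the only way to query: look up a value by namespace and key (null if absent)
    public Object get(String namespace, String key) {
        Map<String, Object> ns = namespaces.get(namespace);
        return ns == null ? null : ns.get(key);
    }

    public static void main(String[] args) {
        KeyValueStore store = new KeyValueStore();
        store.put("users", "alice", "Alice");                                 // string
        store.put("users", "alice:roles", Set.of("admin", "editor"));         // set
        store.put("sessions", "alice", List.of("login", "search", "logout")); // list
        store.put("users", "alice:profile", Map.of("age", "30"));             // hash
        System.out.println(store.get("sessions", "alice")); // [login, search, logout]
        System.out.println(store.get("users", "bob"));      // null: unknown key
    }
}
```

There is deliberately no query by value (e.g. "all users older than 25"): that kind of search would have to be implemented by the application on top of the API.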

Column Family
● keys can point to an arbitrary number of key/value pairs
● nested key/value pairs
● nested columns
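The nesting described above can be pictured as maps inside maps. The following is a hypothetical sketch (no real column-family database API is used): a row key points to column families, each of which holds an arbitrary number of sorted column/value pairs, and different rows may have different columns.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the column-family data model (not a real database API):
// row key -> column family -> column name -> value; columns are kept sorted.
public class ColumnFamilySketch {
    private final Map<String, Map<String, Map<String, String>>> rows = new TreeMap<>();

    public void put(String rowKey, String family, String column, String value) {
        rows.computeIfAbsent(rowKey, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .put(column, value);
    }

    // all columns of one family for a row (empty map if the row or family is missing)
    public Map<String, String> getFamily(String rowKey, String family) {
        return rows.getOrDefault(rowKey, Map.of()).getOrDefault(family, Map.of());
    }

    public static void main(String[] args) {
        ColumnFamilySketch db = new ColumnFamilySketch();
        // each row can have a different, arbitrary number of columns
        db.put("alice", "info", "name", "Alice");
        db.put("alice", "info", "age", "18");
        db.put("alice", "knows", "bob", "2001/10/03");
        db.put("alice", "knows", "carol", "2003/12/04");
        db.put("bob", "info", "name", "Bob");
        System.out.println(db.getFamily("alice", "knows")); // {bob=2001/10/03, carol=2003/12/04}
        System.out.println(db.getFamily("bob", "knows"));   // {}
    }
}
```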

Document Stores
● do not work on "actual" documents, but on structured data like:
  - JSON
  - YAML
  - RDF

Graph Databases
● based on graph or tree structures to connect elements
● property graph:
  - nodes to reflect items
  - edges to reflect relations
● very suitable for traversals

Theoretical Concepts
● Map/Reduce
● CAP Theorem / Eventually Consistent
● Consistent Hashing
● MVCC Protocol
● Vector Clock
● Paxos
● REST

Map/Reduce
● requires a (map/reduce) framework
● designed for efficient handling of data in the order of tera- or petabytes
● developed by Google
● patented since 2010

Map/Reduce Details
● originates from functional programming
● parallel processing
● no side effects
● no deadlocks
● no race conditions
● the initial data structure is not altered
● a new copy is created at every level

Map/Reduce Details (continued)
● functions like in math:
  - a set of transformation definitions
  - no control structures
  - recursion
  - functions can be used as arguments or return values: higher-order functions

Map/Reduce Details (continued)
● two functions: map, reduce/fold
● used alternately (two-phase approach)
● map (in parallel):
  - applied to all elements of a list
  - returns a modified list
● reduce:
  - aggregates the return values from map into one result
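The two building blocks can be tried out without any framework using Java's standard `java.util.stream` API. This is a sketch of the functional idea only (the class and helper names are hypothetical): `map` and `reduce` take the transformation as a function argument, i.e. they are higher-order functions, and the input list is never modified.

```java
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

// Sketch of map and reduce as higher-order functions using java.util.stream.
public class MapReduceBasics {
    // map: apply f to every element (possibly in parallel) and return a NEW list
    public static <A, B> List<B> map(List<A> input, Function<A, B> f) {
        return input.parallelStream().map(f).collect(Collectors.toList());
    }

    // reduce/fold: combine all values into one result, starting from an identity element
    public static <T> T reduce(List<T> input, T identity, BinaryOperator<T> op) {
        return input.stream().reduce(identity, op);
    }

    public static void main(String[] args) {
        List<String> sequences = List.of("ACGT", "GGCATT", "TA");
        List<Integer> lengths = map(sequences, String::length); // functions passed as arguments
        int total = reduce(lengths, 0, Integer::sum);
        System.out.println(lengths + " -> " + total); // [4, 6, 2] -> 12
    }
}
```

Because the mapped function has no side effects, the parallel stream can process elements in any order without locks; the collected list still keeps the original element order.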

Map/Reduce Details (continued)
● the user has to provide:
  - a map function
  - a reduce function
● the framework provides:
  - automatic parallelization and distribution
  - fault-tolerance mechanisms for hardware and software failures
  - I/O scheduling
  - status and control information

Pseudocode Example
map(String key, String value):
    // key: document name
    // value: document contents
    for each word w in value:
        EmitIntermediate(w, "1");

reduce(String key, Iterator values):
    // key: a word
    // values: a list of counts
    int result = 0;
    for each v in values:
        result += ParseInt(v);
    Emit(AsString(result));
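The word-count pseudocode can be turned into a small runnable Java sketch. It runs the map phase, a grouping step (often called "shuffle") and the reduce phase in a single process; the class name `WordCount` and the `run` helper are hypothetical, and a real Map/Reduce framework would distribute each phase over many nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process sketch of the word-count example: map, shuffle (group by key), reduce.
public class WordCount {
    // map(documentName, contents): emit (word, "1") for every word
    static List<Map.Entry<String, String>> map(String key, String value) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (String w : value.split("\\s+")) {
            if (!w.isEmpty()) out.add(Map.entry(w, "1"));
        }
        return out;
    }

    // reduce(word, counts): sum up all counts emitted for this word
    static String reduce(String key, Iterable<String> values) {
        int result = 0;
        for (String v : values) result += Integer.parseInt(v);
        return String.valueOf(result);
    }

    public static Map<String, String> run(Map<String, String> documents) {
        Map<String, List<String>> grouped = new TreeMap<>();
        // map phase: every document could be processed on a different node
        for (Map.Entry<String, String> doc : documents.entrySet()) {
            for (Map.Entry<String, String> kv : map(doc.getKey(), doc.getValue())) {
                // shuffle: collect all intermediate values per key
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        // reduce phase: every key could be reduced on a different node
        Map<String, String> result = new TreeMap<>();
        grouped.forEach((word, counts) -> result.put(word, reduce(word, counts)));
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> docs = Map.of("doc1", "to be or not to be", "doc2", "to do");
        System.out.println(run(docs)); // {be=2, do=1, not=1, or=1, to=3}
    }
}
```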

Characteristics of a Map/Reduce System
● commodity hardware
● Ethernet network
● large number of nodes (>100)
● distributed file system; data is stored in chunks and redundantly
● data are local to the processing node

CAP and Eventually Consistent
● horizontal scaling of relational databases is insufficient:
  - it takes too much time to extend the database to more computers
  - frequent modification of source code is required
● mostly due to the implementation of the ACID principle

CAP Theorem
● consistency, availability and partition tolerance cannot all be completely satisfied at the same time
● only two of these criteria can be satisfied at the same time; here, availability and partition tolerance is the important combination
● consequently, consistency is reduced

Consistency
● after a transaction the database is consistent, i.e.:
  - all replicating nodes of the database system have the same state after a transaction; changes are propagated to all nodes
  - read access to any node returns the same result
  - this requires waiting for the completion of the propagation

Availability
● acceptable response time
● depends on the specific business case
● a certain response time is guaranteed up to a specified load level

Partition Tolerance
● if a node or a connection fails, the system remains responsive
● in large computer centers such failures are frequent

BASE Consistency Model
● Basically Available
● Soft state
● Eventually consistent

Characteristics
● focus on availability
● consistency is less important
● BASE is optimistic about consistency and defines it as a transition process rather than as a defined state after a transaction -> Eventual Consistency
● the data is consistent at some (not precisely defined) point in time
● the interpretation differs between systems
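A toy model can make "consistent at some point in time" concrete. In this hypothetical single-process sketch (no real database is involved), a write is applied to one replica immediately and queued for the others; until a background `sync()` step delivers the queue, other replicas return stale data. Conflicting updates are resolved by "last writer wins" on a version number, which is only one of several possible policies.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Toy model of eventual consistency: fast local write, asynchronous propagation.
public class EventualConsistency {
    record Update(String key, String value, long version) {}

    static class Replica {
        final Map<String, Update> data = new HashMap<>();

        // apply an update only if it is newer (last writer wins by version)
        void apply(Update u) {
            Update old = data.get(u.key());
            if (old == null || u.version() > old.version()) data.put(u.key(), u);
        }

        String read(String key) {
            Update u = data.get(key);
            return u == null ? null : u.value();
        }
    }

    final List<Replica> replicas = new ArrayList<>();
    final Queue<Map.Entry<Replica, Update>> pending = new ArrayDeque<>();
    long clock = 0;

    public EventualConsistency(int n) {
        for (int i = 0; i < n; i++) replicas.add(new Replica());
    }

    // write to one replica (available immediately) and queue propagation to the others
    public void write(int replica, String key, String value) {
        Update u = new Update(key, value, ++clock);
        Replica target = replicas.get(replica);
        target.apply(u);
        for (Replica r : replicas) {
            if (r != target) pending.add(Map.entry(r, u));
        }
    }

    // background step: deliver all pending updates, after which the replicas agree
    public void sync() {
        while (!pending.isEmpty()) {
            Map.Entry<Replica, Update> e = pending.poll();
            e.getKey().apply(e.getValue());
        }
    }

    public static void main(String[] args) {
        EventualConsistency db = new EventualConsistency(3);
        db.write(0, "sport", "jogging");
        System.out.println(db.replicas.get(2).read("sport")); // null: not yet propagated
        db.sync();
        System.out.println(db.replicas.get(2).read("sport")); // jogging: replicas converged
    }
}
```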

Levels of Consistency
● Causal Consistency
● Read-your-writes Consistency
● Session Consistency
● Monotonic Read Consistency
● Monotonic Write Consistency

Consistent Hashing
● belongs to the family of hash functions
● maps elements of a (potentially) very large source set to a hash value from a typically much smaller value set
● advantage: constant time
● applications:
  - checksums
  - securing against manipulation
  - fast search in data structures

Consistent Hashing (continued)
● here: find a constant place (i.e., a node) for an object
● minimize object movements on addition or removal of nodes
● minimize object movements upon insertions
● distribute objects equally among resources
● circular hash space
● servers and data objects are mapped onto the same ring; each object is assigned to the next server clockwise
● upon insertion or removal of a server, only its neighbors are affected
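The ring described above can be sketched with a sorted map. This is an illustration under assumptions: positions are taken from the first 8 bytes of an MD5 digest (any well-spread hash would do), and there is one position per server. Many real systems place several "virtual nodes" per server to spread objects more evenly.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch of a consistent-hashing ring: servers and objects share one circular hash
// space; an object belongs to the first server found clockwise from its position.
public class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    // ring position: first 8 bytes of the MD5 digest (the hash choice is an assumption)
    static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xFF);
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public void addServer(String server) { ring.put(hash(server), server); }

    public void removeServer(String server) { ring.remove(hash(server)); }

    // walk clockwise from the object's position; wrap around to the first server
    public String serverFor(String object) {
        if (ring.isEmpty()) throw new IllegalStateException("no servers");
        SortedMap<Long, String> tail = ring.tailMap(hash(object));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    public static void main(String[] args) {
        ConsistentHashRing ring = new ConsistentHashRing();
        for (String s : new String[] {"server-A", "server-B", "server-C"}) ring.addServer(s);
        String[] objects = {"doc1", "doc2", "doc3", "doc4", "doc5", "doc6"};
        Map<String, String> before = new TreeMap<>();
        for (String o : objects) before.put(o, ring.serverFor(o));
        ring.addServer("server-D");
        // only objects now owned by server-D move; all other assignments stay the same
        for (String o : objects) System.out.println(o + ": " + before.get(o) + " -> " + ring.serverFor(o));
    }
}
```

With a classic `hash(object) mod n` assignment, changing n from 3 to 4 would move most objects; on the ring, only the objects between the new server and its predecessor change owner.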

Multiversion Concurrency Control (MVCC)
● data objects are versioned
● the versions represent the timeline of changes
● every write access creates a new version
● each version contains a reference to its predecessor (the previous version)
● conflict resolution through explicit version comparison

Multiversion Concurrency Control (MVCC)
● disadvantages of conventional locks:
  - complete tables are locked
  - inefficient if the communication time is high because of long cache pipelines or network traffic
  - not 100% guaranteed in distributed systems
  - parallel accesses are blocked
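
MVCC – No Conflict
(diagram: timeline of Alice's transaction TxAlice and Bob's read accesses)
● t0: v_latest = v0; Alice starts transaction TxAlice and reads v0; Bob reads v0
● t1: TxAlice writes v1 based on v0; since v_latest = v0, the write is accepted and v_latest = v1
● Bob's reads before t1 return v0, his reads after t1 return v1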
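
MVCC – Conflict Case
(diagram: timelines of transactions TxAlice and TxBob, times t0 to t3)
● t0: v_latest = v0; TxAlice and TxBob both read v0
● TxAlice writes v1a based on v0; the write is accepted and v_latest = v1a
● TxBob then tries to write v1b, also based on v0
● Conflict! v_latest != v0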
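The two MVCC scenarios can be reproduced with a small optimistic version check. This hypothetical `MvccRecord` class is a sketch, not a database implementation: reads return an immutable version without locking, every write creates a new version that references its predecessor, and a write is rejected if it was not based on `v_latest`.

```java
// Sketch of MVCC with optimistic conflict detection (hypothetical class).
public class MvccRecord {
    // immutable version; 'previous' links back through the change history
    public record Version(int number, String value, Version previous) {}

    private Version latest = new Version(0, "initial", null); // v0

    // reading never blocks writers: the caller gets an immutable snapshot
    public synchronized Version read() { return latest; }

    // a write must name the version it was based on
    public synchronized Version write(Version basedOn, String newValue) {
        if (basedOn != latest) { // another transaction has written in the meantime
            throw new IllegalStateException("Conflict! v_latest != v" + basedOn.number());
        }
        latest = new Version(basedOn.number() + 1, newValue, basedOn);
        return latest;
    }

    public static void main(String[] args) {
        MvccRecord record = new MvccRecord();
        Version aliceRead = record.read();          // TxAlice reads v0
        Version bobRead = record.read();            // TxBob reads v0
        record.write(aliceRead, "Alice's change");  // accepted, v_latest = v1
        try {
            record.write(bobRead, "Bob's change");  // still based on v0
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());     // Conflict! v_latest != v0
        }
        // the history is kept: follow the references back to v0
        for (Version v = record.read(); v != null; v = v.previous()) {
            System.out.println("v" + v.number() + ": " + v.value());
        }
    }
}
```

How the rejected transaction continues (retry on the new version, merge, or abort) is left to the application or the database's conflict-resolution policy.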

Vector Clocks
● challenge:
  - many instances write data
  - they have to be synchronized and ordered afterwards
● solution: Vector Clocks
  - originated in the field of operating systems
  - Leslie Lamport (1978) describes timestamps/clocks

Lamport Timestamps / Criteria
● weak consistency criterion: if event e1 causes event e2, then the timestamp of e1 has to be smaller than the timestamp of e2
● strong consistency criterion (the converse): if the timestamp of e1 is smaller than that of e2, then event e1 has caused event e2
● events can be sorted in a partial order:
  - every event gets a timestamp, which does not reflect real time
  - monotonically increasing integer
● timestamps fulfill only the weak criterion
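A Lamport clock needs only one counter per process. The sketch below (class name hypothetical) follows the usual rules: increment on every local event or send, and on receipt jump past the sender's timestamp. The demo shows why only the weak criterion holds: a causally related pair gets increasing timestamps, but so does an unrelated pair.

```java
// Sketch of a Lamport logical clock (one instance per process).
public class LamportClock {
    private long time = 0;

    // local event or sending a message: increment the counter
    public synchronized long tick() { return ++time; }

    // receiving a message: move past the sender's timestamp
    public synchronized long receive(long senderTimestamp) {
        time = Math.max(time, senderTimestamp) + 1;
        return time;
    }

    public static void main(String[] args) {
        LamportClock p1 = new LamportClock();
        LamportClock p2 = new LamportClock();
        long send = p1.tick();        // e1: p1 sends a message, timestamp 1
        p2.tick();
        p2.tick();                    // p2 has two independent local events (1, 2)
        long recv = p2.receive(send); // e2: p2 receives, max(2, 1) + 1 = 3
        System.out.println(send + " < " + recv);  // 1 < 3: e1 caused e2 (weak criterion)
        long other = p1.tick();       // another event on p1, timestamp 2
        System.out.println(other + " < " + recv); // 2 < 3, although it did not cause e2
    }
}
```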

Version Vector / Vector Clock
● Version Vector: vector (tuple) of values/timestamps of an object
● Vector Clock:
  - each process/database has a counter which is incremented
  - every process remembers the sender and the timestamp
  - every message/version has a vector of ID-timestamp pairs attached

Vector Clocks in NoSQL
● so the Vector Clock is a list of (ID, time) tuples
● this enables the client to order the different versions and to detect conflicts if multiple clients update and replicate records at the same time
● we demonstrate this with a simple example:
  - three people, denoted by their initials, want to agree on a sports activity
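
(figure: messages exchanged between Laura, Paul and Anna, each annotated with its vector clock)
● Laura: jogging [L:1]
● Paul: surfing [P:1, L:1]
● Anna: jogging [A:1, L:1]
● Laura: surfing [L:2, P:1]
● as seen by Paul: surfing [L:2, P:1, A:0] and jogging [A:1, L:1, P:0]
● Paul: surfing [P:2, A:1, L:2]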

Story to the Example
● Laura, Anna and Paul (replicating nodes) want to agree on a sport (have consistent data)
  - nodes can request the current version of a record and they can update each other
  - simultaneous broadcasts create confusion
  - goal: consistent information (consensus protocols)

Story to the Example / Solution
● Laura starts, suggesting to go jogging: jogging, [L:1] (jogging is the data to store, L:1 the Vector Clock), and sends/replicates this to Anna and Paul
● Paul becomes active and suggests going surfing: surfing, [L:1, P:1], and sends this to Anna and Laura
● because of a network problem, Anna does not receive the message; Laura receives it

Story to the Example / Solution (continued)
● Laura agrees with Paul and returns the surfing suggestion, incrementing her counter: surfing, [P:1, L:2]
● Anna, who has not received Paul's message, agrees to jogging based on Laura's suggestion: jogging, [L:1, A:1], and sends this to Paul
● Paul has to (and can) detect the conflict: jogging could have had a majority (Laura & Anna), BUT Laura has also already agreed on surfing (Laura & Paul)

Story to the Example / Solution (continued)
● surfing [A:0, P:1, L:2], jogging [P:0, L:1, A:1]; counters that are not yet known are listed as 0
● Paul can detect that Anna's message was not a response to his suggestion, since P:0. There are two possible resolutions:
  - jogging, because initially both girls wanted to
  - surfing, because Laura changed her mind

Story to the Example / Solution (continued)
● Paul decides to go on with surfing and communicates this to Anna and Laura: surfing, [L:2, A:1, P:2]
● the discussion could still go on now, but this way the Vector Clocks help to make reasonable decisions and to check causal dependencies
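The example can be replayed with a small vector clock class. This is a sketch with a hypothetical `VectorClock` class: missing entries count as 0 (as on the slides), `compare` reports whether one version happened before the other or whether they are concurrent (a conflict), and Paul's resolution is modelled as "merge both clocks, then increment P".

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Sketch of a vector clock: node id -> counter, missing entries count as 0.
public class VectorClock {
    public enum Order { BEFORE, AFTER, EQUAL, CONCURRENT }

    private final Map<String, Integer> counters = new HashMap<>();

    public VectorClock(Map<String, Integer> initial) { counters.putAll(initial); }

    public int get(String node) { return counters.getOrDefault(node, 0); }

    // a node increments its own counter when it creates a new version
    public VectorClock increment(String node) {
        VectorClock c = new VectorClock(counters);
        c.counters.put(node, get(node) + 1);
        return c;
    }

    // element-wise maximum: the combined knowledge of both versions
    public VectorClock merge(VectorClock other) {
        VectorClock c = new VectorClock(counters);
        other.counters.forEach((node, t) -> c.counters.merge(node, t, Math::max));
        return c;
    }

    public Order compare(VectorClock other) {
        Set<String> nodes = new HashSet<>(counters.keySet());
        nodes.addAll(other.counters.keySet());
        boolean smaller = false, larger = false;
        for (String n : nodes) {
            if (get(n) < other.get(n)) smaller = true;
            if (get(n) > other.get(n)) larger = true;
        }
        if (smaller && larger) return Order.CONCURRENT; // neither version knows the other
        if (smaller) return Order.BEFORE;
        if (larger) return Order.AFTER;
        return Order.EQUAL;
    }

    @Override
    public String toString() { return new TreeMap<>(counters).toString(); }

    public static void main(String[] args) {
        VectorClock jogging = new VectorClock(Map.of()).increment("L"); // Laura: [L:1]
        VectorClock paul = jogging.increment("P");                       // Paul: surfing [L:1, P:1]
        VectorClock surfing = paul.increment("L");                       // Laura: surfing [L:2, P:1]
        VectorClock anna = jogging.increment("A");                       // Anna: jogging [L:1, A:1]
        System.out.println(surfing.compare(anna));                       // CONCURRENT: conflict
        VectorClock decision = surfing.merge(anna).increment("P");       // Paul: surfing [L:2, A:1, P:2]
        System.out.println(decision + " " + decision.compare(anna) + " " + decision.compare(surfing));
        // prints {A=1, L=2, P=2} AFTER AFTER
    }
}
```

Because the final clock dominates both conflicting versions, every replica that receives it can see that Paul's decision supersedes both earlier suggestions.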

Paxos
● goal: ensure data integrity if nodes in a cluster with replicated data fail
● belongs to the quorum-consensus algorithms
● leads to an agreement between participating nodes
● superior to the classical Two-Phase Commit (2PC)
● tolerant to:
  - failure of a minority of the nodes
  - a crashing transaction
  - message loss

Basic Paxos Consensus Algorithm
● based on voting:
  - one client suggests a value
  - the other acceptors (quorum) vote
  - each ballot has a leader (coordinator)
  - proposers support clients, convince acceptors and coordinate conflict resolution

Basic Paxos – Execution
● Phase 1a (prepare): the proposer/leader chooses a new ballot number that is larger than any ballot number it has seen before and sends it to a quorum of acceptors
● Phase 1b (promise): if the received number is larger than any number received before, a node sends its status to the leader, including:
  - the largest number received in phase 1a
  - the largest ballot number for which it sent a phase 2b message (together with the corresponding value)
  - the promise that no ballot with a smaller number will be accepted anymore
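
Basic Paxos – Execution (continued)
● Phase 2a (accept): once the leader of a ballot has received positive 1b messages from a quorum, there are two cases:
  - free: no acceptor in the quorum reported a phase 2b message, i.e. none has voted for a value v yet (no completed ballot before)
  - forced: at least one acceptor in the quorum reported a ballot from phase 2b, i.e. it has already voted for a value v
  - if forced, the leader sends the value v of the highest reported ballot; if free, the leader can send any value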

Basic Paxos – Execution (continued)
● Phase 2b (accepted): if an acceptor receives a 2a message for a ballot it agreed to before with a 1b message (and has not promised a higher ballot since), the value is accepted and it sends a phase 2b message with v and the ballot to the leader
● Phase 3: if the leader receives phase 2b messages for v and the ballot from a quorum, it knows that v was accepted and communicates this to all interested processes
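The phases above can be simulated in a single process. This is a teaching sketch under strong simplifications (synchronous method calls instead of messages, no retries, timeouts or leader election; the class names are hypothetical). It shows the two properties the slides emphasize: a value is still chosen when a minority of nodes has failed, and a later ballot is "forced" to keep the value that was already accepted.

```java
import java.util.ArrayList;
import java.util.List;

// Single-decree Basic Paxos simulated with direct method calls (teaching sketch).
public class BasicPaxos {
    // phase 1b reply: the last ballot/value this acceptor accepted (-1/null = none)
    record Promise(int acceptedBallot, String acceptedValue) {}

    static final class Acceptor {
        int promised = -1;       // highest ballot number promised so far
        int acceptedBallot = -1; // ballot of the last accepted value
        String acceptedValue;    // last accepted value (null = none)
        boolean up = true;       // false simulates a crashed node

        // phase 1b: promise to ignore smaller ballots, report the last accepted value
        Promise prepare(int ballot) {
            if (!up || ballot <= promised) return null;
            promised = ballot;
            return new Promise(acceptedBallot, acceptedValue);
        }

        // phase 2b: accept unless a higher ballot has been promised in the meantime
        boolean accept(int ballot, String value) {
            if (!up || ballot < promised) return false;
            promised = ballot;
            acceptedBallot = ballot;
            acceptedValue = value;
            return true;
        }
    }

    // runs one ballot; returns the chosen value, or null if no quorum was reached
    public static String propose(List<Acceptor> acceptors, int ballot, String ownValue) {
        int quorum = acceptors.size() / 2 + 1;
        List<Promise> promises = new ArrayList<>(); // phase 1a: send prepare(ballot)
        for (Acceptor a : acceptors) {
            Promise p = a.prepare(ballot);
            if (p != null) promises.add(p);
        }
        if (promises.size() < quorum) return null;
        // phase 2a: "forced" -> value of the highest reported ballot; "free" -> own value
        String value = ownValue;
        int highest = -1;
        for (Promise p : promises) {
            if (p.acceptedBallot() > highest) {
                highest = p.acceptedBallot();
                value = p.acceptedValue();
            }
        }
        int accepted = 0; // phase 2b: count acceptors that accepted (ballot, value)
        for (Acceptor a : acceptors) {
            if (a.accept(ballot, value)) accepted++;
        }
        return accepted >= quorum ? value : null; // phase 3: chosen by a quorum
    }

    public static void main(String[] args) {
        List<Acceptor> acceptors = new ArrayList<>();
        for (int i = 0; i < 5; i++) acceptors.add(new Acceptor());
        acceptors.get(3).up = false; // a minority (2 of 5) of the nodes fails
        acceptors.get(4).up = false;
        System.out.println(propose(acceptors, 1, "jogging")); // free ballot: jogging
        System.out.println(propose(acceptors, 2, "surfing")); // forced ballot: still jogging
    }
}
```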

Graph Databases
● graphs allow representing connected information very intuitively using vertices and edges
● useful for current problems like, among others:
  - internet routing
  - contacts in social networks
  - recommender systems
  - fraud detection
  - regulatory networks
  - semantic web
  - ...

Graph Lingo
● graphs are represented by a pair (tuple) of two sets, V (vertices) and E (edges)
● vertices are nodes, each representing a kind of fact
● edges are the connections/relations between vertices and can be directed or undirected
● example: G = (V, E) with V = {1, 2} and E ⊆ V × V, here E = {(1, 2)}

Property Graph Model
● directed, multi-relational graph
● labeled (typed) edges
● vertices and edges have properties
● properties are key/value pairs of type <String, Object>, like Name: Alice or Age: 30

Property Graph Model (continued)
● strong typing of vertices and edges is possible ('Type'/'_Type', depending on the support of the system):
  - useful for semantic meaning
  - supports automatic handling
  - allows the definition of consistency criteria and indices
  - makes partitioning of graphs easier
● bidirectional edges are realized by two unidirectional edges

Property Graph Model (continued)
● multi-edges require different labels
● vertices and edges have a unique identity: 'Id', '_Id'
● used for IDs: integers, strings, URIs
● extension: multi-valued properties, which allow lists or sets of values
● special case of edge property values: edge weights
● another extension: higher-order relations with hyperedges and hypervertices

(figure: example property graph)
● vertex Id 1: Type Person, Name Alice, Age 20
● vertex Id 2: Type Person, Name Bob, Age 25
● vertex Id 3: Type Person, Name Paula, Age 23
● vertex Id 4: Type Group, Name Chess
● three edges with Label: knows, since: 06/2013 between the persons
● an edge with Label: is_member and its reverse edge with Label: has member, both since: 07/2013, between a person and the Chess group
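A property graph like the one in the figure can be modelled with two small record types. This is a hypothetical sketch, not the Blueprints or Neo4j API. The edge endpoints used in `main` are assumptions, because the figure does not say which persons are connected. The membership relation is stored as two unidirectional edges, as described for bidirectional edges.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal property-graph sketch: directed, labeled edges; vertices and edges
// carry a unique id and key/value properties (hypothetical classes).
public class PropertyGraph {
    public record Vertex(int id, Map<String, Object> props) {}
    public record Edge(int id, Vertex out, Vertex in, String label, Map<String, Object> props) {}

    private final Map<Integer, Vertex> vertices = new HashMap<>();
    private final List<Edge> edges = new ArrayList<>();

    public Vertex addVertex(int id, Map<String, Object> props) {
        Vertex v = new Vertex(id, new HashMap<>(props));
        vertices.put(id, v);
        return v;
    }

    public Edge addEdge(int id, Vertex out, Vertex in, String label, Map<String, Object> props) {
        Edge e = new Edge(id, out, in, label, new HashMap<>(props));
        edges.add(e);
        return e;
    }

    // vertices reachable from v over outgoing edges with the given label
    public List<Vertex> out(Vertex v, String label) {
        List<Vertex> result = new ArrayList<>();
        for (Edge e : edges) {
            if (e.out() == v && e.label().equals(label)) result.add(e.in());
        }
        return result;
    }

    public static void main(String[] args) {
        PropertyGraph g = new PropertyGraph();
        Vertex alice = g.addVertex(1, Map.of("Type", "Person", "Name", "Alice", "Age", 20));
        Vertex bob = g.addVertex(2, Map.of("Type", "Person", "Name", "Bob", "Age", 25));
        Vertex chess = g.addVertex(4, Map.of("Type", "Group", "Name", "Chess"));
        // hypothetical endpoints and edge ids (not given in the figure)
        g.addEdge(10, alice, bob, "knows", Map.of("since", "06/2013"));
        g.addEdge(11, alice, chess, "is_member", Map.of("since", "07/2013"));
        g.addEdge(12, chess, alice, "has_member", Map.of("since", "07/2013"));
        for (Vertex v : g.out(alice, "knows")) System.out.println(v.props().get("Name")); // Bob
    }
}
```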

Property Graph Model / Extensions
● higher-order relations with hyperedges and hypernodes:
  - hyperedge: connects more than two nodes
  - hypervertex: combination of a set of vertices/nodes, keeps internal edges
● paths: sequence of edges
● subgraphs: a defined combination of nodes and edges into a single node
● version information allows representing the graph evolution and/or concurrency

Graph Representations
● different representations are available for persistence and memory
● it is difficult to achieve good persistence performance and good support for a variety of graph algorithms at the same time

Adjacency Matrix
● square matrix/table
● all n nodes are listed horizontally and vertically
● if an edge exists between nodes u and v, there is an entry in the table at position [u, v]
● testing whether two nodes u and v are connected is very fast

Adjacency Matrix / Problems
● disadvantage: huge space consumption (n × n entries), even for sparse matrices, i.e. graphs with many nodes but only few edges
● it is difficult to identify the connecting edges for a given node
● to identify neighbors you always have to read a complete row or column
● hypergraphs cannot be represented

Incidence Matrix
● a matrix with nodes on one axis and edges on the other axis
● much more space-efficient than the adjacency matrix for very weakly connected graphs
● in more strongly connected graphs it needs more space than the adjacency matrix
● can represent hypergraphs

Adjacency List
● extension of the edge list
● edges are sorted according to their start node
● for every node the connecting edges are stored
● time consumption depends only on the connectivity of the node, not on the size of the complete graph

Edge List
● nodes and edges are stored separately
● insertion and deletion of single edges is very efficient
● identifying the connecting edges of a given node is inefficient, since the whole edge list has to be searched

Example Graph
(figure: graph with vertices v1-v4 and edges e1-e6)
Adjacency matrix:
        v1  v2  v3  v4
  v1     0   0   0   0
  v2     1   1   1   0
  v3     2   1   0   0
  v4     0   1   0   0
Incidence matrix:
        e1  e2  e3  e4  e5  e6
  v1     1   1   1   0   0   0
  v2     0   0  -1   2   1   1
  v3    -1  -1   0   0   0   1
  v4     0   0   0   0  -1   0
Adjacency list:
  v1: v2, v3, v3
  v2: v2, v3, v4
  v3: v2
  v4: (none)
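The three representations can be derived from one edge list. The sketch assumes a reading of the incidence matrix that the figure does not state explicitly: +1 marks the start vertex, -1 the end vertex, 2 a self-loop, and e6 (+1 at both v2 and v3) is taken as undirected and stored in both directions. With this reading the computed adjacency list matches the one above; the code indexes its matrix as [start][end], which under this reading is the transpose of the table shown.

```java
import java.util.ArrayList;
import java.util.List;

// Builds an adjacency matrix and an adjacency list from one edge list (vertices 1..4).
public class GraphRepresentations {
    // edge list {start, end}: e1, e2, e3, e4, e5, e6 (both directions)
    static final int[][] EDGES = {{1, 3}, {1, 3}, {1, 2}, {2, 2}, {2, 4}, {2, 3}, {3, 2}};
    static final int N = 4;

    // adjacency matrix: m[u][v] = number of edges u -> v (N*N entries, O(1) lookup)
    public static int[][] adjacencyMatrix() {
        int[][] m = new int[N + 1][N + 1]; // index 0 unused so that v1 = index 1
        for (int[] e : EDGES) m[e[0]][e[1]]++;
        return m;
    }

    // adjacency list: for every start vertex the (sorted) list of end vertices
    public static List<List<Integer>> adjacencyList() {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i <= N; i++) adj.add(new ArrayList<>());
        for (int[] e : EDGES) adj.get(e[0]).add(e[1]);
        for (List<Integer> l : adj) l.sort(null);
        return adj;
    }

    // edge list lookup: the whole list must be scanned to find the neighbors of v
    public static List<Integer> neighborsFromEdgeList(int v) {
        List<Integer> result = new ArrayList<>();
        for (int[] e : EDGES) {
            if (e[0] == v) result.add(e[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> adj = adjacencyList();
        for (int v = 1; v <= N; v++) System.out.println("v" + v + ": " + adj.get(v));
        System.out.println("edges v1 -> v3: " + adjacencyMatrix()[1][3]); // 2
        System.out.println("neighbors of v2 via edge list: " + neighborsFromEdgeList(2));
    }
}
```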

Graph Traversal
● either a partial or a complete visit of the nodes
● three strategies:
  - breadth-first / depth-first
  - algorithmic traversals
  - random-based
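The breadth-first and depth-first strategies differ only in the order in which unvisited neighbors are expanded. A minimal sketch on a small, made-up "knows" graph stored as an adjacency list (class name hypothetical): BFS uses a queue, DFS uses recursion, and a "seen" set ensures every node is visited only once.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Breadth-first and depth-first traversal over an adjacency list.
public class Traversal {
    // breadth-first: visit all neighbors before going deeper (queue)
    public static List<String> bfs(Map<String, List<String>> g, String start) {
        List<String> order = new ArrayList<>();
        Set<String> seen = new HashSet<>(List.of(start));
        Deque<String> queue = new ArrayDeque<>(List.of(start));
        while (!queue.isEmpty()) {
            String v = queue.poll();
            order.add(v);
            for (String w : g.getOrDefault(v, List.of())) {
                if (seen.add(w)) queue.add(w); // enqueue only nodes not seen yet
            }
        }
        return order;
    }

    // depth-first: follow one branch as deep as possible first (recursion)
    public static List<String> dfs(Map<String, List<String>> g, String start) {
        List<String> order = new ArrayList<>();
        dfs(g, start, new HashSet<>(), order);
        return order;
    }

    private static void dfs(Map<String, List<String>> g, String v, Set<String> seen, List<String> order) {
        if (!seen.add(v)) return; // uniqueness: each node is visited once
        order.add(v);
        for (String w : g.getOrDefault(v, List.of())) dfs(g, w, seen, order);
    }

    public static void main(String[] args) {
        Map<String, List<String>> knows = Map.of( // hypothetical example data
            "Alice", List.of("Bob", "Carol"),
            "Bob", List.of("Dave"),
            "Carol", List.of("Bob", "Eve"));
        System.out.println("BFS: " + bfs(knows, "Alice")); // [Alice, Bob, Carol, Dave, Eve]
        System.out.println("DFS: " + dfs(knows, "Alice")); // [Alice, Bob, Dave, Carol, Eve]
    }
}
```

Pruning and filtering, as offered by graph databases, would hook into exactly these loops: pruning decides whether to expand a node's neighbors, filtering decides whether to add a node to `order`.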

Graph Indexing and Partitioning
● graph indexes are first-class citizens
● they can be inserted as sub-graphs and attached to specific nodes as specific information
● if a graph gets too big, it can be split into partial graphs
● optimal partitioning is highly domain- and semantics-dependent -> no good standard solution

TinkerPop Graph Processing Stack
● attempt to provide uniform interfaces for property-graph-based systems
● hides the backend database from the application developer
● consists of several sub-projects:
  - Blueprints: Java interface for property graph models – no persistence of its own
  - supports transactions

Graph Creation
// Create a new graph with Neo4j persistence
Graph graph = new Neo4jGraph("/tmp/my_graph");

// Create vertices with the ids "Alice", "Bob" and "Carol"
Vertex alice = graph.addVertex("Alice");
Vertex bob = graph.addVertex("Bob");
Vertex carol = graph.addVertex("Carol");

// Add the names and ages as properties ("Alter" = age)
alice.setProperty("Name", "Alice");
alice.setProperty("Alter", 18);
bob.setProperty("Name", "Bob");
bob.setProperty("Alter", 22);
carol.setProperty("Name", "Carol");
carol.setProperty("Alter", 20);

// Create the corresponding edges ("kennt" = knows)...
Edge e1 = graph.addEdge("e1", alice, bob, "kennt");
Edge e2 = graph.addEdge("e2", alice, carol, "kennt");
Edge e3 = graph.addEdge("e3", carol, bob, "kennt");

// ...and set the edge property "seit" (since)
e1.setProperty("seit", "2001/10/03");
e2.setProperty("seit", "2003/12/04");
e3.setProperty("seit", "2001/07/12");

graph.shutdown();

taken from Stefan Edlich et al. "NoSQL", 2nd edition, Hanser Verlag (2011)

Set<String> indexKeys = new HashSet<String>();
indexKeys.add("Name");

// Index the given property keys of all vertices
AutomaticIndex index = graph.createAutomaticIndex("IndexOfName", Vertex.class, indexKeys);

// Vertices that already exist have to be re-indexed
AutomaticIndexHelper.reIndexElements(index, graph.getVertices());

// Iterate over the results of the index query
for (Vertex vertex : index.get("Name", "Alice")) {
    System.out.println("Vertex: " + vertex);
}

taken from Stefan Edlich et al. "NoSQL", 2nd edition, Hanser Verlag (2011)

Graph Query Languages
● no common standard yet
● pattern-based: SPARQL (RDF query language)
● navigation-based: Gremlin, sones GQL
● logic-based: OWL, GraphLog

Neo4j
● one of the oldest NoSQL graph databases (2003)
● full ACID support
● uses its own format to store graphs on disk
● Apache Lucene used for indexing
● can run as a server as well as embedded

Integration with Java
● easiest integration using Maven (adding the Neo4j dependency to the pom.xml file), then:
GraphDatabaseService graphdb = new EmbeddedGraphDatabase("/var/graphdb");

Graph Creation
// relationship types are defined by an enum implementing RelationshipType
enum Rel implements RelationshipType { knows }

Transaction tx = graphdb.beginTx();
try {
    Node Alice = graphdb.createNode();
    Node Bob = graphdb.createNode();
    Node Carol = graphdb.createNode();
    Alice.setProperty("Name", "Alice");
    Bob.setProperty("Name", "Bob");
    Carol.setProperty("Name", "Carol");
    Alice.setProperty("Age", 18);
    Bob.setProperty("Age", 20);
    Carol.setProperty("Age", 22);

    Relationship Alice_Bob = Alice.createRelationshipTo(Bob, Rel.knows);
    Relationship Alice_Carol = Alice.createRelationshipTo(Carol, Rel.knows);
    Relationship Carol_Bob = Carol.createRelationshipTo(Bob, Rel.knows);

    // the "since" values are not given on the slide
    Alice_Bob.setProperty("since", …);
    Alice_Carol.setProperty("since", …);
    Carol_Bob.setProperty("since", …);
    tx.success();
} catch (Exception e) {
    tx.failure();
} finally {
    tx.finish();
}

Manual Indexing
IndexManager index = graphdb.index();
Index<Node> UserIdx = index.forNodes("User");
RelationshipIndex KnowsIdx = index.forRelationships("knows");
UserIdx.add(Alice, "Name", Alice.getProperty("Name"));
UserIdx.add(Alice, "Age", Alice.getProperty("Age"));
[...]

Traversal Configuration
● besides simple traversals and wildcard searches, there are a number of sophisticated tweaks:
  - Order: determines the branching order (DFS/BFS)
  - Uniqueness: how to handle multiple hits of the same node
  - Pruning: which branches not to follow
  - Filtering: which hits are considered for the result
  - Relationship expanding: dedicated edge handling

Example Traversal
TraversalDescription td = new TraversalDescriptionImpl();
td = td.prune(Traversal.pruneAfterDepth(2))
       .filter(Traversal.returnAllButStartNode())
       .relationships(KNOWS);

Traverser tr = td.traverse(startNode);
for (Path path : tr) {
    System.out.println("End Node: " + path.endNode().getProperty(NodeProperty.NAME));
}

Cypher
● Neo4j's own graph query language since version 1.4:
  - developed for pattern matching
  - declarative
  - implemented in Scala -> enables parallel execution
● query structure:
  - starts with a set of nodes
  - match statement (nodes in (), edges ->)
  - return statement with optional where or sort

Examples
// start nodes via IDs
start Person = (1, 2)
match (Person)-[:knows]->(Friend)
where Friend.Age > 18
return Friend.Name, Friend.Age, Friend.City?
sort by Friend.Name

// start nodes via index query
start Person = (Person-index, Name, "Alice")
match (Person)-[:knows]->()-[:knows]->(FriendofFriend)
where not(FriendofFriend.Age < 17)
return FriendofFriend.Name

Interfacing NoSQL
● specific APIs vary heavily
● most support a RESTful interface:
  - REpresentational State Transfer
  - architecture for web applications
  - predominantly implemented using the HTTP protocol
  - described by Roy Thomas Fielding: "Architectural Styles and the Design of Network-based Software Architectures", Dissertation, UC Irvine, 2000

CRUD
● minimum set of access functions:
  - Create, Read, Update, Delete

CRUD     SQL      HTTP
Create   INSERT   POST
Read     SELECT   GET
Update   UPDATE   PUT
Delete   DELETE   DELETE

Components
● Resources, Operations, Links
● each request is independent, i.e. it has no state -> no need for synchronization
● abstract view of the HTTP protocol: nouns and verbs – each request is defined by the application of a verb to a noun, with an optional response
● a request is composed of a header with a method and metadata in key/value format, and an optional body
● a response is like a request but without a method

Resources
● addressable end point of the system
  - e.g. an HTML document, a video, a process
● a resource is abstract and can have more than one representation
● the user always interacts with a representation (HTML, a graphics format, XML, ...) and may choose the desired one

Operations
● HTTP defines a set of operations with known semantics:
  - GET
  - HEAD
  - PUT
  - POST
  - DELETE

Characteristics of Operations
● operations can be classified according to the criteria safe and idempotent, which are important for the system's integrity and caching performance
● safe: no side effects, no responsibility for the user
● idempotent: side effects, but only the first time – upon multiple executions the server state does not change anymore

GET/HEAD
● safe and idempotent
● HEAD: returns only meta information about the resource
● GET: contains, in addition to the meta information, a representation of the resource
● a non-conforming example (a GET request with a side effect): "http://www.example.com/api?action=delete"

PUT
● idempotent
● the referenced resource representation is transmitted to the server (side effect -> not safe)
● only the first execution changes the state of the server
● this can be achieved if the server maintains version numbers for a document, which have to be matched by the request

PUT – Simple Example
● Q (request): GET doc
● R (response): return doc v=1, doc content
● Q: PUT doc v=1, doc content modified
● R: request v=1 matches server v=1; modified doc content stored; version updated to v=2
● Q (a second time): PUT doc v=1, doc content modified (maybe again)
● R: request v=1 does not match server v=2; doc content not stored
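The exchange above can be simulated with an in-memory store. This hypothetical sketch is not an HTTP server: `put` only succeeds when the client's version matches the server's, so repeating the same request leaves the server state unchanged, and `delete` is idempotent in the same sense. In real HTTP APIs this pattern is commonly expressed with `ETag` and `If-Match` headers, where a mismatch is answered with `412 Precondition Failed`.

```java
import java.util.HashMap;
import java.util.Map;

// Toy resource store: version numbers make a repeated PUT idempotent.
public class VersionedDocuments {
    public record Doc(int version, String content) {}

    private final Map<String, Doc> docs = new HashMap<>();

    // GET: safe, returns the current representation including its version
    public Doc get(String id) { return docs.get(id); }

    // PUT: accepted only if the client's version matches the server's version
    public boolean put(String id, int clientVersion, String content) {
        Doc current = docs.get(id);
        int serverVersion = current == null ? 0 : current.version();
        if (clientVersion != serverVersion) return false; // stale request: state unchanged
        docs.put(id, new Doc(serverVersion + 1, content));
        return true;
    }

    // DELETE: idempotent, a second call finds nothing and changes nothing
    public boolean delete(String id) { return docs.remove(id) != null; }

    public static void main(String[] args) {
        VersionedDocuments server = new VersionedDocuments();
        server.put("doc", 0, "original");                               // created with v=1
        Doc d = server.get("doc");                                      // GET doc -> v=1
        System.out.println(server.put("doc", d.version(), "modified")); // true, server now v=2
        System.out.println(server.put("doc", d.version(), "modified")); // false: v=1 != v=2
        System.out.println(server.get("doc"));                          // Doc[version=2, content=modified]
    }
}
```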

DELETE
● idempotent
  - once the resource is removed, all subsequent requests fail -> the server state remains the same
● not safe
● the referenced resource is removed from the server / access is blocked

POST
● no guarantees at all (neither safe nor idempotent)
● transmits data for processing
● the processing result can be used to create a new resource, to modify an existing one, or not at all
● can be used for very complex queries because all parameters can be included in the body – GET has to include them in the URI

LINKS
● HTTP does not represent links
● links are modeled as URIs
● their encoding depends on the type of representation
● links can contain metadata to help the user choose the appropriate resource

Example (Stefan Edlich et al., "NoSQL", 2nd edition, Hanser Verlag (2011))
POST /api/ HTTP/1.1
Host: cocktails.example.com
Content-Type: application/json
…
{
  "name" : "Ipanema",
  "description" : "A non-alcoholic variant for a caipirinha evening",
  "ingredients" : {
    "Lime" : { "amount" : 1, "preparation" : "cut into eighths" },
    "Brown sugar" : { "amount" : 2, "unit" : "tsp" },
    …
  },
  "preparation" : "Muddle limes and sugar in a glass, cover with crushed ice and top up with the liquids. Serve with a straw"
}

HTTP/1.1 201 Created
Content-Type: application/json
Location: http://cocktails.example.com/cocktails/1
…

{ "id" : "1" }

GET /cocktails/1 HTTP/1.1
Host: cocktails.example.com
…

{
  "id" : "1",
  "name" : "Ipanema",
  "description" : "A non-alcoholic variant for a caipirinha evening",
  "ingredients" : {
    "Lime" : { "amount" : 1, "preparation" : "cut into eighths" },
    "Brown sugar" : { "amount" : 2, "unit" : "tsp" },
    …
  },
  "preparation" : "Muddle limes and sugar in a glass, cover with crushed ice and top up with the liquids. Serve with a straw",
  "links" : {
    "linktypes/publish" : "http://cocktails.example.com/publish/1",
    "linktypes/edit" : "http://cocktails.example.com/cocktails/1",
    "linktypes/delete" : "http://cocktails.example.com/cocktails/1"
  }
}

PUT /cocktails/1 HTTP/1.1
Host: cocktails.example.com
Content-Type: application/json
…
{ … "tags" : [ "non-alcoholic", "ice" ], … }

DELETE /cocktails/1 HTTP/1.1
Host: cocktails.example.com

POST /publish/1 HTTP/1.1
Host: cocktails.example.com
Content-Type: application/json
…
{ "publish" : true }

{
  "id" : "1",
  …
  "links" : {
    "linktypes/delete" : "http://cocktails.example.com/cocktails/1",
    "linktypes/ratings" : "http://cocktails.example.com/ratings/1"
  }
}

Document Stores
● originate with Damien Katz, Lotus Notes and CouchDB
● the responsibility for the schema is moved from the database to the application:
  - loss of enforced normalization and referential integrity
  - gain of flexibility and schema modifications at run time for the application
● data mostly stored as JSON

MongoDB
● document store
● tries to close the gap between classic RDBMS and key/value stores
● supported by a number of successful internet companies (10gen, ...)
● good integration with programming languages: C++, C#, Java, JavaScript, PHP, Ruby, Perl, Python

JSON Example
{
  "id" : "1",
  "name" : "Ipanema",
  "description" : "A non-alcoholic variant for a caipirinha evening",
  "ingredients" : {
    "Lime" : { "amount" : 1, "preparation" : "cut into eighths" },
    "Brown sugar" : { "amount" : 2, "unit" : "tsp" },
    …
  },
  "preparation" : "Muddle limes and sugar in a glass, cover with crushed ice and top up with the liquids. Serve with a straw",
  "links" : {
    "linktypes/publish" : "http://cocktails.example.com/publish/1",
    "linktypes/edit" : "http://cocktails.example.com/cocktails/1",
    "linktypes/delete" : "http://cocktails.example.com/cocktails/1"
  }
}

JSON in MongoDB
● each document needs a special ID field: _id
● the _id values have to be unique
● they can be anything
● automatic default: an automatically generated 12-byte number (ObjectId) consisting of
  - 4-byte timestamp
  - 3-byte client machine ID
  - 2-byte process ID
  - 3-byte counter
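The 12-byte layout can be assembled with `java.nio.ByteBuffer`. This sketch illustrates the layout listed above and is not the MongoDB driver's implementation; the machine and process ids in `main` are made-up values. Newer MongoDB versions replace the machine and process ids with a 5-byte random value, while the leading 4-byte timestamp and trailing 3-byte counter remain.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the classic 12-byte ObjectId layout (illustration only).
public class ObjectIdSketch {
    // real drivers start the counter at a random value; 0 keeps the example simple
    private static final AtomicInteger COUNTER = new AtomicInteger();

    public static byte[] newId(int unixSeconds, int machineId, int processId) {
        ByteBuffer buf = ByteBuffer.allocate(12);                 // big-endian by default
        buf.putInt(unixSeconds);                                  // 4-byte timestamp
        buf.put((byte) (machineId >>> 16)).put((byte) (machineId >>> 8)).put((byte) machineId); // 3-byte machine id
        buf.putShort((short) processId);                          // 2-byte process id
        int c = COUNTER.getAndIncrement() & 0xFFFFFF;             // 3-byte counter (wraps around)
        buf.put((byte) (c >>> 16)).put((byte) (c >>> 8)).put((byte) c);
        return buf.array();
    }

    // the usual textual form: 24 hexadecimal characters
    public static String toHex(byte[] id) {
        StringBuilder sb = new StringBuilder();
        for (byte b : id) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    public static void main(String[] args) {
        int now = (int) (System.currentTimeMillis() / 1000);
        System.out.println(toHex(newId(now, 0xABCDEF, 4242))); // made-up machine/process ids
    }
}
```

Because the timestamp comes first and is stored big-endian, ObjectIds created later sort after earlier ones (at one-second granularity), which is one reason the creation time can be read back from an `_id`.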

Demo
● check out the command-line and Python tutorial at: http://api.mongodb.com/python/current/tutorial.html
● get a toy MongoDB server for free at: https://mlab.com