Bioinformatics Resources - NoSQL 2

BioinfRes SoSe 17
Bioinformatics Resources - NoSQL 2 -
Lecture & Exercises
Prof. B. Rost, Dr. L. Richter, J. Reeb
Institut für Informatik I12

Preliminary Schedule

Apr. 28th  Intro, General Overview (1. sh.)
May 5th    Sequence Databases (2. sh.)
May 12th   Sequence Databases (3. sh.)
May 19th   Structure Databases (4. sh.)
May 26th   No Lecture
Jun 2nd    SQL (5. sh.)
Jun 9th    SQL, NoSql (6. sh.)
Jun 16th   No Lecture
Jun 23rd   NoSql 2 (7. sh.)
Jun 30th   MongoDB, JavaScript (8. sh.)
Jul 7th    PredictProtein (9. sh.)
Jul 14th   JavaScript/Node.js Applications
Jul 21st   Wrap Up, Q&A
Jul 28th   Exam

* These exercises can earn you a bonus

Evaluation

●  Lectures are evaluated between June 19th and 30th
●  Please take 15 min to complete the survey
●  The necessary information was sent to the students registered for the lecture
●  This lecture is: 0000002112 Bioinformatische Ressourcen (IN2321), Lecturers: (Dr. Richter, M.Sc. Reeb)

Orga - Exam Date

●  Exam scheduled for Friday, Jul 28th
●  Time: 16:30 - 18:00
●  Room: MW 0350 Egbert-von-Hoyer Lecture Hall (Mechanical Engineering Building)
●  Registration is MANDATORY
●  so far 13 students registered

Short SQL Recap

●  schema
●  typed data
●  tables
●  defined layout
●  space consumption is computable

Short SQL Recap

●  well defined theory
●  relational algebra
●  ACID principle
●  standardized query language
●  fast access with indices
●  well supported by software vendors

NoSQL

●  in principle known for a long time
●  Ken Thompson 1978: Key/Value system
●  big push in 2000: Web 2.0
●  Map/Reduce, BigTable databases
●  data volume in the range of TB and PB
●  growing relational databases becomes more and more difficult on commodity hardware
●  http://www.w3resource.com/mongodb/nosql.php

Definition

●  non-relational data model
●  enables distributed and horizontal scalability
●  open source
●  no or simple schema
●  support for simple data replication
●  simple API
●  different consistency model

Issues with Relational DB

●  if the schema is bad, so is the query
●  queries are based on strings and susceptible to typos
●  errors are not detected at compile time
●  queries cannot be refactored

Categories of NoSQL Systems

●  Wide Column Stores / Column Family Systems
●  Document Stores
●  Key/Value / Tuple Stores
●  Graph Databases

Key/Value Systems

●  at least very simple schema: key and value
●  keys can be grouped in namespaces and databases
●  values can be complex; besides simple strings there are:
   -  hashes
   -  sets
   -  lists
●  queries mostly limited to API

Column Family

●  keys can point to an arbitrary number of key/value pairs
●  nested key/value pairs
●  nested columns

Document Stores

●  do not work on "actual" documents
●  work on structured data like:
   -  JSON
   -  YAML
   -  RDF

Graph Databases

●  based on graph or tree structures to connect elements
●  property graph:
   -  nodes reflect items
   -  edges reflect relations
●  very suitable for traversing

Theoretical Concepts

●  Map/Reduce
●  CAP-Theorem / Eventually Consistent
●  Consistent Hashing
●  MVCC-Protocol
●  Vector Clock
●  Paxos
●  REST

Map/Reduce

●  requires a (map/reduce) framework
●  designed for efficient handling of data in the order of tera- or petabytes
●  developed by Google
●  patented since 2010

Map/Reduce Details

●  originates from functional programming
●  parallel processing
●  no side effects
●  no deadlocks
●  no race conditions
●  initial data structure is not altered
●  new copy with every level

Map/Reduce Details

●  functions like in math:
   -  a set of transformation definitions
   -  no control structures
   -  recursion
   -  functions can be used as argument or return value: higher order functions
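
To make "functions as argument or return value" concrete, the following is a minimal Java sketch. The class and method names (`HigherOrderSketch`, `applyToAll`, `times`) are made up for illustration and do not come from the lecture.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class HigherOrderSketch {
    // Takes a function as an argument and applies it to every element.
    // The input list is not modified; a new list is returned.
    static <A, B> List<B> applyToAll(List<A> input, Function<A, B> f) {
        return input.stream().map(f).collect(Collectors.toList());
    }

    // Returns a function as its result: a multiplier fixed to the given factor.
    static Function<Integer, Integer> times(int factor) {
        return x -> x * factor;
    }

    public static void main(String[] args) {
        List<Integer> original = List.of(1, 2, 3);
        List<Integer> doubled = applyToAll(original, times(2));
        System.out.println(original + " -> " + doubled);
    }
}
```

Because `applyToAll` has no side effects, independent elements could be processed in parallel without locks, which is the property the map phase relies on.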

Map/Reduce Details

●  two functions: map, reduce/fold
●  used alternating (two phase approach)
●  map (in parallel):
   -  applied to all elements of list
   -  returns a modified list
●  reduce:
   -  aggregate the return values from map into one result

Map/Reduce Details

●  user has to provide:
   -  map function
   -  reduce function
●  framework provides:
   -  automatic parallelization and distribution
   -  fault tolerance mechanisms for hard- and software failure
   -  I/O scheduling
   -  status and control information

Pseudocode Example

    map(String key, String value):
        // key: document name
        // value: document contents
        for each word w in value:
            EmitIntermediate(w, "1");

    reduce(String key, Iterator values):
        // key: a word
        // values: a list of counts
        int result = 0;
        for each v in values:
            result += ParseInt(v);
        Emit(AsString(result));
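
The pseudocode above can be tried out on a single machine with a small Java sketch. The `run` method is only a stand-in for the framework (it does the grouping by key that a real map/reduce system performs across nodes); all names here are assumptions for illustration, not an actual framework API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordCountSketch {
    // map phase: one document in, a list of (word, 1) pairs out
    static List<Map.Entry<String, Integer>> map(String documentName, String contents) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String w : contents.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                pairs.add(new SimpleEntry<>(w, 1));
            }
        }
        return pairs;
    }

    // reduce phase: all counts emitted for one word in, their sum out
    static int reduce(String word, List<Integer> counts) {
        int result = 0;
        for (int c : counts) {
            result += c;
        }
        return result;
    }

    // Hypothetical stand-in for the framework: runs map per document,
    // groups the intermediate pairs by key, then runs reduce per key.
    static Map<String, Integer> run(Map<String, String> documents) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        documents.forEach((name, text) -> {
            for (Map.Entry<String, Integer> p : map(name, text)) {
                grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
            }
        });
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((word, counts) -> result.put(word, reduce(word, counts)));
        return result;
    }
}
```

Calling `WordCountSketch.run` with a map from document names to contents returns the word counts. In a real system the calls to `map` run in parallel on the nodes holding the data.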

Characteristics of a Map/Reduce System

●  commodity hardware
●  Ethernet network
●  large number of nodes (>100)
●  distributed file system, data is stored in chunks and redundantly
●  data are local to processing node

CAP and Eventually Consistent

●  horizontal scaling of relational databases is insufficient:
   -  too much time needed to extend the database to more computers
   -  frequent modification of source code required
●  mostly due to the implementation of the ACID principle

CAP Theorem

●  Consistency, availability and partition tolerance cannot all be completely satisfied at the same time
●  only two of these criteria can be satisfied at the same time; here, availability and partition tolerance are the important combination
●  consistency is reduced

Consistency

●  after a transaction the database is consistent, i.e.
   -  all replicating nodes of the database system have the same state after a transaction; changes are propagated to all nodes
   -  read access to any node returns the same result
   -  this requires waiting for the completion of the propagation

Availability

●  acceptable response time
●  depends on the specific business case
●  a certain response time is guaranteed up to a specified load level

Partition Tolerance

●  if a node or a connection fails, the system remains responsive
●  in large computer centers those failures are frequent

BASE Consistency Model

●  Basically available
●  Soft state
●  Eventually consistent

Characteristics

●  focus on availability
●  consistency is less important
●  BASE is optimistic about consistency and defines it as a transition process and not as a defined state after a transaction -> Eventual Consistency
●  consistent at some point in time
●  interpretation differs between systems

Levels of Consistency

●  Causal Consistency
●  Read-your-write Consistency
●  Session Consistency
●  Monotonic Read Consistency
●  Monotonic Write Consistency

Consistent Hashing

●  belongs to the family of hashing functions
●  maps elements of a (potentially) very large source set to a hash value from a typically much smaller value set
●  advantage: constant time
●  applications:
   -  checksums
   -  securing against manipulations
   -  fast search in data structures

Consistent Hashing

●  here: find a constant memory location for an object
●  minimize object movements on addition or removal of nodes
●  minimize object movements upon insertions
●  distribute equally among resources
●  circular hash space
●  servers and data objects are integrated (clockwise)
●  upon insertion or removal only neighbors are affected
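
The circular hash space can be sketched with a sorted map in Java. This is a minimal illustration under assumptions: the class name, the `position` hash (String.hashCode with an arbitrary mixing constant) and the lack of virtual nodes are choices made here, not part of the lecture.

```java
import java.util.Map;
import java.util.TreeMap;

class ConsistentHashRing {
    // Ring positions in ascending order, each mapped to the server placed there.
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    // Hypothetical hash onto the ring: any function that spreads keys would do.
    static int position(String key) {
        return (key.hashCode() * 0x9E3779B1) & 0x7fffffff;
    }

    void addServer(String server) {
        ring.put(position(server), server);
    }

    void removeServer(String server) {
        ring.remove(position(server));
    }

    // Walk clockwise from the object's position to the next server,
    // wrapping around to the first server at the end of the ring.
    String serverFor(String objectKey) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no servers on the ring");
        }
        Map.Entry<Integer, String> next = ring.ceilingEntry(position(objectKey));
        return next != null ? next.getValue() : ring.firstEntry().getValue();
    }
}
```

When a server is removed, only the objects that were stored on it move (to its clockwise neighbor); all other assignments stay as they were.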

Multiversion Concurrency Control (MVCC)

●  data objects are versioned
●  represents change timeline
●  every write access creates a new version
●  each new version contains a reference to the version that preceded it
●  conflict resolution through explicit version comparison

Multiversion Concurrency Control (MVCC)

●  disadvantages of conventional locks:
   -  complete tables are locked
   -  inefficient if communication time is high because of long cache pipeline or network traffic
   -  not 100% guaranteed in distributed systems
   -  parallel accesses are blocked

MVCC – No Conflict

[Figure: timeline t0 to t1 for Alice and Bob. Alice reads v0 and, in transaction TxAlice, writes v0 -> v1, so vlatest changes from v0 to v1. Bob reads v0 before the write and v1 afterwards.]

MVCC – Conflict Case

[Figure: timeline t0 to t3 for Alice and Bob. Both read v0. Transaction TxAlice writes v0 -> v1a, so vlatest changes from v0 to v1a. Transaction TxBob then tries to write v0 -> v1b. Conflict! vlatest != v0]
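
The conflict check in the figure (a write is rejected when the version it was based on is no longer the latest) can be sketched in Java as follows. `VersionedRecord` is a hypothetical in-memory class for illustration; real MVCC stores differ in how they keep and garbage-collect versions.

```java
import java.util.ArrayList;
import java.util.List;

class VersionedRecord {
    // All versions are kept; the list index serves as the version number.
    private final List<String> versions = new ArrayList<>();

    VersionedRecord(String initialValue) {
        versions.add(initialValue);
    }

    synchronized int latestVersion() {
        return versions.size() - 1;
    }

    // Readers are never blocked: any existing version can be read at any time.
    synchronized String read(int version) {
        return versions.get(version);
    }

    // A writer states which version it read. The write succeeds only if that
    // version is still the latest; otherwise the caller has to resolve the conflict.
    synchronized boolean write(int basedOnVersion, String newValue) {
        if (basedOnVersion != latestVersion()) {
            return false; // conflict: vlatest != version the writer started from
        }
        versions.add(newValue);
        return true;
    }
}
```

In the conflict case, Alice and Bob both start from version 0. Alice's write succeeds and creates version 1, while Bob's later write based on version 0 returns `false`.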

Vector Clocks

●  challenge:
   -  many instances write data
   -  they have to be synchronized and ordered afterwards
●  solution: Vector Clocks
   -  originated in the field of operating systems
   -  Leslie Lamport (1978) describes Timestamps/Clocks

Lamport Timestamps / Criteria

●  weak consistency criterion: if event e1 causes event e2, then the timestamp of e1 has to be smaller than the timestamp of e2
●  strong consistency criterion (the converse): if the timestamp of e1 is smaller than the one of e2, then event e1 has been the cause for event e2
●  events can be sorted in a partial order
   -  every event gets a timestamp which does not reflect real time
   -  monotonically increasing integer
●  Timestamps fulfill only the weak criterion
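
A minimal Java sketch of such a logical clock is shown below. The class and method names are chosen here for illustration; the update rule (take the maximum of the local and the received timestamp, then add one) is the usual way to satisfy the weak criterion.

```java
class LamportClock {
    private long time = 0;

    // Local event or message send: advance the counter and use it as timestamp.
    synchronized long tick() {
        time = time + 1;
        return time;
    }

    // Message receipt: jump past the sender's timestamp so that the
    // timestamp of the cause is smaller than the timestamp of the effect.
    synchronized long receive(long senderTimestamp) {
        time = Math.max(time, senderTimestamp) + 1;
        return time;
    }
}
```

If process p sends a message with timestamp `p.tick()` and process q handles it with `q.receive(...)`, the receive event always gets the larger timestamp. The converse does not hold: an unrelated third process may produce a smaller timestamp without having caused anything, which is why only the weak criterion is fulfilled.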

Version Vector / Vector Clock

●  Version Vector: vector (tuple) of values/timestamps of an object
●  Vector Clock:
   -  each process/database has a counter which is incremented
   -  every process remembers the sender and the timestamp
   -  every message/version has a vector of id-timestamp pairs attached

Vector Clocks in NoSQL

●  so the Vector Clock is a list of ID x Time tuples
●  this enables the client to sort and figure out the different versions if multiple clients update and replicate records at the same time
●  we demonstrate this with a simple example:
   -  three people, denoted by their initials, want to agree on a sports activity

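[Figure: message exchange between Laura, Paul and Anna with the attached vector clocks: jogging [L:1]; surfing [P:1, L:1]; jogging [A:1, L:1]; surfing [L:2, P:1]; at Paul: surfing [L:2, P:1, A:0] versus jogging [A:1, L:1, P:0]; finally surfing [P:2, A:1, L:2]]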

Story to the Example

●  Laura, Anna and Paul (standing in for nodes) want to agree on sports (have consistent data)
   -  nodes can request the current version of a record and they can update each other
   -  simultaneous broadcast creates confusion
   -  goal: consistent information (consensus protocols)

Story to the Example / Solution

●  Laura starts, suggesting to go jogging: jogging, [L:1] (jogging is the data to store, L:1 the Vector Clock) and sends/replicates this to Anna and Paul
●  Paul becomes active and suggests to go surfing: surfing, [L:1, P:1] and sends this to Anna and Laura
●  Because of a network problem Anna does not receive the message, Laura receives it

Story to the Example / Solution

●  Laura agrees with Paul and returns the surfing suggestion, incrementing her counter: surfing, [P:1, L:2]
●  Anna becomes concerned and agrees to jogging, based on Laura's suggestion: jogging, [L:1, A:1] and sends it to Paul
●  Paul has to (and can) detect the conflict: jogging could have had a majority (Laura & Anna), BUT Laura also already agreed on surfing (Laura & Paul)

Story to the Example / Solution

●  surfing [A:0, P:1, L:2], jogging [P:0, L:1, A:1]; not yet known counters are listed with 0
●  Paul can detect that Anna's message was not a response to his suggestion since P:0. There are two possible resolutions:
   -  jogging, because initially both girls wanted to
   -  surfing, because Laura changed her mind

Story to the Example / Solution

●  Paul decides to go on with surfing and communicates this to Anna and Laura: surfing, [L:2, A:1, P:2]
●  the discussion could still go on now, but this way the Vector Clocks help to make reasonable decisions and to check causal dependencies
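
The comparison Paul performs can be written down as a short Java sketch. The names (`VectorClockSketch`, `compare`, `mergeAndTick`) are illustrative assumptions; counters that a clock does not list are treated as 0, as on the slides.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class VectorClockSketch {
    enum Order { BEFORE, AFTER, EQUAL, CONCURRENT }

    // Compares two vector clocks entry by entry; missing ids count as 0.
    static Order compare(Map<String, Integer> a, Map<String, Integer> b) {
        Set<String> ids = new HashSet<>(a.keySet());
        ids.addAll(b.keySet());
        boolean aSmaller = false;
        boolean bSmaller = false;
        for (String id : ids) {
            int x = a.getOrDefault(id, 0);
            int y = b.getOrDefault(id, 0);
            if (x < y) aSmaller = true;
            if (y < x) bSmaller = true;
        }
        if (aSmaller && bSmaller) return Order.CONCURRENT; // conflict, needs resolution
        if (aSmaller) return Order.BEFORE;
        if (bSmaller) return Order.AFTER;
        return Order.EQUAL;
    }

    // Resolution step: element-wise maximum of both clocks,
    // then the resolving node increments its own counter.
    static Map<String, Integer> mergeAndTick(Map<String, Integer> a,
                                             Map<String, Integer> b,
                                             String self) {
        Map<String, Integer> merged = new HashMap<>(a);
        b.forEach((id, t) -> merged.merge(id, t, Math::max));
        merged.merge(self, 1, Integer::sum);
        return merged;
    }
}
```

With the values from the example, surfing [L:2, P:1] and jogging [L:1, A:1] compare as `CONCURRENT`, and merging them at Paul gives [L:2, A:1, P:2], the clock he sends out with his decision.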

Paxos

●  goal: ensure data integrity if nodes in a cluster with replicated data fail
●  belongs to Quorum-Consensus algorithms
●  leads to an agreement between participating nodes
●  superior to classical Two-Phase-Commit (2PC)
●  tolerant of:
   -  a minority of the nodes failing
   -  a transaction crashing
   -  message loss

Basic Paxos Consensus Algorithm

●  based on voting:
   -  one client suggests a value
   -  the other acceptors (quorum) vote
   -  each ballot has a leader (coordinator)
   -  proposers support clients, convince acceptors and coordinate conflict resolutions

Basic Paxos – Execution

●  Phase 1a (prepare): proposer/leader acquires the current (maximum) ballot number from phase 1 and sends it to the quorum
●  Phase 1b (prepare): if the received number is larger than any number received before, a node sends its status to the leader including:
   -  largest received number from phase 1a
   -  largest number sent in phase 2b
   -  no smaller or equal ballot numbers than the current will be accepted

Basic Paxos – Execution

●  Phase 2a (accept): if the leader for a ballot received positive 1b messages from a quorum, the ballot is either
   -  free – no acceptor in the quorum has reported a phase 2b ballot, so none has voted for a value v yet (no completed ballot before)
   -  forced – an acceptor in the quorum has reported a phase 2b ballot, i.e. it has already selected a value v
   -  if forced, the leader sends value v; if free, the leader can send any value

Basic Paxos – Execution

●  Phase 2b (accepted): if an acceptor gets a 2a message for a ballot it agreed to before with a 1b message, the value is accepted and the acceptor sends a phase 2b message with v and ballot to the leader
●  Phase 3: if the leader gets a phase 2b message for v and ballot from a quorum, it knows that v was accepted and communicates this to all interested processes
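
The phases can be followed in a heavily simplified Java sketch. It runs in a single process with direct method calls instead of messages, so failures and message loss are not modeled; `PaxosAcceptor`, `PaxosProposer` and their methods are hypothetical names, and -1 stands for "no ballot yet".

```java
import java.util.ArrayList;
import java.util.List;

class PaxosAcceptor {
    private long promisedBallot = -1;   // largest ballot seen in phase 1a
    private long acceptedBallot = -1;   // ballot of the last phase 2b vote
    private String acceptedValue = null;

    // Status sent back in phase 1b.
    static final class Promise {
        final long acceptedBallot;
        final String acceptedValue;
        Promise(long acceptedBallot, String acceptedValue) {
            this.acceptedBallot = acceptedBallot;
            this.acceptedValue = acceptedValue;
        }
    }

    // Phase 1a -> 1b: answer only if the ballot is larger than any seen before.
    Promise prepare(long ballot) {
        if (ballot <= promisedBallot) return null;
        promisedBallot = ballot;
        return new Promise(acceptedBallot, acceptedValue);
    }

    // Phase 2a -> 2b: accept unless a larger ballot was promised in the meantime.
    boolean accept(long ballot, String value) {
        if (ballot < promisedBallot) return false;
        promisedBallot = ballot;
        acceptedBallot = ballot;
        acceptedValue = value;
        return true;
    }
}

class PaxosProposer {
    // Runs one ballot; returns the chosen value, or null if no quorum was reached.
    static String propose(long ballot, String wanted, List<PaxosAcceptor> acceptors) {
        int quorum = acceptors.size() / 2 + 1;
        List<PaxosAcceptor> promised = new ArrayList<>();
        long highestVote = -1;
        String value = wanted; // "free" unless an earlier vote is reported
        for (PaxosAcceptor a : acceptors) {
            PaxosAcceptor.Promise p = a.prepare(ballot);
            if (p == null) continue;
            promised.add(a);
            if (p.acceptedBallot > highestVote) { // "forced": reuse the voted value
                highestVote = p.acceptedBallot;
                value = p.acceptedValue;
            }
        }
        if (promised.size() < quorum) return null;
        int votes = 0;
        for (PaxosAcceptor a : promised) {
            if (a.accept(ballot, value)) votes++;
        }
        return votes >= quorum ? value : null; // phase 3 would announce this value
    }
}
```

In the forced case this sketch reuses the value of the highest-numbered earlier vote. Once a value has been chosen, later ballots with a different wish still end up with the same value.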

Graph Databases

●  graphs represent connected information very intuitively by using vertices and edges
●  useful for current problems such as, among others:
   -  internet routing
   -  contacts in social networks
   -  recommender systems
   -  fraud detection
   -  regulatory networks
   -  semantic web
   -  ...

Graph Lingo

●  graphs are represented by a pair (tuple) of two sets, V (vertices) and E (edges)
●  vertices are nodes, representing a kind of fact
●  edges are the connections/relations between vertices and can be directed or undirected
●  G = (V, E) with E ⊆ V x V, e.g. V = {1, 2}, E = {(1, 2)}

Property Graph Model

●  directed, multi-relational graph
●  labeled/(typed) edges
●  vertices and labels have properties
●  properties are key/value pairs of type <String, Object> like: Name: Alice or Age: 30

Property Graph Model

●  strong typing of vertices and edges possible ('Type' / '_Type', depends on the support of the system)
   -  useful to attach a semantic meaning
   -  supports automatic handling
   -  allows for definition of consistency criteria and indices
   -  makes partitioning of graphs easier
●  bidirectional edges are realized by two unidirectional edges

Property Graph Model

●  multi-edges require different labels
●  vertices and edges have a unique identity: 'Id', '_Id'
●  used for ids: integers, strings, URIs
●  extension: multi-value property, which allows lists or sets of values
●  special case for edge label values: edge weights
●  another extension: higher order relations with hyperedges and hypervertices

[Figure: example property graph]

Vertices:
   Id 1: Type Person, Name Alice, Age 20
   Id 2: Type Person, Name Bob, Age 25
   Id 3: Type Person, Name Paula, Age 23
   Id 4: Type Group, Name Chess

Edges:
   3 x Label: knows, since: 06/2013
   Label: is_member, since: 07/2013
   Label: has member, since: 07/2013

Property Graph Model / Extensions

●  higher order relations with hyperedges and hypernodes
   -  hyperedge: connects more than two nodes
   -  hypervertex: combination of a set of vertices/nodes, keeps internal edges
●  paths: sequence of edges
●  subgraphs: a defined combination of nodes and edges into a single node
●  version information makes it possible to represent the graph evolution and/or concurrency

Graph Representations

●  different representations available for persistence and memory
●  difficult to combine well-performing persistence with good support for a variety of graph algorithms at the same time

Adjacency Matrix

●  square matrix/table
●  all n nodes are listed horizontally and vertically
●  if an edge exists between nodes u and v, there is an entry in the table at position [u,v]
●  test for the connection of two nodes u and v can be done very quickly

Adjacency Matrix / Problems

●  disadvantage: huge space consumption even for sparse matrices, i.e. graphs with many nodes but only a few edges
●  it is difficult to identify the connecting edges for a given node
●  to identify neighbors you always have to read a complete row or column
●  hypergraphs cannot be represented

Incidence Matrix

●  a matrix with nodes on one axis and edges on the other axis
●  much more space efficient for very weakly connected graphs than the adjacency matrix
●  in more connected graphs it needs more space than the adjacency matrix
●  can represent hypergraphs

Adjacency List

●  extension of edge list
●  edges are sorted according to their start node
●  for every node the connecting edges are stored
●  time consumption depends only on the connectivity of the node, not on the complete graph size

Edge List

●  nodes and edges are stored separately
●  insertion and deletion of single edges is very efficient
●  identification of connecting edges given a node is inefficient, since the whole edge list has to be searched

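Example Graph

[Figure: example graph with vertices v1 to v4 and labeled edges e1 to e5]

Adjacency matrix:

        v1   v2   v3   v4
   v1    0    0    0    0
   v2    1    1    1    0
   v3    2    1    0    0
   v4    0    1    0    0

Incidence matrix:

        e1   e2   e3   e4   e5   e6
   v1    1    1    1    0    0    0
   v2    0    0   -1    2    1    1
   v3   -1   -1    0    0    0    1
   v4    0    0    0    0   -1    0

Adjacency list:

   v1: v2, v3, v3
   v2: v2, v3, v4
   v3: v2
   v4: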
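
The three representations can all be derived from a plain edge list, as the following Java sketch shows. It is a sketch under assumptions: vertices are numbered from 0, the matrix entry [u][v] counts the edges from u to v (the orientation described on the adjacency matrix slide), and a self-loop is marked with 2 in the incidence matrix as in the example. The usage note reads the endpoints of e1 to e5 off the incidence matrix and leaves out e6.

```java
import java.util.ArrayList;
import java.util.List;

class GraphRepresentations {
    // Edge list input: one {from, to} pair per directed edge.

    // Adjacency matrix: n x n, entry [u][v] counts the edges u -> v.
    static int[][] adjacencyMatrix(int n, int[][] edges) {
        int[][] m = new int[n][n];
        for (int[] e : edges) {
            m[e[0]][e[1]]++;
        }
        return m;
    }

    // Incidence matrix: one row per vertex, one column per edge.
    static int[][] incidenceMatrix(int n, int[][] edges) {
        int[][] m = new int[n][edges.length];
        for (int j = 0; j < edges.length; j++) {
            int from = edges[j][0];
            int to = edges[j][1];
            if (from == to) {
                m[from][j] = 2;   // self-loop
            } else {
                m[from][j] = 1;   // edge leaves this vertex
                m[to][j] = -1;    // edge enters this vertex
            }
        }
        return m;
    }

    // Adjacency list: for every vertex the targets of its outgoing edges.
    static List<List<Integer>> adjacencyList(int n, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            adj.add(new ArrayList<>());
        }
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]);
        }
        return adj;
    }
}
```

For e1 to e5 of the example (v1 -> v3 twice, v1 -> v2, a loop at v2, v2 -> v4) the edge list is `{ {0,2}, {0,2}, {0,1}, {1,1}, {1,3} }`. Finding the neighbors of one vertex costs a full row scan in the matrix but only a single list lookup in the adjacency list.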

Graph Traversal

●  either partial or complete visit of the nodes
●  three strategies:
   -  breadth-first / depth-first
   -  algorithmic traversals
   -  random based
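
As an illustration of the first strategy, here is a minimal breadth-first traversal in Java over an adjacency list. The class name and the representation (vertices numbered from 0, one neighbor list per vertex) are assumptions made for this sketch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class BreadthFirstSketch {
    // Returns the vertices reachable from start in breadth-first visiting order.
    static List<Integer> bfs(List<List<Integer>> adjacency, int start) {
        boolean[] seen = new boolean[adjacency.size()];
        List<Integer> order = new ArrayList<>();
        Deque<Integer> queue = new ArrayDeque<>();
        seen[start] = true;
        queue.add(start);
        while (!queue.isEmpty()) {
            int u = queue.remove();       // oldest discovered vertex first
            order.add(u);
            for (int v : adjacency.get(u)) {
                if (!seen[v]) {           // visit every vertex at most once
                    seen[v] = true;
                    queue.add(v);
                }
            }
        }
        return order;
    }
}
```

Only vertices reachable from the start vertex are visited, so the result may be a partial traversal. Taking the newest discovered vertex first (a stack instead of a queue) turns this into a depth-first style traversal.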

Graph Indexing and Partitioning

●  graph indexes are first-class citizens
●  they can be inserted as sub-graphs and attached to specific nodes as specific information
●  if a graph gets too big it can be split into partial graphs
●  optimal partitioning is highly domain- and semantics-dependent -> no good standard solution

Tinkerpop Graph Processing Step

●  attempt to provide uniform interfaces for Property-Graph based systems
●  hides the backend database from the application developer
●  consists of several sub-projects:
   -  Blueprints: Java interface for Property-Graph models – no own persistence yet
   -  supports transactions

Graph Creation

    // Create a new graph with Neo4j persistence
    Graph graph = new Neo4jGraph("/tmp/my_graph");

    // Create vertices with the ids "Alice", "Bob" and "Carol"
    Vertex alice = graph.addVertex("Alice");
    Vertex bob = graph.addVertex("Bob");
    Vertex carol = graph.addVertex("Carol");

    // Add name and age as properties ("Alter" is German for "age")
    alice.setProperty("Name", "Alice");
    alice.setProperty("Alter", 18);
    bob.setProperty("Name", "Bob");
    bob.setProperty("Alter", 22);
    carol.setProperty("Name", "Carol");
    carol.setProperty("Alter", 20);

    // Create the corresponding edges ("kennt" is German for "knows") ...
    Edge e1 = graph.addEdge("e1", alice, bob, "kennt");
    Edge e2 = graph.addEdge("e2", alice, carol, "kennt");
    Edge e3 = graph.addEdge("e3", carol, bob, "kennt");

    // ... and set the edge property "seit" (German for "since")
    e1.setProperty("seit", "2001/10/03");
    e2.setProperty("seit", "2003/12/04");
    e3.setProperty("seit", "2001/07/12");

    graph.shutdown();

taken from Stefan Edlich et al. “NoSQL”, 2nd edition, Hanser Verlag (2011)

    Set<String> indexKeys = new HashSet<String>();
    indexKeys.add("Name");

    // Index the given property keys of all vertices
    AutomaticIndex index = graph.createAutomaticIndex(
        "IndexOfName", Vertex.class, indexKeys);

    // Vertices that already exist have to be re-indexed
    AutomaticIndexHelper.reIndexElements(index, graph.getVertices());

    // Iterate over the results of the index query
    for (Vertex vertex : index.get("Name", "Alice")) {
        System.out.println("Vertex: " + vertex);
    }


taken from Stefan Edlich et al. “NoSQL”, 2. Auflage, Hanser Verlag (2011)
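
The idea behind the automatic index above can be sketched without Blueprints. The following is a minimal Python sketch, assuming vertices are plain dictionaries; `PropertyIndex`, `reindex` and `get` are hypothetical names chosen for illustration and are not part of the Blueprints API.

```python
from collections import defaultdict


class PropertyIndex:
    """Toy index mapping (key, value) pairs to vertex ids (hypothetical, not Blueprints)."""

    def __init__(self, index_keys):
        self.index_keys = set(index_keys)
        self._entries = defaultdict(set)

    def add(self, vertex_id, properties):
        # Only properties whose key was registered are indexed.
        for key, value in properties.items():
            if key in self.index_keys:
                self._entries[(key, value)].add(vertex_id)

    def reindex(self, vertices):
        # Vertices that existed before the index was created are added explicitly.
        for vertex_id, properties in vertices.items():
            self.add(vertex_id, properties)

    def get(self, key, value):
        return sorted(self._entries.get((key, value), set()))


# Illustrative data modelled on the slide's example graph.
vertices = {
    "Alice": {"Name": "Alice", "Age": 18},
    "Bob": {"Name": "Bob", "Age": 22},
    "Carol": {"Name": "Carol", "Age": 20},
}

index = PropertyIndex(["Name"])
index.reindex(vertices)
print(index.get("Name", "Alice"))
```

Because only `Name` was registered as an index key, a lookup on `Age` finds nothing even though the property exists on every vertex.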

Graph Query Languages

●  no common standard yet
●  pattern-based: SPARQL, RDF query language
●  navigation-based: Gremlin, sones GQL
●  logic-based: OWL, GraphLog

Neo4j

●  one of the oldest NoSQL graph databases (2003)
●  full ACID support
●  uses its own format to store graphs on disk
●  Apache Lucene used for indexing
●  can run as a server as well as embedded

Integration with Java

●  easiest integration is via Maven (add the dependency to pom.xml), then:

    GraphDatabaseService graphdb = new EmbeddedGraphDatabase("/var/graphdb");

Graph Creation

    // The enum must implement Neo4j's RelationshipType so that Rel.knows can be used below
    enum Rel implements RelationshipType { knows }

    Transaction tx = graphdb.beginTx();
    try {
        Node Alice = graphdb.createNode();
        Node Bob = graphdb.createNode();
        Node Carol = graphdb.createNode();
        Alice.setProperty("Name", "Alice");
        Bob.setProperty("Name", "Bob");
        Carol.setProperty("Name", "Carol");
        Alice.setProperty("Age", 18);
        Bob.setProperty("Age", 20);
        Carol.setProperty("Age", 22);

        Relationship Alice_Bob = Alice.createRelationshipTo(Bob, Rel.knows);
        Relationship Alice_Carol = Alice.createRelationshipTo(Carol, Rel.knows);
        Relationship Carol_Bob = Carol.createRelationshipTo(Bob, Rel.knows);

Graph Creation (continued)

        Alice_Bob.setProperty("since", /* value missing in the source */);
        Alice_Carol.setProperty("since", /* value missing in the source */);
        Carol_Bob.setProperty("since", /* value missing in the source */);
        tx.success();
    } catch (Exception e) {
        tx.failure();
    } finally {
        tx.finish();
    }

Manual Indexing

    IndexManager index = graphdb.index();
    Index<Node> UserIdx = index.forNodes("User");
    RelationshipIndex KnowsIdx = index.forRelationships("knows");
    UserIdx.add(Alice, "Name", Alice.getProperty("Name"));
    UserIdx.add(Alice, "Age", Alice.getProperty("Age"));
    [...]

Traversal Configuration

●  besides simple traversals and wildcard searches there are a number of sophisticated tweaks:
   -  Order: determines the branching order (DFS/BFS)
   -  Uniqueness: how to handle multiple hits of the same node
   -  Pruning: which branches not to follow
   -  Filtering: which hits are considered for the result
   -  Relationship expanding: dedicated edge handling
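
The five settings listed above can be illustrated on a small in-memory graph. This is a minimal Python sketch, not the Neo4j traversal API: the function `traverse`, its parameters, and the extra nodes "Dave" and "Erin" are assumptions made for illustration.

```python
from collections import deque


def traverse(graph, start, order="BFS", max_depth=2, include=lambda node: True):
    """Yield paths (lists of node names) from start; a toy model, not the Neo4j API."""
    frontier = deque([[start]])
    visited = {start}  # uniqueness: each node is reached at most once
    while frontier:
        # order: taking from the left gives breadth-first, from the right depth-first
        path = frontier.popleft() if order == "BFS" else frontier.pop()
        node = path[-1]
        if node != start and include(node):  # filtering: which hits are returned
            yield path
        if len(path) - 1 >= max_depth:  # pruning: do not expand beyond max_depth
            continue
        for neighbour in graph.get(node, []):  # relationship expanding: outgoing edges only
            if neighbour not in visited:
                visited.add(neighbour)
                frontier.append(path + [neighbour])


# Adjacency list of "knows" edges (illustrative data).
knows = {"Alice": ["Bob", "Carol"], "Carol": ["Bob", "Dave"], "Dave": ["Erin"]}

for path in traverse(knows, "Alice"):
    print(" -> ".join(path))
```

With `max_depth=2` the node "Erin" is never reached, and because of the uniqueness rule "Bob" is reported once although two paths lead to him.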

Example Traversal

    TraversalDescription td = new TraversalDescriptionImpl();
    td = td.prune(Traversal.pruneAfterDepth(2))
           .filter(Traversal.returnAllButStartNode())
           .relationships(KNOWS);

    Traverser tr = td.traverse(startNode);
    for (Path path : tr) {
        System.out.println("End Node: "
            + path.endNode().getProperty(NodeProperty.NAME));
    }

Cypher

●  Neo4j's own graph query language since version 1.4
   -  developed for pattern matching
   -  declarative
   -  implemented in Scala -> parallel enabled
●  query structure:
   -  starts with a set of nodes
   -  match statement (nodes in (), edges as ->)
   -  return statement with optional where or sort
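
The query structure above (start set, match, where, return with sort) can be mimicked step by step in plain Python. This is only a sketch of what such a query computes, under the assumption of a tiny in-memory graph with made-up node ids; it is not how Cypher is implemented, and the function name `friends_query` is hypothetical.

```python
# Toy in-memory graph; ids, names and ages are illustrative assumptions.
nodes = {
    1: {"Name": "Alice", "Age": 18},
    2: {"Name": "Bob", "Age": 22},
    3: {"Name": "Carol", "Age": 20},
}
knows = [(1, 2), (1, 3), (3, 2)]  # directed edges (source id, target id)


def friends_query(start_ids, min_age):
    # start: begin with a fixed set of nodes
    start = set(start_ids)
    # match: follow outgoing "knows" edges from the start nodes
    matched = {target for source, target in knows if source in start}
    # where: keep only friends above the age threshold
    selected = [nodes[n] for n in matched if nodes[n]["Age"] > min_age]
    # return ... sort: project the wanted properties and order them by name
    return sorted((friend["Name"], friend["Age"]) for friend in selected)


print(friends_query([1, 2], 18))
```

One simplification to keep in mind: the sketch collects matched nodes in a set, so a friend reachable from several start nodes appears once, whereas a pattern-matching engine would normally return one row per match.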

Examples

    // start nodes via ids
    start Person = (1, 2)
    match (Person)-[:knows]->(Friend)
    where Friend.Age > 18
    return Friend.Name, Friend.Age, Friend.City?
    sort by Friend.Name

    // start nodes via index query
    start Person = (Person-index, Name, "Alice")
    match (Person)-[:knows]->()-[:knows]->(FriendofFriend)
    where not(FriendofFriend.Age < 17)
    return FriendofFriend.Name

Interfacing NoSQL

●  specific APIs vary heavily
●  most support a RESTful interface:
   -  REpresentational State Transfer
   -  architecture for web applications
   -  predominantly implemented using the HTTP protocol
   -  described by Roy Thomas Fielding: “Architectural Styles and the Design of Network-based Software Architectures”, Dissertation, UC Irvine, 2000

CRUD

●  minimum set of access functions:
   -  Create, Read, Update, Delete

CRUD     SQL      HTTP
Create   insert   POST
Read     select   GET
Update   update   PUT
Delete   delete   DELETE
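
To make the SQL column of the table concrete, here is a minimal Python sketch that runs the four operations against an in-memory SQLite database. The table name `cocktails` and its columns are assumptions made for illustration; the HTTP column is only recorded in a lookup dictionary, no request is sent.

```python
import sqlite3

# CRUD operation -> (SQL statement keyword, HTTP method), as in the table above.
CRUD = {
    "create": ("INSERT", "POST"),
    "read": ("SELECT", "GET"),
    "update": ("UPDATE", "PUT"),
    "delete": ("DELETE", "DELETE"),
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cocktails (id INTEGER PRIMARY KEY, name TEXT)")

# create
conn.execute("INSERT INTO cocktails (id, name) VALUES (?, ?)", (1, "Ipanema"))
# read
created = conn.execute("SELECT name FROM cocktails WHERE id = ?", (1,)).fetchone()
# update
conn.execute("UPDATE cocktails SET name = ? WHERE id = ?", ("Ipanema (virgin)", 1))
updated = conn.execute("SELECT name FROM cocktails WHERE id = ?", (1,)).fetchone()
# delete
conn.execute("DELETE FROM cocktails WHERE id = ?", (1,))
remaining = conn.execute("SELECT COUNT(*) FROM cocktails").fetchone()[0]

print(created, updated, remaining)
```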

Components

●  Resources, Operations, Links
●  each request is independent, i.e. it carries no state -> no need for synchronization
●  abstract view of the HTTP protocol: nouns and verbs; each request is defined by applying a verb to a noun, with an optional response
●  a request is composed of a header with a method and metadata in key/value format, and an optional body
●  a response is like a request but without a method
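
The anatomy of a request described above (method, key/value metadata, optional body) can be shown with a small parser. This is a simplified Python sketch for illustration; the function name `parse_request` is hypothetical, and real HTTP parsing has to handle many cases (encodings, repeated headers, chunked bodies) that are ignored here.

```python
def parse_request(raw):
    """Split a raw HTTP/1.1 request into method, target, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, target, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        key, _, value = line.partition(":")
        headers[key.strip().lower()] = value.strip()  # header names are case-insensitive
    return {
        "method": method,      # the verb
        "target": target,      # the noun
        "version": version,
        "headers": headers,    # metadata in key/value format
        "body": body or None,  # optional body
    }


raw = (
    "POST /api/ HTTP/1.1\r\n"
    "Host: cocktails.example.com\r\n"
    "Content-Type: application/json\r\n"
    "\r\n"
    '{"name": "Ipanema"}'
)
request = parse_request(raw)
print(request["method"], request["target"], request["headers"]["host"], request["body"])
```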

Resources

●  addressable end-point of the system
   -  e.g. an HTML document, a video, a process
●  a resource is abstract and can have more than one representation
●  the user always interacts with a representation (HTML, a graphics format, XML, ...) and may choose the desired one
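
The difference between a resource and its representations can be sketched as follows: one abstract resource, two renderings, and a selection based on the media types the client accepts. The names `represent` and `REPRESENTATIONS` and the XML element name `cocktail` are assumptions for illustration; quality values in the Accept header are deliberately ignored.

```python
import json
from xml.etree import ElementTree

resource = {"id": "1", "name": "Ipanema"}  # the abstract resource (illustrative data)


def to_json(res):
    return json.dumps(res, sort_keys=True)


def to_xml(res):
    root = ElementTree.Element("cocktail")
    for key, value in res.items():
        ElementTree.SubElement(root, key).text = value
    return ElementTree.tostring(root, encoding="unicode")


REPRESENTATIONS = {"application/json": to_json, "application/xml": to_xml}


def represent(res, accept):
    """Return (media type, body) for the first acceptable media type."""
    for media_type in (part.strip() for part in accept.split(",")):
        if media_type in REPRESENTATIONS:
            return media_type, REPRESENTATIONS[media_type](res)
    raise LookupError("no acceptable representation")


print(represent(resource, "application/json"))
print(represent(resource, "text/html, application/xml"))
```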

Operations

●  HTTP defines a set of operations with known semantics:
   -  GET
   -  HEAD
   -  PUT
   -  POST
   -  DELETE

Characteristics of Operations

●  operations can be classified by the criteria safe and idempotent, which are important for the system's integrity and caching performance
●  safe: no side effects, so the user takes on no responsibility by issuing the request
●  idempotent: there is a side effect, but only the first time; on repeated execution the server state does not change any further
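
Both criteria can be checked mechanically on a toy server. The following Python sketch models the server state as a dictionary and compares the state after one call with the state after two identical calls. The class `Store`, its method names, and the plain-overwrite behaviour of `put` are assumptions for illustration.

```python
class Store:
    """In-memory resource store; a toy server model with hypothetical method names."""

    def __init__(self):
        self.resources = {}
        self.next_id = 1

    def get(self, uri):  # safe: never changes server state
        return self.resources.get(uri)

    def put(self, uri, doc):  # idempotent: repeating it leaves the same state
        self.resources[uri] = doc

    def delete(self, uri):  # idempotent: the resource stays gone
        self.resources.pop(uri, None)

    def post(self, collection, doc):  # no guarantees: each call creates a new resource
        uri = f"{collection}/{self.next_id}"
        self.next_id += 1
        self.resources[uri] = doc
        return uri


def state_after(calls):
    """Apply the calls to a fresh store and return the resulting server state."""
    store = Store()
    for call in calls:
        call(store)
    return dict(store.resources)


put = lambda s: s.put("/cocktails/1", {"name": "Ipanema"})
post = lambda s: s.post("/cocktails", {"name": "Ipanema"})
get = lambda s: s.get("/cocktails/1")

print(state_after([put]) == state_after([put, put]))    # True: PUT is idempotent
print(state_after([post]) == state_after([post, post]))  # False: POST is not
print(state_after([put]) == state_after([put, get]))    # True: GET is safe
```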

GET/HEAD

●  safe and idempotent
●  HEAD: returns only meta information about the resource
●  GET: returns a representation of the resource in addition to the meta information
●  a nonconforming example: “http://www.example.com/api?action=delete”

PUT

●  idempotent
●  the representation of the referenced resource is transmitted to the server (side effect -> not safe)
●  only the first execution changes the state of the server
●  this can be achieved if the server maintains version numbers for a document, which the request has to match

PUT: Simple Example

●  Q (request): GET doc
●  R (response): return doc v=1, doc content
●  Q: PUT doc v=1, doc content modified
●  R: request v=1 matches server v=1; modified doc content stored; version updated to v=2
●  Q (a second time): PUT doc v=1, doc content modified (maybe again)
●  R: request v=1 does not match server v=2; doc content not stored
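
The exchange above translates almost line by line into code. This is a minimal Python sketch of a server that keeps one version number per document; the class `VersionedStore` and its methods are hypothetical names, and returning `True`/`False` stands in for whatever status the real server would send.

```python
class VersionedStore:
    """Toy server keeping one version number per document (hypothetical names)."""

    def __init__(self):
        self.docs = {}  # name -> (version, content)

    def get(self, name):
        return self.docs[name]

    def put(self, name, version, content):
        current_version, _ = self.docs[name]
        if version != current_version:
            return False  # stale version: content is not stored
        self.docs[name] = (current_version + 1, content)
        return True


server = VersionedStore()
server.docs["doc"] = (1, "doc content")

version, content = server.get("doc")                         # R: doc v=1
first = server.put("doc", version, "doc content modified")   # accepted, server now at v=2
second = server.put("doc", version, "doc content modified")  # rejected, v=1 != v=2

print(first, second, server.get("doc"))
```

The repeated request leaves the server at version 2 with the content stored by the first request, which is exactly the idempotent behaviour required of PUT.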

DELETE

●  idempotent
   -  once the resource is removed, all subsequent requests fail -> server state remains the same
●  not safe
●  the referenced resource is removed from the server or access to it is blocked

POST

●  no guarantees at all
●  transmits data for processing
●  the processing result can be used to create a new resource, to modify an existing one, or for neither
●  can be used for very complex queries because all parameters can be included in the body; GET has to include them in the URI
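
The last point can be seen by building the same query once as a GET and once as a POST. The Python sketch below only constructs the request objects and sends nothing; the `/search` endpoint and the parameter names are assumptions for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

query = {"ingredient": "lime", "max_amount": 2}  # hypothetical search parameters

# GET: all parameters have to be encoded into the URI.
get_request = Request("http://cocktails.example.com/search?" + urlencode(query))

# POST: the same parameters travel in the body, so they could also be nested structures.
post_request = Request(
    "http://cocktails.example.com/search",
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url, post_request.data)
```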

LINKS

●  HTTP does not represent links
●  links are modeled as URIs
●  encoding depends on the type of representation
●  can contain metadata that helps the user choose the appropriate resource
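
How a client might use links embedded in a JSON representation can be sketched as follows. The document mirrors the cocktail resource used in this lecture's REST example; the function `next_request` and the table that assigns an HTTP method to each link type are assumptions of this sketch, not something the representation itself prescribes.

```python
import json

# Representation as a server might return it (link types as in the lecture's cocktail example).
document = json.loads("""
{
  "id": "1",
  "name": "Ipanema",
  "links": {
    "linktypes/publish": "http://cocktails.example.com/publish/1",
    "linktypes/edit": "http://cocktails.example.com/cocktails/1",
    "linktypes/delete": "http://cocktails.example.com/cocktails/1"
  }
}
""")

# Which HTTP method belongs to which link type is an assumption of this sketch.
METHOD_FOR_LINK = {
    "linktypes/publish": "POST",
    "linktypes/edit": "PUT",
    "linktypes/delete": "DELETE",
}


def next_request(doc, link_type):
    """Return (method, uri) for a link type, or None if the server did not offer it."""
    uri = doc.get("links", {}).get(link_type)
    if uri is None:
        return None
    return METHOD_FOR_LINK[link_type], uri


print(next_request(document, "linktypes/publish"))
print(next_request(document, "linktypes/ratings"))
```

The client does not construct URIs itself; it only follows what the representation offers, so a link type that is absent simply means the action is not available in the current state.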

Example (Stefan Edlich et al., “NoSQL”, 2. Auflage, Hanser Verlag (2011))

    POST /api/ HTTP/1.1
    Host: cocktails.example.com
    Content-Type: application/json
    …

    {
      "name" : "Ipanema",
      "description" : "A non-alcoholic variant for a caipirinha evening",
      "ingredients" : {
        "Lime" : { "amount" : 1, "preparation" : "cut into eighths" },
        "Brown sugar" : { "amount" : 2, "unit" : "tsp" },
        …
      },
      "preparation" : "Muddle the limes and sugar in a glass, cover with crushed ice and top up with the liquids. Serve with a straw"
    }

    HTTP/1.1 201 Created
    Content-Type: application/json
    Location: http://cocktails.example.com/cocktails/1
    …

    { "id" : "1" }

    GET /cocktails/1 HTTP/1.1
    Host: cocktails.example.com
    …

    {
      "id" : "1",
      "name" : "Ipanema",
      "description" : "A non-alcoholic variant for a caipirinha evening",
      "ingredients" : {
        "Lime" : { "amount" : 1, "preparation" : "cut into eighths" },
        "Brown sugar" : { "amount" : 2, "unit" : "tsp" },
        …
      },
      "preparation" : "Muddle the limes and sugar in a glass, cover with crushed ice and top up with the liquids. Serve with a straw",
      "links" : {
        "linktypes/publish" : "http://cocktails.example.com/publish/1",
        "linktypes/edit" : "http://cocktails.example.com/cocktails/1",
        "linktypes/delete" : "http://cocktails.example.com/cocktails/1"
      }
    }

    PUT /cocktails/1 HTTP/1.1
    Host: cocktails.example.com
    Content-Type: application/json
    …

    { … "tags" : [ "non-alcoholic", "ice" ], … }

    DELETE /cocktails/1 HTTP/1.1
    Host: cocktails.example.com

    POST /publish/1 HTTP/1.1
    Host: cocktails.example.com
    Content-Type: application/json
    …

    { "publish" : true }

    {
      "id" : "1",
      …
      "links" : {
        "linktypes/delete" : "http://cocktails.example.com/cocktails/1",
        "linktypes/ratings" : "http://cocktails.example.com/ratings/1"
      }
    }

Document Stores

●  go back to Damien Katz and his work on Lotus Notes and CouchDB
●  the responsibility for the schema is moved from the database to the application:
   -  loss of enforced normalization and referential integrity
   -  gain of flexibility and of schema modifications at run-time by the application
●  data mostly stored as JSON
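
What "the application is responsible for the schema" means in practice can be sketched with a plain list standing in for a schema-free collection. The required fields, the function `validate`, and the example documents are assumptions for illustration; a real document store would accept all three documents without complaint.

```python
# The schema lives in the application (assumed fields for this sketch).
REQUIRED_FIELDS = {"name": str, "ingredients": dict}


def validate(doc):
    """Return a list of problems; the store itself would accept any document."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected_type):
            problems.append(f"wrong type for {field}")
    return problems


collection = []  # stand-in for a schema-free document collection

old_doc = {"name": "Ipanema", "ingredients": {"lime": {"amount": 1}}}
# A later version of the application adds a "tags" field at run-time, no migration needed.
new_doc = {"name": "Ipanema", "ingredients": {"lime": {"amount": 1}}, "tags": ["non-alcoholic"]}
bad_doc = {"name": "Ipanema"}

for doc in (old_doc, new_doc, bad_doc):
    if not validate(doc):
        collection.append(doc)  # documents with different field sets coexist

print(len(collection), validate(bad_doc))
```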

MongoDB

●  document store
●  tries to close the gap between classic RDBMS and key/value stores
●  supported by a number of successful internet companies (10gen, ...)
●  good integration with programming languages: C++, C#, Java, JavaScript, PHP, Ruby, Perl, Python

JSON Example

    {
      "id" : "1",
      "name" : "Ipanema",
      "description" : "A non-alcoholic variant for a caipirinha evening",
      "ingredients" : {
        "Lime" : { "amount" : 1, "preparation" : "cut into eighths" },
        "Brown sugar" : { "amount" : 2, "unit" : "tsp" },
        …
      },
      "preparation" : "Muddle the limes and sugar in a glass, cover with crushed ice and top up with the liquids. Serve with a straw",
      "links" : {
        "linktypes/publish" : "http://cocktails.example.com/publish/1",
        "linktypes/edit" : "http://cocktails.example.com/cocktails/1",
        "linktypes/delete" : "http://cocktails.example.com/cocktails/1"
      }
    }

JSON in MongoDB

●  each document needs a special ID field: _id
●  the _id value has to be unique
●  can be anything
●  automatic default:
   -  automatic 12-byte number:
      ●  4-byte timestamp
      ●  3-byte client machine id
      ●  2-byte process id
      ●  3-byte counter
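
The 12-byte layout listed above can be reproduced with the standard library. This Python sketch only illustrates how the four parts add up to 12 bytes; the function names `make_id` and `split_id`, the big-endian byte order, and the example values are assumptions of the sketch, and it is not the code MongoDB or its drivers use.

```python
import struct
import time


def make_id(timestamp, machine_id, process_id, counter):
    """Pack the four parts into 12 bytes; a sketch of the layout, not MongoDB's code."""
    return (
        struct.pack(">I", timestamp)               # 4-byte timestamp (seconds)
        + machine_id.to_bytes(3, "big")            # 3-byte client machine id
        + struct.pack(">H", process_id & 0xFFFF)   # 2-byte process id
        + (counter & 0xFFFFFF).to_bytes(3, "big")  # 3-byte counter
    )


def split_id(raw):
    """Recover (timestamp, machine_id, process_id, counter) from 12 bytes."""
    timestamp = struct.unpack(">I", raw[0:4])[0]
    machine_id = int.from_bytes(raw[4:7], "big")
    process_id = struct.unpack(">H", raw[7:9])[0]
    counter = int.from_bytes(raw[9:12], "big")
    return timestamp, machine_id, process_id, counter


example = make_id(int(time.time()), 0xABCDEF, 4242, 1)
print(len(example), example.hex(), split_id(example))
```

Since the timestamp comes first, ids created later compare as larger byte strings, and the counter keeps ids from the same process and second apart.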

Demo

●  Check out the command line and Python tutorial at: http://api.mongodb.com/python/current/tutorial.html
●  get a toy MongoDB server for free at: https://mlab.com