L24: NoSQL (continued)
CS3200 Database design (sp18 s2)
https://course.ccs.neu.edu/cs3200sp18s2/
4/12/2018
Last Class today
• NoSQL (15 min): Graph DBs
• Course Evaluation (15 min)
• Course review
Outline
• Introduction
• Transaction Consistency
• 4 main data models
  - Key-Value Stores (e.g., Redis)
  - Column-Family Stores (e.g., Cassandra)
  - Document Stores (e.g., MongoDB)
  - Graph Databases (e.g., Neo4j)
• Concluding Remarks
Graph Databases
• Restricted case of a relational schema:
  - Nodes (+ labels/properties)
  - Edges (+ labels/properties)
• Motivated by the popularity of network/communication-oriented applications
• Efficient support for graph-oriented queries
  - Reachability, graph patterns, path patterns
  - Ordinary RDBs either do not support such queries or evaluate them inefficiently
    • A path of length k is a k-wise self join; yet a very special one...
• Specialized languages for graph queries
  - For example, pattern languages for paths
• Plus distributed, 2-of-CAP, etc.
  - Depending on the design choices of the vendor
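The remark that a path of length k is a k-wise self join can be illustrated with a small sketch. It assumes the graph is stored as a single edge relation of (source, target) pairs and joins k copies of that relation in plain Python; the function name `paths_of_length` and the sample edges are illustrative, not part of any database API.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def paths_of_length(edges: Iterable[Edge], k: int) -> Set[Tuple[str, ...]]:
    """Return every node sequence linked by exactly k consecutive edges (k >= 1)."""
    edge_list: List[Edge] = list(edges)
    # One copy of the edge relation gives the paths of length 1.
    paths: Set[Tuple[str, ...]] = set(edge_list)
    for _ in range(k - 1):
        # Join step: pair each path with every edge that starts where the path ends.
        paths = {
            path + (dst,)
            for path in paths
            for src, dst in edge_list
            if path[-1] == src
        }
    return paths


# Illustrative "follows" edges (sample data, not taken from a real database).
follows = [
    ("Billy", "Harry"),
    ("Harry", "Billy"),
    ("Ruth", "Harry"),
    ("Harry", "Ruth"),
    ("Ruth", "Billy"),
]
two_hop = paths_of_length(follows, 2)
```

Each loop iteration joins in one more copy of the edge relation, so the intermediate result can grow quickly with k. This is the cost that graph-specific storage and traversal strategies aim to reduce.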
Example Databases
• Graph with nodes/edges marked with labels and properties (labeled property graph)
  - Sparksee (DEX) (Java, 1st release 2008)
  - neo4j (Java, 1st release 2010)
  - InfiniteGraph (Java/C++, 1st release 2010)
  - OrientDB (Java, 1st release 2010)
• Triple stores: support W3C RDF and SPARQL, also viewed as graph databases
  - MarkLogic, AllegroGraph, Blazegraph, IBM System G, Oracle Spatial & Graph, OpenLink Virtuoso, Ontotext
neo4j
• Open source, written in Java
  - First version released 2010
• Supports the Cypher query language (a declarative graph query language)
• Clustering support
  - Replication and sharding through master-slave architectures
• Used by eBay, Walmart, Cisco, National Geographic, TomTom, Lufthansa, ...
Examples taken from Graph Databases by Robinson, Webber, and Eifrem (O’Reilly), free eBook
The Graph Data Model in Cypher
• Labeled property graph model
• Node
  - Has a set of labels (typically one label)
  - Has a set of properties key:value (where value is of a primitive type or an array of primitives)
• Edge (relationship)
  - Directed: node → node
  - Has a name
  - Has a set of properties (like nodes)
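As a rough illustration of the model just listed, the sketch below represents nodes and relationships as plain Python data classes. The class names, the `FOLLOWS` relationship name, and the sample users are assumptions made for this example; this is not how neo4j stores data internally.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Union

Primitive = Union[str, int, float, bool]
# A property value is a primitive or an array of primitives.
PropertyValue = Union[Primitive, List[Primitive]]


@dataclass
class Node:
    labels: Set[str] = field(default_factory=set)
    properties: Dict[str, PropertyValue] = field(default_factory=dict)


@dataclass
class Relationship:
    name: str    # every relationship has a name
    start: Node  # relationships are directed: start -> end
    end: Node
    properties: Dict[str, PropertyValue] = field(default_factory=dict)


# Hypothetical sample data: two User nodes and one directed relationship.
billy = Node(labels={"User"}, properties={"name": "Billy"})
harry = Node(labels={"User"}, properties={"name": "Harry"})
billy_follows_harry = Relationship("FOLLOWS", start=billy, end=harry)
```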
Example: Cypher Graph for Social Networks
Graphs Are Everywhere
Graphs are extremely useful in understanding a wide diversity of datasets in fields such as science, government, and business. The real world, unlike the forms-based model behind the relational database, is rich and interrelated: uniform and rule-bound in parts, exceptional and irregular in others. Once we understand graphs, we begin to see them in all sorts of places. Gartner, for example, identifies five graphs in the world of business (social, intent, consumption, interest, and mobile) and says that the ability to leverage these graphs provides a “sustainable competitive advantage.”
For example, Twitter’s data is easily represented as a graph. In Figure 1-1 we see a small network of Twitter users. Each node is labeled User, indicating its role in the network. These nodes are then connected with relationships, which help further establish the semantic context: namely, that Billy follows Harry, and that Harry, in turn, follows Billy. Ruth and Harry likewise follow each other, but sadly, although Ruth follows Billy, Billy hasn’t (yet) reciprocated.
Figure 1-1. A small social graph (slide annotations mark the label, property, direction, and name)
Of course, Twitter’s real graph is hundreds of millions of times larger than the example in Figure 1-1, but it works on precisely the same principles. In Figure 1-2 we’ve expanded the graph to include the messages published by Ruth.
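The follow relationships described for Figure 1-1 can be written down as a set of directed edges. The sketch below does that and finds the one follow that is not reciprocated; the helper name `unreciprocated` is made up for this illustration.

```python
from typing import List, Set, Tuple

# Directed "follows" edges as described in the text: (follower, followed).
follows: Set[Tuple[str, str]] = {
    ("Billy", "Harry"),
    ("Harry", "Billy"),
    ("Ruth", "Harry"),
    ("Harry", "Ruth"),
    ("Ruth", "Billy"),
}


def unreciprocated(edges: Set[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return follows (a, b) for which b does not follow a back."""
    return sorted((a, b) for a, b in edges if (b, a) not in edges)


one_way = unreciprocated(follows)
```

Direction matters here: the edge from Ruth to Billy exists, while the reverse edge does not.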
Another Example: Email Exchange
(email_4)-[:TO]->(davina), (email_4)-[:TO]->(edward);
CREATE (email_5:Email {id:'5', content:'email contents'}),
       (davina)-[:SENT]->(email_5),
       (email_5)-[:TO]->(alice),
       (email_5)-[:BCC]->(bob),
       (email_5)-[:BCC]->(edward);
This leads to the more complex, and interesting, graph we see in Figure 3-10.
Figure 3-10. A graph of email interactions
Query Example
MATCH (bob:User {username:'Bob'})-[:SENT]->(email)-[:CC]->(alias),
      (alias)-[:ALIAS_OF]->(bob)
RETURN email
Result: Node {id:"1", content:"..."}
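To make the pattern in this query concrete, here is a minimal Python sketch that evaluates the same shape (sender SENT an email, the email was CC'd to someone, and that someone is an ALIAS_OF the sender) over an in-memory edge list. The edge triples are invented sample data and the function names are hypothetical; a graph database would evaluate the pattern with its own matching engine.

```python
from typing import List, Tuple

# (source, relationship type, target) triples; invented sample data.
edges: List[Tuple[str, str, str]] = [
    ("bob", "SENT", "email_1"),
    ("email_1", "TO", "charlie"),
    ("email_1", "CC", "alice"),
    ("alice", "ALIAS_OF", "bob"),
    ("bob", "SENT", "email_2"),
    ("email_2", "TO", "davina"),
    ("email_2", "CC", "edward"),
]


def targets(source: str, rel_type: str) -> List[str]:
    """Follow all outgoing relationships of one type from a node."""
    return [dst for src, typ, dst in edges if src == source and typ == rel_type]


def emails_cc_own_alias(sender: str) -> List[str]:
    """Emails the sender sent that were CC'd to one of the sender's aliases."""
    matches = []
    for email in targets(sender, "SENT"):            # (sender)-[:SENT]->(email)
        for alias in targets(email, "CC"):           # (email)-[:CC]->(alias)
            if sender in targets(alias, "ALIAS_OF"):  # (alias)-[:ALIAS_OF]->(sender)
                matches.append(email)
    return matches


suspicious = emails_cc_own_alias("bob")
```

The nested loops mirror the two comma-separated pattern parts in the `MATCH` clause: the shared variable `alias` is what ties them together.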
Creating Graph Data
This first modeling attempt results in a star-shaped graph with Bob at the center. His actions of emailing, copying, and blind-copying are represented by relationships that extend from Bob to the nodes representing the recipients of his mail. As we see in Figure 3-8, however, the most critical element of the data, the actual email, is missing.
Figure 3-8. Missing email node leads to lost information
This graph structure is lossy, a fact that becomes evident when we pose the following query:
MATCH (bob:User {username:'Bob'})-[e:EMAILED]->(charlie:User {username:'Charlie'})
RETURN e
This query returns the EMAILED relationships between Bob and Charlie (there will likely be one for each email that Bob has sent to Charlie). This tells us that emails have been exchanged, but it tells us nothing about the emails themselves:
+----------------+
| e              |
+----------------+
| :EMAILED[1] {} |
+----------------+
1 row
CREATE (alice:User {username:'Alice'}),
       (bob:User {username:'Bob'}),
       (charlie:User {username:'Charlie'}),
       (davina:User {username:'Davina'}),
       (edward:User {username:'Edward'}),
       (alice)-[:ALIAS_OF]->(bob)
MATCH (bob:User {username:'Bob'}),
      (charlie:User {username:'Charlie'}),
      (davina:User {username:'Davina'}),
      (edward:User {username:'Edward'})
CREATE (bob)-[:EMAILED]->(charlie),
       (bob)-[:CC]->(davina),
       (bob)-[:BCC]->(edward)
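The difference between the lossy star-shaped model and a model with an explicit email node can be sketched in a few lines of Python. The triples below are sample data in the spirit of the slides (the `SENT`, `TO`, `CC`, and `BCC` names and the `'email contents'` value appear in the Cypher snippets); the dictionary layout and the function name `emails_between` are assumptions for this illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Lossy model: relationships go straight from sender to recipient,
# so there is no place to keep the email itself.
lossy_edges: List[Triple] = [
    ("bob", "EMAILED", "charlie"),
    ("bob", "CC", "davina"),
    ("bob", "BCC", "edward"),
]

# Richer model: the email is a node of its own, with properties.
emails: Dict[str, Dict[str, str]] = {
    "email_1": {"id": "1", "content": "email contents"},
}
rich_edges: List[Triple] = [
    ("bob", "SENT", "email_1"),
    ("email_1", "TO", "charlie"),
    ("email_1", "CC", "davina"),
    ("email_1", "BCC", "edward"),
]


def emails_between(sender: str, recipient: str) -> List[Dict[str, str]]:
    """Return the email nodes sent by `sender` and addressed TO `recipient`."""
    sent = {dst for src, typ, dst in rich_edges if src == sender and typ == "SENT"}
    return [emails[e] for e in sorted(sent) if (e, "TO", recipient) in rich_edges]


found = emails_between("bob", "charlie")
```

With `lossy_edges` the best available answer is that an `EMAILED` relationship exists; with the email node in place, the same question returns the email's properties.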
Another Example
Figure 3-12. Explicitly modeling replies in high fidelity
MATCH p = (email:Email {id:'6'})<-[:REPLY_TO*1..4]-(:Reply)<-[:SENT]-(replier)
RETURN replier.username AS replier, length(p) - 1 AS depth
ORDER BY depth
(Slide annotation on "p =": path assignment)
Here we capture each matched path, binding it to the identifier p. In the RETURN clause we then calculate the length of the reply-to chain (subtracting 1 for the SENT relationship), and return the replier’s name and the depth at which he or she replied. This query returns the following results:
+-------------------+
| replier   | depth |
+-------------------+
| "Davina"  | 1     |
| "Bob"     | 1     |
| "Charlie" | 2     |
| "Bob"     | 3     |
+-------------------+
4 rows
We see that both Davina and Bob replied directly to Bob’s original email; that Charlie replied to one of the replies; and that Bob then replied to one of the replies to a reply.
It’s a similar pattern for a forwarded email, which can be regarded as a new email that simply happens to contain some of the text of the original email. As with the reply case, we model the new email explicitly. We also reference the original email from the
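The variable-length pattern `[:REPLY_TO*1..4]` can be approximated in Python by walking the reply chain level by level and recording the depth of each reply. The reply structure below is an assumption chosen to be consistent with the result table, since the slide does not list the underlying data; the function name is hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed sample data: each reply points at the message it replies to,
# and each reply has exactly one sender.
reply_to: Dict[str, str] = {
    "reply_1": "email_6",
    "reply_2": "email_6",
    "reply_3": "reply_1",
    "reply_4": "reply_3",
}
sent_by: Dict[str, str] = {
    "reply_1": "Davina",
    "reply_2": "Bob",
    "reply_3": "Charlie",
    "reply_4": "Bob",
}


def repliers_with_depth(root: str, max_depth: int = 4) -> List[Tuple[str, int]]:
    """Return (replier, depth) rows for reply chains of 1..max_depth hops."""
    rows: List[Tuple[str, int]] = []
    frontier = [root]
    for depth in range(1, max_depth + 1):
        # Replies whose REPLY_TO target was reached in the previous step.
        frontier = [msg for msg, parent in reply_to.items() if parent in frontier]
        rows.extend((sent_by[msg], depth) for msg in frontier)
    return rows


rows = repliers_with_depth("email_6")
```

Because each reply has a single parent here, walking level by level yields one row per reply, already ordered by depth. `max_depth` plays the role of the upper bound 4 in the pattern.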
When to use it
• Use it:
  - Connected data, e.g. social graphs, employees and where they worked
  - Location-based services
  - Recommendation engines
• Don't use it:
  - Workloads that change properties on many entities
Strategy Canvas: Example Nintendo Wii
[Figure: Nintendo Wii Strategy Canvas. Vertical axis: offering level (low to high). Factors on the horizontal axis: price, high-resolution graphics, non-gaming functionalities, HDTV capabilities, processing power, online gaming, design aesthetics, available game titles, motion, family-friendly games. Curves: Nintendo Wii and the video game console industry. Annotations: Eliminate, Reduce, Raise, Create.]
Source: INSEAD, Blue Ocean Strategy Institute, 2013.
Redefine the Market
Concluding Remarks on Common NoSQL
• Aim to avoid join & ACID overhead
  - Data is stored already joined (nested within records); correctness is compromised for quick answers; believe in best effort
• Avoid the idea of a schema
• Query languages are more imperative
  - And less declarative
  - The developer knows better what is going on; less reliance on smart optimization plans
  - More responsibility on developers
• No standard, well-studied languages (yet)