Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed...
Transcript of Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed...
![Page 1: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/1.jpg)
Databases 1
Daniel POP
![Page 2: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/2.jpg)
Week 12
![Page 3: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/3.jpg)
Agenda
1. Introduction to NoSQL databases
2. Data representation in NoSQL
– XML
– JSON
3. Key-value database
4. (Wide) Column database
5. Document database
6. Graph database
![Page 4: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/4.jpg)
Introduction to NoSQL databases
![Page 5: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/5.jpg)
Relational databases advantages
1. Reliability
2. Safety
3. ACID: Atomicity, Consistency, Isolation,
Durability
4. Easy, standardized query language (SQL)
![Page 6: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/6.jpg)
Relational databases limitations
1. Scalability
• usually costly scale-up approach
• problems with distributed databases (difficult to
join distributed tables)
2. Impedance mismatch
• different representations in memory and database
3. Complexity
• data as tables, schema-bound
4. Query language (SQL) only for structured data
5. Dealing with semi-/un- structured data
![Page 7: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/7.jpg)
Unstructured data explosition
![Page 8: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/8.jpg)
Semi-structured data examples
![Page 9: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/9.jpg)
Un-structured data examples
![Page 10: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/10.jpg)
What is Big Data?
From a technology perspective, Big Data is defined as
those data sets whose size, type, and speed-of-creation
make them impractical to process and analyse
with traditional database technologies and related tools in
a cost- or time-effective way.
Source: wikibon.org
![Page 11: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/11.jpg)
4 Vs of BigData
• Volume: – 12TB of tweets / day => product sentiment analysis
• Velocity: – fraud detection
– predict customer churn faster (anlyse 500 million daily calls in real-time)
• Variety: – exploit 80% growth of un&semi-structured data for customer satisfaction
• Variability / Veracity:– variance in meaning (1 in 3 business leader don‘t trust the information they
use to make decisions)
Source: Brian Hopkins (Forrester)
![Page 12: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/12.jpg)
Distributed systemsDistributed system is a “collection of independent computers that appear to
the users of the system as a single computer” (Tanenbaum)
![Page 13: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/13.jpg)
Requirements of distributed applications
1. Consistency: all nodes ‘see’ the same data at the same
time
2. Availability: guarantee that every request receives a
response about whether it succeeded or failed
3. Partition tolerance: the system continues to operate
despite arbitrary message loss or failure of part of the
system
![Page 14: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/14.jpg)
Brewer’s CAP Theorem
1. Consistency: all nodes ‘see’ the same data at the same
time
2. Availability: guarantee that every request receives a
response about whether it succeeded or failed
3. Partition tolerance: the system continues to operate
despite arbitrary message loss or failure of part of the
system
Brewer’s (CAP) Theorem: it is impossible for
a distributed computer system to
simultaneously provide all three requirements.
![Page 15: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/15.jpg)
Consistency vs. Availability
More at https://www.youtube.com/watch?v=ASiU89Gl0F0
![Page 16: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/16.jpg)
BASE instead of ACID
1. Basic Availability
2. Soft-state
3. Eventual consistency: informally guarantees that, if
no new updates are made to a given data item,
eventually all accesses to that item will return the last
updated value
![Page 17: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/17.jpg)
NoSQL = Not Only SQL
• Not based on relational model, not using SQL
• No fixed schema, new attributes can be added to
data items at any time (schema-less)
• Designed for distributed, large clusters (scale-
out), including support for distributed processing
(MapReduce)
• Deliver eventual consistency
• Data-model specific query languages
![Page 18: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/18.jpg)
NoSQL: classification and usage pattern
Data
Size
Data
Complexity
Key-value DB
(Wide) Column DB
Document DB
Graph DB
Data model
![Page 19: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/19.jpg)
Visual guide to NoSQL systems
![Page 21: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/21.jpg)
NoSQL brief history
• Carlo Strozzi (1998) – to name his lightweight, open-source,
relational database, but without SQL interface
• Johan Oskarsson and Eric Evans (2009) – meetup on distributed structured data storage; they used #nosql as Twitter hashtag
for this
• Google Bigtable (2006)
• Amazon DynamoDB (2007)
• ....
• Used on large scale at Amazon, Facebook, Google, Twitter etc.
![Page 22: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/22.jpg)
Bibliography
Pramod J. Sadalage, Martin Fowler. NoSQL Distilled.
Addison Wesley, 2012
Watch M. Fowler @ NoSQL matters Conference in
Cologne, Germany 2013
http://martinfowler.com/nosql.html
1. Lorenzo Alberton. NoSQL Databases: why, what and when
2. Yousof Alsatom. Introduction to NoSQL
3. Wikipedia
4. Introduction to NoSQL on w3resource.com
5. http://www.nosql-database.org/ - list, by data model, of NoSQL databases
![Page 23: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/23.jpg)
XML Data
![Page 24: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/24.jpg)
Extensible Markup Language
• Standard for data representation and exchange
• Document format similar to HTML
– Tags describe content instead of formatting
• Also streaming format Nested Elements
•described by tag
•may have attributes
•text
Rules of well-formed XML:
• Single root element
• Matched tags and nesting
• Unique attributes within
elements
![Page 25: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/25.jpg)
Working with XML documents
• XML Parser
– Returns an error message (if XML document is not well
formed) or a ‘hook’ to manipulate the document
– 3 types of parsers
• Document Object Model (DOM) based (e.g. Xerces, libxml2)
• Event base, such as Simple API for XML (SAX) (e.g. Expat, JAXP
API)
• In the middle, Streaming API for XML (StAX) (e.g. Sun Java StAX
XML Processor, Woodstox)
• Displaying XML
– Convert XML document to HTML using CSS (Cascading Style
Sheets) or XSLT (Extensible Stylesheet Language
Transformations)
![Page 26: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/26.jpg)
Validating XML documents
Content-specific specifications
- Document Type Descriptor (DTD)
- XML Schema Definition (XSD)
Tools: LibXML 2 (XML Lint)
![Page 27: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/27.jpg)
Document Type Descriptor (DTD)
Can specify
• Elements,
• Attributes,
• Nesting,
• Ordering,
• Number of occurrences,
• ID and IDREF
![Page 28: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/28.jpg)
Document Type Descriptor (DTD)* = 0 or more
| = or
? = optional
+ = 1 or more
CDATA = string
#PCDATA = text
![Page 29: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/29.jpg)
Document Type Descriptor (DTD)* = 0 or more
| = or
? = optional
+ = 1 or more
CDATA = string
#PCDATA = text
ID = identifier
IDREF = refers to
one identifier
IDREFS = refers to
one or more
identifiers
Source: Jennifer Widom – Database Course, Stanford University
![Page 30: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/30.jpg)
XML Schema Definition (XSD)
Can specify
• Elements,
• Attributes,
• Nesting,
• Ordering,
• Number of occurrences,
• Data types,
• Keys,
• References (typed pointers),
• … and many more
![Page 31: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/31.jpg)
XML Schema Definition (XSD)
Can specify
• Elements,
• Attributes,
• Nesting,
• Ordering,
• Number of occurrences,
• Data types,
• Keys,
• References (typed pointers),
• … and many more
![Page 32: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/32.jpg)
Typed values, occurrences constraints
• Typed values
<xsd:attribute name=“Price” type=“xsd:integer” use=“required”>
• Occurrences constraints
<xsd:element name="Book"
type="BookType”
minOccurs="0"
maxOccurs="unbounded" />
Remark: If minOccurs/maxOccurs is not specified, the default
value used is 1.
![Page 33: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/33.jpg)
Keys and references
• Key declarations
<xsd:key name="BookKey">
<xsd:selector xpath="Book" />
<xsd:field xpath="@ISBN" />
</xsd:key>
• References (typed pointers)
<xsd:keyref name="BookKeyRef" refer="BookKey">
<xsd:selector xpath="Book/Remark/BookRef" />
<xsd:field xpath="@book" />
</xsd:keyref>
![Page 34: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/34.jpg)
Relational vs. XML
Source: Jennifer Widom – Database Course, Stanford University
Relational XML
Structure
Schema
Queries
Ordering
Implementation
![Page 35: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/35.jpg)
Relational vs. XML
Source: Jennifer Widom – Database Course, Stanford University
Relational XML
Structure Tables Hierarchical
(tree, graph)
Schema
Queries
Ordering
Implementation
![Page 36: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/36.jpg)
Relational vs. XML
Source: Jennifer Widom – Database Course, Stanford University
Relational XML
Structure Tables Hierarchical
(tree, graph)
Schema Fixed Flexible
Queries
Ordering
Implementation
![Page 37: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/37.jpg)
Relational vs. XML
Source: Jennifer Widom – Database Course, Stanford University
Relational XML
Structure Tables Hierarchical
(tree, graph)
Schema Fixed Flexible
Queries Simple (SQL) More cumbersome
(XPath / XQuery)
Ordering
Implementation
![Page 38: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/38.jpg)
Relational vs. XML
Source: Jennifer Widom – Database Course, Stanford University
Relational XML
Structure Tables Hierarchical
(tree, graph)
Schema Fixed Flexible
Queries Simple (SQL) More cumbersome
(XPath / XQuery)
Ordering None Implied
Implementation
![Page 39: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/39.jpg)
Relational vs. XML
Source: Jennifer Widom – Database Course, Stanford University
Relational XML
Structure Tables Hierarchical
(tree, graph)
Schema Fixed Flexible
Queries Simple (SQL) More cumbersome
(XPath / XQuery)
Ordering None Implied
Implementation Native 1. XML-enabled
(extensions to
RDBMS)
2. Native XML
![Page 40: Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon](https://reader036.fdocuments.in/reader036/viewer/2022071003/5fc01c88db4ede475d04fe55/html5/thumbnails/40.jpg)
Bibliography
NoSQL Distilled
by Pramod J.
Sadalage and
Martin Fowler,
Addison
Wesley, 2012
Chapter 1
A First Course in
Database Systems
(3rd edition) by
Jeffrey Ullman and
Jennifer Widom,
Prentice Hall, 2007
Chapter 11