Dropping ACID: Wrapping Your Mind Around NoSQL Databases
Kyle Banerjee, Digital Services Program Manager, Orbis Cascade Alliance
Why should anyone care?
Great for the Web
• No schema – easy to store data that are really awkward to work with in RDBMS
• Much easier horizontal scalability than RDBMS
• Works great with huge amounts of data
• High fault tolerance
• Integration of both RESTful and cloud computing technologies
There is no magic
• Databases are fast because they physically structure data so it can be accessed efficiently
• NoSQL achieves performance through tradeoffs that make sense in a Web environment
• RDBMS can be used in high performance applications
• Compromises (e.g. denormalization, sharding) that kill the advantage of having an RDBMS are often necessary
• Technically more complex (i.e. expen$ive) to implement/maintain
What is a NoSQL database?
A nonrelational data store
–Document Store
–Wide Column Store
–Key Value Store
–Graph
–XML
NoSQL databases differ significantly in what they are good for
What’s best depends on your data
[Figure: NoSQL database types plotted by data size against data complexity. Labels: key/value stores, wide column stores, document databases, graph databases]
Your priorities
• What types of queries do you need to support?
• How much data?
• Optimized for reads, writes, or updates?
• Versioning
• How separate is data from app? Will other applications need to access it in future?
And how you want to interact with it
• RESTful interface
• Query API
• Non-SQL query languages
• Via indexed values, keys, nodes
• File access
Key value stores
• Basically a hash
• Focus on scaling to huge amounts of data
• Examples: Amazon SimpleDB, Voldemort, Dynomite, BerkeleyDB, Riak
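The "basically a hash" idea can be sketched with a plain Python dictionary. The class and method names here are hypothetical and are not the API of any product listed above; the point is only that the store treats each value as opaque bytes and can look things up by key alone.

```python
class KeyValueStore:
    """Toy in-memory key-value store: it never looks inside a value."""

    def __init__(self):
        self._data = {}  # key -> opaque bytes

    def put(self, key, value):
        self._data[key] = bytes(value)

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


kv = KeyValueStore()
kv.put("pet:charley", b'{"name": "Charley", "animal_type": "dog"}')
record = kv.get("pet:charley")   # found by key
missing = kv.get("pet:nobody")   # unknown key falls back to the default
```

Note that there is no way to ask "which pets are dogs?" without fetching every value and parsing it in the application.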
Wide column stores
• Somewhat like column oriented relational databases
• Same elements don’t have to have same columns
• Examples: Cassandra, HBase (built on Hadoop)
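A rough way to picture "same elements don't have to have same columns" is a map of row keys to per-row column maps. This is only an in-memory sketch; the column names (including `leg_count`) are invented for illustration, and real wide column stores add column families, timestamps and on-disk layout that are not modeled here.

```python
from collections import defaultdict

# row key -> {column name -> value}; each row carries only the columns it needs
table = defaultdict(dict)

table["charley"]["animal_type"] = "dog"
table["charley"]["likes:powder"] = "dog"
table["charley"]["likes:bo"] = "dog"
table["spidey"]["animal_type"] = "tarantula"
table["spidey"]["leg_count"] = 8  # hypothetical column that only this row has


def columns_of(row_key):
    """Return the sorted column names present in one row."""
    return sorted(table.get(row_key, {}))


charley_columns = columns_of("charley")
spidey_columns = columns_of("spidey")
```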
Document databases
• Like key-value stores, but values have meaning to database
• Examples: CouchDB, MongoDB
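"Values have meaning to the database" means the store can look inside a document and answer questions about its fields. A minimal sketch, assuming documents are plain dictionaries and using a hypothetical `find` helper rather than the query API of CouchDB or MongoDB:

```python
documents = {
    "charley": {"name": "Charley", "animal_type": "dog"},
    "abby": {"name": "Abby", "animal_type": "cat"},
    "bo": {"name": "Bo", "animal_type": "dog"},
}


def find(field, value):
    """Return ids of documents whose top-level field equals value."""
    return sorted(doc_id for doc_id, doc in documents.items()
                  if doc.get(field) == value)


dogs = find("animal_type", "dog")
```

A real document database would typically answer this from an index instead of scanning every document.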
Graph databases
• Uses nodes, relationships between nodes and key-value properties
• Recursive structures in relational DBs require expensive joins
• Examples: Neo4j, VertexDB, AllegroGraph
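The recursive case can be sketched as a breadth-first walk over nodes and typed relationships. The data below is partly invented (the relationships marked "assumed" are not in the slides), and the dictionaries stand in for whatever storage a real graph database uses.

```python
from collections import deque

# node -> key-value properties
nodes = {
    "Charley": {"animal_type": "dog"},
    "Powder": {"animal_type": "dog"},
    "Bo": {"animal_type": "dog"},
    "Abby": {"animal_type": "cat"},
}
# (node, relationship type) -> related nodes
edges = {
    ("Charley", "likes"): ["Powder", "Bo"],
    ("Powder", "likes"): ["Abby"],   # assumed for illustration
    ("Bo", "likes"): ["Charley"],    # assumed; creates a cycle
}


def reachable(start, relationship):
    """Breadth-first walk along one relationship type, excluding the start node."""
    seen = {start}
    queue = deque([start])
    found = []
    while queue:
        current = queue.popleft()
        for neighbor in edges.get((current, relationship), []):
            if neighbor not in seen:
                seen.add(neighbor)
                found.append(neighbor)
                queue.append(neighbor)
    return found


liked_transitively = reachable("Charley", "likes")
```

In a relational schema, each extra hop in this walk would usually be another self-join on the relationship table.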
Things that simplify life
• JSON
• RESTful interface or easy API
• Multiversion Concurrency Control (MVCC)
Traditional RDBMS
animal_type
    animal_id: integer
    description: varchar

pet
    pet_id: integer
    animal_id: integer
    name: varchar

likes
    pet_id: integer
    friend_id: integer

    pet      animal_type  likes   animal_type
    Charley  dog          Powder  dog
    Charley  dog          Bo      dog

hates
    pet_id: integer
    animal_id: integer

    pet      animal_type  hates   animal_type
    Charley  dog          Abby    cat
    Charley  dog          Spidey  tarantula
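To see what the relational layout costs at query time, here is a small sketch using Python's built-in sqlite3 module with the animal_type, pet and likes tables. The id values are assumptions chosen for the example; the query shows that producing the two "likes" rows takes four joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE animal_type (animal_id INTEGER PRIMARY KEY, description VARCHAR);
    CREATE TABLE pet (pet_id INTEGER PRIMARY KEY, animal_id INTEGER, name VARCHAR);
    CREATE TABLE likes (pet_id INTEGER, friend_id INTEGER);
    INSERT INTO animal_type VALUES (1, 'dog');
    INSERT INTO pet VALUES (1, 1, 'Charley'), (2, 1, 'Powder'), (3, 1, 'Bo');
    INSERT INTO likes VALUES (1, 2), (1, 3);
""")

# One row per "likes" relationship, with both pets resolved to name and type
rows = conn.execute("""
    SELECT p.name, pt.description, f.name, ft.description
    FROM likes AS l
    JOIN pet AS p ON p.pet_id = l.pet_id
    JOIN animal_type AS pt ON pt.animal_id = p.animal_id
    JOIN pet AS f ON f.pet_id = l.friend_id
    JOIN animal_type AS ft ON ft.animal_id = f.animal_id
    ORDER BY l.friend_id
""").fetchall()
```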
JSON Example
{
    "name": "Charley",
    "animal_type": "dog",
    "likes": [
        {"name": "Powder", "animal_type": "dog"},
        {"name": "Bo", "animal_type": "dog"}
    ],
    "hates": [
        {"name": "Abby", "animal_type": "cat"},
        {"name": "Spidey", "animal_type": "tarantula"}
    ]
}
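As a quick check that a document like this is usable as-is, the sketch below parses it with Python's standard json module and reads the nested lists directly, with no joins. The text is typed with plain straight quotes, which JSON requires; the variable names are arbitrary.

```python
import json

text = """
{
    "name": "Charley",
    "animal_type": "dog",
    "likes": [
        {"name": "Powder", "animal_type": "dog"},
        {"name": "Bo", "animal_type": "dog"}
    ],
    "hates": [
        {"name": "Abby", "animal_type": "cat"},
        {"name": "Spidey", "animal_type": "tarantula"}
    ]
}
"""

pet = json.loads(text)  # raises json.JSONDecodeError on curly quotes or other bad syntax
liked_names = [friend["name"] for friend in pet["likes"]]
hated_types = [enemy["animal_type"] for enemy in pet["hates"]]
round_trip = json.loads(json.dumps(pet))  # serialize and parse again
```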
Why JSON?
• Lightweight, interoperable and open
• Can be composed in any text editor
• Syntax is crazy easy
• With RESTful API, can be used with any software that supports HTTP (even the user’s browser can make direct DB calls)
• Allows you to send and receive data as it is used
How easy can REST be?
Create: HTTP PUT /db/docid
Read: HTTP GET /db/docid
Update: HTTP POST /db/docid
Delete: HTTP DELETE /db/docid
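The four calls can be sketched with Python's standard urllib. The code only builds the requests and never sends them; the host, port, database name and document id are placeholders, and the verb mapping simply follows the list above. Verb conventions differ between products, so check the documentation of the store you use.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:5984"  # assumption: placeholder host and port

# Verb mapping exactly as listed on the slide
VERBS = {"create": "PUT", "read": "GET", "update": "POST", "delete": "DELETE"}


def build_request(action, db, doc_id, document=None):
    """Build (but do not send) the HTTP request for one CRUD action."""
    body = None
    headers = {}
    if document is not None:
        body = json.dumps(document).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(f"{BASE_URL}/{db}/{doc_id}", data=body,
                   headers=headers, method=VERBS[action])


create_req = build_request("create", "pets", "charley", {"name": "Charley"})
read_req = build_request("read", "pets", "charley")
```

Passing one of these objects to `urllib.request.urlopen` would send it to a running server.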
MVCC in a nutshell
• Creates new version each time an update is made
• Timestamps used to prevent conflicts
• Reads are always possible
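A minimal sketch of the idea, assuming an integer revision counter stands in for the timestamp: every write appends a new version instead of overwriting, a write based on a stale revision is rejected, and reads always return the newest committed version. The class and method names are hypothetical and not taken from any particular database.

```python
class ConflictError(Exception):
    """Raised when a writer's view of the document is out of date."""


class VersionedStore:
    def __init__(self):
        self._versions = {}  # doc_id -> list of (revision, value), oldest first

    def read(self, doc_id):
        """Return (revision, value) of the newest version; never waits on writers."""
        return self._versions[doc_id][-1]

    def write(self, doc_id, value, based_on=0):
        """Append a new version if based_on matches the current revision."""
        history = self._versions.get(doc_id, [])
        current = history[-1][0] if history else 0
        if based_on != current:
            raise ConflictError(f"expected revision {current}, got {based_on}")
        self._versions[doc_id] = history + [(current + 1, value)]
        return current + 1

    def history(self, doc_id):
        """Return every stored version; old versions are kept, not overwritten."""
        return list(self._versions[doc_id])


mvcc_store = VersionedStore()
rev1 = mvcc_store.write("charley", {"likes": ["Powder"]})
rev2 = mvcc_store.write("charley", {"likes": ["Powder", "Bo"]}, based_on=rev1)
```

A second writer who still holds `rev1` would get a `ConflictError` and has to re-read before retrying.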
Disadvantages of NoSQL
• Performance and scalability achieved at the expense of feature support
• No joins. Grouping and ordering become more problematic
• No SQL
• No transactions
• Eventual consistency vs strict consistency
• Tools are often lacking
The bottom line
• In a library context, NoSQL is appropriate when flexible schema or fast displays that contain related data are needed
• Understand the problem at hand as well as the pros/cons of your options before deciding on a solution
• Don’t ditch your RDBMS