NoSQL and Architectures
-
Upload
eberhard-wolff -
Category
Technology
-
view
15.107 -
download
10
description
Transcript of NoSQL and Architectures
Eberhard Wolff - @ewolff
NoSQL & Architectures
Eberhard Wolff @ewolff
Eberhard Wolff - @ewolff
About me
► Eberhard Wolff ► Freelance consultant ► Head technology advisory board at
adesso ► Speaker ► Author
► Blog: http://ewolff.com ► Twitter: @ewolff
Eberhard Wolff - @ewolff
Back in the Days….
Eberhard Wolff - @ewolff
NoSQL Is All About the Persistence Question
Eberhard Wolff - @ewolff
Key-Value Stores
► Maps keys to values ► Just a large globally available Map ► i.e. not very powerful data model
► No complex queries or indices ► Just access by key ► Might add e.g. full text engine
► Redis: Cache + Persistence ► Riak: Massive scale +Solr queries
Key Value 42 Some
data
Eberhard Wolff - @ewolff
Wide Column ► Add any "column" you like to a row ► key-(column-value) ► Column families like tables ► E.g. in the "Users" column family
> "someuser" è ("username"è"someuser"), ("email" è"[email protected]")
► Columns named: indexing possible ► So fast queries possible
► Apache Cassandra ► Amazon SimpleDB ► Apache HBase ► All tuned for large data sets
XX
XX XX XX XX
XX XX XX
XX XX XX XX
XX XX XX XX
XX XX XX XX
XX XX XX XX
XX XX
XX XX XX
XX XX XX
XX XX XX
XX XX XX XX
XX XX XX XX
XX xX XX XX XX
Eberhard Wolff - @ewolff
Document Stores
► Aggregates are typically stored as "documents“ (key-value collection) ► JSON quite common ► No fixed schema ► Indexes possible ► Queries possible
> E.g. "find all baskets that contain the product 123"
► Still great horizontal scalability ► Relations might be modeled as links
► MongoDB, CouchDB
Eberhard Wolff - @ewolff
Graph ► Nodes with Properties ► Typed relationships with
properties
► Ideal e.g. to model relations in a social network
► Easy to find number of followers, degree of relation etc.
► Hard to scale out
► Neo4j
Eberhard Wolff - @ewolff
NoSQL Benefits Costs • Scale out instead of Scale Up • Cheap Hardware • Usually Open Source
Flexibility • Schema in code not in
database • Easier to upgrade schema • Easier to handle
heterogeneous data No Object/relational impedance mismatch • NoSQL database are more OO like
Ops
Dev
Eberhard Wolff - @ewolff
Drivers Exponential Data
Growth
More Connected Data
Semi Structured Data
Scale Out
Key Value
Wide Column
Document
Graph
Cost
Flexibility
Eberhard Wolff - @ewolff
Document-oriented Databases are the best NoSQL database
For at least one definition of “best”
Eberhard Wolff - @ewolff
Document-oriented databases ► Offer scale out
> Unless you need huge amounts of data
► Offer a rich and flexible data model > …and queries
► Other databases have other sweet spots
> Huge data sets > Graph structures > Analyzing data
► Niches or mainstream?
Cost
Flexibility
Eberhard Wolff - @ewolff
Financial System
► Different financial products
► Mapping objects / database
► Inheritance
Eberhard Wolff - @ewolff
E/R Model
Investment
Stock Zero Bond Option
Country > 20 database tables Up to 25 attributes
Currency
Eberhard Wolff - @ewolff
#SRLSY??
Eberhard Wolff - @ewolff
Investment Type ID
Zero Bond
Interest Rate
Fixed Rate Bond
Interest Rate
Stock Option
…
Preferred Underlying asset
Country Price Country
Currency
Eberhard Wolff - @ewolff
Polyglot Persistence in Ecommerce Application
Financial Data
RDBMS
Product Catalog
Document Store
Shopping Cart
Key / Value
Recommendation
Graph
Needs transactions & reports. Data fit well in tables.
High Performance & Scalability No complex queries
Complex document-like data structures and complex queries
Based on friends, their purchases and reviews
Eberhard Wolff - @ewolff
The NoSQL Game
Financial Data
RDBMS
Product Catalog
Document Store
Shopping Cart
Key / Value
Recommendation
Graph
Needs transactions & reports. Data fit well in tables.
High Performance & Scalability No complex queries
Complex document-like data structures and complex queries
Based on friends, their purchases and reviews
0 1000
900 800
2700 High Score!
Eberhard Wolff - @ewolff
Just Like the Patterns Game! Points for each Pattern used Extra points if one class implements multiple Pattern
Eberhard Wolff - @ewolff
This is not howSoftware Architecture works.
Eberhard Wolff - @ewolff
Why not? More is worse!
More hardware More Ops Trouble • Installation • Backup • Disaster Recovery • Monitoring • Optimizations
More Developer Skills Not necessarily bad
Eberhard Wolff - @ewolff
But: Polyglot Persistence Has a Point
► Object-oriented Databases did it wrong ► Strategy: Replace RDBMS ► Enterprises will stick to RDBMS ► Pure technology migration basically
never happens ► …only vendors think differently
Eberhard Wolff - @ewolff
Archive
Current Data
RDBMS
Archive
Document Store
Classic approach for current data NoSQL for the archive
Eberhard Wolff - @ewolff
Archives for Insurances
► Legacy migration ► Querying and visualizing not migrated
data ► i.e. old contracts ► Legacy hard- and software can be
switched off ► Flexibility: Host data formats ► Cost: Inexpensively handling large data
volumes
Eberhard Wolff - @ewolff
Complex Document Processing System
MongoDB Redis elastic search
Document-oriented
Meta Data for quick access
Search index
Documents
Key/value in memory Search
engine
Eberhard Wolff - @ewolff
Alternative: Only elasticsearch
elastic search
• Stores original documents as well
• (like a key/value store) • Support for complex queries • Very powerful features also for
data mining / analytics • Not well suited for update heavy
operations • Backup / disaster recovery? • Written in Java
Eberhard Wolff - @ewolff
Scaling elasticsearch
Server Server Server
Shard 1 Replica 1
Replica 2 Shard 2
Replica 3 Shard 3
Eberhard Wolff - @ewolff
Alternative: Only MongoDB
MongoDB
• Now with (limited beta) fulltext search
• Excellent support for updates • Quite fast – memory mapped
files • Also fast for updates • Disaster recovery possible • Map/Reduce support • Written in C++
Eberhard Wolff - @ewolff
Scaling MongoDB
Replica 1
Shard 1
Replica 2
Replica 3
Shard 2
Replica 1
Replica 2
Replica 3
Eberhard Wolff - @ewolff
Scaling MongoDB
Replica 1
Shard 1
Replica 2
Replica 3
Replica 1
Shard 2
Replica 2
Replica 3
Replica 1
Shard 3
Replica 2
Replica 3
Eberhard Wolff - @ewolff
What about Redis?
Redis
• MongoDB uses memory mapped files – Why Redis?
• Like a Swiss Knife • Cache • Messaging • Central coordination in a
distributed environment • Written in C
Eberhard Wolff - @ewolff
Scaling Redis
Asynchronous replication built in
Server
Replica
Replica
Eberhard Wolff - @ewolff
Alternative: Riak
• Key / value store • But includes Solr for fulltext
search • What is the difference to a
document store then? • Map/reduce possible • Written in Erlang • Smart scaling
Eberhard Wolff - @ewolff
Scaling Riak
Server A Shard1 Shard3
Shard4
Server B Shard2 Shard1
Shard4
Server D Shard4 Shard2
Shard3
Server C Shard3 Shard2
Shard1
Eberhard Wolff - @ewolff
Scaling Riak
Server A Shard1 Shard3
Shard4
Server B Shard2 Shard1
Shard4
Server D Shard4 Shard2
Shard3
Server C Shard3 Shard2
Shard1
Eberhard Wolff - @ewolff
Scaling Riak
Server A Shard1 Shard3
Shard4
Server B Shard2 Shard1
Shard4
Server D Shard4 Shard2
Shard3
Server C Shard3 Shard2
Shard1
New Server
Eberhard Wolff - @ewolff
Document-oriented Databases are the best NoSQL database
For at least one definition of “best”
Key/Value!
Eberhard Wolff - @ewolff
Your Choice – a trade off!
Typical architecture decision
MongoDB Redis elastic search riak
Eberhard Wolff - @ewolff
Data Access: RDBMS
RDBMS
• Schema • Stored Procedures
Data Model
Data Access • Queries • Other code
Architect/ Developer
Optimizations
• Indices • Tables
spaces No need to change code • …
DBA
Eberhard Wolff - @ewolff
RDBMS separate data from data access
Joins and normalization allow flexible data access
patterns
Indices
Eberhard Wolff - @ewolff
Sacrifice Joins for Scalability ► Join: Combine tables to retrieve results ► Need transactions spanning multiple
tables ► Example: Customer table + addresses ► Inserts need locks and consistency
across both tables
► Limits scalability ► Global and distributed locks are nasty ► Consistency limits either availability or
partition tolerance
Eberhard Wolff - @ewolff
CAP Theorem
► Consistency > All nodes see the same data > Not the ACID Consistency
► Availability > Node failure do not prevent survivors from operating
► Partition Tolerance > System continues to operate despite arbitrary message loss
► Can at max have two ► Or rather: If network fail – choose A or C.
C
P A
Eberhard Wolff - @ewolff
Consistency
Partition Tolerance
Availability
RDBMS
2 Phase Commit
DNS Replication
Quorum
CAP Theorem
Eberhard Wolff - @ewolff
BASE ► Basically Available Soft state Eventually consistent ► I.e. trade consistency for availability
► Pun concerning ACID… ► Not the same C, however!
Eberhard Wolff - @ewolff
BASE ► Eventually consistent ► If no updates are sent for a while all previous updates will eventually propagate through the system ► Then all replicas are consistent ► Can deal with network partitioning: Message will be transferred later ► All replicas are always available
► Pun concerning ACID… ► Not the same C, however!
Eberhard Wolff - @ewolff
Banking is BASE
► ATMs relax rules on providing cash if network partitioned
► Your account is only guaranteed to be consistent by the end of the year
Eberhard Wolff - @ewolff
No Joins - What now? ► Customer and addresses must be
consistent! ► Solution: Store both as one entity ► Atomic changes easily possible ► Queries might be distributed across
multiple notes
► “NoSQL does not support transactions / ACID” is wrong
> NoSQL does not support Joins is better > Atomic changes still possible > Schema design different
Eberhard Wolff - @ewolff
Data Access MongoDB
MongoDB
• Influences access patterns
Data Model
Data Access • WriteConcerns
how much do love your data?
• Shard key • Consistency
Architect/ Developer
Optimizations
• Only basic indices
Other optimizations must be done incode
DBA
Eberhard Wolff - @ewolff
Cluster: RDBMS
► Transparent to developers
► How many nodes?
► A special setup of hardware and RDBMS software
DBA
Eberhard Wolff - @ewolff
Cluster: MongoDB ► CAP theorem
> If the network is down choose > Consistency xor > Availabilty
► Deals with replication ► MongoDB has
master / slave replication
Architect/ Developer
MongoDB
► Write Concerns: > Unacknowledged > Acknowledged > Journaled > Some nodes in the replica set
► Queries might go to master only or also slaves
► Influences consistency
Eberhard Wolff - @ewolff
More Power and more Responsibility
DB Admin
Architect
Eberhard Wolff - @ewolff
Architects ► Architecture has always been a multi-dimensional problem
► Need to choose persistence technology
► Need to think about operations
► Needs to do DBA work
Eberhard Wolff - @ewolff
NoSQL Is All About the Persistence Question