NoSQL & DataGrids from a Developer Perspective
Cyrille Le Clerc - Michaël Figuière
Speaker
Cyrille Le Clerc
@cyrilleleclerc
blog.xebia.fr
Apache CXF, DataGrids, Large Scale
Speaker
Michaël Figuière
@mfiguiere
blog.xebia.fr
Search Engines, NoSQL, Distributed Systems
About NoSQL
No SQL → Not Only SQL → Not Only Relational
Once upon a time...
On the Web side
• Huge amount of data
• High availability
• Fault tolerance
• Scalability on commodity hardware
Similar needs for Web giants:
- Amazon: created Dynamo; less than 40 min of unavailability per year
- Google: created BigTable & MapReduce; stores every web page of the Internet
Amazon: the birth of Dynamo
Fill cart → Checkout → Payment → Process order → Prepare → Send
Requires high availability; a key-value store is enough
Requires complex requests; temporary unavailability is acceptable
On the Financial side
• Very low latency
• Rich queries & transactions
• Scalability
• Data consistency
Needs within the financial market:
- Tangosol: released Coherence in 2001; started as a distributed cache
- GigaSpaces: released GigaSpaces XAP in 2001; routes the request inside the data
Data Partitioning and Replication
Use Case: Train Ticketing System
With trains, stations, seats, bookings and passengers
Store everything in a mainframe!
Up to 3 TB of RAM! More than $1,000,000
IBM z11
Data Partitioning
Split data for scalability: instead of one mainframe, small servers each hold one partition (partition alpha, partition beta, partition gamma)
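To make "split data" concrete, here is a minimal sketch of routing each key to one of the small servers, assuming simple hash-modulo partitioning (many products use consistent hashing instead). The class name, the partition names and the train codes are hypothetical.

```java
import java.util.List;

/** Hypothetical router: maps a key (e.g. a train code) to one of N partitions. */
final class HashPartitioner {
    private final List<String> partitions;

    HashPartitioner(List<String> partitions) {
        this.partitions = List.copyOf(partitions);
    }

    /** floorMod keeps the index non-negative even when hashCode() is negative. */
    String partitionFor(String key) {
        int index = Math.floorMod(key.hashCode(), partitions.size());
        return partitions.get(index);
    }

    public static void main(String[] args) {
        HashPartitioner router = new HashPartitioner(List.of("alpha", "beta", "gamma"));
        for (String trainCode : List.of("TGV-6201", "TGV-6203", "ICE-9553")) {
            System.out.println(trainCode + " -> partition " + router.partitionFor(trainCode));
        }
    }
}
```

With this naive scheme, changing the number of partitions moves most keys to a different server, which is one reason consistent hashing is popular.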
Data Replication
Duplicate data for high availability and scalability: partition alpha is replicated and kept in sync on Node 1, Node 2 and Node 3
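One simple way to decide which nodes hold the copies of a partition can be sketched as follows, assuming a ring-style rule (partition i lives on node i and the next replicationFactor - 1 nodes, in the spirit of Dynamo). The rule and all names are assumptions; with 3 nodes and a replication factor of 3, every partition lands on all three nodes, as in the figure.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical placement: partition i goes to nodes i, i+1, ... (wrapping around the ring). */
final class ReplicaPlacement {
    static List<String> replicasFor(int partition, List<String> nodes, int replicationFactor) {
        if (replicationFactor > nodes.size()) {
            throw new IllegalArgumentException("not enough nodes for the replication factor");
        }
        List<String> replicas = new ArrayList<>();
        for (int i = 0; i < replicationFactor; i++) {
            replicas.add(nodes.get(Math.floorMod(partition + i, nodes.size())));
        }
        return replicas;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("Node 1", "Node 2", "Node 3");
        // Partition alpha (index 0) with 3 replicas: prints [Node 1, Node 2, Node 3]
        System.out.println(replicasFor(0, nodes, 3));
    }
}
```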
Partitioned Data Modeling
Typical relational data model:
- TrainStop (date)
- TrainStation (code, name)
- Train (code, type)
- Seat (number, price)
- Booking (reduction)
- Passenger (name)
Partitioned Data Modeling
Find the root entity and denormalize:
- Root entity: Train (code, type)
- Partitioning-ready entity tree under Train: TrainStop (date), Seat (number, price), Booking (reduction), Passenger (name)
- Reference data, duplicated in each partition: TrainStation (code, name)
Partitioned Data Modeling
Remove unused data: Booking (reduction) and Passenger (name) collapse into a "booked" flag on Seat
Sharding-ready data structure:
- Train (code, type): root entity
- TrainStop (date) and Seat (number, price, booked): children of Train
- TrainStation (code, name): reference data
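The final structure can be sketched as Java types, assuming Java 16+ records. Field names follow the slide (code, type, date, number, price, booked); the field types, the station link on TrainStop and the freeSeats helper are hypothetical. Train is the root entity and its code is the partition key, so a train with all its stops and seats lives in one partition, while TrainStation is small reference data copied into every partition.

```java
import java.util.List;

/** Reference data: small and rarely updated, duplicated in each partition. */
record TrainStation(String code, String name) {}

/** Child entities: always reached through their Train, so they share its partition. */
record TrainStop(TrainStation station, String date) {}
record Seat(int number, double price, boolean booked) {}

/** Root entity; Booking and Passenger have been reduced to Seat.booked. */
record Train(String code, String type, List<TrainStop> stops, List<Seat> seats) {
    /** The root's key decides the partition of the whole entity tree. */
    String partitionKey() {
        return code;
    }

    /** Answered inside a single partition, with no cross-partition join. */
    long freeSeats() {
        return seats.stream().filter(seat -> !seat.booked()).count();
    }
}
```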
Consistency, Availability and Partition Tolerance
Data Consistency with replicas
{ "name": "Barbie Computer", "price": 15.50, "tags" : [ "doll", "barbie" ]}
Write to all: the document is written on Node 1, Node 2 and Node 3
Read from one: any single node can serve the read
Data Consistency with replicas
Write to one: the document is written on a single node
Read from all: the read queries Node 1, Node 2 and Node 3
Data Consistency with replicas
• You can tune the balance between the number of replicas written and the number of replicas read
• See Eventual Consistency
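That balance is often expressed with the quorum rule used by Dynamo-style stores (Riak and Cassandra expose the corresponding settings): with N replicas, a write acknowledged by W of them and a read querying R of them are guaranteed to overlap on at least one up-to-date replica when R + W > N. The sketch only checks that arithmetic for strict quorums; the class name is hypothetical.

```java
/** Quorum arithmetic: N replicas, W write acknowledgements, R replicas read. */
final class Quorum {
    static boolean readsSeeLatestWrite(int n, int w, int r) {
        if (w < 1 || r < 1 || w > n || r > n) {
            throw new IllegalArgumentException("need 1 <= W, R <= N");
        }
        // Any read set intersects any write set when R + W > N
        return r + w > n;
    }

    public static void main(String[] args) {
        int n = 3;
        System.out.println("write to all, read from one: " + readsSeeLatestWrite(n, 3, 1)); // true
        System.out.println("write to one, read from all: " + readsSeeLatestWrite(n, 1, 3)); // true
        System.out.println("majority writes and reads:   " + readsSeeLatestWrite(n, 2, 2)); // true
        System.out.println("write to one, read from one: " + readsSeeLatestWrite(n, 1, 1)); // false: eventual consistency
    }
}
```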
Data Consistency with Multiple Data Centers
West Coast and East Coast both store:
{ "name": "Barbie Computer", "price": 15.50, "tags" : [ "doll", "barbie" ]}
• West Coast: set price to $20.00. Propagation delay! East Coast still reads $15.50
• East Coast, meanwhile: add tag "girl". The two versions now conflict: a reconciliation API is needed!
- West Coast: { "name": "Barbie Computer", "price": 20.00, "tags" : [ "doll", "barbie" ]}
- East Coast: { "name": "Barbie Computer", "price": 15.50, "tags" : [ "doll", "barbie", "girl" ]}
• The same divergence happens with network partitioning between the two data centers
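What a reconciliation API has to decide can be sketched as an application-level merge of the two conflicting versions (Dynamo-style stores can hand both versions, or "siblings", back to the application). This sketch assumes a hypothetical rule: last writer wins on the price, based on a hypothetical priceUpdatedAt logical timestamp carried by each version, and union on the tags.

```java
import java.util.Set;
import java.util.TreeSet;

/** One data center's version of the item; priceUpdatedAt is a hypothetical logical timestamp. */
record ItemVersion(String name, double price, long priceUpdatedAt, Set<String> tags) {}

final class Reconciler {
    /** Hypothetical merge rule: last writer wins on the price, union on the tags. */
    static ItemVersion merge(ItemVersion a, ItemVersion b) {
        ItemVersion latestPrice = a.priceUpdatedAt() >= b.priceUpdatedAt() ? a : b;
        Set<String> tags = new TreeSet<>(a.tags());
        tags.addAll(b.tags()); // a plain union cannot express "tag removed": known limitation
        return new ItemVersion(a.name(), latestPrice.price(), latestPrice.priceUpdatedAt(), tags);
    }

    public static void main(String[] args) {
        ItemVersion west = new ItemVersion("Barbie Computer", 20.00, 2L, Set.of("doll", "barbie"));
        ItemVersion east = new ItemVersion("Barbie Computer", 15.50, 1L, Set.of("doll", "barbie", "girl"));
        System.out.println(merge(west, east));
    }
}
```

Both changes survive the merge ($20.00 and the "girl" tag). Another business rule (for example, the lowest price wins) would be equally valid, which is why the decision is left to the application.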
Data Consistency with Multiple Data Centers
World-wide replication for the financial market: Tokyo, New York, London
CAP Theorem
Consistency
Availability
Partition Tolerance
A distributed storage system can achieve at most 2 of these 3 properties at the same time
CAP Theorem
Relational DB and NoSQL DB placed on the Consistency / Availability / Partition Tolerance triangle; all three at once: impossible
Data models & APIs
Request Driven Data Modeling
• Relational data modeling is business driven
• With partitioning, data modeling has to be adapted to requests
• NoSQL & DataGrids data modeling is request driven
Adaptation to requests comes with tuning, because network latency matters
Two different requests may require storing the same data twice
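For instance, if the application must answer both "bookings of a train" and "bookings of a passenger" with a single lookup, each booking can be written under both keys. A minimal in-memory sketch: the Booking type and the two maps are hypothetical stand-ins for two tables or column families.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical booking, denormalized into two request-oriented "tables". */
record Booking(String trainCode, int seatNumber, String passengerName) {}

final class BookingStore {
    private final Map<String, List<Booking>> byTrain = new HashMap<>();
    private final Map<String, List<Booking>> byPassenger = new HashMap<>();

    /** One logical write = two physical writes, because no join is available at read time. */
    void save(Booking booking) {
        byTrain.computeIfAbsent(booking.trainCode(), k -> new ArrayList<>()).add(booking);
        byPassenger.computeIfAbsent(booking.passengerName(), k -> new ArrayList<>()).add(booking);
    }

    List<Booking> bookingsOfTrain(String trainCode) {
        return byTrain.getOrDefault(trainCode, List.of());
    }

    List<Booking> bookingsOfPassenger(String passengerName) {
        return byPassenger.getOrDefault(passengerName, List.of());
    }
}
```

Reads become single lookups; the cost moves to writes and to consistency, since in a real store the two writes are usually not atomic.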
Key-Value Store
In memory
Persistent
In memory with async persistence
Example with a user profile
Key "johndoe" → value: user profile as byte[]
Similar to a Java HashMap
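The HashMap analogy can be made concrete: the store only sees an opaque key and an opaque byte[] value, so it can get and set by key but cannot query inside the profile. A minimal in-memory sketch; the class is hypothetical and serialization is reduced to UTF-8 text for brevity.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory key-value store: opaque byte[] values, get/put by key only. */
final class InMemoryKeyValueStore {
    private final Map<String, byte[]> data = new ConcurrentHashMap<>();

    void put(String key, byte[] value) {
        data.put(key, value.clone()); // defensive copy: callers cannot mutate stored bytes
    }

    Optional<byte[]> get(String key) {
        return Optional.ofNullable(data.get(key)).map(byte[]::clone);
    }

    public static void main(String[] args) {
        InMemoryKeyValueStore store = new InMemoryKeyValueStore();
        store.put("johndoe", "{\"firstName\":\"John\",\"lastName\":\"Doe\"}".getBytes(StandardCharsets.UTF_8));
        store.get("johndoe")
             .map(bytes -> new String(bytes, StandardCharsets.UTF_8))
             .ifPresent(System.out::println);
    }
}
```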
Write Example with Riak
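// Legacy Riak Java client, HTTP interface
RiakClient riak = new RiakClient("http://server1:8098/riak");
// serializer: application-provided object turning the profile into a byte[] (not shown)
RiakObject userProfileObj = new RiakObject("bucket", "johndoe", serializer.serialize(userProfile));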
riak.store(userProfileObj);
Inserts a user profile into Riak
Read Example with Riak
FetchResponse response = riak.fetch("bucket", "johndoe");
if (response.hasObject()) {
userProfileObj = response.getObject();
}
Fetch a user profile using its key in Riak
Column Families Store
Relational DB vs. column families DB
For each row ID, we have a list of key-value pairs
Key-value pairs are sorted by keys
Example with a shopping cart (one row per user, one column per item, named after the time it was added)
johndoe: 17:21 Iphone | 17:32 DVD Player | 17:44 MacBook
willsmith: 6:10 Camera | 8:29 Ipad
pitdavis: 14:45 PlayStation | 15:01 Asus EEE | 15:03 Iphone
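The "sorted by keys" property is what makes slices, and therefore pagination, cheap. The shopping cart column family can be modeled in memory as a map of sorted maps (row key → columns sorted by name). This is only an analogy with hypothetical names, not how Cassandra stores data on disk.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of a column family: row key -> columns sorted by column name. */
final class ColumnFamilyModel {
    private final Map<String, TreeMap<String, String>> rows = new HashMap<>();

    void insert(String rowKey, String columnName, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(columnName, value);
    }

    /** Slice: at most count columns of one row, from column name start onwards, in sorted order. */
    List<Map.Entry<String, String>> slice(String rowKey, String start, int count) {
        List<Map.Entry<String, String>> result = new ArrayList<>();
        TreeMap<String, String> columns = rows.get(rowKey);
        if (columns == null) return result;
        for (Map.Entry<String, String> column : columns.tailMap(start, true).entrySet()) {
            if (result.size() == count) break;
            result.add(Map.entry(column.getKey(), column.getValue())); // immutable copy
        }
        return result;
    }

    public static void main(String[] args) {
        ColumnFamilyModel cart = new ColumnFamilyModel();
        cart.insert("johndoe", "17:44", "MacBook");
        cart.insert("johndoe", "17:21", "Iphone");
        cart.insert("johndoe", "17:32", "DVD Player");
        System.out.println(cart.slice("johndoe", "", 2)); // [17:21=Iphone, 17:32=DVD Player]
    }
}
```

Column names compare as strings here, so times should be zero-padded ("06:10") to sort chronologically across all hours.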
Write Example with Cassandra
// Hector client for Cassandra
Cluster cluster = HFactory.getOrCreateCluster("cluster", new CassandraHostConfigurator("server1:9160"));
Keyspace keyspace = HFactory.createKeyspace("EcommerceKeyspace", cluster);
StringSerializer stringSerializer = StringSerializer.get();
Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);
// Row key "johndoe", column name "14:21", column value "Iphone"
mutator.insert("johndoe", "ShoppingCartColumnFamily", HFactory.createStringColumn("14:21", "Iphone"));
Inserts a column into the ShoppingCartColumnFamily
Read Example with Cassandra
SliceQuery<String, String, String> query = HFactory.createSliceQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
// Empty start and finish: no bounds; false: not reversed; at most 10 columns
query.setColumnFamily("ShoppingCartColumnFamily")
     .setKey("johndoe")
     .setRange("", "", false, 10);
QueryResult<ColumnSlice<String, String>> result = query.execute();
Reads a slice of 10 columns from ShoppingCartColumnFamily
Document Store
Example with a catalog item
Key "item_1" → { "name": "Iphone", "price": 559.0, "vendor": "Apple", "rating": 4.6, "tags": [ "phone", "touch" ]}
The database is aware of the document's fields and can offer complex queries
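To illustrate what being "aware of the document's fields" buys, the sketch models documents as field-name → value maps and evaluates a field predicate similar in spirit to the MongoDB query { price: { $lt: 600 } }. It is a hypothetical in-memory analogy (Java 16+ for the instanceof pattern); a real document store evaluates such queries server-side and can use indexes.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical in-memory collection: documents are field-name -> value maps. */
final class DocumentCollection {
    private final List<Map<String, Object>> documents;

    DocumentCollection(List<Map<String, Object>> documents) {
        this.documents = List.copyOf(documents);
    }

    /** Roughly { field: { $lt: bound } }: documents whose numeric field is below bound. */
    static Predicate<Map<String, Object>> lessThan(String field, double bound) {
        return doc -> doc.get(field) instanceof Number n && n.doubleValue() < bound;
    }

    List<Map<String, Object>> find(Predicate<Map<String, Object>> query) {
        return documents.stream().filter(query).collect(Collectors.toList());
    }
}
```

Documents without a price field simply do not match, which mirrors the schemaless nature of the store.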
Write Example with MongoDB
// Legacy MongoDB Java driver (2.x API); "mongos_1" is the host of a mongos router
Mongo mongo = new Mongo("mongos_1", 27017);
DB db = mongo.getDB("Ecommerce");
DBCollection catalog = db.getCollection("Catalog");
BasicDBObject doc = new BasicDBObject();
doc.put("name", "Iphone");
doc.put("price", 559.0);
catalog.insert(doc);
Inserts an item document into MongoDB
Read Example with MongoDB
BasicDBObject query = new BasicDBObject();
query.put("price", new BasicDBObject("$lt", 600));
DBCursor cursor = catalog.find(query);
while (cursor.hasNext()) {
    System.out.println(cursor.next());
}
Queries for all items with a price lower than 600
In Memory Data Grids
Example: train booking with IBM eXtreme Scale
With Data Grids, sub-entities can have cross relations
@Entity(schemaRoot=true)
public class Train {
    @Id String code;
    @Index @Basic String name;
    @OneToMany(cascade=CascadeType.ALL)
    List<Seat> seats = new ArrayList<Seat>();
    @Version int version;
    ...
}
Entity tree: Train (code, type) → TrainStop (date), Seat (number, price, booked)
Write Example with IBM eXtreme Scale
void persist(Train train) {
    entityManager.persist(train);
}
Inserts a train into eXtreme Scale
eXtreme Scale provides a JPA-style API
Read Example with IBM eXtreme Scale
/** Find by key */
Train findById(String id) {
    return (Train) entityManager.find(Train.class, id);
}

/** Query Language */
Train findByTrain(String code) {
    Query q = entityManager.createQuery("select t from Train t where t.code=:code");
    q.setParameter("code", code);
    return (Train) q.getSingleResult();
}
Simple and complex queries with eXtreme Scale
More APIs
• Another Java EE versus Spring battle? JSR 347 Data Grids vs. Spring Data
Unified API on top of relational, document, column, key-value?
Object to tuple projection API
Transactions
• NoSQL usually means NO transactions
• Except when it means eXtreme Transactions !
Transaction Concurrency
Warehouse stocks: 231, 264, 2, 637, 121, 311, 12
Shopping carts ordered concurrently:
- canon-eos: 1, ipod: 1, headphone: 1, iphone: 1, ...
- ipad: 1, iphone: 1
- barbie: 1, iphone: 1, cabbage-doll: 1
Concurrency on iphone
Place order: cancel the order if one product is missing
SQL Transactions
Warehouse stocks: 231, 264, 2, 637, 121, 311, 12
Shopping carts: (canon-eos: 1, ipod: 1, headphone: 1, iphone: 1, ...), (ipad: 1, iphone: 1), (barbie: 1, iphone: 1, cabbage-doll: 1)
Place order:
begin
for each shoppingCart.product
    select for update ...
    update ...
commit
Lock duration = f(shoppingCart.length); if there are too many row locks, the whole table gets locked!
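The "select for update" pattern can be imitated in memory to see why lock duration grows with the cart length: every stock row touched stays locked until the order commits or is cancelled. A hypothetical sketch with one ReentrantLock per product; locks are taken in sorted product order, a common trick to avoid deadlocks between two carts (databases instead detect deadlocks and abort one transaction).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical warehouse: one lock per stock "row", taken like SELECT ... FOR UPDATE. */
final class LockingWarehouse {
    private final Map<String, Integer> stocks = new ConcurrentHashMap<>();
    private final Map<String, ReentrantLock> rowLocks = new HashMap<>(); // read-only after construction

    LockingWarehouse(Map<String, Integer> initialStocks) {
        initialStocks.forEach((product, stock) -> {
            stocks.put(product, stock);
            rowLocks.put(product, new ReentrantLock());
        });
    }

    int stock(String product) {
        return stocks.get(product);
    }

    /** All or nothing: returns false and changes nothing if one product is missing. */
    boolean placeOrder(Map<String, Integer> cart) {
        List<ReentrantLock> held = new ArrayList<>();
        try {
            // "begin; select for update": lock every row of the cart, in sorted order
            for (String product : new TreeSet<>(cart.keySet())) {
                ReentrantLock lock = rowLocks.get(product);
                if (lock == null) return false; // unknown product
                lock.lock();
                held.add(lock);
            }
            // Check every line while holding all the locks; one shortage cancels the order
            for (Map.Entry<String, Integer> line : cart.entrySet()) {
                if (stocks.get(line.getKey()) < line.getValue()) return false;
            }
            // "update ...; commit"
            cart.forEach((product, quantity) -> stocks.merge(product, -quantity, Integer::sum));
            return true;
        } finally {
            // Released only now: the longer the cart, the longer the locks are held
            held.forEach(ReentrantLock::unlock);
        }
    }
}
```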
Transactions with Manual Compensation
Code “do”, “undo” and the chain
DO:
if (stock - quantity >= 0) {
    stock = stock - quantity;
} else {
    throw exception();
}
UNDO:
stock = stock + quantity;
Place order: the chain runs the DO step of each product of the cart (-1 on its stock)
• After the first orders, the iphone stock is down to 0
• Cart barbie: 1, iphone: 1, cabbage-doll: 1: the first step succeeds (637 → 636), then no more iphone!
• The chain is interrupted and the order cancelled: the UNDO of the completed step restores the stock (636 + 1)
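The do/undo chain can be sketched as follows, assuming a hypothetical CompensableStep interface with a doStep/undoStep pair: steps run in order, and when one fails, the steps already done are undone in reverse order before the failure is rethrown. This is the pattern often called a saga. If the process itself crashes mid-chain, nothing in this sketch runs the undo steps; that requires durable chain execution, for example managed by the data store.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;

/** Hypothetical compensable step: a DO action and the UNDO action that reverts it. */
interface CompensableStep {
    void doStep();
    void undoStep();
}

/** DO: decrement a stock, refusing to go below zero. UNDO: give the quantity back. */
final class StockDecrement implements CompensableStep {
    private final Map<String, Integer> stocks;
    private final String product;
    private final int quantity;

    StockDecrement(Map<String, Integer> stocks, String product, int quantity) {
        this.stocks = stocks;
        this.product = product;
        this.quantity = quantity;
    }

    @Override public void doStep() {
        stocks.compute(product, (p, stock) -> {
            if (stock == null || stock - quantity < 0) throw new IllegalStateException("no more " + p + "!");
            return stock - quantity;
        });
    }

    @Override public void undoStep() {
        stocks.merge(product, quantity, Integer::sum);
    }
}

final class CompensationChain {
    /** Runs every DO; on failure runs the UNDO of completed steps in reverse order, then rethrows. */
    static void execute(List<CompensableStep> steps) {
        Deque<CompensableStep> done = new ArrayDeque<>();
        try {
            for (CompensableStep step : steps) {
                step.doStep();
                done.push(step);
            }
        } catch (RuntimeException failure) {
            while (!done.isEmpty()) {
                done.pop().undoStep();
            }
            throw failure;
        }
    }
}
```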
Transactions with Manual Compensation
• Code “do” & “undo” & chain execution
• What about an interrupted chain execution? Data corruption?
→ Data store managed transaction chain execution
Which solution to choose?
Key-Value Store
• Get and Set by key
• Riak and Voldemort provide great scalability
• Memcached and Redis offer low overhead and latency
Simple but enough for a lot of use cases
Great to persist continuously growing datasets
Great for cache and live data
Column Families Store
• Get and Set by key of a list of columns
• Queries are simple, but column slice fetching is possible
• The data model is too low-level for many complex data models
Makes it possible to fetch and update partial data
Great for pagination
Should typically be used for the largest scalability needs
Document Store
• Schemaless
• Complex queries are available
• Scalability may be limited when not querying by the partition key
Great for continuously updated schemas
Necessary for filtering and search
Can be handled using multiple storages and limited queries
In Memory Data Grid
• Very Low Latency & eXtreme Transaction Processing (XTP)
• In memory, no persistence
• High budget and developer skills required
Investment banking, booking & inventory systems
Most of the time backed with a database
Some Open Source alternatives are appearing
Polyglot storage for eCommerce
Application backed by:
- Solr: products search
- MongoDB: product catalog
- Cassandra: user account and shopping cart
- Coherence: warehouse inventory
Why do NoSQL & DataGrids matter?
• Polyglot Storage: databases that fit the needs of every type of data
• Linear Scalability: being able to handle future business growth
• High Availability: multi-servers and multi-datacenters
• Elasticity: natural integration with Cloud Computing philosophy
• Some new use cases become possible
Questions / Answers
?