SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

85
SQL, NoSQL, NewSQL? What's a developer to do? Chris Richardson Author of POJOs in Action Founder of the original CloudFoundry.com [email protected] @crichardson Blog: http://plainoldobjects.com Copyright (c) 2012 Chris Richardson. All rights reserved.

description

The database world is undergoing a major upheaval. NoSQL databases such as MongoDB and Cassandra are emerging as a compelling choice for many applications. They can simplify the persistence of complex data models and offering significantly better scalability and performance. But these databases have a very different and unfamiliar data model and APIs as well as a limited transaction model. Moreover, the relational world is fighting back with so-called NewSQL databases such as VoltDB, which by using a radically different architecture offers high scalability and performance as well as the familiar relational model and ACID transactions. Sounds great but unlike the traditional relational database you can’t use JDBC and must partition your data. In this presentation you will learn about popular NoSQL databases – MongoDB, and Cassandra – as well at VoltDB. We will compare and contrast each database’s data model and Java API using NoSQL and NewSQL versions of a use case from the book POJOs in Action. We will learn about the benefits and drawbacks of using NoSQL and NewSQL databases.

Transcript of SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Page 1: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

SQL, NoSQL, NewSQL? What's a developer to do?

Chris Richardson

Author of POJOs in Action Founder of the original CloudFoundry.com

[email protected] @crichardson Blog: http://plainoldobjects.com

Copyright (c) 2012 Chris Richardson. All rights reserved.

Page 2: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Overall presentation goal

The joy and pain of building Java

applications that use NoSQL and NewSQL

2

Page 3: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

About Chris

3

Page 4: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

(About Chris)

4

Page 5: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

About Chris()

5

Page 6: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

About Chris

6

Page 7: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

About Chris

http://www.theregister.co.uk/2009/08/19/springsource_cloud_foundry/

7

Page 8: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

About Chris

Developer Advocate for CloudFoundry.com

8

Page 9: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Agenda

 Why NoSQL? NewSQL?   Persisting entities   Implementing queries

Slide 9

Page 10: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Food to Go

  Take-out food delivery service

  “Launched” in 2006   Used a relational

database (naturally)

10

Page 11: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Success Growth challenges

  Increasing traffic   Increasing data volume  Distribute across a few data centers   Increasing domain model complexity

Page 12: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Limitations of relational databases

 Scaling  Distribution  Updating schema  O/R impedance mismatch  Handling semi-structured data

12

Page 13: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Solution: Spend Money

  Hire more DevOps   Use application-level sharding   Build your own middleware   …

  Buy SSD and RAM   Buy Oracle   Buy high-end servers   …

http://upload.wikimedia.org/wikipedia/commons/e/e5/Rising_Sun_Yacht.JPG

OR

http://www.trekbikes.com/us/en/bikes/road/race_performance/madone_5_series/madone_5_2/#

13

Page 14: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Solution: Use NoSQL

Higher performance Higher scalability Richer data-model Schema-less

Limited transactions Relaxed consistency Unconstrained data

Ben

efits

Draw

backs

14

Page 15: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Slide 15

MongoDB

Page 16: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

MongoDB is a document oriented DB

16

Server

Database: Food To Go

Collection: Restaurants

{ "_id" : ObjectId("4bddc2f49d1505567c6220a0") "name": "Ajanta", "serviceArea": ["94619", "99999"], "openingHours": [

{ "dayOfWeek": 1, "open": 1130, "close": 1430 }, { "dayOfWeek": 2, "open": 1130, "close": 1430 }, …

] }

BSON = binary JSON

Sequence of bytes on disk fast i/o

Page 17: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

MongoDB – writes are fast

Slide 17

Client MongoDB

Insert(collection, documents)

Insert(collection, documents)

Insert(collection, documents)

getLastError()

Async

Blocks until writes succeed

Page 18: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Config Server

Shard 1

MongoDB is scalable Mongod (replica)

Mongod (master) Mongod

(replica)

Shard 2 Mongod (replica)

Mongod (master) Mongod

(replica)

mongod

mongod

mongod

mongos

client

Collections spread over multiple

shards

A shard consists of a replica set = generalization of master slave

4/11/12 Slide 18

Page 19: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

MongoDB use cases

 Use cases   High volume writes   Complex data   Semi-structured data

 Used by: Shutterfly, Foursquare, Bit.ly, …

Slide 19

Page 20: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

20

Apache Cassandra

Page 21: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Keyspace

Column Family

K1

Cassandra is a column-oriented DB

N1 V1 TS1 N2 V2 TS2 N3 V3 TS3

N1 V1 TS1 N2 V2 TS2 N3 V3 TS3 K2

Column Name

Column Value

Timestamp Row Key

21

Column name/value: number, string, Boolean, timestamp, and composite

Page 22: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

CF.insert(key=K1, (N4, V4, TS4), …)

Cassandra– inserting/updating data

Idempotent= transaction

Column Family

K1 N1 V1 TS1

N2 V2 TS2 N3 V3 TS3

Column Family

K1 N1 V1 TS1

N2 V2 TS2 N3 V3 TS3 N4 V4 TS4

22

Page 23: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

CF.slice(key=K1, startColumn=N2, endColumn=N4)

Cassandra– retrieving data Column Family

K1 N1 V1 TS1

N2 V2 TS2 N3 V3 TS3 N4 V4 TS4

K1 N2 V2 TS2 N3 V3 TS3 N4 V4 TS4

23

Page 24: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Tunable reads and writes

 Any node (for writes)  One replica  Quorum of replicas   Local quorum   Each quorum  All replicas

Slide 24

Higher availability Lower response time Less consistency

Lower availability Higher response time More consistency

Page 25: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Datacenter 2

Cassandra cluster

Datacenter 1

Cassandra cluster

Cassandra is scalable

Slide 25

Node 1

Node 2

Node 3

Node 4

Node 1

Node 2

Node 3

Node 4

Application Application

Page 26: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Cassandra use cases

 Use cases   Big data   Multiple Data Center distributed

database   (Write intensive) Logging   High-availability (writes)

 Used by: Netflix, Facebook, Digg, etc.

Slide 26

Page 27: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Other NoSQL databases

http://nosql-database.org/ lists 122+ NoSQL databases

Type Examples

Extensible columns/Column-oriented

Hbase SimpleDB DynamoDB

Graph Neo4j

Key-value Redis Membase

Document CouchDb

27

Page 28: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Solution: Use NewSQL

 Relational databases with SQL and ACID transactions

AND

 New and improved architecture  Radically better scalability and

performance

 NewSQL vendors: ScaleDB, NimbusDB, …, VoltDB

Slide 28

Page 29: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Stonebraker’s motivations

29

“…Current databases are designed for 1970s hardware …”

Stonebraker: http://www.slideshare.net/VoltDB/sql-myths-webinar

Significant overhead in “…logging, latching, locking, B-tree, and buffer management

operations…”

SIGMOD 08: Though the looking glass: http://dl.acm.org/citation.cfm?id=1376713

Page 30: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

About VoltDB

 Open-source   In-memory relational database  Durability thru replication; snapshots

and logging   Transparent partitioning   Fast and scalable

Slide 30

http://www.mysqlperformanceblog.com/2011/02/28/is-voltdb-really-as-scalable-as-they-claim/

…VoltDB is very scalable; it should scale to 120 partitions, 39 servers, and 1.6 million complex transactions per second at over 300 CPU cores…

Page 31: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

The future is polyglot persistence

IEEE Software Sept/October 2010 - Debasish Ghosh / Twitter @debasishg

e.g. Netflix •  RDBMS •  SimpleDB •  Cassandra •  Hadoop/Hbase

31

Page 32: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Spring Data is here to help

http://www.springsource.org/spring-data

NoSQL databases

32

For

Page 33: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Agenda

 Why NoSQL? NewSQL?  Persisting entities   Implementing queries

Slide 33

Page 34: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Food to Go – Place Order use case

1.  Customer enters delivery address and delivery time

2.  System displays available restaurants 3.  Customer picks restaurant 4.  System displays menu 5.  Customer selects menu items 6.  Customer places order

34

Page 35: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Food to Go – Domain model (partial)

class Restaurant { long id; String name; Set<String> serviceArea; Set<TimeRange> openingHours; List<MenuItem> menuItems; }

class MenuItem { String name; double price; }

class TimeRange { long id; int dayOfWeek; int openingTime; int closingTime; }

35

Page 36: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Database schema ID Name …

1 Ajanta

2 Montclair Eggshop

Restaurant_id zipcode

1 94707

1 94619

2 94611

2 94619

Restaurant_id dayOfWeek openTime closeTime

1 Monday 1130 1430

1 Monday 1730 2130

2 Tuesday 1130 …

RESTAURANT table

RESTAURANT_ZIPCODE table

RESTAURANT_TIME_RANGE table

36

Page 37: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Pick your JDBC-based framework

interface AvailableRestaurantRepository {

void add(Restaurant restaurant); Restaurant findDetailsById(int id); … }

37

Spring JDBC Hibernate/JPA …

JDBC

Page 38: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Restaurant aggregate

But with NoSQL?

interface AvailableRestaurantRepository {

void add(Restaurant restaurant); Restaurant findDetailsById(int id); … }

38

Restaurant

MenuItem TimeRange

? ?

Page 39: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Slide 39

MongoDB

Page 40: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

MongoDB: persisting restaurants is easy

40

{ "_id" : ObjectId("4bddc2f49d1505567c6220a0") "name": "Ajanta", "serviceArea": ["94619", "99999"], "openingHours": [

{ "dayOfWeek": 1, "open": 1130, "close": 1430 }, { "dayOfWeek": 2, "open": 1130, "close": 1430 }, …

] }

Page 41: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Spring Data for Mongo code @Repository public class AvailableRestaurantRepositoryMongoDbImpl

implements AvailableRestaurantRepository {

public static String AVAILABLE_RESTAURANTS_COLLECTION = "availableRestaurants";

@Autowired private MongoTemplate mongoTemplate;

@Override public void add(Restaurant restaurant) { mongoTemplate.insert(restaurant, AVAILABLE_RESTAURANTS_COLLECTION); }

@Override public Restaurant findDetailsById(int id) { return mongoTemplate.findOne(new Query(where("_id").is(id)),

Restaurant.class, AVAILABLE_RESTAURANTS_COLLECTION); } }

41

Page 42: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Spring Configuration

@Configuration public class MongoConfig extends AbstractDatabaseConfig {

@Value("#{mongoDbProperties.databaseName}") private String mongoDbDatabase;

@Bean public Mongo mongo() throws UnknownHostException, MongoException { return new Mongo(databaseHostName); }

@Bean public MongoTemplate mongoTemplate(Mongo mongo) throws Exception { MongoTemplate mongoTemplate = new MongoTemplate(mongo, mongoDbDatabase); … return mongoTemplate; } }

42

Page 43: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

43

Apache Cassandra

Page 44: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Option #1: Use a column per attribute

Column Family: RestaurantDetails

1

name Ajanta

type indian

serviceArea[0] 94619

serviceArea[1] 94707

openingHours[0].dayOfWeek Monday

openingHours[0].open 1130

2

name Egg shop

type Break Fast

serviceArea[0] 94611

serviceArea[1] 94619

openingHours[0].dayOfWeek Monday

openingHours[0].open 0830

Column Name = path/expression to access property value

openingHours[0].close 1430

openingHours[0].close 1430

Page 45: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

2 attributes: { name: “Montclair Eggshop”, … }

Column Family: RestaurantDetails

1 attributes { name: “Ajanta”, …}

2

Option #2: Use a single column

attributes { name: “Eggshop”, …}

✔ 45

Column value = serialized object graph, e.g. JSON

Can’t use secondary indexes but they aren’t helpful for these use cases

Page 46: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Cassandra code: wrapper around Hector

public class AvailableRestaurantRepositoryCassandraKeyImpl implements AvailableRestaurantRepository {

@Autowired private final CassandraTemplate cassandraTemplate;

public void add(Restaurant restaurant) { cassandraTemplate.insertEntity(keyspace,

RESTAURANT_DETAILS_CF, restaurant);

}

public Restaurant findDetailsById(int id) { String key = Integer.toString(id); return cassandraTemplate.findEntity(Restaurant.class,

keyspace, key, RESTAURANT_DETAILS_CF); … }

… 46

Home grown wrapper class

http://en.wikipedia.org/wiki/Hector

Page 47: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Slide 47

Page 48: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Using VoltDB

 Use the original schema  Standard SQL statements

BUT YOU MUST

 Write stored procedures and invoke them using proprietary interface

  Partition your data

Slide 48

Page 49: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

About VoltDB stored procedures

 Key part of VoltDB  Replication = executing stored

procedure on replica   Logging = log stored procedure

invocation  Stored procedure invocation =

transaction

Slide 49

Page 50: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

About partitioning

Slide 50

ID Name …

1 Ajanta

2 Eggshop

Partition column

RESTAURANT table

ID Name …

1 Ajanta

ID Name …

2 Eggshop

Partition 1 Partition 2

Page 51: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Example VoltDB cluster

Slide 51

VoltDB Server A

ID Name …

1 Ajanta

VoltDB Server B

ID Name …

2 Eggshop

VoltDB Server C

ID Name …

… ..

Partition 1a Partition 2a Partition 3a

ID Name …

1 Ajanta

ID Name …

2 Eggshop

ID Name …

… ..

Partition 3b Partition 1b Partition 2b

(3 servers) x (2 partitions per server) ÷

(2 replicas per partition) 3 partitions

Page 52: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Single partition procedure: FAST

Slide 52

VoltDB Server A

ID Name …

1 Ajanta

VoltDB Server B

ID Name …

1 Eggshop

VoltDB Server C

ID Name …

… ..

SELECT * FROM RESTAURANT WHERE ID = 1

High-performance lock free code Partition column

Stored procedure parameter

Page 53: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Multi-partition procedure: SLOWER

Slide 53

VoltDB Server A

ID Name …

1 Ajanta

VoltDB Server B

ID Name …

1 Eggshop

VoltDB Server C

ID Name …

… ..

SELECT * FROM RESTAURANT WHERE NAME = ‘Ajanta’

Communication/Coordination overhead

Page 54: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Chosen partitioning scheme

54

<partitions> <partition table="restaurant" column="id"/> <partition table="service_area" column="restaurant_id"/> <partition table="menu_item" column="restaurant_id"/> <partition table="time_range" column="restaurant_id"/> <partition table="available_time_range" column="restaurant_id"/> </partitions>

Performance is excellent: much faster than MySQL

Page 55: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Stored procedure – AddRestaurant @ProcInfo( singlePartition = true, partitionInfo = "Restaurant.id: 0”)

public class AddRestaurant extends VoltProcedure {

public final SQLStmt insertRestaurant = new SQLStmt("INSERT INTO Restaurant VALUES (?,?);");

public final SQLStmt insertServiceArea = new SQLStmt("INSERT INTO service_area VALUES (?,?);");

public final SQLStmt insertOpeningTimes = new SQLStmt("INSERT INTO time_range VALUES (?,?,?,?);");

public final SQLStmt insertMenuItem = new SQLStmt("INSERT INTO menu_item VALUES (?,?,?);");

public long run(int id, String name, String[] serviceArea, long[] daysOfWeek, long[] openingTimes, long[] closingTimes, String[] names, double[] prices) {

voltQueueSQL(insertRestaurant, id, name);

for (String zipCode : serviceArea) voltQueueSQL(insertServiceArea, id, zipCode);

for (int i = 0; i < daysOfWeek.length ; i++) voltQueueSQL(insertOpeningTimes, id, daysOfWeek[i], openingTimes[i], closingTimes[i]);

for (int i = 0; i < names.length ; i++) voltQueueSQL(insertMenuItem, id, names[i], prices[i]);

voltExecuteSQL(true);

return 0; } }

55

Page 56: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

VoltDb repository – add() @Repository public class AvailableRestaurantRepositoryVoltdbImpl

implements AvailableRestaurantRepository {

@Autowired private VoltDbTemplate voltDbTemplate;

@Override public void add(Restaurant restaurant) { invokeRestaurantProcedure("AddRestaurant", restaurant); }

private void invokeRestaurantProcedure(String procedureName, Restaurant restaurant) { Object[] serviceArea = restaurant.getServiceArea().toArray(); long[][] openingHours = toArray(restaurant.getOpeningHours()); Object[][] menuItems = toArray(restaurant.getMenuItems());

voltDbTemplate.update(procedureName, restaurant.getId(), restaurant.getName(), serviceArea, openingHours[0], openingHours[1],

openingHours[2], menuItems[0], menuItems[1]); }

56

Flatten Restaurant

Page 57: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

VoltDbTemplate wrapper class public class VoltDbTemplate {

private Client client;

public VoltDbTemplate(Client client) { this.client = client; }

public void update(String procedureName, Object... params) { try { ClientResponse x =

client.callProcedure(procedureName, params); … } catch (Exception e) { throw new RuntimeException(e); } }

57

VoltDB client API

Page 58: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

VoltDb server configuration

voltcompiler target/classes \ src/main/resources/sql/voltdb-project.xml foodtogo.jar

58

<?xml version="1.0"?> <project> <info> <name>Food To Go</name> ... </info> <database> <schemas> <schema path='schema.sql' /> </schemas>

<partitions> <partition table="restaurant" column="id"/> ...

</partitions>

<procedures> <procedure class='net.chrisrichardson.foodToGo.newsql.voltdb.procs.AddRestaurant' /> ... </procedures> </database> </project>

<deployment> <cluster hostcount="1"

sitesperhost="5" kfactor="0" /> </deployment>

bin/voltdb leader localhost catalog foodtogo.jar deployment deployment.xml

Page 59: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Agenda

 Why NoSQL? NewSQL?   Persisting entities   Implementing queries

Slide 59

Page 60: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Finding available restaurants Available restaurants =

Serve the zip code of the delivery address

AND Are open at the delivery time

public interface AvailableRestaurantRepository {

List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress, Date deliveryTime); …

}

60

Page 61: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Finding available restaurants on Monday, 6.15pm for 94619 zip

select r.* from restaurant r inner join restaurant_time_range tr on r.id =tr.restaurant_id inner join restaurant_zipcode sa on r.id = sa.restaurant_id Where ’94619’ = sa.zip_code and tr.day_of_week=’monday’ and tr.openingtime <= 1815 and 1815 <= tr.closingtime

Straightforward three-way join

61

Page 62: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Slide 62

MongoDB

Page 63: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

MongoDB = easy to query

Find a restaurant that serves the 94619 zip code and is open at 6.15pm on a Monday

{ serviceArea:"94619", openingHours: { $elemMatch : { "dayOfWeek" : "Monday", "open": {$lte: 1815}, "close": {$gte: 1815} } } }

DBCursor cursor = collection.find(qbeObject); while (cursor.hasNext()) { DBObject o = cursor.next(); … }

63

db.availableRestaurants.ensureIndex({serviceArea: 1})

Page 64: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

MongoTemplate-based code @Repository public class AvailableRestaurantRepositoryMongoDbImpl implements AvailableRestaurantRepository {

@Autowired private final MongoTemplate mongoTemplate;

@Override public List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress,

Date deliveryTime) { int timeOfDay = DateTimeUtil.timeOfDay(deliveryTime); int dayOfWeek = DateTimeUtil.dayOfWeek(deliveryTime);

Query query = new Query(where("serviceArea").is(deliveryAddress.getZip()) .and("openingHours”).elemMatch(where("dayOfWeek").is(dayOfWeek) .and("openingTime").lte(timeOfDay) .and("closingTime").gte(timeOfDay)));

return mongoTemplate.find(AVAILABLE_RESTAURANTS_COLLECTION, query, AvailableRestaurant.class);

}

mongoTemplate.ensureIndex(“availableRestaurants”, new Index().on("serviceArea", Order.ASCENDING));

64

Page 65: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

65

But how do this with Apache Cassandra??!

Page 66: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Column Family

X Y Z

Slice is equivalent to a simple SELECT

keyVal colName colValue

X Y Z

columnFamily.slice(key=keyVal, startColumn=startVal, endColumn=endVal)

select key, colName, colValue from columnFamily where key = keyVal and colName >= startVal and colName <= endVal

Page 67: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Slide 67

Queries instead of data model drives NoSQL database design

We need to implement an index that can be queried using a slice

Page 68: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Simplification #1: Denormalization

Restaurant_id Day_of_week Open_time Close_time Zip_code

1 Monday 1130 1430 94707

1 Monday 1130 1430 94619

1 Monday 1730 2130 94707

1 Monday 1730 2130 94619

2 Monday 0700 1430 94619

SELECT restaurant_id FROM time_range_zip_code WHERE day_of_week = ‘Monday’ AND zip_code = 94619 AND 1815 < close_time

AND open_time < 1815

Simpler query:   No joins   Two = and two <

68

Page 69: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Simplification #2: Application filtering

SELECT restaurant_id, open_time FROM time_range_zip_code WHERE day_of_week = ‘Monday’ AND zip_code = 94619 AND 1815 < close_time

AND open_time < 1815

Even simpler query •  No joins •  Two = and one <

69

Page 70: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Simplification #3: Eliminate multiple =’s with concatenation

SELECT restaurant_id, open_time FROM time_range_zip_code WHERE zip_code_day_of_week = ‘94619:Monday’ AND 1815 < close_time

Restaurant_id Zip_dow Open_time Close_time

1 94707:Monday 1130 1430

1 94619:Monday 1130 1430

1 94707:Monday 1730 2130

1 94619:Monday 1730 2130

2 94619:Monday 0700 1430

Row key

range

70

Page 71: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Column Family: AvailableRestaurants

94619:Monday (1430,0700,2) JSON FOR EGG

(1430,1130,1) JSON FOR AJANTA

(2130,1730,1) JSON FOR AJANTA

Column family as an index

Restaurant_id Zip_dow Open_time Close_time

1 94707:Monday 1130 1430

1 94619:Monday 1130 1430

1 94707:Monday 1730 2130

1 94619:Monday 1730 2130

2 94619:Monday 0700 1430

94619:Monday 1430 2 0700

Page 72: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

slice(key= 94619:Monday, sliceStart = (1815, *, *), sliceEnd = (2359, *, *))

Querying with a slice

94619:Monday

72

Column Family: AvailableRestaurants

94619:Monday (1430,0700,2) JSON FOR

EGG

(1430,1130,1) JSON FOR AJANTA

(2130,1730,1) JSON FOR AJANTA

18:15 is after 17:30 {Ajanta}

(2130,1730,1) JSON FOR AJANTA

Page 73: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

private void insertAvailability(Restaurant restaurant) {

for (String zipCode : (Set<String>) restaurant.getServiceArea()) { for (TimeRange tr : (Set<TimeRange>) restaurant.getOpeningHours()) { String dayOfWeek = format2(tr.getDayOfWeek()); String openingTime = format4(tr.getOpeningTime()); String closingTime = format4(tr.getClosingTime()); String restaurantId = format8(restaurant.getId());

String key = formatKey(zipCode, dayOfWeek); String columnValue = toJson(restaurant);

Composite columnName = new Composite(); columnName.add(0, closingTime); columnName.add(1, openingTime); columnName.add(2, restaurantId);

ColumnFamilyUpdater<String, Composite> updater = compositeCloseTemplate.createUpdater(key);

updater.setString(columnName, columnValue);

compositeCloseTemplate.update(updater);

} } }

Needs a few pages of code

73

@Override public List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress, Date deliveryTime) { int dayOfWeek = DateTimeUtil.dayOfWeek(deliveryTime); int timeOfDay = DateTimeUtil.timeOfDay(deliveryTime); String zipCode = deliveryAddress.getZip(); String key = formatKey(zipCode, format2(dayOfWeek));

HSlicePredicate<Composite> predicate = new HSlicePredicate<Composite>(new CompositeSerializer()); Composite start = new Composite(); Composite finish = new Composite(); start.addComponent(0, format4(timeOfDay), ComponentEquality.GREATER_THAN_EQUAL); finish.addComponent(0, format4(2359), ComponentEquality.GREATER_THAN_EQUAL); predicate.setRange(start, finish, false, 100);

final List<AvailableRestaurantIndexEntry> closingAfter = new ArrayList<AvailableRestaurantIndexEntry>();

ColumnFamilyRowMapper<String, Composite, Object> mapper = new ColumnFamilyRowMapper<String, Composite, Object>() {

@Override public Object mapRow(ColumnFamilyResult<String, Composite> results) { for (Composite columnName : results.getColumnNames()) { String openTime = columnName.get(1, new StringSerializer()); String restaurantId = columnName.get(2, new StringSerializer()); closingAfter.add(new AvailableRestaurantIndexEntry(openTime, restaurantId, results.getString(columnName))); } return null; } };

compositeCloseTemplate.queryColumns(key, predicate, mapper);

List<AvailableRestaurant> result = new LinkedList<AvailableRestaurant>();

for (AvailableRestaurantIndexEntry trIdAndAvailableRestaurant : closingAfter) { if (trIdAndAvailableRestaurant.isOpenBefore(timeOfDay)) result.add(trIdAndAvailableRestaurant.getAvailableRestaurant()); }

return result; }

Page 74: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

What did I just do to query the data?

 Wrote code to maintain an index  Reduced performance due to extra

writes

74

Page 75: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

But what would you rather implement?

Slide 75

“Complex” query logic

OR

Multi-datacenter, multi-master database

infrastructure

Page 76: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Slide 76

Page 77: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

VoltDB - attempt #0

Slide 77

SELECT r.* FROM restaurant r,time_range tr, service_area sa WHERE ? = sa.zip_code and sa.restaurant_id= tr.restaurant_id AND r.restaurant_id = sa.restaurant_id and tr.day_of_week=? AND tr.open_time <= ? AND ? <= tr.close_time

Slow

@ProcInfo( singlePartition = false) public class FindAvailableRestaurants extends VoltProcedure { ... }

Page 78: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

VoltDB - attempt #1

Slide 78

ERROR 10:12:03,251 [main] COMPILER: Failed to plan for statement type(findAvailableRestaurants_with_join) select r.* from restaurant r,time_range tr, service_area sa Where ? = sa.zip_code and sa.restaurant_id= tr.restaurant_id and r.restaurant_id = sa.restaurant_id and tr.day_of_week=? and tr.open_time <= ? and ? <= tr.close_time Error: "Unable to plan for statement. Possibly joining partitioned tables in a multi-partition procedure using a column that is not the partitioning attribute or a non-equality operator. This is statement not supported at this time." 2012-03-19 08:19:31,743 ERROR [main] COMPILER: Catalog compilation failed.

#fail

@ProcInfo( singlePartition = false) public class FindAvailableRestaurants extends VoltProcedure { ... }

create index idx_service_area_zip_code on service_area(zip_code); ..

Page 79: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

VoltDB - attempt #2

Slide 79

@ProcInfo( singlePartition = true, partitionInfo = "Restaurant.id: 0”) public class AddRestaurant extends VoltProcedure {

public final SQLStmt insertAvailable= new SQLStmt("INSERT INTO available_time_range VALUES (?,?,?, ?, ?, ?);");

public long run(....) { ...

for (int i = 0; i < daysOfWeek.length ; i++) { voltQueueSQL(insertOpeningTimes, id, daysOfWeek[i], openingTimes[i], closingTimes[i]); for (String zipCode : serviceArea) { voltQueueSQL(insertAvailable, id, daysOfWeek[i], openingTimes[i],

closingTimes[i], zipCode, name); } }

... voltExecuteSQL(true); return 0; } }

Works but queries are only slightly faster than MySQL!

public final SQLStmt findAvailableRestaurants_denorm = new SQLStmt( "select restaurant_id, name from available_time_range tr " + "where ? = tr.zip_code " + "and tr.day_of_week=? " + "and tr.open_time <= ? " + " and ? <= tr.close_time ");

Page 80: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

VoltDB - attempt #3

80

<partitions> ... <partition table="available_time_range" column="zip_code"/> </partitions>

Queries are really fast But inserts are not

@ProcInfo( singlePartition = false, ...) public class AddRestaurant extends VoltProcedure { ... }

@ProcInfo( singlePartition = true, partitionInfo = "available_time_range.zip_code: 0") public class FindAvailableRestaurants extends VoltProcedure { ... }

Page 81: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

VoltDB – key lesson

Slide 81

A given partitioning scheme can be good for some use cases but bad for others

Page 82: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Summary

82

Different databases =

Different tradeoffs

Page 83: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

… Summary…

Benefits Drawbacks

RDBMS • SQL • ACID • Familiar

• Lack of performance and scalability • Difficult to distribute • Schema updates • Handling semi-structured data

NoSQL • Scalability • Performance • Schema-less

• Lack of ACID • Limited querying (some)

NewSQL • ACID • SQL • Familiar • Scalability • Performance

• Proprietary API • Schema updates • Handling semi-structured data • Not all use cases are performant

Slide 83

Page 84: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

… Summary

 Very carefully pick the NewSQL/NoSQL DB for your application

 Consider a polyglot persistence architecture

  Encapsulate your data access code so you can switch

 Startups = avoid NewSQL/NoSQL for shorter time to market?

Slide 84

Page 85: SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

Thank you!

My contact info:

[email protected]

@crichardson

Blog: http://plainoldobjects.com