NoSE: Schema Design for NoSQL Applications

Michael J. Mior, Kenneth Salem, Ashraf Aboulnaga, Rui Liu

NoSE

● NoSQL App Development

● Problem Formulation

● NoSE Design and Implementation

● Evaluation

NoSQL

● Eventually consistent, horizontally scalable, flexible schema

● Many different types of NoSQL databases

○ Document stores

○ Key-value stores

○ Graph databases

○ …

○ Extensible record stores

Extensible Record Store Data Model

CREATE COLUMNFAMILY "ReservationsByGuest" (
  "GuestID" uuid,
  "ResID" uuid,
  "ResStartDate" timestamp,
  "RoomID" uuid,
  PRIMARY KEY (("GuestID"), "ResStartDate", "ResID", "RoomID")
);

Partitioning key: "GuestID"

Clustering key: "ResStartDate", "ResID", "RoomID"
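To make the two key roles concrete, here is a minimal in-memory sketch in Python. It is not a Cassandra client: the `ColumnFamily` class, its `put`/`get` methods, and the sample IDs and dates are all hypothetical. It only illustrates that the partitioning key selects a group of rows and the clustering key orders the rows within that group.

```python
class ColumnFamily:
    """Toy in-memory stand-in for one column family (hypothetical, not a real driver)."""

    def __init__(self, partition_key, clustering_key):
        self.partition_key = partition_key    # e.g. ("GuestID",)
        self.clustering_key = clustering_key  # e.g. ("ResStartDate", "ResID", "RoomID")
        self.partitions = {}                  # partition value -> {clustering value -> row}

    def put(self, row):
        pkey = tuple(row[c] for c in self.partition_key)
        ckey = tuple(row[c] for c in self.clustering_key)
        # A row with the same full primary key overwrites the previous one.
        self.partitions.setdefault(pkey, {})[ckey] = row

    def get(self, *pkey):
        # All rows of one partition, ordered by clustering key.
        rows = self.partitions.get(tuple(pkey), {})
        return [rows[ckey] for ckey in sorted(rows)]


# Sample data is made up for illustration.
reservations_by_guest = ColumnFamily(("GuestID",), ("ResStartDate", "ResID", "RoomID"))
reservations_by_guest.put(
    {"GuestID": "g1", "ResID": "r2", "ResStartDate": "2016-05-01", "RoomID": "room7"})
reservations_by_guest.put(
    {"GuestID": "g1", "ResID": "r1", "ResStartDate": "2016-01-15", "RoomID": "room3"})
reservations_by_guest.put(
    {"GuestID": "g2", "ResID": "r3", "ResStartDate": "2016-03-10", "RoomID": "room7"})

start_dates = [r["ResStartDate"] for r in reservations_by_guest.get("g1")]
```

Fetching partition `"g1"` returns that guest's reservations in start-date order, regardless of insertion order.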

Database Application Development

1. Define application requirements

2. Decide on a data model for the target system

3. Implement the application according to the model

a. Database access

b. Application logic

NoSE automates part of this process.

Schema Design Best Practices

Model column families around query patterns

But start your design with entities and relationships, if you can

De-normalize and duplicate for read performance

But don’t de-normalize if you don’t need to

Leverage wide rows for ordering, grouping, and filtering

But don’t go too wide

Source: http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/

Schema Design Example

For a given guest, return the cities that guest has stayed in

CREATE COLUMNFAMILY "CitiesByGuest" (
  "GuestID" uuid,
  "City" text,
  PRIMARY KEY (("GuestID"), "City"));

CREATE COLUMNFAMILY "HotelsByGuest" (
  "GuestID" uuid,
  "HotelID" uuid,
  PRIMARY KEY (("GuestID"), "HotelID"));

CREATE COLUMNFAMILY "HotelsByID" (
  "HotelID" uuid,
  "HotelCity" text,
  PRIMARY KEY (("HotelID"), "HotelCity"));

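One reading of the three column families above is that they offer two ways to answer the same query: a single lookup in "CitiesByGuest", or a lookup in "HotelsByGuest" followed by one "HotelsByID" lookup per hotel. The Python sketch below illustrates that trade-off with plain dictionaries standing in for the column families. The sample data and function names are hypothetical.

```python
# Hypothetical sample data standing in for the three column families.
cities_by_guest = {"g1": {"Boston", "Toronto"}}            # GuestID -> City
hotels_by_guest = {"g1": {"h1", "h2", "h3"}}               # GuestID -> HotelID
hotels_by_id = {"h1": "Boston", "h2": "Toronto", "h3": "Boston"}  # HotelID -> HotelCity


def cities_denormalized(guest_id):
    """Answer the query with one lookup against CitiesByGuest."""
    return sorted(cities_by_guest.get(guest_id, set()))


def cities_normalized(guest_id):
    """Answer the query with HotelsByGuest, then one HotelsByID lookup per hotel."""
    hotel_ids = hotels_by_guest.get(guest_id, set())
    return sorted({hotels_by_id[hotel_id] for hotel_id in hotel_ids})


one_lookup = cities_denormalized("g1")
two_step = cities_normalized("g1")
```

Both functions return the same cities. The first needs fewer lookups but stores the city once per guest, while the second stores each hotel's city once and pays for extra lookups at read time.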
NoSE Overview

Input: Conceptual schema, Workload

NoSE

Output: Selected column families, Query implementation plans

Application Conceptual Model

Hotel: HotelID, HotelName, HotelPhone, HotelAddress, HotelCity, HotelState, HotelZip

Room: RoomID, RoomNumber, RoomRate, RoomFloor

Reservation: ResID, ResStartDate, ResEndDate

Guest: GuestID, GuestName, GuestEmail

Point of Interest: POIID, POIName, POIDescription

Amenity: AmenityID, AmenityDescription

Application Workload

For a given guest, return the cities that guest has stayed in

SELECT Hotel.HotelCity FROM Hotel.Room.Reservation.Guest
WHERE Guest.GuestID = ?

Hotel: HotelID, HotelCity

Room: RoomID

Reservation: ResID

Guest: GuestID

NoSE Architecture

Input: Conceptual schema, Workload

NoSE

1. Candidate Enumeration

2. Query Planning

3. Schema Optimization

4. Plan Recommendation

Output: Selected column families, Query implementation plans

Query Planning Example

SELECT Name FROM Hotel WHERE Hotel.State = 'NY' AND
Hotel.Room.Reservation.Guest.GuestID = ? ORDER BY Name

GuestID ↓ RoomID

RoomID ↓ HotelID

HotelID ↓ Name, State

Name

State
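The chain of lookups in the plan above can be sketched in Python as follows. This is an illustration under assumptions, not NoSE's generated code: dictionaries stand in for the three column families, the sample data and all names are hypothetical, and the sketch assumes the State filter and the ordering by Name are applied in the application after the last lookup.

```python
# Hypothetical contents of three column families, keyed as in the plan.
rooms_by_guest = {"g1": ["room1", "room2", "room3"]}             # GuestID -> RoomID
hotel_by_room = {"room1": "h1", "room2": "h2", "room3": "h3"}    # RoomID -> HotelID
hotel_by_id = {                                                   # HotelID -> Name, State
    "h1": {"Name": "Plaza", "State": "NY"},
    "h2": {"Name": "Harbour Inn", "State": "MA"},
    "h3": {"Name": "Astor", "State": "NY"},
}


def hotels_in_state_for_guest(guest_id, state):
    """Follow the three lookups, then filter on State and sort by Name."""
    room_ids = rooms_by_guest.get(guest_id, [])                  # GuestID -> RoomID
    hotel_ids = {hotel_by_room[room_id] for room_id in room_ids}  # RoomID -> HotelID
    hotels = [hotel_by_id[hotel_id] for hotel_id in hotel_ids]    # HotelID -> Name, State
    matching = [h["Name"] for h in hotels if h["State"] == state]  # filter on State
    return sorted(matching)                                       # order by Name


result = hotels_in_state_for_guest("g1", "NY")
```

A different set of column families (for example, one keyed on both GuestID and State) would allow a shorter plan at the cost of more duplicated data.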

Schema Optimization

Construct a linear program to optimize execution time

Cost of using column family j to answer query i

Use of column family j for query i in the final plan

Presence of column family j in final schema

Size of column family j

Add constraints to ensure each query has a valid plan

Minimize the cost

Ensure column families used are present

Limit maximum storage space
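The equations from the original slides did not survive in this transcript. As a hedged sketch of what the descriptions above suggest, the program might be written as follows. All symbols are assumed notation, not taken from the slides: $C_{ij}$ is the cost of using column family $j$ to answer query $i$, $\delta_{ij}$ indicates use of column family $j$ for query $i$ in the final plan, $\delta_j$ indicates presence of column family $j$ in the final schema, $s_j$ is the size of column family $j$, and $S$ is the storage limit.

```latex
\begin{aligned}
\text{minimize}   \quad & \sum_{i} \sum_{j} C_{ij}\,\delta_{ij} \\
\text{subject to} \quad & \delta_{ij} \le \delta_{j} \quad \text{for all } i, j
                          && \text{(column families used are present)} \\
                        & \sum_{j} s_{j}\,\delta_{j} \le S
                          && \text{(limit maximum storage space)} \\
                        & \text{additional constraints on } \delta_{ij}
                          && \text{(each query has a valid plan)}
\end{aligned}
```

The exact form of the valid-plan constraints is not recoverable from the slide text, so it is left unspecified here.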

Updates

● Updates make denormalization more expensive

● Add statements to update conceptual entities

● New column families are added to support updates

● Costs for updates are added to the linear program
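The effect of update costs on the chosen schema can be illustrated with a small Python sketch. It deliberately swaps the linear program for an exhaustive search over subsets of column families, which is only practical for toy inputs. The function name `best_schema`, the flat per-column-family update cost, and every number below are hypothetical.

```python
from itertools import combinations


def best_schema(plans, sizes, budget, update_costs=None):
    """Exhaustively pick the cheapest set of column families (toy stand-in for the LP).

    plans: query -> list of (cost, column families that plan needs)
    sizes: column family -> storage size
    update_costs: column family -> cost of keeping it up to date (optional)
    """
    update_costs = update_costs or {}
    names = sorted(sizes)
    best = None
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            chosen = set(subset)
            if sum(sizes[cf] for cf in chosen) > budget:
                continue  # violates the storage limit
            total = sum(update_costs.get(cf, 0) for cf in chosen)
            feasible = True
            for options in plans.values():
                usable = [cost for cost, needed in options if set(needed) <= chosen]
                if not usable:
                    feasible = False  # this query has no valid plan
                    break
                total += min(usable)
            if feasible and (best is None or total < best[0]):
                best = (total, chosen)
    return best


# Made-up costs: the denormalized plan is cheap to read but expensive to maintain.
plans = {"cities_for_guest": [(1, ["CitiesByGuest"]),
                              (4, ["HotelsByGuest", "HotelsByID"])]}
sizes = {"CitiesByGuest": 10, "HotelsByGuest": 10, "HotelsByID": 5}
update_costs = {"CitiesByGuest": 6, "HotelsByGuest": 1, "HotelsByID": 1}

read_only = best_schema(plans, sizes, budget=30)
with_updates = best_schema(plans, sizes, budget=30, update_costs=update_costs)
```

With these numbers the read-only workload selects the denormalized "CitiesByGuest", while adding update costs shifts the choice to "HotelsByGuest" plus "HotelsByID".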

Evaluation

● Application defined by the RUBiS online auction benchmark

● Generate a schema and query plans recommended by NoSE

● Two schemas for comparison

○ Normalized (as much as possible)

○ Expert-selected

Evaluation - Schema Performance

Conclusion

● NoSE automates schema design for NoSQL applications

● Conforms to best practices without requiring expertise

● Schemas outperform manually produced ones, with an average performance improvement of 1.8x and up to 125x

Questions?

git.io/nose-icde