The world's next top data model

Page 1: The world's next top data model

#CASSANDRA13

Patrick McFadin | Solution Architect, DataStax

The World's Next Top Data Model

Monday, June 24, 13

Page 2: The world's next top data model

The saga continues!

★ Data model is dead, long live the data model.

★ Bridging from Relational to Cassandra

★ Become a Super Modeler

★ Core data modeling techniques using CQL

Page 3: The world's next top data model

Because I love talking about this

Just to recap...

Page 4: The world's next top data model

Why does this matter?

* Cassandra lives closer to your users or applications

* Not a hammer for all use case nails

* Proper use case, proper model...

* Get it wrong and...

Page 5: The world's next top data model

When to use Cassandra*

* Need to be in more than one datacenter (active-active)

* Scaling from 0 to, uh, well... we’re not really sure.

* Need as close to 100% uptime as possible.

* Getting these from any other solution would just be mega $

and...

*Nutshell version. These are all ORs, not ANDs.

Page 6: The world's next top data model

You get the data model right!

Page 7: The world's next top data model

So let’s do that

* Four real-world examples

* Use case, what they were avoiding, and the model that accomplishes it

* You may think this is you, but it isn’t. I hear these all the time.

* All examples are in CQL3

Page 8: The world's next top data model

But wait, you say...

CQL doesn’t do dynamic wide rows!

Page 9: The world's next top data model

Yes it can!

* CQL does wide rows the same way you did them in Thrift

* No really

* Read this blog post

http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows

...or just trust me and I’ll show you how
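
To make the wide-row claim concrete, here is a simplified Python toy model. It is not Cassandra code, and it ignores internal details. It shows how rows from a CQL3 table with a compound primary key are laid out in storage. The partition key value becomes the storage row key. Each non-key column becomes a cell whose name is prefixed by the clustering value, so all CQL rows that share a partition key form one wide row. The table and column names (`username`, `event_time`, `detail`) are hypothetical.

```python
from collections import defaultdict


def to_storage_rows(cql_rows, partition_key, clustering_key):
    """Lay CQL rows out as wide storage rows (simplified toy model).

    The partition key value becomes the storage row key. Every other
    column becomes a cell named (clustering value, column name), and
    cells are kept sorted by that composite name.
    """
    storage = defaultdict(dict)
    for row in cql_rows:
        cells = storage[row[partition_key]]
        for column, value in row.items():
            if column in (partition_key, clustering_key):
                continue
            cells[(row[clustering_key], column)] = value
    # Sort each storage row's cells by composite name, like a comparator would.
    return {key: dict(sorted(cells.items())) for key, cells in storage.items()}


# Hypothetical rows from a table with PRIMARY KEY (username, event_time).
rows = [
    {"username": "pmcfadin", "event_time": 2, "detail": "Normal login"},
    {"username": "pmcfadin", "event_time": 1, "detail": "Opened cart"},
    {"username": "jdoe", "event_time": 1, "detail": "Normal login"},
]
wide = to_storage_rows(rows, "username", "event_time")
# wide["pmcfadin"] == {(1, "detail"): "Opened cart", (2, "detail"): "Normal login"}
```

Adding another row for `pmcfadin` adds cells to the same storage row rather than creating a new one. That is the "dynamic wide row" behavior people expect from Thrift.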

Page 10: The world's next top data model

Customers giving you money is a good reason for uptime

Shopping Cart Data Model

Page 11: The world's next top data model

Shopping cart use case

* Store shopping cart data reliably

* Minimize (or eliminate) downtime. Multi-dc

* Scale for the “Cyber Monday” problem

* Every minute off-line is lost $$

* Online shoppers want speed!

The bad

Page 12: The world's next top data model

Shopping cart data model

* Each customer can have one or more shopping carts

* De-normalize data for fast access

* Shopping cart == One partition (Row Level Isolation)

* Each item is a new column

Page 13: The world's next top data model

Shopping cart data model

CREATE TABLE user (
    username varchar,
    firstname varchar,
    lastname varchar,
    shopping_carts set<varchar>,
    PRIMARY KEY (username)
);

CREATE TABLE shopping_cart (
    username varchar,
    cart_name text,
    item_id int,
    item_name varchar,
    description varchar,
    price float,
    item_detail map<varchar,varchar>,
    PRIMARY KEY ((username, cart_name), item_id)
);

INSERT INTO shopping_cart (username, cart_name, item_id, item_name, description, price, item_detail)
VALUES ('pmcfadin', 'Gadgets I want', 8675309, 'Garmin 910XT', 'Multisport training watch', 349.99,
        {'Related':'Timex sports watch', 'Volume Discount':'10'});

INSERT INTO shopping_cart (username, cart_name, item_id, item_name, description, price, item_detail)
VALUES ('pmcfadin', 'Gadgets I want', 9748575, 'Polaris Foot Pod', 'Bluetooth Smart foot pod', 64.00,
        {'Related':'Timex foot pod', 'Volume Discount':'25'});

One partition (storage row) of data

Item details: flexible for whatever you need

Partition row key for one user’s cart

Creates partition row key
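
A minimal in-memory sketch of this access pattern follows. It is plain Python, not a Cassandra driver. The dictionary key plays the role of the `(username, cart_name)` partition key and `item_id` plays the clustering column. Reading a whole cart is therefore one lookup that returns items in `item_id` order. The helper names `add_item` and `get_cart` are hypothetical, and the sample items are the two from the INSERTs above.

```python
from collections import defaultdict

# Toy in-memory stand-in for the shopping_cart table; not a Cassandra client.
# Partition key: (username, cart_name). Clustering column: item_id.
carts = defaultdict(dict)


def add_item(username, cart_name, item_id, item_name, price, item_detail=None):
    # Same primary key => overwrite, mirroring INSERT's upsert behavior.
    carts[(username, cart_name)][item_id] = {
        "item_name": item_name,
        "price": price,
        "item_detail": dict(item_detail or {}),
    }


def get_cart(username, cart_name):
    # The whole cart is one partition, read back in item_id order.
    partition = carts.get((username, cart_name), {})
    return [(item_id, partition[item_id]) for item_id in sorted(partition)]


add_item("pmcfadin", "Gadgets I want", 9748575, "Polaris Foot Pod", 64.00,
         {"Related": "Timex foot pod", "Volume Discount": "25"})
add_item("pmcfadin", "Gadgets I want", 8675309, "Garmin 910XT", 349.99,
         {"Related": "Timex sports watch", "Volume Discount": "10"})
print([item_id for item_id, _ in get_cart("pmcfadin", "Gadgets I want")])
# prints [8675309, 9748575]
```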

Page 14: The world's next top data model

Watching users, making decisions. Freaky, but cool.

User Activity Tracking

Page 15: The world's next top data model

User activity use case

* React to user input in real time

* Support for multiple application pods

* Scale for speed

* Losing interactions is costly

* Waiting for batch (Hadoop) is too long

The bad

Page 16: The world's next top data model

User activity data model

* Interaction points stored per user in a short-term table

* Long-term interactions stored in a similar table with a date partition

* Process long-term data later using batch

* Reverse time series to get the last N items
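
The two-table pattern can be sketched in Python as below. This is a toy model, not a driver. Plain `datetime` values stand in for `timeuuid`, and the recent table is a per-user list. The history table is keyed by `(username, "YYYYMMDD")`, matching the `'20130605'` date this deck uses for the history partition. `last_n` mimics a DESC clustering order plus `LIMIT`. `record` and `last_n` are hypothetical helper names.

```python
from collections import defaultdict
from datetime import datetime

# Toy model of user_activity and user_activity_history; not Cassandra code.
recent = defaultdict(list)   # username -> [(time, activity_code, detail), ...]
history = defaultdict(list)  # (username, "YYYYMMDD") -> same tuples


def record(username, when, activity_code, detail):
    event = (when, activity_code, detail)
    recent[username].append(event)
    # Long-term copy lands in a per-day partition such as "20130605".
    history[(username, when.strftime("%Y%m%d"))].append(event)


def last_n(username, n):
    # Same result as CLUSTERING ORDER BY (interaction_time DESC) plus LIMIT n.
    return sorted(recent[username], key=lambda e: e[0], reverse=True)[:n]


record("pmcfadin", datetime(2013, 6, 5, 9, 0), "100", "Normal login")
record("pmcfadin", datetime(2013, 6, 5, 9, 2), "201", "Opened shopping cart: Gadgets I want")
record("pmcfadin", datetime(2013, 6, 5, 9, 7), "301", "Entered shopping area: Jewelry")
print([code for _, code, _ in last_n("pmcfadin", 2)])  # prints ['301', '201']
```

In Cassandra the DESC ordering is stored on disk, so "last N" is a cheap slice from the head of the partition rather than a sort.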

Page 17: The world's next top data model

User activity data model

CREATE TABLE user_activity (
    username varchar,
    interaction_time timeuuid,
    activity_code varchar,
    detail varchar,
    PRIMARY KEY (username, interaction_time)
) WITH CLUSTERING ORDER BY (interaction_time DESC);

CREATE TABLE user_activity_history (
    username varchar,
    interaction_date varchar,
    interaction_time timeuuid,
    activity_code varchar,
    detail varchar,
    PRIMARY KEY ((username, interaction_date), interaction_time)
);

INSERT INTO user_activity (username, interaction_time, activity_code, detail)
VALUES ('pmcfadin', 0D1454E0-D202-11E2-8B8B-0800200C9A66, '100', 'Normal login')
USING TTL 2592000;

INSERT INTO user_activity_history (username, interaction_date, interaction_time, activity_code, detail)
VALUES ('pmcfadin', '20130605', 0D1454E0-D202-11E2-8B8B-0800200C9A66, '100', 'Normal login');

Reverse order based on timestamp

Expire after 30 days
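
The TTL value is 30 days expressed in seconds. Below is a small Python check of that arithmetic. It uses a simplified expiry rule that treats a cell as gone once the TTL has fully elapsed since the write. `is_expired` is a hypothetical helper, not a Cassandra API.

```python
from datetime import datetime, timedelta, timezone

TTL_SECONDS = 30 * 24 * 60 * 60  # 30 days; the value used in USING TTL 2592000


def is_expired(written_at, now, ttl_seconds=TTL_SECONDS):
    # Toy check: a cell written with a TTL disappears once ttl_seconds pass.
    return now >= written_at + timedelta(seconds=ttl_seconds)


written = datetime(2013, 6, 5, tzinfo=timezone.utc)
print(TTL_SECONDS)                                        # prints 2592000
print(is_expired(written, written + timedelta(days=29)))  # prints False
print(is_expired(written, written + timedelta(days=30)))  # prints True
```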

Page 18: The world's next top data model

Data model usage

select * from user_activity limit 5;

 username | interaction_time                     | detail                                   | activity_code
----------+--------------------------------------+------------------------------------------+------------------
 pmcfadin | 9ccc9df0-d076-11e2-923e-5d8390e664ec | Entered shopping area: Jewelry           | 301
 pmcfadin | 9c652990-d076-11e2-923e-5d8390e664ec | Created shopping cart: Anniversary gifts | 202
 pmcfadin | 1b5cef90-d076-11e2-923e-5d8390e664ec | Deleted shopping cart: Gadgets I want    | 205
 pmcfadin | 1b0e5a60-d076-11e2-923e-5d8390e664ec | Opened shopping cart: Gadgets I want     | 201
 pmcfadin | 1b0be960-d076-11e2-923e-5d8390e664ec | Normal login                             | 100

Maybe put a sale item for flowers too?

Page 19: The world's next top data model

Machines generate logs at a furious pace. Be ready.

Log collection/aggregation

Page 20: The world's next top data model

Log collection use case

* Collect log data at high speed

* Cassandra near where logs are generated. Multi-datacenter

* Dice data for various uses: dashboard, lookup, etc.

* The scale needed for an RDBMS is cost-prohibitive

* Batch analysis of logs is too late for some use cases

The bad

Page 21: The world's next top data model

Log collection data model

* Use Flume to collect and fan out data to various tables

* Tables for lookup based on source and time

* Tables for dashboard with aggregation and summation
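
Here is a rough Python sketch of the fan-out step. One incoming log event is written to the lookup table, and if it is a login it also increments a per-minute counter. This is neither Flume configuration nor a Cassandra client. Three details are assumptions for illustration: the `"YYYYMMDDHHMM"` format for `date_to_minute` (the slides do not show one), the `login_result` field, and the source name `web01`.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Toy fan-out of one log event into the three log tables; not Flume code.
log_lookup = defaultdict(list)  # (source, date_to_minute) -> [raw_log, ...]
login_success = Counter()       # (source, date_to_minute) -> successful_logins
login_failure = Counter()       # (source, date_to_minute) -> failed_logins


def date_to_minute(when):
    # Assumed bucket format (the slides do not show one): "YYYYMMDDHHMM".
    return when.strftime("%Y%m%d%H%M")


def fan_out(source, when, raw_log, login_result=None):
    # login_result is a hypothetical parsed field: "success", "failure" or None.
    bucket = date_to_minute(when)
    log_lookup[(source, bucket)].append(raw_log)
    if login_result == "success":
        login_success[(source, bucket)] += 1  # like a counter column increment
    elif login_result == "failure":
        login_failure[(source, bucket)] += 1


fan_out("web01", datetime(2013, 6, 24, 10, 1, 5), b"login ok", "success")
fan_out("web01", datetime(2013, 6, 24, 10, 1, 40), b"login ok", "success")
fan_out("web01", datetime(2013, 6, 24, 10, 2, 3), b"bad password", "failure")
```

Bucketing by minute keeps each lookup partition bounded. The counters turn the dashboard into a simple read instead of an aggregation at query time.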

Page 22: The world's next top data model

Log collection data model

CREATE TABLE log_lookup (
    source varchar,
    date_to_minute varchar,
    timestamp timeuuid,
    raw_log blob,
    PRIMARY KEY ((source, date_to_minute), timestamp)
);

CREATE TABLE login_success (
    source varchar,
    date_to_minute varchar,
    successful_logins counter,
    PRIMARY KEY (source, date_to_minute)
) WITH CLUSTERING ORDER BY (date_to_minute DESC);

CREATE TABLE login_failure (
    source varchar,
    date_to_minute varchar,
    failed_logins counter,
    PRIMARY KEY (source, date_to_minute)
) WITH CLUSTERING ORDER BY (date_to_minute DESC);

Consider storing raw logs as GZIP
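
One way to follow the GZIP suggestion is sketched below with Python's standard `gzip` module. Compress before writing to the `raw_log` blob, and decompress after reading. The helper names `pack_raw_log` and `unpack_raw_log` and the sample log line are made up for illustration.

```python
import gzip


def pack_raw_log(text):
    # Compress a raw log entry before writing it to the raw_log blob column.
    return gzip.compress(text.encode("utf-8"))


def unpack_raw_log(blob):
    # Reverse of pack_raw_log when reading from log_lookup.
    return gzip.decompress(blob).decode("utf-8")


line = "Jun 24 10:01:05 web01 sshd[1234]: Accepted password for pmcfadin\n" * 20
blob = pack_raw_log(line)
print(len(line.encode("utf-8")), len(blob))  # repetitive text shrinks a lot
```

GZIP adds a fixed header and trailer to every payload, so a very short single line may not shrink at all. The savings show up on larger or repetitive entries.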

Page 23: The world's next top data model

Log dashboard

[Chart: successful logins and failed logins per minute, 10:01 to 10:19, y-axis 0 to 100]

SELECT date_to_minute, successful_logins
FROM login_success
LIMIT 20;

SELECT date_to_minute, failed_logins
FROM login_failure
LIMIT 20;
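
The DESC clustering order returns the newest minutes first, while a chart usually wants the oldest first. Also, a counter row only exists for minutes that were actually incremented. The Python sketch below merges the two result sets into chart points. It uses a hypothetical helper and made-up sample rows for a single source.

```python
def dashboard_series(success_rows, failure_rows):
    """Merge two (date_to_minute, count) result sets into ascending
    (minute, successes, failures) points for a chart.

    A minute that appears in only one table is reported as 0 in the
    other, since a counter row exists only once it has been incremented.
    """
    success = dict(success_rows)
    failure = dict(failure_rows)
    # "YYYYMMDDHHMM" strings sort chronologically as plain text.
    minutes = sorted(set(success) | set(failure))
    return [(m, success.get(m, 0), failure.get(m, 0)) for m in minutes]


# Hypothetical rows, newest first, as the two SELECTs would return them.
ok_rows = [("201306241003", 41), ("201306241001", 37)]
failed_rows = [("201306241002", 4)]
for point in dashboard_series(ok_rows, failed_rows):
    print(point)
# ('201306241001', 37, 0)
# ('201306241002', 0, 4)
# ('201306241003', 41, 0)
```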

Page 24: The world's next top data model

Because mistaks mistakes happen

User Form Versioning

Page 25: The world's next top data model

Form versioning use case

* Store every possible version efficiently

* Scale to any number of users

* Commit/Rollback functionality on a form

* In an RDBMS, many relations that need complicated joins

* Needs to be in the cloud and in a local data center

The bad

Page 26: The world's next top data model

Form version data model

* Each user has a form

* Each form needs versioning

* Separate table to store live version

* Exclusive lock on a form

Page 27: The world's next top data model

Form version data model

CREATE TABLE working_version (
    username varchar,
    form_id int,
    version_number int,
    locked_by varchar,
    form_attributes map<varchar,varchar>,
    PRIMARY KEY ((username, form_id), version_number)
) WITH CLUSTERING ORDER BY (version_number DESC);

INSERT INTO working_version (username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin', 1138, 1, '',
        {'FirstName<text>':'First Name: ', 'LastName<text>':'Last Name: ',
         'EmailAddress<text>':'Email Address: ', 'Newsletter<radio>':'Y,N'});

UPDATE working_version SET locked_by = 'pmcfadin'
WHERE username = 'pmcfadin'
AND form_id = 1138
AND version_number = 1;

INSERT INTO working_version (username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin', 1138, 2, null,
        {'FirstName<text>':'First Name: ', 'LastName<text>':'Last Name: ',
         'EmailAddress<text>':'Email Address: ', 'Newsletter<checkbox>':'Y'});

1. Insert first version

2. Lock for one user

3. Insert new version. Release lock
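
The three steps, plus the rollback from the use case, can be modeled in Python as below. This is a toy stand-in for one `(username, form_id)` partition, not a driver. In this model, as with the plain `UPDATE` above, the lock is a read-then-write and is not atomic, so two clients could both see the form unlocked. Cassandra 1.2 has no compare-and-set; lightweight transactions with `IF` conditions arrived in 2.0. The `rollback` behavior, which re-commits an old version as the newest one, is an assumption.

```python
class FormVersions:
    """Toy model of one working_version partition (username, form_id)."""

    def __init__(self):
        # version_number -> {"locked_by": ..., "form_attributes": {...}}
        self.versions = {}

    def latest(self):
        # CLUSTERING ORDER BY (version_number DESC) puts this row first.
        number = max(self.versions)
        return number, self.versions[number]

    def insert_first(self, form_attributes):
        # Step 1: insert the first version, unlocked.
        self.versions[1] = {"locked_by": None, "form_attributes": dict(form_attributes)}

    def lock(self, user):
        # Step 2: read-then-write lock on the latest version (not atomic).
        _, row = self.latest()
        if row["locked_by"] not in (None, "", user):
            raise RuntimeError("form is locked by " + row["locked_by"])
        row["locked_by"] = user

    def commit(self, user, form_attributes):
        # Step 3: insert a new version with the lock released.
        number, row = self.latest()
        if row["locked_by"] != user:
            raise RuntimeError("commit requires holding the lock")
        self.versions[number + 1] = {"locked_by": None,
                                     "form_attributes": dict(form_attributes)}
        return number + 1

    def rollback(self, user, to_version):
        # Hypothetical rollback: re-commit an older version as the newest one,
        # so every earlier version is still kept.
        return self.commit(user, self.versions[to_version]["form_attributes"])


form = FormVersions()
form.insert_first({"FirstName<text>": "First Name: ", "LastName<text>": "Last Name: ",
                   "EmailAddress<text>": "Email Address: ", "Newsletter<radio>": "Y,N"})
form.lock("pmcfadin")
new_version = form.commit("pmcfadin", {"FirstName<text>": "First Name: ",
                                       "LastName<text>": "Last Name: ",
                                       "EmailAddress<text>": "Email Address: ",
                                       "Newsletter<checkbox>": "Y"})
# new_version == 2
```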

Page 28: The world's next top data model

That’s it!

“Mind what you have learned. Save you it can.”

- Yoda. Master Data Modeler

Page 29: The world's next top data model

Your data model is next!

* Try out a few things

* See what works

* If all else fails, engage an expert in the community

* Want more? Follow me on Twitter: @PatrickMcFadin
