Writing Space and the Cassandra NoSQL DBMS

Writing Space and the Cassandra NoSQL DBMS Brian King (with thanks to Michael Aillon)

By: Brian King at Pearson Education

Transcript of Writing Space and the Cassandra NoSQL DBMS

Page 1: Writing Space and the Cassandra NoSQL DBMS

Writing Space and the Cassandra NoSQL DBMS

Brian King (with thanks to Michael Aillon)

Writing Space

Why A Writing Space?

“Writing is one of the most effective tools available to develop a student's critical thinking.”

The Business Needs

•  Efficient Administration Of Writing Assignments
•  Scalable Classrooms (500+)
•  Workflow Optimization / Automation
•  Integrated Access To Assessment Tools
   o  Grammar Checking
   o  Auto-Scoring
   o  Plagiarism Detection (Source Check)
•  Grading Rubrics
•  Online Editing And Document Upload
•  Peer Review
•  Group Projects

The Technical Goals

•  Highly "Internet" Scalable
•  Global Presence
•  Continuous Availability (Fault Tolerance)
•  Broad OS And Browser Support
•  Mobile Device Support: "Mobile First"
•  Low Cost (Systems, Maintenance, Integration)
•  Write Once, Integrate “Anywhere”
•  Gain Experience With Modern NoSQL Technologies
•  REST Service-Based Architecture
•  Model UI

Writing Space - Instructor

Writing Space - Student

Cassandra

What We Like

•  Highly Scalable
•  Easy Multi-Data Center Support
•  Performance
•  Distributed Ring Configuration (Master-less)
•  Dynamic Schema, “Schema-less”
•  Slice Queries
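Slice queries work because Cassandra keeps the columns of a row sorted by column name (under the column family's comparator), so a contiguous range of column names can be read in one pass. The sketch below is a toy in-memory model of a single row, not Cassandra's storage engine. It assumes string column names compared lexicographically; the class `WideRow`, its `slice` method and the `start` / `finish` / `count` parameters are hypothetical names that only loosely echo the Thrift-era `SliceRange` (where empty bounds mean "unbounded"). Reversed slices, deletions and on-disk layout are ignored, and the "v0001"-style column names are illustrative only.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class WideRow:
    """Toy model of one Cassandra row: columns kept sorted by name."""

    def __init__(self) -> None:
        self._names: List[str] = []        # column names, always kept sorted
        self._values: Dict[str, str] = {}  # column name -> value

    def insert(self, name: str, value: str) -> None:
        # Upsert semantics: writing an existing column name overwrites its value.
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice(self, start: str = "", finish: str = "",
              count: Optional[int] = None) -> List[Tuple[str, str]]:
        # Empty start/finish mean "from the first column" / "to the last column".
        lo = bisect.bisect_left(self._names, start) if start else 0
        hi = bisect.bisect_right(self._names, finish) if finish else len(self._names)
        names = self._names[lo:hi]
        if count is not None:
            names = names[:count]  # cap the number of columns returned
        return [(n, self._values[n]) for n in names]


# Hypothetical usage: one row whose column names sort in version order.
row = WideRow()
for i in range(1, 6):
    row.insert(f"v{i:04d}", f"draft {i}")

print(row.slice("v0002", "v0004"))  # [('v0002', 'draft 2'), ('v0003', 'draft 3'), ('v0004', 'draft 4')]
print(row.slice(count=2))           # [('v0001', 'draft 1'), ('v0002', 'draft 2')]
```

Because the range is found with two binary searches over already-sorted names, the cost of a slice depends on how many columns it returns rather than on the total width of the row, which is why query-first data modeling tries to make the needed data a contiguous slice.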

What Challenged Us

•  Eventual / Tunable Consistency
•  Key-Name-Value Data Store (Column Based)
•  Data Modeling Based On Core Queries
•  All Rows In A Column Family (CF) Typically Don't Live On One Server
   o  However, All Columns For A Row Do
•  RDBMS Mindset
•  No Ad Hoc Queries
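The two bullets about where rows live can be illustrated with a toy token ring. This is a minimal sketch, not Cassandra's implementation, and it rests on stated assumptions: each node owns a single token derived by hashing a hypothetical node name, row tokens are MD5 hashes of the row key (in the spirit of the older RandomPartitioner; newer releases default to Murmur3, which is not reproduced here), and replicas are placed SimpleStrategy-style on the next nodes clockwise. The class `Ring` and its `replicas` method are hypothetical names.

```python
import bisect
import hashlib
from typing import List


def token(key: str) -> int:
    # Toy token: the MD5 digest of the key read as an integer.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy master-less ring: every node owns the range ending at its token."""

    def __init__(self, nodes: List[str], replication_factor: int) -> None:
        if not 1 <= replication_factor <= len(nodes):
            raise ValueError("replication factor must be between 1 and the node count")
        # Hypothetical: one token per node, derived from the node's name.
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]
        self.rf = replication_factor

    def replicas(self, row_key: str) -> List[str]:
        # First replica: the first node whose token is >= the row's token,
        # wrapping around to the start of the ring past the largest token.
        i = bisect.bisect_left(self._tokens, token(row_key)) % len(self._ring)
        # Remaining replicas: the next distinct nodes clockwise.
        return [self._ring[(i + k) % len(self._ring)][1] for k in range(self.rf)]


ring = Ring(["node1", "node2", "node3", "node4"], replication_factor=3)
for key in ["doc-1", "doc-2", "doc-3"]:
    print(key, ring.replicas(key))
```

Placement depends only on the row key, so every column written to a given row lands on the same replica set, while different row keys hash to different points on the ring and spread across all four servers. That is why a query confined to one row is cheap and a query across many rows is not.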

What Is Consistency?

•  Write Consistency: Number Of Replicas That Must Acknowledge A Write
•  Read Consistency: Number Of Replicas That Must Respond To A Read
•  Replication Factor: Number Of Replicas For A Row
•  Quorum Consistency Level (Read And Write):
   o  One Option When Specifying Read/Write Consistency
   o  Quorum = floor(Replication_Factor / 2) + 1
   o  Quorum Reads Plus Quorum Writes Ensure Strong Consistency (Replicas Read + Replicas Written > Replication Factor)
   o  While Maintaining High Availability
•  With 4 Servers, Writing Space Uses:
   o  Replication Factor = 3 (So Quorum = 2)
   o  Read And Write Quorum Consistency
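The quorum rule on this slide can be checked with a few lines of arithmetic. This minimal Python sketch uses only the slide's definitions; the helper names are hypothetical. QUORUM is a strict majority of a row's replicas; a read and a write are guaranteed to overlap on at least one replica whenever replicas read plus replicas written exceeds the replication factor; and the replication factor minus the quorum is how many replicas of a row can be unavailable while QUORUM requests still succeed.

```python
def quorum(replication_factor: int) -> int:
    # floor(RF / 2) + 1: a strict majority of a row's replicas.
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    # Every read set overlaps every write set when R + W > RF.
    return read_replicas + write_replicas > replication_factor


def tolerated_failures(replication_factor: int) -> int:
    # Replicas of a row that can be down while QUORUM requests still succeed.
    return replication_factor - quorum(replication_factor)


rf = 3  # Writing Space's replication factor
q = quorum(rf)
print(q, is_strongly_consistent(q, q, rf), tolerated_failures(rf))  # 2 True 1
```

With a replication factor of 3, a quorum is 2 and one replica per row may be down. Because each row lives on 3 of the 4 servers, any single server can fail without blocking QUORUM reads or writes, whereas two failed servers can leave some rows with only one live replica. Reading and writing at ONE (1 + 1 is not greater than 3) would trade that overlap guarantee for lower latency.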

What's Not In Cassandra...

Typical RDBMS Features Not Available (Yet):
•  Referential Integrity Constraints / Foreign Keys
•  Commit / Rollback
•  Stored Procedures
•  Joins
•  Views
•  Triggers
•  Functions
•  Security Privileges
•  Rules
•  Partitioned Table Definitions

Cassandra In Writing Space

Document Versioning...

How We Modeled Our Data...

Storage Strategy: Document-Oriented

[Diagram: data model with relationships labeled 1:M and 1:1]

The Writing Space DB Infrastructure

The Hardware

•  Many Inexpensive Servers (Actually 4 + 1)
•  Our Configuration:
   o  Processor: Xeon E5630, 2.53 GHz, 4 Cores
   o  Memory: 96 GB
   o  Storage: Two Mirrored Spinning Disks For OS / Binaries; Three Striped 480 GB Solid State Drives (Providing 1.3 TB Local DB Storage)
•  Peer-to-Peer Ring
•  Hot Swappable / Fault Tolerant
•  "What's Your Insurance Company?"
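The 1.3 TB of local database storage follows from the drive sizes, assuming the 480 GB figure is in decimal gigabytes (as drive vendors usually quote it) and the usable figure is expressed in binary terabytes (TiB). Striping adds the three drives' capacities together but gives no redundancy inside a server: losing any one striped SSD loses that node's data directory, so in this design redundancy comes from Cassandra's replicas on the other nodes.

```latex
3 \times 480\,\text{GB} = 1440\,\text{GB} = 1.44 \times 10^{12}\,\text{bytes},
\qquad
\frac{1.44 \times 10^{12}\,\text{bytes}}{2^{40}\,\text{bytes/TiB}} \approx 1.31\,\text{TiB}
```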

Why DataStax Cassandra?

•  A Certified, Production-Ready Version Of Cassandra
•  24/7 World-Class Support
•  Integration With Hadoop
•  Integration With Solr
•  OpsCenter (Multi-Data Center Management Tool)

Performance

•  Doc Store And UI
•  Load: 3x Anticipated Load
•  Total Time Of Run: 1.75 Hours
•  Max Document Size: 10k (25k, 50k And 75k For The Doc Store)

Results
•  Average Response Time: < 300 ms
•  Maximum Running Virtual Users (Vusers): 684
•  Total Throughput (bytes): 7,176,727,121
•  Average Throughput (bytes/sec): 1,993,535
•  Total Hits: 342,833
•  Average Hits Per Second: 95
•  DB Server CPU: < 0.3%

Performance

•  Document Store Only
•  Load: 100x Anticipated Load
•  Total Time Of Run: 1 Hour
•  Document Size: 25k, 50k And 75k

Results
•  Average Response Time: < 100 ms
•  Maximum Running Virtual Users (Vusers): 2,200
•  Total Throughput (bytes): 2,291,522,553
•  Average Throughput (bytes/sec): 565,808
•  Total Hits: 834,640
•  Average Hits Per Second: 206
•  DB Server CPU: < 1%

Wrapping It Up

Cloud Decision Points

•  Cost Savings
•  Continuous Availability
•  Performance / Dynamic (Elastic) Scalability
•  Global Distribution Of Access Points
•  Redundancy
•  Disaster Recovery
•  Resiliency To Node / Connectivity Loss Is A Must

What Would We Do Differently?

•  Think About Reporting Up Front
•  Data Analytics: Hadoop And Solr Are Heavy Duty
•  More Expensive Hardware?
•  Different RAID Configuration (Not Striping)
•  Get Training, Especially About Schema Design

Final Thoughts...

Consider The Human Element...
•  Mind Shift For RDBMS Folks
•  Need To “Let Go” Of The Idea That Data Must Be Normalized
•  Experience Of Operations Team
•  Netflix: 4 People Managing 800+ Nodes

Global Enterprise
•  Global Presence
•  Disaster Recovery
•  Internet Scale

Writing Space and the Cassandra NoSQL DBMS

Thank you! Questions?