Writing Space and the Cassandra NoSQL DBMS
Brian King (with thanks to Michael Aillon)
Writing Space
“Writing is one of the most effective tools available to develop a student's critical thinking.”
Why A Writing Space?
• Efficient Administration Of Writing Assignments
• Scalable Classrooms (500+)
• Workflow Optimization / Automation
• Integrated Access to Assessment Tools
  o Grammar Checking
  o Auto-Scoring
  o Plagiarism Detection (Source Check)
• Grading Rubrics
• Online Editing and Document Upload
• Peer Review
• Group Projects
The Business Needs
• Highly "Internet" Scalable
• Global Presence
• Continuous Availability (Fault Tolerance)
• Broad OS And Browser Support
• Mobile Device Support - "Mobile First"
• Low Cost (Systems, Maintenance, Integration)
• Write Once, Integrate “Anywhere”
• Gain Experience With Modern NoSQL Technologies
• REST Service-Based Architecture
• Model UI
The Technical Goals
Writing Space - Instructor
Writing Space - Student
Cassandra
• Highly Scalable
• Easy Multi-Data Center Support
• Performance
• Distributed Ring Configuration (Master-less)
• Dynamic Schema, “Schema-less”
• Slice Queries
What We Like
• Eventual / Tunable Consistency
• Key-Name-Value Data Store (Column Based)
• Data Modeling Based On Core Queries
• All Rows in a CF Typically Don't Live On 1 Server
• However, All Columns For a Row Do
• RDBMS Mindset
• No Ad Hoc Queries
What Challenged Us
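The point that rows are spread across the ring while all columns of one row stay together can be sketched in a few lines. This is a simplified illustration, not Cassandra's actual partitioner: the `Ring` class, the node names, and the idea of deriving node tokens by hashing node names are assumptions made for the example (real clusters assign tokens through configuration).

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a key to a position on the ring (MD5 used only for illustration)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical master-less ring: each row key maps to a fixed set of replicas."""

    def __init__(self, nodes, replication_factor):
        if replication_factor > len(nodes):
            raise ValueError("replication factor cannot exceed node count")
        self.replication_factor = replication_factor
        # Assumption: node tokens come from hashing the node name.
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, row_key):
        """Return the nodes holding the whole row (all of its columns)."""
        # First node whose token is >= the row token, wrapping around the ring.
        start = bisect.bisect_left(self._tokens, token(row_key)) % len(self._ring)
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]


if __name__ == "__main__":
    ring = Ring(["node1", "node2", "node3", "node4"], replication_factor=3)
    for key in ("essay:1001", "essay:1002", "essay:1003"):
        print(key, "->", ring.replicas(key))
```

Because placement depends only on the row key, every column of a row lands on the same replicas, while different rows of the same column family generally land on different nodes.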
What Is Consistency?
• Write Consistency: Number Of Replicas That Must Acknowledge A Write
• Read Consistency: Number Of Replicas That Must Respond To A Read
• Replication Factor: Number Of Replicas For A Row
• Quorum Consistency Level (Read And Write):
  o Option In Specifying Read/Write Consistency
  o (Replication_Factor / 2) + 1, Rounded Down To A Whole Number
  o Ensures Strong Consistency
  o While Maintaining High Availability
• With 4 Servers, Writing Space uses:
  o Replication Factor = 3
  o Read and Write Quorum Consistency
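The quorum arithmetic above can be checked with a short sketch. The function names are hypothetical and chosen for this example; the formula is the one on the slide, using integer division, and the consistency test is the usual overlap condition that read replicas plus write replicas exceed the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM read or write."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


def tolerated_replica_failures(replication_factor: int) -> int:
    """Replicas that can be down while QUORUM operations still succeed."""
    return replication_factor - quorum(replication_factor)


if __name__ == "__main__":
    rf = 3  # the replication factor Writing Space uses
    q = quorum(rf)
    print(q, is_strongly_consistent(q, q, rf), tolerated_replica_failures(rf))
    # prints: 2 True 1
```

With a replication factor of 3, a quorum is 2 replicas, so reads and writes overlap on at least one replica and the cluster keeps serving requests for a row with one of its replicas down.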
Typical RDBMS Features Not Available (Yet):
• Referential Integrity Constraints / Foreign Keys
• Commit / Rollback
• Stored Procedures
• Joins
• Views
• Triggers
• Functions
• Security Privileges
• Rules
• Partitioned Table Definitions
What's Not In Cassandra...
Cassandra In Writing Space
Document Versioning...
How We Modeled Our Data...
Storage Strategy: Document-oriented
(Diagram: data model relationships labeled 1:M and 1:1)
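The slides do not show the actual schema, so the following is only a guess at how a 1:M document-to-version relationship could sit in a single wide row and be read back with a slice query. Everything here is hypothetical: the `ColumnFamily` class is an in-memory stand-in, not a Cassandra client, and using version numbers as column names is an assumption made for illustration.

```python
from bisect import bisect_left, bisect_right


class ColumnFamily:
    """In-memory stand-in for a column family: row key -> columns sorted by name."""

    def __init__(self):
        self._rows = {}  # row key -> sorted list of (column_name, value)

    def insert(self, row_key, column_name, value):
        """Add or overwrite one column, keeping the row sorted by column name."""
        columns = self._rows.setdefault(row_key, [])
        names = [name for name, _ in columns]
        i = bisect_left(names, column_name)
        if i < len(columns) and columns[i][0] == column_name:
            columns[i] = (column_name, value)
        else:
            columns.insert(i, (column_name, value))

    def slice(self, row_key, start, finish):
        """Return the columns whose names fall in [start, finish], in order."""
        columns = self._rows.get(row_key, [])
        names = [name for name, _ in columns]
        return columns[bisect_left(names, start):bisect_right(names, finish)]


if __name__ == "__main__":
    documents = ColumnFamily()
    # One row per document; one column per version (hypothetical layout).
    documents.insert("doc-42", 1, "first draft")
    documents.insert("doc-42", 3, "final draft")
    documents.insert("doc-42", 2, "second draft")
    print(documents.slice("doc-42", 2, 3))
    # prints: [(2, 'second draft'), (3, 'final draft')]
```

A layout like this follows the "model around core queries" advice: fetching a range of versions for one document touches a single row, which lives entirely on one set of replicas.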
The Writing Space DB Infrastructure
The Hardware
• Many Inexpensive Servers (Actually 4 + 1)
• Our Configuration:
  o Processor: Xeon E5630, 2.53GHz, 4 Cores
  o Memory: 96 GB
  o Storage: Two Mirrored Spinning Disks For OS / Binaries
  o Storage: Three Striped 480GB Solid State Drives (Providing 1.3 TB Local DB Storage)
• Peer to Peer Ring
• Hot Swappable - Fault Tolerant
• "What's Your Insurance Company?"
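As a rough check on the storage figures, the sketch below works out the per-node capacity and how much unique data the ring could hold. It assumes the 4 data servers and the replication factor of 3 mentioned elsewhere in the deck, reads "1.3 TB" as binary terabytes, and ignores compaction headroom, snapshots, and file system overhead, so treat the result as an upper bound. The function names are made up for the example.

```python
def striped_capacity_gb(drives: int, drive_gb: int) -> int:
    """Striping adds drive capacities together (no redundancy)."""
    return drives * drive_gb


def gb_to_tib(gigabytes: float) -> float:
    """Convert decimal gigabytes to binary terabytes (TiB)."""
    return gigabytes * 10**9 / 2**40


def unique_data_capacity_gb(nodes: int, node_gb: int, replication_factor: int) -> float:
    """Raw cluster capacity divided by the number of copies kept of each row."""
    return nodes * node_gb / replication_factor


if __name__ == "__main__":
    node_gb = striped_capacity_gb(drives=3, drive_gb=480)
    print(node_gb)                                 # prints: 1440
    print(round(gb_to_tib(node_gb), 2))            # prints: 1.31
    print(unique_data_capacity_gb(4, node_gb, 3))  # prints: 1920.0
```

Three striped 480 GB drives give 1,440 GB per node, which matches the 1.3 TB on the slide; with three copies of every row, four such nodes hold at most about 1,920 GB of distinct data.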
Why DataStax Cassandra?
• A Certified, Production Ready Version Of Cassandra
• 24/7 World Class Support
• Integration With Hadoop
• Integration With Solr
• OpsCenter (Multi-Data Center Management Tool)
• Doc Store and UI
• Load: 3x Anticipated Load
• Total Time Of Run: 1.75 hours
• Max Document Size: 10k (25k, 50k and 75k DS)
Results
• Average Response Time: < 300ms
• Maximum Running Vusers: 684
• Total Throughput (bytes): 7,176,727,121
• Average Throughput (bytes/sec): 1,993,535
• Total Hits: 342,833
• Average Hits per Second: 95
• DB Server CPU < 0.3%
Performance
• Document Store only
• Load: 100x Anticipated Load
• Total Time Of Run: 1 hour
• Document Size: 25k, 50k and 75k
Results
• Average Response Time: < 100ms
• Maximum Running Vusers: 2,200
• Total Throughput (bytes): 2,291,522,553
• Average Throughput (bytes/sec): 565,808
• Total Hits: 834,640
• Average Hits per Second: 206
• DB Server CPU < 1%
Performance
Wrapping It Up
Cloud Decision Points
• Cost Savings
• Continuous Availability
• Performance / Dynamic (Elastic) Scalability
• Global Distribution Of Access Points
• Redundancy
• Disaster Recovery
• Resiliency To Node / Connectivity Losses A Must
• Think About Reporting Up Front
• Data Analytics – Hadoop and Solr Are Heavy Duty
• More Expensive Hardware?
• Different RAID Configuration (Not Striping)
• Get Training – Especially About Schema Design
What Would We Do Differently?
Consider The Human Element...
• Mind Shift For RDBMS Folks
• Need To “Let Go” That Data Needs To Be Normalized
• Experience Of Operations Team
• Netflix - 4 People Managing 800+ Nodes
Global Enterprise
• Global Presence
• Disaster Recovery
• Internet Scale
Final Thoughts...