Post on 04-Oct-2018
Tackling The Challenges of Big Data Big Data Storage
Samuel Madden Professor and Director of Big Data at CSAIL
Massachusetts Institute of Technology
Tackling The Challenges of Big Data Big Data Storage NoSQL, NewSQL
Introduction
© 2014 Massachusetts Institute of Technology
What Does a Traditional Database Provide?
• Record-oriented persistent storage
– e.g., bank accounts, shopping carts, employee records
• Carefully structured data (“schemas”)
– account: (customer, balance, interest_rate, …)
• Powerful Query Language (SQL)
SELECT balance FROM account WHERE cust='Madden'
• Transactions (“ACID Semantics”)
– Group of statements executed together
– “All or nothing”
* E.g., Transfer $100 from account A to B
– Even in a distributed setting (at some complexity)
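The "all or nothing" transfer can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table layout, the CHECK constraint, and the `transfer` helper are assumptions made for the example, not part of the lecture.

```python
import sqlite3

# Hypothetical schema: one row per account, balances may not go negative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "cust TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 150), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as a single all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE cust = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK, the credit above is undone too.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE cust = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT cust, balance FROM account"))


ok_first = transfer(conn, "A", "B", 100)   # succeeds
ok_second = transfer(conn, "A", "B", 100)  # would overdraw A, so rolled back
print(ok_first, ok_second, balances(conn))
```

The second transfer credits B before the debit on A fails; the rollback removes that credit, so no partial transfer is ever visible.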
A Thousand Flowers Bloom
• Traditional properties not a good fit for all apps
• Big Data → Proliferation of Storage Systems
• New Capabilities Needed
– Very high throughput operation (millions of users)
– Distributed across many machines
– Highly available, even with network failures
* Eventually Consistent vs ACID
– Different programming interfaces or data models
Requirement: High Throughput
• Modern large web apps run thousands of transactions per second:
– Ad Serving
– Email Services
– Shopping Carts
– Financial Trades
• Conventional database systems are not engineered for such rates
Requirement: Distributed, Highly Available
• For web applications, availability is critically important
• May even be willing to sacrifice data quality
– E.g., OK if some search results are missing
– Traditional databases favor ACID consistency over availability
• Availability through multi-node replication
– Failover in the event of outage
Requirement: New Programming Models
• Not all data is relational, e.g., JSON
• Sometimes SQL is complex, slow
• SQL is “schema first”; “wild” data may not conform to a schema, or schema may be hard to describe

A JSON Document
{ sku: "00e8da9d",
  type: "Film",
  ...,
  asin: "B000P0J0AQ",
  shipping: { ... },
  pricing: { ... },
  details: {
    title: "The Matrix",
    director: [ "Andy Wachowski", "Larry Wachowski" ],
    writer: [ "Andy Wachowski", "Larry Wachowski" ],
    ...,
    aspect_ratio: "1.66:1"
  },
}
Source: http://docs.mongodb.org/ecosystem/use-cases/product-catalog/
Rest of This Module
• A survey of a few ideas and systems
– Impossible to be exhaustive
• NoSQL
– Non-tabular data
– Understanding eventual consistency
• NewSQL
– H-Store: high throughput main memory relational database
Alternative Data Models
What is a Data Model?
• A way of representing data in a database
• Classic Approach: “Relational”
• Why are data models significant?
– Representation dictates how programs access data
– SQL: most common relational language
* Allows complex queries over table structure

Relational Representation of Course Catalog
CourseID | Title | ProfID | Time | Room
6.01 | Intro to EECS | 1 | 8:30 MW | 123
6.02 | Intro to EECS | 1 | 9:30 TR | 145
6.033 | Systems | 2 | 9:30 MW | 154
6.814 | Databases | 3 | 10:30 TR | 138
(ProfID is a reference, or relationship.)

A SQL Query to Find Profs Who Teach More than 1 Class (uses a join, an aggregate, and a subquery)
SELECT profName FROM classes, profs
WHERE profs.ProfID = classes.ProfID
AND classes.ProfID IN (
  SELECT ProfID
  FROM classes
  GROUP BY ProfID
  HAVING count(*) > 1
)
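To try the query, here is a small sketch using Python's sqlite3 module. The `classes` rows come from the course catalog table; the `profs` table contents (which name goes with which ProfID) are assumed for illustration, and DISTINCT is added so each professor is listed once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE profs (ProfID INTEGER PRIMARY KEY, profName TEXT);
    CREATE TABLE classes (
        CourseID TEXT PRIMARY KEY,
        Title TEXT,
        ProfID INTEGER REFERENCES profs(ProfID),
        Time TEXT,
        Room INTEGER
    );
    -- Assumed mapping of ProfID to names, for illustration only.
    INSERT INTO profs VALUES (1, 'Madden'), (2, 'Zeldovich'), (3, 'Smith');
    INSERT INTO classes VALUES
        ('6.01',  'Intro to EECS', 1, '8:30 MW',  123),
        ('6.02',  'Intro to EECS', 1, '9:30 TR',  145),
        ('6.033', 'Systems',       2, '9:30 MW',  154),
        ('6.814', 'Databases',     3, '10:30 TR', 138);
    """
)

rows = conn.execute(
    """
    SELECT DISTINCT profName FROM classes, profs
    WHERE profs.ProfID = classes.ProfID          -- join
    AND classes.ProfID IN (                      -- subquery
        SELECT ProfID FROM classes
        GROUP BY ProfID
        HAVING count(*) > 1                      -- aggregate
    )
    """
).fetchall()

busy_profs = [name for (name,) in rows]
print(busy_profs)
```

Without DISTINCT the join would return one row per class taught, so a professor with two classes would appear twice.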
NoSQL Data Models
Relational → SQL
NoSQL → Non-relational?
• Key Value Stores
• Document Stores
Key Value Stores (Dynamo, Riak, Cassandra, …)
• Data is a mapping from keys to arbitrary values
• Values don’t conform to any particular structure
• Programming Language is get/put
get("6.01") → "Intro to EECS 1 by Professor Madden"
put("6.005", "Software Engineering by Prof. Jones")
• Limited multi-record transactional consistency

Key Value Representation of Course Catalog
Key | Value
6.01 | “Intro to EECS 1 by Professor Madden”
6.02 | “Intro to EECS 2 by Professor Stonebraker”
6.033 | “Systems by Professor Zeldovich”
6.814 | “Databases by Professor Smith”
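The get/put interface is small enough to sketch in a few lines of Python. The `KeyValueStore` class below is a hypothetical in-memory stand-in, not the API of Dynamo, Riak, or Cassandra.

```python
class KeyValueStore:
    """Toy in-memory key-value store: values are opaque to the store."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


catalog = KeyValueStore()
catalog.put("6.01", "Intro to EECS 1 by Professor Madden")
catalog.put("6.005", "Software Engineering by Prof. Jones")

print(catalog.get("6.01"))
print(catalog.get("6.999"))  # unknown key: the store returns the default
```

Because the store never looks inside a value, a question such as "which courses does Madden teach?" has to be answered by the application, by fetching values and parsing them itself.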
Document Stores (Mongo, CouchDB, …)
• KeyValue++
• Data maps from keys to (XML/JSON) documents
• Can look up documents by key
• Also some ability to search contents of documents
• Typically, no joins or multi-document updates

Document Representation of Course Catalog (note the different fields in each doc)
Key | Value
6.01 | {title: “Intro To EECS”, prof: “Madden”, room: 123}
6.02 | {title: “Intro To EECS 2”, professor: “Stonebraker”}
6.033 | {title: “Systems”, room: 145}
6.814 | {title: “Databases”, room: 154, professor: “Smith”}
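A minimal sketch of "key lookup plus some ability to search contents", using the four course documents. The `DocumentStore` class and its `find` method are hypothetical and are not the MongoDB or CouchDB API.

```python
class DocumentStore:
    """Toy document store: key -> dict, plus exact-match search on fields."""

    def __init__(self):
        self._docs = {}

    def put(self, key, doc):
        self._docs[key] = dict(doc)

    def get(self, key):
        return self._docs.get(key)

    def find(self, **criteria):
        """Return the sorted keys of documents whose fields equal all criteria."""
        return sorted(
            key
            for key, doc in self._docs.items()
            if all(doc.get(field) == value for field, value in criteria.items())
        )


courses = DocumentStore()
courses.put("6.01", {"title": "Intro To EECS", "prof": "Madden", "room": 123})
courses.put("6.02", {"title": "Intro To EECS 2", "professor": "Stonebraker"})
courses.put("6.033", {"title": "Systems", "room": 145})
courses.put("6.814", {"title": "Databases", "room": 154, "professor": "Smith"})

by_smith = courses.find(professor="Smith")
by_madden = courses.find(professor="Madden")  # 6.01 uses the field "prof"
in_room_145 = courses.find(room=145)
print(by_smith, by_madden, in_room_145)
```

The empty result for `professor="Madden"` shows the cost of schema freedom: documents that name the same attribute differently are invisible to a search on either name.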
Why Key Value?
• Simple to program and implement
• Can be much faster than legacy SQL databases
– No complicated, unpredictable SQL queries
• Easy to distribute across many machines
– Since no multi-record consistency
• Note: Same data can be represented in all models

Next: Implications of lack of consistency
Then: Can we achieve similar benefits in relational model?
Understanding Eventual Consistency
NoSQL → Non Transactional
• Earlier: many key value stores aren’t transactional
– Allows them to run more ops per second
• Implications:
– On a single machine, can’t guarantee “all or nothing”
* E.g., DB crashes in “Transfer $100 from A to B”
– On multiple machines, replica inconsistency
* Eventual consistency means replicas will sync up, but until they do, they may contain different data
Why Replicate?
Replication is used to ensure availability
[Figure: a user exchanges query/answer messages with Replica 1, Replica 2, and Replica 3]
What About Updates?
[Figure: a user sends the same update to Replica 1, Replica 2, and Replica 3]
Updates with Unavailable Replicas
• Option 1: Wait Until Failed Replica Comes Back
– Problem: Can’t update system → “Unavailable”
• Option 2: Just Keep Going (“Eventual Consistency”)
– Problem: What to do when failed node comes back?
– What if some reads are sent to failed node?
• Option 3: Majority Write/Majority Read + Versions
Majority Read/Majority Write
[Figure: Replicas 1, 2, and 3 each hold X:0; the user sends Update X:1 to all three]
Do not write if fewer than a majority of replicas are available
[Figure: after the user's Update X:1, Replicas 1 and 2 hold X:1 while Replica 3 still holds X:0]
[Figure: Replicas 1 and 2 hold X:1, Replica 3 holds X:0; the user issues Read X to two replicas and receives X:1 and X:0]
If all reads and all writes go to a majority of replicas, a read is guaranteed to see the most recent version
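The majority read/write rule with versions can be sketched as follows. Everything here (the `Replica` and `QuorumStore` classes, the `up` flag used to simulate failures) is a hypothetical in-process model; a real system would also handle concurrent writers and network timeouts.

```python
class Replica:
    def __init__(self):
        self.up = True
        self.store = {}  # key -> (version, value)


class QuorumStore:
    """Toy majority read / majority write over in-process replicas."""

    def __init__(self, n=3):
        self.replicas = [Replica() for _ in range(n)]
        self.majority = n // 2 + 1

    def _live(self):
        live = [r for r in self.replicas if r.up]
        if len(live) < self.majority:
            raise RuntimeError("fewer than a majority of replicas available")
        return live

    def write(self, key, value):
        live = self._live()
        # Any majority overlaps the last write's majority, so this sees
        # the highest version written so far.
        version = max(r.store.get(key, (0, None))[0] for r in live) + 1
        for r in live:
            r.store[key] = (version, value)
        return version

    def read(self, key):
        contacted = self._live()[: self.majority]
        answers = [r.store.get(key, (0, None)) for r in contacted]
        # Return the value carrying the highest version number.
        return max(answers, key=lambda answer: answer[0])[1]


store = QuorumStore(3)
store.write("X", 0)             # all three replicas hold X:0
store.replicas[2].up = False    # Replica 3 fails
store.write("X", 1)             # reaches Replicas 1 and 2 only
store.replicas[2].up = True     # Replica 3 returns, still holding X:0
store.replicas[0].up = False    # Replica 1 fails
latest = store.read("X")        # reads Replicas 2 and 3: X:1 and X:0
print(latest)
```

The read contacts one up-to-date replica and one stale replica, and the version number is what lets it pick X:1 over X:0.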
Majority Read/Write vs Eventual Consistency
Approach | Pros | Cons
Wait For Failed Replicas on Update | Only need to read one replica; fully consistent | Unavailable on writes with failed replicas
Eventual Consistency (“Just Keep Going”) | Only need to read one replica; able to tolerate multiple node failures | Reads may not see most recent version, or may see inconsistent versions
Majority Read / Write | Fully consistent; able to tolerate some node failures | Need to read multiple replicas; unavailable if more than half of replicas have failed
Bringing Failed Replica Up To Date
• Eventual consistency → replicas sync up
• Also needed in majority read/write scheme
• Many different approaches:
– Keep a log of updates at each replica
* Copy to and replay at recovering replica
– Periodically compare parts of replicas
– Copy entire state of existing replica to new replica
– …
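The first approach, keeping a log of updates and replaying it at the recovering replica, might look like the sketch below. The `LoggedReplica` class, the sequence numbers, and the `catch_up` helper are assumptions made for the example.

```python
class LoggedReplica:
    """Toy replica that records every applied update in an ordered log."""

    def __init__(self):
        self.data = {}
        self.log = []  # list of (sequence number, key, value)

    def apply(self, seq, key, value):
        self.data[key] = value
        self.log.append((seq, key, value))

    def last_seq(self):
        return self.log[-1][0] if self.log else 0


def catch_up(recovering, healthy):
    """Copy the updates the recovering replica missed and replay them."""
    since = recovering.last_seq()
    missed = [entry for entry in healthy.log if entry[0] > since]
    for seq, key, value in missed:
        recovering.apply(seq, key, value)
    return len(missed)


a, b = LoggedReplica(), LoggedReplica()
for replica in (a, b):
    replica.apply(1, "X", 0)   # both replicas see update 1

# Replica b is down while a keeps accepting updates.
a.apply(2, "X", 1)
a.apply(3, "Y", 7)

replayed = catch_up(b, a)      # b returns and replays updates 2 and 3
print(replayed, b.data)
```

Only the tail of the log is shipped, which is cheaper than copying the entire state when the outage was short.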
Summary
• Eventual consistency
– Possibility of inconsistent reads
– Fast: Little distributed coordination
• Majority Read/Write
– Avoid inconsistent reads problem
– Slower: Requires more coordination
• Next: H-Store: How to build a VERY FAST and consistent SQL database
H-Store Overview
[Figure: three classes of systems: TRADITIONAL, NOSQL, NEWSQL]
Slides and Graphics: Andy Pavlo
APPLICATION: Japanese “American Idol” (VOTER BENCHMARK)
TRANSACTION:
1. Check whether user has already voted.
2. Insert new vote entry.
3. Update vote count for contestant.
[Figure: Japanese “American Idol” VOTER BENCHMARK; throughput (TXN/SEC, axis 0 to 50,000) vs. CPU CORES (1 to 8) for MySQL and Postgres]
[Figure: Measured CPU Cycles, TRADITIONAL DBMS; segments for BUFFER POOL, LOCKING, RECOVERY, and REAL WORK, with segment labels 28%, 30%, 30%, and 12%]
Source: OLTP Through the Looking Glass, and What We Found There. SIGMOD, pp. 981-992, 2008.
[Figure: TRADITIONAL, NEWSQL, and NOSQL systems placed on two axes: GUARANTEES, from WEAK (None/Limited) to STRONG (ACID), and SCALABILITY, from LOW (One Node) to HIGH (Many Nodes)]
CAN A DBMS SCALE UP WITHOUT GIVING UP TRANSACTIONS?
Key Idea: USE A LIGHTWEIGHT SYSTEM DESIGNED FOR TRANSACTIONS.
Source: H-Store: A High-Performance, Distributed Main Memory Transaction Processing System. Proc. VLDB Endow., vol. 1, iss. 2, pp. 1496-1499, 2008.
[Figure: DISK ORIENTED, CONCURRENT EXECUTION, and HEAVYWEIGHT RECOVERY contrasted with MAIN MEMORY STORAGE, SERIAL EXECUTION, and COMPACT LOGGING]
[Figure: the Application sends a Transaction (Procedure Name, Input Parameters) for Transaction Execution by SINGLE-THREADED EXECUTION ENGINES, one per PARTITION, and receives a Transaction Result; the diagram also shows a CMD LOG and SNAPSHOTS]

STORED PROCEDURE
run(phoneNum, contestantId, currentTime) {
  result = execute(VoteCount, phoneNum);
  if (result > MAX_VOTES) {
    return (ERROR);
  }
  execute(InsertVote, phoneNum, contestantId, currentTime);
  return (SUCCESS);
}

VoteCount: SELECT COUNT(*) FROM votes WHERE phone_num = ?;
InsertVote: INSERT INTO votes VALUES (?, ?, ?);
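For readers who want to run the voting logic, here is a sketch of the same procedure against an in-memory SQLite database. It is not H-Store code: the column names other than phone_num, the value of MAX_VOTES, and the use of `>=` (so that MAX_VOTES is the actual cap per phone number) are assumptions made for the example.

```python
import sqlite3

MAX_VOTES = 1                      # assumed cap per phone number
SUCCESS, ERROR = "SUCCESS", "ERROR"

conn = sqlite3.connect(":memory:")
# Column names other than phone_num are hypothetical.
conn.execute(
    "CREATE TABLE votes (phone_num TEXT, contestant_id INTEGER, created REAL)"
)

VOTE_COUNT = "SELECT COUNT(*) FROM votes WHERE phone_num = ?"
INSERT_VOTE = "INSERT INTO votes VALUES (?, ?, ?)"


def run(phone_num, contestant_id, current_time):
    """Check the caller's vote count, then insert, as one transaction."""
    with conn:  # commit on normal exit, roll back on exception
        (count,) = conn.execute(VOTE_COUNT, (phone_num,)).fetchone()
        if count >= MAX_VOTES:
            return ERROR
        conn.execute(INSERT_VOTE, (phone_num, contestant_id, current_time))
        return SUCCESS


first = run("555-0100", 7, 0.0)
second = run("555-0100", 7, 1.0)   # same phone number again
(total_votes,) = conn.execute("SELECT COUNT(*) FROM votes").fetchone()
print(first, second, total_votes)
```

The whole transaction is one function call with its parameters, which matches the procedure-name-plus-input-parameters request shown in the diagram.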
[Figure: Japanese “American Idol” VOTER BENCHMARK; throughput (TXN/SEC) vs. CPU CORES (1 to 8). H-Store is plotted on an axis from 0 to 250,000; MySQL and Postgres on an axis from 0 to 50,000. Annotation: 25x]
H-Store Implementation
H-Store
• No Disk
• No Locking
• No Concurrency Control
• No Logging
[Figure: CPU cycle breakdown with segments Buffer Pool, Locking, Recovery, Real Work]
No Disk
• Partition DB into RAM-sized chunks
• Distribute across a cluster of machines
• Most OLTP workloads partition nearly perfectly
– Per-customer shopping carts
– Per-user email accounts
• Most OLTP databases fit into aggregate cluster RAM (10+ TB)
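Per-customer partitioning can be illustrated with a simple hash on the customer name. The `PartitionedStore` class and the choice of CRC32 as the hash are assumptions for this sketch; the slides do not say how H-Store assigns rows to partitions.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a key to a partition with a hash that is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class PartitionedStore:
    """Toy shopping-cart store: each customer's cart lives in one partition."""

    def __init__(self, num_partitions):
        # Each dict stands in for one machine's RAM-resident chunk.
        self.partitions = [{} for _ in range(num_partitions)]

    def _partition(self, customer):
        return self.partitions[partition_for(customer, len(self.partitions))]

    def add_to_cart(self, customer, item):
        self._partition(customer).setdefault(customer, []).append(item)

    def cart(self, customer):
        return self._partition(customer).get(customer, [])


shop = PartitionedStore(num_partitions=4)
shop.add_to_cart("alice", "book")
shop.add_to_cart("bob", "lamp")
shop.add_to_cart("alice", "pen")

holders = [i for i, part in enumerate(shop.partitions) if "alice" in part]
print(shop.cart("alice"), holders)
```

Every operation on a cart touches exactly one partition, which is what "partition nearly perfectly" means for this kind of workload.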
No Concurrency Control
• One Transaction at a Time Per Partition
* Multi-partition transactions lock all partitions
• Is this really going to perform well?
* No stalls due to disk, network, etc.
* Most transactions run on a single partition
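The "one transaction at a time per partition" idea can be sketched with one worker thread per partition, each draining its own queue. This is a simplified model under stated assumptions: the `Partition` class, the `route` function, and the `increment` transaction are hypothetical, and multi-partition transactions are not modeled.

```python
import queue
import threading
import zlib


class Partition:
    """Owns its data; a single worker thread runs transactions serially."""

    def __init__(self):
        self.data = {}
        self.inbox = queue.Queue()
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def _run(self):
        while True:
            txn = self.inbox.get()
            if txn is None:      # shutdown marker
                break
            txn(self.data)       # no locks: only this thread touches data

    def submit(self, txn):
        self.inbox.put(txn)

    def close(self):
        self.inbox.put(None)
        self.worker.join()


def increment(key):
    """Build a single-partition transaction that bumps a counter."""
    def txn(data):
        data[key] = data.get(key, 0) + 1
    return txn


partitions = [Partition() for _ in range(2)]


def route(key):
    return partitions[zlib.crc32(key.encode("utf-8")) % len(partitions)]


for _ in range(1000):
    route("alice").submit(increment("alice"))
for _ in range(500):
    route("bob").submit(increment("bob"))

for p in partitions:
    p.close()                    # wait for every queued transaction

totals = {}
for p in partitions:
    totals.update(p.data)
print(totals)
```

The read-modify-write inside `increment` needs no lock because a key is only ever touched by the thread that owns its partition.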
No Disk-based Logging
• Recover from replicas
– By copying state on crash
– Asynchronously checkpoint to disk
– Use transaction logging to recover from crash
* Much less I/O at runtime than disk logging
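One way to picture checkpoint-plus-log recovery is the sketch below, where the log records only the procedure name and its parameters. The `Engine` class and the `deposit` procedure are hypothetical, and in-memory objects stand in for the on-disk snapshot and log; replay is only correct if procedures are deterministic.

```python
import copy


def deposit(state, account, amount):
    state[account] = state.get(account, 0) + amount


PROCEDURES = {"deposit": deposit}  # assumed registry of stored procedures


class Engine:
    """Toy main-memory engine with a snapshot and a command log."""

    def __init__(self):
        self.state = {}
        self.snapshot = {}       # stand-in for the on-disk checkpoint
        self.command_log = []    # commands executed since the checkpoint

    def execute(self, name, *params):
        self.command_log.append((name, params))  # log the command, not the data
        PROCEDURES[name](self.state, *params)

    def checkpoint(self):
        self.snapshot = copy.deepcopy(self.state)
        self.command_log.clear()

    def recover(self):
        """Rebuild state from the snapshot by re-running logged commands."""
        state = copy.deepcopy(self.snapshot)
        for name, params in self.command_log:
            PROCEDURES[name](state, *params)
        return state


engine = Engine()
engine.execute("deposit", "A", 100)
engine.checkpoint()
engine.execute("deposit", "A", 50)
engine.execute("deposit", "B", 20)

engine.state = {}                # simulate losing main memory in a crash
recovered = engine.recover()
print(recovered)
```

Each log entry is just a name and a few parameters, which hints at why this style of logging needs much less I/O at runtime than logging every modified record.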
Bottom Line
[Figure: the Buffer Pool / Locking / Recovery / Real Work breakdown shown twice, annotated with No Disk, No Concurrency, and No Logging]
Many Optimizations
• Speculative Execution For multi-partition transactions
– Low Overhead Concurrency Control for Partitioned Main Memory Databases. In Proceedings of SIGMOD, 2010.
• Automatic Partitioning
– Schism: A Workload-Driven Approach to Database Replication and Partitioning. In Proceedings of VLDB, 2010.
• Multi-thread DB on Multicores
– Speedy Transactions in Multicore In-Memory Databases. In Proceedings of SOSP, 2013.
Recap: NoSQL vs NewSQL
• Big Data → New Requirements on Database Systems
• New Data Models
– E.g., Documents
• High Availability
– Using Replication
– Eventual Consistency vs Majority Read/Write
• High Performance
– Simplified query platform (NoSQL)
– Modern, main-memory database system (H-Store)