C-Store: An Introduction to Berkeley DB
Jianlin Feng
School of Software
SUN YAT-SEN UNIVERSITY
Mar. 13, 2009
Overview of Berkeley DB
- Stands for "the Berkeley Database"
  - An open-source, embedded, transactional data management system
  - A key/value store
- Embedded?
  - A library that is linked with an application
  - Hides data management from the end user
- Scales from bytes to petabytes
  - Runs on everything from cell phones to large servers
Berkeley DB: Examples of Applications
- Google Accounts
  - Stores all user and service account information and preferences
- Amazon’s user customization
- Berkeley DB offers high reliability and high performance.
Berkeley DB: A Brief History (1)
- Began life in 1991 as a dynamic linear hashing implementation
  - Intended to replace the historic UNIX database libraries: dbm, ndbm, and hsearch
- Released as a library in 4.4BSD in 1992
  - db-1.85 == Hash + B-Tree
- The LIBTP package
  - A transactional implementation of db-1.85
  - A research prototype that was never released
Berkeley DB: A Brief History (2)
- In 1996, Seltzer and Bostic started Sleepycat Software
  - Initially to provide Berkeley DB for use in the Netscape browser
- Berkeley DB 2.0, released in 1997
  - Transactional implementation
  - The first commercial release
- Berkeley DB 3.0, released in 1999
  - Transformed into an object-oriented, handle-and-method style API
Berkeley DB: A Brief History (3)
- Berkeley DB 4.0, released in 2001
  - Single-master, multiple-reader replication
  - High availability: replicas can take over for a failed master
  - High scalability: read-only replicas can reduce the master's load
  - Similar ideas are adopted in C-Store.
- In Feb. 2006, Oracle acquired Sleepycat.
Sleepycat Public License: a Dual License
- The code
  - Is open source
  - May be downloaded and used freely
- However, redistribution requires
  - Either that the package using Berkeley DB be released as open source
  - Or that the distributors obtain a commercial license from Sleepycat (now Oracle, which acquired Sleepycat in Feb. 2006)
Berkeley DB: Product Family Today
- The original Berkeley DB library
- Berkeley DB XML
  - Built atop the library
- Berkeley DB Java Edition
  - 100% pure Java implementation
Berkeley DB: Product Family Architecture
[Figure: product family architecture diagram]
Berkeley DB: The Design Philosophy
- Provide mechanisms without specifying policies
- For example, Berkeley DB is abstracted as a store of <key, value> pairs.
  - Both keys and values are opaque byte strings.
  - That is, Berkeley DB has no schema,
  - and the application that embeds Berkeley DB is responsible for imposing its own schema on the data.
Advantages of <key, value> pairs
- An application is free to store data in whatever form is most natural to it.
  - Objects (like structures in the C language)
  - Rows in Oracle or SQL Server
  - Columns in C-Store
- Different data formats can be stored in the same database,
  - as long as the application understands how to interpret the data items.
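
To make the "application imposes the schema" point concrete, here is a minimal Python sketch. It does not use Berkeley DB itself: a plain dict stands in for the key/value store, and the record layout, key names, and helper functions are all hypothetical choices made by the example application.

```python
import struct

# Hypothetical record layout chosen by the application, not by the store:
# a 4-byte unsigned id, an 8-byte float balance, and a 16-byte padded name.
ACCOUNT = struct.Struct("<Id16s")


def encode_account(acct_id, balance, name):
    """Serialize one account record into an opaque byte string."""
    return ACCOUNT.pack(acct_id, balance, name.encode("utf-8"))


def decode_account(raw):
    """Reverse of encode_account; strips the NUL padding from the name."""
    acct_id, balance, name = ACCOUNT.unpack(raw)
    return acct_id, balance, name.rstrip(b"\x00").decode("utf-8")


# A plain dict stands in for the key/value store: it only ever sees bytes.
store = {}
store[b"acct:42"] = encode_account(42, 99.5, "alice")
# A differently formatted item can live in the same store.
store[b"note:1"] = "free-form text".encode("utf-8")
```

The store never inspects the bytes; only `decode_account` knows that the value under `b"acct:42"` is a packed record, while the value under `b"note:1"` is plain UTF-8 text.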
Indexing Key Values
- Indexing methods
  - B-Tree
  - Hash
  - Queue
  - Recno: a record-number-based index implemented atop B-Tree
- Data manipulation
  - Put: store key/value pairs
  - Get: retrieve key/value pairs
  - Delete: remove key/value pairs
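
The practical difference between a hash index and a B-Tree index can be sketched in a few lines of Python. This is an illustration only, not Berkeley DB code: the class names `HashStore` and `OrderedStore` are hypothetical, a dict stands in for the hash table, and a sorted key list stands in for the B-Tree.

```python
import bisect


class HashStore:
    """Hash-style access: exact-match put/get/delete, no key ordering."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        self._items[key] = value

    def get(self, key):
        return self._items.get(key)

    def delete(self, key):
        if key in self._items:
            del self._items[key]
            return True
        return False


class OrderedStore(HashStore):
    """B-Tree-style access: keys are kept sorted, so range scans work."""

    def __init__(self):
        super().__init__()
        self._keys = []  # sorted list standing in for the tree

    def put(self, key, value):
        if key not in self._items:
            bisect.insort(self._keys, key)
        super().put(key, value)

    def delete(self, key):
        if super().delete(key):
            self._keys.remove(key)
            return True
        return False

    def range(self, lo, hi):
        """Return (key, value) pairs with lo <= key < hi, in key order."""
        start = bisect.bisect_left(self._keys, lo)
        stop = bisect.bisect_left(self._keys, hi)
        return [(k, self._items[k]) for k in self._keys[start:stop]]


db = OrderedStore()
for k in (b"cherry", b"apple", b"banana"):
    db.put(k, k.upper())
db.delete(b"banana")
```

Both classes support the same put/get/delete operations; only the ordered variant can answer a query such as `db.range(b"a", b"c")`.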
How Do Applications Access key/value Pairs?
- Through handles on databases
  - A database is similar to a relational table
- Or through cursor handles
  - Each represents a specific place within a database
  - Used for iteration, i.e., fetching one key/value pair at a time
- Databases are implemented atop the OS file system.
  - A file may contain one or more databases.
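
A cursor, as described above, is essentially a remembered position plus operations that move it. The following Python sketch shows the idea over an in-memory sorted list; the `Cursor` class and its `seek`/`next` methods are hypothetical names for illustration and are not the Berkeley DB cursor API.

```python
import bisect


class Cursor:
    """Tracks one position inside a sorted list of (key, value) pairs."""

    def __init__(self, pairs):
        self._pairs = pairs  # assumed to be sorted by key
        self._pos = -1       # -1 means "before the first pair"

    def seek(self, key):
        """Move to the first pair whose key is >= key; return it or None."""
        # A 1-tuple (key,) sorts just before any (key, value) pair.
        self._pos = bisect.bisect_left(self._pairs, (key,))
        return self._current()

    def next(self):
        """Advance by one pair; return it, or None once past the end."""
        if self._pos < len(self._pairs):
            self._pos += 1
        return self._current()

    def _current(self):
        if 0 <= self._pos < len(self._pairs):
            return self._pairs[self._pos]
        return None


pairs = [(b"a", b"1"), (b"c", b"3"), (b"e", b"5")]
cur = Cursor(pairs)
seen = []
item = cur.next()
while item is not None:   # fetch one key/value pair each time
    seen.append(item)
    item = cur.next()
```

Several cursors can be open on the same data at once, each with its own position, which is what distinguishes a cursor handle from the database handle itself.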
Berkeley DB Replication: A Log-Shipping System
- A replication group
  - A single master
  - One or more read-only replicas
- All write operations must be processed transactionally by the master.
- The master sends log records to each of the replicas.
- The replicas apply log records only when they receive a transaction commit record.
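
The log-shipping rules above can be sketched as a small in-memory simulation in Python. Everything here is a simplifying assumption: the `Site` and `Master` classes, the tuple-shaped log records, and the direct method calls standing in for network messages are hypothetical and do not reflect Berkeley DB's actual replication API.

```python
class Site:
    """Applies a transaction's writes only once its commit record arrives."""

    def __init__(self):
        self.data = {}
        self._pending = {}  # txn id -> buffered (key, value) writes

    def receive(self, record):
        kind, txn = record[0], record[1]
        if kind == "put":
            # Buffer the write; readers of self.data cannot see it yet.
            self._pending.setdefault(txn, []).append(record[2:])
        elif kind == "commit":
            for key, value in self._pending.pop(txn, []):
                self.data[key] = value


class Master(Site):
    """Sole writer: logs every change and ships each record to replicas."""

    def __init__(self, replicas):
        super().__init__()
        self.log = []
        self._replicas = replicas

    def _ship(self, record):
        self.log.append(record)
        self.receive(record)  # the master applies its own log the same way
        for replica in self._replicas:
            replica.receive(record)

    def put(self, txn, key, value):
        self._ship(("put", txn, key, value))

    def commit(self, txn):
        self._ship(("commit", txn))


replica = Site()
master = Master([replica])
master.put(1, b"k", b"v1")
before = dict(replica.data)  # snapshot taken before the commit record
master.commit(1)
after = dict(replica.data)   # snapshot taken after the commit record
```

Because the replica buffers writes until the commit record arrives, a reader on the replica never observes a half-finished transaction.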
Berkeley DB: Configuration Flexibility
- Configuration flexibility is critical
  - Due to the wide range of applications
- Three ways to configure
  - Compile Time Configuration
  - Feature Set Selection
  - Runtime Configuration
Compile Time Configuration
- Option 1: small-footprint build
  - configure option: --enable-smallbuild
  - For use in a cell phone
  - The compiled library contains only the B-Tree index
  - Omits replication, cryptography, statistics collection, etc.
  - The library is about 0.5 MB.
- Option 2: higher-concurrency locking
  - configure option: --enable-fine-grained-lock-manager
  - For use in a data center
  - Lock-based concurrency control
Feature Set Selection
1. The Data Store (DS) feature set
  - Most similar to the original db-1.85 library
  - Good for temporary data storage
2. The Concurrent Data Store (CDS) feature set
  - Acquires a single lock per API invocation
  - Good for read-mostly applications
3. The Transactional Data Store (TDS) feature set
  - Currently the most widely used feature set
  - Locks at page granularity
4. The High Availability (HA) feature set
  - Can continue running even after a site fails
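
The difference in lock granularity between the CDS and TDS feature sets can be illustrated with a rough Python sketch. It is heavily simplified and entirely hypothetical: the class names are invented, a plain mutex replaces the reader/writer distinction, "pages" are just hash buckets, and locks are released at the end of each call rather than held to the end of a transaction.

```python
import threading
import zlib


class CdsStyleStore:
    """One store-wide lock, taken once per API call (coarse but simple)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = {}

    def put(self, key, value):
        with self._lock:
            self._items[key] = value

    def get(self, key):
        with self._lock:
            return self._items.get(key)


class PageLockedStore:
    """One lock per page; keys on different pages do not contend."""

    def __init__(self, n_pages=4):
        self._locks = [threading.Lock() for _ in range(n_pages)]
        self._pages = [{} for _ in range(n_pages)]

    def page_of(self, key):
        # Hypothetical placement rule: a checksum picks the page.
        return zlib.crc32(key) % len(self._pages)

    def put(self, key, value):
        page = self.page_of(key)
        with self._locks[page]:
            self._pages[page][key] = value

    def get(self, key):
        page = self.page_of(key)
        with self._locks[page]:
            return self._pages[page].get(key)
```

With the coarse store, every call serializes on the same lock; with the page-locked store, two writers block each other only when `page_of` maps their keys to the same page.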
Runtime Configuration
- Index selection and tuning
  - Applications can select the page size of an index
- Trading off durability and performance
  - No-force log write
  - Extreme case: applications can run completely in memory
- Trading off two-phase locking and multiversion concurrency control
- Note: C-Store adopts similar ideas for high performance.
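
One reading of the durability/performance trade-off is that a commit may or may not wait for its log record to reach disk. The Python sketch below illustrates that reading under stated assumptions: the `CommitLog` class, its `sync_on_commit` switch, and the record format are hypothetical and are not Berkeley DB configuration names.

```python
import os


class CommitLog:
    """Append-only log whose commit either forces data to disk or does not."""

    def __init__(self, path, sync_on_commit=True):
        self._file = open(path, "ab")
        self.sync_on_commit = sync_on_commit
        self.forced_writes = 0  # number of commits that waited for the disk

    def commit(self, txn_id, payload):
        self._file.write(b"%d:%b\n" % (txn_id, payload))
        if self.sync_on_commit:
            # Durable path: push buffers to the OS, then to the device.
            self._file.flush()
            os.fsync(self._file.fileno())
            self.forced_writes += 1
        # Otherwise the record may still sit in a buffer and could be
        # lost if the process or machine crashes before a later flush.

    def close(self):
        self._file.close()
```

Constructing the log with `sync_on_commit=False` skips the `fsync` on every commit, which is typically much faster but means recently committed transactions may not survive a crash.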
Challenges of Berkeley DB’s Flexibility
- Demands flexibility from Berkeley DB's designers
- Demands flexibility from application developers
Any Dream? Any Idea?
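iGoogle China University Student Innovative Design Contest
The 4th Software Innovation Design Contest, School of Software, Sun Yat-sen University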
Some Research with Me?
References
M. Seltzer. Berkeley DB: A Retrospective. IEEE Data Engineering Bulletin, Volume 30, Number 3, pp. 21-28, September 2007.
M. A. Olson, K. Bostic, M. Seltzer. Berkeley DB. USENIX Annual Technical Conference, pp. 183-192, June 6-11, 1999, Monterey, California, USA.
Oracle Berkeley DB Site. http://www.oracle.com/technology/products/berkeley-db