Getting Biologists off ACID

22
Getting Biologists off ACID Ryan Verdon 3/13/12

description

Getting Biologists off ACID. Ryan Verdon 3/13/12. Outline. Thesis Idea Specific database Effects of losing ACID What is a NoSQL database Types of NoSQL databases with examples. Thesis Idea. Amount of biological data is growing exponentially Hardware cannot keep up with the growth. - PowerPoint PPT Presentation

Transcript of Getting Biologists off ACID

Page 1: Getting Biologists off ACID

Getting Biologists off ACID

Ryan Verdon3/13/12

Page 2: Getting Biologists off ACID

Outline

• Thesis Idea• Specific database• Effects of losing ACID• What is a NoSQL database• Types of NoSQL databases with examples

Page 3: Getting Biologists off ACID

Thesis Idea

• Amount of biological data is growing exponentially

• Hardware cannot keep up with the growth

Page 4: Getting Biologists off ACID

Database Survey• In 2003, a survey was conducted of 111 biological databases. • 96 (i.e. 87%) of the 111 considered databases have

Hypertext references to other databases,• 40 to 44 (i.e. 36% to 40%) are implemented as flat files,• 41 (or 42) (i.e. 37%) are implemented using a relational

database management system, • 7 (i.e. 6%) use an object database management system, • 3 (.i.e. 3%) use an object-relational database management

system, • and all databases collect data from different sources.

Page 5: Getting Biologists off ACID

Specific Database

Page 6: Getting Biologists off ACID

Flybase

• Primary biology database on the insect family Drosophilidae (fruit fly)

• Important research from fruit flies includes– Genes– Recombination– Signaling networks (important for major diseases)– Stem cells– Growth control

Page 7: Getting Biologists off ACID

Info on the Flybase database

• Single server relational database• Over 135 tables• Uses Chado schema• Gives data a “type”• Creates a graph that relates the different types

Page 8: Getting Biologists off ACID

Effects of losing ACID

Page 9: Getting Biologists off ACID

ACID

• Atomicity• Consistency• Isolation• Durability

Page 10: Getting Biologists off ACID

CAP Theorem

• Consistency• Availability• Partition tolerance

Page 11: Getting Biologists off ACID

BASE consistency model

• Basically Available• Soft state• Eventually consistent• Given enough time where no changes are

made, all replicas will see the same data

Page 12: Getting Biologists off ACID

NoSQL

Page 13: Getting Biologists off ACID

What is a NoSQL database

• Any non-relational database– Hierarchical– Graph– Object oriented– Etc.

Page 14: Getting Biologists off ACID

Why were NoSQL databases created

• To overcome limitations of relational databases– Predefined layout– Scaling– SQL– Large feature set

Page 15: Getting Biologists off ACID

Major types of NoSQL databases

• Key-value stores• Column-oriented databases• Document based stores

Page 16: Getting Biologists off ACID

Key-value stores

• Stored values are indexed for retrieval by keys• Can store unstructured or structured data• Similar to DHTs

Page 17: Getting Biologists off ACID

Example: Dynamo

• Created by Amazon, only used internally• Highly available even in the face of continual

failures• Amazon realized many services only need

primary-key access– Best seller lists– Shopping carts– Session management

Page 18: Getting Biologists off ACID

Column-oriented databases

• Contain extendable columns of closely related data

• Can greatly benefit from compression

Page 19: Getting Biologists off ACID

Example: HBase

• Apache open-source project• Based off of Google’s Bigtable database• Ties in closely with Hadoop and map-reduce

Page 20: Getting Biologists off ACID

Document based stores

• Data stored as a collection of documents• Documents can have any number of fields and

any length• Documents are accessed via a unique key• Capabilities of the query language heavily

depend on the implementation

Page 21: Getting Biologists off ACID

Example: MongoDB

• Started by 10gen• Supports– Replication– Map-reduce– Sharding

• Used by lots of large companies including– Disney– Craigslist

Page 22: Getting Biologists off ACID

Questions

?