Look Ma! No more blobs

Post on 28-Nov-2014


GridFS is a storage mechanism for persisting large binary data in MongoDB.


Look Ma! No more blobs

Aparna Chaudhary

NoSQL matters @ Cologne, Germany, 2013

EMBRACE POLYGLOT PERSISTENCE!

STOP RDBMS ABUSE!

KNOW YOUR USE CASE

Parse

Extract

Store

Read XML

We don't do rocket science...

Use Case

Runtime support for document types

Metadata definition provided at runtime

Document type names: max 50 characters

Look up content based on metadata

RA

Challenges

Storage of up to one million documents of 10KB to 2GB per document type per year

Write 1MB < x msec

Retrieve 1MB < y msec

...and details

RA

But…the Numbers make it interesting...

How?

File System

MongoDB

RDBMS

JCR

Document Management

If you want to store files, it's logical to use the file system.

Ain't it?

File System

✓ Ease of Use

✓ No special skill-set

✓ Backup and Recovery

✓ It’s free!

How do I name them?

Support for metadata storage?

Performance with too many small files?

Query - Administration?

High Availability?

Limitation on total number of files?

Relational database

Integrity

Consistency

Durability

Atomicity

Joins

Backups

High Availability

You name it, We have it!

RDBMS

Aggregations

RDBMS Developer’s Perspective

Challenge #1

RA

We need runtime support for document type.


Challenge #1

DOC_1 DOC_2 DOC_3

DOC_4 DOC_5 DOC_6

Dynamic DDL Generation

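On the RDBMS side, dynamic DDL generation amounts to assembling CREATE TABLE statements as strings at runtime. A minimal sketch in Python (the table-name prefix, column types, and helper name are illustrative, not from the talk):

```python
def create_table_ddl(doc_type: str, metadata_fields: list) -> str:
    """Assemble a CREATE TABLE statement for a document type at
    runtime -- the string concatenation that gets ugly fast."""
    cols = ", ".join(f"{name} VARCHAR(255)" for name in metadata_fields)
    return (f"CREATE TABLE DOC_{doc_type.upper()} "
            f"(ID NUMBER PRIMARY KEY, CONTENT BLOB, {cols})")

print(create_table_ddl("invoice", ["customer", "country"]))
```

Every document type registered at runtime triggers another such statement, which is why the next step is building a utility to hide the concatenation.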

Challenge #1

String concatenations are ugly…

DEV


Challenge #1

Let's build a utility.

DEV


Challenge #1

More work, more work…

Challenge #2

RA

Document type names can be up to 50 characters long.


Challenge #2

TABLE NAME LIMITS

Wait… SQL-92 says 128 characters?

"We rule. Let's support only 30 characters."


Challenge #2

DOC_TYPE_MAPPING

Let's create a mapping table.

DEV


Challenge #2

Ugly unreadable table names!

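One deterministic way to squeeze a 50-character document type name under a 30-character identifier limit is truncate-plus-hash, with the original name kept in the DOC_TYPE_MAPPING lookup table. This scheme is my own illustration, not the one used in the talk, and it produces exactly the kind of unreadable names the slide complains about:

```python
import hashlib

def short_table_name(doc_type: str, max_len: int = 30) -> str:
    """Derive an RDBMS-safe table name; names over the limit are
    truncated and suffixed with a short hash to stay unique."""
    if len(doc_type) <= max_len:
        return doc_type.upper()
    digest = hashlib.md5(doc_type.encode()).hexdigest()[:8].upper()
    return f"{doc_type[:max_len - 9].upper()}_{digest}"

print(short_table_name("a_very_long_document_type_name_over_thirty_chars"))
```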

So... finally...

Read XML

Dynamic DDL generation

Document Type Alias

Document type defined?

Yes

No

Extract Metadata

Store Metadata

Store Content

Simple use case becomes complex...

Remember...Our Challenge

QA

Let's see if we are in spec for response time.

Aah..what about performance now?

DEV

MongoDB

Document Based

GridFS

B-Tree

Dynamic Schema

JSON / BSON

Query

Scalable

http://www.10gen.com/presentations/storage-engine-internals

Joins

Complex Transactions

[Diagram: B-tree index structure — keys ID1–ID5 pointing to field/chunk entries F1–F9]

Concepts

[Diagram: several Databases, each containing multiple Collections]

Table = Collection

Column = Field

Row = Document

Database = Database

GridFS

MongoDB divides large content into chunks

Stores Metadata and Chunks separately

http://docs.mongodb.org/manual/core/gridfs/

> mybucket.files
{
  "_id" : ObjectId("514d5cb8c2e6ea4329646a5c"),
  "chunkSize" : NumberLong(262144),
  "length" : NumberLong(103015),
  "md5" : "34d29a163276accc7304bd69c5520e55",
  "filename" : "health_record_2.xml",
  "contentType" : "application/xml",
  "uploadDate" : ISODate("2013-03-23T07:41:44.907Z"),
  "aliases" : null,
  "metadata" : { "fname" : "Aparna", "lname" : "Chaudhary", "country" : "Netherlands" }
}

ObjectId - 12 Byte BSON:
4 Byte - Seconds since Epoch
3 Byte - Machine Id
2 Byte - Process Id
3 Byte - Counter
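That byte layout can be checked by slicing the hex string of the `_id` above; the timestamp seconds line up with the document's `uploadDate`. A small sketch (the helper name is mine):

```python
import datetime

def parse_object_id(oid_hex: str) -> dict:
    """Split a 24-hex-char (12-byte) ObjectId into its classic
    BSON parts: timestamp, machine id, process id, counter."""
    assert len(oid_hex) == 24
    return {
        "timestamp": datetime.datetime.fromtimestamp(
            int(oid_hex[0:8], 16), tz=datetime.timezone.utc),
        "machine_id": int(oid_hex[8:14], 16),
        "process_id": int(oid_hex[14:18], 16),
        "counter": int(oid_hex[18:24], 16),
    }

parts = parse_object_id("514d5cb8c2e6ea4329646a5c")
print(parts["timestamp"])  # 2013-03-23 07:41:44+00:00 -- matches the uploadDate above
```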

> mybucket.chunks
{
  "_id" : ObjectId("514d5cb8c2e6ea4329646a5d"),
  "files_id" : ObjectId("514d5cb8c2e6ea4329646a5c"),
  "n" : 0,
  "data" : BinData(0, ...)
}

I'm storing a 10KB file, but would it use 256KB on disk?

Last chunk = FileSize % 256KB + metadata overhead

1128KB file → 256 + 256 + 256 + 256 + 104 + x

10KB file → 10 + x

A chunk is only as big as it needs to be...
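The chunk arithmetic from the slide can be written out directly (sizes in KB, with the 256KB default chunk size of MongoDB 2.x; the per-chunk metadata overhead "x" is left out of the sketch):

```python
def gridfs_chunk_sizes(file_size_kb, chunk_size_kb=256):
    """Sizes in KB of the chunks GridFS stores for a file; only the
    last chunk is smaller than the configured chunk size."""
    full, last = divmod(file_size_kb, chunk_size_kb)
    return [chunk_size_kb] * full + ([last] if last else [])

print(gridfs_chunk_sizes(1128))  # [256, 256, 256, 256, 104]
print(gridfs_chunk_sizes(10))    # [10]
```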

Challenge #1

DEV

MongoDB supports Dynamic Schema.

You can use collection per docType and they are created dynamically.

RA

We need runtime support for document type.

Challenge #2

RA

Document type names can be up to 50 characters long.

DEV

A MongoDB namespace can be up to 123 characters.
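The namespace is the full `database.collection` string, so a collection-per-docType scheme can be guarded with a simple length check. A sketch (the helper is illustrative; the 123-character figure is the one quoted in the talk):

```python
def validate_namespace(db_name: str, collection_name: str,
                       max_len: int = 123) -> str:
    """Build the 'db.collection' namespace and reject names that
    would exceed the quoted 123-character limit."""
    ns = f"{db_name}.{collection_name}"
    if len(ns) > max_len:
        raise ValueError(f"namespace too long: {len(ns)} > {max_len}")
    return ns

# A 50-character document type name fits comfortably:
print(validate_namespace("docs", "A" * 50))
```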

So... finally...

Simple use case remains simple... well, becomes simpler...

Read XML

Extract Metadata

Store Metadata & Content

Remember...Our Challenge

QA

Let's see if we are in spec for response time.

DEV

Performance test is part of our definition of 'DONE'

Because seeing is believing!

Demo

‣ GridFS 2.4.0

‣ PostgreSQL 9.2

‣ Spring Data

‣ JMeter 2.7

‣ Mac OS X 10.8.3 2.3GHz Quad-Core Intel Core i7, 16GB RAM

https://github.com/aparnachaudhary/nosql-matters-demo

EMBRACEPOLYGLOT

PERSISTENCE!

STOP RDBMS ABUSE!

KNOW YOUR USE CASE

@aparnachaudhary

Java Developer, Data Lover

Eindhoven, Netherlands

http://blog.aparnachaudhary.com/

@aparnachaudhary

Thank You!