High Performance, Scalable MongoDB in a Bare Metal Cloud


Harold Hannon, Sr. Software Architect

Global Footprint
• 100k servers
• 24k customers
• 23 million domains
• 13 data centers
• 16 network POPs
• 20Gb fiber interconnects

On the agenda today…
• Big Data considerations
• Some deployment options
• Performance testing with the JS Benchmarking Harness
• Review some internal product research performed
• Discuss the impact of those findings on our product development

“Build me a Big Data Solution”

Product Use Case

• MongoDB deployed for customers on purchase
• Complex configurations, including sharding and replication
• Configurable via Portal interface
• Performance tuned to three "t-shirt size" deployments

Big Data Requirements
• High Performance
• Reliable, Predictable Performance
• Rapidly Scalable
• Easy to Deploy

Requirements Reviewed

Requirement                          Cloud Provider   Bare Metal Instance
High Performance
Reliable, Predictable Performance
Rapidly Scalable                     X
Easy to Deploy                       X

I’ve got nothing……

The “Marc-O-Meter”

I’M NOT HAPPY

Marc… Angry

Thinking about Big Data

The 3 V's: Volume, Velocity, Variety

Physical Deployment

Cloud vs. Metal

Public Cloud

• Speed of deployment
• Great for bursting use case
• Imaging and cloning make POC/Dev work easy
• Shared I/O
• Great for POC/Dev
• Excellent for app-level applications
• Not consistent enough for disk-intensive applications
• Must have application developed for the "cloud"

Physical Servers

Bare Metal
• Built to your specs
• Robust, quickly scaled environment
• Management of all aspects of the environment
• Image based
• No hypervisor
• Single tenant
• Great for Big Data solutions

The Proof is in the Pudding

Beware The “Best Case Test Case”

[Figure: 18 benchmark run samples, all between 184,408.8 and 192,806.8 read ops/sec; the headline quotes only the best run]

192,806.8 Read Ops/Sec

Do It Yourself

• Data Set Sizing
• Document/Object Sizes
• Platform
• Controlled client or AFAIC
• Concurrency
• Local or Remote Client
• Read/Write Tests

JS Benchmarking Harness

• Data Set Sizing
• Document/Object Sizes
• Platform
• Controlled client or AFAIC
• Concurrency
• Local or Remote Client
• Read/Write Tests

db.foo.drop();
db.foo.insert( { _id : 1 } );

ops = [
    { op: "findOne", ns: "test.foo", query: { _id: 1 } },
    { op: "update",  ns: "test.foo", query: { _id: 1 }, update: { $inc: { x: 1 } } }
];

for ( var x = 1; x <= 128; x *= 2 ) {
    res = benchRun( { parallel : x, seconds : 5, ops : ops } );
    print( "threads: " + x + "\t queries/sec: " + res.query );
}

Quick Example

host: The hostname of the machine mongod is running on (defaults to localhost).
username: The username to use when authenticating to mongod (only use if running with auth).
password: The password to use when authenticating to mongod (only use if running with auth).
db: The database to authenticate to (only necessary if running with auth).
ops: A list of objects describing the operations to run (documented below).
parallel: The number of threads to run (defaults to single thread).
seconds: The amount of time to run the tests for (defaults to one second).

Options
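Taken together, a minimal sketch of a benchRun call wired up with these connection-level options; the hostname and credentials are placeholders, and the auth fields can be dropped when mongod runs without auth:

res = benchRun({
    host     : "db1.example.com",   // placeholder mongod host (defaults to localhost)
    username : "bench",             // placeholder; only needed with auth enabled
    password : "secret",            // placeholder; only needed with auth enabled
    db       : "test",              // database to authenticate to
    ops      : [ { op: "findOne", ns: "test.foo", query: { _id: 1 } } ],
    parallel : 8,                   // eight concurrent client threads
    seconds  : 10                   // run the mix for ten seconds
});
print( "queries/sec: " + res.query );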

ns: The namespace of the collection you are running the operation on; should be of the form "db.collection".
op: The type of operation; can be "findOne", "insert", "update", "remove", "createIndex", "dropIndex" or "command".
query: The query object to use when querying or updating documents.
update: The update object (same as the 2nd argument of the update() function).
doc: The document to insert into the database (only for insert and remove).
safe: Boolean specifying whether to use safe writes (only for update and insert).

Options
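Combining these per-operation fields, a short sketch of an ops list touching each shape (test.foo is a placeholder collection):

ops = [
    // insert supplies the new document via "doc" rather than "query"
    { op: "insert", ns: "test.foo", doc: { x: 1 }, safe: true },
    // update pairs a selector ("query") with an update object
    { op: "update", ns: "test.foo", query: { x: 1 }, update: { $inc: { x: 1 } }, safe: true },
    // remove takes only the selector
    { op: "remove", ns: "test.foo", query: { x: 1 } }
];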

{ "#RAND_INT" : [ min , max , <multiplier> ] }[ 0 , 10 , 4 ] would produce random numbers between 0 and 10 and then multiply by 4.

{ "#RAND_STRING" : [ length ] }[ 3 ] would produce a string of 3 random characters.

var complexDoc3 = { info: { "#RAND_STRING": [30] } }

var complexDoc3 = { info: { inner_field: { "#RAND_STRING": [30] } } }

Dynamic Values
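Both generators can be embedded anywhere a value would appear in an op; a sketch inserting documents with one of each (test.foo is a placeholder collection):

ops = [{
    op  : "insert",
    ns  : "test.foo",
    doc : {
        x    : { "#RAND_INT"    : [ 0, 10, 4 ] },   // random 0-10, multiplied by 4
        info : { "#RAND_STRING" : [ 30 ] }          // 30 random characters
    }
}];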

Lots of them here:

https://github.com/mongodb/mongo/tree/master/jstests

Example Scripts

Read Only Test

• Random document size < 4k (mostly 1k)
• 6GB Working Data Set Size
• Random read only
• 10 seconds per query set execution
• Exponentially increasing concurrent clients from 1 to 128
• 48 Hour Test Run
• RAID10, 4 SSD drives
• Local Client
• "Pre-warmed cache"
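A minimal sketch of this read-only sweep, assuming documents carry an incrementing_id field (as in the full test script shown later) and a placeholder id range:

ops = [{
    op    : "findOne",
    ns    : "test.foo",
    query : { incrementing_id : { "#RAND_INT" : [ 0, 1000000 ] } }  // placeholder range
}];

for ( var x = 1; x <= 128; x *= 2 ) {
    res = benchRun( { parallel : x, seconds : 10, ops : ops } );
    print( "clients: " + x + "\t read ops/sec: " + res.query );
}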

The Results

Concurrent Clients   Avg Read OPS/Sec
1                    38288.527
2                    72103.3579
4                    127451.8867
8                    180798.4396
16                   191817.3361
32                   186429.4517
64                   187011.7824
128                  188187.0704

Some Tougher Tests
• Small MongoDB Bare Metal Cloud vs Public Cloud Instance
• Medium MongoDB Bare Metal Cloud vs Public Cloud Instance
  • SSD and 15K SAS
• Large MongoDB Bare Metal Cloud vs Public Cloud Instance
  • SSD and 15K SAS

Pre-configurations
• Set SSD Read Ahead Defaults to 16 Blocks – SSD drives have excellent seek times, allowing the Read Ahead to shrink to 16 blocks. Spinning disks might require slight buffering, so these have been set to 32 blocks.
• noatime – Adding the noatime option eliminates the need for the system to make writes to the file system for files which are simply being read; in other words: faster file access and less disk wear.
• Turn NUMA Off in BIOS – Linux, NUMA and MongoDB tend not to work well together. If you are running MongoDB on NUMA hardware, we recommend turning it off (running with an interleave memory policy, e.g. starting mongod under numactl --interleave=all). If you don't, problems will manifest in strange ways, like massive slowdowns for periods of time or high system CPU time.
• Set ulimit – We have set the ulimit to 64000 for open files and 32000 for user processes to prevent failures due to a loss of available file handles or user processes.
• Use ext4 – We have selected ext4 over ext3. We found ext3 to be very slow in allocating files (or removing them). Additionally, access within large files is poor with ext3.

Test Environment
[Diagram: the tester's local machine connects via RDP to a JMeter master client, which drives four JMeter servers over RMI across the private network.]

var numIterations = 1;
var low_rand = 0;
var RAND_STEP = 32767;

// Debug output for the externally supplied parameters:
// print("high_id is " + high_id);
// print("server is " + server);
// print("maxThreads is " + maxThreads);
// print("testDuration is " + testDuration);
// print("readTest is " + readTest);
// print("updateTest is " + updateTest);

Random.srand( (new Date()).valueOf() );

var last_id = 0;
function nextId() {
    return last_id++;
}

// Build one findOne op per RAND_STEP-sized slice of the id space,
// plus update ops that span the whole range.
var ops = [];

while (low_rand < high_id) {
    if (readTest) {
        ops.push({
            op : "findOne",
            ns : "test.foo",
            query : {
                incrementing_id : {
                    "#RAND_INT" : [ low_rand, low_rand + RAND_STEP ]
                }
            }
        });
    }
    if (updateTest) {
        ops.push({
            op : "update",
            ns : "test.foo",
            query : { incrementing_id : { "#RAND_INT" : [ 0, high_id ] } },
            update : { $inc : { counter : 1 } },
            safe : true
        });
    }
    low_rand += RAND_STEP;
}

// Print one comma-separated result row (column padding was left disabled).
function printLine(tokens, columns, width) {
    var line = "";
    for (var i = 0; i < tokens.length; i++) {
        line += tokens[i];
        if (i != tokens.length - 1)
            line += " , ";
    }
    line += " newline";   // literal row terminator for downstream parsing
    print(line);
}

for (var iteration = 1; iteration <= numIterations; iteration++) {
    print("threads, query/sec, query latency, updates/sec, update latency newline");
    // Double the client count each pass, from 1 up to maxThreads.
    for (var x = 1; x <= maxThreads; x *= 2) {
        res = benchRun({
            parallel : x,
            seconds : testDuration,
            ops : ops,
            host : server
        });
        printLine([ x,
                    (res.query || 0),
                    (res.findOneLatencyAverageMicros || 0).toFixed(2),
                    (res.update || 0),
                    (res.updateLatencyAverageMicros || 0).toFixed(2) ], 5, 80);
    }
}
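The script assumes six variables are supplied externally (they appear in the commented-out debug prints at the top). A possible preamble, with purely illustrative values:

var server       = "localhost";   // mongod host under test
var high_id      = 1000000;       // placeholder: upper bound of the incrementing_id range loaded into test.foo
var maxThreads   = 128;           // top of the doubling client sweep
var testDuration = 10;            // seconds per benchRun call
var readTest     = true;          // include the findOne ops
var updateTest   = true;          // include the update ops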

Small Test

Small MongoDB Server
• Single 4-core Intel 1270 CPU
• 64-bit CentOS
• 8GB RAM
• 2 x 500GB SATAII – RAID1
• 1Gb Network

Virtual Provider Instance
• 4 Virtual Compute Units
• 64-bit CentOS
• 7.5GB RAM
• 2 x 500GB Network Storage – RAID1
• 1Gb Network

Tests Performed
• Small Data Set (8GB of 0.5MB documents)
• 200 iterations of 6:1 query-to-update operations
• Concurrent client connections exponentially increased from 1 to 32
• Test duration spanned 48 hours
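A 6:1 query-to-update mix can be approximated by weighting the ops list itself, since benchRun draws its operations from that list; a sketch with a placeholder id range:

ops = [];
for (var i = 0; i < 6; i++) {
    // six read entries for every one update entry
    ops.push({ op : "findOne", ns : "test.foo",
               query : { incrementing_id : { "#RAND_INT" : [ 0, 1000000 ] } } });
}
ops.push({ op : "update", ns : "test.foo",
           query : { incrementing_id : { "#RAND_INT" : [ 0, 1000000 ] } },
           update : { $inc : { counter : 1 } }, safe : true });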

Small Test

Small Bare Metal Cloud Instance
• 64-bit CentOS
• 8GB RAM
• 2 x 500GB SATAII – RAID1
• 1Gb Network

Public Cloud Instance
• 4 Virtual Compute Units
• 64-bit CentOS
• 7.5GB RAM
• 2 x 500GB Network Storage – RAID1
• 1Gb Network

[Chart: Small Public Cloud. X-axis: Concurrent Clients (1, 2, 4, 8, 16, 32); Y-axis: Ops/Second (0 to 1,400).]

[Chart: Small Bare Metal. X-axis: Concurrent Clients (1, 2, 4, 8, 16, 32); Y-axis: Ops/Second (0 to 1,600).]

Medium Test

Medium MongoDB Server
• Dual 6-core Intel 5670 CPUs
• 64-bit CentOS
• 36GB RAM
• 2 x 64GB SSD – RAID1 (Journal Mount)
• 4 x 300GB 15K SAS – RAID10 (Data Mount)
• 1Gb Network – Bonded

Virtual Provider Instance
• 26 Virtual Compute Units
• 64-bit CentOS
• 30GB RAM
• 2 x 64GB Network Storage – RAID1 (Journal Mount)
• 4 x 300GB Network Storage – RAID10 (Data Mount)
• 1Gb Network

Tests Performed
• Data Set (32GB of 0.5MB documents)
• 200 iterations of 6:1 query-to-update operations
• Concurrent client connections exponentially increased from 1 to 128
• Test duration spanned 48 hours

Medium Test

Bare Metal Cloud Instance
• Dual 6-core Intel 5670 CPUs
• 64-bit CentOS
• 36GB RAM
• 2 x 64GB SSD – RAID1 (Journal Mount)
• 4 x 300GB 15K SAS – RAID10 (Data Mount)
• 1Gb Network – Bonded

Public Cloud Instance
• 26 Virtual Compute Units
• 64-bit CentOS
• 30GB RAM
• 2 x 64GB Network Storage – RAID1 (Journal Mount)
• 4 x 300GB Network Storage – RAID10 (Data Mount)
• 1Gb Network

Medium Test

Bare Metal Cloud Instance
• Dual 6-core Intel 5670 CPUs
• 64-bit CentOS
• 36GB RAM
• 2 x 64GB SSD – RAID1 (Journal Mount)
• 4 x 400GB SSD – RAID10 (Data Mount)
• 1Gb Network – Bonded

Public Cloud Instance
• 26 Virtual Compute Units
• 64-bit CentOS
• 30GB RAM
• 2 x 64GB Network Storage – RAID1 (Journal Mount)
• 4 x 400GB Network Storage – RAID10 (Data Mount)
• 1Gb Network

Medium Test

Tests Performed
• Data Set (32GB of 0.5MB documents)
• 200 iterations of 6:1 query-to-update operations
• Concurrent client connections exponentially increased from 1 to 128
• Test duration spanned 48 hours

[Chart: Medium Public Cloud. X-axis: Concurrent Clients (1 to 128); Y-axis: Ops/Second (0 to 5,000).]

[Chart: Medium Bare Metal 15k SAS. X-axis: Concurrent Clients (1 to 128); Y-axis: Ops/Second (0 to 8,000).]

[Chart: Medium Bare Metal SSD. X-axis: Concurrent Clients (1 to 128); Y-axis: Ops/Second (0 to 4,500).]

Large Test

Large MongoDB Server
• Dual 8-core Intel E5-2620 CPUs
• 64-bit CentOS
• 128GB RAM
• 2 x 64GB SSD – RAID1 (Journal Mount)
• 6 x 600GB 15K SAS – RAID10 (Data Mount)
• 1Gb Network – Bonded

Virtual Provider Instance
• 26 Virtual Compute Units
• 64-bit CentOS
• 64GB RAM (maximum available on this provider)
• 2 x 64GB Network Storage – RAID1 (Journal Mount)
• 6 x 600GB Network Storage – RAID10 (Data Mount)
• 1Gb Network

Tests Performed
• Data Set (64GB of 0.5MB documents)
• 200 iterations of 6:1 query-to-update operations
• Concurrent client connections exponentially increased from 1 to 128
• Test duration spanned 48 hours

Large Test

Bare Metal Cloud Instance
• Dual 8-core Intel E5-2620 CPUs
• 64-bit CentOS
• 128GB RAM
• 2 x 64GB SSD – RAID1 (Journal Mount)
• 6 x 600GB 15K SAS – RAID10 (Data Mount)
• 1Gb Network – Bonded

Public Cloud Instance
• 26 Virtual Compute Units
• 64-bit CentOS
• 64GB RAM (maximum available on this provider)
• 2 x 64GB Network Storage – RAID1 (Journal Mount)
• 6 x 600GB Network Storage – RAID10 (Data Mount)
• 1Gb Network

Large Test

Bare Metal Cloud Instance
• Dual 8-core Intel E5-2620 CPUs
• 64-bit CentOS
• 128GB RAM
• 2 x 64GB SSD – RAID1 (Journal Mount)
• 6 x 400GB SSD – RAID10 (Data Mount)
• 1Gb Network – Bonded

Public Cloud Instance
• 26 Virtual Compute Units
• 64-bit CentOS
• 64GB RAM (maximum available on this provider)
• 2 x 64GB Network Storage – RAID1 (Journal Mount)
• 6 x 400GB Network Storage – RAID10 (Data Mount)
• 1Gb Network

Large Test

Tests Performed
• Data Set (64GB of 0.5MB documents)
• 200 iterations of 6:1 query-to-update operations
• Concurrent client connections exponentially increased from 1 to 128
• Test duration spanned 48 hours

[Chart: Large Public Cloud. X-axis: Concurrent Clients (1 to 128); Y-axis: Ops/Second (0 to 6,000).]

[Chart: Large Bare Metal 15k SAS. X-axis: Concurrent Clients (1 to 128); Y-axis: Ops/Second (0 to 7,000).]

[Chart: Large Bare Metal SSD. X-axis: Concurrent Clients (1 to 128); Y-axis: Ops/Second (0 to 6,000).]

Superior Performance

Deployment Size   Bare Metal Drive Type   Bare Metal Average Performance Advantage over Virtual
Small             SATA II                 70%
Medium            15k SAS                 133%
Medium            SSD                     297%
Large             15k SAS                 111%
Large             SSD                     446%

Consistent Performance

RSD (Relative Standard Deviation) by Platform

Deployment   Virtual Instance   Bare Metal Instance
Small        6-36%              1-9%
Medium       8-43%              1-8%
Large        8-93%              1-9%

Requirements Reviewed

Requirement                          Cloud Provider   Bare Metal Instance
High Performance                                      X
Reliable, Predictable Performance                     X
Rapidly Scalable                     X
Easy to Deploy                       X

Not Quite There Yet……

The “Marc-O-Meter”

NOT SURE IF WANT

The Dream

The Reality

Deployment Complexity
[Diagram: a cluster of virtual instances, each backed by striped network-attached virtual volumes.]

Deployment Serenity: The Solution Designer

MongoDB Solutions
• Preconfigured
• Performance Tuned
• Bare Metal Single Tenant
• Complex Environment Configurations

Requirements Reviewed

Requirement                          Cloud Provider   Bare Metal Instance
High Performance                                      X
Reliable, Predictable Performance                     X
Rapidly Scalable                     X                X
Easy to Deploy                       X                X

The “Marc-O-Meter”

B+ FOR EFFORT

Customer Feedback

"We have over two terabytes of raw event data coming in every day ... Struq has been able to process over 95 percent of requests in fewer than 30 milliseconds"

– Aaron McKee, CTO, Struq

The “Marc-O-Meter”

WIN!!

Summary
• Bare Metal Cloud can be leveraged to simplify deployments
• Bare Metal offers significantly better and more consistent performance than Public Cloud
• Public Cloud is best suited for Dev/POC or when running data sets in memory only

More information:

www.softlayer.com
blog @ http://sftlyr.com/bdperf