Building a Scalable and Modern Infrastructure at CARFAX

description

The CARFAX vehicle history database contains over twelve billion documents in a twelve-shard cluster that replicates to multiple data centers. This will be a step-by-step walkthrough of how we deploy our servers, how we manage high-volume reads and writes, and how we configure for high availability. By automating everything from the operating system install up, we are able to deploy complete replica clusters quickly and efficiently. Using distributed processing and message queuing, we load millions of new documents each day, with projected growth of over a billion records per year. Through the use of tagging, server configuration, and read settings, we deliver content with high consistency and availability.

Transcript of Building a Scalable and Modern Infrastructure at CARFAX

Page 1: Building a Scalable and Modern Infrastructure at CARFAX

A Scalable and Modern Infrastructure at CARFAX

Page 2: Building a Scalable and Modern Infrastructure at CARFAX

About Me
• Jai Hirsch – Senior Systems Architect, Data Technologies at CARFAX
• Long-time Java and Database Developer
• Data and Distributed Processing Enthusiast
• Github: https://github.com/JaiHirsch
• Twitter: @JaiHirsch
• Blog: http://jaihirsch.github.io/straw-in-a-haystack/

Page 3: Building a Scalable and Modern Infrastructure at CARFAX

“CARFAX helps millions of people buy and sell used cars with more confidence”

Page 4: Building a Scalable and Modern Infrastructure at CARFAX

CARFAX Vehicle History Report

Page 5: Building a Scalable and Modern Infrastructure at CARFAX

Documents on the Report

Page 6: Building a Scalable and Modern Infrastructure at CARFAX

NoSQL Before it Was Cool

Proprietary Key Value Store on OpenVMS Developed by CARFAX in 1984

Page 7: Building a Scalable and Modern Infrastructure at CARFAX

Never mind that sh*t! Here comes Mongo!

Page 8: Building a Scalable and Modern Infrastructure at CARFAX

Why MongoDB?
• Legacy structures mapped to documents
• High availability using replica sets
• Platform Independence
• Support

Page 9: Building a Scalable and Modern Infrastructure at CARFAX

MongoDB at CARFAX
• Our Production Environment
• The Legacy Database and High Volume Loads
• High Availability Reads

Page 10: Building a Scalable and Modern Infrastructure at CARFAX

Our Production Environment

Page 11: Building a Scalable and Modern Infrastructure at CARFAX

Server Deployment

AUTOMATE AUTOMATE AUTOMATE AUTOMATE

Page 12: Building a Scalable and Modern Infrastructure at CARFAX

Server Configuration
12 shards with two spare servers racked for failover
• OS: Linux
• MongoDB 2.4.9
• 128 GB of RAM
• 1.8 TB of Drive Space
• 10K RPM SAS Drives

Page 13: Building a Scalable and Modern Infrastructure at CARFAX
Page 14: Building a Scalable and Modern Infrastructure at CARFAX

The Future

Page 15: Building a Scalable and Modern Infrastructure at CARFAX
Page 16: Building a Scalable and Modern Infrastructure at CARFAX

Extract, Transform, Load

Page 17: Building a Scalable and Modern Infrastructure at CARFAX

Loading Millions to Billions of Records per Day

AUTOMATE AUTOMATE AUTOMATE AUTOMATE

Page 18: Building a Scalable and Modern Infrastructure at CARFAX

First Attempt To Load Was Completely CPU Bound

Page 19: Building a Scalable and Modern Infrastructure at CARFAX

Not Acceptable! 45 Days to Backload the Legacy Database

Page 20: Building a Scalable and Modern Infrastructure at CARFAX

Distributed Processing
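The slides do not show how the distributed load was built, so the following is only a minimal sketch of the general pattern the description mentions (a queue feeding parallel workers). Everything in it is an assumption for illustration: the `transform`, `worker`, and `load` names are hypothetical, an in-process `queue.Queue` stands in for a real message queue, threads stand in for separate worker machines, and a Python list stands in for the database.

```python
import queue
import threading


def transform(record):
    # Hypothetical transform: map a legacy key/value pair to a document.
    key, value = record
    return {"_id": key, "payload": value}


def worker(tasks, sink, lock):
    # Pull batches off the queue until a None sentinel arrives.
    while True:
        batch = tasks.get()
        if batch is None:
            break
        docs = [transform(r) for r in batch]
        with lock:
            # Stand-in for a bulk insert into the target database.
            sink.extend(docs)


def load(records, workers=4, batch_size=100):
    tasks = queue.Queue()
    sink = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(tasks, sink, lock))
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    # Producer side: split the input into batches and enqueue them.
    for i in range(0, len(records), batch_size):
        tasks.put(records[i:i + batch_size])
    # One sentinel per worker so every worker shuts down cleanly.
    for _ in threads:
        tasks.put(None)
    for t in threads:
        t.join()
    return sink
```

In a CPU-bound load like the one described on the earlier slide, the workers would need to be separate processes or machines rather than threads; the sketch only shows the shape of the producer, queue, and consumer roles.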

Page 21: Building a Scalable and Modern Infrastructure at CARFAX

Acceptable! Billion+ inserts per Day! 9 Days to Backload
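To put the slide's numbers in perspective, here is a small back-of-the-envelope calculation. It assumes exactly one billion inserts spread evenly over a 24-hour day (the slide says "billion+", so the real rate would be at least this) and compares the 45-day and 9-day backload figures; the function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def inserts_per_second(inserts_per_day):
    # Average sustained rate, assuming the load is spread evenly over the day.
    return inserts_per_day / SECONDS_PER_DAY


def speedup(before_days, after_days):
    # How many times faster the backload became.
    return before_days / after_days


rate = inserts_per_second(1_000_000_000)
print(f"{rate:,.0f} inserts/second sustained")
print(f"{speedup(45, 9):.0f}x faster backload")
```

Under those assumptions the cluster sustains roughly 11,600 inserts per second, and the backload is five times faster than the first attempt.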

Page 22: Building a Scalable and Modern Infrastructure at CARFAX

The MongoDB Implementation

• 13.6 billion+ documents
• 1.5 billion+ new documents per year
• Document size: ~800 bytes
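These figures allow a rough estimate of raw data volume. The sketch below multiplies document count by average document size and divides by the 12 shards listed on the server configuration slide. It is an assumption-laden estimate: it ignores indexes, storage padding, and replication, and it assumes documents are spread evenly across shards.

```python
TB = 10 ** 12  # decimal terabyte


def raw_data_bytes(doc_count, avg_doc_bytes):
    # Raw payload only: no indexes, padding, or replica copies.
    return doc_count * avg_doc_bytes


total = raw_data_bytes(13_600_000_000, 800)
per_shard = total / 12  # assumes an even spread over the 12 shards
yearly_growth = raw_data_bytes(1_500_000_000, 800)

print(f"total raw data: {total / TB:.2f} TB")
print(f"per shard:      {per_shard / TB:.2f} TB")
print(f"yearly growth:  {yearly_growth / TB:.2f} TB")
```

That works out to about 10.88 TB of raw documents, a little over 0.9 TB per shard, and about 1.2 TB of new raw data per year.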

Page 23: Building a Scalable and Modern Infrastructure at CARFAX

VHR Uses 200+ Documents With Embedded Keys

Page 24: Building a Scalable and Modern Infrastructure at CARFAX

High Availability

Reads

Page 25: Building a Scalable and Modern Infrastructure at CARFAX

Millions of Reports per Day

AUTOMATE AUTOMATE AUTOMATE

Page 26: Building a Scalable and Modern Infrastructure at CARFAX

Read Scalability With Tagging

Page 27: Building a Scalable and Modern Infrastructure at CARFAX

Each Data Center is Tagged

Each Replica Set is Tagged
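The slides do not list the actual tags, so the following is a hedged illustration of how tag-based read routing works in general, not CARFAX's configuration. It models, in plain Python, the matching rule MongoDB applies to read-preference tag sets: tag sets are tried in order, a member matches when it carries every key/value pair in the tag set, and an empty tag set matches any member. The host names and the `dc` and `use` tag keys are hypothetical.

```python
def matches(member_tags, tag_set):
    # A member matches when it has every key/value pair in the tag set.
    # An empty tag set therefore matches every member.
    return all(member_tags.get(k) == v for k, v in tag_set.items())


def select_members(members, tag_sets):
    # Try each tag set in order; the first one that matches any member wins.
    for tag_set in tag_sets:
        eligible = [m for m in members if matches(m["tags"], tag_set)]
        if eligible:
            return eligible
    return []


# Hypothetical replica set members tagged by data center and purpose.
members = [
    {"host": "db1.example:27017", "tags": {"dc": "east", "use": "reports"}},
    {"host": "db2.example:27017", "tags": {"dc": "east", "use": "loading"}},
    {"host": "db3.example:27017", "tags": {"dc": "west", "use": "reports"}},
]

# Prefer report-serving members in the local data center, then any member.
local_first = [{"dc": "east", "use": "reports"}, {}]
print([m["host"] for m in select_members(members, local_first)])
```

Ordering the tag sets from most to least specific lets an application read from nearby, purpose-tagged members when they are available and still fall back to any member when they are not.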

Page 28: Building a Scalable and Modern Infrastructure at CARFAX

5X More Reports per Second

Page 29: Building a Scalable and Modern Infrastructure at CARFAX

But we can do More!

Page 30: Building a Scalable and Modern Infrastructure at CARFAX

Let’s Wrap It Up
• Don’t buy a used car without a CARFAX report
• Grok your data and working set
• Architect for your load volume
• Scale your reads to meet demand

Page 31: Building a Scalable and Modern Infrastructure at CARFAX

Keys To Success
• AUTOMATE EVERYTHING
• Test Many Configurations
• Grid Computing is Awesome
• Shard Early, Shard Often

Page 32: Building a Scalable and Modern Infrastructure at CARFAX

And Remember

Page 33: Building a Scalable and Modern Infrastructure at CARFAX

Friends Don’t Let Friends Use Default Ulimits!
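As a rough illustration of the point, the sketch below uses Python's standard `resource` module (Unix only) to compare the current process's soft limits for open files and processes against a threshold. The 64000 threshold follows the commonly cited MongoDB guidance for these two limits; treat it as an assumption and check the recommendation for your own version. The function names are illustrative.

```python
import resource

# Assumed threshold; verify against the guidance for your MongoDB version.
RECOMMENDED = 64000


def below_recommended(soft_limit, recommended=RECOMMENDED):
    # An unlimited soft limit is never too low.
    if soft_limit == resource.RLIM_INFINITY:
        return False
    return soft_limit < recommended


def check_limits():
    # Map each limit name to True when its soft limit is below the threshold.
    findings = {}
    for name in ("RLIMIT_NOFILE", "RLIMIT_NPROC"):
        limit_id = getattr(resource, name, None)
        if limit_id is None:
            continue  # limit not defined on this platform
        soft, _hard = resource.getrlimit(limit_id)
        findings[name] = below_recommended(soft)
    return findings


for name, too_low in check_limits().items():
    print(f"{name}: {'below' if too_low else 'at or above'} {RECOMMENDED}")
```

This only reports the limits of the process running the script; the limits that matter are those in effect for the database process itself.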

Page 34: Building a Scalable and Modern Infrastructure at CARFAX

Thank You!

The migration was a success due to the incredible teams at CARFAX and MongoDB.

We are always looking for great people to join us.

www.carfax.com/careers