A Year with Cinder and Ceph at TWC


Photo by Navin: https://flic.kr/p/7vSNe7

By Craig DeLatte & Bryan Stillwell - May 20, 2015

What we will cover

• All views are from a systems admin perspective
• Cinder
  – Evaluating storage
  – Traditional vs. “grid” type storage
• Ceph
• Adding more backends


Our Criteria for Evaluating Storage


• Must have passed the base cinder matrix for release
• Must have open API access to allow for monitoring and gathering statistics (see the example below)
• Must support nova’s live-migration
• Ideally, support a rack-anywhere methodology
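
As an illustration of the "open API access" criterion: Ceph, the backend we eventually chose, exposes cluster state through its CLI in machine-readable form, which is easy to feed into a monitoring system. A minimal sketch (the polling approach is an assumption, not something from the slides):

# Health and capacity as JSON, suitable for a monitoring poller
ceph status --format json
ceph df --format json

# Per-OSD commit/apply latency statistics
ceph osd perf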

What Led Us to Using Ceph

• Supports live-migration
• Ability to use x86 architecture instead of vendor-specific hardware


Our First Ceph Design

• What led to our design and where it went wrong
  – In our environment (and maybe yours), customers only plan for capacity; they assume unlimited performance


First OpenStack Deployment

• Live-migration testing (see the example below)
• Ceph to the rescue
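
A sketch of the kind of test this refers to: with both ephemeral disks and Cinder volumes on shared Ceph/RBD storage, a plain nova live-migration should move a running instance without block migration. The instance and host names are placeholders:

# Move a running instance to another hypervisor; RBD-backed storage is shared,
# so no block migration is needed
nova live-migration <instance-uuid> <target-compute-host>

# Watch the instance's status and host while it moves
nova show <instance-uuid> | grep -E 'status|OS-EXT-SRV-ATTR:host'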


Early Life with Ceph

• Out-of-family upgrades require a leap of faith
• Be prepared to scare your co-workers
• How your first production upgrade will feel


Initial Ceph Cluster

OSDs: 60
Journal Ratio: 5:1
Drive Size: 1TB
Raw Capacity: 60TB
Usable Capacity: 20TB
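
The usable figure assumes the default 3x replication: usable capacity ≈ raw capacity / 3, so 60TB / 3 = 20TB here. The same ratio holds for every expansion below.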



First Expansion

Before:
OSDs: 60
Journal Ratio: 5:1
Drive Size: 1TB
Raw Capacity: 60TB
Usable Capacity: 20TB

After:
OSDs: 75
Journal Ratio: 5:1
Drive Size: 1.2TB
Raw Capacity: 90TB
Usable Capacity: 30TB


What Went Wrong

• Performance issues
  – Too high HDD:SSD ratio for journals
  – Not enough placement groups (PGs) (see the rule of thumb after this list)
• VMs lost sight of storage (libvirt)
• Legacy tunables
• VMs lost sight of storage again! (version mismatch)
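
For reference, the commonly cited rule of thumb for sizing placement groups (a guideline from the Ceph documentation, not our exact procedure):

total PGs per pool ≈ (number of OSDs × 100) / replica count, rounded up to the next power of two

For the initial 60 OSDs with 3 replicas that works out to roughly 2000, i.e. 2048, well above the 512 PGs/pool we started with.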


Corrections Made

• Ordered more SSDs to reduce HDD:SSD ratio
• Re-used mon IPs
• Placement groups went from 512 PGs/pool to 4096 PGs/pool
• Tunables switched to ‘firefly’
• Need to make sure ‘ALL’ systems are upgraded to the new version

ceph osd set nobackfill
ceph osd set noscrub
ceph osd set nodeep-scrub

osd max backfills = 1
osd recovery max active = 1
osd recovery op priority = 1
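
A sketch of how these pieces fit together (the sequencing here is an assumption, not taken from the slides): the flags are set before an expansion or maintenance window and must be unset afterwards, while the three throttling options above live under the [osd] section of ceph.conf.

# before the expansion / maintenance window
ceph osd set nobackfill
ceph osd set noscrub
ceph osd set nodeep-scrub

# ...add/replace hardware...

# afterwards, or backfill and scrubbing never resume
ceph osd unset nobackfill
ceph osd unset noscrub
ceph osd unset nodeep-scrub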



Second Expansion

Before:
OSDs: 75
Journal Ratio: 5:1
Drive Size: 1.2TB
Raw Capacity: 90TB
Usable Capacity: 30TB

After:
OSDs: 189
Journal Ratio: 3:1
Drive Size: 1.2TB
Raw Capacity: 226.8TB
Usable Capacity: 75.6TB


What Went Wrong

• More performance problems during expansion
• Unintentional upgrades (giant)
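
One way to guard against an accidental jump to a new major release like giant (an illustration on our part, not something from the slides) is to pin the Ceph packages to the series you intend to run; on Ubuntu that could look like:

# /etc/apt/preferences.d/ceph.pref (version is illustrative; pin librbd1/librados2 too)
Package: ceph*
Pin: version 0.80.*
Pin-Priority: 1001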


Corrections Made

• Decided we needed dedicated mon nodes
• Added a couple more options to improve performance
• Started work on replacing ceph-deploy with puppet-ceph

osd max backfills = 1
osd recovery max active = 1
osd recovery op priority = 1
osd recovery max single start = 1
osd op threads = 12
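
These settings can also be pushed to a running cluster without restarting daemons; a sketch using injectargs (some options, such as the thread count, may still require an OSD restart to take effect):

ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
ceph tell osd.* injectargs '--osd-recovery-op-priority 1 --osd-recovery-max-single-start 1'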



Third Expansion

Before:
OSDs: 189
Journal Ratio: 3:1
Drive Size: 1.2TB
Raw Capacity: 226.8TB
Usable Capacity: 75.6TB

After:
OSDs: 297
Journal Ratio: 3:1
Drive Size: 1.2TB
Raw Capacity: 356.4TB
Usable Capacity: 118.8TB


What Went Wrong

• Performance problems when adding OSDs
• Started removing OSDs before the data was off them
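
For reference, the drain-then-remove order (the standard procedure from the Ceph documentation, sketched here with a placeholder OSD id) takes the data off an OSD before it is removed:

# drain the OSD and wait for backfill to finish (watch ceph -w / ceph health)
ceph osd crush reweight osd.42 0

# only then take it out and remove it
ceph osd out 42
service ceph stop osd.42      # exact stop command depends on distro/init system
ceph osd crush remove osd.42
ceph auth del osd.42
ceph osd rm 42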


Corrections Made

• Work started on replacing ceph-deploy with puppet-ceph
• Added option to bring in new OSDs with a weight of 0

osd max backfills = 1
osd recovery max active = 1
osd recovery op priority = 1
osd recovery max single start = 1
osd op threads = 12
osd crush initial weight = 0
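
With osd crush initial weight = 0, new OSDs join the CRUSH map without receiving data until they are deliberately ramped up. A sketch of ramping one in (the id and step values are illustrative; full CRUSH weight is roughly the drive size in TiB, about 1.09 for a 1.2TB drive):

ceph osd crush reweight osd.120 0.25
# wait for the cluster to settle, then keep stepping up
ceph osd crush reweight osd.120 0.5
ceph osd crush reweight osd.120 1.09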



Fourth Expansion (most recent)

Before:
OSDs: 297
Journal Ratio: 3:1
Drive Size: 1.2TB
Raw Capacity: 356.4TB
Usable Capacity: 118.8TB

After:
OSDs: 306
Journal Ratio: 3:1
Drive Size: 1.2TB
Raw Capacity: 367.2TB
Usable Capacity: 122.4TB


Dedicated Mon Nodes

YES! Finally!


Multiple Ceph Clusters

• 2 production
• 2 staging
• 2 lab
• Virtual clusters for each member of the team
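
Running several clusters from the same tooling is mostly a naming exercise (the cluster names here are illustrative): each cluster gets its own /etc/ceph/<name>.conf and keyrings, and the CLI selects one with --cluster.

ceph --cluster prod status
ceph --cluster staging status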


The Next Cinder Hurdle

• Going from a single backend to multi-backend
• Backend naming needs to be planned in advance
• Lab testing will not reveal every issue you will hit in production
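
A minimal multi-backend cinder.conf sketch for two RBD backends (section names, pool names, and file paths are illustrative); the value of volume_backend_name is what volume types reference, which is why naming needs planning:

[DEFAULT]
enabled_backends = ceph-prod-1,ceph-prod-2

[ceph-prod-1]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = ceph-prod-1
rbd_pool = volumes
rbd_ceph_conf = /etc/ceph/prod1.conf
rbd_user = cinder
rbd_secret_uuid = <libvirt secret uuid>

[ceph-prod-2]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = ceph-prod-2
rbd_pool = volumes
rbd_ceph_conf = /etc/ceph/prod2.conf
rbd_user = cinder
rbd_secret_uuid = <libvirt secret uuid>

Volume types then map user-visible choices onto backends:

cinder type-create ceph-prod-1
cinder type-key ceph-prod-1 set volume_backend_name=ceph-prod-1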


Looking Forward

• New storage tiers (Performance-SSD, Capacity-HDD)
• Emerging drive technologies
• Newstore
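
As an illustration of what an SSD tier could look like (assuming a separate 'ssd' root already exists in the CRUSH map; names and PG counts are placeholders), the tier is just a pool with its own CRUSH rule that a Cinder backend or volume type then points at:

ceph osd crush rule create-simple ssd-rule ssd host
ceph osd pool create volumes-ssd 2048 2048
# attach the rule; the ruleset id comes from 'ceph osd crush rule dump ssd-rule'
ceph osd pool set volumes-ssd crush_ruleset <ruleset-id>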


Takeaways

• Don't start small if you're going big

• Order the right number and type of SSDs

• Determine the right number of PGs early

• Dedicated mon nodes (fsync)

• Be careful with mon nodes in OpenStack

• Ceph upgrades (don't forget the compute nodes)
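
For the last point, a simple check worth running on every compute node before and after a Ceph upgrade (package names shown are the Debian/Ubuntu ones): the RBD client libraries used by qemu/libvirt must be upgraded along with the cluster.

ceph --version
dpkg -l ceph-common librbd1 librados2 | grep ^ii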


What Made It Worth the Effort

• We are no longer locked into vendor-specific hardware
• Scaling across racks, rows, and rooms
• Nasty data migrations are a thing of the past
• It allows us to future-proof our data against EOL hardware support
• We have a say!
  – The Ceph working session is today at 11:50 in room 217


Questions or Comments

• Email: [email protected]

• irc: cdelatte

• Email: [email protected]

• irc: bstillwell


More TWC Talks

Wednesday, May 20th
9:50a  - Getting DNSaaS to Production with Designate
11:00a - Growing OpenStack at Time Warner Cable
11:50a - Changing Culture at Time Warner Cable
1:50p  - Neutron in the Real World - TWC Implementation and Evolution

Thursday, May 21st
2:20p  - Real World Experiences with Upgrading OpenStack at Time Warner Cable
