Designing for High Performance Ceph at Scale
Transcript of Designing for High Performance Ceph at Scale
![Page 1: Designing for High Performance Ceph at Scale](https://reader033.fdocuments.in/reader033/viewer/2022051706/58e74a0b1a28ab91558b4e0d/html5/thumbnails/1.jpg)
Designing for High Performance Ceph at Scale
April 26, 2016
James Saint-Rossy - Principal Storage Engineer, Comcast
John Benton - Consulting Systems Engineer, WWT
![Page 2: Designing for High Performance Ceph at Scale](https://reader033.fdocuments.in/reader033/viewer/2022051706/58e74a0b1a28ab91558b4e0d/html5/thumbnails/2.jpg)
Today’s Agenda
• Our Lab/Production Environment
• Holistic Architecture
• Strategies for Benchmarking
• Performance Bottlenecks/Lessons Learned
• Tuning Tips and Tricks
![Page 3: Designing for High Performance Ceph at Scale](https://reader033.fdocuments.in/reader033/viewer/2022051706/58e74a0b1a28ab91558b4e0d/html5/thumbnails/3.jpg)
Our Typical Node Configuration
Storage Node
• 72 x 6 TB SATA 7.2K HDDs
• 3 x 1.6 TB PCIe NVMe devices (journals)
• 2 x Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (12 cores)
• 256 GB of RAM
• Dual-port 40GbE NIC
Mon/RGW Node
• 2 x Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
• 32 GB of RAM
• Dual-port 10GbE NIC
• ...Nothing special
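
A quick back-of-the-envelope check of what the storage node spec implies. This is only a sketch: it assumes one OSD per HDD and journals spread evenly across the NVMe devices, neither of which the slide states explicitly.

```python
# Figures taken from the storage node spec above.
HDD_COUNT = 72
HDD_TB = 6
NVME_COUNT = 3

# Assumption: one OSD per HDD, journals divided evenly across the NVMe devices.
raw_tb_per_node = HDD_COUNT * HDD_TB              # raw capacity, before replication
journals_per_nvme = HDD_COUNT // NVME_COUNT       # OSD journals sharing one NVMe
tb_behind_one_nvme = journals_per_nvme * HDD_TB   # raw capacity tied to one NVMe

print(f"raw capacity per node:   {raw_tb_per_node} TB")
print(f"journals per NVMe:       {journals_per_nvme}")
print(f"raw TB behind each NVMe: {tb_behind_one_nvme} TB")
```

Under those assumptions, each journal device sits in front of a third of the node's disks, which is worth keeping in mind when thinking about failure domains.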
![Page 4: Designing for High Performance Ceph at Scale](https://reader033.fdocuments.in/reader033/viewer/2022051706/58e74a0b1a28ab91558b4e0d/html5/thumbnails/4.jpg)
Lab/Production Environment Layout
![Page 5: Designing for High Performance Ceph at Scale](https://reader033.fdocuments.in/reader033/viewer/2022051706/58e74a0b1a28ab91558b4e0d/html5/thumbnails/5.jpg)
Holistic Architecture
Customer Requirements
- IOPS/Read-Write Mix/Object Size …
- How Much Replication
- Which APIs
Cost
- HW Cost/Support Cost/Operational Cost?
Failure Domain
- Servers/Racks/Rows, etc.
Data Center Constraints
- Space/Power/Thermal
Operational Complexity
- Complex Hardware Configs
![Page 6: Designing for High Performance Ceph at Scale](https://reader033.fdocuments.in/reader033/viewer/2022051706/58e74a0b1a28ab91558b4e0d/html5/thumbnails/6.jpg)
Holistic Architecture Cont’d
Journals
- Colocated?
- SSD vs NVMe?
![Page 7: Designing for High Performance Ceph at Scale](https://reader033.fdocuments.in/reader033/viewer/2022051706/58e74a0b1a28ab91558b4e0d/html5/thumbnails/7.jpg)
Strategies for Benchmarking
Tools
- fio for block
- COSBench for object
IOPS Isn’t Everything
- 1000 workers may give you 30% more IOPS, but at the cost of 600% higher latency
Verify Published Stats With Benchmarks
- … Always
Verify Scale-Out
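
The IOPS-versus-latency trade-off above can be sanity-checked with Little's Law (in-flight requests = throughput x latency). The sketch below assumes a closed-loop benchmark in which each worker keeps one I/O outstanding, and reads "600% higher latency" as 7x the baseline; both are interpretations, not figures from the talk.

```python
def required_concurrency_ratio(iops_gain: float, latency_increase: float) -> float:
    """Little's Law: in_flight = throughput * latency.

    Returns the factor by which in-flight I/O must grow to see the given
    fractional IOPS gain together with the given fractional latency increase.
    """
    return (1.0 + iops_gain) * (1.0 + latency_increase)


# +30% IOPS and +600% latency, as quoted on the slide.
ratio = required_concurrency_ratio(0.30, 6.00)

# If the heavy run used 1000 workers, this is the implied baseline worker count.
baseline_workers = 1000 / ratio

print(f"in-flight I/O grew about {ratio:.1f}x")
print(f"implied baseline: about {baseline_workers:.0f} workers")
```

Read this way, roughly nine times the concurrency bought only 30% more throughput, which is why latency should be reported alongside IOPS for every run.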
![Page 8: Designing for High Performance Ceph at Scale](https://reader033.fdocuments.in/reader033/viewer/2022051706/58e74a0b1a28ab91558b4e0d/html5/thumbnails/8.jpg)
Performance - TCMalloc
• As cluster size increased, %SYS was increasingly taxed
• System profiling revealed up to 50% of CPU resources used by TCMalloc
• This library can be tuned to have more memory. This was good for nearly a 50% increase
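
The slide does not name the setting that was changed. One knob gperftools exposes for giving TCMalloc more memory is the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES environment variable; the sketch below assumes that is the kind of tuning meant, and the 128 MiB value is purely illustrative.

```python
import os

# gperftools environment variable controlling the total thread-cache size.
# Assumption: this is the knob being referred to; the slide does not say.
THREAD_CACHE_VAR = "TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES"


def tcmalloc_env(cache_mib: int) -> dict:
    """Return a copy of the current environment with the thread cache size set."""
    env = dict(os.environ)
    env[THREAD_CACHE_VAR] = str(cache_mib * 1024 * 1024)
    return env


env = tcmalloc_env(128)  # illustrative value, not a recommendation from the talk
print(f"{THREAD_CACHE_VAR}={env[THREAD_CACHE_VAR]}")
```

The variable has to be present in the environment of the OSD process before it starts, so the printed line would typically go in whatever environment file the service manager reads for Ceph daemons.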
![Page 9: Designing for High Performance Ceph at Scale](https://reader033.fdocuments.in/reader033/viewer/2022051706/58e74a0b1a28ab91558b4e0d/html5/thumbnails/9.jpg)
Modern PC Architecture
![Page 10: Designing for High Performance Ceph at Scale](https://reader033.fdocuments.in/reader033/viewer/2022051706/58e74a0b1a28ab91558b4e0d/html5/thumbnails/10.jpg)
Performance - Inter-node data flow
![Page 11: Designing for High Performance Ceph at Scale](https://reader033.fdocuments.in/reader033/viewer/2022051706/58e74a0b1a28ab91558b4e0d/html5/thumbnails/11.jpg)
OSD Data Workflow
"complicated situation" by bandinisonfire is licensed under CC BY-NC-SA 2.0
![Page 12: Designing for High Performance Ceph at Scale](https://reader033.fdocuments.in/reader033/viewer/2022051706/58e74a0b1a28ab91558b4e0d/html5/thumbnails/12.jpg)
Performance - NUMA
• The bigger and faster the data node, the bigger the bottleneck potential
• We tuned several areas to avoid unnecessary trips across the QPI bus
• To map everything you must:
  • Map CPU cores to sockets
  • Map PCIe devices to sockets
  • Map storage disks (and journals) to the associated HBA
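
The three mappings above can be read from Linux sysfs. The following is a minimal sketch, not the tooling used in the talk: the function names are made up for illustration, and it assumes the standard sysfs layout (`devices/system/node/nodeN/cpulist`, `bus/pci/devices/*/numa_node`, and `block/*` symlinks that resolve through the controller's PCI address).

```python
from __future__ import annotations

import os
import re
from pathlib import Path

PCI_ADDR = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]")


def parse_cpulist(text: str) -> list[int]:
    """Expand a kernel cpulist string such as '0-11,24-35' into CPU ids."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def cpus_by_node(sysfs: Path = Path("/sys")) -> dict[int, list[int]]:
    """Map each NUMA node (socket) to its CPU ids."""
    nodes = {}
    for node_dir in sorted((sysfs / "devices/system/node").glob("node[0-9]*")):
        node_id = int(node_dir.name[len("node"):])
        nodes[node_id] = parse_cpulist((node_dir / "cpulist").read_text())
    return nodes


def pci_device_nodes(sysfs: Path = Path("/sys")) -> dict[str, int]:
    """Map each PCI address to its NUMA node (-1 means the kernel does not know)."""
    result = {}
    for dev in sorted((sysfs / "bus/pci/devices").glob("*")):
        numa_file = dev / "numa_node"
        if numa_file.exists():
            result[dev.name] = int(numa_file.read_text().strip())
    return result


def controller_from_path(path: str) -> str | None:
    """Return the last PCI address in a resolved sysfs path (the HBA or NVMe controller)."""
    matches = [p for p in path.split("/") if PCI_ADDR.fullmatch(p)]
    return matches[-1] if matches else None


def disk_controllers(sysfs: Path = Path("/sys")) -> dict[str, str | None]:
    """Map each block device to the PCI address of the controller it hangs off."""
    return {
        dev.name: controller_from_path(os.path.realpath(dev))
        for dev in sorted((sysfs / "block").glob("*"))
    }


if __name__ == "__main__":
    pci = pci_device_nodes()
    print("CPUs by node:", cpus_by_node())
    for disk, ctrl in disk_controllers().items():
        print(disk, ctrl, pci.get(ctrl, -1) if ctrl else -1)
```

Joining the last two mappings gives the NUMA node for every data disk and journal device, which is the input the later pinning steps need.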
![Page 13: Designing for High Performance Ceph at Scale](https://reader033.fdocuments.in/reader033/viewer/2022051706/58e74a0b1a28ab91558b4e0d/html5/thumbnails/13.jpg)
NUMA - IRQs
Pin all soft IRQs for each I/O device to its associated NUMA node
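
A minimal sketch of how such pinning could be planned and applied. It sets hardware IRQ affinity through `/proc/irq/<n>/smp_affinity_list`, on the assumption that softirq processing then runs on the CPU that took the interrupt. The IRQ numbers, PCI addresses, and CPU lists in the example are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def plan_irq_affinity(
    irq_to_device: dict[int, str],
    device_to_node: dict[str, int],
    node_to_cpus: dict[int, list[int]],
) -> dict[int, str]:
    """Return {irq: cpulist} restricting each IRQ to the CPUs local to its device."""
    plan = {}
    for irq, device in irq_to_device.items():
        node = device_to_node.get(device, -1)
        cpus = node_to_cpus.get(node)
        if not cpus:
            continue  # unknown locality: leave this IRQ untouched
        plan[irq] = ",".join(str(c) for c in cpus)
    return plan


def apply_plan(plan: dict[int, str], procfs: Path = Path("/proc")) -> None:
    """Write the plan to the kernel. Requires root."""
    for irq, cpulist in plan.items():
        (procfs / "irq" / str(irq) / "smp_affinity_list").write_text(cpulist)


# Hypothetical inputs: one NIC queue IRQ on node 1, one HBA IRQ on node 0.
example_plan = plan_irq_affinity(
    irq_to_device={120: "0000:81:00.0", 64: "0000:03:00.0"},
    device_to_node={"0000:81:00.0": 1, "0000:03:00.0": 0},
    node_to_cpus={0: [0, 1], 1: [12, 13]},
)
print(example_plan)
```

A balancing daemon such as irqbalance can overwrite these settings, so it usually needs to be stopped or configured to leave the pinned IRQs alone.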
![Page 14: Designing for High Performance Ceph at Scale](https://reader033.fdocuments.in/reader033/viewer/2022051706/58e74a0b1a28ab91558b4e0d/html5/thumbnails/14.jpg)
NUMA - Mount Points
Align mount points so that the OSD and journal are on the same NUMA node
![Page 15: Designing for High Performance Ceph at Scale](https://reader033.fdocuments.in/reader033/viewer/2022051706/58e74a0b1a28ab91558b4e0d/html5/thumbnails/15.jpg)
NUMA - OSD Processes
Pin each OSD process to the NUMA node associated with the storage it controls
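
One way to express this, sketched under assumptions: wrap the OSD's start command with `numactl` so both CPU scheduling and memory allocation stay on the chosen node. The talk does not say how the pinning was done; the OSD ids and node assignments below are hypothetical, and the `ceph-osd -i <id> -f` invocation is only an illustrative command line.

```python
from __future__ import annotations


def numactl_command(node: int, command: list[str]) -> list[str]:
    """Prefix a command so it runs with CPUs and memory bound to one NUMA node."""
    return ["numactl", f"--cpunodebind={node}", f"--membind={node}", *command]


def osd_launch_plan(osd_to_node: dict[int, int]) -> dict[int, list[str]]:
    """Build one wrapped launch command per OSD id."""
    return {
        osd_id: numactl_command(node, ["ceph-osd", "-i", str(osd_id), "-f"])
        for osd_id, node in osd_to_node.items()
    }


# Hypothetical layout: OSD 0 has its disk and journal on node 0, OSD 36 on node 1.
plan = osd_launch_plan({0: 0, 36: 1})
for osd_id, argv in plan.items():
    print(osd_id, " ".join(argv))
```

For a process that is already running, Python's `os.sched_setaffinity(pid, cpus)` can restrict its CPUs, but it does not move memory that has already been allocated on the other node.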
![Page 16: Designing for High Performance Ceph at Scale](https://reader033.fdocuments.in/reader033/viewer/2022051706/58e74a0b1a28ab91558b4e0d/html5/thumbnails/16.jpg)
Performance - General Tips
• Use the latest vendor drivers
  - We have seen 30% improvements over stock drivers
• OS tuning focused on increasing threads, file handles, etc.
• Jumbo frames help, particularly on the cluster network
• Flow control issues with 40GbE network adapters
• Scan for failing (but perhaps not completely failed) disks
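
The slide does not say how failing disks were found. One hypothetical approach, sketched here, is to compare each disk's average I/O latency (for example, the await figure reported by iostat) against the median for the node and flag outliers. The threshold factor and sample values are made up for illustration.

```python
from __future__ import annotations

from statistics import median


def flag_slow_disks(latency_ms: dict[str, float], factor: float = 3.0) -> list[str]:
    """Return disks whose latency exceeds `factor` times the median of all disks."""
    if not latency_ms:
        return []
    baseline = median(latency_ms.values())
    if baseline <= 0:
        return []
    return sorted(d for d, v in latency_ms.items() if v > factor * baseline)


# Hypothetical per-disk average latencies in milliseconds.
sample = {"sda": 8.1, "sdb": 7.9, "sdc": 41.0, "sdd": 8.4}
print(flag_slow_disks(sample))
```

A disk that is slow but still answering drags down every placement group it belongs to, so a relative check like this can catch problems that a simple up/down health check misses.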
![Page 17: Designing for High Performance Ceph at Scale](https://reader033.fdocuments.in/reader033/viewer/2022051706/58e74a0b1a28ab91558b4e0d/html5/thumbnails/17.jpg)
"Question" by alphageek is licensed under CC BY-NC-SA 2.0
![Page 18: Designing for High Performance Ceph at Scale](https://reader033.fdocuments.in/reader033/viewer/2022051706/58e74a0b1a28ab91558b4e0d/html5/thumbnails/18.jpg)
![Page 19: Designing for High Performance Ceph at Scale](https://reader033.fdocuments.in/reader033/viewer/2022051706/58e74a0b1a28ab91558b4e0d/html5/thumbnails/19.jpg)
Performance - Mons
• Mons are generally a glorified TFTP server and you can get away with 1+2 for redundancy
• That is, until they aren’t...
• In certain situations, like cluster rebalancing or deleting a pool with a lot of PGs, a single CPU on *ALL* mons will become jammed up. They start evicting each other and mayhem ensues.
• How to fix this: