JudCon 2010 Berlin: Practical Enterprise Java Performance Tuning
Tuning Linux for your database OSDC Berlin 2016
Transcript of Tuning Linux for your database OSDC Berlin 2016
Tuning Linux for your database
Colin Charles, Team MariaDB, MariaDB [email protected] / [email protected]
http://bytebot.net/blog/ | @bytebot on TwitterOSDC, Berlin, Germany
28 April 2016
1
whoami• Work on MariaDB at MariaDB Corporation
(SkySQL Ab)
• Merged with Monty Program Ab, makers of MariaDB
• Formerly MySQL AB (exit: Sun Microsystems)
• Past lives include Fedora Project (FESCO), OpenOffice.org
• MySQL Community Contributor of the Year Award winner 2014
2
Operating System
• Focus of this talk & tools: Linux
• But distributions make a difference too
• like RHEL6 has tuned but Ubuntu doesn’t
• Distribution versions can make a difference too
• RHEL5: ktune; RHEL6: tuned; RHEL7: tuned throughput-performance
4
Servers
• When you buy servers... do you think in terms of CPU, then memory, then I/O?
• Database servers clearly love I/O more than memory more than CPU
• queries! scans of large tables? index scans with random I/O? BI?
• Durability requires each transaction commit to be flushed to disk - i.e. call fsync()
5
I/O
• What kind of storage do you use?
• SAS or SATA?
• Ethernet (NAS, DRBD)
• SSD
• Fusion IO (NVMFS)
6
Schedulers (I/O Elevators)
• cfq: default, great for slower storage (SATA)
• noop: low latency storage (SSD)
• deadline: multi-process apps
• Most database workloads benefit from the deadline scheduler (goal is after all to minimise seeks, prioritise process I/O)
7
Elevators continued
• Boot time: elevator=deadline
•echo “deadline” > /sys/class/block/sdaN/queue/scheduler
• http://dimitrik.free.fr/blog/archives/2012/01/mysql-performance-linux-io.html
• http://www.mysqlperformanceblog.com/2009/01/30/linux-schedulers-in-tpcc-like-benchmark/
8
Sample elevator results
9
1 thread tps
1 thread ioutil%
8 threads tps
8 threads ioutil%
deadline* 15000 <1% 19100 <1%
deadline 6850 32% 15250 45%
cfq 5880 45% 1240 94%
File systems
• Choices: ext4, XFS
• mount options:
• ext4: rw, nosuid, noatime, data=ordered, nobarrier
• xfs: nobarrier
• Turn off I/O barriers
• http://lwn.net/Articles/283161/
• Separate files (data/logs/undo/etc.) or not?
• Don’t forget flash backed write caches or battery backed write cache on RAID cards (way more fsync/s)
• iostat -dmx <interval> is your friend
10
SSD & Flash• SSD
• much more IOPS than traditional disks
• lower latency
• make sure you have a new RAID controller
• Flash
• more IOPS than SSD
• very low latency
• InnoDB doublewrite buffer -- MariaDB 10.0, Percona Server 5.6 (Fusion IO), MySQL 5.7
11
RAID
• RAID0 - fast writes, reads, no redundancy
• RAID1 - slower writes, fast reads
• RAID5 - slow for random writes, fast for sequential writes, fast reads too but slow recovery
• RAID10 - fast reads & writes
12
LVM
• Why?
• easily expand disk
• snapshot backups
• Why not?
• 2-3% performance penalty
• Snapshot penalties
13
Memory• The more RAM the better -> reduces I/O requirements
• Use ECC RAM
• NUMA - “swap insanity”
• http://blog.jcole.us/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/
•numactl --interleave=all mysqld &
• Twitter patch/Percona Server 5.6
• numa_interleave option
• innodb_buffer_pool_populate - NUMA decisions to be made when buffer cache is clean
14
Why does MySQL need memory?
• Per-session buffers
• sort buffer, temp tables
• Metadata/locking
• index statistics, table definitions
• Caching data
• decrease reads: faster response time
• decrease random writes: queues write in cache; perform more sequential writes
15
Where is memory used?
• filesystem cache
• binary / relay logs
• MyISAM data relies on filesystem cache
• MySQL cache:
• innodb_buffer_pool_size (pages), MyISAM’s key_buffer_size (indexes)
• Per session buffers
•sort_buffer_size, join_buffer_size, read_buffer_size, tmp_table_size
•free -m / SHOW ENGINE INNODB STATUS
16
Transparent Huge pages
• 2M (4M even) pages vs 4K standard Linux page
•vm.nr_hugepages=8192
• MySQL can benefit from transparent huge pages, MongoDB doesn’t (neither does Redis)
17
Swappiness• Controls how aggressively the system reclaims
“mapped” memory
• default: 60, vm.swappiness=0 is ok (but not for newer kernels!)
• decrease: aggressive reclaiming of unmapped pagecache memory
• increase: aggressive swapping of mapped memory
• 3.5 kernels change behaviour (and this was backported to RHEL 2.6.32-303)
18
CPU
• default is ondemand, you should turn it to performance generally
•echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
• disable powersave mode
• cpufrequtils is where it’s at
• look at BIOS: turbo mode/C-state/OS power control
19
Network
• Tools: tc, dropwatch
• Use gigabit Ethernet (more bandwidth/throughput, less latency)
• Can also trunk/bond for HA
• net.ipv4.tcp_no_metrics_save=1
20
Network II
• ARP filter - prevents ARP flux
•echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
• MTU size at 9,000 - “jumbo frames”
• sar -n DEV
21
Network III• allow more connections to queue up: echo 4096 > /proc/sys/net/ipv4/tcp_max_syn_backlog
• /etc/sysctl.conf - net.ipv4.tcp_max_syn_backlog = 4096
• increase kernel packet buffer: net.core.netdev_max_backlog = 4096
• increase socket connection wait queue: net.core.somaxconn = 4096
• Reduce TCP timeout that comes after closing socket (default 60s): net.ipv4.tcp_fin_timeout = 30
22
Network IV
• Fully synchronous replication, ala, Galera Cluster? (NDBCluster too)
• Fastest as your slowest node
• evs.send_window + evs.user_send_window (something large e.g. 512)
23
Resource Limits• ulimits / /etc/security/limits.conf - they need to be
high/sensible over the defaults
• -f (file size): unlimited
• -t (cpu time): unlimited
• -v (virtmem): unlimited
• -n (open files): 64000
• -m (memsize): unlimited
• -u (processes/threads): 32000
24
Tune that database
• Focus on database design - find slow, crappy queries and fix them (eg. mysql: pt-query-digest)
• Reduce locking or time waiting - always benchmark your application continually
• Don’t forget /etc/my.cnf (and appropriate friends)
25
KVM/Xen/Virtualization
• Most of your raw database performance comes from being hardware-centric
• KVM at device level you can turn off caching
• Native AIO used over threaded nowadays
• VMWare: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1008542
26
Containers
• Kubernetes - vitess.io rolls out w/o issue, CoreOS, Docker
• Remember when benchmarking, you have to look at local vs remote benchmarking (networking can add overhead)
• 1 thread (up to 25% difference), but average seems to be 15% loss for r/w workloads; remotely? 32% is possible
27
EC2
• EC2 - instance limitations
• EBS - unpredictable I/O performance, use RAID10 or RAID0
• RDS - similar performance between EBS RAID10 MySQL & RDS
• choice of MySQL 5.7 (and older) or MariaDB Server 10.0 or PostgreSQL
28
Why MariaDB/Percona Server
• Threadpool
• NUMA interleaving
• Numerous performance fixes in XtraDB like fast InnoDB restarts, parallel doublewrite buffer, etc.
• Can run (supported) MariaDB Galera Cluster/Percona XtraDB Cluster
29
Hardware overall
• Test everything
• Sometimes NIC’s are a bad batch or the driver is buggy (RHEL5 + some Broadcom NICs drop packets under high load - bnx2 before 2.6.33)
• Sometimes you get a bad batch of disk
30
Hadoop/Hbase
• running this in EC2 is problematic
• make sure name resolution works
• enable bonding
• networking: hadoop uses big buffers (64MB blocks by default)
• net.ipv4.tcp_rmem = 32768 436600 4194304
• net.ipv4.tcp_wmem = 32768 436600 4194304
• http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.3/bk_cluster-planning-guide/content/ch_hardware-recommendations.html
31
Other databases
• MongoDB: http://docs.mongodb.org/manual/administration/production-notes/
• PostgreSQL: http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
• Neo4j: http://docs.neo4j.org/chunked/stable/configuration-linux-notes.html
• Riak: http://docs.basho.com/riak/1.4.0/cookbooks/Linux-Performance-Tuning/
32
SELinux, firewalls, security
• The easy way is to “disable SELinux”
• The reality is you should learn it
• Its just like a firewall that you really should configure
33
Tools• ps
• vmstat 1
• iostat
• top
• free
• sar (sar -n DEV 1)
• gdb
• tcpdump
• strace
• oprofile
• htop
• MySQL only: Percona Toolkit (pt-diskstats, pt-summary, pt-query-digest, etc.)
34
Benchmarking• sysbench
• OLTP test, use tables with 20M rows and 20M transactions, check 1-128 threads/run
• LinkBench
• Yahoo! Cloud Serving Benchmark
• https://github.com/brianfrankcooper/YCSB
• Google’s PerfKit Benchmarker
• https://github.com/GoogleCloudPlatform/PerfKitBenchmarker
35
Resources
• http://doc.opensuse.org/products/draft/SLES/SLES-tuning_sd_draft/book.sle.tuning.html
• https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html-single/Performance_Tuning_Guide/index.html
• https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html-single/Virtualization_Tuning_and_Optimization_Guide/index.html
37
A word from your sponsors• MariaDB Corporation sponsored my T&E to
come to OSDC
• Remember you can buy a MariaDB Enterprise subscription to help you with your cloud deployments, MariaDB Server usage, etc.
• Services include: 24x7 support, consulting, training, remote DBA services, non-recurring engineering and more for all variants of MySQL
Thanks/Q&AColin Charles, [email protected] | [email protected]
http://bytebot.net/blog/ | @bytebotslides: slideshare.net/bytebot/
Download MariaDB and give it a try: http://mariadb.org/
39