Binding (with processor_group_name) versus Caging: Facts, Observations and Customer cases

Bertrand Drouvot


Source: files.meetup.com/6732322/binding_caging.pdf


About Me

➢ Oracle DBA since 1999

➢ Working for the Accenture Enkitec Group

➢ OCP 9i,10g,11g

➢ RAC Certified Expert

➢ Exadata certified implementation specialist

➢ Blogger since 2012

➢ Oracle ACE

➢ @bertranddrouvot

➢ Basketball fan

Introduction (1/2)

➢ Nowadays it is very common to consolidate multiple databases on the same server.

➢ One may want to limit the CPU usage of some databases and/or guarantee CPU resources for others. I would like to compare two methods to achieve this:

➢ Instance Caging (available since 11.2).

➢ processor_group_name (available since 12.1).

Introduction (2/2)

➢ Facts: describing the features and giving a quick overview of how to implement them.

➢ Observations: what I observed on my lab server. You should observe most of them in your own environment as well.

➢ OEM and AWR views of Instance Caging and processor_group_name under CPU pressure.

➢ Customer cases: I cover all the cases I have faced so far.

Facts: Instance Caging

➢ Instance Caging: its purpose is to limit, or “cage”, the amount of CPU that a database instance can use at any time. It can be enabled online (no need to restart the database) in 2 steps:

➢ SQL> alter system set cpu_count = 2;

➢ SQL> alter system set resource_manager_plan = 'default_plan';

Facts: Caging Graphical View

Facts: Binding (1/3)

➢ processor_group_name: Bind a database instance to specific CPUs / NUMA nodes.

➢ On Linux x86-64, the named subset of CPUs is created through a Linux feature called control groups (cgroups).

➢ On Oracle Solaris 11 SRU 4, the named subset of CPUs is created through a feature called resource pools.

➢ On my Red Hat 6.5 machine, it is enabled in 4 steps:

1. Specify the CPUs or NUMA nodes by creating a “processor group”:

Facts: Binding (2/3)

Facts: Binding (3/3)

2. Start the cgconfig service.

➢ service cgconfig start

3. Set the Oracle parameter “processor_group_name” to the name of this processor group:

➢ SQL> alter system set processor_group_name='oracle' scope=spfile;

4. Restart the Database Instance.
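Step 1 (defining the processor group) was shown only as a screenshot. As a rough sketch, assuming the libcgroup `cgconfig.conf` syntax read by the cgconfig service on Red Hat 6, a group named `oracle` (the name used in step 3) and owner `oracle:dba`, the hypothetical helper below renders a group stanza with the two parameters discussed in the observations, `cpuset.cpus` and `cpuset.mems`. The `/cgroup/cpuset/oracle` path used later suggests the cpuset controller is mounted at `/cgroup/cpuset`; check your own `/etc/cgconfig.conf` and the Oracle documentation before relying on this layout.

```python
def render_cgconfig_group(name: str, cpus: str, mems: str,
                          uid: str = "oracle", gid: str = "dba") -> str:
    """Render a libcgroup cgconfig.conf group stanza (hypothetical helper).

    cpus and mems use the kernel cpuset list format, e.g. "1-9" or "0".
    uid/gid decide who may attach tasks to the group and administer it.
    """
    return (
        f"group {name} {{\n"
        f"    perm {{\n"
        f"        task {{ uid = {uid}; gid = {gid}; }}\n"
        f"        admin {{ uid = {uid}; gid = {gid}; }}\n"
        f"    }}\n"
        f"    cpuset {{\n"
        f'        cpuset.cpus = "{cpus}";\n'
        f'        cpuset.mems = "{mems}";\n'
        f"    }}\n"
        f"}}\n"
    )


if __name__ == "__main__":
    # The "local" configuration from the observations: CPUs 1-9, NUMA node 0.
    print(render_cgconfig_group("oracle", cpus="1-9", mems="0"))
```

Generating the stanza rather than editing it by hand keeps `cpuset.cpus` and `cpuset.mems` together, which matters given the NUMA locality results below.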

Binding Graphical View

Facts: Availability

➢ Instance Caging can be enabled online.

➢ For cgroups/processor_group_name, the database instance needs to be stopped and started.

Observations

➢ For brevity, “Caging” is used for “Instance Caging” and “Binding” for “cgroups/processor_group_name” usages.

➢ Also, “CPU” is used to refer to CPU threads, which are the actual granularity for caging and binding.

➢ The observations have been made on a 12.1.0.2 database using large pages (use_large_pages=only). I will not cover the “non large pages” case, as not using large pages is not an option for me.

Observations: binding, NUMA memory, CPUs locality & LIO performance (1/3)

➢ The SGA is allocated on the NUMA node(s) listed in the cpuset.mems parameter.

➢ The processes are bound to the CPUs listed in the cpuset.cpus parameter.
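Both parameters use the kernel's cpuset list format: comma-separated CPU or node numbers and inclusive ranges such as `1-9` or `0,2,4-7`. A small parser (a hypothetical helper, not part of Oracle or libcgroup) makes it easy to see exactly which CPUs and NUMA nodes a group covers; the stride syntax some newer kernels accept is deliberately not handled in this sketch.

```python
def parse_cpuset_list(spec: str) -> set[int]:
    """Expand a cpuset list such as "1-6" or "0,2,4-7" into a set of ints."""
    result: set[int] = set()
    for part in spec.strip().split(","):
        part = part.strip()
        if not part:
            continue  # tolerate an empty value or a trailing comma
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            if lo > hi:
                raise ValueError(f"invalid range: {part!r}")
            result.update(range(lo, hi + 1))  # cpuset ranges are inclusive
        else:
            result.add(int(part))
    return result


if __name__ == "__main__":
    print(sorted(parse_cpuset_list("1-9")))   # CPUs of the lab group
    print(len(parse_cpuset_list("11-20")))    # 10 CPUs
```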

Observations: binding, NUMA memory, CPUs locality & LIO performance (2/3)


➢ Local: cpuset.cpus="1-9" and cpuset.mems="0"

➢ Remote: cpuset.cpus="1-9" and cpuset.mems="7"

➢ Each test is based on the same SLOB configuration and workload: 8 readers and a fixed run time of 3 minutes.
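Whether a group is “local” depends on which NUMA node each CPU in cpuset.cpus belongs to; on Linux that mapping is exposed under `/sys/devices/system/node/node*/cpulist` (and by `lscpu` or `numactl --hardware`). The sketch below takes the mapping as a plain dictionary so it can run anywhere; the 8 nodes with 10 CPUs each are an assumed topology for illustration, not the lab server's actual layout.

```python
def numa_locality(cpus: set[int], mems: set[int],
                  cpu_to_node: dict[int, int]) -> str:
    """Classify a cpuset group as 'local', 'remote' or 'mixed'.

    local : the memory nodes are exactly the nodes hosting the group's CPUs
    remote: no memory node hosts any of the group's CPUs
    mixed : anything in between
    """
    cpu_nodes = {cpu_to_node[cpu] for cpu in cpus}
    if cpu_nodes == mems:
        return "local"
    if cpu_nodes.isdisjoint(mems):
        return "remote"
    return "mixed"


# Hypothetical topology: 8 NUMA nodes, node n owns CPUs 10n .. 10n+9.
TOPOLOGY = {cpu: cpu // 10 for cpu in range(80)}

if __name__ == "__main__":
    cpus = set(range(1, 10))                    # cpuset.cpus="1-9"
    print(numa_locality(cpus, {0}, TOPOLOGY))   # cpuset.mems="0" -> local
    print(numa_locality(cpus, {7}, TOPOLOGY))   # cpuset.mems="7" -> remote
```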

Observations: binding, NUMA memory, CPUs locality & LIO performance (3/3)

➢ Remote NUMA node access is about 2.15 times slower than local NUMA node access.

➢ So pay attention to configuring the group correctly with cpuset.cpus and cpuset.mems to ensure local NUMA node access.

Observations: Without binding, the SGA is interleaved

Observations: With binding, the SGA is not interleaved

Observations: During LIO pressure, Instance Caging was less performant on my lab server (1/2)

➢ Test 1 (out of curiosity): launch the SLOB runs (9 readers) without CPU binding and without Instance Caging (no LIO pressure).

➢ Test 2: Instance Caging (cpu_count = 6) in place and LIO pressure (9 SLOB readers).

➢ Test 3: CPU binding in place (6 CPUs on the same NUMA node) and LIO pressure (9 SLOB readers).

➢ Test 4: CPU binding (9 CPUs on the same NUMA node) and Instance Caging (cpu_count = 6) in place, with LIO pressure (9 SLOB readers) caused by the caging.

➢ Test 5 (out of curiosity): launch the SLOB runs (9 readers) with CPU binding (9 CPUs) and without Instance Caging (no LIO pressure).

Observations: During LIO pressure, Instance Caging was less performant on my lab server (2/2)

➢ The ranking is as follows (logical reads per second, in descending order):

1. CPU binding, without LIO pressure (Test 5): 8 600 000 logical reads per second.

2. Neither binding nor caging, without LIO pressure (Test 1): 6 300 000 logical reads per second.

3. CPU binding only, with LIO pressure (Test 3): 5 800 000 logical reads per second.

4. CPU binding and Instance Caging, with LIO pressure due to the caging (Test 4): 5 100 000 logical reads per second.

5. Instance Caging only, with LIO pressure (Test 2): 3 500 000 logical reads per second.
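To put the ranking in perspective, the sketch below simply divides each reported throughput by the caging-only result. The figures are the ones listed above; nothing else is assumed.

```python
# Logical reads per second, as reported in the ranking above.
results = {
    "binding, no pressure (Test 5)": 8_600_000,
    "no binding/caging, no pressure (Test 1)": 6_300_000,
    "binding only, LIO pressure (Test 3)": 5_800_000,
    "binding + caging, LIO pressure (Test 4)": 5_100_000,
    "caging only, LIO pressure (Test 2)": 3_500_000,
}

baseline = results["caging only, LIO pressure (Test 2)"]

if __name__ == "__main__":
    for name, lio in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
        # Ratio against the slowest run (Instance Caging only).
        print(f"{name:45s} {lio:>10,}  {lio / baseline:.2f}x caging only")
```

With the same 6 usable CPUs under LIO pressure, binding alone delivered about 1.66 times the logical reads of caging alone on this lab server.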

Observations: With binding already in place, change the number of CPUs a database is allowed to use on the fly

➢ Initially: cpuset.cpus="1-6"

➢ Alert.log: Instance started in processor group oracle (NUMA Nodes: 0 CPUs: 1-6)

➢ taskset -c -p `ps -ef | grep -i smon | grep BDT12CG | awk '{print $2}'`

➢ pid 1019968's current affinity list: 1-6

➢ /bin/echo 11-20 > /cgroup/cpuset/oracle/cpuset.cpus

➢ Alert.log: Detected change in CPU count to 10.

➢ taskset -c -p `ps -ef | grep -i smon | grep BDT12CG | awk '{print $2}'`

➢ pid 1019968's current affinity list: 11-20
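The same check can be scripted. As a Linux-only sketch, Python's `os.sched_getaffinity(pid)` returns the set of CPUs a process may run on, the information `taskset -c -p` prints, so its size should match the CPU count reported in the alert log (10 for `11-20`). Locating SMON with `pgrep -f '^ora_smon_<SID>$'` instead of the ps/grep/awk pipeline assumes the usual `ora_smon_<SID>` background process name; the helper names are hypothetical.

```python
import os
import subprocess
from typing import Optional


def allowed_cpus(pid: int) -> set[int]:
    """CPUs the process may run on (what `taskset -c -p PID` shows).

    Linux only; pid 0 means the calling process.
    """
    return os.sched_getaffinity(pid)


def smon_pid(sid: str) -> Optional[int]:
    """PID of the SMON process of instance `sid`, or None if not found.

    Assumes the usual ora_smon_<SID> background process name.
    """
    try:
        proc = subprocess.run(["pgrep", "-f", f"^ora_smon_{sid}$"],
                              capture_output=True, text=True, check=False)
    except FileNotFoundError:  # pgrep is not installed
        return None
    pids = proc.stdout.split()
    return int(pids[0]) if pids else None


if __name__ == "__main__":
    pid = smon_pid("BDT12CG")
    if pid is None:
        print("SMON process for BDT12CG not found")
    else:
        cpus = allowed_cpus(pid)
        # After writing 11-20 to cpuset.cpus this should report 10 CPUs.
        print(f"pid {pid}: {len(cpus)} CPUs allowed: {sorted(cpus)}")
```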

Observations: Summary

➢ Pay attention to configure the group correctly with cpuset.cpus and cpuset.mems to ensure local NUMA node access.

➢ With Binding, the SGA is not interleaved.

➢ With Caging, the SGA is interleaved (unless Caging is used on top of Binding).

➢ Instance Caging was less performant than CPU binding under LIO pressure.

➢ With CPU binding already in place (using processor_group_name), we are able to change the number of CPUs a database is allowed to use on the fly.

➢ All the observations above are related to performance.

CPU pressure with Caging: OEM view

The cpu_count parameter is set to 6 and the instance is running 9 SLOB users in parallel.

On average, about 6 sessions are running on the CPU and about 3 are waiting in the “Scheduler” wait class.

CPU pressure with Caging: AWR view

Approximately 60% of the DB time is spent on CPU.

40% is spent waiting because of the caging (Scheduler wait class).

CPU pressure with Binding: OEM view

The cgroup has 6 CPUs and the instance is running 9 SLOB users in parallel.

On average, 6 sessions are running on CPU and 3 are waiting for the CPU (run queue).

CPU pressure with Binding: AWR view

New “Top Events” section as of 12c

CPU pressure with Binding: Summary

Case number 1: One database uses too much CPU and affects other instances’ performance

➢ We then want to limit its CPU usage.

➢ Caging: We are able to cage the database Instance without any restart.

➢ Binding: We are able to bind the database Instance, but it needs a restart.

➢ Example of such a case, handled by implementing Caging:

Case number 1: One database uses too much CPU and affects other instances’ performance: What to choose?

➢ Caging is the best choice in this case (it is the easiest option and does not require restarting the database).

For the following cases, we need to define 2 categories of databases:

➢ The paying ones: the customer paid for guaranteed CPU resources.

➢ The non-paying ones: the customer did not pay for guaranteed CPU resources.

Guarantee CPU resource for some databases at the database level: Caging (1/2)

➢ All the databases need to be caged (otherwise resource availability cannot be guaranteed, as a single non-caged database could take all the CPU).

➢ As a consequence, you can’t mix paying and non-paying customers without caging the non-paying ones.

➢ You have to work out how to cage each database individually, making sure that sum(cpu_count) <= total number of threads (if not, CPU resources can’t be guaranteed).

➢ Make sure that sum(cpu_count) of the non-paying databases <= “total CPUs – number of paid CPUs” (if not, CPU resources can’t be guaranteed).

➢ There is a maximum number of databases you can put on a machine: as cpu_count >= 2 (see Facts), the maximum number of databases is total number of threads / 2.
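These rules are plain arithmetic and can be checked before consolidating. A minimal sketch, assuming you know each database's planned cpu_count and whether it is paying, and that paying databases are caged at the number of CPUs they paid for; the `Database` class, the helper names and the example values are hypothetical, and the 2-CPU minimum is the rule stated above.

```python
from dataclasses import dataclass

MIN_CPU_COUNT = 2  # the rule above: a caged instance needs cpu_count >= 2


@dataclass
class Database:
    name: str
    cpu_count: int  # planned Instance Caging value
    paying: bool    # customer paid for guaranteed CPU


def caging_problems(dbs: list[Database], total_threads: int) -> list[str]:
    """List the violated rules; an empty list means CPU can be guaranteed."""
    problems = [
        f"{db.name}: cpu_count {db.cpu_count} < {MIN_CPU_COUNT}"
        for db in dbs if db.cpu_count < MIN_CPU_COUNT
    ]
    total = sum(db.cpu_count for db in dbs)
    if total > total_threads:
        problems.append(f"sum(cpu_count) = {total} > {total_threads} threads")
    # Paying databases are caged at what they paid for, so their
    # cpu_count sum is the number of paid CPUs.
    paid = sum(db.cpu_count for db in dbs if db.paying)
    non_paid = total - paid
    if non_paid > total_threads - paid:
        problems.append(
            f"non-paying sum(cpu_count) = {non_paid} > {total_threads - paid}")
    return problems


def max_caged_databases(total_threads: int) -> int:
    """Upper bound on the number of caged databases: threads / 2."""
    return total_threads // MIN_CPU_COUNT


if __name__ == "__main__":
    dbs = [Database("A", 6, True), Database("B", 4, True),
           Database("C", 2, False), Database("D", 2, False)]
    print(caging_problems(dbs, 16) or "CPU can be guaranteed")
    print(max_caged_databases(16))
```

When every database is caged and paying ones are caged at their paid amount, the last two checks express the same condition; both are kept only to mirror the two rules listed above.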

Guarantee CPU resource for some databases at the database level: Caging (2/2)

Guarantee CPU resource for some databases at the database level: Binding (1/2)

➢ Each database is linked to a set of CPUs (there is no overlap: a CPU can be part of only one cgroup).

➢ One cgroup per paying database.

➢ One cgroup for all the non-paying databases.

➢ That way, we can easily mix paying and non-paying customers on the same machine.
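The no-overlap rule can be verified mechanically from the cpuset.cpus values of the groups. A minimal sketch, assuming the CPU lists have already been expanded into integer sets; the helper and the group names are hypothetical.

```python
from itertools import combinations


def overlapping_groups(groups: dict[str, set[int]]) -> list[tuple[str, str, set[int]]]:
    """Return every pair of cgroups sharing at least one CPU."""
    clashes = []
    for (a, cpus_a), (b, cpus_b) in combinations(groups.items(), 2):
        shared = cpus_a & cpus_b
        if shared:
            clashes.append((a, b, shared))
    return clashes


if __name__ == "__main__":
    groups = {
        "paying_db1": set(range(0, 8)),     # one cgroup per paying database
        "paying_db2": set(range(8, 16)),
        "non_paying": set(range(16, 48)),   # one cgroup for all non-paying ones
    }
    print(overlapping_groups(groups) or "no overlap: CPU can be guaranteed")
```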

Guarantee CPU resource for some databases at the database level: Binding (2/2)

Guarantee CPU resource for some databases at the database level: Binding + Caging Mix (1/2)

➢ Create a cgroup for all the paying databases, then cage each paying database.

➢ Put the non-paying into another cgroup (no overlap with the paying one) without caging.
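The mix combines both kinds of checks. As I read it, the two cgroups must not overlap, and, applying the caging rule inside the paying cgroup, the cages of the paying databases should add up to no more than the CPUs in that cgroup. A sketch under those assumptions, with CPU lists already expanded into sets and hypothetical names:

```python
def mix_problems(paying_cpus: set[int], non_paying_cpus: set[int],
                 paying_cpu_counts: dict[str, int]) -> list[str]:
    """List the violated rules for the Binding + Caging mix."""
    problems = []
    shared = paying_cpus & non_paying_cpus
    if shared:
        problems.append(f"cgroups overlap on CPUs {sorted(shared)}")
    # Caging rule applied inside the paying cgroup.
    caged = sum(paying_cpu_counts.values())
    if caged > len(paying_cpus):
        problems.append(
            f"paying sum(cpu_count) = {caged} > {len(paying_cpus)} CPUs in their cgroup")
    return problems


if __name__ == "__main__":
    print(mix_problems(set(range(0, 16)), set(range(16, 48)),
                       {"pay_A": 8, "pay_B": 8}) or "OK")
```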

Guarantee CPU resource for some databases at the database level: What to choose?

➢ No “best choice”.

➢ It all depends on the number of databases we are talking about (for a large number of databases, I would use the mix approach).

➢ Don’t forget that with the caging option you can’t put more than (number of threads / 2) databases on the machine.

Guarantee CPU for exactly one group of databases at the group level: Caging option 1 (1/2)

➢ Cage all the non-paying databases.

➢ Make sure that sum(cpu_count) of the non-paying databases <= “total CPUs – number of paid CPUs” (if not, CPU resources can’t be guaranteed).

➢ Don’t cage the paying databases.

➢ That way, the paying databases are guaranteed at least the resources they paid for.
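With option 1 the paying databases are not caged, so what they can use is a range rather than a fixed number. A sketch of that arithmetic (a hypothetical helper; it ignores CPU consumed by processes outside the databases):

```python
def paying_cpu_range(total_threads: int,
                     non_paying_cpu_counts: list[int]) -> tuple[int, int]:
    """(guaranteed, maximum) CPUs available to the uncaged paying databases."""
    caged = sum(non_paying_cpu_counts)
    if caged > total_threads:
        raise ValueError(f"non-paying cages ({caged}) exceed {total_threads} threads")
    guaranteed = total_threads - caged  # capacity the caged databases can never consume
    maximum = total_threads             # reachable when the non-paying ones are idle
    return guaranteed, maximum


if __name__ == "__main__":
    # 48 threads, three non-paying databases caged at 4, 4 and 8.
    print(paying_cpu_range(48, [4, 4, 8]))  # (32, 48)
```

Read the other way round: if the paying group paid for 32 CPUs on a 48-thread server, the cages of the non-paying databases must add up to at most 16.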

Guarantee CPU for exactly one group of databases at the group level: Caging option 1 (2/2)

Guarantee CPU for exactly one group of databases at the group level: Caging option 2 (1/2)

➢ Cage all the databases (paying and non-paying).

➢ You have to work out how to cage each database individually, making sure that sum(cpu_count) <= total number of threads (if not, CPU resources can’t be guaranteed).

➢ Make sure that sum(cpu_count) of the non-paying databases <= “total CPUs – number of paid CPUs” (if not, CPU resources can’t be guaranteed).

➢ That way, the paying databases are guaranteed exactly the resources they paid for.

Guarantee CPU for exactly one group of databases at the group level: Caging option 2 (2/2)

Guarantee CPU for exactly one group of databases at the group level: Binding option 1 (1/2)

➢ Put all the non-paying databases into a cgroup of “total CPUs – number of paid CPUs” CPUs.

➢ Allow all the CPUs to be used by the paying databases (don’t create a cgroup for the paying databases).

➢ That way, the paying group is guaranteed at least the resources it paid for.

Guarantee CPU for exactly one group of databases at the group level: Binding option 1 (2/2)

Guarantee CPU for exactly one group of databases at the group level: Binding option 2 (1/2)

➢ Put all the paying databases into a cgroup.

➢ Put all the non-paying databases into another cgroup (no overlap with the paying cgroup).

➢ That way, the paying databases are guaranteed exactly the resources they paid for.

Guarantee CPU for exactly one group of databases at the group level: Binding option 2 (2/2)

Guarantee CPU for exactly one group of databases at the group level: What to choose?

➢ I like option 1 in both cases, as it allows customers to get more than what they paid for (while still guaranteeing what they paid for).

➢ Then the choice between caging and binding really depends on the number of databases we are talking about (binding is preferable and more easily manageable for a large number of databases).

Guarantee CPU for more than one group of databases at the group level: Caging (1/2)

➢ All the databases need to be caged (otherwise resource availability cannot be guaranteed, as a single non-caged database could take all the CPU).

➢ As a consequence, you can’t mix paying and non-paying customers without caging the non-paying ones.

➢ You have to work out how to cage each database individually, making sure that sum(cpu_count) <= total number of threads (if not, CPU resources can’t be guaranteed).

➢ Make sure that sum(cpu_count) of the non-paying databases <= “total CPUs – number of paid CPUs” (if not, CPU resources can’t be guaranteed).

➢ There is a maximum number of databases you can put on a machine: as cpu_count >= 2 (see Facts), the maximum number of databases is total number of threads / 2.

Guarantee CPU for more than one group of databases at the group level: Caging (2/2)

Guarantee CPU for more than one group of databases at the group level: Binding (1/2)

➢ Create a cgroup for each paying database group.

➢ Create a cgroup for the non-paying ones (no overlap between all the groups).

➢ Overlapping is not possible in this case (you can’t guarantee resources with overlapping cgroups).
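With several paying groups, each group gets its own disjoint slice of CPUs. The sketch below carves contiguous, non-overlapping cpuset.cpus ranges from a list of CPU counts and gives whatever is left to the non-paying group. Allocating contiguously from CPU 0, ignoring NUMA node boundaries (which the locality observations suggest respecting in practice), and the group names are all assumptions.

```python
def allocate_cpusets(total_threads: int, paying: dict[str, int],
                     non_paying_name: str = "non_paying") -> dict[str, str]:
    """Map each group name to a disjoint cpuset.cpus range string."""
    def fmt(first: int, last: int) -> str:
        return str(first) if first == last else f"{first}-{last}"

    needed = sum(paying.values())
    if needed >= total_threads:
        raise ValueError(
            f"paying groups need {needed} CPUs, leaving none of {total_threads}")
    layout: dict[str, str] = {}
    next_cpu = 0
    for name, count in paying.items():
        if count < 1:
            raise ValueError(f"{name}: needs at least one CPU")
        layout[name] = fmt(next_cpu, next_cpu + count - 1)
        next_cpu += count
    layout[non_paying_name] = fmt(next_cpu, total_threads - 1)  # the rest
    return layout


if __name__ == "__main__":
    print(allocate_cpusets(48, {"gold": 16, "silver": 8}))
```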

Guarantee CPU for more than one group of databases at the group level: Binding (2/2)

Guarantee CPU for more than one group of databases at the group level: What to choose?

➢ It really depends on the number of databases and groups we are talking about (binding is preferable and more easily manageable for a large number of databases).

Remarks

➢ If you see (or have faced) more cases, feel free to share them and to comment on the blog post.

➢ I did not consider the potential impact on LIO performance of each possible choice (caging vs binding, with or without NUMA).

➢ I only mentioned it in the observations and did not include it in the use cases (where I only discussed the feasibility of the implementation).

➢ I really like option 1 in the “exactly one group” case. This option was proposed by Karl Arao during a Twitter conversation.

Conclusion

➢ The choice between caging and binding depends on:

➢ Your needs.

➢ The number of databases you are consolidating on the server.

➢ Your performance expectations.

Questions