CPU Monitoring and Tuning SMT
Introduction
AIX 5L Version 5.3 is the latest version of the AIX operating system that offers
simultaneous multi-threading (SMT) on eServer p5 systems to deliver industry leading
throughput and performance levels. With support for advanced virtualization, AIX 5L
Version 5.3 helps you to dramatically increase your server utilization and consolidate workloads for more efficient management.
A review of computing history and operating systems shows that computer scientists have
developed many CPU scheduling policies. First-in, first-out (FIFO), shortest job first, and
round robin are just a few. Scheduling policies are important because a single policy might
not be best suited to all applications. Some applications in certain workloads can run well in a
default scheduling policy. However, the same applications with a different workload might
require a scheduling policy adjustment in order to achieve the optimal performance.
Note: This article is an update for AIX 5.3 performance. Advanced virtualization is not discussed in this article. The article has enhancements and updates to emphasize AIX 5L Version 5.3 features, tools, and capabilities.
What is SMT?
SMT is the ability of a single physical processor to concurrently dispatch instructions from
more than one hardware thread. In AIX 5L Version 5.3, a dedicated partition created with one
physical processor is configured as a logical two-way by default. Two hardware threads can
run on one physical processor at the same time. SMT is a good choice when overall
throughput is more important than the throughput of an individual thread. For example, Web
servers and database servers are good candidates for SMT.
Viewing processor and attribute information
By default, SMT is enabled, as shown in Listing 1 below.
Listing 1. SMT
# smtctl
This system is SMT capable.
SMT is currently enabled.
SMT threads are bound to the same physical processor.
Proc0 has 2 SMT threads
Bind processor 0 is bound with proc0
Bind processor 2 is bound with proc0
Proc2 has 2 SMT threads
Bind processor 1 is bound with proc2
Bind processor 3 is bound with proc2
# lsattr -El proc0
frequency 1656376000 Processor Speed False
smt_enabled true Processor SMT enabled False
smt_threads 2 Processor SMT threads False
state enable Processor state False
type PowerPC_POWER5 Processor type False
The smtctl command provides privileged users and applications the ability to control the
utilization of processors with SMT support. With this command, you can turn SMT on or off.
The smtctl command syntax is:
smtctl [-m off | on [ -w boot | now] ]
What are shared processors?
Shared processors are physical processors that are allocated to partitions on a timeslice basis.
You can use any physical processor in the shared processor pool to meet the execution needs
of any partition using the shared processor pool. An eServer p5 system can contain a mix of
shared and dedicated partitions. A partition must be all shared or all dedicated, and you can
not use dynamic LPAR (DLPAR) commands to change between the two. You need to bring
down the partition and switch it from using dedicated to shared, or vice versa.
Processing units
After a partition is configured, you can assign it an amount of processing units. A partition
must have a minimum of 1/10 of a processor. After that requirement has been met, you
can configure processing units at a granularity of 1/100 of a processor. A partition that uses
shared processors is often called a shared partition. A dedicated partition is one that uses dedicated processors.
Each partition is configured with a percentage of execution dispatch time for each 10
milliseconds (ms) timeslice. For example:
A partition with 0.2 processing units is entitled to 20 percent capacity during each timeslice.
A partition with 1.8 processing units is entitled to 18 ms of processing time for each 10 ms timeslice (using multiple processors).

There is no accumulation of unused cycles. If a partition does not use its entitled processing capacity, the excess processing time is ceded back to the shared processing pool.
Partitions with shared processors are either capped or uncapped. A capped partition is
assigned a hard limit on capacity. An uncapped partition that needs extra CPU cycles (more
than its total processing units) can utilize unused capacity in the shared pool.
Scheduling algorithms
AIX 5 implements the following scheduling policies: FIFO, round robin, and a fair round
robin. The FIFO policy has three different implementations: FIFO, FIFO2, and FIFO3. The
round robin policy is named SCHED_RR in AIX, and the fair round robin is called
SCHED_OTHER. We discuss these policies in greater detail in the upcoming sections.
Scheduling policies can have a major impact on system performance (response time and
throughput), depending on how one assigns and manages them. For example, FIFO is a good
choice for a job that uses a lot of CPU, but it also can choke out all of the other jobs waiting
in line. A basic round robin gives a "timeslice" or "quantum" to each job in a time-shared
manner. As a result, it tends to discriminate against I/O-intensive tasks, since those tasks
often give up CPU voluntarily due to I/O wait. The fair round robin is "fair" because
scheduling priorities change as the jobs accumulate quantums of CPU time during execution.
This allows the operating system to demote a CPU hugger so that an I/O bound job has a fair
chance to use the CPU resource.
Let's go over two important concepts before getting into the scheduling details: the nice value
and the AIX priority and run queue structure.
The nice and renice commands
AIX has two important scheduling commands: nice and renice. A user job in AIX carries a
base priority level of 40 and a default nice value of 20. Together, these two numbers form
the default priority level of 60. This value applies to most of the jobs you see in a system.
When you start a job with a nice command, such as nice -n 10 myjob, the number 10
becomes the delta_NICE. This number is added to the default 20 to create the new nice
value of 30. In AIX, the higher this number, the lower the priority. Using this example, your
job now starts with a priority of 70, which is 10 levels worse in priority than the default.
The renice command applies to a job that has already started. For example, the renice -n
5 -p 2345 command causes process 2345 to have a nice value of 25. Note that the renice
value is always applied to a base nice of 20, regardless of the current nice value of the
process.
AIX priority and run queue structure
A thread carries a priority range from 0 to 255 (the range is from 0 to 127 on systems prior to
AIX 5). Priority 0 is the highest, or most favorable, and 255 is the lowest, or least
favorable. AIX maintains the run queue as a 256-level priority queue to efficiently support the 256 priority levels of threads.
AIX also implements a 256-bit array to map to the 256 levels of the queue. If a particular
queue level is empty, the corresponding bit is set to 0. This design allows the AIX scheduler
to quickly identify the first non-empty level and start the first ready-to-run job in that level.
See the AIX run queue structure in Figure 1 below.
Figure 1. Scheduler run queue
In Figure 1, the scheduler maintains a run queue of all the threads that are ready to be dispatched. All dispatchable threads of a given priority occupy consecutive positions in the
run queue.
AIX 5L implements one run queue for each CPU and a global queue. For example, there are
32 run queues and one global queue in an eServer pSeries p590 machine. With per-CPU
run queues, a thread has a better chance of returning to the same CPU after a preemption, which
is an affinity enhancement. Also, the contention among CPUs to lock the run queue structure
is much reduced with multiple run queues.
However, in some situations, a multiple run queue structure might not be desirable.
Exporting the system environment variable RT_GRQ=ON causes a thread to be placed on
the global run queue when it becomes runnable. This can improve performance for threads
that are interrupt-driven and running SCHED_OTHER. If schedo -o fixed_pri_global=1 is run on AIX 5L Version 5.2 and later, threads running at a fixed priority are placed on the global run queue.
For local run queues, the dispatcher picks the best priority thread in the run queue when a
CPU is available. When a thread has been running on a CPU, it tends to stay on that CPU's
run queue. If that CPU is busy, then the thread can be dispatched to another idle CPU and
assigned to that CPU's run queue.
FIFO
Although the FIFO policy is the simplest, it is rarely used because of its non-preemptive
nature. A thread with this scheduling policy runs all the way to completion, unless one of the
following happens:
It gives up the CPU voluntarily by executing a function that would put the thread to sleep, such as sleep() or select().
It gets blocked due to resource contention.
It has to wait for I/O completion.
The checkout lane at a grocery store uses a typical FIFO policy. Imagine yourself in the
checkout lane with only one TV dinner (and you're hungry), but the person in front has a full
load in his cart. What can you do? Not much. Since this is a FIFO, you must wait patiently
for your turn.
Similarly, it is obvious that job response time can suffer severely if several tasks are running
FIFO mode in AIX. Consequently, FIFO is rarely used in AIX. Only a process owned by root
can set itself or another thread to FIFO with the thread_setsched() system call.
There are two variations of the FIFO policy: FIFO2 and FIFO3. FIFO2 says that a thread is
put at the head of its run queue if it was asleep for only a short period of time less than a
predefined number of ticks (affinity_lim ticks, tunable with the schedo -p command).
This allows a thread to have a good chance to reuse the cache content. For FIFO3, a thread is
always put at the head of the queue when it becomes runnable.
Round robin
The well-known round robin scheduling policy is even older than UNIX itself. AIX 5L implements round robin on top of its multilevel priority queue of 256 levels. At a given
priority level, a round robin thread shares the CPU timeslices with all other entries of the
same priority. A thread is scheduled to run until one of the following occurs:
It yields the CPU to other tasks.
It is blocked for I/O.
It uses up its timeslice.
When the timeslice is exhausted, if a thread of equal or better priority is available to run on
that CPU, the thread that is currently running is then placed at the end of the queue for the
next turn to own the processor. A thread can be preempted because of a higher priority job waking up or a device interrupt (for example, after an I/O is done).
For a round robin task only, this preempted thread is placed at the beginning of its queue
level, because AIX wants to ensure that a round robin job has a full timeslice before it is
moved to the end of the round robin chain. It is important to note that the priority of a round
robin thread is fixed and does not change over time. This makes the priority of a round robin
task persistent (as opposed to the changing priorities in fair round robin) and more
predictable.
Since a round robin thread has special status, only root can set a thread to run with the round
robin scheduling policy. To set SCHED_RR for a thread, use one of the following application programming interfaces (APIs): thread_setsched() or setpri().
SCHED_OTHER
This last scheduling policy is also the default. While trying to establish the fairest policy
among tasks, this innovative SCHED_OTHER algorithm was created with a not so innovative
POSIX-defined name. The AIX SCHED_OTHER is a priority-queue round robin design at the
core, with one major difference: the priority is no longer fixed. If a task is using an excessive amount of CPU time, its priority level should be downgraded to allow other jobs an
opportunity to access the CPU.
If a task is at a priority level so low (a high number) that it does not have an opportunity to
run, then its priority should be upgraded to a higher level (a lower number) so it can run to
finish. A new concept was also implemented to further enhance the effectiveness of the nice
value: If a task is nice (the UNIX nice value) at the beginning, the system will then force it
to be nice all the time. I discuss this feature later.
Traditional CPU utilization
Prior to AIX 5.3 or with SMT disabled, AIX processor utilization uses a sample-based
approach to approximate:
The percentage of processor time spent executing user programs
The percentage spent executing system code
The percentage spent waiting for disk I/O
Idle time
AIX produces 100 interrupts per second to take samples. At each interrupt, a local timer tick
(10 ms) is charged to the currently running thread, the one preempted by the timer interrupt. One of the following utilization categories is chosen based on the state of the interrupted thread:

If the thread was executing kernel code through a system call, the entire tick is charged to the process's system time.
If the thread was executing application code, the entire tick is charged to the process's user time.
If the currently running thread was the operating system's idle process, the tick is charged to a separate idle variable.

The problem with this method is that the process receiving the tick most likely did not run for
the entire timer period; it merely happened to be executing when the timer expired. With SMT
enabled in AIX 5.3, the traditional utilization metrics are misleading because each physical
processor presents two logical processors. If one logical processor is 100 percent busy and the
other is idle, the traditional method reports 50 percent utilization. In reality, if one SMT
thread is using all of the CPU resources, that CPU is 100 percent busy, as reported by the new
Processor Utilization Resource Register (PURR) based method.
PURR
Beginning in AIX 5.3, the number of dispatch cycles for each thread can be measured using a
new register called the PURR. Each physical processor has two PURR registers (one for each
hardware thread). The PURR is a new register provided by the POWER5 processor, which is
used to provide an actual count of physical processing time units that a logical processor has used. All performance tools and APIs utilize this PURR value to report CPU utilization
metrics for SMT systems. This register is a special-purpose register that can be read or
written by the POWER Hypervisor; however, it is read-only by the operating system.
The hardware increments each PURR based on how its thread is using the resources of
the processor, including the dispatch cycles that are allocated to each thread. For a cycle in
which no instructions are dispatched, the PURR of the thread that last dispatched an
instruction is incremented. The register advances automatically, so the operating system can always get the current, up-to-date value.
When the processor is in single-thread mode, the PURR increments by one every eight
processor clock cycles. When the processor is in SMT mode, the thread that dispatches a
group of instructions in a cycle increments the counter by 1/8 in that cycle. If no group
dispatch occurs in a given cycle, both threads increment their PURR by 1/16. Over a period
of time, the sum of the two PURR registers, when running in SMT mode, should be very
close, but not greater than the number of timebase ticks.
AIX 5.3 CPU utilization
In AIX 5L V5.3, new metrics collected by the kernel are state-based rather than
sample-based. State-based collection accumulates information on PURR increments rather
than at a set time of 10 ms. AIX 5.3 uses the PURR for process accounting.
Instead of charging the entire 10ms clock tick to the interrupted process as before, processes
are charged on the PURR delta for the hardware thread since the last interval. At each
interrupt:
The elapsed PURR is calculated for the current sample period.
This value is added to the appropriate utilization category (user, sys, iowait, or idle), instead of the fixed-size increment (10 ms) that was previously added.
There are two different things to measure: the thread's processor time and the elapsed
time. To measure the elapsed time, the timebase register (TB) is still used. The physical
resource utilization metrics for a logical processor are:

delta PURR/delta TB represents the fraction of the physical processor consumed by a logical processor.
(delta PURR/delta TB) * 100 over an interval represents the percentage of dispatch cycles given to a logical processor.
CPU utilization example
Assume two threads are running on one physical processor with SMT enabled. Both SMT
threads of a physical CPU are busy. Using the old tick-based method, both SMT threads
would be reported as 100 percent busy but, in reality, they are really sharing the CPU
resources evenly. This means the new PURR-based method would show each SMT thread as
50 percent busy.
Using the PURR methods, each logical processor reports a utilization of 50 percent
representing the proportion of physical processor resources that it used, assuming equal
distribution of physical processor resources to both the hardware threads.
Additional CPU utilization metrics
The following metrics use the per-thread PURR to measure the thread's processor
time and the TB register to measure the elapsed time.

Table 1. Per-thread PURR method

Metric: %sys = (delta PURR in system mode / entitled PURR) * 100, where entitled PURR = ENT * delta TB, and ENT is the entitlement in number of processors (entitlement/100)
Provides: Physical CPU utilization, calculated using the PURR-based samples and the entitlement.

Metric: sum(delta PURR / delta TB) for each logical processor in a partition
Provides: The Physical Processors Consumed (PPC) over an interval.

Metric: (PPC/ENT) * 100
Provides: The percentage of entitlement consumed.

Metric: delta PIC / delta TB, where PIC is the Pool Idle Count, the clock ticks during which the POWER Hypervisor was idle
Provides: The available pool of processors.

Metric: Sum of the traditional 10 ms tick-based %sys and %user
Provides: Logical processor utilization, which helps you determine whether more virtual processors should be added to a partition.
AIX 5.3 command changes
When AIX is running with SMT enabled, commands that display CPU information, such as
vmstat, iostat, topas, and sar, display the PURR-based statistics, rather than the
traditional sample-based statistics. In SMT mode, additional columns of information are
displayed, as shown in Table 2 below.
Table 2. SMT mode
Column         Description
pc or physc    Physical Processors Consumed by the partition
pec or %entc   Percentage of Entitlement Consumed by the partition
Another tool that needed modification was trace/trcrpt, along with several other tools that are based on the trace utility. In an SMT environment, trace can optionally collect PURR register
values at each trace hook, and trcrpt can display the elapsed PURR.
Table 3 below shows the arguments to use for SMT.
Table 3. Arguments for SMT
Argument                  Description
trace -r PURR             Collects the PURR register values. Only valid for a trace run on a 64-bit kernel.
trcrpt -O PURR=[on|off]   Tells trcrpt to show the PURR along with any timestamps.
netpmon -r PURR           Uses the PURR time instead of the timebase in percent and CPU calculations. Elapsed time calculations are unaffected.
pprof -r PURR             Uses the PURR time instead of the timebase in percent and CPU calculations. Elapsed time calculations are unaffected.
gprof                     GPROF is the new environment variable to support SMT.
curt -r PURR              Specifies the use of the PURR register to calculate CPU times.
splat -p                  Specifies the use of the PURR register to calculate CPU times.
Thread priority formulas
You can calculate the priority of a thread using the formulas shown in Listing 2 below. It
is a function of the nice value, the CPU usage C, and a tuning factor r.
How AIX calculates the new priority
The clock timer interrupt occurs every 10ms or 1 tick on each CPU. The timers are staggered
so that a CPU's clock timer does not go off at the same time as another CPU's clock timer.
When the CPU clock timer interrupt occurs (even before the thread has run for a full 10ms),
the thread has its CPU usage value (the CPU charge) incremented by one, up to a maximum
of 120. If a job does not get a full 10ms slice and is running the RR policy, the system dispatcher changes the thread's priority in the run queue to allow it to run again soon.
The priority of most user processes varies with the amount of CPU time the process has used
recently. The CPU scheduler's priority calculations are based on two parameters that are set
with schedo: sched_R and sched_D. The sched_R and sched_D values are in units of 1/32.
The scheduler uses this formula to calculate the amount to add to a process's priority value as
a penalty for recent CPU use. For example:
CPU penalty = (recently used CPU value of the process) * (r/32)
The recalculation (once per second) of the recently used CPU value of each process is:
New recently used CPU value = (old recently used CPU value of the process)* (d/32)
Both r (sched_R parameter) and d (sched_D parameter) have default values of 16.
The recent CPU charge C is then used to determine the priority penalty and to recalculate the new thread priority. Using the first formula as a reference (see Listing 2), you know that a
newly started user task, which carries a base priority 40, a default nice value of 20, and no
CPU charge so far (C=0), begins with a priority level 60.
Also, in the first formula, the value r determines the penalty ratio, with a range from zero to
32. An r value of zero means no penalty for CPU usage, since the penalty term (C*r/32) is
always zero. If r=32, it yields the highest possible penalty: each tick (10ms) of CPU usage translates to one priority-level downgrade.
In most cases, the value of r lies near the middle between zero and 32. AIX defaults r to 16;
that is, every two ticks of CPU charge become one level of priority penalty. When the r value
is high, the impact of a nice value becomes less important, since the CPU usage penalty
prevails. A smaller r, on the contrary, makes the effect of the nice value more obvious.
Based on this discussion, the effectiveness of the nice value diminishes after a while. The
reason for this is because the CPU charge grows in time and gradually becomes the main
factor in determining the new priority.
This formula has been modified in AIX 5L to increase the weight of the nice value in
calculating the priority level. Two new factors have been introduced: x_nice and
x_nice_factor ("extra nice" and "extra nice factor"). See the second formula in Listing 2 below.
Listing 2. Thread priority formulas
Priority = p_nice + (C * r/32)                    (1)
Priority = x_nice + (C * r/32 * x_nice_factor)    (2)

Where:
p_nice = base_PRIORITY + NICE
base_PRIORITY = 40
NICE = 20 + delta_NICE (20 is the default nice value)
That is, p_nice = 60 + delta_NICE
C is the CPU usage charge; the maximum value of C is 120

If NICE > 20 then
x_nice = p_nice * 2 - 60, or
x_nice = p_nice + delta_NICE, or                  (3)
x_nice = 60 + (2 * delta_NICE)                    (3a)
x_nice_factor = (x_nice + 4)/64                   (4)

Priority has a maximum value of 255
As you can see from Formulas 2 and 3, x_nice now doubles the increase in the
nice value, and the x_nice_factor further strengthens the r ratio. For example, an initial nice of
16, which gives a nice value of 36, results in an x_nice_factor of 1.5. That is a 50
percent higher penalty for the CPU usage part over the lifetime of the thread.
Decaying the CPU usage
It is possible that a thread can get a priority so low that it never has a chance to run. This
would occur if you use only Formulas 1 and 2 without a mechanism to push a thread's
priority level back up.
When a thread runs with SCHED_OTHER, its priority is degraded for its use of CPU time. When
it is not running and is waiting for its turn, AIX tries to regain its priority by "decaying" its
CPU charges, about once a second. The rule is simple: A CPU-bound job should be assigned
a lower priority to allow other jobs to run, but it should not be discriminated against to the
point that it cannot finish. Every thread's CPU charge is decayed by a predefined factor
once per second, as follows:

New Charge C = (Old Charge C) * d / 32 (5)

A kernel process, swapper, does this job. Once every second, swapper wakes up and handles the CPU charge decaying for all the threads. The default decay factor is 0.5 (d=16), which
"discounts" or "waives" half of the CPU charge.
With this mechanism, a CPU-intensive job accumulates CPU charge, gets to a lower priority
level, and then advances to a much higher level at the end of a second. On the other hand, an
I/O-intensive job does not vary its priority up and down as much, since it generally
accumulates less CPU time.
Have you exhausted your CPU?
Now that you understand how the AIX scheduler prioritizes the workload, let's look at several
commonly used commands. If AIX seems to take too long to finish your workload or it does
not respond quickly enough, try these commands to investigate whether your system is CPU-
bound: vmstat, iostat, and sar.
We do not discuss all the possible ways to use these commands, but instead emphasize the
information they convey to you. For a detailed description of these commands, see your AIXpublications or visit the IBM System p and AIX Information Center at
http://publib16.boulder.ibm.com/pseries/index.htm. Scroll down, if necessary, and click AIX
5L Version 5.3 information center to start using the AIX 5 publications.
The priority change history of a thread
Listing 3 shows how the CPU charge can change the priority of a thread:
Listing 3. Change of CPU charge and the priority of a thread
Base priority is 40
Default NICE value is 20; assume the task was run using the default nice value
p_nice = base_priority + NICE = 40 + 20 = 60
Assume r = 2 to slow down the penalty increase (default r value is 16)
Priority = p_nice + C*r/32 = 60 + C * r / 32
Tick 0    P = 60 + 0 * 2 / 32 = 60
Tick 1    P = 60 + 1 * 2 / 32 = 60
Tick 2    P = 60 + 2 * 2 / 32 = 60
...
Tick 15   P = 60 + 15 * 2 / 32 = 60
Tick 16   P = 60 + 16 * 2 / 32 = 61
Tick 17   P = 60 + 17 * 2 / 32 = 61
...
Tick 100  P = 60 + 100 * 2 / 32 = 66
Tick 100  Swapper decays the CPU usage charges for all threads:
          New C CPU Charge = (Current CPU Charge) * d / 32
          Assume d = 16 (the default)
          For the test thread, new C = 100 * 16 / 32 = 50
Tick 101  P = 60 + 51 * 2 / 32 = 63
Listing 4 contrasts a fast, CPU-bound job with a slow, mostly sleeping one:

Listing 4. Priority change of a typical CPU-bound job (fast versus slow)
fast.c:
main()
{
    for (;;);
}

slow.c:
main()
{
    sleep(80);
}
Common commands
The vmstat, iostat, and sar commands are used frequently for CPU monitoring. You
should be familiar with the usage and the meaning of the reports each command generates.
vmstat
The vmstat command provides an overview of resource utilization through a report of CPU,
disk, and memory activity in a one-line-per-report format. The sample output in Listing 5 is
generated on an AIX 5L Version 5.3 system running "vmstat 1 6". This report was
generated every second, as requested. Since a count of six was specified following the
interval, reporting stops after the sixth report. One popular way to run the vmstat command
is to leave out the count parameter; vmstat then generates reports continuously until the
command terminates.
Except for the avm and fre columns, the first report contains average statistics per second
since system startup. Subsequent reports contain statistics collected during the interval since
the previous report.
Beginning with AIX 5L Version 5.3, the vmstat command reports the number of physical
processors consumed (pc) and the percentage of entitlement consumed (ec) in Micro-Partitioning and SMT environments. These metrics are displayed only in Micro-Partitioning and SMT environments.
AIX 5L adds a useful new option, -I, to vmstat that shows the number of threads waiting for raw I/O to complete (the p column) and the number of file pages paged in/out per second
(fi/fo columns).
The following detailed descriptions of the columns convey useful information about CPU
utilization. Listing 5 shows the output of the vmstat 1 6 command:
Listing 5. Output of the vmstat 1 6 command from a p520 system (two CPUs)

vmstat 1 6

System configuration: lcpu=4 mem=15808MB

kthr    memory               page                   faults         cpu
----- --------------- ------------------------ ---------------- -----------
 r  b    avm    fre   re pi po fr sr cy   in   sy  cs  us sy id wa
 1  1 110996 763741    0  0  0  0  0  0  231   96  91   0  0 99  0
 0  0 111002 763734    0  0  0  0  0  0  332 2365 179   0  1 99  0
 0  0 111002 763734    0  0  0  0  0  0  330 2283 139   0  5 93  1
 0  0 111002 763734    0  0  0  0  0  0  310 2212 153   0  0 99  0
 1  0 111002 763734    0  0  0  0  0  0  314 2259 173   0  0 99  0
 0  0 111002 763734    0  0  0  0  0  0  321 2261 177   0  1 99  0
Figure 2 shows the output of the command vmstat -I 1 (issued during a software
installation):
Figure 2. Output of the vmstat -I 1 command
See Table 4 below for a listing of relevant columns with descriptions.
Table 4. Description of relevant columns
Column   Description

kthr     Kernel thread state changes per second over the sampling interval.

r        Number of kernel threads placed in the run queue.

b        Number of kernel threads placed in the Virtual Memory Manager (VMM)
         wait queue (awaiting resource, awaiting input/output).
p        The number of threads waiting on raw I/Os (bypassing the journaled
         file system (JFS)) to complete. This is only available on AIX 5 and
         later.

fi/fo    Number of file pages paged in/out per second. Note: This column is
         available only on AIX 5 and later systems.

cpu      Breakdown of percentage usage of CPU time. For multiprocessor
         systems, CPU values are global averages among all processors. Also,
         the I/O wait state is defined system-wide and not per processor.

us       Average percentage of CPU time executing in user mode.

sy       Average percentage of CPU time executing in system mode.

id       Average percentage of time that CPUs were idle and the system did
         not have an outstanding disk I/O request.

wa       Average percentage of CPU idle time during which the system had
         outstanding disk/NFS I/O request(s). If there is at least one
         outstanding I/O to a disk when wait is running, the time is
         classified as waiting for I/O. Unless asynchronous I/O is being used
         by the process, an I/O request to disk causes the calling process to
         block (or sleep) until the request has been completed. Once an I/O
         request for a process completes, it is placed on the run queue. If
         the I/Os were completing faster, more CPU time could be used.

pc       Number of physical processors consumed. Displayed only if the
         partition is running with shared processors.

ec       The percentage of entitled capacity consumed. Displayed only if the
         partition is running with shared processors.
A CPU is marked wio at the time of a clock interrupt (every 1/100 second) if the CPU is idle and an outstanding I/O was initiated on that CPU. If a CPU is only idling, with no outstanding I/O from that CPU, it is marked as id instead of wa. For example, a system with four CPUs and one thread doing I/O reports a maximum of 25 percent wio time. A system with 12 CPUs and one thread doing I/O reports a maximum of 8.3 percent wio time. To be precise, wio measures the percentage of time the CPU is idle while it waits for an I/O to complete.
These four columns should total 100 percent, or very close to it. If the sum of the user and system (us and sy) CPU-utilization percentages consistently approaches 100 percent, the system might be encountering a CPU bottleneck.
iostat
The iostat command is used primarily to monitor system input and output devices, but it
can also provide CPU utilization data. Beginning with AIX 5.3, the iostat command reports the number of physical processors consumed (physc) and the percentage of entitlement consumed (%entc) in Micro-Partitioning and SMT environments. These metrics are displayed only in Micro-Partitioning/SMT environments. When SMT is enabled, iostat automatically uses new PURR-based data and formulas for:
%user %sys %wait %idle
Listing 6 is generated on an AIX 5L Version 5.3 system by entering "iostat 5 3", as
follows:
Listing 6. iostat report

System configuration: lcpu=4 drives=9

tty:   tin   tout   avg-cpu:  %user  %sys  %idle  %iowait
       0.0    4.3               0.2   0.6   98.8      0.4

Disks:    %tm_act   Kbps   tps   Kb_read   Kb_wrtn
hdisk0        0.0    0.2   0.0      7993      4408
hdisk1        0.0    0.0   0.0      2179      1692
hdisk2        0.4    1.5   0.3     67548     59151
cd0           0.0    0.0   0.0         0         0

tty:   tin   tout   avg-cpu:  %user  %sys  %idle  %iowait
       0.0   30.3               8.8   7.2   83.9      0.2

Disks:    %tm_act   Kbps   tps   Kb_read   Kb_wrtn
hdisk0        0.2    0.8   0.2         4         0
hdisk1        0.0    0.0   0.0         0         0
hdisk2        0.0    0.0   0.0         0         0
cd0           0.0    0.0   0.0         0         0

tty:   tin   tout   avg-cpu:  %user  %sys  %idle  %iowait
       0.0    8.4               0.2   5.8    0.0     93.8

Disks:    %tm_act   Kbps   tps   Kb_read   Kb_wrtn
hdisk0        0.0    0.0   0.0         0         0
hdisk1        0.0    0.0   0.0         0         0
hdisk2       98.4   75.6  61.9       396      2488
cd0           0.0    0.0   0.0         0         0

Example iostat report with an SPLPAR configuration:

# iostat -t 2 3

System configuration: lcpu=4 ent=0.80

avg-cpu:  %user  %sys  %idle  %iowait  physc  %entc
            0.1   0.2   99.7      0.0    0.0    0.9
            0.1   0.4   99.5      0.0    0.0    1.1
            0.1   0.2   99.7      0.0    0.0    0.9
Just like the vmstat command report, the first report contains statistics averaged since system startup. Subsequent reports contain statistics collected during the interval since the
The four columns that show the breakdown of CPU usage time convey the same information
as the vmstat command. The columns should total approximately 100 percent. If the sum of
user and system (us and sy) CPU-utilization percentages consistently approaches 100 percent,
the system might be encountering a CPU bottleneck.
On systems running one application, a high I/O wait percentage might be related to the
workload. On systems with many processes, some will be running while others wait for I/O.
In this case, the %iowait can be small or zero because running processes "hide" some wait
time. Although %iowait is low, a bottleneck can still limit application performance. If the
iostat command indicates that a CPU-bound situation does not exist and %iowait time is
greater than 20 percent, you might have an I/O or disk-bound situation.
sar
The sar command has two forms: the first form samples, displays, and/or saves system statistics, and the second form processes and displays previously captured data. The sar
command can provide queue and processor statistics just like the vmstat and iostat
commands. However, it has two additional features:
- Each sample has a leading time stamp, and an overall average appears at the
  end of the samples.

- The -P option can be used to generate per-processor statistics, in addition
  to the global averages among all processors. The sample output below, from
  a four-way symmetric multiprocessor (SMP) system, resulted from entering
  two commands:

  sar -o savefile 5 3 > /dev/null &

  Note: This command collects the data three times at five-second intervals,
  saves the collected data in savefile, and redirects the report to null so
  that no report is written to the terminal.

  sar -P ALL -u -f savefile

  Note: The -P ALL flag is specified to get per-processor statistics for each
  individual processor, and -u requests CPU usage data. In addition, -f
  savefile tells sar to generate the report using the data saved in savefile.
  The sar -P ALL output for all logical processors with SMT enabled shows the
  physical processor consumed, physc (delta PURR / delta TB). This column
  shows the relative SMT split between processors -- in other words, it
  measures the fraction of time a logical processor was getting physical
  processor cycles. Whenever the percentage of entitled capacity consumed is
  under 100 percent, a line beginning with U is added to represent the unused
  capacity. When running in shared mode, sar displays the percentage of
  entitlement consumed, %entc, which is ((PPC/ENT)*100).
Listing 7. A typical sar report from a 2-way p520 system with dedicated LPAR configuration

AIX nutmeg 3 5 00CD241F4C00    06/14/05

System configuration: lcpu=4

11:51:33  cpu  %usr  %sys  %wio  %idle  physc
11:51:34    0     0     0     0    100   0.30
            1     1     1     1     98   0.69
            2     2     1     0     96   0.69
            3     0     0     0    100   0.31
            -     1     1     0     98   1.99
11:51:35    0     0     0     0    100   0.31
            1     0     0     0    100   0.69
            2     0     0     0    100   0.73
            3     0     0     0    100   0.31
            -     0     0     0    100   2.04
11:51:36    0     0     0     0    100   0.31
            1     0     0     0    100   0.69
            2     0     0     0    100   0.70
            3     0     0     0    100   0.31
            -     0     0     0    100   2.01
11:51:37    0     0     0     0    100   0.31
            1     0     0     0    100   0.69
            2     0     0     0    100   0.69
            3     0     0     0    100   0.31
            -     0     0     0    100   2.00

Average     0     0     0     0    100   0.31
            1     0     0     0     99   0.69
            2     1     0     0     99   0.70
            3     0     0     0    100   0.31
            -     0     0     0     99   2.01
mpstat
The mpstat command collects and displays performance statistics for all logical CPUs in the
system. If SMT is enabled, the mpstat -s command displays physical as well as logical processor usage, as shown in Listing 8 below.
Listing 8. A typical mpstat report from a 2-way p520 system with SPLPAR configuration

System configuration: lcpu=4

  Proc0             Proc1
 63.65%            63.65%
  cpu2    cpu0     cpu1    cpu3
58.15%   5.50%   61.43%   2.22%
lparstat
The lparstat command provides a report of LPAR-related information and utilization
statistics. This command provides a display of current LPAR-related parameters and
hypervisor information, as well as utilization statistics for the LPAR. An interval mechanism can be used to produce a given number of reports at a specified interval.
The following statistics are displayed only when the partition type is shared:
physc   Shows the number of physical processors consumed.

%entc   Shows the percentage of the entitled capacity consumed.

lbusy   Shows the percentage of logical processor utilization that occurred
        while executing at the user and system level.

app     Shows the available physical processors in the shared pool.

phint   Shows the number of phantom interruptions (targeted to another
        shared partition in this pool) received.
The following statistics are displayed only when the -h flag is specified:
%hypv   Shows the percentage of time spent in the hypervisor.

hcalls  Shows the number of hypervisor calls executed.
Listing 9. A typical lparstat report from a 2-way p520 machine

System configuration: type=Dedicated mode=Capped smt=On lcpu=4 mem=15808

%user  %sys  %wait  %idle
-----  ----  -----  -----
  0.0   0.1    0.0   99.9
  0.0   0.1    0.0   99.9
  0.4   0.2    0.1   99.3

# lparstat 1 3

System configuration: type=Shared mode=Uncapped smt=On lcpu=2 mem=2560 ent=0.50

%user  %sys  %wait  %idle  physc  %entc  lbusy  app  vcsw  phint
-----  ----  -----  -----  -----  -----  -----  ---  ----  -----
  0.3   0.4    0.0   99.3   0.01    1.1    0.0    -   346      0
 43.2   6.9    0.0   49.9   0.29   58.4   12.7    -   389      0
  0.1   0.4    0.0   99.5   0.00    0.9    0.0    -   312      0
Improving system performance
For a CPU-bound system, you can improve performance by manipulating the thread and process priorities of specific processes, or by tuning the scheduler algorithm to set a different system-wide scheduling policy.
Changing user-process priority
The commands to change or set user task priority include the nice and renice commandsand two system calls that allow thread priority and scheduling policy to be changed through
API calls.
Using the nice command
The standard nice value of a foreground process is 20; the standard nice value of a background process is 24 if started from ksh or csh (20 if started by tcsh or bsh). The
system uses the nice value to calculate the priority of all threads associated with the process.
Using the nice command, a user can specify an increment or decrement to the standard nice
value so that a process can be started with a different priority. The thread priority is still non-
fixed and gets different values based on the thread's CPU usage.
By using nice, any user can run a command at a lower priority than normal. Only root can
use nice to run commands at a priority higher than normal. For example, the command nice -5 iostat 10 3 > iostat.out causes the iostat command to start with a nice value of 25
https://www.ibm.com/developerworks/aix/library/au-aix5_cpu/#ibm-pconhttps://www.ibm.com/developerworks/aix/library/au-aix5_cpu/#ibm-pconhttps://www.ibm.com/developerworks/aix/library/au-aix5_cpu/#ibm-pcon -
7/27/2019 Cpu Monitoring and Tunig SMIT
19/26
(instead of 20), resulting in a lower starting priority. The values of nice and priority can be viewed using the ps command with the -l flag. Listing 10 shows typical output from the
ps -l command:
Listing 10. Using ps -l to observe process priority

     F S UID   PID  PPID C PRI NI  ADDR  SZ WCHAN TTY    TIME CMD
240001 A   0 15396  5746 1  60 20 393ce 732       pts/3  0:00 ksh
200001 A   0 15810 15396 3  70 25 793fe 524       pts/3  0:00 iostat
As root, you can run iostat at a higher priority with # nice --5 iostat 10 3 > io.out. The iostat command then runs with a nice value of 15, resulting in a higher starting priority.
Using the renice command
If a process is already running, you can use the renice command to alter the nice value, and
thus the priority. The processes are identified by process ID, process group ID, or the name of
the user who owns the processes. The renice command cannot be used on fixed priority
processes.
Using the setpri() and thread_setsched() subroutines
Two system calls allow individual processes or threads to be scheduled with a fixed priority. The setpri() system call is process-oriented, and thread_setsched() is thread-oriented. Use caution when calling these two subroutines, since improper use might cause the system to hang.
An application that runs under the root user ID can invoke the setpri() subroutine to set its
own priority or the priority of another process. The target process is scheduled using the
SCHED_RR scheduling policy with a fixed priority. The change is applied to all the threads in
the process. Note the following two examples:
retcode = setpri(0, 45);
Gives the calling process a fixed priority of 45.
retcode = setpri(1234, 35);
Gives the process with PID of 1234 a fixed priority of 35.
If the change is intended for a specific thread, the thread_setsched() subroutine can be
used:
retcode = thread_setsched(thread_id, priority_value, scheduling_policy);
The parameter scheduling_policy can be one of the following:
SCHED_OTHER, SCHED_FIFO, or SCHED_RR.
When SCHED_OTHER is specified as the scheduling policy, the second parameter
(priority_value) is ignored.
Changing the scheduling algorithm globally
AIX allows users to make changes to the priority calculation formula using the schedo
command.
Adjusting r and d
As mentioned earlier, the formula for calculating the priority value is as follows:
Priority = x_nice + (C * r/32 * x_nice_factor)
The recent CPU usage value is displayed as the C column in the ps command output. Themaximum value of recent CPU usage is 120. Once every second, the CPU usage value for
each thread is degraded using the following formula:
New Charge C = (Old Charge C) * d / 32
The default value of r is 16; therefore, the thread priority is penalized by recent CPU usage * 0.5. The d value also defaults to 16, which means the recent CPU usage value of every process is reduced to half of its original value once every second. For some users, the default values of sched_R and sched_D do not allow enough distinction between foreground and background processes. These two values can be tuned using the sched_R and sched_D options of the schedo command. Note the following two examples:
# schedo -o sched_R=0
(R=0, D=.5) indicates that the CPU penalty was always 0. The priority value of the
process would effectively be fixed, although it is not treated like an RR process.
# schedo -o sched_D=32
(R=0.5, D=1) indicates that long-running processes would reach a C value of 120 and
stay there. The recent CPU usage value does not get reduced once every second and
the priority of long-running processes would not fluctuate back to low numbers
(higher importance) to compete with new processes.
Changing the timeslice
Although the schedo command can modify the length of the scheduler timeslice, thetimeslice change only applies to RR threads. This does not affect threads running with other
scheduling policies. The syntax for this command is:
schedo -o timeslice=n

where n is the number of 10 ms clock ticks to be used as the timeslice. For example, schedo -p -o timeslice=2 would set the timeslice length to 20 ms.
You must log on as root to make changes using the schedo command.
Using additional techniques
Other techniques that can help a CPU-bound system include the following.
Scheduling
Depending on the relative importance of applications, you could schedule less important ones for off-shift hours using the at, cron, or batch commands.
Using the mkpasswd command
If your system has thousands of entries in the /etc/passwd file, you could use the mkpasswd command to create a hashed or indexed version of the /etc/passwd file, saving the CPU time spent looking up user IDs.
Tuning individual applications
The following techniques can help you diagnose and improve the performance of specific
applications running under AIX.
Using the ps command
The ps command or profiling can identify an application that is consuming large fractions of
CPU time. This information can then be used to narrow the search for a CPU bottleneck.
After you find the problem area, you can tune up or improve the application. You might need
to recompile the application or change the source code.
Using the schedo command
The schedo command is used to set or display current or next-boot values for all CPU scheduler tuning parameters. This command can only be executed by the root user. The schedo command can also make changes permanent or defer them until the next reboot. Beginning with AIX 5L Version 5.3, several tuning parameters have been added to the schedo command. Listing 11 shows all the CPU scheduler parameters.
Listing 11. CPU scheduler parameters

# schedo -a
                 %usDelta = 100
             affinity_lim = 7
            big_tick_size = 1
         fixed_pri_global = 0
                force_grq = 0
          hotlocks_enable = 0
   idle_migration_barrier = 4
       krlock_confer2self = n/a
     krlock_conferb4alloc = n/a
            krlock_enable = n/a
       krlock_spinb4alloc = n/a
      krlock_spinb4confer = n/a
                  maxspin = 16384
       n_idle_loop_vlopri = 100
                 pacefork = 10
                  sched_D = 16
                  sched_R = 16
    search_globalrq_mload = 256
     search_smtrunq_mload = 256
     setnewrq_sidle_mload = 384
      shed_primrunq_mload = 64
       sidle_S1runq_mload = 64
       sidle_S2runq_mload = 134
       sidle_S3runq_mload = 134
       sidle_S4runq_mload = 4294967040
       slock_spinb4confer = 1024
         smt_snooze_delay = 0
        smtrunq_load_diff = 2
                timeslice = 1
           unboost_inflih = 1
            v_exempt_secs = 2
            v_min_process = 2
              v_repage_hi = 0
            v_repage_proc = 4
               v_sec_wait = 1
Upgrading
Upgrading the system to a faster CPU or more CPUs might be necessary if tuning does notimprove the performance.
Case studies
Two real-world examples show how the performance experts from IBM implemented these
theories and techniques.
Case 1
Symptoms: The user has a batch script that starts up 500 other batch scripts, and each of
these scripts queries and updates a database. Each script also starts as a client request from
another machine. Each client request creates a database user thread on the database server
machine. The response time began at less than 10 seconds for a period of time. Then the
response time gradually became worse. At times it was more than a minute -- sometimes two
minutes.
Diagnosis: The run queue began growing until it reached into the hundreds. Another
symptom included the CPU being 100 percent utilized (this was an eight-way SMP system),
with 99 percent in user mode. By examining an AIX trace sample collected for a few
seconds, we saw a pattern emerge. While a thread was using the CPU, a network packet
would arrive and cause a network adapter interrupt. This would take the currently running
thread off its CPU so the interrupt could be serviced.
After servicing the interrupt, the scheduler checks whether any other threads are runnable and have a better priority than the currently running thread. Since the currently running thread had already run for a few timeslices, its priority value had increased (become less favored) as it accumulated CPU ticks.
Each of the 500 scripts began with priority 60. If they were runnable, they would preempt any
currently running thread with a thread priority higher than 60. The preempted thread would
then be put at the end of the run queue and have to wait for the CPU until its priority rose
again.
One effect of this preemption was that sometimes a thread would be preempted while holding
a database lock. Since this type of lock is implemented at the application layer within the
database software, the kernel does not know that the thread is holding a lock. If the lock was
a kernel-level lock or a pthread library mutex lock, then the kernel could perform priority
boosting and boost a thread's priority to the same level as that of a running thread that is requesting the lock. This way, the requesting thread does not have to wait long for the lock
holder to get the CPU again and release the lock.
Since the lock in this scenario was a user lock, the database thread would spin on the lock
until it exhausted its spin count (a tunable database parameter), and then go to sleep. So the
99 percent used CPU was mostly due to the threads spinning on database locks.
Prescription: After determining that priority preemption was having a negative effect, we
tuned the scheduler formula, which calculates the thread priority. This particular formula is:
pri = base_pri + NICE + (C * r/32)
pri is the new priority, base_pri is 40, NICE is the nice value (20 in this case), C is the CPU
usage in ticks, and r is 16.
As a thread accumulates CPU ticks, its priority value becomes larger, thereby making its
priority lower.
The schedo command provides a way to change the value of r by using the sched_R option.
Running schedo -p -o sched_R=0 causes r to be 0, which then causes the CPU penalty
factor (C * r/32) to be 0. This prevents priorities from changing, unless the nice value is
changed. If the nice value is the same for all threads, then threads can complete their
timeslices without being preempted due to priority changes. This allows the thread that is
currently running and holding the database lock to keep running and then release the lock.
Results: These changes had an instantaneous impact on the performance. The response time,
which was over two minutes by this time, started getting better until all of the scripts were
completing in just a few seconds. The C value in the priority formula is recalculated once a
second by a CPU usage decay factor (C = C*d/32). Setting the d value to 0 when using the
schedo command would have accomplished the same result. In this case, if d=0, then C*d/32
= 0. Since the CPU penalty factor is C*r/32, this also becomes 0 so that the priority will be
just 40 + NICE.
Case 2
Symptoms: A pSeries machine was used as both a database and an application server. Users
would input requests into a forms-based application and then submit the transactions. They
noticed that at certain times the forms would take longer to get updated on their screens and
their usual short-running queries would return in a longer time period.
Diagnosis: When this slowness was observed, there were also some long-running database
batch jobs that were submitted to the system. Normally, such batch jobs would be run at
night, but near the end of the month additional batch jobs were run during the day while the
users were on the system. The batch jobs were CPU-intensive and constantly on the run
queue. Therefore, users' threads had to compete with the threads of the batch jobs for the
CPU.
With priorities degrading as CPU usage increased, the batch jobs' priorities became worse
and allowed the users' threads to run. However, the kernel decays the CPU usage value C by
half once a second. This allowed the priorities of the batch jobs to improve in a short timeperiod. So the batch jobs would again compete for the CPU with the users' threads.
Prescription: By changing the decay factor (d/32) used to reduce CPU usage once a second,
we improved performance for the users. We used the schedo command to set the d value to 31. The higher the value of d, the higher the value of C remains (C = C*d/32).
Since C is used to calculate priorities (pri=40+NICE+C*r/32), the priority would get worse as
C became larger. By setting the d value to a higher number, the C value is reduced at a slower
than usual rate.
Results: The users' threads get the CPU more often than the batch threads. As a result, the
users saw an immediate improvement in performance. Of course, the batch jobs would be
slowed down somewhat, but these jobs would get the CPU whenever the users had any
"think" time or had to wait on I/O. The impact was minimal on the batch jobs, but
performance improvement for the users was dramatic.
Case study notes: Tracing a pattern
A final tip describes some odd things that impact performance. During one of our
benchmarks, we noticed that the CPU usage reached 100 percent, with most of the time being
charged to "system". At that time, the application performance degraded noticeably.
After we collected an AIX trace, we noticed a repeating pattern. One application process
would encounter a page fault on an address. That page fault caused a protection exception in
the VMM, which in turn caused the kernel to send this process a SIGSEGV (segmentationviolation) signal. When the process resumed, the page faulted on the same address again,
which then caused yet another protection exception and another SIGSEGV signal to be sent to the process. The default signal disposition for the SIGSEGV signal is to kill the process and generate a core dump, but in this case, the application continued on and stayed in this loop.
Most of the CPU time was spent in this loop.
After investigation, we discovered the problem: A developer for another component had
installed a signal handler to catch the SIGSEGV signal in the code during the test process.
After the testing was completed, the developer had forgotten to remove the signal handler.
That component was then linked with the rest of the application and, during the benchmark, another unrelated component of the application caused a segmentation fault. This old signal handler caught the exception, ignored it, and caused the process to resume. The current instruction (the one that caused the exception) was then restarted, causing an infinite loop
to occur.
Resources
- The AIX 5L Support for Micro-Partitioning and Simultaneous Multi-threading
  white paper describes the new simultaneous multi-threading and
  Micro-Partitioning technologies and the AIX 5L support for them.

- The article Operating system exploitation of the POWER5 system discusses
  how new performance features deliver improved system scalability and
  performance.

- The AIX 5L Differences Guide Version 5.3 Edition Redbook focuses on the
  differences introduced in AIX 5L Version 5.3 when compared to AIX 5L
  Version 5.2.

- The Capped and Uncapped Partitions in IBM POWER5 white paper introduces
  and explains the concepts of capped and uncapped partitions and discusses
  priority weighting and CPU utilization by memory pools.
- The AIX 5L Practical Performance Tools and Tuning Guide Redbook is a
  comprehensive guide to the performance monitoring and tuning tools that
  are provided with AIX 5L Version 5.3.

- Want more? The developerWorks AIX and UNIX zone hosts hundreds of
  informative articles and introductory, intermediate, and advanced
  tutorials.

- Get involved in the developerWorks community by participating in
  developerWorks blogs.