Analyzing Esxtop Data


8/9/2019 Analyzing Esxtop Data

http://slidepdf.com/reader/full/analyzing-esxtop-data 1/6

Analyzing esxtop data

 by admin

I’ve recently written a post about how to collect data with esxtop and resxtop, but how do you interpret that data? esxtop is a great tool for troubleshooting and for determining whether there are any capacity issues in your environment. There are many metrics available, too many to cover in just this one post, so I will concentrate on the ones used most often when investigating issues related to storage, network, CPU and memory capacity/performance.

Analyzing Disk Performance with esxtop

There are three screens in esxtop relating to disk performance. There is the disk device screen (accessed by pressing ‘u’):

 8:51:42am up 13:29, 313 worlds, 4 VMs, 4 vCPUs; CPU load average: 0.02, 0.15, 0.05

DEVICE                         PATH/WORLD/PARTITION  DQLEN  WQLEN  ACTV  QUED  %USD  LOAD  CMDS/s  READS/s
mpx.vmhba1:C0:T0:L0            -                        32      -     0     0     0  0.00   11.51     9.92
mpx.vmhba1:C0:T1:L0            -                        32      -     0     0     0  0.00    0.00     0.00
mpx.vmhba1:C0:T2:L0            -                        32      -     0     0     0  0.00    0.00     0.00
mpx.vmhba32:C0:T0:L0           -                         1      -     0     0     0  0.00    0.00     0.00
t10.405E4494C4540013C2555862A  -                       128      -     0     0     0  0.00    0.00     0.00

And the disk adapter screen, accessed by pressing ‘d’:

 8:52:18am up 13:29, 313 worlds, 4 VMs, 4 vCPUs; CPU load average: 0.02, 0.15, 0.05

ADAPTR   PATH  NPTH  CMDS/s  READS/s  WRITES/s  MBREAD/s  MBWRTN/s  DAVG/cmd  KAVG/cmd  GAVG/cmd
vmhba0   -        0    0.00     0.00      0.00      0.00      0.00      0.00      0.00      0.00
vmhba1   -        3    5.94     5.54      0.40      0.01      0.00      0.19      0.01      0.20
vmhba32  -        1    0.00     0.00      0.00      0.00      0.00      0.00      0.00      0.00
vmhba33  -        2    0.00     0.00      0.00      0.00      0.00      0.00      0.00      0.00

The last one is the VM disk screen, accessed by pressing ‘v’:


 4:43:5pm up 1 day 1:52, 306 worlds, 1 VMs, 1 vCPUs; CPU load average: 0.02, 0.02, 0.01

GID    VMNAME  VDEVNAME  NVDISK  CMDS/s  READS/s  WRITES/s  MBREAD/s  MBWRTN/s  LAT/rd  LAT/wr
83880  XP      -              1    0.00     0.00      0.00      0.00      0.00    0.00    0.00

The main disk latency metrics to be aware of here, as described in this KB article, are:

• CMDS/s - This is the total number of commands per second, which includes IOPS and other SCSI commands (e.g. reservations and locks). Generally speaking CMDS/s = IOPS unless there are a lot of other SCSI operations/metadata operations such as reservations.

• DAVG/cmd - This is the average response time in milliseconds per command being sent to the storage device.

• KAVG/cmd - This is the amount of time the command spends in the VMkernel.

• GAVG/cmd - This is the response time as experienced by the guest OS. This is calculated by adding together the DAVG and the KAVG values.

As a general rule DAVG/cmd, KAVG/cmd and GAVG/cmd should not exceed 10 milliseconds (ms) for sustained lengths of time.
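The latency rule of thumb can be sketched in Python. This is only an illustration and not part of esxtop: the function names are hypothetical, and `min_consecutive` is an assumed stand-in for "sustained", which the text does not quantify.

```python
LATENCY_LIMIT_MS = 10.0  # rule-of-thumb ceiling quoted in the text


def gavg(davg_ms, kavg_ms):
    """Guest latency is device latency plus VMkernel latency."""
    return davg_ms + kavg_ms


def sustained_breach(samples_ms, limit_ms=LATENCY_LIMIT_MS, min_consecutive=3):
    """Return True if the limit is exceeded for min_consecutive samples in a row.

    min_consecutive is an assumption; pick a value that suits your sampling interval.
    """
    run = 0
    for value in samples_ms:
        run = run + 1 if value > limit_ms else 0
        if run >= min_consecutive:
            return True
    return False


# Sample values: DAVG 0.19 ms and KAVG 0.01 ms give a GAVG of about 0.20 ms
print(round(gavg(0.19, 0.01), 2))
# A single spike is ignored, a run of three high samples is flagged
print(sustained_breach([12.0, 3.0, 15.0, 11.0]))
print(sustained_breach([12.0, 15.0, 11.0]))
```

A short spike above 10 ms is not treated as a problem here; only a run of consecutive high samples is reported.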

There are also the following throughput metrics to be aware of:

• CMDS/s - As discussed above

• READS/s - Number of read commands issued per second

• WRITES/s - Number of write commands issued per second

• MBREAD/s - Megabytes read per second

• MBWRTN/s - Megabytes written per second
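The throughput counters can be combined to sanity-check a sample. The sketch below is a hypothetical helper, not an esxtop feature: it estimates how many commands per second are not plain reads or writes, and derives an average read size, assuming 1 MB = 1024 KB.

```python
def other_scsi_cmds(cmds_per_s, reads_per_s, writes_per_s):
    """Commands per second that are neither reads nor writes (never negative)."""
    return max(cmds_per_s - (reads_per_s + writes_per_s), 0.0)


def avg_read_size_kb(mbread_per_s, reads_per_s):
    """Average size of a read in KB; assumes 1 MB = 1024 KB."""
    if reads_per_s == 0:
        return 0.0
    return mbread_per_s * 1024.0 / reads_per_s


# Sample values: 5.94 CMDS/s made up of 5.54 READS/s and 0.40 WRITES/s
print(other_scsi_cmds(5.94, 5.54, 0.40))
# 1 MB/s spread over 256 reads per second
print(avg_read_size_kb(1.0, 256.0))
```

When CMDS/s is noticeably larger than READS/s plus WRITES/s, the difference is the other SCSI traffic (reservations, locks and similar) mentioned earlier.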

Analyzing CPU Performance with esxtop

Before looking at the metrics, I want to say a little bit about worlds. A world, as viewed in esxtop, is an entity that the VMkernel schedules resources for, similar to a process in Windows, for example. A powered-on virtual machine will consist of multiple worlds, with each allocated vCPU, for example, having its own world. When you look at a VM in the CPU view of esxtop you are looking at the world group for the VM, which contains all the worlds that make up the running virtual machine.


On the CPU screen, accessed by pressing ‘c’, you can choose to filter the list to see only the virtual machines:

 3:51:30am up 2 days 3:59, 304 worlds, 1 VMs, 1 vCPUs; CPU load average: 0.01, 0.01, 0.01
PCPU USED(%): 1.9 1.8 1.9 1.9 AVG: 1.9
PCPU UTIL(%): 4.1 3.8 2.8 3.6 AVG: 3.

ID     GID    NAME  NWLD  %USED  %RUN  %SYS  %WAIT   %VMWAIT  %RDY  %IDLE  %OVRLP  %CSTP  %MLMTD  %SWPWT
83880  83880  XP       5   1.31  1.16  0.13  496.45     0.06  1.65  98.19    0.02   0.00    0.00    0.00

To expand a world group for a VM, press ‘e’ then type in the GID:

 3:52:44am up 2 days 4:00, 30 worlds, 1 VMs, 1 vCPUs; CPU load average: 0.01, 0.01, 0.01
PCPU USED(%): 1.3 0.9 1.2 0. AVG: 1.0
PCPU UTIL(%): 2.0 1.0 1.4 0.8 AVG: 1.3

ID      GID    NAME             NWLD  %USED  %RUN  %SYS  %WAIT  %VMWAIT  %RDY  %IDLE  %OVRLP  %CSTP  %MLMTD  %SWPWT
10305   83880  vmx                 1    0.1   0.1  0.00  99.60        -  0.04   0.00    0.00   0.00    0.00    0.00
10308   83880  vmast.10306         1   0.00  0.00  0.00  99.89        -  0.01   0.00    0.00   0.00    0.00    0.00
10309   83880  vmx-vthread-4:X     1   0.00  0.00  0.00  99.90        -  0.00   0.00    0.00   0.00    0.00    0.00
103060  83880  vmx-mks:XP          1   0.01  0.01  0.00  99.89        -  0.00   0.00    0.00   0.00    0.00    0.00
103061  83880  vmx-vcpu-0:XP       1    0.9  0.69   0.1  98.59      0.0  0.52  98.53    0.01   0.00    0.00    0.00

So, what are the main CPU counters to be aware of? First of all, there are the ones relating to the physical CPUs in the host. These are:

• PCPU USED(%) - The percentage CPU usage per PCPU and the average CPU usage across all PCPUs.

• PCPU UTIL(%) - The percentage of unhalted CPU cycles per PCPU and the average across all PCPUs.

If these values are high it means that you are using a lot of CPU resource on the host. If all of the PCPUs are running at or close to 100% it is likely that you are overcommitting your CPU resources.
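A minimal Python sketch of that check follows. The function name and the 90% default are assumptions: the text only says "at or close to 100%", so choose a threshold that suits your environment.

```python
def host_cpu_saturated(pcpu_util_pct, threshold_pct=90.0):
    """Return True when every PCPU is at or above the threshold.

    threshold_pct is an assumed reading of "close to 100%".
    """
    if not pcpu_util_pct:
        return False
    return all(value >= threshold_pct for value in pcpu_util_pct)


# Sample PCPU UTIL(%) values from a lightly loaded four-core host
print(host_cpu_saturated([4.1, 3.8, 2.8, 3.6]))
# Every core busy: a likely sign of CPU overcommitment
print(host_cpu_saturated([99.0, 97.5, 100.0, 95.2]))
```

Requiring all PCPUs to be busy, rather than just one, mirrors the wording above: a single hot core is not necessarily a sign of an overcommitted host.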

Some of the metrics relating to the worlds to pay attention to are:

• %USED - This is the percentage of CPU time accounted to the world. This value can be over 100 because, when viewing the world group for the VM, the maximum value is the number of worlds in the group (NWLD) multiplied by 100. If the %USED value is high it means the VM is using lots of CPU resource. You can expand the VM’s world group to see what is using the resource. Using the example above, the VM’s world group has 5 worlds, which can be seen in the expanded view above.

• %SYS - This is the percentage of time that the system services are spending on the VM. If this value is high it tends to mean that the VM is experiencing high IO.

• %OVRLP - This is the percentage of time spent by system services on other worlds. When this value is high it is normally an indication that the host is experiencing high IO.

• %RUN - This is the percentage of total time scheduled for the world to run. %USED = %RUN + %SYS - %OVRLP. When the %RUN value of a virtual machine is high, it means the VM is using a lot of CPU resource.

• %RDY - This is the percentage of time a world is waiting to run. If this value is higher than 20% it means that the virtual machine is possibly under resource contention. Remember that this value is per vCPU world, so for virtual machines with multiple vCPUs you can expect higher values.

• %MLMTD - This is the percentage of time the world was ready to run but was deliberately not scheduled as it would have violated CPU limits. This value is contained in %RDY. If this value is high then you could increase its limit or add more vCPUs.

• %CSTP - This is the amount of time the world has spent in the ready, co-deschedule state. This is only applicable for SMP VMs. The scheduler tries to execute on all vCPUs. The %CSTP value is the time the vCPU is stopped from executing whilst waiting for other vCPUs in the same virtual machine to execute/catch up.

• %WAIT - The percentage of time a world has spent in the wait state. %WAIT is the total wait time, which includes %IDLE and IO wait time.

• %IDLE - The percentage of time a world is in the idle loop.

• %SWPWT - The percentage of time the world is waiting for the VMkernel swapping memory.

Some things to note:

• %USED = %RUN + %SYS - %OVRLP

• 100% = %RUN + %RDY + %CSTP + %WAIT
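These relationships can be written out as a short Python sketch. All function names are hypothetical. Dividing a world group's %RDY by its vCPU count is an assumed way of getting a per-vCPU figure, and 20% is the %RDY threshold quoted for a vCPU world.

```python
def used_pct(run_pct, sys_pct, ovrlp_pct):
    """%USED = %RUN + %SYS - %OVRLP."""
    return run_pct + sys_pct - ovrlp_pct


def wait_pct(run_pct, rdy_pct, cstp_pct):
    """Per world, 100% = %RUN + %RDY + %CSTP + %WAIT, so %WAIT is the remainder."""
    return 100.0 - run_pct - rdy_pct - cstp_pct


def max_used_pct(nwld):
    """Upper bound of %USED for a world group with nwld worlds."""
    return nwld * 100.0


def ready_contention(group_rdy_pct, num_vcpus, limit_pct=20.0):
    """Flag possible contention from the group %RDY averaged per vCPU (assumption)."""
    return group_rdy_pct / num_vcpus > limit_pct


print(round(used_pct(1.16, 0.13, 0.02), 2))   # sample values
print(max_used_pct(5))                        # a group of five worlds
print(ready_contention(50.0, 4))              # 12.5% per vCPU
print(ready_contention(50.0, 2))              # 25% per vCPU
```

Normalising %RDY by the number of vCPUs keeps a large SMP VM from being flagged simply because its group total is higher.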

Analyzing Memory Performance with esxtop

You can view the memory performance data in esxtop by pressing ‘m’:


11:10:1pm up 5:11, 315 worlds, 2 VMs, 4 vCPUs; MEM overcommit avg: 0.00, 0.00, 0.00
PMEM   /MB: 4095 total: 80 vmk, 641 other, 2492 free
VMKMEM /MB: 4066 managed: 244 minfree, 245 rsvd, 121 ursvd, high state
PSHARE /MB: 9 shared, 39 common: 30 saving
SWAP   /MB: 0 curr, 0 rclmtgt: 0.00 r/s, 0.00 w/s
ZIP    /MB: 0 zipped, 0 saved
MEMCTL /MB: 0 curr, 0 target, 254 max

GID    NAME  MEMSZ  GRANT   SZTGT  TCHD   TCHD_W  SWCUR  SWTGT  SWR/s  SWW/s  LLSWR/s  LLSWW/s  OVHDUW
24950  XP1   25.00  255.66  30.66  81.92    9.12   0.00   0.00   0.00   0.00     0.00     0.00    5.98
2492   XP2   25.00  255.66  30.55   9.12   51.20   0.00   0.00   0.00   0.00     0.00     0.00    5.98

The physical memory is shown by the PMEM metric. In the example above we can see that this ESXi host has 4GB RAM, with part of it in use by the VMkernel (the vmk value) and 641MB in use by other processes. There is 2492 MB free.

Of the metrics relating to the virtual machine worlds:

• MEMSZ - This is the value, in MB, of the configured guest memory.

• GRANT - This is the amount of memory that has been granted to the world group.

• %ACTV - This is the percentage of active guest memory.

• MCTLSZ - This is the amount of guest memory, in MB, reclaimed by the balloon driver. If this is high, it can be a sign of memory contention on the host.

• SWCUR - Current swap usage. If this is high it is a sign of memory contention on the host.
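As a rough illustration, the two contention indicators can be checked per VM in Python. The function name, the input layout and the zero defaults for both limits are assumptions: the text only says "high", so adjust the limits to your own baseline.

```python
def memory_pressure_signs(vms, balloon_limit_mb=0.0, swap_limit_mb=0.0):
    """Map each VM name to the contention signs it shows.

    vms maps a VM name to a dict holding its MCTLSZ and SWCUR values in MB.
    Both limits are assumed values; the text gives no numeric threshold.
    """
    findings = {}
    for name, counters in vms.items():
        signs = []
        if counters["MCTLSZ"] > balloon_limit_mb:
            signs.append("ballooning")
        if counters["SWCUR"] > swap_limit_mb:
            signs.append("swapping")
        if signs:
            findings[name] = signs
    return findings


# Hypothetical sample: one healthy VM, one that is ballooned and swapped
sample = {
    "vm-a": {"MCTLSZ": 0.0, "SWCUR": 0.0},
    "vm-b": {"MCTLSZ": 128.0, "SWCUR": 16.0},
}
print(memory_pressure_signs(sample))
```

Only VMs that show at least one sign appear in the result, which keeps the output short on a healthy host.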

Analyzing Network Performance with esxtop

Network performance data in esxtop is accessed by pressing ‘n’:

11:40:40pm up 5:41, 314 worlds, 2 VMs, 4 vCPUs; CPU load average: 0.04, 0.04, 0.16

PORT-ID   USED-BY           TEAM-PNIC  DNAME     PKTTX/s  MbTX/s  PKTRX/s  MbRX/s  %DRPTX  %DRPRX
33554433  Management        n/a        vSwitch0     0.00    0.00     0.00    0.00    0.00    0.00
33554434  vmnic0            -          vSwitch0     6.80    0.02     16.5    0.03    0.00    0.00
33554435  Shadow of vmnic0  n/a        vSwitch0     0.00    0.00     0.00    0.00    0.00    0.00
3355443   vmnic2            -          vSwitch0     0.00    0.00    25.36    0.04    0.00    0.00
33554436  Shadow of vmnic2  n/a        vSwitch0     0.00    0.00     0.00    0.00    0.00    0.00
33554438  vmk0              vmnic0     vSwitch0    10.63    0.02     4.88    0.01    0.00    0.00
33554439  vmk2              vmnic2     vSwitch0     0.00    0.00     0.00    0.00    0.00    0.00

Metrics to look out for here are MbTX/s (megabits transmitted per second) and MbRX/s (megabits received per second). Keep an eye on %DRPTX and %DRPRX as they can be an indicator of a busy or saturated network.
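A small Python sketch of how these counters might be checked. The function names and the input layout are hypothetical, and treating any drop above 0% as worth reporting is an assumption rather than a rule from the text.

```python
def ports_with_drops(ports, drop_limit_pct=0.0):
    """Return the names of ports whose %DRPTX or %DRPRX exceeds the limit.

    drop_limit_pct defaults to 0, an assumed "report any drops" policy.
    """
    return [
        port["name"]
        for port in ports
        if port["%DRPTX"] > drop_limit_pct or port["%DRPRX"] > drop_limit_pct
    ]


def megabits_to_megabytes(megabits_per_s):
    """Convert MbTX/s or MbRX/s (megabits) to megabytes per second."""
    return megabits_per_s / 8.0


# Hypothetical sample rows
sample_ports = [
    {"name": "vmnic0", "%DRPTX": 0.0, "%DRPRX": 0.0},
    {"name": "vmnic2", "%DRPTX": 0.0, "%DRPRX": 1.5},
]
print(ports_with_drops(sample_ports))
print(megabits_to_megabytes(800.0))
```

The conversion helper is there because the network screen reports megabits, while the disk screens report megabytes, which is easy to mix up when comparing the two.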