M17-DataCollectionTools

7/27/2019 M17-DataCollectionTools

http://slidepdf.com/reader/full/m17-datacollectiontools 1/41

Data Collection Tools

Module 17

Data ONTAP 8.0 7-Mode

Administration



© 2009 NetApp. All rights reserved.

Module Objectives

By the end of this module, you should be able to:

Use the sysstat, stats, statit, and

options commands

Describe the factors that affect RAID

performance

Execute commands to collect data about write

throughput

Execute commands to verify the operation of hardware, software, and network components

Identify commands and options used to obtain

configuration and status




System Health

Performance problems can originate from

multiple sources. To avoid some of these

problems, check or monitor the following:

Disk configuration

– Disk status

– Write performance

– Read performance

RAID configuration

Connectivity configuration

Performance measures




Disk Status




Disk Status

Monitor disks:

– shelfchk

– led_on diskid and led_off diskid

(priv set advanced command)

Storage Health Monitor:

– Simple storage system management service

– Automatically initiates during system boot

– Provides background monitoring of individual disk

performance

– Detects impending disk problems before they

actually occur

– disk shm_stats (priv set advanced command)




Syslog Messages

shm: disk has reported a predicted failure (PFA) event: disk XX, serial_number XXXX

shm: link failure detected, upstream from disk: id XX, serial_number XXXXX

shm: disk I/O completion times too long: disk XX, serial number XXXXX

shm: possible link errors on disk: id XX, serial number XXXXX

shm: disk returns excessive recovered errors: disk XX, serial number XXXXX

shm: intermittent instability on the loop that is attached to Fibre Channel adapter: id XXX, name XXXXX



Write Performance



Write Performance Commands

Use the following commands to research write

performance:

Command Function

sysstat Displays current statistics

statit Displays disk utilization

stats Displays performance data



Write Performance: sysstat Command

system> sysstat -c 10 -s 5

CPU NFS CIFS HTTP Net kB/s Disk kB/s Tape kB/s Cache

in out read write read write age

2% 0 0 0 0 0 9 23 0 0 >60

0% 0 0 0 0 0 0 0 0 0 >60

5% 0 0 0 0 0 21 27 0 0 >60

1% 0 0 0 0 0 0 0 0 0 >60

5% 0 0 0 0 0 20 28 0 0 >60

1% 0 0 0 0 0 0 0 0 0 >60

4% 0 0 0 0 0 21 26 0 0 >60

1% 0 0 0 0 0 0 0 0 0 >60

5% 0 0 0 0 0 22 27 0 0 >60

0% 0 0 0 0 0 0 0 0 0 >60

--

Summary Statistics (10 samples 5.0 secs/sample)

CPU NFS CIFS HTTP Net kB/s Disk kB/s Tape kB/s Cache

in out read write read write age

Min

0% 0 0 0 0 0 0 0 0 0 >60

Avg

2% 0 0 0 0 0 9 13 0 0 >60

Max

5% 0 0 0 0 0 22 28 0 0 >60
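
The summary rows can be cross-checked against the ten samples. The following is a minimal Python sketch, not part of Data ONTAP: the helper names parse_row and summarize are hypothetical, the column positions are read off the header shown above, and rounding the mean to the nearest integer is an assumption that happens to reproduce the Avg row.

```python
from statistics import mean

# The ten sample rows from the sysstat output above.
SAMPLES = """\
2% 0 0 0 0 0 9 23 0 0 >60
0% 0 0 0 0 0 0 0 0 0 >60
5% 0 0 0 0 0 21 27 0 0 >60
1% 0 0 0 0 0 0 0 0 0 >60
5% 0 0 0 0 0 20 28 0 0 >60
1% 0 0 0 0 0 0 0 0 0 >60
4% 0 0 0 0 0 21 26 0 0 >60
1% 0 0 0 0 0 0 0 0 0 >60
5% 0 0 0 0 0 22 27 0 0 >60
0% 0 0 0 0 0 0 0 0 0 >60
"""


def parse_row(line):
    """Pick out CPU %, disk read kB/s, and disk write kB/s from one row."""
    fields = line.split()
    return {
        "cpu": int(fields[0].rstrip("%")),
        "disk_read": int(fields[6]),   # seventh column: Disk kB/s read
        "disk_write": int(fields[7]),  # eighth column: Disk kB/s write
    }


def summarize(rows, key):
    """Return (min, rounded mean, max) for one column (rounding is assumed)."""
    values = [row[key] for row in rows]
    return min(values), round(mean(values)), max(values)


rows = [parse_row(line) for line in SAMPLES.splitlines()]
for key in ("cpu", "disk_read", "disk_write"):
    print(key, summarize(rows, key))
```

The alternating zero rows pull the disk averages well below the busy-interval values, which is worth remembering when reading only the Avg line.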



stats: System Performance

The stats command displays statistical data about the storage system and can report statistics on every aspect of the system

Statistics returned using the stats command are

based on the following hierarchy:

– Objects: Any entity in the system, physical or logical, such as a volume, aggregate, qtree, disk, or NIC

– Instances: A specific occurrence of an object, such as a volume called nfsflex, an aggregate called aggr1, or a disk identified as 0b.17

– Counters: The individual statistics associated with a particular object and instance



stats: Examples of Objects and Instances

Examples of objects:

– Aggregate

– Volume

– Qtree

– Disk

– CIFS

– NFS

– LUN

Examples of instances:

– /vol/vol0, /vol/nfstree, 0b.18

– /vol/flex1/lun_test

Examples of counters:

– cifs_ops, cifs_latency, cifs_read_ops



The stats Command

The stats command can be executed in one of

three ways, based on the frequency of displays:

Once: Current counter values are displayed

stats show

Repeating: Counter values are displayed at a fixed interval

stats show -i 1

Period: Counter values are gathered over a single period of time and then displayed

stats start, then stats stop



The stats Command (Cont.)

Use stats list counters to see what is available

The statistics available through the stats infrastructure are also available through other tools, such as perfmon, perfstat, and Operations Manager

The following are examples of stats commands:

system> stats show cifs:cifs:cifs_latency

cifs:cifs:cifs_latency:1.92m

system> stats show volume:vol0:write_latency

volume:vol0:write_latency:171.50us
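
Each line of stats show output follows the object:instance:counter hierarchy, with the value appended as a fourth field. The sketch below splits such a line in Python. The StatSample type and parse_stats_line function are hypothetical names used only for illustration, the value is kept as text because the slides do not define the unit suffixes, and instance names that themselves contain a colon are not handled.

```python
from typing import NamedTuple


class StatSample(NamedTuple):
    obj: str       # for example "volume" or "cifs"
    instance: str  # for example "vol0"
    counter: str   # for example "write_latency"
    value: str     # left as text, including any unit suffix


def parse_stats_line(line):
    """Split one 'object:instance:counter:value' line into its four parts."""
    obj, instance, counter, value = line.strip().split(":", 3)
    return StatSample(obj, instance, counter, value)


for line in ("cifs:cifs:cifs_latency:1.92m",
             "volume:vol0:write_latency:171.50us"):
    print(parse_stats_line(line))
```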



Read Performance



Read Performance

Data ONTAP is optimized for write

performance

Read performance could decrease over time

NOTE: Efficient use of cache can offset some disk performance issues.

Optimization:

– To measure optimization:

reallocate measure [vol | file]

– To resolve optimization problems:

reallocate start <pathname>



RAID Configuration



RAID Groups

[Figure: volumes /vol0, /vol1, and /vol2 and their RAID groups (rg0, rg1)]



RAID Group Size and Composition

The following are some examples of poor RAID

configuration choices:

Unnecessarily using multiple RAID groups

Using mixed disk sizes

Configuring RAID groups with wide variations in capacity

Configuring RAID groups with only one or two data disks

Configuring RAID groups with a number of disks larger than the default



Initial RAID Group Configuration

Limit the number of disks in a RAID group to

the recommended numbers

Ensure that each RAID group in an aggregate

has approximately the same capacity

Ensure that each RAID group in an aggregate has at least three data disks

Use disks of the same size within a RAID

group to optimize write performance

Use RAID-DP™ to protect against disk failures
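
Several of these guidelines can be checked mechanically once the data-disk layout is known. The following Python sketch is illustrative only: check_raid_groups is a hypothetical helper, the input format (RAID group name mapped to a list of data-disk sizes in GB) is assumed, and the 10% capacity tolerance is an arbitrary stand-in for "approximately the same capacity", which the guidelines do not quantify.

```python
def check_raid_groups(groups, min_data_disks=3, max_capacity_spread=0.10):
    """Return warnings for RAID groups that break the guidelines above.

    groups maps a RAID group name to the sizes (in GB) of its data disks.
    max_capacity_spread is an assumed tolerance, not a NetApp value.
    """
    if not groups:
        return []
    warnings = []
    capacities = {name: sum(sizes) for name, sizes in groups.items()}
    largest = max(capacities.values())
    for name, sizes in groups.items():
        if len(sizes) < min_data_disks:
            warnings.append(f"{name}: only {len(sizes)} data disk(s)")
        if len(set(sizes)) > 1:
            warnings.append(f"{name}: mixed disk sizes")
        if capacities[name] < largest * (1 - max_capacity_spread):
            warnings.append(f"{name}: capacity well below the largest group")
    return warnings


# Hypothetical aggregate: rg1 was created with too few data disks.
print(check_raid_groups({"rg0": [300] * 6, "rg1": [300, 300]}))
```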



Adding Disks to Existing RAID Groups

Add RAID groups when the applied load is

stressing the drives in the current array

Add RAID groups and disks before the file

system or aggregate is 80% to 90% full

Add disks in groups

Plan data expansion so that no fewer than three data disks are used for any RAID group




Monitoring Connectivity




Connectivity

Use the following to monitor connectivity:

MAC

– ifconfig

– ifstat

– arp

TCP/IP

– ifconfig

– /etc/rc and /etc/hosts

– ping

– netstat -r

Protocols

– nfsstat

– cifs stat

– nbtstat




Performance Measures




Measuring NFS Performance

options nfs.per_client_stats.enable [on|off]

Disabling this option is recommended when you are not using nfsstat -l

This display shows the breakdown on this mountpoint of lookups, reads, writes, and all operations. The average deviation and the settings for retransmissions of each type also are displayed.

Data ONTAP NFS Output - Command: nfsstat -l

/n/homesystem from homesystem.corp.com:/home

Flags: vers=2,proto=udp,auth=unix,hard,intr,dynamic,rsize=8192,wsize=8192,retrans=5

Lookups: sttr=7(17ms), dev=4(20ms), cur=2(40ms)

Reads: sttr=12(30ms), dev=4(20ms), cur=3(40ms)

Writes: sttr=21(52ms), dev=5(25ms), cur=5(100ms)

All: sttr=7(7ms), dev=4(20ms), cur=2(40ms)

The output includes server name and address, mount

flags, current read and write sizes, retransmissions count,

and timers used for dynamic retransmission.

Round trip response

times for specific

NFS operations are

displayed.




Measuring CIFS Performance

Analyzing smb_hist output

CIFS request time processing: (46457) - milliseconds units

0ms 1ms 2ms 3ms 4ms 5ms 6ms 7ms

13175 17752 5111 664 451 478 570 568

<16ms <24ms <32ms <40ms <48ms <56ms <64ms unused

4039 2309 569 165 61 21 10 0

This number is the total number of

operations since smb_hist statistics were last reset.

This column represents millisecond (ms) time stamps for operations.

Every other row displays the number of operations that took place in the interval in the row above it. In this example, 13,175 operations happened in less than 0.5 ms.

The time interval window lies halfway between

the values for adjacent columns. In this

example, 165 operations occurred in the 36-ms

to 44-ms windows.
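
The halfway rule can be applied to every column of the histogram. The Python sketch below is an illustration of that rule only: bucket_windows is a hypothetical helper, the unused column is left out, the first window is assumed to start at 0, and extending the midpoint rule across the jump from the 7ms column to the <16ms column is an extrapolation from the two examples given.

```python
def bucket_windows(labels_ms):
    """Return a (low, high) window per column using midpoints of neighbors.

    The last column has no right-hand neighbor, so its high bound is None.
    """
    windows = []
    for i, label in enumerate(labels_ms):
        low = 0.0 if i == 0 else (labels_ms[i - 1] + label) / 2
        if i == len(labels_ms) - 1:
            high = None
        else:
            high = (label + labels_ms[i + 1]) / 2
        windows.append((low, high))
    return windows


# Column labels and counts from the smb_hist output above (unused omitted).
LABELS = [0, 1, 2, 3, 4, 5, 6, 7, 16, 24, 32, 40, 48, 56, 64]
COUNTS = [13175, 17752, 5111, 664, 451, 478, 570, 568,
          4039, 2309, 569, 165, 61, 21, 10]

by_label = dict(zip(LABELS, zip(bucket_windows(LABELS), COUNTS)))
print(by_label[0])   # window and count for the 0ms column
print(by_label[40])  # window and count for the <40ms column
```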




Obtaining Statistics

The statit command:

Is an advanced-mode command used for more

detailed analysis of system performance

Gathers per-second statistics averaged over the length of time it is running in the background

Shows statistics representing all physical and

some logical objects on the storage system

Most of the data collected represents rates at

which things are happening




Using the statit Command

To obtain statistics using the statit command,

complete the following steps:

1. To enter advanced privilege mode, enter:

priv set advanced

2. To begin collecting statistics, enter:

statit -b

3. After 30 seconds (or as necessary to end statistics collection and include NFS statistics), enter:

statit -e -n

4. To return to normal admin privilege mode, enter:

priv set admin




Obtaining Statistics

The report generated is divided into the following

statistics sections:

CPU

Multiprocessor

CSMP domain switches

Miscellaneous

WAFL

RAID

Network interface

Disk

Aggregate

Spares and other disks

FCP

iSCSI

Tape




CPU Statistics

CPU Statistics

506.934263 time (seconds) 100 %

275.044317 system time 54 %

23.412966 rupt time 5 % (7022 rupts x 0 usec/rupt)

251.466451 non-rupt system time 50 %

271.837944 idle time 44 %

439.543653 time in CP 92 % 100 %

21.837230 rupt time in CP 5 % (132 rupts x 0 sec/rupt)




Multiprocessor Statistics

Multiprocessor Statistics (per second)

cpu0 cpu1 total

sk switches 1378.09 46.82 1424.91

hard switches 1175.27 29.15 1204.42

domain switches 103.89 16.08 119.96

CP rupts 0.00 0.00 0.00

nonCP rupts 100.00 0.00 100.00

nonCP rupt usec 0.00 0.00 0.00

Idle 1000000.00 1000000.00 2000000.00

kahuna 0.00 0.00 0.00

network 0.00 0.00 0.00

storage 0.00 0.00 0.00

exempt 0.00 0.00 0.00

raid 0.00 0.00 0.00

target 0.00 0.00 0.00

netcache 0.00 0.00 0.00

netcache2 0.00 0.00 0.00




Miscellaneous Statistics

Miscellaneous Statistics (per second)

1893.73 hard context switches

0.00 NFS operations

0.00 CIFS operations

0.00 HTTP operations

0.00 NetCache URLs

0.00 streaming packets

0.00 network KB received

0.00 network KB transmitted

18.16 disk KB read

61.30 disk KB written

0.28 NVRAM KB written

0.00 nolog KB written

0.00 WAFL® bufs given to clients

0.00 checksum cache hits ( 0%)

0.00 no checksum - partial buffer

0.00 DAFS operations

0.00 FCP operations

0.00 iSCSI operations




WAFL Rates

WAFL Statistics (per second)

5.96 name cache hits ( 62%)

3.69 name cache misses ( 38%)

19.30 inode cache hits ( 100%)

0.00 inode cache misses ( 0%)

55.06 buf cache hits ( 100%)

0.00 buf cache misses ( 0%)

0.00 blocks read

0.00 blocks read-ahead

0.00 chains read-ahead

0.00 blocks speculative read-ahead

5.11 blocks written

0.57 stripes written

0.00 blocks over-written

0.28 wafl_timer generated CP

0.00 snapshot generated CP

0.00 wafl_avail_bufs generated CP

0.00 dirty_blk_cnt generated CP

0.00 full NV-log generated CP

0.00 back-to-back CP

0.00 flush generated CP

0.00 sync generated CP

0.00 wafl_avail_vbufs generated CP

55.06 non-restart messages

0.00 IOWAIT suspends

604852 buffers
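
The percentages printed beside the cache counters follow from the per-second hit and miss rates. A minimal Python sketch of that arithmetic is shown below. The function name hit_percent is hypothetical, and rounding to the nearest whole percent is an assumption that matches the figures in this report.

```python
def hit_percent(hits, misses):
    """Return hits as a whole-number percentage of all lookups.

    Returns None when there was no activity, to avoid dividing by zero.
    """
    total = hits + misses
    if total == 0:
        return None
    return round(100 * hits / total)


# Per-second rates taken from the WAFL section above.
print("name cache:", hit_percent(5.96, 3.69))
print("inode cache:", hit_percent(19.30, 0.00))
print("buf cache:", hit_percent(55.06, 0.00))
```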




Network Interface Statistics

Network Interface Statistics (per second)

iface side bytes packets multicasts errors collisions

e0 recv 171.69 2.55 0.00 0.00 0.00

xmit 115.22 1.42 0.00 0.00 0.00

e9 recv 0.00 0.00 0.00 0.00 0.00

xmit 0.00 0.00 0.00 0.00 0.00

e6 recv 0.00 0.00 0.00 0.00 0.00

xmit 0.00 0.00 0.00 0.00 0.00

vh recv 0.00 0.00 0.00 0.00 0.00

xmit 0.00 0.00 0.00 0.00 0.00




Disk Statistics

Disk Statistics (per second)

ut% is the percent of time the disk was busy.

xfers is the number of data transfer commands issued per second.

xfers = ureads + writes + cpreads + greads + gwrites

chain is the average number of 4K blocks per command.

usecs is the average disk round trip time per 4K block.

disk ut% xfers ureads--chain-usecs writes--chain-usecs cpreads-chain-usecs

/vol0/plex0/rg0:

8a.16 5 3.69 0.57 1.00 94500 ...

8a.21 4 3.12 0.57 1.00 39500 ...
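
Because usecs is reported per 4K block and chain is the number of blocks per command, multiplying the two gives a rough round-trip time per command. The Python sketch below applies this to the user-read columns of the rows above. The helper names are hypothetical, only the fields visible before the truncation are parsed, and the product of two averages is an approximation, not a value that statit reports.

```python
def parse_uread_fields(row):
    """Parse disk name, ut%, xfers, and the ureads/chain/usecs triple."""
    fields = row.split()
    return {
        "disk": fields[0],
        "ut_pct": int(fields[1]),
        "xfers": float(fields[2]),
        "ureads": float(fields[3]),
        "chain": float(fields[4]),
        "usecs": float(fields[5]),
    }


def per_command_ms(chain, usecs_per_block):
    """Approximate round-trip time per command, in milliseconds."""
    return chain * usecs_per_block / 1000.0


for row in ("8a.16 5 3.69 0.57 1.00 94500 ...",
            "8a.21 4 3.12 0.57 1.00 39500 ..."):
    d = parse_uread_fields(row)
    print(d["disk"], per_command_ms(d["chain"], d["usecs"]), "ms per user read")
```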




Aggregate, Spares, and Disk Statistics

Aggregate statistics:

Minimum 0 0.00 0.00 0.00 0.00 0.00 0.00

Mean 1 0.28 0.00 0.28 0.00 0.00 0.00

Maximum 5 3.69 0.57 3.12 0.00 0.00 0.00

Spares and other disks:

8b.16 2 1.70 1.70 1.00 10167 0.00 .... . 0.00 .... . 0.00 .... . 0.00 ..

8b.17 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .

8b.18 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .




FCP, iSCSI, and Tape Operations

FCP Statistics (per second)

0.00 FCP Bytes recv 0.00 FCP Bytes sent

0.00 FCP ops

iSCSI Statistics (per second)

0.00 iSCSI Bytes recv 0.00 iSCSI Bytes xmit

0.00 iSCSI ops

Interrupt Statistics (per second)

2000.15 Clock 3.97 Fast Enet

47.68 FCAL 4.54 int_22

3.41 FCAL 2059.75 total




Other Resources

For more information about data collection and performance, see the Fundamentals of Performance Analysis course. This advanced course shows you how to:

Analyze data using recommended methodology to correlate performance data into performance analysis information

Monitor performance using performance tools and establish a baseline of expected throughput and response times for storage systems under planned and increasing workloads

Perform capacity planning by monitoring performance and comparing baseline information over time to determine when a storage system will reach maximum capacity

Perform tuning for optimal performance for protocols such as CIFS, NFS, and SAN (including locating resources with tuning guidelines for database scenarios)

Perform bottleneck analysis




Module Summary

In this module, you should have learned to:

Use the sysstat, stats, statit, and

options commands

Describe the factors that affect RAID

performance

Execute commands to collect data about write

throughput

Execute commands to verify the operation of hardware, software, and network components

Identify commands and options used to obtain

configuration and status



Exercise

Module 17: Data Collection Tools

Estimated Time: 60 minutes



Check Your Understanding

What command(s) would you use to display

disk utilization?

– statit

What command(s) would you use to monitor connectivity?

– ifconfig, ifstat, arp, ping, netstat

What command(s) would you use to help

detect impending disk problems before they

occur?

– disk shm_stats
