M17-DataCollectionTools

Data Collection Tools Module 17 Data ONTAP 8.0 7-Mode  Administration


7/27/2019 M17-DataCollectionTools

http://slidepdf.com/reader/full/m17-datacollectiontools 1/41

Data Collection Tools

Module 17

Data ONTAP 8.0 7-Mode Administration


© 2009 NetApp. All rights reserved.

Module Objectives

By the end of this module, you should be able to:

Use the sysstat, stats, statit, and options commands
Describe the factors that affect RAID performance
Execute commands to collect data about write throughput
Execute commands to verify the operation of hardware, software, and network components
Identify commands and options used to obtain configuration and status


System Health

Performance problems can originate from multiple sources. To avoid some of these problems, check or monitor the following:
Disk configuration
 – Disk status
 – Write performance
 – Read performance
RAID configuration
Connectivity configuration
Performance measures


Disk Status


Monitor disks:
 – shelfchk
 – led_on diskid and led_off diskid (priv set advanced command)
Storage Health Monitor:
 – Simple storage system management service
 – Automatically initiates during system boot
 – Provides background monitoring of individual disk performance
 – Detects impending disk problems before they actually occur
 – disk shm_stats (priv set advanced command)


Syslog Messages

shm: disk has reported a predicted failure (PFA) event: disk XX, serial_number XXXX
shm: link failure detected, upstream from disk: id XX, serial_number XXXXX
shm: disk I/O completion times too long: disk XX, serial number XXXXX
shm: possible link errors on disk: id XX, serial number XXXXX
shm: disk returns excessive recovered errors: disk XX, serial number XXXXX
shm: intermittent instability on the loop that is attached to Fibre Channel adapter: id XXX, name XXXXX
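As a rough illustration of how these messages might be triaged once they are forwarded to a log host, the following Python sketch sorts Storage Health Monitor lines into categories. The message fragments come from the slide; the category labels, the function name classify_shm, and the idea of scanning a forwarded log are assumptions made for this example, not part of Data ONTAP.

```python
# Message fragments copied from the slide; the category labels are hypothetical.
SHM_PATTERNS = {
    "predicted_failure": "predicted failure (PFA) event",
    "link_failure": "link failure detected",
    "slow_io": "disk I/O completion times too long",
    "link_errors": "possible link errors on disk",
    "recovered_errors": "disk returns excessive recovered errors",
    "loop_instability": "intermittent instability on the loop",
}


def classify_shm(line):
    """Return a category label for an shm syslog line, or None if it is not one."""
    if "shm:" not in line:
        return None
    # Collapse repeated whitespace so spacing differences do not matter.
    text = " ".join(line.split("shm:", 1)[1].split())
    for label, fragment in SHM_PATTERNS.items():
        if fragment in text:
            return label
    return "other"


# Illustrative input lines (disk IDs and serial numbers are made up).
sample_lines = [
    "shm: possible link errors on disk: id 21, serial number ABC123",
    "shm: link failure detected, upstream from disk: id 17, serial_number XYZ789",
    "nfsd: unrelated message",
]
for entry in sample_lines:
    print(classify_shm(entry), "<-", entry)
```

Matching on fixed fragments rather than on the full line keeps the sketch tolerant of timestamps or host prefixes that a syslog server may add.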


Write Performance


Write Performance Commands

Use the following commands to research write performance:

Command   Function
sysstat   Displays current statistics
statit    Displays disk utilization
stats     Displays performance data


Write Performance: sysstat Command

system> sysstat -c 10 -s 5
 CPU   NFS  CIFS  HTTP    Net kB/s   Disk kB/s   Tape kB/s Cache
                          in   out  read write  read write   age
  2%     0     0     0     0     0     9    23     0     0   >60
  0%     0     0     0     0     0     0     0     0     0   >60
  5%     0     0     0     0     0    21    27     0     0   >60
  1%     0     0     0     0     0     0     0     0     0   >60
  5%     0     0     0     0     0    20    28     0     0   >60
  1%     0     0     0     0     0     0     0     0     0   >60
  4%     0     0     0     0     0    21    26     0     0   >60
  1%     0     0     0     0     0     0     0     0     0   >60
  5%     0     0     0     0     0    22    27     0     0   >60
  0%     0     0     0     0     0     0     0     0     0   >60
--
Summary Statistics (10 samples 5.0 secs/sample)
 CPU   NFS  CIFS  HTTP    Net kB/s   Disk kB/s   Tape kB/s Cache
                          in   out  read write  read write   age
Min
  0%     0     0     0     0     0     0     0     0     0   >60
Avg
  2%     0     0     0     0     0     9    13     0     0   >60
Max
  5%     0     0     0     0     0    22    28     0     0   >60
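To show how the Min, Avg, and Max rows relate to the ten samples, the following Python sketch recomputes them for the CPU, disk read, and disk write columns. The sample values are copied from the output above; truncating the average to a whole number is an assumption that happens to reproduce the displayed figures, not a documented sysstat rule.

```python
# (CPU %, disk read kB/s, disk write kB/s) for each of the ten samples shown.
SAMPLES = [
    (2, 9, 23), (0, 0, 0), (5, 21, 27), (1, 0, 0), (5, 20, 28),
    (1, 0, 0), (4, 21, 26), (1, 0, 0), (5, 22, 27), (0, 0, 0),
]


def summarize(column):
    """Return (min, avg, max) for one column index; the average is truncated."""
    values = [row[column] for row in SAMPLES]
    return min(values), sum(values) // len(values), max(values)


for name, column in (("CPU %", 0), ("Disk read kB/s", 1), ("Disk write kB/s", 2)):
    low, avg, high = summarize(column)
    print(f"{name}: min={low} avg={avg} max={high}")
```

Because every second sample is idle, the average write rate is roughly half the rate seen in the busy intervals, which is worth keeping in mind when reading the summary alone.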


stats: System Performance

The stats command displays statistical data about the storage system and can display statistics on every aspect of the storage system.
Statistics returned using the stats command are based on the following hierarchy:
 – Objects: Any entity in the system is an object (physical or logical, including volumes, aggregates, qtrees, disks, and NICs)
 – Instances: A specific occurrence of an object, such as a volume called nfsflex, an aggregate called aggr1, or a disk identified as 0b.17
 – Counters: The counters associated with particular objects and instances


stats: Examples of Objects and Instances

Examples of objects:
 – Aggregate
 – Volume
 – Qtree
 – Disk
 – CIFS
 – NFS
 – LUN
Examples of instances:
 – /vol/vol0, /vol/nfstree, 0b.18
 – /vol/flex1/lun_test
Examples of counters:
 – cifs_ops, cifs_latency, cifs_read_ops


The stats Command

The stats command can be executed in one of three ways, based on the frequency of displays:
Once: Current counter values are displayed
  stats show
Repeating: Counter values are displayed at a fixed interval
  stats show -i 1
Period: Counter values are gathered over a single period of time and then displayed
  stats start then stats stop


The stats Command (Cont.)

Use stats list counters to see what is available
The statistics available through the stats infrastructure are also available using other tools such as perfmon, perfstat, and Operations Manager
The following are examples of stats commands:
system> stats show cifs:cifs:cifs_latency
cifs:cifs:cifs_latency:1.92m
system> stats show volume:vol0:write_latency
volume:vol0:write_latency:171.50us
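The output lines follow the object:instance:counter hierarchy described earlier, with the value as a fourth field. A minimal Python sketch for splitting such a line is shown below; the StatsValue type and parse_stats_line function are hypothetical helpers, and the sketch assumes that the object and instance names themselves contain no colons.

```python
from typing import NamedTuple


class StatsValue(NamedTuple):
    object: str
    instance: str
    counter: str
    value: str  # kept as text, including any unit suffix such as "us"


def parse_stats_line(line):
    """Split one 'object:instance:counter:value' line into its four fields."""
    parts = line.strip().split(":", 3)
    if len(parts) != 4:
        raise ValueError(f"not a stats output line: {line!r}")
    return StatsValue(*parts)


sample = parse_stats_line("volume:vol0:write_latency:171.50us")
print(sample.object, sample.instance, sample.counter, sample.value)
```

The value is left as a string because the unit suffix varies between counters; converting it to a number is a separate step that depends on the counter being read.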


Read Performance


Data ONTAP is optimized for write performance
Read performance could decrease over time
NOTE: Efficient use of cache can offset some disk performance issues.
Optimization:
 – To measure optimization:
   reallocate measure [vol | file]
 – To resolve optimization:
   reallocate start <pathname>


RAID Configuration


RAID Groups

[Figure: volumes /vol0, /vol1, and /vol2, each with a RAID group rg0; one additional RAID group, rg1, is also shown]


RAID Group Size and Composition

The following are some examples of poor RAID configuration choices:
Unnecessarily using multiple RAID groups
Using mixed disk sizes
Configuring RAID groups with wide variations in capacity
Configuring RAID groups with only one or two data disks
Configuring RAID groups with a number of disks larger than the default


Initial RAID Group Configuration

Limit the number of disks in a RAID group to the recommended numbers
Ensure that each RAID group in an aggregate has approximately the same capacity
Ensure that each RAID group in an aggregate has at least three data disks
Use disks of the same size within a RAID group to optimize write performance
Use RAID-DP™ to protect against disk failures
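The guidelines above can be expressed as a simple planning check. The Python sketch below is illustrative only: the RaidGroup type, the check_raid_groups function, and the 10% tolerance used for "approximately the same capacity" are assumptions for this example, and the disk sizes would have to be entered by hand from the planned layout.

```python
from dataclasses import dataclass


@dataclass
class RaidGroup:
    name: str
    data_disk_sizes_gb: list  # sizes of the data disks only (parity disks excluded)


def check_raid_groups(groups, max_capacity_spread=0.10):
    """Return warnings for a planned aggregate, following the slide's guidelines.

    max_capacity_spread is an assumed tolerance, not a NetApp-defined value.
    """
    warnings = []
    for group in groups:
        if len(group.data_disk_sizes_gb) < 3:
            warnings.append(f"{group.name}: fewer than three data disks")
        if len(set(group.data_disk_sizes_gb)) > 1:
            warnings.append(f"{group.name}: mixed disk sizes")
    capacities = [sum(group.data_disk_sizes_gb) for group in groups]
    if capacities and max(capacities) > 0:
        spread = (max(capacities) - min(capacities)) / max(capacities)
        if spread > max_capacity_spread:
            warnings.append("RAID group capacities vary widely")
    return warnings


plan = [RaidGroup("rg0", [300] * 6), RaidGroup("rg1", [300, 300])]
for warning in check_raid_groups(plan):
    print(warning)
```

In this hypothetical plan, rg1 is flagged twice over: it has too few data disks, and its capacity is far below that of rg0.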


Adding Disks to Existing RAID Groups

Add RAID groups when the applied load is stressing the drives in the current array
Add RAID groups and disks before the file system or aggregate is 80% to 90% full
Add disks in groups
Plan data expansion so that no fewer than three data disks are used for any RAID group


Monitoring Connectivity


Connectivity

Use the following to monitor connectivity:
MAC
 – ifconfig
 – ifstat
 – arp
TCP/IP
 – ifconfig
 – /etc/rc and /etc/hosts
 – ping
 – netstat -r
Protocols
 – nfsstat
 – cifs stat
 – nbtstat


Performance Measures


Measuring NFS Performance

options nfs.per_client_stats.enable [on|off]
Disabling this option is recommended when you are not using nfsstat -l

Data ONTAP NFS Output - Command: nfsstat -l

/n/homesystem from homesystem.corp.com:/home
Flags: vers=2,proto=udp,auth=unix,hard,intr,dynamic,rsize=8192,wsize=8192,retrans=5
Lookups: srtt=7(17ms), dev=4(20ms), cur=2(40ms)
Reads: srtt=12(30ms), dev=4(20ms), cur=3(40ms)
Writes: srtt=21(52ms), dev=5(25ms), cur=5(100ms)
All: srtt=7(7ms), dev=4(20ms), cur=2(40ms)

The output includes server name and address, mount flags, current read and write sizes, retransmissions count, and timers used for dynamic retransmission.
This display shows the breakdown on this mountpoint of lookups, reads, writes, and all operations. The average deviation and the settings for retransmissions of each type also are displayed.
Round trip response times for specific NFS operations are displayed.


Measuring CIFS Performance

Analyzing smb_hist output

CIFS request time processing: (46457) - milliseconds units
    0ms    1ms    2ms    3ms    4ms    5ms    6ms    7ms
  13175  17752   5111    664    451    478    570    568
  <16ms  <24ms  <32ms  <40ms  <48ms  <56ms  <64ms unused
   4039   2309    569    165     61     21     10      0

The number in parentheses is the total number of operations since smb_hist statistics were last reset.
Each column heading represents a millisecond (ms) time stamp for operations.
Every other row displays the number of operations that took place in the interval in the row above it. In this example, 13,175 operations happened in less than 0.5 ms.
The time interval window lies halfway between the values for adjacent columns. In this example, 165 operations occurred in the 36-ms to 44-ms window.
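One way to read a histogram like this is as a cumulative distribution: what share of requests finished at or below each bucket. The Python sketch below does that for the bucket counts shown in the slide. The shares are computed relative to the sum of the listed buckets only, and the function name cumulative_share is a hypothetical helper.

```python
# Bucket labels and counts copied from the smb_hist sample (the "unused" column is omitted).
BUCKETS = ["0ms", "1ms", "2ms", "3ms", "4ms", "5ms", "6ms", "7ms",
           "<16ms", "<24ms", "<32ms", "<40ms", "<48ms", "<56ms", "<64ms"]
COUNTS = [13175, 17752, 5111, 664, 451, 478, 570, 568,
          4039, 2309, 569, 165, 61, 21, 10]


def cumulative_share(counts):
    """Return, for each bucket, the fraction of listed operations at or below it."""
    total = sum(counts)
    running = 0
    shares = []
    for count in counts:
        running += count
        shares.append(running / total)
    return shares


for label, share in zip(BUCKETS, cumulative_share(COUNTS)):
    print(f"{label:>6}: {share:6.1%}")
```

Reading the cumulative column makes it easy to spot where the long tail begins, which is harder to see from the raw counts.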


Obtaining Statistics

The statit command:
Is an advanced-mode command used for more detailed analysis of system performance
Gathers per-second statistics averaged over the length of time it is running in the background
Shows statistics representing all physical and some logical objects on the storage system
Most of the data collected represents rates at which things are happening


Using the statit Command

To obtain statistics using the statit command, complete the following steps:
1. To enter advanced privilege mode, enter:
   priv set advanced
2. To begin collecting statistics, enter:
   statit -b
3. After 30 seconds (or as long as necessary), to end statistics collection and include NFS statistics, enter:
   statit -e -n
4. To return to normal admin privilege mode, enter:
   priv set admin
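The four steps above can be captured as an ordered command list, which is convenient when the same collection is repeated from a script or a runbook. The Python sketch below only builds the list; the function name statit_session is hypothetical, and how the commands are delivered to the storage system console (and the wait between steps 2 and 3) is deliberately left out.

```python
def statit_session(include_nfs=True):
    """Return the console commands for one statit collection, in order.

    The caller is expected to wait (for example, 30 seconds) between
    the 'statit -b' and 'statit -e' commands.
    """
    end_command = "statit -e -n" if include_nfs else "statit -e"
    return ["priv set advanced", "statit -b", end_command, "priv set admin"]


for step, command in enumerate(statit_session(), start=1):
    print(f"{step}. {command}")
```

Keeping the return to admin privilege in the same list helps ensure the session is not left in advanced mode after the collection ends.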


Obtaining Statistics

The report generated is divided into the following statistics sections:
CPU
Multiprocessor
CSMP domain switches
Miscellaneous
WAFL
RAID
Network interface
Disk
Aggregate
Spares and other disks
FCP
iSCSI
Tape


CPU Statistics

CPU Statistics

506.934263 time (seconds)        100 %
275.044317 system time            54 %
 23.412966 rupt time               5 %  (7022 rupts x 0 usec/rupt)
251.466451 non-rupt system time   50 %
271.837944 idle time              44 %
439.543653 time in CP             92 %  100 %
 21.837230 rupt time in CP         5 %  (132 rupts x 0 sec/rupt)


Multiprocessor Statistics

Multiprocessor Statistics (per second)
                          cpu0        cpu1       total
sk switches            1378.09       46.82     1424.91
hard switches          1175.27       29.15     1204.42
domain switches         103.89       16.08      119.96
CP rupts                  0.00        0.00        0.00
nonCP rupts             100.00        0.00      100.00
nonCP rupt usec           0.00        0.00        0.00
Idle                1000000.00  1000000.00  2000000.00
kahuna                    0.00        0.00        0.00
network                   0.00        0.00        0.00
storage                   0.00        0.00        0.00
exempt                    0.00        0.00        0.00
raid                      0.00        0.00        0.00
target                    0.00        0.00        0.00
netcache                  0.00        0.00        0.00
netcache2                 0.00        0.00        0.00


Miscellaneous Statistics

Miscellaneous Statistics (per second)

1893.73 hard context switches
0.00 NFS operations
0.00 CIFS operations
0.00 HTTP operations
0.00 NetCache URLs
0.00 streaming packets
0.00 network KB received
0.00 network KB transmitted
18.16 disk KB read
61.30 disk KB written
0.28 NVRAM KB written
0.00 nolog KB written
0.00 WAFL® bufs given to clients
0.00 checksum cache hits ( 0%)
0.00 no checksum - partial buffer
0.00 DAFS operations
0.00 FCP operations
0.00 iSCSI operations


WAFL Rates

WAFL Statistics (per second)

5.96 name cache hits ( 62%)
3.69 name cache misses ( 38%)
19.30 inode cache hits ( 100%)
0.00 inode cache misses ( 0%)
55.06 buf cache hits ( 100%)
0.00 buf cache misses ( 0%)
0.00 blocks read
0.00 blocks read-ahead
0.00 chains read-ahead
0.00 blocks speculative read-ahead
5.11 blocks written
0.57 stripes written
0.00 blocks over-written
0.28 wafl_timer generated CP
0.00 snapshot generated CP
0.00 wafl_avail_bufs generated CP
0.00 dirty_blk_cnt generated CP
0.00 full NV-log generated CP
0.00 back-to-back CP
0.00 flush generated CP
0.00 sync generated CP
0.00 wafl_avail_vbufs generated CP
55.06 non-restart messages
0.00 IOWAIT suspends
604852 buffers
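The percentages next to the cache lines are the share of hits (or misses) out of all lookups for that cache. The following Python sketch recomputes them from the per-second rates in the sample; the function name hit_percent is a hypothetical helper, and rounding to a whole percent is an assumption that matches the figures shown.

```python
def hit_percent(hits, misses):
    """Return the cache hit percentage, or 0.0 when there was no activity."""
    total = hits + misses
    return 0.0 if total == 0 else 100.0 * hits / total


# Per-second rates copied from the WAFL section of the sample report.
caches = {
    "name cache": (5.96, 3.69),
    "inode cache": (19.30, 0.00),
    "buf cache": (55.06, 0.00),
}
for name, (hits, misses) in caches.items():
    print(f"{name}: {round(hit_percent(hits, misses))}% hits")
```

The explicit zero-activity case avoids a division by zero on an idle system, where both rates are reported as 0.00.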


Network Interface Statistics

Network Interface Statistics (per second)

iface  side     bytes  packets  multicasts  errors  collisions
e0     recv    171.69     2.55        0.00    0.00        0.00
       xmit    115.22     1.42        0.00    0.00        0.00
e9     recv      0.00     0.00        0.00    0.00        0.00
       xmit      0.00     0.00        0.00    0.00        0.00
e6     recv      0.00     0.00        0.00    0.00        0.00
       xmit      0.00     0.00        0.00    0.00        0.00
vh     recv      0.00     0.00        0.00    0.00        0.00
       xmit      0.00     0.00        0.00    0.00        0.00


Disk Statistics

Disk Statistics (per second)

ut% is the percent of time the disk was busy.

xfers is the number of data transfer commands issued per second.

xfers = ureads + writes + cpreads + greads + gwrites

chain is the average number of 4K blocks per command.

usecs is the average disk round trip time per 4K block.

disk ut% xfers ureads--chain-usecs writes--chain-usecs cpreads-chain-usecs

/vol0/plex0/rg0:

8a.16 5 3.69 0.57 1.00 94500 ...

8a.21 4 3.12 0.57 1.00 39500 ...


Aggregate, Spares, and Disk Statistics

Aggregate statistics:

Minimum 0 0.00 0.00 0.00 0.00 0.00 0.00
Mean    1 0.28 0.00 0.28 0.00 0.00 0.00
Maximum 5 3.69 0.57 3.12 0.00 0.00 0.00

Spares and other disks:

8b.16 2 1.70 1.70 1.00 10167 0.00 .... . 0.00 .... . 0.00 .... . 0.00 ..

8b.17 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .

8b.18 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
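The xfers formula from the Disk Statistics section can be checked against the aggregate rows. The Python sketch below assumes the aggregate columns appear in the order ut%, xfers, ureads, writes, cpreads, greads, gwrites; that ordering is an inference from the formula, not something the slide states. The function name xfers_from_parts is a hypothetical helper.

```python
import math


def xfers_from_parts(ureads, writes, cpreads, greads, gwrites):
    """Sum the per-second transfer components, as in the slide's xfers formula."""
    return ureads + writes + cpreads + greads + gwrites


# Rows copied from "Aggregate statistics"; the column order is an assumption.
rows = {
    "Mean": {"xfers": 0.28, "parts": (0.00, 0.28, 0.00, 0.00, 0.00)},
    "Maximum": {"xfers": 3.69, "parts": (0.57, 3.12, 0.00, 0.00, 0.00)},
}
for name, row in rows.items():
    computed = xfers_from_parts(*row["parts"])
    matches = math.isclose(computed, row["xfers"], abs_tol=0.005)
    print(f"{name}: reported={row['xfers']} computed={computed:.2f} match={matches}")
```

A small tolerance is used in the comparison because the report rounds each value to two decimal places.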


FCP, iSCSI, and Tape Operations

FCP Statistics (per second)

0.00 FCP Bytes recv 0.00 FCP Bytes sent

0.00 FCP ops

iSCSI Statistics (per second)

0.00 iSCSI Bytes recv 0.00 iSCSI Bytes xmit

0.00 iSCSI ops

Interrupt Statistics (per second)

2000.15 Clock 3.97 Fast Enet

47.68 FCAL 4.54 int_22

3.41 FCAL 2059.75 total 


Other Resources

For more information about data collection and performance, see the Fundamentals of Performance Analysis course. This advanced course shows you how to:
Analyze data using recommended methodology to correlate performance data into performance analysis information
Monitor performance using performance tools and establish a baseline of expected throughput and response times for storage systems under planned and increasing workloads
Perform capacity planning by monitoring performance and comparing baseline information over time to determine when a storage system will reach maximum capacity
Perform tuning for optimal performance for protocols such as CIFS, NFS, and SAN (including locating resources with tuning guidelines for database scenarios)
Perform bottleneck analysis


Module Summary

In this module, you should have learned to:

Use the sysstat, stats, statit, and options commands
Describe the factors that affect RAID performance
Execute commands to collect data about write throughput
Execute commands to verify the operation of hardware, software, and network components
Identify commands and options used to obtain configuration and status


Exercise

Module 17: Data Collection Tools

Estimated Time: 60 minutes


Check Your Understanding

What command(s) would you use to display disk utilization?
 – statit
What command(s) would you use to monitor connectivity?
 – ifconfig, ifstat, arp, ping, netstat
What command(s) would you use to help detect impending disk problems before they occur?
 – disk shm_stats