M17-DataCollectionTools
7/27/2019 M17-DataCollectionTools
http://slidepdf.com/reader/full/m17-datacollectiontools 1/41
Data Collection Tools
Module 17
Data ONTAP 8.0 7-Mode
Administration
© 2009 NetApp. All rights reserved.
Module Objectives
By the end of this module, you should be able to:
Use the sysstat, stats, statit, and options commands
Describe the factors that affect RAID performance
Execute commands to collect data about write throughput
Execute commands to verify the operation of hardware, software, and network components
Identify commands and options used to obtain configuration and status
System Health
Performance problems can originate from
multiple sources. To avoid some of these
problems, check or monitor the following:
Disk configuration
– Disk status
– Write performance
– Read performance
RAID configuration
Connectivity configuration
Performance measures
Disk Status
Monitor disks:
– shelfchk
– led_on diskid and led_off diskid
(priv set advanced command)
Storage Health Monitor:
– Simple storage system management service
– Automatically initiates during system boot
– Provides background monitoring of individual disk
performance
– Detects impending disk problems before they
actually occur
– disk shm_stats (priv set advanced command)
Syslog Messages
shm: disk has reported a predicted failure (PFA) event: disk XX, serial_number XXXX
shm: link failure detected, upstream from disk: id XX, serial_number XXXXX
shm: disk I/O completion times too long: disk XX, serial number XXXXX
shm: possible link errors on disk: id XX, serial number XXXXX
shm: disk returns excessive recovered errors: disk XX, serial number XXXXX
shm: intermittent instability on the loop that is attached to Fibre Channel adapter: id XXX, name XXXXX
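As a rough illustration of how these messages might be triaged after the fact, the following Python sketch counts shm messages in a saved copy of the syslog. The message fragments come from the examples above; the category labels and function names are hypothetical and are not part of Data ONTAP.

```python
# Message fragments copied from the syslog examples above. The category
# labels on the right are illustrative names, not Data ONTAP terminology.
SHM_PATTERNS = {
    "predicted failure (PFA) event": "predicted_failure",
    "link failure detected": "link_failure",
    "disk I/O completion times too long": "slow_io",
    "possible link errors on disk": "link_errors",
    "disk returns excessive recovered errors": "recovered_errors",
    "intermittent instability on the loop": "loop_instability",
}


def classify_shm(line):
    """Return a category label for an shm syslog line, or None if not shm."""
    if "shm:" not in line:
        return None
    message = line.split("shm:", 1)[1]
    for fragment, label in SHM_PATTERNS.items():
        if fragment in message:
            return label
    return "other_shm"


def count_shm(lines):
    """Count shm messages per category across an iterable of log lines."""
    counts = {}
    for line in lines:
        label = classify_shm(line)
        if label is not None:
            counts[label] = counts.get(label, 0) + 1
    return counts
```

A repeated category for the same disk is usually a better signal than a single message, so counting per category is a reasonable first pass before reading individual entries.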
Write Performance
Write Performance Commands
Use the following commands to research write
performance:
Command    Function
sysstat    Displays current statistics
statit     Displays disk utilization
stats      Displays performance data
Write Performance: sysstat Command
system> sysstat -c 10 -s 5
 CPU   NFS  CIFS  HTTP    Net kB/s   Disk kB/s   Tape kB/s Cache
                          in   out  read write  read write   age
  2%     0     0     0     0     0     9    23     0     0   >60
  0%     0     0     0     0     0     0     0     0     0   >60
  5%     0     0     0     0     0    21    27     0     0   >60
  1%     0     0     0     0     0     0     0     0     0   >60
  5%     0     0     0     0     0    20    28     0     0   >60
  1%     0     0     0     0     0     0     0     0     0   >60
  4%     0     0     0     0     0    21    26     0     0   >60
  1%     0     0     0     0     0     0     0     0     0   >60
  5%     0     0     0     0     0    22    27     0     0   >60
  0%     0     0     0     0     0     0     0     0     0   >60
--
Summary Statistics (10 samples 5.0 secs/sample)
 CPU   NFS  CIFS  HTTP    Net kB/s   Disk kB/s   Tape kB/s Cache
                          in   out  read write  read write   age
Min
  0%     0     0     0     0     0     0     0     0     0   >60
Avg
  2%     0     0     0     0     0     9    13     0     0   >60
Max
  5%     0     0     0     0     0    22    28     0     0   >60
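The summary rows can be reproduced from the per-interval rows. The Python sketch below parses captured sysstat output and computes the minimum, average, and maximum for one column. It assumes the 11-column layout shown above; the field names are made up for this example, and other sysstat options produce different columns.

```python
# Hypothetical field names for the first ten columns of the layout above
# (the eleventh column, cache age, is not numeric and is skipped).
FIELDS = ["cpu", "nfs", "cifs", "http", "net_in", "net_out",
          "disk_read", "disk_write", "tape_read", "tape_write"]


def parse_sysstat(text):
    """Parse per-interval rows, stopping at the '--' line before the summary."""
    rows = []
    for line in text.splitlines():
        if line.strip().startswith("--"):
            break
        tokens = line.split()
        # A data row has 11 columns and starts with a CPU percentage.
        if len(tokens) == 11 and tokens[0].endswith("%"):
            values = [int(tokens[0].rstrip("%"))] + [int(t) for t in tokens[1:10]]
            rows.append(dict(zip(FIELDS, values)))
    return rows


def summarize(rows, field):
    """Return (min, average, max) of one field across all parsed rows."""
    values = [row[field] for row in rows]
    return min(values), sum(values) / len(values), max(values)


sample = """\
2% 0 0 0 0 0 9 23 0 0 >60
0% 0 0 0 0 0 0 0 0 0 >60
5% 0 0 0 0 0 21 27 0 0 >60
1% 0 0 0 0 0 0 0 0 0 >60
5% 0 0 0 0 0 20 28 0 0 >60
1% 0 0 0 0 0 0 0 0 0 >60
4% 0 0 0 0 0 21 26 0 0 >60
1% 0 0 0 0 0 0 0 0 0 >60
5% 0 0 0 0 0 22 27 0 0 >60
0% 0 0 0 0 0 0 0 0 0 >60
--
"""
rows = parse_sysstat(sample)
print(summarize(rows, "disk_write"))
```

On the ten sample rows the average disk write rate works out to 13.1 kB/s, which the summary reports as 13.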
stats: System Performance
The stats command displays statistical data about the storage system and is capable of displaying statistics on every aspect of the storage system
Statistics returned using the stats command are based on the following hierarchy:
– Objects: Any entity in the system is an object (physical or logical, including volumes, aggregates, qtrees, disks, and NICs)
– Instances: An object such as a volume called nfsflex, or an aggregate called aggr1, or a disk identified as 0b.17
– Counters: The counters associated with particular objects and instances
stats: Examples of Objects and Instances
Examples of objects:
– Aggregate
– Volume
– Qtree
– Disk
– CIFS
– NFS
– LUN
Examples of instances:
– /vol/vol0, /vol/nfstree, 0b.18
– /vol/flex1/lun_test
Examples of counters:
– cifs_ops, cifs_latency, cifs_read_ops
The stats Command
The stats command can be executed in one of
three ways, based on the frequency of displays:
Once: Current counter values are displayed
stats show
Repeating: Counter values are displayed at a fixed interval
stats show -i 1
Period: Counter values are gathered over a single period of time and then displayed
stats start then stats stop
The stats Command (Cont.)
Use stats list counters to see what is available
The statistics available through the stats infrastructure are also available using other tools such as perfmon, perfstat, and Operations Manager
The following are examples of stats commands:
system> stats show cifs:cifs:cifs_latency
cifs:cifs:cifs_latency:1.92m
system> stats show volume:vol0:write_latency
volume:vol0:write_latency:171.50us
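Each output line follows the object:instance:counter hierarchy with the value appended. A minimal Python sketch for splitting captured lines into those parts follows; the function name is hypothetical, and it assumes that counter names and values never contain a colon.

```python
def parse_stats_line(line):
    """Split 'object:instance:counter:value' into its four parts."""
    # Assumption: the counter name and the value contain no colons, so the
    # last two fields are split from the right and the object from the left.
    rest, counter, value = line.strip().rsplit(":", 2)
    obj, instance = rest.split(":", 1)
    # The value is kept as text because its unit suffix varies by counter.
    return {"object": obj, "instance": instance,
            "counter": counter, "value": value}


for text in ("cifs:cifs:cifs_latency:1.92m",
             "volume:vol0:write_latency:171.50us"):
    print(parse_stats_line(text))
```

Splitting from both ends keeps the instance field intact even if an instance name were to contain a colon.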
Read Performance
Data ONTAP is optimized for write performance
Read performance could decrease over time
NOTE: Efficient use of cache can offset some disk performance issues.
Optimization:
– To measure optimization:
reallocate measure [vol | file]
– To resolve optimization:
reallocate start <pathname>
RAID Configuration
RAID Groups
[Diagram: volumes /vol0, /vol1, and /vol2 with their RAID groups (rg0 in each, plus one rg1)]
RAID Group Size and Composition
The following are some examples of poor RAID
configuration choices:
Unnecessarily using multiple RAID groups
Using mixed disk sizes
Configuring RAID groups with wide variations in capacity
Configuring RAID groups with only one or two data disks
Configuring RAID groups with a number of disks larger than the default
Initial RAID Group Configuration
Limit the number of disks in a RAID group to the recommended numbers
Ensure that each RAID group in an aggregate has approximately the same capacity
Ensure that each RAID group in an aggregate has at least three data disks
Use disks of the same size within a RAID group to optimize write performance
Use RAID-DP™ to protect against disk failures
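Several of these guidelines can be expressed as simple checks against a planned layout. The Python sketch below is illustrative only: the RaidGroup structure and the 10% capacity tolerance are assumptions made for this example (the slides give no number for "approximately the same capacity"), and nothing here queries a real storage system.

```python
from dataclasses import dataclass


@dataclass
class RaidGroup:
    name: str
    data_disk_sizes_gb: list  # sizes of the data disks only (parity excluded)


def check_aggregate(groups, capacity_tolerance=0.10):
    """Return warnings for RAID groups that break the guidelines above.

    capacity_tolerance is a made-up threshold standing in for
    "approximately the same capacity".
    """
    if not groups:
        return []
    warnings = []
    capacities = {g.name: sum(g.data_disk_sizes_gb) for g in groups}
    largest = max(capacities.values())
    for g in groups:
        if len(g.data_disk_sizes_gb) < 3:
            warnings.append(f"{g.name}: fewer than three data disks")
        if len(set(g.data_disk_sizes_gb)) > 1:
            warnings.append(f"{g.name}: mixed disk sizes")
        if largest and capacities[g.name] < largest * (1 - capacity_tolerance):
            warnings.append(f"{g.name}: capacity well below the largest group")
    return warnings


plan = [RaidGroup("rg0", [300] * 6), RaidGroup("rg1", [300, 144])]
for warning in check_aggregate(plan):
    print(warning)
```

In this made-up plan, rg0 passes all three checks while rg1 is flagged for its disk count, its mixed disk sizes, and its much smaller capacity.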
Adding Disks to Existing RAID Groups
Add RAID groups when the applied load is stressing the drives in the current array
Add RAID groups and disks before the file system or aggregate is 80% to 90% full
Add disks in groups
Plan data expansion so that no fewer than three data disks are used for any RAID group
Monitoring Connectivity
Connectivity
Use the following to monitor connectivity:
MAC
– ifconfig
– ifstat
– arp
TCP/IP
– ifconfig
– /etc/rc and /etc/hosts
– ping
– netstat -r
Protocols
– nfsstat
– cifs stat
– nbtstat
Performance Measures
Measuring NFS Performance
options nfs.per_client_stats.enable [on|off]
Disable this option when you are not using nfsstat -l (recommended)
Data ONTAP NFS Output - Command: nfsstat -l
/n/homesystem from homesystem.corp.com:/home
Flags: vers=2,proto=udp,auth=unix,hard,intr,dynamic,rsize=8192,wsize=8192,retrans=5
Lookups: sttr=7(17ms), dev=4(20ms), cur=2(40ms)
Reads: sttr=12(30ms), dev=4(20ms), cur=3(40ms)
Writes: sttr=21(52ms), dev=5(25ms), cur=5(100ms)
All: sttr=7(7ms), dev=4(20ms), cur=2(40ms)
The output includes server name and address, mount flags, current read and write sizes, retransmissions count, and timers used for dynamic retransmission.
This display shows the breakdown on this mountpoint of lookups, reads, writes, and all operations. The average deviation and the settings for retransmissions of each type also are displayed.
Round trip response times for specific NFS operations are displayed.
Measuring CIFS Performance
Analyzing smb_hist output
CIFS request time processing: (46457) - milliseconds units
  0ms    1ms    2ms    3ms    4ms    5ms    6ms    7ms
13175  17752   5111    664    451    478    570    568
<16ms  <24ms  <32ms  <40ms  <48ms  <56ms  <64ms unused
 4039   2309    569    165     61     21     10      0
The number in parentheses (46457) is the total number of operations since smb_hist statistics were last reset.
Each column heading is a millisecond (ms) time stamp for operations.
Every other row displays the number of operations that took place in the interval in the row above it. In this example, 13,175 operations happened in less than 0.5 ms.
The time interval window lies halfway between the values for adjacent columns. In this example, 165 operations occurred in the 36-ms to 44-ms window.
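The halfway rule for interval windows can be written out as a short calculation. This Python sketch uses the column labels from the sample output; the function name is hypothetical, and treating each row of labels as its own list is a simplifying assumption.

```python
def bucket_window(labels_ms, index):
    """Return (low, high) in ms for the histogram column at `index`.

    Boundaries lie halfway between adjacent column labels. The first
    column in a list has no computed lower bound and the last has no
    computed upper bound; both are returned as None.
    """
    label = labels_ms[index]
    low = None if index == 0 else (labels_ms[index - 1] + label) / 2
    last = len(labels_ms) - 1
    high = None if index == last else (label + labels_ms[index + 1]) / 2
    return low, high


first_row = [0, 1, 2, 3, 4, 5, 6, 7]          # 0ms through 7ms columns
second_row = [16, 24, 32, 40, 48, 56, 64]     # <16ms through <64ms columns

print(bucket_window(first_row, 0))    # the 0ms column
print(bucket_window(second_row, 3))   # the <40ms column
```

For the <40ms column the boundaries fall at 36 ms and 44 ms, and for the 0ms column the upper boundary is 0.5 ms, matching the two examples in the slide notes.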
Obtaining Statistics
The statit command:
Is an advanced-mode command used for more detailed analysis of system performance
Gathers per-second statistics averaged over the length of time it is running in the background
Shows statistics representing all physical and some logical objects on the storage system
Most of the data collected represents rates at which things are happening
Using the statit Command
To obtain statistics using the statit command,
complete the following steps:
1. To enter advanced privilege mode, enter:
priv set advanced
2. To begin collecting statistics, enter:
statit -b
3. After 30 seconds (or as necessary to end statistics collection and include NFS statistics), enter:
statit -e -n
4. To return to normal admin privilege mode, enter:
priv set admin
Obtaining Statistics
The report generated is divided into the following statistics sections:
CPU
Multiprocessor
CSMP domain switches
Miscellaneous
WAFL
RAID
Network interface
Disk
Aggregate
Spares and other disks
FCP
iSCSI
Tape
CPU Statistics
CPU Statistics
506.934263 time (seconds) 100 %
275.044317 system time 54 %
23.412966 rupt time 5 % (7022 rupts x 0 usec/rupt)
251.466451 non-rupt system time 50 %
271.837944 idle time 44 %
439.543653 time in CP 92 % 100 %
21.837230 rupt time in CP 5 % (132 rupts x 0 sec/rupt)
Multiprocessor Statistics
Multiprocessor Statistics (per second)
cpu0 cpu1 total
sk switches 1378.09 46.82 1424.91
hard switches 1175.27 29.15 1204.42
domain switches 103.89 16.08 119.96
CP rupts 0.00 0.00 0.00
nonCP rupts 100.00 0.00 100.00
nonCP rupt usec 0.00 0.00 0.00
Idle 1000000.00 1000000.00 2000000.00
kahuna 0.00 0.00 0.00
network 0.00 0.00 0.00
storage 0.00 0.00 0.00
exempt 0.00 0.00 0.00
raid 0.00 0.00 0.00
target 0.00 0.00 0.00
netcache 0.00 0.00 0.00
netcache2 0.00 0.00 0.00
Miscellaneous Statistics
Miscellaneous Statistics (per second)
1893.73 hard context switches
0.00 NFS operations
0.00 CIFS operations
0.00 HTTP operations
0.00 NetCache URLs
0.00 streaming packets
0.00 network KB received
0.00 network KB transmitted
18.16 disk KB read
61.30 disk KB written
0.28 NVRAM KB written
0.00 nolog KB written
0.00 WAFL® bufs given to clients
0.00 checksum cache hits ( 0%)
0.00 no checksum - partial buffer
0.00 DAFS operations
0.00 FCP operations
0.00 iSCSI operations
WAFL Rates
WAFL Statistics (per second)
5.96 name cache hits ( 62%)
3.69 name cache misses ( 38%)
19.30 inode cache hits ( 100%)
0.00 inode cache misses ( 0%)
55.06 buf cache hits ( 100%)
0.00 buf cache misses ( 0%)
0.00 blocks read
0.00 blocks read-ahead
0.00 chains read-ahead
0.00 blocks speculative read-ahead
5.11 blocks written
0.57 stripes written
0.00 blocks over-written
0.28 wafl_timer generated CP
0.00 snapshot generated CP
0.00 wafl_avail_bufs generated CP
0.00 dirty_blk_cnt generated CP
0.00 full NV-log generated CP
0.00 back-to-back CP
0.00 flush generated CP
0.00 sync generated CP
0.00 wafl_avail_vbufs generated CP
55.06 non-restart messages
0.00 IOWAIT suspends
604852 buffers
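The percentages printed next to the cache counters appear to be each rate's share of hits plus misses. The following Python sketch reproduces that arithmetic for the name cache figures above; the function name is hypothetical and the interpretation is inferred from the sample numbers rather than from statit documentation.

```python
def hit_percent(hits_per_sec, misses_per_sec):
    """Return the hit share of all lookups as a percentage, or None if idle."""
    total = hits_per_sec + misses_per_sec
    if total == 0:
        return None  # no lookups in the sample period, so no meaningful ratio
    return 100.0 * hits_per_sec / total


# Name cache figures from the WAFL statistics above.
print(round(hit_percent(5.96, 3.69)))
```

With 5.96 hits and 3.69 misses per second the hit share rounds to 62%, the figure shown in the report.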
Network Interface Statistics
Network Interface Statistics (per second)
iface  side    bytes  packets  multicasts  errors  collisions
e0     recv   171.69     2.55        0.00    0.00        0.00
       xmit   115.22     1.42        0.00    0.00        0.00
e9     recv     0.00     0.00        0.00    0.00        0.00
       xmit     0.00     0.00        0.00    0.00        0.00
e6     recv     0.00     0.00        0.00    0.00        0.00
       xmit     0.00     0.00        0.00    0.00        0.00
vh     recv     0.00     0.00        0.00    0.00        0.00
       xmit     0.00     0.00        0.00    0.00        0.00
Disk Statistics
Disk Statistics (per second)
ut% is the percent of time the disk was busy.
xfers is the number of data transfer commands issued per second.
xfers = ureads + writes + cpreads + greads + gwrites
chain is the average number of 4K blocks per command.
usecs is the average disk round trip time per 4K block.
disk ut% xfers ureads--chain-usecs writes--chain-usecs cpreads-chain-usecs
/vol0/plex0/rg0:
8a.16 5 3.69 0.57 1.00 94500 ...
8a.21 4 3.12 0.57 1.00 39500 ...
Aggregate, Spares, and Disk Statistics
Aggregate statistics:
Minimum 0 0.00 0.00 0.00 0.00 0.00 0.00
Mean 1 0.28 0.00 0.28 0.00 0.00 0.00
Maximum 5 3.69 0.57 3.12 0.00 0.00 0.00
Spares and other disks:
8b.16 2 1.70 1.70 1.00 10167 0.00 .... . 0.00 .... . 0.00 .... . 0.00 ..
8b.17 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
8b.18 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
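The xfers identity from the disk statistics legend (xfers = ureads + writes + cpreads + greads + gwrites) can be checked against the aggregate rows. This Python sketch assumes the aggregate columns are ut%, xfers, ureads, writes, cpreads, greads, and gwrites in that order; the column order is an inference from the numbers, not something the report states, and the function name is hypothetical.

```python
def row_is_consistent(row, tolerance=0.01):
    """Check that xfers equals the sum of its five components.

    Assumed column order: ut%, xfers, ureads, writes, cpreads, greads, gwrites.
    The tolerance absorbs rounding in the two-decimal report values.
    """
    xfers = row[1]
    components = row[2:7]
    return abs(xfers - sum(components)) <= tolerance


mean_row = [1, 0.28, 0.00, 0.28, 0.00, 0.00, 0.00]
maximum_row = [5, 3.69, 0.57, 3.12, 0.00, 0.00, 0.00]

print(row_is_consistent(mean_row), row_is_consistent(maximum_row))
```

Both rows satisfy the identity under that assumed column order, which lends some support to reading the columns this way.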
FCP, iSCSI, and Tape Operations
FCP Statistics (per second)
0.00 FCP Bytes recv 0.00 FCP Bytes sent
0.00 FCP ops
iSCSI Statistics (per second)
0.00 iSCSI Bytes recv 0.00 iSCSI Bytes xmit
0.00 iSCSI ops
Interrupt Statistics (per second)
2000.15 Clock 3.97 Fast Enet
47.68 FCAL 4.54 int_22
3.41 FCAL 2059.75 total
Other Resources
For more information about data collection and performance, see the Fundamentals of Performance Analysis course.
This advanced course shows you how to:
Analyze data using recommended methodology to correlate performance data into performance analysis information
Monitor performance using performance tools and establish a baseline of expected throughput and response times for storage systems under planned and increasing workloads
Perform capacity planning by monitoring performance and comparing baseline information over time to determine when a storage system will reach maximum capacity
Perform tuning for optimal performance for protocols such as CIFS, NFS, and SAN (including locating resources with tuning guidelines for database scenarios)
Perform bottleneck analysis
Module Summary
In this module, you should have learned to:
Use the sysstat, stats, statit, and options commands
Describe the factors that affect RAID performance
Execute commands to collect data about write throughput
Execute commands to verify the operation of hardware, software, and network components
Identify commands and options used to obtain configuration and status
Exercise
Module 17: Data Collection Tools
Estimated Time: 60 minutes
Check Your Understanding
What command(s) would you use to display disk utilization?
– statit
What command(s) would you use to monitor connectivity?
– ifconfig, ifstat, arp, ping, netstat
What command(s) would you use to help detect impending disk problems before they occur?
– disk shm_stats