Disk Drive Response Overview

October 12, 201 [Detecting Slow Disk Drives on E-Series Storage]

NetApp Corporation E-Series Software Engineering

DETECTING SLOW DISK DRIVES ON E-SERIES STORAGE

Michael A. Jastad, HPC Technologist NetApp, Inc. APG

Alexander Sammer, HPC Product Manager NetApp, Inc. APG

Key Words: Drive Response Time, Performance

Reader Aids -

INTRODUCTION As data storage systems grow in size and complexity, manually administering and optimizing storage performance in large storage environments is becoming more impractical and expensive. The NetApp Disk Drive Response Time (DDRT) Utility solves that problem by automating the process of monitoring, collecting, measuring, and reporting drive response time performance.

The automated results enable you to quickly and accurately identify problems, conduct root-cause analyses, and implement solutions for improving system performance in a shared file system or a High-Performance Computing (HPC) environment, thereby reducing the time, resources, and costs of maintaining large storage arrays.

KEY FEATURES OF NETAPP DDRT NetApp DDRT provides reliable and configurable monitoring and reporting that can adapt to changing conditions across multiple storage arrays, both in small-scale and high-performance computing (HPC) environments and in large production storage domains. The pluggable architecture enables NetApp DDRT to scale dynamically.

NetApp DDRT provides the metrics you need to monitor, detect, and isolate when and where an individual disk drive is failing or performing slower than desired, and it notifies you in time to repair the problem before it impacts your system.

NetApp DDRT performs the following operations to measure and detect drives that are operating slower than normal:

Calculates the mean response time of a volume group, compares the mean with each drive's response time, and measures the difference. The results are reflected in both reporting and event notification instrumentation.

Compares each drive's response time within a volume group, or collection of volume groups, against an absolute threshold value specified by a user and generates an event.

Compares the mean response time of each volume group against other volume groups configured in a threshold collection and reports the results in a graph format.

Without issuing lengthy and time-consuming commands, you can spot disk drives that exceed a defined response time and implement corrective action to increase the performance of your system.

You can monitor the response times of disk drives to see performance trending and improve system performance without disrupting normal operations.

If disk drives exceed set response times, NetApp DDRT automatically and immediately alerts you, which frees you from the task of constantly monitoring the system.

PRODUCT ASSEMBLY OVERVIEW NetApp DDRT consists of the NetApp E-Series Web Management Console integrated with the Unified Management Tool Kit (UMTK), which was developed specifically for high-performance computing (HPC) environments.

Unified Management Tool Kit The UMTK is a collection of utilities used by IT administrators operating in HPC environments. The UMTK provides provisioning and performance monitoring capabilities in systems with large storage footprints. HPC storage footprints usually scale between 10 and 20 petabytes and are controlled by over 1000 controller pairs. Systems of this scale are typically built using basic building blocks, as shown below.



[Figure: building block components: Compute Nodes; InfiniBand Channel A; InfiniBand Channel B; E-Series Storage E5400 (Pikes Peak); Management Nodes (RHEL x86_64) - Zeon; Ethernet - OOB Management; Browser Clients]

Building Block Architecture Building blocks are built from key components that provide the essential functions and features needed to discover, manage, and monitor the performance of storage arrays and disk drives. Building blocks can manage storage groups containing up to 1000 storage arrays in a group. Up to five building blocks can be replicated in an application instance, which enables the application to dynamically scale when necessary, depending on the size of the storage footprint.

The building blocks and their replication within a single application instance ensure that the application is responsive and resilient when managing and monitoring large storage silos and the performance, events, and vital product data (VPD) collected over time.

Auto Storage Device Discovery Engine The device discovery engine receives input in the form of a class B or a class C IP address, or a range of class B IP addresses, and constructs the necessary parts to complete a well-formed IP address. For each of the generated IP addresses, asynchronous socket connections are created. The asynchronous socket connections scan for services listening on the SYMbol server port 2346. When the discovery engine receives a connection response on the socket, the discovery engine attempts to connect to the SYMbol service and query for the following storage product information:

Product ID

Firmware version

Array name

SAID

When a connection is successfully established, the socket and its VPD are bundled and presented to the user.
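The discovery flow described above can be sketched roughly as follows. This is an illustrative outline only, not the DDRT implementation: the function names `candidate_hosts`, `probe`, and `scan` are hypothetical, the input is assumed to be given in CIDR notation rather than as a classful address, and the sketch stops at detecting a listening port. The SYMbol query itself is omitted because its protocol is not described here.

```python
import asyncio
import ipaddress


def candidate_hosts(network: str) -> list[str]:
    """Expand a network such as '10.1.2.0/24' into individual host addresses."""
    return [str(host) for host in ipaddress.ip_network(network).hosts()]


async def probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts a TCP connection on host:port."""
    try:
        _reader, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), timeout
        )
    except (OSError, asyncio.TimeoutError):
        return False
    writer.close()
    await writer.wait_closed()
    return True


async def scan(network: str, port: int, timeout: float = 1.0) -> list[str]:
    """Probe every host in the network concurrently; return the responders."""
    hosts = candidate_hosts(network)
    results = await asyncio.gather(*(probe(h, port, timeout) for h in hosts))
    return [h for h, is_open in zip(hosts, results) if is_open]
```

A caller would run something like `asyncio.run(scan("192.168.1.0/24", port))`, passing the SYMbol port named in the text. For a class B range, a real scanner would also need to bound the number of simultaneous connections (for example with `asyncio.Semaphore`), which this sketch does not do.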

Performance Data Polling Engine The polling engine requests performance bundles from each of the discovered storage arrays within a fixed collection cycle and inserts the raw performance data into the application's repository.

[Figure: Polling Engine flowchart: Poll Performance Data from Discovered SA; Groom raw I/O and Persist in Repository; decision "Poll Interval Expired" (Yes / No)]

In addition to the raw performance data, time stamps are collected, and the calculations are used to establish data points as an indication of performance within a specified period of time. The polling engine performs the following actions:

1. Retrieves the performance bundle from the storage array.

2. Indexes the bundle for disk drive and volume raw performance metrics.

3. Extracts the base time from the controller (or calculates the base time from the last polling cycle).

4. Resets the base time on the controller to start a new performance cycle.

5. Captures the current (observed) time.

6. Calculates the time stamp by subtracting the base time from the current time.

The polling engine then persists the following information:

Base time: controller time stamp that marks the performance data interval.

Observed time: current time.

Time stamp: difference calculated between the observed time and the base time.

Raw performance I/O for volumes and disk drives: raw I/O from the performance bundle.

Quotient: number derived by dividing the raw I/O by the time stamp.

This process is performed for all disk drives, volumes, and storage arrays.
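The arithmetic behind the persisted values can be illustrated with a small sketch. The class and field names below are hypothetical, and the choice of seconds as the time unit is an assumption; the text only states that the time stamp is the observed time minus the base time and that the quotient is the raw I/O divided by that time stamp.

```python
from dataclasses import dataclass


@dataclass
class PerformanceSample:
    base_time: float      # controller time stamp marking the interval start (assumed seconds)
    observed_time: float  # current time when the bundle was read (assumed seconds)
    raw_io: float         # raw I/O counter taken from the performance bundle

    @property
    def time_stamp(self) -> float:
        """Elapsed interval: observed time minus base time."""
        return self.observed_time - self.base_time

    @property
    def quotient(self) -> float:
        """Raw I/O divided by the time stamp, i.e. I/O per unit of time."""
        if self.time_stamp <= 0:
            raise ValueError("observed time must be later than base time")
        return self.raw_io / self.time_stamp


sample = PerformanceSample(base_time=1000.0, observed_time=1900.0, raw_io=45000.0)
print(sample.time_stamp, sample.quotient)
```

With these made-up inputs the interval is 900 time units and the quotient is 50 I/Os per unit.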

Storage Event Monitoring and Notification The NetApp event monitoring and notification assemblies are integrated as services and have two significant roles:

Threshold Event Notification

State Change Event Notification

Threshold Event Notification Generation The Threshold Monitoring and Event Generation (TME) module is activated when NetApp DDRT starts. The TME module performs batch processing every 15 minutes (sampling interval = 15m) on repository data for drive response times that have exceeded their specified threshold.

Threshold Event Notification Event notifications are sent to users by email, which is configured when a Threshold Event Notification record is created, and are posted to the Storage Health Dashboard when a specific threshold condition is detected. These conditions take the form of a threshold rule and are applied to each threshold collection to avoid, or reduce the possibility of, drives being failed arbitrarily because of transient response-time events.

Threshold Conditions

Threshold Rule        Description
3.3 Rule (default)    When 3 consecutive threshold events are observed within 3 consecutive sampling intervals.
4-in-1 Rule           When 4 threshold events are observed within a 24-hour period.
4 Decreasing          When 4 threshold events are observed over decreasing sampling intervals.
Custom                Allows a user to modify the permutations of the 3.3 and the 4-in-1 rules.
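Two of the threshold rules lend themselves to a short illustration. The sketch below is one possible reading of the 3.3 and 4-in-1 rules, not the TME module's code: the function names are hypothetical, per-interval flags and event time stamps are assumed as inputs, and the 4 Decreasing and Custom rules are left out because the text does not define them precisely enough.

```python
from datetime import datetime, timedelta


def rule_3_3(interval_flags: list[bool]) -> bool:
    """True if any 3 consecutive sampling intervals each had a threshold event."""
    run = 0
    for flagged in interval_flags:
        run = run + 1 if flagged else 0
        if run >= 3:
            return True
    return False


def rule_4_in_1(event_times: list[datetime],
                window: timedelta = timedelta(hours=24)) -> bool:
    """True if any 4 threshold events fall within one 24-hour window."""
    times = sorted(event_times)
    # Compare each event with the one three positions later in time order.
    return any(times[i + 3] - times[i] <= window for i in range(len(times) - 3))
```

Requiring several events before notifying is what filters out a single transient spike: one slow sample in an otherwise healthy run never satisfies either rule.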

State Change Event Monitoring and Notification The State Change Event Monitoring System (SMS) uses connection information from the discovery engine to monitor for events using an RPC socket connection for each discovered array. When a change event or state change is detected, the SMS refreshes the SYMbol object graph stored in cache.

NetApp DDRT behavior must respond to drive state changes within the managed environment. NetApp DDRT flow control is based on the state and the lifecycle of the monitored drives. Because drives are monitored only when participating in a volume group, data collection for a drive is bounded by the active drives configured as part of a volume group.

VG_PARTIAL: A state in which some or all redundancy of the group is lost, but in which the data is still accessible. The volume group is operable, but not exportable.

VG_MISSING: A state in which all member drives of the volume group have been removed from the array. The volume group is neither operable nor exportable.

Fenced Drive or Manually Failed Drive Drives that have been fenced or manually failed and removed from a volume group are no longer monitored. The data for the failed drive or the fenced drive remains in the database and is eventually purged from the repository within the 30-day persistence window.

DRV_STAT_FIELD: The drive is in the failed state.



Replaced Drive New drives participating in a volume group are automatically monitored for performance, response, and state change. If the new drive is replacing another drive, the replaced drive is no longer monitored. The replaced drive's performance and VPD data are eventually auto-purged after they expire within the 30-day persistence window.

DRV_REMOVED: The drive is not physically present.

DRV_REPLACED: The drive has been newly inserted into the storage array, and is currently being prepared for use.
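The monitoring lifecycle described in this section can be summarized in a few lines. This is a hedged sketch under stated assumptions: the function names, the boolean inputs, and the record layout of (time stamp, value) pairs are all hypothetical; only the 30-day persistence window and the rule that drives are monitored while they participate in a volume group come from the text.

```python
from datetime import datetime, timedelta

PERSISTENCE_WINDOW = timedelta(days=30)  # persistence window stated in the text


def is_monitored(in_volume_group: bool, failed_or_fenced: bool) -> bool:
    """A drive is monitored only while it is an active volume group member."""
    return in_volume_group and not failed_or_fenced


def purge_expired(records: list[tuple[datetime, float]],
                  now: datetime) -> list[tuple[datetime, float]]:
    """Keep only the records that are still inside the persistence window."""
    return [(ts, value) for ts, value in records if now - ts <= PERSISTENCE_WINDOW]
```

Data for a failed, fenced, or replaced drive simply stops accumulating, and its existing records age out of the repository without any special handling.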

PERFORMANCE REPORTS Report Generator The report generator queries performance data from the embedded repository collected by the polling engine. Data collection and report generation are determined by the options the user selects from the Reports section of NetApp DDRT. Performance reports can be selected and generated on a drive, volume, storage array, or collection-group basis.

Disk and Volume Performance After a drive or volume has been selected for reporting, the user can select a unit of time from a pull-down list; units of time range from 1 hour to 30 days. The default report shows the past 30 days. Disk and volume performance charts are generated from time-stamped performance data collected, and persisted in the application repository.

Disk Response Time Report The Disk Response Time Report shows a graph of the response time calculated over the past 30 days. The user can select a unit of time from a pull-down list; units of time range from 1 hour to 30 days. The default unit of time is the past 30 days. If a threshold rule has been created, and is applicable to the drive, the report displays the threshold value along with the recorded response times.

In addition to the graph, the report provides a summary showing the number of observed and notification events generated over the past 30 days. The summary also contains the drive's physical location (storage array SAID, storage array name, tray, and slot), volume group name, VPD (vendor, capacity, spindle speed, interface type, firmware level, and media type), and average drive response time.

Volume Group Mean Report The Volume Group Mean Report graphs and compares the arithmetic mean response time for all the volume groups configured as part of a collection. The mean response time of a volume group is calculated by averaging the calculated response times of all configured disk drives within the volume group. The pull-down combo selection, along with the report title, contains the user-defined collection name.

A summary table also provides additional information for each volume group in the report, including volume group name, RAID, capacity, available capacity, number of drives, drive mismatch bit set, total reads, total writes, notification events, and observed events. The summary also includes the collection name, the applied threshold condition rules, and the threshold value.

Volume Workload Report The Volume Workload Report graphs volumes grouped and configured by a user to show side-by-side volume performance. The report enables users to identify hotspots and to identify opportunities for load balancing within their computing environment.

A summary table also provides additional information for each volume in the report, including volume group name, RAID, capacity, available capacity, cache setting, total block reads, total block writes, and block size.

Volume Group Response Time Report The Volume Group Response Time Report shows a graph of the response time of each disk configured as part of that volume group. The drive response times are calculated over the past 30 days. The user can select a unit of time from a pull-down list; units of time range from 1 hour to 30 days. The default unit of time is the past 30 days. If a threshold rule has been created and applies to the set of drives, the report displays the threshold value along with the recorded response times. Graphed disk drive response time data is rendered as a line graph using variations in both colors and patterns for comparison.

In addition to the graph, the report provides a summary of the number of observed events and notification events generated over the past 30 days. The summary also contains the physical location (storage array SAID, storage array name, tray, and slot), volume group name, VPD (vendor, capacity, spindle speed, interface type, firmware level, and media type), average drive response time, and the percent of operation within the mean.



LOGGING Log Format All log entries contain a time stamp that includes the date (dd/mm/yyyy) and time (hh:mm:ss:ms), followed by the entry type (Error, Warning, Event, or Trace) and a description. All log files are located in a subfolder of the installation labeled ../log.

Error Log Error Log entries provide descriptions of error and warning conditions that are detected during process control and data flow execution.

Event Notification Log Event Notification Log entries provide descriptions of events and event notifications encountered when an observed-event-to-notification-event is triggered.

Trace Log Trace Log entries provide method entry and exit statements for trace debugging. The Trace Log is disabled by default.

MEASUREMENTS

(%) > the VG mean (default 20% for READ)

(ms) > the VG mean (default 5 ms for READ)

(ms) > actual drive response time

The user selects one of the three measurements shown above. An observed event is generated when the result of the user-selected measurement is true.

(%) > the VG Mean

Example: The table below represents disk drive response times, in milliseconds (ms), for each interval. The drives in the table are part of the same volume group.

Interval           I0     I1     I2     I3     I4     I5     I6
Drive - 0          10     11     10     10     10     11     11
Drive - 1          10     9      11     11     10     10     11
Drive - 2          9      11     8      11     10     11     10
Drive - 3          11     10     9      11     11     10     11
Drive - 4          10     11     12     13     14     14     15
Threshold Value    12     12.48  12     13.44  13.2   13.44  13.92
Result             -      -      -      -      Over   Over   Over

1. Examine the data in the table for each interval across all the drives within a given volume group, and calculate the volume group's mean response time:

For Each Drive in Interval-n
    TOTAL += Drive-n.ResponseTime

VG_MEAN = TOTAL / VG.DriveCount

2. Determine the maximum threshold value for each interval by multiplying the VG mean by a specified percent (such as 20%) and adding the result to the VG mean:

MAX_THRESHOLD_VALUE = ((VG_MEAN * 0.2) + VG_MEAN)

3. Within the interval, determine whether each drive's response time is greater than the MAX_THRESHOLD_VALUE:

For Each Drive in Interval-n
    If (Drive-n.ResponseTime > MAX_THRESHOLD_VALUE)
        Generate Event
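Steps 1 to 3 can be checked against the example table with a short script. The sketch below is illustrative only; the function name `percent_over_mean_events` and the dictionary layout are assumptions, while the response times are copied from the table.

```python
# Response times (ms) for intervals I0..I6, copied from the example table.
drives = {
    "Drive - 0": [10, 11, 10, 10, 10, 11, 11],
    "Drive - 1": [10, 9, 11, 11, 10, 10, 11],
    "Drive - 2": [9, 11, 8, 11, 10, 11, 10],
    "Drive - 3": [11, 10, 9, 11, 11, 10, 11],
    "Drive - 4": [10, 11, 12, 13, 14, 14, 15],
}


def percent_over_mean_events(drives, percent=0.20):
    """Return (thresholds, events); events holds (drive, interval index) pairs."""
    n_intervals = len(next(iter(drives.values())))
    thresholds, events = [], []
    for i in range(n_intervals):
        # Step 1: volume group mean for this interval.
        vg_mean = sum(times[i] for times in drives.values()) / len(drives)
        # Step 2: mean plus the specified percentage of the mean.
        max_threshold = vg_mean * percent + vg_mean
        thresholds.append(round(max_threshold, 2))
        # Step 3: flag every drive above the threshold.
        events += [(name, i) for name, times in drives.items()
                   if times[i] > max_threshold]
    return thresholds, events


thresholds, events = percent_over_mean_events(drives)
print(thresholds)  # one threshold per interval
print(events)      # drives that exceeded the threshold, with interval index
```

The computed thresholds agree with the Threshold Value row of the table, and the only events are for Drive - 4 in intervals I4, I5, and I6. Because the slow drive is included in the mean, it also raises its own threshold, which is why its values of 12 and 13 in I2 and I3 do not trigger events.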

(ms) > the VG Mean

Interval           I0     I1     I2     I3     I4     I5     I6
Drive - 0          10     11     10     10     10     11     11
Drive - 1          10     9      11     11     10     10     11
Drive - 2          9      11     8      11     10     11     10
Drive - 3          11     10     9      11     11     10     11
Drive - 4          10     11     12     13     16     17     17
VG Mean            10     10.4   10     11.2   11.4   11.8   12
Threshold Value    15     15.4   15     16.2   16.4   16.8   17
Result             -      -      -      -      -      Over   -

1. Examine the data for each interval across all the drives within the volume group and calculate the volume group's mean response time:

For Each Drive in Interval-n
    TOTAL += Drive-n.ResponseTime

VG_MEAN = TOTAL / VG.DriveCount

2. Determine the maximum threshold value for each interval by adding a specified value in milliseconds (such as 5 ms) to the VG mean:

MAX_THRESHOLD_VALUE = (VG_MEAN + DELTA)

3. Within the interval, determine whether each drive's response time is greater than the MAX_THRESHOLD_VALUE:

For Each Drive in Interval-n
    If (Drive-n.ResponseTime > MAX_THRESHOLD_VALUE)
        Generate Event
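The fixed-offset variant can be exercised the same way. This is a sketch under the same caveats as before: `delta_over_mean_events` is a hypothetical name, and the data is copied from the (ms) > the VG Mean table. It walks the table column by column rather than by index, which keeps the per-interval logic in one place.

```python
# Response times (ms) for intervals I0..I6, copied from the (ms) example table.
rows = {
    "Drive - 0": [10, 11, 10, 10, 10, 11, 11],
    "Drive - 1": [10, 9, 11, 11, 10, 10, 11],
    "Drive - 2": [9, 11, 8, 11, 10, 11, 10],
    "Drive - 3": [11, 10, 9, 11, 11, 10, 11],
    "Drive - 4": [10, 11, 12, 13, 16, 17, 17],
}


def delta_over_mean_events(rows, delta_ms=5.0):
    """Return (drive, interval index) pairs where time > VG mean + delta."""
    events = []
    # zip(*rows.values()) yields one tuple of response times per interval.
    for interval, column in enumerate(zip(*rows.values())):
        vg_mean = sum(column) / len(column)
        max_threshold = vg_mean + delta_ms
        for name, value in zip(rows, column):
            if value > max_threshold:
                events.append((name, interval))
    return events


print(delta_over_mean_events(rows))
```

Only Drive - 4 in interval I5 is flagged. In I6 the drive's 17 ms equals the threshold exactly, and the comparison is strictly greater-than, so no event is generated there.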



Actual Drive Response Time > User Specified Response Time

Interval           I0     I1     I2     I3     I4     I5     I6
Drive - 0          10     11     10     10     10     11     11
Drive - 1          10     9      11     11     10     10     11
Drive - 2          9      11     8      11     10     11     10
Drive - 3          11     10     9      11     11     10     11
Drive - 4          10     11     12     13     16     17     17
Threshold Value    15     15     15     15     15     15     15
Result             -      -      -      -      Over   Over   Over

Examine the data for each interval across all the drives within the volume group, and compare each response time with the user-specified threshold:

For Each Drive in Interval-n
    If (Drive-n.ResponseTime > USER_SPECIFIED_THRESHOLD)
        Generate Event

Trending Drive trending is done by examining a disk drive's response time across each iteration. (See the highlighted blue row in the table.)

Interval           I0     I1     I2     I3     I4     I5     I6
Drive - 0          10     11     10     10     10     11     11
Drive - 1          10     9      11     11     10     10     11
Drive - 2          9      11     8      11     10     11     10
Drive - 3          11     10     9      11     11     10     11

[Figure: Drive Response Times by Volume Group]

This chart shows that drive 4 is misbehaving compared to its peers within the same volume group.
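One simple way to quantify such a trend is the least-squares slope of a drive's response time against the interval index. This is an illustrative technique chosen for the sketch, not necessarily how NetApp DDRT computes trends. The function name is hypothetical, and the Drive - 4 values are taken from the (%) > the VG Mean example, which is an assumption because the trending table does not reproduce that row.

```python
def slope_per_interval(times):
    """Least-squares slope of response time (ms) against interval index."""
    n = len(times)
    x_mean = (n - 1) / 2
    y_mean = sum(times) / n
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(times))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    return numerator / denominator


drive_0 = [10, 11, 10, 10, 10, 11, 11]
drive_4 = [10, 11, 12, 13, 14, 14, 15]  # assumed: values from the first example

print(round(slope_per_interval(drive_0), 3))
print(round(slope_per_interval(drive_4), 3))
```

For these values Drive - 0 drifts by about 0.1 ms per interval, while Drive - 4 climbs by roughly 0.8 ms per interval, which is the kind of sustained rise that distinguishes a degrading drive from ordinary jitter.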

CONCLUSION Storage administrators operating within HPC and supercomputing environments must have reliable, configurable, and scalable monitoring and reporting tools that can adapt to the changing conditions within the managed environment. Challenges in these environments can vary, but the primary goals are to ensure that systems are highly available and operating at peak performance.

The NetApp DDRT Utility is the comprehensive software solution that automates the process of detecting, locating, and isolating slow disk drives, and provides storage administrators with the tools and analytics necessary to quickly and accurately clear performance bottlenecks and reconcile load-balancing issues.
