IBM Spectrum Scale 5.0.0: Problem Determination Guide

  • IBM Spectrum Scale Version 5.0.0

    Problem Determination Guide

    SC27-9221-02

    IBM

  • Note: Before using this information and the product it supports, read the information in "Notices" on page 691.

    This edition applies to version 5 release 0 modification 0 of the following products, and to all subsequent releases and modifications until otherwise indicated in new editions:
    - IBM Spectrum Scale ordered through Passport Advantage (product number 5725-Q01)
    - IBM Spectrum Scale ordered through AAS/eConfig (product number 5641-GPF)
    - IBM Spectrum Scale for Linux on Z (product number 5725-S28)
    - IBM Spectrum Scale for IBM ESS (product number 5765-ESS)

    Significant changes or additions to the text and illustrations are indicated by a vertical line (|) to the left of the change.

    IBM welcomes your comments; see the topic "How to send your comments" on page xxv. When you send information to IBM, you grant IBM a nonexclusive right to use or distribute the information in any way it believes appropriate without incurring any obligation to you.

    Copyright IBM Corporation 2014, 2018.
    US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

  • Contents

    Tables

    About this information
      Prerequisite and related information
      Conventions used in this information
      How to send your comments

    Summary of changes

    Chapter 1. Performance monitoring
      Network performance monitoring
        Monitoring networks using GUI
      Monitoring GPFS I/O performance with the mmpmon command
        Overview of mmpmon
        Specifying input to the mmpmon command
        Display I/O statistics per mounted file system
        Display I/O statistics for the entire node
        Understanding the node list facility
        Reset statistics to zero
        Understanding the request histogram facility
        Understanding the Remote Procedure Call (RPC) facility
        Displaying mmpmon version
        Example mmpmon scenarios and how to analyze and interpret their results
        Other information about mmpmon output
      Performance monitoring tool overview
        Configuring the performance monitoring tool
        Starting and stopping the performance monitoring tool
        Restarting the performance monitoring tool
        Configuring the metrics to collect performance data
        Viewing and analyzing the performance data
      Performance monitoring using IBM Spectrum Scale GUI
        Configuring performance monitoring options in GUI
        Configuring performance metrics and display options in the Statistics page of the GUI
        Configuring the dashboard to view performance charts
        Querying performance data shown in the GUI through CLI
        Monitoring performance of nodes
        Monitoring performance of file systems
        Monitoring performance of NSDs
      Performance monitoring limitations

    Chapter 2. Monitoring system health using IBM Spectrum Scale GUI
      Monitoring events using GUI
      Set up event notifications
        Configuring email notifications
        Configuring SNMP manager
      Monitoring tip events
      Monitoring thresholds

    Chapter 3. Monitoring system health by using the mmhealth command
      Monitoring the health of a node
      Event type and monitoring status for system health
      Threshold monitoring for system health
      System health monitoring use cases

    Chapter 4. Monitoring events through callbacks

    Chapter 5. Monitoring capacity through GUI

    Chapter 6. Monitoring AFM and AFM DR
      Monitoring fileset states for AFM
      Monitoring fileset states for AFM DR
      Monitoring health and events
        Monitoring with mmhealth
        Monitoring callback events for AFM and AFM DR
      Monitoring performance
        Monitoring using mmpmon
        Monitoring using mmperfmon
      Monitoring prefetch
      Monitoring status using mmdiag
      Policies used for monitoring AFM and AFM DR
      Monitoring AFM and AFM DR using GUI

    Chapter 7. GPFS SNMP support
      Installing Net-SNMP
      Configuring Net-SNMP
      Configuring management applications
      Installing MIB files on the collector node and management node
      Collector node administration
      Starting and stopping the SNMP subagent
      The management and monitoring subagent
        SNMP object IDs
        MIB objects
        Cluster status information
        Cluster configuration information
        Node status information
        Node configuration information
        File system status information
        File system performance information
        Storage pool information
        Disk status information
        Disk configuration information
        Disk performance information
        Net-SNMP traps

    Chapter 8. Monitoring the IBM Spectrum Scale system by using call home
      Understanding call home
      Configuring call home to enable manual and automated data upload
        Configuring the call home groups manually
        Configuring the call home groups automatically
      Monitoring, uploading, and sharing collected data with IBM Support
      Configuring call home using GUI
      Call home configuration examples

    Chapter 9. Monitoring remote cluster through GUI

    Chapter 10. Monitoring file audit logging
      Monitoring the message queue server and Zookeeper status
      Displaying the port that the Kafka broker servers are using
      Determining the current topic generation number that is being used in the file system
      Monitoring the consumer status
      Monitoring file audit logging states

    Chapter 11. Best practices for troubleshooting
      How to get started with troubleshooting
      Back up your data
      Resolve events in a timely manner
      Keep your software up to date
      Subscribe to the support notification
      Know your IBM warranty and maintenance agreement details
      Know how to report a problem
      Other problem determination hints and tips
        Which physical disk is associated with a logical volume in AIX systems?
        Which nodes in my cluster are quorum nodes?
        What is stored in the /tmp/mmfs directory and why does it sometimes disappear?
        Why does my system load increase significantly during the night?
        What do I do if I receive message 6027-648?
        Why can't I see my newly mounted Windows file system?
        Why is the file system mounted on the wrong drive letter?
        Why does the offline mmfsck command fail with "Error creating internal storage"?
        Why do I get timeout executing function error message?
        Questions related to active file management

    Chapter 12. Understanding the system limitations

    Chapter 13. Collecting details of the issues
      Collecting details of issues by using logs, dumps, and traces
        Time stamp in GPFS log entries
        Logs
        Setting up core dumps on a client RHEL system
        Configuration changes required on protocol nodes to collect core dump data
        Setting up an Ubuntu system to capture crash files
        Trace facility
      Collecting diagnostic data through GUI
      CLI commands for collecting issue details
        Using the gpfs.snap command
        mmdumpperfdata command
        mmfsadm command
        Commands for GPFS cluster state information
        GPFS file system and disk information commands
      Collecting details of the issues from performance monitoring tools
      Other problem determination tools

    Chapter 14. Managing deadlocks
      Debug data for deadlocks
      Automated deadlock detection
      Automated deadlock data collection
      Automated deadlock breakup
      Deadlock breakup on demand

    Chapter 15. Installation and configuration issues
      Resolving most frequent problems related to installation, deployment, and upgrade