Transcend NCS Network Troubleshooting Guide

3Com Transcend Network Management

Network Control Services
Version 5.0 for UNIX®

Network Troubleshooting Guide

Businesses run on networks and networks run with management.

Part No. 09-1500-000

3Com Corporation
5400 Bayfront Plaza
Santa Clara, California 95052-8145

Copyright © 1999, 3Com Corporation. All rights reserved. No part of this documentation may be reproduced in any form or by any means or used to make any derivative work (such as translation, transformation, or adaptation) without written permission from 3Com Corporation.

3Com Corporation reserves the right to revise this documentation and to make changes in content from time to time without obligation on the part of 3Com Corporation to provide notification of such revision or change.

3Com Corporation provides this documentation without warranty, term, or condition of any kind, either implied or expressed, including, but not limited to, the implied warranties, terms or conditions of merchantability, satisfactory quality, and fitness for a particular purpose. 3Com may make improvements or changes in the product(s) and/or the program(s) described in this documentation at any time.

If there is any software on removable media described in this documentation, it is furnished under a license agreement included with the product as a separate document, in the hard copy documentation, or on the removable media in a directory file named LICENSE.TXT or !LICENSE.TXT. If you are unable to locate a copy, please contact 3Com and a copy will be provided to you.

UNITED STATES GOVERNMENT LEGEND

If you are a United States government agency, then this documentation and the software described herein are provided to you subject to the following:

All technical data and computer software are commercial in nature and developed solely at private expense. Software is delivered as “Commercial Computer Software” as defined in DFARS 252.227-7014 (June 1995) or as a “commercial item” as defined in FAR 2.101(a) and as such is provided with only such rights as are provided in 3Com’s standard commercial license for the Software. Technical data is provided with limited rights only as provided in DFAR 252.227-7015 (Nov 1995) or FAR 52.227-14 (June 1987), whichever is applicable. You agree not to remove or deface any portion of any legend provided on any licensed program or documentation contained in, or delivered to you in conjunction with, this User Guide.

Portions of this documentation are reproduced in whole or in part with permission from (as appropriate).

Unless otherwise indicated, 3Com registered trademarks are registered in the United States and may or may not be registered in other countries.

3Com, the 3Com logo, Boundary Routing, EtherDisk, EtherLink, EtherLink II, LinkBuilder, Net Age, NETBuilder, NETBuilder II, OfficeConnect, Parallel Tasking, SmartAgent, SuperStack, TokenDisk, TokenLink, LinkSwitch® 1000, LinkSwitch® 3000, Transcend, and ViewBuilder are registered trademarks of 3Com Corporation. ATMLink, AutoLink, CoreBuilder, DynamicAccess, FDDILink, NetProbe, and PACE are trademarks of 3Com Corporation. 3ComFacts is a service mark of 3Com Corporation.

Artisoft and LANtastic are registered trademarks of Artisoft, Inc. Banyan and VINES are registered trademarks of Banyan Systems Incorporated. CompuServe is a registered trademark of CompuServe, Inc. DEC and PATHWORKS are registered trademarks of Digital Equipment Corporation. Intel and Pentium are registered trademarks of Intel Corporation. AIX, AT, IBM, NetView, and OS/2 are registered trademarks and Warp is a trademark of International Business Machines Corporation. Microsoft, MS-DOS, Windows, and Windows NT are registered trademarks of Microsoft Corporation. Novell and NetWare are registered trademarks of Novell, Inc. PictureTel is a registered trademark of PictureTel Corporation. UNIX is a registered trademark of X/Open Company, Ltd. in the United States and other countries.

All other company and product names may be trademarks of the respective companies with which they are associated.

Guide written by Patricia Johnson, Chris Flisher, Sarah Newman, and Adam Bell. Edited by Ben Mann Jr. Technical information provided by Dan Bailey, Bob McTague, Graeme Robertson, and Andrew Ward.


CONTENTS

ABOUT THIS GUIDE

Finding Specific Information in This Guide
What to Expect from This Guide
Conventions
3Com Device Name Changes
Related Documentation
3Com Publications
User Guides
Help Systems
3Com World Wide Web (WWW)
Year 2000 Compliance

PART I BEFORE TROUBLESHOOTING

1 NETWORK TROUBLESHOOTING OVERVIEW

Introduction to Network Troubleshooting
About Connectivity Problems
About Performance Problems
Solving Connectivity and Performance Problems
Network Troubleshooting Framework
Troubleshooting Strategy
Recognizing Symptoms
User Comments
Network Management Software Alerts
Analyzing Symptoms
Understanding the Problem
Identifying and Testing the Cause of the Problem
Sample Problem Analysis
Equipment for Testing
Solving the Problem

2 YOUR NETWORK TROUBLESHOOTING TOOLBOX

Transcend Applications
Transcend Central
Status Watch
Web Reporter
Address Tracker
LANsentry Manager
Traffix Manager
Device View
Network Management Platforms
3Com SmartAgent Embedded Software
Other Commonly Used Tools
Ping
Strategies for Using Ping
Tips on Interpreting Ping Messages
Telnet
FTP and TFTP
Analyzers
Probes
Cable Testers

3 STEPS TO ACTIVELY MANAGING YOUR NETWORK

Designing Your Network for Troubleshooting
Positioning Your SNMP Management Station
Using Probes
Monitoring Business-critical Networks
FDDI Backbone Monitoring
Internet WAN Link Monitoring
Switch Management Monitoring
Using Telnet, Serial Line, and Modem Connections
Using Communications Servers
Setting Up Redundant Management
Other Tips on Network Design
Management Station Configuration
More Tips
Preparing Devices for Management
Configuring Management Parameters
Configuring Traps
Configuring Transcend NCS
Monitoring Devices
Setting Thresholds and Alarms
Setting Thresholds in Status Watch
Setting Thresholds and Alarms in LANsentry Manager
Refining Alarm Settings
Setting Alarms Based on a Baseline
Other Tips for Setting Thresholds and Alarms
Knowing Your Network
Knowing Your Network’s Configuration
Site Network Map
Logical Connections
Device Configuration Information
Other Important Data About Your Network
Identifying Your Network’s Normal Behavior
Baselining Your Network
Identifying Background Noise

PART II NETWORK CONNECTIVITY PROBLEMS AND SOLUTIONS

4 MANAGER-TO-AGENT COMMUNICATION

Manager-to-Agent Communication Overview
Understanding the Problem
Identifying the Problem
Solving the Problem
Verifying Management Configurations
Manager-to-Agent Communication Reference
IP Address
Gateway Address
Subnet Mask
SNMP Community Strings
SNMP Traps

5 FDDI CONNECTIVITY

FDDI Connectivity Overview
Understanding the Problem
Identifying the Problem
Solving the Problem
Monitoring FDDI Connections
Status Watch
Making Your FDDI Connections More Resilient
Implementing Dual Homing
Installing an Optical Bypass Unit
FDDI Connectivity Reference
Peer Wrap Condition
Twisted Ring Condition
Undesired Connection Attempt Event

6 TOKEN RING CONNECTIVITY AND ERRORS

Token Ring Overview
Using Transcend Applications to Identify Problems and Symptoms
Using Token Ring Statistics Tool
Using LANsentry Manager
Using the Ring Station View
Using TR Network Analyzer Tool
Network Graphs
Active Station and Error Statistics List
Token Ring Status Tool
Token Ring Utilization Tool
Identifying and Solving Ring Errors
Troubleshooting Notes

7 ATM AND LANE CONNECTIVITY

ATM and LANE Connectivity Overview
Color Status and Propagation
Device Level Troubleshooting
LANE Level Troubleshooting
ATM Network Level Troubleshooting
Virtual LANs Level Troubleshooting
Identifying VLAN Splits
Indications in the VLAN Map
Indications in the Backbone and Services Window
Path Assistants for Identifying Connectivity and Performance Problems
LE Path Assistant
ATM Path Assistant
Tracing a VC Path Between Two ATM End Nodes
Examining Virtual Channels Across Layer 2 Topologies
Tracing the LAN Emulation Control VCCs Between Two LANE Clients

PART III NETWORK PERFORMANCE PROBLEMS AND SOLUTIONS

8 BANDWIDTH UTILIZATION

Bandwidth Utilization Overview
Understanding the Problem
Identifying the Problem
Solving the Problem
Identifying Utilization Problems
Status Watch
Generating Historical Utilization Reports
Web Reporter
Bandwidth Utilization Reference
ATM Utilization
Ethernet Utilization
FDDI Utilization
Token Ring Utilization

9 BROADCAST STORMS

Broadcast Storms Overview
Understanding the Problem
Identifying the Problem
Solving the Problem
Identifying a Broadcast Storm
Status Watch
Traffix Manager
Disabling the Offending Interface
Address Tracker
Correcting Spanning Tree Misconfigurations
Device View
Broadcast Storms Reference
Broadcast Packets
Multicast Packets

10 DUPLICATE ADDRESSES

Duplicate Addresses Overview
Understanding the Problem
Identifying the Problem
Solving the Problem
Finding Duplicate MAC Addresses
Status Watch
Finding Duplicate IP Addresses
Address Tracker
LANsentry Manager
Duplicate Addresses Reference
Duplicate MAC Addresses
Duplicate IP Addresses

11 ETHERNET PACKET LOSS

Ethernet Packet Loss Overview
Understanding the Problem
Identifying the Problem
Solving the Problem
Searching for Packet Loss
Status Watch
LANsentry Manager Network Statistics Graph
Device View
Ethernet Packet Loss Reference
Alignment Errors
Collisions
CRC Errors
Excessive Collisions
FCS Errors
Late Collisions
Nonstandard Ethernet Problems
Receive Discards
Too Long Errors
Too Short Errors
Transmit Discards

12 FDDI RING ERRORS

FDDI Ring Errors Overview
Understanding the Problem
Identifying the Problem
Solving the Problem
Identifying Ring Errors
Status Watch
FDDI Ring Errors Reference
Elasticity Buffer Error Condition
Frame Error Condition
Frames Not Copied Condition
Link Error Condition
MAC Neighbor Change Event

13 NETWORK FILE SERVER TIMEOUTS

Network File Server Timeout Overview
Understanding the Problem
Identifying the Problem
Solving the Problem
Looking for Obvious Errors
Ping and Telnet
LANsentry Manager Alarms View
LANsentry Manager Statistics View
LANsentry Manager History View
Reproducing the Fault While Monitoring the Network
LANsentry Manager Top-N Graph
LANsentry Manager Packet Capture
LANsentry Manager Packet Decode
Address Tracker
LANsentry Manager Packet Decode
Correcting the Fault
Network File Server Timeouts Reference
Jabbering
Network File System (NFS) Protocol

14 MEASURING ATM NETWORK PERFORMANCE

Measuring Traffic Performance
Utilization Map
Displaying Link Traffic
Displaying Node Configuration
Configuring the Utilization Tool
Map Configuration
Polling Configuration
Communication Configuration
Measuring Device Level Performance
Using the History Graph
Displaying Statistics
Measuring Port Level Performance
Traffic
Utilization
Total Frames
Good Frames
Errored Frames
LANE Component Statistics
LES Statistics
LEC Statistics
LANE User

PART IV REFERENCE

15 SNMP IN NETWORK TROUBLESHOOTING

SNMP Operation
Manager/Agent Operation
SNMP Messages
Trap Reporting
Security
SNMP MIBs
MIB Tree
MIB-II
RMON MIB
RMON2 MIB
3Com Enterprise MIBs

16 INFORMATION RESOURCES

Books
URLs

INDEX

ABOUT THIS GUIDE

This guide helps you to troubleshoot connectivity and performance problems on your network using Transcend® Network Management Software and other tools.

This guide is intended for network administrators who understand networking technologies and how to integrate networking devices. You should have a working knowledge of:

■ Transmission Control Protocol/Internet Protocol (TCP/IP)

■ Simple Network Management Protocol (SNMP)

■ Network management platforms

■ 3Com devices on your network

You should also be familiar with the interface and features of the Transcend Network Management Software that you have installed.

With subsequent releases of Transcend management software, this guide will be updated with new troubleshooting information and additional Transcend troubleshooting tools. The most current version of this guide is on the 3Com Web site under Support:

http://www.3com.com

Finding Specific Information in This Guide

This guide, which is available online in Portable Document Format (PDF) and HyperText Markup Language (HTML) formats and in paper, is designed to be used online. For the online version, cross-references to other sections are indicated with links in blue, underlined text, which you can click. You can print any pages as needed.


Table 1 provides guidelines for navigating through this document.

What to Expect from This Guide

This guide demonstrates how to troubleshoot problems on your network with the help of Transcend and other tools. It also shows you how to use Transcend to move beyond day-to-day troubleshooting to proactive network management.

This guide is not intended to help you identify and correct problems with installation and use of Transcend software. For that type of troubleshooting, see:

■ The Transcend Network Control Services Installation Guide (for help with installation and startup problems)

■ The Help or user guide for a specific application (for information about troubleshooting application problems)

This guide focuses on technologies to troubleshoot your network and demonstrates how these technologies are applied using Transcend management software.

Conventions

Table 2 and Table 3 list conventions that are used throughout this guide.

Table 1 Guidelines for Finding Specific Information in This Guide

| If you are looking for | See |
|---|---|
| An introduction to network troubleshooting, information about troubleshooting tools, and guidelines for getting ready for management | Part I: “Before Troubleshooting”. Note: This part is recommended reading for users who are new to network management. |
| Specific troubleshooting scenarios to help you solve real network problems | Part II: “Network Connectivity Problems and Solutions” and Part III: “Network Performance Problems and Solutions” |
| Useful background information to help you with troubleshooting tasks | Part IV: “Reference” |

Table 2 Notice Icons

| Notice Type | Description |
|---|---|
| Information note | Information that describes important features or instructions |
| Caution | Information that alerts you to potential loss of data or potential damage to an application, system, or device |
| Warning | Information that alerts you to potential personal injury |

Table 3 Text Conventions

| Convention | Description |
|---|---|
| Screen displays | This typeface represents information as it appears on the screen. |
| Syntax | The word “syntax” means that you must evaluate the syntax provided and then supply the appropriate values for the placeholders that appear in angle brackets. Example: To enable RIPIP, use the following syntax: SETDefault !<port> -RIPIP CONTrol = Listen. In this example, you must supply a port number for <port>. |
| Commands | The word “command” means that you must enter the command exactly as shown and then press Return or Enter. Commands appear in bold. Example: To remove the IP address, enter the following command: SETDefault !0 -IP NETaddr = 0.0.0.0 |
| The words “enter” and “type” | When you see the word “enter” in this guide, you must type something, and then press Return or Enter. Do not press Return or Enter when an instruction simply says “type.” |
| Keyboard key names | If you must press two or more keys simultaneously, the key names are linked with a plus sign (+). Example: Press Ctrl+Alt+Del |
| Words in italics | Italics are used to emphasize a point; denote a new term at the place where it is defined in the text; and identify menu names, menu commands, and software button names. Examples: From the Help menu, select Contents. Click OK. |



3Com Device Name Changes

Many devices in the CoreBuilder™ family previously belonged to other 3Com brands. These devices are known by their new CoreBuilder names in the Transcend® NCS software. See Table 4.

Related Documentation

The following documents provide background and related information about local-area networking and internetworking, SNMP-based network management, and 3Com enterprise computing technology.

Most user guides and release notes are available in Adobe Acrobat Reader Portable Document Format (PDF) or HTML on the 3Com World Wide Web site:

http://www.3com.com/

3Com Publications

This guide is complemented by other 3Com documents, Help systems, and World Wide Web (WWW) documents.

User Guides

The following documents are shipped with your Transcend NCS software as printed books:

■ Transcend Network Control Services Introduction to Transcend Network Management, Version 5.0 for UNIX

■ Transcend Network Control Services Installation Guide, Version 5.0 for UNIX

■ Transcend Network Control Services Network Administration Guide, Version 5.0 for UNIX

Table 4 3Com Device Name Changes

| Previous name | New name |
|---|---|
| Cellplex® 7000 | CoreBuilder™ 7000 |
| LANplex 2500 | CoreBuilder 2500 |
| LANplex 6000 | CoreBuilder 6000 |
| ONcore hubs | CoreBuilder 5000 hubs |
| ONcore Controller and Management modules | CoreBuilder 5000 Controller and Management modules |
| ONcore FastModule | CoreBuilder 5000 FastModule |
| ONcore SwitchModule | CoreBuilder 5000 SwitchModule |


■ Transcend Management Software Network Troubleshooting Guide, Version 5.0 for UNIX

■ Transcend Network Control Services Release Notes, Version 5.0 for UNIX

■ Transcend Network Control Services on the Web Quick Tour, Version 5.0 for UNIX

The following documents are shipped with your Transcend NCS software on the CD-ROM entitled Transcend Network Control Services Online Documentation Set:

■ Inventory Management

■ Transcend Network Control Services Transcend Central User Guide, Version 5.0 for UNIX

■ Configuration Management

■ Transcend Network Control Services Network Admin Tools User Guide, Version 5.0 for UNIX

■ Transcend Network Control Services Device View User Guide, Version 5.0 for UNIX

■ Transcend Network Control Services NETBuilder Management Application Suite User Guide, Version 5.0 for UNIX

■ Transcend Network Control Services Token Ring Manager User Guide, Version 5.0 for UNIX

■ Transcend Network Control Services Enterprise VLAN Manager User Guide, Version 5.0 for UNIX

■ Transcend Network Control Service PathBuilder Switch Manager User Guide, Version 5.0 for UNIX

■ Transcend Network Control Services Total Control Manager/SNMP User Guide

■ Monitoring and Reporting

■ Transcend Network Control Services Status Watch User Guide, Version 5.0 for UNIX

■ Transcend Network Control Services LANsentry Manager User Guide, Version 5.0 for UNIX

■ Transcend Network Control Services LANsentry Reporter User Guide, Version 5.0 for UNIX


Help Systems

Each Transcend NCS application contains a Help system that describes how to use all the features of the application. Help includes window descriptions, step-by-step instructions, conceptual information, and troubleshooting tips for that application.

You can access Help from:

■ The Help menu in any application by selecting Help Topics (in the Help Topics window, you can view the Contents and Index)

■ A Help button in windows and dialog boxes

■ Your 3Com/Transcnd/Help directory (or the directory that you have set for your Transcend software installation)

3Com World Wide Web (WWW)

The following 3Com Web resources provide additional information about Transcend Network Control Services:

■ 3Com Network Management Solution Center –– Contains a range of information about 3Com’s network management solutions including Transcend Network Control Services, Total Control™ Manager, Transcend Traffix™ Manager, Transcend dRMON Edge Monitor, InfoVista, and Transcend Enterprise Monitor hardware probes for Ethernet and Token Ring networks.

http://www.3com.com/products/trans_net_man.html

■ 3Com Support –– Provides access to technical support and includes data sheets, support tips, Frequently Asked Questions (FAQ) documents, user guides, release notes, and software downloads.

http://infodeli.3com.com/

■ Document Center –– Contains useful links to news, technical briefs, case studies, solutions guides, and product data sheets.

http://www.3com.com/util/dcenter.html

■ Technology Center –– Contains up-to-the-minute white papers, strategic overviews, and in-depth tutorials about networking technologies and innovations.

http://www.3com.com/technology/index.html

■ Networking Glossary –– Explains networking terms and acronyms.

http://www.3com.com/nsc/glossary/main.htm


Year 2000 Compliance

For information on Year 2000 compliance and 3Com products, visit the 3Com Year 2000 Web page:

http://www.3com.com/products/yr2000.html

PART I BEFORE TROUBLESHOOTING

Chapter 1 Network Troubleshooting Overview

Chapter 2 Your Network Troubleshooting Toolbox

Chapter 3 Steps to Actively Managing Your Network

1 NETWORK TROUBLESHOOTING OVERVIEW

These sections introduce you to the concepts and practice of network troubleshooting:

■ Introduction to Network Troubleshooting

■ Network Troubleshooting Framework

■ Troubleshooting Strategy

Introduction to Network Troubleshooting

Network troubleshooting means recognizing and diagnosing networking problems with the goal of keeping your network running optimally. As a network administrator, your primary concern is maintaining connectivity of all devices (a process often called fault management). You also continually evaluate and improve your network’s performance. Because serious networking problems can sometimes begin as performance problems, paying attention to performance can help you address issues before they become serious.

About Connectivity Problems

Connectivity problems occur when end stations cannot communicate with other areas of your local area network (LAN) or wide area network (WAN). Using management tools, you can often fix a connectivity problem before users even notice it. Connectivity problems include:

■ Loss of connectivity — When users cannot access areas of your network, your organization’s effectiveness is impaired. Immediately correct any connectivity breaks.

■ Intermittent connectivity — Although users have access to network resources some of the time, they are still facing periods of downtime. Intermittent connectivity problems can indicate that your network is on the verge of a major break. If connectivity is erratic, investigate the problem immediately.

■ Timeout problems — Timeouts cause loss of connectivity, but are often associated with poor network performance.


About Performance Problems

Your network has performance problems when it is not operating as effectively as it should. For example, response times may be slow, the network may not be as reliable as usual, and users may be complaining that it takes them longer to do their work. Some performance problems are intermittent, such as instances of duplicate addresses. Other problems can indicate a growing strain on your network, such as consistently high utilization rates.

If you regularly examine your network for performance problems, you can extend the usefulness of your existing network configuration and plan network enhancements, instead of waiting for a performance problem to adversely affect the users’ productivity.

Solving Connectivity and Performance Problems

When you troubleshoot your network, you employ tools and knowledge already at your disposal. With an in-depth understanding of your network, you can use network software tools, such as “Ping”, and network devices, such as “Analyzers”, to locate problems, and then make corrections, such as swapping equipment or reconfiguring segments, based on your analysis.

Transcend® provides another set of tools for network troubleshooting. These tools have graphical user interfaces that make managing and troubleshooting your network easier. With “Transcend Applications”, you can:

■ Baseline your network’s normal status to use as a basis for comparison when the network operates abnormally

■ Precisely monitor network events

■ Be notified immediately of critical problems on your network, such as a device losing connectivity

■ Establish alert thresholds to warn you of potential problems that you can correct before they affect your network

■ Resolve problems by disabling ports or reconfiguring devices

See “Your Network Troubleshooting Toolbox” for details about each troubleshooting tool.
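The baselining and alert-threshold ideas in the list above can be illustrated with a minimal sketch. It is not how any Transcend application computes its alerts; the sample values, the function names, and the "mean plus k standard deviations" rule are all assumptions chosen for illustration.

```python
from statistics import mean, stdev

def baseline(samples):
    """Summarize normal behavior as (mean, sample standard deviation)."""
    return mean(samples), stdev(samples)

def exceeds_baseline(value, samples, k=3.0):
    """Return True when value is more than k standard deviations above the mean.

    The multiplier k is an assumed tuning knob, not a value from this guide.
    """
    avg, sd = baseline(samples)
    return value > avg + k * sd

# Hypothetical utilization readings (percent) taken while the network was healthy.
history = [22.0, 25.0, 24.0, 23.0, 26.0, 24.0]

print(exceeds_baseline(25.0, history))  # a reading close to normal
print(exceeds_baseline(60.0, history))  # a reading far above normal
```

A larger k produces fewer alerts at the cost of reacting later; the right value depends on how noisy the monitored counter is on your network.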


Network Troubleshooting Framework

The International Organization for Standardization (ISO) Open Systems Interconnection (OSI) reference model is a standard framework for describing network communications. This seven-layer structure provides a clear picture of how network communications work.

Protocols (rules) govern communications between the layers of a single system and among several systems. In this way, devices made by different manufacturers or using different designs can use different protocols and still communicate.

By understanding how network troubleshooting fits into the framework of the OSI model, you can identify at what layer problems are located and which type of troubleshooting tools to use. For example, unreliable packet delivery can be caused by a problem with the transmission media or with a router configuration. If you are receiving high rates of “FCS Errors” and “Alignment Errors”, which you can monitor with Status Watch, then the problem is probably located at the physical layer and not the network layer. Figure 1 shows how to troubleshoot the layers of the OSI model.

Table 5 describes the data that the network management tools can collect as it relates to the OSI model layers.

Table 5 Network Data and the OSI Model Layers

| Layer | Data Collected | Transcend NCS Tool Used |
|---|---|---|
| Application, Presentation, Session, Transport | Protocol information and other Remote Monitoring (RMON) and RMON2 data | LANsentry Manager; Traffix Manager™ (for more detail) |
| Network | Routing information | Status Watch; LANsentry Manager (for more detail); Traffix Manager (for more detail) |
| Data Link | Traffic counts and other packet breakdowns | Status Watch; LANsentry Manager (for more detail) |
| Physical | Error counts | Status Watch |
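The relationships in Table 5 can be expressed as a simple lookup, which may help when scripting a first triage step. This is only a sketch: the dictionary keys and the function name are hypothetical, and the mapping restates Table 5 rather than any interface of the Transcend applications.

```python
# Kind of data observed -> OSI layer group, restating Table 5.
LAYER_BY_DATA = {
    "error counts": "Physical",
    "traffic counts": "Data Link",
    "routing information": "Network",
    "protocol information": "Transport and above",
}

# OSI layer group -> tools listed for it in Table 5.
TOOLS_BY_LAYER = {
    "Physical": ["Status Watch"],
    "Data Link": ["Status Watch", "LANsentry Manager"],
    "Network": ["Status Watch", "LANsentry Manager", "Traffix Manager"],
    "Transport and above": ["LANsentry Manager", "Traffix Manager"],
}

def tools_for(data_kind):
    """Return (layer, tools) for a kind of data; raises KeyError if unknown."""
    layer = LAYER_BY_DATA[data_kind]
    return layer, TOOLS_BY_LAYER[layer]

# High FCS or alignment error rates are error counts, so start at the physical layer.
layer, tools = tools_for("error counts")
print(layer, tools)
```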


Figure 1 OSI Reference Model and Network Troubleshooting

[Figure: the seven OSI layers, from Physical (Layer 1) to Application (Layer 7), shown with example protocols (Ethernet, Token Ring, and FDDI with their LLC, MAC, PHY, and PMD sublayers at Layers 1 and 2; IP and IPX at Layer 3; TCP and UDP at Layer 4; Telnet, rlogin, FTP, and SNMP managers and agents at the upper layers) and with the troubleshooting tools that apply (cable testing tools, Status Watch, LANsentry® Manager, Traffix™ Manager, probes, and analyzers).]

For information about network troubleshooting tools, see “Your Network Troubleshooting Toolbox”.

Troubleshooting Strategy

How do you know when you are having a network problem? The answer to this question depends on your site’s network configuration and on your network’s normal behavior. See “Knowing Your Network” for more information.


If you notice changes on your network, ask the following questions:

■ Is the change expected or unusual?

■ Has this event ever occurred before?

■ Does the change involve a device or network path for which you already have a backup solution in place?

■ Does the change interfere with vital network operations?

■ Does the change affect one or many devices or network paths?

After you have an idea of how the change is affecting your network, you can categorize it as critical or noncritical. Both of these categories need resolution (except for changes that are one-time occurrences); the difference between the categories is the time that you have to fix the problem.
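The questions above can be recorded as a small structure and turned into a first-pass category. The guide does not prescribe a rule for deciding between critical and noncritical, so the rule in `categorize` below is an assumption made for illustration, as are the field names.

```python
from dataclasses import dataclass

@dataclass
class ChangeReport:
    expected: bool                  # Is the change expected or unusual?
    seen_before: bool               # Has this event occurred before? (kept for context)
    has_backup: bool                # Is a backup solution already in place?
    affects_vital_operations: bool  # Does it interfere with vital operations?
    devices_affected: int           # One or many devices or network paths?

def categorize(report: ChangeReport) -> str:
    """Return "critical" or "noncritical" using an assumed, simplified rule."""
    # Assumption: a vital operation with no backup cannot wait.
    if report.affects_vital_operations and not report.has_backup:
        return "critical"
    # Assumption: an unexpected change spreading across devices is urgent.
    if not report.expected and report.devices_affected > 1:
        return "critical"
    return "noncritical"

wan_outage = ChangeReport(expected=False, seen_before=False, has_backup=False,
                          affects_vital_operations=True, devices_affected=12)
print(categorize(wan_outage))
```

Either category still needs resolution; the label only indicates how much time you have to fix the problem.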

By using a strategy for network troubleshooting, you can approach a problem methodically and resolve it with minimal disruption to network users. It is also important to have an accurate and detailed map of your current network environment. Beyond that, a good approach to problem resolution is:

■ Recognizing Symptoms

■ Understanding the Problem

■ Identifying and Testing the Cause of the Problem

■ Solving the Problem

Recognizing Symptoms

The first step to resolving any problem is to identify and interpret the symptoms. You may discover network problems in several ways. Users may complain that the network seems slow or that they cannot connect to a server. You may pass your network management station and notice that a node icon is red. Your beeper may go off and display the message: WAN connection down.

User Comments

Although you can often solve networking problems before users notice a change in their environment, you invariably get feedback from your users about how the network is running, such as:

■ They cannot print.

■ They cannot access the application server.


■ It takes them much longer to copy files across the network than it usually does.

■ They cannot log on to a remote server.

■ When they send e-mail to another site, they get a routing error message.

■ Their system freezes whenever they try to Telnet.

Network Management Software Alerts

Network management software, as described in “Your Network Troubleshooting Toolbox”, can alert you to areas of your network that need attention. For example:

■ The application displays red (Warning) icons.

■ Your weekly Top-N utilization report (which indicates the 10 ports with the highest utilization rates) shows that one port is experiencing much higher utilization levels than normal.

■ You receive an e-mail message from your network management station that the threshold for broadcast and multicast packets has been exceeded.

These signs usually provide additional information about the problem, allowing you to focus on the right area.

Analyzing Symptoms

When a symptom occurs, ask yourself these types of questions to narrow the location of the problem and to get more data for analysis:

■ To what degree is the network not acting normally (for example, does it now take one minute to perform a task that normally takes five seconds)?

■ On what subnetwork is the user located?

■ Is the user trying to reach a server, end station, or printer on the same subnetwork or on a different subnetwork?

■ Are many users complaining that the network is operating slowly or that a specific network application is operating slowly?

■ Are many users reporting network logon failures?

■ Are the problems intermittent? For example, some files may print with no problems, while other printing attempts generate error messages, make users lose their connections, and cause systems to freeze.


Understanding the Problem

Networks are designed to move data from a transmitting device to a receiving device. When communication becomes problematic, you must determine why data are not traveling as expected and then find a solution. The two most common causes for data not moving reliably from source to destination are:

■ The physical connection breaks (that is, a cable is unplugged or broken).

■ A network device is not working properly and cannot send or receive some or all data.

Network management software can easily locate and report a physical connection break (layer 1 problem). It is more difficult to determine why a network device is not working as expected, which is often related to a layer 2 or a layer 3 problem.

To determine why a network device is not working properly, look first for:

■ Valid service — Is the device configured properly for the type of service it is supposed to provide? For example, has Quality of Service (QoS), which is the definition of the transmission parameters, been established?

■ Restricted access — Is an end station supposed to be able to connect with a specific device or is that connection restricted? For example, is a firewall set up that prevents that device from accessing certain network resources?

■ Correct configuration — Is there a misconfiguration of IP address, subnet mask, gateway, or broadcast address? Network problems are commonly caused by misconfiguration of newly connected or configured devices. See “Manager-to-Agent Communication” for more information.
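The configuration check in the last item can be sketched with Python's standard `ipaddress` module. This is a minimal example under assumed inputs (the function name and the sample addresses are hypothetical); it only tests whether the four values are consistent with each other, not whether they are the values that your network plan calls for.

```python
import ipaddress


def check_ip_config(address, netmask, gateway, broadcast):
    """Return a list of inconsistencies among the four settings (empty if none)."""
    problems = []
    interface = ipaddress.ip_interface(f"{address}/{netmask}")
    network = interface.network

    # The default gateway must be reachable directly, so it has to be
    # on the same subnetwork as the device.
    if ipaddress.ip_address(gateway) not in network:
        problems.append("gateway is not on the local subnetwork")

    # The broadcast address is determined by the address and the mask.
    if ipaddress.ip_address(broadcast) != network.broadcast_address:
        problems.append(f"broadcast address should be {network.broadcast_address}")

    # A host cannot use the network address or the broadcast address.
    if interface.ip in (network.network_address, network.broadcast_address):
        problems.append("address is the network or broadcast address")

    return problems
```

For example, `check_ip_config("192.168.1.10", "255.255.255.0", "192.168.2.1", "192.168.1.255")` reports that the gateway is not on the local subnetwork.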

Identifying and Testing the Cause of the Problem

After you develop a theory about the cause of the problem, test your theory. The test must conclusively prove or disprove your theory.

Two general rules of troubleshooting are:

■ If you cannot reproduce a problem, then no problem exists unless it happens again on its own.

■ If the problem is intermittent and you cannot replicate it, you can configure your network management software to catch the event in progress.


For example, with “LANsentry Manager”, you can set alarms and automatic packet capture filters to monitor your network and inform you when the problem occurs again. See “Configuring Transcend NCS” for more information.

Although network management tools can provide a great deal of information about problems and their general location, you may still need to swap equipment or replace components of your network until you locate the exact trouble spot.

After you test your theory, either fix the problem as described in “Solving the Problem” or develop another theory.

Sample Problem Analysis

This section illustrates the analysis phase of a typical troubleshooting incident.

On your network, a user cannot access the mail server. You need to establish two areas of information:

■ What you know — In this case, the user’s workstation cannot communicate with the mail server.

■ What you do not know and need to test —

■ Can the workstation communicate with the network at all, or is the problem limited to communication with the server? Test by sending a “Ping” or by connecting to other devices.

■ Is the workstation the only device that is unable to communicate with the server, or do other workstations have the same problem? Test connectivity at other workstations.

■ If other workstations cannot communicate with the server, can they communicate with other network devices? Again, test the connectivity.

The analysis process follows these steps:

1 Can the workstation communicate with any other device on the subnetwork?

■ If no, then go to step 2.

■ If yes, determine if only the server is unreachable.

■ If only the server cannot be reached, this suggests a server problem. Confirm by doing step 2.


■ If other devices cannot be reached, this suggests a connectivity problem in the network. Confirm by doing step 3.

2 Can other workstations communicate with the server?

■ If no, then most likely it is a server problem. Go to step 3.

■ If yes, then the problem is that the workstation is not communicating with the subnetwork. (This situation can be caused by workstation issues or a network issue with that specific station.)

3 Can other workstations communicate with other network devices?

■ If no, then the problem is likely a network problem.

■ If yes, the problem is likely a server problem.
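The three steps above can be read as a small decision procedure. The Python sketch below is one possible reading of them, not a definitive algorithm; the function name and the four yes/no inputs are hypothetical and stand for the answers that you gather by testing connectivity.

```python
def locate_problem(reaches_subnet, only_server_unreachable,
                   others_reach_server, others_reach_devices):
    """Return 'workstation', 'server', or 'network' from four test results."""

    def step_3():
        # Step 3: can other workstations communicate with other devices?
        return "server" if others_reach_devices else "network"

    # Step 1: the workstation reaches the subnetwork, but devices other
    # than the server are also unreachable: suspect connectivity (step 3).
    if reaches_subnet and not only_server_unreachable:
        return step_3()

    # Step 2: reached when step 1 is "no" or when only the server is
    # unreachable. If other workstations reach the server, the fault is
    # with this workstation or its connection.
    if others_reach_server:
        return "workstation"

    # Otherwise a server problem is most likely; step 3 confirms it.
    return step_3()
```

For example, if the workstation can reach other devices, only the server is unreachable, and other workstations cannot reach the server either but can reach other devices, the function returns `"server"`.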

When you determine whether the problem is with the server, subnetwork, or workstation, you can further analyze the problem, as follows:

■ For a problem with the server — Examine whether the server is running, if it is properly connected to the network, and if it is configured appropriately.

■ For a problem with the subnetwork — Examine any device on the path between the users and the server.

■ For a problem with the workstation — Examine whether the workstation can access other network resources and if it is configured to communicate with that particular server.

Equipment for Testing

To help identify and test the cause of problems, have available:

■ A laptop computer that has a CD-ROM drive (to read the online documentation) and is loaded with a terminal emulator, TCP/IP stack, TFTP server, and some key network management applications, such as LANsentry® Manager. With the laptop computer, you can plug into any subnetwork to gather and analyze data about the segment.

■ A spare managed hub to swap for any hub that does not have management. Swapping in a managed hub allows you to quickly spot which port is generating the errors.

■ A single port probe to insert in the network if you are having a problem where you do not have management capability.

■ Console cables for each type of connector, labeled and stored in a secure place.


Solving the Problem

Many device or network problems are straightforward to resolve, but others yield misleading symptoms. If one solution does not work, continue with another.

A solution often involves:

■ Upgrading software or hardware (for example, upgrading to a new version of agent software or installing Gigabit Ethernet devices)

■ Balancing your network load by analyzing:

■ What users communicate with which servers

■ What the user traffic levels are in different segments

Based on these findings, you can decide how to redistribute network traffic.

■ Adding segments to your LAN (for example, adding a new switch where utilization is continually high)

■ Replacing faulty equipment (for example, replacing a module that has port problems or replacing a network card that has a faulty jabber protection mechanism)

To help solve problems, have available:

■ Spare hardware equipment (such as modules and power supplies), especially for your critical devices

■ A recent backup of your device configurations to reload if flash memory gets corrupted (which can sometimes happen due to a power outage)

Use the Transcend NCS application suite Network Admin Tools to save and reload your software configurations to devices.

2
YOUR NETWORK TROUBLESHOOTING TOOLBOX
A robust network troubleshooting toolbox consists of items (such as network management applications, hardware devices, and other software) to recognize, diagnose, and solve networking problems. It contains:

■ Transcend Applications

■ Network Management Platforms

■ 3Com SmartAgent Embedded Software

■ Other Commonly Used Tools

Transcend Applications

Transcend® management software is optimized for managing 3Com devices and their attached networks. However, some applications, such as LANsentry® Manager, can manage any vendor’s networking equipment that complies with the Remote Monitoring (RMON) Management Information Base (MIB).

This section describes these Transcend applications, which you can use to troubleshoot your network:

■ Transcend Central

■ Status Watch

■ Address Tracker

■ LANsentry Manager

■ Traffix Manager

■ Device View

This guide primarily focuses on using these applications to troubleshoot your network.

Transcend Central

Start with Transcend Central, which is an asset management and device grouping application, to understand what your network consists of and to control the Transcend NCS network management troubleshooting tools. Transcend Central is available as both a native Windows application and a Java application that you can access using a Web browser.

Using Transcend Central for troubleshooting, you can:

■ Display an inventory of device, module, and port information.

■ Group devices to make your troubleshooting tasks easier. By managing a collection of devices, you can simultaneously perform the same tasks on each device in a group and locate physical or logical problems on your network.

■ Launch Transcend NCS applications, including some of your primary Transcend NCS troubleshooting tools:

■ Status Watch, which includes Web Reporter (from the Java version)

■ Address Tracker

■ LANsentry Manager

■ Traffix Manager

■ Device View

Status Watch

Status Watch is a performance-monitoring application that allows you to monitor the operational status of your network devices and quickly identify any problems that require your attention. The Status Watch applications manage 3Com devices and their attached networks, and they primarily poll for “MIB-II” data. Status Watch works in conjunction with Web Reporter.

See the Status Watch Help to learn which 3Com devices are supported.

Web Reporter

Web Reporter is a data-reporting application that runs in a World Wide Web (WWW) browser. It generates reports from data that Status Watch collects, allowing you to compare network statistics against a baseline.

Address Tracker

Address Tracker is an address collection and discovery application that:

■ Polls managed devices for all MAC addresses

■ Polls managed devices and routers for IP addresses to perform MAC-to-IP address translation

■ Uses Device View to disable troublesome ports
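To show what MAC-to-IP address translation involves, the following Python sketch joins two tables: MAC addresses learned on device ports and IP-to-MAC mappings learned from routers. It illustrates the idea only and does not describe how Address Tracker is implemented; the function name, table layouts, and sample values are hypothetical.

```python
def build_address_table(mac_to_port, ip_to_mac):
    """Combine port and IP information, keyed by lowercase MAC address.

    mac_to_port: {mac: (device, port)} as learned from managed devices.
    ip_to_mac:   {ip: mac} as learned from routers.
    """
    table = {}
    for mac, location in mac_to_port.items():
        table[mac.lower()] = {"location": location, "ips": []}

    for ip, mac in ip_to_mac.items():
        # A MAC address seen only by a router has no known port yet.
        entry = table.setdefault(mac.lower(), {"location": None, "ips": []})
        entry["ips"].append(ip)

    return table
```

With such a table, you can start from the IP address of a troublesome station and find the device and port to which it is attached.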

LANsentry Manager

LANsentry Manager is a set of integrated applications that displays and explores the real-time and historical data that RMON-compliant devices (probes) on the network capture. LANsentry Manager uses SNMP polling to gather RMON and RMON2 data from the probes.

Use LANsentry Manager to:

■ Monitor current performance of network segments

■ See trends over time

■ Spot signs of current problems

■ Configure alarms to monitor for specific events

■ Capture packets and display their contents

LANsentry Manager works with any device (from 3Com or other vendors) that supports the “RMON MIB” or the “RMON2 MIB”.

Traffix Manager

Traffix™ Manager is a performance-monitoring application that provides information about layer 2 (RMON) and layer 3 conversations between nodes. It helps you to assess traffic patterns on your network. Traffix Manager:

■ Monitors all the stations that the RMON2–compliant probes encounter on your network

■ Captures and stores RMON and RMON2 data for your network’s protocols and applications

■ Displays traffic between stations in user-defined views of the network

■ Graphs current or historical data on the devices selected

■ Delivers reports for user-specified stations and time periods as PostScript to your printer or as HTML to your Web server

■ Launches LANsentry Manager tools for in-depth analysis of a station or a conversation between stations


You can use Traffix Manager to:

■ Know your network — Understand overall flow patterns and interactions between systems, and determine how your network is really being used at the application level.

■ Optimize your network — Gain an insight into traffic and application usage trends to help you optimize the use and placement of current network resources and make wise decisions about capacity planning and network growth.

Traffix Manager works with any device (from 3Com or other vendors) that supports the “RMON2 MIB”.

Device View

The Device View application is a device configuration tool. When you troubleshoot your network, you can use Device View to determine or change a device’s configuration. You can also use Device View to look at a device’s statistics and to set alarms.

Device View manages only 3Com devices.

See the Device View Help to learn which 3Com devices are supported.

You can also use Transcend Upgrade Manager, which is one of the Network Admin Tools applications, to perform bulk software upgrades on devices.

Network Management Platforms

As part of your troubleshooting toolbox, your network management platform is the first place to go to view the overall health of your network. With the platform, you can understand the logical configuration of your network and configure views of your network to understand how devices work together and the role that they play in the users’ work. The network management platform that supports your Transcend software installation can provide valuable troubleshooting tools. Transcend runs on several platforms within the NT and UNIX environments.

The platform discovers the devices. Transcend imports that information from the platform to populate the core database. Unless you rediscover the network, you must update the platform manually.

Using this device database, a map displays the graphical representation of your network. Each device on your network appears as a symbol (icon) on the map. You can configure views of your network to show devices on the same subnetworks or floors.

You can monitor network performance and diagnose network performance and connectivity problems. You can also:

■ Take a snapshot of your network in its normal state. The snapshot records the state of your network at a particular instant. If you later have network performance problems, you can compare the current state of your network to the snapshot.

■ Quickly determine the connectivity status of a device by noting the color of its map symbol. Red usually means that communication with a device has ceased.

■ Diagnose connectivity problems by determining whether two devices can communicate. If they can communicate, then examine the route between the devices, the number of packets that were sent and lost, and the roundtrip time between the two devices.

■ Manage MIB information (for example, collecting and storing MIB data for trend analysis and graphing) using MIB queries. Transcend compiles MIBs and allows you to navigate up and down the “MIB Tree” to retrieve MIB objects from devices. You can set thresholds for MIB data and generate events when a threshold is exceeded.

■ Configure the software to act on certain events. The Event Categories window informs you of any unexpected events (which arrive in the form of traps).

For more information, see the documentation that is shipped with your software.

3Com SmartAgent Embedded Software

Traditional Simple Network Management Protocol (SNMP) management places the burden of collecting network management information on the management station. In this traditional model, software agents collect information about throughput, record errors or packet overflows, and measure performance based on established thresholds. Through a polling process, agents pass this information to a centralized network management station whenever they receive an SNMP query. Management applications then make the data useful and alert the user if there are problems on the device.


For more information about traditional SNMP management, see “SNMP Operation”.

As a useful companion to traditional network management methods, 3Com’s SmartAgent® technology places management intelligence into the software agent that runs within a 3Com device. This scalable solution reduces the amount of computational load on the management station and helps minimize management-related network traffic.

SmartAgent software, which uses the “RMON MIB”, is self-monitoring, collecting and analyzing its own statistical, analytical, and diagnostic data. In this way, you can conduct network management by exception — that is, you are notified only if a problem occurs. Management by exception is unlike traditional SNMP management, in which the management software collects all data from the device through polling.
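Management by exception can be illustrated with a threshold alarm that stays silent until a monitored value crosses a limit. The Python sketch below uses a rising and a falling threshold, in the style of RMON alarms, so that a value hovering near one limit does not generate repeated events. It is a simplified model under assumed names and values, not the SmartAgent implementation.

```python
class ThresholdAlarm:
    """Report only threshold crossings, not every sample."""

    def __init__(self, rising, falling):
        if falling >= rising:
            raise ValueError("falling threshold must be below rising threshold")
        self.rising = rising
        self.falling = falling
        # Simplification: start armed for a rising event only.
        self._armed = "rising"

    def sample(self, value):
        """Return 'rising', 'falling', or None for one sampled value."""
        if self._armed == "rising" and value >= self.rising:
            self._armed = "falling"   # wait for recovery before re-arming
            return "rising"
        if self._armed == "falling" and value <= self.falling:
            self._armed = "rising"
            return "falling"
        return None
```

With a rising threshold of 50 and a falling threshold of 30, the samples 10, 60, 70, 40, 20, 65 produce three events (rising at 60, falling at 20, rising at 65) instead of six reports.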

SmartAgent software works autonomously and reports to the network management station whenever an exceptional network event occurs. The software can also take direct action without involving the management station. Devices that contain SmartAgent software may be able to:

■ Perform broadcast throttling to minimize the flow of broadcast traffic on your network

■ Monitor the ratio of good frames to bad frames

■ Switch a resilient link pair to the standby path if the primary path corrupts frames

■ Report if traffic on vital segments drops below minimum usage levels

■ Disable a port for five seconds to clear problems, and then automatically reconnect it

To configure these advanced SmartAgent software features, see your device documentation.

The Transcend NCS applications LANsentry Manager and Traffix Manager make the RMON data that the SmartAgent software collects more usable by summarizing and correlating important information.

Other Commonly Used Tools

These commonly used tools can also help you troubleshoot your network:

■ Network software, such as Ping, Telnet, and FTP and TFTP. You can use these applications to troubleshoot, configure, and upgrade your system.

■ Network monitoring devices, such as Analyzers and Probes.

■ Tools, such as Cable Testers, for working on physical problems.

Many of the tools that are discussed in this section are only useful in TCP/IP networks.

Ping

Packet Internet Groper (Ping) allows you to quickly verify the connectivity of your network devices. Ping attempts to transmit a packet from one device to a station on the network, and listens for the response to ensure that it was correctly received. You can validate connections on different parts of your network by pinging different devices:

■ A successful response indicates that a valid network path exists between your station and the remote host and that the remote host is active.

■ Slower response times than normal can indicate that the path is congested or obstructed.

■ A failed response indicates that a connection is broken somewhere; use the message to help locate the problem. See “Tips on Interpreting Ping Messages”.

Some network devices, like the CoreBuilder™ 5000, must be configured to be able to respond to Ping messages. If you are not receiving responses from a device, first make sure that it is set up to be a Ping responder.

Strategies for Using Ping

Follow these strategies for using Ping:

■ Ping devices when your network is operating normally so that you have a performance baseline for comparison. See “Identifying Your Network’s Normal Behavior” for more information.

■ Ping by IP address when:

■ You want to test devices on different subnetworks. This method allows you to Ping your network segments in an organized way, rather than having to remember all the hostnames and locations.


■ Your Domain Name System (DNS) server is down and your system cannot look up host names properly. You can Ping with IP addresses even if you cannot access hostname information.

■ Ping by hostname when you want to identify DNS server problems.

■ To troubleshoot problems that involve large packet sizes, Ping the remote host repeatedly, increasing the packet size each time.

■ To determine if a link is erratic, perform a continuous Ping (using ping -s on UNIX), which indicates the time that it takes the device to respond to each Ping.

■ To determine a route taken to a destination, use the trace route function (traceroute on UNIX, tracert on Windows).

■ Consider creating a Ping script that periodically sends a Ping to all necessary networking devices. If a Ping failure message is received, the script can perform some action to notify you of the problem, such as paging you.

■ Use the Ping functions of your network management platform. For example, in your HP OpenView map, select a device and click the right mouse button to gain access to ping functions.
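The Ping script suggested in the list above might look like the following Python sketch. It assumes that the system `ping` command accepts `-c <count>` to send a fixed number of requests and exits with status 0 on success; this is true of many UNIX-like systems but not all, so check the ping manual page on your platform. The function names and the `notify` action are hypothetical placeholders.

```python
import subprocess


def ping_once(host, timeout_s=5):
    """Return True if a single Ping to host is answered.

    Assumption: 'ping -c 1 <host>' is valid on this system.
    """
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def find_unreachable(hosts, ping=ping_once):
    """Return the hosts that did not answer, in the order given."""
    return [host for host in hosts if not ping(host)]


def notify(failed_hosts):
    """Placeholder action; replace with paging or e-mail as needed."""
    for host in failed_hosts:
        print(f"ALERT: no Ping response from {host}")
```

To run the check periodically, call `notify(find_unreachable(hosts))` from a scheduler such as cron. Passing the ping function as a parameter lets you test the script without sending any packets.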

Tips on Interpreting Ping Messages

Use the following ping failure messages to troubleshoot problems:

No reply from <destination>

Indicates that the destination routes are available but that there is a problem with the destination itself.

<destination> is unreachable

Indicates that your system does not know how to get to the destination. This message means either that routing information to a different subnetwork is unavailable or that a device on the same subnetwork is down.

ICMP host unreachable from gateway

Indicates that your system can transmit to the target address using a gateway, but that the gateway cannot forward the packet properly because either a device is misconfigured or the gateway is not operating.
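If you collect Ping output in a script, you can map the three failure messages above to their likely causes. The Python sketch below matches the message text as it is shown in this section; the exact wording that your ping command prints may differ, and the function name and the short cause labels are hypothetical.

```python
import re

# (pattern, likely cause) pairs based on the three messages in this section.
_FAILURE_PATTERNS = [
    (re.compile(r"^no reply from ", re.IGNORECASE),
     "destination problem"),
    (re.compile(r"ICMP host unreachable from gateway", re.IGNORECASE),
     "gateway cannot forward"),
    (re.compile(r" is unreachable$", re.IGNORECASE),
     "no route or local device down"),
]


def interpret_ping_failure(message):
    """Return a short likely-cause label, or None if the text is not recognized."""
    text = message.strip()
    for pattern, cause in _FAILURE_PATTERNS:
        if pattern.search(text):
            return cause
    return None
```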

Telnet

Telnet, which is a login and terminal emulation program for Transmission Control Protocol/Internet Protocol (TCP/IP) networks, is a common way to communicate with an individual device. You log in to the device (a remote host) and use that remote device as if it were a local terminal.

If you have established an out-of-band Telnet connection with a device, you can use Telnet to communicate with that device even if the network is unavailable. This feature makes Telnet one of the most frequently used network troubleshooting tools. Usually, all device statistics and configuration capabilities are accessible by using Telnet to connect to the device’s console. For more information about setting up an out-of-band connection, see “Using Telnet, Serial Line, and Modem Connections”.

You can invoke the Telnet application on your local system and set up a link to a Telnet process that is running on a remote host. You can then run a program that is located on a remote host as if you were working at the remote system.

FTP and TFTP

Most network devices support either the File Transfer Protocol (FTP) or the Trivial File Transfer Protocol (TFTP) for downloading updates of system software. Updating system software is often the solution to networking problems that are related to agent problems. Also, new software features may help correct a networking problem.

FTP provides flexibility and security for file transfer by:

■ Accepting many file formats, such as ASCII and binary

■ Using data compression

■ Providing Read and Write access so that you can display, create, and delete files and directories

■ Providing password protection

TFTP is a simple version of FTP that does not list directories or require passwords. TFTP only transfers files to and from a remote server.

Analyzers

An analyzer, which is often called a Sniffer, is a network device that collects network data on the segment to which it is attached, a process called packet capturing. Software on the device analyzes this data, which is a process referred to as protocol analysis. Most analyzers can interpret different types of protocol traffic, such as TCP/IP, AppleTalk, and Banyan VINES traffic.


You usually use analyzers for reactive troubleshooting — when you see a problem somewhere on your network, you attach an analyzer to capture and interpret the data from that area. Analyzers are particularly helpful for identifying intermittent problems. For example, if your network backbone has experienced moments of instability that prevent users from logging on to the network, you can attach an analyzer to the backbone to capture the intermittent problems when they happen again.

Probes

Like an analyzer, a probe is a network device that collects network data. Depending on its type, a probe can collect data from multiple segments simultaneously. It stores the collected data and transfers the data to an analysis site when requested. Unlike an analyzer, a probe does not interpret data.

A probe can be either a stand-alone device or an agent in a network device. The Transcend Enterprise Monitor 500 series and the SuperStack® II Monitor series are stand-alone RMON probes. LANsentry Manager and Traffix Manager use data from probes that comply with the “RMON MIB” or the “RMON2 MIB”.

You can use a probe daily to determine the health of your network. The Transcend NCS applications can interpret and report this data, alerting you to possible problems so that you can proactively manage your network. For example, an RMON2 probe can help you to analyze traffic patterns on your network. Use this data to make decisions about reconfiguring devices and end stations as needed.

Cable Testers

Cable testers examine the electrical characteristics of the wiring. They are most commonly used to ensure that building wiring and cables meet Category 5, 4, and 3 standards. For example, network technologies such as Fast Ethernet require the cabling to meet Category 5 requirements. Testers are also used to find defective and broken wiring in a building.

3
STEPS TO ACTIVELY MANAGING YOUR NETWORK
These sections describe the steps that you can take to effectively troubleshoot your network when the need arises:

■ Designing Your Network for Troubleshooting

■ Preparing Devices for Management

■ Configuring Transcend NCS

■ Knowing Your Network

Designing Your Network for Troubleshooting

By designing your network for troubleshooting, you can access key devices on your network when your network is experiencing connectivity or performance problems. Having adequate management access depends on these design criteria:

■ Position of the management station so that it can gather the greatest amount of network data through Simple Network Management Protocol (SNMP) polling

■ Position of probes for distributed management of critical networks

■ Ability to communicate with each device even when your management station cannot access the network

The following sections discuss how to design your network with the preceding criteria in mind:

■ Positioning Your SNMP Management Station

■ Using Probes

■ Monitoring Business-critical Networks

■ Using Telnet, Serial Line, and Modem Connections

■ Using Communications Servers

■ Setting Up Redundant Management

■ Other Tips on Network Design

Positioning Your SNMP Management Station

In a typical LAN, locate your management station directly off the backbone where it can conduct SNMP polling and manage network devices. The backbone is usually the optimum location for the management station because:

■ The backbone is not subject to the failures of individual subnetworked routers or switches.

■ In a partial network outage, the information collected by a backbone management station is probably more accurate than from a station in a routed subnetwork.

■ The backbone is usually protected with redundant power and technologies, like Fiber Distributed Data Interface (FDDI), that correct their own problems. This redundancy ensures that the backbone remains operational, even when other areas of the network are having problems.

■ The backbone is typically faster and has a higher bandwidth than other areas of your network, making it a more efficient location for a management station.

Make sure that the capacity of your backbone can accommodate the SNMP traffic that the management applications generate.
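A back-of-the-envelope estimate can help you judge whether polling traffic is significant for your backbone. The Python sketch below simply multiplies out the polling parameters; every input (number of devices, requests per device, bytes per request-and-response exchange, and polling interval) is an assumption that you must supply from your own network, and the function names are hypothetical.

```python
def polling_load_bps(devices, requests_per_device, bytes_per_exchange, interval_s):
    """Average SNMP polling load in bits per second.

    bytes_per_exchange is the assumed size of one request plus its response.
    """
    if interval_s <= 0:
        raise ValueError("polling interval must be positive")
    total_bits = devices * requests_per_device * bytes_per_exchange * 8
    return total_bits / interval_s


def fraction_of_link(load_bps, link_bps):
    """Share of the link capacity that the polling load represents."""
    return load_bps / link_bps
```

For example, 100 devices polled with 10 requests each, at an assumed 250 bytes per exchange every 10 seconds, average 200,000 bits per second, which is 2 percent of a 10 Mbps link. Because this is an average, polls that are sent in bursts can briefly use more.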

Figure 2 shows a management station that is set up at the network backbone and polling network devices.

Figure 2 SNMP Management at the Backbone

Although SNMP management from the backbone is a good way to keep track of what is happening on your network, do not rely on it exclusively. Because SNMP management occurs in-band (that is, SNMP traffic shares network bandwidth with data traffic), network troubleshooting using SNMP can become a problem in these ways:

■ Very heavy data traffic or a break in the network can make it difficult or impossible for the management station to poll a device.

■ Traffic that SNMP polling adds to the network may contribute to networking problems.

Using Probes

To minimize the frequency of SNMP traffic on your network, set up one or more probes to collect Remote Monitoring (RMON) data from the network devices. In the distributed model illustrated in Figure 3, the management station uses SNMP polling to collect data from the probes rather than from all the network devices. Distributing the management over the network ensures some continued data collection even if you have network problems.

Many management applications support data from MIBs other than the RMON MIBs. For this reason, even if you are using RMON probes, some SNMP polling to individual devices from a key management station is always useful for a complete picture of your network.

[Figure 2 diagram labels: backbone; management workstation; NIC card or network device; x = network devices that you want to poll]


Figure 3 Management at the Backbone with an Attached Probe

To extend your remote monitoring capabilities, use embedded RMON probes or roving analysis (monitoring one port for a period of time, moving on to another port for a while, and so on). However, with roving analysis, you cannot see a historical analysis of the ports because the probe is moving from one port to another.

Some probes, like 3Com’s Enterprise Monitor, are designed to support the large number of interfaces that are found in switched environments. The probe’s high port density supports this multi-segmented switched environment. You can also use the probe’s interfaces to monitor mirror (or copy) ports on the switch, which means that all data received and transmitted on a port is also sent to the probe.

Probes do not indicate which port has caused an error. Only a managed hub (a hub or switch with an onboard management module) can provide that level of detail. Probes and a hub’s own management module complement each other.

[Figure 3 diagram labels: backbone; probe; NIC card or network device]


Monitoring Business-critical Networks

On business-critical networks, you need to increase your level of management by dedicating probes to the essential areas of your network. For detailed network management, it is not enough to gather raw performance figures; you need to know, at the network and conversation level, what is generating the traffic and when it is being generated. For this type of analysis, use reporting tools, such as Traffix Manager, and low-level, fault diagnostic tools, such as LANsentry® Manager.

The three critical areas to monitor on this type of network are discussed in these sections and shown in Figure 4:

■ FDDI Backbone Monitoring

■ Internet WAN Link Monitoring

■ Switch Management Monitoring

Figure 4 Probes Monitoring a Business-critical Network

[Figure 4 diagram labels: FDDI backbone; SuperStack® II Enterprise Monitor with FDDI module; direct connection to the management workstation; WAN; SuperStack II Enterprise Monitor; inline monitoring on Fast Ethernet; legend: possible probe attachment to a switch’s roving analysis port]


FDDI Backbone Monitoring

On the FDDI backbone, you need to continually monitor whether it is being overutilized, and, if so, by what type of traffic. By placing the SuperStack® II Enterprise Monitor with an FDDI media module directly at the backbone, you can gather utilization and host matrix information. Traffix Manager uses these data to provide regular segment utilization reports and Top-N host reports. In addition, the probe provides a full range of FDDI performance statistics that LANsentry Manager can record or that SNMP traps can report to the management station.

To ensure management access to the probe, provide a direct connection to the probe from your management station. This connection lets you access probe data even if the ring is unusable, and it keeps management traffic off the main ring.

Internet WAN Link Monitoring

The Internet link is a concern for dedicated network management because it:

■ Represents an external cost to the company

■ Requires budgeting

■ Is a possible security problem

In a way that is similar to monitoring the FDDI backbone, Traffix Manager reports can indicate whether you are paying for too much bandwidth or whether you need to purchase more. Traffix Manager can also indicate the level of use on a workgroup basis for internal billing and highlight the top sites that users visit. Similarly, you can monitor for unexpected conversations and protocols.

You also need to know the error rates on this link and whether you are experiencing congestion because of circumstances on the Internet provider’s network. LANsentry Manager can record and display these statistics and provide a detailed real-time view.

Switch Management Monitoring

The third area of interest in this network is the large number of switch-to-end station links. When detailed analysis of these devices is required (for example, if one of the ports on the network suddenly reports much higher traffic than normal), you need to track the source of the problem and decide whether you can optimize the traffic path. In this case, you need a way to view the traffic on the switch port at a conversation level.

By placing a SuperStack II Enterprise Monitor in a central location, you can easily attach it to the switches that have the most Ethernet ports as the need arises. By using the roving analysis feature of many 3Com devices, you can copy data from a monitored port to the port on the switch that is connected to the SuperStack II. When a problem arises, roving analysis is activated for a particular switch and LANsentry Manager or Traffix Manager collects the data from the SuperStack II Enterprise Monitor. These applications can then monitor the network data for the devices that are connected to that switch.

Using Telnet, Serial Line, and Modem Connections

To minimize your dependency on SNMP management, set up a way to reach the console of your key networking devices. Through the console, you can often view Ethernet, FDDI, Asynchronous Transfer Mode (ATM), and token ring statistics, view routing and bridging tables, and determine and modify device configurations.

Out-of-band (that is, management using a dedicated line to a device) console connections are also key to network troubleshooting. If the network goes down, your console connections are still available.

The types of console connections include:

■ Telnet — Out-of-band and in-band access using a network connection. For example, on 3Com’s CoreBuilder™ 6000 switch, using Telnet you can access the management console by using a dedicated Ethernet connection to the management module (out-of-band) and from any network attached to the device (in-band).

■ Serial line — Direct, out-of-band access using a terminal connection. This type of connection allows you to maintain your connections to a device if it reboots.

■ Modem — Remote, out-of-band access using a modem connection.

Figure 5 shows management of a device through the serial line and modem ports.


Figure 5 Out-of-band Management Using the Serial and Modem Ports

Sometimes, direct access to network devices through out-of-band management is the only way to examine a network problem. For example, if your network connections are down, you can Telnet to one of your key routers and examine its routing table. The routing table lists the devices that the router can reach, allowing you to narrow the area of the problem. You can also Ping from this device to further investigate which areas of the network are down.

Using Communications Servers

Although out-of-band management keeps you in contact with a particular device during a network problem, it does not inform you about all the areas of your network from a central point. You must access each device separately. To manage devices more centrally, you can set up a communications server (often called a comm server). See Figure 6.

[Figure 5 labels: modems; wiring closet; network switch; serial line port; modem port; attached LAN]


Figure 6 Out-of-band Management with a Communications Server

For optimal benefit, provide two management connections to the comm server:

■ Connect the comm server to the network (an in-band connection) so that you can access the devices from anywhere on the network using reverse Telnet.

■ Connect your management workstation directly to one of the serial ports of the comm server (an out-of-band connection) so that you can access the devices when the network is down.

Setting Up Redundant Management

To ensure that a management station can always access the backbone, set up a redundant management system. In this setup, management applications (often different ones) run on separate management workstations, which are connected to the backbone through separate network devices or by using a network card.

This setup allows the management workstations to monitor each other and report any problems with their attached network devices. The redundancy system also provides a backup management connection to your network if one management station loses connectivity.


[Figure 6 labels: communications server (“comm” server); wiring closets; network switches; serial line ports; attached LAN]


Other Tips on Network Design

This section provides some additional tips for designing your network for troubleshooting.

Management Station Configuration

■ Configure the management station to run without any network connection — including NIS, NFS, and DNS lookups. Do not install Transcend® on a network drive.

■ Have more than one interface available on the management station, an arrangement called dual hosting. Connect vital probes to the second interface to create a private monitoring LAN (one without regular network traffic) on which network problems do not impair communication.

■ Do not give the management station privileges on the network, such as the ability to log in with no passwords (rsh). Hackers can easily spot management stations.

■ Connect the management station to an uninterruptible power supply (UPS) to protect the station from events that interrupt power, such as blackouts, power surges, and brownouts.

■ Regularly back up the management station.

■ Provide remote access through a modem to the management station so that you can keep track of your network’s activity remotely.

More Tips

■ Use managed hubs to narrow which link is causing an error. Even if your budget does not allow you to manage all hubs, strategically install one managed hub for error tracking.

■ Keep copies of all configurations on a file server and on the management station. See “Knowing Your Network’s Configuration” for more information.

Preparing Devices for Management

Before Transcend (or any other management software) can work with the devices on your network, make sure that the devices are configured appropriately for management communication.

If you have a problem establishing a management connection, see “Manager-to-Agent Communication” for more information about solving this problem.


Configuring Management Parameters

Before you attempt to manage the supported devices with Transcend NCS applications, ensure that each device conforms with these prerequisites:

■ The device must have an IP hostname and IP address. When you manage modular devices, use the IP address of the device’s management module, if one is present.

■ The device and your network management platform must use the same SNMP read (get) and write (set) community strings. See “Security” for more information about community strings.

Configuring Traps

SNMP trap reporting means that management agents send unsolicited messages to management stations, relaying events that have occurred at the device, such as a system reboot. Traps include an object identifier (OID) that passes integer values or strings that the management software decodes.
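To illustrate what decoding a trap involves, the following minimal Python sketch turns the fields of a received trap into a readable line. It is not how Transcend decodes traps. The generic trap numbers are the standard SNMPv1 values; the function name, the agent address, and the sample variable are assumptions made for this example.

```python
# Standard SNMPv1 generic trap numbers.
GENERIC_TRAPS = {
    0: "coldStart",
    1: "warmStart",
    2: "linkDown",
    3: "linkUp",
    4: "authenticationFailure",
    5: "egpNeighborLoss",
    6: "enterpriseSpecific",
}


def describe_trap(agent_address, generic_trap, variables):
    """Build a one-line description of a trap (hypothetical helper).

    variables is a list of (OID string, value) pairs carried by the trap.
    """
    name = GENERIC_TRAPS.get(generic_trap, f"unknown({generic_trap})")
    details = ", ".join(f"{oid}={value}" for oid, value in variables)
    if details:
        return f"{agent_address}: {name} [{details}]"
    return f"{agent_address}: {name}"


# A linkDown trap that reports interface index 3 (illustrative values).
print(describe_trap("138.6.12.34", 2, [("1.3.6.1.2.1.2.2.1.1.3", 3)]))
# A reboot reported as a coldStart trap with no variables.
print(describe_trap("138.6.12.34", 0, []))
```

A real management station receives these fields from the network; the sketch covers only the decoding step.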

Configure each device to send the SNMP traps that are required by the network management applications to the management station. You can set SNMP traps using the device’s console program or Device View, a Transcend NCS application.

For more information about traps, see “Trap Reporting”.

Configuring Transcend NCS

Configure Transcend to monitor your network most effectively, identify when thresholds are exceeded, and alert you to problems or potential problems.

Monitoring Devices

For Transcend to monitor your devices:

■ Use your platform’s autodiscovery feature to detect all manageable devices on your network and to create a network map. Transcend NCS applications use this data for their operation. For Transcend NCS applications to recognize 3Com devices from the platform, the device icons must be 3Com device icons.

■ Add 3Com devices to an inventory database using Transcend Central. You can import devices from your platform’s database. The Transcend Central database defines the devices that many of the Transcend NCS applications manage and allows you to group devices for easier management and faster troubleshooting.


■ Create logical and physical groups of the devices in your database using Transcend Central.

Setting Thresholds and Alarms

Thresholds are the upper and lower limits that you set for the network conditions and events that you are monitoring with network management software. When these limits are exceeded, the management software reports that a threshold has been exceeded (usually by icons changing color). Alarms add to this reporting functionality by allowing you to configure an action to be taken (such as disabling ports or sending e-mail) if the threshold is exceeded.

Alarms that are configured correctly can prevent inconvenient or even catastrophic network failures. The main advantage of alarms is that you can specify at exactly which point an action should take place, and you can tailor them to suit the normal operating conditions of your network.

The first time that you use the Transcend NCS applications, use the default thresholds to see how they apply to your network. After you assess your network’s normal behavior, you can adjust the thresholds and alarms to make them more useful for your particular network. See “Identifying Your Network’s Normal Behavior” for more information.

Setting Thresholds in Status Watch

You can set a rising threshold and a falling threshold for most Status Watch tools. The rising threshold triggers a status severity change when the threshold is exceeded. The falling threshold causes a status severity change when the excessive activity or abnormal condition has returned to normal.

For example, your Ethernet network may normally accommodate 50 percent utilization. If it exceeds 60 percent for an extended time, your network slows considerably. You want to know when and for how long your network exceeds the threshold of 60 percent.
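You can answer the question of when and for how long utilization exceeds 60 percent from periodic samples. The following minimal Python sketch is not part of Status Watch; the function name, the sample format (minute, percent utilization), and the sample data are assumptions for illustration.

```python
def periods_above(samples, threshold):
    """Return (start, end) time pairs during which the value exceeded threshold.

    samples is a list of (time, value) pairs in time order. A period runs
    from the first sample above the threshold to the first later sample at
    or below it, or to the last sample if the value never drops.
    """
    periods = []
    start = None
    for time, value in samples:
        if value > threshold and start is None:
            start = time                     # excursion begins
        elif value <= threshold and start is not None:
            periods.append((start, time))    # excursion ends
            start = None
    if start is not None:                    # still above at the last sample
        periods.append((start, samples[-1][0]))
    return periods


# Illustrative data: utilization sampled every 5 minutes.
samples = [(0, 48), (5, 52), (10, 63), (15, 71), (20, 66),
           (25, 55), (30, 61), (35, 50)]
for start, end in periods_above(samples, 60):
    print(f"above 60 percent from minute {start} to minute {end}")
```

With these samples, the sketch reports one 15-minute period and one 5-minute period above the threshold.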

Status Watch also allows you to set status severity levels for the conditions and events in the FDDI Status and the System Status tools. For some conditions and events, you can specify severity levels for the individual values of the variables.

For more information about setting thresholds in Status Watch, see the Status Watch User Guide and Status Watch Help.


Setting Thresholds and Alarms in LANsentry Manager

Much of network management involves monitoring for specific network events. With LANsentry Manager, you can specify these events in advance and then know as soon as they occur. This process is known as setting alarms.

Consider the following examples of alarms:

■ Example A: The router on your network, which is capable of forwarding data at 3,000 packets per second (pps), appears to have problems forwarding at the top of its specification. You configure an alarm to notify you as soon as the traffic approaches this rate.

■ Example B: Your network is running at 1,400 pps. Typically, a Cyclic Redundancy Check (CRC) error rate of more than 1 percent of network traffic is considered excessive. You configure an alarm to notify you as soon as the CRC error rate climbs above the threshold of 14 pps (1 percent of 1,400 pps).

Over time, you build up a library of alarms for your own network.
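The arithmetic behind Example B can be written out as a small Python sketch. The function names are assumptions made for this example and are not LANsentry Manager settings.

```python
def error_threshold_pps(traffic_pps, max_error_percent):
    """Error rate (pps) that corresponds to a percentage of total traffic."""
    return traffic_pps * max_error_percent / 100


def crc_alarm(traffic_pps, crc_errors_pps, max_error_percent=1):
    """True when the CRC error rate is above the allowed share of traffic."""
    return crc_errors_pps > error_threshold_pps(traffic_pps, max_error_percent)


# Example B: 1,400 pps of traffic; more than 1 percent is excessive.
print(error_threshold_pps(1400, 1))   # 14.0
print(crc_alarm(1400, 20))            # True
print(crc_alarm(1400, 9))             # False
```

If the normal traffic level on your network changes, recalculate the threshold in the same way.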

Refining Alarm Settings

You can refine your alarms for more exact monitoring by setting the hysteresis zone and defining Start and Stop events.

Hysteresis zone

For more control over the conditions that trigger an alarm, you can also specify a hysteresis zone around the specified value. The hysteresis zone ensures that alarms are not triggered due to small fluctuations around the threshold value. The hysteresis zone is the area where a value has fallen below the upper threshold (also called the rising threshold) but has not yet reached a lower threshold (also called the falling threshold). After a rising threshold generates an alarm, the value must fall below the falling threshold before another alarm is generated. For alarms that are set on falling thresholds, the rule is reversed. Figure 7 shows an example of this alarm mechanism.


Figure 7 Alarm Triggering Mechanism
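The rising-threshold rule described above can be sketched in a few lines of Python. This is a minimal illustration of the mechanism, not LANsentry Manager code. The rising value of 2,800 pps comes from Example A; the falling value of 2,700 pps, the function name, and the sample data are assumptions.

```python
def rising_alarm_events(samples, rising, falling):
    """Return the indexes of the samples at which a rising alarm fires.

    After an alarm fires, it is re-armed only when the value drops below
    the falling threshold, so fluctuations inside the hysteresis zone
    (between falling and rising) do not fire again.
    """
    if falling > rising:
        raise ValueError("falling threshold must not exceed rising threshold")
    events = []
    armed = True
    for index, value in enumerate(samples):
        if armed and value > rising:
            events.append(index)   # alarm event generated
            armed = False
        elif not armed and value < falling:
            armed = True           # value left the hysteresis zone
    return events


traffic_pps = [2500, 2850, 2790, 2820, 2600, 2810, 2900]
print(rising_alarm_events(traffic_pps, rising=2800, falling=2700))  # [1, 5]
print(rising_alarm_events(traffic_pps, rising=2800, falling=2800))  # [1, 3, 5]
```

The second call sets both thresholds to the same value, which removes the hysteresis zone; the small dip to 2,790 pps then re-arms the alarm and produces an extra event.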

Stop and Start events

In addition to using alarms on their own, in LANsentry Manager, you can use them as Start or Stop events when capturing packets with the Capture application. In Example A, you can start capturing all packets the router transmits whenever the traffic rate rises above 2,800 packets per second and then stop capturing when it drops below this level. In this way, you can capture packets leading up to the event and immediately after. By combining alarms and the Capture application, you have powerful troubleshooting capabilities.

For more information about setting alarms with LANsentry Manager, see the LANsentry Manager User Guide and Help.

Setting Alarms Based on a Baseline

When you determine the baselines of your network’s normal activity with Traffix Manager, you can use the Alarms View in LANsentry Manager to set alarms that trigger when network activity deviates from the baseline. See “Baselining Your Network” for more information.

[Figure 7 labels: hysteresis zone; alarm event generated; time]


When determining the baseline for setting utilization alarms, use either of these approaches:

■ Set alarms for any peaks in network utilization — Pick a baseline value that covers most of your network traffic, ignoring any obvious one-time-only peaks. For example, as users log on at the start of the day, you see a large peak in network utilization. The alarm is triggered whenever such peaks occur.

■ Set alarms for exceptional peaks in network utilization — Pick a baseline value that covers the highest possible peak seen when service was still provided. The alarm is triggered at levels higher than this peak, alerting you to the most serious utilization on your network.

When you choose the baseline for error alarms, pick the lowest possible baseline so that the alarm is triggered by any peaks.
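The two approaches to a utilization baseline can be compared with a short Python sketch. It assumes that "covers most of your network traffic" means a value at or above 95 percent of the samples; that percentage, the function names, and the sample data are assumptions for illustration and are not values that Traffix Manager calculates.

```python
def typical_peak_baseline(samples, coverage_percent=95):
    """Value at or above coverage_percent of the samples (ignores rare peaks)."""
    ordered = sorted(samples)
    index = max(0, len(ordered) * coverage_percent // 100 - 1)
    return ordered[index]


def exceptional_peak_baseline(samples_while_service_ok):
    """Highest utilization seen while service was still provided."""
    return max(samples_while_service_ok)


# Percent utilization over one day; 78 is the peak when users log on.
utilization = [22, 25, 31, 28, 35, 30, 27, 33, 29, 26,
               24, 32, 34, 30, 28, 27, 31, 29, 78, 30]

print(typical_peak_baseline(utilization))      # 35
print(exceptional_peak_baseline(utilization))  # 78
```

An alarm set at the first value is triggered by the log-on peak; an alarm set at the second value is triggered only by utilization above anything seen so far.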

Other Tips for Setting Thresholds and Alarms

For SNMP traps to be effective, their thresholds must be high enough so that they do not generate false alarms. On the other hand, high thresholds also mean that a small number of errors can escape detection. A very small error rate that regularly occurs (such as four per minute) can cause major problems with protocols with large retry delays. For example, some MAC-level errors corrupt packets so that a switch does not forward them.

Knowing Your Network

You can better troubleshoot the problems on your network by:

■ Knowing Your Network’s Configuration

■ Identifying Your Network’s Normal Behavior

Knowing Your Network’s Configuration

Part of understanding your network is knowing its physical and logical configuration. You should know:

■ Which devices are on your network

■ How the devices are configured

■ Which devices are attached to the backbone

■ Which devices connect your network to the outside world (WAN)

To keep track of your network’s configuration, gather the following information:


■ Site Network Map

■ Logical Connections

■ Device Configuration Information

■ Other Important Data About Your Network

This data, when kept up-to-date, is extremely helpful for locating information when you experience network or device problems.

Site Network Map

A network map helps you to:

■ Know exactly where each device is physically located

■ Easily identify the users and applications that are affected by a problem

■ Systematically search each part of your network for problems

You can create a network map using any drawing or flow chart application. Store your network map online. In addition, make sure that you always have a current version on paper in case you cannot access the online version. Figure 8 shows an example of a network map of 3Com devices.


Figure 8 Example of a Site Network Map

Consider including the following information on your network map:

■ Location of important devices and workgroups (by floor, building, or area)

■ Location of the network backbone, data center, and wiring closets, as appropriate for your network

■ Location of your network management stations

■ Location and type of remote connections

■ IP subnetwork addresses for all managed switches and hubs

■ Other subnetwork addresses, such as Novell IPX and AppleTalk, if appropriate for your network

[Figure 8 labels: locations: Floor 1, Floor 2, data center; media and addresses: Ethernet, Fast Ethernet, FDDI (IP: 138.6.12.xxx), FDDI (IP: 138.6.13.xxx), FDDI backbone (IP: 138.6.1.xxx); devices: CoreBuilder 9000 with switch modules, CoreBuilder™ 3500, NETBuilder II® 8-slot, AccessBuilder® 5000 7-slot, CS/2500, SuperStack® II Switch 2200, SuperStack II Switch 3300, SuperStack II Hub 100 TX, SuperStack II Enterprise Monitor with FDDI module, network management station with FDDI card; end stations and servers: Windows NT workstations, Windows 95 workstations, UNIX workstations, printers, mail server, Web server, NetWare servers, server farm; remote connections: Internet, modems, ISDN]


■ Type of media (by actual name, such as 10BASE-T, or by grouping, such as Ethernet), which you can show with callouts, colors, line weights, or line styles

■ Virtual workgroups, which you can show with colors or shaded areas

■ Redundant links, which you can indicate with gray or dashed lines

■ Types of network applications that are used in different areas of your network

■ Types of end stations that are connected to the switches and hubs

Complete data about end station connections is usually too detailed for the network map. Instead, maintain tables that detail which end stations are connected to which devices, along with the MAC addresses of each end station. Use tools like “Address Tracker” to generate the MAC address information.

Logical Connections

With the advent of virtual LANs (VLANs), you need to know how your devices are connected logically as well as physically. For example, if you have connected two devices through the same physical switch, you can assume that they can communicate with each other. However, the devices can be in separate VLANs that restrict their communication.

Knowing the setup of your VLANs can help you to quickly narrow the scope of a problem to a VLAN instead of to a network connection.

The Transcend NCS application Enterprise VLAN Manager allows you to view the logical makeup of your network. Depending on the complexity of your network and VLAN configurations, you can use colors to show the VLANs graphically on your network map.

Device Configuration Information

Maintain online and paper copies of device configuration information. Make sure that all online data is stored with your site’s regular data backup. If your site does not have a backup system, copy the information onto a backup disc (CD, Zip disk, and the like) and store it offsite.

The Transcend NCS Network Admin Tools include applications that allow you to save device configurations.


Follow these guidelines for saving configuration information:

■ Because the easiest way to recover a device’s configuration is to use FTP or TFTP, save the configuration settings of each device that supports this method of uploading.

■ For other devices, Telnet in and save the session (which contains configuration details) to a file. If you cannot print the configuration of a device, then create a quick “rebuild” guide that explains the quickest way to configure the device from a fresh install.

■ For devices that store information to diskette, store this data as part of your site’s regular backup.

■ For routers and other important devices with text configuration files, store this data online in a revision control system. Keep the most recent version on paper. Keep previous versions.

■ For PCs, keep a recovery disk for each type of PC. For any device that you use as a server, store all startup scripts and copies of registries.

Other Important Data About Your Network

For a complete picture of your network, have the following information available:

■ All passwords — Store passwords in a safe place. Keep previous passwords in case you restore a device to a previous software version and need to use the old password that was valid for that version.

■ Device inventory — The inventory allows you to see the device type, IP address, ports, MAC addresses, and attached devices at a glance. Software tools, such as Transcend Central, can help you keep track of the 3Com devices on your network. Using Transcend Central, you can group devices by type and location and have this information on hand for troubleshooting.

■ MAC address-to-port number list — If your hubs or switches are not managed, you must keep a list of the MAC addresses that correlate to the ports on your hubs and switches. Generate and keep a paper copy of this list, which is crucial for deciphering captured packets, using Address Tracker.

Do not rely on Address Tracker getting an up-to-date list of MAC addresses because the network may be down, which prevents SNMP polling. If the network is down, an exported copy of Address Tracker’s data is invaluable (online or on paper).


■ Log book — Document your interactions, no matter how trivial, with each device that is critical to your network’s operation (that is, routers, remote access devices, security servers). For example, document that you noticed a fan making noise one morning. Your note may help you to identify why a device is over temperature a week later (because the fan stopped working).

■ Change control — Maintain a change control system for all critical systems. Permanently store change control records.

■ Contact details — Store, online and on paper, the details of all support contracts, support numbers, engineer details, and telephone and fax numbers.

■ LANsentry Reporter — Use LANsentry Reporter to generate reports from the database.

To be ready to remotely access your network, store the network maps, contact details, and important network addresses at the homes of those who support the network.

Identifying Your Network’s Normal Behavior

By monitoring your network over a long period, you begin to understand its normal behavior. You begin to see a pattern in the traffic flow, such as which servers are typically accessed, when peak usage times occur, and so on. If you are familiar with your network when it is fully operational, you can be more effective at troubleshooting problems that arise.

Baselining Your Network

You can use a baseline analysis, which is an important indicator of overall network health, to identify problems. A baseline can serve as a useful reference of network traffic during normal operation, which you can then compare to captured network traffic while you troubleshoot network problems. A baseline analysis speeds the process of isolating network problems.

By running tests on a healthy network, you compile “normal” data to compare against the results that you get when your network is in trouble. For example, Ping each node to discover how long it typically takes you to receive a response from devices on your network.
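As an illustration of comparing current results against "normal" data, the following minimal Python sketch summarizes Ping response times that were recorded on a healthy network and then flags nodes that now respond much more slowly or not at all. The node names, the times, and the factor of 2 are assumptions; collecting the Ping times is outside the scope of the sketch.

```python
import statistics


def build_baseline(rtt_ms_by_node):
    """Summarize response times (ms) collected while the network was healthy."""
    return {
        node: {"mean": statistics.mean(rtts), "max": max(rtts)}
        for node, rtts in rtt_ms_by_node.items()
    }


def slow_nodes(baseline, current_rtt_ms, factor=2.0):
    """Nodes with no reply (None) or a time above factor x the healthy maximum."""
    flagged = []
    for node, rtt in current_rtt_ms.items():
        if rtt is None or rtt > factor * baseline[node]["max"]:
            flagged.append(node)
    return sorted(flagged)


# Hypothetical measurements from a healthy network.
healthy = {"router-a": [2.1, 2.4, 2.0], "server-b": [0.8, 0.9, 1.1]}
baseline = build_baseline(healthy)

# Hypothetical measurements taken during a problem.
print(slow_nodes(baseline, {"router-a": 2.3, "server-b": 9.5}))   # ['server-b']
print(slow_nodes(baseline, {"router-a": None, "server-b": 1.0}))  # ['router-a']
```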

Applications such as Status Watch, Address Tracker, LANsentry Manager, and Traffix Manager allow you to collect days and weeks of data and set a baseline for comparison. Through the reporting mechanisms in the following list, you can continuously assess the data from your network and ensure that its performance is optimal:

■ Web Reporter generates daily or weekly reports from data collected by Status Watch.

■ Traffix Manager generates weekly reports from collected data and calculates the baselines for you. Set up Utilization History and Error History reports with data resolution set to Weekly.

■ LANsentry Manager History View generates daily utilization graphs, which are sampled every 30 minutes, for each day over one week. Use these graphs to calculate your network baselines manually.

Identifying Background Noise

Know your network’s background noise so that you can recognize “real” data flow. For example, one evening after everyone is gone, no backups are running, and most nodes are on, analyze the traffic on your network using the Traffix Manager application. The traffic that you see is mostly broadcast and multicast packets. Any errors that you see are the result of faulty devices (trace). This traffic is the background noise of your network — traffic that occurs for little value. If background noise is high, redesign your network.

II
NETWORK CONNECTIVITY PROBLEMS AND SOLUTIONS
Chapter 4 Manager-to-Agent Communication

Chapter 5 FDDI Connectivity

Chapter 6 Token Ring Connectivity and Errors

Chapter 7 ATM and LANE Connectivity

4
MANAGER-TO-AGENT COMMUNICATION
Use these sections to identify and correct problems with communication between the management station and network devices:

■ Manager-to-Agent Communication Overview

■ Verifying Management Configurations

See “Manager-to-Agent Communication Reference” for additional conceptual and problem analysis detail.

Manager-to-Agent Communication Overview

If your management workstation cannot communicate with devices on the network, examine your management configurations for the devices and your management station configurations.

For information about Simple Network Management Protocol (SNMP), see “SNMP Operation”.

Understanding the Problem

If your management station or the devices that you manage are incorrectly configured for management, then the management station, which includes your Transcend® applications, cannot perform autodiscovery, polling, or SNMP Get and Set requests on the device.

If you have not configured port connections (including a possible out-of-band serial or modem connection) and have not created an administration password for access to the management agent, do so before you continue.

Identifying the Problem

Examine your management configurations for any device that your management station cannot reach. Also examine your management station setup. If you can reach a device but are not receiving traps, first examine the trap configurations (the trap destination address and the traps configured to send). See “Configuring Traps” for more information.


Solving the Problem

Either modify the device configurations so that they match those of your management station, or modify the management station to match the configurations of your devices.

Verifying Management Configurations

Verify that the following management configurations are correct:

■ IP Address

■ Gateway Address

■ Subnet Mask

■ SNMP Community Strings

■ SNMP Traps

How these parameters are configured can vary by device. For more information, see the user guide for each device.

Follow these steps:

1 Ping the device.

■ If the device is accessible by Ping, then its IP address is valid and you may have a problem with the SNMP setup. Go to step 5.

■ If the device is not accessible by Ping, then there is a problem with either the path or the IP address.

2 To test the IP address, Telnet into the device using an out-of-band connection.

If Telnet works, then your IP address is working.

3 If Telnet does not work, connect to the device’s console using a serial line connection and ensure that your device’s IP address setting is correct.

If your management station is on a separate subnetwork, make sure that the gateway address and subnet mask are set correctly.

4 Using a management application, perform an SNMP Get and an SNMP Set (that is, try to poll the device or change a configuration using management software).

5 If you cannot reach the device using SNMP, access the device’s console and make sure that your SNMP community strings and traps are set correctly.

You can access the console using Telnet, a serial connection, or a Web management interface.
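The order of the checks in the steps above can be summarized as a small decision function. The following Python sketch is only an illustration: it does not run Ping, Telnet, or SNMP itself, and the function name and message texts are assumptions. You supply each test result as you observe it. The advice to examine the network path when Telnet works is an inference from steps 1 and 2.

```python
def next_check(ping_ok, telnet_ok=None, snmp_ok=None):
    """Suggest the next check, following the order of the steps above.

    Each argument is the observed result of a test (True or False), or
    None if that test has not been run yet.
    """
    if not ping_ok:
        # Step 1 failed: the path or the IP address is at fault.
        if telnet_ok is None:
            return "Step 2: Telnet to the device over an out-of-band connection."
        if telnet_ok:
            # Assumption: a working IP address leaves the path as the suspect.
            return "IP address is working; examine the network path."
        return ("Step 3: use the serial console to check the IP address, "
                "gateway address, and subnet mask.")
    # Ping works, so the IP address is valid; suspect the SNMP setup.
    if snmp_ok is None:
        return "Step 4: perform an SNMP Get and an SNMP Set."
    if not snmp_ok:
        return "Step 5: check the SNMP community strings and traps at the console."
    return "Management communication is working."


print(next_check(ping_ok=False))
print(next_check(ping_ok=False, telnet_ok=False))
print(next_check(ping_ok=True, snmp_ok=False))
```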


Manager-to-Agent Communication Reference

This section explains management configuration terms and provides additional conceptual and problem analysis detail.

IP Address

Devices use IP addresses to communicate with the management station and to perform routing tasks. Assign a unique IP address to each device in your network. Choose each IP address from the range of addresses that are assigned to your organization.

Gateway Address

The default gateway IP address identifies the gateway (for example, a router) that receives and forwards those packets whose addresses are unknown to the local network. The agent uses the default gateway address when sending alert packets to the management workstation on a network other than the local network. Assign the gateway address on each device.

Subnet Mask

The subnet mask is a 32-bit number in the same format and representation as IP addresses. The subnet mask determines which bits in the IP address are interpreted as the network number, which as the subnetwork number, and which as the host number. Each IP address bit that corresponds to a 1 in the subnet mask is in the network/subnetwork part of the address. This group of bits is also called the Network ID. Each IP address bit that corresponds to a 0 is in the host part of the IP address.

Each Internet address class (A, B, or C) has its own default subnet mask. The subnet mask must match the subnet mask that you used when you configured your TCP/IP software.
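To see how the mask splits an address, and why a wrong mask or a missing gateway address can stop a device from reaching the management station, consider this minimal Python sketch. It uses the standard ipaddress module. The addresses follow the 138.6.x.x numbering of the example site map; the host numbers and masks are assumptions.

```python
import ipaddress


def split_address(address, mask):
    """Return (network ID, host number) for a dotted-decimal address and mask."""
    iface = ipaddress.ip_interface(f"{address}/{mask}")
    # Bits that correspond to 0 in the mask form the host part.
    host_number = int(iface.ip) & int(iface.network.hostmask)
    return str(iface.network.network_address), host_number


def needs_gateway(device, mask, manager):
    """True if the manager is outside the device's local subnetwork."""
    network = ipaddress.ip_interface(f"{device}/{mask}").network
    return ipaddress.ip_address(manager) not in network


print(split_address("138.6.12.34", "255.255.255.0"))   # ('138.6.12.0', 34)

# With a 255.255.255.0 mask, a manager on 138.6.1.x is on another
# subnetwork, so the device must send to its default gateway.
print(needs_gateway("138.6.12.34", "255.255.255.0", "138.6.1.10"))  # True

# With a 255.255.0.0 mask, the device treats the same manager as local
# and never uses the gateway.
print(needs_gateway("138.6.12.34", "255.255.0.0", "138.6.1.10"))    # False
```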

SNMP Community Strings

An SNMP community string is a text string that acts as a password. It is used to authenticate messages that are sent between the management station (the SNMP manager) and the device (the SNMP agent). The community string is included in every packet that is transmitted between the SNMP manager and the SNMP agent.

After receiving an SNMP request, the SNMP agent compares the community string in the request to the community strings that are configured for the agent. The requests are valid under these circumstances:

■ Only SNMP Get and Get-next requests are valid if the community string in the request matches the read-only community.

■ SNMP Get, Get-next, and Set requests are valid if the community string in the request matches the agent’s read-write community.
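The two rules above amount to a simple comparison, which this minimal Python sketch illustrates. It is not the code of any 3Com agent; the function name and the request-type labels are assumptions. The comparison is an exact string match, so it is case-sensitive.

```python
READ_REQUESTS = {"get", "get-next"}
WRITE_REQUESTS = {"set"}


def request_is_valid(request_type, community, read_only, read_write):
    """Apply the read-only and read-write community rules to one request."""
    if community == read_write:
        return request_type in READ_REQUESTS | WRITE_REQUESTS
    if community == read_only:
        return request_type in READ_REQUESTS
    return False   # unknown community string: reject every request


# Agent configured with read-only "public" and read-write "private".
print(request_is_valid("get", "public", "public", "private"))    # True
print(request_is_valid("set", "public", "public", "private"))    # False
print(request_is_valid("set", "private", "public", "private"))   # True
print(request_is_valid("get", "Public", "public", "private"))    # False
```

The last call fails only because of the capital letter, which is a common cause of a management station that cannot reach a device.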

For more information about SNMP requests and community strings, see “SNMP Operation”.

A device is difficult or impossible to manage if:

■ The device is not using the correct community strings.

■ Your management station uses community strings that do not match those of the devices it manages.

If community strings do not match, either modify the community string at the device so that it is the string that the management station expects, or modify the management station so that it uses the device’s community strings.

Table 6 lists the default community strings for some common 3Com devices. Modify these default strings when you install a new device. You can use “Device View” to change community strings of most 3Com devices.

Community string settings are case-sensitive for all devices.

Table 6 Default Security Settings for Common 3Com Devices

Device                                          Read-Only Community   Read-Write Community
AccessBuilder® 7000 BRI Card and PRI Card       public                private
CoreBuilder™ 2500                               public                private
CoreBuilder 3500                                public                private
NETBuilder®                                     public                *
NETBuilder II®                                  public                *
OfficeConnect® products                         monitor               security
OfficeConnect Remote 511, 521, and 531          public                private
ONline™ hubs                                    public                *
SuperStack® II Desktop Switch                   public                security
SuperStack II Hub TR Network Management Module  public                private
SuperStack II Enterprise Monitor                public                admin
SuperStack II PS Hub                            monitor               security
SuperStack II Switch 1000                       public                security
SuperStack II Switch 2000 TR                    public                private
SuperStack II Switch 2200                       public                private
SuperStack II Switch 3000 (all variations)      public                security
SuperStack II Token Ring Monitor                public                admin
Transcend® Enterprise Monitor 540               public                admin
Transcend Enterprise Monitor 542                public                admin
Transcend Enterprise Monitor 570                public                admin

* By default, no setting exists or is needed for initial access on this device.

Although community strings are SNMP’s way to secure management communication, these strings appear in the SNMP packet header unencrypted and are visible if the packet data is analyzed. For this reason, change community string settings frequently to improve management security.

SNMP Traps

If your platform or management applications do not report events for some devices, then SNMP trap reporting may not be configured correctly for those devices.

If you find that traps are overwhelming your management workstation, you can filter out (disable) some common traps so that the management station does not receive them. Most devices allow you to select which traps to send to a management station IP address.


You can use “Device View” to change the trap reporting configuration of most 3Com devices.

See “Trap Reporting” for more information.
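The idea of a per-station trap selection can be sketched as a lookup table. The table layout, the function names, and the address are hypothetical; the trap names are the generic SNMPv1 trap types, and a real device stores this configuration in its own format.

```python
def will_send(trap_table, station_ip, trap_type):
    """Return True if the device sends this trap type to the station.

    trap_table maps a management station IP address to the set of trap
    types that the device is configured to send to that address.
    """
    return trap_type in trap_table.get(station_ip, set())


def disable_trap(trap_table, station_ip, trap_type):
    """Stop sending one trap type to one management station."""
    # discard() is a no-op when the trap type is already disabled.
    trap_table.get(station_ip, set()).discard(trap_type)
```

A station that has no entry in the table receives no traps at all, which matches the symptom of a platform that reports no events for a device.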

5
FDDI CONNECTIVITY
Use these sections to identify and correct connectivity errors on an FDDI ring:

■ FDDI Connectivity Overview

■ Monitoring FDDI Connections

■ Making Your FDDI Connections More Resilient

See “FDDI Connectivity Reference” for additional conceptual and problem analysis detail.

FDDI Connectivity Overview

Fiber Distributed Data Interface (FDDI), which is a self-correcting technology, automatically corrects ring faults to maintain connectivity throughout most of the network. However, you should monitor your FDDI connections for wrapped rings and other problems with ring connectivity.


As shown in Figure 9, in a thru FDDI LAN, no stations on the trunk ring have a Configuration State (SMTConfigurationState) of Wrap or Isolated. However, users who complain about network performance may have lost connectivity to other stations on the network because the FDDI network is wrapped or segmented.

Figure 9 Thru Ring

Wrapped ring

By monitoring the “Peer Wrap Condition”, you can see when the Configuration State changes. In a wrapped ring (Figure 10), two stations on the LAN are in a wrapped Configuration State. This condition may or may not affect the connectivity of certain stations. Although operational, your network may have a cabling problem or a problem with a link.

Figure 10 Wrapped LAN

Segmented ring

In a segmented ring (Figure 11), more than two stations are wrapped on the trunk ring. Although this mode of operation is a valid FDDI LAN configuration, your LAN is probably experiencing a degraded or degrading condition.

Figure 11 Segmented Ring

When a network connection has excessively high link errors, Station Management (SMT) shuts down the connection and tries to reconnect. A dual-attachment trunk ring station with an A or B connection that is shut down is one of the wrap points in the network. See “Making Your FDDI Connections More Resilient” for information about keeping a dual-attachment station connection from wrapping.

Isolated station

Sometimes a network wraps a particular station out of the ring. Stations on either side of a problem station can be wrapped. This effectively isolates the station or links that have problems, as shown in Figure 12.

Figure 12 Wrapped Ring with Isolated Station

If a ring was already wrapped when a network wraps a station out of the ring, then a segmented ring results, as shown in Figure 13.

Figure 13 Segmented Ring with Isolated Stations
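The thru, wrapped, and segmented cases described above can be told apart by counting wrapped stations. The sketch below assumes that you have already collected each trunk ring station's Configuration State as one of the labels used in the figures (`thru`, `wrap_A`, `wrap_B`, `isolated`); the function name is hypothetical.

```python
def classify_ring(states):
    """Classify a trunk ring from its stations' Configuration States."""
    wrapped = sum(1 for s in states if s in ("wrap_A", "wrap_B"))
    isolated = sum(1 for s in states if s == "isolated")
    if wrapped == 0 and isolated == 0:
        return "thru"        # no station is in a Wrap or Isolated state
    if wrapped == 2:
        return "wrapped"     # exactly two wrap points
    if wrapped > 2:
        return "segmented"   # more than two stations are wrapped
    return "indeterminate"   # a combination the overview does not describe
```

For example, the states shown in Figure 12 (two wrap points, two thru stations, and one isolated station) classify as a wrapped ring.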

Twisted ring

In a twisted ring, an A port is connected to an A port and a B port is connected to a B port instead of the normal A-to-B connections. A twisted ring, which always has two twist points (stations), can exist in either a Thru or Wrap state. You can monitor the “Twisted Ring Condition” and “Undesired Connection Attempt Event” for evidence of twisted ring and other connection problems.


To identify the problem, follow this process:

1 At the FDDI LAN level, verify that your network is operating.

If the network is operating, the FDDI ring may be segmented, and therefore an FDDI station or an Ethernet station on an Ethernet link may have lost connectivity to other nodes on the network.

2 Determine if a ring is in a Thru, Wrap, or Segmented state.

If the FDDI ring is segmented or wrapped, look for a problem with a link somewhere in the network or for a nonfunctioning node on your trunk ring. If the ring is operating and is not segmented, or if it is segmented but you still have connectivity to the stations in question, move to a more specific level in your network.

See “Monitoring FDDI Connections” for more information.

3 Determine if the poorly performing station is an Ethernet or FDDI station.

If the problem is an FDDI station, find out if it is congested (that is, if the station is so busy that it cannot accept all the network traffic that is directed to it) by determining its “Bandwidth Utilization”. Also determine if the station has a high frame error rate by looking at the “FDDI Ring Errors”.

If the problem is an Ethernet station, look for congestion by examining “Ethernet Packet Loss” and “Bandwidth Utilization”.

Solving the Problem

Identify the station that is causing the disconnection and take the appropriate steps:

■ If the disconnection is caused by a wrapped ring, then fix the hardware or cabling problem at that station.

■ If the station is congested, you have a device problem rather than a network problem. For example, if the congested station is a file server and every other machine on the network is retrieving and saving files using that server, consider upgrading your server or adding servers to the network. A variety of devices from different vendors may be communicating on an FDDI or Ethernet network; some are faster and more capable, and some are slower and more prone to congestion.

■ If the station is an Ethernet station that is attached to an Ethernet segment, reevaluate the setup of your Ethernet network and make some changes to improve its performance.

You can also make FDDI connections more resilient by implementing dual homing or installing an Optical Bypass Unit (OBU) where FDDI connections are prone to fail. See “Making Your FDDI Connections More Resilient” for more information.

Monitoring FDDI Connections

Monitor your FDDI devices for Warning or Critical alerts in the FDDI Status tool.

Status Watch

Use Status Watch to identify these FDDI connectivity errors:

■ Peer Wrap Condition

■ Twisted Ring Condition

■ Undesired Connection Attempt Event

Follow these steps:

1 In the Device area, select the device that is located where you suspect an FDDI ring connectivity problem.

2 Monitor the FDDI Status tool for the currently selected device.

Here are some pointers for monitoring:

■ If the Peer Wrap Configuration State variable is Isolated, the device is not connected to the FDDI trunk ring. If you intend the device to remain isolated, this indication is not a serious condition. However, if the device is supposed to be connected on a trunk ring, a serious problem may exist. The device is no longer transmitting packets to the larger trunk ring.

■ If the Peer Wrap flag (SMTPeerWrapFlag) is set, the device is one of the wrap points. The cause of the wrapped ring is somewhere in the portion of the network between the two stations that report the peer wrap condition.

Making Your FDDI Connections More Resilient

When devices are removed from an FDDI ring, there is a break in the fiber path that causes the ring to wrap until the ring is made whole again. To prevent the break in the FDDI connection, you can implement dual homing or install an Optical Bypass Unit (OBU).

Implementing Dual Homing

When the operation of a dual attachment node is critical to your network, dual homing adds reliability by providing a backup connection if the primary link fails. Because a dual attachment station (DAS) has two attachments to the FDDI ring (A-to-M and B-to-M), you can use one of them as a “standby” link if the active link fails. Using dual homing, only one of the two attachments is active at a time. In this sense, a DAS acts as if it is a single attachment station (SAS) by using its A port as the standby link.

Through SMT, a DAS can be dual homed to the same concentrator or, more commonly, to two concentrators. This arrangement provides a more stable trunk ring of concentrators. If one concentrator fails, the DAS enables the standby link to another concentrator to become the active link. See Figure 14.

If the station is a dual path or dual path/dual MAC station, you can configure the dual-homed station in one of two ways:

■ With both links active

■ With one link active and one connection withheld as a backup, only becoming active when one link fails
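The one-active, one-standby arrangement can be modeled in a few lines. This is only a sketch: the class name is hypothetical, and it assumes that the B-to-M link is the active link and the A-to-M link is the standby, as described above for a DAS that uses its A port as the standby link.

```python
from dataclasses import dataclass


@dataclass
class DualHomedStation:
    b_link_up: bool = True  # B-to-M attachment, assumed to be the active link
    a_link_up: bool = True  # A-to-M attachment, held as the standby link

    def active_port(self):
        """Return the port that carries traffic, or None if both links are down."""
        if self.b_link_up:
            return "B"
        if self.a_link_up:
            return "A"  # the standby link takes over when the B link fails
        return None
```

Losing the concentrator on the B side moves traffic to the A side; the station loses connectivity only when both attachments are down.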

Figure 14 Dual Homing Configuration

Installing an Optical Bypass Unit

You can insert an Optical Bypass Unit (OBU) into the FDDI ring as if it were a node and then plug your device into it. To use an OBU, your device needs an optical bypass interface. This interface lets the bypass know whether your device is still on the ring or not. See Figure 15.

If your device is removed or if it fails, the bypass unit diverts the optical path away from your device, keeping the ring whole. You can use a bypass on devices that are prone to failure or are likely to be removed often, such as diagnostic equipment.

Figure 14 labels: SAS, SAS server, dual-homed switch, Concentrator #1, Concentrator #2, FDDI dual ring, active link, standby link (set by SMT configuration policy), and ports A, B, and M.

Figure 15 Optical Bypass Unit Configuration

FDDI Connectivity Reference

This section explains terms that are relevant to FDDI connectivity and provides additional conceptual and problem analysis detail.

Peer Wrap Condition

A Peer Wrap (wrapped ring) condition occurs when a dual-attachment station detects a fault (often a lost connection) and reconfigures the network by wrapping the dual trunk rings to form a single ring. Normally, the two stations that are adjacent to the fault wrap to maintain full connectivity. However, if a second fault occurs before the first is repaired, the network partitions itself into two or more rings and stations lose connectivity.

When a station reports a Peer Wrap condition, locate and repair the problem that caused the station to wrap the rings. Potential causes include:

■ Faulty FDDI port hardware

■ Faulty cables or connectors

■ Unplugged connectors

■ Powered-down stations

You can expect to find the cause of the problem between the two stations that report the Peer Wrap condition.

Twisted Ring Condition

A Twisted Ring condition occurs when certain undesirable connection types exist. See Table 7 for more information. Although similar to the Undesired Connection Attempt, the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis.

Figure 15 labels: OBU, FDDI dual ring, MIC receptacles, DAS with ports A and B, and a power/control cable connected to the optical bypass interface of the DAS.

Undesired Connection Attempt Event

An Undesired Connection Attempt event occurs when a port tries to connect to another port of a type that may result in an undesirable network topology. Whether the connection attempt is successful depends on the current setting of the station’s connection policies.

Table 7 lists connections that the FDDI standard defines as undesirable. The managed devices may or may not permit these connections, depending on their FDDI station configurations.

Table 8 lists FDDI connections that create valid topologies.

Table 7 Undesirable Connection Types

Connection Type*   Reason That the Connection Is Undesirable
A-A                Twisted primary and secondary rings
A-S                A wrapped ring
B-B                Twisted primary and secondary rings
B-S                A wrapped ring
S-A                A wrapped ring
S-B                A wrapped ring
M-M                A tree of rings topology (illegal connection)

* SuperStack® II Monitor series and Transcend® Enterprise Monitor series use type 1 to represent connection type A and type 2 to represent connection type B.

Table 8 Valid Connection Types

Connection Type   Reason That the Connection Is Valid
A-B               A normal trunk ring peer connection
A-M               A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
B-A               A normal trunk ring peer connection
B-M               A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
S-S               A single ring of two slave stations
S-M               A normal tree connection
M-A               A tree connection that provides possible redundancy
M-B               A tree connection that provides possible redundancy
M-S               A normal tree connection
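Tables 7 and 8 together cover all 16 pairings of the port types A, B, S, and M, so they can be combined into one lookup. The sketch below copies the table contents; the function name `check_connection` is hypothetical, and whether a device actually permits an undesirable connection still depends on its connection policies.

```python
# Table 7: connection types that the FDDI standard defines as undesirable.
UNDESIRABLE = {
    "A-A": "twisted primary and secondary rings",
    "A-S": "a wrapped ring",
    "B-B": "twisted primary and secondary rings",
    "B-S": "a wrapped ring",
    "S-A": "a wrapped ring",
    "S-B": "a wrapped ring",
    "M-M": "a tree of rings topology (illegal connection)",
}

# Table 8: connection types that create valid topologies.
VALID = {"A-B", "A-M", "B-A", "B-M", "S-S", "S-M", "M-A", "M-B", "M-S"}


def check_connection(local_port, remote_port):
    """Classify a connection between two port types (A, B, S, or M)."""
    key = f"{local_port}-{remote_port}"
    if key in UNDESIRABLE:
        return "undesirable: " + UNDESIRABLE[key]
    if key in VALID:
        return "valid"
    raise ValueError(f"unknown port types: {key}")
```

Calling `check_connection("A", "A")` reports the twisted-ring case, while `check_connection("M", "S")` reports a valid connection.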

6
TOKEN RING CONNECTIVITY AND ERRORS
Use these sections to identify and correct token ring errors:

■ Token Ring Overview

■ Using Transcend Applications to Identify Problems and Symptoms

■ Identifying and Solving Ring Errors

■ Troubleshooting Notes

Token Ring Overview

Token Ring’s ring topology uses a token passing method for ring access and data transmission. The term “ring” is derived from the logical adjacency of the links between the adapter cards in a token ring network. Each adapter card has physical links to an upstream neighbor and to a downstream neighbor. Each device on the ring transmits onto the downstream link and receives data from the upstream link. In this way, each node acts as a repeater, passing traffic from neighbor to neighbor.

The term “token” refers to a special data sequence that is continuously sent around the ring. Any node that has data to send waits to receive the token before sending that data. Only a station in possession of a token can transmit new data on the ring. Unlike Ethernet and other contention-oriented protocols, token passing resolves network access conflicts without collisions. This arrangement ensures that every station on a token ring always has access to the network within a predictable time interval, even under heavy traffic load.
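Token passing can be illustrated with a toy simulation. This sketch assumes that a station sends at most one frame each time it holds the token, which is a simplification; the function name and data layout are hypothetical.

```python
from collections import deque


def run_ring(queues, rotations):
    """Simulate token rotations around a ring.

    queues holds one deque of pending frames per station, in ring order.
    Returns the (station index, frame) pairs in the order they were sent.
    """
    sent = []
    for _ in range(rotations):
        # The token visits each station in ring order.
        for station, pending in enumerate(queues):
            if pending:
                # Only the station that holds the token transmits.
                sent.append((station, pending.popleft()))
    return sent
```

Because the token reaches every station once per rotation, a busy station cannot starve the others: in this model, each station with data waits at most one full rotation before it transmits.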

Token Ring connects up to 260 nodes in a star topology at 4 Mbps or 16 Mbps. Token Ring is a data link protocol or MAC layer protocol and functions at layers 1 and 2 of the 7-layer Open Systems Interconnection (OSI) model.


Using Transcend Applications to Identify Problems and Symptoms

Troubleshooting your Token Ring network is a basic elimination process. The design of Token Ring makes it easier to isolate the causes of poor network performance or network failure because of the nearest active upstream neighbor (NAUN) concept. However, locating the actual cause of the symptom may not be obvious. Correctly isolating the symptom and cause eases your troubleshooting tasks and decreases the time your network is down or experiencing poor performance.

Use these tools to troubleshoot ring errors:

■ Token Ring Manager’s Statistics Tool (Windows and UNIX)

■ TR Analyzer (Windows only)

■ LANsentry Manager (Windows and UNIX)

■ Status Watch’s Token Ring Status Tool

■ Status Watch’s Token Ring Utilization Tool

Using Token Ring Statistics Tool

To view general performance statistics of your token ring network, you can use Token Ring Manager’s Statistics Tool. This tool, available for Windows and UNIX, displays high-level statistics of your network. See Figure 16.

Figure 16 Token Ring Manager’s Statistics Tool


Token Ring Manager’s Statistics Tool shows top level statistics of your token ring network, such as total utilization, soft errors, and hard errors. You can view these statistics on three levels:

■ Port

■ Stack

■ Unit

For more information on Token Ring Manager’s Statistics Tool, see the Token Ring Manager User Guide.

Soft errors are less serious types of Token Ring errors that usually only temporarily disturb normal ring performance. However, high occurrences of certain types of soft errors can impact your network. You can review the soft errors that have occurred by using LANsentry® Manager (Figure 17) or TR Analyzer. These soft errors include:

■ Internal error

■ Burst error

■ Line error

■ Abort delimiter transmitted error

■ AC error

■ Lost frame error

■ Receiver congestion error

■ Frame copied error

■ Frequency error

■ Token error
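When you review soft error counts, it can help to total them by category. The sketch below groups the ten soft errors listed above into isolating and non-isolating errors. The grouping follows the usual IEEE 802.5 convention and is not stated in this guide; the function name and the lowercase labels are assumptions.

```python
# Grouping per the usual IEEE 802.5 convention (an assumption, not from this guide).
ISOLATING = {"internal", "burst", "line", "abort delimiter transmitted", "ac"}
NON_ISOLATING = {"lost frame", "receiver congestion", "frame copied", "frequency", "token"}


def summarize(counts):
    """Total soft error counts by category.

    counts maps a soft error name (lowercase) to the number of occurrences.
    Names that are in neither group are ignored.
    """
    return {
        "isolating": sum(n for name, n in counts.items() if name in ISOLATING),
        "non_isolating": sum(n for name, n in counts.items() if name in NON_ISOLATING),
    }
```

A high isolating total points toward a specific station or its upstream neighbor, while non-isolating errors do not identify a location on the ring.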

Using LANsentry Manager

LANsentry Manager consists of an integrated set of applications that you can use to display and explore the real-time and historical data captured by RMON-1 and RMON-2 compliant devices on the network. You can use LANsentry Manager to collect statistics to identify and deal with imminent problems.

Use LANsentry Manager to:

■ Capture and display packets using filtering and decode functions.

■ Configure alarms to monitor for specific events on a segment.

■ Monitor current performance of LAN segments.


■ Spot signs of current problems.

■ View trends over time.

At specified intervals, LANsentry Manager polls remote network devices to retrieve essential network data. LANsentry Manager processes and displays the collected data in the main window. From the main window, you can monitor the health of a segment, its current performance, and recent trends. You can also open new windows to monitor different segments at the same time.

The following graphs appear for Token Ring in LANsentry Manager’s main window:

■ Packet Size Distribution

■ Packet Rates

■ Network Statistics

■ Top 10 Hosts (Packet Rate)

■ Top 10 Hosts (Error Rate)

■ Token Ring Status

■ Event Distribution

For in-depth investigation, you can launch LANsentry Manager’s RMON Views and Applications from the main window. The RMON Views and Applications allow you to:

■ Capture and display specific packets.

■ Compare statistics from different segments in the same graph.

■ Look at statistical and historical data.

■ Monitor conversations between stations on the network.

■ Set up alarm conditions.

Using the Ring Station View

The Ring Station View generates a table of statistics and status information associated with each station on the ring, including station status and last entered and last exited times.

For example, a disruption is caused every time a station inserts onto the ring. This results in a ring purge event, and a new token is issued by the active monitor. Use the Ring Station View to track which station is doing this and to discover the active monitor issuing the token.

Use the Ring Station View to:

■ Spot patterns on the token ring.

■ Review isolating errors and non-isolating errors.

■ See which devices are currently active on the ring.

Figure 17 LANsentry Manager Main Window

For more information on Ring Station View or LANsentry Manager, see the LANsentry Manager User Guide.

Using TR Network Analyzer Tool

The TR Network Analyzer Tool provides an interface to the most important performance statistics that Token Ring management agents monitor. The interface provides a summary of the critical status and performance information for any monitored Token Ring network at a glance.

The TR Network Analyzer Tool window displays summary network configuration information and performance statistics. The right side of the window displays network graphs. The left side of the window displays an active station and error statistics list.


Network Graphs

By default, the TR Network Analyzer window shows the following graphs:

■ Utilization: MAC and Non-MAC

■ Soft Errors: Isolating and Non-isolating

■ Recoveries: Claim Tokens, Beacons, and Ring Purges

You can change the network graphs that appear. The following network graphs are available:

■ Utilization

■ Soft Errors

■ Recoveries

■ Line Errors

■ Internal Errors

■ Burst Errors

■ AC Bit Errors

■ Abort Xmits

■ Lost Frames

■ RX Congest

■ Frame Copys

■ Frequency Errors

■ Token Errors

Active Station and Error Statistics List

The Active Station and Error Statistics List lists all active stations on the managed network with corresponding station error information. This spotlights error trends and lets you quickly identify problem nodes.

This list is divided into two columns. The first column contains the ASCII characters that represent a station address. The second column is the number of errors that the station has.

These errors are broken down into specific errors in a box below. These errors can be:

■ Line errors


■ Internal errors

■ Burst errors

■ AC Bit errors

■ Aborted transmits

■ Lost Frame errors

■ Congestion errors

■ Frame Copied

■ Frequency errors

■ Token errors

■ Soft errors

■ Isolating errors

■ Non-Isolating errors

For more information about the TR Network Analyzer Tool, see the Token Ring Manager User Guide for Windows.

Transcend’s Status Watch application provides two tools for viewing and monitoring Token Ring segments. You can use these tools to:

■ Monitor events of token ring stations

■ View and configure utilization of bandwidth on the ring

Token Ring Status Tool

The Token Ring Status tool monitors conditions and events of the token ring stations in a managed device.

When you select a device in the report, the conditions and events for that device are sorted by severity. When you select a condition or event, the related variables and current values appear in the far right column.

Token Ring Utilization Tool

The Token Ring Utilization tool monitors the amount of traffic on token ring segments and shows how the bandwidth is being allocated on your network.

You can use the collected information to determine which ports have excessively high or low utilization. If necessary, you can redistribute network traffic accordingly. For example, if utilization is 40 percent or higher on a shared token ring segment, you need to reconfigure the network to better balance the load.
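The 40 percent guideline can be checked from raw counters. The sketch below assumes that you have an octet count for a segment over a known interval (for example, from an RMON probe); the function names are hypothetical, and the calculation ignores framing overhead.

```python
def utilization_percent(octets, seconds, ring_mbps):
    """Return segment utilization as a percentage of ring capacity.

    octets    -- octets observed on the segment during the interval
    seconds   -- length of the interval
    ring_mbps -- ring speed, 4 or 16 for Token Ring
    """
    return 100.0 * octets * 8 / (seconds * ring_mbps * 1_000_000)


def needs_rebalancing(octets, seconds, ring_mbps, threshold=40.0):
    """Return True if utilization is at or above the threshold."""
    return utilization_percent(octets, seconds, ring_mbps) >= threshold
```

For example, 48,000,000 octets in 60 seconds on a 16 Mbps ring is 384,000,000 bits out of a possible 960,000,000, or 40 percent, which meets the threshold.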

For more information about Status Watch, see the Status Watch User Guide.

Identifying and Solving Ring Errors

This section contains five sample problems that you may encounter on your network. These examples show ways to begin your troubleshooting task using Transcend®.

Example 1 A user cannot access the network.

1 Launch Token Ring Manager’s Network Analyzer (Windows) or LANsentry Manager (UNIX).

2 Look up the user’s address, which can be a MAC address, IP address, or NetWare name.

If you find the correct address, then you know that the user is on the ring; however, the user may not be able to access a particular server. See Example 2.

If you cannot find the user’s address, see Step 3.

3 If you did not find the user’s address:

■ Reboot the machine to determine if this “fixes” any connection problems.

■ Examine the physical wiring from the stations’ adapters to the hub.

Example 2 A user cannot access the server.

1 Launch Token Ring Manager’s Network Analyzer (Windows) or LANsentry Manager (UNIX).

2 Look up the user’s address, which can be a MAC address, IP address, or NetWare name.

3 Launch LANsentry Manager (Windows and UNIX).

4 Perform a packet capture. You may want to perform a packet capture from the user end, the server end, or on any devices in between to find the source of the problem. Packet captures can be filtered on protocols or addresses.

5 Analyze various packet captures.


Example 3 A user declares that the network is slow.

1 Launch LANsentry Manager.

2 Review the “Top 10 Errors”. Identify which station is reporting the errors and the types of errors being reported.

3 Take the appropriate course of action on the NAUN (nearest active upstream neighbor).

If you notice that the problem is due to a high amount of traffic (greater than 40 to 50 percent utilization), do the following:

1 Review which station(s) or port(s) have high or excessive utilization.

2 Reconfigure the ring so that network traffic is distributed optimally in the network.

Example 4 A group of users cannot access the network.

Verify the integrity of the main ring path cabling, and/or follow these steps:

1 Launch Token Ring Manager’s Device Manager.

2 Examine RI/RO cards for devices on the ring.

Example 5 None of the users on the ring can access the network.

■ Check the status of your switch, bridge, or router.

■ Ping the device from one or more stations on the ring.

Troubleshooting Notes

Here are some recommendations for easing troubleshooting tasks:

Documentation

Start by documenting the network topology; this is the first step in understanding and administering your network. The documented topology should include all devices on the network. You can document topologies of your complete LAN and of subnets in a LAN.

Documenting saves you time when you begin analyzing network problems. It will ease your troubleshooting task in general.


You can document token ring networks with Token Ring Manager’s Map Manager. For more information on Token Ring Map Manager, see the Token Ring Manager User Guide for UNIX.

Analyzing Failures

There are two types of soft errors: normal and abnormal. Many normal soft errors occur when a station inserts into the ring or exits from a ring. These types of normal soft errors can usually be overlooked, especially when diagnosing a potentially more serious network problem.

Don’t overlook the basics or any “obvious” problems. Remember to confirm the integrity of the main ring path. Check for bad or loose cable connectors or bad or damaged cabling. Also, make sure that any peripheral devices are configured correctly. For example, verify that your Network Interface Card (NIC) is correctly set to 16 Mbps or 4 Mbps.

Know Your Network

As a network administrator, you should have a good understanding of your network’s baselines. Baselines are usually identified as average network utilization for a set time period. For example, peak utilization on a network may occur first thing in the morning when a majority of users are starting their systems and launching various applications.

Baselining gives you a better understanding of how your network functions normally. This knowledge can also help you better analyze problems or failures in your network.
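A baseline of this kind can be computed from utilization samples grouped by hour of day. The sketch below is one possible approach, not a Transcend feature; the function names and the 10-point tolerance are assumptions.

```python
from collections import defaultdict
from statistics import mean


def hourly_baseline(samples):
    """Average utilization per hour of day.

    samples is an iterable of (hour_of_day, utilization_percent) pairs.
    """
    by_hour = defaultdict(list)
    for hour, utilization in samples:
        by_hour[hour].append(utilization)
    return {hour: mean(values) for hour, values in by_hour.items()}


def deviates(baseline, hour, utilization, tolerance=10.0):
    """Return True if a reading differs from the baseline for that hour
    by more than the tolerance (in percentage points)."""
    return abs(utilization - baseline[hour]) > tolerance
```

Comparing against the same hour matters: a reading of 31 percent may be normal at the morning peak and abnormal in the middle of the afternoon.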

7
ATM AND LANE CONNECTIVITY
Ensuring Asynchronous Transfer Mode (ATM) and LAN Emulation (LANE) connectivity is a vital step in troubleshooting your ATM network. Use these tools to establish a baseline from which to measure future performance variations. Use these sections to identify and correct ATM and LANE connectivity problems:

■ ATM and LANE Connectivity Overview

■ Color Status and Propagation

■ Device Level Troubleshooting

■ LANE Level Troubleshooting

■ ATM Network Level Troubleshooting

■ Virtual LANs Level Troubleshooting

■ Identifying VLAN Splits

■ Path Assistants for Identifying Connectivity and Performance Problems

ATM and LANE Connectivity Overview

ATM differs from conventional LAN technologies because it is based on a connection-oriented model. A connection in ATM is a point-to-point or point-to-multipoint link from one end of a system to another. ATM is also based on cells. To complete the connection, the cells must traverse a series of ATM switches in the network.

This methodology simplifies the delivery of cells because station destination and source addresses do not need to be carried in each cell. After a connection identifier makes the connection, the connection remains open. Information is received in the same order in which it was sent, so there is no need to disassemble and then reassemble cells. This delivery mechanism is especially useful and effective in voice and video applications.

Before data can be sent through an ATM network, a connection must be established between end stations, either by using a preestablished, fixed path or by using a protocol that determines the signalling.

Because ATM is connection-oriented, troubleshooting the network begins with the connection.

Color Status and Propagation

The Enterprise VLAN Management software supports an extensive, context-based status notification feature.

The same network event may cause a different status on different logical maps or icons. For example, a LAN Emulation Client (LEC) that cannot join its LAN Emulation Server (LES) is considered a critical event in the LAN Emulation map and not necessarily a fault in the Enterprise map. The severity depends on the context or the logical domain. Colors propagate upwards to the parent icon, so that the color of the window at the next highest level is influenced.

Transcend icons use high-end platform-configured icon status colors. Each status has a default color that the user can change.

You need to:

1 Identify the icon status color.

2 Locate the event.

3 Identify the cause of the color status according to the tables below and fix the problem if possible.

Table 9 lists the icon statuses according to the severity of the fault.

Table 9 Color Coding Key

Status           Color
Critical         Red
Major            Orange
Minor            Yellow
Normal           Green
User-definable   Brown
Unknown          Blue
Disabled         White
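Upward propagation can be sketched as taking the most severe status among an icon's children. This is a simplified model: it covers only the four severity levels and their default colors from Table 9, it leaves out the User-definable, Unknown, and Disabled statuses, and it ignores context-specific rules. The names are hypothetical.

```python
# Ordered from least to most severe; default colors are from Table 9.
SEVERITY = ["Normal", "Minor", "Major", "Critical"]
COLOR = {"Critical": "Red", "Major": "Orange", "Minor": "Yellow", "Normal": "Green"}


def propagate(child_statuses):
    """Return the status that a parent icon shows for its children.

    The parent takes the most severe child status. An icon with no
    children is treated as Normal (an assumption of this sketch).
    """
    return max(child_statuses, key=SEVERITY.index, default="Normal")
```

For example, a parent with Normal, Minor, and Critical children shows Critical, so its icon appears in red.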


Device Level Troubleshooting

See Table 10 for the color and status of devices. Using this information you can determine where bottlenecks are occurring and then take appropriate steps to relieve the congestion.

Table 10 Color Key for Root Window and Devices

Map          Icon     Status     Status Cause
Root                             Each icon reflects highest priority status of maps below it.
Enterprise   Device   Critical   Does not respond to SNMP.
                      Major      Hardware problem in the device.
                      Minor      Device ports are enabled but in down state.
                      Normal     Device operating normally.

If one or more parts of the logical entity of a device are in a critical state, the device appears in Major state. For example, a CoreBuilder appears in a Major state if one or more of the LESs that are attached to it are in critical state.

LANE Level Troubleshooting

The LAN Emulation map shows a comprehensive view of the real-time status of all edge devices in the network. Use this map to quickly determine the status of all LECs in the network prior to the start of a new work day. If any LEC in any switch is not in operational state, the critical status propagates up to the edge device icon level and further to the LAN Emulation icon at the highest level. This status propagation allows you to isolate the problem and fix it before users call to report it.

Table 11 lists the status states of the LAN Emulation icons.

Table 11 Color Key for LANE Level

Map                     Icon      Status/Color  Status Cause
LAN Emulation                                   All icons reflect highest priority status of maps below it.
Backbone and Services   LES       Critical      Not defined for this version.
                                  Major         Does not respond to SNMP.
                                  Minor         There is a user-defined name for this VLAN ID, but there is no LEC connected.
                                  Brown         There is no LEC connected.
                                  Normal        In operational state.
                        LEC       Critical      The LEC is not connected to the LES. It may be in join, configure or LECS connect state.
                                  Major         Does not respond to SNMP.
                                  Minor         In initial state.
                        LECS      Major         Does not respond to SNMP.
                                  Minor         Enabled but not active.
                                  Unknown       The LES is enabled but the LECS is disabled on the CoreBuilder device.
                                  Normal        Enabled and operational.
LANE User               LEC       Critical      The LEC is not connected to the LES. It may be in join, configure or LECS connect state.
                                  Minor         In initial state.
                        Segment   Major         The device connecting this segment does not respond to SNMP.
                                  Brown         The first segment on the device may appear in this status. All other segments on the device are operating normally.
                                  Unknown       Device (all segments) operating normally.


ATM Network Level Troubleshooting

Table 12 lists the status states of the ATM Network icons. These maps are hidden by default because the Topology view contains the same information. If you want to display the ATM Network map, select Edit, then select Show Hidden Object from the HP OpenView menu.

Virtual LANs Level Troubleshooting

Table 13 lists the status states of the Virtual LANs icons.

Table 12 Color Key for Network icons

Map                 Icon            Status    Status Cause
ATM Network         Switch Domain   Critical  One or more of the lower level devices has an error of highest severity.
                                    Major     One or more of the lower level devices has a hardware problem.
                                    Minor     One or more lower level devices has device ports that are enabled but in down state.
ATM Switch Domain                             This icon shows the highest priority status of the edge devices below it.

Table 13 Color Key for Virtual LANs Icons

Map            Icon          Status    Status Cause
Virtual LANs   Virtual LAN   Critical  The LES is in major state.
                             Major     One of the devices configured to use this VLAN does not respond to SNMP.
                             Minor     There is a user-defined name for this VLAN ID but there is no LEC connected.
                             Brown     There is no segment connected.
VLAN           LES           Critical  Not defined for this version.



Identifying VLAN Splits

After the redundancy in the LAN Emulation Server (LES) has taken effect, the LAN Emulation Clients (LECs) are moved to the backup services. In some circumstances, some of the LECs remain connected to the primary LES and are not moved to the backup LES. This condition creates a VLAN (ELAN) split: several LECs that belong to the same ELAN are bound to different LAN Emulation Servers. The split may occur when a LES fails and the Network Management Station (NMS) changes the LAN Emulation Configuration Server (LECS) database.

Indications in the VLAN Map

VLAN splits appear in the VLAN Map when the icon for the primary VLAN is green. This condition indicates that LECs are still attached. Under normal circumstances, only one ELAN, either primary or backup, should be green.

Indications in the Backbone and Services Window

VLAN splits appear in the Backbone and Services window when different LAN Emulation Clients (LECs) that belong to the same ELAN are bound to both the primary and the backup LAN Emulation Server (LES) of that ELAN.
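The split condition, LECs of one ELAN bound to more than one LES, can be checked mechanically once the bindings are known. The following Python sketch assumes that you already have the bindings as (LEC, ELAN, LES) tuples; how you obtain them from the management station is not shown, and all names in the sample data are hypothetical.

```python
from collections import defaultdict

def find_split_elans(bindings):
    """bindings: iterable of (lec, elan, les) tuples.
    Returns {elan: {les: [lecs]}} for every ELAN whose LECs are
    bound to more than one LES."""
    by_elan = defaultdict(lambda: defaultdict(list))
    for lec, elan, les in bindings:
        by_elan[elan][les].append(lec)
    return {
        elan: {les: sorted(lecs) for les, lecs in servers.items()}
        for elan, servers in by_elan.items()
        if len(servers) > 1
    }

# Hypothetical bindings: elan-eng is split, elan-fin is not.
bindings = [
    ("lec-a", "elan-eng", "les-primary"),
    ("lec-b", "elan-eng", "les-backup"),
    ("lec-c", "elan-eng", "les-backup"),
    ("lec-d", "elan-fin", "les-primary"),
]

print(find_split_elans(bindings))
```

The result lists, for each split ELAN, which LECs remain on each LES, which is the information you need before moving ports between VLANs.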


Table 13 Color Key for Virtual LANs Icons (continued)

Map   Icon      Status    Status Cause
                Minor     There is a user-defined name for this VLAN ID but there is no LEC connected.
                Brown     There is no LEC connected.
      Segment   Major     The device connecting this segment does not respond to SNMP.
                Brown     The first segment on the device may appear in this status. All other segments on the device are operating normally.
                Unknown   Device (all segments) operating normally.


General Process

To unify the split VLANs, you need to:

1 Ensure that all the LAN Emulation Configuration Servers have the same LAN that is displaying the split.

2 Using the Network Management Station, move the ports that are displayed in the primary VLAN to a temporary VLAN. Move the ports from the temporary VLAN into the backup VLAN.

Empty ELANs in the network are indicated by the brown color key.

Path Assistants for Identifying Connectivity and Performance Problems

You can use the Enterprise VLAN Management Path assistants to display the paths between ATM devices and network elements that are part of LAN Emulation.

LE Path Assistant

Use LE Path Assistant to select any two LE Clients or two ports and to obtain the following information:

■ Address resolution through the LE Server

■ Control distributed path (direct)

■ Multicast forward addressing through the BUS

■ Data direct path

The Path Assistant displays the corresponding ports, their LAN Emulation Clients, and the LES/BUS service used for the connection. The color of the icons isolates the problem area. You can use Path Assistant to isolate user-to-user or user-to-server connectivity problems.

To access the LAN Emulation Path Assistant window, click on two Ethernet ports and then click the Path icon.

ATM Path Assistant

Use the Path option to select any two ATM User-Network Interface (UNI) or Network-Network Interface (NNI) endpoints across the network and to see the physical path as well as the virtual channel connections (VCCs) established between the two endpoints. This assistant window provides the following information:

■ The physical path including all the intermediate switch nodes and the physical link between them


■ The ports at the ends of the physical links

■ The VCCs established between the end points

You can also use the Path option to set up Private Virtual Channels between the selected endpoints.

Tracing a VC Path Between Two ATM End Nodes

To trace a VC path between two ATM end nodes, do the following:

■ Select two ATM end nodes in the Topology Browser or management maps, and then select the Path icon.

Examining Virtual Channels Across Layer 2 Topologies

You can easily examine Virtual Channels in the Network-Network Interface (NNI) and User-Network Interface (UNI) even if the icons are located in two different maps.

To examine Virtual Channels:

1 In the management map, click on a device and select Path from the VLAN menu.

2 Select any two devices in the maps.

3 Select Find VCI/VPI to display the path between the two devices.

Tracing the LAN Emulation Control VCCs Between Two LANE Clients

To trace the LAN Emulation control VCCs between two LANE clients, perform the following steps:

1 In the LAN Emulation map or the Topology Tree, select two LECs that are attached to the same LES.

2 Select the Path icon.

The LE-Assist window is displayed, showing the control VCCs between the two LECs and the LANE services (LES/BUS).

III
NETWORK PERFORMANCE PROBLEMS AND SOLUTIONS
Chapter 8 Bandwidth Utilization

Chapter 9 Broadcast Storms

Chapter 10 Duplicate Addresses

Chapter 11 Ethernet Packet Loss

Chapter 12 FDDI Ring Errors

Chapter 13 Network File Server Timeouts

Chapter 14 Measuring ATM Network Performance

8
BANDWIDTH UTILIZATION
Use these sections to identify and correct problems that are indicated by changes in bandwidth utilization:

■ Bandwidth Utilization Overview

■ Identifying Utilization Problems

■ Generating Historical Utilization Reports

See “Bandwidth Utilization Reference” for additional conceptual and problem analysis detail.

Bandwidth Utilization Overview

To determine how your network is operating on a day-to-day basis, examine its bandwidth utilization. Changes in utilization can alert you to actual or potential problems.


Utilization varies depending on the media and on how your network is configured and used. Become aware of your network’s normal behavior so that you know when to examine utilization levels more closely. See “Identifying Your Network’s Normal Behavior” for more information.


Determine the current utilization of all media on your network (Ethernet, Fiber Distributed Data Interface, token ring, and Asynchronous Transfer Mode) to determine whether utilization rates are exceeding thresholds that you have set in the management software.

On most networks, utilization gradually increases as users begin using more network resources, such as electronic mail, network printing, and file sharing. Be concerned with utilization peaks that do not follow this pattern of use.

The process of identifying immediate utilization levels is discussed in “Identifying Utilization Problems”.


Examine your network’s historical trends (its typical utilization over time) and note whether your network has experienced a gradual or sudden increase in utilization. Here are ways to assess trends:

■ A sharp increase in utilization indicates an abnormal condition. Search the area of the network where the increase occurred. For example, a device might be causing “Broadcast Storms”.

■ A sustained high or low level of utilization indicates an increasing or decreasing load on your network. Balance your network’s load by adding or redistributing segments.

The process of identifying historical trends is discussed in “Generating Historical Utilization Reports”.

A high rate of utilization can lead to high rates of packet fragments. As utilization exceeds the alarm threshold, packet fragments become common. See “Ethernet Packet Loss” for information about identifying when packet fragments are occurring.

Solving the Problem

Narrow the utilization problem to the ports that have excessively high or low utilization. If necessary, redistribute network traffic accordingly by segmenting your LAN with a bridge, router, or switch.

Sometimes, a hardware problem can cause abnormal utilization rates. In this case, see “Ethernet Packet Loss” and “FDDI Ring Errors” for troubleshooting information.

Identifying Utilization Problems

First, determine utilization levels on your current network. Try to locate the segments that are experiencing high or low utilization levels.

Use Status Watch, which collects MIB-II data using Simple Network Management Protocol polling, to determine bandwidth utilization.

Status Watch

The Status Watch utilization tools monitor the amount of traffic on network segments and show how the bandwidth is being allocated. These tools provide a real-time report of utilization data on the selected device or group of devices.

Table 14 describes the Status Watch tools that monitor your network’s utilization.


Follow these steps:

1 Select the group that you suspect has a performance problem.

The color-coded icons (for groups, devices, and tools) can guide you to the areas of your network that are experiencing problems. For example, red icons mean that you should examine a problem immediately. If a group is red, click the group to see all devices in that group, and locate the device that is red. Select the device and examine which tool icons are also red.

2 Select the utilization tool icon that indicates a problem.

The tool report displays all the interfaces in the group or device. Determine which interfaces reflect high rates. If some interfaces are experiencing excessively high utilization rates, look for broadcast storms and other conditions that cause packet loss, as described in:

■ Broadcast Storms

■ Ethernet Packet Loss

If an increase in utilization causes an increase in Error rates (other than collisions), look for MAC and physical layer problems (for example, faulty network cards, illegal repeater hops, and cables that are too long). Additionally, monitor Collision rates as utilization rises, looking for large increases that are out of the ordinary. In particular, search devices on the segment for “Excessive Collisions”. While Collisions are normal, Excessive Collisions means network delays.

Table 14 Status Watch Tools Used for Examining Utilization

Tool Icon                What It Indicates
Ethernet Utilization     The aggregate percentage of utilization of an Ethernet segment (calculated by tracking the receive and transmit utilizations of Ethernet ports)
FDDI Utilization         The percentage of utilization of the primary, secondary, and local FDDI rings (calculated by tracking the percent utilization of FDDI ports)
Token Ring Utilization   The percentage of utilization of a token ring segment
ATM Utilization          The percentage of utilization of supported ATM interfaces

Generating Historical Utilization Reports

Use real-time utilization data to see how your network is operating at the moment. To gauge whether utilization is at a critical point for your network, look at historical data. Use Web Reporter to generate a historical report that shows the utilization trends for a specific set of devices on your network.

Web Reporter

Using Web Reporter, you can save days and weeks of network data, save a baseline week of “normal” data, and determine when utilization is constantly high.

Follow these steps:

1 Access Web Reporter. As the uniform resource locator (URL), enter the directory where you installed Transcend® NCS on the Web.

2 Generate a weekly Historical report to see utilization rates for the whole week.

3 Compare your weekly Historical report to a baseline of historical utilization data.

See “Identifying Your Network’s Normal Behavior” and the Web Reporter Help for more information about setting a baseline.
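The comparison in step 3 can be sketched as follows. This Python example assumes that you have copied daily average utilization percentages out of the reports by hand; the data layout, the sample values, and the 10-point tolerance are illustrative assumptions and are not part of Web Reporter.

```python
def compare_to_baseline(baseline, week, tolerance=10.0):
    """Return the days on which utilization (percent) exceeds the
    baseline value for that day by more than `tolerance` points."""
    return [
        (day, week[day], baseline[day])
        for day in baseline
        if day in week and week[day] - baseline[day] > tolerance
    ]

# Hypothetical daily averages, in percent.
baseline = {"Mon": 18.0, "Tue": 20.0, "Wed": 19.0, "Thu": 21.0, "Fri": 17.0}
this_week = {"Mon": 19.0, "Tue": 22.0, "Wed": 41.0, "Thu": 23.0, "Fri": 16.0}

for day, seen, normal in compare_to_baseline(baseline, this_week):
    print(f"{day}: {seen:.0f}% against a baseline of {normal:.0f}%")
```

A day that stands out against the baseline is a starting point for the real-time tools, not a diagnosis in itself.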

Bandwidth Utilization Reference

This section explains terms that are relevant to bandwidth utilization and provides additional conceptual and problem analysis detail.

ATM Utilization

Over time, if a port has experienced increased, sustained utilization levels, then you need to balance the load of your ATM segments.

Status Watch calculates ATM utilization in this way:

greater of (in_util, out_util)

where:

in_util = ( ((rate of ifInOctets)*8) / ((linespeed)*0.9875) )*100

out_util = ( ((rate of ifOutOctets)*8) / ((linespeed)*0.9875) )*100


The 8 factor converts octets to bits.

The 0.9875 factor offsets the interframe gap.
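As a worked example, the ATM calculation can be written in Python. The sketch assumes that the octet rates are in octets per second and that the line speed is in bits per second; the function names and the sample link are illustrative and are not part of Status Watch.

```python
def directional_util(octet_rate, linespeed_bps):
    """Percent utilization in one direction.
    octet_rate: octets per second (rate of ifInOctets or ifOutOctets).
    linespeed_bps: line speed in bits per second (assumed unit)."""
    return (octet_rate * 8) / (linespeed_bps * 0.9875) * 100

def atm_utilization(in_octet_rate, out_octet_rate, linespeed_bps):
    """ATM utilization: the greater of the inbound and outbound figures."""
    return max(
        directional_util(in_octet_rate, linespeed_bps),
        directional_util(out_octet_rate, linespeed_bps),
    )

# Hypothetical 155.52 Mbps link receiving 6,000,000 octets/s
# and sending 2,000,000 octets/s.
print(round(atm_utilization(6_000_000, 2_000_000, 155_520_000), 1))
```

Because the result is the greater of the two directions, a link that is busy in only one direction still reports that direction's full utilization.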

Ethernet Utilization

Over time, if a port has experienced increased utilization levels (often a sustained level of over 40 percent), then you need to rebalance the load of your Ethernet segments.

Typically, the larger the frame size, the more utilization your network can accommodate.

You may recognize utilization problems with certain protocols before other protocols because some protocols have less tolerance for high rates of traffic. When utilization becomes a problem also depends on users. For example, you may allow higher utilization rates on an engineering network, yet you want greater bandwidth availability on a financial network where data delivery is critical.

As general guidelines, your network is healthy in these conditions:

■ Utilization is running up to 15 percent most of the time.

■ Utilization is peaking at 30 to 35 percent for a few seconds at a time, with large gaps of time between peaks.

■ Utilization is peaking at 60 percent for a few seconds, with large gaps of time between peaks. However, in this instance, locate the reason for the peak. Determine if the problem might get worse or if you can isolate it.

If the 30 percent utilization peaks start occurring very close together, your network starts showing signs of degraded performance.

Status Watch calculates Ethernet utilization in this way:

in_util + out_util

where:

in_util = ( ((rate of ifInOctets)*8) / ((linespeed)*0.9875) )*100

out_util = ( ((rate of ifOutOctets)*8) / ((linespeed)*0.9875) )*100


The 0.9875 factor offsets the interframe gap.
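The Ethernet calculation differs from the ATM calculation only in that the two directions are added. The Python sketch below also shows one way to derive the rate from two polls of a counter; it assumes a line speed in bits per second and ignores counter wrap, and the helper names and sample numbers are illustrative.

```python
def octet_rate(first_count, second_count, seconds):
    """Octets per second between two polls of ifInOctets or ifOutOctets.
    Counter wrap is not handled in this sketch."""
    return (second_count - first_count) / seconds

def ethernet_utilization(in_octet_rate, out_octet_rate, linespeed_bps):
    """Ethernet utilization: inbound plus outbound, in percent."""
    in_util = (in_octet_rate * 8) / (linespeed_bps * 0.9875) * 100
    out_util = (out_octet_rate * 8) / (linespeed_bps * 0.9875) * 100
    return in_util + out_util

# Hypothetical 10 Mbps port polled twice, 30 seconds apart.
in_rate = octet_rate(1_000_000, 8_500_000, 30)   # 250,000 octets/s
out_rate = octet_rate(400_000, 3_400_000, 30)    # 100,000 octets/s
print(round(ethernet_utilization(in_rate, out_rate, 10_000_000), 1))
```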


FDDI Utilization

FDDI accepts utilization levels that are equivalent to its rated speed. Unlike Ethernet, FDDI does not have delays and problems caused by collisions.

The best way to determine high FDDI utilization is to know the normal capacity of your FDDI network. Generally, if your FDDI network is consistently reporting 90 percent or more utilization, plan to balance the load on your network.

Status Watch calculates FDDI utilization in this way:

(1 - (delta(token_count)*latency) / delta(time) )*100
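The FDDI formula can be read as "the share of the interval that was not spent passing an idle token". That reading, the units (seconds), and the sample values in the Python sketch below are assumptions for illustration; only the formula itself comes from this guide.

```python
def fddi_utilization(token_count_start, token_count_end, latency_s, interval_s):
    """FDDI utilization in percent.
    token_count_start/end: token counter at the start and end of the interval.
    latency_s: ring latency in seconds (assumed unit).
    interval_s: length of the polling interval in seconds."""
    delta_tokens = token_count_end - token_count_start
    return (1 - (delta_tokens * latency_s) / interval_s) * 100

# Hypothetical ring: 100 microseconds of latency, 40,000 tokens
# counted during a 10-second interval.
print(round(fddi_utilization(0, 40_000, 0.0001, 10), 1))
```

The more tokens counted in an interval, the lower the computed utilization.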

Token Ring Utilization

Token ring media accepts utilization levels equivalent to its rated speed. Unlike Ethernet, token ring does not have delays and problems caused by collisions.

The best way to determine high token ring utilization is to know the normal capacity of your token ring network. Generally, if your token ring network is consistently reporting 90 percent or more utilization, plan to balance the load on your network.

Status Watch calculates token ring utilization in this way:

( ( rate*8) / (speed) )*100

where:

rate = ifInOctets / delta(time)

speed = line speed of 4 or 16
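A worked version of the token ring calculation follows. The Python sketch assumes that the line speed of 4 or 16 is in Mbps and that the rate is the change in ifInOctets over the polling interval; the function name and sample values are illustrative.

```python
def token_ring_utilization(in_octets_start, in_octets_end, interval_s, speed_mbps):
    """Token ring utilization in percent.
    in_octets_start/end: ifInOctets at the start and end of the interval.
    interval_s: polling interval in seconds.
    speed_mbps: ring speed, 4 or 16 (assumed to be in Mbps)."""
    if speed_mbps not in (4, 16):
        raise ValueError("token ring line speed is 4 or 16")
    rate = (in_octets_end - in_octets_start) / interval_s  # octets per second
    speed_bps = speed_mbps * 1_000_000
    return (rate * 8) / speed_bps * 100

# Hypothetical 16 Mbps ring: 12,000,000 octets received in 10 seconds.
print(round(token_ring_utilization(0, 12_000_000, 10, 16), 1))
```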


9
BROADCAST STORMS
Use these sections to identify and eliminate broadcast storms:

■ Broadcast Storms Overview

■ Identifying a Broadcast Storm

■ Disabling the Offending Interface

■ Correcting Spanning Tree Misconfigurations

See “Broadcast Storms Reference” for additional conceptual and problem analysis detail.

Broadcast Storms Overview

A broadcast storm means that your network is overwhelmed with constant broadcast or multicast traffic. Broadcast storms can eventually lead to a complete loss of network connectivity as the packets proliferate.

Some devices, like the CoreBuilder™ 2500 and CoreBuilder 3500, have firewall protection against broadcast storms. If a certain broadcast transmit threshold is reached, the port drops all broadcast traffic. Firewalls are one of the best ways to protect your network against broadcast storms. Determine whether your network devices support this functionality.


“Broadcast Packets” and “Multicast Packets” are a normal part of your network’s operation. To recognize a storm, you must be able to identify when broadcast and multicast traffic is abnormal for your network.


You may suspect that a broadcast storm is occurring when your network response times become extremely slow and network operations are timing out. As a broadcast storm progresses, users cannot log in to servers or access e-mail. As the storm worsens, the network becomes unusable.


When your network is operating normally, monitor the percentage of broadcast and multicast traffic. You can then use this data as a baseline to determine when broadcast and multicast traffic is too high.

The process of identifying the problem is discussed in “Identifying a Broadcast Storm”.

Solving the Problem

Storms can occur if network equipment is faulty or configured incorrectly, if the Spanning Tree Protocol is not implemented correctly, or if poorly designed programs that generate broadcast or multicast traffic are used.

The process for solving the problem is discussed in these sections:

■ Disabling the Offending Interface

■ Correcting Spanning Tree Misconfigurations

Identifying a Broadcast Storm

When identifying broadcast storms, use the following applications:

■ Status Watch — To recognize when broadcast and multicast traffic exceeds the normal rates for your network

■ Traffix Manager — To monitor all broadcast traffic over time

Status Watch

Using the Status Watch tools in Table 15, you can identify when and where a broadcast storm is occurring.

For the Broadcast Receive and Broadcast Transmit tools, if the value for receive utilization is less than 10 percent, Status Watch ignores the high rate of broadcast traffic. This way, a broadcast problem is not falsely triggered in Status Watch for a segment on which a majority of traffic is spanning tree or Routing Information Protocol (RIP) packets.

Table 15 Status Watch Tools Used for Identifying Broadcast Storms

Tool Icon              What It Indicates
Broadcast Receive      The percentage of broadcast and multicast traffic received on an Ethernet port or token ring port
Broadcast Transmit     The percentage of broadcast and multicast traffic transmitted from an Ethernet port or token ring port
Ethernet Utilization   The aggregate percentage of utilization of an Ethernet segment as calculated by tracking the receive and transmit utilizations of Ethernet ports

Follow these steps:

1 Use the Summary View window to examine the Broadcast Transmit tool and Broadcast Receive tool to determine if any thresholds have been exceeded on your monitored devices.

These tools work together in this way:

■ If the thresholds for both the Broadcast Transmit tool and Broadcast Receive tool are exceeded on a device, then a broadcast storm is occurring on your network, and this device is receiving and transmitting the broadcast traffic.

■ If the threshold for the Broadcast Receive tool is exceeded but the Broadcast Transmit tool reports normal data on a device, then a broadcast storm is probably occurring on the segment that is attached to the interface that reports the excessive traffic, but this device might have a filter (such as a multicast packet firewall) that prevents the storm from propagating.

■ If the threshold for the Broadcast Transmit tool is exceeded but the Broadcast Receive tool reports normal data on a device, then the device is responsible for the broadcast storm.

2 Examine the Asynchronous Transfer Mode (ATM), Ethernet, Fiber Distributed Data Interface (FDDI), and token ring utilization tools to determine if their reported rates are abnormally high. If so, traffic is flooding the network. See “Bandwidth Utilization” for more information.

3 Search for “Ethernet Packet Loss” as an additional indicator that a broadcast storm is occurring. Increased collisions occur as the network becomes saturated.

After you set a baseline for normal network activity, you can set the Broadcast Transmit tool and Broadcast Receive tool thresholds to alert you when broadcast and multicast traffic is heavier than normal.
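The three threshold cases in step 1, together with the 10 percent receive utilization rule, can be summarized in a short Python sketch. The function names, return strings, and sample percentages are illustrative assumptions; Status Watch applies these rules itself, so the code only restates the decision logic.

```python
def broadcast_alarm(broadcast_pct, threshold_pct, receive_util_pct):
    """True when a broadcast percentage should count as exceeded.
    A high rate is ignored when receive utilization is under 10 percent."""
    if receive_util_pct < 10:
        return False
    return broadcast_pct > threshold_pct

def classify_device(rx_exceeded, tx_exceeded):
    """Map the Broadcast Receive and Broadcast Transmit results to a finding."""
    if rx_exceeded and tx_exceeded:
        return "storm: device receives and transmits the broadcast traffic"
    if rx_exceeded:
        return "storm on attached segment: device may be filtering it"
    if tx_exceeded:
        return "device is responsible for the storm"
    return "normal"

# Hypothetical readings for one device, with thresholds set at 30 percent.
rx = broadcast_alarm(broadcast_pct=65, threshold_pct=30, receive_util_pct=45)
tx = broadcast_alarm(broadcast_pct=12, threshold_pct=30, receive_util_pct=45)
print(classify_device(rx, tx))
```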

Traffix Manager

Using Traffix Manager, you can monitor all broadcast traffic to identify exactly which devices are generating broadcast traffic.


Follow these steps:

1 Using the Select Database Traffic to Load dialog box, retrieve data to the Map using the 6-Hourly or Hourly data resolution.

Finer resolutions take longer to load from the database to the Map. However, they are more suitable for in-depth analysis of network traffic than the daily or weekly resolutions. For quicker retrieval of finer resolution data, select a shorter time range.

2 Open the Protocol Selection dialog box and set all protocols to appear as Other:

a Click Clear All to deselect all protocols.

b Click the Other checkbox to select it without selecting any child protocols.

c Set the Protocol Filter Mode to Unselected protocols are added to parent.

3 In the Map, select MAC Labels to display devices by their MAC addresses.

4 Use the Find Objects tool to locate the broadcast MAC address ff:ff:ff:ff:ff:ff and select it from the Object List or Map.

5 From the Display menu, select Show Conversations To and From to display all traffic that is going to and from the broadcast MAC address.

6 Set the Map all objects button to Map connected objects.

7 To create a list of the devices that are sending broadcast traffic to the broadcast address, right-click the Traffix group and select Visible Device List….

8 To generate a baseline of broadcast traffic:

a Right-click the Traffix root group and select Protocol Distribution.

b Select Packets and the timeline graph format.

9 To generate a list of the Top-N sources of broadcast traffic:

a Right-click the Traffix root group and select Child Top N.

b Select Packets and the bar graph format.

c Set Top N to an appropriate value.

The Top-N list can indicate what interface is starting the storm and what interfaces are propagating the storm.
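If you export conversation data for analysis outside Traffix Manager, the same Top-N ranking can be reproduced with a few lines of Python. The record layout (source MAC, destination MAC, packet count) and the sample addresses are hypothetical; the guide does not describe an export format.

```python
from collections import Counter

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"

def top_broadcast_sources(conversations, n=3):
    """conversations: iterable of (src_mac, dst_mac, packets).
    Returns the n sources that sent the most packets to the
    broadcast address, as (src_mac, packets) pairs."""
    totals = Counter()
    for src, dst, packets in conversations:
        if dst.lower() == BROADCAST_MAC:
            totals[src] += packets
    return totals.most_common(n)

# Hypothetical conversation records.
conversations = [
    ("00:a0:24:11:22:33", "ff:ff:ff:ff:ff:ff", 120),
    ("00:a0:24:44:55:66", "FF:FF:FF:FF:FF:FF", 98_000),
    ("00:a0:24:11:22:33", "00:a0:24:44:55:66", 5_000),
    ("00:a0:24:77:88:99", "ff:ff:ff:ff:ff:ff", 4_300),
    ("00:a0:24:44:55:66", "ff:ff:ff:ff:ff:ff", 2_000),
]

for mac, packets in top_broadcast_sources(conversations, n=2):
    print(mac, packets)
```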


Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1 In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker’s Enter the Address You Want to Find field.

2 Click Find Now.

Search displays the device name.

3 Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface using Device View or the device’s console later.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

■ Someone disables Spanning Tree on a port

■ You set up your Spanning Tree configuration incorrectly

Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder™ 6000.


To disable the STP port state for a port on a SuperStack II switch:

1 Select a port and click the right mouse button.

2 From the shortcut menu, select Configure.

3 In the Port section, click the STP tab.

4 From the STP Port State list box, select Disabled.

5 Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1 Double-click the module.

2 From the shortcut menu, select Configure Bridge.

3 In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP); IPX networks use a large number of broadcast packets to operate most effectively.

Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.

10
DUPLICATE ADDRESSES
Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

■ Duplicate Addresses Overview

■ Finding Duplicate MAC Addresses

■ Finding Duplicate IP Addresses

See “Duplicate Addresses Reference” for additional conceptual and problem analysis detail.

Duplicate Addresses Overview

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.


Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

■ Duplicate MAC Addresses

■ Duplicate IP Addresses


Identify duplicate MAC and IP addresses by following the instructions in these sections:

■ Finding Duplicate MAC Addresses

■ Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem), and fix the problem, if possible.


Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses, and Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1 In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2 Select a device.

■ If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

■ If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3 Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses

To find out if duplicate IP addresses are occurring, monitor your network using these applications:

■ Address Tracker — To find duplicate IP addresses on 3Com devices and their attached networks.

■ LANsentry Manager® — To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices.

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.


Follow these steps:

1 From the Find Address menu, select Find Duplicate IP Addresses.

2 Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1 From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2 To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.

Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of the IP-address-to-MAC-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

■ Someone has manually configured a MAC address for a device instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

■ In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.


■ On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate DECnet address can cause a duplicate MAC address.

A router that maps the same MAC address to more than one IP address creates a valid network configuration. These MAC address assignments are not considered duplicate MAC addresses.
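The distinction above matters when you check address tables by hand: one IP address seen with several MAC addresses is a duplicate, while one MAC address seen with several IP addresses can be a valid router configuration. The Python sketch below applies that rule to a list of (IP, MAC) pairs; the input format and the sample addresses are hypothetical and do not represent output from Address Tracker or LANsentry Manager.

```python
from collections import defaultdict

def find_duplicate_ips(entries):
    """entries: iterable of (ip, mac) pairs.
    Returns {ip: [macs]} for every IP address that is claimed by more
    than one MAC address. A MAC address that maps to several IP
    addresses is not reported, because that can be a valid setup."""
    macs_by_ip = defaultdict(set)
    for ip, mac in entries:
        macs_by_ip[ip].add(mac.lower())
    return {
        ip: sorted(macs)
        for ip, macs in macs_by_ip.items()
        if len(macs) > 1
    }

# Hypothetical entries: 10.0.0.7 is claimed by two stations, and one
# router MAC address legitimately serves two IP addresses.
entries = [
    ("10.0.0.7", "00:a0:24:11:22:33"),
    ("10.0.0.7", "00:A0:24:44:55:66"),
    ("10.0.0.1", "00:a0:24:aa:bb:cc"),
    ("10.0.1.1", "00:a0:24:aa:bb:cc"),
]

print(find_duplicate_ips(entries))
```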

Burned-in addresses (BIAs), which are MAC addresses that a vendor permanently gives to a device, are always unique.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

■ Dynamic Host Configuration Protocol (DHCP) — Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

■ BOOTstrap protocol (BootP) — Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

■ Reverse ARP (RARP) — Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
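The distinction drawn in this section (one IP address claimed by two MAC addresses is a duplicate, while one MAC address that answers for several IP addresses can be a valid router configuration) can be sketched in a few lines of Python. This is a minimal illustration, not part of any Transcend application: the function name `find_duplicate_ips` and the sample address pairs are hypothetical, and the pairs are assumed to have been collected from ARP caches by some other means.

```python
from collections import defaultdict


def find_duplicate_ips(arp_entries):
    """Return {ip: [macs]} for every IP address seen with more than one MAC."""
    macs_by_ip = defaultdict(set)
    for ip, mac in arp_entries:
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Hypothetical (IP address, MAC address) pairs, for illustration only.
entries = [
    ("10.0.0.5", "08:00:20:AA:BB:01"),
    ("10.0.0.5", "08:00:20:AA:BB:02"),  # same IP, second MAC: duplicate IP
    ("10.0.0.1", "08:00:20:AA:BB:10"),
    ("10.0.1.1", "08:00:20:AA:BB:10"),  # one MAC, two IPs: valid for a router
]
duplicates = find_duplicate_ips(entries)
print(duplicates)
```

Only 10.0.0.5 is reported. The MAC address that serves two IP addresses is deliberately not flagged.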

11
ETHERNET PACKET LOSS
Use these sections to identify and correct Ethernet packet loss:

■ Ethernet Packet Loss Overview

■ Searching for Packet Loss

See “Ethernet Packet Loss Reference” for additional conceptual and problem analysis detail.

Ethernet Packet Loss Overview

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to “Collisions” are often at the heart of packet loss.


Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When Collision rates increase, so do “Excessive Collisions”, which delay the transmission of data. An increase in Collisions also indicates that network utilization and network errors, such as “FCS Errors”, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as “Late Collisions”.

If small packets are colliding, you do not necessarily see a rise in utilization, but you may still have a problem. Capture packets to determine their size.
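As a rough illustration of the collision figures in this overview, the following Python sketch grades a collision rate against the 50 percent and 70 percent marks. It assumes that the collision rate is expressed as collisions per 100 transmitted packets, which this guide does not define; the function names and the labels returned are hypothetical.

```python
TOLERABLE_PERCENT = 50.0    # from the text: often no large loss of throughput
UPPER_LIMIT_PERCENT = 70.0  # from the text: "usually around 70 percent"


def collision_percent(collisions, packets):
    """Collisions per 100 packets (an assumed definition of Collision rate)."""
    if packets == 0:
        return 0.0
    return 100.0 * collisions / packets


def assess_collisions(collisions, packets, upper_limit=UPPER_LIMIT_PERCENT):
    """Grade a segment as normal, watch, or unreliable."""
    rate = collision_percent(collisions, packets)
    if rate > upper_limit:
        return "unreliable"
    if rate > TOLERABLE_PERCENT:
        return "watch"
    return "normal"
```

Because the upper limit varies from network to network, it is a parameter rather than a constant.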

Identifying the Problem

To confirm that your network’s problem is related to packet loss, verify that frames are being dropped by examining this packet loss data:

■ Alignment Errors

■ Collisions

■ Excessive Collisions

■ FCS Errors

■ CRC Errors

■ Late Collisions

■ Receive Discards

■ Too Long Errors

■ Too Short Errors

■ Transmit Discards

The process of identifying the problem is discussed in “Searching for Packet Loss”.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network’s utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

■ Faulty connectors or improper cabling

■ Excessive numbers of repeaters between network devices

■ Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in “Searching for Packet Loss”.

Searching for Packet Loss

When you look for packet loss, use the following applications:

■ Status Watch — For Ethernet and MIB-II data collection using SNMP polling

■ LANsentry Manager Network Statistics Graph — For RMON data collection using an RMON probe

■ Device View — On a per-device basis, you can evaluate statistics for any port on the device.

Status Watch

Status Watch monitors:

■ Alignment Errors

■ Excessive Collisions

■ FCS Errors

■ Receive Discards

■ Transmit Discards

Follow these steps:

1 Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.

Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see “Nonstandard Ethernet Problems”.

Table 16 Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:

1 Locate the source of the errors by looking at the module and port statistics.

2 Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.

3 If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required.

Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network.

When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2 Determine if the Excessive Collisions tool threshold is being exceeded.

Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17 Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.

3 Determine if the Receive Discards and Transmit Discards tools thresholds are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry® Manager Network Statistics graph to view data for:

■ Collisions

■ Late Collisions

■ Bandwidth Utilization

■ CRC Errors

■ Too Long Errors

■ Too Short Errors

Follow these steps:

1 Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2 Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

■ If Utilization rates are high — The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

■ If Utilization rates are stable and appear normal — The collisions are probably caused by faulty components. In this case, do the following:

■ If the network consists of repeaters — Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters “repeat” traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

■ On other networks — Determine the segment cable length.

3 Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.


Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

4 Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.
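Step 2 of this procedure compares the segments that are joined by a repeater and treats a segment with dissimilar collision levels as a sign of faulty hardware. One way to sketch that comparison in Python is shown here. The factor of 2 used to decide what counts as dissimilar is an assumption, as are the function name and the segment names; the counts are presumed to come from the Network Statistics graph for each segment over the same interval.

```python
from statistics import median


def outlier_segments(collisions_by_segment, factor=2.0):
    """Return the segments whose collision count is far from the median.

    factor is an assumed tolerance: a segment is flagged when its count is
    more than factor times the median, or less than the median / factor.
    """
    typical = median(collisions_by_segment.values())
    outliers = []
    for name, count in sorted(collisions_by_segment.items()):
        if typical == 0:
            if count > 0:
                outliers.append(name)
        elif count > typical * factor or count < typical / factor:
            outliers.append(name)
    return outliers


# Hypothetical collision counts for four segments on one repeater.
counts = {"seg-a": 120, "seg-b": 115, "seg-c": 480, "seg-d": 130}
suspects = outlier_segments(counts)
print(suspects)
```

A flagged segment is only a starting point. The text still calls for monitoring several collisions to trace the station that is transmitting too soon.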

Table 18 Late Collisions Data


Cabling problems:

■ Segment too long

■ Failing cable

■ Segment not grounded properly (noise)

■ Improper termination

■ Taps too close (10BASE-5 and 10BASE-2 only)

■ Noisy cable

Correct the cabling problem by doing one or more of the following:

■ Reduce the segment length.

■ Replace the cable.

■ Ground the cable.

■ Terminate the cable correctly.

■ Check the taps.

■ Check for cables too close to equipment that emits electromagnetic interference.

Component problems:

■ Deaf or partially deaf node

■ Failing repeater, transceiver, or controller cards

Correct the component problem by doing one of the following:

■ Trace the failing component and replace it.

■ Replace the NIC or the transceiver.

Table 19 Too Long Errors and Too Short Errors Data

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:

1 Use a network analyzer to identify the problematic transceiver.

2 If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable

Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.

Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example).

If your network card autodetects the network speed, and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions)
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder™ 5000 FastModule

If you use maximum-sized, 1518-byte Ethernet frames, the device’s VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error.

Possible Action: These frames are passed successfully but will create the Too Long Frame error message.

If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

To display Activity and Errors statistics for a device or port, follow these steps:

1 Select the required port or device.

2 From the shortcut menu, select Activity or Errors.

Table 20 Activity and Error Statistics in Device View

Statistics Group Description Use in Troubleshooting

Activity Displays the total network activity and errors on the selected port.

This data shows readable packets, broadcast packets, “Collisions”, total errors, and runts, which cause “Too Short Errors”. You can interpret this data in the following ways:

■ The presence of runts can often be caused by Collisions; however, if the values increase at specific times of the day, it may indicate you need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).

■ Runts can also be caused by a badly terminated coax cable.

■ Large numbers of runts, not associated with high levels of collisions, can indicate a transmission problem (examine the cable).

■ Particularly high numbers of Collisions, compared to the total number of readable packets, can point to a hardware problem (a bad adapter) or to a data loop.

■ A high proportion of “Broadcast Packets” (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Errors Displays the number of frames with errors on the selected port.

The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:

■ Alignment Errors, Table 16

■ FCS Errors, Table 16

■ Too Long Errors, Table 19

■ Too Short Errors or runts, Table 19

■ Late Collisions, Table 18

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

You may not be able to access these statistics on some devices using Device View. See the Device View documentation for additional information.
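The rules of thumb that Table 20 gives for the Activity statistics can be written down as a small Python sketch. Only the broadcast rule (more than 10 percent broadcasts on a network that is more than 50 percent utilized) carries figures from the table. The ratios that stand in for "particularly high" Collisions and "large numbers" of runts are assumptions, as are the function and parameter names.

```python
def activity_hints(readable, broadcasts, collisions, runts, utilization_percent,
                   high_collision_ratio=0.5, high_runt_ratio=0.01):
    """Turn Activity counters for one port into troubleshooting hints.

    high_collision_ratio and high_runt_ratio are assumed cut-offs; tune them
    against the normal behavior of your own network.
    """
    hints = []
    if readable == 0:
        return hints
    if broadcasts / readable > 0.10 and utilization_percent > 50.0:
        hints.append("check for an incorrectly configured bridge or router")
    collisions_high = collisions / readable > high_collision_ratio
    if collisions_high:
        hints.append("check for a bad adapter or a data loop")
    if runts / readable > high_runt_ratio and not collisions_high:
        hints.append("examine the cable for a transmission problem")
    return hints
```

For example, `activity_hints(1000, 150, 10, 0, 60.0)` returns only the bridge or router hint, because broadcasts are 15 percent of readable packets on a network that is 60 percent utilized.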

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame for which both of the following are true:

■ The number of bits received is an uneven byte count (that is, not an integral multiple of 8)

■ The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See “FCS Errors” for more information about interpreting Alignment Errors.

Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions, or 16 consecutive collisions occur and the packets are discarded.
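The retransmission behavior described above can be illustrated with a short Python simulation. It assumes the truncated binary exponential backoff scheme that standard Ethernet uses (after the nth collision, a station waits a random number of slot times between 0 and 2^min(n, 10) - 1) and a 10 Mbps slot time of 51.2 µs. The function names and the `collides` callback are hypothetical and exist only to drive the example.

```python
import random

SLOT_TIME_US = 51.2  # 512 bit times at 10 Mbps
ATTEMPT_LIMIT = 16   # the 16 consecutive collisions named in the text
BACKOFF_LIMIT = 10   # the exponent stops growing after 10 collisions


def backoff_slots(collision_count, rng=random):
    """Pick the number of slot times to wait after the nth collision."""
    exponent = min(collision_count, BACKOFF_LIMIT)
    return rng.randrange(2 ** exponent)


def transmit(collides, rng=random):
    """Simulate one packet; collides(attempt) says whether that attempt collides.

    Returns (delivered, microseconds spent backing off).
    """
    waited_us = 0.0
    for attempt in range(1, ATTEMPT_LIMIT + 1):
        if not collides(attempt):
            return True, waited_us
        if attempt < ATTEMPT_LIMIT:
            waited_us += backoff_slots(attempt, rng) * SLOT_TIME_US
    return False, waited_us  # excessive collisions: the packet is discarded
```

Calling `transmit(lambda attempt: True)` models a packet that collides on every attempt, so it is discarded after the sixteenth collision and counted as an Excessive Collision.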

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines “FCS Errors” and “Alignment Errors”. These errors indicate that packets were received with:

■ A bad FCS and an integral number of octets (FCS Errors)

■ A bad FCS and a non-integral number of octets (Alignment Errors)


CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station’s network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.
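To apply the 1 percent guideline, compare two samples of the counters rather than the running totals, because the totals accumulate from the time the counters were last reset. The Python sketch below shows the arithmetic. The dictionary keys `packets` and `crc_errors` are hypothetical names for whatever packet and CRC Error counters your probe reports.

```python
EXCESSIVE_CRC_PERCENT = 1.0  # from the text: more than 1 percent is excessive


def crc_error_percent(previous, current):
    """CRC Errors as a percentage of packets seen between two counter samples."""
    packets = current["packets"] - previous["packets"]
    errors = current["crc_errors"] - previous["crc_errors"]
    if packets <= 0:
        return 0.0
    return 100.0 * errors / packets


def crc_errors_excessive(previous, current, limit=EXCESSIVE_CRC_PERCENT):
    """True when the CRC Error rate between the samples exceeds the limit."""
    return crc_error_percent(previous, current) > limit
```

For example, 40 new CRC Errors among 2000 new packets is a rate of 2 percent, which exceeds the guideline.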

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (or after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network’s performance. See “Knowing Your Network’s Configuration” for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC Error, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame’s bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a bit error rate of 1 in 10^8, typical Ethernet performance is 1 in 10^12 or better.
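To see what these bit error rates mean per frame, the following Python sketch converts a bit error rate into the probability that a frame of a given length contains at least one bad bit. It assumes that bit errors are independent, which is a simplification; the function name is hypothetical.

```python
import math


def frame_error_probability(bit_error_rate, frame_octets):
    """Probability that at least one bit in the frame is corrupted.

    Assumes independent bit errors: 1 - (1 - BER) ** bits, computed with
    log1p and expm1 so that very small rates do not lose precision.
    """
    bits = frame_octets * 8
    return -math.expm1(bits * math.log1p(-bit_error_rate))


worst_allowed = frame_error_probability(1e-8, 1518)
typical = frame_error_probability(1e-12, 1518)
print(worst_allowed, typical)
```

Under this assumption, a maximum-sized 1518-octet frame at the allowed rate of 1 in 10^8 is corrupted roughly once in every 8000 frames, while at 1 in 10^12 the figure is roughly once in every 80 million frames.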

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other’s transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21 Symptoms of Common Ethernet Network Problems

Symptoms: “FCS Errors” and “Alignment Errors” increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate.

If the monitor can track runts, also called “Too Short Errors”, the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters.

Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet.

Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.
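The length limits given for Too Long Errors and Too Short Errors, together with the 4-byte frame tag that Table 19 describes for VLT-enabled ports, can be summarized in a short Python sketch. The function name and the `tag_octets` parameter are hypothetical. The sketch checks length only and does not model whether a frame is otherwise well formed.

```python
MIN_FRAME_OCTETS = 64    # shorter frames are Too Short Errors (runts)
MAX_FRAME_OCTETS = 1518  # longer frames are Too Long Errors
VLT_TAG_OCTETS = 4       # frame tag added by VLT-enabled ports (Table 19)


def classify_frame_length(octets, tag_octets=0):
    """Classify a frame by length, FCS octets included.

    tag_octets lets a caller allow for a known frame tag so that tagged
    maximum-sized frames are not reported as too long.
    """
    if octets < MIN_FRAME_OCTETS:
        return "too short"
    if octets > MAX_FRAME_OCTETS + tag_octets:
        return "too long"
    return "ok"
```

A tagged maximum-sized frame is 1522 octets long. `classify_frame_length(1522)` reports it as too long, which is the misleading error that Table 19 describes, while `classify_frame_length(1522, tag_octets=VLT_TAG_OCTETS)` accepts it.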

12
FDDI RING ERRORS
Use these sections to identify and correct FDDI ring errors:

■ FDDI Ring Errors Overview

■ Identifying Ring Errors

See “FDDI Ring Errors Reference” for additional conceptual and problem analysis detail.

FDDI Ring Errors Overview

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.


FDDI ring errors that you should monitor include:

■ Elasticity Buffer Error Condition

■ Frame Error Condition

■ Frames Not Copied Condition

■ Link Error Condition


First determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

■ The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

■ When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A “Link Error Condition” is also generated.

FDDI deals with MAC-related errors as follows:

■ When MAC frame errors reach a certain threshold, a “Frame Error Condition” is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

■ For a large network, the worst case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.

See “Identifying Ring Errors” for more information.
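The two port thresholds and the frame error guideline described above can be sketched in Python as follows. This is an illustration of the logic only: it treats the link error rate and both thresholds as plain error rates (for example, 1e-8 for one error in 10^8 bits), which is an assumption about representation, and the function names and return labels are hypothetical.

```python
FRAME_ERROR_RATIO_LIMIT = 0.001  # from the text: 0.1 percent


def port_link_action(ler, ler_alarm, ler_cutoff):
    """Decide how SMT reacts to a port's link error rate.

    All three values are error rates; the alarm rate must be lower than the
    cutoff rate so that the alarm is raised before the port is removed.
    """
    if ler_alarm >= ler_cutoff:
        raise ValueError("alarm threshold must be lower than cutoff threshold")
    if ler >= ler_cutoff:
        return "cutoff"  # SMT breaks the connection
    if ler > ler_alarm:
        return "alarm"   # SMT sends a Status Report Frame
    return "ok"


def frame_errors_sustained(ratios, limit=FRAME_ERROR_RATIO_LIMIT):
    """True when every sample in the window exceeds the frame error limit."""
    return bool(ratios) and all(ratio > limit for ratio in ratios)
```

Requiring every sample in the window to exceed the limit is one simple way to ignore the short peaks that occur during network configuration. The window length is left to the caller.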

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors

Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1 Monitor the FDDI Status tool for the currently selected device.

2 Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port’s elasticity buffer overflows or underflows. This condition usually indicates that a port’s hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. When a device is an uplink to FDDI (that is, the device is transmitting onto FDDI), this condition indicates that the ring is saturated: the ring is out of buffer space and packets are being dropped from the device’s backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

■ Add more capacity to the station

■ Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch

■ Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a “MAC Neighbor Change Event” (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC’s upstream or downstream neighbor changes.

This event indicates either:

■ A network reconfiguration

■ Another station that is leaving or joining the ring

13
NETWORK FILE SERVER TIMEOUTS
Use these sections to identify and correct timeouts on network file servers:

■ Network File Server Timeout Overview

■ Looking for Obvious Errors

■ Reproducing the Fault While Monitoring the Network

■ Correcting the Fault

See “Network File Server Timeouts Reference” for additional conceptual and problem analysis detail.

Network File Server Timeout Overview

A network file server can time out if your network becomes congested or if your server is having problems. Users might have problems transferring data or copying files to or from the server. To help you understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.


When users log in, their stations make network file server calls, either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time — over 30 seconds in some cases — to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

■ Can you access the network file server with Telnet?

■ Have any alarms been triggered?

■ Are there any new errors?

The process of identifying the problem is developed in “Looking for Obvious Errors”.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

■ Reproducing the Fault While Monitoring the Network

■ Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

■ Ping and Telnet — To check connectivity to the network file server nodes

■ LANsentry Manager Alarms View — To search for triggered alarms

■ LANsentry Manager Statistics View — To look for errors

■ LANsentry Manager History View — To identify trends

Ping and Telnet

Determine whether you can contact network file server nodes using “Ping” and “Telnet”. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
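If you want a number rather than an impression of how slow the response is, you can time a TCP connection to the Telnet port of each file server node. The Python sketch below is one way to do that with the standard `socket` module. The function names, the 2-second figure used for "slow", and the default timeout are assumptions, and the `connect` parameter exists only so that the function can be exercised without a live network.

```python
import socket
import time


def connect_time(host, port=23, timeout=5.0, connect=socket.create_connection):
    """Return the seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        conn = connect((host, port), timeout)
    except OSError:
        return None  # refused, unreachable, or timed out
    conn.close()
    return time.monotonic() - start


def describe(seconds, slow_after=2.0):
    """Label a measurement; slow_after is an assumed cut-off in seconds."""
    if seconds is None:
        return "unreachable"
    return "slow" if seconds > slow_after else "normal"
```

A result of "normal" for every node suggests, as the text says, that the delay is occurring elsewhere.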

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a low rate of background errors may exist just below the alarm threshold. Because RMON alarms are based on maximum and minimum values, they may miss constant, periodic, or low levels of errors.

Before you monitor your network with LANsentry Manager, you should have already set up alarms for obvious errors related to MAC events and loading problems. See “Setting Thresholds and Alarms” for more information.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1 Set up a graph that shows utilization and errors on all your major segments.

2 Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting “Too Long Errors” and “FCS Errors” roughly every second sample. While the amount is not higher than normal, it is currently higher than on any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.

If you notice low error rates that are not triggering alarms, use a recent history of the network to see if the errors occur in regular bursts and to estimate the average number of errors.
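Checking a recent history for regular bursts and estimating the average error count is simple arithmetic. The Python sketch below shows one way to do it on error counts copied out of a history table. The function name, the result keys, and the sample counts are hypothetical.

```python
def summarize_errors(samples):
    """Summarize per-sample error counts, oldest sample first.

    Returns the average count per sample, the indexes of the samples that
    contain errors, and the spacing between them if that spacing is regular.
    """
    hits = [index for index, count in enumerate(samples) if count > 0]
    gaps = [later - earlier for earlier, later in zip(hits, hits[1:])]
    regular = len(gaps) >= 2 and len(set(gaps)) == 1
    average = sum(samples) / len(samples) if samples else 0.0
    return {
        "average": average,
        "burst_samples": hits,
        "regular_interval": gaps[0] if regular else None,
    }


# Hypothetical error counts from eight consecutive history samples.
summary = summarize_errors([0, 3, 0, 2, 0, 4, 0, 3])
print(summary)
```

In this sample, errors appear in every second sample, which matches the pattern noticed on HUB3 in the EXAMPLE.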


Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

■ LANsentry Manager Top-N Graph — To locate a quiet node to use for reproducing the fault

■ LANsentry Manager Packet Capture — To capture packets from the hub to which the quiet node is attached

■ LANsentry Manager Packet Decode — To analyze the packets to assess network file server traffic and delays

■ Address Tracker — To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See “Network File System (NFS) Protocol” for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1 Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.


2 Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3 When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1 Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

When you Telnet into a node, the traffic that the Telnet operation generates appears in the capture buffer. Expect this traffic when you read the buffer.

2 Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3 Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
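The gap that you look for in the conversation filter can be described as a request that is sent again before any reply arrives. The following sketch shows that logic on a hypothetical, simplified capture in which each packet is a tuple of time in seconds, kind, and request identifier. It is not the Packet Decode file format, and the function name is an assumption made for illustration.

```python
def find_retries(packets):
    """Return (request_id, gap_seconds) for each request resent before a reply arrived."""
    pending = {}   # request_id -> time of the most recent unanswered request
    retries = []
    for time, kind, request_id in packets:
        if kind == "reply":
            pending.pop(request_id, None)          # request answered, stop tracking it
        elif request_id in pending:
            retries.append((request_id, time - pending[request_id]))
            pending[request_id] = time             # measure any further retry from here
        else:
            pending[request_id] = time
    return retries

capture = [
    (0.00, "request", 101), (0.02, "reply", 101),    # normal exchange
    (1.50, "request", 102),                          # no reply seen
    (31.50, "request", 102), (31.52, "reply", 102),  # resent after the timeout
]

print(find_retries(capture))
```

A retry gap that matches the delay you experienced suggests that the delay is caused by lost requests or lost replies, not by a slow server.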

Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.


EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

If you need to take immediate action to resolve this problem for your users, move all the network file servers to different hubs. This quick fix reduces the number of timeouts.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1 To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in “LANsentry Manager Packet Capture”. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2 Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node’s reply had an FCS Error even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.
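To see why a corrupted reply is dropped, recall that the Ethernet frame check sequence (FCS) is a 32-bit CRC computed over the frame. The following sketch uses the CRC-32 function in the Python standard library to build and check an FCS. The frame contents are arbitrary test bytes and the function names are hypothetical.

```python
import struct
import zlib

def append_fcs(frame: bytes) -> bytes:
    """Append the 4-byte CRC-32 frame check sequence, least significant byte first."""
    return frame + struct.pack("<I", zlib.crc32(frame) & 0xFFFFFFFF)

def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Recompute the CRC-32 over the frame body and compare it with the trailing FCS."""
    body, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF) == fcs

good = append_fcs(bytes(range(64)))   # arbitrary 64-byte frame body
corrupted = bytearray(good)
corrupted[10] ^= 0x55                 # overwrite one byte, as bad data on the wire might

print(fcs_ok(good), fcs_ok(bytes(corrupted)))
```

A receiver that performs this check discards the corrupted frame, so the requesting node sees no reply and waits for its retry timer.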

Correcting the Fault

Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.


EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a “Jabbering” node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

A condition in which a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

14
MEASURING ATM NETWORK PERFORMANCE
Measuring performance of your Asynchronous Transfer Mode (ATM) network is an important step in establishing appropriate metrics for the most desired operation. Using these metrics you can establish a baseline and measure future performance. This chapter describes how to use the Enterprise VLAN Manager to measure ATM network performance.

The following topics are described:

■ Measuring Traffic Performance

■ Measuring Device Level Performance

■ Measuring Port Level Performance

■ LANE Component Statistics

Measuring Traffic Performance

Use the Utilization tool icon to launch and view traffic statistics between two or more switches. The Utilization tool can be configured to collect, display, and store information about good or bad traffic patterns across the network. In addition, information on new switches is automatically collected as soon as they are added to the network. The tool’s browser displays both the ATM switch and Ethernet switch hierarchy separately. You can add ATM or Ethernet switches to the Utilization Map using the Add button. After you add the switches to the Utilization map, data collection starts automatically.

The Configure option allows you to custom configure traffic polling, communications, and map configuration settings. These settings provide the base of information reported by the history portion of the tool. You can view historical data collected and stored by the Utilization Tool as line graphs, pie charts, and bar graphs.

Utilization Map

The Utilization maps display the traffic patterns between switches. ATM switches are displayed as circular icons that are also pie charts that represent the in and out User-Network Interface (UNI) traffic that corresponds to that switch. The upper portion of the pie represents the maximum percentage of bandwidth utilization of the in/out UNI traffic. The lower portion of the circle represents the maximum percentage of speed of the in UNI traffic and appears in magenta. The IP address is displayed below the switch icon.

Displaying Link Traffic

The lines between the switch icons represent switch-to-switch links. The traffic load on each link is dynamically updated and is represented by a unique color. To view the legend information, select the Map Legend from the Map menu of the Utilization map.

The links are color coded according to the following legend:

■ 0 - 1 percent White

■ 1 - 3 percent Green

■ 3 - 10 percent Blue

■ 10 - 20 percent Yellow

■ 20 - 100 percent Red
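The legend can be expressed as a small lookup. The following sketch is illustrative only: the legend does not say which band a boundary value such as 1 percent belongs to, so the sketch assumes that each boundary value falls in the higher band. The names are hypothetical.

```python
# (upper bound in percent, color), in ascending order, taken from the legend above
LEGEND = [(1, "White"), (3, "Green"), (10, "Blue"), (20, "Yellow"), (100, "Red")]

def link_color(percent: float) -> str:
    """Map a link load in percent to its legend color."""
    if not 0 <= percent <= 100:
        raise ValueError("link load must be between 0 and 100 percent")
    for upper, color in LEGEND:
        if percent < upper:       # assumption: boundary values belong to the higher band
            return color
    return "Red"                  # exactly 100 percent

print(link_color(0.5), link_color(2), link_color(15), link_color(100))
```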

Displaying Node Configuration

Select the switch in the Utilization Map and then select Node Configuration from the Map menu. The following static parameters of the switch are shown:

■ Name

■ IP address

■ ATM address

Configuring the Utilization Tool

The Utilization tool has a complete set of configuration options. To configure the Utilization tool, select the Configuration option from the Map menu of the Utilization Tool. You must restart the Utilization tool for the changes to take effect.

Map Configuration

Use the Map tab to configure the size of the switch icon as well as the layout of the map itself. This form includes the Switch Radius option, which allows you to modify the switch icon radius. The default radius is 32. You can select one of three layout options:


■ None — Disables the automatic map layout

■ Rectangular — The map icons display in a rectangle.

■ Circular — The map icons display in a circle.

The Max% Traffic option allows you to set the maximum percentage traffic rate represented on a switch-to-switch link.

Polling Configuration

The Polling tab allows you to configure the polling interval for data collection of the Utilization tool. Select the following:

■ Map Enable — Check to enable the dynamic updating of traffic on the Utilization map.

■ Chart Enable — Check to enable the dynamic updating of node and link performance charts.

■ Polling, Seconds — Select a polling interval for data collection.

Communication Configuration

Use the Communication Configuration tab to configure the type of data that is monitored and collected. Select the following:

■ Good Cells — Configures the Utilization tool to collect data on Good Cells.

■ Bad Cells — Configures the Utilization tool to collect data on Bad Cells, such as errored (BIP) cells and unrecognized ATM cells.

Measuring Device Level Performance

The Performance Statistics windows display performance statistics for different objects in the network. The Performance Statistics windows are “live” and are updated automatically by continuous polling of the system. An object can be a device (for example, a SuperStack II Switch 2700 or CoreBuilder module), a device port (Ethernet or ATM), an Emulated LAN entity (LEC or LES), or a Virtual Channel. The windows use history graphs, bar charts, pie charts, and dials to display the performance information. Polling and logging features can be accessed using the Options menu.

Using the History Graph

Use the history graph to track device performance over a specified period of time. This metric is useful for spotting trends in performance and isolating downturns in device operation. Position the cross-hairs at a desired point on the history graph and click the left mouse button. The detailed information about this sample point appears on the lower left corner of the graph. This information includes sample number, sample time, sample graph, and sample value.

When you are in the individual sample display mode, click on the right mouse button to return to the normal display mode.

Displaying Statistics

To display statistics:

1 Select a device in one of the management maps, or click on a branch of the subtree in the Component View of the Topology tool.

2 Select Graph from the Enterprise VLAN menu or click the Graph icon.

Measuring Port Level Performance

Port level statistics are useful for isolating heavy-traffic ports. By knowing this information, you can determine bottlenecks and reshape network traffic on a port-by-port basis. This information is also useful for determining Virtual LAN (VLAN) structure and prioritization rationale. Identify high traffic ports so that you can take the proper steps to either isolate or integrate alternative traffic shaping patterns. Doing so is a valuable and necessary step in troubleshooting for peak network performance.

1 Select an element in the Enterprise branch subtree in the Topology tool, or select a port on the device view front panel display.

2 Select Graph from the Enterprise VLAN menu or select the Graph icon.

Traffic

A History graph shows throughput, in frames per second, through the port. Four separate subgraphs are in the performance window:

Table 22 Traffic Graphs

Graph          Meaning
inGood         All valid frames received at the port
inError        Errored frames received at the port
outGood        All valid frames transmitted from the port
outError       Errored frames transmitted from the port

Utilization

A Dial graph shows the maximum utilization (10 Mbps) of the port.

Total Frames

A Pie chart shows the distribution of all received and transmitted frames:

Table 23 Total Frames

Graph          Meaning
inGood         All valid frames received at the port
inError        Errored frames received at the port
outGood        All valid frames transmitted from the port
outError       Errored frames transmitted from the port

Good Frames

A Pie chart shows the distribution of valid received and transmitted frames:

Table 24 Good Frames

Graph          Meaning
inUcast        Unicast frames received at the port excluding discards
inNonUcast     Broadcast and multicast frames received at the port excluding discards
outUcast       Unicast frames transmitted from the port including discards
outNonUcast    Broadcast and multicast frames transmitted from the port including discards

Errored Frames

A Pie chart shows the distribution of errored received and transmitted frames:

Table 25 Errored Frames

Graph          Meaning
inDiscards     Frames received at the port but discarded for internal reasons
inErrors       Frames received at the port but discarded due to errors
inUnknown      Frames received at the port but discarded due to unknown protocols
outDiscards    Frames discarded from being transmitted from the port for internal reasons
outErrors      Frames discarded from being transmitted from the port due to errors
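As a rough illustration of the port Utilization dial described in this section, utilization can be derived from two readings of an octet counter taken one polling interval apart. This sketch is an assumption about the arithmetic, not a description of how Enterprise VLAN Manager computes the value. It ignores counter wrap and framing overhead, and the function name is hypothetical.

```python
def utilization_percent(octets_start: int, octets_end: int,
                        interval_seconds: float,
                        link_bps: int = 10_000_000) -> float:
    """Percentage of link capacity used between two octet-counter readings."""
    bits = (octets_end - octets_start) * 8          # octets transferred, as bits
    return 100.0 * bits / (interval_seconds * link_bps)

# Hypothetical readings: 3,750,000 octets in a 30-second polling interval on 10 Mbps
print(utilization_percent(0, 3_750_000, 30))
```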


LANE Component Statistics

The LANE Component Statistics allow you to measure the performance of LAN Emulation Servers (LES) and LAN Emulation Clients (LEC) in the network. You can display statistics for the following LAN Emulation components:

■ LES

■ LEC

■ LANE User

LES Statistics

The LES performance statistics show the type of load that exists on the LAN Emulation Servers. Use this information for load balancing when required.

The LES performance statistics are as follows:

■ Data — History graph of transmission rate of Broadcast and Unknown data (BUS) in Emulated LAN.

■ Data Utilization — Utilization of the transmission rate of the BUS service relative to the maximum possible.

■ Control Frames — A Pie graph of quality of LE ARPs and other LAN Emulation control frames handled by LES.

■ Errored Control Frames — A Pie graph of errored control frames.

■ Data/Control Octets — A pie graph of the ratio between LES transmission rate and BUS transmission rate.

To display performance statistics for a LAN Emulation Server:

1 Select an LES icon from the LAN Emulation map or an LES device (found in the Backbone and Services subtree) component in the Component View of the Topology tool.

2 Select Graph from the Enterprise VLAN menu or select the Graph icon.

LEC Statistics

The LEC Graph displays statistics of the message traffic through the LEC. The LEC Statistics are:

■ Data frames/sec — History graph of the transmission rate of data frames through the LEC.

■ Data Frames — Pie graph of the distribution of different types of data frames through the LEC.


■ Data Utilization — Utilization of LEC data transmission rate relative to the maximum possible rate.

■ Control frames/sec — History graph of the transmission rate of control frames through the LEC.

■ Control Frames — Pie graph of the ratio of transmission of different types of LEC control frames.

■ Data/Control Frames — Pie graph of the ratio between LEC data frame transmission and LEC control frame transmission.

To display performance statistics for a LAN Emulation Client:

1 Select an LEC icon from the management maps or an LEC device component in the Component View of the Topology tool.

2 Select Graph from the Enterprise VLAN menu or use the Graph icon.

LANE User

The LANE User statistics parameters show the in traffic and out traffic on the LEC and its segments. You can choose to display all or part of the LEC groups in the LANE User statistics.

To display performance statistics for a LANE User:

1 Select the LANE User icon from the management maps or a LANE User device component in the Component View of the Topology tool.

2 Select Graph from the Enterprise VLAN menu or select the Graph icon. Double-click the graph to zoom into one or more of the graphs.

IV
REFERENCE
Chapter 15 SNMP in Network Troubleshooting

Chapter 16 Information Resources

15
SNMP IN NETWORK TROUBLESHOOTING
The Simple Network Management Protocol (SNMP) and the Management Information Bases (MIBs) it uses are important for troubleshooting your network. These sections provide information about:

■ SNMP Operation

■ SNMP MIBs

SNMP Operation

SNMP, which is one of the most widely used management protocols, allows management communication between network devices and your management workstation across TCP/IP internets.

Most management applications, including Status Watch and Address Tracker applications, require SNMP to perform their management functions.

Manager/Agent Operation

SNMP communication requires a manager (the station that is managing network devices) and an agent (the software in the devices that communicates with the management station). SNMP provides the language and the rules that the manager and agent use to communicate.

Managers can discover agents:

■ Through autodiscovery tools on “Network Management Platforms” (such as HP OpenView Network Node Manager)

■ When you manually enter IP addresses of the devices that you want to manage

For agents to discover their managers, you must provide the agents with the IP addresses of the management stations.

Managers send requests to agents (either to send information or to set a parameter), and agents provide the requested data or set the parameter.


Agents can also notify the managers independently through unsolicited trap messages, which indicate that certain events have occurred.

SNMP Messages

SNMP supports queries (called messages) that allow the protocol to transmit information between the managers and the agents. The types of SNMP messages are:

■ Get and Get-next — The management station requests an agent to report information.

■ Set — The management station requests an agent to change one of its parameters.

■ Get Responses — The agent responds to a Get, Get-next, or Set operation.

■ Trap — The agent sends an unsolicited message to notify the management station that an event has occurred.

MIBs define what can be monitored and controlled within a device (that is, what the manager can Get and Set). An agent can implement one or more groups from one or more MIBs. See “SNMP MIBs” for more information.
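The message types above can be illustrated with a toy agent that keeps its objects in memory. This is a conceptual sketch only: it does not encode or send real SNMP packets, and the class and its methods are hypothetical. The two object identifiers are the standard MIB-II sysDescr and sysName instances.

```python
SYS_DESCR = (1, 3, 6, 1, 2, 1, 1, 1, 0)   # MIB-II sysDescr.0
SYS_NAME = (1, 3, 6, 1, 2, 1, 1, 5, 0)    # MIB-II sysName.0

class ToyAgent:
    """In-memory stand-in for an SNMP agent (illustrative, not a real implementation)."""

    def __init__(self, objects, writable=()):
        self.objects = dict(objects)      # OID tuple -> value
        self.writable = set(writable)     # OIDs that a Set may change

    def get(self, oid):
        """Get: return the value for an exact OID, or None if it is not supported."""
        return self.objects.get(oid)

    def get_next(self, oid):
        """Get-next: return the first (OID, value) that sorts after the given OID."""
        later = sorted(key for key in self.objects if key > oid)
        return (later[0], self.objects[later[0]]) if later else None

    def set(self, oid, value):
        """Set: change a writable object and report whether the change was accepted."""
        if oid not in self.writable:
            return False
        self.objects[oid] = value
        return True

agent = ToyAgent({SYS_DESCR: "Example switch", SYS_NAME: "hub3"}, writable={SYS_NAME})
print(agent.get(SYS_NAME))
print(agent.get_next((1, 3, 6, 1, 2, 1, 1)))   # walk into the system group
print(agent.set(SYS_DESCR, "changed"))         # rejected: not writable in this sketch
```

Representing each OID as a tuple of integers lets ordinary tuple comparison provide the ordering that Get-next relies on when a manager walks a subtree.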

Trap Reporting

Traps are unsolicited, asynchronous events that devices generate to indicate status changes. Every agent supports some trap reporting. You must configure trap reporting at the devices so that these events are reported to your management station to be used by the “Network Management Platforms” (such as HP OpenView Network Node Manager) and the “Transcend Applications”.

Not all traps are important for your management tasks. To decrease the burden on the management station and on your network, you can limit the number and type of traps reported to the management station.

MIBs are not required to document traps. SNMP supports the limited number of traps defined in Table 26. More traps may be defined in vendors’ private MIBs.

Table 26 Traps Supported by SNMP

Trap                     Indication
Cold Start               The agent has started or been restarted.
Warm Start               The agent’s configuration has changed.
Link Down                The status of an attached communication interface has changed from up to down.
Link Up                  The status of an attached communication interface has changed from down to up.
Authentication Failure   The agent received a request from an unauthorized manager.
EGP Neighbor Loss        In routers running the Exterior Gateway Protocol (EGP), an EGP Neighbor has changed to a down state.

To minimize SNMP traffic on your network, you can implement trap-based polling. Trap-based polling allows the management station to start polling only when it receives certain traps. Your management applications must support trap-based polling for you to take advantage of this feature.

Security

SNMP uses community strings as a form of management security. To enable management communication, the manager must use the same community strings that are configured on the agent. You can define both read and read/write community strings.

Because community strings are included unencoded in the header of a User Datagram Protocol (UDP) packet, packet capture tools can easily access this information. Similar to what you do with any password, change the community strings frequently.

See “SNMP Community Strings” for more information.

SNMP MIBs

SNMP MIBs include MIB-II, other standard MIBs (such as the RMON MIB), and vendors’ private MIBs (such as enterprise MIBs from 3Com). These MIBs and their objects are part of the MIB tree.

MIB Tree

The MIB tree is a structure that groups MIB objects in a hierarchy and uses an abstract syntax notation to define manageable objects. Each item on the tree is assigned a number (shown in parentheses after each item), which creates the path to objects in the MIB. See Figure 18. This path of numbers is called the object identifier (OID). Each object is uniquely and unambiguously identified by the path of numeric values.


When you perform an SNMP Get operation, the manager sends the OID to the agent, which in turn determines whether the OID is supported. If the OID is supported, the agent returns information about the object.

For example, to retrieve an object from the RMON MIB, the software uses this OID:

1.3.6.1.2.1.16

which indicates this path:

iso(1).ident-org(3).dod(6).internet(1).mgmt(2).mib(1).RMON(16)
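The relationship between the named path and the numeric OID can be sketched in a few lines. The helper names below are hypothetical. The second function shows one way to decide whether an OID lies under a given branch of the tree, which is the kind of check an agent needs when it determines whether it supports a requested object.

```python
import re

def parse_path(path: str):
    """Split a path such as 'iso(1).dod(6)' into (list of names, dotted numeric OID)."""
    parts = re.findall(r"([A-Za-z0-9-]+)\((\d+)\)", path)
    names = [name for name, _ in parts]
    oid = ".".join(number for _, number in parts)
    return names, oid

def in_subtree(oid: str, root: str) -> bool:
    """True when oid is the root itself or lies below it in the MIB tree."""
    oid_parts, root_parts = oid.split("."), root.split(".")
    return oid_parts[:len(root_parts)] == root_parts

names, rmon = parse_path("iso(1).ident-org(3).dod(6).internet(1).mgmt(2).mib(1).RMON(16)")
print(rmon)
print(in_subtree("1.3.6.1.2.1.16.1.1", rmon))
print(in_subtree("1.3.6.1.2.1.160", rmon))   # compared by component, not by text prefix
```

Comparing whole components avoids a false match between 16 and 160 that a plain string-prefix test would produce.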

Figure 18 MIB Tree Showing Key SNMP MIBs

ROOT
    ccitt(0)
    iso(1)
        standard(0)
        reg-authority(1)
        member-body(2)
        ident-org(3)
            dod(6)
                internet(1)
                    directory(1)
                    mgmt(2)
                        mib(1)
                            MIB-II (1-11):
                                system(1)
                                interfaces(2)
                                at(3)
                                ip(4)
                                icmp(5)
                                tcp(6)
                                udp(7)
                                egp(8)
                                transmission(10)
                                snmp(11)
                            RMON(16)
                            RMON2(17)
                    experimental(3)
                    private(4)
                        enterprises(1)
                            3Com® enterprise MIBs:
                                a3Com(43)
                                synernetics(114)
                                chipcom(49)
                                startek(260)
                                onstream(135)
                                retix(72)
    joint(2)

MIB-II

MIB-II defines various groups of manageable objects that contain device statistics as well as information about the device, device status, and the number and status of interfaces.

The MIB-II data is collected from network devices using SNMP. This data is collected in its raw form. To be useful, data must be interpreted by a management application, such as Status Watch.


MIB-II, the only MIB that has reached Internet Engineering Task Force (IETF) standard status, is the one MIB that all SNMP agents are likely to support.

Table 27 lists the MIB-II object groups. The number in parentheses indicates the group’s branch in the MIB subtree.

MIB-I supports groups 1 through 8; MIB-II supports groups 1 through 8, plus two additional groups.

Table 27 SNMP MIB-II Group Descriptions

MIB-II Group        Purpose
system(1)           Operates on the managed node
interfaces(2)       Operates on the network interface (for example, a port or MAC) that attaches the device to the network
at(3)               Was used for address translation in MIB-I but is no longer needed in MIB-II
ip(4)               Operates on the Internet Protocol (IP)
icmp(5)             Operates on the Internet Control Message Protocol (ICMP)
tcp(6)              Operates on the Transmission Control Protocol (TCP)
udp(7)              Operates on the User Datagram Protocol (UDP)
egp(8)              Operates on the Exterior Gateway Protocol (EGP)
transmission(10)    Applies to media-specific information (implemented in MIB-II only)
snmp(11)            Operates on SNMP (implemented in MIB-II only)

RMON MIB

RMON is an SNMP MIB that enables the collection of data about the network itself, rather than about devices on the network.

A typical RMON system consists of two components:

■ Probe — Connects to a LAN segment, examines all the LAN traffic on that segment, and keeps a summary of statistics (including historical data) in the probe’s local memory. The probe can stand alone or be embedded within the agent software. See “Other Commonly Used Tools” and “3Com SmartAgent Embedded Software” for more information.

■ Management station — Communicates with the probe and collects the summarized data from it. The station can be on a different network from the probe and can manage the probe through either in-band or out-of-band connections.

The IETF definition for the RMON MIB specifies several groups of information. These groups are described in Table 28.

Table 28 RMON Group Descriptions

RMON Group        Description
Statistics(1)     Total LAN statistics
History(2)        Time-based statistics for trend analysis
Alarm(3)          Notices that are triggered when statistics reach predefined thresholds
Hosts(4)          Statistics stored for each station’s MAC address
HostTopN(5)       Stations ranked by traffic or errors
Matrix(6)         Map of traffic communication among devices (that is, who is talking to whom)
Filter(7)         Packet selection mechanism
Capture(8)        Traces of packets according to predefined filters
Event(9)          Reporting mechanisms for alarms
Token Ring(10)    ■ Ring Station — Statistics and status information associated with each token ring station on the local ring, which also includes status information for each ring being monitored
                  ■ Ring Station Order — Location of stations on monitored rings
                  ■ Source Routing Statistics — Utilization statistics derived from source routing information optionally present in token ring packets

RMON2 MIB

RMON and RMON2 are complementary MIBs. The RMON2 MIB extends the capability of the original RMON MIB to include protocols above the MAC level. Because network-layer protocols (such as IP) are included, a probe can monitor traffic through routers attached to the local subnetwork.

Use RMON2 data to identify traffic patterns and slow applications. The RMON2 probe can monitor:

■ The sources of traffic arriving through a router from another network

■ The destination of traffic leaving through a router to another network

Because it includes higher-layer protocols (such as those at the application level), an RMON2 probe can provide a detailed breakdown of traffic by application.


Table 29 lists the additional MIB groups that are available with RMON2.

Table 29 RMON2 Group Descriptions

RMON2 Group                     Description
Protocol Directory(11)          Lists the inventory of protocols that the probe can monitor
Protocol Distribution(12)       Collects the number of octets and packets for protocols detected on a network segment
Address Map(13)                 Lists MAC-address-to-network-address bindings discovered by the probe, and the interface on which the bindings were last seen
Network Layer Host(14)          Counts the amount of traffic sent from and to each network address discovered by the probe
Network Layer Matrix(15)        Counts the amount of traffic sent between each pair of network addresses discovered by the probe
Application Layer Host(16)      Counts the amount of traffic, by protocol, sent from and to each network address discovered by the probe
Application Layer Matrix(17)    Counts the amount of traffic, by protocol, sent between each pair of network addresses discovered by the probe
User History(18)                Periodically samples user-specified variables and logs the data based on user-defined parameters
Probe Configuration(19)         Defines standard configuration parameters for RMON probes

3Com Enterprise MIBs

3Com Enterprise MIBs allow you to manage unique and advanced functionality of 3Com devices. MIB names and numbers are usually retained when organizations restructure their businesses; therefore, many of the 3Com Enterprise MIB names do not contain the word “3Com.” Figure 18 shows some of the 3Com Enterprise MIB names and numbers.

16
INFORMATION RESOURCES
This section lists the information resources that can help you troubleshoot problems with your network. It contains:

■ Books

■ URLs

Books

The books listed in Table 30 can help you with network troubleshooting.

Table 30 Reference Books

IBM’s Token-Ring Networking Handbook (J. Ranade Series on Computer Communications)
Author: George C. Sackett
Publisher: McGraw Hill Text
ISBN: 0070544182
Publish Date: June 1993

Interconnections: Bridges and Routers (Addison-Wesley Professional Computing Series)
Author: Radia Perlman
Publisher: Addison-Wesley Publishing Co.
ISBN: 0201563320
Publish Date: May 1992

Internetworking with TCP/IP: Design, Implementation, and Internals
Authors: Douglas E. Comer, David L. Stevens
Publisher: Prentice Hall
Edition: 2nd
ISBN: 0131255274

Internetworking with TCP/IP: Principles, Protocols, and Architecture
Author: Douglas E. Comer
Publisher: Prentice-Hall
Edition: 3rd
ISBN: 0132169878
Publish Date: April 1995

Managing Switched Local Area Networks
Author: Darryl Black
Publisher: Addison Wesley Longman, Inc.
ISBN: 0201185547
Publish Date: November 1997

Network Management Standards: SNMP, CMIP, TMN, MIBs, and Object Libraries (McGraw-Hill Computer Communications)
Author: Uyless Black
Publisher: McGraw Hill Text
Edition: 2nd
ISBN: 007005570X
Publish Date: November 1994

The Complete Guide to Netware LAN Analysis
Authors: Laura A. Chappell, Dan E. Hakes
Publisher: Sybex
Edition: 3rd
ISBN: 0782119034
Publish Date: July 1996

The Simple Book: An Introduction to Networking Management
Author: Marshall Rose
Publisher: Prentice-Hall
Edition: 2nd
ISBN: 0134516591
Publish Date: 1996

Token Ring Network Design (Data Communications and Networks)
Author: David Bird
Publisher: Addison-Wesley Publishing Co.
ISBN: 0201627604
Publish Date: July 1994

Troubleshooting TCP/IP (Network Troubleshooting Library)
Author: Mark A. Miller
Publisher: M & T Books
Edition: 2nd
ISBN: 1558514503

URLs

The following uniform resource locators (URLs) lead to Web sites that are useful for network troubleshooting:

■ www.3Com.com — 3Com Corporation’s Web site, which contains:

    ■ The latest release notes and documentation for all 3Com products. Documents are organized in the Support area by product type.

    ■ White papers and other technical documents about networking technology and solutions.

    ■ 3Com product information.

    ■ The 3Com Shopping Network.

■ wwwhost.ots.utexas.edu/ethernet/ethernet-home.html — Charles Spurgeon’s Ethernet Web Site, which includes Ethernet troubleshooting information.

■ techweb.cmp.com/nc/netdesign/series.htm — Network Computing Online’s Interactive Network Design Manual, which helps you to design and troubleshoot networks.

■ www.nmf.org — Network Management Forum (NMF), a nonprofit global consortium that promotes and accelerates the worldwide acceptance and implementation of a common, service-based approach to the management of networked information systems.

■ www.ovforum.org — HP OpenView Forum’s Web site. HP OpenView Forum is a nonprofit corporation formed by the largest licensees of Hewlett-Packard OpenView to represent the interests of HP OpenView users and developers world-wide. The Forum is an independent corporation, not affiliated with Hewlett-Packard Company.

■ hpcc920.external.hp.com/openview/index.html — HP OpenView home page.

■ www.iol.unh.edu/index.html — University of New Hampshire InterOperability Lab (IOL) web site. Information on IOL consortiums, test suites, and technology tutorials.

■ www.3com.com/nsc/500251.html — Location of the document RMON Methodology: Towards Successful Deployment for Distributed Enterprise Management by John McConnell of McConnell Consulting, published in 1997.

These URLs are known to work; however, URLs are subject to change without notice.


INDEX

Numerics
3Com enterprise MIBs 160

A
Address Resolution Protocol
  role in duplicate MAC addresses 117
alarms
  defined 54
  defining Start and Stop events 56
  setting against a baseline 56
  setting in LANsentry Manager 55
  tips for setting 57
alignment errors
  causes and actions 121
  defined 127
  See also FCS errors
analyzers
  defined 41
  use in troubleshooting 42
  See also probes
analyzing symptoms 28
application layer 25
ARP
  quality of 148
ARP (Address Resolution Protocol)
  role in duplicate MAC addresses 117
ATM (Asynchronous Transfer Mode) utilization 106
audience description, About This Guide 13

B
backbone
  checking utilization 103
  location of management station 44
  monitoring with probes 48
background noise 63
bad traffic
  Utilization 145
balancing network load 32
bandwidth utilization
  ATM parameters 106
  Ethernet parameters 107
  FDDI parameters 108
  problems with 103
  token ring parameters 108
baselines
  creating 62
  defined 62
  setting alarms from 56
book resources 161
BootP
  defined 118
BOOTstrap protocol 118
broadcast packets
  defined 114
  See also broadcast storms
broadcast storms
  broadcast packets 114
  defined 109
  disabling the offending interface 113
  first clues 109
  identifying with Traffix Manager 111
  monitoring with Status Watch 110
  multicast packets 114
  troubleshooting 109
BUS
  in emulated LAN 148
  transmission rate utilization 148

C
cable testers 42
cabling
  faulty 121
  problems 74, 124
  testing 42
  too long 129
  too short 130
collisions
  causes and actions 123
  defined 127
  excessive 119
  late 119
  related to packet loss 119
  when normal 119
  See also excessive collisions and late collisions
color coded legend
  Utilization 144
color status propagation 94
communications servers
  connecting on the network 51
  defined 50
community strings
  default settings for 3Com devices 70
  defined 155
  device configuration 53
configuring
  Utilization Tool 144
configuring and customizing
  Utilization 144
congested station 76
connections
  adding redundancy 77
  undesirable for FDDI 80
  valid for FDDI 80
connectivity problems
  defined 23
  FDDI ring disconnections 73
  manager-to-agent communication 67
conventions
  notice icons, About This Guide 15
  text, About This Guide 15
CRC (Cyclic Redundancy Check) errors
  causes and actions 123
  defined 127
customizing
  Utilization Tool 144

D
data link layer 25
DECnet Phase 4 networks 118
default community strings 70
default thresholds 54
designing a network 43
device configurations
  for management 52
  misconfigured 29
  Ping responder 39
  storing 60
device level
  measuring performance 145
  troubleshooting 95
Device View
  checking packet loss statistics 125
  correcting spanning tree configurations 113
  defined 36
  using to set traps 53
devices
  configuration information 60
  configuring for management 52
  default community strings 70
  faulty 122
  grouping 34, 54
  inventory 34, 61
  monitoring with probes 48
  monitoring with Transcend software 53
DHCP (Dynamic Host Configuration Protocol)
  defined 118
diagnostic equipment on FDDI 78
disabling an interface 113
DNS server problems 40
dual homing
  configuration 78
  defined 77
dual hosting 52
duplicate addresses
  causes 115
  troubleshooting 115
  with IP addresses 116
  with MAC addresses 116
Dynamic Host Configuration Protocol 118

E
ECAM 116, 117
elasticity buffer errors
  causes 131
  defined 133
Enterprise Communications Analysis Module 116
enterprise MIBs 160
equipment
  backups 32
  for testing 31
  replacing 32
Ethernet
  cabling problems 124
  frames through port 147
  network problems 76
  nonstandard cabling problems 129
  port
    utilization 146
  segment problems 76
  station problems 75
  utilization 107
Ethernet packet loss
  checking with LANsentry Manager 122
  checking with Status Watch 121
  Ethernet standard violations 129
  troubleshooting 119
excessive collisions
  causes and actions 122
  defined 128
  related to packet loss 119


F
FCS (Frame Check Sequence) errors
  defined 128
  related to packet loss 119
FCS errors
  See also alignment errors
FDDI
  identifying problems with Status Watch 132
  MAC errors 132
  ring errors 131
  station problems 76
  utilization 108
FDDI backbone
  monitoring with probes 48
  position of management station 44
FDDI connectivity
  adding redundancy 77
  dual homing 77
  Optical Bypass Unit 78
  SMT role 74
  troubleshooting 73
  undesired connections 80
  valid connections 80
file servers
  correcting timeouts 136
firewalls
  protection against broadcast storms 109
  restricting access 29
frame errors 131
  causes 132
  defined 133
frames not copied
  defined 133
FTP (File Transfer Protocol)
  compared to TFTP 41
  defined 41

G
gateway address
  defined 69
good traffic
  Utilization 145

H
hardware
  backups 32
  upgrading 32
historical reports 106
history
  graph 145
hysteresis zone
  controlling alarms 55

I
identifying VLAN splits 98
IETF
  MIB-II MIB 158
  RMON MIB 159
in/out UNI traffic 144
in-band management 49
information resources 161
installation problems 14
intermittent connectivity 23
internet link
  monitoring with probes 48
IP address
  switch 144
IP addresses
  causes of duplicates 115, 118
  defined 69
  device configuration 53
  dynamically assigned 118
  identifying duplicates 116
  Pinging 39
IP hostnames
  device configuration 53
  Pinging 40
ISO (International Standards Organization) 25
isolated stations
  defined 74

J
jabbering
  defined 141
  protection mechanism failure 124

L
LAN driver, faulty 125
LAN Emulation
  tracing VCCs between LANE clients
    in Wizard Tool 100
LANE
  statistics 148
LANE User
  statistics 149
LANSentry Manager 84
LANsentry Manager
  analyzing file server timeouts 136, 138
  checking Ethernet packet loss 122
  decoding packets 139
  defined 35
  identifying duplicate IP addresses 117
  setting alarms 55
  setting thresholds 55
late collisions
  causes and actions 123
  defined 128
  related to packet loss 119
LE Server
  data 148
  statistics 148
LER cutoff 132
link errors 74
  causes 132
  defined 134
log book
  maintaining 62
logical network configuration 60
loss of connectivity
  overview 23

M
MAC addresses
  causes of duplicate 115, 117
  finding 34
  identifying duplicate 116
  storing 61
MAC neighbor change events 134
MAC Watch
  finding duplicate IP addresses 116
  troubleshooting file server timeouts 139
MACFrameErrorRatio variable 132
MAC-to-IP address translation 35
managed hubs
  defined 46
  in troubleshooting 52
  troubleshooting file server timeouts 140
management configurations
  checking 68
  design of network 43
  gateway address 69
  IP address 69
  SNMP community strings 69
  SNMP traps 71
management software
  Enterprise VLAN Manager 60
  Device View 36
  LANsentry Manager 35
  Network Admin Tools 32
  Status Watch 34
  Traffix Manager 35
  Transcend Central 34
  Upgrade Manager 36
  Web Reporter 34
management station
  configuration 52
  connecting to UPS 52
  dual hosting 52
  location on network 44
  RMON MIB 158
  security 52
measuring
  network-wide ATM traffic
    Utilization 143
MIB browser
  in NNM 37
  viewing the tree 155
MIB-II
  defined 157
  objects 158
MIBs
  enterprise 160
  example of OID 156
  in SNMP management 154
  MIB-II 157
  RMON 158
  RMON2 159
  tree representation 157
  tree structure 155
misconfigurations
  in newly connected devices 29
modem
  accessing the device console 49
  out-of-band connections 49
multicast packets
  defined 114
  See also broadcast storms

N
Network Admin Tools 32
network changes
  interpreting 27
network configuration
  device configurations 60
  site map 58
  VLAN setup 60
network design
  console connections 49
  criteria 43
  for business-critical networks 47
  position of management station 44
  redundant management 51
  tips 52
  using communications servers 50
  using probes 45
network file server timeouts
  checking for errors 136
  correcting the problem 140
  decoding packets 139
  description 135
  overview 135
  reproducing the fault 138
Network ID 69
network layer 25
network load
  balancing 32
network loop 122
network management
  position of management station 44
network management platforms
  defined 36
  in troubleshooting 37
network map
  content 59
  defined 58
  example 59
network noise 121
network performance measurement 143
NFS
  defined 141
  in file server timeouts 138
NNM
  MIB browser 37
normal networks
  baselining 62
  collision rates 119
  defined 62
  identifying background noise 63
  setting thresholds and alarms 54

O
OBU
  configuration 79
  defined 78
OID
  example 156
  MIB tree 157
  use in trap reporting 53
ONC 141
Open Network Computing 141
OSI reference model
  and network troubleshooting 25
  graphical representation 26
  layers and troubleshooting tools 25
out-of-band connections
  defined 49
  with Telnet 41

P
packet capturing
  using analyzers 41
passwords
  community strings 155
  storing 61
peer wrap condition
  causes 73
  defined 79
  evaluating 77
performance
  device level 145
performance problems 131
  checking utilization 103
  correcting duplicate addresses 115
  correcting FDDI ring errors 131
  defined 24
  Ethernet congestion problems 119
  solving file server timeouts 135
  stopping broadcast storms 109
Performance Statistics window 145
physical connection break 29
physical layer 25
Ping
  checking file server response 136
  creating a script 40
  defined 39
  device configuration 39
  interpreting messages 40
  strategies for using 39
Ping responder 39
platforms 36
presentation layer 25
probes
  defined 42
  in troubleshooting 42
  on business-critical networks 47
  placement on a network 45
  RMON MIB 158
  roving analysis 46
  See also analyzers 42
problems
  analysis example 30
  device configuration 29
  identifying causes 29
  physical connection break 29
  recognizing symptoms 27
  software installation 14
  solving 32
  testing causes 29
  Transcend software errors 14
  understanding 29
protocol analysis 41

Q
QoS (Quality of Service) 29

R
RARP 118
receive discards 129
redundant connections
  dual homing 77
  Optical Bypass Unit 78
redundant management 51
replacing faulty equipment 32
reporting
  with Web Reporter 34
reports
  historical 106
  utilization 106
resources
  books 161
  URLs 162
Reverse ARP 118
RIP packets 111
RMON
  groups 159
  LANsentry Manager 35
  MIB definition 158
  probes 42
  SmartAgent software 38
  Traffix Manager 35
RMON2
  groups 160
  LANsentry Manager 35
  MIB definition 159
  probes 42
  purpose 159
  Traffix Manager 35
routers, faulty 125
Routing Information Protocol 111
routing table
  examining 50
roving analysis
  in business-critical networks 49
  with probes 46

S
security
  of management station 52
  SNMP community strings 71, 155
segmented ring
  defined 74
  identifying 75
serial line
  accessing the device console 49
  out-of-band connections 49
servers
  comm 50
  timeouts 135
session layer 25
SmartAgent software
  defined 37
  use in troubleshooting 38
SMT (Station Management)
  role in FDDI connectivity 74
SMTConfigurationState variable 73
SMTPeerWrapFlag variable 77
SNMP
  messages 154
SNMP agent
  defined 153
  troubleshooting communication problems 67
SNMP community strings
  3Com defaults 70
  defined 69, 155
  device configuration 53
SNMP Get
  defined 154
  when valid 70
SNMP Get Responses
  defined 154
SNMP Get-next
  defined 154
  when valid 70
SNMP management
  location of station on network 45
  problems with 45
SNMP manager
  defined 153
  troubleshooting communication problems 67
SNMP Set
  defined 154
  when valid 70
SNMP traps
  defined 71, 154
  device configuration 53
  message description 154
  supported objects 154
software
  alerts 28
  backups 32
  problems 14
  upgrading 32
solving problems
  balancing network load 32
  overview 24
  replacing equipment 32
  upgrading software and hardware 32
spanning tree
  causing broadcast storms 110
  correcting configurations 113
  traffic not monitored 111
statistics
  LANE component 148
  LE Server 148
Status Watch
  checking for Ethernet packet loss 121
  checking utilization 104
  defined 34
  identifying a broadcast storm 110
  identifying duplicate FDDI MAC addresses 116
  identifying FDDI ring errors 132
  setting thresholds 54
Stop and Start events 56
subnetwork mask
  defined 69
switch radius 144
symptoms
  analyzing 28
  recognizing 27
  software alerts 28
  user comments 27

T
Telnet
  accessing the device console 49
  checking file server response 136
  defined 41
  examining a routing table 50
  out-of-band connections 41, 49
  use in troubleshooting 41
testing
  equipment 31
  proving a theory 29
TFTP
  compared to FTP 41
  defined 41
thresholds
  defined 54
  hysteresis zone 55
  setting in LANsentry Manager 55
  setting in Status Watch 54
  tips for setting 57
thru ring 73
timeout problems
  network file servers 135
  overview 23
Token Ring Manager
  Statistics Tool 82
token ring utilization 108
too long errors
  causes and actions 124
  defined 129
too short errors
  causes and actions 124
  defined 130
tracing
  LAN Emulation Control VCCs
    in Wizard Tool 100
traffic patterns 143
  evaluating 36
  RMON2 MIB 159
Traffix Manager
  defined 35
  identifying broadcast storms 111
transceiver, faulty 121
Transcend Central
  3Com inventory database 53
  defined 34
  grouping devices 54
Transcend Software
  Upgrade Manager 36
Transcend software
  Device View 36
  Enterprise VLAN Manager 60
  LANsentry Manager 35
  monitoring devices 53
  Network Admin Tools 32
  Status Watch 34
  Traffix Manager 35
  Transcend Central 34
  troubleshooting toolbox 33
  Web Reporter 34
transmit discards
  defined 130
transport layer 25
trap reporting
  defined 154
  device configuration 53
trap-based polling 155
troubleshooting
  device level 95
  LANE level 95
  Virtual LANs level 97
troubleshooting strategy 26
twisted ring
  defined 75, 79
  evaluating 77

U
undesired connection attempt
  defined 80
  evaluating 77
uninterruptible power supply 52
upgrading software
  to solve problems 32
  using FTP 41
  using TFTP 41
UPS 52
URL resources 162
user complaints 27
Utilization
  bad traffic 145
  color coded legend 144
  Communication Configuration Tab 145
  configuring and customizing 144
  good traffic 145
  historical data 143
  Map Configuration Tab 144
  Polling Configuration Tab 145
  Traffic Polling Configuration 145
utilization
  ATM parameters 106
  Ethernet parameters 107
  FDDI parameters 108
  historical reports 106
  of Ethernet port 146
  of LEC transmission 149
  problems with 103
  token ring parameters 108
  transmission rate of BUS 148
Utilization Icon 143
Utilization Map 143
Utilization Tool
  configuring and customizing 144

V
valid service 29
VLAN
  splits 98
VLANs (virtual LANs) 60

W
WAN Link
  monitoring with probes 48
Web Reporter
  defined 34
  historical utilization reports 106
wiring
  testing 42
Wizard Tool
  tracing
    LAN Emulation VCCs 100
wrapped ring
  defined 73
  identifying 75
  peer wrap condition 73, 79
WWW browser
  with Web Reporter 34
