How to troubleshoot switch fabric



Switch Fabric: Troubleshooting Tips

How to Troubleshoot Switch Fabric?

Introduction

What is Switch Fabric?

The switch fabric is essentially the backplane for all ports and modules on the switch. When a connection is made from a port on one module to a port on another module, it is made across the switch fabric. Physically, it is the combination of silicon, plastic, and metal that enables ports to connect and pass traffic between themselves.

The switch fabric can be blocking or non-blocking. In a non-blocking design, the total bandwidth of all ports that use the switch fabric does not exceed the fabric's capacity. In other words, the port density of the switch is such that the combined port capacity is never greater than that of the switch fabric. A switch operating in non-blocking mode therefore does not experience congestion in the fabric, and its ports never want for bandwidth between each other.

A blocking switch has an aggregate port capacity that exceeds the total capacity of the switch fabric. The switch copes by blocking traffic flow when the switch fabric capacity is exceeded or otherwise not available.

The switch fabric resides on the supervisor engine (SE). When a port has to communicate with another port, the supervisor checks its tables (Content Addressable Memory [CAM] for Layer 2 addresses and Ternary CAM [TCAM] for Layer 3 addresses) to determine which slot and port is needed. The supervisor then establishes the connection between the ports. The switch fabric can also reside on its own module, such as the Switch Fabric Module (WS-C6500-SFM) and the Switch Fabric Module 2 (WS-X6500-SFM2) for the Catalyst 6500 Series. A separate module allows the available capacity to be expanded without replacing the SE, or to be expanded beyond the capacity of the SE.

On the Catalyst 6500, the switch fabric is a daughter card installed on the Sup720. In its first implementation, back in the days of the Sup2, it was a separate module, the Switch Fabric Module. It provides backplane connectivity between line cards. The default bandwidth available on the backplane of the 6500 is 32 Gbps. This 32 Gbps is shared by all slots for serial transmission of data, so at any instant only two ports can be communicating.

With the addition of the switch fabric, the switch's backplane changes from a serially accessed bus to a crossbar fabric. With a crossbar fabric, many ports can transmit and receive data simultaneously, which provides much higher throughput.

The crossbar fabric consists of 18 fabric channels, which gives each line card up to two fabric channels into the crossbar fabric. These channels run at 8 Gbps or 20 Gbps, depending on the line card used. The CEF256 and dCEF256 series modules connect to the fabric at 8 Gbps per channel, and the CEF720 series modules connect at 20 Gbps per channel.
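The channel figures above can be turned into a quick back-of-the-envelope calculation. The following is a minimal Python sketch that only multiplies the numbers quoted in the text (18 channels, up to two channels per line card, 8 or 20 Gbps per channel). The function and constant names are illustrative and do not come from any Cisco tool.

```python
# Per-channel speeds quoted in the text, in Gbps.
CHANNEL_SPEED_GBPS = {"CEF256": 8, "dCEF256": 8, "CEF720": 20}

# Number of fabric channels in the crossbar, as stated in the text.
TOTAL_FABRIC_CHANNELS = 18


def slot_bandwidth_gbps(family, channels=2):
    """Fabric bandwidth for one line card: channels x per-channel speed."""
    if channels not in (1, 2):
        raise ValueError("a line card uses one or two fabric channels")
    return CHANNEL_SPEED_GBPS[family] * channels


def fabric_capacity_gbps(channel_speed_gbps=20):
    """Sum of all fabric channels when every channel runs at the given speed."""
    return TOTAL_FABRIC_CHANNELS * channel_speed_gbps


if __name__ == "__main__":
    print(slot_bandwidth_gbps("CEF720"))       # 2 x 20 = 40
    print(slot_bandwidth_gbps("CEF256", 1))    # 1 x 8 = 8
    print(fabric_capacity_gbps())              # 18 x 20 = 360
```

Compared with the 32 Gbps shared bus, a single CEF720 line card with two channels already has more fabric bandwidth than the whole classic backplane.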

Requirements

To use the switch fabric, a module must be fabric-enabled.

Troubleshooting Tips

1. If the Switch Fabric Module does not work as expected, check the following:

a) Check whether the switch fabric status is Active. To do this, use the show fabric active command, which displays the current status of the switch fabric. Here is an example:

Switch# show fabric active

Active fabric card in slot 5

No backup fabric card in the system

If the system has a backup fabric card, the output looks like this:

Switch# show fabric active

show fabric active:

Active fabric card in slot 5

Backup fabric card in slot 6

b) Check the fabric status of the switching modules in the device. To do this, use the show fabric status [slot_number | all] command, which displays the fabric status of one or all switching modules. Here is an example:

Switch# show fabric status
slot  channel  speed  module          fabric
                      status          status
1     0        8G     OK              OK
5     0        8G     OK              Up- Timeout
6     0        20G    OK              Up- BufError
8     0        8G     OK              OK
8     1        8G     OK              OK
9     0        8G     Down- DDRsync   OK
Switch#
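When the output is long, it can help to filter it for channels whose module or fabric status is not OK. The following is a minimal Python sketch that works on captured text; it assumes the simple five-column layout shown in the example and that a status such as "Up- Timeout" is printed as two whitespace-separated tokens. The function names are illustrative.

```python
SAMPLE_STATUS = """\
slot channel speed module fabric
status status
1 0 8G OK OK
5 0 8G OK Up- Timeout
6 0 20G OK Up- BufError
8 0 8G OK OK
8 1 8G OK OK
9 0 8G Down- DDRsync OK
"""


def parse_fabric_status(text):
    """Parse captured 'show fabric status' text into a list of dicts."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        # Data rows start with a numeric slot; header lines are skipped.
        if len(tokens) < 5 or not tokens[0].isdigit():
            continue
        # Re-join statuses split in two, e.g. "Up-" + "Timeout".
        statuses = []
        for token in tokens[3:]:
            if statuses and statuses[-1].endswith("-"):
                statuses[-1] += " " + token
            else:
                statuses.append(token)
        if len(statuses) != 2:
            continue  # layout not understood (assumption: two status columns)
        rows.append({
            "slot": int(tokens[0]),
            "channel": int(tokens[1]),
            "speed": tokens[2],
            "module_status": statuses[0],
            "fabric_status": statuses[1],
        })
    return rows


def unhealthy_channels(rows):
    """Return the rows where either status differs from OK."""
    return [r for r in rows
            if r["module_status"] != "OK" or r["fabric_status"] != "OK"]


if __name__ == "__main__":
    for row in unhealthy_channels(parse_fabric_status(SAMPLE_STATUS)):
        print(row["slot"], row["channel"],
              row["module_status"], "/", row["fabric_status"])
```

For the sample above, the sketch reports slots 5, 6, and 9. Output with additional columns, such as the hotStandby columns, is skipped by this parser rather than guessed at.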


c) Check the fabric utilization of the switching modules. To do this, use the show fabric utilization [slot_number | all] command, which displays the fabric utilization of one or all modules. Here is an example:

Switch# show fabric utilization all
slot  channel  speed  Ingress %  Egress %
1     0        20G    0          0
1     1        20G    0          0
2     0        20G    0          24
2     1        20G    0          24
3     0        20G    48         0
4     0        20G    0          0
4     1        20G    0          0
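To spot busy channels quickly, the utilization table can be filtered against a threshold. The following Python sketch parses captured output in the layout shown above. The 40 percent default is an arbitrary illustrative value, not a Cisco recommendation, and the function names are hypothetical.

```python
SAMPLE_UTILIZATION = """\
slot channel speed Ingress % Egress %
1 0 20G 0 0
1 1 20G 0 0
2 0 20G 0 24
2 1 20G 0 24
3 0 20G 48 0
4 0 20G 0 0
4 1 20G 0 0
"""


def parse_fabric_utilization(text):
    """Return (slot, channel, speed, ingress_pct, egress_pct) tuples."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        # Data rows have exactly five fields and start with a numeric slot.
        if len(tokens) != 5 or not tokens[0].isdigit():
            continue
        rows.append((int(tokens[0]), int(tokens[1]), tokens[2],
                     int(tokens[3]), int(tokens[4])))
    return rows


def busy_channels(rows, threshold_pct=40):
    """Return (slot, channel) pairs at or above the threshold in either direction."""
    return [(slot, channel)
            for slot, channel, _speed, ingress, egress in rows
            if max(ingress, egress) >= threshold_pct]


if __name__ == "__main__":
    rows = parse_fabric_utilization(SAMPLE_UTILIZATION)
    print(busy_channels(rows))       # [(3, 0)]
    print(busy_channels(rows, 20))   # [(2, 0), (2, 1), (3, 0)]
```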

2. In certain rare conditions, the output of the show fabric channel-counters command may show an incrementing number of rxErrors.

Switch# show fabric channel-counters
slot  channel  rxErrors  txErrors  txDrops  lbusDrops
1     1        0         0         0        0
3     0        0         0         0        0
3     1        0         0         0        0
5     0        5         0         0        0
8     0        39        0         0        0
8     1        0         0         0        0

a) An rxError indicates that the module received one or more corrupted packets and dropped them.

b) The fabric does NOT check the CRC when forwarding frames between different fabric ports/channels.

c) The errors can therefore be caused either by the receiving module corrupting the frames, or by the module receiving frames that were already corrupted by any other fabric-enabled module in the switch.
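Because a single non-zero value says little, the useful signal is whether rxErrors grows between two captures. The following Python sketch compares two snapshots of show fabric channel-counters taken some time apart. The first snapshot is the output shown above; the second is invented purely for illustration, and the function names are hypothetical.

```python
FIRST_SNAPSHOT = """\
Slot channel rxErrors txErrors txDrops lbusDrops
1 1 0 0 0 0
3 0 0 0 0 0
3 1 0 0 0 0
5 0 5 0 0 0
8 0 39 0 0 0
8 1 0 0 0 0
"""

# Hypothetical later capture: only slot 8, channel 0 has changed.
SECOND_SNAPSHOT = """\
Slot channel rxErrors txErrors txDrops lbusDrops
1 1 0 0 0 0
3 0 0 0 0 0
3 1 0 0 0 0
5 0 5 0 0 0
8 0 57 0 0 0
8 1 0 0 0 0
"""


def parse_channel_counters(text):
    """Map (slot, channel) to the rxErrors value from captured output."""
    counters = {}
    for line in text.splitlines():
        tokens = line.split()
        # Data rows consist of exactly six numeric fields.
        if len(tokens) != 6 or not all(t.isdigit() for t in tokens):
            continue
        slot, channel, rx_errors = int(tokens[0]), int(tokens[1]), int(tokens[2])
        counters[(slot, channel)] = rx_errors
    return counters


def incrementing_rx_errors(before, after):
    """Return {(slot, channel): increase} for channels whose rxErrors grew."""
    return {key: after[key] - before[key]
            for key in after
            if key in before and after[key] > before[key]}


if __name__ == "__main__":
    before = parse_channel_counters(FIRST_SNAPSHOT)
    after = parse_channel_counters(SECOND_SNAPSHOT)
    print(incrementing_rx_errors(before, after))   # {(8, 0): 18}
```

In this made-up comparison, slot 5 has a stale count of 5 that is not growing, while slot 8, channel 0 is still accumulating errors and is the one to act on first.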

The following actions can be taken to resolve these errors:

a) Reseat the module that shows rxErrors. Reloading the line card in question might stop the errors for some time, but the errors might eventually come back.

b) If an empty slot is available in the chassis, move the affected line card to the empty slot.

c) If no empty slot is available, swap the line card that counts rxErrors with another line card in the chassis that shows no issues, or with a known-good line card.

d) Swap the active and standby supervisors, that is, move the supervisor from slot 5 to slot 6 and vice versa (supervisor failover).

e) Replace the affected line card.

The output of the show fabric status command may show "not-hot" for line cards in the hotStandby support column:

Switch# show fabric status
slot  channel  speed  module  fabric  hotStandby  Standby  Standby
                      status  status  support     module   fabric
3     0        20G    OK      OK      Y(not-hot)
3     1        20G    OK      OK      Y(not-hot)
4     0        20G    OK      OK      Y(not-hot)
4     1        20G    OK      OK      Y(not-hot)
5     0        20G    OK      OK      N/A
5     1        20G    OK      OK      N/A
6     0        20G    OK      OK      N/A
6     1        20G    OK      OK      N/A
7     0        20G    OK      OK      Y(not-hot)
7     1        20G    OK      OK      Y(not-hot)
8     0        20G    OK      OK      Y(not-hot)
8     1        20G    OK      OK      Y(not-hot)
9     0        20G    OK      OK      Y(not-hot)
9     1        20G    OK      OK      Y(not-hot)

Reason: The standby fabric hot sync feature is only supported on the E version of the 6500 chassis, and this system has a non-E version.

3. You may see the error message “SP: Linecard endpoint of Channel 7 lost Sync. To Lower fabric and trying to recover now!”.

Reason: The message is caused by a line card that is not fully or properly seated. To identify this line card, analyze the output of the show fabric fpoe map command. Here is an example:

Switch#show fabric fpoe map

slot channel fpoe

1 0 0

1 1 9

2 0 1

2 1 10

3 0 2

3 1 11

4 0 3

4 1 12

5 0 4

6 0 5

6 1 14

7 0 6

7 1 15

8 0 7

8 1 16

9 0 8

9 1 17

Workaround: Each fpoe is mapped to a specific line card slot. In the output of show fabric fpoe map above, fpoe 7 points to the line card in slot 8, and that is the card causing the error messages. Once the suspect line card is identified, your next action should be to schedule a removal and reinsertion of that card to try to stop this message from recurring.
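The lookup described above can be sketched in a few lines of Python: extract the channel number from the log message, then find the matching fpoe entry in the captured map. This assumes, as the text does, that the channel number in the message corresponds to the fpoe value; the function names are illustrative.

```python
import re

FPOE_MAP = """\
slot channel fpoe
1 0 0
1 1 9
2 0 1
2 1 10
3 0 2
3 1 11
4 0 3
4 1 12
5 0 4
6 0 5
6 1 14
7 0 6
7 1 15
8 0 7
8 1 16
9 0 8
9 1 17
"""

MESSAGE = ("SP: Linecard endpoint of Channel 7 lost Sync. "
           "To Lower fabric and trying to recover now!")


def parse_fpoe_map(text):
    """Map each fpoe value to its (slot, channel) pair."""
    mapping = {}
    for line in text.splitlines():
        tokens = line.split()
        # Data rows are three numeric fields: slot, channel, fpoe.
        if len(tokens) != 3 or not all(t.isdigit() for t in tokens):
            continue
        slot, channel, fpoe = (int(t) for t in tokens)
        mapping[fpoe] = (slot, channel)
    return mapping


def suspect_slot(message, mapping):
    """Return the (slot, channel) named by a 'lost Sync' message, or None."""
    match = re.search(r"Channel\s+(\d+)\s+lost Sync", message)
    if match is None:
        return None
    return mapping.get(int(match.group(1)))


if __name__ == "__main__":
    print(suspect_slot(MESSAGE, parse_fpoe_map(FPOE_MAP)))   # (8, 0)
```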

4. The system switching performance drops from 30 Mpps to 15 Mpps.

Reason: When classic and fabric-enabled modules are mixed in a chassis, the system switching performance drops from 30 Mpps to 15 Mpps.

Older "Classic" modules in the 6500 (models 61xx, 62xx, 63xx, and 64xx) send all traffic over the switch bus backplane, to be forwarded by the supervisor. Fabric-enabled modules send only the packet headers over the bus, and the switch fabric can be used to forward the data portion of the packet.

Workaround: Consider replacing any "Classic" modules with fabric-enabled modules in order to increase system performance.

5. To troubleshoot further, collect the following show command output before

opening a TAC case.

Step 1. Turn on service internal.

switch# configure terminal

switch(config)# service internal

Step 2. Collect the requested logs.

terminal length 0

show fabric active

show fabric channel-counters

show fabric drop

show fabric errors

show fabric errors threshold

show fabric fpoe map

show fabric status

show fabric utilization

show tech-support

remote login switch

terminal length 0

show fabric error

show fabric state-machine channel state

show fabric state-machine channel event_trace 11

show fabric resync

show fabric timeout

show platform hardware capacity fabric

exit

Step 3. Turn off service internal.

switch# configure terminal

switch(config)# no service internal


Troubleshooting Example

1. Fabric timeout error message:

%FABRIC-SP-[module-number]-TIMEOUT_ERR: Fabric in slot [dec] reported timeout error for channel [dec] (Module [dec], fabric connection [dec])

Description

The error message indicates that firmware code on the fabric detected that the input or output buffer was not moving. To recover from this condition, the system automatically resynchronizes the fabric channel.

Troubleshooting Steps

1. Issue the hw-module module [slot] reset command to soft-reset the module.

2. After the module is up again, capture the output of the show module and show diagnostic module all commands.

Sample Output of “show module”

Switch# show module
Mod Ports Card Type                              Model              Serial No.
--- ----- -------------------------------------- ------------------ -----------
  2    24 CEF720 24 port 1000mb SFP              WS-X6724-SFP       SAL0AAAAAAA
  3    24 CEF720 24 port 1000mb SFP              WS-X6724-SFP       SAD0AAAAAAA
  5     2 Supervisor Engine 720 (Hot)            WS-SUP720-3B       SAD0AAAAAAA
  6     2 Supervisor Engine 720 (Active)         WS-SUP720-3B       SAD0AAAAAAA
  7     4 CEF720 4 port 10-Gigabit Ethernet      WS-X6704-10GE      SAL1AAAAAAA
  8     4 CEF720 4 port 10-Gigabit Ethernet      WS-X6704-10GE      SAL1AAAAAAA

Sample Output of “show diagnostic module all”

Switch#show diagnostic module all

Current bootup diagnostic level: minimal

Module 6: Supervisor Engine 720 (Active)

Overall Diagnostic Result for Module 6 : PASS

Diagnostic level at card bootup: minimal

Test results: (. = Pass, F = Fail, U = Untested)

1) TestScratchRegister -------------> .

2) TestSPRPInbandPing --------------> .

3) TestTransceiverIntegrity:

Port 1 2

----------

U U

4) TestActiveToStandbyLoopback:

Port 1 2

----------

U U

5) TestLoopback:

Port 1 2

---------

6) TestNewIndexLearn ---------------> .

7) TestDontConditionalLearn --------> .

8) TestBadBpduTrap -----------------> .

9) TestMatchCapture ----------------> .

10) TestProtocolMatchChannel --------> .

11) TestFibDevices ------------------> .

12) TestIPv4FibShortcut -------------> .

13) TestL3Capture2 ------------------> .

14) TestIPv6FibShortcut -------------> .

15) TestMPLSFibShortcut -------------> .

16) TestNATFibShortcut --------------> .

17) TestAclPermit -------------------> .

18) TestAclDeny ---------------------> .

19) TestQoSTcam ---------------------> .

20) TestL3VlanMet -------------------> .

21) TestIngressSpan -----------------> .

22) TestEgressSpan ------------------> .

23) TestNetflowInlineRewrite:

Port 1 2

----------

U U

24) TestFabricSnakeForward ----------> .

25) TestFabricSnakeBackward ---------> .

26) TestTrafficStress ---------------> U

27) TestFibTcamSSRAM ----------------> U

28) TestAsicMemory ------------------> U

29) TestAclQosTcam ------------------> U

30) TestNetflowTcam -----------------> U

31) ScheduleSwitchover --------------> U

32) TestFirmwareDiagStatus ----------> .

If the output is not as expected, physically pull out the module and reseat it firmly in the chassis to hard-reset it. After the module is up again, capture the output of the show module and show diagnostic module all commands.

Here is an example of a failed diagnostic test for module 1:

Module 1: Catalyst 6000 supervisor 2 (Active) SerialNo :

Overall Diagnostic Result for Module 1 : MINOR ERROR

Diagnostic level at card bootup: minimal

Test results: (. = Pass, F = Fail, U = Untested)

1) TestSPRPInbandPing --------------> F

2) TestTransceiverIntegrity:
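
With more than thirty tests per module, failures are easy to miss by eye. The following Python sketch scans captured show diagnostic module all text for single-result tests marked F and for the overall result of each module. It assumes the line formats shown in the samples above and deliberately ignores per-port tests (those followed by a "Port" table). The function names are illustrative.

```python
import re

DIAG_SAMPLE = """\
Module 6: Supervisor Engine 720 (Active)
Overall Diagnostic Result for Module 6 : PASS
Test results: (. = Pass, F = Fail, U = Untested)
1) TestScratchRegister -------------> .
2) TestSPRPInbandPing --------------> .
26) TestTrafficStress ---------------> U
Module 1: Catalyst 6000 supervisor 2 (Active) SerialNo :
Overall Diagnostic Result for Module 1 : MINOR ERROR
Test results: (. = Pass, F = Fail, U = Untested)
1) TestSPRPInbandPing --------------> F
2) TestTransceiverIntegrity:
"""

MODULE_RE = re.compile(r"^\s*Module (\d+):")
OVERALL_RE = re.compile(r"^\s*Overall Diagnostic Result for Module (\d+) : (.+?)\s*$")
TEST_RE = re.compile(r"^\s*\d+\)\s+(\w+)\s+-+>\s+([.FU])\s*$")


def overall_results(text):
    """Map module number to its overall diagnostic result string."""
    results = {}
    for line in text.splitlines():
        match = OVERALL_RE.match(line)
        if match:
            results[int(match.group(1))] = match.group(2)
    return results


def failed_tests(text):
    """Return (module, test_name) for every single-result test marked F."""
    failures = []
    current_module = None
    for line in text.splitlines():
        module_match = MODULE_RE.match(line)
        if module_match:
            current_module = int(module_match.group(1))
            continue
        test_match = TEST_RE.match(line)
        if test_match and test_match.group(2) == "F":
            failures.append((current_module, test_match.group(1)))
    return failures


if __name__ == "__main__":
    print(overall_results(DIAG_SAMPLE))   # {6: 'PASS', 1: 'MINOR ERROR'}
    print(failed_tests(DIAG_SAMPLE))      # [(1, 'TestSPRPInbandPing')]
```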

2. Overruns on some ports on Card 5 (WS-X6548-GE-TX)

Switch1# show interface counters

Overruns were noticed on four interfaces. The counters were not incrementing:

GigabitEthernet5/1 is up, line protocol is up (connected)

Full-duplex, 1000Mb/s, media type is 10/100/1000BaseT

0 input errors, 0 CRC, 0 frame, 26 overrun, 0 ignored

GigabitEthernet5/6 is up, line protocol is up (connected)

Full-duplex, 1000Mb/s, media type is 10/100/1000BaseT

0 input errors, 0 CRC, 0 frame, 14 overrun, 0 ignored

GigabitEthernet5/9 is up, line protocol is up (connected)

Full-duplex, 1000Mb/s, media type is 10/100/1000BaseT

0 input errors, 0 CRC, 0 frame, 159 overrun, 0 ignored

GigabitEthernet5/12 is up, line protocol is up (connected)

Full-duplex, 1000Mb/s, media type is 10/100/1000BaseT

0 input errors, 0 CRC, 0 frame, 269 overrun, 0 ignored

The question is whether enabling the switch fabric will rectify the overruns.

Solution:

Enabling the switch fabric will not rectify the overruns. After a Switch Fabric Module is installed in a Cisco Catalyst 6500 series switch, traffic is forwarded to and from modules in different modes, which does not necessarily help resolve overruns. Traffic is forwarded in one of these modes:

a) Flow-through mode: In this mode, data passes between the local bus and the

supervisor engine bus. This mode is used for traffic to or from modules that are not

fabric-enabled.

b) Truncated mode: Only truncated data (the first 64 bytes of the frame) goes over the

switch fabric channel if both the destination and the source are fabric-enabled modules. If

either the source or destination is not a fabric-enabled module, the data goes through the

switch fabric channel and the data bus. The Switch Fabric Module does not get involved

when traffic is forwarded between modules that are not fabric-enabled.

c) Compact mode: A compact version of the DBus header is forwarded over the switch

fabric channel, which delivers the best possible switching rate. Modules that are not

fabric-enabled do not support the compact mode and generate cyclic redundancy check

(CRC) errors upon receipt of frames in compact mode. This mode is used only when no

such modules are installed in the chassis.
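
One simplified way to read the three descriptions above is as a decision on two questions: are both endpoints fabric-enabled, and is any module in the chassis not fabric-enabled? The following Python sketch encodes that reading. It is an interpretation of the text, not the switch's actual selection logic, and the function name is hypothetical.

```python
def forwarding_mode(src_fabric_enabled, dst_fabric_enabled,
                    chassis_has_nonfabric_module):
    """Pick a forwarding mode under a simplified reading of the descriptions.

    - flow-through: traffic to or from a module that is not fabric-enabled
    - truncated:    both endpoints fabric-enabled, but the chassis is mixed
    - compact:      no non-fabric-enabled module is installed in the chassis
    """
    if not (src_fabric_enabled and dst_fabric_enabled):
        return "flow-through"
    if chassis_has_nonfabric_module:
        return "truncated"
    return "compact"


if __name__ == "__main__":
    print(forwarding_mode(True, True, False))   # compact
    print(forwarding_mode(True, True, True))    # truncated
    print(forwarding_mode(True, False, True))   # flow-through
```

None of the three outcomes changes how fast a receiving port can drain its buffer, which is why the mode in use does not address overruns.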

Let's look at what an overrun is:

Overrun: The number of times the receiver hardware was unable to hand received data to a hardware buffer.

Common cause: The input rate of traffic exceeded the ability of the receiver to handle the data.

In the given example, the module used is the WS-X6548-GE-TX:

This module is 8:1 oversubscribed, and in this example its ports connect to servers. On these modules, a single 1-Gigabit Ethernet uplink from the port ASIC supports eight ports. The cards share a 1 Mb buffer among each group of eight ports (1-8, 9-16, 17-24, 25-32, 33-40, and 41-48). Because each block of eight ports is 8:1 oversubscribed, the aggregate throughput of each block cannot exceed 1 Gbps. These line cards are oversubscription cards designed to extend gigabit to the desktop and might not be ideal for server farm connectivity. For more information, refer to Troubleshooting Switch Port and Interface Problems.

To resolve the overruns, move the high-volume servers to ports in different ASIC groups, so that the traffic through the eight ports of each ASIC group does not exceed 1 Gbps. Alternatively, consider line cards that have a better oversubscription ratio.
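
The port grouping above can be checked with a short calculation. The following Python sketch maps a port number to its group of eight and sums per-port traffic rates per group against the 1 Gbps limit. The per-port rates in the usage example are invented for illustration; only the grouping and the 1 Gbps figure come from the text, and the function names are hypothetical.

```python
PORTS_PER_GROUP = 8        # eight ports share one port ASIC uplink
GROUP_LIMIT_MBPS = 1000    # 1 Gbps aggregate per group, from the text
PORT_COUNT = 48


def port_group(port):
    """Return the (first, last) port numbers of the group a port belongs to."""
    if not 1 <= port <= PORT_COUNT:
        raise ValueError("port must be between 1 and 48")
    first = ((port - 1) // PORTS_PER_GROUP) * PORTS_PER_GROUP + 1
    return (first, first + PORTS_PER_GROUP - 1)


def overloaded_groups(port_rates_mbps):
    """Return {group: total_mbps} for groups whose summed rate exceeds the limit."""
    totals = {}
    for port, rate in port_rates_mbps.items():
        group = port_group(port)
        totals[group] = totals.get(group, 0) + rate
    return {group: total for group, total in totals.items()
            if total > GROUP_LIMIT_MBPS}


if __name__ == "__main__":
    # The four ports with overruns in the example fall into two groups.
    print(port_group(1), port_group(6), port_group(9), port_group(12))
    # Hypothetical offered load per port, in Mbps.
    rates = {1: 600, 6: 500, 9: 300, 12: 400}
    print(overloaded_groups(rates))   # {(1, 8): 1100}
```

In this made-up load, moving the server on port 6 to a port in an idle group such as 17-24 would bring group 1-8 back under 1 Gbps.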

Source: https://supportforums.cisco.com/document/71201/switch-fabric-troubleshooting-tips
