EtherNetIP System Troubleshooting


  • 8/3/2019 EtherNetIP System Troubleshooting

    1/13

    EtherNet/IP and Logix troubleshooting

    1 of 13

    Application Note: EtherNet/IP System Troubleshooting (A59533568)

    Revision history:

    8 December, 2003
    - Duplicate IP Address: Added comment that work is in progress within ODVA EtherNet/IP to examine a standard mechanism to detect and defend against duplicate addresses.
    - Network trace: Removed reference to vendor software products.
    - UDP Statistics: Clarified enhancement request.
    - Changed disposition of this document from Internal to Global access.

    2 October, 2003
    - Initial release

    Purpose of Application Note

    When troubleshooting any EtherNet/IP system, you must follow a logical order. The order for each troubleshooting case depends on the details of that case. The purpose of this document is to list troubleshooting steps in order of priority.

    This paper should make you aware of two things:

    1. What/where should I look?
    - slow PC or slow application running on the PC
    - node configuration (IP address, etc.)
    - congested network (lots of traffic such as broadcast)
    - slow network (satellite or frame relay)
    - misconfigured switch or router
    - Logix controller resources
      - controller processing capability (5550, 5555, 5563)


      - timeslice for communications
      - cached message queue (32 max)
      - unconnected outgoing buffers (40 max)
      - etc.
    - insufficient processing capability in an ENBT module
    - duplicate IP addresses
    - defective Ethernet network hardware (e.g. cable, switch port, or ENBT module)
    - web server diagnostics or RSLinx diagnostics

    2. When all logical troubleshooting in step 1 above is not helpful, consider noise.


    Scope

    This paper will result in one of the following:
    - Problem identified and solution implemented
    - Error conditions identified for report to Rockwell Tech Support

    Additionally, there are many possible troubleshooting scenarios. In general, there are three types of problems:

    It does not work at all
    - Example: An I/O node is not connected to a switch (missing cable).
    - Example: Cannot ping a node.
    - Example: All MSG instructions to a specific 1756-ENBT fail.

    It works, but too slow
    - Example: A resource (PC, controller, 1756-ENBT) in the system is overloaded.

    It works, but fails intermittently
    - Example: The CLGX outgoing unconnected message buffer is being exceeded.
    - Example: Noise is causing an I/O connection to be lost.

    The steps below will help solve any of these problems, but to keep this document short, they do not detail individual troubleshooting possibilities.

    The detailed steps below can be used as follows:

    It does not work at all
    - Example: See Ping, Physical Layer

    It works, but too slow
    - Examples: See Logix Controller System Overhead, Module Device Capacity, I/O or Produce/Consume Tags, Rockwell Ethernet NIC, Logix Controller outgoing unconnected message buffer, etc.

    It works, but fails intermittently
    - Examples: See Switch configuration, I/O or Produce/Consume Tags, Logix Controller unconnected message buffer, etc.


    Where should I look?

    The list below gives an order to troubleshooting. Start with Ping and work your way down. Skip any steps that you know are not necessary.

    Ping

    If you cannot get a reply using Ping, note the message that is returned.
    Examples:
    - Request timed out could mean a number of things, including that the target is powered down.
    - Unknown host means the specified IP address is bad, e.g. 255.255.255.255.
    - Destination host is not reachable could mean a number of things, including a bad cable.

    Look for:
    - AC power not applied
    - a missing or defective cable (a clue would be that the Link light is off or intermittent)
    - you forgot to configure the module
    - you forgot to completely configure the target node, including subnet mask and gateway
      Example: attempting to ping a module on a different subnet when the subnet mask or the gateway address is set incorrectly.
    - on some switches (e.g. Cisco 3550), port mirroring disables pinging (on the mirror-to port)
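The subnet mask and gateway item in the list above can be sanity-checked offline. The following is a minimal sketch using Python's standard `ipaddress` module; the function name `needs_gateway` and the sample addresses are illustrative assumptions, not part of any Rockwell tool.

```python
import ipaddress


def needs_gateway(source_ip: str, subnet_mask: str, target_ip: str) -> bool:
    """Return True if target_ip lies outside the source node's subnet.

    When this is True, the source can only reach the target through its
    configured gateway, so a wrong mask or gateway address can make ping fail.
    """
    # strict=False allows a host address (not only a network address) here.
    network = ipaddress.ip_network(f"{source_ip}/{subnet_mask}", strict=False)
    return ipaddress.ip_address(target_ip) not in network
```

For example, with a source of 10.88.76.96 and a mask of 255.255.255.0, a target of 10.88.77.10 is on a different subnet, so the ping depends on a correct gateway address.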

    If replies are intermittent, ping continuously and see how much the reply time deviates. If the jitter is more than 10ms or you skip a reply:
    - something is busy (network or NIC)
      However, a busy 1756-ENBT probably won't be the problem. From measurements, a 1756-ENBT running at 100% CPU Utilization replies in the range 10-16ms. Note that if you find a heavily loaded interface, reduce the load to, let's say, not more than 90% to allow for some margin.
    - the network is long (satellite or Frame relay)
    - noise is corrupting packets and they are being dropped

    Example: ping -t 130.130.130.1 (this will ping continuously)
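The jitter rule above can be applied to a list of reply times copied from the ping output. This is only a sketch: the note does not define how deviation is measured, so the code assumes jitter is the spread between the slowest and fastest reply. The function name `assess_ping` is hypothetical, and a lost reply is represented as `None`.

```python
from typing import Optional, Sequence


def assess_ping(rtts_ms: Sequence[Optional[float]],
                jitter_limit_ms: float = 10.0) -> dict:
    """Summarize a series of ping reply times given in milliseconds.

    Assumption: jitter = slowest reply - fastest reply.
    A None entry stands for a request that received no reply.
    """
    replies = [r for r in rtts_ms if r is not None]
    missed = len(rtts_ms) - len(replies)
    jitter = max(replies) - min(replies) if replies else None
    # Suspect if any reply was skipped or the spread exceeds the limit.
    suspect = missed > 0 or (jitter is not None and jitter > jitter_limit_ms)
    return {"missed": missed, "jitter_ms": jitter, "suspect": suspect}
```

A result with `suspect` set to true only says that one of the causes listed above is worth investigating; it does not say which one.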

    If you can ping successfully, but the problem is not solved, continue on with the steps below. For help with the Ping command, just enter Ping from a cmd screen (DOS screen).

    You could also use RSWho to test connectivity. However, ping is simpler to use and is faster.

    Bad Hardware

    If communications are consistently bad, replace suspect hardware to isolate the trouble area.

    Examples:
    - Cable
    - Rockwell Ethernet interface (e.g. 1756-ENBT)
    - Switch port


    The problem may be old firmware or old hardware. Record hardware and firmware versions and report to Technical Support for the appropriate vendor.

    Switch configuration, Autonegotiation or hard-configuration

    The autonegotiation specification (in the 802.3 standard) allows for interpretation by developers. The result is that every vendor's Autonegotiation firmware works nearly the same but not exactly. If one node is configured for half-duplex and the other for full-duplex, communications will be lost at random and possibly frequently.

    To see Rockwell duplex/speed status, see Rockwell web server diagnostics, Class 1 Packet Statistics. Verify that the status reported here matches the switch configuration.

    Example: If your switch is configured for Autonegotiation, the Rockwell web server page should indicate Auto Negotiated speed and duplex.

    If you are running out of troubleshooting ideas, hard configure the speed and duplex on the switch ports and also on all RA nodes. This will eliminate one more variable.

    As of RSLogix version 12, you can hard configure speed and duplex. As of RSLinx version 2.41 (build 10), this feature is not yet supported but has been requested.

    I/O or Produce/Consume Tags (class 1 messaging)

    Look at Missed Frames in the web browser Diagnostics (see detailed web server description below). This parameter is only for I/O or produce tag messaging.

    Although some applications may run OK when losing some frames, you should strive for a system with zero (0) dropped frames.

    Furthermore, if you are dropping at least 4 consecutive frames, you might be dropping a CIP connection. To clarify, if you are dropping connections, the Missed Frames counter will definitely be incrementing. If you are not dropping connections, it may still be incrementing if your system is not as stable as possible.

    Viewing Missed Frames will give you something numerical to help quantify a problem. Note that yellow triangles in the RSLogix5000 I/O Configuration tree will not be seen if a connection is lost and recovered quickly enough. However, the Missed Frames counter will see everything, even one missed frame. This counter is excellent for diagnostics because of its high resolution.
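As a rough illustration of the four-consecutive-frames rule of thumb above, the sketch below scans a per-frame record for the longest run of missed frames. How such a record is obtained (for example from a network trace) is left open, and the function names are hypothetical.

```python
from typing import Iterable


def longest_missed_run(received: Iterable[bool]) -> int:
    """Return the longest run of consecutive missed frames.

    Each element is True if the expected frame arrived, False if it was missed.
    """
    longest = current = 0
    for ok in received:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest


def connection_at_risk(received: Iterable[bool], threshold: int = 4) -> bool:
    """True if at least `threshold` consecutive frames were missed."""
    return longest_missed_run(received) >= threshold
```

Isolated misses leave `connection_at_risk` false but still count against the goal of zero dropped frames.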

    EtherNet/IP Module Device Capacity

    Use the web server to verify that CPU utilization on the Ethernet NIC is less than 100%. If utilization is at 100%, this may be the problem. To reduce the utilization:
    - make I/O RPI values larger (slower)
    - reduce the number of I/O connections
    - make non-critical traffic less frequent (e.g. MSGs and HMI)
    - add another EtherNet/IP module and divide the traffic load


    Logix Controller outgoing unconnected message buffer

    ControlLogix has a limit of 10 outgoing unconnected buffers. As of version 8, this can be increased to 40. See KnowBase document for details. These buffers are required for all messaging, explicit and implicit (for establishing a connection).

    If the controller tries to exceed this limit, the request will fail. For example, if you try to initiate 50 MSG instructions simultaneously, those in excess of the buffer size will fail. See KnowBase document G20181 for information on how to read unconnected outgoing buffers:
    - attribute 17 is reserve (unused)
    - attribute 18 is high-water mark
    - attribute 19 is buffers currently in use

    Use RSLogix5000 version 12 to read the above values reliably.
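The buffer arithmetic above can be sketched as follows. This is a simplified illustration that ignores any other traffic sharing the same buffers; both function names are hypothetical, and the high-water mark is assumed to have been read from attribute 18 as described above.

```python
def msgs_in_excess(simultaneous_msgs: int, buffer_limit: int = 10) -> int:
    """Number of simultaneously triggered MSGs that exceed the buffer limit."""
    return max(0, simultaneous_msgs - buffer_limit)


def buffers_exhausted(high_water_mark: int, buffer_limit: int) -> bool:
    """True if the high-water mark has reached the configured limit."""
    return high_water_mark >= buffer_limit
```

With the 50 simultaneous MSG instructions from the example, 40 would be in excess of the 10-buffer limit and 10 would still be in excess after raising the limit to 40.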

    Logix Controller System Overhead

    Add more time for communications by increasing the continuous task timeslice, or run the higher priority tasks (e.g. Periodic tasks) less frequently or at a lower priority. The default timeslice is 10%. Increase it to 30-50% to see the effect.

    Slow PC Application

    If you think the customer's application might be running slow, there are two possibilities:
    - the PC is not powerful enough
    - the application runs slow (or accesses controller data inefficiently)

    For either case, look at the CPU utilization in the Task Manager to see how close to 100% it is.

    Another approach would be to stop the application and use a simple application, the OPC test client that comes with RSLinx, to access all the data you need. Configure the topic poll rate for 1ms to make it go as fast as the Rockwell controller(s) will go. If you can achieve sufficient throughput using this approach, you hopefully will have convinced the customer that the problem is the application (or that the PC is not powerful enough).

    Duplicate IP Address

    If two Rockwell nodes are configured with the same IP address, the last one to be configured will steal the IP address. Detection of this can be simple or difficult.

    Simple:
    In the I/O tree, a 1794-AENT is configured and operating nicely. However, a 1756-ENBT is then accidentally configured for the same address. The result would be that the Logix controller would declare that the connection to the AENT is lost.

    Difficult:
    Messages (MSG instruction) from one CLGX to another CLGX are occurring OK. Then, after a third device is configured, the MSGs are failing. If you ping the IP address, it will ping OK. If the third device is of the same type (e.g. 1756-ENBT) but does not have the desired tag, even RSWho will show good connectivity but the MSG will fail.


    However, work is in progress within ODVA EtherNet/IP to examine a standard mechanism to detect and defend against duplicate addresses.
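Until a standard mechanism exists, one common way to spot a duplicate is to look for an IP address that is answered by more than one MAC address. The sketch below assumes you have already collected (IP, MAC) pairs, for example from ARP entries in a network trace. The function name and any sample addresses are hypothetical, and this is not the ODVA mechanism mentioned above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def find_duplicate_ips(
    observations: Iterable[Tuple[str, str]]
) -> Dict[str, Set[str]]:
    """Return the IP addresses that were seen with more than one MAC address.

    observations: (ip_address, mac_address) pairs gathered by the user.
    """
    macs_by_ip: Dict[str, Set[str]] = defaultdict(set)
    for ip, mac in observations:
        # Normalize case so the same MAC is not counted twice.
        macs_by_ip[ip].add(mac.lower())
    return {ip: macs for ip, macs in macs_by_ip.items() if len(macs) > 1}
```

An empty result does not prove that no duplicate exists; it only means none appeared in the collected observations.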

    Network trace

    If you have not solved the problem by now, we need to see what is happening on the network. Take a trace and forward it to Tech Support for analysis. Make sure that the trace has the problem in it.

    Without waiting for an analysis of the trace, start looking at the physical layer (see below).

    Noise or Intermittent Defective Hardware

    All of the above steps are logical. If the above steps don't solve the problem, noise or bad hardware is the likely problem. Intermittent communications are most likely caused by one of the following.

    Ethernet cable placement
    Example: Visually inspect for cable placement next to 480VAC.

    Noise/grounding
    Example: Physically detach an intermittent chassis from the enclosure and see how it operates.

    Intermittent hardware
    Focus on a communications problem between 2 nodes and try the following:
    - Replace a Rockwell Ethernet interface.
    - Move the cat5 cable (from a Rockwell node) to a different switch port.
    - Replace an Ethernet cable.


    Web Server Description

    From the Rockwell web server home page, the following are parameters that have proven useful when troubleshooting a system on one of the following modules:
    - 1756-ENBT
    - 1788-ENBT
    - 1794-AENT
    - 1769-L35E

    Other Rockwell EtherNet/IP products have a different look to them at this time. However, there is a migration plan for uniformity across all of our products.

    In the Address field of Internet Explorer or Netscape, enter the IP address of an Ethernet interface module.

    Example: 10.88.76.96

    You will see the module's web server home page (screenshot not reproduced here).

    Of all the Rockwell Ethernet modules that you may have (CLGX, Flex I/O, etc.), the Ethernet interface(s) within the controller chassis is where you want to start troubleshooting, since it is probably the busiest.

    Up to this time, most requests for troubleshooting involved I/O and produce tags. The diagnostics most useful for I/O and produce tags are marked with an asterisk ( * ) below.

    Report all errors, timeouts, etc. to Rockwell Automation Technical Support.


    How much is too much?

    The answer to the question, How many errors of type X are bad?, is application dependent. For example, if you have a single bad UDP checksum (caused by electrical noise) every 100 packets, that packet will be discarded. One customer may say this is not a problem because his production line is running fine. However, another customer may say that this is unacceptable.

    Link name: Module Information
    This page is self-descriptive. Firmware revision and module uptime are important.

    Link name: TCP/IP Configuration
    This page is self-descriptive and useful.

    Link name: Chassis Who
    This page is self-descriptive and useful.

    Link name: Diagnostic Information


    Backplane Statistics
    Identifies backplane errors. Report timeouts or errors to Rockwell Technical Support.

    Connection Manager Statistics
    Identify if any Rejects or Timeouts are incrementing.

    Note: you can get the same info from RSLinx by right clicking on the Ethernet module, selecting Module Statistics, and selecting Connection Manager.
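Whether a counter is incrementing is easiest to judge by noting its value twice, some time apart, and comparing. The sketch below assumes the values were copied by hand from the diagnostics page into two dictionaries; the function name and the counter names used as keys are hypothetical.

```python
from typing import Dict


def incrementing_counters(before: Dict[str, int],
                          after: Dict[str, int]) -> Dict[str, int]:
    """Return the counters that grew between two snapshots, with their increase.

    A counter missing from the first snapshot is treated as starting at 0.
    """
    return {
        name: value - before.get(name, 0)
        for name, value in after.items()
        if value > before.get(name, 0)
    }
```

Counters that grew between the two snapshots are the ones to report to Technical Support along with the time between the snapshots.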

    Link name: Ethernet Statistics
    - Input errors
    - Output errors

    Link name: TCP Statistics
    - Connection requests
      These are outgoing from the controller through an ENBT.
    - Connection accepts
      These are incoming from the wire through an ENBT to a controller. These will increment while you are online with a web browser.
    - Discards
      These are bad packets that have been discarded.

    Link name: UDP Statistics

    At this time, this screen will increment only if other devices are sending non-CIP UDP packets to this module. At this time, no devices send non-CIP UDP packets to this module.

    From testing with a produced tag (RPI=10ms), the total UDP packets and input UDP packets do increment (on the company network), but only at a rate of 1-3 every 10-30 seconds. With an RPI of 10ms, the produce tag rate is 200 packets per second. The conclusion is that there is no relationship between CIP packets and UDP statistics. Without connecting a Sniffer to investigate, the assumption is that someone in the building is sending multicast to all stations, including the ENBT module under test.

    Also, the addition of CIP UDP checksum errors has formally been requested.

    Link name: Encapsulation Statistics

    Shows cumulative and active in/out TCP connections used for encapsulation (CIP) sessions. The TCP statistics shown here are for all TCP connections (CIP, HTTP, telnet, etc.).

    Link name: Enet/IP (CIP) Statistics

    Active Class 1 Transports provides the number of transports. In general, two (2) class 1 transports equate to a connection. Use this number to verify against your calculated class 1 total.


    Class 3 transport information is supplied here, including detailed client (outgoing) and server (incoming) information.

    Unconnected message information is also provided here. The UCMM Worst Backlog (Client) can be used to see the unconnected message high-water mark for messages to legacy PLCs. If this is 10 and you have the Logix processor configured for a maximum of 10, this would be a sign that you may be trying to exceed the controller's limit.

    Link name: Class 1 (CIP) Packet Statistics
    - *Link Status (including negotiation description)
    - *Speed
    - *Duplex
    - *Method for selecting duplex and speed (e.g. Autonegotiation)
    - *CPU Utilization Percentage (includes processing for everything on the module)
    - Current TCP connections (these are for all connections, class 1 and class 3)
      Includes actual connections and ones being built but not yet complete.
    - Current incoming TCP connections (these are for all connections, class 1 and class 3)
    - Current outgoing TCP connections (these are for all connections, class 1 and class 3)
      Includes actual connections and ones being built but not yet complete.
    - *Actual class 1 packets per second (for I/O and produce tag only)
      Compare your calculated value to this number.
    - Reserve Class 1 capacity is how much is unused.
    - *Total Missed Class 1 Packets (for I/O and produce tag only)
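A calculated packets-per-second figure to compare against the Actual class 1 value can be estimated from the configured RPIs. The sketch below assumes each connection exchanges two packets per RPI (one in each direction), which matches the 200 packets per second quoted for a 10ms produced tag in the UDP Statistics discussion. That factor and the function name are assumptions, so treat the result as an estimate.

```python
from typing import Iterable


def expected_class1_pps(rpis_ms: Iterable[float],
                        packets_per_rpi: int = 2) -> float:
    """Estimate total class 1 packets per second for a set of connections.

    rpis_ms: one RPI value in milliseconds per connection.
    packets_per_rpi: assumed packets exchanged per RPI per connection.
    """
    return sum(packets_per_rpi * 1000.0 / rpi for rpi in rpis_ms)
```

A measured value well below the estimate suggests frames are not being sent or received at the configured rate.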

    Link name: *Class 1 (CIP) Active Transports

    You should see only the RPIs you configured.
    Example: If all your configured RPIs are 50ms, you should see only 50ms API.
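The check above amounts to a set comparison between the RPIs you configured and the API values listed on the page. A minimal sketch, assuming both lists were entered by hand in milliseconds (the function name is hypothetical):

```python
from typing import Iterable, List


def unexpected_apis(configured_rpis_ms: Iterable[float],
                    observed_apis_ms: Iterable[float]) -> List[float]:
    """Return observed API values that match no configured RPI, sorted."""
    return sorted(set(observed_apis_ms) - set(configured_rpis_ms))
```

Any value returned points at a connection whose rate differs from what you intended to configure.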

    Link name: Class 3 (CIP) Active Transports

    For explicit messaging, transports are the same as connections.

    Examine the remote addresses. Verify that these are correct for your system.

    Examine the number of Class 3 transports. The number of transports expected depends on what you are doing.

    Example: RSLogix5000 opens 1 CIP connection.

    Example: A PvPlus can use 1 or more, depending on the volume of tags on scan. With 488 tags on scan (120 integers, 120 dints, 128 reals, 128 bools), a PvPlus (actually RSLinx Enterprise) opened three transports.


    RSLinx Diagnostics

    From RSLinx, in RSWho, you can right click, select Module Statistics, and select the tabs/links listed below.

    Link name: General
    This tab is self-descriptive.

    Link name: Port Diagnostics
    Most of this information, and more, can also be found in the web server in 3 places:
    - Diagnostics Ethernet Statistics
    - Diagnostics TCP Statistics
    - Diagnostics IP Statistics

    For the most part, the amount of information in the web server is greater but requires that you look in 3 different places to see everything. Additionally, RSLinx Port Diagnostics does show some values (e.g. alignment errors) that are not seen in the web server.

    The recommendation is that you look at RSLinx port diagnostics and note any errors.

    Link name: Connection Manager
    Same as Connection Manager in web server.

    Link name: Backplane
    Same as Backplane stats in web server.


    References:

    1. Noise
    - EtherNet/IP Media Planning and Installation Manual, Publication ENET-IN001A-EN-P
    - Industrial Automation Wiring and Grounding Guidelines, 1770-4.1
    - GMC-RM001, www.ab.com/manuals/gmc/GMC-RM001A-EN-P-JUL01.pdf

    2. System Planning and module capacities
    - EtherNet/IP Performance and Application, Pub ENET-AP001C-EN-P