HUAWEI SE2900 Session Border Controller
Alarm Description
Issue 01
Date 2016-01-15
Huawei Technologies Co., Ltd.
Copyright © Huawei Technologies Co., Ltd. 2016. All rights reserved.
No part of this document may be reproduced or transmitted in any form or by any means without prior
written consent of Huawei Technologies Co., Ltd.
Trademarks and Permissions
The Huawei logo and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of their respective
holders.
Notice
The purchased products, services and features are stipulated by the contract made between Huawei and
the customer. All or part of the products, services and features described in this document may not be
within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements,
information, and recommendations in this document are provided "AS IS" without warranties, guarantees or
representations of any kind, either express or implied.
The information in this document is subject to change without notice. Every effort has been made in the
preparation of this document to ensure accuracy of the contents, but all statements, information, and
recommendations in this document do not constitute a warranty of any kind, express or implied.
Huawei Technologies Co., Ltd.
Address: Huawei Industrial Base
Bantian, Longgang
Shenzhen 518129
People's Republic of China
Website: http://www.huawei.com
Email: [email protected]
ALM-3009 Board Temperature Exceeds Level 1 Threshold
Description
This alarm is generated when the temperature of a board drops to the lower threshold (Lower Critical) or rises to
the upper threshold (Upper Critical).
This alarm is cleared when the temperature of the board falls into the range between the lower threshold (Lower
Critical) and the upper threshold (Upper Critical).
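The raise and clear conditions above can be modeled as a simple range check. The following Python sketch is illustrative only and is not Huawei's implementation: the function name and the threshold values in the usage line are hypothetical, and reaching a threshold is assumed to count as crossing it, as the wording "drops to" and "rises to" suggests.

```python
from typing import Optional


def temperature_crossing(temp_c: float, lower_critical: float,
                         upper_critical: float) -> Optional[str]:
    """Return the threshold crossing type for ALM-3009, or None when the
    temperature lies strictly between the two thresholds (alarm cleared).

    Assumption: a temperature equal to a threshold raises the alarm.
    """
    if temp_c <= lower_critical:
        return "lower"   # dropped to or below Lower Critical
    if temp_c >= upper_critical:
        return "upper"   # rose to or above Upper Critical
    return None          # in range: the alarm is cleared automatically


# Hypothetical thresholds, for illustration only.
print(temperature_crossing(72.0, lower_critical=-5.0, upper_critical=70.0))
```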
Attribute
Alarm ID Alarm Severity Auto Clear
3009 Minor Yes
Parameters
Name Meaning
Rack number This parameter identifies a rack.
Position number This parameter indicates the position of the subrack housing the board in a rack.
Value:
2: Upper
1: Middle
0: Lower
Subrack number This parameter identifies a subrack.
Slot number This parameter specifies the number of the slot that houses the board.
Location This parameter specifies the location of a board.
Value:
Front
Back
Front GPC
Front XMC
Threshold crossing type This parameter specifies whether the current value exceeds the upper threshold or falls
below the lower threshold.
Impact on the System
When the board temperature is lower than the lower threshold (Lower Critical), the performance of the system
hardware may be degraded; when it is higher than the upper threshold (Upper Critical), the system hardware may
be damaged. In either case, the system may not run properly or stably, and services on the system may be
interrupted.
Possible Causes
The temperature of the equipment room where the system is located is out of the normal range. You can
query the specified normal temperature range in Environment Specifications. To access the
Environment Specifications, choose Hardware Manual > Technical Specifications > Environment
Specifications.
The fan in the subrack that houses the board is not working properly.
Procedure
1. Based on the Subrack number alarm parameter, identify the subrack that houses the board whose
temperature is abnormal. Then, check whether ALM-2010 Fan Speed Exceeds Threshold or ALM-2135
Fan Fault is generated for a fan in the subrack.
Yes: Go to 2.
No: Go to 3.
2. Clear ALM-2010 Fan Speed Exceeds Threshold and ALM-2135 Fan Fault by following their handling procedures in
the Alarm Help. Then, check whether this alarm is cleared.
Yes: No further action is required.
No: Go to 3.
3. Check the temperature of the equipment room and ensure that the temperature falls into the normal range.
You can query the specified normal temperature range in Environment Specifications. To access the
Environment Specifications, choose Hardware Manual > Technical Specifications > Environment
Specifications. Then, check whether the alarm is cleared.
Yes: No further action is required.
No: Go to 4.
4. Check whether a vacant slot exists in the subrack. If so, install a filler panel in the slot to improve heat
dissipation. If there is no vacant slot, or filler panels are already installed, go to 6. After 3 to 5 minutes,
check whether the alarm is cleared.
Yes: No further action is required.
No: Go to 5.
5. Use the Information Collection Tool to collect information for the Extended information for fault analysis
scenario.
6. Contact Huawei technical support engineers.
ALM-5001 Optical Module Fault
Description
This alarm is generated when an optical module is absent, has a type mismatch, or encounters a hardware failure,
a diagnosis failure, or a transmission failure.
This alarm is cleared when the status of the optical module is normal.
Attribute
Alarm ID Alarm Severity Auto Clear
5001 Major Yes
Parameters
Name Meaning
Rack number This parameter specifies the rack where the faulty optical module is located.
Position number This parameter specifies the position of the subrack where the faulty optical module is located.
Subrack number This parameter specifies the subrack where the faulty optical module is located.
Slot number This parameter specifies the slot where the faulty optical module is located.
Location This parameter specifies the location of the board where the faulty optical module is located.
Value:
Front
Back
Front XMC
Port type This parameter identifies the type of the port on a board.
Port name This parameter identifies the name of the port where the faulty optical module is located. The name
matches the silkscreen label on the hardware.
Alarm cause This parameter specifies the cause value of this alarm:
Optical module is absent
Optical module self check fault
Optical module type mismatch
Optical module diagnosis abnormal
Optical module send fault
Optical module receive fault
Impact on the System
When an optical module becomes faulty, services on its ports may be interrupted, and a communication failure or a
module switchover may occur.
Possible Causes
The optical module is not inserted.
The optical module type is incorrect.
The network port connected to the optical fiber is faulty.
The optical fiber is faulty.
The optical module encounters a hardware failure.
The peer end is faulty.
Procedure
1. Based on the alarm parameters Subrack number, Slot number, and Port name, check whether the
faulty port is an external port on a multi-function switching board. The external ports on the
multi-function switching boards include Fabric/Lan0, Fabric/Lan1, Base/Port0 to Base/Port3, and
Fabric/Port0 to Fabric/Port7.
Yes: Go to 2.
No: Go to 4.
2. Based on the engineering design document and the alarm parameters Subrack number, Slot number,
and Port name, check whether the faulty port is in use.
Yes: Go to 4.
No: Go to 3.
3. In the MML Command - CGP window, run SET SWUPORTSTS to set Port config of the port
indicated by the alarm parameters Subrack number, Slot number, and Port name to Unused. Then,
check whether this alarm is cleared.
Yes: No further action is required.
No: Go to 23.
4. Rectify the fault by performing the following steps based on the Alarm cause in the alarm.
Optical module is absent: Go to 5.
Optical module self check fault: Go to 19.
Optical module type mismatch: Go to 11.
Optical module diagnosis abnormal: Go to 8.
Optical module send fault: Go to 19.
Optical module receive fault: Go to 9.
5. Based on the alarm parameters Subrack number, Slot number, Port type, and Port name, locate the
faulty optical module. Then, check whether the optical module needs to be inserted.
Yes: Go to 6.
No: Go to 23.
6. Check whether the optical module is inserted.
Yes: Go to 19.
No: Go to 7.
7. Insert an optical module of the correct type and check whether this alarm is cleared.
Yes: No further action is required.
No: Go to 19.
8. Based on the alarm parameters Subrack number, Slot number, Port type, and Port name, run DSP
OTR to get the Diagnosis alarm state of the faulty optical module. Rectify the fault by performing the
following steps based on the Diagnosis alarm state in the report.
Normal: Go to 23.
Input power is too high or Input power is too low: Go to 9.
Voltage is too high or Voltage is too low: Go to 17.
Other: Go to 19.
9. Ask the maintenance engineers at the peer end to check whether the peer end is faulty.
Yes: Go to 10.
No: Go to 17.
10. Check whether this alarm is cleared when the peer end returns to normal.
Yes: No further action is required.
No: Go to 17.
11. In the MML Command - CGP window, run LST PORT with Subrack number, Slot number, Port
type, and Port name specified based on the alarm information to query the work mode, and check
whether the Work mode meets the actual configuration requirement.
Yes: Go to 14.
No: Go to 12.
12. Check whether the Location in the alarm information is Front XMC.
Yes: Go to 24.
No: Go to 13.
13. In the MML Command - CGP window, run MOD PORT to modify the work mode and then check
whether the alarm is cleared.
Yes: No further action is required.
No: Go to 14.
14. In the MML Command - CGP window, run LST PORT with Subrack number, Slot number, Port
type, and Port name specified based on the alarm information to query the work mode. In the same
window, run DSP OTR to query the speed mode. Check whether the work mode is consistent with the
speed mode.
Yes: Go to 16.
No: Go to 15.
15. Replace the optical module with one of the correct type and check whether the alarm is cleared.
Yes: No further action is required.
No: Go to 16.
16. Based on the alarm parameters Subrack number, Slot number, Port type, and Port name, run DSP
OTR to query information about the faulty optical module. Compare this information with the label on the
module to check whether the module type is correct.
Yes: Go to 24.
No: Go to 22.
17. Based on the alarm parameters Subrack number, Slot number, Port type, and Port name, run DSP
OTR to check whether Distance meets the requirements.
NOTE:
The requirements for transmission distances of different optical modules are as follows:
Short distance: The transmission distance of the optical module is shorter than 2 km.
Middle distance: The transmission distance of the optical module ranges from 10 km to 40 km.
Long distance: The transmission distance of the optical module ranges from 40 km to 80 km. The
low-speed transmission distance of the optical module can reach up to 120 km.
Yes: Go to 19.
No: Go to 18.
18. Replace the optical module so that the transmission distance meets the requirements. Then, check whether
the alarm is cleared.
Yes: No further action is required.
No: Go to 19.
19. Clean the connector and reconnect the optical fiber. Then, check whether the alarm is cleared.
Yes: No further action is required.
No: Go to 20.
20. Check whether the optical fiber is damaged.
Yes: Go to 21.
No: Go to 22.
21. Replace the optical fiber with another optical fiber matching the optical module. Then, check whether this
alarm is cleared.
Yes: No further action is required.
No: Go to 22.
22. Replace the faulty optical module with one of the correct type. Then, check whether this alarm is
cleared.
Yes: No further action is required.
No: Go to 24.
23. Clear the alarm manually. The handling procedure is complete.
24. Contact Huawei technical support engineers.
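Steps 4 and 17 are essentially table lookups. The Python sketch below encodes them under stated assumptions: the dictionary and function names are hypothetical, Alarm cause strings are assumed to match the values listed under Parameters exactly, and the module's distance class (short, middle, or long) is assumed to be known, for example from its label or the DSP OTR output. The NOTE in step 17 is followed as written; the 120 km low-speed case is not modeled.

```python
# Step 4: first handling step for each Alarm cause value.
CAUSE_TO_STEP = {
    "Optical module is absent": 5,
    "Optical module self check fault": 19,
    "Optical module type mismatch": 11,
    "Optical module diagnosis abnormal": 8,
    "Optical module send fault": 19,
    "Optical module receive fault": 9,
}


def first_step(alarm_cause: str) -> int:
    """Return the procedure step to start from for a given Alarm cause."""
    return CAUSE_TO_STEP[alarm_cause]


def distance_meets_requirement(distance_class: str, link_km: float) -> bool:
    """Step 17 check: does the link length fit the module's distance class?

    Assumptions: the 10-40 km and 40-80 km bounds are inclusive, and the
    up-to-120 km low-speed case for long-distance modules is not modeled.
    """
    if distance_class == "short":
        return link_km < 2.0            # shorter than 2 km
    ranges = {"middle": (10.0, 40.0), "long": (40.0, 80.0)}
    low, high = ranges[distance_class]
    return low <= link_km <= high


print(first_step("Optical module receive fault"))   # 9
print(distance_meets_requirement("middle", 25.0))    # True
```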
ALM-4766 Account Locked
Description
This alarm is generated when the number of incorrect passwords consecutively entered on the CGP client is equal
to or greater than the value of Number of login attempts before lockout. Number of login attempts before
lockout is set by running SET SECPOLICY, and the default value is 5.
This alarm is cleared when the locked account is unlocked.
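The lockout and unlock behavior described here and in the procedure for this alarm can be sketched as a small state machine. This is a hypothetical model, not the CGP implementation: the class and field names are illustrative, the lockout duration is assumed to be in seconds, a successful login is assumed to reset the count of consecutive failures, and a duration of 0 is taken to mean that the account stays locked until it is unlocked manually (ULK USER).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccountLockout:
    max_attempts: int = 5              # Number of login attempts before lockout
    lockout_duration_s: float = 0.0    # 0: manual unlock only (assumption)
    failures: int = 0
    locked_at: Optional[float] = None

    def is_locked(self, now: float) -> bool:
        """Check the lock, unlocking automatically once the duration elapses."""
        if self.locked_at is None:
            return False
        if self.lockout_duration_s > 0 and now - self.locked_at >= self.lockout_duration_s:
            self.unlock()              # automatic unlock clears ALM-4766
            return False
        return True

    def record_login(self, success: bool, now: float) -> bool:
        """Record one login attempt; return True if it raises ALM-4766."""
        if self.is_locked(now):
            return False               # locked accounts cannot log in
        if success:
            self.failures = 0          # consecutive count restarts (assumption)
            return False
        self.failures += 1
        if self.failures >= self.max_attempts:
            self.locked_at = now
            return True
        return False

    def unlock(self) -> None:
        """Manual or automatic unlock; the alarm clears."""
        self.failures = 0
        self.locked_at = None


acct = AccountLockout(lockout_duration_s=600)
print([acct.record_login(False, now=t) for t in range(5)])
```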
Attribute
Alarm ID Alarm Severity Auto Clear
4766 Warning Yes
Parameters
Name Meaning
Account name Specifies the name of the locked account.
Impact on the System
The user whose account is locked cannot log in to the CGP client to perform operations on the OMU.
Possible Causes
An account lockout policy is enabled, and the number of consecutive login failures reaches the value of Number
of login attempts before lockout set by running SET SECPOLICY.
Procedure
1. Log in to the OMU as an administrator, and run LST SECPOLICY in the MML Command - CGP
window to check whether the Account lockout duration is greater than zero.
Yes: Go to 2.
No: Go to 3.
2. The account is automatically unlocked after the specified duration. No further action is required.
3. After the account is locked, identify the cause of the lockout immediately. If the lockout was caused by a
malicious user trying to guess the password, take appropriate security measures. Then, determine whether to
unlock the account immediately based on the actual conditions, for example, the cause of the lockout.
Yes: Go to 4.
No: No further action is required.
4. Unlock the locked account by running ULK USER in the MML Command - CGP window. Then, check
whether the unlocked account can log in properly and the alarm is cleared.
Yes: No further action is required.
No: Go to 5.
5. Contact Huawei technical support engineers.
ALM-3076 Board Voltage Exceeds Level 1 Threshold
Description
This alarm is generated in either of the following cases:
The voltage of a board reaches the lower threshold (LowerCritical) or the upper threshold
(UpperCritical).
The status of the logic voltage is abnormal.
This alarm is cleared when both of the following conditions are met:
The voltage of a board returns to a value between the lower threshold (LowerCritical) and the upper
threshold (UpperCritical).
The status of the logic voltage becomes normal.
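The raise condition is an either/or check, while clearing requires both conditions to hold. A hypothetical Python sketch of that logic follows; the function name and the voltages in the usage line are illustrative, the returned strings mirror the Threshold crossing type values of this alarm, and the precedence of a threshold crossing over an abnormal logic voltage is an assumption.

```python
from typing import Optional


def voltage_alarm_cause(voltage_v: float, lower_critical: float,
                        upper_critical: float,
                        logic_voltage_ok: bool) -> Optional[str]:
    """Return a Threshold crossing type for ALM-3076, or None if it clears.

    The alarm clears only when the voltage is strictly between the two
    thresholds AND the logic voltage status is normal.
    """
    if voltage_v <= lower_critical:
        return "Too low"
    if voltage_v >= upper_critical:
        return "Too high"
    if not logic_voltage_ok:
        return "The logic voltage is abnormal"
    return None


# Hypothetical values: voltage in range, but logic voltage abnormal.
print(voltage_alarm_cause(12.0, 10.8, 13.2, logic_voltage_ok=False))
```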
Attribute
Alarm ID Alarm Severity Auto Clear
3076 Minor Yes
Parameters
Name Meaning
Rack number Identifies a rack uniquely.
Position number Indicates the position of the subrack housing the faulty board in a rack, for which the
alarm is generated.
Value:
0: Lower
1: Middle
2: Upper
Subrack number Identifies the subrack that houses the faulty board for which the alarm is generated.
Slot number Indicates the slot number of the faulty board for which the alarm is generated.
Location Indicates the location of a board.
Value:
Front
Back
Front GPC
Front XMC
Threshold crossing type Indicates the status of the voltage level.
Value:
Too low
Too high
The logic voltage is abnormal
Impact on the System
Generally, this alarm is generated because the voltage of the power supply configured for the system is unstable.
If the voltage is lower than the lower threshold (LowerCritical), the board may be powered off.
If the voltage is higher than the upper threshold (UpperCritical), the board may be damaged. In either case,
the board may not run properly, and services may be interrupted.
If the status of the logic voltage is abnormal, board components may run incorrectly, and services may be
interrupted.
Possible Causes
The output voltage of the power supply module is abnormal.
The output voltage of the PDB is abnormal.
The backplane is faulty.
The internal power module of the board is faulty.
Procedure
1. In the MML Command - CGP window, run LST PDU to check whether there is a PDB on the subrack
identified by the alarm parameter Rack number.
Yes: Go to 2.
No: Go to 4.
2. In the Browse Alarms window of the CGP, check whether the ALM-4603 PDB Input Voltage Abnormal
alarm is generated for the subrack identified by the alarm parameters Rack number and Subrack
number.
Yes: Go to 3.
No: Go to 4.
3. Rectify the fault by following the handling procedure of ALM-4603 PDB Input Voltage Abnormal. Check
whether the alarm is cleared.
Yes: The alarm handling is complete.
No: Go to 4.
4. In the Browse Alarms window of the CGP, check whether the ALM-3112 Power Entry Module Voltage
Exceeds Level 1 Threshold or ALM-3113 Power Entry Module Voltage Exceeds Level 2 Threshold alarm
is generated for the subrack identified by the alarm parameters Rack number and Subrack number.
Yes: Go to 5.
No: Go to 6.
5. Rectify the fault by following the handling procedure of ALM-3112 Power Entry Module Voltage
Exceeds Level 1 Threshold or ALM-3113 Power Entry Module Voltage Exceeds Level 2 Threshold.
Check whether the alarm is cleared.
Yes: The alarm handling is complete.
No: Go to 6.
6. NOTICE:
Board replacement is an important operation. Any improper operation may lead to abnormal operation of the
system. Therefore, replace boards only when all the following conditions are met:
In case of an emergency, you can contact Huawei technical support engineers quickly.
Spare boards are available in the warehouse.
You are familiar with the board replacement procedure. For details, see Parts Replacement >
Overview of Parts Replacement in the product documentation.
Replace the multi-function board only when the traffic is light, for example, between midnight
and early morning. In addition, ensure that the other multi-function board is running properly
before replacing the multi-function board.
Replace the board identified by the alarm parameters Subrack number, Rack number, and Slot number. When
the board is running properly, check whether the alarm is cleared.
Yes: The alarm handling is complete.
No: Go to 7.
7. Use the Information Collection Tool to collect information for the Extended information for fault analysis
scenario.
8. Contact Huawei technical support engineers.
ALM-1005 Board CPU Overload
Description
This alarm is generated when the system detects that the CPU usage of a board exceeds the overload alarm
threshold, which can be set by users.
There are four CPU overload levels. When the CPU overload level rises, the system clears the lower-level CPU
overload alarm and generates a new CPU overload alarm. When the CPU overload level falls, the system clears
the higher-level CPU overload alarm and generates a new CPU overload alarm.
The mapping between the CPU overload level and the alarm severity is as follows:
Level one overload: warning
Level two overload: minor
Level three overload: major
Level four overload: critical
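In effect, a level change replaces the active alarm: the alarm for the old level is cleared and a new one is raised with the severity of the new level. The sketch below is a hypothetical Python model of that rule using the severity mapping above; the function name is illustrative, and treating level 0 as "no overload" (where the alarm simply clears) is an assumption not stated in this section.

```python
from typing import List, Tuple

SEVERITY_BY_LEVEL = {1: "warning", 2: "minor", 3: "major", 4: "critical"}


def level_change_events(old_level: int, new_level: int) -> List[Tuple[str, str]]:
    """Return the (action, severity) pairs emitted when the overload level
    changes. Level 0 means no overload (an assumption for this sketch)."""
    if new_level == old_level:
        return []
    events = []
    if old_level:
        events.append(("clear", SEVERITY_BY_LEVEL[old_level]))
    if new_level:
        events.append(("raise", SEVERITY_BY_LEVEL[new_level]))
    return events


print(level_change_events(1, 3))   # [('clear', 'warning'), ('raise', 'major')]
```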
Attribute
Alarm ID Alarm Severity Auto Clear
1005 Warning, minor, major, and critical Yes
Parameters
Name Meaning
Rack number This parameter uniquely identifies a rack. It must be set in ascending order from 0.
Position number This parameter specifies the position (upper, middle, or lower) of the subrack in the rack. 2:
Upper; 1: Middle; 0: Lower.
Subrack number This parameter uniquely identifies a subrack.
Slot number This parameter uniquely identifies a slot in the subrack.
Location This parameter specifies the location of the board.
Value:
Front
Back
Front GPC
Front XMC
CPU usage (%) This parameter specifies the average usage, as a percentage, of all CPUs on the board.
CPU overload level This parameter specifies the CPU overload level.
Impact on the System
If the CPU usage of a board continuously exceeds the critical alarm threshold and the CPU of the service module is
overloaded, the board cannot respond to commands in time. In this case, the service module performs flow control.
Possible Causes
The modules on the board process excessive services.
Unbalanced service data configurations on boards may overload a board.
Procedure
1. Identify the board on which CPU overload occurs based on the alarm parameters.
2. Determine whether the alarm is caused by heavy traffic rather than an unexpected data flood, as follows:
a. On the CGP client, run the DSP MODULE command on the MML command interface to
display the CPU usage of the modules.
b. Through the performance management system of the CGP client, query the performance
measurement data.
c. Compare the measurement result with the CPU usage displayed by running the DSP MODULE
command at the same time.
d. Analyze the data as follows:
If the CPU usage varies with the traffic, you can infer that the CPU overload is caused
by heavy traffic.
If the traffic remains light while the CPU of the board is continuously overloaded, you
can infer that the alarm is caused by an unexpected data flood.
Yes: Go to 3.
No: Go to 7.
3. On the CGP client, run the LST ALMTHD command on the MML command interface. Then, based on the
level-x overload threshold field in the command output, determine whether the CPU overload alarm is
generated because the overload threshold is set too low.
Yes: Go to 4.
No: Go to 5.
4. On the CGP client, run the MOD ALMTHD command on the MML command interface to modify the
CPU overload threshold to a proper value. Then, check whether the alarm is generated continuously.
Yes: Go to 5.
No: The handling procedure is complete.
5. On the CGP client, run the DSP CPU command on the MML command interface of the ME. Then, check
whether the CPU usage of the boards of the same type is high based on the CPU usage of module field
contained in the command output.
Yes: Go to 7.
No: Go to 6.
6. You are advised to modify the service configuration to transfer some services from the overloaded board to
other boards. Then, check whether the alarm is generated continuously.
Yes: Go to 7.
No: The handling procedure is complete.
7. Contact Huawei technical support engineers.
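Step 2 asks you to compare traffic measurements with CPU usage sampled at the same times. One way to make that comparison repeatable is a correlation check, sketched below in Python. This heuristic is a substitute technique, not part of the Huawei procedure: the 0.7 cut-off and the function names are hypothetical, and the samples are assumed to be aligned in time (for example, exported performance measurement data next to DSP MODULE readings).

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 samples")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # no co-variation can be measured
    return sxy / sqrt(sxx * syy)


def classify_overload(traffic: Sequence[float], cpu_usage: Sequence[float],
                      min_r: float = 0.7) -> str:
    """Heuristic for step 2: CPU usage tracking traffic suggests heavy traffic."""
    if pearson(traffic, cpu_usage) >= min_r:
        return "heavy traffic"        # step 2 answer Yes: go to 3
    return "possible data flood"      # step 2 answer No: go to 7
```

A flat traffic series with continuously high CPU usage yields a correlation of 0.0 here, which matches the "traffic remains light while the CPU is overloaded" case in step 2d.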
Related Information
The CPU usage is classified into two types as follows:
CPU usage of a board: indicates the overall usage of all logical CPUs of the board.
CPU usage of a module: indicates how frequently the CPU invokes the module, that is, whether the module
is busy.
As a major reference index, the CPU usage indicates the software running status and service processing capability
of the board. You can create monitoring tasks on the client to obtain the CPU usage. For details about how to
create monitoring tasks, see Creating Monitoring Tasks in GUIs. You can also run the MOD ALMTHD
command to modify the CPU overload thresholds and CPU recovery thresholds of a board.
The CPU overload thresholds are classified into four levels: Usage threshold level 1, Usage threshold
level 2, Usage threshold level 3, and Usage threshold level 4. The higher the overload threshold level, the more
severe the overload.
The initial settings of the CPU overload thresholds are as follows:
For the SPUA0, SPUA1, or SPUZ0 boards whose application type is SEESU or SEISU:
Usage threshold level 1 is 90.
Usage threshold level 2 is 92.
Usage threshold level 3 is 94.
Usage threshold level 4 is 96.
For other boards:
Usage threshold level 1 is 80.
Usage threshold level 2 is 85.
Usage threshold level 3 is 90.
Usage threshold level 4 is 95.
NOTE:
To query the application type of a board, run LST APPTYPE.
When the CPU overload level rises, the system clears the lower-level CPU overload alarm and generates a new
CPU overload alarm. When the CPU overload level falls, the system clears the higher-level CPU overload alarm
and generates a new CPU overload alarm.
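The initial thresholds and the overload level they imply can be expressed as follows. This Python sketch is illustrative only: the function names are hypothetical, board and application type strings are assumed to match those shown by LST APPTYPE, usage is assumed to exceed a threshold only when strictly greater than it, and the CPU recovery thresholds are not modeled because their values are not given here.

```python
from typing import Sequence, Tuple

SE_BOARDS = {"SPUA0", "SPUA1", "SPUZ0"}
SE_APP_TYPES = {"SEESU", "SEISU"}


def initial_thresholds(board: str, app_type: str) -> Tuple[int, int, int, int]:
    """Initial Usage threshold levels 1-4 (%) for a board."""
    if board in SE_BOARDS and app_type in SE_APP_TYPES:
        return (90, 92, 94, 96)
    return (80, 85, 90, 95)


def overload_level(cpu_usage: float, thresholds: Sequence[float]) -> int:
    """Highest level whose threshold is exceeded; 0 means not overloaded."""
    level = 0
    for lvl, limit in enumerate(thresholds, start=1):
        if cpu_usage > limit:
            level = lvl
    return level


print(overload_level(93, initial_thresholds("SPUA0", "SEESU")))  # 2
print(overload_level(93, initial_thresholds("OTHER", "OTHER")))  # 3
```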
ALM-27002 Node Fault
Description
This alarm is generated when the physical status of a node changes from Up to Down.
This alarm is cleared when the physical status of a node changes from Down to Up.
Attribute
Alarm ID Alarm Severity Auto Clear
27002 Critical Yes
Parameters
Name Meaning
Office direction name Indicates the name of the office direction (in a static route) for which the alarm is generated.
This parameter can be viewed using LST AOFC.
Office direction address Indicates the address of the office direction (in a dynamic route) for which the alarm is
generated. An office direction address comprises an IP address, a port number, and a VRF instance name.
Routing name1 Indicates the name of a route that references this office direction. This parameter can be viewed
using LST ART.
Routing name2 Indicates the name of a route that references this office direction. This parameter can be viewed
using LST ART.
Routing name3 Indicates the name of a route that references this office direction. This parameter can be viewed
using LST ART.
Routing name4 Indicates the name of a route that references this office direction. This parameter can be viewed
using LST ART.
Routing name5 Indicates the name of a route that references this office direction. This parameter can be viewed
using LST ART.
Impact on the System
This node cannot communicate with the SE2900.
Possible Causes
This node becomes faulty.
The link between this node and the SE2900 becomes faulty.
Procedure
1. Check whether ALM-27005 Trunk Group Fault and ALM-27007 All Links Between BCF and
DNS Disconnected are generated, and whether the remote IP addresses in these alarms are the same as the IP
address of the office direction in this alarm.
Yes: Go to 2.
No: Go to 4.
2. Clear alarms ALM-27005 Trunk Group Fault, ALM-27007 All Links Between BCF and DNS
Disconnected, and ALM-27008 Remote Address Unreachable.
3. Check whether this alarm is cleared.
Yes: The alarm handling is complete.
No: Go to 4.
4. Collect alarm information.
Collect the following information:
Alarm logs: In the MML Command - CGP window, run LST ALMLOG with Alarm cleared
flag set to UNCLEARED(Uncleared) to obtain alarm logs.
Operation logs: In the MML Command - CGP window, run LST OPTLOG with ME ID set to
the ID of the SE2900 generating the alarm to obtain operation logs.
Other logs: Choose Maintenance > File Transfer Service to obtain the logs, setting Remote
Directory to OMU run log, DEV log, and Peer information in turn.
SE2900 configuration files:
a. Run EXP MML in the MML Command - CGP window to export MML configuration
files.
b. Choose Maintenance > File Transfer Service and select Export file from the Remote
Directory drop-down list box for the SE2900. SE2900 configuration files are saved in
/opt/HUAWEI/cgp/workshop/omu/share/export/mit/mml.
Performance measurement files:
a. Run EXP TRFINFO in the MML Command - CGP window to export the
performance measurement files of the SE2900 generating the alarm.
b. Choose Maintenance > File Transfer Service and select Export file from the Remote
Directory drop-down list box for the SE2900. Performance measurement files are saved
in /opt/HUAWEI/cgp/workshop/omu/share/export/pm/perf_trf_info.
5. Contact Huawei technical support engineers.
ALM-1023 Board Port Fault
Description
This alarm is generated when the configuration or connection status of a port on a board is abnormal.
This alarm is cleared when the status of the port on the board is normal.
Attribute
Alarm ID Alarm Severity Auto Clear
1023 Major Yes
Parameters
Name Meaning
Rack number This parameter specifies the rack where the board holding the faulty port is located.
Position number This parameter specifies the position of the subrack where the faulty port is located.
Subrack number This parameter specifies the subrack of the faulty port.
Slot number This parameter specifies the slot of the faulty port.
Location For ATCA subracks, this parameter specifies the location (front or back) of the board that hosts the
faulty port.
For FusionEngine subracks, this parameter specifies the location (front, front GPC, or front XMC)
of the board that hosts the faulty port.
Port type This parameter specifies the type of the faulty port, including Electric ethernet, Fiber ethernet, 10
gigabit fiber ethernet, 20 gigabit fiber ethernet, or 40 gigabit fiber ethernet.
Port name This parameter identifies the name of the faulty port on a board. The port name matches the silkscreen
label on the board.
Alarm cause This parameter specifies the cause value of this alarm. For ATCA subracks, the alarm causes are as
follows:
The port is faulty.
The configured port type is inconsistent with the physical port type.
The port does not exist.
For FusionEngine subracks, the alarm causes are as follows:
The port is faulty.
The port does not exist.
The port phy is faulty.
SerDes fault.
Impact on the System
A port fault may lead to service interruption, communication failure, or module switchover.
Possible Causes
The possible causes of the alarm for ATCA subracks are as follows:
The network cable or optical fiber is disconnected or loose.
The peer network port is faulty.
The board where the faulty port resides fails to load the logic or the logic of the board needs to be
upgraded.
The hardware of the port is faulty.
The switch board is faulty.
The configured port type is inconsistent with the physical port type.
The peer device for sending data or the local device for receiving data is faulty.
The possible causes of the alarm for FusionEngine subracks are as follows:
The network cable or optical fiber is disconnected or loose.
The peer network port is faulty.
The board where the faulty port resides fails to load the logic or the logic of the board needs to be
upgraded.
The hardware of the port is faulty.
The multi-function board is faulty.
The peer device for sending data or the local device for receiving data is faulty.
Procedure
1. In the MML Command - CGP window, run LST SUBRACK, check the value of Subrack type in the
command output, and perform the following:
ATCA subrack: Go to 2.
FusionEngine subrack: Go to 3.
2. Clear the alarm by following the handling procedure described in ALM-1023 Board Port Fault (ATCA).
3. Clear the alarm by following the handling procedure described in ALM-1023 Board Port Fault
(FusionEngine).
ALM-8703 The OMU Failed to Communicate with the NTP Server
Description
When the OMU server is connected to the NTP server, the OMU server synchronizes the time with the NTP server
periodically. This alarm is generated when the NTP server stratum exceeds the maximum stratum (15) defined by
the NTP protocol or when the OMU server fails to synchronize the time with the NTP server in three consecutive
synchronization periods.
You can run ADD NTPSVR to configure the NTP server and run LST NTPCFG to query the period for the
OMU server to synchronize the time with the NTP server. By default, the synchronization period is 5 minutes.
This alarm is cleared when the communication between the OMU and the NTP server becomes normal.
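The generation and clearing rules above can be sketched as a per-server counter. This Python model is hypothetical (the class and method names are illustrative): it assumes that a stratum above 15 raises the alarm immediately, that one successful synchronization with an acceptable stratum clears it, and it does not model authentication failures separately. With the default 5-minute synchronization period, three consecutive failed periods correspond to roughly 15 minutes before the alarm is raised.

```python
from typing import Optional

MAX_STRATUM = 15              # maximum stratum defined by the NTP protocol
FAILED_PERIODS_TO_ALARM = 3   # consecutive failed synchronization periods


class NtpServerAlarm:
    """Tracks the ALM-8703 state for one configured NTP server."""

    def __init__(self) -> None:
        self.failed_periods = 0
        self.active = False

    def on_period(self, reachable: bool, stratum: Optional[int] = None) -> bool:
        """Feed one synchronization period's outcome; return the alarm state."""
        if reachable and stratum is not None and stratum > MAX_STRATUM:
            self.active = True            # stratum too high: raised at once (assumed)
        elif reachable:
            self.failed_periods = 0       # normal communication: alarm clears
            self.active = False
        else:
            self.failed_periods += 1
            if self.failed_periods >= FAILED_PERIODS_TO_ALARM:
                self.active = True
        return self.active
```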
Attribute
Alarm ID Alarm Severity Auto Clear
8703 Minor Yes
Parameters
Name Meaning
NTP server name Specifies the name of the faulty NTP server.
NTP server IP Specifies the IP address of the faulty NTP server.
Alarm cause Specifies the cause of the alarm.
Value:
Connection failed.
Authentication failed.
The NTP server stratum exceeds the maximum stratum (15) defined by the NTP
protocol.
Impact on the System
If other NTP servers are available after the specified NTP server fails, the OMU server synchronizes the time with
other NTP servers. However, if all NTP servers fail, the OMU server fails to perform the time synchronization. As
a result, the time in CDRs may be incorrect.
Possible Causes
The NTP server is faulty.
The network connection between the OMU server and the NTP server is faulty.
Authentication failed.
The NTP server stratum exceeds the maximum stratum (15) defined by the NTP protocol.
The connection mode of SERVICE_NET(Service network) is selected but the IP address of the NTP
client is not configured correctly.
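When working through the causes above, it can help to check from a maintenance host whether the NTP server answers on UDP port 123 and which stratum it reports. The Python sketch below sends a standard NTP client-mode request (48-byte packet, version 4, mode 3) and reads the stratum field from the reply. It is not a Huawei tool: it assumes IPv4 reachability from the host where it runs, which may differ from the OMU's own network path, and it does not check authentication (Key ID and Key string). The server address in the usage comment is a placeholder.

```python
import socket

NTP_PORT = 123
MAX_STRATUM = 15


def build_request() -> bytes:
    """48-byte NTP client request: LI=0, VN=4, Mode=3 gives first byte 0x23."""
    return bytes([0x23]) + bytes(47)


def parse_stratum(packet: bytes) -> int:
    """The stratum is the second byte of the NTP header."""
    if len(packet) < 48:
        raise ValueError("reply shorter than an NTP header")
    return packet[1]


def query_stratum(server: str, timeout: float = 2.0) -> int:
    """Send one request over IPv4 UDP; raises socket.timeout if no reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(), (server, NTP_PORT))
        reply, _ = sock.recvfrom(512)
    return parse_stratum(reply)


# Usage (placeholder address; requires network access to the server):
# stratum = query_stratum("192.0.2.10")
# print("stratum", stratum, "OK" if 1 <= stratum <= MAX_STRATUM else "unusable")
```

A timeout points to the connection causes listed above, while a reply with a stratum above 15 points to the stratum cause.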
Procedure
1. Run LST NTPSVR to query the connection mode configured for the NTP server.
MAINTENANCE_NET(Maintenance network): Go to 2.
SERVICE_NET(Service network): Go to 3.
2. Run LST NTPSVR to check whether the IP address of the NTP server is correctly configured.
Yes: Go to 8.
No: Go to 4.
3. Run LST NTPCLIENTIP on the MML command interface of the service ME to check whether the IP
address of the NTP client is correctly configured.
Yes: Go to 8.
No: Go to 5.
4. Run MOD NTPSVR to change the IP address of the NTP server to the correct one. Then, go to 6.
5. Run RMV NTPCLIENTIP and then ADD NTPCLIENTIP on the MML command interface of the
service ME to change the IP address of the NTP client to the correct one. Then, go to 6.
6. Wait for one time synchronization period, and then check on the alarm console whether ALM-8703
The OMU Failed to Communicate with the NTP Server is cleared.
Yes: Go to 7.
No: Go to 18.
7. Switch to the alarm console and check whether ALM-8703 The OMU Failed to Communicate with the
NTP Server related to the new NTP server is generated.
Yes: Go to 8.
No: No further action is required.
8. Clear the alarm based on the cause of the alarm:
Connection failed. Go to 9.
Authentication failed. Go to 15.
The NTP server stratum exceeds the maximum stratum (15) defined by the NTP protocol. Go to 16.
9. Contact the maintenance personnel on the NTP server side to check whether the NTP server is normal.
Yes: Go to 12.
No: Go to 10.
10. Contact the maintenance personnel on the NTP server side to rectify the fault. After a synchronization
period, go to 11.
11. Check whether the alarm is cleared in the Browse Alarm window.
Yes: No further action is required.
No: Go to 12.
12. Run Ping to check whether the network connection between the OMU server and the NTP server is
normal. Set the local IP address to the external floating IP address of the OMU server and the peer IP
address to the IP address of the NTP server. You can run LST OMUOMIP to query the external floating IP
address of the OMU server.
Yes: Go to 18.
No: Go to 13.
13. Rectify the fault related to the network connection between the OMU server and the NTP server. Then, go
to 14.
14. Check whether the alarm is cleared in the Browse Alarm window.
Yes: No further action is required.
No: Go to 18.
15. Obtain the Key ID and Key string from the maintenance personnel on the NTP server side. Run MOD NTPSVR
in the MML Command - CGP window to set Key ID and Key string to these values, and check whether the command is executed successfully.
Yes: Go to 17.
No: Go to 18.
16. Contact the maintenance personnel on the NTP server side to replace the NTP server with one whose time
source stratum is 15 or lower. Then, go to 17.
17. After a synchronization period, check whether the alarm is cleared in the Browse Alarm window.
Yes: No further action is required.
No: Go to 18.
18. Contact Huawei technical support engineers.
ALM-1012 Communication Failure of Control Plane
Description
NOTICE:
If this high severity alarm is generated, handle it immediately.
Alarm Mechanism
For OSTA 5.0 or F8000 boards on virtual machines (VMs):
This alarm is generated when the communication over the dual control planes between the source VM
and the destination VM is interrupted for 2 minutes.
This alarm is cleared when the communication over either control plane between the source VM and the
destination VM is restored.
For OSTA 5.0 or F8000 boards on physical machines:
The system checks the communication status between the source board and the peer board every
minute. This alarm is generated when the communication between the source board and the peer board
is detected as faulty in two consecutive checks.
This alarm is cleared when the communication between the source board and the peer board becomes
normal.
Figure 1 illustrates the alarm mechanism.
Figure 1 Alarm mechanism
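The raise and clear rules described above can be modeled as two small state machines. The following minimal Python sketch illustrates those conditions only and is not Huawei's implementation. The class and method names, the polling interface, and treating an outage of exactly 120 seconds as "interrupted for 2 minutes" are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PhysicalBoardMonitor:
    """Physical boards: raise after 2 consecutive failed 1-minute checks,
    clear on the first check that finds communication normal."""
    threshold: int = 2
    failures: int = 0
    active: bool = False

    def on_check(self, link_ok: bool) -> bool:
        if link_ok:
            self.failures = 0
            self.active = False
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.active = True
        return self.active


@dataclass
class VmDualPlaneMonitor:
    """VMs: raise when BOTH control planes have been down for 2 minutes,
    clear as soon as EITHER plane is restored."""
    hold_seconds: float = 120.0
    down_since: Optional[float] = None
    active: bool = False

    def on_sample(self, now: float, plane1_ok: bool, plane2_ok: bool) -> bool:
        if plane1_ok or plane2_ok:
            self.down_since = None
            self.active = False
        else:
            if self.down_since is None:
                self.down_since = now  # start timing the dual-plane outage
            if now - self.down_since >= self.hold_seconds:
                self.active = True
        return self.active
```

A single failed check on a physical board does not raise the alarm; the second consecutive failure does. On a VM, losing only one of the two control planes never raises it.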
Attribute
Alarm ID Alarm Severity Auto Clear
1012 Major Yes
Parameters
Name Meaning
Source position Indicates the position of the source VM or board.
Destination position Indicates the position of the destination VM or board.
Impact on the System
Control-plane communication fails between all modules on the source VM or board and all modules on the
destination VM or board. Service requests are redirected to other service modules, which increases the load on
those modules.
Possible Causes
For OSTA 5.0 or F8000 boards on VMs:
A Fabric port of the source VM is faulty.
A Fabric port of the destination VM is faulty.
The network port of the board running the source VM or the destination VM is faulty.
A virtual port is faulty.
For OSTA 5.0 or F8000 boards on physical machines:
The switch board is faulty.
The cascading cables between subracks are faulty.
The Fabric network port on the source, destination, or switch board is faulty.
The cables of the switch boards in cascaded subracks are looped or not connected.
Procedure
Check how the service ME is deployed.
1. Perform the following operations based on where the service ME is running:
The ME is running on OSTA 5.0 or F8000 boards on VMs: Go to 2.
The ME is running on OSTA 5.0 or F8000 boards on physical machines: Go to 10.
Check whether either Fabric port of the source VM is faulty.
2. Run DSP VPORT to check whether Port state of the Fabric1 or Fabric2 port of the VM indicated by
Source position in the alarm information is Fault.
Yes: Go to 3.
No: Go to 6.
3. Check whether this alarm is generated for the control planes between the VM indicated by
Source position and other properly running VMs.
Yes: Go to 4.
No: Go to 7.
4. NOTICE:
The VM stops providing services while it is being reset. Therefore, run SWP MODULE
to switch over the active and standby modules before resetting the VM, and reset the
VM only when traffic is light, for example, around midnight.
If the VM running the active OMU must be reset, run SWP OMU to switch over the
active and standby OMUs before resetting the VM.
In the MML Command window of the corresponding NE, run RST VM with Virtual machine name set to the
name of the VM indicated by the Source position.
5. After the VM indicated by Source position recovers, wait a further 15 minutes.
Then, check whether this alarm is generated again.
Yes: Go to 10.
No: No further action is required.
Check whether either Fabric port of the destination VM is faulty.
6. Run DSP VPORT to check whether Port state of the Fabric1 or Fabric2 port of the VM indicated by
Destination position in the alarm information is Fault.
Yes: Go to 7.
No: Go to 10.
7. Check whether this alarm is generated for the control planes between the VM indicated by
Destination position and other properly running VMs.
Yes: Go to 8.
No: Go to 10.
8. NOTICE:
The VM stops providing services while it is being reset. Therefore, run SWP MODULE
to switch over the active and standby modules before resetting the VM, and reset the
VM only when traffic is light, for example, around midnight.
If the VM running the active OMU must be reset, run SWP OMU to switch over the
active and standby OMUs before resetting the VM.
In the MML Command window of the corresponding NE, run RST VM with Virtual machine name set to the
name of the VM indicated by the Destination position.
9. After the VM indicated by Destination position recovers, wait a further 15 minutes.
Then, check whether this alarm is generated again.
Yes: Go to 10.
No: No further action is required.
Check whether a switch board is faulty.
10. In the Browse Alarms window of the OMU client, check whether ALM-8302 Communication
Failure Between the OMU and Switch Boards is generated on the source or peer subrack.
Yes: Go to 11.
No: Go to 12.
11. Rectify the fault by following the procedure for handling ALM-8302 Communication Failure
Between the OMU and Switch Boards. Then, check whether ALM-1012 is cleared.
Yes: No further action is required.
No: Go to 12.
Check whether a port is faulty.
12. In the Browse Alarms window of the OMU client, check whether ALM-1023 Board Port Fault is
generated for the network ports on the source board, peer board, source switch boards, or peer
switch boards.
Yes: Go to 13.
No: Go to 14.
13. Rectify the fault by following the procedure for handling ALM-1023 Board Port Fault. Then,
check whether ALM-1012 is cleared.
Yes: No further action is required.
No: Go to 14.
14. In the Browse Alarms window of the OMU client, check whether ALM-9944 Board Port
Detection Abnormal is generated for the network ports on the source board, peer board, source
switch boards, or peer switch boards.
Yes: Go to 15.
No: Go to 16.
15. Rectify the fault by following the procedure for handling ALM-9944 Board Port Detection
Abnormal. Then, check whether ALM-1012 is cleared.
Yes: No further action is required.
No: Go to 16.
16. Check whether the source or peer board is a back board.
Yes: Go to 17.
No: Go to 19.
17. In the Browse Alarms window of the OMU client, check whether ALM-5001 Optical Module
Fault is generated for the Card1/SFP1 or Card3/SFP0 network port on the source or peer board.
Yes: Go to 18.
No: Go to 19.
18. Rectify the fault by following the procedure for handling ALM-5001 Optical Module Fault.
Then, check whether ALM-1012 is cleared.
Yes: No further action is required.
No: Go to 19.
Locate faults on cables between subracks.
19. On Device Panel in the navigation tree of the OMU client, check whether the system is deployed in a
single-subrack environment.
Yes: Go to 22.
No: Go to 20.
20. In the MML Command - CGP window, run LST NET to query the networking mode. Then, check
whether the subracks are correctly connected according to Introduction to the cascading network
mode in Cascading Mode.
Yes: Go to 22.
No: Go to 21.
21. Correct the connections between the cascaded subracks according to Introduction to the cascading
network mode in Cascading Mode, and then check whether ALM-1012 is cleared.
Yes: No further action is required.
No: Go to 22.
Collect fault-related information.
22. Use the Information Collection Tool to collect information for the Common fault analysis
scenario.
23. Contact Huawei technical support engineers.
ALM-4766 Account Locked
Description
This alarm is generated when the number of incorrect passwords consecutively entered on the CGP client is equal
to or greater than the value of Number of login attempts before lockout. Number of login attempts before
lockout is set by running SET SECPOLICY, and the default value is 5.
This alarm is cleared when the locked account is unlocked.
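The lockout rule above, together with the Account lockout duration checked in step 1 of the procedure, can be modeled as follows. This minimal Python sketch illustrates the described policy only and is not the CGP implementation. The class names, the use of seconds as the duration unit, and the counter reset on successful login (implied by "consecutively") are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LockoutPolicy:
    max_attempts: int = 5         # "Number of login attempts before lockout" (default 5)
    lockout_seconds: float = 0.0  # "Account lockout duration"; 0 = manual unlock only


@dataclass
class AccountGuard:
    policy: LockoutPolicy
    failures: Dict[str, int] = field(default_factory=dict)
    locked_at: Dict[str, float] = field(default_factory=dict)

    def is_locked(self, user: str, now: float) -> bool:
        start = self.locked_at.get(user)
        if start is None:
            return False
        # A positive duration unlocks the account automatically (procedure step 2).
        if self.policy.lockout_seconds > 0 and now - start >= self.policy.lockout_seconds:
            self.unlock(user)
            return False
        return True

    def record_login(self, user: str, password_ok: bool, now: float) -> bool:
        """Return True if the login succeeds."""
        if self.is_locked(user, now):
            return False
        if password_ok:
            self.failures[user] = 0  # only consecutive failures count
            return True
        self.failures[user] = self.failures.get(user, 0) + 1
        if self.failures[user] >= self.policy.max_attempts:
            self.locked_at[user] = now  # the point where ALM-4766 would be raised
        return False

    def unlock(self, user: str) -> None:
        """Manual unlock, similar in effect to ULK USER; the alarm clears."""
        self.locked_at.pop(user, None)
        self.failures[user] = 0
```

With the default policy, the fifth consecutive wrong password locks the account. Because the duration is 0, the account remains locked until it is unlocked manually.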
Attribute
Alarm ID Alarm Severity Auto Clear
4766 Warning Yes
Parameters
Name Meaning
Account name Specifies the name of the locked account.
Impact on the System
The user whose account is locked cannot log in to the CGP client to perform operations on the OMU.
Possible Causes
An account lockout policy is enabled, and the number of consecutive login failures reaches the value of Number
of login attempts before lockout set by running SET SECPOLICY.
Procedure
1. Log in to the OMU as an administrator, and run LST SECPOLICY in the MML Command - CGP
window to check whether Account lockout duration is greater than zero.
Yes: Go to 2.
No: Go to 3.
2. The account is automatically unlocked after the specified duration. No further action is required.
3. Identify the cause of the account lockout immediately. If a malicious user attempted to log in to the
system by guessing the password, take appropriate security measures. Based on the actual conditions, such
as the cause of the lockout, determine whether to unlock the account immediately.
Yes: Go to 4.
No: No further action is required.
4. Unlock the locked account by running ULK USER in the MML Command - CGP window. Then, check
whether the unlocked account can log in properly and the alarm is cleared.
Yes: No further action is required.
No: Go to 5.
5. Contact Huawei technical support engineers.
Failed to Start the DHCP Server Service
1.1 Symptom
After you configure the installation parameters and click Next in the INU dialog box, the
system displays the message Fail to start DHCP server, as shown in Figure 1.
Figure 1 Message "Fail to start DHCP server" displayed
1.2 Possible Causes
Port 67 used to start the DHCP service is occupied by other programs.
The settings of the PC are incorrect.
The IP addresses configured on a network adapter of the PC are incorrect.
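Step 1 of the procedure filters netstat -aon output with findstr 67. This matches any line that contains the substring 67, such as a local port of 6700 or a PID of 1467, so some output lines may not involve port 67 at all. The following minimal Python sketch keeps only rows whose local port is exactly 67 and reads the PID from the last column, as the procedure describes. The sample output is illustrative and is not captured from a real PC.

```python
from typing import Set

# Illustrative `netstat -aon` output (UDP rows have no State column).
NETSTAT_SAMPLE = """
Active Connections

  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:6700           0.0.0.0:0              LISTENING       1467
  UDP    0.0.0.0:67             *:*                                    3808
  UDP    192.168.2.1:137        *:*                                    4
"""


def pids_on_port(netstat_output: str, port: int = 67) -> Set[int]:
    """Return the PIDs whose local port equals `port` in `netstat -aon` output."""
    pids: Set[int] = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows start with TCP or UDP; skip headers and blank lines.
        if len(fields) < 4 or fields[0].upper() not in ("TCP", "UDP"):
            continue
        local_port = fields[1].rsplit(":", 1)[-1]  # also handles [::]:67
        if local_port == str(port) and fields[-1].isdigit():
            pids.add(int(fields[-1]))  # PID is always the last column
    return pids


if __name__ == "__main__":
    print(pids_on_port(NETSTAT_SAMPLE))  # {3808}
```

In this sample, findstr 67 would also return the TCP row. The sketch returns only PID 3808, which is the process that step 1.d would end.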
1.3 Fault Diagnosis
None.
1.4 Procedure
1. Check whether the port used to start the DHCP service is occupied by other programs.
a. Choose Start > Run.
b. In the displayed window, type cmd, and press Enter.
c. In the displayed window, run netstat -aon | findstr 67, and check whether there is command output.
Yes: The port is occupied. Go to 1.d.
No: Go to 2.
d. Run ntsd -c q -p PID to end the process that occupies the port (for example, the TFTP Server service). PID is the last field in the output of netstat -aon | findstr 67, for example, 3808.
e. Run netstat -aon | findstr 67 again, and then check whether the command output is displayed.
Yes: Go to 11.
No: Go to 2.
2. Exit INU4Win.
3. Ensure that the File and Printer Sharing for Microsoft Networks check box is selected. For details, see Checking PC Configuration.
4. Stop the DHCP Server service on the PC.
NOTE:
The INU4Win program provides the DHCP Server service. Therefore, ensure that the DHCP Server service on the PC is stopped to prevent a service collision.
Assume that the PC runs the Windows XP OS.
a. Choose Start > Run, type cmd, and then press Enter. The command-line interface (CLI) is displayed.
b. Run netstat -a -o -p udp | find "bootp" on the CLI.
If no command output is displayed, the DHCP Server service is stopped. In this case, go to 4.e.
If command output is displayed, as shown in Figure 2, the DHCP Server service is started. In this case, go to 4.c to stop it.
Figure 2 DHCP Server service started
c. Run ntsd -c q -p PID to stop the DHCP Server service. PID is the last field in the output of netstat -a -o -p udp | find "bootp", for example, 3808.
d. Run netstat -a -o -p udp | find "bootp" again and check whether any command output is displayed.
Yes: Go to 11.
No: Go to 4.e.
e. Run exit to exit the CLI.
5. Check whether the Server and Workstation services on the PC are started.
a. Choose Control Panel > Administrative Tools > Services. The Services window is displayed.
b. Check whether Status of the Server and Workstation services is Started.
Yes: Go to 6.
No: Go to 5.c.
c. Right-click the Server or Workstation service that is not started, and choose Start.
6. Perform a complete installation or an incremental installation again. Then, check whether the fault is rectified.
Yes: The handling procedure is complete.
No: Go to 7.
7. On the PC hosting the INU4Win, run ipconfig on the CLI to check the IP addresses configured on the network adapter that connects to the BASE network port. The IP addresses should be 192.168.2.1, 172.17.128.254, and 172.16.128.254.
If the IP addresses are correctly configured: Ensure that the network cable is securely connected.
If the IP addresses are incorrectly configured: Go to 8.
8. Restart INU4Win and check whether the IP addresses are correctly configured.
Yes: Go to 10.
No: Go to 9.
9. Configure the IP addresses (192.168.2.1, 172.17.128.254, and 172.16.128.254) on the network adapter that connects to the BASE network port.
10. Check whether the fault is rectified.
Yes: The handling procedure is complete.
No: Go to 11.
11. Contact Huawei technical support engineers to rectify the fault.
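Steps 7 to 9 compare the addresses on the adapter connected to the BASE network port with 192.168.2.1, 172.17.128.254, and 172.16.128.254. The following minimal Python sketch automates that comparison on captured ipconfig text. It assumes the text covers only that adapter. It also assumes that address lines are labeled "IP Address" (Windows XP) or "IPv4 Address", with a "(Preferred)" suffix that some Windows versions append. The sample text is illustrative.

```python
import ipaddress
from typing import Iterable, Set

# Addresses that steps 7 and 9 require on the adapter connected to the BASE port.
EXPECTED = {"192.168.2.1", "172.17.128.254", "172.16.128.254"}

# Illustrative Windows XP-style ipconfig output for one adapter.
IPCONFIG_SAMPLE = """
Ethernet adapter Local Area Connection:

        IP Address. . . . . . . . . . . . : 192.168.2.1
        Subnet Mask . . . . . . . . . . . : 255.255.255.0
        IP Address. . . . . . . . . . . . : 172.17.128.254
        Subnet Mask . . . . . . . . . . . : 255.255.0.0
"""


def configured_ipv4(ipconfig_output: str) -> Set[str]:
    """Collect IPv4 addresses from 'IP Address' / 'IPv4 Address' lines."""
    found: Set[str] = set()
    for line in ipconfig_output.splitlines():
        if "IP Address" in line or "IPv4 Address" in line:
            value = line.split(":", 1)[-1].strip()
            value = value.split("(")[0]  # drop a "(Preferred)" suffix if present
            try:
                found.add(str(ipaddress.IPv4Address(value)))
            except ValueError:
                pass  # not a valid IPv4 address; ignore the line
    return found


def missing_addresses(found: Iterable[str], expected: Set[str] = EXPECTED) -> Set[str]:
    """Return the required addresses that are not configured."""
    return set(expected) - set(found)


if __name__ == "__main__":
    print(sorted(missing_addresses(configured_ipv4(IPCONFIG_SAMPLE))))  # ['172.16.128.254']
```

A non-empty result corresponds to the "incorrectly configured" branch of step 7.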
Connection Failure Between the OMU and the Syslog Server
1.5 Symptom
After you run ping SYSLOG_SERVER_IP on the OMU to verify the connection between the OMU and
the Syslog server, the system displays the message Destination Host Unreachable.
SYSLOG_SERVER_IP indicates the IP address of the Syslog server.
1.6 Possible Causes
The network cable between the OMU and the Syslog server is connected incorrectly or is
not connected.
1.7 Procedure
Checking the Connection of the Network Cable
1. Check whether the communication network interface on the OMU board and the communication network interface of the Syslog server are correctly connected.
Yes: Go to 4.
No: Go to 2.
Connecting the Network Cable
2. Connect one end of a network cable to the communication network interface on the OMU board and the other end to the communication network interface of the Syslog server.
Checking Whether the Fault Is Rectified
3. Run ping SYSLOG_SERVER_IP to check whether the connection between the OMU and the Syslog server is normal.
Yes: No further action is required.
No: Go to 4.
4. Contact Huawei technical support engineers.