Huawei › en-us › eu-west-0-user... · 2019-12-14 · Contents 1...




MapReduce Service

User Guide

Date 2019-01-15


Contents

1 Overview..........1
1.1 Introduction..........1
1.2 Application Scenarios..........1
1.3 Functions..........2
1.3.1 Cluster Management Function..........2
1.3.2 Hadoop..........3
1.3.3 Spark..........3
1.3.4 Spark SQL..........4
1.3.5 HBase..........4
1.3.6 Hive..........5
1.3.7 Hue..........6
1.3.8 Kerberos Authentication..........6
1.3.9 Kafka..........7
1.3.10 Storm..........8
1.3.11 CarbonData..........8
1.3.12 Flume..........9
1.3.13 Loader..........9
1.4 Relationships with Other Services..........10
1.5 Required Permission for Using MRS..........10
1.6 Limitations..........11

2 MRS Quick Start..........13
2.1 Introduction to the Operation Process..........13
2.2 Quick Start..........14
2.2.1 Creating a Cluster..........14
2.2.2 Managing Files..........15
2.2.3 Creating a Job..........17

3 Cluster Operation Guide..........20
3.1 Overview..........20
3.2 Cluster List..........21
3.3 Creating a Cluster..........23
3.4 Managing Active Clusters..........32
3.4.1 Viewing Basic Information About an Active Cluster..........32

MapReduce Service User Guide Contents

2019-01-15 ii


3.4.2 Viewing Patch Information About an Active Cluster..........35
3.4.3 Entering the Cluster Management Page..........35
3.4.4 Expanding a Cluster..........36
3.4.5 Terminating a Cluster..........36
3.4.6 Deleting a Failed Task..........37
3.4.7 Managing Jobs in an Active Cluster..........37
3.4.8 Managing Data Files..........37
3.4.9 Viewing the Alarm List..........41
3.5 Managing Historical Clusters..........41
3.5.1 Viewing Basic Information About a Historical Cluster..........42
3.5.2 Viewing Job Configurations in a Historical Cluster..........44
3.6 Managing Jobs..........45
3.6.1 Introduction to Jobs..........45
3.6.2 Adding a Jar or Script Job..........47
3.6.3 Submitting a Spark SQL Statement..........50
3.6.4 Viewing Job Configurations and Logs..........51
3.6.5 Stopping Jobs..........52
3.6.6 Replicating Jobs..........52
3.6.7 Deleting Jobs..........55
3.7 Querying Operation Logs..........55

4 Remote Operation Guide..........58
4.1 Overview..........58
4.2 Logging In to a Master Node..........59
4.2.1 Logging In to an ECS Using VNC..........59
4.2.2 Logging In to a Linux ECS Using a Key Pair (SSH)..........60
4.2.3 Logging In to a Linux ECS Using a Password (SSH)..........60
4.3 Viewing Active and Standby Nodes..........60
4.4 Client Management..........61
4.4.1 Updating the Client..........61
4.4.2 Using the Client on a Cluster Node..........62
4.4.3 Using the Client on Another Node of a VPC..........63

5 MRS Manager Operation Guide..........67
5.1 MRS Manager Introduction..........67
5.2 Accessing MRS Manager..........70
5.3 Accessing MRS Manager Supporting Kerberos Authentication..........71
5.4 Viewing Running Tasks in a Cluster..........72
5.5 Monitoring Management..........73
5.5.1 Viewing the System Overview..........73
5.5.2 Configuring a Monitoring History Report..........74
5.5.3 Managing Service and Host Monitoring..........75
5.5.4 Managing Resource Distribution..........79
5.5.5 Configuring Monitoring Metric Dumping..........80



5.6 Alarm Management..........82
5.6.1 Viewing and Manually Clearing an Alarm..........82
5.6.2 Configuring an Alarm Threshold..........83
5.6.3 Configuring Syslog Northbound Interface..........84
5.6.4 Configuring SNMP Northbound Interface..........87
5.7 Alarm Reference..........89
5.7.1 ALM-12001 Audit Log Dump Failure..........89
5.7.2 ALM-12002 HA Resource Is Abnormal..........91
5.7.3 ALM-12004 OLdap Resource Is Abnormal..........93
5.7.4 ALM-12005 OKerberos Resource Is Abnormal..........94
5.7.5 ALM-12006 Node Fault..........96
5.7.6 ALM-12007 Process Fault..........97
5.7.7 ALM-12010 Manager Heartbeat Interruption Between the Active and Standby Nodes..........99
5.7.8 ALM-12011 Manager Data Synchronization Exception Between the Active and Standby Nodes..........100
5.7.9 ALM-12012 NTP Service Is Abnormal..........102
5.7.10 ALM-12016 CPU Usage Exceeds the Threshold..........104
5.7.11 ALM-12017 Insufficient Disk Capacity..........106
5.7.12 ALM-12018 Memory Usage Exceeds the Threshold..........108
5.7.13 ALM-12027 Host PID Usage Exceeds the Threshold..........109
5.7.14 ALM-12028 Number of Processes in the D State on the Host Exceeds the Threshold..........111
5.7.15 ALM-12031 User omm or Password Is About to Expire..........113
5.7.16 ALM-12032 User ommdba or Password Is About to Expire..........114
5.7.17 ALM-12033 Slow Disk Fault..........116
5.7.18 ALM-12034 Periodic Backup Failure..........117
5.7.19 ALM-12035 Unknown Data Status After Recovery Task Failure..........118
5.7.20 ALM-12037 NTP Server Is Abnormal..........119
5.7.21 ALM-12038 Monitoring Indicator Dump Failure..........121
5.7.22 ALM-12039 GaussDB Data Is Not Synchronized..........123
5.7.23 ALM-12040 Insufficient System Entropy..........125
5.7.24 ALM-13000 ZooKeeper Service Unavailable..........127
5.7.25 ALM-13001 Available ZooKeeper Connections Are Insufficient..........129
5.7.26 ALM-13002 ZooKeeper Heap Memory or Direct Memory Usage Exceeds the Threshold..........132
5.7.27 ALM-14000 HDFS Service Unavailable..........133
5.7.28 ALM-14001 HDFS Disk Usage Exceeds the Threshold..........135
5.7.29 ALM-14002 DataNode Disk Usage Exceeds the Threshold..........137
5.7.30 ALM-14003 Number of Lost HDFS Blocks Exceeds the Threshold..........138
5.7.31 ALM-14004 Number of Damaged HDFS Blocks Exceeds the Threshold..........140
5.7.32 ALM-14006 Number of HDFS Files Exceeds the Threshold..........141
5.7.33 ALM-14007 HDFS NameNode Memory Usage Exceeds the Threshold..........142
5.7.34 ALM-14008 HDFS DataNode Memory Usage Exceeds the Threshold..........144
5.7.35 ALM-14009 Number of Dead DataNodes Exceeds the Threshold..........145
5.7.36 ALM-14010 NameService Service Is Abnormal..........147



5.7.37 ALM-14011 HDFS DataNode Data Directory Is Not Configured Properly..........150
5.7.38 ALM-14012 HDFS JournalNode Data Is Not Synchronized..........153
5.7.39 ALM-16000 Percentage of Sessions Connected to the HiveServer to Maximum Number Allowed Exceeds the Threshold..........154
5.7.40 ALM-16001 Hive Warehouse Space Usage Exceeds the Threshold..........156
5.7.41 ALM-16002 Successful Hive SQL Operations Are Lower than the Threshold..........158
5.7.42 ALM-16004 Hive Service Unavailable..........160
5.7.43 ALM-18000 Yarn Service Unavailable..........163
5.7.44 ALM-18002 NodeManager Heartbeat Lost..........165
5.7.45 ALM-18003 NodeManager Unhealthy..........166
5.7.46 ALM-18006 MapReduce Job Execution Timeout..........167
5.7.47 ALM-19000 HBase Service Unavailable..........169
5.7.48 ALM-19006 HBase Replication Synchronization Failed..........170
5.7.49 ALM-25000 LdapServer Service Unavailable..........173
5.7.50 ALM-25004 Abnormal LdapServer Data Synchronization..........175
5.7.51 ALM-25500 KrbServer Service Unavailable..........177
5.7.52 ALM-27001 DBService Unavailable..........179
5.7.53 ALM-27003 DBService Heartbeat Interruption Between the Active and Standby Nodes..........181
5.7.54 ALM-27004 Data Inconsistency Between Active and Standby DBServices..........182
5.7.55 ALM-28001 Spark Service Unavailable..........185
5.7.56 ALM-26051 Storm Service Unavailable..........186
5.7.57 ALM-26052 Number of Available Supervisors in Storm Is Lower Than the Threshold..........188
5.7.58 ALM-26053 Slot Usage of Storm Exceeds the Threshold..........190
5.7.59 ALM-26054 Heap Memory Usage of Storm Nimbus Exceeds the Threshold..........191
5.7.60 ALM-38000 Kafka Service Unavailable..........193
5.7.61 ALM-38001 Insufficient Kafka Disk Space..........195
5.7.62 ALM-38002 Heap Memory Usage of Kafka Exceeds the Threshold..........197
5.7.63 ALM-24000 Flume Service Unavailable..........199
5.7.64 ALM-24001 Flume Agent Is Abnormal..........200
5.7.65 ALM-24003 Flume Client Connection Failure..........202
5.7.66 ALM-24004 Flume Fails to Read Data..........204
5.7.67 ALM-24005 Data Transmission by Flume Is Abnormal..........206
5.7.68 ALM-12041 Permission of Key Files Is Abnormal..........208
5.7.69 ALM-12042 Key File Configurations Are Abnormal..........209
5.7.70 ALM-23001 Loader Service Unavailable..........211
5.7.71 ALM-12357 Failed to Export Audit Logs to the OBS..........214
5.7.72 ALM-12014 Partition Lost..........216
5.7.73 ALM-12015 Partition Filesystem Readonly..........217
5.7.74 ALM-12043 DNS Resolution Duration Exceeds the Threshold..........218
5.7.75 ALM-12045 Network Read Packet Dropped Rate Exceeds the Threshold..........221
5.7.76 ALM-12046 Network Write Packet Dropped Rate Exceeds the Threshold..........226
5.7.77 ALM-12047 Network Read Packet Error Rate Exceeds the Threshold..........227
5.7.78 ALM-12048 Network Write Packet Error Rate Exceeds the Threshold..........229



5.7.79 ALM-12049 Network Read Throughput Rate Exceeds the Threshold..........231
5.7.80 ALM-12050 Network Write Throughput Rate Exceeds the Threshold..........233
5.7.81 ALM-12051 Disk Inode Usage Exceeds the Threshold..........235
5.7.82 ALM-12052 TCP Temporary Port Usage Exceeds the Threshold..........237
5.7.83 ALM-12053 File Handle Usage Exceeds the Threshold..........239
5.7.84 ALM-12054 The Certificate File Is Invalid..........241
5.7.85 ALM-12055 The Certificate File Is About to Expire..........243
5.7.86 ALM-18008 Heap Memory Usage of Yarn ResourceManager Exceeds the Threshold..........246
5.7.87 ALM-18009 Heap Memory Usage of MapReduce JobHistoryServer Exceeds the Threshold..........248
5.7.88 ALM-20002 Hue Service Unavailable..........249
5.7.89 ALM-43001 Spark Service Unavailable..........252
5.7.90 ALM-43006 Heap Memory Usage of the JobHistory Process Exceeds the Threshold..........253
5.7.91 ALM-43007 Non-Heap Memory Usage of the JobHistory Process Exceeds the Threshold..........255
5.7.92 ALM-43008 Direct Memory Usage of the JobHistory Process Exceeds the Threshold..........256
5.7.93 ALM-43009 JobHistory GC Time Exceeds the Threshold..........258
5.7.94 ALM-43010 Heap Memory Usage of the JDBCServer Process Exceeds the Threshold..........259
5.7.95 ALM-43011 Non-Heap Memory Usage of the JDBCServer Process Exceeds the Threshold..........261
5.7.96 ALM-43012 Direct Memory Usage of the JDBCServer Process Exceeds the Threshold..........262
5.7.97 ALM-43013 JDBCServer GC Time Exceeds the Threshold..........264
5.8 Object Management..........265
5.8.1 Introduction..........265
5.8.2 Querying Configurations..........266
5.8.3 Managing Services..........267
5.8.4 Configuring Service Parameters..........267
5.8.5 Configuring Customized Service Parameters..........269
5.8.6 Synchronizing Service Configurations..........270
5.8.7 Managing Role Instances..........271
5.8.8 Configuring Role Instance Parameters..........271
5.8.9 Synchronizing Role Instance Configuration..........272
5.8.10 Decommissioning and Recommissioning Role Instances..........273
5.8.11 Managing a Host..........274
5.8.12 Isolating a Host..........274
5.8.13 Canceling Isolation of a Host..........275
5.8.14 Starting and Stopping a Cluster..........275
5.8.15 Synchronizing Cluster Configurations..........275
5.8.16 Exporting Configuration Data of a Cluster..........276
5.9 Log Management..........276
5.9.1 Viewing and Exporting Audit Logs..........276
5.9.2 Exporting Services Logs..........278
5.9.3 Configuring Audit Log Dumping Parameters..........278
5.10 Health Check Management..........280
5.10.1 Performing a Health Check..........280

MapReduce Service User Guide Contents

2019-01-15 vi


5.10.2 Viewing and Exporting a Check Report..........281
5.10.3 Configuring the Number of Health Check Reports to Be Reserved..........282
5.10.4 Managing Health Check Reports..........283
5.11 Static Service Pool Management..........283
5.11.1 Viewing the Status of a Static Service Pool..........283
5.11.2 Configuring a Static Service Pool..........284
5.12 Tenant Management..........287
5.12.1 Introduction..........287
5.12.2 Creating a Tenant..........288
5.12.3 Creating a Sub-tenant..........291
5.12.4 Deleting a Tenant..........293
5.12.5 Managing a Tenant Directory..........294
5.12.6 Recovering Tenant Data..........295
5.12.7 Creating a Resource Pool..........296
5.12.8 Modifying a Resource Pool..........297
5.12.9 Deleting a Resource Pool..........297
5.12.10 Configuring a Queue..........298
5.12.11 Configuring the Queue Capacity Policy of a Resource Pool..........299
5.12.12 Clearing the Configuration of a Queue..........300
5.13 Backup and Restoration..........300
5.13.1 Introduction..........300
5.13.2 Backing Up Metadata..........303
5.13.3 Recovering Metadata..........305
5.13.4 Modifying a Backup Task..........307
5.13.5 Viewing Backup and Recovery Tasks..........308
5.14 Security Management..........309
5.14.1 Default Users of Clusters with Kerberos Authentication Disabled..........309
5.14.2 Changing the Password for an OS User..........312
5.14.3 Changing the Password for User admin..........313
5.14.4 Changing the Password for the Kerberos Administrator..........314
5.14.5 Changing the Password for the LDAP Administrator and the LDAP User..........315
5.14.6 Changing the Password for a Component Running User..........316
5.14.7 Changing the Password for the OMS Database Administrator..........317
5.14.8 Changing the Password for the Data Access User of the OMS Database..........318
5.14.9 Changing the Password for a Component Database User..........319
5.14.10 Replacing HA Certificates..........320
5.14.11 Updating the Key of a Cluster..........321

6 Management of Clusters with Kerberos Authentication Enabled..........323
6.1 Users and Permissions of Clusters with Kerberos Authentication Enabled..........323
6.2 Default Users of Clusters with Kerberos Authentication Enabled..........327
6.3 Creating a Role..........337
6.4 Creating a User Group..........343


6.5 Creating a User..........344
6.6 Modifying User Information..........345
6.7 Locking a User..........346
6.8 Unlocking a User..........346
6.9 Deleting a User..........347
6.10 Changing the Password of an Operation User..........347
6.11 Initializing the Password of a System User..........348
6.12 Downloading a User Authentication File..........349
6.13 Modifying a Password Policy..........350
6.14 Configuring Cross-Cluster Mutual Trust Relationships..........352
6.15 Configuring Users to Access Resources of a Trusted Cluster..........353

7 Using MRS..........355
7.1 Accessing the UI of the Open Source Component..........355
7.1.1 Overview..........355
7.1.2 Creating an SSH Channel for Connecting to an MRS Cluster..........357
7.1.3 Configuring a Website Accessed by Browsers..........359
7.2 Using Hadoop from Scratch..........360
7.3 Using Spark from Scratch..........364
7.4 Using Spark SQL from Scratch..........368
7.5 Using HBase from Scratch..........370
7.6 Using Hue..........374
7.6.1 Accessing the Hue WebUI..........374
7.6.2 Using HiveQL Editor on the Hue WebUI..........375
7.6.3 Using the Metadata Browser on the Hue WebUI..........377
7.6.4 Using File Browser on the Hue WebUI..........380
7.6.5 Using Job Browser on the Hue WebUI..........382
7.7 Using Kafka..........383
7.7.1 Managing Kafka Topics..........384
7.7.2 Querying Kafka Topics..........385
7.7.3 Managing Kafka User Permission..........385
7.7.4 Managing Messages in Kafka Topics..........387
7.8 Using Storm..........388
7.8.1 Submitting Storm Topologies on the Client..........388
7.8.2 Accessing the Storm WebUI..........389
7.8.3 Managing Storm Topologies..........390
7.8.4 Querying Storm Topology Logs..........391
7.9 Using CarbonData..........392
7.9.1 Getting Started with CarbonData..........392
7.9.2 About CarbonData Table..........394
7.9.3 Creating a CarbonData Table..........395
7.9.4 Deleting a CarbonData Table..........397
7.10 Using Flume..........397


7.10.1 Introduction..........397
7.10.2 Installing the Flume Client..........399
7.10.3 Viewing Flume Client Logs..........401
7.10.4 Stopping or Uninstalling the Flume Client..........402
7.10.5 Using the Encryption Tool of the Flume Client..........403
7.10.6 Flume Configuration Parameter Description..........403
7.10.7 Example: Using Flume to Collect Logs and Import Them to Kafka..........420
7.10.8 Example: Using Flume to Collect Logs and Import Them to OBS..........422
7.10.9 Example: Using Flume to Read OBS Files and Upload Them to HDFS..........424
7.11 Using Loader..........426
7.11.1 Introduction..........426
7.11.2 Loader Link Configuration..........427
7.11.3 Managing Loader Links..........430
7.11.4 Source Link Configurations of Loader Jobs..........431
7.11.5 Destination Link Configurations of Loader Jobs..........435
7.11.6 Managing Loader Jobs..........438
7.11.7 Preparing a Driver for MySQL Database Link..........441
7.11.8 Example: Using Loader to Import Data from OBS to HDFS..........441

8 FAQs..........444
8.1 What Is MRS?..........444
8.2 What Are the Highlights of MRS?..........444
8.3 What Is MRS Used For?..........445
8.4 How Do I Use MRS?..........445
8.5 How Do I Ensure Data and Service Running Security?..........446
8.6 How Do I Prepare a Data Source for MRS?..........446
8.7 What Is the Difference Between Data in OBS and That in HDFS?..........447
8.8 How Do I View All Clusters?..........448
8.9 How Do I View Log Information?..........448
8.10 What Types of Jobs Are Supported by MRS?..........448
8.11 How Do I Submit Developed Programs to MRS?..........449
8.12 How Do I View Cluster Configurations?..........450
8.13 What Types of Host Specifications Are Supported by MRS?..........450
8.14 What Components Are Supported by MRS?..........451
8.15 What Is the Relationship Between Spark and Hadoop?..........452
8.16 What Types of Spark Jobs Are Supported by an MRS Cluster?..........452
8.17 Can a Spark Cluster Access Data in OBS?..........452
8.18 What Is the Relationship Between Hive and Other Components?..........452
8.19 What Types of Distributed Storage Are Supported by MRS?..........453
8.20 Can MRS Cluster Nodes Be Changed on the MRS Management Console?..........453

A Change History......................................................................................................................... 454


1 Overview

1.1 Introduction

MapReduce Service (MRS) is a data processing and analysis service that is based on a cloud computing platform. It is stable, reliable, scalable, and easy to manage. You can use MRS immediately after applying for it.

MRS builds a reliable, secure, and easy-to-use operation and maintenance (O&M) platform. MRS is capable of processing and analyzing a large volume of data to meet your requirements on data storage and processing. You can independently apply for, use, and host the Hadoop, Spark, HBase, and Hive components to quickly create clusters on hosts for storing and computing, in batches, large volumes of data with low real-time requirements. After the data storage and computing tasks are complete, the clusters can be terminated.

1.2 Application Scenarios

MRS can be applied in various industries for the processing, analysis, and storage of massive data.

- Analysis and processing of mass data

  Usage: analysis and processing of massive sets of data, online and offline analysis, and business intelligence

  Characteristics: processing of massive data sets, heavy computing workloads, long-term analysis, and data analysis and processing on a large number of computers

  Application scenarios: log analysis, online and offline analysis, simulation calculations in scientific research, biometric analysis, and spatial-temporal data analysis

- Storage of mass data

  Usage: storage and retrieval of massive sets of data and data warehousing

  Characteristics: storage, retrieval, backup, and disaster recovery of massive sets of data with zero data loss

  Application scenarios: log storage, file storage, simulation data storage in scientific research, biological characteristic information storage, genetic engineering data storage, and spatial-temporal data storage

MapReduce ServiceUser Guide 1 Overview

2019-01-15 1

Page 11: Huawei › en-us › eu-west-0-user... · 2019-12-14 · Contents 1 Overview.........................................................................................................................................1

- Streaming processing of mass data

  Usage: real-time analysis of mass data, continuous computing, and offline and online message consumption

  Characteristics: massive amounts of data, high throughput, high reliability, flexible scalability, and a distributed real-time computing framework

  Application scenarios: streaming data collection, active tracking on websites, data monitoring, distributed ETL, and risk control

1.3 Functions

1.3.1 Cluster Management Function

This section describes the Web interface functions of MRS clusters.

MRS provides a Web interface, the functions of which are described as follows:

- Creating a cluster

  Users can create a cluster on MRS. The application scenarios of a cluster are as follows:

  – Data storage and computing are performed separately. Data is stored in the Object Storage Service (OBS), which features low cost and unlimited storage capacity, and clusters can be terminated at any time. The computing performance is determined by the OBS access performance and is lower than that of the Hadoop Distributed File System (HDFS). OBS is recommended when data computing is infrequent.

  – Data storage and computing are performed together. Data is stored in HDFS, which features high cost, high computing performance, and limited storage capacity. Before terminating clusters, you must export and store the data. HDFS is recommended when data computing is frequent.

- Expanding clusters

  To expand clusters and handle peak service loads, add core nodes.

- Managing clusters

  After completing data processing and analysis, you can manage and terminate clusters.

  – Querying alarms: if either the system or a cluster is faulty, Elastic BigData will collect fault information and report it to the network management system. Maintenance personnel will then be able to locate the faults.

  – Querying logs: operation information is recorded to help locate faults in the case of faulty clusters.

  – Managing files: MRS supports the ability to import data from the OBS system to HDFS and also to export data that has already been processed and analyzed. You can store data in HDFS.

- Adding a job

  A job is an executable program provided by MRS to process and analyze user data. Currently, MRS supports MapReduce jobs, Spark jobs, and Hive jobs, and allows users to submit Spark SQL statements online to query and analyze data.


- Managing jobs

  Jobs can be managed, stopped, or deleted. You can also view the details and configurations of completed jobs. Spark SQL jobs, however, cannot be stopped.

- Providing management interfaces

  MRS Manager functions as a unified management platform for MRS clusters.

  – Cluster monitoring enables you to quickly see the health status of hosts and services.

  – Graphical indicator monitoring and customization enable you to quickly obtain key information about the system.

  – Service property configurations help meet service performance requirements.

  – Cluster, service, and role instance operations enable you to start or stop services and clusters in one-click mode.

1.3.2 Hadoop

MRS deploys and hosts Apache Hadoop clusters in the cloud to provide services featuring high availability and enhanced reliability for big data processing and analysis.

Hadoop is a distributed system architecture that consists of HDFS, MapReduce, and Yarn. The following describes the functions of each component:

- HDFS

  HDFS provides high-throughput data access and is applicable to the processing of large data sets. MRS cluster data is stored in HDFS.

- MapReduce

  As a programming model that simplifies parallel computing, MapReduce gets its name from its two key operations: Map and Reduce. Map divides one task into multiple tasks, and Reduce summarizes their processing results and produces the final analysis result. MRS clusters allow users to submit self-developed MapReduce programs, execute them, and obtain the results.

- Yarn

  As the resource management system of Hadoop, Yarn manages and schedules resources for applications. MRS uses Yarn to schedule and manage cluster resources.
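The Map and Reduce operations described above can be sketched in a few lines of plain Python. This is a conceptual illustration only, not the Hadoop API; the sample lines and the word-count task are made up for the example.

```python
# Map: split the input into per-record (word, 1) pairs; in a real cluster
# these tasks run in parallel on different nodes.
# Reduce: summarize the intermediate pairs into the final per-word counts.
from collections import Counter
from itertools import chain

lines = ["to be or not to be", "to do or not to do"]

def map_phase(line):
    return [(word, 1) for word in line.split()]

def reduce_phase(records):
    counts = Counter()
    for word, n in records:
        counts[word] += n
    return dict(counts)

word_counts = reduce_phase(chain.from_iterable(map_phase(l) for l in lines))
print(word_counts["to"])  # 4
```

In an MRS cluster, the same split-then-summarize structure is expressed as a MapReduce program that users submit as a job.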

For details about Hadoop architecture and principles, see http://hadoop.apache.org/docs/stable/index.html.

1.3.3 Spark

Spark is a distributed and parallel data processing framework. MRS deploys and hosts Apache Spark clusters in the cloud.

Spark is a fault-tolerant, memory-based distributed computing framework that ensures data can be quickly restored and recalculated. It is more efficient than MapReduce for iterative data computing.

In the Hadoop ecosystem, Spark and Hadoop are seamlessly interconnected. By using HDFS for data storage and Yarn for resource management and scheduling, users can switch from MapReduce to Spark quickly.

Spark applies to the following scenarios:

MapReduce ServiceUser Guide 1 Overview

2019-01-15 3

Page 13: Huawei › en-us › eu-west-0-user... · 2019-12-14 · Contents 1 Overview.........................................................................................................................................1

- Data processing and ETL (extract, transform, and load)

- Machine learning

- Interactive analysis

- Iterative computing and data reuse. Users benefit more from Spark when they perform operations frequently and the volume of the required data is large.

- On-demand capacity expansion, thanks to Spark's ease of use and low cost in the cloud.
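The iterative computing and data reuse point above can be illustrated with a plain-Python sketch. This is not the Spark API; the `Dataset` class and its load counter are made up to show the caching idea: Spark keeps a data set in memory so that repeated passes reuse it instead of re-reading it from disk, which is why iterative jobs run faster than with MapReduce.

```python
class Dataset:
    """Simulates a data set that is expensive to load but cached after first use."""

    def __init__(self, loader):
        self._loader = loader  # stands in for an expensive disk or network read
        self._cache = None
        self.loads = 0         # counts how often the "disk" was actually read

    def collect(self):
        if self._cache is None:  # first access: load once and keep in memory
            self._cache = self._loader()
            self.loads += 1
        return self._cache

data = Dataset(lambda: list(range(1, 6)))

# Three passes over the same data set, as in an iterative algorithm.
totals = [sum(x * i for x in data.collect()) for i in (1, 2, 3)]

print(totals)      # [15, 30, 45]
print(data.loads)  # 1 -- the expensive load ran only once
```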

For details about Spark architecture and principles, see http://spark.apache.org/docs/2.1.0/quick-start.html.

1.3.4 Spark SQL

Spark SQL is an important component of Apache Spark and subsumes Shark. It helps engineers unfamiliar with MapReduce get started quickly. Users can enter SQL statements directly to analyze, process, and query data.

Spark SQL has the following highlights:

- Is compatible with most Hive syntax, which enables seamless switchovers.

- Is compatible with standard SQL syntax.

- Resolves data skew problems.

  Spark SQL can join and convert skewed data. It evenly distributes data without skewed keys to different tasks for processing. For data with skewed keys, Spark SQL broadcasts the smaller data set and uses a map-side join to evenly distribute the data across tasks. This fully utilizes CPU resources and improves performance.

- Optimizes small files.

  Spark SQL employs the coalesce operator to process small files, combining the partitions generated by small files in tables. This reduces the number of hash buckets during a shuffle operation and improves performance.
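The map-side (broadcast) join used for skewed keys can be illustrated with a plain-Python sketch. This is not the Spark SQL API; the tables and key names are made up. The smaller table is broadcast to every task, so rows sharing a skewed key are joined locally instead of being shuffled onto one overloaded task.

```python
# Large, skewed side: most rows share the key "a".
orders = [("a", 10), ("a", 20), ("a", 30), ("b", 40)]

# Small side: fits in memory, so it can be broadcast to every task.
regions = {"a": "EU", "b": "US"}

def map_side_join(partition, small_table):
    # Each task joins its own partition against the broadcast copy, so no
    # shuffle gathers all the "a" rows onto a single task.
    return [(key, amount, small_table[key]) for key, amount in partition]

# Simulate two tasks, each holding a slice of the skewed data.
partitions = [orders[:2], orders[2:]]
joined = [row for part in partitions for row in map_side_join(part, regions)]
print(joined)
```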

For details about Spark SQL architecture and principles, see http://spark.apache.org/docs/2.1.0/programming-guide.html.

1.3.5 HBase

HBase is a column-oriented distributed cloud storage system. It features enhanced reliability, excellent performance, and elastic scalability.

It is applicable to distributed computing and the storage of massive data. With HBase, users can filter and analyze data with ease and get responses in milliseconds, thereby rapidly mining data.

HBase applies to the following scenarios:

- Massive data storage

  Users can use HBase to build a storage system capable of storing TB or PB of data. It also provides dynamic scaling capabilities so that users can adjust cluster resources to meet specific performance or capacity requirements.

- Real-time query


The columnar and key-value storage models apply to the ad-hoc querying of enterprise user details. The low-latency point query, based on the master key, reduces the response latency to seconds or milliseconds, facilitating real-time data analysis.
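As an illustrative sketch (not taken from this guide), such a point query can be issued from the standard HBase shell. The table name, column family, and row key below are hypothetical, and a running HBase cluster with a configured client is required:

```shell
# Hypothetical table 't1' with column family 'cf'; requires an HBase client and cluster.
hbase shell <<'EOF'
create 't1', 'cf'
put 't1', 'user001', 'cf:name', 'Alice'
get 't1', 'user001'
EOF
```

The get command retrieves a single row directly by its row key, which is what keeps point-query latency at the millisecond level.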

HBase has the following highlights:

l Provides automatic Region recovery from an exception, ensuring reliability of data access.

l Enables data imported to the active cluster using BulkLoad to be automatically synchronized to the disaster recovery backup cluster. HBase also enhances the Replication feature, for example, supporting table structure synchronization, data synchronization between tables with system permissions, and the cluster read-only function.

l Improves performance of the BulkLoad feature, accelerating data import.

For details about HBase architecture and principles, see http://hbase.apache.org/book.html.

1.3.6 Hive
Hive is a data warehouse framework built on Hadoop. It stores structured data using the Hive query language (HiveQL), a language similar to SQL.

Hive converts HiveQL statements to MapReduce or HDFS tasks to query and analyze massive data stored in Hadoop clusters. The console provides an interface for entering Hive Script and supports the online submission of HiveQL statements.

Hive supports the HDFS Colocation, column encryption, HBase deletion, row delimiter, and CSV SerDe functions, as detailed below.

HDFS Colocation

HDFS Colocation is the data location control function provided by HDFS. The HDFS Colocation interface stores associated data, or data on which associated operations are performed, on the same storage node.

Hive supports the HDFS Colocation function. When Hive tables are created, after the locator information is set for table files, the data files of related tables are stored on the same storage node. This ensures convenient and efficient data computing among associated tables.

Column Encryption

Hive supports encryption of one or more columns. The columns to be encrypted and the encryption algorithm can be specified when a Hive table is created. When data is inserted into the table using the insert statement, the related columns are encrypted.

The Hive column encryption mechanism supports two encryption algorithms that can be selected to meet site requirements during table creation:

l AES (the encryption class is org.apache.hadoop.hive.serde2.AESRewriter)

l SMS4 (the encryption class is org.apache.hadoop.hive.serde2.SMS4Rewriter)

HBase Deletion

Due to the limitations of the underlying storage system, Hive does not support deleting a single piece of table data. In Hive on HBase scenarios, MRS Hive supports deleting a single piece of HBase table data: using a specific syntax, Hive can delete one or more pieces of data from an HBase table.

Row Delimiter

In most cases, a carriage return character is used as the row delimiter in Hive tables stored in text files; that is, the carriage return character is used as the terminator of a row during searches. However, some data files are delimited by special characters rather than a carriage return character.

MRS Hive allows users to use different characters or character combinations to delimit rows of Hive text data. When creating a table, set inputformat to SpecifiedDelimiterInputFormat, and set the following parameter before each search:

set hive.textinput.record.delimiter='';

The table data is then queried by the specified delimiter.
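A hedged sketch of this sequence in HiveQL follows. The fully qualified class path of SpecifiedDelimiterInputFormat is MRS-specific and shown here only as an assumption; the table name and delimiter value are likewise examples:

```sql
-- Sketch only: the INPUTFORMAT class path below is an assumed MRS-specific name.
CREATE TABLE log_text (line STRING)
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SpecifiedDelimiterInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

-- Set the row delimiter before each search, as described above.
set hive.textinput.record.delimiter='!@!';

SELECT * FROM log_text;
```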

CSV SerDe

Comma-separated values (CSV) is a common text file format. A CSV file stores table data (numbers and text) as plain text and uses a comma (,) as the field delimiter.

CSV files are universal. Many applications allow users to view and edit CSV files in Windows Office or conventional databases.

MRS Hive supports CSV files. Users can import CSV files into Hive tables or export Hive table data as CSV files for use in other applications.
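As a sketch, the open-source Hive distribution ships a CSV SerDe (org.apache.hadoop.hive.serde2.OpenCSVSerde, Hive 0.14 and later) that can back such a table; whether MRS Hive uses this exact class is an assumption, and the table and file names are examples:

```sql
-- OpenCSVSerde treats all columns as STRING, hence the column types below.
CREATE TABLE csv_import (id STRING, name STRING, amount STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "\""
)
STORED AS TEXTFILE;

-- Load a CSV file already present in HDFS into the table.
LOAD DATA INPATH '/user/example/data.csv' INTO TABLE csv_import;
```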

1.3.7 Hue
Hue is a web application developed based on the open-source Django Python web framework. It provides graphical user interfaces (GUIs) for users to configure, use, and view MRS clusters. Hue supports HDFS, Hive, MapReduce, and ZooKeeper in MRS clusters, covering the following application scenarios:

l HDFS: You can create, view, modify, upload, and download files as well as create directories and modify directory permissions.

l Hive: You can edit and execute HiveQL and add, delete, modify, and query databases, tables, and views through MetaStore.

l MapReduce: You can check MapReduce tasks that are being executed or have finished in the clusters, including their status, start and end time, and run logs.

l ZooKeeper: You can check ZooKeeper status in the clusters.

For details about Hue, visit http://gethue.com/.

1.3.8 Kerberos Authentication

Overview

To ensure data security for users, MRS clusters provide user identity verification and user authentication functions. To enable all verification and authentication functions, you must enable Kerberos authentication when creating the cluster.

Identity Verification


The user identity verification function verifies the identity of a user when the user performs O&M operations or accesses service data in a cluster.

When a user performs operations such as restarting services or synchronizing cluster configurations in an MRS cluster on MRS Manager, the user must enter the password of the current account on MRS Manager.

Authentication

Users with different identities may have different permissions to access and use cluster resources. To ensure data security, users must be authenticated after identity verification.

Identity Verification

Clusters that support Kerberos authentication use the Kerberos protocol for identity verification. The Kerberos protocol supports mutual verification between clients and servers. This eliminates the risks incurred by sending user credentials over the network for simulated verification. In MRS clusters, KrbServer provides the Kerberos authentication function.

Kerberos User Object

In the Kerberos protocol, each user object is a principal. A complete principal consists of two parts: a username and a domain name. In O&M or application development scenarios, the user identity must be verified before a client connects to a server. Users for O&M and service operations in MRS clusters are classified into Human-machine and Machine-machine users. The passwords of Human-machine users are configured manually, while the passwords of Machine-machine users are randomly generated by the system.

Kerberos Authentication

Kerberos supports two authentication modes: password and keytab. The default verification validity period is 24 hours.

l Password verification: User identity is verified by entering the correct password. This mode mainly applies to O&M scenarios where Human-machine users are used. The configuration command is kinit username.

l Keytab verification: Keytab files contain users' security information. During keytab verification, the system automatically uses the encrypted credential information for verification, so users do not need to enter a password. This mode mainly applies to component application development scenarios where Machine-machine users are used. Keytab verification can also be configured using the kinit command.
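The two modes map to standard Kerberos client commands as sketched below; the usernames and keytab path are placeholders:

```shell
# Password mode (Human-machine user): prompts interactively for the password.
kinit opuser

# Keytab mode (Machine-machine user): authenticates without a password prompt.
kinit -kt /opt/client/user.keytab developuser

# Inspect the ticket cache and its validity period (24 hours by default here).
klist
```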

Authentication

After identity verification, the MRS system also authenticates users to ensure that they have limited or full permissions on cluster resources. If a user does not have permission to access cluster resources, the system administrator must grant the required permission to the user; otherwise, the user cannot access the resources.

1.3.9 Kafka
MRS deploys and hosts Kafka clusters in the cloud based on the open-source Apache Kafka. Kafka is a distributed, partitioned, replicated message publishing and subscription system. It provides features similar to Java Message Service (JMS) and has the following enhancements:

l Message persistency


Messages are stored in the storage space of clusters in persistence mode and can be used for batch consumption and real-time application programs. Data persistence prevents data loss.

l High throughput

High throughput is provided for message publishing and subscription.

l Reliability

Message processing methods such as At-Least Once, At-Most Once, and Exactly Once are provided.

l Distribution

A distributed system is easy to expand. When new Core nodes are added for capacity expansion, the MRS cluster detects the nodes on which Kafka is installed and adds them to the cluster without interrupting services.

Kafka applies to online and offline message consumption. It is ideal for network service data collection scenarios, such as conventional data collection, website activity tracing, data monitoring, and log collection.
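For orientation, the standard Kafka command-line tools of that era (0.10.x) can exercise publish and subscribe as sketched below; the broker and ZooKeeper addresses and the topic name are placeholders, and a running Kafka cluster is required:

```shell
# Create a topic (topic management went through ZooKeeper in Kafka 0.10).
kafka-topics.sh --create --zookeeper zk1:2181 \
  --topic log-events --partitions 3 --replication-factor 2

# Publish a message.
echo "hello" | kafka-console-producer.sh --broker-list broker1:9092 --topic log-events

# Consume from the beginning of the topic.
kafka-console-consumer.sh --bootstrap-server broker1:9092 \
  --topic log-events --from-beginning
```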

For details about Kafka architecture and principles, see https://kafka.apache.org/0100/documentation.html.

1.3.10 Storm
MRS deploys and hosts Storm clusters in the cloud based on the open-source Apache Storm. Storm is a distributed, reliable, fault-tolerant computing system that processes large-volume streaming data in real time. It is applicable to real-time analysis, continuous computing, and distributed extract, transform, and load (ETL). It has the following features:

l Distributed real-time computing

In a Storm cluster, each node runs multiple worker processes; each worker process creates multiple threads; each thread executes multiple tasks; and each task processes data concurrently.

l Fault tolerance

During message processing, if a node or a process is faulty, the message processing unit can be redeployed.

l Reliable messages

The At-Least Once, At-Most Once, and Exactly Once data processing methods are supported.

l Flexible topology defining and deployment

The Flux framework is used to define and deploy service topologies. If the service DAG is changed, users only need to modify the YAML domain-specific language (DSL) file and do not need to recompile or repackage service code.

l Integration with external components

Storm supports integration with external components such as Kafka, HDFS, and HBase. This facilitates the implementation of services that involve multiple data sources.

For details about Storm architecture and principles, see http://storm.apache.org/.

1.3.11 CarbonData
CarbonData is a new Apache Hadoop file format. It adopts advanced column-oriented storage, index, compression, and encoding technologies and stores data in HDFS to improve computing efficiency. It helps accelerate PB-level data queries and is applicable to faster interactive queries. CarbonData is also a high-performance analysis engine that integrates data sources with Spark. Users can execute Spark SQL statements to query and analyze data.

CarbonData has the following features:

l SQL

CarbonData is compatible with Spark SQL and supports SQL query operations performed through Spark SQL.

l Simple definition of table data sets

CarbonData supports defining and creating data sets by using user-friendly Data Definition Language (DDL) statements. CarbonData DDL is flexible and easy to use, and can define complex tables.

l Convenient data management

CarbonData provides various data management functions for data loading and maintenance. It can load historical data and incrementally load new data. The loaded data can be deleted according to the loading time, and specific data loading operations can be canceled.

l Quick query response

CarbonData features high-performance queries. It uses dedicated data formats and applies multiple index technologies, global dictionary encoding, and multiple push-down optimizations. The query speed is 10 times that of Spark SQL.

l Efficient data compression

CarbonData compresses data by combining lightweight and heavyweight compression algorithms. This saves 60% to 80% of data storage space and reduces hardware storage costs.
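As a minimal sketch of the DDL and query flow (the table and column names are examples, and a Spark session with CarbonData integration is assumed; older CarbonData releases use the STORED BY clause shown here):

```sql
CREATE TABLE sales (
  order_id STRING,
  country  STRING,
  amount   DOUBLE
)
STORED BY 'carbondata';

SELECT country, SUM(amount) AS total
FROM sales
GROUP BY country;
```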

For details about CarbonData architecture and principles, see http://carbondata.apache.org/.

1.3.12 Flume
Flume is a distributed and highly available system for massive log aggregation. Users can customize data transmitters in Flume to collect data. Flume can also perform simple processing on the data it receives.

Flume provides the following features:

l Collects and aggregates event stream data in a distributed approach.

l Collects log data.

l Supports dynamic configuration updates.

l Provides the context-based routing function.

l Supports load balancing and failover.

l Provides comprehensive scalability.
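A minimal single-agent configuration in the standard Apache Flume properties format is sketched below; the agent name, log path, and HDFS path are placeholders:

```properties
agent1.sources  = src1
agent1.channels = ch1
agent1.sinks    = sink1

# Tail an application log as the event source.
agent1.sources.src1.type = exec
agent1.sources.src1.command = tail -F /var/log/app/app.log
agent1.sources.src1.channels = ch1

# Buffer events in memory.
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 10000

# Write events to HDFS, partitioned by day.
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /user/flume/logs/%Y%m%d
agent1.sinks.sink1.hdfs.useLocalTimeStamp = true
agent1.sinks.sink1.channel = ch1
```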

For details about the Flume architecture and principles, see https://flume.apache.org/releases/1.6.0.html.

1.3.13 Loader
Loader is a data migration component developed based on Apache Sqoop. It accelerates and simplifies data migration between Hadoop and structured, semi-structured, and unstructured data sources. Loader can both import data into and export data out of MRS clusters.


Loader provides the following features:

l Uses a highly available service architecture.

l Supports data migration using a client.

l Manages data migration jobs.

l Supports data processing during migration.

l Runs migration jobs using MapReduce components.

For details about the Loader architecture and principles, see http://sqoop.apache.org/docs/1.99.7/index.html.

1.4 Relationships with Other Services
This section describes the relationships between MRS and other services.

l Virtual Private Cloud (VPC)

MRS clusters are created in the subnets of a VPC. VPCs provide secure, isolated, logical network environments for MRS clusters.

l Object Storage Service (OBS)

OBS stores the following user data:

– MRS job input data, such as user programs and data files

– MRS job output data, such as result files and log files of jobs

In MRS clusters, the HDFS, Hive, MapReduce, Yarn, Spark, Flume, and Loader modules can import data from or export data to OBS.

l Elastic Cloud Server (ECS)

Each node in an MRS cluster is an ECS.

l Identity and Access Management (IAM)

IAM provides authentication for MRS.

l Cloud Trace Service (CTS)

CTS provides operation records, including requests for operating MRS resources and the request results, for users to query, audit, and trace back recorded problems.

1.5 Required Permission for Using MRS
This section describes the permissions required for using MRS.

Permission

By default, the system provides user management permission and resource management permission. The user management permission is used to manage users, user groups, and their permissions. The resource management permission is used to manage user operations on cloud service resources.

See Table 1-1 or Permissions for details about MRS permissions.


Table 1-1 Permission list

Permission: MRS operation permission
Description: Users with this permission have full operation rights on MRS resources.
Setting: Two methods are available:
l Add the Tenant Administrator permission to the user group.
l Add the MRS Administrator, Server Administrator, Tenant Guest, and BSS Administrator permissions to the user group.

Permission: MRS query permission
Description: Users with this permission can:
l View overview information about MRS.
l Query MRS operation logs.
l Query MRS cluster lists, including existing clusters, historical clusters, and task lists.
l View basic cluster information and patch information.
l View job lists and job details.
l Query the HDFS file list and file operation records.
l Query the alarm list.
Setting: Add the Tenant Guest permission to the user group.

1.6 Limitations
Before using MRS, ensure that you have read and understood the following limitations.

l MRS clusters must be created in VPC subnets.

l You are advised to use any of the following browsers to access MRS:

– Google Chrome 36.0 or later

– Internet Explorer 9.0 or later

If you use Internet Explorer 9.0, you may fail to log in to the MRS management console because the user Administrator is disabled by default in some Windows systems, such as Windows 7 Ultimate. Internet Explorer automatically selects a system user for installation; as a result, Internet Explorer cannot access the management console. You are advised to reinstall Internet Explorer 9.0 or later as the administrator (recommended) or alternatively run Internet Explorer 9.0 as the administrator.


l To prevent illegal access, only assign access permission for security groups used by MRS where necessary.

l Do not perform the following operations because they will cause cluster exceptions:

– Deleting or modifying the default security group that is created when you create an MRS cluster.

– Powering off, restarting, or deleting cluster nodes displayed in ECS, changing or reinstalling their OS, or modifying their specifications when you use MRS.

– Deleting the processes, installed applications, or files that already exist on a cluster node.

– Deleting MRS cluster nodes. Deleted nodes will still be charged.

l If a cluster exception occurs when no incorrect operations have been performed, contact technical support engineers. The technical support engineers will ask you for your key and then perform troubleshooting.

l Changing the password may make services unavailable. For this reason, change the password on the ECS console.

l MRS clusters are still charged during exceptions. Contact technical support engineers to handle cluster exceptions.

l Plan the disks of cluster nodes based on service requirements. If you want to store a large volume of service data, add EVS disks or storage space to prevent insufficient storage space from affecting node running.

l The cluster nodes store only users' service data. Non-service data can be stored in OBS or on other ECS nodes.

l The cluster nodes only run MRS cluster programs. Other client applications or user service programs must be deployed on separate ECS nodes.


2 MRS Quick Start

2.1 Introduction to the Operation Process
MRS is easy to use and provides a user-friendly user interface (UI). By using computers connected in a cluster, you can run various tasks and process or store petabytes of data.

With Kerberos authentication disabled, a typical procedure for using MRS is as follows:

1. Prepare data.

Upload the local programs and data files to be computed to Object Storage Service (OBS).

2. Create a cluster.

Create a cluster before you use MRS. The cluster quantity is subject to the Elastic Cloud Server (ECS) quantity. Configure basic cluster information to complete cluster creation. You can submit a job at the same time as you create a cluster.

NOTE

When you create a cluster, only one new job can be added. If you need to add more jobs, perform Step 4.

3. Import data.

After an MRS cluster is successfully created, use the import function of the cluster to import OBS data to HDFS. An MRS cluster can process both OBS data and HDFS data.

4. Add a job.

After a cluster is created, you can analyze and process data by adding jobs. Note that MRS provides a platform for executing programs developed by users. You can submit, execute, and monitor such programs by using MRS. After a job is added, the job is in the Running state by default.

5. View the execution result.

The job operation takes a while. After job running is complete, go to the Job Management page and refresh the job list to view the execution results on the Job tab page.

You cannot re-execute a successful or failed job, but you can add or copy the job. After setting job parameters, you can submit the job again.

6. Terminate a cluster.


If you want to terminate a cluster after jobs are complete, click Terminate in Cluster. The cluster status changes from Running to Terminating. After the cluster is terminated, the cluster status changes to Terminated and the cluster is displayed in Historical Cluster.

2.2 Quick Start

2.2.1 Creating a Cluster
This section describes how to create a cluster using MRS.

Procedure

Step 1 Log in to the MRS management console.

Step 2 Click Create Cluster and open the Create Cluster page.

NOTE

Note the usage of quotas when you create a cluster. If the resource quotas are insufficient, apply for new quotas based on the prompted information and then create the cluster.

The following is a cluster configuration example:

l Cluster Name: This parameter can be set to the default system name. For ease of distinguishing and memorizing, it is recommended that the cluster name be set to a value consisting of the employee ID, the short spelling of the user's name, or the date, for example, mrs_20160907.

l AZ: Use the default value. If a cluster already exists in the region, you are advised to use a different region to create the cluster.

l VPC: Use the default value. If no virtual private cloud (VPC) exists, click View VPC to enter the VPC console and create a VPC.

l Subnet: Use the default value. If no subnet is created in the VPC, click Create Subnet to create a subnet in the corresponding VPC.

l Cluster Version: Use the default value MRS 1.5.0. The latest version of MRS is used by default; currently, the latest version is MRS 1.5.0.

l Cluster Type: Use the default value Analysis Cluster or select Streaming Cluster.

l Instance Specifications: Select s1.8xlarge.linux.mrs -- 32 vCPU, 128 GB for both the Master and Core nodes.

l Quantity: Retain the default number 2 for the Master nodes and set the number of Core nodes to 3.

l Data Disk: Indicates the Core node data disk storage space. Select Common I/O. The size is 100 GB.

l Key Pair: Select the key pair, for example SSHkey-bba1.pem, from the drop-down list. If you have obtained the private key file, select I acknowledge that I have obtained private key file SSHkey-bba1.pem and that without this file I will not be able to log in to my ECS. If no key pair is created, click View Key Pair and create or import a key pair. Then obtain the private key file.

l Logging: Select Disable. The default value is Enable.


l Kerberos Authentication: The default value is Disable.

l Component: For an analysis cluster, select components such as Spark, HBase, and Hive. For a streaming cluster, select components such as Kafka and Storm.

l Create Job: Do not add a job here and do not select Terminate the cluster after jobs are completed.

NOTE

MRS streaming clusters do not support Job Management or File Management. If the cluster type is Streaming Cluster, the Create Job area is not displayed on the cluster creation page.

Step 3 Click Create Now.

Step 4 Confirm cluster specifications and click Submit.

The cluster creation takes a while. The initial state of the created cluster is Starting. After the cluster is created successfully, the status is updated to Running. Please be patient.

----End

2.2.2 Managing Files
You can create directories, delete directories, and import, export, or delete files on the File Management page in an analysis cluster with Kerberos authentication disabled.

Background

Data to be processed by MRS is stored in either OBS or HDFS. OBS provides you with massive, highly reliable, and secure data storage capabilities at a low cost. You can view, manage, and use data through OBS Console or OBS Browser.

Importing Data

MRS supports data import from the OBS system to HDFS. This function is recommended if the data size is small, because the upload speed decreases as the file size increases.

Both files and folders containing files can be imported. The operations are as follows:

1. Log in to the MRS management console.

2. Choose Cluster > Active Cluster, select a cluster, and click its name to switch to the cluster information page.

3. Click File Management and go to the File Management tab page.

4. Select HDFS File List.

5. Click the data storage directory, for example, bd_app1.

bd_app1 is just an example. The storage directory can be any directory on the page. You can create a directory by clicking Create Folder.

6. Click Import Data to configure the paths for HDFS and OBS.

NOTE

When configuring the OBS or HDFS path, click Browse, select the file path, and click OK.

– The path for OBS

n Must start with s3a://. s3a:// is used by default.

n Empty folders cannot be imported.


n Directories and file names can contain letters, Chinese characters, digits, hyphens (-), or underscores (_), but cannot contain special characters (;|&><'$*?\).

n Directories and file names cannot start or end with spaces, but can have spaces between other characters.

n The full path of OBS contains a maximum of 1023 characters.

– The path for HDFS

n Must start with /user.

n Directories and file names can contain letters, Chinese characters, digits, hyphens (-), or underscores (_), but cannot contain special characters (;|&><'$*?\).

n Directories and file names cannot start or end with spaces, but can have spaces between other characters.

n The full path of HDFS contains a maximum of 1023 characters.

n The parent HDFS directory in HDFS File List is displayed in the text box for the HDFS path by default when data is imported.

7. Click OK.

View the upload progress in File Operation Record. The data import operation is run as a DistCp job by MRS. You can check whether the DistCp job is successfully executed in Job Management > Job.
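On the command line, the equivalent DistCp copy looks roughly like the following; the bucket name and paths are placeholders, and the cluster must have OBS access credentials configured:

```shell
# Copy an OBS folder into HDFS, as the console's Import Data operation does.
hadoop distcp s3a://my-bucket/input/ /user/bd_app1/input/
```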

Exporting Data

After data is processed and analyzed, you can either store the data in HDFS or export it to the OBS system.

Both files and folders containing files can be exported. The operations are as follows:

1. Log in to the MRS management console.

2. Choose Cluster > Active Cluster, select a cluster, and click its name to switch to the cluster information page.

3. Click File Management and go to the File Management tab page.

4. Select HDFS File List.

5. Click the data storage directory, for example, bd_app1.

6. Click Export Data and configure the paths for HDFS and OBS.

NOTE

When configuring the OBS or HDFS path, click Browse, select the file path, and click OK.

– The path for OBS

n Must start with s3a://. s3a:// is used by default.

n Empty folders cannot be imported.

n Directories and file names can contain letters, Chinese characters, digits, hyphens (-), or underscores (_), but cannot contain special characters (;|&><'$*?\).

n Directories and file names cannot start or end with spaces, but can have spaces between other characters.

n The full path of OBS contains a maximum of 1023 characters.


– The path for HDFS

n Must start with /user.

n Directories and file names can contain letters, Chinese characters, digits, hyphens (-), or underscores (_), but cannot contain special characters (;|&><'$*?\).

n Directories and file names cannot start or end with spaces, but can have spaces between other characters.

n The full path of HDFS contains a maximum of 1023 characters.

n The parent HDFS directory in HDFS File List is displayed in the text box for the HDFS path by default when data is exported.

NOTE

Ensure that the exported folder is not empty. If an empty folder is exported to the OBS system, the folder is exported as a file. After the folder is exported, its name is changed, for example, from test to test-$folder$, and its type is file.

7. Click OK.

View the export progress in File Operation Record. The data export operation is run as a DistCp job by MRS. You can check whether the DistCp job is successfully executed in Job Management > Job.

2.2.3 Creating a Job
You can submit developed programs to MRS, execute them, and obtain the execution result on the Job Management page in an analysis cluster with Kerberos authentication disabled.

Prerequisites

Before creating jobs, upload the local data to OBS for computing and analysis. MRS allows data to be imported from OBS to HDFS for computing and analysis. After the analysis and computing are complete, you can either store the data in HDFS or export it to OBS. HDFS and OBS can store compressed data in bz2 or gz format.

Procedure

Step 1 Log in to the MRS management console.

Step 2 Choose Cluster > Active Cluster, select a cluster, and click its name to switch to the cluster information page.

Step 3 Click Job Management and go to the Job Management tab page.

Step 4 On the Job tab page, click Create and go to the Create Job page.

Table 2-1 describes job configuration information.


Table 2-1 Job configuration information

Parameter Description

Type Job typePossible types include:l MapReducel Sparkl Spark Scriptl Hive Script

NOTETo add jobs of the Spark and Hive types, you need to select Spark and Hivecomponents when creating a cluster and the cluster must be in the runningstate. Spark Script jobs support Spark SQL only, and Spark supports SparkCore and Spark SQL.

Name Job nameThis parameter consists of 1 to 64 characters, including letters, digits,hyphens (-), or underscores (_). It cannot be null.NOTE

Identical job names are allowed but not recommended.

Program Path  Address of the JAR file of the program that executes the job
NOTE
When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.
This parameter cannot be null and must meet the following requirements:
- A maximum of 1023 characters are allowed, but special characters (*?<">|\) are not allowed. The address cannot be empty or consist only of spaces.
- The path varies depending on the file system:
  – OBS: The path must start with s3a://, for example, s3a://wordcount/program/hadoop-mapreduce-examples-2.7.2.jar.
  – HDFS: The path must start with /user.
- A Spark Script program must end with .sql; MapReduce and Spark programs must end with .jar. The .sql and .jar extensions are case-insensitive.

Parameters  Key parameters for executing the job
These parameters are processed by an internal function of the program; MRS is only responsible for passing them in. Format: <package name>.<class name>. A maximum of 2047 characters are allowed, but special characters (;|&>',<$) are not allowed. This parameter can be empty.


Import From  Address of the input data
NOTE
When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.
The path varies depending on the file system:
- OBS: The path must start with s3a://.
- HDFS: The path must start with /user.
A maximum of 1023 characters are allowed, but special characters (*?<">|\) are not allowed. This parameter can be empty.

Export To  Address of the output data
NOTE
When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.
The path varies depending on the file system:
- OBS: The path must start with s3a://.
- HDFS: The path must start with /user.
A maximum of 1023 characters are allowed, but special characters (*?<">|\) are not allowed. This parameter can be empty.

Log Path  Address for storing job logs that record the job running status
NOTE
When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.
The path varies depending on the file system:
- OBS: The path must start with s3a://.
- HDFS: The path must start with /user.
A maximum of 1023 characters are allowed, but special characters (*?<">|\) are not allowed. This parameter can be empty.

NOTE
- The OBS path supports s3a://, which is used by default.
- The full HDFS or OBS path contains a maximum of 1023 characters.
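Several of the rules in Table 2-1 can be checked on the client before a job is submitted. The following Python sketch validates the Name and Program Path fields against those rules; the function name, error messages, and return convention are illustrative, not part of any MRS API.

```python
import re

# Name rule from Table 2-1: 1 to 64 letters, digits, hyphens, or underscores.
NAME_RE = re.compile(r"^[A-Za-z0-9_-]{1,64}$")
# Characters forbidden in Program Path: * ? < " > | \
PATH_FORBIDDEN = set('*?<">|\\')

def validate_job(name: str, program_path: str, job_type: str) -> list[str]:
    """Return the list of rule violations; an empty list means the input passes."""
    errors = []
    if not NAME_RE.match(name):
        errors.append("name must be 1-64 letters, digits, hyphens, or underscores")
    if not program_path or len(program_path) > 1023 or not program_path.strip():
        errors.append("path must be non-empty, non-blank, and at most 1023 characters")
    if any(c in PATH_FORBIDDEN for c in program_path):
        errors.append("path contains a forbidden character")
    if not (program_path.startswith("s3a://") or program_path.startswith("/user")):
        errors.append("path must start with s3a:// (OBS) or /user (HDFS)")
    # Spark Script takes .sql files; MapReduce and Spark take .jar (case-insensitive).
    suffix = ".sql" if job_type == "Spark Script" else ".jar"
    if not program_path.lower().endswith(suffix):
        errors.append(f"{job_type} jobs require a {suffix} program file")
    return errors
```

For example, the documented sample path passes: `validate_job("wordcount", "s3a://wordcount/program/hadoop-mapreduce-examples-2.7.2.jar", "MapReduce")` returns an empty list.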

Step 5 Confirm job configuration information and click OK.

After jobs are added, you can manage them.

NOTE

By default, each cluster supports a maximum of 10 running jobs.

----End


3 Cluster Operation Guide

3.1 Overview

You can view the overall cluster status on the Dashboard > Overview page and obtain relevant MRS documents by clicking the document names under Helpful Links.

MRS helps you manage and analyze massive amounts of data. MRS is easy to use and allows you to create a cluster in about 20 minutes. You can add MapReduce, Spark, and Hive jobs to clusters to process and analyze user data. Additionally, processed data can be encrypted using Secure Sockets Layer (SSL) and transmitted to OBS, ensuring data security and integrity.

Cluster Status

Table 3-1 describes the possible states of each cluster on the MRS management console.

Table 3-1 Cluster status

Status Description

Starting A cluster is being created.

Running  A cluster has been created successfully and all components in the cluster are running properly.

Expanding  Core nodes are being added to a cluster.

Shrinking  The Shrinking state is displayed when a node is being deleted by any of the following operations: shutting down the node, deleting the node, changing the node OS, reinstalling the node OS, or modifying the node specifications.

Abnormal  Some components in the cluster are abnormal, so the cluster is abnormal.

Terminating A cluster is being terminated.

Failed Cluster creation, termination, or capacity expansion fails.

Terminated A cluster has been terminated.
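The states in Table 3-1 form a fixed vocabulary, which is convenient to model explicitly when scripting against the cluster list. A hypothetical Python sketch (not an MRS SDK type):

```python
from enum import Enum

class ClusterStatus(Enum):
    """Cluster states as listed in Table 3-1."""
    STARTING = "Starting"
    RUNNING = "Running"
    EXPANDING = "Expanding"
    SHRINKING = "Shrinking"
    ABNORMAL = "Abnormal"
    TERMINATING = "Terminating"
    FAILED = "Failed"
    TERMINATED = "Terminated"

def is_operable(status: ClusterStatus) -> bool:
    """Per the guide, only a running cluster accepts new jobs and expansion."""
    return status is ClusterStatus.RUNNING
```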


Job Status

Table 3-2 describes the status of jobs that you can add after logging in to the MRS management console.

Table 3-2 Job status

Status Description

Running A job is being executed.

Completed Job execution is complete and successful.

Terminated A job is stopped during execution.

Abnormal An error occurs during job execution or job execution fails.

3.2 Cluster List

The cluster list contains all clusters in MRS, and you can view clusters in any state. If there are many clusters, navigate through multiple pages to view them all.

As a platform for managing and analyzing massive data, MRS provides PB-level data processing capability. You can create multiple clusters; the number of clusters is limited by the number of available ECSs.

Active Cluster

Clusters are listed in chronological order by default, with the most recent cluster displayed at the top. Table 3-3 describes the parameters of the cluster list.

- Active Cluster: contains all clusters except those in the Failed or Terminated state.

- Failed Task: contains only the tasks in the Failed state. Task failures include:

– Cluster creation failure

– Cluster termination failure

– Cluster capacity expansion failure

Table 3-3 Parameters in the active cluster list

Parameter Description

Name Cluster name, which is set when a cluster is created.

ID  Unique identifier of a cluster, which is automatically assigned when the cluster is created. This parameter is displayed in Active Cluster only.


Nodes  Number of nodes deployed in a cluster. This parameter is set when the cluster is created.
NOTE
A small value may cause the cluster to run slowly. Set an appropriate value based on the amount of data to be processed.

Status  Status of a cluster.

Created  Time when MRS starts charging the customer for the cluster.

AZ  An availability zone of the working zone in the cluster, which is set when the cluster is created.

Operation  Terminate: To terminate a cluster after jobs are complete, click Terminate. The cluster status changes from Running to Terminating, and then to Terminated once termination finishes; the cluster is then displayed in Historical Cluster. If an MRS cluster fails to be deployed, it is terminated automatically. This parameter is displayed in Active Cluster only.
NOTE
If a cluster is terminated before data processing and analysis are complete, data loss may occur. Exercise caution when terminating a cluster.

Table 3-4 Button description

Button Description

In the drop-down list, select a state to filter clusters:
- Active Cluster
  – All: displays all existing clusters.
  – Starting: displays existing clusters in the Starting state.
  – Running: displays existing clusters in the Running state.
  – Expanding: displays existing clusters in the Expanding state.
  – Shrinking: displays existing clusters in the Shrinking state.
  – Abnormal: displays existing clusters in the Abnormal state.
  – Terminating: displays existing clusters in the Terminating state.
  – Frozen: displays existing clusters in the Frozen state.

Enter a cluster name in the search bar and click the search icon to search for a cluster.

Click the refresh icon to manually refresh the cluster list.


Historical Cluster

Historical Cluster contains only the clusters in the Failed or Terminated state. Only clusters terminated within the last six months are displayed. To view clusters terminated more than six months ago, contact technical support.

Table 3-5 Parameters in the historical cluster list

Parameter Description

Name Cluster name, which is set when a cluster is created.

Nodes  Number of nodes deployed in a cluster. This parameter is set when the cluster is created.
NOTE
A small value may cause the cluster to run slowly. Set an appropriate value based on the amount of data to be processed.

Status Status of a cluster.

Created  Time when MRS starts charging the customer for the cluster.

Terminated  Termination time of the cluster, that is, the time when charging for the cluster stops. This parameter is displayed in Historical Cluster only.

AZ  An availability zone of the working zone in the cluster, which is set when the cluster is created.

Table 3-6 Button description

Button Description

Enter a cluster name in the search bar and click the search icon to search for a cluster.

Click the refresh icon to manually refresh the cluster list.

3.3 Creating a Cluster

This section describes how to create a cluster using MRS.

Procedure

Step 1 Log in to the MRS management console.

Step 2 Click Create Cluster to open the Create Cluster page.

NOTE

Pay attention to quota usage when you create a cluster. If the resource quotas are insufficient, apply for new quotas as prompted and then create the cluster.


Step 3 Table 3-7, Table 3-8, Table 3-9, Table 3-10, Table 3-11, and Table 3-12 describe the basic configuration, node configuration, login, log management, component, and job configuration information for a cluster, respectively.

Table 3-7 Basic cluster information

Parameter Description

Cluster Name  Cluster name, which is globally unique.
A cluster name can contain only 1 to 64 characters, including letters, digits, hyphens (-), or underscores (_).
The default name is mrs_xxxx, where xxxx is a random combination of four letters and numbers.

AZ  An availability zone (AZ) is a physical area with independent power and network resources. AZs are interconnected through internal networks but physically isolated from one another, which improves application availability. It is recommended that you create clusters in different AZs.
Select an AZ of the working zone for the cluster. Currently, only the eu-west and as-south working zones are supported.

VPC  A VPC is a secure, isolated, and logical network environment.
Select the VPC for which you want to create a cluster and click View VPC to view its name and ID. If no VPC is available, create one.

Subnet  A subnet provides dedicated network resources that are isolated from other networks, improving network security.
Select the subnet for which you want to create a cluster to enter the VPC and view the subnet's name and ID.
If no subnet is created under the VPC, click Create Subnet to create one.

Cluster Version  Currently, MRS 1.3.0 and MRS 1.5.0 are supported.
The latest version of MRS is used by default. Currently, the latest version is MRS 1.5.0.

Cluster Type  MRS 1.3.0 and MRS 1.5.0 provide two types of clusters:
- Analysis cluster: used for offline data analysis; provides Hadoop components.
- Streaming cluster: used for streaming tasks; provides stream processing components.
NOTE
MRS streaming clusters do not support Job Management or File Management. If the cluster type is Streaming Cluster, the Create Job area is not displayed on the cluster creation page.
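The default-name rule in Table 3-7 ("mrs_xxxx, where xxxx is a random combination of four letters and numbers") can be sketched as follows; the exact character pool and casing the console uses are assumptions.

```python
import random
import re
import string

# Cluster-name rule from Table 3-7: 1-64 letters, digits, hyphens, or underscores.
NAME_RE = re.compile(r"^[A-Za-z0-9_-]{1,64}$")

def default_cluster_name() -> str:
    """Generate a console-style default name such as mrs_a1b2
    (assumed pool: lowercase letters and digits)."""
    suffix = "".join(random.choices(string.ascii_lowercase + string.digits, k=4))
    return f"mrs_{suffix}"
```

Any name generated this way also satisfies the 1-to-64-character naming rule above.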


Table 3-8 Cluster node information

Parameter Description

Node Type  MRS provides two types of nodes:
- Master: A Master node in an MRS cluster manages the cluster, assigns cluster executable files to Core nodes, tracks the execution status of each job, and monitors DataNode running status.
- Core: A Core node in a cluster processes data and stores processed data in HDFS.


Instance Specifications  Instance specifications of a node
MRS supports host specifications determined by CPU, memory, and disk space.
Currently, Master nodes support the s1.4xlarge and s1.8xlarge specifications.
Core nodes of a streaming cluster support the s1.xlarge, s1.2xlarge, s1.4xlarge, s1.8xlarge, c2.2xlarge, and d1.8xlarge specifications.
Core nodes of an analysis cluster support all of the following specifications:
- s1.xlarge.linux.mrs -- 4 vCPU, 16 GB
  – CPU: 4-core
  – Memory: 16 GB
  – System Disk: 40 GB
- d1.xlarge.linux.mrs -- 6 vCPU, 55 GB
  – CPU: 6-core
  – Memory: 55 GB
  – System Disk: 40 GB
  – Data Disk: 1.8 TB x 3 HDDs
- c2.2xlarge.linux.mrs -- 8 vCPU, 16 GB
  – CPU: 8-core
  – Memory: 16 GB
  – System Disk: 40 GB
- d1.2xlarge.linux.mrs -- 12 vCPU, 110 GB
  – CPU: 12-core
  – Memory: 110 GB
  – System Disk: 40 GB
  – Data Disk: 1.8 TB x 6 HDDs
- s1.4xlarge.linux.mrs -- 16 vCPU, 64 GB
  – CPU: 16-core
  – Memory: 64 GB
  – System Disk: 40 GB
- d1.4xlarge.linux.mrs -- 24 vCPU, 220 GB
  – CPU: 24-core
  – Memory: 220 GB
  – System Disk: 40 GB
  – Data Disk: 1.8 TB x 12 HDDs
- s1.8xlarge.linux.mrs -- 32 vCPU, 128 GB
  – CPU: 32-core
  – Memory: 128 GB
  – System Disk: 40 GB
- d1.8xlarge.linux.mrs -- 48 vCPU, 440 GB
  – CPU: 48-core
  – Memory: 440 GB
  – System Disk: 40 GB
  – Data Disk: 1.8 TB x 24 HDDs
NOTE
More advanced instance specifications allow better data processing.

Quantity  Number of Master and Core nodes
Master: currently fixed at 2
Core: 3 to 100
NOTE
- By default, a maximum of 100 Core nodes are supported. If more than 100 Core nodes are required, contact technical support engineers or invoke a background interface to modify the database.
- A small number of nodes may cause clusters to run slowly. Set an appropriate value based on the data to be processed.

Data Disk  Disk space of Core nodes
You can add disks to increase storage capacity when creating a cluster. There are two different configurations for storage and computing:
- Data storage and computing are performed separately
  Data is stored in OBS, which features low cost and unlimited storage capacity, and the clusters can be terminated at any time. The computing performance is determined by OBS access performance and is lower than that of HDFS. This configuration is recommended when data computing is infrequent.
- Data storage and computing are performed together
  Data is stored in HDFS, which features high cost, high computing performance, and limited storage capacity. Before terminating clusters, you must export and store the data. This configuration is recommended when data computing is frequent.
The following disk types are supported:
- SATA: Common I/O
- SSD: Ultra-high I/O
Disk sizes range from 100 GB to 32000 GB in 10 GB increments, for example, 100 GB, 110 GB.
NOTE
- The Master node adds data disk storage space for MRS Manager. The disk type must be the same as the data disk type of Core nodes. The default disk space is 200 GB and cannot be changed.
- When the Core node specifications are d1.xlarge, d1.2xlarge, or d1.4xlarge, Data Disk is not displayed.
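The data disk size rule (100 GB to 32000 GB in 10 GB increments) is easy to check programmatically. A minimal sketch, with an illustrative function name:

```python
def is_valid_data_disk_size(size_gb: int) -> bool:
    """Table 3-8 rule: 100-32000 GB, in 10 GB steps (100, 110, 120, ...)."""
    return 100 <= size_gb <= 32000 and size_gb % 10 == 0
```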


Table 3-9 Login information

Parameter Description

Key Pair  Keys are used to log in to Master1 of the cluster.
A key pair, also called an SSH key, consists of a public key and a private key. You can create an SSH key and download the private key for authenticating remote login. For security, a private key can be downloaded only once; keep it secure.
Select the key pair, for example SSHkey-bba1.pem, from the drop-down list. If you have obtained the private key file, select I acknowledge that I have obtained private key file SSHkey-bba1.pem and that without this file I will not be able to log in to my ECS. If no key pair is created, click View Key Pair to create or import keys, and then obtain the private key file.
Configure an SSH key using either of the following methods:
- Create an SSH key
  After you create an SSH key, a public key and a private key are generated. The public key is stored in the system, and the private key is stored locally. When you log in to an ECS, the public and private keys are used for authentication.
- Import an SSH key
  If you have obtained the public and private keys, import the public key into the system. When you log in to an ECS, the public and private keys are used for authentication.

Table 3-10 Log management information

Parameter Description

Logging  Indicates whether the tenant has enabled the log collection function.
- Enable
- Disable

OBS Bucket  Indicates the log save path, for example, s3a://mrs_log_0adca19f25834f3597602094bec12990_eu-xx.
If an MRS cluster with logging enabled fails to be created, you can use OBS to download the related logs for troubleshooting.
Procedure:
1. Log in to the OBS management console.
2. Select the mrs-log-<tenant_id>-<region_id> bucket from the bucket list and go to the /<cluster_id>/install_log folder to download the YYYYMMDDHHMMSS.tar.gz log, for example, /mrs_log_0adca19f25834f3597602094bec12990_eu-xx/65d0a20f-bcb7-4da3-81d3-71fef12d993d/20170818091516.tar.gz.


Table 3-11 Component information

Parameter Description

Kerberos Authentication  Indicates whether to enable Kerberos authentication when logging in to MRS Manager. Possible values are as follows:
- If Kerberos authentication is disabled, you can use all functions of an MRS cluster. You are advised to disable Kerberos authentication in single-user scenarios. For clusters with Kerberos authentication disabled, you can directly access the MRS cluster management page and components without security authentication.
- If Kerberos authentication is enabled, common users cannot use the file management and job management functions of an MRS cluster and cannot view cluster resource usage or the job records for Hadoop and Spark. To use more cluster functions, the users must contact the MRS Manager administrator to assign more permissions. You are advised to enable Kerberos authentication in multi-user scenarios.
You can click the toggle to disable or enable Kerberos authentication.
After creating an MRS cluster with Kerberos authentication enabled, users can manage running clusters on MRS Manager. The users must prepare a working environment on the public cloud platform for accessing MRS Manager. For details, see Accessing MRS Manager Supporting Kerberos Authentication.
NOTE
The Kerberos Authentication, Username, Password, and Confirm Password parameters are displayed only after the user obtains the open beta test (OBT) permission for MRS in security mode.

Username  Indicates the username of the MRS Manager administrator. admin is used by default.
This parameter needs to be configured only when Kerberos Authentication is set to Enable.


Password  Indicates the password of the MRS Manager administrator.
The password for MRS 1.5.0:
- Must contain 6 to 32 characters.
- Must contain at least two types of the following:
  – Lowercase letters
  – Uppercase letters
  – Digits
  – Special characters: `~!@#$%^&*()-_=+\|[{}];:'",<.>/?
  – Spaces
- Must be different from the username.
- Must be different from the username written in reverse order.
The password for MRS 1.3.0:
- Must contain 8 to 64 characters.
- Must contain at least four types of the following:
  – Lowercase letters
  – Uppercase letters
  – Digits
  – Special characters: `~!@#$%^&*()-_=+\|[{}];:',<.>/?
  – Spaces
- Must be different from the username.
- Must be different from the username written in reverse order.
This parameter needs to be configured only when Kerberos Authentication is set to Enable.

Confirm Password  Enter the user password again.
This parameter needs to be configured only when Kerberos Authentication is set to Enable.


Component
- MRS 1.5.0 supports the following components:
  Components of an analysis cluster:
  – Hadoop 2.7.2: distributed system architecture
  – Spark 2.1.0: in-memory distributed computing framework
  – HBase 1.0.2: distributed column store database
  – Hive 1.2.1: data warehouse framework built on Hadoop
  – Hue 3.11.0: provides the Hadoop UI capability, which enables users to analyze and process Hadoop cluster data in browsers
  – Loader 2.0.0: a tool based on open-source Sqoop 1.99.7, designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases
  Hadoop is mandatory, and Spark and Hive must be used together. Select components based on service requirements.
  Components of a streaming cluster:
  – Kafka 0.10.0.0: distributed message subscription system
  – Storm 1.0.2: distributed real-time computing system
  – Flume 1.6.0: a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data
- MRS 1.3.0 supports the following components:
  Components of an analysis cluster:
  – Hadoop 2.7.2: distributed system architecture
  – Spark 1.5.1: in-memory distributed computing framework
  – HBase 1.0.2: distributed column store database
  – Hive 1.2.1: data warehouse framework built on Hadoop
  – Hue 3.11.0: provides the Hadoop UI capability, which enables users to analyze and process Hadoop cluster data in browsers
  Hadoop is mandatory, and Spark and Hive must be used together. Select components based on service requirements.
  NOTE
  After Kerberos Authentication is set to Enable, the Hue component can be selected, but the Create Job area is not displayed, meaning jobs cannot be created.
  Components of a streaming cluster:
  – Kafka 0.10.0.0: distributed message subscription system
  – Storm 1.0.2: distributed real-time computing system
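The MRS 1.5.0 admin-password rules in Table 3-11 can be pre-checked before submitting the form. A minimal Python sketch (the function is illustrative; the console performs its own validation):

```python
def check_mrs150_password(password: str, username: str) -> bool:
    """Check the MRS 1.5.0 rules from Table 3-11: 6 to 32 characters, at least
    two character classes, and not the username or the username reversed."""
    if not 6 <= len(password) <= 32:
        return False
    specials = "`~!@#$%^&*()-_=+\\|[{}];:'\",<.>/? "  # includes the space character
    classes = [
        any(c.islower() for c in password),
        any(c.isupper() for c in password),
        any(c.isdigit() for c in password),
        any(c in specials for c in password),
    ]
    if sum(classes) < 2:
        return False
    return password not in (username, username[::-1])
```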


Table 3-12 Job configuration information

Parameter Description

Create Now  Click Create Now to display the job creation area, and then click Create to display the job configuration options.

Create Later You can add job configuration information later.

Create Job  Click Create to submit a job at the same time as you create a cluster. Only one job can be added, and its status is Running after the cluster is created. For details, see Adding a Jar or Script Job.
You can add jobs only when Kerberos Authentication is set to Disable.

Name Name of a job

Type Type of a job

Parameter Key parameters for executing an application

Operation
- Edit: modifies job configurations.
- Delete: deletes a job.

Step 4 Click Create Now.

Step 5 Confirm cluster specifications and click Submit.

Cluster creation takes some time. While the cluster is being created, its status is Starting. After the cluster is created successfully, the cluster status becomes Running.

Users can create a maximum of 10 clusters at a time and manage a maximum of 100 clusters on the MRS management console.

NOTE

The name of a new cluster can be the same as that of a failed or terminated cluster.

----End

3.4 Managing Active Clusters

After an MRS cluster is created, you can view basic information and patch information about the cluster and access the cluster management page.

3.4.1 Viewing Basic Information About an Active Cluster

After clusters are created, you can monitor and manage them. Choose Cluster > Active Cluster, select a cluster, and click its name to switch to the cluster information page, where you can view information such as the cluster configuration, deployed nodes, and other basic details.

Table 3-13 and Table 3-14 describe the information about cluster configurations and nodes, respectively.


Table 3-13 Cluster configuration information

Parameter Description

Cluster ID  Unique identifier of a cluster. This parameter is automatically assigned when a cluster is created.

Cluster Name  Cluster name. This parameter is set when a cluster is created.

Key Pair  Key pair name. This parameter is set when a cluster is created.

Cluster Version  MRS version. Currently, MRS 1.3.0 and MRS 1.5.0 are supported. This parameter is set when a cluster is created.

Cluster Type  MRS 1.3.0 and MRS 1.5.0 provide two types of clusters:
- Analysis cluster: used for offline data analysis; provides Hadoop components.
- Streaming cluster: used for streaming tasks; provides stream processing components.
This parameter is set when a cluster is created.

AZ  An availability zone of the working zone in the cluster. This parameter is set when a cluster is created.

VPC  VPC information. This parameter is set when a cluster is created. A VPC is a secure, isolated, and logical network environment.

Subnet  Subnet information. This parameter is set when a cluster is created. A subnet provides dedicated network resources that are isolated from other networks, improving network security.

Master Node  Information about the Master nodes. Format: [instance specification | node quantity]

Core Node  Information about the Core nodes. Format: [instance specification | node quantity]

Active Master Node IP Address  IP address of the active Master node in a cluster, which is also the IP address of the active management node of MRS Manager.

Cluster Manager IP Address  Floating IP address for accessing MRS Manager. This parameter is displayed only after Kerberos authentication is enabled.

Created  Time when MRS starts charging the customer for the cluster.


Kerberos Authentication  Indicates whether Kerberos authentication is enabled when logging in to MRS Manager.

Logging  Indicates whether the tenant has enabled the log collection function.

Hadoop Version  Hadoop version

Spark Version  Spark version. Only a Spark cluster displays this version. Because Spark and Hive must be used together, the Spark and Hive versions are displayed at the same time.

HBase Version  HBase version. Only an HBase cluster displays this version.

Hive Version  Hive version. Only a Hive cluster displays this version.

Hue Version  Hue version. For MRS 1.3.0, this parameter is displayed only after Kerberos authentication is enabled. For MRS 1.5.0, this parameter is displayed regardless of whether Kerberos authentication is enabled.

Loader Version  Loader version. This parameter is displayed only when the MRS version is MRS 1.5.0.

Kafka Version  Kafka version. Only a Kafka cluster displays this version.

Storm Version  Storm version. Only a Storm cluster displays this version.

Flume Version  Flume version. This parameter is displayed only when the MRS version is MRS 1.5.0.

Table 3-14 Node information

Parameter Description

Add Node  For details about adding Core nodes to a cluster, see Expanding a Cluster.
Add Node is unavailable and capacity expansion is not allowed in any of the following situations:
- The cluster is not in the running state.
- The number of Core nodes has reached the maximum value (100).
- The cluster charging mode is not on-demand.

Name Name of a cluster node


Status Status of a cluster

Type  Node type
- Master: A Master node in an MRS cluster manages the cluster, assigns MapReduce executable files to Core nodes, tracks the execution status of each job, and monitors DataNode running status.
- Core: A Core node in a cluster processes data and stores processed data in HDFS.

IP Address IP address of a cluster node

Specifications  Instance specifications of a node. This parameter is determined by the CPU, memory, and disks used.
NOTE
More advanced instance specifications allow better data processing.

Default Security Group  Security group name for Master and Core nodes, which is automatically assigned when a cluster is created.
This is the default security group. Do not modify or delete it; doing so will cause a cluster exception.

Table 3-15 Button description

Button Description

Click the refresh icon to manually refresh the node list.

3.4.2 Viewing Patch Information About an Active Cluster

You can view patch information about cluster components. If a cluster component, such as Hadoop or Spark, is abnormal, download the patch, then choose Cluster > Active Cluster, select the cluster, and click its name to switch to the cluster information page, where you can upgrade the component to resolve the problem.

Patch Information is displayed on the basic information page only when patch information exists in the database. Patch information contains the following parameters:
- Patch Name: patch name set when the patch is uploaded to OBS
- Patch Path: location where the patch is stored in OBS
- Patch Description: patch description

3.4.3 Entering the Cluster Management Page

If Kerberos authentication is disabled, you can choose Cluster > Active Cluster, select a cluster and click its name to switch to the cluster information page, and then click Cluster Manager to go to the cluster management page. There you can view and handle alarms, modify cluster configurations, and upgrade cluster patches.

You can enter the cluster management page only for clusters in the Abnormal, Running, Expanding, or Shrinking state. For details about how to use the cluster management page, see MRS Manager Operation Guide.

3.4.4 Expanding a Cluster

The storage and computing capabilities of MRS can be improved simply by adding nodes, without modifying the system architecture, which reduces O&M costs. Core nodes can both compute and store data. You can add Core nodes to expand node capacity and handle peak loads.

Background

An MRS cluster supports a maximum of 102 nodes. A maximum of 100 Core nodes are supported by default. If more than 100 Core nodes are required, contact technical support engineers or invoke a background interface to modify the database.

Only Core nodes can be expanded; Master nodes cannot. The maximum number of nodes that can be added equals 100 minus the number of existing Core nodes. For example, if the number of existing Core nodes is 3, a maximum of 97 nodes can be added. If a cluster fails to be expanded, you can perform capacity expansion for the cluster again.
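The capacity arithmetic above can be sketched as follows (a minimal illustration; the function name is hypothetical, not an MRS API):

```python
CORE_NODE_LIMIT = 100  # default maximum number of Core nodes per cluster


def max_addable_core_nodes(existing_core_nodes):
    """Return how many Core nodes can still be added to a cluster."""
    if not 0 <= existing_core_nodes <= CORE_NODE_LIMIT:
        raise ValueError("existing Core node count out of range")
    return CORE_NODE_LIMIT - existing_core_nodes


# With 3 existing Core nodes, up to 97 more can be added.
print(max_addable_core_nodes(3))
```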

Procedure

Step 1 Log in to the MRS management console.

Step 2 Choose Cluster > Active Cluster, select a running cluster, and click its name. Select Node, and then click Add Node.

The expansion operation can be performed only on running clusters.

Step 3 Set Number of Nodes and click OK.

Cluster expansion is explained as follows:

- During expansion: The cluster status is Expanding. Submitted jobs continue to be executed, and you can submit new jobs, but you are not allowed to expand, restart, modify, or terminate the cluster.

- Successful expansion: The cluster status is Running. The resources used by both the old and new nodes are charged.

- Failed expansion: The cluster status is Running. You are allowed to execute jobs and expand the cluster again.

----End

3.4.5 Terminating a Cluster

If you do not need an MRS cluster after job execution is complete, you can terminate it.


Background

If a cluster is terminated before data processing and analysis are complete, data loss may occur. Therefore, exercise caution when terminating a cluster. If MRS cluster deployment fails, the cluster is terminated automatically.

Procedure

Step 1 Log in to the MRS management console.

Step 2 In the navigation tree of the MRS management console, choose Cluster > Active Cluster.

Step 3 In the Operation column of the cluster that you want to terminate, click Terminate.

The cluster status changes from Running to Terminating. After the cluster is terminated, its status changes to Terminated and the cluster is displayed in Historical Cluster.

----End

3.4.6 Deleting a Failed Task

This section describes how to delete a failed MRS task.

Background

If cluster creation, termination, or capacity expansion fails, the failed task is displayed on the Manage Failed Task page. A failed cluster termination task is also displayed on the Historical Cluster page. If you do not need a failed task, you can delete it on the Manage Failed Task page.

Procedure

Step 1 Log in to the MRS management console.

Step 2 In the navigation tree of the MRS management console, choose Cluster > Active Cluster.

Step 3 Click the icon next to Failed Task.

The Manage Failed Task page is displayed.

Step 4 In the Operation column of the task that you want to delete, click Delete.

This operation deletes only a single failed task.

Step 5 To delete all tasks, click Delete All in the upper left of the task list.

----End

3.4.7 Managing Jobs in an Active Cluster

For details about managing jobs in an active cluster, see Managing Jobs.

3.4.8 Managing Data Files

After Kerberos authentication is disabled, you can create and delete directories, and import, export, or delete files on the File Management page.


Background

Data to be processed by MRS is stored in either OBS or HDFS. OBS provides you with massive, highly reliable, and secure data storage capabilities at a low cost. You can view, manage, and use data through OBS Console or OBS Browser. In addition, you can use the REST APIs to manage or access data. The REST APIs can be used alone or integrated with service programs.

Before creating jobs, upload the local data to OBS for computing and analysis. MRS also allows data to be imported from OBS to HDFS for computing and analysis. After the analysis and computing are complete, you can either store the data in HDFS or export it to OBS. HDFS and OBS can store compressed data in bz2 or gz format.
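As a minimal, MRS-independent sketch of producing the two compressed formats mentioned above with the Python standard library:

```python
import bz2
import gzip

data = b"id,name\n1,alice\n2,bob\n"

# Compress the same payload in both formats MRS accepts (gz and bz2).
gz_bytes = gzip.compress(data)
bz2_bytes = bz2.compress(data)

# Both round-trip back to the original bytes.
assert gzip.decompress(gz_bytes) == data
assert bz2.decompress(bz2_bytes) == data
print(len(data), len(gz_bytes), len(bz2_bytes))
```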

Importing Data

MRS supports data import from OBS to HDFS. This function is recommended if the data size is small, because the upload speed decreases as the file size increases.

Both files and folders containing files can be imported. The operations are as follows:

1. Log in to the MRS management console.
2. Choose Cluster > Active Cluster, select a cluster, and click its name to switch to the cluster information page.
3. Click File Management and go to the File Management tab page.
4. Select HDFS File List.
5. Click the data storage directory, for example, bd_app1.
   bd_app1 is just an example. The storage directory can be any directory on the page. You can create a directory by clicking Create Folder. The name of the created directory must meet the following requirements:
   - Contains a maximum of 255 characters, and the full path contains a maximum of 1023 characters.
   - Cannot be empty.
   - Must be different from the names of other directories at the same level.
   - Cannot contain the special characters /:*?"<|>\.
   - Cannot start or end with a period (.).

6. Click Import Data and configure the OBS and HDFS paths.

NOTE

When configuring the OBS or HDFS path, click Browse, select the file path, and click OK.

- The OBS path:
  - Must start with s3a://. s3a:// is used by default.
  - Empty folders cannot be imported.
  - Directories and file names can contain letters, Chinese characters, digits, hyphens (-), or underscores (_), but cannot contain the special characters ;|&><'$*?\.
  - Directories and file names cannot start or end with spaces, but can have spaces between other characters.
  - The full OBS path contains a maximum of 1023 characters.
- The HDFS path:
  - Must start with /user.
  - Directories and file names can contain letters, Chinese characters, digits, hyphens (-), or underscores (_), but cannot contain the special characters ;|&><'$*?\.
  - Directories and file names cannot start or end with spaces, but can have spaces between other characters.
  - The full HDFS path contains a maximum of 1023 characters.
  - The parent HDFS directory in HDFS File List is displayed in the text box for the HDFS path by default when data is imported.

7. Click OK.

View the upload progress in File Operation Record. MRS runs the data import operation as a Distcp job. You can check whether the Distcp job is executed successfully in Job Management > Job.
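The OBS and HDFS path rules above can be captured in a small validation sketch (illustrative only; this is not code used by MRS, and the function name is hypothetical):

```python
FORBIDDEN = set(";|&><'$*?\\")  # special characters disallowed in names
MAX_FULL_PATH = 1023


def check_import_path(path, kind):
    """Validate an OBS or HDFS import path against the rules listed above."""
    if kind == "obs":
        if not path.startswith("s3a://"):
            return False
        rest = path[len("s3a://"):]
    elif kind == "hdfs":
        if not path.startswith("/user"):
            return False
        rest = path.lstrip("/")
    else:
        raise ValueError("kind must be 'obs' or 'hdfs'")
    if len(path) > MAX_FULL_PATH:
        return False
    for name in filter(None, rest.split("/")):
        if name != name.strip(" "):   # no leading or trailing spaces
            return False
        if FORBIDDEN & set(name):     # no special characters
            return False
    return True


print(check_import_path("s3a://wordcount/program", "obs"))   # True
print(check_import_path("/user/bd_app1/in put", "hdfs"))     # True: inner spaces allowed
print(check_import_path("/user/bad|name", "hdfs"))           # False
```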

Exporting Data

After data is processed and analyzed, you can either store the data in HDFS or export it to OBS.

Both files and folders containing files can be exported. The operations are as follows:

1. Log in to the MRS management console.
2. Choose Cluster > Active Cluster, select a cluster, and click its name to switch to the cluster information page.
3. Click File Management and go to the File Management tab page.
4. Select HDFS File List.
5. Click the data storage directory, for example, bd_app1.
6. Click Export Data and configure the HDFS and OBS paths.

NOTE

When configuring the OBS or HDFS path, click Browse, select the file path, and click OK.

- The OBS path:
  - Must start with s3a://. s3a:// is used by default.
  - Directories and file names can contain letters, Chinese characters, digits, hyphens (-), or underscores (_), but cannot contain the special characters ;|&><'$*?\.
  - Directories and file names cannot start or end with spaces, but can have spaces between other characters.
  - The full OBS path contains a maximum of 1023 characters.
- The HDFS path:
  - Must start with /user.
  - Directories and file names can contain letters, Chinese characters, digits, hyphens (-), or underscores (_), but cannot contain the special characters ;|&><'$*?\.
  - Directories and file names cannot start or end with spaces, but can have spaces between other characters.
  - The full HDFS path contains a maximum of 1023 characters.
  - The parent HDFS directory in HDFS File List is displayed in the text box for the HDFS path by default when data is exported.

NOTE

Ensure that the exported folder is not empty. If an empty folder is exported to OBS, the folder is exported as a file. After the export, the folder's name is changed, for example, from test to test-$folder$, and its type becomes file.

7. Click OK.

View the export progress in File Operation Record. MRS runs the data export operation as a Distcp job. You can check whether the Distcp job is executed successfully in Job Management > Job.

Viewing File Operation Records

When importing or exporting data on the MRS management console, you can choose File Management > File Operation Record to view the import or export progress.

Table 3-16 lists the parameters in file operation records.

Table 3-16 Parameters in file operation records

Parameter Description

Created: Time when the data import or export started

Source Path: Source path of the data
- For data import, Source Path is the OBS path.
- For data export, Source Path is the HDFS path.

Target Path: Target path of the data
- For data import, Target Path is the HDFS path.
- For data export, Target Path is the OBS path.

Status: Status of the data import or export operation
- Running
- Completed
- Terminated
- Abnormal

Duration (min): Total time used by the data import or export. Unit: minute

Result: Data import or export result
- Successful
- Failed

Operation: View Log: You can click View Log to view the real-time log of a running job. For details, see Viewing Job Configurations and Logs.


3.4.9 Viewing the Alarm List

The alarm list provides information about all alarms in the MRS cluster. Examples of alarms include host faults, disk usage exceeding the threshold, and component abnormalities.

In Alarm on the MRS management console, you can view only basic information about alarms that are not cleared in MRS Manager. If you want to view alarm details or manage alarms, log in to MRS Manager. For details, see Alarm Management.

Alarms are listed in chronological order by default in the alarm list, with the most recent alarms displayed at the top.
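The default ordering described above (most recent alarms first) can be sketched as follows (illustrative records only; the field names mirror Table 3-17):

```python
from datetime import datetime

# Hypothetical alarm records using the fields from Table 3-17.
alarms = [
    {"Severity": "Major", "Service": "HDFS",
     "Description": "Disk usage exceeds the threshold",
     "Generated": datetime(2019, 1, 15, 9, 30)},
    {"Severity": "Critical", "Service": "Yarn",
     "Description": "Host fault",
     "Generated": datetime(2019, 1, 15, 11, 5)},
]

# Chronological order with the most recent alarm displayed at the top.
alarms.sort(key=lambda a: a["Generated"], reverse=True)
print([a["Service"] for a in alarms])
```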

Table 3-17 describes alarm parameters.

Table 3-17 Alarm parameters

Parameter Description

Severity: Alarm severity. Possible values include:
- Critical
- Major
- Warning
- Minor

Service Name of the service that reports the alarm

Description Alarm description

Generated Alarm generation time

Table 3-18 Button description

Button Description

In the drop-down list, select an alarm severity to filter alarms.
- All: displays all alarms.
- Critical: displays Critical alarms.
- Major: displays Major alarms.
- Warning: displays Warning alarms.
- Minor: displays Minor alarms.

Click to manually refresh the alarm list.

3.5 Managing Historical Clusters


3.5.1 Viewing Basic Information About a Historical Cluster

To view historical clusters, choose Cluster > Historical Cluster. Select a cluster and click its name to switch to the cluster information page. You can view the configurations, deployed nodes, and other basic information.

Table 3-19 and Table 3-20 describe the information about cluster configurations and nodes, respectively.

Table 3-19 Cluster configuration information

Parameter Description

Cluster ID: Unique identifier of a cluster. This parameter is automatically assigned when a cluster is created.

Cluster Name: Cluster name. This parameter is set when a cluster is created.

Key Pair: Key pair name. This parameter is set when a cluster is created.

Cluster Version: MRS version. Currently, MRS 1.3.0 and MRS 1.5.0 are supported. This parameter is set when a cluster is created.

Cluster Type: MRS 1.3.0 or MRS 1.5.0 provides two types of clusters:
- Analysis cluster: used for offline data analysis; provides Hadoop components.
- Streaming cluster: used for streaming tasks; provides stream processing components.
This parameter is set when a cluster is created.

AZ: Availability zone of the working zone in the cluster. This parameter is set when a cluster is created.

VPC: VPC information. This parameter is set when a cluster is created. A VPC is a secure, isolated, and logical network environment.

Subnet: Subnet information. This parameter is set when a cluster is created. A subnet provides dedicated network resources that are isolated from other networks, improving network security.

Master Node: Information about the Master node. Format: [instance specification | node quantity]

Core Node: Information about a Core node. Format: [instance specification | node quantity]


Parameter Description

Active Master Node IP Address: IP address of the active Master node in a cluster, which is also the IP address of the active management node of MRS Manager.

Cluster Manager IP Address: Floating IP address for accessing MRS Manager. This parameter is displayed only after Kerberos authentication is enabled.

Created: Time when MRS starts charging MRS clusters of the customer

Kerberos Authentication: Indicates whether Kerberos authentication is enabled when logging in to MRS Manager.

Logging: Indicates whether the tenant has enabled the log collection function.

Hadoop Version: Hadoop version

Spark Version: Spark version. Only a Spark cluster displays this version. Because Spark and Hive must be used together, the Spark and Hive versions are displayed at the same time.

HBase Version: HBase version. Only an HBase cluster displays this version.

Hive Version: Hive version. Only a Hive cluster displays this version.

Hue Version: Hue version. For MRS 1.3.0, this parameter is displayed only after Kerberos authentication is enabled. For MRS 1.5.0, this parameter is displayed without the limitation of Kerberos authentication.

Loader Version: Loader version. This parameter is displayed only when the MRS version is MRS 1.5.0.

Kafka Version: Kafka version. Only a Kafka cluster displays this version.

Storm Version: Storm version. Only a Storm cluster displays this version.

Flume Version: Flume version. This parameter is displayed only when the MRS version is MRS 1.5.0.


Table 3-20 Node information

Parameter Description

Add Node: For details about adding a Core node to a cluster, see Expanding a Cluster. Add Node is unavailable and capacity expansion is not allowed in any of the following situations:
- The cluster is not in the running state.
- The number of Core nodes exceeds the maximum value (100).
- The cluster charging mode is not on-demand.

Name: Name of a cluster node

Status: Status of a cluster

Type: Node type
- Master: A Master node in an MRS cluster manages the cluster, assigns MapReduce executable files to Core nodes, traces the execution status of each job, and monitors DataNode running status.
- Core: A Core node in a cluster processes data and stores processed data in HDFS.

IP Address: IP address of a cluster node

Specifications: Instance specifications of a node. This parameter is determined by the CPU, memory, and disks used.
NOTE: More advanced instance specifications allow better data processing.

Default Security Group: Security group name for Master and Core nodes, automatically assigned when a cluster is created. This is the default security group. Do not modify or delete it, because doing so will cause a cluster exception.

Table 3-21 Button description

Button Description

Click to manually refresh the node.

3.5.2 Viewing Job Configurations in a Historical Cluster

On the Historical Cluster page, users can query only clusters in the Failed or Terminated state and their job information.


Step 1 Log in to the MRS management console.

Step 2 Choose Cluster > Historical Cluster, select a cluster, and click its name to switch to the cluster information page.

Step 3 Select Job Management.

Step 4 In the Operation column corresponding to the selected job, click View.

The View Job Information window that is displayed shows the configuration of the selected job.

----End

3.6 Managing Jobs

3.6.1 Introduction to Jobs

A job is an executable program provided by MRS to process and analyze user data. All added jobs are displayed in Job Management, where you can add, query, and manage jobs.

Job Types

An MRS cluster allows you to create and manage the following jobs:

- MapReduce: provides the capability to process massive amounts of data quickly and in parallel. It is a distributed data processing mode and execution environment. MRS supports the submission of MapReduce JAR programs.
- Spark: a distributed computing framework based on memory. MRS supports the submission of Spark, Spark Script, and Spark SQL jobs.
  - Spark: submits the Spark program, executes the Spark application, and computes and processes user data.
  - Spark Script: submits a Spark Script script and batch-executes Spark SQL statements.
  - Spark SQL: uses Spark SQL statements (similar to SQL statements) to query and analyze user data in real time.
- Hive: an open-source data warehouse constructed on Hadoop. MRS supports the submission of Hive Script scripts and batch-executes HiveQL statements.

If you fail to create a job in a Running cluster, check the component health status on the cluster management page. For details, see Viewing the System Overview.

Job List

Jobs are listed in chronological order by default in the job list, with the most recent jobs displayed at the top. Table 3-22 describes the parameters of the job list.


Table 3-22 Parameters of the job list

Parameter Description

Name Job nameThis parameter is set when a job is added.

ID Unique identifier of a jobThis parameter is automatically assigned when a job is added.

Type: Job type. Possible types include:
- Distcp (data import and export)
- MapReduce
- Spark
- Spark Script
- Spark SQL
- Hive Script
NOTE: After you import or export data on the File Management page, you can view the Distcp job on the Job Management page.

Status: Job status
- Running
- Completed
- Terminated
- Abnormal
NOTE: By default, each cluster supports a maximum of 10 running jobs.

Result: Execution result of a job
- Successful
- Failed
NOTE: You cannot execute a successful or failed job again, but you can add or copy the job. After setting the job parameters, you can submit the job again.

Created Time when a job starts

Duration (min): Duration of executing a job, from the time the job starts to the time it completes or stops. Unit: minute


Parameter Description

Operation:
- View Log: You can click View Log to view the real-time log of a running job. For details, see Viewing Job Configurations and Logs.
- View: You can click View to view job details. For details, see Viewing Job Configurations and Logs.
- More
  - Stop: You can click Stop to stop a running job. For details, see Stopping Jobs.
  - Copy: You can click Copy to copy and add a job. For details, see Replicating Jobs.
  - Delete: You can click Delete to delete a job. For details, see Deleting Jobs.
NOTE
- Spark SQL jobs cannot be stopped.
- Deleted jobs cannot be recovered. Therefore, exercise caution when deleting a job.
- If you configure the system to save job logs to an HDFS or OBS path, the system compresses the logs and saves them to the specified path after job execution is complete. In this case, the job remains in the Running state after execution is complete and changes to the Completed state only after the logs are successfully saved. The time required for saving the logs depends on the log size; the process generally takes a few minutes.

Table 3-23 Button description

Button Description

In the drop-down list, select a job state to filter jobs.
- All (Num): displays all jobs.
- Completed (Num): displays jobs in the Completed state.
- Running (Num): displays jobs in the Running state.
- Terminated (Num): displays jobs in the Terminated state.
- Abnormal (Num): displays jobs in the Abnormal state.

Enter a job name in the search bar and click to search for a job.

Click to manually refresh the job list.

3.6.2 Adding a Jar or Script Job

You can submit developed programs to MRS, execute them, and obtain the execution result. This section describes how to create a job.


Prerequisites

You have completed the procedure described in Background.

Procedure

Step 1 Log in to the MRS management console.

Step 2 Choose Cluster > Active Cluster, select a running cluster, and click its name to switch to the cluster information page.

Step 3 Click Job Management and go to the Job Management tab page.

Step 4 On the Job tab page, click Create and go to the Create Job page.

Table 3-24 describes job configuration information.

Table 3-24 Job configuration information

Parameter Description

Type: Job type. Possible types include:
- MapReduce
- Spark
- Spark Script
- Hive Script
NOTE: To add jobs of the Spark and Hive types, you need to select the Spark and Hive components when creating the cluster, and the cluster must be in the running state. Spark Script jobs support Spark SQL only; Spark supports Spark Core and Spark SQL.

Name: Job name. This parameter consists of 1 to 64 characters, including letters, digits, hyphens (-), or underscores (_). It cannot be null.
NOTE: Identical job names are allowed but not recommended.


Parameter Description

Program Path: Address of the JAR file of the program that executes the job
NOTE: When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.
This parameter cannot be null and must meet the following requirements:
- A maximum of 1023 characters are allowed, but the special characters *?<">|\ are not allowed. The address cannot be empty or consist only of spaces.
- The path varies depending on the file system:
  - OBS: The path must start with s3a://, for example, s3a://wordcount/program/hadoop-mapreduce-examples-2.7.2.jar.
  - HDFS: The path must start with /user.
- A Spark Script program must end with .sql; MapReduce and Spark programs must end with .jar. The suffixes .sql and .jar are case-insensitive.

Parameters: Key parameters for executing the job. These parameters are assigned by an internal function; MRS is only responsible for passing them in. Format: package name.class name. A maximum of 2047 characters are allowed, but the special characters ;|&>',<$ are not allowed. This parameter can be empty.

Import From: Address of the input data
NOTE: When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.
The path varies depending on the file system:
- OBS: The path must start with s3a://.
- HDFS: The path must start with /user.
A maximum of 1023 characters are allowed, but the special characters *?<">|\ are not allowed. This parameter can be empty.

Export To: Address of the output data
NOTE: When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.
The path varies depending on the file system:
- OBS: The path must start with s3a://.
- HDFS: The path must start with /user.
A maximum of 1023 characters are allowed, but the special characters *?<">|\ are not allowed. This parameter can be empty.


Parameter Description

Log Path: Address for storing job logs that record the job running status
NOTE: When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.
The path varies depending on the file system:
- OBS: The path must start with s3a://.
- HDFS: The path must start with /user.
A maximum of 1023 characters are allowed, but the special characters *?<">|\ are not allowed. This parameter can be empty.

NOTE
- The OBS path supports s3a://, and s3a:// is used by default.
- The full path of HDFS and OBS contains a maximum of 1023 characters.
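The naming and path constraints in Table 3-24 can be sketched as simple checks (illustrative only; the function names are hypothetical and this is not MRS code):

```python
import re


def valid_job_name(name):
    """Job name: 1 to 64 letters, digits, hyphens, or underscores; not null."""
    return re.fullmatch(r"[A-Za-z0-9_-]{1,64}", name) is not None


def valid_program_path(path, job_type):
    """Program path: correct prefix and suffix for the given job type."""
    if len(path) > 1023 or not path.strip():
        return False
    if not (path.startswith("s3a://") or path.startswith("/user")):
        return False
    # Spark Script programs end with .sql; MapReduce and Spark end with .jar
    # (suffixes are case-insensitive).
    suffix = ".sql" if job_type == "Spark Script" else ".jar"
    return path.lower().endswith(suffix)


print(valid_job_name("wordcount_v2"))  # True
print(valid_job_name("bad name!"))     # False
print(valid_program_path(
    "s3a://wordcount/program/hadoop-mapreduce-examples-2.7.2.jar",
    "MapReduce"))                      # True
```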

Step 5 Confirm job configuration information and click OK.

After jobs are added, you can manage them.

NOTE

By default, each cluster supports a maximum of 10 running jobs.

----End

3.6.3 Submitting a Spark SQL Statement

This section describes how to use Spark SQL. You can submit a Spark SQL statement to query and analyze data on the MRS management console. To submit multiple statements, separate them with semicolons (;).

Procedure

Step 1 Log in to the MRS management console.

Step 2 Choose Cluster > Active Cluster, select a running cluster, and click its name to switch to the cluster information page.

Step 3 Click Job Management and go to the Job Management tab page.

Step 4 Select Spark SQL. The Spark SQL job page is displayed.

Step 5 Enter the Spark SQL statement for table creation.

When entering Spark SQL statements, ensure that they contain no more than 10,000 characters.

Syntax:

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name [(col_name data_type [COMMENT col_comment], ...)] [COMMENT table_comment] [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)] [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS] [ROW FORMAT row_format] [STORED AS file_format] [LOCATION hdfs_path];


Use either of the following methods to create a table:

- Method 1: Create an src_data table and write data in every row. The data is stored in the /user/guest/input directory.

  create external table src_data(line string) row format delimited fields terminated by '\\n' stored as textfile location '/user/guest/input/';

- Method 2: Create an src_data1 table and load data from a file into it.

  create table src_data1 (eid int, name String, salary String, destination String) row format delimited fields terminated by ',';
  load data inpath '/tttt/test.txt' into table src_data1;

NOTE

Data from OBS cannot be loaded into tables created using method 2.

Step 6 Enter the Spark SQL statement for table query.

Syntax:

SELECT col_name FROM table_name;

Example:

select * from src_data;

Step 7 Enter the Spark SQL statement for table deletion.

Syntax:

DROP TABLE [IF EXISTS] table_name;

Example:

drop table src_data;

Step 8 Click Check to check the statement correctness.

Step 9 Click Submit.

After submitting Spark SQL statements, you can check whether the execution is successful in Last Execution Result and view detailed execution results in Last Query Result Set.
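The submission rules above (statements separated by semicolons, at most 10,000 characters) can be sketched as a hypothetical client-side check (not MRS code; a naive split that does not handle semicolons inside string literals):

```python
MAX_SUBMISSION_CHARS = 10_000


def split_spark_sql(sql_text):
    """Enforce the length limit described above, then split the
    semicolon-separated submission into individual statements."""
    if len(sql_text) > MAX_SUBMISSION_CHARS:
        raise ValueError("submission exceeds 10,000 characters")
    return [s.strip() for s in sql_text.split(";") if s.strip()]


print(split_spark_sql(
    "select * from src_data; drop table if exists src_data;"))
```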

----End

3.6.4 Viewing Job Configurations and Logs

This section describes how to view job configurations and logs.

Background

- You can view the configurations of all jobs.
- For clusters created on an MRS version earlier than 1.0.7, logs of completed jobs cannot be viewed. For clusters created on MRS 1.0.7 or later, logs of all jobs can be viewed.

Procedure

Step 1 Log in to the MRS management console.


Step 2 Choose Cluster > Active Cluster, select a running cluster, and click its name to switch to the cluster information page.

Step 3 Click Job Management.

Step 4 In the Operation column corresponding to the selected job, click View.

The View Job Information window that is displayed shows the configuration of the selected job.

Step 5 Select a MapReduce job, and click View Log in the Operation column corresponding to the selected job.

In the page that is displayed, log information of the selected job is displayed.

The MapReduce job is only an example. You can view log information about MapReduce, Spark, Spark Script, and Hive Script jobs regardless of their status.

Each tenant can submit 10 jobs and query 10 logs concurrently.

----End

3.6.5 Stopping Jobs

This section describes how to stop running MRS jobs.

Background

Spark SQL jobs cannot be stopped. After a job is stopped, its status changes to Terminated, and it cannot be executed again.

Procedure

Step 1 Log in to the MRS management console.

Step 2 Choose Cluster > Active Cluster, select a running cluster, and click its name to switch to the cluster information page.

Step 3 Click Job Management.

Step 4 Select a running job and choose More > Stop in the Operation column corresponding to the selected job.

The job status changes from Running to Terminated.

NOTE

When you submit a job on the Spark SQL page, you can click Cancel to stop the job.

----End

3.6.6 Replicating Jobs

This section describes how to replicate MRS jobs.

Background

Currently, all types of jobs except for Spark SQL and Distcp jobs can be replicated.

Procedure

Step 1 Log in to the MRS management console.

Step 2 Choose Cluster > Active Cluster, select a running cluster, and click its name to switch to the cluster information page.

Step 3 Click Job Management.

Step 4 In the Operation column corresponding to the to-be-replicated job, choose More > Copy.

The Copy Job dialog box is displayed.

Step 5 Set job parameters, and click OK.

Table 3-25 describes job configuration information.

After being successfully submitted, a job changes to the Running state by default. You do not need to manually execute the job.

Table 3-25 Job configuration information

Parameter Description

Type            Job type
                Possible types include:
                - MapReduce
                - Spark
                - Spark Script
                - Hive Script
                NOTE
                  To add jobs of the Spark and Hive types, you need to select the Spark and Hive components when creating a cluster, and the cluster must be in the running state. Spark Script jobs support Spark SQL only, and Spark supports Spark Core and Spark SQL.

Name            Job name
                This parameter consists of 1 to 64 characters, including letters, digits, hyphens (-), or underscores (_). It cannot be null.
                NOTE
                  Identical job names are allowed but not recommended.

Program Path    Address of the JAR file of the program for executing jobs
                NOTE
                  When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.
                This parameter cannot be null and must meet the following requirements:
                - A maximum of 1023 characters are allowed, but special characters (*?<">|\) are not allowed. The address cannot be empty or full of spaces.
                - The path varies depending on the file system:
                  - OBS: The path must start with s3a://, for example, s3a://wordcount/program/hadoop-mapreduce-examples-2.7.2.jar.
                  - HDFS: The path must start with /user.
                - Spark Script files must end with .sql; MapReduce and Spark files must end with .jar. The .sql and .jar suffixes are case-insensitive.

Parameters      Key parameters for executing jobs
                This parameter is assigned by an internal function. MRS is only responsible for inputting the parameter.
                Format: package name.class name
                A maximum of 2047 characters are allowed, but special characters (;|&>',<$) are not allowed. This parameter can be empty.

Import From     Address for inputting data
                NOTE
                  When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.
                The path varies depending on the file system:
                - OBS: The path must start with s3a://.
                - HDFS: The path must start with /user.
                A maximum of 1023 characters are allowed, but special characters (*?<">|\) are not allowed. This parameter can be empty.

Export To       Address for outputting data
                NOTE
                  When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.
                The path varies depending on the file system:
                - OBS: The path must start with s3a://.
                - HDFS: The path must start with /user.
                A maximum of 1023 characters are allowed, but special characters (*?<">|\) are not allowed. This parameter can be empty.

Log path        Address for storing job logs that record the job running status
                NOTE
                  When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.
                The path varies depending on the file system:
                - OBS: The path must start with s3a://.
                - HDFS: The path must start with /user.
                A maximum of 1023 characters are allowed, but special characters (*?<">|\) are not allowed. This parameter can be empty.
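The naming and path rules in Table 3-25 can be checked before a job is submitted. A hedged sketch in shell; valid_job_name and valid_program_path are illustrative helpers, not MRS tools, and they encode only the rules stated above (the 1023/2047-character limits are omitted for brevity):

```shell
#!/bin/sh
# Illustrative pre-submission checks for the rules in Table 3-25.

valid_job_name() {
  # 1 to 64 characters: letters, digits, hyphens, underscores; not empty.
  printf '%s' "$1" | grep -Eq '^[A-Za-z0-9_-]{1,64}$'
}

valid_program_path() {
  # Must start with s3a:// (OBS) or /user (HDFS) and end with .jar or .sql
  # (suffix case-insensitive).
  case "$1" in
    s3a://*|/user*) ;;
    *) return 1 ;;
  esac
  printf '%s' "$1" | grep -Eiq '\.(jar|sql)$'
}

valid_job_name "wordcount-01" && echo "name ok"
valid_program_path "s3a://wordcount/program/hadoop-mapreduce-examples-2.7.2.jar" && echo "path ok"
valid_program_path "/tmp/app.jar" || echo "path rejected: must start with s3a:// or /user"
```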

----End

3.6.7 Deleting Jobs

This section describes how to delete MRS jobs.

Background

Jobs can be deleted one by one or in batches. The deletion operation is irreversible. Exercise caution when performing this operation.

Procedure

Step 1 Log in to the MRS management console.

Step 2 Choose Cluster > Active Cluster, select a running cluster, and click its name to switch to the cluster information page.

Step 3 Click Job Management.

Step 4 In the Operation column corresponding to the selected job, choose More > Delete.

This operation deletes only a single job.

Step 5 You can select multiple jobs and click Delete on the upper left of the job list.

This operation deletes multiple jobs at a time.

----End

3.7 Querying Operation Logs

The Operation Log page records cluster and job operations. Logs are typically used to quickly locate faults in case of cluster exceptions, helping you resolve problems.

Operation Types

Currently, operations are recorded in the logs only after Kerberos authentication is disabled. Two types of operations are recorded, and you can filter and search for a desired type of operations.

- Cluster: creating, terminating, shrinking, and expanding a cluster
- Job: creating, stopping, and deleting a job

Log Parameters

Logs are listed in chronological order in the log list by default, with the most recent logs displayed at the top.

Table 3-26 describes parameters in logs.

Table 3-26 Description of parameters in logs

Parameter Description

Operation Type    Operation type
                  Possible types include:
                  - Cluster
                  - Job

IP Address        IP address where an operation is executed
                  NOTE
                    If MRS cluster deployment fails, the cluster is automatically terminated, and the operation log of the terminated cluster does not contain the user's IP Address information.

Operation Details Operation content
                  The content can contain a maximum of 2048 characters.

Operation Time    Operation time
                  For terminated clusters, only those terminated within the last six months are displayed. If you want to view clusters terminated six months ago, contact technical support engineers.

Table 3-27 Button description

Button Description

In the drop-down list, select an operation type to filter logs.
- All: displays all logs.
- Cluster: displays Cluster logs.
- Job: displays Job logs.

Filter logs by time.
1. Click the button.
2. Specify the date and time.
3. Click OK.
Enter the query start time in the left box and the end time in the right box. The end time must be later than the start time; otherwise, logs cannot be filtered by time.

Enter key words in Operation Details and click the search icon to search for logs.

Click the refresh button to manually refresh the log list.

4 Remote Operation Guide

4.1 Overview

This section describes remote login, MRS cluster node types, and node functions.

MRS cluster nodes support remote login. The following remote login methods are available:

- GUI login: Use the remote login function provided by the ECS management console to log in to the Linux interface of the Master node.
- SSH login: Applies to Linux ECSs only. You can use a remote login tool (such as PuTTY) to log in to an ECS. To use this method, you must assign an elastic IP address (EIP) to the ECS.
  For details about applying for and binding an elastic IP address for Master nodes, see "Assigning an EIP and Binding It to an ECS" under the "Management" section in the VPC User Guide.
  You can log in to a Linux ECS using either a key pair or a password. To use a key pair, you must log in to the Linux ECS as user linux. For the login procedure, see Logging In to a Linux ECS Using a Key Pair (SSH). If you use a password, see Logging In to a Linux ECS Using a Password (SSH).

In an MRS cluster, a node is an ECS. Table 4-1 describes node types and functions.

Table 4-1 Cluster node types

Node Type Function

Master node     Management node of an MRS cluster. It manages and monitors the cluster.
                In the navigation tree of the MRS management console, choose Cluster > Active Cluster, select a running cluster, and click its name to switch to the cluster information page. On the Node tab page, view Name. The node that contains master1 in its name is the Master1 node. The node that contains master2 in its name is the Master2 node.
                You can log in to a Master node either using VNC on the ECS management console or using SSH. After logging in to the Master node, you can access Core nodes without entering passwords.
                The system automatically deploys the Master nodes in active/standby mode and supports the high availability (HA) feature for MRS cluster management. If the active management node fails, the standby management node switches to the active state and takes over services.
                To determine whether the Master1 node is the active management node, see Viewing Active and Standby Nodes.

Core node       Working node of an MRS cluster. It processes and analyzes data and stores processed data on HDFS.

4.2 Logging In to a Master Node

4.2.1 Logging In to an ECS Using VNC

This section describes how to log in to an ECS using VNC on the ECS management console. This login method is mainly used for emergency O&M. In other scenarios, it is recommended that you log in to the ECS using SSH.

Login Notes

If no default image password is set when Cloud-Init is installed, you must log in to the ECS by following the instructions provided in section Logging In to a Linux ECS Using a Key Pair (SSH). After logging in using SSH, you can set the ECS login password. For details about other login notes, see "Logging In to an ECS Using VNC" in the ECS User Guide (Getting Started > Logging In to an ECS > Logging In to an ECS Using VNC).

Logging In to an ECS

Step 1 Log in to the MRS management console.

Step 2 Choose Cluster > Active Cluster, select a running cluster, and click its name to switch to the cluster information page.

Step 3 On the basic information page, query the IP addresses of Master1 and Master2 nodes.

Step 4 Log in to the ECS management console. Choose IP Address from the drop-down list of the search box on the right.

Step 5 Enter the IP address of Master1 or Master2 and click the search icon.

Step 6 In the row of the found ECS, click Remote Login in the Operation column.

For details about remote login to an ECS, see "Logging In to an ECS Using VNC" in the ECS User Guide (Getting Started > Logging In to an ECS > Logging In to an ECS Using VNC).

----End

Changing the OS Keyboard Language

All nodes in the MRS cluster use the Linux OS. For details about changing the OS keyboard language, see "Logging In to an ECS Using VNC" in the ECS User Guide (Getting Started > Logging In to an ECS > Logging In to an ECS Using VNC).

4.2.2 Logging In to a Linux ECS Using a Key Pair (SSH)

This section describes how to log in to a Linux ECS using a key pair.

For details about logging in to a Linux ECS using a key pair, see "Logging In to a Linux ECS Using a Key Pair (SSH)" in the ECS User Guide (Getting Started > Logging In to an ECS > Logging In to a Linux ECS Using a Key Pair (SSH)).
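In practice, key-pair login boils down to a single SSH command run as user linux. A sketch; the EIP and key file path below are hypothetical placeholders, so substitute the EIP bound to your Master node and your own private key:

```shell
#!/bin/sh
# Sketch of the SSH command for key-pair login as user linux.
# EIP and KEY are placeholders for illustration only.
EIP="203.0.113.10"          # example EIP (documentation address range)
KEY="$HOME/mrs-keypair.pem" # example private key path

build_ssh_cmd() {
  # Assembles (but does not run) the login command for the given EIP and key.
  echo "ssh -i $2 linux@$1"
}

build_ssh_cmd "$EIP" "$KEY"
```

Running the printed command requires network access to the EIP and a key file with 600 permissions.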

4.2.3 Logging In to a Linux ECS Using a Password (SSH)

Logging in to a Linux ECS in SSH password authentication mode is disabled by default. If you require the password authentication mode, configure it after logging in to the ECS. To ensure system security, you must reset the common user password for logging in to the Linux ECS after configuring the SSH password authentication mode.

All nodes in the MRS cluster use the Linux OS. For details about logging in to a Linux ECS using a password, see "Logging In to a Linux ECS Using a Password (SSH)" in the ECS User Guide (Getting Started > Logging In to an ECS > Logging In to a Linux ECS Using a Password (SSH)).

4.3 Viewing Active and Standby Nodes

This section describes how to confirm the active and standby management nodes of MRS Manager.

Background

You can log in to other nodes in a cluster from the Master node. After logging in to the Master node, you can confirm the active and standby management nodes of MRS Manager and run commands on the corresponding management nodes.

Procedure

Step 1 Log in to the MRS management console and view information of the specified cluster.

Step 2 View the Active Master Node IP Address.

Active Master Node IP Address is the IP address of the active Master node in a cluster, which is also the IP address of the active management node of MRS Manager.

----End

4.4 Client Management

4.4.1 Updating the Client

Scenario

An MRS cluster provides a client for users to connect to servers, query task results, and manage data. Before using the MRS client or modifying service configuration parameters and restarting the services on MRS Manager, users must prepare the client configuration file and update the client.

During cluster creation, the original client is installed and saved in the /opt/client directory on all nodes in the cluster by default. After the cluster is created, only the client on Master nodes can be used directly; the client on Core nodes must be updated before being used.

Procedure

Step 1 Log in to MRS Manager.

Step 2 Click Service, and click Download Client.

Set Client Type to Only configuration files, set Download Path to Server, and click OK to generate the client configuration file. The generated file is saved in the /tmp/MRS-client directory on the active management node by default. You can modify the file save path as required.

Step 3 On the MRS management console, click Cluster > Active Cluster.

Step 4 In the cluster list, click the specified cluster name and view the Active Master Node IP Address.

Active Master Node IP Address is the IP address of the active Master node in a cluster, which is also the IP address of the active management node of MRS Manager.

Step 5 Locate the active management node based on the IP address and log in to the active management node as user linux using VNC. For details, see Logging In to an ECS Using VNC.

The Master node supports Cloud-init. The preset username and password for Cloud-init are linux and cloud.1234. If you have changed the password, log in to the node using the new password. It is recommended that you change the password upon the first login.

Step 6 Run the following command to switch the user:

sudo su - omm

Step 7 Run the following command to go to the client directory:

cd /opt/client

Step 8 Run the following command to update the client configuration:

sh refreshConfig.sh Client installation directory Full path of the client configuration file package

For example:

sh refreshConfig.sh /opt/client /tmp/MRS-client/MRS_Services_Client.tar

If the following information is displayed, the configuration is updated successfully.

ReFresh components client config is complete.
Succeed to refresh components client config.

----End

4.4.2 Using the Client on a Cluster Node

Scenario

After the client is updated, users can use the client on a Master node or a Core node in the cluster.

Prerequisites

The client has been updated on the active management node.

Procedure

- Using the client on a Master node:

  a. On the active management node where the client is updated, that is, a Master node, run the sudo su - omm command to switch the user, and run the following command to go to the client directory:
     cd /opt/client

  b. Run the following command to configure environment variables:
     source bigdata_env

  c. If Kerberos authentication is enabled for the current cluster, run the following command to authenticate the user. If Kerberos authentication is disabled, skip this step.
     kinit MRS cluster user
     For example: kinit admin

  d. Run a component client command.
     For example, run hdfs dfs -ls / to view files in the HDFS root directory.

- Using the client on a Core node:

  a. Update the client on the active management node.
  b. Locate the active management node based on the IP address and log in to it as user linux using VNC. For details, see Logging In to an ECS Using VNC.

c. On the active management node, run the following command to switch the user:

sudo su - omm

d. On the MRS management console, view IP Address on the Node page of the specified cluster.

Record the IP address of the Core node on which the client is to be used.

e. On the active management node, run the following command to copy the package to a Core node:

scp -p /tmp/MRS-client/MRS_Services_Client.tar IP address of the Core node:/opt/client

f. Log in to the Core node as user linux. For details, see Logging In to an ECS Using VNC.

The Master node supports Cloud-init. The preset username and password for Cloud-init are linux and cloud.1234. If you have changed the password, log in to the node using the new password. It is recommended that you change the password upon the first login.

g. On the Core node, run the following command to switch the user:

sudo su - omm

h. Run the following command to update the client configuration:

sh /opt/client/refreshConfig.sh Client installation directory Full path of the client configuration file package

For example:

sh /opt/client/refreshConfig.sh /opt/client /opt/client/MRS_Services_Client.tar

i. Run the following commands to go to the client directory and configure environment variables:

cd /opt/client

source bigdata_env

j. If Kerberos authentication is enabled for the current cluster, run the following command to authenticate the user. If Kerberos authentication is disabled, skip this step.

kinit MRS cluster user

For example, kinit admin.

k. Run a component client command.

For example, run hdfs dfs -ls / to view files in the HDFS root directory.
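The source bigdata_env step in the procedures above works because sourcing runs a file in the current shell, so its exported variables persist for later commands. A generic illustration with a stand-in file (the contents below are invented for the demo, not the real bigdata_env):

```shell
#!/bin/sh
# Generic illustration of what "source <env file>" does. The file created
# here is a stand-in, not the actual bigdata_env shipped with the client.
cat > /tmp/demo_env <<'EOF'
export DEMO_CLIENT_HOME=/opt/client
export PATH="$DEMO_CLIENT_HOME/bin:$PATH"
EOF

. /tmp/demo_env   # POSIX equivalent of "source /tmp/demo_env"
echo "$DEMO_CLIENT_HOME"
```

After sourcing, any command run in the same shell sees the exported variables, which is why the client commands must be run in the shell where bigdata_env was sourced.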

4.4.3 Using the Client on Another Node of a VPC

Scenario

After the client is prepared, users can use the client on a node outside the MRS cluster.

NOTE

If the client has been installed on the node outside the MRS cluster but must be updated, update the client using the same account that was used to install the client, for example, the root account.

Prerequisites

- An ECS has been prepared. For details about the OS and its version of the ECS, see Table 4-2.

Table 4-2 Reference list

OS Supported Version

SuSE      - Recommended: SUSE Linux Enterprise Server 11 SP4 (SUSE 11.4)
          - Available: SUSE Linux Enterprise Server 11 SP3 (SUSE 11.3)
          - Available: SUSE Linux Enterprise Server 11 SP1 (SUSE 11.1)
          - Available: SUSE Linux Enterprise Server 11 SP2 (SUSE 11.2)

RedHat    - Recommended: Red Hat-6.6-x86_64 (Red Hat 6.6)
          - Available: Red Hat-6.4-x86_64 (Red Hat 6.4)
          - Available: Red Hat-6.5-x86_64 (Red Hat 6.5)
          - Available: Red Hat-6.7-x86_64 (Red Hat 6.7)

CentOS    - Available: CentOS-6.4 (CentOS 6.4)
          - Available: CentOS-6.5 (CentOS 6.5)
          - Available: CentOS-6.6 (CentOS 6.6)
          - Available: CentOS-6.7 (CentOS 6.7)
          - Available: CentOS-7.2 (CentOS 7.2)

For example, a user can select the enterprise image Enterprise_SLES11_SP4_latest(4GB) or the standard image Standard_CentOS_7.2_latest(4GB) to prepare the OS for an ECS. In addition, sufficient disk space must be allocated for the ECS, for example, 40 GB.

- The ECS and the MRS cluster are in the same VPC.
- The IP address configured for the NIC of the ECS is in the same network segment as the IP address of the MRS cluster.
- The security group of the ECS is the same as that of the Master node of the MRS cluster. If this requirement is not met, modify the ECS security group or configure the inbound and outbound rules of the ECS security group to allow the ECS security group to be accessed by all security groups of MRS cluster nodes. For details about how to create an ECS that meets this requirement, see "Creating an ECS" under the "Getting Started" chapter in the Elastic Cloud Server User Guide.
- To enable users to log in to a Linux ECS using a password (SSH), see "Logging In to a Linux ECS Using a Password (SSH)" in the Elastic Cloud Server User Guide (Getting Started > Logging In to an ECS > Logging In to a Linux ECS Using a Password (SSH)).
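The same-network-segment prerequisite can be sanity-checked from the command line. A minimal sketch, assuming a /24 subnet mask purely for illustration; adjust the comparison to your VPC subnet's actual mask:

```shell
#!/bin/sh
# Sketch: check whether two IPv4 addresses fall in the same network segment.
# Assumes a /24 mask for illustration; same_subnet_24 is not an MRS tool.
same_subnet_24() {
  # Compare everything before the last octet.
  [ "${1%.*}" = "${2%.*}" ] && echo yes || echo no
}

same_subnet_24 "192.168.0.12" "192.168.0.34"   # same /24
same_subnet_24 "192.168.0.12" "192.168.1.34"   # different /24
```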

Procedure

Step 1 Create an ECS that meets the requirements in the prerequisites.

Step 2 Log in to MRS Manager.

Step 3 Click Service, and click Download Client.

Step 4 In Client Type, select All client files.

Step 5 In Download Path, select Remote host.

Step 6 Set Host IP Address to the IP address of the ECS, set Host Port to 22, and set Save Path to /home/linux.

- If the default port 22 for logging in to an ECS using SSH has been changed, set Host Port to the new port.
- Save Path contains a maximum of 256 characters.

Step 7 Set Login User to linux.

If other users are used, ensure that the users have read, write, and execute permissions on the save path.

Step 8 In SSH Private Key, select and upload the private key used for creating the cluster.

Step 9 Click OK to start downloading the client to the ECS.

If the following information is displayed, the client package is successfully saved. Click Close.

Client files downloaded to the remote host successfully.

Step 10 Log in to the ECS using VNC. See "Logging In to a Linux ECS Using VNC" in the Elastic Cloud Server User Guide (Getting Started > Logging In to an ECS > Logging In to a Linux ECS Using VNC).

All standard (Standard_xxx) and enterprise (Enterprise_xxx) images support Cloud-Init. The preset username and password for Cloud-Init are linux and cloud.1234, respectively. If you have changed the password, log in to the ECS using the new password. See "How Do I Log In to an ECS Once All Images Support Cloud-Init?" in the Elastic Cloud Server User Guide (FAQs > Login FAQs > How Do I Log In to an ECS Once All Images Support Cloud-Init?). It is recommended that you change the password upon the first login.

Step 11 On the ECS, switch to user root and copy the installation package to the /opt directory.

sudo su - root

cp /home/linux/MRS_Services_Client.tar /opt

Step 12 Run the following command in the /opt directory to decompress the package and obtain the verification file and the configuration package of the client:

tar -xvf MRS_Services_Client.tar

Step 13 Run the following command to verify the configuration package of the client:

sha256sum -c MRS_Services_ClientConfig.tar.sha256

The command output is as follows:

MRS_Services_ClientConfig.tar: OK

Step 14 Run the following command to decompress MRS_Services_ClientConfig.tar:

tar -xvf MRS_Services_ClientConfig.tar
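Steps 12 to 14 follow a common verify-then-extract pattern. A local demonstration of that flow with a stand-in archive (not the real client package), showing how sha256sum -c guards the extraction:

```shell
#!/bin/sh
# Local demonstration of the verify-then-extract flow in Steps 12-14,
# using a stand-in archive instead of the real MRS client package.
set -e
cd "$(mktemp -d)"

# Stand-in for MRS_Services_ClientConfig.tar and its checksum file.
mkdir config && echo "sample" > config/app.properties
tar -cf ClientConfig.tar config
sha256sum ClientConfig.tar > ClientConfig.tar.sha256

# Equivalent of "sha256sum -c MRS_Services_ClientConfig.tar.sha256":
# verification must succeed before the archive is extracted.
sha256sum -c ClientConfig.tar.sha256   # prints "ClientConfig.tar: OK"
tar -xf ClientConfig.tar
cat config/app.properties
```

If the archive were corrupted in transit, sha256sum -c would report FAILED and exit nonzero, and set -e would stop the script before extraction.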

Step 15 Run the following command to install the client to a new directory, for example, /opt/hadoopclient. The directory is automatically generated during installation.

sh /opt/MRS_Services_ClientConfig/install.sh /opt/hadoopclient

If the following information is displayed, the client is successfully installed:

Components client installation is complete.

Step 16 Check whether the IP address of the ECS node can connect to the IP address of the cluster Master node.

For example, run the following command: ping Master node IP address.

- If yes, go to Step 17.
- If no, check whether the VPC and security group are correct and whether the ECS and the MRS cluster are in the same VPC and security group, and then go to Step 17.

Step 17 Run the following command to configure the environment variable:

source /opt/hadoopclient/bigdata_env

Step 18 If Kerberos authentication is enabled for the current cluster, run the following command to authenticate the user. If Kerberos authentication is disabled, skip this step.

kinit MRS cluster user

For example, kinit admin.

Step 19 Run the client command of the component.

For example, run the following command to query the HDFS directory.

hdfs dfs -ls /

----End

5 MRS Manager Operation Guide

5.1 MRS Manager Introduction

Overview

MRS manages and analyzes massive data and helps users rapidly obtain desired data from both structured and unstructured data. However, the structures of open-source components are complicated, and component installation, configuration, and management are time-consuming and labor-intensive.

MRS Manager provides a unified enterprise-level platform for managing big data clusters. It provides the following functions:

- Cluster monitoring: enables you to quickly know the health status of hosts and services.
- Graphical indicator monitoring and customization: enables you to obtain key system information in time.
- Service property configuration: meets your service performance requirements.
- Cluster, service, and role instance operations: enables you to start or stop services and clusters with just one click.

MRS Manager Interface

MRS Manager provides a unified cluster management platform to help users rapidly run and maintain clusters.

Table 5-1 describes the functions of operation entries.

Table 5-1 Function description of MRS Manager operation entries

Operation Entry Function Description

Dashboard       Shows the status and key monitoring indicators of all services, as well as the host status, in histograms, line charts, and tables. Users can customize a dashboard for the key monitoring indicators and drag it to any position on the interface. Data on the dashboard can be updated automatically.

Service         Provides service monitoring, operation guidance, and configuration, which help users manage services in a unified manner.

Host            Provides host monitoring and operation guidance to help users manage hosts in a unified manner.

Alarm           Provides alarm query and guidance to clear alarms, which enables users to quickly identify product faults and potential risks, ensuring proper system running.

Audit           Queries and exports audit logs to help users know all users' activities and operations.

Tenant Provides a unified tenant management platform.

System          Enables users to manage monitoring and alarm configurations as well as backup.

On the page of a subfunction of System, you can use the System shortcut menu to go to another subfunction page.

- Table 5-2 describes the System shortcut menu of a common cluster.
- Table 5-3 describes the System shortcut menu of a security cluster.

The following describes how to use the System shortcut menu to go to a function page.

Step 1 On MRS Manager, click System.

Step 2 On the System page, click a link of any function to go to the function page.

For example, in the Backup and Restoration area, click Back Up Data to go to the Back Up Data page.

Step 3 Move the cursor to the left boundary of the browser window. The black System shortcut menu is unfolded. After you move the cursor away, the shortcut menu is folded.

Step 4 On the shortcut menu, click a function link to go to the function page.

For example, choose Maintenance > Export Log. The Export Log page is displayed.

----End

Table 5-2 System shortcut menu of a common cluster

l Backup and Restoration: Back Up Data, Restore Data
l Maintenance: Export Log, Export Audit Log, Check Health Status
l Monitoring and Alarm: Configure Syslog, Configure Alarm Threshold, Configure SNMP, Configure Monitoring Metric Dump, Configure Resource Contribution Ranking
l Resource: Configure Static Service Pool
l Permission: Change OMS Database Password
l Patch: Manage Patch

Table 5-3 System shortcut menu of a security cluster

l Backup and Restoration: Back Up Data, Restore Data
l Maintenance: Export Log, Export Audit Log, Check Health Status
l Monitoring and Alarm: Configure Syslog, Configure Alarm Threshold, Configure SNMP, Configure Monitoring Metric Dump, Configure Resource Contribution Ranking
l Permission: Manage User, Manage User Group, Manage Role, Configure Password Policy, Change OMS Database Password
l Patch: Manage Patch

Reference Information

MRS is a data analysis service on the public cloud. It is used for management and analysis of massive data.

MRS uses the MRS Manager portal to manage big data components, for example, components in the Hadoop ecosystem. Table 5-4 details the differences between MRS on the public cloud and on the MRS Manager portal.

Table 5-4 Differences

Concept: MapReduce Service
l MRS on the public cloud: indicates the data analysis service on the public cloud. This service includes components such as Hive, Spark, Yarn, HDFS, and ZooKeeper.
l MRS Manager: indicates the MapReduce component in the Hadoop ecosystem.

5.2 Accessing MRS Manager

Scenario

MRS Manager supports MRS cluster monitoring, configuration, and management. You can open the Manager page from the MRS console.

For clusters with Kerberos authentication disabled, you can open MRS Manager on the MRS console. For clusters with Kerberos authentication enabled, see Accessing MRS Manager Supporting Kerberos Authentication to learn how to access MRS Manager.

Procedure

Step 1 Log in to the Management Console of the public cloud, and click MapReduce Service.

Step 2 Click Cluster. In the Active Cluster list, click the specified cluster name to switch to the cluster information page.

Step 3 Click Cluster Manager to open MRS Manager. If you access MRS Manager after successfully logging in to the MRS console, you do not need to enter the password again because user admin is used for login by default.

----End


5.3 Accessing MRS Manager Supporting Kerberos Authentication

Scenario

After users create MRS clusters that support Kerberos authentication, they can manage running clusters on MRS Manager.

This section describes how to prepare a work environment on the public cloud platform for accessing MRS Manager.

Impact on the System

Site trust must be added to the browser when you access MRS Manager for the first time. Otherwise, MRS Manager cannot be accessed.

Prerequisites

You have obtained the password of user admin. The password of user admin is specified by the user during MRS cluster creation.

Procedure

Step 1 On the MRS management console, click Cluster.

Step 2 In the Active Cluster list, click the specified cluster name.

Record AZ, VPC, and Cluster Manager IP Address of the cluster, and Default Security Group of the Master node.

Step 3 On the ECS management console, create a new ECS.

l Ensure that AZ, VPC, and Security Group of the ECS are the same as those of the cluster to be accessed.
l Select a Windows public image. For example, select the Enterprise_Windows_STD_2012R2_20170316-0(80GB) enterprise image.
l For details about other parameter configurations, see .

NOTE

If the security group of the ECS is different from Default Security Group of the Master node, you can modify the configuration using either of the following methods:

l Change the security group of the ECS to the default security group of the Master node. For details, see Changing the Security Group in Elastic Cloud Server User Guide > Management > Modifying ECS Specifications.

l Add two security group rules to the security groups of the Master node and Core node to ensure that the ECS can access the cluster, and set the protocol to TCP. Set Port Range of the two rules to 28443 and 20009, respectively. For details, see Virtual Private Cloud User Guide > Security > Security Group > Adding a Security Group Rule.
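After the rules are in place, you can verify from a host in the same VPC that the two ports above are reachable. The following Python sketch is illustrative only: `can_reach` is a helper defined here (not an MRS tool), and the IP address in the comment is a placeholder for the Master node's address.

```python
import socket

def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (placeholder IP; substitute the actual Master node address):
# for port in (28443, 20009):
#     print(port, can_reach("192.168.4.33", port))
```

A `False` result for port 28443 usually means the security group rule is missing or the ECS is in a different VPC.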

Step 4 On the VPC management console, apply for an EIP and bind it to the ECS.

See Virtual Private Cloud User Guide > Network Components > EIP > Assigning an EIP and Binding It to an ECS.


Step 5 Log in to the ECS.

The account, password, EIP, and security group configuration rules of the Windows system are required for logging in to the ECS. For details about how to log in to the ECS, see Elastic Cloud Server User Guide > Getting Started > Logging In to an ECS > Logging In to a Windows ECS Using a Password (MSTSC).

Step 6 On the Windows remote desktop, use your browser to access MRS Manager.

For example, you can use Internet Explorer 11 in the Windows 2012 OS.

In the browser address bar, enter https://Cluster Manager IP Address:28443/web. Enter the name and password of the MRS cluster user, for example, user admin.

NOTE

l If you access MRS Manager with other MRS cluster usernames, change the password upon your first access. The new password must meet the requirements of the current password complexity policies. For details, contact the administrator.

l By default, a user is locked after entering an incorrect password five consecutive times. The user is automatically unlocked after 5 minutes.

Step 7 If you want to exit MRS Manager, move the cursor to the icon in the upper-right corner and click Log Out.

----End

Related Operation

Configuring mapping between node names and IP addresses

Step 1 Log in to MRS Manager and click Host.

Record the Host Name and OM IP Address of all nodes in a cluster.

Step 2 In the work environment, use Notepad to open the hosts file and add the mapping relationships between node names and IP addresses to the file.

Each mapping relationship occupies an independent line. The following is an example:

192.168.4.127 node-core-Jh3ER
192.168.4.225 node-master2-PaWVE
192.168.4.19 node-core-mtZ81
192.168.4.33 node-master1-zbYN8
192.168.4.233 node-core-7KoGY

Save the configurations and exit.
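If many nodes are involved, the mappings can also be appended with a small script. The sketch below is illustrative: `append_host_entries` is a hypothetical helper, the mappings are taken from the example above, and the Windows hosts file path in the comment is the usual default, which may differ on your system (editing it requires administrator rights).

```python
from pathlib import Path

# Mappings recorded from the Host page on MRS Manager (example values).
NODE_MAPPINGS = {
    "node-core-Jh3ER": "192.168.4.127",
    "node-master2-PaWVE": "192.168.4.225",
    "node-core-mtZ81": "192.168.4.19",
    "node-master1-zbYN8": "192.168.4.33",
    "node-core-7KoGY": "192.168.4.233",
}

def append_host_entries(hosts_path: str, mappings: dict) -> None:
    """Append one 'IP hostname' line per mapping that is not already present."""
    path = Path(hosts_path)
    existing = path.read_text() if path.exists() else ""
    lines = [f"{ip} {name}" for name, ip in mappings.items() if name not in existing]
    if lines:
        with path.open("a") as f:
            f.write("\n".join(lines) + "\n")

# Typical hosts file location on the Windows work environment:
# append_host_entries(r"C:\Windows\System32\drivers\etc\hosts", NODE_MAPPINGS)
```

Running the helper twice is harmless: entries that already exist in the file are skipped.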

----End

5.4 Viewing Running Tasks in a Cluster

Scenario

After you trigger a running task on MRS Manager, the task running process and progress are displayed. After the task window is closed, you need to use the task management function to reopen the task window.


By default, MRS Manager keeps the records of the latest 10 running tasks, such as restarting services, synchronizing service configurations, and performing health checks.

Procedure

Step 1 On the MRS Manager portal, click the task management icon to open Task List.

You can view the following information under Task List: Name, Status, Progress, Start Time, and End Time.

Step 2 Click the name of a specified task and view details about the task execution process.

----End

5.5 Monitoring Management

5.5.1 Viewing the System Overview

Scenario

You can view basic statistics about services and clusters on the MRS Manager portal.

Procedure

Step 1 On the MRS Manager portal, choose Dashboard > Real-Time Monitoring.

l The Health Status and Roles of each service are displayed in Service Summary.
l The following statistics about host indicators are displayed:
– Cluster Host Health Status
– Host Network Read Speed Distribution
– Host Network Write Speed Distribution
– Cluster Disk Information
– Host Disk Usage Distribution
– Cluster Memory Usage
– Host Memory Usage Distribution
– Host CPU Usage Distribution
– Average Cluster CPU Usage
Click Customize to display customized statistics.

Step 2 Set an interval for automatic page refreshing or click the refresh icon to refresh immediately.

The following parameters are supported:

l Refresh every 30 seconds: refreshes the page once every 30 seconds.
l Refresh every 60 seconds: refreshes the page once every 60 seconds.
l Stop refreshing: stops page refreshing.


NOTE

Selecting Full screen maximizes the Real-Time Monitoring window.

----End

5.5.2 Configuring a Monitoring History Report

Scenario

On MRS Manager, the nodes where roles are deployed in a cluster can be classified into management nodes, control nodes, and data nodes. Change trends of key host monitoring indicators on each type of node can be calculated and displayed as curve charts in reports based on user-defined periods. If a host belongs to multiple types of nodes, the indicator statistics will be collected several times.

You can view, customize, and export node monitoring indicator reports on MRS Manager.

Procedure

Step 1 View a monitoring indicator report.

1. On MRS Manager, click Dashboard.
2. Click Historical Report to view the report.

By default, the report displays the monitoring indicator statistics of the previous day.

NOTE

Selecting Full screen maximizes the Historical Report window.

Step 2 Customize a monitoring indicator report.

1. Click Customize and select the monitoring indicators to be displayed on MRS Manager. The following indicators are supported, and the page displays a maximum of six customized indicators:
– Cluster Network Read Speed Statistics
– Cluster Disk Write Speed Statistics
– Cluster Disk Usage Statistics
– Cluster Disk Information
– Cluster Disk Read Speed Statistics
– Cluster Memory Usage Statistics
– Cluster Network Write Speed Statistics
– Cluster CPU Usage Statistics

2. Click OK to save the settings and view the selected indicators.

NOTE

Click Clear to deselect all the indicators.

Step 3 Export a monitoring indicator report.

1. Select a period. The options are Last day, Last week, Last month, Last quarter, and Last half year. You can define the start time and end time in Time Range.


2. Click Export to generate a report file for the selected cluster monitoring indicators in the specified period, and select a storage location to save the file.

NOTE

To view the curve charts of the monitoring indicators in a specified period, click View.

----End

5.5.3 Managing Service and Host Monitoring

Scenario

On MRS Manager, you can manage status and indicator information for all services (including role instances) and hosts.

l Status information, including operation, health, configuration, and role instance status.

l Information about key monitoring indicators of services.

l Monitoring indicator exports.

NOTE

You can set an interval for automatic page refreshing or click the refresh icon to refresh the page immediately.

The following parameters are supported:

l Refresh every 30 seconds: refreshes the page once every 30 seconds.

l Refresh every 60 seconds: refreshes the page once every 60 seconds.

l Stop refreshing: stops page refreshing.

Managing Service Monitoring

Step 1 On MRS Manager, click Service to view the status of all services.

The service list includes Service, Operating Status, Health Status, Configuration Status, Roles, and Operation.

l Table 5-5 describes service operating status.

Table 5-5 Service operating status

Status Description

Started Indicates that the service is started.

Stopped Indicates that the service is stopped.

Failed to start Indicates that the service fails to be started.

Failed to stop Indicates that the service fails to be stopped.

Unknown Indicates the initial service status after the background system restarts.

l Table 5-6 describes service health status.


Table 5-6 Service health status

Status Description

Good Indicates that all role instances in the service are running properly.

Bad Indicates that at least one role instance in the service is in the Bad state or that the dependent service is abnormal.

Unknown Indicates that all role instances in the service are in the Unknown state.

Concerning Indicates that the background system is restarting the service.

Partially Healthy Indicates that the service that this service depends on is abnormal and the interfaces of the abnormal service cannot be invoked externally.

l Table 5-7 describes service configuration status.

Table 5-7 Service configuration status

Status Description

Synchronized Indicates that the latest configuration has taken effect.

Expired Indicates that the latest configuration has not taken effect after the parameter modification. You need to restart the related services.

Failed Indicates that communication is abnormal or data cannot be read or written during the parameter configuration. Try clicking Synchronize Configuration to recover the previous configuration.

Configuring Indicates that the parameter is being configured.

Unknown Indicates that the current configuration status cannot be obtained.

By default, services are displayed in ascending order by Service. You can click Service, Operating Status, Health Status, or Configuration Status to change the display mode.

Step 2 Click the target service in the service list to view its status and indicator information.

Step 3 Customize monitoring indicators and export customized monitoring information.

1. In the Real-Time Statistics area, click Customize to customize key monitoring indicators.

2. Click History to display the page in which you can query historical monitoring information.

3. Select a time period, and click View to display the monitoring data in the specified time period.


4. Click Export to export the displayed indicator information.

----End

Managing Role Instance Monitoring

Step 1 On MRS Manager, click Service, and click the target service in the service list.

Step 2 Click Instance to view the role instance status.

The role instance list includes Role, Host Name, OM IP Address, Business IP Address, Rack, Operating Status, Health Status, and Configuration Status.

l Table 5-8 describes role instance operating status.

Table 5-8 Role instance operating status

Status Description

Started Indicates that the role instance is started.

Stopped Indicates that the role instance is stopped.

Failed to start Indicates that the role instance fails to be started.

Failed to stop Indicates that the role instance fails to be stopped.

Decommissioning Indicates that the role instance is being decommissioned.

Decommissioned Indicates that the role instance has been decommissioned.

Recommissioning Indicates that the role instance is being recommissioned.

Unknown Indicates the initial role instance status after the background system restarts.

l Table 5-9 describes role instance health status.

Table 5-9 Role instance health status

Status Description

Good Indicates that the role instance is running properly.

Bad Indicates that the role instance is running abnormally. For example, a port cannot be accessed because the PID does not exist.

Unknown Indicates that the host on which the role instance is running does not connect to the background system.

Concerning Indicates that the background system is restarting the role instance.

l Table 5-10 describes role instance configuration status.


Table 5-10 Role instance configuration status

Status Description

Synchronized Indicates that the latest configuration has taken effect.

Expired Indicates that the latest configuration has not taken effect after the parameter modification. You need to restart the related services.

Failed Indicates that communication is abnormal or data cannot be read or written during the parameter configuration. Try clicking Synchronize Configuration to recover the previous configuration.

Configuring Indicates that the parameter is being configured.

Unknown Indicates that the configuration status cannot be obtained.

By default, roles are displayed in ascending order by Role. You can click Role, Host Name, OM IP Address, Business IP Address, Rack, Operating Status, Health Status, or Configuration Status to change the display mode.

You can filter out all instances of the same role in Role.

Click Advanced Search, set search criteria in the role search area, and click Search to view specified role information. You can click Reset to reset search criteria. Fuzzy search is supported.

Step 3 Click the target role instance in the role instance list to view its status and indicator information.

Step 4 Customize monitoring indicators and export customized monitoring information. The operation process is the same as that of exporting service monitoring indicators.

----End

Managing Host Monitoring

Step 1 On MRS Manager, click Host.

The host list includes Host Name, OM IP Address, Business IP Address, Rack, Network Speed, Operating Status, Health Status, Disk Usage, Memory Usage, and CPU Usage.

l Table 5-11 describes the operating status.

Table 5-11 Host operating status

Status Description

Normal The host and service roles on the host are running properly.

Isolated The host is isolated by the user, and service roles on the host are stopped.


l Table 5-12 describes host health status.

Table 5-12 Host health status

Status Description

Good Indicates that the host can properly send heartbeats.

Bad Indicates that the host fails to send heartbeats due to timeout.

Unknown Indicates the initial status of the host when it is being added.

By default, hosts are displayed in ascending order by Host Name. You can click Host Name, OM IP Address, Business IP Address, Rack, Network Speed, Operating Status, Health Status, Disk Usage, Memory Usage, or CPU Usage to change the display mode.

Click Advanced Search, set search criteria in the host search area, and click Search to view specified host information. You can click Reset to reset search criteria. Fuzzy search is supported.

Step 2 Click the target host in the host list to view its status and indicator information.

Step 3 Customize monitoring indicators and export customized monitoring information.

1. In the Real-Time Statistics area, click Customize to customize key monitoring indicators.

2. Click History to display the page in which you can query historical monitoring information.

3. Select a time period, and click View to display the monitoring data in the specified time period.

4. Click Export to export the displayed indicator information.

----End

5.5.4 Managing Resource Distribution

Scenario

You can query the top value curves, bottom value curves, or average data curves of key service and host monitoring indicators, that is, the resource distribution information, on MRS Manager. MRS Manager allows you to view the monitoring data of the last hour.

You can also modify the resource distribution on MRS Manager to display both the top and bottom value curves in service and host resource distribution figures.

Resource distribution of some monitoring indicators is not recorded.

Procedure

l View the resource distribution of service monitoring indicators.

a. On MRS Manager, click Service.


b. Select the specific service in the service list.

c. Click Resource Distribution.

Select key service indicators in Metric. MRS Manager displays the resource distribution data of the selected service indicators in the last hour.

l View the resource distribution of host monitoring indicators.

a. On MRS Manager, click Host.

b. Click the specific host in the host list.

c. Click Resource Distribution.

Select key host indicators from Metric. MRS Manager displays the resource distribution data of the selected indicators in the last hour.

l Configure resource distribution.

a. On MRS Manager, click System.

b. In Configuration, click Configure Resource Contribution Ranking under Monitoring and Alarm.

Modify the displayed resource distribution quantity.

– Set Number of Top Resources to the number of top values.
– Set Number of Bottom Resources to the number of bottom values.

NOTE

The sum of the number of top and bottom values must not be greater than five.

c. Click OK to save the settings.

The message "Number of top and bottom resources saved successfully" is displayed in the upper-right corner.

5.5.5 Configuring Monitoring Metric Dumping

Scenario

You can configure interconnection parameters on MRS Manager to save monitoring indicator data to a specified FTP server using the FTP or SFTP protocol. In this way, MRS clusters can interconnect with third-party systems. The FTP protocol does not encrypt data, which creates potential security risks. The SFTP protocol is recommended.

MRS Manager supports collecting all monitoring indicator data in managed clusters. The collection period can be 30 seconds, 60 seconds, or 300 seconds. Depending on the collection period, the data is stored in different monitoring files on the FTP server. Monitoring files are named in the following pattern: Cluster name_metric_Monitoring indicator data collection period_File saving time.log.
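As a rough illustration of this naming pattern, the sketch below builds such a file name in Python. The exact timestamp format used by MRS Manager is not specified here, so the `%Y%m%d%H%M%S` format is an assumption for illustration, and `monitoring_file_name` is a hypothetical helper, not an MRS API.

```python
from datetime import datetime

def monitoring_file_name(cluster: str, period_s: int, saved_at: datetime) -> str:
    """Build a name following the documented pattern:
    Cluster name_metric_collection period_file saving time.log.
    The timestamp format is an assumption for illustration."""
    if period_s not in (30, 60, 300):
        raise ValueError("collection period must be 30, 60, or 300 seconds")
    return f"{cluster}_metric_{period_s}_{saved_at:%Y%m%d%H%M%S}.log"

# monitoring_file_name("mrs_demo", 60, datetime(2019, 1, 15, 10, 30, 0))
# -> "mrs_demo_metric_60_20190115103000.log"
```

A consumer on the third-party side can split the name on underscores to recover the cluster name and collection period.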

Prerequisites

l The ECS corresponding to the dump server and the Master node of the MRS cluster are deployed on the same VPC.
l The Master node can access the IP address and specific ports of the dump server.
l The FTP service of the dump server is running properly.


Procedure

Step 1 On MRS Manager, click System.

Step 2 In Configuration, click Configure Monitoring Metric Dump under Monitoring and Alarm.

Step 3 Set the dump parameters listed in Table 5-13.

Table 5-13 Dump parameters

Parameter Description

Dump Monitoring Metric Mandatory. Specifies whether to enable the monitoring metric dump function.
l Enable: enables monitoring metric dump.
l Disable: disables monitoring metric dump.

FTP IP Address Mandatory. Specifies the IP address of the FTP server for storing monitoring files after the interconnection of monitoring indicator data is enabled.

FTP Port Mandatory. Specifies the port for connecting to the FTP server.

FTP Username Mandatory. Specifies the username for logging in to the FTP server.

FTP Password Mandatory. Specifies the password for logging in to the FTP server.

Save Path Mandatory. Specifies the save path of monitoring files on the FTP server.

Dump Interval (s) Mandatory. Specifies the interval for saving monitoring files to the FTP server, in seconds.

Dump Mode Mandatory. Specifies the protocol used to send monitoring files. The options include FTP and SFTP.

SFTP Public Key Optional. Specifies the public key of the FTP server. This parameter is valid only when Dump Mode is set to SFTP. You are advised to set this parameter; otherwise, security risks may exist.

Step 4 Click OK. The parameters are set.

----End


5.6 Alarm Management

5.6.1 Viewing and Manually Clearing an Alarm

Scenario

You can view and manually clear an alarm on MRS Manager.

Generally, the system automatically clears an alarm when the fault that generated the alarm is rectified. If the alarm is not cleared automatically after the fault is rectified, and if the alarm has no impact on the system, you can manually clear the alarm.

On the MRS Manager portal, you can view the most recent 100,000 alarms, including those that have been manually or automatically cleared and those that have not been cleared. If the number of cleared alarms exceeds 100,000 and is about to reach 110,000, the system automatically dumps the earliest 10,000 cleared alarms to the dump path ${BIGDATA_HOME}/OMSV100R001C00x8664/workspace/data on the active management node. The directory will be automatically generated when alarms are dumped for the first time.

NOTE

You can set an interval for automatic page refreshing or click the refresh icon to refresh the page immediately.

The following parameters are supported:

l Refresh every 30 seconds: refreshes the page once every 30 seconds.

l Refresh every 60 seconds: refreshes the page once every 60 seconds.

l Stop refreshing: stops page refreshing.

Procedure

Step 1 On MRS Manager, click Alarm and view the alarm information.

l By default, alarms are displayed in descending order by Generated On. You can click Alarm ID, Alarm Name, Severity, Generated On, Location, or Operation to change the display mode.

l You can filter out all alarms of the same severity in Severity, including cleared alarms and uncleared alarms.

Step 2 Click Advanced Search. In the displayed alarm search area, set search criteria and click Search to view the information about specified alarms. Click Reset to reset search criteria.

NOTE

You can set Start Time and End Time to specify the time range when alarms are generated.

Rectify the fault by referring to the help information. If the alarms are generated due to other cloud services on which MRS depends, you need to contact the maintenance personnel of the relevant cloud services.

Step 3 If the alarm needs to be manually cleared, click Clear Alarm.


NOTE

If you want to clear multiple alarms, select those you want to clear and click Clear Alarm to clear them in batches. A maximum of 300 alarms can be cleared each time.

----End

5.6.2 Configuring an Alarm Threshold

Scenario

You can configure an alarm threshold to monitor the health status of an indicator. After Send Alarm is selected, the system sends an alarm message when the monitored data reaches the alarm threshold. You can view the alarm information in Alarm.

Procedure

Step 1 On MRS Manager, click System.

Step 2 In Configuration, click Configure Alarm Threshold under Monitoring and Alarm.

Step 3 Click an indicator, for example, CPU Usage, and click Create Rule.

Step 4 Set the parameters for monitoring indicator rules.

Table 5-14 Parameters for monitoring indicator rules

Rule Name (example: CPU_MAX) Specifies the rule name.

Reference Date (example: 3/18/2017) Specifies the date on which the reference indicator history is generated.

Threshold Type (Max. value or Min. value) Specifies whether to use the maximum or minimum value of the indicator for setting the threshold.
l If this parameter is set to Max. value, an alarm is generated when the actual value of the indicator is greater than the threshold.
l If this parameter is set to Min. value, an alarm is generated when the actual value of the indicator is smaller than the threshold.

Alarm Severity (Critical, Major, Minor, or Warning) Specifies the alarm severity.

Time Range (example: from 00:00 to 23:59) Specifies the period in which the rule takes effect.

Threshold (example: 80) Specifies the threshold of the rule monitoring indicator.

Date (Workday, Weekend, or Other) Specifies the days on which the rule takes effect.

Add Date (example: 11/06) This parameter takes effect when Date is set to Other. You can select multiple dates.

Step 5 Click OK. The message "Rule saved successfully" is displayed in the upper-right corner.

Send Alarm is selected by default. MRS Manager checks whether the values of monitoring indicators meet the threshold requirements. If the number of times that the values do not meet the threshold requirements during consecutive checks exceeds the value of Trigger Count, an alarm will be sent. The value of Trigger Count can be customized. Check Period (s) specifies the interval for MRS Manager to check monitoring indicators.
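The Trigger Count behaviour described above can be sketched as a simple consecutive-violation counter. This is an illustrative model only, not MRS Manager's actual implementation; `should_alarm` and its parameters are hypothetical names.

```python
def should_alarm(samples, threshold, trigger_count, threshold_type="max"):
    """Return True once the threshold is violated on at least
    trigger_count consecutive checks (illustrative model of the
    Trigger Count behaviour, not MRS code)."""
    consecutive = 0
    for value in samples:  # one sampled value per Check Period
        violated = value > threshold if threshold_type == "max" else value < threshold
        consecutive = consecutive + 1 if violated else 0
        if consecutive >= trigger_count:
            return True
    return False

# CPU usage sampled every check period, threshold 80, trigger count 3:
# should_alarm([75, 85, 90, 95], 80, 3) -> True  (three consecutive violations)
# should_alarm([85, 70, 90, 95], 80, 3) -> False (the run is broken by 70)
```

Note how a single healthy sample resets the counter, which is why a short Check Period combined with a higher Trigger Count filters out transient spikes.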

Step 6 In the row that contains the newly added rule, click Apply in the Operation column. If a dialog box indicating that rule xx is applied successfully is displayed in the upper-right corner, the rule is added, and the icon turns green, indicating that the operation is complete. To cancel applying the rule, click Cancel in the Operation column; a dialog box indicating that rule xx is canceled successfully is displayed in the upper-right corner.

----End

5.6.3 Configuring Syslog Northbound Interface

Scenario

You can configure the northbound interface so that alarms generated on MRS Manager can be reported to your monitoring O&M system using Syslog.

The Syslog protocol is not encrypted. Therefore, data can be easily stolen during transmission. This represents a significant security risk.
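For reference, the severity values configured for this interface map onto the numeric codes defined by the syslog standard (RFC 3164), where each message begins with a priority value computed as facility * 8 + severity. The sketch below shows that calculation; the message layout here is a simplified illustration, not the exact format MRS Manager reports.

```python
# Severity codes from RFC 3164, the standard underlying the Syslog protocol.
SEVERITIES = {"Emergency": 0, "Alert": 1, "Critical": 2, "Error": 3,
              "Warning": 4, "Notice": 5, "Informational": 6, "Debug": 7}

def syslog_priority(facility_code: int, severity: str) -> int:
    """PRI value that prefixes a syslog message: facility * 8 + severity."""
    return facility_code * 8 + SEVERITIES[severity]

def format_message(facility_code: int, severity: str, identifier: str, text: str) -> str:
    """Minimal RFC 3164-style payload (timestamp and hostname omitted for brevity)."""
    return f"<{syslog_priority(facility_code, severity)}>{identifier}: {text}"

# format_message(16, "Warning", "MRS Manager", "ALM-12345 example alarm")
# -> "<132>MRS Manager: ALM-12345 example alarm"
```

Knowing this mapping helps when configuring the receiving O&M system to parse the Severity and Facility values selected in the table below.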


Prerequisites

l The ECS corresponding to the interconnected server and the Master node of the MRS cluster are deployed on the same VPC.
l The Master node can access the IP address and specific ports of the interconnected server.
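The second prerequisite can be verified from the Master node with a quick TCP probe. The address and port below are placeholders for your interconnected server (this is a generic bash check, not an MRS-provided tool; /dev/tcp and timeout are bash/coreutils features):

```shell
# Probe whether the interconnected server's Syslog port is reachable.
# 192.0.2.10 and 514 are placeholder values; substitute your server's.
SYSLOG_HOST=192.0.2.10
SYSLOG_PORT=514
if timeout 2 bash -c "exec 3<>/dev/tcp/${SYSLOG_HOST}/${SYSLOG_PORT}" 2>/dev/null; then
  status=reachable
else
  status=unreachable
fi
echo "$SYSLOG_HOST:$SYSLOG_PORT is $status"
```

A result of "unreachable" usually points at a security group, firewall, or VPC routing issue between the Master node and the server.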

Procedure

Step 1 On MRS Manager, click System.

Step 2 In Configuration, click Configure Syslog under Monitoring and Alarm.

The switch of the Syslog Service is disabled by default. Click the switch to enable the Syslog service.

Step 3 On the displayed page, set Syslog parameters listed in Table 5-15:

Table 5-15 Description of Syslog parameters

Area Parameter Description

Syslog Protocol Server IP Address Specifies the IP address of the interconnected server.

Server Port Specifies the port number for interconnection.

Protocol Specifies the protocol type. Possible values are TCP and UDP.

Severity Specifies the message severity. Possible values are Informational, Emergency, Alert, Critical, Error, Warning, Notice, and Debug.

Facility Specifies the module where the log is generated.

Identifier Specifies the product. The default value is MRS Manager.



Report Message Report Format Specifies the message format of alarms. For details about the format requirements, see the help information on the WebUI.

Report Alarm Type Specifies the type of alarms to be reported. Possible values are:
l Fault: Syslog alarm information is reported when Manager generates an alarm.
l Clear: Syslog alarm information is reported when Manager clears an alarm.
l Event: Syslog alarm information is reported when Manager generates an event.

Report Alarm Severity Specifies the severity of alarms to be reported. Possible values are Warning, Minor, Major, and Critical.

Uncleared Alarm Reporting Periodic Uncleared Alarm Reporting Specifies whether uncleared alarms are reported periodically. The switch of Periodic Uncleared Alarm Reporting is disabled by default. Click the switch to enable the function.

Report Interval (min) Specifies the interval for periodic alarm reporting. This parameter is available only when Periodic Uncleared Alarm Reporting is enabled. The interval is measured in minutes and the default value is 15. The value range is 5 to 1440.



Heartbeat Settings Heartbeat Report Specifies whether periodic reporting of Syslog heartbeat messages is enabled. The switch of Heartbeat Report is disabled by default. Click the switch to enable the function.

Heartbeat Period (min) Specifies the interval for periodic heartbeat reporting. This parameter is available only when Heartbeat Report is enabled. The unit of the interval is minute and the default value is 15. The value range is 1 to 60.

Heartbeat Packet Specifies the heartbeat report content. This parameter is available only when Heartbeat Report is enabled. The identifier cannot be empty. The value can contain a maximum of 256 characters, including only digits, letters, underscores (_), vertical bars (|), colons (:), commas (,), periods (.), and spaces.

NOTE

When heartbeat packets are reported periodically, packet reporting may be interrupted while a cluster automatically recovers from a fault (for example, an active/standby management node switchover). Wait until the recovery is complete.

Step 4 Click OK to complete the settings.

----End

5.6.4 Configuring SNMP Northbound Interface

Scenario

You can integrate the alarm and monitoring data of MRS Manager into the network management system (NMS) using the Simple Network Management Protocol (SNMP).


Prerequisites

l The ECS corresponding to the interconnected server and the Master node of the MRS cluster are deployed on the same VPC.
l The Master node can access the IP address and specific ports of the interconnected server.

Procedure

Step 1 On MRS Manager, click System.

Step 2 In Configuration, click Configure SNMP under Monitoring and Alarm.

The switch of the SNMP Service is disabled by default. Click the switch to enable the SNMP service.

Step 3 On the displayed page, set SNMP parameters listed in Table 5-16:

Table 5-16 Description of SNMP parameters

Parameter Description

Version Specifies the version of the SNMP protocol. Possible values are:
l v2c: an earlier version of SNMP with low security
l v3: the latest version of SNMP with higher security than SNMPv2c
SNMPv3 is recommended.

Local Port Specifies the local port number. The default value is 20000. The value ranges from 1025 to 65535.

Read-Only Community Specifies the read-only community name. This parameter is valid when Version is set to v2c.

Read-Write Community Specifies the read-write community name. This parameter is valid when Version is set to v2c.

Security Username Specifies the SNMP security username. This parameter is valid when Version is set to v3.

Authentication Protocol Specifies the authentication protocol. You are advised to set this parameter to SHA. This parameter is valid when Version is set to v3.

Authentication Password Specifies the authentication key. This parameter is valid when Version is set to v3.

Confirm Password Used to confirm the authentication key. This parameter is valid when Version is set to v3.

Encryption Protocol Specifies the encryption protocol. You are advised to set this parameter to AES256. This parameter is valid when Version is set to v3.

Encryption Password Specifies the encryption key. This parameter is valid when Version is set to v3.



Confirm Password Used to confirm the encryption key. This parameter is valid when Version is set to v3.

NOTE

l The values of Authentication Password and Encryption Password must contain 8 to 16 characters. At least three of the following character types must be used: uppercase letters, lowercase letters, digits, and special characters. The two passwords must be different and cannot be the same as the security username or the security username written backwards.

l To ensure security, periodically change the values of Authentication Password and Encryption Password if the SNMP protocol is used.

l If SNMPv3 is used, a security user is locked after five consecutive authentication failures within 5 minutes. The user is unlocked 5 minutes later.

Step 4 Click Create Trap Target under Trap Target, and set the following parameters in the Create Trap Target dialog box:
l Target Symbol
Specifies the ID of the Trap target. This is generally the ID of the network management system or host that receives the Trap. The value consists of 1 to 255 characters, including letters and digits.

l Target IP Address
Specifies the target IP address. The value of this parameter can be set to a class A, B, or C IP address and must be able to communicate with the IP address of the management plane on the management node.

l Target Port
Specifies the port that receives the Trap. The value of this parameter must be the same as that on the peer end and ranges from 0 to 65535.

l Trap Community
Specifies the trap community name. This parameter is valid when Version is set to v2c.

Click OK to finish the settings and exit the Create Trap Target dialog box.

Step 5 Click OK.

----End

5.7 Alarm Reference

5.7.1 ALM-12001 Audit Log Dump Failure

Description

Cluster audit logs need to be dumped to a third-party server because of the local historical data backup policy. Audit logs can be successfully dumped if the dump server meets the configuration conditions. This alarm is generated when the audit log dump fails because the disk space of the dump directory on the third-party server is insufficient or a user has changed the username, password, or dump directory of the dump server.


Attribute

Alarm ID Alarm Severity Automatically Cleared

12001 Minor Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm isgenerated.

RoleName Specifies the role for which the alarm isgenerated.

HostName Specifies the host for which the alarm isgenerated.

Impact on the System

The system can store a maximum of 50 dump files locally. If the fault persists on the dump server, local audit logs may be lost.

Possible Causes

l The network connection is abnormal.
l The username, password, or dump directory of the dump server does not meet the configuration conditions.
l The disk space of the dump directory is insufficient.
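The last cause, insufficient disk space in the dump directory, can be checked on the third-party server with df. The directory and the 1 GB threshold below are illustrative assumptions, not MRS defaults:

```shell
# Report whether the dump directory has at least 1 GB free.
# /tmp stands in for the real dump directory on the dump server.
dump_dir=/tmp
avail_kb=$(df -Pk "$dump_dir" | awk 'NR == 2 {print $4}')   # available KB
if [ "$avail_kb" -lt 1048576 ]; then
  echo "low space: ${avail_kb} KB available in $dump_dir"
else
  echo "space OK: ${avail_kb} KB available in $dump_dir"
fi
```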

Procedure

Step 1 Check whether the username, password, and dump directory are correct.

1. On the dump configuration page of MRS Manager, check whether the username, password, and dump directory are correct.
– If yes, go to Step 3.
– If no, go to Step 1.2.

2. Change the username, password, or dump directory, and click OK.
3. Wait 2 minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 2.

Step 2 Reset the dump rule.

1. On MRS Manager, choose System > Dump Audit Log.
2. Reset the dump rules, set the parameters properly, and click OK.
3. Wait 2 minutes and check whether the alarm is cleared.


– If yes, no further action is required.
– If no, go to Step 3.

Step 3 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.2 ALM-12002 HA Resource Is Abnormal

Description

The high availability (HA) software periodically checks the WebService floating IP addresses and databases of Manager. This alarm is generated when any of them is abnormal.

This alarm is cleared when the HA software detects that the floating IP addresses or databases are in the normal state.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12002 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm isgenerated.

RoleName Specifies the role for which the alarm isgenerated.

HostName Specifies the host for which the alarm isgenerated.

RESName Specifies the resource for which the alarm isgenerated.

Impact on the System

If the WebService floating IP addresses of Manager are abnormal, users cannot log in to or use MRS Manager. If databases are abnormal, all core services and related service processes, such as alarm and monitoring functions, are affected.


Possible Causes

l The floating IP address is abnormal.
l An exception occurs in the database.

Procedure

Step 1 Check the status of the floating IP address on the active management node.

1. In the alarm list on MRS Manager, locate the row that contains the alarm, and view its host address and resource name in the alarm details.

2. Log in to the active management node. Run the following commands to switch the user:
sudo su - root
su - omm

3. Go to the ${BIGDATA_HOME}/om-0.0.1/sbin/ directory and run the status-oms.sh script to check whether the floating IP address of the active Manager is normal. In the command output, locate the row where ResName is floatip, and check whether the following information is displayed.
For example:
10-10-10-160 floatip Normal Normal Single_active

– If yes, go to Step 2.
– If no, go to Step 1.4.
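The floatip check in step 3 can be scripted by parsing the status-oms.sh output. The sample line below is the example from the text, and the field positions are an assumption about that output's layout:

```shell
# Extract the state column for the floatip resource from status output.
# The sample line mirrors the example above; replace it with the real
# output of status-oms.sh on your node.
status_output="10-10-10-160 floatip Normal Normal Single_active"
state=$(echo "$status_output" | awk '$2 == "floatip" {print $3}')
if [ "$state" = "Normal" ]; then
  echo "floatip is normal"
else
  echo "floatip is abnormal: $state"
fi
```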

4. Contact the O&M personnel to check whether the NIC configured with the floating IP address exists.
– If yes, go to Step 2.
– If no, go to Step 1.5.

5. Contact the O&M personnel to rectify the NIC fault. Wait 5 minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 2.

Step 2 Check the database status of the active and standby management nodes.

1. Log in to the active and standby management nodes, run the sudo su - root and su - ommdba commands to switch to user ommdba, and then run the gs_ctl query command. Check whether the following information is displayed in the command output.
Command output of the active management node:
Ha state:
    LOCAL_ROLE: Primary
    STATIC_CONNECTIONS: 1
    DB_STATE: Normal
    DETAIL_INFORMATION: user/password invalid
Senders info: No information
Receiver info: No information
Command output of the standby management node:
Ha state:
    LOCAL_ROLE: Standby
    STATIC_CONNECTIONS: 1
    DB_STATE: Normal
    DETAIL_INFORMATION: user/password invalid
Senders info: No information
Receiver info: No information

– If yes, go to Step 2.3.
– If no, go to Step 2.2.

2. Contact the O&M personnel to check for and rectify network faults.
– If yes, go to Step 2.3.


– If no, go to Step 3.

3. Wait 5 minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 3.

Step 3 Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.3 ALM-12004 OLdap Resource Is Abnormal

Description

This alarm is generated when the Ldap resource in Manager is abnormal and is cleared after the Ldap resource in Manager recovers and the alarm handling is complete.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12004 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm isgenerated.

RoleName Specifies the role for which the alarm isgenerated.

HostName Specifies the host for which the alarm isgenerated.

Impact on the System

The OLdap resources are abnormal and the Manager authentication service is unavailable. As a result, security authentication and user management functions cannot be provided for upper-layer web services. Users may be unable to log in to Manager.

Possible Causes

The LdapServer process in Manager is abnormal.


Procedure

Step 1 Check whether the LdapServer process in Manager is in the normal state.

1. Log in to the active management node.
2. Run ps -ef | grep slapd to check whether the LdapServer resource process in the ${BIGDATA_HOME}/om-0.0.1/ directory of the configuration file is running properly. You can determine that the resource is normal by checking the following information:

a. Run sh ${BIGDATA_HOME}/om-0.0.1/sbin/status-oms.sh. You can see that ResHAStatus of the OLdap process is Normal.

b. Run ps -ef | grep slapd. You can see that the slapd process occupies port 21750.

– If yes, go to Step 2.
– If no, go to Step 3.

Step 2 Run kill -2 PID of the LdapServer process and wait 20 seconds. The HA starts the OLdap process automatically. Check whether the OLdap resource is in the normal state.
l If yes, no further action is required.
l If no, go to Step 3.
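Step 2's kill -2 (SIGINT) sequence looks like the following sketch. A background sleep stands in for the slapd process so the commands can run anywhere; on a real node you would obtain the PID from ps -ef | grep slapd instead:

```shell
# Send SIGINT to a process and confirm it exited, mirroring Step 2.
set -m                         # enable job control so the background child
                               # keeps the default SIGINT disposition
sleep 60 &                     # stand-in for the LdapServer (slapd) process
pid=$!
kill -2 "$pid"                 # kill -2 sends SIGINT, as in Step 2
wait "$pid" 2>/dev/null        # reap the process; ignore its exit status
if ! kill -0 "$pid" 2>/dev/null; then
  echo "process $pid has exited"
fi
```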

Step 3 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.4 ALM-12005 OKerberos Resource Is Abnormal

Description

The alarm module monitors the status of the Kerberos resource in Manager. This alarm is generated when the Kerberos resource is abnormal and is cleared after the Kerberos resource recovers and the alarm handling is complete.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12005 Major Yes


Parameters

Parameter Description

ServiceName Specifies the service for which the alarm isgenerated.

RoleName Specifies the role for which the alarm isgenerated.

HostName Specifies the host for which the alarm isgenerated.

Impact on the System

The Kerberos resources are abnormal and the Manager authentication service is unavailable. As a result, the security authentication function cannot be provided for upper-layer web services. Users may be unable to log in to Manager.

Possible Causes

The OLdap resource on which OKerberos depends is abnormal.

Procedure

Step 1 Check whether the OLdap resource on which the OKerberos depends is abnormal in Manager.

1. Log in to the active management node.
2. Run the following command to check whether the OLdap resource managed by HA is in the normal state:
sh ${BIGDATA_HOME}/OMSV100R001C00x8664/workspace0/ha/module/hacom/script/status_ha.sh
The OLdap resource is in the normal state when it is in the Active_normal state on the active node and in the Standby_normal state on the standby node.
– If yes, go to Step 3.
– If no, go to Step 2.

Step 2 See ALM-12004 OLdap Resource Is Abnormal for further assistance. After the OLdap resource status recovers, check whether the OKerberos resource is in the normal state.
l If yes, no further action is required.
l If no, go to Step 3.

Step 3 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A


5.7.5 ALM-12006 Node Fault

Description

Controller checks the NodeAgent status every 30 seconds. This alarm is generated when Controller fails to receive the status report of a NodeAgent three consecutive times and is cleared when Controller properly receives the status report of the NodeAgent.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12006 Critical Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm isgenerated.

RoleName Specifies the role for which the alarm isgenerated.

HostName Specifies the host for which the alarm isgenerated.

Impact on the System

Services on the node are unavailable.

Possible Causes

The network is disconnected or the hardware is faulty.

Procedure

Step 1 Check whether the network is disconnected or the hardware is faulty.

1. In the alarm list on MRS Manager, locate the row that contains the alarm, and view its host address in the alarm details.

2. Log in to the active management node.
3. Run the following command to check whether the faulty node is reachable:
ping IP address of the faulty host
– If yes, go to Step 2.
– If no, go to Step 1.4.
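The reachability test in step 3 can be wrapped in a small check. 127.0.0.1 below is a stand-in for the faulty host's IP address:

```shell
# Ping a host once and report reachability, as in step 3 above.
host=127.0.0.1                  # replace with the faulty host's IP address
if ping -c 1 -W 2 "$host" > /dev/null 2>&1; then
  result=reachable
else
  result=unreachable
fi
echo "$host is $result"
```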

4. Contact the O&M personnel to check whether a network fault occurs and rectify the fault.


– If yes, go to Step 2.
– If no, go to Step 1.6.

5. Rectify the network fault and check whether the alarm is cleared from the alarm list.
– If yes, no further action is required.
– If no, go to Step 1.6.

6. Contact the O&M personnel to check whether a hardware fault (for example, a CPU fault or memory fault) occurs on the node.
– If yes, go to Step 1.7.
– If no, go to Step 2.

7. Repair the faulty components and restart the node. Check whether the alarm is cleared from the alarm list.
– If yes, no further action is required.
– If no, go to Step 2.

Step 2 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.6 ALM-12007 Process Fault

Description

The process health check module checks the process status every 5 seconds. This alarm is generated when the process health check module detects that the process connection status is Bad three consecutive times and is cleared when the process can be connected.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12007 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm isgenerated.

RoleName Specifies the role for which the alarm isgenerated.


Parameter Description

HostName Specifies the host for which the alarm isgenerated.

Impact on the System

The service provided by the process is unavailable.

Possible Causes

l The instance process is abnormal.
l The disk space is insufficient.

Procedure

Step 1 Check whether the instance process is abnormal.

1. In the alarm list on MRS Manager, locate the row that contains the alarm, and view its host name and service name in the alarm details.

2. On the Alarms page, check whether alarm ALM-12006 Node Fault is generated.
– If yes, go to Step 1.3.
– If no, go to Step 1.4.

3. See the procedure in ALM-12006 Node Fault to handle the alarm.
4. Check whether the installation directory user, user group, and permission of the alarm role are correct. The user, user group, and permission must be omm:ficommon 750.
– If yes, go to Step 1.6.
– If no, go to Step 1.5.

5. Run the following commands to set the permission to 750 and User:Group to omm:ficommon:
chmod 750 <folder_name>
chown omm:ficommon <folder_name>
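The result of step 5 can be verified with stat. The directory below is a scratch example, GNU stat is assumed, and on a real node you would also confirm the owner with stat -c '%U:%G' (expected: omm:ficommon):

```shell
# Create a scratch directory, apply mode 750, and verify it with stat.
dir=$(mktemp -d)                # stands in for the role installation directory
chmod 750 "$dir"
perm=$(stat -c '%a' "$dir")     # numeric mode, e.g. 750 (GNU stat syntax)
if [ "$perm" = "750" ]; then
  echo "mode OK ($perm)"
else
  echo "mode wrong ($perm)"
fi
rmdir "$dir"
```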

6. Wait 5 minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 2.1.

Step 2 Check whether the disk space is insufficient.

1. On MRS Manager, check whether the alarm list contains ALM-12017 Insufficient Disk Capacity.
– If yes, go to Step 2.2.
– If no, go to Step 3.

2. See the procedure in ALM-12017 Insufficient Disk Capacity to handle the alarm.
3. Wait 5 minutes and check whether the alarm is cleared.
– If yes, go to Step 2.4.
– If no, go to Step 3.


4. Wait 5 minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 3.

Step 3 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.7 ALM-12010 Manager Heartbeat Interruption Between the Active and Standby Nodes

Description

This alarm is generated when the active Manager does not receive a heartbeat signal from the standby Manager for 7 seconds. It is cleared when the active Manager receives a heartbeat signal from the standby Manager.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12010 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm isgenerated.

RoleName Specifies the role for which the alarm isgenerated.

HostName Specifies the host for which the alarm isgenerated.

Local Manager HA Name Specifies a local Manager HA.

Peer Manager HA Name Specifies a peer Manager HA.

Impact on the System

When the active Manager process is abnormal, an active/standby failover cannot be performed, and services are affected.


Possible Causes

The link between the active and standby Managers is abnormal.

Procedure

Step 1 Check whether the network between the active and standby Manager servers is in the normal state.

1. In the alarm list on MRS Manager, locate the row that contains the alarm, and view the IP address of the standby Manager server in the alarm details.

2. Log in to the active management node.
3. Run the following command to check whether the standby Manager is reachable:
ping heartbeat IP address of the standby Manager
– If yes, go to Step 2.
– If no, go to Step 1.4.

4. Contact the O&M personnel to check whether the network is faulty.
– If yes, go to Step 1.5.
– If no, go to Step 2.

5. Rectify the network fault and check whether the alarm is cleared from the alarm list.
– If yes, no further action is required.
– If no, go to Step 2.

Step 2 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.8 ALM-12011 Manager Data Synchronization Exception Between the Active and Standby Nodes

Description

This alarm is generated when the standby Manager fails to synchronize files with the active Manager and is cleared when the synchronization succeeds.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12011 Critical Yes


Parameters

Parameter Description

ServiceName Specifies the service for which the alarm isgenerated.

RoleName Specifies the role for which the alarm isgenerated.

HostName Specifies the host for which the alarm isgenerated.

Local Manager HA Name Specifies a local Manager HA.

Peer Manager HA Name Specifies a peer Manager HA.

Impact on the System

Because the configuration files on the standby Manager are not updated, some configurations will be lost after an active/standby switchover. Manager and some components may not run properly.

Possible Causes

The link between the active and standby Managers is interrupted.

Procedure

Step 1 Check whether the network between the active and standby Manager servers is in the normal state.

1. In the alarm list on MRS Manager, locate the row that contains the alarm, and view the IP address of the standby Manager in the alarm details.

2. Log in to the active management node. Run the following command to check whether the standby Manager is reachable:

ping IP address of the standby Manager

– If yes, go to Step 2.
– If no, go to Step 1.3.

3. Contact the O&M personnel to check whether the network is faulty.

– If yes, go to Step 1.4.

– If no, go to Step 2.

4. Rectify the network fault and check whether the alarm is cleared from the alarm list.

– If yes, no further action is required.

– If no, go to Step 2.

Step 2 Collect fault information.

1. On MRS Manager, choose System > Export Log.


2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.9 ALM-12012 NTP Service Is Abnormal

Description

This alarm is generated when the NTP service on the current node fails to synchronize time with the NTP service on the active OMS node. It is cleared when they succeed in synchronizing time.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12012 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm isgenerated.

RoleName Specifies the role for which the alarm isgenerated.

HostName Specifies the host for which the alarm isgenerated.

Impact on the System

The time on the node is inconsistent with the time on other nodes in the cluster. Therefore, some MRS applications on the node may not run properly.

Possible Causes

l The NTP service on the current node cannot start properly.
l The current node fails to synchronize time with the NTP service on the active OMS node.
l The key value authenticated by the NTP service on the current node is inconsistent with that on the active OMS node.
l The time offset between the node and the NTP service on the active OMS node is large.


Procedure

Step 1 Check the NTP service on the current node.

1. Check whether the ntpd process is running on the node. Log in to the node and run the sudo su - root command to switch the user. Run the following command to check whether the command output contains the ntpd process:
ps -ef | grep ntpd | grep -v grep
– If yes, go to Step 2.1.
– If no, go to Step 1.2.

2. Run service ntp start to start the NTP service.
3. Wait 10 minutes and check whether the alarm is cleared.

– If yes, no further action is required.
– If no, go to Step 2.1.

Step 2 Check whether the current node can synchronize time properly with the NTP service on the active OMS node.

1. Check whether the node can synchronize time with the NTP service on the active OMS node based on Additional Info of the alarm.
– If yes, go to Step 2.2.
– If no, go to Step 3.

2. Check whether the synchronization with the NTP service on the active OMS node is faulty.
Log in to the alarm node and run the sudo su - root command to switch the user. Then run the ntpq -np command.
In the command output, if an asterisk (*) appears before the IP address of the NTP service on the active OMS node, the synchronization is in the normal state. The command output is as follows:
remote refid st t when poll reach delay offset jitter
==============================================================================
*10.10.10.162 .LOCL. 1 u 1 16 377 0.270 -1.562 0.014
If no asterisk (*) appears before the IP address of the NTP service on the active OMS node and the value of refid is .INIT., the synchronization is abnormal. The command output is as follows:
remote refid st t when poll reach delay offset jitter
==============================================================================
10.10.10.162 .INIT. 1 u 1 16 377 0.270 -1.562 0.014
– If yes, go to Step 2.3.
– If no, go to Step 3.
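The decision rule in step 2.2 (an asterisk means a synchronized peer, refid .INIT. means the peer never answered) can be expressed as a small classifier. The sample lines are the ones from the text; treating only these two patterns is a simplification of ntpq's full tally codes:

```shell
# Classify an ntpq -np peer line according to the rules above.
classify_peer() {
  case "$1" in
    \**)          echo "synchronized" ;;        # '*' marks the selected peer
    *".INIT."*)   echo "not synchronized" ;;    # peer never responded
    *)            echo "state unknown" ;;
  esac
}
ok=$(classify_peer "*10.10.10.162 .LOCL. 1 u 1 16 377 0.270 -1.562 0.014")
bad=$(classify_peer "10.10.10.162 .INIT. 1 u 1 16 377 0.270 -1.562 0.014")
echo "first line: $ok; second line: $bad"
```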

3. Rectify the fault, wait 10 minutes, and then check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 3.
An NTP synchronization failure is usually related to the system firewall. If the firewall can be disabled, disable it and check whether the fault is rectified. If the firewall cannot be disabled, check the firewall configuration policies and ensure that UDP port 123 is enabled. Follow the specific firewall configuration policies of each system.
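The ntpq interpretation in Step 2.2 reduces to two pattern checks: a line starting with an asterisk marks the peer the daemon is synchronized to, and a refid of .INIT. marks an association that never completed. A minimal sketch over captured output (mirroring the normal-state sample above):

```shell
# Classify NTP synchronization state from captured `ntpq -np` output.
# ntpq_output is a captured sample; on a real node use:
#   ntpq_output=$(ntpq -np)
ntpq_output='     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*10.10.10.162    .LOCL.          1 u    1   16  377    0.270   -1.562   0.014'

if printf '%s\n' "$ntpq_output" | grep -q '^\*'; then
  sync_state="normal"      # asterisk: synchronized to this peer
elif printf '%s\n' "$ntpq_output" | grep -q '\.INIT\.'; then
  sync_state="abnormal"    # .INIT. refid: association never completed
else
  sync_state="unknown"
fi
echo "$sync_state"
```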

Step 3 Check whether the key value authenticated by the NTP service on the current node is consistent with that on the active OMS node.

MapReduce Service User Guide 5 MRS Manager Operation Guide

2019-01-15 103


Run cat /etc/ntp.keys to check whether the authentication code with a key value index of 1 is the same as the value of the NTP service on the active OMS node.

- If yes, go to Step 4.1.

- If no, go to Step 5.

Step 4 Check whether the time offset between the node and the NTP service on the active OMS node is large.

1. Check whether the time offset is large in Additional Info of the alarm.

– If yes, go to Step 4.2.

– If no, go to Step 5.

2. On the Hosts page, select the host of the node, and choose More > Stop All Roles to stop all the services on the node.

If the time on the alarm node is earlier than that on the NTP service of the active OMS node, adjust the time on the alarm node to be the same as that on the NTP service of the active OMS node. After doing so, choose More > Start All Roles to start services on the node.

If the time on the alarm node is later than that on the NTP service of the active OMS node, wait until the time offset has elapsed and then adjust the time on the alarm node. After doing so, choose More > Start All Roles to start services on the node.

NOTE

If you adjust the time without waiting for the offset to elapse, data loss may occur.

3. Wait 10 minutes and check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 5.

Step 5 Collect fault information.

1. On MRS Manager, choose System > Export Log.

2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.10 ALM-12016 CPU Usage Exceeds the Threshold

Description

The system checks the CPU usage every 30 seconds and compares it with the threshold. This alarm is generated when the CPU usage exceeds the threshold several times (configurable, 10 times by default) consecutively.

This alarm is cleared when the average CPU usage is less than or equal to 90% of the threshold.


Attribute

Alarm ID Alarm Severity Automatically Cleared

12016 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Trigger Condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

Service processes respond slowly or become unavailable.

Possible Causes

- The alarm threshold or Trigger Count is configured inappropriately.

- The CPU configuration does not meet service requirements. As a result, the CPU usage reaches the upper limit.

Procedure

Step 1 Check whether the alarm threshold or Trigger Count is appropriate.

1. Log in to MRS Manager.

2. Choose System > Configure Alarm Threshold > Device > Host > CPU Usage > CPU Usage and change the alarm threshold based on the actual CPU usage.

3. Choose System > Configure Alarm Threshold > Device > Host > CPU Usage > CPU Usage and change Trigger Count based on the actual CPU usage.

NOTE

This option defines the alarm check phase. Interval indicates the alarm check period and Trigger Count indicates the number of times the CPU usage exceeds the threshold. An alarm is generated if the CPU usage exceeds the threshold several times consecutively.

4. Wait 2 minutes and check whether the alarm is cleared.

– If yes, no further action is required.


– If no, go to Step 2.

Step 2 Expand the system capacity.

1. In the alarm list on MRS Manager, locate the row that contains the alarm, and view the IP address of the alarm node in the alarm details.

2. Log in to the alarm node.
3. Run the following command to check the system CPU usage:
cat /proc/stat | awk 'NR==1'|awk '{for(i=2;i<=NF;i++)j+=$i;print "" 100 - ($5+$6) * 100 / j;}'
4. If the CPU usage exceeds the threshold, expand the CPU capacity.
5. Check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 3.
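The awk pipeline in Step 2.3 derives usage from the first line of /proc/stat, whose fields after the "cpu" label are cumulative jiffy counters; field 5 is idle time and field 6 is iowait, so usage is 100 minus the idle share. A sketch over a hard-coded sample line (hypothetical jiffy counts) shows the arithmetic:

```shell
# usage = 100 - (idle + iowait) * 100 / total, over the aggregate "cpu" line.
# stat_line is a hypothetical sample; on a real node use:
#   stat_line=$(awk 'NR==1' /proc/stat)
stat_line='cpu  100 0 100 700 100 0 0 0 0 0'
cpu_usage=$(printf '%s\n' "$stat_line" |
  awk '{for (i = 2; i <= NF; i++) total += $i;
        printf "%.0f", 100 - ($5 + $6) * 100 / total}')
echo "$cpu_usage"
```

With this sample, total is 1000 jiffies and idle plus iowait is 800, so the snippet prints 20.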

Step 3 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.11 ALM-12017 Insufficient Disk Capacity

Description

The system checks the host disk usage every 30 seconds and compares it with the threshold. This alarm is generated when the host disk usage exceeds the specified threshold and is cleared when the host disk usage is less than or equal to the threshold.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12017 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.


Parameter Description

PartitionName Specifies the disk partition for which the alarm is generated.

Trigger Condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

Service processes become unavailable.

Possible Causes

The disk configuration does not meet service requirements. As a result, the disk usage reaches the upper limit.

Procedure

Step 1 Log in to MRS Manager and check whether the alarm threshold is appropriate.
- If yes, go to Step 2.
- If no, go to Step 1.1.

1. Choose System > Configure Alarm Threshold > Device > Disk > Disk Usage > Disk Usage and change the alarm threshold based on the actual disk usage.

2. Wait 2 minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 2.

Step 2 Check whether the disk is a system disk.

1. In the alarm list on MRS Manager, locate the row that contains the alarm, and view its host name and disk partition information in the alarm details.

2. Log in to the alarm node.
3. Run the df -h command to check the system disk partition usage. Check whether the disk is mounted to one of the following directories based on the disk partition name obtained in Step 2.1: /, /boot, /home, /opt, /tmp, /var, /var/log, and /srv/BigData.
– If yes, the disk is a system disk. Go to Step 3.1.
– If no, the disk is not a system disk. Go to Step 2.4.
4. Run the df -h command to check the system disk partition usage. Determine the role of the disk based on the disk partition name obtained in Step 2.1.
5. Check whether the disk is used by HDFS or Yarn.
– If yes, expand the disk capacity for the Core node. Go to Step 2.6.
– If no, go to Step 4.

6. Wait 2 minutes and check whether the alarm is cleared.
– If yes, no further action is required.


– If no, go to Step 3.
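The system-disk test in Step 2.3 is a membership check of the mount point against a fixed directory list. A minimal sketch, with a hypothetical mount point that would normally come from the df -h output:

```shell
# Classify a disk by the directory its partition is mounted to.
# mount_point is a hypothetical value; on a real node take it from
# the "Mounted on" column of `df -h`.
mount_point="/srv/BigData"
case "$mount_point" in
  / | /boot | /home | /opt | /tmp | /var | /var/log | /srv/BigData)
    disk_type="system" ;;   # system disk: go to Step 3.1
  *)
    disk_type="data" ;;     # data disk: go to Step 2.4
esac
echo "$disk_type"
```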

Step 3 Check whether a large file is written to the disk.

1. Run the find / -xdev -size +500M -exec ls -l {} \; command to view files larger than 500 MB on the node. Check whether these files are written to the disk.
– If yes, go to Step 3.2.
– If no, go to Step 4.

2. Process the large files and check whether the alarm is cleared after 2 minutes.
– If yes, no further action is required.
– If no, go to Step 4.

3. Expand the disk capacity.
4. Wait 2 minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 4.

Step 4 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.12 ALM-12018 Memory Usage Exceeds the Threshold

Description

The system checks the memory usage every 30 seconds and compares it with the threshold. This alarm is generated when the host memory usage exceeds the threshold and is cleared when it is less than or equal to 90% of the threshold.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12018 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.


Parameter Description

HostName Specifies the host for which the alarm is generated.

Trigger Condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

Service processes respond slowly or become unavailable.

Possible Causes

Memory configuration does not meet service requirements. As a result, the memory usage reaches the upper limit.

Procedure

Step 1 Expand the system capacity.

1. In the alarm list on MRS Manager, locate the row that contains the alarm, and view the IP address of the alarm host in the alarm details.

2. Log in to the alarm node.
3. Run the following command to check the system memory usage:
free -m | grep Mem\: | awk '{printf("%s,", ($3-$6-$7) * 100 / $2)}'
4. If the memory usage exceeds the threshold, expand the memory capacity.
5. Wait 5 minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 2.

Step 2 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.13 ALM-12027 Host PID Usage Exceeds the Threshold

Description

The system checks the PID usage every 30 seconds and compares it with the threshold. This alarm is generated when the PID usage exceeds the threshold and is cleared when it is less than or equal to the threshold.


Attribute

Alarm ID Alarm Severity Automatically Cleared

12027 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Trigger Condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

No PID is available for new processes and service processes are unavailable.

Possible Causes

- Too many processes are running on the node.
- The value of pid_max needs to be increased.
- The system is abnormal.

Procedure

Step 1 Increase the value of pid_max.

1. On MRS Manager, click the alarm in the real-time alarm list. In the Alarm Details area, obtain the IP address of the host that generated the alarm.

2. Log in to the alarm node.
3. Run the cat /proc/sys/kernel/pid_max command to check the value of pid_max.
4. If the PID usage exceeds the threshold, run the following command to double the value of pid_max:
echo new pid_max value > /proc/sys/kernel/pid_max
For example:
echo 65536 > /proc/sys/kernel/pid_max
5. Wait 5 minutes and check whether the alarm is cleared.
– If yes, no further action is required.


– If no, go to Step 2.
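The doubling in Step 1.4 is simple shell arithmetic on the current kernel limit. In this sketch the current value is hard-coded to a common default; on a real node you would read it from /proc/sys/kernel/pid_max and write the doubled value back as root:

```shell
# Double pid_max. current is hard-coded here (a common default);
# on a real node read it with:
#   current=$(cat /proc/sys/kernel/pid_max)
# and write the new value back as root with:
#   echo "$new" > /proc/sys/kernel/pid_max
current=32768
new=$((current * 2))
echo "$new"
```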

Step 2 Check whether the system environment is abnormal.

1. Contact the O&M personnel to check whether the operating system is abnormal.
– If yes, rectify the fault and go to Step 2.2.
– If no, go to Step 3.
2. Wait 5 minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 3.

Step 3 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.14 ALM-12028 Number of Processes in the D State on the Host Exceeds the Threshold

Description

The system checks the number of processes of user omm that are in the D state on the host every 30 seconds and compares the number with the threshold. This alarm is generated when the number of processes in the D state exceeds the threshold and is cleared when the number is less than or equal to the threshold.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12028 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.


Parameter Description

Trigger Condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

Excessive system resources are used and service processes respond slowly.

Possible Causes

The host responds slowly to I/O (disk I/O and network I/O) requests and a process is in the D state.

Procedure

Step 1 Check the process that is in the D state.

1. In the alarm list on MRS Manager, locate the row that contains the alarm, and view the IP address of the alarm host in the alarm details.

2. Log in to the alarm node.
3. Run the following commands to switch the user:
sudo su - root
su - omm
4. Run the following command to view the PID of the process of user omm that is in the D state:
ps -elf | grep -v "\[thread_checkio\]" | awk 'NR!=1 {print $2, $3, $4}' | grep omm | awk -F' ' '{print $1, $3}' | grep D | awk '{print $2}'
5. Check whether no command output is displayed.
– If yes, the service process is running properly. Go to Step 1.7.
– If no, go to Step 1.6.

6. Switch to user root and run the reboot command to restart the alarm host.
Restarting the host is risky. Ensure that the service process runs properly after the restart.

7. Wait 5 minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 2.
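The pipeline in Step 1.4 boils down to keeping rows of the ps -elf listing whose state column is D and whose user is omm, then printing the PID. A simpler sketch over a captured listing (hypothetical PIDs and process names) makes the column logic visible:

```shell
# Print PIDs of user omm processes in the D (uninterruptible sleep)
# state. ps_listing is a hypothetical captured sample; on a real node
# pipe `ps -elf` in directly. In `ps -elf` output, column 2 is the
# state, column 3 the user, and column 4 the PID.
ps_listing='F S UID  PID PPID C PRI NI ADDR SZ WCHAN TIME CMD
4 D omm 4242    1 0  80  0 -  100 io_wa 00:00 dataserver
4 S omm 4300    1 0  80  0 -  100 -     00:00 java'
d_pids=$(printf '%s\n' "$ps_listing" |
  awk 'NR != 1 && $2 == "D" && $3 == "omm" {print $4}')
echo "$d_pids"
```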

Step 2 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A


5.7.15 ALM-12031 User omm or Password Is About to Expire

Description

At 00:00 every day, the system starts checking every 8 hours whether user omm and its password are about to expire. This alarm is generated when the user or password is going to expire in 15 days.

It is cleared when the validity period of user omm is changed or the password is reset, and the alarm handling is complete.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12031 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Impact on the System

The node trust relationship is unavailable and Manager cannot manage services.

Possible Causes

User omm or its password is about to expire.

Procedure

Step 1 Check whether user omm and its password in the system are valid.

1. Log in to the faulty node.
2. Run the following command to view information about user omm and its password:
chage -l omm
3. Check whether the user and password are about to expire based on the system message.
a. View the value of Password expires to check whether the password is about to expire.


b. View the value of Account expires to check whether the user is about to expire.

NOTE

If the parameter value is never, the user and password are permanently valid; if the value is a date, check whether the user and password are going to expire within 15 days.

– If yes, go to Step 1.4.
– If no, go to Step 2.

4. Modify the validity period:
– Run the following command to set a validity period for user omm:
chage -E 'specified date' omm
– Run the following command to set the number of validity days for user omm:
chage -M 'number of days' omm
5. Check whether the alarm is cleared automatically in the next periodic check.
– If yes, no further action is required.
– If no, go to Step 2.

Step 2 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.16 ALM-12032 User ommdba or Password Is About to Expire

Description

At 00:00 every day, the system starts checking every 8 hours whether user ommdba and its password are about to expire. This alarm is generated when the user or password is going to expire in 15 days.

It is cleared when the validity period of user ommdba is changed or the password is reset, and the alarm handling is complete.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12032 Major Yes


Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Impact on the System

The OMS database cannot be managed and data cannot be accessed.

Possible Causes

User ommdba or its password is about to expire.

Procedure

Step 1 Check whether user ommdba and its password in the system are valid.

1. Log in to the faulty node.
2. Run the following command to view information about user ommdba and its password:
chage -l ommdba
3. Check whether the user and password are about to expire based on the system message.

a. View the value of Password expires to check whether the password is about to expire.

b. View the value of Account expires to check whether the user is about to expire.

NOTE

If the parameter value is never, the user and password are permanently valid; if the value is a date, check whether the user and password are going to expire within 15 days.

– If yes, go to Step 1.4.
– If no, go to Step 2.

4. Modify the validity period configuration:
– Run the following command to set a validity period for user ommdba:
chage -E 'specified date' ommdba
– Run the following command to set the number of validity days for user ommdba:
chage -M 'number of days' ommdba
5. Check whether the alarm is cleared automatically in the next periodic check.

– If yes, no further action is required.
– If no, go to Step 2.

Step 2 Collect fault information.


1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.17 ALM-12033 Slow Disk Fault

Description

The system runs the iostat command every second to monitor the disk I/O indicator. This alarm is generated when the svctm value exceeds 100 ms more than 30 times in 60 seconds, which indicates that the disk is faulty.

This alarm is automatically cleared after the disk is replaced.
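The triggering rule above is a simple counter over per-second svctm samples: count how many of the last 60 readings exceed 100 ms and fire when the count passes 30. A sketch over a few hypothetical readings (in ms):

```shell
# Count one-second svctm samples above 100 ms; the alarm rule fires
# when more than 30 of the 60 samples in a window exceed the limit.
# samples holds hypothetical captured readings, one per line.
samples='120.5
95.0
150.2
101.1'
over_count=$(printf '%s\n' "$samples" | awk '$1 > 100 {n++} END {print n+0}')
echo "$over_count"
```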

Attribute

Alarm ID Alarm Severity Automatically Cleared

12033 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

DiskName Specifies the disk for which the alarm is generated.

Impact on the System

Service performance and service processing capabilities deteriorate. For example, DBService active/standby synchronization is affected and the service becomes unavailable.

Possible Causes

The disk is aged or has bad sectors.


Procedure

Contact the O&M personnel and send the collected log information.

Related Information

N/A

5.7.18 ALM-12034 Periodic Backup Failure

Description

This alarm is generated when a periodic backup task fails to be executed and is cleared when the next backup task is executed successfully.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12034 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

TaskName Specifies the task.

Impact on the System

No backup package is available, so the system cannot be restored if faults occur.

Possible Causes

The alarm cause depends on the task details. Handle the alarm according to the logs and alarm details.

Procedure

Contact the O&M personnel and send the collected log information.


Related Information

N/A

5.7.19 ALM-12035 Unknown Data Status After Recovery Task Failure

Description

If a recovery task fails, the system automatically rolls back. If the rollback fails, data may be lost. When this occurs, an alarm is generated. This alarm is cleared when the next recovery task is executed successfully.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12035 Critical Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

TaskName Specifies the task.

Impact on the System

Data may be lost or the data status may be unknown, both of which may affect services.

Possible Causes

The alarm cause depends on the task details. Handle the alarm according to the logs and alarm details.

Procedure

Contact the O&M personnel and send the collected log information.

Related Information

N/A


5.7.20 ALM-12037 NTP Server Is Abnormal

Description

This alarm is generated when the NTP server is abnormal and is cleared after the NTP server recovers.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12037 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the IP address of the NTP server for which the alarm is generated.

Impact on the System

If the NTP server configured on the active OMS node is abnormal, the active OMS node fails to synchronize time with the NTP server and a time offset may be generated in the cluster.

Possible Causes

- The NTP server network is faulty.
- The NTP server authentication fails.
- The NTP server time cannot be obtained.
- The time obtained from the NTP server is not being continuously updated.

Procedure

Step 1 Check the NTP server network.

1. On the MRS Manager portal, view the real-time alarm list and locate the target alarm.
2. In the Alarm Details area, view the additional information to check whether the NTP server is successfully pinged.
– If yes, go to Step 2.
– If no, go to Step 1.3.


3. Contact the O&M personnel to check the network configuration and ensure that the network between the NTP server and the active OMS node is in the normal state. Then, check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 2.

Step 2 Check whether the NTP server authentication fails.

1. Log in to the active management node.

2. Run the ntpq -np command to check whether the NTP server authentication fails. If refid of the NTP server is .AUTH., the authentication fails.

– If yes, go to Step 5.

– If no, go to Step 3.

Step 3 Check whether the time can be obtained from the NTP server.

1. View the additional information of the alarm to check whether the time can be obtained from the NTP server.

– If yes, go to Step 4.

– If no, go to Step 3.2.

2. Contact the O&M personnel to rectify the NTP server fault. After the NTP server is in the normal state, check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 4.

Step 4 Check whether the time obtained from the NTP server is being continuously updated.

1. View the additional information of the alarm to check whether the time obtained from the NTP server is being continuously updated.

– If yes, go to Step 5.

– If no, go to Step 4.2.

2. Contact the provider of the NTP server to rectify the NTP server fault. After the NTP server is in the normal state, check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 5.

Step 5 Collect fault information.

1. On MRS Manager, choose System > Export Log.

2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A


5.7.21 ALM-12038 Monitoring Indicator Dump Failure

Description

This alarm is generated when dump fails after monitoring indicator dump is configured on MRS Manager and is cleared when dump is successful.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12038 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Impact on the System

The upper-layer management system fails to obtain monitoring indicators from the MRS Manager system.

Possible Causes

- The server cannot be connected.
- The save path on the server cannot be accessed.
- The monitoring indicator file fails to be uploaded.

Procedure

Step 1 Contact the O&M personnel to check whether the network connection between the MRS Manager system and the server is in the normal state.
- If yes, go to Step 3.
- If no, go to Step 2.

Step 2 Contact the O&M personnel to restore the network and check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to Step 3.


Step 3 Choose System > Configure Monitoring Metric Dump and check whether the FTP username, password, port, dump mode, and public key configured on the monitoring indicator dump configuration page are consistent with those on the server.

- If yes, go to Step 5.

- If no, go to Step 4.

Step 4 Enter the correct configuration information, click OK, and check whether the alarm is cleared.

- If yes, no further action is required.

- If no, go to Step 5.

Step 5 Choose System > Configure Monitoring Metric Dump and check the configuration items, including FTP Username, Save Path, and Dump Mode.

- If the dumping mode is FTP, go to Step 6.

- If the dumping mode is SFTP, go to Step 7.

Step 6 Log in to the server in FTP mode. In the default path, check whether FTP Username has read and write permissions on the relative path Save Path.

- If yes, go to Step 9.

- If no, go to Step 8.

Step 7 Log in to the server in SFTP mode. In the default path, check whether FTP Username has read and write permissions on the absolute path Save Path.

- If yes, go to Step 9.

- If no, go to Step 8.

Step 8 Add the read and write permissions and check whether the alarm is cleared.

- If yes, no further action is required.

- If no, go to Step 9.

Step 9 Log in to the server and check whether the save path has sufficient disk space.

- If yes, go to Step 11.

- If no, go to Step 10.

Step 10 Delete any unnecessary files, or go to the monitoring indicator dump configuration page to change the save path. Then check whether the alarm is cleared.

l If yes, no further action is required.

l If no, go to Step 11.

Step 11 Collect fault information.

1. On MRS Manager, choose System > Export Log.

2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A


5.7.22 ALM-12039 GaussDB Data Is Not Synchronized

Description

The system checks the data synchronization status between the active and standby GaussDBs every 10 seconds. This alarm is generated when the synchronization status cannot be queried six times consecutively or when the synchronization status is abnormal.

This alarm is cleared when data synchronization is normal.

Attribute

Alarm ID: 12039
Alarm Severity: Critical
Automatically Cleared: Yes

Parameters

ServiceName: Specifies the service for which the alarm is generated.
RoleName: Specifies the role for which the alarm is generated.
HostName: Specifies the host for which the alarm is generated.
Local GaussDB HA IP: Specifies the HA IP address of the local GaussDB.
Peer GaussDB HA IP: Specifies the HA IP address of the peer GaussDB.
SYNC_PERSENT: Specifies the synchronization percentage.

Impact on the System

If the active instance becomes abnormal while data is not synchronized between the active and standby GaussDBs, data may be lost or abnormal.

Possible Causes
l The network between the active and standby nodes is unstable.

l The standby GaussDB is abnormal.

l The disk space of the standby node is full.

MapReduce ServiceUser Guide 5 MRS Manager Operation Guide

2019-01-15 123

Page 133: Huawei › en-us › eu-west-0-user... · 2019-12-14 · Contents 1 Overview.........................................................................................................................................1

Procedure

Step 1 Log in to MRS Manager, click Alarms, locate the row that contains the alarm, and view the IP address of the standby GaussDB in the alarm details.

Step 2 Log in to the active management node.

Step 3 Run the following command to check whether the standby GaussDB is reachable:

ping heartbeat IP address of the standby GaussDB

l If yes, go to Step 6.
l If no, go to Step 4.

Step 4 Contact the O&M personnel to check whether the network is faulty.
l If yes, go to Step 5.
l If no, go to Step 6.

Step 5 Rectify the network fault and check whether the alarm is cleared from the alarm list.
l If yes, no further action is required.
l If no, go to Step 6.

Step 6 Log in to the standby GaussDB node.

Step 7 Run the following command to switch the user:

sudo su - root

su - omm

Step 8 Go to the ${BIGDATA_HOME}/om-0.0.1/sbin/ directory.

Run the following command to check whether the resource status of the standby GaussDB is normal:

sh status-oms.sh

In the command output, check whether the following information is displayed in the row where ResName is gaussDB:
10_10_10_231 gaussDB Standby_normal Normal Active_standby

l If yes, go to Step 9.
l If no, go to Step 15.
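The check in Step 8 reduces to extracting the gaussDB row from the status output and testing its state field. The output line below is simulated; the real output comes from sh status-oms.sh on the standby node.

```shell
# Simulated status-oms.sh output line for the gaussDB resource (sample values).
status_line='10_10_10_231 gaussDB Standby_normal Normal Active_standby'

# The third field is the resource status of the standby GaussDB.
state=$(echo "$status_line" | awk '/gaussDB/ {print $3}')
if [ "$state" = "Standby_normal" ]; then
  echo "standby GaussDB is normal"
else
  echo "standby GaussDB is abnormal: $state"
fi
```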

Step 9 Log in to the standby GaussDB node.

Step 10 Run the following command to switch the user:

sudo su - root

su - omm

Step 11 Run the echo ${BIGDATA_DATA_HOME}/dbdata_om command to obtain the GaussDB data directory.

Step 12 Run the df -h command to check the system disk partition usage.

Step 13 Check whether the disk on which the GaussDB data directory is mounted is full.

MapReduce ServiceUser Guide 5 MRS Manager Operation Guide

2019-01-15 124

Page 134: Huawei › en-us › eu-west-0-user... · 2019-12-14 · Contents 1 Overview.........................................................................................................................................1

l If yes, go to Step 14.
l If no, go to Step 15.
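Steps 11 to 13 can be sketched as a single df-based check. The directory and the 95% threshold below are illustrative assumptions; on the real node the directory comes from echo ${BIGDATA_DATA_HOME}/dbdata_om.

```shell
# Print the usage percentage of the filesystem hosting a directory.
usage_pct() {
  # Column 5 of POSIX df -P output is "Use%"; strip the trailing %.
  df -P "$1" | awk 'NR==2 {gsub("%", "", $5); print $5}'
}

# Hypothetical stand-in for the GaussDB data directory.
DATA_DIR=/tmp

if [ "$(usage_pct "$DATA_DIR")" -ge 95 ]; then
  echo "disk hosting $DATA_DIR is almost full: expand capacity"
else
  echo "disk hosting $DATA_DIR still has space"
fi
```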

Step 14 Contact the O&M personnel to expand the disk capacity. After capacity expansion, wait 2 minutes and check whether the alarm is cleared.
l If yes, no further action is required.
l If no, go to Step 15.

Step 15 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.23 ALM-12040 Insufficient System Entropy

Description

At 00:00:00 every day, the system checks the entropy five times consecutively. First, the system checks whether either the rng-tools or haveged tool is enabled and correctly configured. If not, the system checks the current entropy. This alarm is generated when the entropy is less than 500 in all five checks.

This alarm is cleared in any of the following scenarios:

l True random number mode is configured.
l Random numbers are configured in pseudo-random number mode.
l Neither true random number mode nor pseudo-random number mode is configured, but the entropy is greater than or equal to 500 in at least one of the five checks.

Attribute

Alarm ID: 12040
Alarm Severity: Major
Automatically Cleared: Yes

Parameters

ServiceName: Specifies the service for which the alarm is generated.
RoleName: Specifies the role for which the alarm is generated.



HostName: Specifies the host for which the alarm is generated.

Impact on the System
Decryption failures occur and functions related to decryption are affected, for example, DBService installation.

Possible Causes
The haveged or rngd service is abnormal.

Procedure

Step 1 On the MRS Manager portal, click Alarms.

Step 2 View detailed alarm information to obtain the value of the HostName field.

Step 3 Log in to the node for which the alarm is generated. Run the sudo su - root command to switch the user.

Step 4 Run the /bin/rpm -qa | grep -w "haveged" command. If the command is executed successfully, run the /sbin/service haveged status | grep "running" command and view the command output.
l If the command is executed successfully, the haveged service is correctly installed and configured, and is running properly. Go to Step 8.
l If the command is not executed successfully, the haveged service is not running properly. Go to Step 5.

Step 5 Run the /bin/rpm -qa | grep -w "rng-tools" command. If the command is executed successfully, run the ps -ef | grep -v "grep" | grep rngd | tr -d " " | grep "\-o/dev/random" | grep "\-r/dev/urandom" command and view the command output.
l If the command is executed successfully, the rngd service is correctly installed and configured, and is running properly. Go to Step 8.
l If the command is not executed successfully, the rngd service is not running properly. Go to Step 6.

Step 6 Manually configure the system entropy. For details, see Related Information.

Step 7 Wait until 00:00:00, when the system checks the entropy again. Check whether the alarm is cleared automatically.
l If yes, no further action is required.
l If no, go to Step 8.

Step 8 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End


Related Information

Manually check the system entropy.

Log in to the node and run the sudo su - root command to switch the user. Run the cat /proc/sys/kernel/random/entropy_avail command to check whether the system entropy is greater than or equal to 500. If the system entropy is less than 500, you can reset it by using one of the following methods:

l Using the haveged tool (true random number mode): Contact the O&M personnel to install the tool and start it.

l Using the rng-tools tool (pseudo-random number mode): Contact the O&M personnel to install the tool.
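The entropy comparison itself is a simple threshold test. The sketch below evaluates sample values; on a live Linux node the current value comes from cat /proc/sys/kernel/random/entropy_avail.

```shell
# Print whether an entropy reading meets the 500-bit alarm threshold.
entropy_ok() {
  if [ "$1" -ge 500 ]; then
    echo "sufficient"
  else
    echo "insufficient"
  fi
}

entropy_ok 612   # sample value above the threshold
entropy_ok 137   # sample value below the threshold
```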

5.7.24 ALM-13000 ZooKeeper Service Unavailable

Description

The system checks the ZooKeeper service status every 30 seconds. This alarm is generated when the ZooKeeper service is unavailable and is cleared when the ZooKeeper service recovers.

Attribute
Alarm ID: 13000
Alarm Severity: Critical
Automatically Cleared: Yes

Parameters
ServiceName: Specifies the service for which the alarm is generated.
RoleName: Specifies the role for which the alarm is generated.
HostName: Specifies the host for which the alarm is generated.

Impact on the System
l ZooKeeper fails to provide coordination services for upper-layer components.
l Components dependent on ZooKeeper may not run properly.

Possible Causes
l A ZooKeeper instance is abnormal.
l The disk capacity is insufficient.


l The network is faulty.
l The DNS is installed on the ZooKeeper node.

Procedure

Check the ZooKeeper service instance status.

Step 1 On MRS Manager, choose Services > ZooKeeper > quorumpeer.

Step 2 Check whether the ZooKeeper instances are normal.
l If yes, go to Step 6.
l If no, go to Step 3.

Step 3 Select instances whose status is not good and choose More > Restart Instance.

Step 4 Check whether the instance status is good after the restart.
l If yes, go to Step 5.
l If no, go to Step 19.

Step 5 On the Alarms tab, check whether the alarm is cleared.
l If yes, no further action is required.
l If no, go to Step 6.

Check the disk status.

Step 6 On MRS Manager, choose Services > ZooKeeper > quorumpeer, and check the host information of the ZooKeeper instance on each node.

Step 7 On MRS Manager, click Hosts.

Step 8 In the Disk Usage column, check whether the disk space of each node that contains ZooKeeper instances is insufficient (disk usage exceeds 80%).
l If yes, go to Step 9.
l If no, go to Step 11.

Step 9 Expand disk capacity. For details, see ALM-12017 Insufficient Disk Capacity.

Step 10 On the Alarms tab, check whether the alarm is cleared.
l If yes, no further action is required.
l If no, go to Step 11.

Check the network status.

Step 11 On the Linux node that contains the ZooKeeper instance, run the ping command to check whether the host names of other nodes that contain ZooKeeper instances can be pinged successfully.
l If yes, go to Step 15.
l If no, go to Step 12.

Step 12 Modify the IP addresses in /etc/hosts and add the host name and IP address mapping.

Step 13 Run the ping command again to check whether the host names of other nodes that contain the ZooKeeper instance can be pinged successfully.


l If yes, go to Step 14.

l If no, go to Step 19.

Step 14 On the Alarms tab, check whether the alarm is cleared.

l If yes, no further action is required.

l If no, go to Step 15.

Check the DNS.

Step 15 Check whether the DNS is installed on the node that contains the ZooKeeper instance. On the Linux node that contains the ZooKeeper instance, run the cat /etc/resolv.conf command to check whether the file is empty.

l If yes, go to Step 16.

l If no, go to Step 19.

Step 16 Run the service named status command to check whether the DNS is started.

l If yes, go to Step 17.

l If no, go to Step 19.

Step 17 Run the service named stop command to stop the DNS service. If "Shutting down name server BIND waiting for named to shut down (28s)" is displayed, the DNS service is stopped successfully. Comment out any content in /etc/resolv.conf.
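Commenting out every active line in /etc/resolv.conf can be done with one sed expression. The sketch below works on a temporary copy so it is safe to try; on the real node the target file is /etc/resolv.conf.

```shell
# Work on a temporary copy rather than the live /etc/resolv.conf.
f="$(mktemp)"
printf 'nameserver 10.0.0.1\nsearch example.com\n' > "$f"

# Prefix every non-comment line with '#'.
sed -i 's/^[^#]/#&/' "$f"

cat "$f"   # both lines are now commented out
```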

Step 18 On the Alarms tab, check whether the alarm is cleared.

l If yes, no further action is required.

l If no, go to Step 19.

Step 19 Collect fault information.

1. On MRS Manager, choose System > Export Log.

2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.25 ALM-13001 Available ZooKeeper Connections Are Insufficient

Description

The system checks ZooKeeper connections every 30 seconds. This alarm is generated when the system detects that the number of used ZooKeeper instance connections exceeds the threshold (80% of the maximum connections).

This alarm is cleared when the number of used ZooKeeper instance connections is less than the threshold.

MapReduce ServiceUser Guide 5 MRS Manager Operation Guide

2019-01-15 129

Page 139: Huawei › en-us › eu-west-0-user... · 2019-12-14 · Contents 1 Overview.........................................................................................................................................1

Attribute
Alarm ID: 13001
Alarm Severity: Major
Automatically Cleared: Yes

Parameters
ServiceName: Specifies the service for which the alarm is generated.
RoleName: Specifies the role for which the alarm is generated.
HostName: Specifies the host for which the alarm is generated.
Trigger Condition: Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

Available ZooKeeper connections are insufficient. When the connection usage reaches 100%, external connections cannot be handled.

Possible Causes
l The number of connections to the ZooKeeper node exceeds the threshold.
l Connection leakage occurs on some connection processes.
l The maximum number of connections does not meet the requirement of the actual scenario.

Procedure

Step 1 Check the connection status.

1. On the MRS Manager portal, choose Alarms > ALM-13001 Available ZooKeeper Connections Are Insufficient > Location. Check the IP address of the alarm node.

2. Obtain the PID of the ZooKeeper process. Log in to the alarm node and run the pgrep -f proc_zookeeper command.

3. Check whether the PID can be successfully obtained.
– If yes, go to Step 1.4.
– If no, go to Step 2.

4. Obtain all the IP addresses connected to the ZooKeeper instance and the number of connections, and check the 10 IP addresses with the most connections. Run the following command based on the obtained PID and IP address: lsof -i | grep $pid | awk '{print $9}' | cut -d : -f 2 | cut -d \> -f 2 | awk '{a[$1]++} END {for(i in a){print i,a[i] | "sort -r -g -k 2"}}' | head -10. ($pid is the PID obtained in the preceding step.)

5. Check whether the node IP addresses and the number of connections are successfully obtained.

– If yes, go to Step 1.6.

– If no, go to Step 2.

6. Obtain the ID of the port connected to the process. Run the following command based on the obtained PID and IP address: lsof -i | grep $pid | awk '{print $9}' | cut -d \> -f 2 | grep $IP | cut -d : -f 2. ($pid and $IP are the PID and IP address obtained in the preceding step.)

7. Check whether the port ID is successfully obtained.

– If yes, go to Step 1.8.

– If no, go to Step 2.

8. Obtain the ID of the connected process. Log in to each IP address and run the following command based on the obtained port ID: lsof -i | grep $port. ($port is the port ID obtained in the preceding step.)

9. Check whether the process ID is successfully obtained.

– If yes, go to Step 1.10.

– If no, go to Step 2.

10. Check whether connection leakage occurs on the process based on the obtained process ID.

– If yes, go to Step 1.11.

– If no, go to Step 2.

11. Close the process where connection leakage occurs and check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 2.

12. On the MRS Manager portal, choose Services > ZooKeeper > Service Configuration > All > quorumpeer > Performance and change the value of maxCnxns to 20000 or more.

13. Check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 2.
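The aggregation stage of the pipeline in Step 1.4 can be exercised on canned data. The endpoint list below simulates the "local->remote" column that lsof prints (sample addresses); the pipeline keeps the remote side, drops the port, counts connections per client IP, and lists the busiest clients first.

```shell
# Simulated lsof endpoint column ("local->remote" pairs, sample addresses).
endpoints='node1:2181->10.0.0.5:40001
node1:2181->10.0.0.5:40002
node1:2181->10.0.0.6:40003
node1:2181->10.0.0.5:40004'

# Keep the remote side, drop the port, count connections per client IP,
# and print the top 10 clients by connection count.
echo "$endpoints" \
  | cut -d '>' -f 2 \
  | cut -d ':' -f 1 \
  | awk '{a[$1]++} END {for (i in a) print i, a[i]}' \
  | sort -r -g -k 2 \
  | head -10
```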

Step 2 Collect fault information.

1. On MRS Manager, choose System > Export Log.

2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A


5.7.26 ALM-13002 ZooKeeper Heap Memory or Direct Memory Usage Exceeds the Threshold

Description
The system checks the memory usage of the ZooKeeper service every 30 seconds. This alarm is generated when the memory usage of a ZooKeeper instance exceeds the threshold (80% of the maximum memory).

The alarm is cleared when the memory usage is less than the threshold.

Attribute
Alarm ID: 13002
Alarm Severity: Major
Automatically Cleared: Yes

Parameters
ServiceName: Specifies the service for which the alarm is generated.
RoleName: Specifies the role for which the alarm is generated.
HostName: Specifies the host for which the alarm is generated.
Trigger Condition: Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System
If the available memory for the ZooKeeper service is insufficient, a memory overflow occurs and the service breaks down.

Possible Causes
l The ZooKeeper instance on the node uses too much memory.
l The memory is improperly allocated.

Procedure

Step 1 Check the memory usage.

1. On the MRS Manager portal, choose Alarms > ALM-13002 ZooKeeper Memory Usage Exceeds the Threshold > Location. Check the IP address of the instance that generated the alarm.


2. On the MRS Manager portal, choose Services > ZooKeeper > Instance > quorumpeer (the IP address checked) > Customize > Heap and Direct Memory of ZooKeeper. Check the heap usage.

3. Check whether the used heap memory of ZooKeeper reaches 80% of the maximum heap memory specified for ZooKeeper.

– If yes, go to Step 1.4.

– If no, go to Step 1.6.

4. On the MRS Manager portal, choose Services > ZooKeeper > Service Configuration > All > quorumpeer > System. Increase the value of -Xmx in GC_OPTS as required.

5. Check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 1.6.

6. On the MRS Manager portal, choose Services > ZooKeeper > Instance > quorumpeer (the IP address checked) > Customize > Heap and Direct Memory of ZooKeeper. Check the direct buffer memory usage.

7. Check whether the used direct buffer memory of ZooKeeper reaches 80% of the maximum direct buffer memory specified for ZooKeeper.

– If yes, go to Step 1.8.

– If no, go to Step 2.

8. On the MRS Manager portal, choose Services > ZooKeeper > Service Configuration > All > quorumpeer > System.

Increase the value of -XX:MaxDirectMemorySize in GC_OPTS as required.

9. Check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 2.
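As an illustration, both -Xmx and -XX:MaxDirectMemorySize live inside the GC_OPTS string. The values below are hypothetical examples, not the shipped defaults; size them to the actual load on the cluster.

```text
# Before (example values):
GC_OPTS=-Xms1G -Xmx2G -XX:MaxDirectMemorySize=512M

# After increasing the heap and direct memory limits:
GC_OPTS=-Xms1G -Xmx4G -XX:MaxDirectMemorySize=1G
```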

Step 2 Collect fault information.

1. On MRS Manager, choose System > Export Log.

2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.27 ALM-14000 HDFS Service Unavailable

Description

The system checks the service status of NameService every 30 seconds. This alarm is generated when the HDFS service becomes unavailable because all NameService services are abnormal.

This alarm is cleared when the HDFS service recovers because at least one NameService service is in the normal state.


Attribute
Alarm ID: 14000
Alarm Severity: Critical
Automatically Cleared: Yes

Parameters
ServiceName: Specifies the service for which the alarm is generated.
RoleName: Specifies the role for which the alarm is generated.
HostName: Specifies the host for which the alarm is generated.

Impact on the System
HDFS fails to provide services for HDFS service-based upper-layer components, such as HBase and MapReduce. As a result, users cannot read or write files.

Possible Causes
l The ZooKeeper service is abnormal.
l All NameService services are abnormal.

Procedure

Step 1 Check the ZooKeeper service status.

1. Log in to MRS Manager, choose Services, and check whether the health status of the ZooKeeper service is Good.
– If yes, go to Step 1.2.
– If no, go to Step 2.1.

2. Rectify the health status. For details, see ALM-13000 ZooKeeper Service Unavailable. Check whether the health status of the ZooKeeper service is Good.
– If yes, go to Step 1.3.
– If no, go to Step 3.

3. Wait 5 minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 2.1.

Step 2 Handle the NameService service exception alarm.

1. Log in to MRS Manager. On the Alarms page, check whether all NameService services have abnormal alarms.


– If yes, go to Step 2.2.
– If no, go to Step 3.

2. See ALM-14010 NameService Service Is Abnormal to handle the abnormal NameService services and check whether each alarm is cleared.
– If yes, go to Step 2.3.
– If no, go to Step 3.

3. Wait 5 minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 3.

Step 3 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.28 ALM-14001 HDFS Disk Usage Exceeds the Threshold

Description

The system checks the disk usage of the HDFS cluster every 30 seconds and compares it with the threshold. This alarm is generated when the HDFS disk usage exceeds the threshold and is cleared when the usage is less than or equal to the threshold.

Attribute
Alarm ID: 14001
Alarm Severity: Major
Automatically Cleared: Yes

Parameters
ServiceName: Specifies the service for which the alarm is generated.
RoleName: Specifies the role for which the alarm is generated.
HostName: Specifies the host for which the alarm is generated.
NSName: Specifies the NameService service for which the alarm is generated.



Trigger Condition: Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

The performance of writing data to HDFS is affected.

Possible Causes

The disk space configured for the HDFS cluster is insufficient.

Procedure

Step 1 Check the disk capacity and delete unnecessary files.

1. On the MRS Manager portal, choose Services > HDFS. The Service Status page is displayed.

2. In the Real-Time Statistics area, view the value of the monitoring indicator Percentage of HDFS Capacity to check whether the HDFS disk usage exceeds the threshold.

– If yes, go to Step 1.3.

– If no, go to Step 3.

3. Use the client on the cluster node and run the hdfs dfsadmin -report command to check whether the value of DFS Used% is less than 100% minus the threshold.

– If yes, go to Step 1.5.

– If no, go to Step 3.

4. Use the client on the cluster node and run the hdfs dfs -rm -r file or directory command to delete unnecessary files.

5. Wait 5 minutes and check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 2.1.
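Extracting DFS Used% from the report can be scripted. The report line below is simulated and the 80% threshold is an assumed example; the live value comes from hdfs dfsadmin -report on the cluster client.

```shell
# Simulated extract of hdfs dfsadmin -report output.
report='DFS Used%: 85.00%'

# Pull the percentage and compare it with an assumed 80% threshold.
used=$(echo "$report" | awk '/DFS Used%/ {gsub("%", "", $3); print $3}')
if awk -v u="$used" 'BEGIN {exit !(u > 80)}'; then
  echo "HDFS usage ${used}% exceeds the threshold"
else
  echo "HDFS usage ${used}% is within the threshold"
fi
```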

Step 2 Expand the system.

1. Expand the disk capacity.

2. Wait 5 minutes and check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 3.

Step 3 Collect fault information.

1. On MRS Manager, choose System > Export Log.

2. Contact the O&M personnel and send the collected log information.

----End


Related Information

N/A

5.7.29 ALM-14002 DataNode Disk Usage Exceeds the Threshold

Description

The system checks the DataNode disk usage every 30 seconds and compares it with the threshold. This alarm is generated when the value of Percentage of DataNode Capacity exceeds the threshold and is cleared when the value is less than or equal to the threshold.

Attribute

Alarm ID: 14002
Alarm Severity: Major
Automatically Cleared: Yes

Parameters

ServiceName: Specifies the service for which the alarm is generated.
RoleName: Specifies the role for which the alarm is generated.
HostName: Specifies the host for which the alarm is generated.
Trigger Condition: Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

The performance of writing data to HDFS is affected.

Possible Causes
l The disk space configured for the HDFS cluster is insufficient.
l Data skew occurs among DataNodes.

Procedure

Step 1 Check the cluster disk capacity.

1. Log in to MRS Manager. On the Alarms page, check whether alarm ALM-14001 HDFS Disk Usage Exceeds the Threshold exists.


– If yes, go to Step 1.2.
– If no, go to Step 2.1.

2. Follow the procedure in ALM-14001 HDFS Disk Usage Exceeds the Threshold to handle the alarm and check whether the alarm is cleared.
– If yes, go to Step 1.3.
– If no, go to Step 3.

3. Wait 5 minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 2.1.

Step 2 Check the balance status of DataNodes.

1. Use the client on the cluster node and run the hdfs dfsadmin -report command to view the value of DFS Used% on the DataNode that generated the alarm. Compare this value with those on other DataNodes and check whether the difference between the values is greater than 10.
– If yes, go to Step 2.2.
– If no, go to Step 3.

2. If data skew occurs, use the client on the cluster node and run the hdfs balancer -threshold 10 command.

3. Wait 5 minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 3.
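The skew test in Step 2.1 compares DFS Used% across DataNodes. The per-node values below are made up for the sketch; on a real cluster they come from hdfs dfsadmin -report.

```shell
# Simulated per-DataNode "DFS Used%" values (node name, percent).
node_usage='dn1 92.1
dn2 61.5
dn3 64.0'

# Flag skew when the spread between the most- and least-used DataNodes
# exceeds 10 percentage points.
echo "$node_usage" | awk '
  NR == 1 { min = $2; max = $2 }
  { if ($2 < min) min = $2; if ($2 > max) max = $2 }
  END {
    if (max - min > 10)
      print "data skew detected: consider hdfs balancer -threshold 10"
    else
      print "DataNodes are balanced"
  }'
```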

Step 3 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End

Related Information
N/A

5.7.30 ALM-14003 Number of Lost HDFS Blocks Exceeds the Threshold

Description
The system checks the number of lost blocks every 30 seconds and compares it with the threshold. This alarm is generated when the number of lost blocks exceeds the threshold and is cleared when the number is less than or equal to the threshold.

Attribute
Alarm ID: 14003
Alarm Severity: Major
Automatically Cleared: Yes


Parameters

ServiceName: Specifies the service for which the alarm is generated.
RoleName: Specifies the role for which the alarm is generated.
HostName: Specifies the host for which the alarm is generated.
NSName: Specifies the NameService service for which the alarm is generated.
Trigger Condition: Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

Data stored in HDFS is lost. HDFS may enter the safe mode and cannot provide write services. Lost block data cannot be restored.

Possible Causes
l The DataNode instance is abnormal.
l Data is deleted.

Procedure

Step 1 Check the DataNode instance.

1. On the MRS Manager portal, choose Services > HDFS > Instance.
2. Check whether the status of all DataNode instances is Good.
– If yes, go to Step 3.
– If no, go to Step 1.3.

3. Restart the DataNode instance. Check whether the DataNode instance restarts successfully.
– If yes, go to Step 2.2.
– If no, go to Step 2.1.

Step 2 Delete the damaged file.

1. Use the client on the cluster node. Run the hdfs fsck / -delete command to delete the lost file. Then rewrite the file and recover the data.

2. Wait 5 minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 3.


Step 3 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.31 ALM-14004 Number of Damaged HDFS Blocks Exceeds the Threshold

Description

The system checks the number of damaged blocks every 30 seconds and compares it with the threshold. This alarm is generated when the number of damaged blocks exceeds the threshold and is cleared when the number is less than or equal to the threshold. You are advised to run the hdfs fsck / command to check whether any files are completely damaged.

Attribute
Alarm ID: 14004
Alarm Severity: Major
Automatically Cleared: Yes

Parameters
ServiceName: Specifies the service for which the alarm is generated.
RoleName: Specifies the role for which the alarm is generated.
HostName: Specifies the host for which the alarm is generated.
NSName: Specifies the NameService service for which the alarm is generated.
Trigger Condition: Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

Data is damaged and HDFS fails to read files.


Possible Causes
l The DataNode instance is abnormal.
l Data verification information is damaged.

Procedure
Contact the O&M personnel and send the collected log information.

Related Information

N/A

5.7.32 ALM-14006 Number of HDFS Files Exceeds the Threshold

Description

The system checks the number of HDFS files every 30 seconds and compares it with the threshold. This alarm is generated when the number of HDFS files exceeds the threshold and is cleared when the number is less than or equal to the threshold.

Attribute

Alarm ID Alarm Severity Automatically Cleared

14006 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

NSName Specifies the NameService service for which the alarm is generated.

Trigger Condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

Disk storage space is insufficient, which may result in data import failure. The performance of the HDFS system is affected.


Possible Causes

The number of HDFS files exceeds the threshold.

Procedure

Step 1 Check whether unnecessary files exist in the system.

1. Use the client on the cluster node and run the hdfs dfs -ls file or directory command to check whether the files in the directory can be deleted.
– If yes, go to Step 1.2.
– If no, go to Step 2.1.

2. Run the hdfs dfs -rm -r file or directory command to delete unnecessary files. Wait 5 minutes, and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 2.1.
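To see how close the cluster is to the threshold before deleting anything, hdfs dfs -count reports directory, file, and byte totals for a path. A minimal parsing sketch follows; the sample line stands in for real command output, and the threshold value is an illustrative placeholder for the one configured in MRS Manager.

```shell
# `hdfs dfs -count /` prints: DIR_COUNT FILE_COUNT CONTENT_SIZE PATH.
# On a cluster node: count_output=$(hdfs dfs -count /)
# The sample line below is illustrative.
count_output='         312        48211     73014444032 /'

files=$(printf '%s\n' "$count_output" | awk '{print $2}')
threshold=50000   # substitute the threshold configured in MRS Manager
if [ "$files" -gt "$threshold" ]; then
  echo "over threshold: $files files"
else
  echo "within threshold: $files files"
fi
```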

Step 2 Check the number of files in the system.

1. On the MRS Manager portal, choose System > Configure Alarm Threshold.
2. In the navigation tree on the left, choose Services > HDFS > HDFS File > Total Number of Files.
3. In the right pane, modify the threshold in the rule based on the number of current HDFS files.
To check the number of HDFS files, choose Services > HDFS, click Customize in the Real-Time Statistics area on the right, and select the HDFS File monitoring item.
4. Wait 5 minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 3.

Step 3 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.33 ALM-14007 HDFS NameNode Memory Usage Exceeds the Threshold

Description

The system checks the HDFS NameNode memory usage every 30 seconds and compares it with the threshold. This alarm is generated when the HDFS NameNode memory usage exceeds the threshold and is cleared when it is less than or equal to the threshold.


Attribute

Alarm ID Alarm Severity Automatically Cleared

14007 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Trigger condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

The HDFS NameNode memory usage is too high, which affects the data read/write performance of the HDFS.

Possible Causes

The HDFS NameNode memory is insufficient.

Procedure

Step 1 Delete unnecessary files.

1. Use the client on the cluster node and run the hdfs dfs -rm -r file or directory command to delete unnecessary files.

2. Wait 5 minutes and check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 2.

Step 2 Collect fault information.

1. On MRS Manager, choose System > Export Log.

2. Contact the O&M personnel and send the collected log information.

----End


Related Information

N/A

5.7.34 ALM-14008 HDFS DataNode Memory Usage Exceeds the Threshold

Description

The system checks the HDFS DataNode memory usage every 30 seconds and compares it with the threshold. This alarm is generated when the HDFS DataNode memory usage exceeds the threshold and is cleared when it is less than or equal to the threshold.

Attribute

Alarm ID Alarm Severity Automatically Cleared

14008 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Trigger condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

The HDFS DataNode memory usage is too high, which affects the data read/write performance of the HDFS.

Possible Causes

The HDFS DataNode memory is insufficient.

Procedure

Step 1 Delete unnecessary files.

1. Use the client on the cluster node and run the hdfs dfs -rm -r file or directory command to delete unnecessary files.


2. Wait 5 minutes and check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 2.

Step 2 Collect fault information.

1. On MRS Manager, choose System > Export Log.

2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.35 ALM-14009 Number of Dead DataNodes Exceeds the Threshold

Description

The system checks the number of faulty DataNodes in the HDFS cluster every 30 seconds and compares it with the threshold. This alarm is generated when the number of faulty DataNodes in the HDFS cluster exceeds the threshold and is cleared when the number is less than or equal to the threshold.

Attribute

Alarm ID Alarm Severity Automatically Cleared

14009 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Trigger condition Generates an alarm when the actual indicator value exceeds the specified threshold.


Impact on the System

Faulty DataNodes cannot provide HDFS services.

Possible Causes

l DataNodes are faulty or overloaded.
l The network between the NameNode and the DataNode is disconnected or busy.
l NameNodes are overloaded.

Procedure

Step 1 Check whether DataNodes are faulty.

1. Use the client on the cluster node and run the hdfs dfsadmin -report command to check whether DataNodes are faulty.
– If yes, go to Step 1.2.
– If no, go to Step 2.1.

2. On the MRS Manager portal, choose Services > HDFS > Instance to check whether any DataNode is stopped.
– If yes, go to Step 1.3.
– If no, go to Step 2.1.

3. Select the DataNode instance, and choose More > Restart Instance to restart it. Wait 5 minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 2.1.
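The dfsadmin report in Step 1.1 includes a dead-DataNode count that you can extract directly. The sketch below parses it from sample text standing in for the real command output; the exact report wording may differ between Hadoop versions.

```shell
# Pull the dead-DataNode count out of `hdfs dfsadmin -report` output.
# On a cluster node: report=$(hdfs dfsadmin -report)
# The sample below is illustrative; wording can vary by version.
report='Live datanodes (3):
Name: 192.168.0.10:9866 (node-0)
Dead datanodes (2):
Name: 192.168.0.12:9866 (node-2)'

dead=$(printf '%s\n' "$report" | sed -n 's/^Dead datanodes (\([0-9][0-9]*\)).*/\1/p')
echo "dead datanodes: $dead"
```

The Name: lines under the dead section identify which hosts to inspect on the Instance page.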

Step 2 Check the status of the network between the NameNode and the DataNode.

1. Log in to the faulty DataNode using its service IP address. Run the ping IP address of the NameNode command to check whether the network between the DataNode and the NameNode is abnormal.
– If yes, go to Step 2.2.
– If no, go to Step 3.1.

2. Rectify the network fault. Wait 5 minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 3.1.

Step 3 Check whether the DataNode is overloaded.

1. On the MRS Manager portal, click Alarms and check whether alarm ALM-14008 HDFS DataNode Memory Usage Exceeds the Threshold exists.
– If yes, go to Step 3.2.
– If no, go to Step 4.1.

2. Follow the procedures in ALM-14008 HDFS DataNode Memory Usage Exceeds the Threshold to handle the alarm and check whether the alarm is cleared.
– If yes, go to Step 3.3.
– If no, go to Step 4.1.

3. Wait 5 minutes and check whether the alarm is cleared.

MapReduce ServiceUser Guide 5 MRS Manager Operation Guide

2019-01-15 146

Page 156: Huawei › en-us › eu-west-0-user... · 2019-12-14 · Contents 1 Overview.........................................................................................................................................1

– If yes, no further action is required.
– If no, go to Step 4.1.

Step 4 Check whether the NameNode is overloaded.

1. On the MRS Manager portal, click Alarms and check whether alarm ALM-14007 HDFS NameNode Memory Usage Exceeds the Threshold exists.
– If yes, go to Step 4.2.
– If no, go to Step 5.

2. Follow the procedures in ALM-14007 HDFS NameNode Memory Usage Exceeds the Threshold to handle the alarm and check whether the alarm is cleared.
– If yes, go to Step 4.3.
– If no, go to Step 5.

3. Wait 5 minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 5.

Step 5 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.36 ALM-14010 NameService Service Is Abnormal

Description

The system checks the NameService service status every 180 seconds. This alarm is generated when the NameService service is unavailable and is cleared when the NameService service recovers.

Attribute

Alarm ID Alarm Severity Automatically Cleared

14010 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

NSName Specifies the name of the NameService for which the alarm is generated.

Impact on the System

HDFS fails to provide services for upper-layer components based on the NameService service, such as HBase and MapReduce. As a result, users cannot read or write files.

Possible Causes

l The JournalNode is faulty.
l The DataNode is faulty.
l The disk capacity is insufficient.
l The NameNode enters safe mode.

Procedure

Step 1 Check the status of the JournalNode instance.

1. On the MRS Manager portal, click Services.
2. Click HDFS.
3. Click Instance.
4. Check whether the Health Status of the JournalNode is Good.
– If yes, go to Step 2.1.
– If no, go to Step 1.5.

5. Select the faulty JournalNode and choose More > Restart Instance. Check whether the JournalNode successfully restarts.
– If yes, go to Step 1.6.
– If no, go to Step 5.

6. Wait 5 minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 2.1.

Step 2 Check the status of the DataNode instance.

1. On the MRS Manager portal, click Services.
2. Click HDFS.
3. In Operation and Health Summary, check whether the Health Status of all DataNodes is Good.
– If yes, go to Step 3.1.
– If no, go to Step 2.4.

4. Click Instance. On the DataNode management page, select the faulty DataNode, and choose More > Restart Instance. Check whether the DataNode successfully restarts.

– If yes, go to Step 2.5.

– If no, go to Step 3.1.

5. Wait 5 minutes and check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 4.1.

Step 3 Check disk status.

1. On the MRS Manager portal, click Hosts.

2. In the Disk Usage column, check whether disk space is insufficient.

– If yes, go to Step 3.3.

– If no, go to Step 4.1.

3. Expand the disk capacity.

4. Wait 5 minutes and check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 4.1.

Step 4 Check whether NameNode is in safe mode.

1. Use the client on the cluster node, and run the hdfs dfsadmin -safemode get command to check whether Safe mode is ON is displayed.
Information after Safe mode is ON is alarm information and is displayed based on actual conditions.

– If yes, go to Step 4.2.

– If no, go to Step 5.

2. Use the client on the cluster node and run the hdfs dfsadmin -safemode leave command.

3. Wait 5 minutes and check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 5.
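The safe mode check in Step 4 can be scripted by matching the first words of the command output. In this sketch the sample string stands in for real hdfs dfsadmin -safemode get output; any detail after "Safe mode is ON" is ignored.

```shell
# Decide from `hdfs dfsadmin -safemode get` output whether the NameNode is
# in safe mode. The sample string is an illustrative stand-in; extra text
# may follow "Safe mode is ON" depending on the cluster state.
status='Safe mode is ON. The reported blocks 10 need additional 2 blocks.'

case "$status" in
  "Safe mode is ON"*) in_safemode=yes ;;
  *)                  in_safemode=no ;;
esac
echo "in safe mode: $in_safemode"
# If yes, leave safe mode on a cluster node with:
#   hdfs dfsadmin -safemode leave
```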

Step 5 Collect fault information.

1. On MRS Manager, choose System > Export Log.

2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A


5.7.37 ALM-14011 HDFS DataNode Data Directory Is Not Configured Properly

Description

The DataNode parameter dfs.datanode.data.dir specifies DataNode data directories. This alarm is generated in any of the following scenarios:

l A configured data directory cannot be created.

l A data directory uses the same disk as other critical directories in the system.

l Multiple directories use the same disk.

This alarm is cleared when the DataNode data directory is configured properly and this DataNode is restarted.

Attribute

Alarm ID Alarm Severity Automatically Cleared

14011 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Impact on the System

If the DataNode data directory is mounted on a critical directory such as the root directory, the disk space of the root directory will be used up after running for a long time. This causes a system fault.

If the DataNode data directory is not configured properly, HDFS performance will deteriorate.

Possible Causes

l The DataNode data directory fails to be created.

l The DataNode data directory uses the same disk as critical directories, such as / or /boot.

l Multiple directories in the DataNode data directory use the same disk.


Procedure

Step 1 Check the alarm cause and information about the DataNode that generated the alarm.

1. On the MRS Manager portal, click Alarms. In the alarm list, click the alarm.
2. In the Alarm Details area, view Alarm Cause. In HostName of Location, obtain the host name of the DataNode that generated the alarm.

Step 2 Delete directories that do not comply with the disk plan from the DataNode data directory.

1. Choose Services > HDFS > Instance. In the instance list, click the DataNode instance on the alarm node.

2. Click Instance Configuration and view the value of the DataNode parameter dfs.datanode.data.dir.

3. Check whether all DataNode data directories are consistent with the disk plan.
– If yes, go to Step 2.4.
– If no, go to Step 2.7.

4. Modify the DataNode parameter dfs.datanode.data.dir and delete the incorrect directories.

5. Choose Services > HDFS > Instance and restart the DataNode instance.
6. Check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 2.7.

7. Log in to the DataNode that generated the alarm.
– If the alarm cause is "The DataNode data directory fails to be created", go to Step 3.1.
– If the alarm cause is "The DataNode data directory uses the same disk as critical directories, such as / or /boot", go to Step 4.1.
– If the alarm cause is "Multiple directories in the DataNode data directory use the same disk", go to Step 5.1.

Step 3 Check whether the DataNode data directory is created.

1. Run the following commands to switch the user:
sudo su - root
su - omm

2. Run the ls command to check whether the directories exist in the DataNode data directory.
– If yes, go to Step 7.
– If no, go to Step 3.3.

3. Run the mkdir data directory command to create a directory. Check whether the directory is successfully created.
– If yes, go to Step 6.1.
– If no, go to Step 3.4.

4. On the MRS Manager portal, click Alarms to check whether alarm ALM-12017 Insufficient Disk Capacity exists.
– If yes, go to Step 3.5.
– If no, go to Step 3.6.

5. Adjust the disk capacity and check whether alarm ALM-12017 Insufficient Disk Capacity is cleared. For details, see ALM-12017 Insufficient Disk Capacity.
– If yes, go to Step 3.3.
– If no, go to Step 7.

6. Check whether user omm has the rwx or x permission for all upper-layer directories of the directory. For example, for /tmp/abc/, user omm has the x permission for the tmp directory and the rwx permission for the abc directory.
– If yes, go to Step 6.1.
– If no, go to Step 3.7.

7. Run the chmod u+rwx path or chmod u+x path command as user root to assign the rwx or x permission to user omm. Go to Step 3.3.
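The permission check in Step 3.6 walks every upper-layer directory of the data directory. A small sketch of that walk, checking the x (search) permission for the current user; on the DataNode you would run it while logged in as omm, and the demonstration path is a throwaway directory rather than a real dfs.datanode.data.dir entry.

```shell
# Check that every ancestor of a target directory grants execute (search)
# permission to the current user, as Step 3.6 requires for user omm.
check_ancestors() {
  dir=$1
  while [ -n "$dir" ] && [ "$dir" != "/" ]; do
    if [ ! -x "$dir" ]; then
      echo "missing x permission on: $dir"
      return 1
    fi
    dir=$(dirname "$dir")
  done
  echo "all ancestors traversable"
}

# Demonstrate on a throwaway directory tree (illustrative path).
base=$(mktemp -d)
mkdir -p "$base/hadoop/data1/dn"
check_ancestors "$base/hadoop/data1/dn"
rm -rf "$base"
```

If the function reports a directory, fix it with the chmod command from Step 3.7.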

Step 4 Check whether the DataNode data directory uses the same disk as other critical directories in the system.

1. Run the df command to obtain the disk mounting information of each directory in the DataNode data directory.

2. Check whether the directories mounted on the disk are critical directories, such as / or /boot.
– If yes, go to Step 4.3.
– If no, go to Step 6.1.

3. Change the value of the DataNode parameter dfs.datanode.data.dir and delete the directories that use the same disk as critical directories.

4. Go to Step 6.1.

Step 5 Check whether multiple directories in the DataNode data directory use the same disk.

1. Run the df command to obtain the disk mounting information of each directory in the DataNode data directory. Record the mounted directory in the command output.

2. Modify the DataNode parameter dfs.datanode.data.dir to reserve only one of the directories mounted on the same disk.

3. Go to Step 6.1.
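The df checks in Steps 4 and 5 both reduce to mapping each directory to its backing filesystem and looking for duplicates. A sketch of that mapping; the directory list is an illustrative stand-in for the dfs.datanode.data.dir entries, and the output depends on how the local disks are partitioned.

```shell
# Map a directory to its backing filesystem (first column of `df -P`) so
# you can spot data directories that share a disk with each other or with
# critical directories such as / or /boot. The directory list below is an
# illustrative stand-in for the dfs.datanode.data.dir entries.
mount_of() { df -P "$1" | awk 'NR==2 {print $1}'; }

root_dev=$(mount_of /)
for d in /var/tmp /usr/local; do
  if [ "$(mount_of "$d")" = "$root_dev" ]; then
    echo "$d shares a disk with /"
  fi
done
```

Any two data directories that resolve to the same device, or to the device backing / or /boot, violate the disk plan.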

Step 6 Restart the DataNode and check whether the alarm is cleared.

1. On the MRS Manager portal, choose Services > HDFS > Instance and restart the DataNode instance.

2. Check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 7.

Step 7 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A


5.7.38 ALM-14012 HDFS JournalNode Data Is Not Synchronized

Description

On the active NameNode, the system checks data synchronization on all JournalNodes in the cluster every 5 minutes. This alarm is generated when data on a JournalNode is not synchronized with that on other JournalNodes.

This alarm is cleared in 5 minutes after data on the JournalNodes is synchronized.

Attribute

Alarm ID Alarm Severity Automatically Cleared

14012 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

IP Specifies the service IP address of the JournalNode instance for which the alarm is generated.

Impact on the System

If data on more than half of the JournalNodes is not synchronized, the NameNode cannot work correctly, making the HDFS service unavailable.

Possible Causes

l The JournalNode instance has not been started or has been stopped.
l The JournalNode instance is working incorrectly.
l The network of the JournalNode is unreachable.

Procedure

Step 1 Check whether the JournalNode instance has been started.

1. Log in to MRS Manager and click Alarms. In the alarm list, click the alarm.
2. In the Alarm Details area, check Location and obtain the IP address of the JournalNode that generated the alarm.
3. Choose Services > HDFS > Instance. In the instance list, click the JournalNode that generated the alarm and check whether Operating Status of the node is Started.


– If yes, go to Step 2.1.
– If no, go to Step 1.4.

4. Select the JournalNode instance and choose More > Start Instance to start it.
5. Wait 5 minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 4.

Step 2 Check whether the JournalNode instance is working correctly.

1. Check whether Health Status of the JournalNode instance is Good.
– If yes, go to Step 3.1.
– If no, go to Step 2.2.

2. Select the JournalNode instance and choose More > Start Instance to start it.
3. Wait 5 minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 4.

Step 3 Check whether the network of the JournalNode is reachable.

1. On the MRS Manager portal, choose Services > HDFS > Instance to check the service IP address of the active NameNode.

2. Log in to the active NameNode.
3. Run the ping Service IP address of the JournalNode command to check whether either a timeout occurs or the network between the active NameNode and the JournalNode is unreachable.
– If yes, go to Step 3.4.
– If no, go to Step 4.

4. Contact O&M personnel to rectify the network fault. Wait 5 minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 4.

Step 4 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.39 ALM-16000 Percentage of Sessions Connected to the HiveServer to Maximum Number Allowed Exceeds the Threshold

Description

The system checks the percentage of sessions connected to the HiveServer to the maximum number allowed every 30 seconds. This indicator can be viewed on the Hive service monitoring page. This alarm is generated when the percentage exceeds the specified threshold and is automatically cleared when the percentage is less than or equal to the threshold.

Attribute

Alarm ID Alarm Severity Automatically Cleared

16000 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Trigger condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

New connections cannot be created.

Possible Causes

Too many clients are connected to the HiveServer.

Procedure

Step 1 Increase the maximum number of connections to Hive.

1. Log in to the MRS Manager portal.
2. Choose Services > Hive > Service Configuration, and set Type to All.
3. Increase the value of the hive.server.session.control.maxconnections configuration item.
Suppose the value of the configuration item is A, the threshold is B, and the number of sessions connected to the HiveServer is C. Adjust the value of the configuration item so that A x B > C. The number of sessions connected to the HiveServer can be viewed on the Hive service monitoring page.

4. Check whether the alarm is cleared.
– If yes, no further action is required.


– If no, go to Step 2.
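The A x B > C rule from Step 1.3 can be checked with a few lines of shell arithmetic. All three values below are illustrative placeholders; read the threshold and the current session count from MRS Manager.

```shell
# Sizing check for hive.server.session.control.maxconnections: with the
# configured value A, the alarm threshold B (in percent), and the current
# session count C, the guide requires A x B > C.
# All values below are illustrative; read B and C from MRS Manager.
A=200   # hive.server.session.control.maxconnections
B=90    # alarm threshold, in percent
C=150   # sessions currently connected to the HiveServer

headroom=$((A * B / 100))
if [ "$headroom" -gt "$C" ]; then
  echo "A is large enough: A x B = $headroom > C = $C"
else
  echo "increase A: A x B = $headroom <= C = $C"
fi
```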

Step 2 Collect fault information.

1. On MRS Manager, choose System > Export Log.

2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.40 ALM-16001 Hive Warehouse Space Usage Exceeds the Threshold

Description

The system checks the usage of Hive data warehouse space every 30 seconds. The indicator Percentage of HDFS Space Used by Hive to the Available Space can be viewed on the Hive service monitoring page. This alarm is generated when the usage of Hive warehouse space exceeds the specified threshold and is cleared when the usage is less than or equal to the threshold.

You can reduce the warehouse space usage by expanding the warehouse capacity or releasing used space.

Attribute

Alarm ID Alarm Severity Automatically Cleared

16001 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Trigger condition Generates an alarm when the actual indicator value exceeds the specified threshold.


Impact on the System

The system fails to write data, which causes data loss.

Possible Causes

l The maximum available capacity of the HDFS for Hive is too small.
l The system disk space is insufficient.
l Data nodes break down.

Procedure

Step 1 Expand the system configuration.

1. Analyze the cluster HDFS capacity usage and increase the maximum available capacity of the HDFS for Hive.
Log in to MRS Manager, choose Services > Hive > Service Configuration, and set Type to All. Increase the value of the hive.metastore.warehouse.size.percent configuration item. Suppose the value of the configuration item is A, the total HDFS storage space is B, the threshold is C, and the HDFS space used by Hive is D. Adjust the value of the configuration item so that A x B x C > D. The total HDFS storage space can be viewed on the HDFS monitoring page, and the HDFS space used by Hive can be viewed on the Hive service monitoring page.

2. Check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 2.1.
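The A x B x C > D rule from Step 1.1 involves fractions, so awk is used for the arithmetic in this sketch. All values are illustrative placeholders; read the total HDFS space and the Hive usage from the HDFS and Hive monitoring pages.

```shell
# Sizing check for hive.metastore.warehouse.size.percent: with configured
# value A (a fraction), total HDFS space B, alarm threshold C (a fraction),
# and HDFS space used by Hive D, the guide requires A x B x C > D.
# All values below are illustrative.
A=0.30     # hive.metastore.warehouse.size.percent
B=10240    # total HDFS storage space, in GB
C=0.85     # alarm threshold, as a fraction
D=2000     # HDFS space used by Hive, in GB

ok=$(awk -v a="$A" -v b="$B" -v c="$C" -v d="$D" \
  'BEGIN { if (a * b * c > d) print "yes"; else print "no" }')
echo "A x B x C exceeds D: $ok"
```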

Step 2 Expand the system capacity.

1. Add nodes.
2. Check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 3.1.

Step 3 Check whether data nodes are in the normal state.

1. Log in to MRS Manager and click Alarms.
2. Check whether alarm ALM-12006 Node Fault, ALM-12007 Process Fault, or ALM-14002 DataNode Disk Usage Exceeds the Threshold exists.
– If yes, go to Step 3.3.
– If no, go to Step 4.

3. Follow the procedures in ALM-12006 Node Fault, ALM-12007 Process Fault, or ALM-14002 DataNode Disk Usage Exceeds the Threshold to handle the alarm.

4. Check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 4.

Step 4 Collect fault information.

1. On MRS Manager, choose System > Export Log.


2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.41 ALM-16002 Successful Hive SQL Operations Are Lower than the Threshold

Description

Every 30 seconds, the system checks the percentage of successfully executed HiveQL statements. Percentage of successfully executed HiveQL statements = Number of HiveQL statements successfully executed by Hive in a specified period/Total number of HiveQL statements executed by Hive. This indicator can be viewed on the Hive service monitoring page.

This alarm is generated when the percentage of successfully executed HiveQL statements is lower than the specified threshold and is cleared when the percentage is greater than or equal to the threshold.

The name of the host where the alarm is generated can be obtained from the alarm location information. The host IP address is the IP address of the HiveServer node.

Attribute

Alarm ID Alarm Severity Automatically Cleared

16002 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Trigger condition Generates an alarm when the actual indicator value exceeds the specified threshold.


Impact on the System

The system configuration and performance cannot meet service processing requirements.

Possible Causes

l A syntax error occurs in HiveQL commands.
l The HBase service is abnormal when a Hive on HBase task is being performed.
l Basic services that are depended on are abnormal, such as HDFS, Yarn, and ZooKeeper.

Procedure

Step 1 Check whether the HiveQL commands comply with syntax.

1. Use the Hive client to log in to the HiveServer node where the alarm is generated. Querythe HiveQL syntax standard provided by Apache, and check whether the HiveQLcommands are correct. For details, see https://cwiki.apache.org/confluence/display/hive/languagemanual.– If yes, go to Step 2.1.– If no, go to Step 1.2.

NOTE

To view the user who runs an incorrect statement, download the HiveServerAudit logs of theHiveServer node where this alarm is generated. Set Start Time and End Time to 10 minutesbefore and after the alarm generation time respectively. Open the log file and search for theResult=FAIL keyword to filter the log information about the incorrect statement, and then viewthe user who runs the incorrect statement according to UserName in the log information.

2. Enter correct HiveQL statements, and check whether they can be properly executed.
   – If yes, go to Step 4.5.
   – If no, go to Step 2.1.
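As a sketch of the audit-log check described in the note above, the commands below filter failed statements and extract the user who ran each one. The log file name and field layout here are invented for illustration; only the Result=FAIL and UserName keywords come from this guide, so match the pattern to the actual downloaded audit log before relying on it.

```shell
# Hypothetical audit-log sample; the real MRS HiveServer audit log layout may differ.
cat > hiveserver-audit.log <<'EOF'
2019-01-15 10:01:02 | UserName=admin | Statement=SELECT * FROM t1 | Result=SUCCESS
2019-01-15 10:02:10 | UserName=test_user | Statement=SELCT * FROM t1 | Result=FAIL
EOF

# Keep only failed statements, then print the user who ran each one.
grep 'Result=FAIL' hiveserver-audit.log |
  sed -n 's/.*UserName=\([^ |]*\).*/\1/p'
```

For the sample above this prints test_user, the user who ran the misspelled SELCT statement.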

Step 2 Check whether the HBase service is abnormal.

1. Check whether a Hive on HBase task is being performed.
   – If yes, go to Step 2.2.
   – If no, go to Step 3.1.

2. Check whether the HBase service is in the normal state in the service list.
   – If yes, go to Step 3.1.
   – If no, go to Step 2.3.

3. Check the alarms displayed on the alarm page and clear them according to Alarm Help.
4. Enter correct HiveQL statements, and check whether they can be properly executed.
   – If yes, go to Step 4.5.
   – If no, go to Step 3.1.

Step 3 Check whether the Spark service is abnormal.

1. Check whether the Spark service is in the normal state in the service list.
   – If yes, go to Step 4.1.
   – If no, go to Step 3.2.

2. Check the alarms displayed on the alarm page and clear them according to Alarm Help.

3. Enter correct HiveQL statements, and check whether they can be properly executed.
   – If yes, go to Step 4.5.
   – If no, go to Step 4.1.

Step 4 Check whether HDFS, Yarn, and ZooKeeper are in the normal state.

1. On the MRS Manager portal, click Services.

2. In the service list, check whether the HDFS, Yarn, and ZooKeeper services are in the normal state.
   – If yes, go to Step 4.5.
   – If no, go to Step 4.3.

3. Check the alarms displayed on the alarm page and clear them according to Alarm Help.

4. Enter correct HiveQL statements, and check whether they can be properly executed.
   – If yes, go to Step 4.5.
   – If no, go to Step 5.

5. Wait 5 minutes and check whether the alarm is cleared.
   – If yes, no further action is required.
   – If no, go to Step 5.

Step 5 Collect fault information.

1. On MRS Manager, choose System > Export Log.

2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.42 ALM-16004 Hive Service Unavailable

Description

The system checks the Hive service status every 30 seconds. This alarm is generated when the Hive service is unavailable and is cleared when the Hive service is in the normal state.

Attribute

Alarm ID: 16004
Alarm Severity: Critical
Automatically Cleared: Yes


Parameters

ServiceName: Specifies the service for which the alarm is generated.

RoleName: Specifies the role for which the alarm is generated.

HostName: Specifies the host for which the alarm is generated.

Impact on the System

The system cannot provide data loading, query, and extraction services.

Possible Causes

- The Hive service unavailability may be related to the basic services, such as ZooKeeper, HDFS, Yarn, and DBService, or caused by faults of the Hive processes.
  – The ZooKeeper, HDFS, Yarn, or DBService service is abnormal.
  – The Hive service process is faulty. If the alarm is caused by a Hive process fault, the alarm report has a delay of about 5 minutes.
- The network communication between the Hive service and the basic services is interrupted.

Procedure

Step 1 Check the HiveServer/MetaStore process status.

1. On MRS Manager, choose Services > Hive > Instance. In the Hive instance list, check whether all HiveServer/MetaStore instances are in the Unknown state.
   – If yes, go to Step 1.2.
   – If no, go to Step 2.

2. Above the Hive instance list, choose More > Restart Instance to restart the HiveServer/MetaStore process.

3. In the alarm list, check whether alarm ALM-16004 Hive Service Unavailable is cleared.
   – If yes, no further action is required.
   – If no, go to Step 2.

Step 2 Check the ZooKeeper service status.

1. In the alarm list on MRS Manager, check whether alarm ALM-12007 Process Fault is generated.
   – If yes, go to Step 2.2.
   – If no, go to Step 3.

2. In the Alarm Details area of ALM-12007 Process Fault, check whether ServiceName is ZooKeeper.


   – If yes, go to Step 2.3.
   – If no, go to Step 3.

3. Rectify the fault by following the steps provided in ALM-12007 Process Fault.
4. In the alarm list, check whether alarm ALM-16004 Hive Service Unavailable is cleared.
   – If yes, no further action is required.
   – If no, go to Step 3.

Step 3 Check the HDFS service status.

1. In the alarm list on MRS Manager, check whether alarm ALM-14000 HDFS Service Unavailable is generated.
   – If yes, go to Step 3.2.
   – If no, go to Step 4.

2. Rectify the fault by following the steps provided in ALM-14000 HDFS Service Unavailable.
3. In the alarm list, check whether alarm ALM-16004 Hive Service Unavailable is cleared.
   – If yes, no further action is required.
   – If no, go to Step 4.

Step 4 Check the Yarn service status.

1. In the alarm list on MRS Manager, check whether alarm ALM-18000 Yarn Service Unavailable is generated.
   – If yes, go to Step 4.2.
   – If no, go to Step 5.

2. Rectify the fault by following the steps provided in ALM-18000 Yarn Service Unavailable.
3. In the alarm list, check whether alarm ALM-16004 Hive Service Unavailable is cleared.
   – If yes, no further action is required.
   – If no, go to Step 5.

Step 5 Check the DBService service status.

1. In the alarm list on MRS Manager, check whether alarm ALM-27001 DBService Unavailable is generated.
   – If yes, go to Step 5.2.
   – If no, go to Step 6.

2. Rectify the fault by following the steps provided in ALM-27001 DBService Unavailable.
3. In the alarm list, check whether alarm ALM-16004 Hive Service Unavailable is cleared.
   – If yes, no further action is required.
   – If no, go to Step 6.

Step 6 Check the network connection between Hive and ZooKeeper, HDFS, Yarn, and DBService.


1. On MRS Manager, choose Services > Hive.

2. Click Instance.

The HiveServer instance list is displayed.

3. Click Host Name in the row of HiveServer.

The HiveServer host status page is displayed.

4. Record the IP address under Summary.

5. Use the IP address obtained in Step 6.4 to log in to the host that runs HiveServer.

6. Run the ping command to check whether the network connection between the host that runs HiveServer and the hosts that run the ZooKeeper, HDFS, Yarn, and DBService services is in the normal state.

– If yes, go to Step 7.

– If no, go to Step 6.7.

The method of obtaining the IP addresses of the hosts that run the ZooKeeper, HDFS, Yarn, and DBService services is the same as that of obtaining the HiveServer IP address.

7. Contact O&M personnel to recover the network.

8. In the alarm list, check whether the alarm ALM-16004 Hive Service Unavailable is cleared.

– If yes, no further action is required.

– If no, go to Step 7.
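The connectivity check in Step 6.6 can be sketched as a small loop. The host list below is a placeholder for the IP addresses recorded from MRS Manager, and the ping flags (-c, -W) assume a Linux ping variant; adjust both to your environment.

```shell
# Ping each dependent-service host once and report reachability.
# Replace the placeholder list with the ZooKeeper, HDFS, Yarn, and
# DBService host IP addresses recorded earlier.
for host in 127.0.0.1; do
  if ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
    echo "$host reachable"
  else
    echo "$host unreachable"
  fi
done
```

Any host reported as unreachable needs the network recovery step that follows.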

Step 7 Collect fault information.

1. On MRS Manager, choose System > Export Log.

2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.43 ALM-18000 Yarn Service Unavailable

Description

The alarm module checks the Yarn service status every 30 seconds. This alarm is generated when the Yarn service is unavailable and is cleared when the Yarn service recovers.

Attribute

Alarm ID: 18000
Alarm Severity: Critical
Automatically Cleared: Yes


Parameters

ServiceName: Specifies the service for which the alarm is generated.

RoleName: Specifies the role for which the alarm is generated.

HostName: Specifies the host for which the alarm is generated.

Impact on the System

- The cluster cannot provide the Yarn service.
- Users cannot run new applications.
- Submitted applications cannot be run.

Possible Causes

- The ZooKeeper service is abnormal.
- The HDFS service is abnormal.
- There is no active ResourceManager node in the Yarn cluster.
- All NodeManager nodes in the Yarn cluster are abnormal.

Procedure

Step 1 Check the ZooKeeper service status.

1. In the alarm list on MRS Manager, check whether alarm ALM-13000 ZooKeeper Service Unavailable is generated.
   – If yes, go to Step 1.2.
   – If no, go to Step 2.1.

2. Rectify the fault by following the steps provided in ALM-13000 ZooKeeper Service Unavailable, and check whether the alarm is cleared.
   – If yes, no further action is required.
   – If no, go to Step 2.1.

Step 2 Check the HDFS service status.

1. In the alarm list on MRS Manager, check whether an alarm related to HDFS is generated.
   – If yes, go to Step 2.2.
   – If no, go to Step 3.1.

2. Click Alarms, and handle the HDFS alarms according to Alarm Help. Check whether the alarm is cleared.
   – If yes, no further action is required.
   – If no, go to Step 3.1.


Step 3 Check the ResourceManager node status in the Yarn cluster.

1. On MRS Manager, choose Services > Yarn.

2. In Yarn Summary, check whether there is an active ResourceManager node in the Yarn cluster.

– If yes, go to Step 4.1.

– If no, go to Step 5.

Step 4 Check the NodeManager node status in the Yarn cluster.

1. On MRS Manager, choose Services > Yarn > Instance.

2. Check Health Status of NodeManager, and check whether there are unhealthy nodes.

– If yes, go to Step 4.3.

– If no, go to Step 5.

3. Rectify the fault by following the steps provided in ALM-18002 NodeManager Heartbeat Lost or ALM-18003 NodeManager Unhealthy. After the fault is rectified, check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 5.

Step 5 Collect fault information.

1. On MRS Manager, choose System > Export Log.

2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.44 ALM-18002 NodeManager Heartbeat Lost

Description

The system checks the number of lost NodeManager nodes every 30 seconds and compares the number with the threshold. This alarm is generated when the value of the Lost Nodes indicator exceeds the threshold and is cleared when the value is less than or equal to the threshold.

Attribute

Alarm ID: 18002
Alarm Severity: Major
Automatically Cleared: Yes


Parameters

ServiceName: Specifies the service for which the alarm is generated.

RoleName: Specifies the role for which the alarm is generated.

HostName: Specifies the host for which the alarm is generated.

Trigger condition: An alarm is generated when the actual indicator value exceeds the specified threshold.

Impact on the System

- The lost NodeManager node cannot provide the Yarn service.
- The number of containers decreases, so the cluster performance deteriorates.

Possible Causes

- NodeManager is forcibly deleted without being decommissioned.
- All NodeManager instances are stopped, or the NodeManager process is faulty.
- The host where the NodeManager node resides is faulty.
- The network between the NodeManager and ResourceManager is disconnected or busy.

Procedure

Contact the O&M personnel and send the collected log information.

Related Information

N/A

5.7.45 ALM-18003 NodeManager Unhealthy

Description

The system checks the number of abnormal NodeManager nodes every 30 seconds and compares the number with the threshold. This alarm is generated when the value of the Unhealthy Nodes indicator exceeds the threshold and is cleared when the value is less than or equal to the threshold.

Attribute

Alarm ID: 18003
Alarm Severity: Major
Automatically Cleared: Yes


Parameters

ServiceName: Specifies the service for which the alarm is generated.

RoleName: Specifies the role for which the alarm is generated.

HostName: Specifies the host for which the alarm is generated.

Trigger condition: An alarm is generated when the actual indicator value exceeds the specified threshold.

Impact on the System

- The faulty NodeManager node cannot provide the Yarn service.
- The number of containers decreases, so the cluster performance deteriorates.

Possible Causes

- The hard disk space of the host where the NodeManager node resides is insufficient.
- User omm does not have the permission to access a local directory on the NodeManager node.

Procedure

Contact the O&M personnel and send the collected log information.

Related Information

N/A

5.7.46 ALM-18006 MapReduce Job Execution Timeout

Description

The alarm module checks MapReduce job execution every 30 seconds. This alarm is generated when the execution of a submitted MapReduce job times out. It must be manually cleared.

Attribute

Alarm ID: 18006
Alarm Severity: Major
Automatically Cleared: No


Parameters

ServiceName: Specifies the service for which the alarm is generated.

RoleName: Specifies the role for which the alarm is generated.

HostName: Specifies the host for which the alarm is generated.

Trigger condition: An alarm is generated when the actual indicator value exceeds the specified threshold.

Impact on the System

Because the execution times out, no execution result can be obtained.

Possible Causes

The specified time period is shorter than the execution time. (Executing a MapReduce job takes a long time.)

Procedure

Step 1 Check whether the timeout period is improperly set.

Set -Dapplication.timeout.interval to a larger value, or do not set the parameter. Execute the MapReduce job again and check whether it is executed successfully.

– If yes, go to Step 2.4.
– If no, go to Step 2.1.
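The resubmission described in Step 1 might look like the sketch below. The jar name, program, and paths are placeholders (not from this guide), and the leading echo only prints the composed command so nothing is actually submitted; remove the echo on a real cluster, and verify the unit and placement of the interval value against your cluster's documentation.

```shell
# Assumed larger timeout value; pick one comfortably above the job's usual run time.
TIMEOUT=1200000

# echo prints the command for review instead of submitting it.
echo yarn jar hadoop-mapreduce-examples.jar wordcount \
  -Dapplication.timeout.interval="$TIMEOUT" \
  /tmp/input /tmp/output
```

Omitting the -D option entirely disables the per-job timeout, which is the alternative the step describes.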

Step 2 Check the Yarn service status.

1. In the alarm list on MRS Manager, check whether alarm ALM-18000 Yarn Service Unavailable is generated.
   – If yes, go to Step 2.2.
   – If no, go to Step 3.

2. Rectify the fault by following the steps provided in ALM-18000 Yarn Service Unavailable.
3. Run the MapReduce job command again to check whether the MapReduce job can be executed.
   – If yes, go to Step 2.4.
   – If no, go to Step 4.


4. In the alarm list, manually clear the alarm in the Operation column. No further action is required.

Step 3 Adjust the timeout threshold.

On MRS Manager, choose System > Configure Alarm Threshold > Service > Yarn > Timed Out Tasks, and increase the maximum number of timed-out tasks allowed by the current threshold rule. Check whether the alarm is cleared.

– If yes, no further action is required.
– If no, go to Step 4.

Step 4 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.47 ALM-19000 HBase Service Unavailable

Description

The alarm module checks the HBase service status every 30 seconds. This alarm is generated when the HBase service is unavailable and is cleared when the HBase service recovers.

Attribute

Alarm ID: 19000
Alarm Severity: Critical
Automatically Cleared: Yes

Parameters

ServiceName: Specifies the service for which the alarm is generated.

RoleName: Specifies the role for which the alarm is generated.

HostName: Specifies the host for which the alarm is generated.

Impact on the System

Operations such as reading or writing data and creating tables cannot be performed.


Possible Causes

- The ZooKeeper service is abnormal.
- The HDFS service is abnormal.
- The HBase service is abnormal.
- The network is abnormal.

Procedure

Step 1 Check the ZooKeeper service status.

1. In the service list on MRS Manager, check whether the health status of ZooKeeper is Good.
   – If yes, go to Step 2.1.
   – If no, go to Step 1.2.

2. In the alarm list, check whether alarm ALM-13000 ZooKeeper Service Unavailable exists.
   – If yes, go to Step 1.3.
   – If no, go to Step 2.1.

3. Rectify the fault by following the steps provided in ALM-13000 ZooKeeper Service Unavailable.
4. Wait several minutes and check whether the alarm is cleared.
   – If yes, no further action is required.
   – If no, go to Step 2.1.

Step 2 Check the HDFS service status.

1. In the alarm list, check whether alarm ALM-14000 HDFS Service Unavailable exists.
   – If yes, go to Step 2.2.
   – If no, go to Step 3.

2. Rectify the fault by following the steps provided in ALM-14000 HDFS Service Unavailable.
3. Wait several minutes and check whether the alarm is cleared.
   – If yes, no further action is required.
   – If no, go to Step 3.

Step 3 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.48 ALM-19006 HBase Replication Synchronization Failed

Description

This alarm is generated when disaster recovery (DR) data fails to be synchronized to a standby cluster. It is cleared when DR data is successfully synchronized.


Attribute

Alarm ID: 19006
Alarm Severity: Major
Automatically Cleared: Yes

Parameters

ServiceName: Specifies the service for which the alarm is generated.

RoleName: Specifies the role for which the alarm is generated.

HostName: Specifies the host for which the alarm is generated.

Impact on the System

HBase data in a cluster fails to be synchronized to the standby cluster, causing data inconsistency between the active and standby clusters.

Possible Causes

- The HBase service on the standby cluster is abnormal.
- The network is abnormal.

Procedure

Step 1 Check whether the alarm is automatically cleared.

1. Log in to MRS Manager of the active cluster, and click Alarms.
2. In the alarm list, click the alarm and obtain the alarm generation time from Generated On in Alarm Details. Check whether the alarm persists for over 5 minutes.
   – If yes, go to Step 2.1.
   – If no, go to Step 1.3.

3. Wait 5 minutes and check whether the alarm is automatically cleared.
   – If yes, no further action is required.
   – If no, go to Step 2.1.

Step 2 Check the HBase service status of the standby cluster.

1. Log in to MRS Manager of the active cluster, and click Alarms.
2. In the alarm list, click the alarm and obtain HostName from Location in Alarm Details.
3. Log in to the node where the HBase client resides in the active cluster. Run the following commands to switch the user:

   sudo su - root
   su - omm

4. Run the status 'replication', 'source' command to check the replication synchronization status of the faulty node. The replication synchronization status of a node is displayed as follows:

   10-10-10-153:
     SOURCE: PeerID=abc, SizeOfLogQueue=0, ShippedBatches=2, ShippedOps=2, ShippedBytes=320, LogReadInBytes=1636, LogEditsRead=5, LogEditsFiltered=3, SizeOfLogToReplicate=0, TimeForLogToReplicate=0, ShippedHFiles=0, SizeOfHFileRefsQueue=0, AgeOfLastShippedOp=0, TimeStampsOfLastShippedOp=Mon Jul 18 09:53:28 CST 2016, Replication Lag=0, FailedReplicationAttempts=0
     SOURCE: PeerID=abc1, SizeOfLogQueue=0, ShippedBatches=1, ShippedOps=1, ShippedBytes=160, LogReadInBytes=1636, LogEditsRead=5, LogEditsFiltered=3, SizeOfLogToReplicate=0, TimeForLogToReplicate=0, ShippedHFiles=0, SizeOfHFileRefsQueue=0, AgeOfLastShippedOp=16788, TimeStampsOfLastShippedOp=Sat Jul 16 13:19:00 CST 2016, Replication Lag=16788, FailedReplicationAttempts=5

5. Obtain the PeerID of each record whose FailedReplicationAttempts value is greater than 0. In the preceding output, data on faulty node 10-10-10-153 fails to be synchronized to the standby cluster whose PeerID is abc1.

6. Run the list_peers command to find the cluster and the HBase instance corresponding to the PeerID:

   PEER_ID  CLUSTER_KEY                                            STATE    TABLE_CFS
   abc1     10.10.10.110,10.10.10.119,10.10.10.133:24002:/hbase2   ENABLED
   abc      10.10.10.110,10.10.10.119,10.10.10.133:24002:/hbase    ENABLED

   Here, /hbase2 indicates that data is synchronized to the HBase2 instance of the standby cluster.

7. In the service list on MRS Manager of the standby cluster, check whether the health status of the HBase instance obtained in Step 2.6 is Good.
   – If yes, go to Step 3.1.
   – If no, go to Step 2.8.

8. In the alarm list, check whether alarm ALM-19000 HBase Service Unavailable exists.
   – If yes, go to Step 2.9.
   – If no, go to Step 3.1.

9. Rectify the fault by following the steps provided in ALM-19000 HBase Service Unavailable.
10. Wait several minutes and check whether the alarm is cleared.
   – If yes, no further action is required.
   – If no, go to Step 3.1.
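The FailedReplicationAttempts check in Step 2.5 can be automated with a small filter over the status output. The sample below is trimmed from the example output above; the awk offsets assume the PeerID=... and FailedReplicationAttempts=... key-value format shown there, so adjust them if your HBase version prints a different layout.

```shell
# Trimmed sample of the "status 'replication', 'source'" output shown above.
cat > replication-status.txt <<'EOF'
SOURCE: PeerID=abc, AgeOfLastShippedOp=0, Replication Lag=0, FailedReplicationAttempts=0
SOURCE: PeerID=abc1, AgeOfLastShippedOp=16788, Replication Lag=16788, FailedReplicationAttempts=5
EOF

# Print the PeerID of every record whose FailedReplicationAttempts exceeds 0.
awk 'match($0, /PeerID=[^,]*/) { peer = substr($0, RSTART + 7, RLENGTH - 7) }
     match($0, /FailedReplicationAttempts=[0-9]+/) {
       if (substr($0, RSTART + 26, RLENGTH - 26) + 0 > 0) print peer
     }' replication-status.txt
```

For the sample above this prints abc1, matching the faulty peer identified in Step 2.5.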

Step 3 Check the network connection between RegionServers on active and standby clusters.

1. Log in to MRS Manager of the active cluster, and click Alarms.
2. In the alarm list, click the alarm and obtain HostName from Location in Alarm Details.
3. Log in to the faulty RegionServer node.
4. Run the ping command to check whether the network connection between the faulty RegionServer node and the host where the RegionServer of the standby cluster resides is in the normal state.
   – If yes, go to Step 4.
   – If no, go to Step 3.5.

5. Contact O&M personnel to recover the network.


6. After the network recovers, check whether the alarm is cleared in the alarm list.

– If yes, no further action is required.

– If no, go to Step 4.

Step 4 Collect fault information.

1. On MRS Manager, choose System > Export Log.

2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.49 ALM-25000 LdapServer Service Unavailable

Description

The system checks the LdapServer service status every 30 seconds. This alarm is generated when the system detects that both the active and standby LdapServer services are abnormal. It is cleared when one or both LdapServer services are normal.

Attribute

Alarm ID: 25000
Alarm Severity: Critical
Automatically Cleared: Yes

Parameters

ServiceName: Specifies the service for which the alarm is generated.

RoleName: Specifies the role for which the alarm is generated.

HostName: Specifies the host for which the alarm is generated.

Impact on the System

No operation can be performed for the KrbServer and LdapServer users in the cluster. For example, users, user groups, or roles cannot be added, deleted, or modified, and user passwords cannot be changed on MRS Manager. Authentication for existing users in the cluster is not affected.


Possible Causes

- The node where the LdapServer service resides is faulty.
- The LdapServer process is abnormal.

Procedure

Step 1 Check whether the nodes where the two SlapdServer instances of the LdapServer service reside are faulty.

1. On MRS Manager, choose Services > LdapServer > Instance to go to the LdapServer instance page. Obtain the host names of the nodes where the two SlapdServer instances reside.

2. On the Alarms page of MRS Manager, check whether alarm ALM-12006 Node Fault is generated.
   – If yes, go to Step 1.3.
   – If no, go to Step 2.1.

3. Check whether the host name in the alarm is consistent with the host name in Step 1.1.
   – If yes, go to Step 1.4.
   – If no, go to Step 2.1.

4. Rectify the fault by following the steps provided in ALM-12006 Node Fault.
5. In the alarm list, check whether alarm ALM-25000 LdapServer Service Unavailable is cleared.
   – If yes, no further action is required.
   – If no, go to Step 3.

Step 2 Check whether the LdapServer process is in the normal state.

1. On the Alarms page of MRS Manager, check whether alarm ALM-12007 Process Fault is generated.
   – If yes, go to Step 2.2.
   – If no, go to Step 3.

2. Check whether the service name and host name in the alarm are consistent with those of LdapServer.
   – If yes, go to Step 2.3.
   – If no, go to Step 3.

3. Rectify the fault by following the steps provided in ALM-12007 Process Fault.
4. In the alarm list, check whether alarm ALM-25000 LdapServer Service Unavailable is cleared.
   – If yes, no further action is required.
   – If no, go to Step 3.

Step 3 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End


Related Information

N/A

5.7.50 ALM-25004 Abnormal LdapServer Data Synchronization

Description

This alarm is generated when LdapServer data on Manager is inconsistent or LdapServer data is different between LdapServer and Manager. It is cleared when the data becomes consistent.

Attribute

Alarm ID: 25004
Alarm Severity: Critical
Automatically Cleared: Yes

Parameters

ServiceName: Specifies the service for which the alarm is generated.

RoleName: Specifies the role for which the alarm is generated.

HostName: Specifies the host for which the alarm is generated.

Impact on the System

LdapServer data inconsistency occurs because LdapServer data on Manager or in the cluster is damaged. The LdapServer process with damaged data cannot provide services externally, and the authentication functions of Manager and the cluster are affected.

Possible Causes

- The network of the node where the LdapServer process is located is faulty.
- The LdapServer process is abnormal.
- An OS restart has damaged data on LdapServer.

Procedure

Step 1 Check whether the network where the LdapServer nodes reside is faulty.

1. On MRS Manager, click Alarms. Record the IP address of HostName in Location of the alarm as IP1. If multiple alarms exist, record the IP addresses as IP1, IP2, and IP3.


2. Contact O&M personnel and log in to the node corresponding to IP1. Run the ping command on the node to check whether the IP address of the management plane of the active OMS node can be pinged.
   – If yes, go to Step 1.3.
   – If no, go to Step 2.1.

3. Contact O&M personnel to recover the network, and check whether alarm ALM-25004 Abnormal LdapServer Data Synchronization is cleared.
   – If yes, no further action is required.
   – If no, go to Step 2.1.

Step 2 Check whether the LdapServer process is in the normal state.

1. On the Alarms page of MRS Manager, check whether alarm ALM-12004 OLdap Resource Is Abnormal is generated.
   – If yes, go to Step 2.2.
   – If no, go to Step 2.4.

2. Rectify the fault by following the steps provided in ALM-12004 OLdap Resource Is Abnormal.
3. Check whether alarm ALM-25004 Abnormal LdapServer Data Synchronization is cleared.
   – If yes, no further action is required.
   – If no, go to Step 2.4.

4. On the Alarms page of MRS Manager, check whether alarm ALM-12007 Process Fault of LdapServer is generated.
   – If yes, go to Step 2.5.
   – If no, go to Step 3.1.

5. Rectify the fault by following the steps provided in ALM-12007 Process Fault.
6. Check whether alarm ALM-25004 Abnormal LdapServer Data Synchronization is cleared.
   – If yes, no further action is required.
   – If no, go to Step 3.1.

Step 3 Check whether an OS restart has damaged data on LdapServer.

1. On MRS Manager, click Alarms. Record the IP address of HostName in Location of the alarm as IP1. If multiple alarms exist, record the IP addresses as IP1, IP2, and IP3. Choose Services > LdapServer > Service Configuration and record the LdapServer port number as PORT. If the IP address in the alarm location information is the IP address of the standby OMS node, the port number is the default port number 21750.

2. Log in to the node corresponding to IP1 as user omm and run the ldapsearch -H ldaps://IP1:PORT -x -LLL -b dc=hadoop,dc=com command to check whether errors are displayed in the queried information. If the IP address is that of the standby OMS node, run export LDAPCONF=${CONTROLLER_HOME}/ldapserver/ldapserver/local/conf/ldap.conf before running the preceding command.
   – If yes, go to Step 3.3.
   – If no, go to Step 4.

3. Recover the LdapServer and OMS nodes using backup data generated before the alarm was generated. For details, see section "Recovering Manager Data" in the Administrator Guide.


NOTE

To restore data, use the OMS data and LdapServer data backed up at the same time. Otherwise, the service and operation may fail. To recover data when services are running properly, you are advised to manually back up the latest management data and then recover the data. Otherwise, Manager data produced between the backup and recovery points in time will be lost.

4. Check whether alarm ALM-25004 Abnormal LdapServer Data Synchronization is cleared.

– If yes, no further action is required.

– If no, go to Step 4.

Step 4 Collect fault information.

1. On MRS Manager, choose System > Export Log.

2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.51 ALM-25500 KrbServer Service Unavailable

Description

The system checks the KrbServer service status every 30 seconds. This alarm is generated when the KrbServer service is abnormal and is cleared when the KrbServer service is in the normal state.

Attribute

Alarm ID Alarm Severity Automatically Cleared

25500 Critical Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.


Impact on the System
- No operation can be performed for the KrbServer component in the cluster.
- KrbServer authentication of other components will be affected.
- The health status of components that depend on KrbServer in the cluster is Bad.

Possible Causes
- The node where the KrbServer service resides is faulty.
- The OLdap service is unavailable.

Procedure

Step 1 Check whether the node where the KrbServer service is located is faulty.

1. On MRS Manager, choose Services > KrbServer > Instance to go to the KrbServer instance page. Obtain the host name of the node where the KrbServer service resides.

2. On the Alarms page of MRS Manager, check whether alarm ALM-12006 Node Fault is generated.
– If yes, go to Step 1.3.
– If no, go to Step 2.1.

3. Check whether the host name in the alarm is consistent with the host name in Step 1.1.
– If yes, go to Step 1.4.
– If no, go to Step 2.1.

4. Rectify the fault by following the steps provided in ALM-12006 Node Fault.
5. In the alarm list, check whether alarm ALM-25500 KrbServer Service Unavailable is cleared.
– If yes, no further action is required.
– If no, go to Step 3.

Step 2 Check whether the OLdap service is unavailable.

1. On the Alarms page of MRS Manager, check whether alarm ALM-12004 OLdap Resource Is Abnormal is generated.
– If yes, go to Step 2.2.
– If no, go to Step 3.

2. Rectify the fault by following the steps provided in ALM-12004 OLdap Resource Is Abnormal.

3. In the alarm list, check whether alarm ALM-25500 KrbServer Service Unavailable is cleared.
– If yes, no further action is required.
– If no, go to Step 3.

Step 3 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End


Related Information

N/A

5.7.52 ALM-27001 DBService Unavailable

Description

The alarm module checks the DBService status every 30 seconds. This alarm is generated when the system detects that DBService is unavailable and is cleared when DBService recovers.

Attribute

Alarm ID Alarm Severity Automatically Cleared

27001 Critical Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Impact on the System

The database service is unavailable and cannot provide data import or query functions for upper-layer services, which results in service exceptions.

Possible Causes
- The floating IP address does not exist.
- There is no active DBServer instance.
- The active and standby DBServer processes are abnormal.

Procedure

Step 1 Check whether the floating IP address exists in the cluster environment.

1. On MRS Manager, choose Services > DBService > Instance.
2. Check whether the active instance exists.
– If yes, go to Step 1.3.


– If no, go to Step 2.1.
3. Select the active DBServer instance and record the IP address.
4. Log in to the host that corresponds to the preceding IP address, and run the ifconfig command to check whether the DBService floating IP address exists on the node.
– If yes, go to Step 1.5.
– If no, go to Step 2.1.
5. Run the ping floating IP address command to check whether the DBService floating IP address can be pinged.
– If yes, go to Step 1.6.
– If no, go to Step 2.1.
6. Log in to the host that corresponds to the DBService floating IP address, and run the ifconfig interface down command to delete the floating IP address.
7. On MRS Manager, choose Services > DBService > More > Restart Service to restart DBService. Check whether DBService is restarted successfully.
– If yes, go to Step 1.8.
– If no, go to Step 2.1.
8. Wait about 2 minutes and check whether the alarm is cleared from the alarm list.
– If yes, no further action is required.
– If no, go to Step 3.1.
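The reachability check in step 5 can be sketched as follows. check_float_ip is a hypothetical helper, and 127.0.0.1 stands in for the real floating IP address; the ping flags assume a Linux node.

```shell
# Sketch of the floating-IP reachability check from step 5; on a real node
# the argument would be the DBService floating IP address.
check_float_ip() {
  local float_ip="$1"
  # Two probes with a 2-second timeout each; suppress ping's own output.
  if ping -c 2 -W 2 "$float_ip" > /dev/null 2>&1; then
    echo "reachable"
  else
    echo "unreachable"
  fi
}

check_float_ip 127.0.0.1
```

If the address is unreachable, the procedure directs you to Step 2.1 rather than restarting DBService.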

Step 2 Check the status of the active DBServer instance.

1. Select the DBServer instance whose role status is abnormal and record the IP address.
2. On the Alarms page, check whether alarm ALM-12007 Process Fault occurs in the DBServer instance on the host that corresponds to the IP address.
– If yes, go to Step 2.3.
– If no, go to Step 4.
3. Follow procedures in ALM-12007 Process Fault to handle the alarm.
4. Wait about 5 minutes and check whether the alarm is cleared from the alarm list.
– If yes, no further action is required.
– If no, go to Step 4.

Step 3 Check the status of the active and standby DBServers.

1. Log in to the host that corresponds to the DBService floating IP address, and run the sudo su - root and su - omm commands to switch to user omm. Run the cd ${BIGDATA_HOME}/FusionInsight/dbservice/ command to go to the installation directory of DBService.
2. Run the sh sbin/status-dbserver.sh command to view the status of DBService's active and standby HA processes. Check whether the status can be viewed successfully.
– If yes, go to Step 3.3.
– If no, go to Step 4.
3. Check whether the active and standby HA processes are normal.
– If yes, go to Step 4.
– If no, go to Step 3.4.
4. On MRS Manager, choose Services > DBService > More > Restart Service to restart DBService, and check whether DBService is restarted successfully.


– If yes, go to Step 3.5.

– If no, go to Step 4.

5. Wait about 2 minutes and check whether the alarm is cleared from the alarm list.

– If yes, no further action is required.

– If no, go to Step 4.

Step 4 Collect fault information.

1. On MRS Manager, choose System > Export Log.

2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.53 ALM-27003 DBService Heartbeat Interruption Between the Active and Standby Nodes

Description

This alarm is generated when the active or standby DBService node does not receive heartbeat messages from the peer node. It is cleared when the heartbeat recovers.

Attribute

Alarm ID Alarm Severity Automatically Cleared

27003 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Local DBService HA Name Specifies a local DBService HA.

Peer DBService HA Name Specifies a peer DBService HA.


Impact on the System

During the DBService heartbeat interruption, only one node can provide services. If this node is faulty, no standby node is available for failover and the services become unavailable.

Possible Causes

The link between the active and standby DBService nodes is abnormal.

Procedure

Step 1 Check whether the network between the active and standby DBService servers is in the normal state.

1. In the alarm list on MRS Manager, locate the row that contains the alarm, and view the IP address of the standby DBService server in the alarm details.

2. Log in to the active DBService server.

3. Run the ping heartbeat IP address of the standby DBService command to check whether the standby DBService server is reachable.

– If yes, go to Step 2.

– If no, go to Step 1.4.

4. Contact the network administrator to check whether the network is faulty.

– If yes, go to Step 1.5.

– If no, go to Step 2.

5. Rectify the network fault and check whether the alarm is cleared from the alarm list.

– If yes, no further action is required.

– If no, go to Step 2.

Step 2 Collect fault information.

1. On MRS Manager, choose System > Export Log.

2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.54 ALM-27004 Data Inconsistency Between Active and Standby DBServices

Description

The system checks the data synchronization status between the active and standby DBServices every 10 seconds. This alarm is generated when the synchronization status cannot be queried six times consecutively or when the synchronization status is abnormal. This alarm is cleared when data synchronization is normal.


Attribute

Alarm ID Alarm Severity Automatically Cleared

27004 Critical Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Local DBService HA Name Specifies a local DBService HA.

Peer DBService HA Name Specifies a peer DBService HA.

SYNC_PERSENT Specifies the synchronization percentage.

Impact on the System

Data may be lost or abnormal if the active instance becomes abnormal.

Possible Causes
- The network between the active and standby nodes is unstable.
- The standby DBService is abnormal.
- The disk space of the standby node is full.

Procedure

Step 1 Check whether the network between the active and standby nodes is in the normal state.

1. Log in to MRS Manager, click Alarms, click the row where the alarm is located in the alarm list, and view the IP address of the standby DBService in the alarm details.
2. Log in to the active DBService node.
3. Run the ping heartbeat IP address of the standby DBService command to check whether the standby DBService node is reachable.
– If yes, go to Step 2.1.
– If no, go to Step 1.4.
4. Contact the O&M personnel to check whether the network is faulty.
– If yes, go to Step 1.5.
– If no, go to Step 2.1.


5. Rectify the network fault and check whether the alarm is cleared from the alarm list.
– If yes, no further action is required.
– If no, go to Step 2.1.

Step 2 Check whether the standby DBService is in the normal state.

1. Log in to the standby DBService node.
2. Run the following commands to switch the user:
sudo su - root
su - omm
3. Go to the ${DBSERVER_HOME}/sbin directory and run the ./status-dbserver.sh command to check whether the GaussDB resource status of the standby DBService is in the normal state. In the command output, check whether the following information is displayed in the row where ResName is gaussDB. For example:
10_10_10_231 gaussDB Standby_normal Normal Active_standby
– If yes, go to Step 3.1.
– If no, go to Step 4.
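The gaussDB row check above can be sketched as follows. The sample line mirrors the example output, and the field positions are an assumption based on that example; real status-dbserver.sh output may contain additional rows.

```shell
# Sketch: extract the GaussDB resource state from a status-dbserver.sh
# output line and decide whether the standby is normal.
line="10_10_10_231 gaussDB Standby_normal Normal Active_standby"

# Field 2 is assumed to be the resource name and field 3 its state.
state=$(echo "$line" | awk '$2 == "gaussDB" {print $3}')

if [ "$state" = "Standby_normal" ]; then
  echo "standby normal"
else
  echo "standby abnormal: $state"
fi
# prints "standby normal"
```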

Step 3 Check whether the disk space of the standby node is insufficient.

1. Use PuTTY to log in to the standby DBService node as user root.
2. Run the su - omm command to switch to user omm.
3. Go to the ${DBSERVER_HOME} directory, and run the following commands to obtain the DBService data directory:
cd ${DBSERVER_HOME}
source .dbservice_profile
echo ${DBSERVICE_DATA_DIR}
4. Run the df -h command to check the system disk partition usage.
5. Check whether the DBService data directory space is full.
– If yes, go to Step 3.6.
– If no, go to Step 4.

6. Perform an upgrade and expand the capacity.
7. After capacity expansion, wait 2 minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 4.

Step 4 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A


5.7.55 ALM-28001 Spark Service Unavailable

Description

The system checks the Spark service status every 30 seconds. This alarm is generated when the Spark service is unavailable and is cleared when the Spark service recovers.

Attribute

Alarm ID Alarm Severity Automatically Cleared

28001 Critical Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Impact on the System

The Spark tasks submitted by users fail to be executed.

Possible Causes

Any of the following services is abnormal:

- KrbServer
- LdapServer
- ZooKeeper
- HDFS
- Yarn
- Hive

Procedure

Step 1 Check whether service unavailability alarms exist in services that Spark depends on.

1. On MRS Manager, click Alarms.
2. Check whether any of the following alarms exists in the alarm list:


a. ALM-25500 KrbServer Service Unavailable

b. ALM-25000 LdapServer Service Unavailable

c. ALM-13000 ZooKeeper Service Unavailable

d. ALM-14000 HDFS Service Unavailable

e. ALM-18000 Yarn Service Unavailable

f. ALM-16004 Hive Service Unavailable

– If yes, go to Step 1.3.

– If no, go to Step 2.

3. Handle the alarms using the troubleshooting methods provided in the alarm help.

After all the alarms are cleared, wait a few minutes and check whether this alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 2.

Step 2 Collect fault information.

1. On MRS Manager, choose System > Export Log.

2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.56 ALM-26051 Storm Service Unavailable

Description

The system checks the Storm service availability every 30 seconds. This alarm is generated if the Storm service becomes unavailable after all Nimbus nodes in a cluster become abnormal.

This alarm is cleared after the Storm service recovers.

Attribute

Alarm ID Alarm Severity Automatically Cleared

26051 Critical Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.


Parameter Description

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Impact on the System
- The cluster cannot provide the Storm service.
- Users cannot run new Storm tasks.

Possible Causes
- The Kerberos component is faulty.
- ZooKeeper is faulty or suspended.
- The active and standby Nimbus nodes in the Storm cluster are abnormal.

Procedure

Step 1 Check the Kerberos component status. For clusters without Kerberos authentication, skip this step and go to Step 2.

1. On MRS Manager, click Services.
2. Check whether the health status of the Kerberos service is Good.
– If yes, go to Step 2.1.
– If no, go to Step 1.3.

3. Rectify the fault by following instructions in ALM-25500 KrbServer Service Unavailable.

4. Perform Step 1.2 again.

Step 2 Check the ZooKeeper component status.

1. Check whether the health status of the ZooKeeper service is Good.
– If yes, go to Step 3.1.
– If no, go to Step 2.2.

2. If the ZooKeeper service is stopped, start it. For other problems, follow the instructions in ALM-13000 ZooKeeper Service Unavailable.

3. Perform Step 2.1 again.

Step 3 Check the status of the active and standby Nimbus nodes.

1. Choose Services > Storm > Nimbus.
2. In Role, check whether only one active Nimbus node exists.
– If yes, go to Step 4.1.
– If no, go to Step 3.3.

3. Select the two Nimbus instances and choose More > Restart Instance. Check whether the restart is successful.


– If yes, go to Step 3.4.
– If no, go to Step 4.1.

4. Log in to MRS Manager again and choose Services > Storm > Nimbus. Check whether the health status of Nimbus is Good.
– If yes, go to Step 3.5.
– If no, go to Step 4.1.

5. Wait 30 seconds and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 4.1.

Step 4 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.57 ALM-26052 Number of Available Supervisors in Storm Is Lower Than the Threshold

Description

The system checks the number of supervisors every 60 seconds and compares it with the threshold. This alarm is generated if the number of supervisors is lower than the threshold.

To modify the threshold, users can choose System > Threshold Configuration on MRS Manager.

This alarm is cleared if the number of supervisors is greater than or equal to the threshold.

Attribute

Alarm ID Alarm Severity Automatically Cleared

26052 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.


Parameter Description

HostName Specifies the host for which the alarm is generated.

Trigger Condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System
- Existing tasks in the cluster cannot be executed.
- The cluster can receive new Storm tasks but cannot execute them.

Possible Causes

Supervisors are abnormal in the cluster.

Procedure

Step 1 Check the supervisor status.

1. Choose Services > Storm > Supervisor.

2. In Role, check whether the cluster has supervisor instances that are in the Bad or Concerning state.

– If yes, go to Step 1.3.

– If no, go to Step 2.1.

3. Select the supervisor instances that are in the Bad or Concerning state and choose More > Restart Instance.

– If the restart is successful, go to Step 1.4.

– If the restart fails, go to Step 2.1.

4. Wait 30 seconds and check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 2.1.

Step 2 Collect fault information.

1. On MRS Manager, choose System > Export Log.

2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A


5.7.58 ALM-26053 Slot Usage of Storm Exceeds the Threshold

Description

The system checks the slot usage of Storm every 60 seconds and compares it with the threshold. This alarm is generated if the slot usage exceeds the threshold.

To modify the threshold, users can choose System > Threshold Configuration on MRS Manager.

This alarm is cleared if the slot usage is lower than or equal to the threshold.

Attribute

Alarm ID Alarm Severity Automatically Cleared

26053 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Trigger Condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

Users cannot run new Storm tasks.

Possible Causes
- Supervisors are abnormal in the cluster.
- Supervisors are normal but have poor processing capability.

Procedure

Step 1 Check the supervisor status.

1. Choose Services > Storm > Supervisor.
2. In Role, check whether the cluster has supervisor instances that are in the Bad or Concerning state.


– If yes, go to Step 1.3.
– If no, go to Step 2.1 or Step 3.1.

3. Select the supervisor instances that are in the Bad or Concerning state and choose More > Restart Instance.
– If the restart is successful, go to Step 1.4.
– If the restart fails, go to Step 4.1.

4. Wait a moment and then check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 2.1 or Step 3.1.

Step 2 Increase the number of slots for the supervisors.

1. On MRS Manager, choose Services > Storm > Supervisor > Service Configuration > Type > All.

2. Increase the value of supervisor.slots.ports to increase the number of slots for each supervisor. Then restart the instances.

3. Wait a moment and then check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 4.1.

Step 3 Expand the capacity of the supervisors.

1. Add nodes.
2. Wait a moment and then check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 4.1.

Step 4 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.59 ALM-26054 Heap Memory Usage of Storm Nimbus Exceeds the Threshold

Description

The system checks the heap memory usage of Storm Nimbus every 30 seconds and compares it with the threshold. This alarm is generated if the heap memory usage exceeds the threshold (80% by default).
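The threshold comparison can be illustrated with placeholder numbers; the used and maximum heap values below are assumptions for the sketch, not values read from a real Nimbus instance.

```shell
# Illustrative threshold check: heap usage as a percentage of the maximum
# heap, compared with the default 80% threshold. Both MB values are made up.
heap_used_mb=3400
heap_max_mb=4096

# Integer percentage: 3400 * 100 / 4096 = 83.
usage=$((heap_used_mb * 100 / heap_max_mb))

if [ "$usage" -ge 80 ]; then
  echo "heap usage ${usage}% exceeds threshold"
else
  echo "heap usage ${usage}% within threshold"
fi
# prints "heap usage 83% exceeds threshold"
```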

To modify the threshold, users can choose System > Threshold Configuration > Service > Storm on MRS Manager.

This alarm is cleared if the heap memory usage is lower than or equal to the threshold.


Attribute

Alarm ID Alarm Severity Automatically Cleared

26054 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Trigger Condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

Frequent memory garbage collection or memory overflow may occur, affecting submission of Storm services.

Possible Causes

The heap memory usage is high or the heap memory is improperly allocated.

Procedure

Step 1 Check the heap memory usage.

1. On MRS Manager, choose Alarms > ALM-26054 Heap Memory Usage of Storm Nimbus Exceeds the Threshold > Location. Query the HostName of the alarmed instance.

2. On MRS Manager, choose Services > Storm > Instance > Nimbus (corresponding to the HostName of the alarmed instance) > Customize > Heap Memory Usage of Nimbus.

3. Check whether the heap memory usage of Nimbus has reached the threshold (80%).
– If yes, go to Step 1.4.
– If no, go to Step 2.1.

4. Adjust the heap memory.
On MRS Manager, choose Services > Storm > Service Configuration > All > Nimbus > System. Increase the value of -Xmx in NIMBUS_GC_OPTS. Click Save Configuration. Select Restart the affected services or instances and click OK.


5. Check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 2.1.

Step 2 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.60 ALM-38000 Kafka Service Unavailable

Description

The system checks the Kafka service availability every 30 seconds. This alarm is generated when the Kafka service becomes unavailable.

This alarm is cleared after the Kafka service recovers.

Attribute

Alarm ID Alarm Severity Automatically Cleared

38000 Critical Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Impact on the System

The cluster cannot provide the Kafka service and users cannot run new Kafka tasks.

Possible Causes
- The KrbServer component is faulty.


- The ZooKeeper component is faulty or fails to respond.

- The Broker node in the Kafka cluster is abnormal.

Procedure

Step 1 Check the KrbServer component status. For clusters without Kerberos authentication, skip this step and go to Step 2.

1. On MRS Manager, click Services.

2. Check whether the health status of the KrbServer service is Good.

– If yes, go to Step 2.1.

– If no, go to Step 1.3.

3. Rectify the fault by following instructions in ALM-25500 KrbServer Service Unavailable.

4. Perform Step 1.2 again.

Step 2 Check the ZooKeeper component status.

1. Check whether the health status of the ZooKeeper service is Good.

– If yes, go to Step 3.1.

– If no, go to Step 2.2.

2. If the ZooKeeper service is stopped, start it. For other problems, follow the instructions in ALM-13000 ZooKeeper Service Unavailable.

3. Perform Step 2.1 again.

Step 3 Check the Broker status.

1. Choose Services > Kafka > Broker.

2. In Role, check whether all instances are normal.

– If yes, go to Step 3.4.

– If no, go to Step 3.3.

3. Select all instances of Broker and choose More > Restart Instance.

– If the restart is successful, go to Step 3.4.

– If the restart fails, go to Step 4.1.

4. Choose Services > Kafka. Check whether the health status of Kafka is Good.

– If yes, go to Step 3.5.

– If no, go to Step 4.1.

5. Wait 30 seconds and check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 4.1.

Step 4 Collect fault information.

1. On MRS Manager, choose System > Export Log.

2. Contact the O&M personnel and send the collected log information.

----End


Related Information

N/A

5.7.61 ALM-38001 Insufficient Kafka Disk Space

Description

The system checks the Kafka disk usage every 60 seconds and compares it with the threshold. This alarm is generated when the disk usage exceeds the threshold.

To modify the threshold, users can choose System > Threshold Configuration on MRS Manager.

This alarm is cleared when the Kafka disk usage is lower than or equal to the threshold.

Attribute

Alarm ID Alarm Severity Automatically Cleared

38001 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

PartitionName Specifies the disk partition where the alarm is generated.

Trigger Condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

Kafka fails to write data to the disks.

Possible Causes
- The Kafka disk configurations (such as disk count and disk size) are insufficient for the data volume.
- The data retention period is long and historical data occupies a large space.


- Services are improperly planned. As a result, data is unevenly distributed and some disks are full.

Procedure

1. Log in to MRS Manager and click Alarms.
2. In the alarm list, click the alarm and view the HostName and PartitionName of the alarm in Location of Alarm Details.
3. In Hosts, click the host obtained in 2.
4. Check whether the Disk area contains the PartitionName of the alarm.

– If yes, go to 5.
– If no, manually clear the alarm and no further action is required.

5. In the Disk area, check whether the usage of the alarmed partition has reached 100%.
– If yes, go to 6.
– If no, go to 8.

6. In Instance, choose Broker > Instance Configuration. On the Instance Configuration page that is displayed, set Type to All and query the data directory parameter log.dirs.

7. Choose Services > Kafka > Instance. On the Kafka Instance page that is displayed, stop the Broker instance corresponding to that in 2. Then log in to the alarm node and manually delete the data directory queried in 6. After all subsequent operations are complete, start the Broker instance.

8. Choose Services > Kafka > Service Configuration. The Kafka Configuration page is displayed.

9. Check whether disk.adapter.enable is true.
– If yes, go to 11.
– If no, change the value to true and go to 10.

10. Check whether the adapter.topic.min.retention.hours parameter, indicating the minimum data retention period, is properly configured.
– If yes, go to 11.
– If no, set it to a proper value and go to 11.

NOTE

If the retention period cannot be adjusted for certain topics, the topics can be added to disk.adapter.topic.blacklist.

11. Wait 10 minutes and check whether the disk usage is reduced.
– If yes, wait until the alarm is cleared.
– If no, go to 12.

12. Go to the Kafka Topic Monitor page and query the data retention period configured for Kafka. Determine whether the retention period needs to be shortened based on service requirements and data volume.
– If yes, go to 13.
– If no, go to 14.

13. Find the topics with large data volumes based on the disk partition obtained in 2. Log in to the Kafka client and manually shorten the data retention period for these topics using the following command:


kafka-topics.sh --zookeeper <ZooKeeper address>:24002/kafka --alter --topic <Topic name> --config retention.ms=<Retention period>
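Note that retention.ms takes a value in milliseconds. As a sketch (the topic name and ZooKeeper address in the comment are illustrative placeholders, not values from this cluster), a small helper avoids conversion mistakes:

```shell
# retention.ms takes milliseconds; convert a retention period given in hours.
hours_to_retention_ms() {
  echo $(( $1 * 60 * 60 * 1000 ))
}

# Example: 72 hours -> 259200000 ms.
hours_to_retention_ms 72
# kafka-topics.sh --zookeeper zk1:24002/kafka --alter --topic topicA \
#   --config retention.ms=$(hours_to_retention_ms 72)
```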

14. Check whether partitions are properly configured for topics. For example, if the number of partitions for a topic with a large data volume is smaller than the number of disks, data may be unevenly distributed to the disks and the usage of some disks will reach the upper limit.

NOTE

To identify topics with large data volumes, log in to the relevant nodes that are obtained in 2, go to the data directory (the directory before log.dirs in 6 is modified), and check the disk space occupied by the partitions of the topics.
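The check in this note can be scripted as a sketch: du reports per-partition usage and sort -h ranks it. The data directory path in the comment is an assumed example of a log.dirs value, not necessarily the one on your broker.

```shell
# Rank "size<TAB>name" lines (the format produced by `du -sh <dir>/*`)
# from largest to smallest and keep the top N entries.
top_usage() {
  sort -rh | head -n "${1:-10}"
}

# On a broker node (hypothetical path):
#   du -sh /srv/BigData/kafka/data1/* | top_usage 10
# Demonstration with canned du-style output:
printf '512M\ttopicA-0\n3.0G\ttopicB-0\n1.5G\ttopicA-1\n' | top_usage 2
```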

– If the partitions are improperly configured, go to 15.
– If the partitions are properly configured, go to 16.

15. On the Kafka client, add partitions to the topics:
kafka-topics.sh --zookeeper <ZooKeeper address>:24002/kafka --alter --topic <Topic name> --partitions=<Number of new partitions>

NOTE

It is advised to set the number of new partitions to a multiple of the number of Kafka disks.

This operation may not quickly clear the alarm. Data will be gradually balanced among the disks.
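Following the advice above, the new partition count can be rounded up to a multiple of the number of Kafka data disks. A sketch (the disk count of 4 in the example call is an assumption for illustration):

```shell
# Round a desired partition count up to the nearest multiple of the disk count,
# so partitions can spread evenly across the broker's data disks.
round_up_to_disks() {
  local want=$1 disks=$2
  echo $(( (want + disks - 1) / disks * disks ))
}

round_up_to_disks 10 4   # 12: the smallest multiple of 4 that is >= 10
```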

16. Check whether the cluster capacity needs to be expanded.
– If yes, add nodes to the cluster and go to 17.
– If no, go to 17.

17. Wait a moment and then check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to 18.

18. Contact the O&M personnel and send the collected log information.

Related Information

N/A

5.7.62 ALM-38002 Heap Memory Usage of Kafka Exceeds the Threshold

Description

The system checks the heap memory usage of Kafka every 30 seconds. This alarm is generated when the heap memory usage of Kafka exceeds the threshold (80%).

This alarm is cleared when the heap memory usage is lower than the threshold.

Attribute

Alarm ID Alarm Severity Automatically Cleared

38002 Major Yes


Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Trigger Condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

Memory overflow may occur, causing service crashes.

Possible Causes

The heap memory usage is high or the heap memory is improperly allocated.

Procedure

Step 1 Check the heap memory usage.

1. On MRS Manager, choose Alarms > ALM-38002 Heap Memory Usage of Kafka Exceeds the Threshold > Location. Query the IP address of the alarmed instance.

2. On MRS Manager, choose Services > Kafka > Instance > Broker (corresponding to the IP address of the alarmed instance) > Customize > Kafka Heap Memory Resource Percentage.

3. Check whether the heap memory usage of Kafka has reached the threshold (80%).

– If yes, go to Step 1.4.

– If no, go to Step 2.

4. On MRS Manager, choose Services > Kafka > Service Configuration > All > Broker > Environment. Increase the value of KAFKA_HEAP_OPTS as required.
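KAFKA_HEAP_OPTS holds standard JVM heap flags. A sketch of a possible value (the 4 GB figure is an assumption for illustration; size it to the broker's actual load, and keeping -Xms equal to -Xmx avoids heap-resize pauses):

```shell
# Illustrative only: adjust the sizes to the broker's memory and load.
export KAFKA_HEAP_OPTS="-Xmx4G -Xms4G"
```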

5. Check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 2.

Step 2 Collect fault information.

1. On MRS Manager, choose System > Export Log.

2. Contact the O&M personnel and send the collected log information.

----End


Related Information

N/A

5.7.63 ALM-24000 Flume Service Unavailable

Description

The alarm module checks the Flume service status every 180 seconds. This alarm is generated when the Flume service is abnormal.

This alarm is cleared when the Flume service recovers.

Attribute

Alarm ID Alarm Severity Automatically Cleared

24000 Critical Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Impact on the System

Flume cannot work and data transmission is interrupted.

Possible Causes

- HDFS is unavailable.
- LdapServer is unavailable.

Procedure

Step 1 Check the HDFS status.

On MRS Manager, check whether alarm ALM-14000 HDFS Service Unavailable is reported.

- If yes, clear the alarm according to the handling suggestions of "ALM-14000 HDFS Service Unavailable".
- If no, go to Step 2.


Step 2 Check the LdapServer status.

On MRS Manager, check whether alarm ALM-25000 LdapServer Service Unavailable is reported.

- If yes, clear the alarm according to the handling suggestions of "ALM-25000 LdapServer Service Unavailable".
- If no, go to Step 3.1.

Step 3 Check whether the HDFS and LdapServer services are stopped.

1. In the service list on MRS Manager, check whether the HDFS and LdapServer services are stopped.
– If yes, start the HDFS and LdapServer services and go to Step 3.2.
– If no, go to Step 4.1.

2. Check whether the "ALM-24000 Flume Service Unavailable" alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 4.1.

Step 4 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.64 ALM-24001 Flume Agent Is Abnormal

Description

This alarm is generated when the Flume agent monitoring module detects that the Flume agent process is abnormal.

This alarm is cleared when the Flume agent process recovers.

Attribute

Alarm ID Alarm Severity Automatically Cleared

24001 Minor Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.


RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Impact on the System

Functions of the alarmed Flume agent instance are abnormal. Data transmission tasks of the instance are suspended. In real-time data transmission, data will be lost.

Possible Causes

- The JAVA_HOME directory does not exist or the Java permission is incorrect.
- The permission of the Flume agent directory is incorrect.

Procedure

Step 1 Check the Flume agent's configuration file.

1. Log in to the host where the faulty node resides. Run the following command to switch to user root:
sudo su - root

2. Run the cd <Flume installation directory>/fusioninsight-flume-1.6.0/conf/ command to go to Flume's configuration directory.

3. Run the cat ENV_VARS command. Check whether the JAVA_HOME directory exists and whether the Flume agent user has execute permission on Java.
– If yes, go to Step 2.1.
– If no, go to Step 1.4.

4. Specify the correct JAVA_HOME directory and grant the Flume agent user the execute permission on Java. Then go to Step 2.4.
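The checks in steps 3 and 4 can be scripted as a sketch. The stub directory below exists only so the example runs anywhere; on a real node the path would come from the JAVA_HOME value in ENV_VARS:

```shell
# Report whether a JAVA_HOME directory exists and bin/java is executable,
# mirroring the manual checks in Step 1.
check_java_home() {
  local jh="$1"
  [ -d "$jh" ]          || { echo "missing";        return 1; }
  [ -x "$jh/bin/java" ] || { echo "not-executable"; return 1; }
  echo "ok"
}

# Demonstration against a throwaway stub (not a real JDK):
stub=$(mktemp -d)
mkdir -p "$stub/bin" && touch "$stub/bin/java" && chmod +x "$stub/bin/java"
check_java_home "$stub"
```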

Step 2 Check the permission of the Flume agent directory.

1. Log in to the host where the faulty node resides. Run the following command to switch to user root:
sudo su - root

2. Run the following command to access the installation directory of the Flume agent:
cd <Flume agent installation directory>

3. Run the ls -al * -R command. Check whether the owner of all files is the Flume agent user.
– If yes, go to Step 3.1.
– If no, run the chown command and change the owner of the files to the Flume agent user. Then go to Step 2.4.
4. Check whether the alarm is cleared.

– If yes, no further action is required.


– If no, go to Step 3.1.
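The ownership check in Step 2.3 can be automated as a sketch; the directory path and the user name "flume" in the comment are assumptions for illustration, not values taken from this guide:

```shell
# List every file under a directory that is NOT owned by the expected user;
# empty output means the ownership check passes.
find_wrong_owner() {
  find "$1" ! -user "$2"
}

# Demonstration: a fresh temp directory is owned by the current user,
# so checking against the current user prints nothing.
d=$(mktemp -d); touch "$d/agent.properties"
find_wrong_owner "$d" "$(id -un)"
# On a real node (hypothetical path and user):
#   find_wrong_owner /opt/FlumeClient flume | xargs -r chown flume:flume
```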

Step 3 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.65 ALM-24003 Flume Client Connection Failure

Description

The alarm module monitors the port connection status on the Flume server. This alarm is generated when the Flume server fails to receive a connection message from the Flume client for 3 consecutive minutes.

This alarm is cleared when the Flume server receives a connection message from the Flume client.

Attribute

Alarm ID Alarm Severity Automatically Cleared

24003 Major Yes

Parameters

Parameter Description

ClientIP Specifies the IP address of the Flume client.

ServerIP Specifies the IP address of the Flume server.

ServerPort Specifies the port on the Flume server.

Impact on the System

The communication between the Flume client and server fails. The Flume client cannot send data to the Flume server.

Possible Causes

- The network between the Flume client and server is faulty.
- The Flume client's process is abnormal.


- The Flume client is incorrectly configured.

Procedure

Step 1 Check the network between the Flume client and server.

1. Log in to the host where the alarmed Flume client resides. Run the following command to switch to user root:
sudo su - root

2. Run the ping <Flume server IP address> command to check whether the network between the Flume client and server is normal.

– If yes, go to Step 2.1.

– If no, go to Step 4.1.

Step 2 Check whether the Flume client's process is normal.

1. Log in to the host where the alarmed Flume client resides. Run the following command to switch to user root:
sudo su - root

2. Run the ps -ef | grep flume | grep client command to check whether the Flume client process exists.

– If yes, go to Step 3.1.

– If no, go to Step 4.1.

Step 3 Check the Flume client configuration.

1. Log in to the host where the alarmed Flume client resides. Run the following command to switch to user root:
sudo su - root

2. Run the cd <Flume installation directory>/fusioninsight-flume-1.6.0/conf/ command to go to Flume's configuration directory.

3. Run the cat properties.properties command to query the current configuration file of the Flume client.

4. Check whether the properties.properties file is correctly configured according to the configuration description of the Flume agent.

– If yes, go to Step 3.5.

– If no, go to Step 4.1.

5. Modify the properties.properties configuration file.

6. Check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 4.1.

Step 4 Collect fault information.

1. On MRS Manager, choose System > Export Log.

2. Contact the O&M personnel and send the collected log information.

----End


Related Information

N/A

5.7.66 ALM-24004 Flume Fails to Read Data

Description

The alarm module monitors the Flume source status. This alarm is generated when the duration for which the Flume source fails to read data exceeds the threshold.

Users can modify the threshold as required.

This alarm is cleared when the source reads data successfully.

Attribute

Alarm ID Alarm Severity Automatically Cleared

24004 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

ComponentType Specifies the component type for which the alarm is generated.

ComponentName Specifies the component name for which the alarm is generated.

Impact on the System

Data collection is stopped.

Possible Causes

- The Flume source is faulty.
- The network is faulty.

Procedure

Step 1 Check whether the Flume source is normal.


1. Check whether the Flume source is the spoolDir type.
– If yes, go to Step 1.2.
– If no, go to Step 1.3.

2. Query the spoolDir directory and check whether all files have been sent.
– If yes, no further action is required.
– If no, go to Step 1.5.

3. Check whether the Flume source is the Kafka type.
– If yes, go to Step 1.4.
– If no, go to Step 1.5.

4. Log in to the Kafka client and run the following commands to check whether all topic data configured for the Kafka source has been consumed:
cd /opt/client/Kafka/kafka/bin
./kafka-consumer-groups.sh --bootstrap-server <Kafka cluster IP address>:21007 --new-consumer --describe --group example-group1 --command-config ../config/consumer.properties
– If yes, no further action is required.
– If no, go to Step 1.5.

5. On MRS Manager, choose Services > Flume > Instance.
6. Click the Flume instance of the faulty node and check whether the value of the Source Speed Metrics is 0.
– If yes, go to Step 2.1.
– If no, no further action is required.

Step 2 Check the status of the network between the Flume source and faulty node.

1. Check whether the Flume source is the avro type.
– If yes, go to Step 2.3.
– If no, go to Step 3.1.

2. Log in to the host where the faulty node resides. Run the following command to switch to user root:
sudo su - root

3. Run the ping <Flume source IP address> command to check whether the Flume source can be pinged.
– If yes, go to Step 3.1.
– If no, go to Step 2.4.

4. Contact the network administrator to repair the network.
5. Wait for a while and check whether the alarm is cleared.

– If yes, no further action is required.
– If no, go to Step 3.1.

Step 3 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End


Related Information

N/A

5.7.67 ALM-24005 Data Transmission by Flume Is Abnormal

Description

The alarm module monitors the capacity of Flume channels. This alarm is generated when the duration for which a channel is full, or the number of times that a source fails to send data to the channel, exceeds the threshold.

Users can set the threshold as required by modifying the channelfullcount parameter.

This alarm is cleared when the channel space is released.

Attribute

Alarm ID Alarm Severity Automatically Cleared

24005 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

ComponentType Specifies the component type for which the alarm is generated.

ComponentName Specifies the component name for which the alarm is generated.

Impact on the System

If the usage of the Flume channel continues to grow, the data transmission time increases. When the usage reaches 100%, the Flume agent process is suspended.

Possible Causes

- The Flume sink is faulty.
- The network is faulty.


Procedure

Step 1 Check whether the Flume sink is normal.

1. Check whether the Flume sink is the HDFS type.
– If yes, go to Step 1.2.
– If no, go to Step 1.3.

2. On MRS Manager, check whether alarm ALM-14000 HDFS Service Unavailable is reported and whether the HDFS service is stopped.
– If the alarm is reported, clear it according to the handling suggestions of ALM-14000 HDFS Service Unavailable; if the HDFS service is stopped, start it. Then go to Step 1.7.
– If the alarm is not reported and the HDFS service is running properly, go to Step 1.7.

3. Check whether the Flume sink is the HBase type.
– If yes, go to Step 1.4.
– If no, go to Step 1.7.

4. On MRS Manager, check whether alarm ALM-19000 HBase Service Unavailable is reported and whether the HBase service is stopped.
– If the alarm is reported, clear it according to the handling suggestions of ALM-19000 HBase Service Unavailable; if the HBase service is stopped, start it. Then go to Step 1.7.
– If the alarm is not reported and the HBase service is running properly, go to Step 1.7.

5. Check whether the Flume sink is the Kafka type.
– If yes, go to Step 1.6.
– If no, go to Step 1.7.

6. On MRS Manager, check whether alarm ALM-38000 Kafka Service Unavailable is reported and whether the Kafka service is stopped.
– If the alarm is reported, clear it according to the handling suggestions of ALM-38000 Kafka Service Unavailable; if the Kafka service is stopped, start it. Then go to Step 1.7.
– If the alarm is not reported and the Kafka service is running properly, go to Step 1.7.

7. On MRS Manager, choose Services > Flume > Instance.
8. Click the Flume instance of the faulty node and check whether the value of the Sink Speed Metrics is 0.
– If yes, go to Step 2.1.
– If no, no further action is required.

Step 2 Check the status of the network between the Flume sink and faulty node.

1. Check whether the Flume sink is the avro type.
– If yes, go to Step 2.3.
– If no, go to Step 3.1.

2. Log in to the host where the faulty node resides. Run the following command to switch to user root:


sudo su - root
3. Run the ping <Flume sink IP address> command to check whether the Flume sink can be pinged.
– If yes, go to Step 3.1.
– If no, go to Step 2.4.

4. Contact the network administrator to repair the network.
5. Wait for a while and check whether the alarm is cleared.

– If yes, no further action is required.
– If no, go to Step 3.1.

Step 3 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.68 ALM-12041 Permission of Key Files Is Abnormal

Description

The system checks the permission, users, and user groups of key directories or files every hour. This alarm is generated when any of these is abnormal.

This alarm is cleared when the problem that causes abnormal permission, users, or user groups is solved.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12041 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.


PathName Specifies the file path or file name.

Impact on the System

System functions are unavailable.

Possible Causes

The user has manually modified the file permission, user information, or user groups, or the system has experienced an unexpected power-off.

Procedure

Step 1 Check the file permission.

1. On MRS Manager, click Alarms.
2. In the details of the alarm, query the HostName (name of the alarmed host) and PathName (path or name of the involved file).
3. Log in to the alarm node.
4. Run the ll <PathName> command to query the current user, permission, and user group of the file or path.
5. Go to the ${BIGDATA_HOME}/nodeagent/etc/agent/autocheck directory and run the vi keyfile command. Search for the name of the involved file and query the correct permission of the file.
6. Compare the actual permission of the file with the permission obtained in Step 1.5. If they are different, change the actual permission, user information, and user group to the correct values.

7. Wait until the next system check is complete and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 2.

Step 2 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.69 ALM-12042 Key File Configurations Are Abnormal

Description

The system checks key file configurations every hour. This alarm is generated when any key configuration is abnormal.


This alarm is cleared when the configuration becomes normal.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12042 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

PathName Specifies the file path or file name.

Impact on the System

Functions related to the file are abnormal.

Possible Causes

The user has manually modified the file configurations or the system has experienced an unexpected power-off.

Procedure

Step 1 Check the file configurations.

1. On MRS Manager, click Alarms.
2. In the details of the alarm, query the HostName (name of the alarmed host) and PathName (path or name of the involved file).
3. Log in to the alarm node.
4. Manually check and modify the file configurations according to the criteria in Related Information.
5. Wait until the next system check is complete and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 2.

Step 2 Collect fault information.

1. On MRS Manager, choose System > Export Log.


2. Contact the O&M personnel and send the collected log information.

----End

Related Information

- Checking /etc/fstab

Check whether partitions configured in /etc/fstab exist in /proc/mounts and whether swap partitions configured in /etc/fstab match those in /proc/swaps.

- Checking /etc/hosts
Run the cat /etc/hosts command. If any of the following situations exists, the file configurations are abnormal:
– The /etc/hosts file does not exist.
– The host name is not configured in the file.
– The IP address of the host is duplicate.
– The IP address of the host does not exist in the ifconfig list.
– An IP address in the file is used by multiple hosts.
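Two of the /etc/hosts checks above (a duplicated IP address, and one IP used by multiple host entries) reduce to the same scan. A sketch:

```shell
# Print any IP address that appears on more than one non-comment line of a
# hosts-format file; empty output means no duplicates.
dup_ips() {
  awk '!/^[[:space:]]*#/ && NF { print $1 }' "$1" | sort | uniq -d
}

# Demonstration with a throwaway hosts file containing a duplicated IP:
h=$(mktemp)
printf '10.0.0.1 node1\n10.0.0.2 node2\n10.0.0.1 node1-alias\n' > "$h"
dup_ips "$h"
```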

5.7.70 ALM-23001 Loader Service Unavailable

Description

The system checks the Loader service availability every 60 seconds. This alarm is generated when the Loader service is unavailable and is cleared when the Loader service recovers.

Attribute

Alarm ID Alarm Severity Automatically Cleared

23001 Critical Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Impact on the System

Data loading, import, and conversion are unavailable.


Possible Causes

- The services that Loader depends on are abnormal.
– ZooKeeper is abnormal.
– HDFS is abnormal.
– DBService is abnormal.
– Yarn is abnormal.
– MapReduce is abnormal.
- The network is faulty. Loader cannot communicate with its dependent services.
- Loader is running improperly.

Procedure

Step 1 Check the ZooKeeper status.

1. On MRS Manager, choose Services > ZooKeeper. Check whether the health status of ZooKeeper is normal.
– If yes, go to Step 1.3.
– If no, go to Step 1.2.

2. Choose More > Restart Service to restart ZooKeeper. After ZooKeeper starts, check whether alarm ALM-23001 Loader Service Unavailable is cleared.
– If yes, no further action is required.
– If no, go to Step 1.3.

3. On MRS Manager, check whether alarm ALM-12007 Process Fault is reported.
– If yes, go to Step 1.4.
– If no, go to Step 2.1.

4. In Alarm Details of alarm ALM-12007 Process Fault, check whether ServiceName is ZooKeeper.
– If yes, go to Step 1.5.
– If no, go to Step 2.1.

5. Clear the alarm according to the handling suggestions of ALM-12007 Process Fault.
6. Check whether alarm ALM-23001 Loader Service Unavailable is cleared.
– If yes, no further action is required.
– If no, go to Step 2.1.

Step 2 Check the HDFS status.

1. On MRS Manager, check whether alarm ALM-14000 HDFS Service Unavailable is reported.
– If yes, go to Step 2.2.
– If no, go to Step 3.1.

2. Clear the alarm according to the handling suggestions of ALM-14000 HDFS Service Unavailable.

3. Check whether alarm ALM-23001 Loader Service Unavailable is cleared.
– If yes, no further action is required.
– If no, go to Step 3.1.


Step 3 Check the DBService status.

1. On MRS Manager, choose Services > DBService. Check whether the health status of DBService is normal.
– If yes, go to Step 4.1.
– If no, go to Step 3.2.

2. Choose More > Restart Service to restart DBService. After DBService starts, check whether alarm ALM-23001 Loader Service Unavailable is cleared.
– If yes, no further action is required.
– If no, go to Step 4.1.

Step 4 Check the MapReduce status.

1. On MRS Manager, choose Services > MapReduce. Check whether the health status of MapReduce is normal.
– If yes, go to Step 5.1.
– If no, go to Step 4.2.

2. Choose More > Restart Service to restart MapReduce. After MapReduce starts, check whether alarm ALM-23001 Loader Service Unavailable is cleared.
– If yes, no further action is required.
– If no, go to Step 5.1.

Step 5 Check the Yarn status.

1. On MRS Manager, choose Services > Yarn. Check whether the health status of Yarn is normal.
– If yes, go to Step 5.3.
– If no, go to Step 5.2.

2. Choose More > Restart Service to restart Yarn. After Yarn starts, check whether alarm ALM-23001 Loader Service Unavailable is cleared.
– If yes, no further action is required.
– If no, go to Step 5.3.

3. On MRS Manager, check whether alarm ALM-18000 Yarn Service Unavailable is reported.
– If yes, go to Step 5.4.
– If no, go to Step 6.1.

4. Clear the alarm according to the handling suggestions of ALM-18000 Yarn Service Unavailable.

5. Check whether alarm ALM-23001 Loader Service Unavailable is cleared.
– If yes, no further action is required.
– If no, go to Step 6.1.

Step 6 Check the network connections between Loader and its dependent components.

1. On MRS Manager, choose Services > Loader.
2. Click Instance. The Sqoop instance list is displayed.
3. Record the management IP addresses of all Sqoop instances.
4. Log in to the hosts using the IP addresses obtained in Step 6.3. Run the following commands to switch the user:


sudo su - root
su - omm

5. Run the ping command to check whether the network connection between the hosts where the Sqoop instances reside and the dependent components is normal. (The dependent components include ZooKeeper, DBService, HDFS, MapReduce, and Yarn. The method to obtain the IP addresses of the dependent components is the same as that used to obtain the IP addresses of the Sqoop instances.)
– If yes, go to Step 7.1.
– If no, go to Step 6.6.

6. Contact the network administrator to repair the network.
7. Check whether alarm ALM-23001 Loader Service Unavailable is cleared.

– If yes, no further action is required.
– If no, go to Step 7.1.

Step 7 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Contact the O&M personnel and send the collected log information.

----End

5.7.71 ALM-12357 Failed to Export Audit Logs to the OBS

Description
If the user has configured audit log export to the OBS on MRS Manager, the system regularly exports audit logs to the OBS. This alarm is generated when the system fails to access the OBS.

This alarm is cleared when the system exports audit logs to the OBS successfully.

Attribute
Alarm ID Alarm Severity Automatically Cleared

12357 Major Yes

Parameters
Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.


Impact on the System

The local system saves a maximum of seven compressed service audit log files. If this alarm persists, local service audit logs may be lost.

The local system saves a maximum of 50 management audit log files (each file contains 100,000 records). If this alarm persists, local management audit logs may be lost.

Possible Causes
l Connection to the OBS server fails.

l The specified OBS bucket does not exist.

l The user AK/SK information is invalid.

l The local OBS configuration cannot be obtained.

Procedure

Step 1 Log in to the OBS server and check whether the OBS server can be properly accessed.

l If yes, go to Step 3.

l If no, go to Step 2.

Step 2 Contact the maintenance personnel to repair the OBS. Then check whether the alarm is cleared.

l If yes, no further action is required.

l If no, go to Step 3.

Step 3 On MRS Manager, choose System > Export Audit Log. Check whether the AK/SK information, bucket name, and path are correct.

l If yes, go to Step 5.

l If no, go to Step 4.

Step 4 Correct the information. Then check whether the alarm is cleared when the export task is executed again.

NOTE

To check alarm clearance quickly, you can set the start time of audit log collection to 10 or 30 minutes later than the current time. After checking the result, restore the original start time.

l If yes, no further action is required.

l If no, go to Step 5.

Step 5 Collect fault information.

1. On MRS Manager, choose System > Export Log.

2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A


5.7.72 ALM-12014 Partition Lost

Description

The system checks the partition status periodically. This alarm is generated when the system detects that a partition to which service directories are mounted is lost (because the device is removed or goes offline, or the partition is deleted).

This alarm must be manually cleared.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12014 Major No

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

DirName Specifies the directory for which the alarm is generated.

PartitionName Specifies the device partition for which the alarm is generated.

Impact on the System

Service data fails to be written into the partition, and the service system runs abnormally.

Possible Causes
l The hard disk is removed.

l The hard disk is offline, or a bad sector exists on the hard disk.

Procedure

Step 1 On MRS Manager, click Alarms, and click the alarm in the real-time alarm list.

Step 2 In the Alarm Details area, obtain HostName, PartitionName, and DirName from Location.


Step 3 Check whether the disk of PartitionName on HostName is inserted into the correct server slot.

l If yes, go to Step 4.

l If no, go to Step 5.

Step 4 Contact hardware engineers to remove the faulty disk.

Step 5 Use PuTTY to log in to the HostName node where the alarm is reported and check whether there is a line containing DirName in the /etc/fstab file.

l If yes, go to Step 6.

l If no, go to Step 7.

Step 6 Run the vi /etc/fstab command to edit the file and delete the line containing DirName.
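The edit in Step 6 can be scripted instead of done interactively in vi. A minimal sketch, shown against a sample file rather than the live /etc/fstab (the /srv/BigData/data1 mount point is a hypothetical DirName value; substitute the directory from the alarm details):

```shell
# Sketch: drop the fstab line whose mount-point field matches DIR_NAME.
# DIR_NAME is a placeholder; use the DirName value from the alarm details.
DIR_NAME="/srv/BigData/data1"

# Sample fstab content for illustration; in practice operate on a backup
# copy of /etc/fstab as root.
cat > /tmp/fstab.sample <<'EOF'
/dev/sda1 / ext4 defaults 1 1
/dev/sdb1 /srv/BigData/data1 ext4 defaults 0 0
/dev/sdc1 /srv/BigData/data2 ext4 defaults 0 0
EOF

# Keep every line whose second field (the mount point) differs from DIR_NAME.
awk -v dir="$DIR_NAME" '$2 != dir' /tmp/fstab.sample > /tmp/fstab.new

cat /tmp/fstab.new
```

Review the result before replacing /etc/fstab with it; a wrong fstab can prevent the node from booting.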

Step 7 Contact hardware engineers to insert a new disk. For details, see the hardware product document of the relevant model. If the faulty disk is in a RAID group, configure the RAID group. For details, see the configuration methods of the relevant RAID controller card.

Step 8 Wait 20 to 30 minutes (the waiting time depends on the disk size), and then run the mount command to check whether the disk has been mounted to the DirName directory.

l If yes, manually clear the alarm. No further operation is required.

l If no, go to Step 9.

Step 9 Collect fault information.

1. On MRS Manager, choose System > Export Log.

2. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.73 ALM-12015 Partition Filesystem Readonly

Description

The system checks the partition status periodically. This alarm is generated when the system detects that a partition to which service directories are mounted enters the read-only mode (due to a bad sector or a faulty file system).

This alarm is cleared when the system detects that the partition to which service directories are mounted exits from the read-only mode (because the file system is restored to read/write mode, the device is removed, or the device is formatted).

Attribute

Alarm ID Alarm Severity Automatically Cleared

12015 Major Yes


Parameters
Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

DirName Specifies the directory for which the alarm is generated.

PartitionName Specifies the device partition for which the alarm is generated.

Impact on the System
Service data fails to be written into the partition, and the service system runs abnormally.

Possible Causes
The hard disk is faulty. For example, a bad sector exists.

Procedure

Step 1 On MRS Manager, click the alarm in the real-time alarm list.

Step 2 In the Alarm Details area, obtain HostName and PartitionName from Location. HostName is the node where the alarm is reported, and PartitionName is the partition of the faulty disk.
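Before involving hardware engineers, the read-only state of the partition can be confirmed from /proc/mounts, where the first mount option is ro or rw. A minimal sketch against sample content (the device and mount-point names are illustrative; on the alarm node read /proc/mounts itself):

```shell
# Sketch: list mount points whose option list starts with "ro" (read-only).
# Sample /proc/mounts content for illustration only.
cat > /tmp/mounts.sample <<'EOF'
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdb1 /srv/BigData/data1 ext4 ro,relatime 0 0
EOF

# Field 2 is the mount point; field 4 is the comma-separated option list.
awk '$4 ~ /^ro(,|$)/ {print $2 " is mounted read-only"}' /tmp/mounts.sample
```

Any mount point reported by this check matches the partition the alarm refers to only if its device corresponds to PartitionName.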

Step 3 Contact hardware engineers to check whether the disk is faulty. If the disk is faulty, remove it from the server.

Step 4 After the disk is removed, alarm ALM-12014 Partition Lost is reported. Handle the alarm. For details, see ALM-12014 Partition Lost. After alarm ALM-12014 Partition Lost is cleared, alarm ALM-12015 Partition Filesystem Readonly is automatically cleared.

----End

Related Information
N/A

5.7.74 ALM-12043 DNS Resolution Duration Exceeds the Threshold

Description
The system checks the DNS resolution duration every 30 seconds and compares the actual DNS resolution duration with the threshold (the default threshold is 20,000 ms). This alarm is


generated when the system detects that the DNS resolution duration exceeds the threshold for several times (2 times by default) consecutively.

To change the threshold, choose System > Threshold Configuration > Device > Host > Network Status > DNS Name Resolution Duration > DNS Name Resolution Duration.

When the hit number is 1, this alarm is cleared when the DNS resolution duration is less than or equal to the threshold. When the hit number is not 1, this alarm is cleared when the DNS resolution duration is less than or equal to 90% of the threshold.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12043 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Impact on the System
l Kerberos-based secondary authentication is slow.
l The ZooKeeper service is abnormal.
l The node is faulty.

Possible Causes
l The node is configured with the DNS client.
l The node is equipped with the DNS server, and the DNS server is started.

Procedure

Check whether the node is configured with the DNS client.

Step 1 On MRS Manager, click Alarms.

Step 2 Check the value of HostName in the detailed alarm information to obtain the name of the host involved in this alarm.

Step 3 Use PuTTY to log in to the node for which the alarm is generated as user root.


Step 4 Run the cat /etc/resolv.conf command to check whether the DNS client is installed.

If information similar to the following is displayed, the DNS client is installed and started:

nameserver 10.2.3.4
nameserver 10.2.3.4

l If yes, go to Step 5.
l If no, go to Step 7.

Step 5 Run the vi /etc/resolv.conf command to comment out the following content using number signs (#) and save the file:
# nameserver 10.2.3.4
# nameserver 10.2.3.4
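The same edit can be made non-interactively with sed. A minimal sketch, run here against a sample copy rather than the real /etc/resolv.conf (back up the original before changing it):

```shell
# Sketch: comment out every "nameserver" entry by prefixing "# ".
# Sample file for illustration; on the alarm node edit /etc/resolv.conf as root.
cat > /tmp/resolv.sample <<'EOF'
search example.com
nameserver 10.2.3.4
nameserver 10.2.3.4
EOF

sed -i 's/^nameserver/# nameserver/' /tmp/resolv.sample

cat /tmp/resolv.sample
```

Note that some distributions regenerate /etc/resolv.conf automatically, in which case a manual edit may be overwritten.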

Step 6 Check whether this alarm is cleared after 5 minutes.
l If yes, no further action is required.
l If no, go to Step 7.

Check whether the node is equipped with the DNS server and the DNS server is started.

Step 7 Run the service named status command to check whether the DNS server is installed on the node:

If information similar to the following is displayed, the DNS server is installed and started:

Checking for nameserver BIND
version: 9.6-ESV-R7-P4
CPUs found: 8
worker threads: 8
number of zones: 17
debug level: 0
xfers running: 0
xfers deferred: 0
soa queries in progress: 0
query logging is ON
recursive clients: 4/0/1000
tcp clients: 0/100
server is up and running

l If yes, go to Step 8.
l If no, go to Step 10.

Step 8 Run the service named stop command to stop the DNS service.

Step 9 Check whether this alarm is cleared after 5 minutes.
l If yes, no further action is required.
l If no, go to Step 10.
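When many nodes must be checked, the decision in Step 7 can be automated by matching the "server is up and running" line of the service named status output. A minimal sketch against canned output (the wording is assumed to match your BIND version; verify it on one node first):

```shell
# Sketch: decide from "service named status" output whether the DNS server
# is running. Canned output for illustration; in practice capture the real
# command's output.
status_output="Checking for nameserver BIND
server is up and running"

if printf '%s\n' "$status_output" | grep -q 'server is up and running'; then
    dns_state="running"
else
    dns_state="not running"
fi
echo "DNS server state: $dns_state"
```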

Collect fault information.

Step 10 On MRS Manager, choose System > Export Log.

Step 11 Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A


5.7.75 ALM-12045 Network Read Packet Dropped Rate Exceeds the Threshold

Description
The system checks the network read packet dropped rate every 30 seconds and compares the actual packet dropped rate with the threshold (the default threshold is 0.5%). This alarm is generated when the system detects that the network read packet dropped rate exceeds the threshold for several times (5 times by default) consecutively.

To change the threshold, choose System > Threshold Configuration > Device > Host > Network Reading > Network Read Packet Rate Information > Read Packet Dropped Rate.

When the hit number is 1, this alarm is cleared when the network read packet dropped rate is less than or equal to the threshold. When the hit number is greater than 1, this alarm is cleared when the network read packet dropped rate is less than or equal to 90% of the threshold.

Alarm detection is disabled by default. If you want to enable this function, check whether alarm sending can be enabled based on section "Check the system environment."
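The monitored metric corresponds to the receive drop counter that Linux exposes in /proc/net/dev: dropped received packets as a share of all received packets. A minimal sketch of the computation against a sample snapshot (the eth0 counters are illustrative; on a live node read /proc/net/dev directly):

```shell
# Sketch: per-interface read (receive) packet dropped rate from /proc/net/dev.
# Sample snapshot for illustration only.
cat > /tmp/netdev.sample <<'EOF'
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
  eth0: 123456789 200000    0   10    0     0          0         0 987654321 150000    0    0    0     0       0          0
EOF

# Skip the two header lines. Splitting on ":" and runs of spaces, the interface
# name is field 2, RX packets field 4, and RX drops field 6.
awk -F'[: ]+' 'NR > 2 {
    iface = $2; pkts = $4; drop = $6
    if (pkts + drop > 0)
        printf "%s read packet dropped rate: %.3f%%\n", iface, drop * 100 / (pkts + drop)
}' /tmp/netdev.sample
```

Note that these counters are cumulative since boot; the alarm itself is computed over 30-second intervals, so two snapshots would be needed to reproduce its exact value.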

Attribute
Alarm ID Alarm Severity Automatically Cleared

12045 Major Yes

Parameters
Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

NetworkCardName Specifies the network port for which the alarm is generated.

Trigger Condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System
The service performance deteriorates or services time out.


Precautions: In SUSE (kernel 3.0 or later) or Red Hat 7.2, because the system kernel modifies the mechanism for counting read and discarded packets, this alarm may be generated even when the network is normal. Services are not adversely affected. You are advised to check whether the alarm is caused by this problem based on section "Check the system environment."

Possible Causes
l An OS exception occurs.
l The NIC is configured in active/standby bond mode.
l The alarm threshold is set improperly.
l The network is abnormal.

Procedure

Check the network packet dropped rate.

Step 1 Use PuTTY to log in to any node in the cluster for which the alarm is not generated as user omm, and run the ping IP address -c 100 command to check whether network packet loss occurs.
# ping 10.10.10.12 -c 5
PING 10.10.10.12 (10.10.10.12) 56(84) bytes of data.
64 bytes from 10.10.10.11: icmp_seq=1 ttl=64 time=0.033 ms
64 bytes from 10.10.10.11: icmp_seq=2 ttl=64 time=0.034 ms
64 bytes from 10.10.10.11: icmp_seq=3 ttl=64 time=0.021 ms
64 bytes from 10.10.10.11: icmp_seq=4 ttl=64 time=0.033 ms
64 bytes from 10.10.10.11: icmp_seq=5 ttl=64 time=0.030 ms
--- 10.10.10.12 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4001ms
rtt min/avg/max/mdev = 0.021/0.030/0.034/0.006 ms

NOTE

l IP address: indicates the value of HostName in the alarm location information. To query the value of OM IP and Business IP, click Hosts on MRS Manager.

l -c: indicates the check times. The default value is 100.

l If yes, go to Step 10.
l If no, go to Step 2.
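The packet-loss decision in Step 1 can be automated by extracting the percentage from ping's statistics line. A minimal sketch against canned output (in practice pipe in the output of the real ping command):

```shell
# Sketch: pull the loss percentage out of ping's summary line.
# Canned statistics line for illustration only.
ping_stats="100 packets transmitted, 100 received, 0% packet loss, time 4001ms"

# Capture the number immediately before "% packet loss".
loss="$(printf '%s\n' "$ping_stats" | sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p')"

if [ "${loss%.*}" -gt 0 ]; then
    echo "packet loss detected: ${loss}%"
else
    echo "no packet loss"
fi
```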

Check the system environment.

Step 2 Use PuTTY to log in as user omm to the active OMS node or the node for which the alarm is generated.

Step 3 Run the cat /etc/*-release command to check the OS type.
l If EulerOS is used, go to Step 4.
# cat /etc/*-release
EulerOS release 2.0 (SP2)
EulerOS release 2.0 (SP2)
l If SUSE is used, go to Step 5.
# cat /etc/*-release
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 3
l If another OS is used, go to Step 10.

Step 4 Run the cat /etc/euleros-release command to check whether the OS version is EulerOS 2.2.

# cat /etc/euleros-release
EulerOS release 2.0 (SP2)
l If yes, the alarm sending function cannot be enabled. Go to Step 6.
l If no, go to Step 10.

Step 5 Run the cat /proc/version command to check whether the SUSE kernel version is 3.0 or later.
# cat /proc/version
Linux version 3.0.101-63-default (geeko@buildhost) (gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE Linux) ) #1 SMP Tue Jun 23 16:02:31 UTC 2015 (4b89d0c)
l If yes, the alarm sending function cannot be enabled. Go to Step 6.
l If no, go to Step 10.

Step 6 Log in to MRS Manager and choose System > Configuration > Threshold Configuration.

Step 7 In the navigation tree of the Threshold Configuration page, choose Network Reading > Network Read Packet Rate Information > Read Packet Dropped Rate. In the area on the right, check whether Send Alarm is selected.
l If yes, the alarm sending function has been enabled. Go to Step 8.
l If no, the alarm sending function has been disabled. Go to Step 9.

Step 8 In the area on the right, deselect Send Alarm to disable the checking of Network Read Packet Dropped Rate Exceeds the Threshold.

Step 9 On the Alarms page of MRS Manager, search for the 12045 alarm. If the alarm is not cleared automatically, clear it manually. No further action is required.

NOTE

The ID of alarm Network Read Packet Dropped Rate Exceeds the Threshold is 12045.

Check whether the NIC has configured the active/standby bond mode.

Step 10 Use PuTTY to log in to the alarm node as user omm. Run the ls -l /proc/net/bonding command to check whether the /proc/net/bonding directory exists on the alarm node.
l If yes, the NIC is configured in active/standby bond mode, as shown in the following. Go to Step 11.
# ls -l /proc/net/bonding/
total 0
-r--r--r-- 1 root root 0 Oct 11 17:35 bond0
l If no, the NIC is not configured in active/standby bond mode, as shown in the following. Go to Step 13.
# ls -l /proc/net/bonding/
ls: cannot access /proc/net/bonding/: No such file or directory

Step 11 Run the cat /proc/net/bonding/bond0 command and check whether the value of Bonding Mode is fault-tolerance.

NOTE

bond0 indicates the name of the bond configuration file. Use the file name queried in Step 10 in practice.

# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: eth1 (primary_reselect always)
Currently Active Slave: eth1


MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 1
Slave queue ID: 0

Slave Interface: eth1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 1
Slave queue ID: 0
l If yes, the NIC is configured in active/standby bond mode. Go to Step 12.
l If no, the NIC is not configured in active/standby bond mode. Go to Step 13.

Step 12 Check whether the NIC of the NetworkCardName parameter in the alarm details is the standby NIC.
l If yes, manually clear the alarm on the Alarms page, because an alarm on the standby NIC cannot be automatically cleared. No further action is required.
l If no, go to Step 13.

NOTE

Method of determining whether an NIC is standby: In the /proc/net/bonding/bond0 configuration file, check whether the NIC name of the NetworkCardName parameter is the same as a Slave Interface but different from Currently Active Slave (the current active NIC). If so, the NIC is a standby one.
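The note above can be expressed as a small parser over the bond file. A minimal sketch against sample content (bond0, eth0, and eth1 are illustrative names; on the node read the file found in Step 10 and use the NetworkCardName value from the alarm):

```shell
# Sketch: decide whether NIC_NAME is the standby slave of a bond.
# NIC_NAME is a placeholder for the NetworkCardName value from the alarm.
NIC_NAME="eth0"

# Sample bond file content for illustration only.
cat > /tmp/bond0.sample <<'EOF'
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth1
Slave Interface: eth0
Slave Interface: eth1
EOF

active="$(awk -F': ' '/^Currently Active Slave/ {print $2}' /tmp/bond0.sample)"

# Standby = listed as a Slave Interface but not the currently active slave.
if grep -q "^Slave Interface: $NIC_NAME" /tmp/bond0.sample \
        && [ "$NIC_NAME" != "$active" ]; then
    verdict="standby"
else
    verdict="not standby"
fi
echo "$NIC_NAME is $verdict"
```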

Check whether the threshold is set properly.

Step 13 Log in to MRS Manager and check whether the alarm threshold is set properly. (By default, 0.5% is a proper value. However, users can configure the value as required.)
l If yes, go to Step 16.
l If no, go to Step 14.

Step 14 Based on the actual usage condition, choose System > Threshold Configuration > Device > Host > Network Reading > Network Read Packet Rate Information > Read Packet Dropped Rate to modify the alarm threshold.

For details, see Figure 5-1.


Figure 5-1 Setting alarm thresholds

Step 15 Wait 5 minutes and check whether the alarm is cleared.
l If yes, no further action is required.
l If no, go to Step 16.

Check whether the network is normal.

Step 16 Contact the system administrator to check whether the network is abnormal.
l If yes, go to Step 17 to rectify the network fault.
l If no, go to Step 18.

Step 17 Wait 5 minutes and check whether the alarm is cleared.
l If yes, no further action is required.
l If no, go to Step 18.

Collect fault information.

Step 18 On MRS Manager, choose System > Export Log.

Step 19 Contact the O&M personnel and send the collected log information.

----End

Related Information
N/A


5.7.76 ALM-12046 Network Write Packet Dropped Rate Exceeds the Threshold

Description

The system checks the network write packet dropped rate every 30 seconds and compares the actual packet dropped rate with the threshold (the default threshold is 0.5%). This alarm is generated when the system detects that the network write packet dropped rate exceeds the threshold for several times (5 times by default) consecutively.

To change the threshold, choose System > Threshold Configuration > Device > Host > Network Writing > Network Write Packet Rate Information > Write Packet Dropped Rate.

When the hit number is 1, this alarm is cleared when the network write packet dropped rate is less than or equal to the threshold. When the hit number is greater than 1, this alarm is cleared when the network write packet dropped rate is less than or equal to 90% of the threshold.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12046 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

NetworkCardName Specifies the network port for which the alarm is generated.

Trigger Condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

The service performance deteriorates or services time out.


Possible Causes
l The alarm threshold is set improperly.
l The network is abnormal.

Procedure
Check whether the threshold is set properly.

Step 1 Log in to MRS Manager and check whether the alarm threshold is set properly. (By default, 0.5% is a proper value. However, users can configure the value as required.)
l If yes, go to Step 4.
l If no, go to Step 2.

Step 2 Based on the actual usage condition, choose System > Threshold Configuration > Device > Host > Network Writing > Network Write Packet Rate Information > Write Packet Dropped Rate to modify the alarm threshold.

Step 3 Wait 5 minutes and check whether the alarm is cleared.
l If yes, no further action is required.
l If no, go to Step 4.

Check whether the network is normal.

Step 4 Contact the system administrator to check whether the network is abnormal.
l If yes, go to Step 5 to rectify the network fault.
l If no, go to Step 7.

Step 5 Wait 5 minutes and check whether the alarm is cleared.
l If yes, no further action is required.
l If no, go to Step 7.

Collect fault information.

Step 6 On MRS Manager, choose System > Export Log.

Step 7 Contact the O&M personnel and send the collected log information.

----End

Related Information
N/A

5.7.77 ALM-12047 Network Read Packet Error Rate Exceeds the Threshold

Description
The system checks the network read packet error rate every 30 seconds and compares the actual packet error rate with the threshold (the default threshold is 0.5%). This alarm is generated when the system detects that the network read packet error rate exceeds the threshold for several times (5 times by default) consecutively.


To change the threshold, choose System > Threshold Configuration > Device > Host > Network Reading > Network Read Packet Rate Information > Read Packet Error Rate.

When the hit number is 1, this alarm is cleared when the network read packet error rate is less than or equal to the threshold. When the hit number is greater than 1, this alarm is cleared when the network read packet error rate is less than or equal to 90% of the threshold.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12047 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

NetworkCardName Specifies the network port for which the alarm is generated.

Trigger Condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

Communication is intermittently interrupted and services time out.

Possible Causes
l The alarm threshold is set improperly.
l The network is abnormal.

Procedure

Check whether the threshold is set properly.

Step 1 Log in to MRS Manager and check whether the alarm threshold is set properly. (By default, 0.5% is a proper value. However, users can configure the value as required.)
l If yes, go to Step 4.


l If no, go to Step 2.

Step 2 Based on the actual usage condition, choose System > Threshold Configuration > Device > Host > Network Reading > Network Read Packet Rate Information > Read Packet Error Rate to modify the alarm threshold.

Step 3 Wait 5 minutes and check whether the alarm is cleared.

l If yes, no further action is required.

l If no, go to Step 4.

Check whether the network is normal.

Step 4 Contact the system administrator to check whether the network is abnormal.

l If yes, go to Step 5 to rectify the network fault.

l If no, go to Step 7.

Step 5 Wait 5 minutes and check whether the alarm is cleared.

l If yes, no further action is required.

l If no, go to Step 7.

Collect fault information.

Step 6 On MRS Manager, choose System > Export Log.

Step 7 Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.78 ALM-12048 Network Write Packet Error Rate Exceeds the Threshold

Description

The system checks the network write packet error rate every 30 seconds and compares the actual packet error rate with the threshold (the default threshold is 0.5%). This alarm is generated when the system detects that the network write packet error rate exceeds the threshold for several times (5 times by default) consecutively.

To change the threshold, choose System > Threshold Configuration > Device > Host > Network Writing > Network Write Packet Rate Information > Write Packet Error Rate.

When the hit number is 1, this alarm is cleared when the network write packet error rate is less than or equal to the threshold. When the hit number is greater than 1, this alarm is cleared when the network write packet error rate is less than or equal to 90% of the threshold.


Attribute

Alarm ID Alarm Severity Automatically Cleared

12048 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

NetworkCardName Specifies the network port for which the alarm is generated.

Trigger Condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

Communication is intermittently interrupted and services time out.

Possible Causes
l The alarm threshold is set improperly.
l The network is abnormal.

Procedure

Check whether the threshold is set properly.

Step 1 Log in to MRS Manager and check whether the alarm threshold is set properly. (By default, 0.5% is a proper value. However, users can configure the value as required.)
l If yes, go to Step 4.
l If no, go to Step 2.

Step 2 Based on the actual usage condition, choose System > Threshold Configuration > Device > Host > Network Writing > Network Write Packet Rate Information > Write Packet Error Rate to modify the alarm threshold.

Step 3 Wait 5 minutes and check whether the alarm is cleared.
l If yes, no further action is required.


l If no, go to Step 4.

Check whether the network is normal.

Step 4 Contact the system administrator to check whether the network is abnormal.
l If yes, go to Step 5 to rectify the network fault.
l If no, go to Step 7.

Step 5 Wait 5 minutes and check whether the alarm is cleared.
l If yes, no further action is required.
l If no, go to Step 7.

Collect fault information.

Step 6 On MRS Manager, choose System > Export Log.

Step 7 Contact the O&M personnel and send the collected log information.

----End

Related Information
N/A

5.7.79 ALM-12049 Network Read Throughput Rate Exceeds the Threshold

Description
The system checks the network read throughput rate every 30 seconds and compares the actual throughput rate with the threshold (the default threshold is 80%). This alarm is generated when the system detects that the network read throughput rate exceeds the threshold for several times (5 times by default) consecutively.

To change the threshold, choose System > Threshold Configuration > Device > Host > Network Reading > Network Read Throughput Rate > Read Throughput Rate.

When the hit number is 1, this alarm is cleared when the network read throughput rate is less than or equal to the threshold. When the hit number is greater than 1, this alarm is cleared when the network read throughput rate is less than or equal to 90% of the threshold.

Attribute
Alarm ID Alarm Severity Automatically Cleared

12049 Major Yes


Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

NetworkCardName Specifies the network port for which the alarm is generated.

Trigger Condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

The service system runs improperly or is unavailable.

Possible Causes

- The alarm threshold is set improperly.
- The network port rate cannot meet the current service requirements.

Procedure

Check whether the threshold is set properly.

Step 1 Log in to MRS Manager and check whether the alarm threshold is set properly. (By default, 80% is a proper value. However, users can configure the value as required.)
- If yes, go to Step 4.
- If no, go to Step 2.

Step 2 Based on the actual usage condition, choose System > Threshold Configuration > Device > Host > Network Reading > Network Read Throughput Rate > Read Throughput Rate to modify the alarm threshold.

Step 3 Wait 5 minutes and check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to Step 4.

Check whether the network port rate can meet the service requirements.

Step 4 On MRS Manager, click the alarm in the real-time alarm list. In the Alarm Details area, obtain the IP address of the host and the network port name for which the alarm is generated.

Step 5 Use PuTTY to log in to the host for which the alarm is generated as user root.


Step 6 Run the ethtool network port name command to check the maximum speed of the current network port.

NOTE

In the VM environment, you cannot run a command to query the network port rate. It is recommended that you contact the system administrator to confirm whether the network port rate meets the requirements.

Step 7 If the network read throughput rate exceeds the threshold, contact the system administrator to increase the network port rate.

Step 8 Check whether the alarm is cleared.

- If yes, no further action is required.

- If no, go to Step 10.

Collect fault information.

Step 9 On MRS Manager, choose System > Export Log.

Step 10 Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.80 ALM-12050 Network Write Throughput Rate Exceeds theThreshold

Description

The system checks the network write throughput rate every 30 seconds and compares the actual throughput rate with the threshold (the default threshold is 80%). This alarm is generated when the system detects that the network write throughput rate exceeds the threshold several consecutive times (5 times by default).

To change the threshold, choose System > Threshold Configuration > Device > Host > Network Writing > Network Write Throughput Rate > Write Throughput Rate.

When the hit number is 1, this alarm is cleared when the network write throughput rate is less than or equal to the threshold. When the hit number is greater than 1, this alarm is cleared when the network write throughput rate is less than or equal to 90% of the threshold.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12050 Major Yes


Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

NetworkCardName Specifies the network port for which the alarm is generated.

Trigger Condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

The service system runs improperly or is unavailable.

Possible Causes

- The alarm threshold is set improperly.
- The network port rate cannot meet the current service requirements.

Procedure

Check whether the threshold is set properly.

Step 1 Log in to MRS Manager and check whether the alarm threshold is set properly. (By default, 80% is a proper value. However, users can configure the value as required.)
- If yes, go to Step 4.
- If no, go to Step 2.

Step 2 Based on the actual usage condition, choose System > Threshold Configuration > Device > Host > Network Writing > Network Write Throughput Rate > Write Throughput Rate to modify the alarm threshold.

Step 3 Wait 5 minutes and check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to Step 4.

Check whether the network port rate can meet the service requirements.

Step 4 On MRS Manager, click the alarm in the real-time alarm list. In the Alarm Details area, obtain the IP address of the host and the network port name for which the alarm is generated.

Step 5 Use PuTTY to log in to the host for which the alarm is generated as user root.


Step 6 Run the ethtool network port name command to check the maximum speed of the current network port.

NOTE

In the VM environment, you cannot run a command to query the network port rate. It is recommended that you contact the system administrator to confirm whether the network port rate meets the requirements.

Step 7 If the network write throughput rate exceeds the threshold, contact the system administrator to increase the network port rate.

Step 8 Check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to Step 10.

Collect fault information.

Step 9 On MRS Manager, choose System > Export Log.

Step 10 Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.81 ALM-12051 Disk Inode Usage Exceeds the Threshold

Description

The system checks the disk Inode usage every 30 seconds and compares the actual Inode usage with the threshold (the default threshold is 80%). This alarm is generated when the Inode usage exceeds the threshold several consecutive times (5 times by default).

To change the threshold, choose System > Threshold Configuration > Device > Host > Disk > Disk Inode Usage > Disk Inode Usage.

When the hit number is 1, this alarm is cleared when the disk Inode usage is less than or equal to the threshold. When the hit number is greater than 1, this alarm is cleared when the disk Inode usage is less than or equal to 90% of the threshold.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12051 Major Yes


Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

PartitionName Specifies the disk partition for which the alarm is generated.

Trigger Condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

Data cannot be properly written to the file system.

Possible Causes

- Massive small files are stored on the disk.
- The system is abnormal.

Procedure

Check whether massive small files are stored on the disk.

Step 1 On MRS Manager, click the alarm in the real-time alarm list. In the Alarm Details area, obtain the IP address of the host and the disk partition for which the alarm is generated.

Step 2 Use PuTTY to log in to the host for which the alarm is generated as user root.

Step 3 Run the df -i partition name command to check the current disk Inode usage.

Step 4 If the Inode usage exceeds the threshold, manually check the small files stored on the disk partition and confirm whether they can be deleted.
- If yes, delete the files and go to Step 5.
- If no, adjust the capacity. For details, see the FusionInsight HD Capacity Adjustment Guide. Go to Step 6.

Step 5 Wait 5 minutes and check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to Step 6.
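The small-file hunt in Step 4 can be sketched as a per-directory file count. This is a generic sketch, not an MRS tool: top_inode_dirs is a hypothetical helper name, and the partition path must come from the alarm details in Step 1.

```shell
# Sketch: rank the subdirectories of a partition by how many filesystem
# entries (and therefore inodes) they contain, so directories full of
# small files stand out. top_inode_dirs is a hypothetical helper name.
top_inode_dirs() {
    local base="$1" d
    for d in "$base"/*/; do
        # find lists the directory itself plus everything beneath it.
        printf '%8d %s\n' "$(find "$d" 2>/dev/null | wc -l)" "$d"
    done | sort -rn | head -n 10
}

# Usage: top_inode_dirs /srv    # replace /srv with the alarmed partition
```

Combined with df -i from Step 3, this narrows down which directory tree is consuming the inodes.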

Check whether the system environment is abnormal.

Step 6 Contact the operating system maintenance personnel to check whether the operating system is abnormal.


- If yes, go to Step 7 to rectify the fault.
- If no, go to Step 9.

Step 7 Wait 5 minutes and check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to Step 9.

Collect fault information.

Step 8 On MRS Manager, choose System > Export Log.

Step 9 Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.82 ALM-12052 TCP Temporary Port Usage Exceeds the Threshold

Description

The system checks the TCP temporary port usage every 30 seconds and compares the actual usage with the threshold (the default threshold is 80%). This alarm is generated when the TCP temporary port usage exceeds the threshold several consecutive times (5 times by default).

To change the threshold, choose System > Threshold Configuration > Device > Host > Network Status > TCP Ephemeral Port Usage > TCP Ephemeral Port Usage.

When the hit number is 1, this alarm is cleared when the TCP temporary port usage is less than or equal to the threshold. When the hit number is greater than 1, this alarm is cleared when the TCP temporary port usage is less than or equal to 90% of the threshold.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12052 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.


HostName Specifies the host for which the alarm is generated.

Trigger Condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

Services on the host cannot establish external connections and are therefore interrupted.

Possible Causes

- The temporary ports cannot meet the current service requirements.
- The system is abnormal.

Procedure

Expand the temporary port number range.

Step 1 On MRS Manager, click the alarm in the real-time alarm list. In the Alarm Details area, obtain the IP address of the host for which the alarm is generated.

Step 2 Use PuTTY to log in to the host for which the alarm is generated as user omm.

Step 3 Run the cat /proc/sys/net/ipv4/ip_local_port_range | cut -f 1 command to obtain the value of the start port, and run the cat /proc/sys/net/ipv4/ip_local_port_range | cut -f 2 command to obtain the value of the end port. The total number of temporary ports is the value of the end port minus the value of the start port. If the total number of temporary ports is smaller than 28,232, the random port range of the OS is narrow. Contact the system administrator to increase the port range.

Step 4 Run the ss -ant 2>/dev/null | grep -v LISTEN | awk 'NR > 2 {print $4}' | cut -d ':' -f 2 | awk '$1 > "Value of the start port" {print $1}' | sort -u | wc -l command to calculate the number of used temporary ports.

Step 5 The formula for calculating the usage of the temporary ports is: Usage of the temporary ports = (Number of used temporary ports/Total number of temporary ports) x 100%. Check whether the temporary port usage exceeds the threshold.
- If yes, go to Step 7.
- If no, go to Step 6.
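The arithmetic in Steps 3 to 5 can be sketched with example numbers. The range and the in-use count below are illustrative values, not readings from a real host; on a live system they come from /proc/sys/net/ipv4/ip_local_port_range and the ss pipeline in Step 4.

```shell
# Illustrative only: ephemeral port usage with an example kernel range of
# 32768-61000 (28232 ports) and an example count of 22600 ports in use.
start_port=32768   # from: cat /proc/sys/net/ipv4/ip_local_port_range | cut -f 1
end_port=61000     # from: cat /proc/sys/net/ipv4/ip_local_port_range | cut -f 2
total=$((end_port - start_port))
used=22600         # from the ss pipeline in Step 4
echo "temporary port usage: $((used * 100 / total))%"   # prints: temporary port usage: 80%
```

With these example numbers the usage sits exactly at the default 80% threshold, which is the boundary where the alarm starts to fire.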

Step 6 Wait 5 minutes and check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to Step 7.

Check whether the system environment is abnormal.

Step 7 Run the following command to export the port information to a temporary file, and view the frequently used ports in the port_result.txt file:

netstat -tnp > $BIGDATA_HOME/tmp/port_result.txt

MapReduce ServiceUser Guide 5 MRS Manager Operation Guide

2019-01-15 238

Page 248: Huawei › en-us › eu-west-0-user... · 2019-12-14 · Contents 1 Overview.........................................................................................................................................1

netstat -tnp

Active Internet connections (w/o servers)

Proto Recv-Q Send-Q Local Address        Foreign Address  State       PID/Program name
tcp        0      0 10-120-85-154:45433  10-120-8:25009   CLOSE_WAIT  94237/java
tcp        0      0 10-120-85-154:45434  10-120-8:25009   CLOSE_WAIT  94237/java
tcp        0      0 10-120-85-154:45435  10-120-8:25009   CLOSE_WAIT  94237/java
...

Step 8 Run the following command to view the processes that occupy a large number of ports:

ps -ef | grep PID

NOTE

- PID is the process ID queried in Step 7.

- Run the following command to collect information about all processes and check the processes that occupy a large number of ports:

ps -ef > $BIGDATA_HOME/tmp/ps_result.txt
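As a sketch of what Steps 7 and 8 are looking for, the netstat -tnp capture can be reduced to a per-process connection count, so the heaviest port consumers stand out. count_by_pid is a hypothetical helper name, not an MRS command, and it assumes standard netstat -tnp output (two header lines, PID/program in the last column).

```shell
# Sketch: summarize netstat -tnp output (e.g. the port_result.txt capture)
# into "connection-count PID/program" lines, heaviest consumers first.
# count_by_pid is a hypothetical helper name.
count_by_pid() {
    # Skip the two header lines, keep the PID/program column, then count
    # occurrences per process. Reads file arguments, or stdin if none.
    awk 'NR > 2 && $NF ~ "/" {print $NF}' "$@" | sort | uniq -c | sort -rn
}

# Usage: count_by_pid "$BIGDATA_HOME/tmp/port_result.txt"
```

The top line of the output names the process to investigate with ps -ef in Step 8.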

Step 9 After obtaining the administrator's approval, clear the processes that occupy a large number of ports. Wait 5 minutes and check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to Step 11.

Collect fault information.

Step 10 On MRS Manager, choose System > Export Log.

Step 11 Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.83 ALM-12053 File Handle Usage Exceeds the Threshold

Description

The system checks the file handle usage every 30 seconds and compares the actual usage with the threshold (the default threshold is 80%). This alarm is generated when the file handle usage exceeds the threshold several consecutive times (5 times by default).

To change the threshold, choose System > Threshold Configuration > Device > Host > Host Status > Host File Handle Usage > Host File Handle Usage.

When the hit number is 1, this alarm is cleared when the host file handle usage is less than or equal to the threshold. When the hit number is greater than 1, this alarm is cleared when the host file handle usage is less than or equal to 90% of the threshold.


Attribute

Alarm ID Alarm Severity Automatically Cleared

12053 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Trigger Condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

I/O operations, such as opening a file or connecting to the network, cannot be performed, and programs run abnormally.

Possible Causes

- The number of file handles cannot meet the current service requirements.
- The system is abnormal.

Procedure

Increase the number of file handles.

Step 1 On MRS Manager, click the alarm in the real-time alarm list. In the Alarm Details area, obtain the IP address of the host for which the alarm is generated.

Step 2 Use PuTTY to log in to the host for which the alarm is generated as user root.

Step 3 Run the ulimit -n command to check the current maximum number of file handles of the system.

Step 4 If the file handle usage exceeds the threshold, contact the system administrator to increase the number of file handles.
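On Linux, the system-wide numbers behind this usage check can be read from /proc/sys/fs/file-nr, whose three fields are allocated handles, free handles, and the maximum (fs.file-max). The exact metric MRS samples may differ, so treat this as a sketch; handle_usage is a hypothetical helper name.

```shell
# Sketch: file handle usage as an integer percentage, from the three
# fields of /proc/sys/fs/file-nr (allocated, free, maximum).
# handle_usage is a hypothetical helper name.
handle_usage() {  # handle_usage ALLOCATED FREE MAXIMUM
    echo $(( ($1 - $2) * 100 / $3 ))
}

# Live check on a Linux host:
#   read allocated free maximum < /proc/sys/fs/file-nr
#   echo "file handle usage: $(handle_usage "$allocated" "$free" "$maximum")%"
```

Comparing this percentage with the 80% threshold reproduces the check the alarm performs.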

Step 5 Wait 5 minutes and check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to Step 6.


Check whether the system environment is abnormal.

Step 6 Contact the system administrator to check whether the operating system is abnormal.

- If yes, go to Step 7 to rectify the fault.

- If no, go to Step 9.

Step 7 Wait 5 minutes and check whether the alarm is cleared.

- If yes, no further action is required.

- If no, go to Step 9.

Collect fault information.

Step 8 On MRS Manager, choose System > Export Log.

Step 9 Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.84 ALM-12054 The Certificate File Is Invalid

Description

The system checks whether the certificate file is invalid (has expired or is not yet valid) at 23:00 every day. This alarm is generated when the certificate file is invalid.

This alarm is cleared if the status of the newly imported certificate is valid.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12054 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.


Impact on the System

The system reminds users that the certificate file is invalid. If the certificate file expires, some functions are restricted and cannot be used properly.

Possible Causes

No HA root certificate or HA user certificate is imported, the certificate import fails, or the certificate file is invalid.

Procedure

Locate the alarm cause.

Step 1 On MRS Manager, view the real-time alarm list and locate the target alarm.

In the Alarm Details area, view the additional information about the alarm.

- If CA Certificate is displayed in the additional information, use PuTTY to log in to the active OMS node as user omm and go to Step 2.

- If HA root Certificate is displayed in the additional information, check Location to obtain the name of the host involved in this alarm. Then use PuTTY to log in to the host as user omm and go to Step 3.

- If HA server Certificate is displayed in the additional information, check Location to obtain the name of the host involved in this alarm. Then use PuTTY to log in to the host as user omm and go to Step 4.

Check the validity period of the certificate file.

Step 2 Check whether the current system time is in the validity period of the CA certificate.

Run the openssl x509 -noout -text -in ${CONTROLLER_HOME}/security/cert/root/ca.crt command to check the effective time and expiration time of the CA certificate.

- If yes, go to Step 8.
- If no, go to Step 5.

Step 3 Check whether the current system time is in the validity period of the HA root certificate.

Run the openssl x509 -noout -text -in ${CONTROLLER_HOME}/security/certHA/root-ca.crt command to check the effective time and expiration time of the HA root certificate.

- If yes, go to Step 8.
- If no, go to Step 6.

Step 4 Check whether the current system time is in the validity period of the HA user certificate.

Run the openssl x509 -noout -text -in ${CONTROLLER_HOME}/security/certHA/server.crt command to check the effective time and expiration time of the HA user certificate.

- If yes, go to Step 8.
- If no, go to Step 6.

An example of the effective time and expiration time of an HA/CA certificate:

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 97:d5:0e:84:af:ec:34:d8
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: C=CountryName, ST=State, L=Locality, O=Organization, OU=IT, CN=HADOOP.COM
        Validity
            Not Before: Dec 13 06:38:26 2016 GMT    //The effective time.
            Not After : Dec 11 06:38:26 2026 GMT    //The expiration time.
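The openssl checks above can be condensed: -dates prints just the Not Before/Not After lines instead of the full text dump, and -checkend 0 exits zero only while the certificate has not yet expired. Note that -checkend tests expiry only, so a not-yet-valid certificate still requires inspecting the -dates output. check_cert is a hypothetical wrapper, and the certificate paths are the ones used in Steps 2 to 4.

```shell
# Sketch: print a certificate's validity window and report whether it has
# expired. check_cert is a hypothetical helper name.
check_cert() {
    local cert="$1"
    # Print only the notBefore/notAfter lines.
    openssl x509 -noout -dates -in "$cert" || return 2
    # -checkend 0: exit 0 if the certificate has not expired at this moment.
    if openssl x509 -noout -checkend 0 -in "$cert" >/dev/null; then
        echo "still valid"
    else
        echo "EXPIRED"
    fi
}

# Usage, e.g. for the CA certificate from Step 2:
#   check_cert "${CONTROLLER_HOME}/security/cert/root/ca.crt"
```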

Import the certificate file.

Step 5 Import a new CA certificate file.

Apply for or generate a CA certificate file and import it to the system. For details, see section Replacing HA Certificates in the Administrator Guide. Manually clear the alarm and check whether this alarm is generated again during the periodic check.

- If yes, go to Step 8.

- If no, no further action is required.

Step 6 Import a new HA certificate file.

Apply for or generate an HA certificate file and import it to the system. For details, see section Replacing HA Certificates in the Administrator Guide. Manually clear the alarm and check whether this alarm is generated again during the periodic check.

- If yes, go to Step 8.

- If no, no further action is required.

Collect fault information.

Step 7 On MRS Manager, choose System > Export Log.

Step 8 Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.85 ALM-12055 The Certificate File Is About to Expire

Description

The system checks the certificate file at 23:00 every day. This alarm is generated if the time left before the certificate file expires is shorter than the threshold, that is, the certificate file is about to expire. For details about how to configure the alarm threshold duration, see section Configuring the Threshold for the Alarm Stating That the Certificate Is About to Expire in the Administrator Guide.

This alarm is cleared if the status of the newly imported certificate is valid.


Attribute

Alarm ID Alarm Severity Automatically Cleared

12055 Minor Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Impact on the System

The system reminds users that the certificate file is about to expire. If the certificate file expires, some functions are restricted and cannot be used properly.

Possible Causes

The remaining validity period of the CA certificate, HA root certificate (root-ca.crt), or HA user certificate (server.crt) is smaller than the alarm threshold.

Procedure

Locate the alarm cause.

Step 1 On MRS Manager, view the real-time alarm list and locate the target alarm.

In the Alarm Details area, view the additional information about the alarm.

- If CA Certificate is displayed in the additional information, use PuTTY to log in to the active OMS node as user omm and go to Step 2.

- If HA root Certificate is displayed in the additional information, check Location to obtain the name of the host involved in this alarm. Then use PuTTY to log in to the host as user omm and go to Step 3.

- If HA server Certificate is displayed in the additional information, check Location to obtain the name of the host involved in this alarm. Then use PuTTY to log in to the host as user omm and go to Step 4.

Check the validity period of the certificate file.

Step 2 Check whether the remaining validity period of the CA certificate is smaller than the alarm threshold.


Run the openssl x509 -noout -text -in ${CONTROLLER_HOME}/security/cert/root/ca.crt command to check the effective time and expiration time of the CA certificate.

- If yes, go to Step 5.
- If no, go to Step 8.

Step 3 Check whether the remaining validity period of the HA root certificate is smaller than the alarm threshold.

Run the openssl x509 -noout -text -in ${CONTROLLER_HOME}/security/certHA/root-ca.crt command to check the effective time and expiration time of the HA root certificate.

- If yes, go to Step 6.
- If no, go to Step 8.

Step 4 Check whether the remaining validity period of the HA user certificate is smaller than the alarm threshold.

Run the openssl x509 -noout -text -in ${CONTROLLER_HOME}/security/certHA/server.crt command to check the effective time and expiration time of the HA user certificate.

- If yes, go to Step 6.
- If no, go to Step 8.

An example of the effective time and expiration time of an HA/CA certificate:

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 97:d5:0e:84:af:ec:34:d8
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: C=CountryName, ST=State, L=Locality, O=Organization, OU=IT, CN=HADOOP.COM
        Validity
            Not Before: Dec 13 06:38:26 2016 GMT    //The effective time.
            Not After : Dec 11 06:38:26 2026 GMT    //The expiration time.
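The "remaining validity smaller than the threshold" test in Steps 2 to 4 maps directly onto openssl's -checkend option, which takes a window in seconds and exits zero only if the certificate remains valid for that long. expires_within_days is a hypothetical wrapper, not an MRS command.

```shell
# Sketch: succeed (exit 0) if the certificate expires within DAYS days.
# openssl -checkend N exits 0 when the certificate is still valid N
# seconds from now, so the result is negated here.
# expires_within_days is a hypothetical helper name.
expires_within_days() {  # expires_within_days CERT DAYS
    ! openssl x509 -noout -checkend $(( $2 * 86400 )) -in "$1" >/dev/null
}

# Usage, e.g. for the HA user certificate from Step 4:
#   expires_within_days "${CONTROLLER_HOME}/security/certHA/server.crt" 30 \
#       && echo "HA user certificate expires within 30 days"
```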

Import the certificate file.

Step 5 Import a new CA certificate file.

Apply for or generate a CA certificate file and import it to the system. For details, see section Replacing HA Certificates in the Administrator Guide. Manually clear the alarm and check whether this alarm is generated again during the periodic check.

- If yes, go to Step 8.
- If no, no further action is required.

Step 6 Import a new HA certificate file.

Apply for or generate an HA certificate file and import it to the system. For details, see section Replacing HA Certificates in the Administrator Guide. Manually clear the alarm and check whether this alarm is generated again during the periodic check.

- If yes, go to Step 8.
- If no, no further action is required.

Collect fault information.


Step 7 On MRS Manager, choose System > Export Log.

Step 8 Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.86 ALM-18008 Heap Memory Usage of Yarn ResourceManager Exceeds the Threshold

Description

The system checks the heap memory usage of Yarn ResourceManager every 30 seconds and compares the actual usage with the threshold. The alarm is generated when the heap memory usage of Yarn ResourceManager exceeds the threshold (80% of the maximum memory by default).

To change the threshold, choose System > Threshold Configuration > Service > Yarn. This alarm is cleared when the heap memory usage of Yarn ResourceManager is less than or equal to the threshold.

Attribute

Alarm ID Alarm Severity Automatically Cleared

18008 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Trigger Condition Generates an alarm when the actual indicator value exceeds the specified threshold.


Impact on the System

Excessively high heap memory usage of Yarn ResourceManager deteriorates Yarn task submission and running performance, or even causes an out-of-memory (OOM) error, which makes the Yarn service unavailable.

Possible Causes

The heap memory of the Yarn ResourceManager instance is overused or inappropriately allocated.

Procedure

Check the heap memory usage.

Step 1 On MRS Manager, click Alarms and select the alarm whose Alarm ID is 18008. Then check the IP address and role name of the instance in Location.

Step 2 On MRS Manager, choose Services > Yarn > Instance > ResourceManager > Customize > Percentage of Used Heap Memory of the ResourceManager.

Step 3 Check whether the used heap memory of ResourceManager reaches 80% of the maximum heap memory specified for ResourceManager.
- If yes, go to Step 4.
- If no, go to Step 6.
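As a quick illustration of the 80% check, the threshold in megabytes is simply 80% of the -Xmx value. The numbers below are examples only, not defaults taken from MRS.

```shell
# Illustrative only: with -Xmx4096m, the 80% threshold corresponds to
# 3276 MB of used heap; raising -Xmx in GC_OPTS moves that line up.
xmx_mb=4096
threshold_pct=80
echo "alarm fires above: $(( xmx_mb * threshold_pct / 100 )) MB"
```

This is why increasing -Xmx in Step 4 clears the alarm when the working set is legitimately large: the same absolute heap usage falls back under the 80% line.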

Step 4 On MRS Manager, choose Services > Yarn > Service Configuration > All > ResourceManager > System. Increase the value of -Xmx in the GC_OPTS parameter as required, click Save Configuration, and select Restart the affected services or instance. Click OK to restart the role instance.

Step 5 Check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to Step 6.

Collect fault information.

Step 6 On MRS Manager, choose System > Export Log.

Step 7 Select the following nodes from the Service drop-down list and click OK.
- NodeAgent
- Yarn

Step 8 Set Start Time for log collection to 10 minutes before the alarm generation time and End Time to 10 minutes after the alarm generation time, and click Download.

Step 9 Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A


5.7.87 ALM-18009 Heap Memory Usage of MapReduce JobHistoryServer Exceeds the Threshold

Description

The system checks the heap memory usage of MapReduce JobHistoryServer every 30 seconds and compares the actual usage with the threshold. The alarm is generated when the heap memory usage of MapReduce JobHistoryServer exceeds the threshold (80% of the maximum memory by default).

To change the threshold, choose System > Threshold Configuration > Service > MapReduce. This alarm is cleared when the heap memory usage of MapReduce JobHistoryServer is less than or equal to the threshold.

Attribute

Alarm ID Alarm Severity Automatically Cleared

18009 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Trigger Condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

Excessively high heap memory usage of MapReduce JobHistoryServer deteriorates the performance of MapReduce log archiving, or even causes an out-of-memory (OOM) error, which makes the MapReduce service unavailable.

Possible Causes

The heap memory of the MapReduce JobHistoryServer instance is overused or inappropriately allocated.


Procedure

Check the memory usage.

Step 1 On MRS Manager, click Alarms and select the alarm whose Alarm ID is 18009. Then check the IP address and role name of the instance in Location.

Step 2 On MRS Manager, choose Services > MapReduce > Instance > JobHistoryServer > Customize > Percentage of Used Heap Memory of the JobHistoryServer.

Step 3 JobHistoryServer indicates the corresponding HostName of the instance for which the alarm is generated. Check the heap memory usage.

Step 4 Check whether the used heap memory of JobHistoryServer reaches 80% of the maximum heap memory specified for JobHistoryServer.

l If yes, go to Step 5.

l If no, go to Step 7.

Step 5 On MRS Manager, choose Services > MapReduce > Service Configuration > All > JobHistoryServer > System. Increase the value of -Xmx in the GC_OPTS parameter as required, click Save Configuration, and select Restart the affected services or instance. Click OK to restart the role instance.
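For illustration only, an adjusted GC_OPTS value might look like the following. Both figures are hypothetical; choose values based on the actual workload and available memory:

```
GC_OPTS = -Xms2G -Xmx4G
```

Only -Xmx is the tuning target in this step; any other options already present in GC_OPTS should be kept unchanged.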

Step 6 Check whether the alarm is cleared.

l If yes, no further action is required.

l If no, go to Step 7.

Collect fault information.

Step 7 On MRS Manager, choose System > Export Log.

Step 8 Select the following nodes from the Service drop-down list and click OK:

l NodeAgent

l MapReduce

Step 9 Set Start Time for log collection to 10 minutes ahead of the alarm generation time and End Time to 10 minutes behind the alarm generation time, and click Download.
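The 10-minute collection window described in this step can be computed mechanically. This sketch uses Python's standard datetime module and is not an MRS API:

```python
from datetime import datetime, timedelta

def log_window(alarm_time: datetime, minutes: int = 10):
    """Return the (Start Time, End Time) pair centered on the alarm generation time."""
    delta = timedelta(minutes=minutes)
    return alarm_time - delta, alarm_time + delta

# Example: an alarm generated at 12:00 yields an 11:50 to 12:10 window.
start, end = log_window(datetime(2019, 1, 15, 12, 0))
```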

Step 10 Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.88 ALM-20002 Hue Service Unavailable

Description

The system checks the Hue service status every 60 seconds. This alarm is generated when the Hue service is unavailable and is cleared when the Hue service recovers.


Attribute

Alarm ID Alarm Severity Automatically Cleared

20002 Critical Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Impact on the System

The system cannot provide data loading, query, and extraction services.

Possible Causes

l The internal KrbServer service on which the Hue service depends is abnormal.
l The internal DBService service on which the Hue service depends is abnormal.
l The network connection to the DBService is abnormal.

Procedure

Check whether the KrbServer is abnormal.

Step 1 On the MRS Manager home page, click Services. In the service list, check whether the KrbServer health status is Good.
l If yes, go to Step 4.
l If no, go to Step 2.

Step 2 Click Restart in the Operation column of the KrbServer to restart the KrbServer.

Step 3 Wait several minutes, and check whether ALM-20002 Hue Service Unavailable is cleared.
l If yes, no further action is required.
l If no, go to Step 4.

Check whether the DBService is abnormal.

Step 4 On the MRS Manager home page, click Services.

Step 5 In the service list, check whether the DBService health status is Good.
l If yes, go to Step 8.


l If no, go to Step 6.

Step 6 Click Restart in the Operation column of the DBService to restart the DBService.

NOTE

To restart the service, enter the MRS Manager administrator password and select Start or restart related services.

Step 7 Wait several minutes, and check whether ALM-20002 Hue Service Unavailable is cleared.
l If yes, no further action is required.
l If no, go to Step 8.

Check whether the network connection to the DBService is normal.

Step 8 Choose Services > Hue > Instance and record the IP address of the active Hue.

Step 9 Use PuTTY to log in to the active Hue.

Step 10 Run the ping command to check whether communication between the host that runs the active Hue and the hosts that run the DBService is normal. (Obtain the IP addresses of the hosts that run the DBService in the same way as that for obtaining the IP address of the active Hue.)
l If yes, go to Step 13.
l If no, go to Step 11.

Step 11 Contact the administrator to restore the network.

Step 12 Wait several minutes, and check whether ALM-20002 Hue Service Unavailable is cleared.
l If yes, no further action is required.
l If no, go to Step 13.

Collect fault information.

Step 13 On MRS Manager, choose System > Export Log.

Step 14 Select the following nodes from the Service drop-down list and click OK:
l Hue
l Controller

Step 15 Set Start Time for log collection to 10 minutes ahead of the alarm generation time and End Time to 10 minutes behind the alarm generation time, and click Download.

Restart the Hue service.

Step 16 On MRS Manager, choose Services > Hue.

Step 17 Choose More Actions > Restart service, and click OK.

Step 18 Check whether the alarm is cleared.
l If yes, no further action is required.
l If no, go to Step 19.

Step 19 Contact Technical Support.

Step 20 Contact the O&M personnel and send the collected log information.

----End


Related Information

N/A

5.7.89 ALM-43001 Spark Service Unavailable

Description

The system checks the Spark service status every 60 seconds. This alarm is generated when the Spark service is unavailable and is cleared when the Spark service recovers.

Attribute

Alarm ID Alarm Severity Automatically Cleared

43001 Critical Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Impact on the System

The tasks submitted by users fail to be executed.

Possible Causes

l The KrbServer service is abnormal.
l The LdapServer service is abnormal.
l The ZooKeeper service is abnormal.
l The HDFS service is abnormal.
l The Yarn service is abnormal.
l The corresponding Hive service is abnormal.

Procedure

Step 1 Check whether service unavailability alarms exist in services on which Spark depends.

1. On MRS Manager, click Alarms.


2. Check whether the following alarms exist in the alarm list:
– ALM-25500 KrbServer Service Unavailable
– ALM-25000 LdapServer Service Unavailable
– ALM-13000 ZooKeeper Service Unavailable
– ALM-14000 HDFS Service Unavailable
– ALM-18000 Yarn Service Unavailable
– ALM-16004 Hive Service Unavailable

– If yes, go to Step 1.3.
– If no, go to Step 2.

3. Handle the service unavailability alarms based on the troubleshooting methods provided in the alarm help. After all the service unavailability alarms are cleared, wait a few minutes and check whether this alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 2.

Step 2 Collect fault information.

1. On MRS Manager, choose System > Export Log.

2. Select the following nodes from the Service drop-down list and click OK (Hive is the specific Hive service determined based on ServiceName in the alarm location information).
– KrbServer
– LdapServer
– ZooKeeper
– HDFS
– Yarn
– Hive

3. Set Start Time for log collection to 10 minutes ahead of the alarm generation time and End Time to 10 minutes behind the alarm generation time, and click Download.

4. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.90 ALM-43006 Heap Memory Usage of the JobHistory Process Exceeds the Threshold

Description

The system checks the heap memory usage of the JobHistory process every 30 seconds. The alarm is generated when the heap memory usage of the JobHistory process exceeds the threshold (90% of the maximum memory).


Attribute

Alarm ID Alarm Severity Automatically Cleared

43006 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Impact on the System

Overhigh heap memory usage of the JobHistory process deteriorates JobHistory running performance or even causes OOM, which results in an unavailable JobHistory process.

Possible Causes

The heap memory of the JobHistory process is overused or inappropriately allocated.

Procedure

Step 1 Check heap memory usage.

1. On MRS Manager, click Alarms and select the alarm whose Alarm ID is 43006. Then check the IP address and role name of the instance in Location.

2. On MRS Manager, choose Services > Spark > Instance and click the JobHistory for which the alarm is generated to enter the Instance Status page. Then choose Customize > Statistics for the heap memory of the JobHistory Process and click OK.

3. Check whether the used heap memory of the JobHistory process reaches 90% of the maximum heap memory specified for JobHistory.
– If yes, go to Step 1.4.
– If no, go to Step 2.

4. On MRS Manager, choose Services > Spark > Service Configuration, and set Type to All. Choose JobHistory > Default. Increase the value of SPARK_DAEMON_MEMORY as required.

5. Check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 2.


Step 2 Collect fault information.

1. On MRS Manager, choose System > Export Log.

2. Select Spark from the Service drop-down list and click OK.

3. Set Start Time for log collection to 10 minutes ahead of the alarm generation time and End Time to 10 minutes behind the alarm generation time, and click Download.

4. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.91 ALM-43007 Non-Heap Memory Usage of the JobHistory Process Exceeds the Threshold

Description

The system checks the non-heap memory usage of the JobHistory process every 30 seconds. The alarm is generated when the non-heap memory usage of the JobHistory process exceeds the threshold (90% of the maximum memory).

Attribute

Alarm ID Alarm Severity Automatically Cleared

43007 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Impact on the System

Overhigh non-heap memory usage of the JobHistory process deteriorates JobHistory running performance or even causes OOM, which results in an unavailable JobHistory process.


Possible Causes

The non-heap memory of the JobHistory process is overused or inappropriately allocated.

Procedure

Step 1 Check non-heap memory usage.

1. On MRS Manager, click Alarms and select the alarm whose Alarm ID is 43007. Then check the IP address and role name of the instance in Location.

2. On MRS Manager, choose Services > Spark > Instance and click the JobHistory for which the alarm is generated to enter the Instance Status page. Then choose Customize > Statistics for the non-heap memory of the JobHistory Process and click OK.

3. Check whether the used non-heap memory of the JobHistory process reaches 90% of the maximum non-heap memory specified for JobHistory.

– If yes, go to Step 1.4.

– If no, go to Step 2.

4. On MRS Manager, choose Services > Spark > Service Configuration, and set Type to All. Choose JobHistory > Default. Increase the value of -XX:MaxMetaspaceSize in SPARK_DAEMON_JAVA_OPTS as required.

5. Check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 2.
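As a hypothetical illustration of the parameter adjusted above, the Metaspace cap could be raised like this; the 512M figure is an assumed example, not a recommended default:

```
SPARK_DAEMON_JAVA_OPTS = -XX:MaxMetaspaceSize=512M
```

Any other options already present in SPARK_DAEMON_JAVA_OPTS should be kept unchanged.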

Step 2 Collect fault information.

1. On MRS Manager, choose System > Export Log.

2. Select Spark from the Service drop-down list and click OK.

3. Set Start Time for log collection to 10 minutes ahead of the alarm generation time andEnd Time to 10 minutes behind the alarm generation time, and click Download.

4. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.92 ALM-43008 Direct Memory Usage of the JobHistory Process Exceeds the Threshold

Description

The system checks the direct memory usage of the JobHistory process every 30 seconds. The alarm is generated when the direct memory usage of the JobHistory process exceeds the threshold (90% of the maximum memory).


Attribute

Alarm ID Alarm Severity Automatically Cleared

43008 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Impact on the System

Overhigh direct memory usage of the JobHistory process deteriorates JobHistory running performance or even causes OOM, which results in an unavailable JobHistory process.

Possible Causes

The direct memory of the JobHistory process is overused or inappropriately allocated.

Procedure

Step 1 Check direct memory usage.

1. On MRS Manager, click Alarms and select the alarm whose Alarm ID is 43008. Then check the IP address and role name of the instance in Location.

2. On MRS Manager, choose Services > Spark > Instance and click the JobHistory for which the alarm is generated to enter the Instance Status page. Then choose Customize > Direct Memory of JobHistory and click OK.

3. Check whether the used direct memory of the JobHistory process reaches 90% of the maximum direct memory specified for JobHistory.
– If yes, go to Step 1.4.
– If no, go to Step 2.

4. On MRS Manager, choose Services > Spark > Service Configuration, and set Type to All. Choose JobHistory > Default. Increase the value of -XX:MaxDirectMemorySize in SPARK_DAEMON_JAVA_OPTS as required.

5. Check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 2.


Step 2 Collect fault information.

1. On MRS Manager, choose System > Export Log.

2. Select Spark from the Service drop-down list and click OK.

3. Set Start Time for log collection to 10 minutes ahead of the alarm generation time and End Time to 10 minutes behind the alarm generation time, and click Download.

4. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.93 ALM-43009 JobHistory GC Time Exceeds the Threshold

Description

The system checks the garbage collection (GC) time of the JobHistory process every 60 seconds. This alarm is generated when the detected GC time exceeds the threshold (exceeds 12 seconds for three consecutive checks). To change the threshold, choose System > Threshold Configuration > Service > Spark > Garbage Collection (GC) Time of JobHistory > Total GC time in milliseconds. This alarm is cleared when the JobHistory GC time is shorter than or equal to the threshold.
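The trigger condition above (GC time over 12 seconds in three consecutive 60-second checks) can be sketched as follows; the function name and sample list are illustrative, not part of MRS:

```python
def gc_alarm(gc_seconds: list, threshold: float = 12.0, consecutive: int = 3) -> bool:
    """True if the last `consecutive` GC-time samples all exceed the threshold."""
    recent = gc_seconds[-consecutive:]
    return len(recent) == consecutive and all(t > threshold for t in recent)

# A single long GC pause does not trigger the alarm; three in a row do.
```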

Attribute

Alarm ID Alarm Severity Automatically Cleared

43009 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Impact on the System

If the GC time exceeds the threshold, JobHistory running performance is affected, and the JobHistory process may even become unavailable.


Possible Causes

The heap memory of the JobHistory process is overused or inappropriately allocated, causing frequent occurrence of the GC process.

Procedure

Step 1 Check the GC time.

1. On MRS Manager, click Alarms and select the alarm whose Alarm ID is 43009. Then check the IP address and role name of the instance in Location.

2. On MRS Manager, choose Services > Spark > Instance and click the JobHistory for which the alarm is generated to enter the Instance Status page. Then choose Customize > Garbage Collection (GC) Time of JobHistory and click OK.

3. Check whether the GC time is longer than 12 seconds.

– If yes, go to Step 1.4.

– If no, go to Step 2.

4. On MRS Manager, choose Services > Spark > Service Configuration, and set Type to All. Choose JobHistory > Default. Increase the value of SPARK_DAEMON_MEMORY as required.

5. Check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 2.

Step 2 Collect fault information.

1. On MRS Manager, choose System > Export Log.

2. In the Service drop-down list box, select Spark and click OK.

3. Set Start Time for log collection to 10 minutes ahead of the alarm generation time and End Time to 10 minutes behind the alarm generation time, and click Download.

4. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.94 ALM-43010 Heap Memory Usage of the JDBCServer Process Exceeds the Threshold

Description

The system checks the heap memory usage of the JDBCServer process every 30 seconds. The alarm is generated when the heap memory usage of the JDBCServer process exceeds the threshold (90% of the maximum memory).


Attribute

Alarm ID Alarm Severity Automatically Cleared

43010 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Impact on the System

Overhigh heap memory usage of the JDBCServer process deteriorates JDBCServer running performance or even causes OOM, which results in an unavailable JDBCServer process.

Possible Causes

The heap memory of the JDBCServer process is overused or inappropriately allocated.

Procedure

Step 1 Check heap memory usage.

1. On MRS Manager, click Alarms and select the alarm whose Alarm ID is 43010. Then check the IP address and role name of the instance in Location.

2. On MRS Manager, choose Services > Spark > Instance and click the JDBCServer for which the alarm is generated to enter the Instance Status page. Then choose Customize > Statistics for the heap memory of the JDBCServer Process and click OK.

3. Check whether the used heap memory of the JDBCServer process reaches 90% of the maximum heap memory specified for JDBCServer.
– If yes, go to Step 1.4.
– If no, go to Step 2.

4. On MRS Manager, choose Services > Spark > Service Configuration, and set Type to All. Choose JDBCServer > Tuning. Increase the value of SPARK_DRIVER_MEMORY as required.

5. Check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 2.


Step 2 Collect fault information.

1. On MRS Manager, choose System > Export Log.

2. Select Spark from the Service drop-down list and click OK.

3. Set Start Time for log collection to 10 minutes ahead of the alarm generation time and End Time to 10 minutes behind the alarm generation time, and click Download.

4. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.95 ALM-43011 Non-Heap Memory Usage of the JDBCServer Process Exceeds the Threshold

Description

The system checks the non-heap memory usage of the JDBCServer process every 30 seconds. The alarm is generated when the non-heap memory usage of the JDBCServer process exceeds the threshold (90% of the maximum memory).

Attribute

Alarm ID Alarm Severity Automatically Cleared

43011 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Impact on the System

Overhigh non-heap memory usage of the JDBCServer process deteriorates JDBCServer running performance or even causes OOM, which results in an unavailable JDBCServer process.


Possible Causes

The non-heap memory of the JDBCServer process is overused or inappropriately allocated.

Procedure

Step 1 Check non-heap memory usage.

1. On MRS Manager, click Alarms and select the alarm whose Alarm ID is 43011. Then check the IP address and role name of the instance in Location.

2. On MRS Manager, choose Services > Spark > Instance and click the JDBCServer for which the alarm is generated to enter the Instance Status page. Then choose Customize > Statistics for the non-heap memory of the JDBCServer Process and click OK.

3. Check whether the used non-heap memory of the JDBCServer process reaches 90% of the maximum non-heap memory specified for JDBCServer.

– If yes, go to Step 1.4.

– If no, go to Step 2.

4. On MRS Manager, choose Services > Spark > Service Configuration, and set Type to All. Choose JDBCServer > Tuning. Increase the value of -XX:MaxMetaspaceSize in spark.driver.extraJavaOptions as required.

5. Check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 2.

Step 2 Collect fault information.

1. On MRS Manager, choose System > Export Log.

2. Select Spark from the Service drop-down list and click OK.

3. Set Start Time for log collection to 10 minutes ahead of the alarm generation time andEnd Time to 10 minutes behind the alarm generation time, and click Download.

4. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.96 ALM-43012 Direct Memory Usage of the JDBCServer Process Exceeds the Threshold

Description

The system checks the direct memory usage of the JDBCServer process every 30 seconds. The alarm is generated when the direct memory usage of the JDBCServer process exceeds the threshold (90% of the maximum memory).


Attribute

Alarm ID Alarm Severity Automatically Cleared

43012 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Impact on the System

Overhigh direct memory usage of the JDBCServer process deteriorates JDBCServer running performance or even causes OOM, which results in an unavailable JDBCServer process.

Possible Causes

The direct memory of the JDBCServer process is overused or inappropriately allocated.

Procedure

Step 1 Check direct memory usage.

1. On MRS Manager, click Alarms and select the alarm whose Alarm ID is 43012. Then check the IP address and role name of the instance in Location.

2. On MRS Manager, choose Services > Spark > Instance and click the JDBCServer for which the alarm is generated to enter the Instance Status page. Then choose Customize > Direct Memory of JDBCServer and click OK.

3. Check whether the used direct memory of the JDBCServer process reaches 90% of the maximum direct memory specified for JDBCServer.
– If yes, go to Step 1.4.
– If no, go to Step 2.

4. On MRS Manager, choose Services > Spark > Service Configuration, and set Type to All. Choose JDBCServer > Tuning. Increase the value of -XX:MaxDirectMemorySize in spark.driver.extraJavaOptions as required.

5. Check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 2.
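A hypothetical example of the property adjusted above, as it might appear among the Spark configuration values (the 512M figure is illustrative, not a recommendation):

```
spark.driver.extraJavaOptions = -XX:MaxDirectMemorySize=512M
```

Any options already present in spark.driver.extraJavaOptions should be kept unchanged.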


Step 2 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. Select Spark from the Service drop-down list and click OK.
3. Set Start Time for log collection to 10 minutes ahead of the alarm generation time and End Time to 10 minutes behind the alarm generation time, and click Download.
4. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.7.97 ALM-43013 JDBCServer GC Time Exceeds the Threshold

Description

The system checks the garbage collection (GC) time of the JDBCServer process every 60 seconds. This alarm is generated when the detected GC time exceeds the threshold (exceeds 12 seconds for three consecutive checks). To change the threshold, choose System > Threshold Configuration > Service > Spark > Garbage Collection (GC) Time of JDBCServer > Total GC time in milliseconds. This alarm is cleared when the JDBCServer GC time is shorter than or equal to the threshold.

Attribute

Alarm ID Alarm Severity Automatically Cleared

43013 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Impact on the System

If the GC time exceeds the threshold, JDBCServer running performance is affected, and the JDBCServer process may even become unavailable.


Possible Causes

The heap memory of the JDBCServer process is overused or inappropriately allocated, causing frequent occurrence of the GC process.

Procedure

Step 1 Check the GC time.

1. On MRS Manager, click Alarms and select the alarm whose Alarm ID is 43013. Then check the IP address and role name of the instance in Location.

2. On MRS Manager, choose Services > Spark > Instance and click the JDBCServer for which the alarm is generated to enter the Instance Status page. Then choose Customize > Garbage Collection (GC) Time of JDBCServer and click OK.

3. Check whether the GC time is longer than 12 seconds.
– If yes, go to Step 1.4.
– If no, go to Step 2.

4. On MRS Manager, choose Services > Spark > Service Configuration, and set Type to All. Choose JDBCServer > Tuning. Increase the value of SPARK_DRIVER_MEMORY as required.

5. Check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 2.

Step 2 Collect fault information.

1. On MRS Manager, choose System > Export Log.
2. In the Service drop-down list box, select Spark and click OK.
3. Set Start Time for log collection to 10 minutes ahead of the alarm generation time and End Time to 10 minutes behind the alarm generation time, and click Download.
4. Contact the O&M personnel and send the collected log information.

----End

Related Information

N/A

5.8 Object Management

5.8.1 Introduction

An MRS cluster contains different types of basic objects. Table 5-17 describes these objects.


Table 5-17 MRS basic objects

Service: Function set that can complete specific operations. Example: KrbServer service and LdapServer service.

Service instance: Specific instance of a service, often referred to as a service. Example: KrbServer service.

Service role: Functional entity that forms a complete service, often referred to as a role. Example: KrbServer consists of the KerberosAdmin role and the KerberosServer role.

Role instance: Specific instance of a service role running on a host. Example: KerberosAdmin running on Host2 and KerberosServer running on Host3.

Host: Elastic Cloud Server (ECS) running a Linux OS. Example: Host1 to Host5.

Rack: Physical entity that contains multiple hosts connecting to the same switch. Example: Rack1 contains Host1 to Host5.

Cluster: Logical entity that consists of multiple hosts and provides various services. Example: Cluster1 consists of five hosts (Host1 to Host5) and provides services such as KrbServer and LdapServer.

5.8.2 Querying Configurations

Scenario

On MRS Manager, users can query the configurations of services (including roles) and role instances.

Procedure

l Query service configurations.

a. On MRS Manager, click Service.
b. Select the target service from the service list.
c. Click Service Configuration.
d. Set Type to All. All configuration parameters of the service are displayed in the navigation tree. The root nodes in the navigation tree represent the service names and role names.

e. In the navigation tree, choose a parameter and view its value. You can also enter the parameter name in Search to search for it and view the result. The parameters under both the service and role nodes are configuration parameters.

- Query role instance configurations.

   a. On MRS Manager, click Service.
   b. Select the target service from the service list.
   c. Click the Instance tab.
   d. Click the target role instance in the role instance list.
   e. Click Instance Configuration.
   f. Set Type to All. All configuration parameters of the service are displayed in the navigation tree. The root nodes in the navigation tree represent the service names and role names.
   g. In the navigation tree, choose a parameter and change its value. You can also enter the parameter name in Search to search for it and view the result.

5.8.3 Managing Services

Scenario

On MRS Manager, users can perform the following operations:

- Start a service that is in the Stopped, Stopped_Failed, or Start_Failed state.
- Stop unused or abnormal services.
- Restart abnormal services, or services whose configurations have expired, to restore or enable them.

Procedure

Step 1 On MRS Manager, click Service.

Step 2 Locate the row that contains the target service, and click Start, Stop, or Restart to start, stop, or restart the service.

Services are interrelated. If a service is started, stopped, or restarted, services dependent on it will be affected.

The services will be affected in the following ways:

- If a service is to be started, the lower-layer services it depends on must be started first.
- If a service is stopped, the upper-layer services that depend on it become unavailable.
- If a service is restarted, the running upper-layer services that depend on it must be restarted.
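These dependency rules amount to a topological ordering over the service dependency graph: lower-layer services start first, and stopping a service affects everything above it. The sketch below uses a hypothetical dependency map for illustration only; the service names and edges are not MRS's actual dependency graph:

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each service maps to the lower-layer
# services it depends on (e.g. here HBase depends on HDFS and ZooKeeper).
deps = {
    "ZooKeeper": set(),
    "HDFS": {"ZooKeeper"},
    "Yarn": {"HDFS"},
    "HBase": {"HDFS", "ZooKeeper"},
    "Hive": {"HDFS", "Yarn"},
}

# Start order: every lower-layer service comes before the services built on it.
start_order = list(TopologicalSorter(deps).static_order())

def affected_by_stop(service, deps):
    """Return the upper-layer services that become unavailable when `service` stops."""
    affected = set()
    changed = True
    while changed:
        changed = False
        for svc, lower in deps.items():
            if svc not in affected and (service in lower or lower & affected):
                affected.add(svc)
                changed = True
    return affected

print(start_order)                              # ZooKeeper first, HDFS before the rest
print(sorted(affected_by_stop("HDFS", deps)))   # everything layered on HDFS
```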

----End

5.8.4 Configuring Service Parameters

Scenario

On MRS Manager, users can view and modify the default service configurations based on site requirements. Configurations can be imported and exported.

Impact on the System

- After the attributes of HBase, HDFS, Hive, Spark, Yarn, and MapReduce are configured, the client configurations need to be downloaded to update the files.
- The parameters of DBService cannot be modified if only one DBService role instance exists in the cluster.


Procedure

- Modify a service.

   a. Click Service.
   b. Select the target service from the service list.
   c. Click the Service Configuration tab.
   d. Set Type to All. All configuration parameters of the service are displayed in the navigation tree. The root nodes in the navigation tree represent the service names and role names.
   e. In the navigation tree, choose a parameter and change its value. You can also enter the parameter name in Search to search for it and view the result.
      You can click the restore icon to restore a parameter value.

      NOTE
      You can also use host groups to change role instance configurations in batches. Choose a role name in Role, and then choose <select hosts> in Host. Enter a name in Host Group Name, select the target hosts from All Hosts, and add them to Selected Hosts. Click OK to add them to the host group. The added host group can be selected from Host and is valid only on the current page; it is not retained after the page is refreshed.

   f. Click Save Configuration, select Restart the affected services or instances, and click OK to restart the service.
      Click Finish when the system displays Operation succeeded. The service is started successfully.

      NOTE
      If you do not restart Yarn after updating its queue configuration, you can choose More > Refresh the queue for the configuration to take effect.

- Export service configuration parameters.

   a. Click Service.
   b. Select a service.
   c. Click Service Configuration.
   d. Click Export Service Configuration. Select a path for saving the configuration file.

- Import service configuration parameters.

   a. Click Service.
   b. Select a service.
   c. Click Service Configuration.
   d. Click Import Service Configuration.
   e. Select the target configuration file.
   f. Click Save Configuration, and select Restart the affected services or instances. Click OK.
      When Operation succeeded is displayed, click Finish. The service is started successfully.


5.8.5 Configuring Customized Service Parameters

Scenario

Each component of MRS supports all open source parameters, but MRS Manager supports the modification of only some of them for key application scenarios, and some component clients may not include all the open source parameters. To modify component parameters that are not directly supported by MRS Manager, users can add new parameters for components by using the configuration customization function on MRS Manager. Newly added parameters are saved in component configuration files and take effect after the component is restarted.

Impact on the System

- After the service attributes are configured, the service needs to be restarted. The service cannot be accessed during the restart.
- After the attributes of HBase, HDFS, Hive, Spark, Yarn, and MapReduce are configured, the client configurations need to be downloaded to update the files.

Prerequisites

You have learned the meanings of the parameters to be added, the configuration files in which they take effect, and their impact on components.

Procedure

Step 1 On MRS Manager, click Service.

Step 2 Select the target service from the service list.

Step 3 Click Service Configuration.

Step 4 Set Type to All.

Step 5 In the navigation tree, choose Customization. The customized parameters of the current component are displayed on MRS Manager.

The configuration files that save the newly added customized parameters are displayed in Parameter File. Different configuration files may support open source parameters with the same names. If such parameters are set to different values in different files, the configuration effect depends on the sequence in which the configuration files are loaded by the component. Service-level and role-level customized parameters are supported; perform configuration based on the actual service requirements. Customized parameters for a single role instance are not supported.

Step 6 Based on the configuration files and parameter functions, enter the parameter names supported by the component in Name, and enter the parameter values in the Value column of the corresponding rows.

- You can click the add or delete icon to add or delete a customized parameter. A customized parameter can be deleted only after it has been added.
- You can click the restore icon to restore a parameter value.


Step 7 Click Save Configuration, select Restart the affected services or instances, and click OK to restart the service.

When Operation succeeded is displayed, click Finish. The service is started successfully.

----End

Task Example

Configuring Customized Hive Parameters

Hive depends on HDFS. By default, Hive accesses the HDFS client, and the configuration parameters that take effect are controlled by HDFS in a unified manner. For example, the HDFS parameter ipc.client.rpc.timeout affects the RPC timeout period for all clients that connect to the HDFS server. If you need to modify the timeout period only for Hive connections to HDFS, you can use the configuration customization function: after this parameter is added to the core-site.xml file of Hive, it is identified by the Hive service and overrides the HDFS configuration.

Step 1 On MRS Manager, choose Service > Hive > Service Configuration.

Step 2 Set Type to All.

Step 3 In the navigation tree, choose Customization at the Hive service level. The service-level customized parameters supported by Hive are displayed on MRS Manager.

Step 4 In the Name column of the core.site.customized.configs parameter for core-site.xml, enter ipc.client.rpc.timeout, and enter the new parameter value in Value. For example, enter 150000. The unit is millisecond.

Step 5 Click Save Configuration, select Restart the affected services or instances, and click OK to restart the service.

When Operation succeeded is displayed, click Finish. The service is successfully started.

----End
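After the restart, the customized parameter ends up as an ordinary property entry in Hive's copy of core-site.xml. The sketch below generates such an entry; the skeleton document is a minimal illustration, not the actual file layout on an MRS node:

```python
import xml.etree.ElementTree as ET

def add_property(config_xml: str, name: str, value: str) -> str:
    """Append a <property> entry to a Hadoop-style configuration document."""
    root = ET.fromstring(config_xml)
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")

# Minimal core-site.xml skeleton for illustration.
core_site = "<configuration></configuration>"

# The customized Hive parameter from Step 4: RPC timeout in milliseconds.
updated = add_property(core_site, "ipc.client.rpc.timeout", "150000")
print(updated)
```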

5.8.6 Synchronizing Service Configurations

Scenario

If Configuration Status of any service is Expired or Failed, users can synchronize configurations for the cluster or service to recover its configuration status. If the configuration status of all services in the cluster is Failed, synchronize the cluster configurations with the background configurations.

Impact on the System

After synchronizing a service configuration, users need to restart the service that had an expired configuration. The service is unavailable during the restart.

Procedure

Step 1 On MRS Manager, click Service.

Step 2 Select the target service from the service list.


Step 3 Click More in the upper pane, and select Synchronize Configuration from the drop-down list.

Step 4 In the dialog box that is displayed, select Restart services or instances whose configurations have expired, and click OK.

When Operation succeeded is displayed, click Finish. The service is started successfully.

----End

5.8.7 Managing Role Instances

Scenario

Users can start a role instance that is in the Stopped, Stopped_Failed, or Start_Failed state, stop an unused or abnormal role instance, or restart an abnormal role instance to recover its functions.

Procedure

Step 1 On MRS Manager, click Service.

Step 2 Select the target service from the service list.

Step 3 Click the Instance tab.

Step 4 Select the check box on the left of the target role instance.

Step 5 Choose More > Start Instance, Stop Instance, or Restart Instance to perform the required operation.

----End

5.8.8 Configuring Role Instance Parameters

Scenario

View and modify the default role instance configurations on MRS Manager. Parameters must be configured based on site requirements. Configurations can be imported and exported.

Impact on the System

After the attributes of HBase, HDFS, Hive, Spark, Yarn, and MapReduce are configured, the client configurations need to be downloaded to update the files.

Procedure

- Modify role instance configurations.

   a. On MRS Manager, click Service.
   b. Select the target service from the service list.
   c. Click the Instance tab.
   d. Click the target role instance in the role instance list.
   e. Click the Instance Configuration tab.


   f. Set Type to All. All configuration parameters of the role instance are displayed in the navigation tree.
   g. In the navigation tree, choose a parameter and change its value. You can also enter the parameter name in Search to search for the parameter and view the result.
      You can click the restore icon to restore a parameter value.
   h. Click Save Configuration, select Restart the role instance, and click OK to restart the role instance.
      When Operation succeeded is displayed, click Finish. The role instance is started successfully.

- Export configuration data of a role instance.

   a. On MRS Manager, click Service.
   b. Select a service.
   c. Click the Instance tab.
   d. Select a role instance on a specified host.
   e. Click Instance Configuration.
   f. Click Export Instance Configuration to export the configuration data of the specified role instance, and choose a path for saving the configuration file.

- Import configuration data of a role instance.

   a. Click Service.
   b. Select a service.
   c. Click the Instance tab.
   d. Select a role instance on a specified host.
   e. Click Instance Configuration.
   f. Click Import Instance Configuration to import the configuration data of the specified role instance.
   g. Click Save Configuration and select Restart the role instance. Click OK.
      When Operation succeeded is displayed, click Finish. The role instance is started successfully.

5.8.9 Synchronizing Role Instance Configuration

Scenario

When the Configuration Status of a role instance is Expired or Failed, users can synchronize the configuration data of the role instance with the background configuration.

Impact on the System

After synchronizing a role instance configuration, you need to restart the role instance that had an expired configuration. The role instance is unavailable during the restart.

Procedure

Step 1 On MRS Manager, click Service and choose a service name.


Step 2 Click the Instance tab.

Step 3 Click the target role instance in the role instance list.

Step 4 Click More in the upper pane, and select Synchronize Configuration from the drop-down list.

Step 5 In the dialog box that is displayed, select Restart services or instances whose configurations have expired, and click OK to restart the role instance.

When Operation succeeded is displayed, click Finish. The role instance is started successfully.

----End

5.8.10 Decommissioning and Recommissioning Role Instances

Scenario

If a Core node is faulty, the cluster status may become abnormal. In an MRS cluster, data can be stored on different Core nodes. Users can decommission the specified DataNode role instance of HDFS or the NodeManager role instance of Yarn on MRS Manager to stop the role instance from providing services. After fault rectification, users can recommission the DataNode or NodeManager role instance.

- If the number of DataNodes is less than or equal to the number of HDFS copies, decommissioning cannot be performed. For example, if the number of HDFS copies is three and there are fewer than four DataNodes in the system, decommissioning cannot be performed. In that case, an error is reported 30 minutes after the decommissioning starts, and MRS Manager is forced to exit the decommissioning.
- To reuse a decommissioned role instance, users must recommission and restart it.
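The replica constraint above is simple arithmetic: decommissioning is allowed only while the remaining DataNodes still outnumber or match the HDFS copy count. A hedged pre-check sketch (MRS Manager performs its own validation; this only restates the documented rule):

```python
def can_decommission(num_datanodes: int, copies: int, to_remove: int = 1) -> bool:
    """Return True if `to_remove` DataNodes can be decommissioned while keeping
    at least as many live DataNodes as there are HDFS copies of each block."""
    return num_datanodes - to_remove >= copies

# With three HDFS copies, at least four DataNodes are needed to decommission one.
print(can_decommission(4, 3))   # allowed: three DataNodes remain for three copies
print(can_decommission(3, 3))   # blocked: DataNodes <= copies
```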

Procedure

Step 1 On MRS Manager, click Service.

Step 2 In the service list, click HDFS or Yarn.

Step 3 Click the Instance tab.

Step 4 Select the check box in front of the specified DataNode or NodeManager role instance name.

Step 5 Click More, and select Decommission Role Instance or Recommission from the drop-down list.

NOTE

If the HDFS service is restarted in another browser or window while the instance decommissioning operation is in progress, MRS Manager displays a message indicating that the decommissioning is suspended and Operating Status is Started. However, the instance decommissioning is actually complete in the background. You need to decommission the instance again to synchronize the status.

----End


5.8.11 Managing a Host

Scenario

To check an abnormal or faulty host, users need to stop all roles on the host on MRS Manager. To recover host services after the host fault is rectified, restart all roles.

Procedure

Step 1 On MRS Manager, click Host.

Step 2 Select the check box of the target host.

Step 3 Choose More > Start All Roles or More > Stop All Roles to perform the required operation.

----End

5.8.12 Isolating a Host

Scenario

If a host is found to be abnormal or faulty, affecting cluster performance or preventing services from being provided, users can temporarily exclude that host from the available nodes in the cluster. In this way, the client can access other available nodes. In scenarios where patches are to be installed in a cluster, users can also exclude a specified node from patch installation.

Users can isolate a host manually on MRS Manager based on the actual service requirements or O&M plan. Only non-management nodes can be isolated.

Impact on the System

- After a host is isolated, all role instances on the host are stopped. You cannot start, stop, or configure the host or any instances on the host.
- After a host is isolated, the monitoring status and indicator data of the host hardware and instances cannot be collected or displayed.

Procedure

Step 1 On MRS Manager, click Host.

Step 2 Select the check box of the host to be isolated.

Step 3 Choose More > Isolate Host.

Step 4 In Isolate Host, click OK.

When Operation succeeded is displayed, click Finish. The host is isolated successfully, and the value of Operating Status becomes Isolated.

NOTE

The isolation of a host can be canceled and the host can be added to the cluster again. For details, see Canceling Isolation of a Host.

----End


5.8.13 Canceling Isolation of a Host

Scenario

After a host fault is rectified, users must cancel the isolation of the host so that the host can be used properly.

Users can cancel the isolation of a host on MRS Manager.

Prerequisites

- The host status is Isolated.
- The host fault has been rectified.

Procedure

Step 1 On MRS Manager, click Host.

Step 2 Select the check box of the host whose isolation you want to cancel.

Step 3 Choose More > Cancel Host Isolation.

Step 4 In Cancel Host Isolation, click OK.

When Operation succeeded is displayed, click Finish. Host isolation is canceled successfully, and the value of Operating Status becomes Normal.

Step 5 Click the name of the host for which isolation has been canceled. The status of the host is displayed. Click Start All Roles.

----End

5.8.14 Starting and Stopping a Cluster

Scenarios

A cluster is a collection of service components. Users can start or stop all services in a cluster.

Procedure

Step 1 On MRS Manager, click Service.

Step 2 Click More above the service list and choose Start Cluster or Stop Cluster from the drop-down list.

----End

5.8.15 Synchronizing Cluster Configurations

Scenarios

If Configuration Status of any service is Expired or Failed, users can synchronize configurations to recover the configuration status.


- If the configuration status of all services in the cluster is Failed, synchronize the cluster configurations with the background configurations.
- If the configuration status of some services in the cluster is Failed, synchronize the specified service configurations with the background configurations.

Impact on the System

After synchronizing cluster configurations, users need to restart the services that have expired configurations. The services are unavailable during the restart.

Procedure

Step 1 On MRS Manager, click Service.

Step 2 Click More above the service list, and choose Synchronize Configuration from the drop-down list.

Step 3 In the dialog box that is displayed, select Restart services or instances whose configurations have expired, and click OK.

When Operation succeeded is displayed, click Finish. The cluster is successfully started.

----End

5.8.16 Exporting Configuration Data of a Cluster

Scenarios

Users can export all configuration data of a cluster from MRS Manager to meet actual service requirements. The exported file can be used to rapidly update service configurations.

Procedure

Step 1 On MRS Manager, click Service.

Step 2 Click More above the service list, and choose Export Cluster Configuration from the drop-down list.

The exported file is used to update service configurations. For details, see Import service configuration parameters in Configuring Service Parameters.

----End

5.9 Log Management

5.9.1 Viewing and Exporting Audit Logs

Scenario

On MRS Manager, view and export audit logs for post-event tracing, fault cause locating, and responsibility classification of security events.

The system records the following log information:


- User activity information, such as user login and logout, and modifications to system user and system user group information
- Information about user operation instructions, such as cluster startups and shutdowns, and software upgrades

Procedure

- View the audit logs.

   a. On MRS Manager, click Audit to view the default audit logs.
      If the content of an audit log contains more than 256 characters, click the unfold button to unfold the audit details, and then click log file to download the complete log file.
      - By default, audit logs are displayed in descending order by Occurred On. You can click Operation Type, Severity, Occurred On, User, Host, Service, Instance, or Operation Result to change the display mode.
      - You can filter out all audit logs of the same severity in Severity, including both cleared and uncleared alarms.

      Exported audit logs contain the following information:
      - Sno: indicates the number of the audit log generated by MRS Manager. The number increases by 1 when a new audit log is generated.
      - Operation Type: indicates the type of user operation. User operations are classified into the following scenarios: User_Manager, Cluster, Service, Host, Alarm, Collect Log, Auditlog, Backup And Restoration, and Tenant. User_Manager is supported only by clusters with Kerberos authentication enabled. Each scenario contains different operation types. For example, Alarm contains Export alarms, Cluster contains Start Cluster, and Tenant contains Add Tenant.
      - Severity: indicates the security level of each audit log, including Critical, Major, Minor, and Information.
      - Start Time: indicates the CET or CEST time when a user operation starts.
      - End Time: indicates the CET or CEST time when a user operation ends.
      - User IP Address: indicates the IP address used by a user.
      - User: indicates the name of the user who performs the operation.
      - Host: indicates the node where a user operation is performed. The information is not saved if the operation does not involve a node.
      - Service: indicates the service on which a user operation is performed. The information is not saved if the operation does not involve a service.
      - Instance: indicates the role instance on which a user operation is performed. The information is not saved if the operation does not involve a role instance.
      - Operation Result: indicates the user operation result, including Successful, Failed, and Unknown.
      - Content: indicates execution information of the user operation.

   b. Click Advanced Search. In the audit log search area, set search criteria and click Search to view the desired audit logs. Click Reset to reset the search criteria.

      NOTE
      You can set Start Time and End Time to specify the time range when logs are generated.


- Export the audit logs.
   In the audit log list, select the check boxes of logs and click Export, or click Export All.

5.9.2 Exporting Service Logs

Scenario

Export the logs of each service role from MRS Manager.

Prerequisites

- You have obtained the Access Key ID (AK) and Secret Access Key (SK) for the corresponding account. For details, see the My Credential User Guide (My Credential > How Do I Manage User Access Keys (AK/SK)?).
- You have created a bucket in the Object Storage Service (OBS) system for the corresponding account. For details, see the Object Storage Service User Guide (Object Storage Service > Quick Start > Common Operations Using OBS Console > Creating a Bucket).

Procedure

Step 1 On MRS Manager, click System.

Step 2 Click Export Log under Maintenance.

Step 3 Click Service, set Host to the IP address of the host where the service is deployed, and set Start Time and End Time.

Step 4 In Export to, specify a path for saving logs. This parameter is available only for clusters with Kerberos authentication enabled.

- Local PC: stores logs in a local directory. If you select this option, go to Step 7.
- OBS: stores logs in the OBS system and is selected by default. If you select this option, go to Step 5.

Step 5 In OBS Path, specify the path where service logs are stored in the OBS system.

Fill in the full path. The path must not start with /. You do not need to create the path in advance because the system creates it automatically. The full OBS path contains a maximum of 900 bytes.
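The path rules in Step 5 can be expressed as a small pre-check. This is a hedged sketch of the stated constraints only, not MRS's actual validation code:

```python
def validate_obs_path(path: str) -> None:
    """Check an OBS log path against the documented rules:
    non-empty, no leading '/', and at most 900 bytes in total."""
    if not path:
        raise ValueError("OBS path must not be empty")
    if path.startswith("/"):
        raise ValueError("OBS path must not start with '/'")
    if len(path.encode("utf-8")) > 900:
        raise ValueError("OBS path exceeds the 900-byte limit")

validate_obs_path("mrs/logs/2019-01-15")   # passes silently
```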

Step 6 In Bucket Name, enter the name of the created OBS bucket. In AK and SK, enter the Access Key ID and Secret Access Key for the account.

Step 7 Click OK to export logs.

----End

5.9.3 Configuring Audit Log Dumping Parameters

Scenario

If audit logs on MRS Manager are stored in the database for a long time, the disk space for the data directory may become insufficient. Therefore, you must set dump parameters to automatically dump audit logs to a specified directory on a server.


If you do not configure the audit log dumping function, the system automatically saves audit logs to a file when their number reaches 100,000. On the active management node, the save path is ${BIGDATA_HOME}/OMSV100R001C00x8664/workspace/conf/data/operatelog, and the file name format is OperateLog_store_YY_MM_DD_HH_MM_SS.csv. A maximum of 50 historical audit log files can be saved. The directory is automatically generated when audit logs are dumped for the first time.
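The timestamped dump file name follows the documented OperateLog_store_YY_MM_DD_HH_MM_SS.csv pattern, which maps directly onto strftime format codes. A sketch for illustration only, not the actual dump code:

```python
from datetime import datetime

def dump_filename(when: datetime) -> str:
    """Build an audit-log dump file name in the documented
    OperateLog_store_YY_MM_DD_HH_MM_SS.csv format."""
    return when.strftime("OperateLog_store_%y_%m_%d_%H_%M_%S.csv")

print(dump_filename(datetime(2019, 1, 15, 9, 30, 0)))
# OperateLog_store_19_01_15_09_30_00.csv
```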

Prerequisites

- The ECS of the dump server and the Master node of the MRS cluster are deployed in the same VPC.
- The Master node can access the IP address and specific ports of the dump server.
- The SFTP service of the dump server is running properly.

Procedure

Step 1 On MRS Manager, click System.

Step 2 Click Dump Audit Log under Maintenance.

Table 5-18 Description of audit log dumping parameters

- Dump Audit Log (On / Off): Mandatory. Specifies whether to enable audit log dumping. On enables audit log dumping; Off disables it.
- Dump Mode (By quantity / By time): Mandatory. Specifies the dump mode. By quantity: the dump starts when the number of audit logs reaches the upper limit (100,000 by default). By time: the dump starts on a specified date.
- SFTP IP Address (example value: 192.168.10.51): Mandatory. Specifies the SFTP server on which the dumped audit logs are stored.
- SFTP Port (example value: 22): Mandatory. Specifies the connection port of the SFTP server on which the dumped audit logs are stored.
- Save Path (example value: /opt/omm/oms/auditLog): Mandatory. Specifies the path for storing audit logs on the SFTP server.
- SFTP Username (example value: root): Mandatory. Specifies the username for logging in to the SFTP server.
- SFTP Password (example value: Root_123): Mandatory. Specifies the password for logging in to the SFTP server.
- SFTP Public Key: Optional. Specifies the public key of the SFTP server. You are advised to set this parameter; otherwise, security risks may arise.
- Dump Date (example value: Nov 06): Mandatory. Specifies the date when the system starts dumping audit logs. This parameter is valid when Dump Mode is set to By time. The logs to be dumped include all the audit logs generated before 00:00 on January 1 of the current year.
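The two dump modes in Table 5-18 amount to a simple trigger condition, sketched below. This is an illustration of the documented rules only; in particular, whether the By time dump fires exactly on or at any point after the dump date is an assumption:

```python
from datetime import date

def should_dump(mode: str, log_count: int, today: date,
                dump_date: date, limit: int = 100_000) -> bool:
    """Decide whether an audit-log dump should start, per the two documented modes."""
    if mode == "By quantity":
        # Dump when the number of audit logs reaches the upper limit.
        return log_count >= limit
    if mode == "By time":
        # Dump on (or, assumed here, after) the configured dump date.
        return today >= dump_date
    raise ValueError(f"unknown dump mode: {mode}")

print(should_dump("By quantity", 100_000, date(2019, 11, 6), date(2019, 11, 6)))
```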

NOTE

The key fields in audit log dump files are described as follows:

- USERTYPE specifies the user type. 0 indicates a Human-machine user; 1 indicates a Machine-machine user.
- LOGLEVEL specifies the security level. 0 is critical, 1 is major, 2 is minor, and 3 is notice.
- OPERATERESULT specifies the operation result. 0 indicates that the operation succeeded; 1 indicates that the operation failed.

----End
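The numeric codes from the note above can be decoded when post-processing a dump CSV. A minimal sketch; the field names come from the note, and everything else is illustrative:

```python
# Lookup tables for the documented numeric codes in audit log dump files.
USERTYPE = {0: "Human-machine", 1: "Machine-machine"}
LOGLEVEL = {0: "critical", 1: "major", 2: "minor", 3: "notice"}
OPERATERESULT = {0: "successful", 1: "failed"}

def decode_record(usertype: int, loglevel: int, result: int) -> dict:
    """Translate one dumped audit record's numeric codes into readable labels."""
    return {
        "user_type": USERTYPE[usertype],
        "severity": LOGLEVEL[loglevel],
        "result": OPERATERESULT[result],
    }

print(decode_record(0, 1, 0))
# {'user_type': 'Human-machine', 'severity': 'major', 'result': 'successful'}
```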

5.10 Health Check Management

5.10.1 Performing a Health Check

Scenario

To ensure that cluster parameters, configurations, and monitoring are correct and that the cluster can run stably for a long time, you can perform a health check during routine maintenance.


NOTE

A system health check includes MRS Manager, service-level, and host-level health checks:

- MRS Manager health checks focus on whether the unified management platform can provide management functions.
- Service-level health checks focus on whether components can provide services properly.
- Host-level health checks focus on whether host indicators are normal.

The system health check has three types of check items: Health Status, related alarms, and customized monitoring indicators for each check object. The health check results are not always the same as the Health Status on the portal.

Procedure

- Manually perform a health check for all services.

   a. On MRS Manager, click Service.
   b. Choose More > Start Cluster Health Check to start the health check for all services.

      NOTE
      - The cluster health checks include MRS Manager, service, and host status checks.
      - To perform cluster health checks, you can also choose System > Check Health Status > Start Cluster Health Check on MRS Manager.
      - To export the health check result, click Export Report in the upper left corner.

- Manually perform a health check for a service.

   a. On MRS Manager, click Service, and click the target service in the service list.
   b. Choose More > Start Service Health Check to start the health check for the specified service.

- Manually perform a health check for a host.

   a. On MRS Manager, click Host.
   b. Select the check box of the target host.
   c. Choose More > Start Host Health Check to start the health check for the host.

- Perform an automatic health check.

   a. On MRS Manager, click System.
   b. Under Maintenance, click Check Health Status.
   c. Click Configure Health Check to configure automatic health check items.
      Periodic Health Check indicates whether to enable the automatic health check function. The Periodic Health Check switch is disabled by default; click the switch to enable the function, and select Daily, Weekly, or Monthly as required.
      Click OK to save the configuration. The message Health check configuration saved successfully is displayed in the upper-right corner.

5.10.2 Viewing and Exporting a Check Report

Scenario

You can view the health check result on MRS Manager and export it for further analysis.


NOTE

A system health check includes MRS Manager, service-level, and host-level health checks:

- MRS Manager health checks focus on whether the unified management platform can provide management functions.

- Service-level health checks focus on whether components can provide services properly.

- Host-level health checks focus on whether host indicators are normal.

The system health check has three types of check items: Health Status, related alarms, and customized monitoring indicators for each check object. The health check results are not always the same as the Health Status on the portal.

Prerequisites

You have performed a health check.

Procedure

Step 1 On MRS Manager, click Service.

Step 2 Choose More > View Cluster Health Check Report to view the health check report of the cluster.

Step 3 Click Export Report on the health check report pane to export the report and view detailed information about check items.

----End

5.10.3 Configuring the Number of Health Check Reports to Be Reserved

Scenario

Health check reports of MRS clusters, services, and hosts may vary with the time and scenario. You can modify the number of health check reports to be reserved on MRS Manager for later comparison.

This setting is valid for health check reports of clusters, services, and hosts. Report files are saved in $BIGDATA_DATA_HOME/Manager/healthcheck on the active management node by default and are automatically synchronized to the standby management node.
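The retention behavior described above can be sketched as a simple pruning routine that keeps only the newest N report files. This is an illustrative example only: the directory layout and file naming under $BIGDATA_DATA_HOME/Manager/healthcheck are assumptions, and MRS Manager performs this housekeeping itself.

```python
import os

def prune_reports(report_dir, max_reports=50):
    """Keep only the newest `max_reports` files in report_dir.

    Sketch of the retention policy described above; the file naming
    inside the healthcheck directory is an assumption, not taken
    from the product documentation.
    """
    files = [os.path.join(report_dir, f) for f in os.listdir(report_dir)]
    files = [f for f in files if os.path.isfile(f)]
    # Newest first by modification time; delete everything past the limit.
    files.sort(key=os.path.getmtime, reverse=True)
    for stale in files[max_reports:]:
        os.remove(stale)
```

The default of 50 matches the default value of Max. Number of Health Check Reports described in the procedure below.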

Prerequisites

- You have specified service requirements and planned the save time and health check frequency.

- The disk space on the active and standby management nodes is sufficient.

Procedure

Step 1 On MRS Manager, choose System > Check Health Status > Configure Health Check.

Step 2 Set Max. Number of Health Check Reports to the number of health check reports to be reserved. The value ranges from 1 to 100, and the default is 50.


Step 3 Click OK to save the configuration. The message "Health check configuration saved successfully" is displayed in the upper-right corner.

----End

5.10.4 Managing Health Check Reports

Scenario

On MRS Manager, you can manage historical health check reports, including viewing, downloading, and deleting them.

Procedure

- Download a specified health check report.

  a. Choose System > Check Health Status.
  b. Locate the row that contains the target health check report and click Download File to download the report file.

- Download specified health check reports in batches.

  a. Choose System > Check Health Status.
  b. Select multiple health check reports and click Download File to download them.

- Delete a specified health check report.

  a. Choose System > Check Health Status.
  b. Locate the row that contains the target health check report and click Delete to delete the report file.

- Delete specified health check reports in batches.

  a. Choose System > Check Health Status.
  b. Select multiple health check reports and click Delete File to delete them.

5.11 Static Service Pool Management

5.11.1 Viewing the Status of a Static Service Pool

Scenario

The big data management platform uses static service resource pools to manage and isolate service resources that are not running on Yarn. The platform dynamically manages the CPU, I/O, and memory capacity that can be used by HBase, HDFS, and Yarn on the deployment nodes. The system supports time-based automatic policy adjustment for static service resource pools, which enables a cluster to automatically adjust the parameters at different periods for more efficient resource utilization.

On MRS Manager, users can view the monitoring indicators of the resources used by each service in static service pools. The following indicators are included:

- Overall CPU usage of a service

- Overall disk I/O read rate of a service


- Overall disk I/O write rate of a service

- Overall memory used by a service

Procedure

Step 1 On MRS Manager, click System. In the Resource area, click Configure Static Service Pool.

Step 2 Click Status.

Step 3 View the system resource adjustment base.

- System Resource Adjustment Base specifies the maximum amount of resources that can be used by services on each node in the cluster. If a node runs only one service, that service exclusively uses the available resources on the node. If a node runs multiple services, they share the available resources.

- CPU(%) specifies the maximum percentage of CPU resources that can be used by services on the node.

- Memory(%) specifies the maximum memory that can be used by services on the node.
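The adjustment base percentages translate into absolute per-node resources. A minimal arithmetic sketch, using made-up node specifications (the 32-core / 256 GB node and the 90% / 70% base values are invented for illustration):

```python
def usable_resources(total_cores, total_mem_gb, cpu_pct, mem_pct):
    """Absolute CPU cores and memory that services on one node may use,
    given the System Resource Adjustment Base percentages (illustrative)."""
    return total_cores * cpu_pct / 100, total_mem_gb * mem_pct / 100

# Hypothetical node: 32 cores and 256 GB RAM, with CPU(%) = 90
# and Memory(%) = 70 as the adjustment base.
cores, mem_gb = usable_resources(32, 256, 90, 70)
```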

Step 4 View the usage of cluster service resources.

In the Real-Time Statistics area, select All Services. The resource usage of all services in the service pool is displayed in Real-Time Statistics.

NOTE

Effective Configuration Group specifies the resource control configuration group currently used by cluster services. By default, the default configuration group is used for all periods in a day. This configuration group specifies that cluster services can use all CPUs and 70% of the memory of a node.

Step 5 View the resource usage status of a single service.

In the Real-Time Statistics area, select a specified service. The resource usage of the service in the service pool is displayed in Real-Time Statistics.

Step 6 Set an interval for automatic page refreshing, or refresh the page immediately.

The following parameters are supported:

- Refresh every 30 seconds: refreshes the page once every 30 seconds.

- Refresh every 60 seconds: refreshes the page once every 60 seconds.

- Stop refreshing: stops page refreshing.

----End

5.11.2 Configuring a Static Service Pool

Scenario

Users can adjust the resource base on MRS Manager and customize a resource configuration group to control the node resources used by cluster services, or to specify different node CPUs for cluster services at different periods.

Prerequisites

- After a static service pool is configured, the HDFS and Yarn services need to be restarted. The services are unavailable during the restart.


- After a static service pool is configured, the maximum amount of resources used by the services and their role instances cannot exceed the threshold.

Procedure

Step 1 Modify the resource adjustment base.

1. On MRS Manager, click System. In the Resource area, click Configure Static Service Pool.

2. Click Configuration. The management page of the service pool configuration group is displayed.

3. In System Resource Adjustment Base, modify the CPU(%) and Memory(%) parameters. You can restrict the maximum physical CPU and memory resources that can be used by the HBase, HDFS, and Yarn services. If multiple services are deployed on the same node, the maximum percentage of physical resources used by all services cannot exceed the value of this parameter.

4. Click OK to complete the modification.

To modify the parameters again, click the edit button on the right side of System Resource Adjustment Base.

Step 2 Modify the default configuration group of the service pool.

1. Click default, and set CPU LIMIT(%), CPU SHARE(%), I/O(%), and Memory(%) for the HBase, HDFS, and Yarn services in the Service Pool Configuration table.

NOTE

- The sum of CPU LIMIT(%) used by all services can exceed 100%.

- The sum of CPU SHARE(%) and the sum of I/O(%) used by all services must each be 100%. For example, if CPU resources are allocated to the HDFS, HBase, and Yarn services, the total percentage of the CPU resources allocated to these services must be 100%.

- The sum of Memory(%) used by all services can be greater than, smaller than, or equal to 100%.

- Memory(%) cannot take effect dynamically. This parameter can only be modified in the default configuration group.

2. Click OK to complete the modification. MRS Manager generates the correct values of the service pool parameters in Detailed Configuration based on cluster hardware resources and distribution.

To modify the parameters again, click the edit button on the right side of Service Pool Configuration.

3. Click the edit button on the right side of Detailed Configuration to change the parameter values of the service pool.

After you click the name of a specified service in Service Pool Configuration, only the parameters of this service are displayed in Detailed Configuration. The displayed resource usage is not updated when you change the parameter values manually. For parameters that take effect dynamically, their names in a newly added configuration group contain the ID of the configuration group, for example, HBase : RegionServer : dynamic-config1.RES_CPUSET_PERCENTAGE. These parameters function in the same way as those in the default configuration group.


Table 5-19 Static service pool parameters

- RES_CPUSET_PERCENTAGE, dynamic-configX.RES_CPUSET_PERCENTAGE: Specifies the CPU percentage used by a service.

- RES_CPU_SHARE, dynamic-configX.RES_CPU_SHARE: Specifies the CPU share used by a service.

- RES_BLKIO_WEIGHT, dynamic-configX.RES_BLKIO_WEIGHT: Specifies the I/O weight used by a service.

- HBASE_HEAPSIZE: Specifies the maximum JVM memory of RegionServer.

- HADOOP_HEAPSIZE: Specifies the maximum JVM memory of DataNode.

- dfs.datanode.max.locked.memory: Specifies the size of the cached memory block replica of DataNode in the memory.

- yarn.nodemanager.resource.memory-mb: Specifies the memory that can be used by NodeManager on the current node.
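The percentage constraints stated in the NOTE under Step 2 can be checked mechanically before saving a configuration. A hedged sketch only: the dictionary layout below is illustrative and is not an MRS Manager API.

```python
def validate_pool(config):
    """Check the Service Pool Configuration constraints: CPU SHARE(%)
    and I/O(%) must each sum to exactly 100 across services, while
    CPU LIMIT(%) and Memory(%) are not constrained this way.

    `config` maps service name -> dict of percentages (an invented
    planning structure, not an MRS Manager object).
    """
    for key in ("cpu_share", "io"):
        total = sum(svc[key] for svc in config.values())
        if total != 100:
            raise ValueError(f"{key} sums to {total}, must be 100")

pool = {
    "HDFS":  {"cpu_share": 40, "io": 40, "cpu_limit": 80, "memory": 30},
    "HBase": {"cpu_share": 30, "io": 30, "cpu_limit": 80, "memory": 30},
    "Yarn":  {"cpu_share": 30, "io": 30, "cpu_limit": 80, "memory": 40},
}
validate_pool(pool)  # passes: both constrained columns sum to 100
```

Note that the CPU LIMIT(%) values above deliberately sum to more than 100%, which the NOTE permits.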

Step 3 Add a customized resource configuration group.

1. Determine whether to implement time-based automatic resource configuration adjustment.
   If yes, go to Step 3.2.
   If no, go to Step 4.

2. Click the add button to add a resource configuration group. In Scheduling Time, click the button to open the time policy configuration page. Modify the following parameters and click OK to save the modification.

   - Repeat: If Repeat is selected, the resource configuration group runs periodically according to a schedule. If Repeat is not selected, you need to set a date and time for the resource configuration group to take effect.

   - Repeat On: Daily, Weekly, and Monthly are supported. This parameter takes effect only in Repeat mode.

   - Between: This parameter specifies the start time and end time for the resource configuration group to take effect. Set this parameter to a unique time segment. If the value is the same as the time segment set for an existing configuration group, the settings cannot be saved. This parameter takes effect only in Repeat mode.


NOTE

- The default configuration group takes effect in all undefined time periods.

- A newly added configuration group is a set of configuration items that takes effect dynamically in a specified time range.

- A newly added configuration group can be deleted. A maximum of four configuration groups that take effect dynamically can be added.

- For any type of Repeat On, if the end time is earlier than the start time, the end time is taken to be on the following day by default. For example, 22:00 to 6:00 indicates that the scheduling time range is from 22:00 on the current day to 06:00 on the next day.

- If the types of Repeat On for multiple configuration groups are different, the time segments can overlap. Monthly has the highest priority, Weekly the second highest, and Daily the lowest. Therefore, if there are two scheduling configuration groups, one Monthly with a time segment from 04:00 to 07:00 and one Daily with a time segment from 06:00 to 08:00, the Monthly configuration group takes precedence.

- If the types of Repeat On for multiple configuration groups are the same, the time segments can overlap when the dates are different. For example, if two Weekly scheduling configuration groups exist, their time segments can be 04:00 to 07:00 on Monday and 04:00 to 07:00 on Wednesday.
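The precedence and midnight-wrap rules in this NOTE can be modeled in a few lines. This is a sketch of the scheduling logic as described, using an invented data layout; it is not the implementation MRS Manager uses.

```python
from datetime import time

# Repeat On priority per the NOTE above: Monthly beats Weekly beats Daily.
PRIORITY = {"Monthly": 3, "Weekly": 2, "Daily": 1}

def in_segment(now, start, end):
    """True if `now` falls in [start, end); a segment whose end is
    earlier than its start wraps past midnight (e.g. 22:00 to 06:00)."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end

def effective_group(groups, now):
    """Pick the configuration group active at `now`, falling back to
    'default' outside all defined segments. `groups` is a list of
    (name, repeat_on, start, end) tuples -- an illustrative model."""
    active = [g for g in groups if in_segment(now, g[2], g[3])]
    if not active:
        return "default"
    return max(active, key=lambda g: PRIORITY[g[1]])[0]
```

For example, with a Monthly group from 04:00 to 07:00 and a Daily group from 06:00 to 08:00, the Monthly group wins at 06:30, matching the example in the NOTE.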

3. Modify the resource configuration of each service in Service Pool Configuration, click OK, and go to Step 4.

You can click the edit button to modify the parameters again. You can click the edit button in Detailed Configuration to manually update the parameter values generated by the system based on service requirements.

Step 4 Save the configuration.

Click Save, select Restart the affected services or instances in the Save Configuration window, and click OK.

When Operation succeeded is displayed, click Finish.

----End

5.12 Tenant Management

5.12.1 Introduction

Definition

An MRS cluster provides various resources and services for multiple organizations, departments, or applications to share. The cluster provides tenants as a logical entity to use these resources and services. A mode involving different tenants is called multi-tenant mode. Currently, tenants are supported by analysis clusters only.

Principles

The MRS cluster provides the multi-tenant function. It supports a layered tenant model and allows dynamic adding or deleting of tenants to isolate resources. It dynamically manages and configures tenants' computing and storage resources.

The computing resources indicate tenants' Yarn task queue resources. The task queue quota can be modified, and the task queue usage status and statistics can be viewed.


Storage resources support HDFS storage. Tenants' HDFS storage directories can be added or deleted, and the quotas for file quantity and storage space of the directories can be configured.

As the unified tenant management platform of the MRS cluster, MRS Manager provides a mature multi-tenant management model for enterprises, implementing centralized tenant and service management. Users can create and manage tenants in the cluster.

- Roles, computing resources, and storage resources are automatically created when tenants are created. By default, all rights on the new computing and storage resources are assigned to the tenant roles.

- By default, the permission to view tenant resources, create sub-tenants, and manage sub-tenant resources is assigned to the tenant roles.

- After tenants' computing or storage resources are modified, the related role rights are updated automatically.

MRS Manager supports a maximum of 512 tenants. The tenants created by default in the system include the default tenant. Tenants in the topmost layer, together with the default tenant, are called level-1 tenants.

Resource Pool

Yarn task queues support only the label-based scheduling policy. This policy enables Yarn task queues to associate with NodeManagers that have specific node labels, so that Yarn tasks run on specified nodes and certain hardware resources are utilized. For example, Yarn tasks requiring a large memory capacity can run on nodes with a large memory capacity by means of label association, preventing poor service performance.

On the MRS cluster, users can logically divide Yarn cluster nodes to combine multiple NodeManagers into a resource pool. Yarn task queues can be associated with specified resource pools by configuring queue capacity policies, ensuring efficient and independent resource utilization in the resource pools.

MRS Manager supports a maximum of 50 resource pools. The system has a Default resource pool.

5.12.2 Creating a Tenant

Scenario

You can create a tenant on MRS Manager to specify the resource usage.

Prerequisites

- A tenant name has been planned. The name must not be the same as that of any role or Yarn queue that exists in the current cluster.

- If a tenant requires storage resources, a storage directory has been planned in advance based on service requirements, and the planned directory does not exist under the HDFS directory.

- The resources that can be allocated to the current tenant have been planned, and the sum of the resource percentages of direct sub-tenants under the parent tenant at every level does not exceed 100%.


Procedure

Step 1 On MRS Manager, click Tenant.

Step 2 Click Create Tenant. On the displayed page, configure tenant attributes according to the following table.

Table 5-20 Tenant parameters

- Name: Specifies the name of the current tenant. The value consists of 3 to 20 characters, and can contain letters, digits, and underscores (_).

- Tenant Type: The options include Leaf and Non-leaf. If Leaf is selected, the current tenant is a leaf tenant and no sub-tenant can be added. If Non-leaf is selected, sub-tenants can be added to the current tenant.

- Dynamic Resource: Specifies the dynamic computing resources for the current tenant. The system automatically creates a task queue in Yarn and gives the queue the same name as the tenant. If dynamic resources are not Yarn resources, the system does not automatically create a task queue.

- Default Resource Pool Capacity (%): Specifies the percentage of the computing resources used by the current tenant in the default resource pool.

- Default Resource Pool Max. Capacity (%): Specifies the maximum percentage of the computing resources used by the current tenant in the default resource pool.

- Storage Resource: Specifies the storage resources for the current tenant. The system automatically creates a folder in the /tenant directory with the same name as the tenant. When the tenant is created, the system automatically creates the /tenant directory under the root directory of HDFS. If storage resources are not HDFS, the system does not create a storage directory under the root directory of HDFS.

- Space Quota (MB): Specifies the quota for HDFS storage space used by the current tenant. The value ranges from 1 to 8796093022208, in MB. This parameter indicates the maximum HDFS storage space that can be used by the tenant, not the actual space used. If the value is greater than the size of the HDFS physical disk, the maximum space available is the full space of the HDFS physical disk.
  NOTE: To ensure data reliability, two more copies of a file are automatically generated when the file is stored in HDFS. That is, three copies of the same file are stored by default. The HDFS storage space indicates the total disk space occupied by all these copies. For example, if Space Quota is set to 500, the actual space for storing files is about 166 MB (500/3 = 166).

- Storage Path: Specifies the tenant's HDFS storage directory. The system automatically creates a folder in the /tenant directory with the same name as the tenant. For example, the default HDFS storage directory for tenant ta1 is tenant/ta1. When the tenant is created, the system automatically creates the /tenant directory under the root directory of HDFS. The storage path is customizable.

- Service: Specifies other service resources associated with the current tenant. HBase is supported. To configure this parameter, click Associate Services. In the dialog box that is displayed, set Service to HBase. If Association Mode is set to Exclusive, service resources are occupied exclusively; if Share is selected, service resources are shared.

- Description: Specifies the description of the current tenant.
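The replication arithmetic in the Space Quota note works out as follows, a worked example using HDFS's default replication factor of 3:

```python
def usable_mb(space_quota_mb, replicas=3):
    """Approximate file data that fits under an HDFS space quota,
    since the quota counts every replica (3 copies by default)."""
    return space_quota_mb // replicas

# A 500 MB quota leaves room for roughly 166 MB of file data,
# matching the example in the Space Quota note.
```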

Step 3 Click OK to save the settings.

It takes a few minutes to save the settings. If the message "Tenant created successfully" is displayed in the upper-right corner, the tenant is added successfully.

NOTE

- Roles, computing resources, and storage resources are automatically created when tenants are created.

- The new role has the rights on the computing and storage resources. The role and the rights are controlled by the system automatically and cannot be controlled manually under Manage Role.

- If you want to use the tenant, create a system user and assign the Manager_tenant role and the role corresponding to the tenant to the user. For details, see Creating a User.

----End

Related Tasks

Viewing an added tenant

Step 1 On MRS Manager, click Tenant.

Step 2 In the tenant list on the left, click the name of an added tenant.

The Summary tab is displayed on the right by default.

Step 3 View Basic Information, Resource Quota, and Statistics of the tenant.

If HDFS is in the Stopped state, Available and Usage of Space in Resource Quota are unknown.

----End

MapReduce ServiceUser Guide 5 MRS Manager Operation Guide

2019-01-15 290

Page 300: Huawei › en-us › eu-west-0-user... · 2019-12-14 · Contents 1 Overview.........................................................................................................................................1

5.12.3 Creating a Sub-tenant

Scenario

You can create a sub-tenant on MRS Manager if the resources of the current tenant need to be further allocated.

Prerequisites

- A parent tenant has been added.

- A tenant name has been planned. The name must not be the same as that of any role or Yarn queue that exists in the current cluster.

- If a sub-tenant requires storage resources, a storage directory has been planned in advance based on service requirements, and the planned directory does not exist under the storage directory of the parent tenant.

- The resources that can be allocated to the current tenant have been planned, and the sum of the resource percentages of direct sub-tenants under the parent tenant at every level does not exceed 100%.
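The last prerequisite, that direct sub-tenant percentages never sum to more than 100% at any level, can be verified while planning the tenant hierarchy. A sketch over an invented nested layout (not an MRS Manager data structure):

```python
def check_capacity(tenant):
    """Recursively verify that the Default Resource Pool Capacity (%)
    of a tenant's direct sub-tenants sums to at most 100.

    `tenant` is {"name": ..., "capacity": pct, "children": [...]} --
    an illustrative planning structure only.
    """
    children = tenant.get("children", [])
    total = sum(child["capacity"] for child in children)
    if total > 100:
        raise ValueError(
            f"sub-tenants of {tenant['name']} claim {total}% (> 100%)")
    for child in children:
        check_capacity(child)
```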

Procedure

Step 1 On MRS Manager, click Tenant.

Step 2 In the tenant list on the left, move the cursor to the tenant node to which a sub-tenant is to be added. Click Create sub-tenant. On the displayed page, configure the sub-tenant attributes according to the following table.

Table 5-21 Sub-tenant parameters

- Parent tenant: Specifies the name of the parent tenant.

- Name: Specifies the name of the current tenant. The value consists of 3 to 20 characters, and can contain letters, digits, and underscores (_).

- Tenant Type: The options include Leaf and Non-leaf. If Leaf is selected, the current tenant is a leaf tenant and no sub-tenant can be added. If Non-leaf is selected, sub-tenants can be added to the current tenant.

- Dynamic Resource: Specifies the dynamic computing resources for the current tenant. The system automatically creates a task queue in the Yarn parent tenant queue with the same name as the sub-tenant. If dynamic resources are not Yarn resources, the system does not automatically create a task queue. If the parent tenant does not have dynamic resources, the sub-tenant cannot use dynamic resources.

- Default Resource Pool Capacity (%): Specifies the percentage of the computing resources used by the current tenant. The base value is the total resources of the parent tenant.

- Default Resource Pool Max. Capacity (%): Specifies the maximum percentage of the computing resources used by the current tenant. The base value is the total resources of the parent tenant.

- Storage Resource: Specifies the storage resources for the current tenant. The system automatically creates a folder in the HDFS parent tenant directory with the same name as the sub-tenant. If storage resources are not HDFS, the system does not create a storage directory under the HDFS directory. If the parent tenant does not have storage resources, the sub-tenant cannot use storage resources.

- Space Quota (MB): Specifies the quota for HDFS storage space used by the current tenant. The minimum value is 1; the maximum value is the entire space quota of the parent tenant. The unit is MB. This parameter indicates the maximum HDFS storage space that can be used by the tenant, not the actual space used. If the value is greater than the size of the HDFS physical disk, the maximum space available is the full space of the HDFS physical disk. If this quota is greater than the quota of the parent tenant, the actual storage space will be affected by the quota of the parent tenant.
  NOTE: To ensure data reliability, two more copies of a file are automatically generated when the file is stored in HDFS. That is, three copies of the same file are stored by default. The HDFS storage space indicates the total disk space occupied by all these copies. For example, if Space Quota is set to 500, the actual space for storing files is about 166 MB (500/3 = 166).

- Storage Path: Specifies the tenant's HDFS storage directory. The system automatically creates a folder in the parent tenant directory with the same name as the sub-tenant. For example, if the sub-tenant is ta1s and the parent directory is tenant/ta1, the system sets this parameter for the sub-tenant to tenant/ta1/ta1s by default. The storage path is customizable in the parent directory. The parent directory for the storage path must be the storage directory of the parent tenant.

- Service: Specifies other service resources associated with the current tenant. HBase is supported. To configure this parameter, click Associate Services. In the dialog box that is displayed, set Service to HBase. If Association Mode is set to Exclusive, service resources are occupied exclusively; if Share is selected, service resources are shared.

- Description: Specifies the description of the current tenant.

Step 3 Click OK to save the settings.


It takes a few minutes to save the settings. When the message "Tenant created successfully" is displayed in the upper-right corner, the tenant is added successfully.

NOTE

- Roles, computing resources, and storage resources are automatically created when tenants are created.

- The new role has the rights on the computing and storage resources. The role and the rights are controlled by the system automatically and cannot be controlled manually under Manage Role.

- When using this tenant, create a system user and assign the user a related tenant role. For details, see Creating a User.

----End

5.12.4 Deleting a Tenant

Scenario

On MRS Manager, you can delete a tenant that is not required.

Prerequisites

- A tenant has been added.

- You have checked whether the tenant to be deleted has sub-tenants. If it has sub-tenants, delete them first; otherwise, the tenant cannot be deleted.

- The role of the tenant to be deleted is not associated with any user or user group. For details about how to cancel the binding between roles and users, see Modifying User Information.

Procedure

Step 1 On MRS Manager, click Tenant.

Step 2 In the tenant list on the left, move the cursor to the tenant node to be deleted, and click Delete.

The Delete Tenant dialog box is displayed. To save the tenant data, select Reserve the data of this tenant. Otherwise, the tenant storage space will be deleted.

Step 3 Click OK.

It takes a few minutes to save the configuration. After the tenant is deleted successfully, the tenant's role and storage space are deleted.

NOTE

- After the tenant is deleted, the tenant's task queue persists in Yarn.

- If you choose not to reserve data when deleting the parent tenant, data of sub-tenants is also deleted if the sub-tenants use storage resources.

----End


5.12.5 Managing a Tenant Directory

Scenario

You can manage the HDFS storage directory used by a specific tenant on MRS Manager. The management operations include adding a tenant directory, modifying the quotas for directory file quantity and storage space, and deleting a directory.

Prerequisites

A tenant with HDFS storage resources has been added.

Procedurel View a tenant directory.

a. On MRS Manager, click Tenant.b. In the tenant list on the left, click the target tenant.c. Click the Resource tab.d. View the HDFS Storage table.

n The Quota column indicates the quotas for the file and directory quantity ofthe tenant directory.

n The Space Quota column indicates the storage space size of the tenantdirectory.

l Add a tenant directory.

a. On MRS Manager, click Tenant.b. In the tenant list on the left, click the tenant whose HDFS storage directory needs to

be added.c. Click the Resource tab.d. In the HDFS Storage table, click Create Directory.

n In Parent Directory, select a storage directory of a parent tenant.This parameter is valid for sub-tenants only. If the parent tenant has multipledirectories, select any one of them.

  - Set Path to a tenant directory path.

NOTE

- If the current tenant is not a sub-tenant, the new path is created in the HDFS root directory.

- If the current tenant is a sub-tenant, the new path is created in the specified parent directory.

A complete HDFS storage path contains a maximum of 1023 characters. An HDFS directory name can contain digits, letters, spaces, and underscores (_). The name cannot start or end with a space.

  - Set Quota to the quotas for file and directory quantity. Quota is optional. Its value ranges from 1 to 9223372036854775806.

  - Set Space Quota to the storage space size of the tenant directory. The value of Space Quota ranges from 1 to 8796093022208.


NOTE

To ensure data reliability, two more copies of a file are automatically generated when the file is stored in HDFS. That is, three copies of the same file are stored by default. The HDFS storage space indicates the total disk space occupied by all these copies. For example, if Space Quota is set to 500, the actual space for storing files is about 166 MB (500/3 ≈ 166).

e. Click OK. The system creates the tenant directory in the HDFS root directory.

- Modify tenant directory attributes.

a. On MRS Manager, click Tenant.

b. In the tenant list on the left, click the tenant whose HDFS storage directory needs to be modified.

c. Click the Resource tab.

d. In the HDFS Storage table, click Modify in the Operation column of the specified tenant directory.

  - Set Quota to the quotas for file and directory quantity. Quota is optional. Its value ranges from 1 to 9223372036854775806.

  - Set Space Quota to the storage space size of the tenant directory. The value of Space Quota ranges from 1 to 8796093022208.

NOTE

To ensure data reliability, two more copies of a file are automatically generated when the file is stored in HDFS. That is, three copies of the same file are stored by default. The HDFS storage space indicates the total disk space occupied by all these copies. For example, if Space Quota is set to 500, the actual space for storing files is about 166 MB (500/3 ≈ 166).

e. Click OK.

- Delete a tenant directory.

a. On MRS Manager, click Tenant.

b. In the tenant list on the left, click the tenant whose HDFS storage directory needs to be deleted.

c. Click the Resource tab.

d. In the HDFS Storage table, click Delete in the Operation column of the specified tenant directory.

The default HDFS storage directory configured during tenant creation cannot be deleted. Only new HDFS storage directories can be deleted.

e. Click OK. The tenant directory is deleted.
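The replication note above can be checked with simple arithmetic, and the quota settings made in this dialog box correspond to standard HDFS client commands. The sketch below is an assumption about the underlying calls (MRS Manager applies them for you when you click OK), and /tenant/example_dir is a hypothetical path:

```shell
# Effective file capacity under the default 3x HDFS replication:
# a Space Quota of 500 leaves roughly 500/3 = 166 for actual file data.
SPACE_QUOTA_MB=500
REPLICATION=3
EFFECTIVE_MB=$((SPACE_QUOTA_MB / REPLICATION))
echo "Effective capacity: ${EFFECTIVE_MB} MB"

# Equivalent standard HDFS client commands (run only where a client exists):
if command -v hdfs >/dev/null 2>&1; then
    hdfs dfsadmin -setQuota 1000 /tenant/example_dir                  # file/dir count quota
    hdfs dfsadmin -setSpaceQuota "${SPACE_QUOTA_MB}m" /tenant/example_dir
    hdfs dfs -count -q /tenant/example_dir                            # verify both quotas
fi
```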

5.12.6 Recovering Tenant Data

Scenario

Tenant data is stored on MRS Manager and in cluster components by default. When components are recovered from faults or reinstalled, some tenant configuration data may be abnormal. In this case, you can manually recover the tenant data.


Procedure

Step 1 On MRS Manager, click Tenant.

Step 2 In the tenant list on the left, click a tenant node.

Step 3 Check the status of the tenant data.

1. In Summary, check the color of the circle on the left of Basic Information. Green indicates that the tenant is available, and gray indicates that the tenant is unavailable.

2. Click Resource, and check the color of the circle on the left of Yarn or HDFS Storage. Green indicates that the resource is available, and gray indicates that the resource is unavailable.

3. Click Service Association, and check the Status column of the associated service table. Good indicates that the component can provide services for the associated tenant, while Bad indicates that the component cannot.

4. If any of the preceding check items is abnormal, go to Step 4 to recover tenant data.

Step 4 Click Restore Tenant Data.

Step 5 In the Restore Tenant Data window, select one or more components whose data needs to be recovered, and click OK. The system automatically recovers the tenant data.

----End

5.12.7 Creating a Resource Pool

Scenario

On the MRS cluster, users can logically divide Yarn cluster nodes to combine multiple NodeManagers into a Yarn resource pool. Each NodeManager belongs to one resource pool only. The system contains a Default resource pool by default. All NodeManagers that are not added to customized resource pools belong to this default resource pool.

You can create a customized resource pool on MRS Manager and add hosts that have not been added to other customized resource pools to it.

Procedure

Step 1 On MRS Manager, click Tenant.

Step 2 Click the Resource Pool tab.

Step 3 Click Create Resource Pool.

Step 4 In Create Resource Pool, set the attributes of the resource pool.

- Name: Enter a name for the resource pool. The name cannot be Default. The name contains 1 to 20 characters and can consist of digits, letters, and underscores (_). However, it must not start with an underscore.

- Available Hosts: In the host list on the left, select the name of a specified host and click the add button to add the selected host to the resource pool. Only hosts in the cluster can be selected. The host list of a resource pool can be left blank.


Step 5 Click OK to save the settings.

Step 6 After the resource pool is created, users can view the Name, Members, Association Mode, vCore, and Memory in the resource pool list. Hosts that have been added to the customized resource pool are no longer members of the Default resource pool.

----End
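Resource pools are built on YARN node labels (the queue configuration table in Configuring a Queue refers to a Default Node Label Expression). As a hedged sketch of the analogous operations on a vanilla Hadoop cluster — pool_a and node-1 are hypothetical names, and the exact mapping inside MRS Manager is an assumption:

```shell
# Hypothetical analogue of creating a resource pool with YARN node labels.
POOL_NAME=pool_a
echo "creating resource pool: ${POOL_NAME}"

if command -v yarn >/dev/null 2>&1; then
    yarn rmadmin -addToClusterNodeLabels "${POOL_NAME}"        # register the label
    yarn rmadmin -replaceLabelsOnNode "node-1=${POOL_NAME}"    # assign a NodeManager host
    yarn cluster --list-node-labels                            # verify the label exists
fi
```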

5.12.8 Modifying a Resource Pool

Scenario

You can modify members of an existing resource pool on MRS Manager.

Procedure

Step 1 On MRS Manager, click Tenant.

Step 2 Click the Resource Pool tab.

Step 3 Locate the row that contains the specified resource pool, and click Modify in the Operation column.

Step 4 In Modify Resource Pool, modify Added Hosts.

- Adding a host: Select the name of a specified host in the host list on the left and click the add button to add it to the resource pool.
- Deleting a host: Select the name of a specified host in the host list on the right and click the delete button to delete it from the resource pool. The host list of a resource pool can be left blank.

Step 5 Click OK to save the settings.

----End

5.12.9 Deleting a Resource Pool

Scenario

You can delete an existing resource pool on MRS Manager.

Prerequisites

- No queue in any cluster is using the resource pool being deleted as the default resource pool. For details, see Configuring a Queue.
- Resource distribution policies of all queues have been cleared from the resource pool being deleted. For details, see Clearing the Configuration of a Queue.

Procedure

Step 1 On MRS Manager, click Tenant.

Step 2 Click the Resource Pool tab.


Step 3 Locate the row that contains the specified resource pool, and click Delete in the Operation column.

In the dialog box that is displayed, click OK.

----End

5.12.10 Configuring a Queue

Scenario

On MRS Manager, you can modify queue configurations for a specific tenant.

Prerequisites

A tenant associated with Yarn and allocated with dynamic resources has been added.

Procedure

Step 1 On MRS Manager, click Tenant.

Step 2 Click the Dynamic Resource Plan tab.

Step 3 Click the Queue Configuration tab.

Step 4 In the tenant queue table, click Modify in the Operation column of the specified tenant queue.

NOTE

Alternatively, in the tenant list on the left of the Tenant Management tab, click the target tenant. In the displayed window, choose Resource, and on the displayed page, click the modify button to open the queue configuration modification page.

Table 5-22 Queue configuration parameters

- Maximum Applications: Specifies the maximum number of applications. The value ranges from 1 to 2147483647.
- Maximum AM Resource Percent: Specifies the maximum percentage of resources that can be used to run ApplicationMaster in a cluster. The value ranges from 0 to 1.
- Minimum User Limit Percent (%): Specifies the minimum percentage of user resource usage. The value ranges from 0 to 100.
- User Limit Factor: Specifies the limit factor of the maximum user resource usage. The maximum percentage of user resource usage can be calculated by multiplying the limit factor by the percentage of the tenant's actual resource usage in the cluster. The minimum value is 0.
- State: Specifies the current status of a resource plan. The values are Running and Stopped.
- Default Resource Pool (Default Node Label Expression): Specifies the resource pool used by a queue. The default value is Default. If you want to change the resource pool, configure the queue capacity first. For details, see Configuring the Queue Capacity Policy of a Resource Pool.
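The parameters in Table 5-22 mirror the per-queue properties of the standard Hadoop CapacityScheduler. As a point of reference only — the queue name tenant_a is hypothetical, and the exact way MRS Manager stores these settings internally is an assumption — a hand-written capacity-scheduler.xml would express them as:

```xml
<!-- Hypothetical queue "tenant_a"; values are examples within the documented ranges. -->
<property>
  <name>yarn.scheduler.capacity.root.tenant_a.maximum-applications</name>
  <value>10000</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.tenant_a.maximum-am-resource-percent</name>
  <value>0.1</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.tenant_a.minimum-user-limit-percent</name>
  <value>25</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.tenant_a.user-limit-factor</name>
  <value>1</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.tenant_a.state</name>
  <value>RUNNING</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.tenant_a.default-node-label-expression</name>
  <value>pool_a</value>
</property>
```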

----End

5.12.11 Configuring the Queue Capacity Policy of a Resource Pool

Scenario

After a resource pool is added, the capacity policies of available resources need to be configured for Yarn task queues. This ensures that tasks in the resource pool are running properly. Each queue can be configured with the queue capacity policy of only one resource pool. Users can view the queues in any resource pool and configure queue capacity policies. After the queue policies are configured, Yarn task queues and resource pools are associated.

Prerequisites

- A resource pool has been added.
- The task queues are not associated with other resource pools. By default, all task queues are associated with the default resource pool.

Procedure

Step 1 On MRS Manager, click Tenant.

Step 2 Click the Dynamic Resource Plan tab.

Step 3 In Resource Pool, select a specified resource pool.

Available Resource Quota: indicates that all resources in each resource pool are available for queues by default.

Step 4 Locate the specified queue in the Resource Allocation table, and click Modify in the Operation column.

Step 5 In Modify Resource Allocation, configure the resource capacity policy of the task queue in the resource pool.

- Capacity (%): specifies the percentage of the current tenant's computing resource usage.
- Maximum Capacity (%): specifies the percentage of the current tenant's maximum computing resource usage.

Step 6 Click OK to save the settings.

----End
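Capacity (%) and Maximum Capacity (%) correspond to the CapacityScheduler's guaranteed and ceiling shares for a queue. Again as a hedged reference only — tenant_a is a hypothetical queue name, and the internal MRS mapping is an assumption:

```xml
<property>
  <name>yarn.scheduler.capacity.root.tenant_a.capacity</name>
  <value>30</value>   <!-- guaranteed share of the pool's resources, in percent -->
</property>
<property>
  <name>yarn.scheduler.capacity.root.tenant_a.maximum-capacity</name>
  <value>50</value>   <!-- upper bound the queue may grow to, in percent -->
</property>
```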


5.12.12 Clearing the Configuration of a Queue

Scenario

Users can clear the configuration of a queue on MRS Manager when the queue does not need resources from a resource pool or when a resource pool needs to be disassociated from the queue. Clearing the configuration of a queue means that the resource capacity policy of the queue is canceled.

Prerequisites

If a queue needs to be unbound from a resource pool, this resource pool cannot serve as the default resource pool of the queue. Therefore, you must first change the default resource pool of the queue to another one. For details, see Configuring a Queue.

Procedure

Step 1 On MRS Manager, click Tenant.

Step 2 Click the Dynamic Resource Plan tab.

Step 3 In Resource Pool, select a specified resource pool.

Step 4 Locate the specified queue in the Resource Allocation table, and click Clear in the Operation column.

In Clear Queue Configuration, click OK to clear the queue configuration in the current resource pool.

NOTE

If no resource capacity policy is configured for a queue, the clearance function is unavailable for the queue by default.

----End

5.13 Backup and Restoration

5.13.1 Introduction

Overview

MRS Manager provides backup and recovery for user data and system data. The backup function is provided based on components to back up Manager data (including OMS data and LdapServer data), Hive user data, component metadata saved in DBService, and HDFS metadata.

Backup and recovery tasks are performed in the following scenarios:

- Routine backup is performed to ensure the data security of the system and components.
- If the system is faulty, backup data can be used to restore the system.
- If the active cluster is completely faulty, an image cluster identical to the active cluster needs to be created. Backup data can be used to perform restoration operations.


Table 5-23 Backing up metadata

- OMS: backs up database data (excluding alarm data) and configuration data in the cluster management system.
- LdapServer: backs up user information, including the username, password, key, password policy, and group information.
- DBService: backs up metadata of the component (Hive) managed by DBService.
- NameNode: backs up HDFS metadata.

Principles

Task

Before backup or recovery, you need to create a backup or recovery task and set task parameters, such as the task name, backup data source, and type of directories for saving backup files. When Manager is used to recover the data of HDFS, Hive, and NameNode, the cluster cannot be accessed.

Each backup task can back up different data sources and generates independent backup files for each data source. All the backup files generated in each task form a backup file set, which can be used in recovery tasks. Backup files can be stored on Linux local disks, in the HDFS of the local cluster, and in the HDFS of the standby cluster. The backup task provides both full and incremental backup policies. HDFS and Hive backup tasks support the incremental backup policy, while OMS, LdapServer, DBService, and NameNode backup tasks support only the full backup policy.

NOTE

The rules for task execution are as follows:

- If a task is being executed, it cannot be executed repeatedly, and other tasks cannot be started at the same time.
- The interval at which a periodic task is automatically executed must be greater than 120s; otherwise, the task is postponed and will be executed in the next period. Manual tasks can be executed at any interval.
- When a periodic task is to be automatically executed, the current time cannot be 120s later than the task start time; otherwise, the task is postponed and will be executed in the next period.
- When a periodic task is locked, it cannot be automatically executed and needs to be manually unlocked.
- Before an OMS, LdapServer, DBService, or NameNode backup task starts, ensure that the LocalBackup partition on the active management node has more than 20 GB of available space; otherwise, the backup task cannot be started.
- When planning backup and recovery tasks, select the data you want to back up or recover according to the service logic, data storage structure, and database or table association. By default, the system creates a periodic backup task named default with an execution interval of 24 hours, which performs a full backup of OMS, LdapServer, DBService, and NameNode data to the Linux local disk.

Snapshot

The system adopts the snapshot technology to quickly back up data. Snapshots include HDFS snapshots.


An HDFS snapshot is a read-only backup of HDFS at a specified time point. It is used in data backup, misoperation protection, and disaster recovery.

The snapshot function can be enabled for any HDFS directory to create the related snapshot file. Before creating a snapshot for a directory, the system automatically enables the snapshot function for the directory. Snapshot creation does not affect HDFS operations. A maximum of 65,536 snapshots can be created for each HDFS directory.

When a snapshot has been created for an HDFS directory, the directory cannot be deleted and the directory name cannot be modified before the snapshot is deleted. Snapshots cannot be created for the upper-layer directories or subdirectories of the directory.
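The snapshot behavior described above is also exposed by the standard HDFS client. A sketch, assuming a client node and the hypothetical directory /tenant/example_dir (MRS backup tasks issue equivalent calls internally):

```shell
# Snapshots are exposed under the fixed path <dir>/.snapshot/<snapshot name>.
DIR=/tenant/example_dir
SNAP=backup-20190115
SNAP_PATH="${DIR}/.snapshot/${SNAP}"
echo "snapshot readable at: ${SNAP_PATH}"

# Standard HDFS client commands (run only where a client exists):
if command -v hdfs >/dev/null 2>&1; then
    hdfs dfsadmin -allowSnapshot "$DIR"        # enable snapshots on the directory
    hdfs dfs -createSnapshot "$DIR" "$SNAP"    # read-only point-in-time copy
    hdfs lsSnapshottableDir                    # list snapshot-enabled directories
    hdfs dfs -deleteSnapshot "$DIR" "$SNAP"    # must precede deleting/renaming the dir
fi
```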

DistCp

Distributed copy (DistCp) is a tool used to replicate large amounts of data within a cluster's HDFS or between the HDFS of different clusters. In HBase, HDFS, or Hive backup or recovery tasks, if the data is backed up in the HDFS of the standby cluster, the system invokes DistCp to perform the operation. The same version of MRS must be installed in the active and standby clusters.

DistCp uses MapReduce to implement data distribution, troubleshooting, recovery, and reporting. DistCp specifies different Map jobs for the source files and directories in the specified list. Each Map job copies the data in the partition that corresponds to the specified file in the list.

To use DistCp to replicate data between the HDFS of two clusters, configure the cross-cluster trust relationship and enable the cross-cluster replication function for both clusters.
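A cross-cluster copy of the kind DistCp performs can be sketched as follows. The source and destination NameService addresses are hypothetical, and -update restricts the MapReduce copy jobs to files that differ, giving an incremental transfer:

```shell
# Build the DistCp invocation; MRS triggers the equivalent internally when a
# backup targets the standby cluster's HDFS.
SRC=hdfs://hacluster/user/backup/src
DST=hdfs://standby-cluster/user/backup/dest
CMD="hadoop distcp -update ${SRC} ${DST}"
echo "$CMD"

if command -v hadoop >/dev/null 2>&1; then
    $CMD
fi
```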

Local quick recovery

After DistCp is used to back up the HDFS and Hive data of the local cluster to the HDFS of the standby cluster, the HDFS of the local cluster retains the backup data snapshots. Users can create local quick recovery tasks to recover data by using the snapshot files in the HDFS of the local cluster.

Specifications

Table 5-24 Backup and recovery feature specifications

- Maximum number of backup or recovery tasks: 100
- Number of concurrently running tasks: 1
- Maximum number of waiting tasks: 199
- Maximum size of backup files on a Linux local disk: 600 GB


Table 5-25 Specifications of the default task

  Item                            OMS      LdapServer   DBService   NameNode
  Backup period                   1 hour   1 hour       1 hour      1 hour
  Maximum number of backups       2        2            2           2
  Maximum size of a backup file   10 MB    20 MB        100 MB      1.5 GB
  Maximum disk space used         20 MB    40 MB        200 MB      3 GB

The backup data of all four types is saved in the Data save path/LocalBackup/ directory on the active and standby management nodes.

NOTE

The administrator must regularly transfer the backup data of the default task to an external cluster based on the enterprise's O&M requirements.

5.13.2 Backing Up Metadata

Scenario

To ensure the security of metadata, either on a routine basis or before and after performing critical metadata operations (such as capacity expansion and reduction, patch installation, upgrades, or migration), metadata must be backed up. The backup data can be used to recover the system if an exception occurs or if an operation has not achieved the expected result. This minimizes the adverse impact on services.

Metadata includes OMS data, LdapServer data, DBService data, and NameNode data. The Manager data to be backed up includes OMS data and LdapServer data.

By default, metadata backup is supported by the default task. Users can also create a backup task on MRS Manager to back up metadata. Both automatic and manual backup tasks are supported.

Prerequisites

- A standby cluster for backing up data has been created, and the network is connected. The inbound rules of the two security groups in the peer cluster have been added to the two security groups in each cluster to allow all access requests of all ECS protocols and ports in the security groups.
- The backup type, period, policy, and other specifications have been planned.
- The Data save path/LocalBackup/ directories on the active and standby management nodes have sufficient space.


Procedure

Step 1 Create a backup task.

1. On MRS Manager, choose System > Back Up Data.
2. Click Create Backup Task.

Step 2 Set backup policies.

1. Set Name to the name of the backup task.
2. Set Mode to the type of the backup task. Periodic indicates that the backup task is periodically executed, and Manual indicates that the backup task is manually executed. To create a periodic backup task, set the following parameters in addition to the preceding ones:
   - Start Time: indicates the time when the task is started for the first time.
   - Period: indicates the task execution interval. The options include By hour and By day.
   - Backup Policy: indicates the volume of data to be backed up each time the task is started. The options include Full backup at the first time and subsequent incremental backup, Full backup every time, and Full backup once every n times. When the parameter is set to Full backup once every n times, n must be specified.

Step 3 Select backup sources.

Set Configuration to OMS and LdapServer.

Step 4 Set backup parameters.

1. Set Path Type of OMS and LdapServer to a backup directory type. The following backup directory types are supported:
   - LocalDir: indicates that backup files are stored on the local disk of the active management node. The standby management node automatically synchronizes the backup files. The default save path is Data save path/LocalBackup/. If you select this value, you need to set Max. Number of Backup Copies to specify the number of backup files that can be retained in the backup directory.
   - LocalHDFS: indicates that backup files are stored in the HDFS directory of the current cluster. If you select this value, you need to set the following parameters:
     - Target Path: indicates the backup file save path in HDFS. The save path cannot be a hidden HDFS directory, such as a snapshot or recycle bin directory, or a default system directory.
     - Max. Number of Backup Copies: indicates the number of backup file sets that can be retained in the backup directory.
     - Target Instance Name: indicates the name of the NameService instance that corresponds to the backup directory. The default value is hacluster.

2. Click OK to save the settings.

Step 5 Execute the backup task.

In the Operation column of the created task in the backup task list, click More > Run to execute the backup task.

After the backup task is executed, the system automatically creates a subdirectory for each backup task in the backup directory. The subdirectory is used to save data source backup files.


The format of the subdirectory name is Backup task name_Task creation time. The format of the backup file name is Version_Data source_Task execution time.tar.gz.
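Given this naming scheme, the expected location of a backup file can be assembled with plain shell string handling. The task name, version string, and timestamps below are hypothetical examples:

```shell
# Subdirectory: <Backup task name>_<Task creation time>
# File name:    <Version>_<Data source>_<Task execution time>.tar.gz
TASK_NAME=default
CREATED=20190115101010
VERSION=8.0.0
SOURCE=OMS
EXECUTED=20190116020000

BACKUP_FILE="${TASK_NAME}_${CREATED}/${VERSION}_${SOURCE}_${EXECUTED}.tar.gz"
echo "$BACKUP_FILE"

# Recover the data source field from the file name alone:
base=${BACKUP_FILE##*/}     # strip the subdirectory
base=${base%.tar.gz}        # strip the extension
src=${base#*_}              # drop the version prefix
src=${src%_*}               # drop the execution time
echo "data source: $src"
```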

----End

5.13.3 Recovering Metadata

Scenario

Metadata needs to be recovered in the following scenarios:

- Data is modified or deleted unexpectedly and needs to be restored.
- After a critical operation (such as an upgrade or critical data adjustment) is performed on metadata components, an exception occurs or the operation does not achieve the expected result, and all modules become faulty and unavailable.
- Data is migrated to a new cluster.

Users can create a recovery task on MRS Manager to recover metadata. Only manual recovery tasks are supported.

- Data recovery can be performed only when the system version is consistent with that of the data backup.
- Before recovering data while the service is running properly, you are advised to manually back up the latest management data. Otherwise, the metadata that is generated after the data backup and before the data recovery will be lost.
- Use the OMS data and LdapServer data that are backed up at the same point in time to recover the data. Otherwise, the service and operation may fail.
- By default, the MRS cluster uses DBService to save Hive metadata.

Impact on the System

- Data generated between the backup time and the restoration time is lost.
- After the data is recovered, the configuration of the components that depend on DBService may expire, and these components need to be restarted.

Prerequisites

- The data in the OMS and LdapServer backup files has been backed up at the same time.
- The status of the OMS resources and the LdapServer instances is normal. If the status is abnormal, data recovery cannot be performed.
- The status of the cluster hosts and services is normal. If the status is abnormal, data recovery cannot be performed.
- The cluster host topologies during data recovery and data backup are the same. If the topologies are different, data recovery cannot be performed and you need to back up data again.
- The services added to the cluster during data recovery and data backup are the same. If the services are different, data recovery cannot be performed and you need to back up data again.


- The status of the active and standby DBService instances is normal. If the status is abnormal, data recovery cannot be performed.
- The upper-layer applications that depend on the MRS cluster have been stopped.
- On MRS Manager, all the NameNode role instances whose data is being recovered have been stopped, while other HDFS role instances keep running. After data is recovered, the NameNode role instances need to be restarted and cannot be accessed before the restart.
- You have checked that the NameNode backup files have been saved in the Data save path/LocalBackup/ directory on the active management node.

Procedure

Step 1 Check the location of the backup data.

1. On MRS Manager, choose System > Back Up Data.
2. In the Operation column of a specified task in the task list, click More > View History to view the records of historical backup tasks. In the window that is displayed, select a record and click View in the Backup Path column to view its backup path information. Find the following information:
   - Backup Object: indicates the data source of the backup data.
   - Backup Path: indicates the full path where the backup files are saved.
3. Select the correct item, and manually copy the full path of the backup files in Backup Path.

Step 2 Create a recovery task.

1. On MRS Manager, choose System > Restore Data.
2. Click Create Restoration Task.
3. Set Name to the name of the recovery task.

Step 3 Select recovery sources.

In Configuration, select the components whose metadata is to be recovered.

Step 4 Set recovery parameters.

1. Set Path Type to a backup directory type.
2. The settings vary according to the backup directory type:
   - LocalDir: indicates that backup files are stored on the local disk of the active management node. If you select this value, you need to set Source Path to the full path of the backup file, for example, Data path/LocalBackup/backup task name_task creation time/data source_task execution time/version_data source_task execution time.tar.gz.
   - LocalHDFS: indicates that backup files are stored in the HDFS directory of the current cluster. If you select this value, you need to set the following parameters:
     - Source Path: indicates the full path of the backup file in HDFS, for example, backup path/backup task name_task creation time/version_data source_task execution time.tar.gz.
     - Source Instance Name: indicates the name of the NameService instance that corresponds to the backup directory when the recovery task is executed. The default value is hacluster.

3. Click OK to save the settings.


Step 5 Execute the recovery task.

In the Operation column of the created task in the recovery task list, click Start to execute the recovery task.

- If the recovery is successful, the progress bar is green.
- A successfully executed recovery task cannot be executed again.
- If the recovery task fails during the first execution, rectify the fault and click Start to execute the task again.

Step 6 Determine what metadata has been recovered.

- If OMS and LdapServer metadata has been recovered, go to Step 7.
- If DBService data has been recovered, the task is complete.
- If NameNode data has been recovered, choose Service > HDFS > More > Restart Service on MRS Manager to complete the task.

Step 7 Restart Manager for the recovered data to take effect.

1. On MRS Manager, choose LdapServer > More > Restart Service, click OK, and wait for the LdapServer service to restart.

2. Log in to the active management node. For details, see Viewing Active and Standby Nodes.

3. Run the following command to restart OMS:

sh ${BIGDATA_HOME}/om-0.0.1/sbin/restart-oms.sh

The command is executed successfully if the following information is displayed:

start HA successfully.

4. On MRS Manager, choose KrbServer > More > Synchronize Configuration, deselect Restart services or instances whose configurations have expired, click OK, and wait for the KrbServer service configuration to synchronize and for the service to restart.

5. On MRS Manager, choose Service > More > Synchronize Configuration, deselect Restart services or instances whose configurations have expired, click OK, and wait for the cluster configuration to synchronize.

6. Choose Service > More > Stop Cluster. After the cluster has been stopped, choose Service > More > Start Cluster, and wait for the cluster to start.

----End

5.13.4 Modifying a Backup Task

Scenario

Modify the parameters of a created backup task on MRS Manager to meet changing service requirements. The parameters of recovery tasks can be viewed but not modified.

Impact on the System

After a backup task is modified, the new parameters take effect the next time the task is executed.


Prerequisites

- A backup task has been created.
- A new backup task policy has been planned based on the actual situation.

Procedure

Step 1 On MRS Manager, choose System > Back Up Data.

Step 2 In the task list, locate a specified task, and click Configure in the Operation column to go to the configuration modification page.

Step 3 On the page that is displayed, modify the following parameters:

l Start Time
l Period
l Target Path
l Max. Number of Backup Copies

NOTE

After the Target Path parameter of a backup task is modified, this task will be performed as a full backup task for the first time by default.

Step 4 Click OK to save the settings.

----End

5.13.5 Viewing Backup and Recovery Tasks

Scenario

On MRS Manager, view created backup and recovery tasks and check their running status.

Procedure

Step 1 On MRS Manager, click System.

Step 2 Click Back Up Data or Restore Data.

Step 3 In the task list, obtain the previous task execution result in the Task Progress column. Green indicates that the task is executed successfully, and red indicates that the execution fails.

Step 4 In the Operation column of a specified task in the task list, click More > View History to view the task execution records.

In the displayed window, click View in the Details column of a specified record to display log information about the execution.

----End

Related Tasks

l Modifying a backup task
See Modifying a Backup Task.
l Viewing a recovery task


In the task list, locate a specified task and click View task in the Operation column to view a recovery task. The parameters of recovery tasks can only be viewed but not modified.

l Executing a backup or recovery task
In the task list, locate a specified task and click More > Run or Start in the Operation column to start a backup or recovery task that is ready or fails to be executed. Executed recovery tasks cannot be executed repeatedly.

l Stopping a backup task
In the task list, locate a specified task and click More > Stop in the Operation column to stop a backup task that is running.

l Deleting a backup or recovery task
In the task list, locate a specified task and click More > Delete in the Operation column to delete a backup or recovery task. Backup data is retained by default after a task is deleted.

l Suspending a backup task
In the task list, locate a specified task and click More > Suspend in the Operation column to suspend a backup task. Only periodic backup tasks can be suspended. Suspended backup tasks are no longer executed automatically. When you suspend a backup task that is being executed, the task execution stops. To cancel the suspension of a task, click More > Resume.

5.14 Security Management

5.14.1 Default Users of Clusters with Kerberos Authentication Disabled

User Classification

The MRS cluster provides the following two types of users.

NOTE

Users are required to change their passwords periodically. Using the default passwords is not recommended.

User Type      Description

System user    A user used to run OMS system processes.

Database user  l Used to manage the OMS database and access data.
               l Used to run the databases of service components (Hive and DBService).


System Users

NOTE

l User ldap of the OS is required in the MRS cluster. Do not delete this account; otherwise, the cluster may not work properly. Password management policies are maintained by the users.

l Reset the passwords of user ommdba and user omm when you change them for the first time. Change the passwords regularly after you have retrieved them.

l MRS cluster system user: admin
  Initial password: MIG2oAMCAQGhAw@IBAaIDAgwCAQGkgZ8@wgZwwVKAHMAWgAw@IBAKFJMEgABD4gA
  Description: Default user of MRS Manager. The user is used to record the cluster audit logs.

l MRS cluster node OS user: omm
  Initial password: Randomly generated by the system
  Description: Internal running user of the MRS cluster system. This user is an OS user generated on all nodes and does not require a unified password.

User Group Information

Default User Group        Description

supergroup                Primary group of user admin. The primary group does not have additional permissions in the cluster where Kerberos authentication is disabled.

check_sec_ldap            Used to test whether the active LDAP works properly. This user group is generated randomly in a test and automatically deleted after the test is complete. This is an internal system user group used only between components.

Manager_tenant_187        Tenant system user group. This is an internal system user group used only between components.

System_administrator_186  MRS cluster system administrator group. This is an internal system user group used only between components.

Manager_viewer_183        MRS Manager system viewer group. This is an internal system user group used only between components.

Manager_operator_182      MRS Manager system operator group. This is an internal system user group used only between components.

Manager_auditor_181       MRS Manager system auditor group. This is an internal system user group used only between components.


Default User Group         Description

Manager_administrator_180  MRS Manager system administrator group. This is an internal system user group used only between components.

compcommon                 MRS cluster internal group for accessing public cluster resources. All system users and system running users are added to this user group by default.

default_1000               This group is created for tenants. This is an internal system user group used only between components.

kafka                      Kafka common user group. A user added to this user group can access a topic only when a user in the kafkaadmin group grants the read and write permission of the topic to the user.

kafkasuperuser             Kafka super user group. Users added to this user group have the read and write permission of all topics.

kafkaadmin                 Kafka administrator group. Users added to this user group have the rights to create, delete, authorize, read, and write all topics.

storm                      Users added to this user group can submit topologies and manage their own topologies.

stormadmin                 Users added to this user group have the Storm administrator rights and can submit topologies and manage all topologies.

OS User Group              Description

wheel                      Primary group of MRS internal running user omm.

ficommon                   MRS cluster common group that corresponds to compcommon for accessing public cluster resource files stored in the OS.

Database Users

MRS cluster system database users include OMS database users and DBService database users.

NOTE

Do not delete the following database users. Otherwise, the cluster or services may not work properly.


Type                Default User  Initial Password   Description

OMS database        ommdba        dbChangeMe@123456  OMS database administrator who performs maintenance operations, such as creating, starting, and stopping applications
                    omm           ChangeMe@123456    User for accessing OMS database data

DBService database  omm           dbserverAdmin@123  Administrator of the GaussDB database in the DBService component
                    hive          HiveUser@          User for Hive to connect to the DBService database
                    hue           HueUser@123        User for Hue to connect to the DBService database

5.14.2 Changing the Password for an OS User

Scenario

Periodically change the login passwords of the OS users omm, ommdba, and root of the MRS cluster nodes to improve the system O&M security.

The passwords of users omm, ommdba, and root of the nodes can be different.

Procedure

Step 1 Log in to the Master1 node, and then log in to the other nodes whose OS user passwords need to be modified.

Step 2 Run the following command to switch to user root:

sudo su - root

Step 3 Run the following command to change the password for user omm, ommdba, or root:

passwd omm/ommdba/root

For example, if you run passwd omm, the system displays the following information:

Changing password for user omm.
New password:

Enter a new password. The policy for changing the password of an OS user varies according to the OS that is being used.

Retype new password:
passwd: all authentication tokens updated successfully.


NOTE

The default password complexity requirements of MRS clusters are as follows:

l The password must contain at least eight characters.

l The password must contain at least three types of the following: lowercase letters, uppercase letters, digits, spaces, and special characters which can only be '~!@#$%^&*()-_=+\|[{}];:'",<.>/?

l The new password cannot be the same as any of the last five passwords used.
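The default rule above can be sketched as a small shell check. This is an illustrative sketch only, not MRS code: for brevity it counts any non-alphanumeric, non-space character as "special" instead of validating the exact allowed set, and it omits the password-history rule.

```shell
# Illustrative only -- not part of MRS. Checks length >= 8 and at least
# three of the five character classes named above.
check_mrs_password() {
  pw="$1"
  classes=0
  [ "${#pw}" -ge 8 ] || return 1
  printf '%s' "$pw" | grep -q '[a-z]' && classes=$((classes + 1))          # lowercase letters
  printf '%s' "$pw" | grep -q '[A-Z]' && classes=$((classes + 1))          # uppercase letters
  printf '%s' "$pw" | grep -q '[0-9]' && classes=$((classes + 1))          # digits
  printf '%s' "$pw" | grep -q ' '     && classes=$((classes + 1))          # spaces
  printf '%s' "$pw" | grep -q '[^[:alnum:] ]' && classes=$((classes + 1))  # special (simplified)
  [ "$classes" -ge 3 ]
}
```

For example, check_mrs_password 'Passw0rd' succeeds (three classes, eight characters), while check_mrs_password 'password' fails (one class).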

----End

5.14.3 Changing the Password for User admin

Periodically change the password for user admin to improve the system O&M security.

Changing the Password for User admin on the Cluster Node

Step 1 Update the client of the active management node. For details, see Updating the Client.

Step 2 Log in to the active management node.

Step 3 (Optional) If you want to change the password as user omm, run the following command to switch the user:

sudo su - omm

Step 4 Run the following command to go to the client directory:

cd /opt/client

Step 5 Run the following command to configure environment variables:

source bigdata_env

Step 6 Run the following command to change the password for user admin. This operation takes effect in the entire cluster.

kpasswd admin

Enter the old password and then enter a new password twice.

For the MRS 1.5.0 cluster, the password complexity requirements are as follows:

l The password must contain 6 to 32 characters.
l The password must contain at least two types of the following: lowercase letters, uppercase letters, digits, spaces, and special characters which can only be ~`!?,.:;-_'(){}[]/<>@#$%^&*+|\=

l The password cannot be the same as the username or reverse username.

NOTE

For MRS clusters of other versions, the password complexity requirements are as follows:

l The password must contain at least eight characters.

l The password must contain at least four types of the following: lowercase letters, uppercase letters, digits, spaces, and special characters which can only be ~`!?,.:;-_'(){}[]/<>@#$%^&*+|\=

l The password cannot be the same as the username or reverse username.

l The password cannot be the same as the previous password.
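The "reverse username" rule that recurs in these requirements simply means the password may not equal the username spelled backwards. A minimal sketch of that check (illustrative only, using the common rev utility; not an MRS tool):

```shell
# Illustrative sketch: succeed only if the password is neither the
# username nor the username reversed.
not_username() {
  user="$1"
  pw="$2"
  rev_user=$(printf '%s' "$user" | rev)   # e.g. "admin" -> "nimda"
  [ "$pw" != "$user" ] && [ "$pw" != "$rev_user" ]
}
```

For example, for user admin the passwords admin and nimda are both rejected.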

----End


Changing the Password for User admin on MRS Manager

The password of user admin can be changed on MRS Manager only for clusters with Kerberos authentication enabled and clusters with Kerberos authentication disabled but the EIP function enabled.

Step 1 Log in to MRS Manager as user admin.

Step 2 Click the username in the upper right corner and choose Change Password.

Step 3 On the Change Password page, set Old Password, New Password, and Confirm Password.

Figure 5-2 Changing the password of user admin

NOTE

The default password complexity requirements are as follows:

l The password must contain 8 to 32 characters.

l The password must contain at least three types of the following: lowercase letters, uppercase letters, digits, spaces, and special characters which can only be '~!@#$%^&*()-_=+\|[{}];:'",<.>/?

l The password cannot be the same as the username or reverse username.

Step 4 Click OK. Log in to MRS Manager again with the new password.

----End

5.14.4 Changing the Password for the Kerberos Administrator

Scenario

Periodically change the password for the Kerberos administrator kadmin of the MRS cluster to improve the system O&M security.

If the user password is changed, the OMS Kerberos administrator password is changed as well.


Prerequisites

A client has been prepared on the Master1 node.

Procedure

Step 1 Log in to the Master1 node.

Step 2 (Optional) If you want to change the password as user omm, run the following command to switch the user:

sudo su - omm

Step 3 Run the following command to go to the client directory /opt/client:

cd /opt/client

Step 4 Run the following command to configure the environment variables:

source bigdata_env

Step 5 Run the following command to change the password for kadmin/admin. The password change takes effect on all servers.

kpasswd kadmin/admin

For the MRS 1.5.0 cluster, the password complexity requirements are as follows:

l The password must contain 6 to 32 characters.

l The password must contain at least two types of the following: lowercase letters, uppercase letters, digits, spaces, and special characters which can only be ~`!?,.:;-_'(){}[]/<>@#$%^&*+|\=

l The password cannot be the same as the username or reverse username.

NOTE

For MRS clusters of other versions, the password complexity requirements are as follows:

l The password must contain at least eight characters.

l The password must contain at least four types of the following: lowercase letters, uppercase letters, digits, spaces, and special characters which can only be ~`!?,.:;-_'(){}[]/<>@#$%^&*+|\=

l The password cannot be the same as the username or reverse username.

l The password cannot be the same as the previous password.

----End

5.14.5 Changing the Password for the LDAP Administrator and the LDAP User

Scenario

Periodically change the passwords for LDAP administrator cn=root,dc=hadoop,dc=com and LDAP user cn=pg_search_dn,ou=Users,dc=hadoop,dc=com of the MRS cluster to improve the system O&M security.


Impact on the System

All services need to be restarted for the new password to take effect. Services are unavailable during the restart.

Procedure

Step 1 On MRS Manager, choose Service > LdapServer > More.

Step 2 Click Change Password.

Step 3 In the Change Password dialog box, select the user whose password you want to change from User Information.

Step 4 In the Change Password dialog box, enter the old password in Old Password and the new password in New Password and Confirm Password.

The password complexity requirements are as follows:

l The password must contain 16 to 32 characters.
l The password must contain at least three types of the following: lowercase letters, uppercase letters, digits, and special characters which can only be `~!@#$%^&*()-_=+\|[{}];:'",<.>/?
l The password cannot be the same as the username or reverse username.
l The password cannot be the same as the previous password.

Step 5 Select I have read the information and understand the impact, and click OK to confirm the password change and restart the service.

----End

5.14.6 Changing the Password for a Component Running User

Scenario

Periodically change the password for each component running user of the MRS cluster to improve the system O&M security.

If the initial password is randomly generated by the system, reset the initial password.

Impact on the System

The initial password of a component running user is randomly generated by the system and needs to be changed. After the password is changed, the MRS cluster needs to be restarted, during which services are temporarily interrupted.

Prerequisites

A client has been prepared on the Master1 node.

Procedure

Step 1 Log in to the Master1 node.


Step 2 (Optional) If you want to change the password as user omm, run the following command to switch the user:

sudo su - omm

Step 3 Run the following command to go to the client directory, for example, /opt/client.

cd /opt/client

Step 4 Run the following command to configure the environment variables:

source bigdata_env

Step 5 Run the following command to log in to the console as kadmin/admin:

kadmin -p kadmin/admin

Step 6 Run the following command to change the password of an internal system user. The password change takes effect on all servers.

cpw component running user

For example: cpw oms/manager

For the MRS 1.5.0 cluster, the password complexity requirements are as follows:

l The password must contain 6 to 32 characters.

l The password must contain at least two types of the following: lowercase letters, uppercase letters, digits, spaces, and special characters which can only be ~`!?,.:;-_'(){}[]/<>@#$%^&*+|\=

l The password cannot be the same as the username or reverse username.

NOTE

For MRS clusters of other versions, the password complexity requirements are as follows:

l The password must contain at least eight characters.

l The password must contain at least four types of the following: lowercase letters, uppercase letters, digits, spaces, and special characters which can only be ~`!?,.:;-_'(){}[]/<>@#$%^&*+|\=

l The password cannot be the same as the username or reverse username.

l The password cannot be the same as the previous password.

----End

5.14.7 Changing the Password for the OMS Database Administrator

Scenario

Periodically change the password for the OMS database administrator to ensure the system O&M security.

Procedure

Step 1 Log in to the active management node.


NOTE

The password of user ommdba cannot be changed on the standby management node; otherwise, the cluster cannot work properly. Change the password of user ommdba on the active management node only.

Step 2 Run the following command to switch the user:

sudo su - root

su - omm

Step 3 Run the following command to go to the related directory:

cd $OMS_RUN_PATH/tools

Step 4 Run the following command to change the password for user ommdba:

mod_db_passwd ommdba

Step 5 Enter the old password of user ommdba and enter a new password twice. The password change takes effect on all servers.

The password complexity requirements are as follows:

l The password must contain 16 to 32 characters.
l The password must contain at least three types of the following: lowercase letters, uppercase letters, digits, and special characters which can only be ~`!@#$%^&*()-+_=\|[{}];:",<.>/?
l The password cannot be the same as the username or reverse username.
l The password cannot be the same as the last 20 historical passwords.

If the following information is displayed, the password is changed successfully.

Congratulations, update [ommdba] password successfully.

----End

5.14.8 Changing the Password for the Data Access User of the OMS Database

Scenario

Periodically change the password for the OMS data access user to ensure the system O&M security.

Impact on the System

The OMS service needs to be restarted for the new password to take effect. The service is unavailable during the restart.

Procedure

Step 1 On MRS Manager, click System.

Step 2 In the Permission area, click Change OMS Database Password.


Step 3 Locate the row that contains user omm and click Change password in the Operation column to change the password for the OMS database user.

The password complexity requirements are as follows:

l The password must contain 8 to 32 characters.
l The password must contain at least three types of the following: lowercase letters, uppercase letters, digits, and special characters which can only be ~`!@#$%^&*()-+_=\|[{}];:",<.>/?
l The password cannot be the same as the username or reverse username.
l The password cannot be the same as the last 20 historical passwords.

Step 4 Click OK. After Operation succeeded is displayed, click Finish.

Step 5 Locate the row that contains user omm and click Restart the OMS service in the Operation column to restart the OMS database.

NOTE

If you do not restart the OMS database after changing the password, the status of user omm changes to Waiting to restart. In this state, you cannot change the password again until the OMS database is restarted.

Step 6 In the dialog box that is displayed, select I have read the information and understand the impact, click OK, and then restart the OMS service.

----End

5.14.9 Changing the Password for a Component Database User

Scenario

Periodically change the password for each component database user to improve the system O&M security.

Impact on the System

The services need to be restarted for the new password to take effect. Services are unavailable during the restart.

Procedure

Step 1 On MRS Manager, click Service and click the name of the service whose database user is to be modified.

Step 2 Determine the component database user whose password is to be changed.

l To change the password for the DBService database user, go to Step 3.
l To change the password for the Hive or Hue database user, click Stop Service to stop the service first, and then go to Step 3.

Step 3 Choose More > Change Password.

Step 4 In the displayed window, enter the old and new passwords as prompted.

The password complexity requirements are as follows:


l The password for a DBService database user must contain 16 to 32 characters; the password for a Hive or Hue database user must contain 8 to 32 characters.
l The password must contain at least three types of the following: lowercase letters, uppercase letters, digits, and special characters which can only be ~`!@#$%^&*()-+_=\|[{}];:",<.>/?

l The password cannot be the same as the username or reverse username.

l The password cannot be the same as the last 20 historical passwords.

Step 5 Click OK. The system automatically restarts the service. After Operation succeeded is displayed, click Finish.

----End

5.14.10 Replacing HA Certificates

Scenario

HA certificates are used to encrypt the communication between active/standby processes and high availability processes to ensure security. Replace the HA certificates on the active and standby management nodes on MRS Manager to ensure product security.

The certificate file and key file can be generated by the users.

Impact on the System

The MRS Manager system must be restarted during the replacement and cannot be accessed or provide services.

Prerequisites

l You have obtained the root-ca.crt root certificate file and the root-ca.pem key file of the certificate to be replaced.

l You have prepared a password, for example, Userpwd@123, for accessing the key file.

The password must meet the following complexity requirements; otherwise, security risks may be incurred:

– The password must contain at least eight characters.

– The password must contain at least four types of the following: uppercase letters, lowercase letters, digits, and special characters ~`!?,.:;-_'(){}[]/<>@#$%^&*+|\=.

Procedure

Step 1 Log in to the active management node.

Step 2 Run the following commands to switch the user:

sudo su - root

su - omm

Step 3 Run the following command to generate root-ca.crt and root-ca.pem in the ${OMS_RUN_PATH}/workspace0/ha/local/cert directory:


sh ${OMS_RUN_PATH}/workspace/ha/module/hacom/script/gen-cert.sh --root-ca --country=country --state=state --city=city --company=company --organize=organize --common-name=commonname --email=Administrator email address --password=password

For example, run the following command to generate the files:

sh ${OMS_RUN_PATH}/workspace/ha/module/hacom/script/gen-cert.sh --root-ca --country=FR --state=eur --city=pa --company=ocb --organize=IT --common-name=HADOOP.COM [email protected] --password=Userpwd@123

If the following information is displayed, the command is executed successfully:

Generate root-ca pair success.
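For readers who want to see what such a root CA pair looks like, a comparable self-signed certificate and encrypted key can be produced with plain openssl. This is an illustration only, using the example subject values and password from above; an MRS cluster must use gen-cert.sh, whose output is what OMS expects. The email field is omitted here because the example address is redacted.

```shell
# Illustration only: a self-signed root CA certificate (root-ca.crt) plus
# an encrypted private key (root-ca.pem), written to the current directory.
openssl req -x509 -newkey rsa:2048 -days 3650 \
  -keyout root-ca.pem -out root-ca.crt \
  -passout pass:Userpwd@123 \
  -subj "/C=FR/ST=eur/L=pa/O=ocb/OU=IT/CN=HADOOP.COM"
```

The -passout password encrypts the private key, which is why the replacement script later prompts for the key password.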

Step 4 On the active management node, run the following command as user omm to copy root-ca.crt and root-ca.pem to the ${BIGDATA_HOME}/om-0.0.1/security/certHA directory:

cp -arp ${OMS_RUN_PATH}/workspace0/ha/local/cert/root-ca.* ${BIGDATA_HOME}/om-0.0.1/security/certHA

Step 5 Copy root-ca.crt and root-ca.pem generated on the active management node to ${BIGDATA_HOME}/om-0.0.1/security/certHA on the standby management node as user omm.

Step 6 Run the following command to generate an HA certificate and perform automatic replacement:

sh ${BIGDATA_HOME}/om-0.0.1/sbin/replacehaSSLCert.sh

Enter the password as prompted and press Enter.

Please input ha ssl cert password:

If the following information is displayed, the HA certificate is replaced successfully:

[INFO] Succeed to replace ha ssl cert.

Step 7 Run the following command to restart OMS.

sh ${BIGDATA_HOME}/om-0.0.1/sbin/restart-oms.sh

The following information is displayed:

start HA successfully.

Step 8 Log in to the standby management node and switch to user omm. Repeat Step 6 to Step 7.

Run the sh ${BIGDATA_HOME}/om-0.0.1/sbin/status-oms.sh command to check whether HAAllResOK of the management node is Normal. Access MRS Manager again. If MRS Manager can be accessed, the operation is successful.

----End

5.14.11 Updating the Key of a Cluster

Scenario

When a cluster is created, the system automatically generates an encryption key to store the security information in the cluster (such as all database user passwords and key file access passwords) in encryption mode. After a cluster is successfully installed, you are advised to regularly update the encryption key based on the following procedure.


Impact on the System

l After a cluster key is updated, a new key is generated randomly in the cluster. This key is used to encrypt and decrypt the newly stored data. The old key is not deleted, and it is used to decrypt the old encrypted data. After security information is modified, for example, a database user password is changed, the new password is encrypted using the new key.

l When the key is updated in a cluster, the cluster must be stopped and cannot be accessed.
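The retention behavior described in the first bullet can be illustrated with plain openssl as a toy stand-in for the MRS key store: secrets written before the rotation stay decryptable with the old key, while anything written afterwards is encrypted under the new key. The key names and file names here are hypothetical.

```shell
# Toy illustration (openssl symmetric encryption, not the MRS key store).
printf 'old-secret' | openssl enc -aes-256-cbc -pbkdf2 -pass pass:oldkey -out old.bin
# -- key rotation happens here; the old key is kept for decryption only --
printf 'new-secret' | openssl enc -aes-256-cbc -pbkdf2 -pass pass:newkey -out new.bin
openssl enc -d -aes-256-cbc -pbkdf2 -pass pass:oldkey -in old.bin   # old data, old key
openssl enc -d -aes-256-cbc -pbkdf2 -pass pass:newkey -in new.bin   # new data, new key
```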

Prerequisites

You have stopped the upper-layer service applications that depend on the cluster.

Procedure

Step 1 On MRS Manager, choose Service > More > Stop Cluster.

Select I have read the information and understand the impact in the displayed window, and click OK. After Operation succeeded is displayed, click Finish. The cluster is stopped.

Step 2 Log in to the active management node.

Step 3 Run the following command to switch the user:

sudo su - omm

Step 4 Run the following command to disable logout upon timeout:

TMOUT=0

Step 5 Run the following command to switch the directory:

cd ${BIGDATA_HOME}/om-0.0.1/tools

Step 6 Run the following command to update the cluster key:

sh updateRootKey.sh

Enter y as prompted.

The root key update is a critical operation. Do you want to continue?(y/n):

If the following information is displayed, the key is updated successfully.

Step 4-1: The key save path is obtained successfully.
...
Step 4-4: The root key is sent successfully.

Step 7 On MRS Manager, choose Service > More > Start Cluster.

In the confirmation dialog box, click OK to start the cluster. After Operation succeeded is displayed, click Finish. The cluster is started.

----End


6 Management of Clusters with Kerberos Authentication Enabled

6.1 Users and Permissions of Clusters with Kerberos Authentication Enabled

Overview

l MRS Cluster Users
Indicate the security accounts of MRS Manager, including usernames and passwords. These accounts are used to access resources in MRS clusters. Each MRS cluster in which Kerberos authentication is enabled can have multiple users.

l MRS Cluster Roles
Before using resources in an MRS cluster, users must obtain the access permission. The access permission is defined by MRS cluster objects. A cluster role is a set of one or more permissions. For example, the permission to access a directory in HDFS needs to be configured in the specified directory and saved in a role.

MRS Manager provides the user permission management function for MRS clusters, facilitating permission and user management.

l Permission management: adopts the role-based access control (RBAC) mode. In this mode, permissions are granted by role, forming a permission set. After one or more roles are allocated to a user, the user can obtain the permissions of the roles.

l User management: uses MRS Manager to uniformly manage users, adopts the Kerberos protocol for user identity verification, and employs Lightweight Directory Access Protocol (LDAP) to store user information.

Permission Management

Permissions provided by MRS clusters include the O&M permissions of MRS Manager and components (such as HDFS, HBase, Hive, and Yarn). In actual applications, permissions must be assigned to each user based on service scenarios. To facilitate permission management, MRS Manager introduces the role function to allow administrators to select and assign specified permissions. Permissions are centrally viewed and managed in permission sets, enhancing user experience.

MapReduce Service User Guide
6 Management of Clusters with Kerberos Authentication Enabled
2019-01-15  Page 323


A role is a logical entity that contains one or more permissions. Permissions are assigned to roles, and users are granted the permissions by obtaining the roles.

A role can have multiple permissions, and a user can be bound to multiple roles.

- Role 1 is assigned operation permissions A and B. After role 1 is allocated to users a and b, users a and b obtain operation permissions A and B.

- Role 2 is assigned operation permission C. After role 2 is allocated to users c and d, users c and d obtain operation permission C.

- Role 3 is assigned operation permissions D and F. After role 3 is allocated to user a, user a obtains operation permissions D and F.

For example, if an MRS user is bound to the administrator role, the user is an administrator of the MRS cluster.
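The role-to-user mapping above can be sketched as a minimal RBAC model. This is an illustrative sketch only; the class and method names are hypothetical, not an MRS API:

```python
# Minimal RBAC sketch: a role bundles permissions; a user obtains the
# union of the permissions of every role bound to that user.
class Role:
    def __init__(self, name, permissions):
        self.name = name
        self.permissions = set(permissions)

class User:
    def __init__(self, name):
        self.name = name
        self.roles = []

    def bind(self, role):
        self.roles.append(role)

    def permissions(self):
        # Effective permissions = union over all bound roles.
        perms = set()
        for role in self.roles:
            perms |= role.permissions
        return perms

# Roles 1 and 3 from the example above, both allocated to user a.
role1 = Role("role1", {"A", "B"})
role3 = Role("role3", {"D", "F"})

user_a = User("a")
user_a.bind(role1)
user_a.bind(role3)
print(sorted(user_a.permissions()))  # ['A', 'B', 'D', 'F']
```

Binding user b to role1 alone would yield only permissions A and B, matching the example.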

Table 6-1 lists the roles that are created by default on MRS Manager.

Table 6-1 Default roles and description

- default: tenant role.
- Manager_administrator: Manager administrator. This role has the permission to manage MRS Manager.
- Manager_auditor: Manager auditor. This role has the permission to view and manage auditing information.
- Manager_operator: Manager operator. This role has all permissions except the tenant, configuration, and cluster management permissions.
- Manager_viewer: Manager viewer. This role has the permission to view information about systems, services, hosts, alarms, and audit logs.
- System_administrator: system administrator. This role has the permissions of Manager administrators and all service administrators.
- Manager_tenant: Manager tenant viewer. This role has the permission to view information on the Tenant page on MRS Manager.

When creating a role on MRS Manager, you can perform permission management for MRS Manager and components, as described in Table 6-2.

Table 6-2 Manager and component permission management

- Manager: Manager access and login permission.
- HBase: HBase administrator permission and the permission for accessing HBase tables and column families.
- HDFS: HDFS directory and file permission.
- Hive:
  - Hive Admin Privilege: Hive administrator permission.
  - Hive Read Write Privileges: Hive data table management permission, that is, the permission to set and manage the data of created tables.
- Hue: storage policy administrator rights.
- Yarn:
  - Cluster Admin Operations: Yarn administrator permission.
  - Scheduler Queue: queue resource management permission.

User Management

MRS clusters that support Kerberos authentication use the Kerberos protocol and LDAP for user management.

- Kerberos verifies the identity of a user when the user logs in to MRS Manager or uses a component client. Identity verification is not required for clusters with Kerberos authentication disabled.

- LDAP is used to store user information, including user records, user group information, and permission information.
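The division of labor described above, Kerberos for identity verification and LDAP for user data, can be sketched as follows. The class and function names are illustrative assumptions, not an MRS, Kerberos, or LDAP API:

```python
# Sketch of the login flow: Kerberos answers "is this user who they
# claim to be?", and LDAP answers "what do we know about this user?".
class KerberosKDC:
    def __init__(self, principals):
        self.principals = principals  # username -> password (stand-in for keys)

    def authenticate(self, user, password):
        return self.principals.get(user) == password

class LdapDirectory:
    def __init__(self, entries):
        self.entries = entries  # username -> user record (groups, etc.)

    def lookup(self, user):
        return self.entries.get(user)

def login(kdc, ldap, user, password):
    # Step 1: identity verification against the KDC.
    if not kdc.authenticate(user, password):
        return None
    # Step 2: fetch the user record and group membership from LDAP.
    return ldap.lookup(user)

kdc = KerberosKDC({"admin": "secret"})
ldap = LdapDirectory({"admin": {"groups": ["supergroup"]}})
print(login(kdc, ldap, "admin", "secret"))  # {'groups': ['supergroup']}
print(login(kdc, ldap, "admin", "wrong"))   # None
```

A failed Kerberos check short-circuits the flow, so LDAP is never consulted for an unauthenticated user.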

MRS clusters can automatically update Kerberos and LDAP user data when users are created or modified on MRS Manager. They can also automatically perform user identity verification and authentication and obtain user information when a user logs in to MRS Manager or uses a component client. This ensures the security of user management and simplifies user management tasks. MRS Manager also provides the user group function for managing one or more users by type:

- A user group is a set of users. Users in the system can exist independently or in a user group.

- After a user is added to a user group to which roles are allocated, the role permissions of the user group are assigned to the user.
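Extending the role model, group membership grants a user the union of the group's role permissions in addition to any directly bound roles. A minimal sketch with hypothetical names and permission strings:

```python
# Sketch: a user's effective permissions combine directly bound roles
# with the roles carried by the user groups the user belongs to.
def effective_permissions(user_roles, group_roles):
    """Union of permissions from direct roles and user-group roles."""
    perms = set()
    for role in list(user_roles) + list(group_roles):
        perms |= set(role)
    return perms

# Hypothetical permission sets standing in for MRS roles.
hive_role = {"hive:use"}
yarn_submit_role = {"yarn:submit"}

# A user with no direct roles, placed in a group that carries both roles:
print(sorted(effective_permissions([], [hive_role, yarn_submit_role])))
# ['hive:use', 'yarn:submit']
```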

The following table lists the user groups that are created by default on MRS Manager.

Table 6-3 Default user groups and description

- hadoop: users added to this user group have the permission to submit tasks to all Yarn queues.
- hbase: common user group. Users added to this user group do not have any additional permission.
- hive: users added to this user group can use Hive.
- spark: common user group. Users added to this user group do not have any additional permission.
- supergroup: users added to this user group have the administrator rights of HBase, HDFS, and Yarn and can use Hive.
- kafka: Kafka common user group. A user added to this user group can access a topic only after a user in the kafkaadmin group grants the read and write permission of the topic to the user.
- kafkasuperuser: Kafka super user group. Users added to this user group have the read and write permission of all topics.
- kafkaadmin: Kafka administrator group. Users added to this user group have the rights to create, delete, authorize, read, and write all topics.
- storm: users added to this user group can submit topologies and manage their own topologies.
- stormadmin: users added to this user group have the Storm administrator rights and can submit topologies and manage all topologies.

User admin is created by default for MRS clusters with Kerberos authentication enabled and is used by administrators to maintain the clusters.

Process Overview

In practice, administrators must understand the service scenarios of MRS clusters and plan user permissions. Then, they create roles and assign permissions to the roles on MRS Manager to meet the service requirements. Administrators can create user groups on MRS Manager to manage users in one or more service scenarios of the same type.

NOTE

If a role has the permission of HDFS, HBase, Hive, or Yarn, the role can use the corresponding functions of the component. To use MRS Manager, the corresponding Manager permission must be added to the role.


Figure 6-1 Process of creating a user

6.2 Default Users of Clusters with Kerberos Authentication Enabled

User Classification

The MRS cluster provides the following three types of users. Users are required to change their passwords periodically; using the default passwords is not recommended.


- System user:
  - A user created on MRS Manager for MRS cluster O&M and service scenarios. There are two types:
    - Human-machine user: used in MRS Manager O&M scenarios and component client operation scenarios.
    - Machine-machine user: used in MRS cluster application development scenarios.
  - A user used to run OMS system processes.
- Internal system user: an internal user provided by the MRS cluster to implement communication between processes, save user group information, and associate user rights.
- Database user:
  - A user used to manage the OMS database and access data.
  - A user used to run the databases of service components (Hive, Hue, and DBService).

System Users

NOTE

- User ldap of the OS is required in the MRS cluster. Do not delete this account; otherwise, the cluster may not work properly. Password management policies are maintained by the users.

- Reset the passwords of user ommdba and user omm when you change them for the first time. Change the passwords regularly after you have retrieved them.

MRS cluster system user:

- admin (initial password specified by the user when the cluster is created): administrator of MRS Manager. This user also has the following rights:
  - Common HDFS and ZooKeeper user rights.
  - Rights to submit and query MapReduce and Yarn tasks, to manage Yarn queues, and to access the Yarn WebUI.
  - Rights to submit, query, activate, deactivate, reassign, and delete topologies, and to operate all topologies, of the Storm service.
  - Rights to create, delete, authenticate, reassign, consume, write, and query topics of the Kafka service.

MRS cluster node OS users:

- ommdba (initial password randomly generated by the system): user who creates the MRS cluster system database. This user is an OS user generated on the management nodes and does not require a unified password.
- omm (initial password randomly generated by the system): internal running user of the MRS cluster system. This user is an OS user generated on all nodes and does not require a unified password.

User for running MRS cluster jobs:

- yarn_user (initial password randomly generated by the system): internal user used to run MRS cluster jobs. This user is generated on Core nodes.

Internal System Users

NOTE

Do not delete the following internal system users. Otherwise, the cluster or services may not work properly.

- Kerberos administrator: kadmin/admin (initial password Admin@123). Account used to add, delete, modify, and query users on Kerberos.
- OMS Kerberos administrator: kadmin/admin (initial password Admin@123). Account used to add, delete, modify, and query users on OMS Kerberos.
- LDAP administrator: cn=root,dc=hadoop,dc=com (initial password LdapChangeMe@123). Account used to add, delete, modify, and query user information on LDAP.
- OMS LDAP administrator: cn=root,dc=hadoop,dc=com (initial password LdapChangeMe@123). Account used to add, delete, modify, and query user information on OMS LDAP.
- LDAP user: cn=pg_search_dn,ou=Users,dc=hadoop,dc=com (initial password pg_search_dn@123). User used to query information about users and user groups on LDAP.
- OMS LDAP user: cn=pg_search_dn,ou=Users,dc=hadoop,dc=com (initial password pg_search_dn@123). User used to query information about users and user groups on OMS LDAP.


- LDAP administrator accounts:
  - cn=krbkdc,ou=Users,dc=hadoop,dc=com (initial password LdapChangeMe@123): account used to query Kerberos component authentication account information.
  - cn=krbadmin,ou=Users,dc=hadoop,dc=com (initial password LdapChangeMe@123): account used to add, delete, or query Kerberos component authentication account information.
- User for querying the MRS cluster: executor (initial password randomly generated by the system). User used to query clusters with Kerberos authentication enabled on the MRS management console.

Component running users:

- hdfs (initial password Hdfs@123): HDFS system administrator, with the following permissions:
  1. File system operation permissions:
     - Views, modifies, and creates files.
     - Views and creates directories.
     - Views and modifies the groups that files belong to.
     - Views and sets the disk quotas of users.
  2. HDFS management operation permissions:
     - Views the WebUI status.
     - Views and sets the active and standby HDFS status.
     - Enters and exits HDFS safe mode.
     - Checks HDFS.
- hbase (initial password Hbase@123): HBase system administrator, with the following permissions:
  - Cluster management permission: enables and disables tables, and triggers MajorCompact and Access Control List (ACL) operations.
  - Grants and revokes permissions, and shuts down the cluster.
  - Table management permission: creates, modifies, and deletes tables.
  - Data management permission: reads and writes table-, column family-, and column-level data.
  - Accesses the HBase WebUI.
- mapred (initial password Mapred@123): MapReduce system administrator, with the following permissions:
  - Submits, stops, and views MapReduce tasks.
  - Modifies Yarn configuration parameters.
  - Accesses the Yarn and MapReduce WebUIs.
- spark (initial password Spark@123): Spark system administrator, with the following permissions:
  - Accesses the Spark WebUI.
  - Submits Spark tasks.
- oms/manager (initial password randomly generated by the system): Controller and NodeAgent authentication user, with the permissions of supergroup.
- backup/manager (initial password randomly generated by the system): user who runs backup and recovery tasks, with the permissions of supergroup.

- hdfs/hadoop.hadoop.com (initial password randomly generated by the system): HDFS system startup user, with the same file system operation permissions and HDFS management operation permissions as the hdfs user above.
- mapred/hadoop.hadoop.com (initial password randomly generated by the system): MapReduce system startup user, who submits, stops, and views MapReduce tasks and modifies Yarn configuration parameters.
- mr_zk/hadoop.hadoop.com (initial password randomly generated by the system): user used for MapReduce to access ZooKeeper.
- hbase/hadoop.hadoop.com (initial password randomly generated by the system): user used for authentication between internal components during HBase system startup.
- hbase/zkclient.hadoop.com (initial password randomly generated by the system): user used for HBase to perform ZooKeeper authentication in a cluster in security mode.

- thrift/hadoop.hadoop.com (initial password randomly generated by the system): ThriftServer system startup user.
- thrift/<hostname> (initial password randomly generated by the system): user used for the ThriftServer system to access HBase. This user has the permission to read, write, execute, create, and manage all HBase namespaces and tables. <hostname> specifies the host name of the node where ThriftServer is installed.
- hive/hadoop.hadoop.com (initial password randomly generated by the system): user used for authentication between internal components during Hive system startup, with the following permissions:
  1. Hive administrator permissions: creates, deletes, and modifies databases; creates, queries, modifies, and deletes tables; queries, inserts, and loads data.
  2. HDFS file operation permissions: views, modifies, and creates files; views and creates directories; views and modifies the groups that files belong to.
  3. Submits and stops MapReduce jobs.
- spark/hadoop.hadoop.com (initial password randomly generated by the system): Spark system startup user.
- spark_zk/hadoop.hadoop.com (initial password randomly generated by the system): user used for Spark to access ZooKeeper.

- zookeeper/hadoop.hadoop.com (initial password randomly generated by the system): ZooKeeper system startup user.
- zkcli/hadoop.hadoop.com (initial password randomly generated by the system): ZooKeeper server login user.
- kafka/hadoop.hadoop.com (initial password randomly generated by the system): user used for Kafka security authentication.
- storm/hadoop.hadoop.com (initial password randomly generated by the system): Storm system startup user.
- storm_zk/hadoop.hadoop.com (initial password randomly generated by the system): user for the Worker process to access ZooKeeper.
- loader/hadoop.hadoop.com (initial password randomly generated by the system): user for Loader system startup and Kerberos authentication.
- HTTP/<hostname> (initial password randomly generated by the system): used to connect to the HTTP interface of each component. <hostname> indicates the name of a node in the cluster.
- flume (initial password randomly generated by the system): user for Flume system startup and for access to HDFS and Hive. This user has read and write permission on the HDFS directory /flume.
- check_ker_M and K/M (initial passwords randomly generated by the system): Kerberos internal functional users. These users cannot be deleted, and their passwords cannot be changed. These internal accounts cannot be used on nodes where the Kerberos service is not installed.

- kadmin/changepw, kadmin/history, and krbtgt/HADOOP.COM (initial passwords randomly generated by the system): Kerberos internal functional users. These users cannot be deleted, and their passwords cannot be changed. These internal accounts cannot be used on nodes where the Kerberos service is not installed.

User Group Information

Default user groups:

- hadoop: users added to this user group have the permission to submit tasks to all Yarn queues.
- hbase: common user group. Users added to this user group do not have any additional rights.
- hive: users added to this user group can use Hive.
- spark: common user group. Users added to this user group do not have any additional rights.
- supergroup: users added to this user group have the administrator rights of HBase, HDFS, and Yarn and can use Hive.
- check_sec_ldap: used to test whether the active LDAP works properly. This user group is generated randomly during a test and automatically deleted after the test is complete. This is an internal system user group used only between components.
- Manager_tenant_187: tenant system user group. This is an internal system user group used only between components.
- System_administrator_186: MRS cluster system administrator group. This is an internal system user group used only between components.
- Manager_viewer_183: MRS Manager system viewer group. This is an internal system user group used only between components.
- Manager_operator_182: MRS Manager system operator group. This is an internal system user group used only between components.
- Manager_auditor_181: MRS Manager system auditor group. This is an internal system user group used only between components.
- Manager_administrator_180: MRS Manager system administrator group. This is an internal system user group used only between components.
- compcommon: MRS cluster internal group for accessing public cluster resources. All system users and system running users are added to this user group by default.
- default_1000: group created for tenants. This is an internal system user group used only between components.
- kafka: Kafka common user group. A user added to this user group can access a topic only after a user in the kafkaadmin group grants the read and write permission of the topic to the user.
- kafkasuperuser: Kafka super user group. Users added to this user group have the read and write permission of all topics.
- kafkaadmin: Kafka administrator group. Users added to this user group have the rights to create, delete, authorize, read, and write all topics.
- storm: users added to this user group can submit topologies and manage their own topologies.
- stormadmin: users added to this user group have the Storm administrator rights and can submit topologies and manage all topologies.

OS user groups:

- wheel: primary group of the MRS internal running user omm.
- ficommon: MRS cluster common group that corresponds to compcommon, used to access public cluster resource files stored in the OS.

Database Users

MRS cluster system database users include OMS database users and DBService database users.

NOTE

Do not delete the following database users. Otherwise, the cluster or services may not work properly.


OMS database:

- ommdba (initial password dbChangeMe@123456): OMS database administrator who performs maintenance operations, such as creating, starting, and stopping applications.
- omm (initial password ChangeMe@123456): user used for accessing OMS database data.

DBService database:

- omm (initial password dbserverAdmin@123): administrator of the GaussDB database in the DBService component.
- hive (initial password HiveUser@): user used for Hive to connect to the DBService database.
- hue (initial password HueUser@123): user used for Hue to connect to the DBService database.
- sqoop (initial password SqoopUser@): user used for Loader to connect to the DBService database.

6.3 Creating a Role

Scenario

This section describes how to create a role on MRS Manager and use it to authorize and manage MRS Manager and components.

Up to 1000 roles can be created on MRS Manager.

Prerequisites

You have learned the service requirements.

Procedure

Step 1 On MRS Manager, choose System > Manage Role.

Step 2 Click Create Role and fill in Role Name and Description.

Role Name is mandatory and contains 3 to 30 characters, consisting of digits, letters, and underscores (_). Description is optional.
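The naming rule above can be expressed as a simple pattern check. This is an illustrative sketch of the stated rule, not an MRS API:

```python
import re

# Role Name: 3 to 30 characters drawn from digits, letters, and underscores.
ROLE_NAME_PATTERN = re.compile(r"^[A-Za-z0-9_]{3,30}$")

def is_valid_role_name(name):
    """Return True if the name satisfies the documented naming rule."""
    return ROLE_NAME_PATTERN.fullmatch(name) is not None

print(is_valid_role_name("Manager_operator"))  # True
print(is_valid_role_name("ab"))                # False: too short
print(is_valid_role_name("role-name"))         # False: '-' is not allowed
```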

Step 3 In Permission, set the role permissions.

1. Click Service Name and select a name in View Name.

2. Select one or more permissions.


NOTE

- The Permission parameter is optional.

- If you select View Name to set component permissions, you can enter a resource name in the Search box in the upper-right corner and click the search button. The search result is displayed.

- The search scope covers only directories for which you currently have permission. You cannot search subdirectories. Keyword search supports fuzzy match and is case-insensitive. Results on the next page can also be searched.

Table 6-4 Manager permission description

- Alarm: authorizes the Manager alarm function. You can select View to view alarms and Management to manage alarms.
- Audit: authorizes the Manager audit log function. You can select View to view audit logs and Management to manage audit logs.
- Dashboard: authorizes the Manager overview function. You can select View to view the cluster overview.
- Hosts: authorizes the node management function. You can select View to view node information and Management to manage nodes.
- Services: authorizes the service management function. You can select View to view service information and Management to manage services.
- System_cluster_management: authorizes the MRS cluster management function. You can select Management to use the MRS patch management function.
- System_configuration: authorizes the MRS cluster configuration function. You can select Management to configure MRS clusters on Manager.
- System_task: authorizes the MRS cluster task function. You can select Management to manage periodic tasks of MRS clusters on Manager.
- Tenant: authorizes the Manager multi-tenant management function. You can select Management to view the Manager tenant management page.

Table 6-5 HBase permission description

- SUPER_USER_GROUP: grants you HBase administrator rights.
- Global: HBase resource type, indicating the whole HBase.
- Namespace: HBase resource type, indicating a namespace, which is used to store HBase tables. It has the following permissions:
  - Admin: permission to manage the namespace.
  - Create: permission to create HBase tables in the namespace.
  - Read: permission to access the namespace.
  - Write: permission to write data to the namespace.
  - Execute: permission to execute the coprocessor (Endpoint).
- Table: HBase resource type, indicating a data table, which is used to store data. It has the following permissions:
  - Admin: permission to manage a data table.
  - Create: permission to create column families and columns in a data table.
  - Read: permission to read a data table.
  - Write: permission to write data to a data table.
  - Execute: permission to execute the coprocessor (Endpoint).
- ColumnFamily: HBase resource type, indicating a column family, which is used to store data. It has the following permissions:
  - Create: permission to create columns in a column family.
  - Read: permission to read a column family.
  - Write: permission to write data to a column family.
- Qualifier: HBase resource type, indicating a column, which is used to store data. It has the following permissions:
  - Read: permission to read a column.
  - Write: permission to write data to a column.

Permissions of an HBase resource type at each level are shared by its lower-level resources by default. For example, if the Read and Write permissions are added to the default namespace, they are automatically added to the tables, column families, and columns in the namespace.
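This top-down inheritance can be sketched as follows. The resource paths and helper names are illustrative assumptions, not an HBase or MRS API:

```python
# Sketch: permissions granted on an HBase-style resource hierarchy
# propagate downward (namespace -> table -> column family -> column).
grants = {}  # resource path tuple -> set of permissions

def grant(path, perms):
    grants.setdefault(tuple(path), set()).update(perms)

def effective(path):
    """A resource inherits every grant made on itself or an ancestor."""
    perms = set()
    for i in range(len(path) + 1):
        perms |= grants.get(tuple(path[:i]), set())
    return perms

# Grant Read/Write on the "default" namespace...
grant(["default"], {"Read", "Write"})
# ...and the permissions apply to the tables and column families beneath it.
print(sorted(effective(["default", "t1", "cf1"])))  # ['Read', 'Write']
```

A grant made at the table level would, by the same rule, reach only that table's column families and columns, not sibling tables.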


Table 6-6 HDFS permission description

- Folder: HDFS resource type, indicating an HDFS directory, which is used to store files or subdirectories. It has the following permissions:
  - Read: permission to access the HDFS directory.
  - Write: permission to write data to the HDFS directory.
  - Execute: permission to perform operations. It must be selected when you add the access or write permission.
- Files: HDFS resource type, indicating a file in HDFS. It has the following permissions:
  - Read: permission to access the file.
  - Write: permission to write data to the file.
  - Execute: permission to perform operations. It must be selected when you add the access or write permission.

Permissions of an HDFS directory at each level are not shared by its subdirectories by default. For example, if the Read and Execute permissions are added to the tmp directory, you must also select Recursive to add the permissions to its subdirectories.
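In contrast to the HBase case, an HDFS-style grant reaches subdirectories only when it is applied recursively. A sketch with hypothetical helper names, not an HDFS API:

```python
# Sketch: an HDFS-style grant applies only to the named directory
# unless it is explicitly applied recursively to the subtree.
tree = {"/tmp": ["/tmp/a", "/tmp/a/b"]}  # directory -> all descendants
grants = {}

def grant(path, perms, recursive=False):
    grants.setdefault(path, set()).update(perms)
    if recursive:
        for child in tree.get(path, []):
            grants.setdefault(child, set()).update(perms)

grant("/tmp", {"Read", "Execute"})               # non-recursive
print(grants.get("/tmp/a", set()))               # set(): subdirectory untouched

grant("/tmp", {"Read", "Execute"}, recursive=True)
print(sorted(grants["/tmp/a"]))                  # ['Execute', 'Read']
```

The second call corresponds to selecting Recursive in the Permission table: the same grant now covers every descendant directory.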

Table 6-7 Hive permission description

- Hive Admin Privilege: grants you Hive administrator rights.
- Database: Hive resource type, indicating a Hive database, which is used to store Hive tables. It has the following permissions:
  - Select: permission to query the Hive database.
  - Delete: permission to perform deletion operations in the Hive database.
  - Insert: permission to perform insertion operations in the Hive database.
  - Create: permission to perform creation operations in the Hive database.


- Table: Hive resource type, indicating a Hive table, which is used to store data. It has the following permissions:
  - Select: permission to query the Hive table.
  - Delete: permission to perform deletion operations in the Hive table.
  - Update: grants users the Update permission on the Hive table.
  - Insert: permission to perform insertion operations in the Hive table.
  - Grant of Select: permission to grant the Select permission to other users using Hive statements.
  - Grant of Delete: permission to grant the Delete permission to other users using Hive statements.
  - Grant of Update: permission to grant the Update permission to other users using Hive statements.
  - Grant of Insert: permission to grant the Insert permission to other users using Hive statements.

Permissions of a Hive resource type at each level are shared by its sub-level resource types by default. For example, if the Select and Insert permissions are added to the default database, they are automatically added to the tables and columns in the database.

Table 6-8 Yarn permission description

- Cluster Admin Operations: grants you Yarn administrator rights.
- root: root queue of Yarn. It has the following permissions:
  - Submit: permission to submit jobs in the queue.
  - Admin: permission to manage the permissions of the current queue.
- Parent Queue: Yarn resource type, indicating a parent queue that contains sub-queues. A root queue is a type of parent queue. It has the following permissions:
  - Submit: permission to submit jobs in the queue.
  - Admin: permission to manage the permissions of the current queue.


- Leaf Queue: Yarn resource type, indicating a leaf queue. It has the following permissions:
  - Submit: permission to submit jobs in the queue.
  - Admin: permission to manage the permissions of the current queue.

Permissions of a Yarn resource type of each level are shared by resource types of sub-levels by default. For example, if the Submit permission is added to the root queue, it is automatically added to the sub-queues. Permissions inherited by sub-queues are not displayed as selected in the Permission table.

Table 6-9 Hue permission description

- Storage Policy Admin: grants you storage policy administrator rights.

Step 4 Click OK. Return to Manage Role.

----End

Related Tasks

Modifying a role

Step 1 On MRS Manager, click System.

Step 2 In the Permission area, click Manage Role.

Step 3 In the row of the role to be modified, click Modify to modify role information.

NOTE

If you change the permissions assigned to the role, it takes about 3 minutes for the new configuration to take effect.

Step 4 Click OK. The modification is complete.

----End

Deleting a role

Step 1 On MRS Manager, click System.

Step 2 In the Permission area, click Manage Role.

Step 3 In the row of the role to be deleted, click Delete.

Step 4 Click OK. The role is deleted.

----End


6.4 Creating a User Group

Scenario

This section describes how to create user groups on MRS Manager and specify their operation permissions. User groups allow single or multiple users to be managed in a unified way. After being added to a user group, a user obtains the operation permissions owned by that group.

Up to 100 user groups can be created on MRS Manager.

Prerequisites

Administrators have learned the service requirements and created the roles required by the service scenarios.

Procedure

Step 1 On MRS Manager, click System.

Step 2 In the Permission area, click Manage User Group.

Step 3 Above the user group list, click Create User Group.

Step 4 Set Group Name and Description.

Group Name is mandatory and can contain 3 to 20 characters, including digits, letters, and underscores (_). Description is optional.

Step 5 In Role, click Select and Add Role to select and add specified roles.

If you do not add any role, the user group being created does not have permission to use MRS clusters.

Step 6 Click OK. The user group is created.

----End

Related Tasks

Modifying a user group

Step 1 On MRS Manager, click System.

Step 2 In the Permission area, click Manage User Group.

Step 3 In the row of a user group to be modified, click Modify.

NOTE

If you change the role permissions assigned to the user group, it takes about 3 minutes for the new configuration to take effect.

Step 4 Click OK. The modification is complete.

----End


Deleting a user group

Step 1 On MRS Manager, click System.

Step 2 In the Permission area, click Manage User Group.

Step 3 In the row of the user group to be deleted, click Delete.

Step 4 Click OK. The user group is deleted.

----End

6.5 Creating a User

Scenario

This section describes how to create users on MRS Manager based on site requirements and specify their operation permissions to meet service requirements.

Up to 1000 users can be created on MRS Manager.

Prerequisites

Administrators have learned the service requirements and created the roles and user groups required by the service scenarios.

Procedure

Step 1 On MRS Manager, click System.

Step 2 In the Permission area, click Manage User.

Step 3 Above the user list, click Create User.

Step 4 Configure parameters as prompted and enter a username in Username.

NOTE

- If a username exists, you cannot create another username that differs from the existing username only in case. For example, if User1 has been created, you cannot create user1.
- When you use the created user, enter the correct username, which is case-sensitive.
- Username is mandatory and can contain 3 to 20 characters, including digits, letters, and underscores (_).
- root, omm, and ommdba are reserved system users. Select another username.

Step 5 Set User Type to either Human-machine or Machine-machine.

- Human-machine users: used for O&M on MRS Manager and operations on component clients. If you select this user type, you need to enter a password and confirm it in Password and Confirm Password.
- Machine-machine users: used for MRS application development. If you select this user type, you do not need to enter a password, because the password is randomly generated.

Step 6 In User Group, click Select and Join User Group to select user groups and add the user to them.


NOTE

- If roles have been added to a user group, users in the group are granted the permissions of those roles.
- If you want to grant new users Hive permissions, add the users to the Hive group.
- If you want to manage tenant resources, assign the Manager_tenant role and the role corresponding to the tenant to the user group.

Step 7 In Primary Group, select a group as the primary group for the user to create directories and files. The drop-down list contains all groups selected in User Group.

Step 8 In Assign Rights by Role, click Select and Add Role to add roles for the user based on onsite service requirements.

NOTE

- When you create a user, if the permissions of the user groups granted to the user cannot meet service requirements, you can assign other created roles to the user. It takes about 3 minutes for role permissions granted to a new user to take effect.
- Adding roles when creating a user lets you specify the user's rights directly.
- A new user can access the web UIs of HDFS, HBase, Yarn, Spark, and Hue even when no role is assigned to the user.

Step 9 In Description, provide a description based on onsite service requirements.

Description is optional.

Step 10 Click OK. The user is created.

If a new user is used in the MRS cluster for the first time, for example, for logging in to MRS Manager or using the cluster client, the password must be changed. For details, see section "Changing the Password of an Operation User".

----End

6.6 Modifying User Information

Scenario

This section describes how to modify user information on MRS Manager, including the user group, primary group, roles, and description.

Procedure

Step 1 On MRS Manager, click System.

Step 2 In the Permission area, click Manage User.

Step 3 In the row of a user to be modified, click Modify.

NOTE

If you change user groups for or assign role permissions to the user, it takes at most 3 minutes for the new configuration to take effect.

Step 4 Click OK. The modification is complete.

----End


6.7 Locking a User

Scenario

This section describes how to lock users in MRS clusters. A locked user cannot log in to MRS Manager or perform security authentication in the cluster.

A locked user can be unlocked manually by an administrator or automatically when the lock duration expires. You can lock a user by using either of the following methods:

- Automatic lock: Set Number of Password Retries in Configure Password Policy. If a user's login attempts exceed the parameter value, the user is automatically locked. For details, see Modifying a Password Policy.
- Manual lock: The administrator manually locks a user.

The following describes how to manually lock a user. Machine-machine users cannot be locked.

Procedure

Step 1 On MRS Manager, click System.

Step 2 In the Permission area, click Manage User.

Step 3 In the row of a user you want to lock, click Lock User.

Step 4 In the window that is displayed, click OK to lock the user.

----End

6.8 Unlocking a User

Scenario

If a user's login attempts exceed the value of Number of Password Retries and the user is locked, the administrator can unlock the user on MRS Manager.

Procedure

Step 1 On MRS Manager, click System.

Step 2 In the Permission area, click Manage User.

Step 3 In the row of a user you want to unlock, choose Unlock User.

Step 4 In the window that is displayed, click OK to unlock the user.

----End


6.9 Deleting a User

Scenario

If an MRS cluster user is not required, the administrator can delete the user on MRS Manager.

Procedure

Step 1 On MRS Manager, click System.

Step 2 In the Permission area, click Manage User.

Step 3 In the row of the user to be deleted, choose More > Delete.

Step 4 Click OK.

----End

6.10 Changing the Password of an Operation User

Scenario

Passwords of Human-machine system users must be changed regularly to ensure MRS cluster security. This section describes how to change your password on MRS Manager.

Impact on the System

If you have downloaded a user authentication file, download it again and obtain the keytab file after changing the password of the MRS cluster user.

Prerequisites

- You have obtained the current password policies from the administrator.
- You have obtained the URL to access MRS Manager from the administrator.

Procedure

Step 1 On MRS Manager, move the mouse cursor to the account icon in the upper-right corner. On the menu that is displayed, select Change Password.

Step 2 Fill in Old Password, New Password, and Confirm Password. Click OK.

For the MRS 1.5.0 cluster, the password complexity requirements are as follows:
- The password must contain 6 to 32 characters.
- The password must contain at least two of the following character types: lowercase letters, uppercase letters, digits, spaces, and special characters, which can only be ~`!?,.:;-_'(){}[]/<>@#$%^&*+|\=
- The password cannot be the same as the username or the reverse username.


NOTE

For MRS clusters of other versions, the password complexity requirements are as follows:

- The password must contain at least eight characters.
- The password must contain at least four of the following character types: lowercase letters, uppercase letters, digits, spaces, and special characters, which can only be ~`!?,.:;-_'(){}[]/<>@#$%^&*+|\=
- The password cannot be the same as the username or the reverse username.
- The password cannot be the same as the previous password.

----End
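The two rule sets above lend themselves to a local pre-check before a password is submitted. The following is a minimal sketch for the MRS 1.5.0 rules only; the function name check_mrs150_pw is made up for illustration, the sketch does not verify the restricted special-character set, and MRS Manager remains the authoritative validator.

```shell
#!/bin/bash
# Illustrative pre-check of the MRS 1.5.0 password rules listed above.
check_mrs150_pw() {
  local pw="$1" user="$2" types=0 i rev_user=""
  # Rule 1: 6 to 32 characters.
  (( ${#pw} >= 6 && ${#pw} <= 32 )) || return 1
  # Rule 2: at least two character types (special = any non-alphanumeric,
  # non-space character here; the real policy restricts which ones are allowed).
  local special='[^a-zA-Z0-9 ]'
  [[ $pw =~ [a-z] ]] && ((++types))
  [[ $pw =~ [A-Z] ]] && ((++types))
  [[ $pw =~ [0-9] ]] && ((++types))
  [[ $pw == *' '* ]] && ((++types))
  [[ $pw =~ $special ]] && ((++types))
  (( types >= 2 )) || return 1
  # Rule 3: not the username or the reverse username.
  for (( i=${#user}-1; i>=0; i-- )); do rev_user+="${user:i:1}"; done
  [[ $pw != "$user" && $pw != "$rev_user" ]]
}
```

For example, check_mrs150_pw 'Passw0rd' admin succeeds, while check_mrs150_pw 'admin1' admin1 fails because the password equals the username.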

6.11 Initializing the Password of a System User

Scenario

This section describes how to initialize a password on MRS Manager if a user forgets the password or the password of a public account needs to be changed regularly. After password initialization, the user must change the password upon first login.

Impact on the System

If you have downloaded a user authentication file, download it again and obtain the keytab file after initializing the password of the MRS cluster user.

Initializing the Password of a Human-machine User

Step 1 On MRS Manager, click System.

Step 2 In the Permission area, click Manage User.

Step 3 In the row that contains the user whose password is to be initialized, click More > Initialize password and change the password as prompted.

In the window that is displayed, enter the password of the current administrator account and click OK. Then, in Initialize Password, click OK.

For the MRS 1.5.0 cluster, the password complexity requirements are as follows:
- The password must contain 6 to 32 characters.
- The password must contain at least two of the following character types: lowercase letters, uppercase letters, digits, spaces, and special characters, which can only be ~`!?,.:;-_'(){}[]/<>@#$%^&*+|\=
- The password cannot be the same as the username or the reverse username.

NOTE

For MRS clusters of other versions, the password complexity requirements are as follows:

- The password must contain at least eight characters.
- The password must contain at least four of the following character types: lowercase letters, uppercase letters, digits, spaces, and special characters, which can only be ~`!?,.:;-_'(){}[]/<>@#$%^&*+|\=
- The password cannot be the same as the username or the reverse username.
- The password cannot be the same as the previous password.

----End


Initializing the Password of a Machine-machine User

Step 1 Prepare a client based on service conditions and log in to the node with the client installed.

Step 2 Run the following command to switch the user:

sudo su - omm

Step 3 Run the following command to switch to the client directory, for example, /opt/client:

cd /opt/client

Step 4 Run the following command to configure the environment variables:

source bigdata_env

Step 5 Run the following command to log in to the console as user kadmin/admin:

kadmin -p kadmin/admin

Step 6 Run the following command to reset the password of a component running user. This operation takes effect on all servers.

cpw Component running user name

For example, cpw oms/manager.

For the MRS 1.5.0 cluster, the password complexity requirements are as follows:
- The password must contain 6 to 32 characters.
- The password must contain at least two of the following character types: lowercase letters, uppercase letters, digits, spaces, and special characters, which can only be ~`!?,.:;-_'(){}[]/<>@#$%^&*+|\=
- The password cannot be the same as the username or the reverse username.

NOTE

For MRS clusters of other versions, the password complexity requirements are as follows:

- The password must contain at least eight characters.
- The password must contain at least four of the following character types: lowercase letters, uppercase letters, digits, spaces, and special characters, which can only be ~`!?,.:;-_'(){}[]/<>@#$%^&*+|\=
- The password cannot be the same as the username or the reverse username.
- The password cannot be the same as the previous password.

----End

6.12 Downloading a User Authentication File

Scenario

When a user develops big data applications and runs them in an MRS cluster that supports Kerberos authentication, the user needs to prepare a Machine-machine user authentication file for accessing the MRS cluster. The keytab file in the authentication file can be used for user authentication.

This section describes how to download a Machine-machine user authentication file and export the keytab file on MRS Manager.


NOTE

Before you download a Human-machine user authentication file, change the password for the user on MRS Manager to make the initial password set by the administrator invalid. Otherwise, the exported keytab file cannot be used. For details, see Changing the Password of an Operation User.

Procedure

Step 1 On MRS Manager, click System.

Step 2 In the Permission area, click Manage User.

Step 3 In the row of a user for whom you want to export the keytab file, choose More > Download authentication credential to download the authentication file. After the file is automatically generated, save it to a specified path and keep it secure.

Step 4 Open the authentication file with a decompression program.
- user.keytab is the keytab file used for user authentication.
- krb5.conf is the configuration file of the authentication server. The application connects to the authentication server according to the information in this file when authenticating users.
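As a sketch, the two files are typically used together on a client node as follows. The principal name developer and the file paths are hypothetical examples, and the exact environment setup depends on the client installation:

```shell
# Hypothetical paths and principal; run on a node that can reach the KDC.
export KRB5_CONFIG=/opt/client/conf/krb5.conf    # use the downloaded server configuration
kinit -kt /opt/client/conf/user.keytab developer # authenticate with the keytab (no password prompt)
klist                                            # confirm that a valid ticket was obtained
```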

----End

6.13 Modifying a Password Policy

Scenario

This section describes how to set password and user login security rules as well as user lock rules. Password policies set on MRS Manager take effect for Human-machine users only, because the passwords of Machine-machine users are randomly generated.

Modify password policies with caution and based on service security requirements, because they affect user management security. Improper settings may introduce security risks.

Procedure

Step 1 On MRS Manager, click System.

Step 2 Click Configure Password Policy.

Step 3 Modify password policies as prompted. For parameter details, see Table 6-10.


Table 6-10 Password policy parameter description

- Minimum Password Length: the minimum number of characters a password must contain. The value ranges from 6 to 32. The default value is 6.
- Number of Character Types: the minimum number of character types a password must contain. The character types are uppercase letters, lowercase letters, digits, spaces, and special characters (~`!?,.:;-_'(){}[]/<>@#$%^&*+|\=). The value can be 2, 4, or 5. The default value is 2, which means that a password must contain at least two of these character types.
- Password Validity Period (days): the validity period of a password, in days. The value ranges from 0 to 90, where 0 means the password is permanently valid. The default value is 90.
- Password Expiration Notification Days: used to notify users of password expiration in advance. If the difference between the password expiration time and the cluster time is smaller than this value, the user receives password expiration notifications, and when logging in to MRS Manager the user is prompted to change the password. The value ranges from 0 to X, where X is half of the password validity period rounded down; 0 indicates that no notification is sent. The default value is 5.
- Interval of Resetting Authentication Failure Count (min): the interval, in minutes, for retaining incorrect password attempts. The value ranges from 0 to 1440, where 0 indicates that incorrect password attempts are retained permanently and 1440 indicates that they are retained for one day. The default value is 5.
- Number of Password Retries: the number of consecutive wrong password attempts allowed before the system locks the user. The value ranges from 3 to 30. The default value is 5.
- Account Lock Duration (min): the period, in minutes, for which a user remains locked after the lockout conditions are met. The value ranges from 5 to 120. The default value is 5.


----End

6.14 Configuring Cross-Cluster Mutual Trust Relationships

Scenario

If two clusters, both with Kerberos authentication enabled, need to access each other's resources, the administrator must configure mutual trust relationships between the clusters.

If no trust relationship is configured, the resources of a cluster are available only to users in that cluster. MRS automatically assigns a unique domain name to each cluster to define the scope of resources for its users.

Impact on the System

- After cross-cluster mutual trust is configured, the resources of a cluster become available to users in the other cluster. User permissions in the clusters must be checked regularly based on service and security requirements.
- After cross-cluster mutual trust is configured, the two clusters must be restarted and are unavailable during the restart.
- After cross-cluster mutual trust is configured, the internal users krbtgt/Local cluster domain name@External cluster domain name and krbtgt/External cluster domain name@Local cluster domain name are added to both clusters. These internal users cannot be deleted. Their default password is Admin@123.
- After cross-cluster mutual trust is configured, the client must be re-installed.

Prerequisites

- Kerberos authentication is enabled for both clusters. For example, two analysis clusters with Kerberos authentication enabled are created.
- Both clusters are in the same VPC and subnet.

Procedure

Step 1 On the MRS management console, query all security groups of the two clusters.

Each cluster has two security groups, namely the security groups of the Master node and Core node.

Step 2 On the VPC management console, add rules for each security group.

Set Protocol to ANY, Transfer Direction to Inbound, and Source to Security Group. The source is the security group of the peer cluster. Two inbound rules are required.

Step 3 Log in to MRS Manager of the two clusters separately. Click Service and check whether the Health Status of all components is Good.
- If yes, go to Step 4.


- If no, contact technical support personnel for troubleshooting.

Step 4 Query configuration information.

1. On MRS Manager of the two clusters, choose Service > KrbServer > Instance. Query the OM IP Address of the two KerberosServer hosts.

2. Click Service Configuration. Set Type to All. Choose KerberosServer > Port in the navigation tree on the left. Query the value of kdc_ports. The default value is 21732.

3. Click Realm and query the value of default_realm.

Step 5 On MRS Manager of either cluster, modify the peer_realms parameter.

Table 6-11 Parameter description

- realm_name: the default_realm of the peer cluster.
- ip_port: the KDC address of the peer cluster, in the format IP address of a KerberosServer node in the peer cluster:kdc_port. The addresses of the two KerberosServer nodes are separated by a comma. For example, if the IP addresses of the KerberosServer nodes are 10.0.0.1 and 10.0.0.2, the value of this parameter is 10.0.0.1:21732,10.0.0.2:21732.

NOTE

- To deploy trust relationships with multiple clusters, click the add icon to add items and specify the relevant parameters. To delete an item, click the delete icon.
- A cluster can have trust relationships with a maximum of 16 clusters. By default, no trust relationship exists between different clusters that are trusted by the same local cluster.
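The ip_port value is simply the KerberosServer addresses joined with commas. The following sketch composes it from the example values in the table; replace the IPs and port with the values queried in Step 4:

```shell
# Example values from Table 6-11; substitute your own KerberosServer IPs
# and the kdc_ports value queried in Step 4.
kdc_port=21732
kdc_ips=("10.0.0.1" "10.0.0.2")
ip_port=""
for ip in "${kdc_ips[@]}"; do
  ip_port+="${ip}:${kdc_port},"
done
ip_port="${ip_port%,}"   # trim the trailing comma
echo "$ip_port"          # 10.0.0.1:21732,10.0.0.2:21732
```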

Step 6 Click Save Configuration. In the dialog box that is displayed, select Restart the affected services or instances and click OK.

After Operation succeeded is displayed, click Finish.

Step 7 Exit MRS Manager and log in to it again. If the login is successful, the configuration has taken effect.

Step 8 Log in to MRS Manager of the other cluster and repeat Step 5 to Step 7.

----End

6.15 Configuring Users to Access Resources of a Trusted Cluster

Scenario

After cross-cluster mutual trust is configured, permissions must be configured for users in the local cluster so that these users can access the same resources in the peer cluster as the peer cluster's own users.


Prerequisites

The mutual trust relationship has been configured between the two clusters (clusters A and B). The clients of both clusters have been updated.

Procedure

Step 1 Log in to MRS Manager of cluster A and choose System > Manage User. Check whether cluster A has accounts that are the same as those of cluster B.
- If yes, go to Step 2.
- If no, go to Step 3.

Step 2 Click the expand icon on the left side of the username to show detailed user information. Check the user groups and roles of the accounts to ensure that they have the same permissions as the accounts in cluster B.

For example, user admin of cluster A has the permission to access and create files in the /tmp directory of cluster A. Then go to Step 4.

Step 3 Create the accounts in cluster A and bind them to the user groups and roles required by the services. Then go to Step 4.

Step 4 Choose Service > HDFS > Instance. Query the OM IP Address of NameNode(hacluster,Active).

Step 5 Log in to the client of cluster B.

For example, if you have updated the client on the Master2 node, log in to the Master2 node to use the client. For details, see Client Management.

Step 6 Run the following command to access the /tmp directory of cluster A.

hdfs dfs -ls hdfs://192.168.6.159:25000/tmp

In the preceding command, 192.168.6.159 is the IP address of the active NameNode of cluster A, and 25000 is the default port for communication between the client and the NameNode.

Step 7 Run the following command to create a file in the /tmp directory of cluster A.

hdfs dfs -touchz hdfs://192.168.6.159:25000/tmp/mrstest.txt
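As a sketch, Steps 6 and 7 can be followed by a listing that confirms the write succeeded. The IP address and port are the example values above; replace them with your own:

```shell
# Example NameNode address from the steps above.
nn="hdfs://192.168.6.159:25000"
hdfs dfs -ls "$nn/tmp"                  # Step 6: list the peer cluster's /tmp
hdfs dfs -touchz "$nn/tmp/mrstest.txt"  # Step 7: create an empty test file
hdfs dfs -ls "$nn/tmp/mrstest.txt"      # verify that the file now exists
```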

----End


7 Using MRS

7.1 Accessing the UI of the Open Source Component

7.1.1 Overview

Scenario

Websites of different components are created and hosted on the Master or Core nodes in the MRS cluster by default. You can view information about the components on these websites. For security reasons, the websites can be accessed only through the cluster network and are not published on the Internet. Common users can access the websites by creating an ECS with a graphical user interface (GUI) in the same network.

If you do not want to create an extra ECS, you can turn to technical experts or development engineers. They can use the dynamic port forwarding function of an SSH channel to allow you to access the websites.

Websites

Table 7-1 Clusters with Kerberos authentication disabled

- All types, MRS Manager: https://IP address of the cluster Manager:28443/web
- Analysis cluster, HDFS NameNode: http://IP address of the active NameNode role instance:25002/dfshealth.html#tab-overview
- Analysis cluster, HBase HMaster: https://IP address of the active HMaster role instance:21301/master-status
- Analysis cluster, MapReduce JobHistoryServer: http://IP address of the JobHistoryServer role instance:26012/jobhistory
- Analysis cluster, YARN ResourceManager: http://IP address of the active ResourceManager role instance:26000/cluster
- Analysis cluster, Spark JobHistory: for MRS 1.5.0 or later, http://IP address of the JobHistory role instance:22500/; for MRS 1.3.0, http://IP address of the JobHistory role instance:23020/
- Analysis cluster, Hue: https://Floating IP address of Hue:21200. The Loader page is a graphical data migration management tool based on the open-source Sqoop WebUI and is hosted on the Hue WebUI.
- Stream processing cluster, Storm: http://IP address of any UI role instance:29280/index.html

Table 7-2 Clusters with Kerberos authentication enabled

- All types, MRS Manager: https://IP address of the cluster Manager:28443/web
- Analysis cluster, HDFS NameNode: https://IP address of the cluster Manager:20026/HDFS/NameNode/30/dfshealth.html
- Analysis cluster, HBase HMaster: https://IP address of the cluster Manager:20026/HBase/HMaster/45/master-status
- Analysis cluster, MapReduce JobHistoryServer: https://IP address of the cluster Manager:20026/Mapreduce/JobHistoryServer/54/jobhistory
- Analysis cluster, YARN ResourceManager: https://IP address of the cluster Manager:20026/Yarn/ResourceManager/42/cluster
- Analysis cluster, Spark JobHistory: choose Service > Spark > JobHistory on MRS Manager.
- Analysis cluster, Hue: https://IP address of the cluster Manager:21201/home. The Loader page is a graphical data migration management tool based on the open-source Sqoop WebUI and is hosted on the Hue WebUI.
- Stream processing cluster, Storm: https://IP address of the cluster Manager:20026/Storm/UI/39/index.html

7.1.2 Creating an SSH Channel for Connecting to an MRS Cluster

Scenario

Users and an MRS cluster are on different networks. As a result, an SSH channel needs to be created to send users' requests for accessing websites to the MRS cluster and dynamically forward them to the target websites.

Prerequisites

- You have prepared an SSH client for creating the SSH channel, for example, the Git open-source SSH client. You have downloaded and installed the client.
- You have created a cluster and prepared a key file in the PEM format.
- You can access the Internet on the local PC.

Procedure

Step 1 Log in to the MRS management console and choose Cluster > Active Cluster.

Step 2 Click the specified MRS cluster name.

Record Default Security Group of the Master node.

Step 3 Add an inbound rule to the security group of the Master node to allow data from the specified sources to access port 22.

For details, see Virtual Private Cloud > User Guide > Security > Security Group > Adding a Security Group Rule.

Step 4 Bind an elastic IP address to the Master2 node.

See "Assigning an EIP and Binding It to an ECS" in the Virtual Private Cloud User Guide (Network Components > EIP > Assigning an EIP and Binding It to an ECS).

Step 5 Locally start Git Bash and run the following command to log in to the Master2 node:

ssh -i Path of the key file linux@Elastic IP address

Step 6 Run the following commands to view data forwarding configurations:

1. cat /proc/sys/net/ipv4/ip_forwardIf 1 is displayed, the forwarding function has been configured. If it is not displayed,perform Related Tasks.

2. cat /etc/sysctl.conf | grep net.ipv4.ip_forward
If net.ipv4.ip_forward=1 is displayed, the forwarding function has been configured. If it is not displayed, perform Related Tasks.


Step 7 Run the following command to view the floating IP address:

ifconfig

In the command output, eth0:FI_HUE indicates the floating IP address of Hue and eth0:wsom specifies the floating IP address of MRS Manager. Record the value of inet.

Run the exit command to exit.

Step 8 Run the following command to create an SSH channel supporting dynamic port forwarding:

ssh -i Path of the key file -v -ND Local port linux@Elastic IP address

In the command, set Local port to an unoccupied local port on the user's PC. Port 1080 is recommended.

The -D option in the command enables dynamic port forwarding. By default, dynamic port forwarding starts a SOCKS proxy process that listens on the user's local port. Port data is forwarded to the Master2 node through the SSH channel.
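Before configuring the browser, you can confirm that the SSH client is actually listening on the chosen local port. The following Python sketch is illustrative only (it is not part of MRS): it checks whether a TCP port accepts connections, and the throwaway listener in the demonstration stands in for the SSH SOCKS proxy.

```python
import socket

def port_is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demonstration with a throwaway local listener standing in for the SSH proxy.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("localhost", 0))          # pick a free ephemeral port
server.listen(1)
port = server.getsockname()[1]

print(port_is_listening("localhost", port))   # the listener is up
server.close()
print(port_is_listening("localhost", port))   # closed again
```

In practice you would call `port_is_listening("localhost", 1080)` after starting the ssh -ND command.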

----End

Related Tasks

Modifying forwarding configurations on the node

Step 1 Log in to the Master2 node.

Step 2 Run the following command to switch to user root:

sudo su - root

Step 3 Run the following commands to modify forwarding configurations:

echo 1 > /proc/sys/net/ipv4/ip_forward

echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.conf

sysctl -w net.ipv4.ip_forward=1

Step 4 Run the following command to modify the sshd configuration file:

vi /etc/ssh/sshd_config

Press i to enter the edit mode. Locate AllowTcpForwarding and GatewayPorts, delete the comment tags, and modify them as follows. Save the changes and exit.

AllowTcpForwarding yes
GatewayPorts yes
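The vi edit above can also be scripted. The following Python sketch is an illustration only (not a tool shipped with MRS): it uncomments or overwrites the two directives in the configuration text and appends any that are missing.

```python
def enable_directives(config_text: str, directives: dict) -> str:
    """Uncomment/overwrite the given sshd_config directives, appending absent ones."""
    wanted = dict(directives)
    out = []
    for line in config_text.splitlines():
        stripped = line.lstrip("#").strip()
        key = stripped.split(None, 1)[0] if stripped else ""
        if key in wanted:
            out.append(f"{key} {wanted.pop(key)}")   # replace commented/old value
        else:
            out.append(line)                          # leave unrelated lines alone
    for key, value in wanted.items():
        out.append(f"{key} {value}")                  # append directives not found
    return "\n".join(out)

sample = "#AllowTcpForwarding yes\n#GatewayPorts no\nX11Forwarding yes"
print(enable_directives(sample, {"AllowTcpForwarding": "yes", "GatewayPorts": "yes"}))
```

Applied to the real /etc/ssh/sshd_config, this would achieve the same result as the manual edit, after which sshd still needs a restart.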

Step 5 Run the following command to restart the sshd service:

service sshd restart

----End


7.1.3 Configuring a Website Accessed by Browsers

Scenario

Websites hosted in the MRS cluster can be accessed only using browsers. Google Chrome must be used, because the SSH channel enables the SOCKS proxy. The proxy must be enabled when any website is accessed.

Prerequisites

You have performed the operations in Creating an SSH Channel for Connecting to an MRS Cluster and obtained the local proxy port and the floating IP address of MRS Manager.

Procedure

Step 1 Configure a browser proxy.
- Google Chrome

a. Create the rule.pac text file on the local PC. Copy the following content and save it to the file.
function FindProxyForURL(url, host) {
    return "SOCKS5 localhost:1080";
}

b. In the browser, choose Settings > Show advanced settings > Network > Change proxy settings > Connections > LAN settings.

c. Select Use automatic configuration script and enter the path of the rule.pac file. The path format is file://c:Users/rule.pac. Use the default format. Do not configure other parameters.

d. Save the configurations and close the Settings page.
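The rule.pac above routes every request through the SOCKS proxy. The decision it encodes can be sketched in Python, here refined to proxy only cluster-internal addresses; the CLUSTER_NET subnet below is a made-up example and would need to match your own cluster's VPC CIDR.

```python
import ipaddress

# Hypothetical cluster subnet; replace with the VPC CIDR of your own cluster.
CLUSTER_NET = ipaddress.ip_network("192.168.0.0/16")

def find_proxy_for_url(url: str, host: str, proxy_port: int = 1080) -> str:
    """Mirror of the rule.pac decision, proxying only cluster-internal IPs."""
    try:
        in_cluster = ipaddress.ip_address(host) in CLUSTER_NET
    except ValueError:          # host is a name, not an IP literal
        in_cluster = False
    return f"SOCKS5 localhost:{proxy_port}" if in_cluster else "DIRECT"

print(find_proxy_for_url("https://192.168.1.17:28443/web", "192.168.1.17"))  # SOCKS5 localhost:1080
print(find_proxy_for_url("https://example.com", "example.com"))              # DIRECT
```

The shipped rule.pac simply returns "SOCKS5 localhost:1080" unconditionally; the subnet check is an optional refinement so that ordinary Internet traffic bypasses the tunnel.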

Step 2 In the address bar of the browser, enter the address for accessing MRS Manager.

Address format: https://Floating IP address of MRS Manager:28443/web

The username and password of the MRS cluster need to be entered for accessing clusters with Kerberos authentication enabled, for example, user admin. They are not required for accessing clusters with Kerberos authentication disabled.

When accessing MRS Manager for the first time, you must add the address to the trusted site list.

Step 3 Prepare the website access address.

1. Obtain the website address format and the role instance according to Websites.
2. Click Services.
3. Click the specified service name, for example, HDFS.
4. Click Instance and view Service IP of NameNode(Active).

Step 4 In the address bar of the browser, enter the website address to access it.

Step 5 When logging out of the website, terminate and close the SSH channel.

----End


7.2 Using Hadoop from Scratch

This section describes how to use Hadoop to submit a wordcount job. Wordcount, a typical Hadoop job, is used to count the words in texts.

Procedure

Step 1 Prepare the wordcount program.

The open source Hadoop example program contains the wordcount program. You can download the Hadoop example program at http://dist.apache.org/repos/dist/release/hadoop/common/.

For example, select Hadoop version hadoop-2.7.x. Download hadoop-2.7.x.tar.gz, decompress it, and obtain hadoop-mapreduce-examples-2.7.x.jar from the hadoop-2.7.x\share\hadoop\mapreduce directory. The hadoop-mapreduce-examples-2.7.x.jar example program contains the wordcount program.

NOTE

hadoop-2.7.x indicates the Hadoop version.

Step 2 Prepare data files.

There is no format requirement for data files. Prepare one or more TXT files. The following is an example of a TXT file:

qwsdfhoedfrffrofhuncckgktpmhutopmmajjpsffjfjorgjgtyiuyjmhombmbogohoyhmjhheyeombdhuaqqiquyebchdhmamdhdemmjdoeyhjwedcrfvtgbmojiyhhqssddddddfkfkjhhjkehdeiyrudjhfhfhffooqweopuyyyy
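The wordcount job in the example jar boils down to a map step (split each line into words) and a reduce step (sum the counts per word). A minimal local sketch of the same logic, independent of Hadoop:

```python
from collections import Counter

def wordcount(lines):
    """Local sketch of MapReduce wordcount: map lines to words, reduce by key."""
    mapped = (word for line in lines for word in line.split())  # map phase
    return Counter(mapped)                                      # shuffle + reduce phase

sample = ["hello world", "hello mrs"]
print(wordcount(sample))   # Counter({'hello': 2, 'world': 1, 'mrs': 1})
```

The MRS job distributes exactly this computation: mappers emit (word, 1) pairs per input split, and reducers sum the counts for each word.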

Step 3 Upload data to OBS.

1. Log in to the OBS console.
2. Click Create Bucket to create a bucket and name it. The name must be unique; otherwise, the bucket cannot be created. Here, the name wordcount will be used as an example.

3. In the wordcount bucket, click Create Folder to create the program, input, output, and log folders.
   – program: stores user programs.
   – input: stores user data files.
   – output: stores job output files.
   – log: stores job output log files.

4. Go to the program folder, click to select the program package downloaded in Step 1, and click Upload.

5. Go to the input folder and upload the data file that is prepared in Step 2.

Step 4 Log in to the MRS management console. In the navigation tree on the left, choose Cluster > Active Cluster and click the cluster named mrs_20160907. The mrs_20160907 cluster was created in section Creating a Cluster.


Step 5 Submit a wordcount job.

1. Select Job Management. On the Job tab page, click Create to go to the Create Job page, as shown in Figure 7-1. Jobs can be submitted only when the mrs_20160907 cluster is in the running state.

Figure 7-1 Creating a MapReduce job

Table 7-3 describes parameters for job configuration. The following is a job configuration example:
– Type: Select MapReduce.
– Name: For example, mr_01.
– Program Path: Set the path to the address that stores the program on OBS. Replace the bucket name and program name with the names of the bucket and program that you created in Step 3.3. For example, s3a://wordcount/program/hadoop-mapreduce-examples-2.7.x.jar.
– Parameters: Indicate the main class of the program to be executed, for example, wordcount.
– Import From: Set the path to the address that stores the input data files on OBS. Replace the bucket name and input name with the names of the bucket and folder that you created in Step 3.3. For example, s3a://wordcount/input.
– Export To: Set the path to the address that stores the job output files on OBS. Replace the bucket name and output name with the names of the bucket and folder that you created in Step 3.3. For example, s3a://wordcount/output.


– Log path: Set the path to the address that stores the job log files on OBS. Replace the bucket name and log name with the names of the bucket and folder that you created in Step 3.3. For example, s3a://wordcount/log.

A job will be executed immediately after being created successfully.

Table 7-3 Job configuration information

Type
  Job type. Possible types include:
  – MapReduce
  – Spark
  – Spark Script
  – Hive Script
  NOTE: To add jobs of the Spark and Hive types, you need to select the Spark and Hive components when creating a cluster, and the cluster must be in the running state. Spark Script jobs support Spark SQL only, and Spark supports Spark Core and Spark SQL.

Name
  Job name. This parameter consists of 1 to 64 characters, including letters, digits, hyphens (-), or underscores (_). It cannot be null.
  NOTE: Identical job names are allowed but not recommended.

Program Path
  Address of the JAR file of the program for executing jobs.
  NOTE: When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.
  This parameter cannot be null and must meet the following requirements:
  – A maximum of 1023 characters are allowed, but special characters (*?<">|\) are not allowed. The address cannot be empty or full of spaces.
  – The path varies depending on the file system:
    – OBS: The path must start with s3a://, for example, s3a://wordcount/program/hadoop-mapreduce-examples-2.7.x.jar.
    – HDFS: The path must start with /user.
  – Spark Script must end with .sql; MapReduce and Spark must end with .jar. sql and jar are case-insensitive.
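The Program Path rules above (length limit, forbidden characters, s3a:// or /user prefix, case-insensitive .jar/.sql suffix) can be expressed as a small validator. This is a sketch of the stated rules, not the service's own validation code:

```python
FORBIDDEN = set('*?<">|\\')   # special characters the manual disallows

def valid_program_path(path: str) -> bool:
    """Check a Program Path against the rules in Table 7-3 (illustrative sketch)."""
    if not path or path.isspace() or len(path) > 1023:
        return False                                  # non-null, not all spaces, <= 1023 chars
    if FORBIDDEN & set(path):
        return False                                  # no special characters
    if not (path.startswith("s3a://") or path.startswith("/user")):
        return False                                  # OBS or HDFS prefix
    return path.lower().endswith((".jar", ".sql"))    # suffix, case-insensitive

print(valid_program_path("s3a://wordcount/program/hadoop-mapreduce-examples-2.7.x.jar"))  # True
print(valid_program_path("/tmp/job.jar"))   # False: HDFS paths must start with /user
```

Running such a check before submitting a job can catch path mistakes earlier than the console's own rejection.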


Parameters
  Key parameter for executing jobs. This parameter is assigned by an internal function. MRS is only responsible for inputting the parameter. Format: package name.class name. A maximum of 2047 characters are allowed, but special characters (;|&>',<$) are not allowed. This parameter can be empty.

Import From
  Address for inputting data.
  NOTE: When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.
  The path varies depending on the file system:
  – OBS: The path must start with s3a://.
  – HDFS: The path must start with /user.
  A maximum of 1023 characters are allowed, but special characters (*?<">|\) are not allowed. This parameter can be empty.

Export To
  Address for outputting data.
  NOTE: When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.
  The path varies depending on the file system:
  – OBS: The path must start with s3a://.
  – HDFS: The path must start with /user.
  A maximum of 1023 characters are allowed, but special characters (*?<">|\) are not allowed. This parameter can be empty.

Log path
  Address for storing job logs that record the job running status.
  NOTE: When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.
  The path varies depending on the file system:
  – OBS: The path must start with s3a://.
  – HDFS: The path must start with /user.
  A maximum of 1023 characters are allowed, but special characters (*?<">|\) are not allowed. This parameter can be empty.

Step 6 View the job execution results.

1. Go to the Job Management tab page. On the Job tab page, check whether the jobs are complete. The job operation takes a while. After the jobs are complete, refresh the job list, as shown in Figure 7-2.


Figure 7-2 Job list

You cannot execute a successful or failed job, but you can add or copy the job. After setting job parameters, you can submit the job again.

2. Log in to the OBS console. Go to the OBS directory and query job output information.

In the wordcount > output directory of OBS, you can query and download the job output files.

3. Log in to the OBS console. Go to the OBS directory and check the detailed job execution results.

In the wordcount > log directory of OBS, you can query and download the job execution logs by job ID.

Step 7 Terminate a cluster.

For details, see Terminating a Cluster in the User Guide.

----End

7.3 Using Spark from Scratch

This section describes how to use Spark to submit a sparkPi job. SparkPi, a typical Spark job, is used to calculate the value of pi (π).

Procedure

Step 1 Prepare the sparkPi program.

The open source Spark example program contains the sparkPi program. You can download the Spark example program at https://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz.

Decompress the Spark example program to obtain the spark-examples_2.11-2.1.0.jar file in the spark-2.1.0-bin-hadoop2.7/examples/jars directory. The spark-examples_2.11-2.1.0.jar example program contains the sparkPi program.
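SparkPi estimates π by sampling random points in the unit square and counting how many fall inside the quarter circle; Spark merely distributes the sampling across executors. A single-process sketch of the same Monte Carlo idea:

```python
import random

def estimate_pi(samples: int = 100_000, seed: int = 42) -> float:
    """Monte Carlo estimate of pi, the idea sparkPi distributes over executors."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:      # point falls inside the quarter circle
            inside += 1
    return 4.0 * inside / samples     # area ratio times 4 approximates pi

print(estimate_pi())   # roughly 3.14
```

The "10" passed to org.apache.spark.examples.SparkPi in the job parameters is the number of partitions; more partitions means more samples and a tighter estimate.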

Step 2 Upload data to OBS.

1. Log in to the OBS console.

2. Click Create Bucket to create a bucket and name it. The name must be unique; otherwise, the bucket cannot be created. Here, the name sparkpi will be used as an example.

3. In the sparkpi bucket, click Create Folder to create the program, output, and log folders.

4. Go to the program folder, click to select the program package downloaded in Step 1, and click Upload.


Step 3 Log in to the MRS management console. In the navigation tree on the left, choose Cluster > Active Cluster and click the cluster named mrs_20160907. The mrs_20160907 cluster was created in section Creating a Cluster.

Step 4 Submit a sparkPi job.

1. Select Job Management. On the Job tab page, click Create to go to the Create Job page, as shown in Figure 7-3. Jobs can be submitted only when the mrs_20160907 cluster is in the running state.

Figure 7-3 Creating a Spark job

Table 7-4 describes parameters for job configuration. The following is a job configuration example:
– Type: Select Spark.
– Name: For example, job_spark.
– Program Path: Set the path to the address that stores the program on OBS. Replace the bucket name and program name with the names of the bucket and program that you created in Step 2.3. For example, s3a://sparkpi/program/spark-examples_2.11-2.1.0.jar.
– Parameters: Indicate the main class of the program to be executed, for example, org.apache.spark.examples.SparkPi 10.
– Export To: Set the path to the address that stores the job output files on OBS. Replace the bucket name and output name with the names of the bucket and folder that you created in Step 2.3. For example, s3a://sparkpi/output.


– Log path: Set the path to the address that stores the job log files on OBS. Replace the bucket name and log name with the names of the bucket and folder that you created in Step 2.3. For example, s3a://sparkpi/log.

A job will be executed immediately after being created successfully.

Table 7-4 Job configuration information

Type
  Job type. Possible types include:
  – MapReduce
  – Spark
  – Spark Script
  – Hive Script
  NOTE: To add jobs of the Spark and Hive types, you need to select the Spark and Hive components when creating a cluster, and the cluster must be in the running state. Spark Script jobs support Spark SQL only, and Spark supports Spark Core and Spark SQL.

Name
  Job name. This parameter consists of 1 to 64 characters, including letters, digits, hyphens (-), or underscores (_). It cannot be null.
  NOTE: Identical job names are allowed but not recommended.

Program Path
  Address of the JAR file of the program for executing jobs.
  NOTE: When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.
  This parameter cannot be null and must meet the following requirements:
  – A maximum of 1023 characters are allowed, but special characters (*?<">|\) are not allowed. The address cannot be empty or full of spaces.
  – The path varies depending on the file system:
    – OBS: The path must start with s3a://, for example, s3a://wordcount/program/hadoop-mapreduce-examples-2.7.2.jar.
    – HDFS: The path must start with /user.
  – Spark Script must end with .sql; MapReduce and Spark must end with .jar. sql and jar are case-insensitive.


Parameters
  Key parameter for executing jobs. This parameter is assigned by an internal function. MRS is only responsible for inputting the parameter. Format: package name.class name. A maximum of 2047 characters are allowed, but special characters (;|&>',<$) are not allowed. This parameter can be empty.

Import From
  Address for inputting data.
  NOTE: When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.
  The path varies depending on the file system:
  – OBS: The path must start with s3a://.
  – HDFS: The path must start with /user.
  A maximum of 1023 characters are allowed, but special characters (*?<">|\) are not allowed. This parameter can be empty.

Export To
  Address for outputting data.
  NOTE: When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.
  The path varies depending on the file system:
  – OBS: The path must start with s3a://.
  – HDFS: The path must start with /user.
  A maximum of 1023 characters are allowed, but special characters (*?<">|\) are not allowed. This parameter can be empty.

Log path
  Address for storing job logs that record the job running status.
  NOTE: When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.
  The path varies depending on the file system:
  – OBS: The path must start with s3a://.
  – HDFS: The path must start with /user.
  A maximum of 1023 characters are allowed, but special characters (*?<">|\) are not allowed. This parameter can be empty.

Step 5 View the job execution results.

1. Go to the Job Management tab page. On the Job tab page, check whether the jobs are complete. The job operation takes a while. After the jobs are complete, refresh the job list, as shown in Figure 7-4.


Figure 7-4 Job list

You cannot execute a successful or failed job, but you can add or copy the job. After setting job parameters, you can submit the job again.

2. Go to the OBS directory and query job output information. In the sparkpi > output directory of OBS, you can query and download the job output files.

3. Go to the OBS directory and check the detailed job execution results. In the sparkpi > log directory of OBS, you can query and download the job execution logs by job ID.

Step 6 Terminate a cluster.

For details, see Terminating a Cluster in the User Guide.

----End

7.4 Using Spark SQL from Scratch

To process structured data, Spark provides Spark SQL, which is similar to SQL.

You can create a table named src_data, write a data entry in each row of the src_data table, and store the data in the mrs_20160907 cluster. You can then use SQL statements to query data in the src_data table. Afterwards, you can delete the src_data table.

Prerequisites

You have obtained the AK/SK for writing data from the OBS data source to the Spark SQL table. The method for obtaining the AK/SK is as follows:

1. Register with and log in to the HUAWEI CLOUD management console.
2. Change the region to EU-Paris and access the EU-Paris Console page.
3. In Management & Deployment, choose Identity and Access Management.
4. On the User page, click Create User.
5. Set Username, set Credential Type to Access key, and click OK.
6. In the Download Access Key dialog box, click OK to download the access key and save it.

Procedure

Step 1 Prepare data sources for Spark SQL analysis.

The following is an example of a text file:

abcd3ghjiefgh658ko1234jjyu9


7h8kodfg1kk99icxz3

Step 2 Upload data to OBS.

1. Log in to the OBS management console.
2. Click Create Bucket to create a bucket and name it. The name must be unique; otherwise, the bucket cannot be created. Here, the name sparksql will be used as an example.
3. In the sparksql bucket, click Create Folder to create the input folder.
4. Go to the input folder, click to select a local text file, and click Upload.

Step 3 Import the text file in OBS to HDFS.

1. Log in to the MRS management console. In the navigation tree on the left, choose Cluster > Active Cluster and click the cluster named mrs_20160907. The mrs_20160907 cluster was created in section Creating a Cluster.
2. Select the File Management tab page.
3. Click Create Folder and create the userinput folder.
4. Go to the userinput folder and click Import Data.
5. Select the OBS and HDFS paths and click OK.
   OBS path: s3a://sparksql/input/sparksql-test.txt
   HDFS path: /user/userinput

Step 4 Submit the Spark SQL statement.

1. On the Job Management tab page, select Spark SQL. The Spark SQL job page is displayed. Jobs can be submitted only when the mrs_20160907 cluster is in the running state.

2. Enter the Spark SQL statement to create a table.
When entering Spark SQL statements, ensure that they contain fewer than 10,000 characters.
The syntax is as follows:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name [(col_name data_type [COMMENT col_comment], ...)] [COMMENT table_comment] [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)] [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS] [ROW FORMAT row_format] [STORED AS file_format] [LOCATION hdfs_path];
You can use either of the following two methods to create a table:
– Method 1: Create table src_data and write data in every row.
  If the data source is stored in the /user/userinput folder of HDFS:
  create external table src_data(line string) row format delimited fields terminated by '\\n' stored as textfile location '/user/userinput';
  If the data source is stored in the /sparksql/input folder of OBS:
  create external table src_data(line string) row format delimited fields terminated by '\\n' stored as textfile location 's3a://AK:SK@sparksql/input';
  For the method of obtaining the AK/SK, see the description in Prerequisites.
– Method 2: Create table src_data1 and load data to the src_data1 table in batches.
  create table src_data1 (line string) row format delimited fields terminated by ',' ;


load data inpath '/user/userinput/sparksql-test.txt' into table src_data1;

NOTE

When method 2 is used, the data from OBS cannot be loaded to the created tables directly.

3. Enter the Spark SQL statement to query a table.
The syntax is as follows:
SELECT col_name FROM table_name;
For example, to query data in the src_data table, enter the following statement:
select * from src_data;

4. Enter the Spark SQL statement to delete a table.
The syntax is as follows:
DROP TABLE [IF EXISTS] table_name;
For example:
drop table src_data;

5. Click Check to check whether the statements are correct.
6. Click Submit.
After submitting Spark SQL statements, you can check whether the execution is successful in Last Execution Result and view detailed execution results in Last Query Result Set.
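The create/load/query/drop sequence above can be mimicked locally using Python's built-in sqlite3 as a stand-in for Spark SQL, with one row per line of the text file. This is only an illustration of the workflow, not the MRS service itself, and the sample lines are made up:

```python
import sqlite3

# Stand-in for the OBS/HDFS text file: one data entry per line.
lines = ["abcd3", "ghjie", "fgh658ko", "1234jjyu9"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_data (line TEXT)")          # create the table
conn.executemany("INSERT INTO src_data VALUES (?)",
                 [(l,) for l in lines])                    # load one row per line

rows = conn.execute("SELECT * FROM src_data").fetchall()   # query the table
print(rows)   # [('abcd3',), ('ghjie',), ('fgh658ko',), ('1234jjyu9',)]

conn.execute("DROP TABLE src_data")                        # delete the table
```

In the real workflow, the CREATE EXTERNAL TABLE statement replaces the explicit inserts: Spark SQL maps each line of the files under the LOCATION path to one row of the single string column.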

Step 5 Terminate a cluster.

For details, see Terminating a Cluster in the User Guide.

----End

7.5 Using HBase from Scratch

HBase is a scalable column-based distributed storage system. It features high reliability and high performance.

You can update the client on a Master node in the mrs_20160907 cluster. The client can be used to create a table; to insert, read, and delete data from the table; and to modify and delete the table.

Background

After an MRS cluster has been successfully created, the original client is stored in the /opt/client directory on all nodes in the cluster by default. Before using the client, download the client file, update the client, and locate the active management node of MRS Manager.

For example, if a user develops an application to manage information about users who use service A in an enterprise, the operation processes of service A using the HBase client are as follows:

- Create a user information table.
- Add diplomas and titles of users to the table.
- Query usernames and addresses by user ID.
- Query information by username.


- Deregister users and delete user data.
- Delete the user information table after service A ends.

Table 7-5 User information

ID Name Gender Age Address

12005000201 A Male 19 City A

12005000202 B Female 23 City B

12005000203 C Male 26 City C

12005000204 D Male 18 City D

12005000205 E Female 21 City E

12005000206 F Male 32 City F

12005000207 G Female 29 City G

12005000208 H Female 30 City H

12005000209 I Male 26 City I

12005000210 J Male 25 City J

Procedure

Step 1 Download the client file or the client configuration file.

1. Log in to the MRS management console. In the navigation tree on the left, choose Cluster > Active Cluster and click the cluster named mrs_20160907. The mrs_20160907 cluster was created in section Creating a Cluster.

2. In the Cluster List > mrs_20160907 area, click Cluster Manager to open MRS Manager.

3. Click Service, and click Download Client.
Set Client Type to All client files or Only configuration files, set Download Path to Server, and click OK to generate the client file or the client configuration file. The generated file is saved in the /tmp/MRS-Client directory on the active management node by default. You can modify the file save path as required.

Step 2 Log in to the active management node.

1. Choose Cluster > Active Cluster and click the cluster named mrs_20160907. In the Cluster List > mrs_20160907 area, view the Active Master Node IP Address parameter. Active Master Node IP Address is the IP address of the active Master node in a cluster, which is also the IP address of the active management node of MRS Manager. The active and standby management nodes of MRS Manager are installed on Master nodes by default. Because Master1 and Master2 are switched over in active and standby mode, Master1 is not always the active management node of MRS Manager. Run a command on Master1 to check whether Master1 is the active management node of MRS Manager. For details about the command, see Step 2.4.


2. Log in to the Master1 node using a password as user linux. For details, see Logging In to an ECS Using VNC in the User Guide. The Master node supports Cloud-init. The preset username and password for Cloud-init are linux and cloud.1234. If you have changed the password, log in to the node using the new password. See "How Do I Log In to an ECS Once All Images Support Cloud-Init?" in the Elastic Cloud Server User Guide (FAQs > Login FAQs > How Do I Log In to an ECS Once All Images Support Cloud-Init?).

3. Run the following commands to switch to user omm:
sudo su - root
su - omm

4. Run the following command to confirm the active and standby management nodes:
sh ${BIGDATA_HOME}/om-0.0.1/sbin/status-oms.sh | grep Actived
For example, the following information is displayed. node-master2-LJXDj indicates the name of the active management node.
192-168-1-17 node-master2-LJXDj V100R001C01 2016-10-01 06:58:41 active normal Actived

NOTE

If the Master1 node to which you have logged in is the standby management node and you need to log in to the active management node, run the following commands:
ssh IP address of the Master2 node
ssh Name of the active management node
For example, run the following command: ssh node-master2-LJXDj

5. Log in to the active management node, for example, node node-master2-LJXDj, as user root.

Step 3 Run the following command to go to the client directory:

After an MRS cluster is successfully created, the client is installed in the /opt/client directory by default.

cd /opt/client

Step 4 Run the following commands to update the client configuration for the active management node.

Switch to user omm.

sudo su - omm

sh refreshConfig.sh /opt/client Full path of the client configuration file package

For example, run the following command:

sh refreshConfig.sh /opt/client /tmp/MRS-Client/MRS_Services_Client.tar

If the following information is displayed, the configuration is updated successfully.

ReFresh components client config is complete.
Succeed to refresh components client config.

Step 5 Use the client on a Master node.

1. On the active management node where the client is updated, for example, node-master2-LJXDj, run the following command to go to the client directory:


cd /opt/client
2. Run the following command to configure environment variables:

source bigdata_env
3. If Kerberos authentication is enabled for the current cluster, run the following command to authenticate the user. If Kerberos authentication is disabled for the current cluster, skip this step.
kinit MRS cluster user
For example, kinit admin.

4. Run an HBase component client command directly:
hbase shell

Step 6 Run commands on the HBase client to implement service A.

1. Create a user information table according to Table 7-5 and add data to it.
create 'user_info',{NAME => 'i'}
For example, to add data of user 12005000201, run the following commands in sequence:
put 'user_info','12005000201','i:name','A'
put 'user_info','12005000201','i:gender','Male'
put 'user_info','12005000201','i:age','19'
put 'user_info','12005000201','i:address','City A'

2. Add degree and title information about the user to the table.
For example, to add degree and title information about user 12005000201, run the following commands:
put 'user_info','12005000201','i:degree','master'
put 'user_info','12005000201','i:pose','manager'

3. Query usernames and addresses by user ID.

For example, to query the username and address of user 12005000201, run the following command:

scan 'user_info',{STARTROW=>'12005000201',STOPROW=>'12005000201',COLUMNS=>['i:name','i:address']}

4. Query information by username.

For example, to query information about user A, run the following command:

scan 'user_info',{FILTER=>"SingleColumnValueFilter('i','name',=,'binary:A')"}

5. Delete user data from the user information table.

All user data needs to be deleted. For example, to delete data of user 12005000201, run the following command:

delete 'user_info','12005000201','i'

6. Run the following commands to delete the user information table:

disable 'user_info'
drop 'user_info'
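The service A commands above can also be collected into one script instead of being typed interactively. The following is a minimal sketch: it only assembles and prints the script for review; piping it into the HBase shell (echo "$HBASE_SCRIPT" | hbase shell) requires a live cluster with the client environment from Step 5 already sourced.

```shell
# Hedged sketch: the service A flow as a single reviewable script. The echo
# at the end only prints it; actually running it needs a live HBase cluster.
HBASE_SCRIPT=$(cat <<'EOF'
create 'user_info',{NAME => 'i'}
put 'user_info','12005000201','i:name','A'
put 'user_info','12005000201','i:gender','Male'
put 'user_info','12005000201','i:age','19'
put 'user_info','12005000201','i:address','City A'
put 'user_info','12005000201','i:degree','master'
put 'user_info','12005000201','i:pose','manager'
scan 'user_info',{STARTROW=>'12005000201',STOPROW=>'12005000201'}
delete 'user_info','12005000201','i'
disable 'user_info'
drop 'user_info'
EOF
)
echo "$HBASE_SCRIPT"
```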

Step 7 Terminate a cluster.

For details, see Terminating a Cluster in the User Guide.

----End

MapReduce ServiceUser Guide 7 Using MRS

2019-01-15 373

Page 383: Huawei › en-us › eu-west-0-user... · 2019-12-14 · Contents 1 Overview.........................................................................................................................................1

7.6 Using Hue

7.6.1 Accessing the Hue WebUI

Scenario

After Kerberos authentication is enabled and Hue is installed for an MRS cluster, users can use Hadoop and Hive on the Hue WebUI.

This section describes how to open the Hue WebUI on the MRS cluster supporting Kerberos authentication.

NOTE

To access the Hue WebUI, you are advised to use a browser that is compatible with the Hue WebUI, for example, Google Chrome 50. Internet Explorer may be incompatible with the Hue WebUI.

Impact on the System

Site trust must be added to the browser when you access MRS Manager and the Hue WebUI for the first time. Otherwise, the Hue WebUI cannot be accessed.

Prerequisites

The MRS cluster administrator has assigned the permission for using Hive to the user. For details, see Creating a User. For example, create a Human-machine user hueuser, add the user to the hive group, and assign the user the System_administrator role.

Procedure

Step 1 Access MRS Manager.

For details, see Accessing MRS Manager Supporting Kerberos Authentication.

Step 2 On MRS Manager, choose Service > Hue. In Hue WebUI of Hue Summary, click Hue (Active). The Hue WebUI is opened.

Hue WebUI provides the following functions:

l If Hive is installed in the MRS cluster, you can use Query Editors to execute query statements of Hive.

l If Hive is installed in the MRS cluster, you can use Data Browsers to manage Hive tables.

l If HDFS is installed in the MRS cluster, you can use File Browser to view directories and files in HDFS.

l If Yarn is installed in the MRS cluster, you can use Job Browser to view all jobs in the MRS cluster.

MapReduce ServiceUser Guide 7 Using MRS

2019-01-15 374

Page 384: Huawei › en-us › eu-west-0-user... · 2019-12-14 · Contents 1 Overview.........................................................................................................................................1

NOTE

l After obtaining the URL for accessing the Hue WebUI, a user can give the URL to other users who cannot access MRS Manager so that they can access the Hue WebUI.

l If you perform operations only on the Hue WebUI and not on MRS Manager, you must enter the password of the current login user when accessing MRS Manager again.

----End

7.6.2 Using HiveQL Editor on the Hue WebUI

Scenario

After Kerberos authentication is enabled for an MRS cluster, users can use the Hue WebUI to execute HiveQL statements in the cluster.

Prerequisites

The MRS cluster administrator has assigned the permission for using Hive to the user. For details, see Creating a User.

Accessing Query Editors

Step 1 Access the Hue WebUI and choose Query Editors > Hive. The Hive page is displayed.

Hive supports the following functions.

l Executes and manages HiveQL statements.
l Queries HiveQL statements saved by the current user in Saved Queries.
l Queries HiveQL statements executed by the current user in Query History.

Step 2 Click to display all databases in Hive.

----End

Executing HiveQL Statements

Step 1 Access Query Editors.

Step 2 Select a Hive database in Databases. The default database is default.

The system displays all available tables. You can enter a keyword of the table name to search for the desired table.

Step 3 Click the desired table name. All columns in the table are displayed.

Move the cursor to the row of the table and click . Column details are displayed.

Step 4 Enter the query statements in the area for editing HiveQL statements.

Click and choose Explain. The editor checks the syntax and execution plan of the entered statements. If the statements have syntax errors, the editor reports Error while compiling statement.

Step 5 Select the engine for executing the HiveQL statements.


l mr: MapReduce computing framework

l spark: Spark computing framework

Step 6 Click to execute the HiveQL statements.

NOTE

l If you want to use the entered HiveQL statements again, click to save them.

l To format the HiveQL statements, click and choose Format.

l To delete the entered HiveQL statements, click and choose Clear.

l To clear the entered statements and start a new query, click and choose New query.

----End

Querying Execution Results

Step 1 View the execution results below the execution area on Hive. The Query History tab is displayed by default.

Step 2 Click a result to view the executed statements.

----End

Managing Statements

Step 1 Access Query Editors.

Step 2 Click Saved Queries.

Click a saved statement. The system automatically fills the statement in the editing area.

----End

Modifying Query Editors Settings

Step 1 On the Hive page, click .

Step 2 Click on the right side of Files, and click to specify the save path of the file.

You can click to add a file resource.

Step 3 Click on the right side of Functions. Enter the name of the user-defined function and the function class.

You can click to add a function.

Step 4 Click on the right side of Settings. Enter the Hive parameter name in Key under Settings and the parameter value in Value. The session connects to Hive using the user-defined configuration.


You can click to add a parameter.

----End

7.6.3 Using the Metadata Browser on the Hue WebUI

Scenario

After Kerberos authentication is enabled for an MRS cluster, users can use the Hue WebUI to manage Hive metadata in the cluster.

Prerequisites

The MRS cluster administrator has assigned the permission for using Hive to the user. For details, see Creating a User.

Accessing the Metadata Browser

Step 1 Access the Hue WebUI.

Step 2 Choose Data Browsers > Metastore Tables to access Metastore Manager.

Metastore Manager supports the following functions.

l Creating a Hive table from a file
l Manually creating a Hive table
l Viewing Hive table metadata

----End

Creating a Hive Table from a File

Step 1 Access Metastore Manager and select a database in Databases.

The default database is default.

Step 2 Click to access the Create a new table from a file page.

Step 3 Select a file.

1. In Table Name, enter a Hive table name. A Hive table name contains no more than 128 letters, numbers, or underscores (_) and must start with a letter or number.

2. In Description, enter a description of the Hive table as required.

3. In Input File or Location, click and select a file in HDFS for creating a Hive table. The file is used to store new data of the Hive table. If the file is not stored in HDFS, click Upload a file to upload the file from a local directory to HDFS. Multiple files can be uploaded simultaneously. The files cannot be empty.

4. If you want to import data in the file to the Hive table, select Import data (selected by default) in Load method.


If you select Create External Table, an external Hive table is created.

NOTE

If you select Create External Table, select a path in Input File or Location.

If you select Leave Empty, an empty Hive table is created.

5. Click Next.
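The table-name rule stated above can be checked locally before submitting the form. The following is a minimal sketch; the function name and the regular expression are ours, not part of the product.

```shell
# Hedged helper: validate a Hive table name against the rule stated above —
# at most 128 letters, digits, or underscores, starting with a letter or digit.
is_valid_hive_table_name() {
  printf '%s' "$1" | grep -Eq '^[A-Za-z0-9][A-Za-z0-9_]{0,127}$'
}

is_valid_hive_table_name 'user_info' && echo valid
is_valid_hive_table_name '_bad' || echo invalid
```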

Step 4 Set a delimiter.

1. In Delimiter, select a delimiter. If the delimiter you want to select is not in the list, select Other.. and enter a delimiter.

2. Click Preview to preview data processing.

3. Click Next.

Step 5 Define a column.

1. If you click on the right side of Use first row as column names, the first row of data in the file is used as the column names. If you do not click it, the first row of data is not used as the column names.

2. In Column name, set a name for each column. A column name contains no more than 128 letters, numbers, or underscores (_) and must start with a letter or number.

NOTE

You can rename columns in batches by clicking on the right side of Bulk edit column names. Enter all column names and separate them with commas (,).

3. In Column Type, select a type for each column.

Step 6 Click Create Table to create the table. Wait for Hue to display information about the Hive table.

----End

Manually Creating a Hive Table

Step 1 Access Metastore Manager and select a database in Databases.

The default database is default.

Step 2 Click to access the Create a new table manually page.

Step 3 Set a table name.

1. In Table Name, enter a Hive table name. A Hive table name contains no more than 128 letters, numbers, or underscores (_) and must start with a letter or number.

2. In Description, enter a description of the Hive table as required.

3. Click Next.

Step 4 Select a data storage format.

l If data needs to be separated by delimiters, select Delimited and perform Step 5.

l If data needs to be stored in serialization format, select SerDe and perform Step 6.


Step 5 Set a delimiter.

1. In Field terminator, set a column delimiter.

If the delimiter you want to select is not in the list, select Other.. and enter a delimiter.

2. In Collection terminator, set a delimiter to separate the data set of columns of the array type in Hive. For example, the type of a column is array and a value needs to store employee and manager. The user specifies : as the delimiter, so the final value is employee:manager.

3. In Map key terminator, set a delimiter to separate keys from values in columns of the map type in Hive. For example, the type of a column is map and a value needs to store home that is described as aaa and company that is described as bbb. The user defines | as the delimiter, so the final value is home|aaa:company|bbb.

4. Click Next and perform Step 7.
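The two terminators described in Step 5 can be seen at work in a quick local illustration. The values are taken from the examples above; the variable names are ours.

```shell
# Hedged illustration of the delimiters described above. For an array column,
# the collection terminator ':' joins the elements; for a map column, the map
# key terminator '|' joins each key to its value, and ':' separates entries.
COLLECTION_SEP=':'
MAPKEY_SEP='|'

ARRAY_VALUE="employee${COLLECTION_SEP}manager"
MAP_VALUE="home${MAPKEY_SEP}aaa${COLLECTION_SEP}company${MAPKEY_SEP}bbb"

echo "$ARRAY_VALUE"   # employee:manager
echo "$MAP_VALUE"     # home|aaa:company|bbb
```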

Step 6 Set serialization properties.

1. In SerDe Name, enter the class name of the serialization format: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.

Users can expand Hive to support more user-defined serialization classes.

2. In Serde properties, enter the values of the serialization format: "field.delim"="," "colelction.delim"=":" "mapkey.delim"="|". (Note that colelction.delim, although misspelled, is the property name actually used by Hive.)

3. Click Next and perform Step 7.

Step 7 Select a data table format and click Next.

l TextFile: indicates that data is stored in text files.

l SequenceFile: indicates that data is stored in binary files.

l InputFormat: indicates that data in files is used in the user-defined input and output formats.

Users can expand Hive to support more user-defined formatting classes.

a. In InputFormat Class, enter the class used by input data: org.apache.hadoop.hive.ql.io.RCFileInputFormat.

b. In OutputFormat Class, enter the class used by output data: org.apache.hadoop.hive.ql.io.RCFileOutputFormat.

Step 8 Select a file storage location and click Next.

Use default location is selected by default. If you want to customize a storage location, deselect the default value and specify a file storage location in External location.

Step 9 Set columns of the Hive table.

1. In Column name, set a column name.

A column name contains no more than 128 letters, numbers, or underscores (_) and must start with a letter or number.

2. In Column type, select a column type.

Click Add a column to add a new column.

3. Click Add a partition to add partitions for the Hive table, which can improve query efficiency.


Step 10 Click Create Table to create the table. Wait for Hue to display information about the Hive table.

----End

Managing the Hive Table

Step 1 Access Metastore Manager and select a database in Databases. All tables in the database are displayed on the page.

The default database is default.

Step 2 Click a table name in the database to view table details.

The following operations are supported: importing data, browsing data, deleting tables, and viewing the file storage location.

NOTE

When viewing all tables in the database, you can select tables and perform the following operations: viewing tables, browsing data, and deleting tables.

----End

7.6.4 Using File Browser on the Hue WebUI

Scenario

After Kerberos authentication is enabled for an MRS cluster, users can use the Hue WebUI to manage files in HDFS.

Prerequisites

An MRS cluster administrator has granted users the permission to view, create, modify, and delete files in HDFS. For details, see Creating a User.

Accessing File Browser

Step 1 Access the Hue WebUI and click File Browser.

Step 2 You can view the home directory of the current login user.

On the File Browser page, the following information about subdirectories and files in the directory is displayed.

Table 7-6 HDFS file attributes

Attribute Description

Name Name of a directory or file

Size File size

User Owner of a directory or file

Group Group of a directory or file


Permissions Permission of a directory or file

Date Time when a directory or file is created

Step 3 In the search box, enter a keyword. The system automatically searches directories or files in the current directory.

Step 4 Clear the search box. The system displays all directories or files.

----End

Performing Operations

Step 1 On File Browser, select one or more directories or files.

Step 2 Click Actions. On the menu that is displayed, select an operation.

l Rename: renames a directory or file.

l Move: moves a file. In Move to, select a new directory and click Move.

l Copy: copies the selected files or directories.

l Download: downloads the selected files. Directories are not supported.

l Change permissions: changes the permission to access the selected directory or file.

– You can grant the owner, the group, or other users the Read, Write, and Execute permissions.

– Sticky indicates that only HDFS administrators, directory owners, and file owners can delete or move files in the directory.

– Recursive indicates that permission is granted to subdirectories recursively.

l Storage policies: sets the policies for storing files or directories in HDFS.

l Summary: views HDFS storage information about the selected file or directory.

----End

Deleting Directories or Files

Step 1 On File Browser, select one or more directories or files.

Step 2 Click Move to trash. In Confirm Delete, click Yes to move them to the recycle bin.

If you want to directly delete the files without moving them to the recycle bin, click and select Delete forever. In Confirm Delete, click Yes to confirm the operation.

----End

Accessing Other Directories

Step 1 Click the directory name, type the full path you want to access, for example, /mr-history/tmp, and press Enter.

The current user must have permission to access other directories.


Step 2 Click Home to go to the home directory.

Step 3 Click History. The history records of directory access are displayed, and the directories can be accessed again.

Step 4 Click Trash to access the recycle bin of the current directory.

Click Empty trash to clean up the recycle bin.

----End

Uploading User Files

Step 1 On File Browser, click Upload.

Step 2 Select an operation.

l Files: uploads user files to the current directory.

l Zip/Tgz/Bz2 file: uploads a compressed file. In the dialog box that is displayed, click Select ZIP, TGZ or BZ2 files to select the compressed file to be uploaded. The system automatically decompresses the file in HDFS. Compressed files in ZIP, TGZ, and BZ2 formats are supported.
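The accepted archive formats above can be screened for locally before attempting an upload. A minimal sketch; the function is ours, not part of Hue, and matching is made case-insensitive as an assumption.

```shell
# Hedged helper: the upload dialog above accepts ZIP, TGZ, and BZ2 archives.
# This check screens file names before an upload attempt (case-insensitive).
is_supported_archive() {
  case "$(printf '%s' "$1" | tr 'A-Z' 'a-z')" in
    *.zip|*.tgz|*.bz2) return 0 ;;
    *) return 1 ;;
  esac
}

is_supported_archive 'data.ZIP' && echo supported
is_supported_archive 'data.tar' || echo unsupported
```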

----End

Creating a New File or Directory

Step 1 On File Browser, click New.

Step 2 Select an operation.

l File: creates a file. Enter a file name and click Create.

l Directory: creates a directory. Enter a directory name and click Create.

----End

7.6.5 Using Job Browser on the Hue WebUI

Scenario

After Kerberos authentication is enabled for an MRS cluster, users can use the Hue WebUI to query all jobs in the cluster.

Accessing Job Browser

Step 1 Access the Hue WebUI and click Job Browser.

Step 2 View the jobs in the cluster.

NOTE

The number on Job Browser indicates the total number of jobs in the cluster.

Job Browser displays the following job information.


Table 7-7 MRS job attributes

Attribute Description

Logs Log information. If a job has logs, a log icon is displayed.

ID Job ID, which is generated by the system automatically

Name Job name

Application Type Job type information

Status Job status. Possible values are RUNNING, SUCCEEDED, FAILED, and KILLED.

User User who starts the job

Maps Map progress

Reduces Reduce progress

Queue Yarn queue used for job running

Priority Job running priority

Duration Job running duration

Submitted Time when the job is submitted to the MRS cluster

NOTE

If the MRS cluster has Spark, the Spark-JDBCServer job is started by default to execute tasks.

----End

Searching for Jobs

Step 1 Enter keywords in Username or Text on Job Browser to search for the desired jobs.

Step 2 Clear the search criteria. The system displays all jobs.

----End

Querying Job Details

Step 1 In the job list on Job Browser, click the row that contains the desired job to view details.

Step 2 On the Metadata tab page, you can view the metadata of the job.

NOTE

You can click to open job running logs.

----End

7.7 Using Kafka


7.7.1 Managing Kafka Topics

Scenario

Users can manage Kafka topics on the MRS cluster client to meet service requirements.

Prerequisites

The client has been updated.

Procedure

Step 1 On MRS Manager, choose Service > ZooKeeper > Instance. Query the IP addresses of the ZooKeeper instances.

Record the IP address of any ZooKeeper instance.

Step 2 Log in to the node where the client is installed.

For example, if you have updated the client on the Master2 node, log in to the Master2 node to use the client. For details, see Client Management.

Step 3 Run the following command to switch the user:

sudo su - omm

Step 4 Run the following command to switch to the client directory, for example, /opt/client/Kafka/kafka/bin.

cd /opt/client/Kafka/kafka/bin

Step 5 Run the following command to configure the environment variables:

source /opt/client/bigdata_env

Step 6 If Kerberos authentication is enabled, run the following command to authenticate the user. If Kerberos authentication is disabled, skip this step.

kinit Kafka username

For example, kinit admin

Step 7 Manage Kafka topics using the following commands:

l Create a topic.

sh kafka-topics.sh --create --topic Topic name --partitions Number of partitions used by the topic --replication-factor Number of replicas of the topic --zookeeper IP address of the node where the ZooKeeper instance is located:clientPort/kafka

l Delete a topic.

sh kafka-topics.sh --delete --topic Topic name --zookeeper IP address of the node where the ZooKeeper instance is located:clientPort/kafka


NOTE

l The number of topic partitions or topic backups cannot exceed the number of Kafka instances.

l By default, clientPort of ZooKeeper is 24002.

l There are three ZooKeeper instances. Use the IP address of any one.

l For details about managing messages in Kafka topics, see Managing Messages in Kafka Topics.
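The first constraint in the NOTE can be checked before running kafka-topics.sh. A minimal local sketch; the function name and its output strings are ours, not part of the Kafka client.

```shell
# Hedged pre-check for the NOTE above: the number of topic partitions and the
# number of topic replicas cannot exceed the number of Kafka instances.
check_topic_params() {
  partitions=$1; replicas=$2; instances=$3
  if [ "$partitions" -gt "$instances" ] || [ "$replicas" -gt "$instances" ]; then
    echo "rejected"
  else
    echo "ok"
  fi
}

check_topic_params 3 2 3   # ok: 3 partitions, 2 replicas fit on 3 instances
check_topic_params 5 2 3   # rejected: more partitions than instances
```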

----End

7.7.2 Querying Kafka Topics

Scenario

Users can query existing Kafka topics on MRS Manager.

Procedure

Step 1 Log in to MRS Manager.

Step 2 Choose Service > Kafka > Kafka Topic Monitor.

All topics are displayed in the list by default. Users can view the number of partitions and backups of the topics.

Step 3 Click the desired topic in the list to view its details.

----End

7.7.3 Managing Kafka User Permission

Scenario

For clusters with Kerberos authentication enabled, using Kafka requires relevant permission. MRS clusters can grant the use permission of Kafka to different users.

Table 7-8 lists the default Kafka user groups.

Table 7-8 Default Kafka user groups

User Group Description

kafkaadmin Kafka administrator group. Users in this group have the permission to create, delete, read, and write all topics, and to authorize other users.

kafkasuperuser Kafka super user group. Users in this group have the permission to read and write all topics.

kafka Kafka common user group. Users in this group must be authorized by users in kafkaadmin to read and write certain topics.


Prerequisites

l The client has been updated.

l A user in the kafkaadmin group, for example, admin, has been prepared.

Procedure

Step 1 On MRS Manager, choose Service > ZooKeeper > Instance. Query the IP addresses of the ZooKeeper instances.

Record the IP address of any ZooKeeper instance.

Step 2 Log in to the node where the client is installed.

For example, if you have updated the client on the Master2 node, log in to the Master2 node to use the client. For details, see Client Management.

Step 3 Run the following command to switch the user:

sudo su - omm

Step 4 Run the following command to switch to the client directory, for example, /opt/client/Kafka/kafka/bin.

cd /opt/client/Kafka/kafka/bin

Step 5 Run the following command to configure the environment variables:

source /opt/client/bigdata_env

Step 6 Run the following command to authenticate the Kafka administrator account.

kinit Administrator account

For example, kinit admin

Step 7 Manage Kafka user permission using the following commands:

l Query the permission list of a topic.

sh kafka-acls.sh --authorizer-properties zookeeper.connect=IP address of the node where the ZooKeeper instance is located:24002/kafka --list --topic Topic name

l Add producer permission to a user.

sh kafka-acls.sh --authorizer-properties zookeeper.connect=IP address of the node where the ZooKeeper instance is located:24002/kafka --add --allow-principal User:Username --producer --topic Topic name

l Remove producer permission of a user.

sh kafka-acls.sh --authorizer-properties zookeeper.connect=IP address of the node where the ZooKeeper instance is located:24002/kafka --remove --allow-principal User:Username --producer --topic Topic name

l Add consumer permission to a user.

sh kafka-acls.sh --authorizer-properties zookeeper.connect=IP address of the node where the ZooKeeper instance is located:24002/kafka --add --allow-principal User:Username --consumer --topic Topic name --group Consumer group name

l Remove consumer permission of a user.


sh kafka-acls.sh --authorizer-properties zookeeper.connect=IP address of the node where the ZooKeeper instance is located:24002/kafka --remove --allow-principal User:Username --consumer --topic Topic name --group Consumer group name

NOTE

You need to enter y twice to confirm the removal of permission.
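The four ACL commands above differ only in the action (--add or --remove) and the role (--producer or --consumer). They can be assembled from variables, as in the following sketch; the ZooKeeper address and user name are placeholders, and the resulting command must be run on a cluster node with the client environment sourced.

```shell
# Hedged sketch: build the kafka-acls.sh command line shown above from
# variables. ZK_CONNECT is a placeholder; 24002 is the default ZooKeeper
# clientPort mentioned earlier in this section.
ZK_CONNECT='192.0.2.10:24002/kafka'

build_acl_cmd() {
  action=$1 user=$2 role=$3 topic=$4
  echo "sh kafka-acls.sh --authorizer-properties zookeeper.connect=$ZK_CONNECT --$action --allow-principal User:$user --$role --topic $topic"
}

build_acl_cmd add testuser producer test-topic
```

The consumer variants additionally take --group Consumer group name, as shown in the commands above.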

----End

7.7.4 Managing Messages in Kafka Topics

Scenario

Users can produce or consume messages in Kafka topics using the MRS cluster client. For clusters with Kerberos authentication enabled, the user must have the permission to perform these operations.

Prerequisites

The client has been updated.

Procedure

Step 1 On MRS Manager, choose Service > Kafka > Instance. Query the IP addresses of the Kafka instances.

Record the IP address of any Kafka instance.

Step 2 Log in to the node where the client is installed.

For example, if you have updated the client on the Master2 node, log in to the Master2 node to use the client. For details, see Client Management.

Step 3 Run the following command to switch the user:

sudo su - omm

Step 4 Run the following command to switch to the client directory, for example, /opt/client/Kafka/kafka/bin.

cd /opt/client/Kafka/kafka/bin

Step 5 Run the following command to configure the environment variables:

source /opt/client/bigdata_env

Step 6 If Kerberos authentication is enabled, run the following command to authenticate the user. If Kerberos authentication is disabled, skip this step.

kinit Kafka username

For example, kinit admin

Step 7 Manage messages in Kafka topics using the following commands:

l Produce messages.

sh kafka-console-producer.sh --broker-list IP address of the node where the Kafka instance is located:21005 --topic Topic name --producer.config /opt/client/Kafka/kafka/config/producer.properties


You can input specified information as the messages produced by the producer and then press Enter to send the messages. To end message producing, press Ctrl+C to exit.

l Consume messages.

sh kafka-console-consumer.sh --topic Topic name --bootstrap-server IP address of the node where the Kafka instance is located:21005 --new-consumer --consumer.config /opt/client/Kafka/kafka/config/consumer.properties

In the configuration file, group.id (indicating the consumer group) is set to example-group1 by default. Users can change the value as required. The value takes effect each time a consumption occurs.

By default, the system reads unprocessed messages in the current consumer group when the command is executed. If a new consumer group is specified in the configuration file and the --from-beginning parameter is added to the command, the system reads all messages that have not been automatically deleted in Kafka.

NOTE

l For the IP address of the node where the Kafka instance is located, use the IP address of any Broker instance.

l If Kerberos authentication is enabled, change the port to 21007.
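The port rule in the NOTE above can be captured in a tiny helper. A minimal sketch; the function name and its argument convention are ours, while 21005 and 21007 are the ports stated above.

```shell
# Hedged helper for the NOTE above: Kafka broker port 21005 is used when
# Kerberos authentication is disabled and 21007 when it is enabled.
kafka_broker_port() {
  if [ "$1" = "kerberos" ]; then
    echo 21007
  else
    echo 21005
  fi
}

kafka_broker_port kerberos   # 21007
kafka_broker_port plain      # 21005
```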

----End

7.8 Using Storm

7.8.1 Submitting Storm Topologies on the Client

Scenario

Users can submit Storm topologies on the MRS cluster client to continuously process stream data. For clusters with Kerberos authentication enabled, users who submit topologies must be members of the stormadmin or storm group.

Prerequisites

The client has been updated.

Procedure

Step 1 Log in to the node where the client is installed.

For example, if you have updated the client on the Master2 node, log in to the Master2 node to use the client. For details, see Client Management.

Step 2 Run the following command to switch the user:

sudo su - omm

Step 3 Run the following command to switch to the client directory, for example, /opt/client:

cd /opt/client

Step 4 Run the following command to configure the environment variables:

source bigdata_env


Step 5 If Kerberos authentication is enabled, run the following command to authenticate the user. IfKerberos authentication is disabled, skip this step.

kinit Storm username

For example, kinit admin

Step 6 Run the following command to submit the Storm topology:

storm jar Topology package path Class name of the main topology method Topology name

The topology package path is /opt/client/Storm/storm-xxx/examples/storm-starter/storm-starter-topologies-xxx.jar, in which xxx indicates the Storm version. The following example command is used to submit the Storm 1.0.2 client topology.

storm jar /opt/client/Storm/storm-1.0.2/examples/storm-starter/storm-starter-topologies-1.0.2.jar org.apache.storm.starter.WordCountTopology topo1

If the following information is displayed, the topology is submitted successfully.

Finished submitting topology: topo1

NOTE

l To support sampling messages, add the topology.debug and topology.eventlogger.executors parameters. The example command is as follows:

storm jar /opt/client/Storm/storm-1.0.2/examples/storm-starter/storm-starter-topologies-1.0.2.jar org.apache.storm.starter.WordCountTopology topo1 -c topology.debug=true -c topology.eventlogger.executors=1

l Data processing methods vary with topologies. The topology in the example generates characters randomly and separates character strings. To query the processing status, enable the sampling function and perform operations according to Querying Data Processing Logs of the Topology.

Step 7 Run the following command to query Storm topologies. For clusters with Kerberos authentication enabled, only users in the stormadmin or storm group can query all topologies.

storm list

----End

7.8.2 Accessing the Storm WebUI

Scenario

The Storm WebUI provides a graphical interface for using Storm. Only streaming clusters with Kerberos authentication enabled support this function.

Prerequisites

- The password of user admin has been obtained. The admin password is specified by the user when the MRS cluster is created.
- If a user other than admin is used to access the Storm WebUI, the user must be added to the storm or stormadmin user group.

Procedure

Step 1 Access MRS Manager.

MapReduce Service User Guide    7 Using MRS    2019-01-15


Step 2 Choose Service > Storm. In Storm WebUI of Storm Summary, click any UI link to access the Storm WebUI.

NOTE

When accessing the Storm WebUI for the first time, you must add the address to the trusted site list.

The following information can be queried on the Storm WebUI:

- Storm cluster summary
- Nimbus summary
- Topology summary
- Supervisor summary
- Nimbus configurations

----End

Relevant Tasks

Query topology details.

Step 1 Access the Storm WebUI.

Step 2 In Topology Summary, click the desired topology to view its detailed information, status, Spouts information, Bolts information, and configurations.

----End

7.8.3 Managing Storm Topologies

Scenario

Users can manage Storm topologies on the Storm WebUI. Users in the storm group can manage only the topology tasks submitted by themselves, while users in the stormadmin group can manage all topology tasks.

Procedure

Step 1 Access the Storm WebUI.

Step 2 In the Topology summary area, click the desired topology.

Step 3 Use options in Topology actions to manage the Storm topology.

- Activate the topology: click Activate.
- Deactivate the topology: click Deactivate.
- Re-deploy the topology: click Rebalance and specify the wait time (in seconds) of the re-deployment. Generally, if the number of nodes in a cluster changes, the topology can be re-deployed to maximize resource usage.
- Delete the topology: click Kill and specify the wait time (in seconds) of the deletion.


- Start or stop sampling messages: click Debug. In the dialog box that is displayed, specify the percentage of the data volume to be sampled. For example, if the value is set to 10, 10% of the data is sampled. To stop sampling, click Stop Debug.

NOTE

This function is available only when the sampling function is enabled during topology submission. For details about querying data processing information, see Querying Data Processing Logs of the Topology.

- Modify the topology log level: click Change Log Level to specify the new log level.

Step 4 Display the topology.

In the Topology Visualization area, click Show Visualization to visualize the topology.

----End

7.8.4 Querying Storm Topology Logs

Scenario

Users can query topology logs to check the execution of a Storm topology in a worker process. To query the data processing logs of a topology, users must enable the Debug function when submitting the topology.

Prerequisites

- The network of the work environment has been configured according to Related Operation.
- The sampling function has been enabled for the topology.

Querying Worker Process Logs

Step 1 Access the Storm WebUI.

Step 2 In the Topology Summary area, click the desired topology to view details.

Step 3 Click the desired Spouts or Bolts task. In the Executors (All time) area, click a port in Port to view detailed logs.

----End

Querying Data Processing Logs of the Topology

Step 1 Access the Storm WebUI.

Step 2 In the Topology Summary area, click the desired topology to view details.

Step 3 Click Debug, specify the data sampling ratio, and click OK.

Step 4 Click the Spouts or Bolts task. In Component summary, click events to view data processing logs.

----End


7.9 Using CarbonData

7.9.1 Getting Started with CarbonData

This section describes the procedure of using Spark CarbonData. All tasks are based on the Spark-beeline environment. The following tasks are included:

1. Connect to Spark. Before performing any operation on CarbonData, users must connect CarbonData to Spark.

2. Create a CarbonData table. After CarbonData connects to Spark, users must create a CarbonData table to load and query data.

3. Load data to the CarbonData table. Users load data from CSV files in HDFS to the CarbonData table.

4. Query data in CarbonData. After data is loaded to the CarbonData table, users can run query commands such as groupby and where.

Prerequisites

The client has been updated.

Procedure

Step 1 Connect CarbonData to Spark.

1. Log in to the node where the client is installed. For example, if you have updated the client on the Master2 node, log in to the Master2 node to use the client. For details, see Client Management.

2. Switch the user and configure environment variables.
sudo su - omm
source /opt/client/bigdata_env

3. If Kerberos authentication is enabled, run the following command to authenticate the user. If Kerberos authentication is disabled, skip this step.
kinit Spark username
The user must be added to the hive group.

4. Run the following command to connect to the Spark environment.
spark-beeline

Step 2 Create a CarbonData table.

Run the following command to create a CarbonData table, which is used to load and query data.

CREATE TABLE x1 (imei string, deviceInformationId int, mac string, productdate timestamp, updatetime timestamp, gamePointId double, contractNumber double)


STORED BY 'org.apache.carbondata.format'

TBLPROPERTIES('DICTIONARY_EXCLUDE'='mac','DICTIONARY_INCLUDE'='deviceInformationId');

Command result:

+---------+--+
| result  |
+---------+--+
+---------+--+
No rows selected (1.551 seconds)

Step 3 Load data from CSV files to the CarbonData table.

Currently, only CSV files are supported. The CSV column names specified in the LOAD command must be the same as, and in the same sequence as, the column names in the CarbonData table. The data formats and number of data columns in the CSV files must also be the same as those in the CarbonData table.

The CSV files must be stored on HDFS. Users can upload the files to OBS and import them from OBS to HDFS on the File Management page of the MRS management console. If Kerberos authentication is enabled, prepare the CSV files in the work environment and import them to HDFS using open-source HDFS commands. In addition, assign the Spark user the read and execute permissions on the files in HDFS.

For example, the data.csv file is saved in the tmp directory of HDFS and has the following contents:

x123,111,dd,2017-04-20 08:51:27,2017-04-20 07:56:51,2222,33333
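As a sketch, the sample file can be prepared locally and copied into HDFS with the open-source client. The local path /tmp/data.csv is an assumption, and the hdfs dfs -put step applies only on a node where the HDFS client environment variables have been sourced.

```shell
# Write the one-row sample shown above to a local file.
cat > /tmp/data.csv <<'EOF'
x123,111,dd,2017-04-20 08:51:27,2017-04-20 07:56:51,2222,33333
EOF

# Copy it to the tmp directory of HDFS if an HDFS client is available
# (skipped silently on hosts without the client).
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -put -f /tmp/data.csv /tmp/data.csv
fi
```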

The command for loading data from the file is as follows:

LOAD DATA inpath 'hdfs://hacluster/tmp/data.csv' into table x1 options('DELIMITER'=',','QUOTECHAR'='"','FILEHEADER'='imei,deviceinformationid,mac,productdate,updatetime,gamepointid,contractnumber');

Command result:

+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (3.039 seconds)

Step 4 Query data in the CarbonData table.

- Obtaining the number of records
Run the following command to obtain the number of records in the CarbonData table:
select count(*) from x1;

- Querying with the groupby condition
Run the following command to obtain the deviceinformationid records without repetition in the CarbonData table:
select deviceinformationid,count (distinct deviceinformationid) from x1 group by deviceinformationid;

- Querying with the where condition
Run the following command to obtain specific deviceinformationid records:
select * from x1 where deviceinformationid='111';


NOTE

If the query result has Chinese or other non-English characters, the columns in the query result may not be aligned. This is because characters of different languages occupy different display widths.

Step 5 Run the following command to exit the Spark environment.

!quit

----End

7.9.2 About CarbonData Table

Overview

CarbonData tables are similar to tables in a relational database management system (RDBMS). RDBMS tables consist of rows and columns that store data. CarbonData tables also have fixed columns and store structured data. In CarbonData, data is saved in entity files.

Data Types Supported

CarbonData tables support the following data types:

- Int
- String
- BigInt
- Decimal
- Double
- TimeStamp

Table 7-9 describes the details about the data types.

Table 7-9 CarbonData data types

Int: 4-byte signed integer ranging from -2,147,483,648 to 2,147,483,647.
NOTE: If non-dictionary columns have Int data, the data is saved as BigInt data in the table.

String: The maximum character string length is 100,000.

BigInt: Data is saved using the 64-bit technology. The value ranges from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.

Decimal: The default value is (10,0) and the maximum value is (38,38).
NOTE: If a query condition is used, users can add BD after the number to obtain accurate results. For example: select * from carbon_table where num = 1234567890123456.22BD.

Double: Data is saved using the 64-bit technology. The value ranges from 4.9E-324 to 1.7976931348623157E308.

TimeStamp: The default format is yyyy-MM-dd HH:mm:ss.


NOTE

Measurement of all Int data is processed and displayed using the BigInt data type.

7.9.3 Creating a CarbonData Table

Scenario

A CarbonData table must be created to load and query data.

Creating a Table with Self-Defined Columns

Users can create a table by specifying its columns and data types. For analysis clusters with Kerberos authentication enabled, if a user wants to create a CarbonData table in a database other than the default database, the Create permission on the database must be added to the role that the user is bound to in Hive role management.

Command example:

CREATE TABLE IF NOT EXISTS productdb.productSalesTable (

productNumber Int,

productName String,

storeCity String,

storeProvince String,

revenue Int)

STORED BY 'org.apache.carbondata.format'

TBLPROPERTIES (

'table_blocksize'='128',

'DICTIONARY_EXCLUDE'='productName',

'DICTIONARY_INCLUDE'='productNumber');

The following table describes the command parameters.

Table 7-10 Parameter description

Parameter Description

productSalesTable: Indicates the table name. The table is used to load data for analysis. The table name consists of letters, digits, and underscores (_).


productdb: Indicates the database name. The database maintains logical connections with the tables it stores, to identify and manage them. The database name consists of letters, digits, and underscores (_).

productNumber, productName, storeCity, storeProvince, revenue: Indicate the columns in the table. The columns are the service entities for data analysis. Each column name (field name) consists of letters, digits, and underscores (_).

table_blocksize: Indicates the block size of the data files used by the CarbonData table. The value ranges from 1 MB to 2048 MB. The default is 1024 MB.
- If table_blocksize is too small, a large number of small files will be generated when data is loaded, which may affect the performance of HDFS.
- If table_blocksize is too large, a large volume of data must be read from a block and the read concurrency is low when data is queried. As a result, the query performance deteriorates.
It is advised to set the block size based on the data volume. For example, set the block size to 256 MB for GB-level data, 512 MB for TB-level data, and 1024 MB for PB-level data.

DICTIONARY_EXCLUDE: Specifies the columns that do not generate dictionaries. This parameter is optional and applicable to columns of high complexity. By default, the system generates dictionaries for columns of the String type. However, as the number of values in the dictionaries increases, conversion operations by the dictionaries increase and the system performance deteriorates. Generally, if a column has over 50,000 unique data records, it is considered a highly complex column and dictionary generation must be disabled.
NOTE: Non-dictionary columns support only the String and Timestamp data types.

DICTIONARY_INCLUDE: Specifies the columns that generate dictionaries. This parameter is optional and applicable to columns of low complexity (with fewer than 50,000 unique data records). It improves the performance of queries with the groupby condition.


7.9.4 Deleting a CarbonData Table

Scenario

Unused CarbonData tables can be deleted. After a CarbonData table is deleted, its metadata and loaded data are deleted together.

Procedure

Step 1 Run the following command to delete a CarbonData table.

DROP TABLE [IF EXISTS] [db_name.]table_name;

db_name is optional. If db_name is not specified, the table named table_name in the current database is deleted.

For example, run the following command to delete the productSalesTable table in the productdb database:

DROP TABLE productdb.productSalesTable;

Step 2 Run the following command to check whether the table is deleted.

SHOW TABLES;

----End

7.10 Using Flume

7.10.1 Introduction

Process

The process for collecting logs using Flume is as follows:

1. Install the Flume client.
2. Configure the Flume server and client parameters.
3. Collect and query logs using the Flume client.
4. Stop and uninstall the Flume client.

Flume Client

A Flume client consists of the source, channel, and sink. The source sends the data to the channel, and then the sink transmits the data from the channel to the external device.


Table 7-11 Module description

Module Description

Source: A source receives or generates data and sends the data to one or more channels. Sources can work in either data-driven or polling mode.
Typical sources include:
- Syslog and Netcat, which are integrated in the system to receive data
- Exec and SEQ, which generate event data automatically
- Avro, which is used for communication between agents
A source must be associated with at least one channel.

Channel: A channel is used to buffer data between a source and a sink. After the sink transmits the data to the next channel or the destination, the cache is deleted automatically.
The persistency of a channel varies with the channel type:
- Memory channel: no persistency
- File channel: persistency implemented based on write-ahead logging (WAL)
- JDBC channel: persistency implemented based on the embedded database
Channels support the transaction feature to ensure simple sequential operations. A channel can work with any number of sources and sinks.

Sink: A sink transmits data to the next hop or the destination. After the transmission is complete, it deletes the data from the channel.
Typical sinks include:
- HDFS and Kafka, which store data at the destination
- Null sink, which automatically consumes the data
- Avro, which is used for communication between agents
A sink must be associated with at least one channel.

A Flume client can have multiple sources, channels, and sinks. A source can send data to multiple channels, and then multiple sinks send the data out of the client.

Multiple Flume clients can be cascaded. That is, a sink can send data to the source of anotherclient.

Supplementary Information

1. What are the reliability measures of Flume?

– The transaction mechanism is implemented between sources and channels, and between channels and sinks.

– The sink processor supports failover and load balancing.
The following is an example of the load balancing configuration:
server.sinkgroups=g1
server.sinkgroups.g1.sinks=k1 k2
server.sinkgroups.g1.processor.type=load_balance
server.sinkgroups.g1.processor.backoff=true
server.sinkgroups.g1.processor.selector=random
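For the failover case, a sink group can be configured in the same style. The sketch below uses the open-source Flume failover sink processor properties; the group name g1, the sink names k1 and k2, and the priority values are illustrative:

```properties
server.sinkgroups=g1
server.sinkgroups.g1.sinks=k1 k2
server.sinkgroups.g1.processor.type=failover
# Higher priority wins; events go to k1 while it is healthy.
server.sinkgroups.g1.processor.priority.k1=10
server.sinkgroups.g1.processor.priority.k2=5
# Upper bound, in milliseconds, on the backoff applied to a failed sink.
server.sinkgroups.g1.processor.maxpenalty=10000
```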


2. What are the precautions for the aggregation and cascading of multiple Flume clients?
– Use the Avro or Thrift protocol for cascading.
– When the aggregation end contains multiple nodes, evenly distribute the clients to these nodes. Do not connect all the clients to a single node.

7.10.2 Installing the Flume Client

Scenario

To use Flume to collect logs, you must install the Flume client on the log host. You can create an ECS to install the client.

Prerequisites

- A streaming cluster with the Flume component has been created.
- The log host is in the same VPC and subnet as the cluster. For details, see Using the Client on Another Node of a VPC.
- You have obtained the username and password for logging in to the log host.

Procedure

Step 1 Create an ECS that meets the requirements in the prerequisites.

Step 2 Log in to MRS Manager. Choose Service > Flume > Download Client.

1. In Client Type, select All client files.
2. In Download Path, select Remote host.
3. Set Host IP Address to the IP address of the ECS, set Host Port to 22, and set Save Path to /home/linux.
– If the default port 22 for logging in to an ECS using SSH has been changed, set Host Port to the new port.
– Save Path contains a maximum of 256 characters.
4. Set Login User to linux. If other users are used, ensure that they have read, write, and execute permissions on the save path.
5. In SSH Private Key, select and upload the private key used for creating the cluster.
6. Click OK to start downloading the client to the ECS.

If the following information is displayed, the client package is successfully saved:
Client files downloaded to the remote host successfully.

Step 3 Click Instance. Query the Business IP Address of any Flume instance and of any two MonitorServer instances.

Step 4 Log in to the ECS using VNC. See "Logging In to a Linux ECS Using VNC" in the Elastic Cloud Server User Guide (Getting Started > Logging In to an ECS > Logging In to a Linux ECS Using VNC).

All standard (Standard_xxx) and enterprise (Enterprise_xxx) images support Cloud-Init. The preset username and password for Cloud-Init are linux and cloud.1234, respectively. If you have changed the password, log in to the ECS using the new password. See "How Do I Log In to an ECS Once All Images Support Cloud-Init?" in the Elastic Cloud Server User Guide


(FAQs > Login FAQs > How Do I Log In to an ECS Once All Images Support Cloud-Init?). It is recommended that you change the password upon the first login.

Step 5 On the ECS, switch to user root and copy the installation package to the /opt directory.

sudo su - root

cp /home/linux/MRS_Flume_Client.tar /opt

Step 6 Run the following command in the /opt directory to decompress the package and obtain the verification file and the configuration package of the client:

tar -xvf MRS_Flume_Client.tar

Step 7 Run the following command to verify the configuration package of the client:

sha256sum -c MRS_Flume_ClientConfig.tar.sha256

The command output is as follows:

MRS_Flume_ClientConfig.tar: OK
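The verification pattern above is standard sha256sum usage and can be tried on any file. In the sketch below, both the package and its checksum file are generated locally for illustration; MRS ships the real .sha256 file alongside the client package.

```shell
# Sketch of the same integrity check on a locally created file.
cd /tmp
printf 'example client package\n' > demo_pkg.tar
sha256sum demo_pkg.tar > demo_pkg.tar.sha256

# Succeeds while the file is unmodified; any change to demo_pkg.tar
# makes the check report FAILED and exit nonzero.
sha256sum -c demo_pkg.tar.sha256
```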

Step 8 Run the following command to decompress MRS_Flume_ClientConfig.tar:

tar -xvf MRS_Flume_ClientConfig.tar

Step 9 Run the following command to install the client running environment to a new directory, for example, /opt/Flumeenv. The directory is automatically generated during installation.

sh /opt/MRS_Flume_ClientConfig/install.sh /opt/Flumeenv

If the following information is displayed, the client running environment is successfully installed:

Components client installation is complete.

Step 10 Run the following command to configure the environment variable:

source /opt/Flumeenv/bigdata_env

Step 11 Run the following commands to decompress the Flume client package:

cd /opt/MRS_Flume_ClientConfig/Flume

tar -xvf FusionInsight-Flume-1.6.0.tar.gz

Step 12 Run the following command to check whether the password of the current user has expired:

chage -l root

If the value of Password expires is earlier than the current time, the password has expired. In that case, run the chage -M -1 root command so that the password no longer expires.
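The "Password expires" field can also be extracted from the chage -l output in a script. This is a sketch; the sample text below imitates the usual output format, which may vary with the distribution and locale.

```shell
# Extract the value of the "Password expires" field from
# "chage -l"-style output read from stdin.
password_expiry() {
  awk -F': ' '/^Password expires/ { print $2 }'
}

# Illustrative sample; on a real host use: chage -l root | password_expiry
sample='Last password change                                    : Sep 08, 2017
Password expires                                        : never
Password inactive                                       : never'

printf '%s\n' "$sample" | password_expiry   # prints "never"
```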

Step 13 Run the following command to install the Flume client to a new directory, for example, /opt/FlumeClient. The directory is automatically generated during installation.

sh /opt/MRS_Flume_ClientConfig/Flume/install.sh -d /opt/FlumeClient -f Service IP addresses of the MonitorServer instances -c Path of the Flume configuration file -l /var/log/ -e Service IP address of Flume -n Name of the Flume client

The parameters are described as follows:

- -d: indicates the installation path of the Flume client.


- -f: (optional) indicates the service IP addresses of the two MonitorServer instances, separated by a comma. If the IP addresses are not configured, the Flume client will not send alarm information to MonitorServer, and the client information will not be displayed on MRS Manager.

- -c: (optional) indicates the properties.properties configuration file that the Flume client loads after installation. If this parameter is not specified, the fusioninsight-flume-1.6.0/conf/properties.properties file in the client installation directory is used by default. The configuration file of the client is empty. You can modify properties.properties as required, and the Flume client will load it automatically.

- -l: (optional) indicates the log directory. The default value is /var/log/Bigdata.
- -e: (optional) indicates the service IP address of the Flume instance. It is used to receive the monitoring indicators reported by the client.
- -n: (optional) indicates the name of the Flume client.
- IBM JDK does not support -Xloggc. You must change -Xloggc to -Xverbosegclog in flume/conf/flume-env.sh. For 32-bit JDK, the value of -Xmx must not exceed 3.25 GB.

For example, run sh install.sh -d /opt/FlumeClient.

If the following information is displayed, the client is successfully installed:

install flume client successfully.

----End

7.10.3 Viewing Flume Client Logs

Scenario

This section describes how to locate problems using logs.

Prerequisites

You have correctly installed the Flume client.

Procedure

Step 1 Go to the Flume client log directory (/var/log/Bigdata by default).

Step 2 Run the following command to view the list of log files:

ls -lR flume-client-*

A log file example is shown as follows:

flume-client-1/flume:
total 7672
-rw-------. 1 root root 0 Sep 8 19:43 Flume-audit.log
-rw-------. 1 root root 1562037 Sep 11 06:05 FlumeClient.2017-09-11_04-05-09.[1].log.zip
-rw-------. 1 root root 6127274 Sep 11 14:47 FlumeClient.log
-rw-------. 1 root root 2935 Sep 8 22:20 flume-root-20170908202009-pid72456-gc.log.0.current
-rw-------. 1 root root 2935 Sep 8 22:27 flume-root-20170908202634-pid78789-gc.log.0.current
-rw-------. 1 root root 4382 Sep 8 22:47 flume-root-20170908203137-pid84925-gc.log.0.current
-rw-------. 1 root root 4390 Sep 8 23:46 flume-root-20170908204918-pid103920-gc.log.0.current
-rw-------. 1 root root 3196 Sep 9 10:12 flume-root-20170908215351-pid44372-gc.log.0.current
-rw-------. 1 root root 2935 Sep 9 10:13 flume-root-20170909101233-pid55119-gc.log.0.current
-rw-------. 1 root root 6441 Sep 9 11:10 flume-root-20170909101631-pid59301-gc.log.0.current
-rw-------. 1 root root 0 Sep 9 11:10 flume-root-20170909111009-pid119477-gc.log.0.current
-rw-------. 1 root root 92896 Sep 11 13:24 flume-root-20170909111126-pid120689-gc.log.0.current
-rw-------. 1 root root 5588 Sep 11 14:46 flume-root-20170911132445-pid42259-gc.log.0.current
-rw-------. 1 root root 2576 Sep 11 13:24 prestartDetail.log
-rw-------. 1 root root 3303 Sep 11 13:24 startDetail.log
-rw-------. 1 root root 1253 Sep 11 13:24 stopDetail.log

flume-client-1/monitor:
total 8
-rw-------. 1 root root 141 Sep 8 19:43 flumeMonitorChecker.log
-rw-------. 1 root root 2946 Sep 11 13:24 flumeMonitor.log

FlumeClient.log is the run log of the Flume client.

----End

7.10.4 Stopping or Uninstalling the Flume Client

Scenario

This section describes how to stop and start the Flume client, as well as how to uninstall it when the Flume data collection channel is not required.

Procedure

- Stopping the Flume client

Suppose the installation path of the Flume client is /opt/FlumeClient. Run the following command to stop the Flume client:

cd /opt/FlumeClient/fusioninsight-flume-1.6.0/bin

./flume-manage.sh stop

If the following information is displayed after the command execution, the Flume client is successfully stopped.

Stop Flume PID=120689 successful..

NOTE

The Flume client will be automatically restarted after being stopped. If you do not need automatic restart, run the following command:
./flume-manage.sh stop force
If you want to restart the Flume client, run the following command:
./flume-manage.sh start force

- Uninstalling the Flume client

Suppose the installation path of the Flume client is /opt/FlumeClient. Run the following command to uninstall the Flume client:

cd /opt/FlumeClient/fusioninsight-flume-1.6.0/inst


./uninstall.sh

7.10.5 Using the Encryption Tool of the Flume Client

Scenario

The Flume client provides an encryption tool to encrypt some parameter values in the configuration file.

Prerequisites

You have installed the Flume client.

Procedure

Step 1 Log in to the Flume client node and go to the client installation directory, for example, /opt/FlumeClient.

Step 2 Run the following command to switch the directory:

cd fusioninsight-flume-1.6.0/bin

Step 3 Run the following command to encrypt information:

./genPwFile.sh

Enter the information to be encrypted, and then enter it again for confirmation.

Step 4 Run the following command to query the encrypted information:

cat password.property

NOTE

If the encryption parameter is used for the Flume server, you need to perform encryption on the corresponding Flume server node. The path of the encryption script is /opt/Bigdata/FusionInsight/FusionInsight-Flume-1.6.0/flume/bin/genPwFile.sh. Execute the encryption script as user omm.

----End

7.10.6 Flume Configuration Parameter Description

Scenario

This section describes how to configure the sources, channels, and sinks of Flume, and how to modify the configuration items of each module.

NOTE

You must input encrypted information for some configurations. For details on how to encrypt information, see Using the Encryption Tool of the Flume Client.

Common Source Configurations

- Avro Source

An Avro source listens on the Avro port, receives data from the external Avro client, and places the data into configured channels. Common configurations are as follows.


Table 7-12 Common configurations of an Avro source

channels (default: -): Channel connected to the source. Multiple channels can be configured but must be separated by spaces. To define the flow within a single agent, you need to link the sources and sinks via a channel. A source instance can specify multiple channels, but a sink instance can specify only one channel. The format is as follows:
<Agent>.sources.<Source>.channels = <channel1> <channel2> <channel3>...
<Agent>.sinks.<Sink>.channels = <channel1>

type (default: avro): Type, which is set to avro. The type of each source is fixed.

bind (default: -): Host name or IP address, associated with the source, that the source binds to.

port (default: -): Bound port.

ssl (default: false): Whether to use SSL encryption. The value can be either true or false.

truststore-type (default: JKS): Java truststore type. Enter JKS or another supported Java truststore type.

truststore (default: -): Java truststore file.

truststore-password (default: -): Java truststore password.

keystore-type (default: JKS): Keystore type. Enter JKS or another supported Java keystore type.

keystore (default: -): Keystore file.

keystore-password (default: -): Keystore password.
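Assembled from the parameters above, a minimal properties.properties fragment for an Avro source might look as follows. The agent name (server), the source and channel names, the bind address, and the port are illustrative assumptions:

```properties
server.sources=avro_src
server.channels=ch1

# Avro source listening for events from an upstream agent or client.
server.sources.avro_src.type=avro
server.sources.avro_src.bind=192.168.0.10
server.sources.avro_src.port=21154
server.sources.avro_src.ssl=false
server.sources.avro_src.channels=ch1

# In-memory channel (no persistency, per the Channel description above).
server.channels.ch1.type=memory
server.channels.ch1.capacity=10000
```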

- Spooling Source

A Spooling source monitors and transmits new files that have been added to directories in quasi-real-time mode. Common configurations are as follows.

MapReduce ServiceUser Guide 7 Using MRS

2019-01-15 404

Page 414: Huawei › en-us › eu-west-0-user... · 2019-12-14 · Contents 1 Overview.........................................................................................................................................1

Table 7-13 Common configurations of a Spooling source

Parameter DefaultValue

Description

channels - Channel connected to the source.Multiple channels can be configured.

type spooldir Type, which is set to spooldir.

monTime 0 (disabled) Thread monitoring threshold. When theupdate time (seconds) exceeds thethreshold, the source is restarted.

spoolDir - Monitoring directory.

fileSuffix .COMPLETED

Suffix added after file transmission iscomplete.

deletePolicy never Source file deletion policy after filetransmission is complete. The value canbe either never or immediate.

ignorePattern ^$ Regular expression of a file to be ignored.

trackerDir .flumespool Metadata storage directory duringtransmission.

batchSize 1000 Source transmission granularity.

decodeErrorPolicy  FAIL  Code error policy. The options are FAIL, REPLACE, and IGNORE.
- FAIL: throw an exception and fail the parsing.
- REPLACE: replace unidentifiable characters with other characters (typically U+FFFD).
- IGNORE: discard character strings that fail to be parsed.
NOTE: If a code error occurs in the file, set decodeErrorPolicy to REPLACE or IGNORE. Flume will then skip the code error and continue to collect subsequent logs.


deserializer  LINE  File parser. The value can be either LINE or BufferedLine.
- LINE: characters read from the file are transcoded one by one.
- BufferedLine: one or more lines read from the file are transcoded in batches, which delivers better performance.

deserializer.maxLineLength  2048  Maximum length for parsing by line. The value ranges from 0 to 2,147,483,647.

deserializer.maxBatchLine  1  Maximum number of lines parsed at a time. If multiple lines are set, maxLineLength must be set to a corresponding multiple. For example, if maxBatchLine is set to 2, set maxLineLength to 4096 (2048 x 2) accordingly.

selector.type  replicating  Selector type. The value can be either replicating or multiplexing.
- replicating: the same content is sent to every channel.
- multiplexing: content is selectively sent to some channels according to the configured distribution rule.

interceptors  -  Interceptor. For details about configuration, see the Flume User Guide.

NOTE

The Spooling source ignores the last line feed character of each event when data is read by line. Therefore, Flume does not include that last line feed character in the data volume counters.
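A minimal Spooling source fragment based on the table above might look like this. The agent and component names and the monitored directory are assumptions for illustration:

```properties
server.sources = spool_src
server.channels = c1
server.sources.spool_src.type = spooldir
server.sources.spool_src.spoolDir = /var/log/incoming
server.sources.spool_src.fileSuffix = .COMPLETED
server.sources.spool_src.deletePolicy = never
# REPLACE keeps collection running past encoding errors.
server.sources.spool_src.decodeErrorPolicy = REPLACE
server.sources.spool_src.deserializer = LINE
server.sources.spool_src.channels = c1
```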

- Kafka Source

A Kafka source consumes data from Kafka topics. Multiple sources can consume data of the same topic, and the sources consume different partitions of the topic. Common configurations are as follows.


Table 7-14 Common configurations of a Kafka source

Parameter  Default Value  Description

channels  -  Channel connected to the source. Multiple channels can be configured.

type  org.apache.flume.source.kafka.KafkaSource  Type, which is set to org.apache.flume.source.kafka.KafkaSource.

monTime  0 (disabled)  Thread monitoring threshold. When the update time (seconds) exceeds the threshold, the source is restarted.

nodatatime  0 (disabled)  Alarm threshold. An alarm is triggered when the duration (seconds) for which Kafka does not release data to subscribers exceeds the threshold.

batchSize  1000  Number of events written into a channel at a time.

batchDurationMillis  1000  Maximum duration of topic data consumption at a time, in milliseconds.

keepTopicInHeader  false  Indicates whether to save topics in the event header. If topics are saved, topics configured in Kafka sinks become invalid. The value can be either true or false.

keepPartitionInHeader  false  Indicates whether to save partition IDs in the event header. If partition IDs are saved, Kafka sinks write data to the corresponding partitions. The value can be either true or false.

kafka.bootstrap.servers  -  List of Broker addresses, separated by commas.

kafka.consumer.group.id  -  Kafka consumer group ID.

kafka.topics  -  List of subscribed Kafka topics, separated by commas.

kafka.topics.regex  -  Subscribed topics that comply with regular expressions. kafka.topics.regex has a higher priority than kafka.topics and will overwrite kafka.topics.


kafka.security.protocol  SASL_PLAINTEXT  Security protocol of Kafka. The value must be set to PLAINTEXT for clusters in which Kerberos authentication is disabled.

Other Kafka Consumer Properties  -  Other Kafka configurations. This parameter can be set to any consumer configuration supported by Kafka, and the .kafka prefix must be added to the configuration.

- Taildir Source

A Taildir source monitors file changes in a directory and automatically reads the file content. In addition, it can transmit data in real time. Common configurations are as follows.

Table 7-15 Common configurations of a Taildir source

Parameter  Default Value  Description

channels  -  Channel connected to the source. Multiple channels can be configured.

type taildir Type, which is set to taildir.

filegroups  -  Group name of a collection file directory. Group names are separated by spaces.

filegroups.<filegroupName>.parentDir  -  Parent directory. The value must be an absolute path.

filegroups.<filegroupName>.filePattern  -  File path relative to the file group's parent directory. Directories can be included and regular expressions are supported. It must be used together with parentDir.

positionFile  -  Metadata storage directory during transmission.

headers.<filegroupName>.<headerKey>  -  Key-value pair added to an event when data of a group is being collected.

byteOffsetHeader  false  Indicates whether each event header should contain the location information about the event in the source file. The location information is saved in the byteoffset variable.


skipToEnd  false  Indicates whether Flume can locate the latest position of a file and read the latest data after restart.

idleTimeout  120000  Idle period during file reading, in milliseconds. If the file data does not change within this idle period, the source closes the file. If data is written into the file after it is closed, the source opens the file and reads the data.

writePosInterval  3000  Interval for writing metadata to a file, in milliseconds.

batchSize  1000  Number of events written into a channel in a batch.

monTime  0 (disabled)  Thread monitoring threshold. When the update time (seconds) exceeds the threshold, the source is restarted.
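A Taildir source fragment assembled from the table above might look like this. The file group name, directory, file pattern, and position file path are illustrative assumptions:

```properties
server.sources = tail_src
server.channels = c1
server.sources.tail_src.type = taildir
server.sources.tail_src.filegroups = f1
# Absolute parent directory plus a relative regex pattern.
server.sources.tail_src.filegroups.f1.parentDir = /var/log/app
server.sources.tail_src.filegroups.f1.filePattern = .*\.log
server.sources.tail_src.positionFile = /opt/flume/taildir_position
server.sources.tail_src.skipToEnd = false
server.sources.tail_src.channels = c1
```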

- HTTP Source

An HTTP source receives data from an external HTTP client and sends the data to the configured channels. Common configurations are as follows.

Table 7-16 Common configurations of an HTTP source

Parameter  Default Value  Description

channels  -  Channel connected to the source. Multiple channels can be configured.

type http Type, which is set to http.

bind - Name or IP address of the bound host

port - Bound port

handler  org.apache.flume.source.http.JSONHandler  Message parsing method of an HTTP request. The following methods are supported:
- org.apache.flume.source.http.JSONHandler: JSON
- org.apache.flume.sink.solr.morphline.BlobHandler: BLOB

handler.* - Handler parameters.

enableSSL  false  Indicates whether SSL is enabled in HTTP.


keystore  -  Keystore path after SSL is enabled in HTTP.

keystorePassword  -  Keystore password after SSL is enabled in HTTP.
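A minimal HTTP source fragment based on the table above might be sketched as follows. The bind address, port, and component names are assumptions:

```properties
server.sources = http_src
server.channels = c1
server.sources.http_src.type = http
server.sources.http_src.bind = 192.168.0.100
server.sources.http_src.port = 21180
# JSONHandler parses JSON request bodies into events.
server.sources.http_src.handler = org.apache.flume.source.http.JSONHandler
server.sources.http_src.enableSSL = false
server.sources.http_src.channels = c1
```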

- OBS Source

An OBS source monitors and transmits, in quasi-real-time mode, new files that have been added to specified buckets. Common configurations are as follows.

Table 7-17 Common configurations of an OBS source

Parameter  Default Value  Description

channels  -  Channel connected to the source. Multiple channels can be configured.

type  http  Type, which is set to org.apache.flume.source.s3.OBSSource.

bucketName - OBS bucket name.

prefix  -  Monitored OBS path of the specified bucket. The path cannot start with a slash (/). If this parameter is not set, the root directory of the bucket is monitored by default.

accessKey - User AK information.

secretKey - User SK information in ciphertext.

backingDir  -  Metadata storage directory during transmission.

endPoint  -  OBS access address. The address must be in the same region as MRS. The value can be either a domain name or an IP address.

basenameHeader  false  Indicates whether to save file names in the event header. false indicates that file names are not saved.

basenameHeaderKey  basename  Name of the field (also called the key name) that the event header uses to save a file name.

batchSize 1000 Source transmission granularity.


decodeErrorPolicy  FAIL  Code error policy.
NOTE: If a code error occurs in the file, set decodeErrorPolicy to REPLACE or IGNORE. Flume will then skip the code error and continue to collect subsequent logs.

deserializer  LINE  File parser. The value can be either LINE or BufferedLine.
- LINE: characters read from the file are transcoded one by one.
- BufferedLine: one or more lines read from the file are transcoded in batches, which delivers better performance.

deserializer.maxLineLength  2048  Maximum length for parsing by line.

deserializer.maxBatchLine  1  Maximum number of lines parsed at a time. If multiple lines are set, maxLineLength must be set to a corresponding multiple.

selector.type  replicating  Selector type. The value can be either replicating or multiplexing.

interceptors - Interceptor

Common Channel Configurations

- Memory Channel

A memory channel uses memory as the cache. Events are stored in memory queues. Common configurations are as follows.

Table 7-18 Common configurations of a memory channel

Parameter  Default Value  Description

type - Type, which is set to memory.

capacity  10000  Maximum number of events cached in a channel.

transactionCapacity  1000  Maximum number of events accessed each time.


channelfullcount  10  Channel full count. When the count reaches the threshold, an alarm is reported.
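A memory channel sketch using the parameters above might look like this (the agent and channel names are assumptions):

```properties
server.channels = c1
server.channels.c1.type = memory
server.channels.c1.capacity = 10000
server.channels.c1.transactionCapacity = 1000
server.channels.c1.channelfullcount = 10
```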

- File Channel

A file channel uses local disks as the cache. Events are stored in the folder specified by dataDirs. Common configurations are as follows.

Table 7-19 Common configurations of a file channel

Parameter  Default Value  Description

type - Type, which is set to file.

checkpointDir  ${BIGDATA_DATA_HOME}/flume/checkpoint  Checkpoint storage directory.

dataDirs  ${BIGDATA_DATA_HOME}/flume/data  Data cache directory. Multiple directories, separated by commas (,), can be configured to improve performance.

maxFileSize  2146435071  Maximum size of a single cache file, in bytes.

minimumRequiredSpace  524288000  Minimum idle space in the cache, in bytes.

capacity  1000000  Maximum number of events cached in a channel.

transactionCapacity  10000  Maximum number of events accessed each time.

channelfullcount  10  Channel full count. When the count reaches the threshold, an alarm is reported.
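A file channel sketch based on the table above might be written as follows. The concrete directories replace the ${BIGDATA_DATA_HOME} defaults and are assumptions:

```properties
server.channels = c1
server.channels.c1.type = file
server.channels.c1.checkpointDir = /srv/BigData/flume/checkpoint
# Multiple data directories, comma-separated, can improve throughput.
server.channels.c1.dataDirs = /srv/BigData/flume/data1,/srv/BigData/flume/data2
server.channels.c1.capacity = 1000000
server.channels.c1.transactionCapacity = 10000
```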

- Memory File Channel

A memory file channel uses both memory and local disks as its cache and supports message persistence. It provides performance similar to a memory channel and better than a file channel. Common configurations are as follows.


Table 7-20 Common configurations of a memory file channel

Parameter  Default Value  Description

type  org.apache.flume.channel.MemoryFileChannel  Type, which is set to org.apache.flume.channel.MemoryFileChannel.

capacity  50000  Channel cache: maximum number of events cached in a channel.

transactionCapacity  5000  Transaction cache: maximum number of events processed by a transaction.
- The value must be greater than the batchSize of the source and sink.
- The value of transactionCapacity must be less than or equal to that of capacity.

subqueueByteCapacity  20971520  Maximum size (bytes) of events that can be stored in a subqueue.
A memory file channel uses both queues and subqueues to cache data. Events are stored in a subqueue, and subqueues are stored in a queue. subqueueCapacity and subqueueInterval determine the size of events that can be stored in a subqueue: subqueueCapacity specifies the capacity of a subqueue, and subqueueInterval specifies the duration that a subqueue can store events. Events in a subqueue are sent to the destination only after the subqueue reaches the upper limit of subqueueCapacity or subqueueInterval.
NOTE: The value of subqueueByteCapacity must be greater than the number of events specified by batchSize.

subqueueInterval  2000  Maximum duration, in milliseconds, that a subqueue can store events.

keep-alive  3  Waiting time of the Put and Take threads when the transaction or channel cache is full, in seconds.

dataDir - Cache directory for local files.


byteCapacity  80% of the maximum JVM memory  Channel cache capacity, in bytes.

compression-type  None  Message compression format. The value can be either None or Snappy. When the format is Snappy, event message bodies compressed in the Snappy format can be decompressed.

channelfullcount  10  Channel full count. When the count reaches the threshold, an alarm is reported.

The following is a configuration example of a memory file channel:

server.channels.c1.type = org.apache.flume.channel.MemoryFileChannel
server.channels.c1.dataDir = /opt/flume/mfdata
server.channels.c1.subqueueByteCapacity = 20971520
server.channels.c1.subqueueInterval = 2000
server.channels.c1.capacity = 500000
server.channels.c1.transactionCapacity = 40000

- Kafka Channel

A Kafka channel uses a Kafka cluster as the cache. Kafka provides high availability and multiple replicas, which protects data that has not yet been consumed by sinks when Flume or a Kafka Broker crashes.

Table 7-21 Common configurations of a Kafka channel

Parameter  Default Value  Description

type  -  Type, which is set to org.apache.flume.channel.kafka.KafkaChannel.

kafka.bootstrap.servers - List of Brokers in the Kafka cluster.

kafka.topic  flume-channel  Kafka topic used by the channel to cache data.

kafka.consumer.group.id  flume  Kafka consumer group ID.

parseAsFlumeEvent  true  Indicates whether data is parsed into Flume events.

migrateZookeeperOffsets  true  Indicates whether to search for offsets in ZooKeeper and submit them to Kafka when there is no offset in Kafka.


kafka.consumer.auto.offset.reset  latest  Position from which data is consumed when there is no offset.

kafka.producer.security.protocol  SASL_PLAINTEXT  Kafka producer security protocol.

kafka.consumer.security.protocol  SASL_PLAINTEXT  Kafka consumer security protocol.

Common Sink Configurations

- HDFS Sink

An HDFS sink writes data into HDFS. Common configurations are as follows.

Table 7-22 Common configurations of an HDFS sink

Parameter  Default Value  Description

channel - Channel connected to the sink.

type hdfs Type, which is set to hdfs.

monTime  0 (disabled)  Thread monitoring threshold. When the update time (seconds) exceeds the threshold, the sink is restarted.

hdfs.path - HDFS path.

hdfs.inUseSuffix .tmp Suffix of the HDFS file being written.

hdfs.rollInterval  30  Interval for file rolling, in seconds.

hdfs.rollSize 1024 Size for file rolling. The unit is byte.

hdfs.rollCount 10 Number of events for file rolling.

hdfs.idleTimeout  0  Timeout interval for closing idle files automatically, in seconds.

hdfs.batchSize  1000  Number of events written into HDFS at a time.

hdfs.kerberosPrincipal  -  Kerberos username for HDFS authentication. This parameter is not required for a cluster in which Kerberos authentication is disabled.


hdfs.kerberosKeytab  -  Kerberos keytab for HDFS authentication. This parameter is not required for a cluster in which Kerberos authentication is disabled.

hdfs.fileCloseByEndEvent  true  Indicates whether the file is closed when the last event is received.

hdfs.batchCallTimeout  -  Timeout control duration, in milliseconds, each time events are written into HDFS. If this parameter is not specified, the timeout duration is controlled for each individual event written into HDFS. When the value of hdfs.batchSize is greater than 0, configure this parameter to improve the performance of writing data into HDFS.
NOTE: The value of hdfs.batchCallTimeout depends on hdfs.batchSize. A greater hdfs.batchSize requires a larger hdfs.batchCallTimeout. If the value of hdfs.batchCallTimeout is too small, writing events to HDFS may fail.

serializer.appendNewline  true  Indicates whether to add a line feed character (\n) after an event is written to HDFS. If a line feed character is added, its data volume is not included in the counters calculated by HDFS sinks.

- Avro Sink

An Avro sink converts events into Avro events and sends them to the monitoring ports of the hosts. Common configurations are as follows.

Table 7-23 Common configurations of an Avro sink

Parameter  Default Value  Description

channel - Channel connected to the sink.

type - Type, which is set to avro.

hostname - Name or IP address of the bound host

port - Monitoring port

batch-size 1000 Number of events sent in a batch.

ssl false Indicates whether to use SSL encryption.


truststore-type JKS Java truststore type.

truststore - Java truststore file.

truststore-password - Java truststore password.

keystore-type JKS Keystore type.

keystore - Keystore file.

keystore-password - Keystore password.
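An Avro sink sketch without SSL, based on the table above, might be written like this (the target host, port, and component names are assumptions):

```properties
server.sinks = avro_sink
server.sinks.avro_sink.type = avro
# Remote Avro source host and its monitoring port.
server.sinks.avro_sink.hostname = 192.168.0.101
server.sinks.avro_sink.port = 21154
server.sinks.avro_sink.batch-size = 1000
server.sinks.avro_sink.ssl = false
server.sinks.avro_sink.channel = c1
```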

- HBase Sink

An HBase sink writes data into HBase. Common configurations are as follows.

Table 7-24 Common configurations of an HBase sink

Parameter  Default Value  Description

channel - Channel connected to the sink.

type - Type, which is set to hbase.

table - HBase table name.

monTime  0 (disabled)  Thread monitoring threshold. When the update time (seconds) exceeds the threshold, the sink is restarted.

columnFamily - HBase column family.

batchSize  1000  Number of events written into HBase at a time.

kerberosPrincipal  -  Kerberos username for HBase authentication. This parameter is not required for a cluster in which Kerberos authentication is disabled.

kerberosKeytab  -  Kerberos keytab for HBase authentication. This parameter is not required for a cluster in which Kerberos authentication is disabled.
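An HBase sink sketch from the table above might look like this. The table name, column family, principal, and keytab path are illustrative assumptions:

```properties
server.sinks = hbase_sink
server.sinks.hbase_sink.type = hbase
server.sinks.hbase_sink.table = flume_events
server.sinks.hbase_sink.columnFamily = cf
server.sinks.hbase_sink.batchSize = 1000
# Required only when Kerberos authentication is enabled.
server.sinks.hbase_sink.kerberosPrincipal = flume_user
server.sinks.hbase_sink.kerberosKeytab = /opt/flume/conf/user.keytab
server.sinks.hbase_sink.channel = c1
```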

- Kafka Sink

A Kafka sink writes data into Kafka. Common configurations are as follows.


Table 7-25 Common configurations of a Kafka sink

Parameter  Default Value  Description

channel - Channel connected to the sink.

type  -  Type, which is set to org.apache.flume.sink.kafka.KafkaSink.

kafka.bootstrap.servers  -  List of Kafka Brokers, separated by commas.

monTime  0 (disabled)  Thread monitoring threshold. When the update time (seconds) exceeds the threshold, the sink is restarted.

kafka.topic  default-flume-topic  Topic where data is written.

flumeBatchSize  1000  Number of events written into Kafka at a time.

kafka.security.protocol  SASL_PLAINTEXT  Security protocol of Kafka. The value must be set to PLAINTEXT for clusters in which Kerberos authentication is disabled.

Other Kafka Producer Properties  -  Other Kafka configurations. This parameter can be set to any producer configuration supported by Kafka, and the .kafka prefix must be added to the configuration.

- OBS Sink

An OBS sink writes data into OBS. Because the OBS sink and the HDFS sink use the same file system interface, their parameter configurations are similar. The following table provides common configurations of an OBS sink.

Table 7-26 Common configurations of an OBS sink

Parameter  Default Value  Description

channel - Channel connected to the sink.

type hdfs Type, which is set to hdfs.

monTime  0 (disabled)  Thread monitoring threshold. When the update time (seconds) exceeds the threshold, the sink is restarted.


hdfs.path  -  OBS path in the s3a://AK:SK@Bucket/Path/ format, for example, s3a://AK:SK@obs-nemon-sink/obs-sink/.

hdfs.inUseSuffix .tmp Suffix of the OBS file being written.

hdfs.rollInterval  30  Interval for file rolling, in seconds.

hdfs.rollSize 1024 Size for file rolling. The unit is byte.

hdfs.rollCount 10 Number of events for file rolling.

hdfs.idleTimeout  0  Timeout interval for closing idle files automatically, in seconds.

hdfs.batchSize  1000  Number of events written into OBS at a time.

hdfs.calltimeout  10000  Timeout interval for interaction with OBS, in milliseconds. Set the timeout as large as possible, for example, 1000000, because some operations (such as OBS renaming) copy files, which takes a long time.

hdfs.fileCloseByEndEvent  true  Indicates whether the file is closed when the last event is received.

hdfs.batchCallTimeout  -  Timeout control duration, in milliseconds, each time events are written into OBS. If this parameter is not specified, the timeout duration is controlled for each individual event written into OBS. When the value of hdfs.batchSize is greater than 0, configure this parameter to improve the performance of writing data into OBS.
NOTE: The value of hdfs.batchCallTimeout depends on hdfs.batchSize. A greater hdfs.batchSize requires a larger hdfs.batchCallTimeout. If the value of hdfs.batchCallTimeout is too small, writing events to OBS may fail.

serializer.appendNewline  true  Indicates whether to add a line feed character (\n) after an event is written to OBS. If a line feed character is added, its data volume is not included in the counters calculated by OBS sinks.


7.10.7 Example: Using Flume to Collect Logs and Import Them to Kafka

Scenario

This section describes how to use Flume to import log information to Kafka.

Prerequisites
- A streaming cluster with Kerberos authentication enabled has been created.
- The Flume client has been installed on the node where logs are generated. For details, see Installing the Flume Client.
- The streaming cluster can properly communicate with the node where logs are generated.

Procedure

Step 1 Copy the configuration file of the authentication server from the Master1 node to the conf directory on the Flume client node.

The full path of the configuration file on the Master1 node is /opt/Bigdata/FusionInsight/etc/1_X_KerberosClient/kdc.conf. X is a random number. The file must be saved by the user who installs the Flume client, for example, user root.

Step 2 Log in to MRS Manager. Choose Service > Flume > Instance. Query the Business IP Address of any node on which the Flume role is deployed.

Step 3 Copy the user authentication file from this node to the conf directory on the Flume client node.

The full path of the file is /opt/Bigdata/FusionInsight/FusionInsight-Flume-1.6.0/flume/conf/flume-keytab. The file must be saved by the user who installs the Flume client, for example, user root.

Step 4 Copy the jaas.conf file from this node to the conf directory on the Flume client node.

The full path of the jaas.conf file is /opt/Bigdata/FusionInsight/etc/1_X_Flume/jaas.conf. X is a random number. The file must be saved by the user who installs the Flume client, for example, user root.

Step 5 Log in to the Flume client node and go to the client installation directory. Run the following command to edit the file:

vi conf/jaas.conf

Set the keyTab parameter to the full path of the user authentication file on the Flume client. Then save and exit the file.

Step 6 Run the following command to modify the flume-env.sh configuration file of the Flume client:

vi Flume client installation directory/fusioninsight-flume-1.6.0/conf/flume-env.sh

Add the following information after -XX:+UseCMSCompactAtFullCollection:

-Djava.security.krb5.conf=Flume client installation directory/fusioninsight-flume-1.6.0/conf/kdc.conf -Djava.security.auth.login.config=Flume client installation directory/fusioninsight-flume-1.6.0/conf/jaas.conf -Dzookeeper.server.principal=flume/hadoop.hadoop.com -Dzookeeper.request.timeout=120000

Change Flume client installation directory to the actual one. Then save and exit the file.

Step 7 Run the following command to modify the properties.properties configuration file of the Flume client:

vi Flume client installation directory/fusioninsight-flume-1.6.0/conf/properties.properties

Add the following information to the file:

#########################################################################################
client.sources = static_log_source
client.channels = static_log_channel
client.sinks = kafka_sink
#########################################################################################
#LOG_TO_HDFS_ONLINE_1

client.sources.static_log_source.type = spooldir
client.sources.static_log_source.spoolDir = PATH
client.sources.static_log_source.fileSuffix = .COMPLETED
client.sources.static_log_source.ignorePattern = ^$
client.sources.static_log_source.trackerDir = PATH
client.sources.static_log_source.maxBlobLength = 16384
client.sources.static_log_source.batchSize = 51200
client.sources.static_log_source.inputCharset = UTF-8
client.sources.static_log_source.deserializer = LINE
client.sources.static_log_source.selector.type = replicating
client.sources.static_log_source.fileHeaderKey = file
client.sources.static_log_source.fileHeader = false
client.sources.static_log_source.basenameHeader = true
client.sources.static_log_source.basenameHeaderKey = basename
client.sources.static_log_source.deletePolicy = never

client.channels.static_log_channel.type = file
client.channels.static_log_channel.dataDirs = PATH
client.channels.static_log_channel.checkpointDir = PATH
client.channels.static_log_channel.maxFileSize = 2146435071
client.channels.static_log_channel.capacity = 1000000
client.channels.static_log_channel.transactionCapacity = 612000
client.channels.static_log_channel.minimumRequiredSpace = 524288000

client.sinks.kafka_sink.type = org.apache.flume.sink.kafka.KafkaSink
client.sinks.kafka_sink.kafka.topic = flume_test
client.sinks.kafka_sink.kafka.bootstrap.servers = XXX.XXX.XXX.XXX:21007,XXX.XXX.XXX.XXX:21007,XXX.XXX.XXX.XXX:21007
client.sinks.kafka_sink.flumeBatchSize = 1000
client.sinks.kafka_sink.kafka.producer.type = sync
client.sinks.kafka_sink.kafka.security.protocol = SASL_PLAINTEXT
client.sinks.kafka_sink.kafka.kerberos.domain.name = hadoop.XXX.com
client.sinks.kafka_sink.requiredAcks = 0

client.sources.static_log_source.channels = static_log_channel
client.sinks.kafka_sink.channel = static_log_channel

Modify the following parameters as required. Then save and exit the file.
- spoolDir
- trackerDir
- dataDirs
- checkpointDir
- topic
  If the topic does not exist in Kafka, it will be created automatically by default.


- kafka.bootstrap.servers
- kafka.kerberos.domain.name
  This parameter value is the value of default_realm of Kerberos in the Kafka cluster.

Step 8 The Flume client automatically loads the information in the properties.properties file.

After new log files are generated in the directory specified by spoolDir, the logs will be sent to Kafka producers and can be consumed by Kafka consumers.

----End

7.10.8 Example: Using Flume to Collect Logs and Import Them to OBS

Scenario

This section describes how to use Flume to import log information to OBS.

Prerequisites
- A streaming cluster has been created.
- The Flume client has been installed on the node where logs are generated. For details, see Installing the Flume Client.
- The streaming cluster can properly communicate with the node where logs are generated.
- The node where logs are generated can resolve the domain name of OBS.

Procedure

Step 1 Create the core-site.xml file and save it to the conf directory of the Flume client.

An example of parameter file content is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<property>
<name>fs.s3a.connection.ssl.enabled</name>
<value>true</value>
</property>
<property>
<name>fs.s3a.endpoint</name>
<value></value>
</property>
</configuration>

The value of fs.s3a.endpoint is an OBS access address. The address must be in the same region as MRS. The parameter value can be either a domain name or an IP address. On MRS Manager, you can choose Service > Flume > Service Configuration, set Type to All, and view the value of s3service.s3-endpoint in S3service.

Step 2 Encrypt the SK using the encryption tool of the Flume client. For details, see Using the Encryption Tool of the Flume Client.

Step 3 Run the following command to modify the properties.properties configuration file of the Flume client:

vi conf/fusioninsight-flume-1.6.0/conf/properties.properties


Add the following information to the file:

client.sources = linux
client.channels = flume
client.sinks = obs

client.sources.linux.type = spooldir
client.sources.linux.spoolDir = /tmp/nemon
client.sources.linux.montime =
client.sources.linux.fileSuffix = .COMPLETED
client.sources.linux.deletePolicy = never
client.sources.linux.trackerDir = .flumespool
client.sources.linux.ignorePattern = ^$
client.sources.linux.batchSize = 1000
client.sources.linux.inputCharset = UTF-8
client.sources.linux.selector.type = replicating
client.sources.linux.fileHeader = false
client.sources.linux.fileHeaderKey = file
client.sources.linux.basenameHeader = true
client.sources.linux.basenameHeaderKey = basename
client.sources.linux.deserializer = LINE
client.sources.linux.deserializer.maxBatchLine = 1
client.sources.linux.deserializer.maxLineLength = 2048
client.sources.linux.channels = flume

client.channels.flume.type = memory
client.channels.flume.capacity = 10000
client.channels.flume.transactionCapacity = 1000
client.channels.flume.channelfullcount = 10
client.channels.flume.keep-alive = 3
client.channels.flume.byteCapacity =
client.channels.flume.byteCapacityBufferPercentage = 20

client.sinks.obs.type = hdfs
client.sinks.obs.hdfs.path = s3a://AK:SK@obs-nemon-sink/obs-sink
client.sinks.obs.montime =
client.sinks.obs.hdfs.filePrefix = obs_%{basename}
client.sinks.obs.hdfs.fileSuffix =
client.sinks.obs.hdfs.inUsePrefix =
client.sinks.obs.hdfs.inUseSuffix = .tmp
client.sinks.obs.hdfs.idleTimeout = 0
client.sinks.obs.hdfs.batchSize = 1000
client.sinks.obs.hdfs.codeC =
client.sinks.obs.hdfs.fileType = DataStream
client.sinks.obs.hdfs.maxOpenFiles = 5000
client.sinks.obs.hdfs.writeFormat = Writable
client.sinks.obs.hdfs.callTimeout = 1000000
client.sinks.obs.hdfs.threadsPoolSize = 10
client.sinks.obs.hdfs.rollTimerPoolSize = 1
client.sinks.obs.hdfs.round = false
client.sinks.obs.hdfs.roundUnit = second
client.sinks.obs.hdfs.useLocalTimeStamp = false
client.sinks.obs.hdfs.failcount = 10
client.sinks.obs.hdfs.fileCloseByEndEvent = true
client.sinks.obs.hdfs.rollInterval = 30
client.sinks.obs.hdfs.rollSize = 1024
client.sinks.obs.hdfs.rollCount = 10
client.sinks.obs.hdfs.batchCallTimeout = 0
client.sinks.obs.serializer.appendNewline = true
client.sinks.obs.channel = flume

Modify the following parameters as required. Then save and exit the file.

- spoolDir
- trackerDir
- hdfs.path (AK and SK in the path must be actual values. SK is the encrypted one.)
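For illustration, a filled-in hdfs.path might take the following form. The bucket name, path, and credentials below are hypothetical placeholders, with ENCRYPTED_SK standing for the SK produced by the encryption tool:

```properties
# Hypothetical example only; substitute your actual AK, encrypted SK, and bucket path
client.sinks.obs.hdfs.path = s3a://YOUR_AK:ENCRYPTED_SK@your-obs-bucket/flume-sink
```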


Step 4 The Flume client automatically loads the information in the properties.properties file.

After new log files are generated in the directory specified by spoolDir, the logs will be sent to OBS.

----End

7.10.9 Example: Using Flume to Read OBS Files and Upload Them to HDFS

Scenario

This section describes how to use Flume to read the specified OBS directory and upload files to HDFS.

Prerequisites

- A streaming cluster has been created.
- The Flume client has been installed on the client node. For details, see Installing the Flume Client.
- The client node can properly communicate with the streaming cluster and HDFS cluster nodes, including master and core nodes.
- The client node can parse the domain name of OBS.

Procedure

Step 1 Copy the core-site.xml and hdfs-site.xml files from the HDFS cluster client to the Flume client installation directory/fusioninsight-flume-1.6.0/conf directory on the Flume client node.

Generally, the core-site.xml and hdfs-site.xml files are saved in the /HDFS/hadoop/etc/hadoop/ directory of the HDFS client installation directory.
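As a sketch of this copy step (all paths are assumptions, not values from this guide), the commands could look like the following. The demo stands in temporary directories for the real ones so it can be tried anywhere:

```shell
# Stand-in directories; on a real node set SRC to the HDFS client config
# directory and DST to the Flume client conf directory (paths are assumed).
SRC=$(mktemp -d)    # e.g. /opt/hadoopclient/HDFS/hadoop/etc/hadoop
DST=$(mktemp -d)    # e.g. /opt/FlumeClient/fusioninsight-flume-1.6.0/conf
touch "$SRC/core-site.xml" "$SRC/hdfs-site.xml"   # stand-ins for the real files

# The copy itself, equivalent to Step 1
cp "$SRC/core-site.xml" "$SRC/hdfs-site.xml" "$DST/"
ls "$DST"
```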

Step 2 Download a user's authentication credential file from the HDFS cluster.

1. On MRS Manager, click System.
2. In the Permission area, click Manage User.
3. Select the user from the user list and click More to download the user's authentication credential file.
4. Decompress the authentication credential file to obtain the krb5.conf and user.keytab files.

Step 3 Copy the krb5.conf and user.keytab files to the Flume client installation directory/fusioninsight-flume-1.6.0/conf directory of the Flume client node.

Step 4 Modify the flume-env.sh Flume client configuration file.

Run the following command to edit the flume-env.sh file:

vi Flume client installation directory/fusioninsight-flume-1.6.0/conf/flume-env.sh

Add the following information after -XX:+UseCMSCompactAtFullCollection:
-Djava.security.krb5.conf=Flume client installation directory/fusioninsight-flume-1.6.0/conf/krb5.conf
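Assuming the JVM options in flume-env.sh are carried in a JAVA_OPTS-style variable (the /opt/FlumeClient path below is an example installation directory, and the preceding options are elided with "..."), the edited line might look like:

```text
JAVA_OPTS="... -XX:+UseCMSCompactAtFullCollection -Djava.security.krb5.conf=/opt/FlumeClient/fusioninsight-flume-1.6.0/conf/krb5.conf"
```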


Change Flume client installation directory to the actual one. Then save and exit the configuration file.

Step 5 Add the matching host entries from the /etc/hosts file of the HDFS cluster to the /etc/hosts file of the Flume client node.

Step 6 Restart the Flume client.

Suppose the Flume client installation directory is /opt/FlumeClient. Run the following command to restart the Flume client:

cd /opt/FlumeClient/fusioninsight-flume-1.6.0/bin

./flume-manage.sh restart

Step 7 Encrypt SK using the encryption tool of the Flume client. For details, see Using the Encryption Tool of the Flume Client.

Step 8 Run the following command to modify the properties.properties configuration file of the Flume client:

vi Flume client installation directory/fusioninsight-flume-1.6.0/conf/properties.properties

Add the following information to the properties.properties file:

client.sources = obs
client.channels = flume
client.sinks = hdfs

client.sources.obs.type = org.apache.flume.source.s3.OBSSource
client.sources.obs.bucketName = obs-nemon-sink
client.sources.obs.prefix = obs-source/
client.sources.obs.accessKey = AK
client.sources.obs.secretKey = SK
client.sources.obs.backingDir = /tmp/obs/
client.sources.obs.endPoint =
client.sources.obs.basenameHeader = true
client.sources.obs.basenameHeaderKey = basename
client.sources.obs.channels = flume

client.channels.flume.type = memory
client.channels.flume.capacity = 10000
client.channels.flume.transactionCapacity = 1000
client.channels.flume.channelfullcount = 10
client.channels.flume.keep-alive = 3
client.channels.flume.byteCapacity =
client.channels.flume.byteCapacityBufferPercentage = 20

client.sinks.hdfs.type = hdfs
client.sinks.hdfs.hdfs.path = hdfs://hacluster/tmp
client.sinks.hdfs.montime =
client.sinks.hdfs.hdfs.filePrefix = over_%{basename}
client.sinks.hdfs.hdfs.fileSuffix =
client.sinks.hdfs.hdfs.inUsePrefix =
client.sinks.hdfs.hdfs.inUseSuffix = .tmp
client.sinks.hdfs.hdfs.idleTimeout = 0
client.sinks.hdfs.hdfs.batchSize = 1000
client.sinks.hdfs.hdfs.codeC =
client.sinks.hdfs.hdfs.fileType = DataStream
client.sinks.hdfs.hdfs.maxOpenFiles = 5000
client.sinks.hdfs.hdfs.writeFormat = Writable
client.sinks.hdfs.hdfs.callTimeout = 10000
client.sinks.hdfs.hdfs.threadsPoolSize = 10
client.sinks.hdfs.hdfs.rollTimerPoolSize = 1
client.sinks.hdfs.hdfs.kerberosPrincipal = admin
client.sinks.hdfs.hdfs.kerberosKeytab = /opt/FlumeClient/fusioninsight-flume-1.6.0/conf/user.keytab
client.sinks.hdfs.hdfs.round = false
client.sinks.hdfs.hdfs.roundUnit = second
client.sinks.hdfs.hdfs.useLocalTimeStamp = false
client.sinks.hdfs.hdfs.failcount = 10
client.sinks.hdfs.hdfs.fileCloseByEndEvent = true
client.sinks.hdfs.hdfs.rollInterval = 30
client.sinks.hdfs.hdfs.rollSize = 1024
client.sinks.hdfs.hdfs.rollCount = 10
client.sinks.hdfs.hdfs.batchCallTimeout = 0
client.sinks.hdfs.serializer.appendNewline = true
client.sinks.hdfs.channel = flume

Modify the following parameters as required. Then save and exit the file.

- bucketName
- prefix
- backingDir
- endPoint
- accessKey (Enter the actual AK value.)
- secretKey (Enter the actual encrypted SK value.)
- kerberosPrincipal (The username must be configured in the security cluster.)
- kerberosKeytab (The absolute path of the user's authentication credential file must be configured in the security cluster.)

Step 9 The Flume client automatically loads the information in the properties.properties file.

After new files are generated in the prefix directory under bucketName, the data will be sent to HDFS.

----End

7.11 Using Loader

7.11.1 Introduction

Process

The process for migrating user data with Loader is as follows:

1. Access the Loader page of the Hue WebUI.

2. Manage Loader links.

3. Create a job and select a data source link and a link for saving data.

4. Run the job to complete data migration.


Loader Page

The Loader page is a graphical data migration management tool based on the open source Sqoop WebUI and is hosted on the Hue WebUI. Perform the following operations to access the Loader page:

1. Access the Hue WebUI. For details, see Accessing the UI of the Open Source Component.

2. Choose Data Browsers > Sqoop.

The job management tab page is displayed by default on the Loader page.

Loader Link

Loader links save data location information. Loader uses links to access data or save data to the specified location. Perform the following operations to access the Loader link management page:

1. Access the Loader page.

2. Click Manage links.

The Loader link management page is displayed.

You can click Manage jobs to return to the job management page.

3. Click New link to go to the configuration page and set parameters to create a Loader link.

Loader Job

Loader jobs are used to manage data migration tasks. Each job consists of a source data link and a destination data link. A job reads data from the source link and saves it to the destination link to complete a data migration task.

7.11.2 Loader Link Configuration

Overview

Loader supports the following links. This section describes configurations of each link.

- obs-connector
- generic-jdbc-connector
- ftp-connector or sftp-connector
- hbase-connector, hdfs-connector, or hive-connector
- voltdb-connector

OBS Link

An OBS link is a data exchange channel between Loader and OBS. Table 7-27 describes the configuration parameters.


Table 7-27 obs-connector configuration

Parameter Description

Name Name of a Loader link

OBS Server Enter the OBS endpoint address. The common format is OBS.Region.DomainName.
For example, run the following command to view the OBS endpoint address:
cat /opt/Bigdata/apache-tomcat-7.0.78/webapps/web/WEB-INF/classes/cloud-obs.properties

Port Port for accessing OBS data. The default value is 5443.

Access Key AK for a user to access OBS

Security Key SK corresponding to AK

Relational Database Link

A relational database link is a data exchange channel between Loader and a relational database. Table 7-28 describes the configuration parameters.

NOTE

Some parameters are hidden by default. They appear only after you click Show Senior Parameter.

Table 7-28 generic-jdbc-connector configuration

Parameter Description

Name Name of a Loader link

Database type Data types supported by Loader links: ORACLE, MYSQL, and MPPDB

Host Database access address, which can be an IP address or domain name

Port Port for accessing the database

Database Name of the database saving data

Username Username for accessing the database

Password Password of the user. Use the actual password.

Table 7-29 Senior parameter configuration

Parameter Description

Fetch Size A maximum volume of data obtained during each database access


Connection Properties Driver properties exclusive to the database link that are supported by databases of different types, for example, autoReconnect of MYSQL. If you want to define the driver properties, click Add.

Identifier enclose Delimiter for reserving keywords in the database SQL. Delimiters defined in different databases vary.

File Server Link

File server links include FTP and SFTP links and serve as a data exchange channel between Loader and a file server. Table 7-30 describes the configuration parameters.

Table 7-30 ftp-connector or sftp-connector configuration

Parameter Description

Name Name of a Loader link

Hostname/IP Enter the file server access address, which can be a host name or IP address.

Port Port for accessing the file server.
- Use port 21 for FTP.
- Use port 22 for SFTP.

Username Username for logging in to the file server.

Password Password of the user

MRS Cluster Link

MRS cluster links include HBase, HDFS, and Hive links and serve as a data exchange channel between Loader and data.

When configuring an MRS cluster name and link, select a connector, for example, hbase-connector, hdfs-connector, or hive-connector, and save it.

VoltDB Link

A VoltDB link is a data exchange channel between Loader and an in-memory database. Table 7-31 describes the configuration parameters.

NOTE

Some parameters are hidden by default. They appear only after you click Show Senior Parameter.


Table 7-31 voltdb-connector configuration

Parameter Description

Name Name of a Loader link

Database servers Database access address, which can be an IP address or domain name. You can configure multiple database addresses and separate them with commas (,).

Port Port for accessing the database

Username Username for accessing the database

Password Password of the user. Use the actual password.

Table 7-32 Senior parameter configuration

Parameter Description

Connection Properties Delimiter for reserving keywords in the memory database SQL

7.11.3 Managing Loader Links

Scenario

You can create, view, edit, and delete links on the Loader page.

Prerequisites

You have accessed the Loader page. For details, see Loader Page.

Creating a Link

Step 1 On the Loader page, click Manage links.

Step 2 Click New link and configure link parameters.

For details about the parameters, see Loader Link Configuration.

Step 3 Click Save.

If link configurations, for example, IP address, port, and access user information, are incorrect, the link will fail to be verified and saved. In addition, VPC configurations may affect the network connectivity.

NOTE

You can click Test to immediately check whether the link is available.

----End


Viewing a Link

Step 1 On the Loader page, click Manage links.

- If Kerberos authentication is enabled in the cluster, all links created by the current user are displayed by default; other users' links are not displayed.
- If Kerberos authentication is disabled in the cluster, all Loader links of the cluster are displayed.

Step 2 In the search box of Sqoop Links, enter a link name to filter the links.

----End

Editing a Link

Step 1 On the Loader page, click Manage links.

Step 2 Click the link name to go to the edit page.

Step 3 Modify the link configuration parameters based on service requirements.

Step 4 Click Test.

If the test is successful, go to Step 5. If OBS Server fails to be connected, repeat Step 3.

Step 5 Click Save.

If a Loader job has been integrated with a Loader link, editing the link parameters may affect Loader job running.

----End

Deleting a Link

Step 1 On the Loader page, click Manage links.

Step 2 In the line of the link, click Delete.

Step 3 In the dialog box, click Yes, delete it.

If a Loader job has been integrated with a Loader link, the Loader link cannot be deleted.

----End

7.11.4 Source Link Configurations of Loader Jobs

Overview

When Loader jobs obtain data from different data sources, a link corresponding to the data source type needs to be selected and the link properties need to be configured.


obs-connector

Table 7-33 Data source link properties of obs-connector

Parameter Description

Bucket Name OBS bucket for storing source data

Input directory or file Actual storage form of source data. It can be either all data files in a directory or a single data file in the bucket.

File format Loader supports the following file formats of data stored in OBS:
- CSV_FILE: Specifies a text file. When the destination link is a database link, only text files are supported.
- BINARY_FILE: Specifies binary files excluding text files.

Line Separator Identifier of each line end of source data

Field Separator Identifier of each field end of source data

Encode type Text encoding type of source data. It takes effect for text files only.

File split type The following types are supported:
- File: Files are assigned to map tasks by the total number of files. The calculation formula is: Total number of files/Extractors.
- Size: Files are assigned to map tasks by the total file size. The calculation formula is: Total file size/Extractors.
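The two formulas above can be sanity-checked with shell arithmetic. The numbers here are made up for illustration; with the File split type, each map task handles roughly the ceiling of Total number of files/Extractors:

```shell
# File split type "File": files handled per map task (example numbers only)
total_files=100
extractors=8
# Ceiling division: (a + b - 1) / b
files_per_task=$(( (total_files + extractors - 1) / extractors ))
echo "$files_per_task"    # prints 13 for this example
```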

generic-jdbc-connector

Table 7-34 Data source link properties of generic-jdbc-connector

Parameter Description

Schema name Name of the database storing source data. You can query and select it on the interface.

Table name Data table storing the source data. You can query and select it on the interface.

Partition column If multiple columns need to be read, use this column to split the result and obtain data.

Where clause Query statement used for accessing the database


ftp-connector or sftp-connector

Table 7-35 Data source link properties of ftp-connector or sftp-connector

Parameter Description

Input directory or file Actual storage form of source data. It can be either all data files in a directory or a single data file in the file server.

File format Loader supports the following file formats of data stored in the file server:
- CSV_FILE: Specifies a text file. When the destination link is a database link, only text files are supported.
- BINARY_FILE: Specifies binary files excluding text files.

Line Separator Identifier of each line end of source data
NOTE: When FTP or SFTP serves as a source link and File format is set to BINARY_FILE, the value of Line Separator in the advanced properties is invalid.

Field Separator Identifier of each field end of source data
NOTE: When FTP or SFTP serves as a source link and File format is set to BINARY_FILE, the value of Field Separator in the advanced properties is invalid.

Encode type Text encoding type of source data. It takes effect for text files only.

File split type The following types are supported:
- File: Files are assigned to map tasks by the total number of files. The calculation formula is: Total number of files/Extractors.
- Size: Files are assigned to map tasks by the total file size. The calculation formula is: Total file size/Extractors.

hbase-connector

Table 7-36 Data source link properties of hbase-connector

Parameter Description

Table name HBase table storing source data


hdfs-connector

Table 7-37 Data source link properties of hdfs-connector

Parameter Description

Input directory or file Actual storage form of source data. It can be either all data files in a directory or a single data file in HDFS.

File format Loader supports the following file formats of data stored in HDFS:
- CSV_FILE: Specifies a text file. When the destination link is a database link, only text files are supported.
- BINARY_FILE: Specifies binary files excluding text files.

Line Separator Identifier of each line end of source data
NOTE: When HDFS serves as a source link and File format is set to BINARY_FILE, the value of Line Separator in the advanced properties is invalid.

Field Separator Identifier of each field end of source data
NOTE: When HDFS serves as a source link and File format is set to BINARY_FILE, the value of Field Separator in the advanced properties is invalid.

File split type The following types are supported:
- File: Files are assigned to map tasks by the total number of files. The calculation formula is: Total number of files/Extractors.
- Size: Files are assigned to map tasks by the total file size. The calculation formula is: Total file size/Extractors.

hive-connector

Table 7-38 Data source link properties of hive-connector

Parameter Description

Database Name of the Hive database storing the data source. You can query and select it on the interface.

Table Name of the Hive table storing the data source. You can query and select it on the interface.


voltdb-connector

Table 7-39 Data source link properties of voltdb-connector

Parameter Description

Partition column If multiple columns need to be read, use this column to split the result and obtain data.

Table Name of the memory database table storing source data. You can query and select it on the interface.

7.11.5 Destination Link Configurations of Loader Jobs

Overview

When Loader jobs save data to different storage locations, a destination link needs to be selected and the link properties need to be configured.

obs-connector

Table 7-40 Destination link properties of obs-connector

Parameter Description

Bucket Name OBS bucket for storing final data

Output directory Directory for storing final data in the bucket. A directory must be specified.

File format Loader supports the following file formats of data stored in OBS:
- CSV_FILE: Specifies a text file. When the destination link is a database link, only text files are supported.
- BINARY_FILE: Specifies binary files excluding text files.

Line Separator Identifier of each line end of final data

Field Separator Identifier of each field end of final data

Encode type Text encoding type of final data. It takes effect for text files only.

generic-jdbc-connector

Table 7-41 Destination link properties of generic-jdbc-connector

Parameter Description

Schema name Name of the database saving final data


Table name Name of the table saving final data

ftp-connector or sftp-connector

Table 7-42 Destination link properties of ftp-connector or sftp-connector

Parameter Description

Output directory Directory for storing final data in the file server. A directory must be specified.

File format Loader supports the following file formats of data stored in the file server:
- CSV_FILE: Specifies a text file. When the destination link is a database link, only text files are supported.
- BINARY_FILE: Specifies binary files excluding text files.

Line Separator Identifier of each line end of final data
NOTE: When FTP or SFTP serves as a destination link and File format is set to BINARY_FILE, the value of Line Separator in the advanced properties is invalid.

Field Separator Identifier of each field end of final data
NOTE: When FTP or SFTP serves as a destination link and File format is set to BINARY_FILE, the value of Field Separator in the advanced properties is invalid.

Encode type Text encoding type of final data. It takes effect for text files only.

hbase-connector

Table 7-43 Destination link properties of hbase-connector

Parameter Description

Table name Name of the HBase table saving final data. You can query and select it on the interface.

Method Data can be imported to an HBase table using either BULKLOAD or PUTLIST.


Clear data before import Whether to clear data in the destination HBase table. Options are as follows:
- True: Clean up data in the table.
- False: Do not clean up data in the table. When you select False, an error is reported during job running if data exists in the table.

hdfs-connector

Table 7-44 Destination link properties of hdfs-connector

Parameter Description

Output directory Directory for storing final data in HDFS. A directory must be specified.

File format Loader supports the following file formats of data stored in HDFS:
- CSV_FILE: Specifies a text file. When the destination link is a database link, only text files are supported.
- BINARY_FILE: Specifies binary files excluding text files.

Compression codec Compression mode used when a file is saved to HDFS. The following modes are supported: NONE, DEFLATE, GZIP, BZIP2, LZ4, and SNAPPY.

Overwrite How to process files in the output directory when files are imported to HDFS. Options are as follows:
- True: Clean up files in the directory and import new files by default.
- False: Do not clean up files. If files exist in the output directory, job running fails.

Line Separator Identifier of each line end of final data
NOTE: When HDFS serves as a destination link and File format is set to BINARY_FILE, the value of Line Separator in the advanced properties is invalid.

Field Separator Identifier of each field end of final data
NOTE: When HDFS serves as a destination link and File format is set to BINARY_FILE, the value of Field Separator in the advanced properties is invalid.


hive-connector

Table 7-45 Destination link properties of hive-connector

Parameter Description

Database Name of the Hive database storing final data. You can query and select it on the interface.

Table Name of the Hive table saving final data. You can query and select it on the interface.

voltdb-connector

Table 7-46 Destination link properties of voltdb-connector

Parameter Description

Table Name of the memory database table storing final data. You can query and select it on the interface.

7.11.6 Managing Loader Jobs

Scenario

You can create, view, edit, and delete jobs on the Loader page.

Prerequisites

You have accessed the Loader page. For details, see Loader Page.

Create a Job

Step 1 On the Loader page, click New job.

Step 2 In Information, set parameters.

1. In Name, enter a job name.
2. In From link and To link, select links accordingly.

After you select a link of a type, data is obtained from the specified source and saved to the destination.

NOTE

If no available link exists, click Add a new link.

Step 3 In From, configure the job of the source link.

For details, see Source Link Configurations of Loader Jobs.

Step 4 In To, configure the job of the destination link.


For details, see Destination Link Configurations of Loader Jobs.

Step 5 Check whether a database link is selected in To link.

Database links include:

l generic-jdbc-connector

l hbase-connector

l hive-connector

l voltdb-connector

If you set To link to a database link, you need to configure a mapping between service data and a field in the database table.

l If you set it to a database link, go to Step 6.

l If you do not set it to a database link, go to Step 7.

Step 6 In Field Mapping, enter a field mapping. Perform Step 7.

Field Mapping specifies a mapping between each column of user data and a field in the database table.

Table 7-47 Field Mapping properties

Parameter Description

Column Num Field sequence of service data

Sample First line of sample values of service data

Column Family When To link is hbase-connector, you can select a column family for storing data.

Destination Field Field for storing data

Type Type of the field selected by the user

Row Key When To link is hbase-connector, you need to select a Destination Field as the row key.

NOTE

If the value of From is a connector of a file type, for example, SFTP, FTP, OBS, and HDFS files, the value of Field Mapping is the first row of data in the file. Ensure that the first row of data is complete. Otherwise, the Loader job will not extract columns that are not mapped.

Step 7 In Task Config, set job running parameters.

Table 7-48 Loader job running properties

Parameter Description

Extractors Number of map tasks


Loaders Number of reduce tasks
This parameter appears only when the destination is HBase or Hive.

Max error records in single split Error record threshold. If error records of a single map task exceed the threshold, the task automatically stops and the obtained data is not returned.
NOTE: Data is read and written in batches for MYSQL and MPPDB of generic-jdbc-connector by default. Errors are recorded at most once for each batch of data.

Dirty data directory Directory for saving dirty data. If you leave this parameter blank,dirty data will not be saved.
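The threshold behavior described for Max error records in single split can be sketched as follows. This is an illustrative Python model, not Loader source; the record format (None standing in for a dirty record) is a hypothetical simplification.

```python
def run_split(records, max_error_records):
    """Illustrative sketch of one map task (split): if the number of
    error records exceeds the threshold, the task stops and the data
    obtained so far is not returned."""
    errors = 0
    extracted = []
    for record in records:
        if record is None:  # treat None as an error (dirty) record
            errors += 1
            if errors > max_error_records:
                return None  # task stops; nothing is returned
        else:
            extracted.append(record)
    return extracted

print(run_split(["a", None, "b"], max_error_records=1))   # ['a', 'b']
print(run_split(["a", None, None], max_error_records=1))  # None
```

Note that a task within the threshold still returns its good records; only exceeding the threshold discards the split's output.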

Step 8 Click Save.

----End

Viewing a Job

Step 1 Access the Loader page. The Loader job management page is displayed by default.

l If Kerberos authentication is enabled in the cluster, all jobs created by the current user are displayed by default, and other users' jobs cannot be displayed.

l If Kerberos authentication is disabled in the cluster, all Loader jobs of the cluster are displayed.

Step 2 In Sqoop Jobs, enter a job name or link type to filter the job.

Step 3 Click Refresh to obtain the latest job status.

----End

Editing a Job

Step 1 Access the Loader page. The Loader job management page is displayed by default.

Step 2 Click the job name to go to the edit page.

Step 3 Modify the job configuration parameters based on service requirements.

Step 4 Click Save.

NOTE

Basic job operations in the navigation bar on the left are Run, Copy, Delete, Disable, History Record, and Show Job JSON Definition.

----End

Deleting a Job

Step 1 Access the Loader page.


Step 2 In the row of the specified job, click the delete icon.

Alternatively, you can select one or more jobs and click Delete jobs in the upper right corner of the job list.

Step 3 In the dialog box, click Yes, delete it.

If the state of a Loader job is Running, the job fails to be deleted.

----End

7.11.7 Preparing a Driver for MySQL Database Link

Scenario

As a batch data migration component, Loader can import and export data using a relational database.

Prerequisites

You have prepared service data.

Procedure

Step 1 Download the MySQL JDBC driver mysql-connector-java-5.1.21.jar from the MySQL official website.

Step 2 Upload mysql-connector-java-5.1.21.jar to the Loader installation directory /opt/Bigdata/FusionInsight/FusionInsight-Sqoop-1.99.7/FusionInsight-Sqoop-1.99.7/server/jdbc on the active and standby MRS master nodes.

Step 3 Change the owner of mysql-connector-java-5.1.21.jar to omm:wheel.

Step 4 Modify the jdbc.properties configuration file.

Change the key value of MYSQL to mysql-connector-java-5.1.21.jar, for example, MYSQL=mysql-connector-java-5.1.21.jar.

Step 5 Restart Loader.

----End
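Steps 2 and 4 can be simulated locally as follows. This is a sketch only: the temporary directory stands in for the real Loader installation path on the master nodes, and the jar is an empty placeholder rather than the actual downloaded driver. Step 3 (changing ownership to omm:wheel) is omitted because it requires the actual node accounts.

```python
import pathlib
import tempfile

# Stand-in for the Loader jdbc directory on an MRS master node
# (the real path ends in .../server/jdbc, as given in Step 2).
jdbc_dir = pathlib.Path(tempfile.mkdtemp()) / "server" / "jdbc"
jdbc_dir.mkdir(parents=True)

# Placeholder for the downloaded driver jar (Step 2).
driver = jdbc_dir / "mysql-connector-java-5.1.21.jar"
driver.touch()

# Point the MYSQL key of jdbc.properties at the driver file (Step 4).
props = jdbc_dir / "jdbc.properties"
props.write_text("MYSQL=mysql-connector-java-5.1.21.jar\n")

print(props.read_text().strip())  # MYSQL=mysql-connector-java-5.1.21.jar
```

After the real jdbc.properties is updated on both master nodes, restarting Loader (Step 5) makes the driver available to database links.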

7.11.8 Example: Using Loader to Import Data from OBS to HDFS

Scenario

If you need to import a large volume of data from an external cluster to the internal cluster, import it from OBS to HDFS.

Prerequisites

l You have prepared service data.

l You have created an analysis cluster.


Procedure

Step 1 Upload service data to your OBS bucket.

Step 2 Obtain AK/SK information and create an OBS link and an HDFS link.

For details, see Loader Link Configuration.

Step 3 Access the Loader page. For details, see Loader Page.

If Kerberos authentication is enabled in the analysis cluster, follow the instructions in Accessing the Hue WebUI.

Step 4 Click New Job.

Step 5 In Information, set parameters.

1. In Name, enter a job name, for example, obs2hdfs.
2. In From link, select the OBS link you created.
3. In To link, select the HDFS link you created.

Step 6 In From, set source link parameters.

1. In Bucket Name, enter the name of the bucket storing the service data.
2. In Input directory or file, enter the detailed location of the service data in the bucket.
If it is a single file, enter a complete path containing the file name. If it is a directory, enter the complete path of the directory.

3. In File format, enter the type of the service data file.

For details, see Table 7-33.

Step 7 In To, set destination link parameters.

1. In Output directory, enter the directory for storing the service data in HDFS.
If Kerberos authentication is enabled in the cluster, the current user who accesses Loader needs permission to write data to the directory.
2. In File format, enter the type of the service data file. The type must correspond to the type in Step 6.3.
3. In Compression codec, enter a compression algorithm. For example, if you do not compress data, select NONE.
4. In Overwrite, select True.
5. Click Show Senior Parameter and set Line Separator.
6. Set Field Separator. For details, see Table 7-44.

Step 8 In Task Config, set job running parameters.

1. In Extractors, enter the number of map tasks.
2. In Loaders, enter the number of reduce tasks. When the destination link is an HDFS link, Loaders is hidden.
3. In Max error records in single split, enter an error record threshold.
4. In Dirty data directory, enter a directory for saving dirty data, for example, /user/sqoop/obs2hdfs-dd.


Step 9 Click Save and execute.

On the Manage jobs page, view the job running result. You can click Refresh to obtain the latest job status.

----End
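The constraint in Step 7.2, that the destination file format must match the source file format chosen in Step 6.3, can be expressed as a minimal sanity check. The job dictionary and the format names below are hypothetical illustrations, not an actual MRS or Loader API.

```python
# Hypothetical job definition; keys and format names are illustrative.
job = {
    "from": {"bucket_name": "userdata", "file_format": "CSV_FILE"},
    "to": {"output_directory": "/user/sqoop/output", "file_format": "CSV_FILE"},
}

# Step 7.2: the To file format must correspond to the From file format.
formats_match = job["from"]["file_format"] == job["to"]["file_format"]
print(formats_match)  # True
```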


8 FAQs

8.1 What Is MRS?

MapReduce Service (MRS for short), one of the basic services on the public cloud, is used for managing and analyzing massive data.

MRS builds a reliable, secure, and easy-to-use operation and maintenance (O&M) platform. The platform provides analysis and computing capabilities for massive data and can address enterprises' demands on data storage and processing. Users can independently apply for and use the hosted Hadoop, Spark, HBase, and Hive components to quickly create clusters on a host, which provides batch analysis and computing capabilities for massive data that does not have demanding requirements on real-time processing.

8.2 What Are the Highlights of MRS?

The highlights of MRS are as follows:

l Easy to use: MRS provides not only the capabilities supported by Hadoop, Spark, Spark SQL, HBase, and Hive, but also unified SQL interaction interfaces in the entire process, which simplifies big data application development.

l Low cost: MRS is free of O&M and separates computing from storage. The computing cluster can be created as required and released after a job is complete.

l Stability: MRS lets you spend less time commissioning and monitoring clusters. The service usability reaches 99.9% and the data reliability reaches 99.9999%.

l High openness: MRS is open source-based, is compatible with other services, and provides REST APIs and JDBC interfaces.


8.3 What Is MRS Used For?

Based on the Hadoop open-source software, the Spark in-memory computing engine, the HBase distributed storage database, and the Hive data warehouse framework, MRS provides a unified platform for storing, querying, and analyzing enterprise-level big data to help enterprises quickly establish a massive data processing system. This platform has the following features:

l Analyzing and computing massive data
l Storing massive data

8.4 How Do I Use MRS?

MRS is easy to use and provides a user-friendly user interface (UI). By using computers connected in a cluster, you can run various tasks, and process or store petabytes of data.

After Kerberos authentication is disabled, a typical procedure for using MRS is as follows:

1. Prepare data.
Upload the local programs and data files to be computed to Object Storage Service (OBS).

2. Create a cluster.
Create a cluster before you use MRS. The cluster quantity is subject to the Elastic Cloud Server (ECS) quantity. Configure basic cluster information to complete cluster creation. You can submit a job at the same time when you create a cluster.

NOTE

When you create a cluster, only one new job can be added. If you need to add more jobs, perform Step 4.

3. Import data.
After an MRS cluster is successfully created, use the import function of the cluster to import OBS data to HDFS. An MRS cluster can process both OBS data and HDFS data.

4. Add a job.
After a cluster is created, you can analyze and process data by adding jobs. Note that MRS provides a platform for executing programs developed by users. You can submit, execute, and monitor such programs by using MRS. After a job is added, the job is in the Running state by default.

5. View the execution result.
The job operation takes a while. After job running is complete, go to the Job Management page, and refresh the job list to view the execution results on the Job tab page. You cannot execute a successful or failed job again, but you can add or copy the job. After setting job parameters, you can submit the job again.

6. Terminate a cluster.
If you want to terminate a cluster after jobs are complete, click Terminate in Cluster. The cluster status changes from Running to Terminating. After the cluster is terminated, the cluster status changes to Terminated and the cluster is displayed in Historical Cluster.


8.5 How Do I Ensure Data and Service Running Security?

MRS is a platform for massive data management and analysis and features high security. It ensures user data and service running security from the following aspects:

l Network isolation
The public cloud divides the entire network into two planes: the service plane and the management plane. The two planes are physically isolated to ensure security of the service and management networks.
– Service plane: the network plane where cluster components are running. It provides service channels for users and delivers data access, task submitting, and computing functions.
– Management plane: the public cloud console. It is used to apply for and manage MRS.

l Host security
Users can deploy third-party antivirus software based on their service requirements. For the operating system (OS) and interfaces, MRS provides the following security protection measures:
– Hardening OS kernel security
– Installing the latest OS patches
– Controlling OS rights
– Managing OS interfaces
– Preventing attacks on the OS protocols and interfaces

l Data security
MRS stores data on the OBS platform, ensuring data security.

l Data integrity
After processing data, MRS encrypts and transmits data to the OBS system through SSL, ensuring data integrity.

8.6 How Do I Prepare a Data Source for MRS?

MRS can process data in both OBS and HDFS. Before using MRS to analyze data, you are required to prepare the data.

1. Upload local data to OBS.
a. Log in to the OBS management console.
b. Create a userdata bucket, and then create the program, input, output, and log folders in the userdata bucket.
i. Click Create Bucket to create a userdata bucket.
ii. In the userdata bucket, click Create Folder to create the program, input, output, and log folders.
c. Upload local data to the userdata bucket.
i. Go to the program folder, and click the icon to select a user program.


ii. Click Upload.
iii. Repeat the preceding steps to upload the data files to the input folder.

2. Import OBS data to HDFS.
This function is available only when Kerberos authentication is disabled and the cluster is running properly.
a. Log in to the MRS management console.
b. Go to the File Management page and select HDFS File List.
c. Click the data storage directory, for example, bd_app1.
bd_app1 is just an example. The storage directory can be any directory on the page. You can create a directory by clicking Create Folder.
d. Click Import Data, and click Browse to configure the paths of HDFS and OBS, as shown in Figure 8-1.

Figure 8-1 Importing files

e. Click OK. You can view the file upload progress in File Operation Record.

8.7 What Is the Difference Between Data in OBS and That in HDFS?

The data source to be processed by MRS is from OBS or HDFS. OBS provides you with massive, highly reliable, and secure data storage capabilities at a low cost. MRS can process the data in OBS. You can view, manage, and use data by using OBS Console or an OBS client. In addition, you can use the REST APIs to manage or access data, either alone or integrated with service programs.

l OBS data storage: Data storage and computing are performed separately. OBS data storage features low cost and unlimited storage capacity, and clusters can be terminated at any time. The computing performance is determined by OBS access performance and is lower than that of HDFS. OBS is recommended when data computing is infrequent.

l HDFS data storage: Data storage and computing are performed together. HDFS data storage features high cost, high computing performance, and limited storage capacity. Before terminating clusters, you must export and store the data. HDFS is recommended when data computing is frequent.


8.8 How Do I View All Clusters?

On the Cluster page, you can view clusters in various states. If many clusters are involved, you can turn pages to view clusters in any state.

l Active Cluster: contains all clusters except those in the Failed or Terminated state.

l Failed Task: contains only the tasks in the Failed state. Task failures include:
– Cluster creation failure
– Cluster termination failure
– Cluster capacity expansion failure

8.9 How Do I View Log Information?

On the Operation Log page, you can view log information about users' operations on clusters and jobs, but only after Kerberos authentication is disabled. Currently, MRS has two types of logs:

l Cluster: creating, terminating, shrinking, and expanding a cluster
l Job: creating, stopping, and deleting a job

Figure 8-2 shows log information about users' operations.

Figure 8-2 Log information

8.10 What Types of Jobs Are Supported by MRS?

A job functions as a program execution platform provided by MRS. Currently, MRS supports MapReduce jobs, Spark jobs, and Hive jobs. Table 8-1 describes job characteristics.


Table 8-1 Job types

MapReduce: MapReduce is a programming model that simplifies parallel computing and is used for parallel computing of big data sets (over 1 TB). Map divides one task into multiple tasks, and Reduce summarizes the processing results of these tasks and produces the final analysis result. After you complete code development, pack the code into a JAR file in IDEA or Eclipse, upload the file to the MRS cluster for execution, and obtain the execution result.

Spark: Spark is a batch data processing engine with high processing speed. Spark has demanding requirements on memory because it performs computing in memory. A Spark job includes:
l Spark: ends with .jar, which is case-insensitive.
l Spark Script: ends with .sql, which is case-insensitive.
l Spark SQL: specifies standard Spark SQL statements, for example, show tables;.

Hive: Hive is a data warehouse framework built on Hadoop. Hive provides the Hive query language (HiveQL), similar to the structured query language (SQL), to process structured data. Hive automatically converts the HiveQL in a Hive Script into a MapReduce task to query and analyze massive data stored in the Hadoop cluster. An example of a standard HiveQL statement is as follows:
create table page_view(viewTime INT, userid BIGINT, page_url STRING, referrer_url STRING, ip STRING COMMENT 'IP Address of the User');
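The Map/Reduce division of work described above can be illustrated with a toy word count. This is plain Python for illustration, unrelated to the MRS job formats; the two input lines stand in for two input splits.

```python
from collections import Counter
from functools import reduce

splits = ["spark hive hbase", "spark hadoop"]  # two hypothetical input splits

# Map: each split is processed independently into partial word counts.
mapped = [Counter(split.split()) for split in splits]

# Reduce: the partial results are merged into the final analysis result.
total = reduce(lambda a, b: a + b, mapped)
print(total["spark"])  # 2
```

In a real MapReduce job the mapped results are shuffled across machines before the reduce phase; the merge step here stands in for that summarization.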

8.11 How Do I Submit Developed Programs to MRS?

MRS provides a platform for executing programs developed by users. You can submit, execute, and monitor such programs by using MRS. To submit developed programs to MRS, set Program Path to the actual path for storing such programs, as shown in Figure 8-3.


Figure 8-3 Creating a job

8.12 How Do I View Cluster Configurations?

l After a cluster is created, you can choose Cluster > Active Cluster, select a running cluster, and click its name to switch to the cluster information page, where you can view the basic configuration information about the cluster. The instance specifications and capacities of nodes determine the data analysis and processing capability of the cluster. More advanced instance specifications and larger capacity allow faster cluster running and better data processing, and accordingly incur higher cluster costs.

l Choose Cluster > Active Cluster, select a running cluster, click its name to switch to the cluster information page, and then click Cluster Manager to go to the cluster management page. On the MRS cluster management page that is displayed, you can view and process alarm information, modify cluster configurations, and install cluster patches.

8.13 What Types of Host Specifications Are Supported by MRS?

MRS provides optimal specifications based on extensive experience in big data product optimization. Host specifications are determined by CPUs, memory, and disks. Currently, the following specifications are supported:

l s1.xlarge.linux.mrs -- 4 vCPUs, 16 GB
– CPU: 4-core
– Memory: 16 GB
– System Disk: 40 GB

l d1.xlarge.linux.mrs -- 6 vCPUs, 55 GB
– CPU: 6-core
– Memory: 55 GB
– System Disk: 40 GB
– Data Disk: 1.8 TB x 3 HDDs

l c2.2xlarge.linux.mrs -- 8 vCPUs, 16 GB
– CPU: 8-core
– Memory: 16 GB
– System Disk: 40 GB

l d1.2xlarge.linux.mrs -- 12 vCPUs, 110 GB
– CPU: 12-core
– Memory: 110 GB
– System Disk: 40 GB
– Data Disk: 1.8 TB x 6 HDDs

l s1.4xlarge.linux.mrs -- 16 vCPUs, 64 GB
– CPU: 16-core
– Memory: 64 GB
– System Disk: 40 GB

l d1.4xlarge.linux.mrs -- 24 vCPUs, 220 GB
– CPU: 24-core
– Memory: 220 GB
– System Disk: 40 GB
– Data Disk: 1.8 TB x 12 HDDs

l s1.8xlarge.linux.mrs -- 32 vCPUs, 128 GB
– CPU: 32-core
– Memory: 128 GB
– System Disk: 40 GB

l d1.8xlarge.linux.mrs -- 48 vCPUs, 440 GB
– CPU: 48-core
– Memory: 440 GB
– System Disk: 40 GB
– Data Disk: 1.8 TB x 24 HDDs

More advanced host specifications enable better data processing, and accordingly requirehigher cluster costs. You can choose host specifications based on site requirements.

8.14 What Components Are Supported by MRS?

MRS supports components such as Hadoop 2.7.2, Spark 2.1.0, HBase 1.0.2, and Hive 1.2.1. More versions and components will be supported by later releases. For details, see Table 3-11 in Creating a Cluster. A component is also known as a service in MRS Manager.


8.15 What Is the Relationship Between Spark and Hadoop?

Spark is a fast, general-purpose computing engine that is compatible with Hadoop data. Spark can run in a Hadoop cluster by using Yarn and process data of any type in HDFS, HBase, Hive, and Hadoop.

8.16 What Types of Spark Jobs Are Supported by an MRS Cluster?

On the MRS page, an MRS cluster supports Spark jobs submitted in Spark, Spark Script, or Spark SQL mode.

8.17 Can a Spark Cluster Access Data in OBS?

Similar to a Hadoop cluster, a Spark cluster can access data stored in the OBS system after Kerberos authentication is disabled. You only need to set Import From and Export To to the path of the OBS system when submitting jobs.

8.18 What Is the Relationship Between Hive and Other Components?

l Relationship between Hive and HDFS

Hive is a subproject of Apache Hadoop. Hive uses HDFS as its file storage system. Hive parses and processes structured data, and HDFS provides highly reliable underlying storage support for Hive. All data files in a Hive database are stored in HDFS, and all data operations on Hive are also performed using HDFS APIs.

l Relationship between Hive and MapReduce

Hive data computing depends on MapReduce. MapReduce is a subproject of Apache Hadoop and a parallel computing framework based on HDFS. During data analysis, Hive translates HiveQL statements submitted by users into MapReduce jobs and submits the jobs to MapReduce for execution.

l Relationship between Hive and DBService

MetaStore (the metadata service) of Hive processes the structure and attribute information about Hive databases, tables, and partitions. The information needs to be stored in a relational database and is maintained and processed by MetaStore. In MRS, the relational database is maintained by the DBService component.

l Relationship between Hive and Spark

Hive data computing can also be implemented on Spark. Spark is an Apache project and a distributed computing framework based on memory. During data analysis, Hive translates HiveQL statements submitted by users into Spark jobs and submits the jobs to Spark for execution.


8.19 What Types of Distributed Storage Are Supported by MRS?

MRS supports Hadoop 2.7.2 now and will support other mainstream Hadoop versions released by the community.

8.20 Can MRS Cluster Nodes Be Changed on the MRS Management Console?

MRS cluster nodes cannot be changed on the MRS management console. You are also not advised to change MRS cluster nodes on the ECS management console. If you manually stop or delete an ECS, modify or reinstall the ECS OS, or modify the ECS specifications for a cluster node on the ECS management console, the cluster may work improperly.

If you have performed any of the preceding operations, MRS automatically identifies and deletes the involved cluster node. You can substitute the deleted node by expanding the capacity of the cluster on the MRS management console. Do not perform any operation on a node during capacity expansion.


A Change History

2019-01-15: This issue is the fourth official release. Modified the following content:
l Required Permission for Using MRS

2018-12-13: This issue is the third official release. Modified the following content:
l Required Permission for Using MRS

2018-11-01: This issue is the second official release. Modified the following content:
l Cluster List
l Viewing and Exporting a Check Report
l Changing the Password for User admin
l Creating a Role

2018-08-30: This issue is the first official release.
