[Xxxx] Syllabus - Big Data Administration Training for Apache Hadoop - 280715
Course Overview
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. Cloudera's open source big data platform is the most widely adopted in the world and offers the industry's highest-quality technical support for Apache Hadoop, making it easier to install, configure, and manage a Hadoop cluster.
This 4-day course provides students with a comprehensive understanding of all the steps necessary to operate and maintain a Hadoop cluster. After completing this course, students will be able to install Hadoop, MapReduce, Hive, Impala, and Pig; perform initial HDFS configuration; configure HDFS high availability; secure a Hadoop cluster with Kerberos; and maintain and monitor a Hadoop cluster.
Duration: 4 days
Who Should Attend
System administrators who will be setting up or maintaining a Hadoop cluster
Prerequisites
Some basic knowledge of Linux operating systems is strongly recommended.
Suggested Next Course
- Cloudera Manager Training for Apache Hadoop
- Linux Administration and Security
- Big Data Developer for Apache Hadoop
Course Content
Big Data Administrator Training for Apache Hadoop
The Case for Apache Hadoop
- Why Hadoop?
- Fundamental Concepts
- Core Hadoop Components
HDFS
- HDFS Features
- Writing and Reading Files
- NameNode Memory Considerations
- Overview of HDFS Security
- Using the NameNode Web UI
- Using the Hadoop File Shell
Getting Data into HDFS
- Ingesting Data from External Sources with Flume
- Ingesting Data from Relational Databases with Sqoop
- REST Interfaces
- Best Practices for Importing Data
MapReduce
- What Is MapReduce?
- Features of MapReduce
- Basic Concepts
- Architectural Overview
- MapReduce Version 2
- Failure Recovery
- Using the JobTracker Web UI
Planning Your Hadoop Cluster
- General Planning Considerations
- Choosing the Right Hardware
- Network Considerations
- Configuring Nodes
- Planning for Cluster Management
Hadoop Installation and Initial Configuration
- Deployment Types
- Installing Hadoop
- Specifying the Hadoop Configuration
- Performing Initial HDFS Configuration
- Performing Initial MapReduce Configuration
- Hadoop Logging
Installing and Configuring Hive, Impala, & Pig
- Hive
- Impala
- Pig
Hadoop Clients
- What Are Hadoop Clients?
- Installing and Configuring Hadoop Clients
- Installing and Configuring Hue
- Hue Authentication and Authorization
Cloudera Manager
- The Motivation for Cloudera Manager
- Cloudera Manager Features
- Standard and Enterprise Versions
- Cloudera Manager Topology
- Installing Cloudera Manager
- Installing Hadoop Using Cloudera Manager
- Performing Basic Administration Tasks Using Cloudera Manager
Advanced Cluster Configuration
- Advanced Configuration Parameters
- Configuring Hadoop Ports
- Explicitly Including and Excluding Hosts
- Configuring HDFS for Rack Awareness
- Configuring HDFS High Availability
Hadoop Security
- Why Hadoop Security Is Important
- Hadoop's Security System Concepts
- What Kerberos Is and How It Works
- Securing a Hadoop Cluster with Kerberos
Managing and Scheduling Jobs
- Managing Running Jobs
- Scheduling Hadoop Jobs
- Configuring the FairScheduler
Cluster Maintenance
- Checking HDFS Status
- Copying Data between Clusters
- Adding and Removing Cluster Nodes
- Rebalancing the Cluster
- Cluster Upgrading
Cluster Monitoring and Troubleshooting
- General System Monitoring
- Monitoring Hadoop Clusters
- Troubleshooting Hadoop Clusters
- Common Misconfigurations