Introduction To Hadoop Administration - SpringPeople

© SpringPeople Software Private Limited, All Rights Reserved.

What is Hadoop?

• Hadoop is a free, open-source, Java-based framework that supports the processing of large data sets in a distributed computing environment. It is an Apache project sponsored by the Apache Software Foundation.

What is HDFS?

The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems, but the differences are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data. It was originally built as infrastructure for the Apache Nutch web search engine project and is now an Apache Hadoop subproject.

HDFS Architecture

What is a Hadoop Cluster?

• A Hadoop cluster is a special type of computational cluster designed specifically for storing and analyzing huge amounts of unstructured data in a distributed computing environment.

• Such clusters run Hadoop's open source distributed processing software on low-cost commodity computers.

• Hadoop clusters are known for speeding up data analysis applications. They are also highly scalable.

• Hadoop clusters are also highly resistant to failure because each piece of data is copied onto other cluster nodes, which ensures that the data is not lost if one node fails.
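
The failure-resistance point can be illustrated with a small simulation. This is only a sketch under stated assumptions: the node names, block names, and the helpers `place_blocks` and `surviving_blocks` are hypothetical, and the random placement is a toy stand-in for HDFS's actual placement policy. The replication factor of 3 matches the HDFS default.

```python
import random

def place_blocks(block_ids, nodes, replication=3, seed=0):
    """Assign each block to `replication` distinct nodes.

    Toy random placement, not HDFS's real placement policy.
    """
    rng = random.Random(seed)
    return {block: set(rng.sample(nodes, replication)) for block in block_ids}

def surviving_blocks(placement, failed_node):
    """Return the blocks that still have at least one replica on a live node."""
    return {
        block
        for block, holders in placement.items()
        if holders - {failed_node}
    }

# Hypothetical four-node cluster holding three blocks.
nodes = ["node1", "node2", "node3", "node4"]
blocks = ["blk_0", "blk_1", "blk_2"]

placement = place_blocks(blocks, nodes)
alive = surviving_blocks(placement, failed_node="node2")
print(sorted(alive))
```

Because every block lives on three distinct nodes, losing any single node still leaves at least two replicas of each block, so no data is lost.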

What is MapReduce?

• Hadoop MapReduce (Hadoop Map/Reduce) is a software framework for distributed processing of large data sets on compute clusters of commodity hardware. It is a sub-project of the Apache Hadoop project. The framework takes care of scheduling tasks, monitoring them and re-executing any failed tasks.
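
To make the programming model concrete, here is a minimal single-process sketch of a word count in the map, shuffle, and reduce style, with a simple retry wrapper standing in for the framework's re-execution of failed tasks. It does not use the Hadoop API; all function names (`map_phase`, `shuffle`, `reduce_phase`, `run_with_retry`, `word_count`) are hypothetical, and a real job would run these phases in parallel across cluster nodes.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)

def run_with_retry(task, arg, max_attempts=3):
    """Re-run a failed task, loosely mimicking framework-level re-execution."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(arg)
        except RuntimeError:
            if attempt == max_attempts:
                raise

def word_count(lines):
    mapped = []
    for line in lines:
        mapped.extend(run_with_retry(map_phase, line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

counts = word_count(["Hadoop stores data", "Hadoop processes data"])
print(counts)
```

The map and reduce functions are the only parts an application author would write; scheduling, grouping, and retries are the framework's job, which the sketch collapses into ordinary function calls.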

Apache Hadoop NextGen MapReduce (YARN)

• MapReduce underwent a complete overhaul in hadoop-0.23, resulting in what is now called MapReduce 2.0 (MRv2), or YARN.

• Apache™ Hadoop® YARN is a sub-project of Hadoop at the Apache Software Foundation, introduced in Hadoop 2.0, that separates the resource management and processing components. YARN was born of a need to enable a broader array of interaction patterns for data stored in HDFS beyond MapReduce. The YARN-based architecture of Hadoop 2.0 provides a more general processing platform that is not constrained to MapReduce.
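
The separation of resource management from processing can be pictured with a toy model. This is a rough sketch, not the YARN API: the `ResourceManager` class below only tracks free memory, and `run_app`, the application names, and the memory figures are all hypothetical. The point is that the resource layer knows nothing about what kind of work an application does.

```python
class ResourceManager:
    """Toy cluster-wide resource tracker; it only accounts for free memory."""

    def __init__(self, total_memory_mb):
        self.free_memory_mb = total_memory_mb
        self.allocations = {}

    def allocate(self, app_name, memory_mb):
        """Grant the request if enough memory is free; report success."""
        if memory_mb > self.free_memory_mb:
            return False
        self.free_memory_mb -= memory_mb
        self.allocations[app_name] = self.allocations.get(app_name, 0) + memory_mb
        return True

    def release(self, app_name):
        """Return everything the application was holding."""
        self.free_memory_mb += self.allocations.pop(app_name, 0)

def run_app(rm, app_name, memory_mb, work):
    """Run any kind of processing once resources have been granted."""
    if not rm.allocate(app_name, memory_mb):
        return None  # request rejected: not enough free memory
    try:
        return work()
    finally:
        rm.release(app_name)

rm = ResourceManager(total_memory_mb=4096)

# A MapReduce-style job and a non-MapReduce job share the same resource layer.
mr_result = run_app(rm, "wordcount-mapreduce", 2048, lambda: sum([1, 1, 1]))
other_result = run_app(rm, "interactive-query", 1024, lambda: max([3, 7, 5]))
too_big = run_app(rm, "oversized-job", 8192, lambda: "never runs")

print(mr_result, other_result, too_big)
```

Both workloads obtain resources through the same interface, which mirrors the idea that the platform is no longer constrained to MapReduce.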

How can you master Hadoop Administration?

Become an expert in 2 days.

World-class Hadoop Administration training by industry experts.

Suggested Audience & Other Details

• Prerequisites: Basic knowledge of Unix and system administration. Prior knowledge of Hadoop is not required.

• Suggested Audience:

– Developers

– Architects

• Duration – 2 Days

For further info/assistance contact

[email protected]

+91 80 656 79700

www.springpeople.com
