Apache Hadoop


Description

An overview of Apache Hadoop technology.


APACHE HADOOP

ABSTRACT:

The core of Apache Hadoop consists of a storage part, the Hadoop Distributed File System (HDFS), and a processing part, MapReduce. Hadoop splits files into large blocks (by default 64 MB or 128 MB, depending on the release) and distributes the blocks amongst the nodes in the cluster. To process the data, Hadoop MapReduce transfers packaged code (specifically JAR files) to the nodes that hold the required data, and those nodes then process it in parallel. This approach takes advantage of data locality, allowing the data to be processed faster and more efficiently via distributed processing than with a more conventional supercomputer architecture that relies on a parallel file system, where computation and data are connected via high-speed networking.
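The block size mentioned above is an ordinary HDFS setting. The sketch below is a minimal Java example of overriding and inspecting it through Hadoop's Configuration and FileSystem APIs; the namenode address hdfs://namenode:9000 and the file path /data/input.txt are placeholders for illustration, not values taken from this document.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockSizeExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Placeholder namenode address; adjust to your cluster.
            conf.set("fs.defaultFS", "hdfs://namenode:9000");
            // Request a 128 MB block size for files written with this configuration
            // (the HDFS default in Hadoop 2.x and later; older releases defaulted to 64 MB).
            conf.setLong("dfs.blocksize", 128L * 1024 * 1024);

            FileSystem fs = FileSystem.get(conf);
            // Report the block size of an existing file to see how HDFS split it
            // (example path only; it must exist on your cluster).
            FileStatus status = fs.getFileStatus(new Path("/data/input.txt"));
            System.out.println("Block size: " + status.getBlockSize() + " bytes");
        }
    }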

The base Apache Hadoop framework is composed of the following modules:

Hadoop Common – contains the libraries and utilities needed by the other Hadoop modules;
Hadoop Distributed File System (HDFS) – a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster;
Hadoop YARN – a resource-management platform responsible for managing compute resources in clusters and scheduling users' applications; and
Hadoop MapReduce – a programming model for large-scale data processing (a minimal example follows this list).
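As a sketch of the MapReduce programming model, the word-count job below follows the spirit of the classic Hadoop tutorial: the mapper emits a (word, 1) pair for every token, the reducer sums the counts per word, and job.setJarByClass lets the framework ship the packaged JAR to the nodes that hold the input blocks. The input and output paths are assumed to be supplied on the command line.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Mapper: for each input line, emit (word, 1) for every token.
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reducer: sum the counts emitted for each distinct word.
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCount.class);          // JAR shipped to the data nodes
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);   // optional local pre-aggregation
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory
            FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Packaged into a JAR and submitted with, for example, hadoop jar wordcount.jar WordCount /input /output, the job's map tasks are scheduled by YARN preferentially on nodes that already store the corresponding input blocks, which is the data-locality behaviour described in the abstract.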

Since 2012, the term "Hadoop" often refers not just to the base modules above but also to the collection of additional software packages that can be installed on top of or alongside Hadoop, such as Apache Pig, Apache Hive, Apache HBase and Apache Spark.

K.Vamshidhar Reddy

11BA1A0546