Bulk Loading Into HBase With MapReduce

www.edureka.co/big-data-and-hadoop

Hadoop: Bulk Loading with MapReduce


Objectives

At the end of this module, you will be able to:

» Analyze different use cases where MapReduce is used
» Differentiate between the traditional way and the MapReduce way
» Learn about the Hadoop 2.x MapReduce architecture and its components
» Understand the execution flow of a YARN MapReduce application
» Implement basic MapReduce concepts
» Run a MapReduce program

Where Is MapReduce Used?

Healthcare
» Problem Statement: De-identify personal health information.

Weather Forecasting
» Problem Statement: Find the maximum temperature recorded in a year.
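The weather-forecasting problem maps naturally onto MapReduce: the mapper emits (year, temperature) and the reducer keeps the maximum per year. The sketch below is plain Python, not Hadoop code; the "year,temperature" record format and the sample values are hypothetical, and an in-memory dictionary stands in for the shuffle.

```python
from collections import defaultdict

def mapper(line):
    # Hypothetical record format: "year,temperature", e.g. "2019,31"
    year, temp = line.split(",")
    yield year, int(temp)

def reducer(year, temps):
    # All temperatures for one year arrive together; keep the largest
    yield year, max(temps)

def run(lines):
    groups = defaultdict(list)  # stands in for the shuffle: group map output by key
    for line in lines:
        for year, temp in mapper(line):
            groups[year].append(temp)
    result = {}
    for year in sorted(groups):
        for key, value in reducer(year, groups[year]):
            result[key] = value
    return result

records = ["2019,31", "2019,35", "2020,29", "2020,-4"]
print(run(records))  # {'2019': 35, '2020': 29}
```

Because `max` is associative, the same reducer could also run as a combiner on each mapper's output to cut shuffle traffic.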

Where Is MapReduce Used?

MapReduce
» Features: Large-Scale Distributed Model, Parallel Programming, A Program Model
» Used in: Classification, Analytics, Recommendation, Index and Search
» Function: Map, Reduce
» Design Pattern: Classification (e.g. Top N records), Analytics (e.g. Join, Selection), Recommendation (e.g. Sort), Summarization (e.g. Inverted Index)
» Implemented by: Google, Apache Hadoop
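The Summarization pattern's example, an inverted index, shows how Map and Reduce split the work: the mapper emits (word, document id) and the reducer collects the ids for each word. This is a plain-Python sketch under stated assumptions: the document ids are hypothetical, words are lowercased, and a dictionary stands in for the shuffle.

```python
from collections import defaultdict

def mapper(doc_id, text):
    # Emit (word, doc_id) once per distinct word in the document
    for word in set(text.lower().split()):
        yield word, doc_id

def reducer(word, doc_ids):
    # Posting list: sorted, de-duplicated document ids for this word
    yield word, sorted(set(doc_ids))

def inverted_index(docs):
    groups = defaultdict(list)  # stands in for the shuffle
    for doc_id, text in docs.items():
        for word, d in mapper(doc_id, text):
            groups[word].append(d)
    index = {}
    for word in sorted(groups):
        for key, ids in reducer(word, groups[word]):
            index[key] = ids
    return index

docs = {"d1": "Deer Bear River", "d2": "Car Car River", "d3": "Deer Car Bear"}
index = inverted_index(docs)
print(index["car"])    # ['d2', 'd3']
print(index["river"])  # ['d1', 'd2']
```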

MapReduce Paradigm

The Overall MapReduce Word Count Process

Input → Splitting → Mapping → Shuffling → Reducing → Final Result

» Input (K1, V1): the text "Deer Bear River / Car Car River / Deer Car Bear"
» Splitting: three splits, "Deer Bear River", "Car Car River" and "Deer Car Bear"
» Mapping, output List(K2, V2): (Deer, 1) (Bear, 1) (River, 1); (Car, 1) (Car, 1) (River, 1); (Deer, 1) (Car, 1) (Bear, 1)
» Shuffling, output K2, List(V2): Bear, (1, 1); Car, (1, 1, 1); Deer, (1, 1); River, (1, 1)
» Reducing, output List(K3, V3): Bear, 2; Car, 3; Deer, 2; River, 2
» Final Result: Bear, 2; Car, 3; Deer, 2; River, 2
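The word count stages above can be traced in a few lines of plain Python. This is a single-process sketch, not Hadoop code: each function mirrors one stage, and the shuffle is modelled as grouping all values by key and sorting the keys.

```python
from collections import defaultdict

def map_phase(split):
    # (K1, V1) is one line of text; emit List(K2, V2) = [(word, 1), ...]
    return [(word, 1) for word in split.split()]

def shuffle(mapped):
    # Group every value by key: K2 -> List(V2), keys in sorted order
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(groups):
    # Sum each key's list of 1s: List(K3, V3)
    return [(key, sum(values)) for key, values in groups.items()]

splits = ["Deer Bear River", "Car Car River", "Deer Car Bear"]
mapped = [map_phase(s) for s in splits]
shuffled = shuffle(mapped)
counts = reduce_phase(shuffled)
print(shuffled)  # {'Bear': [1, 1], 'Car': [1, 1, 1], 'Deer': [1, 1], 'River': [1, 1]}
print(counts)    # [('Bear', 2), ('Car', 3), ('Deer', 2), ('River', 2)]
```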

MapReduce Application Execution

Executing MapReduce Application on YARN

YARN MR Application Execution Flow

MapReduce Job Execution

» Job Submission

» Job Initialization

» Tasks Assignment

» Memory Assignment

» Status Updates

» Failure Recovery

YARN MR Application Execution Flow

11. The tasks are executed.

12. If the job has reducers, the AppMaster again requests containers and asks the Node Manager to start the reducers in them.

13. The output of all the maps is given to the reducers, and the reducers are executed.

14. Once the job finishes, the Application Master notifies the Resource Manager and the client library.

15. The Application Master shuts down.

Hadoop 2.x: YARN Workflow

[Figure: the Resource Manager, made up of the Scheduler and the Applications Manager (AsM), coordinating the Node Manager, which hosts AppMaster 1 with Containers 1.1 and 1.2, and AppMaster 2 with Containers 2.1, 2.2 and 2.3]

Summary: Application Workflow

Execution Sequence (participants: Client, Resource Manager (RM), Node Manager (NM), Application Master (AM)):

1. Client submits an application
2. RM allocates a container to start the AM
3. AM registers with the RM
4. AM asks the RM for containers
5. AM notifies the NM to launch the containers
6. Application code is executed in the containers
7. Client contacts the RM/AM to monitor the application’s status
8. AM unregisters with the RM
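The eight steps above can be modelled as a toy, single-process simulation that records who talks to whom and enforces one rule of the protocol: an AM must register before it can request containers. Everything here (class names, method names, container ID format) is hypothetical and is not the YARN API; real YARN uses RPC between separate daemons.

```python
class ResourceManager:
    """Toy Resource Manager; a teaching model, not the YARN API."""

    def __init__(self, log):
        self.log = log
        self.registered = set()
        self.next_id = 1

    def _new_ids(self, n):
        ids = [f"container_{self.next_id + i}" for i in range(n)]
        self.next_id += n
        return ids

    def start_am(self, am):
        # Step 2: the RM picks the AM's container itself (no registration yet)
        (cid,) = self._new_ids(1)
        self.log.append(f"RM allocates {cid} to start {am}")
        return cid

    def register(self, am):
        # Step 3
        self.registered.add(am)
        self.log.append(f"{am} registers with RM")

    def allocate(self, am, n):
        # Step 4: only a registered AM may ask for task containers
        if am not in self.registered:
            raise RuntimeError(f"{am} must register with the RM first")
        ids = self._new_ids(n)
        self.log.append(f"RM grants {', '.join(ids)} to {am}")
        return ids

    def unregister(self, am):
        # Step 8
        self.registered.discard(am)
        self.log.append(f"{am} unregisters with RM")


def run_application(app, num_tasks, task):
    log = []
    rm = ResourceManager(log)
    am = f"AM[{app}]"
    log.append(f"Client submits {app} to RM")                  # step 1
    am_container = rm.start_am(am)                             # step 2
    log.append(f"NM launches {am} in {am_container}")
    rm.register(am)                                            # step 3
    results = []
    for cid in rm.allocate(am, num_tasks):                     # step 4
        log.append(f"{am} asks NM to launch a task in {cid}")  # step 5
        results.append(task(cid))                              # step 6
    # Step 7: a real client polls the RM/AM while the job runs; here, once.
    log.append(f"Client checks {app}: {len(results)}/{num_tasks} tasks done")
    rm.unregister(am)                                          # step 8
    return results, log


results, log = run_application("wordcount", 2, lambda cid: f"ran in {cid}")
print("\n".join(log))
```

The registration check mirrors the ordering in the list: step 4 is only legal after step 3.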

Input Splits

Input data is divided in two ways:
» Physical division: HDFS blocks
» Logical division: input splits

Relation Between Input Splits and HDFS Blocks

[Figure: file lines 1 to 11 laid over HDFS block boundaries, grouped into three splits]

» Logical records do not fit neatly into HDFS blocks.
» Logical records are lines, and a line can cross the boundary between two blocks.
» The first split contains line 5 even though line 5 spans two blocks.
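Why the first split can own line 5 is easiest to see with byte offsets. The sketch below follows, as far as we understand it, the convention of Hadoop 2.x's line-oriented record reader: a split that does not start at byte 0 discards its first (possibly partial) line, and every split keeps reading while the next line starts at or before its end, finishing that line even if it runs into the next block. It is plain Python over a byte string, not Hadoop code; the function names and the 8-byte split size are illustrative.

```python
def split_offsets(length, split_size):
    # Cut a file of `length` bytes into fixed-size byte ranges [start, end),
    # ignoring where lines begin and end (as HDFS blocks do).
    return [(s, min(s + split_size, length)) for s in range(0, length, split_size)]

def read_split(data, start, end):
    # Return the complete lines "owned" by the byte range [start, end).
    pos = start
    if start != 0:
        # Not the first split: discard everything up to and including the
        # first newline at or after `start`; the previous split reads it.
        nl = data.find(b"\n", start)
        pos = len(data) if nl == -1 else nl + 1
    lines = []
    # Keep reading while the next line starts at or before `end`; the last
    # line read may run past `end` into the next block.
    while pos < len(data) and pos <= end:
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl + 1
        lines.append(data[pos:stop].rstrip(b"\n"))
        pos = stop
    return lines

DATA = b"line1\nline2 is long\nline3\n"
for start, end in split_offsets(len(DATA), 8):
    print((start, end), read_split(DATA, start, end))
# (0, 8) [b'line1', b'line2 is long']
# (8, 16) []
# (16, 24) [b'line3']
# (24, 26) []
```

Every line is read exactly once, whatever the split size, because each line is owned by the split in which it starts (with a line starting exactly at a boundary going to the earlier split).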

MapReduce Job Submission Flow

1. Input data is distributed to nodes (Node 1, Node 2).
2. Each map task works on a “split” of data.
3. Mappers output intermediate data.
4. Data is exchanged between nodes in a “shuffle” process.
5. Intermediate data of the same key goes to the same reducer.
6. Reducer output is stored.

Getting Data to the Mapper

Input File → Input Split → RecordReader → Mapper → (intermediates)

Each input file is divided into input splits; a RecordReader reads each split and feeds its records to a Mapper, which produces intermediate key/value pairs.

Partition and Shuffle

Mapper → (intermediates) → Partitioner → (intermediates) → Reducer

Each Mapper's intermediate output passes through a Partitioner, which assigns every key to exactly one Reducer; in the figure, four Mappers feed three Reducers.
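Hadoop's default partitioner computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, so equal keys always land on the same reducer. The Python sketch below copies that formula but substitutes `zlib.crc32` for Java's `hashCode()` (Python's built-in `hash` of strings changes between runs), so the exact reducer numbers will differ from a real Hadoop job; only the "same key, same reducer" property carries over. The `route` helper is a hypothetical stand-in for the shuffle.

```python
import zlib

def partition(key, num_reducers):
    # Same shape as Hadoop's HashPartitioner; crc32 replaces Java's hashCode()
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reducers

def route(mapper_outputs, num_reducers):
    # Shuffle: every (key, value) goes to the reducer chosen by partition()
    reducers = [[] for _ in range(num_reducers)]
    for output in mapper_outputs:
        for key, value in output:
            reducers[partition(key, num_reducers)].append((key, value))
    return reducers

outputs = [
    [("Deer", 1), ("Bear", 1), ("River", 1)],
    [("Car", 1), ("Car", 1), ("River", 1)],
    [("Deer", 1), ("Car", 1), ("Bear", 1)],
]
for i, bucket in enumerate(route(outputs, 3)):
    print(f"reducer {i}: {bucket}")
```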


Input Format – Class Hierarchy

Package: org.apache.hadoop.mapreduce

InputFormat<K,V>
  FileInputFormat<K,V>
    CombineFileInputFormat<K,V>
    TextInputFormat
    KeyValueTextInputFormat
    NLineInputFormat
    SequenceFileInputFormat<K,V>
      SequenceFileAsBinaryInputFormat
      SequenceFileAsTextInputFormat
      SequenceFileInputFilter<K,V>
  ComposableInputFormat<K,V> (<<interface>>)
    CompositeInputFormat
  DBInputFormat<T>
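The text-based formats in this hierarchy differ mainly in how they turn lines into (key, value) records: TextInputFormat uses the line's byte offset as the key and the line as the value; KeyValueTextInputFormat splits each line at the first separator (a tab by default), and a line without one becomes a key with an empty value; NLineInputFormat gives each split exactly N lines (one by default). The sketch below emulates only these record semantics in plain Python; the function names are hypothetical and none of it is Hadoop API.

```python
def text_input_format(data):
    # TextInputFormat: key = byte offset of the line, value = the line itself
    records, offset = [], 0
    for line in data.splitlines(keepends=True):
        records.append((offset, line.rstrip("\n")))
        offset += len(line.encode("utf-8"))
    return records

def key_value_text_input_format(data, sep="\t"):
    # KeyValueTextInputFormat: split at the first separator; a line with no
    # separator becomes the key, with an empty value
    records = []
    for line in data.splitlines():
        key, _, value = line.partition(sep)
        records.append((key, value))
    return records

def nline_splits(data, n=1):
    # NLineInputFormat: each split (and so each map task) gets n lines
    lines = data.splitlines()
    return [lines[i:i + n] for i in range(0, len(lines), n)]

sample = "a\t1\nb\t2\nc\n"
print(text_input_format(sample))            # [(0, 'a\t1'), (4, 'b\t2'), (8, 'c')]
print(key_value_text_input_format(sample))  # [('a', '1'), ('b', '2'), ('c', '')]
print(nline_splits(sample, 2))              # [['a\t1', 'b\t2'], ['c']]
```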

What Is Bulk Load?

» A process or method provided by DBMSs to load multiple rows of data into a database table.
» A way to load data (typically into a database) in 'large chunks'.
» Loads hundreds, thousands, or millions of records in a short period of time.
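As a generic illustration of the definition above, the sketch loads many rows into a table in one transaction with a single batched call, rather than committing row by row. It uses Python's built-in `sqlite3` purely as a stand-in DBMS; the table name, columns and row count are hypothetical, and dedicated bulk-load utilities (including HBase's) typically go further than a batched insert.

```python
import sqlite3

def bulk_load(conn, rows):
    # One transaction and one executemany call for the whole batch,
    # instead of a separate INSERT and commit per row
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO readings (sensor, value) VALUES (?, ?)", rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
rows = [(f"s{i % 10}", float(i)) for i in range(100_000)]
bulk_load(conn, rows)
print(conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0])  # 100000
```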

Demo: Bulk Load with MR
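A common way to bulk-load HBase with MapReduce is to have the job write HBase's store files (HFiles) directly and then hand them to the region servers, instead of sending one Put per row. That only works if the job's output is sorted by row key and partitioned along the table's region boundaries; in HBase's MapReduce support this is usually configured through `HFileOutputFormat2`, which sets up a total-order partitioner from the region start keys, and the finished files are then loaded with the `completebulkload` tool (check the HBase Reference Guide for your version). The plain-Python sketch below models only the sort-and-partition step; the function names, start keys and cells are hypothetical, and it does not produce real HFiles.

```python
import bisect

def region_index(row_key, start_keys):
    # start_keys: sorted region start keys; the first region starts at b"".
    # A row belongs to the last region whose start key is <= the row key.
    return bisect.bisect_right(start_keys, row_key) - 1

def plan_bulk_load(cells, start_keys):
    # cells: (row_key, column, value) tuples in arbitrary order.
    # Returns one sorted list of cells per region, i.e. the content each
    # per-region output file would hold.
    per_region = [[] for _ in start_keys]
    for cell in sorted(cells):  # sort by row key (then column) first
        per_region[region_index(cell[0], start_keys)].append(cell)
    return per_region

start_keys = [b"", b"g", b"p"]  # hypothetical 3-region table
cells = [
    (b"zebra", "info:name", "Z"),
    (b"apple", "info:name", "A"),
    (b"grape", "info:name", "G"),
    (b"mango", "info:name", "M"),
]
for i, region in enumerate(plan_bulk_load(cells, start_keys)):
    print(i, [row for row, _, _ in region])
# 0 [b'apple']
# 1 [b'grape', b'mango']
# 2 [b'zebra']
```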
