Anatomy of a File Read in Hadoop

ANATOMY OF FILE READ IN HADOOP · RAJESH ANANDA KUMAR · [email protected]

Description

This presentation explains the anatomy of a Hadoop file read. A few slides are animated, so it is best to download the slides and read them.

Transcript of Anatomy of a File Read in Hadoop

Page 1

ANATOMY OF FILE READ IN HADOOP

RAJESH ANANDA KUMAR

[email protected]

Page 2

A SAMPLE HADOOP CLUSTER

Page 3

[Diagram: a Hadoop cluster in data center D1 with one name node and two racks, R1 and R2, each holding four nodes: R1N1–R1N4 and R2N1–R2N4.]

1. This is a Hadoop cluster with one name node and two racks named R1 and R2 in a data center D1. Each rack has four nodes, and they are uniquely identified as R1N1, R1N2, and so on.

2. The replication factor is 3.

3. The HDFS block size is 64 MB.

4. This cluster is used as the running example to explain the concepts.

Page 4

FACTS TO KNOW

Page 5

1. The name node saves part of the HDFS metadata, such as file locations and permissions, in files called the namespace image and the edit log. Files are stored in HDFS as blocks, but this block information is not saved in any file. Instead, it is gathered from the data nodes every time the cluster starts, and it is held in the name node’s memory.

2. Replica placement: assuming the replication factor is 3, when a file is written from a data node (say R1N1), Hadoop attempts to save the first replica on that same data node (R1N1). The second replica is written to a node (R2N2) on a different rack (R2). The third replica is written to another node (R2N1) on the same rack (R2) where the second replica was saved. A toy sketch of this policy appears after this list.

3. Hadoop takes a simple approach in which the network is represented as a tree and the distance between two nodes is the sum of their distances to their closest common ancestor. The levels can be, for example, “data center” > “rack” > “node”, so ‘/d1/r1/n1’ represents a node named n1 on rack r1 in data center d1. Distance calculation then has four possible scenarios (also sketched in code after this list):

1. distance(/d1/r1/n1, /d1/r1/n1) = 0 [processes on the same node]

2. distance(/d1/r1/n1, /d1/r1/n2) = 2 [different nodes on the same rack]

3. distance(/d1/r1/n1, /d1/r2/n3) = 4 [nodes on different racks in the same data center]

4. distance(/d1/r1/n1, /d2/r3/n4) = 6 [nodes in different data centers]
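To make the placement policy concrete, here is a toy, deterministic Java sketch. It is not Hadoop’s actual BlockPlacementPolicy (which picks nodes randomly within these constraints); the node names follow this example’s R1N1 convention, and rackOf() is a stand-in for a real rack-awareness lookup.

    import java.util.List;

    public class ReplicaPlacement {

        // Node names like "R1N1" encode the rack in the first two characters.
        static String rackOf(String node) {
            return node.substring(0, 2);
        }

        // Choose three targets for a block written from 'writer':
        // same node, then a node on another rack, then a second node
        // on that same other rack.
        static List<String> chooseTargets(String writer, List<String> nodes) {
            String first = writer;
            String second = nodes.stream()
                    .filter(n -> !rackOf(n).equals(rackOf(first)))
                    .findFirst().orElseThrow();
            String third = nodes.stream()
                    .filter(n -> rackOf(n).equals(rackOf(second)) && !n.equals(second))
                    .findFirst().orElseThrow();
            return List.of(first, second, third);
        }

        public static void main(String[] args) {
            List<String> cluster = List.of("R1N1", "R1N2", "R1N3", "R1N4",
                                           "R2N1", "R2N2", "R2N3", "R2N4");
            // Prints [R1N1, R2N1, R2N2] -- the placement that block B1
            // gets later in this presentation.
            System.out.println(chooseTargets("R1N1", cluster));
        }
    }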
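And here is a minimal, self-contained sketch of the distance rule, computed directly from path strings of the form /d1/r1/n1 (an illustration, not Hadoop’s NetworkTopology class):

    public class TopologyDistance {

        // Distance = hops from a up to the closest common ancestor
        //          + hops from b up to that same ancestor.
        static int distance(String a, String b) {
            String[] pa = a.substring(1).split("/"); // e.g. [d1, r1, n1]
            String[] pb = b.substring(1).split("/");
            int common = 0;
            while (common < pa.length && common < pb.length
                    && pa[common].equals(pb[common])) {
                common++;
            }
            return (pa.length - common) + (pb.length - common);
        }

        public static void main(String[] args) {
            System.out.println(distance("/d1/r1/n1", "/d1/r1/n1")); // 0
            System.out.println(distance("/d1/r1/n1", "/d1/r1/n2")); // 2
            System.out.println(distance("/d1/r1/n1", "/d1/r2/n3")); // 4
            System.out.println(distance("/d1/r1/n1", "/d2/r3/n4")); // 6
        }
    }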

Page 6

ANATOMY OF FILE READ – HAPPY PATH

Page 7

• Let’s assume a file named “sfo_crimes.csv” of size 192 MB is saved in this cluster.

• Also assume that the file was written from node R1N1.

[Diagram: sfo_crimes.csv is split into blocks B1, B2, and B3, and each block is placed on three nodes of the cluster.]

• The file is split into three blocks, each of size 64 MB, and each block is copied three times across the cluster.

• Along with the data, a checksum is saved with each block. It is used to verify that the data read from the block is free of errors. (A sketch of the idea appears at the end of this page.)

Metadata:

B1 (R1N1, R2N1, R2N2)

B2 (R1N1, R2N3, R2N4)

B3 (R1N1, R2N1, R2N3)

• When the cluster is started, the metadata looks as shown above.

• This metadata is held by the name node.
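On the checksum point: HDFS keeps a CRC checksum for every chunk of a block (512 bytes per chunk by default), and the reader verifies it as the data arrives. A minimal sketch of the verify-on-read idea, using plain java.util.zip.CRC32 rather than Hadoop’s internal classes:

    import java.util.zip.CRC32;

    public class ChecksumCheck {

        // Recompute the checksum of a data chunk and compare it with
        // the checksum stored when the block was written.
        static boolean isValid(byte[] chunk, long storedChecksum) {
            CRC32 crc = new CRC32();
            crc.update(chunk, 0, chunk.length);
            return crc.getValue() == storedChecksum;
        }

        public static void main(String[] args) {
            byte[] chunk = "some block data".getBytes();
            CRC32 crc = new CRC32();
            crc.update(chunk, 0, chunk.length);
            long stored = crc.getValue();               // saved at write time
            System.out.println(isValid(chunk, stored)); // true
            chunk[0] ^= 1;                              // simulate corruption
            System.out.println(isValid(chunk, stored)); // false
        }
    }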

Page 8

• When the cluster is up and running, the name node holds the metadata shown on the previous page.

• Let’s say we are trying to read the “sfo_crimes.csv” file from R1N2.

• So an HDFS client program will run on R1N2’s JVM.

[Diagram: the HDFS client on R1N2’s JVM calls open() on DistributedFileSystem, which makes an RPC call to the name node to get the first few blocks of the file. The name node answers with B1 (R1N1, R2N1, R2N2) and B2 (R1N1, R2N3, R2N4).]

• DistributedFileSystem (DFS) makes an RPC call to the name node (NN), which returns the first few blocks of the file. For each block, the NN returns the addresses of the data nodes ordered by their distance from the node performing the read.


• The block information is saved in a DFSInputStream, which is wrapped in an FSDataInputStream.

• In response to FileSystem.open(), the HDFS client receives this FSDataInputStream.


• Note that the very first step is the HDFS client program calling the method open() on the Java class DistributedFileSystem (a subclass of FileSystem), as sketched below.
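For reference, this is what the client-side call looks like with the public Hadoop API; the name node address and file path below are placeholders for this example:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class OpenExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // For an hdfs:// URI, FileSystem.get() returns a DistributedFileSystem.
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
            // open() triggers the RPC to the name node and returns an
            // FSDataInputStream wrapping a DFSInputStream.
            FSDataInputStream in = fs.open(new Path("/data/sfo_crimes.csv"));
            System.out.println("First byte: " + in.read());
            in.close();
        }
    }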

Page 9

• From now on, the HDFS client deals with the FSDataInputStream (FSDIS).

• The HDFS client invokes read() on the stream.

• Blocks are read in order. The DFSInputStream (DFSIS) connects to the closest node (R1N1) to read block B1.

• DFSIS streams the data from the data node to the client, which calls read() repeatedly on the stream. DFSIS verifies checksums for the data transferred to the client.


• When the block is read completely, DFSIS closes the connection to the data node. (A minimal client-side read loop is sketched below.)
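A minimal client-side read loop with the public API, using the same placeholder name node address and path as before:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReadLoop {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(
                    URI.create("hdfs://namenode:8020"), new Configuration());
            FSDataInputStream in = fs.open(new Path("/data/sfo_crimes.csv"));
            byte[] buffer = new byte[8192];
            int n;
            // Each read() returns bytes streamed from the current data node.
            // Block boundaries, checksum verification, and switching between
            // data nodes all happen inside DFSInputStream, invisible to us.
            while ((n = in.read(buffer)) > 0) {
                System.out.write(buffer, 0, n);
            }
            System.out.flush();
            in.close();
        }
    }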

Page 10

• Next, DFSIS attempts to read block B2. As mentioned earlier, the previous connection is closed and a fresh connection is made to the closest node (R1N1) holding block B2. Data is again streamed directly from the data node to the client.


Page 11

• Now DFSIS has read all the blocks returned by the first RPC call (B1 and B2), but the file has not been read completely; in our case there is one more block to read.

• DFSIS calls the name node to get the data node locations for the next batch of blocks as needed. Here the NN returns B3 (R1N1, R2N1, R2N3).

• DFSIS connects to the closest node (R1N1) to read block B3, and the data is streamed to the client as before.

• After the complete file is read, the HDFS client calls close() on the stream. (A common pattern that guarantees the close is sketched below.)
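A common way to guarantee the close() call, familiar from Hadoop documentation examples, is IOUtils with a finally block; the address and path are again placeholders:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class CatFile {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(
                    URI.create("hdfs://namenode:8020"), new Configuration());
            FSDataInputStream in = null;
            try {
                in = fs.open(new Path("/data/sfo_crimes.csv"));
                // Copy the whole stream to stdout without closing the streams.
                IOUtils.copyBytes(in, System.out, 4096, false);
            } finally {
                IOUtils.closeStream(in); // close() runs even if the read failed
            }
        }
    }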

Page 12

ANATOMY OF FILE READ – DATA NODE CONNECTION ERROR

Page 13

• Let’s say there is an error while connecting to R1N1 to read block B2.


• DFSIS remembers the failed node, so it won’t try to read from R1N1 for future blocks. It then tries to connect to the next closest node (R2N3). (A sketch of this failover logic follows.)

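A toy sketch of that failover logic (not the actual DFSInputStream code): keep a set of data nodes that have failed and pick the closest replica not in it. Node names reuse this example’s cluster.

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    public class ReplicaChooser {

        // Data nodes that failed earlier in this read; skipped from now on.
        private final Set<String> deadNodes = new HashSet<>();

        // Replicas arrive from the name node already ordered by distance,
        // so the first one not known to be dead is the best choice.
        String chooseNode(List<String> orderedReplicas) {
            for (String node : orderedReplicas) {
                if (!deadNodes.contains(node)) {
                    return node;
                }
            }
            throw new IllegalStateException("No live replica for this block");
        }

        void markDead(String node) {
            deadNodes.add(node);
        }

        public static void main(String[] args) {
            ReplicaChooser chooser = new ReplicaChooser();
            List<String> b2 = List.of("R1N1", "R2N3", "R2N4");
            System.out.println(chooser.chooseNode(b2)); // R1N1
            chooser.markDead("R1N1");                   // connection error
            System.out.println(chooser.chooseNode(b2)); // R2N3
        }
    }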

Page 14

ANATOMY OF FILE READ – DATA NODE CHECKSUM ERROR

Page 15

• Let’s say there is a checksum error while reading block B2 from R1N1. This means that the replica of the block on R1N1 is corrupt.


• Information about the corrupt replica is reported to the name node, which can then arrange for the block to be re-replicated from a healthy copy. DFSIS then tries to connect to the next closest node (R2N3).


Page 16

THE END

SORRY FOR MY POOR ENGLISH. PLEASE SEND YOUR VALUABLE FEEDBACK TO [email protected]