Transcript of Hadoop_Video1
Hadoop
HDFS --> Hadoop Distributed File System
MapReduce --> Distributed compute framework
Hadoop is written in Java.
HDFS (Hadoop Distributed File System)
Master node --> NameNode
Slave nodes --> DataNodes
All daemons (one per node) run in separate JVMs.
Daemon: a process that runs continuously and waits for instructions.
NameNode
It doesn't store any data itself; it keeps track of where the data is stored, i.e. it stores metadata.
DataNodes
All data is stored on the DataNodes. Each DataNode needs to connect to the NameNode; for that, the NameNode's address must be added to core-site.xml on every DataNode (/etc/hadoop/conf). When a DataNode daemon starts, it looks up the NameNode and registers with it (the NameNode collects the disk-space info from all DataNodes). The NameNode then presents the total space of all disks as a single virtual disk.
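A minimal sketch of the core-site.xml entry described above. `fs.defaultFS` is the standard Hadoop 2.x+ property name (older releases used `fs.default.name`); the hostname and port here are placeholders, not values from the transcript.

```xml
<!-- /etc/hadoop/conf/core-site.xml on every DataNode (hostname/port are placeholders) -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>
```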
Hadoop splits a file into blocks or chunks (64 MB to 128 MB) and stores each block on 3 DataNodes (1 original + 2 replicas). Writes are slower but reads are faster: a write goes to 3 locations, while reads can be served from different locations. Parallel disk reads increase throughput, and the replication provides fault tolerance.

The NameNode keeps all metadata in memory; if the metadata doesn't fit in memory, it cannot be stored. All put/get operations are done through the Hadoop client, which connects to both the NameNode and the DataNodes. The NameNode stores metadata in memory, the DataNodes store the blocks on their local filesystems, and the Hadoop client presents a virtual view to the end user.

URI (Uniform Resource Identifier): the path where the data is stored on the virtual disk. The NameNode makes an in-memory entry for each URI. hdfs-site.xml holds the block-size setting; based on it, the NameNode decides how the file is split and where the blocks are stored. Accordingly, the Hadoop client splits the file into blocks and ships the first block to the designated DataNode; that DataNode forwards the block to another DataNode, and that one to another.

Each DataNode sends heartbeats to the NameNode (every 3 seconds by default) to report that it is alive, and whenever a block is stored on a DataNode, a report carries that information to the NameNode. When the NameNode gets confirmation from the DataNodes, it adds the entry for that block to memory (updates the in-memory metadata). Once one block is copied, the Hadoop client gets the status and then copies the next block.
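The splitting and replication arithmetic above can be sketched as follows. This is an illustration only, not Hadoop code: the 128 MB block size stands in for the `dfs.blocksize` value from hdfs-site.xml, and the replication factor of 3 matches the transcript.

```java
// Sketch: how a file's size maps to HDFS blocks and replicated raw storage.
// Illustrative only; the real splitting is performed by the HDFS client.
public class BlockMath {
    static final long BLOCK_SIZE = 128L * 1024 * 1024; // 128 MB, as set in hdfs-site.xml
    static final int REPLICATION = 3;                  // 1 original + 2 replicas

    // Number of blocks needed for a file of the given size (last block may be partial).
    static long blockCount(long fileSize) {
        return (fileSize + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    // Total raw disk consumed once every block is stored on 3 DataNodes.
    static long rawStorage(long fileSize) {
        return fileSize * REPLICATION;
    }

    public static void main(String[] args) {
        long file = 300L * 1024 * 1024;                    // a 300 MB file
        System.out.println(blockCount(file));              // 3 blocks: 128 + 128 + 44 MB
        System.out.println(rawStorage(file) / (1024 * 1024)); // 900 MB of raw disk
    }
}
```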
NameNode crash scenarios
If the NameNode crashes, all metadata is lost; the 'editlog' was introduced for this. Every change is first appended to the editlog file and only then applied to the metadata in memory, so after a NameNode crash the metadata can be recovered from the editlog. Edit logs are files on the NameNode that keep track of any changes happening in NameNode memory (similar to transaction logs in an RDBMS).
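The log-before-memory ordering can be sketched like this. The class and the URI-to-DataNode map are illustrative simplifications, not Hadoop's actual data structures; the point is only the write-ahead ordering and the replay step.

```java
import java.util.*;

// Sketch of the write-ahead idea behind the editlog: every change is
// appended to a durable log BEFORE the in-memory metadata is updated,
// so the memory state can be rebuilt by replaying the log after a crash.
public class EditLogSketch {
    final List<String> editLog = new ArrayList<>();       // stands in for the on-disk editlog
    final Map<String, String> metadata = new HashMap<>(); // stands in for NameNode memory

    void recordBlock(String uri, String dataNode) {
        editLog.add(uri + "=" + dataNode);  // 1. log first (durable)
        metadata.put(uri, dataNode);        // 2. then update memory
    }

    // Crash recovery: rebuild the in-memory map by replaying the log.
    Map<String, String> replay() {
        Map<String, String> rebuilt = new HashMap<>();
        for (String entry : editLog) {
            String[] kv = entry.split("=", 2);
            rebuilt.put(kv[0], kv[1]);
        }
        return rebuilt;
    }

    public static void main(String[] args) {
        EditLogSketch nn = new EditLogSketch();
        nn.recordBlock("/abc/edf", "dn1");
        nn.metadata.clear();             // simulate a NameNode crash losing memory
        System.out.println(nn.replay()); // metadata rebuilt from the editlog
    }
}
```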
Rebuilding the metadata in memory from scratch after a NameNode crash takes a long time. To speed this up, periodic memory snapshots called FSimages are taken: after a crash the metadata is restored immediately from the closest snapshot/FSimage, and the changes made after the snapshot are recovered using the editlogs.
FSimage: a periodic snapshot of the NameNode's in-memory metadata.
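Extending the same simplified model, the FSimage idea is: snapshot the map, remember how much of the log the snapshot covers, and on recovery replay only the tail. Again an illustrative sketch with assumed names, not Hadoop internals.

```java
import java.util.*;

// Sketch of FSimage + editlog recovery: a periodic snapshot of the metadata,
// plus replay of only the edits made AFTER that snapshot.
public class CheckpointSketch {
    final Map<String, String> metadata = new HashMap<>(); // NameNode memory
    final List<String[]> editLog = new ArrayList<>();     // on-disk editlog
    Map<String, String> fsImage = new HashMap<>();        // last snapshot (FSimage)
    int checkpointedEdits = 0;                            // log entries the snapshot covers

    void put(String uri, String dn) {
        editLog.add(new String[]{uri, dn}); // log first, then memory
        metadata.put(uri, dn);
    }

    void checkpoint() {                     // take an FSimage
        fsImage = new HashMap<>(metadata);
        checkpointedEdits = editLog.size();
    }

    Map<String, String> recover() {         // start from the snapshot, replay the log tail
        Map<String, String> rebuilt = new HashMap<>(fsImage);
        for (int i = checkpointedEdits; i < editLog.size(); i++)
            rebuilt.put(editLog.get(i)[0], editLog.get(i)[1]);
        return rebuilt;
    }

    public static void main(String[] args) {
        CheckpointSketch nn = new CheckpointSketch();
        nn.put("/a", "dn1");
        nn.checkpoint();                    // /a is in the FSimage
        nn.put("/b", "dn2");                // /b only in the editlog tail
        System.out.println(nn.recover());   // both entries recovered
    }
}
```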
Write flow (summary):
File --> Hadoop client --> checks with the NameNode for the block size and target locations
NameNode --> tells the client how to split the file into blocks and which DataNodes to use --> the client ships the first block to the designated DataNode
DataNode --> forwards the block to the next DataNode, and that one to the next --> sends heartbeats to the NameNode (every 3 seconds by default) --> reports newly stored blocks to --> NameNode
NameNode --> when it gets confirmation from the DataNodes, it adds the entry to memory (after first recording it in the editlog)
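The forwarding chain in the flow above can be sketched as a linked chain of DataNodes: the client hands the block to the first node only, and each node stores a copy before passing it on. Class and node names are invented for illustration.

```java
import java.util.*;

// Sketch of the replication pipeline: the client writes a block once;
// each DataNode stores it locally and forwards it down the chain.
public class PipelineSketch {
    static class DataNode {
        final String name;
        final DataNode next;                         // next DataNode in the pipeline, or null
        final List<String> blocks = new ArrayList<>();
        DataNode(String name, DataNode next) { this.name = name; this.next = next; }

        void receive(String block) {
            blocks.add(block);                       // store locally...
            if (next != null) next.receive(block);   // ...then forward to the next node
        }
    }

    public static void main(String[] args) {
        DataNode dn3 = new DataNode("dn3", null);
        DataNode dn2 = new DataNode("dn2", dn3);
        DataNode dn1 = new DataNode("dn1", dn2);
        dn1.receive("blk_0001");                     // one client write, three copies
        System.out.println(dn3.blocks);              // the last node also holds the block
    }
}
```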
# hadoop fs -mkdir /abc/edf
# hadoop fs -ls /abc/
# hadoop fs -put <localfile> /abc/edf
Check the file in the GUI and verify the blocks assigned.