Hadoop Session 2 : HDFS
Transcript of Hadoop Session 2 : HDFS
HDFS: HADOOP DISTRIBUTED FILE SYSTEM
Revision
• What is Big Data?
• The 3 V's of Big Data
• What can we do with Big Data?
• What is Hadoop?
• Components of Hadoop
Hadoop is:
• DISTRIBUTED
• FAULT TOLERANT
• SCALABLE
• FLEXIBLE
• INTELLIGENT
Hadoop Components
• HDFS: self-healing, distributed storage
• MapReduce: fault-tolerant distributed computing, plus an abstraction for parallel processing
HDFS Overview
• Designed for a modest number of large files (millions instead of billions)
• Sequential access, not random access
• Write once, read many times
• Data is split into big chunks and stored on multiple nodes as blocks
• Blocks are replicated across multiple nodes
HDFS Client-Server Architecture
• Server (master): the Name Node
• Data Nodes hold the blocks that clients read and write
• A file is split into multiple blocks
• Multiple copies of each block are stored
[Diagram: a Name Node (NN) above a row of Data Nodes; blocks 1-5 of a file are stored with copies spread across the Data Nodes]
TOPOLOGY OF HADOOP CLUSTER
[Diagram: a Name Node and a Secondary Name Node, each connected to multiple Data Nodes]
Nodes in an HDFS cluster:
• Name Node
• Secondary Name Node
• Data Node
• Job Tracker
• Task Tracker
NAMENODE
• One NN per cluster
• Manages the file system namespace and metadata
• Single point of failure
• Runs on enterprise hardware, e.g. RAID machines
SECONDARY NAMENODE
NOT a backup node of NN NOT Automatic to replace NN Single Point of failure Enterprise hardware i.e. RAID
machines
DATANODE
• Many per cluster
• Manages blocks and serves them to clients
• Periodically reports to the NN the list of blocks it stores
• Uses inexpensive commodity hardware
JOB TRACKER
• One per cluster
• Manages job requests submitted by clients
• Initial point of contact for a client: a job starts at the Job Tracker
• Single point of failure
TASK TRACKER
• Many per cluster
• Executes Map and Reduce operations
• Reads input splits for a MapReduce job
Block Replication

REPLICA PLACEMENT (replication factor = 3)
• 1st replica: on the client's node (randomly chosen if the client is not a cluster node)
• 2nd replica: on a different rack than the first
• 3rd replica: on the same rack as the second, on a different node
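The placement rule can be sketched in plain shell. The four-node topology below (dn0..dn3 spread over rackA/rackB) is made up for illustration; it is not part of any real cluster.

```shell
# Made-up topology: dn0, dn1 on rackA; dn2, dn3 on rackB.
rack() { case "$1" in dn0|dn1) echo rackA;; dn2|dn3) echo rackB;; esac; }

FIRST=dn0   # 1st replica: the node where the writing client runs
# 2nd replica: the first node found on a different rack than the 1st
for n in dn1 dn2 dn3; do
  if [ "$(rack "$n")" != "$(rack "$FIRST")" ]; then SECOND=$n; break; fi
done
# 3rd replica: a different node on the same rack as the 2nd
for n in dn1 dn2 dn3; do
  if [ "$n" != "$SECOND" ] && [ "$(rack "$n")" = "$(rack "$SECOND")" ]; then THIRD=$n; break; fi
done
echo "replicas: $FIRST $SECOND $THIRD"   # prints: replicas: dn0 dn2 dn3
```

Note how losing all of rackA still leaves two copies on rackB, which is the point of crossing racks for the second replica.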
HDFS uses large blocks of 64 MB or 128 MB.
Example: a 150 MB file with a 64 MB block size is split into three blocks: 64 MB + 64 MB + 22 MB.
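The arithmetic behind that split can be checked in shell (150 and 64 are the figures from the slide):

```shell
FILE_MB=150
BLOCK_MB=64
FULL_BLOCKS=$((FILE_MB / BLOCK_MB))   # two full 64 MB blocks
LAST_MB=$((FILE_MB % BLOCK_MB))       # 22 MB remainder block
if [ "$LAST_MB" -gt 0 ]; then
  TOTAL=$((FULL_BLOCKS + 1))
else
  TOTAL=$FULL_BLOCKS
fi
echo "$FULL_BLOCKS full blocks + one ${LAST_MB} MB block = $TOTAL blocks"
```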
HDFS CLI
HDFS file read/write commands:
hadoop fs -ls <path>
hadoop fs -mkdir <path>
hadoop fs -cp <source> <destination>
hadoop fs -cat <file path>
hadoop fs -tail <file path>
hadoop fs -mv <source> <destination>
hadoop fs -rm <path>
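These verbs mirror their POSIX namesakes. A rough local sketch (a temp directory stands in for HDFS since no cluster is assumed here; the file names are made up):

```shell
D=$(mktemp -d)                        # stand-in for an HDFS home directory
mkdir "$D/hadoop"                     # ~ hadoop fs -mkdir hadoop
echo "2013,book,12.99" > "$D/hadoop/purchases.txt"
cat "$D/hadoop/purchases.txt"         # ~ hadoop fs -cat hadoop/purchases.txt
cp "$D/hadoop/purchases.txt" "$D/hadoop/copy.txt"   # ~ hadoop fs -cp
mv "$D/hadoop/copy.txt" "$D/hadoop/moved.txt"       # ~ hadoop fs -mv
LS_OUT=$(ls "$D/hadoop")              # ~ hadoop fs -ls hadoop
rm "$D/hadoop/moved.txt"              # ~ hadoop fs -rm
```

The real commands differ in that they talk to the Name Node rather than the local kernel, but the argument shapes are the same.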
HDFS file ownership:
sudo -u hdfs hadoop fs -chmod 600 hadoop/purchases.txt
sudo -u hdfs hadoop fs -chown root:root hadoop/purchases.txt
sudo -u hdfs hadoop fs -chgrp training hadoop/purchases.txt
HDFS Administration Commands
hadoop version
hadoop classpath
hadoop fsck /
hadoop balancer
hadoop fs -du -s -h <path>
hadoop fs -setrep -w 2 <file path>
hadoop fs -expunge
hadoop fs -df hdfs:/
HDFS read/write between the local file system and HDFS:
hadoop fs -copyFromLocal <source - local> <destination - HDFS>
hadoop fs -copyToLocal <source - HDFS> <destination - local>
hadoop fs -put <source> <destination>
hadoop fs -get <source> <destination>
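A put/get round trip can be sketched the same way, with a local temp directory standing in for HDFS (no real cluster assumed; purchases.txt is just an example name):

```shell
HDFS=$(mktemp -d)                       # local stand-in for HDFS
printf 'id,amount\n1,9.99\n' > purchases.txt
cp purchases.txt "$HDFS/"               # ~ hadoop fs -put purchases.txt <dst>
cp "$HDFS/purchases.txt" roundtrip.txt  # ~ hadoop fs -get <src> roundtrip.txt
cmp -s purchases.txt roundtrip.txt && echo "round trip OK"
```

-put/-get and -copyFromLocal/-copyToLocal are near-synonyms; the copy* forms make the local side explicit.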
Most important: Help?
hadoop fs -help