Meet Hadoop Family: part 1
Uploaded by caizerx
![Page 1: Meet Hadoop Family: part 1](https://reader031.fdocuments.in/reader031/viewer/2022021801/5875d0cc1a28ab8f438b52c7/html5/thumbnails/1.jpg)
HDFS
Meet Hadoop Family: part 1
![Page 2: Meet Hadoop Family: part 1](https://reader031.fdocuments.in/reader031/viewer/2022021801/5875d0cc1a28ab8f438b52c7/html5/thumbnails/2.jpg)
![Page 3: Meet Hadoop Family: part 1](https://reader031.fdocuments.in/reader031/viewer/2022021801/5875d0cc1a28ab8f438b52c7/html5/thumbnails/3.jpg)
• What is it? A distributed file system designed to store very large files with streaming data access patterns.
• Why is it needed? Very large files; streaming data access; commodity hardware.
• Traditional design limits: RAC and MPP systems bring the data to the computation, so the network becomes the bottleneck.
• Trade-offs: high-latency data access; not good for lots of small files; write-once, with no support for multiple writers.
![Page 4: Meet Hadoop Family: part 1](https://reader031.fdocuments.in/reader031/viewer/2022021801/5875d0cc1a28ab8f438b52c7/html5/thumbnails/4.jpg)
![Page 5: Meet Hadoop Family: part 1](https://reader031.fdocuments.in/reader031/viewer/2022021801/5875d0cc1a28ab8f438b52c7/html5/thumbnails/5.jpg)
A Client Reading Data From HDFS
![Page 6: Meet Hadoop Family: part 1](https://reader031.fdocuments.in/reader031/viewer/2022021801/5875d0cc1a28ab8f438b52c7/html5/thumbnails/6.jpg)
A Client Writing Data to HDFS
![Page 7: Meet Hadoop Family: part 1](https://reader031.fdocuments.in/reader031/viewer/2022021801/5875d0cc1a28ab8f438b52c7/html5/thumbnails/7.jpg)
Network Distances in Hadoop
• distance(/d1/r1/n1, /d1/r1/n1) = 0 (processes on the same node)
• distance(/d1/r1/n1, /d1/r1/n2) = 2 (different nodes on the same rack)
• distance(/d1/r1/n1, /d1/r2/n3) = 4 (nodes on different racks in the same data center)
• distance(/d1/r1/n1, /d2/r3/n4) = 6 (nodes in different data centers)
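The four distances above follow one rule: count the hops from each node up to their closest common ancestor in the /datacenter/rack/node tree and add them. A minimal sketch of that rule is shown here; the function name `network_distance` is made up for illustration and is not Hadoop's own API.

```python
def network_distance(a: str, b: str) -> int:
    """Sum of hops from each location up to the closest common ancestor.

    Locations are paths such as /d1/r1/n1 (data center / rack / node).
    """
    parts_a = a.strip("/").split("/")
    parts_b = b.strip("/").split("/")

    # Count how many leading path components the two locations share.
    common = 0
    for x, y in zip(parts_a, parts_b):
        if x != y:
            break
        common += 1

    # Each unshared component is one hop up the tree.
    return (len(parts_a) - common) + (len(parts_b) - common)


if __name__ == "__main__":
    for other in ("/d1/r1/n1", "/d1/r1/n2", "/d1/r2/n3", "/d2/r3/n4"):
        print(other, network_distance("/d1/r1/n1", other))
```
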
![Page 8: Meet Hadoop Family: part 1](https://reader031.fdocuments.in/reader031/viewer/2022021801/5875d0cc1a28ab8f438b52c7/html5/thumbnails/8.jpg)
• HDFS blocks: default size 128 MB (large on purpose, so that seek time stays small relative to transfer time), default replication 3x
• NameNode: stores the metadata of all blocks in the cluster; its storage location is configured by dfs.namenode.name.dir, default /dfs/xx
• DataNodes: store the data blocks, and also hold metadata related to their local blocks
• POSIX-like (almost) permissions: rw(x), users, groups, mode
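A quick way to get a feel for the block size and replication defaults is to work out how many blocks and replicas a file produces. The sketch below assumes the slide's defaults (128 MB, 3x); the helper name `block_usage` is hypothetical. It also assumes that the last block only occupies as much disk as the data it holds, so raw usage is the file size times the replication factor.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # default block size from the slide, in bytes
REPLICATION = 3                 # default replication factor from the slide


def block_usage(file_size: int, block_size: int = BLOCK_SIZE,
                replication: int = REPLICATION):
    """Return (blocks, block_replicas, raw_bytes) for a file of file_size bytes."""
    # Integer ceiling division: a partly filled last block still counts as a block.
    blocks = (file_size + block_size - 1) // block_size
    block_replicas = blocks * replication
    # Assumption: a partial block uses only its actual length on disk.
    raw_bytes = file_size * replication
    return blocks, block_replicas, raw_bytes


if __name__ == "__main__":
    one_gib = 1024 ** 3
    print(block_usage(one_gib))  # a 1 GiB file
    print(block_usage(1024))     # a 1 KiB file still needs one block entry
```

The second call hints at why many small files are a poor fit: each file costs at least one block entry in the NameNode's metadata, however little data it holds.
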
![Page 9: Meet Hadoop Family: part 1](https://reader031.fdocuments.in/reader031/viewer/2022021801/5875d0cc1a28ab8f438b52c7/html5/thumbnails/9.jpg)
• HDFS logs and web interface: port 50070 (NameNode), port 50075 (DataNode)
• WebHDFS / HttpFS REST interface
Request: http://sabtu:50070/webhdfs/v1/tmp?user.name=hdfs&op=GETFILESTATUS
Response: {"FileStatus":{"accessTime":0,"blockSize":0,"childrenNum":4,"fileId":16386,"group":"supergroup","length":0,"modificationTime":1467099643710,"owner":"hdfs","pathSuffix":"","permission":"1777","replication":0,"type":"DIRECTORY"}}
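The WebHDFS call on this slide can be reproduced with nothing but the Python standard library. The sketch below rebuilds the same URL and reads a few fields from the JSON response shown on the slide. The helper names (`webhdfs_url`, `summarize`, `fetch`) are made up for illustration, and `fetch` only works against a reachable NameNode with WebHDFS enabled, which is assumed here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def webhdfs_url(host: str, port: int, path: str, op: str, user: str) -> str:
    """Build a WebHDFS URL of the form used on the slide."""
    query = urlencode({"user.name": user, "op": op})
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"


def summarize(body: str) -> dict:
    """Pick a few fields out of a GETFILESTATUS response body."""
    status = json.loads(body)["FileStatus"]
    return {
        "type": status["type"],
        "owner": status["owner"],
        "group": status["group"],
        "permission": status["permission"],
        "children": status.get("childrenNum"),
    }


def fetch(url: str) -> str:
    """Issue the GET request (requires a live cluster; not called below)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


# Response body copied from the slide.
SAMPLE = ('{"FileStatus":{"accessTime":0,"blockSize":0,"childrenNum":4,'
          '"fileId":16386,"group":"supergroup","length":0,'
          '"modificationTime":1467099643710,"owner":"hdfs","pathSuffix":"",'
          '"permission":"1777","replication":0,"type":"DIRECTORY"}}')

if __name__ == "__main__":
    print(webhdfs_url("sabtu", 50070, "/tmp", "GETFILESTATUS", "hdfs"))
    print(summarize(SAMPLE))
```
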
![Page 10: Meet Hadoop Family: part 1](https://reader031.fdocuments.in/reader031/viewer/2022021801/5875d0cc1a28ab8f438b52c7/html5/thumbnails/10.jpg)
Some Features
• High Availability mode
• HDFS federation, a concept similar to namespace / database sharding
• HDFS balancer
• Safe mode
• Distributed copy (distcp)
![Page 11: Meet Hadoop Family: part 1](https://reader031.fdocuments.in/reader031/viewer/2022021801/5875d0cc1a28ab8f438b52c7/html5/thumbnails/11.jpg)
HDFS Federation
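One way to picture federation as namespace sharding is a client-side mount table that maps path prefixes to the NameNode responsible for them. The sketch below is only an illustration of that idea, not Hadoop's implementation: the table contents, the host names nn1 to nn3, port 8020, and the `resolve` helper are all assumptions.

```python
# Hypothetical mount table: each path prefix is served by one NameNode.
MOUNT_TABLE = {
    "/user": "hdfs://nn1:8020",
    "/tmp": "hdfs://nn2:8020",
    "/data": "hdfs://nn2:8020",
    "/data/logs": "hdfs://nn3:8020",
}


def resolve(path: str, table: dict = MOUNT_TABLE) -> str:
    """Return the full URI for path using the longest matching prefix."""
    best = None
    for prefix in table:
        # Match whole path components only, so /userdata does not match /user.
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise KeyError(f"no namespace is mounted for {path}")
    return table[best] + path


if __name__ == "__main__":
    print(resolve("/user/alice/file.txt"))
    print(resolve("/data/logs/2016/06/28.log"))
```

Each NameNode then only has to hold the metadata for its own slice of the namespace, which is the same motivation as sharding a database.
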
![Page 12: Meet Hadoop Family: part 1](https://reader031.fdocuments.in/reader031/viewer/2022021801/5875d0cc1a28ab8f438b52c7/html5/thumbnails/12.jpg)
Common Commands
• start cluster: $HADOOP_PREFIX_HOME/bin/start-dfs.sh
• stop cluster: $HADOOP_PREFIX_HOME/bin/stop-dfs.sh
• file operations:
hdfs dfs -cp x y
hdfs dfs -ls x
hdfs dfs -cat x
hdfs dfs -put x y
hdfs dfs -get x y
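The file operations on this slide can also be driven from a script. The sketch below wraps the `hdfs dfs` command line with `subprocess`; the helper names `hdfs_dfs_argv` and `run_hdfs_dfs` are made up for illustration, only the five operations listed on the slide are allowed, and running a command assumes the `hdfs` binary is on the PATH.

```python
import shutil
import subprocess

# Only the operations listed on the slide.
ALLOWED_OPS = {"cp", "ls", "cat", "put", "get"}


def hdfs_dfs_argv(op: str, *paths: str) -> list:
    """Build the argument list for `hdfs dfs -<op> <paths...>`."""
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported operation: {op}")
    return ["hdfs", "dfs", f"-{op}", *paths]


def run_hdfs_dfs(op: str, *paths: str) -> str:
    """Run the command and return its standard output."""
    if shutil.which("hdfs") is None:
        raise RuntimeError("the hdfs command was not found on PATH")
    result = subprocess.run(hdfs_dfs_argv(op, *paths), check=True,
                            capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Printing the argument list works without a cluster.
    print(hdfs_dfs_argv("put", "x", "y"))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.
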
![Page 13: Meet Hadoop Family: part 1](https://reader031.fdocuments.in/reader031/viewer/2022021801/5875d0cc1a28ab8f438b52c7/html5/thumbnails/13.jpg)
Questions? https://www.meetup.com/Jakarta-Hadoop-Big-Data/