Cross-DC Fault-Tolerant ViewFileSystem @ Twitter
Uploaded by: dataworks-summithadoop-summit
Command Line Interface
hadoop --config /etc/hadoop/revenue-dcA fs -ls /
hadoop --config /etc/hadoop/dw-dcB fs -ls /
……
Directory layout on local file system
/etc/hadoop
  revenue-dcA
  dw-dcB
    core-site.xml, hdfs-site.xml
[Figure: clusters across DataCenter A and DataCenter B (Log processing, Revenue, Data Warehouse, Log processing), each drawn as an MR / YARN / HDFS stack]
DW @ dcB
HDFS blocks
Client machine:
hadoop --config /etc/hadoop/dw-dcB fs -get /user/bob/fileB
Distributed FileSystem
fs.defaultFS -> hdfs://dw-nn-dcB -> NameNode1, NameNode2
/etc/hadoop/dw-dcB/hdfs-site.xml, core-site.xml
App paths                 HDFS paths
viewfs://dw-nn-dcB/var    hdfs://dw-tmp-dcB/var
viewfs://dw-nn-dcB/user   hdfs://dw-user-dcB/user
viewfs://dw-nn-dcB/logs   hdfs://dw-log-dcB/logs
mountable.xml
FileSystem
  ViewFileSystem (viewfs://)
  Distributed FileSystem (hdfs://)
  S3 FileSystem
Client machine:
hadoop --config /etc/hadoop/dw-dcB fs -get /user/bob/fileB
Distributed FileSystem
dw-user-dcB -> dw-user-NN1-dcB, dw-user-NN2-dcB
ViewFileSystem
/user -> hdfs://dw-user-dcB/user
/etc/hadoop/dw-dcB/hdfs-site.xml, core-site.xml, mountable.xml
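The lookup that ViewFileSystem performs on this slide can be illustrated with a small shell sketch. It is not Hadoop code: `MOUNT_TABLE` and `resolve_viewfs` are hypothetical names, the table simply restates the three mount points listed earlier, and paths are assumed to be scheme-less (as in the `fs -get /user/bob/fileB` example).

```shell
# Mount table restated from the slides: "<mount point> <target HDFS URI>" per line.
MOUNT_TABLE='/var hdfs://dw-tmp-dcB/var
/user hdfs://dw-user-dcB/user
/logs hdfs://dw-log-dcB/logs'

# resolve_viewfs PATH (hypothetical helper): print the HDFS URI that PATH maps to,
# or return 1 when no mount point covers PATH.
resolve_viewfs() {
    path="$1"
    while read -r mount target; do
        case "$path" in
            "$mount"|"$mount"/*)
                # Swap the mount-point prefix for the target URI.
                rest=${path#"$mount"}
                printf '%s\n' "$target$rest"
                return 0
                ;;
        esac
    done <<EOF
$MOUNT_TABLE
EOF
    echo "no mount point for $path" >&2
    return 1
}

resolve_viewfs /user/bob/fileB   # prints hdfs://dw-user-dcB/user/bob/fileB
```

The resulting URI names a logical nameservice (dw-user-dcB), which the Distributed FileSystem layer then maps to its NameNodes; that second step is left out of the sketch.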
DW @ dcB
hadoop --config /etc/hadoop/revenue-dcA fs -get /user/bob/fileA
hdfs --config /etc/hadoop/dw-dcB fsck -fs hdfs://dw-user-dc2 /
# find all "fileC" files on all clusters
for i in `ls /etc/hadoop`; do hadoop --config /etc/hadoop/$i fs -ls fileC; done
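A slightly more defensive variant of that loop, as a sketch only: it globs the config directories instead of parsing `ls` output, and it prints each command rather than running it, so it can be tried on a machine without Hadoop. `list_cluster_cmds` is a hypothetical helper name.

```shell
# list_cluster_cmds ROOT FILE (hypothetical helper): print one "hadoop fs -ls"
# command per config directory found directly under ROOT.
list_cluster_cmds() {
    root="$1"
    file="$2"
    for dir in "$root"/*/; do
        # Skip the unexpanded pattern when ROOT has no subdirectories.
        [ -d "$dir" ] || continue
        name=$(basename "$dir")
        # Dry run: print the command instead of executing it.
        echo "hadoop --config $root/$name fs -ls $file"
    done
}

list_cluster_cmds /etc/hadoop fileC
```

Either way, the client still has to visit every cluster configuration in turn.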
Description        $sourcePath                                       Result
global namespace   viewfs://revenue-nn-dcA/logs/dirB /logs/dirB      Source path unresolvable
Active namenode    hdfs://revenue-log-nn1-dcA/logs/dirB /logs/dirB   Not reliable due to hard-coded namenode
hftp & DNS alias   hftp://revenue-nn-dcA/logs/dirB                   hftp reliability and efficiency
// no need to remember copy(From/To)Local aka get/put
scalding> fsShell("-cp /local/user/cecily/file.txt /user/cecily/hdfs_file.txt")
res6: Int = 0
c@dc1
FileSystem nfly = FileSystem.get(conf);
FSDataOutputStream out = nfly.create(new Path("/nfly/C/user/cecily/file.txt"));
out.write(....);
out.close() BEGIN
out.close() END
FSDataInputStream in = nfly.open(new Path("/nfly/C/user/cecily/file.txt"));
c@dc2
/user/cecily/_nfly_tmp_file.txt
/user/cecily/_nfly_tmp_file.txt
close, rename to /user/cecily/file.txt
close, rename to /user/cecily/file.txt
client@dc1
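The write path in this diagram (write to a temporary name in each data center, then rename on close) can be mimicked with local directories standing in for the clusters. This is a sketch under that assumption, not the nfly implementation: `nfly_write` is a hypothetical helper, the `_nfly_tmp_` prefix is taken from the slide, and failure handling is omitted.

```shell
# nfly_write FILE_PATH REPLICA_ROOT... (hypothetical helper): read data from stdin
# and store it at FILE_PATH under every replica root.
nfly_write() {
    rel="$1"
    shift
    dir=$(dirname "$rel")
    base=$(basename "$rel")
    staging=$(mktemp)
    cat > "$staging"
    # Write phase: the data lands under a temporary name in every replica.
    for root in "$@"; do
        mkdir -p "$root$dir"
        cp "$staging" "$root$dir/_nfly_tmp_$base"
    done
    # Close phase: rename the temporary file to its final name in every replica.
    for root in "$@"; do
        mv "$root$dir/_nfly_tmp_$base" "$root$dir/$base"
    done
    rm -f "$staging"
}

# Local directories dc1 and dc2 stand in for the two data centers.
work=$(mktemp -d)
echo "hello" | nfly_write /user/cecily/file.txt "$work/dc1" "$work/dc2"
cat "$work/dc2/user/cecily/file.txt"   # prints hello
```

Because the final name only appears after the rename, a reader in either replica sees either no file or the complete file, never a partially written one.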