Hadoop I/O Analysis
Architect's View of Hadoop I/O
I/O Analysis Using vProbes
Richard McDougall, V1.0
April 2012
Architect's Questions
• Does Hadoop really need compute + data locality?
• How much and what I/O rates of ephemeral data do we need to design for?
• What I/O patterns do we need to support HDFS?
• What is the I/O pattern of M-R tasks?
• Are there opportunities for caching – map input, output or ephemeral?
Controlled Small Study
• Focus on developing tooling
• Using vProbes + Perl + R
• Hadoop 0.20.204
• Terasort @ 1GB
• One Namenode, Tasktracker, Datanode
Terasort

[Diagram: Input File → Input Splits (x16) → Map Tasks (x4 shown), each sorting a chunk of key-values and shuffling output to reducers → Shuffle → Combine and Sort → Reduce (Sort) → Output File]
Log of the sort 'Job'

$ log.pl job_201201261301_0005_1327649126255_rmc_TeraSort

Item          Time     Jobname            Taskname  Phase    Start-Time End-Time Elapsed
Job           0.000    201201261301_0005
Job                    201201261301_0005
Job           0.475    201201261301_0005            PREP
Task          1.932    201201261301_0005  m_000017  SETUP
MapAttempt    3.066    201201261301_0005  m_000017  SETUP
MapAttempt    10.409   201201261301_0005  m_000017  SETUP    SUCCESS  1.932    10.409   8.477   "setup"
Task          10.966   201201261301_0005  m_000017  SETUP    SUCCESS  1.932    10.966   9.034
Job                    201201261301_0005            RUNNING
Task          10.970   201201261301_0005  m_000000  MAP
Task          10.972   201201261301_0005  m_000001  MAP
MapAttempt    10.981   201201261301_0005  m_000000  MAP
MapAttempt    65.819   201201261301_0005  m_000000  MAP      SUCCESS  10.970   65.819   54.849  ""
Task          68.063   201201261301_0005  m_000000  MAP      SUCCESS  10.970   68.063   57.093
MapAttempt    10.998   201201261301_0005  m_000001  MAP
MapAttempt    65.363   201201261301_0005  m_000001  MAP      SUCCESS  10.972   65.363   54.391  ""
Task          68.065   201201261301_0005  m_000001  MAP      SUCCESS  10.972   68.065   57.093
Task          68.066   201201261301_0005  m_000002  MAP
Task          68.067   201201261301_0005  m_000003  MAP
Task          68.068   201201261301_0005  r_000000  REDUCE
MapAttempt    68.075   201201261301_0005  m_000002  MAP
MapAttempt    139.789  201201261301_0005  m_000002  MAP      SUCCESS  68.066   139.789  71.723  ""
Task          140.193  201201261301_0005  m_000002  MAP      SUCCESS  68.066   140.193  72.127
MapAttempt    68.076   201201261301_0005  m_000003  MAP
MapAttempt    139.927  201201261301_0005  m_000003  MAP      SUCCESS  68.067   139.927  71.860  ""
Task          140.198  201201261301_0005  m_000003  MAP      SUCCESS  68.067   140.198  72.131
…
ReduceAttempt 68.112   201201261301_0005  r_000000  REDUCE
ReduceAttempt 795.299  201201261301_0005  r_000000  REDUCE   SUCCESS  68.068   795.299  727.231 "reduce > reduce"
Task          798.223  201201261301_0005  r_000000  REDUCE   SUCCESS  68.068   798.223  730.155
Task          798.226  201201261301_0005  m_000016  CLEANUP
MapAttempt    798.241  201201261301_0005  m_000016  CLEANUP
MapAttempt    806.113  201201261301_0005  m_000016  CLEANUP  SUCCESS  798.226  806.113  7.887   "cleanup"
Task          807.252  201201261301_0005  m_000016  CLEANUP  SUCCESS  798.226  807.252  9.026
Job           807.253  201201261301_0005            SUCCESS  0.000    807.253  807.253
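A minimal parser for this listing, in the spirit of log.pl, can total the elapsed time per phase from the "Task … SUCCESS" lines. This is a sketch against the field layout shown above, not part of the original tooling:

```python
import re

def phase_times(log_text):
    """Sum elapsed seconds per phase (SETUP/MAP/REDUCE/CLEANUP)
    from 'Task ... SUCCESS' lines of the log.pl listing."""
    totals = {}
    # Matches e.g.:
    # Task 68.063 201201261301_0005 m_000000 MAP SUCCESS 10.970 68.063 57.093
    pat = re.compile(
        r"^Task\s+[\d.]+\s+\S+\s+\S+\s+(\w+)\s+SUCCESS\s+[\d.]+\s+[\d.]+\s+([\d.]+)",
        re.M)
    for phase, elapsed in pat.findall(log_text):
        totals[phase] = totals.get(phase, 0.0) + float(elapsed)
    return totals
```

For the two mappers above this yields MAP ≈ 114 s of task time against a single REDUCE of 730 s, which is the skew visible in the phase chart that follows.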
Terasort: Map and Reduce Phases

[Chart: Elapsed Time (seconds) per task: Setup Map, Mappers, Reducer, Cleanup Map]

Terasort: Map and Reduce Phases

[Same chart, with callouts: Zoom in on Map Task I/O / Zoom in on Reduce Task I/O]
VMware vProbes
• Dynamic Instrumentation
• Probe multiple VMs
• Probe Virtualization Layer
• VMware Fusion and Workstation
vProbes

GUEST:ENTER:system_call {
   string path;
   comm = curprocname();
   tid = curtid();
   pid = curpid();
   ppid = curppid();
   syscall_num = sysnum;

   if (syscall_num == NR_open) {
      path = guestloadstr(sys_arg0);
      syscall_name = "open";
      sprintf(syscall_args, "\"%s\", %x, %x", path, sys_arg1, sys_arg2);
   …
}

GUEST:OFFSET:ret_from_sys_call:0 {
   printf("%s/%d/%d/%d %s(%s) = %d <0>\n", comm, pid, tid, ppid, syscall_name,
          syscall_args, getgpr(REG_RAX));
}

java/14774/15467/1 open("/host/hadoop/hdfs/data/current/subdir0/blk_1719908349220085071_1649.meta", 0, 1b6) = 144 <0>
java/14774/15467/1 stat("/host/hadoop/hdfs/data/current/subdir0/blk_1719908349220085071_1649.meta", 7f0b80a4e590) = 0 <0>
java/14774/15467/1 read(144, 7f0b80a4c470, 4096) = 167 <0>
Pathname Resolution

filetracevp.pl:

if ($syscall =~ m/open/) {
    $path1 = $line;
    $path1 =~ s/[A-z\/0-9]+[ ]+[a-z]+\("([^"]+)".*\n/$1/;
    $fd1 = $line;
    if ($fd1 =~ s/.* ([0-9]+) <.*>\n/$1/) {
        $fds{$pid,$fd1} = $path1;
    }
}
if ($syscall =~ m/write/) {
    $params = $line;
    if ($params =~ s/^[A-z\/0-9]+[ ]+[a-z]+\(([0-9]+),.* ([0-9]+)\) = ([0-9]+) <(.*)>\n/$1,$2,$3,$4/) {
        ($fd1, $size, $bytes, $lat) = split(',', $params);
        $path1 = $fds{$pid, $fd1};
    }
}
…

java,14774,15467,,open,0,0,0,0,144,/host/hadoop/hdfs/data/current/subdir0/blk_1719908349220085071_1649.meta,0,
java,14774,15467,,stat,0,0,0,0,0,/host/hadoop/hdfs/data/current/subdir0/blk_1719908349220085071_1649.meta,0,
java,14774,15467,,read,4096,167,0,0,144,/host/hadoop/hdfs/data/current/subdir0/blk_1719908349220085071_1649.meta,0,
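The core idea of filetracevp.pl — remember the fd returned by each open() and use it to attach a pathname to later read/write syscalls — can be sketched compactly in Python. The regexes below are my own, written against the trace format shown above:

```python
import re

def resolve_paths(trace_lines):
    """Attach pathnames to read/write syscalls by remembering the fd
    returned by each successful open(), keyed on (pid, fd)."""
    fds, records = {}, []
    open_re = re.compile(r'^\S+/(\d+)/\d+/\d+ open\("([^"]+)".*\) = (\d+) <')
    io_re = re.compile(r'^\S+/(\d+)/\d+/\d+ (read|write)\((\d+), \S+, (\d+)\) = (-?\d+) <')
    for line in trace_lines:
        m = open_re.match(line)
        if m:
            fds[(m.group(1), int(m.group(3)))] = m.group(2)
            continue
        m = io_re.match(line)
        if m:
            pid, op = m.group(1), m.group(2)
            fd, size, ret = int(m.group(3)), int(m.group(4)), int(m.group(5))
            records.append((op, fds.get((pid, fd), "?"), size, ret))
    return records
```

Keying on (pid, fd) rather than fd alone matters because each Hadoop JVM has its own file-descriptor table.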
Controlled Small-Scale Study

Job Counters
  Launched reduce tasks=1
  SLOTS_MILLIS_MAPS=1146887
  Launched map tasks=16
  Data-local map tasks=16
  SLOTS_MILLIS_REDUCES=766823
File Input Format Counters
  Bytes Read=1000057358
File Output Format Counters
  Bytes Written=1000000000
FileSystemCounters
  FILE_BYTES_READ=2382257412
  HDFS_BYTES_READ=1000059070
  FILE_BYTES_WRITTEN=3402627838
  HDFS_BYTES_WRITTEN=1000000000
Map-Reduce Framework
  Map output materialized bytes=1020000096
  Map input records=10000000
  Reduce shuffle bytes=1020000096
  Spilled Records=33355441
  Map output bytes=1000000000
  Map input bytes=1000000000
  Combine input records=0
  SPLIT_RAW_BYTES=1712
  Reduce input records=10000000
  Reduce input groups=10000000
  Combine output records=0
  Reduce output records=10000000
  Map output records=10000000
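These counters already answer part of the "how much ephemeral I/O" question: local-file traffic dwarfs HDFS traffic. A quick calculation over the counter values from this run (the ~3x figure reappears in the back-of-the-envelope model later):

```python
# FileSystemCounters from the terasort run above
FILE_BYTES_READ    = 2_382_257_412   # local (temp) bytes read
FILE_BYTES_WRITTEN = 3_402_627_838   # local (temp) bytes written
HDFS_BYTES_READ    = 1_000_059_070
HDFS_BYTES_WRITTEN = 1_000_000_000

# Temp bytes written per byte of HDFS input: roughly 3.4x
ratio = FILE_BYTES_WRITTEN / HDFS_BYTES_READ
print(round(ratio, 1))   # → 3.4

# Total local traffic is ~5.8GB against 2GB of HDFS traffic
temp_total = FILE_BYTES_READ + FILE_BYTES_WRITTEN
hdfs_total = HDFS_BYTES_READ + HDFS_BYTES_WRITTEN
```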
Hadoop Distro                              236
Hadoop Logs                                132
Hadoop clienttmp unjar                       1
Mappers files jobcache - spills           1753
Mappers files jobcache - output           1777
Reducer Intermediate                       764
Reducers Shuffle and Intermediate         1744
Jobcache class files and shell scripts       1
Hadoop Datanode                           1690
JVM - /usr/lib/jvm…                         98

Total MB                                  7987
$ hadoop jar hadoop-examples-0.20.204.0.jar teragen 10000000 teradata
<begin trace>
$ hadoop jar hadoop-examples-0.20.204.0.jar terasort teradata teraout
[Bar chart: MB per category, 0–2000 scale: Hadoop Distro, Hadoop Logs, Hadoop clienttmp unjar, Mappers files jobcache - spills, Mappers files jobcache - map output, Reducer intermediate file, Reducers files jobcache - output, Jobcache class files and shell scripts, Hadoop Datanode, JVM - /usr/lib/jvm…]

75% of Disk Bandwidth
Hadoop I/O Model (with some data from early observations)

[Diagram: Job → Map Tasks (x4) → Shuffle → Sort → Reduce (x2) → HDFS.
 DFS Input Data: 12% of bandwidth; DFS Output Data: 12% of bandwidth.
 Map-side temp files: Spills & Logs (spill*.out*), Map Output (file.out*);
 Shuffle (map_*.out*); Combine (intermediate.out*)]
One Mapper Task: Temp Data

path                                                                 bytes
/host/hadoop/clienttmp/mapred/local/taskTracker/rmc/jobcache/job_201201251035_0001/attempt_201201251035_0001_m_000000_0/output/file.out    67586124
/host/hadoop/clienttmp/mapred/local/taskTracker/rmc/jobcache/job_201201251035_0001/attempt_201201251035_0001_m_000000_0/output/spill1.out  52762519
/host/hadoop/clienttmp/mapred/local/taskTracker/rmc/jobcache/job_201201251035_0001/attempt_201201251035_0001_m_000000_0/output/spill0.out  52508540
/host/hadoop/clienttmp/mapred/local/taskTracker/rmc/jobcache/job_201201251035_0001/attempt_201201251035_0001_m_000000_0/output/spill2.out  29698564
/usr/lib/jvm/java-6-openjdk/jre/lib/rt.jar                              5057763
/home/rmc/untars/hadoop-0.20.204.0/hadoop-core-0.20.204.0.jar            895582
/home/rmc/untars/hadoop-0.20.204.0/lib/log4j-1.2.15.jar                   82522
/home/rmc/untars/hadoop-0.20.204.0/lib/commons-lang-2.4.jar               70477
/home/rmc/untars/hadoop-0.20.204.0/lib/commons-configuration-1.6.jar      61007
/usr/lib/x86_64-linux-gnu/gconv/gconv-modules                             51772
/host/hadoop/clienttmp/mapred/local/taskTracker/rmc/jobcache/job_201201251035_0001/job.xml                                                    44420
/home/rmc/untars/hadoop-0.20.204.0/lib/commons-collections-3.2.1.jar      29974
/host/hadoop/clienttmp/mapred/local/taskTracker/rmc/jobcache/job_201201251035_0001/attempt_201201251035_0001_m_000000_0/job.xml               21695
/usr/lib/jvm/java-6-openjdk/jre/lib/amd64/libnio.so                       15946
/home/rmc/untars/hadoop-0.20.204.0/conf/core-site.xml                     11024
/usr/lib/jvm/java-6-openjdk/jre/lib/security/java.security                10081
/proc/self/maps                                                            7523
One Mapper Task: Temp I/O Counts (I/O measured at syscall)
[Histograms: Number of I/Os vs. I/O Size Bucket (power-of-two buckets, 1–131072 bytes), with separate Read and Write panels]
One Mapper Task: Tmp Bytes Transferred

[Histograms: Bytes vs. I/O Size Bucket (power-of-two buckets up to 134217728 bytes), one panel at syscall granularity and one at logical-I/O granularity]
Logical I/O (sequential grouping of syscalls) vs. I/O measured at syscall
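The "logical I/O" panels merge runs of back-to-back sequential syscalls on the same file into one larger I/O before bucketing by power-of-two size. A minimal sketch of both steps (record layout and function names are my own, not from the original Perl/R tooling):

```python
from collections import Counter

def bucket(size):
    """Smallest power-of-two bucket >= size, as in the histograms."""
    b = 1
    while b < size:
        b *= 2
    return b

def logical_ios(ios):
    """Merge runs of sequential syscalls on the same (path, op) into
    single logical I/Os. Each record is (path, op, offset, nbytes)."""
    merged = []
    for path, op, off, nbytes in ios:
        last = merged[-1] if merged else None
        if last and last[0] == path and last[1] == op and last[2] + last[3] == off:
            merged[-1] = (path, op, last[2], last[3] + nbytes)  # extend the run
        else:
            merged.append((path, op, off, nbytes))
    return merged

# Three sequential 4KB reads collapse into one 12KB logical I/O
ios = [("f", "read", 0, 4096), ("f", "read", 4096, 4096),
       ("f", "read", 8192, 4096), ("g", "write", 0, 100)]
hist = Counter(bucket(n) for _, _, _, n in logical_ios(ios))
```

This is why the logical-I/O histograms shift toward much larger buckets than the raw syscall histograms: the mappers and datanode issue many small sequential syscalls over what is really streaming I/O.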
Reducer Task: Temp Data
Reducer Task: Temp I/O Counts (I/O measured at syscall)

[Histograms: Number of I/Os vs. I/O Size Bucket, with separate Read and Write panels]
Reducer Task: Tmp Bytes Transferred
Logical I/O (sequential grouping of syscalls) vs. I/O measured at syscall

[Histograms: Bytes vs. I/O Size Bucket, one panel at syscall granularity and one at logical-I/O granularity]
Datanode – Bytes Transferred

[Histograms: Bytes vs. I/O Size Bucket, with separate Read and Write panels]
Datanode – Actual vs. Logical I/O Size
Logical I/O (sequential grouping of syscalls) vs. I/O measured at syscall

[Histograms: Bytes vs. I/O Size Bucket, actual-syscall panel vs. logical-I/O panel]
Datanode – IOPS

[Histograms: Number of I/Os vs. I/O Size Bucket, with separate Read and Write panels]
Back of the Envelope Modeling

• How much bandwidth does terasort need?
  – 10 seconds of CPU/core time per task
  – 128MB of HDFS per task
  – ~3x, 384MB of temporary data per task
I/O Component   Per-task   Per-task Bandwidth   Per-host (24 cores)
HDFS I/O        128MB      ~13 MBytes/s         312 MBytes/s
Temp            384MB      ~38 MBytes/s         912 MBytes/s
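The table follows directly from the assumptions in the bullets; a quick check of the arithmetic (the slide rounds the per-task rates to 13 and 38 MB/s before multiplying by cores):

```python
task_seconds = 10           # CPU/core time per task (slide assumption)
hdfs_mb_per_task = 128      # HDFS data per task
temp_mb_per_task = 3 * 128  # ~3x temporary data per task
cores_per_host = 24

hdfs_bw_per_task = hdfs_mb_per_task / task_seconds   # 12.8 MB/s, ~13
temp_bw_per_task = temp_mb_per_task / task_seconds   # 38.4 MB/s, ~38

# Per-host numbers use the rounded per-task rates: 13*24 = 312, 38*24 = 912
hdfs_bw_per_host = 13 * cores_per_host
temp_bw_per_host = 38 * cores_per_host
```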
Do we need locality?
• Main issue is cross-sectional bandwidth
  – Secondary issue is per-host link speed
  – Just look at storage I/O now, consider shuffle next
I/O Component   Per-host (24 cores)   Network Bandwidth w/ 0% locality   Rack Bandwidth w/ 40 hosts
HDFS I/O        312 MBytes/s          2.5 Gbits                          100 Gbits
Temp            912 MBytes/s          7.3 Gbits                          300 Gbits
• Possible Conclusion
  – Must have locality w/ 1Gbit host link
  – Feasible to have remote data w/ 10Gbit, keeping temp local only
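The network columns in the locality table can be reproduced with the same back-of-the-envelope style (decimal units; the slide rounds 99.8 up to 100 and 292 up to 300 Gbit):

```python
def mbytes_to_gbits(mb_per_s):
    """Convert MBytes/s to Gbits/s (8 bits/byte, decimal units)."""
    return mb_per_s * 8 / 1000

hosts_per_rack = 40

hdfs_link = mbytes_to_gbits(312)          # ~2.5 Gbit/s per host at 0% locality
hdfs_rack = hdfs_link * hosts_per_rack    # ~100 Gbit/s cross-sectional per rack

temp_link = mbytes_to_gbits(912)          # ~7.3 Gbit/s per host if temp went remote
temp_rack = temp_link * hosts_per_rack    # ~292 Gbit/s per rack
```

The conclusion falls out of the per-host link column: 2.5 Gbit/s of remote HDFS traffic saturates a 1 Gbit link (hence locality is mandatory there) but fits comfortably in 10 Gbit, provided the 7.3 Gbit/s of temp traffic stays on local disk.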