Hadoop World: Low Latency, Random Reads from HDFS
Uploaded by oleksiy-kovyrin
RadFS
Random Access DFS
DFS in a slide
Locates DataNode, opens socket, says hi
DataNode allocates a thread to stream block contents from opened position
Client reads in as it processes
Great for streaming entire blocks
Positioned reads cut across the grain
A Different Approach
Everything is a positioned read
All interactions with DN are stateless
Wrapped ByteService interfaces Caching, Network, Checksum
Each configurable, can be optimized for task
Connections pooled and re-used often
Fewer threads on server
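The layering on this slide can be sketched roughly as follows. This is an illustration only, not the RadFS code: the `ByteService` protocol, the class names, and the page size are assumptions, and a local file read with `os.pread` (Unix only) stands in for the network layer.

```python
import os
import tempfile
from typing import Protocol


class ByteService(Protocol):
    """Hypothetical interface: every call is a stateless positioned read."""

    def read(self, offset: int, length: int) -> bytes: ...


class LocalFileByteService:
    """Stand-in for the network layer: positioned reads on a local file."""

    def __init__(self, path: str) -> None:
        self._fd = os.open(path, os.O_RDONLY)

    def read(self, offset: int, length: int) -> bytes:
        return os.pread(self._fd, length, offset)

    def close(self) -> None:
        os.close(self._fd)


class CachingByteService:
    """Wraps another ByteService and keeps fixed-size pages in memory."""

    def __init__(self, inner: ByteService, page_size: int = 4096) -> None:
        self._inner = inner
        self._page_size = page_size
        self._pages = {}   # page index -> bytes (unbounded in this sketch)
        self.misses = 0

    def _page(self, index: int) -> bytes:
        if index not in self._pages:
            self.misses += 1
            self._pages[index] = self._inner.read(
                index * self._page_size, self._page_size)
        return self._pages[index]

    def read(self, offset: int, length: int) -> bytes:
        if length <= 0:
            return b""
        first = offset // self._page_size
        last = (offset + length - 1) // self._page_size
        data = b"".join(self._page(i) for i in range(first, last + 1))
        start = offset - first * self._page_size
        return data[start:start + length]


def demo():
    """Read the same 12 bytes twice across a page boundary."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(bytes(range(256)) * 64)  # 16 KiB of known content
        path = f.name
    net = LocalFileByteService(path)
    cached = CachingByteService(net, page_size=4096)
    try:
        a = cached.read(4090, 12)  # touches pages 0 and 1
        b = cached.read(4090, 12)  # served entirely from the cache
        return a, b, cached.misses
    finally:
        net.close()
        os.unlink(path)
```

Because each layer exposes the same `read(offset, length)` call, a checksum layer could be stacked the same way, and the client picks the stack that suits the task.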
DFS Anatomy – seek + read()
DFS seek+read, cont'd
Locates preferred block, caches DN locations
Opens new Socket and BlockReader for each random read
Reads from the Socket
Object creation means GC debt
No optimizations for repeated reads of the same data
Threads will consume resources on server after client hangup – files aren't automatically closed
RadFS Overview
RadFS seek+read, cont'd
Transparently caches frequently read data
Automatically pools/manages file handles
Reduces network congestion (in theory)
Lower DataNode workload: 3 threads total instead of 1 per Xceiver
Configurable on client side for the task at hand
Network latency penalty on long-running reads
Checksum implementation means 2 reads per random read if caching is disabled
Implementation Notes
Checksum is currently generated by wrapping ChecksumFileSystem around RadFileSystem
Inefficient, reads 2 files over dfs
Improper – what if checksum block is corrupt?
CachingByteService implements lookahead (good) by copying bytes twice (bad)
Permissions happen “by accident” at namenode
Attackable by searching blockid space on DNs
Could exchange UserAccessToken on request
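On the lookahead note above: one way to keep lookahead without the second copy is to hand out views into the fetched buffer instead of slicing it into new byte strings. The sketch below is an assumption about how that could look, not the CachingByteService implementation. `LookaheadBuffer` and the `fetch` callable are hypothetical names.

```python
class LookaheadBuffer:
    """Holds one fetched region and serves reads from it as zero-copy views."""

    def __init__(self, fetch, lookahead: int = 64 * 1024) -> None:
        self._fetch = fetch          # hypothetical: (offset, length) -> bytes
        self._lookahead = lookahead
        self._start = 0
        self._buf = memoryview(b"")
        self.fetches = 0

    def read(self, offset: int, length: int) -> memoryview:
        end = offset + length
        # Refill only when the request falls outside the buffered region.
        if offset < self._start or end > self._start + len(self._buf):
            self.fetches += 1
            self._start = offset
            self._buf = memoryview(
                self._fetch(offset, max(length, self._lookahead)))
        rel = offset - self._start
        # Slicing a memoryview does not copy the underlying bytes.
        return self._buf[rel:rel + length]
```

The caller gets a `memoryview` and converts to `bytes` only if it really needs an owned copy.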
Benchmark Environment
EC2 “Medium” - 2x2GHz, 1.7GB, shared I/O
Operations against 20GB sequence file
All tests run single-threaded from the lightly loaded namenode
Fast internal network, adequate memory but not enough to page entire file
All benchmarks in a given set were run on the same instance, middle value from 3 runs
Random Reads - 2k
10,000 random reads of 2k each over the length of a 20GB file
DFS averaged 7.8ms while Rad with no cache averaged 4.4ms
Caching added a full 2ms: the hardcoded lookahead was no help, and there was lots of unnecessary byte copying
[Chart: Random Reads - 2k, average ms per read. Series: DFS, RadFS, Rad - No Cache]
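The shape of this test (many small positioned reads at random offsets, reported as an average in milliseconds) can be reproduced with a short harness. The sketch below runs against a local file using `os.pread` (Unix only), so its numbers are not comparable to the HDFS figures above. The function name and defaults are assumptions for illustration.

```python
import os
import random
import time


def time_random_reads(path: str, reads: int = 10_000,
                      size: int = 2048, seed: int = 0) -> float:
    """Return average milliseconds per positioned read of `size` bytes."""
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    file_len = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        for _ in range(reads):
            # Pick an offset that leaves room for a full-size read.
            offset = rng.randrange(0, file_len - size + 1)
            data = os.pread(fd, size, offset)
            assert len(data) == size
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return elapsed * 1000.0 / reads
```

As in the slides, taking the middle value of several runs helps smooth out page-cache and scheduling noise.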
SequenceFile Search
Binary search over 10GB sequence file
DFS, RadFS with various cache settings
Indicative of potential filesystem uses
Lucene
Large numbers of secondary indices
Ease of development
Read-only RDBMS-like systems built from ETLs or other long-running process
[Chart: Sequence File Binary Search, 5000 searches, average ms per search. Series: DFS, RAD0, RAD16MB, RAD128MB]
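To show why this workload leans so heavily on positioned reads, here is a minimal binary search in which every probe is one `pread`. It assumes sorted, fixed-width records, which is a simplification and not the SequenceFile format. The record layout and function names are made up for the example.

```python
import os
import struct

RECORD = struct.Struct(">QQ")  # hypothetical layout: 8-byte key, 8-byte value


def lookup(fd: int, key: int, count: int):
    """Binary search `count` sorted fixed-width records; one pread per probe."""
    lo, hi = 0, count - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        k, v = RECORD.unpack(os.pread(fd, RECORD.size, mid * RECORD.size))
        if k == key:
            return v
        if k < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return None


def build_table(path: str, n: int) -> None:
    """Write n records with keys 0, 2, 4, ... and value = key * 10."""
    with open(path, "wb") as f:
        for i in range(n):
            f.write(RECORD.pack(2 * i, 20 * i))
```

Each search costs about log2(count) reads, and the first few probes always hit the same offsets, which is the access pattern a client-side cache is expected to help with.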
Streaming
DFS is inherently faster for streaming due to the dedicated server thread
Checksumming is expensive!
Early radfs builds beat dfs at 1-byte read()s because they didn't have checksumming
Streaming jobs require a PipeliningByteService that would make requests to the DataNode, then stream in and checksum in a separate client-side thread
[Chart: Streaming - 1GB, 1-byte reads, time in seconds. Series labels as extracted: DFS RAD no checksum]
[Chart: Streaming - 1GB, 2k reads, time in seconds. Series: DFS, RAD]
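The PipeliningByteService idea from the streaming slide (fetch ahead and checksum on a separate client-side thread) might look something like the sketch below. It is only an outline under stated assumptions: `fetch(offset, length)` is a hypothetical positioned-read callable, CRC32 stands in for whatever checksum is used, and error handling is minimal.

```python
import queue
import threading
import zlib


def pipelined_pages(fetch, total: int, page_size: int, depth: int = 4):
    """Yield (page, crc32) pairs while a background thread fetches ahead.

    At most `depth` pages are buffered ahead of the consumer.
    """
    pages = queue.Queue(maxsize=depth)
    done = object()  # sentinel marking the end of the stream

    def worker():
        try:
            for offset in range(0, total, page_size):
                page = fetch(offset, min(page_size, total - offset))
                # Checksumming happens here, off the consumer's thread.
                pages.put((page, zlib.crc32(page)))
        finally:
            pages.put(done)

    # Daemon thread: if the consumer stops early, the worker may stay
    # blocked on put() until the process exits. Fine for a sketch only.
    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = pages.get()
        if item is done:
            return
        yield item
```

A real implementation would also need to propagate fetch errors to the consumer rather than simply ending the stream.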
Going forward – modular reader
Going forward - Applications
Could improve HBase: solves file handle problem and improves latency
Could be used to create low-latency lookup formats accessible from scripting languages
Cache is automatic, simplifying development
“Table” directory with main store file and several secondary index files generated by ETL
Lucene indices? Can be built with MapReduce
Going forward - Development
Copy existing HDFS method of interleaving checksums directly from datanode: one read
Audit checksumming code for CPU efficiency: reading can be CPU bound
Implement as a ByteService instead of clumsy wrapper around FileSystem. Make configurable
Implement PipeliningByteService to improve streaming by pre-fetching pages
Exchange UserAccessToken at each read, could possibly use for encryption of blockid
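To make the "one read" point about interleaved checksums concrete, the sketch below frames data as repeated [checksum][chunk] pairs, so a single response carries both and the client verifies as it unpacks. This is an illustration of the general idea only, not the HDFS wire format. The chunk size, CRC32, and function names are assumptions.

```python
import struct
import zlib

CHUNK = 512  # bytes covered by each checksum; an assumed value for this sketch


def interleave(data: bytes) -> bytes:
    """Server side: emit [crc32][chunk] pairs so one response carries both."""
    out = bytearray()
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        out += struct.pack(">I", zlib.crc32(chunk)) + chunk
    return bytes(out)


def verify(framed: bytes) -> bytes:
    """Client side: check each chunk against its checksum, return the data."""
    out = bytearray()
    pos = 0
    while pos < len(framed):
        (expected,) = struct.unpack_from(">I", framed, pos)
        chunk = framed[pos + 4:pos + 4 + CHUNK]  # last chunk may be short
        if zlib.crc32(chunk) != expected:
            raise IOError(f"checksum mismatch at framed offset {pos}")
        out += chunk
        pos += 4 + len(chunk)
    return bytes(out)
```

Compared with wrapping a checksum filesystem around the reader, this avoids the second read of a separate checksum file for every random read.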
Contribute!
Patch is at Apache JIRA issue HDFS-516
Will be on GitHub momentarily
Goals: Equivalent streaming performance to DFS
Faster random read, caching option
Lower resource consumption on server
3 doable tasks above
Large configuration space to explore
Email me: [email protected]