October 2013 HUG: HBase 0.96
0.96.0
Bay Area Hadoop User Group, October 16th, 2013
Michael Stack
• 0.96.0 Release Manager
• Chair of Apache HBase PMC*
• Apache Hadoop PMC
• Engineer at Cloudera in San Francisco
* Project Management Committee
HBase?
"...scalable, distributed datastore."
"...open source, distributed, scalable, consistent, low latency, random access non-relational database..."
Inspiration
A Google Technology described in a 2006 paper, by Chang et al.?
● Apache Top-level Project
  ○ hbase.apache.org
● Up out of Apache Hadoop contrib
● Project goal: “Billions of rows X millions of columns on clusters of ‘commodity hardware’”
● HBase persists all data to HDFS
● Uses Apache ZooKeeper
  ○ Cluster coordination
When would I use it?
BIG DATA
Random read/writes
SCALING!
Who uses it?
Who runs the project?
Diverse team*
* http://hbase.apache.org/team-list.html
COMMITTERS!
Preferably ALIVE!
• Release every month
• Each more stable
• & more performant
• Some features…
• Wire compatible between releases
• Currently at 0.94.12
http://www.flickr.com/photos/sysli/3026288256/sizes/o/in/photostream/
(Self-)Migration
Downstreamers
● Minimal API disturbance
  – None?
  – Last-minute feedback
● Hive, Sqoop, OpenTSDB
● Deprecations
Stats
● >2k issues fixed
  – >1500 in 0.96.x only
● Currently 6th Release Candidate
● Branched 7 months ago
● 18 months in the making
Requirements
● Hadoop 1.0.3+
● Hadoop 2.1.0-beta+
● Must choose one
Big Themes
● Stability
● Operability
  – Insight, tools
● Scalability
● Evolvability
http://www.flickr.com/photos/allspaw/5815258929/sizes/o/in/photostream/
http://www.flickr.com/photos/38595542@N02/3690830720/sizes/o/in/photostream/
• Dedicated meta WAL
• Don't put WAL replicas on local node
  – 33% of reads have to time out
• Lowered ZK timeout
  – 30s instead of 180s
• Watcher script kills znode
  – Detection time approaches 0
• Faster assignment
HBase
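The "33% of reads" figure is consistent with simple replica arithmetic. A minimal sketch, assuming the HDFS default replication factor of 3, exactly one WAL replica on the failed node, and reads spread evenly across replicas. The function name is illustrative and is not an HBase or HDFS API.

```python
from fractions import Fraction


def fraction_of_reads_hitting_dead_node(replication: int = 3,
                                        replicas_on_dead_node: int = 1) -> Fraction:
    """Share of WAL reads directed at a dead node, assuming uniform replica choice."""
    if replication <= 0 or not 0 <= replicas_on_dead_node <= replication:
        raise ValueError("need 0 <= replicas_on_dead_node <= replication, replication > 0")
    return Fraction(replicas_on_dead_node, replication)


# One replica on the failed (local) node: a third of reads must wait for a timeout.
with_local_replica = fraction_of_reads_hitting_dead_node(3, 1)
# No replica on the failed node: no read is sent to it.
without_local_replica = fraction_of_reads_hitting_dead_node(3, 0)

print(f"{float(with_local_replica):.0%} vs {float(without_local_replica):.0%}")
```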
• HDFS-4721 Speed up lease/block recovery when DN fails and a block goes into recovery
  – Do not recover on STALE DNs
• HDFS-3703 Decrease the datanode failure detection time
  – Avoid reading STALE DNs
• HDFS-3912 Detecting and avoiding stale datanodes for writing
HDFS
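The stale-datanode idea in these issues can be sketched as: treat a datanode as stale once its last heartbeat is older than some interval, try stale replicas last for reads, and skip them for writes. This is a toy model, not HDFS code. The `DataNode` class, the function names, and the 30-second threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass

STALE_INTERVAL_S = 30.0  # assumed threshold, for illustration only


@dataclass(frozen=True)
class DataNode:
    name: str
    last_heartbeat: float  # seconds, on the same clock as `now`


def is_stale(dn: DataNode, now: float, stale_interval: float = STALE_INTERVAL_S) -> bool:
    """A node is stale when it has not sent a heartbeat within the interval."""
    return now - dn.last_heartbeat > stale_interval


def order_for_read(replicas, now, stale_interval=STALE_INTERVAL_S):
    """Keep the original order, but move stale replicas to the end of the list."""
    # sorted() is stable and False sorts before True, so fresh nodes come first.
    return sorted(replicas, key=lambda dn: is_stale(dn, now, stale_interval))


def candidates_for_write(nodes, now, stale_interval=STALE_INTERVAL_S):
    """Drop stale nodes when choosing targets for a new block."""
    return [dn for dn in nodes if not is_stale(dn, now, stale_interval)]


replicas = [DataNode("dn1", 0.0), DataNode("dn2", 95.0), DataNode("dn3", 99.0)]
print([dn.name for dn in order_for_read(replicas, now=100.0)])
```

A reader then reaches a fresh replica first instead of waiting for a timeout on a node that has probably failed.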
● Faster WAL replay/Distributed WAL Replay
  – No intermediate files
● No wait on NN
  – Committed
● Experimental
● Regions online immediately for Writes
  – Read older consistent view
● “Favored Nodes”
Coming...
One rationale for pb: http://goo.gl/N0HO6n
• System tables
• Filesystem
• Up in ZooKeeper
• Over the wire
RPC
• Implements Protobuf Service
  ● Specification!
• Data on the side
  o Encoding
  o Compression
PB DATA
Scalability
• e.g. Replicating 1k to 1k & heading north
• HBASE-8778 Region assignments scan table directory making them slow for huge tables
• HBASE-9208 ReplicationLogCleaner slow at large scale
• HBASE-8877 Reentrant row locks
Snapshots
• By Table
  o Snapshot, clone, restore, export
• Inexpensive
  o Just metadata
• Good for...
  o Backups
  o Replication
  o Offline processing
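Why "just metadata" makes snapshots inexpensive can be illustrated with a toy model: if store files are immutable, a snapshot only needs to record which files make up the table at that moment, and clone and restore reuse those references without copying data. The classes and functions below are hypothetical and only mimic the idea. They are not the HBase snapshot API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    name: str
    table: str
    files: tuple  # references to immutable store files, not copies of their data


class ToyTable:
    def __init__(self, name, files=()):
        self.name = name
        self.files = list(files)

    def flush(self, filename):
        """Pretend a memstore flush wrote a new immutable store file."""
        self.files.append(filename)


def take_snapshot(table, name):
    # Cost is proportional to the number of files, not to the amount of data.
    return Snapshot(name, table.name, tuple(table.files))


def clone_snapshot(snapshot, new_table_name):
    # The new table points at the same files; nothing is rewritten.
    return ToyTable(new_table_name, snapshot.files)


def restore_snapshot(table, snapshot):
    # Roll the table's file list back to what the snapshot recorded.
    table.files = list(snapshot.files)


t = ToyTable("t1", ["hfile-a"])
snap = take_snapshot(t, "s1")
t.flush("hfile-b")               # later writes do not change the snapshot
clone = clone_snapshot(snap, "t2")
restore_snapshot(t, snap)
```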
Integration Tests
• Cluster test module
  o Standalone or cluster
  o Sizeable
    x data x runtime
• "Borrows" test types from all over
  o Netflix "ChaosMonkey"
  o Apache Accumulo linked-list dataloss checker
hbase-it/src/test/java/org/apache/hadoop/hbase/mapreduce/IntegrationTestBulkLoad.java
hbase-it/src/test/java/org/apache/hadoop/hbase/mapreduce/IntegrationTestImportTsv.java
hbase-it/src/test/java/org/apache/hadoop/hbase/mttr/IntegrationTestMTTR.java
hbase-it/src/test/java/org/apache/hadoop/hbase/test/IntegrationTestBigLinkedList.java
hbase-it/src/test/java/org/apache/hadoop/hbase/test/IntegrationTestLoadAndVerify.java
hbase-it/src/test/java/org/apache/hadoop/hbase/trace/IntegrationTestSendTraceRequests.java
StochasticLoadBalancer
• Region Count
• Locality
• Movement Cost
• Table Count
• Regions/Table/RegionServer
• Read/Write Counts
• Memstore Size
• Storefile Size
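A stochastic balancer of this kind can be sketched as a weighted sum of cost terms plus a random search that keeps only moves that lower the total. The sketch below models just two of the listed factors (region count and movement cost). The weights, the function names, and the simple hill-climbing loop are assumptions made for illustration, not HBase's actual implementation or defaults.

```python
import random


def region_count_skew(assignment):
    """0.0 when servers hold equal counts, 1.0 when one server holds everything."""
    counts = [len(regions) for regions in assignment.values()]
    total = sum(counts)
    return (max(counts) - min(counts)) / total if total else 0.0


def moved_fraction(assignment, initial):
    """Share of regions no longer on the server they started on."""
    home = {r: s for s, regions in initial.items() for r in regions}
    if not home:
        return 0.0
    moved = sum(1 for s, regions in assignment.items() for r in regions if home[r] != s)
    return moved / len(home)


def total_cost(assignment, initial, w_skew=1.0, w_move=0.1):
    # Illustrative weights: balance matters more than avoiding moves.
    return (w_skew * region_count_skew(assignment)
            + w_move * moved_fraction(assignment, initial))


def balance(initial, steps=500, seed=0):
    rng = random.Random(seed)
    current = {s: list(regions) for s, regions in initial.items()}
    servers = sorted(current)
    if len(servers) < 2:
        return current
    best = total_cost(current, initial)
    for _ in range(steps):
        src = rng.choice(servers)
        if not current[src]:
            continue
        dst = rng.choice([s for s in servers if s != src])
        region = current[src].pop(rng.randrange(len(current[src])))
        current[dst].append(region)
        candidate = total_cost(current, initial)
        if candidate < best:
            best = candidate              # keep the move
        else:
            current[dst].pop()            # undo the move
            current[src].append(region)
    return current


start = {"rs1": ["r1", "r2", "r3", "r4", "r5", "r6"], "rs2": [], "rs3": []}
plan = balance(start)
print({server: len(regions) for server, regions in plan.items()})
```

Further factors such as locality or memstore size would be added as more normalized cost terms with their own weights.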
Tracing
• Review HDFS-5274 Add Tracing to HDFS!
Namespaces
• Grouping of tables
  – Like database in MySQL
• System/User
  – hbase:meta
• Quota
• Coming
  – Security by ns
  – Grouping on cluster by ns
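The `hbase:meta` name shows the naming scheme: a namespace, a colon, then the table qualifier. A minimal sketch of splitting such names, assuming that unqualified tables fall into a namespace called `default` and that system tables live in `hbase`. The helper functions are illustrative and are not part of the HBase client API.

```python
DEFAULT_NAMESPACE = "default"  # assumed home of tables written without a namespace
SYSTEM_NAMESPACE = "hbase"     # namespace of system tables such as hbase:meta


def split_table_name(full_name: str):
    """Split 'namespace:qualifier' into its parts; bare names get the default namespace."""
    namespace, sep, qualifier = full_name.partition(":")
    if not sep:
        return DEFAULT_NAMESPACE, full_name
    if not namespace or not qualifier:
        raise ValueError(f"malformed table name: {full_name!r}")
    return namespace, qualifier


def is_system_table(full_name: str) -> bool:
    return split_table_name(full_name)[0] == SYSTEM_NAMESPACE


print(split_table_name("hbase:meta"))
print(split_table_name("users"))
```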
Metrics2
● Radical revamp
● Module of Interfaces
  – H1 and H2 Impls modules
● Categories/Naming/Patterns
API
● Client/Dev
● Hadoop Annotations
  – Stable/Evolving/Private
● Cell Interface
  – KeyValue deprecated
Miscellaneous
• X-Row (in-region) Transactions
• Hardened Assignment
• Hardened Replication
• New UI
• Online Merge
• Finer grained ACLs
• More Coprocessor hooks
More Misc.
• Maven modularized
• Client-side Types
• Revamped defaults
• Compactions
  o Pluggable
  o Smarter triggers
• Windows!
0.96.1, 0.96.2, etc.
● Bug fixes
● Performance fixes
● ONLY!
● No features!
• Right after 0.96.0
  – Month or two
• Rolling upgrade from 0.96.0
• In-line Cell-tags
• Quota/Groupings
• Reverse Scan
1.0.0?
Thank you