Supporting HBase: How to Stabilize, Diagnose and Repair
Jeff Bean, Jonathan Hsieh, Kathleen Ting
{jwfbean,jon,kathleen}@cloudera.com
5/22/12
HBaseCon 2012. 5/22/12. Copyright 2012 Cloudera Inc. All rights reserved.
Who Are We?
• Jeff Bean
  • Designated Support Engineer, Cloudera
  • Education Program Lead, Cloudera
• Kathleen Ting
  • Support Manager, Cloudera
  • ZooKeeper Subject Matter Expert
• Jonathan Hsieh
  • Software Engineer, Cloudera
  • Apache HBase Committer and PMC member
Outline
• Preventative HBase Medicine:
  • Tips for a healthy HBase
• The HBase Triage:
  • Fixes for acute HBase pains
• The HBase Surgery:
  • Repairing a Corrupted HBase
“Monitor your system, exercise your workload, and eat your vegetables.”
HBase Cross-Section
Disk / Network
JVM / Linux
ZooKeeper HDFS
HBase
App MR
Doctor’s Advice: “An ounce of prevention is worth a pound of cure.”
• Understand your workload and test for it
• Size your cluster properly (see Cluster Sizer)
• Monitor, alert, and manage your cluster with Ganglia, Nagios, and/or Cloudera Manager
• Don’t be Dr. House!
A Case Study
Symptom: Long Running MapReduce job with blacklisted TaskTrackers
TaskTracker No. of Failures
NodeX 4
NodeY 3
NodeQ 7
NodeB 10
NodeP 8
NodeV 6
Symptom: Node B Task Logs
$ find . | xargs grep "giving up"
./attempt_201107261334_0221_m_000962_1/syslog:2011-08-02 11:09:34,248 INFO org.apache.hadoop.ipc.HbaseRPC: Server at NodeA:60020 could not be reached after 1 tries, giving up.
./attempt_201107261334_0221_m_000962_1/syslog:2011-08-02 11:09:37,328 INFO org.apache.hadoop.ipc.HbaseRPC: Server at NodeA:60020 could not be reached after 1 tries, giving up.
./attempt_201107261334_0221_m_000962_1/syslog:2011-08-02 11:09:40,465 INFO org.apache.hadoop.ipc.HbaseRPC: Server at NodeA:60020 could not be reached after 1 tries, giving up.
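When many task logs are involved, it can help to tally which server the clients gave up on. The following is a minimal sketch, not part of the original talk: it assumes the message format matches the lines above, and the helper name `count_unreachable` is made up for illustration.

```python
import re
from collections import Counter

# Matches the HBase RPC "giving up" message as it appears in the task logs above.
GIVING_UP = re.compile(r"Server at (\S+) could not be reached after \d+ tries, giving up")

def count_unreachable(lines):
    """Return a Counter mapping 'host:port' to the number of 'giving up' messages."""
    counts = Counter()
    for line in lines:
        match = GIVING_UP.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Sample input copied from the grep output above.
sample = [
    "./attempt_201107261334_0221_m_000962_1/syslog:2011-08-02 11:09:34,248 INFO org.apache.hadoop.ipc.HbaseRPC: Server at NodeA:60020 could not be reached after 1 tries, giving up.",
    "./attempt_201107261334_0221_m_000962_1/syslog:2011-08-02 11:09:37,328 INFO org.apache.hadoop.ipc.HbaseRPC: Server at NodeA:60020 could not be reached after 1 tries, giving up.",
    "./attempt_201107261334_0221_m_000962_1/syslog:2011-08-02 11:09:40,465 INFO org.apache.hadoop.ipc.HbaseRPC: Server at NodeA:60020 could not be reached after 1 tries, giving up.",
]

if __name__ == "__main__":
    for server, hits in count_unreachable(sample).most_common():
        print(server, hits)
```

A single server dominating the tally points the investigation at that node rather than at the node where the tasks failed.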
Symptom: RegionServer logs of Node A:
2011-08-02 11:04:20,324 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=NodeA,60020,1312228900706, load=(requests=10847, regions=342, usedHeap=8193, maxHeap=15350): regionserver:60020-0x4316487a73e1626 regionserver:60020-0x4316487a73e1626 received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
Cascading failure! Some other node says ouch…
2011-08-01 12:55:39,356 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=Thread[Thread-15,5,main]
2011-08-01 12:55:39,629 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Shutdown hook
2011-08-01 12:55:39,629 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs shutdown hook thread.
2011-08-01 12:55:39,695 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /hbase/.logsNodeA,60020,1311651881177NodeA%3A60020.1311656326143 : java.io.IOException: Error Recovery for block blk_1102151039331207284_16350929 failed because recovery from primary datanode NodeA:50010 failed 6 times. Pipeline was NodeA:50010. Aborting...
java.io.IOException: Error Recovery for block blk_1102151039331207284_16350929 failed because recovery from primary datanode NodeA:50010 failed 6 times. Pipeline was NodeA:50010. Aborting...
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2841)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2305)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2477)
2011-08-01 12:55:39,842 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook finished.
Symptom: Ganglia Memory Graph on Node A…
Symptom: Ganglia swap_free on Node A…
A Case Study: Radiant Pain
“I was having back pains, and it turned out to be my heart!”
Node A Under Load
• Too many MR slots
• MR slots too large
• Too many non-HBase small files (HDFS-2379)
Node A swaps
• “Arbitrary” processes pause or become unresponsive
Node B can’t connect to node A
• MapReduce tasks fail
• HDFS datanode operations time out
• HBase client operations fail
Masters Take Action
• JobTracker blacklists TT on node B
• Jobs fail or run slow
• NameNode re-replicates blocks from node A
Event Trail and Evidence Trail
Event trail:
• Node A condition (load)
• Node A event (swap)
• Node B symptom (connect)
• Master Action (blacklist)
Evidence trail:
• Node A Monitoring
• Transient swap not logged!
• Node B Logs
• Master Logs and UIs
!!!?!?
DOs and DON’Ts for keeping HBase Healthy
DOs
• Monitor and Alert
• Optimize network
• Know your logs
DON’Ts
• Swap
• Oversubscribe MR
• Share the network
“Cloudera 911 here, how can we help?”
HBase Support Tickets
[Pie chart] Slices: 44%, 12%, 16%, 28%
Legend: HBase, ZK, MR, HDFS Misconfig; Patch Required; Fix HW/NW; Repair Needed
Understanding the logs helps us diagnose issues
• Related events logged by different processes in different places
• Log messages point at each other
• HDFS accesses by RS logged by NN and DN
• HBase accesses by MR logged by JT, RS, NN, ZK
• ZK logs indicate HBase health
The HBase Triage: Fixes for acute HBase pains
• Severe Pain
• Complete Unconsciousness
Connection Reset
WARN - Session <id> for server <server id>, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
What causes this?
• Running out of ZK connections
How can it be resolved?
• Manually close connections
• Fixed in HBASE-5466 and HBASE-4773
Running out of DN Threads & File Descriptors
INFO hdfs.DFSClient: Could not obtain block <blk id> from any node: java.io.IOException: No live nodes contain current block.
ERROR java.io.IOException: Too many open files
What causes this?
• HBase likes to keep data files open
How can it be resolved?
• Increase dfs.datanode.max.xcievers to 4096
• Increase the open-file limit in /etc/security/limits.conf
  • hbase - nofile 32768
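To confirm that a raised limit actually took effect, one option is to read the limit from inside a process started by the relevant user. This is a minimal sketch using Python's standard `resource` module (Unix only). The helper name `meets_nofile_target` is hypothetical, and the check only reflects the process it runs in, so it would need to be run as the `hbase` user to say anything about HBase.

```python
import resource

TARGET_NOFILE = 32768  # value from the limits.conf line above

def meets_nofile_target(soft_limit, target=TARGET_NOFILE):
    """Return True if a soft open-file limit is at least the target."""
    if soft_limit == resource.RLIM_INFINITY:
        return True  # unlimited always satisfies the target
    return soft_limit >= target

if __name__ == "__main__":
    # getrlimit returns the (soft, hard) limits for the current process.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"soft={soft} hard={hard} ok={meets_nofile_target(soft)}")
```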
“Long Garbage Collecting Pause”
WARN org.apache.hadoop.hbase.util.Sleeper: We slept 19118ms instead of 1000ms, this is likely due to a long garbage collecting pause and it's usually bad
How can it be resolved?
• Raise the session timeout on both sides:
  • zoo.cfg: maxSessionTimeout=180000
  • hbase-site.xml: zookeeper.session.timeout=180000
• The node may be oversubscribed if MR & HBase are co-located
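The Sleeper warning carries both the actual and the intended sleep time, so the size of the stall can be compared against the session timeout. The sketch below is illustrative only: the function names are made up, and treating "stall at least as long as the session timeout" as the danger threshold is a rough heuristic, not a rule stated in the talk.

```python
import re

# Matches the Sleeper warning shown above.
SLEPT = re.compile(r"We slept (\d+)ms instead of (\d+)ms")

def parse_sleeper(line):
    """Return (actual_ms, expected_ms) from a Sleeper warning, or None if absent."""
    match = SLEPT.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

def pause_exceeds_timeout(line, session_timeout_ms):
    """Heuristic: True if the unplanned stall is at least the session timeout."""
    parsed = parse_sleeper(line)
    if parsed is None:
        return False
    actual_ms, expected_ms = parsed
    return actual_ms - expected_ms >= session_timeout_ms

# Sample line copied from the warning above.
sample = ("WARN org.apache.hadoop.hbase.util.Sleeper: We slept 19118ms instead of "
          "1000ms, this is likely due to a long garbage collecting pause and it's usually bad")

if __name__ == "__main__":
    print(parse_sleeper(sample))
    print(pause_exceeds_timeout(sample, 180000))
```

With the 180000 ms timeout from the slide, the roughly 18-second stall in the sample stays well inside the session timeout.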
Heap Allocation Per Node
  (Map slots + Reduce slots) x Child heap
+ DN heap
+ TT heap
+ RS heap
+ OS (20% of RAM)
≤ Total RAM
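The budget above is simple arithmetic, and a small calculator makes oversubscription easy to spot. This is a sketch under stated assumptions: the function name, the parameter names, and every number in the example are hypothetical; only the formula and the 20% OS reserve come from the slide.

```python
def heap_budget_mb(total_ram_mb, map_slots, reduce_slots, child_heap_mb,
                   dn_heap_mb, tt_heap_mb, rs_heap_mb, os_percent=20):
    """Return (committed_mb, headroom_mb) for one node.

    committed = (map + reduce slots) * child heap + DN + TT + RS + OS reserve.
    A negative headroom means the node is oversubscribed and may swap.
    """
    task_heap = (map_slots + reduce_slots) * child_heap_mb
    os_reserve = total_ram_mb * os_percent // 100  # integer MB
    committed = task_heap + dn_heap_mb + tt_heap_mb + rs_heap_mb + os_reserve
    return committed, total_ram_mb - committed

if __name__ == "__main__":
    # Hypothetical 48 GB node: 8 map + 4 reduce slots at 1 GB each,
    # 1 GB DataNode, 1 GB TaskTracker, 16 GB RegionServer.
    print(heap_budget_mb(49152, 8, 4, 1024, 1024, 1024, 16384))
    # Same node with 16 map + 8 reduce slots at 2 GB each: oversubscribed.
    print(heap_budget_mb(49152, 16, 8, 2048, 1024, 1024, 16384))
```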
ZK can’t start & HBase hangs
INFO org.apache.hadoop.hdfs.DFSClient: Could not complete file <name> retrying…
What causes this?
• A high dfs.replication.min causes HBase to hang: a file can’t be closed until all of its replicas have been created
How can it be resolved?
• Remove dfs.replication.min
• Temporarily increase dfs.balance.bandwidthPerSec
• Fixed in HDFS-2936
Unable to Load Database
FATAL org.apache.zookeeper.server.quorum.QuorumPeer: Unable to load database on disk
What causes this?
• ZK data directories filled up
How can it be resolved?
• Wipe out /var/zookeeper/version-2
• Run zkCleanup.sh script via cron
Downed HBase Master and RegionServers
WARN org.apache.zookeeper.server.quorum.Learner: Exception when following the leader java.net.SocketTimeoutException: Read timed out
What causes this?
• Session timeout + session expiration = network problem
How can it be resolved?
• Monitor the network (e.g. ifconfig)
• Run ≥ 3 ZK servers (majority rules)
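"Majority rules" can be made concrete: a ZooKeeper ensemble stays available only while a strict majority of its servers is up. The helper names below are made up for illustration.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1

def tolerated_failures(ensemble_size):
    """How many servers can be lost while a majority remains."""
    return ensemble_size - quorum_size(ensemble_size)

if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(n, quorum_size(n), tolerated_failures(n))
```

One or two servers tolerate no failures, three tolerate one, and a fourth adds nothing over three, which is why odd ensemble sizes of three or more are the usual choice.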
“To the operating room, please”
• HBase refuses to start
• HBase’s hbck reports inconsistencies
Detecting internal problems with hbck
• HBase since 0.90 has included a tool for scanning an HBase instance’s internals to find corruptions.
hbase hbck
hbase hbck -details
Tables are sharded into regions
[Figure: a table's rows (0000000000 through 7777777777) split into three regions: [‘’, A), [A, B), [B, ‘’)]
Invariants: Maintain table integrity and region consistency!
Table Integrity Invariants
• Every key shall get assigned to a single region.
• Table regions shall cover the entire range of possible keys:
  • from the absolute start (‘’)
  • to the absolute end (unfortunately, also ‘’).
Example region chain:
[‘’, A)
[A, B)
[B, C)
[C, D)
[D, E)
[E, F)
[F, G)
[G, ‘’)
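The table integrity invariants can be checked mechanically on a list of region boundaries. The following is a minimal sketch, not how hbck itself is implemented: it models each region as a `(start_key, end_key)` pair of Python strings with `''` standing for the absolute start or end, and it assumes plain string ordering is a good enough stand-in for HBase's byte-wise key ordering (true for ASCII keys like the ones on the slide).

```python
def _end_after(a, b):
    """True if end key a lies beyond end key b ('' means the absolute end)."""
    if b == "":
        return False
    return a == "" or a > b

def check_region_chain(regions):
    """Return a list of ('hole' | 'overlap', key1, key2) problems.

    For a hole, key1..key2 is the uncovered key range.
    For an overlap, key1 is the start of a region that begins before the
    keys already covered up to key2.
    """
    ordered = sorted(regions)
    if not ordered:
        return [("hole", "", "")]
    problems = []
    if ordered[0][0] != "":
        problems.append(("hole", "", ordered[0][0]))
    covered_to = ordered[0][1]  # '' here means covered to the absolute end
    for start, end in ordered[1:]:
        if covered_to != "" and start > covered_to:
            problems.append(("hole", covered_to, start))
        elif covered_to == "" or start < covered_to:
            problems.append(("overlap", start, covered_to))
        if _end_after(end, covered_to):
            covered_to = end
    if covered_to != "":
        problems.append(("hole", covered_to, ""))
    return problems

# The healthy chain from the slide.
GOOD = [("", "A"), ("A", "B"), ("B", "C"), ("C", "D"),
        ("D", "E"), ("E", "F"), ("F", "G"), ("G", "")]

if __name__ == "__main__":
    print(check_region_chain(GOOD))
    # Drop [C, D) to create a hole.
    print(check_region_chain([r for r in GOOD if r != ("C", "D")]))
    # Add [B, D) to create an overlap.
    print(check_region_chain(GOOD + [("B", "D")]))
```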
Region Consistency Invariants
[Figure: Venn diagram of three sets: “.regioninfo in HDFS”, “Assigned on region server”, and “info:regioninfo in META”. Labels: “Region Consistent”, “Orphans”.]
Repairing internal problems with hbck
• Newer and upcoming versions of HBase include an hbck that can fix internal problems as well as detect them:
  • 0.90.7
  • 0.92.2
  • 0.94.0
  • CDH3u4+
  • CDH4b2+
Looks like you’ve broken an invariant
Bad region assignment
hbck -fix (0.90.x)
hbck -fixAssignments (0.90.7+, 0.92.2+, 0.94+)
Region not in META
hbck -fixAssignments -fixMeta
Regioninfo not in HDFS
hbck -fixAssignments -fixMeta
Table Regions must not have holes
• Where do I put row key “CRUD”?
• Where is region [C, D)?
• Repair:
  • Find the orphan and adopt it.
  • Fabricate a new region to fill the hole
Region chain with a hole:
[‘’, A)
[A, B)
[B, C)
?
[D, E)
[E, F)
[F, G)
[G, ‘’)
# NOTE! HBase should be idle (no get/put/split/compacts)
hbck -fixHdfsHoles -fixHdfsOrphans -fixAssignments -fixMeta
Table Regions must not overlap
• Hmm... Which region should “BAD” go in?
• Is it [B, D) or is it [B, C)?
• Likely due to a bad split.
• Repair:
  • Merge regions, or
  • Sideline and bulk load
Region chain with an overlap:
[‘’, A)
[A, B)
[B, D) and [B, C) ??
[C, D)
[D, E)
[E, F)
[F, G)
[G, ‘’)
# NOTE! HBase should be idle (no get/put/split/compacts)
hbck -fixHdfsOverlaps -fixAssignments -fixMeta
Consistency problem summary
hbck -fixAssignments -fixMeta -fixHdfsHoles -fixHdfsOrphans -fixHdfsOverlaps
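As a memory aid, the three region consistency cases from the preceding slides can be written down as a lookup from "where does the region show up" to the hbck options each slide lists. This is only an illustrative sketch: the function name and the three boolean inputs are hypothetical, the mapping is read off the slide titles rather than from hbck's own logic, and combinations the slides do not cover return `None`.

```python
def suggest_hbck_options(in_hdfs, in_meta, assigned):
    """Map a region's state to the hbck options named on the slides.

    in_hdfs:  a .regioninfo file exists in HDFS
    in_meta:  an info:regioninfo entry exists in META
    assigned: the region is assigned on a region server
    Returns a list of options, [] if consistent, or None if not covered.
    """
    if in_hdfs and in_meta and assigned:
        return []  # present everywhere: consistent
    if in_hdfs and in_meta and not assigned:
        return ["-fixAssignments"]  # "Bad region assignment"
    if in_hdfs and not in_meta:
        return ["-fixAssignments", "-fixMeta"]  # "Region not in META"
    if in_meta and not in_hdfs:
        return ["-fixAssignments", "-fixMeta"]  # "Regioninfo not in HDFS"
    return None  # state not described on the slides

if __name__ == "__main__":
    print(suggest_hbck_options(True, True, False))
    print(suggest_hbck_options(True, False, True))
```

In practice `hbase hbck -details` reports what is inconsistent; the sketch only restates which repair options the talk pairs with each case.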
Investigating further
• HFile: examine contents of HFiles
• HLog: examine contents of an HLog file
• OfflineMetaRepair: rebuild the meta table from the file system
• Also, some scripts for manual repairs: https://github.com/jmhsieh/hbase-repair-scripts
Questions?