HBase Status Report - Hadoop Summit Europe 2014


Transcript of HBase Status Report - Hadoop Summit Europe 2014

Page 1: HBase Status Report - Hadoop Summit Europe 2014

HBase 0.96+: A Report on the Current Status

Lars George | EMEA Chief Architect

Page 2: HBase Status Report - Hadoop Summit Europe 2014

About Me

• EMEA Chief Architect @ Cloudera
  • Consulting on Hadoop projects (everywhere)
• Apache Committer
  • HBase and Whirr
• O'Reilly Author
  • HBase: The Definitive Guide
  • Now in Japanese! (The Japanese edition is out, too!)
• Contact
  • [email protected]
  • @larsgeorge

Page 3: HBase Status Report - Hadoop Summit Europe 2014

The Content...

• Version History
• Overview of new Features
• Summary

Page 4: HBase Status Report - Hadoop Summit Europe 2014


Version History

A Timeline Overview

Page 5: HBase Status Report - Hadoop Summit Europe 2014

HBase Releases

URL: http://s.apache.org/hbase-releases

Page 6: HBase Status Report - Hadoop Summit Europe 2014

HBase Releases – Issues Closed (JIRA)

URL: http://s.apache.org/hbase-releases

Page 7: HBase Status Report - Hadoop Summit Europe 2014

HBase Releases – Issues Closed (Distinct)

URL: http://s.apache.org/hbase-releases

Page 8: HBase Status Report - Hadoop Summit Europe 2014

HBase Book?

I targeted 0.92.0 but…

r1130336 | stack | 2011-06-02 00:52:45 +0200 (Thu, 02 Jun 2011) | 1 line
Add link to meet up
...
r1234894 | stack | 2012-01-23 17:50:43 +0100 (Mon, 23 Jan 2012) | 1 line
Move version on past 0.92.0 to 0.92.1-SNAPSHOT

$ svn log -r 1130336:1234894 | grep "^r" | wc -l
807

Page 9: HBase Status Report - Hadoop Summit Europe 2014

HBase Book?

I am trailing 0.92.0 by 800+ commits, including for example

r1153634 | tedyu | 2011-08-03 21:59:48 +0200 (Wed, 03 Aug 2011) | 2 lines
HBASE-3857 Change the HFile Format (Mikhail & Liyin)

…which is not "unimportant". :)

I am working on an update!

Page 10: HBase Status Report - Hadoop Summit Europe 2014

Coprocessors and more…

HBase 0.92

Page 11: HBase Status Report - Hadoop Summit Europe 2014

HBase 0.92 - Highlights

• 682 issues addressed
• 811 issues total in 0.92.x line
• New logo! (HBASE-4312)
• HFile v2 (HBASE-3857)
• Distributed Log Splitting (HBASE-1364)
• Enhanced Master UI
  • Major compaction progress (HBASE-3900)
  • Regions in transition (HBASE-4291)
  • Tasks (HBASE-3839)
• Slow Query Metrics (HBASE-4117)

Page 12: HBase Status Report - Hadoop Summit Europe 2014

HBase 0.92 - Highlights

• Coprocessors (HBASE-2000)
• Off-heap cache (HBASE-4027)
• Online Table Schema Change (HBASE-1730)
• Region Size from 256MB to 1GB (HBASE-4374)
• Hadoop 1 Support (HBASE-5125)
• Snappy Support (HBASE-3691)
• Keep last version with TTL (HBASE-4071)
• Multithreaded Compactions (HBASE-4572)

Page 13: HBase Status Report - Hadoop Summit Europe 2014

HFile v1 – HBase 0.90

• Previously the file layout was data blocks, meta blocks, and then file metadata such as indexes.
• Each data block held a magic header followed by the actual data, stored sequentially.

Page 14: HBase Status Report - Hadoop Summit Europe 2014

HFile v2 – HBase 0.92+

The second version of HFile splits the indexes and Bloom filters up into a hierarchy and interleaves those with data blocks.

The data block header now holds additional info on the block itself.

Source: http://blog.cloudera.com/blog/2012/06/hbase-io-hfile-input-output/

Page 15: HBase Status Report - Hadoop Summit Europe 2014

Coprocessors: Observers

Page 16: HBase Status Report - Hadoop Summit Europe 2014

Coprocessors: RPC Calls

Page 17: HBase Status Report - Hadoop Summit Europe 2014

Slab Cache – Off-heap Block Cache

Source: http://blog.cloudera.com/blog/2012/01/caching-in-hbase-slabcache/

• The off-heap cache uses Java NIO's direct ByteBuffer structures
• Uses its own slab allocation handling
• Does copy-on-read during access of data
• Acts as an L2 cache and replaces the OS buffer cache
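
The two ideas on this slide, slab allocation and copy-on-read, can be illustrated with a small sketch. This is not HBase's SlabCache implementation: the class name, the single fixed slot size, and the `bytearray` standing in for a direct ByteBuffer are all assumptions made for illustration.

```python
class ToySlabCache:
    """Hypothetical slab-allocated block cache with copy-on-read."""

    def __init__(self, slot_size, slot_count):
        self.slot_size = slot_size
        # One contiguous buffer stands in for an off-heap direct ByteBuffer.
        self.slab = bytearray(slot_size * slot_count)
        self.free_slots = list(range(slot_count))
        self.index = {}  # key -> (slot number, block length)

    def evict(self, key):
        # Return the slot to the free list; the bytes are simply overwritten later.
        entry = self.index.pop(key, None)
        if entry is not None:
            self.free_slots.append(entry[0])

    def put(self, key, block):
        if len(block) > self.slot_size:
            return False  # block does not fit into a slot
        self.evict(key)  # drop any previous copy of this key
        if not self.free_slots:
            return False  # cache is full; a real cache would evict something
        slot = self.free_slots.pop()
        start = slot * self.slot_size
        self.slab[start:start + len(block)] = block
        self.index[key] = (slot, len(block))
        return True

    def get(self, key):
        entry = self.index.get(key)
        if entry is None:
            return None
        slot, length = entry
        start = slot * self.slot_size
        # Copy-on-read: the caller gets its own copy, never a view into the slab.
        return bytes(self.slab[start:start + length])
```

Because `get` returns a copy, a reader keeps a stable block even if the slot is later reused; the price is one extra copy per read.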

Page 18: HBase Status Report - Hadoop Summit Europe 2014

Performance Tuning…

HBase 0.94

Page 19: HBase Status Report - Hadoop Summit Europe 2014

HBase 0.94 - Highlights

• 420 issues addressed
• 1394 issues total in 0.94.x line
• Read Caching Improvements (HBASE-5074)
• Seek Optimization
  • Bloom Filter for Delete Family (HBASE-4532)
  • Lazy Seeks (HBASE-4465)
• Write to WAL Optimizations
  • WAL Compression (HBASE-4608)
• Data Block Encoding of KeyValues (HBASE-4218)
• Improved HBaseFsck (HBASE-5128)

Page 20: HBase Status Report - Hadoop Summit Europe 2014

HBase 0.94 - Highlights

• Simplified Region Sizing (HBASE-4365)
• Smarter Transaction Semantics
  • Atomic Put&Delete in One Call (HBASE-3584)
• Snapshots (0.94.6) (HBASE-7360)
• Atomic Appends (HBASE-4102)
• Multi Increment and Append (HBASE-2947)
• More Aggressive Off-Peak Compactions (HBASE-4463)

Page 21: HBase Status Report - Hadoop Summit Europe 2014

HBase 0.94 - Highlights

• Per Column Family Metrics (HBASE-4219)
• Multi-row local transactions (HBASE-5229)
• Pluggable Split Key Policy (HBASE-5304)
• Load balance regions by table (HBASE-3373)
  • Also backported to 0.92.1
• Make Compaction Code Pluggable (HBASE-6427)
• Deprecate HTablePool (0.94.11) (HBASE-6580)
• Canary Test Tool (HBASE-4393)

Page 22: HBase Status Report - Hadoop Summit Europe 2014

Block Encoding

• Reduces the data footprint in memory
• Only encodes the key portion of a key/value pair
• Encoded keys stay encoded during flushes as well
• Compression on top of encoding takes care of the values and the remainder of the key data

Example:
• Key length: 90B
• Value length: 8B

Type                  Ratio
Key Compression       92%
Total Compression     85%
LZO on same data      85%
LZO after encoding    91%

Source: https://issues.apache.org/jira/browse/HBASE-4218

Page 23: HBase Status Report - Hadoop Summit Europe 2014

Block Encoding: None

Source: http://blog.cloudera.com/blog/2012/06/hbase-io-hfile-input-output/

• With no encoding, the Key/Value structures are stored verbatim (with some overhead for lengths)
• In the past you were advised to keep the "keys" short for that reason

Page 24: HBase Status Report - Hadoop Summit Europe 2014

Block Encoding: Prefix Encoding

Source: http://blog.cloudera.com/blog/2012/06/hbase-io-hfile-input-output/

• The encoding patch added a new Cell abstraction that allows for extra fields in a Key/Value
• The fields are used to track necessary details for the encoding
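
The general idea behind prefix encoding can be sketched in a few lines: each key is stored as the number of leading bytes it shares with the previous key, plus the remaining suffix. This is a simplified illustration, not HBase's actual encoder, which works on the individual KeyValue fields; the function names and the sample keys are made up.

```python
def shared_prefix_len(a, b):
    """Count the leading bytes that two byte strings have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefix_encode(sorted_keys):
    """Turn sorted keys into (shared_length, suffix) pairs."""
    encoded = []
    previous = b""
    for key in sorted_keys:
        shared = shared_prefix_len(previous, key)
        encoded.append((shared, key[shared:]))
        previous = key
    return encoded


def prefix_decode(encoded):
    """Rebuild the original keys from (shared_length, suffix) pairs."""
    keys = []
    previous = b""
    for shared, suffix in encoded:
        key = previous[:shared] + suffix
        keys.append(key)
        previous = key
    return keys


# Hypothetical sample keys, sorted the way a store file would sort them.
keys = [b"row-0001/cf:col1", b"row-0001/cf:col2", b"row-0002/cf:col1"]
encoded = prefix_encode(keys)
```

The more the sorted keys repeat their leading bytes, the shorter the stored suffixes become, which is why long but repetitive keys benefit most.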

Page 25: HBase Status Report - Hadoop Summit Europe 2014

Block Encoding: Diff Encoding

Source: http://blog.cloudera.com/blog/2012/06/hbase-io-hfile-input-output/

• Apart from prefix encoding there are other ways of encoding the keys
• Diff encoding is one such approach

Page 26: HBase Status Report - Hadoop Summit Europe 2014

Block Encoding

• The advantage of block encoding is faster decompression/decoding
  • 20-80% faster than LZO
• It also still allows seeking within the data, which is not possible with compressed data
• The penalty is slightly slower read performance compared to non-encoded keys
• It is important to watch the size and repetition of key data; encoding might not be useful for random data

Source: https://issues.apache.org/jira/browse/HBASE-4218

Page 27: HBase Status Report - Hadoop Summit Europe 2014

The Singularity

HBase 0.96

Page 28: HBase Status Report - Hadoop Summit Europe 2014

HBase 0.96 - Highlights

• 1219 issues addressed
• 2243 issues total in 0.96.x line
• Improved Stability (HBASE-6241/6201)
  • ZK based Read/Write locks for table operations (HBASE-7305)
• Scalability Improvements (HBASE-8877)
  • Schema Storage (HBASE-8778)
  • Log Cleaner for Replication Speed Up (HBASE-9208)
• Mean-Time-To-Recovery (MTTR) Improvements (HBASE-5844/5926)
  • Distributed Log Replay (HBASE-7006)
  • Dedicated WAL for System Table (HBASE-7213/8631)

Page 29: HBase Status Report - Hadoop Summit Europe 2014

HBase 0.96 - Highlights

• Operability Improvements
  • Hooks for Health Scripts (HBASE-7399/7406)
  • Trace Lagging Calls with HTrace (HBASE-9121)
• Versioned RPCs and Metadata (Protobufs) (HBASE-3505)
• Parallel Seeks in Stores (HBASE-7495)
• Hadoop 1 and 2 Support
  • Secure Short Circuit Reads on H2 (HBASE-6783)
• Namespaces Support (HBASE-8015)
• New Metrics v2 (HBASE-4050)
• Cell Interface vs KeyValue (HBASE-7162)

Page 30: HBase Status Report - Hadoop Summit Europe 2014

HBase 0.96 - Highlights

• No more ROOT table (HBASE-3171)
• Remove HFile v1 (HBASE-7660)
• Trie Data Block Encoding (HBASE-4676)
• Remove Client-side Row Locks (HBASE-7263/7315)
• Compaction and Flush Improvements (HBASE-7516/7763/6466/7678) (HBASE-7667/7110/7603/7519/7842)
• Improved Default Configuration (HBASE-4657?)
• Client-side Type Library (HBASE-8089)

Page 31: HBase Status Report - Hadoop Summit Europe 2014

HBase 0.96 - Highlights

• Online Region Merging (HBASE-7403/8219)
• Bucket Cache Support (HBASE-7404)
• Remove older ICV Calls (HBASE-7032)
• New "Bootstrap" based UIs! (HBASE-6135)
• Remove Client-side Row Locks (HBASE-7263/7315)
• Compaction and Flush Improvements (HBASE-7516/7763/6466/7678) (HBASE-7667/7110/7603/7519/7842)

Page 32: HBase Status Report - Hadoop Summit Europe 2014


—  Michael  Stack,  HBase  PMC  Chair  

Page 33: HBase Status Report - Hadoop Summit Europe 2014

Mean-Time-To-Recovery (MTTR)

• A lot of effort went into reducing how long data might be inaccessible during a region move
• The offline period is made up of phases:
  • a detection phase,
  • a repair phase,
  • reassignment, and finally,
  • clients noticing the data available in its new location
• Improvements in many of those areas
  • Faster detection, efficient repair, parallel replay
  • Dedicated WAL for system tables

Source: https://blog.cloudera.com/blog/2013/10/hbase-0-96-0-released/

Page 34: HBase Status Report - Hadoop Summit Europe 2014

Cell Level Security

HBase 0.98

Page 35: HBase Status Report - Hadoop Summit Europe 2014

HBase 0.98 - Highlights

• 1303 issues addressed
• 1458 issues total in 0.98.x line
• Cell Level Security (HBASE-6222/7663/7662)
• Server-side Encryption (HBASE-7544)
• WAL Throughput Improvements (HBASE-8755)
• Reverse Scanner (HBASE-4811)
• MapReduce over Snapshot Files (HBASE-8369)
• Striped Compactions (HBASE-7667)
• Throttle Replication (HBASE-9501)

Page 36: HBase Status Report - Hadoop Summit Europe 2014

Cell Level Security

• Added HFile v3, which can store arbitrary metadata in a cell, called tags
• Also extended ACL checks to apply at the cell level
• With this, visibility labels can be stored in tags
• An API and CLI tools are provided that are akin to Accumulo's, after which it is modeled
• Additional encryption of data at rest ensures further security of sensitive data

Source: https://blogs.apache.org/hbase/entry/hbase_cell_security

Page 37: HBase Status Report - Hadoop Summit Europe 2014

Visibility Labels

The API allows setting visibility using expressions with "&", "|", and "!", as well as "(" and ")". For example, the label set { confidential, secret, topsecret, probationary } could be combined as

( secret | topsecret ) & !probationary

At runtime the expressions are evaluated against a user and then applied to each cell.
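
To make the evaluation step concrete, here is a minimal sketch of how such an expression could be checked against the set of labels a user holds. It is not HBase's implementation: the function names are hypothetical, and the operator precedence ("!" binds tightest, then "&", then "|") is an assumption, since the slide does not specify it.

```python
import re

# A token is either a label name or one of the operators/parentheses.
TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|[&|!()])")


def tokenize(expression):
    tokens, pos = [], 0
    expression = expression.rstrip()
    while pos < len(expression):
        match = TOKEN.match(expression, pos)
        if match is None:
            raise ValueError("unexpected character at position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(expression, user_labels):
    """Return True if a user holding user_labels satisfies the expression."""
    tokens = tokenize(expression)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        value = parse_and()
        while peek() == "|":
            take()
            right = parse_and()  # always parse, even if value is already True
            value = value or right
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&":
            take()
            right = parse_not()
            value = value and right
        return value

    def parse_not():
        if peek() == "!":
            take()
            return not parse_not()
        token = take()
        if token == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if token in {"&", "|", "!", ")"}:
            raise ValueError("unexpected token %r" % token)
        return token in user_labels  # a bare label: does the user hold it?

    result = parse_or()
    if peek() is not None:
        raise ValueError("unexpected trailing token %r" % peek())
    return result


expression = "( secret | topsecret ) & !probationary"
```

With this sketch, a user holding only `secret` satisfies the expression, while a user holding `topsecret` and `probationary` does not.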

Page 38: HBase Status Report - Hadoop Summit Europe 2014

The Future…

HBase 0.??

Page 39: HBase Status Report - Hadoop Summit Europe 2014

HBase Future

• Not much is written in stone yet
• The Master gets rewritten, and also the META table handling
  • Build in consensus (HBASE-10296)
  • Co-locate Master and META (HBASE-10569)
• MTTR is further extended into interesting areas
  • Read replicas (HBASE-10070)

It remains to be seen when 1.0.0 is released and what it contains. Your opinion counts!

Page 40: HBase Status Report - Hadoop Summit Europe 2014

Questions?

@larsgeorge