HBase: Just the Basics

Jesse Anderson – Curriculum Developer and Instructor

v1

What Is HBase?

©2014 Cloudera, Inc. All rights reserved.

• NoSQL datastore built on top of HDFS (Hadoop)

• An Apache Top Level Project

• Handles the various manifestations of Big Data

• Based on Google’s Bigtable paper

Why Use HBase?

• Storing large amounts of data (TB/PB)

• High throughput for a large number of requests

• Storing unstructured or variable column data

• Big Data with random reads and writes

When to Consider Not Using HBase?

• The problem is not a Big Data problem (only use HBase with Big Data problems)

• Data is read straight through files

• Data is written all at once or by appending new files

• There are no random reads or writes

• Access patterns of the data are ill-defined

HBase Architecture

How it works

Meet the Daemons

• HBase Master

• RegionServer

• ZooKeeper

• HDFS

  • NameNode/Standby NameNode

  • DataNode

Daemon Locations

[Diagram: the Master Nodes run the NameNode, the Standby NameNode, the HBase Masters, and ZooKeeper; the Slave Nodes run the DataNodes and RegionServers.]

Tables and Column Families

Tables are broken into groupings called Column Families.

• Column Family “contactinfo”: group data frequently accessed together and compress it

• Column Family “profilephoto”: group photos with different settings

Rows and Columns

Row key    Column Family “contactinfo”       Column Family “profilephoto”
adupont    fname: Andre   lname: Dupont
jsmith     fname: John    lname: Smith       image: <smith.jpg>
mrossi     fname: Mario   lname: Rossi       image: <mario.jpg>

• Row keys identify a row

• No storage penalty for unused columns

• Each Column Family can have many columns
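
The rows on this slide can be pictured as a sorted map of maps: row key, then column family, then column. The following is a minimal plain-Java sketch of that picture, not the HBase client API. The class `SparseTableSketch` and its methods are hypothetical names chosen for illustration; only the example rows come from the slide.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model: row key -> column family -> column -> value.
// Illustration only; it does not use or mimic the real HBase client classes.
class SparseTableSketch {
    private final TreeMap<String, TreeMap<String, TreeMap<String, String>>> rows = new TreeMap<>();

    // Store one cell; only cells that are actually written take up space.
    void put(String rowKey, String family, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(family, k -> new TreeMap<>())
            .put(column, value);
    }

    // Return the cell value, or null when the row, family, or column was never written.
    String get(String rowKey, String family, String column) {
        Map<String, TreeMap<String, String>> families = rows.get(rowKey);
        if (families == null) return null;
        Map<String, String> columns = families.get(family);
        return columns == null ? null : columns.get(column);
    }

    // Count the stored cells across all rows and families.
    int cellCount() {
        int count = 0;
        for (TreeMap<String, TreeMap<String, String>> families : rows.values()) {
            for (TreeMap<String, String> columns : families.values()) {
                count += columns.size();
            }
        }
        return count;
    }

    // Build the three example rows from the slide.
    static SparseTableSketch example() {
        SparseTableSketch t = new SparseTableSketch();
        t.put("adupont", "contactinfo", "fname", "Andre");
        t.put("adupont", "contactinfo", "lname", "Dupont");
        t.put("jsmith", "contactinfo", "fname", "John");
        t.put("jsmith", "contactinfo", "lname", "Smith");
        t.put("jsmith", "profilephoto", "image", "<smith.jpg>");
        t.put("mrossi", "contactinfo", "fname", "Mario");
        t.put("mrossi", "contactinfo", "lname", "Rossi");
        t.put("mrossi", "profilephoto", "image", "<mario.jpg>");
        return t;
    }

    public static void main(String[] args) {
        SparseTableSketch t = example();
        System.out.println(t.get("jsmith", "profilephoto", "image"));
        System.out.println(t.get("adupont", "profilephoto", "image")); // never written, so null
        System.out.println(t.cellCount());
    }
}
```

In this model the row `adupont` has no `profilephoto` entry at all, which is one way to read "no storage penalty for unused columns": a missing column is simply absent rather than stored as an empty value.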

Regions

A table is broken into regions

Region 1:
Row key    Column Family “contactinfo”
adupont    fname: Andre   lname: Dupont
jsmith     fname: John    lname: Smith

Region 2:
Row key    Column Family “contactinfo”
mrossi     fname: Mario   lname: Rossi
zstevens   fname: Zack    lname: Stevens

Regions are served by RegionServers

Write Path

[Diagram: a Client and the cluster of Master Nodes and Slave Nodes shown under Daemon Locations]

1. Which RegionServer is serving the Region?

2. Write to RegionServer

Read Path

[Diagram: a Client and the cluster of Master Nodes and Slave Nodes shown under Daemon Locations]

1. Which RegionServer is serving the Region?

2. Read from RegionServer
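
Both the write path and the read path start with the same question: which RegionServer serves the region that holds this row key? The plain-Java sketch below models that first step as a sorted map from each region's start key to a server name. It is an illustration under stated assumptions, not how the HBase client is implemented: the class `RegionLocatorSketch`, the split point `"m"`, and the server names `rs1` and `rs2` are all hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of step 1 on the slides: find which RegionServer serves the
// region containing a row key. Region boundaries and server names are made up.
class RegionLocatorSketch {
    // Maps each region's start key (inclusive) to the server hosting that region.
    private final TreeMap<String, String> serverByStartKey = new TreeMap<>();

    void addRegion(String startKey, String serverName) {
        serverByStartKey.put(startKey, serverName);
    }

    // The owning region is the one with the greatest start key <= rowKey.
    String locate(String rowKey) {
        Map.Entry<String, String> region = serverByStartKey.floorEntry(rowKey);
        if (region == null) {
            throw new IllegalStateException("no region covers row key " + rowKey);
        }
        return region.getValue();
    }

    // Two regions matching the Regions slide: adupont and jsmith in one,
    // mrossi and zstevens in the other.
    static RegionLocatorSketch example() {
        RegionLocatorSketch locator = new RegionLocatorSketch();
        locator.addRegion("", "rs1");  // the first region starts at the empty key
        locator.addRegion("m", "rs2"); // assumed split point between jsmith and mrossi
        return locator;
    }

    public static void main(String[] args) {
        RegionLocatorSketch locator = example();
        // Step 2 (the actual read or write) would then be sent to the returned server.
        System.out.println(locator.locate("jsmith"));
        System.out.println(locator.locate("mrossi"));
    }
}
```

Because regions cover contiguous, sorted ranges of row keys, a single floor lookup on the start keys is enough to route a request in this model.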

HBase API

How to access the data

No SQL Means No SQL

• Data is not accessed over SQL

• You must:

  • Create your own connections

  • Keep track of the type of data in a column

  • Give each row a key

  • Access a row by its key

Types of Access

• Gets

  • Gets a row’s data based on the row key

• Puts

  • Upserts a row with data based on the row key

• Scans

  • Finds all matching rows based on the row key

  • Scan logic can be extended by using filters
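
The three access types can be modeled on a map kept sorted by row key. The sketch below is plain Java and only mirrors the semantics listed on the slide; it is not the HBase client API. The class `AccessSketch` is a hypothetical name, each row holds a single value for brevity, and the scan range is assumed to include the start row and exclude the stop row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical single-value model of a table kept sorted by row key.
class AccessSketch {
    private final TreeMap<String, String> valueByRowKey = new TreeMap<>();

    // Put: insert the row if it is new, overwrite it if it exists (an upsert).
    void put(String rowKey, String value) {
        valueByRowKey.put(rowKey, value);
    }

    // Get: fetch exactly one row by its key, or null if the row is absent.
    String get(String rowKey) {
        return valueByRowKey.get(rowKey);
    }

    // Scan: return row keys in sorted order from startRow (inclusive) to stopRow (exclusive).
    List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(valueByRowKey.subMap(startRow, true, stopRow, false).keySet());
    }

    // Scan with a filter: same key range, but keep only rows whose value passes the filter.
    List<String> scan(String startRow, String stopRow, Predicate<String> valueFilter) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> row
                : valueByRowKey.subMap(startRow, true, stopRow, false).entrySet()) {
            if (valueFilter.test(row.getValue())) {
                matches.add(row.getKey());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        AccessSketch table = new AccessSketch();
        table.put("adupont", "Andre");
        table.put("jsmith", "John");
        table.put("mrossi", "Mario");
        System.out.println(table.get("jsmith"));
        System.out.println(table.scan("a", "k"));
        System.out.println(table.scan("a", "z", v -> v.startsWith("M")));
    }
}
```

Note that the filter narrows what a scan returns but does not change which key range is walked, which is why the row key range remains the primary tool for limiting a scan in this model.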

Gets

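// Assumes `table` is an open HBase table handle and the *_BYTES constants are byte[] values.
Get g = new Get(ROW_KEY_BYTES);                               // request one row by its key
Result r = table.get(g);                                      // fetch that row
byte[] byteArray = r.getValue(COLFAM_BYTES, COLDESC_BYTES);   // one cell: column family + column
String columnValue = Bytes.toString(byteArray);               // values are raw bytes; decode them yourself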

Puts

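// Assumes `table` is an open HBase table handle and the *_BYTES constants are byte[] values.
Put p = new Put(ROW_KEY_BYTES);                               // target row; the key is already a byte[]
// add(...) is the call shown on the slide; HBase 1.0 and later clients use addColumn(...) with the same arguments.
p.add(COLFAM_BYTES, COLDESC_BYTES, Bytes.toBytes("value"));   // one cell: column family, column, value
table.put(p);                                                 // upsert the row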

HBase Schema Design

How to design

No SQL Means No SQL

• Designing schemas for HBase requires in-depth knowledge

• Schema design is ‘data-centric’, not ‘relationship-centric’

• You design around how data is accessed

• Row keys are engineered

Treating HBase like a traditional RDBMS will lead to abject failure!

Captain Picard

Row Keys

• A row key is more than the glue between two tables

• Engineering time is spent just on constructing a row key

• Contents of a row key vary by access pattern

• Often made up of several pieces of data
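
As one illustration of a row key built from several pieces of data, the plain-Java sketch below composes a key for an assumed access pattern: "show the most recent events for one user". The pattern, the class `RowKeySketch`, the `|` separator, and the reversed-timestamp layout are all assumptions made for this example, not something the slides prescribe. It also assumes user IDs never contain `|`.

```java
import java.util.Locale;

// Hypothetical composite row key. Layout: <userId>|<reversed timestamp>.
class RowKeySketch {
    // Build a key whose sort order puts a user's newest events first.
    static String eventKey(String userId, long timestampMillis) {
        if (timestampMillis < 0) {
            throw new IllegalArgumentException("timestamp must be non-negative");
        }
        // Subtracting from Long.MAX_VALUE reverses the order; zero-padding to 19 digits
        // makes the text order of the number match its numeric order.
        long reversed = Long.MAX_VALUE - timestampMillis;
        return userId + "|" + String.format(Locale.ROOT, "%019d", reversed);
    }

    public static void main(String[] args) {
        System.out.println(eventKey("jsmith", 1000L));
        System.out.println(eventKey("jsmith", 2000L));
    }
}
```

With this layout, all keys for one user share a prefix and sit next to each other in sorted order, so a scan starting at `jsmith|` would meet the newest event first. A different access pattern would call for a different key.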

Schema Design

• Schema design does not start with an ERD

• Access patterns must be known in advance

• Denormalize to improve performance

• Fewer, bigger tables

Jesse Anderson

@jessetanderson