Introduction to HBase
Dr. Fabio Fumarola
Contents
• BigTable
• HBase
  – Shell
  – Admin
  – Put
  – Get
  – Scan
• Coding Session
BigTable
Bigtable at Google
• "Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance."
Features
• Distributed
• Sparse
• Column-Oriented
• Versioned
1. The map is indexed by a row key, a column key, and a timestamp.
2. Each value in the map is an uninterpreted array of bytes.
(row key, column key, timestamp) => value
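The mapping above can be pictured as nested sorted maps. The following is a minimal in-memory sketch in plain Java, not HBase code: the class `CellMapSketch` and its methods are hypothetical names introduced for illustration, and the sample coordinates are only example values.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of (row key, column key, timestamp) => value.
// It only illustrates the shape of the map; it is not how HBase stores data.
class CellMapSketch {
    // row key -> column key -> timestamp (newest first) -> raw bytes
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>> rows =
            new TreeMap<>();

    void put(String row, String column, long timestamp, byte[] value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<Long, byte[]>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    // Returns the value stored at exactly this coordinate, or null if absent.
    byte[] get(String row, String column, long timestamp) {
        NavigableMap<String, NavigableMap<Long, byte[]>> columns = rows.get(row);
        if (columns == null) return null;
        NavigableMap<Long, byte[]> versions = columns.get(column);
        return versions == null ? null : versions.get(timestamp);
    }

    public static void main(String[] args) {
        CellMapSketch table = new CellMapSketch();
        table.put("20120407152657", "personal:givenName", 1239124584398L,
                  "mario".getBytes(StandardCharsets.UTF_8));
        byte[] raw = table.get("20120407152657", "personal:givenName", 1239124584398L);
        // The store never interprets the bytes; decoding is the caller's job.
        System.out.println(new String(raw, StandardCharsets.UTF_8));
    }
}
```

Keeping every level sorted is what makes row-range scans and "newest version first" reads cheap in this picture.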
Key Concepts
• row key => 20120407152657
• column family => "personal:"
• column key => "personal:givenName", "personal:surname"
• timestamp => 1239124584398
• column value => "mario", "rossi"
Example 1
Get row 20120407145045
HBase
• Use HBase when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware. HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable.
http://hbase.apache.org
HBase Shell
hbase(main):001:0> create 'blog', 'info', 'content'
0 row(s) in 4.3640 seconds
hbase(main):002:0> put 'blog', '20120320162535', 'info:title', 'Document-oriented storage using CouchDB'
0 row(s) in 0.0330 seconds
hbase(main):003:0> put 'blog', '20120320162535', 'info:author', 'Bob Smith'
0 row(s) in 0.0030 seconds
hbase(main):004:0> put 'blog', '20120320162535', 'content:', 'CouchDB is a document-oriented...'
0 row(s) in 0.0030 seconds
HBase Shell
hbase(main):005:0> put 'blog', '20120320162535', 'info:category', 'Persistence'
0 row(s) in 0.0030 seconds
hbase(main):006:0> get 'blog', '20120320162535'
COLUMN          CELL
content:        timestamp=1239135042862, value=CouchDB is a doc...
info:author     timestamp=1239135042755, value=Bob Smith
info:category   timestamp=1239135042982, value=Persistence
info:title      timestamp=1239135042623, value=Document-oriented...
4 row(s) in 0.0140 seconds
HBase Shell
hbase(main):015:0> get 'blog', '20120407145045', {COLUMN=>'info:author', VERSIONS=>3 }
timestamp=1239135325074, value=John Doe
timestamp=1239135324741, value=John
2 row(s) in 0.0060 seconds
hbase(main):016:0> scan 'blog', { STARTROW => '20120300', STOPROW => '20120400' }
ROW              COLUMN+CELL
20120320162535   column=content:, timestamp=1239135042862, value=CouchDB is...
20120320162535   column=info:author, timestamp=1239135042755, value=Bob Smith
20120320162535   column=info:category, timestamp=1239135042982, value=Persistence
20120320162535   column=info:title, timestamp=1239135042623, value=Document...
4 row(s) in 0.0230 seconds
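The scan above works because rows are kept sorted by row key, so a scan reads one contiguous range: the start row is inclusive and the stop row is exclusive. A minimal plain-Java sketch of that range semantics follows. `ScanRangeSketch` is a hypothetical name, the stored values are placeholders, and comparing Java strings only matches HBase's byte-wise ordering here because the keys are ASCII digits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-in for a table whose rows are sorted by row key.
class ScanRangeSketch {
    // Returns the row keys in [startRow, stopRow): start inclusive, stop exclusive.
    static List<String> scan(TreeMap<String, String> table, String startRow, String stopRow) {
        return new ArrayList<>(table.subMap(startRow, true, stopRow, false).keySet());
    }

    public static void main(String[] args) {
        TreeMap<String, String> blog = new TreeMap<>();
        blog.put("20120215093000", "placeholder February post"); // hypothetical row
        blog.put("20120320162535", "Document-oriented storage using CouchDB");
        blog.put("20120407145045", "placeholder April post");
        // Only the March row falls inside ['20120300', '20120400').
        System.out.println(scan(blog, "20120300", "20120400")); // prints [20120320162535]
    }
}
```

Timestamp-style row keys such as these make date-range queries a matter of choosing the right start and stop prefixes.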
Java API
Admin API
// Create a new table
Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
String tableName = "people";
HTableDescriptor desc = new HTableDescriptor(tableName);
desc.addFamily(new HColumnDescriptor("personal"));
desc.addFamily(new HColumnDescriptor("contactinfo"));
desc.addFamily(new HColumnDescriptor("creditcard"));
admin.createTable(desc);
System.out.printf("%s is available? %b\n", tableName, admin.isTableAvailable(tableName));
Client API
import static org.apache.hadoop.hbase.util.Bytes.toBytes;
// Add some data into 'people' table
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people"); // open the 'people' table
Put put = new Put(toBytes("connor-john-m-43299"));
put.add(toBytes("personal"), toBytes("givenName"), toBytes("John"));
put.add(toBytes("personal"), toBytes("mi"), toBytes("M"));
put.add(toBytes("personal"), toBytes("surname"), toBytes("Connor"));
put.add(toBytes("contactinfo"), toBytes("email"), toBytes("[email protected]"));
table.put(put);
table.flushCommits();
table.close();
Finding Data
• Get (by row key)
• Scan (by row key ranges, filtering)
Get
// Get a row. Ask for only the data you need.
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people");
Get get = new Get(toBytes("connor-john-m-43299"));
get.setMaxVersions(2);
get.addFamily(toBytes("personal"));
get.addColumn(toBytes("contactinfo"), toBytes("email"));
Result result = table.get(get);
Update
// Update existing values, and add a new one
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people");
Put put = new Put(toBytes("connor-john-m-43299"));
put.add(toBytes("personal"), toBytes("surname"), toBytes("Smith"));
put.add(toBytes("contactinfo"), toBytes("email"), toBytes("[email protected]"));
put.add(toBytes("contactinfo"), toBytes("address"), toBytes("San Diego, CA"));
table.put(put);
table.flushCommits();
table.close();
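An update is just another Put: writing to an existing cell adds a new version with a newer timestamp rather than rewriting the old value, and reads return the newest version unless more are requested. The sketch below models a single versioned cell in plain Java. `VersionedCellSketch` and its `maxVersions` setting are hypothetical, and dropping old versions immediately is a simplification of how a real store would clean them up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of a single versioned cell; maxVersions is an assumed setting.
class VersionedCellSketch {
    // timestamp -> value, ordered newest first
    private final TreeMap<Long, String> versions = new TreeMap<>(Comparator.reverseOrder());
    private final int maxVersions;

    VersionedCellSketch(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    // An "update" adds a new timestamped version instead of overwriting in place.
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // last entry in newest-first order is the oldest
        }
    }

    // Newest-first list of at most n versions, like asking for VERSIONS => n.
    List<String> get(int n) {
        List<String> out = new ArrayList<>();
        for (String v : versions.values()) {
            if (out.size() == n) break;
            out.add(v);
        }
        return out;
    }

    public static void main(String[] args) {
        VersionedCellSketch surname = new VersionedCellSketch(3);
        surname.put(1L, "Connor");
        surname.put(2L, "Smith");
        System.out.println(surname.get(1)); // prints [Smith]
        System.out.println(surname.get(3)); // prints [Smith, Connor]
    }
}
```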
Scans
// Scan rows...
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people");
long numRowsPerPage = 10; // page size; the value is an assumption
Scan scan = new Scan(toBytes("john-"));
scan.addColumn(toBytes("personal"), toBytes("givenName"));
scan.addColumn(toBytes("contactinfo"), toBytes("email"));
scan.addColumn(toBytes("contactinfo"), toBytes("address"));
scan.setFilter(new PageFilter(numRowsPerPage));
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
    // process result...
}
scanner.close();
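The PageFilter in the scan above caps how many rows one scan returns. To fetch the following page, a client typically remembers the last row key it saw and starts the next scan just after it. The plain-Java sketch below shows that bookkeeping over a sorted set of row keys. `PagingSketch`, `nextPage`, and the sample keys are hypothetical, and no HBase classes are involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical paging over sorted row keys; not the HBase PageFilter itself.
class PagingSketch {
    // Returns up to pageSize row keys strictly after lastRow (null means start at the beginning).
    static List<String> nextPage(TreeSet<String> rowKeys, String lastRow, int pageSize) {
        NavigableSet<String> remaining =
                (lastRow == null) ? rowKeys : rowKeys.tailSet(lastRow, false);
        List<String> page = new ArrayList<>();
        for (String key : remaining) {
            if (page.size() == pageSize) break;
            page.add(key);
        }
        return page;
    }

    public static void main(String[] args) {
        TreeSet<String> keys = new TreeSet<>(Arrays.asList("john-a-1", "john-b-2", "john-c-3"));
        String last = null;
        List<String> page;
        // Keep asking for the next page until an empty one comes back.
        while (!(page = nextPage(keys, last, 2)).isEmpty()) {
            System.out.println(page);
            last = page.get(page.size() - 1);
        }
    }
}
```

Resuming from the last seen key, rather than counting an offset, keeps each page request a cheap range read on sorted keys.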
Time to Code
This is when things start to get hard
Set Up HBase with Docker
• https://registry.hub.docker.com/u/banno/hbase-standalone/
• https://registry.hub.docker.com/u/oddpoet/hbase-cdh5/
Steps
• Shell
• Java Project
  – Maven
  – Gradle