Dancing with the elephant h base1_final
![Page 1: Dancing with the elephant h base1_final](https://reader035.fdocuments.in/reader035/viewer/2022062404/554bc57fb4c90530298b54c8/html5/thumbnails/1.jpg)
Dancing With The Elephant
Persistence with HBase: Part 1
www.smart-platform.com | @smartplatf
Event Sponsors
![Page 2: Dancing with the elephant h base1_final](https://reader035.fdocuments.in/reader035/viewer/2022062404/554bc57fb4c90530298b54c8/html5/thumbnails/2.jpg)
We will discuss
• Introduction to Hadoop
• HBase: Definition, Storage Model, Use Cases
• Basic Data Access from the shell
• Hands-on with the HBase API
![Page 3: Dancing with the elephant h base1_final](https://reader035.fdocuments.in/reader035/viewer/2022062404/554bc57fb4c90530298b54c8/html5/thumbnails/3.jpg)
What is Hadoop
• Framework for distributed processing of large datasets (Big Data)
• HDFS + MapReduce
• HDFS (data):
  - Distributed filesystem responsible for storing data across the cluster
  - Provides replication on cheap commodity hardware
  - NameNode and DataNode processes
• MapReduce (processing): may be covered in a future session
![Page 4: Dancing with the elephant h base1_final](https://reader035.fdocuments.in/reader035/viewer/2022062404/554bc57fb4c90530298b54c8/html5/thumbnails/4.jpg)
HBase: What
• A sparse, distributed, persistent, multidimensional, sorted map (as defined in Google's Bigtable paper)
• Distributed NoSQL database designed on top of HDFS
![Page 5: Dancing with the elephant h base1_final](https://reader035.fdocuments.in/reader035/viewer/2022062404/554bc57fb4c90530298b54c8/html5/thumbnails/5.jpg)
RDBMS Woes (with massive data)
• Scaling is hard and expensive
• Relational features and secondary indexes often have to be turned off to scale
• Quick reads are hard at larger table sizes (500 GB)
• Single points of failure
• Schema changes
![Page 6: Dancing with the elephant h base1_final](https://reader035.fdocuments.in/reader035/viewer/2022062404/554bc57fb4c90530298b54c8/html5/thumbnails/6.jpg)
HBase: Why
• Scalable: just add nodes as your data grows
• Distributed: leverages the advantages of Hadoop's HDFS
• Built on top of Hadoop: being part of the ecosystem, it can be integrated with multiple tools
• High performance for reads and writes
  - Short-circuit reads
  - Single reads: 1 to 10 ms; scans: 100s of rows in 10 ms
• Schema-less
• Production-ready where data is on the order of petabytes
![Page 7: Dancing with the elephant h base1_final](https://reader035.fdocuments.in/reader035/viewer/2022062404/554bc57fb4c90530298b54c8/html5/thumbnails/7.jpg)
HBase: Storage Model 1
![Page 8: Dancing with the elephant h base1_final](https://reader035.fdocuments.in/reader035/viewer/2022062404/554bc57fb4c90530298b54c8/html5/thumbnails/8.jpg)
HTable
• Tables are split into regions
• Region: data with a contiguous range of RowKeys, [start, end), in sorted order
• Regions split as the table grows (region size can be configured)
• Table schema defines column families
• (Table, RowKey, ColumnFamily, ColumnName, Timestamp) → Value
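The region bullets above can be sketched with a sorted map of region start keys. This is only a toy model, not HBase client code: the class name RegionLookup, the region names and the split points "g" and "p" are made up for illustration, and it compares Java strings where HBase compares raw bytes.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of mapping a row key to a region; not the HBase client code. */
final class RegionLookup {
    // Region start key (inclusive) -> hypothetical region name.
    // The empty string stands for "start of table".
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    /** Returns the region whose [start, end) range contains rowKey. */
    String regionFor(String rowKey) {
        // The owning region is the one with the greatest start key <= rowKey.
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalStateException("no region covers " + rowKey);
        }
        return entry.getValue();
    }

    /** Builds a table with three made-up regions. */
    static RegionLookup sample() {
        RegionLookup table = new RegionLookup();
        table.addRegion("", "region-1");   // ["", "g")
        table.addRegion("g", "region-2");  // ["g", "p")
        table.addRegion("p", "region-3");  // ["p", end of table)
        return table;
    }

    public static void main(String[] args) {
        RegionLookup table = sample();
        System.out.println(table.regionFor("apple")); // region-1
        System.out.println(table.regionFor("g"));     // region-2 (start key is inclusive)
        System.out.println(table.regionFor("zebra")); // region-3
    }
}
```

Because the start key is inclusive and the end key exclusive, every row key belongs to exactly one region, which is what lets a region split in two without any overlap.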
![Page 9: Dancing with the elephant h base1_final](https://reader035.fdocuments.in/reader035/viewer/2022062404/554bc57fb4c90530298b54c8/html5/thumbnails/9.jpg)
HTable (Data Structure)
• SortedMap(RowKey, List(
    SortedMap(Column, List(
      Value, Timestamp)
    ))
  )
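One way to picture the nested structure above is with plain Java sorted maps. This is a minimal sketch under stated assumptions, not how HBase stores data on disk: ToyTable and its methods are hypothetical names, keys are strings rather than byte arrays, and a column is written as "family:qualifier".

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy in-memory model of the HTable data structure; not HBase code. */
final class ToyTable {
    // rowKey -> column ("family:qualifier") -> timestamp (newest first) -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    /** Stores one versioned cell. */
    void put(String rowKey, String column, long timestamp, String value) {
        rows.computeIfAbsent(rowKey, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    /** Returns the newest value of a cell, or null if the cell does not exist. */
    String getLatest(String rowKey, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(rowKey);
        if (columns == null) {
            return null;
        }
        NavigableMap<Long, String> versions = columns.get(column);
        // Versions are sorted newest first, so the first entry is the latest one.
        return versions == null ? null : versions.firstEntry().getValue();
    }

    public static void main(String[] args) {
        ToyTable table = new ToyTable();
        table.put("row1", "colfam1:qual1", 1L, "val1");
        table.put("row1", "colfam1:qual1", 2L, "val1-updated");
        System.out.println(table.getLatest("row1", "colfam1:qual1")); // val1-updated
    }
}
```

Writing the same cell twice does not overwrite anything here: the older value stays behind as an earlier version, and a read returns the entry with the highest timestamp.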
![Page 10: Dancing with the elephant h base1_final](https://reader035.fdocuments.in/reader035/viewer/2022062404/554bc57fb4c90530298b54c8/html5/thumbnails/10.jpg)
HBase: Data Read/Write
• Get: random read
• Scan: sequential read
• Put: write/update
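The difference between the three operations can be illustrated on an ordinary sorted map. The sketch below is an analogy only: SortedRowStore is a hypothetical class, each row holds a single string value, and row keys are compared as Java strings rather than bytes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy store showing Get, Scan and Put on sorted row keys; not HBase code. */
final class SortedRowStore {
    private final TreeMap<String, String> rows = new TreeMap<>();

    /** Put: writes a new row or updates an existing one. */
    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    /** Get: random read of a single row by its key. */
    String get(String rowKey) {
        return rows.get(rowKey);
    }

    /** Scan: sequential read of the row keys in [startRow, stopRow), in sorted order. */
    List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    public static void main(String[] args) {
        SortedRowStore store = new SortedRowStore();
        store.put("row3", "c");
        store.put("row1", "a");
        store.put("row10", "j");
        store.put("row2", "b");
        System.out.println(store.get("row2"));          // b
        System.out.println(store.scan("row1", "row3")); // [row1, row10, row2]
    }
}
```

Note that "row10" sorts before "row2" because keys are ordered lexicographically, which is worth keeping in mind when designing row keys for scans.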
![Page 11: Dancing with the elephant h base1_final](https://reader035.fdocuments.in/reader035/viewer/2022062404/554bc57fb4c90530298b54c8/html5/thumbnails/11.jpg)
HBase: Data Access Clients
• Demo of HBase shell
• Java API
![Page 12: Dancing with the elephant h base1_final](https://reader035.fdocuments.in/reader035/viewer/2022062404/554bc57fb4c90530298b54c8/html5/thumbnails/12.jpg)
HBase: API
• Connection
• DDL
• DML
• Filters
• Hands-On
![Page 13: Dancing with the elephant h base1_final](https://reader035.fdocuments.in/reader035/viewer/2022062404/554bc57fb4c90530298b54c8/html5/thumbnails/13.jpg)
HBase: API
• Configuration: holds details of where to find the cluster and tunable settings.
• HConnection: represents a connection to the cluster.
• HBaseAdmin: handles DDL operations (create, list, drop, alter).
• HTable (HTableInterface): a handle on a single HBase table. Sends "commands" to the table (Put, Get, Scan, Delete, Increment).
![Page 14: Dancing with the elephant h base1_final](https://reader035.fdocuments.in/reader035/viewer/2022062404/554bc57fb4c90530298b54c8/html5/thumbnails/14.jpg)
HBase: API:DDL
Group name: ddl (Data Definition Language)
Commands: alter, create, describe, disable, drop, enable, exists, is_disabled, is_enabled, list
![Page 15: Dancing with the elephant h base1_final](https://reader035.fdocuments.in/reader035/viewer/2022062404/554bc57fb4c90530298b54c8/html5/thumbnails/15.jpg)
HBase: API:DDL
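import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Load the HBase settings from the classpath (the no-arg constructor is deprecated).
Configuration conf = HBaseConfiguration.create();
// Kept from the slide. Clients normally locate the cluster through the ZooKeeper
// quorum, and 60010 is the default master web UI port in this HBase generation.
conf.set("hbase.master", "localhost:60010");
HBaseAdmin hbase = new HBaseAdmin(conf);
// Describe a table with two column families, then create it.
HTableDescriptor desc = new HTableDescriptor("testtable");
HColumnDescriptor meta = new HColumnDescriptor("colfam1".getBytes());
HColumnDescriptor prefix = new HColumnDescriptor("colfam2".getBytes());
desc.addFamily(meta);
desc.addFamily(prefix);
hbase.createTable(desc);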
![Page 16: Dancing with the elephant h base1_final](https://reader035.fdocuments.in/reader035/viewer/2022062404/554bc57fb4c90530298b54c8/html5/thumbnails/16.jpg)
HBase: API:DML
Group name: dml (Data Manipulation Language)
Commands: count, delete, deleteall, get, get_counter, incr, put, scan, truncate
![Page 17: Dancing with the elephant h base1_final](https://reader035.fdocuments.in/reader035/viewer/2022062404/554bc57fb4c90530298b54c8/html5/thumbnails/17.jpg)
HBase: API:DML PUT
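import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// conf is the HBase configuration built earlier.
HTable table = new HTable(conf, "testtable");
// One Put carries all the cells written to a single row.
Put put = new Put(Bytes.toBytes("row1"));
put.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"), Bytes.toBytes("val1"));
put.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual2"), Bytes.toBytes("val2"));
table.put(put);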
![Page 18: Dancing with the elephant h base1_final](https://reader035.fdocuments.in/reader035/viewer/2022062404/554bc57fb4c90530298b54c8/html5/thumbnails/18.jpg)
HBase: API:DML GET
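import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "testtable");
// Read a single column of a single row.
Get get = new Get(Bytes.toBytes("row1"));
get.addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"));
Result result = table.get(get);
byte[] val = result.getValue(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"));
System.out.println("Value: " + Bytes.toString(val));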
![Page 19: Dancing with the elephant h base1_final](https://reader035.fdocuments.in/reader035/viewer/2022062404/554bc57fb4c90530298b54c8/html5/thumbnails/19.jpg)
HBase: API:DML SCAN
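import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

// An empty Scan reads every row of the table, in row key order.
Scan scan1 = new Scan();
ResultScanner scanner1 = table.getScanner(scan1);
for (Result res : scanner1) {
    System.out.println(res);
}
// Always close the scanner to release server-side resources.
scanner1.close();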
![Page 20: Dancing with the elephant h base1_final](https://reader035.fdocuments.in/reader035/viewer/2022062404/554bc57fb4c90530298b54c8/html5/thumbnails/20.jpg)
Other Projects around HBase
• SQL layer: Phoenix, Hive, Impala
• Object persistence: Lily, Kundera
![Page 21: Dancing with the elephant h base1_final](https://reader035.fdocuments.in/reader035/viewer/2022062404/554bc57fb4c90530298b54c8/html5/thumbnails/21.jpg)
FollowUp
• Part 2: Building a key-value data store in HBase: challenges we faced in SMART
• {Rahul, vinay}@briotribes.com
![Page 22: Dancing with the elephant h base1_final](https://reader035.fdocuments.in/reader035/viewer/2022062404/554bc57fb4c90530298b54c8/html5/thumbnails/22.jpg)
Shoutout To
![Page 23: Dancing with the elephant h base1_final](https://reader035.fdocuments.in/reader035/viewer/2022062404/554bc57fb4c90530298b54c8/html5/thumbnails/23.jpg)
HBase: Usecase (Facebook)
• Facebook Messaging (Titan)
  - 1.5 M ops per second at peak
  - 6B+ messages per day
  - 16 columns per operation across different families
• Facebook Insights (Puma)
  - Provides developers and Page owners with metrics about their content
  - > 1 M counter increments per second
![Page 24: Dancing with the elephant h base1_final](https://reader035.fdocuments.in/reader035/viewer/2022062404/554bc57fb4c90530298b54c8/html5/thumbnails/24.jpg)