Introduction to Hbase - Distributed Managementdmod.eu/WeeklyMeeting/hbase.pdf · Introduction to...
Introduction to HBase
Gkavresis Giorgos 1470
Agenda
What is HBase
Installation
About RDBMS
Overview of HBase
Why HBase instead of RDBMS
Architecture of HBase
HBase interface
Summary
What is HBase
HBase is an open-source, distributed sorted map modeled after Google's Bigtable
Open Source
Apache 2.0 License
Committers and contributors from diverse organizations such as Facebook, Trend Micro, etc.
Installation
Download link
http://www.apache.org/dyn/closer.cgi/hbase/
Before starting it, you might want to edit conf/hbase-site.xml and set hbase.rootdir, the directory you want HBase to write to
Can run in standalone, pseudo-distributed, or fully distributed mode
Start HBase via $ ./bin/start-hbase.sh
About Relational Database Management Systems
Have a lot of limitations
Cannot sustain both read and write throughput (transactional databases)
Specialized hardware is quite expensive
Background
Google releases paper on Bigtable – 2006
First usable HBase – 2007
HBase becomes an Apache top-level project – 2010
HBase 0.26.5 released.
Overview of HBase
HBase is a part of Hadoop
Apache Hadoop is an open-source system to reliably store and process data across many commodity computers
HBase and Hadoop are written in Java
Hadoop provides:
  Fault tolerance
  Scalability
Hadoop advantages
Data-parallel or compute-parallel. For example:
  Extensive machine learning on <100 GB of image data
  Simple SQL queries on >100 TB of clickstream data
Hadoop's components
MapReduce (process)
  Fault-tolerant distributed processing
HDFS (store)
  Self-healing, high-bandwidth clustered storage
Difference Between Hadoop/HDFS and HBase
HDFS is a distributed file system that is well suited for the storage of large files.
HBase, on the other hand, is built on top of HDFS and provides fast record lookups (and updates) for large tables.
HDFS is modeled on the Google File System (GFS).
HBase is
Distributed – uses HDFS for storage
Column-oriented
Multi-dimensional (versions)
Storage system
HBase is NOT
A SQL database – no joins, no query engine, no datatypes, no (damn) SQL
No schema
No DBA needed
Storage Model
Column-oriented database (column families)
A table consists of rows, each of which has a primary key (row key)
Each row may have any number of columns
The table schema only defines column families (a column family can have any number of columns)
Each cell value has a timestamp
Static Columns
int varchar int varchar int
int varchar int varchar int
int varchar int varchar int
Something different
Row1 → ColA = Value
       ColB = Value
       ColC = Value
Row2 → ColX = Value
       ColY = Value
       ColZ = Value
A Big Map
Row Key + Column Key + Timestamp => Value
Row Key  Column Key  Timestamp      Value
1        Info:name   1273516197868  Sakis
1        Info:age    1273871824184  21
1        Info:sex    1273746281432  Male
2        Info:name   1273863723227  Themis
2        Info:name   1273973134238  Andreas
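The mapping above can be sketched with nested sorted maps. This is only an in-memory illustration in plain Java, not HBase code: the class name BigMapSketch and its methods are hypothetical, and it assumes that a read without a timestamp returns the newest version of a cell.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Illustrative model of (row key, column key, timestamp) -> value. Not HBase code. */
public class BigMapSketch {
    // row key -> column key -> timestamp (newest first) -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    /** Stores one version of a cell. */
    public void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    /** Returns the newest value of a cell, or null if the cell does not exist. */
    public String get(String row, String column) {
        List<String> versions = getVersions(row, column, 1);
        return versions.isEmpty() ? null : versions.get(0);
    }

    /** Returns up to maxVersions values of a cell, newest first. */
    public List<String> getVersions(String row, String column, int maxVersions) {
        List<String> out = new ArrayList<>();
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null || !columns.containsKey(column)) {
            return out;
        }
        for (String value : columns.get(column).values()) {
            if (out.size() == maxVersions) {
                break;
            }
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        BigMapSketch table = new BigMapSketch();
        table.put("1", "Info:name", 1273516197868L, "Sakis");
        table.put("2", "Info:name", 1273863723227L, "Themis");
        table.put("2", "Info:name", 1273973134238L, "Andreas");
        System.out.println(table.get("2", "Info:name"));            // Andreas
        System.out.println(table.getVersions("2", "Info:name", 2)); // [Andreas, Themis]
    }
}
```

Row 2 holds two versions of Info:name, so the timestamp is what tells them apart.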
One more example
Row Key Data
cutting  Info:{'height':'9ft','state':'CA'}
         Roles:{'ASF':'Director','Hadoop':'Founder'}
tlipcon  Info:{'height':'5ft7','state':'CA'}
         Roles:{'Hadoop':'Committer'@ts=2010,'Hadoop':'PMC'@ts=2011,'Hive':'Contributor'}
Column Families
Different sets of columns may have different priorities
CFs are stored separately on disk: access one without wasting I/O on the other.
Configurable by column family
  Compression (none, gzip, LZO)
  Version retention policies
  Cache priority
HBase vs RDBMS
                              RDBMS                         HBase
Data layout                   Row-oriented                  Column-family-oriented
Query language                SQL                           Get/put/scan/etc. *
Security                      Authentication/Authorization  Work in progress
Max data size                 TBs                           Hundreds of PBs
Read/write throughput limits  1000s of queries/second       Millions of queries/second
Terms and Daemons
Region
  A subset of a table's rows
RegionServer (slave)
  Serves data for reads and writes
Master
  Responsible for coordinating the slaves
  Assigns regions, detects failures of RegionServers
  Controls some admin functions
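How a row key could be routed to the RegionServer holding its region can be sketched as follows. This is a minimal illustration, assuming each region covers a contiguous, sorted range of row keys identified by its start key; the class RegionLookupSketch and the server names rs1 to rs3 are hypothetical, and nothing here is HBase client code.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative routing of row keys to regions by start key. Not HBase code. */
public class RegionLookupSketch {
    // region start key (inclusive) -> name of the server hosting that region
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    /** The master's role in this sketch: assign (or reassign) a region to a server. */
    public void assign(String startKey, String regionServer) {
        regionsByStartKey.put(startKey, regionServer);
    }

    /** Finds the server whose region contains the given row key. */
    public String locate(String rowKey) {
        // The owning region is the one with the greatest start key <= rowKey.
        Map.Entry<String, String> region = regionsByStartKey.floorEntry(rowKey);
        if (region == null) {
            throw new IllegalStateException("no region covers row key " + rowKey);
        }
        return region.getValue();
    }

    public static void main(String[] args) {
        RegionLookupSketch regions = new RegionLookupSketch();
        regions.assign("", "rs1");  // rows before "g"
        regions.assign("g", "rs2"); // rows from "g" up to "p"
        regions.assign("p", "rs3"); // rows from "p" onward
        System.out.println(regions.locate("cutting")); // rs1
        System.out.println(regions.locate("tlipcon")); // rs3

        // If rs2 fails, the master can hand its region to another server.
        regions.assign("g", "rs1");
        System.out.println(regions.locate("hbase"));   // rs1
    }
}
```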
Distributed coordination
To manage master election and server availability we use ZooKeeper
Set up as a cluster, it provides distributed coordination primitives
An excellent tool for building cluster management systems
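The idea behind master election can be sketched in memory: the first candidate to claim the role wins, and the role is freed when the holder's session ends. This does not use the ZooKeeper client library; the class MasterElectionSketch and the candidate names are hypothetical, and the single field only stands in for the kind of short-lived node a coordination service would provide.

```java
/** Illustrative first-to-claim-wins election. Not ZooKeeper or HBase code. */
public class MasterElectionSketch {
    // Name of the active master, or null when the role is free.
    private String activeMaster;

    /** A candidate becomes master only if nobody currently holds the role. */
    public synchronized boolean tryBecomeMaster(String candidate) {
        if (activeMaster != null) {
            return false;
        }
        activeMaster = candidate;
        return true;
    }

    /** Simulates the active master's session ending, which frees the role. */
    public synchronized void sessionExpired(String master) {
        if (master.equals(activeMaster)) {
            activeMaster = null;
        }
    }

    public synchronized String activeMaster() {
        return activeMaster;
    }

    public static void main(String[] args) {
        MasterElectionSketch election = new MasterElectionSketch();
        System.out.println(election.tryBecomeMaster("master-a")); // true
        System.out.println(election.tryBecomeMaster("master-b")); // false, role is taken
        election.sessionExpired("master-a");                      // master-a goes away
        System.out.println(election.tryBecomeMaster("master-b")); // true, backup takes over
    }
}
```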
HBase Architecture
HBase Interface
Java
Thrift (Ruby, PHP, Python, Perl, C++, ...)
HBase shell
HBase API
get(row)
put(row, Map<column, value>)
scan(key range, filter)
increment(row, columns)
Check and Put, delete, etc.
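The semantics of these operations can be sketched over a sorted in-memory map. The method shapes follow the list above rather than the real HBase client API; the class ApiSketch is hypothetical, versions are ignored, and values are kept as strings for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;
import java.util.function.Predicate;

/** Illustrative in-memory versions of get/put/scan/increment/checkAndPut. Not HBase code. */
public class ApiSketch {
    // row key -> column key -> value, with rows kept in sorted key order
    private final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();

    public synchronized void put(String row, Map<String, String> columns) {
        rows.computeIfAbsent(row, r -> new TreeMap<>()).putAll(columns);
    }

    /** Returns a copy of the row's columns, or an empty map if the row is absent. */
    public synchronized Map<String, String> get(String row) {
        TreeMap<String, String> columns = rows.get(row);
        if (columns == null) {
            return new TreeMap<>();
        }
        return new TreeMap<>(columns);
    }

    /** Row keys with startRow <= key < stopRow that pass the filter, in sorted order. */
    public synchronized List<String> scan(String startRow, String stopRow, Predicate<String> filter) {
        List<String> out = new ArrayList<>();
        for (String key : rows.subMap(startRow, true, stopRow, false).keySet()) {
            if (filter.test(key)) {
                out.add(key);
            }
        }
        return out;
    }

    /** Adds amount to a counter column (missing counters start at 0) and returns the new value. */
    public synchronized long increment(String row, String column, long amount) {
        TreeMap<String, String> columns = rows.computeIfAbsent(row, r -> new TreeMap<>());
        long updated = Long.parseLong(columns.getOrDefault(column, "0")) + amount;
        columns.put(column, Long.toString(updated));
        return updated;
    }

    /** Writes newValue only if the column currently holds expected (null means absent). */
    public synchronized boolean checkAndPut(String row, String column, String expected, String newValue) {
        TreeMap<String, String> columns = rows.get(row);
        String current = (columns == null) ? null : columns.get(column);
        if (!Objects.equals(current, expected)) {
            return false;
        }
        rows.computeIfAbsent(row, r -> new TreeMap<>()).put(column, newValue);
        return true;
    }

    public static void main(String[] args) {
        ApiSketch table = new ApiSketch();
        table.put("row1", Map.of("cf:a", "value1"));
        table.put("row2", Map.of("cf:b", "value2"));
        table.put("row3", Map.of("cf:c", "value3"));
        System.out.println(table.scan("row1", "row3", key -> true));                // [row1, row2]
        System.out.println(table.increment("counters", "cf:hits", 5));              // 5
        System.out.println(table.checkAndPut("row1", "cf:a", "value1", "value1b")); // true
        System.out.println(table.checkAndPut("row1", "cf:a", "value1", "other"));   // false
    }
}
```

Because rows are kept sorted by key, a scan over a key range is a contiguous slice of the map.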
HBase shell
hbase(main):003:0> create 'test', 'cf'
0 row(s) in 1.2200 seconds
hbase(main):004:0> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.0560 seconds
hbase(main):005:0> put 'test', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0370 seconds
hbase(main):006:0> put 'test', 'row3', 'cf:c', 'value3'
0 row(s) in 0.0450 seconds
HBase shell (cont.)
hbase(main):007:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:a, timestamp=1288380727188, value=value1
row2 column=cf:b, timestamp=1288380738440, value=value2
row3 column=cf:c, timestamp=1288380747365, value=value3
3 row(s) in 0.0590 seconds
HBase in Java
// Load the site configuration and open the table (the path suggests HBase 0.19.3)
HBaseConfiguration conf = new HBaseConfiguration();
conf.addResource(new Path("/opt/hbase-0.19.3/conf/hbase-site.xml"));
HTable table = new HTable(conf, "test_table");
// Collect the changes for one row, then commit them together
BatchUpdate batchUpdate = new BatchUpdate("test_row1");
batchUpdate.put("columnfamily:column1", Bytes.toBytes("some value"));
batchUpdate.delete("column1");
table.commit(batchUpdate);
Get Data
Read one column value from a row
Cell cell = table.get("test_row1", "columnfamily:column1");
To read one whole row, use the HTable#getRow() method.
RowResult singleRow = table.getRow(Bytes.toBytes("test_row1"));
A "tough" Facebook application
Real-time counters of URLs shared, links "liked", impressions generated
20 billion events/day (200K events/sec)
~30 sec latency from click to count
Heavy use of incrementColumnValue API
Tried MySQL, Cassandra; settled on HBase
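The quoted rate can be checked with quick arithmetic: spreading 20 billion events evenly over one day gives an average a little above the rounded 200K events/sec. The class EventRate below is a hypothetical helper, and the even spread is an assumption; real traffic would have peaks above the average.

```java
/** Back-of-the-envelope check of the events-per-second figure. */
public class EventRate {
    private static final long SECONDS_PER_DAY = 24L * 60 * 60; // 86,400

    /** Average events per second, assuming events are spread evenly over the day. */
    static long eventsPerSecond(long eventsPerDay) {
        return eventsPerDay / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        System.out.println(eventsPerSecond(20_000_000_000L)); // 231481
    }
}
```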
Use HBase if
You need random write, random read, or both (but not neither)
You need to do many thousands of operations per second on multiple TB of data
Your access patterns are simple
Thank you \../