1 HBASE – THE SCALABLE DATA STORE An Introduction to HBase XLDB Europe Workshop 2013: CERN,...
HBASE – THE SCALABLE DATA STORE
An Introduction to HBase
XLDB Europe Workshop 2013: CERN, Geneva
James Kinley
EMEA Solutions Architect, Cloudera
“Apache HBase is the Hadoop database, a distributed, scalable, big data store.”
The Apache Software Foundation
Why Hadoop and HBase?
• Datasets are constantly growing and intake soars
  • CERN stores 100PB of physics data, with 75PB generated in the past 3 years
• Traditional databases are expensive to scale and inherently difficult to distribute
• Commodity hardware is cheap and powerful
• Hadoop…
  • Is designed to store and process extremely large datasets in batch
  • Is not intended for realtime querying
  • Does not support random access
History of Hadoop and HBase
• Google solved its scalability problems
  • “The Google File System” published October 2003
    • Hadoop DFS
  • “MapReduce: Simplified Data Processing on Large Clusters” published December 2004
    • Hadoop MapReduce
  • “Bigtable: A Distributed Storage System for Structured Data” published November 2006
    • HBase
What is HBase?
• Distributed
• Column-Oriented
• Multi-Dimensional
• High-Availability (CAP?)
• High-Performance
• Storage System
• Project Goals:
  • Billions of Rows * Millions of Columns * Thousands of Versions
  • Petabytes of data stored across thousands of commodity servers
HBase is not…
• A SQL Database
  • No native query engine, no SQL, no types, no joins
  • Transactions and secondary indexes are available only as add-ons, and these are immature
• A drop-in replacement for your RDBMS
  • You must be OK with an RDBMS anti-schema
    • Denormalized data
    • Wide and sparsely populated tables
  • Just say “no” to your DBA
HBase tables
• Tables are sorted by Row Key in lexicographical order
• Table schema only defines its Column Families
  • Each family consists of any number of Columns
  • Each column consists of any number of Versions
  • Columns only exist when inserted, no NULLs
  • Columns within a family are sorted and stored together
• Everything except the table name is a byte[]
• (Table > Row Key > Family:Column > Timestamp) > Value
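The data model above can be pictured as a map of sorted maps. The following is a minimal in-memory sketch of that idea, not the HBase client API: `ToyTable` and its methods are hypothetical names, and `String` stands in for `byte[]` on the assumption that keys are plain ASCII, where string order and byte order agree.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of one table: row key -> "family:column" -> timestamp -> value.
// Hypothetical illustration only; real HBase stores byte[] and persists to HDFS.
class ToyTable {
    // Outer TreeMap keeps rows sorted by row key in lexicographical order.
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    void put(String rowKey, String family, String column, long timestamp, String value) {
        // A row and a column come into existence only when something is written.
        NavigableMap<String, NavigableMap<Long, String>> row =
                rows.computeIfAbsent(rowKey, k -> new TreeMap<>());
        // Versions are ordered newest first, so the first entry is the latest.
        NavigableMap<Long, String> versions =
                row.computeIfAbsent(family + ":" + column,
                        k -> new TreeMap<Long, String>(Comparator.reverseOrder()));
        versions.put(timestamp, value);
    }

    // Returns the newest version of a cell, or null if the column was never written.
    String get(String rowKey, String family, String column) {
        NavigableMap<String, NavigableMap<Long, String>> row = rows.get(rowKey);
        if (row == null) {
            return null;
        }
        NavigableMap<Long, String> versions = row.get(family + ":" + column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    // Row keys in their stored (sorted) order.
    Iterable<String> rowKeys() {
        return rows.keySet();
    }

    public static void main(String[] args) {
        ToyTable table = new ToyTable();
        table.put("row-2", "cf", "name", 1L, "first");
        table.put("row-2", "cf", "name", 2L, "second");
        table.put("row-10", "cf", "name", 1L, "other");
        System.out.println(table.get("row-2", "cf", "name"));
        for (String key : table.rowKeys()) {
            System.out.println(key);
        }
    }
}
```

Note that under lexicographical ordering "row-10" sorts before "row-2", which is why numeric row keys are commonly zero-padded.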
HBase Architecture
• A table is made up of any number of regions
• A region is specified by its startKey and endKey
• Each region may live on a different node and is made up of several HDFS files and blocks
• Two types of node: Master and RegionServer
• Special tables -ROOT- and .META. store schema information and region locations
• The Master monitors RegionServers and handles region assignment and load balancing
• Uses ZooKeeper for distributed coordination
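Because each region covers a contiguous key range, finding the region for a row key reduces to a sorted lookup on start keys. The sketch below illustrates that idea under stated assumptions: `ToyRegionLocator`, `Region`, and the server names are hypothetical, `String` stands in for `byte[]`, the start key is treated as inclusive and the end key as exclusive, and an empty key marks an open-ended range. It is not how the HBase client is implemented.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy lookup of the region responsible for a row key; hypothetical, not HBase code.
class ToyRegionLocator {
    static final class Region {
        final String startKey; // inclusive; "" means "from the start of the table"
        final String endKey;   // exclusive; "" means "to the end of the table"
        final String server;   // name of the RegionServer hosting this region

        Region(String startKey, String endKey, String server) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.server = server;
        }
    }

    // Regions indexed by start key, kept in sorted order.
    private final TreeMap<String, Region> byStartKey = new TreeMap<>();

    void add(Region region) {
        byStartKey.put(region.startKey, region);
    }

    // Finds the region whose [startKey, endKey) range contains rowKey, or null.
    Region locate(String rowKey) {
        // The candidate is the region with the greatest start key <= rowKey.
        Map.Entry<String, Region> entry = byStartKey.floorEntry(rowKey);
        if (entry == null) {
            return null;
        }
        Region region = entry.getValue();
        boolean inRange = region.endKey.isEmpty() || rowKey.compareTo(region.endKey) < 0;
        return inRange ? region : null;
    }

    public static void main(String[] args) {
        ToyRegionLocator locator = new ToyRegionLocator();
        locator.add(new Region("", "g", "regionserver-1"));
        locator.add(new Region("g", "p", "regionserver-2"));
        locator.add(new Region("p", "", "regionserver-3"));
        System.out.println(locator.locate("melon").server);
    }
}
```

Keeping regions sorted by start key makes each lookup logarithmic in the number of regions, and a region split only adds one entry to the index.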
Impala
• Open-source, general-purpose SQL query engine
• Runs directly within Hadoop:
  • Reads widely used Hadoop file formats and HBase tables
  • Talks to widely used Hadoop storage managers
  • Runs on the same nodes that run Hadoop processes
• High performance
  • C++ instead of Java
  • Runtime code generation (LLVM)
  • A completely new execution engine that doesn’t build on MapReduce