@JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS !...

107
Phoenix James Taylor @JamesPlusPlus http://phoenix-hbase.blogspot.com/ We put the SQL back in NoSQL https://github.com/forcedotcom/phoenix

Transcript of @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS !...

Page 1: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Phoenix James Taylor @JamesPlusPlus http://phoenix-hbase.blogspot.com/

We put the SQL back in NoSQL https://github.com/forcedotcom/phoenix

Page 2: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Agenda

Completed

l What/why HBase?

Page 3: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Agenda

Completed

l What/why HBase? l What/why Phoenix?

Page 4: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Agenda

Completed

l What/why HBase? l What/why Phoenix? l How does Phoenix work?

Page 5: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Agenda

Completed

l What/why HBase? l What/why Phoenix? l How does Phoenix work? l Demo

Page 6: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Agenda

Completed

l What/why HBase? l What/why Phoenix? l How does Phoenix work? l Demo l Roadmap

Page 7: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Agenda

Completed

l What/why HBase? l What/why Phoenix? l How does Phoenix work? l Demo l Roadmap l Q&A

Page 8: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

What is HBase?

Completed

l Developed as part of Apache Hadoop

Page 9: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

What is HBase?

Completed

l Developed as part of Apache Hadoop l Runs on top of HDFS

Page 10: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

What is HBase?

Completed

l Developed as part of Apache Hadoop l Runs on top of HDFS l Key/value store

Page 11: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

What is HBase?

Completed

l Developed as part of Apache Hadoop l Runs on top of HDFS l Key/value store

Map

Page 12: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

What is HBase?

Completed

l Developed as part of Apache Hadoop l Runs on top of HDFS l Key/value store

Map

Distributed

Page 13: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

What is HBase?

Completed

l Developed as part of Apache Hadoop l Runs on top of HDFS l Key/value store

Map

Distributed

Sparse

Page 14: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

What is HBase?

Completed

l Developed as part of Apache Hadoop l Runs on top of HDFS l Key/value store

Map Sorted

Distributed

Sparse

Page 15: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

What is HBase?

Completed

l Developed as part of Apache Hadoop l Runs on top of HDFS l Key/value store

Map Sorted

Distributed Consistent

Sparse

Page 16: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

What is HBase?

Completed

l Developed as part of Apache Hadoop l Runs on top of HDFS l Key/value store

Map Sorted

Distributed Consistent

Sparse Multidimensional

Page 17: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Cluster Architecture

Page 18: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Sharding

Page 19: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Why Use HBase?

Completed

l If you have lots of data

Page 20: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Why Use HBase?

Completed

l If you have lots of data l Scales linearly

Page 21: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Why Use HBase?

Completed

l If you have lots of data l Scales linearly l Shards automatically

Page 22: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Why Use HBase?

Completed

l If you have lots of data l Scales linearly l Shards automatically

l If you can live without transactions

Page 23: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Why Use HBase?

Completed

l If you have lots of data l Scales linearly l Shards automatically

l If you can live without transactions l If your data changes

Page 24: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Why Use HBase?

Completed

l If you have lots of data l Scales linearly l Shards automatically

l If you can live without transactions l If your data changes l If you need strict consistency

Page 25: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

What is Phoenix?

Completed

Page 26: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

What is Phoenix?

Completed

l SQL skin for HBase

Page 27: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

What is Phoenix?

Completed

l SQL skin for HBase l Alternate client API

Page 28: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

What is Phoenix?

Completed

l SQL skin for HBase l Alternate client API l Embedded JDBC driver

Page 29: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

What is Phoenix?

Completed

l SQL skin for HBase l Alternate client API l Embedded JDBC driver l Runs at HBase native speed

Page 30: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

What is Phoenix?

Completed

l SQL skin for HBase l Alternate client API l Embedded JDBC driver l Runs at HBase native speed l Compiles SQL into native HBase calls

Page 31: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

What is Phoenix?

Completed

l SQL skin for HBase l Alternate client API l Embedded JDBC driver l Runs at HBase native speed l Compiles SQL into native HBase calls l So you don’t have to!

Page 32: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Cluster Architecture

Page 33: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Cluster Architecture

Phoenix

Page 34: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Cluster Architecture

Phoenix

Phoenix

Page 35: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Phoenix Performance

Page 36: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Why Use Phoenix?

Page 37: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Why Use Phoenix?

Completed

l Give folks an API they already know

Page 38: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Why Use Phoenix?

Completed

l Give folks an API they already know l Reduce the amount of code needed

Page 39: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Why Use Phoenix?

Completed

l Give folks an API they already know l Reduce the amount of code needed

SELECT TRUNC(date,'DAY’), AVG(cpu) FROM web_stat WHERE domain LIKE 'Salesforce%’ GROUP BY TRUNC(date,'DAY’)

Page 40: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Why Use Phoenix?

Completed

l Give folks an API they already know l Reduce the amount of code needed l Perform optimizations transparently

Page 41: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Why Use Phoenix?

Completed

l Give folks an API they already know l Reduce the amount of code needed l Perform optimizations transparently

l Aggregation l Skip Scan l Secondary indexing (soon!)

Page 42: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Why Use Phoenix?

Completed

l Give folks an API they already know l Reduce the amount of code needed l Perform optimizations transparently l Leverage existing tooling

Page 43: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Why Use Phoenix?

Completed

l Give folks an API they already know l Reduce the amount of code needed l Perform optimizations transparently l Leverage existing tooling

l SQL client/terminal l OLAP engine

Page 44: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

How Does Phoenix Work?

Completed

l Overlays on top of HBase Data Model l Keeps Versioned Schema Respository l Query Processor

Page 45: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Phoenix Data Model

HBase Table

Phoenix maps HBase data model to the relational world

Page 46: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Phoenix Data Model

HBase Table Column Family A Column Family B

Phoenix maps HBase data model to the relational world

Page 47: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Phoenix Data Model

HBase Table Column Family A Column Family B

Qualifier 1 Qualifier 2 Qualifier 3

Phoenix maps HBase data model to the relational world

Page 48: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Phoenix Data Model

HBase Table Column Family A Column Family B

Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value

Phoenix maps HBase data model to the relational world

Page 49: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Phoenix Data Model

HBase Table Column Family A Column Family B

Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value

Row Key 2 Value Value

Phoenix maps HBase data model to the relational world

Page 50: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Phoenix Data Model

HBase Table Column Family A Column Family B

Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value

Row Key 2 Value Value

Row Key 3 Value

Phoenix maps HBase data model to the relational world

Page 51: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

HBase Table Column Family A Column Family B

Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value

Row Key 2 Value Value

Row Key 3 Value

Phoenix Data Model

HBase Table Column Family A Column Family B

Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value

Row Key 2 Value Value

Row Key 3 Value

Phoenix maps HBase data model to the relational world

Page 52: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

HBase Table Column Family A Column Family B

Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value

Row Key 2 Value Value

Row Key 3 Value

HBase Table Column Family A Column Family B

Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value

Row Key 2 Value Value

Row Key 3 Value

Phoenix Data Model

HBase Table Column Family A Column Family B

Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value

Row Key 2 Value Value

Row Key 3 Value

Phoenix maps HBase data model to the relational world

Page 53: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

HBase Table Column Family A Column Family B

Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value

Row Key 2 Value Value

Row Key 3 Value

HBase Table Column Family A Column Family B

Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value

Row Key 2 Value Value

Row Key 3 Value

Phoenix Data Model

HBase Table Column Family A Column Family B

Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value

Row Key 2 Value Value

Row Key 3 Value

Phoenix maps HBase data model to the relational world

Multiple Versions

Page 54: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Phoenix Data Model

HBase Table Column Family A Column Family B

Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value

Row Key 2 Value Value

Row Key 3 Value

Phoenix maps HBase data model to the relational world Phoenix Table

Page 55: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Phoenix Data Model

HBase Table Column Family A Column Family B

Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value

Row Key 2 Value Value

Row Key 3 Value

Phoenix maps HBase data model to the relational world Phoenix Table

Key Value Columns

Page 56: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Phoenix Data Model

HBase Table Column Family A Column Family B

Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 Value

Row Key 2 Value Value

Row Key 3 Value

Phoenix maps HBase data model to the relational world Phoenix Table

Key Value Columns Row Key Columns

Page 57: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Phoenix Metadata

Completed

l  Stored in a Phoenix HBase table

Page 58: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Phoenix Metadata

Completed

l  Stored in a Phoenix HBase table l  SYSTEM.TABLE

Page 59: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Phoenix Metadata

Completed

l  Stored in a Phoenix HBase table l  Updated through DDL commands

Page 60: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Phoenix Metadata

Completed

l  Stored in a Phoenix HBase table l  Updated through DDL commands

l  CREATE TABLE l  ALTER TABLE l  DROP TABLE l  CREATE INDEX l  DROP INDEX

Page 61: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Phoenix Metadata

Completed

l  Stored in a Phoenix HBase table l  Updated through DDL commands l  Keeps older versions as schema evolves

Page 62: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Phoenix Metadata

Completed

l  Stored in a Phoenix HBase table l  Updated through DDL commands l  Keeps older versions as schema evolves l  Correlates timestamps between schema and data

Page 63: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Phoenix Metadata

Completed

l  Stored in a Phoenix HBase table l  Updated through DDL commands l  Keeps older versions as schema evolves l  Correlates timestamps between schema and data

l  Flashback queries use schema that was in-place then

Page 64: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Phoenix Metadata

Completed

l  Stored in a Phoenix HBase table l  Updated through DDL commands l  Keeps older versions as schema evolves l  Correlates timestamps between schema and data l  Accessible via JDBC metadata APIs

Page 65: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Phoenix Metadata

Completed

l  Stored in a Phoenix HBase table l  Updated through DDL commands l  Keeps older versions as schema evolves l  Correlates timestamps between schema and data l  Accessible via JDBC metadata APIs

l  java.sql.DatabaseMetaData l  Through Phoenix queries!

Page 66: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Example

Row Key

SERVER METRICS

HOST VARCHAR DATE DATE RESPONSE_TIME INTEGER GC_TIME INTEGER CPU_TIME INTEGER IO_TIME INTEGER …

Over metrics data for clusters of servers with a schema like this:

Page 67: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Example Over metrics data for clusters of servers with a schema like this:

Key Values

SERVER METRICS

HOST VARCHAR DATE DATE RESPONSE_TIME INTEGER GC_TIME INTEGER CPU_TIME INTEGER IO_TIME INTEGER …

Page 68: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

With 90 days of data that looks like this:

SERVER METRICS HOST DATE RESPONSE_TIME GC_TIME

sf1.s1 Jun 5 10:10:10.234 1234 sf1.s1 Jun 5 11:18:28.456 8012 … sf3.s1 Jun 5 10:10:10.234 2345 sf3.s1 Jun 6 12:46:19.123 2340 sf7.s9 Jun 4 08:23:23.456 5002 1234 …

Example

Page 69: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Example Walk through query processing for three scenarios

Page 70: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Example Walk through query processing for three scenarios

1.  Chart Response Time Per Cluster

Page 71: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Example Walk through query processing for three scenarios

1.  Chart Response Time Per Cluster

2.  Identify 5 Longest GC Times

Page 72: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Example Walk through query processing for three scenarios

1.  Chart Response Time Per Cluster

2.  Identify 5 Longest GC Times

3.  Identify 5 Longest GC Times again and again

Page 73: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Scenario 1 Chart Response Time Per Cluster

Completed

SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() – 7 AND substr(host, 1, 3) IN (‘sf1’, ‘sf3, ‘sf7’) GROUP BY substr(host, 1, 3), trunc(date,’DAY’)

Page 74: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Scenario 1 Chart Response Time Per Cluster

Completed

SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() – 7 AND substr(host, 1, 3) IN (‘sf1’, ‘sf3, ‘sf7’) GROUP BY substr(host, 1, 3), trunc(date,’DAY’)

Page 75: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Scenario 1 Chart Response Time Per Cluster

Completed

SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() – 7 AND substr(host, 1, 3) IN (‘sf1’, ‘sf3, ‘sf7’) GROUP BY substr(host, 1, 3), trunc(date,’DAY’)

Page 76: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Scenario 1 Chart Response Time Per Cluster

Completed

SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() – 7 AND substr(host, 1, 3) IN (‘sf1’, ‘sf3, ‘sf7’) GROUP BY substr(host, 1, 3), trunc(date,’DAY’)

Page 77: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Scenario 1 Chart Response Time Per Cluster

Completed

SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() – 7 AND substr(host, 1, 3) IN (‘sf1’, ‘sf3, ‘sf7’) GROUP BY substr(host, 1, 3), trunc(date,’DAY’)

Page 78: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Step 1: Client Identify Row Key Ranges from Query

Completed

SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() – 7 AND substr(host, 1, 3) IN (‘sf1’, ‘sf3’, ‘sf7’) GROUP BY substr(host, 1, 3), trunc(date,’DAY’)

Row Key Ranges HOST DATE

Page 79: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Step 1: Client Identify Row Key Ranges from Query

Completed

SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() – 7 AND substr(host, 1, 3) IN (‘sf1’, ‘sf3’, ‘sf7’) GROUP BY substr(host, 1, 3), trunc(date,’DAY’)

Row Key Ranges HOST DATE

Page 80: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Step 1: Client Identify Row Key Ranges from Query

Completed

SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() – 7 AND substr(host, 1, 3) IN (‘sf1’, ‘sf3’, ‘sf7’) GROUP BY substr(host, 1, 3), trunc(date,’DAY’)

Row Key Ranges HOST DATE

Page 81: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Step 1: Client Identify Row Key Ranges from Query

Completed

SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() – 7 AND substr(host, 1, 3) IN (‘sf1’, ‘sf3’, ‘sf7’) GROUP BY substr(host, 1, 3), trunc(date,’DAY’)

Row Key Ranges HOST DATE sf1

Page 82: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Step 1: Client Identify Row Key Ranges from Query

Completed

SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() – 7 AND substr(host, 1, 3) IN (‘sf1’, ‘sf3’, ‘sf7’) GROUP BY substr(host, 1, 3), trunc(date,’DAY’)

Row Key Ranges HOST DATE sf1 sf3

Page 83: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Step 1: Client Identify Row Key Ranges from Query

Completed

SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() – 7 AND substr(host, 1, 3) IN (‘sf1’, ‘sf3’, ‘sf7’) GROUP BY substr(host, 1, 3), trunc(date,’DAY’)

Row Key Ranges HOST DATE sf1 sf3 sf7

Page 84: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Step 1: Client Identify Row Key Ranges from Query

Completed

SELECT substr(host,1,3), trunc(date,’DAY’), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() – 7 AND substr(host, 1, 3) IN (‘sf1’, ‘sf3’, ‘sf7’) GROUP BY substr(host, 1, 3), trunc(date,’DAY’)

Row Key Ranges HOST DATE sf1 t1 - * sf3 sf7

Page 85: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Step 2: Client Overlay Row Key Ranges with Regions

Completed

R1

R2

R3

R4

sf1

sf4

sf6

sf1 sf3

sf7

Page 86: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Step 3: Client Execute Parallel Scans

Completed

R1

R2

R3

R4

sf1

sf4

sf6

sf1

sf3

sf7

scan1

scan3

scan2

Page 87: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Step 4: Server Filter using Skip Scan

Completed

sf1.s1 t0 SKIP

Page 88: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Step 4: Server Filter using Skip Scan

Completed

sf1.s1 t1 INCLUDE

Page 89: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Step 4: Server Filter using Skip Scan

Completed

sf1.s2 t0 SKIP

Page 90: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Step 4: Server Filter using Skip Scan

Completed sf1.s2 t1 INCLUDE

Page 91: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Step 4: Server Filter using Skip Scan

sf1.s3 t0 SKIP

Page 92: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Step 4: Server Filter using Skip Scan

sf1.s3 t1 INCLUDE

Page 93: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

SERVER METRICS HOST DATE sf1.s1 Jun 2 10:10:10.234 sf1.s2 Jun 3 23:05:44.975 sf1.s2 Jun 9 08:10:32.147 sf1.s3 Jun 1 11:18:28.456 sf1.s3 Jun 3 22:03:22.142 sf1.s4 Jun 1 10:29:58.950 sf1.s4 Jun 2 14:55:34.104 sf1.s4 Jun 3 12:46:19.123 sf1.s5 Jun 8 08:23:23.456 sf1.s6 Jun 1 10:31:10.234

Step 5: Server Intercept Scan in Coprocessor

SERVER METRICS HOST DATE AGG sf1 Jun 1 … sf1 Jun 2 … sf1 Jun 3 … sf1 Jun 8 … sf1 Jun 9 …

Page 94: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Step 6: Client Perform Final Merge Sort

Completed

R1

R2

R3

R4

scan1

scan3

scan2

SERVER METRICS HOST DATE AGG sf1 Jun 5 … sf1 Jun 9 … sf3 Jun 1 … sf3 Jun 2 … sf7 Jun 1 … sf7 Jun 8 …

Page 95: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Scenario 2 Find 5 Longest GC Times

Completed

SELECT host, date, gc_time FROM server_metrics WHERE date > CURRENT_DATE() – 7 AND substr(host, 1, 3) IN (‘sf1’, ‘sf3, ‘sf7’) ORDER BY gc_time DESC LIMIT 5

Page 96: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Scenario 2 Find 5 Longest GC Times

•  Same client parallelization and server skip scan filtering

Page 97: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Scenario 2 Find 5 Longest GC Times

Completed

•  Same client parallelization and server skip scan filtering •  Server holds 5 longest GC_TIME value for each scan

R1

SERVER METRICS HOST DATE GC_TIME sf1.s1 Jun 2 10:10:10.234 22123

sf1.s1 Jun 3 23:05:44.975 19876

sf1.s1 Jun 9 08:10:32.147 11345

sf1.s2 Jun 1 11:18:28.456 10234

sf1.s2 Jun 3 22:03:22.142 10111

Page 98: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

SERVER METRICS HOST DATE GC_TIME sf1.s1 Jun 2 10:10:10.234 22123

sf1.s1 Jun 3 23:05:44.975 19876

sf1.s1 Jun 9 08:10:32.147 11345

sf1.s2 Jun 1 11:18:28.456 10234

sf1.s2 Jun 3 22:03:22.142 10111

Scenario 2 Find 5 Longest GC Times

•  Same client parallelization and server skip scan filtering •  Server holds 5 longest GC_TIME value for each scan •  Client performs final merge sort among parallel scans

Scan1

Scan2

Scan3

Page 99: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Scenario 3 Find 5 Longest GC Times

Completed

CREATE INDEX gc_time_index ON server_metrics (gc_time DESC, date DESC) INCLUDE (host, response_time)

Page 100: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Scenario 3 Find 5 Longest GC Times

Completed

CREATE INDEX gc_time_index ON server_metrics (gc_time DESC, date DESC) INCLUDE (host, response_time)

Page 101: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Scenario 3 Find 5 Longest GC Times

Completed

CREATE INDEX gc_time_index ON server_metrics (gc_time DESC, date DESC) INCLUDE (host, response_time)

Page 102: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Scenario 3 Find 5 Longest GC Times

Completed

CREATE INDEX gc_time_index ON server_metrics (gc_time DESC, date DESC) INCLUDE (host, response_time)

Row Key

GC_TIME_INDEX GC_TIME INTEGER DATE DATE HOST VARCHAR RESPONSE_TIME INTEGER

Page 103: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Scenario 3 Find 5 Longest GC Times

Completed

CREATE INDEX gc_time_index ON server_metrics (gc_time DESC, date DESC) INCLUDE (host, response_time)

Key Value

GC_TIME_INDEX GC_TIME INTEGER DATE DATE HOST VARCHAR RESPONSE_TIME INTEGER

Page 104: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Scenario 3 Find 5 Longest GC Times

Completed

SELECT host, date, gc_time FROM server_metrics WHERE date > CURRENT_DATE() – 7 AND substr(host, 1, 3) IN (‘sf1’, ‘sf3, ‘sf7’) ORDER BY gc_time DESC LIMIT 5

Page 105: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Demo

Completed

l Phoenix Stock Analyzer l Fortune 500 companies l 10 years of historical stock prices l Demonstrates Skip Scan in action l Running locally on my single node laptop cluster

Page 106: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Phoenix Roadmap

Completed

l  Secondary Indexing l  Count distinct and percentile l  Derived tables l  Hash Joins l  Apache Drill integration l  Cost-based query optimizer l  OLAP extensions l  Transactions

Page 107: @JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Thank you! Questions/comments?