Software and Services Group SQL (92 and Beyond) Support for Hive Jason Dai Principal Engineer Intel SSG (Software and Services Group)

Page 1:

Software and Services Group

SQL (92 and Beyond) Support for Hive

Jason Dai, Principal Engineer

Intel SSG (Software and Services Group)

Page 2:

What SQL support is needed?

More SQL-92 support for analytics

• Complete SQL data type system
  – Data types (e.g., Datetime, fixed-precision numbers), type conversion rules and function (CAST), Datetime expressions and functions (e.g., extract, +/- interval), etc.

• Full subquery support
  – Subquery in WHERE clauses, correlated subquery, scalar subquery, etc.
  – New expressions (EXISTS, ALL, ANY, etc.)

• Complete set operators
  – DISTINCT UNION, INTERSECT, EXCEPT, etc.

• Multiple-table SELECT statement

• Update/delete?
  – On HBase only?
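To make the subquery and set-operator items above concrete, here is a minimal sketch of the query shapes in question. It runs against SQLite through Python's standard `sqlite3` module purely so the SQL is executable; it is not Hive, and the `orders` and `returns` tables and their contents are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for a SQL-92 capable engine.
# Table names and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders  (order_id INTEGER, customer TEXT, amount REAL);
    CREATE TABLE returns (order_id INTEGER, customer TEXT);
    INSERT INTO orders  VALUES (1, 'ann', 10.0), (2, 'ann', 30.0),
                               (3, 'bob', 20.0), (4, 'cy', 5.0);
    INSERT INTO returns VALUES (2, 'ann'), (4, 'cy');
""")

def run(sql):
    """Execute a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# Correlated subquery with EXISTS: customers with at least one return.
with_returns = run("""
    SELECT DISTINCT o.customer FROM orders o
    WHERE EXISTS (SELECT 1 FROM returns r WHERE r.customer = o.customer)
    ORDER BY o.customer
""")

# Scalar subquery in WHERE: orders above the overall average amount.
above_avg = run("""
    SELECT order_id FROM orders
    WHERE amount > (SELECT AVG(amount) FROM orders)
    ORDER BY order_id
""")

# Set operators: EXCEPT and INTERSECT over the customer columns.
never_returned = run("""
    SELECT customer FROM orders
    EXCEPT
    SELECT customer FROM returns
    ORDER BY customer
""")
ordered_and_returned = run("""
    SELECT customer FROM orders
    INTERSECT
    SELECT customer FROM returns
    ORDER BY customer
""")

print(with_returns, above_avg, never_returned, ordered_and_returned)
```

Each of these four statements uses a construct from the list above (EXISTS with a correlated subquery, a scalar subquery, EXCEPT, INTERSECT), which is the kind of query the slide says HiveQL needs to accept.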

(Almost) SQL-92 compliance?

• How about transactions?

Page 3:

What SQL support is needed (continued)?

Additional analytics support (beyond SQL-92)

• Advanced OLAP functions for analysis and reporting
  – E.g., rank, rollup, cube, window function (SQL 2003), etc.

• Advanced SQL syntax
  – E.g., WITH clause (SQL-99)

• Procedural extensions
  – E.g., Begin, End, If…Then…Else, Loop/Exit/Continue, etc.
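As an illustration of two of the items above, the sketch below combines a WITH clause and a `RANK()` window function in one query. It uses SQLite via Python's `sqlite3` module only to make the SQL runnable (window functions need SQLite 3.25 or newer); the `sales` table and its rows are hypothetical, and nothing here reflects how Hive would execute the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, rep TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('east', 'ann', 300), ('east', 'bob', 500),
                             ('west', 'cy', 400), ('west', 'dee', 400),
                             ('west', 'eli', 100);
""")

# WITH clause (SQL-99) naming an intermediate result, and a window
# function (SQL 2003) ranking reps by amount within each region.
top_reps = conn.execute("""
    WITH ranked AS (
        SELECT region, rep, amount,
               RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk
        FROM sales
    )
    SELECT region, rep, rnk
    FROM ranked
    WHERE rnk = 1
    ORDER BY region, rep
""").fetchall()

print(top_reps)
```

Ties share a rank under `RANK()`, so both `cy` and `dee` come back as rank 1 for the `west` region.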

Page 4:

Workload Analysis

TPC-H TPC-DS

Complex Subquery Y Y

Multiple-table SELECT Y Y

Set operators Y

SQL data types (especially Datetime) Y Y

Advanced OLAP functions (e.g., rank, grouping and window functions) Y

WITH clause (SQL-99) Y

UPDATE/DELETE Y

Page 5:

Let’s Get Our Hands Dirty

[Diagram: Query → Parser → AST (Abstract Syntax Tree) → Semantic Analyzer (Optimizer) → Execution Plan → Execution]

(Almost) SQL-compliant Hive parser

• A lot of work: SQL is much more complex than HiveQL
  – HiveQL grammar file: ~61KB with 2487 lines
  – SQL (with PL/SQL extensions) grammar file: ~524KB with 8583 lines

• Also complex: many existing Hive grammar rules need to be changed
  – To support more complex SQL constructs (e.g., subquery)

UDF/UDAF/UDTF

• For some operators (e.g., rank)

Page 6:

Let’s Get Our Hands Dirty

[Diagram: Query → Parser → AST (Abstract Syntax Tree) → Semantic Analyzer (Optimizer) → Execution Plan → Execution]

Analysis, transformation & optimization

• SQL data type system

• Subquery support (incl. subquery unnesting)

• Multiple-table SELECT

• Set operations

• Advanced OLAP functions

• …
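Subquery unnesting, mentioned in the list above, means rewriting a nested (often correlated) subquery into an equivalent join so the engine can plan it as ordinary join and aggregation work. The sketch below shows one classic rewrite, a correlated scalar aggregate turned into a join against a grouped derived table, and checks that both forms return the same rows. It runs on SQLite via Python's `sqlite3` for convenience; the `emp` table is hypothetical, and this is not presented as the rewrite rule the talk's implementation actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (name TEXT, dept_id INTEGER, salary INTEGER);
    INSERT INTO emp VALUES ('ann', 1, 100), ('bob', 1, 200),
                           ('cy',  2, 150), ('eve', 2, 250),
                           ('dee', 3, 50);
""")

# Nested form: the subquery is re-evaluated per outer row (correlated on dept_id).
nested_sql = """
    SELECT e.name FROM emp e
    WHERE e.salary > (SELECT AVG(e2.salary) FROM emp e2
                      WHERE e2.dept_id = e.dept_id)
    ORDER BY e.name
"""

# Unnested form: compute the per-department average once, then join to it.
unnested_sql = """
    SELECT e.name FROM emp e
    JOIN (SELECT dept_id, AVG(salary) AS avg_salary
          FROM emp GROUP BY dept_id) a
      ON e.dept_id = a.dept_id
    WHERE e.salary > a.avg_salary
    ORDER BY e.name
"""

nested_rows = conn.execute(nested_sql).fetchall()
unnested_rows = conn.execute(unnested_sql).fetchall()
print(nested_rows, unnested_rows)
```

The unnested form only needs a subquery in the FROM clause plus a join, which is why this kind of transformation is a natural fit for the analysis and transformation stage rather than the execution stage.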

Page 7:

Project Panthera:

• Our open source efforts to enable better analytics capabilities on Hadoop/HBase

• https://github.com/intel-hadoop/project-panthera

How to Leverage Existing Works?

[Diagram: a SQL front end added alongside the existing Hive front end]

• HiveQL path: Query → Driver → Hive Parser → Hive-AST → Hive Semantic Analyzer → Hadoop MR

• SQL path: Query → Driver → SQL Parser* (open source) → SQL-AST → SQL-AST Analyzer & Translator (Multi-Table SELECT, Subquery Unnesting, …) → Hive-AST → Hive Semantic Analyzer (with INTERSECT support and MINUS support) → Hadoop MR

*https://github.com/porcelli/plsql-parser

A SQL engine for Hive MapReduce

• Goal: full analytical SQL support for OLAP
  – Subquery in WHERE clause
  – Correlated subquery
  – Multiple-table SELECT statement
  – …
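The SQL-AST Analyzer & Translator step can be pictured as a tree-to-tree rewrite. The toy sketch below handles one case named on this slide, a multiple-table SELECT, by turning `FROM t1, t2 WHERE t1.a = t2.b` into explicit `JOIN … ON` syntax. Every class and function name here (`Eq`, `MultiTableSelect`, `to_explicit_joins`) is hypothetical, the "AST" is a pair of small dataclasses, and the assumption that the target dialect wants explicit joins is mine; none of this is taken from the Panthera code base.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Eq:
    """Equality predicate between two qualified columns, e.g. a.x = b.y."""
    left: str   # "table.column"
    right: str  # "table.column"

@dataclass
class MultiTableSelect:
    """Toy AST for: SELECT columns FROM t1, t2, ... WHERE p1 AND p2 ..."""
    columns: List[str]
    tables: List[str]
    where: List[Eq]

def table_of(column: str) -> str:
    """Return the table part of a qualified column name."""
    return column.split(".")[0]

def to_explicit_joins(q: MultiTableSelect) -> str:
    """Rewrite a comma-separated FROM list into a chain of JOIN ... ON."""
    joined = [q.tables[0]]
    remaining = list(q.where)
    sql = f"SELECT {', '.join(q.columns)} FROM {q.tables[0]}"
    for t in q.tables[1:]:
        visible = set(joined) | {t}
        # Predicates that mention t and only tables already in the join chain.
        conds = [p for p in remaining
                 if t in (table_of(p.left), table_of(p.right))
                 and {table_of(p.left), table_of(p.right)} <= visible]
        if not conds:
            raise ValueError(f"no join predicate found for table {t!r}")
        remaining = [p for p in remaining if p not in conds]
        on = " AND ".join(f"{p.left} = {p.right}" for p in conds)
        sql += f" JOIN {t} ON ({on})"
        joined.append(t)
    if remaining:  # predicates not consumed as join conditions stay in WHERE
        sql += " WHERE " + " AND ".join(f"{p.left} = {p.right}" for p in remaining)
    return sql

query = MultiTableSelect(
    columns=["orders.id", "customer.name"],
    tables=["orders", "customer", "nation"],
    where=[Eq("orders.cust_id", "customer.id"),
           Eq("customer.nation_id", "nation.id")],
)
translated = to_explicit_joins(query)
print(translated)
```

A real translator would work on the parser's actual tree nodes and handle non-equality predicates, aliases, and join ordering; the point of the sketch is only that the SQL-specific construct is removed before the tree reaches the existing Hive semantic analyzer.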

Page 8:

NexR Hive UDFs

• https://github.com/nexr/hive-udf

• UDFs for Oracle DB extensions (rank, decode, nvl, etc.)

SQL windowing functions for Hive

• https://github.com/hbutani/SQLWindowing

How to Leverage Existing Works?
