Software and Services Group
SQL (92 and Beyond) Support for Hive
Jason Dai, Principal Engineer
Intel SSG (Software and Services Group)
What SQL support is needed?
More SQL-92 support for analytics
• Complete SQL data type system
  – Data types (e.g., datetime, fixed-precision numbers), type conversion rules and functions (CAST), datetime expressions and functions (e.g., EXTRACT, +/- INTERVAL), etc.
• Full subquery support
  – Subqueries in WHERE clauses, correlated subqueries, scalar subqueries, etc.
  – New expressions (EXISTS, ALL, ANY, etc.)
• Complete set operators
  – UNION (DISTINCT), INTERSECT, EXCEPT, etc.
• Multiple-table SELECT statement
• UPDATE/DELETE?
  – On HBase only?
(Almost) SQL-92 compliance?
• How about transactions?
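Why fixed-precision numbers matter for analytics can be shown with a small Python sketch. Python is used only for illustration: the standard `decimal` module stands in for a SQL DECIMAL(p, s) type, and the helper `to_decimal` is hypothetical, approximating CAST(... AS DECIMAL(p, s)) without precision-overflow checks. Summing money amounts as binary floating point (DOUBLE) accumulates rounding error, while a scaled decimal type returns the exact result.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_decimal(value: str, scale: int = 2) -> Decimal:
    """Hypothetical helper approximating CAST(value AS DECIMAL(p, scale)).

    Rounds half-up to `scale` fractional digits; checking the total
    precision p is deliberately omitted in this sketch.
    """
    quantum = Decimal(1).scaleb(-scale)  # scale=2 -> Decimal('0.01')
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)


prices = ["0.10", "0.20", "0.30"]

# DOUBLE semantics: 0.1 has no exact binary representation.
double_total = sum(float(p) for p in prices)

# DECIMAL semantics: exact base-10 arithmetic at a fixed scale.
decimal_total = sum((to_decimal(p) for p in prices), Decimal("0.00"))

print(double_total)           # 0.6000000000000001
print(decimal_total)          # 0.60
print(to_decimal("2.345"))    # 2.35
```

The same gap applies to comparisons: a predicate such as `total = 0.60` would fail on the DOUBLE result but succeed on the DECIMAL one.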
What SQL support is needed (continued)?
Additional analytics support (beyond SQL-92)
• Advanced OLAP functions for analysis and reporting
  – e.g., RANK, ROLLUP, CUBE, window functions (SQL:2003), etc.
• Advanced SQL syntax
  – e.g., WITH clause (SQL-99)
• Procedural extensions
  – e.g., BEGIN, END, IF…THEN…ELSE, LOOP/EXIT/CONTINUE, etc.
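ROLLUP and CUBE are shorthand for a list of grouping sets, so an engine without native support can expand them and aggregate each set separately (for example, as a UNION ALL of ordinary GROUP BY queries). The Python sketch below illustrates that expansion under this assumption; the function names are hypothetical, and `None` stands in for the NULL that SQL emits in rolled-up columns (a real engine also offers GROUPING() to tell those NULLs apart from NULLs in the data).

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def rollup_sets(cols: Sequence[str]) -> List[Tuple[str, ...]]:
    # ROLLUP(a, b) == GROUPING SETS ((a, b), (a), ()): every prefix, longest first.
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]


def cube_sets(cols: Sequence[str]) -> List[Tuple[str, ...]]:
    # CUBE(a, b) == GROUPING SETS ((a, b), (a), (b), ()): all 2**n subsets.
    return [c for r in range(len(cols), -1, -1) for c in combinations(cols, r)]


def grouping_sets_sum(rows, cols, measure, sets):
    # One GROUP BY per grouping set; columns outside the set are output as None.
    result = []
    for gset in sets:
        totals: Dict[Tuple, int] = defaultdict(int)
        for row in rows:
            key = tuple(row[c] if c in gset else None for c in cols)
            totals[key] += row[measure]
        result.extend(totals.items())
    return result


rows = [
    {"year": 2012, "region": "EU", "sales": 10},
    {"year": 2012, "region": "US", "sales": 20},
    {"year": 2013, "region": "EU", "sales": 5},
]
cols = ["year", "region"]
result = grouping_sets_sum(rows, cols, "sales", rollup_sets(cols))
for key, total in result:
    print(key, total)
```

The loop prints per-(year, region) totals, then per-year subtotals such as `(2012, None) 30`, then the grand total `(None, None) 35`.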
Workload Analysis
TPC-H TPC-DS
Complex Subquery Y Y
Multiple-table SELECT Y Y
Set operators Y
SQL data types (especially Datetime) Y Y
Advanced OLAP functions (e.g., rank, grouping and window functions) Y
WITH clause (SQL-99) Y
UPDATE/DELETE Y
Let’s Get Our Hands Dirty
Query → Parser → AST (Abstract Syntax Tree) → Semantic Analyzer (Optimizer) → Execution Plan → Execution
(Almost) SQL-compliant Hive parser
• A lot of work: SQL is much more complex than HiveQL
  – HiveQL grammar file: ~61 KB, 2,487 lines
  – SQL (with PL/SQL extensions) grammar file: ~524 KB, 8,583 lines
• Also complex: many existing Hive grammar rules need to be changed
  – To support more complex SQL constructs (e.g., subqueries)
UDF/UDAF/UDTF
• For some operators (e.g., rank)
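Some order-dependent operators such as rank can be approximated by a stateful UDF, provided the query first clusters and sorts the rows (in Hive, e.g., with DISTRIBUTE BY and SORT BY) so that each UDF instance sees one partition's rows in order. The Python sketch below models that per-row evaluation with SQL RANK() semantics (ties share a rank, and a gap follows a tie); the class and method names are hypothetical and do not correspond to Hive's Java UDF interfaces.

```python
from typing import Any


class StatefulRank:
    """Hypothetical sketch of a stateful, per-row rank function.

    Assumes rows arrive grouped by partition key and sorted by the
    ORDER BY value within each partition, and that one instance
    processes a single ordered stream of rows.
    """

    _UNSET = object()  # sentinel: no previous partition/value seen yet

    def __init__(self) -> None:
        self._partition: Any = self._UNSET
        self._prev_value: Any = self._UNSET
        self._row_number = 0
        self._rank = 0

    def evaluate(self, partition_key: Any, order_value: Any) -> int:
        if partition_key != self._partition:
            # New partition: restart numbering.
            self._partition = partition_key
            self._prev_value = self._UNSET
            self._row_number = 0
        self._row_number += 1
        if order_value != self._prev_value:
            # Not tied with the previous row: rank jumps to the row number.
            self._rank = self._row_number
            self._prev_value = order_value
        return self._rank


# Rows already sorted by region, then by sales descending.
rows = [("EU", 300), ("EU", 200), ("EU", 200), ("EU", 100), ("US", 50), ("US", 40)]
udf = StatefulRank()
ranks = [udf.evaluate(region, sales) for region, sales in rows]
print(ranks)  # [1, 2, 2, 4, 1, 2]
```

The correctness of this approach depends entirely on the input ordering; if rows of one partition were split across reducers or arrived unsorted, the ranks would be wrong, which is one reason native window functions are preferable.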
Let’s Get Our Hands Dirty
Query → Parser → AST (Abstract Syntax Tree) → Semantic Analyzer (Optimizer) → Execution Plan → Execution
Analysis, transformation & optimization
• SQL data type system
• Subquery support (incl. subquery unnesting)
• Multiple-table SELECT
• Set operations
• Advanced OLAP functions
• …
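Subquery unnesting replaces per-row re-evaluation of a correlated subquery with a join. One standard rewrite (not necessarily the exact transformation Panthera performs) turns `SELECT * FROM orders o WHERE EXISTS (SELECT 1 FROM lineitem l WHERE l.orderkey = o.orderkey AND l.qty > 10)` into a semi-join, which HiveQL can express as `LEFT SEMI JOIN`. The Python sketch below compares both evaluation strategies on in-memory rows; the table and column names are hypothetical, TPC-H-style examples.

```python
from typing import Dict, List

Row = Dict[str, int]


def exists_nested(orders: List[Row], lineitems: List[Row]) -> List[Row]:
    # Correlated evaluation: the EXISTS subquery is re-run for every outer row,
    # costing O(len(orders) * len(lineitems)).
    return [
        o for o in orders
        if any(item["orderkey"] == o["orderkey"] and item["qty"] > 10 for item in lineitems)
    ]


def exists_unnested(orders: List[Row], lineitems: List[Row]) -> List[Row]:
    # Unnested form: apply the inner filter once, collect the correlation keys,
    # then semi-join. The set lookup keeps each outer row at most once, whereas
    # an inner join would duplicate orders that have several matching items.
    keys = {item["orderkey"] for item in lineitems if item["qty"] > 10}
    return [o for o in orders if o["orderkey"] in keys]


orders = [
    {"orderkey": 1, "custkey": 7},
    {"orderkey": 2, "custkey": 8},
    {"orderkey": 3, "custkey": 7},
]
lineitems = [
    {"orderkey": 1, "qty": 12},
    {"orderkey": 1, "qty": 20},
    {"orderkey": 3, "qty": 3},
]
print(exists_unnested(orders, lineitems))  # [{'orderkey': 1, 'custkey': 7}]
```

Order 1 has two qualifying line items yet appears once, which is exactly why the rewrite targets a semi-join rather than a plain inner join.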
How to Leverage Existing Work?
Project Panthera:
• Our open source effort to enable better analytics capabilities on Hadoop/HBase
• https://github.com/intel-hadoop/project-panthera
A SQL engine for Hive MapReduce
Figure: a Query enters the Driver. HiveQL takes the existing path Hive Parser → Hive-AST → Hive Semantic Analyzer → Hadoop MR. SQL goes through an (open source) SQL Parser* → SQL-AST → SQL-AST Analyzer & Translator (Multi-Table SELECT, Subquery Unnesting, INTERSECT Support, MINUS Support, …) → Hive-AST, which then enters the same Hive Semantic Analyzer.
*https://github.com/porcelli/plsql-parser
Goal: full analytical SQL support for OLAP
• Subquery in WHERE clause
• Correlated subquery
• Multiple-table SELECT statement
• …
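A translator can express INTERSECT and MINUS (EXCEPT) with constructs HiveQL already had, such as UNION ALL plus GROUP BY … HAVING. One common rewrite, shown here as an assumption rather than as Panthera's actual output, is `SELECT c FROM (SELECT c, 0 AS tag FROM a UNION ALL SELECT c, 1 AS tag FROM b) t GROUP BY c HAVING MIN(tag) = 0 AND MAX(tag) = 1` for INTERSECT, and `HAVING MAX(tag) = 0` for MINUS. The Python sketch below mirrors that rewrite step by step on in-memory rows; all function names are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Tuple


def _union_all_tagged(left: List[Row], right: List[Row]) -> List[Tuple[Row, int]]:
    # SELECT cols, 0 AS tag FROM left UNION ALL SELECT cols, 1 AS tag FROM right
    return [(row, 0) for row in left] + [(row, 1) for row in right]


def _group_tags(tagged: List[Tuple[Row, int]]) -> Dict[Row, Tuple[int, int]]:
    # GROUP BY cols, tracking MIN(tag) and MAX(tag) for each group
    groups: Dict[Row, Tuple[int, int]] = {}
    for row, tag in tagged:
        lo, hi = groups.get(row, (tag, tag))
        groups[row] = (min(lo, tag), max(hi, tag))
    return groups


def intersect_distinct(left: List[Row], right: List[Row]) -> List[Row]:
    groups = _group_tags(_union_all_tagged(left, right))
    # HAVING MIN(tag) = 0 AND MAX(tag) = 1: the row occurs on both sides
    return sorted(r for r, (lo, hi) in groups.items() if lo == 0 and hi == 1)


def except_distinct(left: List[Row], right: List[Row]) -> List[Row]:
    groups = _group_tags(_union_all_tagged(left, right))
    # HAVING MAX(tag) = 0: the row occurs only on the left side
    return sorted(r for r, (lo, hi) in groups.items() if hi == 0)


a = [(1, "x"), (2, "y"), (2, "y"), (3, "z")]
b = [(2, "y"), (3, "z"), (4, "w")]
print(intersect_distinct(a, b))  # [(2, 'y'), (3, 'z')]
print(except_distinct(a, b))     # [(1, 'x')]
```

Grouping on the full row gives the DISTINCT semantics of INTERSECT and MINUS for free; the ALL variants would instead need per-side counts.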
How to Leverage Existing Work?
NexR Hive UDFs
• https://github.com/nexr/hive-udf
• UDFs for Oracle DB extensions (rank, decode, nvl, etc.)
SQL windowing functions for Hive
• https://github.com/hbutani/SQLWindowing
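Unlike rank-style UDFs, a window function computes each row's result over a frame of neighboring rows. As an illustration of the semantics only (not of the SQLWindowing project's API, which is not described here), the Python sketch below evaluates `SUM(value) OVER (PARTITION BY key ORDER BY t ROWS BETWEEN n PRECEDING AND CURRENT ROW)`; the function name and row layout are hypothetical.

```python
from collections import deque
from itertools import groupby
from operator import itemgetter
from typing import List, Tuple


def moving_sum(rows: List[Tuple[str, int, int]], preceding: int) -> List[Tuple[str, int, int, int]]:
    """Rows are (partition_key, order_key, value); returns each row plus its window sum."""
    out = []
    ordered = sorted(rows, key=itemgetter(0, 1))  # PARTITION BY key, ORDER BY t
    for _, part in groupby(ordered, key=itemgetter(0)):
        # ROWS frame: the current row plus up to `preceding` earlier rows,
        # reset at each partition boundary.
        frame = deque(maxlen=preceding + 1)
        for key, t, value in part:
            frame.append(value)
            out.append((key, t, value, sum(frame)))
    return out


rows = [("US", 2, 7), ("EU", 1, 10), ("EU", 3, 30), ("US", 1, 5), ("EU", 2, 20)]
for r in moving_sum(rows, preceding=1):
    print(r)
```

With `preceding=1`, the EU rows get window sums 10, 30, and 50, and the US rows get 5 and 12, because the frame never crosses a partition boundary.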