Big Data Query Landscape: N1QL and More: Couchbase Connect 2015

BIG DATA QUERY LANDSCAPE – N1QL AND MORE Yingyi Bu | Couchbase

Transcript of Big Data Query Landscape: N1QL and More: Couchbase Connect 2015

Page 1: Big Data Query Landscape: N1QL and More: Couchbase Connect 2015



©2015 Couchbase Inc. 2

About Myself

Sr. Software Engineer @ Couchbase
Committer @ AsterixDB (Research Project under Apache Incubation)
PhD Student @ UC Irvine
N1QL
[email protected]
@buyingyi


Agenda

Introduction
Operational Query Processing
Analytical Query Processing
Comparison and Unification
Summary


Introduction


Introduction

Figure: big data query landscape diagram with the labels NoSQL, SQL-on-Hadoop, ETL, Connector, SQL++, Unification, and Research Projects.


Introduction

Language Unification Research (SQL++): SQL backward compatible; rich data model; configurable semantics

System Unification Research: a single language interface; scale-out for both workloads; resource scheduling underneath


Operational Query Processing


Operational Query Processing

import java.net.URI;
import java.util.ArrayList;
import com.couchbase.client.CouchbaseClient;

ArrayList<URI> nodes = new ArrayList<URI>();

// Add one or more nodes of your cluster
nodes.add(URI.create("http://127.0.0.1:8091/pools"));

// Try to connect to the cluster
CouchbaseClient client = null;
try {
    client = new CouchbaseClient(nodes, "default", "");
} catch (Exception e) {
    System.err.println("Error connecting to Couchbase: " + e.getMessage());
    System.exit(1);
}

// Put the key-value pair into Couchbase and wait for the write to complete
client.set("hello", "couchbase!").get();

// Read the value back and cast it to String
String result = (String) client.get("hello");
System.out.println(result);

// Release connections and background threads
client.shutdown();

Put/Get

What if? JSON, filtering, flatten, group-by, aggregation, join, ordering
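With only put and get, every item on the "What if?" list becomes application code. The Java 17 sketch below is hypothetical and is not Couchbase SDK code: an in-memory `KvStore` class stands in for the bucket, and the application must already know every key, fetch each document, and filter, group, and count on its own. The query in the comment states the intent only; the store cannot run it.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a key-value bucket: put and get are the only operations.
final class KvStore {
    private final Map<String, Map<String, Object>> data = new HashMap<>();

    void put(String key, Map<String, Object> doc) {
        data.put(key, doc);
    }

    Map<String, Object> get(String key) {
        return data.get(key);
    }
}

final class ClientSideQuery {
    // Hand-written equivalent of
    //   SELECT category, COUNT(*) FROM bucket WHERE type = "beer" GROUP BY category
    // The caller must supply every key; filtering and grouping happen here, not in the store.
    static Map<String, Integer> countBeersByCategory(KvStore store, List<String> keys) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String key : keys) {
            Map<String, Object> doc = store.get(key);             // one get per key
            if (doc == null || !"beer".equals(doc.get("type"))) {
                continue;                                          // filtering
            }
            if (doc.get("category") instanceof String category) {
                counts.merge(category, 1, Integer::sum);           // group-by + aggregation
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        KvStore store = new KvStore();
        store.put("beer1", Map.of("type", "beer", "category", "North American Lager"));
        store.put("beer2", Map.of("type", "beer", "category", "North American Lager"));
        store.put("brewery1", Map.of("type", "brewery", "name", "bro"));
        // Prints {North American Lager=2}
        System.out.println(countBeersByCategory(store, List.of("beer1", "beer2", "brewery1")));
    }
}
```

Every additional requirement (ordering, joins, nested fields) adds more of this code to each application, which is the gap a query language closes.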


N1QL – SQL for NoSQL

Nested data; heterogeneous data; dynamic typing

[
  { "beer-sample": { "brewery_id": "bro", "abv": {"m1": 1, "m2": 2}, "category": "North American Lager", "type": "beer" } },
  { "beer-sample": { "abv": 9.5, "brewery_id": "brouwerij" } }
]

SELECT category, type, abv.m1 FROM `beer-sample` WHERE type = "beer"

[ { "category": "North American Lager", "type": "beer", "m1": 1 } ]

Standard SELECT pipeline; joins, subqueries, set operators; UNNEST and NEST
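The example query can be traced by hand. A Java 17 sketch that evaluates it over the two sample documents as stored in the bucket (without the `beer-sample` wrapper that `SELECT *` adds), modeling documents as nested maps and MISSING as a sentinel object; the class and method names are illustrative, not N1QL internals. The second document has no `type`, so the WHERE test sees MISSING and drops it, and `abv.m1` is projected under the name `m1`, matching the result shown above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class N1qlExample {
    // Sentinel standing in for N1QL's MISSING (distinct from JSON null).
    static final Object MISSING = new Object();

    // Follow a dotted path such as "abv.m1"; stepping into an absent field
    // or into a non-object value yields MISSING.
    static Object path(Map<String, Object> doc, String dotted) {
        Object current = doc;
        for (String step : dotted.split("\\.")) {
            if (!(current instanceof Map<?, ?> m) || !m.containsKey(step)) {
                return MISSING;
            }
            current = m.get(step);
        }
        return current;
    }

    // SELECT category, type, abv.m1 FROM `beer-sample` WHERE type = "beer"
    static List<Map<String, Object>> query(List<Map<String, Object>> bucket) {
        List<Map<String, Object>> results = new ArrayList<>();
        for (Map<String, Object> doc : bucket) {
            if (!"beer".equals(path(doc, "type"))) {
                continue;  // MISSING (or any other value) does not satisfy WHERE
            }
            Map<String, Object> row = new LinkedHashMap<>();
            for (String p : List.of("category", "type", "abv.m1")) {
                Object v = path(doc, p);
                if (v != MISSING) {
                    // The result field is named after the last path step, as in "m1".
                    row.put(p.substring(p.lastIndexOf('.') + 1), v);
                }
            }
            results.add(row);
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, Object> doc1 = Map.of("brewery_id", "bro", "abv", Map.of("m1", 1, "m2", 2),
                "category", "North American Lager", "type", "beer");
        Map<String, Object> doc2 = Map.of("abv", 9.5, "brewery_id", "brouwerij");
        // Prints [{category=North American Lager, type=beer, m1=1}]
        System.out.println(query(List.of(doc1, doc2)));
    }
}
```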


Cassandra

SQL-like query language

Feature | N1QL | Cassandra
Lookup | ✔ | ✔
Filtering | ✔ | ✔
Ordering | ✔ | ✔
Aggregation | ✔ | ✖
Join | ✔ | ✖
Subqueries | ✔ | ✖
Unnest | ✔ | ✖
Schema-free | ✔ | ✖

SELECT firstname, lastname FROM users WHERE birth_year = 1981 AND country = 'FR' ALLOW FILTERING;

SELECT * FROM posts WHERE userid = 'john doe' AND (blog_title, posted_at) > ('John''s Blog', '2012-01-01');
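In the second CQL query, `(blog_title, posted_at) > (...)` is a multi-column slice: assuming `userid` is the partition key and `blog_title`, `posted_at` are clustering columns, rows in that partition are compared as tuples, lexicographically. A Java 17 sketch of that comparison, further assuming ascending clustering order and treating `posted_at` as an ISO-8601 string (which sorts chronologically); the `Post` record and sample rows are made up.

```java
import java.util.ArrayList;
import java.util.List;

final class ClusteringSlice {
    record Post(String userid, String blogTitle, String postedAt) {}

    // Lexicographic comparison of (blog_title, posted_at) tuples, both ascending.
    static int compareTuple(String title1, String at1, String title2, String at2) {
        int c = title1.compareTo(title2);
        return c != 0 ? c : at1.compareTo(at2);
    }

    // WHERE userid = ? AND (blog_title, posted_at) > (?, ?)
    static List<Post> slice(List<Post> rows, String userid, String title, String postedAt) {
        List<Post> out = new ArrayList<>();
        for (Post p : rows) {
            if (p.userid().equals(userid)
                    && compareTuple(p.blogTitle(), p.postedAt(), title, postedAt) > 0) {
                out.add(p);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Post> rows = List.of(
                new Post("john doe", "John's Blog", "2011-12-31"),
                new Post("john doe", "John's Blog", "2012-03-05"),
                new Post("john doe", "Travel Notes", "2010-06-01"),
                new Post("jane roe", "Jane's Blog", "2013-01-01"));
        // Keeps the 2012-03-05 post and the "Travel Notes" post: a later title
        // qualifies regardless of its date, because the first column decides first.
        slice(rows, "john doe", "John's Blog", "2012-01-01").forEach(System.out::println);
    }
}
```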


MongoDB

JavaScript-like language

Feature | N1QL | MongoDB
Lookup | ✔ | ✔
Filtering | ✔ | ✔
Ordering | ✔ | ✔
Aggregation | ✔ | ✔
Join | ✔ | ✖
Subqueries | ✔ | ✖
Unnest | ✔ | ✔
Schema-free | ✔ | ✔

db.sales.aggregate([
  { $group: {
      _id: { month: { $month: "$date" }, day: { $dayOfMonth: "$date" }, year: { $year: "$date" } },
      totalPrice: { $sum: { $multiply: [ "$price", "$quantity" ] } },
      averageQuantity: { $avg: "$quantity" },
      count: { $sum: 1 }
  } }
])

db.users.find({ age: { $gt: 18 } }, { name: 1, address: 1 }).limit(5)
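The `$group` stage keys each sale by calendar day and computes three accumulators per group. An equivalent computation in plain Java 17 streams, for comparison only: the `Sale` record mirrors the `date`, `price`, and `quantity` fields the pipeline references, and using a `LocalDate` as the key assumes the dates carry no time-of-day (MongoDB's `$year`, `$month`, and `$dayOfMonth` extract those parts from a full date).

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class SalesByDay {
    record Sale(LocalDate date, double price, int quantity) {}

    // Accumulators mirroring totalPrice, averageQuantity and count in the $group stage.
    record DayStats(double totalPrice, double averageQuantity, long count) {}

    static Map<LocalDate, DayStats> group(List<Sale> sales) {
        return sales.stream().collect(Collectors.groupingBy(
                Sale::date,                                   // _id: { year, month, day }
                TreeMap::new,
                Collectors.collectingAndThen(Collectors.toList(), (List<Sale> day) -> new DayStats(
                        day.stream().mapToDouble(s -> s.price() * s.quantity()).sum(),  // $sum of price * quantity
                        day.stream().mapToInt(Sale::quantity).average().orElse(0),      // $avg of quantity
                        day.size()))));                                                 // $sum: 1
    }

    public static void main(String[] args) {
        List<Sale> sales = List.of(
                new Sale(LocalDate.of(2014, 3, 1), 10.0, 2),
                new Sale(LocalDate.of(2014, 3, 1), 20.0, 1),
                new Sale(LocalDate.of(2014, 3, 15), 5.0, 10));
        group(sales).forEach((day, stats) -> System.out.println(day + " " + stats));
    }
}
```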


Analytical Query Processing


Hive

INSERT OVERWRITE TABLE school_summary
SELECT subq1.school, COUNT(1)
FROM (SELECT a.status, b.school, b.gender
      FROM status_updates a JOIN profiles b
      ON (a.userid = b.userid AND a.ds = '2009-03-20')) subq1
GROUP BY subq1.school

Figure: Hive query plan in two map-reduce stages. Stage M1/R1: Scan (a), Scan (b), Filter, Project, ReduceSink, Join, Group-by, FileSink. Stage M2/R2: Scan, ReduceSink, Group-by, FileSink.

More data types than SQL; Hadoop or Tez as runtime
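The plan above runs as two map-reduce jobs (M1/R1 and M2/R2). A single-process Java 17 simulation of that shape, not Hive code: job 1 filters `status_updates` on `ds`, hash-partitions both tables by `userid` (the ReduceSink), and each simulated reducer joins its rows and counts them per school; job 2 re-groups the partial counts by school and sums them. The records and the `reducers` parameter are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class HiveTwoJobs {
    record StatusUpdate(String userid, String status, String ds) {}
    record Profile(String userid, String school, String gender) {}

    // Hash partitioning, as the shuffle between map and reduce does.
    static int partitionOf(String key, int reducers) {
        return Math.floorMod(key.hashCode(), reducers);
    }

    static Map<String, Long> schoolSummary(List<StatusUpdate> a, List<Profile> b,
                                           String ds, int reducers) {
        // Job 1, map side: apply a.ds = ds, then route both inputs by userid (ReduceSink).
        List<Map<String, List<StatusUpdate>>> aParts = new ArrayList<>();
        List<Map<String, List<Profile>>> bParts = new ArrayList<>();
        for (int r = 0; r < reducers; r++) {
            aParts.add(new HashMap<>());
            bParts.add(new HashMap<>());
        }
        for (StatusUpdate s : a) {
            if (s.ds().equals(ds)) {
                aParts.get(partitionOf(s.userid(), reducers))
                      .computeIfAbsent(s.userid(), k -> new ArrayList<>()).add(s);
            }
        }
        for (Profile p : b) {
            bParts.get(partitionOf(p.userid(), reducers))
                  .computeIfAbsent(p.userid(), k -> new ArrayList<>()).add(p);
        }
        // Job 1, reduce side: join rows sharing a userid, then count per school (partial Group-by).
        List<Map<String, Long>> partials = new ArrayList<>();
        for (int r = 0; r < reducers; r++) {
            Map<String, Long> partial = new HashMap<>();
            for (Map.Entry<String, List<StatusUpdate>> e : aParts.get(r).entrySet()) {
                for (Profile p : bParts.get(r).getOrDefault(e.getKey(), List.of())) {
                    // each matching status update contributes one joined row
                    partial.merge(p.school(), (long) e.getValue().size(), Long::sum);
                }
            }
            partials.add(partial);
        }
        // Job 2: re-shuffle the partial counts by school and add them up (final Group-by).
        Map<String, Long> result = new TreeMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((school, n) -> result.merge(school, n, Long::sum));
        }
        return result;
    }

    public static void main(String[] args) {
        List<StatusUpdate> a = List.of(
                new StatusUpdate("u1", "hi", "2009-03-20"),
                new StatusUpdate("u1", "again", "2009-03-20"),
                new StatusUpdate("u2", "yo", "2009-03-20"),
                new StatusUpdate("u3", "old", "2009-03-19"));
        List<Profile> b = List.of(
                new Profile("u1", "UCI", "f"),
                new Profile("u2", "UCI", "m"),
                new Profile("u3", "MIT", "m"));
        System.out.println(schoolSummary(a, b, "2009-03-20", 2));  // {UCI=3}
    }
}
```

Writing the partial counts out between the two jobs (the FileSink) and reading them back is the materialization cost that engines with a native execution pipeline avoid.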


Impala

INSERT OVERWRITE TABLE school_summary
SELECT subq1.school, COUNT(1)
FROM (SELECT a.status, b.school, b.gender
      FROM status_updates a JOIN profiles b
      ON (a.userid = b.userid AND a.ds = '2009-03-20')) subq1
GROUP BY subq1.school

Figure: Impala query plan with the operators HDFS Scan (a), HDFS Scan (b), Filter, Project, Hash Join, Pre-Agg, Merge-Agg, and HDFS Write.

ANSI SQL-92; HDFS/HBase as the storage; native MPP execution engine
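Impala executes the same query as one pipeline of in-memory operators rather than as separate map-reduce jobs. A Java 17 sketch of the Hash Join, Pre-Agg, and Merge-Agg steps, assuming `profiles` is the build side and that `status_updates` arrives already split into partitions standing in for scan ranges on different nodes; Impala's planner makes these choices itself, and all names here are illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ImpalaPipeline {
    record StatusUpdate(String userid, String ds) {}
    record Profile(String userid, String school) {}

    static Map<String, Long> schoolSummary(List<List<StatusUpdate>> aPartitions,
                                           List<Profile> b, String ds) {
        // Build phase: hash table on the profiles input, keyed by userid.
        Map<String, List<String>> schoolsByUser = new HashMap<>();
        for (Profile p : b) {
            schoolsByUser.computeIfAbsent(p.userid(), k -> new ArrayList<>()).add(p.school());
        }
        Map<String, Long> merged = new TreeMap<>();
        for (List<StatusUpdate> partition : aPartitions) {
            // Probe phase plus Pre-Agg: each partition counts its own join results per school.
            Map<String, Long> preAgg = new HashMap<>();
            for (StatusUpdate s : partition) {
                if (!s.ds().equals(ds)) {
                    continue;                                  // the a.ds filter
                }
                for (String school : schoolsByUser.getOrDefault(s.userid(), List.of())) {
                    preAgg.merge(school, 1L, Long::sum);
                }
            }
            // Merge-Agg: combine the partial counts from every partition.
            preAgg.forEach((school, n) -> merged.merge(school, n, Long::sum));
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<StatusUpdate>> parts = List.of(
                List.of(new StatusUpdate("u1", "2009-03-20"), new StatusUpdate("u2", "2009-03-20")),
                List.of(new StatusUpdate("u1", "2009-03-20"), new StatusUpdate("u3", "2009-03-19")));
        List<Profile> b = List.of(
                new Profile("u1", "UCI"), new Profile("u2", "MIT"), new Profile("u3", "MIT"));
        System.out.println(schoolSummary(parts, b, "2009-03-20"));  // {MIT=1, UCI=2}
    }
}
```

Pre-aggregating inside each partition means only one small count per school crosses the network to the merge step, instead of every joined row.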


Spark SQL

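import org.apache.spark.sql.hive.HiveContext

// sc is an existing SparkContext (predefined in spark-shell)
val ctx = new HiveContext(sc)
val users = ctx.table("users")
val young = users.where(users("age") < 21)
println(young.count())

SELECT count(*) FROM users WHERE age < 21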

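Figure: Spark SQL query planning. SQL queries and DataFrame programs both become an Unresolved Logical Plan, which is resolved against the Catalog into a Logical Plan, expanded into candidate Physical Plans, narrowed by a Cost Model to a Selected Physical Plan, and executed as RDDs.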
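Both front ends in the example reach the same unresolved logical plan, which is checked against the catalog before any physical planning. A toy Java 17 sketch of that analysis step alone: the plan records, the catalog map, and the single count-with-filter plan shape are hypothetical and much simpler than Spark's real plan trees.

```java
import java.util.List;
import java.util.Map;

final class ToyAnalyzer {
    // Unresolved plan: table and column are plain names until checked against the catalog.
    record UnresolvedCountWhereLess(String table, String column, int bound) {}

    // Resolved plan: the column name has been bound to a position in the table's schema.
    record ResolvedCountWhereLess(String table, int columnIndex, int bound) {}

    static ResolvedCountWhereLess analyze(UnresolvedCountWhereLess plan,
                                          Map<String, List<String>> catalog) {
        List<String> schema = catalog.get(plan.table());
        if (schema == null) {
            throw new IllegalArgumentException("Table not found: " + plan.table());
        }
        int index = schema.indexOf(plan.column());
        if (index < 0) {
            throw new IllegalArgumentException("Cannot resolve column: " + plan.column());
        }
        return new ResolvedCountWhereLess(plan.table(), index, plan.bound());
    }

    // Execution of the resolved plan over in-memory rows.
    static long execute(ResolvedCountWhereLess plan, List<List<Integer>> rows) {
        return rows.stream().filter(r -> r.get(plan.columnIndex()) < plan.bound()).count();
    }

    public static void main(String[] args) {
        Map<String, List<String>> catalog = Map.of("users", List.of("id", "age"));
        // Same plan for: SELECT count(*) FROM users WHERE age < 21
        //           and: users.where(users("age") < 21).count()
        UnresolvedCountWhereLess plan = new UnresolvedCountWhereLess("users", "age", 21);
        List<List<Integer>> rows = List.of(List.of(1, 19), List.of(2, 34), List.of(3, 20));
        System.out.println(execute(analyze(plan, catalog), rows));  // 2
    }
}
```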


Drill

ANSI SQL-92; nested data; schema inference

Centralized schema: static; managed by DBAs

Self-describing or schema-less: dynamic, evolving; managed by applications; embedded in data (CSV, JSON, Parquet, ORC)
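Schema inference means deriving types from the data at read time instead of from a centrally managed catalog. A simplified Java 17 sketch, not Drill's algorithm: it scans JSON-like records (modeled as maps) and collects every type observed for each top-level field, so heterogeneous fields such as `abv` in the earlier N1QL sample show up with more than one type. The type names are arbitrary labels.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class SchemaInference {
    // Map a Java value parsed from JSON to a coarse type label.
    static String typeOf(Object v) {
        if (v == null) return "null";
        if (v instanceof String) return "string";
        if (v instanceof Boolean) return "boolean";
        if (v instanceof Integer || v instanceof Long) return "integer";
        if (v instanceof Number) return "float";
        if (v instanceof List<?>) return "array";
        if (v instanceof Map<?, ?>) return "object";
        return v.getClass().getSimpleName();
    }

    // Field name -> every type observed for it across all records (top level only).
    static Map<String, Set<String>> infer(List<Map<String, Object>> records) {
        Map<String, Set<String>> schema = new TreeMap<>();
        for (Map<String, Object> rec : records) {
            rec.forEach((field, value) ->
                    schema.computeIfAbsent(field, f -> new TreeSet<>()).add(typeOf(value)));
        }
        return schema;
    }

    public static void main(String[] args) {
        Map<String, Object> d1 = Map.of("brewery_id", "bro", "abv", Map.of("m1", 1, "m2", 2),
                "category", "North American Lager", "type", "beer");
        Map<String, Object> d2 = Map.of("abv", 9.5, "brewery_id", "brouwerij");
        // Prints {abv=[float, object], brewery_id=[string], category=[string], type=[string]}
        System.out.println(infer(List.of(d1, d2)));
    }
}
```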


Comparison and Unification


Comparison and Unification

AsterixDB – System Unification Research

Query language?

Language Comparisons

SQL++ – Language Unification Research

N1QL and SQL++

Figure: diagram with the labels SQL++, Unification, and Research Projects.


AsterixDB (Apache incubator)

NoSQL data model with schema flexibility
Declarative full-fledged query language (AQL)
Partitioned native LSM-based storage
Secondary index (B-Tree, R-Tree, and keyword index)
Single-row transaction
Spatial/temporal data types
External data (HDFS) access and indexing
Native MPP query execution engine

Operational + Analytical


Query Language?

SELECT subq1.school, COUNT(1)
FROM (SELECT a.status, a.date, b.school, b.region
      FROM status_updates a JOIN profiles b
      ON (a.userid = b.userid AND a.date = '2009-03-20')) subq1
GROUP BY subq1.school

Relational vs. JSON: nested tuples/collections; partial/missing schema; heterogeneity; complex values

What if?
Replace COUNT(1) with "(select * from subq1 order by date limit 3)";
"school" is not in the schema of the "profiles" table;
"school" is missing in some profiles;
"school" is a nested tuple.


Language Comparison: Data Model

System | Top-level Values | Heterogeneity | Arrays | Bags | Maps | Nested Tuples | Primitive Values
Hive | Bags/Tuples | ✖ | ✔ | ✖ | P | ✔ | ✔
Impala | Bags/Tuples | ✖ | ✖ | ✖ | ✖ | ✖ | ✔
Spark SQL | Bags/Tuples | ✖ | ✔ | ✖ | ✔ | ✔ | ✔
Drill | Bags/Tuples | ✖ | ✔ | ✖ | ✔ | ✔ | ✔
N1QL | Bags/Tuples | ✔ | ✔ | ✖ | ✖ | ✔ | ✔
Cassandra | Bags/Tuples | ✖ | P | ✖ | P | ✖ | ✔
MongoDB | Bags/Tuples | ✔ | ✔ | ✖ | ✖ | ✔ | ✔
AsterixDB | Any Values | ✔ | ✖ | ✔ | ✖ | ✔ | ✔

(P: partial support)


Language Comparison: Types

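System | Dynamic Type Check | Static Type Check | Any Type | Open Type | Union Type | Optional
Hive | ✖ | ✔ | ✖ | ✖ | ✖ | ✖
Impala | ✖ | ✔ | ✖ | ✖ | ✖ | ✖
Spark SQL | ✖ | ✔ | ✖ | ✖ | ✖ | ✖
Drill | ✖ | ✔ | ✖ | ✖ | ✖ | ✖
N1QL | ✔ | ✖ | – | – | – | –
Cassandra | ✖ | ✔ | ✖ | ✖ | ✖ | ✖
MongoDB | ✔ | ✖ | – | – | – | –
AsterixDB | ✔ | ✔ | ✔ | ✔ | ✖ | ✔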


Language Comparison: Path Navigation

System | Tuple Nav. absent | Tuple Nav. mismatch | Array Nav. absent | Array Nav. mismatch | Map Nav. absent | Map Nav. mismatch
Hive | error | error | null | error | null | error
Impala | error | error | -- | -- | -- | --
Spark SQL | error | error | error | error | null | error
Drill | error | error | error | error | null | error
N1QL | missing | missing | missing | missing | -- | --
Cassandra | error | error | -- | -- | -- | --
MongoDB | missing | missing | -- | -- | -- | --
AsterixDB | null | error | error | error | -- | --

No Errors!
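Each pair of columns above asks what one navigation step returns when the field is absent and when the value has the wrong type. A Java 17 sketch of tuple navigation with both answers made configurable, in the spirit of the SQL++ path parameters; the `Outcome` enum and method names are hypothetical. With N1QL's choice of MISSING for both cases, path navigation never makes a query fail on irregular data.

```java
import java.util.Map;

final class PathNav {
    // What a failed navigation step produces (cf. tuple_nav.absent / tuple_nav.type_mismatch).
    enum Outcome { MISSING, NULL, ERROR }

    // Sentinel for MISSING, distinct from Java null (which stands for JSON null).
    static final Object MISSING = new Object() {
        @Override
        public String toString() {
            return "MISSING";
        }
    };

    static Object onFailure(Outcome outcome, String message) {
        return switch (outcome) {
            case MISSING -> MISSING;
            case NULL -> null;
            case ERROR -> throw new IllegalStateException(message);
        };
    }

    // Navigate value.field under the given configuration.
    static Object tupleNav(Object value, String field, Outcome absent, Outcome mismatch) {
        if (value == MISSING) {
            return MISSING;  // in this sketch, navigating into MISSING stays MISSING
        }
        if (!(value instanceof Map<?, ?> tuple)) {
            return onFailure(mismatch, "not a tuple: " + value);
        }
        if (!tuple.containsKey(field)) {
            return onFailure(absent, "absent field: " + field);
        }
        return tuple.get(field);
    }

    public static void main(String[] args) {
        Map<String, Object> profile = Map.of("name", "Ann", "school", "UCI");
        // N1QL setting: tuple_nav.absent = missing, tuple_nav.type_mismatch = missing
        System.out.println(tupleNav(profile, "region", Outcome.MISSING, Outcome.MISSING)); // MISSING
        System.out.println(tupleNav("Ann", "first", Outcome.MISSING, Outcome.MISSING));    // MISSING
        // Setting matching the Hive, Spark SQL and Drill tuple columns: both cases raise an error
        try {
            tupleNav(profile, "region", Outcome.ERROR, Outcome.ERROR);
        } catch (IllegalStateException e) {
            System.out.println("error: " + e.getMessage());  // error: absent field: region
        }
    }
}
```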


Language Comparison: SELECT Clause

System | Project Tuples with Non-scalar Subqueries | Project Tuples with Nested Collections | Project Non-Tuples
Hive | ✖ | ✔ | ✖
Impala | ✖ | ✖ | ✖
Spark SQL | ✖ | ✔ | ✖
Drill | ✖ | ✔ | ✖
N1QL | ✔ | ✔ | ✔
Cassandra | ✖ | ✖ | ✖
MongoDB | ✖ | ✔ | ✔
AsterixDB | ✔ | ✔ | ✔


Language Comparison: FROM Clause

System | Subquery | Joins | Inner Unnest | Outer Unnest | Ordinal Positions
Hive | ✔ | ✔ | ✔ | ✔ | ✔
Impala | ✔ | ✔ | ✖ | ✖ | ✖
Spark SQL | ✔ | ✔ | ✖ | ✖ | ✖
Drill | ✔ | ✔ | ✔ | ✖ | ✖
N1QL | ✔ | ✔ | ✔ | ✔ | ✖
Cassandra | ✖ | ✖ | ✖ | ✖ | ✖
MongoDB | ✖ | ✖ | ✔ | ✖ | ✖
AsterixDB | ✔ | ✔ | ✔ | ✖ | ✔
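The Inner Unnest, Outer Unnest, and Ordinal Positions columns differ in how a row whose array is empty or absent is treated and whether each element's position is exposed. A Java 17 sketch over hypothetical `Order` rows: the inner form drops such a row, the outer form keeps it with a null element (like a LEFT OUTER JOIN), and the position is 1-based here, which is an arbitrary choice since systems differ.

```java
import java.util.ArrayList;
import java.util.List;

final class Unnest {
    // items may be null (absent) or empty
    record Order(String id, List<String> items) {}

    // One output row: parent id, one array element, and that element's position.
    record Row(String orderId, String item, Integer position) {}

    static List<Row> unnest(List<Order> orders, boolean outer) {
        List<Row> out = new ArrayList<>();
        for (Order o : orders) {
            List<String> items = o.items() == null ? List.of() : o.items();
            if (items.isEmpty()) {
                if (outer) {
                    out.add(new Row(o.id(), null, null));  // outer: keep the parent row
                }
                continue;                                   // inner: drop it
            }
            for (int i = 0; i < items.size(); i++) {
                out.add(new Row(o.id(), items.get(i), i + 1));  // 1-based ordinal position
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order("o1", List.of("beer", "chips")),
                new Order("o2", List.of()),
                new Order("o3", null));
        System.out.println(unnest(orders, false));  // two rows, both for o1
        System.out.println(unnest(orders, true));   // those two plus o2 and o3 with null item
    }
}
```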


SQL++ (The "++" Part)

JSON data model
INNER/OUTER FLATTEN clause
Arbitrary subqueries in SELECT
Supported by N1QL!

Configurable parameters for semantics: path navigations, equality evaluations, collection coercions
Made consistent in N1QL!


SQL++ Configuration for N1QL

Configuration | Parameter | Value | Parameter | Value
@path | tuple_nav.absent | missing | tuple_nav.type_mismatch | missing
@path | array_nav.absent | missing | array_nav.type_mismatch | missing
@path | map_nav.absent | missing | map_nav.type_mismatch | missing
@eq | complex | yes | type_mismatch | false
@eq | null_eq_null | null | null_eq_value | null
@eq | null_eq_missing | missing | missing_eq_missing | missing
@eq | missing_eq_value | missing | null_and_missing | missing
@eq | null_and_true | null | null_and_null | null
@eq | missing_and_true | missing | missing_and_missing | missing
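Read together, the @eq rows define a four-valued logic in which MISSING takes precedence over NULL. A Java 17 sketch of N1QL's settings from the table, with a hypothetical `Truth` enum and a sentinel object for MISSING. The table does not say how FALSE combines with NULL or MISSING under AND; the sketch assumes FALSE wins, as in SQL's three-valued logic. Comparing numbers by value (so 1 equals 1.0) is also an assumption of this sketch.

```java
import java.util.Objects;

final class N1qlLogic {
    enum Truth { TRUE, FALSE, NULL, MISSING }

    // Sentinel for MISSING; Java null stands for JSON null.
    static final Object MISSING = new Object();

    // missing_eq_* and null_eq_missing -> missing; null_eq_null, null_eq_value -> null;
    // type_mismatch -> false; complex -> yes (arrays/objects compared structurally).
    static Truth eq(Object a, Object b) {
        if (a == MISSING || b == MISSING) return Truth.MISSING;
        if (a == null || b == null) return Truth.NULL;
        if (a instanceof Number x && b instanceof Number y) {
            return x.doubleValue() == y.doubleValue() ? Truth.TRUE : Truth.FALSE;
        }
        // Different kinds of values (e.g. 1 vs "1") are simply unequal;
        // Lists and Maps use Java's structural equals.
        return Objects.equals(a, b) ? Truth.TRUE : Truth.FALSE;
    }

    // *_and_missing -> missing; null_and_true, null_and_null -> null.
    // Assumption not stated in the table: FALSE AND anything is FALSE.
    static Truth and(Truth a, Truth b) {
        if (a == Truth.FALSE || b == Truth.FALSE) return Truth.FALSE;
        if (a == Truth.MISSING || b == Truth.MISSING) return Truth.MISSING;
        if (a == Truth.NULL || b == Truth.NULL) return Truth.NULL;
        return Truth.TRUE;
    }

    public static void main(String[] args) {
        System.out.println(eq(1, 1.0));                      // TRUE
        System.out.println(eq(1, "1"));                      // FALSE (type mismatch)
        System.out.println(eq(null, null));                  // NULL
        System.out.println(eq(null, MISSING));               // MISSING
        System.out.println(and(Truth.NULL, Truth.MISSING));  // MISSING
    }
}
```

Because a WHERE clause keeps a row only when its condition is TRUE, NULL and MISSING both filter the row out; the distinction matters when the result of a comparison is itself projected.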


Summary

N1QL in a Bigger Context


Summary

Operational Query Processing: Rich Data Model; SQL is BACK, but with EXTENSIONS!

Analytical Query Processing: Rich Data Model is a MUST!

Unification: The trend!


Thank you. Q & A