Enterprise Architect's Perspective of Couchbase with N1QL: Couchbase Connect 2015
Keshav Murthy, Couchbase (@N1QL, @rkeshavmurthy)
©2015 Couchbase Inc. 2
Agenda
Application requirements
Data requirements
Couchbase with N1QL
Application Requirements
Rapid application development
• Changing market needs
• Changing data needs
Scalability
• Unknown user demand
• Constantly growing throughput
Consistent performance
• Low response time for better user experience
• High throughput to handle viral growth
Reliability
• Always online
Common application requirements
Database Requirements
Development environment
• Data modeling
• APIs
• Query language
Performance, Performance, Performance
Availability
Consistency
Flexibility
Manageability
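Data Management Landscape
[Figure: the data management landscape, arranged by degree of structure and storage cost.
• RDBMS (OLTP, EDW): highly structured data, queried with SQL, R, etc.
• NoSQL (MongoDB, Couchbase, HBase, Cassandra): semi-structured and self-describing data, operational big data
• Hadoop (analytical): no structure; processing in files with MapReduce over generic file formats; rows/columns in files (tables) with Hive, Pig, etc.; query with Impala, Hive, Drill
• Disk and storage: bytes and blocks
• Cost labels shown in the figure: $100K to $200K/TB, $10K to $20K/TB, $10K/TB, $1K/TB]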
Couchbase 3.0
Couchbase Server 3.0 Cluster Architecture
[Diagram: six nodes, Couchbase Server 1 through Couchbase Server 6. Each node runs the Cluster Manager and the Data Service, with a managed cache, storage, and its own set of shards.]
[Diagram: APP SERVER 1 and APP SERVER 2 each embed the Couchbase client library with a cluster map and send read/write/update operations to three servers. Active shards: Server 1 holds shards 5, 2, 9; Server 2 holds shards 4, 7, 8; Server 3 holds shards 1, 3, 6. Replica shards: 4, 1, 8; 6, 3, 2; and 7, 9, 5, one group per server.]
Multi-Node Operations
• Docs are distributed evenly across servers
• Each server stores both active and replica docs
  - Only one "copy" is master at a time
• The client library provides the app with a simple interface to the database
• The cluster map tells the client which server a doc is on
  - The app never needs to know
• The app reads, writes, and updates docs
• Multiple app servers can access the same document at the same time
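The lookup that the client library performs with its cluster map can be sketched in a few lines of Python. This is a simplified illustration, not Couchbase's actual hashing or map format: the nine shards and their active servers follow the slide's diagram, while the CRC32-modulo hash, the server names, and the function names are assumptions made for the example.

```python
import zlib

NUM_SHARDS = 9  # the slide shows shards 1 to 9; a real cluster uses many more

# Hypothetical cluster map: shard number -> server holding the active copy,
# following the active shard layout drawn on the slide.
CLUSTER_MAP = {
    5: "server1", 2: "server1", 9: "server1",
    4: "server2", 7: "server2", 8: "server2",
    1: "server3", 3: "server3", 6: "server3",
}


def shard_for_key(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Hash a document key to a shard number in 1..num_shards (assumed scheme)."""
    return zlib.crc32(key.encode("utf-8")) % num_shards + 1


def server_for_key(key: str) -> str:
    """Resolve the server that owns the active copy of a document."""
    return CLUSTER_MAP[shard_for_key(key)]


if __name__ == "__main__":
    for doc_key in ("1.10.1938", "1.10.143", "1.10.1355"):
        print(doc_key, "-> shard", shard_for_key(doc_key), "on", server_for_key(doc_key))
```

Because every client computes the same hash and holds the same map, two app servers asking for the same key reach the same node without the application knowing where the document lives.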
Why N1QL?
Properties of Real-World Data
• Rich structure: attributes, sub-structure
• Relationships: to other data
• Value evolution: data is updated
• Structure evolution: data is reshaped
[Diagram: a Person entity with Name (Jane Smith), DOB (Jan-30-1990), Billing, Connections, and Purchases.]
Models for Representing Data
Data Concern | Relational Model | JSON Document Model (NoSQL)
Rich Structure | Multiple flat tables; constant assembly / disassembly | Documents; no assembly required!
Relationships | Represented; queried (SQL) | Represented; queried? Not until now…
Value Evolution | Data can be updated | Data can be updated
Structure Evolution | Uniform and rigid; manual change (disruptive) | Flexible; dynamic change
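The "assembly" row of this comparison can be made concrete with a small Python sketch. It reuses the Person example (Jane Smith, Jan-30-1990); the purchase items, the billing field, and all function names are hypothetical and only serve the illustration.

```python
# Relational shape: one person is spread across flat tables (lists of rows)
# and has to be assembled on every read.
person_rows = [{"person_id": 1, "name": "Jane Smith", "dob": "Jan-30-1990"}]
purchase_rows = [
    {"person_id": 1, "item": "laptop"},  # hypothetical purchases
    {"person_id": 1, "item": "phone"},
]


def assemble_person(person_id):
    """Join the flat tables back into one nested object."""
    person = next(dict(row) for row in person_rows if row["person_id"] == person_id)
    person["purchases"] = [
        row["item"] for row in purchase_rows if row["person_id"] == person_id
    ]
    return person


# Document shape: the same person is stored already assembled.
person_doc = {
    "person_id": 1,
    "name": "Jane Smith",
    "dob": "Jan-30-1990",
    "purchases": ["laptop", "phone"],
}


def add_billing(doc, billing):
    """Structure evolution: add a new sub-structure to one document, no schema change."""
    evolved = dict(doc)
    evolved["billing"] = billing
    return evolved


if __name__ == "__main__":
    print(assemble_person(1) == person_doc)
    print(add_billing(person_doc, {"card": "hypothetical"}))
```

The relational side needs `assemble_person` on every read, while the document is stored in the shape the application uses and can gain a `billing` sub-structure without touching other documents.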
What is N1QL?
SELECT Statement
SELECT [ DISTINCT ] …
FROM … JOIN …
WHERE …
GROUP BY … HAVING …
ORDER BY …
LIMIT …
OFFSET …
( UNION | INTERSECT | EXCEPT ) [ ALL ] …
SELECT Statement Highlights
• Querying across relationships
  - JOINs
  - Subqueries
• Aggregation
  - MIN, MAX
  - ( SUM, COUNT, AVG, ARRAY_AGG ) [ DISTINCT ]
• Combining result sets using set operators
  - ( UNION, INTERSECT, EXCEPT ) [ ALL ]
Data Modification Statements
UPDATE … SET … WHERE …
DELETE FROM … WHERE …
INSERT INTO … ( KEY, VALUE ) VALUES …
INSERT INTO … ( KEY …, VALUE … ) SELECT …
MERGE INTO … USING … ON …
WHEN [ NOT ] MATCHED THEN …
Note: Couchbase Server provides per-document atomicity.
Query Execution: Join
Document key: "1.10.1938"
"CUSTOMER": { "C_D_ID": 10, "C_ID": 1938, "C_W_ID": 1, "C_BALANCE": -10, "C_CITY": "San Jose", "C_CREDIT": "GC", "C_DELIVERY_CNT": 0, "C_DISCOUNT": 0.3866, "C_FIRST": "Jay", "C_LAST": "Smith", "C_MIDDLE": "OE", "C_PAYMENT_CNT": 1, "C_PHONE": "555-123-1234", "C_SINCE": "2015-03-22 00:50:42.822518", "C_STATE": "CA", "C_STREET_1": "555, Tideway Drive", "C_STREET_2": "Alameda", "C_YTD_PAYMENT": 10, "C_ZIP": "94501" }
Document key: "1.10.143"
"ORDERS": { "O_CUSTOMER_KEY": "1.10.1938", "O_ALL_LOCAL": 1, "O_CARRIER_ID": 2, "O_C_ID": 1938, "O_D_ID": 10, "O_ENTRY_D": "2015-05-19 16:22:08.544472", "O_ID": 143, "O_OL_CNT": 10, "O_W_ID": 1 }
Document key: "1.10.1355"
"ORDERS": { "O_CUSTOMER_KEY": "1.10.1938", "O_ALL_LOCAL": 1, "O_CARRIER_ID": 2, "O_C_ID": 1938, "O_D_ID": 10, "O_ENTRY_D": "2015-05-19 16:22:08.544472", "O_ID": 1355, "O_OL_CNT": 10, "O_W_ID": 3 }
SELECT COUNT(o.O_ORDER_CNT) AS CNT_O_OL_CNT
FROM ORDERS o INNER JOIN CUSTOMER c ON KEYS (o.O_CUSTOMER_KEY)
WHERE o.O_CARRIER_NAME = "Penske" AND c.C_STATE = "CA";
• Two-keyspace join
• ON clause for the join
[Diagram: query execution pipeline with the stages Parse, Plan, Scan, Fetch, Join, Filter, Aggregate, Sort, Offset, Limit, and Project.]
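The semantics of a join `ON KEYS` can be approximated in plain Python: for each left-hand document, the value of the key expression is used as a document key for a direct lookup on the right-hand side. The sketch below is an in-memory model, not the query engine. It uses trimmed versions of the sample documents, adds one hypothetical order with no matching customer, and counts `O_OL_CNT` because that field is present in the sample documents.

```python
# Keyspaces modeled as dicts from document key to document.
customers = {
    "1.10.1938": {"C_ID": 1938, "C_LAST": "Smith", "C_STATE": "CA"},
}
orders = {
    "1.10.143": {"O_CUSTOMER_KEY": "1.10.1938", "O_ID": 143, "O_OL_CNT": 10},
    "1.10.1355": {"O_CUSTOMER_KEY": "1.10.1938", "O_ID": 1355, "O_OL_CNT": 10},
    # Hypothetical order whose customer key matches no document.
    "1.10.2000": {"O_CUSTOMER_KEY": "1.10.9999", "O_ID": 2000, "O_OL_CNT": 5},
}


def inner_join_on_keys(left, right, key_field):
    """Yield (left_doc, right_doc) pairs; left docs without a match are dropped."""
    for left_doc in left.values():
        right_doc = right.get(left_doc.get(key_field))  # fetch by document key
        if right_doc is not None:
            yield left_doc, right_doc


def count_orders_for_state(state):
    """Join, filter on the customer's state, then count non-missing O_OL_CNT values."""
    joined = inner_join_on_keys(orders, customers, "O_CUSTOMER_KEY")
    return sum(
        1
        for order, customer in joined
        if customer["C_STATE"] == state and order.get("O_OL_CNT") is not None
    )


if __name__ == "__main__":
    print(count_orders_for_state("CA"))
```

The join needs no index scan on the right-hand side: each customer is reached by key, which is why the key of the customer is stored inside every order document.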
Couchbase 4.0
Couchbase Server Cluster Architecture
[Diagram: six nodes, Couchbase Server 1 through Couchbase Server 6. Each node runs the Cluster Manager together with the Data Service, Index Service, and Query Service, and has a managed cache, storage, and its own set of shards.]
Couchbase Server Cluster Service Deployment
[Diagram: multi-dimensional scaling. Services are deployed on separate nodes: Couchbase Servers 1 to 3 run the Data Service, Servers 4 and 5 run the Query Service, and the remaining nodes run the Index Service. Every node also runs the Cluster Manager.]
Index Service: Global Secondary Index
CREATE INDEX Email1 ON Customer(Email) USING GSI;
CREATE INDEX Email2 ON Customer(Email) USING GSI;
[Diagram: two Index Service nodes. One hosts index Email1 with snapshots at T1 and T2, the other hosts index Email2 with snapshots at T3 and T4, and each exposes a scan port. The Query Service contains an index client with a connection pool per index node and a metadata cache listing (Email1, Email2).]
[Diagram: global secondary index data flow. In the Data Service, a projector and router read Bucket#1 and Bucket#2 and send changes over a DCP stream to the Index Service. The Index Service contains a supervisor, index maintenance, and a scan coordinator, and stores Index#1 through Index#4 in the ForestDB storage engine. The Query Service reads from the Index Service.]
Query Service: Parallelized for Performance
[Diagram: a client request passes through the Query Service pipeline: Parse, Plan, Scan, Fetch, Join, Filter, Pre-Aggregate, Aggregate, Sort, Offset, Limit, Project. The Query Service works with the Index Service and the Data Service.]
Application Development:
SDKs for N1QL
Native N1QL Support: Usage in the SDKs
C / C++
REST API
Client to Query Service: REST API
Communication protocol is REST on top of HTTP
The database protocol structure is embedded within the REST API.
Query Service is stateless: All query information is embedded within the REST request.
REST is open. All REST clients work with N1QL
All N1QL clients, JDBC, ODBC drivers use REST
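import requests

# N1QL REST endpoint of the Query Service; credentials as on the slide
url = "http://localhost:8093/query/service"
s1 = "SELECT * FROM CUSTOMER WHERE C_ID = 1284"
# The statement travels as a form parameter of the POST request
r = requests.post(url, data={"statement": s1}, auth=("Administrator", "abc"))
print(r.json())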
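Because the Query Service is stateless, everything a query needs travels in one HTTP request. The sketch below builds such a request with the Python standard library only and does not send it, so it runs without a cluster. It assumes the `/query/service` endpoint and the `statement` and `args` form parameters of the N1QL REST API; the helper name `build_n1ql_request` is hypothetical, and the host and credentials are the placeholders from the slide.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_n1ql_request(statement, args=None, user="Administrator", password="abc",
                       url="http://localhost:8093/query/service"):
    """Build (but do not send) a self-contained N1QL REST request."""
    params = {"statement": statement}
    if args is not None:
        # Positional parameters ($1, $2, ...) are passed as a JSON array.
        params["args"] = json.dumps(args)
    body = urllib.parse.urlencode(params).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": "Basic " + token,
        "Content-Type": "application/x-www-form-urlencoded",
    }
    # Supplying a body makes urllib issue a POST.
    return urllib.request.Request(url, data=body, headers=headers)


req = build_n1ql_request("SELECT * FROM CUSTOMER WHERE C_ID = $1", args=[1284])

if __name__ == "__main__":
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # Against a live cluster one could run:
    #   with urllib.request.urlopen(req) as resp:
    #       print(json.load(resp))
```

Any HTTP client can issue the same request, which is the point of the "REST is open" bullet: no session or cursor state has to exist on the server before or after the call.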
N1QL API: NodeJS
// Instantiate the query API
var couchbase = require('couchbase');
var myCluster = new couchbase.Cluster('localhost:8091');
var myBucket = myCluster.openBucket('travel-sample');
var myQuery = couchbase.N1qlQuery;

// Run a N1QL statement and hand the error or the result to done()
function query(sql, done) {
  var queryToRun = myQuery.fromString(sql)
    .consistency(myQuery.Consistency.REQUEST_PLUS);
  myBucket.query(queryToRun, function (err, result) {
    if (err) {
      console.log("ERR:", err);
      done(err, null);
      return;
    }
    done(null, result);
  });
}
Performance
Performance, Performance, Performance
Business demands highly responsive apps
RDBMS
• Architecture based on "speed of disk"
• Requires joins across many tables
• High throughput requires very expensive hardware
Couchbase
• Architecture based on "speed to memory"
• Faster access to aggregated, de-normalized objects
• High throughput at low TCO with a cluster of commodity servers
[Diagram: two stacks, each topped by an application layer; the labels shown are RDBMS, Cache, and Couchbase.]
Availability - Revisited
Availability: Cross Cluster Availability (XDCR)
• Fast streaming replication
• A complete copy of the data in one cluster is replicated into another cluster
• Can be used both for availability and for master-master replication
• Used for online recovery
[Diagram: XDCR between a San Francisco cluster and a New York cluster. One side shows master, local replica, index, and map/reduce; the other shows remote replica, index, and map/reduce. Also shown: Hadoop, client/application integration, and backup/export tooling.]
Manageability
[Diagram: machine 1, machine 2, and machine 3 connected over Ethernet, each running one Couchbase node.]
Anatomy of a Node
[Diagram: processes running on machine 1: babysitter, ns-server, memcached, query, indexer, xdcr, view-engine, and others.]
The Cluster Manager is babysitter and ns-server.
Security
Couchbase security journey
Releases on the timeline: Previously…, In 2.2, In 2.5, In 3.0, New in 4.0
Features shown along the timeline, in order:
• SASL AuthN with bucket passwords
• Admin user
• Secure build platform
• Read-only user
• Easy admin password reset
• Non-root user deployments
• Secure communication for XDCR
• Encrypted client-server communication
• Encrypted admin access
• Access log
• Data-at-rest encryption
• Simplified compliance with admin auditing
• External identity management for admins using LDAP
Application Development
Flexibility: Agile Development
• Hundreds or thousands of inter-related tables
• Handles structured data well, unstructured data poorly
• Rigid schema requires migrations that can take weeks, months
• Impedance mismatch with developers
• Aggregates & denormalizes data into documents
• Handles structured & unstructured data equally well
• Inferred schema requires no migration
• JSON rapidly being adopted
[Diagram: hotel descriptions, reviews, and user profiles. Hotels point to reviews; reviews point to users.]
{ "ID": 1, "NAME": "Fairmont San Francisco", … }
{ "REVIEW_ID": 1, "REVIEW": "Loved Hotel…", … }
{ "REVIEW_ID": 2, "REVIEW": "Nice, but …", … }
{ "USER_ID": 1, "DISPLAY": "Ted's Trip…", … }
{ "USER_ID": 2, "DISPLAY": "WhatWhat …", … }
Application Development:
Data Modeling with N1QL
Development: Goals of Data Modeling for N1QL
1. Define document boundaries
2. Define relationships
3. Express relationships
to facilitate and optimize your desired access patterns.
Elements of ER Model
Element | Description | Examples
Entity | Represents a noun, object, or "thing" in the domain | Employee, product, blog, episode, profile, session
Relationship | Represents a dependency or interaction between two entities | Manager supervises employee, blog has comments, user owns session
Cardinality | Specifies how many instances of an entity can occur on each side of a relationship. A combination of 0, 1, or N for each side of a relationship. | 0 to 1, exactly 1, 0 to N, 1 to N
Expressing Relationships
3 ways to express relationships in Couchbase
• Parent contains keys of children (outbound)
• Children contain key of parent (inbound)
• Both of the above (dual)
High cardinality affects outbound relationships
• Makes parent document bigger and slower
• Makes it expensive to load a subset of relationships (e.g. paging through blog comments)
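The outbound and inbound shapes can be compared side by side with a small in-memory model. The blog and comment documents, their keys, and the field names `comment_keys` and `blog_key` are hypothetical; the dual form would simply carry both fields.

```python
# Outbound: the parent lists the keys of its children.
outbound_bucket = {
    "blog::1": {"title": "Hello", "comment_keys": ["comment::1", "comment::2"]},
    "comment::1": {"text": "First!"},
    "comment::2": {"text": "Nice post"},
}

# Inbound: every child carries the key of its parent.
inbound_bucket = {
    "blog::1": {"title": "Hello"},
    "comment::1": {"text": "First!", "blog_key": "blog::1"},
    "comment::2": {"text": "Nice post", "blog_key": "blog::1"},
}


def children_outbound(bucket, parent_key):
    """Load the parent, then fetch each child by key."""
    return [bucket[key] for key in bucket[parent_key]["comment_keys"]]


def children_inbound(bucket, parent_key):
    """Find children by their parent key (an index would do this in Couchbase)."""
    return [
        doc for _, doc in sorted(bucket.items())
        if doc.get("blog_key") == parent_key
    ]


if __name__ == "__main__":
    print([c["text"] for c in children_outbound(outbound_bucket, "blog::1")])
    print([c["text"] for c in children_inbound(inbound_bucket, "blog::1")])
```

With the outbound shape the parent grows by one key per comment, which is the cost the high-cardinality bullets describe; with the inbound shape the parent document stays the same size however many comments exist.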
N1QL Access Methods and Performance
Fastest to slowest, 1 to 4
# | Method | Description
1 | USE KEYS | Single fetch, no index scan
2 | JOIN | Fetch of left-hand side, then fetches of right-hand side
3 | Index Scan | Partial index scan, then fetches
4 | Primary Scan | Full bucket scan, then fetches
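A toy model makes the difference in work between the four methods visible. It counts only document fetches against an in-memory bucket, so it ignores network hops and the cost of the index scan itself and does not reproduce real latencies. The document shapes, key names, and helper functions are all hypothetical.

```python
class Bucket:
    """Toy in-memory keyspace that counts document fetches."""

    def __init__(self, docs):
        self.docs = dict(docs)
        self.fetches = 0

    def fetch(self, key):
        self.fetches += 1
        return self.docs[key]


def build_bucket(n_other=100):
    """One blog with three comments, plus unrelated documents."""
    docs = {"blog::1": {"type": "blog", "comment_keys": ["c::1", "c::2", "c::3"]}}
    for i in (1, 2, 3):
        docs[f"c::{i}"] = {"type": "comment", "blog_key": "blog::1", "n": i}
    for i in range(n_other):
        docs[f"other::{i}"] = {"type": "other"}
    return Bucket(docs)


def use_keys(bucket, key):
    return [bucket.fetch(key)]  # single fetch, no index scan


def join(bucket, parent_key):
    parent = bucket.fetch(parent_key)  # left-hand side
    return [bucket.fetch(k) for k in parent["comment_keys"]]  # right-hand side


def index_scan(bucket, index, parent_key):
    return [bucket.fetch(k) for k in index[parent_key]]  # scan index, then fetch


def primary_scan(bucket, parent_key):
    hits = []
    for key in list(bucket.docs):  # every key in the bucket
        doc = bucket.fetch(key)
        if doc.get("blog_key") == parent_key:
            hits.append(doc)
    return hits


def measure(method, *args):
    """Run one access method on a fresh bucket; return (result, fetch count)."""
    bucket = build_bucket()
    return method(bucket, *args), bucket.fetches


if __name__ == "__main__":
    index = {"blog::1": ["c::1", "c::2", "c::3"]}  # hypothetical index on blog_key
    print("USE KEYS    ", measure(use_keys, "blog::1")[1])
    print("JOIN        ", measure(join, "blog::1")[1])
    print("Index scan  ", measure(index_scan, index, "blog::1")[1])
    print("Primary scan", measure(primary_scan, "blog::1")[1])
```

In this model USE KEYS touches one document and the primary scan touches every document in the bucket. The index scan fetches fewer documents than the JOIN here only because the model leaves out the index lookup it depends on.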
Child Representation and Access Method
# | Child Representation | Access Method | Notes
1 | Embedded | USE KEYS | Parent with children loaded via USE KEYS; child can be surfaced via UNNEST
2 | Outbound relationship | JOIN | Parent contains child keys; children loaded via JOIN
3 | Inbound relationship | Index scan | Children contain parent key; child.parent_key is indexed; index is scanned to load children
4 | Not modeled | Primary scan | Relationship not explicitly modeled
Maintenance of Relationships
Couchbase does not provide cascading deletes
• Dangling references are possible
• INNER JOINs and INNER NESTs omit dangling references
• LEFT OUTER JOINs and LEFT OUTER NESTs safely include dangling references
Application or background task may need to clean up
• Identify and remove dangling references
• How to identify? Use N1QL's LEFT OUTER JOINs!
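The clean-up idea can be sketched without a cluster: a left outer join keeps every left-hand document and pairs it with nothing when the referenced key no longer exists, so the unmatched rows are exactly the dangling references. The keyspaces below are plain dictionaries with hypothetical contents, and the function names are illustrative, not part of any Couchbase API.

```python
customers = {
    "1.10.1938": {"C_LAST": "Smith"},
}
orders = {
    "1.10.143": {"O_CUSTOMER_KEY": "1.10.1938"},
    "1.10.1355": {"O_CUSTOMER_KEY": "1.10.1938"},
    # Hypothetical order whose customer document was deleted.
    "1.10.2000": {"O_CUSTOMER_KEY": "1.10.9999"},
}


def left_outer_join(left, right, key_field):
    """Yield (left_key, left_doc, right_doc_or_None) for every left document."""
    for left_key, left_doc in left.items():
        yield left_key, left_doc, right.get(left_doc.get(key_field))


def dangling_references(left, right, key_field):
    """Keys of left documents whose reference resolves to no document."""
    return [
        left_key
        for left_key, _, right_doc in left_outer_join(left, right, key_field)
        if right_doc is None
    ]


def remove_dangling(left, right, key_field):
    """Background-task style clean-up: delete left docs with dangling references."""
    removed = dangling_references(left, right, key_field)
    for left_key in removed:
        del left[left_key]
    return removed


if __name__ == "__main__":
    print(dangling_references(orders, customers, "O_CUSTOMER_KEY"))
```

An inner join over the same data would silently drop the order with the missing customer, which is why the outer form is the one suited to finding it.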
Summary
Couchbase Server Cluster Service Deployment
[Diagram: multi-dimensional scaling. Services are deployed on separate nodes: Couchbase Servers 1 to 3 run the Data Service, Servers 4 and 5 run the Query Service, and the remaining nodes run the Index Service. Every node also runs the Cluster Manager.]
Couchbase: Multiple Dimensions
Data Service: Scalable Key-Value Cluster
Index + Aggregation: Views
Index: View Indexing for N1QL
Index: Global Secondary Index
Index: Spatial Index
Index: Full Text Search
N1QL = SQL + JSON
XDCR: Inter data center replication
Couchbase SDKs in every language
Data Management Landscape
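[Figure: the data management landscape, arranged by degree of structure and storage cost.
• RDBMS (OLTP, EDW): highly structured data, queried with SQL, R, etc.
• NoSQL (MongoDB, Couchbase, HBase, Cassandra): semi-structured and self-describing data
• Hadoop (analytical): no structure; processing in files with MapReduce over generic file formats; rows/columns in files (tables) with Hive, Pig, etc.; query with Impala, Hive, Drill
• Disk and storage: bytes and blocks
• Operational big data: Couchbase with N1QL
• Cost labels shown in the figure: $100K to $200K/TB, $10K to $20K/TB, $10K/TB, $1K/TB]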