Couchbase 2.0 and Indexing/Querying
![Page 1: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/1.jpg)
Couchbase Server 2.0 Indexing and Querying
Quick Overview
Chris Anderson, Cofounder and Architect
![Page 2: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/2.jpg)
Couchbase Server 2.0 – Webinar series
www.couchbase.com/webinars
![Page 3: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/3.jpg)
• Vibrant and growing community focused
on distributed database technology
• Supports both key-value and
document-oriented use cases
• Community contributes via core server
and client development, connectors and
plugins, usability feedback, docs and
forums, etc.
• All project components available under
the Apache License
• Packaged software available from Couchbase, Inc
• Community and enterprise editions
Couchbase NoSQL Open Source Technology
Open Source Project
![Page 4: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/4.jpg)
What we’ll talk about
• View basics
• Lifecycle of a view
  – Index definition, build, and query phase
• Indexing details
  – Replica indexes, failover and compaction
• Primary and secondary indexes
• View best practices
• Additional patterns
![Page 5: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/5.jpg)
JSON Documents
• Map more closely to objects or entities
• CRUD operations, lightweight schema
• Stored under an identifier key

{
  "fields" : ["with basic types", 3.14159, true],
  "like" : "your favorite language"
}

client.set("mydocumentid", myDocument);
mySavedDocument = client.get("mydocumentid");
![Page 6: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/6.jpg)
What are Views?
• Extract fields from JSON documents and produce an index of the selected information
![Page 7: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/7.jpg)
Views – The basics
• Define materialized views on JSON documents and then query across the data set
• Using views you can define
  – Primary indexes
  – Simple secondary indexes (most common use case)
  – Complex secondary, tertiary and composite indexes
  – Aggregations (reduction)
• Views are eventually indexed
• Queries are eventually consistent with respect to documents
• Built using Map/Reduce technology
  – Map and Reduce functions are written in JavaScript
![Page 8: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/8.jpg)
View Lifecycle
Define -> Build -> Query
![Page 9: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/9.jpg)
Buckets & Design docs & Views
• Create design documents on a bucket
• Create views within a design document

[Diagram]
BUCKET 1
  Design document 1: View 1, View 2, View 3
  Design document 2: View 4, View 5
  Design document 3: View 6, View 7
BUCKET 2
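
Note: a design document can be pictured as a JSON object whose `views` member maps view names to map (and optional reduce) function sources. The sketch below assembles one in JavaScript. The design document name `beer`, the view names, and the document fields are hypothetical and chosen only for illustration.

```javascript
// Sketch: one design document grouping two views.
// The names "beer", "by_name" and "by_brewery" are hypothetical.
const designDoc = {
  views: {
    by_name: {
      // Map functions are stored as source strings.
      map: "function (doc, meta) { if (doc.name) { emit(doc.name, null); } }"
    },
    by_brewery: {
      map: "function (doc, meta) { if (doc.brewery_id) { emit(doc.brewery_id, null); } }",
      reduce: "_count" // one of the built-in reduce functions
    }
  }
};

// Serialized body that would be stored as design document "_design/beer".
const body = JSON.stringify(designDoc, null, 2);
console.log(body);
```

Because all views in a design document are built together, the grouping chosen here also decides which views are updated at the same time.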
![Page 10: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/10.jpg)
Eventually indexed Views – Data flow
[Diagram: an App Server writes Doc 1 to a Couchbase Server node. Labeled components: Managed Cache, Replication Queue (to other node), Disk Queue, Disk, View engine.]
Distributed Indexing and Querying
• Indexing work is distributed amongst nodes
• Parallelize the effort
• Each node has index for data stored on it
• Queries combine the results from required nodes
[Diagram: Couchbase Server cluster of three servers (user configured replica count = 1), each holding active and replica documents. Two app servers, each with a Couchbase client library and cluster map, send "Create Index / View" and "Query" requests to the cluster.]
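
Note: the "queries combine the results from required nodes" step can be sketched as a scatter-gather over per-node indexes. This is an in-memory illustration only, not the server's merge logic. The node names and rows are hypothetical, and keys are compared with plain JavaScript string comparison rather than the server's collation.

```javascript
// Each node indexes only the documents it stores (hypothetical data).
const nodeIndexes = {
  server1: [{ key: "Boston", id: "doc2" }, { key: "Seattle", id: "doc5" }],
  server2: [{ key: "Austin", id: "doc1" }, { key: "Juneau", id: "doc3" }],
  server3: [{ key: "Denver", id: "doc4" }, { key: "Portland", id: "doc6" }]
};

// "Scatter": each node answers from its own local index.
function queryNode(rows, startkey, endkey) {
  return rows.filter((r) => r.key >= startkey && r.key <= endkey);
}

// "Gather": combine the partial results and restore key order.
function queryCluster(indexes, startkey, endkey) {
  const partials = Object.values(indexes).map((rows) =>
    queryNode(rows, startkey, endkey)
  );
  return partials
    .flat()
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

const rows = queryCluster(nodeIndexes, "B", "P");
console.log(rows.map((r) => r.key));
```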
![Page 12: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/12.jpg)
DEFINE Index / View Definition in JavaScript
CREATE INDEX City ON Brewery.City;
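
Note: the JavaScript definition shown on this slide did not survive in the transcript. A map function roughly equivalent to the SQL-style statement above might look like the sketch below. The `type` and `city` field names are assumptions, and the `emit` function and loop are a local stand-in for the view engine so the snippet can run on its own.

```javascript
// Local stand-in for the view engine: collects emitted rows.
const index = [];
let currentId = null;
function emit(key, value) {
  index.push({ id: currentId, key, value });
}

// Hypothetical map function: index brewery documents by city.
function map(doc, meta) {
  if (doc.type === "brewery" && doc.city) {
    emit(doc.city, null);
  }
}

// Hypothetical documents keyed by document ID.
const docs = {
  brewery_1: { type: "brewery", city: "Juneau" },
  brewery_2: { type: "brewery", city: "Boston" },
  beer_1: { type: "beer", name: "Pale Ale" }
};

for (const [id, doc] of Object.entries(docs)) {
  currentId = id;
  map(doc, { id });
}

// A view index is kept in key order.
index.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
console.log(index);
```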
![Page 13: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/13.jpg)
BUILD Distributed Index Build Phase
• Optimized for lookups, in-order access and aggregations
• View reads are from disk (different performance profile than GET/SET)
• Views built against every document on every node
  – Group them in a design document
• Views are automatically kept up to date
![Page 14: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/14.jpg)
QUERY Dynamic Queries with Optional Aggregation
Query ?startkey="J"&endkey="K"
{"rows":[{"key":"Juneau","value":null}]}
• Eventually consistent with respect to document updates
• Efficiently fetch a document or group of similar documents
• Queries will use cached values from B-tree inner nodes when possible
• Take advantage of in-order tree traversal with group_level queries
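
Note: a range query such as `?startkey="J"&endkey="K"` amounts to an in-order scan over a key-sorted index. The sketch below imitates that with a sorted array and a binary search. The rows are hypothetical, and the real index is a B-tree on disk rather than an array.

```javascript
// Key-sorted index rows (hypothetical).
const index = [
  { key: "Boston", value: null },
  { key: "Juneau", value: null },
  { key: "Portland", value: null }
];

// Binary search for the first position whose key is >= target.
function lowerBound(rows, target) {
  let lo = 0;
  let hi = rows.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (rows[mid].key < target) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

// Scan forward from startkey until a key exceeds endkey.
function rangeQuery(rows, startkey, endkey) {
  const result = [];
  for (let i = lowerBound(rows, startkey); i < rows.length && rows[i].key <= endkey; i++) {
    result.push(rows[i]);
  }
  return { rows: result };
}

console.log(JSON.stringify(rangeQuery(index, "J", "K")));
```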
![Page 15: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/15.jpg)
Index building details
– All the views within a design document are incrementally updated when the view is accessed or auto-indexing kicks in
– Automatic view updates
  • In addition to forcing an index build at query time, active & replica indexes are updated every 3 seconds of inactivity if there are at least 5000 new changes (configurable)
– The entire view is recreated if the view definition has changed
– Views can be conditionally updated by specifying the "stale" argument to the view query
– The index information stored on disk consists of the combination of both the key and value information defined within your view.
![Page 16: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/16.jpg)
Queries run against stale indexes by default
• stale=update_after (default if nothing is specified)
  – always get fastest response
  – can take two queries to read your own writes
• stale=ok
  – auto update will trigger eventually
  – might not see your own writes for a few minutes
  – least frequent updates -> least resource impact
• stale=false
  – Use with "set with persistence" if data needs to be included in view results
  – BUT be aware of delay it adds, only use when really required
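
Note: the `stale` argument is passed as a query-string parameter. The helper below is a small sketch that builds a view query URL with an explicit `stale` value. The host, bucket, design document and view names are hypothetical, and the port and path shape follow the view REST API as commonly documented for Couchbase Server 2.0, so check them against your server version.

```javascript
// The three values named on the slide.
const ALLOWED_STALE = ["update_after", "ok", "false"];

// Hypothetical helper: build a view query URL with an explicit stale setting.
function viewUrl({ host, bucket, designDoc, view, stale = "update_after", params = {} }) {
  if (!ALLOWED_STALE.includes(stale)) {
    throw new Error(`unsupported stale value: ${stale}`);
  }
  const query = new URLSearchParams({ ...params, stale });
  return `http://${host}:8092/${bucket}/_design/${designDoc}/_view/${view}?${query}`;
}

// stale=false asks the server to update the index before answering.
const url = viewUrl({
  host: "localhost",
  bucket: "beer-sample",
  designDoc: "beer",
  view: "by_name",
  stale: "false"
});
console.log(url);
```

Making `update_after` the default in the helper mirrors the server default, so callers opt in to the slower `stale=false` only where they need it.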
![Page 17: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/17.jpg)
Views and Replica indexes
• In addition to replicas for data (up to 3 copies), optionally create replicas for indexes
• Each node manages replica index data structures
• Set at a bucket level
• Replica index populated from replica data
• Replica index is used after a failover
![Page 18: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/18.jpg)
Views and failover
• Replica indexes enabled on failover
• Replica indexes are rebuilt on replica nodes
  – Automatically incrementally built based on replica data
  – Updated every 3 seconds of inactivity if there are at least 5000 new changes
  – Not copied/moved, so they stay consistent with persisted replica data
![Page 19: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/19.jpg)
View Compaction
• Compaction is ONLINE
• Reclaims empty allocated space from disk
• Indexes are stored on disk for active vBuckets on each node and updated in append-only manner
• Auto-compaction performed in the background
  – Set the database fragmentation levels
  – Set the index fragmentation levels
  – Choose a schedule
  – Global and bucket specific settings
![Page 20: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/20.jpg)
Development vs. Production Views
• Development views index a subset of the data.
• Publishing a view builds the index across the entire cluster.
• Queries on production views are scattered to all cluster members and results are gathered and returned to the client.
![Page 21: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/21.jpg)
Simple Primary and Secondary Indexing
![Page 22: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/22.jpg)
Example Document
Document ID
![Page 23: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/23.jpg)
Define a primary index on the bucket
• Lookup the document ID / key by key, range, prefix, suffix
Index definition
![Page 24: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/24.jpg)
Define a secondary index on the bucket
• Lookup an attribute by value, range, prefix, suffix
Index definition
![Page 25: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/25.jpg)
Find documents by a specific attribute
• Let's find beers by brewery_id!
![Page 26: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/26.jpg)
The index definition
Key, Value
![Page 27: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/27.jpg)
The result set: beers keyed by brewery_id
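
Note: the index definition and result set on these slides were images and are not reproduced in the transcript. The sketch below shows one way such a view could be defined and what a lookup by key might return. The map function, the document fields, and the documents themselves are assumptions, and `emit` plus the loop stand in for the view engine.

```javascript
// Local stand-in for the view engine: collects emitted rows.
const rows = [];
let currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key, value });
}

// Hypothetical map function: index beer documents by brewery_id.
function map(doc, meta) {
  if (doc.type === "beer" && doc.brewery_id) {
    emit(doc.brewery_id, null);
  }
}

// Hypothetical documents keyed by document ID.
const docs = {
  beer_1554: { type: "beer", name: "1554 Enlightened Black Ale", brewery_id: "new_belgium" },
  beer_fat_tire: { type: "beer", name: "Fat Tire", brewery_id: "new_belgium" },
  beer_anchor_steam: { type: "beer", name: "Anchor Steam", brewery_id: "anchor" },
  new_belgium: { type: "brewery", name: "New Belgium Brewing" }
};

for (const [id, doc] of Object.entries(docs)) {
  currentId = id;
  map(doc, { id });
}

// Equivalent of querying the view with ?key="new_belgium":
// each row carries the document ID, the emitted key and the emitted value.
const result = {
  total_rows: rows.length,
  rows: rows.filter((r) => r.key === "new_belgium")
};
console.log(JSON.stringify(result, null, 2));
```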
![Page 28: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/28.jpg)
View Best Practices
![Page 29: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/29.jpg)
function(doc, meta) {
  if (doc.ingredient) {
    emit(doc.ingredient.ingredtext, null);
  }
}
View writing guidance
• Move frequently used views out to a separate design document
  – All views in a design document are updated at the same time
  – This can result in increased index building time if all views are in a single design document, especially for frequently accessed views.
  – However, grouping views into a smaller number of design documents improves overall performance
• Try to avoid computing too many things with one view
• Use built-in reduces where possible - custom reduces are not optimized
• Check for attribute existence
![Page 30: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/30.jpg)
View writing guidance
function(doc, meta) {
  if (doc.type == "player")
    emit(doc.experience, null);
}
• Do not include the document in the view value
  – Instead either use the GET / SET API or the API that includes documents filtered by the query [example: willIncludeDocs()]
  – Emit either null or the ID instead (meta.id) in your key or value data
• Don't emit too much data into a view value
  – Use views to filter documents
  – Then use the data path to access the matched documents
• Use Document Types to make views more selective
emit(doc.name, null)
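
Note: the "filter with the view, then fetch through the data path" pattern can be sketched as follows. Both the bucket and the view rows are in-memory stand-ins with hypothetical data. A real application would issue the view query and the GET calls through its client library.

```javascript
// Stand-in for the bucket (key-value data path).
const bucket = new Map([
  ["player_1", { type: "player", name: "ann", experience: 120 }],
  ["player_2", { type: "player", name: "bob", experience: 45 }],
  ["item_1", { type: "item", name: "sword" }]
]);

// Rows as a view doing emit(doc.experience, null) on player documents
// would hold them: key plus document ID, no document body.
const viewRows = [
  { id: "player_2", key: 45, value: null },
  { id: "player_1", key: 120, value: null }
];

function playersWithExperienceAtLeast(min) {
  return viewRows
    .filter((row) => row.key >= min) // 1. filter using the small index rows
    .map((row) => bucket.get(row.id)); // 2. fetch full documents by ID
}

console.log(playersWithExperienceAtLeast(100));
```

Keeping the emitted value at null keeps the index small, and the full document is read only for rows that actually match.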
![Page 31: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/31.jpg)
What impact do views have on the system?
• Complexity of the index -> CPU
• Size of the value emitted and selectivity -> Disk size, I/O
• Replica index -> Disk size, I/O, CPU
• Number of design docs -> CPU, I/O, Disk size
  – 4 active and 2 replica design documents are built in parallel by default
  – Can be changed using the maxParallelIndexers and maxParallelReplicaIndexers parameters
• Compaction of views -> CPU, I/O
• Rebalance time -> Increases with views to support consistent query results during rebalance
  – Can be disabled using the indexAwareRebalanceDisabled parameter
![Page 32: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/32.jpg)
Views and OS caching
• File system cache availability for the index has a big impact on performance
• Indexes are disk based and should have sufficient file system cache available for faster query access
• In-house performance results show that by doubling system cache availability
  – query latency is halved
  – throughput increases by 50%
• Runs based on 10 million items with a 16GB bucket quota and 4GB, 8GB system RAM availability for indexes
![Page 33: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/33.jpg)
Query Pattern: Basic Aggregations
![Page 34: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/34.jpg)
Use a built-in reduce function with a group query
• Let's find average abv for each brewery!
![Page 35: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/35.jpg)
We are reducing doc.abv with _stats
![Page 36: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/36.jpg)
Group reduce (reduce by unique key)
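
Note: a group query over a `_stats` reduce returns one statistics object per unique key. The sketch below computes the same fields (sum, count, min, max, sumsqr) locally so the average can be derived as sum / count. The rows are hypothetical (key = brewery, value = abv), and the functions are stand-ins, not the server's implementation.

```javascript
// Rows as a map function doing emit(doc.brewery_id, doc.abv) might produce.
const rows = [
  { key: "anchor", value: 5.0 },
  { key: "new_belgium", value: 5.5 },
  { key: "new_belgium", value: 4.5 }
];

// Local stand-in for the built-in _stats reduce.
function stats(values) {
  return {
    sum: values.reduce((a, b) => a + b, 0),
    count: values.length,
    min: Math.min(...values),
    max: Math.max(...values),
    sumsqr: values.reduce((a, b) => a + b * b, 0)
  };
}

// Group reduce: one reduced value per unique key.
function groupReduce(inputRows) {
  const groups = new Map();
  for (const { key, value } of inputRows) {
    if (!groups.has(key)) {
      groups.set(key, []);
    }
    groups.get(key).push(value);
  }
  return [...groups].map(([key, values]) => ({ key, value: stats(values) }));
}

for (const { key, value } of groupReduce(rows)) {
  console.log(key, "average abv:", value.sum / value.count);
}
```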
![Page 37: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/37.jpg)
Query Pattern: Time-based Rollups
![Page 38: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/38.jpg)
Find patterns in beer comments by time
{
  "type": "comment",
  "about_id": "beer_Enlightened_Black_Ale",
  "user_id": 525,
  "text": "tastes like college!",
  "updated": "2010-07-22 20:00:20"
}
{ "id": "f1e62" }
timestamp
![Page 39: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/39.jpg)
Query with group_level=2 to get monthly rollups
![Page 40: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/40.jpg)
dateToArray() is your friend
• String or Integer based timestamps
• Output optimized for group_level queries
• array of JSON numbers: [2012,9,21,11,30,44]
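
Note: to show the shape of the array key, here is a local stand-in that converts a timestamp string into an array of numbers. It is not the server's dateToArray() implementation. The function name `timestampToArray` is hypothetical, and it assumes the "YYYY-MM-DD hh:mm:ss" format used in the comment document.

```javascript
// Stand-in for illustration only: split a timestamp into numeric parts
// [year, month, day, hour, minute, second].
function timestampToArray(ts) {
  const match = /^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})$/.exec(ts);
  if (!match) {
    throw new Error(`unrecognized timestamp: ${ts}`);
  }
  // match[0] is the whole string; the six capture groups follow it.
  return match.slice(1).map(Number);
}

console.log(timestampToArray("2010-07-22 20:00:20"));
```

With keys of this shape, coarser or finer rollups come from truncating the array rather than from defining a new index.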
![Page 41: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/41.jpg)
group_level=2 results
• Monthly rollup
• Sorted by time; sort the query results in your application if you want to rank by value (there is no chained map-reduce)
![Page 42: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/42.jpg)
group_level=3 - daily results - great for graphing
• Daily, hourly, minute or second rollup all possible with the same index.
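
Note: conceptually, group_level truncates each array key to its first N elements and reduces the rows that share the truncated key. The sketch below imitates that with a count over hypothetical, already key-sorted rows. It is a local illustration, not the server's B-tree reduction.

```javascript
// Rows as a map function emitting [year, month, day, hour, minute, second]
// keys might produce (hypothetical, already in key order).
const rows = [
  { key: [2010, 7, 22, 20, 0, 20], value: 1 },
  { key: [2010, 7, 23, 9, 15, 0], value: 1 },
  { key: [2010, 8, 1, 12, 0, 0], value: 1 }
];

// Count rows per truncated key.
function rollup(inputRows, groupLevel) {
  const counts = new Map();
  for (const { key } of inputRows) {
    const group = JSON.stringify(key.slice(0, groupLevel));
    counts.set(group, (counts.get(group) || 0) + 1);
  }
  return [...counts].map(([group, value]) => ({ key: JSON.parse(group), value }));
}

console.log(rollup(rows, 2)); // monthly rollup
console.log(rollup(rows, 3)); // daily rollup
```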
![Page 43: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/43.jpg)
Query Pattern: Leaderboard
![Page 44: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/44.jpg)
Aggregate value stored in a document
• Let's find the top-rated beers!
{
  "brewery": "New Belgium Brewing",
  "name": "1554 Enlightened Black Ale",
  "abv": 5.5,
  "description": "Born of a flood...",
  "category": "Belgian and French Ale",
  "style": "Other Belgian-Style Ales",
  "updated": "2010-07-22 20:00:20",
  "ratings": { "jchris": 5, "scalabl3": 4, "damienkatz": 1 },
  "comments": [ "f1e62", "6ad8c" ]
}
ratings
![Page 45: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/45.jpg)
Sort each beer by its average rating
• Let's find the top-rated beers!
average
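
Note: the map function on this slide was an image and is not in the transcript. One way to build such a leaderboard is to emit each beer's average rating as the key and read the index in descending order. The sketch below does the same locally. The `averageRating` helper and every beer other than the one shown on the earlier slide are hypothetical.

```javascript
// Hypothetical helper: average of the per-user ratings stored in a document.
function averageRating(doc) {
  const scores = Object.values(doc.ratings || {});
  if (scores.length === 0) {
    return null;
  }
  return scores.reduce((a, b) => a + b, 0) / scores.length;
}

const beers = [
  { name: "1554 Enlightened Black Ale", ratings: { jchris: 5, scalabl3: 4, damienkatz: 1 } },
  { name: "Hypothetical Pale Ale", ratings: { jchris: 4, scalabl3: 4 } },
  { name: "Unrated Stout" }
];

// A map function could emit(averageRating(doc), doc.name); here the same
// rows are built locally and read back in descending key order.
const leaderboard = beers
  .map((doc) => ({ key: averageRating(doc), value: doc.name }))
  .filter((row) => row.key !== null) // skip beers with no ratings
  .sort((a, b) => b.key - a.key);

console.log(leaderboard);
```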
![Page 46: Couchbase 2.0 and Indexing/Querying](https://reader033.fdocuments.in/reader033/viewer/2022052323/55878ff9d8b42a365d8b46f0/html5/thumbnails/46.jpg)
Questions?