Couchbase_UK_2013_App_Dev_with_Indexes_Queries_Geo


Transcript of Couchbase_UK_2013_App_Dev_with_Indexes_Queries_Geo

App Development with Indexes, Queries and Geo

Michael Nitschinger

Developer Advocate

Agenda

• Introduction to Indexing and Querying in Couchbase

• The lifecycle of Couchbase Views

• Indexing and Querying with related documents

• Patterns

Indexing and Querying

Couchbase Server 2.0: Views

• Views can cover a few different use cases
  - Simple secondary indexes (the most common)
  - Complex secondary, tertiary and composite indexes
  - Aggregation functions (reduction)
    - Example: count the number of North American Ales
  - Organizing related data

• Built using Map/Reduce
  - Map function creates a matrix from document fields
  - Reduce function summarizes (reduces) information
  - Written in JavaScript and executed by the fast Google V8 engine
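The map/reduce split can be illustrated with a minimal local simulation in plain JavaScript (Node.js). It assumes hypothetical sample documents and a hand-written `emit` collector standing in for the one the view engine provides. Only the shape of the map function, `function (doc, meta) { ... emit(key, value); }`, follows a Couchbase 2.0 view; `countForKey` is a hypothetical stand-in for querying with `key="North American Ale"` against the built-in `_count` reduce.

```javascript
// Hypothetical sample documents, loosely modelled on the beer-sample bucket.
const docs = [
  { type: "beer", name: "Pale One", category: "North American Ale" },
  { type: "beer", name: "Dark Two", category: "Irish Ale" },
  { type: "beer", name: "Hop Three", category: "North American Ale" },
  { type: "brewery", name: "Some Brewery" },
];

// Stand-in for the emit() that the view engine provides: collect index rows.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map (ES5 syntax): one row per beer, keyed by its category.
function map(doc, meta) {
  if (doc.type === "beer" && doc.category) {
    emit(doc.category, null);
  }
}

// Stand-in for the built-in _count reduce over the rows matching one key.
function countForKey(allRows, key) {
  return allRows.filter((row) => row.key === key).length;
}

docs.forEach((doc, i) => map(doc, { id: "doc_" + i }));
console.log(countForKey(rows, "North American Ale")); // 2
```

The map step builds the rows; the reduce step only summarizes them, which is why a count stays cheap however many beers match.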

Querying from Views

Querying from Ruby Client

View Lifecycle: Define – Build – Query

View Definition (in JavaScript)

Roughly equivalent to the SQL: CREATE INDEX city ON brewery (city);
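A city view of this kind might be defined as below. The map function is a sketch assuming brewery documents carry `type` and `city` fields; the document IDs, sample data and the `emit`/`currentId` bookkeeping are hypothetical local stand-ins for the view engine, which records each row's source ID itself.

```javascript
// Hypothetical documents: three breweries with a city, one without, one beer.
const docs = [
  { id: "brewery_a", doc: { type: "brewery", name: "Brewery A", city: "Portland" } },
  { id: "brewery_b", doc: { type: "brewery", name: "Brewery B", city: "Juneau" } },
  { id: "beer_1", doc: { type: "beer", name: "Some Beer" } },
  { id: "brewery_c", doc: { type: "brewery", name: "Brewery C", city: "Boston" } },
  { id: "brewery_d", doc: { type: "brewery", name: "Brewery D" } },
];

// Stand-in for the engine's emit(): records rows with the source document id.
const index = [];
let currentId = null;
function emit(key, value) {
  index.push({ id: currentId, key: key, value: value });
}

// Map (ES5): index breweries by city, like CREATE INDEX city ON brewery (city).
function byCity(doc, meta) {
  if (doc.type === "brewery" && doc.city) {
    emit(doc.city, null);
  }
}

docs.forEach(({ id, doc }) => {
  currentId = id;
  byCity(doc, { id: id });
});

// The index is kept sorted by key (plain string order here, not Unicode collation).
index.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
console.log(index.map((row) => row.key + " <- " + row.id).join(", "));
// Boston <- brewery_c, Juneau <- brewery_b, Portland <- brewery_a
```

Documents without a city simply emit nothing, so they never appear in the index.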

Distributed Index Build Phase


• Optimized for lookups, in-order access and aggregations

• All view reads come from disk (a different performance profile from key-value reads)

• Views are built against every document on every node

• Automatically kept up to date (on writes and reads)

[Diagram: documents distributed across SERVER 1, SERVER 2 and SERVER 3; each server holds Active Docs and Replica Docs]

Dynamic Range Queries with Optional Aggregation

• Efficiently fetch a row or a group of related rows

• Queries use cached values from B-tree inner nodes when possible

• Take advantage of in-order tree traversal with group_level queries


?startkey="J"&endkey="K"

{"rows":[{"key":"Juneau","value":null}]}
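The range query above can be mimicked locally with a filter over a sorted index. This is a sketch: the rows are hypothetical, plain JavaScript string comparison stands in for Couchbase's Unicode collation, and a linear filter stands in for the B-tree range scan the server performs. It assumes `endkey` is inclusive, which is the default.

```javascript
// A sorted index like the one the city view produces (hypothetical rows).
const index = [
  { id: "brewery_1", key: "Boston", value: null },
  { id: "brewery_2", key: "Juneau", value: null },
  { id: "brewery_3", key: "Kansas City", value: null },
  { id: "brewery_4", key: "Portland", value: null },
];

// Range scan: rows with startkey <= key <= endkey (endkey inclusive by default).
function rangeQuery(rows, startkey, endkey) {
  return rows.filter((row) => row.key >= startkey && row.key <= endkey);
}

const result = rangeQuery(index, "J", "K").map(({ key, value }) => ({ key, value }));
console.log(JSON.stringify({ rows: result }));
// {"rows":[{"key":"Juneau","value":null}]}
```

Note that "Kansas City" is not returned: it sorts after the endkey "K", so this query selects keys from "J" up to exactly "K", not every key starting with K.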

Queries run against stale indexes by default

• stale = UPDATE_AFTER (default if nothing is specified)
  - always get the fastest response
  - can take two queries to read your own writes

• stale = OK
  - auto update will trigger eventually
  - might not see your own writes for a few minutes
  - least frequent index updates, so the least resource impact

• stale = FALSE
  - Use together with a persistence observe if your latest writes must be included in view results
  - But be aware of the delay it adds; only use it when really required
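On the wire, the three modes are passed as the `stale` query parameter (`update_after`, `ok`, `false`), and key parameters such as `startkey` are JSON-encoded before URL encoding. The helper `viewQueryString` and the `STALE` constant below are hypothetical; the sketch only builds the query string with Node's built-in `URLSearchParams` and sends nothing.

```javascript
// The three stale modes described above, as lowercase query parameter values.
const STALE = { UPDATE_AFTER: "update_after", OK: "ok", FALSE: "false" };

// Hypothetical helper: builds the query string for a view request. Key-valued
// parameters are JSON-encoded first (a string key keeps its quotes), then URL-encoded.
function viewQueryString(options) {
  const params = new URLSearchParams();
  if (options.startkey !== undefined) {
    params.set("startkey", JSON.stringify(options.startkey));
  }
  if (options.endkey !== undefined) {
    params.set("endkey", JSON.stringify(options.endkey));
  }
  // Omitting stale means update_after; it is set explicitly here for clarity.
  params.set("stale", options.stale || STALE.UPDATE_AFTER);
  return "?" + params.toString();
}

// A read-your-own-writes query (pair it with a persistence observe on the write).
console.log(viewQueryString({ startkey: "J", endkey: "K", stale: STALE.FALSE }));
// ?startkey=%22J%22&endkey=%22K%22&stale=false
console.log(viewQueryString({})); // ?stale=update_after
```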

Development vs. Production Views

• Development views index a subset of the data.

• Publishing a view builds the index across the entire cluster.

• Queries on production views are scattered to all cluster members and results are gathered and returned to the client.
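The gather step can be pictured as a k-way merge: each node returns its part of the result already sorted by key, and the node answering the query interleaves them into one globally ordered list. The per-node results below are hypothetical, and this sketch ignores what the real merge also handles (collation rules, limits, merging reduce values).

```javascript
// Hypothetical partial results from three nodes, each already sorted by key.
const nodeResults = [
  [{ key: "Boston", id: "brewery_1" }, { key: "Portland", id: "brewery_4" }],
  [{ key: "Juneau", id: "brewery_2" }],
  [{ key: "London", id: "brewery_3" }, { key: "Zurich", id: "brewery_5" }],
];

// Gather step: k-way merge that repeatedly takes the smallest head row.
function mergeSorted(lists) {
  const positions = lists.map(() => 0);
  const merged = [];
  while (true) {
    let best = -1;
    for (let i = 0; i < lists.length; i++) {
      const hasRow = positions[i] < lists[i].length;
      if (hasRow && (best === -1 || lists[i][positions[i]].key < lists[best][positions[best]].key)) {
        best = i;
      }
    }
    if (best === -1) break; // every list is exhausted
    merged.push(lists[best][positions[best]++]);
  }
  return merged;
}

console.log(mergeSorted(nodeResults).map((row) => row.key).join(", "));
// Boston, Juneau, London, Portland, Zurich
```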

Emergent Schema

JSON.org

GitHub API

"Capture the user's intent"

• Falls out of your key-value usage

• Helps to know what's efficient

• Deal with unstructured data more easily
  - Different schemas/APIs

Query Pattern: Basic Aggregations

Simple secondary Index

• Let's find the average abv for each brewery!

Aggregation: Reducing doc.abv with _stats

Group reduce (reduce by unique key)
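A local sketch of this pattern: the map emits `brewery_id` as key and `abv` as value, and a hand-written `stats` function imitates the shape of the built-in `_stats` result (`sum`, `count`, `min`, `max`, `sumsqr`). The field names follow the beer-sample style but the documents are hypothetical, and `groupReduce` is a hypothetical stand-in for querying with `group=true`. The average is computed on the client as `sum / count`.

```javascript
// Hypothetical beer documents (beer-sample style fields).
const beers = [
  { type: "beer", brewery_id: "new_belgium", abv: 5.0 },
  { type: "beer", brewery_id: "new_belgium", abv: 8.0 },
  { type: "beer", brewery_id: "21st_amendment", abv: 7.0 },
  { type: "brewery", name: "New Belgium Brewing" },
];

const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map (ES5): key = brewery, value = abv.
function map(doc, meta) {
  if (doc.type === "beer" && typeof doc.abv === "number") {
    emit(doc.brewery_id, doc.abv);
  }
}

// Stand-in for the _stats reduce over a list of numbers.
function stats(values) {
  return values.reduce(
    (acc, v) => ({
      sum: acc.sum + v,
      count: acc.count + 1,
      min: Math.min(acc.min, v),
      max: Math.max(acc.max, v),
      sumsqr: acc.sumsqr + v * v,
    }),
    { sum: 0, count: 0, min: Infinity, max: -Infinity, sumsqr: 0 }
  );
}

// group=true: one reduce result per unique key, in key order.
function groupReduce(sortedRows, reduceFn) {
  const groups = new Map();
  for (const row of sortedRows) {
    if (!groups.has(row.key)) groups.set(row.key, []);
    groups.get(row.key).push(row.value);
  }
  return Array.from(groups, ([key, values]) => ({ key: key, value: reduceFn(values) }));
}

beers.forEach((doc) => map(doc, {}));
rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
const grouped = groupReduce(rows, stats);
grouped.forEach((row) => console.log(row.key, row.value.sum / row.value.count));
// 21st_amendment 7
// new_belgium 6.5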

Query Pattern: Time Based Rollups

Find patterns in beer comments by time

{
  "type": "comment",
  "about_id": "beer_Enlightened_Black_Ale",
  "user_id": 525,
  "text": "tastes like college!",
  "updated": "2010-07-22 20:00:20"
}

Document metadata: {"id": "u525_c1"}

The "updated" field is the timestamp.

Query with group_level=2 to get monthly rollups

dateToArray() is your friend

• String or Integer based timestamps

• Output optimized for group_level queries

• array of JSON numbers: [2012,9,21,11,30,44]
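The monthly rollup can be sketched locally as follows. `dateToArray()` is built into the Couchbase view engine; the version below is only a stand-in so the code runs in Node.js, and it assumes string timestamps look like "YYYY-MM-DD HH:MM:SS" in UTC (the built-in's exact parsing and time-zone handling may differ). `groupLevelCount` is a hypothetical stand-in for a `group_level=2` query against a `_count` reduce, and the comment documents are hypothetical.

```javascript
// Stand-in for the built-in dateToArray(). Assumption: strings are
// "YYYY-MM-DD HH:MM:SS" in UTC; numbers are milliseconds since the epoch.
function dateToArray(ts) {
  const d = typeof ts === "string" ? new Date(ts.replace(" ", "T") + "Z") : new Date(ts);
  return [d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(),
          d.getUTCHours(), d.getUTCMinutes(), d.getUTCSeconds()];
}

// Hypothetical comment documents shaped like the one above.
const comments = [
  { type: "comment", about_id: "beer_x", updated: "2010-08-02 12:00:00" },
  { type: "comment", about_id: "beer_Enlightened_Black_Ale", updated: "2010-07-22 20:00:20" },
  { type: "comment", about_id: "beer_y", updated: "2010-07-30 09:15:00" },
];

const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map (ES5): the key is the date as an array, so key prefixes are year, month, day...
function map(doc, meta) {
  if (doc.type === "comment" && doc.updated) {
    emit(dateToArray(doc.updated), null);
  }
}

// Element-wise ordering of numeric array keys, as in the index.
function compareArrays(a, b) {
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return a.length - b.length;
}

// group_level=n: truncate each key to its first n elements, then count per group.
function groupLevelCount(sortedRows, level) {
  const groups = new Map();
  for (const row of sortedRows) {
    const prefix = JSON.stringify(row.key.slice(0, level));
    groups.set(prefix, (groups.get(prefix) || 0) + 1);
  }
  return Array.from(groups, ([prefix, count]) => ({ key: JSON.parse(prefix), value: count }));
}

comments.forEach((doc) => map(doc, {}));
rows.sort((a, b) => compareArrays(a.key, b.key));
console.log(JSON.stringify(groupLevelCount(rows, 2)));
// [{"key":[2010,7],"value":2},{"key":[2010,8],"value":1}]
```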

group_level=2 results


• Monthly rollup

• Sorted by time; if you want to rank by value, sort the query results in your application (there is no chained map-reduce)
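Ranking on the client is a plain sort of the returned rows. The row values below are hypothetical; the sketch copies the array first so the time-ordered result stays available.

```javascript
// Hypothetical rows from a group_level=2 query: ordered by key (time), not by value.
const monthly = [
  { key: [2012, 7], value: 14 },
  { key: [2012, 8], value: 31 },
  { key: [2012, 9], value: 22 },
];

// Rank months by value in the application, highest first.
const ranked = [...monthly].sort((a, b) => b.value - a.value);
console.log(ranked.map((row) => row.key.join("-")).join(", ")); // 2012-8, 2012-9, 2012-7
```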

Query Pattern: Leaderboard

Aggregate value stored in a document

• Let's find the top-rated beers!

{
  "brewery": "New Belgium Brewing",
  "name": "1554 Enlightened Black Ale",
  "style": "Other Belgian-Style Ales",
  "updated": "2010-07-22 20:00:20",
  "ratings": {
    "ingenthr": 5,
    "jchris": 4,
    "scalabl3": 5,
    "damienkatz": 1
  },
  "comments": ["f1e62", "6ad8c"]
}


Sort each beer by its average rating

• Let's find the top-rated beers!

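One way to turn the stored ratings into a leaderboard is to compute each beer's average in the map function and emit it as the key, so the index itself is ordered by rating; querying with `descending=true` and a `limit` then returns the top beers. This is a sketch: the first document's ratings come from the example above, the other beers are hypothetical, and the `emit` collector and sort stand in for the view engine.

```javascript
// The beer from the slide plus two hypothetical ones.
const beers = [
  { id: "beer_Enlightened_Black_Ale", doc: { name: "1554 Enlightened Black Ale",
      ratings: { ingenthr: 5, jchris: 4, scalabl3: 5, damienkatz: 1 } } },
  { id: "beer_hypothetical", doc: { name: "Hypothetical IPA", ratings: { a: 5, b: 4 } } },
  { id: "beer_unrated", doc: { name: "Unrated Stout", ratings: {} } },
];

const rows = [];
let currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}

// Map (ES5): key = average rating, value = beer name. Unrated beers emit nothing.
function map(doc, meta) {
  if (doc.ratings) {
    var total = 0;
    var n = 0;
    for (var user in doc.ratings) {
      total += doc.ratings[user];
      n += 1;
    }
    if (n > 0) {
      emit(total / n, doc.name);
    }
  }
}

beers.forEach(({ id, doc }) => {
  currentId = id;
  map(doc, { id: id });
});

// Stand-in for descending=true&limit=10: highest average first.
const top = rows.sort((a, b) => b.key - a.key).slice(0, 10);
top.forEach((row) => console.log(row.key, row.value));
// 4.5 Hypothetical IPA
// 3.75 1554 Enlightened Black Ale
```

Because the average is part of the key, it is recomputed whenever a document's ratings change, and no reduce is needed to rank.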

Query Pattern: Collation of Related Docs

Join Through Collation

See Bradley Holt’s presentation from CouchConf Boston: http://www.couchbase.com/couchconf-boston
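The idea behind joining through collation can be sketched as follows: a beer and its comments emit compound keys that share the beer's ID as the first element, with 0 for the beer and 1 for comments, so the sorted index places each beer directly before its comments. One range query then fetches both (in a real query, typically `startkey=["id"]` and `endkey=["id",{}]`, relying on objects sorting after other values). The documents are hypothetical apart from the comment text, and `compareKeys` is a simplified collation that assumes matching types at each position.

```javascript
// Hypothetical documents; the comment refers to its beer through about_id.
const docs = [
  { id: "beer_Enlightened_Black_Ale", doc: { type: "beer", name: "1554 Enlightened Black Ale" } },
  { id: "u525_c1", doc: { type: "comment", about_id: "beer_Enlightened_Black_Ale", text: "tastes like college!" } },
  { id: "beer_other", doc: { type: "beer", name: "Hypothetical IPA" } },
  { id: "u7_c1", doc: { type: "comment", about_id: "beer_other", text: "hoppy" } },
];

const rows = [];
let currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}

// Map (ES5): the beer sorts as [beerId, 0], each of its comments as [beerId, 1].
function map(doc, meta) {
  if (doc.type === "beer") {
    emit([meta.id, 0], doc.name);
  } else if (doc.type === "comment" && doc.about_id) {
    emit([doc.about_id, 1], doc.text);
  }
}

// Simplified array collation: compare element by element.
function compareKeys(a, b) {
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return a.length - b.length;
}

docs.forEach(({ id, doc }) => {
  currentId = id;
  map(doc, { id: id });
});
rows.sort((a, b) => compareKeys(a.key, b.key));

// Stand-in for the range query on the first key element.
const joined = rows.filter((row) => row.key[0] === "beer_Enlightened_Black_Ale");
console.log(joined.map((row) => row.value).join(" | "));
// 1554 Enlightened Black Ale | tastes like college!
```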

Anti-patterns

• Emitting the document or too much data into a view
  - Especially avoid including the doc itself in an emit() call

• Reduces that don't reduce
  - If you implement a custom reduce, make sure it doesn't expand!

• Expecting a query on an index to be as fast as a key-value get
  - Secondary indexes need to be built, are updated asynchronously, and are (currently) cached at the filesystem level

• Trying to do too much with one view
  - Instead, co-locate views in design documents, or have separate design documents

• Note that sometimes you may need to make requests of multiple views
  - There is no direct way to do a join, but there is a technique (see Join Through Collation)
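Two of these anti-patterns have a simple counterpart. The map emits only a key and a `null` value, because the full document can be fetched by the row's ID when needed. The custom reduce returns a single number no matter how many rows it sees, and handles `rereduce`, when its inputs are earlier reduce outputs rather than emitted values. The sample documents and the way the partial batches are split are hypothetical; the engine decides the batching.

```javascript
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map (ES5): emit just the key and a null value; emitting doc here would copy
// every document into the index.
function map(doc, meta) {
  if (doc.type === "beer") {
    emit(doc.name, null);
  }
}

// Custom reduce (ES5) whose output stays one number. On the first pass it counts
// rows; on rereduce its inputs are earlier counts, so it adds them up instead.
function reduce(key, values, rereduce) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += rereduce ? values[i] : 1;
  }
  return total;
}

const sampleDocs = [
  { type: "beer", name: "A" },
  { type: "beer", name: "B" },
  { type: "brewery", name: "C" },
];
sampleDocs.forEach((doc) => map(doc, {}));

// Hypothetical batching: reduce two partial batches, then rereduce the results.
const partialA = reduce(null, [null, null, null], false); // 3
const partialB = reduce(null, [null, null], false); // 2
console.log(reduce(null, [partialA, partialB], true)); // 5
```

A reduce that instead concatenated its values into a growing array would "expand", which is exactly the anti-pattern above.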

What about Geo?

• Experimental in the 2.0 release

• Currently being completely rewritten internally

• Supports GeoJSON; richer queries will be supported soon

• The Java SDK already contains Geo support!
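A spatial view emits a GeoJSON geometry as its key and is queried with a bounding box (as far as we understand the experimental 2.0 feature, a `bbox` parameter in west,south,east,north order). The sketch below assumes beer-sample style `geo.lat`/`geo.lng` fields; the coordinates are hypothetical, and `bboxQuery` is a naive local filter standing in for the server's spatial index, ignoring boxes that cross the antimeridian.

```javascript
// Hypothetical brewery documents with a beer-sample style "geo" field.
const breweries = [
  { id: "brewery_a", doc: { type: "brewery", geo: { lat: 40.6, lng: -105.1 } } },
  { id: "brewery_b", doc: { type: "brewery", geo: { lat: 51.5, lng: -0.1 } } },
  { id: "brewery_c", doc: { type: "brewery" } },
];

const rows = [];
let currentId = null;
function emit(geometry, value) {
  rows.push({ id: currentId, geometry: geometry, value: value });
}

// Spatial map (ES5): emit a GeoJSON Point. GeoJSON order is [longitude, latitude].
function spatial(doc, meta) {
  if (doc.type === "brewery" && doc.geo) {
    emit({ type: "Point", coordinates: [doc.geo.lng, doc.geo.lat] }, null);
  }
}

// Naive bounding-box filter; bbox = [west, south, east, north].
function bboxQuery(allRows, bbox) {
  const [west, south, east, north] = bbox;
  return allRows.filter((row) => {
    const [lng, lat] = row.geometry.coordinates;
    return lng >= west && lng <= east && lat >= south && lat <= north;
  });
}

breweries.forEach(({ id, doc }) => {
  currentId = id;
  spatial(doc, { id: id });
});
console.log(bboxQuery(rows, [-125, 24, -66, 50]).map((row) => row.id).join(", ")); // brewery_a
```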

Couchbase Integration

Integration with ElasticSearch

ElasticSearch

1. ElasticSearch Query

2. ElasticSearch Result

3. Couchbase Multi-GET

4. Couchbase Result
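The four steps can be sketched with in-memory stand-ins: a hypothetical search index that returns matching document IDs (playing the part of ElasticSearch), and a `Map` playing the part of a Couchbase bucket. Neither `search` nor `multiGet` is a real client API. The point of the design is that the search engine only needs to answer "which IDs match?", while Couchbase stays the source of the full documents.

```javascript
// Stand-in for a Couchbase bucket: document ID -> JSON document (not a client API).
const bucket = new Map([
  ["beer_a", { name: "1554 Enlightened Black Ale", style: "Other Belgian-Style Ales" }],
  ["beer_b", { name: "Hypothetical IPA", style: "American-Style India Pale Ale" }],
]);

// Stand-in for the full-text index kept in ElasticSearch (IDs plus indexed text).
const searchIndex = [
  { id: "beer_a", text: "belgian black ale" },
  { id: "beer_b", text: "hoppy india pale ale" },
  { id: "beer_gone", text: "belgian tripel" }, // deleted from Couchbase since indexing
];

// Steps 1 and 2: full-text query; the search engine returns matching document IDs.
function search(term) {
  return searchIndex.filter((entry) => entry.text.includes(term)).map((entry) => entry.id);
}

// Steps 3 and 4: multi-get the full documents by ID, skipping IDs that no longer exist.
function multiGet(ids) {
  return ids.filter((id) => bucket.has(id)).map((id) => ({ id: id, doc: bucket.get(id) }));
}

const results = multiGet(search("belgian"));
console.log(results.map((r) => r.doc.name).join(", ")); // 1554 Enlightened Black Ale
```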

The Learning Portal

• Designed and built as a collaboration between MHE Labs and Couchbase

• Serves as proof-of-concept and testing harness for Couchbase + ElasticSearch integration

• Available for download and further development as open source code

https://github.com/couchbaselabs/learningportal

Integration with Hadoop

[Diagram: Logs, a Couchbase Server Cluster, a Hadoop Cluster and an Ad Targeting Platform, connected by sqoop import, sqoop export and a Flume flow]

Summary

Couchbase has Views for Indexing and Querying
Views are incremental map-reduce code that runs across all documents.

Views Allow Common Methods of Querying
Common patterns such as simple secondary indexes, count and average aggregations, and time series rollups are simple and fast.

Couchbase Integrates for Full Text and Large Analytics
Couchbase integrates with ElasticSearch, Hadoop and other systems.

Q&A

Thanks!