EVOLVE'14 | Enhance | Justin Edelson & Darin Kuntze | Demystifying Oak Search

DEMYSTIFYING OAK SEARCH

Presented by Justin Edelson & Darin Kuntze | Adobe

AGENDA

• Oak Query Implementation
• Cost Calculation
• Oak Index Implementations

CAVEAT

Covers Oak 1.0.5 (AEM 6.0 SP1)

WHY SHOULD YOU CARE?

• Search is the most significant change for AEM developers between CRX2 and Oak.

WHY SHOULD YOU CARE?

CRX2 Search – Limited Optimization Opportunities

• Baseline Search Performance – OK
• No “Plan” Output
• Single Index
• Minimal Configuration

WHY SHOULD YOU CARE?

Oak Search – Many Optimization Opportunities

• Baseline Performance – Slow
• Viewable Plan
• Different Index Types

OAK QUERY IMPLEMENTATION OVERVIEW

EXAMPLE

/jcr:root/content/geometrixx/en/products//element(*, nt:unstructured)[@sling:resourceType = 'geometrixx/components/title' and @jcr:title = 'Triangle']

SEEING THE PLAN

Oak supports an “explain” query prefix, similar to what many RDBMSs support.

explain /jcr:root/content/geometrixx/en/products//element(*, nt:unstructured)[@sling:resourceType = 'geometrixx/components/title']

The plan shows you which index was used. To read it from the query result:

queryResult.getRows().nextRow().getValue("plan")

SEEING THE PLAN – EXPLAIN QUERY TOOL

[Screenshot: Explain Query tool, with the Plan and the Explanation labeled]

INDEX DEFINITIONS

• Stored in the repository as nodes under /oak:index
• Node type is oak:QueryIndexDefinition
• Single mandatory property – “type”
• Optional generic properties:
  • async – set to “async” to do index updates asynchronously
  • reindex – set to true to trigger a reindex
  • declaringNodeTypes – one or more node types to restrict indexing
  • entryCount – used to weight indexes
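
The definition properties listed on this slide can be pictured as a simple mapping. The following is a minimal Python sketch, not Oak code: the helper name `validate_index_definition`, the checks it performs, and the example values (including the `property` index type and the `nt:unstructured` restriction) are illustrative assumptions based only on the bullets above.

```python
# Illustrative only: models an /oak:index child node as a plain dict.
# The helper name and the specific checks are assumptions, not Oak's API.

def validate_index_definition(definition):
    """Return a list of problems found in a toy index definition."""
    problems = []
    if definition.get("jcr:primaryType") != "oak:QueryIndexDefinition":
        problems.append("node type must be oak:QueryIndexDefinition")
    if "type" not in definition:
        problems.append("mandatory property 'type' is missing")
    # Per the slide, async is either absent (sync) or set to "async".
    if "async" in definition and definition["async"] != "async":
        problems.append("'async' should be set to 'async' when present")
    return problems


example = {
    "jcr:primaryType": "oak:QueryIndexDefinition",
    "type": "property",                         # the single mandatory property
    "declaringNodeTypes": ["nt:unstructured"],  # optional: restrict indexing
    "entryCount": 10000,                        # optional: weights the index
    "reindex": True,                            # optional: triggers a reindex
}

print(validate_index_definition(example))           # no problems expected
print(validate_index_definition({"async": "yes"}))  # three problems expected
```

In a real repository these would be JCR properties on a node under /oak:index; the dict is only a convenient way to see which properties are mandatory and which are optional.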

SYNC VS. ASYNC INDEX

• Sync indexes (the default) update in the context of a save() call.
• Async indexes do not.
  • Every 5 seconds, the diff between the last successfully indexed state and the HEAD state is read and used to update the index.
  • CONSEQUENCE – async indexes may not return up-to-date results.
• The OOTB ordered and Lucene indexes are defined as async.
• All external indexes (e.g. Solr) should also be async.
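
To make the staleness consequence concrete, here is a toy Python model, offered as a sketch only. The `ToyRepository` class and its method names are hypothetical and do not correspond to Oak APIs; the background cycle is triggered by hand rather than by a 5-second timer, and node removal is not modeled.

```python
# Toy model only: contrasts an index updated inside save() with one that is
# updated later from the diff between the last indexed state and HEAD.
# Class and method names are hypothetical, not Oak APIs.

class ToyRepository:
    def __init__(self):
        self.head = {}           # path -> property value (HEAD state)
        self.last_indexed = {}   # state as of the last successful async run
        self.sync_index = {}     # value -> set of paths, updated on save()
        self.async_index = {}    # value -> set of paths, updated in background

    @staticmethod
    def _reindex_path(index, path, old_value, new_value):
        if old_value is not None:
            index.get(old_value, set()).discard(path)
        index.setdefault(new_value, set()).add(path)

    def save(self, path, value):
        """Write a property; only the sync index is updated here."""
        old_value = self.head.get(path)
        self.head[path] = value
        self._reindex_path(self.sync_index, path, old_value, value)

    def run_async_cycle(self):
        """Stand-in for the periodic (every 5 seconds) async update."""
        for path, value in self.head.items():
            old_value = self.last_indexed.get(path)
            if old_value != value:  # only changed paths are touched
                self._reindex_path(self.async_index, path, old_value, value)
        self.last_indexed = dict(self.head)


repo = ToyRepository()
repo.save("/content/a", "Triangle")
print(repo.sync_index.get("Triangle", set()))   # already contains /content/a
print(repo.async_index.get("Triangle", set()))  # still empty: index is stale
repo.run_async_cycle()
print(repo.async_index.get("Triangle", set()))  # now contains /content/a
```

A query answered from the async index between `save()` and the next cycle would miss the new node, which is the behavior the slide warns about.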

VIEWING CURRENT INDEXES

VIEWING INDEX CONTENT

• Many indexes store their content in the repository, but hidden.
• It cannot be viewed using CRXDE Lite.
• You must use oak-run:
  • TarMK – use either the “explore” (GUI) or “console” (CLI) command
  • MongoMK – use the “console” command
    • Vote for OAK-2096 to get “explore” support working for MongoMK

CREATING AN INDEX

• Created as content via CRXDE Lite / deployed using a content package
• Created through code
• Created through configuration

WHEN SHOULD YOU REINDEX?

• When the configuration changes
  • For example, changing the declaringNodeTypes
  • But not the entryCount
• (Sometimes) after updating Oak
  • Check the Release Notes; this should be prominently indicated.
• But not arbitrarily…
  • Reindexing is a resource-intensive process.
  • Reindexing will NOT help query performance.

COST CALCULATION

• Each index calculates a relative cost for the query
• Number between 0 and Infinity
  • 0 = “Pick me!”
  • Infinity = “Don’t pick me!”

DEBUGGING COST CALCULATION

• Per index type cost: enable DEBUG logging on org.apache.jackrabbit.oak.query.QueryImpl
• Detailed property cost: enable DEBUG logging on org.apache.jackrabbit.oak.plugins.index.property.PropertyIndex
• Detailed ordered property cost: enable DEBUG logging on org.apache.jackrabbit.oak.plugins.index.property.OrderedPropertyIndex
• Detailed Lucene cost: enable DEBUG logging on org.apache.jackrabbit.oak.plugins.index.lucene

SAMPLE DEBUG OUTPUT

Query = /jcr:root/content/geometrixx/en/products//element(*, nt:unstructured)[@sling:resourceType = 'geometrixx/components/title' and @jcr:title = 'Triangle']

cost for aggregate lucene is Infinity
cost for reference is Infinity
cost for ordered is Infinity
cost for nodeType is Infinity
property cost for sling:resourceType is 10003.0
property cost for jcr:title is Infinity
Cheapest property cost is 10003.0 for property sling:resourceType
cost for property is 10003.0
cost for traverse is 199996.0
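
Reading the output above, the property index reports the lowest finite cost. As a small sketch of that selection step, assuming the engine simply picks the index with the lowest reported cost (the `cheapest_index` function is a hypothetical name, not an Oak API):

```python
import math

# Costs copied from the sample debug output above.
costs = {
    "aggregate lucene": math.inf,
    "reference": math.inf,
    "ordered": math.inf,
    "nodeType": math.inf,
    "property": 10003.0,
    "traverse": 199996.0,
}


def cheapest_index(costs_by_index):
    """Return the (name, cost) pair with the lowest cost."""
    return min(costs_by_index.items(), key=lambda item: item[1])


name, cost = cheapest_index(costs)
print(name, cost)  # property 10003.0
```

Traversal always has a finite cost here, so it acts as the fallback when every index reports Infinity.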

SAMPLE DEBUG OUTPUT

Query = /jcr:root/content/geometrixx/en/products//element(*, nt:unstructured)[@sling:resourceType = 'geometrixx/components/title' and @type='large']

cost for aggregate lucene is Infinity
cost for reference is Infinity
cost for ordered is Infinity
cost for nodeType is Infinity
property cost for sling:resourceType is 10003.0
property cost for type is 21.0
Cheapest property cost is 21.0 for property type
cost for property is 21.0
cost for traverse is 199996.0

INDEX IMPLEMENTATIONS

• Index types you can create new instances of:
  • Property
  • Ordered Property
  • Solr
  • Lucene
• Index types you shouldn’t create:
  • Reference
  • Node Type
• And then there is a special one:
  • Traversing

PROPERTY INDEX

• Stores node paths indexed by a particular property value
  • Example: /oak:index/slingResourceType
• Can be unique (unique = true)
  • Examples: rep:principalName & jcr:uuid
• Only usable with sync indexes

PROPERTY INDEX – IN OAK EXPLORER

PROPERTY INDEX – IN OAK EXPLORER

A Match!

PROPERTY INDEX - UNIQUE

PROPERTY INDEX – COST CALCULATION

Generalized cost calculation:

Cost per Execution + (Estimated Matches * Cost per Entry)

• Cost per Execution – 2
• Cost per Entry – 1
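
The formula can be tried out with a few numbers. This is a minimal sketch using the two constants from the slide; the function name and the match counts are made up for illustration.

```python
# Constants taken from the slide; everything else is illustrative.
COST_PER_EXECUTION = 2.0
COST_PER_ENTRY = 1.0


def property_index_cost(estimated_matches):
    """Cost per Execution + (Estimated Matches * Cost per Entry)."""
    return COST_PER_EXECUTION + estimated_matches * COST_PER_ENTRY


for matches in (0, 19, 100):
    print(matches, property_index_cost(matches))
# 0 2.0
# 19 21.0
# 100 102.0
```

An estimate of 19 matches gives 21.0, the same number reported for the type property in the sample debug output, although the slides do not state what the actual estimate was.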

PROPERTY INDEX – ESTIMATING MATCHES

• For name=value queries (e.g. [@sling:resourceType='foundation/components/text']), including lists:
  • If entry count is provided, the estimated cost is entry count / key count + number of values in the query
    • Key count defaults to entry count / 10000, but can be manually specified
  • Otherwise, counts up to 100 matches across the first three values.
    • If > 100 matches, estimated matches are 1.1 ^ (the average depth of matches)
    • If > 3 values, estimated matches are extrapolated from the first three values.
• For exists queries (e.g. [@sling:resourceType]):
  • If entry count is provided, it is the estimated count.
  • Otherwise, counts up to 100 matches across all values.
    • If > 100 matches, estimated matches are 1.1 ^ (the average depth of matches)
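
The entry-count branch for name=value queries is simple enough to work through by hand. The sketch below assumes plain floating-point division and treats the result as the match estimate; the function and parameter names are hypothetical, and the counting and extrapolation branches are not modeled.

```python
def estimate_with_entry_count(entry_count, values_in_query, key_count=None):
    """Estimate for name=value queries when entryCount is set on the index."""
    if entry_count <= 0:
        raise ValueError("entry_count must be positive")
    if key_count is None:
        key_count = entry_count / 10000  # default stated on the slide
    return entry_count / key_count + values_in_query


print(estimate_with_entry_count(50000, 1))                 # 10001.0
print(estimate_with_entry_count(50000, 1, key_count=500))  # 101.0
```

With the default key count, entry count / key count works out to 10000, so a single-value query is estimated at 10001. Adding the cost per execution of 2 gives 10003.0, which agrees with the sling:resourceType cost in the sample debug output, though the slides do not say that this is how that number was produced.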

ORDERED INDEX

• Stores node paths indexed by a particular property value
• Has an extra :next property on each value node to handle ordering
• Example: /oak:index/cqLastModified
• WARNING – only supports lexicographic sorting

ORDERED INDEX – IN OAK EXPLORER

ORDERED INDEX – COST CALCULATION

1 + (Estimated Matches * 1.3)

• Similar to Property Index
• Doesn’t support entryCount
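
To compare this with the property index formula (2 + Estimated Matches * 1), here is a short sketch. It assumes both index types are given the same match estimate, which the slides do not claim; the function names are illustrative.

```python
def property_index_cost(estimated_matches):
    # Property index: cost per execution 2, cost per entry 1.
    return 2 + estimated_matches * 1


def ordered_index_cost(estimated_matches):
    # Ordered index: cost per execution 1, cost per entry 1.3.
    return 1 + estimated_matches * 1.3


for matches in (1, 3, 4, 100):
    print(matches, property_index_cost(matches), ordered_index_cost(matches))
```

Under that assumption the ordered index is only cheaper for very small estimates: 1 + 1.3m < 2 + m holds when m is below roughly 3.3.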

REFERENCE INDEX

• Flat list of UUIDs.
• Each node points to a path.
• Cost is always 1 if a match is available

REFERENCE INDEX – IN OAK EXPLORER

NODE TYPE INDEX

• Special type of Property Index
• Note that not all node types are indexed by default
• Has a default entryCount of a very high value

OAK INDEX IMPLEMENTATION: LUCENE

LUCENE

What Oak Lucene is (and is not)

FLOW

jcr:contains query detected → Repo-based Lucene index queried → Results returned

FULL TEXT QUERIES

//*[jcr:contains(., 'Experience Manager')]

• Any query that includes a full text condition
• Native queries

LUCENE DEFINITION

oak:QueryIndexDefinition

• type = lucene
• async = async
• includePropertyTypes[] = String, Binary
• excludePropertyNames[] = …
• reindex = true

LUCENE

What can’t you do?

LUCENE

• Customize the Tika configuration
• Configurable analyzers (OAK-2177)
• Synonyms
• Boost terms at index time (OAK-2178)

SOLR

• Based on Lucene
• Fault Tolerant
• Rich Document Handlers
• Geospatial Search
• Load Balancing

AEM 6.0 Configurable:

• Full Text Search
• Indexing
• Native Queries

SOLR CONFIGURATION

There are 4 configurable components:

• Oak Solr embedded server
• Oak Solr indexing / search
• Oak remote server
• Oak Solr server provider

SOLR DEFINITION

oak:QueryIndexDefinition

• type = solr
• async = async
• reindex = true

SOLR FULL TEXT QUERIES

//*[jcr:contains(., 'Experience Manager')]

Solr enables restrictions based on:

• Path
• Property
• Primary Type

FLOW

jcr:contains query detected → Remote Solr index queried → Results returned

• In oak-solr-core 1.0.1+ (AEM 6 SP1) you can add property, path & primary type restrictions to your query

SOLR TYPES

Types of Solr that Oak uses:

Embedded Solr
Primarily used for development work. The Solr instance runs within AEM and can be configured similarly to the remote instance.

Remote Solr
Used for non-development-level environments. Typically these instances take advantage of the fault-tolerant features of the Solr Cloud. In many cases, existing Solr instances are used.

SOLR CONCEPTUAL ARCHITECTURE

[Diagram elements: AEM 6 Node 1, AEM 6 Node 2, Solr Shard 1, Solr Shard 2, Zookeeper, Solr Cloud]

LUCENE VS. SOLR

Main differences with the Lucene index:

• You create and control the Solr config
  • Analyzers
  • Schema
    • You must have a schema.xml that accurately reflects the properties and fields you want indexed (and queried), which is similar to how the property indexes are configured.
  • Currency
  • Language
• Enabling additional Solr native functionality (example: mlt – more like this)
• Some indexing overhead offloaded
• All of this is configured on the Solr servers

NATIVE QUERIES

//*[rep:native('lucene', 'wine OR beer')]

The native function takes two arguments: the query type (solr or lucene) and the query.

select [jcr:path] from [nt:base] where native('solr', 'mlt?q=Wine&mlt.fl=text&mlt.mindf=1&mlt.mintf=1')

JCR BASED SOLR QUERIES

• Oak index cost is factored
• Transparent to executor
• Familiar JCR query syntax
• Easy access to repository objects

SOLR TROUBLESHOOTING

[Diagram: AEM 6 (Solrj)]

ONE MORE THING…

Oak 1.0.8

• Lucene Property Indexes
• Copy on Write for Lucene Indexes

AND ONE MORE THING…

XPath still works.

QUERY RESOURCES

• ACS AEM Commons & ACS AEM Tools - http://adobe-consulting-services.github.io/
• AEM Docs - http://docs.adobe.com/docs/en/aem/6-0/deploy/upgrade/queries-and-indexing.html
• Oak Docs - http://docs.adobe.com/docs/en/aem/6-0/deploy/upgrade/queries-and-indexing.html

ADOBE EXPERIENCE MANAGER Developer Resources

• Training: http://bit.ly/AEMTraining
• Documentation: http://bit.ly/AEM5Docs & http://bit.ly/AEM6Docs
• GEMs Webinar Knowledge Exchange: www.adobe.com/go/gems
• Mobile Dev: Get started with Adobe PhoneGap
  • https://github.com/blefebvre/aem-phonegap-kitchen-sink
  • https://github.com/blefebvre/aem-phonegap-starter-kit
• Community: Meet with your peers on-line and in-person, get technical help from the community, access community articles
  • AEM Technologist Community: http://adobe.ly/Qe5BBw
  • Evolve for AEM Technologists: http://bit.ly/EvolveDev
  • AEM Help Forum: http://adobe.ly/OYdtY0
• PackageShare: Sign in to the Adobe Marketing Cloud to access packages http://bit.ly/AMCPKGSHARE
• Marketing Cloud Exchange: http://bit.ly/MCXChange
