Google Is a Two Page Site

Google Is Just a Two Page Site: Relevant Results with Sitecore.ContentSearch
Martina Helene Welander, Technical Consulting Engineer, Sitecore

Transcript of Google Is a Two Page Site

Page 1: Google Is a Two Page Site

Google Is Just a Two Page Site
Relevant Results with Sitecore.ContentSearch

Martina Helene Welander
Technical Consulting Engineer, Sitecore

Page 2: Google Is a Two Page Site

Speaker

• Technical Consulting Engineer at Sitecore
• Community and Information Enthusiast
• Ecosystem Sites with Dnepropetrovsk Team

Martina Helene Welander

Page 3: Google Is a Two Page Site
Page 4: Google Is a Two Page Site

Hi!
• Martina Welander
• Technical Consulting Engineer
• Ecosystem sites
• mhwelander.net / @mhwelander

Page 5: Google Is a Two Page Site

Speaker

• Technical Consulting Engineer at Sitecore
• Community and Information Enthusiast
• Ecosystem Sites with Dnepropetrovsk Team
• @mhwelander / mhwelander.net

Martina Helene Welander

Page 6: Google Is a Two Page Site
Page 7: Google Is a Two Page Site

Speaker

Page 8: Google Is a Two Page Site

In the direction of awesome, that’s where

Page 9: Google Is a Two Page Site

…let’s do search!

Page 10: Google Is a Two Page Site
Page 11: Google Is a Two Page Site

Can haz knowledge?

Page 12: Google Is a Two Page Site

Google Is Just a Two Page Site
Relevant Results with Sitecore.ContentSearch

Martina Helene Welander
Technical Consulting Engineer, Sitecore

Page 13: Google Is a Two Page Site

“Google is simply a search box with a second page of results. And those results are from other sites!”

Page 14: Google Is a Two Page Site
Page 15: Google Is a Two Page Site
Page 16: Google Is a Two Page Site

Lalala hello world examples lalala ten items in my tree!

Page 17: Google Is a Two Page Site
Page 18: Google Is a Two Page Site
Page 19: Google Is a Two Page Site

Sitecore.ContentSearch 101

Page 20: Google Is a Two Page Site

Sitecore 7

Page 21: Google Is a Two Page Site

Search and index

ALL the items

*

*

Page 22: Google Is a Two Page Site
Page 23: Google Is a Two Page Site

Search API (LINQ-based)

Search Technology Provider (DLLs and Configuration)

Search Technology API and Indexes

IEnumerable<DocSearchResult>

Page 24: Google Is a Two Page Site

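// Assumes: using System.Linq; using Sitecore.ContentSearch.Linq; (for GetResults)
var index = Sitecore.ContentSearch.ContentSearchManager.GetIndex("sitecore_master_index");

using (var context = index.CreateSearchContext())
{
    // Build the LINQ query; it is translated into a provider query when executed
    var query = context.GetQueryable<ResultItem>().Where(x => x.Title == "Hej");

    // Execute the query and map each hit back to a strongly typed document
    var executedResults = query.GetResults();
    myModel.myList = executedResults.Hits.Select(x => x.Document).ToList();
}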

Page 25: Google Is a Two Page Site
Page 26: Google Is a Two Page Site

Where Sitecore adds value
• Source content to index to strongly typed object – and back again!
• You can actually index anything
• Provider model – Solr, Lucene, Elasticsearch, Azure Search
• Provider-agnostic LINQ-based search API
• Highly configurable

Page 27: Google Is a Two Page Site

Sitecore.ContentSearch is an API

Page 28: Google Is a Two Page Site
Page 29: Google Is a Two Page Site

Where should I focus my efforts?

Page 30: Google Is a Two Page Site
Page 31: Google Is a Two Page Site

CONFIIIIIG!

Page 32: Google Is a Two Page Site
Page 33: Google Is a Two Page Site

Crawlers

Mappers

Converters

Sitecore Field -> Index Field -> Object Property

Analyzers

Sitecore Field -> Searchable Data

Analyzer Wrappers

Page 34: Google Is a Two Page Site

Back to Plain Ol’ Search
Actually kind of difficult

Page 35: Google Is a Two Page Site

It’s all about the Pentiums analyzers
(Tokenizers and Filters)

Page 36: Google Is a Two Page Site

Tokenizers

Page 37: Google Is a Two Page Site

Hello my name is Martina

“Hello”, “my”, “name”, “is”, “Martina”

Page 38: Google Is a Two Page Site

Types of Tokenizer

StandardTokenizer
“My name is Martina” -> “My”, “name”, “is”, “Martina”

KeywordTokenizer
“My name is Martina” -> “My name is Martina”

N-Gram Tokenizer (Min 4, Max 5)
“sitecore” -> “site”, “itec”, “teco”, “ecor”, “core”, “sitec”, “iteco” … etc
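
The three tokenizer behaviours can be sketched in a few lines. This is plain Python for illustration only, not the Lucene or Solr implementation; the function names are made up for this example, and the "standard" tokenizer is approximated by splitting on non-word characters.

```python
import re


def standard_tokenize(text):
    # Rough stand-in for a standard tokenizer: keep runs of letters, digits and underscores.
    return re.findall(r"\w+", text)


def keyword_tokenize(text):
    # The whole input becomes a single token.
    return [text]


def ngram_tokenize(text, min_size, max_size):
    # Emit every substring of each length from min_size to max_size.
    grams = []
    for size in range(min_size, max_size + 1):
        for start in range(len(text) - size + 1):
            grams.append(text[start:start + size])
    return grams


print(standard_tokenize("My name is Martina"))
print(keyword_tokenize("My name is Martina"))
print(ngram_tokenize("sitecore", 4, 5))
```

With a minimum of 4 and a maximum of 5, every gram is four or five characters long, which is why n-gram fields grow an index quickly.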

Page 39: Google Is a Two Page Site

Filters

Page 40: Google Is a Two Page Site

Examples of Filters
• Standard Filter
• (Snowball) Porter Stem Filter
• Stop Filter
• Synonym Filter
• Keep Words Filter
• Pattern Replace Filter

ORDER MATTERS!
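
A small sketch of why filter order matters, in plain Python rather than any provider's API. It assumes a case-sensitive stop filter with a lowercase stop list (real stop filters may offer an ignore-case option); the stop words and filter functions are hypothetical.

```python
STOP_WORDS = {"the", "is", "a"}  # hypothetical, lowercase-only stop list


def lowercase_filter(tokens):
    return [t.lower() for t in tokens]


def stop_filter(tokens):
    # Case-sensitive on purpose: "The" is not in the lowercase stop list.
    return [t for t in tokens if t not in STOP_WORDS]


def analyze(tokens, filters):
    # Apply each filter in turn to the output of the previous one.
    for apply_filter in filters:
        tokens = apply_filter(tokens)
    return tokens


tokens = ["The", "index", "is", "rebuilt"]
print(analyze(tokens, [lowercase_filter, stop_filter]))  # ['index', 'rebuilt']
print(analyze(tokens, [stop_filter, lowercase_filter]))  # ['the', 'index', 'rebuilt']
```

Same filters, different order, different terms in the index.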

Page 41: Google Is a Two Page Site

Indexing Process

Page 42: Google Is a Two Page Site
Page 43: Google Is a Two Page Site

Index

Page 44: Google Is a Two Page Site

Query

Page 45: Google Is a Two Page Site
Page 46: Google Is a Two Page Site

Results

Page 47: Google Is a Two Page Site

“Hello, my name is Martina”

“Hello”, “my”, “name”, “Martina”

Rebuild when analyser changes!

Contains(“Hello, my name is Martina”)
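
To see why a rebuild is needed, here is a toy inverted index in plain Python. It is only a sketch: `analyze_v1` and `analyze_v2` are hypothetical "before" and "after" analyzers, and the index is a dictionary rather than anything Lucene or Solr would store.

```python
import re


def analyze_v1(text):
    # Old analyzer: tokenize only, case preserved.
    return re.findall(r"\w+", text)


def analyze_v2(text):
    # New analyzer: tokenize, then lowercase.
    return [t.lower() for t in re.findall(r"\w+", text)]


def build_index(docs, analyze):
    # Map each analyzed term to the set of document ids containing it.
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, analyze):
    # The query goes through an analyzer too; every resulting term must match.
    terms = analyze(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {1: "Hello, my name is Martina"}
stale = build_index(docs, analyze_v1)          # built before the analyzer changed
print(search(stale, "martina", analyze_v2))    # set()
fresh = build_index(docs, analyze_v2)          # rebuilt with the new analyzer
print(search(fresh, "martina", analyze_v2))    # {1}
```

The stale index still holds “Martina” while the query now asks for “martina”, so nothing matches until the index is rebuilt.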

Page 48: Google Is a Two Page Site

Configuring a custom analyzer

Page 49: Google Is a Two Page Site

Lucene – What does it look like?

Page 50: Google Is a Two Page Site
Page 51: Google Is a Two Page Site
Page 52: Google Is a Two Page Site

Solr – What does it look like?

<fieldType name="text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"/>
  </analyzer>
</fieldType>

Page 53: Google Is a Two Page Site

Previewing and help

Page 54: Google Is a Two Page Site
Page 55: Google Is a Two Page Site
Page 56: Google Is a Two Page Site
Page 57: Google Is a Two Page Site

6492 12:54:21 INFO ExecuteQueryAgainstLucene (sitecore_master_index): content:make~0.7 title:make~0.7 content:new~0.7 title:new~0.7 content:item~0.7 title:item~0.7 - Filter :

Debugging A Lucene-Based ContentSearch In Sitecore - Dan Cruickshank

Page 58: Google Is a Two Page Site

My Super-Duper Analyzer

Page 59: Google Is a Two Page Site

…which isn’t very special at all
• Standard analyser
• Standard filter
• Porter Stem Filter
• StopWords Filter
• Synonym Filter (EXM / ECM, PXM / APS)*
• Lowercase filter

Page 60: Google Is a Two Page Site

The Query

Page 61: Google Is a Two Page Site

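What makes something relevant? (tf.idf)
• tf – term frequency: how often the term appears in the document
• idf – inverse document frequency: how rare the term is across all documents
• coord – number of query terms found in the document
• fieldNorm – field length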
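
A deliberately simplified scoring sketch built from those four factors. It uses textbook weights, not Lucene's actual similarity formula, and the function name and sample documents are invented for illustration.

```python
import math


def score(query_terms, doc_terms, all_docs):
    # Simplified tf.idf: NOT Lucene's exact formula, just the same ingredients.
    if not query_terms or not doc_terms:
        return 0.0
    total = 0.0
    matched = 0
    for term in query_terms:
        tf = doc_terms.count(term)                       # term frequency
        if tf == 0:
            continue
        matched += 1
        df = sum(1 for doc in all_docs if term in doc)   # documents containing the term
        idf = math.log(len(all_docs) / df) + 1.0         # rarer terms weigh more
        total += tf * idf
    coord = matched / len(query_terms)                   # share of query terms found
    field_norm = 1.0 / math.sqrt(len(doc_terms))         # shorter fields weigh more
    return total * coord * field_norm


docs = [
    ["xdb", "scaling", "options"],
    ["xdb", "overview", "for", "new", "developers", "and", "editors"],
    ["engagement", "plan", "setup"],
]
query = ["xdb", "scaling"]
for doc in docs:
    print(doc, round(score(query, doc, docs), 3))
```

The short document containing both query terms scores highest, the long document with one term scores lower, and the unrelated document scores zero.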

Page 62: Google Is a Two Page Site

My fields
• Title
• Text
• Byline
• Keywords
• Product

Page 63: Google Is a Two Page Site

context.GetQueryable<ResultItem>() .Where(…)

Page 64: Google Is a Two Page Site

.Filter() vs .Where()

Page 65: Google Is a Two Page Site

#1 – Find me a match
• Equals()
• Contains()
• StartsWith()

.Where(x => x.ResultsTitle.Contains("scaling"))

.Where(x => x["scaling"].Contains("scaling"))

.Match()

.EndsWith()

Page 66: Google Is a Two Page Site

#2 – Slop and fuzziness!
• Like()
• Fuzzy search – fuzziness factor (float)
• Phrase search – slop (int)
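
Fuzzy matching in Lucene-style engines is based on edit distance between terms. The sketch below computes Levenshtein distance in plain Python; the `similarity` normalisation (distance divided by the shorter term's length) is an assumption made for this example, shown only to give a feel for a float factor such as the 0.7 in the logged query earlier.

```python
def edit_distance(a, b):
    # Classic Levenshtein distance using two rolling rows.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    # Assumed normalisation for illustration: 1.0 means identical terms.
    shortest = min(len(a), len(b))
    if shortest == 0:
        return 1.0 if a == b else 0.0
    return 1.0 - edit_distance(a, b) / shortest


print(edit_distance("scaling", "scalling"))
print(similarity("make", "made") >= 0.7)
```

Under this assumed measure, a one-letter typo in a seven-letter word still clears a 0.7 threshold, while unrelated words do not.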

Page 67: Google Is a Two Page Site

#3 – I love you, PredicateBuilder

// Seed with False when combining with Or (True is the seed for And)
Expression<Func<ResultItem, bool>> predicate = PredicateBuilder.False<ResultItem>();

foreach (var word in list)
{
    predicate = predicate.Or(x => x.Title.Contains(word));
}

False for ‘OR’, True for ‘AND’
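
Why the seed matters can be shown without Sitecore at all. This plain-Python sketch mimics the idea of chaining predicates; `build`, `or_`, `and_` and `title_contains` are hypothetical helpers, not part of PredicateBuilder.

```python
def or_(left, right):
    return lambda item: left(item) or right(item)


def and_(left, right):
    return lambda item: left(item) and right(item)


def always_true(item):
    return True


def always_false(item):
    return False


def title_contains(word):
    return lambda item: word in item["title"].lower()


def build(words, combine, seed):
    # Fold one predicate per word onto the seed, like the foreach loop on the slide.
    predicate = seed
    for word in words:
        predicate = combine(predicate, title_contains(word))
    return predicate


match = {"title": "xDB scaling options"}
other = {"title": "Installing Sitecore"}

any_word = build(["scaling", "setup"], or_, always_false)
print(any_word(match), any_word(other))   # True False

wrong_seed = build(["scaling", "setup"], or_, always_true)
print(wrong_seed(other))                  # True: an OR chain seeded with True matches everything
```

An AND chain has the opposite problem: seeded with False it can never match, so it starts from True.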

Page 68: Google Is a Two Page Site

#4 – Boost

• At query time
• At index time (type or field)
• Rules-based

Page 69: Google Is a Two Page Site

BOOST

Page 70: Google Is a Two Page Site

BOOST

Page 71: Google Is a Two Page Site

~1000 real items

storageType=“true”

Page 72: Google Is a Two Page Site

Attempt #1: EVERYTHING

Page 73: Google Is a Two Page Site
Page 74: Google Is a Two Page Site

If the title…

• Like phrase (with slop)
• Contains phrase
• Starts with phrase
• Equals phrase

If the content…

• Like phrase (with slop)
• Contains phrase
• Starts with phrase
• Equals phrase

Page 75: Google Is a Two Page Site

Search: xDB Scaling

Page 76: Google Is a Two Page Site

Search: Managing engagement plans

Page 77: Google Is a Two Page Site

Search: Create engagement plans

Page 78: Google Is a Two Page Site
Page 79: Google Is a Two Page Site

A couple of important lessons
• Whole Phrases vs Individual Terms
• Boost()
• Contains() / Equals()

Page 80: Google Is a Two Page Site

Attempt #2: Phrase and terms

Page 81: Google Is a Two Page Site

“engagement plan setup”
OR
“engagement” OR “plan” OR “setup”

Page 82: Google Is a Two Page Site

“engagement” AND “plan”
OR
“engagement” AND “setup”
OR
“plan” AND “setup”
OR
“engagement” AND “plan” AND “setup”
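
The clause lists on the last two slides can be generated rather than written by hand. A sketch in plain Python, assuming the search phrase is simply split on whitespace; the function names are invented, and turning each group into an actual query clause is left to the search API.

```python
from itertools import combinations


def phrase_and_terms(query):
    # The whole phrase first, then each individual term (all OR-ed together).
    return [query] + query.split()


def and_groups(query):
    # Every combination of two or more terms; each group is AND-ed,
    # and the groups themselves are OR-ed.
    terms = query.split()
    groups = []
    for size in range(2, len(terms) + 1):
        groups.extend(combinations(terms, size))
    return groups


print(phrase_and_terms("engagement plan setup"))
for group in and_groups("engagement plan setup"):
    print(" AND ".join(group))
```

The number of groups grows quickly with the number of terms, so long search phrases may need a cap.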

Page 83: Google Is a Two Page Site

Needs more boost

Page 84: Google Is a Two Page Site
Page 85: Google Is a Two Page Site

Attempt #3: Favouring titles

Page 86: Google Is a Two Page Site

Sitecore 7 ContentSearch Tips - Matt Burke

“Finding a user’s search term in the title or keywords of a document is probably more relevant than one where the term is only in the body”
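
The idea in that quote can be sketched as a per-field weight. This is plain Python, not Sitecore's Boost() API; the boost values and field names are hypothetical and would need tuning against real content.

```python
# Hypothetical weights: a hit in the title counts more than a hit in the body text.
FIELD_BOOSTS = {"title": 3.0, "keywords": 2.0, "text": 1.0}


def boosted_score(term, doc):
    # Add up the boost of every field that contains the term.
    term = term.lower()
    return sum(
        boost
        for field, boost in FIELD_BOOSTS.items()
        if term in doc.get(field, "").lower()
    )


title_hit = {"title": "xDB scaling", "text": "Options for large deployments."}
body_hit = {"title": "Architecture overview", "text": "Includes notes on scaling."}

ranked = sorted([body_hit, title_hit], key=lambda d: boosted_score("scaling", d), reverse=True)
print([d["title"] for d in ranked])
```

The document with the term in its title is ranked first even though both documents mention it once.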

Page 87: Google Is a Two Page Site
Page 88: Google Is a Two Page Site

My work in progress

Page 89: Google Is a Two Page Site

If nothing is working, you probably didn’t rebuild your index

Page 90: Google Is a Two Page Site
Page 91: Google Is a Two Page Site
Page 92: Google Is a Two Page Site
Page 93: Google Is a Two Page Site
Page 94: Google Is a Two Page Site
Page 95: Google Is a Two Page Site

Search: xDB Scaling

Page 96: Google Is a Two Page Site

Search: Manage engagement plans

Page 97: Google Is a Two Page Site

Search: Create engagement plans

Page 98: Google Is a Two Page Site

// TODO: On the plane home
• Keywords
• Location
• Pinning exact title matches – “scaling”
• Expected search phrases with boost – e.g. “scaling xDB”, “xDB scaling”, “xDB scaling options”

xDB
• Key Behaviour Cache – developer or editor?
• Common searches

Page 99: Google Is a Two Page Site

It’s not all queries and indexes
• Vague titles are a bit of a nightmare
• Review use of keywords in content
• “I would never search for that!”
• Continuous user testing and tuning

Page 100: Google Is a Two Page Site

What I learned
• It isn’t magic
• Get to know the provider
• Content and content structure matter
• Search is actually quite hard

Page 101: Google Is a Two Page Site
Page 102: Google Is a Two Page Site

Organizers
Sponsor

Thanks to our… &…