Google Is a Two Page Site

Post on 12-Apr-2017


Google Is Just a Two Page Site
Relevant Results with Sitecore.ContentSearch

Martina Helene Welander
Technical Consulting Engineer, Sitecore

Speaker

• Technical Consulting Engineer at Sitecore
• Community and Information Enthusiast
• Ecosystem Sites with Dnepropetrovsk Team

Martina Helene Welander

Hi!
• Martina Welander
• Technical Consulting Engineer
• Ecosystem sites
• mhwelander.net / @mhwelander


In the direction of awesome, that’s where

…let’s do search!

Can haz knowledge?


“Google is simply a search box with a second page of results. And those results are from other sites!”

Lalala hello world examples lalala ten items in my tree!

Sitecore.ContentSearch 101

Sitecore 7

Search and index

ALL the items


Search API (LINQ-based)

Search Technology Provider (DLLs and Configuration)

Search Technology API and Indexes

IEnumerable<DocSearchResult>

var index = Sitecore.ContentSearch.ContentSearchManager.GetIndex("sitecore_master_index");

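using (var context = index.CreateSearchContext())
{
    // Build the LINQ query; nothing is sent to the index yet.
    var query = context.GetQueryable<ResultItem>().Where(x => x.Title == "Hej");

    // GetResults() executes the query against the search provider.
    var executedResults = query.GetResults();

    // Each hit wraps a strongly typed document.
    myModel.myList = executedResults.Hits.Select(x => x.Document).ToList();
}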

Where Sitecore adds value
• Source content to index to strongly typed object – and back again!
• You can actually index anything
• Provider model – Solr, Lucene, Elasticsearch, Azure Search
• Provider-agnostic LINQ-based search API
• Highly configurable

Sitecore.ContentSearch is an API

Where should I focus my efforts?

CONFIIIIIG!

Crawlers

Mappers

Converters

Sitecore Field -> Index Field -> Object Property

Analyzers

Sitecore Field -> Searchable Data

Analyzer Wrappers

Back to Plain Ol’ Search
Actually kind of difficult

It’s all about the Pentiums analyzers (Tokenizers and Filters)

Tokenizers

Hello my name is Martina

“Hello”, “my”, “name”, “is”, “Martina”

Types of Tokenizer

StandardTokenizer
“My name is Martina” -> “My”, “name”, “is”, “Martina”

KeywordTokenizer
“My name is Martina” -> “My name is Martina”

N-Gram Tokenizer (Min 4, Max 5)
“sitecore” -> “site”, “itec”, “teco”, “ecor”, “core”, “sitec”, “iteco” … etc
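
To make the Min 4 / Max 5 setting concrete, here is a minimal sketch of how n-grams can be generated. It is plain C#, not the Lucene or Solr tokenizer itself; the NGrams helper is a hypothetical name, and a real tokenizer may emit the grams in a different order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of Lucene, Solr or Sitecore.
// Emits every substring whose length is between min and max characters.
List<string> NGrams(string text, int min, int max)
{
    var grams = new List<string>();
    for (int size = min; size <= max; size++)
    {
        for (int start = 0; start + size <= text.Length; start++)
        {
            grams.Add(text.Substring(start, size));
        }
    }
    return grams;
}

var grams = NGrams("sitecore", 4, 5);

// prints: site, itec, teco, ecor, core, sitec, iteco, tecor, ecore
Console.WriteLine(string.Join(", ", grams));
```

Every gram is 4 or 5 characters long, so a 6-character fragment such as “siteco” would only appear with Max 6 or higher.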

Filters

Examples of Filters
• Standard Filter
• (Snowball) Porter Stem Filter
• Stop Filter
• Synonym Filter
• Keep Words Filter
• Pattern Replace Filter

ORDER MATTERS!
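
A small sketch of why filter order matters, assuming a case-sensitive stop word list and a lowercase filter. The LowerCase and Stop functions are hypothetical stand-ins written in plain C#, not the provider's filter classes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumption: the stop list is lowercase and matching is case-sensitive.
var stopWords = new HashSet<string> { "the", "is" };

// Hypothetical stand-ins for a lowercase filter and a stop filter.
IEnumerable<string> LowerCase(IEnumerable<string> tokens) => tokens.Select(t => t.ToLowerInvariant());
IEnumerable<string> Stop(IEnumerable<string> tokens) => tokens.Where(t => !stopWords.Contains(t));

var input = "The index is rebuilt".Split(' ');

// Lowercase first: "The" becomes "the" and is then removed as a stop word.
var lowerThenStop = Stop(LowerCase(input)).ToList();

// Stop first: "The" does not match "the", so it survives and is lowercased afterwards.
var stopThenLower = LowerCase(Stop(input)).ToList();

Console.WriteLine(string.Join(", ", lowerThenStop)); // prints: index, rebuilt
Console.WriteLine(string.Join(", ", stopThenLower)); // prints: the, index, rebuilt
```

The same two filters produce different index terms depending only on their order in the chain.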

Indexing Process

Index

Query

Results

“Hello, my name is Martina”

“Hello”, “my”, “name”, “Martina”

Rebuild when analyser changes!

Contains(“Hello, my name is Martina”)

Configuring a custom analyzer

Lucene – What does it look like?

Solr – What does it look like?

<fieldType name="text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"/>
  </analyzer>
</fieldType>

Previewing and help

6492 12:54:21 INFO ExecuteQueryAgainstLucene (sitecore_master_index): content:make~0.7 title:make~0.7 content:new~0.7 title:new~0.7 content:item~0.7 title:item~0.7 - Filter :

Debugging A Lucene-Based ContentSearch In Sitecore - Dan Cruickshank

My Super-Duper Analyzer

…which isn’t very special at all
• Standard analyser
• Standard filter
• Porter Stem Filter
• StopWords Filter
• Synonym Filter (EXM / ECM, PXM / APS)*
• Lowercase filter

The Query

What makes something relevant? (tf.idf)
• tf – term frequency
• idf – inverse document frequency
• coord – # of terms found in document
• fieldNorm – field length
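
The four factors can be sketched as follows. This is a simplified version of Lucene's classic TF-IDF similarity, written as standalone C#; query normalisation and boosts are left out, the function names are hypothetical, and the real provider may compute these values differently.

```csharp
using System;

// Simplified factors, loosely following Lucene's classic TF-IDF similarity.
// Assumption: query normalisation and boosts are ignored.
double Tf(int termFrequency) => Math.Sqrt(termFrequency);
double Idf(int totalDocs, int docsWithTerm) => 1.0 + Math.Log((double)totalDocs / (docsWithTerm + 1));
double FieldNorm(int termsInField) => 1.0 / Math.Sqrt(termsInField);
double Coord(int matchedTerms, int queryTerms) => (double)matchedTerms / queryTerms;

// Score contribution of a single term in a single field.
double TermScore(int termFrequency, int totalDocs, int docsWithTerm, int termsInField)
{
    var idf = Idf(totalDocs, docsWithTerm);
    return Tf(termFrequency) * idf * idf * FieldNorm(termsInField);
}

// The same term, found once, in a 5-term title and in a 500-term body.
var inShortTitle = TermScore(1, 1000, 9, 5);
var inLongBody = TermScore(1, 1000, 9, 500);

Console.WriteLine($"title: {inShortTitle:F3}, body: {inLongBody:F3}");
Console.WriteLine($"coord for 2 of 3 query terms: {Coord(2, 3):F3}");
```

Under these assumptions a hit in a short field outscores the same hit in a long field, and a rare term outscores a common one.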

My fields
• Title
• Text
• Byline
• Keywords
• Product

context.GetQueryable<ResultItem>() .Where(…)

.Filter() vs .Where()

#1 – Find me a match
• Equals()
• Contains()
• StartsWith()

.Where(x => x.ResultsTitle.Contains("scaling"))

.Where(x => x["scaling"].Contains("scaling"))

.Match()

.EndsWith()

#2 – Slop and fuzziness!
• Like()
• Fuzzy search – fuzziness factor (float)
• Phrase search – slop (int)
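
Fuzzy matching is commonly based on edit distance. The sketch below is plain C# under that assumption: EditDistance is a textbook Levenshtein implementation, and Similarity is a hypothetical normalisation into a 0 to 1 range, comparable in spirit to the ~0.7 seen in the Lucene log line earlier. The provider's exact formula may differ.

```csharp
using System;

// Textbook Levenshtein distance: the number of single-character
// insertions, deletions and substitutions needed to turn a into b.
int EditDistance(string a, string b)
{
    var d = new int[a.Length + 1, b.Length + 1];
    for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
    for (int j = 0; j <= b.Length; j++) d[0, j] = j;

    for (int i = 1; i <= a.Length; i++)
    {
        for (int j = 1; j <= b.Length; j++)
        {
            int cost = a[i - 1] == b[j - 1] ? 0 : 1;
            d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1), d[i - 1, j - 1] + cost);
        }
    }
    return d[a.Length, b.Length];
}

// Assumption: similarity is 1 minus the distance divided by the shorter length.
double Similarity(string a, string b)
{
    int shorter = Math.Min(a.Length, b.Length);
    if (shorter == 0) return 0.0;
    return 1.0 - (double)EditDistance(a, b) / shorter;
}

Console.WriteLine(Similarity("make", "made")); // prints: 0.75
Console.WriteLine(Similarity("make", "mist")); // prints: 0.25
```

With a fuzziness factor of 0.7, “made” would count as a match for “make” in this sketch and “mist” would not.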

#3 – I love you, PredicateBuilder

Expression<Func<ResultItem, bool>> predicate = PredicateBuilder.False<ResultItem>();

foreach (var word in list)
{
    predicate = predicate.Or(x => x.Title.Contains(word));
}

False for ‘OR’, True for ‘AND’
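
Why the seed matters can be shown without Sitecore at all. The sketch below uses plain delegates rather than expression trees; Or and BuildOr are hypothetical helpers, not the PredicateBuilder API.

```csharp
using System;

var words = new[] { "engagement", "plan" };

// Hypothetical stand-in for PredicateBuilder's Or(), using plain delegates.
Func<string, bool> Or(Func<string, bool> left, Func<string, bool> right) => x => left(x) || right(x);

// Chains one Contains() check per word onto the seed with OR.
Func<string, bool> BuildOr(bool seed)
{
    Func<string, bool> predicate = _ => seed;
    foreach (var word in words)
    {
        predicate = Or(predicate, title => title.Contains(word));
    }
    return predicate;
}

var seededTrue = BuildOr(true);
var seededFalse = BuildOr(false);

Console.WriteLine(seededTrue("Scaling xDB"));                // prints: True
Console.WriteLine(seededFalse("Scaling xDB"));               // prints: False
Console.WriteLine(seededFalse("Create an engagement plan")); // prints: True
```

“true OR anything” is always true, so a True seed in an OR chain matches every document. For an AND chain the reverse holds: a False seed would match nothing.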

#4 – Boost

• At query time
• At index time (type or field)
• Rules-based

BOOST

BOOST

~1000 real items

storageType=“true”

Attempt #1: EVERYTHING

If the title…

• Like phrase (with slop)
• Contains phrase
• Starts with phrase
• Equals phrase

If the content…

• Like phrase (with slop)
• Contains phrase
• Starts with phrase
• Equals phrase

Search: xDB Scaling

Search: Managing engagement plans

Search: Create engagement plans

A couple of important lessons
• Whole Phrases vs Individual Terms
• Boost()
• Contains() / Equals()

Attempt #2: Phrase and terms

“engagement plan setup”
OR
“engagement” OR “plan” OR “setup”

“engagement” AND “plan”
OR
“engagement” AND “setup”
OR
“plan” AND “setup”
OR
“engagement” AND “plan” AND “setup”
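
The AND combinations above can be generated rather than written by hand. A minimal sketch in plain C#, assuming the search phrase has already been split into terms; AndCombinations is a hypothetical helper that only builds the readable clauses, not a ContentSearch query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: every combination of two or more terms, joined with AND.
// Each bit of the mask decides whether the term at that position is included.
List<string> AndCombinations(string[] terms)
{
    var clauses = new List<string>();
    for (int mask = 1; mask < (1 << terms.Length); mask++)
    {
        var chosen = new List<string>();
        for (int i = 0; i < terms.Length; i++)
        {
            if ((mask & (1 << i)) != 0) chosen.Add("\"" + terms[i] + "\"");
        }
        if (chosen.Count >= 2) clauses.Add(string.Join(" AND ", chosen));
    }
    return clauses;
}

var clauses = AndCombinations(new[] { "engagement", "plan", "setup" });

// Prints the four clauses, one per line, separated by OR.
Console.WriteLine(string.Join(Environment.NewLine + "OR" + Environment.NewLine, clauses));
```

The number of clauses grows quickly with the number of terms (2^n - n - 1), so long phrases probably need a cap in practice.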

Needs more boost

Attempt #3: Favouring titles

Sitecore 7 ContentSearch Tips - Matt Burke

“Finding a user’s search term in the title or keywords of a document is probably more relevant than one where the term is only in the body”

My work in progress

If nothing is working, you probably didn’t rebuild your index

Search: xDB Scaling

Search: Manage engagement plans

Search: Create engagement plans

// TODO: On the plane home
• Keywords
• Location
• Pinning exact title matches – “scaling”
• Expected search phrases with boost – e.g. “scaling xDB”, “xDB scaling”, “xDB scaling options”

xDB
• Key Behaviour Cache – developer or editor?
• Common searches

It’s not all queries and indexes
• Vague titles are a bit of a nightmare
• Review use of keywords in content
• “I would never search for that!”
• Continuous user testing and tuning

What I learned
• It isn’t magic
• Get to know the provider
• Content and content structure matter
• Search is actually quite hard
