Demystifying Endeca’s search results ranking Kristina Spurgin with input & support from Ben Pennell & Jeff Campbell UNC Libraries

Demystifying Endeca’s search results ranking

Kristina Spurgin with input & support from Ben Pennell & Jeff Campbell

UNC Libraries

Endeca details

• Search Configuration and Relevance Ranking
  – The supported search methods and details on how results are ranked for each
• TRLN Endeca Data Model
  – The major field groups, with brief descriptions of their use, and indexing and display properties.
• Endeca Extract and Mappings Spreadsheet
  – Details on how MARC fields get mapped into Endeca fields

TRLN Endeca Search Interfaces

• Words anywhere (i.e. Keyword)
• Author
• Title
• Journal title
• Subject
• ISBN/ISSN
• (Publisher)

Spotting the relevancy strata

• Subject search relevancy strategy
  – Exact phrase match, starting from beginning of a single field is the gold-standard match
  – Subject heading search: commonplace book
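
The gold-standard test can be sketched in a few lines of Python. This is an illustration only, not Endeca's implementation: the function name `is_gold_standard_match`, the lowercase-and-split normalization, and the sample headings are all assumptions made for the example.

```python
def is_gold_standard_match(query: str, field_value: str) -> bool:
    """Return True when the query words open the field value as an exact phrase."""
    query_words = query.lower().split()
    field_words = field_value.lower().split()
    # An empty query never counts as a match.
    if not query_words:
        return False
    # Compare the query against the same number of words at the start of the field.
    return field_words[:len(query_words)] == query_words


# Made-up sample headings, for illustration only.
headings = ["Commonplace book collection", "English commonplace book"]
for heading in headings:
    print(heading, "->", is_gold_standard_match("commonplace book", heading))
```

Only the first sample heading passes, because the phrase has to begin at the first word of the field, not merely appear somewhere inside it.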

PubDateSort = 1700

No pub date!

A more complex search: keyword (AKA “Words anywhere”)

“Searches all indexed fields, but only uses some fields to rank results.” -- Search Configuration and Relevance Ranking
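
The split between "fields that retrieve" and "fields that rank" can be shown with a small Python sketch. It is a simplification under stated assumptions: records are plain dictionaries, matching is a case-insensitive substring test, and `RANKING_FIELDS` is a hypothetical three-field subset chosen for illustration, not the real configuration.

```python
# Hypothetical subset of fields that influence ranking (assumption for this sketch).
RANKING_FIELDS = ["Main Title", "Subject Headings", "Main Author"]


def is_retrieved(query, record):
    """A record is retrieved if the query occurs in ANY of its fields."""
    q = query.lower()
    return any(q in value.lower() for value in record.values())


def matched_in_ranking_field(query, record):
    """Only matches in RANKING_FIELDS are allowed to affect the ordering."""
    q = query.lower()
    return any(q in record.get(field, "").lower() for field in RANKING_FIELDS)


def search(query, records):
    hits = [r for r in records if is_retrieved(query, r)]
    # False sorts before True, so records matched in a ranking field come first.
    # sorted() is stable, so ties keep their original order.
    return sorted(hits, key=lambda r: not matched_in_ranking_field(query, r))


# Made-up sample records.
records = [
    {"Main Title": "Garden design", "Table of Contents": "Chapter on roses"},
    {"Main Title": "Roses", "Table of Contents": ""},
    {"Main Title": "Cooking", "Table of Contents": "Soups"},
]
for record in search("roses", records):
    print(record["Main Title"])
```

The record matched only in its table of contents is still returned, but it sorts after the record matched in the title.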

What fields are indexed?

• Endeca Extract and Mappings Spreadsheet gives the detailed info.

More on keyword search (AKA “Words anywhere”)

“Matches in the main title, subject headings, and main author fields will be given the highest ranking.” -- Search Configuration and Relevance Ranking

More on keyword search (AKA “Words anywhere”)

“Queries that match as a phrase are ranked higher than those which do not.” -- Search Configuration and Relevance Ranking
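
A rough Python sketch of the phrase rule follows. The three-level numbering, the function names, and the sample titles are assumptions for illustration; the quoted documentation only says that phrase matches outrank non-phrase matches.

```python
def contains_phrase(query_words, text_words):
    """True if query_words occur as one contiguous run inside text_words."""
    n = len(query_words)
    return n > 0 and any(
        text_words[i:i + n] == query_words
        for i in range(len(text_words) - n + 1)
    )


def phrase_stratum(query, text):
    """0 = matches as a phrase, 1 = all terms present but scattered, 2 = neither."""
    q = query.lower().split()
    t = text.lower().split()
    if contains_phrase(q, t):
        return 0
    if q and all(word in t for word in q):
        return 1
    return 2


# Made-up sample titles.
titles = ["War and civil society", "The civil war years"]
ranked = sorted(titles, key=lambda title: phrase_stratum("civil war", title))
print(ranked)
```

Both titles contain both query words, but only "The civil war years" contains them side by side and in order, so it sorts first.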

More on keyword search (AKA “Words anywhere”)

“Exact term matches are ranked higher than those returned because of spell correction, stemming, and thesaurus lookups.” -- Search Configuration and Relevance Ranking
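
As a hedged illustration, the exact-versus-expanded distinction might be modeled like this in Python. The `EXPANSIONS` table is a hypothetical stand-in for spell correction, stemming, and thesaurus lookups; it does not reflect how Endeca actually generates variants.

```python
# Hypothetical expansion table (assumption): query term -> variants that
# spell correction, stemming, or a thesaurus might add.
EXPANSIONS = {"book": {"books"}}


def match_quality(term, field_words):
    """0 = exact term match, 1 = match only through an expansion, None = no match."""
    if term in field_words:
        return 0
    if EXPANSIONS.get(term, set()) & set(field_words):
        return 1
    return None


# Made-up sample titles.
titles = ["Commonplace books", "Commonplace book"]
ranked = sorted(titles, key=lambda title: match_quality("book", title.lower().split()))
print(ranked)
```

Here "Commonplace book" sorts ahead of "Commonplace books" because the second title matches only through the expanded form.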

More on keyword search (AKA “Words anywhere”)

“Matches in tables of contents, summaries, or selected EAD elements are not used to determine ranking.” -- Search Configuration and Relevance Ranking

An aside on keyword search (AKA “Words anywhere”)

Fields used to rank Keyword results
Most important to least

• Main Title
• Main Title Normalized
• Title Vernacular
• Title Vernacular Segmented
• Subject Headings
• Subjects Normalized
• Subjects Vernacular Segmented
• Main Author
• Main Author Normalized
• Main Author Vernacular
• Main Author Vernacular Segmented
• Company
• Varying Titles
• Varying Titles Vernacular Segmented
• Other Authors
• Other Author Translation
• Authors Normalized
• Main Uniform Title
• Main Uniform Title Vernacular
• Main Uniform Title Vernacular Segmented
• Uniform Title
• Uniform Title Vernacular
• Uniform Title Vernacular Segmented
• Title Index
• Earlier Title
• Later Title
• Host Item Linking
• Uncontrolled Subject
• Other Titles
• Other Title Translation
• Translated as Linking
• Translation of Linking
• Series Title Index
• Series Statement
• Series Normalized
• Series Statement Vernacular
• Series Statement Vernacular Segmented
• Publisher
• Publisher Normalized
• Sound Recording Imprint
• Director
• Performer Credits
• Production Credits
• Biographical Sketch
• Related Collections
• Digital Collection
• Genre
• Product
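
One way to picture a "most important to least" field list is as a lookup for the highest-priority field in which the query matched. The Python sketch below uses only the first five keyword fields from the list above to stay short. The substring matching and the function name `best_field_rank` are assumptions for illustration, not Endeca's actual ranking module.

```python
# First five keyword ranking fields from the list above, most important first.
FIELD_PRIORITY = [
    "Main Title",
    "Main Title Normalized",
    "Title Vernacular",
    "Title Vernacular Segmented",
    "Subject Headings",
]


def best_field_rank(query, record):
    """Index of the most important field containing the query (lower is better)."""
    q = query.lower()
    for rank, field in enumerate(FIELD_PRIORITY):
        if q in record.get(field, "").lower():
            return rank
    # Matched only in fields outside the priority list: sort last.
    return len(FIELD_PRIORITY)


# Made-up sample records.
records = [
    {"Main Title": "Plants of the Southeast", "Subject Headings": "Gardening"},
    {"Main Title": "Gardening basics", "Subject Headings": "Horticulture"},
]
ranked = sorted(records, key=lambda r: best_field_rank("gardening", r))
print([r["Main Title"] for r in ranked])
```

The record with the query in its main title sorts ahead of the one matched only in a subject heading, mirroring the order of the field list.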

Fields used to rank Title results
Most important to least

• Title1
• Title2
• Title3
• Title4
• Main Title
• Main Title Normalized
• Journal Title Index
• Title Vernacular
• Title Vernacular Segmented
• Varying Titles
• Titles Normalized
• Varying Titles Vernacular Segmented
• Main Uniform Title
• Main Uniform Title Vernacular
• Main Uniform Title Vernacular Segmented

1 word titles

2 word titles

3 word titles

Fields used to rank Journal Title results
Most important to least

• Journal Title Index
• Journal Uniform Title
• Journal Title Abbreviation
• Journal Later Title
• Journal Earlier Title

Fields used to rank Author results
Most important to least

• Main Author
• Main Author Normalized
• Main Author Vernacular
• Main Author Vernacular Segmented
• Director
• Performer Credits
• Production Credits
• Author

Fields used to rank Subject results
Most important to least

• Subject Headings
• Subjects Vernacular Segmented
• Subjects Normalized
• Genre

What is irrelevant to relevancy?

• Many aspects of the record are NOT considered in relevancy ranking

• FORMAT is the biggest surprise, it seems

And, with that whirlwind tour…
