Fulltext search pres

24 August, 2011 • SilverStripe Wellington Meetup • Hamish Friedlander SilverStripe and Full Text Search Giving the people what they want Wednesday, 24 August 2011

Page 1: Fulltext search pres

SilverStripe and Full Text Search

Giving the people what they want

Page 2: Fulltext search pres

What we’re covering

• What does search give you

• Three ways to get it

    • Built-in database-backed search

    • Sphinx module

    • Full text search module

Big topic, not much time

Page 3: Fulltext search pres

What we’re not

• Search result visualization

• Search refinement

• Boost, result pre-calculation, faceting, spell checking, real-time results

• Integrating search with IA

• Measuring search usefulness

• 3rd party modules

But that doesn’t mean they’re not important

Page 4: Fulltext search pres

Why add search?

Page 5: Fulltext search pres

What are you trying to do?

• Most people use navigation by preference

• Stats depend on site, but average 70-95% navigation

• Search is primarily used to locate stuff that’s not obvious how to navigate to

• Deeply nested pages

• Cross-cutting information not provided as a taxonomic structure

• Re-discovering remembered items

• If search doesn’t give immediate results, users fall back to navigation again

Be aware of the goals of your users

Page 6: Fulltext search pres

Getting you there quicker

• Interesting is relative

• Ideally return the page the user is after

• But failing that, at least return a page the user is interested in

• Speed is perception

• Raw speed is rarely noticed (except when it is)

• Ability to understand results is as important as accuracy of results

• A second click is OK, as long as there’s a likely payoff: “did you mean” is fine, disambiguation is OK, paging is useless

To be used, search has to give interesting pages faster than navigation

Page 7: Fulltext search pres

Technology & Tools

Page 8: Fulltext search pres

Database internal full text search

• Most databases come with some full text search built in

• Generally work by adding new indexes to a table column

• Can easily combine full text queries with other filters

• But databases aren’t really designed for it

• Poor query language - no booleans

• Poor language processing

• Limited feature set - no field boost, spell checking, search suggestions, faceting, result fragments, ...

• Sometimes technically costly (MyISAM)

It’s just another index
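As a rough illustration of a full text index combined with an ordinary filter, here is a minimal Python sketch using SQLite's FTS5 extension. SQLite is a stand-in chosen for the example, not a database discussed in the talk, and the sketch assumes the local SQLite build includes FTS5. The table and column names (`page`, `page_fts`, `published`) are made up. Unlike a per-column index, FTS5 keeps its index in a separate virtual table, so the filter is applied through a join.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a normal table and an FTS5 index over it."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE page (id INTEGER PRIMARY KEY, title TEXT, body TEXT, published INTEGER)"
    )
    # Assumption: this SQLite build was compiled with the FTS5 extension.
    db.execute("CREATE VIRTUAL TABLE page_fts USING fts5(title, body)")
    rows = [
        (1, "Opening hours", "The library opens at nine every weekday", 1),
        (2, "Draft notice", "The library will close early on Friday", 0),
        (3, "Contact us", "Phone or email the front desk", 1),
    ]
    for page_id, title, body, published in rows:
        db.execute("INSERT INTO page VALUES (?, ?, ?, ?)", (page_id, title, body, published))
        # Keep the full text index in step with the base table by sharing the row id.
        db.execute(
            "INSERT INTO page_fts (rowid, title, body) VALUES (?, ?, ?)",
            (page_id, title, body),
        )
    return db


def search_published(db, terms):
    """Full text match combined with a plain SQL filter on the base table."""
    sql = """
        SELECT page.title
        FROM page_fts
        JOIN page ON page.id = page_fts.rowid
        WHERE page_fts MATCH ? AND page.published = 1
        ORDER BY page.id
    """
    return [row[0] for row in db.execute(sql, (terms,))]
```

Calling `search_published(build_db(), "library")` matches two rows in the full text index, and the `published = 1` filter then drops the draft.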

Page 9: Fulltext search pres

External full text indexers

• Given a schema, and a set of documents, builds an index

• Schema gives both text processing and result relevancy rules

• Different engines either retrieve documents themselves or have documents sent to them

• Indexes might be write-once (rebuild entire index to add changes)

• Gives a language to query those indexes

• Generally query language is engine-specific

Solr, Sphinx, Elastic Search
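To make "schema plus documents in, index out" concrete, here is a toy inverted index in Python. It sketches the general idea only and is not how Solr, Sphinx or Elastic Search are implemented. `SCHEMA`, `build_index` and `search` are hypothetical names invented for the example. The tokenizer stands in for the text processing rules, the per-field weights stand in for the relevancy rules, and the index is built in one pass over the whole document set, like a write-once index.

```python
import re
from collections import defaultdict

# Hypothetical schema: field name -> relevancy weight (a title hit counts for more).
SCHEMA = {"title": 3.0, "body": 1.0}


def tokenize(text):
    """Text processing rule: lowercase, then split into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(schema, documents):
    """Build {term: {doc_id: weight}} from the full document set in one pass."""
    index = defaultdict(lambda: defaultdict(float))
    for doc_id, doc in documents.items():
        for field_name, weight in schema.items():
            for term in tokenize(doc.get(field_name, "")):
                index[term][doc_id] += weight
    return index


def search(index, query):
    """Return doc ids ordered by summed weight of matching terms, best first."""
    scores = defaultdict(float)
    for term in tokenize(query):
        for doc_id, weight in index.get(term, {}).items():
            scores[doc_id] += weight
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


DOCS = {
    1: {"title": "Sphinx setup", "body": "Install the indexer and build the index"},
    2: {"title": "Solr notes", "body": "Solr and Sphinx both build an index from a schema"},
}
INDEX = build_index(SCHEMA, DOCS)
```

With this data, a search for "sphinx" ranks document 1 first because the term appears in its title, which the schema weights more heavily than the body.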

Page 10: Fulltext search pres

External engines + SilverStripe

• Building schemas is hard, time consuming, annoying when model changes

• Can build schemas directly off models

• Effectively free - all the necessary information is already present

• Flexible search - can change form structure without index changes

• Inefficient - includes information you won’t search against

• Or can build schemas off query design

• Needs more thought around design of query up front

• More efficient, leads to some more powerful abilities

A tale of two abstractions
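The two approaches can be contrasted with a small Python sketch. Everything here is hypothetical: `PAGE_MODEL` is an invented model description, `SITE_SEARCH` is an invented query design, and the schema dictionaries are not the format of any real engine. The point is only that the model-driven schema indexes every field, while the query-driven schema indexes just what the search will use.

```python
# Hypothetical model description: every field the model stores.
PAGE_MODEL = {
    "Title": "Varchar",
    "Content": "HTMLText",
    "MetaDescription": "Text",
    "Sort": "Int",
    "ShowInMenus": "Boolean",
}

# Hypothetical query design: only the fields the search will actually touch,
# grouped by how they will be used.
SITE_SEARCH = {
    "fulltext": ["Title", "Content"],
    "filter": ["ShowInMenus"],
}


def schema_from_model(model):
    """Model-driven: index everything. Free to generate, but includes unused fields."""
    return {name: {"type": db_type} for name, db_type in model.items()}


def schema_from_query(model, query_design):
    """Query-driven: index only what the query design names, and record its role."""
    schema = {}
    for role, names in query_design.items():
        for name in names:
            schema[name] = {"type": model[name], "role": role}
    return schema


model_schema = schema_from_model(PAGE_MODEL)
query_schema = schema_from_query(PAGE_MODEL, SITE_SEARCH)
```

Here the model-driven schema carries five fields and the query-driven one three, and only the latter knows whether a field is for full text matching or for filtering.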

Page 11: Fulltext search pres

SilverStripe Integration

Page 12: Fulltext search pres

Built-in search

✓ No external dependencies, separate indexes, schema files or setup

- Can only search SiteTree and File objects, and only specific fields

- Quality of results is heavily database dependent

Your database-dependent, barely acceptable default

Page 13: Fulltext search pres

Sphinx module

✓ Very little configuration gives great results on moderately sized sites

✓ Can search any DataObject, but...

- Combining search over multiple DataObjects doesn’t really work

- Limited real-time update support

- No exact match string mode makes filtering tricky

Easy, quality full text search

Page 14: Fulltext search pres

Fulltext search module

✓ Schemas generated from query structure

    • More flexible and efficient than generating from model structure

    • Closer to how external engines work natively

✓ Eventually multiple search backend support

    • Currently: Solr

    • In future: Sphinx, Elastic Search, Zend_Lucene

    • Not intended to allow code-less swapping of backends

- Currently needs Solr, which is a Java app

    • Loves memory, hates empty disk space

Powerful (eventually) search engine independent toolkit

Page 15: Fulltext search pres

Full text search module example

Page 16: Fulltext search pres

Define an index

Schema gets generated from this index

Page 17: Fulltext search pres

Define a form

Standard SilverStripe stuff

Page 18: Fulltext search pres

Build a query & apply to an index

Filters and excludes can be built & nested
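A minimal Python sketch of the general shape: a query made of search text plus filter and exclude criteria that can nest. It assumes nothing about the fulltext search module's real (PHP) API. `Criteria`, `Query` and `run` are hypothetical names, and the "index" is just a list of dicts.

```python
from dataclasses import dataclass, field


@dataclass
class Criteria:
    """A group of field conditions. Groups can be nested via any_of."""
    require: dict = field(default_factory=dict)  # field -> value that must match
    exclude: dict = field(default_factory=dict)  # field -> value that must not match
    any_of: list = field(default_factory=list)   # nested groups; at least one must match

    def matches(self, doc):
        if any(doc.get(key) != value for key, value in self.require.items()):
            return False
        if any(doc.get(key) == value for key, value in self.exclude.items()):
            return False
        if self.any_of and not any(group.matches(doc) for group in self.any_of):
            return False
        return True


@dataclass
class Query:
    text: str
    criteria: Criteria = field(default_factory=Criteria)


def run(query, documents):
    """Apply the query: every search word must appear, and the criteria must match."""
    words = query.text.lower().split()
    results = []
    for doc in documents:
        doc_words = set(doc["text"].lower().split())
        if all(word in doc_words for word in words) and query.criteria.matches(doc):
            results.append(doc["id"])
    return results


DOCS = [
    {"id": 1, "text": "Solr search setup", "type": "Page", "status": "Published"},
    {"id": 2, "text": "Solr search draft", "type": "Page", "status": "Draft"},
    {"id": 3, "text": "Solr search manual", "type": "File", "status": "Published"},
    {"id": 4, "text": "Sphinx search setup", "type": "Page", "status": "Published"},
]

# Exclude drafts, and accept either pages or files (a nested group).
QUERY = Query(
    "solr search",
    Criteria(
        exclude={"status": "Draft"},
        any_of=[Criteria(require={"type": "Page"}), Criteria(require={"type": "File"})],
    ),
)
```

Running `run(QUERY, DOCS)` keeps documents 1 and 3: document 2 is excluded as a draft and document 4 does not contain "solr".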

Page 19: Fulltext search pres

Final thoughts

Page 20: Fulltext search pres

Search without searching

• Looks like navigation, acts like search

• Instant taxonomies

• Deal with inconsistent data

• Encourages exploration

Search engines as fuzzy matchers
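As a small illustration of fuzzy matching over inconsistent data, here is a Python sketch that maps a messy value onto the closest known one. It uses the standard library's `difflib` as a stand-in for a search engine's matching, which is a swapped-in technique and not what the talk's modules do. `fuzzy_lookup` and the sample values are invented for the example.

```python
import difflib


def fuzzy_lookup(value, known_values, cutoff=0.6):
    """Return the known value closest to `value`, or None if nothing is close enough."""
    # Compare case-insensitively, but return the canonical spelling.
    canonical = {known.lower(): known for known in known_values}
    hits = difflib.get_close_matches(value.lower(), list(canonical), n=1, cutoff=cutoff)
    return canonical[hits[0]] if hits else None


CITIES = ["Wellington", "Auckland", "Christchurch"]
```

For example, `fuzzy_lookup("Welington", CITIES)` resolves the misspelling to `"Wellington"`, while a value with nothing in common with the list returns `None`.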

Page 21: Fulltext search pres

Links

• https://github.com/silverstripe/silverstripe-sphinx

• https://github.com/silverstripe-labs/silverstripe-fulltextsearch

• http://sphinxsearch.com/

• http://lucene.apache.org/solr/

• http://www.elasticsearch.org/

• https://github.com/nyeholt/silverstripe-solr

• http://code.google.com/p/lucene-silverstripe-plugin/

Modules I’ve covered + some other stuff

Page 22: Fulltext search pres

Thank you!

Hamish Friedlander

Twitter: @hafriedlander

Email: [email protected]
