Advanced Search With Lucene

Drupal + Lucene without the caffeine.

Introductions

Presenters

Chris Pliakas – Engineer

Erich Beyrent – VP of Engineering

http://www.commonplaces.com

Presentation Summary

•The problem with Search

•What is Lucene?

•The Search Lucene API module

•Advanced usage

•Implementing the API

•The state of SLAPI and where it is going

Problem - Common Search Requests

Advanced query syntax

High-performance, scalable

Ability to add custom facets

Multisite search, content not shared

Managed through Drupal admin interface

Analysis of the Core Search

Pros:

•Good API ... for the most part

•Pure PHP solution

•Works out of the box

Cons:

•Elementary query syntax

•Not scalable

•No good method to alter query

•Adding facets is unwieldy

What is Lucene?

Lucene is ...

•An open source text search library written in Java

•High-performance and full featured

•Supported by the Apache Software Foundation

Capabilities of Lucene

•Ranked search results

•Boolean AND, AND NOT, OR

•Fielded data search

•Powerful query types: wildcard, fuzzy, range, boost

•Field and term grouping

•Index on filesystem, no SQL
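
The capabilities above map onto Lucene's query syntax. A minimal sketch, one query string per capability; the field names (title, created) and search terms are hypothetical and not taken from the module:

```php
<?php
// Illustrative Lucene query strings, one per capability listed above.
// Field names (title, created) are hypothetical examples.
$examples = array(
  'boolean AND'     => 'drupal AND lucene',
  'boolean AND NOT' => 'drupal AND NOT solr',
  'boolean OR'      => 'drupal OR zend',
  'fielded search'  => 'title:lucene',
  'wildcard'        => 'luce*',
  'fuzzy'           => 'lucene~',
  'range'           => 'created:[20090101 TO 20091231]',
  'boost'           => 'drupal^4 lucene',
  'term grouping'   => '(drupal OR zend) AND search',
  'field grouping'  => 'title:(+drupal +lucene)',
);

// Print each label next to its query string.
foreach ($examples as $label => $query) {
  printf("%-16s %s\n", $label, $query);
}
```

Ranked results need no special syntax: hits come back ordered by score, and a boost such as drupal^4 only shifts that ordering.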

Search Lucene API

http://drupal.org/project/luceneapi

Goals of Search Lucene API

•Integrate Lucene into Drupal

•API for Lucene backend, define hooks

•Implement and extend core Search API

•Easy to install, no external services

•Native PHP solution

Drupal ninjas use hooks, and we don't want to upset ninjas.

Where's the PHP?

What is the Zend Framework?

Well documented, tested, E_STRICT compliant

ZF's Zend_Search_Lucene component

Object oriented PHP port of Lucene

Lucene index binary compatible with Java

Stripped down version of required components

Expertly Decaffeinated by the Zend Framework

Installation

•Download Search Lucene API from drupal.org.

•http://drupal.org/project/luceneapi

Download ZF components from SourceForge.net

Enable the Search Lucene API modules

•Search, Search Lucene API, Search Lucene Content

Run !!

Your site search now rocks.

Configuring Search Lucene API

Hijacking the core search box

Error handling settings

Search Lucene Content settings

Configuring facets

No kittens were harmed in the making of the D6 version of Search Lucene API

Performance Testing

Search Lucene API vs. Search vs. Apache Solr

Memory consumption

Page load time

Index maintenance operations

Comparison With Other Engines

Improving Lucene Performance

Search results caching

Result set limit

Index optimization

Performance Settings

Maintaining Lucene With Drush

Who needs cron?

Performing common maintenance tasks

Retrieving index information

Updating “gotcha”

The future of Drush integration

Search Lucene API

Implementing the API

Objects passed by reference

Exceptional error handling with Exceptions

Autoload implementation

Abstraction layer for common ZF objects

Before we start developing ...

PHP 5 Language Constructs
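
The first two constructs can be sketched in plain PHP. This is an illustration only: ExampleQuery and the example_* functions are hypothetical stand-ins, not classes or functions from Search Lucene API or the Zend Framework.

```php
<?php
// Hypothetical stand-in for a query object; not a class from the module.
class ExampleQuery {
  public $clauses = array();
}

// No "&" needed: PHP 5 passes the object handle, so the caller sees the change.
function example_alter_query($query) {
  $query->clauses[] = '+type:story';
}

// Arrays are copied on call, so altering one in place needs "&".
function example_alter_result(&$result) {
  $result['title'] = strtoupper($result['title']);
}

$query = new ExampleQuery();
example_alter_query($query);

$result = array('title' => 'lucene');
example_alter_result($result);

// Errors surface as exceptions, so callers wrap risky calls in try/catch.
function example_open_index($path) {
  if (!is_dir($path)) {
    throw new Exception("Index directory not found: $path");
  }
  return $path;
}

try {
  example_open_index('/hypothetical/missing/index');
  $message = 'opened';
}
catch (Exception $e) {
  $message = $e->getMessage();
}
```

This may be why some hook signatures in this deck take a plain $query while others take &$result, assuming the query is an object and the result is an array.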

Faceted Search

“A faceted classification system allows the assignment of multiple classifications to an object, enabling the classifications to be ordered in multiple ways, rather than in a single, pre-determined, taxonomic order.” ~Wikipedia

“Wikipedia is the best thing ever. Anyone in the world can write anything they want about any subject. So you know you are getting the best possible information.” ~Michael Scott

Creating Facets

Why the Facet API makes sense

hook_luceneapi_facet($op, $module, $type)

Handling facets via “facet handler” callback

How to $_GET facet values

Defining multiple facets in one hook.

Advanced facets on Twolia

Creating a Search Lucene API Facet Module

How the Facet API Works

Converting $_POST to $_GET

Facet hook invoked in luceneapi_form_alter()

Callbacks invoked in luceneapi_search('search')

Facet queries appended as required subqueries

Very similar to the core Search
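
Two of the steps above, sketched in plain PHP under stated assumptions. The helper names, the search/example path and the facet field names are hypothetical; the module's actual implementation is not shown in this deck and may differ.

```php
<?php
// Hypothetical helper: turn submitted form values into a GET-style search
// path, so that facet selections end up in the URL.
function example_post_to_get($keys, array $facets) {
  // Drop facets the user left empty.
  $facets = array_filter($facets, 'strlen');
  $path = 'search/example/' . rawurlencode($keys);
  return $facets ? $path . '?' . http_build_query($facets) : $path;
}

// Hypothetical helper: append each facet as a required ("+") clause, so a
// document must match the keywords and every selected facet value.
// Real code would also escape Lucene special characters in the values.
function example_required_query($keys, array $facets) {
  $query = '+(' . $keys . ')';
  foreach ($facets as $field => $value) {
    $query .= ' +' . $field . ':' . $value;
  }
  return $query;
}

$url = example_post_to_get('drupal lucene', array('type' => 'story', 'author' => ''));
$query = example_required_query('drupal lucene', array('type' => 'story'));
```

Keeping facet values in the query string is what makes a faceted result page bookmarkable and linkable.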

Extending Search Lucene Content

Index Hooks

•hook_luceneapi_document_alter($doc, $module, $type)

•hook_luceneapi_document_delete($item, $module, $type)

“Useful for adding extra fields for faceted searches and for filtering which data can be deleted from the index”

Extending Search Lucene Content

Search Hooks

•hook_luceneapi_query_alter($query, $module, $type)

•hook_luceneapi_result_alter(&$result, $module, $type)

•hook_luceneapi_positive_keys($keys, $module, $type)

“Useful for modifying the final search query and the information displayed in the results”

Creating a Search Lucene API Module

•Core search hooks: hook_search(), hook_update_index()

•Search Lucene API hooks: hook_luceneapi_index($op)

“Search Lucene API is an extension of the core Search API”

Future Development

Search Lucene API

Going Forward

Drawbacks

Memory intensive

Lack of an SMP solution

Lucene index on NFS volumes

Distributed indexes?

Search Lucene API 2.0

Process control extension

Forking the search processes

Index opened only once on startup

Drupal module becomes the application

Addressing scalability

Search Lucene API 2.0

User, help, multisite search

Result sorting

User defined weights and boost factors

Better index statistics

Improved caching mechanism

New Features

Recap

Replace core search with Search Lucene API

Install, configure, and tune SLAPI modules

Maintain indexes via Drush

Use and extend Search Lucene API

In Summary ...

Search Lucene API

Questions

Thank you!

Search Lucene API

http://drupal.org/project/luceneapi

Presenters

Chris Pliakas – Engineer

Erich Beyrent – VP of Engineering

http://www.commonplaces.com