Drupal 7 and Solr

Posted on 12-Apr-2017



Drupal 7 + Solr

Tools’ overview / Integration / Usage / Case studies

Introduction

Digital agency established in 2008 in Mauritius

Recognized as one of the most experienced offshore Drupal web agencies

More than 150 projects in Drupal

Introduction

Technical Director at Esokia

10 years of experience in PHP

7 years of experience in Drupal

Introduction

What is Drupal?

Free, community-built website development tool

Modular and extensible content management

Open source

Built on PHP

Created by Dries Buytaert

First release in January 2001

Showcase


A closer look

Key concepts in Drupal

Flexibility / Simplicity / Utility: a high standard of usability for developers, administrators, and users

Modularity / Extensibility / Maintainability: a slim and powerful core that can be readily extended through custom modules

Drupal in the future?

Drupal 8 expected in October 2015 (approx.)

Big architectural changes

Built with

What is Solr?

Standalone enterprise search server

Full-text search, faceted navigation

REST-like API

Open source

Written in Java

Supported by the Apache Software Foundation

First release in January 2007

Main features

Full-text search: powerful matching capabilities including phrases, wildcards, joins, grouping...

Faceted search: slicing of data using a wide range of faceting algorithms
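To make “slicing” concrete, here is a minimal Python sketch of what a field facet computes: for each value of a field, the number of matching documents that carry it. It works on an in-memory list of dicts; the `sector` field and the sample data are hypothetical, and real Solr computes these counts from its index when a query is sent with `facet=true` and `facet.field`.

```python
from collections import Counter


def facet_counts(docs, field):
    """Count matching documents per value of `field`, like a simple field facet."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue  # documents without the field do not contribute
        # A multi-valued field counts once per distinct value in the document.
        values = value if isinstance(value, list) else [value]
        counts.update(set(values))
    # Highest count first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result set returned by a search for "company".
hits = [
    {"id": 1, "sector": "Mining"},
    {"id": 2, "sector": "Retail"},
    {"id": 3, "sector": "Mining"},
    {"id": 4},
]
print(facet_counts(hits, "sector"))  # [('Mining', 2), ('Retail', 1)]
```

Clicking a facet value then simply adds a filter (for example on `sector:Mining`) to the next query, and the counts are recomputed on the narrowed result set.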

Main features

High-volume traffic: proven at large scale all over the world

Extensible plugin architecture: well-defined extension points for indexing, analysis, request handling, query parsing...

Main features

Geospatial search: location-based search with built-in spatial support

Rich document parsing: index rich content such as Adobe PDF, Microsoft Word and more

Other features

Query suggestions: suggestions offered to users as they type their queries

Spell checking: “Did you mean… ?”

Highlighting: helps users focus on the matching parts of their results

External configuration via XML: adjust and extend the setup with XML files
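“Did you mean… ?” suggestions are typically based on the edit distance between the query term and the terms in the index. The sketch below is a plain Python illustration of that idea, not Solr's spellchecker: it uses the classic Levenshtein distance and suggests the closest known term within two edits. The limit of 2 and the sample dictionary are assumptions; Solr's own spellcheck components are configured in its XML files.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def did_you_mean(term, dictionary, max_edits=2):
    """Return the closest indexed term within max_edits, or None."""
    term = term.lower()
    if term in dictionary:
        return None  # the term exists, no suggestion needed
    scored = sorted((levenshtein(term, word), word) for word in dictionary)
    if scored and scored[0][0] <= max_edits:
        return scored[0][1]
    return None


indexed_terms = {"drupal", "solr", "search", "facet"}
print(did_you_mean("Drupla", indexed_terms))  # drupal
```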

What’s next?

Now in version 5.0 (since February 2015)

Cluster-oriented with ZooKeeper & SolrCloud

Easier installation

Better admin UI

Solr is now a mature product that is easier to handle

Drupal & Solr - How does it work?

[Diagram: the Drupal application (with its DB) exchanges data with the Apache Solr server (with its INDEX) over HTTP POST/GET]
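Since the exchange is plain HTTP, a search is just a GET request against Solr's `select` handler. The Python sketch below only builds such a URL with standard Solr query parameters (`q`, `fq`, `rows`, `wt`, `facet`, `facet.field`); the base URL (Solr's default port 8983), the core name `drupal` and the `sector` field are assumptions, and in practice the Apache Solr Search module builds and sends these requests for you.

```python
from urllib.parse import urlencode


def build_select_url(base_url, core, query, filters=(), facet_fields=(), rows=10):
    """Build a Solr search URL; base_url, core and field names are assumptions."""
    params = [("q", query), ("rows", str(rows)), ("wt", "json")]
    # Each filter query (fq) narrows results without affecting relevance scores.
    params += [("fq", fq) for fq in filters]
    if facet_fields:
        params.append(("facet", "true"))
        params += [("facet.field", field) for field in facet_fields]
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


url = build_select_url(
    "http://localhost:8983/solr",  # hypothetical Solr server
    "drupal",                      # hypothetical core, one per application
    "mining company",
    filters=["sector:Mining"],
    facet_fields=["sector"],
)
print(url)
```

The core name in the path is what makes the “one core per application” setup work: several Drupal sites can share one Solr server, each querying its own core.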

Drupal & Solr - How does it work?

Drupal sends content to Solr when cron runs. Each new or updated piece of content is marked for indexing. Deleted content is removed from the index on the next cron run.
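The cron-based flow can be pictured as a queue of pending changes that is flushed on each run. The Python sketch below is a simplified model, not the module's implementation: an in-memory dict stands in for the module's own tracking of changed content, `load_entity` is a hypothetical callback returning the fields to index, and each flush produces Solr XML update messages (an `add` message for new or updated content, a `delete` message for removed content) that would then be POSTed to Solr.

```python
import xml.etree.ElementTree as ET


class IndexQueue:
    """Simplified model of content marked for (re)indexing or removal.

    The most recent action recorded for an entity id wins.
    """

    def __init__(self):
        self.pending = {}  # entity id -> "index" or "delete"

    def mark_changed(self, entity_id):
        self.pending[entity_id] = "index"

    def mark_deleted(self, entity_id):
        self.pending[entity_id] = "delete"

    def cron_run(self, load_entity):
        """Flush the queue into Solr XML update messages (strings)."""
        add = ET.Element("add")
        delete = ET.Element("delete")
        for entity_id, action in sorted(self.pending.items()):
            if action == "index":
                doc = ET.SubElement(add, "doc")
                for name, value in load_entity(entity_id).items():
                    ET.SubElement(doc, "field", name=name).text = str(value)
            else:
                ET.SubElement(delete, "id").text = str(entity_id)
        self.pending.clear()
        # Only return messages that actually contain something.
        return [ET.tostring(msg, encoding="unicode") for msg in (add, delete) if len(msg)]


queue = IndexQueue()
queue.mark_changed(1)
queue.mark_changed(2)
queue.mark_deleted(2)
for message in queue.cron_run(lambda nid: {"id": nid, "label": f"Node {nid}"}):
    print(message)
```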

Drupal & Solr - How does it work?

In Solr, a Document is the unit of search and indexing. An index consists of one or more Documents, and a Document consists of one or more Fields. Each Drupal entity is a Document.
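As a rough illustration of the entity-to-Document mapping, the Python sketch below flattens a hypothetical entity (a plain dict) into a Solr Document, that is, a set of named fields. The field names (`id`, `entity_type`, `entity_id`, `label`) are assumptions; the point is that the unique key combines entity type and id, so that a node and a user with the same numeric id do not overwrite each other in the index.

```python
def entity_to_document(entity_type, entity):
    """Flatten a hypothetical Drupal-like entity dict into a Solr document dict."""
    document = {
        # Unique key: entity ids are only unique per entity type.
        "id": f"{entity_type}/{entity['id']}",
        "entity_type": entity_type,
        "entity_id": entity["id"],
        "label": entity["title"],
    }
    # Every other field becomes a Solr field; lists become multi-valued fields.
    for name, value in entity.get("fields", {}).items():
        document[name] = value
    return document


node = {"id": 42, "title": "Drupal 7 and Solr", "fields": {"tags": ["drupal", "solr"]}}
user = {"id": 42, "title": "admin"}
print(entity_to_document("node", node)["id"], entity_to_document("user", user)["id"])
```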

Drupal & Solr - How does it work?

Indexing & analysis in Solr:

➩ Solr stores keywords instead of pages and builds an inverted index

➩ All data go through several transformations during the analysis phase
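A toy version of both ideas, in Python: each text goes through a small analysis chain (tokenize, lowercase, drop stopwords), and the resulting terms are mapped to the documents that contain them. The stopword list and sample documents are assumptions; real Solr analysis chains are declared in schema.xml with tokenizers and filters (for example `StandardTokenizerFactory`, `LowerCaseFilterFactory`, `StopFilterFactory`) and are far richer.

```python
import re
from collections import defaultdict

STOPWORDS = {"a", "an", "and", "in", "of", "the", "to"}  # hypothetical, tiny list


def analyze(text):
    """Toy analysis chain: tokenize on word characters, lowercase, remove stopwords."""
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token not in STOPWORDS]


def build_inverted_index(docs):
    """Map each analyzed term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """AND search: ids of documents containing every analyzed query term."""
    terms = analyze(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "The Drupal module talks to Solr",
    2: "Solr builds an inverted index",
    3: "Drupal and PHP",
}
index = build_inverted_index(docs)
print(sorted(search(index, "solr")), sorted(search(index, "Drupal SOLR")))
```

Because the query goes through the same analysis as the documents, `SOLR` matches `Solr`, and a search never scans pages: it only looks up terms.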

Drupal & Solr - Drupal’s side

One module: Apache Solr Search

Custom XML files for Solr’s configuration

Full entity support

Hooks for indexing, querying, displaying data

Many related modules to extend capabilities

Drupal & Solr - Drupal’s side

Drupal-specific fields for Solr

Drupal & Solr - Drupal’s side

Examples of hooks

Drupal & Solr - Solr’s side

Solr 4.x supported by Drupal

Multicore support (one core per application)

Minimal software dependencies

Drupal & Solr - Solr’s side

Solr Admin UI in 3.x

Drupal & Solr - Solr’s side

Solr Admin UI in 4.x

Drupal & Solr - Solr’s side

Requirements for Solr:

✓ Apache Solr
✓ Java (just the JRE)
✓ … And that’s all!

Solr ships with an embedded Jetty server.

Case Study n°1: La Lettre M

Context

✓ Business directory
➩ Thousands of companies to index
➩ Not only companies, but also their directors
➩ Financial data and number of employees

✓ External database
➩ Huge database with different update sources

✓ Unstructured data
➩ Unusable as-is for faceted search

Solutions

✓ Preprocessing data
➩ Using hooks to format data

✓ Custom Solr fields
➩ Created custom data to index in Solr

✓ Custom facets
➩ Added custom facets for filtering

Solutions

Preprocessing data with a hook implementation
➩ Standardization of financial data
➩ Standardization of number of employees

✓ For faceted search, data was structured in ranges
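The idea of turning free-form values into ranges can be sketched in a few lines of Python. Everything here is hypothetical: the input formats, the range boundaries and the labels are illustrative, not the ones used for La Lettre M; in the project itself this preprocessing ran inside a Drupal hook before the data was sent to Solr.

```python
import re

# Hypothetical facet ranges: (lower bound, upper bound or None, facet label).
EMPLOYEE_RANGES = [
    (0, 9, "0-9"),
    (10, 49, "10-49"),
    (50, 249, "50-249"),
    (250, None, "250+"),
]


def normalize_employees(raw):
    """Extract an employee count from free text such as '1 200' or '35 employees'."""
    cleaned = (raw or "").replace(" ", "").replace("\u00a0", "").replace(",", "")
    match = re.search(r"\d+", cleaned)  # first number wins, e.g. '10-49' -> 10
    return int(match.group()) if match else None


def employee_bucket(count):
    """Return the facet label for a count, or None if the count is unknown."""
    if count is None:
        return None
    for low, high, label in EMPLOYEE_RANGES:
        if count >= low and (high is None or count <= high):
            return label
    return None


for raw in ["1 200", "35 employees", "n/a"]:
    print(raw, "->", employee_bucket(normalize_employees(raw)))
```

Indexing the label rather than the raw number is what makes a clean facet possible: a handful of stable values instead of thousands of distinct counts.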

Solutions

Adding a custom indexable Solr field
➩ Hooks
➩ Class functions

✓ New indexable data

Solutions

Creating a custom facet for user experience
➩ Defining a new widget
➩ Using class inheritance

✓ Better user experience in the frontend

Case Study n°2: Eramet

Context

✓ International listed company with official documents
➩ Need to publish financial and official reports
➩ Creation of public charters and policy documents
➩ Strategic and essential data in documents
➩ French and English documents

Solutions

✓ Extract and index data from documents
➩ Using Apache Tika as a dependency

✓ Distinguish between French and English documents
➩ Use of Drupal’s File API and i18n functionalities

Solutions

Extracting data from documents with Tika
➩ Extract metadata and text
➩ Extracted data is added to the index
➩ The call to Tika is integrated into Solr’s XML configuration file

✓ An easy way to extend search capabilities
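When Tika is wired into Solr's configuration, this is typically done through Solr's ExtractingRequestHandler (Solr Cell), usually mapped to `/update/extract`. The Python sketch below only builds such a request URL; it assumes that handler is enabled, and the core name, the document id scheme and the `language` field are hypothetical. The file itself would be sent as the POST body; `literal.*` parameters attach extra field values, `uprefix` prefixes metadata field names unknown to the schema, and `commit=true` makes the document searchable right away.

```python
from urllib.parse import urlencode


def build_extract_url(base_url, core, doc_id, language, commit=True):
    """Build a Solr Cell (/update/extract) URL; the file goes in the POST body."""
    params = [
        ("literal.id", doc_id),          # unique key for the extracted document
        ("literal.language", language),  # hypothetical field name
        # Prefix for metadata fields unknown to the schema; assumes a
        # matching attr_* dynamic field exists in schema.xml.
        ("uprefix", "attr_"),
    ]
    if commit:
        params.append(("commit", "true"))
    return f"{base_url.rstrip('/')}/{core}/update/extract?{urlencode(params)}"


print(build_extract_url("http://localhost:8983/solr", "drupal", "file/7", "fr"))
```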

Solutions

Separating files by language
➩ Creation of an indexable entity
➩ Language added as an indexable field

✓ Default behavior of the Apache Solr Attachment module
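Once language is an indexable field, restricting results to the visitor's language is a matter of adding a filter query. The Python sketch below builds such a filter and applies the same rule to in-memory documents; the `language` field name and the sample files are assumptions, while `und` is the code Drupal 7 uses for language-neutral content (`LANGUAGE_NONE`), kept here so that neutral files show up in every language.

```python
LANGUAGE_NONE = "und"  # Drupal 7 code for language-neutral content


def language_filter(current_language, include_neutral=True):
    """Build a Solr filter query on a hypothetical `language` field."""
    languages = [current_language] + ([LANGUAGE_NONE] if include_neutral else [])
    return "language:(" + " OR ".join(languages) + ")"


def filter_by_language(docs, current_language, include_neutral=True):
    """Apply the same rule to in-memory documents, for illustration."""
    allowed = {current_language} | ({LANGUAGE_NONE} if include_neutral else set())
    return [doc["id"] for doc in docs if doc.get("language") in allowed]


files = [
    {"id": "report-2014-fr", "language": "fr"},
    {"id": "report-2014-en", "language": "en"},
    {"id": "logo", "language": "und"},
]
print(language_filter("fr"), filter_by_language(files, "fr"))
```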

Interlude

What is Tika?

A content analysis toolkit

Supported by the Apache Software Foundation

Open source

Written in Java

Interlude

Supported formats (non-exhaustive list)

Case Study n°3: GFM

Context

✓ B2B directory
➩ Thousands of entries to index
➩ Cross-data capabilities
➩ Sticky and highlighted entries

✓ Migration context
➩ From SQL Server to Solr

✓ Unstructured data
➩ Old database with different maintainers

Solutions

✓ Preprocessing data
➩ Using hooks to format data

✓ Dual Solr index
➩ One for sticky entries, one for standard entries

✓ Usage of taxonomy
➩ Categorized content for cross data

Solutions

Data standardization for search purposes
➩ Storing the number of entries
➩ Managing data updates

✓ Volume and recency as search criteria
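One common way to combine volume and recency with text relevance is a multiplicative boost. The Python sketch below mirrors the shape of Solr's `recip(x,m,a,b)` function, a / (m*x + b), which the Solr documentation suggests for recency boosts (e.g. `recip(ms(NOW,date),3.16e-11,1,1)`, where 3.16e-11 is roughly one over the number of milliseconds in a year). The logarithmic volume factor and the meaning of `volume` (for instance a stored number of entries) are assumptions, not the formula used for GFM.

```python
import math

MS_PER_YEAR = 365.25 * 24 * 3600 * 1000  # about 3.16e10 ms, so 1/MS_PER_YEAR ~ 3.16e-11


def recip(x, m, a, b):
    """Same formula as Solr's recip(x,m,a,b) function: a / (m*x + b)."""
    return a / (m * x + b)


def combined_score(relevance, volume, age_ms):
    """Relevance boosted by recency and (log-damped) volume; weights are hypothetical."""
    recency = recip(age_ms, 1 / MS_PER_YEAR, 1, 1)  # 1.0 for new entries, 0.5 after a year
    volume_boost = 1 + math.log1p(volume)          # grows slowly with large volumes
    return relevance * recency * volume_boost


fresh = combined_score(1.0, volume=10, age_ms=0)
old = combined_score(1.0, volume=10, age_ms=MS_PER_YEAR)
print(round(fresh, 3), round(old, 3))
```

The logarithm keeps a very large volume from drowning out text relevance, while `recip` halves the boost after a year instead of dropping old entries entirely.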

Solutions

Creating specific indexes
➩ A separate Solr query to limit results
➩ Maintaining a display counter

✓ One query per index, then combining the results
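The combination step can be sketched as follows, assuming each index has already been queried separately and returns a ranked list of entry ids. The rotation policy (least-displayed sticky entries first, Solr rank breaking ties), the `max_sticky` limit and the in-memory display counter are hypothetical choices for illustration, not the project's actual rules.

```python
def combine_results(sticky_hits, standard_hits, display_counts, max_sticky=3):
    """Merge results from a sticky-entries index and a standard index.

    sticky_hits / standard_hits: entry ids, each list already ranked by its own
    Solr query. display_counts: hypothetical counter of how often each sticky
    entry was shown; it is updated in place.
    """
    rank = {entry: position for position, entry in enumerate(sticky_hits)}
    # Least-displayed sticky entries first, Solr rank breaking ties.
    chosen = sorted(sticky_hits, key=lambda e: (display_counts.get(e, 0), rank[e]))[:max_sticky]
    for entry in chosen:
        display_counts[entry] = display_counts.get(entry, 0) + 1
    shown = set(chosen)
    # Standard results follow, without repeating an entry already shown as sticky.
    return chosen + [entry for entry in standard_hits if entry not in shown]


counts = {}
sticky = ["s1", "s2", "s3", "s4"]
standard = ["a", "s1", "b"]
print(combine_results(sticky, standard, counts, max_sticky=2))
print(combine_results(sticky, standard, counts, max_sticky=2))
```

Running the same search twice rotates the sticky slots, which is the purpose of the display counter: every paying entry gets its turn at the top.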

Solutions

Transforming taxonomy results into search results
➩ Creating a query on Solr

✓ Full transparency for the user
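Turning a taxonomy listing into a search result can be as simple as running a Solr query filtered on the term, optionally including its child terms so that related content shows up too (the cross-data idea). The Python sketch below collects a term and its descendants from a hypothetical parent-to-children map and builds the corresponding filter query; the `tid` field name and the sample vocabulary are assumptions.

```python
def descendants(term_id, children):
    """Return the term and all its descendant term ids, sorted."""
    seen, stack = set(), [term_id]
    while stack:
        tid = stack.pop()
        if tid in seen:
            continue  # a term can have several parents in a Drupal vocabulary
        seen.add(tid)
        stack.extend(children.get(tid, []))
    return sorted(seen)


def taxonomy_filter(term_id, children):
    """Build a Solr filter query matching content tagged with the term or a child."""
    return "tid:(" + " OR ".join(str(t) for t in descendants(term_id, children)) + ")"


# Hypothetical vocabulary: 1 = Industry, 2 = Mining, 3 = Energy, 4 = Solar.
children = {1: [2, 3], 3: [4]}
print(taxonomy_filter(1, children))  # tid:(1 OR 2 OR 3 OR 4)
```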

Other applications

Training catalog: search by filtering sessions by topic and/or date

Product catalog: fine-grained search based on various attributes and scope

Video database: catch-up TV

User directory: filtering by function, location...

Other solutions

Open source / Full-text search / Written in C++

Open source / Rich document parsing / Written in C++

Open source / Full-text search / RESTful API

Other solutions

Module available for Drupal

Less popular than Solr

Elasticsearch as an outsider

Questions?

Thank you!