Vagif Jalilov Rivet Logic

Integrating Apache Solr with Alfresco WCM for Faceted Search and Navigation of Next-Generation Web Sites
Vagif Jalilov, Rivet Logic


Page 1: Vagif Jalilov Rivet Logic

Integrating Apache Solr with Alfresco WCM for Faceted Search and Navigation of Next-Generation Web Sites

Vagif Jalilov
Rivet Logic

Page 2: Vagif Jalilov Rivet Logic

About Rivet Logic
• Award-winning professional services focused on:
  – Enterprise Content Management
  – Web Content Management
  – Collaboration and Social Communities
• Using Leading Open Source Software

Page 3: Vagif Jalilov Rivet Logic

Business Case for Alfresco & Solr
• Large-scale sites
• Need for real-time updates
• Full-text search
• Faceted search

Page 4: Vagif Jalilov Rivet Logic

Technical Challenges for Search
• Accurately index each page
  – Solution: Assembly of relevant content to index
• Targeted, real-time indexing
  – Solution: Trigger indexing from publishing mechanism

Page 5: Vagif Jalilov Rivet Logic

Possible Index Solutions
• Spidering/Crawling
  – Follow navigational & cross-links
  – Parse HTML and fetch relevant content
  – Spider full (or partial) site each time
• Real-time Indexing
  – Triggered by FSR deployment
  – Process only change-set (incremental updates)
  – Assemble relevant page content

Page 6: Vagif Jalilov Rivet Logic

Typical Web Application

Source Control
• Source code & libs
• View templates
• Site navigation
• Web content

CMS (Alfresco)
• Binary Content

Page 7: Vagif Jalilov Rivet Logic

“Managed” (Riveted) Web Application

Source Control
• Source code & libs
• (View templates)

CMS (Alfresco)
• Binary Content
• Web Content
• Site Navigation
• (View templates)

Page 8: Vagif Jalilov Rivet Logic

Page Composition

[Diagram: a page assembled from Section-html.xml, Related-links.xml, Supporting-items.xml, Meta-content.xml and Page-metadata.xml; two regions are labeled "dynamic".]

Page 9: Vagif Jalilov Rivet Logic

Content Delivery

(http://crafterrivet.org)

Page 10: Vagif Jalilov Rivet Logic

Alfresco WCM Lifecycle

Page 11: Vagif Jalilov Rivet Logic

Indexing Architecture

Page 12: Vagif Jalilov Rivet Logic

Solr Customizations
• Custom Solr
  – schema.xml
    • Fields (Type, Indexed/Stored)
    • Unique key
  – solrconfig.xml
    • “dismax” type request handler to define queried fields
    • ExtractingRequestHandler (indexing rich text documents)

Page 13: Vagif Jalilov Rivet Logic

Custom Solr Schema

<fields>
  <field name="page_url" type="string" indexed="true" stored="true" required="true"/>
  <field name="page_title" type="text" indexed="true" stored="true"/>
  <field name="page_category" type="string" indexed="true" stored="true"/>
  <field name="page_type" type="string" indexed="true" stored="true"/>
  <field name="page_last_modified" type="date" indexed="true" stored="true"/>
  <field name="page_text" type="text" indexed="true" stored="true"/>
  <field name="page_file_size" type="int" indexed="false" stored="true"/>
</fields>

<uniqueKey>page_url</uniqueKey>

Page 14: Vagif Jalilov Rivet Logic

ExtractingRequestHandler

<!-- solrconfig.xml -->
<!-- Solr Cell: http://wiki.apache.org/solr/ExtractingRequestHandler -->
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler"
                startup="lazy">
  <lst name="defaults">
    <str name="fmap.content">page_text</str>
    <str name="fmap.title">page_title</str>
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>

<!-- schema.xml: extracted fields that are not mapped above are prefixed with ignored_ and dropped -->
<dynamicField name="ignored_*" type="ignored"/>

// SolrJ client: post a rich text file to the extract handler, then commit
ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.addFile(new File(filePath));
SolrServer solrServer = new CommonsHttpSolrServer(solrServerUrl);
solrServer.request(up);
solrServer.commit();

Page 15: Vagif Jalilov Rivet Logic

Custom RequestHandler

<!-- DisMaxRequestHandler allows easy searching across multiple
     fields for simple user-entered phrases. Its implementation is
     now just the standard SearchHandler with a default query type
     of "dismax".
     see http://wiki.apache.org/solr/DisMaxRequestHandler -->
<requestHandler name="solrDemoDismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">page_title^5.0 page_text^1.0</str>
  </lst>
</requestHandler>

Page 16: Vagif Jalilov Rivet Logic

Compilation
• Compiler Engine processes all instructions
• Dispatches to appropriate Page Type Compiler

Page 17: Vagif Jalilov Rivet Logic

Content Deployment & Solr Update

Page 18: Vagif Jalilov Rivet Logic

Compiler Instructions

<updates deploy-root="/path/to/content/root">
  ...
  <update>/solutions/security/article.xml</update>
  <delete>/products/widget/top-section.xml</delete>
  ...
</updates>
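
To make the change-set idea concrete, the following is a minimal sketch of how such an instruction file could be read and split into paths to re-index and paths to remove from the index. The class name ChangeSet and its fields are hypothetical and not taken from the presentation; only the element and attribute names (updates, update, delete, deploy-root) come from the slide.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper (not from the slides): holds one parsed change-set.
final class ChangeSet {
    final String deployRoot;
    final List<String> updates = new ArrayList<>();
    final List<String> deletes = new ArrayList<>();

    private ChangeSet(String deployRoot) {
        this.deployRoot = deployRoot;
    }

    // Parses an <updates> document and sorts its children into updates and deletes.
    static ChangeSet parse(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse compiler instructions", e);
        }
        Element root = doc.getDocumentElement();
        ChangeSet cs = new ChangeSet(root.getAttribute("deploy-root"));
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace and comments
            }
            String path = n.getTextContent().trim();
            if ("update".equals(n.getNodeName())) {
                cs.updates.add(path);
            } else if ("delete".equals(n.getNodeName())) {
                cs.deletes.add(path);
            }
        }
        return cs;
    }

    public static void main(String[] args) {
        ChangeSet cs = parse("<updates deploy-root=\"/path/to/content/root\">"
                + "<update>/solutions/security/article.xml</update>"
                + "<delete>/products/widget/top-section.xml</delete>"
                + "</updates>");
        System.out.println("root=" + cs.deployRoot);
        System.out.println("updates=" + cs.updates);
        System.out.println("deletes=" + cs.deletes);
    }
}
```

Only the listed paths are touched, which is what keeps the indexing incremental rather than a full re-crawl.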

Page 19: Vagif Jalilov Rivet Logic

Compilation Types
1. Web Pages (HTML)
2. Rich Text (PDF)

Page 20: Vagif Jalilov Rivet Logic

Web Page Compilation & Indexing

Indexer Instructions

Page 21: Vagif Jalilov Rivet Logic

HTML Indexer Instruction

<?xml version="1.0" encoding="ISO-8859-1"?>
<add>
  <doc>
    <field name="page_url">/solutions/content-mgmt/overview.html</field>
    <field name="page_title">Increase productivity and streamline workflow throughout the enterprise</field>
    <field name="page_description">Commercial enterprises and government agencies face significant challenges as they strive to meet a rapidly growing need to manage thousands ...</field>
    <field name="page_category">Solutions</field>
    <field name="page_type">Web Page</field>
    <field name="page_last_modified">2009-12-18T15:03:57Z</field>
    <field name="page_text">Rivet Logic addresses many of today's workplace challenges with Enterprise Content Management (ECM) solutions that enable organizations to transform traditional content repositories and static intranets into dynamic, collaborative work environments through open source functionality. Through ...</field>
  </doc>
</add>
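
As an illustration of how a page compiler might emit an instruction like the one above, here is a small sketch that turns a map of field names and values into a Solr add document, escaping XML special characters along the way. The class and method names (IndexerInstruction, toAddXml) are hypothetical; the slides do not show how the real compiler builds its output.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not from the slides): renders index fields as a Solr <add> document.
final class IndexerInstruction {

    // Escapes the characters that are unsafe in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Builds <add><doc>...</doc></add>, one <field> per map entry, in insertion order.
    static String toAddXml(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add>\n  <doc>\n");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("    <field name=\"").append(escape(e.getKey())).append("\">")
              .append(escape(e.getValue()))
              .append("</field>\n");
        }
        return sb.append("  </doc>\n</add>\n").toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("page_url", "/solutions/content-mgmt/overview.html");
        fields.put("page_category", "Solutions");
        fields.put("page_type", "Web Page");
        System.out.print(toAddXml(fields));
    }
}
```

Escaping matters here because assembled page text routinely contains ampersands and angle brackets that would otherwise make the instruction malformed.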

Page 22: Vagif Jalilov Rivet Logic

Rich Text Compilation & Indexing

Page 23: Vagif Jalilov Rivet Logic

Rich Text Indexer Instruction

<?xml version="1.0" encoding="ISO-8859-1"?>
<add>
  <doc>
    <field name="page_file">/docroot/static/about-us/press-releases/2010/rl_crafter_studio.pdf</field>
    <field name="page_url">/about-us/press-releases/2010/rl_crafter_studio.pdf</field>
    <field name="page_title">Rivet Logic launches Crafter Studio for user friendly Web content authoring and publishing.</field>
    <field name="page_category">News</field>
    <field name="page_type">Press Release</field>
    <field name="page_last_modified">2007-12-19T08:00:00Z</field>
    <field name="page_file_size">135</field>
  </doc>
</add>
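
One plausible way to carry the metadata of a rich text instruction into Solr is through the literal.<field> parameters that the ExtractingRequestHandler accepts alongside the uploaded file. The sketch below builds such a request URL; it assumes that page_file only locates the file to upload and is not itself indexed, which the slides do not state. The class name ExtractRequest and the base URL are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not from the slides): builds an /update/extract URL
// that passes document metadata as Solr Cell literal.* parameters.
final class ExtractRequest {

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // solrBaseUrl is an assumed value such as http://localhost:8983/solr
    static String buildUrl(String solrBaseUrl, Map<String, String> literals) {
        StringBuilder sb = new StringBuilder(solrBaseUrl).append("/update/extract");
        char sep = '?';
        for (Map.Entry<String, String> e : literals.entrySet()) {
            sb.append(sep)
              .append("literal.").append(enc(e.getKey()))
              .append('=').append(enc(e.getValue()));
            sep = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> literals = new LinkedHashMap<>();
        literals.put("page_url", "/about-us/press-releases/2010/rl_crafter_studio.pdf");
        literals.put("page_category", "News");
        literals.put("page_type", "Press Release");
        System.out.println(buildUrl("http://localhost:8983/solr", literals));
    }
}
```

Supplying page_url this way would also satisfy the schema's required unique key, since text extraction alone does not produce it.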

Page 24: Vagif Jalilov Rivet Logic

Compiler Configuration

Page 25: Vagif Jalilov Rivet Logic

Compiler Configuration

<compiler-config>
  <page-types>
    <page-type name="Solution Page"
               compiler="com.rivetlogic.index.compile.ArticleCompiler">
      <uri-pattern pattern=".*/page-content/solutions/.*(article|page-metadata|meta-content).xml$" />
      <properties>
        <property field="page_type" value="Web Page"/>
        <property field="page_category" value="Solutions"/>
      </properties>
    </page-type>
    <page-type name="Press Release Page"
               compiler="com.paetec.index.model.compile.PressReleaseCompiler">
      <uri-pattern pattern=".*/press-releases/.*/(press-release|meta-content).xml$" />
      <properties>
        <property field="page_type" value="Press Release"/>
        <property field="page_category" value="News"/>
      </properties>
    </page-type>
  </page-types>
</compiler-config>
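
The uri-pattern entries suggest that the compiler engine picks a page type by matching each changed path against the configured regular expressions. A minimal sketch of that dispatch step follows; PageTypeResolver is a hypothetical name, the first-match-wins rule is an assumption, and the sample paths are invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical helper (not from the slides): maps a content path to a page type name.
final class PageTypeResolver {
    // Insertion order is kept so that the first registered match wins (an assumption).
    private final Map<String, Pattern> patterns = new LinkedHashMap<>();

    void register(String pageTypeName, String uriPattern) {
        patterns.put(pageTypeName, Pattern.compile(uriPattern));
    }

    // Returns the first page type whose pattern matches the whole path, if any.
    Optional<String> resolve(String path) {
        for (Map.Entry<String, Pattern> e : patterns.entrySet()) {
            if (e.getValue().matcher(path).matches()) {
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        PageTypeResolver resolver = new PageTypeResolver();
        resolver.register("Solution Page",
                ".*/page-content/solutions/.*(article|page-metadata|meta-content).xml$");
        resolver.register("Press Release Page",
                ".*/press-releases/.*/(press-release|meta-content).xml$");
        // Invented sample paths
        System.out.println(resolver.resolve("/site/page-content/solutions/security/article.xml"));
        System.out.println(resolver.resolve("/site/images/logo.png"));
    }
}
```

A path that matches no pattern yields an empty result, which a caller could treat as "nothing to index for this file".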

Page 26: Vagif Jalilov Rivet Logic

Search UI
• Full text search
• Faceted search on category & type
• Pagination or search result clustering
• Keyword highlighting in search results
• Track user queries
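
Several of these UI features map onto standard Solr request parameters: facet and facet.field for facet counts, fq for narrowing by a chosen facet value, hl and hl.fl for highlighting, and start and rows for pagination. The sketch below assembles such a query string; it assumes the solrDemoDismax handler shown earlier is selected through the qt parameter, and the class name SearchQuery is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper (not from the slides): builds the query string for a faceted search.
final class SearchQuery {

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // page is zero-based; category may be null when no facet has been selected.
    static String build(String userQuery, String category, int page, int pageSize) {
        StringBuilder sb = new StringBuilder();
        sb.append("qt=solrDemoDismax");                       // assumed handler selection
        sb.append("&q=").append(enc(userQuery));
        sb.append("&facet=true&facet.field=page_category&facet.field=page_type");
        sb.append("&hl=true&hl.fl=page_text");                // highlight matches in the body text
        sb.append("&start=").append(page * pageSize);
        sb.append("&rows=").append(pageSize);
        if (category != null) {
            // Filter query restricts results to the selected facet value.
            sb.append("&fq=").append(enc("page_category:\"" + category + "\""));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("content management", "Solutions", 2, 10));
    }
}
```

Using fq for the facet selection keeps the user's text query unchanged, so relevance ranking from the dismax field boosts is not affected by the filter.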

Page 27: Vagif Jalilov Rivet Logic

Search Results Page

Page 28: Vagif Jalilov Rivet Logic

Clustered Results

Page 29: Vagif Jalilov Rivet Logic

Summary
• Requirements:
  – Real time updates
  – Full editorial control
  – Faceted search
• Solution:
  – Alfresco CMS
  – Alfresco plugin for Solr indexing
  – Compile updates & index
  – Serve in UI (full-text search + facets)

Page 30: Vagif Jalilov Rivet Logic

Q & A
• Thank you for attending :-)
• Questions, comments…

Page 31: Vagif Jalilov Rivet Logic

Appendix

Page 32: Vagif Jalilov Rivet Logic

Search Model/API