Multi faceted responsive search, autocomplete, feeds engine & logging

Multi-faceted responsive search, autocomplete, feeds engine and logging Remi Mikalsen Search Engineer, utdanning.no

description

Presented by Remi Mikalsen, Search Engineer, The Norwegian Centre for ICT in Education.

Learn how utdanning.no leverages open source technologies to deliver a blazing fast, multi-faceted, responsive search experience and a flexible and efficient feeds engine on top of Solr 3.6. The key open source projects covered are Solr, Ajax-Solr, SolrPHPClient, Bootstrap, jQuery and Drupal. Notable highlights are ajaxified pivot facets, hierarchical facets with multiple parents, ajax autocomplete with edge n-grams and grouping, integrating our search widgets on any external website, custom Solr logging and using Solr to deliver Atom feeds.

utdanning.no is a governmental website that collects, normalizes and publishes study information related to secondary school and higher education in Norway. With 1.2 million visitors each year and 12,000 indexed documents, we focus on precise information and a high degree of usability for students, potential students and counselors.

Transcript of Multi faceted responsive search, autocomplete, feeds engine & logging

Page 1: Multi faceted responsive search, autocomplete, feeds engine & logging

Multi-faceted responsive search, autocomplete, feeds engine and logging

Remi Mikalsen
Search Engineer, utdanning.no

Page 2: Multi faceted responsive search, autocomplete, feeds engine & logging

Multi-faceted responsive search, autocomplete, feeds engine and logging

Page 3: Multi faceted responsive search, autocomplete, feeds engine & logging

Introduction

Remi Mikalsen
Search engineer, utdanning.no

«Utdanning.no is the official Norwegian national education and career portal, and includes an overview of education in Norway and more than 500 career descriptions» - utdanning.no

« [...] Our main goals are to improve the quality of education and to improve learning outcomes and learning for children, pupils and students through use of ICT in education» - iktsenteret.no

Page 4: Multi faceted responsive search, autocomplete, feeds engine & logging

utdanning.no

Drupal 7 & Solr 3.6

~3 million visitors / year
~12,000 documents
~18,000,000 terms
~260 fields

~1 QPS (~9M searches / year)

~8 ms latency

Page 5: Multi faceted responsive search, autocomplete, feeds engine & logging

Data integration in the CMS

Page 6: Multi faceted responsive search, autocomplete, feeds engine & logging

Fetch data

- Universities, colleges and community colleges: ~30 different endpoints, ~3500 documents
- Folk high schools (non-academic): 1 national endpoint, ~650 documents
- Secondary schools: 1 national endpoint, ~1100 documents
- Higher education admissions (Samordna opptak): 1 national endpoint, ~1500 documents
- Secondary schools metadata (Grep): 1 national endpoint, ~650 documents
- Higher education metadata (NUS): 1 national endpoint, ~3500 documents
- Professions metadata (STYRK): 2 national endpoints, ~1000 documents

Added value

- Editorial staff: professions, interviews, education summaries, etc., ~1500 documents

Transform & normalize

- Drupal 7, ER-model

Searchable

- Solr 3.6, de-normalized

Page 7: Multi faceted responsive search, autocomplete, feeds engine & logging

Indexing

Page 8: Multi faceted responsive search, autocomplete, feeds engine & logging

Drupal 7

Apache Solr Search Integration 7.x-1.1

Customized business logic

Solr 3.6

Pros
- Basic Drupal integration
- Track document changes
- Some facet support
- Easily extendable

Cons
- Lacks deep introspecting
- Little de-normalization
- Hacky hierarchies (Drupal)

Note
Custom config files!
- schema.xml (mainly dynamic fields)
- solrconfig.xml (mainly a drupal request handler)

We added
- Deep introspecting
- Data de-normalization
- Solid hierarchy support
- Pivot facet support
- Atomization
- Manual partial re-index

schema.xml
- field types (auto)
- various copy fields
- better spell
- bucket fields
- autocomplete

Page 9: Multi faceted responsive search, autocomplete, feeds engine & logging

Drupal DB
- Organization (school)
- Study program, Study program, Study program

Solr documents
- Organization (school) + all its Study programs
- Study program + Organization
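The mapping from the normalized Drupal model to de-normalized Solr documents can be sketched roughly as follows. This is a minimal illustration, not the site's actual indexing code: the input shape and the function name `denormalize` are assumptions, while the output field names (`bundle`, `label`, `related_nodes`, `sm_offered_by`) follow the example documents shown in this deck.

```javascript
// Hypothetical input: one organization with its study programs,
// as it might come out of the normalized Drupal model.
function denormalize(org) {
  // The organization document carries the labels of all its study programs.
  const orgDoc = {
    id: String(org.id),
    bundle: "org",
    label: org.label,
    related_nodes: org.programs.map((p) => p.label),
  };
  // Each study program document carries the organization that offers it.
  const programDocs = org.programs.map((p) => ({
    id: String(p.id),
    bundle: p.bundle,
    label: p.label,
    sm_offered_by: [org.label],
  }));
  return [orgDoc, ...programDocs];
}

const docs = denormalize({
  id: 394353,
  label: "ACME University",
  programs: [{ id: 394354, bundle: "he", label: "ACME Rocket Science" }],
});
console.log(JSON.stringify(docs, null, 2));
```

Duplicating the labels in both directions is what lets a search for a study program name also surface the school, and vice versa, without any join at query time.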

Page 10: Multi faceted responsive search, autocomplete, feeds engine & logging

<doc>
  <str name="id">394353</str>
  <bool name="bs_mainsearch">true</bool>
  <str name="bundle">org</str>
  <str name="bundle_name">Organization</str>
  <str name="label">ACME University</str>
  <str name="atom">[XML]</str>
  <arr name="related_nodes">
    <str>ACME Rocket Science</str>
    <str>Study program 2</str>
    <str>Study program N</str>
  </arr>

  <arr name="sm_geography_hierarchy">
    <str>1>California</str>
    <str>2>California>San Diego</str>
    <str>3>California>San Diego>Gaslamp Quarter</str>
  </arr>

  <str name="ss_menu_1">orgmenu</str>
  <str name="ss_menu_2">org</str>
</doc>
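The depth-prefixed values in `sm_geography_hierarchy` can be generated with a few lines of code. The sketch below is an assumption about how such tokens might be produced, not the site's indexing code; `hierarchyTokens` and `childPrefix` are hypothetical helper names. The second helper shows the kind of value one could pass as `facet.prefix` to list only the direct children of a selected node.

```javascript
// Build depth-prefixed path tokens such as "2>California>San Diego".
function hierarchyTokens(path, sep = ">") {
  return path.map((_, i) => [i + 1, ...path.slice(0, i + 1)].join(sep));
}

// Prefix for listing the children of a selected node.
// Assumption: children of a node at depth d are stored at depth d + 1.
function childPrefix(path, sep = ">") {
  return [path.length + 1, ...path].join(sep) + sep;
}

const tokens = hierarchyTokens(["California", "San Diego", "Gaslamp Quarter"]);
console.log(tokens);
console.log(childPrefix(["California"]));
```

Because every level of the path is stored as its own value, a document with several parents simply gets one set of tokens per parent path.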

Page 11: Multi faceted responsive search, autocomplete, feeds engine & logging

<doc>
  <str name="id">394354</str>
  <bool name="bs_mainsearch">true</bool>
  <str name="bundle">he</str>
  <str name="bundle_name">Higher Education</str>
  <str name="label">ACME Rocket Science</str>
  <str name="atom">[XML]</str>

  <arr name="sm_offered_by">
    <str>ACME University</str>
  </arr>
  <arr name="sm_study_area">
    <str>Engineering</str>
    <str>Science</str>
  </arr>

  <long name="its_field_semesters">8</long>

  <str name="ss_menu_1">edumenu</str>
  <str name="ss_menu_2">he</str>
</doc>

Page 12: Multi faceted responsive search, autocomplete, feeds engine & logging

Searching

- Site search

- Embedded search

- Feeds engine

Page 13: Multi faceted responsive search, autocomplete, feeds engine & logging

Site search

Page 14: Multi faceted responsive search, autocomplete, feeds engine & logging

Our goal
Students, counselors and teachers must find what they look for

How?
- Interaction design (IxD) vs graphical design
- User testing, user testing and user testing (and experience)
- Resulting in a GUI specification we must implement

Page 15: Multi faceted responsive search, autocomplete, feeds engine & logging

Ajax-Solr is our JS framework: https://github.com/evolvingweb/ajax-solr/wiki/reuters-tutorial
- manages all querying
- widgets for interaction with and displaying results
- events fire search requests, which update widgets

We extended it heavily
- Developed all our widgets (10+)
- Added logging (async, via ajax, local and GA)
- Distributed configuration (server + client)
- Simplified initialization script

But it also works out of the box!

Page 16: Multi faceted responsive search, autocomplete, feeds engine & logging

Components
- Our Website
- JS library (copy): ~1700 lines
- ajax-solr (evolvingweb)
- Default config
- Initialize (config)
- Solr proxy: ~85 lines
- SolrPhpClient r60
- Logger: ~200 lines
- Solr 3.6

Search (sample results)
- ACME Engineering: Lorum sollicitudin nunc id nibh blandit pellentesque ipsum.
- ACME Law: Cras nunc id nibh blandit pellentesque sollicitudin.
- ACME Med: Ipsum ollicitudin nunc id blandit nibh pellentesque nibh.

Steps
- Include JS library
- Initialize
- Set up HTML
- Search! (and log)

Page 17: Multi faceted responsive search, autocomplete, feeds engine & logging

Site search – widgets & faceting

Ajax Solr allows defining N widgets

«Everything» is a widget

A facet is an instance of a FacetWidget

Interaction with widgets may fire a query

All faceting is piped into one query

All widgets are updated after Solr response
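The widget life cycle described above can be illustrated with a small stand-alone sketch. It is deliberately not the Ajax-Solr API: `MiniManager`, `MiniFacetWidget` and the fake search backend are hypothetical names invented for this example, and the request is synchronous to keep it short.

```javascript
// Minimal stand-in for a widget manager; NOT the Ajax-Solr API.
class MiniManager {
  constructor(search) {
    this.search = search; // function(params) -> response (synchronous here)
    this.widgets = [];
  }
  addWidget(widget) {
    widget.manager = this;
    this.widgets.push(widget);
  }
  doRequest() {
    // Every widget contributes its filters to one combined query.
    const params = { q: "*:*", fq: [] };
    this.widgets.forEach((w) => w.beforeRequest(params));
    const response = this.search(params);
    // Every widget is refreshed from the same response.
    this.widgets.forEach((w) => w.afterRequest(response));
    return params;
  }
}

class MiniFacetWidget {
  constructor(field) {
    this.field = field;
    this.selected = null;
    this.counts = {};
  }
  select(value) {
    this.selected = value;
    return this.manager.doRequest(); // interaction fires a new query
  }
  beforeRequest(params) {
    if (this.selected !== null) params.fq.push(`${this.field}:${this.selected}`);
  }
  afterRequest(response) {
    this.counts = response.facets[this.field] || {};
  }
}

// Fake search backend returning fixed facet counts.
const manager = new MiniManager(() => ({
  facets: { field_interests: { science: 3 }, field_ispublic: { true: 5 } },
}));
const interests = new MiniFacetWidget("field_interests");
const isPublic = new MiniFacetWidget("field_ispublic");
manager.addWidget(interests);
manager.addWidget(isPublic);

interests.select("science");
const lastParams = isPublic.select("true");
console.log(lastParams.fq);
```

The point of the pattern is that no widget talks to Solr directly: each one only describes its own contribution to the query and its own way of rendering the shared response.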

Page 18: Multi faceted responsive search, autocomplete, feeds engine & logging

Some facet widgets we have developed

- Plain
  Facet values and facet counts in a list
  Multiple (AND) or single choice

- Hierarchical
  Facet values and facet counts in a list
  Clicking on a facet value drills down into the hierarchy; facet.prefix + fq

- Dropdown
  Displays facet values in a dropdown list
  Useful for mobile devices in our responsive theme

- Tagcloud
  Facet values in a tagcloud

- Pivot facet
  Our menu system

Page 19: Multi faceted responsive search, autocomplete, feeds engine & logging

Adding facets

Config
facets['interests'] = new facetobject('tagcloud', 'field_interests', '#interests');
facets['ispublic'] = new facetobject('plain', 'field_ispublic', '#ispublic');
config['facets'] = facets;

HTML
<ul id="interests"></ul>
<ul id="ispublic"></ul>

INITIALIZE
Manager.addFacets(config);

Page 20: Multi faceted responsive search, autocomplete, feeds engine & logging

Example widget code
AjaxSolr.PlainFacetWidget = AjaxSolr.AbstractFacetWidget.extend({
  multivalue: true,
  target: null,               // HTML target id
  field: null,                // Solr-field

  facet_display_limit: 5,     // Max facets to display before «See more»
  facet_field_sort: null,     // Optional facet sort
  dependencies: null,         // Conditional display of facet

  facet_display_more: 'See more',
  facet_display_less: 'See less',

  ...

  init: function() { ... },
  beforeRequest: function() { ... },
  afterRequest: function() { ... }
});

Page 21: Multi faceted responsive search, autocomplete, feeds engine & logging
Page 22: Multi faceted responsive search, autocomplete, feeds engine & logging

Site search – pivot facet

Page 23: Multi faceted responsive search, autocomplete, feeds engine & logging

Pivot faceting allows you to facet within the results of the parent facet

- http://wiki.apache.org/solr/SimpleFacetParameters

Slight problem; we don't run Solr 4.x!

Page 24: Multi faceted responsive search, autocomplete, feeds engine & logging

Problem
Menu facets shouldn't affect each other, but should affect the search result and other facets

Page 25: Multi faceted responsive search, autocomplete, feeds engine & logging

Our solution

Solr document 1
<str name="ss_menu_1">orgmenu</str>
<str name="ss_menu_2">org</str>

Solr document 2
<str name="ss_menu_1">edumenu</str>
<str name="ss_menu_2">higher_ed</str>

Solr document 3
<str name="ss_menu_1">edumenu</str>
<str name="ss_menu_2">secondary</str>

Solr query when a top level menu tab is selected
fq={!tag=ss_menu_1}ss_menu_1:edumenu&
facet.field={!ex=ss_menu_1}ss_menu_1

Solr query when a sub-level menu tab is selected
fq={!tag=ss_menu_1}ss_menu_1:edumenu&
fq={!tag=ss_menu_1,ss_menu_2}ss_menu_2:higher_ed&
facet.field={!ex=ss_menu_1}ss_menu_1&
facet.field={!ex=ss_menu_2}ss_menu_2
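A small helper can assemble the tagged filters and excluded facet fields shown in the two queries above. This is a sketch only: `menuParams` and `toQueryString` are hypothetical names, and the parameters are returned as name/value pairs because `fq` and `facet.field` may repeat.

```javascript
// Build the filter and facet parameters for a two-level menu selection.
function menuParams(top, sub) {
  const params = [];
  if (top) {
    params.push(["fq", `{!tag=ss_menu_1}ss_menu_1:${top}`]);
  }
  if (top && sub) {
    // Tagged with both names, so excluding ss_menu_1 also drops this filter.
    params.push(["fq", `{!tag=ss_menu_1,ss_menu_2}ss_menu_2:${sub}`]);
  }
  // Top-level counts ignore every menu filter.
  params.push(["facet.field", "{!ex=ss_menu_1}ss_menu_1"]);
  if (top && sub) {
    // Sub-level counts ignore only the sub-level filter.
    params.push(["facet.field", "{!ex=ss_menu_2}ss_menu_2"]);
  }
  return params;
}

// URL-encode the pairs into a query string.
function toQueryString(params) {
  return params
    .map(([name, value]) => `${encodeURIComponent(name)}=${encodeURIComponent(value)}`)
    .join("&");
}

const topOnly = menuParams("edumenu");
const withSub = menuParams("edumenu", "higher_ed");
console.log(topOnly);
console.log(toQueryString(withSub));
```

The tag/exclude pairing is what keeps the menu counts stable: each menu facet is computed as if its own selection had not been made, while the result list and all other facets still respect it.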

Page 26: Multi faceted responsive search, autocomplete, feeds engine & logging

Drawbacks - Can be VERY slow on large indexes with many unique terms in the facet

Why do we do it?

- Small index; 18M terms, 12K documents
- Pivot facet fields have very few distinct values (5-8)!

Page 27: Multi faceted responsive search, autocomplete, feeds engine & logging
Page 28: Multi faceted responsive search, autocomplete, feeds engine & logging

Site search - autocomplete

Page 29: Multi faceted responsive search, autocomplete, feeds engine & logging

Our goal
Give our users the feeling that we've implemented a mind-reader

How?
With relevant, grouped suggestions* as they type in a search query

Do we succeed?
50% of our «clicks to content» from searches come from autocomplete

Page 30: Multi faceted responsive search, autocomplete, feeds engine & logging

Implementing autocomplete is «easy»
1) Ajax
2) Detect keystrokes
3) Send one request per keystroke
4) Receive results, populate result list

Techniques we employ
- Minimal payload (reduced fl)
- But same boosts and qf as «normal» queries
- group=true, group.field=, group.limit=
- start_label^1.5 wild_label^1 wild_other^0.25
- Caching (jsonp, cache=true)
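The techniques listed above could be combined roughly as in the sketch below. The deck does not state the grouping field, the group limit or the reduced field list, so `groupField`, `groupLimit` and `fl: "id,label"` are assumptions supplied by the caller or chosen for illustration; the example also assumes a dismax-style request handler is configured on the server so that `qf` applies. The response shape follows Solr's standard grouped JSON format.

```javascript
// Build request parameters for an autocomplete lookup.
function autocompleteParams(typed, groupField, groupLimit) {
  return {
    q: typed,
    qf: "start_label^1.5 wild_label^1 wild_other^0.25", // boosts from the deck
    fl: "id,label", // assumed minimal field list to keep the payload small
    group: "true",
    "group.field": groupField,
    "group.limit": String(groupLimit),
    wt: "json",
  };
}

// Turn a grouped Solr response into grouped suggestions for the UI.
function groupedSuggestions(response, groupField) {
  return response.grouped[groupField].groups.map((g) => ({
    group: g.groupValue,
    labels: g.doclist.docs.map((d) => d.label),
  }));
}

// Hypothetical grouping field and a hand-written sample response.
const acParams = autocompleteParams("acme", "bundle_name", 3);
const sampleResponse = {
  grouped: {
    bundle_name: {
      matches: 2,
      groups: [
        {
          groupValue: "Organization",
          doclist: { numFound: 1, start: 0, docs: [{ id: "394353", label: "ACME University" }] },
        },
        {
          groupValue: "Higher Education",
          doclist: { numFound: 1, start: 0, docs: [{ id: "394354", label: "ACME Rocket Science" }] },
        },
      ],
    },
  },
};
const suggestions = groupedSuggestions(sampleResponse, "bundle_name");
console.log(acParams, suggestions);
```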

Page 31: Multi faceted responsive search, autocomplete, feeds engine & logging

Define field type
<fieldType name="startsWith" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/>
  </analyzer>
</fieldType>

Define fields
<field name="start_label" type="startsWith" indexed="true" stored="false" multiValued="false"/>

Copy fields
<copyField source="label" dest="start_label"/>
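To see what the `startsWith` type does to a label, the analysis chain can be emulated in a few lines. This is an approximation written for illustration, not Solr code: `normalize` and `edgeNGrams` are hypothetical helpers that mimic the lowercase, pattern-replace and front edge n-gram steps configured above.

```javascript
// Lowercase and strip everything that is not a-z, as in the analyzer above.
function normalize(text) {
  return text.toLowerCase().replace(/[^a-z]/g, "");
}

// Front edge n-grams of a single token, from min to max characters.
function edgeNGrams(token, min, max) {
  const grams = [];
  for (let n = min; n <= Math.min(max, token.length); n++) {
    grams.push(token.slice(0, n));
  }
  return grams;
}

// Index side: the whole label becomes one token, then a set of prefixes.
const indexTokens = edgeNGrams(normalize("ACME Law"), 2, 25);
// Query side: same normalization, but no n-grams.
const queryToken = normalize("Acme l");

console.log(indexTokens);
console.log(indexTokens.includes(queryToken));
```

Because n-grams are produced only at index time, a partially typed query matches as an exact term lookup, which is what keeps the per-keystroke requests cheap.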

Page 32: Multi faceted responsive search, autocomplete, feeds engine & logging

Define field type
<fieldType name="wildCardType" class="solr.TextField" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="70" side="front"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="70" side="back"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" ignoreCase="false"/>
    <filter class="solr.NorwegianLightStemFilterFactory"/>
  </analyzer>
</fieldType>

Define fields
<field name="wild_label" type="wildCardType" indexed="true" stored="false" multiValued="false"/>
<field name="wild_other" type="wildCardType" indexed="true" stored="false" multiValued="true"/>

Copy fields
<copyField source="label" dest="wild_label"/>
<copyField source="teaser" dest="wild_other"/>
<copyField source="body" dest="wild_other"/>
<copyField source="searchwords" dest="wild_other"/>
<copyField source="related_nodes" dest="wild_other"/>
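Chaining a front and a back edge n-gram filter, as the index analyzer of `wildCardType` does, effectively indexes every substring of the value. The sketch below emulates only that index-side behavior under this reading of the configuration; the helper names are hypothetical, and the query-side synonym and stemming filters are not modeled.

```javascript
// Prefixes of a token, min..max characters long.
function frontGrams(token, min, max) {
  const out = [];
  for (let n = min; n <= Math.min(max, token.length); n++) out.push(token.slice(0, n));
  return out;
}

// Suffixes of a token, min..max characters long.
function backGrams(token, min, max) {
  const out = [];
  for (let n = min; n <= Math.min(max, token.length); n++) out.push(token.slice(token.length - n));
  return out;
}

// Suffixes of every prefix: all substrings of at least `min` characters.
function infixGrams(text, min = 2, max = 70) {
  const token = text.toLowerCase();
  const grams = new Set();
  for (const prefix of frontGrams(token, min, max)) {
    for (const gram of backGrams(prefix, min, max)) grams.add(gram);
  }
  return [...grams];
}

const infix = infixGrams("Law");
console.log(infix);
```

The number of indexed terms grows quadratically with the length of the value, which is affordable on a small index like this one but worth measuring on larger ones.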

Page 33: Multi faceted responsive search, autocomplete, feeds engine & logging
Page 34: Multi faceted responsive search, autocomplete, feeds engine & logging

Embedded search

Page 35: Multi faceted responsive search, autocomplete, feeds engine & logging

Our goal
Let other sites search our data

How?
The exact same way we do ourselves

Do we succeed?
Two external sites are up and running and a third is on its way

Page 36: Multi faceted responsive search, autocomplete, feeds engine & logging

Components
- ACME Website
- JS library (copy): ~1700 lines
- ajax-solr (evolvingweb)
- ACME config
- Default config
- Config (override)
- Solr proxy: ~85 lines
- SolrPhpClient r60
- Logger: ~200 lines
- Solr 3.6

Search (sample results)
- ACME Engineering: Lorum sollicitudin nunc id nibh blandit pellentesque ipsum.
- ACME Law: Cras nunc id nibh blandit pellentesque sollicitudin.
- ACME Med: Ipsum ollicitudin nunc id blandit nibh pellentesque nibh.

Steps
- Register with us
- Include our JS library
- Set up config
- Set up HTML
- Search! (and log)

Page 37: Multi faceted responsive search, autocomplete, feeds engine & logging

<html>
  <head>
    <title>ACME Website</title>

    <!-- utdanning.no search framework -->
    <script src="/js/jquery.js"></script>
    <script src="http://example.com/solrservice/js-min/solr-search-full-min.js"></script>
    <script src="/js/search-init.js"></script>
  </head>
  <body>
    <!-- Search form -->
    <form>
      <input id="query" name="query" type="search" />
      <input type="submit" value="Search" />
    </form>

    <!-- Search results -->
    <div><ul class="hits" id="hits"></ul></div>
  </body>
</html>

Page 38: Multi faceted responsive search, autocomplete, feeds engine & logging

<script type="text/javascript">
// ACME mockup init-script
var Manager; // Search manager object
var uno_config = loadConfig('http://example.com/solrservice/.../acme.config');

// Fully customizable search configuration, e.g.:
uno_config['server']['qf'] = 'label^1.8 content^1.2';

// Search box widget
Manager.addPlainSearch(uno_config);

// Result list widget
Manager.addResults(uno_config);

Manager.finalizeConfig(uno_config);

Manager.doRequest(); // Optional
</script>

Page 39: Multi faceted responsive search, autocomplete, feeds engine & logging

Site owners have full control
- Add, edit and configure widgets
- Query fields, boosts, etc.
- Faceting
- Styling
- Pre-limit search to parts of our index

Because we eat our own dog food!

Page 40: Multi faceted responsive search, autocomplete, feeds engine & logging

Feeds engine

Page 41: Multi faceted responsive search, autocomplete, feeds engine & logging

Our goal
Deliver data in bulk to partner organizations

How?
RESTful searchable data endpoint that returns XML (Atom++)

Do we succeed?
Beta-partner up and running with stunning performance

Page 42: Multi faceted responsive search, autocomplete, feeds engine & logging

Components
- Consumer
- Query
- Default config
- Feeds engine: ~300 lines
- Solr proxy: ~85 lines
- SolrPhpClient r60
- Logger: ~200 lines
- Solr 3.6

Page 43: Multi faceted responsive search, autocomplete, feeds engine & logging

Feeds engine
- Parses incoming query
- Loads config (filters, weights, ...)
- Transforms incoming + config to Solr URL
- Sends to Solr proxy

Solr Proxy
- Loads Solr PHP Client library
- Sends search request and parses response
- Returns results to Feeds engine

Feeds engine
- Loads logger and logs results
- Picks out ATOM from response
- Glues result inside an ATOM frame
- Displays feed
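The last two steps, picking the pre-rendered Atom out of each result and gluing it into a feed frame, might look roughly like this. The sketch is written in JavaScript for illustration only (the deck's components appear to be PHP-based, given the Solr PHP client), and it assumes that the stored `atom` field of each Solr document holds a complete `<entry>` element. `buildFeed`, the feed title, id and timestamp are hypothetical.

```javascript
// Escape text for use inside XML element content.
function escapeXml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Glue the stored per-document Atom fragments into one Atom feed frame.
function buildFeed(title, feedId, updated, docs) {
  // Assumption: doc.atom is a ready-made <entry>; docs without it are skipped.
  const entries = docs.map((doc) => doc.atom).filter(Boolean);
  return [
    '<?xml version="1.0" encoding="utf-8"?>',
    '<feed xmlns="http://www.w3.org/2005/Atom">',
    `  <title>${escapeXml(title)}</title>`,
    `  <id>${escapeXml(feedId)}</id>`,
    `  <updated>${updated}</updated>`,
    ...entries,
    "</feed>",
  ].join("\n");
}

// Simplified sample entry; a real Atom entry also needs its own id and updated.
const feed = buildFeed(
  "Organizations",
  "http://example.com/data/atom/organizations",
  "2013-01-01T00:00:00Z",
  [{ atom: "<entry><title>ACME University</title></entry>" }, { label: "no atom stored" }]
);
console.log(feed);
```

Storing the entry XML at index time means the feed can be served without rendering anything per request, so serving it costs little more than the Solr query itself.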

Page 44: Multi faceted responsive search, autocomplete, feeds engine & logging

http://example.com/data/atom/organizations
http://example.com/data/atom/organizations/10/2
http://example.com/data/atom/organizations?fq=type:HE
http://example.com/data/atom/organizations?fq=type:HE&q=law

Consume with feeds reader

Page 45: Multi faceted responsive search, autocomplete, feeds engine & logging

Logging

Page 46: Multi faceted responsive search, autocomplete, feeds engine & logging

How?

Logging back-end written in PHP that writes to a MySQL database
- called asynchronously from JS library
- called inline in Feeds engine

Google Analytics (ga.js)
- called from JS library (searchwords and categories)

What?

- Search terms
- Facets
- User interaction
- List of search results
- Stack latency (JS, PHP, Solr)
- Search domain
- Session
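A log record covering the items listed above could be assembled on the client roughly as follows. The record layout, the `noResults` flag and the function names are assumptions made for this sketch; the deck does not show the logger's real payload or endpoint, so the transport is passed in as a function.

```javascript
// Assemble one log record from a finished search (hypothetical layout).
function buildLogEntry(search, latencyMs) {
  return {
    query: search.query,
    facets: search.facets,
    interaction: search.interaction,
    results: search.results.map((doc) => doc.id),
    noResults: search.results.length === 0,
    latency: { js: latencyMs.js, php: latencyMs.php, solr: latencyMs.solr },
    domain: search.domain,
    session: search.session,
  };
}

// Send asynchronously; a failing logger must never break the search UI.
function sendLog(entry, transport) {
  return Promise.resolve()
    .then(() => transport(JSON.stringify(entry)))
    .catch(() => {});
}

const entry = buildLogEntry(
  {
    query: "law",
    facets: ["ss_menu_1:edumenu"],
    interaction: "autocomplete-click",
    results: [{ id: "394354" }],
    domain: "utdanning.no",
    session: "abc123",
  },
  { js: 12, php: 20, solr: 8 }
);
const sent = [];
sendLog(entry, (body) => sent.push(body));
console.log(entry);
```

Keeping the three latencies separate is what makes it possible to tell whether a slow search was caused by the browser, the proxy or Solr itself.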

Page 47: Multi faceted responsive search, autocomplete, feeds engine & logging

Why?

Most popular queries with no results?

Most popular queries?

How does QPS affect latency?

Follow a user through search (interaction design & user testing)

Displaying logs

Charts are generated with Google Chart Tools in Drupal

Other statistics can easily be explored with Drupal Views

Page 48: Multi faceted responsive search, autocomplete, feeds engine & logging
Page 49: Multi faceted responsive search, autocomplete, feeds engine & logging
Page 50: Multi faceted responsive search, autocomplete, feeds engine & logging
Page 51: Multi faceted responsive search, autocomplete, feeds engine & logging

Demo (includes responsiveness)

Page 52: Multi faceted responsive search, autocomplete, feeds engine & logging

http://utdanning.no/sok

http://utdanning.no/search

http://utdanning.no/solrservice/utdanning.no

Page 53: Multi faceted responsive search, autocomplete, feeds engine & logging

Drupal 7
Apache Solr Search Integration + custom indexing
Omega theme (responsiveness with Drupal) + custom js

Ajax Solr + custom widgets
Solr Php Client r60 + custom proxy
Bootstrap (responsiveness without Drupal)

jQuery
Google Chart Tools

Page 54: Multi faceted responsive search, autocomplete, feeds engine & logging

Remi Mikalsen
[email protected]

iktsenteret.no

Multi-faceted responsive search, autocomplete, feeds engine and logging

Page 55: Multi faceted responsive search, autocomplete, feeds engine & logging

CONTACT
Remi Mikalsen
[email protected]