Apache Solr 4 Part 1 - Introduction, Features, Recency Ranking and Popularity Ranking

Apache Solr!

Ramzi Alqrainy!

Search Guy!

Part 1!

What !is Apache Solr ?!

Apache Solr!!

“ a standalone full-text search server with Apache Lucene at the backend. “!

Cont.!

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. !

In brief Apache Solr exposes Lucene's JAVA API as REST like API's which can be called over HTTP from any programming language/platform.!

Why!use Apache Solr ?!

Features!

l  Full Text Search!l  Faceted navigation!l  More items like this(Recommendation)/

Related searches !l  Spell Suggest/Auto-Complete!l  Custom document ranking/ordering!l  Snippet generation/highlighting!And a lot More....!

Why Solr ?!

Also, Solr is only provides :!

1. Result Grouping / Field Collapsing!

2. Query Elevation!

3. Pivot Facet!

4. Pluggable Search/update Workflow!

5. Hash-Based Duplication!

Field Collapsing

“ Collapses a group of results with the same field value down to a single (or fixed number) of entries.”!

For example, most search engines such as Google collapse on site so only one or two entries are shown, along with a link to click to see more results from that site. Field collapsing can also be used to suppress duplicate documents.!

Result Grouping

“ groups documents with a common field value into groups, returning the top documents per group, and the top groups based on what documents are in the groups”!

One example is a search at Best Buy for a common term such as DVD, that shows the top 3 results for each category ("TVs & Video","Movies","Computers", etc)!

Query Elevation

enables you to configure the top results for a given query regardless of the normal lucene scoring. This is sometimes called "sponsored search", "editorial boosting" or "best bets".!

Pivot Facet

You can think of it as "Decision Tree Faceting" which tells you in advance what the "next" set of facet results would be for a field if you apply a constraint from the current facet results!

Pluggable Search/update Workflow

You can modify the workflow of existing API endpoints / document instert or updates!

Hash-Based Duplication

Determining the uniqueness of a document not based on ad ID-Field, but the hash signature of a field.!

Useful for web pages for example, where the URL may be different but the content the same.!

Boost documents by age!

•  Just do a descending sort by age = done?!

•  B o o s t m o r e r e c e n t d o c u m e n t s a n d p e n a l i z e o l d e r documents just for being old!

•  U s e f u l f o r n e w s , business docs, and local search !

Solr: Indexing!In schema.xml: <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/> <field name="pubdate" type="tdate" indexed="true" stored="true" required="true" />

Date published = DateUtils.round(item.getPublishedOnDate(),Calendar.HOUR);

FunctionQuery Basics!•  FunctionQuery: Computes a value for each

document!– Ranking!– Sorting!

constant literal fieldvalue ord rord sum sub product

pow abs log sqrt map scale query linear

recip max min ms sqedist - Squared Euclidean Dist hsin, ghhsin - Haversine Formula geohash - Convert to geohash strdist

Solr: Query Time Boost!•  Use the recip function with the ms function:!q={!boost b=$recency v=$qq}& recency=recip(ms(NOW/HOUR,pubdate),3.16e-11,0.08,0.05)& qq=wine

•  Use edismax vs. dismax if possible:!

q=wine& boost=recip(ms(NOW/HOUR,pubdate),3.16e-11,0.08,0.05)

•  Recip is a highly tunable function!–  recip(x,m,a,b) implementing a / (m*x + b) –  m = 3.16E-11 a= 0.08 b=0.05 x = Document Age

Tune Solr recip function!

Tips and Tricks!•  Boost should be a multiplier on the relevancy score !

•  {!boost b=} syntax confuses the spell checker so you need to use spellcheck.q to be explicit!q={!boost b=$recency v=$qq}&spellcheck.q=wine

•  Bottom out the old age penalty using min:!–  min(recip(…), 0.20)

•  Not a one-size fits all solution – academic research focused on when to apply it !

•  Score based on number of unique views!

•  Not known at indexing time!

•  View count should be broken into time slots!

Boost by Popularity!

Popularity Illustrated!

Solr: ExternalFileField!In schema.xml: <fieldType name="externalPopularityScore" keyField="id" defVal="1" stored="false" indexed="false" class=”solr.ExternalFileField" valType="pfloat"/> <field name="popularity" type="externalPopularityScore" />

Popularity Boost: Nuts & Bolts!

Logs Solr Server

User activity logged

View Coun1ng Job

solr-home/data/ external_popularity

a=1.114 b=1.05 c=1.111 …

commit

Popularity Tips & Tricks •  For big, high traffic sites, use log analysis!

– Perfect problem for MapReduce!– Take a look at Hive for analyzing large volumes

of log data!

•  Minimum popularity score is 1 (not zero) … up to 2 or more!–  1 + (0.4*recent + 0.3*lastWeek + 0.2*lastMonth

•  Watch out for spell checker “buildOnCommit”!

Filtering By User Preferences •  Easy approach is to build basic preference

fields in to the index:!– Content types of interest – content_type!– High-level categories of interest - category!– Source of interest – source!

!•  We had too many categories and sources that

a user could enable / disable to use basic filtering!– Custom SearchComponent with a connection to a

JDBC DataSource!

Preferences Component!

•  Connects to a database!•  Caches DocIdSet in a Solr FastLRUCache!•  Cached values marked as dirty using a

simple timestamp passed in the request!!Declared in solrconfig.xml:! <searchComponent ! class=“demo.solr.PreferencesComponent" ! name=”pref">! <str name="jdbcJndi">jdbc/solr</str> ! </searchComponent>! 26

YOU HAVE QUESTIONS …. WE HAVE ANSWERS!

ramzi.alqrainy@gmail.com!

References!

•  h5p://wiki.apache.org/solr/ •  h5p://www.lucidworks.com/ •  Apache Solr 4 Cookbook

Apache Solr 4 Part 1 - Introduction, Features, Recency Ranking and Popularity Ranking

Technology

Transcript of Apache Solr 4 Part 1 - Introduction, Features, Recency Ranking and Popularity Ranking

Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitin

Recency in Verb Phrase Attachment

Boosting Documents in Solr by Recency, Popularity and Personal Preferences - By Timothy Potter

SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance

SOLR - vtechworks.lib.vt.edu · data from HBase Timestamp conversion ... Ranking Didn't use clustering and classification for recommendation Recommendation handler based on clustering

Solr - home.apache.orgpeople.apache.org/~yonik/presentations/Solr_notes.pdf · solr/data/index Master solr/data/index Searcher new segment solr/data/snapshot-2006062950000 1. hard

Apache Solr

Boosting Documents in Solr by Recency, Popularity, and User Preferences

NYC Lucene/Solr Meetup: Spark / Solr

UCSF Kenya DREAMS - Recency Learning Hub · 2019-07-01 · DREAMS RECENCY HUB LAB FORM Finalized on Wed, Feb 13, 2019 at 17:37 DREAMS RECENCY HUB LAB FORM Finalized on Thu, Feb 14,

Basics of Solr and Solr Integration with AEM6

Time is of Essence: Improving Recency Ranking Using Twitter Data

Oak / Solr integration Tommaso Teofili · Oak / Solr integration Tommaso Teofili . adaptTo() 2012 ! Why ! Search on Oak with Solr ! Solr based QueryIndex ! Solr based MK ! Benchmarks

Apache Solr Cookbook - the-eye.euApache Solr Cookbook iii 4 Solr autocomplete example 27 4.1 Install Apache Solr ...

Solr Recipes

Solr Flair: Search User Interfaces Powered by Apache Solr

Computation of “ReCenCy” CRiminal HistoRy points undeR ... · 8/18/2010 · “recency” points, this criminal history provision is not the only measure of “recency” in

Solr Flair

Optimizing SOLR to Improve Searchinfo2.magento.com/rs/magentosoftware/images/SOLR... · Agenda ! Overview of SOLR ! Basic Solr Troubleshooting – Common SOLR Troubleshooting and

2021-2022 Flight Crew Recency Requirements