Hatcher Erik - Rapid Prototyping with Solr

Rapid Prototyping with Solr

Erik Hatcher, Lucid Imagination, erik.hatcher@lucidimagination.com, May 25, 2011

Abstract §  Got data? Let's make it searchable! This interactive presentation will demonstrate getting documents into Solr quickly, provide some tips on adjusting Solr's schema to better match your needs, and finally discuss how to showcase your data in a flexible search user interface. We'll see how to rapidly leverage faceting, highlighting, spell checking, and debugging. Even after all that, there will be enough time left to outline the next steps in developing your search application and taking it to production.

My Background §  Erik Hatcher

•  Lucid Imagination §  Technical Staff

•  Co-author §  Java Development with Ant / Ant in Action (Manning) §  Lucene in Action (Manning)

•  Apache Software Foundation §  Committer – Lucene / Solr §  PMC – Lucene TLP §  Member

Why prototype? §  Demonstrate Solr can handle your data and searching needs; mitigate risk, learn the unknown

§  It’s quick and easy, with very little time investment

§  Immediate functional user interface impresses decision makers and target users; get buy-in •  The user interface IS the app

Prior Art §  Hoss’ amazing ISFDB work

•  http://www.lucidimagination.com/blog/tag/isfdb/ §  Previous “Rapid Prototyping with Solr” presentations

•  Data.gov Catalog on Solr: http://www.lucidimagination.com/blog/2010/11/05/data-gov-on-solr/

•  Rich text files on Solr: http://www.lucidimagination.com/Community/Hear-from-the-Experts/Podcasts-and-Videos/Rapid-Prototyping-Search-Applications-Solr

•  CSV (conference attendee data) on Solr: http://www.slideshare.net/erikhatcher/rapid-prototyping-with-solr-4312681

Rapid Prototyping using CSV §  Fired up Solr’s example configuration §  /update/csv

•  http://localhost:8983/solr/update/csv?commit=true&stream.file=EuroCon2010.csv&fieldnames=first,last,company,title,country&header=true&f.country.map=Great+Britain:United+Kingdom

§  Tweak configuration •  schema: domain-centric field names •  solrconfig: /browse request handler •  Template adjustments

§  Instant classic search results view, tree map visualization of facet data, and random selection of contest winners
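The /update/csv request above can also be scripted. The following is a minimal Ruby sketch that assembles the same URL from a hash of parameters. The parameter names and values come from the slide; the method names `csv_update_uri` and `load_csv` are made up for illustration, and the sketch assumes a local example Solr that permits `stream.file` (remote streaming enabled).

```ruby
require 'uri'
require 'net/http'

# Build the /update/csv URL shown on the slide.
# The base URL assumes Solr's example server on localhost:8983.
def csv_update_uri(base = 'http://localhost:8983/solr')
  params = {
    'commit'        => 'true',                                # commit after loading
    'stream.file'   => 'EuroCon2010.csv',                     # file read by the Solr server
    'fieldnames'    => 'first,last,company,title,country',    # column-to-field mapping
    'header'        => 'true',                                # first CSV line is a header
    'f.country.map' => 'Great Britain:United Kingdom'         # rewrite one country value
  }
  uri = URI("#{base}/update/csv")
  uri.query = URI.encode_www_form(params)                     # handles escaping of spaces, commas, colons
  uri
end

# Hypothetical helper: send the request and return the HTTP status code.
# Not called here, since it needs a running Solr instance.
def load_csv(uri = csv_update_uri)
  Net::HTTP.get_response(uri).code
end
```

Calling `load_csv` against a running instance should return "200" when the file is accepted. Keeping the parameters in a hash makes it easy to add further per-field options while iterating on the prototype.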

CSV results

… using rich text files §  curl "http://localhost:8983/solr/update/extract?stream.file=/docs/file.pdf&literal.id=/docs/file.pdf"

… using Data.Gov catalog data §  /update/csv – again!

Explaining

Suggest

Venn Viz

E-commerce data §  http://bbyopen.com/ §  Product data, via easy HTTP JSON API

Ingesting the data

    require 'solr'
    #...
    1.upto(max_pages) do |page|
      puts "Processing page #{page}"
      json = fetch_page(page)

      response = JSON.parse(json, :symbolize_names=>true)
      puts "Total products: #{response[:total]}" if page == 1

      mapping = {
        :id => :sku,
        :name_t => :name,
        :thumbnail_s => :thumbnailImage,
        :url_s => :url,
        :type_s => :type,
        :category_s => Proc.new {|prod|
          prod[:categoryPath].collect {|cat| cat[:name]}.join(' >> ')},
        :department_s => :department,
        :class_s => :class,
        :subclass_s => :subclass,
        :sale_price_f => :salePrice
      }

      Solr::Indexer.new(response[:products], mapping,
        {:debug => debug, :buffer_docs => 500}).index
    end

solr-ruby’s secret power §  Solr::Indexer.new(source, mapping, options).index

§  “Quacks like a duck” §  source simply #each’s §  mapping simply #[]’s
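To make the duck-typing point concrete, here is a small Ruby sketch of how a mapping of this shape could be applied to any source that responds to #each. It is an illustration only: `map_doc` is a hypothetical stand-in, not solr-ruby's implementation, and the sample product record is invented.

```ruby
# Hypothetical stand-in for the mapping step; NOT solr-ruby's own code.
# A rule is either a key to look up via #[] or something callable (a Proc).
def map_doc(record, mapping)
  mapping.each_with_object({}) do |(solr_field, rule), doc|
    doc[solr_field] = rule.respond_to?(:call) ? rule.call(record) : record[rule]
  end
end

# Any object with #each can act as the source; here, a plain Array of Hashes.
# The record below is made-up sample data.
products = [
  { sku: 123, name: 'Widget', categoryPath: [{ name: 'Home' }, { name: 'Gadgets' }] }
]

# Same shape as the mapping on the "Ingesting the data" slide, shortened.
mapping = {
  id: :sku,
  name_t: :name,
  category_s: ->(prod) { prod[:categoryPath].map { |cat| cat[:name] }.join(' >> ') }
}

docs = []
products.each { |prod| docs << map_doc(prod, mapping) }
```

Because the source only needs #each and the mapping only needs #[], a CSV reader, a database cursor, or a paged API wrapper could replace the Array without changing the mapping.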

… on Prism

What is Prism? §  Yet another opinionated brainstorm from Erik §  https://github.com/lucidimagination/Prism §  Under the covers

•  Ruby §  because it’s beautiful

•  Sinatra §  to be lightweight and have elegant flexible routing

•  Velocity §  because it is easy to learn and use, has powerful features, and facilitates edit/refresh work

§  Separate from Solr, Rack-savvy, allows easy coding of new routes and capabilities

§  Designed to work with any arbitrary Solr instance, and already has some basic LucidWorks Enterprise capability

§  Totally a proof-of-concept at this point – just a quick hack

… on Solritas

Solritas? §  Pronounced: so-LAIR-uh-toss §  Celeritas is a Latin word, translated as "swiftness" or "speed". It is often given as the origin of the symbol c, the universal notation for the speed of light - http://en.wikipedia.org/wiki/Celeritas

§  Technically it’s the VelocityResponseWriter (wt=velocity) •  simply passes the Solr response through the Apache Velocity templating engine

§  http://wiki.apache.org/solr/VelocityResponseWriter

§  Built into Solr, available instantly out of the box at: http://localhost:8983/solr/browse
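As a rough sketch of what "wt=velocity" means in practice, the Ruby below builds two URLs: one for the /browse handler and one for a plain /select request that asks for the Velocity response writer explicitly. `v.template` and `v.layout` are VelocityResponseWriter parameters; the values 'browse' and 'layout' are assumptions based on the example configuration and may differ in your version. The method names are made up for illustration, and the two requests are not exactly equivalent, since /browse typically carries other defaults (such as faceting) as well.

```ruby
require 'uri'

# Assumes Solr's example server on localhost:8983.
SOLR_BASE = 'http://localhost:8983/solr'

# URL for the prebuilt /browse UI.
def browse_uri(query)
  uri = URI("#{SOLR_BASE}/browse")
  uri.query = URI.encode_www_form('q' => query)
  uri
end

# URL for an ordinary /select request rendered through Velocity templates.
# The template and layout names are assumptions, not guaranteed defaults.
def velocity_select_uri(query, template = 'browse', layout = 'layout')
  uri = URI("#{SOLR_BASE}/select")
  uri.query = URI.encode_www_form(
    'q'          => query,
    'wt'         => 'velocity',   # choose the VelocityResponseWriter
    'v.template' => template,     # template name, without the .vm extension
    'v.layout'   => layout        # layout template wrapped around the result
  )
  uri
end

puts browse_uri('ipod')
puts velocity_select_uri('ipod')
```

Opening either URL in a browser against a running example instance is a quick way to see how much of the UI comes from the templates and how much from the handler's defaults.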

… on Blacklight

Blacklight? §  http://projectblacklight.org/ §  Blacklight is a free and open source Ruby on Rails based discovery interface (a.k.a. “next-generation catalog”) especially optimized for heterogeneous collections. You can use it as a library catalog, as a front end for a digital repository, or as a single-search interface to aggregate digital content that would otherwise be siloed.

§  Production sites: •  http://search.lib.virginia.edu/ •  http://searchworks.stanford.edu/

§  Features: •  Authentication •  Saved searches •  Bookmarks – saved result items •  Selected items – for exporting to 3rd party systems •  Customizable / extensible UI

Prototyping Tips and Tools §  Get data into Solr in the simplest possible way

•  CSV – if it fits, it’s really nice §  Schema adjusting

•  <dynamicField name="*" type="string" multiValued="true"/> •  <copyField source="*" dest="text"/>

§  Data analysis •  Understand what Solr is doing with your fields •  Solr’s Schema Browser and /admin/luke request handler

§  UI •  /browse – easy tweaking of <solr-home>/conf/velocity/*.vm templates

Now what? §  Script the indexing process: full and incremental/delta

§  Work with real users on real needs §  Integrate into production systems

§  Iterate on schema enhancements and configuration tweaks

§  Deploy to staging/production environments and work at scale: collection size, real queries and volume, hardware and JVM settings

Test §  Performance §  Scalability §  Relevance §  Automate all of the above, start baselines, avoid regressions

Thanks!
