Solr

Solr TechSig 18/4/13 slides.

Page 1: Solr

Solr

Page 2: Solr

What is it?

• Text search index (engine)
• Open source
• Not a search product
• A tool that allows you to create a search solution

Page 3: Solr

What is it like?

• Google, Google Appliance
• FAST
• Oracle Secure Enterprise Search
• etc.

Page 4: Solr

Google Appliance:

• Sucks data in
• Can’t really configure
• Stuck with results
• Bonnet is locked

Page 5: Solr

Solr:

• You need to feed data in
• Highly configurable
• Search results can be tuned
• There is no bonnet

Page 6: Solr

Why am I doing a talk?

• Did a course
• LucidWorks content
• Presented by FindWise
• FindWise are a search specialist that use a range of search engines

Page 7: Solr

Caveats

• Course was on Solr 4.1.0; we use 3.6.1 for APVMA
• Course focussed on search, not ingestion or presentation
• Java API recommended for ingestion
• ‘Browse’ interface uses Velocity templates for presentation, but probably isn’t good enough for most projects

Page 8: Solr

Where does Solr fit?

Page 9: Solr

Application Architecture

Page 10: Solr

Apache Tika

• Data import handler
• Used to be part of Lucene
• XML
• PDF
• Word
• Excel
• etc.

Page 11: Solr

ManifoldCF

• Apache
• Connector framework
• Used to connect to content repositories (source)
• SharePoint
• Documentum
• CMIS
• JDBC
• RSS

Page 12: Solr

Hydra

• FindWise
• Although Solr supports validation (e.g. ‘required’), don’t use it for data cleanup
• Validation failure inconvenient: whole job fails
• Feed in clean data
• Use Hydra for cleanup
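
The "feed in clean data" advice can be illustrated with a minimal sketch. This is not Hydra's API, which the slides do not show; it is a generic pre-check that sets aside documents missing required fields before anything is sent to Solr, so one bad record does not fail the whole job. The field names `id` and `title` and the function name are hypothetical.

```python
# Hypothetical required fields; in practice these would mirror the
# fields marked required="true" in schema.xml.
REQUIRED_FIELDS = ("id", "title")


def split_clean_and_rejected(docs, required=REQUIRED_FIELDS):
    """Separate documents that are safe to index from those that are not.

    Returns (clean, rejected), where rejected is a list of
    (document, missing_field_names) pairs for later inspection.
    """
    clean, rejected = [], []
    for doc in docs:
        missing = []
        for field in required:
            value = doc.get(field)
            # Treat absent, None, and blank-string values as missing.
            if value is None or (isinstance(value, str) and not value.strip()):
                missing.append(field)
        if missing:
            rejected.append((doc, missing))
        else:
            # Light cleanup: trim surrounding whitespace on string values.
            clean.append(
                {k: v.strip() if isinstance(v, str) else v for k, v in doc.items()}
            )
    return clean, rejected
```

Only the `clean` list would then be passed to the indexing step; the `rejected` list can be logged or fixed at the source.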

Page 13: Solr

Apache ZooKeeper

• Used for SolrCloud
• Clustering and sharding
• Solr 4.x only (not available in 3.6.1)
• Began as a Hadoop sub-project
• Used to manage Hadoop clusters

Page 14: Solr

Inside

Page 15: Solr

General Approach

• Design schema
• Prototyping
• Integration

Page 16: Solr

Design Schema

• A data modelling exercise
• schema.xml
• Dynamic fields can be useful in the first pass:

<dynamicField name="*" type="string" indexed="true" />
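
To see why a catch-all dynamic field helps in a first pass, the sketch below models how a field name might be resolved to a type: explicitly declared fields first, then dynamic patterns. It is a simplified model, not Solr's code. The explicit fields, the `*_dt` pattern, and the rule of preferring the longest matching pattern are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical explicit fields, as they might be declared in schema.xml.
EXPLICIT_FIELDS = {"id": "string", "title": "text_general"}

# Dynamic field patterns as (pattern, type) pairs. "*" is the catch-all
# from the slide; "*_dt" is an invented, more specific pattern.
DYNAMIC_FIELDS = [("*_dt", "date"), ("*", "string")]


def resolve_field_type(name):
    """Return the type a field name resolves to in this simplified model."""
    if name in EXPLICIT_FIELDS:
        return EXPLICIT_FIELDS[name]
    matches = [(p, t) for p, t in DYNAMIC_FIELDS if fnmatchcase(name, p)]
    if not matches:
        return None
    # Assumption: the longest (most specific) matching pattern wins.
    return max(matches, key=lambda pair: len(pair[0]))[1]
```

With the catch-all in place, any field that turns up in the source data is accepted as a string, which lets you index first and refine the schema afterwards.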

Page 17: Solr

Prototyping

• Get the data in (index)
• CSV, XML, JSON
• post.jar
• URL to search and inspect raw results
• ‘browse’ interface allows developer to understand how the search is working
• solrconfig.xml
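
The "URL to search" step can be sketched as a small helper that builds a query URL to paste into a browser. The host, port, and `/solr/select` path are assumptions about a default local install (a multi-core setup would include a core name in the path), and the example query is invented. The parameters `q`, `rows`, `fl`, `wt`, and `indent` are standard Solr query parameters.

```python
from urllib.parse import urlencode

# Assumption: a local Solr on the default port with a single core.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_search_url(query, rows=10, fields=None):
    """Build a select URL that returns indented JSON for easy inspection."""
    params = [
        ("q", query),        # the search query
        ("rows", rows),      # maximum number of results to return
        ("wt", "json"),      # response writer: JSON
        ("indent", "true"),  # pretty-print the response
    ]
    if fields:
        # Restrict the returned fields to keep the raw output readable.
        params.append(("fl", ",".join(fields)))
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("title:herbicide", fields=["id", "title"]))
```

Opening the resulting URL shows the raw response, which is usually the quickest way to check what was actually indexed.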

Page 18: Solr

Integration

• Not covered
• Content ingestion
• Presentation of results
• Up to you…

Page 19: Solr

Demo