Twin Cities Drupal Users Group - October 22, 2008
EthicShare: Solr + Drupal
Under the Hood Tour
EthicShare?
• Who: University of Minnesota's Center for Bioethics, the University of Minnesota Libraries, and the University of Minnesota Department of Computer Science and Engineering
• EthicShare’s pilot implementation builds on a recent planning phase that was a collaboration with the University of Virginia, Georgetown University, Indiana University-Bloomington, and Indiana University-Purdue University, Indianapolis.
• What: A sustainable aggregation of bioethics research and a platform for scholarship
• When: Pilot Phase runs from January 2008 - June 2009
• How: Funded by the Andrew W. Mellon Foundation
The Platform
• Drupal
– Community Development Framework
• Solr
– Faceted Search Appliance
The Process
• Origin: Created by CNET and released January 2006
• Became an Apache Software Foundation project shortly thereafter
• Builds on the Lucene Search Engine Library
• Comes with Lucene’s search syntax and features
• Provides simple HTTP/XML API
• Strongly typed field definitions
• Noteworthy Implementations
– Netflix, CNET Reviews, GameSpot, Digg
– More: http://wiki.apache.org/solr/PublicServers
Behind the Scenes - Indexing
• HTTP/XML API
– http://localhost:8983/solr/update
– http://localhost:8983/solr/select
• Indexing = POSTing XML Records to /update
• Commands: <add> <delete> <commit/> <optimize/>
<add>
<doc>
<field name="nid">101</field>
<field name="vid">2</field>
<field name="title">Solr Search is Simply Great</field>
<field name="body">Solr and Drupal are like PB And J</field>
<field name="changed">1224707462</field>
<field name="tid">4</field>
<field name="name">libsys</field>
<field name="uid">10297</field>
</doc>
</add>
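The indexing step above can be sketched in a few lines of Python. This is only an illustration, not the ApacheSolr module's own code (the module is written in PHP). The helper names `build_add_xml` and `post_update` are hypothetical. The URL is the one from the slide and assumes a Solr instance listening on localhost:8983.

```python
import urllib.request
import xml.etree.ElementTree as ET

# URL taken from the slide; adjust host, port, and path for your install.
UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_xml(doc):
    """Serialize a dict of field name -> value (or list of values) as an <add> message."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        # A list or tuple becomes one <field> element per value (multi-valued field).
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(v)
    return ET.tostring(add, encoding="unicode")


def post_update(xml_body, url=UPDATE_URL):
    """POST an XML command to /update and return the raw response body.

    Needs a running Solr; nothing in this sketch calls it automatically.
    """
    request = urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With Solr running, `post_update(build_add_xml({"nid": 101, "title": "Solr Search is Simply Great"}))` followed by `post_update("<commit/>")` would add the record and make it searchable.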
Behind the Scenes - Searching
• Get Contents of …/select URL: cURL, file_get_contents($url)…
• ApacheSolr makes use of a Solr PHP Client Abstraction Layer
– http://wiki.apache.org/solr/SolPHP
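Searching is the mirror image of indexing: build a /select URL, fetch it, and read the XML that comes back. The Python sketch below shows the idea under some assumptions. The helper names and the sample response are made up for illustration. The response layout (a `<result>` element holding `<doc>` elements) follows Solr's standard XML response writer. The PHP client mentioned above wraps the same steps.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# URL taken from the slides; assumes Solr is listening on localhost:8983.
SELECT_URL = "http://localhost:8983/solr/select"


def build_select_url(query, rows=10, base=SELECT_URL):
    """Build a /select URL with a URL-encoded query string."""
    return base + "?" + urllib.parse.urlencode({"q": query, "rows": rows})


def fetch(url):
    """Fetch the raw XML response (needs a running Solr; not called below)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_results(xml_text):
    """Return (numFound, list of docs) from a Solr XML response."""
    root = ET.fromstring(xml_text)
    result = root.find("result")
    docs = []
    for doc in result.findall("doc"):
        record = {}
        for field in doc:
            if field.tag == "arr":
                # Multi-valued fields arrive as <arr> with one child per value.
                record[field.get("name")] = [child.text for child in field]
            else:
                record[field.get("name")] = field.text
        docs.append(record)
    return int(result.get("numFound")), docs


# Hand-written sample response, for illustration only.
SAMPLE_RESPONSE = """<response>
<result name="response" numFound="1" start="0">
<doc><int name="nid">101</int><str name="title">Solr Search is Simply Great</str>
<arr name="tid"><int>4</int></arr></doc>
</result>
</response>"""
```

All values come back as strings here; a real client would convert them according to the element type (`int`, `str`, `date`, and so on).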
Setup - Solr Directory Layout
Tomcat Files:
…/tomcat/webapps/solr_ethicshare.war (cp solr.war from example dir)
…/tomcat/conf/Catalina/localhost/solr_ethicshare.xml
solr_ethicshare.xml - Tell Tomcat About Solr
<Context docBase="solr_ethicshare.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/usr/local/solr_home/ethicshare" override="true"/>
</Context>
Solr Schema - Fields and Types
• Starter schema:
– ../drupaldir/sites/all/modules/apachesolr/schema.xml
• <types> ex:
– string=solr.StrField
– boolean=solr.BoolField
• <fields>
– <field name="title" type="string" indexed="true" stored="true"/>
Solr Schema - <type> Analyzers
• Tokenize on whitespace, then remove any common words (StopFilterFactory)
• Remove any duplicates (RemoveDuplicatesTokenFilterFactory)
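To make the analyzer chain concrete, here is a rough Python imitation of the filters named above. It is a sketch, not Solr's implementation. The stop word list is invented, and the stop filter is assumed to ignore case. In Solr, RemoveDuplicatesTokenFilterFactory drops a token only when an identical token already sits at the same position, which usually happens after stemming or synonym expansion, so tokens are modeled here as (position, term) pairs.

```python
# Invented stop word list, for illustration only; Solr reads its list from a file.
STOP_WORDS = {"a", "an", "and", "are", "is", "of", "the"}


def whitespace_tokenize(text):
    """Split on whitespace and record each token's position."""
    return [(position, term) for position, term in enumerate(text.split())]


def stop_filter(tokens, stop_words=STOP_WORDS):
    """Drop common words; comparison is case-insensitive (an assumption)."""
    return [(pos, term) for pos, term in tokens if term.lower() not in stop_words]


def remove_duplicates(tokens):
    """Drop a token if the same term was already seen at the same position."""
    seen = set()
    kept = []
    for token in tokens:
        if token not in seen:
            seen.add(token)
            kept.append(token)
    return kept


def analyze(text):
    """Run the three steps in the order the slide lists them."""
    return remove_duplicates(stop_filter(whitespace_tokenize(text)))
```

Positions are kept after stop word removal, so the gaps in the numbering show where words were dropped.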
Solr Schema - Dynamic Fields
<dynamicField name="smfield*" type="string" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="tmfield*" type="text" indexed="true" stored="true" multiValued="true"/>
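A dynamic field is a name pattern: a field not declared explicitly in the schema takes the type of the dynamicField whose pattern it matches. The sketch below imitates that lookup in Python for the two patterns above. The function name and the example field names are hypothetical, and real Solr has more detailed matching rules than a simple glob.

```python
from fnmatch import fnmatchcase

# (pattern, type) pairs copied from the two <dynamicField> declarations above.
DYNAMIC_FIELDS = [
    ("smfield*", "string"),
    ("tmfield*", "text"),
]


def resolve_dynamic_type(field_name, dynamic_fields=DYNAMIC_FIELDS):
    """Return the type of the first matching pattern, or None if nothing matches."""
    for pattern, type_name in dynamic_fields:
        if fnmatchcase(field_name, pattern):
            return type_name
    return None
```

For example, a hypothetical field `smfield_author` would be indexed as a multi-valued string, while `tmfield_abstract` would go through the text analyzer.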
Solr Schema - Some Example Options
• uniqueKey
<!-- Field to use to determine and enforce document uniqueness.
     Unless this field is marked with required="false", it will be a required field
-->
<uniqueKey>nid</uniqueKey>
• defaultSearchField
<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>text</defaultSearchField>
• solrQueryParser
<!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
<solrQueryParser defaultOperator="AND"/>
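The effect of defaultSearchField and defaultOperator on a plain keyword query can be approximated as follows. This is a deliberately naive sketch, not the real query parser. It ignores phrases, parentheses, and boolean keywords, and the function name `expand_query` is hypothetical.

```python
def expand_query(user_query, default_field="text", default_operator="AND"):
    """Show roughly how bare terms pick up the default field and operator."""
    clauses = []
    for term in user_query.split():
        # Terms that already name a field (field:value) are left alone.
        clauses.append(term if ":" in term else f"{default_field}:{term}")
    return f" {default_operator} ".join(clauses)
```

With the settings above, a search for `solr drupal` behaves like `text:solr AND text:drupal`, so both words must appear in the default field.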
ApacheSolr Search Integration Module
• Core Search Integrated
• Blocks for facet configuration
• Schedules Indexing (via core search)
• Theme Hooks for overriding look and feel
• CCK Integration
– hook_apachesolr_cck_field_mappings()
  – Which Fields to Index
  – How to Index them
  – Callback to pre-process fields
  – Whether or Not to Provide a Facet Block
• Help! We need testers for alpha3!
• http://drupal.org/project/apachesolr
Links
• Installing Solr + Tomcat
– http://mikejoconnor.net/content/solr-ubercartorg
• Google Book Search API
– http://code.google.com/apis/books/
• unAPI
– http://unapi.info/
• ApacheSolr Search Integration
– http://drupal.org/project/apachesolr
• IBM Developer Works - Solr
– http://www.ibm.com/developerworks/java/library/j-solr1/
• SolPHP
– http://wiki.apache.org/solr/SolPHP