Jazzed about Solr: People as a Search Problem - By Joshua Tuberville
![Page 1: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/1.jpg)
Jazzed About Solr: People as a Search Problem
Thursday, May 26, 2011
![Page 2: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/2.jpg)
About Me
• Building websites since 1996, Java since 1997
• Prior web search experience
• Building and scaling eHarmony products since 2002
![Page 3: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/3.jpg)
What is Jazzed
• Subscription Based Dating Site
• Incubated by eHarmony
![Page 4: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/4.jpg)
What is Jazzed
• Create a profile
• Search for others
• View their photos
• Privately communicate
![Page 8: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/8.jpg)
How is it different?
• Covers a broader range of relationships
• Easy to get started
• Real profiles screened by machine and humans
• Fast, effective search-oriented tools
![Page 9: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/9.jpg)
Jazzed Stats
• Started Fall 2009
• Beta Summer 2010
• Launched October 2010
• 100,000s of Profiles
• 1,000s of Searches Daily
![Page 10: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/10.jpg)
Jazzed Architecture
• Event-driven SOA
• REST, JSON, EIP, Not-only-SQL
• Technology incubation
![Page 11: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/11.jpg)
Tech Stack
• Java 6, Spring 3, Jersey 1.1, JMS (AMQP)
• RHEL 4, Oracle 11g, Voldemort 0.81, Solr 1.4.1, NFS
![Page 12: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/12.jpg)
![Page 13: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/13.jpg)
![Page 14: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/14.jpg)
Not Covered
• Distributed Search
• Caching Strategies
• Data Import
• Analyzers/Tokenizers
![Page 15: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/15.jpg)
Why Lucene?
• Proven, solid IR library
• Prefer Open Source Solutions
• Not Only SQL
• Flexible Ranking
• Pluggable
![Page 16: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/16.jpg)
Why Solr
• Performant, Extensible, RESTful Service
• Configuration, Schema, Multicores
• Admin Interface
• Replication, Backups, Monitoring
![Page 17: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/17.jpg)
Open Source
• Strengthens Engineering Team
• Be a part of a great community
• Not Brochure-ware
![Page 18: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/18.jpg)
Not Only SQL
• One solution does not fit all
• Prefer availability over consistency
• Horizontal Scaling over Vertical
![Page 19: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/19.jpg)
Flexible Ranking
• Query Strategies
• Boolean Algebra
• Vector Space Analysis
• Hybrids
• Extensive Function Support
• Index and Query Boosting
![Page 20: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/20.jpg)
...Oh My!
• Standard Plugins - Geospatial*, Faceting, Spelling, MoreLikeThis
• Full Text with Highlighted Results
• Client agnostic
![Page 21: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/21.jpg)
Inevitable Question
• “Does it scale?”
• Solr POC Benchmark
  • 10 million profiles
  • >200 queries/sec, under 100 ms at the 90th percentile
  • Default tuning until 5 million profiles
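The benchmark latency is quoted as a percentile rather than an average. As a rough illustration only (the latency samples and the `percentile` helper below are hypothetical, and nearest-rank is just one of several percentile definitions), a 90th-percentile figure can be derived from per-query timings like this:

```python
def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 gives the 1-based rank.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]

# Hypothetical per-query latencies in milliseconds (not from the talk).
latencies_ms = [12, 18, 25, 31, 40, 47, 55, 63, 88, 240]
p90 = percentile(latencies_ms, 90)
```

A single slow outlier (240 ms here) moves the average a lot but leaves the 90th percentile untouched, which is why percentile targets are common for search latency.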
![Page 22: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/22.jpg)
Profile Service
• RESTful Hybrid Data Service
• Public, Private, Attributes
• Event Producer
![Page 23: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/23.jpg)
Profiles
• Mostly structured
• Categories - Eye Color, Desired Ethnicity
• Dates - Birthdate
• Numbers - Coordinates, Age Range
• Text - Name, Headline
![Page 24: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/24.jpg)
Inverting People
• Stored as an inverted index
• Index is randomly accessed by term

| Term | Documents |
| --- | --- |
| MALE | 1, 3, 5, 7, 9 |
| FEMALE | 2, 4, 6, 8, 10 |
| HAIR_RED | 8 |
| HAIR_BLOND | 1, 2, 5, 6 |
| EYE_BLUE | 1, 2, 3, 10 |
| EYE_BROWN | 4, 5, 6, 7, 8, 9 |
| fun | 1, 3, 7, 9 |
| funny | 2, 4, 6, 10 |
| beach | 1, 2, 3, 4, 5, 6, 7, 8 |
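To make the term-to-documents mapping concrete, here is a minimal sketch of building such an index in memory. The three sample profiles are chosen to agree with the slide's table, but `profiles` and `build_inverted_index` are illustrative names, and Lucene's real on-disk structures are far more elaborate.

```python
from collections import defaultdict

# Hypothetical profiles: document id -> terms extracted from the profile.
profiles = {
    1: ["MALE", "HAIR_BLOND", "EYE_BLUE", "fun", "beach"],
    2: ["FEMALE", "HAIR_BLOND", "EYE_BLUE", "funny", "beach"],
    8: ["FEMALE", "HAIR_RED", "EYE_BROWN", "beach"],
}

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term in set(docs[doc_id]):  # count a term once per document
            index[term].append(doc_id)
    return dict(index)

index = build_inverted_index(profiles)
```

Looking up `index["HAIR_RED"]` is then a single dictionary access, which is the "random access by term" property the slide refers to.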
![Page 25: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/25.jpg)
Schema Design
• Single “Table”
• One-to-many = multi-value fields
• Individual vs Composite Fields
  • Use copyField and have both!
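As a sketch of what the single-"table" shape implies, the snippet below flattens a profile plus a one-to-many relation into one document, with a composite `text` field assembled from individual fields in the spirit of Solr's `copyField` directive. The field names (`desiredEthnicity`, `headline`, `text`) and the helper are assumptions for illustration, not the Jazzed schema; in Solr itself the copy is declared in the schema rather than done by the client.

```python
def to_search_document(profile, desired_ethnicities, copy_sources=("name", "headline")):
    """Flatten a profile and its one-to-many data into a single document."""
    doc = dict(profile)
    # One-to-many relation becomes a multi-valued field on the same document.
    doc["desiredEthnicity"] = sorted(set(desired_ethnicities))
    # Composite field built from individual fields; the originals are kept too.
    doc["text"] = " ".join(profile[f] for f in copy_sources if profile.get(f))
    return doc

doc = to_search_document(
    {"id": "42", "name": "Pat", "headline": "Loves the beach", "eyeColor": "EYE_BLUE"},
    ["ASIAN", "HISPANIC", "ASIAN"],
)
```

Keeping both the individual fields and the composite one lets a query target `name` precisely or search everything through `text`.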
![Page 26: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/26.jpg)
Field considerations
• Stored or not
• Indexed or not
• Multivalued - desires fields
• Type
![Page 27: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/27.jpg)
Solr Types Used
• tdate, tint, tfloat* - birthdate, loginAt
• text - all text
• string - id, non-indexed text
• random - good for random sorts
• enum - for all enumerations
The ‘t’ is for Trie
![Page 28: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/28.jpg)
Data Duplication
• By function - numberPhotos & hasPhotos
• By relationship - hiddenBy & hidden
• By analysis - name & text
![Page 29: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/29.jpg)
Saving Profiles
• Updating is an in-memory operation
• No partial updates
• Commit means flushing index changes
• Autocommit on maxDocs, maxTime, or both
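The interplay between buffered updates and commits can be sketched with a toy model. This is not Solr's implementation: `AutoCommitBuffer` is a hypothetical class, its thresholds only mimic the idea behind maxDocs and maxTime, and the time check runs on each add rather than on a background timer.

```python
import time

class AutoCommitBuffer:
    """Toy model: updates stay pending until a commit makes them searchable."""

    def __init__(self, max_docs=None, max_time_s=None, clock=time.monotonic):
        self.max_docs = max_docs
        self.max_time_s = max_time_s
        self.clock = clock
        self.pending = {}   # id -> full document awaiting commit
        self.visible = {}   # id -> document visible to searches
        self.oldest_pending_at = None

    def add(self, doc):
        if not self.pending:
            self.oldest_pending_at = self.clock()
        # No partial updates: the whole document replaces any earlier version.
        self.pending[doc["id"]] = dict(doc)
        if self._should_commit():
            self.commit()

    def _should_commit(self):
        if self.max_docs is not None and len(self.pending) >= self.max_docs:
            return True
        if self.max_time_s is not None:
            return self.clock() - self.oldest_pending_at >= self.max_time_s
        return False

    def commit(self):
        self.visible.update(self.pending)
        self.pending.clear()
        self.oldest_pending_at = None
```

With `AutoCommitBuffer(max_docs=2)`, the first added profile stays invisible until the second arrives, which is the staleness window the Voldemort slide goes on to address.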
![Page 30: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/30.jpg)
Why Also Voldemort
• Private profiles cannot be stale
• Many fields not searchable or viewable by others
• Isolate queries from fetch-by-id
![Page 31: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/31.jpg)
Querying
• Superset of Lucene
• Efficient Range Queries
• Multiple Query Handlers
  • Dismax, Boost, Geo
![Page 32: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/32.jpg)
Recall vs Precision
• Focus on recall when the corpus is small
• Focus on precision once it reaches critical mass
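For readers who want the two terms pinned down: precision is the fraction of returned results that are relevant, and recall is the fraction of relevant items that were returned. A minimal sketch follows, with made-up ids and a hypothetical helper name:

```python
def precision_recall(returned_ids, relevant_ids):
    """Return (precision, recall) for one query's result set."""
    returned, relevant = set(returned_ids), set(relevant_ids)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Four profiles returned, three relevant overall, two of them found.
precision, recall = precision_recall([1, 2, 3, 4], [2, 4, 6])
```

With a small corpus, strict filters can easily return nothing at all, so loosening the query (higher recall) usually matters more than trimming marginal matches.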
![Page 33: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/33.jpg)
Boolean Queries
• Default operator set to AND
• +gender:FEMALE +seeking:MALE +eyeColor:EYE_BLUE +hairColor:(HAIR_RED OR HAIR_BLOND)
• Sort order is important
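Conceptually, a pure Boolean query like the one above is set algebra over posting lists: required clauses intersect, and alternatives within one field union. The sketch below assumes small in-memory sets; the `seeking:MALE` postings are invented for the example, and the helper names are not Lucene APIs.

```python
# Posting lists keyed by "field:value". Gender, eye and hair postings follow the
# inverted index example from the talk; the seeking:MALE list is hypothetical.
index = {
    "gender:FEMALE": {2, 4, 6, 8, 10},
    "seeking:MALE": {2, 4, 6, 8},
    "eyeColor:EYE_BLUE": {1, 2, 3, 10},
    "hairColor:HAIR_RED": {8},
    "hairColor:HAIR_BLOND": {1, 2, 5, 6},
}

def all_of(*posting_sets):
    """AND: documents present in every posting list."""
    return set.intersection(*posting_sets)

def any_of(*posting_sets):
    """OR: documents present in at least one posting list."""
    return set.union(*posting_sets)

matches = all_of(
    index["gender:FEMALE"],
    index["seeking:MALE"],
    index["eyeColor:EYE_BLUE"],
    any_of(index["hairColor:HAIR_RED"], index["hairColor:HAIR_BLOND"]),
)
```

Every match satisfies every clause equally, so there is nothing to rank by relevance, which is one reading of why the slide stresses choosing a sort order.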
![Page 34: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/34.jpg)
Hybrid Queries
• Default operator set to OR
• +gender:FEMALE +seeking:MALE eyeColor:EYE_BLUE hairColor:(HAIR_RED HAIR_BLOND)
![Page 35: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/35.jpg)
Why you’re lucky if you like redheads
• Inverse Document Frequency (IDF)
• Rarer is favored over more common
• More fields matched = higher ranking
1. Blue-eyed redheads
2. Blue-eyed blonds
3. Redheads
4. Blonds
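The ordering on this slide can be reproduced with a simplified scoring model: filter by the required clause, then sum an IDF weight for each optional term a profile matches. The formula `1 + ln(N / (df + 1))` follows the shape of Lucene's classic similarity, but this sketch deliberately omits the other factors in Lucene's real scoring, so treat the numbers as illustrative. The postings reuse the earlier inverted index example.

```python
import math

postings = {
    "FEMALE": {2, 4, 6, 8, 10},
    "HAIR_RED": {8},
    "HAIR_BLOND": {1, 2, 5, 6},
    "EYE_BLUE": {1, 2, 3, 10},
}
NUM_DOCS = 10

def idf(term):
    """Rarer terms (smaller document frequency) get a larger weight."""
    return 1.0 + math.log(NUM_DOCS / (len(postings[term]) + 1))

def rank(required, optional):
    """Keep docs matching all required terms; score by matched optional terms."""
    candidates = set.intersection(*(postings[t] for t in required))
    scored = []
    for doc_id in candidates:
        score = sum(idf(t) for t in optional if doc_id in postings[t])
        scored.append((doc_id, score))
    # Highest score first; ties broken by id for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

ranked = rank(["FEMALE"], ["EYE_BLUE", "HAIR_RED", "HAIR_BLOND"])
ranked_ids = [doc_id for doc_id, _ in ranked]
```

In this tiny data set the blue-eyed blond (2) comes first because she matches two optional terms, the lone redhead (8) outranks the single-term matches (6 and 10) because red hair is rarer, and profile 4 trails with no optional matches.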
![Page 36: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/36.jpg)
Boosting
• Query-time boosting by importance
• eyeColor:EYE_BLUE^2 hairColor:HAIR_BLOND
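A search service typically assembles such query strings from user preferences. The helper below is a hypothetical sketch of that step: it emits required clauses with `+`, optional clauses with an optional `^boost`, and URL-encodes the result together with Solr's `q.op` parameter. It does not escape special characters in values, which real code would need to handle.

```python
from urllib.parse import urlencode

def build_query(required, optional):
    """required: list of (field, value); optional: list of (field, value, boost)."""
    parts = [f"+{field}:{value}" for field, value in required]
    for field, value, boost in optional:
        clause = f"{field}:{value}"
        if boost != 1:
            clause += f"^{boost}"  # weight this clause more (or less) heavily
        parts.append(clause)
    return " ".join(parts)

q = build_query(
    [("gender", "FEMALE"), ("seeking", "MALE")],
    [("eyeColor", "EYE_BLUE", 2), ("hairColor", "HAIR_BLOND", 1)],
)
params = urlencode({"q": q, "q.op": "OR"})
```

Keeping query construction in one function also makes it easier to version search strategies, since each strategy is just a different way of filling in the clause lists.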
![Page 38: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/38.jpg)
Filter Fields
• Useful for roles and other lists
• -hidden:(2 4 6)
• -hiddenBy:1

| id | hidden |
| --- | --- |
| 1 | 2, 4, 6 |
| 2 | 1 |

| id | hiddenBy |
| --- | --- |
| 1 | 2 |
| 2 | 1 |
| 4 | 1 |
| 6 | 1 |
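The two tables hold the same relationship from opposite directions. A minimal sketch of deriving `hiddenBy` from `hidden` at index time, then applying the equivalent of `-hiddenBy:1` to a result list, might look like this (the function names are illustrative):

```python
from collections import defaultdict

# Who each member has hidden, as on the slide: member 1 hid 2, 4 and 6.
hidden = {1: [2, 4, 6], 2: [1]}

def invert_hidden(hidden_lists):
    """Build the reverse mapping: profile id -> members who have hidden it."""
    hidden_by = defaultdict(list)
    for viewer, targets in sorted(hidden_lists.items()):
        for target in targets:
            hidden_by[target].append(viewer)
    return dict(hidden_by)

def visible_to(viewer, result_ids, hidden_by):
    """Drop results the viewer has hidden, like a -hiddenBy:<viewer> clause."""
    return [doc_id for doc_id in result_ids if viewer not in hidden_by.get(doc_id, [])]

hidden_by = invert_hidden(hidden)
results_for_1 = visible_to(1, [2, 3, 4, 5, 6, 7], hidden_by)
```

Storing the inverted field means the exclusion clause carries a single term (the viewer's id) no matter how many profiles that viewer has hidden.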
![Page 40: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/40.jpg)
Date Math
• Simplifies query preprocessing
• +birthDate:[NOW/DAY+1DAY-36YEAR TO NOW/DAY-25YEAR]
Between 25 and 35 years old (inclusive)
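To see why the bounds use 36 and 25 years, it helps to compute the same range by hand. The sketch below mirrors the expression's order of operations (round to the day, add one day, subtract years); `years_ago` and `birthdate_range` are hypothetical helpers, and the February 29 handling is an assumption about reasonable behaviour rather than a statement of how Solr resolves it.

```python
from datetime import date, timedelta

def years_ago(day, years):
    """Same calendar day `years` earlier, clamping Feb 29 to Feb 28."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        return day.replace(year=day.year - years, day=28)

def birthdate_range(today, min_age, max_age):
    """Inclusive birthdate bounds for people aged min_age..max_age today."""
    # Oldest match turns max_age + 1 tomorrow: (today + 1 day) - (max_age + 1) years.
    earliest = years_ago(today + timedelta(days=1), max_age + 1)
    # Youngest match turns min_age today: today - min_age years.
    latest = years_ago(today, min_age)
    return earliest, latest

earliest, latest = birthdate_range(date(2011, 5, 26), 25, 35)
```

On the day of the talk this yields 1975-05-27 through 1986-05-26: someone born on the earlier date is still 35 until the next day, and someone born on the later date turned 25 that day.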
![Page 41: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/41.jpg)
Distance Searching
• lat, lon, distance
• LocalSolr by Patrick O’Leary
• Additional overhead ~90ms per query
• Superseded in Solr 3.1
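The underlying question, "which profiles lie within N kilometres of this point", can be illustrated with a brute-force haversine filter. This is only a sketch of the distance math: it scans every profile and is not how LocalSolr or Solr's built-in spatial support is implemented, and the coordinates are sample values.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within(profiles, lat, lon, radius_km):
    """Ids of profiles whose coordinates fall inside the search radius."""
    return [p["id"] for p in profiles
            if haversine_km(lat, lon, p["lat"], p["lon"]) <= radius_km]

# Sample profiles near Los Angeles and San Francisco.
sample = [
    {"id": 1, "lat": 34.05, "lon": -118.24},
    {"id": 2, "lat": 37.77, "lon": -122.42},
]
nearby = within(sample, 34.0, -118.2, 50)
```

Evaluating trigonometry for every candidate is exactly the kind of per-query overhead the slide mentions, which is why spatial indexes narrow the candidate set before computing exact distances.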
![Page 42: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/42.jpg)
Testing Queries
• Log queries and ids returned
• Version your search strategies
• Improve one thing at a time
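One possible shape for this practice, assuming JSON log lines and a simple overlap measure (both are choices made for this sketch, not something the talk specifies): record the strategy version with each query, then replay logged queries under a new strategy and compare the top results.

```python
import json

def log_entry(strategy_version, query, ids):
    """Serialize one search as a JSON line tagged with the strategy version."""
    return json.dumps(
        {"strategy": strategy_version, "q": query, "ids": list(ids)},
        sort_keys=True,
    )

def overlap_at_k(ids_a, ids_b, k=10):
    """Jaccard overlap of the top-k ids from two strategies (1.0 = identical sets)."""
    top_a, top_b = set(ids_a[:k]), set(ids_b[:k])
    if not top_a and not top_b:
        return 1.0
    return len(top_a & top_b) / len(top_a | top_b)

line = log_entry("v2", "+gender:FEMALE eyeColor:EYE_BLUE", [2, 10, 6])
similarity = overlap_at_k([1, 2, 3, 4], [3, 4, 5, 6], k=4)
```

A low overlap after a supposedly small change is a signal to inspect the differing ids before rolling the new strategy out, in keeping with improving one thing at a time.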
![Page 43: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/43.jpg)
Geo Service
• Read-mostly service
• Fields - Postal Code, Country, State, Cities, Lat, Lon
• Usage - Registration Validation, City Selection
![Page 44: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/44.jpg)
Operations
• Servlet container and filesystem
• Jetty 6, 64-bit Java 6 JVM
• 8 GB heap, -XX:+UseCompressedOops
![Page 45: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/45.jpg)
Operations
• Active/Passive
• Layer 7 Load balancing
• Nightly snapshots
• Eventually SolrCloud
![Page 46: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/46.jpg)
Multicore
• Run multiple schemas on the same Solr instance
• Hot-swappable for backwards-compatible changes
• Private / public profiles
![Page 47: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/47.jpg)
Security
• No security provided
• At minimum secure your UpdateHandler
• Separate Cores
<delete><query>*:*</query></delete>
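One way to act on this advice is to put a gatekeeper in front of Solr that only lets read-only search requests through, so a delete-everything message like the one above never reaches the update handler. The check below is a hypothetical sketch of such a rule: the allowlist, the host name in the usage example, and the decision to refuse any `qt` parameter are assumptions, and in practice this logic usually lives in a proxy, firewall, or servlet filter.

```python
from urllib.parse import urlsplit, parse_qs

# Assumption: only the read-only search handler is exposed to callers.
ALLOWED_HANDLERS = ("/select",)

def is_allowed(url):
    """Allow only requests whose path ends in an allowlisted handler."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    if not path.endswith(ALLOWED_HANDLERS):
        return False
    # Conservatively refuse requests that try to choose a handler by parameter.
    return "qt" not in parse_qs(parts.query)

ok = is_allowed("http://search.internal:8983/solr/public/select?q=*:*")
blocked = is_allowed("http://search.internal:8983/solr/public/update")
```

Separate cores complement this: even if a public core were wiped, private profile data held in another core (or another store) would be unaffected.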
![Page 48: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/48.jpg)
Future
• Solr 3.1
• Mutual Matching
• Faceting / Guided Search
• Incorporating spelling
• Hierarchies, categories, better ranking models
![Page 49: Jazzed about Solr: People as a Search Problem - By Joshua Tuberville](https://reader038.fdocuments.in/reader038/viewer/2022110307/555af27cd8b42abe058b4f66/html5/thumbnails/49.jpg)
Faceting
• Returns counts with query results
• Efficient
• Guides the user toward precision
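Facet counts are per-value tallies over the matching documents. As a conceptual sketch only (Solr computes these inside the engine, and the documents and helper name here are made up), the idea is:

```python
from collections import Counter

def facet_counts(docs, field):
    """Count how many matching documents carry each value of `field`."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        # Treat single-valued and multi-valued fields uniformly.
        values = value if isinstance(value, list) else [value]
        counts.update(v for v in values if v is not None)
    return counts.most_common()

# Hypothetical result set for some query.
results = [
    {"id": 2, "hairColor": "HAIR_BLOND"},
    {"id": 6, "hairColor": "HAIR_BLOND"},
    {"id": 8, "hairColor": "HAIR_RED"},
    {"id": 4},
]
hair_facets = facet_counts(results, "hairColor")
```

Showing "Blond (2), Red (1)" next to the results tells the user in advance how many profiles each refinement would leave, which is how faceting steers a broad search toward a precise one.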