Webinar: Replace Google Search Appliance with Lucidworks Fusion
Replacing GSA with Lucidworks Fusion
Evan Sayer Senior Search Engineer
Lucidworks
Guy Sperry Enterprise Content Management & Big Data Architect
County of Sacramento
Introduction
• Lucidworks
– Founded in 2007
– Contributes ~70% of the open-source code committed to the Apache Lucene/Solr project
• Lucidworks Fusion: our enterprise search platform built on top of Apache Solr
• Apache Solr: the most popular open-source enterprise search engine on Earth
Google Search Appliance (GSA)
• Google’s enterprise search solution offered from 2002-2016
• One-stop shopping: a complete enterprise-search solution in one box
• EoL as of February 2016, support phased-out completely by 2018
GSA Strengths
• Easy to set up and configure
– “plug and play”
– Lower start-up cost and lower time-to-value than many other contemporary solutions
– Relatively straightforward to operate on an ongoing basis
– Achieve a decent search experience quite quickly and easily
• Takeaway: GSA minimized necessary investment in technical expertise
Replacing GSA with Fusion
• Easy to set up and configure, “plug and play”
– Fusion Index Workbench
• Quickly connect to and ingest data
• Intuitively iterate on improving search results
• Easily A/B test tweaks to ETL logic
– Dashboards and Log Analytics
– Monitoring/alerting APIs that integrate with common tools to ease ongoing maintenance
GSA Strengths
• Out-of-box search UI
– Highly useful during development, iterating on relevancy improvements, etc.
– Customizable enough to use as an end-user search UI
• Takeaway: GSA minimized necessary investment in technical expertise
Replacing GSA with Fusion
• Out-of-box search UI
– Lucidworks View
• Highly customizable/“skin-able”
• Fully open-source: https://github.com/lucidworks/lucidworks-view
• Built on top of a modern stack (AngularJS)
GSA Strengths
• Broad support for connecting to, ingesting, and securing data
– Many out-of-box connectors to common sources: CRM, Wikis, databases etc.
– Extensible connector framework
• Takeaway: GSA minimized necessary investment in technical expertise
Replacing GSA with Fusion
• Broad support for connecting to, ingesting, and securing data
– Fusion ships with ~40 connectors to common sources
• JDBC, Web, Alfresco, Box, Dropbox, Drupal, GitHub, Google Drive, Jive, JIRA, SharePoint, MongoDB, Hadoop/HDFS, Salesforce, Slack, lots more…
• Fusion connectors’ security-trimming functionality secures content/searches out-of-box
– Fusion Index Pipelines enable easily pushing data into the index as well, via a REST API
– Custom connector development via Fusion’s Connectors API
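To make the idea of security trimming concrete, here is a minimal conceptual sketch. It is not Fusion's implementation or API: the `acl_groups` field name and the `security_trim` function are hypothetical, and a real deployment would more likely apply the restriction as a filter inside the search engine than by post-filtering results in application code.

```python
from typing import Dict, Iterable, List


def security_trim(results: List[Dict], user_groups: Iterable[str]) -> List[Dict]:
    """Keep only documents whose ACL shares at least one group with the user.

    `acl_groups` is a hypothetical field that a connector is assumed to have
    populated at index time with the groups allowed to read the document.
    """
    allowed = set(user_groups)
    return [doc for doc in results if allowed & set(doc.get("acl_groups", []))]


# Illustrative data only.
hits = [
    {"id": "1", "title": "Public agenda", "acl_groups": ["everyone"]},
    {"id": "2", "title": "Legal memo", "acl_groups": ["legal"]},
    {"id": "3", "title": "Unlabeled draft"},  # no ACL: hidden from everyone
]

visible = security_trim(hits, ["everyone", "hr"])
print([doc["id"] for doc in visible])
```

Documents with no ACL information are dropped in this sketch, which is the conservative choice; whether that is the right default depends on the source repository.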
GSA Weaknesses
• Broad theme: insufficient control over the search experience
– Relevancy tuning and controls are exceedingly opaque
• “Source Biasing”: +/- [strong|medium|weak]
– Lack of control over indexing workflow
• Custom metadata processing was a chore, if feasible
– Often referred to as a “black box” design
• Non-trivial to scale
– Appliance packaging restricts freedom in scaling up
– Per-document pricing model
• Incorrect facet counts!?
Fusion – Fine-grained Control over *Everything*
• Fusion Index Pipelines
– True fine-grained control over ETL; as much or as little as desired
• For content from source X, I want to redact this set of keywords
• For content from source Y, I want to extract the title from this HTML tag
• For content from source Z, I want to look up the authorized groups from another database, and add them to a field in each document
• Fusion Query Pipelines
– True fine-grained control over request/response logic at query-time
• For queries containing keyword X, I want to rewrite the query to be something else
• For queries in language Y, I want to boost results matching in this separate set of fields
• For matching documents containing keyword Z, I want to redact all occurrences of Z before returning the results
– Fusion signals: collect users’ queries+clicks and aggregate them over time
• Utilize this knowledge to dynamically boost the most commonly-clicked item(s) for a given query
• Continually improve relevancy without manual human input
• If you’re already familiar with Solr/Lucene, hack away! :)
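The three index-pipeline examples on this slide (redact keywords for source X, extract a title for source Y, attach authorized groups for source Z) can be pictured as per-source chains of small stages. The sketch below is plain Python for illustration only and does not use Fusion's pipeline API; every name in it (`redact_keywords`, `extract_title`, `add_groups`, `PIPELINES`, the `body`/`title`/`groups` fields) is hypothetical, and the in-memory dictionary stands in for the "other database" of authorized groups.

```python
import re
from typing import Callable, Dict, List

Document = Dict[str, object]
Stage = Callable[[Document], Document]


def redact_keywords(keywords: List[str]) -> Stage:
    """Stage that replaces each listed keyword in `body` with [REDACTED]."""
    if not keywords:
        return lambda doc: dict(doc)
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)

    def stage(doc: Document) -> Document:
        out = dict(doc)
        out["body"] = pattern.sub("[REDACTED]", str(doc.get("body", "")))
        return out

    return stage


def extract_title(tag: str) -> Stage:
    """Stage that copies the text of the first <tag>...</tag> into `title`."""
    pattern = re.compile(
        rf"<{re.escape(tag)}\b[^>]*>(.*?)</{re.escape(tag)}>",
        re.IGNORECASE | re.DOTALL,
    )

    def stage(doc: Document) -> Document:
        out = dict(doc)
        match = pattern.search(str(doc.get("body", "")))
        if match:
            out["title"] = match.group(1).strip()
        return out

    return stage


def add_groups(acl_lookup: Dict[str, List[str]]) -> Stage:
    """Stage that adds authorized groups, looked up by document id."""

    def stage(doc: Document) -> Document:
        out = dict(doc)
        out["groups"] = acl_lookup.get(str(doc.get("id")), [])
        return out

    return stage


# One stage chain per source; sources and values are made up for the example.
PIPELINES: Dict[str, List[Stage]] = {
    "source_x": [redact_keywords(["secret"])],
    "source_y": [extract_title("h1")],
    "source_z": [add_groups({"doc-1": ["legal", "hr"]})],
}


def run_index_pipeline(source: str, doc: Document) -> Document:
    """Apply the stages configured for `source`, in order."""
    for stage in PIPELINES.get(source, []):
        doc = stage(doc)
    return doc


print(run_index_pipeline("source_x", {"id": "a", "body": "A Secret plan"}))
```

A query pipeline can be thought of the same way, with stages that rewrite the request before it reaches the engine and stages that post-process the response.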
Fusion – Fine-grained Control over *Everything*
• Scaling
– Fusion utilizes best-in-class Apache Solr as the backend search engine
• Scale to billions of documents linearly
– Fusion services scale independently
• As opposed to GSA, which scaled in units of entire appliances
• If you want to ingest content faster, add additional connector nodes
• If you want to enable greater query throughput, add additional query-processing nodes
– Straightforward APIs/processes for provisioning additional nodes
• Just spin up a new node, install Fusion, and point it at the central cluster manager (Apache ZooKeeper)
• Easily overlay Fusion on top of any existing Solr cluster
Fusion as a platform
• Get started with ease: https://lucidworks.com/products/fusion/download/
1. Point Fusion at your data
2. Set up a simple baseline search app with Lucidworks View
3. Iterate on the actual search experience to your heart’s content :)
• Delve into the details (or don’t!)
– Fusion provides the necessary framework to tackle tough and/or use-case-specific search problems
– Anything but a “black box” design
– Most components are customizable and extensible
• Implement your own Fusion components in Java using our APIs
• Scale with minimal effort, maximal flexibility
– Scale linearly up to billions of docs with Apache Solr
– Self-service APIs for setting up additional nodes to expand capacity
– Per-node instead of per-doc pricing means fewer surprises when it’s time to renew licenses
“Fusion gave us the features we needed to replace Google Search Appliance in a matter of weeks. With Fusion’s out-of-the-box capabilities, we skipped months in our dev cycle so we could focus our team where they would have the most impact. We cut our licensing costs by 50% and improved application usability. The Lucidworks professional services team amplified our success even further.
“We’re all Fusion from here on out!”
Lourduraju Pamishetty, Senior IT Application Architect, Infoblox
Customers Who’ve Made the Switch
Fusion as a platform
• Accurate facet counts
– What a concept! :)
• Take Fusion for a spin: https://lucidworks.com/products/fusion/download/
Agenda
• Introduction to County of Sacramento
• Why Sacramento County is search first for data delivery
• How Fusion helps us meet our data delivery challenges
• How Fusion has helped us fill gaps left by GSA retirement
Sacramento County
• 34 departments and affiliated organizations serving 1.5 million people
• Commitment to open government and transparency
• Citizen engagement
Why Sacramento County is Search First
• Enterprise apps, data snackers and LOB apps
– ADABAS (Mainframe)
– RDBMS
– CDH
– ECM
• Diverse, heterogeneous data environment
• Our challenge: securely deliver prompt access to relevant data
Fusion/Solr in Sacramento County
• Documents and content
– Cross-repository search
– Source repository security
• GIS
• Cross-Source Data Processing and Analytics
– Fusion connectors
– Spark in Fusion
• Log Analysis
• NOSQL – Why be MEAN when you can be SANE?
Gaps Left by GSA
Fusion was our final GSA patch
AgendaSearch.saccounty.net
• The Brown Act
– Make public meetings accessible to citizens
– Maintain transparency
• AgendaSearch
– Search and consume public documents
– Integrate with agenda management
– Lucidworks View
– Has reduced PRAs
Immediate Win with View
• County Legal Counsel
• ~2 million document archive
• Document level security
• Intuitive and feature rich UI
• Search solution delivered before lunch
Q&A
Resources:
• Download Fusion: https://lucidworks.com/products/fusion/download/
• Lucene/Solr Revolution 2016 – Oct 11-14 – Boston, MA: lucenerevolution.org