
    Getting Started With LucidWorks Enterprise

    A Lucid Imagination Technical White Paper

    October 2010

    © 2010 by Lucid Imagination, Inc. under the terms of Creative Commons license, as detailed at http://www.lucidimagination.com/Copyrights-and-Disclaimers/. Version 1.5, published 7 October 2010. Solr, Lucene, and their logos are trademarks of the Apache Software Foundation.


    Abstract

    LucidWorks Enterprise is the search solution development platform built on the power of Apache Solr/Lucene technology, developed by the enterprise search experts at Lucid Imagination. LucidWorks Enterprise leverages the disruptive innovation of the leading open source search technology to deliver unmatched scalability to billions of documents, with subsecond query and faceting response time. By building and expanding the scalable power of Solr open source technology with vital new features, the search experts at Lucid Imagination have created an integrated platform that simplifies and empowers predictable, reliable search application development.

    This document is intended to provide you with a basic working knowledge of the LucidWorks Enterprise search development platform. It provides an overview of the software's functions and an explanation of how to use them from the provided user interface, as opposed to from a programming perspective.

    You will learn about installation, indexing content from local files, web sites, and databases, and searching, as well as improving the user experience using features such as user alerts, auto-complete, and spell-check.

    This document does not require any previous programming experience.


    Table of Contents

    Introduction
        What You'll Learn in This Document
        What This Document Won't Teach You
        How LucidWorks Enterprise Works
    Installation
        Using the Installation Wizard
        Installing Via Command Line
        Testing the Installation
    Basic Searching
        Understanding Search Queries
            Searching Individual Fields
            Range Queries
        Faceted Searching
    Improving Search Coverage
        Understanding Fields
        Indexing the Local Filesystem
        Indexing HTTP Documents
        Indexing Database Records
        Indexing Solr Documents
        Scheduling Tasks
        Deleting Test Documents
    Improving the Search Experience
        User Alerts
        Helping Users Create Their Queries
            Auto-Complete
            Spell-Checking
            Find Similar Links
            Enabling These Functions
            Specifying Fields
            Indexing
    Improving Relevancy
        Synonyms
        Stopwords
        Click Scoring
    Summary
    Next Steps


    Introduction

    Welcome to LucidWorks Enterprise, the search platform that takes the power of Solr/Lucene and delivers it to you in one convenient, supported package. This document will take you from installing the software through some of the things you'll need to know to make the most of it.

    LucidWorks Enterprise has been designed to provide you with the search capabilities and benefits of Solr while still providing the ease of use you need to work efficiently in an environment in which data is everywhere, and you need to get a handle on it. While it does provide some great opportunities for programmers to take control and build powerful search applications using those capabilities, it's also been designed to take much of the pain out of using such a complex system.

    As such, many of the things you can do with LucidWorks Enterprise can be accomplished without any programming at all. The administrative user interface provides a way to index documents for searching, make queries, and even learn about how your system is being used. It also provides a web interface to most of the functions you'll need to run your system.

    What You'll Learn in This Document

    This document is meant to give you a running start on getting the most out of LucidWorks Enterprise. It teaches you about:

    • How LucidWorks Enterprise works
    • Installing the software and indexing your first test collection
    • Searching for data, and how to make the most of the individual fields you've indexed
    • How to use faceted searches to filter results
    • How to index local files (such as word processing documents), web pages (such as those on local or remote web sites), and even databases
    • How to tell LucidWorks about specific attributes of your data
    • How to configure user-friendly features such as user alerts for new content, auto-complete, spell-check, and the ability to find related results that don't necessarily contain the specified search term
    • How to improve the quality of search results by specifying synonyms and stopwords (common words that should be left out of a search)

    What This Document Won't Teach You

    This document is intended to get you started using LucidWorks Enterprise; it's not a programmer's guide. In addition to numerous REST-based APIs, LucidWorks Enterprise gives you complete access to the source code underlying the platform, so you have full control over query handling, results relevancy, and other factors. For more information on making use of these capabilities, see the product documentation.

    How LucidWorks Enterprise Works

    LucidWorks Enterprise works by "indexing" data, or breaking it down into individual words or terms, each of which is assigned to a "field", against which you can later search. A collection of fields is considered a "document". For example, a PDF on your hard drive might have fields for author, title, and text, and these three fields make up the document.

    Because you can control not only the names of these fields, but also how they're treated by the indexer and the query engine, you have a great deal of flexibility when it comes to indexing your data. For example, you might index product data out of your database, and specify that you want the title and description to be indexed (searchable) and that the id column is to be treated as a unique key, so that if you update the database, each product document in the index can be updated appropriately.

    Once you've indexed your data, it's ready for searching. The query parser takes the user's request and compares it to the data stored in the index. If it finds a relevant match (or matches), it then returns information about all of the matching documents.
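    The idea of documents, fields, and terms can be illustrated with a toy index. This is only a sketch in Python of the general technique (an inverted index keyed by field and term); it is not how LucidWorks Enterprise or Solr is implemented, and the class name, field names, and sample documents are all hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; real analyzers do far more."""
    return text.lower().split()


class ToyIndex:
    """A minimal inverted index: (field, term) -> ids of matching documents."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, fields):
        # Break each field's value into terms and record where they occur.
        for field, value in fields.items():
            for term in tokenize(value):
                self.postings[(field, term)].add(doc_id)

    def search(self, field, term):
        # Look the term up in one field and return the matching ids.
        return sorted(self.postings.get((field, term.lower()), set()))


index = ToyIndex()
index.add("doc1", {"author": "Jane Doe", "title": "Search Basics",
                   "text": "indexing and searching"})
index.add("doc2", {"author": "John Roe", "title": "Faceting",
                   "text": "narrowing search results"})

print(index.search("title", "search"))
print(index.search("text", "search"))
```

    Note that the same term can match in one field and not in another, which is why the choice of field matters when you search.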

    LucidWorks Enterprise provides a convenient web-based interface for indexing content, and for controlling the types of information to be returned for each document that satisfies a query. You can then build upon the application to decide how to use that data. That said, the LucidWorks Enterprise interface provides everything you need to see the results of a query, with no programming required.

    But first, you'll need to install the software.

    Installation

    At the time of this writing, LucidWorks Enterprise is available as a limited distribution Developer Access Release, from http://www.lucidimagination.com/lucidworks-enterprise. Once you download the software, you're ready to run the installer.

    LucidWorks Enterprise provides both a graphical and a command-line option for installation. Both provide the same options, but the command-line version can be used for systems where a graphical user interface isn't an option.

    Note that this document is meant to be a guide, and not a comprehensive look at installation. If you need more details, please search the downloaded documentation for "Installation".

    Using the Installation Wizard

    To install LucidWorks Enterprise using the GUI, perform the following steps:

    1) Make sure that you have Java 1.6 (JDK or JRE) installed on your machine. You can download Java from http://www.oracle.com/technetwork/java/javase/downloads/index.html.

    2) Double-click the installer to start it. If you are using the *.jar file and double-clicking doesn't start the installer, go to the command line and type:

    java -jar lucidworks-enterprise-installer.jar

    (Make sure to use the correct filename.)

    3) Click Next to go to the system requirements. The typical desktop machine will handle small-scale data collections (fewer than 100,000 documents), depending on your specific hardware and existing software, as well as how you intend to use the data. You will need 8GB to 16GB for a large-scale deployment. LucidWorks Enterprise runs on Windows (XP or higher), Linux (kernel 2.4 or higher), and MacOS (10.5 or higher).


    4) Click Next. Read the user agreement, make sure it doesn't contain anything objectionable, and click "I accept the terms of this license agreement." Click Next again.

    5) Choose which components you want to install on this machine, and the addresses from which you want to access them.

    The default is to install and activate all components on the local server using ports 8888 and 8989, and for beginning purposes this is just fine. If this configuration conflicts with existing applications, however, feel free to change the port numbers. Keep in mind, however, that when they are installed during the same session, the SearchUI, AdminUI, and Alerts need to use the same port.

    If you are installing components on different machines or ports, be sure to alter the addresses to point to the appropriate server and port.

    Click Next to continue.


    6) Select the installation path and click Next. Click OK to let the installer create the new directory.

    7) Review the selected options and click Next to install the software. Depending on your system, this may take several minutes. When the progress bar shows that the installation is complete, click Next.

    8) LucidWorks Enterprise takes a bit longer than usual to start for the first time, so make sure that "Start LucidWorks Enterprise" is checked, and then click Next to continue with the installation.

    9) Click the Next button when it becomes available.

    10) Decide whether to create shortcuts for other users, and where to place them, and then click Next.

    11) If all has gone well, you will see a screen telling you LucidWorks Enterprise has been installed, and offering the opportunity to create an installation script. If you are installing on multiple computers, this script will simplify the process by pre-filling the values you chose for this installation. Click Done to dismiss the installer.

    Installing Via Command Line

    The process for installing via the command line is virtually identical to installing via the graphical interface. Follow these steps:

    1) Make sure that you have Java 1.6 (JDK or JRE) installed on your machine. You can download Java from http://www.oracle.com/technetwork/java/javase/downloads/index.html.

    2) In the directory in which the *.jar file is located, execute the following command:

    java -jar lucidworks-enterprise-installer.jar --console

    (Again, make sure to use the correct file name.)

    3) Follow the steps presented in the installer. For information on specific steps, see the instructions above.


    Testing the Installation

    You can test the new installation by pulling up the administration user interface in your browser. To do that, go to:

    http://localhost:8989

    If everything installed and started properly, you will see the generic search page:

    Of course, at this point, you don't actually have any data indexed, so there's no point searching. To index data, you will need to log into the admin UI, so click the login link at the top of the page and log in using:

    Username: admin

    Password: admin

    (This user is pre-installed; before going to production, you will want to use either the User API or an LDAP directory to manage your users.)

    Logging in will bring you to the Quick Start page, where you can index content. Just to have something to search against, click Local Filesystem and enter the path for a smallish directory of documents.


    Click Continue.

    For now, choose to index this content immediately.

    Click Finish to go to the Dashboard Summary page.


    This page shows you what's been going on with your system. In this case, we haven't had any queries, so we can just see the number of documents that have been indexed. This page updates automatically, so by watching it, you'll know when indexing has been completed. The final data will show under Recently Completed.

    Finally, we have data to search! Click the Search tab to go to the search page.


    Notice that information about the data you've indexed already appears on the search page. In this case, you can see document authors, the data source, and the types of documents. These notations are called facets, and can be used to narrow results. (We'll talk more about facets later.)

    Enter a search term and click Search to see the results.


    Now that we know everything's working properly, we can talk about searching itself.

    Basic Searching

    On the surface, searching is pretty simple. Enter a keyword, press search, and get results. And that's true. It is that simple. But it's also powerful, in that you have the ability to get more out of your searching than a simple keyword search. In this section, we'll discuss how to get more out of your searches.

    Understanding Search Queries

    The simplest search query involves just a keyword or phrase, such as when I enter "lucid" in the search box:


    We can also combine terms into a single query. For example, I can find documents that contain the terms "indexing" and "delete" with:

    indexing AND delete

    In fact, LucidWorks Enterprise provides a default AND operator unless you specify

    something else, such as

    indexing OR delete

    which finds documents that contain either term.

    LucidWorks Enterprise also supports a whole range of other operators, including comparative operators such as < and proximity operators such as NEAR, BEFORE, or AFTER. See the LucidWorks Enterprise User's Guide for the list.
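    To make the difference between AND and OR concrete, here is a small Python sketch that applies both rules to three made-up documents. It only mimics the matching logic described above; the function names and sample text are hypothetical, and it does not call LucidWorks Enterprise.

```python
# Three hypothetical documents, identified by a short id.
docs = {
    "a": "indexing content and how to delete it",
    "b": "indexing local files",
    "c": "how to delete test documents",
}


def terms(text):
    """Split a document into a set of lowercase terms."""
    return set(text.lower().split())


def match_all(documents, *words):
    """AND: a document matches only if it contains every word."""
    return sorted(doc_id for doc_id, text in documents.items()
                  if all(word in terms(text) for word in words))


def match_any(documents, *words):
    """OR: a document matches if it contains at least one word."""
    return sorted(doc_id for doc_id, text in documents.items()
                  if any(word in terms(text) for word in words))


print(match_all(docs, "indexing", "delete"))  # indexing AND delete
print(match_any(docs, "indexing", "delete"))  # indexing OR delete
```

    With AND, only the document containing both terms is returned; with OR, all three are.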


    Searching Individual Fields

    Sometimes, however, you want to be more specific. For example, I might want to find all of my PDF files. If I did a search for just

    application/pdf

    I'd get no results, because that information isn't stored in the default search field. Instead, I could search for

    mimeType:application/pdf

    This tells LucidWorks Enterprise to search for documents that have a value of application/pdf in the mimeType field.

    Range Queries

    You also have the option to search for a range of values. For example, I can find all of the documents in my index that have 50 pages or fewer with:

    pageCount:[0 TO 50]

    Or if I wanted to be even more specific, I could find all PDF files in that range:

    mimeType:application/pdf AND pageCount:[0 TO 50]

    One place you often find this kind of query is in faceted searches.
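    If you assemble queries like these in a script rather than typing them, a few small helpers keep the syntax consistent. The Python sketch below builds the same query strings shown above and then applies the equivalent test to some made-up records; the helper names and sample data are hypothetical, and the square-bracket range is treated as inclusive at both ends.

```python
def field_query(field, value):
    """Build a clause that searches one named field."""
    return f"{field}:{value}"


def range_query(field, low, high):
    """Build an inclusive range clause such as pageCount:[0 TO 50]."""
    return f"{field}:[{low} TO {high}]"


def all_of(*clauses):
    """Join clauses so that every one of them must match."""
    return " AND ".join(clauses)


query = all_of(field_query("mimeType", "application/pdf"),
               range_query("pageCount", 0, 50))
print(query)

# The same test applied by hand to hypothetical records.
records = [
    {"id": 1, "mimeType": "application/pdf", "pageCount": 12},
    {"id": 2, "mimeType": "application/pdf", "pageCount": 50},
    {"id": 3, "mimeType": "text/html", "pageCount": 3},
    {"id": 4, "mimeType": "application/pdf", "pageCount": 120},
]
short_pdfs = [r["id"] for r in records
              if r["mimeType"] == "application/pdf"
              and 0 <= r["pageCount"] <= 50]
print(short_pdfs)
```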

    Faceted Searching

    The use of facets is one of the great recent advances in searching. Search facets enable users to narrow down their search by a variety of factors. For example, if we go back to the original keyword search for "lucid", you can see a number of different options down the right-hand side of the page:


    Notice that each entry includes not just a description of what it is, but also how many relevant documents there are. So I can see that this data source has 18 HTML documents that mention Lucid, and one that was authored by Grant Ingersoll. If I wanted to narrow my search to, say, OpenOffice presentations, I could click that link under Type.


    This narrows the list from 51 to just 7 documents. If I wanted to, I could further narrow the

    list, say, to show only the documents authored by me.

    To clear the existing filters and go back to the original search, click the "clear filters" link under the search box.
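    Conceptually, a facet is a count of how many results share each value of a field, and clicking a facet is a filter on that value. The following Python sketch shows that idea on made-up results; the function names and data are hypothetical, and this is not the LucidWorks Enterprise implementation.

```python
from collections import Counter

# Hypothetical search results; not every document has an author.
results = [
    {"title": "Install guide", "type": "text/html", "author": "A. Writer"},
    {"title": "Overview deck",
     "type": "application/vnd.oasis.opendocument.presentation"},
    {"title": "FAQ", "type": "text/html"},
]


def facet_counts(docs, field):
    """Count results per value; documents lacking the field are skipped."""
    return Counter(doc[field] for doc in docs if field in doc)


def apply_filter(docs, field, value):
    """Narrow the results to documents whose field equals the value."""
    return [doc for doc in docs if doc.get(field) == value]


type_counts = facet_counts(results, "type")
author_counts = facet_counts(results, "author")
html_only = apply_filter(results, "type", "text/html")

print(type_counts)
print(author_counts)
print([doc["title"] for doc in html_only])
```

    Clearing the filters simply means going back to the unfiltered result list.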

    Now let's look at getting more data to search.


    Improving Search Coverage

    Of course, the quality of search results depends on the quality of data added to the index. If the appropriate information isn't in the index, even the most carefully constructed query isn't going to find it.

    In this section, we'll show you how to index both local and remote content so that it can be found by your users.

    Understanding Fields

    The first thing we'll need to do before doing any indexing is understand just how LucidWorks Enterprise looks at the data we're putting into it.

    Each document is made up of one or more fields. You can see a list of existing fields by

    clicking the Index tab, and the Fields subtab in the administration user interface.


    If you click a field, you'll see the option to edit the properties of that field. When you're just starting out, you'll want to understand these properties:

    • Name: This value is the name by which the field is known, both in the indexing process and in queries.
    • Field Type: This property determines how the field is handled. For example, in this case, we're looking at English text (as opposed to German, etc.) rather than a date, a sortable number, and so on.
    • Indexed: This property specifies whether the contents of this field are used to determine whether a document matches a particular query.
    • Stored: This property determines whether the original value of the field is stored, potentially to be returned as part of a result.
    • Multi-valued: This property determines whether a document can have multiple values for this field.
    • Field Default: This value determines the value that will be used for the document if no value is given when it's indexed.
    • Search by Default: This property determines whether the field will be used in a search for which the user doesn't specify a particular field.
    • Include in Results: Make sure this option is checked if you want this field to show up in the search results for this document.
    • Highlight: This property determines whether the given term will be shown in context for this field.
    • Facet: This property determines whether this field shows up as an available filter on the search page. Note that documents without this field won't show up in the counts for this facet.
    • Use for Deduplication: In all likelihood, you will want to re-index your content as it changes. This setting enables you to determine how LucidWorks Enterprise knows "this is the same document." For a product, it might be a product number. For HTTP data sources, it might be the URL.

    You can also delete and add fields from this interface.
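    The deduplication setting is easiest to picture as a key under which each document is stored: indexing a document whose key already exists replaces the earlier copy instead of adding a second one. The Python sketch below illustrates that behavior with a plain dictionary; the function name, the "sku" field, and the product data are hypothetical.

```python
def reindex(index, records, key_field):
    """Store each record under its key so a repeated key replaces the old copy."""
    for record in records:
        index[record[key_field]] = record
    return index


index = {}

# First indexing run: two products.
reindex(index, [
    {"sku": "P-1", "name": "Widget", "price": 9.99},
    {"sku": "P-2", "name": "Gadget", "price": 24.50},
], "sku")

# Later run: the price of P-1 has changed in the source data.
reindex(index, [{"sku": "P-1", "name": "Widget", "price": 8.49}], "sku")

print(len(index))
print(index["P-1"]["price"])
```

    Without a key field, the second run would have produced a third document rather than updating the first.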

    Indexing the Local Filesystem

    Even if you haven't given any thought to what documents you'd like to search, you likely have a ready source of material right on your hard drive. To create a data source from local files, click the Index tab, then the Sources subtab, and the FileSystem sub-subtab.

    Enter a user-friendly name for your new data source and the full directory path in which

    the documents are stored. You have the option to drill down into subdirectories or not, as

    well as to follow symbolic links or not.

    (Note that for security reasons, data indexed from the local filesystem will not automatically be available via a link from the search results. Search the product documentation for "linking" for information on how to configure LucidWorks Enterprise to activate those links.)

    Click Create to create the data source.


    Note that this process does not start the indexer; we'll look at that under "Scheduling" in a moment.

    Indexing HTTP Documents

    Another option is to index web documents. You might want to index the contents of your local intranet, or perhaps you have a repository of content that's currently available via the browser. You can also use it to monitor external web sites. To set up a data source of web content, click the Index tab, then the Sources subtab and the Web sub-subtab.

    The Name should be something you'll recognize later, and the URL is the address at which you want the crawler to start.

    The Allow Paths and Disallow Paths values enable you to control where the crawler goes. For example, if I were to index my own site, as I'm doing here, I might want only my own content, so I've specified only paths that start with my URL, using a regular expression to specify the rest of the path. Similarly, I might not want to index my administration pages.


    You can use Disallow Paths to allow the crawler to follow links to external sites, but avoid,

    say, indexing tens of thousands of tweets.

    The Crawl Depth specifies how far down the crawler will go. A depth of 0 crawls only the

    specified URL; to get only that page and the pages linked directly by it, specify a depth of 1.
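    The interaction of the start URL, Allow Paths, Disallow Paths, and Crawl Depth can be sketched as a breadth-first walk over links. The Python example below runs on a small in-memory link graph instead of the real web. The site layout, the function name, and the rule that a disallow pattern wins over an allow pattern are all assumptions made for illustration; the actual crawler may resolve conflicts differently.

```python
import re
from collections import deque

# Hypothetical site: each URL maps to the links found on that page.
LINKS = {
    "http://example.com/": ["http://example.com/docs/",
                            "http://example.com/admin/",
                            "http://other.example.org/"],
    "http://example.com/docs/": ["http://example.com/docs/page1"],
    "http://example.com/docs/page1": [],
    "http://example.com/admin/": ["http://example.com/admin/users"],
    "http://other.example.org/": [],
}


def crawl(start, max_depth, allow=(), disallow=()):
    """Return URLs reachable from start within max_depth link hops."""
    allow_res = [re.compile(p) for p in allow]
    disallow_res = [re.compile(p) for p in disallow]

    def permitted(url):
        # Assumption: a disallow match always excludes the URL.
        if any(r.search(url) for r in disallow_res):
            return False
        # With no allow patterns, everything else is permitted.
        return not allow_res or any(r.search(url) for r in allow_res)

    seen = {start}
    queue = deque([(start, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for link in LINKS.get(url, []):
            if link not in seen and permitted(link):
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


rules = {"allow": [r"^http://example\.com/"], "disallow": [r"/admin/"]}
print(crawl("http://example.com/", 0, **rules))
print(crawl("http://example.com/", 1, **rules))
print(crawl("http://example.com/", 2, **rules))
```

    A depth of 0 returns only the start page, a depth of 1 adds the permitted pages it links to, and each further level follows one more hop.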

    Click Create to create the data source.

    Note that this process does not start the indexer; we'll look at that under "Scheduling" in a moment.

    Indexing Database Records

    Another fertile area for data indexing is the database. Here the effort required is a little greater, but so are the potential rewards. Before you index any database content, however, there are two tasks you must accomplish:

    1) Determine the fields you're going to be indexing from your database. It's unlikely that the fields with which LucidWorks Enterprise is preconfigured will match your column names exactly, and unless you map those columns to existing fields, all of your data will wind up in the text_all field, making it difficult to search.

    Fortunately, LucidWorks Enterprise gives you complete and easy control over field definitions. Use the Fields sub-subtab to create any necessary fields.

    2) Make sure the appropriate JDBC driver is available to LucidWorks Enterprise. LWE

    doesn't ship with any drivers, so you will have to upload your own. To do

    that, you'll need to download Curl (available at http://curl.haxx.se/download.html)

    and use the following command to upload the *.jar file with the appropriate drivers:

    curl -F file=@ http://localhost:8888/api/collections/collection1/jdbcdrivers
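
    If you script this step, it can help to assemble the curl invocation programmatically. The sketch below only builds the argument list; the jar path /tmp/mysql-connector.jar is a hypothetical placeholder for wherever your driver file lives, while the endpoint and the -F file=@ form field are taken from the command above.

```python
import shlex


def build_driver_upload_command(jar_path,
                                base_url="http://localhost:8888",
                                collection="collection1"):
    """Build the curl argument list for uploading a JDBC driver jar.

    jar_path is supplied by the caller; nothing here checks that it exists.
    """
    endpoint = f"{base_url}/api/collections/{collection}/jdbcdrivers"
    # curl's -F option sends a multipart form field; "@" means "read this file".
    return ["curl", "-F", f"file=@{jar_path}", endpoint]


if __name__ == "__main__":
    # Hypothetical jar location, for illustration only.
    command = build_driver_upload_command("/tmp/mysql-connector.jar")
    print(shlex.join(command))
```

    Passing the resulting list to subprocess.run(command, check=True) would execute the upload against a running server.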

    Once you've accomplished these two steps, you're ready to create the new data source.

    Click the Index tab, then the Sources subtab and the DB sub-subtab.


    As usual, choose and enter a recognizable name for the data source, and enter the JDBC

    URL, minus any authentication information. For example:

    jdbc:mysql://127.0.0.1/productDB

    Enter the JDBC driver name. This is the actual class name you would use in a Java

    application, such as com.mysql.jdbc.Driver. Enter the username and password for the

    database.

    Finally, enter the query used to extract the data from the database, mapping each column to

    a LucidWorks Enterprise field. For example:

    select id as id, prod_name as name, price_pt as price from products

    In all likelihood, you will have information about a single item, or document, in several

    tables; to add it to your index, create the appropriate joins to add all of your data as a single

    query.
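
    A join query of this kind can be tried out against an in-memory SQLite database before pointing LucidWorks Enterprise at the real one. In this sketch the products table and its prod_name and price_pt columns follow the example above; the categories table, the cat_id and cat_name columns, and the category field are hypothetical additions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read columns by their aliased names
conn.executescript("""
    CREATE TABLE categories (cat_id INTEGER PRIMARY KEY, cat_name TEXT);
    CREATE TABLE products (
        id INTEGER PRIMARY KEY, prod_name TEXT, price_pt REAL, cat_id INTEGER
    );
    INSERT INTO categories VALUES (1, 'Printers');
    INSERT INTO products VALUES (3, 'Dokad SPE 3299 Printer', 99, 1);
""")

# Each column is aliased to the index field it should populate, and the join
# pulls data from both tables into one row per document.
QUERY = """
    SELECT p.id AS id,
           p.prod_name AS name,
           p.price_pt AS price,
           c.cat_name AS category
    FROM products p
    JOIN categories c ON c.cat_id = p.cat_id
"""

rows = [dict(row) for row in conn.execute(QUERY)]

if __name__ == "__main__":
    for row in rows:
        print(row)
```

    Each resulting row corresponds to one document, with the column aliases serving as field names.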


    (In some cases, you will have structures that can't be added with a single query. See the

    Programmer's Guide to learn how to handle these situations using Solr's

    DataImportHandler.)

    Click Create to create the data source.

    Note that this process does not start the indexer; we'll look at that under Scheduling in a

    moment.

    Indexing Solr Documents

    One final method of indexing data involves adding it directly using Solr's native document

    format. To do that, you will need a Solr document, which is just an XML document, such as:

    prod3_0Auxiliary Data3productPrintersDokad SPE 3299 Printer99A great printer that doesn't use a lot of ink.Sed ut perspiciatis ...Dokad SPE 3299 Printer99A great printer that doesn't use a

    lot of ink.Sed ut ...

    prod4_0Auxiliary Data4productCamerasAccessoriesKinok UltraCam II550Boy, the Kinok UltraCam II is a great camera, and it hooks

    up to your printer terrifically.Lorem ipsum dolor sit amet, consectetur adipisicing

    elit, ...Kinok UltraCam II550Boy, the Kinok UltraCam II is a great

    camera, and it hooks up to your printer terrifically.Lorem ipsum dolor sit amet,...


    To index this type of document, click the Index tab, then the Sources subtab and the Solr

    sub-subtab.

    Enter a recognizable name, as well as the path to the actual document. Note that we've

    added a data_source field that matches what we've named the data source. This is because

    Solr documents don't automatically have this field populated, as the other types do, so if we

    want that information to show up in the search facets, we need to provide it ourselves.
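
    If your source data lives somewhere other than XML, a short script can generate a document in Solr's XML update format, in which an add element wraps one doc element per document and each doc holds field elements carrying a name attribute. In the sketch below only data_source is a field name taken from this paper; id, name, and price are hypothetical names that would need to match fields defined in your own index.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Serialize a list of dicts as a Solr <add> update message.

    A list value produces one <field> element per item (multi-valued field).
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for item in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(item)
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    xml_text = build_add_xml([
        {
            "id": "prod3_0",                  # hypothetical field name
            "data_source": "Auxiliary Data",  # must match the data source name
            "name": "Dokad SPE 3299 Printer", # hypothetical field name
            "price": 99,                      # hypothetical field name
        }
    ])
    print(xml_text)
```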

    Click Create to create the data source.

    Note that this process does not start the indexer; we'll look at that under Scheduling right

    now.

    Scheduling Tasks

    We're finally ready to look at scheduling the indexing of our data sources. Under the Index

    tab, click the Schedules subtab.


    Click the icon next to DataSources to expand that option, and click the data source you want

    to index to highlight it.

    You have the option to start indexing in a specific amount of time (such as 0 seconds to

    start immediately) or at a specific time on a specific day.

    Also, if your data is likely to change, you can specify the frequency with which you want to

    reindex it.

    You can also deactivate an index if you'd like to stop the indexing process. Deactivating an

    index won't make the data unavailable, however. To do that, you'll need to delete it

    altogether.

    Deleting Test Documents

    Deleting a data source is a pretty straightforward process. Click the Index tab, and then the

    Sources subtab. Under the list of data source types, you'll find a Delete button.


    To delete a data source, highlight it and press the Delete button. Note that there is no

    confirmation dialog. Once you click Delete, it's gone.

    Sort of. While it will no longer be updated, the data indexed as part of that data source is

    still in the index, and queries will still return it. To get rid of it altogether, you will need to

    call the underlying engine directly.

    To delete all the data in your index (and I really do mean all the data in your index), point

    your browser to the following URL:

    http://localhost:8888/solr/update?stream.body=<delete><query>*:*</query></delete>

    This process is the opposite of the Delete button; it gets rid of the data, but not the data sources.

    So you're ready to either reindex, or delete them and start over.
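
    Solr's XML update syntax expresses a delete-by-query as a delete element wrapping a query element, and the query *:* matches every document. Because angle brackets and asterisks are awkward to type into a URL by hand, the following sketch builds a properly encoded version of the request. The host, port, and path follow the URL above; depending on how commits are configured, a separate commit may be needed before the deletion becomes visible, which is an assumption about your installation rather than something this paper specifies.

```python
from urllib.parse import urlencode


def build_delete_all_url(base="http://localhost:8888/solr/update"):
    """Return an update URL whose stream.body deletes every document."""
    body = "<delete><query>*:*</query></delete>"
    # urlencode percent-escapes the angle brackets, colon, and asterisks.
    return f"{base}?{urlencode({'stream.body': body})}"


if __name__ == "__main__":
    print(build_delete_all_url())
```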

    Assuming that you haven't started over, we're ready to look at enhancing your users'

    search experience.


    Improving the Search Experience

    LucidWorks Enterprise is part of the rich ecosystem surrounding the Solr/Lucene technology on

    which it is built. That means that you have access to all of the best bells and whistles

    available with Solr, plus even more, right at your fingertips. In this section, we'll look at

    how to configure some of the most useful.

    User Alerts

    When you're dealing with huge amounts of data, one of the biggest challenges is keeping up

    with it as it grows. One way to do that is to use user alerts, which notify you when new

    content matching your queries has been added to the system.

    To set up a user alert for a query, click the Add this query as alert link under the search

    box.

    This link takes you to a page where you can specify the details of where you'd like to

    receive the alert.


    The alerts arrive at the specified email address using the Name of the alert as the subject

    line, so you can specify it in a way that works with your mail filters.

    You can also specify how often to check for new data, with the option to limit how often it

    actually sends you data.

    Now, all that said, by default, email alerts are not enabled when LucidWorks Enterprise

    ships, because they require administrator configuration. To enable them, edit the file

    /rails/config/alerts.yml

    to include the appropriate SMTP information.

    Even without additional configuration, however, you can still see the results of an alert. To

    do that, save the alert to add it to the list provided when you click the View saved alerts link

    under the search box.


    By clicking the Preview link for the alert, you can see the latest results for this particular

    query.

    Helping Users Create Their Queries

    Several of the functions available in LucidWorks Enterprise can help you help your users by

    providing guidance on what they should be searching for. These include auto-complete,

    spell-checking, and "find similar" links.

    Auto-complete

    One of the best ways to make sure that users don't wind up with no results is to guide

    them towards terms that actually exist within the index. And one of the best ways to do

    that is using auto-complete functionality.

    Auto-complete looks at the characters the user has already entered and offers terms that

    start with those characters.
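
    The idea of prefix-based completion can be illustrated with a sorted term list and a binary search. This is a conceptual sketch only; the term list is made up, and it is not a description of how LucidWorks Enterprise implements auto-complete internally.

```python
from bisect import bisect_left

# Made-up index terms, kept sorted so prefix matches are contiguous.
TERMS = sorted(["camera", "cameras", "ink", "printer", "printers", "printing"])


def complete(prefix, sorted_terms=TERMS, limit=5):
    """Return up to limit terms that start with prefix."""
    prefix = prefix.lower()
    start = bisect_left(sorted_terms, prefix)  # first term >= prefix
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # past the contiguous block of matching terms
        matches.append(term)
        if len(matches) == limit:
            break
    return matches


if __name__ == "__main__":
    print(complete("print"))
    print(complete("cam", limit=1))
```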


    Spell-checking

    You can also check the spelling of the terms the user has entered against the existing index

    of terms, and offer suggestions. For example, if you were to search for "printe", the system

    might suggest "printer" based on the content you have indexed.
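
    The general idea of suggesting indexed terms that resemble a misspelled one can be sketched with Python's standard difflib module. Note that difflib's similarity ratio is a stand-in chosen for brevity, not the measure the underlying engine uses, and the term list is hypothetical.

```python
import difflib

# Hypothetical terms taken from an index.
INDEXED_TERMS = ["camera", "ink", "printer"]


def suggest(term, indexed_terms=INDEXED_TERMS, max_suggestions=3):
    """Suggest indexed terms that resemble term; none if it is already indexed."""
    if term in indexed_terms:
        return []
    return difflib.get_close_matches(
        term, indexed_terms, n=max_suggestions, cutoff=0.6
    )


if __name__ == "__main__":
    print(suggest("printe"))
```

    Because suggestions are drawn only from terms that exist in the index, any suggestion offered is one that will return results.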

    Find Similar links

    The "find similar" functionality helps users by finding content they may not have known

    they were looking for. For example, a search for "sports" might find a document that

    contains the word "basketball", even if the document doesn't include the word "sports" at all.

    If a similar result is available, the link appears under the existing result.


    In order for these three functions to work, you'll need to make sure of three things:

    1) Make sure auto-complete and/or spell checking are enabled.

    2) Make sure that at least one field is specified as a source for these terms.

    3) For spell-checking and auto-complete, you'll need to make sure that indexing for

    these terms has been performed.

    Enabling These Functions

    To enable auto-complete, spell-checking, or find similar, click the Queries tab and the

    Settings subtab. Click the Search Settings item to highlight it.


    Make sure that Enable auto-complete, Enable spell checking, and/or Show find similar

    links are checked, and click Save Settings.

    Specifying Fields

    To specify one or more fields for these functions, click the Index tab and the Fields subtab.

    Highlight the relevant field.


    Make sure that Index for Spell Checking, Index for Auto-complete, and/or Use in Find

    Similar are checked, and click Save Settings.

    Indexing

    Finally, make sure that the spell-checking and/or auto-complete information has been

    indexed. To do that, click the Index tab and the Schedules subtab. Click the icon next to

    Activities to expand it and highlight spelling or auto-complete. Schedule these indexes just

    as you would schedule your data sources.

    Now that we've got good queries, it's time to make sure they return good results.

    Improving Relevance

    One of the advantages of using a search platform such as LucidWorks Enterprise is that

    results can be ranked by relevance, with the results most likely to be what the user is

    looking for at the top of the list. Out of the box, LucidWorks Enterprise does a pretty good


    job, but there are ways that you can help LWE improve relevancy even more by including

    input from the best computer out there: the human brain.

    Synonyms

    One way to provide better results is to provide LucidWorks Enterprise with groupings of

    words that have the same (or at least similar) meanings. For example, a search for "lawyer"

    should probably also find documents that only contain "attorney". Most industries and

    subject areas have their own set of jargon and synonyms, and you can configure them

    directly from within the administration user interface.

    Click the Queries tab, and then the Settings subtab. Highlight Synonyms and Stopwords,

    and then expand the Synonyms entry.

    From here, you can add new entries or remove existing entries. Each line is considered a

    group; you can add as many comma-delimited terms as you like.
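
    The line-per-group, comma-delimited layout is easy to reason about with a small sketch. The parsing and expansion functions below are illustrative only and are not the product's implementation; the second synonym group is a made-up example.

```python
SYNONYM_TEXT = """\
lawyer, attorney
tv, television, telly
"""


def parse_synonym_groups(text):
    """Turn one comma-delimited group per line into a list of term sets."""
    groups = []
    for line in text.splitlines():
        terms = {t.strip().lower() for t in line.split(",") if t.strip()}
        if terms:
            groups.append(terms)
    return groups


def expand(term, groups):
    """Return term plus every term that shares a synonym group with it."""
    term = term.lower()
    expanded = {term}
    for group in groups:
        if term in group:
            expanded |= group
    return expanded


if __name__ == "__main__":
    groups = parse_synonym_groups(SYNONYM_TEXT)
    print(sorted(expand("lawyer", groups)))
```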


    Click Save Settings when you're finished.

    Stopwords

    In search parlance, a stopword is a word that's so common that adding it to a query rarely

    increases the quality of results, and frequently decreases it. For example, if you did a

    search for "the City of Chicago", "Chicago" would certainly provide good results. "City"

    might as well. But how many billions of documents that have nothing to do with Chicago

    contain the words "the" and "of"?

    Fortunately, LucidWorks Enterprise understands the concept of stopwords, and in most

    cases, will eliminate them from your query. It also understands how to handle stopwords

    on the back end so that they help improve relevance (for example, by judging the proximity

    of two words) rather than hinder it.
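
    Query-side stopword removal amounts to filtering the query terms against a list. The sketch below shows only that simple case, with a tiny hand-picked list; it makes no attempt to model the proximity handling mentioned above.

```python
# A tiny illustrative list; a real stopword list holds several dozen entries.
STOPWORDS = {"a", "and", "for", "of", "the"}


def strip_stopwords(query, stopwords=STOPWORDS):
    """Return the lowercased query terms that are not stopwords."""
    return [word for word in query.lower().split() if word not in stopwords]


if __name__ == "__main__":
    print(strip_stopwords("the City of Chicago"))
```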

    LucidWorks Enterprise starts with a list of several dozen stopwords, such as "a", "and",

    "for", and so on. You may find, however, that you need to add your own. To do that, click

    the Queries tab and the Settings subtab. Highlight Synonyms and Stopwords, and expand

    the Stopwords entry.


    As with synonyms, you can use this interface to add, edit, or delete stopwords.

    Click Scoring

    Perhaps the best way for LucidWorks Enterprise to know whether a result is really

    relevant for a particular search query is to keep track of whether a human thinks it is. Click

    scoring makes that happen.

    When you enable click scoring, LucidWorks Enterprise tracks which results are most often

    clicked for a particular query, and boosts their relevance scores accordingly. It will then

    be more likely to present those results higher in the list for that query.

    Using click scoring requires manual configuration of LucidWorks Enterprise. For

    information on how to set it up, search for Click Scoring Relevance Framework in the

    LucidWorks Enterprise documentation.
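
    The principle behind click scoring can be sketched as a per-query click counter that scales a document's base score. Everything specific here is an assumption made for illustration: the ClickScorer class, the logarithmic boost formula, and the sample scores are hypothetical and do not describe the actual Click Scoring Relevance Framework.

```python
import math
from collections import Counter


class ClickScorer:
    """Toy model: boost a result's score by how often it was clicked for a query."""

    def __init__(self):
        self.clicks = Counter()

    def record_click(self, query, doc_id):
        self.clicks[(query.lower(), doc_id)] += 1

    def boosted(self, query, doc_id, base_score):
        count = self.clicks[(query.lower(), doc_id)]
        # Hypothetical formula: logarithmic boost, so early clicks matter most.
        return base_score * (1.0 + math.log1p(count))

    def rerank(self, query, scored_docs):
        """Sort (doc_id, base_score) pairs by boosted score, highest first."""
        return sorted(
            scored_docs,
            key=lambda pair: self.boosted(query, pair[0], pair[1]),
            reverse=True,
        )


if __name__ == "__main__":
    scorer = ClickScorer()
    for _ in range(3):
        scorer.record_click("printer", "doc-b")
    print(scorer.rerank("printer", [("doc-a", 1.0), ("doc-b", 0.9)]))
```

    The logarithm keeps a handful of heavily clicked results from swamping the base relevance score entirely, which is one reasonable design choice among several.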


    Summary

    By providing a search development platform with a fast, flexible architecture built on open

    source, LucidWorks Enterprise harnesses the power of Solr/Lucene in a convenient, well-

    curated package, while sparing you the programming pain that would otherwise be

    required to get a basic system up and running.

    In this document, we showed how to install a single-server instance of LucidWorks

    Enterprise, and how to index local, HTTP, and database content. We also looked at the

    basic concepts involved in performing search queries.

    We then covered some of the bells and whistles that are available to make your life, and the

    lives of your users, easier, and how to configure them.

    You should now have a fully-functioning search platform, ready for data and customization.

    Next Steps

    For more information on how Lucid Imagination can help search application developers,

    employees, customers, and partners find the information they need, please visit

    www.lucidimagination.com to access blog posts, articles, and reviews of dozens of

    successful implementations.

    Please e-mail specific questions to:

    Support and Service:[email protected]

    Sales and Commercial:[email protected]

    Consulting:[email protected]

    Or call: 1.650.353.4057
