Using Lucene for Search within XIS
Uploaded by accessinnovations
XIS Lucene Indexing and Search
What is XIS?
  XIS is an XML schema-based database system used to store user data
  All records are stored in individual XML files
  An option to zip the XML files is available through the XIS Project DTD
How XIS Data Is Stored
  Docsets
    Store records with multiple fields (similar to a SQL table)
    Can also have subfields and lists of field values nested within a record
    Can look up values from other fields in other Docsets or other Tables
  Tables
    Store a single list of values
    Can be referenced by other Docsets
    Can be directly accessible for editing or kept hidden from user view
How to Create a XIS Project
  Create DTD file for XIS project
    Specify MAI Thesaurus to link to project
    Create Docset and Tables
    Specify ID lengths for each Docset
    Create fields for Docsets
  Save DTD to dhserver/projects/projects/xml folder
  Create XIS Project folder under dhserver/data
  Create subfolders for each Docset under XIS Project folder as well as Tables directory
  XIS Projects can only be created by administrators
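The folder layout from the steps above can be sketched in Python. This is only an illustration, not a Data Harmony tool. The project name `MyProject`, the Docset names, and the exact spelling of the `Tables` folder are assumptions. Only the `dhserver/projects/projects/xml` and `dhserver/data` locations come from the slides.

```python
from pathlib import Path


def create_project_layout(dhserver_root, project, docsets):
    """Create the folders an XIS project needs under a Data Harmony install.

    The project and Docset names passed in are hypothetical examples; only
    the relative locations (projects/projects/xml and data) are from the slides.
    """
    root = Path(dhserver_root)

    # Folder where the project DTD is saved.
    dtd_dir = root / "projects" / "projects" / "xml"
    dtd_dir.mkdir(parents=True, exist_ok=True)

    # Project folder under dhserver/data.
    project_dir = root / "data" / project

    # One subfolder per Docset, plus a Tables directory (name assumed).
    for name in [*docsets, "Tables"]:
        (project_dir / name).mkdir(parents=True, exist_ok=True)

    return dtd_dir, project_dir


if __name__ == "__main__":
    import tempfile

    # Build the layout in a throwaway directory so nothing real is touched.
    with tempfile.TemporaryDirectory() as tmp:
        dtd_dir, project_dir = create_project_layout(
            Path(tmp) / "dhserver", "MyProject", ["Articles", "Authors"]
        )
        print(sorted(p.name for p in project_dir.iterdir()))
```

The DTD itself still has to be written by hand or by the XIS tooling, and the slides note that only administrators can create projects, so a script like this would need the matching file permissions.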
Starting a XIS Project
  Start Data Harmony server where project is located
  Log in to Admin module
    Start MAI Thesaurus
    Start XIS Project
    Index XIS Project, especially if just created
  Run startXis program
    Enter server, port, thesaurus, username, and password to log in
Indexing a XIS Project
XIS Login Screen
XIS Project View
XIS Docset View
XIS Table View
XIS Record Format
  Saved in XML file
  Starts with tag to represent Docset name along with ID as attribute
  Fields are listed within Docset tag along with values
  Subfields are nested within their parent fields
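As a rough illustration of the record format just described, the Python sketch below parses a made-up record with the standard library. The Docset name `Articles`, the field names, and the attribute name `ID` are assumptions invented for the example. The slides only state that the root tag names the Docset, the ID is an attribute, and subfields nest inside their parent fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: tag, field, and attribute names are invented.
RECORD_XML = """\
<Articles ID="000123">
  <Title>Using Lucene for Search</Title>
  <Author>
    <Name>Jane Doe</Name>
    <Affiliation>Example Corp</Affiliation>
  </Author>
  <Keyword>indexing</Keyword>
  <Keyword>search</Keyword>
</Articles>
"""


def flatten(element, prefix=""):
    """Yield (dotted field path, text) pairs for every leaf field."""
    for child in element:
        path = f"{prefix}{child.tag}"
        if len(child):  # the field has subfields, so recurse into them
            yield from flatten(child, prefix=path + ".")
        else:
            yield path, (child.text or "").strip()


def read_record(xml_text):
    """Return the Docset name, record ID, and flattened fields of a record."""
    root = ET.fromstring(xml_text)
    return {
        "docset": root.tag,      # the root tag names the Docset
        "id": root.get("ID"),    # the record ID is stored as an attribute
        "fields": list(flatten(root)),
    }


if __name__ == "__main__":
    record = read_record(RECORD_XML)
    print(record["docset"], record["id"])
    for path, value in record["fields"]:
        print(f"{path} = {value}")
```

Flattening subfields into dotted paths such as `Author.Name` is one possible way to give each nested value its own searchable field name. The slides do not say how XIS names subfields in its indexes.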
XIS Search View
XIS Search Results
Current XIS Indexing and Search
  Uses text-based indexes
  Creates a large number of index files (one for each field)
  Generates temporary files for results
  Uses less reliable RandomAccessFile search
  Has a limited number of search operands
  Does not take numerical values into account
Lucene vs. Current XIS Index
  Fewer index files needed
  Allows for broader searches
    Fuzzy matching
    Start and end wildcard searches
  Recognizes numerical and date fields as such
  Can be utilized to remove stopwords
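To make the two broader search types concrete, here is a small standard-library Python illustration. It does not use Lucene and does not claim to mirror Lucene's internals. It only shows what fuzzy (edit-distance) matching and start/end wildcard matching mean when applied to a list of terms. The term list and function names are made up for the example.

```python
from fnmatch import fnmatchcase


def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def fuzzy_match(query, terms, max_edits=2):
    """Return the terms within max_edits edits of the query."""
    return [t for t in terms if edit_distance(query, t) <= max_edits]


def wildcard_match(pattern, terms):
    """Return the terms matching a pattern with * and ? wildcards.

    fnmatchcase also treats [ ] as a character class, which is fine
    for this illustration.
    """
    return [t for t in terms if fnmatchcase(t, pattern)]


# Hypothetical indexed terms.
TERMS = ["thesaurus", "thesauri", "taxonomy", "indexing", "index"]

if __name__ == "__main__":
    print(fuzzy_match("thesarus", TERMS))    # misspelled query
    print(wildcard_match("*index*", TERMS))  # wildcard at start and end
    print(wildcard_match("thes*", TERMS))    # wildcard at end only
```

A real search index avoids scanning every term this way, but the matching rules a user sees are the same idea: a misspelled query still finds nearby terms, and a wildcard can sit at either end of a pattern.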
New Lucene Search Process
  Establish index reader to perform search
  Submit query string containing fields and parameters
  Return results
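The "query string containing fields and parameters" step can be illustrated by composing a string in Lucene's classic query parser syntax. The Python sketch below only builds the text. In a Lucene application the string would then be handed to a query parser and run by a searcher opened over the index reader. The field names `title`, `keyword`, and `year` and all helper names are hypothetical, not taken from XIS.

```python
# Characters with special meaning in Lucene's classic query syntax.
SPECIAL = set('+-&|!(){}[]^"~*?:\\/')


def escape(text):
    """Backslash-escape query-syntax characters in a literal value."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)


def term(field, value, fuzzy=False):
    """Build field:value; multi-word values become a quoted phrase."""
    literal = escape(value)
    if " " in value:
        return f'{field}:"{literal}"'
    # A trailing ~ asks for a fuzzy match on a single term.
    return f"{field}:{literal}~" if fuzzy else f"{field}:{literal}"


def wildcard(field, pattern):
    """Build field:pattern, keeping * and ? as wildcards."""
    kept = "".join(ch if ch in "*?" else escape(ch) for ch in pattern)
    return f"{field}:{kept}"


def value_range(field, start, end):
    """Build an inclusive range clause such as year:[2010 TO 2015]."""
    return f"{field}:[{start} TO {end}]"


def all_of(*clauses):
    """Require every clause to match."""
    return " AND ".join(clauses)


if __name__ == "__main__":
    query = all_of(
        term("title", "lucene", fuzzy=True),
        wildcard("keyword", "index*"),
        value_range("year", 2010, 2015),
    )
    print(query)  # title:lucene~ AND keyword:index* AND year:[2010 TO 2015]
```

Two caveats apply when such a string reaches Lucene. The classic parser rejects a pattern that starts with `*` or `?` unless leading wildcards are enabled on the parser (`setAllowLeadingWildcard`). How a range over numbers or dates is evaluated also depends on how those fields were indexed.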
Other Lucene Functions
  Will be used for adding, updating, and deleting XIS records
  Indexes will be housed on Data Harmony server
Any Questions?