Transactional Lucene

  • 7/27/2019 Transactional Lucene

    enabling users to choose which catalog to search.

    Repeatable indexing tests from the same initial index: maybe you want to run a bunch of performance tests,
    perhaps trying different RAM buffer sizes or merge factors, starting from a large initial index. To do this, simply run
    each test, but at the end, instead of closing the IndexWriter, use the rollback method to quickly return the index
    to its initial state, ready for the next test.

    Force all index segments to be merged down to a single segment, but also keep the prior multi-segment commit.
    Then you can do tests to compare multi-segment vs single-segment performance.

    Indexing and searching over the NFS file system: because NFS does not protect still-open files from deletion, you
    must use an IndexDeletionPolicy to keep each commit around until all open readers have finished with the
    commit (i.e., reopened to a newer commit). The simple approach is time-based, for example: don't delete the commit
    until it is 15 minutes old, and then always reopen your readers every 5 minutes. Without this you'll hit all sorts of
    scary exceptions when searching over NFS.
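
The time-based rule above can be sketched in plain Java. This is a simplified model, not Lucene's IndexDeletionPolicy interface: the Commit and ExpirationPolicy classes, the deletable method, and the millisecond timestamps are all assumptions made for illustration. It marks a commit as deletable once it is older than the configured age, and it never marks the newest commit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified model of a time-based expiration rule; not the Lucene API. */
class ExpirationPolicy {

    /** Hypothetical stand-in for a commit point: a name plus its creation time. */
    static final class Commit {
        final String name;
        final long createdAtMillis;

        Commit(String name, long createdAtMillis) {
            this.name = name;
            this.createdAtMillis = createdAtMillis;
        }
    }

    private final long maxAgeMillis;

    ExpirationPolicy(long maxAgeMillis) {
        this.maxAgeMillis = maxAgeMillis;
    }

    /**
     * Returns the commits that may be deleted at time nowMillis.
     * The list must be ordered oldest first; the newest commit is always kept.
     */
    List<Commit> deletable(List<Commit> commits, long nowMillis) {
        List<Commit> result = new ArrayList<>();
        for (int i = 0; i < commits.size() - 1; i++) {
            Commit c = commits.get(i);
            if (nowMillis - c.createdAtMillis > maxAgeMillis) {
                result.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        ExpirationPolicy policy = new ExpirationPolicy(15 * minute);
        List<Commit> commits = Arrays.asList(
            new Commit("commit-1", 0),
            new Commit("commit-2", 10 * minute),
            new Commit("commit-3", 18 * minute));
        // Evaluate the rule 20 minutes after the first commit was made.
        for (Commit c : policy.deletable(commits, 20 * minute)) {
            System.out.println("may delete " + c.name);
        }
    }
}
```

In a real deployment the same decision would live inside an IndexDeletionPolicy implementation, with the age limit chosen comfortably larger than the interval at which readers reopen.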

    Distributed commit: if you have other resources that must commit atomically along with the changes to your Lucene
    index, you can use the two-phased commit API. This is simple, but vulnerable to failures during the 2nd phase; to
    also recover from such cases, for example if Lucene completed its 2nd phase commit but the database's 2nd phase
    hit some error or crash or power loss, you can easily roll back Lucene's commit by opening an IndexWriter on the
    prior commit.
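
The coordination logic described above can be sketched without any Lucene or database classes. Everything in this snippet is hypothetical: the Participant interface, its four method names, the commitAll helper, and the toy InMemoryStore are assumptions chosen for illustration. In a real system the index participant would presumably delegate to the writer's prepare, commit, and rollback calls, and would revert by reopening a writer on the prior commit as the text describes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy two-phase commit coordinator; all names here are hypothetical. */
class TwoPhaseCommitSketch {

    interface Participant {
        void prepare() throws Exception;  // phase 1: do the risky work, stay reversible
        void commit() throws Exception;   // phase 2: make the changes visible
        void rollback();                  // discard changes that were never committed
        void revertToPriorCommit();       // undo a commit that already completed
    }

    /** Returns true if every participant committed, false if everything was undone. */
    static boolean commitAll(List<? extends Participant> participants) {
        try {
            for (Participant p : participants) {
                p.prepare();
            }
        } catch (Exception e) {
            // Phase 1 failed: nothing is visible yet, so discard all pending changes.
            for (Participant p : participants) {
                p.rollback();
            }
            return false;
        }
        List<Participant> committed = new ArrayList<>();
        try {
            for (Participant p : participants) {
                p.commit();
                committed.add(p);
            }
            return true;
        } catch (Exception e) {
            // Phase 2 failed part-way: revert finished commits, discard the rest.
            for (Participant p : participants) {
                if (committed.contains(p)) {
                    p.revertToPriorCommit();
                } else {
                    p.rollback();
                }
            }
            return false;
        }
    }

    /** In-memory participant that keeps its prior committed value around. */
    static final class InMemoryStore implements Participant {
        private final boolean failOnCommit;
        private String prior;
        private String committed;
        private String pending;

        InMemoryStore(String initial, boolean failOnCommit) {
            this.prior = initial;
            this.committed = initial;
            this.failOnCommit = failOnCommit;
        }

        void stage(String value) { pending = value; }
        String visible() { return committed; }

        public void prepare() { /* nothing to flush in this toy store */ }
        public void commit() throws Exception {
            if (failOnCommit) throw new Exception("simulated failure in phase 2");
            prior = committed;
            committed = pending;
            pending = null;
        }
        public void rollback() { pending = null; }
        public void revertToPriorCommit() { committed = prior; }
    }

    public static void main(String[] args) {
        InMemoryStore index = new InMemoryStore("v1", false);
        InMemoryStore database = new InMemoryStore("v1", true);
        index.stage("v2");
        database.stage("v2");
        boolean ok = commitAll(Arrays.asList(index, database));
        System.out.println("committed=" + ok + ", index sees " + index.visible());
    }
}
```

The sketch only works because the store keeps its prior committed value after a successful commit, which mirrors the requirement to keep the prior Lucene commit alive until every resource has finished its second phase.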

    Experimental index changes: maybe you want to try re-indexing some subset of your index in a new way, but you're
    not sure it'll work out. In this case, just keep the old commit around, and then roll back if it didn't work out, or
    delete the old commit if it did.

    Time-based snapshots: maybe you'd like the freedom to roll back to your index as it existed 1 day ago, 1 week ago,
    1 month ago, etc., so you preserve commits based on their age.
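
One possible age-based selection rule is sketched below in plain Java. It is an assumption about how such a scheme could be built, not something taken from Lucene: the SnapshotRetention class, the commitsToKeep method, and the use of bare timestamps to stand for commits are all hypothetical. The rule keeps the newest commit and, for each target age, the most recent commit that is at least that old.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical age-based snapshot selection; commits are represented by their timestamps. */
class SnapshotRetention {

    /**
     * Chooses which commits to preserve.
     * commitTimes, now, and targetAges must all use the same time unit.
     */
    static Set<Long> commitsToKeep(List<Long> commitTimes, long now, long[] targetAges) {
        Set<Long> keep = new TreeSet<>();
        if (commitTimes.isEmpty()) {
            return keep;
        }
        // The newest commit is always preserved.
        keep.add(Collections.max(commitTimes));
        for (long age : targetAges) {
            long cutoff = now - age;
            Long best = null;
            // Find the most recent commit that is at least `age` old.
            for (long t : commitTimes) {
                if (t <= cutoff && (best == null || t > best)) {
                    best = t;
                }
            }
            if (best != null) {
                keep.add(best);
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        // Times are in hours for readability: 24 = 1 day, 168 = 1 week.
        List<Long> commitTimes = Arrays.asList(0L, 100L, 500L, 690L, 700L, 719L);
        Set<Long> keep = commitsToKeep(commitTimes, 720L, new long[] {24L, 168L});
        System.out.println("keep commits made at hours " + keep);
    }
}
```

Any commit not in the returned set would be a candidate for deletion the next time the deletion policy runs.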

    Remember that keeping more than one commit alive will necessarily consume additional disk space; however, the
    overhead is often small since the multiple commits will usually share common segments, especially the larger, older ones.
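
A small worked example shows why the overhead stays modest. The snippet is a toy model with made-up segment names and sizes (in MB); the CommitDiskUsage class and bytesOnDisk method are hypothetical. A segment referenced by several commits is stored once, so disk usage is the total size of the union of segments rather than the sum over commits.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model: disk usage when several commits share segments. */
class CommitDiskUsage {

    /** Sums the size of every distinct segment referenced by any of the commits. */
    static long bytesOnDisk(Collection<? extends Set<String>> commits,
                            Map<String, Long> segmentSizes) {
        Set<String> live = new HashSet<>();
        for (Set<String> commit : commits) {
            live.addAll(commit);
        }
        long total = 0;
        for (String segment : live) {
            // Assumes every referenced segment has a known size.
            total += segmentSizes.get(segment);
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new HashMap<>();
        sizes.put("_0", 900L); // large, old segment shared by both commits
        sizes.put("_1", 50L);
        sizes.put("_2", 20L);
        sizes.put("_3", 70L);  // result of merging _1 and _2

        Set<String> older = new HashSet<>(Arrays.asList("_0", "_1", "_2"));
        Set<String> newer = new HashSet<>(Arrays.asList("_0", "_3"));

        long newestOnly = bytesOnDisk(Arrays.asList(newer), sizes);
        long both = bytesOnDisk(Arrays.asList(older, newer), sizes);
        System.out.println("extra MB for keeping the older commit: " + (both - newestOnly));
    }
}
```

In this made-up case the older commit costs only the two small segments that were merged away, because the large segment is shared by both commits.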

    Transactional Lucene - Blog - SearchWorkings.org http://www.searchworkings.org/blog/-/blogs/transactional-lucene?

    3/26/2012