Inline Tagging and Dictionary Connection

Posted on 11-May-2015


description

Gena M. San Nicholas, a taxonomist and biology subject-matter expert (SME) at Access Innovations, Inc., shows how Data Harmony's machine-aided indexing (M.A.I.) module produces tagged subject terms within bodies of text for XML and other repositories. This aids in search and leverages subject metadata, resulting in added value to data collections.

Transcript of Inline Tagging and Dictionary Connection

Inline Tagging

GENA SAN NICOLAS

EDITOR/TAXONOMIST

Introduction

What’s the big deal about Data Harmony, anyway?

My background: biology

Searching through science databases was tedious and laborious

Frequently, the only way to tell if an article was what you wanted was to actually read the whole thing

Costly if your institution didn’t have access rights to that particular publication

Data Harmony allows the user to “browse the book”

Rulebase allows editors to assign context to full-text and disambiguate terms

Indexing terms are XML-tagged by Data Harmony in the document

Rulebase is auto-generated but is easily edited

“Easily edited”—easy for an experienced editor

Test MAI

Look at indexing results

Compare rule to trigger words in full-text test

Tweak rule as necessary
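The test-and-tweak loop above can be sketched in a few lines of Python. This is only a toy illustration: the `RULES` structure, the `index_text` function, and the sample sentences are all hypothetical and do not reflect Data Harmony's actual rule syntax or the M.A.I. engine. The sketch assumes a rule is a trigger word plus a set of context words that decide which subject term applies.

```python
import re

# Hypothetical toy rulebase (NOT Data Harmony's rule syntax):
# trigger word -> list of (context words, subject term to assign).
RULES = {
    "mercury": [
        ({"orbit", "planet", "solar"}, "Mercury (planet)"),
        ({"toxicity", "poisoning", "metal"}, "Mercury (element)"),
    ],
}


def index_text(text, rules):
    """Return the sorted subject terms whose trigger word and at least
    one context word both occur in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    terms = set()
    for trigger, candidates in rules.items():
        if trigger not in words:
            continue
        for context, term in candidates:
            if words & context:  # any context word present?
                terms.add(term)
    return sorted(terms)


# Step 1: test the rule against sample text and look at the results.
missed = "Mercury levels in tuna are rising."
print(index_text(missed, RULES))  # nothing assigned: no context word matched

# Step 2: compare the rule with the words in the text, then tweak it.
RULES["mercury"][1][0].add("tuna")
print(index_text(missed, RULES))  # the element sense is now assigned
```

The point of the sketch is the workflow rather than the matching logic: run the indexer, inspect what came back, compare the rule with the words that actually appear in the text, and adjust the rule.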

We’ve made it easy for you!

But wait!!! There’s more!!!

With Inline Tagging, we make it EVEN EASIER for you!!!

What is Inline Tagging?

From the DH-Inline tagging documentation: “Access Innovations’ Inline Tagging function finds and labels thesaurus concepts (identified by rules stored within the thesaurus rule base) within the full text of an article (in XML or PDF format) by applying XML wrappers, or “tags”. The process of adding XML tags within content is called “inline tagging.” Thanks to the XML format, metadata can be included within content files, not just in a set-aside area at the beginning or end of the file, but woven into the very text.”

This lets users truly “browse the book” according to their content management needs.
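As a rough illustration of what "wrapping concepts in XML tags within the text" means, here is a minimal Python sketch. It is an assumption-laden stand-in: the function `tag_inline`, the element name `term`, and the plain string matching are all hypothetical. The real Inline Tagging function identifies concepts through the thesaurus rule base, not through literal term lookup as done here.

```python
import re
from xml.sax.saxutils import escape


def tag_inline(text, terms, tag="term"):
    """Wrap each occurrence of a known term in an XML element.

    Hypothetical sketch: matches literal terms, case-insensitively,
    and escapes the surrounding text so the result stays valid XML content.
    """
    if not terms:
        return escape(text)
    # Try longer terms first so "stem cell" wins over "cell".
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    out = []
    pos = 0
    for match in pattern.finditer(text):
        out.append(escape(text[pos:match.start()]))  # untagged text before the hit
        out.append(f"<{tag}>{escape(match.group(0))}</{tag}>")
        pos = match.end()
    out.append(escape(text[pos:]))  # remaining untagged text
    return "".join(out)


print(tag_inline("Stem cell research & cell biology", ["cell", "stem cell"]))
# <term>Stem cell</term> research &amp; <term>cell</term> biology
```

Because the tags sit inside the running text rather than in a separate metadata block, a downstream renderer can highlight or link each tagged concept where it occurs.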

We take this MAIstro output:

…and turn it into this:

HTML output format is completely customizable

MAIstro Inline Tagging Web Service

To facilitate integration of Data Harmony's MAIstro suite with a publishing pipeline or other workflow, a simple web service that performs automatic indexing can be installed. This web service is an abstraction of the Java APIs that MAIstro uses.

The web service has two functions:

TestSettings: For configuration and debugging

GetTerms: Calls MAIstro's GetTerms API and returns a formatted document with the subject terms tagged inline with XML tags
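To show how a publishing pipeline might address these two functions, here is a hedged Python sketch that only builds the HTTP requests and does not send them. Only the function names TestSettings and GetTerms come from the slides. The base URL, the path layout, the `text` form parameter, and the use of GET versus POST are all assumptions made for illustration; the actual endpoint and parameters depend on how the web service is installed and documented.

```python
import urllib.parse
import urllib.request

# Hypothetical base URL; the real location depends on the installation.
BASE_URL = "http://localhost:8080/maistro"


def build_request(function, text=None, base_url=BASE_URL):
    """Build (but do not send) a request for one of the two functions.

    Assumed convention, not confirmed by the source: the function name
    is the last path segment and the document is sent as a form field
    named "text".
    """
    if function not in ("TestSettings", "GetTerms"):
        raise ValueError(f"unknown function: {function}")
    url = f"{base_url}/{function}"
    data = None
    if text is not None:
        data = urllib.parse.urlencode({"text": text}).encode("utf-8")
    # urllib uses POST when data is supplied, GET otherwise.
    return urllib.request.Request(url, data=data)


check = build_request("TestSettings")
index = build_request("GetTerms", "mercury in fish")
print(check.get_method(), check.full_url)
print(index.get_method(), index.full_url)
```

In a real pipeline the request would be passed to `urllib.request.urlopen` and the tagged document read from the response; that step is left out because the response format is not described in the slides.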