Word Template - help.docuphase.com Advanced Use… · Web viewClient Support Services Contact...

Xtractor Designer Advanced User Manual Version 6.1 DocuPhase Corporation


Xtractor Designer Advanced User Manual

Version 6.1

DocuPhase Corporation
1499 Gulf to Bay Boulevard, Clearwater, FL 33755
Tel: (727) 441-8228
Fax: (727) 444-4419
Email: [email protected]


Web: www.DocuPhase.com

Copyright © 2000-2017, DocuPhase Corporation. All rights reserved.

No part of the contents of this publication may be reproduced, transmitted, transcribed, stored in a retrieval system, or translated into any language in any form without written consent from DocuPhase Corporation.

This software product, including the manual and media, is copyrighted and contains proprietary information that is subject to change without notice. The software may be used or copied only in accordance with the terms of the license agreement.

DocuPhase is a registered trademark of DocuPhase Corporation. All other trademarks are acknowledged as the exclusive property of their respective owners.

Version 6.1 -- 5.14.2023

Xtractor Advanced User Manual 6.1 Page 2 of 68


Table of Contents

Introduction
    Welcome to Xtractor!
    Purpose and Assumptions
    Foundational Terminology
    Client Support Services Contact Information
Xtractor Product Overview
    Data Extraction
    OCR
    Barcodes
    Zone Recognition
    Subpage Processing
    Recognition Zones Defined
    Cover Page Handling Options
Xtractor Designer Interface
    Xtractor Main Menu
    Xtractor Toolbar
    Xtractor Workspace Viewer
    Xtractor Image Viewer
Xtractor Administration
    Workspace Hierarchy
    Workspace Definition Configuration
        Workspace Properties Configuration
        Project Properties Configuration
    Template and Image Processing Properties
        Template Processing Properties
        Image Processing Properties
        Recognition Template
        Selection Template
        Routing Template
    Zone Properties
        Date Properties
        Barcode Zone Properties
            Barcode Zone Definitions
            Document Separator Tab
            Modifications Tab
        OCR Machine Zone Properties
            OCR Machine Definitions
            OCR Character Set
            Document Separator Tab
            Modifications Tab
            Validation Tab
Working with Xtractor
    Creating an Image Template
        Open the sample TIFF image using Xtractor
        Defining Recognition Zones
        Testing the recognition zones
        Saving the image template
    Using Xtractor to Process DocuPhase Documents
Appendix A: Xtractor Installation
    Prerequisites & Minimum Requirements
    Xtractor Login
Appendix B – Routing Template
Appendix C – Split then Index Template
Appendix D – Index then Split Template
Appendix E – Selection Template
Appendix F: Routing Documents to Xtractor


Introduction

Welcome to Xtractor!

Xtractor is the Barcode and OCR recognition engine of the DocuPhase content management product suite. It extracts and applies data from TIFF images and imaged PDF files produced by scanning paper documents. Xtractor has an interactive designer component that is used to establish zones within the pages of TIFF documents where OCR/Barcode technology can be applied to read the essential information used to identify document types and index fields of the document being processed. Xtractor can also be used to process PDF files produced by a paper-document scanner.

Xtractor’s document capture features include image enhancement controls, correction tools and color image support to assure the quality of scanned documents and to improve the accuracy of returned results. The enhancement functions also minimize the file size of scanned documents, which increases the number of documents that can be saved in the available DocuPhase Repository storage.

The Xtractor Service runs in the background, behind the scenes, to process TIFF documents by applying the Xtractor definitions and OCR/Barcode technology. Xtractor can process single-page and multi-page TIFF documents, and it can split compound TIFF documents into multiple separate documents, each with its own set of index values. Xtractor can also process certain types of Portable Document Format (PDF) files that are produced by scanners.

Although the PDF file format is covered by standards, there are two types of PDF files:

Image-Based PDF Files – PDF files that are produced by scanning paper documents to create a document file composed of image pages. They may also be single-page or multiple-page TIFF/PDFs.

Text-Based PDF Files – PDF files that are produced by PDF authoring tools that contain text content which may also contain image objects such as illustrations, pictures and screenshots. They may also be single-document or multiple-document PDFs.

Although Xtractor has a Designer’s interactive User Interface, operationally Xtractor also runs as a background service processing and indexing Image-Based PDF and TIFF documents. Functionally, iDox and Xtractor both locate and extract information from the document files they process in the background, but each processes its own specific types of files:

The DocuPhase Xtractor product is used to process TIFF image files and Image-Based PDF files.


The DocuPhase iDox product is designed to process Text-Based PDF Files.

The Image-Based PDF files are converted and handled as TIFF format documents as they are processed by Xtractor.

Both the iDox and Xtractor services receive their appropriate documents after capture, at the appropriate time, as configured by assigned routing codes. This allows their configured designs to process each document as it is received: performing fully automatic OCR/Barcode recognition to locate, index and split documents, and further routing documents to the Data Exchange service, which performs configured database lookups in the background for additional fully automatic indexing of the captured documents in DocuPhase.

Purpose and Assumptions

This manual has been written for advanced Administrator and Designer users. It prepares them to design and configure the Xtractor product, using the interactive Xtractor design interface tool, so that Xtractor recognizes and processes different types of TIFF documents and Image-Based PDF documents automatically as a background service.

The following assumptions are made about the audience for this guide:

The reader has received at least one week of DocuPhase-provided training.
The reader has used the DocuPhase software product for a period of no less than one month.


Foundational Terminology

The following key terms that appear in this document are defined here for readers who are unfamiliar with them.

Barcodes: Barcodes are special machine-print codes for storing information. This information is easily and accurately extracted by auto-indexing tools, such as Xtractor.

Data Extraction: Data Extraction allows for the collection and manipulation (extraction) of data by use of a computer. Xtractor uses a process called zone recognition to extract index values from scanned documents.

Image-Based PDF: A Portable Document Format (PDF) file that contains only images. The image content may be pictorial, graphical and/or an image of text, much like the content of a TIFF file.

Image-Based PDFs are produced by scanning paper documents to create a PDF composed of images, as opposed to Text-Based PDF files, which are produced by PDF authoring tools.

Use Xtractor to process Image-Based PDF files.

Note: You can tell if a PDF is Image-Based by opening it with Adobe Acrobat Reader® to a page with text information. If you cannot highlight some paragraph text and cannot use the copy-to-clipboard feature to copy it, then it is an Image-Based PDF.

The DocuPhase iDox product does not process the text information in Image-Based PDF files. However, the DocuPhase Xtractor product does process Image-Based PDF format files.

Tagged Image File Format (TIFF): TIFF (or TIF) is a popular image file format used by graphic artists, the publishing industry and fax-image capture. It is also a common industry-standard file format for storing single-page or multiple-page image documents and images.

Subpage Processing: Xtractor can separate and process multiple documents with the use of a properly configured Document Separator.

Note: Subpage Processing provides the ability to split multipage documents into multiple separate documents.

Text-Based PDF: A Portable Document Format (PDF) file that contains text information that can be accessed, for example by highlighting it and copying it to the clipboard. The PDF content may be pictorial, graphical and actual text information.

Text-Based PDFs are produced by PDF authoring tools, as opposed to Image-Based PDFs, which are produced by scanning paper documents to create a PDF composed of images.

Use iDox to process Text-Based PDF files.

Note: You can tell if a PDF is Text-Based by opening it with Adobe Acrobat Reader® to a page with text information. If you can highlight some text and can use the Windows copy-to-clipboard feature to copy it, then it is a Text-Based PDF.

The DocuPhase Xtractor product does not process the text information in Text-Based PDF files. However, the DocuPhase iDox product does process Text-Based PDF format files.

OCR: Optical Character Recognition (OCR) is a technology that allows a computer to recognize text from paper and then translate that text into data.

Portable Document Format (PDF): PDF is an industry-standard, royalty-free file format for representing documents of one to many pages. It was originally developed by Adobe to be independent of application software, hardware and operating systems. A PDF file encapsulates a complete description of a fixed-layout flat document, which includes the text, graphics, fonts and other information necessary to display the document. The two main types of PDF documents are Image-Based PDFs and Text-Based PDFs.

Note: Image-based PDFs can be processed by the DocuPhase Xtractor product and Text-based PDFs can be processed by the DocuPhase iDox product.
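The division of work between Xtractor and iDox can be summarized as a small routing rule. The following Python sketch is illustrative only: the CapturedFile class, the has_selectable_text flag and the choose_service function are hypothetical names introduced here, not part of any DocuPhase API, and in the product the actual routing is driven by configured routing codes.

```python
from dataclasses import dataclass


@dataclass
class CapturedFile:
    name: str
    # Hypothetical flag: True when text in the PDF can be highlighted and copied.
    has_selectable_text: bool = False


def choose_service(doc: CapturedFile) -> str:
    """Return the service that handles this file under the rules stated above."""
    ext = doc.name.lower().rsplit(".", 1)[-1]
    if ext in ("tif", "tiff"):
        return "Xtractor"  # TIFF images are processed by Xtractor
    if ext == "pdf":
        # Text-Based PDFs go to iDox; Image-Based PDFs go to Xtractor.
        return "iDox" if doc.has_selectable_text else "Xtractor"
    raise ValueError(f"unsupported file type: {doc.name}")
```

For example, a scanned invoice saved as a PDF with no selectable text would be sent to Xtractor, while a PDF exported from a word processor would be sent to iDox.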

Zone Recognition: Xtractor makes use of a data extraction technique referred to as Zone Recognition. Through zone recognition, Xtractor is prompted to search within pre-defined zones to perform data extraction. In Xtractor, a Recognition Zone is an area on a page where useful data is anticipated to reside. Guidelines for the extraction of data exist within each recognition zone’s definition. This approach to Data Extraction is most successful when applied to pages that adhere to moderately standard formats.

There are two types of recognition zones defined in an image template and processed by Xtractor to obtain data:

Barcode: Barcodes are the most accurate type of zone recognition Xtractor performs. When configured correctly, a barcode recognition zone can yield 100% accuracy in an optimized system to translate the barcodes to usable computer data.

OCR: OCR capture collects all machine print data within a recognition zone to translate the optically-recognized characters in images into usable computer data.


Client Support Services Contact Information

DocuPhase is committed to providing quality service and support for our customers. If you are experiencing difficulty with your DocuPhase software, please let us hear from you so we can help.

Client Support Services are provided as part of your Maintenance Program. Enhanced support programs are available upon request. The standard support feature set includes:

Product Updates and Upgrades
Telephone and Email support during local business hours
Remote Connect Support during local business hours

You may:

Contact us by email at [email protected].
Reach us by phone at (727) 441-8228.
Reach us by fax at (727) 444-4419.
Find us online at www.DocuPhase.com/support.


Xtractor Product Overview

Xtractor uses advanced image processing and recognition technologies to extract useful information from barcodes and machine print (OCR) obtained from scanned or faxed documents. The Xtractor Service runs in the background, behind the scenes, to process TIFF documents by applying the configured Xtractor definitions and OCR/Barcode technology. Xtractor can process single-page and multi-page TIFF documents, and it can split compound TIFF documents into multiple separate documents, each with its own set of index values.

Xtractor updates index values for the documents it processes by reading values from the first page of the document image itself. Sub-page processing allows zones to be read from other pages.

All of Xtractor’s features are accessible through an icon-driven user interface. Xtractor’s robust user interface provides rapid creation of new templates without the need for IT assistance or support. Xtractor delivers high-performance run-time processing of documents that are configured with just the click of a mouse.

Automated Indexing Rules include format restrictions. Exceptions are routed to the appropriate department for review and correction. Verification capabilities enable operators to move easily from one exception document to another, so corrections can be made at any stage, and cumbersome forms can be moved to research queues for future review.

Xtractor’s document capture features include image-enhancement controls, correction tools and color-image support that can be configured to assure the quality of scanned documents and to improve the accuracy of returned results. The enhancement functions also minimize the file size of scanned documents, increasing the number of documents that can be saved in the available DocuPhase Repository storage.

The Xtractor icon-driven user interface is used to design and develop the templates and specifications that the Xtractor Barcode and OCR Recognition Engine uses while running in the background, extracting data and automatically updating the indexes of TIFF-image documents with the data it extracts.


Both the iDox and Xtractor services receive their appropriate documents after capture, at the appropriate time, as configured by assigned routing codes. This allows their configured designs to process each document as it is received: performing fully automatic OCR/Barcode recognition to locate, index and split documents, and further routing documents to the Data Exchange service, which performs configured database lookups in the background for additional fully automatic indexing of the captured documents in DocuPhase.

The key features of Xtractor for TIFF and Image-Based PDF documents include the following:

Automated Indexing with Data Extraction
Optical Character Recognition
Barcode Recognition
Zone Recognition
Sub-page Processing (the ability to split compound TIFF documents)
Cover Page Handling Options:
    o Retain Cover Page
    o Auto Remove Cover Page
    o Auto Move Cover Page to be the Last Page

These Xtractor features are briefly explained below.

Data Extraction

Data Extraction allows for the automated collection and manipulation (extraction) of data by a computer. Xtractor uses a process called zone recognition to extract index values from paper documents. The paper documents are first scanned to create digital TIFF or Image-Based PDF files. Through the use of DocuPhase technology, these files are converted to electronic TIFF images for Data Extraction processing and are subsequently stored as automatically indexed documents in DocuPhase (DocuPhase’s Content Management solution).


Note: The DocuPhase Xtractor product is used to process and extract information from single-page, multiple-page and compound TIFF-image documents, and it automatically converts Image-Based PDF files to TIFF format for processing. Likewise, the DocuPhase iDox product performs similar functions for single-document and multiple-document Text-Based Portable Document Format (PDF) files.


OCR

Optical Character Recognition (OCR) is a technology that allows a computer to recognize text from digital images of paper and then translate that text into a form that the computer can manipulate as data. The potential of an OCR system is enormous because it enables users to harness the power of computers to access and manage (extract) data contained within printed documents that are scanned as well as from other image sources.

Barcodes

Barcodes are special machine-print codes for storing information. This information is easily and accurately extracted by auto-indexing tools, such as Xtractor. The DocuPhase BarCoder product can conveniently generate printed bar-coded and patch-coded document separator sheets that can be inserted as a cover sheet for each document in a stack placed in the input hopper of a scanning device. The Patch-Code and/or Bar-Code is then used to automatically separate the pages of each document, with the BarCoder Cover Sheet as the 1st page of each document.

Note: A BarCode can not only be used as a separator for documents, but it can also be used to identify the type of document, etc.

Although the document-separator Cover Sheet can remain captured as the 1st page of a document, Xtractor has an alternative 1st Page option for handling a document Cover Sheet by:

Automatically removing the 1st Page (i.e., the barcoded cover-sheet) from the captured document’s TIFF image

Automatically moving the 1st Page (i.e., the barcoded separator cover-sheet) to become the last page of the captured document’s TIFF image.

This 1st Page option improves the display of the captured image in DocuPhase, since the first page displayed for each document will be the actual 1st page of the captured document itself and not a barcoded cover sheet or a non-barcoded cover/title page.

Note: Since Xtractor works with the TIFF when it is in a Pre-Entered State before it is submitted to DocuPhase, what the new 1st Page option does is independent of Revision Control settings and does not create a new revision.

The following is a basic example of a Medical Office-Visit Encounter that produced three documents for the same patient (e.g., Lillian Brevoort) and the same encounter number (e.g., 2223347). The documents are separated and automatically indexed by a BarCoder printed Cover Sheet for this Encounter. They may be placed in a scanner or fax machine input hopper together with other encounter records that have their own cover sheets.


[Figure: A BarCoder printed Cover Sheet for the batch of documents from Patient Encounter 2223347, followed by three documents for this patient encounter: Medical Information, XRAY and EKG.]


Zone Recognition

Xtractor makes use of a data extraction technique referred to as Zone Recognition. Through Zone Recognition, Xtractor is prompted to search within pre-defined zones to perform data extraction. In Xtractor, a Recognition Zone is an area on a page where useful data is anticipated to reside. Guidelines for the extraction of data are configured for the area in each Recognition Zone. This approach to Data Extraction is most successful when applied to pages that adhere to simple or moderately complex standard formats. The barcode sheet seen below (in the topic: Recognition Zones Defined) is an example of a page that Xtractor could easily read and auto-index, provided it has an Image Template to apply to the page for recognition (i.e., to know where to look).

Xtractor allows the user to create an Image Template by defining all the possible Recognition Zones of a page. Recognition Zones are delineated by red rectangular boxes. Below is a general example of an Image Template with four Recognition Zones.

[Figure: Example Image Template with four Recognition Zones; the sample value F-12345 appears in two places. Callouts: The Xtractor Designer configures Zones and Fields where specific Data can be found using Identifying Codes. BarCodes can provide Index Data or Identify the Type of Document. OCR Text can provide Index Data or Identify the Type of Document.]

Using Image Templates, Xtractor analyzes a scanned document to extract target data. If the target data extracted from the recognition zones meets pre-defined confidence thresholds and other validation criteria, Xtractor automatically indexes the current document with the target data and retains it in the DocuPhase database.


During recognition processing, the Image Template and its associated Recognition Zones are applied to Documents stored in the DocuPhase system that have been flagged for automated indexing.
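The way an Image Template pairs each Recognition Zone with a confidence threshold and validation criteria can be sketched in Python. This is a minimal illustration under stated assumptions, not Xtractor's implementation: the RecognitionZone fields, the 0.0 to 1.0 confidence scale, the regular-expression format restriction and the index_document function are all hypothetical, and the recognition engine that produces the readings is outside the sketch.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecognitionZone:
    field: str                     # index field the zone feeds
    kind: str                      # "barcode" or "ocr"
    rect: tuple                    # (left, top, width, height); illustrative units
    min_confidence: float          # hypothetical threshold between 0.0 and 1.0
    pattern: Optional[str] = None  # optional format restriction (regex)


def index_document(zones, readings):
    """Split zone readings into accepted index values and exceptions.

    readings maps a field name to (text, confidence) as reported by some
    recognition engine; a missing reading counts as zero confidence.
    """
    accepted, exceptions = {}, {}
    for zone in zones:
        text, confidence = readings.get(zone.field, ("", 0.0))
        if confidence < zone.min_confidence:
            exceptions[zone.field] = "low confidence"
        elif zone.pattern and not re.fullmatch(zone.pattern, text):
            exceptions[zone.field] = "format mismatch"
        else:
            accepted[zone.field] = text
    return accepted, exceptions
```

In this sketch, a document with any entry in the exceptions dictionary would be a candidate for routing to review rather than automatic indexing.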

Subpage Processing

Xtractor can separate (i.e., split) and process multiple documents with the use of a properly configured Document Separator.

Note: Subpage Processing provides the ability to split compound documents into multiple separate documents.
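Splitting a compound document on separator pages can be sketched as follows. This Python example is an illustration only: split_compound_document and the is_separator callback are hypothetical names, and how a real Document Separator is detected (for example, by a barcode or patch code) is left to the caller.

```python
def split_compound_document(pages, is_separator):
    """Group pages into documents; each separator page starts a new document.

    The separator page is kept as the first page of the document it starts.
    Pages that appear before the first separator form their own document.
    """
    documents = []
    for page in pages:
        if is_separator(page) or not documents:
            documents.append([])
        documents[-1].append(page)
    return documents
```

For instance, a scanned stack consisting of a cover sheet, two pages, a second cover sheet and one page would be returned as two documents, each beginning with its own cover sheet.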

Recognition Zones Defined

Xtractor’s core functionality is Automatic Indexing (auto-indexing). This process enables Xtractor to automatically index documents in the DocuPhase database that were previously converted from paper to electronic format (e.g., by scanning or faxing).

Xtractor is capable of searching DocuPhase’s Application tables to find files that are new and unprocessed. These files will have the .tiff (image) extension. When Xtractor encounters a TIFF file with the proper pre-defined attributes, it loads the first page of the TIFF file. Once the first page is loaded, Xtractor examines the data contained within the pre-defined Recognition Zones of an Image Template. Xtractor then extracts values from each zone for subsequent auto-indexing to the DocuPhase database. When it has finished processing one document, Xtractor continues to the next document and repeats the process.

There are two types of Recognition Zones defined in an Image Template and processed by Xtractor:

Barcode
OCR

Xtractor Recognition Zone Types

Element     Description
Barcode     Barcodes are the most accurate type of zone recognition Xtractor performs. When configured correctly, a barcode recognition zone can yield 100% accuracy in an optimized system.
OCR         OCR capture collects all machine print data within a Recognition Zone.

Xtractor consists of four main areas:

Main Menu
Toolbar
Workspace Definition Viewer
Image Viewer


Cover Page Handling Options

In some situations, users do not want the first page of a document to be the page shown initially when the document is displayed by the DocuPhase Viewer. For example, when BarCoder is used, the patch-coded and bar-coded cover sheet it generates is placed on top of each document being scanned to separate and automatically index the documents. After scanning, this patch- and bar-coded page remains at the front of the TIFF image of the multi-page document as it is passed to Xtractor for processing.

Likewise, a title or cover page may have been scanned with the document, and users may or may not want it displayed as the document’s 1st page when the document is displayed later.

The Cover Page Handling Option is used to specify one of the following methods for handling the cover page (whether or not it is barcoded):

Retain Cover Page (the Default)
Auto Remove Cover Page
Auto Move Cover Page to be the Last Page
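The effect of the three options on a document's page order can be sketched in Python. The CoverPageOption enum and apply_cover_page_option function are hypothetical names used only for illustration; in Xtractor the option is selected through the Designer, not through code.

```python
from enum import Enum


class CoverPageOption(Enum):
    RETAIN = "retain"              # the default
    REMOVE = "remove"
    MOVE_TO_LAST = "move_to_last"


def apply_cover_page_option(pages, option=CoverPageOption.RETAIN):
    """Return a new page list with the first page handled per the option."""
    if not pages or option is CoverPageOption.RETAIN:
        return list(pages)
    if option is CoverPageOption.REMOVE:
        return list(pages[1:])
    if option is CoverPageOption.MOVE_TO_LAST:
        return list(pages[1:]) + [pages[0]]
    raise ValueError(f"unknown option: {option}")
```

With MOVE_TO_LAST, the cover sheet is still stored with the document, but the first page displayed is the first page of the document itself.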


Xtractor Designer Interface

Advanced users such as DocuPhase Designers and Administrators can log on to the Xtractor Designer component to establish definitions and zones to control the OCR/Barcode processing of TIFF documents and image pages by the Xtractor Service component.

Xtractor Main Menu

The main menu contains four drop-down menus: File, Options, View, and Help.

Xtractor Main Menu

Element     Description
File        Provides the ability to work with the Xtractor Files.
New         Creates a new project.
Open        Opens an existing project.
Close       Closes the existing project without exiting the program.
Save        Saves the current project.
Save As     Displays the Save As dialog to save the current project.
Exit        Closes the current project and exits the application.


Options
The Options menu consists of one option: Select Recognition Engine. Use the Recognition Engine Selection dialog to select the recognition engine type and accuracy settings.

MTX (Mtext)
MTX Recognition Engine provides:
Fast selectable OCR engine.
Support for twelve (12) languages.
Supports a maximum of 64 zones on one image.
Supports Omnifont, Draftdot9 and Draftdot24 filling methods.
Provides two (2) page-level accuracy and speed trade-off settings: a combined Accurate & Balanced value, and Fast.
Provides Checking Subsystem based correction.


MOR
MOR Recognition Engine provides:
Supports 119 languages.
Supports a maximum of 500 zones on one image.
Supports Omnifont, Draftdot24 and OCR-A filling methods.
Supports character training to achieve improved accuracy.
Provides three (3) page-level accuracy and speed trade-off settings: Accurate, Balanced and Fast.
Provides Checking Subsystem based correction.

FRX (FireWorX)
FRX Recognition Engine provides:
Optimized for speed.
Support for 54 languages.
Supports a maximum of 2,500 zones on one image.
Supports Omnifont filling methods.
Supports character training to achieve improved accuracy.

Voting 2-Way
PLUS 2-way voting engine for accurate and fast machine print OCR capabilities.


Voting 3-Way
OmniPage 3-way voting engine for accurate and fast machine print OCR capabilities.

View The View menu consists of one option: Refresh. Use this option to refresh the Workspace Definition Viewer and the Image Viewer.

Help The Help menu consists of one option: About. The About option provides a brief description of Xtractor by listing the product name, company name and version number.
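The engine limits listed under Options above can be kept in a small lookup table when planning a template. The sketch below only restates the language counts, zone limits and filling methods from this manual; the `EngineLimits` class and `engines_supporting` helper are hypothetical and are not part of Xtractor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineLimits:
    languages: int
    max_zones: int          # maximum zones on one image
    filling_methods: tuple  # supported filling methods


# Values copied from the engine descriptions in this manual.
ENGINES = {
    "MTX": EngineLimits(12, 64, ("Omnifont", "Draftdot9", "Draftdot24")),
    "MOR": EngineLimits(119, 500, ("Omnifont", "Draftdot24", "OCR-A")),
    "FRX": EngineLimits(54, 2500, ("Omnifont",)),
}


def engines_supporting(zone_count, filling_method="Omnifont"):
    """List engines whose documented limits cover the given template."""
    return [
        name
        for name, limits in ENGINES.items()
        if zone_count <= limits.max_zones and filling_method in limits.filling_methods
    ]


print(engines_supporting(100))
print(engines_supporting(64, "Draftdot9"))
```

A template with 100 zones rules out MTX, while a Draftdot9 filling method rules out MOR and FRX.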


Xtractor Toolbar
The Toolbar displays icons that give quick access to commonly used controls in Xtractor. The toolbar displays different icons depending on which object is highlighted in the Workspace Viewer. When a Workspace is highlighted, the following controls are available for quick access.

Element Xtractor Toolbar Description

New Template Project
Creates a new workspace.

Note: This functionality is also available when the template or the zone is highlighted.

Open Workspace
Opens a browse dialog to allow the user to browse for an existing workspace.

Note: This functionality is also available when the template or the zone is highlighted.

Save Current Workspace

Opens a browse dialog to allow the user to save the current workspace configuration.

Note: This functionality is also available when the template or the zone is highlighted.

New Project
Allows the user to create a new project.

Properties
After a specific project, template or zone is highlighted, press this icon to access the properties associated with the selected object.

Note: This functionality is also available when the template or the zone is highlighted.

Delete
After the user highlights a specific project, template or zone, pressing this icon prompts for confirmation before removing the selected object.

Note: This functionality is also available when the template or the zone is highlighted.


New Template
Allows the user to create a new “regular” template.

New Selection Template
Creates a new Selection Template.

New Routing Template
Creates a new Routing Template.

Run
Initiates Xtractor to scan the repository (one time) for images with an Extract Status. (By default this status is ‘A’.)

Run Continuously
Initiates Xtractor to continuously search the repository for images with an Extract Status. (By default this status is ‘A’.)

Stop
Select this icon to stop Xtractor from running.

Note: At this time, the user must repeatedly press the stop icon to stop the service.

Test
Verifies the accuracy of the zones being read.

Zoom In
Increases the viewable size of the image.

Zoom Out
Decreases the viewable size of the image.

Zone Selection
Selects a specific zone that has been placed on the image.

OCR Machine Zone
Places a zone on the image that recognizes optical characters.


Barcode Zone
Places a zone on the image that recognizes barcodes.

OCR Checkbox Zone
Places a zone on the image that recognizes checkbox selections.

Note: This feature is currently disabled, but will be available in a future release.

OCR Handprint Zone
Places a zone on the image that recognizes handprint.

Note: This feature is currently disabled, but will be available in a future release.

Switch Color/BW
Toggles the image to display as color or black and white. This functionality is useful in allowing the user to see how templates appear before processing and after processing.

Xtractor Workspace Viewer
The Workspace Viewer contains the attributes of the workspace, the project, template(s) and each recognition zone. The workspace items are represented by:

Icons
A list of names

To access the configuration of each workspace item, follow these steps:
1. Highlight the item, right-click it and then select Properties.
2. Select a zone icon to display the Zone Properties.


Xtractor Image Viewer
The Image Viewer displays both the image template and its associated recognition zones. As Xtractor processes documents, it loads target documents into the Image Viewer and applies the image template to extract data from the document. Set recognition zones are highlighted in red.


Xtractor Administration
When Xtractor initializes for the first time, you are prompted to enter your registration key. The registration key can be obtained from an authorized DocuPhase vendor. This encrypted key is based upon the Machine Key and is derived from the serial number of the physical hard drive device.
To perform the configuration process, follow these steps:
1. Enter your registration key in the Registration Key field and click Submit. After successfully submitting the registration key, the Xtractor Login Dialog loads.
2. Select Server.
3. Enter the User Name, Password and the Server.
4. Click Login.

Workspace Hierarchy
Xtractor workspaces have a Project, a Template and a Zone.

The Workspace contains the attributes of the workspace, the project, template(s) and each recognition zone.

The Project is the top-level container object within the Workspace file.
The Template is associated with a DocuPhase Application.
The Zone updates a DocuPhase index field with data read from the document.
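The hierarchy described above can be pictured as nested containers. The following is a minimal sketch with hypothetical class and field names. It does not reflect the actual Workspace file format, only the containment relationships stated in this section.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Zone:
    name: str
    index_field: str  # DocuPhase index field the zone updates


@dataclass
class Template:
    name: str
    application: str  # associated DocuPhase Application
    zones: List[Zone] = field(default_factory=list)


@dataclass
class Project:
    name: str  # top-level container within the Workspace file
    templates: List[Template] = field(default_factory=list)


@dataclass
class Workspace:
    web_server: str
    project: Project


def index_fields(workspace):
    """Walk Workspace -> Project -> Template -> Zone and list the index fields updated."""
    return [
        (template.name, zone.index_field)
        for template in workspace.project.templates
        for zone in template.zones
    ]


ws = Workspace(
    web_server="docuphase-server",  # hypothetical server name
    project=Project(
        name="Accounts Payable",
        templates=[Template("Invoice", "AP Invoices", [Zone("Invoice No", "InvoiceNumber")])],
    ),
)
print(index_fields(ws))
```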


Workspace Definition Configuration
To access the configuration of each workspace item, highlight the item, right-click it and then select Properties, or double-click a zone icon to display the Zone Properties dialog box.
1. Open a Workspace or right-click on the existing workspace and select Properties.

Note: The Workspace Properties dialog is displayed.

2. In the Workspace Properties dialog box, enter the name of the web server that Xtractor processes against.

Note: This feature allows the user to log into one server while processing against another server.

3. Deselect the Show Run Status option.

Note: This prevents the Run Status screen from appearing when the Project is activated.

4. Select OK.

Workspace Properties Configuration
Workspace configuration takes place in the Workspace Properties dialog box.

The Workspace Properties dialog box requests the name of the Web Server that Xtractor processes against. This feature allows the user to log into one server while processing against another server.
Deselect the Show Run Status option to prevent the Run Status screen from appearing when the Project is activated.


Project Properties Configuration
To configure a project, follow these steps.
1. Create a new project or right-click on an existing project and select Template Properties.

Note: The Project Properties dialog is displayed.

2. Define the desired project name.
3. Select OK.

Template and Image Processing Properties
Template Processing Properties
There are three types of templates:

Recognition,
Selection, and
Routing templates.


Note: To access the Recognition Template Properties, create a new recognition template or right-click an existing recognition template and select Template Properties. Make the desired changes and click OK to save.


The Template Properties dialog displays the following elements.
Element Template Processing Properties Description

Name
The user-defined name of the template.

Application
The DocuPhase application that contains associated templates and images for use with Xtractor.

Template File
Displays the name of the file being used as the template.

Process
A check in this checkbox allows this template to process when the workspace is activated. Remove the check to disable this template from processing when the workspace is activated.

Extract Status
The user-defined alpha character entered in this field is the signal that lets Xtractor know that a new document has entered the DocuPhase system and needs to be processed and auto-indexed.

Split Status
The user-defined alpha character entered in this field is the “temporary” status that is assigned to pages that have been separated from the original document during the first pass for subpage processing. By default, this value is ‘S’. Xtractor then scans the system for Split Status documents and processes those using the Extract Status defined by the user.


Unconfident Status
Xtractor assigns the user-defined alpha character entered in this field to a document when it encounters a problem with the document that requires human intervention. By default, this value is ‘P’. After assigning this status to a “bad” document, Xtractor moves to the next document in DocuPhase with an ‘A’ (Extract) Status and begins processing it.
“Bad” documents receive an unconfident status in two basic scenarios:

The first scenario is that Xtractor did not find a barcode or machine print in a defined zone. Since Xtractor was expecting data where there is no data to process, the document is flagged with a ‘P’ to make it visible in DocuPhase in the indexing queue.

The second, more common scenario is that the data detected by Xtractor does not meet the confidence threshold for a defined zone in the Xtractor template.

A “bad” document flagged with a ‘P’ requires human intervention to complete the indexing process. It becomes viewable in DocuPhase in the indexing queue, a holding bin that allows manual assignment of index values to documents sent to DocuPhase.

Complete Status
The user-defined alpha character entered in this field is assigned to the recognition document by Xtractor and is used to signal DocuPhase that the file has been processed and has met all of the confidence requirements for each defined zone in the Xtractor template. When the file status is changed to this character, it is searchable as an archived document in DocuPhase. By default, this value is ‘E’.


Corruption Status
The user-defined alpha character entered in this field is assigned to the recognition document by Xtractor and is used to signal DocuPhase that the file has not been processed because it was found to be corrupted or unreadable.
An example of an unreadable file would be a Word document or text file.
Xtractor is designed to read TIFF images only. If Xtractor encounters a non-image file, it is considered corrupt. By default, this value is ‘C’.
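The way these status characters relate to one another can be sketched as a small decision function. This is an illustration under stated assumptions, not Xtractor's actual logic: the default characters come from this manual, while `ZoneResult`, its numeric confidence scale, and the `final_status` function are hypothetical. Every zone is treated as expected to contain data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StatusCodes:
    # Defaults as documented in the table above; each is user-definable.
    extract: str = "A"
    split: str = "S"
    unconfident: str = "P"
    complete: str = "E"
    corruption: str = "C"


@dataclass
class ZoneResult:
    text: Optional[str]  # None or "" when nothing was found in the zone
    confidence: float    # hypothetical score between 0.0 and 1.0
    threshold: float     # hypothetical per-zone confidence threshold


def final_status(is_tiff, zone_results, codes=StatusCodes()):
    """Pick the status a document would receive after one processing pass."""
    if not is_tiff:
        return codes.corruption       # non-image file: considered corrupt
    for result in zone_results:
        if not result.text:
            return codes.unconfident  # scenario 1: expected data is missing
        if result.confidence < result.threshold:
            return codes.unconfident  # scenario 2: below the confidence threshold
    return codes.complete             # every zone met its requirement


print(final_status(True, [ZoneResult("INV001", 0.95, 0.80)]))
print(final_status(True, [ZoneResult("INV001", 0.40, 0.80)]))
print(final_status(False, []))
```

Documents left with the unconfident status wait in the indexing queue for manual indexing, while documents with the complete status become searchable.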

Image Processing Properties
To access the Image Processing Properties, follow these steps.
1. Create a new template or right-click on an existing template and select Image Processing Properties.

The Image Processing Properties dialog is displayed.


2. Make the desired changes.


3. Select OK.

Element Image Processing Properties Description

Contrast
Check this box to allow manual adjustment of the contrast of the image. Move the slider to the right to decrease the contrast. Move the slider to the left to increase the contrast.

Brightness
Check this box to allow manual adjustment of the brightness of the image. Move the slider to the right to increase the brightness. Move the slider to the left to decrease the brightness.

Saturation
Check this box to allow manual adjustment of the saturation of the image. Move the slider to the right to increase the saturation. Move the slider to the left to decrease the saturation.

Note: This option only applies to color images.

Hue
Check this box to allow manual adjustment of the hue of the image by moving the slider left and right.

Note: This option only applies to color images.

Intensity Detect
Check this box to manually adjust the level of intensity detected in the image by moving the slider left and right.

Note: This option only applies to color images.

Despeckle
Check this box to enable “despeckle”, a feature that cleans up the image after it is scanned. Remove the check from the checkbox to disable this feature.

Deskew
Check this box to enable “deskew”, a feature that removes slanting, twisting or other distortion from the image after the image is scanned. Remove the check from the checkbox to disable this feature.

Align
Check this box to enable “align”, a feature that repositions the image after the image is scanned. Remove the check from the checkbox to disable this feature.

Convert to Black and White
Check this box to convert the image to black and white prior to processing.


Magnification
This drop-down menu allows you to choose the level of magnification of the original image and adjusted image windows.

Recognition Template
A recognition template reads data from an image file and updates the associated index fields in DocuPhase. A recognition template is the only type of template that automatically populates index fields in DocuPhase. Recognition Template configuration takes place in two areas:

Template Properties and
Image Processing Properties

To access the Recognition Template Properties, follow this step.
1. Create a new recognition template or right-click an existing recognition template and select Template Properties.

Note: The Template Properties dialog is displayed.

Selection Template
A selection template extracts information from a recognition zone located on the first page of a document and then chooses which recognition template to apply for auto-indexing the document. Selection templates are used when multiple document form layouts need to be processed for the same DocuPhase application.

For instance, form layouts may differ for a purchase order form between 2006 and 2007.

To access the Selection Template Properties, follow these steps.
1. Create a new selection template or right-click on an existing selection template and select Template Properties.


Note: The Template Properties dialog is displayed.

2. Make the desired changes.
3. Click OK.
Selection Template Properties are described in the following table.
Element Selection Template Properties Description

Name
The user-defined name of the template.

Application
Displays the DocuPhase application the template is associated with and where Xtractor locates the images.

Template File
A template association refers to the relationship between a template type (e.g., Invoices) and its Association Value. When a template is created, users create a unique association value in reference to the template. For example, association values for invoice templates might begin with INV, followed by three numeric characters.


Associations
This icon opens the Template Association dialog. From this dialog, the user can enter the template and any of the values that indicate which template to use.

Split Status
The user-defined alpha character entered in this field is the “temporary” status that is assigned to pages that have been separated from the original document during the first pass for subpage processing. By default, this value is ‘S’. Xtractor then scans the system for Split Status documents and processes those using the Extract Status defined by the user (default is ‘A’).

Extract Status
The user-defined alpha character (by default this value is ‘A’) that notifies Xtractor that a new document has entered DocuPhase and needs to be scanned and auto-indexed.


Unconfident Status
Xtractor assigns the user-defined alpha character entered in this field to a document when it encounters a problem with the document that requires human intervention. By default, this value is ‘P’. After assigning this status to a “bad” document, Xtractor moves to the next document in DocuPhase with an ‘A’ (Extract) Status and begins processing it.
“Bad” documents receive an unconfident status in two basic scenarios:

The first scenario is that Xtractor did not find a barcode or machine print in a defined zone. Since Xtractor was expecting data where there is no data to process, the document is flagged with a ‘P’ to make it visible in DocuPhase in the indexing queue.

The second, more common scenario is that the data detected by Xtractor does not meet the confidence threshold for a defined zone in the Xtractor template.

A “bad” document flagged with a ‘P’ requires human intervention to complete the indexing process. It becomes viewable in DocuPhase in the indexing queue, a holding bin that allows manual assignment of index values to documents sent to DocuPhase.

Complete Status
The user-defined alpha character entered in this field is assigned to the recognition document by Xtractor and is used to signal DocuPhase that the file has been processed and has met all of the confidence requirements for each defined zone in the Xtractor template. When the file status is changed to this character, it is searchable as an archived document in DocuPhase. By default, this value is ‘E’.


Corruption Status
The user-defined alpha character entered in this field is assigned to the recognition document by Xtractor and is used to signal DocuPhase that the file has not been processed because it was found to be corrupted or unreadable.
An example of an unreadable file would be a Word document or text file.
Xtractor is designed to read TIFF images only. If Xtractor encounters a non-image file, it is considered corrupt. By default, this value is ‘C’.
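The association idea described under Template File and Associations (a value read from the first page selects the recognition template) can be sketched as a lookup. The manual does not specify how association values are matched, so the regular expressions, the `ASSOCIATIONS` list and the template names below are assumptions made purely for illustration.

```python
import re

# Hypothetical association table: pattern for the association value -> template name.
# "INV followed by three numeric characters" follows the manual's invoice example;
# the purchase order entry is invented for illustration.
ASSOCIATIONS = [
    (re.compile(r"INV\d{3}"), "Invoice Template"),
    (re.compile(r"PO\d{3}"), "Purchase Order Template"),
]


def select_template(zone_value):
    """Return the recognition template associated with the value read from the zone."""
    value = zone_value.strip()
    for pattern, template_name in ASSOCIATIONS:
        if pattern.fullmatch(value):
            return template_name
    return None  # no association matched


print(select_template("INV042"))
print(select_template("XYZ999"))
```

A value that matches no association returns `None` here; how Xtractor itself treats an unmatched value is not stated in this section.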

Routing Template
A Routing template is a special template that reads a value from a recognition zone on the first page of the image, then routes the document to a different DocuPhase application based on the value read. After routing, Xtractor uses the appropriate recognition template to update the index values for that document in the proper application.
To access the Routing Template Properties, follow these steps.
1. Create a new routing template or right-click on an existing routing template and select Template Properties.

Note: The Template Properties dialog is displayed.

2. Make the desired changes.
3. Select OK.


Routing Template Properties are described in the following table.
Element Routing Template Properties Description

Name
The user-defined name of the template.

Application
The DocuPhase application the template is associated with and where Xtractor locates the images.

Template File
The name of the file used as the template.

Route by Application Name
A check in the “Route by Application Name” checkbox configures the routing template to read the text value of the application and route the image accordingly. The text value of the application must exist on the image in order to use this feature. Remove the check from the “Route by Application Name” checkbox to disable this feature.

Split Status
The user-defined alpha character entered in this field is the “temporary” status that is assigned to pages that have been separated from the original document during the first pass for subpage processing. By default, this value is ‘S’. Xtractor then scans the system for Split Status documents and processes those using the Extract Status defined by the user.

Extract Status
The user-defined alpha character (by default this value is ‘A’) that notifies Xtractor that a new document has entered DocuPhase and needs to be scanned and auto-indexed.


Unconfident Status
Xtractor assigns the user-defined alpha character entered in this field to a document when it encounters a problem with the document that requires human intervention. By default, this value is ‘P’. After assigning this status to a “bad” document, Xtractor moves to the next document in DocuPhase with an ‘A’ (Extract) Status and begins processing it.
“Bad” documents receive an unconfident status in two basic scenarios:

The first scenario is that Xtractor did not find a barcode or machine print in a defined zone. Since Xtractor was expecting data where there is no data to process, the document is flagged with a ‘P’ to make it visible in DocuPhase in the indexing queue.

The second, more common scenario is that the data detected by Xtractor does not meet the confidence threshold for a defined zone in the Xtractor template.

A “bad” document flagged with a ‘P’ requires human intervention to complete the indexing process. It becomes viewable in DocuPhase in the indexing queue, a holding bin that allows manual assignment of index values to documents sent to DocuPhase.

Complete Status
The user-defined alpha character entered in this field is assigned to the recognition document by Xtractor and is used to signal DocuPhase that the file has been processed and has met all of the confidence requirements for each defined zone in the Xtractor template. When the file status is changed to this character, it is searchable as an archived document in DocuPhase. By default, this value is ‘E’.


Corruption Status
The user-defined alpha character entered in this field is assigned to the recognition document by Xtractor and is used to signal DocuPhase that the file has not been processed because it was found to be corrupted or unreadable.
An example of an unreadable file would be a Word document or text file.
Xtractor is designed to read TIFF images only. If Xtractor encounters a non-image file, it is considered corrupt. By default, this value is ‘C’.

Zone Properties
Zone Properties configuration takes place in the Zone Properties dialog box. A slightly different initial dialog box is associated with each of the three types of possible recognition zones:

Barcode,
OCR Machine, and
OCR Checkbox Zones.

To access the Zone Properties dialog, create a new Barcode, OCR Machine or OCR Checkbox zone, or right-click on an existing zone in the Workspace Viewer.

Date Properties
The Date Properties dialog allows you to reformat the date found in the document before inserting it into the database.
Select Date from the Data Transformation drop-down box and select Properties.


Element Date Properties Dialog Description

Input Mask

This mask tells Xtractor what order the date is in and allows Xtractor to reorder the date so it can be inserted into the database. The Input Mask may be either a fixed length mask (MMDDYY) or discriminated length mask (m/d/y).

Note: Date masks can be concatenated using the ‘^’ character but only date masks of the same type. For instance, if the mask in the Data Modifications mask edit box were to have the string ##/##/##^##/##/#### the input mask could read mm/dd/yy^mm/dd/yyyy.

Add to Century

Select the desired century to apply to YY values.

Pivot Year
By default, Xtractor assumes the current century (2000) when determining MMDDYY dates. However, you may want to process dates from the last century (1900). Enter a Pivot Year value to determine which Add to Century value is applied.

If the YY value is greater than or equal to the Pivot Year value, the Add to Century value is used.
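The interaction of Input Mask, Add to Century and Pivot Year can be sketched as follows. This is a minimal illustration for fixed-length masks only, and discriminated-length masks such as m/d/y are not handled. The `reorder_date` function and its parameters are hypothetical. Falling back to the default century (2000) when YY is below the pivot is an assumption drawn from the description above.

```python
from datetime import date


def reorder_date(raw, input_mask, add_to_century=1900, pivot_year=30, default_century=2000):
    """Parse a fixed-length date such as MMDDYY using a mask of M, D and Y letters."""
    mask = input_mask.upper()
    if len(raw) != len(mask):
        raise ValueError("value length does not match the fixed-length mask")
    parts = {"M": "", "D": "", "Y": ""}
    for mask_char, raw_char in zip(mask, raw):
        if mask_char in parts:
            parts[mask_char] += raw_char
        # Any other mask character (for example '/') is treated as a separator and skipped.
    year = int(parts["Y"])
    if len(parts["Y"]) == 2:
        # YY >= Pivot Year: use Add to Century; otherwise assume the default century.
        century = add_to_century if year >= pivot_year else default_century
        year += century
    return date(year, int(parts["M"]), int(parts["D"]))


print(reorder_date("051499", "MMDDYY").isoformat())
print(reorder_date("051417", "MMDDYY").isoformat())
```

With a pivot of 30 and Add to Century set to 1900, a YY of 99 lands in 1999 while a YY of 17 stays in 2017.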

Barcode Zone Properties
Barcode zone properties display three tab selections:

Definition,


Document Separator, and
Modification.

Barcode Zone Definitions
The Barcode Zone Definitions elements are described below.


Element Barcode Zone Properties Description

Identity
This option is unavailable to the user.

Required
A check in the “Required” checkbox sets the zone as mandatory. Remove the check if the zone is not mandatory.

Document Separator
A check in this box specifies this zone as the document separator. Only one document separator can exist per template.

Code 39
A check in this box enables Xtractor to read Code 39 Barcodes.

Code 128
A check in this box enables Xtractor to read Code 128 Barcodes.

Index
Associate the zone to an index by selecting the name of the index from the drop-down menu. The index name normally auto-populates the zone name.

Name
This is the name applied to the zone. Normally the name of the zone auto-populates and is the same as the selected application.

Zone Specifications
The values for zone specification are automatically populated when the user defines the zone area on the image. These values set the Top, Left, Height, Width and Justification of the zone area.


Document Separator Tab
The Document Separator tab has one primary input field defined below:

Element Document Separator Tab Description

Mask
This mask instructs Xtractor to recognize a specific pattern within the zone. A separation occurs only if this pattern is recognized.
Mask values (symbols) include:

@ Represents a single alpha character.
# Represents a single numeric character.
@(n) Represents n alpha characters.
#(n) Represents n numeric characters.
@* Represents a continuous series of alpha characters until the first non-alpha character.
#* Represents a continuous series of numbers until the first non-numeric character.
^ Separates multiple masks. For example: ##/##/##^##/##/####

These mask formats also apply to the mask field under the Modification tab for post results modification. The Modification tab only applies to the results after the recognition.
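One way to reason about the mask symbols above is to translate a mask into a regular expression. The sketch below is not how Xtractor evaluates masks internally. The `mask_to_regex` and `matches_mask` functions are hypothetical, any character other than the listed symbols is assumed to match literally, and `@*` and `#*` are assumed to require at least one character.

```python
import re


def mask_to_regex(mask):
    """Translate an Xtractor-style mask into a compiled regular expression."""
    alternatives = []
    for single in mask.split("^"):  # '^' separates multiple masks
        out = []
        i = 0
        while i < len(single):
            ch = single[i]
            if ch in "@#":
                char_class = "[A-Za-z]" if ch == "@" else "[0-9]"
                rest = single[i + 1:]
                counted = re.match(r"\((\d+)\)", rest)
                if rest.startswith("*"):  # @* or #*: a continuous series
                    out.append(char_class + "+")
                    i += 2
                elif counted:             # @(n) or #(n): exactly n characters
                    out.append(char_class + "{" + counted.group(1) + "}")
                    i += 1 + counted.end()
                else:                     # single @ or #
                    out.append(char_class)
                    i += 1
            else:                         # assumed to be a literal character
                out.append(re.escape(ch))
                i += 1
        alternatives.append("".join(out))
    return re.compile("|".join(f"(?:{alt})" for alt in alternatives))


def matches_mask(mask, value):
    return mask_to_regex(mask).fullmatch(value) is not None


print(matches_mask("##/##/##^##/##/####", "05/14/2017"))
print(matches_mask("@(3)#*", "INV001"))
```

Both calls print `True`: the date matches the second alternative of the concatenated mask, and `INV001` is three alpha characters followed by a run of digits.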


Modifications Tab
The Modification tab has two main sections, outlined and defined below:

Element Modification Tab Description

Line Number: The numeric value that designates which line Xtractor recognizes when a zone covers multiple lines of data.

Tag: This user-definable field contains alphanumeric character(s) that indicate where Xtractor begins to extract data.

Ending Delimiter(s): This user-definable field contains alphanumeric character(s) that indicate where Xtractor stops extracting data. Three supported switches are:

• \t (Tab)
• \r (Carriage Return)
• \n (Line Feed)

To First Space: Place a check in the ‘To First Space’ checkbox to instruct Xtractor to continue recognition until it encounters the first space.

Mask: This mask instructs Xtractor to recognize a specific pattern within the zone. If this pattern is recognized, only then does it cause a separation.



Default Value: Populate the Default Value field and select the ‘If Empty’ radio button to allow Xtractor to substitute the default value for an index when a recognition zone is seen as empty. Populate the Default Value field and select the ‘If Unconfident’ option to allow Xtractor to substitute the default value for an index when a recognition zone is seen as questionable.

Find: The value that Xtractor searches for within the results of recognized data.

Replace With: The value that Xtractor uses to replace the value specified in the Find field.

Prepend: The value that Xtractor inserts before recognized data.

Append: The value that Xtractor inserts after recognized data.

None: Use this radio button if a modification to the text is not necessary.

To Upper: Use this option to convert all text to upper case.

To Lower: Use this option to convert all text to lower case.

To Proper: Use this option to convert all text to title case (only the first letter of each word is capitalized).

Strip All Punctuation: A check in this checkbox removes all punctuation from the result of recognized data.

Remove Embedded Spaces: A check in this checkbox removes all spaces from the result of recognized data.

Translation: Provides the ability to select a Translation Type and/or Index.

Type: Provides a list of index values to be populated.

Index: Provides a drop-down menu containing a set of values to populate DocuPhase index fields with text other than what has actually been read from the document.
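
To make the extraction settings in this table more concrete (Line Number, Tag, Ending Delimiter(s), To First Space, and Default Value with ‘If Empty’), here is a minimal Python sketch of how they might interact on the raw text of a zone. It is an illustration only: the function extract_value and its parameters are hypothetical, and the order of the steps, the 1-based line numbering, and the behavior when a tag is not found are assumptions rather than documented Xtractor behavior.

```python
def extract_value(zone_text, line_number=1, tag="", ending_delimiters="",
                  to_first_space=False, default_if_empty=""):
    """Hypothetical sketch of the extraction settings on the Modification tab."""
    # Line Number: choose one line (assumed 1-based) when the zone spans several.
    lines = zone_text.splitlines()
    text = lines[line_number - 1] if 1 <= line_number <= len(lines) else ""

    # Tag: extraction begins immediately after the tag text.
    # Assumption: if the tag is not found, nothing is extracted.
    if tag:
        pos = text.find(tag)
        text = text[pos + len(tag):] if pos >= 0 else ""
    text = text.lstrip()

    # Ending Delimiter(s): stop at the first occurrence of any delimiter character.
    cut = len(text)
    for delimiter in ending_delimiters:
        pos = text.find(delimiter)
        if pos >= 0:
            cut = min(cut, pos)
    text = text[:cut]

    # To First Space: keep only the characters before the first space.
    if to_first_space:
        text = text.split(" ", 1)[0]

    # Default Value with 'If Empty': substitute when nothing was extracted.
    return text.strip() or default_if_empty


zone = "Invoice No: 10422 Date: 05/14/23\nTotal:\t99.50\tUSD"
print(extract_value(zone, tag="Invoice No:", to_first_space=True))         # 10422
print(extract_value(zone, line_number=2, tag="Total:", ending_delimiters="\t"))  # 99.50
print(extract_value(zone, tag="PO:", default_if_empty="UNKNOWN"))          # UNKNOWN
```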


OCR Machine Zone Properties
OCR Machine zone properties display five tab selections: Definition, OCR Character Set, Document Separator, Modification, and Validation.

OCR Machine Definitions
The Definition tab has three main sections, outlined and defined below:

Element OCR Machine Zone Definition Tab Description

Identity: This option is unavailable to the user.

Required: A check in the “Required” checkbox sets the zone as mandatory. Remove the check if the zone is not mandatory.

Document Separator: A check in this box specifies this zone as the document separator. Only one document separator can exist per template.

Index: Associate the zone with an index by selecting the name of the index from the drop-down menu. The index name normally auto-populates the zone name.

Name: This is the name applied to the zone. Normally the name of the zone auto-populates and is the same as the selected application.

Zone Specifications: The values for the zone specification are populated automatically when the user defines the zone area on the image. These values set the Top, Left, Height, Width, and Justification of the zone area.


OCR Character Set
The OCR Filter mask instructs Xtractor to recognize a specific pattern within the zone. OCR Character properties are described in the following table.

Element OCR Character Set Description

Letters: Xtractor will use a letter character comparison. Be sure to specify either the All, Upper-case, or Lower-case character set.

Numbers: Xtractor will use a number character comparison.

Punctuation: Xtractor will use a punctuation character comparison.

Misc. Characters: Xtractor will use a miscellaneous character comparison.

Document Separator Tab
Element Document Separator Tab Description

Mask: This mask instructs Xtractor to recognize a specific pattern within the zone. If this pattern is recognized, only then does it cause a separation.

Modifications Tab
Element Modifications Tab Description

Line Number: The numeric value that designates which line Xtractor recognizes when a zone covers multiple lines of data.

Tag: This user-definable field contains alphanumeric character(s) that indicate where Xtractor begins to extract data.

Ending Delimiter(s): This user-definable field contains alphanumeric character(s) that indicate where Xtractor stops extracting data. Three supported switches are:

• \t (Tab)
• \r (Carriage Return)
• \n (Line Feed)



To First Space: Place a check in the ‘To First Space’ checkbox to instruct Xtractor to continue recognition until it encounters the first space.

Mask: This mask instructs Xtractor to recognize a specific pattern within the zone. If this pattern is recognized, only then does it cause a separation.

Default Value: Populate the Default Value field and select the ‘If Empty’ radio button to allow Xtractor to substitute the default value for an index when a recognition zone is seen as empty. Populate the Default Value field and select the ‘If Unconfident’ option to allow Xtractor to substitute the default value for an index when a recognition zone is seen as questionable.

Find: The value that Xtractor searches for within the results of recognized data.

Replace With: The value that Xtractor uses to replace the value specified in the Find field.

Prepend: The value that Xtractor inserts before recognized data.

Append: The value that Xtractor inserts after recognized data.

None: Use this radio button if a modification to the text is not necessary.

To Upper: Use this option to convert all text to upper case.

To Lower: Use this option to convert all text to lower case.

To Proper: Use this option to convert all text to title case (only the first letter of each word is capitalized).

Strip All Punctuation: A check in this checkbox removes all punctuation from the result of recognized data.

Remove Embedded Spaces: A check in this checkbox removes all spaces from the result of recognized data.

Type: Provides a list of index values to be populated.



Index: Provides a drop-down menu containing a set of values to populate DocuPhase index fields with text other than what has actually been read from the document.
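
The text modifications in the Modifications tab (Find and Replace With, Prepend, Append, the case options, Strip All Punctuation, and Remove Embedded Spaces) can be pictured as a small chain of string operations applied to the recognized text. The Python sketch below is illustrative only: apply_modifications is a hypothetical name, and the order in which the steps run is an assumption, since the manual does not state it.

```python
import string

def apply_modifications(text, find="", replace_with="", prepend="", append="",
                        case="none", strip_punctuation=False, remove_spaces=False):
    """Hypothetical sketch of the post-recognition text modifications."""
    # Find / Replace With: substitute every occurrence of the Find value.
    if find:
        text = text.replace(find, replace_with)

    # Strip All Punctuation: drop ASCII punctuation characters.
    if strip_punctuation:
        text = text.translate(str.maketrans("", "", string.punctuation))

    # Remove Embedded Spaces: drop every space character.
    if remove_spaces:
        text = text.replace(" ", "")

    # None / To Upper / To Lower / To Proper: mutually exclusive case options.
    if case == "upper":
        text = text.upper()
    elif case == "lower":
        text = text.lower()
    elif case == "proper":
        text = text.title()

    # Prepend / Append: fixed text placed before and after the result.
    return f"{prepend}{text}{append}"


print(apply_modifications("acme, inc.", case="proper", strip_punctuation=True))  # Acme Inc
print(apply_modifications("10 422", remove_spaces=True, prepend="INV-"))         # INV-10422
print(apply_modifications("O5/14/23", find="O", replace_with="0"))               # 05/14/23
```

The last call shows a typical use of Find and Replace With: correcting a letter O that was recognized in place of a zero.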


Validation Tab
Element Validation Tab Description

Post Recognition Validation: This selection allows the user to compare the captured data to an external reference table to ensure data integrity.
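
Conceptually, post-recognition validation amounts to checking whether a captured value exists in a reference table. The Python sketch below shows that idea with an in-memory SQLite table standing in for the external reference table. The table name vendors, the column vendor_id, and the function is_valid are all hypothetical; how Xtractor actually connects to and queries the reference table is not described here.

```python
import sqlite3

def is_valid(conn, captured_value):
    """Return True if the captured value exists in the (hypothetical) reference table."""
    row = conn.execute(
        "SELECT 1 FROM vendors WHERE vendor_id = ? LIMIT 1",
        (captured_value,),
    ).fetchone()
    return row is not None


# Stand-in for an external reference table of known vendor IDs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendors (vendor_id TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO vendors VALUES (?)", [("V1001",), ("V1002",)])

print(is_valid(conn, "V1001"))  # True
print(is_valid(conn, "V9999"))  # False
```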


Working with Xtractor

Creating an Image Template
Xtractor uses recognition zones to store information about how it should process documents. The collection of recognition zones and associated attributes for a specific processing task represents an image template. An image template requires an associated sample TIFF image file.
The steps needed to create your first recognition template in Xtractor are outlined below:
1. Create a sample TIFF image file.
2. Open the sample TIFF image in Xtractor.
3. Define recognition zones.
4. Configure recognition zones.
5. Test the template.
6. Save the new image template.

Open the sample TIFF image using Xtractor
To open a TIFF image in Xtractor to be used as an image template, follow these steps in a newly created project (the example project is named “Practice”):
1. In Xtractor, right-click on the project Practice.
2. Select New Template.
3. Highlight the target TIFF.
4. Select Open.

The first page of the TIFF file appears in the image viewer. Notice we now have an image template named “New Template”.

5. Let’s change the name of “New Template” to “Invoices Template”. To do this, right-click on “New Template” and select Template Properties.

6. Type “Invoices Template” in the Name field.
7. Click OK.

Defining Recognition Zones
To define a recognition zone, follow these steps:
1. Click on the Barcode Zone or the OCR Zone option in the Control Panel.

Note: In this example, we selected the OCR Zone.


2. On the image loaded into the image viewer, click and drag your mouse to draw a rectangle around the information you wish to capture. As you drag, the cursor changes to crosshairs.

3. After you have dragged your cursor to enclose the data you wish to capture, release the mouse button. The Zone Properties dialog appears immediately. Select the desired index from the drop-down menu listed in the Application section and then click OK.

4. After you click OK to return to the image viewer in Xtractor, the new recognition zone is delineated with a red rectangle. Notice that the name given to the zone in Step 3 appears in the workspace viewer as a new zone definition key with the name “Name.”

Testing the recognition zones
To test the newly defined recognition zone, follow these steps:
1. Select the template you want to test.
2. Right-click on the selected template and choose the Test option. A Test Results summary appears. It lists the Zone(s) tested as well as the Confidence of that test and the Results.

Saving the image template
To save the image template, follow this step:
1. Select File > Save to save the template and all of the associated zones. Xtractor saves the entire workspace. By default, workspaces are saved in the C:\Program Files\iDatix\Xtractor\workspaces directory.

Using Xtractor to Process DocuPhase Documents
We are now ready to process documents sent to DocuPhase that match the template we just designed. ScanDox, or whichever module precedes Xtractor in the processing flow, must have set the Object Status of the target files to code “A”. Recall that “A” is the status code that typically notifies Xtractor of existing documents in DocuPhase that require processing; however, this convention can be reconfigured for your site. To process documents:
1. Highlight the project you want to test.
2. Right-click on the project and select the Run Continuously option.
3. The Processing Status monitor appears, indicating that Xtractor is continually scanning DocuPhase for “A” status documents that have been scanned in by ScanSpeed.


Note: To stop the process, select Stop Repeatedly until the Processing Status monitor closes.

For additional information on how documents are captured and routed to Xtractor, see Appendix F: Routing Documents to Xtractor.


Appendix A: Xtractor Installation

Prerequisites & Minimum Requirements
The Xtractor installation topics below are documented in the following locations:

• Minimum Requirements: See DocuPhase Prerequisites Guide
• Prerequisites: See DocuPhase Prerequisites Guide
• Xtractor Installation from .MSI (Xtractor Designer, Xtractor Service): See DocuPhase Installation & Upgrade Guide, Topic: Installing the Optional Components.

After installation, the Xtractor shortcut icon is placed on the desktop of the installed unit for convenient access to the Xtractor Designer.

Xtractor Login
To log in to Xtractor, follow these steps.
1. Navigate to and select the Xtractor desktop shortcut icon.

Note: Optionally, you may select Start > All Programs > iDatix > Xtractor.


Note: The Xtractor log in dialog displays.

Logon to Xtractor

2. Enter your User Name.
3. Enter your Password.
4. Click the Server button, then select your server using the drop-down list.
5. Define the Server name and select Load.


6. Select OK.
Note: The Alias Manager dialog displays the appropriate Server Name.

7. Click the Login button and Xtractor launches.

Appendix B – Routing Template
Routing Templates are special-purpose templates that read a value from a zone on the first page of the image, then route the document to a different DocuPhase Application based on the value read. After routing, Xtractor calls the appropriate Recognition Template to update the Index Values for that document in the new application.


Appendix C – Split then Index Template


Appendix D – Index then Split Template


Appendix E – Selection Template
Selection Templates are used when multiple document form layouts need to be processed for the same DocuPhase Application. The Selection Template reads a value from a zone on the first page of a document and chooses which Recognition Template to use for updating the DocuPhase Application index fields based on the value it reads. The value read from the documents by the Selection Template Zone can be any text, but it must be unique for each form layout to be processed and must be consistently located on all forms to be processed.
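
One way to picture the selection step is as a lookup table keyed by the value read from the selection zone. The Python sketch below is a conceptual illustration, not Xtractor's implementation: the function names, the template names, and the choice to ignore case and surrounding whitespace when comparing values are all assumptions. It also enforces the rule that each selection value must be unique to one form layout.

```python
def build_selection_table(pairs):
    """Build a lookup from selection-zone value to Recognition Template name.

    Raises ValueError if two form layouts share the same selection value,
    because the value must be unique for each layout.
    """
    table = {}
    for value, template_name in pairs:
        key = value.strip().upper()   # assumed normalization
        if key in table:
            raise ValueError(f"selection value {value!r} is not unique")
        table[key] = template_name
    return table


def select_recognition_template(zone_value, table):
    """Return the template for the value read from the zone, or None if unknown."""
    return table.get(zone_value.strip().upper())


# Hypothetical form layouts for a single DocuPhase Application.
table = build_selection_table([
    ("ACME INVOICE", "Acme Invoice Template"),
    ("GLOBEX INVOICE", "Globex Invoice Template"),
])

print(select_recognition_template("Acme Invoice ", table))  # Acme Invoice Template
print(select_recognition_template("Initech", table))        # None
```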


Appendix F: Routing Documents to Xtractor
The DocuPhase Fixed and Standard Object Status Codes listed below indicate state transitions as a document is processed by different DocuPhase Products and Services, in the sequence designed and configured for the installation.

Fixed Object Status Codes:
C - Corrupted Document in Archive
E - Submitted or Searchable in DocuPhase
I - Routed to Individual or Group InBox
P - Routed as Pending Indexing
X - Soft-Deleted Document in Archive
Y - Source Document in Archive that is Replaced by its Split-Out Documents

Standard Object Status Codes:
A - Routed to Xtractor
M - Routed to Multi-Function Service
R - Routed to Data Exchange
V - Routed to RapidDoc (a legacy product)

The Fixed Object Status Code assignments are hard-coded in the system, so they must be used as assigned. The Standard Object Status Codes, however, are flexible: administrators for each installation can reassign the single-character code values, and other letters and digits may be used by making the proper configuration settings.

For purposes of examples in this manual, these fixed and standard Object Status Code values will be used.
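
The difference between fixed and standard codes can be summarized in a short Python sketch. The two dictionaries restate the lists above; the function reassign_standard_code is hypothetical and only models the rule that a standard code may move to another single letter or digit as long as it does not collide with a fixed code or another standard code. The actual reassignment is done through DocuPhase configuration settings, which are not shown here.

```python
FIXED_CODES = {
    "C": "Corrupted Document in Archive",
    "E": "Submitted or Searchable in DocuPhase",
    "I": "Routed to Individual or Group InBox",
    "P": "Routed as Pending Indexing",
    "X": "Soft-Deleted Document in Archive",
    "Y": "Source Document in Archive that is Replaced by its Split-Out Documents",
}

STANDARD_CODES = {
    "A": "Routed to Xtractor",
    "M": "Routed to Multi-Function Service",
    "R": "Routed to Data Exchange",
    "V": "Routed to RapidDoc (a legacy product)",
}


def reassign_standard_code(standard, old, new):
    """Return a copy of the standard codes with `old` moved to `new`."""
    if len(new) != 1 or not (new.isascii() and new.isalnum()):
        raise ValueError("a status code must be a single letter or digit")
    if new in FIXED_CODES:
        raise ValueError(f"{new!r} is a fixed code and cannot be reassigned")
    if new in standard and new != old:
        raise ValueError(f"{new!r} is already used by another standard code")
    updated = dict(standard)
    updated[new] = updated.pop(old)
    return updated


site_codes = reassign_standard_code(STANDARD_CODES, "A", "1")
print(site_codes["1"])  # Routed to Xtractor
```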


The following diagram gives an example of DocuPhase document capture and routing, illustrating how TIFF documents are routed to Xtractor after successful capture, indexing and submission to DocuPhase.

Figure: Flexible routing for captured documents among iDatix products and services using hot folders and Object Status Codes, illustrating two ways documents enter the system: MFD devices with DocuPhase Services, and scanner devices with ScanDox.

Although there are many document capture and routing possibilities and combinations, the above example shows two ways documents can enter the system to perform similar processing, to illustrate and compare the two example designs.
This example scenario assumes the following design requirements:

• Capture documents via a Multi-Function Device or Scanner. All documents in this example are assumed to be TIFF-image documents.
• Some manual indexing is initially performed.
• OCR/Barcode processing is needed by Xtractor (A-status).
• ODBC database lookups using the accumulated index fields should be done as a final step before submission to DocuPhase (R-status).
• Submit as a searchable document in DocuPhase (E-status).


In the above example diagram, the following functions take place in sequence and are explained below:
1. Documents enter the system for capture via:

a. A Multi-Function Device (MFD) – Each of the MFD’s scanning profiles is configured to place its output files in the appropriate DocuPhase Application’s hot folder. DocuPhase Services monitors hot folders for incoming content files, which it places in the DocuPhase Repository file system, and creates an initial DocuPhase (eSpeed) database record linked to the content file.
In this example, DocuPhase Services sets the initial Object Status to “M”, since it is an MFD-supplied document, to route it to the Multi-Function Service.

M-Status: The Multi-Function Service is configured on success to set the Object Status to “P” to route the new document to the appropriate DocuPhase Application’s Manual Indexing Queue.

P-Status: At the DocuPhase Manual Indexing Page, an operator pulls P-Status documents from the selected DocuPhase Application’s Manual Indexing Queue and enters the necessary indexing fields.

On successful submission of the document to DocuPhase after indexing, this example configuration is designed to route the document to Xtractor for further OCR/Barcode processing by setting the Object Status to “A”.

Note: At this point in the example, the flows are the same for the MFD and Scanner/ScanDox paths, so the flow description for this path will resume with Xtractor at step 2, below.

b. A Scanner and ScanDox – Using ScanDox Profiles for the specified DocuPhase Application, content documents can be captured by scanning, by selecting and opening, or by dragging and dropping digital-content files into ScanDox. ScanDox can manually split compound documents into individual documents or join document pages prior to indexing, but these features are not required in our example.

Manual indexing can be done with ScanDox as needed, as well as ODBC lookups to update other index fields based on the currently accumulated index information.

Note: For purposes of this example, assume that not all ODBC lookups and index field updates can be done until after Xtractor has obtained certain necessary indexes, since Xtractor and Data Exchange routing is included.

Upon completing ScanDox processing by clicking a Submit button, the selected ScanDox profile is configured to set the Object Status to “A” to route the document to Xtractor.


Note: At this point in the example, the flows are the same for the MFD and Scanner/ScanDox paths so the flow description for this path will resume with Xtractor at step 2, below.


2. A-Status: Xtractor receives the TIFF document and performs the OCR and Barcode reading operations to identify the document, along with any other operations that have been configured for this type of document. The configuration for this type of document specifies:
a. On Success: the Object Status is set to “R” for Data Exchange processing.
b. On Failure: the Object Status is set to “P” to allow manual re-indexing and review by an operator.
3. R-Status: Data Exchange receives the document for database look-up and Index Field update processing based on the configuration defined for this document’s DocuPhase Application.
a. On Success: the Object Status is set to “E” for submission or return to DocuPhase as a searchable document.
b. On Failure: the Object Status is set to “P” to allow manual re-indexing and review by an operator.
4. E-Status: The document is submitted to DocuPhase, making it part of the searchable documents protected and secured by DocuPhase.
Note: This example illustrates the use of “Cascading Status” result codes, which dynamically determine the routing of a document during its lifecycle in DocuPhase.

As each case is configured in each of the components by the installation and DocuPhase administrator staff, a success and failure status is defined or defaulted for each component and its cases. As the document is processed, its routing follows a path determined dynamically by the resulting success and failure status codes.
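
The cascading status idea in this example can be sketched as a small transition table. The Python below is a conceptual model only: TRANSITIONS, next_status, and trace are hypothetical names, the table encodes just the transitions described in this appendix, and outcomes the example does not define (for instance, a failure in the Multi-Function Service) raise an error rather than guess at a default.

```python
# Object Status transitions described in the example configuration.
TRANSITIONS = {
    "M": {"success": "P"},                   # Multi-Function Service -> Manual Indexing
    "P": {"success": "A"},                   # Manual Indexing -> Xtractor
    "A": {"success": "R", "failure": "P"},   # Xtractor -> Data Exchange, or back to indexing
    "R": {"success": "E", "failure": "P"},   # Data Exchange -> searchable, or back to indexing
}


def next_status(status, succeeded):
    """Return the Object Status that follows `status` for the given outcome."""
    outcome = "success" if succeeded else "failure"
    result = TRANSITIONS.get(status, {}).get(outcome)
    if result is None:
        raise ValueError(f"no {outcome} transition defined for status {status!r}")
    return result


def trace(start, outcomes):
    """Follow a document from `start` through a sequence of success/failure outcomes."""
    path = [start]
    for succeeded in outcomes:
        path.append(next_status(path[-1], succeeded))
    return path


# MFD path: every component succeeds.
print(trace("M", [True, True, True, True]))    # ['M', 'P', 'A', 'R', 'E']
# ScanDox path: Xtractor fails once, an operator re-indexes, then all succeeds.
print(trace("A", [False, True, True, True]))   # ['A', 'P', 'A', 'R', 'E']
```

The second trace shows the failure route: a document that Xtractor cannot process returns to manual indexing with status “P” before re-entering the flow.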

For additional information on DocuPhase document capture and routing, see the DocuPhase Advanced User Manual, section: Document Capture Concepts.
