Automated Data Capture and Extraction with ChronoScan for Automated Metadata and Classification

Post on 16-Jul-2015


With ChronoScan

Capture and Extraction: Where ECM Begins

Capture means many things when speaking of Document/Records Management or Enterprise Content Management.

AIIM (Association for Information and Image Management)

“Capture boils down to entering content into the system.”

Extraction is an important element of Capture…

By extraction we mean pulling the important information from the content to use for classification or taxonomy purposes, creation of the appropriate metadata or tags, and more.

So Why Are Capture and Extraction So Important?

All Information Governance and Content Management Depends on Correct Metadata

You have to correctly identify the document or content to:

• Find key information on demand

• Apply the correct data security/privacy rules

• Determine the correct data retention

• Protect your entity regarding eDiscovery/legal compliance issues

• Turn your content or knowledge into a competitive advantage

ChronoScan is:

A comprehensive suite of software for document scanning, data extraction and integration into your ECM, CMIS-compliant, or line-of-business database.

Let’s categorize capture by what we’ll call the Exterior and the Interior.

Exterior: the capture of the “thing”:

• Scans

• Faxes

• Emails

• PrintStreams

Interior: the capture of the content of the “thing”:

Actual data and information extracted from the “thing”, such as invoice number, line items, customer number, vendor number, patient name…whatever your information concerns.

This presentation looks at the “interior” capture accomplished by ChronoScan’s “extraction” features.

ChronoScan’s Extraction Features We’ll Examine

OCR technology is the foundation for many of ChronoScan’s auto extraction capabilities.

Using sophisticated OCR technologies such as Zonal OCR and Grid OCR, ChronoScan can extract data to classify the document and create indexes (metadata or tags) from structured and unstructured documents.

Zonal OCR Capture

Extract data only from the area of your document where your important information is found, for fast, automatic data extraction.
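Zonal extraction can be pictured as filtering OCR output by position. The sketch below is illustrative only and is not ChronoScan's API: the `Word` record, the `words_in_zone` helper and the zone coordinates are all assumptions, standing in for whatever word boxes an OCR engine returns.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR word with its bounding box in page pixels (hypothetical structure)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def words_in_zone(words, zone):
    """Join the words whose centre lies inside zone = (left, top, right, bottom)."""
    left, top, right, bottom = zone
    hits = [
        wd for wd in words
        if left <= wd.x + wd.w / 2 <= right and top <= wd.y + wd.h / 2 <= bottom
    ]
    # Simple reading order: top to bottom, then left to right.
    hits.sort(key=lambda wd: (wd.y, wd.x))
    return " ".join(wd.text for wd in hits)


# Made-up OCR output for one invoice page.
page = [
    Word("Invoice", 400, 40, 60, 12),
    Word("INV-1042", 470, 40, 70, 12),
    Word("Subtotal", 380, 600, 60, 12),
    Word("118.50", 470, 600, 50, 12),
]

# Assumed zone where this layout prints the invoice number.
INVOICE_NUMBER_ZONE = (460, 30, 560, 60)

print(words_in_zone(page, INVOICE_NUMBER_ZONE))
```

Only the words inside the zone are read, which is why a zone works well when the field always sits in the same place on the page.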

Use Dynamic Text Anchors to link to moving text using constant or variable patterns, thus accommodating unstructured documents.

Here, ChronoScan finds the word “subtotal” and captures the data to the right. Extracted data can be further manipulated and used for validation.
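The "find the anchor, read the value beside it" idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not how ChronoScan implements anchors: `Word`, `value_right_of` and the `line_tolerance` parameter are hypothetical names, and the word positions are invented.

```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR word with the position of its top-left corner (hypothetical structure)."""
    text: str
    x: float
    y: float


def value_right_of(words, anchor_pattern, line_tolerance=5.0):
    """Return the text of the nearest word to the right of the first matching anchor."""
    anchor_re = re.compile(anchor_pattern, re.IGNORECASE)
    for anchor in words:
        # Ignore a trailing colon so "Subtotal:" and "Subtotal" both match.
        if not anchor_re.fullmatch(anchor.text.rstrip(":")):
            continue
        same_line = [
            w for w in words
            if w is not anchor
            and abs(w.y - anchor.y) <= line_tolerance
            and w.x > anchor.x
        ]
        if same_line:
            return min(same_line, key=lambda w: w.x).text
    return None


# The anchor sits in a different place on each page, but the rule is the same.
page_a = [Word("Subtotal:", 380, 600), Word("118.50", 470, 601),
          Word("Total", 380, 620), Word("128.00", 470, 620)]
page_b = [Word("SUB-TOTAL", 50, 300), Word("42.00", 200, 298)]

print(value_right_of(page_a, r"sub-?total"))
print(value_right_of(page_b, r"sub-?total"))
```

Because the lookup is relative to the anchor word rather than to fixed page coordinates, the same rule keeps working when the text moves between documents.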

Optimize for your documents with multiple parameters like image processing, OCR engine, type of data to find, regular expression validation and more.
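Regular expression validation can be illustrated with a small check on extracted values. The field names and patterns below are assumptions chosen for the example (an "INV-" prefix and two-decimal amounts), not ChronoScan settings.

```python
import re
from decimal import Decimal

# Hypothetical per-field rules; real formats depend on your documents.
RULES = {
    "invoice_number": re.compile(r"INV-\d{4,}"),
    "amount": re.compile(r"\d{1,3}(,\d{3})*\.\d{2}|\d+\.\d{2}"),
}


def validate(field, raw):
    """Return a cleaned value if raw matches the field's rule, else None."""
    text = raw.strip()
    if not RULES[field].fullmatch(text):
        return None
    if field == "amount":
        # Drop thousands separators before converting to a number.
        return Decimal(text.replace(",", ""))
    return text


print(validate("invoice_number", "INV-1042"))
print(validate("amount", " 1,118.50 "))
print(validate("amount", "l18.5O"))  # OCR misread letters for digits, so it is rejected
```

A value that fails its rule can then be flagged for manual review instead of being written into the index.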

Grid OCR is used for Line Item Extraction and Advanced Report Breakdown or Dismount.

With Line Item Extraction, extract and manipulate line data found on such forms as invoices or delivery tickets.
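As a rough illustration of line item extraction, the sketch below parses OCR text lines into structured rows and cross-checks each row's arithmetic. It assumes a simple "quantity, description, unit price, total" column layout; the `LINE_ITEM` pattern and `extract_line_items` helper are hypothetical and do not describe ChronoScan's Grid OCR.

```python
import re
from decimal import Decimal

# Assumed column layout: quantity, description, unit price, line total.
LINE_ITEM = re.compile(
    r"^\s*(?P<qty>\d+)\s+(?P<desc>.+?)\s+(?P<unit>\d+\.\d{2})\s+(?P<total>\d+\.\d{2})\s*$"
)


def extract_line_items(ocr_lines):
    """Parse the lines that look like line items and check qty * unit price == total."""
    items = []
    for line in ocr_lines:
        m = LINE_ITEM.match(line)
        if not m:
            continue  # headers, subtotals and other non-item lines
        qty = int(m["qty"])
        unit = Decimal(m["unit"])
        total = Decimal(m["total"])
        items.append({
            "qty": qty,
            "description": m["desc"],
            "unit_price": unit,
            "total": total,
            "consistent": qty * unit == total,
        })
    return items


lines = [
    "Qty  Description        Unit    Total",
    "2    Widget, blue       10.00   20.00",
    "1    Gasket set         4.50    4.50",
    "3    Bracket            2.00    7.00",
    "Subtotal                        31.50",
]

for item in extract_line_items(lines):
    print(item)
```

The `consistent` flag is one example of manipulating line data after extraction: the third row does not add up, so it would be a candidate for review.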

Advanced Report Breakdown or Dismount

Convert complex PDF or scanned OCR reports into a structured data format. With this unique feature, ChronoScan is able to break down complex reports automatically, splitting each record into an independent processing unit. The software can adapt extraction to different rules and page limits to break down and structure visually complex documents into a comprehensible data file (CSV/XLS).
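To make the idea of a report breakdown concrete, here is a minimal sketch that splits report text into records and writes them out as CSV. It assumes each record starts with a "Customer:" line and holds "Key: value" fields; the report content, `split_records` and `to_csv` are invented for illustration and are not ChronoScan functions.

```python
import csv
import io
import re

# Assumption: a new record begins at every "Customer:" line.
RECORD_START = re.compile(r"^Customer:\s*(?P<customer>.+)$")
FIELD = re.compile(r"^\s*(?P<key>[A-Za-z ]+?):\s*(?P<value>.+?)\s*$")


def split_records(report_lines):
    """Group "Key: value" lines into one dict per record."""
    records, current = [], None
    for line in report_lines:
        if RECORD_START.match(line):
            current = {}
            records.append(current)
        if current is None:
            continue  # page header before the first record
        m = FIELD.match(line)
        if m:
            current[m["key"].strip()] = m["value"]
    return records


def to_csv(records, columns):
    """Render the records as CSV text with the given column order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


report = """\
ACME Monthly Statement            Page 1
Customer: Alpha Ltd
  Balance: 120.00
  Due: 2015-08-01
Customer: Beta Inc
  Balance: 75.50
  Due: 2015-08-15
""".splitlines()

print(to_csv(split_records(report), ["Customer", "Balance", "Due"]))
```

Each record becomes one CSV row, which is the "independent processing unit" idea in miniature.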

Nuance OCR Plug-In Option

The world's most accurate and robust OCR available.

• Dramatically increases zonal OCR confidence

• Improves OCR trigger precision

• Better and faster background OCR increases precision on regular expression rules

• Better image orientation detection

Barcodes are tried and true information tags.

1. Read Barcodes from Images

Extract 1D/2D barcodes from your documents and assign any part of them to fields for indexing, database export, TXT report, file naming, etc.

2. Process Captured Data

Assign custom actions based on the barcode values, such as setting field values or splitting documents.

Barcodes can be used on separator or slip sheets to designate where documents should end and begin when a stack of documents is scanned. The barcode information on the separator sheets can also be extracted for indexing, naming and routing purposes.
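The separator-sheet workflow can be sketched as a simple pass over the scanned stack. The example assumes a barcode reader has already produced one value (or none) per page and that separator sheets carry a "SEP-" prefix; both the prefix and the `split_on_separators` helper are hypothetical.

```python
def split_on_separators(pages, separator_prefix="SEP-"):
    """Split a stack into documents.

    pages is a list of (page_id, barcode_value_or_None) pairs in scan order.
    A page whose barcode starts with separator_prefix starts a new document,
    and the rest of its barcode becomes that document's index value.
    """
    documents = []
    current = None
    for page_id, barcode in pages:
        if barcode and barcode.startswith(separator_prefix):
            current = {"index_value": barcode[len(separator_prefix):], "pages": []}
            documents.append(current)
            continue  # the separator sheet itself is not kept
        if current is None:
            # Pages scanned before any separator go into an unnamed document.
            current = {"index_value": None, "pages": []}
            documents.append(current)
        current["pages"].append(page_id)
    return documents


stack = [
    ("p1", "SEP-INV1042"),
    ("p2", None),
    ("p3", None),
    ("p4", "SEP-INV1043"),
    ("p5", None),
]

for doc in split_on_separators(stack):
    print(doc)
```

The index value taken from each separator sheet can then drive indexing, file naming or routing for the pages that follow it.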

Automate PDF Processing Tasks

Automatically extract fields and tables from PDF files.

ChronoScan imports PDF files with native text so you can easily index the fields you want and export your data to TXT, CSV, Excel, Word, HTML, and OLE/ODBC databases to feed your indexing or database application.

Automatic Document Learning:

ChronoScan learns the Document Type using comprehensive layout recognition features to “remember” user actions. Each document type can be assigned to its own template or job to customize OCR areas, settings and actions.

Result: Scan or import documents together, without prior preparation, to automate repetitive tasks and improve data input.

Training ChronoScan to identify documents with Intelligent Document Recognition to automatically capture information

Type 1 Documents

Type 2 Documents
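The general idea of learning document types and then routing new pages to the matching template can be shown with a deliberately simple stand-in. The sketch below compares the words on a page with stored keyword signatures using Jaccard similarity; this is a swapped-in technique for illustration, not ChronoScan's layout recognition, and `TemplateStore`, its methods and the threshold are assumptions.

```python
def jaccard(a, b):
    """Share of words the two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class TemplateStore:
    """Remembers one keyword signature per document type (hypothetical helper)."""

    def __init__(self, threshold=0.5):
        self.templates = {}
        self.threshold = threshold

    def learn(self, doc_type, words):
        """Store the words seen on a sample page of this document type."""
        self.templates[doc_type] = {w.lower() for w in words}

    def classify(self, words):
        """Return the best-matching document type, or None if nothing is close enough."""
        seen = {w.lower() for w in words}
        best_type, best_score = None, 0.0
        for doc_type, signature in self.templates.items():
            score = jaccard(seen, signature)
            if score > best_score:
                best_type, best_score = doc_type, score
        return best_type if best_score >= self.threshold else None


store = TemplateStore()
store.learn("invoice", ["Invoice", "Number", "Subtotal", "Total", "Due"])
store.learn("delivery_ticket", ["Delivery", "Ticket", "Received", "By", "Driver"])

print(store.classify(["invoice", "number", "subtotal", "total", "terms"]))
print(store.classify(["memo", "hello"]))
```

Once a page is classified, the matching type would select which zones, rules and actions to apply, so mixed batches can be scanned together.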

Once data is identified, it can be used for many purposes besides indexing or metadata creation:

• Validation

• File Naming

• File Splitting

• Routing

• Classification

• ECM Integration

• Bookmarking

• Metadata
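File naming is the easiest of these uses to show in code. The sketch below builds a file name from extracted metadata and replaces characters that file systems commonly reject; the naming pattern, field names and `build_filename` helper are assumptions for the example.

```python
import re


def build_filename(pattern, metadata, extension="pdf"):
    """Fill pattern with metadata values and make the result safe as a file name."""
    name = pattern.format(**metadata)
    # Replace runs of whitespace and characters not allowed in Windows file names.
    name = re.sub(r'[\\/:*?"<>|\s]+', "_", name).strip("._")
    return f"{name}.{extension}"


metadata = {
    "vendor": "Acme / West",
    "invoice_number": "INV-1042",
    "date": "2015-07-16",
}

print(build_filename("{vendor}_{invoice_number}_{date}", metadata))
```

The same metadata dictionary could equally feed a routing rule or an ECM upload, which is why getting the extraction right matters for every step after it.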

Relying on manual scrutiny to bring this “wild content” under control simply will not work. The failure of humans to consistently tag and classify new documents as they are filed has created the mess in the first place.

© AIIM 2014, www.aiim.org

Remember, Everything Depends on Correct Metadata

The Key: Automatic Metadata Creation

With ChronoScan

For more on: • Automated document classification • Automated metadata creation • Batch document processing • Batch PDF mining • Batch text mining • Batch TIF mining • Text mining • Extracting metadata • Data extraction from unstructured data • Intelligent data capture • Data extraction • Using regex to extract data • Document scanning • Extracting data • Extract meta data • Scanner software • Barcode recognition • OCR software • Capture tutorial • PDF scanning • Scanning software • Indexing • Document indexing • Automated capture • Meta data • Docufi • Imageramp • ChronoScan • Data capture • What is ChronoScan • US ChronoScan reseller • ChronoScan in the US

www.docufi.com

info@docufi.com

Copyright ©2014

Get Started With Us

Our solutions include ImageRamp Batch for folder processing and ChronoScan Capture for advanced data mining and barcode requirements.

Built on over 30 years’ experience in the Document Imaging and Capture market

DocuFi is a premier ChronoScan Solutions Partner offering extensive professional services to configure the system to your specific requirements. DocuFi has been providing custom solutions for the health care, financial services, retail, education and other markets since 2010.