Automated Data Capture and Extraction with ChronoScan for Automated Metadata and Classification

With ChronoScan

Capture and Extraction: Where ECM Begins

Capture means many things when speaking of Document/Records Management or Enterprise Content Management.

AIIM (Association for Information and Image Management):

“Capture boils down to entering content into the system.”

Extraction is an important element of Capture…

By extraction we mean pulling the important information from the content to use for classification or taxonomy purposes, creation of the appropriate metadata or tags, and more.


So Why Are Capture and Extraction So Important?

All Information Governance and Content Management Depends on Correct Metadata

You have to correctly identify the document or content to:

• Find key information on demand

• Apply the correct data security/privacy rules

• Determine the correct data retention

• Protect your entity regarding eDiscovery/legal compliance issues

• Turn your content or knowledge into a competitive advantage

ChronoScan is a comprehensive suite of software for document scanning, data extraction, and integration into your ECM, CMIS-compliant system, or line-of-business database.

Let’s categorize capture by what we’ll call the Exterior and the Interior.

Exterior: The capture of the “thing”:

• Scans

• Faxes

• Emails

• PrintStreams

Interior: The capture of the content of the “thing”:

Actual data and information extracted from the “thing,” such as invoice number, line items, customer number, vendor number, patient name…whatever your information concerns.

This presentation looks at the “interior” capture accomplished by ChronoScan’s “extraction” features.

ChronoScan’s Extraction Features We’ll Examine

OCR technology is the foundation for many of ChronoScan’s auto extraction capabilities.

Using sophisticated OCR technologies such as Zonal OCR and Grid OCR, ChronoScan can extract data to classify the document and create indexes (metadata or tags) from structured and unstructured documents.

Zonal OCR Capture

Extract data only from the area of your document where your important information is found, for fast, automatic data extraction.

Use Dynamic Text Anchors to link to moving text using constant or variable patterns, thus accommodating unstructured documents.


Here, ChronoScan finds the word “subtotal” and captures the data to the right. Extracted data can be further manipulated and used for validation.

Optimize for your documents with multiple parameters like image processing, OCR engine, type of data to find, regular expression validation and more.
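The “subtotal” example can be illustrated with a minimal Python sketch. It is not ChronoScan’s implementation or API: the `Word` structure, the `capture_right_of_anchor` helper, the sample coordinates, and the amount pattern are all hypothetical, and the sketch assumes an OCR engine has already produced words with positions.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """One OCR'd word with the top-left corner of its box (hypothetical structure)."""
    text: str
    x: float
    y: float


def capture_right_of_anchor(words: List[Word], anchor: str,
                            pattern: str, y_tolerance: float = 5.0) -> Optional[str]:
    """Find the anchor word, then return the nearest word to its right on the
    same line that fully matches the validation regex, or None if none does."""
    for anchor_word in words:
        if anchor_word.text.strip(":").lower() != anchor.lower():
            continue
        # Candidates: same text line (within tolerance) and right of the anchor.
        same_line = [w for w in words
                     if abs(w.y - anchor_word.y) <= y_tolerance and w.x > anchor_word.x]
        for candidate in sorted(same_line, key=lambda w: w.x):
            if re.fullmatch(pattern, candidate.text):
                return candidate.text
    return None


# Sample OCR output (assumed). The anchor moves with the text, so no fixed zone is needed.
page = [
    Word("Item", 40, 100), Word("Widget", 120, 100), Word("19.99", 400, 100),
    Word("Subtotal:", 300, 220), Word("$", 380, 221), Word("1,249.50", 400, 222),
    Word("Tax", 300, 240), Word("99.96", 400, 240),
]

AMOUNT = r"\d{1,3}(,\d{3})*\.\d{2}"  # regular expression used for validation
subtotal = capture_right_of_anchor(page, "subtotal", AMOUNT)
print(subtotal)
```

The regular expression doubles as validation: the stray “$” token sits closer to the anchor but is skipped because it does not look like an amount.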


Grid OCR is used for Line Item Extraction and Advanced Report Breakdown or Dismount.

With Line Item Extraction, extract and manipulate line data found on such forms as invoices or delivery tickets.
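As a rough illustration of grid-style line item extraction, the Python sketch below assigns OCR words to rows by vertical position and to columns by horizontal range. The column layout, the `extract_line_items` helper, and the sample words are assumptions made for this example; they do not describe how ChronoScan’s Grid OCR is configured or implemented.

```python
from typing import Dict, List, Tuple

# Hypothetical OCR output: (text, x, y) for each word inside the line-item area.
OcrWord = Tuple[str, float, float]

# Hypothetical grid definition: column name -> (left edge, right edge).
COLUMNS: Dict[str, Tuple[float, float]] = {
    "sku": (0, 100),
    "description": (100, 300),
    "qty": (300, 360),
    "unit_price": (360, 460),
}


def extract_line_items(words: List[OcrWord], row_height: float = 20.0) -> List[Dict[str, str]]:
    """Group words into rows by vertical position, then into columns by x-range."""
    rows: Dict[int, Dict[str, List[Tuple[float, str]]]] = {}
    for text, x, y in words:
        # Simple fixed-height bucketing; real forms may need tolerance near row edges.
        row_index = int(y // row_height)
        for name, (left, right) in COLUMNS.items():
            if left <= x < right:
                rows.setdefault(row_index, {}).setdefault(name, []).append((x, text))
                break
    items = []
    for row_index in sorted(rows):
        cells = rows[row_index]
        # Join multi-word cells in left-to-right reading order.
        items.append({name: " ".join(t for _, t in sorted(cells.get(name, [])))
                      for name in COLUMNS})
    return items


words = [
    ("A-100", 10, 5), ("Blue", 110, 6), ("widget", 150, 6), ("2", 310, 5), ("19.99", 370, 5),
    ("B-200", 10, 25), ("Gasket", 110, 26), ("10", 310, 25), ("0.75", 370, 26),
]
items = extract_line_items(words)
# Manipulate the extracted line data, e.g. compute an extended amount per line.
totals = [round(int(i["qty"]) * float(i["unit_price"]), 2) for i in items]
print(items)
print(totals)
```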

Advanced Report Breakdown or Dismount (using sophisticated Grid OCR)

Convert complex PDF or scanned OCR reports into a structured data format. With this unique feature, ChronoScan breaks down complex reports automatically, splitting every different record as an independent processing unit. Extraction can be adapted to different rules and page limits to break down and structure visually complex documents into a compressible data file (CSV/XLS).
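The idea of splitting a report into independent records and writing them to CSV can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the report layout, the `RECORD_START` and `DETAIL` patterns, and the helper names are invented for the example and are not ChronoScan rules or settings.

```python
import csv
import io
import re
from typing import Dict, List

# Hypothetical report text, as it might come out of OCR or a text-based PDF.
REPORT = """\
ACCOUNT: 1001  NAME: Acme Corp
  Balance: 250.00
  Status: Open
ACCOUNT: 1002  NAME: Birch LLC
  Balance: 75.10
  Status: Closed
"""

RECORD_START = re.compile(r"^ACCOUNT:\s*(?P<account>\d+)\s+NAME:\s*(?P<name>.+)$")
DETAIL = re.compile(r"^\s*(?P<key>Balance|Status):\s*(?P<value>.+)$")


def break_down(report: str) -> List[Dict[str, str]]:
    """Split the report at each record header and collect that record's fields."""
    records: List[Dict[str, str]] = []
    for line in report.splitlines():
        header = RECORD_START.match(line)
        if header:
            records.append({"account": header["account"], "name": header["name"].strip()})
            continue
        detail = DETAIL.match(line)
        if detail and records:
            # Detail lines belong to the most recently started record.
            records[-1][detail["key"].lower()] = detail["value"].strip()
    return records


def to_csv(records: List[Dict[str, str]]) -> str:
    """Serialize the records as CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["account", "name", "balance", "status"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = break_down(REPORT)
csv_text = to_csv(records)
print(csv_text)
```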

Nuance OCR Plug-In Option

The world's most accurate and robust OCR available.

• Dramatically increases zonal OCR confidence

• Improves OCR trigger precision

• Better and faster background OCR increases precision on regular expression rules

• Better image orientation detection

Barcodes are tried and true information tags.

Read Barcodes from Images

Extract 1D/2D barcodes from your documents and assign any part of them to fields for indexing, database export, TXT report, file naming, etc.

Process Captured Data

Assign custom actions based on the barcoded values, such as setting field values or splitting documents.

Barcodes can be used on separator or slip sheets to designate where documents should end and begin when a stack of documents is scanned. The barcode information on the separator sheets can also be extracted for indexing, naming and routing purposes.
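Separator-sheet splitting can be sketched as follows. The example assumes a barcode reader has already returned a value (or nothing) for each scanned page; the `Document` class, the `split_on_separators` helper, and the file-naming scheme are hypothetical and only illustrate the logic, not ChronoScan’s own behavior.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical scan result: (page image name, barcode value or None).
# In this sketch, any page that carries a barcode is treated as a separator sheet.
Page = Tuple[str, Optional[str]]


@dataclass
class Document:
    index_value: str                      # barcode value taken from the separator sheet
    pages: List[str] = field(default_factory=list)

    @property
    def file_name(self) -> str:
        # Use the barcode value for naming (assumed naming scheme).
        return f"{self.index_value}.pdf"


def split_on_separators(pages: List[Page]) -> List[Document]:
    """Start a new document at each separator sheet and index it by the barcode."""
    documents: List[Document] = []
    for name, barcode in pages:
        if barcode is not None:
            # Separator sheet: begin a new document; the sheet itself is dropped.
            documents.append(Document(index_value=barcode))
        elif documents:
            documents[-1].pages.append(name)
        # Pages scanned before the first separator are ignored in this sketch.
    return documents


stack = [
    ("p1.tif", "INV-0001"), ("p2.tif", None), ("p3.tif", None),
    ("p4.tif", "INV-0002"), ("p5.tif", None),
]
documents = split_on_separators(stack)
for doc in documents:
    print(doc.file_name, doc.pages)
```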

Automate PDF Processing Tasks

Automatically extract fields and tables from PDF files.

ChronoScan imports PDF files with native text so you can easily index the fields you want and export your data to TXT, CSV, Excel, Word, HTML, and OLE/ODBC databases to easily feed your indexing or database application.

Automatic Document Learning

Training ChronoScan to identify documents with Intelligent Document Recognition to automatically capture information.

ChronoScan learns the Document Type using comprehensive layout recognition features to “remember” user actions. Every different document type can be assigned to a different template or job to customize OCR areas, settings and actions.

Result: Scan/import documents together, without previous preparation, to automate repetitive tasks and improve data input.

[Figure: examples of Type 1 Documents and Type 2 Documents]
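To make the idea of mapping each recognized document type to its own template concrete, here is a deliberately simplified Python sketch. It substitutes keyword overlap for ChronoScan’s layout recognition, which it does not attempt to reproduce; the `KNOWN_TYPES` table, template names, and `classify` helper are all hypothetical.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical "learned" document types: characteristic words plus the name of
# the template (OCR zones and settings) to apply once the type is recognized.
KNOWN_TYPES: Dict[str, Dict[str, object]] = {
    "invoice": {"keywords": {"invoice", "subtotal", "due"}, "template": "invoice_zones"},
    "delivery_ticket": {"keywords": {"delivery", "received", "carrier"},
                        "template": "ticket_zones"},
}


def classify(ocr_text: str, min_score: float = 0.5) -> Optional[str]:
    """Return the best-matching document type, or None if nothing matches well."""
    words: Set[str] = set(re.findall(r"[a-z]+", ocr_text.lower()))
    best_type: Optional[str] = None
    best_score = 0.0
    for doc_type, spec in KNOWN_TYPES.items():
        keywords = spec["keywords"]
        # Score: fraction of the type's keywords that appear in the OCR text.
        score = len(keywords & words) / len(keywords)
        if score > best_score:
            best_type, best_score = doc_type, score
    return best_type if best_score >= min_score else None


def template_for(ocr_text: str) -> Optional[str]:
    """Pick the template assigned to the recognized type, if any."""
    doc_type = classify(ocr_text)
    return None if doc_type is None else str(KNOWN_TYPES[doc_type]["template"])


# A mixed batch can be scanned together; each page is routed to its own template.
batch = [
    "INVOICE #123 Subtotal 10.00 Amount due 10.00",
    "Delivery ticket. Carrier: XYZ. Received by J. Smith",
    "Meeting notes",
]
assigned = [template_for(text) for text in batch]
print(assigned)
```

Documents that match no known type come back as `None`, which is where a user would step in and teach the system a new type.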

Once data is identified, it can be used for many purposes besides indexing or metadata creation:

• Validation

• File Naming

• File Splitting

• Routing

• Classification

• ECM Integration

• Bookmarking

• Metadata


Remember, Everything Depends on Correct Metadata

Relying on manual scrutiny to bring this “wild content” under control simply will not work. The failure of humans to consistently tag and classify new documents as they are filed has created the mess in the first place.


The Key: Automatic Metadata Creation

With ChronoScan

For more on: • Automated document classification • Automated metadata creation • Batch document processing • Batch PDF mining • Batch text mining • Batch TIF mining • Text mining • Extracting metadata • Data extraction from unstructured data • Intelligent data capture • Data extraction • Using regex to extract data • Document scanning • Extracting data • Extract meta data • Scanner software • Barcode recognition • OCR software • Capture tutorial • PDF scanning • Scanning software • Indexing • Document indexing • Automated capture • Meta data • DocuFi • ImageRamp • ChronoScan • Data capture • What is ChronoScan • US ChronoScan reseller • ChronoScan in the US

Get Started With Us

Our solutions include ImageRamp Batch for folder processing and ChronoScan Capture for advanced data mining and barcode requirements.

Built on over 30 years’ experience in the Document Imaging and Capture market

DocuFi is a premier ChronoScan Solutions Partner offering extensive professional services to configure the system to your specific requirements. DocuFi has been providing custom solutions to the health care, financial services, retail, education, and other markets since 2010.

Learn More:
