Konstantin Anisimovich Tatiana Danielyan News/june19/State-of-the... · 2019. 9. 16. · Neural...

State-of-the-art in neural networks implemented by ABBYY R&D

Konstantin Anisimovich Tatiana Danielyan

A brief overview of this presentation

Power of Deep Learning

Image Processing

Document analysis and capture

Traditional Machine Learning vs. Deep learning.

Traditional Machine Learning

Input -> Feature extraction -> Classification -> Output (Plane / Not Plane)

Deep Learning

Input -> Feature extraction + Classification -> Output (Plane / Not Plane)

Deep learning is a class of machine learning algorithms that use a graph of multiple layers of nonlinear processing units for feature extraction and transformation. Each successive layer uses the output of the previous layer as input.
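To make the layer-by-layer idea concrete, here is a minimal sketch of a forward pass through a tiny network. The weights, the 3-2-1 layer sizes, and the ReLU and sigmoid choices are assumptions picked for illustration; they are not taken from any ABBYY model.

```python
import math

def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum plus bias for each unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    """Element-wise nonlinearity; without it, stacked layers collapse into one linear map."""
    return [max(0.0, v) for v in values]

def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))

def forward(pixels, layers):
    """Each layer consumes the output of the previous layer."""
    activation = pixels
    for weights, biases in layers[:-1]:
        activation = relu(dense(activation, weights, biases))
    weights, biases = layers[-1]
    return sigmoid(dense(activation, weights, biases)[0])  # probability of "Plane"

# Hypothetical hand-picked weights: 3 inputs -> 2 hidden units -> 1 output.
LAYERS = [
    ([[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]], [0.0, 0.1]),
    ([[1.2, -0.7]], [0.05]),
]

probability = forward([0.9, 0.1, 0.4], LAYERS)
label = "Plane" if probability >= 0.5 else "Not Plane"
print(label, round(probability, 3))
```

In a trained network the weights are learned from data, so feature extraction and classification are optimized together rather than designed by hand.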

Deep learning

Advantages of Deep Learning

Learning high-level features from data in an incremental manner.

No need for domain expertise or hand-crafted feature extraction.

Can solve a problem end to end, whereas traditional machine learning techniques need the problem to be broken down into parts that are solved separately, with their results combined at a final stage.

Power of Deep Learning

We use Deep Learning in all our AI components

Deep learning: why deep neural networks are not always the best solution

Disadvantages of Deep Learning

When “natural” features exist, SVM and GBDT provide comparable or better results using fewer computing resources.

It does not work well with small datasets.

Deep networks are largely a “black box”: even now, researchers do not fully understand their inner workings.

Content

Image Processing

1. Image processing, correction and improvement

2. Image classification

3. Object detection

4. Text detection (Find text)

5. Neural network barcode detection

Image processing, correction and improvement

• Crop launch criterion: a CNN classifier that decides whether the Crop mechanism should be called.

• Crop: gradient boosting to estimate Crop hypotheses. We plan to integrate a CNN approach to correct more complicated geometric distortions (documents with folds and creases) and to support new document types (IDs).

• Improvement: CNNs for removing color elements and for detecting and removing photometric distortions in images (noise, glare, blur).

TASK: To correct geometric and photometric distortions in images of various document types so that they look like an ideal scan, and to improve the resulting OCR quality.

Image processing, correction and improvement: Crop launch criterion. Examples

Normal crop process

The crop module receives the image data, processes it, and outputs the cropped document (image -> crop module).

Redundant crop process

False pass

The captured image contains an already cropped document or other content (no document).

Issues:

• It takes additional time to process, which can be critical in a mobile application.

• An improper crop result can appear.

Solution: Run the Crop launch criterion module, which classifies the input image, and run the crop module only if it is necessary. This module should work much faster than the crop module.
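The gating logic described above can be sketched as follows. Everything here is a hypothetical stand-in: `document_fraction` replaces what a real CNN classifier would infer from pixels, and the 0.95 threshold is an arbitrary illustrative value.

```python
from dataclasses import dataclass

@dataclass
class Image:
    name: str
    # Hypothetical stand-in for the classifier input: share of the frame covered by a document.
    document_fraction: float

def needs_crop(image, threshold=0.95):
    """Stand-in for the fast classifier that decides whether Crop should run."""
    if image.document_fraction == 0.0:
        return False  # no document in the frame: cropping would only cost time
    return image.document_fraction < threshold  # already cropped documents pass through

def crop(image):
    """Stand-in for the slower crop module."""
    return Image(image.name + " (cropped)", 1.0)

def preprocess(image):
    """Run the cheap criterion first and the expensive crop only when needed."""
    return crop(image) if needs_crop(image) else image

for sample in (Image("photo of invoice", 0.6), Image("flat scan", 1.0), Image("landscape", 0.0)):
    print(preprocess(sample).name)
```

The design only pays off if the criterion is much cheaper than the crop itself, which is why the slide asks for a module that works much faster than the crop module.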

Image processing, correction and improvement: Crop. Examples

Original photo image Preprocessed image

Image semantic segmentation

Image processing, correction and improvement: Image semantic segmentation. Examples

Image processing, correction and improvement: Image semantic segmentation. NN architecture

Facts about this NN:

• CNN architecture

• 10-11 layers

• 1 output with a multi-class answer (edges, text regions, photos, stamps)

• Training dataset: 2000 real images

• Augmentation during training gives 10 times more samples
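As an illustration of how augmentation multiplies a training set, the sketch below produces ten variants of one labeled image. The specific transforms (rotation by multiples of 90 degrees and additive noise) are assumptions; the slides do not say which augmentations are used. For segmentation, geometric transforms must be applied to the label mask as well, while photometric changes touch only the image.

```python
import random

def rotate_90(grid):
    """Rotate a 2D list clockwise by 90 degrees."""
    return [list(row) for row in zip(*grid[::-1])]

def add_noise(image, rng, amplitude=10):
    """Photometric change: jitter each pixel and clamp to the 0..255 range."""
    return [[min(255, max(0, p + rng.randint(-amplitude, amplitude))) for p in row]
            for row in image]

def augment(image, mask, copies, seed=0):
    """Return `copies` randomized (image, mask) pairs derived from one labeled sample."""
    rng = random.Random(seed)
    samples = []
    for _ in range(copies):
        img, msk = image, mask
        for _ in range(rng.randrange(4)):  # same geometric transform for image and labels
            img, msk = rotate_90(img), rotate_90(msk)
        samples.append((add_noise(img, rng), msk))  # noise is applied to the image only
    return samples

# Hypothetical 2x3 grayscale image and its per-pixel class mask (0 = background, 1 = text).
IMAGE = [[250, 20, 245], [240, 30, 235]]
MASK = [[0, 1, 0], [0, 1, 0]]

samples = augment(IMAGE, MASK, copies=10)
print(len(samples), "augmented samples from one image")
```

With ten copies per image, a set of 2000 real images yields 20 000 training samples.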

Image classification

• Classification: a CNN for image classification by document type (letters, invoices, business cards, etc.) that also extracts document “features” for future ML models (transfer learning).

• Clustering: clustering is based on the classification features.

TASK: To detect the class of the processed document in order to choose the correct image preprocessing scenario and set of recognition settings. Clustering of documents is needed to process different classes with different flexible descriptions.

Image classification. Examples

A4 document Double page Receipt Contract Insurance policy Invoice

Image classification. NN architecture

Facts about this NN:

• CNN architecture

• 18 layers

• Training dataset: public datasets with different images

• 10 000-15 000 real images of printed documents

Document objects detection (stamps, signatures, logos)

• Semantic document segmentation: a CNN for semantic document segmentation

TASK: To improve the quality of document analysis by excluding areas with stamps, signatures and logos from the analysis.

To detect areas with color stamps and signatures to be able to filter color elements and to save black printed text for further OCR.

Document objects detection (stamps, signatures, logos). Examples (DA result without object detection)

• Incorrect text order; further natural language text analysis is impossible

• Missed text: some text is inside a picture block and cannot be recognized

Document objects detection (stamps, signatures, logos). Examples (DA result with object detection)

• Correct text order; further natural language text analysis is possible

• All text is inside text blocks and can be recognized

Text Detection (on real environment scenes and ID documents)

• Text detection: a CNN for real environment scenes (optimized model for mobile devices, 3.5 Mb).

• Text detection: a modified architecture for ID documents, trained on various ID images.

TASK: To detect text in a mobile photo in order to recognize it and, for ID documents, to identify its type (Name, Date, Document number, etc.) for data capture scenarios.

Text Detection (on real environment scenes and ID documents). Examples

[Figure: two examples, each with the panels Original image, Traditional approach, DL approach. Labeled fields: Surname, Date, Document Number]

Text Detection. NN architecture

Facts about this NN:

• CNN architecture

• 231 layers

• 5 outputs make this CNN universal

Training dataset:

• 10 000 images ≈ 100 000 text regions

• Augmentation during training gives 100-1000 times more samples

[Figure: network diagram with five outputs, Exit 1 to Exit 5]

Neural network barcode detection

We use a mixed approach:

• Traditional barcode detector for the postcodes;

• CNN barcode detector for other barcode types.

TASK: To improve the quality of barcode detection in 2 main scenarios:

- to sort the document stream using the barcode on the page as a feature;

- to find and read the value of the barcode.
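One way to picture the mixed approach is as a merge of two detectors' results. This is a sketch under assumptions: the detector functions are stubs, the result format is hypothetical, and the set of postal symbologies is a guess at what the slides call "postcodes".

```python
# Assumption: these symbologies are the "postcodes" handled by the traditional detector.
POSTAL_TYPES = {"Postnet", "Kix", "Royalmail", "Australian_Post", "IntelligentMail"}

def detect_mixed(page, traditional_detector, cnn_detector):
    """Keep traditional results for postal codes and CNN results for everything else."""
    found = [d for d in traditional_detector(page) if d["type"] in POSTAL_TYPES]
    found += [d for d in cnn_detector(page) if d["type"] not in POSTAL_TYPES]
    return found

# Stub detectors returning hypothetical detections (type plus bounding box).
def traditional_detector(page):
    return [{"type": "Postnet", "box": (10, 900, 300, 920)},
            {"type": "Code128", "box": (50, 40, 250, 90)}]

def cnn_detector(page):
    return [{"type": "Code128", "box": (48, 38, 252, 92)},
            {"type": "QRCode", "box": (400, 40, 480, 120)}]

detections = detect_mixed("page-1", traditional_detector, cnn_detector)
print([d["type"] for d in detections])
```

Routing by barcode type avoids reporting the same barcode twice when both detectors find it.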

Detection: comparative testing between the traditional approach (our current solution) and the mixed approach (CNN + traditional for postcodes). Correctly found barcodes are depicted on the chart:

Chart labels (barcode types): Australian_Post, Aztec, Codabar, Code39, Code39.Cod, Code93, Code128, DataMatrix, EAN 8, EAN 13, IATA 2, Industrial 2 of, Interleaved 2 of 5, ITF(USA_ Mortgage, Kix, Matrix 2 of 5, MaxiCode, PDF417, Postnet, QRCode, Royalmail, UCC128, UPC-A, UPC-E, IntelligentMail (USPS-4CB)

Traditional approach 78% 95% 100% 98% 100% 99% 98% 72% 88% 96% 100% 96% 100% 95% 100% 94% 100% 96% 97% 78% 91% 74% 100% 87% 96% 90%

Mixed approach 78% 90% 100% 96% 100% 100% 100% 93% 100% 100% 100% 100% 100% 100% 100% 94% 100% 99% 99% 78% 98% 74% 100% 99% 100% 90%

The detection in the full-cycle barcode processing in comparison with the traditional approach results. Correctly found and recognized barcodes are depicted on the chart (same barcode types):

Traditional approach 30% 63% 90% 85% 96% 79% 91% 61% 64% 77% 87% 63% 92% 66% 100% 69% 98% 91% 87% 41% 76% 59% 93% 38% 91% 79%

Mixed approach 31% 71% 92% 86% 98% 91% 93% 73% 73% 79% 89% 74% 94% 68% 100% 70% 98% 93% 87% 46% 79% 59% 93% 47% 96% 79%
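As a quick worked check of the detection chart (correctly found barcodes), its two rows of percentages can be compared column by column. The values below are copied from those detection rows. The unweighted mean is only an assumption about how to summarize them, since the slides give no per-type sample counts.

```python
# Detection rates (%) per chart column, in the order listed on the slide.
traditional = [78, 95, 100, 98, 100, 99, 98, 72, 88, 96, 100, 96, 100,
               95, 100, 94, 100, 96, 97, 78, 91, 74, 100, 87, 96, 90]
mixed = [78, 90, 100, 96, 100, 100, 100, 93, 100, 100, 100, 100, 100,
         100, 100, 94, 100, 99, 99, 78, 98, 74, 100, 99, 100, 90]

def mean(values):
    return sum(values) / len(values)

# Count the columns where the mixed approach is better, worse, or unchanged.
better = sum(m > t for t, m in zip(traditional, mixed))
worse = sum(m < t for t, m in zip(traditional, mixed))
equal = sum(m == t for t, m in zip(traditional, mixed))
largest_gain = max(m - t for t, m in zip(traditional, mixed))

print(f"mean: {mean(traditional):.1f}% -> {mean(mixed):.1f}%")
print(f"better: {better}, worse: {worse}, equal: {equal}, largest gain: {largest_gain} points")
```

The per-column comparison also shows where the mixed approach loses a few points, which an average alone would hide.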

1D example (Code128) 2D example (QRCode)

Traditional approach examples

Mixed approach examples (traditional for postcodes + CNN for other barcode types)

Facts about this NN:

• CNN architecture

• 24 layers

• Training dataset: 8000 real images

• Augmentation during training gives 10 times more samples

Content

1. Architecture of OCR: segmentation, recognition, context analysis

2. Classical approach: word oversegmentation

3. CNN for hieroglyphs recognition

4. End-to-end word recognition without character separation

Architecture of OCR: segmentation, recognition, context analysis

OCR stages

Document Page Text block Text line Word Symbol

Classical approach for recognition: word oversegmentation

TASK: Image -> text. We need to get the sequence of text symbols for the image of a text line. The classic approach needs to segment the line into words and each word into characters.
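A minimal sketch of the oversegmentation idea: the word image is cut into more pieces than there are characters, and a search then picks the grouping of pieces whose character hypotheses score best. The fragment names, the confidence values, and the dynamic-programming search are illustrative assumptions, not ABBYY's actual recognizer.

```python
import math

# Hypothetical recognizer output: a group of adjacent fragments -> (character, confidence).
HYPOTHESES = {
    ("c",): ("c", 0.9),
    ("l",): ("l", 0.6),
    ("c", "l"): ("d", 0.8),  # two fragments that together look like a "d"
    ("o",): ("o", 0.9),
    ("g",): ("g", 0.9),
}

def best_segmentation(fragments, recognize, max_span=3):
    """Dynamic programming over cut points; returns (log score, text) of the best grouping."""
    n = len(fragments)
    best = [(-math.inf, "")] * (n + 1)
    best[0] = (0.0, "")
    for end in range(1, n + 1):
        for start in range(max(0, end - max_span), end):
            hypothesis = recognize(tuple(fragments[start:end]))
            if hypothesis is None or best[start][0] == -math.inf:
                continue  # no character fits this group, or the prefix is unreachable
            char, confidence = hypothesis
            score = best[start][0] + math.log(confidence)
            if score > best[end][0]:
                best[end] = (score, best[start][1] + char)
    return best[n]

score, text = best_segmentation(["c", "l", "o", "g"], HYPOTHESES.get)
print(text, round(math.exp(score), 3))
```

Here the grouping "d-o-g" beats "c-l-o-g" because the merged first two fragments form a more confident character than the two read separately.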

CNN for hieroglyphs recognition

TASK: Image -> text. We need to get the sequence of text symbols for the image of a text line. We use the same base method: the word is split into characters. The convolutional neural network approach allows us to integrate a new method of character recognition.

[Chart: Japanese results on real images. Percentage axis from 88% to 100% against Speed (char/sec) from 300 to 700. Series: TwoStage CNN, TwoStage CNN Fast mode, Classic approach, Classic approach Fast mode]

CNN for hieroglyphs recognition. NN architecture

Facts about this NN:

• Two-stage CNN architecture

• 7 layers for every net

• ~8 000 symbols are divided into 100 clusters (groups of symbols)

• Training dataset: 7700 real images for symbols

• + 7000 synthetic images

End-to-end word recognition without character separation

TASK: Image -> text. We need to get the sequence of text symbols for the image of a text line. The end-to-end neural network approach allows us to skip the character separation stage.

Original image

Traditional approach with word segmentation

End-to-end approach result

End-to-end word recognition without character separation. NN architecture

Facts about this NN:

• CNN + RNN (LSTM, Long Short-Term Memory) architecture

• 37 layers

• Training dataset: 100 000 real images (one image = image of one word/fragment)

• + 200 000 augmented images
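The slides do not say how the CNN + LSTM output is turned into text. One widely used option for recognition without character separation is CTC-style decoding, so the sketch below assumes it: the network emits one score vector per horizontal frame, and decoding merges repeated symbols and drops a special blank symbol. The alphabet and scores are made up.

```python
BLANK = "-"  # CTC blank symbol: "no character at this frame"

def ctc_greedy_decode(frame_scores, alphabet):
    """Take the best symbol per frame, merge consecutive repeats, then remove blanks."""
    best_path = [alphabet[max(range(len(frame)), key=frame.__getitem__)]
                 for frame in frame_scores]
    decoded = []
    previous = None
    for symbol in best_path:
        if symbol != previous and symbol != BLANK:
            decoded.append(symbol)
        previous = symbol
    return "".join(decoded)

ALPHABET = [BLANK, "a", "b"]
# Hypothetical per-frame scores over ALPHABET; the best path is a, a, -, a, b.
FRAMES = [
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
    [0.9, 0.05, 0.05],
    [0.2, 0.7, 0.1],
    [0.1, 0.1, 0.8],
]

print(ctc_greedy_decode(FRAMES, ALPHABET))
```

The blank between the second and third "a" frames is what lets the decoder output a doubled letter without ever locating character boundaries.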

End-to-end word recognition without character separation. Results

[Chart: Arabic results, News/Magazines. Percentage axis from 94% to 97% against Speed (char/sec) from 0 to 900]

Content

Document analysis and capture

1. DNN for Invoice Capture: field detection, line items

2. On-premise supervised learning for Invoices

DNN for Invoice Capture: field detection, line items

TASK: To improve Invoice capture out of the box (OOTB)

ML method: Pre-trained NN for Invoices

Field: field name: Total; field value: 1,287.39

Field: field name: BSB; field value: 0320000

Field: field name: Account; field value: 403762

Field: field name: Vendor; field value: Nestle Australia Ltd

Field: field name: TAX Invoice; field value: 1126609718

Field: field name: Invoice Date; field value: 05/03/14

Line item 1

Table heading

Total line

DNN for Invoice Capture: field detection, line items

DNN for Invoice Capture: field detection

Fields         F-measure  Fields count
InvoiceDate    0.89       3015
InvoiceNumber  0.86       2940
Total          0.83       3158

Test dataset: 3265 documents. Test was made in March 2019.

Traditional approach

Fields
Invoice Number  0.76
Invoice Date    0.79
Total           0.83
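For readers unfamiliar with the metric, the F-measure combines precision and recall into one number. The slides do not state which variant is used, so this sketch assumes the balanced F1 score (beta = 1); the counts in the example are hypothetical.

```python
def f_measure(true_positives, false_positives, false_negatives, beta=1.0):
    """F-beta score from raw counts; beta=1 gives the harmonic mean of precision and recall."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0.0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)

# Hypothetical field-level counts: 80 fields extracted correctly,
# 10 extracted wrongly, 20 missed.
print(round(f_measure(80, 10, 20), 2))
```

Because the score penalizes both wrong extractions and missed fields, it is a stricter summary than accuracy on extracted fields alone.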

Country        Fields       F-measure  Fields count
Australia      Total        0.83       2424
Canada         Total        0.87       548
Sweden         Total        0.83       615
France         Total        0.83       540
Germany        InvoiceDate  0.99       57
Germany        Total        0.95       55
Great Britain  Total        0.9        627

Australia test dataset: 2726 documents

On-premise supervised learning for Invoices

Combines a pre-trained DL model with on-the-fly learning, so that the local model can be updated in real time.

The algorithm automatically detects the type of the incoming invoice and relates it to those invoices that have already been edited.

What is the value?

Reusing verification results to extract data from a new “similar” invoice increases the extraction quality.
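A rough sketch of the reuse idea: remember what an operator verified for one invoice and offer it again when a sufficiently similar invoice arrives. The similarity measure (Jaccard overlap of layout keywords), the 0.5 threshold, and all class and field names are hypothetical; the slides do not describe the actual matching algorithm.

```python
def jaccard(a, b):
    """Set overlap between 0 and 1."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

class LocalModel:
    """Hypothetical on-premise store of verified invoices, updated after each verification."""

    def __init__(self, min_similarity=0.5):
        self.examples = []  # pairs of (layout keywords, verified field hints)
        self.min_similarity = min_similarity

    def learn(self, keywords, verified_fields):
        """Called right after an operator verifies an invoice."""
        self.examples.append((frozenset(keywords), dict(verified_fields)))

    def suggest(self, keywords):
        """Return hints from the most similar verified invoice, or None if nothing is close."""
        keywords = frozenset(keywords)
        best = max(self.examples, key=lambda ex: jaccard(keywords, ex[0]), default=None)
        if best is None or jaccard(keywords, best[0]) < self.min_similarity:
            return None  # fall back to the pre-trained model alone
        return best[1]

model = LocalModel()
model.learn({"nestle", "tax", "invoice", "bsb"}, {"Total": "bottom right"})
print(model.suggest({"nestle", "tax", "invoice", "abn"}))
print(model.suggest({"acme", "receipt"}))
```

Because `learn` only appends to a local store, the model can be updated as soon as a verification finishes, without retraining the pre-trained network.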

Content

1. Flexi NLP: Machine Learning on the customer side

2. Flexi NLP: Use cases

Natural language processing (NLP)* is a subfield of computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data.

What is NLP and why do we need it? Main NLP tasks

* https://en.wikipedia.org/wiki/Natural_language_processing

Flexi NLP: Machine Learning on the customer side

[Pipeline diagram]

Input: Text (e-mail, contact, news, corporate charter…), for example “Bank opened its branch…”

• Lexical Analysis: splits text into words and sentences; executes morphological analysis.

• Semantic Analysis: understands text on the sentence level, and detects links between sentences.

• Section extraction: splits a document into relevant sections.

• Bi-LSTM for Named Entity Recognition (NER): NER - ORG, NER - LOC

• Decision Tree ensembles; votes for best variant.

• Outputs: Entities, Intents, Relationships

• + NLP Studio

Named Entity Recognition

The State Bank of India (SBI) has announced that it has opened its first branch in South Korea.

Flexi NLP: Fact extraction

NER - ORG NER - LOC

NER: “Organization”, “Location”

Semantic: Resolved anaphora (Bank – it)

Semantic context: “Organization” opens a branch in “Location”

NLP “knows” a lot about each word in the sentence. This knowledge is fed into machine learning model and used as features.
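To illustrate what "used as features" can mean in practice, the sketch below turns each token into a dictionary that a classifier such as a decision tree ensemble could consume. Only simple surface features are shown as stand-ins; the morphological and semantic attributes a real NLP engine would supply are not modeled, and all feature names are hypothetical.

```python
def token_features(tokens, index):
    """Describe one token and its immediate neighbors as a feature dictionary."""
    token = tokens[index]
    return {
        "lower": token.lower(),
        "is_title": token.istitle(),   # e.g. "Bank", "India"
        "is_upper": token.isupper(),   # e.g. "SBI"
        "is_digit": token.isdigit(),
        "prev": tokens[index - 1].lower() if index > 0 else "<s>",
        "next": tokens[index + 1].lower() if index + 1 < len(tokens) else "</s>",
    }

TOKENS = "The State Bank of India ( SBI ) has announced".split()
features = [token_features(TOKENS, i) for i in range(len(TOKENS))]
print(features[6])
```

In a full pipeline each dictionary would also carry attributes such as part of speech or semantic class, and the model would learn which combinations signal an organization or a location.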

Flexi NLP: Fact extraction for Contracts, Leases, Loans

TASK: Extract Agreement Date, Party Name, Agreement Max Amount, Available Amount, etc.

USE CASE: Audit, Contract migration

Segments: Preamble, Contract Subject, Available Amount

Attributes: Agreement Date, Party Name, Agreement Max Amount, Available Amount

Flexi NLP: Fact extraction for News analysis

NER: “Organization”, “Location”

Semantic context: Who fined Whom, Why

TASK: Extract risk factors (positive and negative), e.g. the fact of fraud in the business.

USE CASE: Risk management

THE CENTRAL BANK has fined Allied Irish Bank (AIB) nearly €2.3 million for a series of anti-money laundering and terrorist financing compliance failures.

The €2,275,000 fine relates to six breaches of the Criminal Justice (Money Laundering and Terrorist Financing) Act 2010 (CJA 2010).

AIB had admitted to all six breaches.

Flexi NLP: Fact extraction for Healthcare

TASK: Extract medication, adverse effects and relationships between them

USE CASE: Healthcare: Research Article Analysis (pharmacovigilance)

Entities:

Age: 59-year-old
Gender: Female
Symptom: Loss of muscle strength
Medication: Simvastatin
Dosage: 40 mg PO qDay in the evening
Medication: Fluconazole
Dosage: 150 mg PO as a single dose
Symptom: Rhabdomyolysis

Relationships:

[Diagram: connect medications (Simvastatin, Fluconazole) and side effects (Loss of muscle strength, Rhabdomyolysis)]

ABBYY AI

1. We create intelligent skills for a robot clerk that is able to solve practical tasks.

2. Implementing these skills makes it possible to reduce the time spent on routine operations or to make balanced management decisions.

3. We combine Knowledge Engineering and Machine Learning approaches to reach the best result with limited training data.

4. We use the following machine learning algorithms: Naive Bayes, SVM, Gradient Boosting, Deep Learning, Evolutionary Algorithms.

Thank you

Questions & Answers
