Post on 06-Aug-2021
State-of-the-art in neural networks implemented by ABBYY R&D
Konstantin Anisimovich, Tatiana Danielyan
A brief overview of this presentation
Power of Deep Learning
Image Processing
OCR
Document analysis and capture
NLP
Traditional Machine Learning vs. Deep learning.
Traditional Machine Learning
Input -> Feature extraction -> Classification -> Output (Plane / Not Plane)
Deep Learning
Input -> Feature extraction + Classification -> Output (Plane / Not Plane)
Deep learning
Deep learning is a class of machine learning algorithms that use a graph of multiple layers of nonlinear processing units for feature extraction and transformation. Each successive layer uses the output from the previous layer as input.
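A minimal sketch of "each successive layer uses the output from the previous layer as input". The layer sizes, the weights and the tanh nonlinearity are illustrative assumptions, not values from the presentation.

```python
import math

def dense(inputs, weights, biases):
    """One layer: weighted sum of the inputs plus a bias, followed by tanh."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Feed the output of each layer into the next one."""
    activation = inputs
    for weights, biases in layers:
        activation = dense(activation, weights, biases)
    return activation

# Hypothetical, hand-picked parameters: 3 inputs -> 2 hidden units -> 1 output.
LAYERS = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.0, -1.0]], [0.0]),
]

output = forward([1.0, 2.0, 3.0], LAYERS)
```

In a trained network the weights are learned from data, so the early layers end up acting as the feature extractor and the last layer as the classifier.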
Advantages of Deep Learning
Learning high-level features from data in an incremental manner.
No need for domain expertise or manual feature engineering.
Can solve the problem end to end, whereas traditional machine learning techniques need the problem to be broken down into parts that are solved separately, with their results combined at the final stage.
Power of Deep Learning
We use Deep Learning in all our AI components
Deep learning: why deep neural networks are not always the best solution
Disadvantages of Deep Learning
When “natural” features exist, SVM and GBDT provide comparable or better results using fewer computing resources.
Deep learning does not work well with small datasets.
Deep networks are largely a “black box”: even now, researchers do not fully understand their inner workings.
Content
Image Processing
1. Image processing, correction and improvement
2. Image classification
3. Object detection
4. Text detection (Find text)
5. Neural network barcode detection
Image processing, correction and improvement
TASK: To correct geometric and photometric distortions in images of various document types so that they look like an ideal scan, and to improve the resulting OCR quality.
• Crop launch criterion: a CNN classifier that decides whether the crop mechanism should be called.
• Crop: gradient boosting to evaluate crop hypotheses. We plan to integrate a CNN approach to correct more complicated geometric distortions (documents with folds and creases) and to support new document types (IDs).
• Improvement: CNNs for removing color elements and for detecting and removing photometric distortions in images (noise, glare, blur).
Image processing, correction and improvement: Crop launch criterion
Examples
Normal crop process
The crop module receives the input data, processes it and outputs the cropped document.
[Diagram: image -> crop module]
Redundant crop process (false pass)
The captured image contains an already cropped document or other content (no document).
[Diagram: image -> crop module]
Issues:
• It takes additional processing time, which can be critical in a mobile application.
• An improper crop result may appear.
Solution: Run a crop launch criterion module that classifies the input image and runs the crop module only if necessary. This module should work much faster than the crop module.
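The gating logic of the crop launch criterion can be sketched as follows. The function names and the toy classifier and crop routines are hypothetical stand-ins; in the presentation the criterion is a CNN classifier and the crop step uses gradient boosting over crop hypotheses.

```python
from typing import Callable, List

Image = List[List[int]]  # grayscale pixels, a stand-in for a real image type

def preprocess(image: Image,
               needs_crop: Callable[[Image], bool],
               crop: Callable[[Image], Image]) -> Image:
    """Run the expensive crop step only when the cheap classifier asks for it."""
    if needs_crop(image):
        return crop(image)
    return image  # already cropped, or no document: leave the image untouched

# Toy stand-ins (assumptions): a top and bottom row of zeros means
# "background around a document".
def toy_needs_crop(image: Image) -> bool:
    return all(p == 0 for p in image[0]) and all(p == 0 for p in image[-1])

def toy_crop(image: Image) -> Image:
    return [row[1:-1] for row in image[1:-1]]

photo = [[0, 0, 0, 0], [0, 9, 9, 0], [0, 9, 9, 0], [0, 0, 0, 0]]
scan = [[9, 9], [9, 9]]

cropped = preprocess(photo, toy_needs_crop, toy_crop)
untouched = preprocess(scan, toy_needs_crop, toy_crop)
```

The design only pays off if `needs_crop` is much cheaper than `crop`, which is the requirement stated on the slide.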
Image processing, correction and improvement: Crop
Examples
[Figure: original photo image and preprocessed image]
Image semantic segmentation
Image processing, correction and improvement: Image semantic segmentation
Examples
Image processing, correction and improvement: Image semantic segmentation
NN architecture
Facts about this NN:
• CNN architecture
• 10-11 layers
• 1 output with a multi-class answer (edges, text regions, photos, stamps)
• Training dataset: 2000 real images
• + augmentation during training gives ×10 more samples
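A minimal sketch of augmentation during training that turns each real image into 10 samples. The specific perturbations (a random brightness offset plus pixel noise) are assumptions; the slides do not say which augmentations are used. Photometric changes are chosen here because they leave a segmentation mask valid without further work.

```python
import random

def augment(image, rng):
    """Return one photometrically perturbed copy of a grayscale image (list of rows)."""
    brightness = rng.randint(-30, 30)  # one global brightness offset per copy
    return [
        [min(255, max(0, pixel + brightness + rng.randint(-5, 5))) for pixel in row]
        for row in image
    ]

def augmented_dataset(images, copies=10, seed=0):
    """Produce `copies` perturbed variants of every input image."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    return [augment(image, rng) for image in images for _ in range(copies)]

real_images = [[[100, 200], [0, 255]]] * 3
training_samples = augmented_dataset(real_images, copies=10, seed=1)
```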
Image classification
TASK: To detect the class of the processed document in order to choose the correct image preprocessing scenario and set of recognition settings. Clustering of documents is needed to process different classes with different flexible descriptions.
• Classification: a CNN classifies images by document type (letters, invoices, business cards, etc.) and extracts document “features” for future ML models (transfer learning).
• Clustering: clustering is based on the classification features.
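Clustering on top of classifier features can be sketched with plain k-means. The choice of k-means, the two-dimensional feature vectors and the deterministic initialisation are assumptions made for illustration; the slides only state that clustering is based on the classification features.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=10):
    """Plain k-means; centroids start at the first k points for reproducibility."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels, centroids

# Hypothetical feature vectors, e.g. taken from the last hidden layer of the classifier.
features = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]]
labels, centroids = kmeans(features, k=2)
```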
Image classification
Examples
[Figure: A4 document, double page, receipt, contract, insurance policy, invoice]
Image classification
NN architecture
Facts about this NN:
• CNN architecture
• 18 layers
• Training dataset: public datasets with different images
• 10,000-15,000 real images of printed documents
Document objects detection (stamps, signatures, logos)
TASK: To improve the quality of document analysis by excluding areas with stamps, signatures and logos from the analysis. To detect areas with color stamps and signatures so that color elements can be filtered out and the black printed text kept for further OCR.
• Semantic document segmentation: a CNN for semantic document segmentation
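Once a segmentation mask is available, excluding stamps and signatures from further analysis can be as simple as painting those pixels white. The class ids and the tiny image below are hypothetical; the mask itself would come from the segmentation CNN.

```python
WHITE = 255
TEXT, STAMP, SIGNATURE = 1, 2, 3  # hypothetical class ids of the segmentation output

def erase_regions(image, mask, labels_to_erase):
    """Paint white every pixel whose segmentation label is in labels_to_erase."""
    return [
        [WHITE if label in labels_to_erase else pixel
         for pixel, label in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]

image = [[0, 0, 120], [0, 130, 125]]
mask = [[TEXT, TEXT, STAMP], [TEXT, STAMP, SIGNATURE]]
clean = erase_regions(image, mask, {STAMP, SIGNATURE})
```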
Document objects detection (stamps, signatures, logos)
Examples (DA result without object detection)
• Incorrect text order; further natural language text analysis is impossible
• Missed text: some text is inside a picture block and cannot be recognized
Document objects detection (stamps, signatures, logos)
Examples (DA result with object detection)
• Correct text order; further natural language text analysis is possible
• All text is inside text blocks and can be recognized
Text Detection (on real environment scenes and ID documents)
TASK: To detect text in a mobile photo in order to recognize it and, for ID documents, to identify its type (Name, Date, Document number, etc.) for data capture scenarios.
• Text detection: a CNN for real environment scenes (a model optimized for mobile devices, 3.5 MB).
• Text detection: a modified architecture for ID documents, trained on various ID images.
Text Detection (on real environment scenes and ID documents)
Examples
[Figure: original image, traditional approach result and DL approach result, shown for a real environment scene and for an ID document with the detected fields Name, Surname, Date and Document Number]
Text Detection
NN architecture
Facts about this NN:
• CNN architecture
• 231 layers
• 5 outputs make this CNN universal
Training dataset:
• 10,000 images ≈ 100,000 text regions
• Augmentation during training gives more samples: ×100-1000
[Diagram: network with five outputs, Exit 1 to Exit 5]
Neural network barcode detection
TASK: To improve the quality of barcode detection in 2 main scenarios:
- sorting the document stream using the barcode on the page as a feature;
- finding the barcode and reading its value.
We use a mixed approach:
• a traditional barcode detector for postcodes;
• a CNN barcode detector for other barcode types.
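One way to combine the two detectors is to run both and keep postcode hits from the traditional detector and all other hits from the CNN. This is a sketch under assumptions: the slides do not describe how the results are merged, the detector interfaces are hypothetical, and the set of labels treated as postcodes is a guess.

```python
# Assumption: which barcode types count as postcodes is not stated in the slides.
POSTCODE_TYPES = {"Australian_Post", "Kix", "Postnet", "Royalmail", "IntelligentMail"}

def detect_mixed(image, traditional_detector, cnn_detector):
    """Keep postcode hits from the traditional detector and all other hits from the CNN."""
    hits = [h for h in traditional_detector(image) if h["type"] in POSTCODE_TYPES]
    hits += [h for h in cnn_detector(image) if h["type"] not in POSTCODE_TYPES]
    return hits

# Stub detectors standing in for the real ones; each returns a list of hits.
def stub_traditional(image):
    return [{"type": "Postnet", "box": (0, 0, 50, 10)},
            {"type": "QRCode", "box": (5, 5, 20, 20)}]

def stub_cnn(image):
    return [{"type": "QRCode", "box": (4, 4, 21, 21)},
            {"type": "DataMatrix", "box": (60, 60, 80, 80)}]

hits = detect_mixed(None, stub_traditional, stub_cnn)
```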
Neural network barcode detection
Detection: comparative testing of the traditional approach (our current solution) and the mixed approach (CNN + traditional for postcodes). Share of correctly found barcodes:

Barcode type               Traditional approach  Mixed approach
Australian_Post            78%                   78%
Aztec                      95%                   90%
Codabar                    100%                  100%
Code39                     98%                   96%
Code39.Code 32             100%                  100%
Code93                     99%                   100%
Code128                    98%                   100%
DataMatrix                 72%                   93%
EAN 8                      88%                   100%
EAN 13                     96%                   100%
IATA 2 of 5                100%                  100%
Industrial 2 of 5          96%                   100%
Interleaved 2 of 5         100%                  100%
ISBN                       95%                   100%
ITF(USA_Mortgage           100%                  100%
Kix                        94%                   94%
Matrix 2 of 5              100%                  100%
MaxiCode                   96%                   99%
PDF417                     97%                   99%
Postnet                    78%                   78%
QRCode                     91%                   98%
Royalmail                  74%                   74%
UCC128                     100%                  100%
UPC-A                      87%                   99%
UPC-E                      96%                   100%
IntelligentMail(USPS-4CB)  90%                   90%
Full-cycle barcode processing (detection + recognition) compared with the traditional approach results. Share of correctly found and recognized barcodes:

Barcode type               Traditional approach  Mixed approach
Australian_Post            30%                   31%
Aztec                      63%                   71%
Codabar                    90%                   92%
Code39                     85%                   86%
Code39.Code 32             96%                   98%
Code93                     79%                   91%
Code128                    91%                   93%
DataMatrix                 61%                   73%
EAN 8                      64%                   73%
EAN 13                     77%                   79%
IATA 2 of 5                87%                   89%
Industrial 2 of 5          63%                   74%
Interleaved 2 of 5         92%                   94%
ISBN                       66%                   68%
ITF(USA_Mortgage           100%                  100%
Kix                        69%                   70%
Matrix 2 of 5              98%                   98%
MaxiCode                   91%                   93%
PDF417                     87%                   87%
Postnet                    41%                   46%
QRCode                     76%                   79%
Royalmail                  59%                   59%
UCC128                     93%                   93%
UPC-A                      38%                   47%
UPC-E                      91%                   96%
IntelligentMail(USPS-4CB)  79%                   79%
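To summarise the detection chart (the first of the two charts above), the per-type percentages can be averaged and compared. The numbers are copied from the chart in the order given there; the unweighted mean over barcode types is our own summary and does not appear in the slides.

```python
# Detection rates (%) per barcode type, in chart order.
TRADITIONAL = [78, 95, 100, 98, 100, 99, 98, 72, 88, 96, 100, 96, 100,
               95, 100, 94, 100, 96, 97, 78, 91, 74, 100, 87, 96, 90]
MIXED = [78, 90, 100, 96, 100, 100, 100, 93, 100, 100, 100, 100, 100,
         100, 100, 94, 100, 99, 99, 78, 98, 74, 100, 99, 100, 90]

def mean(values):
    return sum(values) / len(values)

# Count the barcode types where the mixed approach is better or worse.
improved = sum(m > t for t, m in zip(TRADITIONAL, MIXED))
worse = sum(m < t for t, m in zip(TRADITIONAL, MIXED))

print(f"traditional mean: {mean(TRADITIONAL):.1f}%")
print(f"mixed mean:       {mean(MIXED):.1f}%")
print(f"improved on {improved} types, worse on {worse} types")
```

An unweighted mean treats every barcode type equally; a mean weighted by how often each type occurs in a real document stream could look different.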
Neural network barcode detection
[Figure: traditional approach examples: 1D example (Code128), 2D example (QRCode)]
[Figure: mixed approach examples (traditional for postcodes + CNN for other barcode types): 1D example (Code128), 2D example (QRCode)]
Neural network barcode detection
Facts about this NN:
• CNN architecture
• 24 layers
• Training dataset: 8000 real images
• + augmentation during training gives ×10 more samples
Content
OCR
1. Architecture of OCR: segmentation, recognition, context analysis
2. Classical approach: word oversegmentation
3. CNN for hieroglyphs recognition
4. End-to-end word recognition without character separation
Architecture of OCR: segmentation, recognition, context analysis
OCR stages
Document -> Page -> Text block -> Text line -> Word -> Symbol
Classical approach for recognition: word oversegmentation
TASK: Image -> text. We need to get the sequence of text symbols for the image of a text line. The classical approach needs to segment the line into words and each word into characters.
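With oversegmentation, a word image is cut at more candidate points than there are characters, and the recognizer scores every candidate segment. A common way to pick the final reading is dynamic programming over the cut points; the sketch below assumes this scheme and uses made-up scores, since the slides do not describe how the hypotheses are resolved.

```python
def best_segmentation(n_cuts, segment_scores):
    """Find the best-scoring path from cut point 0 to cut point n_cuts.

    segment_scores maps (start, end) -> (char, log_prob) for every candidate
    segment between two cut points. Returns (total_log_prob, text), or None
    if no chain of segments covers the whole word.
    """
    best = {0: (0.0, "")}
    for end in range(1, n_cuts + 1):
        for start in range(end):
            if start in best and (start, end) in segment_scores:
                char, score = segment_scores[(start, end)]
                total = best[start][0] + score
                if end not in best or total > best[end][0]:
                    best[end] = (total, best[start][1] + char)
    return best.get(n_cuts)

# Hypothetical recognizer output: is the image "rn" (two segments) or "m" (one)?
scores = {
    (0, 1): ("r", -0.2),
    (1, 2): ("n", -0.3),
    (0, 2): ("m", -1.0),
}
result = best_segmentation(2, scores)
```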
CNN for hieroglyphs recognition
TASK: Image -> text. We need to get the sequence of text symbols for the image of a text line. We use the same base method: the word is split into characters. The convolutional neural network approach allows us to integrate a new method of character recognition.
[Figure: real images]
[Chart: Japanese results. Quality (w/o space errors, %) versus speed (char/sec) for the two-stage CNN, the two-stage CNN in fast mode, the classic approach and the classic approach in fast mode]
CNN for hieroglyphs recognition
NN architecture
Facts about this NN:
• Two-stage CNN architecture
• 7 layers for every net
• ~8,000 symbols are divided into 100 clusters (groups of symbols)
• Training dataset: 7700 real images for symbols
• + 7000 synthetic images
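The two-stage idea can be sketched as: a first network picks the cluster, then a network specific to that cluster picks the symbol. The callables and scores below are hypothetical stand-ins for the real CNNs, and taking only the single best cluster is an assumption.

```python
def classify_two_stage(image, cluster_net, symbol_nets):
    """Stage 1 chooses a cluster of symbols, stage 2 chooses a symbol inside it."""
    cluster_scores = cluster_net(image)              # {cluster_id: score}
    cluster = max(cluster_scores, key=cluster_scores.get)
    symbol_scores = symbol_nets[cluster](image)      # {symbol: score}
    return max(symbol_scores, key=symbol_scores.get)

# Stub networks (assumptions) returning fixed scores regardless of the image.
def stub_cluster_net(image):
    return {0: 0.1, 1: 0.9}

stub_symbol_nets = {
    0: lambda image: {"日": 0.6, "目": 0.4},
    1: lambda image: {"土": 0.3, "士": 0.7},
}

symbol = classify_two_stage(None, stub_cluster_net, stub_symbol_nets)
```

Splitting ~8,000 symbols into 100 clusters means each second-stage network only has to separate on the order of 80 symbols instead of all of them.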
End-to-end word recognition without character separation
TASK: Image -> text. We need to get the sequence of text symbols for the image of a text line. The end-to-end neural network approach allows us to skip the character separation stage.
[Figure: original image, traditional approach result with word segmentation, end-to-end approach result]
End-to-end word recognition without character separation
NN architecture
Facts about this NN:
• CNN + RNN (LSTM, Long Short-Term Memory) architecture
• 37 layers
• Training dataset: 100,000 real images (one image = image of one word/fragment)
• + 200,000 augmented images
End-to-end word recognition without character separation
Results
[Chart: Arabic results. Quality (w/o space errors, %) versus speed (char/sec) for Books, Docs and News/Magazines]
Content
Document analysis and capture
1. DNN for Invoice Capture: field detection, line items
2. On-premise supervised learning for Invoices
DNN for Invoice Capture: field detection, line items
TASK: To improve Invoice capture out of the box (OOTB)
ML method: Pre-trained NN for Invoices
Detected fields (field name: field value):
- Total: 1,287.39
- BSB: 0320000
- Account: 403762
- Vendor: Nestle Australia Ltd
- TAX Invoice: 1126609718
- Invoice Date: 05/03/14
Detected table elements: Line item 1, Table heading, Total line
DNN for Invoice Capture: field detection
Test was made in March 2019.

Country        Test dataset (documents)  Field          F-measure  Fields count
USA            3265                      InvoiceDate    0.89       3015
USA            3265                      InvoiceNumber  0.86       2940
USA            3265                      Total          0.83       3158
Australia      2726                      InvoiceDate    0.94       2426
Australia      2726                      InvoiceNumber  0.87       2380
Australia      2726                      Total          0.83       2424
Canada         638                       InvoiceDate    0.89       553
Canada         638                       InvoiceNumber  0.87       553
Canada         638                       Total          0.87       548
Sweden         628                       InvoiceDate    0.95       614
Sweden         628                       InvoiceNumber  0.91       558
Sweden         628                       Total          0.83       615
France         568                       InvoiceDate    0.89       535
France         568                       InvoiceNumber  0.85       536
France         568                       Total          0.83       540
Germany        122                       InvoiceDate    0.99       57
Germany        122                       InvoiceNumber  0.98       57
Germany        122                       Total          0.95       55
Great Britain  646                       InvoiceDate    0.92       622
Great Britain  646                       InvoiceNumber  0.93       615
Great Britain  646                       Total          0.90       627

Traditional approach (F-measure by field):
Invoice Number  0.76
Invoice Date    0.79
Total           0.83
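For reference, an F-measure combines precision and recall into one number. The sketch below assumes the balanced F1 variant and uses made-up counts; the slides do not say which variant was used or give the underlying counts.

```python
def f_measure(true_positives, false_positives, false_negatives):
    """Balanced F-measure (F1): the harmonic mean of precision and recall."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for one field: 8 correct extractions, 2 wrong, 2 missed.
example = f_measure(true_positives=8, false_positives=2, false_negatives=2)
```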
On-premise supervised learning for Invoices
Combines a pre-trained DL model with on-the-fly learning, so that the local model can be updated in real time.
The algorithm automatically detects the type of the incoming invoice and relates it to those invoices that have already been edited.
What is the value?
Re-using verification results to extract data from a new “similar” invoice increases the extraction quality.
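Relating an incoming invoice to already verified ones can be pictured as a nearest-neighbour lookup. Everything below is an illustrative assumption: the token-based Jaccard similarity, the threshold and the stored "field hints" are not described in the slides.

```python
def jaccard(a, b):
    """Jaccard similarity of two token collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

class VerifiedInvoiceMemory:
    """Stores operator-verified invoices and returns hints for similar new ones."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.examples = []  # list of (layout tokens, verified field hints)

    def add(self, tokens, verified_fields):
        # Called after an operator has verified or corrected an invoice.
        self.examples.append((list(tokens), dict(verified_fields)))

    def lookup(self, tokens):
        """Return the hints of the most similar stored invoice, or None."""
        best_score, best_fields = 0.0, None
        for stored_tokens, fields in self.examples:
            score = jaccard(tokens, stored_tokens)
            if score > best_score:
                best_score, best_fields = score, fields
        return best_fields if best_score >= self.threshold else None

memory = VerifiedInvoiceMemory()
memory.add(["nestle", "tax", "invoice", "bsb"], {"Total": "bottom right"})
similar = memory.lookup(["nestle", "tax", "invoice", "account"])
unknown = memory.lookup(["acme", "receipt"])
```

Because the memory is updated as soon as an operator verifies a document, the next invoice of the same type can benefit without retraining the pre-trained model.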
Content
NLP
1. Flexi NLP: Machine Learning on the customer side
2. Flexi NLP: Use cases
What is NLP and why do we need it? Main NLP tasks
Natural language processing (NLP)* is a subfield of computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data.
* https://en.wikipedia.org/wiki/Natural_language_processing
Bi-LSTM for Named Entity Recognition (NER)
[Figure: example sentence tagged with NER - ORG and NER - LOC]
Flexi NLP: Machine Learning on the customer side
Input: text (e-mail, contact, news, corporate charter…), for example “Bank opened its branch…”
Pipeline components:
• Lexical Analysis: splits text into words and sentences; executes morphological analysis.
• Semantic Analysis: understands text on the sentence level and detects links between sentences.
• Section extraction: splits a document into relevant sections.
• Decision Tree ensemble: votes for the best variant and outputs entities.
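"Votes for best variant" can be read as majority voting across the trees of the ensemble. The sketch below assumes simple unweighted voting over hypothetical per-tree labels; the slides do not specify the voting scheme.

```python
from collections import Counter

def vote(labels_from_trees):
    """Return the label most trees voted for and the share of votes it received."""
    counts = Counter(labels_from_trees)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels_from_trees)

# Hypothetical predictions of three trees for the span "State Bank of India".
winner, share = vote(["ORG", "ORG", "LOC"])
```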
[Diagram: Semantic Analysis feeds Decision Tree ensembles, which produce Entities, Intents, Facts and Relationships; + NLP Studio]
NER: Named Entity Recognition
Flexi NLP: Fact extraction
Example: The State Bank of India (SBI) has announced that it has opened its first branch in South Korea.
Tags: NER - ORG, NER - LOC
NER: “Organization”, “Location”
Semantic: Resolved anaphora (Bank – it)
Semantic context: “Organization” opens a branch in “Location”
NLP “knows” a lot about each word in the sentence. This knowledge is fed into a machine learning model and used as features.
Flexi NLP: Fact extraction for Contracts, Leases, Loans
TASK: Extract Agreement Date, Party Name, Agreement Max Amount, Available Amount, etc.
USE CASE: Audit, Contract migration
Segments: Preamble, Contract Subject, Available Amount
Attributes: Agreement Date, Party Name, Agreement Max Amount, Available Amount
Flexi NLP: Fact extraction for News analysis
TASK: Extract risk factors (positive and negative), e.g. a fact of fraud in the business.
USE CASE: Risk management
NER: “Organization”, “Location”
Semantic context: Who fined Whom and Why
Example text:
THE CENTRAL BANK has fined Allied Irish Bank (AIB) nearly €2.3 million for a series of anti-money laundering and terrorist financing compliance failures. The €2,275,000 fine relates to six breaches of the Criminal Justice (Money Laundering and Terrorist Financing) Act 2010 (CJA 2010). AIB had admitted to all six breaches.
Flexi NLP: Fact extraction for Healthcare
TASK: Extract medications, adverse effects and the relationships between them
USE CASE: Healthcare: Research Article Analysis (pharmacovigilance)
Entities:
• Age: 59-year-old
• Gender: Female
• Symptom: Loss of muscle strength
• Medication: Simvastatin; Dosage: 40 mg PO qDay in the evening
• Medication: Fluconazole; Dosage: 150 mg PO as a single dose
• Symptom: Rhabdomyolysis
Relationships: connect medications and side effects
[Diagram: links between the medications (Simvastatin, Fluconazole) and the side effects (Loss of muscle strength, Rhabdomyolysis)]
ABBYY AI
1. We create intelligent skills for a robot clerk that is able to solve practical tasks.
2. Implementing these skills makes it possible to reduce the time spent on routine operations or to make balanced management decisions.
3. We combine Knowledge Engineering and Machine Learning approaches to reach the best result with limited training data.
4. We use the following machine learning algorithms: Naive Bayes, SVM, Gradient Boosting, Deep Learning, Evolutionary Algorithms.
Thank you
Questions & Answers