Trivandrum
Uploaded by: vgovindaraju
[Figure labels: Scanner, Storage, OCR, Noisy Text, "Newton Kinematics Notes", Query, Forms, Letters, Notes, Handwritten Documents, Relevance]
Outline
Recognition: Postal Applications, Paradigms, Fusion
Search: IR Models, Word Spotting
Challenge of Handwriting
Input
Output: 20187 +2246
Handwriting Recognition
Postal Context (138 million records)
ZIP Code:
• 30% of ZIP Codes contain a single street name
• 5% of ZIP Codes contain a single primary number
• 2% of ZIP Codes contain a single add-on
<ZIP Code, primary number>: maximum number of records returned is 3,071
<ZIP Code, add-on>: maximum number of records returned is 3,070
LDR accuracy (%) by lexicon size
Lex     Top 1   Top 2
10      96.5    98.7
100     89.2    94.1
1000    75.3    86.3
Paradigms
Lexicon Driven OCR (LDR): Context, Ranked Lexicon
Lexicon Free OCR (LFR)
Stages: Segmentation, Recognition, Post-processing
Lexicon Free (LFR)
[Figure: segmentation graph over segment points 1 to 8; edges carry character hypotheses with confidences: i[.8], l[.8]; u[.5], v[.2]; w[.6], m[.3]; w[.7]; i[.7], u[.3]; m[.2]; m[.1]; r[.4]; d[.8]; o[.5]]
- Image from segment 1 to 3 is a 'u' with 0.5 confidence
- Image from segment 1 to 4 is a 'w' with 0.7 confidence
- Image from segment 1 to 5 is a 'w' with 0.6 confidence and an 'm' with 0.3 confidence
Find the best path in the graph from segment 1 to 8
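The best-path search can be sketched as a dynamic program over the segmentation graph. This is only an illustration: the three edges leaving segment 1 follow the slide, every other edge and confidence in `EDGES` is invented, and scoring a path by the product of its confidences is an assumption rather than something the slide states.

```python
from math import exp, log

# (start_segment, end_segment) -> {character: confidence}
# Edges from segment 1 follow the slide; all other edges are hypothetical.
EDGES = {
    (1, 3): {"u": 0.5, "v": 0.2},
    (1, 4): {"w": 0.7},
    (1, 5): {"w": 0.6, "m": 0.3},
    (3, 5): {"i": 0.7, "u": 0.3},
    (4, 6): {"o": 0.5},
    (5, 6): {"o": 0.4},
    (6, 7): {"r": 0.4},
    (7, 8): {"d": 0.8},
}


def best_path(edges, start, goal):
    """Return (text, score) for the path maximizing the product of confidences."""
    best = {start: (0.0, "")}  # segment -> (log score, text so far)
    # Edges only go forward, so visiting segments in increasing order is enough.
    for node in range(start, goal):
        if node not in best:
            continue
        log_score, text = best[node]
        for (a, b), hypotheses in edges.items():
            if a != node:
                continue
            # Keep only the most confident character on each edge.
            char, conf = max(hypotheses.items(), key=lambda item: item[1])
            candidate = (log_score + log(conf), text + char)
            if b not in best or candidate[0] > best[b][0]:
                best[b] = candidate
    if goal not in best:
        raise ValueError("no path reaches the goal segment")
    log_score, text = best[goal]
    return text, exp(log_score)


text, score = best_path(EDGES, 1, 8)
```

With these made-up edges the path 1, 4, 6, 7, 8 wins and spells "word"; a lexicon-free recognizer would then hand such strings to post-processing.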
Lexicon Driven (LDR)
[Figure: segment points 1 to 9; each candidate span of segments is labeled with its distance to one of the lexicon characters 'w', 'o', 'r', 'd', for example w[7.6], w[7.2], w[5.0], o[6.6], r[3.8], d[4.4]]
Find the best way of accounting for characters 'w', 'o', 'r', 'd' by consuming all segments 1 to 8
Distance between the first character 'w' of lexicon entry 'word' and the image between:
- segments 1 and 4 is 5.0
- segments 1 and 3 is 7.2
- segments 1 and 2 is 7.6
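A minimal sketch of the lexicon-driven matching as a dynamic program: each character of the lexicon entry must consume a consecutive span of segments, and the total distance is minimized. The three 'w' distances come from the slide; the positions of the other entries in `TABLE`, the default distance of 9.0, and the `max_span` limit are assumptions made for illustration.

```python
from functools import lru_cache

# (character, start_point, end_point) -> distance.
# The 'w' entries follow the slide; the remaining entries are hypothetical.
TABLE = {
    ("w", 1, 4): 5.0,
    ("w", 1, 3): 7.2,
    ("w", 1, 2): 7.6,
    ("o", 4, 6): 6.0,
    ("r", 6, 7): 3.8,
    ("d", 7, 8): 4.4,
}


def toy_distance(char, start, end):
    """Look up a distance; unknown spans get a large default (assumption)."""
    return TABLE.get((char, start, end), 9.0)


def match_word(word, n_points, dist, max_span=3):
    """Minimum total distance when the characters of `word` consume
    all segments between point 1 and point `n_points`."""

    @lru_cache(maxsize=None)
    def cost(k, start):
        # k: index of the next character, start: current segment point
        if k == len(word):
            return 0.0 if start == n_points else float("inf")
        best = float("inf")
        for end in range(start + 1, min(start + max_span, n_points) + 1):
            best = min(best, dist(word[k], start, end) + cost(k + 1, end))
        return best

    return cost(0, 1)


total = match_word("word", 8, toy_distance)
```

Running the same routine for every lexicon entry and sorting by total distance would give a ranked lexicon, which is the sense in which the recognizer is driven by the lexicon.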
Grapheme Models (LFR)
grapheme pos orientation angle
Down cusp 3.0 -90°
Up loop
Down arc
Writer Specific Modeling
Holistic Features
a) Amherst b) Buffalo c) Boston d) None of the above
[Figure: three-level network. Words: ABLE, TRIP, TRAP; Letters: A, T, N; Features]
Interactive Models (LDR)
1-way activation [McClelland and Rumelhart 1981]
2-way interaction
Interactive Models (LDR): Phrase Level
Features: T-crossings, loops, ascenders, descenders, length
Lexicon 1: West Central Street, West Main Street, Sunset Avenue
Lexicon 2: West Central Street, East Central Street, Sunset Avenue
Lexicon 3: West Central Street, West Central Avenue, Sunset Avenue
Interactive Model
[Figure: 2-way interaction between features and image]
Interactive Models: Character Recognition
Adaptive feature selection
Adaptive number of features
Adaptive resolutions
Gradient (4) and Moment (5) Features
0 1 0 1 1 1 0 0 1
[Park and Govindaraju, IEEE CVPR 2000]
Active Recognition
Results
               Active Model   Neural Net   KNN
Top 1 %        95.7           96.4         95.7
Temp           612            976          3,777
Msec           1.45           11.5         384
Training hrs   1              24           1
10-class digit recognition; 25,656 training and 12,242 test samples (Postal + NIST)
Lex size      LDR %    GM %
10            96.86    96.56
100           91.36    89.12
1000          79.58    75.38
  (Top 50)    98.00    98.40
20000         62.43    58.14
  (Top 100)   93.59    93.39
Fusion
Identification Task
Verification Task
LDR
LFR
Question: if we find optimal $f_N$ and $f_1$, is it necessarily $f_N = f_1$?
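Fusion of Recognizers: Type III
Recognizer scores for each lexicon entry:
            LDR    LFR
Amherst     5.6    .52
Buffalo     7.4    .81
…           …      …
Identification task: combine the scores of every entry, $S_1 = f_N(s_1^1, s_1^2)$, $S_2 = f_N(s_2^1, s_2^2)$, …, and return $\arg\max_{i=1,\dots,N} S_i$
Verification task: for the claimed entry "Amherst" with scores 5.6 and .52, compute $S = f_1(s^1, s^2)$, then Accept or Reject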
Traditional Fusion Rules
• Sum rule: $f_1(s^1, s^2) = s^1 + s^2$
• Weighted sum rule: $f_1(s^1, s^2) = w_1 s^1 + w_2 s^2$
• Product rule: $f_1(s^1, s^2) = s^1 s^2$
• Max rule: $f_1(s^1, s^2) = \max(s^1, s^2)$
• Rank-based methods: with $r_i^1 = \mathrm{rank}(s_i^1, \{s_1^1, \dots, s_N^1\})$, either $f_1(s_i^1, s_i^2) = r_i^1 + r_i^2$ or $f_1(s_i^1, s_i^2) = P(r_i^1, r_i^2 \mid gen)$
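The fixed fusion rules are simple enough to write down directly. The sketch below assumes both recognizers produce scores where higher is better; distance-like scores such as the LDR values on the slides would first have to be negated or normalized. The weights, the toy score lists, and the helper `identify` are hypothetical.

```python
def sum_rule(s1, s2):
    return s1 + s2


def weighted_sum_rule(s1, s2, w1=0.5, w2=0.5):
    # w1 and w2 are placeholder weights; in practice they would be trained.
    return w1 * s1 + w2 * s2


def product_rule(s1, s2):
    return s1 * s2


def max_rule(s1, s2):
    return max(s1, s2)


def ranks(scores):
    """Rank of each score within its own list; rank 1 is the highest score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def rank_sum_rule(scores1, scores2):
    """Sum of the two ranks per class; here a lower value is better."""
    return [r1 + r2 for r1, r2 in zip(ranks(scores1), ranks(scores2))]


def identify(scores1, scores2, rule):
    """Index of the class with the highest combined score."""
    combined = [rule(a, b) for a, b in zip(scores1, scores2)]
    return max(range(len(combined)), key=combined.__getitem__)


# Toy scores for three lexicon entries from two recognizers (made up).
recognizer1 = [0.52, 0.81, 0.10]
recognizer2 = [0.40, 0.70, 0.90]
```

On these toy scores the sum and product rules pick the second entry while the max rule picks the third, which shows that the choice of rule can change the identification result.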
Likelihood Ratio: Verification Tasks
[Figure: scatter plot of impostor and genuine samples; x-axis: recognizer score 1, y-axis: recognizer score 2]
• 2 classes: impostor and genuine
• Pattern classification task
$f_{lr}(s^1, s^2) = \frac{p_{gen}(s^1, s^2)}{p_{imp}(s^1, s^2)}$
Minimum risk criterion: optimal decision boundaries coincide with the contours of the likelihood ratio function: $f_V = f_{lr}$
Metaclassification with NN, SVM, etc. is also possible
[Prabhakar, Jain 02] [Nandkumar, Jain, Das 08]
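As a rough illustration of the likelihood ratio rule, the sketch below models the genuine and impostor score densities as products of one-dimensional Gaussians and accepts when the ratio exceeds a threshold. The Gaussian model, the toy training pairs, and the threshold of 1.0 are all assumptions; the slides do not say how the densities are estimated.

```python
from math import exp, pi, sqrt
from statistics import mean, pstdev


def fit(samples):
    """Fit an independent Gaussian (mean, std) to each score dimension."""
    return [(mean(column), pstdev(column)) for column in zip(*samples)]


def density(model, scores):
    """Product of the per-dimension Gaussian densities."""
    p = 1.0
    for (mu, sigma), x in zip(model, scores):
        p *= exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))
    return p


def likelihood_ratio(gen_model, imp_model, scores):
    return density(gen_model, scores) / density(imp_model, scores)


def verify(gen_model, imp_model, scores, threshold=1.0):
    """Accept (True) when the likelihood ratio reaches the threshold."""
    return likelihood_ratio(gen_model, imp_model, scores) >= threshold


# Made-up training pairs (score 1, score 2) for the two classes.
genuine = [(0.8, 0.7), (0.9, 0.8), (0.7, 0.9)]
impostor = [(0.2, 0.3), (0.3, 0.1), (0.1, 0.2)]
gen_model, imp_model = fit(genuine), fit(impostor)
```

Moving the threshold trades false accepts against false rejects, which is what traces out the ROC curve for the verification task.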
Optimal Combination Functions
Verification Task Results (ROC): $f_V = f_{LR}$
Identification Task Results (top choice correct rate):
  LFR is correct: 54.8%
  LDR is correct: 77.2%
  Both are correct: 48.9%
  Either is correct: 83.0%
  Likelihood Ratio: 69.8%
  Weighted Sum: 81.6%
• LR combination is worse than a single matcher (69.8% vs. 77.2% for LDR alone)
Independence of Scores: In a Single Trial
            LDR    LFR    Combined score
Amherst     5.6    .52    $f(s_1^1, s_1^2)$
Buffalo     7.4    .81    $f(s_2^1, s_2^2)$
…           …      …      …
In general, for lexicon entries 1, …, i, …, N and M recognizers:
$S_i = f(s_i^1, s_i^2, \dots, s_i^M, \{s_k^1, s_k^2, \dots, s_k^M\}_{k \neq i})$
Independence of Scores: In a Single Trial
Recognizer 1
Recognizer M
Dependent
Dependent
Tulyakov & Govindaraju, TIFS 2009
Independent?
Optimal Combination?: $f_N = f_{lr}$
               Set size   LFR     LDR     Both correct   Either correct   LR      Weighted sum
Correct rate              54.8%   77.2%   48.9%          83.0%            69.8%   81.6%
Count          6147       3366    4744    3005           5105             4293    5015
Correlated Scores
        2nd choice   3rd choice   4th choice   Mean
LFR     .4359        .4755        .4771        .1145
LDR     .7885        .7825        .7673        .5685
Dependent on input signal
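Optimal Trainable Combination Function
Minimizing misclassification cost: classify as $i$ rather than $j$ if
$p(s_1^1, s_1^2, \dots, s_N^1, s_N^2 \mid i) > p(s_1^1, s_1^2, \dots, s_N^1, s_N^2 \mid j)$
Assume that scores assigned to different classes are independent:
$p(s_1^1, s_1^2, \dots, s_N^1, s_N^2 \mid i) = p(s_1^1, s_1^2 \mid i) \cdots p(s_i^1, s_i^2 \mid i) \cdots p(s_N^1, s_N^2 \mid i) = p_{imp}(s_1^1, s_1^2) \cdots p_{gen}(s_i^1, s_i^2) \cdots p_{imp}(s_N^1, s_N^2)$
The decision rule becomes
$p_{imp}(s_1^1, s_1^2) \cdots p_{gen}(s_i^1, s_i^2) \cdots p_{imp}(s_N^1, s_N^2) > p_{imp}(s_1^1, s_1^2) \cdots p_{gen}(s_j^1, s_j^2) \cdots p_{imp}(s_N^1, s_N^2)$
and, after dividing both sides by $\prod_{k=1}^{N} p_{imp}(s_k^1, s_k^2)$,
$\frac{p_{gen}(s_i^1, s_i^2)}{p_{imp}(s_i^1, s_i^2)} > \frac{p_{gen}(s_j^1, s_j^2)}{p_{imp}(s_j^1, s_j^2)}$
$f_N$: $\arg\max_{i=1,\dots,N} f_{lr}(s_i^1, s_i^2)$
Tulyakov & Govindaraju IJPRAI 2009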
Combination Methods: Identification Tasks
[Figures: scatter plots of impostor and genuine samples; x-axis: recognizer score 1, y-axis: recognizer score 2]
No!
Traditional training mixes the genuine and impostor scores from different trials.
Model training MUST process scores from one identification trial as a single training sample.
Combination Methods: Identification Tasks
• Initialize a combination function
• Get scores from the same identification trial (for all trials)
• Update the function so that the genuine score is better than any impostor score
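Combination function: $f(s^1, s^2, \dots, s^M)$
Likelihood ratio form: $f(\cdot) = \frac{p_{gen}(s_i^1, s_i^2, \dots, s_i^M)}{p_{imp}(s_i^1, s_i^2, \dots, s_i^M)}$
Functions trained this way:
• Best Impostor Function
• Sum of Logistic Functions
[Equation for the sum of logistic functions not recoverable from the extracted text]
Iterative Methods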
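The per-trial training loop can be illustrated with a perceptron-style update of a weighted sum. This is a stand-in chosen for brevity, not the best impostor or logistic sum function from the talk: the update rule, learning rate, epoch count, and toy trials are all assumptions. What it does share with the slides is that each identification trial is processed as one training sample and the genuine score is compared against the best impostor of that same trial.

```python
def combine(weights, scores):
    """Weighted sum of the M recognizer scores for one class."""
    return sum(w * s for w, s in zip(weights, scores))


def train(trials, n_scores, epochs=20, rate=0.1):
    """trials: list of (scores_per_class, genuine_index) pairs, one per trial."""
    weights = [1.0] * n_scores  # initialize the combination function
    for _ in range(epochs):
        for scores, genuine in trials:
            impostors = [s for k, s in enumerate(scores) if k != genuine]
            # Best impostor of this trial under the current function.
            best_imp = max(impostors, key=lambda s: combine(weights, s))
            if combine(weights, scores[genuine]) <= combine(weights, best_imp):
                # Move the weights so the genuine class scores higher.
                weights = [
                    w + rate * (g - b)
                    for w, g, b in zip(weights, scores[genuine], best_imp)
                ]
    return weights


def identify(weights, scores):
    return max(range(len(scores)), key=lambda k: combine(weights, scores[k]))


# Two toy trials; each class has (recognizer 1 score, recognizer 2 score).
TRIALS = [
    ([(0.6, 0.2), (0.5, 0.8)], 0),
    ([(0.9, 0.5), (0.3, 0.6)], 0),
]
weights = train(TRIALS, n_scores=2)
```

With equal initial weights the first toy trial is misidentified; the updates gradually lower the weight of the second recognizer until the genuine class wins in both trials.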
Top choice correct rate (%)
             Likelihood Ratio   Weighted Sum   Best Impostor Likelihood Ratio   Logistic Sum   Neural Network
LFR & LDR    69.84              81.58          80.07                            81.43          81.67
li & C       97.24              97.23          97.01                            97.34          97.39
li & G       95.90              95.47          95.99                            96.17          96.29
Outline
Recognition: Postal Applications, Paradigms, Fusion
Search: Lexicon Reduction, Word Spotting, IR Models
Search for Handwritten Documents
              Good Quality     Historical      Medical
Lexicon       10K     1K       10K     1K      4K
Top 1 (%)     57      67       12      28      20
Top 3 (%)     69      72       22      44      27
Top 10 (%)    74      75       32      72      42
• Lexicons are typically large: >5K
• Need around 70% accuracy
Strategy
• Reduce lexicon size using topic categorization (DAS 06; 08)
• Use Top-N choices returned by OCR (ICDAR 07)
• Pre Hospital Care Report (PCR)
  - WNY: 250,000 filed a year
  - NYC: 50,000 filed in a day
  - PDAs not popular
• OHR issues
  - Loosely constrained writing style
  - Large lexicons
  - Heterogeneous data
• 6,700 carbon forms stored at 300 DPI
• 1000 PCR forms ground truthed
Search Engine for Medical Forms
[Figure: handwritten forms feeding a search engine]
• Find all people who reported asthma problems in NY
• How many people with high blood pressure are on medication X?
• Is there an epidemic breaking?
Topic Categorization: Lexicon Reduction
[Figure: processing flow]
1. Handwritten medical documents
2. Lex Free recognition, large lexicon (> 5K): ICR features
3. Topic categorization
4. Select reduced lexicon (~2.5K)
5. Lex Driven recognition: ICR features, index
~33% word recognition rate (10 points gain)
$cohesion(w_a, w_b) = z \cdot \frac{f(w_a, w_b)}{\sqrt{f(w_a) \cdot f(w_b)}}$
DIGESTIVE-SYSTEM
FQ    CHSN    PHRASE
30    0.72    PAIN INCIDENT
5     0.31    PAIN TRANSPORTED
42    0.54    PAIN CHEST
52    0.81    STOMACH PAIN
9     0.25    HOME PAIN
6     0.43    VOMITING ILLNESS
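Topic Features (Chu-Carroll, et al., 1999)
$B_{t,c} = \frac{A_{t,c}}{\sqrt{\sum_{e=1}^{n} A_{t,e}^2}}$
$IDF(t) = \log_2 \frac{n}{c(t)}$
$X_{t,c} = IDF(t) \cdot B_{t,c}$
$z = \cos(x, y) = \frac{x \cdot y^T}{\sqrt{\sum_{i=1}^{n} x_i^2 \cdot \sum_{i=1}^{n} y_i^2}}$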
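The topic features can be computed in a few lines. The sketch below assumes `A[t][c]` holds the count of term t in category c, that n is the number of categories, and that c(t) is the number of categories containing term t; the toy matrix is invented, and returning 0 for a zero vector in `cosine` is a convenience not taken from the slide.

```python
from math import log2, sqrt


def topic_vectors(A):
    """Return X with X[t][c] = IDF(t) * B[t][c] for a term-by-category matrix A."""
    n = len(A[0])  # number of categories
    X = []
    for row in A:
        norm = sqrt(sum(a * a for a in row))  # normalizer of B[t][c]
        c_t = sum(1 for a in row if a > 0)    # categories containing the term
        idf = log2(n / c_t)
        X.append([idf * a / norm for a in row])
    return X


def cosine(x, y):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    denominator = sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    if denominator == 0:
        return 0.0
    return sum(a * b for a, b in zip(x, y)) / denominator


# Toy counts: 3 terms (rows) by 2 categories (columns); every term occurs somewhere.
A = [[3, 0], [1, 1], [0, 4]]
X = topic_vectors(A)
category_0 = [row[0] for row in X]
category_1 = [row[1] for row in X]
```

A term that occurs in every category gets IDF 0 and so contributes nothing to either category vector, which is the intended effect of the weighting.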
Topic Categorization: Results
              CLT to RLT   CL to RL   CLT to ALT   CLT to SLT
HR            7.48%        7.42%      17.58%       7.42%
Error Rate    10.78%       10.88%     24.53%       10.21%
C: complete lexicon; R: reduced lexicon; A: category given; S: features synthetic; T: truth present
Urgent Issue of our Times
Vast, irreplaceable, culturally vital legacy
collections of historical documents are competing
ineffectively for attention with billions of digital
documents
Thus historical archives are threatened with
neglect, perceived irrelevance, …. & eventually,
oblivion?
Threat: ‘If it’s not in Google, it doesn’t exist!’
Baird 2003
What is possible today?
• View Document Images
Document Enhancement
[Shi, Setlur, and Govindaraju 2008]
Transcript-Mapping
1787 Thomas Jefferson letter and its transcript
[Figure: image and transcript]
What is not possible today?
Multilingual Document Corpus
Retrieved Documents
English
Hindi Sanskrit
Translations of “strength”
Crosslingual Retrieval
SEARCH: Handwritten Documents
• Image-based: use image-based features
• OCR-based: use OCR recognition results
Image-Based Methods (Rath 07 IJDAR)
• Query rendered
• Poor performance in multiple-writer scenarios
SEARCH: Handwritten Documents
• Image-based: use image-based features
• OCR-based: use OCR recognition results
[Figure labels: Handwriting Recognition, Indexing, Retrieval]
Vector IR Model (TF-IDF)
Set of terms $\{t_i\}$; set of documents $\{d_j\}$ of length $\{L_j\}$
Term Frequency (TF): $tf_{i,j} = \frac{freq_{i,j}}{L_j}$
Inverse Document Frequency (IDF): $idf_i = \log \frac{\#\{d\}}{\#\{d_j \mid freq_{i,j} > 0\}}$
Query TF: $tf_{i,q} = 1$ if $t_i$ is in query $q$, $0$ otherwise
Similarity: $\mathrm{sim}(d_j, q) = \sum_i tf_{i,j} \cdot idf_i \cdot tf_{i,q}$
Example with $q = \{\text{"back"}, \text{"pain"}\}$:
terms    tf_{i,j}   idf_i   tf_{i,q}
back     0.024      4.1     1
pain     0.008      2.4     1
...      ...        ...     0
[Baeza-Yates99]
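A small sketch of the vector model, using the "back" and "pain" values from the slide example. The function names are hypothetical, and the natural logarithm in `idf` is an assumption since the slide does not give the base.

```python
from math import log


def tf(freq, doc_length):
    """Term frequency: occurrences of the term divided by document length."""
    return freq / doc_length


def idf(n_docs, n_docs_with_term):
    """Inverse document frequency (natural log assumed)."""
    return log(n_docs / n_docs_with_term)


def similarity(doc_tf, idf_by_term, query_terms):
    """sim(d_j, q): sum of tf * idf over the terms that occur in the query."""
    return sum(
        doc_tf.get(term, 0.0) * idf_by_term.get(term, 0.0)
        for term in set(query_terms)
    )


# Values from the slide example.
doc_tf = {"back": 0.024, "pain": 0.008}
idf_by_term = {"back": 4.1, "pain": 2.4}
score = similarity(doc_tf, idf_by_term, ["back", "pain"])
```

For the slide values the similarity is 0.024 × 4.1 + 0.008 × 2.4 = 0.1176; terms outside the query have query TF 0 and drop out of the sum.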
Modifications to VM
Classic VM: computes the tf and idf from the OCR'ed text (top-1)
$tf_{i,j} = \frac{freq_{i,j}}{L_j}$, $idf_i = \log \frac{\#\{d\}}{\#\{d_j \mid freq_{i,j} > 0\}}$
Modified VM: computes the tf and idf from the top-n choices of word recognition
$tf_{i,j}^{ocr} = \frac{E\{freq_{i,j}\}}{L_j}$, $idf_i^{ocr} = \log \frac{\#\{d\}}{\#\{d_j \mid E\{freq_{i,j}\} > 0.5\}}$
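Required Inputs
• Word segmentation result
• Word recognition likelihoods
Estimation
Word images of document $d_j$: $[w_1\ w_2\ \dots\ w_{L_j}]$
$E\{freq_{i,j}\} = \sum_{k=1}^{L_j} \Pr(t_i \mid w_k)$
Example: $\Pr(\text{"pain"} \mid w_k)$ = 0.02, 0.01, 0.2, 0.01, 0.01, … over the word images of Doc $d_j$; their sum gives $E\{freq_{\text{"pain"},j}\}$
[Rath 04, Howe 05]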
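The modified vector model replaces counts with expected counts. The sketch below assumes each word image is represented by a dictionary of term posteriors; the five "pain" probabilities are the ones shown on the slide, the other documents are invented, and returning 0 from `idf_ocr` when no document passes the 0.5 threshold is a choice made here to avoid dividing by zero.

```python
from math import log


def expected_freq(word_posteriors, term):
    """E{freq}: sum of Pr(term | w_k) over the word images of one document."""
    return sum(p.get(term, 0.0) for p in word_posteriors)


def tf_ocr(word_posteriors, term):
    """Expected frequency divided by the number of word images."""
    return expected_freq(word_posteriors, term) / len(word_posteriors)


def idf_ocr(documents, term, threshold=0.5):
    """Count a document as containing the term when E{freq} exceeds 0.5."""
    hits = sum(1 for d in documents if expected_freq(d, term) > threshold)
    return log(len(documents) / hits) if hits else 0.0


# Posteriors for "pain" in the first five word images, as on the slide.
doc = [{"pain": p} for p in (0.02, 0.01, 0.2, 0.01, 0.01)]
# Two more hypothetical documents for the idf computation.
documents = [doc, [{"pain": 0.9}], [{"back": 0.8}]]
```

For the five word images shown, the expected frequency of "pain" is 0.25, which stays below the 0.5 threshold, so this document would not count towards the document frequency of "pain".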
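Estimating Term Frequency
$E\{freq_{i,j}\} = \sum_{I_w} \Pr(I_w) \cdot \Pr(t_i \mid I_w)$, summed over the word-image hypotheses $I_w$ of document $d_j$
Terms $\{t_i\}$: $t_0$ = "head", $t_1$ = "arm", $t_2$ = "pelvis", ...
Hypothesis I_w         1      2      3       4      ...
Pr(I_w)                1      1      0.5     1      ...
Pr("head" | I_w)       0.2    0.07   0.8     0.01   ...
Pr("arm" | I_w)        0.05   0.7    0.01    0.07   ...
Pr("pelvis" | I_w)     0.01   0.01   0.002   0.03   ...
$E\{freq_{1,j}\} = \sum_{I_w} \Pr(I_w) \cdot \Pr(\text{"arm"} \mid I_w) = 1 \times 0.05 + 1 \times 0.7 + 0.5 \times 0.01 + 1 \times 0.07 + \dots$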
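The worked sum for "arm" can be reproduced with a short function. The data layout (a list of pairs of segmentation probability and term posteriors) is an assumption; the four hypotheses use the values that appear in the worked sum on the slide, and the remaining hypotheses of the document are left out.

```python
def expected_freq(hypotheses, term):
    """Sum of Pr(I_w) * Pr(term | I_w) over word-image hypotheses."""
    return sum(
        p_segment * posteriors.get(term, 0.0)
        for p_segment, posteriors in hypotheses
    )


# (Pr(I_w), {term: Pr(term | I_w)}) for the first four hypotheses.
HYPOTHESES = [
    (1.0, {"arm": 0.05}),
    (1.0, {"arm": 0.7}),
    (0.5, {"arm": 0.01}),
    (1.0, {"arm": 0.07}),
]
partial = expected_freq(HYPOTHESES, "arm")
```

The four hypotheses contribute 0.825 in total; the third one counts for half because its segmentation is only half as likely.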
Estimating Segmentation
• Word segmentation: a gap $d$ between adjacent connected components above a threshold $D$ ($d > D$) separates words
• Generate multiple hypotheses with multiple $D$
• If hypothesis $I_w$ overlaps $m$ other hypotheses, then $\Pr(I_w) = \frac{1}{m+1}$
Example with 3 hypotheses:
m          1     2     1
Pr(I_w)    1/2   1/3   1/2
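The rule Pr(I_w) = 1/(m+1) is easy to check on the three-hypothesis example. In this sketch a hypothesis is a half-open interval of connected-component indices, which is an assumed representation: two thresholds D yield the two single words (0, 1) and (1, 2) and the merged word (0, 2).

```python
from fractions import Fraction


def overlaps(a, b):
    """True when two half-open intervals (start, end) share a component."""
    return a[0] < b[1] and b[0] < a[1]


def segmentation_probs(hypotheses):
    """Pr(I_w) = 1 / (m + 1), where m counts the other overlapping hypotheses."""
    probs = []
    for i, hypothesis in enumerate(hypotheses):
        m = sum(
            1
            for j, other in enumerate(hypotheses)
            if j != i and overlaps(hypothesis, other)
        )
        probs.append(Fraction(1, m + 1))
    return probs


# First word, merged word, second word.
HYPOTHESES = [(0, 1), (0, 2), (1, 2)]
probs = segmentation_probs(HYPOTHESES)
```

The merged word overlaps both single words (m = 2) while each single word overlaps only the merged one (m = 1), giving the probabilities 1/2, 1/3, 1/2 from the slide.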
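Word Recognition: estimating $\Pr(t_i \mid I_w)$
Methods: Top-Rank (Top-S candidates involved), Weighted Top-Rank, Empirical
• Top-Rank: $\Pr(t_i \mid I_w) = \frac{1}{S}$ if $1 \le \mathrm{rank}(t_i) \le S$, $0$ otherwise
• From OCR rates: $\Pr(t_i \mid I_w) = \text{top-}R \text{ OCR rate} - \text{top-}(R-1) \text{ OCR rate}$, where $R = \mathrm{rank}(t_i)$
• From recognition distances $d_i$: $\Pr(t_i \mid I_w) = \frac{\Pr(t_i) \, e^{-d_i^2 / 2\sigma^2}}{\sum_i \Pr(t_i) \, e^{-d_i^2 / 2\sigma^2}}$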
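The three ways of turning recognizer output into Pr(t_i | I_w) can be sketched as follows. Function names are hypothetical, the Gaussian form with a shared sigma and a negative exponent is a reading of a garbled equation and should be treated as an assumption, and the example rates 0.965 and 0.987 are the top-1 and top-2 LDR rates for a lexicon of size 10 quoted earlier in the deck.

```python
from math import exp


def top_rank(rank, s):
    """Uniform probability 1/S for the top-S candidates, 0 otherwise."""
    return 1.0 / s if 1 <= rank <= s else 0.0


def rank_rate(rank, top_rates):
    """Top-R OCR rate minus top-(R-1) OCR rate.
    top_rates[r - 1] is the top-r rate as a fraction; the top-0 rate is 0."""
    if not 1 <= rank <= len(top_rates):
        return 0.0
    previous = top_rates[rank - 2] if rank > 1 else 0.0
    return top_rates[rank - 1] - previous


def distance_posterior(distances, priors, sigma):
    """Normalize prior * exp(-d^2 / (2 sigma^2)) over the candidate terms."""
    weights = [
        p * exp(-d * d / (2 * sigma * sigma))
        for d, p in zip(distances, priors)
    ]
    total = sum(weights)
    return [w / total for w in weights]


posterior = distance_posterior([5.0, 7.2, 7.6], [1 / 3, 1 / 3, 1 / 3], sigma=2.0)
```

With equal priors the candidate at the smallest distance receives the largest posterior, and the posteriors sum to 1 by construction.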
Thank you!