Recognition of Fragmented Characters Using Multiple Feature-Subset Classifiers

C.H. Chou, C.Y. Guo, and F. Chang
Institute of Information Science, Academia Sinica, Taiwan
Inter. Conf. Document Analysis and Recognition 2007

Transcript

Page 1: Recognition of Fragmented Characters Using Multiple Feature-Subset Classifiers

Institute of Information Science, Academia Sinica, Taiwan

C.H. Chou, C.Y. Guo, and F. Chang

Inter. Conf. Document Analysis and Recognition 2007

Page 2:

Introduction

Recognizing fragmented (broken) characters in printed documents of poor printing quality.

A complement to ordinary mending techniques.

Only intact characters are used as training samples.

Multiple features are applied to enhance recognition accuracy.

The resultant classifiers can classify both intact and fragmented characters with a high degree of accuracy.

Page 3:

Example

Samples from Chinese newspapers published between 1951 and 1961.

(a) most severe (b) less severe (c) least severe

Page 4:

Feature Extraction

Binary image: each pixel is represented by 1 (black) or 0 (white).

LD (Linear Normalization + Density Feature)
  Invariant to character fragmentation.
  LN → Reduction.
  Feature vector consists of 256 components, with values in the range [0, 16].

ND (Nonlinear Shape Normalization + Direction Feature)
  Invariant to shape deformation.
  NSN → Contour → 4 Direction map → Blurring → Reduction.
  Feature vector consists of 256 components, with values in the range [0, 255].
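
The density feature can be illustrated with a minimal sketch. The slide gives only the vector length (256) and the value range ([0, 16]); the 64×64 normalized image size and the 4×4 block size used here are assumptions chosen to be consistent with those numbers, and `density_feature` is a hypothetical name, not the authors' code.

```python
def density_feature(image, block=4):
    """Count black pixels (value 1) in each block x block cell, row-major.

    Assumption: the normalized image is square and its side is divisible
    by `block`; a 64x64 image with block=4 gives 16 * 16 = 256 counts,
    each between 0 and 16.
    """
    size = len(image)
    if size == 0 or size % block or any(len(row) != size for row in image):
        raise ValueError("image must be square with a side divisible by block")
    features = []
    for top in range(0, size, block):
        for left in range(0, size, block):
            count = sum(
                image[y][x]
                for y in range(top, top + block)
                for x in range(left, left + block)
            )
            features.append(count)
    return features


# Hypothetical 64x64 image: top half black (1), bottom half white (0).
image = [[1] * 64 if y < 32 else [0] * 64 for y in range(64)]
ld = density_feature(image)
print(len(ld), min(ld), max(ld))  # 256 0 16
```

Because each component is a pixel count over a small region, losing a few stroke pixels to fragmentation changes the affected components only slightly, which is one plausible reading of why the slide calls LD invariant to fragmentation.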

Page 5:

Random Subspace Method

The Random Subspace Method (RSM) randomly selects a certain number of subspaces from the original feature space and trains a classifier on each subspace.

Each set of training samples is derived from a set of feature vectors projected into a subspace.

Subspace projection: ordinary feature vectors are projected to sub-characters.

Randomly select a small number of dimensions from an ordinary feature vector.

Subspace dimensions (w) used: 32, 64, 128.
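
The subspace selection and projection described on this slide can be sketched as follows. This is an illustration only: the function names, the fixed seed, and the choice of five subspaces are assumptions, while the 256-component vector length and w = 32 come from the slides.

```python
import random


def sample_subspaces(dim, w, count, seed=0):
    """Draw `count` random subspaces, each a sorted list of w distinct
    dimension indices chosen from range(dim)."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return [sorted(rng.sample(range(dim), w)) for _ in range(count)]


def project(vector, indices):
    """Project a feature vector onto a subspace by keeping only the
    selected components."""
    return [vector[i] for i in indices]


# Hypothetical setup: 256-dimensional feature vectors, w = 32, 5 subspaces.
subspaces = sample_subspaces(dim=256, w=32, count=5)
vector = list(range(256))  # stand-in feature vector
sub_vectors = [project(vector, idx) for idx in subspaces]
print(len(sub_vectors), len(sub_vectors[0]))  # 5 32
```

Each projected training set would then be used to train its own classifier, so that the ensemble contains one classifier per subspace.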

Page 6:

Random Subspace Method

Page 7:

Random Subspace Method

Voting
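
The slide does not say which voting rule combines the per-subspace classifiers. As a hedged illustration, the sketch below assumes simple plurality voting with ties broken in favor of the label that appears first; `plurality_vote` is a hypothetical helper, not part of the original method description.

```python
from collections import Counter


def plurality_vote(predictions):
    """Return the label predicted by the most classifiers.

    Assumption: ties are broken in favor of the label that appears
    first in `predictions`.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical outputs of five subspace classifiers for one test character.
votes = ["A", "B", "A", "C", "A"]
print(plurality_vote(votes))  # A
```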

Page 8:

Filter Model of Feature Selection

RSM

Page 9:

Wrapper Model of Feature Selection

Page 10:

Architecture of the proposed method

Page 11:

An Example

Page 12:

Classification Methods

Page 13:

Experimental results

Page 14:

The accuracy of different classification methods

Multiple classifiers outperform single classifiers.

The hybrid feature always outperforms both the LD and ND features.

GCNNs achieve higher accuracy than CARTs.

Page 15:

Computation time of the two classification methods.

Page 16:

The accuracy for three types of test documents

LD outperforms ND for the most severe and less severe data.

ND is better than LD for the least severe data.

The hybrid feature has better accuracy than either LD or ND.

Page 17:

CARTs vs. GCNNs

The accuracy rates of CARTs and GCNNs with an increasing number of classifiers and different subspace dimensions w.

The more classifiers, the better the accuracy.

GCNNs require fewer classifiers to achieve saturation accuracy than CARTs.

Page 18:

CARTs vs. GCNNs

Page 19:

Conclusion

We propose a learning approach to deal with both intact and fragmented characters in archived newspapers.

The multiple predictors achieve much higher accuracy rates than single classifiers.

The hybrid predictors, which use both types of features, perform better than those using only a single feature.

Predictors built with the GCNN rule achieve higher accuracy, and require fewer classifiers, than those generated by the CART algorithm.