An E-team on statistical techniques for unsupervised segmentation and classification
E. Salerno
CNR – Istituto di Scienza e Tecnologie dell’Informazione
Pisa, Italy
MUSCLE Joint WP5-WP7 Focus Meeting, Rocquencourt, December 2005
Overview
• Unsupervised processing: Why?
• Statistical approach
• What we have done
• What we propose
• What we would like to share with partners
Unsupervised processing: why?
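Unsupervised processing is often essential in important applications:
• Document image analysis: show-through cancellation, OCR
[Example image: a printed Spanish page affected by show-through, reading "Yo no quiero encarecerte el servicio que te hago en darte a conocer tan notable y tan honorado caballero; pero quiero que me agradezcas ..." ("I do not wish to overstate the service I do you in introducing you to so notable and so honourable a knight; but I do want you to thank me ...")]
• Remote sensing: thematization, classification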
Statistical approach
Problem setting
• A data model
• A source model
• A statistically significant data sample
• Learn the model (use statistics)
• Estimate the sources (inverse problem)
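The slide lists the ingredients without equations. A minimal formalization, assuming the linear instantaneous mixture model commonly used for multichannel document images (the slides do not say which data model is used): x(t) ∈ R^M collects the M observed channels (color or spectral bands) at pixel t, s(t) ∈ R^N the N source patterns (e.g. main text, show-through, background), A ∈ R^{M×N} is the unknown mixing matrix and n(t) is noise. The source model is a probabilistic description of s (for ICA, mutually independent components); learning the model means estimating A, or directly a separating matrix W, from the sample of T pixels; the inverse problem is then solved, in the simplest least-squares case (M ≥ N, Â of full column rank), as:

```latex
\begin{align*}
  \mathbf{x}(t) &= A\,\mathbf{s}(t) + \mathbf{n}(t), && t = 1, \dots, T
    && \text{(data model)}\\
  \hat{\mathbf{s}}(t) &= W\,\mathbf{x}(t), && W = (\hat{A}^{\top}\hat{A})^{-1}\hat{A}^{\top}
    && \text{(source estimate)}
\end{align*}
```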
Statistical approach
Methods
• Independent component analysis
• Dependent component analysis
• Bayesian approaches
Applications
• Multispectral data analysis
• Multisensor data analysis
• Multiview data analysis
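Independent component analysis, the first method listed, can be illustrated with a minimal NumPy sketch of symmetric FastICA (Hyvärinen's fixed-point algorithm with the tanh nonlinearity) on a synthetic two-source mixture. This is not the ISTI software mentioned later in the talk; the sources, the mixing matrix and all function names are hypothetical choices for illustration. As with any ICA method, sources are recovered only up to order, sign and scale.

```python
import numpy as np


def whiten(x):
    """Center x (channels x samples) and whiten it so its covariance is the identity."""
    xc = x - x.mean(axis=1, keepdims=True)
    eigvals, eigvecs = np.linalg.eigh(np.cov(xc))
    w_white = eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T
    return w_white @ xc


def sym_decorrelate(w):
    """Return (W W^T)^(-1/2) W, computed from the SVD W = U S V^T as U V^T."""
    u, _, vt = np.linalg.svd(w)
    return u @ vt


def fast_ica(x, n_iter=200, tol=1e-8, seed=0):
    """Symmetric FastICA with g = tanh; returns the estimated sources, one per row."""
    z = whiten(x)
    n, m = z.shape
    rng = np.random.default_rng(seed)
    w = sym_decorrelate(rng.standard_normal((n, n)))
    for _ in range(n_iter):
        g = np.tanh(w @ z)
        # Fixed-point update for every row w_i: E[z g(w_i^T z)] - E[g'(w_i^T z)] w_i
        w_new = (g @ z.T) / m - np.diag((1.0 - g**2).mean(axis=1)) @ w
        w_new = sym_decorrelate(w_new)
        # Converged when each new row is (anti)parallel to the previous one
        converged = np.max(np.abs(np.abs(np.diag(w_new @ w.T)) - 1.0)) < tol
        w = w_new
        if converged:
            break
    return w @ z


# Synthetic example: a square wave and uniform noise, linearly mixed
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 2000)
s = np.vstack([np.sign(np.sin(2 * np.pi * 5 * t)), rng.uniform(-1.0, 1.0, t.size)])
a = np.array([[1.0, 0.6], [0.4, 1.0]])
x = a @ s
s_hat = fast_ica(x)
```

Symmetric decorrelation estimates all components at once, so an error in one component does not propagate to the others as it can in one-by-one (deflation) extraction.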
What we have done in document image analysis
[Figure panels: Original; Recovery of bleed-through]
Color decorrelation
What we have done in document image analysis
Attenuation of stains
Color decorrelation
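The slides show color decorrelation results without giving the procedure. One standard way to decorrelate the channels of a color scan, assumed here, is PCA whitening of the per-pixel color vectors; the method actually used by the authors may differ (for example, a symmetric whitening). The function name and the synthetic test image are hypothetical.

```python
import numpy as np


def decorrelate_colors(img):
    """PCA-whiten the color channels of an H x W x C image.

    Returns an H x W x C stack whose channels are mutually uncorrelated with
    unit variance, ordered from the highest- to the lowest-variance direction.
    """
    h, w, c = img.shape
    x = img.reshape(-1, c).T.astype(float)        # C x N: one column per pixel
    x -= x.mean(axis=1, keepdims=True)            # remove the mean color
    eigvals, eigvecs = np.linalg.eigh(np.cov(x))  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    y = (eigvecs.T @ x) / np.sqrt(eigvals)[:, None]  # project, then scale to unit variance
    return y.T.reshape(h, w, c)


# Hypothetical test image with three strongly correlated color channels
rng = np.random.default_rng(0)
base = rng.random((32, 32))
img = np.stack([base,
                0.8 * base + 0.2 * rng.random((32, 32)),
                0.5 * base + 0.5 * rng.random((32, 32))], axis=-1)
out = decorrelate_colors(img)
```

Decorrelation removes only second-order dependence; in a separation pipeline it is typically a preprocessing step before ICA, which also exploits higher-order statistics.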
What we have done in document image analysis
Independent component analysis
Text extraction from ancient palimpsests
[Figure panels: Data; Output 1; Output 2; Output 3]
© The Owner of the Archimedes Palimpsest
What we have done in document image analysis
Text separation from color document scans
Edge-preserving Bayesian approach
[Figure panels: main text pattern at convergence; main text outline at convergence; show-through pattern at convergence; show-through outline at convergence]
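The edge-preserving idea can be illustrated in a deliberately simplified setting: MAP denoising of a 1-D piecewise-constant signal with a Huber potential on neighboring differences. Small differences (mostly noise) are penalized quadratically, while the penalty grows only linearly across large jumps, so edges are not smeared out as they would be by a purely quadratic prior. This is a toy sketch under stated assumptions: the slides' method separates text layers from color scans and, judging by the figure captions, also estimates outline (edge) maps; the Huber prior, the parameter values and the function names here are hypothetical.

```python
import numpy as np


def huber_grad(d, delta):
    """Derivative of the Huber potential: d for |d| <= delta, delta * sign(d) beyond."""
    return np.clip(d, -delta, delta)


def map_denoise(y, lam=2.0, delta=0.2, step=0.05, n_iter=2000):
    """Gradient descent on E(u) = ||u - y||^2 + lam * sum_i huber(u[i+1] - u[i])."""
    u = y.copy()
    for _ in range(n_iter):
        psi = huber_grad(np.diff(u), delta)  # psi[i] = phi'(u[i+1] - u[i])
        # d/du[k] of the prior term is psi[k-1] - psi[k]
        reg_grad = np.zeros_like(u)
        reg_grad[1:] += psi
        reg_grad[:-1] -= psi
        u -= step * (2.0 * (u - y) + lam * reg_grad)
    return u


# Hypothetical piecewise-constant "intensity profile" plus Gaussian noise
rng = np.random.default_rng(0)
truth = np.concatenate([np.zeros(60), 2.0 * np.ones(80), 0.5 * np.ones(60)])
y = truth + 0.3 * rng.standard_normal(truth.size)
u = map_denoise(y)
```

The Huber potential is convex, so plain gradient descent with a small enough step (here step × (2 + 4·lam) < 2) converges to the unique MAP estimate; non-convex edge-preserving priors or explicit binary line processes preserve edges more sharply but need more careful optimization.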
What we have done in document image analysis
Other document image processing applications
• Watermark extraction
• Joint deblurring and separation
• Color restoration
• Show-through cancellation/extraction from recto-verso grayscale scans
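Why a recto-verso pair can be treated as a 2×2 mixture can be shown with a minimal sketch. It assumes, hypothetically, that both grayscale scans are converted to ink density (zero on blank paper) and already registered, that show-through is an attenuated copy of the other side's ink, and that the verso must be mirrored left-right to align with the recto. With the attenuation coefficients a and b known, cancellation reduces to solving a 2×2 linear system per pixel; in the unsupervised setting they would have to be estimated, for example with ICA or a Bayesian method. All names and values are illustrative.

```python
import numpy as np


def mirror(img):
    """Left-right flip, so that verso pixels line up with the recto (assumed page flip)."""
    return img[:, ::-1]


def simulate_scans(recto_ink, verso_ink, a, b):
    """Hypothetical linear show-through model in the ink-density domain."""
    recto_scan = recto_ink + a * mirror(verso_ink)
    verso_scan = verso_ink + b * mirror(recto_ink)
    return recto_scan, verso_scan


def cancel_show_through(recto_scan, verso_scan, a, b):
    """Invert the 2x2 mixture pixel by pixel, given known coefficients a and b."""
    x = np.stack([recto_scan.ravel(), mirror(verso_scan).ravel()])  # 2 x N data
    mixing = np.array([[1.0, a], [b, 1.0]])  # rows: recto scan, mirrored verso scan
    s = np.linalg.solve(mixing, x)           # s[0] = recto ink, s[1] = mirrored verso ink
    shape = recto_scan.shape
    return s[0].reshape(shape), mirror(s[1].reshape(shape))


rng = np.random.default_rng(0)
recto_ink = (rng.random((40, 30)) > 0.8).astype(float)  # sparse "text" strokes
verso_ink = (rng.random((40, 30)) > 0.8).astype(float)
recto_scan, verso_scan = simulate_scans(recto_ink, verso_ink, a=0.3, b=0.2)
recto_hat, verso_hat = cancel_show_through(recto_scan, verso_scan, a=0.3, b=0.2)
```

Real show-through is also blurred and not exactly linear, which is why joint deblurring and more specific data models appear among the proposed topics.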
What we propose
E-team on statistical techniques for unsupervised segmentation and classification
We are looking for partners with similar interests to collaborate on:
• Extensive experimentation with the available procedures on multispectral document data
• Development of specific data models for color/multispectral or grayscale recto-verso document images
• Ad-hoc registration procedures for recto and verso pages
• Joint deblurring-segmentation
• Training (exploit MUSCLE fellowships)
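For the recto-verso registration item, one standard starting point (assumed here, not taken from the slides) is phase correlation: the integer translation between the recto and the mirrored verso is the location of the peak of the inverse FFT of the normalized cross-power spectrum. The sketch handles only integer, circular shifts; rotation, scaling and local page warping, which real scans may need, are ignored. The function name is hypothetical.

```python
import numpy as np


def estimate_shift(ref, moving):
    """Integer (dy, dx) such that np.roll(ref, (dy, dx), axis=(0, 1)) matches moving."""
    cross = np.fft.fft2(moving) * np.conj(np.fft.fft2(ref))
    cross /= np.abs(cross) + 1e-12           # keep only the phase difference
    corr = np.fft.ifft2(cross).real          # ideally a delta at the shift
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = ref.shape
    # Map peak positions in the upper half of each axis to negative shifts
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)


rng = np.random.default_rng(0)
ref = rng.random((64, 64))
moving = np.roll(ref, shift=(5, -7), axis=(0, 1))
shift = estimate_shift(ref, moving)
```

For a recto-verso pair, the verso would first be mirrored (for example `verso[:, ::-1]`) and then passed as `moving` with the recto as `ref`.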
What we propose
What we would like to share with partners
• ICA software for text extraction
• Expertise in separation and deblurring procedures
• Graylevel recto-verso test database (Gerolamo Cardano’s Contradicentium Medicorum, 1663)
What we propose
People at ISTI
• Anna Tonazzini
• Ercan Kuruoglu
• Emanuele Salerno
• MUSCLE Fellow(s)
• Research collaborators