
Retrieval of the Ornaments from the Hand-Press Period: an Overview

Etienne Baudrier LSIIT (Illkirch, France)

Sébastien Busson CESR (Tours, France)

Silvio Corsini BCU (Lausanne, Switzerland)

Mathieu Delalandre CVC (Barcelona, Spain)

Jérôme Landré CReSTIC (Troyes, France)

Frédéric Morain-Nicolier CReSTIC (Troyes, France)

Plan

• About this work …

• Hand Press Period

• About Ornaments

• Digital Collections of Ornaments

• How can DIA help?

• Content Based Image Retrieval

• Visual Comparison

• Conclusions and Perspectives

About this work …

Computer Science People

1. Etienne Baudrier
2. Mickael Coustaty
3. Mathieu Delalandre
4. Nathalie Girard
5. Nicholas Journet
6. Dimosthenis Karatzas
7. Jérôme Landré
8. Kamel Ait-Mohand
9. Jean-Marc Ogier
10. Nicolas Ragot
11. Jean-Yves Ramel

Human Science People

1. Pierre Aquilon
2. Sébastien Busson
3. Silvio Corsini
4. Marie-Luce Demonet
5. Stephen Rawles
6. Toshinori Uetani

One-day workshop, 13th November 2007

CESR, Tours, France

Labs of Human Science, Labs of Computer Science

Hand Press Period (1/2)

The hand-press period runs from around 1454 (the approximate date of Gutenberg’s invention) through the first half of the nineteenth century (when mechanized presses started to appear).

a hand-press book

Timeline: 1454 (Gutenberg) to mid-19th century (mechanized presses)

Hand Press

hand press character matrix

Hand Press Period (2/2)

HPB Database (http://www.cerl.org/): 22 European libraries, 1450 to mid-19th century

3 million books

Trinity old library (Dublin, Ireland): 16th century to today

Mathematics, medicine, history, music, religion, literature, etc.

About Ornaments (1/2)

Ornaments in pages

“lettrine”: to start a paragraph

“fleuron”: trademark of a printing house

“cul de lampe”: to close a part or a chapter

“emblème”: to epitomize a concept, or to represent a person, such as a king or saint

Categories of ornaments

ornaments text

About Ornaments (2/2)

Per page: 3.4 ornaments/page

Per book: 103.4 ornaments/book

Foreground pixels [Journet’05]: text 63%, graphics 37%

Share of ornaments in books (BVH dataset, 46 books)

sciences, medical, religion …

Ornaments make up a large part of hand-press books.

Pictures were a powerful means of communication in this period because of the low level of education of the population.

Digital Collections of Ornaments (1/2)

1. Digitalization
2. Pre-processing (deskew, lighting correction, filtering, cropping…)
3. Layout analysis and segmentation [Ramel’07]
4. Expert classification using a thesaurus

Iconclass encoding of an emblem image:

25H112 rocks

44G312 prisoner; in fetters

91E461 punishment of Prometheus; he is chained to a rock, usually by Vulcan and/or Mercury

91E4611 an eagle tears at Prometheus' liver
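The Iconclass notations listed for the emblem are hierarchical: 91E4611 refines 91E461 by appending one character. The sketch below shows how such codes could support broader or narrower searches. It assumes the hierarchy can be approximated by plain string prefixes (real Iconclass notations also use bracketed text and keys, which are ignored here), and the helper names `is_broader` and `search` are hypothetical.

```python
# Iconclass annotations of the emblem image, copied from the slide.
annotations = {
    "25H112": "rocks",
    "44G312": "prisoner; in fetters",
    "91E461": "punishment of Prometheus; he is chained to a rock, "
              "usually by Vulcan and/or Mercury",
    "91E4611": "an eagle tears at Prometheus' liver",
}


def is_broader(parent: str, child: str) -> bool:
    """Hypothetical helper: True if `parent` is a strict prefix of `child`."""
    return child != parent and child.startswith(parent)


def search(prefix: str, codes) -> list:
    """Return, sorted, every code equal to or narrower than `prefix`."""
    return sorted(code for code in codes if code.startswith(prefix))


# Searching on a broad notation retrieves all of its refinements.
print(search("91E46", annotations))
```

A query on a short prefix retrieves every ornament annotated with a more specific notation, which is one reason a thesaurus-based classification is valuable despite its cost.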

Digital Collections of Ornaments (2/2)

Digital library   Size     Period      Web link
BVH               14 000   16th        http://www.bvh.univ-tours.fr
Fleuron            6 600   17th        http://dbserv1-bcu.unil.ch/ornements/scripts/
Impact             2 200   16th-18th   http://eclipsi.bib.ub.es/imp/impcat.htm
Mouriau            1 850   18th        http://www.ornements-typo-mouriau.be/
Moriane            1 500   18th        http://promethee.philo.ulg.ac.be/moriane/ornSearch.aspx
Total             26 150

Collections of ornaments are small compared with mass digitalization collections (e.g. the Million Book Project), for two main reasons:

(1) Mass digitalization projects are designed with OCR alone in mind: layout analysis serves only to separate text from graphics, the final electronic documents are “ASCII code”, and no high-level document model is used. Digitalization programs should take better account of graphics.

(2) Classification by human experts using a thesaurus is time-consuming (15-20 min per image). Collaborative platforms integrating document image analysis (DIA) components can help.

Other, smaller datasets include ArtDico, Canadian heraldry, Printers' Devices, etc.
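To give an order of magnitude for reason (2), the short calculation below combines the collection sizes and the 15-20 minutes per image quoted in the slides. It assumes, purely for illustration, that every image is classified once from scratch by an expert; the variable names are arbitrary.

```python
# Collection sizes taken from the list of digital libraries in the slides.
collections = {
    "BVH": 14_000,
    "Fleuron": 6_600,
    "Impact": 2_200,
    "Mouriau": 1_850,
    "Moriane": 1_500,
}
MINUTES_PER_IMAGE = (15, 20)  # expert classification time range from the slides

total_images = sum(collections.values())
# Convert the total classification time to hours for both ends of the range.
low_hours, high_hours = (total_images * m / 60 for m in MINUTES_PER_IMAGE)

print(f"{total_images} images: {low_hours:.0f} to {high_hours:.0f} expert hours")
```

Under that assumption the five collections represent roughly 6,500 to 8,700 hours of expert work, which illustrates why automatic assistance is attractive.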

How can DIA help? (1/2)

A duplicated block

Redundancy of ornaments in books

The same block used in 2 books

Printing houses: Vascosan 1555, Marnef 1576

exchange of blocks

copy

1531-1548

1511-1542

1555-1578

1497-1507

Tracking of plugs

noise

offset

precision

skewing

scaling

scalability, mass of data

low resolution, lossy compression

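How can DIA help? (2/2)

Diagram: retrieval and comparison workflow over the digital collections of ornaments (DB1, DB2, …, DBn, with metadata).

1. The user submits a query image to the CBIR system.
2. The CBIR system returns retrieval results (R1, R2, R3).
3. Visual comparison and visualization set the query image against the results.
4. Context information (publication dates, publication places, practices of printers…) supports the comparison.
5. The previous classification is assigned to the query image.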
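The workflow of this slide (submit a query, retrieve results, reuse an existing classification) can be sketched as follows. This is a toy illustration only: the records, the three-value signatures, the distance threshold and the function names `retrieve` and `propose_codes` are all hypothetical, and in practice the proposal would still be confirmed by an expert through visual comparison.

```python
import numpy as np

# Hypothetical records: each ornament has a signature (feature vector) and,
# when an expert has already classified it, a list of thesaurus codes.
database = [
    {"id": "DB1/a", "signature": np.array([0.90, 0.10, 0.00]), "codes": ["91E461"]},
    {"id": "DB2/b", "signature": np.array([0.10, 0.80, 0.10]), "codes": ["25H112"]},
    {"id": "DBn/c", "signature": np.array([0.85, 0.15, 0.00]), "codes": []},
]


def retrieve(query_signature, records, k=3):
    """Rank records by Euclidean distance to the query signature."""
    scored = [
        (float(np.linalg.norm(r["signature"] - query_signature)), r)
        for r in records
    ]
    scored.sort(key=lambda pair: pair[0])
    return scored[:k]


def propose_codes(results, max_distance=0.2):
    """Suggest the codes of the closest classified result, if near enough."""
    for distance, record in results:
        if distance <= max_distance and record["codes"]:
            return record["codes"]
    return []


query = np.array([0.88, 0.12, 0.00])
results = retrieve(query, database)
print([record["id"] for _, record in results], propose_codes(results))
```

Keeping retrieval and code proposal separate mirrors the diagram, where retrieval results pass through a comparison step before any classification is assigned.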

Content Based Image Retrieval

Ideal method
• High precision (small differences)
• Robust (noise, skew, offset)
• Invariant to scale
• Fast comparison (online, mass of data)
• Scalable

Bigun’96
1. Orientation radiograms (radiogram at 0°, radiogram at 90°)
2. Fourier descriptors
3. Euclidean distance comparison

Chen’03
1. Detection of key points (Harris)
2. Zernike moments (local template)
3. Nearest points compared with a likelihood estimation

Baudrier’08
1. Expert set resolution analysis
2. Hausdorff distance between images
3. SVM classification

Delalandre’07
1. Run-length encoding (RLE)
2. Histogram centering
3. RLE comparison
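One of the building blocks named on this slide, the Hausdorff distance between images, can be illustrated with a brute-force sketch. It assumes binarized images and implements the classical Hausdorff distance between their foreground pixels; the slides do not say which variant Baudrier’08 uses, so this should not be read as that method. The function name `hausdorff_distance` is an assumption.

```python
import numpy as np


def hausdorff_distance(image_a: np.ndarray, image_b: np.ndarray) -> float:
    """Classical Hausdorff distance between the foreground pixels of two binary images."""
    points_a = np.argwhere(image_a)  # (row, col) coordinates of non-zero pixels
    points_b = np.argwhere(image_b)
    if len(points_a) == 0 or len(points_b) == 0:
        raise ValueError("both images need at least one foreground pixel")
    # Pairwise Euclidean distances, shape (len(points_a), len(points_b)).
    diff = points_a[:, None, :] - points_b[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=2))
    a_to_b = dists.min(axis=1).max()  # farthest point of A from its nearest point of B
    b_to_a = dists.min(axis=0).max()  # farthest point of B from its nearest point of A
    return float(max(a_to_b, b_to_a))


a = np.zeros((5, 5), dtype=bool)
a[1, 1] = True
b = np.zeros((5, 5), dtype=bool)
b[1, 1] = True
b[1, 4] = True  # one extra ink pixel, three columns away

print(hausdorff_distance(a, a), hausdorff_distance(a, b))
```

The pairwise distance matrix grows with the product of the two pixel counts, so this form only suits small images; for larger ones, `scipy.spatial.distance.directed_hausdorff` avoids building the full matrix.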

Visual Comparison

Ideal method
• Highlight pertinent differences
• Make a hypothesis of relative dating
• Invariant to scale
• Robust (noise, skew, offset)

Beusekom’07
Image registration: detection of points of interest (connected components)
Visualization method: Pixel to Pixel Difference Map (PPDMap)

Baudrier’07
Image registration: equivalent ellipse computation (first image moments)
Visualization method: Local Dissimilarity Map (LDMap)

Figure labels: PPDMap, Block A#1, LDMap, Block A#2
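Once two prints are registered, a pixel-to-pixel difference map can be sketched as below. This is a minimal illustration assuming binarized images of identical size; the function name `pixel_difference_map` and the 0/1/2 coding are assumptions, not the definition used in Beusekom’07.

```python
import numpy as np


def pixel_difference_map(block_a: np.ndarray, block_b: np.ndarray) -> np.ndarray:
    """Per-pixel difference of two registered binary images of equal size.

    0 = identical pixel, 1 = ink only in block_a, 2 = ink only in block_b.
    """
    if block_a.shape != block_b.shape:
        raise ValueError("images must be registered to the same shape first")
    a = block_a.astype(bool)
    b = block_b.astype(bool)
    diff = np.zeros(a.shape, dtype=np.uint8)
    diff[a & ~b] = 1  # ink present only in the first print
    diff[~a & b] = 2  # ink present only in the second print
    return diff


block_1 = np.array([[1, 1, 0],
                    [0, 1, 0]])
block_2 = np.array([[1, 0, 0],
                    [0, 1, 1]])
print(pixel_difference_map(block_1, block_2))
```

Coding the direction of each difference, rather than a plain XOR, makes it possible to see which print has lost or gained ink, which is the kind of evidence relative dating relies on.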

Conclusions and Perspectives

A large amount of ornament material is available, but there are few digital collections
• Digitalization programs should take better account of graphics.
• Collaborative platforms integrating DIA components can help.
• Two database levels (with and without thesaurus classification)

DIA components

CBIR systems (orientation signature, points of interest, image distance, compressed representation)
• The lack of evaluation of the methods makes them difficult to compare.
• Benchmark datasets need to be defined (time, precision/recall).
• The methods offer different trade-offs between complexity and precision; they could be combined.

Visual comparison (registration, PPDMap, LDMap)
• The hard point is registration; user interaction could help.
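The precision/recall evaluation called for in the conclusions can be computed per query as in the sketch below. It uses the standard definitions; the function name `precision_recall` and the result identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: identifiers returned by the CBIR system
    relevant:  identifiers an expert marked as matching the query
    """
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical query: 4 results returned, 3 ornaments actually relevant.
precision, recall = precision_recall(["R1", "R2", "R3", "R4"], ["R1", "R3", "R7"])
print(precision, recall)
```

Averaging these values over a shared benchmark set of queries, together with timing, would make the CBIR methods comparable.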