
Cornell July 25, 2002 NUMDAM

Pierre Bérard

Institut Fourier, CNRS–Université Joseph Fourier

&

Cellule MathDoc, CNRS–Université Joseph Fourier

Grenoble (France)

Cellule MathDoc

www-mathdoc.ujf-grenoble.fr

• An institute on Scientific Information & Communication in Mathematics, supported by Centre National de la Recherche Scientifique (CNRS) and Ministère de la Recherche.

• General mission: documentation issues in mathematics at the national level in France, in cooperation with mathematics libraries and institutes.

NUMDAM: Digitisation of Ancient Mathematics Documents

NUMérisation de Documents Anciens Mathématiques

A digitisation programme supported by CNRS and Ministère de la Recherche, managed by the Cellule MathDoc.

NUMDAM: aims

• Reinforce French mathematical journals (visibility, accessibility, durability).

• Hand down digitised archives of the French mathematical heritage to future generations and take part in international efforts with the same aim.

• Strive towards making this digitised mathematical heritage freely accessible.

Political choices

• Database freely accessible on the web.

• Full text freely accessible after a moving wall (depending on each serial).

• Scheduled interoperability between retro-digitised and natively digital collections.

• National and international co-operation as far as possible.
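The moving-wall rule can be illustrated with a short calculation. This is only a sketch: the talk gives no wall lengths, so the wall is assumed to be a whole number of years set per serial, and the function name and the 5-year value are hypothetical.

```python
from datetime import date


def is_freely_accessible(publication_year: int, wall_years: int, today: date) -> bool:
    """Return True once the article has aged past the moving wall.

    wall_years is a hypothetical per-serial setting; the slides give no values.
    """
    return today.year - publication_year >= wall_years


# Illustrative values only: a 5-year wall evaluated on the date of the talk.
talk_date = date(2002, 7, 25)
print(is_freely_accessible(1962, 5, talk_date))  # True
print(is_freely_accessible(2000, 5, talk_date))  # False
```

Because the wall moves with the current date, no per-article flag has to be updated: each year a further volume of every serial becomes open automatically.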

Technical choices

• Scan from first to last page @ 600 dpi.

• OCR (uncorrected @ 99.9%; mathematical formulae and images excluded).

• Multi-page files for logical units (TIFF, PDF + hidden text, DjVu).

• End-of-article bibliographies treated (corrected OCR @ 99.99% + mark-up of “author”, “title”, “year” fields).

• Database: cataloguing data for each article, summary (if present), end-of-article bibliography (if present), hidden OCRed text. Structured data exchange in XML.

• As far as possible, links to/from the JFM, ZM and MR databases.

• Future enhancements scheduled depending on technology available.
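To make the bibliography mark-up and the XML exchange more concrete, here is a minimal sketch of how a marked-up entry might be read back. The element names (`bibliography`, `entry`, `author`, `title`, `year`) are assumptions made for illustration and are not the NUMDAM schema; the sample values are borrowed from a record shown later in the talk.

```python
import xml.etree.ElementTree as ET

# Hypothetical mark-up: the element names are assumptions, not the NUMDAM schema.
SAMPLE = """
<bibliography>
  <entry>
    <author>Shih</author>
    <title>Homologie des espaces fibrés</title>
    <year>1962</year>
  </entry>
</bibliography>
"""


def read_entries(xml_text: str) -> list[dict[str, str]]:
    """Extract the author, title and year fields of each bibliography entry."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("entry"):
        entries.append({
            field: (entry.findtext(field) or "").strip()
            for field in ("author", "title", "year")
        })
    return entries


print(read_entries(SAMPLE))
```

Keeping the three fields as separate elements is what would make later matching against external review databases possible, since author, title and year can then be compared one by one.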

Production choices

• Use of an external operator for the technical treatments.

• In-house study, segmentation, cataloguing, quality control, and display.

• Quality and durability policy:

Prefer standard, easily convertible formats (TIFF, XML) as sources for future processing if necessary; do not be tied to a proprietary system.

Archive high-quality images, from which the text can later be regenerated (formula OCR, structure recognition).

NUMDAM Phase I: Journals

Journal                                                            Period
Annales de l’Institut Fourier                                      1949 – 2000
Bulletin de la Société mathématique de France                      1872 – 2000
Mémoires de la Société mathématique de France                      1964 – 2000
Publications mathématiques (IHÉS)                                  1959 – 2000
Journées équations aux dérivées partielles (Saint-Jean-de-Monts)   1975 – 2000

About 136 000 pages and 5 500 articles

Annales scientifiques de l’École normale supérieure                1864 – 1998

About 67 000 pages and 1 750 articles

NUMDAM Phase I: Chronology

• Spring 2003. — End of the industrial phase of NUMDAM Phase I, public access to articles via the web.

• Autumn 2002. — Start of NUMDAM Phase II. Dealing with © issues continued.

• August 2002. — First 50,000 pages delivered by vendor.

• Feb. - May 2002. — Setting-up production chain (vendor) and quality control (Cellule MathDoc). Dealing with © issues.

• Dec. 2001. — Choice of vendor validated by CNRS.

• Nov. 2000 - Oct. 2001. — Cataloguing and checking database.

• Oct. 2000 - May 2001. — Writing up schedule of conditions/vendor.

• July 2000. — Funding by CNRS.

NUMDAM Phase II

• Take an active part in the Digital Mathematics Library project. Cooperate with other digitisation projects (Gallica–BnF, possibly EMANI digitisation part).

Inventory of resources and cooperation with historians and mathematicians to make scientific choices and establish priorities, in order to:

• Digitise all French mathematics journals (Annales de l’Institut Henri Poincaré, Annales de l’Université de Toulouse, Comptes Rendus de l’Académie, Journal de l’École polytechnique, ...), and possibly some mathematically important general science journals.

• Digitise important seminar series (séminaires Bourbaki, Cartan, séminaire de Probabilités de Strasbourg, ...).

• Digitise a substantial set of important monographs.

NUMDAM programme: overview

(Diagram; box labels follow.)

• Examination of collections and setting up the database

• Copyright issues and negotiations with publishers

• Schedule of technical conditions

• Vendor: digitisation, segmentation, treatments (OCR & bibliographies)

• Quality control

• Database maintenance

• Display: search and browsing

• Links: JFM, MR, ZM

• Software developments (SQL, XML): quality control, authors id & ©, display, links

Quality control procedure

(Flow diagram; labels follow.)

• Files received from vendor: TIFF; XML, TIFF, PDF and DjVu

• Automatic control (Perl), producing a log of errors (LOG)

• Sorting samples (Perl), producing samples (TIFF files; XML, TIFF, PDF, DjVu)

• Visual control with a check-list (PHP), producing a log of errors

• DB (MySQL)

• Synthesis

• Rejection or validation
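The automatic control and sample-sorting steps of the quality control procedure can be sketched as follows. The slides say these steps were written in Perl; Python is used here purely for illustration. The check performed (every logical unit delivered in all four formats), the file-naming convention, and the function names are assumptions, not a description of the actual scripts.

```python
import random
from collections import defaultdict
from pathlib import PurePosixPath

# Assumption: each logical unit ships as one file per format, sharing a base name.
EXPECTED_SUFFIXES = {".tif", ".xml", ".pdf", ".djvu"}


def automatic_control(filenames: list[str]) -> list[str]:
    """Return a log of errors: one line per unit with a missing format."""
    suffixes_by_unit = defaultdict(set)
    for name in filenames:
        path = PurePosixPath(name)
        suffixes_by_unit[path.stem].add(path.suffix.lower())
    log = []
    for unit in sorted(suffixes_by_unit):
        missing = EXPECTED_SUFFIXES - suffixes_by_unit[unit]
        if missing:
            log.append(f"{unit}: missing {', '.join(sorted(missing))}")
    return log


def sort_samples(units: list[str], fraction: float, seed: int = 0) -> list[str]:
    """Draw a reproducible random sample of units for visual control."""
    count = min(len(units), max(1, round(len(units) * fraction)))
    return sorted(random.Random(seed).sample(units, count))


# Hypothetical delivery: the second unit lacks its PDF and DjVu files.
delivery = [
    "PMIHES_1962__12__5_0.tif", "PMIHES_1962__12__5_0.xml",
    "PMIHES_1962__12__5_0.pdf", "PMIHES_1962__12__5_0.djvu",
    "PMIHES_1962__13__5_0.tif", "PMIHES_1962__13__5_0.xml",
]
print(automatic_control(delivery))
# ['PMIHES_1962__13__5_0: missing .djvu, .pdf']
```

A fixed seed keeps the sample reproducible, so a rejected delivery can be re-examined on exactly the same units after the vendor corrects it.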

NUMDAM Programme

XML description of physical volumes

Publications Mathématiques de l’Institut des Hautes Études Scientifiques

Physical volume: Year 1962, Volume 12

A paper in a physical volume

Article by Bernard Dwork in Publications Mathématiques IHÉS, 12 (1962), 5-68

Bibliographies

Cross-linking

(Diagram; labels follow.)

• External databases: JFM, MR, ZM, ...

• DB of articles & DB of images

• EDBM, SQL

• Article identifier: PMIHES_1962__12__5_0

• Review numbers: MR 28#3039, ZM 0173.48601; MR 10,592e, ZM 0032.39402

• File formats: PDF, DjVu
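The identifier PMIHES_1962__12__5_0 appears to encode the journal, year, volume and first page of the Dwork article cited earlier (Publications Mathématiques IHÉS, 12 (1962), 5-68). The sketch below splits an identifier on that assumption. The field layout is inferred from this single example, the meaning of the trailing `0` is unknown, and the names `ArticleId` and `parse_article_id` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class ArticleId(NamedTuple):
    journal: str
    year: int
    volume: str
    first_page: str
    suffix: str  # meaning not stated in the talk


# Assumed layout, inferred from PMIHES_1962__12__5_0; not an official grammar.
ID_PATTERN = re.compile(r"^([A-Z]+)_(\d{4})__([^_]+)__([^_]+)_([^_]+)$")


def parse_article_id(identifier: str) -> Optional[ArticleId]:
    """Split an identifier into its assumed components, or return None."""
    match = ID_PATTERN.match(identifier)
    if match is None:
        return None
    journal, year, volume, first_page, suffix = match.groups()
    return ArticleId(journal, int(year), volume, first_page, suffix)


print(parse_article_id("PMIHES_1962__12__5_0"))
```

An identifier built from bibliographic data gives each article a stable key that the article database, the image files and external review numbers can all point to.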

MR ↔ NUMDAM (MR-lookup)

NUMDAM DB record:

|Publications IHES|Shih||13||1962||PMIHES_1962__13__5_0||

Same record after MR-lookup (journal abbreviation, MR number and title filled in):

|Inst. Hautes Etudes Sci. Publ. Math.|Shih||13||1962||PMIHES_1962__13__5_0|26#1893|Homologie des espaces fibr\'es.
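The two pipe-delimited records on this slide can be compared field by field. The sketch below is an illustration only: the talk does not document the record format, so the field positions and names are inferred from the two sample lines, and `LookupRecord` and `parse_record` are hypothetical.

```python
from typing import NamedTuple


class LookupRecord(NamedTuple):
    journal: str
    author: str
    volume: str
    year: str
    numdam_id: str
    review_number: str
    title: str


def parse_record(line: str) -> LookupRecord:
    """Split one pipe-delimited record; field positions are inferred, not documented."""
    fields = line.split("|")
    if len(fields) != 11:
        raise ValueError(f"expected 11 fields, got {len(fields)}")
    # Positions 0, 3, 5 and 7 are empty in both sample records and are ignored here.
    return LookupRecord(fields[1], fields[2], fields[4], fields[6],
                        fields[8], fields[9], fields[10])


query = r"|Publications IHES|Shih||13||1962||PMIHES_1962__13__5_0||"
reply = r"|Inst. Hautes Etudes Sci. Publ. Math.|Shih||13||1962||PMIHES_1962__13__5_0|26#1893|Homologie des espaces fibr\'es."

before, after = parse_record(query), parse_record(reply)
print(after.review_number)                  # 26#1893
print(before.numdam_id == after.numdam_id)  # True
```

The NUMDAM identifier is unchanged between the two records, which is what would allow the returned review number to be attached to the right article.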

JFM & ZM ↔ NUMDAM (ZM-lookup)

New identification tool in development in the LIMES framework (EU project).

NUMDAM DB record:

|Publications IHES|Shih||13||1962||PMIHES_1962__13__5_0||

Same record after ZM-lookup (journal abbreviation, ZM number and title filled in):

|Inst. Hautes Etudes Sci. Publ. Math.|Shih||13||1962||PMIHES_1962__13__5_0|0105.16903|Homologie des espaces fibr\'es.

Identification of authors: two purposes

• Improve search facilities by setting-up a reference list of authors.

• Provide a tool to help address copyright issues.

Internal tool ...

NUMDAM: search interface based on EDBM (in development)

JFM, MR, ZM

Abstract if available

NUMDAM URLs

• Main: www-mathdoc.ujf-grenoble.fr/NUMDAM/

• Visitors (sample files): www-mathdoc.ujf-grenoble.fr/NUMDAM/Visitors/

Login: VISITORS Pwd: v\to\num

• LiNuM (Books at BnF, Cornell, Göttingen, Michigan): www-mathdoc.ujf-grenoble.fr/LiNuM/

• Journal de Mathématiques Pures et Appliquées 1836 – 1880 (BnF): www-mathdoc.ujf-grenoble.fr/JMPA/

• Search NUMDAM database: math-sahel.ujf-grenoble.fr/NUMDAM/Public/Bd/consultation.htm

• Inventory: math-sahel.ujf-grenoble.fr/NUMDAM/Public/Inventaire/inventaire.htm

Thank you for your attention ...