Europe’s Beginnings through the Looking Glass: Publishing Historical Documents on the Web Using EVT

Roberto Rosselli Del Turco - Università di Torino
Chiara Di Pietro - Università di Pisa
Raffaele Masotti - Università di Pisa
Florentina Armaselu - CVCE
Lars Wieneke - CVCE

www.cvce.eu


The CVCE

Summary

1. Overview of the WEU-DIPLO project
2. Experiments with Web publication platforms
3. EVT adaptation
   • experiments
   • publication framework overview
4. Future work
5. Conclusion
6. References

Overview of the WEU-DIPLO project: document structure. ©WEU-UEO


Figure labels: Header, Content, Footer.

Overview of the WEU-DIPLO project

1. Goal: XML-TEI encoding, corpus analysis and Web publication of institutional documents of the W.E.U. (Western European Union):
   • Topics: armament production, standardization, control in the period from 1954 to 1982;
   • Source: Archives nationales de Luxembourg, W.E.U. collection.
2. Initial format:
   • digitized versions (JPEG) of typewritten materials (one file per page).
3. Size:

                Number of documents         Number of pages
Category        Total   EN   FR  FR proc.*  Total   EN   FR  FR proc.*
Note               89   43   46         37    395  191  204        155
Minutes            30   15   15         15    256  138  118        118
Memorandum          3    1    2          2     16    7    9          9
Study               2    0    2          1     12    0   12          8
Discourse           1    0    1          0      4    0    4          0
Draft protocol      2    1    1          0      4    2    2          0
Total             127   60   67         55    687  338  349        290

*proc. = processed
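The totals in the size table can be checked with a few lines of Python. This is only a sketch: the figures are copied from the table, while the dictionary layout and variable names are illustrative assumptions and not part of the project's tooling.

```python
# Rows copied from the size table:
# (documents EN, documents FR, documents FR processed,
#  pages EN, pages FR, pages FR processed).
SIZE = {
    "Note": (43, 46, 37, 191, 204, 155),
    "Minutes": (15, 15, 15, 138, 118, 118),
    "Memorandum": (1, 2, 2, 7, 9, 9),
    "Study": (0, 2, 1, 0, 12, 8),
    "Discourse": (0, 1, 0, 0, 4, 0),
    "Draft protocol": (1, 1, 0, 2, 2, 0),
}

# Column-wise sums over all categories.
docs_en, docs_fr, docs_fr_proc, pages_en, pages_fr, pages_fr_proc = (
    sum(col) for col in zip(*SIZE.values())
)

# Share of the French material that has been processed, in percent.
docs_share = round(100 * docs_fr_proc / docs_fr, 1)
pages_share = round(100 * pages_fr_proc / pages_fr, 1)

print(docs_en + docs_fr, pages_en + pages_fr)  # 127 687
print(docs_share, pages_share)                 # 82.1 83.1
```

The sums reproduce the "Total" row, and roughly four fifths of the French documents and pages had been processed at the time of the presentation.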

Overview of the WEU-DIPLO project: workflow


Microsoft Word Styling (headers, footers) – WEU-DIPLO


Microsoft Word Styling (headings, line breaks, paragraphs) – WEU-DIPLO


XML-TEI Encoding: WEU-DIPLO - metadata, header. ©WEU-UEO


Labels shown in the figure: @@hAuthor, @@hArchNum, @@hStampConfid, @@hDocRef, @@hOrigDate, @@hOrigLang, @@hVersion.

XML-TEI Encoding: WEU-DIPLO – Headings, paragraphs, line breaks. ©WEU-UEO


Labels shown in the figure: @@Heading2, @@Paragraph, @@LineBreak.
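To illustrate how labels such as @@Heading2, @@Paragraph and @@LineBreak could relate to TEI markup, here is a minimal Python sketch. The mapping to the TEI elements head, p and lb is an assumption made for illustration; the slides do not state which elements the project actually uses, and the function and sample text are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from the labels on the slide to TEI element names.
STYLE_TO_TEI = {
    "@@Heading2": "head",
    "@@Paragraph": "p",
}


def build_div(runs):
    """Turn (label, text) pairs into a <div>.

    A @@LineBreak run adds an <lb/> to the current block and puts its
    text after the break; any other label opens a new block element.
    """
    div = ET.Element("div")
    current = None
    for label, text in runs:
        if label == "@@LineBreak":
            if current is None:
                raise ValueError("line break before any block element")
            lb = ET.SubElement(current, "lb")
            lb.tail = text
        else:
            current = ET.SubElement(div, STYLE_TO_TEI[label])
            current.text = text
    return div


# Invented sample content, not taken from the corpus.
runs = [
    ("@@Heading2", "Sample heading"),
    ("@@Paragraph", "First typed line"),
    ("@@LineBreak", "second typed line"),
]
div = build_div(runs)
print(ET.tostring(div, encoding="unicode"))
```

Keeping the mapping in one dictionary means new labels can be added without touching the conversion logic.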

INTRODUCTION TO EVT

EVT FOR DIPLOMATIC DOCUMENTS

EVT experiments


(Partial) customisation:
• General layout: folder structure, image renaming.
• EVT Transformer: builder pack (XSLT)
  o added/modified templates for transforming specific patterns (headers, footers, paragraphs); layout is not fully supported (e.g. sections, subsections, paragraph indentation).
• EVT Viewer: CSS
  o added/modified statements to support visualisation in the browser of specific patterns (alignment, text decoration, colour of headers, footers, etc.).
• Manual modification
  o XML-TEI input: page breaks linked to the facsimile images;
  o transformation output: HTML output changed to support particular features (Text-Link, HotSpot); this should not occur in the real workflow.
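The manual step of linking page breaks to facsimile images could in principle be automated. The sketch below assumes that each page is marked by a TEI pb element, that the link is carried by its facs attribute, and that images follow a hypothetical page_001.jpg naming scheme; whether EVT expects exactly this form is not stated in the slides.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def link_page_breaks(root, image_pattern="page_{:03d}.jpg"):
    """Number every <pb> in document order and point it at an image.

    Returns the number of page breaks found. The image naming pattern
    is an assumption, not the project's actual convention.
    """
    count = 0
    for count, pb in enumerate(root.iter(f"{{{TEI_NS}}}pb"), start=1):
        pb.set("n", str(count))
        pb.set("facs", image_pattern.format(count))
    return count


# Invented two-page sample document.
sample = (
    f'<TEI xmlns="{TEI_NS}"><text><body>'
    "<pb/><p>Page one.</p><pb/><p>Page two.</p>"
    "</body></text></TEI>"
)
root = ET.fromstring(sample)
n_pages = link_page_breaks(root)
pbs = list(root.iter(f"{{{TEI_NS}}}pb"))
print(n_pages, [pb.get("facs") for pb in pbs])
```

Because the digitized sources are one JPEG per page, a sequential pattern of this kind is sufficient as long as the images are renamed consistently.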

EVT experiments – facsimile/transcription page side-by-side view (title page). ©WEU-UEO


EVT adaptation – towards a TEI-based publication framework – types of documents/features

1. Goal:
   • publishing on the CVCE’s Web site different types of documents on European integration history.
2. Types of documents (for the majority, high-quality multilingual transcriptions are available in TXT, RTF, SRT formats):
   • treaties;
   • administrative documents (minutes, notes, memoranda);
   • press articles;
   • handwritten notes;
   • letters;
   • video and audio archives.
3. Types of features to be implemented (required (r) / optional (o)):
   • side-by-side facsimile/transcription (replicating the original with more or less fidelity) (r);
   • multipanel alignment (r);
   • text-image link (o);
   • zooming (r);
   • HotSpot (o), etc.
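The required/optional feature list above lends itself to a small machine-readable description. The following Python sketch is purely illustrative: the Feature class and the missing_required helper are hypothetical names, and only the feature names and their (r)/(o) status come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    required: bool


# Feature names and required/optional status as listed on the slide.
FEATURES = [
    Feature("side-by-side facsimile/transcription", True),
    Feature("multipanel alignment", True),
    Feature("text-image link", False),
    Feature("zooming", True),
    Feature("HotSpot", False),
]


def missing_required(implemented):
    """Return the required features absent from the given set of names."""
    return [f.name for f in FEATURES if f.required and f.name not in implemented]


# Example: an edition that so far offers only zooming and HotSpot.
gaps = missing_required({"zooming", "HotSpot"})
print(gaps)
```

A check of this kind could be run per document type to see which editions already meet the required baseline.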

EVT adaptation – towards a TEI-based publication framework – manuscript note (Werner corpus)


EVT adaptation/combination with other tools – towards a TEI-based publication framework – general layout


EVT adaptation – towards a TEI-based publication framework – architecture, workflow


Figure panels: General architecture; General workflow.

Future work

1. Identification of features to be implemented in the digital editions:
   • visualisation;
   • search.
2. Publication framework design:
   • core / plugin;
   • optional / project specific.
3. Implementation of the module for XML-TEI conversion (potential adaptation of OxGarage for batch processing).
4. Implementation/integration into existing CVCE architecture:
   • Back End;
   • Front End.
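Batch processing for the XML-TEI conversion module could be organised as a thin loop around whatever converter is eventually chosen. In this sketch the converter is a stand-in function, the .docx input extension is an assumption based on the Word-based workflow, and no OxGarage call is shown because the slides do not describe how it would be invoked.

```python
from pathlib import Path
import tempfile


def batch_convert(src_dir, dst_dir, convert, pattern="*.docx"):
    """Apply convert(bytes) -> str to each matching file.

    Writes one .xml file per input and returns the output file names.
    """
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(src_dir).glob(pattern)):
        out = dst / (src.stem + ".xml")
        out.write_text(convert(src.read_bytes()), encoding="utf-8")
        written.append(out.name)
    return written


def fake_convert(data):
    # Stand-in only: a real converter would return TEI built from the input.
    return f"<TEI><!-- {len(data)} bytes converted --></TEI>"


# Demonstration on two invented input files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "in")
    src.mkdir()
    (src / "note_01.docx").write_bytes(b"demo")
    (src / "minutes_01.docx").write_bytes(b"demo2")
    result = batch_convert(src, Path(tmp, "out"), fake_convert)

print(result)
```

Passing the converter in as a parameter keeps the loop independent of the conversion service, which fits the core/plugin split listed under the framework design.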

Conclusion

EVT framework:
• flexible enough to support different types of documents in European integration history;
• possibility to compare original / transcription (of interest for researchers in European integration studies);
• different degrees of fidelity to the original can be envisaged (balance manual / automatic processing).

EVT adaptation:
• minimise the amount of manual intervention in the XML-TEI documents;
• publication framework with modular architecture to allow gradual development and customisation according to the needs of the projects.

DEMO

THANKS A LOT FOR YOUR ATTENTION

References

• EVT (Edition Visualization Technology): http://sourceforge.net/projects/evt-project/
• KILN: http://kiln.readthedocs.org/en/latest/#
• TEIBoilerplate: http://dcl.ils.indiana.edu/teibp/
• TEI (Text Encoding Initiative): http://www.tei-c.org
• Versioning Machine: http://v-machine.org/
• XTF (eXtensible Text Framework): http://xtf.cdlib.org/about/