Characterisation - planets-project.eu · Manfred Thaller Universität zu * Köln...
Manfred Thaller
Universität zu* Köln
manfred.thaller@uni-koeln.de
Characterisation
Digital Preservation Planning: Principles, Examples and the Future with Planets.
July 29th, 2008
* University at, NOT of Cologne

I – What is (in) a format?

An image

An image
6 rows, 5 columns

An image
5 rows, 6 columns

An image
1 1 1 1 1
1 0 0 0 1
1 1 0 1 1
1 1 0 1 1
1 1 0 1 1
1 1 1 1 1
1 == blue, 0 == red

An image
1 1 1 1 1
1 0 0 0 1
1 1 0 1 1
1 1 0 1 1
1 1 0 1 1
1 1 1 1 1
1 == green, 0 == yellow

An image
1 1 1 1 1
1 0 0 0 1
1 1 0 1 1
1 1 0 1 1
1 1 0 1 1
1 1 1 1 1
Store: 1,1,1,1,1,1,0,0,0,1,1,1,0,1,1,1,1,0,1,1,1,1,0,1,1,1,1,1,1,1
Uncompressed

An image
1 1 1 1 1
1 0 0 0 1
1 1 0 1 1
1 1 0 1 1
1 1 0 1 1
1 1 1 1 1
Store: 6,1,3,0,3,1,1,0,4,1,1,0,4,1,1,0,7,1
Compressed (run-length encoded)
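
The run-length encoding on this slide can be reproduced with a short Python sketch. It assumes the count-then-value pairing that the stored sequence suggests; the function names are illustrative and do not come from any particular image format.

```python
from itertools import groupby

def rle_encode(values):
    """Return a flat list [count, value, count, value, ...]."""
    encoded = []
    for value, run in groupby(values):
        encoded.extend([len(list(run)), value])
    return encoded

def rle_decode(encoded):
    """Expand [count, value, ...] pairs back into the original sequence."""
    decoded = []
    for count, value in zip(encoded[0::2], encoded[1::2]):
        decoded.extend([value] * count)
    return decoded

# The 6 x 5 example image from the slides, row by row.
ROWS = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1],
]
flat = [pixel for row in ROWS for pixel in row]
packed = rle_encode(flat)
print(packed)  # [6, 1, 3, 0, 3, 1, 1, 0, 4, 1, 1, 0, 4, 1, 1, 0, 7, 1]
```

Runs are counted across row boundaries, which is why the first run has length 6 although a row holds only 5 pixels.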

An image
dimensions
photometric interpretation
compression
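
The same stored values yield different pictures depending on the dimensions and on how the values are mapped to colours. A minimal Python sketch, assuming the 30 uncompressed values from the earlier slides; `render` and the palette dictionaries are illustrative names only:

```python
# The 30 stored values from the uncompressed example.
STORED = [1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1,
          1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1]

def render(data, columns, palette):
    """Cut the linear data into rows and map each value to a colour name."""
    rows = [data[i:i + columns] for i in range(0, len(data), columns)]
    return [[palette[value] for value in row] for row in rows]

# Same data, two readings: 6 rows x 5 columns in blue/red,
# or 5 rows x 6 columns in green/yellow.
as_6x5 = render(STORED, 5, {1: "blue", 0: "red"})
as_5x6 = render(STORED, 6, {1: "green", 0: "yellow"})

for row in as_6x5:
    print(" ".join(colour[0] for colour in row))
```

Nothing in the 30 values themselves says which reading is intended; that information has to be stored alongside the data.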

An image
<basic information>
<rendering information>
<storage information>
<data>

File format
<basic information> What to do?
<rendering information> How to do it?
<storage information> How to move it from persistent to deployed form?
<data> What to deploy?

File format
<basic information> Mandatory
<rendering information> Useful
<storage information> Historical
<data> Mandatory

File format
A deterministic specification of how the properties of a digital object can be reversibly converted into a linear bytestream (bitstream).
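
To make "reversibly converted into a linear bytestream" concrete, here is a minimal Python sketch of a toy format. The layout (magic `TOY1`, field order, field sizes) is invented for this illustration and corresponds to no real format; it only mirrors the four parts named on the earlier slides.

```python
import struct

# Hypothetical layout, invented for illustration only:
#   basic information:     magic "TOY1", rows, columns
#   storage information:   compression flag (0 = uncompressed)
#   rendering information: length-prefixed palette, one colour name per value
#   data:                  one byte per pixel
HEADER = struct.Struct(">4sHHB")

def serialize(rows, columns, palette, pixels):
    """Turn the properties of the image into one linear bytestream."""
    palette_bytes = "\n".join(palette).encode("ascii")
    return (HEADER.pack(b"TOY1", rows, columns, 0)
            + struct.pack(">H", len(palette_bytes))
            + palette_bytes
            + bytes(pixels))

def parse(stream):
    """Recover the properties from the bytestream (inverse of serialize)."""
    magic, rows, columns, compression = HEADER.unpack_from(stream, 0)
    if magic != b"TOY1" or compression != 0:
        raise ValueError("not an uncompressed TOY1 stream")
    offset = HEADER.size
    (palette_length,) = struct.unpack_from(">H", stream, offset)
    offset += 2
    palette = stream[offset:offset + palette_length].decode("ascii").split("\n")
    offset += palette_length
    pixels = list(stream[offset:offset + rows * columns])
    return rows, columns, palette, pixels

image = (6, 5, ["red", "blue"],
         [1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1,
          1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1])
stream = serialize(*image)
print(parse(stream) == image)  # True: the conversion is reversible
```

The round trip succeeds only because writer and reader follow the same deterministic specification.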

II – Why would we want to know?

III – Which format to choose?

Recommended formats: text
High confidence:
- Plain text (encoding: ISO8859-1 - 9, UTF-8, UTF-16 with BOM)
- XML (includes XSD/XSL/XHTML, etc.; with included or accessible schema and character encoding explicitly specified)
- PDF/A-1 (ISO 19005-1)
Medium confidence:
- Cascading Style Sheets (*.css)
- DTD (*.dtd)
- PDF (*.pdf) (embedded fonts)
- Rich Text Format 1.x (*.rtf)
- HTML 4.x (include a DOCTYPE declaration)
- SGML (*.sgml)
- Open Office (*.sxw/*.odt)
- Office Open XML (*.docx)
Low confidence:
- PDF (*.pdf) (encrypted)
- Microsoft Word (*.doc)
- WordPerfect (*.wpd)
- DVI (*.dvi)
- All other text formats not listed here
Source: http://www.fcla.edu/digitalArchive/pdfs/recFormats.pdf

Recommended formats: bitmap / raster image
High confidence:
- TIFF (uncompressed)
- PNG (*.png)
Medium confidence:
- BMP (*.bmp)
- JPEG/JFIF (*.jpg)
- JPEG2000 (prefer lossless or uncompressed) (*.jp2)
- TIFF (compressed)
- GIF (*.gif)
Low confidence:
- MrSID (*.sid)
- TIFF (in Planar format)
- FlashPix (*.fpx)
- PhotoShop (*.psd)
- All other raster image formats not listed here
Source: http://www.fcla.edu/digitalArchive/pdfs/recFormats.pdf

Recommended formats: vector graphics
High confidence:
- SVG 1.1 (no Java binding) (*.svg)
Medium confidence:
- Computer Graphic Metafile (CGM, WebCGM) (*.cgm)
Low confidence:
- Encapsulated Postscript (EPS)
- Macromedia Flash (*.swf)
- All other vector image formats not listed here
Source: http://www.fcla.edu/digitalArchive/pdfs/recFormats.pdf

Recommended formats: audio
High confidence:
- AIFF (PCM) (*.aif, *.aiff)
- WAV (PCM) (*.wav)
Medium confidence:
- SUN Audio (uncompressed) (*.au)
- Standard MIDI (*.mid, *.midi)
- Ogg Vorbis (*.ogg)
- Free Lossless Audio Codec (*.flac)
- Advanced Audio Coding (*.mp4, *.m4a, *.aac)
- MP3 (MPEG-1/2, Layer 3) (*.mp3)
Low confidence:
- AIFC (compressed) (*.aifc)
- NeXT SND (*.snd)
- RealNetworks 'Real Audio' (*.ra, *.rm, *.ram)
- Windows Media Audio (*.wma)
- WAV (compressed) (*.wav)
- All other audio formats not listed here
Source: http://www.fcla.edu/digitalArchive/pdfs/recFormats.pdf

Recommended formats: video
High confidence:
- Motion JPEG 2000 (ISO/IEC 15444-4) (*.mj2)
- AVI (uncompressed) (*.avi)
- QuickTime Movie (uncompressed) (*.mov)
- Motion JPEG (*.avi, *.mov)
Medium confidence:
- Ogg Theora (*.ogg)
- MPEG-1, MPEG-2 (*.mpg, *.mpeg)
- MPEG-4 (*.mp4)
Low confidence:
- AVI (compressed) (*.avi)
- QuickTime Movie (compressed) (*.mov)
- RealNetworks 'Real Video' (*.rv)
- Windows Media Video (*.wmv)
- All other video formats not listed here
Source: http://www.fcla.edu/digitalArchive/pdfs/recFormats.pdf

Recommended formats: “data base”
High confidence:
- Delimited Text (*.txt, *.csv)
- SQL DDL
Medium confidence:
- DBF (*.dbf)
- OpenOffice (*.sxc/*.ods)
- Office Open XML (*.xlsx)
Low confidence:
- Excel (*.xls)
- All other spreadsheet / database formats not listed here
Source: http://www.fcla.edu/digitalArchive/pdfs/recFormats.pdf

Recommended formats: 3D (“virtual reality”)
High confidence:
- X3D (*.x3d)
Medium confidence:
- VRML (*.wrl, *.vrml)
- U3D (Universal 3D file format)
Low confidence:
- All other virtual reality formats not listed here
Source: http://www.fcla.edu/digitalArchive/pdfs/recFormats.pdf
Doctoral thesis on robustness of file formats:
Volker Heydegger, University at Cologne.

IV – How do we identify a format?

What kind of file is this?
Two ways to identify a file:
(a) By extension.
“Each file ending with *.doc is an MS Word document”

What kind of file is this?
Two ways to identify a file:
(b) By internal characteristics (“magic number”, “signature”).
A TIFF file begins with …
Bytes 0-1: The byte order used within the file. Legal values are: “II” (4949.H) / “MM” (4D4D.H)
Bytes 2-3: An arbitrary but carefully chosen number (42) that further identifies the file as a TIFF file.
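
The signature check described above can be sketched in a few lines of Python. This covers only the four header bytes quoted on the slide; the function names are illustrative, and a real identification tool would test many more signatures.

```python
def looks_like_tiff(header: bytes) -> bool:
    """Check the first four bytes against the TIFF signature."""
    if len(header) < 4:
        return False
    byte_order = header[:2]
    if byte_order == b"II":      # 4949.H: little-endian
        magic = int.from_bytes(header[2:4], "little")
    elif byte_order == b"MM":    # 4D4D.H: big-endian
        magic = int.from_bytes(header[2:4], "big")
    else:
        return False
    return magic == 42

def file_looks_like_tiff(path: str) -> bool:
    """Read only the start of the file; the extension is never consulted."""
    with open(path, "rb") as handle:
        return looks_like_tiff(handle.read(4))

print(looks_like_tiff(b"II*\x00"))  # True  (little-endian 42)
print(looks_like_tiff(b"MM\x00*"))  # True  (big-endian 42)
print(looks_like_tiff(b"%PDF"))     # False
```

Unlike the extension rule, this test keeps working when a file has been renamed.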

File format registries - URLs
PRONOM: http://www.nationalarchives.gov.uk/pronom/ (does not rely only on extensions)
Global Digital Format Registry: http://hul.harvard.edu/gdfr (predominantly project description)
FileExt: http://filext.com (predominantly links to software)

V – What’s a file characteristic, then?

Technical metadata →
A high proportion of the preservation metadata will be in narrative format and will require manual entry by Library staff. A significant subset of the data however, relating to technical file characteristics, can be automatically extracted from the digital object by reading the file header details. This successful extraction of preservation metadata has been proved in a previous National Library proof of concept project. The automated capture of this information will significantly reduce the amount of manual data entry required from Library staff.
→ file characteristics.

Why automate?
1 million objects: use one second for each.
== 16 666.7 minutes == 277.8 hours
== 11.57 working days of a computer
== 34.7 8-hour days for a human
== 7 working weeks

Why automate?
1 million objects: use five minutes for each.
== 83 333.3 hours
== 10 416.7 8-hour days for a human
== way too much for anything
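
The conversions on these two slides can be recomputed from their stated premises. A minimal Python sketch, assuming 24-hour days for a computer, 8-hour days for a human, and 5-day working weeks; the function and key names are illustrative:

```python
OBJECTS = 1_000_000

def effort(seconds_per_object):
    """Convert a per-object processing time into total effort."""
    hours = OBJECTS * seconds_per_object / 3600
    return {
        "hours": hours,
        "computer_days_24h": hours / 24,
        "human_days_8h": hours / 8,
        "working_weeks": hours / 8 / 5,
    }

for label, seconds in [("one second each", 1), ("five minutes each", 5 * 60)]:
    result = effort(seconds)
    print(label, {key: round(value, 1) for key, value in result.items()})
```

At one second per object the work fits into about seven working weeks; at five minutes per object it runs to tens of thousands of hours.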

Formats in PLANETS: File characteristics
Based on two formal languages:
(1) eXtensible Characterisation Extraction Language (= XCEL)
(2) eXtensible Characterisation Description Language (= XCDL)

The comparator
[Diagram: a Migrator converts a tiff file into a png file. The Extractor, using the tiff XCEL and the png XCEL, produces a tiff XCDL and a png XCDL. The Comparator compares the two XCDLs and reports 93%.]
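
The idea of comparing the extracted descriptions of a file before and after migration can be sketched in Python as follows. This is a simplification under loud assumptions: real XCDL documents are XML and far richer, and the property names, the values and the scoring rule (share of properties with identical values) are all invented here for illustration.

```python
def compare(source_properties, target_properties):
    """Return the share of matching properties and the names that differ."""
    names = sorted(set(source_properties) | set(target_properties))
    if not names:
        return 1.0, []
    differing = [
        name for name in names
        if source_properties.get(name) != target_properties.get(name)
    ]
    return (len(names) - len(differing)) / len(names), differing

# Hypothetical extracted properties of the same image in two formats.
tiff_description = {"width": 5, "height": 6, "bitsPerSample": 1,
                    "compression": "none"}
png_description = {"width": 5, "height": 6, "bitsPerSample": 1,
                   "compression": "deflate"}

score, differing = compare(tiff_description, png_description)
print(f"{score:.0%} identical, differing: {differing}")
```

A practical comparator would presumably weight properties differently, since a changed compression scheme matters less than changed pixel data.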

The comparator
[Diagram labels: Extractor, appropriate XCELs, Comparator, C-Set]

Why data?
► Photoshop ►
► Photoshop ►
Becomes discoverable only from the actual data …

VI – What is not in a file format?

Testfile in Word 2007
Testfile in Word 2003 (2007)
Testfile in Open Office ODT
Testfile in PDF

Measuring the pages …
Cut out page from rendering surface.
Scale to common dimensions: 371 +/- 1 x 521 +/- 1
Measure:
1. The leftmost and lowest completely black pixel in the letter “A” starting the first line of the main text.
2. The leftmost and highest completely black pixel in the letter “E” starting the first line of the text in the footnote.
3. The geometrical centre of the period at the end of the main sentence.
4. The geometrical centre of the period at the end of the footnote text.

Measuring Word 2003
(i) = 45 / 134;
(ii) = 57 / 470;
(iii) = 215 / 322 ;
(iv) = 254 / 483

Measuring Word 2007
(i) = 45 / 134;
(ii) = 57 / 470;
(iii) = 215 / 322 ;
(iv) = 254 / 483

Open Office ODT
(i) = 44 / 132;
(ii) = 52 / 469;
(iii) = 214 / 320 ;
(iv) = 247 / 482

PDF
(i) = 45 / 130;
(ii) = 59 / 467;
(iii) = 215 / 317 ;
(iv) = 254 / 480
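
One way to compare the four sets of measurements is to add up how far each point has moved relative to the Word 2003 rendering. A minimal Python sketch; it assumes that the untitled fourth set belongs to the PDF rendering, and the summed Euclidean distance is a metric chosen here for illustration, not one stated on the slides.

```python
from math import hypot

# Measured points (i)-(iv) per rendering, as x / y pairs from the slides.
MEASUREMENTS = {
    "Word 2003": [(45, 134), (57, 470), (215, 322), (254, 483)],
    "Word 2007": [(45, 134), (57, 470), (215, 322), (254, 483)],
    "Open Office ODT": [(44, 132), (52, 469), (214, 320), (247, 482)],
    "PDF": [(45, 130), (59, 467), (215, 317), (254, 480)],
}

def total_displacement(reference, candidate):
    """Sum of the Euclidean distances between corresponding points."""
    return sum(
        hypot(cx - rx, cy - ry)
        for (rx, ry), (cx, cy) in zip(reference, candidate)
    )

reference = MEASUREMENTS["Word 2003"]
displacement = {
    name: total_displacement(reference, points)
    for name, points in MEASUREMENTS.items()
}
for name, value in displacement.items():
    print(f"{name}: {value:.2f} pixels")
```

Under this metric the PDF points lie slightly closer to the Word rendering than the ODT points do, although the margin is small.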

Summary I
The comparison of the four renderings of the example pages described above seems to indicate clearly that a migration from the Word family of formats to PDF is a better way to preserve the content of the document than a migration to the Open Office format.

Measuring Word 2003
Relationship tagged explicitly.
Text / footnote separation clear.
Rendering / layout not (totally) predictable.
Footnote indicator unpredictable.

Measuring Word 2007
Relationship tagged explicitly.
Text / footnote separation extremely clear.
Rendering / layout pretty predictable.
Footnote indicator not predictable.

Open Office ODT
Relationship tagged explicitly.
Text / footnote separation extremely clear.
Rendering / layout a little bit predictable.
Footnote indicator predictable.

PDF
Relationship expressed by layout.
Text / footnote separation missing.
Rendering / layout very much predictable.
Footnote indicator predictable.

Summary II
The comparison of the four internal structures of the example pages described above seems to indicate clearly that a migration from the Word family of formats to PDF is a worse way to preserve the content of the document than a migration to the Open Office format.

Small technical note
Do not forget that the whole movement started by SGML, carried into the WWW by HTML, transferred to content by the TEI, and which started XML as a basic empowering technology ...
... assumes that rendering is NOT particularly relevant.

Proposal
<significantPoints>
  <point x="45" y="134" />
  <point x="57" y="470" />
  <point x="215" y="322" />
  <point x="254" y="483" />
</significantPoints>
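
A record like the one proposed could be read back with a standard XML parser. A minimal Python sketch, assuming the element and attribute names shown in the proposal and plain ASCII quotation marks around the attribute values; `read_points` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

PROPOSAL = """<significantPoints>
  <point x="45" y="134" />
  <point x="57" y="470" />
  <point x="215" y="322" />
  <point x="254" y="483" />
</significantPoints>"""

def read_points(xml_text):
    """Return the significant points as a list of (x, y) integer pairs."""
    root = ET.fromstring(xml_text)
    return [(int(point.get("x")), int(point.get("y")))
            for point in root.findall("point")]

points = read_points(PROPOSAL)
print(points)  # [(45, 134), (57, 470), (215, 322), (254, 483)]
```

Stored this way, the expected positions travel with the document and can be checked against any later rendering.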