2009 PLANETS Vienna - MIXED migration to XML
![Page 1: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/1.jpg)
Towards an Infrastructure of Migration
Dirk Roorda
![Page 2: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/2.jpg)
![Page 3: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/3.jpg)
History of MIXED
• history • defining • developing • using • exploiting
digital preservation, data and standards, open source, data archives, research infrastructures
![Page 4: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/4.jpg)
what is it?
MIXED is a file format converter
plus a set of formats, called SDFP, i.e. Standard Data Formats for Preservation
![Page 5: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/5.jpg)
founding idea
National Archive (NL): testbed
![Page 6: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/6.jpg)
testbed: spreadsheets
XML is an appropriate choice for the long-term preservation of spreadsheets. XML can be used to specify the context, content and structure of spreadsheets.
![Page 7: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/7.jpg)
testbed: databases
At present, XML is the most effective strategy for the durable preservation of databases. XML is highly capable of representing the context, content, and structure of databases.
This strategy can be implemented using a number of different methods.
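A minimal sketch of what "context, content, and structure" in XML could look like for one database table. The element and attribute names (`table`, `context`, `structure`, `content`) and the sample data are assumptions made for illustration; they are not taken from SDFP or e-David-XML.

```python
import xml.etree.ElementTree as ET

def table_to_xml(name, columns, rows, source):
    """Serialise one table; element names are illustrative, not SDFP."""
    root = ET.Element("table", {"name": name})
    # Context: where the data came from.
    ET.SubElement(root, "context", {"source": source})
    # Structure: column names and declared types.
    structure = ET.SubElement(root, "structure")
    for col_name, col_type in columns:
        ET.SubElement(structure, "column", {"name": col_name, "type": col_type})
    # Content: one <row> per record, one <value> per column.
    content = ET.SubElement(root, "content")
    for row in rows:
        row_el = ET.SubElement(content, "row")
        for (col_name, _), value in zip(columns, row):
            cell = ET.SubElement(row_el, "value", {"column": col_name})
            cell.text = str(value)
    return root

# Hypothetical sample data.
columns = [("id", "integer"), ("name", "string")]
rows = [(1, "Vienna"), (2, "The Hague")]
doc = table_to_xml("cities", columns, rows, source="legacy.mdb")
xml_text = ET.tostring(doc, encoding="unicode")
```

Keeping the three aspects in separate elements means a later reader can interpret the rows without the original database software.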
![Page 8: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/8.jpg)
what do repositories want
Conversion to preservable formats:
automatically
at most once
faithfully
![Page 9: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/9.jpg)
preservation strategy
Migration and emulation are complementary strategies. Migration is best for offering usable content. Emulation is best for invoking the original experience.
Migration to XML is normalised migration, hence we call it smart migration.
![Page 10: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/10.jpg)
Ingredients
suitable xml formats for your data
software to convert
legacy data to xml
ingest data to xml
xml to dissemination data
connectors to your repository workflow
![Page 11: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/11.jpg)
MIXED - snapshot
![Page 12: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/12.jpg)
timeline
![Page 13: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/13.jpg)
defining MIXED
![Page 14: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/14.jpg)
XML
XML sounds great
what is MIXED’s XML?
![Page 15: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/15.jpg)
Data kinds
Data comes in kinds, defined by the typical applications that manipulate it.
Spreadsheets, databases, rich text, images, audio, video, drawings, ...
The need for these applications is the basic reason for the threat of data loss caused by software obsolescence.
![Page 16: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/16.jpg)
standards for data kinds
binary vendor formats (doc)
ascii vendor formats (rtf)
open formats (HTML export)
interchange formats (ad-hoc XML)
standard formats (defined XML: OOXML)
preservation formats (selected XML: SDFP)
![Page 17: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/17.jpg)
SDFP
Standard Data Formats for Preservation
Spreadsheets: ODF subset
Databases: e-David-XML
Statistical Data: DDI
![Page 18: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/18.jpg)
SDFP as umbrella
![Page 19: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/19.jpg)
Datatypes
numbers: ISO 6093
date-time: ISO 8601-3
characters: UNICODE
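As a rough illustration of normalising values on the way into a preservation format, the sketch below rewrites a date, a number, and a string into one canonical form each. The source formats (day-month-year dates, comma as decimal separator) and the choice of Unicode NFC are assumptions for the example; the slides name only the standards, not these rules, and the number handling does not claim full ISO 6093 conformance.

```python
import unicodedata
from datetime import datetime
from decimal import Decimal

def normalise_date(text, source_format="%d-%m-%Y"):
    """Rewrite a date in an assumed source format as an ISO 8601 calendar date."""
    return datetime.strptime(text, source_format).date().isoformat()

def normalise_number(text, decimal_sep=",", group_sep="."):
    """Rewrite a locale-style number as a plain decimal string."""
    plain = text.replace(group_sep, "").replace(decimal_sep, ".")
    # Decimal validates the result and preserves trailing zeros.
    return str(Decimal(plain))

def normalise_text(text):
    """Compose characters into Unicode NFC (an assumed normal form)."""
    return unicodedata.normalize("NFC", text)

iso_date = normalise_date("31-12-2008")
plain_number = normalise_number("1.234,50")
composed = normalise_text("e\u0301")  # 'e' followed by a combining acute accent
```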
![Page 20: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/20.jpg)
Scope (kinds)
initially
tabular data
spreadsheets and databases
later
statistical data
and then
text, still images, ...
![Page 21: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/21.jpg)
Scope (aspects)
databases
data model
data itself
spreadsheets
cell positions
values
formulas
Content semantics
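The spreadsheet aspects listed above (cell positions, values, formulas) can be sketched as a small data model that serialises to XML. The `Cell` class and the element names are hypothetical; the actual SDFP spreadsheet format is described in the slides as an ODF subset, which this sketch does not reproduce.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

@dataclass
class Cell:
    position: str                  # e.g. "B2"
    value: str                     # value as last computed
    formula: Optional[str] = None  # e.g. "=A2*2"; None for plain values

def sheet_to_xml(name, cells):
    """Keep position, value and formula; presentation is not modelled."""
    sheet = ET.Element("sheet", {"name": name})
    for cell in cells:
        el = ET.SubElement(sheet, "cell", {"position": cell.position})
        ET.SubElement(el, "value").text = cell.value
        if cell.formula is not None:
            ET.SubElement(el, "formula").text = cell.formula
    return sheet

# Hypothetical sample sheet: B2 is derived from A2.
cells = [Cell("A2", "21"), Cell("B2", "42", formula="=A2*2")]
sheet = sheet_to_xml("budget", cells)
```

Storing the formula next to the computed value keeps both what the cell showed and how it was derived.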
![Page 22: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/22.jpg)
Aspects that didn’t make it
presentation details
fonts
forms
action details
update, insert, delete
stored procedures
triggers
![Page 23: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/23.jpg)
developing MIXED
![Page 24: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/24.jpg)
design principles
building block in workflows
no built-in user interface
easily extensible / updatable
use and produce open source code
![Page 25: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/25.jpg)
framework and plugins
framework
managing plugins
managing execution
administration
plugins
for each conversion
from/to SDFP
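The split between framework and plugins can be sketched as a registry keyed by (source format, target format). This is an illustration in Python, not MIXED's own code; the class name `Framework`, the format labels `"csv"` and `"sdfp-xml"`, and the toy converter are all assumptions.

```python
import csv
import io
import xml.etree.ElementTree as ET

class Framework:
    """Manages plugins, runs conversions, and keeps a simple log."""

    def __init__(self):
        self._plugins = {}  # (source, target) -> converter function
        self.log = []       # administration: one entry per conversion

    def register(self, source, target, converter):
        self._plugins[(source, target)] = converter

    def convert(self, data, source, target):
        try:
            plugin = self._plugins[(source, target)]
        except KeyError:
            raise LookupError(f"no plugin for {source} -> {target}") from None
        result = plugin(data)
        self.log.append((source, target, len(data), len(result)))
        return result

def csv_to_xml(data):
    """Toy plugin: CSV bytes in, XML bytes out (element names are made up)."""
    table = ET.Element("table")
    for row in csv.reader(io.StringIO(data.decode("utf-8"))):
        row_el = ET.SubElement(table, "row")
        for field in row:
            ET.SubElement(row_el, "value").text = field
    return ET.tostring(table, encoding="unicode").encode("utf-8")

framework = Framework()
framework.register("csv", "sdfp-xml", csv_to_xml)
result = framework.convert(b"a,b\n1,2\n", "csv", "sdfp-xml")
```

Adding a conversion then means registering one more function, without touching the framework itself.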
![Page 26: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/26.jpg)
issues
how loosely/tightly are the components connected?
write our own pure Java code, or borrow existing programs in other languages?
modularity of file type recognition (JHOVE)
![Page 27: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/27.jpg)
Using MIXED
![Page 28: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/28.jpg)
Data archives
collect
preserve
re-use
![Page 29: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/29.jpg)
improvements for repositories
• users can select format most usable to them, irrespective of producer
• users can select the preservation format, in case usable formats are not supported
• fewer uncertainties in interpretation, either by humans or by software
![Page 30: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/30.jpg)
further improvements
combine data from heterogeneous sources
• different formats (straightforward)
• different data models (advanced)
• different data kinds
![Page 31: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/31.jpg)
Exploiting MIXED
![Page 32: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/32.jpg)
Research Infrastructures
![Page 33: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/33.jpg)
Data on an Infrastructure
• higher demand for interoperability
• more needs for standards
• more opportunities for re-use
• more scope for digital preservation tools
![Page 34: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/34.jpg)
Conversions needed
lots of them ...
![Page 35: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/35.jpg)
Conversion as a service
• a uniform resource
• yielding uniform results
• easily accessible
• product of community effort
• a good conversion requires a lot of intelligent work
• quality is reached in an iterative manner
![Page 36: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/36.jpg)
MIXED as Infrastructure
• provides a standard for preservation formats
• implements the tools to maintain the standard
• accumulates the shared wisdom of data formats
![Page 37: 2009 PLANETS Vienna - MIXED migration to XML](https://reader033.fdocuments.in/reader033/viewer/2022060119/558c91ffd8b42af2428b475a/html5/thumbnails/37.jpg)
The End of MIXED
when software vendors realize
that there should always be
an im/export
to a preservation format,
it means ...........