Embedding Object Data in TEI Documents
Challenges and Solutions
Florian Willems, M. A. – cand. phil. Sven Ole Clemens
http://cceh.uni-koeln.de/
November 27, 2009
Ladies and gentlemen, we are Sven Ole Clemens and Florian Willems of
the Institute for Computer Science for the Humanities at the University
of Cologne. Both of us work with the TEI, albeit in different projects:
I work in the ENRICH project of the European Union, Sven Ole in the
“Stichwerke” project of the Forschungsarchiv für Antike Plastik and the
German Archaeological Institute. In this presentation we will give you a
glimpse of how object data for the material sciences can be incorporated
into TEI-encoded books.
Books
For the last ten years, the Forschungsarchiv has been digitizing early
printed books on archaeology and its adjacent fields. One of the earliest
projects was the re-issue of the catalogue of the Museo Maffeiano, the
original dating from 1749 and being one of the earliest scientific
archaeological publications. Even in this project, one of the principal
requirements was the connection of object data with the scanned pages of
the book itself, to facilitate a critical approach to the reception of
antiquity. This led to many generations of frontends, many methods of
interaction between books and object data, and some dead ends. Here we
see the general viewer in its latest incarnation.
The TEI, Its Use...
The ongoing development of the TEI and some circumstances related to
project funding led to our adoption of it. Realizing that it had gained a
lot of acceptance in the text community, we took the big step of ditching
all our other text and markup methods, except, naturally, metadata formats
like METS/MODS and MARC 21, which we need in order to be fully
interoperable with other metadata harvesters and warehouses. But all the
real text work now depends solely on the TEI, for which we developed an
editor and a viewer, which will be demonstrated later.
...and Limitations
As I said before, one of the main objectives of all our book-related
efforts is and always has been the connection of books with the
real-world objects mentioned therein. Here the TEI in its standard form
falls decidedly short if one wants not only to create links to the URIs
of objects but also to contain the object data within the “book object”.
But for now, let’s take a look at our objects and their representations.
Objects...
Physical entities
Controlled and uncontrolled descriptions
Inheritance
Abstract and non-abstract
Opinions and time
Objects, in our very narrow and at the same time very broad sense, are
physical entities which can be photographed and/or described in terms of
controlled and uncontrolled vocabulary. They may themselves consist of
multiple other objects and may share and inherit attributes and values in
an object-oriented way. Most of them originate in classical antiquity, but
there are also abstract classes of objects, such as topographical ones or
different receptions of the same physical object over time.
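The object-oriented sharing of attributes between a composite object and its parts can be sketched in a few lines. This is a minimal illustration under assumptions, not Arachne’s actual data model: the `ArchObject` class, its `parent` link and the sample attribute values are hypothetical. A part looks an attribute up locally first and falls back to the object it belongs to.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ArchObject:
    """Hypothetical record for a physical object that may be part of a larger one."""
    name: str
    attributes: dict = field(default_factory=dict)
    # Back-reference to the containing object; excluded from repr/eq to avoid cycles.
    parent: Optional["ArchObject"] = field(default=None, repr=False, compare=False)
    parts: list = field(default_factory=list, repr=False, compare=False)

    def add_part(self, part: "ArchObject") -> "ArchObject":
        """Attach a sub-object, which from then on inherits this object's attributes."""
        part.parent = self
        self.parts.append(part)
        return part

    def get(self, key: str):
        """Look the attribute up locally, then along the chain of containing objects."""
        node = self
        while node is not None:
            if key in node.attributes:
                return node.attributes[key]
            node = node.parent
        raise KeyError(key)


# Invented sample values, for illustration only.
statue = ArchObject("statue", {"material": "marble", "period": "Roman"})
head = statue.add_part(ArchObject("head", {"condition": "restored"}))
print(head.get("material"))   # marble (inherited from the statue)
print(head.get("condition"))  # restored (defined on the head itself)
```

Keeping only a back-reference and resolving attributes on lookup means a correction made on the whole object is immediately visible on all of its parts.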
...and Contexts
Contexts are what makes the difference between a glorified card index
and a serious research database, at least in the first step. This first
step requires a lot of manual work by competent knowledge workers. Only
then do you reach a degree of certainty about the validity of the data,
and only then can you start to think in earnest about the possibilities
of a contextualised, object-oriented database.
Relations, Objects and RDF, oh my!
Openness is the only way
Interaction is necessary
RDF is one possibility
CIDOC-CRM is the choice for the cultural, material sciences
So. To really use all the possibilities of validated data, you need to
take it out into the open and let it interact with other data. To this
end, you need open standards and interfaces: standards which can reflect
the contexts and make them available to other databases, standards you
can import. We chose RDF and the CIDOC-CRM. At the moment, around 25
percent of Arachne’s data fields are mapped to the CIDOC-CRM, with
considerable effort going into expanding these mappings to interact with
a wide variety of research databases; in our case, as studies and
prototypes, the CLAROS project at Oxford and the Perseus project at
Boston.
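To give an idea of what one object exposed through such a CIDOC-CRM mapping can look like as RDF, the following sketch serialises a single object as N-Triples using only the Python standard library. It is an illustration, not Arachne’s actual mapping: the object URI is hypothetical (`example.org`), and the class and property names (`E22_Man-Made_Object`, `P3_has_note`) are assumed to follow the RDFS encoding of the CRM versions current at the time of this talk.

```python
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def iri(value: str) -> str:
    """Wrap an absolute IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value: str, lang: str = "") -> str:
    """Quote a string literal, escaping backslash, double quote and line breaks."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def object_to_ntriples(uri: str, label: str, note: str) -> str:
    """Describe one object as a CIDOC-CRM E22 with a label and a free-text note."""
    subject = iri(uri)
    triples = [
        (subject, iri(RDF_TYPE), iri(CRM + "E22_Man-Made_Object")),
        (subject, iri(RDFS_LABEL), literal(label, "en")),
        (subject, iri(CRM + "P3_has_note"), literal(note, "en")),
    ]
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


print(object_to_ntriples(
    "http://example.org/arachne/object/1",  # hypothetical object URI
    "Marble statue",
    'Sample note with "quotes" to show escaping.',
), end="")
```

N-Triples was chosen here only because it is line-based and needs no library; any RDF serialisation would carry the same statements.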
Standards are great, let’s make a new one!
All these requirements, had we reviewed them a few years back, would just
shout for a new standard for incorporating objects in tagged texts. Why
not create an elaborate XML schema to combine the power of RDF and the
TEI? Why not make it a little bit proprietary so that it can contain
binary objects?
Standards are great, let’s make a new one!
No.
This is definitely not what any scientist of sound mind can want,
although it is frequently done, especially in Germany. But we thought for
a while and took the obvious step:
Containers and Self-Containment
Our first and foremost goal is the markup of text. But a close second is
the connection between text and objects, not only by link but in the form
of a self-contained object. This object shall contain all the data on the
objects mentioned in the text, with a plausible amount of context data.
And as self-containment and linking are not mutually exclusive, why not
use both: the “static” embedded data, reflecting the knowledge about the
object at the time the self-contained object was created, and the link to
the database, reflecting the current point of view on the object. This
leads to some further thoughts:
Cite, Store, Version
Any one object must be citeable
All non-binary data must be human-readable
It has to be possible to follow the genesis of the information in the object
All harvested data should be harvested as snapshots maintaining the status and version of the given data set
Any one object, be it the self-contained complex one or one of its
children, has to be citable. To achieve this, the whole object itself has
to be under version control, be it through a secondary system like SVN or
as an inherent ability of a future, advanced container. All non-binary
data has to be human-readable, for archival purposes as well as for sheer
common sense. Combining these two, it poses no problem to use tools like
“diff” or “patch” to follow and reflect on the genesis of the contained
knowledge. Knowing its own version number, each object can be
distinguished from its older or younger cousins, thus facilitating what we
Germans like to call “Quellenkritik” (source criticism).
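Because the embedded data are plain text, the genesis of an object’s description can be followed with an ordinary line-based diff. The sketch below uses Python’s `difflib` in place of the command-line “diff”; the `Snapshot` class, the two versions, their retrieval dates and the `uri@vN` citation key are all hypothetical, invented to show how a harvested snapshot can carry its own version and link back to the live record.

```python
import difflib
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """A harvested copy of an object's data, frozen at a given version."""
    uri: str        # link to the live database record
    version: int    # version number the object knows about itself
    retrieved: str  # ISO date of harvesting
    data: str       # human-readable payload (e.g. XML or N-Triples)

    def citation(self) -> str:
        """Hypothetical citation key: record URI plus version number."""
        return f"{self.uri}@v{self.version}"


def genesis(old: Snapshot, new: Snapshot) -> str:
    """Unified diff showing how the description changed between two versions."""
    lines = difflib.unified_diff(
        old.data.splitlines(keepends=True),
        new.data.splitlines(keepends=True),
        fromfile=old.citation(),
        tofile=new.citation(),
        fromfiledate=old.retrieved,
        tofiledate=new.retrieved,
    )
    return "".join(lines)


uri = "http://example.org/arachne/object/1"  # hypothetical record
v1 = Snapshot(uri, 1, "2008-05-02", "material: marble\ndate: 2nd century AD\n")
v2 = Snapshot(uri, 2, "2009-11-20", "material: marble\ndate: 1st century AD\n")
print(genesis(v1, v2))
```

The output is a standard unified diff, so the same text could be fed to “patch” to move a stored snapshot from one version to the next.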
Interoperability and Usage
Sometimes, ignorance is bliss
Waste Not, Want Not!
Different editors for different data
Easy-to-program extraction tools
Now that we have our object, we need to think about the real-life
applications which could use such a contraption. Sure, someone whose sole
interest lies in the text itself doesn’t need all the supplementary object
data. But as hard drives grow, a little too much data can’t hurt anymore;
we’re talking mostly about text and, in the first generations, some image
data. So it should be possible to easily write an editor just for the data
one’s research interest is focused on. As XML is notoriously easy to
parse, extraction tools for the relevant data may consist of only a few
lines of Perl or Java. As we see on the left-hand side, one self-contained
object may even contain others, so that it holds all the secondary
literature relevant to itself, as far as that literature is digitally
available.
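To show how small such an extraction tool can be, here is a sketch in Python rather than Perl or Java, using only the standard library. The sample document is invented, and placing the RDF inside `<xenoData>` is an assumption: that element was added to TEI P5 after this talk, and the talk does not say which element it overloads. Because the extractor walks the whole tree, it does not depend on where the RDF is embedded.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RDFS = "{http://www.w3.org/2000/01/rdf-schema#}"

SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample catalogue</title></titleStmt>
      <publicationStmt><p>Unpublished sample</p></publicationStmt>
      <sourceDesc><p>Invented for illustration</p></sourceDesc>
    </fileDesc>
    <xenoData>
      <rdf:RDF>
        <rdf:Description rdf:about="http://example.org/arachne/object/1">
          <rdfs:label>Marble statue</rdfs:label>
        </rdf:Description>
      </rdf:RDF>
    </xenoData>
  </teiHeader>
  <text><body><p>See the <name ref="http://example.org/arachne/object/1">statue</name>.</p></body></text>
</TEI>"""


def extract_objects(tei_xml: str) -> dict:
    """Map each embedded object URI to its label, wherever the RDF sits in the tree."""
    root = ET.fromstring(tei_xml)
    objects = {}
    for desc in root.iter(RDF + "Description"):
        label = desc.findtext(RDFS + "label", default="")
        objects[desc.get(RDF + "about")] = label.strip()
    return objects


def referenced_uris(tei_xml: str) -> list:
    """Collect the @ref links from <name> elements (the live side of the pairing)."""
    root = ET.fromstring(tei_xml)
    return [el.get("ref") for el in root.iter(TEI + "name") if el.get("ref")]


print(extract_objects(SAMPLE))
print(referenced_uris(SAMPLE))
```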
Real Life, so far
Here we see an example where a self-contained object is the best choice
for working in a distributed context: the catalogue “Musée de sculpture
antique et moderne” by the Comte de Clarac. It contains all the pages of
the catalogue, connected via the database to the actual photographs of the
objects. It also contains all the texts Clarac himself wrote about the
objects, as well as their subsequent history of reception and ownership
and the most recent research data on most of the statues. This data
resides inside Arachne, accessible to all, but it is not as easy to export
and integrate into a distributed environment as we would like: we can
export the object data via the CIDOC CRM, but the connection to the book
itself would most probably be lost in this export. So here we could embed
the CIDOC data as RDF inside the TEI document. This project may not yet be
TEI-ified, but the ongoing projects at the Forschungsarchiv are and will
be. So: off to the current tools and projects.
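Embedding the CIDOC data as RDF inside the TEI document could look roughly like the following sketch, which builds the document with Python’s ElementTree. Several choices here are assumptions rather than the authors’ method: the RDF/XML snapshot sits in `<xenoData>` (an element added to TEI P5 after this talk; the talk does not name the element it overloads), the CRM names follow the RDFS encoding of that period, and the record URI is hypothetical. The snapshot is the “static” part; the `@ref` on `<name>` keeps the link to the live database record.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CRM_NS = "http://www.cidoc-crm.org/cidoc-crm/"

# Serialise TEI as the default namespace and keep readable rdf:/crm: prefixes.
ET.register_namespace("", TEI_NS)
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("crm", CRM_NS)


def q(ns: str, tag: str) -> str:
    """Return the {namespace}tag form that ElementTree uses for qualified names."""
    return f"{{{ns}}}{tag}"


def build_tei(obj_uri: str, note: str, mention: str) -> ET.Element:
    """Build a minimal TEI document holding an RDF snapshot plus a link to the record."""
    tei = ET.Element(q(TEI_NS, "TEI"))
    header = ET.SubElement(tei, q(TEI_NS, "teiHeader"))
    file_desc = ET.SubElement(header, q(TEI_NS, "fileDesc"))
    for wrapper, child, text in (("titleStmt", "title", "Sample catalogue"),
                                 ("publicationStmt", "p", "Unpublished sample"),
                                 ("sourceDesc", "p", "Invented for illustration")):
        ET.SubElement(ET.SubElement(file_desc, q(TEI_NS, wrapper)), q(TEI_NS, child)).text = text

    # Static part: the object described with CIDOC-CRM terms, frozen at export time.
    xeno = ET.SubElement(header, q(TEI_NS, "xenoData"))
    rdf = ET.SubElement(xeno, q(RDF_NS, "RDF"))
    desc = ET.SubElement(rdf, q(RDF_NS, "Description"), {q(RDF_NS, "about"): obj_uri})
    ET.SubElement(desc, q(RDF_NS, "type"),
                  {q(RDF_NS, "resource"): CRM_NS + "E22_Man-Made_Object"})
    ET.SubElement(desc, q(CRM_NS, "P3_has_note")).text = note

    # Live part: the mention in the running text points back to the database record.
    body = ET.SubElement(ET.SubElement(tei, q(TEI_NS, "text")), q(TEI_NS, "body"))
    para = ET.SubElement(body, q(TEI_NS, "p"))
    para.text = "See the "
    name = ET.SubElement(para, q(TEI_NS, "name"), {"ref": obj_uri})
    name.text = mention
    name.tail = "."
    return tei


doc = build_tei("http://example.org/arachne/object/1",  # hypothetical record URI
                "Sample note harvested from the database.", "statue")
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```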
An Editor
I leave the practical demonstration to the person who did much of the
actual programming: Sven Ole Clemens.
Conclusion and Possibilities
Use a given standard
Even use it as a container!
Facilitate versioning, storage and communication
Not everyone needs all the contained data, but who cares?
Next steps: advanced containers!
e.g. a .tar which also contains all the binary data...
To summarize, we use a given standard, overload one of its tags just
slightly, and thereby create a self-contained textual object which can
be versioned, archived long-term, and exploded into its different
contents by very simple means. The next step would be an advanced
container, a .tar or something like that, which can also hold binary
data relevant to the object itself. We’ll see how it develops.
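Such an advanced container could be as simple as an uncompressed tar archive holding the TEI document next to the binary files it refers to. The sketch below uses Python’s `tarfile` module; the member names (`document.xml`, `images/page-001.tif`) and the in-memory approach are assumptions for illustration, not a format defined in the talk.

```python
import io
import tarfile


def pack_container(members: dict) -> bytes:
    """Write name -> bytes pairs into an in-memory tar archive and return its bytes."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, payload in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def explode_container(archive: bytes) -> dict:
    """Read every regular file back out of the archive ("exploding" the container)."""
    contents = {}
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        for info in tar.getmembers():
            if info.isfile():
                contents[info.name] = tar.extractfile(info).read()
    return contents


container = pack_container({
    "document.xml": b"<TEI xmlns='http://www.tei-c.org/ns/1.0'/>",
    "images/page-001.tif": bytes(range(16)),  # stand-in for a scanned page image
})
print(sorted(explode_container(container)))  # ['document.xml', 'images/page-001.tif']
```

A plain tar keeps the XML member byte-for-byte unchanged, so after extraction it stays human-readable and can still be versioned and diffed like any other text file.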
Thank You!
http://cceh.uni-koeln.de/
http://hki.uni-koeln.de/
http://arachne.uni-koeln.de/
http://enrich.manuscriptorium.com/
Thanks for your attention. Now the obligatory question: Questions?