Embedding Object Data in TEI Documents

Challenges and Solutions

Florian Willems, M. A. – cand. phil. Sven Ole Clemens

http://cceh.uni-koeln.de/

November 27, 2009

Ladies and gentlemen, we are Sven Ole Clemens and Florian Willems of the Institute for Computer Science for the Humanities at the University of Cologne. Both of us work with the TEI, albeit in different projects: I work in the ENRICH project of the European Union, Sven Ole in the “Stichwerke” project of the Forschungsarchiv für Antike Plastik and the German Archaeological Institute. In this presentation we will give you a glimpse of how object data for the material sciences can be incorporated in TEI-described books.

Books

For the last ten years, the Forschungsarchiv has been digitizing early printed books on archaeology and its adjacent fields. One of the earliest projects was the re-issue of the catalogue of the Museo Maffeiano, the original dating from 1749 and being one of the earliest scientific archaeological publications. Even in this project, one of the principal requirements was the connection of object data with the scanned pages of the book itself, to facilitate a critical approach to the reception of antiquity. This led to many generations of frontends, many methods of interaction between books and object data, and some dead ends. Here we see the general viewer in its latest incarnation.

The TEI, Its Use...

The ongoing development of the TEI and some circumstances related to project funding led to our adoption of it. Realizing that it had gained a lot of acceptance in the text community, we took the great step of ditching all our other text- and markup-related methods, except, naturally, the metadata formats like METS/MODS and MARC21, which we need to be fully interoperable with other metadata harvesters and warehouses. But all the real text work now depends solely on the TEI, for which we developed an editor and a viewer, which will be demonstrated later.

...and Limitations

As I said before, one of the main objectives of all our book-related efforts is and always has been the connection of books with the real-world objects mentioned therein. Here the TEI in its standard phenotype falls decidedly short if one wants not only to create links to the URIs of objects but also to contain the object data within the “book object”. But for now, let’s take a look at our objects and their representations.

Objects...

Physical entities

Controlled and uncontrolled descriptions

Inheritance

Abstract and non-abstract

Opinions and time

Objects in our very narrow and at the same time very broad sense are physical entities which can be photographed and/or described in terms of controlled and uncontrolled vocabulary. They may themselves consist of multiple other objects and may share and inherit attributes and values in an object-oriented way. Most of them originate in classical antiquity, but there are also abstract classes of objects, such as topographical ones or the different receptions of the same physical object over time.

...and Contexts

Contexts are what makes the difference between a glorified card index and a serious research database, in the first step at least. This first step necessitates a lot of manual work by competent knowledge workers. Then and only then do you reach a degree of certainty about the validity of the data, and then you can start to think in earnest about the possibilities of a contextualised, object-oriented database.

Relations, Objects and RDF, oh my!

Openness is the only way

Interaction is necessary

RDF is one possibility

CIDOC-CRM is the choice for the cultural, material sciences

So. To really use all the possibilities of validated data, you need to take it out into the open and let it interact with other data. And to this end, you need open standards and interfaces: standards which can reflect the contexts and make them available to other databases, standards you can import. We chose RDF and CIDOC-CRM. At the moment, around 25 percent of Arachne’s data fields are mapped to the CIDOC-CRM, with considerable effort going into expanding these mappings to interact with a wide variety of research databases. In our case these are, as studies and prototypes, the CLAROS project at Oxford and the Perseus project at Boston.
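To make the idea of a partial field mapping concrete, here is a minimal sketch in Python. It is not Arachne's actual mapping: the field names (title, material, location) and the helper record_to_triples are invented for illustration, and the values are written as plain literals for brevity, whereas a faithful CIDOC-CRM export would model materials and places as entities of their own. Only the CRM property names are taken from the standard.

```python
# Hypothetical mapping from database field names to CIDOC-CRM properties.
# The field names are assumptions for illustration, not Arachne's schema.
CRM = "http://www.cidoc-crm.org/cidoc-crm/"

FIELD_TO_CRM = {
    "title": CRM + "P102_has_title",
    "material": CRM + "P45_consists_of",
    "location": CRM + "P55_has_current_location",
}


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def record_to_triples(subject_uri, record):
    """Return N-Triples-style lines for every mapped field of one record.

    Unmapped fields are skipped silently, which is what a partial
    mapping amounts to in practice.
    """
    triples = []
    for field, value in sorted(record.items()):
        prop = FIELD_TO_CRM.get(field)
        if prop is None:
            continue  # no CRM mapping for this field yet
        triples.append(f'<{subject_uri}> <{prop}> "{escape_literal(value)}" .')
    return triples


if __name__ == "__main__":
    record = {"title": "Statue", "material": "marble", "inventory": "X1"}
    for line in record_to_triples("http://example.org/object/1", record):
        print(line)
```

In this sketch the inventory field simply produces no triple, so anything left unmapped never reaches the other databases.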

Standards are great, let’s make a new one!

All these requirements, if reviewed a few years back, would just shout for a new standard for incorporating objects in tagged texts. Why not create an elaborate XML schema to combine the power of RDF and the TEI? Why not make it a little bit proprietary to contain binary objects?

Standards are great, let’s make a new one!

No.

This is definitely not what any scientist of sound mind can want, although it is frequently done, especially in Germany. But we thought for a while and took the obvious step:

Containers and Self-Containment

Our first and foremost goal is the markup of text. But a close second is the connection between text and objects, not only by link but as a form of self-contained object. This object shall contain all the data on the objects mentioned in the text, with a plausible amount of context data. And as self-containment and linking are not mutually exclusive, why not use both: the “static” embedded data, reflecting the knowledge about the object at the time the self-contained object was created, and the link to the database, reflecting the current point of view on the object. This leads to some further thoughts:
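The combination of a static snapshot and a live link could be sketched as follows. Everything specific here is an assumption made for illustration: the talk does not name the tag that is overloaded, so the sketch simply uses a TEI note with a made-up type value of objectData, a ref for the live link, and an rdf:RDF block for the snapshot. A real project would also need a schema customisation to permit elements from a foreign namespace inside the TEI document.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CRM = "http://www.cidoc-crm.org/cidoc-crm/"

# Prefixes used only when the tree is serialised.
ET.register_namespace("", TEI)
ET.register_namespace("rdf", RDF)
ET.register_namespace("crm", CRM)


def embed_object(parent, object_uri, title):
    """Attach a live link and a static RDF snapshot for one object.

    The container element (a TEI note with type="objectData") is a
    hypothetical choice, not the tag used by the authors.
    """
    note = ET.SubElement(parent, f"{{{TEI}}}note", {"type": "objectData"})

    # Live link: resolves to the database record, i.e. the current view.
    ref = ET.SubElement(note, f"{{{TEI}}}ref", {"target": object_uri})
    ref.text = title

    # Static snapshot: what was known when the container was created.
    rdf = ET.SubElement(note, f"{{{RDF}}}RDF")
    desc = ET.SubElement(
        rdf, f"{{{RDF}}}Description", {f"{{{RDF}}}about": object_uri}
    )
    ET.SubElement(desc, f"{{{CRM}}}P102_has_title").text = title
    return note


if __name__ == "__main__":
    div = ET.Element(f"{{{TEI}}}div")
    embed_object(div, "http://example.org/object/42", "Statue of Athena")
    print(ET.tostring(div, encoding="unicode"))
```

A reader interested only in the text can ignore the note entirely, while a reader interested in the object gets both the frozen record and the pointer to its current state.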

Cite, Store, Version

Any one object must be citable

The human-readable data must be human-readable

It has to be possible to follow the genesis of the information in the object

All harvested data should be harvested as snapshots maintaining the status and version of the given data set

Any one object, be it the self-contained complex one or one of its children, has to be citable. To achieve this, the whole object itself has to have version control, be it through a secondary system like SVN or as an inherent ability of a future, advanced container. All non-binary data has to be human-readable, for archival purposes as well as for sheer common sense. Combining these two, it poses no problem to use tools like “diff” or “patch” to follow and reflect on the genesis of the contained knowledge. Knowing its own version number, each object can be distinguished from its older or younger cousins, thus facilitating what we Germans like to call “Quellenkritik” (source criticism).
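Because the snapshots are plain text, following their genesis needs nothing more than a line-based comparison. The sketch below uses Python's standard difflib module in place of the command-line diff tool; the function name snapshot_diff, the version labels and the sample records are hypothetical.

```python
import difflib


def snapshot_diff(old_text, new_text, old_version, new_version):
    """Return a unified diff between two human-readable snapshots of one object."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=f"object@{old_version}",
            tofile=f"object@{new_version}",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Two invented snapshots of the same object record.
    old = "<title>Statue of a youth</title>\n<material>marble</material>"
    new = "<title>Statue of Apollo</title>\n<material>marble</material>"
    for line in snapshot_diff(old, new, "r1", "r2"):
        print(line)
```

The version labels in the diff header are what make an older and a younger snapshot distinguishable when they are cited.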

Interoperability and Usage

Sometimes, ignorance is bliss

Waste Not, Want Not!

Different editors for different data

Easy-to-program extraction tools

Now that we have our object, we need to think about the real-life applications which could use such a contraption. Sure, someone whose sole interest lies in the text itself doesn’t need all the supplementary object data. But as hard drives grow, a little too much data can’t hurt anymore: we’re talking mostly about text and, in the first generations, some image data. So it should be possible to easily write an editor just for the data one’s research interest is focused on. As XML is notoriously easy to parse, extraction tools for relevant data may consist of only a few lines of Perl or Java. As we see on the left-hand side, one self-contained object may even contain others, so as to hold all the secondary literature relevant to itself, digital availability permitting.
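As an illustration of how small such an extraction tool can be, here is a sketch in Python rather than Perl or Java. It assumes, hypothetically, that the object data is embedded as rdf:Description elements somewhere inside the TEI document; the function name extract_object_uris and the sample document are made up for the example.

```python
import xml.etree.ElementTree as ET

# Namespace in ElementTree's {uri}local notation.
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"


def extract_object_uris(tei_xml):
    """Return the rdf:about URI of every embedded description, in document order."""
    root = ET.fromstring(tei_xml)
    return [desc.get(RDF + "about") for desc in root.iter(RDF + "Description")]


if __name__ == "__main__":
    # Invented sample: a TEI skeleton with two embedded object descriptions.
    sample = (
        '<TEI xmlns="http://www.tei-c.org/ns/1.0" '
        'xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
        "<text><body><p>A statue.</p><note><rdf:RDF>"
        '<rdf:Description rdf:about="http://example.org/object/1"/>'
        '<rdf:Description rdf:about="http://example.org/object/2"/>'
        "</rdf:RDF></note></body></text></TEI>"
    )
    print(extract_object_uris(sample))
```

The tool never looks at the TEI markup itself, which is the point: each consumer reads only the part of the container it cares about.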

Real Life, so far

Here we see an example where a self-contained object is the best choice for working in a distributed context: the catalogue “Musée de sculpture antique et moderne” by the Comte de Clarac. It contains all the pages of the catalogue, connected via the database to the actual photographs of the objects. It also contains all the texts Clarac himself wrote about the objects, as well as their subsequent history of reception and ownership and the most recent research data on most of the statues. This data resides inside Arachne, accessible to all, but not as easy to export and integrate into a distributed environment as we would like it to be: we may export the object data via CIDOC-CRM, but the connection to the book itself would most probably be lost with this export. So here we could embed the CIDOC data as RDF inside the TEI document. This project may not yet be TEIified, but the ongoing projects at the Forschungsarchiv are and will be. So: off to the current tools and project.

An Editor

I leave the practical demonstration to the person who did much of the actual programming: Sven Ole Clemens.

Conclusion and Possibilities

Use a given standard

Even use it as a container!

Facilitate versioning, storage and communication

Not everyone needs all the contained data, but who cares?

Next steps: advanced containers!

e.g. a .tar which also contains all the binary data...

To summarize: we use a given standard, overload one of its tags just slightly, and by these means create a self-contained textual object which can be versioned and archived for the long term, and which can be exploded into its different contents by very simple means. The next step would be an advanced container, a .tar or something like that, which can also hold binary data relevant to the object itself. We’ll see how it will develop.
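What such an advanced container might look like can be sketched with Python's standard tarfile module. The member names (book.xml, images/page001.jpg) and both helper functions are hypothetical; the sketch only shows that text and binary data can travel in one archive and be listed again with very little code.

```python
import io
import tarfile


def build_container(members):
    """Pack a {name: bytes} mapping into an uncompressed in-memory tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)  # size must be set before adding the member
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def list_container(data):
    """Return the member names of a tar archive given as bytes."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r") as archive:
        return archive.getnames()


if __name__ == "__main__":
    container = build_container(
        {
            "book.xml": b"<TEI/>",                  # the TEI document itself
            "images/page001.jpg": b"\xff\xd8\xff",  # placeholder bytes, not a real image
        }
    )
    print(list_container(container))
```

Keeping the archive uncompressed leaves the XML member readable with ordinary text tools, in line with the requirement that non-binary data stay human-readable.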

Thank You!

http://cceh.uni-koeln.de/

http://hki.uni-koeln.de/

http://arachne.uni-koeln.de/

http://enrich.manuscriptorium.com/

Thanks for your attention. Now the obligatory question: Questions?