PC in TB. Manfred Thaller, PLANETS TB meeting, Den Haag, Sept. 28th '06.



PC* in TB

* as represented by PC/2, PC/4 and PP/5

or: The XCEL / XCDL concept.

Manfred Thaller PLANETS TB, Den Haag, Sept. 28th '06


Building block I

A language that allows a program to read "any file specification":

==> "eXtensible Characterisation Extraction Language" (XCEL)

Formulate the human-readable specifications of TIFF, RTF, WAV … in a language that a general-purpose program can read.

General enough that any existing format specification can be expressed in it (LaTeX, MAX, VRML …).



Building block I - Warning

After the alphabet had been designed ... somebody still had to write all those books.


Building block II

A language that allows a program to describe "any file content":

==> "eXtensible Characterisation Definition Language" (XCDL)

Formulate the content of any file in an abstract language that captures the complete information contained in it.

General enough that any existing content can be expressed in it.


Building block III

A program that interprets a format description written in XCEL and uses it to extract, from any file of that format, an XCDL description of its content.

Production level quality. Indicative performance: <= 1 second / file.
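A minimal sketch of the idea behind building block III, assuming a heavily simplified stand-in for a format description. The dictionary layout, the names `PNG_IHDR_DESCRIPTION`, `extract_properties` and `make_png_header` are hypothetical and are not XCEL or XCDL syntax; only the PNG header offsets are taken from the PNG format itself. The point it illustrates is that the reading code knows nothing about PNG: everything format-specific lives in the description.

```python
import struct

# Hypothetical, simplified stand-in for a format description (NOT real XCEL):
# each symbol has a name, a byte offset, and a struct type code.
PNG_IHDR_DESCRIPTION = {
    "byte_order": ">",  # PNG stores multi-byte integers big-endian
    "symbols": [
        {"name": "width", "offset": 16, "type": "I"},
        {"name": "height", "offset": 20, "type": "I"},
        {"name": "bitDepth", "offset": 24, "type": "B"},
        {"name": "colourType", "offset": 25, "type": "B"},
    ],
}


def extract_properties(data: bytes, description: dict) -> dict:
    """Read every symbol named in `description` from `data`.

    The function is format-agnostic: it only follows the description.
    """
    order = description["byte_order"]
    properties = {}
    for symbol in description["symbols"]:
        fmt = order + symbol["type"]
        (value,) = struct.unpack_from(fmt, data, symbol["offset"])
        properties[symbol["name"]] = value
    return properties


def make_png_header(width: int, height: int, bit_depth: int, colour_type: int) -> bytes:
    """Build the first 26 bytes of a PNG file (signature plus start of IHDR)."""
    signature = b"\x89PNG\r\n\x1a\n"
    chunk_start = struct.pack(">I4s", 13, b"IHDR")  # chunk length, chunk type
    fields = struct.pack(">IIBB", width, height, bit_depth, colour_type)
    return signature + chunk_start + fields


if __name__ == "__main__":
    header = make_png_header(640, 480, 8, 2)
    print(extract_properties(header, PNG_IHDR_DESCRIPTION))
```

Supporting another format in this sketch would mean writing another description, not another reader.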


Building block IV

A program that takes two XCDL descriptions and delivers a statement about the similarity of the information they describe.
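As an illustration only, the comparison step could look roughly like the following, assuming each description has already been reduced to a flat mapping of property names to values. The function name `compare_descriptions` and the similarity score (fraction of properties that match) are assumptions made for this sketch; the slides leave the actual metrics to PP/5.

```python
def compare_descriptions(a: dict, b: dict) -> dict:
    """Return a simple statement about how similar two property mappings are."""
    names = sorted(set(a) | set(b))
    shared = [n for n in names if n in a and n in b]
    matching = [n for n in shared if a[n] == b[n]]
    differing = [n for n in shared if a[n] != b[n]]
    only_in_one = [n for n in names if n not in shared]
    # Assumed metric: share of all observed properties that agree exactly.
    similarity = len(matching) / len(names) if names else 1.0
    return {
        "matching": matching,
        "differing": differing,
        "only_in_one": only_in_one,
        "similarity": similarity,
    }


if __name__ == "__main__":
    original = {"width": 640, "height": 480, "bitDepth": 8}
    migrated = {"width": 640, "height": 480, "bitDepth": 16, "compression": "LZW"}
    print(compare_descriptions(original, migrated))
```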


Relationship to DOW

PC/2 defines the languages. (Starting: month 1 – [ finished month 18 ].) Deliverable: end of month 5. Reuses PRONOM / DROID.

PC/4 implements the extraction mechanism. (Starting: month 1, oops, 4 – [ finished month 18 ].) Reuses any existing tools.

PP/5 implements the comparison mechanism and the metrics of similarity of "information". (Starting: month 15.)


Metadata Derivation

File format A: # of color bands

File format B: depth

<xsd:complexType name="bitDepth">
  <xsd:complexContent>
    <xsd:extension base="symbolType">
      <xsd:sequence>
        <xsd:element name="validValues" type="integerList"/>
      </xsd:sequence>
    </xsd:extension>
  </xsd:complexContent>
</xsd:complexType>


Metadata Derivation

From observed file properties

==> Property Ontology


Schema Architecture

Basic Elements: Byte Order, Encodings, Position Types, ...

Structuring Elements: Item (logical unit that contains at least one sub-item), Symbol (smallest logical unit)

Processing Instructions: filepointers, symbol-counters, …

Image Schema: Colour Type, Width, Height, Bit Depth, …

Text Schema: Font-Style, Font-Family, Size, Language, …

Multimedia Schema: Pitch, Samplerate, Channels, Framerate, ...

Instances: PNG, RTF, TIFF, PDF, WAV, MPEG4


Metrics of Comparison I

"Information" will be grouped according to three levels:

– Descriptive (width, height, photometric interpretation, aka “1 = red”)

– History (compression, photometric interpretation, aka “1 = red”)

– Content (bytestream)


Metrics of Comparison II

– Descriptive (width, height, photometric interpretation, aka “1 = red”): Can this be the same object?

– History (compression, photometric interpretation, aka “1 = red”): Can this have been the same object?

– Content (bytestream): Is this the same object?
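The three questions above can be sketched as one check per level. This is a rough illustration under stated assumptions: the `Description` class and `compare_levels` function are hypothetical, and plain equality stands in for whatever metric each level would really use (a real comparison would presumably tolerate, for example, a change of compression at the history level).

```python
from dataclasses import dataclass


@dataclass
class Description:
    """Hypothetical three-level view of one object's extracted information."""
    descriptive: dict  # e.g. width, height
    history: dict      # e.g. compression
    content: bytes     # the normalised bytestream


def compare_levels(a: Description, b: Description) -> dict:
    """Answer one question per level, using plain equality as a placeholder."""
    return {
        "Can this be the same object?": a.descriptive == b.descriptive,
        "Can this have been the same object?": a.history == b.history,
        "Is this the same object?": a.content == b.content,
    }


if __name__ == "__main__":
    before = Description({"width": 2, "height": 1}, {"compression": "none"}, b"\x01\x02")
    after = Description({"width": 2, "height": 1}, {"compression": "LZW"}, b"\x01\x02")
    for question, answer in compare_levels(before, after).items():
        print(question, answer)
```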


Metrics of Comparison III

– Is the sequence of (UTF16) characters the same?

– Are properties with the same symbolic name applied to the same areas within the UTF16 sequence?

– Are the properties related to the same objects?
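For text, the first two questions can be sketched as follows. The representation is an assumption made for illustration: each property is a hypothetical `(name, start, end)` tuple over the character sequence, and the function names are invented here. The third question (whether properties relate to the same objects) is left out because the slides do not say how objects are identified.

```python
def same_character_sequence(text_a: str, text_b: str) -> bool:
    """Compare the two texts as UTF-16 code unit sequences."""
    return text_a.encode("utf-16-le") == text_b.encode("utf-16-le")


def same_property_ranges(props_a, props_b) -> bool:
    """Check that identically named properties cover identical ranges.

    Each property is an assumed (name, start, end) tuple; order is ignored.
    """
    return set(props_a) == set(props_b)


def compare_text(text_a, props_a, text_b, props_b) -> dict:
    """Answer the first two questions for a pair of text descriptions."""
    return {
        "same_characters": same_character_sequence(text_a, text_b),
        "same_property_ranges": same_property_ranges(props_a, props_b),
    }


if __name__ == "__main__":
    rtf_props = [("bold", 0, 5)]
    pdf_props = [("bold", 0, 5), ("italic", 6, 11)]
    print(compare_text("Hello world", rtf_props, "Hello world", pdf_props))
```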


XCDL: Observation

An XCDL description at the content level is actually a "universal virtual file format" …

… though inflated to about 210 % of the original size.


PC (XCEL/XCDL) ==> TB

Provide:

comparison tool.

[ profiling tool. ]

[ validation. ]

[ identification. ]


TB ==> PC (XCEL/XCDL)

Quis custodiet ipsos custodes?

Or: Who tests the testing tool?

Or: Beta (and possibly pre-Beta) “testing”.

Behaviour.

Performance.

Calibration.

Reference objects.

The end
