Post on 01-Feb-2016
TMF - a tutorial
Part 3: Designing (schemas and) filters
TMF - Terminological Markup Framework
Laurent Romary - Laboratoire Loria
General principles
Terminological information interchange
– Three components:
• Source TDB1
• Target TDB2
• Terminological interchange format
– A specific TML (DXLT, Geneter)
[Diagram: TDB1 → TML → TDB2]
Important notice
– GMT is not a TML
• Too abstract a format
– Uncontrolled recursion (‘struct’ element)
– Uncontrolled content (‘feat’ and ‘annot’)
• A schema is needed to check interchanged data
– Precise list of data categories
– Precise definition of the format
– GMT is here to provide conceptual simplicity
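The kind of check a TML schema adds on top of GMT can be sketched in a few lines of Python. This is only an illustration, not an actual TML schema: the allowed data categories and the nesting table below are assumptions chosen for the example, and ‘brack’ and ‘annot’ elements are deliberately not handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical constraints for one TML: allowed feat types and struct nesting.
ALLOWED_FEAT_TYPES = {
    "term-12620A.1",
    "definition-12620A.5.1",
    "language-12620A.10.7",
}
# Which struct types may appear under which parent (None = directly under <tmf>).
ALLOWED_NESTING = {None: {"TE"}, "TE": {"LS"}, "LS": {"TS"}, "TS": set()}


def check_gmt(xml_text):
    """Return a list of human-readable problems found in a GMT document."""
    problems = []

    def visit(struct, parent_type):
        stype = struct.get("type")
        if stype not in ALLOWED_NESTING.get(parent_type, set()):
            problems.append(f"struct {stype!r} not allowed under {parent_type!r}")
        for child in struct:
            if child.tag == "struct":
                visit(child, stype)
            elif child.tag == "feat":
                if child.get("type") not in ALLOWED_FEAT_TYPES:
                    problems.append(f"unknown data category {child.get('type')!r}")
            else:
                # brack, annot, etc. are out of scope for this sketch.
                problems.append(f"unexpected element <{child.tag}>")

    root = ET.fromstring(xml_text)
    for struct in root.findall("struct"):
        visit(struct, None)
    return problems
```

Calling check_gmt on a document returns an empty list when every ‘struct’ is nested as the table allows and every ‘feat’ uses a listed data category. Plain GMT cannot enforce either constraint by itself.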
Designing filters
TML to GMT
General principles
For your information
– The creation of the filters can be automated
Basic processes
– Reduction of expansion trees
– Mapping elements and attributes to the corresponding data categories
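As a rough illustration of how filter creation might be automated, the sketch below generates XSLT templates from a mapping table. The table contents and the function names (make_template, make_stylesheet) are hypothetical, and it only covers the simple case where one element or attribute becomes one ‘feat’.

```python
from xml.sax.saxutils import quoteattr

# Hypothetical mapping table: TML match pattern -> GMT data category name.
MAPPING = {
    "term": "iso12620-term",
    "@id": "iso12620-identifier",
}


def make_template(match, category):
    """Build the text of one XSLT template rewriting `match` as a GMT feat."""
    # Attributes carry a plain value; elements may contain further markup.
    if match.startswith("@"):
        body = '<xsl:value-of select="."/>'
    else:
        body = "<xsl:apply-templates/>"
    return (
        f"<xsl:template match={quoteattr(match)}>\n"
        f"  <feat type={quoteattr(category)}>{body}</feat>\n"
        f"</xsl:template>"
    )


def make_stylesheet(mapping):
    """Wrap one generated template per mapping entry in a stylesheet element."""
    templates = "\n".join(make_template(m, c) for m, c in mapping.items())
    return (
        '<xsl:stylesheet version="1.0" '
        'xmlns:xsl="http://www.w3.org/1999/XSL/Transform">\n'
        f"{templates}\n"
        "</xsl:stylesheet>"
    )
```

The generated text can then be handed to any XSLT processor. Structural rules, such as those that reduce expansion trees, would need additional entries in the table.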
Reducing expansion trees
Example
• DXLT (Martif) sub-tree
<ntig>
  <!-- some general information associated with the term -->
  <termGrp>
    <!-- term related information -->
  </termGrp>
</ntig>
• GMT
<struct type="TS">
  <!-- some features -->
</struct>
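The reduction shown above can be sketched outside XSLT as well. The Python fragment below assumes that termGrp is a pure grouping element whose children can be lifted into the enclosing term section. It uses the source element name as a placeholder for the data category, which a real filter would replace with a proper mapping.

```python
import xml.etree.ElementTree as ET

# Assumption for this sketch: these elements only group other elements.
GROUPING_ELEMENTS = {"termGrp"}


def reduce_ntig(ntig):
    """Collapse an <ntig> sub-tree into a single GMT <struct type="TS">."""
    struct = ET.Element("struct", {"type": "TS"})

    def lift(node):
        for child in node:
            if child.tag in GROUPING_ELEMENTS:
                lift(child)  # drop the wrapper, keep its content
            else:
                # Placeholder: the element name stands in for the data category.
                feat = ET.SubElement(struct, "feat", {"type": child.tag})
                feat.text = child.text

    lift(ntig)
    return struct
```

Both levels of the DXLT sub-tree end up as sibling features of one ‘struct’, which is what reducing the expansion tree amounts to.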
Element mapping
Example
• DXLT (Martif)
<definition>Bla, bla, bla etc.</definition>
• GMT
<feat type="definition">Bla, bla, bla etc.</feat>
Structural elements
Generating a GMT ‘struct’ element
<xsl:template match="termEntry">
  <xsl:element name="struct">
    <xsl:attribute name="type">TE</xsl:attribute>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:element>
</xsl:template>
Features
Generating a GMT ‘feat’ element (style=Attribute)
<xsl:template match="@id">
  <xsl:element name="feat">
    <xsl:attribute name="type">iso12620-identifier</xsl:attribute>
    <xsl:value-of select="."/>
  </xsl:element>
</xsl:template>
Features
Generating a GMT ‘feat’ element (style=Element)
<xsl:template match="term">
  <xsl:element name="feat">
    <xsl:attribute name="type">iso12620-term</xsl:attribute>
    <xsl:apply-templates/>
  </xsl:element>
</xsl:template>
Features
Generating a GMT ‘feat’ element (style=TypedElement)
<xsl:template match="descrip[@type='subjectField']">
  <xsl:element name="feat">
    <xsl:attribute name="type">SubjectField</xsl:attribute>
    <xsl:apply-templates/>
  </xsl:element>
</xsl:template>
XML Schemas for TMLs
…work ahead…
Analysing existing TDBs
Towards a generic methodology
General Architecture
[Diagram: TDB → Flat XML → GMT → TML]
– TDB to Flat XML: simple DB dumper
– Flat XML to GMT: format specific XSL stylesheet
– GMT to TML: automatic GMT2TML stylesheet
A two-phase process
List the various data categories used in the TDB
– Relate them to existing registries (e.g. ISO 12620), cf. http://salt.loria.fr/public/salt/DCQuery.html
Identify the underlying organization of the TDB
– Relate it to the meta-model
– Anchor the data categories where they actually occur
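A first pass over a flat XML dump can support both phases by listing every element name together with the parents under which it occurs. The sketch below is one possible way to do this in Python. The function name inventory is hypothetical, and it treats each element name as a candidate data category, which still has to be matched to a registry by hand.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def inventory(xml_text):
    """Map each element name to the set of parent names it occurs under."""
    occurrences = defaultdict(set)

    def walk(node, parent):
        occurrences[node.tag].add(parent)  # parent is None for the root
        for child in node:
            walk(child, node.tag)

    walk(ET.fromstring(xml_text), None)
    return dict(occurrences)
```

An element reported under several language-level parents is a hint that it belongs to the language or term section of the meta-model rather than to the entry as a whole.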
Analysis of an existing TDB
Going through an example
Eurodicautom sample
<entry>
  <BE>BTB</BE>
  <TY>DAG77</TY>
  <NI>398</NI>
  <CF>3</CF>
  <CM>AG1</CM>
  <CM>JUA</CM>
  <EN>
    <VE>key money</VE>
    <RF>CILF,Dict.Agriculture,ACCT,1977</RF>
  </EN>
  <FR>
    <VE>pas-de-porte</VE>
    <DF>prix payé au précédent occupant pour le droit d'entrer dans une exploitation agricole</DF>
    <RF target="DF">TNC(1997)</RF>
    <RF>CILF,Dict.Agriculture,ACCT,1977</RF>
    <NT type="NTE">droit rural;pratique prohibée par la loi</NT>
  </FR>
</entry>
definition-12620A.5.1 (TS)
term-12620A.1 (TS)
language-12620A.10.7 (LS)
note-12620A.8 (TS)
classificationCode-12620A.4.2 (TE)
Result in GMT (1/2)
<tmf>
  <struct type="TE">
    <feat type="entryIdentifier-12620A.10.15">BTB-TY-398</feat>
    <feat type="originatingInstitution-12620A.10.22.2">BTB</feat>
    <feat type="projectSubset">DAG77</feat>
    <feat type="NI">398</feat>
    <feat type="reliabilityCode">3</feat>
    <feat type="classificationCode-12620A.4.2">AG1</feat>
    <feat type="classificationCode-12620A.4.2">JUA</feat>
    <struct type="LS">
      <feat type="language-12620A.10.7">EN</feat>
      <struct type="TS">
        <feat type="term-12620A.1">key money</feat>
      </struct>
      <feat type="sourceIdentifier-12620A.10.20">CILF,Dict.Agriculture,ACCT,1977</feat>
    </struct>
Result in GMT (2/2)
    <struct type="LS">
      <feat type="language-12620A.10.7">fr</feat>
      <struct type="TS">
        <feat type="term-12620A.1">pas-de-porte</feat>
      </struct>
      <brack>
        <feat type="definition-12620A.5.1">prix payé au précédent occupant pour le droit d'entrer dans une exploitation agricole</feat>
        <feat type="sourceIdentifier-12620A.10.20">TNC(1997)</feat>
      </brack>
      <feat type="sourceIdentifier-12620A.10.20">CILF,Dict.Agriculture,ACCT,1977</feat>
      <feat type="note-12620A.8">droit rural;pratique prohibée par la loi</feat>
    </struct>
  </struct>
</tmf>
Simple rules
Using XSL locality
<xsl:template match="CM">
  <feat type="classificationCode-12620A.4.2">
    <xsl:apply-templates/>
  </feat>
</xsl:template>
Introducing specific levels
Structure and content sometimes need to be combined
<xsl:template match="VE">
  <struct type="TS">
    <feat type="term-12620A.1">
      <xsl:apply-templates/>
    </feat>
  </struct>
</xsl:template>
Default rule
Useful for keeping track of unmapped data categories
<xsl:template match="*">
  <feat>
    <xsl:attribute name="type">
      <xsl:value-of select="name()"/>
    </xsl:attribute>
    <xsl:apply-templates/>
  </feat>
</xsl:template>
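The bookkeeping side of the default rule can be illustrated with a small Python sketch. It mimics the fallback behaviour and also collects the names that fell through, so they can be reviewed later. The mapping table holds only the CM rule from above, and the function name to_feats is hypothetical.

```python
import xml.etree.ElementTree as ET

# Explicitly mapped elements; anything else falls back to the default rule.
MAPPED = {"CM": "classificationCode-12620A.4.2"}


def to_feats(entry):
    """Turn each child of `entry` into a feat and report unmapped names."""
    feats, unmapped = [], set()
    for child in entry:
        category = MAPPED.get(child.tag)
        if category is None:
            category = child.tag  # default rule: reuse the source element name
            unmapped.add(child.tag)
        feat = ET.Element("feat", {"type": category})
        feat.text = child.text
        feats.append(feat)
    return feats, unmapped
```

The returned set plays the role of a to-do list: each name in it is a data category that still has to be related to a registry entry.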
Useful pointers
TMF page:
– http://www.loria.fr/projets/TMF
HLT/Salt project page:
– http://www.loria.fr/projets/SALT
Data category query tool:
– http://salt.loria.fr/public/salt/DCQuery.html