Post on 08-Aug-2020

Modeling languages for Semi-Structured Documents

COMPARISON AND TRANSLATION BETWEEN DML AND ITS COMPETITORS

Yudan Zhai
Dep. of Informatique

06/08/2009

Outline

Project Introduction

XML Modeling in General

The Comparison of Schema Languages

The Development of a Schema Translator

Conclusion

Project Introduction

Goal of the project
  XML modeling in general
  Compare DML with other XML schema languages
  Make a translation tool between DML and Relax NG

Project Introduction

Milestones

MS1 - Related study
  Understand XML modeling in general
  DTD, XML Schema and Relax NG
  Make a comparison among these three schemas

MS2 - In-depth study of DML
  YML, DML, DGL
  Comparison: DML vs. Relax NG / XML Schema / DTD

MS3 - Implementation

XML Modeling in General

XML
  XML stands for eXtensible Markup Language
  Motivation – Exchange information

Valid document
  The document should be readable and understandable with XML-aware software.
  Sets of rules and constraints are defined, specified by XML schema languages.

Four Schema Languages

DTD
  Document Type Definitions
  Can be defined inline

XML Schema
  Published by the W3C
  More expressive power
  Overly complex syntax

Four Schema Languages

Relax NG
  Being standardized in OASIS
  Clean, simple and powerful
  Treats attributes like elements in content models

DML
  Document Modeling Language
  Is a regular tree grammar-based schema language
  Supports inheritance

Comparison of Schema Languages

The easiest syntax – DTD

The richest built-in data types – XML Schema

Simple yet powerful enough – Relax NG

Part of an integrated system – DML

The Development of a Schema Translator

Project Introduction
  Language: Java
  Developing environment: JDK 5.0
  Function:
    Converting from Relax NG to DML
    Converting from DML to Relax NG

The Development of a Schema Translator

Abstract syntax

ASN.1 (Abstract Syntax Notation One)

Standard and Notation

Describes data structures

Implementation

Abstract syntax for Relax NG

Grammar ::= srt : Start ; def : Define
Start ::= top : Top
Define ::= name : Identifier ; elt : Element
Element ::= nc : NameClass ; top : Top
Top ::= na : NotAllowed | pattern : Pattern
Pattern ::= empty : Empty | nep : NonEmptyPattern
NonEmptyPattern ::= txt : Text | data : Data
                  | value : Value | list : NGList
                  | att : NGAttribute | ref : Ref
                  | oom : OneOrMore | choice : Choice
                  | group : Group | itl : Interleave
Text ::= <text/>
Data ::= type : Identifier ; dtl : URI
Value ::= dtl : URI ; type : Identifier ; ns : String ; content : String
NGList ::= pattern : Pattern
NGAttribute ::= name : String ; pattern : Pattern
Ref ::= name : Identifier
OneOrMore ::= nep : NonEmptyPattern
Choice ::= nep : NonEmptyPattern
Group ::= nep : NonEmptyPattern
Interleave ::= nep : NonEmptyPattern
NameClass ::= anyName : AnyName
            | nsName : NsName
            | name : Name
Identifier ::= S
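An abstract syntax of this kind maps naturally onto a small class hierarchy. The following is a minimal Java sketch of a few of the node types, not the classes of the original translator: the names RngAst, Pattern, Text, Ref, Attribute, Group and Element are illustrative assumptions, the name class is simplified to a plain string, and Group is modelled as a list of children.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of a few node types from the Relax NG abstract syntax (names are illustrative). */
final class RngAst {

    /** Marker interface shared by the NonEmptyPattern alternatives. */
    interface Pattern { }

    /** Text ::= <text/> */
    static final class Text implements Pattern {
        @Override public String toString() { return "Text"; }
    }

    /** Ref ::= name : Identifier */
    static final class Ref implements Pattern {
        final String name;
        Ref(String name) { this.name = name; }
        @Override public String toString() { return "Ref(" + name + ")"; }
    }

    /** NGAttribute ::= name : String ; pattern : Pattern */
    static final class Attribute implements Pattern {
        final String name;
        final Pattern pattern;
        Attribute(String name, Pattern pattern) { this.name = name; this.pattern = pattern; }
        @Override public String toString() { return "Attribute(" + name + ", " + pattern + ")"; }
    }

    /** Group ::= nep : NonEmptyPattern (assumption: modelled here as a list of children). */
    static final class Group implements Pattern {
        final List<Pattern> children;
        Group(Pattern... children) { this.children = Arrays.asList(children); }
        @Override public String toString() { return "Group" + children; }
    }

    /** Element ::= nc : NameClass ; top : Top (assumption: name class reduced to a plain name). */
    static final class Element {
        final String name;
        final Pattern content;
        Element(String name, Pattern content) { this.name = name; this.content = content; }
        @Override public String toString() { return "Element(" + name + ", " + content + ")"; }
    }

    public static void main(String[] args) {
        // An element "a" that carries one attribute "id" with text content.
        Element a = new Element("a", new Attribute("id", new Text()));
        System.out.println(a);
    }
}
```

Keeping one class per grammar rule makes it straightforward to add the remaining alternatives (Data, Value, OneOrMore, Choice, Interleave) later.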

Implementation

Abstract syntax for DML

SCHEMA ::= ns:NS* ; str:STRUCT* ; type:TYPE*
NS ::= id:ID ; uri:URI | ns:NS*
STRUCT ::= sim:SIMPLE | named:NAMED | der:DERIVED | str:STRUCT*
TYPE ::= id:ID ; pattr:PATTERN | type:TYPE*
SIMPLE ::= att:ATTRIBUTE ; cnt:CONTENT
CONTENT ::= item:ITEM | ref:REF
ITEM ::= seq:SEQ | choice:CHOICE | elt:ELT
       | txt:TXT | any:ANY | item:ITEM*
REF ::= qn:QNAME
SEQ ::= occ:OCC ; item:ITEM
CHOICE ::= occ:OCC ; item:ITEM
ELT ::= val:VAL ; occ:OCC ; sim:SIMPLE
ATTRIBUTE ::= anyatt:ANYATT | use:USE ; val:VAL* | att:ATTRIBUTE*
TXT ::= val:VAL ; occ:OCC ; B
ANY ::= occ:OCC ; sim:SIMPLE
ANYATT ::= use:USE ; val:VAL
OCC ::= 1 | ? | + | *
USE ::= 1 | ?
VAL ::= tref:TYPEREF | pattr:PATTERN | id:ID | APP | CPY
NAMED ::= id:ID ; sim:SIMPLE
ID ::= id:string
B ::= Boolean

Relax NG to DML

Architecture

Relax NG Tree Builder

Relax NG Tree Builder

Example

<?xml version="1.0" encoding="ISO-8859-1"?>
<grammar>
  <start>
    <ref name="simple-elt"/>
  </start>
  <define name="simple-elt">
    <element>
      <name ns="">a</name>
      <attribute>
        <name ns="">id</name>
        <text/>
      </attribute>
    </element>
  </define>
</grammar>
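A tree builder has to read such a schema and walk its element structure. The sketch below does this with the standard Java DOM API and prints an indented outline; it is only an assumption about how the step could look, since the slides do not say which parser the original translator uses. The class name RngOutline and the outline format are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Illustrative sketch: parses a Relax NG schema and lists its element tree. */
final class RngOutline {

    /** Parses the schema text and returns one indented line per XML element. */
    static String outline(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            StringBuilder out = new StringBuilder();
            walk(doc.getDocumentElement(), 0, out);
            return out.toString();
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse schema", ex);
        }
    }

    /** Depth-first walk; text and comment nodes are skipped. */
    private static void walk(Element e, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) out.append("  ");
        out.append(e.getTagName());
        if (e.hasAttribute("name")) out.append(" name=").append(e.getAttribute("name"));
        // <name ns="">a</name> carries its value as text content.
        if (e.getTagName().equals("name")) out.append(' ').append(e.getTextContent().trim());
        out.append('\n');
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node n = kids.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE) walk((Element) n, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        String xml =
              "<grammar><start><ref name=\"simple-elt\"/></start>"
            + "<define name=\"simple-elt\"><element><name ns=\"\">a</name>"
            + "<attribute><name ns=\"\">id</name><text/></attribute>"
            + "</element></define></grammar>";
        System.out.print(outline(xml));
    }
}
```

A real tree builder would create abstract syntax nodes at each step instead of appending text, but the traversal order is the same.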

Relax NG Tree Builder

Corresponding abstract tree

Converter (Relax NG to DML)

Basic rules:
Grammar -------> Schema
  TOP -------> SimpleStructure
    Attribute -------> Attribute
    Reference -------> Reference
    Other Pattern -------> Item
      • Empty -------> NULL
      • NonEmptyPattern -------> Item
          Text -------> Text
          Data, Value -------> Value
          List, OneOrMore, Group -------> SEQ
          Choice -------> Choice
          Interleave -------> Choice and SEQ
  Define -------> NamedStructure
    Element -------> SimpleStructure
      • NameClass -------> Value
      • Top -------> SimpleStructure
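The basic rules amount to a lookup from a Relax NG node kind to a DML node kind. A minimal Java sketch of that table follows; the class BasicRules and the use of plain strings as node kinds are illustrative assumptions, not the structure of the original converter, which presumably works on tree nodes rather than labels.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative lookup table for the basic Relax NG to DML rules. */
final class BasicRules {

    /** Relax NG node kind mapped to the DML node kind named in the rule list. */
    static final Map<String, String> RNG_TO_DML = new LinkedHashMap<String, String>();

    static {
        RNG_TO_DML.put("Grammar", "Schema");
        RNG_TO_DML.put("Top", "SimpleStructure");
        RNG_TO_DML.put("Attribute", "Attribute");
        RNG_TO_DML.put("Reference", "Reference");
        RNG_TO_DML.put("Empty", "NULL");
        RNG_TO_DML.put("Text", "Text");
        RNG_TO_DML.put("Data", "Value");
        RNG_TO_DML.put("Value", "Value");
        RNG_TO_DML.put("List", "SEQ");
        RNG_TO_DML.put("OneOrMore", "SEQ");
        RNG_TO_DML.put("Group", "SEQ");
        RNG_TO_DML.put("Choice", "Choice");
        RNG_TO_DML.put("Interleave", "Choice and SEQ");
        RNG_TO_DML.put("Define", "NamedStructure");
        RNG_TO_DML.put("Element", "SimpleStructure");
        RNG_TO_DML.put("NameClass", "Value");
    }

    /** Returns the DML kind for a Relax NG kind, or fails if no rule is listed. */
    static String dmlKind(String rngKind) {
        String kind = RNG_TO_DML.get(rngKind);
        if (kind == null) {
            throw new IllegalArgumentException("No rule for " + rngKind);
        }
        return kind;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> rule : RNG_TO_DML.entrySet()) {
            System.out.println(rule.getKey() + " -------> " + rule.getValue());
        }
    }
}
```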

Converter (Relax NG to DML)

Rules
But Reference -------> Reference?

Relax NG input:

<start>
  <oneOrMore>
    <group>
      <ref name="elt-a"/>
      <ref name="elt-b"/>
    </group>
  </oneOrMore>
</start>
<define name="elt-a">
  <element>
    <name ns="">a</name>
    <text/>
  </element>
</define>
<define name="elt-b">
  <element>
    <name ns="">b</name>
    <text/>
  </element>
</define>

Reference kept as reference:

<Seq>
  <ref name="elt-a"/>
  <ref name="elt-b"/>
</Seq>

Reference resolved to the element it points to:

<seq occ="many">
  <elt occ="once">
    <name content="a"/>
    <text occ="once" eol="false">
      <value type="string"/>
    </text>
  </elt>
  <elt occ="once">
    <name content="b"/>
    <text occ="once" eol="false">
      <value type="string"/>
    </text>
  </elt>
</seq>

Result

Relax NG input:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- TWO ELEMENTS -->
<grammar>
  <start>
    <oneOrMore>
      <group>
        <ref name="elt-a"/>
        <ref name="elt-b"/>
      </group>
    </oneOrMore>
  </start>
  <define name="elt-a">
    <element>
      <name ns="">a</name>
      <text/>
    </element>
  </define>
  <define name="elt-b">
    <element>
      <name ns="">b</name>
      <text/>
    </element>
  </define>
</grammar>

DML output:

<?xml version="1.0" encoding="UTF-8"?>
<yml>
  <seq occ="many">
    <elt occ="once">
      <name content="a"/>
      <text occ="once" eol="false">
        <value type="string"/>
      </text>
    </elt>
    <elt occ="once">
      <name content="b"/>
      <text occ="once" eol="false">
        <value type="string"/>
      </text>
    </elt>
  </seq>
</yml>
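In this result the references to elt-a and elt-b no longer appear as references: each one has been replaced by the element its define declares. A minimal Java sketch of that resolution step is given here, restricted to text-only elements as in the example. The class RefInliner, its method names, and the map from define name to element name are illustrative assumptions rather than the original converter's design.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch: resolves Relax NG references into inline DML elements. */
final class RefInliner {

    /** Emits a DML elt for a text-only element with the given name. */
    static String elt(String elementName) {
        return "<elt occ=\"once\"><name content=\"" + elementName + "\"/>"
             + "<text occ=\"once\" eol=\"false\"><value type=\"string\"/></text></elt>";
    }

    /**
     * Replaces every referenced define by the element it declares and wraps
     * the result in a DML seq with the given occurrence.
     */
    static String inlineGroup(List<String> refNames, Map<String, String> defines, String occ) {
        StringBuilder sb = new StringBuilder("<seq occ=\"" + occ + "\">");
        for (String ref : refNames) {
            String elementName = defines.get(ref);
            if (elementName == null) {
                throw new IllegalArgumentException("Undefined reference: " + ref);
            }
            sb.append(elt(elementName));
        }
        return sb.append("</seq>").toString();
    }

    public static void main(String[] args) {
        // Define name mapped to the name of the element it declares.
        Map<String, String> defines = new LinkedHashMap<String, String>();
        defines.put("elt-a", "a");
        defines.put("elt-b", "b");
        // oneOrMore around a group of two refs becomes a seq with occ="many".
        System.out.println(inlineGroup(Arrays.asList("elt-a", "elt-b"), defines, "many"));
    }
}
```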

Reverse Converting

Architecture

DML Tree Builder

Converter (DML to Relax NG)

Basic rules:
Schema -------> Grammar
  SimpleStructure -------> Top
    ATT -------> ATT
    CNT -------> Pattern
      • Item -------> Pattern
      • Ref -------> Ref
  NamedStructure -------> Define
    SimpleStructure -------> TOP

Converter (DML to Relax NG)

Exception rules:

Simple structure contains Element
  Element -------> Reference
  Add the new Define structure

Element contains Element
  Element -------> Reference
  Add the new Define structure

Converter (DML to Relax NG)

Exception rules

DML input:

<yml version="1.0" type="dml">
  <elt>
    <name content="addressbook"/>
    <elt occ="many">
      <name content="contact"/>
      <ref name="contact-content"/>
    </elt>
  </elt>
  <structure>
    ……..
  </structure>
</yml>

Relax NG output:

<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <ref name="addressbook-NC"/>
  </start>
  <define name="addressbook-NC">
    <element>
      <name ns="">addressbook</name>
      <oneOrMore>
        <ref name="contact-NC"/>
      </oneOrMore>
    </element>
  </define>
  <define>
    ……
  </define>
</grammar>
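The exception rules can be read as a recursive hoisting step: every nested element becomes its own define and is replaced in its parent by a reference. The following Java sketch illustrates the idea under several assumptions: the Elt class is a hypothetical stand-in for a DML element, the "-NC" suffix simply mirrors the names in the example, leaf elements are given text content, and occurrences and name collisions are ignored.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch: hoists nested DML elements into Relax NG defines. */
final class ElementHoister {

    /** Minimal stand-in for a DML element: a name plus nested child elements. */
    static final class Elt {
        final String name;
        final List<Elt> children = new ArrayList<Elt>();
        Elt(String name, Elt... kids) {
            this.name = name;
            for (Elt k : kids) children.add(k);
        }
    }

    /**
     * Adds one define per element to the list (parent before children)
     * and returns the ref that stands in for the element.
     */
    static String hoist(Elt e, List<String> defines) {
        String defName = e.name + "-NC";
        int slot = defines.size();
        defines.add(""); // reserve a slot so the parent define precedes its children
        StringBuilder body = new StringBuilder("<element><name ns=\"\">" + e.name + "</name>");
        if (e.children.isEmpty()) body.append("<text/>"); // assumption: leaves hold text
        for (Elt child : e.children) body.append(hoist(child, defines));
        body.append("</element>");
        defines.set(slot, "<define name=\"" + defName + "\">" + body + "</define>");
        return "<ref name=\"" + defName + "\"/>";
    }

    public static void main(String[] args) {
        Elt book = new Elt("addressbook", new Elt("contact"));
        List<String> defines = new ArrayList<String>();
        String startRef = hoist(book, defines);
        System.out.println("<start>" + startRef + "</start>");
        for (String d : defines) System.out.println(d);
    }
}
```

Hoisting is needed because the Relax NG simple syntax allows an element pattern only as the direct child of a define, so nested elements cannot be emitted in place.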

Converter (DML to Relax NG)

How to deal with Occurrence?

Many      <oneOrMore> P </oneOrMore>

Free      <choice> <oneOrMore> P </oneOrMore> <empty/> </choice>

Optional  <choice> P <empty/> </choice>

Once      P
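The occurrence mapping is a small case analysis over the four DML occurrence values. A minimal Java sketch, assuming the pattern P is already available as Relax NG XML text; the class OccurrenceMapper and the enum constant names are hypothetical:

```java
/** Illustrative helper: wraps a Relax NG pattern according to a DML occurrence. */
final class OccurrenceMapper {

    /** The four DML occurrence values named in the mapping. */
    enum Occurrence { ONCE, OPTIONAL, MANY, FREE }

    /** Returns the Relax NG XML for pattern p repeated according to occ. */
    static String wrap(Occurrence occ, String p) {
        switch (occ) {
            case MANY:      // one or more
                return "<oneOrMore>" + p + "</oneOrMore>";
            case FREE:      // zero or more: a choice between one-or-more and empty
                return "<choice><oneOrMore>" + p + "</oneOrMore><empty/></choice>";
            case OPTIONAL:  // zero or one: a choice between the pattern and empty
                return "<choice>" + p + "<empty/></choice>";
            case ONCE:
            default:        // exactly once: the pattern itself
                return p;
        }
    }

    public static void main(String[] args) {
        String ref = "<ref name=\"contact-NC\"/>";
        for (Occurrence occ : Occurrence.values()) {
            System.out.println(occ + ": " + wrap(occ, ref));
        }
    }
}
```

Only oneOrMore, choice and empty are used because the Relax NG simple syntax has no optional or zeroOrMore patterns.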

Result (DML -> Relax NG)

Addressbook.dml

Addressbook.rng

Conclusion

DML
  Is part of an integrated system for the management of semi-structured documents
  Has stronger expressive power than DTD
  Reduces the complexity compared with XML Schema
  Is very comparable to Relax NG but provides an inheritance mechanism

Implementation
  Based on Abstract Syntax Notation
  Good extensibility of the program
  Limitations:
    Weak data type conversion
    Relax NG support is limited to the simple syntax only
    Inheritance in DML is not considered