XML Update Formal Verification


  • 8/9/2019 XML Update Formal Verification


    Rewrite-Based Verification of XML Updates

    Florent Jacquemard
    INRIA Saclay - IdF & LSV UMR
    61 av. du pdt Wilson, 94230 Cachan, [email protected]

    Michael Rusinowitch
    INRIA Nancy - Grand Est & LORIA UMR
    615 rue du Jardin Botanique, 54602 Villers-les-Nancy, [email protected]

    Abstract

    We propose a model for XML update primitives of the W3C XQuery Update Facility as parameterized rewriting rules of the form: "insert an unranked tree from a regular tree language $L$ as the first child of a node labeled by $a$". For these rules, we give type inference algorithms, considering types defined by several classes of unranked tree automata. These type inference algorithms are directly applicable to XML static typechecking, which is the problem of verifying whether a given document transformation always converts source documents of a given input type into documents of a given output type. We show that typechecking for arbitrary sequences of XML update primitives can be done in polynomial time when the unranked tree automaton defining the output type is deterministic and complete, and that it is EXPTIME-complete otherwise.

    We then apply the results to the verification of access control policies for XML updates. We propose in particular a polynomial time algorithm for the problem of local consistency of a policy, that is, for deciding the non-existence of a sequence of authorized update operations starting from a given document that simulates a forbidden update operation.

    Categories and Subject Descriptors D.2.4 [SOFTWARE ENGINEERING]: Software/Program Verification - Formal methods, model checking

    General Terms Verification, Languages, Theory, Security

    Keywords XML Updates, Static Typechecking, XML Access Control Policies, Term Rewriting, Hedge Automata

    1. Introduction

    The XQuery language has been extended to the XQuery Update Facility [Xquery UF 2009] in order to provide convenient means of modifying XML documents or data. The language is a candidate recommendation from W3C and adds imperative operations that permit one, e.g., to update some parts of a document while leaving the rest unchanged. This includes rename, insert, replace and delete operations at the node level. Compared to other transformation languages (such as XSLT), XQuery Update Facility is considered to offer concise, readable solutions.

    This work has been supported by the INRIA ARC 2010 project ACCESS, the FP7-ICT-2007-1 Project no. 216471, AVANTSSAR, and the FET-Open grant agreement FOX no. FP7-ICT-23359.

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. PPDP'10, July 26-28, 2010, Hagenberg, Austria. Copyright © 2010 ACM 978-1-4503-0132-9/10/07...$10.00

    A central problem in XML document processing is static typechecking. This problem amounts to verifying at compile time that every output XML document which is the result of a specified query or transformation applied to an input document with a valid input type has a valid output type. However, for transformation languages such as the one provided by XQuery Update Facility, the output type of (iterated) applications of update primitives is not easy to predict. Another important issue for XML data processing is the specification and enforcement of access policies. A large amount of work has been devoted to secure XML querying. But most of this work focuses on read-only rights, and very few works have considered update rights for a model based on XQuery Update Facility operations, e.g. [Bravo et al. 2008; Fundulaki and Maneth 2007].

    In the domain of infinite state systems and program verification, several approaches such as regular model checking rely on algorithms computing the rewrite closure of tree automata languages, see e.g. [Bouajjani and Touili 2005; Feuillade et al. 2004; Genet and Rusu 2010]. It seems natural to consider such tree automata techniques for verification problems related to the typing of XML documents and XML transformations, in particular XML updates [Xquery UF 2009]. Indeed, XML documents are commonly represented as finite labeled unranked trees, and most of the typing formalisms currently used for XML are based on finite tree automata [Murata 2000; Schwentick 2007].

    A standard approach to XML typechecking is forward (resp. backward) type inference, that is, the computation of an output (resp. input) XML type given an input (resp. output) type and a tree transformation. Then the typechecking itself can be reduced to the verification of inclusion of the computed type in the given output or input type, see [Milo et al. 2003] for an example of a backward type inference procedure.

    In this paper, we consider the problem of typechecking arbitrary sequences of operations taken in a given set of atomic update primitives. We propose a modeling of (possibly infinite) sets of primitive update operations of the W3C XQuery Update Facility proposal [Xquery UF 2009] in terms of rewrite rules with parameters and XPath expressions for the selection of the rewrite positions. The update operations include renaming, insertion, deletion and replacement


    in XML documents, and some extensions, like the deletion of one single node (preserving its descendants) instead of the deletion of a whole subtree. For several subclasses of these operations, we derive algorithms of synthesis of unranked tree automata, yielding both forward and backward type inference results. Since update operations, besides relabeling document nodes, can create and delete entire XML fragments, modifying a document's structure, it is not obvious how to infer the type of updated documents. Former tree automata completion constructions like [Feuillade et al. 2004] work for automata computing on ranked trees. Here, we consider unranked ordered trees, and our constructions are non-trivial adaptations of former tree automata completion procedures, where, starting from an initial automaton, new transition rules are added and existing transition rules are recursively modified. Moreover, we show that some update operations do not preserve regular tree languages (i.e. languages of hedge automata) and that, for the type inference for these operations, we need to consider a larger and less mainstream class of decidable unranked tree recognizers called context-free hedge automata.

    One of our motivations for this study is the static analysis of access control policies (ACP) for XML updates. We consider two approaches for this problem. The first approach addresses rule-based specifications of ACPs, where the operations allowed, resp. forbidden, to a user are specified as two sets of atomic update primitives [Bravo et al. 2008; Fundulaki and Maneth 2007]. We show in particular how to apply our type inference results to the verification of local consistency of ACPs, i.e. whether no sequence of allowed updates starting from a given document can achieve an explicitly forbidden update. Such situations may lead to serious security breaches which are challenging to detect according to [Fundulaki and Maneth 2007]. In the second approach (DTD-based XML ACPs) the ACP is defined by adding security annotations to a DTD $D$, as in [Fan et al. 2004; Fundulaki and Maneth 2007]. In this case, it is required to check the validity of the document w.r.t. $D$ before applying every update operation. We show that under this restriction typechecking becomes undecidable.

    Related work: Many works have employed tree automata to compute sets of descendants for standard (ranked) term rewriting (see e.g. [Feuillade et al. 2004]). Regular model checking [Bouajjani et al. 2000] is extended to hedge rewriting and hedge automata in [Touili 2007], which gives a procedure to compute approximations of reachability sets. Here we compute exact reachability sets for some classes of hedge rewrite systems. For some results we need context-free hedge automata, a more general class than the regular hedge automata of [Touili 2007].

    When considering real programming languages like XDuce or CDuce [Benzaken et al. 2003] for writing transformations, typechecking is generally undecidable and approximations must be applied. In order to obtain exact algorithms, several approaches define conveniently abstract formalisms for representing transformations. Let us cite for instance TL (the transformation language) [Maneth et al. 2005], macro tree transducers (MTT) [Maneth et al. 2007; Perst and Seidl 2004], and k-pebble tree transducers (k-PTT) [Milo et al. 2003], a powerful model defined so as to cover relevant fragments of XSLT [Kay 2003] and other XML transformation languages. Some restrictions on schema languages and on top-down tree transducers (on which transformations are based) have also been studied [Engelfriet et al. 2009; Martens and Neven 2004] in order to obtain PTIME typechecking procedures. [Tozawa 2001] proposes a backward type inference algorithm (based on tree automata techniques) for an XSLT fragment without XPath but with recursive calls. In a comparable approach, [Frisch and Hosoya 2007] propose a backward type inference algorithm for MTTs based on alternating tree automata, optimized towards practicability.

    In this paper, we consider unrestricted applications of updates, unlike e.g. top-down transductions in [Martens and Neven 2004]. It is shown in [Milo et al. 2003] that the set of output trees of a k-PTT for a fixed input tree is a regular tree language. In contrast, we shall see (Example 4 below) that it is not the case for the iteration of some update operations, and therefore that such transformations are not expressible as k-PTT. In Theorem 2, we show that the output language of the iteration of these updates for a regular input language is recognizable by a context-free hedge automaton. This can be related to the result of [Engelfriet and Vogler 1985], used in [Maneth et al. 2007] in the context of typechecking XML transformations, and stating that the output language of a linear stay MTT can be characterized by a context-free tree grammar (in the case of ranked trees). Theorem 2 implies that the output languages of the iteration of updates can be described by MTTs, as MTTs can generate all context-free tree languages. On the other hand, each of the primitive update operations can be modeled on its own by an MTT. It is however not clear whether the finite (but unbounded) iterations of update operations can be easily expressed as an MTT relation.

    In [Benedikt and Cheney 2009] the authors investigate the problem of synthesizing an output schema describing the result of an update applied to a given input schema. They show how to infer safe over-approximations for the results of both queries and updates. Recent works have also applied local Hoare reasoning to simple tree updates and even to a significant subset of the XML update library in the W3C Document Object Model [Gardner et al. 2008]. As far as we know this approach is not automated.

    The first access control model for XML was proposed by [Damiani et al. 2000] and was extended to secure updates in [C. Lim and Son 2003]. Static analysis has been applied to XML Access Control in [Murata et al. 2006] to determine if a query expression is guaranteed not to access elements that are forbidden by the policy. In [Fundulaki and Maneth 2007] the authors propose the XACU language. They study policy consistency and show that it is undecidable in their setting. On the positive side, [Bravo et al. 2008] considers policies defined in terms of annotated non-recursive XML DTDs and gives a polynomial algorithm for checking consistency.

    Organization of the paper: we introduce the needed formal background about terms, hedge automata and rewriting systems in Section 2. Then we present XML updates as parameterized rewriting rules and the type synthesis algorithms in Section 3. In Section 4 we study an extension of our rewriting rules by XPath expressions specifying the nodes where the rules can be applied. Finally we give applications to the verification of Access Control Policies in Section 5.


    2. Definitions

    2.1 Unranked Ordered Trees

    Terms and Hedges. We consider a finite alphabet $\Sigma$ and an infinite set of variables $\mathcal{X}$. The symbols of $\Sigma$ are generally denoted $a, b, c, \ldots$ and the variables $x, y, \ldots$ We define recursively a hedge over $\Sigma$ and $\mathcal{X}$ as a finite (possibly empty) sequence of terms, and a term as either a single node $n$ labeled by a variable $x \in \mathcal{X}$ or the application of a node $n$ labeled by a symbol $a \in \Sigma$ to a hedge $h$. The term is denoted $x$ in the first case and $a(h)$ in the second case, and $n$ is called the root of the term in both cases. The empty sequence is denoted $()$ and when $h$ is empty, the term $a(h)$ will be simply denoted by $a$. The root node of $a(h)$ is called the parent of every root of $h$ and every root of $h$ is called a child of the root of $a(h)$. A root of a hedge $(t_1 \ldots t_n)$ is a root node of one of $t_1, \ldots, t_n$. A leaf of a hedge $(t_1 \ldots t_n)$ is a leaf (node without child) of one of the terms $t_1, \ldots, t_n$. A path is a sequence of nodes $n_0, \ldots, n_p$ such that for all $i < p$, $n_{i+1}$ is a child of $n_i$. In this case, $n_p$ is called a descendant of $n_0$. As usual, we can see a hedge $h \in \mathcal{H}(\Sigma, \mathcal{X})$ as a function from its set of nodes $dom(h)$ into labels in $\Sigma \cup \mathcal{X}$. The label of the node $n \in dom(h)$ is denoted by $h(n)$.

    The sets of hedges and terms over $\Sigma$ and $\mathcal{X}$ are respectively denoted $\mathcal{H}(\Sigma, \mathcal{X})$ and $\mathcal{T}(\Sigma, \mathcal{X})$. We will sometimes consider a term as a hedge of length one, i.e. consider that $\mathcal{T}(\Sigma, \mathcal{X}) \subset \mathcal{H}(\Sigma, \mathcal{X})$. The sets of ground terms (terms without variables) and ground hedges are respectively denoted $\mathcal{T}(\Sigma)$ and $\mathcal{H}(\Sigma)$. The set of variables occurring in a hedge $h \in \mathcal{H}(\Sigma, \mathcal{X})$ is denoted $var(h)$. A hedge $h \in \mathcal{H}(\Sigma, \mathcal{X})$ is called linear if every variable of $\mathcal{X}$ occurs at most once in $h$.
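    As an illustration of these definitions, here is a minimal sketch of one possible encoding of terms and hedges. The encoding (variables as strings, a term a(h) as a pair, a hedge as a tuple) and the function names are assumptions made for this example only, not notation from the paper.

```python
from typing import Tuple, Union

# Assumed encoding: a variable is a plain string, a term a(h) is the pair
# (label, hedge), and a hedge is a tuple of terms.
Var = str
Term = Union[Var, Tuple[str, tuple]]
Hedge = Tuple[Term, ...]

def variables(h: Hedge) -> list:
    """Variables occurring in hedge h, left to right, with repetitions."""
    out = []
    for t in h:
        if isinstance(t, str):
            out.append(t)                 # a variable leaf
        else:
            _label, children = t
            out.extend(variables(children))
    return out

def is_ground(h: Hedge) -> bool:
    """A hedge is ground when it contains no variable."""
    return not variables(h)

def is_linear(h: Hedge) -> bool:
    """A hedge is linear when every variable occurs at most once."""
    vs = variables(h)
    return len(vs) == len(set(vs))

# The hedge (a(x b(y)) c) of length two.
example = (("a", ("x", ("b", ("y",)))), ("c", ()))
```

    With this encoding, the empty hedge () is the empty tuple and the term a is ("a", ()).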

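    Substitutions. A substitution $\sigma$ is a mapping of finite domain from $\mathcal{X}$ into $\mathcal{H}(\Sigma, \mathcal{X})$. The application of a substitution $\sigma$ to terms and hedges (written with postfix notation) is defined recursively by $x\sigma := \sigma(x)$ when $x \in dom(\sigma)$, $y\sigma := y$ when $y \in \mathcal{X} \setminus dom(\sigma)$, $(t_1 \ldots t_n)\sigma := (t_1\sigma \ldots t_n\sigma)$ for $n \geq 0$, and $a(h)\sigma := a(h\sigma)$.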
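    The recursive definition of substitution application can be sketched as follows. The encoding is an assumption of this example: variables are strings, a term a(h) is a pair (label, hedge), a hedge is a tuple of terms, and a substitution is a dictionary from variable names to hedges.

```python
def apply_subst(h, sigma):
    """Apply substitution sigma (dict: variable -> hedge) to hedge h."""
    out = []
    for t in h:
        if isinstance(t, str):
            # x sigma := sigma(x) if x is in dom(sigma), and x otherwise.
            # sigma(x) is a hedge, so it is spliced into the sequence.
            out.extend(sigma.get(t, (t,)))
        else:
            label, children = t
            # a(h) sigma := a(h sigma)
            out.append((label, apply_subst(children, sigma)))
    return tuple(out)

# a(x y) with x mapped to the hedge (b c): the result is a(b c y).
lhs = (("a", ("x", "y")),)
result = apply_subst(lhs, {"x": (("b", ()), ("c", ()))})
```

    Since a variable is mapped to a hedge rather than to a single term, one variable occurrence may be replaced by zero, one or several sibling terms.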

    Contexts. A context is a hedge $u \in \mathcal{H}(\Sigma, \mathcal{X})$ with a distinguished variable $x_u$ linear (with exactly one occurrence) in $u$. The application of a context $u$ to a hedge $h \in \mathcal{H}(\Sigma, \mathcal{X})$ is defined by $u[h] := u\{x_u \mapsto h\}$. It consists in inserting $h$ into $u$ in place of the node labeled by $x_u$. Sometimes, we write $t[s]$ in order to emphasize that $s$ is a subterm (or subhedge) of $t$.

    2.2 Hedge Automata and Context-Free Hedge Automata

    We consider two kinds of types for XML documents, defined as two classes of automata for unranked trees. The first one is the class of hedge automata [Murata 2000], denoted HA. It captures the expressive strength of almost all popular type formalisms for XML [Murata et al. 2000]. The second and perhaps lesser known class is the context-free hedge automata, denoted CF-HA and introduced in [Ohsaki et al. 2003]. CF-HA are strictly more expressive than HA and we shall see that they are of interest for typing certain update operations.

    Definition 1. A hedge automaton (resp. context-free hedge automaton) is a tuple $\mathcal{A} = (\Sigma, Q, Q^f, \Delta)$ where $\Sigma$ is a finite unranked alphabet, $Q$ is a finite set of states disjoint from $\Sigma$, $Q^f \subseteq Q$ is a set of final states, and $\Delta$ is a set of transitions of the form $a(L) \to q$ where $a \in \Sigma$, $q \in Q$ and $L \subseteq Q^*$ is a regular word language (resp. a context-free word language).

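    When $\Sigma$ is clear from the context it is omitted in the tuple specifying $\mathcal{A}$. We define the move relation between ground hedges $h, h' \in \mathcal{H}(\Sigma \cup Q)$ as follows: $h \to_{\mathcal{A}} h'$ iff there exists a context $u \in \mathcal{H}(\Sigma, \{x_u\})$ and a transition $a(L) \to q \in \Delta$ such that $h = u[a(q_1 \ldots q_n)]$, with $q_1 \ldots q_n \in L$, and $h' = u[q]$. The relation $\to^*_{\mathcal{A}}$ is the reflexive and transitive closure of $\to_{\mathcal{A}}$.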

    Collapsing Transitions. We consider the extension of HA and CF-HA with so-called collapsing transitions, which are special transitions of the form $L \to q$ where $L \subseteq Q^*$ is a context-free language and $q$ is a state. The move relation for the extended set of transitions generalizes the above definition with the case $u[q_1 \ldots q_n] \to_{\mathcal{A}} u[q]$ if $L \to q$ is a collapsing transition of $\mathcal{A}$ and $q_1 \ldots q_n \in L$. Note that we do not exclude the case $n = 0$ in this definition, i.e. $L$ may contain the empty word in $L \to q$. Collapsing transitions with a singleton language $L$ containing a length-one word (i.e. transitions of the form $q \to q'$, where $q$ and $q'$ are states) correspond to $\varepsilon$-transitions for tree automata.

    Languages. The language of a HA or CF-HA $\mathcal{A}$ in one of its states $q$, denoted by $L(\mathcal{A}, q)$ and also called the set of hedges of type $q$, is the set of ground hedges $h \in \mathcal{H}(\Sigma)$ such that $h \to^*_{\mathcal{A}} q$. We sometimes say that a hedge of $L(\mathcal{A}, q)$ has type $q$ (when $\mathcal{A}$ is clear from context). A hedge $h$ is accepted by $\mathcal{A}$ if there exists $q \in Q^f$ such that $h \in L(\mathcal{A}, q)$. The language of $\mathcal{A}$, denoted by $L(\mathcal{A})$, is the set of hedges accepted by $\mathcal{A}$. Note that without collapsing transitions, all the hedges of $L(\mathcal{A}, q)$ are terms. Indeed, by applying standard transitions of the form $a(L) \to q$, one can only reduce length-one hedges into states. But collapsing transitions make it possible to reduce a ground hedge of length more than one into a single state.
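    To make the move relation concrete, the following sketch computes bottom-up the set of states $q$ such that a ground term is in $L(\mathcal{A}, q)$, for a HA without collapsing transitions. Several choices are assumptions of this example: ground terms are pairs (label, tuple of children), each horizontal language is written as a Python regular expression over state names each followed by one space, and the automaton DELTA is invented for illustration. The enumeration of all child-state words is exponential in general; it is a naive check, not the PTIME membership algorithm.

```python
import re
from itertools import product

def states_of(term, transitions):
    """States q with term ->* q. Transitions are (label, regex, state)."""
    label, children = term
    child_states = [states_of(c, transitions) for c in children]
    reached = set()
    # Try every word q1...qn where qi is a possible state of the i-th child.
    for word in product(*child_states):
        encoded = "".join(q + " " for q in word)
        for a, pattern, q in transitions:
            if a == label and re.fullmatch(pattern, encoded):
                reached.add(q)
    return reached

def accepts(term, transitions, final_states):
    """A term is accepted when it reaches at least one final state."""
    return bool(states_of(term, transitions) & final_states)

# Hypothetical automaton: a -> qa, b -> qb, c(qa* qb) -> qc.
DELTA = [("a", r"", "qa"), ("b", r"", "qb"), ("c", r"(qa )*qb ", "qc")]
good = ("c", (("a", ()), ("a", ()), ("b", ())))
bad = ("c", (("b", ()), ("a", ())))
```

    Here accepts(good, DELTA, {"qc"}) holds because the children reduce to the word qa qa qb, which belongs to the horizontal language of the c-transition, whereas the children of bad reduce to qb qa, which does not.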

    The $\varepsilon$-transitions of the form $q \to q'$ do not increase the expressiveness of HA or CF-HA (see [Comon et al. 2007] for HA; the proof for CF-HA is similar). But it is not the case in general for collapsing transitions: collapsing transitions strictly extend HA in expressiveness, and even collapsing transitions of the form $L \to q$ where $L$ is finite (hence regular).

    Example 1. [Jacquemard and Rusinowitch 2008]. The extended HA $\mathcal{A} = (\{q, q_a, q_b, q_f\}, \{g, a, b\}, \{q_f\}, \{a \to q_a, b \to q_b, g(q) \to q_f, q_a\, q\, q_b \to q\})$ recognizes $\{g(a^n b^n) \mid n \geq 1\}$, which is not a HA language.

    However, collapsing transitions can be eliminated from CF-HA, when restricting to the recognition of terms.

    Lemma 1 ([Jacquemard and Rusinowitch 2008]). For every extended CF-HA $\mathcal{A}$ over $\Sigma$ with collapsing transitions, there exists a CF-HA $\mathcal{A}'$ without collapsing transitions such that $L(\mathcal{A}') = L(\mathcal{A}) \cap \mathcal{T}(\Sigma)$.

    Properties. It is known that for both classes of HA and CF-HA, the membership and emptiness problems are decidable in PTIME [Comon et al. 2007; Murata 2000; Ohsaki et al. 2003]. Moreover HA languages are closed under Boolean operations, but CF-HA are not closed under intersection and complementation. The intersection of a CF-HA language and a HA language is a CF-HA language. All these results are effective, with PTIME (resp. EXPTIME) constructions of automata of polynomial (resp. exponential) sizes for the closures under union and intersection (resp. complement).


    We call a HA or CF-HA $\mathcal{A} = (\Sigma, Q, Q^f, \Delta)$ normalized if for every $a \in \Sigma$ and every $q \in Q$, there is at most one transition rule $a(L_{a,q}) \to q$ in $\Delta$. Every HA (resp. CF-HA) can be transformed into a normalized HA (resp. CF-HA) in polynomial time by replacing every two rules $a(L_1) \to q$ and $a(L_2) \to q$ by $a(L_1 \cup L_2) \to q$.

    A CF-HA $\mathcal{A} = (Q, Q^f, \Delta)$ is called deterministic iff for every two transition rules $a(L_1) \to q_1$ and $a(L_2) \to q_2$ in $\Delta$, either $L_1 \cap L_2 = \emptyset$ or $q_1 = q_2$. It is called complete if for all $a \in \Sigma$ and $w \in Q^*$, there exists at least one rule $a(L) \to q \in \Delta$ such that $w \in L$. When $\mathcal{A}$ is deterministic (resp. complete), for all $t \in \mathcal{T}(\Sigma)$, there exists at most (resp. at least) one state $q \in Q$ such that $t \in L(\mathcal{A}, q)$. Every HA can be transformed into a deterministic and complete HA recognizing the same language (see e.g. [Comon et al. 2007]). CF-HA can be completed but not determinized.

    2.3 Term Rewriting Systems

    We use below term rewriting rules for modeling XML update operations. For this purpose, we propose a non-standard definition of term rewriting, extending the classical one [Dershowitz and Jouannaud 1990] in two ways: first, the application of rewrite rules is extended from ranked terms to unranked terms and second, the rules are parameterized by HA languages (i.e. each parameterized rule can represent an infinite number of unparameterized rules).

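    Unranked Term Rewriting Systems. A term rewriting system $\mathcal{R}$ over a finite unranked alphabet $\Sigma$ (TRS) is a set of rewrite rules of the form $\ell \to r$ where $\ell \in \mathcal{H}(\Sigma, \mathcal{X}) \setminus \mathcal{X}$ and $r \in \mathcal{H}(\Sigma, \mathcal{X})$; $\ell$ and $r$ are respectively called left- and right-hand side (lhs and rhs) of the rule. Note that we do not assume the cardinality of $\mathcal{R}$ to be finite.

    The rewrite relation $\to_{\mathcal{R}}$ of a TRS $\mathcal{R}$ is the smallest binary relation on $\mathcal{H}(\Sigma, \mathcal{X})$ containing $\mathcal{R}$ and closed by application of substitutions and contexts. In other words, $h \to_{\mathcal{R}} h'$ iff there exists a context $u$, a rule $\ell \to r$ in $\mathcal{R}$ and a substitution $\sigma$ such that $h = u[\ell\sigma]$ and $h' = u[r\sigma]$. The reflexive and transitive closure of $\to_{\mathcal{R}}$ is denoted $\to^*_{\mathcal{R}}$.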

    Parameterized Term Rewriting Systems. Let $\mathcal{A} = (\Sigma, Q, Q^f, \Delta)$ be a HA. A term rewriting system over $\Sigma$ parameterized by $\mathcal{A}$ (PTRS) is given by a finite set, denoted $\mathcal{R}/\mathcal{A}$, of rewrite rules $\ell \to r$ where $\ell \in \mathcal{H}(\Sigma, \mathcal{X})$ and $r \in \mathcal{H}(\Sigma \uplus Q, \mathcal{X})$, and symbols of $Q$ can only label leaves of $r$ ($\uplus$ stands for disjoint union, hence we implicitly assume that $\Sigma$ and $Q$ are disjoint sets). In this notation, $\mathcal{A}$ may be omitted when it is clear from context or not necessary. The rewrite relation $\to_{\mathcal{R}/\mathcal{A}}$ associated to a PTRS $\mathcal{R}/\mathcal{A}$ is defined as the rewrite relation $\to_{\mathcal{R}[\mathcal{A}]}$ where the TRS $\mathcal{R}[\mathcal{A}]$ is the (possibly infinite) set of all rewrite rules obtained from rules $\ell \to r$ in $\mathcal{R}/\mathcal{A}$ by replacing in $r$ every state $q \in Q$ by a ground term of $L(\mathcal{A}, q)$. Several examples of parameterized rewrite rules can be found in Figure 1 below. We will consider in Sections 4 and 5.2 two extensions of PTRS, called controlled PTRS and PTRS with global constraints.

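    Problems. Given a set $L \subseteq \mathcal{H}(\Sigma, \mathcal{X})$ and a PTRS $\mathcal{R}/\mathcal{A}$, we define $post^*_{\mathcal{R}/\mathcal{A}}(L) := \{h' \in \mathcal{H}(\Sigma, \mathcal{X}) \mid \exists h \in L,\ h \to^*_{\mathcal{R}/\mathcal{A}} h'\}$ and $pre^*_{\mathcal{R}/\mathcal{A}}(L) := \{h \in \mathcal{H}(\Sigma, \mathcal{X}) \mid \exists h' \in L,\ h \to^*_{\mathcal{R}/\mathcal{A}} h'\}$.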

    Reachability is the problem to decide, given two hedges $h, h' \in \mathcal{H}(\Sigma)$ and a PTRS $\mathcal{R}/\mathcal{A}$, whether $h \to^*_{\mathcal{R}/\mathcal{A}} h'$. Reachability problems for ground ranked term rewriting have been investigated in e.g. [Gilleron 1991]. C. Löding [Löding 2002] has obtained results in a more general setting where rules of type $L \to R$ specify the replacement of any element of a regular language $L$ by any element of a regular tree language $R$. Then [Löding and Spelten 2007] has extended some of these works to unranked tree rewriting for the case of subtree and flat prefix rewriting, which is a combination of standard ground tree rewriting and prefix word rewriting on the ordered leaves of subtrees of height 1.

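    Typechecking (see e.g. [Milo et al. 2003]) is the problem to decide, given two sets of terms $\tau_{in}$ and $\tau_{out}$ called input and output types (generally presented as HA) and a PTRS $\mathcal{R}/\mathcal{A}$, whether $post^*_{\mathcal{R}/\mathcal{A}}(\tau_{in}) \subseteq \tau_{out}$, or equivalently $\tau_{in} \cap pre^*_{\mathcal{R}/\mathcal{A}}(\overline{\tau_{out}}) = \emptyset$ (where $\overline{\tau_{out}}$ is the complement of $\tau_{out}$). One related problem, called forward (resp. backward) type inference, is, given a PTRS $\mathcal{R}/\mathcal{A}$ and a HA or CF-HA language $L$, to construct a HA or CF-HA recognizing $post^*_{\mathcal{R}/\mathcal{A}}(L)$ (resp. $pre^*_{\mathcal{R}/\mathcal{A}}(L)$).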

    3. Forward and Backward Type Inference for Update Operations

    In this section, we study the problem of type inference for arbitrary finite sequences of primitive update operations taken in a given set. More precisely, we propose a definition in terms of PTRS rules (Section 3.1) of infinite sets of primitive update operations of the XQuery Update Facility [Xquery UF 2009] and some extensions. Then, we present constructions of HA and CF-HA for forward and backward type inference in these settings (Sections 3.2-3.4).

    3.1 Primitive Update Facility Operations

    We assume given an unranked alphabet $\Sigma$ and a HA $\mathcal{A} = (\Sigma, Q, Q^f, \Delta)$. Figure 1 displays PTRS rules, parameterized by states $p, p_1, \ldots, p_n$ of $\mathcal{A}$, representing infinite sets of atomic operations of the XQuery Update Facility [Xquery UF 2009], and some restrictions or extensions. We call UFO+ the class of PTRS rules in Figure 1.

    The following rules correspond to the update primitives of [Xquery UF 2009] except for the possibility in [Xquery UF 2009] to select by XQuery the nodes to be inserted (called content nodes in [Xquery UF 2009]) from the document one is working on.

    REN renames a node: it changes its label from $a$ into $b$. Such a rule leaves the structure of the term unchanged. INS_first inserts a term of type $p$ at the first position below a node labeled by $a$. INS_last inserts at the last position and INS_into at an arbitrary position below a node labeled by $a$. INS_before (resp. INS_after) inserts a term of type $p$ at the left (resp. right) sibling position to a node labeled by $a$. DEL deletes a whole subterm whose root node is labeled by $a$ and RPL replaces a subterm by a sequence of terms of respective types $p_1, \ldots, p_n$.
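    The effect of these primitives on a document can be sketched as one-step rewriting functions. This is an illustration under stated assumptions, not the paper's construction: ground terms are encoded as pairs (label, tuple of children), and each rule takes a concrete term s (or hedge) that is assumed to have already been checked to be of type $p$ (resp. $p_1, \ldots, p_n$). All function names are hypothetical.

```python
# Each rule maps a term t to the list of hedges that may replace t when the
# rule is applied at the root of t (empty list if the rule does not match).
def ren(a, b):
    return lambda t: [((b, t[1]),)] if t[0] == a else []

def ins_first(a, s):
    return lambda t: [((a, (s,) + t[1]),)] if t[0] == a else []

def ins_last(a, s):
    return lambda t: [((a, t[1] + (s,)),)] if t[0] == a else []

def ins_into(a, s):
    # One result per insertion position among the children.
    return lambda t: [((a, t[1][:i] + (s,) + t[1][i:]),)
                      for i in range(len(t[1]) + 1)] if t[0] == a else []

def ins_before(a, s):
    return lambda t: [(s, t)] if t[0] == a else []

def ins_after(a, s):
    return lambda t: [(t, s)] if t[0] == a else []

def delete(a):
    return lambda t: [()] if t[0] == a else []

def rpl(a, hedge):
    return lambda t: [tuple(hedge)] if t[0] == a else []

def one_step(h, rule):
    """All hedges obtained from ground hedge h by one rule application."""
    results = []
    for i, t in enumerate(h):
        for replacement in rule(t):               # rewrite at the root of t
            results.append(h[:i] + replacement + h[i + 1:])
        label, children = t
        for new_children in one_step(children, rule):   # rewrite below t
            results.append(h[:i] + ((label, new_children),) + h[i + 1:])
    return results

B, C = ("b", ()), ("c", ())
doc = (("a", (B,)),)          # the document a(b)
```

    For instance, one_step(doc, ins_before("b", C)) yields the single hedge a(c b), while one_step(doc, ins_into("a", C)) yields both a(c b) and a(b c).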

    Example 2. The patient data in a hospital are stored in an XML document whose DTD can be characterized by an HA $\mathcal{A}$ with transition rules:

    $hospital(\{p_{pa}, p_{epa}\}^*) \to p_h$,
    $patient(p_n\, p_t) \to p_{pa}$,
    $patient(p_n) \to p_{epa}$,
    $treatment(p_{dr}\, p_{dia}\, p_{da}) \to p_t$,
    $name(p_c) \to p_n$,
    $drug(p_c) \to p_{dr}$,
    $diagnosis(p_c) \to p_{dia}$,
    $date(p_c) \to p_{da}$,
    $a \to p_c$, $b \to p_c$, $c \to p_c$, ...


    UFO+

    UFO_reg (first column)                      Other rules of UFO+ (second column)
    $a(x) \to b(x)$            REN
    $a(x) \to a(p\, x)$        INS_first        $a(x) \to b(p\, x)$         RNS_first
    $a(x) \to a(x\, p)$        INS_last         $a(x) \to b(x\, p)$         RNS_last
    $a(x\, y) \to a(x\, p\, y)$  INS_into
    $a(x) \to p\, a(x)$        INS_before
    $a(x) \to a(x)\, p$        INS_after
    $a(x) \to p$               RPL_1            $a(x) \to p_1 \ldots p_n$   RPL
    $a(x) \to ()$              DEL              $a(x) \to x$                DEL_s

    Figure 1. PTRS Rules for the Primitive XQuery Update Facility Operations and Extensions

    The state $p_h$ is the entry point of the DTD, i.e. it represents the type of the root element.

    A DEL rule $patient(x) \to ()$ will delete a patient in the base, and an INS_last rule $hospital(x) \to hospital(x\, p_{pa})$ will insert a new patient at the last position below the root node hospital. We can ensure that the newly added patient has an empty treatment list (to be completed later) using $hospital(x) \to hospital(x\, p_{epa})$. An INS_after rule $name(x) \to name(x)\, p_t$ can be used to insert later a treatment next to the patient's name.

    We also propose in Figure 1 some other operations not in [Xquery UF 2009]. The rules RNS combine the application of the corresponding insert operations INS and of a node renaming REN. The rule RPL_1 is a restriction of RPL to $n = 1$ (note that DEL is also a special case of RPL, with $n = 0$). Finally, DEL_s deletes a single node $n$ whose arguments inherit the position. In other words, it replaces a term with the hedge containing its children. This operation is employed to build user views of XML documents, e.g. in [Fan et al. 2004], and can also be useful for updates.

    Example 3. Assume that some patients of the hospital of Example 2 are grouped in one department like in

    $hospital(\ldots\, surgery(p_{pa}^*)\, \ldots)$,

    and that we want to suppress the department surgery while keeping its patients. This can be done with the DEL_s rule $surgery(x) \to x$.
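    The DEL_s rule of Example 3 can be sketched as follows. The encoding of ground terms as pairs (label, tuple of children) and the sample document are assumptions of this example; the function applies the rule at every node carrying the given label, which corresponds to rewriting with DEL_s until no such node remains.

```python
def del_s_all(h, a):
    """Apply the DEL_s rule a(x) -> x at every a-labeled node of hedge h."""
    out = []
    for label, children in h:
        kept = del_s_all(children, a)
        if label == a:
            out.extend(kept)          # the node disappears, children move up
        else:
            out.append((label, kept))
    return tuple(out)

def patient(n):
    """Hypothetical helper building a patient with only a name child."""
    return ("patient", (("name", ((n, ()),)),))

before = (("hospital", (patient("a"),
                        ("surgery", (patient("b"), patient("c"))))),)
after = del_s_all(before, "surgery")
```

    After the update, the three patients are direct children of hospital, in their original document order.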

    We will see in Section 3.3 that allowing the operations RNS, DEL_s or RPL has important consequences w.r.t. type inference. Indeed, the subclass of operations in the first column of Figure 1, called UFO_reg, preserves languages of HA whereas the operations in the second column may transform a HA language into a CF-HA language.

    3.2 Forward Type Inference for UFO_reg Rules

    We want to characterize the sets of terms which can be obtained, from terms of a given type, by arbitrary application of update operations defined as PTRS rules. For this purpose, we shall study the recognizability (by HA and CF-HA) of the forward closure ($post^*$) of automata languages under the above rewrite rules.

    Theorem 1. For all HA $\mathcal{A}$ on $\Sigma$, PTRS $\mathcal{R}/\mathcal{A} \subseteq$ UFO_reg, and HA language $L$, $post^*_{\mathcal{R}/\mathcal{A}}(L)$ is the language of an HA of polynomial size which can be constructed in PTIME in the size of $\mathcal{R}/\mathcal{A}$ and of an HA recognizing $L$.

    In the following proofs, we describe finite automata for the horizontal languages of HA transitions as tuples $B = (Q, S, i, F, \Gamma)$, where $Q$ is the finite input alphabet, $S$ is a finite set of states, $i \in S$ is the initial state, $F \subseteq S$ is the set of final states and $\Gamma \subseteq S \times (Q \cup \{\varepsilon\}) \times S$ is the set of transitions and $\varepsilon$-transitions. Every transition $(s, q, s') \in \Gamma$ will be denoted $s \xrightarrow{q} s'$. For $s, s' \in S$, we write $s \xrightarrow{\varepsilon}_B s'$ to express that $s'$ can be reached from $s$ by a (possibly empty) sequence of $\varepsilon$-transitions of $B$, and $s \xrightarrow{a_1 \ldots a_n}_B s'$, for $a_1, \ldots, a_n \in Q$, if there exist $2(n+1)$ states $s_0, s'_0, \ldots, s_n, s'_n \in S$ with $s_0 = s$, $s'_n = s'$, $s_i \xrightarrow{\varepsilon}_B s'_i$ for all $0 \leq i \leq n$, and $(s'_i, a_{i+1}, s_{i+1}) \in \Gamma$ for all $0 \leq i < n$.

    Proof. Let $\mathcal{A} = (\Sigma, P, P^f, \Delta)$ and let $\mathcal{A}_L = (\Sigma, Q_L, Q_L^f, \Delta_L)$ be a HA recognizing $L$. We assume that both $\mathcal{A}$ and $\mathcal{A}_L$ are normalized and that their state sets $P$ and $Q_L$ are disjoint. We construct a HA $\mathcal{A}' = (\Sigma, P \cup Q_L, Q_L^f, \Delta')$ recognizing $post^*_{\mathcal{R}/\mathcal{A}}(L)$. For each $a \in \Sigma$, $q \in Q_L$, let $L_{a,q}$ be the regular language in the transition (assumed unique) $a(L_{a,q}) \to q \in \Delta_L$, and let $B_{a,q} = (Q_L, S_{a,q}, i_{a,q}, \{f_{a,q}\}, \Gamma_{a,q})$ be a finite automaton recognizing $L_{a,q}$. The sets of states $S_{a,q}$ are assumed pairwise disjoint. Let $S$ be the disjoint union of all $S_{a,q}$ for all $a \in \Sigma$ and $q \in Q_L$.

    For the construction of $\Delta'$, we develop a set of transition rules $\Gamma' \subseteq S \times (P \cup Q_L \cup \{\varepsilon\}) \times S$. Initially, we let $\Gamma'$ be the union $\Gamma_0$ of all $\Gamma_{a,q}$ for $a \in \Sigma$, $q \in Q_L$, and we complete $\Gamma'$ iteratively by analyzing the different cases of update rules of $\mathcal{R}/\mathcal{A}$. At each step, for each $a \in \Sigma$ and $q \in Q_L$, we let $B_{a,q}$ be the automaton $(P \cup Q_L, S, i_{a,q}, \{f_{a,q}\}, \Gamma')$. For the sake of conciseness we make no distinction between an automaton $B_{a,q}$ and its language $L(B_{a,q})$.

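    REN: for every $a(x) \to b(x) \in \mathcal{R}/\mathcal{A}$ and $q \in Q_L$, we add two $\varepsilon$-transitions $(i_{b,q}, \varepsilon, i_{a,q})$ and $(f_{a,q}, \varepsilon, f_{b,q})$ to $\Gamma'$.

    INS_first: for every $a(x) \to a(p\, x) \in \mathcal{R}/\mathcal{A}$ and $q \in Q_L$, we add one looping transition $(i_{a,q}, p, i_{a,q})$ to $\Gamma'$.

    INS_last: for every $a(x) \to a(x\, p) \in \mathcal{R}/\mathcal{A}$ and $q \in Q_L$, we add one looping transition $(f_{a,q}, p, f_{a,q})$ to $\Gamma'$.

    INS_into: for every $a(x\, y) \to a(x\, p\, y) \in \mathcal{R}/\mathcal{A}$, $q \in Q_L$ and $s \in S$ reachable from $i_{a,q}$ using the transitions of $\Gamma'$, we add one looping transition $(s, p, s)$ to $\Gamma'$.

    INS_before: for every $a(x) \to p\, a(x) \in \mathcal{R}/\mathcal{A}$, $q \in Q_L$ and states $s, s' \in S$ such that $L(B_{a,q}) \neq \emptyset$ and there exists a transition $(s, q, s') \in \Gamma'$, we add one looping transition $(s, p, s)$ to $\Gamma'$.

    INS_after: for every $a(x) \to a(x)\, p \in \mathcal{R}/\mathcal{A}$, $q \in Q_L$ and $s, s' \in S$ such that $L(B_{a,q}) \neq \emptyset$ and there exists a transition $(s, q, s') \in \Gamma'$, we add one looping transition $(s', p, s')$ to $\Gamma'$.

    RPL_1: for every $a(x) \to p \in \mathcal{R}/\mathcal{A}$, $q \in Q_L$, and $s, s' \in S$ such that $L(B_{a,q}) \neq \emptyset$ and there exists a transition $(s, q, s') \in \Gamma'$, we add one transition $(s, p, s')$ to $\Gamma'$.

    DEL: for every $a(x) \to () \in \mathcal{R}/\mathcal{A}$, $q \in Q_L$, and $s, s' \in S$ such that $L(B_{a,q}) \neq \emptyset$ and there exists a transition $(s, q, s') \in \Gamma'$, we add one $\varepsilon$-transition $(s, \varepsilon, s')$ to $\Gamma'$.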

    Note that some of the above new transitions summarize several insertions. Such a construction is comparable to acceleration techniques used in model checking.

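    We iterate the above operations until a fixpoint is reached (only a finite number of transitions can be added to $\Gamma'$ this way). Finally, we let

    $\Delta' := \Delta \cup \{\, a(B_{a,q}) \to q \mid a \in \Sigma,\ q \in Q_L,\ L(B_{a,q}) \neq \emptyset \,\}$.

    We show in the long version that $L(\mathcal{A}') = post^*_{\mathcal{R}/\mathcal{A}}(L)$. □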
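    The flavor of this completion can be illustrated on a single horizontal automaton. The sketch below is a simplification under several assumptions, not the full construction: it handles only the INS_first, INS_last and DEL cases, it works on one automaton $B_{a,q}$ given by its transition set, initial state and final state, it assumes the non-emptiness side conditions have already been checked, and it makes a single pass instead of iterating to a fixpoint. The names saturate, accepts and EPS are hypothetical.

```python
EPS = None  # label used for epsilon-transitions

def eps_closure(states, delta):
    """States reachable from `states` through epsilon-transitions only."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for (s1, x, s2) in delta:
            if s1 == s and x is EPS and s2 not in seen:
                seen.add(s2)
                stack.append(s2)
    return seen

def accepts(delta, init, final, word):
    """Does the automaton accept the word (a list of state names)?"""
    current = eps_closure({init}, delta)
    for sym in word:
        moved = {s2 for (s1, x, s2) in delta if s1 in current and x == sym}
        current = eps_closure(moved, delta)
    return final in current

def saturate(delta, init, final, rules):
    """Add the transitions prescribed by (kind, argument) rule pairs."""
    delta = set(delta)
    for kind, arg in rules:
        if kind == "INS_first":       # loop on the initial state, label p
            delta.add((init, arg, init))
        elif kind == "INS_last":      # loop on the final state, label p
            delta.add((final, arg, final))
        elif kind == "DEL":           # terms of type arg may be erased
            delta |= {(s1, EPS, s2) for (s1, x, s2) in delta if x == arg}
    return delta

# Horizontal language {qn qt}, then: insert p first or last, delete type qt.
base = {("i", "qn", "s"), ("s", "qt", "f")}
sat = saturate(base, "i", "f",
               [("INS_first", "p"), ("INS_last", "p"), ("DEL", "qt")])
```

    The looping transitions account for any number of insertions at once, which is the summarizing effect noted in the text: after saturation the automaton accepts for instance the word p p qn qt p, as well as qn alone.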

    Corollary 1. Typechecking is EXPTIME-complete for UFO_reg and PTIME-complete when the output type is given by a deterministic and complete HA.

    Proof. Let $\tau_{in}$ and $\tau_{out}$ be two HA languages (resp. input and output types), and let $\mathcal{R}/\mathcal{A}$ be a PTRS. We want to know whether $post^*_{\mathcal{R}/\mathcal{A}}(\tau_{in}) \subseteq \tau_{out}$. Following Theorem 1, $post^*_{\mathcal{R}/\mathcal{A}}(\tau_{in})$ is a HA language. Hence $post^*_{\mathcal{R}/\mathcal{A}}(\tau_{in}) \cap \overline{\tau_{out}}$ is a HA language. The size of the HA for the complement $\overline{\tau_{out}}$ can be exponential in the size of the HA for $\tau_{out}$ if this latter HA is non-deterministic, and it is polynomial otherwise. Testing the emptiness of the above intersection language solves the problem.

    Regarding the lower bounds, the EXPTIME-hardness follows from the fact that the inclusion problem is already EXPTIME-complete for ranked tree automata [Seidl 1990], and the PTIME-hardness from the fact that the inclusion problem is PTIME-hard for deterministic HA. □

    Regarding the problem of type synthesis, if we are given R/A and an input type τ_in as a HA, Theorem 1 provides in PTIME an output type presented as a HA of polynomial size.

    3.3 Forward Type Inference for UFO+ Rules

    Theorem 1 is not true for all the rules of UFO+: the rules of UFO+ \ UFO_reg do not preserve HA languages in general. It is evident for RPL, and the examples below show that it is also the case for RNS and DEL_s. However, we prove in Theorem 2 that the rules of UFO+ preserve the larger class of CF-HA languages.

    Example 4. Let Σ = {a, b, c, c'} and let R be the finite TRS containing the two RNS_first and RNS_last rules c(x) → c'(a x), c'(x) → c(x b). We have post*_R({c}) ∩ H({a, b, c}) = {c(a^n b^n) | n ≥ 0}, and this set is not a HA language. It follows that post*_R({c}) is not a HA language.
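    The first part of Example 4 can be checked by a bounded exploration. The sketch below is an illustration under simplifying assumptions: since a and b are leaves and only the root is rewritten, a term is encoded as a pair (root label, word of children), and the names successors and post_star are hypothetical.

```python
def successors(term):
    """One-step rewrites with c(x) -> c'(a x) and c'(x) -> c(x b)."""
    root, word = term
    if root == "c":
        return [("c'", "a" + word)]   # RNS_first: rename and insert a first
    if root == "c'":
        return [("c", word + "b")]    # RNS_last: rename and insert b last
    return []

def post_star(start, max_steps):
    """Terms reachable from start in at most max_steps rewrite steps."""
    seen, frontier = {start}, [start]
    for _ in range(max_steps):
        frontier = [t for s in frontier for t in successors(s) if t not in seen]
        seen.update(frontier)
    return seen

# Children words of the reachable terms whose root is c (no c' left).
c_words = {word for (root, word) in post_star(("c", ""), 6) if root == "c"}
```

    With a bound of 6 steps, c_words contains exactly the words a^n b^n for n ≤ 3, in line with the non-regular set of the example.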

    Let Σ = {a, b, c}, let R be the finite TRS with one DEL_s rule c(x) → x and let L be the HA language containing exactly the terms c(a c(a … c … b) b); it is recognized by the HA with the set of transition rules a → q_a, b → q_b, c({(), q_a q q_b}) → q. We have post*_R(L) ∩ c({a, b}*) = {c(a^n b^n) | n ≥ 0}, hence post*_R(L) is not a HA language.
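    The DEL_s part of the example can be replayed on a concrete term. This is a minimal sketch under stated assumptions: terms are encoded as pairs (label, tuple of children), the helper names are hypothetical, and the rule is only applied below the root, since deleting the root c cannot yield a term of the form c(…).

```python
def nested(n):
    """The term c(a c(a ... c() ... b) b) with n nested levels."""
    term = ("c", ())
    for _ in range(n):
        term = ("c", (("a", ()), term, ("b", ())))
    return term

def del_s_successors(term):
    """Apply c(x) -> x once, at any node strictly below the root."""
    label, kids = term
    out = set()
    for i, kid in enumerate(kids):
        if kid[0] == "c":
            # Splice the children of the deleted c node into its parent.
            out.add((label, kids[:i] + kid[1] + kids[i + 1:]))
        for new_kid in del_s_successors(kid):
            out.add((label, kids[:i] + (new_kid,) + kids[i + 1:]))
    return out

def flat_words(start):
    """Children words of the reachable terms of the form c(w), w over {a, b}."""
    seen, frontier = {start}, [start]
    while frontier:
        frontier = [t for s in frontier for t in del_s_successors(s) if t not in seen]
        seen.update(frontier)
    return {"".join(k[0] for k in kids)
            for (_, kids) in seen if all(k[0] in ("a", "b") for k in kids)}
```

    Starting from nested(3), the only reachable term in c({a, b}*) is c(aaabbb).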

    Theorem 2. For all HA A on Σ, PTRS R/A ∈ UFO+, and CF-HA language L, post*_{R/A}(L) is the language of a CF-HA of polynomial size which can be constructed in PTIME in the size of R/A and of a CF-HA recognizing L.

    Proof. Let A = (Σ, P, P^f, Θ) and let us assume that it is normalized. Let A_L = (Σ, Q_L, Q_L^f, Δ_L) be a CF-HA recognizing L, normalized and without collapsing transitions. The state sets P and Q_L are assumed disjoint.

    We shall construct a CF-HA extended with collapsing transitions A' = (Σ, P ∪ Q_L, Q_L^f, Δ') recognizing post*_{R/A}(L). It follows that post*_{R/A}(L) is a CF-HA language thanks to Lemma 1.

    Very roughly, we define new CFGs G'_{a,q} for the horizontal languages of the transitions of A', as in Theorem 1, starting from the CFGs for the transitions of A_L and adding a new initial non-terminal I'_{a,q} and new production rules, according to the cases of rewrite rules in R/A.

    More formally, the set of transitions Δ' is constructed starting from Δ_L and analysing the different cases of update rules.

    For each a ∈ Σ, q ∈ Q_L, let L_{a,q} be the context-free language in the transition (assumed unique) a(L_{a,q}) → q ∈ Δ_L, and let G_{a,q} = (Q_L, 𝒩_{a,q}, I_{a,q}, Γ_{a,q}) be a CF grammar in Chomsky normal form generating L_{a,q}. It has alphabet (set of terminal symbols) Q_L, set of non-terminal symbols 𝒩_{a,q}, initial non-terminal I_{a,q} ∈ 𝒩_{a,q}, and set of production rules Γ_{a,q}. The sets of non-terminals 𝒩_{a,q} are assumed pairwise disjoint.

    Let us consider one new non-terminal I'_{a,q} for each a ∈ Σ and q ∈ Q_L. Each of these non-terminals aims at becoming the initial non-terminal of the CF grammar in the transition associated to a and q in Δ'. For technical convenience, we also add one new non-terminal X_p for each p ∈ P.

    For the construction of Δ', we first construct below a set C of collapsing transitions, and a set Γ' of production rules of CF grammar over the set of terminal symbols P ∪ Q_L and the set of non-terminals

    𝒩 = ⋃_{a ∈ Σ, q ∈ Q_L} (𝒩_{a,q} ∪ {I'_{a,q}}) ∪ {X_p | p ∈ P}.

    Initially, we let C = ∅ and

    Γ' = Γ_0 := ⋃_{a ∈ Σ, q ∈ Q_L} (Γ_{a,q} ∪ {I'_{a,q} := I_{a,q}}) ∪ {X_p := p | p ∈ P}.

    We now proceed by analysis of the rewrite rules of R/A for the completion of Γ' and C. At each step, for each a ∈ Σ and q ∈ Q_L, we let G'_{a,q} be the CF grammar (P ∪ Q_L, 𝒩, I'_{a,q}, Γ'), and let L'_{a,q} = L(G'_{a,q}). The production rules of Γ' remain in Chomsky normal form after each completion step.

    REN: for every a(x) → b(x) ∈ R/A, q ∈ Q_L, we add one production rule I'_{b,q} := I'_{a,q} to Γ'.

    RNS_first: for every a(x) → b(p x) ∈ R/A, q ∈ Q_L, we add one production rule I'_{b,q} := X_p I'_{a,q} to Γ'.

    RNS_last: for every a(x) → b(x p) ∈ R/A, q ∈ Q_L, we add one production rule I'_{b,q} := I'_{a,q} X_p to Γ'.

    INS_into: for every a(xy) → a(x p y) ∈ R/A, q ∈ Q_L and every N ∈ 𝒩 reachable from I'_{a,q} using the rules of Γ', we add two production rules N := N X_p and N := X_p N.

    INS_before: for every a(x) → p a(x) ∈ R/A, and q ∈ Q_L such that L'_{a,q} ≠ ∅, we add one collapsing transition p q → q to C.

    INS_after: for every a(x) → a(x) p ∈ R/A, and q ∈ Q_L such that L'_{a,q} ≠ ∅, we add one collapsing transition q p → q to C.

    RPL: for every a(x) → p_1 … p_n ∈ R/A, with n ≥ 0, and q ∈ Q_L such that L'_{a,q} ≠ ∅, we add one collapsing transition p_1 … p_n → q to C.

    DEL: for every a(x) → () ∈ R/A and q ∈ Q_L such that L'_{a,q} ≠ ∅, we add one collapsing transition () → q to C.

    Note that INS_first, INS_last, RPL_1 are special cases of respectively RNS_first, RNS_last, RPL.

    We iterate the above operations until a fixpoint is reached. Indeed, only a finite number of production and collapsing rules can be added. Finally, we let

    Δ' := { a(L'_{a,q}) → q | a ∈ Σ, q ∈ Q_L, L'_{a,q} ≠ ∅ } ∪ C ∪ { L'_{a,q} → q | a(x) → x ∈ R/A, L'_{a,q} ≠ ∅ }.

    We show in the long version that L(A') = post*_{R/A}(L). It follows that post*_{R/A}(L) is a CF-HA language by Lemma 1. □

    Corollary 2. Typechecking is EXPTIME-complete for UFO+ and PTIME-complete when the output type is given by a deterministic and complete HA.

    Proof. The proof for the upper bound works as in Corollary 1, because the intersection of a CF-HA language and a HA language is a CF-HA language (there is an effective PTIME construction of a CF-HA of polynomial size), and emptiness of CF-HA is decidable in PTIME. The arguments of Corollary 1 for lower bounds are still valid here because HA are special cases of CF-HA. □

    Regarding the problem of type synthesis for a R/A in UFO+, if an input type τ_in is given as a HA or CF-HA, then Theorem 2 provides in PTIME an output type, presented as a CF-HA of polynomial size. Unlike HA, CF-HA are not popular type schemes, but HA alone do not allow us to extend the results of Theorem 1, in particular for the operation RPL of [Xquery UF 2009], as we have seen above.

    Note that post*_{R/A}(L) can already be a CF-HA language that is not a HA language when the given L is a HA language (see Example 4). One may wonder whether the language of the CF-HA produced by Theorem 2, given a HA for L and a R/A, is actually a HA language. This problem is undecidable, since the problem of knowing whether a given CF language is regular is undecidable.

    3.4 Backward Type Inference for UFO+ Rules

    Since UFO+ rules do not preserve HA languages, as for k-pebble tree transducers [Milo et al. 2003] we may attempt to perform typechecking using pre* computations (backward type inference). The next theorem shows that this is indeed possible, though in EXPTIME, since the class of HA languages is preserved by pre* when using UFO+ rules.

    Theorem 3. Given a HA A on Σ and a PTRS R/A ∈ UFO+, for all HA languages L, pre*_{R/A}(L) is the language of a HA of exponential size which can be constructed in EXPTIME in the size of R/A and of a HA recognizing L.

    Proof. We consider a normalized and complete HA A_L = (Σ, Q_L, Q_L^f, Δ_L) recognizing L. As in the proof of Theorem 1, we assume given, for each a ∈ Σ, q ∈ Q_L, a finite automaton B_{a,q} = (Q_L, S_{a,q}, i_{a,q}, {f_{a,q}}, Γ_{a,q}) recognizing the regular language L_{a,q} in the transition a(L_{a,q}) → q ∈ Δ_L. Unlike the proofs of Theorems 1 and 2, we will incrementally add transitions to A_L, according to the rules of R/A, until a fixpoint automaton is reached which recognizes pre*_{R/A}(L). Every transition added has the form a_0(B) → q_0 where B belongs to the smallest set C defined below.

    More precisely, we construct a finite sequence of HA A_0, A_1, …, A_k whose final element's language is pre*_{R/A}(L), where for all i ≤ k, A_i = (Σ, Q_L, Q_L^f, Δ_i). For the construction of the transition sets Δ_i, we consider the set C of finite automata over Q_L defined as the smallest set such that:

    • C contains every B_{a,q} for a ∈ Σ, q ∈ Q_L,

    • for all B ∈ C, B = (Q_L, S, i, F, Γ) and all states s, s' ∈ S, the automaton B_{s,s'} := (Q_L, S, s, {s'}, Γ) is in C,

    • for all B ∈ C, B = (Q_L, S, i, F, Γ), q ∈ Q_L and all states s, s' ∈ S, the automata (Q_L, S, i, F, Γ ∪ {⟨s, q, s'⟩}) and (Q_L, S, i, F, Γ ∪ {⟨s, ε, s'⟩}), respectively denoted by B + ⟨s, q, s'⟩ and B + ⟨s, ε, s'⟩, also belong to C.

    Note that C is finite with this definition, though exponential. For the sake of conciseness, we make no distinction below between an automaton B ∈ C and the language L(B) recognized by B. Moreover, we assume that every B ∈ C has a unique final state denoted f_B and its initial state is denoted i_B.

    First, we let Δ_0 = Δ_L. The other Δ_i are constructed recursively by iteration of the following case analysis until a fixpoint is reached (only a finite number of transitions can be added in the construction). In the construction we use an extension of the move relation of HA, from states to sets of states (single states are considered as singleton sets): a(L_1, …, L_n) →_{Δ_i} q (where L_1, …, L_n ⊆ Q_L and q ∈ Q_L) iff there exists a transition a(L) → q ∈ Δ_i such that L_1 … L_n ⊆ L.

    REN: if a(x) → b(x) ∈ R/A, B ∈ C and q ∈ Q_L, such that b(B) →_{Δ_i} q, then let Δ_{i+1} := Δ_i ∪ {a(B) → q}.

    RNS_first: if a(x) → b(p x) ∈ R/A, B ∈ C and q, q_p ∈ Q_L, such that L(A_i, q_p) ∩ L(A, p) ≠ ∅ and b(q_p B) →_{Δ_i} q, then Δ_{i+1} := Δ_i ∪ {a(B) → q}.

    RNS_last: if a(x) → b(x p) ∈ R/A, B ∈ C and q, q_p ∈ Q_L, such that L(A_i, q_p) ∩ L(A, p) ≠ ∅ and b(B q_p) →_{Δ_i} q, then Δ_{i+1} := Δ_i ∪ {a(B) → q}.

    INS_into: if a(xy) → a(x p y) ∈ R/A, B ∈ C, s, s' are states of B, and q, q_p ∈ Q_L, such that L(A_i, q_p) ∩ L(A, p) ≠ ∅, s →^{q_p}_B s', and a(B) →_{Δ_i} q, then Δ_{i+1} := Δ_i ∪ {a(B + ⟨s, ε, s'⟩) → q}.

    INS_before: if a(x) → p a(x) ∈ R/A, b ∈ Σ, B, B' ∈ C, s, s' are states of B, and q, q_p, q' ∈ Q_L such that b(B) → q ∈ Δ_i, a(B') →_{Δ_i} q', L(A_i, q_p) ∩ L(A, p) ≠ ∅, s →^{q_p q'}_B s', then Δ_{i+1} := Δ_i ∪ {b(B + ⟨s, q', s'⟩) → q}.

    INS_after: if a(x) → a(x) p ∈ R/A, b ∈ Σ, B, B' ∈ C, s, s' are states of B, and q, q_p, q' ∈ Q_L such that b(B) → q ∈ Δ_i, a(B') →_{Δ_i} q', L(A_i, q_p) ∩ L(A, p) ≠ ∅, s →^{q' q_p}_B s', then Δ_{i+1} := Δ_i ∪ {b(B + ⟨s, q', s'⟩) → q}.

    RPL: if a(x) → p_1 … p_n ∈ R/A, b ∈ Σ, B, B' ∈ C, s, s' are states of B, and q, q', q_1, …, q_n ∈ Q_L such that b(B) → q ∈ Δ_i, a(B') →_{Δ_i} q', L(A_i, q_j) ∩ L(A, p_j) ≠ ∅ for all 1 ≤ j ≤ n, s →^{q_1 … q_n}_B s', then Δ_{i+1} := Δ_i ∪ {b(B + ⟨s, q', s'⟩) → q}.

    DEL: if a(x) → () ∈ R/A, b ∈ Σ, B, B' ∈ C, s is a state of B, q, q' ∈ Q_L such that b(B) → q ∈ Δ_i, a(B') →_{Δ_i} q', then Δ_{i+1} := Δ_i ∪ {b(B + ⟨s, q', s⟩) → q}.

    DEL_s: if a(x) → x ∈ R/A, b ∈ Σ, B ∈ C, s, s' are states of B, q, q' ∈ Q_L such that b(B) → q ∈ Δ_i, a(B_{s,s'}) →_{Δ_i} q', then Δ_{i+1} := Δ_i ∪ {b(B + ⟨s, q', s'⟩) → q}.

    h, (n, n') ⊨ .       iff n = n'
    h, (n, n') ⊨ a       iff n' is a child of n and h(n') = a
    h, (n, n') ⊨ ..      iff n is a child of n'
    h, (n, n') ⊨ p/p'    iff ∃m ∈ dom(h) such that h, (n, m) ⊨ p and h, (m, n') ⊨ p'
    h, (n, n') ⊨ p ∪ p'  iff h, (n, n') ⊨ p or h, (n, n') ⊨ p'
    h, (n, n') ⊨ p*      iff ∃n_0, …, n_k, n_0 = n, n_k = n', and h, (n_i, n_{i+1}) ⊨ p for all i < k
    h, (n, n') ⊨ p[q]    iff h, (n, n') ⊨ p and h, n' ⊨ q

    h, n ⊨ ⟨p⟩           iff ∃n', h, (n, n') ⊨ p
    h, n ⊨ lab(a)        iff h(n) = a
    h, n ⊨ q ∨ q'        iff h, n ⊨ q or h, n ⊨ q'
    h, n ⊨ ¬q            iff h, n ⊭ q

    Figure 2. Semantics of Path and Node Expressions

    Note that INS_first, INS_last, RPL_1 are special cases of respectively RNS_first, RNS_last, RPL. Since no state is added to the original automaton A_L and all the transitions added involve horizontal languages of the set C, which is finite, the iteration of the above operations terminates with an automaton A'. We show in the long version of this paper that L(A') = pre*_{R/A}(L). □

    4. Selection of Target Nodes

    In general, an XML update operation is applied to nodes (called target nodes in [Xquery UF 2009]) selected in a document using specified XPath 2.0 or XQuery expressions. In this section, we study an extension of PTRS which makes it possible to model such a feature.

    4.1 Controlled Rewriting

    We consider a fragment XP of Regular XPath [ten Cate 2006] with the following path expressions (where a ∈ Σ):

    p ::= . | a | .. | p/p | p ∪ p | p* | p[q]

    and the node expressions:

    q ::= ⟨p⟩ | lab(a) | q ∧ q | q ∨ q | ¬q

    The satisfaction of a path expression p by a hedge h and a pair of nodes n, n' ∈ dom(h), denoted by h, (n, n') ⊨ p, and of a node expression q by a hedge h and one node n ∈ dom(h), denoted h, n ⊨ q, are defined in Figure 2. Given a path expression p, we use below the abbreviation //p for the path expression (a_1 ∪ … ∪ a_k)*/p (assuming Σ = {a_1, …, a_k}) and we shall omit a . at the beginning of an expression.
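    The semantics of Figure 2 can be read as a direct evaluation procedure. The sketch below is an illustration only, under assumed encodings: a hedge is a list of (label, children) pairs, nodes are positions (tuples of child indices), expressions are tagged tuples, and all function names are hypothetical.

```python
def positions(hedge, prefix=()):
    """Map each node position (tuple of child indices) to its label."""
    dom = {}
    for i, (label, children) in enumerate(hedge):
        dom[prefix + (i,)] = label
        dom.update(positions(children, prefix + (i,)))
    return dom

def compose(left, right):
    return {(n, n2) for (n, m) in left for (m2, n2) in right if m == m2}

def pairs(dom, p):
    """Set of node pairs (n, n') satisfying the path expression p."""
    kind = p[0]
    if kind == "self":
        return {(n, n) for n in dom}
    if kind == "label":   # n' is a child of n and is labelled p[1]
        return {(n[:-1], n) for n in dom if len(n) > 1 and dom[n] == p[1]}
    if kind == "parent":  # n is a child of n'
        return {(n, n[:-1]) for n in dom if len(n) > 1}
    if kind == "seq":
        return compose(pairs(dom, p[1]), pairs(dom, p[2]))
    if kind == "union":
        return pairs(dom, p[1]) | pairs(dom, p[2])
    if kind == "star":    # reflexive-transitive closure, by fixpoint
        step, closure = pairs(dom, p[1]), {(n, n) for n in dom}
        while True:
            bigger = closure | compose(closure, step)
            if bigger == closure:
                return closure
            closure = bigger
    if kind == "filter":
        return {(n, n2) for (n, n2) in pairs(dom, p[1]) if holds(dom, n2, p[2])}
    raise ValueError(kind)

def holds(dom, n, q):
    """Satisfaction of the node expression q at node n."""
    kind = q[0]
    if kind == "exists":
        return any(src == n for (src, _) in pairs(dom, q[1]))
    if kind == "lab":
        return dom[n] == q[1]
    if kind == "or":
        return holds(dom, n, q[1]) or holds(dom, n, q[2])
    if kind == "and":
        return holds(dom, n, q[1]) and holds(dom, n, q[2])
    if kind == "not":
        return not holds(dom, n, q[1])
    raise ValueError(kind)

def select(hedge, p):
    """Nodes n such that (n_0, n) satisfies p for some root n_0 of the hedge."""
    dom = positions(hedge)
    return {n2 for (n, n2) in pairs(dom, p) if len(n) == 1}

# Illustrative document: hospital(patient(name), patient()).
doc = [("hospital", [("patient", [("name", [])]), ("patient", [])])]
```

    For instance, select(doc, ("filter", ("label", "patient"), ("exists", ("label", "name")))) evaluates patient[⟨name⟩] from the root and returns only the first patient node.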

    A controlled term rewriting system over Σ is a set R of controlled rewrite rules of the form ℓ → r at π where ℓ, r ∈ H(Σ, X) and π is a path expression of XP. The rewrite relation →_R of R is defined as the rewrite relation of uncontrolled systems (see Section 2.3) by furthermore restricting the rewrite nodes to nodes defined by π. More precisely, h →_R h' iff there exists a controlled rule ℓ → r at π in R, a substitution σ, and a context u such that the node n labelled by the variable x_u in the context u is selected by π, i.e. there exists a root n_0 of h such that h, (n_0, n) ⊨ π, and h = u[ℓσ], h' = u[rσ]. Note that for applying a rule ℓ → r at π it is expected for the path expression π and the lhs ℓ to match the same labels.

    A controlled term rewriting system parameterized by a HA (CPTRS) over Σ is a finite set of controlled and parameterized rewrite rules ℓ → r at π, where ℓ and r are as in the definition of PTRS in Section 2.3 and π is as above. The rewrite relation of a CPTRS parameterized by A is defined as the rewrite relation of the associated CTRS R[A], as in Section 2.3.

    4.2 Selection by Label

    The PTRS rewrite rules of Section 3 define a minimal criterion for the selection of rewrite nodes (nodes where the update operations are applied), by specifying the label of the selected node. Indeed, all the left-hand sides of rules have the form a(x) (or a(xy) for INS_into). For instance, in the case of a rule of INS_first: a(x) → a(p x), a term of type p (w.r.t. the given HA A) can only be inserted below a node labeled with a. For a rule INS_after: a(x) → a(x) p, a term of type p (w.r.t. the given HA A) can only be inserted at the sibling position next to a node labeled with a, and for DEL: a(x) → (), the term to be deleted must have a label a at its root node. It means that a PTRS rule a(x) → r is semantically equivalent to the CPTRS rule a(x) → r at //a.
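    Selection by label can be made concrete for INS_after. The following sketch is illustrative: hedges are encoded as tuples of (label, children) pairs, the function name ins_after is hypothetical, and the inserted term is passed explicitly instead of being drawn from the language of a state p of A.

```python
def ins_after(hedge, label, new_term):
    """All hedges obtained by one application of a(x) -> a(x) p,
    inserting new_term as the right sibling of one node labelled `label`."""
    results = []
    for i, (lab, kids) in enumerate(hedge):
        if lab == label:
            # The selected node keeps its children; new_term follows it.
            results.append(hedge[:i + 1] + (new_term,) + hedge[i + 1:])
        for new_kids in ins_after(kids, label, new_term):
            results.append(hedge[:i] + ((lab, new_kids),) + hedge[i + 1:])
    return results

# Illustrative hedge r(a, b) and inserted leaf n (both hypothetical).
example = (("r", (("a", ()), ("b", ()))),)
inserted = ins_after(example, "a", ("n", ()))
```

    Only the node labelled a is a rewrite node here, so inserted contains the single hedge r(a, n, b).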

    4.3 Selection by Label and Parent's Label

    For the rules with a hedge at right-hand side (like INS_before, INS_after, RPL_1, DEL, DEL_s …), we can refine the selection by furthermore constraining the label at the parent of the node where the update is performed, obtaining the generalized rules of Figure 3. Indeed, every PTRS rule of the form b(y a(x) z) → b(y r z) in Figure 3 is semantically equivalent to the CPTRS rule a(x) → r at //b/a.

    Example 5. The DEL rule

    hospital(y patient(x) z) → hospital(y z)

    can be used to delete a patient only if it is located under a hospital node selected by the path expression //hospital[./patient]. It corresponds to the CPTRS rule patient(x) → () at //hospital/patient.
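    The effect of the parent constraint in Example 5 can be sketched as follows. The encoding of terms as (label, tuple of children) pairs, the function name delete_under and the label archive are assumptions made for the illustration.

```python
def delete_under(term, parent_label, child_label):
    """All terms obtained by one application of the DEL rule
    b(y a(x) z) -> b(y z), with b = parent_label and a = child_label."""
    label, kids = term
    results = []
    for i, kid in enumerate(kids):
        if label == parent_label and kid[0] == child_label:
            results.append((label, kids[:i] + kids[i + 1:]))
        for new_kid in delete_under(kid, parent_label, child_label):
            results.append((label, kids[:i] + (new_kid,) + kids[i + 1:]))
    return results

# hospital(patient, archive(patient)): only the first patient has a
# hospital parent, so it is the only one that can be deleted.
doc = ("hospital", (("patient", ()), ("archive", (("patient", ()),))))
after = delete_under(doc, "hospital", "patient")
```

    The patient below archive is left untouched, since its parent is not labelled hospital.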

    Let us call UFO + the class of all PTRS with rules in UFO+ or of a kind described in Figure 3. The result of Theorem 3 for backward type inference can be straightforwardly extended from UFO+ to UFO +. For instance, during the iteration, if the PTRS R/A contains a rule INS_before b(y a(x) z) → b(y p a(x) z), and if B, B' ∈ C, s, s' are states of B, and q, q_p, q' ∈ Q_L such that s →^{q_p q'}_B s', b(B) → q is one of the current transitions and a(B') can reach q' and some term of L(A, p) can reach q_p using the current transitions, then we add the transition b(B + ⟨s, q', s'⟩) → q.

    Theorem 4. Given a HA A on Σ and a PTRS R/A ∈ UFO +, for all HA languages L, pre*_{R/A}(L) is the language of a HA of exponential size which can be constructed in EXPTIME in the size of R/A and of a HA recognizing L.

    b(y a(x) z) → b(y p a(x) z)        INS_before
    b(y a(x) z) → b(y a(x) p z)        INS_after
    b(y a(x) z) → b(y p z)             RPL_1
    b(y a(x) z) → b(y p_1 … p_n z)     RPL
    b(y a(x) z) → b(y z)               DEL
    b(y a(x) z) → b(y x z)             DEL_s

    Figure 3. PTRS rules for Update Facility Operations with Control of Parent's Label

    Proof. The proof is very close to the proof of Theorem 3. Indeed, in the above construction for Theorem 3, we consider the applications of rules INS_before, INS_after, RPL, DEL and DEL_s under any symbol b. Here instead, we can restrict the construction to the application under the symbol specified in the lhs of the rewrite rules. More precisely, let us just detail below the cases of the construction which are modified.

    INS_before: if b(y a(x) z) → b(y p a(x) z) ∈ R/A, B, B' ∈ C, s, s' are states of B, and q, q_p, q' ∈ Q_L such that b(B) → q ∈ Δ_i, a(B') →_{Δ_i} q', L(A_i, q_p) ∩ L(A, p) ≠ ∅, s →^{q_p q'}_B s', then Δ_{i+1} := Δ_i ∪ {b(B + ⟨s, q', s'⟩) → q}.

    INS_after: if b(y a(x) z) → b(y a(x) p z) ∈ R/A, B, B' ∈ C, s, s' are states of B, and q, q_p, q' ∈ Q_L such that b(B) → q ∈ Δ_i, a(B') →_{Δ_i} q', L(A_i, q_p) ∩ L(A, p) ≠ ∅, s →^{q' q_p}_B s', then Δ_{i+1} := Δ_i ∪ {b(B + ⟨s, q', s'⟩) → q}.

    RPL: if b(y a(x) z) → b(y p_1 … p_n z) ∈ R/A, B, B' ∈ C, s, s' are states of B, and q, q', q_1, …, q_n ∈ Q_L such that b(B) → q ∈ Δ_i, a(B') →_{Δ_i} q', L(A_i, q_j) ∩ L(A, p_j) ≠ ∅ for all 1 ≤ j ≤ n, s →^{q_1 … q_n}_B s', then Δ_{i+1} := Δ_i ∪ {b(B + ⟨s, q', s'⟩) → q}.

    DEL: if b(y a(x) z) → b(y z) ∈ R/A, B, B' ∈ C, s is a state of B, q, q' ∈ Q_L such that b(B) → q ∈ Δ_i, a(B') →_{Δ_i} q', then Δ_{i+1} := Δ_i ∪ {b(B + ⟨s, q', s⟩) → q}.

    DEL_s: if b(y a(x) z) → b(y x z) ∈ R/A, B ∈ C, s, s' are states of B, q, q' ∈ Q_L such that b(B) → q ∈ Δ_i, a(B_{s,s'}) →_{Δ_i} q', then Δ_{i+1} := Δ_i ∪ {b(B + ⟨s, q', s'⟩) → q}.

    The rest of the proof is the same as for Theorem 3. □

    4.4 Selection by XPath Expressions

    Allowing more navigation axes, like the parent axis, in the control expressions of the CPTRS rules leads to the undecidability of reachability, hence of typechecking.

    More precisely, let XP_1 be the following fragment of path expressions of XP (where a ∈ Σ):

    p_1 ::= . | a | .. | p_1/p_1 | p_1 ∪ p_1 | p_1[lab(a)]

    Theorem 5. Reachability is undecidable for CPTRS with rules of the form ℓ → r at π with ℓ → r ∈ UFO_reg of type REN or RPL_1, and π ∈ XP_1.

    Proof. The proof is very close to the proof of undecidability of inconsistency of update ACPs in [Fundulaki and Maneth 2007]. We reduce the halting problem of a deterministic Turing Machine (TM) M that works on half a tape (unbounded on the right). Let Γ = {0, 1, ♭} be the tape alphabet (♭ is the blank symbol) and S = {s_1, s_2, …, s_n} be the state set of M.

    We consider the alphabet Σ := {g} ∪ Γ ∪ (S × Γ) ∪ (S × Γ)' for representing the configurations of M as binary terms. A symbol of the form ⟨s, a⟩ with s ∈ S and a ∈ Γ will be used to indicate the position of the head of M. For instance, the TM configuration with tape abcde…, symbol d under head, and state s will be represented by the following binary term of T(Σ): g(a g(b g(c g(⟨s, d⟩ g(e g(♭)))))).

    We also use a trivial HA A = (Σ, Q, Q, Δ) to recognize some particular terms: every term of the form g(⟨r, ♭⟩, ♭) (with r ∈ S) will be recognized in a state p_{g(⟨r,♭⟩,♭)} ∈ Q, and it is the only term recognized in this state.

    We define a CPTRS R/A such that every transition of M can be simulated by a sequence of (at most three) rewrite steps with R/A.

    For each TM instruction of type "in state s reading a, go to state r and write b", we define the following uncontrolled PTRS rule (of type REN): ⟨s, a⟩(x) → ⟨r, b⟩(x).

    For each TM instruction of type "in state s reading a, go to state r and move left", we define the following CPTRS rules:

    1. b(x) → ⟨r, b⟩'(x) at //⟨s, a⟩/../../b (for all b ∈ {0, 1}) (the symbol b at the left of the head, marked by ⟨s, a⟩, is renamed into the temporary symbol ⟨r, b⟩'),

    2. ⟨s, a⟩(x) → a(x) at //⟨r, b⟩'/g/⟨s, a⟩ (⟨s, a⟩ is renamed into a if it has ⟨r, b⟩' at its left),

    3. ⟨r, b⟩'(x) → ⟨r, b⟩(x) at //a/../../⟨r, b⟩' (⟨r, b⟩' is renamed into ⟨r, b⟩, which marks the new position of the head).

    Note the use of the XPath expressions (selecting rewrite nodes) for checking the neighbor symbol and ensuring a correct chaining of the rewrite steps. Note also that for the first rule, if a is the first symbol of the tape, then the rule cannot be applied because of the path expression; this corresponds to the fact that the Turing machine cannot move to the left of the beginning of the tape.

    For a transition of M moving to the right, we also add an RPL rule for moving the marker. More precisely, for each instruction of type "in state s reading a, go to state r and move right", we define the following CPTRS rules of type REN and RPL_1 (we recall that p_{g(⟨r,♭⟩,♭)} is a state of A):

    b(x) → ⟨r, b⟩'(x) at //⟨s, a⟩/../g/g/b, for all b ∈ {0, 1}
    ♭(x) → p_{g(⟨r,♭⟩,♭)} at //⟨s, a⟩/../g/♭
    ⟨s, a⟩(x) → a(x) at //⟨r, b⟩'/../../⟨s, a⟩
    ⟨r, b⟩'(x) → ⟨r, b⟩(x) at //a/../g/g/⟨r, b⟩'

    The TM instruction will be executed in three rewrite steps: first the symbol at the position at the right of the head (marked by ⟨s, a⟩) is renamed from b into the temporary symbol ⟨r, b⟩'. Next ⟨s, a⟩ is renamed into a and finally ⟨r, b⟩' is renamed into ⟨r, b⟩, which marks the new position of the head. The tests in the path expressions for the selection of rewrite nodes will ensure a correct chaining of the rewrite steps: at each step, we check the neighbor position in order to test that the previous step has been applied.

    For every pair of TM configurations T_1, T_2 and their respective term encodings t_1, t_2, there is a sequence of transitions from T_1 to T_2 with M iff t_1 →*_{R/A} t_2.

    Assuming (wlog) that M has unique initial and final configurations, we can conclude. □

    5. ACP for XML Updates

    In this last section we study some models of Access Control Policies (ACP) for the update operations defined in Section 3, and verification problems for these ACP. We consider two kinds of formalisms from the literature for the specification of XML ACPs. The first formalism is the most widespread. It consists in defining an ACP as a set of update rules, partitioned into authorized and forbidden operations. The second one is a more recent proposal of [Fundulaki and Maneth 2007] where the ACP is defined by adding security annotations to a DTD.

    5.1 Local Consistency of Rule-based ACPs

    An ACP for XML updates can be defined by a pair (R_a/A, R_f/A) of PTRS, where R_a contains allowed operations and R_f contains forbidden operations (see e.g. [Bravo et al. 2008]). Such an ACP is called inconsistent [Bravo et al. 2008; Fundulaki and Maneth 2007] if some forbidden operation can be simulated through a sequence of allowed operations, i.e. if there exist t, u ∈ T(Σ) such that t →_{R_f/A} u and t →*_{R_a/A} u.

    Example 6. Assume that in the hospital document of Example 2, it is forbidden to rename a patient, that is, the following update of RPL_1 is forbidden: patient(y name(x) z) → patient(y p_n z). If the following updates are allowed: patient(x) → () for deleting a patient, and hospital(x) → hospital(x p_pa) to insert a patient, then we have an inconsistency in the sense of [Bravo et al. 2008] since the effect of the forbidden update can be obtained by a combination of allowed updates.

    Using the results of Section 3, we can decide the above problems individually for terms of T(Σ). More precisely, we solve the following problem called local inconsistency: given a HA A over Σ and a term t ∈ T(Σ), an ACP (R_a/A, R_f/A) is locally inconsistent if there exists u ∈ T(Σ) such that t →_{R_f/A} u and t →*_{R_a/A} u.
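    The inconsistency of Example 6 can be replayed on one concrete document. The sketch below is an illustration under stated assumptions: terms are encoded as (label, tuple of children) pairs, the three helper functions and the names Alice and Bob are hypothetical, and the hospital has a single patient so that insertion at the last position restores the original order.

```python
def replace_name(hospital, i, new_name):
    """Forbidden RPL_1 update: replace the name child of the i-th patient."""
    label, patients = hospital
    p_label, fields = patients[i]
    new_fields = tuple(new_name if f[0] == "name" else f for f in fields)
    return (label, patients[:i] + ((p_label, new_fields),) + patients[i + 1:])

def delete_patient(hospital, i):
    """Allowed DEL update: patient(x) -> ()."""
    label, patients = hospital
    return (label, patients[:i] + patients[i + 1:])

def insert_patient_last(hospital, patient):
    """Allowed INS_last update: hospital(x) -> hospital(x p_pa)."""
    label, patients = hospital
    return (label, patients + (patient,))

alice = ("name", (("Alice", ()),))
bob = ("name", (("Bob", ()),))
t = ("hospital", (("patient", (alice,)),))

forbidden = replace_name(t, 0, bob)
simulated = insert_patient_last(delete_patient(t, 0), ("patient", (bob,)))
```

    Both forbidden and simulated are the term hospital(patient(name(Bob))), so this t witnesses a local inconsistency of the example policy.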

    Theorem 6. Local inconsistency is decidable in PTIME for UFO+ ACPs.

    Proof. It can be easily shown that the set {u ∈ T(Σ) | t →_{R_f/A} u} is the language of a HA of polynomial size, constructed in PTIME in the sizes of A, R_f and t. By Theorem 2, post*_{R_a/A}({t}) is the language of a CF-HA of polynomial size, constructed in polynomial time in the sizes of A, R_a and t. The ACP is locally inconsistent w.r.t. t iff the intersection of the two above languages is not empty, and this property can be tested in PTIME. □

    Inconsistency is undecidable [Fundulaki and Maneth 2007] for an ACP defined by a pair (R_a/A, R_f/A) of CPTRS of Section 4. Moreover, in this setting, the problem of reachability (whether a given term t can be obtained from a given term s using instances of rules of R_a/A which are not in R_f/A) is also undecidable [Moore 2009], therefore local consistency is undecidable as well. It is an open question whether inconsistency is decidable or not for PTRS of type UFO_reg or UFO+.

    5.2 Local Consistency of DTD-based ACPs

    Following the principle of DTD-based ACPs [Fan et al. 2004], [Fundulaki and Maneth 2007] have proposed the language XACU_annot for the definition of ACP for XML updates in presence of a DTD D. The idea is to add to D some security annotations specifying the authorizations for the update operations for XML documents valid for D. In [Fan et al. 2004], the annotations are mappings from pairs of DTD element types (b, a) to authorizations, specifying the right to access a nodes which are children of b nodes. Such annotations can be compared to the rewrite node selection presented in Section 4.3.

    The formalism of [Fan et al. 2004; Fundulaki and Maneth 2007] imposes the condition that every document t to which we want to apply an update operation (under the given ACP) must be valid for the DTD D.

    In our rewrite-based formalism, the above condition may be expressed by adding global constraints to the parameterized rewrite rules of Section 2.3. These global constraints restrict the rewrite relation to terms in a given HA language. Theorem 7 below shows that, unfortunately, adding such constraints to parameterized rewrite rules of type REN or RPL makes reachability undecidable.

    Given a HA A = (Σ, Q, Q^f, Δ), a term rewriting system over Σ, parameterized by A and with global constraints (PGTRS) is given by a PTRS, denoted R/A (see Section 2.3), and a HA language L ⊆ T(Σ). We say that L is the constraint of R. The rewrite relation generated by the PGTRS is defined as the restriction of the relation defined in Section 2.3 to ground terms such that for the application of a rule ℓ → r ∈ R/A to a term t, we require that t ∈ L.

    Theorem 7. Reachability is undecidable for PGTRSs with rules in UFO_reg and constraint given by a non recursive DTD.

    Proof. The proof is a variant of the one given by A. Spelten [Spelten 2006] for subterm and flat prefix rewriting. As in the proof of Theorem 5, we will reduce the halting problem of a deterministic Turing Machine (TM) M that works on half a tape (unbounded on the right). However, configurations are now encoded as flat terms.

    We consider the same tape alphabet Γ = {0, 1, ♭} (♭ is the blank symbol) and state set S = {s_1, s_2, …, s_n} of M as in the proof of Theorem 5, and the following alphabet Σ for the representation of the configurations of M.

    Σ := {g} ∪ {0, 1, ♭} ∪ (S × Γ) ∪ (S × Γ)'.

    For instance, the TM configuration with tape abcde…, symbol d under head, state s, will be represented by the following flat term of T(Σ): g(a b c ⟨s, d⟩ e).

    We shall also use a trivial HA A = (Σ, Q, Q, Δ) (as in the proof of Theorem 5) which recognizes only constant symbols by taking Q = {p_γ | γ ∈ Γ} and Δ = {γ → p_γ | γ ∈ Γ}.

    We define a PGTRS R/A such that every transition of M can be simulated by a sequence of (at most three) rewrite steps with R/A. Let us first introduce some standard auxiliary PTRS rules and some regular word languages for controlling rule applications.

    For each instruction of M of type "in state s reading a, go to state r and write b", we define the following TRS rule:

    ⟨s, a⟩(x) → ⟨r, b⟩(x)

    We also define the regular word language

    L s,a = ⟨s, a⟩.

    For each instruction of M of type "in state s reading a, go to state r and move right", we define the following PTRS rules of types REN and INS_after (note that p_♭ is a state of A):

    b(x) → ⟨r, b⟩'(x) for all b ∈ {0, 1, ♭}
    ♭(x) → ♭(x) p_♭
    ⟨s, a⟩(x) → a(x)
    ⟨r, b⟩'(x) → ⟨r, b⟩(x) for all b.

    We also define the regular word languages:

    L s,a = ⟨s, a⟩
    L s,a = ⟨s, a⟩
    L s,a r,b = ⟨s, a⟩ ⟨r, b⟩ for all b.

    For each instruction of M of type "in state s reading a, go to state r and move left", we define the following TRS rules:

    b(x) → ⟨r, b⟩'(x) for all b ∈ {0, 1}
    ⟨s, a⟩(x) → a(x)
    ⟨r, b⟩'(x) → ⟨r, b⟩(x) for all b ∈ {0, 1}

    We also define the regular word languages:

    L s,a = ⟨s, a⟩
    L s,a = ⟨s, a⟩
    L r,b s,a = ⟨r, b⟩ ⟨s, a⟩ for all b ∈ {0, 1}.

    The constraint of the PGTRS will be defined by the non recursive DTD D: g → L where L is the finite union of the regular languages associated to the instructions of M as above. Since the machine to be simulated is deterministic, the union is disjoint.

    Our final PGTRS is given by R/A and L so that the rewrite rules in R/A can only be applied to terms satisfying the DTD D. With the above constraint, the PGTRS rules of R/A can only be applied to terms valid for the DTD D, ensuring a correct chaining for the application of these rules.

    By case inspection we can show that for any pair of TM configurations T_1, T_2 and their respective term encodings t_1, t_2, there is a sequence of transitions from T_1 to T_2 iff t_1 →*_{R/A} t_2. The theorem follows. □

    The result can be contrasted with the decidability of reachability for ground rewriting [Gilleron 1991].

    In [Abiteboul et al. 2009] the authors study the more general problem of satisfiability for active XML documents in the context of unranked unordered terms. This property is shown decidable for insertions constrained by an unordered DTD, but undecidable when they are constrained by an unordered HA.

    Corollary 3. Local inconsistency is undecidable for PGTRS with rules in UFO+ and with constraint given by a non recursive DTD.

    6. Conclusion

    We have proposed a model for the primitive XML update operations of [Xquery UF 2009] based on term rewriting systems parameterized by hedge automata (PTRS), and studied the problems of type inference and typechecking for arbitrarily long sequences of such operations. We have also studied some extensions of the model for selecting the rewrite positions with XPath expressions (CPTRS) and restricting the application of update operations to documents conforming to a fixed non recursive DTD (PGTRS). Finally, we have shown how to apply our results to show the decidability of the property of local inconsistency of access control policies for XML updates.

    One of our main results of forward type inference (Theorem 2) requires the use of CF-HA (a strict extension of hedge automata) for output types. One may wonder whether this result could be adapted to compute regular over-approximations of output types, leading to an approximating forward type inference algorithm, in an approach similar to e.g. [Touili 2007].

    Reachability is undecidable for CPTRS rules controlled with XPath expressions with child and parent axes. The cases of CPTRS rules controlled with a downward XPath fragment, or a regular downward XPath fragment, deserve to be considered. Indeed, a decidability result for typechecking in these settings should give a novel approach (using CPTRS) to known problems on other tree transformation formalisms (like MTTs or XSLT).

    The W3C recommendation [Xquery UF 2009] defines some priorities for the application of update operations (for instance REN has higher priority than DEL). The influence of such restrictions on type inference should be investigated. Finally, it could also be interesting to apply a similar approach for studying updates of unranked unordered trees.

    Acknowledgments

    The authors wish to thank Serge Abiteboul, Pierre Bourhis, Sebastian Maneth and Luc Segoufin for discussions about XML updates and access control, and the anonymous referees for their numerous comments and suggestions.

    References

    S. Abiteboul, P. Bourhis, and B. Marinoiu. Satisfiability and relevance for queries over active documents. In Proceedings of the 28th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), pages 87-96. ACM, 2009.

    M. Benedikt and J. Cheney. Semantics, Types and Effects for XML Updates. In Proceedings of the 12th International Symposium, Database Programming Languages (DBPL), volume 5708 of LNCS, pages 1-17. Springer, 2009.

    V. Benzaken, G. Castagna and A. Frisch. CDuce: an XML-centric general-purpose language. In Proceedings of the 8th ACM SIGPLAN International Conference on Functional Programming, pages 51-63. ACM, 2003.

    A. Bouajjani, B. Jonsson, M. Nilsson, and T. Touili. Regular Model Checking. In Proceedings of the 12th Int. Conference on Computer Aided Verification (CAV), volume 1855 of LNCS, pages 403-418. Springer, 2000.

    A. Bouajjani and T. Touili. On computing reachability sets of process rewrite systems. In Proceedings 16th International Conference Term Rewriting and Applications (RTA), volume 3467 of LNCS, pages 484-499. Springer, 2005.

    L. Bravo, J. Cheney, and I. Fundulaki. ACCOn: Checking Consistency of XML Write-Access Control Policies. In Proceedings 11th International Conference on Extending Database Technology (EDBT), volume 261 of ACM International Conference Proceeding Series, pages 715-719. ACM, 2008.

    S. C. Lim and S. H. Son. Access Control of XML Documents Considering Update Operations. In Proceedings of ACM Workshop on XML Security. ACM, 2003.

    D. Chamberlin and J. Robie. XQuery Update Facility 1.0. W3C Candidate Recommendation. http://www.w3.org/TR/xquery-update-10/, 2009.

    H. Comon, M. Dauchet, R. Gilleron, C. Löding, F. Jacquemard, D. Lugiez, S. Tison, and M. Tommasi. Tree automata techniques and applications. Available on: http://tata.gforge.inria.fr/, 2007.

    E. Damiani, S. D. C. di Vimercati, S. Paraboschi, and P. Samarati. Securing XML Documents. In Proceedings of the 7th International Conference on Extending Database Technology (EDBT), volume 1777 of LNCS, pages 121-135. Springer, 2000.

    N. Dershowitz and J. P. Jouannaud. Rewrite systems. In Handbook of Theoretical Computer Science (Vol. B: Formal Models and Semantics), pages 243-320, Amsterdam, North-Holland, 1990.

    J. Engelfriet, S. Maneth, and H. Seidl. Deciding Equivalence of Top-Down XML Transformations in Polynomial Time. J. Comput. Syst. Sci., 75(5):271-286, 2009.

    J. Engelfriet and H. Vogler. Macro Tree Transducers. J. Comp. Syst. Sci., 31:71-146, 1985.

    W. Fan, C.-Y. Chan, and M. Garofalakis. Secure XML Querying with Security Views. In Proceedings of the 2004 ACM SIGMOD international conference on Management of data (SIGMOD), pages 587-598. ACM, 2004.

    G. Feuillade, T. Genet, and V. Viet Triem Tong. Reachability Analysis over Term Rewriting Systems. Journal of Automated Reasoning, 33(3-4):341-383, 2004.

    A. Frisch and H. Hosoya. Towards Practical Typechecking for Macro Tree Transducers. In Proceedings of the 11th International Symposium on Database Programming Languages (DBPL), volume 4797 of LNCS, pages 246-260. Springer, 2007.

    I. Fundulaki and S. Maneth. Formalizing XML Access Control for Update Operations. In Proceedings of the 12th ACM symposium on Access control models and technologies (SACMAT), pages 169-174. ACM, 2007.

    P. A. Gardner, G. D. Smith, M. J. Wheelhouse, and U. D. Zarfaty. Local Hoare Reasoning about DOM. In Proceedings of the 27th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), pages 261-270. ACM, 2008.

    T. Genet and V. Rusu. Equational approximations for tree automata completion. Journal of Symbolic Computation, 45(5):574-597, 2010.

    R. Gilleron. Decision problems for term rewrite systems and recognizable tree languages. In 8th Annual Symposium on Theoretical Aspects of Computer Science (STACS), volume 480 of LNCS, pages 148-159. Springer, 1991.

    F. Jacquemard and M. Rusinowitch. Closure of Hedge-Automata Languages by Hedge Rewriting. In Proceedings of the 19th International Conference on Rewriting Techniques and Applications (RTA), volume 5117 of LNCS, pages 157-171. Springer, 2008.

    M. Kay. XSL Transformations (XSLT) 2.0. W3C working draft, World Wide Web Consortium, 2003. Available at http://www.w3.org/TR/xslt20.

    C. Löding. Ground Tree Rewriting Graphs of Bounded Tree Width. In Proceedings of the 19th Annual Symposium on Theoretical Aspects of Computer Science (STACS), volume 2285 of LNCS, pages 559-570. Springer, 2002.

    C. Löding and A. Spelten. Transition Graphs of Rewriting Systems over Unranked Trees. In Proceedings 32nd International Symposium on Mathematical Foundations of Computer Science (MFCS), volume 4708 of LNCS, pages 67-77. Springer, 2007.

    S. Maneth, A. Berlea, T. Perst, and H. Seidl. XML Type Checking with Macro Tree Transducers. In 24th ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems (PODS), pages 283-294. ACM, 2005.

    S. Maneth, T. Perst, and H. Seidl. Exact XML Type Checking in Polynomial Time. In Proceedings of the 11th International Conference on Database Theory (ICDT), volume 4353 of LNCS, pages 254-268. Springer, 2007.

    W. Martens and F. Neven. Frontiers of Tractability for Typechecking Simple XML Transformations. In Proceedings of the Twenty-third ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS), pages 23-34. ACM, 2004.

    T. Milo, D. Suciu, and V. Vianu. Typechecking for XML Transformers. J. of Comp. Syst. Sci., 66(1):66-97, 2003.

    N. Moore. The Halting Problem and Undecidability of Document Generation under Access Control for Tree Updates. In Proceedings of the 3rd International Conference on Language and Automata Theory and Applications (LATA), volume 5457 of LNCS, pages 601-613. Springer, 2009.

    M. Murata. Hedge Automata: a Formal Model for XML Schemata. Web page, 2000.

    M. Murata, D. Lee, and M. Mani. Taxonomy of XML Schema Languages using Formal Language Theory. In Extreme Markup Languages, 2000.

    M. Murata, A. Tozawa, M. Kudo, and S. Hada. XML Access Control using Static Analysis. ACM Trans. Inf. Syst. Secur., 9(3):292-324, 2006.

    H. Ohsaki, H. Seki, and T. Takai. Recognizing Boolean Closed A-tree languages with Membership Conditional Rewriting Mechanism. In Proc. of the 14th Int. Conference on Rewriting Techniques and Applications (RTA), volume 2706 of LNCS, pages 483-498. Springer, 2003.

    T. Perst and H. Seidl. Macro Forest Transducers. Information Processing Letters, 89:141-149, 2004.

    T. Schwentick. Automata for XML - A Survey. J. Comput. Syst. Sci., 73(3):289-315, 2007.

    H. Seidl. Deciding Equivalence of Finite Tree Automata. SIAM Journal of Computing, 19(3):424-437, 1990.

    A. Spelten. Rewriting Systems over Unranked Trees. Master's thesis, Diplomarbeit, RWTH Aachen, 2006.

    B. ten Cate. The Expressivity of XPath with Transitive Closure. In Proceedings of the 26th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS), pages 328-337. ACM, 2006. ISBN 1-59593-318-2.

    T. Touili. Computing Transitive Closures of Hedge Transformations. In Proceedings of the 1st International Workshop on Verification and Evaluation of Computer and Communication Systems (VECOS), eWIC Series, British Computer Society, 2007.

    A. Tozawa. Towards Static Type Checking for XSLT. In Proceedings of the 2001 ACM Symposium on Document engineering (DocEng), pages 18-27. ACM, 2001.

    A. Proof of Lemma 1

    In this proof and the following ones, we describe the CF grammars used for defining the horizontal languages of CF-HA transitions as tuples $G = (\Sigma, N, I, \Gamma)$, where $\Sigma$ is a finite alphabet (set of terminal symbols), $N$ is a set of non-terminal symbols, $I \in N$ is the initial non-terminal, and $\Gamma \subseteq N \times (\Sigma \cup N)^*$ is a set of production rules.

    Lemma 1 [Jacquemard and Rusinowitch 2008]. For every extended CF-HA $A$ over $\Sigma$ with collapsing transitions, there exists a CF-HA $A'$ without collapsing transitions such that $L(A') \cap T(\Sigma) = L(A) \cap T(\Sigma)$.

    Proof. Let $G = (Q, N, I, \Gamma)$ and $G_1 = (Q, N_1, I_1, \Gamma_1)$ be two CF grammars over the same finite alphabet $Q$. Below, $G$ and $G_1$ are respectively meant to generate the languages $L$ and $L_1$ of CF-HA transitions $L \to q$ and $a(L_1) \to p$. We assume w.l.o.g. that the sets of non-terminals $N$ and $N_1$ are disjoint. Let $q \in Q$ be a terminal symbol and let $X_q$ be a fresh non-terminal symbol. We consider below the CF grammar

    $$G_1 \circ_q G := \bigl(Q,\ N_1 \cup N \cup \{X_q\},\ I_1,\ \Gamma_1[q \leftarrow X_q] \cup \Gamma[q \leftarrow X_q] \cup \{X_q := q,\ X_q := I\}\bigr)$$

    where $\Gamma[q \leftarrow X_q]$ denotes the set of production rules of $\Gamma$ where every occurrence of the terminal symbol $q$ is replaced by the non-terminal $X_q$. Using this construction, we can get rid of collapsing transitions in CF-HA.

    We assume that $A$ is normalized with state set $Q$ and for each $a \in \Sigma$ and $p \in Q$, we let $G_{a,p}$ be the CF grammar generating the language $L_{a,p}$ in the transition (assumed unique) $a(L_{a,p}) \to p$ of $A$. In order to construct $A'$ out of $A$, we perform the following operations for every collapsing transition $L \to q$ of $A$: (i) delete $L \to q$; (ii) for each $a \in \Sigma$ and $p \in Q$, replace $G_{a,p}$ by $G_{a,p} \circ_q G$, where $G$ is a CF grammar generating $L$. $\Box$
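    The grammar composition used in this proof can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the tuple encoding of grammars, the function names `substitute`, `compose` and `words_up_to`, and the naming `X_<q>` of the fresh non-terminal are all hypothetical and not taken from the paper. The enumeration helper additionally assumes grammars without empty right-hand sides, so that sentential forms never shrink.

```python
# A grammar is (terminals, nonterminals, initial, productions), where a
# production is a pair (nonterminal, tuple of right-hand-side symbols).

def substitute(prods, q, x_q):
    """Replace every occurrence of terminal q in right-hand sides by x_q."""
    return {(lhs, tuple(x_q if sym == q else sym for sym in rhs))
            for lhs, rhs in prods}


def compose(g1, g, q):
    """Build the composed grammar of the proof (hypothetical encoding)."""
    terms1, n1, i1, p1 = g1
    terms, n, i, p = g
    assert not (n1 & n), "non-terminal sets are assumed disjoint"
    x_q = f"X_{q}"  # fresh non-terminal (naming is an assumption)
    assert x_q not in n1 | n
    prods = (substitute(p1, q, x_q) | substitute(p, q, x_q)
             | {(x_q, (q,)), (x_q, (i,))})
    return (terms1 | terms, n1 | n | {x_q}, i1, prods)


def words_up_to(grammar, max_len):
    """Enumerate generated words of length <= max_len (leftmost derivations).

    Assumes no production has an empty right-hand side, so pruning
    sentential forms longer than max_len loses no word.
    """
    _, nonterms, start, prods = grammar
    seen, todo, result = set(), [(start,)], set()
    while todo:
        form = todo.pop()
        if form in seen or len(form) > max_len:
            continue
        seen.add(form)
        idx = next((k for k, s in enumerate(form) if s in nonterms), None)
        if idx is None:          # only terminals left: a generated word
            result.add(form)
            continue
        for lhs, rhs in prods:   # expand the leftmost non-terminal
            if lhs == form[idx]:
                todo.append(form[:idx] + rhs + form[idx + 1:])
    return result
```

    For instance, with $L_1 = \{r\,q\}$ and $L = \{s\,q\}$, the composed grammar generates the words $r\,s^n\,q$ for $n \geq 0$: each occurrence of $q$ may stay as it is or be expanded into a word of $L$, which mirrors repeated applications of the collapsing transition $L \to q$.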

    B. Proof of Theorem 1

    We show in this section the correctness of the automata construction presented in Section 3.2. Let us first recall the statement of Theorem 1 and the construction.

    Theorem 1. Given a HA $A$ on $\Sigma$ and a PTRS $R/A \in \mathrm{UFO}_{reg}$, for every HA language $L$, $\mathit{post}^*_{R/A}(L)$ is the language of an HA of polynomial size, which can be constructed in PTIME in the size of $R/A$ and of an HA recognizing $L$.

    Proof. Let $A = (\Sigma, P, P^f, \Theta)$ and let $A_L = (\Sigma, Q_L, Q_L^f, \Delta_L)$ recognizing $L$. We assume that both $A$ and $A_L$ are normalized and that their state sets $P$ and $Q_L$ are disjoint. We construct the HA $A' = (\Sigma, P \cup Q_L, Q_L^f, \Delta')$ recognizing $\mathit{post}^*_{R/A}(L)$. For each $a \in \Sigma$, $q \in Q_L$, let $L_{a,q}$ be the regular language in the transition (assumed unique) $a(L_{a,q}) \to q \in \Delta_L$, and let $B_{a,q} = (Q_L, S_{a,q}, i_{a,q}, \{f_{a,q}\}, \Gamma_{a,q})$ be a finite automaton recognizing $L_{a,q}$. The sets of states $S_{a,q}$ are assumed pairwise disjoint. Let $S$ be the disjoint union of all $S_{a,q}$ for all $a \in \Sigma$ and $q \in Q_L$.

    For the construction of $\Delta'$, we develop a set of transition rules $\Gamma' \subseteq S \times (P \cup Q_L) \times S$. Initially, we let $\Gamma'$ be the union $\Gamma_0$ of all $\Gamma_{a,q}$ for $a \in \Sigma$, $q \in Q_L$, and we complete $\Gamma'$ iteratively by analyzing the different cases of update rules of $R/A$. At each step, for each $a \in \Sigma$ and $q \in Q_L$, we let $B'_{a,q}$ be the automaton $(P \cup Q_L, S, i_{a,q}, \{f_{a,q}\}, \Gamma')$. For the sake of conciseness we make no distinction between an automaton $B'_{a,q}$ and its language $L(B'_{a,q})$.

    REN: for every $a(x) \to b(x) \in R/A$ and $q \in Q_L$, we add two $\varepsilon$-transitions $(i_{b,q}, \varepsilon, i_{a,q})$ and $(f_{a,q}, \varepsilon, f_{b,q})$ to $\Gamma'$.

    $\mathsf{INS}_{\mathsf{first}}$: for every $a(x) \to a(p\,x) \in R/A$ and $q \in Q_L$, we add one looping transition $(i_{a,q}, p, i_{a,q})$ to $\Gamma'$.

    $\mathsf{INS}_{\mathsf{last}}$: for every $a(x) \to a(x\,p) \in R/A$ and $q \in Q_L$, we add one looping transition $(f_{a,q}, p, f_{a,q})$ to $\Gamma'$.

    $\mathsf{INS}_{\mathsf{into}}$: for every $a(xy) \to a(x\,p\,y) \in R/A$, $q \in Q_L$ and $s \in S$ reachable from $i_{a,q}$ using the transitions of $\Gamma'$, we add one looping transition $(s, p, s)$ to $\Gamma'$.

    $\mathsf{INS}_{\mathsf{before}}$: for every $a(x) \to p\,a(x) \in R/A$, $q \in Q_L$ and state $s \in S$ such that $L(B'_{a,q}) \neq \emptyset$ and there exists a transition $(s, q, s') \in \Gamma'$, we add one looping transition $(s, p, s)$ to $\Gamma'$.

    $\mathsf{INS}_{\mathsf{after}}$: for every $a(x) \to a(x)\,p \in R/A$, $q \in Q_L$ and $s' \in S$ such that $L(B'_{a,q}) \neq \emptyset$ and there exists a transition $(s, q, s') \in \Gamma'$, we add one looping transition $(s', p, s')$ to $\Gamma'$.

    $\mathsf{RPL}_1$: for every $a(x) \to p \in R/A$, $q \in Q_L$, and $s, s' \in S$ such that $L(B'_{a,q}) \neq \emptyset$ and there exists a transition $(s, q, s') \in \Gamma'$, we add one transition $(s, p, s')$ to $\Gamma'$.

    DEL: for every $a(x) \to () \in R/A$, $q \in Q_L$, and $s, s' \in S$ such that $L(B'_{a,q}) \neq \emptyset$ and there exists a transition $(s, q, s') \in \Gamma'$, we add one $\varepsilon$-transition $(s, \varepsilon, s')$ to $\Gamma'$.

    We iterate the above operations until a fixpoint is reached (only a finite number of transitions can be added to $\Gamma'$ this way). Finally, we let $\Delta' := \Theta \cup \{\, a(B'_{a,q}) \to q \mid a \in \Sigma,\ q \in Q_L,\ L(B'_{a,q}) \neq \emptyset \,\}$.
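    The saturation loop above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `saturate` and `accepts`, the encoding of update rules as triples `(kind, a, arg)`, and the use of `None` for $\varepsilon$ are all hypothetical. It assumes one horizontal automaton per pair $(a, q)$ (normalization), and it approximates the test $L(B'_{a,q}) \neq \emptyset$ by plain reachability of the final state, which is only adequate if every state used as a label is inhabited.

```python
EPS = None  # label used for epsilon-transitions (an encoding assumption)


def saturate(base, rules):
    """base: {(a, q): (init, final, transitions)}; rules: [(kind, a, arg)]."""
    gamma = set()
    for _, _, trans in base.values():
        gamma |= trans  # initial set: union of all horizontal transitions

    def reach(start):
        # States reachable from `start` through any transition of gamma.
        seen, todo = {start}, [start]
        while todo:
            s = todo.pop()
            for s1, _, s2 in gamma:
                if s1 == s and s2 not in seen:
                    seen.add(s2)
                    todo.append(s2)
        return seen

    changed = True
    while changed:  # iterate until no new transition is produced
        new = set()
        for kind, a, arg in rules:
            for (sym, q), (init, final, _) in base.items():
                if sym != a:
                    continue
                if kind == "REN":
                    init_b, final_b, _ = base[(arg, q)]
                    new |= {(init_b, EPS, init), (final, EPS, final_b)}
                elif kind == "INS_first":
                    new.add((init, arg, init))
                elif kind == "INS_last":
                    new.add((final, arg, final))
                elif kind == "INS_into":
                    new |= {(s, arg, s) for s in reach(init)}
                elif final in reach(init):  # non-emptiness, by reachability
                    for s, label, s2 in gamma:
                        if label != q:
                            continue
                        if kind == "INS_before":
                            new.add((s, arg, s))
                        elif kind == "INS_after":
                            new.add((s2, arg, s2))
                        elif kind == "RPL1":
                            new.add((s, arg, s2))
                        elif kind == "DEL":
                            new.add((s, EPS, s2))
        changed = not new <= gamma
        gamma |= new
    return gamma


def accepts(gamma, init, final, word):
    """Check whether `word` labels a path from init to final in gamma."""
    def closure(states):
        seen, todo = set(states), list(states)
        while todo:
            s = todo.pop()
            for s1, label, s2 in gamma:
                if s1 == s and label is EPS and s2 not in seen:
                    seen.add(s2)
                    todo.append(s2)
        return seen

    current = closure({init})
    for letter in word:
        current = closure({s2 for s1, label, s2 in gamma
                           if s1 in current and label == letter})
    return final in current
```

    Each pass adds at most $|S|^2 \cdot (|P| + 1)$ transitions in total over the whole run, so the loop terminates after polynomially many iterations, in line with the PTIME claim of the theorem.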

    Let us now show that $L(A') = \mathit{post}^*_{R/A}(L)$.

    Lemma 2. $L(A') \subseteq \mathit{post}^*_{R/A}(L)$.

    Proof. We show more generally that for all $t \in L(A', q)$, $q \in Q_L$, there exists $u \in L(A_L, q)$ such that $u \to^*_{R/A} t$. The proof is by induction on the multiset $M$ of the applications of horizontal transitions of $\Gamma'$ not in $\Gamma_0$ in a run of $A'$ on $t$ leading to state $q$.

    Base case. If all the horizontal transitions are in $\Gamma_0$, then by construction $t \in L(A_L, q)$ and we are done.

    Induction step. We analyse the cases causing the addition of a transition of $\Gamma' \setminus \Gamma_0$.

    REN: let $t \in L(A', q)$ ($q \in Q_L$), and assume that an $\varepsilon$-transition $(i_{b,q}, \varepsilon, i_{a,q})$ is used in a run of $A'$ on $t$, and that this $\varepsilon$-transition was added to $\Gamma'$ because $a(x) \to b(x) \in R/A$. Let

    $$t = t[b(h)] \to^*_{A'} t[b(q_1 \ldots q_n)] \to_{A'} t[q_0] \to^*_{A'} q$$

    be a reduction of $A'$ such that the above $\varepsilon$-transition is involved in the step $t[b(q_1 \ldots q_n)] \to_{A'} t[q_0]$, where the transition $b(B'_{b,q_0}) \to q_0$ is applied. Hence $q_1 \ldots q_n \in L(B'_{b,q_0})$, with $i_{b,q} \xrightarrow[B'_{b,q_0}]{q_1 \ldots q_n} f_{b,q}$, and the first step in this computation is $(i_{b,q}, \varepsilon, i_{a,q})$. The last step must be $(f_{a,q}, \varepsilon, f_{b,q})$, using an $\varepsilon$-transition added to $\Gamma'$ in the same step as $(i_{b,q}, \varepsilon, i_{a,q})$. By deleting these first and last steps, we get $i_{a,q} \xrightarrow[B'_{a,q_0}]{q_1 \ldots q_n} f_{a,q}$, hence $q_1 \ldots q_n \in L(B'_{a,q_0})$. Therefore, we have a reduction $t' = t[a(h)] \to^*_{A'} t[a(q_1 \ldots q_n)] \to_{A'} t[q_0] \to^*_{A'} q$ (hence $t' \in L(A', q)$) with a measure $M$ strictly smaller than the above reduction for the recognition of $t$. By induction hypothesis, it follows that there exists $u \in L(A_L, q)$ such that $u \to^*_{R/A} t'$. Since $t' = t[a(h)] \to_{R/A} t[b(h)] = t$, with $a(x) \to b(x)$, we conclude that $u \to^*_{R/A} t$.

    $\mathsf{INS}_{\mathsf{first}}$: let $t \in L(A', q)$ ($q \in Q_L$), and assume that a transition $(i_{a,q}, p, i_{a,q})$ is used in a run of $A'$ on $t$, and that this transition was added to $\Gamma'$ because $a(x) \to a(p\,x) \in R/A$. Let

    $$t = t[a(t_p\,h)] \to^*_{A'} t[a(p\,q_1 \ldots q_n)] \to_{A'} t[q_0] \to^*_{A'} q$$

    be a reduction of $A'$, with $t_p \in L(A, p)$, such that the above transition is involved in the step $t[a(p\,q_1 \ldots q_n)] \to_{A'} t[q_0]$, where the transition $a(B'_{a,q_0}) \to q_0$ is applied. Hence $p\,q_1 \ldots q_n \in L(B'_{a,q_0})$, with $i_{a,q} \xrightarrow[B'_{a,q_0}]{p\,q_1 \ldots q_n} f_{a,q}$, and the first step in this computation is $(i_{a,q}, p, i_{a,q})$. By deleting this first step, we get $i_{a,q} \xrightarrow[B'_{a,q_0}]{q_1 \ldots q_n} f_{a,q}$, hence $q_1 \ldots q_n \in L(B'_{a,q_0})$. Therefore, we have a reduction $t' = t[a(h)] \to^*_{A'} t[a(q_1 \ldots q_n)] \to_{A'} t[q_0] \to^*_{A'} q$ (hence $t' \in L(A', q)$) with a measure $M$ strictly smaller than the above reduction for the recognition of $t$. By induction hypothesis, it follows that there exists $u \in L(A_L, q)$ such that $u \to^*_{R/A} t'$. Since $t' = t[a(h)] \to_{R/A} t[a(t_p\,h)] = t$, with $a(x) \to a(p\,x)$, we conclude that $u \to^*_{R/A} t$.

    $\mathsf{INS}_{\mathsf{last}}$: this case is similar to the previous one.

    $\mathsf{INS}_{\mathsf{into}}$: let $t \in L(A', q)$ ($q \in Q_L$), and assume that a transition $(s, p, s)$ is used in a run of $A'$ on $t$, and that this transition was added to $\Gamma'$ because $a(xy) \to a(x\,p\,y) \in R/A$. Let

    $$t = t[a(h\,t_p\,\ell)] \to^*_{A'} t[a(q_1 \ldots q_n\,p\,q'_1 \ldots q'_m)] \to_{A'} t[q_0] \to^*_{A'} q$$

    be a reduction of $A'$, with $t_p \in L(A, p)$, such that the above transition $(s, p, s)$ is involved in the step $t[a(q_1 \ldots q_n\,p\,q'_1 \ldots q'_m)] \to_{A'} t[q_0]$, where the transition $a(B'_{a,q_0}) \to q_0$ is applied. More precisely, assume that $q_1 \ldots q_n\,p\,q'_1 \ldots q'_m \in L(B'_{a,q_0})$, because $i_{a,q} \xrightarrow[B'_{a,q_0}]{q_1 \ldots q_n} s \xrightarrow[B'_{a,q_0}]{p} s \xrightarrow[B'_{a,q_0}]{q'_1 \ldots q'_m} f_{a,q}$. By deleting the middle step $(s, p, s)$, we get $i_{a,q} \xrightarrow[B'_{a,q_0}]{q_1 \ldots q_n\,q'_1 \ldots q'_m} f_{a,q}$, hence $q_1 \ldots q_n\,q'_1 \ldots q'_m \in L(B'_{a,q_0})$. Therefore, we have a reduction $t' = t[a(h\,\ell)] \to^*_{A'} t[a(q_1 \ldots q_n\,q'_1 \ldots q'_m)] \to_{A'} t[q_0] \to^*_{A'} q$ (hence $t' \in L(A', q)$) with a measure $M$ strictly smaller than the above reduction for the recognition of $t$. By induction hypothesis, it follows that there exists $u \in L(A_L, q)$ such that $u \to^*_{R/A} t'$. Since $t' = t[a(h\,\ell)] \to_{R/A} t[a(h\,t_p\,\ell)] = t$, with $a(xy) \to a(x\,p\,y)$, we conclude that $u \to^*_{R/A} t$.

    $\mathsf{INS}_{\mathsf{before}}$: let $t \in L(A', q)$ ($q \in Q_L$), and assume that a transition $(s, p, s)$ is used in a run of $A'$ on $t$, and that this transition was added to $\Gamma'$ because $a(x) \to p\,a(x) \in R/A$ and because there exists $(s, q_0, s') \in \Gamma'$ for some $q_0 \in Q_L$ with $L(B'_{a,q_0}) \neq \emptyset$. Let

    $$t = t[t_p\,a(h)] \to^*_{A'} t[p\,q_0] \to^*_{A'} q$$

    be a reduction of $A'$, with $t_p \in L(A, p)$, involving the transition $(s, p, s)$ in $s \xrightarrow[B'_{b,q}]{p\,q_0} s'$, for some $b$. Removing the transition $(s, p, s)$, we have $s \xrightarrow[B'_{b,q}]{q_0} s'$ and a reduction $t' = t[a(h)] \to^*_{A'} t[q_0] \to^*_{A'} q$ (meaning $t' \in L(A', q)$) with a measure $M$ strictly smaller than the above reduction for the recognition of $t$. By induction hypothesis, it follows that there exists $u \in L(A_L, q)$ such that $u \to^*_{R/A} t'$. Since $t' = t[a(h)] \to_{R/A} t[t_p\,a(h)] = t$, with $a(x) \to p\,a(x)$, we conclude that $u \to^*_{R/A} t$.

    $\mathsf{INS}_{\mathsf{after}}$: this case is similar to the previous one.

    $\mathsf{RPL}_1$: let $t \in L(A', q)$ ($q \in Q_L$), and assume that a horizontal transition $(s, p, s')$ is used in a run of $A'$ on $t$, and that this transition was added to $\Gamma'$ because $a(x) \to p \in R/A$ and because there exists $(s, q_0, s') \in \Gamma'$ for some $q_0 \in Q_L$ such that $L(B'_{a,q_0}) \neq \emptyset$. Let

    $$t = t[t_p] \to^*_{A'} t[p] \to^*_{A'} q$$

    be a reduction of $A'$, with $t_p \in L(A, p)$, involving the added transition $(s, p, s')$ in $s \xrightarrow[B'_{b,q'}]{p} s'$, for some $b$ and some $q' \in Q_L$. Replacing the transition $(s, p, s')$ with $(s, q_0, s')$, we obtain $s \xrightarrow[B'_{b,q'}]{q_0} s'$ and a reduction $t' = t[a(h)] \to^*_{A'} t[q_0] \to^*_{A'} q$ (meaning $t' \in L(A', q)$). The measure $M$ of this latter reduction is strictly smaller than that of the above reduction for the recognition of $t$, because the transition $(s, q_0, s')$ belongs to $\Gamma_0$ (no such transition can be added by the above procedure). By induction hypothesis, it follows that there exists $u \in L(A_L, q)$ such that $u \to^*_{R/A} t'$. Since $t' = t[a(h)] \to_{R/A} t[t_p] = t$, with $a(x) \to p$, we conclude that $u \to^*_{R/A} t$.

    DEL: let $t \in L(A', q)$ ($q \in Q_L$), and assume that a horizontal transition $(s, \varepsilon, s')$ is used in a run of $A'$ on $t$, and that this transition was added to $\Gamma'$ because $a(x) \to () \in R/A$ and because there exists $(s, q_0, s') \in \Gamma'$ for some $q_0 \in Q_L$ such that $L(B'_{a,q_0}) \neq \emptyset$. Replacing this $\varepsilon$-transition $(s, \varepsilon, s')$ with $(s, q_0, s')$ in a reduction $t \to^*_{A'} q$, we obtain a reduction

    $$t' = t[a(h)] \to^*_{A'} t[q_0] \to^*_{A'} q.$$

    It means that $t' \in L(A', q)$. The measure $M$ of this latter reduction is strictly smaller than that of the above reduction for the recognition of $t$, because the transition $(s, q_0, s')$ belongs to $\Gamma_0$ (no such transition can be added by the above procedure). By induction hypothesis, it follows that there exists $u \in L(A_L, q)$ such that $u \to^*_{R/A} t'$. Since $t' = t[a(h)] \to_{R/A} t$, with $a(x) \to ()$, we conclude that $u \to^*_{R/A} t$.

    (end Lemma, direction $\subseteq$) $\Box$
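    To make the induction step concrete, here is a small worked instance of the first-child insertion case. It is a hypothetical example, not taken from the paper: we assume a leaf symbol $c$ with $c() \to q_1$ in $A_L$, a horizontal language $L_{a,q_0} = \{q_1\}$ recognized with the single transition $(i_{a,q_0}, q_1, f_{a,q_0})$, and the rule $a(x) \to a(p\,x)$, for which the construction adds the loop $(i_{a,q_0}, p, i_{a,q_0})$.

```latex
% Hypothetical instance: c() -> q_1 in A_L, t_p in L(A, p), rule a(x) -> a(p x).
\begin{align*}
t  &= a(t_p\; c) \to^*_{A'} a(p\; q_1) \to_{A'} q_0
   && \text{uses } i_{a,q_0} \xrightarrow{p} i_{a,q_0} \xrightarrow{q_1} f_{a,q_0},\\
t' &= a(c) \to^*_{A'} a(q_1) \to_{A'} q_0
   && \text{uses only } i_{a,q_0} \xrightarrow{q_1} f_{a,q_0},\\
t' &\to_{R/A} t
   && \text{by } a(x) \to a(p\,x).
\end{align*}
```

    Removing the added loop from the run on $t$ yields a run on $t'$ that uses only original horizontal transitions, so the base case applies to $t'$ and one can take $u = t'$.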

    Lemma 3. $L(A') \supseteq \mathit{post}^*_{R/A}(L)$.

    Proof. We show that for all $t \in L$, if $t \to^*_{R/A} u$, then $u \in L(A')$, by induction on the length of the rewrite sequence.

    Base case (0 rewrite steps). In this case, $u = t \in L$ and we are done since $L = L(A_L) \subseteq L(A')$ by construction.

    Induction step. Assume that $u \to^{+}_{R/A} t$ with $u \in L$ (in the rest of the proof, $u$ denotes the initial term of the rewrite sequence and $t$ its final term). We analyse the type of rewrite rule used in the last rewrite step.

    REN. The last rewrite step of the sequence involves a rewrite rule of the form $a(x) \to b(x) \in R/A$:

    $$u \to^*_{R/A} t[a(h)] \to_{R/A} t[b(h)] = t.$$

    By induction hypothesis, $t[a(h)] \in L(A')$. Hence there exists a reduction sequence $t[a(h)] \to^*_{A'} t[a(q_1 \ldots q_n)] \to_{A'} t[q_0] \to^*_{A'} q_f \in Q_L^f$ with $q_1 \ldots q_n \in L(B'_{a,q_0})$, i.e. $i_{a,q_0} \xrightarrow[B'_{a,q_0}]{q_1 \ldots q_n} f_{a,q_0}$. By construction, the $\varepsilon$-transitions $(i_{b,q_0}, \varepsilon, i_{a,q_0})$ and $(f_{a,q_0}, \varepsilon, f_{b,q_0})$ have been added to $\Gamma'$. Hence $i_{b,q_0} \xrightarrow[B'_{b,q_0}]{q_1 \ldots q_n} f_{b,q_0}$ and $q_1 \ldots q_n \in L(B'_{b,q_0})$. Therefore there exists a reduction sequence $t = t[b(h)] \to^*_{A'} t[b(q_1 \ldots q_n)] \to_{A'} t[q_0] \to^*_{A'} q_f \in Q_L^f$, and $t \in L(A')$.

    $\mathsf{INS}_{\mathsf{first}}$. The last rewrite step of the sequence involves a rewrite rule of the form $a(x) \to a(p\,x) \in R/A$, with $p \in P$:

    $$u \to^*_{R/A} t[a(h)] \to_{R/A} t[a(t_p\,h)] = t$$

    with $t_p \in L(A, p)$. By induction hypothesis, $t[a(h)] \in L(A')$. Hence there exists a reduction sequence $t[a(h)] \to^*_{A'} t[a(q_1 \ldots q_n)] \to_{A'} t[q_0] \to^*_{A'} q_f \in Q_L^f$ with $q_1 \ldots q_n \in L(B'_{a,q_0})$, i.e. $i_{a,q_0} \xrightarrow[B'_{a,q_0}]{q_1 \ldots q_n} f_{a,q_0}$. By construction, the transition $(i_{a,q_0}, p, i_{a,q_0})$ has been added to $\Gamma'$. Hence $i_{a,q_0} \xrightarrow[B'_{a,q_0}]{p} i_{a,q_0} \xrightarrow[B'_{a,q_0}]{q_1 \ldots q_n} f_{a,q_0}$, i.e. $p\,q_1 \ldots q_n \in L(B'_{a,q_0})$, and there exists a reduction sequence

    $$t = t[a(t_p\,h)] \to^*_{A'} t[a(p\,q_1 \ldots q_n)] \to_{A'} t[q_0] \to^*_{A'} q_f \in Q_L^f.$$

    It follows that $t \in L(A')$.

    $\mathsf{INS}_{\mathsf{last}}$. The case where the last rewrite step of the sequence involves a rewrite rule of the form $a(x) \to a(x\,p) \in R/A$, with $p \in P$, is similar to the previous one.

    $\mathsf{INS}_{\mathsf{into}}$. The last rewrite step of the sequence involves a rewrite rule of the form $a(xy) \to a(x\,p\,y) \in R/A$, with $p \in P$:

    $$u \to^*_{R/A} t[a(h\,\ell)] \to_{R/A} t[a(h\,t_p\,\ell)] = t$$

    with $t_p \in L(A, p)$. By induction hypothesis, $t[a(h\,\ell)] \in L(A')$. Hence there exists a reduction sequence $t[a(h\,\ell)] \to^*_{A'} t[a(q_1 \ldots q_n\,q'_1 \ldots q'_m)] \to_{A'} t[q_0] \to^*_{A'} q_f \in Q_L^f$ with $q_1 \ldots q_n\,q'_1 \ldots q'_m \in L(B'_{a,q_0})$, i.e. $i_{a,q_0} \xrightarrow[B'_{a,q_0}]{q_1 \ldots q_n} s \xrightarrow[B'_{a,q_0}]{q'_1 \ldots q'_m} f_{a,q_0}$ for some state $s \in S$. By construction, the looping transition $(s, p, s)$ has been added to $\Gamma'$. Hence $i_{a,q_0} \xrightarrow[B'_{a,q_0}]{q_1 \ldots q_n} s \xrightarrow[B'_{a,q_0}]{p} s \xrightarrow[B'_{a,q_0}]{q'_1 \ldots q'_m} f_{a,q_0}$, i.e. $q_1 \ldots q_n\,p\,q'_1 \ldots q'_m \in L(B'_{a,q_0})$, and there exists a reduction sequence

    $$t = t[a(h\,t_p\,\ell)] \to^*_{A'} t[a(q_1 \ldots q_n\,p\,q'_1 \ldots q'_m)] \to_{A'} t[q_0] \to^*_{A'} q_f \in Q_L^f.$$

    It follows that $t \in L(A')$.

    $\mathsf{INS}_{\mathsf{before}}$. The last rewrite step of the sequence involves a rewrite rule of the form $a(x) \to p\,a(x) \in R/A$, with $p \in P$:

    $$u \to^*_{R/A} t[a(h)] \to_{R/A} t[t_p\,a(h)] = t$$

    with $t_p \in L(A, p)$. By induction hypothesis, $t[a(h)] \in L(A')$. Hence there exists a reduction sequence $t[a(h)] \to^*_{A'} t[a(q_1 \ldots q_n)] \to_{A'} t[q_0] \to^*_{A'} q_f \in Q_L^f$. Hence $L(B'_{a,q_0}) \neq \emptyset$ and at some point of the reduction, a transition $(s, q_0, s') \in \Gamma'$ is involved. By construction, the transition $(s, p, s)$ has been added to $\Gamma'$. Hence there exists a reduction sequence $t = t[t_p\,a(h)] \to^*_{A'} t[p\,q_0] \to^*_{A'} q_f \in Q_L^f$. It follows that $t \in L(A')$.

    $\mathsf{INS}_{\mathsf{after}}$. The case where the last rewrite step of the sequence involves a rewrite rule of the form $a(x) \to a(x)\,p \in R/A$, with $p \in P$, is similar to the previous one.

    $\mathsf{RPL}_1$. The last rewrite step of the sequence involves a rewrite rule of the form $a(x) \to p \in R/A$, with $p \in P$:

    $$u \to^*_{R/A} t[a(h)] \to_{R/A} t[t_p] = t$$

    with $t_p \in L(A, p)$. By induction hypothesis, $t[a(h)] \in L(A')$. Hence there exists a reduction sequence $t[a(h)] \to^*_{A'} t[a(q_1 \ldots q_n)] \to_{A'} t[q_0] \to^*_{A'} q_f \in Q_L^f$. Hence $L(B'_{a,q_0}) \neq \emptyset$ and at some point of the reduction, a transition $(s, q_0, s') \in \Gamma'$ is applied. By construction, the transition $(s, p, s')$ has been added to $\Gamma'$, and there exists a reduction sequence $t = t[t_p] \to^*_{A'} t[p] \to^*_{A'} q_f \in Q_L^f$. It follows that $t \in L(A')$.

    DEL. The last rewrite step of the sequence involves a rewrite rule of the form $a(x) \to () \in R/A$:

    $$u \to^*_{R/A} t[a(h)] \to_{R/A} t[()] = t.$$

    By induction hypothesis, $t[a(h)] \in L(A')$. Hence there exists a reduction sequence $t[a(h)] \to^*_{A'} t[a(q_1 \ldots q_n)] \to_{A'} t[q_0] \to^*_{A'} q_f \in Q_L^f$. Hence $L(B'_{a,q_0}) \neq \emptyset$ and at some point of the reduction, a transition $(s, q_0, s') \in \Gamma'$ is applied. By construction, the $\varepsilon$-transition $(s, \varepsilon, s')$ has been added to $\Gamma'$, and there exists a reduction sequence $t \to^*_{A'} q_f \in Q_L^f$, hence $t \in L(A')$.

    (end Lemma, direction $\supseteq$) $\Box$ (end Theorem) $\Box$

    C. Proof of Theorem 2

    We show in this section the correctness of the automata construction presented in Section 3.3, after recalling the statement of Theorem 2 and the construction.

    Theorem 2. Given a HA $A$ on $\Sigma$ and a PTRS $R/A \in \mathrm{UFO}^{+}$, for every CF-HA term language $L$, $\mathit{post}^*_{R/A}(L)$ is the language of a CF-HA of polynomial size, which can be constructed in PTIME in the size of $R/A$ and of a CF-HA recognizing $L$.

    Proof. Let $A = (\Sigma, P, P^f, \Theta)$ and let us assume that it is normalized. Let $A_L = (\Sigma, Q_L, Q_L^f, \Delta_L)$ be a CF-HA recognizing $L$, normalized and without collapsing transitions. The state sets $P$ and $Q_L$ are assumed disjoint. We shall construct a CF-HA extended with collapsing transitions $A' = (\Sigma, P \cup Q_L, Q_L^f, \Delta')$ recognizing $\mathit{post}^*_{R/A}(L)$. It follows that $\mathit{post}^*_{R/A}(L)$