Lambek Grammars Are Context Free.pdf

download Lambek Grammars Are Context Free.pdf

of 6

Transcript of Lambek Grammars Are Context Free.pdf

  • 7/29/2019 Lambek Grammars Are Context Free.pdf

    1/6

    Lambek Grammars Are Context FreeM. PentusDepartment of Mathematical LogicFaculty of Mechanics and MathematicsMoscow State UniversityMoscow, RUSSIA, 119899

    AbstractI n this paper the Chomsky Conjecture i s proved: alllanguages recognized by the Lambek calculus are con-text free.IntroductionThe notion of a basic categorial grammar was in-troduced in 111 In the same paper i twas proved thatbasic categorial grammars are precisely the context-free ones.Another kind of categorial grammars was intro-duced by J . Lambek [SI. These grammars are basedon a syntactic calculus, presently knownas the Lam-bek calculus (cf. [2] for its semantic interpretations).Chomsky [SI conjectured that these rammars are alsoequivalent to context-free ones. Inq7]Cohen provedthat every basic categorial grammar (and, thus,every context-free grammar) sequivalent toaLambekgrammar. He also proposed a proof of the converse.However, as pointed out in [3], this proof contains anerror. Buszkowski proved that some special kinds ofLambek grammars are context-free [3, 4, 51 Thesegrammars use weakly unidirectional typesor types oforder at most two.The main result of this paper (Theorem 2) saysthat Lambek grammars generate only context-free lan-guages. Thus they are equivalent to context-free gram-mars and also to basic categorial grammars.1 Preliminaries1.1 Lambek calculusWe consider the syntactic calculus introduced in[8]: The types of the Lambek calculus are built ofprimitive types pl , 2, .. , and three binary connec-tives *,\,/. We shall denote the set of all typesby T p. Capital letters A , B , .. ange over types.Capital Greek letters range over finite (possiblyempty) sequences of types. Sequents of the Lambekcalculus are of the formr+A, where I? isa nonemptysequence of types.Axioms: pi-+pi

    Rules:

    A I I j B--+\)I I +A\B where I I isnot emptyI I A+BII-+B/A (-+/) where I I is not emptyThe cut-elimination theorem for this calculus isWe writeL F r-+A if the seuuent r - + A s derivallleproved in [8].in the Lambek calculus.

    Definition. The length of a type is defined as thetotal number of primitive type occurrences in the type.IlPill*1 llA*Bll =llA\Bll =IWBII I I AI I+II ~~l1.2 Lambek grammars and context-freegrammarsDefinition. We assume that a finite alphabet7 andadistinguished type D are given. A Lambek grammar

    4291043-6871/9303.000 1993 IEEE

    Authorized licensed use limited to: Tec de Monterrey. Downloaded on July 23, 2009 at 19:13 from IEEE Xplore. Restrictions apply.

  • 7/29/2019 Lambek Grammars Are Context Free.pdf

    2/6

    isamappingf such that, for all t E 7 , (t) c Tpandf(t) is finite.The language generated by the Lambek grammaris defined as the set of all expressions tl . tn overthe alphabet 7 for which there exists a derivablesequent B1 . Bn+D such that Bi E f ti ) for alli I . We shall denote this language by L [ 7 ,D , ).Definition. We assume that two disjoint alphabets7andW are given. The elements of 7 are called termi-nal symbols and those of W are auxiliary symbols.A context-free rewrite rule is of the formX e,whereX is an auxiliary symbol and e isaword in thealphabet 7 U W.A context-free grammar isafinite set R of context-free rewrite rules, with one auxiliary symbol Sdesig-nated its start symbol.By G ( 7 ,W, S,RJ we denote the set of all expres-sions over the alpha et7 U W that arise through somefinite sequence of rewritings of the start symbol Sviathe rules ofa.The language generated by the context-free gram-mar is defined as

    ~(7,, ,a)*G(7,W, , )n7+,where7+denotes the set of all nonempty expressionsover the alphabet7.2 Main resultIn this section we show that every language recog-nized by a Lambek grammar can also be generatedby a context-free grammar. The crucial point is thatevery sequent B1 .. B,+D derivable in the Lambekcalculus follows immediately (i .e., by means of the cutrule only) from some short derivable sequents contain-ingat most three types each, where none of the typesis longer than the longest type in B1 . .. Bn+D (cf.Theorem 1). The proof of Theorem1will be carriedout later in Section 3.In order to formalize the notion of immediate con-sequence, weintroduce for each natural numbermtwocalculi Lcut, and Lcut; .Definition. A sequent A1 . . An+B is an axiom ofLcut, iff(1) 5 2;(2) the sequent A1 .. . An+B is derivable in the(3) IlBll 5mand IlAill 5m for all i 5n.The only rule of Lcut, is (CUT).Definition. The calculus Lcut, has the same axiomsas Lcut,. The only rule of Lcut; is (CUT) withthe restriction that the left premise@-+Bmust be anaxiom of Lcut; .Theorem1 Lcut, t B1 ... Bn+D i and only if[[Bill5m for all i 5 n, llDl{< m, and

    Lambek calculus;

    L I- B1 .. . Bn+D.

    The theorem will be proved in Section 3.Lemma 1 Lcut; t @+A if and only ifLcut, I- @+A.PROOF. The 'only if' 'part is obvious. The 'if'part is proved by induction on the length of toheLcut;-derivation of the left premise of a cut. A deriva-tion of the form

    will be rearranged in the following way.I ' B A + A @All-C (CUT)

    (CUT)@+B Q I ' B A I I +CO I ' @A I I +C

    Theorem2 F or any Lambek grammar there exists acontext-free grammar such that the languages gener-ated by these grammars coincide.PROOF.f a fixed alphabet7, designated type D,and a mapping f are given, then the set of types rel-evant in the definition of L 7,D,f ) is finite. Idetm be the maximum of the(lngths of these types.Then I ID1I5m and, for any t E 7, or any B E f (t),

    The set of primitive types involved in the gram-mar isalso finite. Below we shall consider only typesconsistin of these primitive types.We tai e as the alphabet of auxiliary symbols Wthe set of all types not longer than m (and containingonly relevant primitive types).

    IlBlI I

    W+{A E TP I I lAl lS m}We take the distinguished typeD as the start symbolof the context-free grammar.The set R consists of obvious rules describing themapping f and of Lcut,-axioms with the sequent isr-rows reversed.

    R { B a t I t E 7 and B E f(t)}UU { A aBC I A , B , C E W and L l - B C+A}U

    U {A =>B I A , B E W and L l - B + A }First, we prove that L ( 7 ,D,f ) cG ( 7 ,W,D, X) .Suppose that tl . .tn E L 7,D, f ). According to the

    such that L I- Bi . Bn+D and Bi E f(tj) torall i 5n. By construction, Bi =>t i 72. forall i 5 n. Thus it suffices to prove th.atdefinition of L( 7, D, f )6 re are types B1,...,,3,,B1. ..Bn E C(7 ,W, YE) .

    430

    Authorized licensed use limited to: Tec de Monterrey. Downloaded on July 23, 2009 at 19:13 from IEEE Xplore. Restrictions apply.

  • 7/29/2019 Lambek Grammars Are Context Free.pdf

    3/6

    In view of Theorem 1 and Lemma 1,Lcut, I - B1 .. B,+D.Straightforward induction on the length of a Lcut i -derivation shows that if Lcut, I- B1 . B,+D then..W , , R )C C ( 7 ,D , ).We extend the mapping f to the set T U h) stipu-lating f ( B )={B } for all B E W . Easy induc-tion on the number of rewritings establishes thatif an expressionX1 .. Xnover the alphabet7UWbelongs to G ( 7 ,W, ,72 ) then there are auxiliarysymbols B1,. . B , such that Bj E f(Xj) for alli 5 n, and Lcut, I- B1 .. B,-,+D. In particular, ift2,.. t, are terminal symbols andtl . . tn belongs toG ( 7 ,W,D ,R ) then ti . tn E C( 7,D ,f). W

    3 Proof of Theorem 1Let FG stand for the free group generated by allprimitive types {pi i E N}. The identity element will1111 for the length of U written as areduced word, i.e., aword that does not contain any fragments of the formpip i or p; ' p i .Definition. The free group interpretation of types(written as [ ]) is the following mapping of typesinto FG.

    be denoted by c . dor any element U E FG,we write

    Bpi] * pi[A*B] * [ A ] o [ B ][A\B] * [ A ] - l o [ B ][ A / B ] + [A]o[B] - '[A1 . .An]

    Lemma 2 If a se vent I'+C i s derivable in the Lam-bek calculus, then iI']=[a.D. Roorda obtained this result in terms of atomicmarkings. The lemma has also an immediate proof inthe free group environment [9].

    Definition. For each natural number i we define thecounter of occurrences of the primitive typepi as fol-lows.

    * [A I ]o .. o [An]

    ~ i p i+1 uipj +0, if i #ju~(A * B ) .j(A\B) =u j (A /B )+u ~ A u ~B

    We extend this definition to finite sequences oftypes:~j ( A l A ,) u iA l+ . .+UjAn

    Remark. If asequent I'-A isderivable in the Lam-bek calculus, then for any , uj(I'A) s an even number.Remark. IlAll = uj ADefinition. A sequent I '+A is thin iff

    i

    (1) I'-A is derivable in the Lambek calculus;(2) ui(I'A)=2 for all primitive types pi occurring inr+A.Lemma 3 I f L I- I' 0 A - C then there i s a type B(an interpolant f or 0 in I' 0 A-G) such that(i) L t - 0 - B ;(ii) L I - I' B A-&'.(iii) ujB 5 min(uj@,uj(I'AC));PROOF. This lemma is a slight modification ofthe Interpolation theorem for the Lambek calculiisproved by Dirk Roorda in [lo, p. 841. The onlydifference is that Roorda allows also sequents wi1,hempty antecedents to occur in derivations. We omitthe straightforward proof by induction on a cut-freederivation of I' 0 AhC. WLemma 4 I f a sequent I' 0 A+C i s thin, then thereas a type B such that(i) the sequent 0+B is thin;(ii) the sequent I' B A+C i s thin;(iii) IlBll =I[0il, i.e., the length of B equals to thelength o f t e reduced word f or the free group izs-terpretation of 0 .PROOF.

    taining at most one occurrence of every literal.For any i ,~ j ( 0 B ) ai 0 +u ~ B uj(I'0AC)+u ~ B 2+1.

    Since ui(0B) is even, we conclude that uj(0B) iseither 0or 2. This proves (i). The claim (ii) is provedsimilarly.According to Lemma 2 [r][@][A][ Cg, whence[a] =[I ']-l[Cg[A]-l.We conclude that the reducedwords for [a] and [r]-'[q[A]-l coincide.If u j0 =0 then the letter pi does not occur in [a].If ai0 =1 then there isexactly one occurrenceof piin [a]. If u i@=2, then ui(I'AC)=0,whencepi doEsnot occur in 1I'I-l Cg[A]-' and conse uently it hzsWe have verified that the reduced word for [a] con-tains exactly the literals that occur in the typeB . Wealso see that no literal has more than one occurrencein the reduced word. This proves statement (iii). II

    Lemma 5 I f no type in a thin sequent A1 . . An-&i s longer than m, then L cut, I- A1 . .. An-&.To prove Lemma 5, we need some facts aboitlengths of reduced words in a free group.

    no occurrences in tae reduced word forI@].

    431

    Authorized licensed use limited to: Tec de Monterrey. Downloaded on July 23, 2009 at 19:13 from IEEE Xplore. Restrictions apply.

  • 7/29/2019 Lambek Grammars Are Context Free.pdf

    4/6

    Lemma 6 If u,v, wE F G and IuvI > V I , then1uvw1>1vw1PROOF. iven two reduced wordsU and v, there ex-ist reduced words U , b, c such that U =ab-l, v=bc,uv =ac, and the wordsab-l, bc, ac are reduced. Sim-ilarly for v and w there exist reduced words d, e ,andfsuch that v=de, w=e-l f , vw =df, where de, e-l f ,d are reduced.CASE : lbl 5 Id1Evidently d=bg, whereg is a reduced word.

    We consider two cases.

    uvw=ab-l bge e-l fvvvu v w, Obviorly , e, anduvwl = a1+gl + f l . The assumption of the lemmauvl =1 +19 + I lvwl =PI+ !SI +If I,implieswhence 1 >lbl. Thus

    CASE : Ibl >ldlEvidently b=dh, where h is a reduced word.uvw=ah-ld-' dhc c-'h-'f

    u v wObviously uvw=ah-'f and ah-'f i s a reducedword. The assumption of the lemma entails

    PROOF. nduction on n. The casen =2 is obvious,since Ju1u21= =0.Now we prove the lemma for n+1 wordsu1,. .. un-1, v ,w, assuming that it holds for any 36-quence of n words.Suppose that IUn-1VI >max(lun-iI,IvI) andlvwl >max(lv1, lwl). By Lemma7,

    Iun-1vwJ >max(lun-11,1~~1). 111)Applying the induction hypothesisfor u1,. . ,un-l, U,, where un=vw, we find a num-ber k 2. In view of Lemma 2,[AI] . [An][Cg-l =E. We apply Lemma 8 foru1=[All, 9 un =[An], un+ =gC1-l. EvidentlyIuil5m for all i 5n +1. According to Lemma,8there isanumber k 5 n such that Iukuk+ I5m. Thefollowing two cases arise.CASE : E

  • 7/29/2019 Lambek Grammars Are Context Free.pdf

    5/6

    We introduce a new primitive type for each in-stance of an axiom in the derivation of B1 .. Bn+Dand replace all occurrences of old primitive types inthe derivation by corresponding new ones. We ob-tain a derivation of a sequent B 1 .. Bn+D, whereai( ! , . .BnB)=2 for all primitive typespi occurringin ~1 . . . &+D.In view of Lemma5, Lcut, I- 131 .. kn+fi. ~ e -placing new primitive types by corresponding oldones in this Lcut,-derivation, we obtain a Lcut,-derivation of B1 .. Bn-+D. This completes theproof of Theorem 1.AcknowledgementsI would like to thank Prof. S. Artemov for guidingme into the subject, pointing out the most importantproblems, and running a seminar at Moscow Univer-sity, which providesaproper environment to approachthese problems. I am grateful to Prof. M. Kanovich,who teaches the formal grammars and has made sev-eral useful comments on the subject of this paper. Ialso wish to thank L. Beklemishev, V. Krupski, andN. Pankratiev for checking the proof and making anumber of valuable suggestions.References[l] Y. Bar-Hillel, C. Gaifman, and E. Shamir. Oncategorial and phrase-structure grammars. Bull.Res. Council I srael Sect. F , 9F:1-16, 1960.[2] J . van Benthem. Language in Action. North-Holland, Amsterdam, 1991.

    [3] W. Buszkowski. The equivalence of unidirec-tional Lambek categorial grammars and context-free grammars. Zeitschrifl f i r mathematischeLogik und Grundlagen der Mathematik, 31:36!)-384, 1985.Generative power of categ>rial grammars. In R.T. Oehrle, E. Bach, andD. Wheeler, editors, Categorial Grammars andNatural Language Structures, pages 69-94, Rei-del, Dordrecht, 1988.

    [5] W. Buszkowski. On generative capacity of theLambek calculus. In J . van Eijck, editor, Logicsin AI , pages 139-152, Springer, Berlin, 1991.[6] N. Chomsky. Formal properties of grammars. InR.D. Luce et al., editors, Handbook of Mathemaili-

    [4] W. Buszkowski.

    cal Psychology, vol. 2, pages323-418, Wiley, NewYork, 1963.[7] J . MCohen. The equivalence of two concepts ofcategorial grammar. I nformation and Control ,[8] J . Lambek. The mathematics of sentencestructure. American Mathematical Monthiy,[9] M. Pentus. Equivalent Types in L ambek Calculusand Linear Logic. Preprint No. 2 of the Depart-mentof Math. Logic, Steklov Math. Institute, Se-ries Logic and Computer Science, Moscow, 1992.

    [lo] D. Roorda. Resource Logics: Proof-theoretical I n-vestigations. PhD thesis, Fac. Math. and Comp.Sc., University of Amsterdam, 1991.

    10:475-484, 1967.

    65(3):154-170,4958.

    433

    Authorized licensed use limited to: Tec de Monterrey. Downloaded on July 23, 2009 at 19:13 from IEEE Xplore. Restrictions apply.

  • 7/29/2019 Lambek Grammars Are Context Free.pdf

    6/6

    Lambek Grammars Are ContextFreeM. Pentus

    (SeeAddendum, page 429,)