On top-to-bottom recognition and left recursion


R. KURKI-SUONIO*

Carnegie Institute of Technology, Pittsburgh, Pennsylvania

A procedure is given for obtaining structural descriptions in a context-free grammar by performing the recognition according to a strongly equivalent, left-recursion-free grammar. The effect of allowing null strings in the rewriting rules is discussed.

1. Introduction

In a recent paper by T. V. Griffiths and S. R. Petrick [1], several types of context-free grammar recognizers were compared. Obviously, left recursion in grammars is a significant reason why the selective top-to-bottom (STB) algorithm turned out to be highly inefficient in general. One way to avoid left recursion is to transform the grammar into standard form, but the desired structural descriptions are then lost, as the authors of [1] point out. In this paper a procedure is constructed in which this difficulty is avoided.

The terminology and notations of [1] and [2] are mainly used. In addition, Greek letters $\varphi, \psi, \ldots$ are used for strings on $I \cup T$, and the vertical bar $|$ is adopted from "Backus Normal Form." The notation $\varphi \Rightarrow \psi$ is used to indicate that either $\varphi = \psi$ or $\psi$ is derivable from $\varphi$ by applying one or more rewriting rules.

2. Removing Left Recursion

A nonterminal $A \in I$ is left recursive if $A \Rightarrow A\varphi$ for some $\varphi \neq \Lambda$. To remove left recursion we apply a procedure which was used in [3] for applying predictive recognition to any context-free language. The grammar is not transformed to standard form, since removing left recursion is sufficient for our purposes. The only assumption about the grammar is that it does not contain cycles, i.e., $A \to A$ or $A \Rightarrow B \Rightarrow A$ is not true for any nonterminals $A \neq B$. Let $A$ be left recursive, and let

$$A \to A\varphi_1 \mid \cdots \mid A\varphi_m \mid \psi_1 \mid \cdots \mid \psi_n \tag{1}$$

be its rewriting rules (we assume that the $\varphi$'s and $\psi$'s are non-null, and that the $\psi$'s do not begin with $A$). If $m > 0$, the immediate left recursion is removed by adding a new nonterminal $A'$,

$$A' \to \varphi_1 \mid \cdots \mid \varphi_m \mid \varphi_1 A' \mid \cdots \mid \varphi_m A', \tag{2}$$

This work was supported by the Advanced Research Projects Agency of the Office of the Secretary of Defense (SD-146).

* Currently at the University of Tampere, Tampere, Finland.

Volume 9 / Number 7 / July, 1966

and by replacing (1) by

$$A \to \psi_1 \mid \cdots \mid \psi_n \mid \psi_1 A' \mid \cdots \mid \psi_n A'. \tag{3}$$

If there are now rewriting rules $B \to A\chi$ for nonterminals $B$ such that $A \Rightarrow B\omega$, we replace $A$ in them by the right-hand sides of (3):

$$B \to \psi_1\chi \mid \cdots \mid \psi_n\chi \mid \psi_1 A'\chi \mid \cdots \mid \psi_n A'\chi. \tag{4}$$

Of course, only the first n right-hand sides appear if m = 0.

After these transformations $A$ is no longer left recursive, and the process can be repeated for other nonterminals until all left recursion has been removed.
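The procedure of this section can be illustrated with a short Python sketch. It is an illustration under stated assumptions rather than the exact procedure of [3]: a grammar is a dict mapping each nonterminal to a list of right-hand sides (tuples of symbols), any symbol that is not a key counts as a terminal, the grammar is Λ-free and cycle-free as assumed in the text, and all function names as well as the primed spelling of $A'$ are hypothetical. `left_corners` collects the nonterminals $B$ with $A \Rightarrow B\omega$ in at least one step, and one call of `remove_left_recursion_of` applies (2), (3) and (4) to a single nonterminal.

```python
from typing import Dict, List, Set, Tuple

Rule = Tuple[str, ...]
Grammar = Dict[str, List[Rule]]


def left_corners(g: Grammar, a: str) -> Set[str]:
    """Nonterminals B such that a derives B w in one or more leftmost steps."""
    seen: Set[str] = set()
    todo = [a]
    while todo:
        x = todo.pop()
        for rhs in g.get(x, []):          # rhs is non-empty: grammar is Lambda-free
            if rhs[0] in g and rhs[0] not in seen:
                seen.add(rhs[0])
                todo.append(rhs[0])
    return seen


def is_left_recursive(g: Grammar, a: str) -> bool:
    return a in left_corners(g, a)


def remove_left_recursion_of(g: Grammar, a: str) -> Grammar:
    """Apply (2), (3) and (4) to nonterminal a; the input grammar is not mutated."""
    g = {k: list(v) for k, v in g.items()}
    phis = [rhs[1:] for rhs in g[a] if rhs[0] == a]    # A -> A phi_i
    psis = [rhs for rhs in g[a] if rhs[0] != a]        # A -> psi_j
    if phis:                                           # m > 0
        a2 = a + "'"                                   # hypothetical name for A'
        g[a2] = phis + [phi + (a2,) for phi in phis]   # rule (2)
        g[a] = psis + [psi + (a2,) for psi in psis]    # rule (3)
    # Rule (4): in B -> A chi, replace A by each new right-hand side of A.
    for b in left_corners(g, a) - {a}:
        new_rules: List[Rule] = []
        for rhs in g[b]:
            if rhs[0] == a:
                new_rules.extend(alt + rhs[1:] for alt in g[a])
            else:
                new_rules.append(rhs)
        g[b] = new_rules
    return g


def remove_left_recursion(g: Grammar) -> Grammar:
    """Repeat the step for some left-recursive nonterminal until none remains."""
    while True:
        todo = [a for a in g if is_left_recursive(g, a)]
        if not todo:
            return g
        g = remove_left_recursion_of(g, todo[0])
```

On the expression grammar E → E + T | T, T → T * F | F, F → ( E ) | a, the sketch yields E → T | T E' and E' → + T | + T E', mirroring (2) and (3). For the indirectly left-recursive pair A → B x | y, B → A z | w, the first step uses (4) with $m = 0$ to turn B → A z into B → B x z | y z, after which B's immediate left recursion is removed in turn.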

3. STB Recognition

The above removal of left recursion distorts the phrase structure of sentences. Information on the original phrase boundaries is preserved, however, if a marker is attached to the end of each right-hand side of the original rules, and these markers are then carried along in the transformations. If the markers also contain the names and lengths of the phrases, the STB algorithm of [1] could easily be modified to use this non-left-recursive grammar and to give the desired structural descriptions in the Polish suffix form used in [2]. For the purpose of allowing rules of the form $A \to \Lambda$ later, we will, however, keep to the bracket notation, although we then need two separate passes for obtaining the proper structural descriptions.

Instead of placing a marker at the end of each phrase, we now have to indicate the termination of only those phrases which disappear in removing left recursion. This is done by replacing (2), (3) and (4) by

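$$A' \to \varphi_1 \mid \cdots \mid \varphi_m \mid \varphi_1 /_A A' \mid \cdots \mid \varphi_m /_A A', \tag{2'}$$

$$A \to \psi_1 \mid \cdots \mid \psi_n \mid \psi_1 /_A A' \mid \cdots \mid \psi_n /_A A', \tag{3'}$$

and

$$B \to \psi_1 /_A \chi \mid \cdots \mid \psi_n /_A \chi \mid \psi_1 /_A A' /_A \chi \mid \cdots \mid \psi_n /_A A' /_A \chi. \tag{4'}$$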

The marker $/_A$ has no significance in generating the language; it serves only to indicate the end of an $A$-phrase in the original grammar.
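The marked rules (2'), (3') and (4') differ from (2), (3) and (4) only in where $/_A$ is inserted. The following Python sketch performs one marked step, assuming a grammar encoded as a dict from nonterminals to lists of symbol tuples, with the marker $/_A$ spelled as the string "/A" and the new nonterminal as "A'" (both hypothetical encodings, as are the function names). A convenient observation is that (4') is obtained by substituting each right-hand side of (3') followed by $/_A$: $\psi_j$ becomes $\psi_j /_A \chi$, and $\psi_j /_A A'$ becomes $\psi_j /_A A' /_A \chi$.

```python
from typing import Dict, List, Set, Tuple

Rule = Tuple[str, ...]
Grammar = Dict[str, List[Rule]]


def left_corners(g: Grammar, a: str) -> Set[str]:
    """Nonterminals B such that a derives B w in one or more leftmost steps."""
    seen: Set[str] = set()
    todo = [a]
    while todo:
        x = todo.pop()
        for rhs in g.get(x, []):
            if rhs[0] in g and rhs[0] not in seen:
                seen.add(rhs[0])
                todo.append(rhs[0])
    return seen


def remove_with_markers(g: Grammar, a: str) -> Grammar:
    """One step of the marked transformation (2'), (3'), (4') for nonterminal a."""
    g = {k: list(v) for k, v in g.items()}
    end = "/" + a                 # hypothetical encoding of the marker /_A
    phis = [rhs[1:] for rhs in g[a] if rhs[0] == a]
    psis = [rhs for rhs in g[a] if rhs[0] != a]
    if phis:
        a2 = a + "'"              # hypothetical name for A'
        g[a2] = phis + [phi + (end, a2) for phi in phis]    # (2')
        g[a] = psis + [psi + (end, a2) for psi in psis]     # (3')
    for b in left_corners(g, a) - {a}:                      # (4')
        new_rules: List[Rule] = []
        for rhs in g[b]:
            if rhs[0] == a:
                # The A-phrase disappears from B's rule, so its end is marked by /_A.
                new_rules.extend(alt + (end,) + rhs[1:] for alt in g[a])
            else:
                new_rules.append(rhs)
        g[b] = new_rules
    return g
```

For E → E + T | T, T → a this gives E → T | T /E E' and E' → + T | + T /E E'. Note that the bare alternatives ($\varphi_i$ in (2'), $\psi_j$ in (3')) carry no marker, because the end of that phrase coincides with the end of the enclosing $A$-phrase.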

The first pass of the STB analysis is now essentially the STB algorithm of [1]:

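$$
\begin{array}{lll}
\text{Conditions} & \text{TM Instructions} & \text{Output} \\
A \to \varphi,\ \varphi \Rightarrow a\omega & (a, A) \to (a, \varphi\,]_A) & [_A \\
A' \to \varphi,\ \varphi \Rightarrow a\omega & (a, A') \to (a, \varphi) & \\
 & (a, a) \to (\Lambda, \Lambda) & a \\
 & (\Lambda, ]_A) \to (\Lambda, \Lambda) & ]_A \\
 & (\Lambda, /_A) \to (\Lambda, \Lambda) & /_A
\end{array}
$$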

Communications of the ACM 527

Here $A'$ denotes the nonterminals added in the modification of the grammar, and the output string corresponding to a parsing sequence is essentially the corresponding structural description. However, for $A'$ the brackets are omitted, and the output contains markers due to the changes of the original grammar.
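The first pass can be tried out with a backtracking top-down recognizer over the marked grammar. The Python sketch below follows the spirit of the first-pass table and is not the formulation of [1]: it assumes the dict-based grammar encoding, markers spelled "/A", closing brackets "]A", opening brackets "[A", and names ending in a prime for the added nonterminals (all hypothetical encodings). The selective condition $\varphi \Rightarrow a\omega$ is implemented with a precomputed set of terminals that can begin each nonterminal, which is exact for a Λ-free grammar, and every complete parse produces one output string.

```python
from typing import Dict, Iterator, List, Set, Tuple

Rule = Tuple[str, ...]
Grammar = Dict[str, List[Rule]]


def is_marker(sym: str) -> bool:
    # Hypothetical encoding: "/A" is the marker /_A and "]A" the bracket ]_A.
    return len(sym) > 1 and sym[0] in "/]"


def first_terminals(g: Grammar) -> Dict[str, Set[str]]:
    """Terminals that can begin a string derived from each nonterminal."""
    first: Dict[str, Set[str]] = {a: set() for a in g}
    changed = True
    while changed:
        changed = False
        for a, rules in g.items():
            for rhs in rules:
                new = first[rhs[0]] if rhs[0] in g else {rhs[0]}
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first


def first_pass(g: Grammar, start: str, words: List[str]) -> List[Tuple[str, ...]]:
    """Backtracking top-down pass; returns one output string per parse."""
    first = first_terminals(g)

    def run(i: int, stack: Tuple[str, ...],
            out: Tuple[str, ...]) -> Iterator[Tuple[str, ...]]:
        if not stack:
            if i == len(words):
                yield out
            return
        # Prune: each grammar symbol derives at least one word (Lambda-free grammar).
        if sum(not is_marker(s) for s in stack) > len(words) - i:
            return
        top, rest = stack[0], stack[1:]
        if is_marker(top):                     # pop ]_A or /_A and copy it to output
            yield from run(i, rest, out + (top,))
        elif top in g:                         # expand a nonterminal
            for rhs in g[top]:
                x = rhs[0]
                if words[i] not in (first[x] if x in g else {x}):
                    continue                   # selective condition fails
                if top.endswith("'"):          # added nonterminal A': no brackets
                    yield from run(i, rhs + rest, out)
                else:                          # (a, A) -> (a, phi ]_A), output [_A
                    yield from run(i, rhs + ("]" + top,) + rest, out + ("[" + top,))
        elif top == words[i]:                  # (a, a) -> (Lambda, Lambda), output a
            yield from run(i + 1, rest, out + (top,))

    return list(run(0, (start,), ()))
```

For the marked grammar E → T | T /E E', E' → + T | + T /E E', T → a and the input a + a + a, the sketch returns a single output string, [E [T a ]T /E + [T a ]T /E + [T a ]T ]E. The pruning step is an addition for termination of this sketch, not part of the table.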

The desired structural descriptions are now obtained in a deterministic, right-to-left scan of the output strings of the previous algorithm. In this phase the markers are replaced by closing brackets, and a stack is needed to remember the corresponding opening brackets so that they may be inserted properly.

The same notation as before is used. The first stack is now the output of the previous phase (in right-to-left order), the second stack is initially empty, and the output string is the structural description in the original grammar (in right-to-left order).

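$$
\begin{array}{ll}
\text{TM Instructions} & \text{Output} \\
(]_A, \Lambda) \to (\Lambda, ]) & ]_A \\
(a, \Lambda) \to (\Lambda, \Lambda) & a \\
(/_A, \Lambda) \to (\Lambda, /_A) & ]_A \\
([_B, /_A) \to ([_B, \Lambda) & [_A \\
([_A, ]) \to (\Lambda, \Lambda) & [_A
\end{array}
$$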
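Because the second pass is deterministic, it is short to express directly. The following Python sketch implements the five instructions of the second-pass table, using the same hypothetical token encoding ("[A", "]A", "/A", and plain terminals); the list `stack` plays the role of the second pushdown store, and the output is collected right to left and reversed at the end. The function name is illustrative.

```python
from typing import List, Sequence


def second_pass(tokens: Sequence[str]) -> List[str]:
    """Right-to-left scan turning markers /A into ]A and inserting matching [A."""
    stack: List[str] = []      # the second pushdown store, initially empty
    out: List[str] = []        # structural description, produced right to left
    for sym in reversed(tokens):
        tag = sym[1:] if len(sym) > 1 else ""
        if tag and sym[0] == "]":        # (]_A, Lambda) -> (Lambda, ]), output ]_A
            stack.append("]")
            out.append(sym)
        elif tag and sym[0] == "/":      # (/_A, Lambda) -> (Lambda, /_A), output ]_A
            stack.append(sym)
            out.append("]" + tag)
        elif tag and sym[0] == "[":
            while stack[-1] != "]":      # ([_B, /_A) -> ([_B, Lambda), output [_A
                out.append("[" + stack.pop()[1:])
            stack.pop()                  # ([_A, ]) -> (Lambda, Lambda), output [_A
            out.append(sym)
        else:                            # a terminal is copied to the output
            out.append(sym)
    if stack:
        raise ValueError("unbalanced first-pass output")
    return out[::-1]
```

Applied to the first-pass output [E [T a ]T /E + [T a ]T /E + [T a ]T ]E, it yields [E [E [E [T a ]T ]E + [T a ]T ]E + [T a ]T ]E, which is the structure of a + a + a under the original left-recursive rule E → E + T | T. Likewise, a string [B y /A x /A z ]B produced through a rule of the form (4') becomes [B [A [A y ]A x ]A z ]B.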

4. The Often Forgotten Λ

It is often desirable to allow the null string on the right-hand side of rewriting rules. We will finally show how the above technique can be used to get all structural descriptions of a sentence in this case too. The only assumption about the grammar is still that it contains no cycles.

The use of rewriting rules with $\Lambda$ on the right-hand side can be avoided if new rules are added in which the vanishing nonterminals ($A \Rightarrow \Lambda$) have been deleted explicitly. The place of a deleted nonterminal $A$ will be denoted by a marker $X_A$. Obviously, a rule with $k$ vanishing nonterminals gives rise to $2^k - 1$ new rules, corresponding to the deletion of any nonempty subset of these $k$ nonterminals. To make it possible to remove left recursion from this modified grammar, we first replace the rules of the type

$$A \to X_{A_1} \cdots X_{A_n} B \varphi$$

by

where $\mu$ is another type of marker. Left recursion can then be removed in the normal way (carrying the $X$'s and $\mu$'s along).
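The deletion step can be sketched in Python as follows, assuming the dict-based grammar encoding with an empty tuple for a $\Lambda$ right-hand side and the hypothetical marker spelling "XA" for $X_A$ (function names are likewise illustrative). For a rule with $k$ occurrences of vanishing nonterminals it keeps the rule and adds the $2^k - 1$ variants obtained by deleting a nonempty subset of those occurrences. It does not perform the subsequent replacement with $\mu$ markers, and it leaves the original $\Lambda$-rules in place.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Rule = Tuple[str, ...]
Grammar = Dict[str, List[Rule]]


def vanishing(g: Grammar) -> Set[str]:
    """Nonterminals A with A => Lambda (an empty tuple is a Lambda right-hand side)."""
    nullable: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for a, rules in g.items():
            if a not in nullable and any(all(s in nullable for s in rhs) for rhs in rules):
                nullable.add(a)
                changed = True
    return nullable


def add_deleted_variants(g: Grammar) -> Grammar:
    """Keep each rule and add its 2**k - 1 variants with vanishing nonterminals deleted."""
    v = vanishing(g)
    result: Grammar = {}
    for a, rules in g.items():
        new_rules: List[Rule] = []
        for rhs in rules:
            new_rules.append(rhs)
            spots = [i for i, s in enumerate(rhs) if s in v]   # the k occurrences
            for r in range(1, len(spots) + 1):
                for chosen in combinations(spots, r):         # nonempty subsets only
                    new_rules.append(tuple("X" + s if i in chosen else s
                                           for i, s in enumerate(rhs)))
        result[a] = new_rules
    return result
```

For S → A b A with A → a | Λ, the rule for S gains the three variants X_A b A, A b X_A and X_A b X_A, as $2^2 - 1 = 3$ predicts.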

To the first pass of the STB analysis we now have to add the following rules:

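$$
\begin{array}{lll}
\text{Conditions} & \text{TM Instructions} & \text{Output} \\
A \to X_{A_1} \cdots X_{A_n}\ (n \geq 0) & (\Lambda, X_A) \to (\Lambda, X_{A_1} \cdots X_{A_n}\,]_A) & [_A \\
A \to X_{A_1} \cdots X_{A_n}\ (n \geq 0) & (\Lambda, \mu_A) \to (\Lambda, \mu_{A_1} \cdots \mu_{A_n}\,\langle_A) & \rangle_A \\
 & (\Lambda, \langle_A) \to (\Lambda, \Lambda) & \langle_A
\end{array}
$$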

where $\langle_A$ and $\rangle_A$ are additional brackets. Furthermore, if $\Lambda$ belongs to the language, the second stack has to have two initial contents, $S\#$ and $X_S\#$. It is to be noted that $X$'s and $\mu$'s are deleted from the stack only by the original rewriting rules with $\Lambda$ on the right-hand side.

To the second pass we add just three new rules:

TM Instructions Output

((, A) ~ (~.,/) ~ .a. A

(>, a ) ~ (A, >)

([,)) ~ ([, A) ] A A

and we get what we wanted.

Finally, it is pointed out that the purpose of this paper is neither to defend the top-to-bottom recognition techniques nor to promote the use of highly structured null phrases.

RECEIVED JANUARY, 1966; REVISED FEBRUARY, 1966

REFERENCES

1. GRIFFITHS, T. V., AND PETRICK, S. R. On the relative efficiencies of context-free grammar recognizers. Comm. ACM 8 (May 1965), 289-300.

2. GRIFFITHS, T. V., AND PETRICK, S. R. Letter to the Editor. Comm. ACM 8 (Oct. 1965), 594.

3. KURKI-SUONIO, R. On some sets of formal grammars. Ann. Acad. Sci. Fenn. A I 349 (1964).


528 Communications of the ACM Volume 9 / Number 7 / July, 1966