ChartParse Documentation · 2019. 4. 2. · ChartParse Documentation, Release 1.1 3.1.2Lattice...

ChartParse Documentation Release 1.1 Chris Brew August 07, 2015


ChartParse Documentation

Release 1.1

Chris Brew

August 07, 2015


Contents

1 The parser

2 The English grammar
    2.1 References

3 Lattice input
    3.1 Lattice parsing

4 Indices and tables

Bibliography

Python Module Index


CHAPTER 1

The parser


CHAPTER 2

The English grammar

An English grammar for chartparse.

This grammar was originally written by Steve Isard at the University of Sussex. The vocabulary is designed to amuse undergraduate Experimental Psychology students, hence the references to pigeons and cages.

The grammar is almost entirely Steve’s original. The only changes are a few words, proper names, and the production:

NP -> det Nn PP

which was changed to

NP -> NP PP

The intent is to demonstrate ambiguous grouping of modifiers.

As in the original LIB CHART [R1], features on the categories are ignored. Three features are used: case, num and tr. They could reasonably be handled in this file, via compilation to a plain CFG, since their purpose is only to enforce agreement.
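One way such a compilation could work is sketched below. This is not code from chartparse: the `compile_rule` helper, the `FEATURES` table, the feature values and the `Category_value` naming scheme are all assumptions made for illustration. The idea is to emit one plain rule per consistent assignment of feature values, so that agreement is enforced by the category names themselves.

```python
from itertools import product

# Hypothetical feature inventory; the values are assumptions for illustration.
FEATURES = {"num": ["sing", "plur"]}


def compile_rule(lhs, rhs, agree):
    """Expand one rule into plain CFG rules.

    `agree` maps a feature name to the categories that must share its value.
    One plain rule is produced per combination of feature values.
    """
    names = sorted(agree)
    rules = []
    for values in product(*(FEATURES[name] for name in names)):

        def decorate(category):
            # Append the chosen value of every feature this category carries.
            suffix = [v for name, v in zip(names, values) if category in agree[name]]
            return "_".join([category] + suffix)

        rules.append((decorate(lhs), [decorate(c) for c in rhs]))
    return rules


# Np and Vp must agree in number; S itself carries no feature.
rules = compile_rule("S", ["Np", "Vp"], {"num": ["Np", "Vp"]})
for lhs, rhs in rules:
    print(lhs, "->", " ".join(rhs))
```

The cost of this approach is that the number of rules grows with the product of the feature value counts, which is acceptable for three small agreement features.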

2.1 References

The original LIB CHART [R1]

>>> import chart
>>> chart.parse(["the","director",'is','clint', 'eastwood'])
['the', 'director', 'is', 'clint', 'eastwood']
Parse 1:
S
 Np
  det the
  Nn
   n director
 Vp
  cop is
  Pn
   n clint
   Pn
    n eastwood
1 parses

>>> import chart
>>> chart.parse(["show", "me","a","movie","where", "the","director",'is','clint', 'eastwood'],topcat='SImp',sep='_')
['show', 'me', 'a', 'movie', 'where', 'the', 'director', 'is', 'clint', 'eastwood']
Parse 1:
SImp
_Vp
__v show
__Np
___pn me
__Np
___Np
____det a
____Nn
_____n movie
___Relp
____rp where
____S
_____Np
______det the
______Nn
_______n director
_____Vp
______cop is
______Pn
_______n clint
_______Pn
________n eastwood
1 parses

class english.Grammar(grammar, lexicon, state=None)

Class for creating grammars from text strings.

Parameters

grammar: string
    the grammar rules, lines of the form lhs -> rhs (|rhs)*

lexicon: string
    the words, lines of the form word category+

Examples

>>> g = Grammar(RULES, WORDS)
>>> g.grammar[0]
Rule(lhs='S', rhs=['Np', 'Vp'])

Methods

make_rule(lhs)
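The two text formats described above can be read with a few lines of Python. The sketch below is an illustration only, not the implementation of english.Grammar: the `read_rules` and `read_lexicon` helpers, the sample `RULES` and `WORDS` strings, and the namedtuple standing in for `Rule` (modelled on the repr shown in the example) are all assumptions.

```python
from collections import namedtuple

# Stand-in for english.Rule, modelled on the repr shown in the documentation.
Rule = namedtuple("Rule", ["lhs", "rhs"])


def read_rules(text):
    """Read lines of the form `lhs -> rhs (|rhs)*` into a list of Rule objects."""
    rules = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        lhs, alternatives = line.split("->", 1)
        # Each |-separated alternative becomes its own rule.
        for alternative in alternatives.split("|"):
            rules.append(Rule(lhs.strip(), alternative.split()))
    return rules


def read_lexicon(text):
    """Read lines of the form `word category+` into a dict of word -> categories."""
    lexicon = {}
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        word, *categories = line.split()
        lexicon.setdefault(word, []).extend(categories)
    return lexicon


# Hypothetical sample input, not the grammar shipped with chartparse.
RULES = """
S -> Np Vp
Np -> det Nn | Np Pp
"""
WORDS = """
the det
show n v
"""

rules = read_rules(RULES)
lexicon = read_lexicon(WORDS)
print(rules[0])
print(lexicon["show"])
```

Storing each alternative as a separate rule keeps the parser itself simple, since it never has to deal with the `|` notation.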

class english.Rule

One production of a context-free grammar.

Examples

>>> r = Rule('s',('np','vp'))

Attributes

lhs: string
    The left hand side of the rule.

rhs: list [string]
    The right hand side of the rule.

Methods

count(...)

index(value, [start, ...)
    Raises ValueError if the value is not present.


CHAPTER 3

Lattice input

3.1 Lattice parsing

Chart parsing with word lattice input.

Chart parsing is based on two key ideas:

• Collapsing together derivations that can be shown to have a common fate.

• Building data structures that are indexed by their start and end points.

3.1.1 Standard Chart Parsing

The standard chart parser works with string positions as exemplified below:

0 show 1 me 2 a 3 movie 4 where 5 the 6 director 7 is 8 clint 9 eastwood 10

The chart is seeded with entries such as Item("Nn","show",0,1), Item("V","show",0,1) and Item("Det","the",5,6). These entries are all of length 1. Lexical ambiguity shows up as multiple items spanning the same set of words but giving them different labels, as happens here for the span 0:1.
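The seeding step can be sketched as follows. The `Item` namedtuple, the `seed` helper and the three-word `LEXICON` are assumptions for illustration; only the field order (label, word, start, end) is taken from the entries quoted above.

```python
from collections import namedtuple

# Field order follows the entries quoted in the text: label, word, start, end.
Item = namedtuple("Item", ["label", "word", "start", "end"])

# Hypothetical fragment of a lexicon; words not listed here get no items.
LEXICON = {"show": ["Nn", "V"], "the": ["Det"], "director": ["Nn"]}


def seed(words, lexicon):
    """Create one length-1 item per (word, category) pair."""
    items = []
    for position, word in enumerate(words):
        for label in lexicon.get(word, []):
            items.append(Item(label, word, position, position + 1))
    return items


words = "show me a movie where the director is clint eastwood".split()
seeds = seed(words, LEXICON)
for item in seeds:
    print(item)
```

The two items for "show" share the span 0:1 and differ only in their label, which is exactly how lexical ambiguity appears in the chart.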

The parser then uses grammar rules to combine items and generate longer items such as Item("Np",5,7). Two items can be combined if the left hand item ends in the same place that the right hand item starts, and the grammar licenses the combination. If, at the end of the process, an item has been built that spans the string from beginning to end and has a suitable label (e.g. sentence, imperative or question), the input has been fully analyzed.
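A minimal sketch of this combination step is given below. It is not the chartparse implementation: it assumes binary rules only, a three-field `Item` (label, start, end), and a hypothetical `close` helper that keeps combining adjacent items until nothing new can be built.

```python
from collections import namedtuple

Item = namedtuple("Item", ["label", "start", "end"])

# Hypothetical binary rules: (left label, right label) -> parent label.
RULES = {("Det", "Nn"): "Np", ("Np", "Vp"): "S"}


def close(seeds, rules):
    """Combine adjacent items until no new item can be built."""
    chart = set(seeds)
    agenda = list(seeds)
    while agenda:
        item = agenda.pop()
        for other in list(chart):
            # Try the pair in both orders; the left item must end where the right starts.
            for left, right in ((item, other), (other, item)):
                if left.end != right.start:
                    continue
                label = rules.get((left.label, right.label))
                if label is None:
                    continue
                new = Item(label, left.start, right.end)
                if new not in chart:
                    chart.add(new)
                    agenda.append(new)
    return chart


seeds = [Item("Det", 5, 6), Item("Nn", 6, 7)]
chart = close(seeds, RULES)
print(Item("Np", 5, 7) in chart)
```

Because every new item goes through the agenda, each pair of adjacent items is examined at least once, regardless of the order in which items are built.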

Depending on the details of the implementation, there may also be items that represent partial constituents such as Edge("SImp",0,2,["Np"]). This one says that if material to the right of the span 0:2 can be made into an Np we will have an imperative sentence.
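The way a partial constituent is completed can be sketched like this. The `Edge` namedtuple follows the field order of the example above (label, start, end, categories still needed); the `extend` helper and the convention that a complete edge has an empty `needed` list are assumptions for illustration.

```python
from collections import namedtuple

# Field order follows Edge("SImp",0,2,["Np"]): label, start, end, categories still needed.
Edge = namedtuple("Edge", ["label", "start", "end", "needed"])


def extend(active, complete):
    """Advance a partial edge over an adjacent complete edge, or return None."""
    if not active.needed or complete.needed:
        return None  # `active` must still need something; `complete` must need nothing
    if active.end != complete.start:
        return None  # the two edges are not adjacent
    if active.needed[0] != complete.label:
        return None  # the complete edge is not what the partial edge is waiting for
    return Edge(active.label, active.start, complete.end, active.needed[1:])


partial = Edge("SImp", 0, 2, ["Np"])
noun_phrase = Edge("Np", 2, 10, [])
result = extend(partial, noun_phrase)
print(result)
```

Here the result needs nothing more and spans 0:10, so it represents a complete imperative sentence over the whole input.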

The chart enforces a no-duplicates condition. When the same item can be made more than one way, it is stored in the chart only once, and a separate data structure is updated to keep track of the alternatives. Two items are equivalent if they span the same words and have the same label. For simple grammars, this is enough to enforce the principle of common fate. If features are used, a little more care is needed, but the essential principle is unchanged: two alternative derivations are collapsed together when it has been shown that subsequent parsing actions will affect them in exactly the same way.
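The no-duplicates condition can be sketched with a dictionary keyed on label and span. The `Chart` class and its `add` method are hypothetical names for illustration, and the spans in the usage example are made up; the point is only that a second derivation of an existing item is recorded as an alternative rather than as a new item.

```python
from collections import defaultdict


class Chart:
    """Store each (label, start, end) once and remember every way of building it."""

    def __init__(self):
        # (label, start, end) -> list of alternative derivations
        self.alternatives = defaultdict(list)

    def add(self, label, start, end, children):
        """Record a derivation; return True only if the item itself is new."""
        key = (label, start, end)
        is_new = key not in self.alternatives
        self.alternatives[key].append(children)
        return is_new


chart = Chart()
# Two groupings of modifiers that yield the same Np over the same span.
first = chart.add("Np", 0, 8, (("Np", 0, 2), ("PP", 2, 8)))
second = chart.add("Np", 0, 8, (("Np", 0, 5), ("PP", 5, 8)))
print(first, second, len(chart.alternatives[("Np", 0, 8)]))
```

Only an `add` that returns True needs to trigger further parsing work, since later combinations treat both derivations identically.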


3.1.2 Lattice Chart Parsing

A chart parser can be adapted to work with a word lattice, where the identity of the words is uncertain. Suppose that an ASR system has tried to identify the example sentence, and is unsure about the words “director” and “eastwood”. It thinks that “director” might have been either (“direct”, “or”) or (“dye”, “rector”) and that “eastwood” might have been (“is”, “would”), (“is”, “wood”), or (“east”, “wood”). A real lattice might have more ambiguity than this; we are keeping it short for readability. Note that (“east”, “would”) is not a possibility.

This uncertainty can be represented by a finite-state machine, as follows:

Arcs:

0 show 1
1 me 2
2 a 3
3 movie 4
4 where 5
5 the 6
6 director 7
6 direct 6.1
6 dye 6.2
6.1 or 7
6.2 rector 7
7 is 8
8 clint 9
9 eastwood 10
9 is 9.1
9 east 9.1
9 is 9.2
9.1 wood 10
9.2 would 10

For convenience, the states can be renumbered with consecutive integers, giving the arcs:

0 show 1
1 me 2
2 a 3
3 movie 4
4 where 5
5 the 6
6 director 9
6 direct 7
6 dye 8
7 or 9
8 rector 9
9 is 10
10 clint 11
11 eastwood 14
11 is 12
11 east 12
11 is 13
12 wood 14
13 would 14
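The renumbering can be done mechanically. The sketch below uses a hypothetical `renumber` helper, not part of chartparse, and assumes that sorting the original state names numerically gives the intended order, which holds for the names used here.

```python
def renumber(arcs):
    """Map the state names onto consecutive integers, preserving their sorted order."""
    states = sorted({state for start, _, end in arcs for state in (start, end)})
    mapping = {old: new for new, old in enumerate(states)}
    return [(mapping[start], word, mapping[end]) for start, word, end in arcs]


# The arcs of the finite-state machine, with the original state names.
ORIGINAL_ARCS = [
    (0, "show", 1), (1, "me", 2), (2, "a", 3), (3, "movie", 4),
    (4, "where", 5), (5, "the", 6),
    (6, "director", 7), (6, "direct", 6.1), (6, "dye", 6.2),
    (6.1, "or", 7), (6.2, "rector", 7),
    (7, "is", 8), (8, "clint", 9),
    (9, "eastwood", 10), (9, "is", 9.1), (9, "east", 9.1), (9, "is", 9.2),
    (9.1, "wood", 10), (9.2, "would", 10),
]

renumbered = renumber(ORIGINAL_ARCS)
for start, word, end in renumbered:
    print(start, word, end)
```

The words and the order of the arcs are untouched; only the state names change, so the renumbered machine accepts exactly the same word sequences.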

Once this is done, the chart can be seeded in the same way as before, except that the numbers now represent states of the finite-state machine, rather than string positions.

The process that builds combinations from the initial seeds can now be left unchanged. Items can combine if the start state of one is the end state of the other, and the grammar licenses that combination. Because of the renumbering, words that are on incompatible paths pass through different intermediate states, and therefore never have the opportunity to combine.
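To see which word sequences the lattice actually licenses, the paths through the machine can be enumerated. The `paths` helper below is a hypothetical illustration and plays no part in the parser itself; the arcs are the renumbered ones, with 0 taken as the start state and 14 as the final state.

```python
from collections import defaultdict

ARCS = [
    (0, "show", 1), (1, "me", 2), (2, "a", 3), (3, "movie", 4),
    (4, "where", 5), (5, "the", 6),
    (6, "director", 9), (6, "direct", 7), (6, "dye", 8),
    (7, "or", 9), (8, "rector", 9),
    (9, "is", 10), (10, "clint", 11),
    (11, "eastwood", 14), (11, "is", 12), (11, "east", 12), (11, "is", 13),
    (12, "wood", 14), (13, "would", 14),
]


def paths(arcs, start, final):
    """List every word sequence leading from `start` to `final`."""
    outgoing = defaultdict(list)
    for source, word, target in arcs:
        outgoing[source].append((word, target))

    def walk(state):
        if state == final:
            yield []
            return
        for word, target in outgoing[state]:
            for rest in walk(target):
                yield [word] + rest

    return list(walk(start))


sentences = [" ".join(path) for path in paths(ARCS, 0, 14)]
print(len(sentences))
print(any("east would" in sentence for sentence in sentences))
```

“east” ends in state 12 while “would” starts in state 13, so no path contains the two in sequence, and for the same reason no chart item can ever span them both.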

The termination condition also changes slightly: we now say that an analysis is complete if an item is built whose start point is a start state of the finite-state machine and whose end point is an accepting state of the machine.
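The revised termination test is small enough to sketch directly. The `is_complete` helper and the representation of an item as a (label, start, end) tuple are assumptions for illustration; the check on the label reflects the requirement, stated for the standard parser, that the spanning item has a suitable category.

```python
def is_complete(item, start_states, accepting_states, topcat):
    """True if `item` has the top category and runs from a start state to an accepting state."""
    label, start, end = item
    return label == topcat and start in start_states and end in accepting_states


# In the renumbered example lattice, 0 is the start state and 14 the accepting state.
START_STATES = {0}
ACCEPTING_STATES = {14}

whole = is_complete(("SImp", 0, 14), START_STATES, ACCEPTING_STATES, "SImp")
embedded = is_complete(("S", 5, 14), START_STATES, ACCEPTING_STATES, "SImp")
print(whole, embedded)
```

Using sets of states rather than a single pair of numbers allows for machines with several start or accepting states.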

class lattice.DemoLatticeWords(arcs=[(0, 'show', 1), (1, 'me', 2), (2, 'a', 3), (3, 'movie', 4), (4, 'where', 5), (5, 'the', 6), (6, 'director', 9), (6, 'direct', 7), (6, 'dye', 8), (7, 'or', 9), (8, 'rector', 9), (9, 'is', 10), (10, 'clint', 11), (11, 'eastwood', 14), (11, 'is', 12), (11, 'east', 12), (11, 'is', 13), (12, 'wood', 14), (13, 'would', 14)])

Run a chart from a lattice rather than a linear set of words.

>>> import chart
>>> chart.parse(demo_arcs,topcat='SImp', sep='_', input_source=DemoLatticeWords)
[(0, 'show', 1), (1, 'me', 2), (2, 'a', 3), (3, 'movie', 4), (4, 'where', 5), (5, 'the', 6), (6, 'director', 9), (6, 'direct', 7), (6, 'dye', 8), (7, 'or', 9), (8, 'rector', 9), (9, 'is', 10), (10, 'clint', 11), (11, 'eastwood', 14), (11, 'is', 12), (11, 'east', 12), (11, 'is', 13), (12, 'wood', 14), (13, 'would', 14)]
Parse 1:
SImp
_Vp
__v show
__Np
___pn me
__Np
___Np
____det a
____Nn
_____n movie
___Relp
____rp where
____S
_____Np
______det the
______Nn
_______n director
_____Vp
______cop is
______Pn
_______n clint
_______Pn
________n eastwood
1 parses

Attributes

final_state
    The final state should be in final position.

Methods

arcs()


CHAPTER 4

Indices and tables

• genindex

• modindex

• search


Bibliography

[R1] http://www.poplog.org/gospl/packages/pop11/lib/chart.p


Python Module Index

english

lattice


Index

DemoLatticeWords (class in lattice)

english (module)

final_state (lattice.DemoLatticeWords attribute)

Grammar (class in english)

lattice (module)

Rule (class in english)
