Transcript of “Mildly” Deterministic Language Processing, a November 2007 presentation by Jerry Ball, Human Effectiveness Directorate, Air Force Research Laboratory.

Page 1

“Mildly” Deterministic Language Processing

November 2007

Jerry Ball

Human Effectiveness Directorate

Air Force Research Laboratory

Page 2

Constraints on Human Language Processing

• Visual World Paradigm (Tanenhaus et al. 1995)

– Subjects presented with a visual scene

– Subjects listen to auditory linguistic input describing the scene

• Immediate determination of meaning

– Subjects look immediately at referents of linguistic expressions, sometimes before end of expression

• Incremental processing

• Interactive processing (Trueswell et al. 1999)

– Ambiguous expressions are processed consistently with the scene

“the green…”

“put the arrow on the paper into the box”

Page 3

Constraints on Human Language Processing

• According to Crocker (1999), there are three basic mechanisms for dealing with ambiguity in natural language

– Serial processing with backtracking or reanalysis

– Deterministic processing with lookahead (Marcus 1980)

– Parallel processing with alternative analyses carried forward in parallel (Gibson 1991; MacDonald, Pearlmutter & Seidenberg 1994; Trueswell & Tanenhaus 1994)

• According to Lewis (2000), “…existing evidence is compatible only with probabilistic serial-reanalysis models, or ranked parallel models augmented with a reanalysis component.”

• According to Gibson & Pearlmutter (2000), “noncompetitive ranked parallel models” are most consistent with the empirical evidence

Page 4

Constraints on Human Language Processing

• Serial and deterministic with reanalysis for pathological input

– Empirical evidence that we don’t carry forward all representations in parallel – Garden Path Sentences

• “The horse raced past the barn fell” (Bever 1970)

– Empirical evidence that we don’t retract previously built representations (Christianson et al. 2001)

• “While Mary dressed the baby sat up on the bed”

– In a post-test, a majority of subjects answered yes to the question “Did Mary dress the baby?”

– Processing doesn’t slow down with increasing length of non-pathological input

– Typically only aware of a single interpretation

Page 5

Constraints on Human Language Processing

• Parallel and probabilistic with reanalysis for pathological input

– Empirical evidence that we may carry forward multiple representations in parallel – Garden Path Effects can be eliminated with sufficient context

– Empirical evidence that dispreferred representations can affect processing time (Gibson & Pearlmutter 2000)

• It’s extremely difficult to empirically falsify either position

– The effect could be due to a parallel slowdown, or to an occasional switch between serial alternatives

• We don’t have all the answers, but maybe it’s both!

– A parallel, probabilistic substrate may make a “mildly” deterministic serial processing mechanism possible!

Page 6

Why should NLP Researchers Care?

• No NLP system to date has the full capabilities of the Human Language Processor (HLP)

• Constraints on HLP provide insight into how to build NLP systems

– Focuses NLP research in directions which are likely to be productive

– Narrows the search space for solutions

• Adherence to well-established constraints on HLP might actually facilitate development of NLP systems

• We don’t know what is given up when cognitively implausible mechanisms are adopted

Page 7

Cognitively Implausible Mechanism

• Serial processing with algorithmic backtracking

– Algorithmically simple, but…

• Computationally intractable for NLP which is highly ambiguous

• Context which led to dead end is retracted on backtracking

– Why give up the context?

– How do we know it’s a dead end?

• Practical Consequences

– No hope for on-line, real-time processing in a large-coverage NLP system

– No hope for integration with speech recognition system

– Performance degrades with length of input

– Can’t easily handle degraded or ungrammatical input
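The combinatorial cost of backtracking can be sketched in a few lines of Python. This is a toy grammar and recognizer of my own for illustration, not anything from the talk: each alternative tried and abandoned is work retracted and redone, and the count of attempted rule expansions climbs rapidly as prepositional phrases are added.

```python
# Toy ambiguous grammar with PP-attachment ambiguity (illustrative only).
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["det", "n"], ["det", "n", "PP"]],
    "VP": [["v", "NP"], ["v", "NP", "PP"]],
    "PP": [["p", "NP"]],
}

def recognize(sym, words, i, stats):
    """Return every position where `sym` can end, starting at position i."""
    if sym not in GRAMMAR:                      # terminal: match a POS tag
        return [i + 1] if i < len(words) and words[i] == sym else []
    ends = []
    for rhs in GRAMMAR[sym]:
        stats["expansions"] += 1                # work that may be thrown away
        positions = [i]
        for part in rhs:
            positions = [e for p in positions
                           for e in recognize(part, words, p, stats)]
        ends.extend(positions)
    return ends

def parses(words):
    stats = {"expansions": 0}
    ok = len(words) in recognize("S", words, 0, stats)
    return ok, stats["expansions"]

# "the man saw the dog (in the park)*", as POS-tag strings
base, pp = "det n v det n".split(), "p det n".split()
for k in range(4):
    print(k, parses(base + pp * k))   # expansion count climbs with each PP
```

The same ambiguous substrings get re-derived on every alternative, which is the practical sense in which backtracking is intractable for highly ambiguous input.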

Page 8

Cognitively Implausible Mechanism

• Multiple pass or multi-stage parsing

– First pass assigns part of speech of each word, but…

• Can’t use full context

• Errors get propagated

– Second pass builds structure

• Typically limited to using part of speech of words

– Third pass determines meaning

• Practical Consequences

– Difficult to do on-line processing in real-time

– Can’t easily integrate with speech recognition

– Performance degrades with length of input

– Limited context available to handle ambiguity at each stage

Page 9

Outrageously Implausible Mechanism!

• Parsing input from right to left (Microsoft NLP system)

– May have engineering advantages, but…

• Presumes a staged approach to NLP

• Completely ignores cognitive plausibility

• Practical consequences

• Impossible to do on-line processing in real-time

– Must wait for end of input

• Nearly impossible to integrate with speech recognition

Page 10

Cognitively Plausible Mechanism?

• Deterministic processing with lookahead

– Many ambiguities resolved by looking ahead a few words, but…

• Don’t know how far to look ahead

– Cognitive plausibility improved by limiting amount of lookahead

• 3 constituent lookahead (Marcus 1980)

• 1 word lookahead (Henderson 2004)

• Practical consequences

– Difficult to use with eager algorithms for which there is good empirical evidence (immediate determination of meaning)

– The smaller the lookahead, the less deterministic
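Limited lookahead can be sketched with the talk’s later “he said that…” example. The toy noun list below is my own assumption, not Henderson’s model: one word of lookahead often suffices to classify “that”, and just as often does not.

```python
# Sketch: one-word lookahead for classifying "that" after "said"
# (the noun list is a toy assumption for illustration).
NOUNS = {"book", "man", "idea"}

def classify_that(next_word):
    """Guess the function of 'that' from one word of lookahead."""
    if next_word is None:
        return "nominal head"         # "he said that"
    if next_word in NOUNS:
        return "nominal specifier"    # "he said that book was funny"
    return "complementizer"           # "he said that she was happy"

print(classify_that("she"))   # → complementizer
print(classify_that("book"))  # → nominal specifier
print(classify_that(None))    # → nominal head
```

One word fails on “he said that old book was boring” (“old” is not a noun, so the guess is wrong), which is exactly the sense in which the smaller the lookahead, the less deterministic the processor.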

Page 11

Cognitively Plausible Mechanism

• Parallel processing with multiple analyses carried forward

– “Full parallelism – where every analysis is pursued – is not psychologically possible” (Crocker 1999)

– Cognitive plausibility improved by limiting number of analyses carried forward and ranking alternatives (bounded ranked parallelism) and not having analyses compete

• Practical Consequences

– The longer the input, the less likely to have the correct representation in the parallel spotlight – necessitating a reanalysis mechanism

– Impractical if multiple representations must be built at each choice point as opposed to just being selected
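Bounded ranked parallelism can be sketched as a beam search. The lexicon and probabilities below are toy assumptions of my own, not any of the cited models: every live analysis is extended and scored at each word, and only the k best are carried forward — ranked selection, with no competition among them.

```python
import heapq
import math

def beam_parse(words, extend, k=2):
    """Bounded ranked parallelism: keep only the k best-ranked analyses."""
    beam = [(0.0, ())]                        # (log-prob, analysis so far)
    for w in words:
        candidates = [
            (score + s, analysis + (label,))
            for score, analysis in beam
            for label, s in extend(w)
        ]
        beam = heapq.nlargest(k, candidates)  # bounded: keep k best, drop rest
    return beam

# Toy lexicon: word -> [(tag, log-probability), ...]
LEX = {
    "the": [("det", 0.0)],
    "old": [("adj", math.log(0.7)), ("noun", math.log(0.3))],
    "man": [("noun", math.log(0.6)), ("verb", math.log(0.4))],
}

best = beam_parse(["the", "old", "man"], lambda w: LEX[w], k=2)
print(best[0][1])  # → ('det', 'adj', 'noun')
```

The practical worry on this slide shows up directly: whenever the correct analysis falls outside the beam, no amount of further input brings it back without a reanalysis mechanism.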

Page 12

Some Larger-Scale Models which take Cognitive Plausibility Seriously

• Marcus, M. (1980). A Theory of Syntactic Recognition for Natural Language

• Shen, L. & Joshi, A. (2005). Incremental LTAG Parsing

• Kim, A., Srinivas, B. & Trueswell, J. (2002). A computational model of the grammatical aspects of word recognition as supertagging

• Brants, T. & Crocker, M. (2000). Probabilistic Parsing and Psychological Plausibility

• Vosse, T. & Kempen, G. (2000). Syntactic structure assembly in human parsing

• Ball, J., Heiberg, A. & Silber, R. (2007). Toward a Large-Scale Model of Language Comprehension in ACT-R 6

• Lewis, R. (1993). NL-SOAR

Page 13

LTAG & Supertagging

• Srinivas & Joshi (1999). Supertagging: An approach to almost parsing

• Linguistic Theory: Lexicalized Tree Adjoining Grammar (LTAG)

– Complex trees associated with lexical items

• Multi-Pass Processing Mechanism

– First Pass: Probabilistic mechanism used to select the most coherent set of trees – aka “Supertagging”

• Probabilities learned using Machine Learning techniques

• Left and right context used

– “the old man”

[Tree diagram: the elementary tree for “old” – an N node dominating adj (“old”) and a foot node N* – labeled “Supertag!”]

Page 14

LTAG & Supertagging

– Second Pass: Parser used to integrate selected trees using substitution and adjunction operations

[Tree diagrams: the tree [NP det(the) N] combines with the auxiliary tree [N adj(old) N*] by adjunction, yielding [NP det(the) [N adj(old) N]]; the elementary tree for “man” (N) then combines by substitution, yielding the final tree [NP det(the) [N adj(old) N(man)]] for “the old man”]

Page 15

Improving the Cognitive Plausibility of LTAG

• Shen & Joshi (2005)

– Incremental processing (but still multi-pass for POS)

– Eager parser (immediate determination of meaning)

• Kim, Srinivas & Trueswell (2002).

– Incremental processing (but still multi-pass for POS)

– Probabilistic, constraint mechanism limited to left context

• “…much of the computation of linguistic analysis, which has traditionally been understood as the result of structure building operations, might instead be seen as lexical disambiguation”

– Substitution and adjunction still needed to connect trees, but typically only 1 way to do so (i.e. deterministic)

• “Stapling”

Page 16

Double R Model

• Encoding of Referential and Relational Meaning (Ball in press)

• Construction Driven Language Processing (Ball 2007)

– Activation, selection and integration of constructions corresponding to the linguistic input (lexicalized)

– Mildly deterministic, serial processing mechanism (integration) operating over a parallel, probabilistic (constraint-based) substrate (activation & selection)

• Implemented in a Computational Cognitive Model (Ball, Heiberg & Silber 2007) using the ACT-R Cognitive Architecture (Anderson et al. 2004)

Page 17

Double R Model

• For this presentation, focus on integration

• Serial processing without backtracking!

• If current input is unexpected given the prior context, then accommodate the input

– Adjust the representation

– Coerce the input into the representation

• The following example demonstrates the context accommodation mechanism

– “no airspeed or altitude restrictions”

Page 18

no

[Tree diagram: “no” – object specifier, projecting an object referring expression (= nominal construction)]

Page 19

no airspeed

[Tree diagram: “airspeed” – object head, integrated into the nominal (“integration”)]

Tree structures are created automatically from the output of the model with a new tool for dynamic visualization of ACT-R declarative memory (Heiberg, Harris & Ball 2007).

Page 20

no airspeed or altitude

[Tree diagram: “airspeed or altitude” – object head; accommodation of the conjunction via function overriding (“override”)]

Page 21

no airspeed or altitude restrictions

[Tree diagram: “airspeed or altitude” – modifier; “restrictions” – object head; accommodation of the new head via function shift (“shift”)]

Appearance of parallel processing!

– “airspeed or altitude” = head vs. “airspeed or altitude” = mod
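The accommodation sequence above can be sketched as a single left-to-right pass. This is an illustrative toy re-implementation of the idea, not the actual model (which is an ACT-R cognitive model): the processor never backtracks; it overrides the head to build the conjunction, then shifts the conjunction to modifier when “restrictions” arrives.

```python
# Toy walk-through of context accommodation for
# "no airspeed or altitude restrictions" (illustrative sketch only).
def accommodate(words):
    nominal = {"spec": None, "mod": [], "head": None}
    log = []
    for w in words:
        if w == "no":
            nominal["spec"] = w                        # object specifier
        elif w == "or":
            nominal["head"] = [nominal["head"], "or"]  # override: head -> conjunction
            log.append("override")
        elif isinstance(nominal["head"], list) and nominal["head"][-1] == "or":
            nominal["head"].append(w)                  # complete the conjunction
        elif nominal["head"] is None:
            nominal["head"] = w                        # object head
        else:
            nominal["mod"].append(nominal["head"])     # shift: old head -> modifier
            nominal["head"] = w
            log.append("shift")
    return nominal, log

print(accommodate("no airspeed or altitude restrictions".split()))
```

Nothing previously built is retracted: the conjunction structure survives intact, only its function changes.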

Page 22

Types of Accommodation

• Coercion

– “the running of the bull” – head of nominal

• “running” construed objectively, arguments not expressed (“of the bull” functions as a modifier)

– “a Bin Laden supporter”

• Proper Noun functions as modifier

– “you’re no Jack Kennedy”

• Proper Noun functions as head (following specifier)

– “the newspaper boy porched the newspaper” – nonce expression (H. Clark 1983)

• “porched” construed as transitive action

Page 23

Types of Accommodation

• Override

– Single word vs. Multi-Word Expression (MWE)

• “kicked…” transitive verb

– “kicked the bucket” idiomatic expression

• “take…” transitive verb

– “take a hike” “take five” “take time” “take place” “take out” “take my wife, please” “take a long walk off a short pier” … many idiomatic expressions

• Not possible to carry all forward in parallel

– Morphologically simple vs. complex

• “car…” noun

– “carport” noun

– “carpet…” noun

• “carpeting” noun or verb

Page 24

Types of Accommodation

• Function Shift

– “he gave it to me”

• direct object (initial preference due to inanimacy)

– “he gave it the ball”

• direct object (initial preference) → indirect object

– “he gave her the ball”

• indirect object (initial preference due to animacy)

– “he gave her to the groom”

• indirect object (initial preference) → direct object

Page 25

Types of Accommodation

• Function Shift

– “he said that…”

• In the context of “said”, “that” typically functions as a complementizer

– But subsequent context can cause a function shift from

• complementizer

– “he said that she was happy”

• to nominal specifier

– “he said that book was funny”

• to nominal head

– “he said that”

Page 26

Types of Accommodation

• Function Shift

– “pressure” vs. “pressure valve” vs. “pressure valve adjustment” vs. “pressure valve adjustment screw” vs. “pressure valve adjustment screw fastener” vs. “pressure valve adjustment screw fastener part” vs. “pressure valve adjustment screw fastener part number”

• Serial nouns (and verbs) incrementally shift from head to modifier function as each new head is processed

• Functions like lookahead, but isn’t limited

• Not clear if a bounded ranked parallel mechanism can handle this!

– 2^n possibilities if head or modifier at each word
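The incremental shift can be sketched deterministically (my own toy code, not the ACT-R production set): each incoming noun becomes the new head and demotes the current head to modifier — one accommodation per word, with no 2^n hypothesis set ever maintained.

```python
# Sketch of the head-to-modifier function shift for serial nouns
# (illustrative only).
def process_serial_nouns(nouns):
    functions, head = {}, None
    for n in nouns:
        if head is not None:
            functions[head] = "modifier"   # function shift on the old head
        head = n
        functions[head] = "head"
    return functions

print(process_serial_nouns("pressure valve adjustment screw".split()))
# → {'pressure': 'modifier', 'valve': 'modifier',
#    'adjustment': 'modifier', 'screw': 'head'}
```

Each word is assigned the head function on arrival, so the processor behaves like one with unlimited lookahead without ever actually looking ahead.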

Page 27

Types of Accommodation

• Modulated Projection

– “the rice” vs. “rice”

– “the” projects a nominal and functions as a specifier

– In the context of “the”, “rice” projects a head which functions as the head of the nominal

– When there is no specifier, “rice” projects a nominal as well as a nominal head

[Tree diagrams: “the rice” – Nominal with specifier (“the”) and head (“rice”); vs. “rice” – head (“rice”) projecting the Nominal on its own]

Page 28

Summary of Context Accommodation

• Context Accommodation is part and parcel of the Construction Integration mechanism

– Not viewed as a repair mechanism (Lewis 1998)

• Processor proceeds as though it were deterministic, but accommodates the input as needed

• Gives the appearance of parallel processing in a serial, deterministic mechanism

Page 29

Combining Serial, Deterministic and Parallel, Probabilistic Mechanisms

[Figure: approaches arranged along a range from nondeterministic to mildly deterministic, each split into a parallel, probabilistic component and a serial, deterministic component – tree supertagging and supertag stapling (Probabilistic LTAG); construction activation & selection and construction integration (Double R); lexical rule selection and rule application (Lexicalized PCFG); rule selection and rule application (PCFG); rule selection & application (CFG); parallel distributed processing (PDP)]

The parallel probabilistic substrate makes a mildly deterministic serial processing mechanism possible!

Page 30

Questions?

Ball, J. (in press). A Bi-Polar Theory of Nominal and Clause Structure and Function. Annual Review of Cognitive Linguistics.

Ball, J. (2007). Construction-Driven Language Processing. Proceedings of the 2nd European Cognitive Science Conference.

Ball, J., Heiberg, A. & Silber, R. (2007). Toward a Large-Scale Model of Language Comprehension in ACT-R 6. Proceedings of the 8th International Conference on Cognitive Modeling.

Heiberg, A., Harris, J. & Ball, J. (2007). Dynamic Visualization of ACT-R Declarative Memory Structure. Proceedings of the 8th International Conference on Cognitive Modeling.

Page 31

References

Anderson, J. R., Bothell, D., Byrne, M. D., Douglass, S., Lebiere, C. & Qin, Y. (2004). An Integrated Theory of the Mind. Psychological Review, 111(4), 1036-1060.

Bever, T. (1970). The cognitive basis for linguistic structures. In J. R. Hayes (ed.), Cognition and Language Development, 277-360. New York: Wiley.

Brants, T. & Crocker, M. (2000). Probabilistic Parsing and Psychological Plausibility. Proceedings of COLING, 111-117.

Christianson et al. (2001). Thematic roles assigned along the garden path linger. Cognitive Psychology, 42, 368-407.

Clark, H. (1983). Making sense of nonce sense. In G. Flores d’Arcais & R. Jarvella (eds), The Process of Language Understanding, 297-331. New York: John Wiley.

Crocker, M. (1999). Mechanisms for Sentence Processing. In Garrod & Pickering (eds), Language Processing. London: Psychology Press.

Gibson, E. & Pearlmutter, N. (2000). Distinguishing Serial and Parallel Parsing. Journal of Psycholinguistic Research, 29, 231-240.

Henderson, J. (2004). Lookahead in Deterministic Left-Corner Parsing. Proceedings of the Workshop on Incremental Parsing: Bringing Engineering and Cognition Together. Barcelona, Spain.

Page 32

References

Lewis, R. (1993). An Architecturally-Based Theory of Human Sentence Comprehension. Unpublished doctoral dissertation, Carnegie-Mellon University.

Lewis, R. (1998). Reanalysis and Limited Repair Parsing: Leaping off the Garden Path. In Fodor, J. & Ferreira, F. (eds). Reanalysis in Sentence Processing. Boston: Kluwer Academic.

Lewis, R. (2000). Falsifying serial and parallel parsing models: Empirical conundrums and an overlooked paradigm. Journal of Psycholinguistic Research, 29, 241-248.

Marcus, M. (1980). A Theory of Syntactic Recognition for Natural Language. Cambridge, MA: The MIT Press.

Joshi, A. & Srinivas, B. (1994). Disambiguation of super parts of speech (Supertags): Almost parsing. Proceedings of the 1994 International Conference on Computational Linguistics (COLING).

Kim, A., Srinivas, B. & Trueswell, J. (2002). The convergence of lexicalist perspectives in psycholinguistics and computational linguistics. In Merlo, P. & Stevenson, S. (eds), Sentence Processing and the Lexicon: Formal, Computational and Experimental Perspectives, 109-135. Philadelphia, PA: Benjamins Publishing Co.

Page 33

References

Tanenhaus, M., Spivey-Knowlton, M., Eberhard, K. & Sedivy, J. (1995). Integration of visual and linguistic information in spoken language comprehension. Science, 268, 632-634.

Trueswell, J., Sekerina, I., Hill, N. & Logrip, M. (1999). The kindergarten path effect: studying on-line sentence processing in young children. Cognition, 73, 89-134.

Srinivas, B. & Joshi, A. (1999). Supertagging: An approach to almost parsing. Computational Linguistics, 25, 237-265.

Vosse, T. & Kempen, G. (2000). Syntactic structure assembly in human parsing. Cognition, 75, 105-143.

Shen, L. & Joshi, A. (2005). Incremental LTAG Parsing. Proceedings of the Conference on Human Language Technology and Empirical Methods in NLP, 811-818. NJ: ACL.