RMRS some background and current work. Talk overview RMRS: integrating processors via semantics...

RMRS

some background and current work

Talk overview

RMRS: integrating processors via semantics

Underspecified semantics from shallow processing

Integration experiments with broad-coverage systems/grammars (LinGO ERG and RASP)

Planned work

Integrating processing

No single system can do everything:

deep and shallow processing have inherent strengths and weaknesses

Domain-dependent and domain-independent processing must be linked

Parsers and generators

Common representation for processing ‘above sentence level’ (e.g., anaphora)

Compositional semantics as a common representation

Need a common representation language for systems: pairwise compatibility between systems is too limiting

Syntax is theory-specific and unnecessarily language-specific

Eventual goal should be semantics

Core idea: shallow processing gives an underspecified semantic representation, so deep and shallow systems can be integrated

Full interlingua / common lexical semantics is too difficult (certainly at present), but predicates can be linked to ontologies, etc.

Shallow processing and underspecified semantics

Integrated parsing: shallow parsed phrases incorporated into deep parsed structures

Deep parsing invoked incrementally in response to information needs

Reuse of knowledge sources: domain knowledge, recognition of named entities, transfer rules in MT

Integrated generation

Formal properties clearer, representations more generally usable

Deep semantics taken as normative

RMRS approach: current and planned applications

Question answering: Cambridge CSTIT: deep parse questions, shallow parse answers

QA from structured knowledge: Frank et al.

Information extraction: Deep Thought

Chemistry texts (SciBorg?)

Dictionary definition parsing for Japanese and English: Bond and Flickinger

Rhetorical structure, multi-document summarization, email response ...

Also LOGON: semantic transfer. MRSs from LFG used in HPSG generator.

RMRS: Extreme underspecification

Goal is to split up semantic representation into minimal components (cf. Verbmobil VITs):

Scope underspecification (MRS)

Splitting up predicate-argument structure

Explicit equalities

Hierarchies for predicates and sorts

Compatibility with deep grammars: sorts and (some) closed-class word information in SEM-I (API for grammar, more later)

No lexicon for shallow processing (apart from POS tags and possibly closed-class words)

RMRS principles

Split up information content as much as possible

Accumulate information monotonically by simple operations

Don’t represent what you don’t know, but preserve everything you do know

Use a flat representation to allow pieces to be accessed individually

Separating arguments

lb1:every(x,h9,h6), lb2:cat(x), lb5:dog1(y), lb4:some(y,h8,h7), lb3:chase(e,x,y), h9=lb2, h8=lb5

goes to:

lb1:every(x), RSTR(lb1,h9), BODY(lb1,h6), lb2:cat(x), lb5:dog1(y), lb4:some(y), RSTR(lb4,h8), BODY(lb4,h7), lb3:chase(e), ARG1(lb3,x), ARG2(lb3,y), h9=lb2, h8=lb5
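As a rough illustration of argument splitting, here is a minimal Python sketch. It assumes a toy encoding that is not taken from the talk: an EP is a label, a predicate and a tuple of positional arguments, and a hypothetical role table says that quantifiers take RSTR and BODY after their bound variable while all other predicates take numbered ARGs after their characteristic variable. Applied to the MRS on the slide, it produces the split form shown there.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EP:
    """An MRS elementary predication: label, predicate and positional arguments."""
    label: str
    pred: str
    args: tuple[str, ...]


# Hypothetical role table: quantifiers take RSTR/BODY after their bound variable,
# all other predicates take numbered ARGs after their characteristic variable.
QUANTIFIERS = {"every", "some"}
QUANT_ROLES = ("RSTR", "BODY")
ARG_ROLES = ("ARG1", "ARG2", "ARG3")


def split_ep(ep: EP) -> list[str]:
    """Keep only the first argument on the EP; move the rest into role relations."""
    roles = QUANT_ROLES if ep.pred in QUANTIFIERS else ARG_ROLES
    first, rest = ep.args[0], ep.args[1:]
    if len(rest) > len(roles):
        raise ValueError(f"too many arguments for {ep.pred}")
    pieces = [f"{ep.label}:{ep.pred}({first})"]
    pieces += [f"{role}({ep.label},{arg})" for role, arg in zip(roles, rest)]
    return pieces


def mrs_to_rmrs(eps: list[EP], equalities: list[str]) -> str:
    """Split every EP, then append the label equalities unchanged."""
    pieces: list[str] = []
    for ep in eps:
        pieces.extend(split_ep(ep))
    return ", ".join(pieces + equalities)


mrs = [
    EP("lb1", "every", ("x", "h9", "h6")),
    EP("lb2", "cat", ("x",)),
    EP("lb5", "dog1", ("y",)),
    EP("lb4", "some", ("y", "h8", "h7")),
    EP("lb3", "chase", ("e", "x", "y")),
]
print(mrs_to_rmrs(mrs, ["h9=lb2", "h8=lb5"]))
```

The positional role table only stands in for information a real grammar would supply; the point is that each argument becomes an independent piece that can be added or omitted on its own.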

Naming conventions: predicate names without a lexicon

lb1:_every_q(x1sg), RSTR(lb1,h9), BODY(lb1,h6),

lb2:_cat_n(x2sg),

lb5:_dog_n_1(x4sg),

lb4:_some_q(x3sg), RSTR(lb4,h8), BODY(lb4,h7),

lb3:_chase_v(esp), ARG1(lb3,x2sg), ARG2(lb3,x4sg), h9=lb2, h8=lb5, x1sg=x2sg, x3sg=x4sg
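Because the names on this slide follow a fixed shape (underscore, lemma, POS letter, and optionally a sense, as in _dog_n_1), a shallow processor can build and compare predicates without a lexicon. The Python sketch below assumes exactly that shape; the regular expression and the `parse_pred` and `subsumes` helpers are hypothetical illustrations, not part of any RMRS tool. A name without a sense, such as _dog_n, is treated as compatible with any sensed variant of the same lemma and POS.

```python
import re
from typing import NamedTuple, Optional


class PredName(NamedTuple):
    lemma: str
    pos: str
    sense: Optional[str]


# Assumed shape of a surface predicate name: _lemma_pos or _lemma_pos_sense,
# where pos is a single lower-case letter (n, v, q, ...).
PRED_RE = re.compile(r"^_(?P<lemma>.+?)_(?P<pos>[a-z])(?:_(?P<sense>[^_]+))?$")


def parse_pred(name: str) -> PredName:
    m = PRED_RE.match(name)
    if m is None:
        raise ValueError(f"not a surface predicate name: {name!r}")
    return PredName(m["lemma"], m["pos"], m["sense"])


def subsumes(general: str, specific: str) -> bool:
    """True if `general` agrees with `specific` on lemma and POS, and either
    leaves the sense unspecified or names the same sense."""
    g, s = parse_pred(general), parse_pred(specific)
    return (g.lemma == s.lemma and g.pos == s.pos
            and (g.sense is None or g.sense == s.sense))


print(parse_pred("_dog_n_1"))          # lemma 'dog', pos 'n', sense '1'
print(subsumes("_dog_n", "_dog_n_1"))  # True: the POS-level name is more general
```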

POS output as underspecification

DEEP:

lb1:_every_q(x1sg), RSTR(lb1,h9), BODY(lb1,h6), lb2:_cat_n(x2sg), lb5:_dog_n_1(x4sg), lb4:_some_q(x3sg), RSTR(lb4,h8), BODY(lb4,h7), lb3:_chase_v(esp), ARG1(lb3,x2sg), ARG2(lb3,x4sg), h9=lb2, h8=lb5, x1sg=x2sg, x3sg=x4sg

POS:

lb1:_every_q(x1), lb2:_cat_n(x2sg), lb3:_chase_v(epast), lb4:_some_q(x3), lb5:_dog_n(x4sg)


Semantics from RASP

RASP: robust, domain-independent, statistical parsing (Briscoe and Carroll)

Can’t produce conventional semantics, because there is no subcategorization

Can often identify arguments: S -> NP VP: NP supplies ARG1 for V

Potential for partial identification: VP -> V NP S -> NP S: NP might be ARG2 or ARG3

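Underspecification of arguments (hierarchy of argument types):

ARGN (most general)

ARG1or2, ARG2or3 (both below ARGN)

ARG1, ARG2 (below ARG1or2); ARG2, ARG3 (below ARG2or3)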

RASP arguments can be specified as ARGN, ARG2or3, etc. Also useful for Japanese deep parsing?
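The argument hierarchy (ARGN above ARG1or2 and ARG2or3; ARG1 and ARG2 below ARG1or2; ARG2 and ARG3 below ARG2or3) can be used to combine partial information: for instance, an ARG1or2 and an ARG2or3 on the same argument are jointly consistent only with ARG2. The Python sketch below computes that combination as a greatest lower bound over a hypothetical encoding of the hierarchy; it is an illustration of the idea, not the implementation used in the RASP-RMRS work.

```python
from __future__ import annotations

# Hypothetical encoding of the argument hierarchy: each type maps to its
# immediate supertypes.
PARENTS: dict[str, set[str]] = {
    "ARGN": set(),
    "ARG1or2": {"ARGN"},
    "ARG2or3": {"ARGN"},
    "ARG1": {"ARG1or2"},
    "ARG2": {"ARG1or2", "ARG2or3"},
    "ARG3": {"ARG2or3"},
}


def supertypes(t: str) -> set[str]:
    """All types that subsume t, including t itself."""
    seen, stack = {t}, [t]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def unify(a: str, b: str) -> str | None:
    """Most general type subsumed by both a and b (their greatest lower bound),
    or None if the two argument types are incompatible."""
    common = [t for t in PARENTS if {a, b} <= supertypes(t)]
    for t in common:
        if all(t in supertypes(u) for u in common):
            return t
    return None


print(unify("ARG1or2", "ARG2or3"))  # ARG2
print(unify("ARG1", "ARG3"))        # None
```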

RMRS construction

ERG etc.: uses MRS -> RMRS converter (argument splitting etc.); also RMRS -> MRS conversion

POS-RMRS: tag lexicon

RASP-RMRS: tag lexicon plus semantic rules associated with RASP rules, written to match the ERG; defaults when no rule RMRS is specified
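A POS-RMRS can be built from nothing more than a tag lexicon: each tagged word contributes one EP whose predicate name comes from its lemma and tag, with no arguments and no scope. The Python sketch below assumes the input is already lemmatized and uses a hypothetical four-entry lexicon with CLAWS-style tag names chosen for illustration; it is not RASP's actual tag lexicon. It reproduces the POS output "lb1:_every_q(x1), lb2:_cat_n(x2sg), lb3:_chase_v(epast), lb4:_some_q(x3), lb5:_dog_n(x4sg)", except that it also indexes the event variable (e1past).

```python
from __future__ import annotations

from itertools import count

# Hypothetical tag lexicon: CLAWS-style tag -> (predicate suffix, variable sort,
# variable features). Entries are illustrative only, not RASP's real lexicon.
TAG_LEXICON = {
    "AT1": ("q", "x", ""),      # e.g. "every"
    "DD": ("q", "x", ""),       # e.g. "some"
    "NN1": ("n", "x", "sg"),    # singular common noun
    "VVD": ("v", "e", "past"),  # past-tense lexical verb
}


def pos_rmrs(tagged: list[tuple[str, str]]) -> str:
    """Build one EP per (lemma, tag) pair; no arguments, no scope information."""
    labels = count(1)
    var_ids = {"x": count(1), "e": count(1)}
    eps = []
    for lemma, tag in tagged:
        if tag not in TAG_LEXICON:
            continue  # nothing known about this tag, so nothing is asserted
        pos, sort, feats = TAG_LEXICON[tag]
        var = f"{sort}{next(var_ids[sort])}{feats}"
        eps.append(f"lb{next(labels)}:_{lemma}_{pos}({var})")
    return ", ".join(eps)


tagged = [("every", "AT1"), ("cat", "NN1"), ("chase", "VVD"),
          ("some", "DD"), ("dog", "NN1")]
print(pos_rmrs(tagged))
```

Skipping unknown tags rather than guessing follows the principle of not representing what you don’t know.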

RMRS composition with non-lexicalized grammars

MRS composition assumes a lexicalized approach: algebra defined in Copestake, Lascarides and Flickinger (2001)

RMRS with non-lexicalized grammars: similar basic algebra, but without lexical subcategorization; relies on grammar rules to provide the ARGs

‘Anchors’ rather than slots, to ground the ARGs (single anchor for RASP)

Developed on the basis of a semantic test suite; most rules written by Anna Ritchie

Some cat sleeps (in RASP)

Each structure is written [hook label, hook index], <anchor>, {relations}.

sleeps: [h3,e], <h3>, {h3:_sleep(e)}

some cat: [h,x], <h1>, {h1:_some(x), RSTR(h1,h2), h2:_cat(x)}

S -> NP VP: Head=VP, ARG1(<VP anchor>,<NP hook.index>)

some cat sleeps: [h3,e], <h3>, {h3:_sleep(e), ARG1(h3,x), h1:_some(x), RSTR(h1,h2), h2:_cat(x)}
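The S -> NP VP step can be sketched as a single binary composition: the result keeps the head daughter's hook and anchor, pools both daughters' relations, and the rule adds one ARG relation from the head's anchor to the non-head's hook index. This Python sketch assumes a simplified structure (hook pair, anchor, tuple of pieces) and leaves out things a full algebra must handle, such as renaming variables apart and variable equalities; the `Sem` and `compose` names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Sem:
    """A phrase's semantics: hook [label, index], anchor, and a bag of RMRS pieces."""
    hook: tuple[str, str]
    anchor: str
    pieces: tuple[str, ...]


def compose(head: Sem, nonhead: Sem, role: str) -> Sem:
    """Apply a binary rule: the result inherits the head's hook and anchor, and a
    new role relation links the head's anchor to the non-head's hook index."""
    link = f"{role}({head.anchor},{nonhead.hook[1]})"
    return Sem(head.hook, head.anchor, head.pieces + (link,) + nonhead.pieces)


sleeps = Sem(("h3", "e"), "h3", ("h3:_sleep(e)",))
some_cat = Sem(("h", "x"), "h1", ("h1:_some(x)", "RSTR(h1,h2)", "h2:_cat(x)"))

# S -> NP VP with Head=VP and ARG1(<VP anchor>, <NP hook.index>)
s = compose(sleeps, some_cat, "ARG1")
print(s.hook, s.anchor, ", ".join(s.pieces))
```

Because the verb contributes no argument slots of its own, the ARG1 relation comes entirely from the rule, which is what lets this work with a grammar that has no subcategorization.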

Real rule ...

ERG-RMRS / RASP-RMRS

Inchoative

Infinitival subject (unbound in RASP-RMRS)

Ditransitive: missing ARG3

Mismatch: Expletive it

Mismatch: larger numbers

Comments on RASP-RMRS

Fast enough (not significant compared to RASP processing time, because there is no ambiguity)

Too many RASP rules! Need to generalise over classes.

Requires SEM-I: API for MRS/RMRS from deep grammar

RASP and ERG may change: compatible test suites; semi-automatic rule update? alternative technique for composition?

Parse selection: need to generalise over RMRSs; weighted intersections of RMRSs (cf. RASP grammatical relations)
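One way to read “weighted intersections of RMRSs” is by analogy with weighted grammatical relations: score each RMRS piece by the share of parse weight that supports it, and keep the pieces above a threshold. The Python sketch below does this under two loud assumptions: the parses' labels and variables have already been aligned, and pieces are compared by exact string equality (a fuller version would also merge compatible pieces such as ARG2 and ARG2or3). The function name and the example weights are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def weighted_intersection(parses: list[tuple[float, set[str]]],
                          threshold: float) -> dict[str, float]:
    """Score each RMRS piece by the share of parse weight supporting it and keep
    pieces whose support reaches the threshold."""
    total = sum(weight for weight, _ in parses)
    support: dict[str, float] = defaultdict(float)
    for weight, pieces in parses:
        for piece in pieces:
            support[piece] += weight / total
    return {piece: s for piece, s in support.items() if s >= threshold}


# Two hypothetical analyses with pre-aligned labels and variables.
parses = [
    (3.0, {"lb3:_chase_v(e)", "ARG1(lb3,x2)", "ARG2(lb3,x4)"}),
    (1.0, {"lb3:_chase_v(e)", "ARG1(lb3,x4)", "ARG2(lb3,x2)"}),
]
print(weighted_intersection(parses, threshold=0.5))
```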

SEM-I: semantic interface

Meta-level: manually specified ‘grammar’ relations (constructions and closed-class)

Object-level: linked to lexical database for deep grammars

Object-level SEM-I auto-generated from expanded lexical entries in deep grammars (because types can contribute relations)

Validation of other lexicons

Need closed-class items for RMRS construction from shallow processing
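To make the “validation of other lexicons” point concrete, the Python sketch below treats a SEM-I fragment as a table from predicate names to the argument roles they license, and checks an RMRS (or a shallow lexicon's output) against it. The table contents, the `_frobnicate_v` predicate and the `check_against_semi` function are all hypothetical; the real SEM-I is richer. Note that this simple check would also flag underspecified roles such as ARG2or3, which a fuller version would accept when they are compatible with a licensed role.

```python
from __future__ import annotations

# Hypothetical SEM-I fragment: predicate -> argument roles it licenses.
SEM_I: dict[str, set[str]] = {
    "_chase_v": {"ARG1", "ARG2"},
    "_sleep_v": {"ARG1"},
    "_every_q": {"RSTR", "BODY"},
}


def check_against_semi(eps: dict[str, str],
                       roles: list[tuple[str, str, str]]) -> list[str]:
    """eps maps each label to its predicate; roles holds (role, label, value)
    triples. Returns a description of every mismatch with the SEM-I."""
    problems = []
    for pred in eps.values():
        if pred not in SEM_I:
            problems.append(f"unknown predicate {pred}")
    for role, label, _value in roles:
        pred = eps.get(label)
        if pred in SEM_I and role not in SEM_I[pred]:
            problems.append(f"{role} not licensed for {pred}")
    return problems


eps = {"lb3": "_chase_v", "lb7": "_frobnicate_v"}
roles = [("ARG1", "lb3", "x2"), ("ARG3", "lb3", "x5")]
print(check_against_semi(eps, roles))
```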

Alignment and XML

Comparing RMRSs for the same text efficiently uses characterization

Characterization labels RMRSs according to their source in the text: currently character offsets, but byte offsets?

Japanese etc.?

RMRS-XML

RMRS seen as levels of markup: standoff annotation
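Characterization makes comparison cheap because pieces from different processors can be grouped by the stretch of text they came from, instead of searching for matching variables. The Python sketch below assumes each piece carries a (cfrom, cto) character span; the tuple encoding and function names are hypothetical. It also shows why the character-versus-byte question matters for Japanese: in UTF-8, each of the characters in the example takes three bytes, so character and byte offsets diverge.

```python
from __future__ import annotations

from collections import defaultdict


def align_by_span(rmrs_a, rmrs_b):
    """Each RMRS is a list of (cfrom, cto, piece) triples. Group the pieces of both
    by character span so that pieces from the same stretch of text can be compared."""
    table = defaultdict(lambda: ([], []))
    for cfrom, cto, piece in rmrs_a:
        table[(cfrom, cto)][0].append(piece)
    for cfrom, cto, piece in rmrs_b:
        table[(cfrom, cto)][1].append(piece)
    return dict(sorted(table.items()))


def char_to_byte_offset(text: str, char_offset: int, encoding: str = "utf-8") -> int:
    """Convert a character offset into a byte offset under the given encoding."""
    return len(text[:char_offset].encode(encoding))


text = "every cat chased some dog"
deep = [(0, 5, "_every_q"), (6, 9, "_cat_n"), (10, 16, "_chase_v"),
        (17, 21, "_some_q"), (22, 25, "_dog_n_1")]
pos = [(0, 5, "_every_q"), (6, 9, "_cat_n"), (10, 16, "_chase_v"),
       (17, 21, "_some_q"), (22, 25, "_dog_n")]

aligned = align_by_span(deep, pos)
print(aligned[(22, 25)])                     # (['_dog_n_1'], ['_dog_n'])
print(char_to_byte_offset("犬が走る", 2))    # 6
```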

SciBorg: Chemistry texts

eScience project starting in October at Cambridge: Computer Laboratory (Copestake, Teufel), Chemistry (Murray-Rust), CeSC (Parker)

Aims:

Develop an NL markup language which will act as a platform for extraction of information. Link to semantic web languages.

Develop IE technology and core ontologies for use by publishers, researchers, readers, vendors and regulatory organisations.

Model scientific argumentation and citation purpose in order to support novel modes of information access.

Demonstrate the applicability of this infrastructure in a real-world eScience environment.

Research markup

Chemistry: The primary aims of the present study are (i) the synthesis of an amino acid derivative that can be incorporated into proteins via standard solid-phase synthesis methods, and (ii) a test of the ability of the derivative to function as a photoswitch in a biological environment.

Computational Linguistics: The goal of the work reported here is to develop a method that can automatically refine the Hidden Markov Models to produce a more accurate language model.

RMRS and research markup

Specify cues in RMRS

Deep process cues: feasible because domain-independent; more general and reliable than shallow techniques; allows for complex interrelationships

Use zones for advanced citation maps and other enhancements to repositories
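The slides say cues are specified in RMRS but not how they are matched. One plausible reading, sketched below in Python, treats a cue as a small conjunctive pattern over flat RMRS facts and matches it by backtracking, binding pattern variables (written with a leading ?) consistently across pieces. The fact encoding, the predicate names and the “develop a method” cue (loosely inspired by the Computational Linguistics example) are hypothetical; a real RMRS for that sentence would contain more pieces.

```python
from __future__ import annotations

from typing import Iterator, Optional

Fact = tuple[str, ...]


def _extend(binding: dict[str, str], pattern_fact: Fact,
            fact: Fact) -> Optional[dict[str, str]]:
    """Return binding extended so that pattern_fact equals fact, or None."""
    if len(pattern_fact) != len(fact):
        return None
    new = dict(binding)
    for p, f in zip(pattern_fact, fact):
        if p.startswith("?"):
            if new.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return new


def match(pattern: list[Fact], facts: list[Fact],
          binding: Optional[dict[str, str]] = None) -> Iterator[dict[str, str]]:
    """Yield every variable binding under which all pattern facts occur in facts."""
    binding = {} if binding is None else binding
    if not pattern:
        yield binding
        return
    for fact in facts:
        new = _extend(binding, pattern[0], fact)
        if new is not None:
            yield from match(pattern[1:], facts, new)


# Hypothetical, simplified flat RMRS for "... to develop a method ..."
sentence = [
    ("EP", "lb1", "_goal_n", "x1"),
    ("EP", "lb2", "_develop_v", "e2"),
    ("ARG2", "lb2", "x3"),
    ("EP", "lb3", "_method_n", "x3"),
]
cue = [("EP", "?L", "_develop_v", "?E"), ("ARG2", "?L", "?X"),
       ("EP", "?M", "_method_n", "?X")]
print(list(match(cue, sentence)))
```

Matching on shared variables rather than on word sequences is what would let one cue cover different surface forms with the same predicate-argument structure.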

Conclusions

RMRS: semantic representation language allowing linking of deep and shallower processors

RMRS construction: phrase-level compatibility between processors

Many potential applications
