
Conceptual noun types: grammar and automatic classification

Christian Horn & Christof Rumpf

CTF 07, Düsseldorf

Institute for Language and Information
Depts. of General Linguistics & Computational Linguistics


Structure

1. The four conceptual noun types and their contextual properties

2. Investigation of grammatical properties of the conceptual noun types on the basis of a German text corpus

3. A framework for the automatic classification of concept types

4. Conclusion


1. The four conceptual noun types and their contextual properties

Conceptual noun types (Löbner 1979, 1985, 1998)

                  not inherently unique            inherently unique
non-relational    ‘sortal’ (SC)                    ‘individual’ (IC)
                  rose, car, horse, house,         pope, weather, proper names,
                  table, noun                      sun, semantics
relational        ‘relational’ (RC)                ‘functional’ (FC)
                  sister, uncle, arm, leg, part    mother, wife, size, weight,
                                                   meaning

Conceptual noun types differ according to their referential properties.

Do they differ regarding their grammatical uses?


Grammatical uses of conceptual noun types

§ = use differing from underlying concept type

Sortal concepts

A rose is a nice present.
Many roses are an even nicer present.

Individual concepts

The sun is burning.
§A sun is burning.
§The suns are burning.
§Many suns are burning.
§My sun is burning. / §The sun of mine is burning.


Grammatical uses of conceptual noun types

Relational concepts

One of Mary's legs is too short.
§Mary's leg is too short. / §The leg of Mary is too short.
§Many legs of Mary are too short.

Functional concepts

Mary is Peter's mother. / Mary is the mother of Peter.
§Mary is a mother of Peter.
§Mary is the mother.


Contextual properties of conceptual noun types

• grammatical characteristics
  – possessive use: his mother / mother of him
  – definiteness: the sun
  – subcategorization: certain verbs require IC/FC as complements

• morphological properties: certain nouns are often functional
  – deadjectival nouns (Intelligenz ‘intelligence’)
  – deverbal nouns (Krümmung ‘bend’, Dauer ‘length’)
  – compounds:
      -wert ‘value’: Bestwert ‘optimum value’
      -grad ‘degree’: Wirkungsgrad ‘degree of efficiency’
      -größe ‘size’: Kleidergröße ‘dress size’


2. Investigation of grammatical properties of the conceptual noun types on the basis of a German text corpus

Goals:
• to identify the possible uses of the different concept types and their specific context features
• to develop and implement a method for the automatic classification of concept types in texts based on morphosyntactic features

Hybrid approach:
• semantic and grammatical analysis of the conceptual noun types
• statistical investigation: automatic classification allows the processing of large amounts of data
• the investigation is initially carried out on a German text corpus (108,000 words) used as a training corpus
• outlook: further research is intended on English, French and Japanese


Predictions

Assumptions:
• The lexicalized concept type of a noun is the type in which that noun is most frequently used.
• Conceptual noun types occur particularly often in grammatical uses that match their underlying conceptual properties:
  – sortal concepts (rose): singular, plural, with quantifiers, indefinite ...
  – individual concepts (sun): singular, definite
  – relational concepts (leg): indefinite, possessive
  – functional concepts (mother): singular, definite, possessive
• Other uses (‘type shifts’) are still possible. The conditions under which these type shifts occur still have to be investigated.


Counting (selection, ‘definiteness’)

Each cell gives the two counts -1 / -n⁴.

Token      Type  # total  def.¹ sg.  def.¹ pl.  quant./indef.² sg.  quant./indef.² pl.  Ø³ sg.    Ø³ pl.
Nomen      SC    166      57 / 62    5 / 6      14 / 16             13 / 17             18 / 20   31 / 45
Semantik   IC    152      60 / 122   0 / 0      0 / 12              0 / 0               5 / 16    0 / 0
Teil       RC    124      9 / 22     11 / 14    32 / 45             3 / 7               23 / 23   2 / 9
Bedeutung  FC    721      296 / 408  56 / 66    20 / 47             11 / 22             90 / 121  27 / 54

¹ definite: def. determiner, poss. pron., gen. pron., d-Prep, d-selb, d-einzig, genitive (deren/dessen), d-jen
² quantifiers/indefinite: quantifiers, indefinite determiner, demonstratives, numbers, kein, d-beid, d-ord
³ null determiner
⁴ incl. -1


Results (selection)

Conceptual noun type   Noun (tokens)                   singular, definite   possessive
Sortal concept         Nomen (‘noun’) (166)            37 %                 0 %
Relational concept     Teil (‘part’) (124)             36 %                 73 %
Individual concept     Semantik (‘semantics’) (152)    82 %                 4 %
Functional concept     Bedeutung (‘meaning’) (721)     57 %                 74 %

Results so far confirm our predictions.


Tasks & Challenges

• Type shifts in certain readings
    The meaning of the word. (FC)
    The word bottle has many meanings. (RC)
• Generic and anaphoric uses
    The lightbulb was invented by Heinrich Göbel. (generic)
• Polysemy
• Analysis of possessive constructions, plurals, null determiner


3. A framework for the automatic classification of concept types

• Architecture
• Training corpus
• Morphosyntactic analysis
• Training sample
• Computing classifiers
• Maximum entropy models
• Conclusion


Architecture of the framework

[Figure: learning and application of a classifier.
 Learning: training corpus (manual annotation of concept types) → morphosyntactic analysis (msyn: dependency grammar parser) → extraction of relevant context features → training sample → Generalized Iterative Scaling → maximum entropy model $p(a|b) = \frac{1}{Z(b)} \prod_{j=1}^{k} \alpha_j^{f_j(a,b)}$.
 Application: test corpus → morphosyntactic analysis → test sample → maximum entropy model → annotated test corpus.]


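Training corpus

• Manually annotated version of Löbner (2003), Semantik
• Concept types of nouns are marked with tags

Die <f1>Semantik</f1> ist das <r2>Teilgebiet</r2> der <f2>Linguistik</f2>, das sich mit <r2>Bedeutung</r2> befasst. Diese <r2>Art</r2> von <f2>Definition</f2> mag vielleicht ihrem <r2>Freund</r2> genügen, der Sie zufällig mit diesem <so>Buch</so> in der <r2>Hand</r2> sieht und Sie fragt, was denn nun schon wieder sei, aber als <f2>Autor</f2> einer solchen <r2>Einführung</r2> muss ich natürlich präziser erklären, was der <f2>Gegenstand</f2> dieser <so>Wissenschaft</so> ist.

(English: ‘Semantics is the branch of linguistics that deals with meaning. This kind of definition may perhaps satisfy your friend, who happens to see you with this book in your hand and asks what on earth that is now, but as the author of such an introduction I must of course explain more precisely what the subject of this science is.’)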


Morphosyntactic analysis

• We use Connexor's msyn to analyse German texts (www.connexor.com).
• The syntactic information consists of dependency trees.
• Morphological features include part of speech, gender, number, case, tense, mood and some more.
• We do some postprocessing ourselves, e.g. to add definiteness markers.


Dependency tree

Die Semantik ist das Teilgebiet der Linguistik, …
‘Semantics is that branch of linguistics …’

main - ist
  subj - Semantik
    det - Die [Def]
  comp - Teilgebiet
    det - das [Def]
    mod - Linguistik [Gen] (possessor)
      det - der [Def]


Output of Connexor's msyn

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE analysis SYSTEM "http://www.connexor.com/dtds/4.0/fdg3.dtd">
<analysis><sentence id="w1">
  <token id="w2"> <text>Die</text> <lemma>die</lemma> <depend head="w3">det</depend> <tags><syntax>PREMOD</syntax> <morpho>DET Def FEM SG NOM</morpho></tags></token>
  <token id="w3"> <text>Semantik</text> <lemma>semantik</lemma> <depend head="w4">subj</depend> <tags><syntax>NH</syntax> <morpho>N FEM SG NOM</morpho></tags></token>
  <token id="w4"> <text>ist</text> <lemma>sein</lemma> <depend head="w1">main</depend> <tags><syntax>MAIN</syntax> <morpho>V IND PRES SG P3</morpho></tags></token>
  <token id="w5"> <text>das</text> <lemma>das</lemma> <depend head="w6">det</depend> <tags><syntax>PREMOD</syntax> <morpho>DET Def NEU SG NOM</morpho></tags></token>
  <token id="w6"> <text>Teilgebiet</text> <lemma>teil#gebiet</lemma> <depend head="w4">comp</depend> <tags><syntax>NH</syntax> <morpho>N NEU SG NOM</morpho></tags></token>
  <token id="w7"> <text>der</text> <lemma>die</lemma> <depend head="w8">det</depend> <tags><syntax>PREMOD</syntax> <morpho>DET Def FEM SG GEN</morpho></tags></token>
  <token id="w8"> <text>Linguistik</text> <lemma>linguistik</lemma> <depend head="w6">mod</depend> <tags><syntax>NH</syntax> <morpho>N FEM SG GEN</morpho></tags></token>


Training sample

Relevant contextual features are extracted in Perl by mapping regular expressions onto the dependency trees.

The result is a list of pairs (concept type, list of context features):

(f1, [tnr=2, tok=semantik, suff=ik, num=sg, art=def])

(r2, [tnr=5, tok=teilgebiet, num=sg, art=def, poss=rgen])

(f1, [tnr=7, tok=linguistik, suff=ik, num=sg, art=def])

(f2, [tnr=12, tok=bedeutung, suff=ung, num=sg, art=none])

(r2, [tnr=16, tok=art, num=sg, art=indef, poss=von])

(f2, [tnr=18, tok=definition, num=sg, art=none])

(r2, [tnr=22, tok=freund, num=sg, art=def])

(so, [tnr=30, tok=buch, num=sg, art=indef])

(r2, [tnr=33, tok=hand, num=sg, art=def])

(f2, [tnr=49, tok=autor, num=sg, art=none])

(r2, [tnr=52, tok=einführung, suff=ung, num=sg, art=indef])

(f2, [tnr=61, tok=gegenstand, num=sg, art=def])
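
The extraction step can be illustrated with a minimal sketch. The slides state that the extraction is done in Perl with regular expressions; the Python version below is only an illustration, not the authors' implementation. It uses the standard-library XML parser instead of regular expressions. The embedded sample is an abbreviated, closed copy of the msyn output shown earlier. The suffix inventory, the treatment of every non-`Def` determiner as indefinite, and the treatment of every non-`SG` noun as plural are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Abbreviated msyn-style analysis (closing tags added so the snippet parses).
SAMPLE = """<analysis><sentence id="w1">
<token id="w2"><text>Die</text><lemma>die</lemma><depend head="w3">det</depend>
  <tags><syntax>PREMOD</syntax><morpho>DET Def FEM SG NOM</morpho></tags></token>
<token id="w3"><text>Semantik</text><lemma>semantik</lemma><depend head="w4">subj</depend>
  <tags><syntax>NH</syntax><morpho>N FEM SG NOM</morpho></tags></token>
<token id="w4"><text>ist</text><lemma>sein</lemma><depend head="w1">main</depend>
  <tags><syntax>MAIN</syntax><morpho>V IND PRES SG P3</morpho></tags></token>
</sentence></analysis>"""

SUFFIXES = ("ung", "ik")  # hypothetical suffix inventory


def extract_features(xml_text):
    """Return one feature dict per noun token in an msyn-style analysis."""
    root = ET.fromstring(xml_text)
    tokens = list(root.iter("token"))

    # Map the id of each head token to the morphology of its determiner.
    det_of = {}
    for t in tokens:
        dep = t.find("depend")
        if dep is not None and dep.text == "det":
            det_of[dep.get("head")] = t.findtext("tags/morpho", "").split()

    samples = []
    for t in tokens:
        morpho = t.findtext("tags/morpho", "").split()
        if not morpho or morpho[0] != "N":
            continue  # only nouns are classified
        lemma = t.findtext("lemma")
        feats = {"tok": lemma}
        for s in SUFFIXES:
            if lemma.endswith(s):
                feats["suff"] = s
                break
        feats["num"] = "sg" if "SG" in morpho else "pl"  # assumption: non-SG is plural
        det = det_of.get(t.get("id"))
        if det is None:
            feats["art"] = "none"
        else:
            feats["art"] = "def" if "Def" in det else "indef"  # assumption
        samples.append(feats)
    return samples


print(extract_features(SAMPLE))
```

Pairing each feature dict with the manually annotated tag of the same token would then give training pairs of the form shown above.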


Automatic classification

• given:
  – training sample $= \{(a_1,b_1),\dots,(a_n,b_n)\}$
  – classes $a_i \in \{f1, f2, r1, r2\}$
  – contexts $b_i = \{m_1,\dots,m_m\}$
  – features $m_i \in \{\text{art=def}, \text{art=indef}, \text{poss=lgen}, \dots\}$

• wanted:
  – classifier $p(a|b)$: How probable is class $a$ given context $b$?
  – maximal argument $a' = \arg\max_a p(a|b)$: Which is the most probable class $a'$ given context $b$?


Computing a (bad) classifier

• simplest account:
  – Counting co-occurrences of classes and contexts:

    $p(a|b) = \frac{\mathrm{Count}(a,b)}{\mathrm{Count}(b)}$

• shortcomings:
  – Only the contexts in the training sample are learned.
  – Varying degrees of evidence of single features are disregarded.

• way out:
  – Computation of the classifier with a maximum entropy model.
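
A minimal Python sketch of this count-based classifier makes the first shortcoming concrete. The toy sample is invented for illustration and is not taken from the corpus. Contexts are stored as frozensets so that they can be counted as whole units.

```python
from collections import Counter


def train_counts(sample):
    """Count (class, context) pairs and contexts in a training sample."""
    pair_counts = Counter()
    context_counts = Counter()
    for a, b in sample:
        pair_counts[(a, b)] += 1
        context_counts[b] += 1
    return pair_counts, context_counts


def p(a, b, pair_counts, context_counts):
    """Relative frequency Count(a, b) / Count(b); None for unseen contexts."""
    if context_counts[b] == 0:
        return None
    return pair_counts[(a, b)] / context_counts[b]


# Invented toy sample: (class, context) pairs.
sample = [
    ("f1", frozenset({"num=sg", "art=def"})),
    ("f1", frozenset({"num=sg", "art=def"})),
    ("r2", frozenset({"num=sg", "art=def"})),
    ("so", frozenset({"num=sg", "art=indef"})),
]
pair_counts, context_counts = train_counts(sample)

seen = frozenset({"num=sg", "art=def"})
unseen = frozenset({"num=pl", "art=def"})
print(p("f1", seen, pair_counts, context_counts))    # two of three occurrences
print(p("f1", unseen, pair_counts, context_counts))  # None: context never observed
```

The unseen context shares the feature art=def with the training data, but because whole contexts are counted, that evidence cannot be used.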


Maximum entropy models

• Basics
  – Entropy: average number of bits required to encode events of a particular type (tossing a fair coin: 1 bit; rolling a fair die: log2 6 ≈ 2.6 bits).
  – Principle of maximum entropy: choose the model with maximum entropy among those consistent with the data, i.e. don't go beyond the data.

• Specific features
  – Decomposition of contexts into single context features or combinations of them.
  – Possibility to combine features from heterogeneous sources (e.g. syntax, semantics, morphology, …).
  – Computation of the weights (evidence) of single features or their combinations for every class over all contexts.
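
The entropy values for the coin and the die can be checked with a few lines of Python. This is a sketch of the standard Shannon entropy formula $H = -\sum_i p_i \log_2 p_i$; the biased-coin distribution is an invented example.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


coin = entropy([0.5, 0.5])       # fair coin
die = entropy([1 / 6] * 6)       # fair six-sided die
biased = entropy([0.9, 0.1])     # invented biased coin

print(f"coin: {coin:.3f} bits, die: {die:.3f} bits, biased coin: {biased:.3f} bits")
```

The uniform distributions have the highest entropy for their number of outcomes. Any bias encodes additional assumptions and lowers the entropy, which is why the maximum entropy principle prefers the most uniform model that still agrees with the data.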


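Contextual and binary features

The weights for contextual features are determined indirectly with binary features. These relate classes and contextual features.

– simple binary features

  $f_{cf,a'}(a,b) = \begin{cases} 1 & \text{if } a = a' \text{ and } cf \in b \\ 0 & \text{else} \end{cases}$

  example instance:

  $f_{35}(a,b) = \begin{cases} 1 & \text{if } (\text{art=def}) \in b \text{ and } a = f1 \\ 0 & \text{else} \end{cases}$

– complex binary features

  $f_{cfs,a'}(a,b) = \begin{cases} 1 & \text{if } a = a' \text{ and } cfs \subseteq b \\ 0 & \text{else} \end{cases}$

  example instance:

  $f_{17}(a,b) = \begin{cases} 1 & \text{if } \{\text{art=def}, \text{poss=von}\} \subseteq b \text{ and } a = f2 \\ 0 & \text{else} \end{cases}$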
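
Both kinds of binary feature can be sketched with one Python helper, since a simple feature is just a complex feature with a single required context feature. The helper name `make_feature` is hypothetical; the two instances mirror the examples with indices 35 and 17.

```python
def make_feature(required, target_class):
    """Build a binary feature f(a, b) that fires (returns 1) when the class
    equals target_class and all required context features occur in b."""
    required = frozenset(required)

    def f(a, b):
        return 1 if a == target_class and required <= set(b) else 0

    return f


f35 = make_feature({"art=def"}, "f1")              # simple binary feature
f17 = make_feature({"art=def", "poss=von"}, "f2")  # complex binary feature

context = {"tok=bedeutung", "num=sg", "art=def", "poss=von"}
print(f35("f1", context), f35("f2", context))  # 1 0
print(f17("f2", context), f17("f2", {"art=def"}))  # 1 0
```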


Maximum entropy framework

$p(a|b) = \frac{1}{Z(b)} \prod_{j=1}^{k} \alpha_j^{f_j(a,b)}$

$Z(b) = \sum_{a \in A} \prod_{j=1}^{k} \alpha_j^{f_j(a,b)}$

where $\alpha_j > 0$ is a weight for feature $f_j$, $k$ is the total number of binary features, and $Z(b)$ is a normalization constant ensuring that $\sum_a p(a|b) = 1$ (i.e. 100%).

cf. Ratnaparkhi 1998
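
As a sketch of how the model is evaluated, the following Python function computes $p(a|b)$ as the normalized product of the weights of all features that fire. The two features and their weights are invented for the example; in the framework the weights would come from training.

```python
import math


def maxent_prob(a, b, classes, features, alphas):
    """p(a|b) = (1 / Z(b)) * prod_j alpha_j ** f_j(a, b)."""

    def score(c):
        return math.prod(alpha ** f(c, b) for f, alpha in zip(features, alphas))

    z = sum(score(c) for c in classes)  # normalization constant Z(b)
    return score(a) / z


# Hypothetical toy setup: two classes, two binary features, invented weights.
classes = ["f1", "r2"]
features = [
    lambda a, b: 1 if a == "f1" and "art=def" in b else 0,
    lambda a, b: 1 if a == "r2" and "poss=von" in b else 0,
]
alphas = [3.0, 2.0]

context = {"art=def", "num=sg"}
probs = {c: maxent_prob(c, context, classes, features, alphas) for c in classes}
print(probs)
```

In this context only the first feature fires, so the unnormalized scores are 3 for f1 and 1 for r2, and dividing by Z(b) = 4 turns them into probabilities.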


Generalized Iterative Scaling

Unfortunately, there is no analytical method to determine the weights $\alpha_j$.

There are some iterative approximation algorithms to determine the $\alpha_j$, which converge to a ‘correct’ $p(a|b)$ and respect the principle of maximum entropy.

We use Generalized Iterative Scaling (GIS):

initialization: $\alpha_j^{(0)} = 1$

iteration: $\alpha_j^{(n+1)} = \alpha_j^{(n)} \left[ \frac{E_{\tilde p} f_j}{E_{p^{(n)}} f_j} \right]^{1/C}$

$E_{\tilde p} f_j$ is the expectation value for feature $f_j$ in the training corpus.

$E_{p^{(n)}} f_j$ is the expectation value for feature $f_j$ under the model of the previous iteration.

The constant $C$ is the maximum number of ‘active’ binary features for any pair $(a, b)$.
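
The update rule can be tried out on a toy problem. The Python sketch below makes several assumptions that the slides do not state: the sample is invented, features are given as (required context features, class) pairs, C is taken as the largest number of binary features active for any class in any observed context, and a correction feature with value C minus the number of active features is appended so that the feature values sum to C for every pair, as in the standard formulation of GIS. It is an illustration, not the authors' implementation.

```python
import math


def feature_values(a, b, specs, C):
    """Binary feature values for (a, b), followed by the correction feature."""
    vals = [1 if a == cls and req <= b else 0 for req, cls in specs]
    return vals + [C - sum(vals)]


def cond_probs(b, classes, specs, alphas, C):
    """Maximum entropy distribution p(.|b) for the current weights."""
    scores = {}
    for a in classes:
        vals = feature_values(a, b, specs, C)
        scores[a] = math.prod(al ** v for al, v in zip(alphas, vals))
    z = sum(scores.values())
    return {a: s / z for a, s in scores.items()}


def gis(sample, classes, specs, iterations=50):
    n = len(sample)
    # C: most binary features active for any class in any observed context.
    C = max(sum(1 for req, cls in specs if a == cls and req <= b)
            for _a, b in sample for a in classes)
    k = len(specs) + 1  # binary features plus the correction feature
    # Expectation of each feature in the training sample.
    empirical = [0.0] * k
    for a, b in sample:
        for j, v in enumerate(feature_values(a, b, specs, C)):
            empirical[j] += v / n
    alphas = [1.0] * k  # initialization
    for _step in range(iterations):  # iteration
        # Expectation of each feature under the current model.
        model = [0.0] * k
        for _a, b in sample:
            probs = cond_probs(b, classes, specs, alphas, C)
            for a in classes:
                for j, v in enumerate(feature_values(a, b, specs, C)):
                    model[j] += probs[a] * v / n
        for j in range(k):
            if model[j] > 0:  # skip features the model never activates
                alphas[j] *= (empirical[j] / model[j]) ** (1 / C)
    return alphas, C


# Invented toy sample: (class, context) pairs.
DEF, INDEF = frozenset({"art=def"}), frozenset({"art=indef"})
sample = [("f1", DEF), ("f1", DEF), ("r2", DEF), ("r2", INDEF)]
classes = ["f1", "r2"]
specs = [(DEF, "f1"), (DEF, "r2"), (INDEF, "r2")]

alphas, C = gis(sample, classes, specs)
print(cond_probs(DEF, classes, specs, alphas, C))
```

In this toy sample, f1 occurs in two of the three definite contexts, so the trained model should assign f1 a probability close to 2/3 given art=def.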


4. Conclusion

• The investigations so far support the assumption that the referential properties of the concept types match their grammatical uses.
• The maximum entropy framework allows a fine-grained analysis of the evidence contributed by a single context feature to the classification.
• The selection of relevant features is essential for the success of the automatic classification. A large part of our research consists in examining these features.
• We are starting experiments with complex features to model the combined evidence of context features.