7/23/2019 27_Euralex_Ulrich Heid - On Ways Words Work Together - Topics in Lexical Combinatorics
1/32
Ulrich Heid
Institut für maschinelle Sprachverarbeitung, Universität Stuttgart
On Ways Words Work Together - Topics in Lexical Combinatorics
1. Introduction
1.1 Broad areas of combinatory phenomena
The domain of lexical combinatorics has received much interest over the last years, in syntax, lexical semantics and lexicology, but also in lexicography, terminology, terminography and Natural Language Processing (NLP). If the field of combinatorics can maybe trivially be defined by the fact that it deals with syntagmatic combination phenomena involving two or more lexemes, it is much harder to come up with any reasonable internal subdivision of the field. Phenomena which are usually described as belonging to the domain of combinatorics include, among others:
- selectional properties of lexical items: for example, the English verb to grow has, broadly speaking, two French equivalents, pousser and grandir. Most dictionaries would state that pousser is preferred if the subject noun denotes a plant, grandir if it denotes a human being.[1] The classical example of German essen ⇔ fressen, for English to eat (depending on the distinction between human being and animal) is another instance of this phenomenon.
- collocations: according to many linguists and lexicographers,[2] collocations are combinations of exactly two lexemes (of category noun, verb, adjective or adverb), realizing two concepts, where the choice of one of them depends on (or: is restricted by) the other. Typical examples which are often cited are FR un célibataire endurci, DE ein eingefleischter Junggeselle, EN pay attention, FR pousser un cri, etc. Usually, some sort of determination relation between the two items can be found.[3]
Other lexicographers and NLP researchers have a wider notion of collocation which subsumes any kind of combination of two words, as it occurs (adjacently) in a text. Such a wider view is not uncommon in work on statistical tools (where e.g. also the combination with closed class items may be regarded as a collocation, and where frequency, e.g. of co-occurrence, is the main definition criterion).
Much of the discussion in this conference will be devoted to collocations; this is one of the reasons why we have chosen to discuss collocations in some more detail in this paper, taking them as a paradigmatic example of some of the research topics in the linguistic and lexicographic description of combinatory phenomena.
- idioms: the common view on idioms is that they are multiword expressions (more than two items) which have an en bloc-meaning opaque with respect to the usual meaning of the words making up the combination. In examples like DE das Kind mit dem Bade ausschütten, we do not say anything about a child or a bath; somebody who FR a(voir) une araignée au plafond may also have other trouble than just with a spider.
As soon as we look at data from text corpora, cases come up where it is not easy to determine clearly whether to treat a given item as idiomatic or as collocational: DE eine Frage stellen is usually classified and described as a collocation (a support verb construction almost synonymous with DE fragen), whereas DE in Frage stellen is less clear: should it be treated as an idiom roughly equivalent to DE anzweifeln or as a collocation?
1.2 Structure of this paper
The purpose of this paper is to give an overview of some research topics in the field of lexical combinatorics. This includes a presentation of the main approaches, methods and strands of research, as well as of open issues and lines to be followed, in particular those discussed at the Euralex-94 conference. Such an overview is bound to be partial, in both senses of the word: it is impossible to select all (and only the) relevant topics, and the selection is of course biased towards the preferences of the author.
Nevertheless, selecting collocations as a prototypical phenomenon seems to make sense from a more general point of view as well: collocational phenomena are central to lexicographers, corpus linguists and terminologists; evidence: the sheer number of papers on this topic submitted to the Euralex-94 conference. Moreover, the description and lexicographic modeling and representation of collocations is not at all an easy task: a few properties of collocations are well-known and easily reproducible, but others are controversial or not easy to consistently verify on data.
The problems which need to be addressed and which will be to some extent discussed in this paper fall into the following areas:
- defining the notion of collocation, delimiting it with respect to other combinatory phenomena and identifying criteria allowing to operationalize to some extent the definitions;
- describing syntactic, semantic and pragmatic properties of collocations and other combinatory phenomena, both within descriptive linguistic and lexicographic work (the latter including in addition to linguistic description also issues of the presentation of the descriptive results);
- getting, by means of computational tools for lexical acquisition, material potentially relevant for collocational description: techniques, methods and tools for extracting collocation candidates from texts;
- representing and using collocational information, for example in translation, both human and computer-aided or automatic.
These topics span a range of activities of (computational) linguists and (computational) lexicographers and terminologists: definition, description and lexicographic presentation of collocations, as well as their (semi-)automatic acquisition and use in human and computational applications of lexical knowledge sources.
We have chosen to comment on these topics in the following order:
- definitional and descriptive problems, as treated in linguistic work on lexical combinatorics, will be discussed, along with syntactic, semantic and pragmatic properties of collocations, in Section 2. This allows us to better capture the phenomenon we deal with, from different points of view.
- on this basis, we will deal with the lexicographic and terminographic treatment of collocations, including aspects of the presentation of descriptive results in dictionaries, in Section 3.
- the acquisition of collocationally relevant information from textual corpora, as well as the use of collocation knowledge in translation dictionaries, will be the topic of Section 4.
We will illustrate some of the statements made in this paper with examples from dictionaries. The aim of this paper is not to support one given approach or to argue for a given method or tool for the acquisition or description of collocations: the examples have been chosen for their illustrative character, and an attempt has been made to cover several approaches.
2. Properties of combinatory phenomena - the case of collocations
2.1 Data and a first interpretation
The intuition about collocations is that they are combinations of two lexemes, not necessarily textually adjacent ones. To these two lexemes correspond two concepts. In certain collocations, we can find a regular semantic interrelationship between the two components which is close to a determination relation (collocations are polar in Hausmann's terms).
An essential property of collocations seems to be their perception by native speakers of a language as frequent, recurrent, conventionalized building blocks of the lexicon: "déjà-vu", as Hausmann says. The combination of exactly the two items appearing in the collocation is lexically determined; it is often not predictable; but native speakers are quite good at identifying non-collocational combinations in other people's texts, and they feel that non-collocational texts are not fluent, not elegant or just not the usual way in which one would express a given idea.
Collocations occur in both general language and sublanguage. The table in Figure 1 contains a few examples from English, French and German. The sublanguage examples may be felt to be different in nature from those given for general language: we will come back to this later (see Section 2.3.3).
language | general language | sublanguage
English | pay attention, want sth. badly, merited praise, closely related | stop the conveyor, overlaying rock, expensive in labour
French | opérer un choix, une déception amère, éperdument amoureux | créer un fichier, élution graduée, ressources renouvelables
German | eine Vereinbarung treffen, starker Raucher, tief beeindruckt, (jmdn) hart treffen | eine Forderung abtreten, Abwasser einleiten, anstehende Kohle, Dateien abgleichen
Figure 1. A few examples of general and sublanguage collocations
A number of criteria have been discussed in the literature to distinguish collocations from free combinations on the one hand and from idioms on the other, or, rather, to arrange examples of certain types somewhere on the scale between these two extremes. These criteria involve the syntactic, semantic and pragmatic description of lexemes.
2.2 Syntactic properties
2.2.1 Combinatory phenomena vs. phrase structure
Most combinatory phenomena follow the rules of syntax; no particular syntactic rules are necessary to describe combinatory phenomena. But not all of them make up constituents. Selectional phenomena can be observed both within constituents and within the sentence: the examples given above (grow, eat) concern the interaction between a subject noun phrase and the main verbal predicate of the sentence. Similarly, we observe selection phenomena between verbs and their subcategorized complements, e.g. objects, prepositional objects, etc., but also with adjuncts, or within other constituents than VPs, for example in adjective phrases (noun + attributive adjective).
Collocations can be classified, at least for languages like English, the Germanic, Romance and Slavic languages, according to the category of their elements, into noun-verb, noun-adjective, noun-noun collocations, as well as verb-adverb and adjective-adverb. Noun-verb collocations can be further subclassified according to the grammatical function of the noun phrase contributing the noun part of the collocation: subject-verb-, verb-complement-, verb-adjunct-collocations.[4] Following Hausmann (1979), Hausmann (1985) and Hausmann (1989), we have classified a few examples in the illustration in Figure 2.
type | English | German | French
NOUN + adjective | confirmed bachelor | eingefleischter Junggeselle | célibataire endurci
NOUN + verb (Subj) | his anger falls | Zorn verraucht | la colère s'apaise
NOUN + verb (Obj) | to withdraw money | Geld abheben | retirer de l'argent
VERB + adverb | it is raining heavily | es regnet in Strömen | il pleut à verse
ADJ + adverb | seriously injured | schwer verletzt | grièvement blessé
VERB + adverb | to fail miserably | kläglich versagen |
NOUN + noun | a gust of anger | Wutanfall | une bouffée de colère
Figure 2. Types of collocations in terms of the category of their components, following Hausmann (1989)
This notion of collocation does not assume that all collocations make up phrases: n+adj-collocations may do so, if the adjective is used attributively, as in EN heavy rain, EN unquenchable thirst, DE starker Raucher, FR regrets amers, FR remords tardifs, etc. However, we still want to consider the combination of EN unquenchable and thirst as collocational when the adjective is used predicatively (His thirst ... was ... unquenchable.). This implies, among others, that computational tools which would just look for combinations of adjacent lexemes[5] would not retrieve all combinations which fall under the syntactic definition given above.
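The limitation of adjacency-based tools can be illustrated with a minimal sketch. The function names, the window size and the two toy sentences are assumptions made for illustration only; they do not reproduce any particular extraction tool discussed in the literature.

```python
def adjacent_pairs(tokens, adjective, noun):
    """Return positions where the adjective directly precedes the noun."""
    return [i for i in range(len(tokens) - 1)
            if tokens[i] == adjective and tokens[i + 1] == noun]


def window_pairs(tokens, adjective, noun, window=5):
    """Return (noun_pos, adj_pos) pairs where both words co-occur
    within `window` tokens of each other, in either order."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok != noun:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] == adjective:
                hits.append((i, j))
    return hits


# Toy sentences (invented): attributive vs. predicative use of the adjective.
attributive = "he felt an unquenchable thirst".split()
predicative = "his thirst was unquenchable".split()

print(adjacent_pairs(attributive, "unquenchable", "thirst"))
print(adjacent_pairs(predicative, "unquenchable", "thirst"))
print(window_pairs(predicative, "unquenchable", "thirst"))
```

The adjacency search finds the attributive case only; the window search also finds the predicative one, at the price of admitting more accidental co-occurrences, which a syntactic filter would then have to remove.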
The noun which participates in an n+v collocation can also be located in an adjunct (cf. DE es regnet in Strömen, etc.); such cases are difficult to treat in a strictly valency-based model or in a formal account which makes use of subcategorization information only. To our knowledge, not much work has so far been done on "(lexically) typical adjuncts".
As observed, combinatory phenomena are often orthogonal to phrase-structural or valency-based rules (grammatical rules in the widest sense). This property is problematic, for example, for lexical choice in natural language generation; in early approaches, the order in which lexemes were selected in a sentence to be generated was determined by relationships between syntactic heads and modifiers, or nodes of a valency representation and their dependents (rule: "lexical heads first"). This works out for verb-adverb- or adjective-adverb-collocations and for some noun-adjective-collocations as well, but not for noun-verb-collocations (e.g. in the verb-object case: the object noun must be determined first, only then can a collocationally adequate verb be selected). Researchers in natural language generation were first to discuss problems of collocation: some of the work on lexical choice is aimed at bringing collocational and syntactic constraints together and controlling their interaction in an adequate way (cf. e.g. Nirenburg et al. 1988, etc.).
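The ordering problem in the verb-object case can be made concrete with a small sketch. The mini-lexicon, the meaning label SUPPORT and the generic fallback verb are hypothetical; the sketch only contrasts the two selection orders and does not reproduce any of the generation systems cited.

```python
# Hypothetical mini-lexicon: (base noun, abstract meaning) -> collocate verb.
COLLOCATES = {
    ("attention", "SUPPORT"): "pay",
    ("bath", "SUPPORT"): "take",
    ("speech", "SUPPORT"): "deliver",
}

# Hypothetical verb chosen from the meaning alone, before the noun is known.
GENERIC_VERB = {"SUPPORT": "make"}


def heads_first(meaning, noun):
    """Select the verbal head first, ignoring its object noun."""
    return GENERIC_VERB[meaning], noun


def base_first(meaning, noun):
    """Fix the base noun first, then select a collocationally adequate verb;
    fall back to the generic verb when the lexicon has no entry."""
    verb = COLLOCATES.get((noun, meaning), GENERIC_VERB[meaning])
    return verb, noun


print(heads_first("SUPPORT", "attention"))
print(base_first("SUPPORT", "attention"))
```

Under these assumptions, the heads-first order yields the non-collocational "make attention", while the base-first order yields "pay attention".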
An additional problem of the interaction between syntactic and collocational description is the recursive nature of collocational properties: the components of a collocation can again be collocational themselves: next to the German collocation Gültigkeit haben (n+v), we have allgemeine Gültigkeit haben, with allgemeine Gültigkeit, a collocation (n+a), as a component. These cases have sometimes been analyzed as different from collocations, but there is no reason for such treatment. However, a formal account, e.g. for machine translation, would have to be able to account for such cases.
2.2.2 Problems of the syntactic description of collocations - the case of support verb constructions
Syntacticians have observed some irregularities in the syntactic behaviour of collocations, in particular of support verb constructions ("Funktionsverbgefüge", "constructions à verbe support"): examples are FR avoir peur, avoir faim, prendre un bain, poser une question, opérer un choix, EN be in a habit, take a bath, pay attention, deliver a speech, DE Angst haben, ein Bad nehmen, eine Frage stellen, zur Anwendung kommen.
Many of the syntactic operations possible with verb phrases are not or only in part possible with support verb constructions; such operations (often used as tests) include passivization, pronominalization, the possibility of taking the nominal part up with an anaphoric pronoun, the possibility of modifying the noun (e.g. with adjectives, genitives, relative clauses, etc.), the choice between different kinds of determiners, etc.
The most "frozen" (or as Cruse (1986:41) says, "bound") collocations are close to typical examples of idioms, insofar as no modifications are possible. Other support verb constructions participate in some, but not all of the processes mentioned above: DE eine Frage stellen can be modified or pronominalized, whereas DE zur Ausführung gelangen does not allow pronominalization or modification: Hans hat eine kluge Frage gestellt, Josef hat sie beantwortet; *das Programm gelangt zu einer vollständigen Ausführung; das Programm gelangt zur Ausführung: *sie muß korrekt sein. Apparently, pronominalization or pronominal anaphoric reference and the possibility of modification of the predicative noun are somehow related.
Similar data to those for German have been observed for Danish (cf. e.g. Dyhr 1980), Dutch (cf. Hinderdael 1980) and French (cf. Gross 1986, Gross/Vivès 1986). In many articles about support verb constructions, just some such facts are described ("anecdotically"), and we are still not aware of a more comprehensive treatment or a formal account. It seems that the two types of "lexicalized" and "non-lexicalized" support verb constructions observed by Helbig (1984) roughly correspond to cases where the predicative noun is still available as a referent ("non-lexicalized" case) as opposed to referentially blocked cases ("lexicalized"): ongoing work by Kuhn (1994) shows that the tests used to distinguish these two types are all based on referential (un-)availability.
Similarly, the syntactic (e.g. valency) behaviour of lexical combinations (including both collocations and idioms) has not been described in very much detail so far in dictionaries. We only know of projects for foreign language learners' idiom dictionaries which aim at coming up with a detailed syntactic description.
Noun compounding is often not looked at from the point of view of lexical combinatorics; but a collocational view is most relevant, e.g. for contrastive work on Romance vs. Germanic languages. Mel'chuk's examples of expressions for groups of animals (EN flock of seagulls, pack of dogs, school of fish, cf. Fontenelle (1994b) for more examples), but also technical terms like those described and analyzed by Seelbach (1994) (IT acque di rifiuto - DE Abwässer, IT stazione di depurazione - DE Kläranlage, IT perdita di sostanze liquide - DE Flüssigkeitsverlust) are cases in point. Knowledge about collocationally adequate combinations of nouns in compounds or noun phrases is most important for translation. Soler/Marti (1994) discuss this problem in detail, giving examples from Spanish-English translation.
2.3 Semantic properties
2.3.1 Combinatory phenomena and compositionality
Syntactic properties do not seem to have much discriminatory power, as far as collocations and idioms, their borderline and the borderline with "normal" constructions are concerned. For tests and criteria of classification, we thus have to rely on (lexical) semantics.
A few general, broad distinctions seem to be commonly accepted: the meaning of idioms is not derivable from the meaning of the lexemes or word forms which make up the idiom: idioms are non-compositional. On the other hand, what has been called "free combination" by Hausmann and others, i.e. the "normal case", is fully compositional: the meaning of EN to buy a book is derivable by the usual processes from the meanings of EN buy and EN book.
Collocations are an intermediate case between the two: the meaning of EN buy somebody's argument is not fully compositionally derivable from the meanings of argument and buy. However, the meaning of argument is present in (and used in the meaning description of) buy sb's argument; it is only buy which does not have, in this collocation, the meaning it has in buy a book. This partial compositionality of collocations has led Hausmann to describe collocations as polar combinations, consisting of a base (the item which has its full lexical meaning, in our example above: argument) and a collocate[6] (with modified or reduced meaning: buy). Mel'chuk, in a talk about collocations at the 1990 conference of Euralex,[7] has most clearly summarized the differences in compositionality between free, collocational and idiomatic combinations, and we schematize these in Figure 3, following Mel'chuk's presentation.
[Figure: schematic of combinations of two items A and B: regular combinations (compositional), full idioms (non-compositional), collocations (partially compositional)]
Figure 3. Types of lexical combinations in terms of compositionality (following Mel'chuk)
When constructing semantic representations, we can apply the usual procedures for compositional cases to free combinations; we can assign a single semantic representation to an idiom as a whole. Dobrovol'skij's examples DE den Kopf hängen lassen or etw. in die Wege leiten could be described as denoting "resignation" or "startup of an activity" respectively, and we could, depending on the granularity of description we aim at, describe DE jemand läßt den Kopf hängen in a similar way as jemand ist deprimiert or jemand ist resigniert, or as well jemand leitet etwas in die Wege similarly to jemand beginnt etwas, jemand leitet etwas ein.
A more difficult problem is the following: collocations like DE zur Anwendung gelangen ("to be applied") can, as it seems, be represented by the same device as the idioms above: at a certain level of specificity, we can consider such collocations as quasi-synonymous with verbs (DE anwenden, in this case) and use the same representation for DE angewendet werden and zur Anwendung gelangen, only with an aspectual difference.
But what about the cases where the predicative noun is referentially available, as in the example discussed above, in Section 2.2.2: DE Hans hat eine Frage gestellt. Max hat sie beantwortet. In this case, we need a representation which would preserve an antecedent for the anaphoric pronoun. If the semantic representation is just the same as that of a two-place verbal predicate ("ask": DE fragen, in the case of eine Frage stellen), no "hook" to serve as an antecedent for the anaphoric pronoun is available. In the same way, no reasonable semantic representation of the modifier in DE er hat eine kluge Frage gestellt would be possible then.
If we treat the referentially available cases separately, do we need different representations for DE Verkauf in zum Verkauf stehen (no referent) and einen Verkauf tätigen (referent available)? This problem comes up when one tries to give a slightly more formal account of support verb constructions, e.g. in Head Driven Phrase Structure Grammar, HPSG (see e.g. work of Erbach/Krenn 1993), or in any other framework usable in NLP.
It also comes up in translation: Thurmair (1990) discusses cases like the translation of EN to launch (a product) by DE (ein Produkt) auf den Markt bringen; if we use a compact semantic representation for the collocation, i.e. one which would be similar or identical with that of the verb, we would have trouble translating back from DE (ein Produkt) auf den überfüllten Markt bringen into English. Other, more detailed semantic representations seem necessary.
We have to ask ourselves, then, however, how far we should go in "decomposing" the meaning of collocations, derivatives and of "one-word lexemes".
2.3.2 Towards a semantic classification of collocations? Mel'chuk's lexical functions
The Meaning ⇔ Text-Model (MTM), or: Meaning-Text Theory, developed by Igor A. Mel'chuk and his collaborators includes, among many other things, a semantic classification of lexical combination phenomena. The approach is much broader than just a description of the semantic classes into which collocations can be subdivided: it is a whole theory of language, conceived as a model of how meanings can be realized in language. It is impossible to give a full and adequate characterization of MTM in the framework of the present article. It is sufficient, here, to recall a few of its most important aspects:
- MTM equally supports analysis and generation of text, but its primary goal is an account of generation, i.e. the problem of how meanings get realized in texts (hence the name "Meaning ⇒ Text-Theory"). Consequently, the description of paraphrasing, of quasi-synonymy, of the shading of meaning, depending on communicative and text-structural phenomena, etc. are important to MTM researchers;
- MTM is a modular and stratified approach. It distinguishes several strata, roughly corresponding to the traditional levels of description and representation (semantic, deep and surface syntactic, morphological and phonological);
- MTM syntax is dependency-based; descriptions of verbs in the dictionary include an inventory of the relevant actants and of their syntactic realization (e.g. as phrase structural constructs); this is a basis for the definition of a syntax-semantics interface, and it also allows the description of collocations to be linked to this interface.[8]
MTM describes collocations by means of "lexical functions". These can be seen as relations between one or more words or word combinations on the one hand and a partial semantic description on the other. The partial semantic description consists of a keyword and an abstract semantic operator applied to this keyword; the different kinds of operators are the different types of collocations.[9] The number of lexical functions is limited to around 60; they can be combined (see e.g. Ramos/Tutin 1992, work by Mel'chuk, etc.). Out of these 60 lexical functions, about a dozen play an important role in describing collocations of Indo-European languages.[10] The table in Figure 4 contains a few examples of lexical functions, along with the name of the LFs and our very rough description of the meaning expressed by each of the operators.
Meaning | lexical function | examples (French, English)
Intensifier | MAGN | bruit infernal, interdire absolument
Quantity selectors | MULT, SING | un essaim d'abeilles; un grain de riz, a cake of soap
Evaluator | VER | sharp knife, merited praise
(semantically almost) empty stylistic figures | EPIT, GENER, FIGUR | océan immense; un sentiment de joie; un rideau de fumée
points in a process | GERM, CULM | seed of hope; paroxysme de joie
(semantically almost) empty support verbs | OPER1, OPERj | porter plainte, pousser un cri, mener une lutte; sth. forms an offer, sth. constitutes an offer
Figure 4. Examples of lexical functions
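Read as a data structure, a lexical function maps a keyword to the value(s) that realize the operator for that keyword. The following sketch is only one possible encoding, assumed for illustration: the class name and its methods are hypothetical and are not part of MTM or of the ECD; the entries are taken from Figure 4.

```python
from collections import defaultdict


class LexicalFunctionTable:
    """Stores values of lexical functions, indexed by (LF name, keyword)."""

    def __init__(self):
        self._values = defaultdict(set)

    def add(self, lf, keyword, value):
        """Record that `value` realizes the operator `lf` for `keyword`."""
        self._values[(lf, keyword)].add(value)

    def apply(self, lf, keyword):
        """Return the sorted values of LF(keyword); empty list if unknown."""
        return sorted(self._values.get((lf, keyword), ()))


table = LexicalFunctionTable()
for lf, keyword, value in [
    ("MAGN", "bruit", "infernal"),
    ("MAGN", "interdire", "absolument"),
    ("MULT", "abeille", "essaim"),
    ("SING", "riz", "grain"),
    ("VER", "knife", "sharp"),
    ("OPER1", "plainte", "porter"),
    ("OPER1", "cri", "pousser"),
    ("OPER1", "lutte", "mener"),
]:
    table.add(lf, keyword, value)

print(table.apply("OPER1", "cri"))
print(table.apply("MAGN", "riz"))
```

The lookup is keyed on the pair of operator and keyword, which mirrors the point that the value is lexically determined by the keyword and has to be stored explicitly.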
The Meaning ⇔ Text-Model is not only a framework for the description and semantic classification of collocations: Mel'chuk and his collaborators have also worked out proposals for very detailed dictionaries, the Explanatory and Combinatory Dictionaries (ECD); these proposals[11] have been most stimulating for both lexicography (cf. the dictionary of Cohen (1986), the three volumes of French ECDs, as well as ECD fragments of Russian and studies towards ECDs of English (Steele 1988), and German) and Natural Language Processing (cf. work by Nirenburg et al. 1988, Heylen/Maxwell 1994, Heid/Raab 1989). Still, some deficits have been identified, such as the fact that the level of granularity of the semantic description of lexical functions may not be fully sufficient for semantics-based NLP (cf. Heylen/Maxwell 1994) or the lack of a grammar for combining lexical functions among themselves; this latter gap has been filled by Ramos/Tutin (1992).
If one compares the MTM approach to collocations with the work of Hausmann and other lexicographers, as Cop (1990) and Heid (1992) have done, quite some overlap is found, despite differences in terminology. The table in Figure 5 comparatively summarizes the relevant terminology used in Mel'chuk's and Hausmann's work.
Compared components/properties | Who? | first component | second component
Terminology | H. | Base | Collocate
Terminology | M. | Keyword | Value of LF
Semantic properties | H. | autonomous | dependent, non-autonomous
Semantic properties | M. | compositionally describable | not fully describable compositionally
Implication for treatment of collocations | H. | collocations must be learned separately
Implication for treatment of collocations | M. | collocations must be stored explicitly in the ECD
Figure 5. Comparing terminology of MTM and Hausmann
Collocation-related research topics in MTM include the actual integration of a collocational component into implementations, as well as work on the relationship between semantics and collocation (see Section 2.3.3 below).
2.3.3 Correlating semantic classes and collocational behaviour
It has been stated that collocations and collocational lexical choice are completely lexically determined (cf. Mel'chuk/Polguère 1987) and thus need to be memorized, by foreign language learners (cf. Hausmann 1984) or in dictionaries, be it for human use or for NLP. On the other hand, some research has been going on, over the past few years, about correlations between semantic classifications, lexical fields, etc. and collocational behaviour.
Heid/Raab (1989) have observed that the French nouns denoting personal attitudes which are described in the first volume of the ECD (cf. Mel'chuk et al. 1984) select similar collocates for certain lexical functions: for a dozen semantically related nouns,[12] a parallel behaviour in collocate selection for the lexical functions OPER1, CAUS OPER, INCEP FUNC, INCEP OPER, FIN OPER, ... was observed.
In the field of lexical acquisition, attempts have been made to constitute lexical semantic classes (or domain classes?) by considering collocational behaviour: the assumption is that bases having the same collocates belong to the same field. Pustejovsky et al. (1993) have used this assumption in terminology-related corpus exploration.
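The assumption that bases sharing collocates belong to the same field can be sketched as a simple grouping procedure. Everything in this sketch is assumed for illustration: the overlap measure (Jaccard), the threshold, the greedy grouping strategy and the toy data are not taken from Pustejovsky et al. (1993) or from any of the other studies cited.

```python
def jaccard(a, b):
    """Overlap of two collocate sets: |intersection| / |union|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_bases(collocates_by_base, threshold=0.5):
    """Greedily put each base into the first group that contains a member
    whose collocate set overlaps with its own by at least `threshold`."""
    groups = []
    for base in sorted(collocates_by_base):
        for group in groups:
            if any(jaccard(collocates_by_base[base],
                           collocates_by_base[member]) >= threshold
                   for member in group):
                group.append(base)
                break
        else:
            groups.append([base])
    return groups


# Invented toy data: base noun -> set of observed collocate verbs.
TOY_DATA = {
    "anger": {"feel", "arouse", "subside"},
    "joy": {"feel", "arouse", "express"},
    "file": {"create", "delete", "open"},
    "directory": {"create", "delete", "list"},
}

print(group_bases(TOY_DATA))
```

On real corpus data such a procedure would also group bases that merely share "passe-partout" collocates, so that the collocates would have to be weighted or filtered first.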
Much more material is now analyzed in studies by Meyer/Mackintosh (1994) and in particular by Mel'chuk/Wanner (1994). While the first is on sublanguage collocations, the second deals with general language, coming back to the field of emotion nouns, subdivided, for that purpose, into (in part overlapping) subsets, according to inherent properties of emotions, as they are described in psychology: positive vs. negative emotions, moderate vs. intense, temporary vs. permanent, etc. For each such subset, the collocate selection behaviour of a few prototypical German base nouns and selected lexical functions is analyzed.
The results are of two types: on the one hand, a number of collocations do indeed appear with most or all of the elements of the field or of a given subset; on the other hand, a non-negligible number of exceptions is noted as well.
Mel'chuk/Wanner (1994) take this result as a starting-point for a proposal for the reorganization of the ECD entries for emotion nouns. The proposal is to introduce a common "public" entry for the whole class of nouns which would stipulate the values of certain lexical functions, either for all of the class members, or as a function of the presence of one or more of the subclass-defining criteria. The results do not immediately lead to a hierarchy; the domain model used is not hierarchical either. What comes out are rather implications between the presence of certain semantic properties and the collocation behaviour.
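The idea of a "public" entry can be sketched as a lookup with three layers: values stated in the individual noun entry, values conditioned on a subclass-defining property, and values valid for the whole class. The class names, the lookup order and all lexical values below are assumptions invented for illustration; they are not taken from Mel'chuk/Wanner (1994) or from the ECD.

```python
class PublicEntry:
    """Shared entry for a whole semantic class of base nouns."""

    def __init__(self, defaults, conditional=None):
        self.defaults = dict(defaults)               # LF -> value, all members
        self.conditional = dict(conditional or {})   # property -> {LF -> value}


class NounEntry:
    """Individual entry; it only stores what deviates from the public entry."""

    def __init__(self, lemma, public, properties=(), own=None):
        self.lemma = lemma
        self.public = public
        self.properties = tuple(properties)
        self.own = dict(own or {})

    def value(self, lf):
        """Own value first, then property-conditioned value, then class default."""
        if lf in self.own:
            return self.own[lf]
        for prop in self.properties:
            rules = self.public.conditional.get(prop, {})
            if lf in rules:
                return rules[lf]
        return self.public.defaults.get(lf)


# Invented illustration values for German emotion nouns.
EMOTIONS = PublicEntry(
    defaults={"OPER1": "empfinden"},
    conditional={"intense": {"MAGN": "tief"}},
)
freude = NounEntry("Freude", EMOTIONS, properties=("positive", "intense"))
angst = NounEntry("Angst", EMOTIONS, properties=("negative",),
                  own={"OPER1": "haben"})

print(freude.value("OPER1"), freude.value("MAGN"))
print(angst.value("OPER1"), angst.value("MAGN"))
```

The conditional layer is flat rather than hierarchical, in line with the observation that the results yield implications between semantic properties and collocational behaviour rather than a taxonomy; exceptions stay in the individual entries.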
Meyer/Mackintosh (1994) observe that, for the sublanguage of technical documentation of CD-ROM devices, a few collocational generalizations are possible, which can be modeled in an inheritance hierarchy. Maybe in part the differences between Mel'chuk/Wanner's and Meyer/Mackintosh's results have to do with the fact that terminological domains, especially when denoting concrete objects, can more easily be modeled themselves in taxonomies than domains of abstract notions, as used in general language.
The result is interesting in the light of Martin's notion of 'conceptual collocation': Martin (1992) observes a correlation between the semantic and conceptual description of items of a (technical) domain and the collocational behaviour. He observes a subtype of n+adj- and n+n-collocations which just denotes (specialized) subtypes of the objects denoted by the base noun. Similarly, he points out that n+v-collocations denote what one can typically do with (or to) the object denoted by the base noun.[13]
We have made a few experiments on this problem ourselves, using Cohen's description of collocations of the sublanguage of the stock market as starting point (Cohen 1986).[14] We have looked up the entries for nouns which share certain collocational properties. One type of question we asked was which subsets of nouns share one or more collocates expressing the INCREASE or DECREASE of the process denoted by the base noun, both with subject- and object-taking verbs. One such group consists of [...]: all these nouns share the collocates [...]
The result of this exploration shows, among other things, the following: some collocate verbs are "passe-partout", like [...]
[...] in such small groups of nouns is significant and most likely can be related to properties relevant for the semantic or conceptual description of the group of nouns.
Terminologists and lexicographers might usefully explore in more detail the relationship between conceptual or semantic description and collocational behaviour. As Martin (1992) states, results of a collocational analysis furnish input for definition construction and vice versa, definitions (in terms of frames, for example) can be used as background for collocational expectation patterns. With the availability of corpus processing tools, such analyses become less expensive.
Cohen did not explicitly group the nouns treated in her dictionary, although this would be possible, as our experiments show and as the work of Mel'chuk/Wanner (1994) and Meyer/Mackintosh (1994) suggests. Such structuring would be helpful for pedagogical purposes. Knowles/Roe (1994) deal with the pedagogical use of collocational material extracted from texts of specialized language; some of the tools described by Grefenstette (1994) are helpful for technically doing the job. Tools, methods, applications and descriptive work come together at this point: affaire à suivre.
2.4 Pragmatic properties

The pragmatic description of collocations involves the notion of collocations as conventionalized expressions. General language collocations are the normal way of expressing a given meaning (cf. sich die Zähne putzen/*bürsten vs. se brosser/*nettoyer les dents). Hausmann calls collocations "semi-finished products of language" ("Halbfertigprodukte der Rede"). This is why collocationally correct texts are perceived as "fluent", whereas texts with wrong collocates, or with compositional expressions where collocational alternatives would exist, are perceived as unnatural; this property of collocations in turn motivates much of the pedagogical interest they attract.
In addition, individual collocations can pertain to diasystematic language varieties, the same way as one-word lexemes can (cf. Swiss German einen Entscheid fällen for German eine Entscheidung treffen; East German eine Bestellung auslösen vs. eine Bestellung aufgeben).
3. Lexicographic treatment of combinatory phenomena - access to collocations

We have so far mentioned a few problems of the description of collocations. In addition, the properties of collocations lead to a number of particular problems concerning the presentation of collocational descriptions in dictionaries. Here, we concentrate on the organization of lexical entries and the access to collocational information in dictionaries.
Although, from a semantic point of view, it would probably be a good solution to have individual lexical entries for collocations and idioms (and to make them accessible as a whole), this is not practical within semasiological dictionaries. This is, however, what happens in onomasiological dictionaries, such as the Longman Language Activator (LLA) or the dictionary of idioms planned by Dobrovol'skij (1994).
3.1 The organization of collocation and idiom dictionaries

Lexicographers have much discussed the access to idioms and collocations in monolingual and bilingual dictionaries, in particular the question where to alphabetize multiword expressions. This problem must be solved in different ways, depending on the distinctions between monolingual and bilingual dictionaries and between encoding (text production) and decoding (text understanding) use of the dictionary. Production dictionaries will favour the access to collocational information via the base, whereas in a decoding dictionary we cannot be sure that the reader of a text is able to figure out whether or not a word form belongs to a collocation, so that access via both bases and collocates, or via the collocation as a whole, would be ideal.[15] This is easier to realize online; an experiment of this type has been made in a lexical and terminological database designed to hold single word items as well as collocations, which has been designed by Heid/Freibott (1991).
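The idea of access via the base, via the collocate, or via the collocation as a whole can be illustrated with a minimal sketch. The class `CollocationIndex` and its methods are hypothetical names chosen for illustration; they are not taken from the database of Heid/Freibott (1991), whose design is not described here.

```python
from collections import defaultdict


class CollocationIndex:
    """Toy store: each collocation is reachable via its base or its collocate."""

    def __init__(self):
        self.entries = set()                   # whole collocations
        self.by_base = defaultdict(set)        # base -> collocations
        self.by_collocate = defaultdict(set)   # collocate -> collocations

    def add(self, base, collocate):
        pair = (base, collocate)
        self.entries.add(pair)
        self.by_base[base].add(pair)
        self.by_collocate[collocate].add(pair)

    def lookup(self, word):
        """Return all collocations in which `word` is the base or the collocate."""
        found = self.by_base.get(word, set()) | self.by_collocate.get(word, set())
        return sorted(found)


index = CollocationIndex()
index.add("respect", "inspirer")
index.add("respect", "manquer de")
index.add("cri", "pousser")

print(index.lookup("respect"))   # encoding use: start from the base
print(index.lookup("inspirer"))  # decoding use: start from any word form met in the text
```

A reader who meets inspirer in a text does not need to know that respect is the base; the same lookup call serves both the encoding and the decoding situation.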
The following problems of access to combinatory information in dictionaries have been discussed in the literature.
For idiomatic expressions, the problem is particularly hard, since usually none of the word forms which make up the idiom is a clear candidate, on semantic grounds, to serve as an entry word.
For collocations, Hausmann (1988) has suggested sorting them under the bases. This is what happens consistently in Ilgenfritz et al. (1989) (cf. the entry for respect in Figure 9). This sorting procedure is in line with the tradition of stylistic dictionaries, such as Lacroix (1956) and others. An example of an entry from Lacroix (1956) is reproduced in Figure 8. It lists the verbal and adjectival collocates of the entry word, sorting them in part according to their subcategorization properties. The BBI combinatory dictionary of English (Benson et al. 1986) also organizes its macrostructure by the bases treated, listing the collocates in the body of the entry.
Respect. Éprouver, ressentir, montrer, marquer, témoigner, manifester, devoir, porter, professer, affecter, feindre du respect. Inspirer, provoquer, commander, forcer le respect. Manquer de respect. Adresser ses respects. Être entouré d'un certain respect. Rappeler au.
QUAL.: profond, filial, sincère, craintif, général, unanime, universel.

Figure 8. The entry s.v. respect in the collocation dictionary by Lacroix (1956)
respect m Respekt, Achtung
avoir, ressentir du ~ envers, pour, à l'égard de qn j-m Achtung entgegenbringen: Nous ressentons du ~ envers Monsieur votre père. / devoir le ~ à qn j-m Respekt schulden: Nous devons le ~ à nos professeurs. / forcer le ~ de qn j-n Achtung abnötigen: Son comportement a forcé mon ~. / imposer, inspirer, commander le ~ Achtung einflößen: Cette personne, bien qu'elle soit très petite, inspire le ~. / manquer de ~ (envers qn) es an der notwendigen Achtung fehlen lassen (gegenüber j-m): Je trouve qu'il manque de ~ envers ses parents. / témoigner, montrer du ~ à, envers, pour, à l'égard de qn j-m Respekt erweisen: Les enfants d'aujourd'hui ne témoignent plus tellement de ~ aux personnes âgées.

Figure 9. The entry s.v. respect in Ilgenfritz et al. (1989)
A quite detailed syntactic account of collocations similar to our proposals in Figure 2 is given in Laine (1993): this dictionary (specialized vocabulary of CAD/CAM French/English) distinguishes subject-verb-, verb-object-, and noun-adjective-collocations, as well as collocational noun phrases involving PPs (compound nouns). Below, we reproduce an example of an entry. It consists of two columns, one of which contains the syntactic classifications used in the dictionary, the other the relevant lexical combinations.
ordonnancement    scheduling
~ V.              ~ connaître les ordres lancés
V. ~              choisir ~, définir ~, essayer ~, [règles] gouverner ~
~ Adj.            ~ assisté par ordinateur, ~ dynamique, ~ informatisé, ~ multiconvergent, ~ optimal
~ (Prp)(Art) N    ~ à buts multiples, ~ par dates croissantes, ~ par valeurs croissantes des marges libres
N (Prp)(Art) ~    l'art de l'~, coefficient d'~, fonction ~, méthode d'~ (de production), rebouclage sur l'~, technique
[...] (glossed) examples, as well as subentries. Similarly, Cobuild has collocations in its definienda, as well as in its examples.
Among bilingual dictionaries, the Collins/Robert English/French dictionaries and the Collins/Klett English/German ones are a remarkable exception: they have particular devices to denote n+v-collocations, distinguishing even whether the noun is the subject or a complement of the verb. These dictionaries have been collocationally explored and described in detail by Fontenelle (1992a) etc.: on top of their well-structured representation of collocations, they are a remarkably rich source for this type of lexical information. Another particular device for the treatment of collocations has been used in the Van Dale bilingual dictionaries. They follow the idea of a categorial description of the component parts of collocations and indicate, for example, verbal collocates of a noun in a special part of the entry, using a numeric code to point to the category of the combination partner.[16] A sample entry from the FR->NL dictionary is reproduced in Figure 11.
respect 0.1 [...] (hoog)achting, ontzag, respect 0.2 eerbiediging, naleving 0.3 betuigingen van hoogachting
2.1 ~ humain vrees voor wat men ervan denken, zeggen [...] 2.3 mes respects à votre femme de groeten aan uw vrouw; mes respects [...]
3.1 avoir du ~ pour qn. achting, respect voor iem. hebben; commander, imposer, inspirer le ~ ontzag inboezemen, respect afdwingen; manquer de ~ à, envers qn. zich tegenover iem. onbehoorlijk, niet correct gedragen; manquer de ~ à une femme zich vrijpostig gedragen tegenover een vrouw; montrer, témoigner du ~ à, envers, pour qn. iem. achting betonen, betuigen; garder, tenir qn. en ~ iem. in bedwang houden, iem. onder schot houden
3.3 présenter ses respects à qn. iem. de groeten doen
4.1 ~ de soi zelfrespect
6.1 sauf le ~ que je vous dois, sauf votre ~ met uw verlof, met uw welnemen, met alle respect.

Figure 11. The entry s.v. respect in the Van Dale FR/NL dictionary
3.3 The ECDs: access via semantic criteria

The above dictionaries, specialized and general, monolingual and bilingual, use syntactic criteria for the organization of collocational information. The only dictionaries we are aware of that base their organization on semantic criteria as well are the explanatory and combinatory dictionaries which have been published by Mel'chuk and his research group, such as Mel'chuk et al. (1984), etc. The access to collocations is via the base entry and the lexical functions applicable to the base entry (see Section 2.3.2 and note 9). A small part of the entry s.v. respect is given in Figure 12.[17]
Oper1: avoir, éprouver [ART ~] [Toute la population a un profond respect pour cet artiste émérite]
continuellement Oper1: vivre [dans le ~] [Cette famille vit dans le respect de ses ancêtres]
ContOper1: garder [ART ~] [Malgré les propos diffamatoires des journalistes envers ce député, ses proches collaborateurs ont gardé un profond respect pour lui]
FinOper1: perdre [ART ~ / tout ~] [Les dirigeants ont perdu tout respect pour ces artistes]
Caus/3\Oper1: inciter [N à ART ~] [Les parents les incitent au respect des valeurs morales; L'honnêteté de Louise incite Paul et Jean au respect de cette femme]
nonOper1: ignorer [tout ~] [Jean ignore tout respect pour ses parents]
Oper: jouir [de ART ~], avoir [le ~] [Pierre jouit du respect de ses subordonnés]
ContOper: conserver [le ~] [Malgré les propos diffamatoires des journalistes envers ce député, ce dernier a conservé le respect de ses proches collaborateurs]
FinFunc0: disparaître [Le respect du public pour ce ministre a disparu]
CausFunc0: se mériter [ART ~] [Par son travail consciencieux, il se mérita le respect de ses collègues]

Figure 12. A fragment of the collocation part of the entry s.v. respect in the ECD
An application of the ECD description technique is found in Cohen's dictionary (Cohen 1986) of collocations of the sublanguage of economy (stock market and conjuncture).[18] Instead of using lexical functions, she uses paraphrases of a relevant subset of these; given that many of the items serving as entry words in Cohen's dictionary denote processes, the dictionary indicates phases of the processes, like the start, increase, decrease and end. We reproduce in Figure 13 a part of the entry for FR emprunt as an example.
emprunt
(columns: nouns; (subj. of) verbs; (obj. of) verbs; adjectives)
START: nouns: émission, lancement; (obj. of) verbs: émettre, lancer
INCREASE: nouns: accroissement, augmentation; (subj. of) verbs: s'accroître, augmenter, monter; (obj. of) verbs: accroître, augmenter; adjectives: considérable, élevé, gros
UNDETERMINED:
DECREASE: nouns: baisse, diminution, réduction; (subj. of) verbs: baisser, diminuer; (obj. of) verbs: réduire, restreindre; adjectives: petit
END: (obj. of) verbs: clore, liquider, rembourser, restituer

Figure 13. The entry s.v. emprunt in the dictionary by Cohen (1986)
The two-dimensional presentation of the material in Cohen (1986) supports access in different ways: the user starts with a base lemma and then can either look up collocations in terms of the phases of the process denoted by the noun, selecting thereafter the adequate grammatical realization, or, alternatively, search by jointly using both semantic and grammatical properties.
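The two access paths just described can be sketched as a lookup in a two-dimensional structure, indexed by process phase and by grammatical realization. The data are a fragment of the emprunt entry; the assignment of items to cells is an assumption based on our reading of Figure 13, and the function name `collocates` and the keys `noun`, `subj_verb`, `obj_verb` and `adjective` are hypothetical labels, not Cohen's.

```python
# Fragment of the entry for "emprunt": phase -> grammatical realization -> collocates.
# Cell assignment is an assumption; key names are illustrative only.
EMPRUNT = {
    "START": {
        "noun": ["émission", "lancement"],
        "obj_verb": ["émettre", "lancer"],
    },
    "INCREASE": {
        "noun": ["accroissement", "augmentation"],
        "subj_verb": ["s'accroître", "augmenter", "monter"],
        "obj_verb": ["accroître", "augmenter"],
        "adjective": ["considérable", "élevé", "gros"],
    },
}


def collocates(entry, phase=None, realization=None):
    """Filter an entry by phase, by grammatical realization, or by both."""
    results = []
    for p, cells in entry.items():
        if phase is not None and p != phase:
            continue
        for r, items in cells.items():
            if realization is not None and r != realization:
                continue
            results.extend((p, r, item) for item in items)
    return results


# Path 1: choose the phase first, then inspect the available realizations.
print(collocates(EMPRUNT, phase="INCREASE"))
# Path 2: use semantic and grammatical properties jointly.
print(collocates(EMPRUNT, phase="INCREASE", realization="adjective"))
```

Leaving either argument unset corresponds to reading a whole row or a whole column of the printed table.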
3.4 Summary, new proposals

The table in Figure 14 contains an exemplary summary of the types of information we can find in dictionaries and of the ways in which this information can be accessed.
(columns: Information given in dictionaries; Example cited; Access via ...; Example cited)
1. a collocation is used, it is attested; example: any; access via: definitions, examples, etc.; example: any
2. collocation related with a given reading of the base (explicitly); example: Van Dale bilinguals; access via: base + reading (number); example: Van Dale bilinguals
3. category of base and collocate; example: Van Dale, [Ilgenfritz e.a. 1989], [Laine 1993]; access via: base + reading + cat. code; example: Van Dale
4. for N-V collocations: gramm. function of N; example: Robert/Collins, [Cohen 1986]; access via: base + r., + position (markup); example: Robert/Collins
5. semantic classification of collocations; example: ECDs, [Cohen 1986]; access via: base + r., lexical f.; example: ECDs, [Cohen 1986]

Figure 14. Main features of collocation treatment and access in dictionaries
New proposals, or simply new practical solutions, come up in dictionaries which consistently follow an onomasiological view. This is discussed, with a view to the plans for a Russian/English idiom dictionary, in the paper by Dobrovol'skij (1994). The author uses a set of local (or: partial) conceptual hierarchies, inspired by prototype theory, to organize the backbone of the dictionary, and he then links the idioms described in the articles to the nodes of these hierarchy trees. A similar approach is followed in the Longman Language Activator, an onomasiological dictionary for language production. The LLA has collocations, idioms and normal single word lexemes as entries, all related by a semantic superstructure spanning small hierarchies, for about 1,000 topics. Collocations and idioms are treated here on a par with other lexemes. Access is by the meaning of the multiword item as a whole.
Other proposals for dictionary structure are made in Oubine's (Oubine 1994) bilingual Russian-English dictionary of lexical intensifiers (cf. Mel'chuk's lexical function MAGN). Given that a bilingual dictionary is aimed at, and that full equivalence cannot be stated for all collocations, the author opts for keeping the two languages separate, pointing from bases of one language to their equivalents in the other language; the base entries contain alphabetical lists of intensifiers, each with examples and, optionally, usage notes. Access to collocations is normally given via the bases; a reverse part can be accessed by the intensifiers themselves, leading to an index of bases modifiable by a given intensifier. Another Russian/English collocational dictionary is described in Benson (1994).
Collocations seem to require multidimensional description (syntactic, semantic, pragmatic, relation with the domain model, etc.). Representing this information and making it accessible, also in different 'ad hoc' combinations, makes a flexible dictionary structure necessary, as it is best achieved with computational tools. Defining the structure of a computational collocation dictionary in a way going beyond the simple encoding in Heid/Freibott (1991) is an interesting task.[19]
4. Acquisition and application of collocational information

4.1 Acquiring collocational information from text

Much of the linguistic and lexicographic discussion about collocations has long been based on a few examples, mostly made up by linguists for the purpose of exemplification. Researchers thus felt that there is a need for lists of collocations collected from textual material. Others, like Fontenelle (1992a), Fontenelle (1992b), have developed and used computational tools to identify collocation candidates in dictionaries and to extract collocation lists from machine readable dictionaries.
On the other hand, practical lexicographers themselves need tools for corpus analysis which should give access to collocation candidates: collocational description in dictionaries is an area where improvements are still possible and necessary, and the availability of collocation lists extracted from texts is a great advantage for the semantic description of lexical items.
Some of the tools for extracting collocations from texts are based on statistical methods. A few statistical measures of similarity have been used to identify how often words appear together and how similar the contexts are where these words appear. The measures most frequently used are "mutual information", "t-score" and "z-score". There are as well other similarity measures which have been applied in tools for corpus exploration.[20]
We cannot, in the framework of this article, discuss the different statistical tools in all detail, and we will thus restrict ourselves to an informal account of the workings of the simplest statistical tools and to a discussion of the choices which lexicographers and computational linguists have to make when applying statistical tools with a view to the retrieval of collocation candidates from text. On that basis, we can identify a few tasks for both research and tool development: essentially the impression is that the combination of both statistical measures and linguistic information (e.g. from pre- and post-analysis steps) is a successful practical way forward.
4.1.1 Simple statistical measures

The most prominent statistical measures used, implementations of which are available to many lexicographers, are the "mutual information index" and the "t-score test".[21]
4.1.1.1 Mutual information

The mutual information index (MI, for short) is used to measure the association between two words; for a given corpus, we can count how often a given word occurs in that corpus (frequency). We can do such statistics for all word forms in a corpus (and use, for example, the information on very frequent or very rare words to decide whether they should go into a dictionary built for that corpus). By dividing the frequency of a word form by the number of word forms in the corpus, we get the lexical probability of the word form (how often, in relation to the overall number of word forms in the corpus, does it appear?).
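As a minimal sketch of this first step, lexical probabilities can be computed from a tokenized corpus as follows. The function name `lexical_probabilities` and the toy corpus are assumptions made for illustration; tokenization by whitespace is a simplification.

```python
from collections import Counter


def lexical_probabilities(tokens):
    """Frequency of each word form divided by the number of word forms in the corpus."""
    counts = Counter(tokens)
    total = len(tokens)
    return {form: n / total for form, n in counts.items()}


# Toy corpus of nine word forms (invented example)
tokens = "the bank raised the rate and the market fell".split()
probs = lexical_probabilities(tokens)
print(probs["the"])  # "the" occurs 3 times among 9 word forms
```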
When measuring mutual information, we do not only observe the probability of single word forms, but also the probability of combinations of two words ("bigrams"). This leads to three probability values which can be compared:
- the probability of the first word form, $w_1$: $P(w_1)$;
- the probability of the second word form, $w_2$: $P(w_2)$;
- the probability of a pair $(w_1, w_2)$ built up of the two word forms: $P(w_1, w_2)$.
These three values are compared: the probability that $w_1$ and $w_2$ cooccur (e.g. next to each other) is divided by the product of the individual probabilities of $w_1$ and $w_2$.[22] When computing MI, the "window" (or: "span") within which the word forms have to appear to be taken as "cooccurring" can be defined by the user. We may look just at adjacent word forms or at word forms which are up to a few (e.g. 3, 4 or 5) word forms distant from each other.
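The comparison of the three probabilities can be sketched as follows. The text above describes only the ratio $P(w_1, w_2) / (P(w_1) \cdot P(w_2))$; taking the base-2 logarithm of that ratio is the usual convention for association measures and is an assumption here, as are the function name `mutual_information`, the `span` parameter, the requirement that the second word follows the first, and the toy corpus.

```python
import math
from collections import Counter


def mutual_information(tokens, w1, w2, span=1):
    """log2 of P(w1, w2) / (P(w1) * P(w2)).

    A cooccurrence is counted when w2 appears within `span` word forms
    to the right of an occurrence of w1. Returns None if the pair never occurs.
    """
    n = len(tokens)
    freq = Counter(tokens)
    pair_count = 0
    for i, tok in enumerate(tokens):
        if tok == w1 and w2 in tokens[i + 1 : i + 1 + span]:
            pair_count += 1
    if pair_count == 0:
        return None
    p_pair = pair_count / n
    return math.log2(p_pair / ((freq[w1] / n) * (freq[w2] / n)))


# Invented ten-word corpus
tokens = "pay attention please pay the bill and pay attention now".split()
mi_adjacent = mutual_information(tokens, "pay", "attention", span=1)
mi_window = mutual_information(tokens, "pay", "bill", span=2)
print(mi_adjacent, mi_window)
```

With `span=1` only adjacent word forms count; widening the span lets pay ... bill be found across the intervening article.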
It is quite evident what the comparison will tell us: the MI value is high when most of the occurrences of a given item are in fact cooccurrences with the second item selected. If we compare MI for a set of pairs of, say, noun and adjective, keeping the noun constant, the adjectives which most typically cooccur with our noun will have the highest MI index. "Typically" means: if the adjective is used next to a noun at all, it is very likely that it is the noun for which the MI is high. The famous célibataire endurci should be easy to detect in texts this way.
MI is dependent on the frequency of word forms. If we calculate MI for a list of adjectives collocating with a noun, we do not see, however, in the MI values, whether the adjectives are frequent or not.
With MI, we have to face the "sparse data problem": given that MI is a similarity (or typicality) measure, it will yield high typicality values in cases where an item is rare but cooccurs (by chance or by rule) with another one in each of its rare occurrences in the text. Rare items may thus be ranked much higher than one would intuitively like them to be.
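The effect can be made concrete with a small calculation. All counts below are invented for illustration, and the function name `mi_from_counts` is hypothetical; as before, the base-2 logarithm of the probability ratio is an assumed convention.

```python
import math


def mi_from_counts(f_pair, f1, f2, n):
    """MI computed from raw counts: pair frequency, word frequencies, corpus size."""
    return math.log2((f_pair / n) / ((f1 / n) * (f2 / n)))


N = 1_000_000  # hypothetical corpus size in word forms

# Two word forms that each occur once, and occur together that one time
rare = mi_from_counts(f_pair=1, f1=1, f2=1, n=N)

# A well-attested pair: 200 cooccurrences of words occurring 1000 and 2000 times
frequent = mi_from_counts(f_pair=200, f1=1000, f2=2000, n=N)

print(rare, frequent)  # about 19.9 versus about 6.6
```

A single chance cooccurrence of two hapax forms thus outranks a pair attested two hundred times, which is exactly the ranking one would intuitively not want.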
4.1.1.2 T-score

This problem of high ranking of rare forms can be remedied by use of the t-test. The t-test operates on pairs of words. It finds those additional words which are more likely to cooccur with one of the two words from the pair than with the other. The result of the t-test comes as positive and negative values. The highest and the lowest values are significant: they indicate strong association with one or the other word. T-score is also not reliable for low frequencies. It tends to indicate the frequent words that cooccur with the target words, and it allows one to separate near synonyms by showing frequent combination partners. Church et al. (1991) have applied it to discriminate EN strong and powerful by finding nouns which frequently cooccur with one of these adjectives.
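A simplified sketch of such a comparison is given below. The statistic is not spelled out in the text above; the formula used here, the difference of the two cooccurrence counts divided by the square root of their sum, is a common approximation and is an assumption, as is the premise that both adjectives are about equally frequent overall. The function name `t_score` and all counts are invented for illustration.

```python
import math


def t_score(f_with_w1, f_with_w2):
    """Approximate t for a noun seen f_with_w1 times with word 1 and f_with_w2 times with word 2.

    Positive values point to word 1, negative values to word 2.
    """
    if f_with_w1 + f_with_w2 == 0:
        return 0.0
    return (f_with_w1 - f_with_w2) / math.sqrt(f_with_w1 + f_with_w2)


# Hypothetical counts: noun -> (cooccurrences with "strong", with "powerful")
counts = {"tea": (30, 0), "computer": (1, 25), "argument": (12, 10)}

ranked = sorted(counts, key=lambda noun: t_score(*counts[noun]), reverse=True)
for noun in ranked:
    print(noun, round(t_score(*counts[noun]), 2))
```

Nouns at the top of the ranking are associated with the first adjective, nouns at the bottom with the second, and nouns near zero do not discriminate between the two. Because the counts enter directly, a pair attested only once or twice cannot reach an extreme value, in contrast to MI.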
4.1.2 Choices in applying statistical measures for identifying collocations

The statistical measures described above, as well as modifications thereof, are often used to identify collocation candidates in texts; such use depends of course on a number of assumptions and choices.
- The distance of items compared. MI and t-score can be calculated on immediately adjacent items or on items occurring within a certain "window" or "span" (cf. the terminology of Sinclair 1991, Clear 1994).
- The impact of text structure. The measures can be calculated on bigrams within or across sentence boundaries. Usually, we would assume that limiting ourselves to occurrences within one sentence would lead to more relevant results than ignoring sentence boundaries.[23]
- The impact of lemmatization and categorial information. The discussion above, in Section 2.2, led to the assumption that we can describe collocations in terms of category combinations (n+v, n+n, n+adj, adj+adj, v+adv), in part even of partial syntactic structures, such as "verb+object", "verb+subject", "noun+attributive adjective", etc.
Most of the work done so far in using MI and t-score was performed on English material. The impact of word form variation there is not as important as with inflecting languages, like German or the Romance languages. For these it seems useful to have an option, in statistical programs, to calculate the measures for lemmas rather than word forms. Moreover, it can be useful to carry out the statistical computation only on, say, adjectives appearing next to a noun, or on verbs and their nominal objects, etc.,[24] i.e. to restrict search and statistical computation according to syntactic environments.
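The following sketch shows what such a restriction might look like on lemmatized and part-of-speech tagged text: only adjective+noun sequences are counted, the counts are kept per lemma pair rather than per word form pair, and pairs never cross a sentence boundary. The annotation format (triples of word form, lemma and tag), the tag names `ADJ` and `NOUN`, and the function name `adj_noun_pairs` are assumptions made for illustration; no particular tagger is implied.

```python
from collections import Counter

# Each sentence is a list of (word form, lemma, part-of-speech tag) triples.
# The annotation below is hand-made for illustration.
sentences = [
    [("die", "der", "DET"),
     ("eingefleischten", "eingefleischt", "ADJ"),
     ("Junggesellen", "Junggeselle", "NOUN")],
    [("ein", "ein", "DET"),
     ("eingefleischter", "eingefleischt", "ADJ"),
     ("Junggeselle", "Junggeselle", "NOUN"),
     ("lacht", "lachen", "VERB")],
]


def adj_noun_pairs(sentences):
    """Count adjacent adjective+noun lemma pairs, sentence by sentence."""
    pairs = Counter()
    for sent in sentences:
        # zip over one sentence only, so no pair spans a sentence boundary
        for (_, lemma1, tag1), (_, lemma2, tag2) in zip(sent, sent[1:]):
            if tag1 == "ADJ" and tag2 == "NOUN":
                pairs[(lemma1, lemma2)] += 1
    return pairs


pairs = adj_noun_pairs(sentences)
print(pairs)
```

The two inflected variants are counted as one lemma pair, so that a measure such as MI or t-score computed on these counts is not diluted by word form variation.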
There is a range of choices in the application of the statistical tools, and the axis on which these choices can be arranged basically has to do with the amount of linguistic information which is kept track of, either by pre- or postprocessing: the statistical measures may be applied to material selected according to certain linguistic criteria (e.g. by use of concordances), or relevant material is selected according to linguistic criteria from the set of data extracted by statistical processing.
Proposals for tool building in view of collocation extraction should thus be staged according to the amount of information available along with the corpus text:
- raw text, possibly with sentence boundaries,
- text with part-of-speech annotations,
- lemmatized and morphosyntactically annotated text,
-