Detection of Relations in Textual Documents
Manuela Kunze,
Dietmar Rösner
University of Magdeburg Knowledge Based Systems and Document Processing
Kunze, Rösner: Detection of Relations in Textual Documents 2
Introduction
http://en.wikipedia.org/wiki/Unsupervised_learning
Introduction
• to extract information from text, simple techniques such as pattern matching can be used
• additional knowledge is required:
  – `Thursday': a day of the week
  – meaning of
    • (implicit) `open' vs. `close'
    • `Pay-what-you-wish'
• text understanding / techniques of NLP
  – `Exhibition of over 30 color photographs and stories of life in China's Yunnan Province …'
Introduction
ontologies contain information about:
• definition/description of concepts
• description of instances
• kind of relation (name, type)
  – definition of domain and range values
  – characteristics of the relation: cardinality, transitivity, ...
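The kinds of information listed above can be pictured as a small data structure. The following is only a sketch: the class and field names (`Relation`, `Ontology`, `range_`, `max_cardinality`, `relations_for`) are illustrative assumptions and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Relation:
    """One relation of an ontology (all field names are illustrative)."""
    name: str                               # e.g. "location"
    domain: str                             # concept the relation starts from
    range_: str                             # concept the relation points to
    max_cardinality: Optional[int] = None   # None means "not known"
    transitive: bool = False


@dataclass
class Ontology:
    """Concepts, instances and relations, as listed on the slide."""
    concepts: Dict[str, str] = field(default_factory=dict)    # name -> description
    instances: Dict[str, str] = field(default_factory=dict)   # instance -> concept
    relations: List[Relation] = field(default_factory=list)

    def relations_for(self, concept: str) -> List[Relation]:
        """Return all relations whose domain is the given concept."""
        return [r for r in self.relations if r.domain == concept]
```

The characteristics (`max_cardinality`, `transitive`) are optional here because, as the later slides note, they are not extracted by the methods presented.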
Natural Language Processing
• NLP techniques:
  – case frame analysis
  – exploiting syntactic structures
  – corpus-based IE for an initial ontology
• corpus:
  – autopsy protocols (400 protocols)
  – different document parts:
    • findings
    • histological findings
    • background
    • discussion
    • …
  – short linguistic structures
  – typical attribute-value structures
Overview
Case Frame
Analysis of Specific Syntactic Structures
Discussion/Conclusion
Case Frames
• resources:
  – results from syntactic parser:
    <NP TYPE="COMPLEX" RULE="NPC3" GEN="MAS" NUM="SG" CAS="NOM">
      <NP TYPE="FULL" RULE="NP1" CAS="NOM" NUM="SG" GEN="MAS">
        <N>Flachschnitt</N>
      </NP>
      <PP RULE="PP1" CAS="AKK">
        <PRP CAS="AKK">in</PRP>
        <NP TYPE="FULL" RULE="NP2" CAS="AKK" NUM="SG" GEN="NTR">
          <DETD>das</DETD>
          <N>Zungengewebe</N>
        </NP>
      </PP>
    </NP>
  – results from semantic tagger
  – description of case frames
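To make the first resource concrete, the parser output shown above can be read with a standard XML library. This is a minimal sketch, not the authors' tooling: the function name `head_and_attachments` is hypothetical, and it assumes the head noun is always in the first embedded NP.

```python
import xml.etree.ElementTree as ET

# Parser output for "Flachschnitt in das Zungengewebe", copied from the slide.
PARSE = """<NP TYPE="COMPLEX" RULE="NPC3" GEN="MAS" NUM="SG" CAS="NOM">
  <NP TYPE="FULL" RULE="NP1" CAS="NOM" NUM="SG" GEN="MAS"><N>Flachschnitt</N></NP>
  <PP RULE="PP1" CAS="AKK">
    <PRP CAS="AKK">in</PRP>
    <NP TYPE="FULL" RULE="NP2" CAS="AKK" NUM="SG" GEN="NTR">
      <DETD>das</DETD><N>Zungengewebe</N>
    </NP>
  </PP>
</NP>"""


def head_and_attachments(xml_text):
    """Return the head noun and one (preposition, case, noun) tuple per PP."""
    root = ET.fromstring(xml_text)
    head = root.find("./NP/N").text          # noun of the first embedded NP
    attachments = []
    for pp in root.findall("./PP"):          # PPs attached to the complex NP
        preposition = pp.find("PRP").text
        noun = pp.find("./NP/N").text
        attachments.append((preposition, pp.get("CAS"), noun))
    return head, attachments
```

The tuples it returns carry exactly the features a case frame refers to: the preposition, the case of the NP, and the noun whose semantic category is checked.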
Case Frames
• (corpus-based) definition of roles for a concept
  – `Flachschnitt' (flat cut)
    • `location'
      – sem. category: `tissue'
      – PP, case of NP: accusative, preposition: `in'
  – `Herausschleudern' (being thrown out)
    • `patient'
      – sem. category: `body-hum'
      – NP, case of NP: genitive
    • `location'
      – sem. category: `vehicle'
      – PP, case of NP: dative, preposition: `aus'
Case Frames
…
<CONCEPT TYPE="medicalOperation">
  <WORD>Flachschnitt</WORD>
  <DESC>medizinischer Schnitt</DESC>
  <SLOTS>
    <RELATION TYPE="LOCATION">
      <ASSIGN_TO>TISSUE</ASSIGN_TO>
      <FORM>P(akk, fak, in)</FORM>
      <CONTENT>in das Zungengewebe</CONTENT>
    </RELATION>
  </SLOTS>
</CONCEPT>
<CONCEPT TYPE="traffic-event">
  <WORD>Herausschleudern</WORD>
  <DESC>event</DESC>
  <SLOTS>
    <RELATION TYPE="PATIENT">
      <ASSIGN_TO>BODY-HUM</ASSIGN_TO>
      <FORM>N(gen, fak)</FORM>
      <CONTENT>des Koerpers</CONTENT>
    </RELATION>
    <RELATION TYPE="LOCATION">
      <ASSIGN_TO>VEHICLE</ASSIGN_TO>
      <FORM>P(dat, fak, aus)</FORM>
      <CONTENT></CONTENT>
    </RELATION>
  </SLOTS>
</CONCEPT>
…
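How such a case frame could be matched against a parsed phrase can be sketched as follows. Several things here are assumptions and not taken from the slides: the function names `parse_form` and `fill_slots`, the `SEM_TAGS` dictionary standing in for the semantic tagger, and the reading of `FORM` in which the first argument is the case and the last argument is the preposition. The middle argument (`fak`) is not explained on the slides and is ignored. Only PP slots are handled.

```python
import re
import xml.etree.ElementTree as ET

# Shortened copy of the 'Flachschnitt' case frame from the slide.
FRAME = """<CONCEPT TYPE="medicalOperation">
  <WORD>Flachschnitt</WORD>
  <SLOTS>
    <RELATION TYPE="LOCATION">
      <ASSIGN_TO>TISSUE</ASSIGN_TO>
      <FORM>P(akk, fak, in)</FORM>
    </RELATION>
  </SLOTS>
</CONCEPT>"""

# Hypothetical stand-in for the semantic tagger: noun -> semantic category.
SEM_TAGS = {"Zungengewebe": "TISSUE"}


def parse_form(form):
    """Split e.g. 'P(akk, fak, in)' into ('P', ['akk', 'fak', 'in'])."""
    match = re.fullmatch(r"\s*(\w+)\((.*)\)\s*", form)
    return match.group(1), [arg.strip() for arg in match.group(2).split(",")]


def fill_slots(frame_xml, head, attachments):
    """Return (relation, head, filler) triples for PPs that fit a slot.

    attachments: (preposition, case, noun) tuples taken from the parser output.
    """
    frame = ET.fromstring(frame_xml)
    if frame.findtext("WORD") != head:
        return []
    triples = []
    for relation in frame.iter("RELATION"):
        kind, args = parse_form(relation.findtext("FORM"))
        category = relation.findtext("ASSIGN_TO")
        for preposition, case, noun in attachments:
            fits_syntax = kind == "P" and args[0] == case.lower() and args[-1] == preposition
            if fits_syntax and SEM_TAGS.get(noun) == category:
                triples.append((relation.get("TYPE"), head, noun))
    return triples
```

For the phrase `Flachschnitt in das Zungengewebe', calling `fill_slots(FRAME, "Flachschnitt", [("in", "AKK", "Zungengewebe")])` yields a single `LOCATION` triple, because preposition, case and semantic category all agree with the slot.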
Case Frames
• coverage of phrases like `fracture of elbow joint'?
• abstraction
  – `fracture' (sem. category: `trauma')
    • role `patient': sem. category: `bone'
  – `bruise' (sem. category: `trauma')
    • role `patient': sem. category: `organ'
  – `hematoma' (sem. category: `trauma')
    • role `patient': sem. category: `tissue'
• concept x (sem. category: `trauma')
  – role `patient': sem. category: `body-part'
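The abstraction step can be illustrated by computing the most specific category that covers all observed role fillers. This is a sketch under stated assumptions: the `HYPERNYM` table is a toy taxonomy in which `bone', `organ' and `tissue' are direct hyponyms of `body-part' (as the slide implies), the root `entity' is invented, and the function names are hypothetical.

```python
# Toy taxonomy (assumption): category -> direct hypernym.
HYPERNYM = {
    "bone": "body-part",
    "organ": "body-part",
    "tissue": "body-part",
    "body-part": "entity",
}


def ancestors(category):
    """Return [category, hypernym, hypernym of hypernym, ...] up to the root."""
    chain = [category]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def least_general_hypernym(categories):
    """Most specific category that subsumes every category in the list."""
    categories = list(categories)
    if not categories:
        raise ValueError("at least one category is required")
    common = ancestors(categories[0])          # ordered from specific to general
    for category in categories[1:]:
        others = set(ancestors(category))
        common = [c for c in common if c in others]
    if not common:
        raise ValueError("categories share no hypernym")
    return common[0]
```

With the fillers observed for `fracture', `bruise' and `hematoma', `least_general_hypernym(["bone", "organ", "tissue"])` gives `body-part`, which is the restriction shown for concept x.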
Case Frames
• results:
  – relations are defined by the case frame
    • name/type of relation
    • domain, range
  – corpus-based abstractions:
    • redefinition of semantic restriction
      – use the least general hypernym as semantic restriction
• not yet extracted:
  – information about the characteristics of a relation
Analysis of Specific Syntactic Structures
• from general to specific information
• resources:
  – results from syntactic parser
  – results from semantic tagger
  – description of interpretation of syntactic structures
• Which word class can be interpreted as concept/instance?
• Which word class describes a relation?
  – adjective in an NP: describes the noun in the NP (relation `prop')
  – negations: negate concepts, verbs, or properties of a concept
  – particle: modification of adjectives
Analysis of Specific Syntactic Structures
CLMed N ADJ
prop(N, ADJ)
N interpreted as concept
ADJ interpreted as concept
results:
prop_catadj(N,ADJ)
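The step from the general relation prop(N, ADJ) to prop_catadj(N, ADJ) can be sketched as a small function. The name `refine_prop` and the dictionary layout are assumptions. The fallback category `unspecified' for adjectives unknown to the semantic tagger follows the OWL output shown in the slides' `hardly detectable hematomas' example.

```python
def refine_prop(noun_category, adjective_category=None):
    """Specialise prop(N, ADJ) with the semantic category of the adjective.

    Returns the relation name together with its domain and range.
    An adjective without a known category falls back to 'unspecified'.
    """
    category = adjective_category or "unspecified"
    return {
        "name": "prop_" + category,
        "domain": noun_category,
        "range": category,
    }
```

For `liver tissue bloodless', `refine_prop("tissue", "blood-concentration")` gives the relation `prop_blood-concentration` with domain `tissue` and range `blood-concentration`.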
Analysis of Specific Syntactic Structures
`liver tissue bloodless'
Steps:
• nouns and adjectives are interpreted as concept/instance
• adjectives describe a relation
  – in general: `prop'
[Figure: the instance `liver_tissue' (concept `liver tissue', a kind of `tissue') is linked to the instance `bloodless' (concept `bloodless', a kind of `blood concentration') by the relation prop_blood-concentration; legend: concept, instance, relation]
Analysis of Specific Syntactic Structures
`liver tissue bloodless'
…
<owl:Class rdf:ID="lebergewebe">
  <rdfs:subClassOf>
    <owl:Class rdf:ID="tissue"/>
  </rdfs:subClassOf>
</owl:Class>
<owl:Class rdf:ID="blood-concentration"/>
<owl:Class rdf:ID="blutleer">
  <rdfs:subClassOf rdf:resource="#blood-concentration"/>
</owl:Class>
<owl:ObjectProperty rdf:ID="prop_blood-concentration">
  <rdfs:domain rdf:resource="#tissue"/>
  <rdfs:range rdf:resource="#blood-concentration"/>
</owl:ObjectProperty>
<lebergewebe rdf:ID="Lebergewebe_6">
  <prop_blood-concentration>
    <blutleer rdf:ID="blutleer_7"/>
  </prop_blood-concentration>
</lebergewebe>
…
Analysis of Specific Syntactic Structures
"kaum wahrnehmbare Unterblutungen" (Engl. "hardly detectable hematomas")
results of syntactic parser:
<NP TYPE="FULL" RULE="NP4" CAS="_" NUM="PL" GEN="FEM">
  <ADJP RULE="ADJP1">
    <ADV>kaum</ADV>
    <ADJ>wahrnehmbare</ADJ>
  </ADJP>
  <N>Unterblutungen</N>
</NP>
results of semantic tagger:
– `kaum': weak-graduation
– `wahrnehmbar': unknown token
– `Unterblutung': trauma
resources for interpretation:
• N: concept/instance
• ADJ:
  – concept/instance
  – rel: prop
• ADV:
  – concept/instance
  – rel: mod
adverb specifies adjective
adjective specifies noun
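The interpretation rules above (the adjective specifies the noun via `prop', the adverb specifies the adjective via `mod') can be applied directly to the parser output. This is a sketch only: `interpret_np` and `SEM_TAGS` are hypothetical names, the tag table is keyed by surface forms for brevity (the tagger on the slide reports lemmas), and unknown tokens are mapped to `unspecified' as in the OWL output of this example.

```python
import xml.etree.ElementTree as ET

# Parser output for "kaum wahrnehmbare Unterblutungen", copied from the slide.
NP_XML = """<NP TYPE="FULL" RULE="NP4" CAS="_" NUM="PL" GEN="FEM">
  <ADJP RULE="ADJP1">
    <ADV>kaum</ADV>
    <ADJ>wahrnehmbare</ADJ>
  </ADJP>
  <N>Unterblutungen</N>
</NP>"""

# Stand-in for the semantic tagger; 'wahrnehmbare' is deliberately missing.
SEM_TAGS = {"kaum": "weak-graduation", "Unterblutungen": "trauma"}


def interpret_np(xml_text, sem_tags):
    """Return (relation, subject, object) triples for an NP with ADJP modifiers."""
    np = ET.fromstring(xml_text)
    noun = np.findtext("N")
    triples = []
    for adjp in np.findall("ADJP"):
        adjective = adjp.findtext("ADJ")
        adjective_category = sem_tags.get(adjective, "unspecified")
        # adjective specifies noun
        triples.append(("prop_" + adjective_category, noun, adjective))
        for adverb in adjp.findall("ADV"):
            adverb_category = sem_tags.get(adverb.text, "unspecified")
            # adverb specifies adjective
            triples.append(("mod_" + adverb_category, adjective, adverb.text))
    return triples
```

Applied to the phrase above, the sketch produces one `prop_unspecified` triple between noun and adjective and one `mod_weak-graduation` triple between adjective and adverb.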
Analysis of Specific Syntactic Structures
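`hardly detectable hematomas'
Steps:
• nouns, adjectives and adverbs are interpreted as concept/instance
• adjectives and adverbs describe relations
[Figure: the instance `hematoma' (concept `hematoma', a kind of `trauma') is linked to the instance `detectable' (category `unspecified') by the relation prop_unspecified; `detectable' is linked to the instance `hardly' (concept `hardly', a kind of `weak-graduation') by the relation mod_weak-graduation; legend: concept, instance, relation]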
Analysis of Specific Syntactic Structures
`hardly detectable hematomas'
<owl:Class rdf:ID="unterblutung">
  <rdfs:subClassOf rdf:resource="#trauma"/>
</owl:Class>
<owl:Class rdf:ID="trauma"/>
<owl:Class rdf:ID="wahrnehmbar">
  <rdfs:subClassOf rdf:resource="#unspecified"/>
</owl:Class>
<owl:Class rdf:ID="unspecified"/>
<owl:Class rdf:ID="kaum">
  <rdfs:subClassOf rdf:resource="#weak-graduation"/>
</owl:Class>
<owl:Class rdf:ID="weak-graduation"/>
Analysis of Specific Syntactic Structures
`hardly detectable hematomas'
<owl:ObjectProperty rdf:ID="mod_weak-graduation">
  <rdfs:domain rdf:resource="#unspecified"/>
  <rdfs:range rdf:resource="#weak-graduation"/>
</owl:ObjectProperty>
<owl:ObjectProperty rdf:ID="prop_unspecified">
  <rdfs:domain rdf:resource="#trauma"/>
  <rdfs:range rdf:resource="#unspecified"/>
</owl:ObjectProperty>
<unterblutung rdf:ID="Unterblutungen_5">
  <prop_unspecified rdf:resource="#wahrnehmbare_4"/>
</unterblutung>
<wahrnehmbar rdf:ID="wahrnehmbare_4">
  <mod_weak-graduation rdf:resource="#kaum_3"/>
</wahrnehmbar>
<kaum rdf:ID="kaum_3"></kaum>
Analysis of Specific Syntactic Structures
[Figure: ontology graph; legend: concept, instance, relation]
Protégé plugin for visualization: Ontoviz
Phrases like:
• NP NP NP
• NP N Adj Conj Adj
• NP N conj N Adj
• …
Analysis of Specific Syntactic Structures
• results
  – definition of concepts/instances
  – corpus-based definition/concretion of relations:
    • prop → prop_catADJ
    • information about domain and range
• not extracted:
  – information about the characteristics of a relation
Conclusion
• NLP techniques for extraction of information
  – analyse syntactic structures
  – information about semantic categories
  – result: corpus-based description of an initial ontology
• case frame analysis
  – relations are described in the case frame
  – disadvantage: case frames have to be created
  – advantage: a definition of the relation
• analysis of specific syntactic structures
  – a general interpretation of tokens and the syntactic structures
  – redefined by results from the semantic tagger
  – disadvantage: in some cases, only the general relation definition is delivered
  – advantage: less effort to describe the resources
Conclusion
• no information about the characteristics of a relation (cardinality, …)
• solutions
  – analyse occurrences in the corpus
    • corpus-based assumption about cardinality
  – integration of additional knowledge
    • initial domain-specific ontology
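One way to read `corpus-based assumption about cardinality' is to count, for each relation, how many distinct fillers a single instance receives in the corpus. The sketch below illustrates that reading and is not the authors' method. The function name and the instance identifiers in the usage note are hypothetical.

```python
from collections import defaultdict


def observed_max_cardinality(triples):
    """For each relation, the largest number of distinct fillers on one subject.

    triples: iterable of (relation, subject, object) extracted from the corpus.
    The result is only what the corpus shows; the true cardinality may be higher.
    """
    fillers = defaultdict(set)               # (relation, subject) -> objects
    for relation, subject, obj in triples:
        fillers[(relation, subject)].add(obj)

    maxima = defaultdict(int)                # relation -> observed maximum
    for (relation, _subject), objects in fillers.items():
        maxima[relation] = max(maxima[relation], len(objects))
    return dict(maxima)
```

For example, if every cut instance has exactly one `location` filler while one tissue instance has two different `prop` fillers, the function returns 1 for `location` and 2 for `prop`. A relation that never exceeds 1 in a large corpus could then be assumed to be functional, subject to manual review.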
Key Aspects for IE
• ‘conceptual’ preprocessing steps: names of concepts occur in different linguistic structures; compound vs. complex noun phrase (like ‘liver tissue’ and ‘tissue of liver’)
  – handle only one canonical linguistic structure as a representative for all paraphrases
• treatment of generalisation within local contexts
  – The token ‘liver’ may occur in the first sentence of a paragraph. In the following sentences of the paragraph, only the hypernym ‘organ’ is used.
• concept or instance: which term in a linguistic structure has to be interpreted as a concept and which as an instance of a concept?
• definition of the scope for a concept:
  – a paragraph starts with a description of an organ (e.g. organ ‘liver’ in: ‘The liver shows ... . Bloodrichness of the tissue.’), followed by a description of parts of the organ (e.g., ‘Gewebe’, tissue). In such cases, additional knowledge about the domain has to be employed (for example, about meronyms or holonyms)
  – tissue part-of liver vs. tissue part-of concept X
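The first key aspect, one canonical structure for all paraphrases, can be illustrated with the English glosses from the slide. This is a deliberately small sketch: `canonical_np` is a hypothetical name, it only handles the pattern `X of Y', and a real system for the German corpus would work on parser output, not on strings.

```python
import re


def canonical_np(phrase):
    """Map 'X of Y' (or 'X of the Y') to the compound-like form 'Y X'.

    Other phrases are only lower-cased and whitespace-normalised.
    """
    normalised = " ".join(phrase.lower().split())
    match = re.fullmatch(r"(\w+) of (?:the )?(\w+)", normalised)
    if match:
        head, modifier = match.groups()
        return modifier + " " + head
    return normalised
```

Both `canonical_np("tissue of liver")` and `canonical_np("Liver tissue")` return `liver tissue`, so the two paraphrases end up as the same concept name.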