Using lexical chains for text summarization



15/04/23 Anthony-Claret Onwutalobi, University of Helsinki, Finland


Using lexical chains for text summarization

Key concepts:

• Text summarization: producing, from one or more texts, a new text that contains a significant portion of the information in the original text(s) in condensed form.

• Lexical chain: a sequence of related words in a text, spanning short distances (adjacent words or sentences) or long distances (the entire text). Example: Rome → capital → city → inhabitant. Source: http://en.wikipedia.org/wiki/Lexical_chain


Goal of text summarization

• Automated summarization tools can help people grasp the main concepts of information sources in a short time.

• The motivation for this work is to build such a tool: one that is computationally efficient and creates summaries automatically.

• Reduce the size of a document while preserving its main content.


Earlier attempts to achieve the goal, and their constraints

• Frequency-based method: the most frequent words are assumed to represent the most important concepts of the text. These words are treated as keywords and recorded in a frequency table.

• Constraint:

  • This method ignores the semantic content of words and their relationships with other words or phrases.

• Cue-phrase method: this assumes that the first paragraph, or the first sentence of each paragraph, contains topic information, and that certain cue words such as "significantly", "impossible", and "hardly" indicate key topics.
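A minimal Python sketch of the frequency-based method, assuming a naive regex tokenizer, punctuation-based sentence splitting, and a small hypothetical stopword list (none of these details come from the slides). Each sentence is scored by the summed document frequency of its content words.

```python
import re
from collections import Counter

# Hypothetical stopword list; real systems use much larger ones.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "it", "has"}


def content_words(text: str) -> list[str]:
    """Lowercase alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def frequency_table(text: str) -> Counter:
    """The frequency table: the most frequent words are assumed to be keywords."""
    return Counter(content_words(text))


def score_sentences(text: str) -> list[tuple[int, str]]:
    """Score each sentence by the summed frequencies of its content words."""
    freq = frequency_table(text)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [(sum(freq[w] for w in content_words(s)), s) for s in sentences]


text = "Cats chase mice. Dogs chase cats. The sun is hot."
for score, sentence in score_sentences(text):
    print(score, sentence)
```

Because words are counted as plain strings, "car" and "automobile" would be treated as unrelated, which is the semantic constraint noted for this method.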


Earlier attempts (continued)

• Constraint of the cue-phrase (topic-based) method:

  • Style-specific: articles differ in format and writing style, which makes the method difficult to apply across them.

• Advantage of both techniques:

  • Easy computation
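The cue-phrase (location) heuristic can be sketched in Python as below. The cue words are the slide's examples; the equal one-point weights for "first paragraph", "first sentence of a paragraph", and each cue word are assumptions made only for illustration.

```python
import re

# Cue words taken from the slide's examples; equal weights are an assumption.
CUE_WORDS = {"significantly", "impossible", "hardly"}


def cue_location_scores(paragraphs: list[str]) -> list[tuple[int, str]]:
    """Score sentences by position and by the cue words they contain."""
    scored = []
    for p_idx, para in enumerate(paragraphs):
        sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", para) if s.strip()]
        for s_idx, sent in enumerate(sentences):
            score = 0
            if p_idx == 0:
                score += 1  # sentence lies in the first paragraph
            if s_idx == 0:
                score += 1  # first sentence of its paragraph
            tokens = re.findall(r"[a-z]+", sent.lower())
            score += sum(1 for t in tokens if t in CUE_WORDS)  # one point per cue word
            scored.append((score, sent))
    return scored


paragraphs = [
    "Sales rose significantly. Costs were flat.",
    "It is hardly surprising. Other news followed.",
]
for score, sentence in cue_location_scores(paragraphs):
    print(score, sentence)
```

Both this scorer and the frequency scorer need only a single pass over the tokens, which is the "easy computation" advantage; the hard-coded positional rules are also what make the method style-specific.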


Overcoming the limitations of the two methods

• Lexical chains are used to determine the central theme of the text. The chains are built from semantically related words, and the concept represented by the strongest chain is taken as the theme of the text.

• Lexical chains are sequences of words in a text that represent the same topic. Because each word must be assigned the sense that relates it to the other chain members, chaining also addresses the problem of word sense disambiguation (WSD).

• Lexical chains can be computed in a source document by grouping (chaining) a set of words that are semantically related (i.e., share a flow of meaning).

• Lexical chains require an ontology or lexical database that predefines the semantic relations between words.

• Identity, synonymy, and hypernymy/hyponymy are the relations among words that can cause them to be grouped into the same lexical chain.

• The WordNet thesaurus is used for this purpose.
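To make the three chaining relations concrete, here is a small Python sketch that classifies a word pair. The miniature thesaurus is hypothetical and stands in for WordNet; only direct hypernyms are checked here, whereas the WordNet-based formation procedure looks up to two levels deep.

```python
from typing import Optional

# Hypothetical miniature thesaurus standing in for WordNet.
SYNONYMS = {"car": {"automobile"}, "automobile": {"car"}}
HYPERNYMS = {  # word -> its direct hypernyms
    "car": {"vehicle"},
    "automobile": {"vehicle"},
    "vehicle": {"conveyance"},
}


def relation(w1: str, w2: str) -> Optional[str]:
    """Return the relation that would let w1 and w2 share a lexical chain."""
    if w1 == w2:
        return "identity"
    if w2 in SYNONYMS.get(w1, set()):
        return "synonym"
    if w2 in HYPERNYMS.get(w1, set()) or w1 in HYPERNYMS.get(w2, set()):
        return "hypernym/hyponym"
    return None  # unrelated: the words go into different chains


print(relation("car", "automobile"))  # synonym
print(relation("vehicle", "car"))     # hypernym/hyponym
print(relation("car", "banana"))      # None
```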


WordNet Thesaurus

• WordNet groups synonymous words into sets (synsets), each representing one concept, and links these sets through relations such as hypernymy.

• Using WordNet, a lexical chain is constructed by calculating the semantic distance between words.

• Strong lexical chains are then selected, and the sentences related to those chains are chosen as the summary.
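The slides do not specify which semantic-distance measure is used. One simple option, assumed here for illustration, is to count hypernym/hyponym edges on the shortest path between two words. The hypernym links below are a hypothetical stand-in for WordNet.

```python
from collections import deque
from typing import Optional

# Hypothetical hypernym links (word -> direct hypernyms), standing in for WordNet.
HYPERNYMS = {
    "computer": ["machine"],
    "machine": ["device"],
    "calculator": ["device"],
    "device": ["artifact"],
}


def build_graph(links: dict[str, list[str]]) -> dict[str, set[str]]:
    """Treat every hypernym/hyponym link as an undirected edge."""
    graph: dict[str, set[str]] = {}
    for child, parents in links.items():
        for parent in parents:
            graph.setdefault(child, set()).add(parent)
            graph.setdefault(parent, set()).add(child)
    return graph


def semantic_distance(a: str, b: str, graph: dict[str, set[str]]) -> Optional[int]:
    """Shortest number of edges between a and b (breadth-first search)."""
    queue = deque([(a, 0)])
    seen = {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # no path: the words are unrelated in this thesaurus


graph = build_graph(HYPERNYMS)
print(semantic_distance("computer", "calculator", graph))  # 3
```

A relatedness threshold (for example, at most two edges) could then decide whether two words may share a chain; the threshold value is likewise an assumption.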


Four steps in text summarization

• Segmentation of the original source text

• Construction of lexical chains

• Identification of strong chains

• Extraction of significant sentences
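The first step, segmentation, can be sketched in Python as below. It also records where each candidate noun first appears, because the final extraction step chooses sentences by first appearance. The punctuation-based splitter is naive, and the hand-supplied noun set is a hypothetical stand-in for a part-of-speech tagger.

```python
import re


def segment(text: str) -> list[str]:
    """Step 1: split the source text into sentences (naive punctuation rule)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def first_appearances(sentences: list[str], nouns: set[str]) -> dict[str, int]:
    """Record the sentence index where each candidate noun first appears."""
    first: dict[str, int] = {}
    for i, sentence in enumerate(sentences):
        for token in re.findall(r"[a-z]+", sentence.lower()):
            if token in nouns and token not in first:
                first[token] = i
    return first


sentences = segment("John has a computer. The machine is an IBM.")
print(sentences)
print(first_appearances(sentences, {"john", "computer", "machine", "ibm"}))
```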


[Figure: Overall design of the proposed system (ATS)]


Three steps for constructing lexical chains

• Select a set of candidate words.

• For each candidate word, find an appropriate chain, relying on a relatedness criterion among the members of the chains.

• If such a chain is found, insert the word into it and update the chain accordingly.
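The three steps above map onto a short greedy loop. In this Python sketch the relatedness criterion is passed in as a function; the example criterion (identity, or a listed pair) is hypothetical. Starting a new chain when no related chain exists is an assumption, since the slide does not say what happens in that case.

```python
from typing import Callable


def build_chains(candidates: list[str],
                 related: Callable[[str, str], bool]) -> list[list[str]]:
    """Greedy chaining over candidate words in text order."""
    chains: list[list[str]] = []
    for word in candidates:                  # step 1: candidate words
        for chain in chains:                 # step 2: look for a related chain
            if any(related(word, member) for member in chain):
                chain.append(word)           # step 3: insert and update
                break
        else:
            chains.append([word])            # assumption: otherwise start a new chain
    return chains


# Hypothetical relatedness criterion: identity, or a pair listed as related.
RELATED_PAIRS = {frozenset({"computer", "machine"})}


def related(a: str, b: str) -> bool:
    return a == b or frozenset({a, b}) in RELATED_PAIRS


print(build_chains(["john", "computer", "machine", "ibm", "computer"], related))
# [['john'], ['computer', 'machine', 'computer'], ['ibm']]
```

This first-fit version never revisits a decision; the WordNet-based formation procedure instead records every sense of a word and then keeps the word only in the chain to which it contributes most.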


FORMATION OF LEXICAL CHAINS (Using WordNet)

1. For each noun instance (candidate word, cw):

   i. Collect the sense numbers and synset offsets of the word from WordNet.

   ii. For each of these senses, find the words that have the following relationships with it:
      • synonyms
      • hypernyms (up to 2 levels of depth)
      • hyponyms (up to 2 levels of depth)

   iii. Put the pair (cw, weight) at the end of the linked list containing all such pairs, in the hash table indexed by the sense offset.

2. For each noun instance and each of its corresponding lexical chains:

   • Keep the word instance in the lexical chain to which it contributes the most.

   • Update the score of that lexical chain.
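A Python sketch of the two formation steps, under stated assumptions: the WordNet lookup (sense offsets plus the relation that links each word to each sense) is taken as already done, a plain Python list plays the role of the linked list, and ties in step 2 are broken in favor of the chain with the higher total score. That tie-break is an illustrative choice, not something the slides specify. The weights follow the scoring scheme given with the data structures.

```python
from collections import defaultdict

# Weights follow the slides' scoring scheme.
WEIGHTS = {"identity": 1.0, "synonym": 1.0, "hypernym": 0.5, "hyponym": 0.5}


def form_chains(candidates):
    """candidates: list of (word, [(sense_offset, relation), ...]).
    The WordNet lookup that produces these pairs is assumed to be done already."""
    # Step 1: hash table indexed by sense offset -> list of (cw, weight) pairs.
    table = defaultdict(list)
    for word, senses in candidates:
        for offset, rel in senses:
            table[offset].append((word, WEIGHTS[rel]))

    # Step 2: keep each word only in the chain to which it contributes most.
    totals = {off: sum(w for _, w in pairs) for off, pairs in table.items()}
    best = {}  # word -> ((weight, chain total), offset)
    for offset, pairs in table.items():
        for word, weight in pairs:
            key = (weight, totals[offset])  # ties go to the stronger chain (assumption)
            if word not in best or key > best[word][0]:
                best[word] = (key, offset)

    chains = defaultdict(list)
    for offset, pairs in table.items():
        for word, weight in pairs:
            if best[word][1] == offset:
                chains[offset].append((word, weight))
    scores = {off: sum(w for _, w in pairs) for off, pairs in chains.items()}
    return dict(chains), scores


# Sense data from the John/computer example; the relation labels are
# illustrative choices that reproduce the example table's weights.
example = [
    ("John", [(0, "identity")]),
    ("computer", [(1, "identity"), (2, "identity")]),
    ("machine", [(0, "hypernym"), (2, "synonym"), (3, "hypernym")]),
    ("unit", [(3, "hyponym")]),
]
chains, scores = form_chains(example)
print(chains)
print(scores)
```

With this data, "machine" stays in sense 2 (device), where its weight of 1 beats its 0.5 contributions elsewhere, and the tie for "computer" between senses 1 and 2 is resolved toward the stronger sense-2 chain.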


Example

• Assume the sentences "John has a computer. The machine is an IBM." and that, through their senses, synonyms, hypernyms, and hyponyms, the nouns map to the following sense indices: John (0), computer (1, 2), machine (0, 2, 3), unit (3).

• Words are put in the same chain if they are related by identity, synonymy, or hypernymy/hyponymy up to 2 levels. The table below depicts the resulting lexical chains.


Example: sense index

Chain     Sense index   Sense meaning   Element 1        Element 2
Chain 1   0             person          {John, 1}        {machine, 0.5}
Chain 2   1             unit            {computer, 1}
Chain 3   2             device          {computer, 1}    {machine, 1}
Chain 4   3             organization    {machine, 0.5}   {unit, 0.5}
Chain 5   4
…
Chain N   N-1


Lexical Chains (cont…)

• Scoring scheme:
  • Identical word = 1
  • Synonym = 1
  • Hypernym/hyponym = 0.5

• Data structures:
  • element = ["candidate word", weight]
  • chain = [element1, element2, …, elementN]
  • lexical_chains = [hashed chain1, hashed chain2, …, hashed chainN]
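One possible Python rendering of these data structures and the scoring scheme. The mapping is illustrative: an element becomes an immutable (word, weight) tuple, a chain is a list of elements, and the "hashed" chains are kept in a dict keyed by sense index. The helper names `add_element` and `chain_score` are hypothetical.

```python
# Illustrative Python counterparts of the slide's data structures.
Element = tuple[str, float]        # ("candidate word", weight)
Chain = list[Element]              # [element1, element2, ..., elementN]
LexicalChains = dict[int, Chain]   # hashed chains, keyed here by sense index

# Scoring scheme from the slide.
SCORES = {"identical": 1.0, "synonym": 1.0, "hypernym": 0.5, "hyponym": 0.5}


def add_element(chains: LexicalChains, sense: int, word: str, relation: str) -> None:
    """Append a (word, weight) element to the chain hashed under `sense`."""
    chains.setdefault(sense, []).append((word, SCORES[relation]))


def chain_score(chain: Chain) -> float:
    """A chain's score is the sum of its element weights."""
    return sum(weight for _, weight in chain)


lexical_chains: LexicalChains = {}
add_element(lexical_chains, 2, "computer", "identical")
add_element(lexical_chains, 2, "machine", "synonym")
add_element(lexical_chains, 3, "machine", "hypernym")
print(lexical_chains)                  # {2: [('computer', 1.0), ('machine', 1.0)], 3: [('machine', 0.5)]}
print(chain_score(lexical_chains[2]))  # 2.0
```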


IDENTIFYING STRONG LEXICAL CHAINS

1. Compute the aggregate score of each chain by summing the scores of the individual elements in the chain.

2. Pick the chains whose score exceeds the mean of the scores of all chains computed in the document.

3. For each of the strong chains, identify the representative words, i.e., those whose contribution to the chain is maximal.

4. Choose the sentence that contains the first appearance of a representative chain member in the text.
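The four steps can be sketched in Python as follows, applied to the example table's chains exactly as listed (before any pruning). Tokenizing sentences with a lowercase regex match is a simplifying assumption, and each chain is assumed to be non-empty.

```python
import re
from statistics import mean

Chain = list[tuple[str, float]]


def strong_chains(chains: dict[int, Chain]) -> dict[int, float]:
    """Steps 1-2: aggregate each chain's score and keep those above the mean."""
    scores = {k: sum(w for _, w in chain) for k, chain in chains.items()}
    threshold = mean(scores.values())
    return {k: s for k, s in scores.items() if s > threshold}


def representative_words(chain: Chain) -> set[str]:
    """Step 3: members whose contribution (weight) to the chain is maximal."""
    top = max(w for _, w in chain)
    return {word for word, w in chain if w == top}


def extract_summary(chains: dict[int, Chain], sentences: list[str]) -> list[str]:
    """Step 4: for each strong chain, take the first sentence containing a
    representative member; return the chosen sentences in text order."""
    chosen = set()
    for key in strong_chains(chains):
        reps = {r.lower() for r in representative_words(chains[key])}
        for i, sentence in enumerate(sentences):
            if reps & set(re.findall(r"[a-z]+", sentence.lower())):
                chosen.add(i)
                break
    return [sentences[i] for i in sorted(chosen)]


chains = {
    0: [("John", 1.0), ("machine", 0.5)],
    1: [("computer", 1.0)],
    2: [("computer", 1.0), ("machine", 1.0)],
    3: [("machine", 0.5), ("unit", 0.5)],
}
sentences = ["John has a computer.", "The machine is an IBM."]
print(strong_chains(chains))               # {0: 1.5, 2: 2.0}
print(extract_summary(chains, sentences))  # ['John has a computer.']
```

Here the mean score is 5.5 / 4 = 1.375, so chains 1 and 3 (score 1.0) are dropped. Both strong chains have a representative word in the first sentence, which becomes the one-sentence summary.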
