Tutorial on automatic summarization


Description

Tutorial on the topic of automatic summarization, given at RANLP 2009.

Transcript of Tutorial on automatic summarization

Page 1: Tutorial on automatic summarization

Automatic summarisation in the Information Age

Constantin Orasan

Research Group in Computational Linguistics, Research Institute in Information and Language Processing

University of Wolverhampton
http://www.wlv.ac.uk/~in6093/

http://www.summarizationonline.info

12th Sept 2009

Page 2: Tutorial on automatic summarization

Structure of the course

1 Introduction to automatic summarisation

2 Important methods in automatic summarisation

3 Automatic summarisation and the Internet

Page 5: Tutorial on automatic summarization

Structure of the course

1 Introduction to automatic summarisation
   What is a summary?
   What is automatic summarisation?
   Context factors
   Evaluation
      General information about evaluation
      Direct evaluation
      Target-based evaluation
      Task-based evaluation
      Automatic evaluation
      Evaluation conferences

2 Important methods in automatic summarisation

3 Automatic summarisation and the Internet

Page 6: Tutorial on automatic summarization

What is a summary?

Page 7: Tutorial on automatic summarization

Abstract of a scientific paper

Source: (Sparck Jones, 2007)

Page 8: Tutorial on automatic summarization

Summary of a news event

Source: Google news http://news.google.com

Page 9: Tutorial on automatic summarization

Summary of a web page

Source: Bing http://www.bing.com

Page 10: Tutorial on automatic summarization

Summary of financial news

Source: Yahoo! Finance http://finance.yahoo.com/

Page 13: Tutorial on automatic summarization

Maps

Source: Google Maps http://maps.google.co.uk/

Page 15: Tutorial on automatic summarization

Summaries in everyday life

• Headlines: summaries of newspaper articles

• Table of contents: summary of a book, magazine

• Digest: summary of stories on the same topic

• Highlights: summary of an event (meeting, sports event, etc.)

• Abstract: summary of a scientific paper

• Bulletin: weather forecast, stock market, news

• Biography: resume, obituary

• Abridgment: of books

• Review: of books, music, plays

• Scale-downs: maps, thumbnails

• Trailer: of a film, speech

Page 27: Tutorial on automatic summarization

Summaries in the context of this tutorial

• are produced from the text of one or several documents

• take the form of a text or a list of sentences

Page 28: Tutorial on automatic summarization

Definitions of summary

• “an abbreviated, accurate representation of the content of a document, preferably prepared by its author(s) for publication with it. Such abstracts are also useful in access publications and machine-readable databases” (American National Standards Institute Inc., 1979)

• “an abstract summarises the essential contents of a particular knowledge record, and it is a true surrogate of the document” (Cleveland, 1983)

• “the primary function of abstracts is to indicate and predict the structure and content of the text” (van Dijk, 1980)

Page 34: Tutorial on automatic summarization

Definitions of summary (II)

• “the abstract is a time saving device that can be used to find a particular part of the article without reading it; [...] knowing the structure in advance will help the reader to get into the article; [...] as a summary of the article, it can serve as a review, or as a clue to the content”. Also, an abstract gives “an exact and concise knowledge of the total content of the very much more lengthy original, a factual summary which is both an elaboration of the title and a condensation of the report [...] if comprehensive enough, it might replace reading the article for some purposes” (Graetz, 1985).

• these definitions refer to human-produced summaries

Page 37: Tutorial on automatic summarization

Definitions for automatic summaries

• these definitions are less ambitious

• “a concise representation of a document’s content to enable the reader to determine its relevance to a specific information need” (Johnson, 1995)

• “a summary is a text produced from one or more texts, that contains a significant portion of the information in the original text(s), and is not longer than half of the original text(s)” (Hovy, 2003)

Page 42: Tutorial on automatic summarization

What is automatic summarisation?

Page 43: Tutorial on automatic summarization

What is automatic (text) summarisation

• Text summarisation
   • a reductive transformation of source text to summary text through content reduction by selection and/or generalisation on what is important in the source (Sparck Jones, 1999)
   • the process of distilling the most important information from a source (or sources) to produce an abridged version for a particular user (or users) and task (or tasks) (Mani and Maybury, 1999)

• Automatic text summarisation = the process of producing summaries automatically

Page 48: Tutorial on automatic summarization

Related disciplines

There are many disciplines related to automatic summarisation:

• automatic categorisation/classification

• term/keyword extraction

• information retrieval

• information extraction

• question answering

• text generation

• data/opinion mining

Page 49: Tutorial on automatic summarization

Automatic categorisation/classification

• Automatic text categorisation
   • the task of building software tools capable of classifying text documents under predefined categories or subject codes
   • each document can be in one or several categories
   • examples of categories: Library of Congress subject headings

• Automatic text classification
   • usually considered broader than text categorisation
   • includes text clustering and text categorisation
   • does not necessarily require the classes to be known in advance
   • examples: email/spam filtering, routing

Page 50: Tutorial on automatic summarization

Term/keyword extraction

• automatically identifies terms/keywords in texts

• a term is a word or group of words which is important in a domain and represents a concept of that domain

• a keyword is an important word in a document, but it is not necessarily a term

• terms and keywords are extracted using a mixture of statistical and linguistic approaches

• automatic indexing identifies all the relevant occurrences of a keyword in texts and produces indexes
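To make the statistical side concrete, here is a minimal keyword-extraction sketch (an illustration added to this transcript, not from the original slides): it ranks the words of one document by TF-IDF against a small collection. The tokeniser and the toy documents are assumptions made for the example.

```python
import math
import re
from collections import Counter

def tokenise(text):
    # Lowercase word tokeniser; a real system would also lemmatise,
    # filter stopwords and recognise multi-word terms.
    return re.findall(r"[a-z]+", text.lower())

def tfidf_keywords(docs, doc_index, top_n=5):
    """Rank the words of docs[doc_index] by TF-IDF against the collection."""
    tokenised = [tokenise(d) for d in docs]
    # Document frequency: in how many documents each word occurs.
    df = Counter(w for toks in tokenised for w in set(toks))
    tf = Counter(tokenised[doc_index])
    n_docs, doc_len = len(tokenised), len(tokenised[doc_index])
    scores = {w: (count / doc_len) * math.log(n_docs / df[w])
              for w, count in tf.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

docs = [
    "The summariser extracts important sentences from the document.",
    "Information retrieval ranks documents for a user query.",
    "The gunman was killed by police, the police said.",
]
print(tfidf_keywords(docs, 0))
```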

Page 51: Tutorial on automatic summarization

Information retrieval (IR)

• Information retrieval attempts to find information relevant to a user query and rank it according to its relevance

• the output is usually a list of documents, in some cases together with relevant snippets from them

• Example: search engines

• needs to be able to deal with enormous quantities of information and to process information in any format (e.g. text, image, video, etc.)

• a field which has reached a level of maturity and is used in industry and business

• combines statistics, text analysis, link analysis and user interfaces

Page 52: Tutorial on automatic summarization

Information extraction (IE)

• Information extraction is the automatic identification of predefined types of entities, relations or events in free text

• quite often the best results are obtained by rule-based approaches, but machine learning approaches are used more and more

• can generate database records

• is domain dependent

• the field developed considerably as a result of the MUC conferences

• one of the tasks in the MUC conferences was to fill in templates

• Example: Ford appointed Harriet Smith as president
   • Person: Harriet Smith
   • Job: president
   • Company: Ford
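As a toy illustration of template filling (added here, not part of the slides), a single hand-written pattern is enough for the example above; the regular expression is an invented stand-in for a real rule-based IE grammar.

```python
import re

# One hand-written pattern: "<Company> appointed <Person> as <Job>".
PATTERN = re.compile(r"(?P<Company>[A-Z]\w+) appointed "
                     r"(?P<Person>[A-Z]\w+(?: [A-Z]\w+)*) as (?P<Job>\w+)")

def fill_template(sentence):
    """Return a filled template (a dict) or None if the pattern fails."""
    match = PATTERN.search(sentence)
    return match.groupdict() if match else None

print(fill_template("Ford appointed Harriet Smith as president"))
# {'Company': 'Ford', 'Person': 'Harriet Smith', 'Job': 'president'}
```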

Page 55: Tutorial on automatic summarization

Question answering (QA)

• Question answering aims at identifying the answer to a question in a large collection of documents

• the information provided by QA is more focused than that provided by information retrieval

• a QA system should be able to answer any question and should not be restricted to a domain (as IE is)

• the output can be the exact answer or a text snippet which contains the answer

• the field took off as a result of the introduction of the QA track in TREC

• user-focused summarisation = open-domain question answering

Page 56: Tutorial on automatic summarization

Text generation

• Text generation creates text from computer-internal representations of information

• most generation systems rely on massive amounts of linguistic knowledge and manually encoded rules for translating the underlying representation into language

• text generation systems are very domain dependent

Page 57: Tutorial on automatic summarization

Data mining

• Data mining is the (semi)automatic discovery of trends, patterns or unusual data across very large data sets, usually for the purposes of decision making

• Text mining applies methods from data mining to textual collections

• Processes very large amounts of data in order to find useful information

• In many cases it is not clearly known what is sought

• Visualisation has a very important role in data mining

Page 58: Tutorial on automatic summarization

Opinion mining

• Opinion mining (OM) is a recent discipline at the crossroads of information retrieval and computational linguistics which is concerned not with the topic a document is about, but with the opinion it expresses

• is usually applied to collections of documents (e.g. blogs) and is seen as part of text/data mining

• Sentiment Analysis, Sentiment Classification and Opinion Extraction are other names used in the literature for this discipline

• Examples of OM problems:
   • What is the general opinion on the proposed tax reform?
   • How is popular opinion on the presidential candidates evolving?
   • Which of our customers are unsatisfied? Why?

Page 60: Tutorial on automatic summarization

Characteristics of summaries

Page 61: Tutorial on automatic summarization

Context factors

• the context factors defined by Sparck Jones (1999; 2001)represent a good way of characterising summaries

• they do not necessary refer to automatic summaries

• they do not necessary refer to summaries

• there are three types of factors:

• input factors: characterise the input document(s)• purpose factors: define the transformations necessary to obtain

the output• output factors: characterise the produced summaries

Page 68: Tutorial on automatic summarization

Context factors

Input factors        Purpose factors        Output factors
Form                 Situation              Form
- Structure          Use                    - Structure
- Scale              Summary type           - Scale
- Medium             Coverage               - Medium
- Genre              Relation to source     - Language
- Language                                  - Format
- Format
Subject matter
Subject type
Unit

Page 69: Tutorial on automatic summarization

Input factors - Form

• structure: the explicit organisation of the documents. Can be the problem-solution structure of scientific documents, the pyramidal structure of newspaper articles, or the presence of embedded structure in the text (e.g. rhetorical patterns)

• scale: the length of the documents. Different methods need to be used for a book and for a newspaper article due to very different compression rates

• medium: natural language/sublanguage/specialised language. If the text is written in a sublanguage it is less ambiguous and therefore easier to process

Page 72: Tutorial on automatic summarization

Input factors - Form

• language: monolingual/multilingual/cross-lingual
   • monolingual: the source and the output are in the same language
   • multilingual: the input is in several languages and the output in one of these languages
   • cross-lingual: the language of the output is different from the language of the source(s)

• formatting: whether the source uses any special formatting. This is more of a programming problem, but it needs to be taken into consideration if information is lost as a result of conversion

Page 77: Tutorial on automatic summarization

Input factors

• Subject type: intended readership. Indicates whether the source was written for the general reader or for specific readers. It influences the amount of background information present in the source.

• Unit: single/multiple sources (single- vs. multi-document summarisation); mainly concerned with the amount of redundancy in the text

Page 79: Tutorial on automatic summarization

Why are input factors useful?

The input factors can be used to decide whether to summarise a text or not:

• Brandow, Mitze, and Rau (1995) use the structure of the document (presence of speech, tables, embedded lists, etc.) to decide whether to summarise it or not.

• Louis and Nenkova (2009) train a system on DUC data to determine whether the result is expected to be reliable or not.

Page 80: Tutorial on automatic summarization

Purpose factors

• Use: how the summary is used
   • retrieving: the user uses the summary to decide whether to read the whole document
   • substituting: use the summary instead of the full document
   • previewing: get the structure of the source, etc.

• Summary type: indicates what kind of summary it is
   • indicative summaries provide a brief description of the source without going into details
   • informative summaries follow the main ideas and structure of the source
   • critical summaries give a description of the source and discuss its contents (e.g. review articles can be considered critical summaries)

Page 88: Tutorial on automatic summarization

Purpose factors

• Relation to source: whether the summary is an extract or an abstract
   • extract: contains units directly extracted from the document (i.e. paragraphs, sentences, clauses)
   • abstract: includes units which are not present in the source

• Coverage: which type of information should be present in the summary
   • generic: the summary should cover all the important information of the document
   • user-focused: the user indicates what the focus of the summary should be

Page 94: Tutorial on automatic summarization

Output factors

• Scale (also referred to as compression rate): indicates the length of the summary
   • American National Standards Institute Inc. (1979) recommends 250 words
   • Borko and Bernier (1975) point out that imposing an arbitrary limit on summaries is not good for their quality, but that a length of around 10% is usually enough
   • Hovy (2003) requires that the length of the summary is kept to less than half of the source's size
   • Goldstein et al. (1999) point out that the summary length seems to be independent of the length of the source

• the structure of the output can be influenced by the structure of the input or by existing conventions

• the subject matter can be the same as the input's, or can be broader when background information is added

Page 95: Tutorial on automatic summarization

Evaluation of automatic summarisation

Page 96: Tutorial on automatic summarization

Why is evaluation necessary?

• Evaluation is very important because it allows us to assess the results of a method or system

• Evaluation allows us to compare the results of different methods or systems

• Some types of evaluation allow us to understand why a method fails

• almost every field has its own specific evaluation methods

• there are several ways to perform evaluation, depending on:
   • how the system is considered
   • how humans interact with the evaluation process
   • what is measured

Page 97: Tutorial on automatic summarization

How the system is considered

• black-box evaluation:
   • the system is considered opaque to the user
   • the system is considered as a whole
   • allows direct comparison between different systems
   • does not explain the system's performance

• glass-box evaluation:
   • each of the system's components is assessed in order to understand how the final result is obtained
   • is very time consuming and difficult
   • relies on phenomena which are not fully understood (e.g. error propagation)

Page 99: Tutorial on automatic summarization

How humans interact with the process

• off-line evaluation
   • also called automatic evaluation because it does not require human intervention
   • usually involves the comparison between the system's output and a gold standard
   • very often annotated corpora are used as gold standards
   • usually preferred because it is fast and not directly influenced by human subjectivity
   • can be repeated
   • cannot be (easily) used in all fields

• online evaluation
   • requires humans to assess the output of the system according to some guidelines
   • useful for tasks where the output of the system cannot be uniquely predicted (e.g. summarisation, text generation, question answering, machine translation)
   • is time consuming, expensive and cannot be easily repeated

Page 101: Tutorial on automatic summarization

What is measured

• intrinsic evaluation:
   • evaluates the results of a system directly
   • for example: quality, informativeness
   • sometimes does not give a very accurate view of how useful the output can be for another task

• extrinsic evaluation:
   • evaluates the results of another system which uses the results of the first
   • examples: post-edit measures, relevance assessment, reading comprehension

Page 103: Tutorial on automatic summarization

Evaluation used in automatic summarisation

• evaluation is a very difficult task because there is no clear idea of what constitutes a good summary

• the number of perfectly acceptable summaries from a text is not limited

• four types of evaluation methods:

              Intrinsic                  Extrinsic
On-line       Direct evaluation          Task-based evaluation
Off-line      Target-based evaluation    Automatic evaluation

Page 105: Tutorial on automatic summarization

Direct evaluation

• intrinsic & online evaluation

• requires humans to read summaries and measure their quality and informativeness according to some guidelines

• is one of the first evaluation methods used in automatic summarisation

• to a certain extent it is quite straightforward, which makes it appealing for small-scale evaluation

• it is time consuming, subjective and in many cases cannot be repeated by others

Page 106: Tutorial on automatic summarization

Direct evaluation: quality

• tries to assess the quality of a summary independently of the source

• can be a simple classification of sentences as acceptable or unacceptable

• Minel, Nugier, and Piat (1997) proposed an evaluation protocol which considers the coherence, cohesion and legibility of summaries:
   • the cohesion of a summary is measured in terms of dangling anaphors
   • the coherence in terms of discourse ruptures
   • the legibility is decided by jurors who are requested to classify each summary as very bad, bad, mediocre, good or very good

• it does not assess the contents of a summary, so it could be misleading

Page 107: Tutorial on automatic summarization

Direct evaluation: informativeness

• assesses how correctly the information in the source is reflected in the summary

• the judges are required to read both the source and the summary, making the process longer and more expensive

• judges are generally required to:
   • identify important ideas from the source which do not appear in the summary
   • identify ideas from the summary which are not important enough and therefore should not be there
   • identify the logical development of the ideas and see whether it appears in the summary

• given that it is time consuming, automatic methods to compute informativeness are preferred

Page 108: Tutorial on automatic summarization

Target-based evaluation

• the most widely used evaluation method

• compares the automatic summary with a gold standard

• appropriate for extractive summarisation methods

• is intrinsic and off-line

• does not require humans to be involved in the evaluation

• has the advantage of being fast and cheap, and can be repeated by other researchers

• the drawback is that it requires a gold standard, which is usually not easy to produce

Page 109: Tutorial on automatic summarization

Corpora as gold standards

• usually annotated corpora are used as gold standards

• usually the annotation is very simple: for each sentence it indicates whether the sentence is important enough to be included in the summary or not

• such corpora are normally used to assess extracts

• can be produced manually or automatically

• these corpora normally represent one point of view

Page 110: Tutorial on automatic summarization

Manually produced corpora

• Require human judges to read each text from the corpus and to identify the important units in each text according to guidelines

• Kupiec, Pedersen, and Chen (1995) and Teufel and Moens (1997) took advantage of the existence of human-produced abstracts and asked human annotators to align sentences from the document with sentences from the abstracts

• it is not necessary to use specialised tools to apply this annotation, but in many cases they can help

Page 111: Tutorial on automatic summarization

Guidelines for manually annotated corpora

• Edmundson (1969) annotated a heterogeneous corpus consisting of 200 documents in the fields of physics, life science, information science and humanities. The important sentences were considered to be those which indicated:
   • what the subject area is,
   • why the research is necessary,
   • how the problem is solved,
   • what the findings of the research are.

• Hasler, Orasan, and Mitkov (2003) annotated a corpus of newspaper articles; the important sentences were considered to be those linked to the main topic of the text as indicated in the title (see http://clg.wlv.ac.uk/projects/CAST/ for the complete guidelines)

Page 112: Tutorial on automatic summarization

Problems with manually produced corpora

• given how subjective the identification of important sentences is, the agreement between annotators is low

• the inter-annotator agreement is determined by the genre of the texts and the length of the summaries

• Hasler, Orasan, and Mitkov (2003) tried to measure the agreement between three annotators and noticed a very low value, but

• when the content is compared, the agreement increases

Page 113: Tutorial on automatic summarization

Automatically produced corpora

• Relies on the fact that very often humans produce summaries by copy-pasting from the source

• there are algorithms which identify sets of sentences from the source which cover the information in the summary

• Marcu (1999) employed a greedy algorithm which eliminates sentences from the whole document that do not reduce the similarity between the summary and the remaining sentences (a simplified sketch follows after this list)

• Jing and McKeown (1999) treat the human-produced abstract as a sequence of words which appears in the document, and reformulate the problem of alignment as the problem of finding the most likely position of the words from the abstract in the full document using a Hidden Markov Model
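A simplified sketch of the greedy idea behind Marcu (1999), as described on this slide (the similarity function, a plain word overlap, and the toy data are my assumptions; the original uses a more refined similarity):

```python
import re
from collections import Counter

def words(text):
    return Counter(re.findall(r"[a-z]+", text.lower()))

def overlap(a, b):
    # Simple word-overlap similarity between two bags of words.
    return sum(min(a[w], b[w]) for w in a.keys() & b.keys())

def greedy_alignment(sentences, abstract):
    """Greedily drop sentences whose removal does not reduce the
    similarity between the remaining text and the abstract."""
    kept = list(sentences)
    target = words(abstract)
    changed = True
    while changed:
        changed = False
        base = overlap(words(" ".join(kept)), target)
        for i in range(len(kept)):
            rest = kept[:i] + kept[i + 1:]
            if overlap(words(" ".join(rest)), target) == base:
                kept = rest          # this sentence contributed nothing
                changed = True
                break
    return kept

doc = ["The gunman was killed by police.",
       "Witnesses were questioned at the scene.",
       "Police said the gunman acted alone."]
print(greedy_alignment(doc, "Police killed the gunman, who acted alone."))
```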

Page 114: Tutorial on automatic summarization

Evaluation measures used with annotated corpora

• usually precision, recall and F-measure are used to calculate the performance of a system

• the list of sentences extracted by the program is compared with the list of sentences marked by humans

                          Extracted by program    Not extracted by program
Extracted by humans       True positives          False negatives
Not extracted by humans   False positives         True negatives

Precision = TruePositives / (TruePositives + FalsePositives)

Recall = TruePositives / (TruePositives + FalseNegatives)

F-score = (β² + 1) · P · R / (β² · P + R)
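As a worked illustration of these measures (added to the transcript), the sketch below scores an extract against a gold standard, treating both as sets of sentence indices; the index sets are invented for the example.

```python
def precision_recall_f(extracted, gold, beta=1.0):
    """Precision, recall and F-score of an extract against a gold standard.

    Both arguments are sets of sentence indices."""
    tp = len(extracted & gold)   # extracted by both program and humans
    fp = len(extracted - gold)   # extracted by the program only
    fn = len(gold - extracted)   # extracted by the humans only
    p = tp / (tp + fp) if extracted else 0.0
    r = tp / (tp + fn) if gold else 0.0
    if p + r == 0:
        return p, r, 0.0
    f = (beta ** 2 + 1) * p * r / (beta ** 2 * p + r)
    return p, r, f

# Hypothetical system extract and gold standard over a 10-sentence text.
print(precision_recall_f({0, 2, 5, 7}, {0, 1, 2, 7}))  # (0.75, 0.75, 0.75)
```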

Page 115: Tutorial on automatic summarization

Summary Evaluation Environment (SEE)

• the SEE environment has been used in the DUC evaluations

• it is a combination of direct and target-based evaluation

• it requires humans to assess whether each unit from the automatic summary appears in the target summary

• it also offers the option to answer questions about the quality of the summary (e.g. Does the summary build from sentence to sentence to a coherent body of information about the topic?)

Page 117: Tutorial on automatic summarization

Relative utility of sentences (Radev et al., 2000)

• Addresses the problem that humans often disagree when they are asked to select the top n% of sentences from a document

• Each sentence in the document receives a score from 1 to 10 depending on how “summary worthy” it is

• The score of an automatic summary is the normalised score of the extracted sentences

• When several judges are available the score of a summary is the average over all judges

• Can be used for any compression rate
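A minimal sketch of this scheme, as I read the slide (the normalisation, dividing by the best score attainable with the same number of sentences, and the judge scores are assumptions; Radev et al. (2000) give the full definition):

```python
def relative_utility(extracted, judge_scores):
    """Score an extract against per-sentence utility judgements (1-10).

    extracted    -- set of sentence indices chosen by the system
    judge_scores -- one list of utility scores per judge, indexed by sentence
    """
    per_judge = []
    for scores in judge_scores:
        achieved = sum(scores[i] for i in extracted)
        # Best score attainable with the same number of sentences.
        best = sum(sorted(scores, reverse=True)[:len(extracted)])
        per_judge.append(achieved / best)
    # With several judges, average over all of them.
    return sum(per_judge) / len(per_judge)

# Two hypothetical judges scoring a five-sentence document.
judges = [[9, 3, 7, 2, 5], [8, 4, 6, 3, 5]]
print(relative_utility({0, 2}, judges))  # 1.0: the top sentences were picked
```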

Page 118: Tutorial on automatic summarization

Target-based evaluation without annotated corpora

• These methods require that the sources have a human-provided summary (but the sources do not need to be annotated)

• Donaway et al. (2000) propose using the cosine similarity between an automatic summary and a human summary, but this relies only on word co-occurrences (see the sketch after this list)

• ROUGE uses the number of overlapping units (Lin, 2004)

• Nenkova and Passonneau (2004) proposed the pyramid evaluation method, which addresses the problem that different people select different content when writing summaries
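A sketch of the cosine measure on bags of words, which makes clear why it captures only word co-occurrence rather than meaning:

```python
from collections import Counter
from math import sqrt

def summary_cosine(automatic, human):
    """Bag-of-words cosine between an automatic and a human summary."""
    a = Counter(automatic.lower().split())
    b = Counter(human.lower().split())
    num = sum(a[w] * b[w] for w in a if w in b)
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0
```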

Page 119: Tutorial on automatic summarization

ROUGE

• ROUGE = Recall-Oriented Understudy for Gisting Evaluation (Lin, 2004)

• inspired by BLEU (Bilingual Evaluation Understudy), used in machine translation (Papineni et al., 2002)

• Developed by Chin-Yew Lin and available at http://berouge.com

• Assesses the quality of a summary by comparison with ideal summaries

• Metrics count the number of overlapping units

• There are several versions depending on how the comparison is made

Page 120: Tutorial on automatic summarization

ROUGE-N

ROUGE-N computes n-gram co-occurrence statistics; it is a recall-oriented metric

• S1: Police killed the gunman

• S2: Police kill the gunman

• S3: The gunman kill police

• S2 = S3 (both share the same number of n-grams with S1)
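A minimal Python sketch of ROUGE-N with clipped counts; on the example above it assigns S2 and S3 the same unigram score against the reference S1:

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n(candidate, reference, n=1):
    """Recall: matched reference n-grams / total reference n-grams."""
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    overlap = sum(min(count, ref[g]) for g, count in cand.items() if g in ref)
    return overlap / sum(ref.values()) if ref else 0.0

print(rouge_n("police kill the gunman", "police killed the gunman"))  # 0.75
print(rouge_n("the gunman kill police", "police killed the gunman"))  # 0.75
```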

Page 121: Tutorial on automatic summarization

ROUGE-L

Longest common subsequence

• S1: police killed the gunman

• S2: police kill the gunman

• S3: the gunman kill police

• S2 = 3/4 (police the gunman)

• S3 = 2/4 (the gunman)

• S2 > S3
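A sketch using the standard dynamic-programming LCS; it reproduces the 3/4 and 2/4 scores above:

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l(candidate, reference):
    cand, ref = candidate.lower().split(), reference.lower().split()
    return lcs_length(cand, ref) / len(ref) if ref else 0.0

print(rouge_l("police kill the gunman", "police killed the gunman"))  # 0.75
print(rouge_l("the gunman kill police", "police killed the gunman"))  # 0.5
```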

Page 122: Tutorial on automatic summarization

ROUGE-W

Weighted Longest Common Subsequence

• S1: [A B C D E F G]

• S2: [A B C D H I J]

• S3: [A H B J C I D]

• ROUGE-W favours consecutive matches

• S2 better than S3

Page 123: Tutorial on automatic summarization

ROUGE-S

ROUGE-S: Skip-bigram recall metric

• All in-order word pairs (skip-bigrams), with arbitrary gaps, are computed

• S1: police killed the gunman (“police killed”, “police the”, “police gunman”, “killed the”, “killed gunman”, “the gunman”)

• S2: police kill the gunman (“police the”, “police gunman”, “the gunman”)

• S3: the gunman kill police (“the gunman”)

• S4: the gunman police killed (“police killed”, “the gunman”)

• S2 better than S4 better than S3

• ROUGE-SU adds unigrams to ROUGE-S
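A sketch of skip-bigram recall; it reproduces the ordering S2 > S4 > S3 above (ROUGE-SU would additionally add unigrams to the sets):

```python
from itertools import combinations

def skip_bigrams(sentence):
    """All in-order word pairs, allowing arbitrary gaps."""
    return set(combinations(sentence.lower().split(), 2))

def rouge_s(candidate, reference):
    cand, ref = skip_bigrams(candidate), skip_bigrams(reference)
    return len(cand & ref) / len(ref) if ref else 0.0

print(rouge_s("police kill the gunman", "police killed the gunman"))    # 3/6
print(rouge_s("the gunman police killed", "police killed the gunman"))  # 2/6
print(rouge_s("the gunman kill police", "police killed the gunman"))    # 1/6
```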

Page 124: Tutorial on automatic summarization

ROUGE

• Experiments on DUC 2000 – 2003 data show good correlation with human judgements

• Using multiple references achieved better correlation with human judgements than using a single reference.

• Stemming and removing stopwords improved correlation with human judgements

Page 125: Tutorial on automatic summarization

Task-based evaluation

• is an extrinsic and on-line evaluation

• instead of evaluating the summaries directly, humans are asked to perform tasks using the summaries and the accuracy of these tasks is measured

• the assumption is that the accuracy does not decrease when good summaries are used

• the time needed should decrease

• Examples of tasks: classification of summaries according to predefined classes (Saggion and Lapalme, 2000), determining the relevance of a summary to a topic (Miike et al., 1994; Oka and Ueda, 2000), and reading comprehension (Morris, Kasper, and Adams, 1992; Orasan, Pekar, and Hasler, 2004).

Page 126: Tutorial on automatic summarization

Task-based evaluation

• this evaluation can be very useful because it assesses a summary in real situations

• it is time-consuming and requires humans to be involved in the evaluation process

• in order to obtain statistically significant results, a large number of judges have to be involved

• this evaluation method has been used in evaluation conferences

Page 127: Tutorial on automatic summarization

Automatic evaluation

• extrinsic and off-line evaluation method

• tries to replace humans in task-based evaluations with automatic methods which perform the same task and are evaluated automatically

• Examples:
  • text retrieval (Brandow, Mitze, and Rau, 1995): increase in precision but drastic reduction of recall
  • text categorisation (Kolcz, Prabakarmurthi, and Kalita, 2001): the performance of categorisation increases

• has the advantage of being fast and cheap, but in many cases the tasks which can benefit from summaries are as difficult to evaluate as automatic summarisation itself (e.g. Kuo et al. (2002) proposed to use QA)

Page 128: Tutorial on automatic summarization

intrinsic

extrinsic

• semi-purpose: inspection (e.g. for properEnglish)

• quasi-purpose: comparison with models (e.g.ngrams, nuggets)

• pseudo-purpose: simulation of task contexts(e.g. action scenarios)

• full-purpose: operation in task context (e.g.report writing)

From (Sparck Jones, 2007)

Page 133: Tutorial on automatic summarization

Evaluation conferences

• evaluation conferences are conferences where all the participants have to complete the same task on a common set of data

• these conferences allow direct comparison between the participants

• such conferences have driven rapid advances in several fields: MUC (information extraction), TREC (information retrieval & question answering), CLEF (question answering for non-English languages and cross-lingual QA)

Page 134: Tutorial on automatic summarization

SUMMAC

• the first evaluation conference organised in automatic summarisation (in 1998)

• 6 participants in the dry-run and 16 in the formal evaluation

• mainly extrinsic evaluation:
  • adhoc task: determine the relevance of the source document to a query (topic)
  • categorisation: assign to each document a category on the basis of its summary
  • question answering: answer questions using the summary

• a small acceptability test where direct evaluation was used

Page 135: Tutorial on automatic summarization

SUMMAC

• the TREC dataset was used

• for the adhoc evaluation, 20 topics, each with 50 documents, were selected

• the time for the adhoc task halves, with a slight (not statistically significant) reduction in accuracy

• for the categorisation task, 10 topics, each with 100 documents (5 categories)

• there is no difference in the classification accuracy, and the time decreases only for the 10% summaries

• more details can be found in (Mani et al., 1998)

Page 136: Tutorial on automatic summarization

Text Summarization Challenge

• an evaluation conference organised in Japan whose main goal is to evaluate Japanese summarisers

• it was organised using the SUMMAC model

• precision and recall were used to evaluate single-document summaries

• humans had to assess the relevance to specific queries of summaries produced from texts retrieved for those queries

• it also included some readability measures (e.g. how many deletions, insertions and replacements were necessary)

• more details can be found in (Fukusima and Okumura, 2001; Okumura, Fukusima, and Nanba, 2003)

Page 137: Tutorial on automatic summarization

Document Understanding Conference (DUC)

• an evaluation conference organised as part of a larger programme called TIDES (Translingual Information Detection, Extraction and Summarisation)

• organised from 2000

• at the beginning it was not that different from SUMMAC, but in time more difficult tasks were introduced:

• 2001: single- and multi-document generic summaries with 50, 100, 200, 400 words

• 2002: single- and multi-document generic abstracts with 50, 100, 200, 400 words, and multi-document extracts with 200 and 400 words

• 2003: abstracts of documents and document sets with 10 and 100 words, and focused multi-document summaries

Page 138: Tutorial on automatic summarization

Document Understanding Conference

• in 2004 participants were required to produce short (<665 bytes) and very short (<75 bytes) summaries of single documents and document sets, short document profiles and headlines

• from 2004 ROUGE has been used as the evaluation method

• in 2005: short multiple-document summaries, user-oriented questions

• in 2006: same as in 2005, but pyramid evaluation was also used

• in 2007: 250-word summaries and a 100-word update task; pyramid evaluation was used as a community effort

• in 2008 DUC became TAC (Text Analysis Conference)

• more information is available at: http://duc.nist.gov/

Page 139: Tutorial on automatic summarization

Structure of the course

1 Introduction to automatic summarisation

2 Important methods in automatic summarisation
   How humans produce summaries
   Single-document summarisation methods
      Surface-based summarisation methods
      Machine learning methods
      Methods which exploit the discourse structure
      Knowledge-rich methods

Multi-document summarisation methods

3 Automatic summarisation and the Internet

Page 140: Tutorial on automatic summarization

Ideal summary processing model

Source text(s) → Interpretation → Source representation → Transformation → Summary representation → Generation → Summary text

Page 141: Tutorial on automatic summarization

How humans produce summaries

Page 142: Tutorial on automatic summarization

How humans summarise documents

• Determining how humans summarise documents is a difficult task because it requires interdisciplinary research

• Endres-Niggemeyer (1998) breaks the process into three stages: document exploration, relevance assessment and summary production

• these stages have been determined through interviews with professional summarisers

• professional summarisers use a top-down approach

• expert summarisers do not attempt to understand the source in great detail; instead they are trained to identify snippets which contain important information

• very few automatic summarisation methods use an approach similar to that of humans

Page 143: Tutorial on automatic summarization

Document exploration

• it’s the first step

• the source’s title, outline, layout and table of contents are examined

• the genre of the text is investigated because very often each genre dictates a certain structure

• For example, expository texts are expected to have a problem-solution structure

• the abstractor’s knowledge about the source is represented as a schema

• schema = an abstractor’s prior knowledge of document types and their information structure

Page 144: Tutorial on automatic summarization

Relevance assessment

• at this stage summarisers identify the theme and the thematic structure

• theme = a structured mental representation of what the document is about

• this structure allows the identification of relations between text chunks

• it is used to identify important information and to delete irrelevant and unnecessary information

• the schema is populated with elements from the thematic structure, producing an extended structure of the theme

Page 145: Tutorial on automatic summarization

Summary production

• the summary is produced from the expanded structure of the theme

• in order to avoid producing a distorted summary, summarisers rely mainly on copy/paste operations

• the chunks which are copied are reorganised to fit the new structure

• standard sentence patterns are also used

• summary production is a long process which requires several iterations

• checklists can be used

Page 146: Tutorial on automatic summarization

Single-document summarisation methods

Page 147: Tutorial on automatic summarization

Single document summarisation

• Produces summaries from a single document

• There are two main approaches:

• automatic text extraction → produces extracts, also referred to as extract and rearrange

• automatic text abstraction → produces abstracts, also referred to as understand and generate

• Automatic text extraction is the most used method to produce summaries

Page 152: Tutorial on automatic summarization

Automatic text extraction

• Extracts important sentences from the text using different methods and produces an extract by displaying the important sentences (usually in order of appearance)

• A large proportion of the sentences used in human-produced summaries have been extracted directly from the text or contain only minor modifications

• Uses different statistical, surface-based and machine learning techniques to determine which sentences are important

• First attempts were made in the 1950s

Page 153: Tutorial on automatic summarization

Automatic text extraction

• These methods are quite robust

• The main drawback of this method is that it overlooks the way in which relationships between concepts in the text are realised by the use of anaphoric links and other discourse devices

• Extracting paragraphs can solve some of these problems

• Some methods involve excluding the unimportant sentences instead of extracting the important sentences

Page 154: Tutorial on automatic summarization

Surface-based summarisation methods

Page 155: Tutorial on automatic summarization

Term-based summarisation

• It was the first method used to produce summaries, by Luhn (1958)

• Relies on the assumption that important sentences have a large number of important words

• The importance of a word is calculated using statistical measures

• Even though this method is very simple, it is still used in combination with other methods

• A demo summariser which relies on term frequency can be found at: http://clg.wlv.ac.uk/projects/CAST/demos.php

Page 156: Tutorial on automatic summarization

How to compute the importance of a word

• Different methods can be used:
  • Term frequency: how frequent a word is in the document
  • TF*IDF: relies on how frequent a word is in a document and in how many documents from a collection the word appears

TF∗IDF(w) = TF(w) · log(Number of documents / Number of documents with w)

• other statistical measures; for examples see (Orasan, 2009)

• Issues:
  • stoplists should be used
  • what should be counted: words, lemmas, truncation, stems
  • how to select the document collection
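A small Python sketch of the TF*IDF computation over a toy document collection (tokenisation and stoplist handling are deliberately simplified):

```python
from collections import Counter
from math import log

def tf_idf(document, collection):
    """Weight each word in `document` by TF(w) * log(N / df(w)), where
    df(w) is the number of documents in `collection` containing w."""
    n_docs = len(collection)
    tf = Counter(document.lower().split())
    df = Counter(w for doc in collection for w in set(doc.lower().split()))
    # `or 1` guards against words unseen in the collection; normally the
    # document itself is part of the collection, so df(w) >= 1
    return {w: count * log(n_docs / (df[w] or 1)) for w, count in tf.items()}
```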

Page 157: Tutorial on automatic summarization

Term-based summarisation: the algorithm

(and can be used for other types of summarisers)

1 Score all the words in the source according to the selected measure

2 Score all the sentences in the text by adding the scores of the words in those sentences

3 Extract the sentences with top N scores

4 Present the extracted sentences in the original order
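A compact Python sketch of the four steps, assuming raw term frequency as the word-importance measure and a toy stoplist:

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "are"}  # toy stoplist

def term_based_summary(sentences, top_n=3):
    def content_words(sentence):
        return [w for w in sentence.lower().split() if w not in STOPWORDS]

    word_score = Counter(w for s in sentences for w in content_words(s))   # step 1
    sent_score = [sum(word_score[w] for w in content_words(s))
                  for s in sentences]                                      # step 2
    ranked = sorted(range(len(sentences)),
                    key=lambda i: sent_score[i], reverse=True)[:top_n]     # step 3
    return [sentences[i] for i in sorted(ranked)]                          # step 4
```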

Page 161: Tutorial on automatic summarization

Position method

• It was noticed that in some genres important sentences appear in predefined positions

• First used by Edmundson (1969)

• Varies very much from one genre to another:
  • newswire: lead summary, the first few sentences of the text
  • scientific papers: the first/last sentences in a paragraph are relevant for the topic of the paragraph (Baxendale, 1958)
  • scientific papers: important information occurs in specific sections of the document (introduction/conclusion)

• Lin and Hovy (1997) use a corpus to determine where these important sentences occur

Page 162: Tutorial on automatic summarization

Title method

• words in titles and headings are positively relevant to summarisation

• Edmundson (1969) noticed that performance can increase by up to 8% if the scores of sentences which include such words are increased

Page 163: Tutorial on automatic summarization

Cue words/indicating phrases

• Makes use of words or phrases classified as “positive” or “negative” which may indicate topicality, and thus the value of a sentence for an abstract

• positive: significant, purpose, in this paper, we show
• negative: Figure 1, believe, hardly, impossible, pronouns

• Paice (1981) proposes indicating phrases, which are basically patterns (e.g. [In] this paper/report/article we/I show)

Page 164: Tutorial on automatic summarization

Methods inspired by IR (Salton et al., 1997)

• decomposes a document into a set of paragraphs

• computes the similarity between paragraphs, which represents the strength of the link between two paragraphs

• paragraphs are considered similar if their similarity is above a threshold

• paragraphs can be extracted according to different strategies (e.g. the number of links they have, selecting connected paragraphs)

Page 165: Tutorial on automatic summarization
Page 166: Tutorial on automatic summarization
Page 167: Tutorial on automatic summarization

How to combine different methods

• Edmundson (1969) used a linear combination of features:

Weight(S) = α·Title(S) + β·Cue(S) + γ·Keyword(S) + δ·Position(S)

• the weights were adjusted manually

• the best system was cue + title + position

• it is better to use machine learning methods to combine the results of different modules
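A direct transcription of this weighted combination; the per-sentence feature scores are assumed to be computed beforehand:

```python
def edmundson_weight(features, alpha=1.0, beta=1.0, gamma=1.0, delta=1.0):
    """Weight(S) for one sentence, given its four feature scores in a dict
    with keys 'title', 'cue', 'keyword' and 'position'."""
    return (alpha * features["title"] + beta * features["cue"]
            + gamma * features["keyword"] + delta * features["position"])
```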

Page 168: Tutorial on automatic summarization

Machine learning methods

Page 169: Tutorial on automatic summarization

What is machine learning (ML)?

Mitchell (1997):

• “machine learning is concerned with the question of how to construct computer programs that automatically improve with experience”

• “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E”

Page 170: Tutorial on automatic summarization

What is machine learning? (2)

• Reasoning is based on the similarity between new situations and the ones present in the training corpus

• In some cases it is possible to understand what is learnt (e.g. if-then rules)

• But in many cases the knowledge learnt by an algorithm cannot be easily understood (instance-based learning, neural networks)

Page 171: Tutorial on automatic summarization

ML for language processing

• Has been widely employed in a large number of NLP applications, which range from part-of-speech tagging and syntactic parsing to word-sense disambiguation and coreference resolution.

• In NLP both symbolic methods (e.g. decision trees, instance-based classifiers) and numerically oriented statistical and neural-network training approaches have been used

Page 172: Tutorial on automatic summarization

ML as classification task

Very often an NLP problem can be seen as a classification problem

• POS: finding the appropriate class of a word

• Segmentation (e.g. noun phrase extraction): each word is classified as the beginning, end or inside of the segment

• Anaphora/coreference resolution: classify candidates into antecedent/non-antecedent

Page 173: Tutorial on automatic summarization

Summarisation as a classification task

• Each example (instance) in the set to be learnt can be described by a set of features f1, f2, ..., fn

• The task is to find a way to assign an instance to one of the m disjoint classes c1, c2, ..., cm

• The automatic summarisation process is usually transformed into a classification one

• The features are different properties of sentences (e.g. position, keywords, etc.)

• Two classes: extract/do-not-extract

• Not always classification: it is possible to use the score or automatically learnt rules as well

Page 174: Tutorial on automatic summarization

Kupiec et al. (1995)

• used a Bayesian classifier to combine different features

• the features were:
  • whether the length of a sentence is above a threshold (true/false)
  • contains cue words (true/false)
  • position in the paragraph (initial/middle/final)
  • contains keywords (true/false)
  • contains capitalised words (true/false)

• the training and testing corpus consisted of 188 documents with summaries

• humans identified the sentences from the full text which were used in the summary

• the best combination was position + cue + length

• Teufel and Moens (1997) used a similar method for sentence extraction
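A sketch of this Bayesian combination, assuming the probabilities have already been estimated from the annotated training corpus (the dictionary layout is illustrative):

```python
def kupiec_score(features, p_summary, p_f_given_summary, p_f):
    """P(s in summary | F1..Fk) ∝ P(s in summary) * Π P(Fj | summary) / P(Fj),
    with the features assumed independent. `features` maps a feature name to
    its value for this sentence; the other arguments hold the estimated
    probabilities, e.g. p_f_given_summary["cue"][True]."""
    score = p_summary
    for name, value in features.items():
        score *= p_f_given_summary[name][value] / p_f[name][value]
    return score  # rank all sentences by this score and extract the top ones
```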

Page 175: Tutorial on automatic summarization

Mani and Bloedorn (1998)

• learn rules about how to classify sentences

• features used:
  • location features: location of the sentence in the paragraph, sentence in a special section, etc.
  • thematic features: tf score, tf*idf score, number of section heading words
  • cohesion features: number of sentences with a synonym link to the sentence
  • user-focused features: number of terms relevant to the topic

• Example of a rule learnt: IF sentence in conclusion & tf*idf high & compression = 20% THEN summary sentence

Page 176: Tutorial on automatic summarization

Other ML methods

• Osborne (2002) used maximum entropy with features such as word pairs, sentence length, sentence position, discourse features (e.g., whether the sentence follows the “Introduction”, etc.)

• Knight and Marcu (2000) use a noisy-channel model for sentence compression

• Conroy et al. (2001) use HMMs

• Most of the methods these days try to use machine learning

Page 177: Tutorial on automatic summarization

Methods which exploit the discourse structure

Page 178: Tutorial on automatic summarization

Methods which exploit discourse cohesion

• summarisation methods which use discourse structure usually produce better quality summaries because they consider the relations between the extracted chunks

• they rely on the global discourse structure

• they are more difficult to implement because very often the theories on which they are based are difficult and not fully understood

• there are methods which use text cohesion and methods which use text coherence

• very often it is difficult to control the length of summaries produced in this way

Page 179: Tutorial on automatic summarization

Methods which exploit text cohesion

• text cohesion involves relations between words, word senses and referring expressions, which determine how tightly connected the text is

• (S13) “All we want is justice in our own country,” aboriginal activist Charles Perkins told Tuesday’s rally. ... (S14) “We don’t want budget cuts - it’s hard enough as it is,” said Perkins

• there are methods which exploit lexical chains and coreferential chains

Page 181: Tutorial on automatic summarization

Lexical chains for text summarisation

• Telepattan system: Benbrahim and Ahmad (1995)

• two sentences are linked if their words are related by repetition, synonymy, class/superclass or paraphrase

• sentences which have a number of links above a threshold form a bond

• on the basis of the bonds a sentence has to the previous and following sentences, it is possible to classify sentences as topic-start, topic-middle and topic-end

• sentences are extracted on the basis of this open–continue–end topic structure

• Barzilay and Elhadad (1997) implemented a more refined version of the algorithm which includes ambiguity resolution

Page 182: Tutorial on automatic summarization

Using coreferential chains for textsummarisation

• method presented in (Azzam, Humphreys, and Gaizauskas, 1999)

• the underlying idea is that it is possible to capture the most important topic of a document by using a principal coreferential chain

• the LaSIE system was used to produce the coreferential chains, extended with a focus-based algorithm for the resolution of pronominal anaphora

Page 183: Tutorial on automatic summarization

Coreference chain selection

The summarisation module implements several selection criteria:

• Length of chain: prefers the chain which contains the most entries, i.e. represents the most-mentioned instance in the text

• Spread of the chain: the distance between the earliest and the latest entry in each chain

• Start of chain: prefers a chain which starts in the title or in the first paragraph of the text (this criterion could be very useful for some genres such as newswire)

Page 184: Tutorial on automatic summarization

Summarisation methods which use the rhetorical structure of texts

• based on Rhetorical Structure Theory (RST) (Mann and Thompson, 1988)

• according to this theory, text is organised in non-overlapping spans which are linked by rhetorical relations and can be organised in a tree structure

• there are two types of spans: nuclei and satellites

• a nucleus can be understood without its satellites, but not the other way around

• satellites can be removed in order to obtain a summary

• the most difficult part is to build the rhetorical structure of a text

• Ono, Sumita and Miike (1994), Marcu (1997) and Corston-Oliver (1998) present summarisation methods which use the rhetorical structure of the text

Page 185: Tutorial on automatic summarization

from (Marcu, 2000)

Page 186: Tutorial on automatic summarization

Summarisation using argumentative zoning

• Teufel and Moens (2002) exploit the structure of scientific documents in order to produce summaries

• the summarisation process is split into two parts

1 identification of important sentences using an approach similar to the one proposed by Kupiec, Pederson, and Chen (1995)

2 recognition of the rhetorical roles of the extracted sentences

• for the rhetorical roles the following classes are used: Aim, Textual, Own, Background, Contrast, Basis, Other

Page 187: Tutorial on automatic summarization

Knowledge-rich methods

Page 188: Tutorial on automatic summarization

Knowledge-rich methods

• Produce abstracts

• Most of them try to “understand” a text (at least partially) and to make inferences before generating the summary

• The systems do not really understand the contents of the documents, but they use different techniques to extract the meaning

• Since this process involves a huge amount of world knowledge, such systems are restricted to a specific domain

Page 189: Tutorial on automatic summarization

Knowledge-rich methods

• The abstracts obtained in this way are better in terms of cohesion and coherence

• The abstracts produced in this way tend to be more informative

• This method is also known as the understand and generate approach

• This method extracts the information from the text and holds it in some intermediate form

• The representation is then used as the input for a natural language generator to produce an abstract

Page 190: Tutorial on automatic summarization

FRUMP (deJong, 1982)

• uses sketchy scripts to understand a situation

• these scripts only keep the information relevant to the event and discard the rest

• 50 scripts were manually created

• words from the source activate scripts, and heuristics are used to decide which script is used in case more than one script is activated

Page 191: Tutorial on automatic summarization

Example of script used by FRUMP

1 The demonstrators arrive at the demonstration location

2 The demonstrators march

3 The police arrive on the scene

4 The demonstrators communicate with the target of the demonstration

5 The demonstrators attack the target of the demonstration

6 The demonstrators attack the police

7 The police attack the demonstrators

8 The police arrest the demonstrators

Page 199: Tutorial on automatic summarization

FRUMP

• the evaluation of the system revealed that it could not process a large number of stories because it did not have the appropriate scripts

• the system is very difficult to port to a different domain

• sometimes it can misunderstand events: Vatican City. The death of the Pope shakes the world. He passed away → Earthquake in the Vatican. One dead.

• the advantage of this method is that the output can be in any language

Page 200: Tutorial on automatic summarization

Concept-based abstracting (Paice and Jones, 1993)

• Also referred to as extract and generate

• Summaries in the field of agriculture

• Relies on predefined text patterns such as this paper studies the effect of [AGENT] on the [HLP] of [SPECIES] → This paper studies the effect of G. pallida on the yield of potato.

• The summarisation process involves the instantiation of patterns with concepts from the source

• Each pattern has a weight which is used to decide whether the generated sentence is included in the output

• This method is good for producing informative summaries

Page 201: Tutorial on automatic summarization

Other knowledge-rich methods

• Rumelhart (1975) developed a system to understand and summarise simple stories, using a grammar which generated semantic interpretations of the story on the basis of hand-coded rules.

• Alterman (1986) used local understanding

• Fum, Guida, and Tasso (1985) try to replicate the human summarisation process

• Rau, Jacobs, and Zernik (1989) integrate a bottom-up linguistic analyser and a top-down conceptual interpreter

Page 202: Tutorial on automatic summarization

Multi-document summarisation methods

Page 203: Tutorial on automatic summarization

Multi-document summarisation

• multi-document summarisation is the extension of single-document summarisation to collections of related documents

• methods from single-document summarisation can very rarely be used directly

• it is not possible to produce single-document summaries from every document in the collection and then concatenate them

• normally they are user-focused summaries

Page 204: Tutorial on automatic summarization

Issues with multi-document summaries

• the collections to be summarised can vary a lot in size, so different methods might need to be used

• a much higher compression rate is needed

• redundancy

• ordering of sentences (usually the date of publication is used)

• similarities and differences between different texts need to be considered

• contradictions between pieces of information

• fragmentary information

Page 205: Tutorial on automatic summarization

IR inspired methods

• the method of Salton et al. (1997) can be adapted to multi-document summarisation

• instead of using paragraphs from one document, paragraphs from all the documents are used

• the extraction strategies are kept

Page 206: Tutorial on automatic summarization

Maximal Marginal Relevance

• proposed by (Goldstein et al., 2000)

• addresses the redundancy among multiple documents

• allows a balance between the diversity of the information and the relevance to a user query

• MMR(Q, R, S) = argmax_{Di ∈ R\S} [ λ·Sim1(Di, Q) − (1 − λ)·max_{Dj ∈ S} Sim2(Di, Dj) ], where Q is the query, R the set of candidates and S the sentences already selected

• can also be used for single-document summarisation
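A sketch of one MMR selection step, parameterised by the two similarity functions (all names are illustrative):

```python
def mmr_step(candidates, query, selected, sim1, sim2, lam=0.7):
    """Pick the candidate that best balances relevance to the query (sim1)
    against redundancy with the sentences already selected (sim2).
    Assumes at least one candidate remains unselected."""
    def mmr(d):
        redundancy = max((sim2(d, s) for s in selected), default=0.0)
        return lam * sim1(d, query) - (1 - lam) * redundancy
    remaining = [d for d in candidates if d not in selected]
    return max(remaining, key=mmr)
```

Repeating this step until the length budget is exhausted yields the summary; λ = 1 gives pure relevance ranking and λ = 0 pure diversity.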

Page 207: Tutorial on automatic summarization

Cohesion text maps

• uses knowledge based on lexical cohesion (Mani and Bloedorn, 1999)

• good for comparing pairs of documents and telling what is common and what is different

• builds a graph from the texts: the nodes of the graph are the words of the text; arcs represent adjacency, grammatical, co-reference and lexical similarity-based relations

• sentences are scored using the tf.idf metric

• the user query is used to traverse the graph (spreading activation is used)

• to minimise redundancy in extracts, extraction can be greedy to cover as many different terms as possible

Page 208: Tutorial on automatic summarization

Cohesion text maps

Page 209: Tutorial on automatic summarization

Theme fusion (Barzilay et al., 1999)

• used to avoid redundancy in multi-document summaries

• Theme = collection of similar sentences drawn from one or more related documents

• Computes the theme intersection: phrases which are common to all sentences in a theme

• paraphrasing rules are used (active vs. passive, different orders of adjuncts, classifier vs. apposition, ignoring certain premodifiers in NPs, synonymy)

• generation is used to put the theme intersection together

Page 210: Tutorial on automatic summarization

Centroid-based summarisation

• a centroid = a set of words that are statistically important to a cluster of documents

• each document is represented as a weighted vector of TF*IDF scores

• each sentence receives a score equal to the sum of the individual centroid values of its words

• related notions: sentence salience (Boguraev and Kennedy, 1999) and centroid score (Radev, Jing, and Budzikowska, 2000)
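A sketch of the sentence-scoring step, assuming the centroid (a word → weight mapping derived from the cluster's TF*IDF vectors) has already been computed:

```python
def centroid_sentence_scores(sentences, centroid):
    """Each sentence's score is the sum of the centroid values of its words;
    words absent from the centroid contribute nothing."""
    return [sum(centroid.get(w, 0.0) for w in s.lower().split())
            for s in sentences]
```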

Page 211: Tutorial on automatic summarization

Cross-document Structure Theory

• Cross-document Structure Theory (CST) provides a theoretical model for the issues that arise when trying to summarise multiple texts (Radev, Otterbacher, and Zhang, 2004).

• it describes relationships between two or more sentences from different source documents related to the same topic

• similar to RST, but at cross-document level

• 18 domain-independent relations, such as identity, equivalence, subsumption, contradiction, overlap, fulfilment and elaboration, between text spans

• can be used to extract sentences and avoid redundancy

Page 212: Tutorial on automatic summarization

Automatic summarisation and the Internet

Page 213: Tutorial on automatic summarization

• New research topics have emerged at the confluence of summarisation with other disciplines (e.g. question answering and opinion mining)

• Many of these fields appeared as a result of the expansion of the Internet

• The Internet is probably the largest source of information, but it is largely unstructured and heterogeneous

• Multi-document summarisation is more necessary than ever

• Web content mining = extraction of useful information from the Web

Page 214: Tutorial on automatic summarization

Challenges posed by the Web

• Huge amount of information

• Wide and diverse

• Information of all types, e.g. structured data, texts, videos, etc.

• Semi-structured

• Linked

• Redundant

• Noisy

Page 215: Tutorial on automatic summarization

Summarisation of news on the Web

• Newsblaster (McKeown et al., 2002) summarises news from the Web (http://newsblaster.cs.columbia.edu/)

• it is mainly statistical, but with symbolic elements

• it crawls the Web to identify stories (e.g. filters out ads), clusters them by topic and produces a multi-document summary

• theme sentences are analysed and fused together to produce the summary

• summaries also contain images, selected using high-precision rules

• similar services: NewsInEssence, Google News, News Explorer

• tracking and updating are important features of such systems

Page 216: Tutorial on automatic summarization

Email summarisation

• email summarisation is more difficult because emails have a dialogue structure

• Muresan et al. (2001) use machine learning to learn rules for salient NP extraction

• Nenkova and Bagga (2003) developed a set of rules to extract important sentences

• Newman and Blitzer (2003) use clustering to group messages together and then extract a summary from each cluster

• Rambow et al. (2004) automatically learn rules to extract sentences from emails

• these methods do not use many email-specific features, but in general the subject of the first email is used as a query

Page 217: Tutorial on automatic summarization

Blog summarisation

• Zhou et al. (2006) see a blog entry as a summary of a news story with personal opinions added; they produce a summary by deleting sentences not related to the story

• Hu et al. (2007) use a blog's comments to identify words that can be used to extract sentences from blogs

• Conrad et al. (2009) developed query-based opinion summarisation for legal blog entries based on their TAC 2008 system

Page 218: Tutorial on automatic summarization

Opinion mining and summarisation

• find what reviewers liked and disliked about a product

• there is usually a large number of reviews, so an opinion summary should be produced

• visualisation of the result is important, and the result may not be a text

• analogous to, but different from, multi-document summarisation

Page 219: Tutorial on automatic summarization

Producing the opinion summary

A three-stage process:

1 Extract the object features that have been commented on in each review.

2 Classify each opinion

3 Group feature synonyms and produce the summary (pros vs. cons, detailed review, graphical representation)

Page 220: Tutorial on automatic summarization

Opinion summaries

• Mao and Lebanon (2007) suggest producing summaries that track the sentiment flow within a document, i.e. how sentiment orientation changes from one sentence to the next

• Pang and Lee (2008) suggest creating “subjectivity extracts”

• sometimes graph-based output seems much more appropriate or useful than text-based output

• in traditional summarisation redundant information is often discarded; in opinion summarisation one wants to track and report the degree of redundancy, since in the opinion-oriented setting the user is typically interested in the (relative) number of times a given sentiment is expressed in the corpus

• there is much more contradictory information

Page 221: Tutorial on automatic summarization

Opinion summarisation at TAC

• the Text Analysis Conference 2008 (TAC) contained an opinion summarisation task on blogs

• http://www.nist.gov/tac/

• generate summaries of opinions about targets

• What features do people dislike about Vista?

• a question answering system is used to extract snippets that are passed to the summariser

Page 222: Tutorial on automatic summarization

QA and Summarisation at INEX2009

• the QA track at INEX 2009 requires participants to answer factual and complex questions

• the complex questions will require aggregating the answer from several documents

• What are the main applications of Bayesian networks in the field of bioinformatics?

• for complex questions, evaluators will mark syntactic incoherence, unresolved anaphora, redundancy and not answering the question

• Wikipedia will be used as the document collection

Page 223: Tutorial on automatic summarization

Conclusions

• research in automatic summarisation is still very active, but in many cases it merges with other fields

• evaluation is still a problem in summarisation

• the current state of the art is still sentence extraction

• more language understanding needs to be added to the systems

Page 224: Tutorial on automatic summarization

Thank you!

More information and updates at:

http://www.summarizationonline.info

Page 225: Tutorial on automatic summarization

References

Page 226: Tutorial on automatic summarization

Alterman, Richard. 1986. Summarisation in small. In N. Sharkey, editor, Advances in cognitive science. Chichester, England, Ellis Horwood.

American National Standards Institute Inc. 1979. American National Standard for Writing Abstracts. Technical Report ANSI Z39.14 – 1979, American National Standards Institute, New York.

Baxendale, Phyllis B. 1958. Man-made index for technical literature - an experiment. I.B.M. Journal of Research and Development, 2(4):354 – 361.

Boguraev, Branimir and Christopher Kennedy. 1999. Salience-based content characterisation of text documents. In Inderjeet Mani and Mark T. Maybury, editors, Advances in Automated Text Summarization. The MIT Press, pages 99 – 110.

Borko, Harold and Charles L. Bernier. 1975. Abstracting concepts and methods. Academic Press, London.

Brandow, Ronald, Karl Mitze, and Lisa F. Rau. 1995. Automatic condensation of electronic publications by sentence selection. Information Processing & Management, 31(5):675 – 685.

Cleveland, Donald B. 1983. Introduction to Indexing and Abstracting. Libraries Unlimited, Inc.

Conroy, John M., Judith D. Schlesinger, Dianne P. O’Leary, and Mary E. Okurowski. 2001. Using HMM and logistic regression to generate extract summaries for DUC. In Proceedings of the 1st Document Understanding Conference, New Orleans, Louisiana, USA, September 13-14.

DeJong, G. 1982. An overview of the FRUMP system. In W. G. Lehnert and M. H. Ringle, editors, Strategies for natural language processing. Hillsdale, NJ: Lawrence Erlbaum, pages 149 – 176.

Edmundson, H. P. 1969. New methods in automatic extracting. Journal of the Association for Computing Machinery, 16(2):264 – 285, April.

Page 227: Tutorial on automatic summarization

Endres-Niggemeyer, Brigitte. 1998. Summarizing information. Springer.

Fukusima, Takahiro and Manabu Okumura. 2001. Text Summarization Challenge: Text summarization evaluation in Japan (TSC). In Proceedings of the Automatic Summarization Workshop.

Fum, Danilo, Giovanni Guida, and Carlo Tasso. 1985. Evaluating importance: a step towards text summarisation. In Proceedings of the 9th International Joint Conference on Artificial Intelligence, pages 840 – 844, Los Altos CA, August.

Goldstein, Jade, Mark Kantrowitz, Vibhu Mittal, and Jaime Carbonell. 1999. Summarizing text documents: Sentence selection and evaluation metrics. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 121 – 128, Berkeley, California, August, 15 – 19.

Goldstein, Jade, Vibhu O. Mittal, Jamie Carbonell, and Mark Kantrowitz. 2000. Multi-Document Summarization by Sentence Extraction. In Udo Hahn, Chin-Yew Lin, Inderjeet Mani, and Dragomir R. Radev, editors, Proceedings of the Workshop on Automatic Summarization at the 6th Applied Natural Language Processing Conference and the 1st Conference of the North American Chapter of the Association for Computational Linguistics, Seattle, WA, April.

Graetz, Naomi. 1985. Teaching EFL students to extract structural information from abstracts. In J. M. Ulign and A. K. Pugh, editors, Reading for Professional Purposes: Methods and Materials in Teaching Languages. Leuven: Acco, pages 123 – 135.

Hasler, Laura, Constantin Orasan, and Ruslan Mitkov. 2003. Building better corpora for summarisation. In Proceedings of Corpus Linguistics 2003, pages 309 – 319, Lancaster, UK, March, 28 – 31.

Hovy, Eduard. 2003. Text summarisation. In Ruslan Mitkov, editor, The Oxford Handbook of Computational Linguistics. Oxford University Press, pages 583 – 598.

Page 228: Tutorial on automatic summarization

Jing, Hongyan and Kathleen R. McKeown. 1999. The decomposition of human-written summary sentences. In Proceedings of the 22nd International Conference on Research and Development in Information Retrieval (SIGIR’99), pages 129 – 136, University of Berkeley, CA, August.

Johnson, Frances. 1995. Automatic abstracting research. Library Review, 44(8):28 – 36.

Knight, Kevin and Daniel Marcu. 2000. Statistics-based summarization — step one: Sentence compression. In Proceedings of the 17th National Conference on Artificial Intelligence (AAAI), pages 703 – 710, Austin, Texas, USA, July 30 – August 3.

Kolcz, Aleksander, Vidya Prabakarmurthi, and Jugal Kalita. 2001. Summarization as feature selection for text categorization. In Proceedings of the 10th International Conference on Information and Knowledge Management, pages 365 – 370, Atlanta, Georgia, US, October 05 - 10.

Kuo, June-Jei, Hung-Chia Wung, Chuan-Jie Lin, and Hsin-Hsi Chen. 2002. Multi-document summarization using informative words and its evaluation with a QA system. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2002), pages 391 – 401, Mexico City, Mexico, February, 17 – 23.

Kupiec, Julian, Jan Pederson, and Francine Chen. 1995. A trainable document summarizer. In Proceedings of the 18th ACM/SIGIR Annual Conference on Research and Development in Information Retrieval, pages 68 – 73, Seattle, July 09 – 13.

Lin, Chin-Yew. 2004. ROUGE: a package for automatic evaluation of summaries. In Proceedings of the Workshop on Text Summarization Branches Out (WAS 2004), Barcelona, Spain, July 25 - 26.

Lin, Chin-Yew and Eduard Hovy. 1997. Identifying topic by position. In Proceedings of the 5th Conference on Applied Natural Language Processing, pages 283 – 290, Washington, DC, March 31 – April 3.

Page 229: Tutorial on automatic summarization

Louis, Annie and Ani Nenkova. 2009. Performance confidence estimation for automatic summarization. In Proceedings of the 12th Conference of the European Chapter of the ACL, pages 541 – 548, Athens, Greece, March 30 - April 3.

Luhn, H. P. 1958. The automatic creation of literature abstracts. IBM Journal of Research and Development, 2(2):159 – 165.

Mani, Inderjeet and Eric Bloedorn. 1998. Machine learning of generic and user-focused summarization. In Proceedings of the Fifteenth National Conference on Artificial Intelligence, pages 821 – 826, Madison, Wisconsin. MIT Press.

Mani, Inderjeet and Eric Bloedorn. 1999. Summarizing similarities and differences among related documents. In Inderjeet Mani and Mark T. Maybury, editors, Advances in automatic text summarization. The MIT Press, chapter 23, pages 357 – 379.

Mani, Inderjeet, Therese Firmin, David House, Michael Chrzanowski, Gary Klein, Lynette Hirschman, Beth Sundheim, and Leo Obrst. 1998. The TIPSTER SUMMAC text summarisation evaluation: Final report. Technical Report MTR 98W0000138, The MITRE Corporation.

Mani, Inderjeet and Mark T. Maybury, editors. 1999. Advances in automatic text summarisation. MIT Press.

Marcu, Daniel. 1999. The automatic construction of large-scale corpora for summarization research. In The 22nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’99), pages 137 – 144, Berkeley, CA, August 15 – 19.

Marcu, Daniel. 2000. The theory and practice of discourse parsing and summarisation. The MIT Press.

Miike, Seiji, Etsuo Itoh, Kenji Ono, and Kazuo Sumita. 1994. A full-text retrieval system with a dynamic abstract generation function. In Proceedings of the 17th ACM SIGIR conference, pages 152 – 161, Dublin, Ireland, 3-6 July. ACM/Springer.

Page 230: Tutorial on automatic summarization

Minel, Jean-Luc, Sylvaine Nugier, and Gerald Piat. 1997. How to appreciate the quality of automatic text summarization? In Proceedings of the ACL’97/EACL’97 Workshop on Intelligent Scalable Text Summarization, pages 25 – 30, Madrid, Spain, July 11.

Morris, Andrew H., George M. Kasper, and Dennis A. Adams. 1992. The effect and limitations of automatic text condensing on reading comprehension performance. Information Systems Research, 3(1):17 – 35.

Oka, Mamiko and Yoshihiro Ueda. 2000. Evaluation of phrase-representation summarization based on information retrieval task. In NAACL-ANLP 2000 Workshop on Automatic Summarization, pages 59 – 68, Seattle, Washington, April 30.

Okumura, Manabu, Takahiro Fukusima, and Hidetsugu Nanba. 2003. Text Summarization Challenge 2: Text Summarization Evaluation at NTCIR Workshop 3. In Proceedings of the HLT-NAACL 2003 Workshop on Text Summarization, pages 49 – 56, Edmonton, Alberta, Canada, May 31 – June 1.

Orasan, Constantin. 2009. Comparative evaluation of term-weighting methods for automatic summarization. Journal of Quantitative Linguistics, 16(1):67 – 95.

Orasan, Constantin, Viktor Pekar, and Laura Hasler. 2004. A comparison of summarisation methods based on term specificity estimation. In Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004), pages 1037 – 1041, Lisbon, Portugal, May 26 – 28.

Osborne, M. 2002. Using maximum entropy for sentence extraction. In Proceedings of the ACL 2002 Workshop on Automatic Summarization.

Paice, Chris D. 1981. The automatic generation of literature abstracts: an approach based on the identification of self-indicating phrases. In R. N. Oddy, C. J. Rijsbergen, and P. W. Williams, editors, Information Retrieval Research. London: Butterworths, pages 172 – 191.

Page 231: Tutorial on automatic summarization

Papineni, K., S. Roukos, T. Ward, and W. J. Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL 2002), pages 311 – 318.

Radev, Dragomir, Jahna Otterbacher, and Zhu Zhang. 2004. CSTBank: A Corpus for the Study of Cross-document Structural Relationships. In Proceedings of the Language Resources and Evaluation Conference (LREC 2004), Lisbon, Portugal.

Radev, Dragomir R., Hongyan Jing, and Malgorzata Budzikowska. 2000. Centroid-based summarization of multiple documents: sentence extraction, utility-based evaluation and user studies. In Proceedings of the NAACL/ANLP Workshop on Automatic Summarization, pages 21 – 29, Seattle, WA, USA, 30 April.

Rau, Lisa F., Paul S. Jacobs, and Uri Zernik. 1989. Information extraction and text summarisation using linguistic knowledge acquisition. Information Processing & Management, 25(4):419 – 428.

Rumelhart, David E. 1975. Notes on a schema for stories. In D. G. Bobrow and A. Collins, editors, Representation and Understanding: Studies in Cognitive Science. Academic Press Inc, pages 211 – 236.

Saggion, Horacio and Guy Lapalme. 2000. Concept identification and presentation in the context of technical text summarization. In NAACL-ANLP 2000 Workshop on Automatic Summarization, pages 1 – 10, Seattle, Washington, April 30.

Salton, Gerard, Amit Singhal, Mandar Mitra, and Chris Buckley. 1997. Automatic text structuring and summarization. Information Processing and Management, 33(3):193 – 207.

Sparck Jones, Karen. 1999. Automatic summarizing: factors and directions. In Inderjeet Mani and Mark T. Maybury, editors, Advances in automatic text summarization. The MIT Press, chapter 1, pages 1 – 12.

Page 232: Tutorial on automatic summarization

Sparck Jones, Karen. 2001. Factorial summary evaluation. In Proceedings of the Workshop on Text Summarization (DUC 2001), New Orleans, Louisiana, USA, September 13-14.

Sparck Jones, Karen. 2007. Automatic summarising: The state of the art. Information Processing and Management, 43:1449 – 1481.

Teufel, Simone and Marc Moens. 1997. Sentence extraction as a classification task. In Proceedings of the ACL’97/EACL’97 Workshop on Intelligent Scalable Text Summarization, pages 58 – 59, Madrid, Spain, July 11.

Teufel, Simone and Marc Moens. 2002. Summarizing scientific articles: Experiments with relevance and rhetorical status. Computational Linguistics, 28(4):409 – 445.

van Dijk, Teun A. 1980. Text and context: explorations in the semantics and pragmatics of discourse. London: Longman.