Chapter 9 Topic Detection and Tracking
Transcript of Chapter 9 Topic Detection and Tracking
-
8/8/2019 Chapter 9 Topic Detection and Tracking
1/28
Chapter 9:
Topic Detection and Tracking
(TDT)
Some slides from Overview NIST Topic Detection and Tracking
-Introduction and Overview by G. Doddington
-http://www.itl.nist.gov/iaui/894.01/tests/tdt/tdt99/presentations/index.htm
-
8/8/2019 Chapter 9 Topic Detection and Tracking
2/28
TDT Task Overview
5 R&D Challenges:
Story Segmentation Topic Tracking
Topic Detection
First-Story Detection Link Detection
TDT3 CorpusCharacteristics: Two Types of Sources:
Text Speech
Two Languages: English 30,000
stories Mandarin 10,000
stories
11 Different Sources: _8 English__ 3
MandarinABC CNN VOAPRI VOA XINNBC MNB ZBNAPW NYT
** see http://www.itl.nist.gov/iaui/894.01/tdt3/tdt3.htm for details see http://morph.ldc.upenn.edu/Projects/TDT3/for details
-
8/8/2019 Chapter 9 Topic Detection and Tracking
3/28
Preliminaries
A topictopic is a seminal eventevent or activity, along with all
directly related events and activities.
A storystory is a topically cohesive segment of news that
includes two or more DECLARATIVE
independent clauses abouta single event.
-
8/8/2019 Chapter 9 Topic Detection and Tracking
4/28
Example Topic
Title: Mountain Hikers Lost
WHAT: 35 or 40 young Mountain Hikerswere lost in an avalanche in France aroundthe 20th of January.
WHERE: Orres, France WHEN: January 1998
RULES OF INTERPRETATION: 5.Accidents
-
8/8/2019 Chapter 9 Topic Detection and Tracking
5/28
(for Radio and TV only)
Transcription:
text (words)
Story:
Non-story:
The Segmentation Task:
To segment the source stream into its constituent stories,
for all audio sources.
-
8/8/2019 Chapter 9 Topic Detection and Tracking
6/28
The Topic Tracking Task:
To detect stories that discuss the target topic,
in multiple source streams.
Find all the stories that discuss a giventarget topic Training: Given Nt sample stories that
discuss a given target topic,
Test: Find all subsequent stories thatdiscuss the target topic.
on-topic
unknown
unknown
training data
test datanot guaranteed
to be off-topic
-
8/8/2019 Chapter 9 Topic Detection and Tracking
7/28
-
8/8/2019 Chapter 9 Topic Detection and Tracking
8/28
-
8/8/2019 Chapter 9 Topic Detection and Tracking
9/28
Topic Detection Conditions
Decision Deferral Conditions:
Maximum decision deferral
period in # of source files
1
10
100
-
8/8/2019 Chapter 9 Topic Detection and Tracking
10/28
There is no supervised topic training
(like Topic Detection)
Time
First Stories
Not First Stories
= Topic 1
= Topic 2
The First-Story Detection Task:
To detect the first story that discusses a topic,
for all topics.
-
8/8/2019 Chapter 9 Topic Detection and Tracking
11/28
The Link Detection Task
To detect whether a pair of stories discuss the same topic.
The topic discussed is a free variable.
Topic definition and annotation is unnecessary.
The link detection task represents a basic
functionality, needed to support all applications(including the TDT applications of topic detection andtracking).
The link detection task is related to the topic trackingtask, with Nt = 1.
same topic?
-
8/8/2019 Chapter 9 Topic Detection and Tracking
12/28
TDT3 Evaluation Methodology
All TDT3 tasks are cast as statistical detection (yes-no) tasks.
Story Segmentation: Is there a story boundary here?
Topic Tracking: Is this story on the given topic?
Topic Detection: Is this story in the correct topic-clustered set?
First-story Detection: Is this the first story on a topic?
Link Detection: Do these two stories discuss the same topic?
Performance is measured in terms of detection cost, which is aweighted sum of missand false alarmprobabilities:
CDet
= CMiss
PMiss
+ CFA
PFA
(e.g. CMiss=0.2, CFA = 0.98 )
Detection Cost is normalized to lie between 0 and 1:(CDet)Norm = CDet / min{CMiss, CFA }
-
8/8/2019 Chapter 9 Topic Detection and Tracking
13/28
Example Performance Measures:
0.01
0.1
1
Eng
lish
Manda
rinNormalized
TrackingCo
st
Tracking Results on Newswire Text (BBN)
-
8/8/2019 Chapter 9 Topic Detection and Tracking
14/28
0.01
0.1
1
BBN
1
CMU
1
Dragon
1
UPenn1
GE
1
UIowa1
UMd1
Normaliz
edTrackingCost
1999 TDT3 Tracking ResultsRequired Evaluation Condition4 English Training Stories, Multilingual Test Texts,Newswire Text+Broadcast News ASR, Given ASR Boundaries
95%
ConfidenceBars
Actual
Decision Cost
MinimumDETGraphCost
0.092
-
8/8/2019 Chapter 9 Topic Detection and Tracking
15/28
1999 TDT3 Tracking ResultsRequired Evaluation Condition
-
8/8/2019 Chapter 9 Topic Detection and Tracking
16/28
Story Segmentation using
Decision Trees
-
8/8/2019 Chapter 9 Topic Detection and Tracking
17/28
-
8/8/2019 Chapter 9 Topic Detection and Tracking
18/28
Results
-
8/8/2019 Chapter 9 Topic Detection and Tracking
19/28
Relevance Models and Link
Detection Given two stories A and B
Determine if topic(A)=topic(B)
Estimate topics models of A and B
e.g. language models
Measure distance between the models
e.g. Kullback-Leibler
-
8/8/2019 Chapter 9 Topic Detection and Tracking
20/28
Generating Queries
Suppose you have some source of
queries You have generated several queries
q1qn from this source
What is)....|( 111 + NN qqqP
q1q2q3 ?
-
8/8/2019 Chapter 9 Topic Detection and Tracking
21/28
Universe of Models
q1q2
q3
?
M1
M2
Mk
=
++=
k
i
NiiNNN qqMPMqPqqqP
1
1111 )...|()|()...|(
-
8/8/2019 Chapter 9 Topic Detection and Tracking
22/28
Using Relevance Models in Link
Detection Question:
Are stories S1 and S2 linked? Approach
Create are relevance model for S1 and S2 Measure the distance between the models
-
8/8/2019 Chapter 9 Topic Detection and Tracking
23/28
Building Relevance Models as
Topic Models S1 :
Generate queries from it Retrieve documents from the collection
Estimate
SizeColl
cf
D
tfDwP w
Dw
.)1(
||)|(
, +=
)...|()|(
)...|()|(
1
11
N
RD
N
qqDPDwP
qqwPSwP
=
=
-
8/8/2019 Chapter 9 Topic Detection and Tracking
24/28
Measuring Distances
=w SwP
SwPSwPSSD)|()|(log)|()||(
2
1121
( ))||()||(2
1)||( 122121 SSDSSDSSDsym +=
Kullback-Leibler Distance
Symmetric Kullback-Leibler Distance
Kullback-Leibler Distance with Clarity
english)general:(GE
)|(
)|(log)|()||( 2121 =
w
ClGEwP
SwPSwPSSD
-
8/8/2019 Chapter 9 Topic Detection and Tracking
25/28
Comparison on Training Data
-
8/8/2019 Chapter 9 Topic Detection and Tracking
26/28
Comparison on Evaluation Data
-
8/8/2019 Chapter 9 Topic Detection and Tracking
27/28
Distance Metric /Smoothing
Sym. KL distance + clarity is not only the best methodbut also is robust against changes in the smoothing
-
8/8/2019 Chapter 9 Topic Detection and Tracking
28/28
Summary
TDT:
International Benchmark Various sub tasks
Link detection