MediaEval 2015 - Multimodal Person Discovery in Broadcast TV - poster

Multimodal Person Discovery in Broadcast TV Task at MediaEval 2015

Johann Poignant, Hervé Bredin, Claude Barras
LIMSI – CNRS, Orsay, France
[email protected]

Motivation
➢ Indexing TV archives → find people appearing and speaking
➢ Biometric models are not always available → find people's identities in the video (pronounced names, names written on screen)

Definition of the task
➢ From a collection of TV broadcasts pre-segmented into shots, participants are asked to provide:
    ➢ The names of people both speaking and appearing at the same time during the shot, with a confidence score
    ➢ A piece of evidence (a unique shot) proving that the person holds the right name
➢ The list of persons was not provided
➢ Person biometric models trained on external data cannot be used
➢ Participants should find the names in the audio (using ASR) or visual (using OCR) streams

[Figure: worked example over shots #1 to #4, with INPUT and OUTPUT rows. Legend: speaking face, evidence, text overlay, speech transcript.]
Shot #1 is evidence for Mr A. Shot #3 is evidence for Mrs B.

Datasets
➢ Dev set: REPERE (2 French channels, 8 shows, 137 hours, 50 hours manually annotated)
➢ Test set: INA corpus (1 French channel, 172 editions of the evening broadcast news, 106 hours)
➢ Annotated a posteriori based on participants' submissions

Baseline and metadata
➢ Available as open-source software at http://github.com/MediaEvalPersonDiscoveryTask

Metrics
➢ For each query q ∈ Q, rank all shots (according to their confidence) of the best hypothesis p (if Levenshtein ratio(p, q) > 0.95)
➢ Average Precision

\[
\mathrm{AP} = \frac{\sum_{k=1}^{n} P(k) \cdot \mathrm{rel}(k)}{\#\,\text{relevant shots}}
\]

where k is the rank, P(k) is the precision at cut-off k, and rel(k) = 1 if the shot at rank k is relevant, 0 otherwise

➢ Mean Average Precision

\[
\mathrm{MAP} = \frac{1}{|Q|} \sum_{q \in Q} \mathrm{AP}(q)
\]

➢ Correctness of evidence

\[
C(q) =
\begin{cases}
1 & \text{if Levenshtein ratio}(p, q) > 0.95 \text{ and the evidence proves the identity of } q \\
0 & \text{otherwise}
\end{cases}
\]

➢ Evidence-weighted Mean Average Precision

\[
\mathrm{EwMAP} = \frac{1}{|Q|} \sum_{q \in Q} C(q) \cdot \mathrm{AP}(q)
\]
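The metrics above can be sketched in a few lines of Python. This is an illustrative sketch only, not the official scoring tool. All function names are hypothetical, and the Levenshtein ratio is assumed here to be 1 minus the edit distance divided by the length of the longer string, since the poster does not spell out its exact definition.

```python
from typing import Optional, Sequence


def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion, substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def levenshtein_ratio(a: str, b: str) -> float:
    """ASSUMPTION: ratio = 1 - distance / max(len(a), len(b))."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / max(len(a), len(b))


def best_hypothesis(query: str, hypotheses: Sequence[str],
                    threshold: float = 0.95) -> Optional[str]:
    """Closest hypothesized name p for query q, kept only if ratio > threshold."""
    best, best_ratio = None, threshold
    for p in hypotheses:
        r = levenshtein_ratio(p, query)
        if r > best_ratio:
            best, best_ratio = p, r
    return best


def average_precision(ranked_relevance: Sequence[bool], n_relevant: int) -> float:
    """AP = sum_k P(k) * rel(k) / (# relevant shots), shots sorted by confidence."""
    if n_relevant == 0:
        return 0.0
    hits, total = 0, 0.0
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            total += hits / k  # P(k), counted only where rel(k) = 1
    return total / n_relevant


def mean_average_precision(ap: Sequence[float]) -> float:
    """MAP = mean of AP(q) over all queries."""
    return sum(ap) / len(ap) if ap else 0.0


def evidence_weighted_map(ap: Sequence[float], correct: Sequence[int]) -> float:
    """EwMAP = mean of C(q) * AP(q), with C(q) in {0, 1}."""
    if not ap:
        return 0.0
    return sum(c * a for c, a in zip(correct, ap)) / len(ap)
```

For instance, a ranking whose relevance flags are `[True, False, True]`, with two relevant shots in total, gives AP = (1/1 + 2/3) / 2 = 5/6. A query whose evidence is wrong (C(q) = 0) contributes nothing to EwMAP, however well its shots are ranked.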

