MediaEval 2015 - Multimodal Person Discovery in Broadcast TV - poster

1
Multimodal Person Discovery in Broadcast TV Task at MediaEval 2015 Johann Poignant, Hervé Bredin, Claude Barras LIMSI – CNRS, Orsay, France [email protected] Motivation Indexing TV archives → find people appearing and speaking Biometric models are not always available → find people identities in the video (pronounced names, names written on screen) Definition of the task From a collection of TV broadcast pre-segmented into shots, participants are asked to provide: The names of people both speaking and appearing at the same time during the shot, with a confidence score An evidence (a unique shot) proving that the person holds the right name List of persons was not provided Person biometric models trained on externel data can not be used Participants should find their names in the audio (using ASR) or visual (using OCR) streams A B A Hello Mrs B Mr A blah blah shot #1 shot #2 shot #3 A B B blah blah shot #4 speaking face evidence A B blah blah A text overlay speech transcript INPUT OUTPUT LEGEND Shot#1 is an evidence for Mr A Shot#3 is an evidence for Mrs B Datasets Dev set: REPERE (2 French channel, 8 shows, 137 hours, 50 hours manually annotated) Test set: INA corpus (1 French channel, 172 editions of the evening broadcast news, 106 hours) Annotated a posteriori based on participants' submissions Baseline and metadata Available as open-source software at http://github.com/MediaEvalPersonDiscoveryTask Metrics For each queries q ϵ Q, rank all shots (according to their confidence) of the best hypothesis p (if Levenshtein ratio(p, q) > 0.95) Average Precision Mean Average Precision Correctness of evidences Evidence-weighted Mean Average k is the rank P(k) is the precision at cut-off k rel(k) =1 if the shot at rank k is relevant, 0 otherwise k=1 (P(k)) · rel(k)) # relevant shots n AP = Q MAP = q=1 AP(q) 1 Q C(q) = 1 if the Levenshtein ratio(p, q) > 0.95 and if the evidence proves q identity 0 otherwise Q EwMAP = q=1 C(q) · AP(q) 1 Q

Transcript of MediaEval 2015 - Multimodal Person Discovery in Broadcast TV - poster

Multimodal Person Discovery in Broadcast TV Taskat MediaEval 2015

Johann Poignant, Hervé Bredin, Claude BarrasLIMSI – CNRS, Orsay, [email protected]

Motivation➢Indexing TV archives → find people appearing and speaking➢Biometric models are not always available → find people identities in the video (pronounced names, names written on screen)

Definition of the task➢From a collection of TV broadcast pre-segmented into shots, participants are asked to provide:

➢The names of people both speaking and appearing at the same time during the shot, with a confidence score➢An evidence (a unique shot) proving that the person holds the right name

➢List of persons was not provided➢Person biometric models trained on externel data can not be used➢Participants should find their names in the audio (using ASR) or visual (using OCR) streams

A BA

Hello Mrs B

Mr A

blah blah

shot #1 shot #2 shot #3

A B B

blah blah

shot #4 speaking face

evidence

A B

blah blah

A

text overlay

speech transcript

INPUT

OUTPUT

LEGEND

Shot#1 is an evidence for Mr A

Shot#3 is an evidence for Mrs B

Datasets➢Dev set: REPERE (2 French channel, 8 shows, 137 hours, 50 hours manually annotated)➢Test set: INA corpus (1 French channel, 172 editions of the evening broadcast news, 106 hours) ➢Annotated a posteriori based on participants' submissions

Baseline and metadata➢Available as open-source software at

http://github.com/MediaEvalPersonDiscoveryTask

Metrics➢For each queries q ϵ Q, rank all shots (according to their confidence) of the best hypothesis p (if Levenshtein ratio(p, q) > 0.95)

➢Average Precision

➢Mean Average Precision

➢Correctness of evidences

➢Evidence-weighted Mean Average

k is the rankP(k) is the precision at cut-off krel(k) =1 if the shot at rank k is relevant, 0 otherwise

∑k=1

(P(k)) · rel(k))

# relevant shots

n

AP =

QMAP = ∑

q=1 AP(q)1

Q

C(q) = 1 if the Levenshtein ratio(p, q) > 0.95 and if the evidence proves q identity0 otherwise

QEwMAP = ∑

q=1 C(q) · AP(q)1

Q