CLEF 2008 Multilingual Question Answering Track

UNED: Anselmo Peñas, Valentín Sama, Álvaro Rodrigo. CELCT: Danilo Giampiccolo, Pamela Forner.


Transcript of CLEF 2008 Multilingual Question Answering Track

Page 1: CLEF 2008 Multilingual Question Answering Track

CLEF 2008

Multilingual Question Answering Track

UNED: Anselmo Peñas, Valentín Sama, Álvaro Rodrigo

CELCT: Danilo Giampiccolo, Pamela Forner

Page 2: CLEF 2008 Multilingual Question Answering Track

QA 2008 Task and Exercises

QA Main task (6th edition)

Pilot: QA WSD, English newswire collections with Word Sense Disambiguation

Answer Validation Exercise – AVE (3rd edition)

QA on Speech Transcripts – QAST (2nd edition)

Page 3: CLEF 2008 Multilingual Question Answering Track

Main Task QA 2008: Organizing Committee

CELCT (D. Giampiccolo, P. Forner): Italian
UNED (A. Peñas): Spanish
U. Groningen (G. Bosma): Dutch
U. Limerick (R. Sutcliffe): English
DFKI (B. Sacaleanu): German
ELDA/ELRA (N. Moreau): French
Linguateca (P. Rocha): Portuguese
Bulgarian Academy of Sciences (P. Osenova): Bulgarian
IASI (C. Forascu): Romanian
U. Basque Country (I. Alegria): Basque
ILSP (P. Prokopidis): Greek

Page 4: CLEF 2008 Multilingual Question Answering Track

Evolution of the Track

Year: 2003 2004 2005 2006 2007 2008

Target languages: 3 7 8 9 10 11

Collections: News 1994; + News 1995; + Wikipedia Nov. 2006

Type of questions: 200 Factoid; + Temporal restrictions; + Definitions; - Type of question; + Lists; + Linked questions; + Closed lists

Supporting information: Doc.; Snippet

Pilots and Exercises: Temporal restrictions, Lists | AVE, Real Time, WiQA | AVE, QAST | AVE, QAST, WSD QA

Page 5: CLEF 2008 Multilingual Question Answering Track

200 questions

FACTOID (loc, mea, org, per, tim, cnt, obj, oth)

DEFINITION (per, org, obj, oth)

CLOSED LIST: Who were the components of The Beatles? Who were the last three presidents of Italy?

LINKED QUESTIONS: Who was called the “Iron-Chancellor”? When was he born? Who was his first wife?

♦ Temporal restrictions: by date, by period, by event

♦ NIL questions (without known answer in the collection)

Page 6: CLEF 2008 Multilingual Question Answering Track

43 Activated Language Combinations (at least one registered participant)

Page 7: CLEF 2008 Multilingual Question Answering Track

Activated Tasks

            MONOLINGUAL  CROSS-LINGUAL  TOTAL
CLEF 2003   3            5              8
CLEF 2004   6            13             19
CLEF 2005   8            15             23
CLEF 2006   7            17             24
CLEF 2007   8            29             37
CLEF 2008   10           33             43

Page 8: CLEF 2008 Multilingual Question Answering Track

Submitted runs

            Submitted runs  Monolingual  Cross-lingual
CLEF 2003   17              6            11
CLEF 2004   48 (+182%)      20           28
CLEF 2005   67 (+40%)       43           24
CLEF 2006   77 (+15%)       42           35
CLEF 2007   37 (-52%)       20           17
CLEF 2008   51 (+38%)       31           20
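The year-over-year percentages next to the run totals can be reproduced from the totals alone. The Python sketch below is illustrative and not part of the original slides: the function name `yearly_change` is hypothetical, and it assumes the percentages were rounded to the nearest integer.

```python
# Total submitted runs per CLEF QA campaign, copied from the Submitted runs table.
runs = {2003: 17, 2004: 48, 2005: 67, 2006: 77, 2007: 37, 2008: 51}


def yearly_change(series):
    """Return {year: percent change versus the previous year}, rounded to an integer.

    Hypothetical helper for illustration; assumes nearest-integer rounding.
    """
    years = sorted(series)
    return {
        year: round(100 * (series[year] - series[prev]) / series[prev])
        for prev, year in zip(years, years[1:])
    }


changes = yearly_change(runs)
for year, pct in changes.items():
    print(f"CLEF {year}: {runs[year]} runs ({pct:+d}%)")
```

Under that rounding assumption the sketch yields +182%, +40%, +15%, -52% and +38% for 2004 through 2008, which agrees with the figures on the slide.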

Page 9: CLEF 2008 Multilingual Question Answering Track

Participant groups

            Newcomers  Veterans  TOTAL        Registered
CLEF 2003   -          -         8            -
CLEF 2004   13         5         18 (+125%)   22
CLEF 2005   9          15        24 (+33%)    27
CLEF 2006   10         20        30 (+25%)    36
CLEF 2007   8          14        22 (-26%)    29
CLEF 2008   8          13        21           33

Page 10: CLEF 2008 Multilingual Question Answering Track


List of Participants (random order)

Bulgaria

Page 11: CLEF 2008 Multilingual Question Answering Track

Groups per year and target collection

[Chart: number of groups per year, 2003-2008 (vertical axis 0 to 45), broken down by target collection: Greek, Finnish, French, Spanish, English, Italian, Dutch, Bulgarian, Basque, Romanian, German, Portuguese. Annotations on the chart: "Task Change", "Natural selection?", "Above 20 groups".]

Page 12: CLEF 2008 Multilingual Question Answering Track

Groups per target collection

[Chart: number of groups per target collection for each year, 2003-2008 (vertical axis 0 to 10). Collections: English, Spanish, French, Portuguese, German, Romanian, Italian, Bulgarian, Dutch, Basque, Finnish, Greek.]

Page 13: CLEF 2008 Multilingual Question Answering Track

2008 participation: Comparative evaluation?

Weakness from an evaluation perspective:

4 languages without comparison between different groups

Breakout session

Language     Runs  Different groups
Portuguese   9     6
Spanish      10    4
English      5     4
German       11    3
Romanian     4     2
Dutch        4     1
Basque       4     1
French       3     1
Bulgarian    1     1
Italian      0     0
Greek        0     0
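The count of languages without cross-group comparison follows directly from the participation figures. The Python sketch below is illustrative and not part of the original slides (all variable names are hypothetical); it recomputes that count, the languages with no participation, and the total number of runs.

```python
# (runs, distinct groups) per target language, copied from the 2008 participation table.
participation = {
    "Portuguese": (9, 6),
    "Spanish": (10, 4),
    "English": (5, 4),
    "German": (11, 3),
    "Romanian": (4, 2),
    "Dutch": (4, 1),
    "Basque": (4, 1),
    "French": (3, 1),
    "Bulgarian": (1, 1),
    "Italian": (0, 0),
    "Greek": (0, 0),
}

# Languages where exactly one group took part: no comparison between groups is possible.
single_group = [lang for lang, (_, groups) in participation.items() if groups == 1]

# Languages that were offered but attracted no runs.
no_participation = [lang for lang, (_, groups) in participation.items() if groups == 0]

# Sum of the per-language run counts.
total_runs = sum(n_runs for n_runs, _ in participation.values())

print(f"Single-group languages ({len(single_group)}): {', '.join(single_group)}")
print(f"No participation: {', '.join(no_participation)}")
print(f"Total runs: {total_runs}")
```

The sketch gives four single-group languages (Dutch, Basque, French, Bulgarian), two languages without participation, and 51 runs in total, consistent with the figures quoted elsewhere in the slides.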

Page 14: CLEF 2008 Multilingual Question Answering Track

Results: Best and Average scores

[Chart: scores per year, 2003-2008 (vertical axis 0,0 to 80,0), for four series: Best Bilingual, Average Bilingual, Best Monolingual, Average Monolingual.]

Page 15: CLEF 2008 Multilingual Question Answering Track

Best scores by language

[Chart: best score per target language (vertical axis 0 to 80) for German, English, Spanish, French, Italian, Dutch, Portuguese, Romanian. Series: Best2004, Best2005, Best2006, Best2007, Best2008.]

Page 16: CLEF 2008 Multilingual Question Answering Track

Best scores by participant

[Chart: best score per participant (vertical axis 0 to 80) for DFKI, HAGEN, INAOE, GRONINGEN, PRIBERAM, SYNAPSE. Series: 2004, 2005, 2006, 2007, 2008.]

Page 17: CLEF 2008 Multilingual Question Answering Track

Results depend on type of questions

Definitions: almost solved for several systems (80%-95%)

Factoids: 50%-65% for several systems

Temporal restrictions: same level of difficulty as factoids for some systems

Closed lists: still very difficult

Linked questions: still very difficult

Now Wikipedia provides more answers

Page 18: CLEF 2008 Multilingual Question Answering Track

Conclusion

Same task as 2007

Same level of participation (slightly better)
  11 target languages (9 with participation)
  43 activated subtasks
  21 participants
  51 runs

Same results (slightly better)

Page 19: CLEF 2008 Multilingual Question Answering Track

Future direction

Fewer participants per language
  Poor comparison
  Change methodology: one task for all

Criticisms of QA over Wikipedia
  Easier to find questions with IR
  No user model
  Change collection

QA proposal for 2009
  SC and breakout