CLEF 2008 Multilingual Question Answering Track UNED Anselmo Peñas Valentín Sama Álvaro Rodrigo CELCT Danilo Giampiccolo Pamela Forner

CLEF 2008

Multilingual Question Answering Track

UNED: Anselmo Peñas, Valentín Sama, Álvaro Rodrigo

CELCT: Danilo Giampiccolo, Pamela Forner

2

QA 2008 Task and Exercises

QA Main task (6th edition)

Pilot: QA WSD, English newswire collections with Word Sense Disambiguation

Answer Validation Exercise – AVE (3rd edition)

QA on Speech Transcripts – QAST (2nd edition)

3

Main Task QA 2008: Organizing Committee

CELCT (D. Giampiccolo, P. Forner): Italian
UNED (A. Peñas): Spanish
U. Groningen (G. Bosma): Dutch
U. Limerick (R. Sutcliffe): English
DFKI (B. Sacaleanu): German
ELDA/ELRA (N. Moreau): French
Linguateca (P. Rocha): Portuguese
Bulgarian Academy of Sciences (P. Osenova): Bulgarian
IASI (C. Forascu): Romanian
U. Basque Country (I. Alegria): Basque
ILSP (P. Prokopidis): Greek

4

Evolution of the Track

Year              2003  2004  2005  2006  2007  2008
Target languages     3     7     8     9    10    11

Collections: News 1994; + News 1995; + Wikipedia Nov. 2006

Type of questions: 200 Factoid; + Temporal restrictions; + Definitions; - Type of question; + Lists; + Linked questions; + Closed lists

Supporting information: Doc.; Snippet

Pilots and Exercises: Temporal restrictions; Lists; AVE; Real Time; WiQA; AVE; QAST; AVE; QAST; WSD QA

5

200 questions

FACTOID (loc, mea, org, per, tim, cnt, obj, oth)

DEFINITION (per, org, obj, oth)

CLOSED LIST
  Who were the components of The Beatles?
  Who were the last three presidents of Italy?

LINKED QUESTIONS
  Who was called the “Iron-Chancellor”?
  When was he born?
  Who was his first wife?

♦ Temporal restrictions: by date, by period, by event
♦ NIL questions (without known answer in the collection)

6

43 Activated Language Combinations (at least one registered participant)

7

Activated Tasks

            MONOLINGUAL  CROSS-LINGUAL  TOTAL
CLEF 2003             3              5      8
CLEF 2004             6             13     19
CLEF 2005             8             15     23
CLEF 2006             7             17     24
CLEF 2007             8             29     37
CLEF 2008            10             33     43

8

Submitted runs

            Submitted runs  Monolingual  Cross-lingual
CLEF 2003   17                        6             11
CLEF 2004   48 (+182%)               20             28
CLEF 2005   67 (+40%)                43             24
CLEF 2006   77 (+15%)                42             35
CLEF 2007   37 (-52%)                20             17
CLEF 2008   51 (+38%)                31             20
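The year-over-year percentages in the submitted-runs table can be reproduced from the totals alone. The following is a minimal sketch, assuming each percentage is the change relative to the previous year's total, rounded to a whole percent; the names `runs` and `yearly_change` are illustrative and not part of the original slides.

```python
# Total submitted runs per year, copied from the table above.
runs = {2003: 17, 2004: 48, 2005: 67, 2006: 77, 2007: 37, 2008: 51}


def yearly_change(totals):
    """Return {year: percent change vs. the previous year}, rounded to a whole percent.

    Assumption: the slide's percentages are computed relative to the
    previous year's total.
    """
    years = sorted(totals)
    return {
        cur: round(100 * (totals[cur] - totals[prev]) / totals[prev])
        for prev, cur in zip(years, years[1:])
    }


changes = yearly_change(runs)
for year, pct in changes.items():
    print(f"CLEF {year}: {pct:+d}%")
```

Under that assumption the computed values agree with the percentages shown in the table.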

9

Participant groups

            Newcomers  Veterans  TOTAL        Registered
CLEF 2003           -         -   8                    -
CLEF 2004          13         5  18 (+125%)           22
CLEF 2005           9        15  24 (+33%)            27
CLEF 2006          10        20  30 (+25%)            36
CLEF 2007           8        14  22 (-26%)            29
CLEF 2008           8        13  21                   33

10

List of Participants (random order)

Bulgaria

11

Groups per year and target collection

[Chart: groups per target collection for each year, 2003 to 2008; vertical axis 0 to 45. Legend: Greek, Finnish, French, Spanish, English, Italian, Dutch, Bulgarian, Basque, Romanian, German, Portuguese. Annotations on the chart: "Task Change", "Natural selection?", "Above 20 groups".]

12

Groups per target collection

[Chart: groups per target collection for each year, 2003 to 2008; axis 0 to 10. Legend: English, Spanish, French, Portuguese, German, Romanian, Italian, Bulgarian, Dutch, Basque, Finnish, Greek.]

13

2008 participation: Comparative evaluation?

Shortcoming from an evaluation perspective:

4 languages without comparison between different groups

Breakout session

Language     Runs  Different groups
Portuguese      9                 6
Spanish        10                 4
English         5                 4
German         11                 3
Romanian        4                 2
Dutch           4                 1
Basque          4                 1
French          3                 1
Bulgarian       1                 1
Italian         0                 0
Greek           0                 0
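The counts quoted on the slides can be cross-checked against the per-language table. This is a small sketch, assuming that "without comparison between different groups" means a language with exactly one participating group; the variable names are illustrative only.

```python
# (runs, different groups) per target language, copied from the table above.
participation = {
    "Portuguese": (9, 6),
    "Spanish": (10, 4),
    "English": (5, 4),
    "German": (11, 3),
    "Romanian": (4, 2),
    "Dutch": (4, 1),
    "Basque": (4, 1),
    "French": (3, 1),
    "Bulgarian": (1, 1),
    "Italian": (0, 0),
    "Greek": (0, 0),
}

# Languages where only one group took part, so systems cannot be compared.
single_group = [lang for lang, (_, groups) in participation.items() if groups == 1]

# Languages with at least one participating group.
with_participation = [lang for lang, (_, groups) in participation.items() if groups > 0]

total_runs = sum(runs for runs, _ in participation.values())

print(len(single_group), single_group)
print(len(with_participation), "of", len(participation), "languages with participation")
print(total_runs, "runs in total")
```

Read this way, the table gives four single-group languages, nine of eleven target languages with participation, and 51 runs, which agrees with the figures quoted elsewhere in the slides.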

14

Results: Best and Average scores

[Chart: scores per year, 2003 to 2008; vertical axis 0.0 to 80.0. Series: Best Bilingual, Average Bilingual, Best Monolingual, Average Monolingual. Data labels shown on the chart: 54.0, 63.5, 29.0, 23.7, 29.4, 27.9, 22.8, 23.6, 35.0, 41.8, 19.0, 49.5, 39.5, 35.0, 25.0, 10.9, 13.2, 18.5, 17.0, 14.7, 69.0, 64.5, 41.5, 45.5.]

15

Best scores by language

[Chart: best score per target language; vertical axis 0 to 80. Languages: German, English, Spanish, French, Italian, Dutch, Portuguese, Romanian. Series: Best 2004, Best 2005, Best 2006, Best 2007, Best 2008.]

16

Best scores by participant

[Chart: best score per participant; vertical axis 0 to 80. Participants: DFKI, HAGEN, INAOE, GRONINGEN, PRIBERAM, SYNAPSE. Series: 2004, 2005, 2006, 2007, 2008. Data labels shown on the chart: 37, 23, 22, 25.5, 63.5, 56.5.]

17

Results depend on type of questions

Definitions: almost solved for several systems (80%-95%)

Factoids: 50%-65% for several systems

Temporal restrictions: same level of difficulty as factoids for some systems

Closed lists: still very difficult

Linked questions: still very difficult

Now Wikipedia provides more answers

18

Conclusion

Same task as 2007

Same level of participation (slightly better)
  11 target languages (9 with participation)
  43 activated subtasks
  21 participants
  51 runs

Same results (slightly better)

19

Future direction

Fewer participants per language
  Poor comparison
  Change methodology: one task for all

Criticisms of QA over Wikipedia
  Easier to find questions with IR
  No user model
  Change collection

QA proposal for 2009
  SC and breakout