PASCAL CHALLENGE ON EVALUATING MACHINE LEARNING FOR INFORMATION EXTRACTION


Page 1: PASCAL Challenge on Evaluating Machine Learning for Information Extraction

Designing Knowledge Management using Adaptive Information Extraction from Text

PASCAL Network of Excellence on Pattern Analysis, Statistical Modelling and Computational Learning

Call for participation:

Evaluating Machine Learning for Information Extraction

July 2004 - November 2004

The Dot.Kom European project and the PASCAL Network of Excellence invite you to participate in the Challenge on Evaluating Machine Learning for Information Extraction from Documents. The goal of the challenge is to assess the current state of Machine Learning (ML) algorithms for Information Extraction (IE), to identify future challenges, and to foster additional research in the field. Given a corpus of annotated documents, participants will be expected to perform a number of tasks, each examining different aspects of the learning process.

Corpus: A standardised corpus of 1100 Workshop Calls for Papers (CFPs) will be provided. 600 of these documents will be annotated with 12 tags that relate to pertinent information (names, locations, dates, etc.). Of the annotated documents, 400 will be provided to the participants as a training set; the remaining 200 will form the unseen test set used in the final evaluation. All the documents will be pre-processed to include tokenisation, part-of-speech and named-entity information.

Tasks

Full scenario: The only mandatory task for participants is learning to annotate implicit information: given the 400 training documents, learn the textual patterns necessary to extract the annotated information. Each participant provides the results of a four-fold cross-validation experiment using the same document partitions for pre-competitive tests. A final test will be performed on the 200 unseen documents.

Active learning: Learning to select documents: the 400 training documents will be divided into fixed subsets of increasing size (e.g. 10, 20, 30, 50, 75, 100, 150 and 200). Using these subsets for training will show the effect of limited resources on the learning process. Secondly, given each subset, participants can select the documents to add to reach the next size (i.e. 10 to 20, 20 to 30, etc.), thus showing the ability to select the most suitable set of documents to annotate.

Enriched scenario: the same procedure as task 1, except that participants will be able to use the unannotated part of the corpus (500 documents). This will show how the use of unsupervised or semi-supervised methods can improve the results of supervised approaches. An interesting variant of this task could concern the use of unlimited resources, e.g. the Web.

Participation: Participants from different fields, such as machine learning, text mining and natural language processing, are welcome. Participation in the challenge is free. After registration, participants will receive the corpus of documents to train on and precise instructions on the tasks to be performed. At an established date, participants will be required to submit their systems' answers via a Web portal. An automatic scorer will compute the accuracy of extraction. Each participant will produce a paper describing the system and the results obtained. Results of the challenge will be discussed in a dedicated workshop.

Timetable
• 5th July 2004: formal definition of the tasks, annotated corpus and evaluation server
• 15th October 2004: formal evaluation
• November 2004: presentation of the evaluation at the PASCAL workshop

Organizers: Fabio Ciravegna, University of Sheffield, UK (coordinator); Mary Elaine Califf, Illinois State University, USA

Neil Ireson

Local Challenge Coordinator

Web Intelligence Group, Department of Computer Science, University of Sheffield, UK

Page 2: Organisers

• Sheffield – Fabio Ciravegna

• UCD Dublin – Nicholas Kushmerick

• ITC-IRST – Alberto Lavelli

• Illinois State University – Mary Elaine Califf

• FairIsaac – Dayne Freitag

Page 3: Outline

• Challenge Goals

• Data

• Tasks

• Participants

• Experimental Results

• Conclusions

Page 4: Challenge Goals

Goal: Provide a testbed for the comparative evaluation of ML-based IE

• Standardisation
  – Data
    • Partitioning
    • Same set of features
      – Corpus preprocessed using GATE
      – No features allowed other than the ones provided
  – Explicit tasks
  – Evaluation metrics
• For future use
  – Available for further tests with the same or new systems
  – Possible to publish new corpora or tasks

Page 5: Data (Workshop CFP)

[Timeline figure spanning 1993–2005: Training Data = 400 Workshop CFPs; Testing Data = 200 Workshop CFPs]

Page 6: Data (Workshop CFP)

[Same timeline, with the 400 training documents partitioned into four folds: Set0, Set1, Set2 and Set3]

Page 7: Data (Workshop CFP)

[Same figure, with each fold Set0–Set3 further divided into ten numbered subsets (0–9)]
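The fixed partitioning shown in these diagrams can be sketched in a few lines of code. The following Python is illustrative only: the challenge distributed the official partitions with the corpus, and the document-naming scheme and function names here are assumptions.

```python
# Illustrative sketch, not the official challenge code: build the fixed
# folds and the nested learning-curve subsets shown in the diagrams.

def make_folds(doc_ids, n_folds=4):
    """Split the 400 training documents into Set0..Set3 (100 each)."""
    fold_size = len(doc_ids) // n_folds
    return [doc_ids[i * fold_size:(i + 1) * fold_size] for i in range(n_folds)]

def cumulative_subsets(train_ids, sizes=(10, 20, 30, 50, 75, 100, 150, 200)):
    """Fixed subsets of increasing size for the learning-curve task.
    Each subset is a prefix of the same ordering, so smaller subsets
    are contained in larger ones."""
    return {n: list(train_ids[:n]) for n in sizes}

doc_ids = [f"cfp_{i:03d}" for i in range(400)]   # hypothetical naming scheme
folds = make_folds(doc_ids)
subsets = cumulative_subsets(doc_ids)
print([len(f) for f in folds], sorted(subsets))  # [100, 100, 100, 100] [10, 20, ...]
```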

Page 8: Data (Workshop CFP)

[Same timeline, adding the unannotated data: Unannotated Data 1 = 250 Workshop CFPs; Unannotated Data 2 = 250 Conference CFPs; plus the Web (WWW)]


Page 10: Annotation Slots

Slot                               Training corpus    Test corpus
workshop name                        543   11.8%       245   10.8%
workshop acronym                     566   12.3%       243   10.7%
workshop homepage                    367    8.0%       215    9.5%
workshop location                    457   10.0%       224    9.9%
workshop date                        586   12.8%       326   14.3%
paper submission date                590   12.9%       316   13.9%
notification of acceptance date      391    8.5%       190    8.4%
camera-ready copy date               355    7.7%       163    7.2%
conference name                      204    4.5%        90    4.0%
conference acronym                   420    9.2%       187    8.2%
conference homepage                  104    2.3%        75    3.3%
Total                               4583  100.0%      2274  100.0%

Page 11: Preprocessing

• GATE
  – Tokenisation
  – Part-of-speech
  – Named entities: Date, Location, Person, Number, Money
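To make the preprocessed format concrete, a one-record-per-token representation might look like the sketch below. This structure is hypothetical, chosen for illustration; the challenge corpus used GATE's own annotation output, not this layout.

```python
from dataclasses import dataclass

@dataclass
class Token:
    text: str   # surface form produced by tokenisation
    pos: str    # part-of-speech tag
    ne: str     # named-entity label: Date, Location, Person, Number, Money, or O

# A hand-made fragment of a preprocessed CFP, for illustration only.
tokens = [
    Token("Deadline", "NN", "O"),
    Token(":", ":", "O"),
    Token("15", "CD", "Date"),
    Token("October", "NNP", "Date"),
    Token("2004", "CD", "Date"),
]
print([t.text for t in tokens if t.ne == "Date"])   # ['15', 'October', '2004']
```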

Page 12: Evaluation Tasks

• Task 1 – ML for IE: annotating implicit information
  – 4-fold cross-validation on the 400 training documents
  – Final test on the 200 unseen test documents
• Task 2a – Learning curve
  – Effect of increasing amounts of training data on learning
• Task 2b – Active learning: learning to select documents
  – Given seed documents, select the documents to add to the training set
• Task 3a – Semi-supervised learning: given data
  – Same as Task 1, but the 500 unannotated documents may be used
• Task 3b – Semi-supervised learning: any data
  – Same as Task 1, but all available unannotated data may be used
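As a rough sketch of the Task 1 protocol, each of the four fixed folds is held out once while the other three are used for training. The callables below (train_fn, annotate_fn, score_fn) are placeholders for a participant's own system, not part of any challenge toolkit.

```python
def cross_validate(folds, train_fn, annotate_fn, score_fn):
    """Sketch of the Task 1 protocol: 4-fold cross-validation over the
    fixed partitions Set0..Set3, holding out each fold exactly once."""
    scores = []
    for i, held_out in enumerate(folds):
        train_docs = [d for j, fold in enumerate(folds) if j != i for d in fold]
        model = train_fn(train_docs)              # learn extraction patterns
        predictions = annotate_fn(model, held_out)
        scores.append(score_fn(predictions, held_out))
    return sum(scores) / len(scores)              # mean score over the folds

# Toy demonstration with stand-in functions:
folds = [["d1"], ["d2"], ["d3"], ["d4"]]
mean = cross_validate(folds,
                      train_fn=lambda docs: None,
                      annotate_fn=lambda model, docs: docs,
                      score_fn=lambda pred, gold: 1.0 if pred == gold else 0.0)
print(mean)  # 1.0
```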

Page 13: Evaluation

• Precision / Recall / F1 measure

• MUC Scorer

• Automatic Evaluation Server

• Exact matching

• Extract every slot occurrence
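Under exact matching, a predicted filler counts only if both its slot type and its span coincide with a gold annotation, and every occurrence of a slot must be extracted. A minimal scoring sketch follows (illustrative only; the official evaluation used the MUC scorer running on an automatic evaluation server):

```python
def precision_recall_f1(predicted, gold):
    """Exact-match scoring: each item is a (doc_id, slot, start, end) tuple,
    so a prediction is correct only if slot type and span both match."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)                    # exact matches only
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical example: the second prediction misses the span by two characters.
gold = {("doc1", "workshopdate", 120, 135), ("doc1", "workshopname", 0, 42)}
pred = {("doc1", "workshopdate", 120, 135), ("doc1", "workshopname", 0, 40)}
print(precision_recall_f1(pred, gold))   # (0.5, 0.5, 0.5)
```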

Page 14: Participants

Number of system variants submitted per task (a dash marks tasks with no submission):

Participant                ML             4-fold X-validation     Test corpus
                                          1   2a  2b  3a  3b      1   2a  2b  3a  3b
Amilcare (Sheffield, UK)   LP2            2   2   1   -   -       1   1   1   -   1
Bechet (Avignon, France)   HMM            2   1   -   -   -       2   2   -   -   -
Canisius (Netherlands)     SVM, IBL       1   -   -   -   -       1   -   -   -   -
Finn (Dublin, Ireland)     SVM            1   -   -   -   -       1   -   -   -   -
Hachey (Edinburgh, UK)     MaxEnt, HMM    -   -   -   -   -       -   1   1   -   -
ITC-IRST (Italy)           SVM            3   -   -   -   -       3   1   -   -   -
Kerloch (France)           HMM            2   2   -   -   -       3   2   -   -   -
Sigletos (Greece)          LP2, BWI, ?    1   -   -   -   -       3   -   -   -   -
Stanford (USA)             CRF            -   -   -   -   -       1   -   -   1   -
TRex (Sheffield, UK)       SVM            -   -   -   -   -       2   -   -   -   -
Yaoyong (Sheffield, UK)    SVM            3   3   3   -   -       3   3   3   -   -
Total                                     15  8   4   0   0       20  10  5   1   1

Page 15: Task 1

Information Extraction with all the available data

Page 16: Task 1: Test Corpus

[Scatter plot of recall against precision (both axes 0–1) on the test corpus for Amilcare, Stanford, Yaoyong, ITC-IRST, Sigletos, Canisius, TRex, Bechet, Finn and Kerloch]

Page 17: Task 1: Test Corpus

[The same recall/precision plot, zoomed to the 0.2–0.9 range]

Page 18: Task 1: Test Corpus

[Recall/precision plot (0.2–0.9 range) restricted to Yaoyong, ITC-IRST, Canisius, TRex and Finn]

Page 19: Task 1: 4-Fold Cross-validation

[Recall/precision plot (0.2–0.9 range) for the cross-validation runs of Amilcare, Yaoyong, ITC-IRST, Sigletos, Canisius, Bechet, Finn and Kerloch]

Page 20: Task 1: 4-Fold & Test Corpus

[Recall/precision plot (0.2–0.9 range) comparing 4-fold cross-validation and test-corpus results for Amilcare, Yaoyong, ITC-IRST, Sigletos, Canisius, Bechet, Finn and Kerloch]

Page 21: Task 1: Slot F-Measure

[Bar chart of F-measure per slot (0–1), showing the mean and maximum across systems]

Page 22: Best Slot F-Measures, Task 1: Test Corpus

Slot                               Amilcare1  Yaoyong1  Stanford1  Yaoyong2  ITC-IRST2
workshop name                        0.352      0.58      0.596      0.542     0.66
workshop acronym                     0.865      0.612     0.496      0.6       0.383
workshop date                        0.694      0.731     0.752      0.69      0.589
workshop homepage                    0.721      0.748     0.671      0.705     0.516
workshop location                    0.488      0.641     0.647      0.66      0.542
paper submission date                0.864      0.74      0.712      0.696     0.712
notification of acceptance date      0.889      0.843     0.819      0.856     0.853
camera-ready copy date               0.87       0.75      0.784      0.747     0.783
conference name                      0.551      0.503     0.493      0.477     0.481
conference acronym                   0.905      0.445     0.491      0.387     0.348
conference homepage                  0.393      0.149     0.151      0.116     0.119

Page 23: Task 2a: Learning Curve

Page 24: Task 2a: Learning Curve F-Measure

[F-measure (y-axis, 0.2–0.8) against fraction of the training data (x-axis, 0.1–0.9) for Amilcare, Yaoyong1–3, ITC-IRST1, Bechet1–2, Kerloch2–3, Hachey and the mean]

Page 25: Task 2a: Learning Curve Precision

[Precision (y-axis, 0–0.9) against fraction of the training data for the same systems]

Page 26: Task 2a: Learning Curve Recall

[Recall (y-axis, 0–0.7) against fraction of the training data for the same systems]

Page 27: Task 2b: Active Learning

Page 28: Task 2b: Active Learning

• Amilcare – maximum divergence from the expected number of tags.
• Hachey – maximum divergence between two classifiers built on different feature sets.
• Yaoyong (Gram-Schmidt) – maximum divergence between the examples in the selected subset.
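All three strategies rank the unlabelled documents by some divergence score and annotate the highest-scoring ones first. A generic sketch of that selection step follows (illustrative only; each system's actual scoring criterion is the one described in its bullet above):

```python
def select_documents(unlabelled, score_fn, batch_size=10):
    """Generic active-learning step: rank unlabelled documents by a
    divergence score and pick the batch expected to be most informative.
    score_fn is the system-specific criterion, e.g. disagreement between
    two classifiers trained on different feature sets (Hachey)."""
    ranked = sorted(unlabelled, key=score_fn, reverse=True)
    return ranked[:batch_size]

# Toy usage with a made-up per-document uncertainty score:
docs = ["d1", "d2", "d3", "d4"]
uncertainty = {"d1": 0.2, "d2": 0.9, "d3": 0.5, "d4": 0.7}
print(select_documents(docs, uncertainty.get, batch_size=2))  # ['d2', 'd4']
```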

Page 29: Task 2b: Active Learning: Increased F-Measure over Random Selection

[Increase in F-measure over random document selection (y-axis, -0.03 to 0.05) against fraction of the training data (x-axis, 0.1–0.9) for Amilcare, Yaoyong1–3 and Hachey]

Page 30: Task 3: Semi-supervised Learning

(no significant participation)

Page 31: Conclusions (Task 1)

• The top three (four) systems use different algorithms
  – rule induction, SVM, CRF and HMM
• The same algorithm (SVM) produced different results across systems
• Brittle performance
• Large variation in per-slot performance
• Post-processing

Page 32: Conclusions (Task 2 & Task 3)

• Task 2a: Learning curve
  – Systems' performance is largely as expected
• Task 2b: Active learning
  – Two approaches, Amilcare and Hachey, showed benefits
• Task 3: Semi-supervised learning
  – Not sufficient participation to evaluate the use of enriched data

Page 33: Future Work

• Performance differences
  – Systems: what determines good/bad performance?
  – Slots: different systems were better/worse at identifying different slots
• Combine approaches
• Active learning
• Semi-supervised learning
  – overcoming the need for annotated data
• Extensions
  – Data: use different data sets and other features, including (HTML) structured data
  – Tasks: relation extraction

Page 34: Thank You

http://tyne.shef.ac.uk/Pascal