My research taster project


Page 1: My research taster project

Michele Filannino

CS-GN-TEAM: internal presentation

Manchester, 15/02/2012

research taster project

temporal expressions extraction

Page 2: My research taster project

15/02/2012, Michele Filannino

presentation my research taster project

cdt?

■ 4-year PhD course

■ funded by EPSRC

■ industrial partners

■ multi-disciplinary

■ new model for all PhD training within the UK

Page 3: My research taster project

cdt?

■ 6 months of foundation period

● 3 postgraduate courses

▶ Machine Learning and Data Mining; Modelling and visualisation of high-dimensional data; Semi-structured data and the web

● 3 scientific methods courses

● 1 short taster project [6 weeks]

● creativity workshops

■ 3.5 years of PhD research

Page 4: My research taster project

where we are

■ Computer science

● natural language processing

▶ information retrieval

★ information extraction

✦ temporal expressions extraction

Page 5: My research taster project

or...

■ Computer science

● data mining

▶ text mining

★ information extraction

✦ temporal expressions extraction

Page 6: My research taster project

[1] L. Ferro, I. Mani, B. Sundheim, and G. Wilson, “TIDES temporal annotation guidelines, v. 1.0.2,” MITRE, 2001

[2] timex: temporal expression

temporal expression

■ natural language phrase that denotes a temporal entity: an interval or an instant [1]

● fully-qualified: no reference to any other temporal entity

▶ March 15, 2001

● deictic: reference to the time of utterance

▶ today, yesterday, three weeks ago, last Thursday

● anaphoric: reference to a timex [2] previously evoked in the text

▶ March 15, the next week, Saturday, at that time
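
A deictic expression only becomes a date once the time of utterance is known. The sketch below illustrates this for the examples on this slide. It is a minimal illustration, not the project's method: the function name `resolve_deictic`, the small number-word table, and the reading of "last Thursday" as the most recent Thursday strictly before the utterance date are all assumptions.

```python
import re
from datetime import date, timedelta

# Hypothetical lookup tables, kept tiny on purpose.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def resolve_deictic(expression, utterance_date):
    """Resolve a few deictic expressions against the utterance date.

    Returns a datetime.date, or None when no rule applies.
    """
    text = expression.lower().strip()
    if text == "today":
        return utterance_date
    if text == "yesterday":
        return utterance_date - timedelta(days=1)

    # "<number word> day(s)/week(s) ago"
    match = re.fullmatch(r"(\w+) (day|week)s? ago", text)
    if match and match.group(1) in NUMBER_WORDS:
        count = NUMBER_WORDS[match.group(1)]
        days = count * (7 if match.group(2) == "week" else 1)
        return utterance_date - timedelta(days=days)

    # "last <weekday>": assumed to mean the most recent such day
    # strictly before the utterance date.
    match = re.fullmatch(r"last (\w+)", text)
    if match and match.group(1) in WEEKDAYS:
        target = WEEKDAYS.index(match.group(1))
        delta = (utterance_date.weekday() - target) % 7 or 7
        return utterance_date - timedelta(days=delta)

    return None


utterance = date(2012, 2, 15)  # the date of this presentation
for phrase in ["today", "yesterday", "three weeks ago", "last Thursday"]:
    print(phrase, "->", resolve_deictic(phrase, utterance))
```

Anaphoric expressions such as "the next week" cannot be handled this way, because their anchor is another timex in the text rather than the utterance time.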

Page 7: My research taster project

why?

■ user’s perspective

● temporal aspects of events and entities provide a natural mechanism for organising information

■ machine’s perspective

● improvements in

▶ question answering, summarisation, browsing

Page 8: My research taster project

how?

■ annotation

● recognition

▶ automatically detect and delimit expressions

▶ mostly machine-learning techniques

● normalisation

▶ assign attribute values to all the recognised expressions

▶ using a shared and formal format (standard?)

▶ mostly rule-based techniques

■ reasoning or searching
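
When recognition is treated as a machine-learning problem, it is commonly cast as labelling each token of a sentence. The sketch below shows one such encoding (BIO: Begin, Inside, Outside), which turns annotated character spans into per-token labels a sequence model could be trained on. The slides do not name an encoding, so BIO, the simple tokeniser, and the function name `bio_labels` are assumptions made for illustration.

```python
import re


def bio_labels(text, spans):
    """Label every token as B-TIMEX, I-TIMEX or O.

    `spans` is a list of (start, end) character offsets of temporal
    expressions in `text`. Tokens are runs of word characters or single
    punctuation marks (a deliberately naive tokeniser).
    """
    labelled = []
    for token in re.finditer(r"\w+|[^\w\s]", text):
        label = "O"
        for start, end in spans:
            if token.start() >= start and token.end() <= end:
                # The first token of a span opens it, the rest continue it.
                label = "B-TIMEX" if token.start() == start else "I-TIMEX"
                break
        labelled.append((token.group(), label))
    return labelled


sentence = "Unisys must pay interest every quarter."
begin = sentence.index("every quarter")
for token, label in bio_labels(sentence, [(begin, begin + len("every quarter"))]):
    print(token, label)
```

Normalisation would then run only on the tokens labelled B-TIMEX or I-TIMEX, which is where rule-based techniques usually take over.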

Page 9: My research taster project

[1] J. Poveda, M. Surdeanu, and J. Turmo, “An analysis of Bootstrapping for the Recognition of Temporal Expressions”, 2009

timex forms [1]

■ time or date references

● 11pm, February 14th, 2005

■ time references that anchor on another time

● one hour after midnight, two weeks before Christmas

■ durations

● few months, two days, five years

■ recurring times

● every third month, twice in the hour
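
Of these forms, durations are the easiest to normalise with rules. The sketch below maps simple duration phrases to ISO 8601 duration strings such as `P2D`. It is a minimal illustration under stated assumptions: the function name `normalise_duration` and the lookup tables are hypothetical, only day, week, month and year units are covered, and vague quantities such as "few" are left unresolved rather than guessed.

```python
import re

# Hypothetical lookup tables for the illustration.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}
UNIT_CODES = {"day": "D", "week": "W", "month": "M", "year": "Y"}


def normalise_duration(phrase):
    """Return an ISO 8601 duration for phrases like 'two days', else None."""
    match = re.fullmatch(r"(\w+) (day|week|month|year)s?",
                         phrase.lower().strip())
    if not match:
        return None
    quantity, unit = match.groups()
    if quantity.isdigit():
        count = int(quantity)
    elif quantity in NUMBER_WORDS:
        count = NUMBER_WORDS[quantity]
    else:
        return None  # vague quantity, e.g. "few": no exact value to assign
    return f"P{count}{UNIT_CODES[unit]}"


for phrase in ["two days", "five years", "few months"]:
    print(phrase, "->", normalise_duration(phrase))
```

Recurring times and anchored references need more than a lookup table, since they depend on a quantifier or on another temporal entity.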

Page 10: My research taster project

[1] J. Poveda, M. Surdeanu, and J. Turmo, “An analysis of Bootstrapping for the Recognition of Temporal Expressions”, 2009

timex forms [1]

■ context-dependent times

● today, last year

■ vague references

● somewhere in the middle of June, the near future

■ times indicated by an event

● the day S. Berlusconi resigned

▶ “event” is used as a cover term for situations that happen or occur

Page 11: My research taster project

timeline

[Figure: timeline from 2000 to 2013; the position of each item along the axis is not preserved in this transcript]

■ items shown

● TimeML (standard)

● ACE-2004 dev & eval (TERN2004 corpus)

● TimeBank (corpus)

● TempEval Task #15 (in SemEval07)

● TempEval-2 Task #13 (in SemEval10)

● TempEval-3 Task #1 (in SemEval13)

● Hand grammar approach (rule-based)

● Markov logic network (machine learning)

● SVM (machine learning)

● Maximum Entropy classifier (machine learning)

● Conditional Random Fields (machine learning)

■ scores shown: 85%, 87.8%, 90.7% [1]

[1] TERN2004 corpus

Page 12: My research taster project

standards

■ “the nice thing about standards is, there are so many to choose from” (Andrew S. Tanenbaum)

● TimeML

● DAML-Time

● TIDES

● ACE-TERN

Page 13: My research taster project

standards

■ there’s a tension between

● flexibility and efficiency

● usability and flexibility

● complexity and spreadability

● flexibility and agreement

Page 14: My research taster project

about the spreadability

Page 15: My research taster project

about the agreement

TimeML tag    Agreement
TIMEX3        0.83
SIGNAL        0.77
EVENT         0.78
ALINK         0.81
SLINK         0.85
TLINK         0.55

Source: http://timeml.org/site/timebank/documentation-1.2.html

Page 16: My research taster project

Source: TRIOS TimeBank v.0.1

example: raw text

That means Unisys must pay about $100 million in interest every quarter, on top of $27 million in dividends on preferred stock.

Page 17: My research taster project

Source: TRIOS TimeBank v.0.1

example: recognition

That means Unisys must <ev>pay</ev> about $100 million in interest <te>every quarter</te>, on top of $27 million in dividends on preferred stock.
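
The output of recognition can also be held as stand-off annotations: the raw text plus character offsets for each tagged span. The sketch below converts the inline `<ev>` and `<te>` tags of this example into that form. The function name `extract_annotations` and the assumption that tags are never nested are illustrative choices, not something the slides specify.

```python
import re

# Matches <ev>...</ev> and <te>...</te>; nesting is assumed not to occur.
TAG_PATTERN = re.compile(r"<(ev|te)>(.*?)</\1>")


def extract_annotations(annotated):
    """Return (plain_text, annotations) from inline-tagged text.

    Each annotation is (tag, start, end, surface), with offsets that
    index into the returned plain text.
    """
    plain_parts = []
    annotations = []
    cursor = 0        # position in the annotated string
    plain_length = 0  # length of the plain text built so far
    for match in TAG_PATTERN.finditer(annotated):
        before = annotated[cursor:match.start()]
        plain_parts.append(before)
        plain_length += len(before)
        surface = match.group(2)
        annotations.append(
            (match.group(1), plain_length, plain_length + len(surface), surface)
        )
        plain_parts.append(surface)
        plain_length += len(surface)
        cursor = match.end()
    plain_parts.append(annotated[cursor:])
    return "".join(plain_parts), annotations


ANNOTATED = (
    "That means Unisys must <ev>pay</ev> about $100 million in interest "
    "<te>every quarter</te>, on top of $27 million in dividends on "
    "preferred stock."
)
plain, found = extract_annotations(ANNOTATED)
for tag, start, end, surface in found:
    print(tag, start, end, repr(plain[start:end]))
```

Keeping offsets against the untagged text means the raw sentence from the previous slide stays unchanged while annotations are added or corrected.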

Page 18: My research taster project

Source: TRIOS TimeBank v.0.1

example: normalisation

That means Unisys must <EVENT eid="e110" mainevent="YES" class="OCCURRENCE" stem="pay" tense="NONE" aspect="NONE" polarity="POS" pos="VERB">pay</EVENT> about $100 million in interest <TIMEX3 tid="t256" type="SET" value="P1Q" temporalFunction="false" functionInDocument="NONE" quant="every">every quarter</TIMEX3>, on top of $27 million in dividends on preferred stock.

<TLINK lid="l32" relType="BEFORE" relatedToEvent="e110" eventID="e107"/>

<TLINK lid="l26" relType="OVERLAP" eventID="e110" relatedToTime="t256"/>
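
Because the normalised annotation is XML, its attributes can be read with a standard parser. The sketch below uses Python's `xml.etree.ElementTree` to pull out the TIMEX3 attributes and the event-to-time link from this example. Wrapping the fragment in a `<doc>` root element and the function name `parse_timeml` are assumptions made so the fragment parses on its own; a real TimeBank file has its own document structure.

```python
import xml.etree.ElementTree as ET

FRAGMENT = (
    'That means Unisys must <EVENT eid="e110" mainevent="YES" '
    'class="OCCURRENCE" stem="pay" tense="NONE" aspect="NONE" '
    'polarity="POS" pos="VERB">pay</EVENT> about $100 million in interest '
    '<TIMEX3 tid="t256" type="SET" value="P1Q" temporalFunction="false" '
    'functionInDocument="NONE" quant="every">every quarter</TIMEX3>, on top '
    'of $27 million in dividends on preferred stock.'
    '<TLINK lid="l26" relType="OVERLAP" eventID="e110" relatedToTime="t256"/>'
)


def parse_timeml(fragment):
    """Return (timexes, event_time_links) found in a TimeML fragment."""
    # A hypothetical <doc> wrapper gives the fragment a single root element.
    root = ET.fromstring(f"<doc>{fragment}</doc>")
    timexes = [{"text": el.text, **el.attrib} for el in root.iter("TIMEX3")]
    links = [
        (link.get("eventID"), link.get("relType"), link.get("relatedToTime"))
        for link in root.iter("TLINK")
        if link.get("relatedToTime") is not None
    ]
    return timexes, links


timexes, links = parse_timeml(FRAGMENT)
print(timexes[0]["text"], timexes[0]["type"], timexes[0]["value"])
print(links)
```

Here `type="SET"` with `quant="every"` marks a recurring time, and the TLINK states that event `e110` ("pay") overlaps with timex `t256`.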

Page 19: My research taster project

considerations

■ specialised linguistic approaches do not pay off

● machine learning techniques usually perform better

■ scarcity of pre-annotated corpora

● manual corpus annotation is very tricky

● partially solved with TempEval-3 (2013)

▶ 1M-word corpus automatically annotated by TRIOS

■ vibrant area in the biomedical domain

Page 20: My research taster project

[Figure: bar chart of Google Scholar results per year, 2000 to 2012, vertical axis from 0 to 500]

Year    “temporal expressions”    “temporal expressions” AND “clinical”
2000    182                       10
2001    180                       12
2002    220                       16
2003    230                       15
2004    280                       15
2005    310                       22
2006    410                       42
2007    370                       36
2008    410                       41
2009    412                       45
2010    433                       44
2011    382                       46
2012    33                        9

Source: Google Scholar (last update 09/02/2012)

Page 21: My research taster project

[Figure: stacked bar chart of the share of Google Scholar results per year, 2000 to 2012, vertical axis from 0% to 100%]

Year    “temporal expressions”    “temporal expressions” AND “clinical”
2000    95%                       5%
2001    94%                       6%
2002    93%                       7%
2003    94%                       6%
2004    95%                       5%
2005    93%                       7%
2006    91%                       9%
2007    91%                       9%
2008    91%                       9%
2009    90%                       10%
2010    91%                       9%
2011    89%                       11%
2012    79%                       21%

Source: Google Scholar (last update 09/02/2012)
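
The percentages on this slide appear to be each query's share of the two result counts added together. The sketch below checks that reading for three of the years. The counts are read from the bar labels of the previous slide, and assigning them to years (smallest counts to the partial year 2012) is an assumption; the function name `shares` is hypothetical.

```python
# (results for "temporal expressions",
#  results for "temporal expressions" AND "clinical")
# Values read from the previous slide; the year assignment is an assumption.
COUNTS = {
    2000: (182, 10),
    2011: (382, 46),
    2012: (33, 9),
}


def shares(general_hits, clinical_hits):
    """Return both counts as rounded percentages of their sum."""
    total = general_hits + clinical_hits
    return (round(100 * general_hits / total),
            round(100 * clinical_hits / total))


for year, (general_hits, clinical_hits) in sorted(COUNTS.items()):
    general_pct, clinical_pct = shares(general_hits, clinical_hits)
    print(year, f"{general_pct}% / {clinical_pct}%")
```

Under this reading the clinical share for 2012 is 9 / (33 + 9), about 21%, which matches the chart.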

Page 22: My research taster project

considerations

■ rule-based approaches will never die

● CRFs and MLNs hybridise machine learning with rules

■ better performance means clever decomposition

● how to divide the general problem into sub-problems

Page 23: My research taster project

my to-do list

■ collect a corpus in the clinical field

■ study novel machine learning approaches

● maximum likelihood, logistic regression, CRF, MLN

■ implement a prototype

● Python or MATLAB

[Progress bar over 30 days: 12 days elapsed, 18 days remaining]

Page 24: My research taster project

Thank you.