CS-GN-TEAM: internal presentation
Manchester, 15/02/2012
research taster project: temporal expressions extraction
15/02/2012, Michele Filannino
presentation my research taster project
cdt?
■ 4-year PhD course
■ funded by EPSRC
■ industrial partners
■ multi-disciplinary
■ new model for all PhD training within the UK
cdt?
■ 6 months of foundation period
● 3 postgraduate courses
▶ Machine Learning and Data Mining; Modelling and visualisation of high-dimensional data; Semi-structured data and the web
● 3 scientific methods courses
● 1 short taster project [6 weeks]
● creativity workshops
■ 3.5 years of PhD research
where we are
■ Computer science
● natural language processing
▶ information retrieval
★ information extraction
✦ temporal expressions extraction
or...
■ Computer science
● data mining
▶ text mining
★ information extraction
✦ temporal expressions extraction
temporal expression
■ natural language phrase that denotes a temporal entity: an interval or an instant [1]
● fully-qualified: no reference to any other temporal entity
▶ March 15, 2001
● deictic: reference to the time of utterance
▶ today, yesterday, three weeks ago, last Thursday
● anaphoric: reference to a timex [2] previously evoked in the text
▶ March 15, the next week, Saturday, at that time
[1] L. Ferro, I. Mani, B. Sundheim, and G. Wilson, “TIDES temporal annotation guidelines, v. 1.0.2,” MITRE, 2001
[2] timex: temporal expression
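The deictic examples above only get a value once the time of utterance is known. A minimal sketch of that resolution step follows, assuming a tiny hand-written vocabulary; the function name `resolve_deictic`, the lookup tables, and the reading of "last Thursday" as the most recent Thursday strictly before the utterance date are all illustrative assumptions, not part of the presentation.

```python
from datetime import date, timedelta

# Hypothetical lookup tables, just large enough for the slide's examples.
SIMPLE_OFFSETS = {"today": 0, "yesterday": -1, "tomorrow": 1}
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def resolve_deictic(expression, utterance_date):
    """Return the date denoted by a deictic expression, or None if unsupported."""
    text = expression.lower().strip()
    if text in SIMPLE_OFFSETS:
        return utterance_date + timedelta(days=SIMPLE_OFFSETS[text])

    words = text.split()
    # "<number word> days ago" / "<number word> weeks ago"
    if len(words) == 3 and words[2] == "ago" and words[0] in NUMBER_WORDS:
        amount = NUMBER_WORDS[words[0]]
        if words[1] in ("day", "days"):
            return utterance_date - timedelta(days=amount)
        if words[1] in ("week", "weeks"):
            return utterance_date - timedelta(weeks=amount)

    # "last <weekday>": assumed to mean the most recent such weekday
    # strictly before the utterance date.
    if len(words) == 2 and words[0] == "last" and words[1] in WEEKDAYS:
        target = WEEKDAYS.index(words[1])
        days_back = (utterance_date.weekday() - target) % 7 or 7
        return utterance_date - timedelta(days=days_back)

    return None


utterance = date(2012, 2, 15)  # date of this presentation, used as time of utterance
for phrase in ["today", "yesterday", "three weeks ago", "last Thursday"]:
    print(phrase, "->", resolve_deictic(phrase, utterance))
```

Anaphoric expressions would need the previously evoked timex instead of the utterance date as the anchor, which this sketch does not attempt.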
why?
■ user’s perspective
● temporal aspects of events and entities provide a
natural mechanism for organising information.
■ machine’s perspective
● improvements in
▶ question answering, summarisation, browsing
how?
■ annotation
● recognition
▶ automatically detect and delimit expressions
▶ mostly machine-learning techniques
● normalisation
▶ assign attribute values to all the recognised expressions
▶ using a shared, formal format (a standard?)
▶ mostly rule-based techniques
■ reasoning or searching
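The two annotation steps can be sketched on the simplest case, fully-qualified dates such as "March 15, 2001". The sketch below is rule-based for both steps (the slide notes that recognition is mostly done with machine learning in practice); the pattern, the function names and the output dictionary are assumptions made for illustration, and calendar validity of the day is not checked.

```python
import re

MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}

# Hypothetical pattern covering only the "Month D, YYYY" form.
DATE_PATTERN = re.compile(
    r"\b(" + "|".join(MONTHS) + r") (\d{1,2}), (\d{4})\b"
)


def recognise(text):
    """Recognition: detect and delimit expressions as (start, end, surface)."""
    return [(m.start(), m.end(), m.group(0)) for m in DATE_PATTERN.finditer(text)]


def normalise(surface):
    """Normalisation: map a recognised expression to an ISO 8601 date string."""
    match = DATE_PATTERN.fullmatch(surface)
    if match is None:
        return None
    month, day, year = match.groups()
    return f"{int(year):04d}-{MONTHS[month]:02d}-{int(day):02d}"


def annotate(text):
    """Run recognition, then attach a normalised value to each expression."""
    return [
        {"span": (start, end), "text": surface, "type": "DATE",
         "value": normalise(surface)}
        for start, end, surface in recognise(text)
    ]


print(annotate("The guidelines were discussed on March 15, 2001 in Manchester."))
```

Keeping `recognise` and `normalise` separate mirrors the split on the slide, so either step could be swapped for a learned model without touching the other.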
timex forms [1]
[1] J. Poveda, M. Surdeanu, and J. Turmo, “An Analysis of Bootstrapping for the Recognition of Temporal Expressions”, 2009
■ time or date references
● 11pm, February 14th, 2005
■ time references that anchor on another time
● one hour after midnight, two weeks before Christmas
■ durations
● few months, two days, five years
■ recurring times
● every third month, twice in the hour
■ context-dependent times
● today, last year
■ vague references
● somewhere in the middle of June, the near future
■ times indicated by an event
● the day S. Berlusconi resigned
▶ an event is considered a cover term for situations that
happen or occur
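Of the forms listed above, durations are the easiest to normalise, because ISO 8601 already defines a compact notation for them (for example `P2D` for two days). A minimal sketch follows, assuming a small number-word table and four units; vague quantities such as "few months" are deliberately left unhandled here and return `None`. The names used are hypothetical.

```python
import re

# Hypothetical tables covering only a handful of quantities and units.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}
UNIT_CODES = {"day": "D", "week": "W", "month": "M", "year": "Y"}

DURATION_PATTERN = re.compile(r"^(\w+) (day|week|month|year)s?$")


def normalise_duration(expression):
    """Map e.g. 'two days' to an ISO 8601 duration, or None if unsupported."""
    match = DURATION_PATTERN.match(expression.lower().strip())
    if match is None:
        return None
    quantity, unit = match.groups()
    if quantity in NUMBER_WORDS:
        amount = NUMBER_WORDS[quantity]
    elif quantity.isdigit():
        amount = int(quantity)
    else:
        # Vague quantities ("few", "several") are out of scope for this sketch.
        return None
    return f"P{amount}{UNIT_CODES[unit]}"


for phrase in ["two days", "five years", "few months"]:
    print(phrase, "->", normalise_duration(phrase))
```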
timeline
[Figure: timeline from 2000 to 2013 showing the items below, with the figures 85%, 87.8% and 90.7% reported on the TERN2004 corpus]
■ TimeML (standard)
■ ACE-2004 dev & eval (TERN2004 corpus)
■ TimeBank (corpus)
■ Hand grammar approach (rule-based)
■ TempEval Task #15 (in SemEval07)
■ TempEval-2 Task #13 (in SemEval10)
■ TempEval-3 Task #1 (in SemEval13)
■ Markov logic network (machine learning)
■ SVM (machine learning)
■ Maximum Entropy Class. (machine learning)
■ Conditional Random Fields (machine learning)
standards
■ “the nice thing about standards is, there are so many to choose from” (Andrew S. Tanenbaum)
● TimeML
● DAML-Time
● TIDES
● ACE-TERN
standards
■ there’s a tension between
● flexibility and efficiency
● usability and flexibility
● complexity and spreadability
● flexibility and agreement
about the spreadability
about the agreement
TimeML tag    agreement
TIMEX3        0.83
SIGNAL        0.77
EVENT         0.78
ALINK         0.81
SLINK         0.85
TLINK         0.55
Source: http://timeml.org/site/timebank/documentation-1.2.html
Source: TRIOS TimeBank v.0.1
example: raw text
That means Unisys must pay about $100 million in interest every
quarter, on top of $27 million in dividends on preferred stock.
Source: TRIOS TimeBank v.0.1
example: recognition
That means Unisys must <ev>pay</ev> about $100 million in interest
<te>every quarter</te>, on top of $27 million in dividends on preferred
stock.
Source: TRIOS TimeBank v.0.1
example: normalisation
That means Unisys must <EVENT eid="e110" mainevent="YES"
class="OCCURRENCE" stem="pay" tense="NONE" aspect="NONE"
polarity="POS" pos="VERB">pay</EVENT> about $100 million in
interest <TIMEX3 tid="t256" type="SET" value="P1Q"
temporalFunction="false" functionInDocument="NONE"
quant="every">every quarter</TIMEX3>, on top of $27 million in
dividends on preferred stock.
<TLINK lid="l32" relType="BEFORE" relatedToEvent="e110"
eventID="e107"/>
<TLINK lid="l26" relType="OVERLAP" eventID="e110"
relatedToTime="t256"/>
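Once text is annotated this way, the temporal information can be read back with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` on the sentence above; the wrapping `<s>` root element is an assumption added only to make the fragment well-formed, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The annotated sentence from the slide, wrapped in a hypothetical <s> root
# so that the fragment parses as a single XML document.
ANNOTATED = """<s>That means Unisys must <EVENT eid="e110" mainevent="YES"
class="OCCURRENCE" stem="pay" tense="NONE" aspect="NONE"
polarity="POS" pos="VERB">pay</EVENT> about $100 million in
interest <TIMEX3 tid="t256" type="SET" value="P1Q"
temporalFunction="false" functionInDocument="NONE"
quant="every">every quarter</TIMEX3>, on top of $27 million in
dividends on preferred stock.
<TLINK lid="l32" relType="BEFORE" relatedToEvent="e110"
eventID="e107"/>
<TLINK lid="l26" relType="OVERLAP" eventID="e110"
relatedToTime="t256"/></s>"""


def extract_timexes(xml_text):
    """Return id, type, value and surface text of every TIMEX3 element."""
    root = ET.fromstring(xml_text)
    return [
        {
            "tid": timex.get("tid"),
            "type": timex.get("type"),
            "value": timex.get("value"),
            "text": "".join(timex.itertext()),
        }
        for timex in root.iter("TIMEX3")
    ]


def links_for_event(xml_text, event_id):
    """Return (lid, relType) for every TLINK whose eventID is event_id."""
    root = ET.fromstring(xml_text)
    return [
        (link.get("lid"), link.get("relType"))
        for link in root.iter("TLINK")
        if link.get("eventID") == event_id
    ]


print(extract_timexes(ANNOTATED))
print(links_for_event(ANNOTATED, "e110"))
```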
considerations
■ specialised linguistic approaches do not pay off
● machine learning techniques usually perform better
■ scarcity of pre-annotated corpora
● manual corpus annotation is very tricky
● partially solved with TempEval-3 (2013)
▶ 1M-word corpus automatically annotated by TRIOS
■ vibrant area in the biomedical domain
[Figure: bar chart of Google Scholar counts per year, 2000 to 2012, for the queries “temporal expressions” and “temporal expressions” AND “clinical”; vertical axis from 0 to 500]
Source: Google Scholar (last update 09/02/2012)
[Figure: yearly shares, 2000 to 2012, of the queries “temporal expressions” and “temporal expressions” AND “clinical”; vertical axis from 0% to 100%]
Source: Google Scholar (last update 09/02/2012)
considerations
■ rule-based approaches will never die
● CRF and MLN are hybrid machine learning approaches
■ better performance means clever decomposition
● how to divide the general problem into sub-problems
my to-do list
■ collect a corpus from the clinical field
■ study novel machine learning approaches
● maximum likelihood, logistic regression, CRF, MLN
■ implement a prototype
● Python or MATLAB
[Progress bar: 12 days elapsed, 18 days remaining]
Thank you.