Zornitsa Kozareva USC/ISI Marina del Rey, CA · Choice of classifier The attribute whose value is...


Page 1: Zornitsa Kozareva USC/ISI Marina del Rey, CA · Choice of classifier The attribute whose value is to be predicted from the values of the remaining ones. Default is the last attribute.

CS544: Introduction to Weka

Zornitsa Kozareva
USC/ISI
Marina del Rey, CA
[email protected]
www.isi.edu/~kozareva

January 29, 201[?]

Page 2

WEKA: Waikato Environment for Knowledge Analysis

Page 3

Weka: Data Mining Software
•  Collection of machine learning algorithms
–  open-source package written in Java
•  Used for research, education and application
•  Main features:
–  data pre-processing tools
–  learning algorithms
–  evaluation methods
–  graphical interface
–  environment for comparing learning algorithms

Page 4

Weka: Data Mining Software
•  Classification algorithms:
–  decision trees, kNN, SVM, Naive Bayes
•  Prediction algorithms:
–  regression (linear/SVM), perceptron
•  Meta-algorithms:
–  bagging, boosting (AdaBoost)
among others

Page 5

Getting Started
•  Install Weka software (on Linux):
–  Download link:
   •  http://prdownloads.sourceforge.net/weka/weka-3-6-2.zip
   •  Unzip the software
–  Requirement: Java 1.5 (or higher)
–  Invoke Weka command:
   •  java -cp weka.jar <weka-command>

Page 6

java -Xmx1000M -jar weka.jar

Weka GUI Chooser

Page 7

@relation named_entity

@attribute position numeric
@attribute pos_tag {NN, NP, VB, DT}
@attribute word_length numeric
@attribute in_gazetteer {no, yes}
@attribute class {B-PER, I-PER, B-ORG, I-ORG, B-LOC, I-LOC, O}

@data
3,DT,3,no,B-ORG
4,NP,10,yes,I-ORG
15,NP,6,yes,O
7,NN,12,?,B-PER
...

Data File Format (.arff)

Other attribute types:

•  String

•  Date

Missing value: ?
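The file format above can be produced from feature rows with a few lines of code. The following is a minimal sketch, not part of Weka: the function name `to_arff` and its argument layout are assumptions made for illustration, the class list includes the ORG tags so that every example row is valid, and values containing spaces or commas (which ARFF requires to be quoted) are not handled.

```python
def to_arff(relation, attributes, rows):
    """Render feature rows as ARFF text (hypothetical helper).

    attributes: list of (name, kind) pairs, where kind is either the
    string "numeric" or a list of nominal values.
    A value of None in a row is written as "?" (missing value).
    """
    lines = ["@relation " + relation, ""]
    for name, kind in attributes:
        if isinstance(kind, str):
            lines.append("@attribute %s %s" % (name, kind))
        else:
            lines.append("@attribute %s {%s}" % (name, ", ".join(kind)))
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join("?" if v is None else str(v) for v in row))
    return "\n".join(lines) + "\n"


attributes = [
    ("position", "numeric"),
    ("pos_tag", ["NN", "NP", "VB", "DT"]),
    ("word_length", "numeric"),
    ("in_gazetteer", ["no", "yes"]),
    ("class", ["B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC", "O"]),
]
rows = [
    (3, "DT", 3, "no", "B-ORG"),
    (7, "NN", 12, None, "B-PER"),  # None marks a missing value
]
arff_text = to_arff("named_entity", attributes, rows)
print(arff_text)
```

The class attribute is listed last, which matches the default Weka uses when picking the attribute to predict.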

Page 8

List of attributes (last: class variable)

Frequency and categories for the selected attribute

Statistics about the values of the selected attribute

Classification

Filter selection

Manual attribute selection

Statistical attribute selection

Preprocessing

The Preprocessing Tab

Slide adapted from Marti Hearst

Page 9

Choice of classifier

The attribute whose value is to be predicted from the values of the remaining ones.

Default is the last attribute.

Cross-validation: split the data into e.g. 10 folds, then 10 times train on 9 folds and test on the remaining one
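The cross-validation procedure can be sketched in a few lines. This is an illustration of the idea only, with a hypothetical function name; it assigns instances to folds round-robin, whereas Weka's own implementation may shuffle and stratify the data first.

```python
def k_fold_splits(n_instances, k=10):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    indices = list(range(n_instances))
    # Round-robin assignment: instance i goes to fold i % k.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_splits(25, k=10))
for train, test in splits:
    print(len(train), "training instances,", len(test), "test instances")
```

Every instance ends up in the test set exactly once, so the reported performance is computed over the whole data set.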

The Classification Tab

Slide adapted from Marti Hearst

Page 10

Choosing a classifier

Slide adapted from Marti Hearst

Page 11

Slide adapted from Marti Hearst

Page 12

Slide adapted from Marti Hearst

Page 13

all other numbers can be obtained from it

different/easy class

accuracy

Slide adapted from Marti Hearst

Page 14

Running on Test Set

Slide adapted from Marti Hearst

Page 15

WEKA Command Line

Page 16

Weka Specifications
•  Train classifier on training data and output model
   •  java -cp weka.jar <classifier-function> -t <train-file> -d <trained-model>
•  Run trained classifier model on test data
   •  java -cp weka.jar <classifier-function> -T <test-file> -l <trained-model>
•  Specifying parameters:
–  General parameters
   -t : training file (.arff)
   -T : test file (.arff)
   -d : output filename (trained classifier model)
   -l : input model (for testing)
   -h : help (check out other parameter options, etc.)
–  Classifier-specific parameters
   -K : number of nearest neighbors for kNN algorithm

Page 17

Example: kNN in Weka
•  Train a classifier using 2NN algorithm
   •  java -cp weka.jar weka.classifiers.lazy.IBk -t data/weather.arff -K 2 -d model.2nn
–  weka.classifiers.lazy.IBk : classifier-function in weka
–  -t data/weather.arff : training file
–  -K 2 : algorithm parameter
–  -d model.2nn : output model name
•  Run the trained classifier on test data
   •  java -cp weka.jar weka.classifiers.lazy.IBk -T data/weather.arff -l model.2nn
–  weka.classifiers.lazy.IBk : classifier-function in weka
–  -T data/weather.arff : test file
–  -l model.2nn : input model name
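When many train/test runs are needed, the two command lines can be assembled from a script. The sketch below only builds the argument lists; the helper names are hypothetical, and actually executing the commands (for example with `subprocess.run`) assumes Java and `weka.jar` are available in the working directory.

```python
def weka_train_command(classifier, train_file, model_file, options=(), jar="weka.jar"):
    """Build the command that trains a classifier and saves the model (-d)."""
    return ["java", "-cp", jar, classifier, "-t", train_file, *options, "-d", model_file]


def weka_test_command(classifier, test_file, model_file, options=(), jar="weka.jar"):
    """Build the command that loads a saved model (-l) and runs it on a test file (-T)."""
    return ["java", "-cp", jar, classifier, "-T", test_file, "-l", model_file, *options]


train_cmd = weka_train_command(
    "weka.classifiers.lazy.IBk", "data/weather.arff", "model.2nn", options=["-K", "2"]
)
test_cmd = weka_test_command(
    "weka.classifiers.lazy.IBk", "data/weather.arff", "model.2nn"
)
print(" ".join(train_cmd))
print(" ".join(test_cmd))
# To execute: import subprocess; subprocess.run(train_cmd, check=True)
```

Keeping the commands as lists rather than strings avoids shell-quoting problems if a file name contains spaces.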

Page 18

Sample Weka Output

Page 19

More Detailed Output
•  Classification labels for each instance (use "-p 1" option)
   •  java -cp weka.jar weka.classifiers.lazy.IBk -T data/weather.arff -l model.2nn -p 1

Page 20

Weka Classification Functions
•  kNN:
•  Decision trees:
•  Naïve Bayes:
•  AdaBoost:

Page 21

Additional Information
•  General documentation:
   http://www.cs.waikato.ac.nz/ml/weka/
   http://prdownloads.sourceforge.net/weka/weka.ppt
•  Command line doc:
   http://weka.wikispaces.com/Primer

Page 22

Assignment #1

Named Entity Recognition

Page 23

•  Given: train and development data sets of English sentences tagged with the classes:
–  B-PER, I-PER (people)
–  B-ORG, I-ORG (organization)
–  B-LOC, I-LOC (location)
–  B-MISC, I-MISC (miscellaneous)
–  O (outside, meaning not a named entity class)
•  Your objective is: to develop a machine learning NE system which, when given a new, previously unseen text (i.e. test set), will identify and classify the named entities correctly
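Identifying an entity means recovering its span from the per-word tags. A minimal sketch, assuming that a B-TYPE tag opens a phrase and that an I-TYPE tag of the same type continues it; the function name is illustrative, and a stray I-TYPE tag with no matching opener is treated here as the start of a new phrase, which is only one possible convention.

```python
def bio_to_spans(tags):
    """Convert a tag sequence into (type, start, end) spans, end exclusive."""
    spans = []
    start, current = None, None
    for i, tag in enumerate(tags):
        continues = tag.startswith("I-") and current == tag[2:]
        if continues:
            continue
        if current is not None:
            spans.append((current, start, i))  # close the open phrase
        if tag == "O":
            start, current = None, None
        else:
            start, current = i, tag[2:]  # B-TYPE (or stray I-TYPE) opens a phrase
    if current is not None:
        spans.append((current, start, len(tags)))
    return spans


tags = ["B-ORG", "O", "B-PER", "I-PER", "O", "B-LOC"]
spans = bio_to_spans(tags)
print(spans)
```

Comparing predicted spans with gold spans, rather than comparing tags word by word, checks that both the boundaries and the class of each entity are right.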

Page 24

Data Description
•  The data consists of three columns separated by a single space. Each word has been put on a separate line and there is an empty line after each sentence.

U.N. NNP B-ORG
official NN O
Ekeus NNP B-PER
heads VBZ O
for IN O
Baghdad NNP B-LOC
. . O

Columns: word, part-of-speech tag, named entity tag

B-TYPE means the word is the beginning of a phrase of type TYPE
O means the word is not part of a phrase

Make sure to preserve the empty lines in the output of the test data
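A minimal sketch of reading the three-column, space-separated format and writing predictions back out with the empty line after each sentence preserved. The function names are hypothetical, and the sample is the sentence from the data description.

```python
def read_sentences(lines):
    """Group 'word pos tag' lines into sentences; an empty line ends a sentence."""
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        word, pos, tag = line.split()
        current.append((word, pos, tag))
    if current:
        sentences.append(current)
    return sentences


def write_predictions(sentences, predicted_tags):
    """Render sentences with predicted tags, keeping the empty separator lines."""
    out = []
    for sentence, tags in zip(sentences, predicted_tags):
        for (word, pos, _gold), tag in zip(sentence, tags):
            out.append("%s %s %s" % (word, pos, tag))
        out.append("")  # empty line after each sentence
    return "\n".join(out) + "\n"


sample = """U.N. NNP B-ORG
official NN O
Ekeus NNP B-PER
heads VBZ O
for IN O
Baghdad NNP B-LOC
. . O

"""
sentences = read_sentences(sample.splitlines())
gold_tags = [[tag for _, _, tag in s] for s in sentences]
print(write_predictions(sentences, gold_tags))
```

Writing the gold tags back out should reproduce the input exactly, which is a quick check that no sentence boundary was lost.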

Page 25

Results for NE Detection

CoNLL-2002 Spanish Evaluation Data

Data sets      #tokens    #NEs
Train          264,715    18,794
Development     52,923     4,351
Test            51,533     3,558

Evaluation Measures

Carreras et al., 2002    Precision    Recall    F-score
BIO dev.                 92.45        90.88     91.66
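The three evaluation measures are related as follows. This is a sketch assuming the F-score is the balanced F1 measure, i.e. the harmonic mean of precision and recall; under that assumption a precision of 92.45 and a recall of 90.88 give an F-score of 91.66. The function names are illustrative.

```python
def precision_recall(num_correct, num_predicted, num_gold):
    """Precision = correct / predicted entities; recall = correct / gold entities."""
    precision = num_correct / num_predicted if num_predicted else 0.0
    recall = num_correct / num_gold if num_gold else 0.0
    return precision, recall


def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f_score(92.45, 90.88), 2))
print(precision_recall(8, 10, 16))
```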


Page 27

Results for NE Classification*

Spanish Dev.
           Precision    Recall    F-score
LOC        79.04        80.00     79.52
MISC       55.48        54.61     55.04
ORG        79.57        76.06     77.77
PER        87.19        86.91     87.05
overall    79.15        77.80     78.47

Spanish Test.
           Precision    Recall    F-score
LOC        85.76        79.43     82.47
MISC       60.19        57.35     58.73
ORG        81.21        82.43     81.81
PER        84.71        93.47     88.87
overall    81.38        81.40     81.39

*System of Carreras et al., 2002

Page 28

Timeline

Release
–  Train/Development Data: January 29th, 201[?]
–  Test Data: January [?]th, 201[?]

Result Submission Deadline
–  January [?]th, 201[?] (11:59 pm GMT)
–  later submissions will not be accepted

Technical Report Deadline
–  January [?]th, 201[?]

Page 29

Submit
•  The source code for the feature generation (make sure it will run under Linux)
•  The official train and test feature files used in the final run, together with the final output of your system for the test data
•  Additionally generated resources (if any)
•  Write a brief description (~4 pages) of your approach explaining:
–  used NLP tools
–  designed features
–  employed machine learning algorithm and motivation

Page 30

Evaluation is based on
•  ranking of your system against the rest
•  designed features
–  novel, previously unknown features will be favored
–  system's pre or post processing
•  generated resources
–  size, methods and sources for gazetteer extraction
–  trigger lists
•  quality of the paper description
–  structure
–  use of literature
–  error analysis

Page 31

Generate Your Own Resources
•  Extract gazetteers from Wikipedia
–  People (singers, teachers, mathematicians etc.)
–  Locations (cities, countries)
–  Organizations (universities, IT companies etc.)
•  Extract trigger words from WordNet
–  look for hyponyms of person, location, organization
•  Extract and rank the patterns in which the NEs occurred in the train and development data. Show what percentages of these were found in the final test data.
•  Extract lists of verbs found next to the NEs. Do you find any similarity/regularity of the verbs associated with each one of the NE categories?
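The last suggestion, collecting the verbs found next to the NEs, might be sketched as follows. This is an illustration under stated assumptions, not a prescribed method: sentences are lists of (word, part-of-speech tag, NE tag) triples, verbs are recognised by a part-of-speech tag starting with "VB", and "next to" is taken to mean the word immediately before or after the entity. The function name is hypothetical.

```python
from collections import Counter, defaultdict


def verbs_next_to_entities(sentences):
    """Count, per entity type, the verbs directly before or after each entity."""
    verbs = defaultdict(Counter)
    for sentence in sentences:
        i = 0
        while i < len(sentence):
            tag = sentence[i][2]
            if not tag.startswith("B-"):
                i += 1
                continue
            entity_type = tag[2:]
            end = i + 1
            # Extend the entity over its continuation (I-TYPE) tags.
            while end < len(sentence) and sentence[end][2] == "I-" + entity_type:
                end += 1
            for j in (i - 1, end):
                if 0 <= j < len(sentence) and sentence[j][1].startswith("VB"):
                    verbs[entity_type][sentence[j][0].lower()] += 1
            i = end
    return verbs


sentence = [
    ("U.N.", "NNP", "B-ORG"), ("official", "NN", "O"),
    ("Ekeus", "NNP", "B-PER"), ("heads", "VBZ", "O"),
    ("for", "IN", "O"), ("Baghdad", "NNP", "B-LOC"), (".", ".", "O"),
]
verb_counts = verbs_next_to_entities([sentence])
for entity_type, counts in verb_counts.items():
    print(entity_type, counts.most_common(5))
```

Run over the whole training set, the most frequent verbs per category can be inspected by hand or turned into a trigger-list feature.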

Page 32

What must I do …
•  Use the train and development data to design and tune your NE system
•  Decide on the features you would like to incorporate in your NE system
•  Choose a machine learning classifier from Weka
   •  http://www.cs.waikato.ac.nz/ml/weka/
   •  Intro by Marti Hearst http://courses.ischool.berkeley.edu/i256/f06/lectures/lecture16.ppt

This is a big assignment, start early!

Page 33

What you must not do …
•  Use existing named entity system(s) or library either as a feature generator, output generator etc.
•  If you do, then you will have to run your system for two additional languages: Spanish, Italian

Page 34

Available Resources
•  WordNet http://wordnet.princeton.edu/
•  Part-of-speech taggers
–  TreeTagger http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/DecisionTreeTagger.html
–  Stanford PoS Tagger http://nlp.stanford.edu/software/tagger.shtml
•  NP chunker
–  http://www.dcs.shef.ac.uk/~mark/index.html?http://www.dcs.shef.ac.uk/~mark/phd/software/chunker.html
•  Parser
–  Stanford Parser http://nlp.stanford.edu/software/lex-parser.shtml
•  Other http://nlp.stanford.edu/links/statnlp.html

Page 35

Good Luck!