Magic Moments: Moment-based Approaches to Structured Output Prediction


Page 1: Magic Moments: Moment-based Approaches to Structured Output Prediction

Magic Moments: Moment-based Approaches to Structured Output Prediction

Elisa Ricci, joint work with Nobuhisa Ueda, Tijl De Bie, Nello Cristianini

Thursday, October 25th

The Analysis of Patterns

Page 2: Magic Moments: Moment-based Approaches to Structured Output Prediction

Outline

Learning in structured output spaces

New algorithms based on Z-score

Experimental results and computational issues

Conclusions


Page 3: Magic Moments: Moment-based Approaches to Structured Output Prediction

Structured data everywhere!!! Many problems involve highly structured data, which can be represented by sequences, trees and graphs.

Temporal, spatial and structural dependencies between objects are modeled.

This phenomenon is observed in several fields such as computational biology, computer vision, natural language processing or web data analysis.


Page 4: Magic Moments: Moment-based Approaches to Structured Output Prediction

Learning with structured data


Machine learning and data mining algorithms must be able to analyze vast amounts of complex, structured data efficiently and automatically.

The goal of structured learning algorithms is to predict complex structures, such as sequences, trees, or graphs.

Using traditional algorithms to cope with problems involving structured data often implies a loss of information about the structure.

Page 5: Magic Moments: Moment-based Approaches to Structured Output Prediction

Supervised learning: data are available in the form of examples and their associated correct answers.


Training set: $T = \{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_\ell, y_\ell)\}$, with $\mathbf{x}_i \in \mathcal{X}$ and $y_i \in \mathcal{Y}$.

Learning: find $h \in H$ (the hypothesis space), $h: \mathcal{X} \rightarrow \mathcal{Y}$, such that $h(\mathbf{x}_i) = y_i$ for $i = 1, \ldots, \ell$.

Prediction: $y = h(\mathbf{x})$ on a new test sample $\mathbf{x}$.

Page 6: Magic Moments: Moment-based Approaches to Structured Output Prediction

Classification

A typical supervised learning task is classification.


Named entity recognition (NER): locate named entities in text. Entities of interest are person names, location names, organization names, miscellaneous (dates, times...)

Label: entity tag.

Observed variable: word in a sentence.

PP ESTUDIA YA PROYECTO LEY TV REGIONAL REMITIDO POR LA JUNTA Merida.

O N N N M m m N N N O L

Multiclass classification: label $y$, observation $\mathbf{x}$.

Page 7: Magic Moments: Moment-based Approaches to Structured Output Prediction

Sequence labeling


Sequence labeling: given an input sequence x, reconstruct the associated label sequence y of equal length.

Label sequence: entity tags.

Observed sequence: words in a sentence.

Can we consider the interactions between adjacent words?

Goal: realize a joint labeling for all the words in the sentence.

PP ESTUDIA YA PROYECTO LEY TV REGIONAL REMITIDO POR LA JUNTA Merida.

O N N N M m m N N N O L

$\mathbf{y} = (y_1, \ldots, y_n)$, $\mathbf{x} = (x_1, \ldots, x_n)$

Page 8: Magic Moments: Moment-based Approaches to Structured Output Prediction

Sequence alignment


Biological sequence alignment is used to determine the similarity between biological sequences.

ACTGATTACGTGAACTGGATCCA
ACTC--TAGGTGAAGTG-ATCCA

Given two sequences $S_1, S_2 \in \Sigma^*$ over the alphabet $\Sigma = \{A, T, G, C\}$, a global alignment is an assignment of gaps so as to line up each letter in one sequence with either a gap or a letter in the other sequence.

Example: $S_1$ = ATGCTTTC, $S_2$ = CTGTCGCC, aligned with gaps inserted in both sequences.

Page 9: Magic Moments: Moment-based Approaches to Structured Output Prediction

Sequence alignment: given a pair of sequences $\mathbf{x}$, predict the correct sequence $\mathbf{y}$ of alignment operations (e.g. matches, mismatches, gaps).

Alignments can be represented as paths from the upper-left to the lower-right corner in the alignment graph.

Sequence alignment


[Figure: alignment graph for $S_1$ = ATGCTTTC and $S_2$ = CTGTCGCC; the input $\mathbf{x}$ is the sequence pair and the output $\mathbf{y}$ is a path through the graph.]

Page 10: Magic Moments: Moment-based Approaches to Structured Output Prediction

RNA secondary structure prediction


RNA secondary structure prediction: given an RNA sequence, predict the most likely secondary structure.

The study of RNA structure is important in understanding its functions.

AUGAGUAUAAGUUAAUGGUUAAAGUAAAUGUCUUCCACACAUUCCAUCUGAUUUCGAUUCUCACUACUCAU


Page 11: Magic Moments: Moment-based Approaches to Structured Output Prediction

Sequence parsing


Sequence parsing: given an input sequence x, determine the associated parse tree y given an underlying context-free grammar.

Example:

[Figure: parse tree $\mathbf{y}$ for the input sequence $\mathbf{x}$ = GAUCGAUCGAUC.]

Context-free grammar $G = \{V, \Sigma, R, S\}$:

$V = \{S\}$: set of non-terminal symbols

$\Sigma = \{G, A, U, C\}$: set of terminal symbols

$R = \{S \rightarrow SS \mid GSC \mid CSG \mid ASU \mid USA \mid \varepsilon\}$: set of production rules, with start symbol $S$.

Page 12: Magic Moments: Moment-based Approaches to Structured Output Prediction

Traditionally HMMs have been used for sequence labeling.

Two main drawbacks:

The conditional independence assumptions are often too restrictive: HMMs cannot represent multiple interacting features or long-range dependencies between the observations.

They are typically trained by maximum likelihood (ML) estimation.

Label sequence $\mathbf{y} = (y_1, \ldots, y_n)$; observed sequence $\mathbf{x} = (x_1, \ldots, x_n)$.

[Figure: HMM graphical model with hidden states $y_1, y_2, y_3$ emitting observations $x_1, x_2, x_3$.]

Generative models


Sequence labeling:

Page 13: Magic Moments: Moment-based Approaches to Structured Output Prediction

Discriminative models

Specify the probability of a possible output $\mathbf{y}$ given an observation $\mathbf{x}$ (consider the conditional probability $P(\mathbf{y}|\mathbf{x})$ rather than the joint probability $P(\mathbf{y}, \mathbf{x})$).

Do not require the strict independence assumptions of generative models.

Arbitrary features of the observations are considered.

Conditional Random Fields (CRFs) [Lafferty et al., 01]


[Figure: CRF graphical model with label nodes $y_1, y_2, y_3$ conditioned on the observations $x_1, x_2, x_3$.]

Page 14: Magic Moments: Moment-based Approaches to Structured Output Prediction

Learning in structured output spaces


Several discriminative algorithms have emerged recently in order to predict complex structures, such as sequences, trees, or graphs.

New discriminative approaches.

Problems analyzed:

Given a training set of correct pairs of sentences and their associated entity tags, learn to extract entities from a new sentence.

Given a training set of correct biological alignments, learn to align two unknown sequences.

Given a training set of correct RNA secondary structures associated with a set of sequences, learn to determine the secondary structure of a new sequence.

This is not an exhaustive list of possible applications.

Page 15: Magic Moments: Moment-based Approaches to Structured Output Prediction

Learning in structured output spaces: multilabel supervised classification (output: $\mathbf{y} = (y_1, \ldots, y_n)$).


Training set: $T = \{(\mathbf{x}_1, \mathbf{y}_1), \ldots, (\mathbf{x}_\ell, \mathbf{y}_\ell)\}$, with $\mathbf{x}_i \in \mathcal{X}$ and $\mathbf{y}_i \in \mathcal{Y}$.

Learning: find $h \in H$ (the hypothesis space), $h: \mathcal{X} \rightarrow \mathcal{Y}$, such that $h(\mathbf{x}_i) = \mathbf{y}_i$ for $i = 1, \ldots, \ell$.

Prediction: $\mathbf{y} = h(\mathbf{x})$ on a new test sample $\mathbf{x}$, with

$h(\mathbf{x}) = \arg\max_{\mathbf{y}} s(\mathbf{x}, \mathbf{y}, \mathbf{w}), \qquad s(\mathbf{x}, \mathbf{y}, \mathbf{w}) = \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}, \mathbf{y}), \quad \mathbf{w} \in \mathbb{R}^d.$

Page 16: Magic Moments: Moment-based Approaches to Structured Output Prediction

Three main phases:

Encoding: define a suitable feature map $\boldsymbol{\phi}(\mathbf{x}, \mathbf{y})$.

Compression: characterize the output space in a synthetic and compact way.

Optimization: define a suitable objective function and use it for learning.

Learning in structured output spaces


Page 17: Magic Moments: Moment-based Approaches to Structured Output Prediction

Encoding: define a suitable feature map $\boldsymbol{\phi}(\mathbf{x}, \mathbf{y})$.

Compression: characterize the output space in a synthetic and compact way.

Optimization: define a suitable objective function and use it for learning.

Learning in structured output spaces


Page 18: Magic Moments: Moment-based Approaches to Structured Output Prediction

Encoding


Features must be defined in a way such that prediction can be computed efficiently.

The feature vector $\boldsymbol{\phi}(\mathbf{x}, \mathbf{y})$ decomposes as a sum of elementary features over "parts".

Parts are typically edges or nodes in graphs.

$h(\mathbf{x}) = \arg\max_{\mathbf{y} \in \mathcal{Y}} \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}, \mathbf{y})$

$\mathcal{Y}$ is typically huge.

[Figure: alignment graph for $S_1$ = ATGCTTTC and $S_2$ = CTGTCGCC.]

Page 19: Magic Moments: Moment-based Approaches to Structured Output Prediction

Encoding


Transition features: $\phi^t_{pz}(\mathbf{y}) = \sum_k I(y_k = p)\, I(y_{k+1} = z)$.

Emission features: $\phi^e_{pq}(\mathbf{x}, \mathbf{y}) = \sum_k I(x_k = q)\, I(y_k = p)$.

[Figure: chain model with labels $y_1, y_2, y_3$ and observations $x_1, x_2, x_3$.]

In general, features reflect long-range interactions (when labeling $x_i$, past and future observations are taken into account).

Arbitrary features of the observations are considered (e.g. spelling properties in NER).

Sequence labeling:

Example: CRF with HMM features
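As an illustration (our own sketch, not from the slides; the function name and the toy data are made up), a minimal Python version of such HMM-style counting features, where each entry of the feature vector is the number of times a label pair or a label-symbol pair occurs:

from collections import Counter

def hmm_features(x, y, labels, alphabet):
    # Transition features phi^t_{pz}: how often label p is followed by z.
    trans = Counter(zip(y[:-1], y[1:]))
    # Emission features phi^e_{pq}: how often symbol q is observed at label p.
    emit = Counter(zip(y, x))
    phi = [trans[(p, z)] for p in labels for z in labels]
    phi += [emit[(p, q)] for p in labels for q in alphabet]
    return phi

# Toy example with two labels (O = other, N = name).
x = ("PP", "ESTUDIA", "YA")
y = ("O", "N", "N")
print(hmm_features(x, y, labels=("O", "N"), alphabet=("PP", "ESTUDIA", "YA")))

The score of a labeling is then the inner product of this vector with $\mathbf{w}$, so the argmax over label sequences can be computed by Viterbi, as discussed later.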

Page 20: Magic Moments: Moment-based Approaches to Structured Output Prediction

3-parameter model: the feature vector counts matches, mismatches and gaps (see the example below).

In practice more complex models are used:

4-parameter model: affine gap penalties, i.e. different costs if the gap starts in a given position (gap opening penalty) or if it continues (gap extension penalty).

211/212-parameter model: $\boldsymbol{\phi}(\mathbf{x}, \mathbf{y})$ contains the statistics associated with the gap penalties and all possible pairs of amino acids.

Encoding


Example: for the alignment of ATGCTTTC and CTGTCGCC shown earlier, $\boldsymbol{\phi}(\mathbf{x}, \mathbf{y}) = (\#\text{matches}, \#\text{mismatches}, \#\text{gaps}) = (4, 1, 4)$.

Sequence alignment:
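A minimal sketch of the 3-parameter feature map (our own illustration; we assume '-' denotes a gap in either row of the alignment, and the weights below are illustrative only):

def alignment_features(row1, row2):
    # phi(x, y) = (#matches, #mismatches, #gaps) for a gapped alignment.
    matches = mismatches = gaps = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            gaps += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return (matches, mismatches, gaps)

# Alignment from the earlier slide; score is the weighted feature count.
w = (1.0, -1.0, -2.0)
phi = alignment_features("ACTGATTACGTGAACTGGATCCA", "ACTC--TAGGTGAAGTG-ATCCA")
print(phi, sum(wi * fi for wi, fi in zip(w, phi)))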

Page 21: Magic Moments: Moment-based Approaches to Structured Output Prediction

Encoding


$\boldsymbol{\phi}(\mathbf{x}, \mathbf{y})$ counts the occurrences of each rule (S → SS, S → GSC, S → CSG, S → ASU, S → USA, S → ε); for the example tree the counts are the values 2, 2, 2, 1, 1, 1 shown on the slide.

Sequence parsing:

[Figure: the parse tree $\mathbf{y}$ for $\mathbf{x}$ = GAUCGAUCGAUC, as on the earlier slide.]

The feature vector contains the statistics associated with the occurrences of the rules.
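A minimal sketch of such rule-counting features (our own illustration; we represent a tree as either a terminal string or a pair (nonterminal, children), and write 'eps' for the empty production):

from collections import Counter

def rule_counts(tree):
    counts = Counter()
    def walk(node):
        if isinstance(node, str):      # terminal symbol: no rule applied
            return node
        lhs, children = node
        rhs = "".join(walk(c) for c in children) or "eps"
        counts[lhs + " -> " + rhs] += 1  # one occurrence of this rule
        return lhs
    walk(tree)
    return counts

# S -> GSC applied around S -> ASU and the empty rule.
tree = ("S", ["G", ("S", ["A", ("S", []), "U"]), "C"])
print(rule_counts(tree))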

Page 22: Magic Moments: Moment-based Approaches to Structured Output Prediction

Encoding: having defined these features, predictions can be computed efficiently with dynamic programming (DP):

Sequence labeling → Viterbi algorithm
Sequence alignment → Needleman-Wunsch algorithm
Sequence parsing → Cocke-Younger-Kasami (CYK) algorithm


[Figure: DP table for aligning $S_1$ = ATGCTTTC and $S_2$ = CTGTCGCC.]
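For concreteness, a sketch of the prediction step for the 3-parameter alignment model (our own illustration with made-up weights): Needleman-Wunsch DP computing $\max_{\mathbf{y}} \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}, \mathbf{y})$ over all global alignments.

def needleman_wunsch(s1, s2, w_match, w_mismatch, w_gap):
    n, m = len(s1), len(s2)
    # dp[i][j] = best score of aligning s1[:i] with s2[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * w_gap
    for j in range(1, m + 1):
        dp[0][j] = j * w_gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = w_match if s1[i - 1] == s2[j - 1] else w_mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + sub,  # match/mismatch
                           dp[i - 1][j] + w_gap,    # gap in s2
                           dp[i][j - 1] + w_gap)    # gap in s1
    return dp[n][m]

print(needleman_wunsch("ATGCTTTC", "CTGTCGCC", 1.0, -1.0, -2.0))

A traceback through the DP table recovers the alignment $\mathbf{y}$ itself.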

Page 23: Magic Moments: Moment-based Approaches to Structured Output Prediction

Encoding: define a suitable feature map $\boldsymbol{\phi}(\mathbf{x}, \mathbf{y})$.

Compression: characterize the output space in a synthetic and compact way.

Optimization: define a suitable objective function and use it for learning.

Learning in structured output spaces


Page 24: Magic Moments: Moment-based Approaches to Structured Output Prediction

Computing moments


The number $N$ of possible output vectors $\mathbf{y}_k$ given an observation $\mathbf{x}$ is typically huge.

To characterize the distribution of the scores $s(\mathbf{x}, \mathbf{y}_k, \mathbf{w})$, its mean and its variance are considered.

$\boldsymbol{\mu}$ and $C$ can be computed efficiently with DP techniques.

$\mu_s = \dfrac{1}{N} \sum_{k=1}^{N} s(\mathbf{x}, \mathbf{y}_k, \mathbf{w}) = \mathbf{w}^T \Big( \dfrac{1}{N} \sum_{k=1}^{N} \boldsymbol{\phi}(\mathbf{x}, \mathbf{y}_k) \Big) = \mathbf{w}^T \boldsymbol{\mu}$

$\sigma_s^2 = \dfrac{1}{N} \sum_{k=1}^{N} \big( \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}, \mathbf{y}_k) - \mathbf{w}^T \boldsymbol{\mu} \big)^2 = \mathbf{w}^T \Big( \dfrac{1}{N} \sum_{k=1}^{N} \big( \boldsymbol{\phi}(\mathbf{x}, \mathbf{y}_k) - \boldsymbol{\mu} \big) \big( \boldsymbol{\phi}(\mathbf{x}, \mathbf{y}_k) - \boldsymbol{\mu} \big)^T \Big) \mathbf{w} = \mathbf{w}^T C \mathbf{w}$
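These identities are easy to check numerically; a small sketch (our own, with numpy and a synthetic matrix of feature vectors standing in for the N outputs):

import numpy as np

rng = np.random.default_rng(0)
Phi = rng.integers(0, 3, size=(64, 5)).astype(float)  # rows: phi(x, y_k)
w = rng.normal(size=5)

mu = Phi.mean(axis=0)                     # mean feature vector
C = np.cov(Phi, rowvar=False, bias=True)  # covariance with 1/N normalization

scores = Phi @ w
print(np.isclose(scores.mean(), w @ mu))    # mu_s = w^T mu
print(np.isclose(scores.var(), w @ C @ w))  # sigma_s^2 = w^T C w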

Page 25: Magic Moments: Moment-based Approaches to Structured Output Prediction

Input: $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, state $p$, symbol $q$.

$\mu(i, 1) := 1$ if ($q = x_1$) and ($p = i$), else $0$, for all states $i$
for $j = 2$ to $n$
    for $i = 1$ to $|\Sigma_y|$
        $M := 0$
        if ($q = x_j$) and ($p = i$), $M := 1$
        $\mu(i, j) := \frac{1}{|\Sigma_y|} \sum_{i'} \mu(i', j - 1) + M$
    endfor
endfor

Output: $\mu^e_{pq} = \frac{1}{|\Sigma_y|} \sum_i \mu(i, n)$

Computing moments


The number $N$ of possible label sequences $\mathbf{y}_k$ given an observation sequence $\mathbf{x}$ is exponential in the length of the sequence. An algorithm similar to the forward algorithm is used to compute $\boldsymbol{\mu}$ and $C$.

Sequence labeling: the recursive formula above computes $\mu^e_{pq} = \frac{1}{N} \sum_k \phi^e_{pq}(\mathbf{x}, \mathbf{y}_k)$, the mean value associated with the feature which represents the emission of a symbol $q$ at state $p$.
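As a concrete illustration (our own, under the assumption of a uniform distribution over all $|\Sigma_y|^n$ label sequences), the mean emission count can be accumulated forward and checked by brute force on a tiny instance:

from itertools import product

def mean_emission_count(x, n_states, p, q):
    n = len(x)
    # S[i]: total emission count over sequences ending in state i (length 1).
    S = [1.0 if (q == x[0] and p == i) else 0.0 for i in range(n_states)]
    n_seqs = 1  # number of sequences of the current length per ending state
    for j in range(1, n):
        total = sum(S)       # counts accumulated over positions 1..j
        n_seqs *= n_states   # sequences of length j+1 ending in a given state
        S = [total + (n_seqs if (q == x[j] and p == i) else 0.0)
             for i in range(n_states)]
    return sum(S) / n_states ** n

x, k, p, q = ("a", "b", "a"), 2, 0, "a"
brute = sum(sum(1 for j in range(len(x)) if y[j] == p and x[j] == q)
            for y in product(range(k), repeat=len(x))) / k ** len(x)
print(mean_emission_count(x, k, p, q), brute)  # both 1.0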

Page 26: Magic Moments: Moment-based Approaches to Structured Output Prediction

Computing moments


Basic idea behind the recursive formulas:

Mean values are computed using linearity of expectation:

$E\Big[\sum_i a_i\Big] = \sum_i E[a_i]$

Variances are computed by centering the second-order moments:

$E\big[(a_1 + a_2)^2\big] = E[a_1^2] + 2\,E[a_1 a_2] + E[a_2^2], \qquad \mathrm{Var}\Big[\sum_i a_i\Big] = E\Big[\Big(\sum_i a_i\Big)^2\Big] - \Big(E\Big[\sum_i a_i\Big]\Big)^2$

Page 27: Magic Moments: Moment-based Approaches to Structured Output Prediction

Computing moments


Problem: high computational cost for large feature spaces.

1st solution: exploit the structure and the sparseness of the covariance matrix $C$. In sequence labeling for a CRF with HMM features, the number of different values in $C$ is linear in the size of the observation alphabet.

2nd solution: sampling strategy.

Example: $|\Sigma_y| = 3$, $|\Sigma_x| = 4$.
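A minimal sketch of the sampling strategy (our own illustration; the feature map and alphabet sizes are toy choices): draw label sequences uniformly at random and use the empirical moments.

import numpy as np

def sampled_moments(x, n_states, n_samples, feat, rng):
    Phi = np.array([feat(x, rng.integers(0, n_states, size=len(x)))
                    for _ in range(n_samples)], dtype=float)
    return Phi.mean(axis=0), np.cov(Phi, rowvar=False, bias=True)

# Emission counts of symbol "a" per state, as in the earlier sketch.
def feat(x, y):
    return [sum(1 for j, s in enumerate(x) if s == "a" and y[j] == p)
            for p in range(2)]

mu_hat, C_hat = sampled_moments(("a", "b", "a"), 2, 5000, feat,
                                np.random.default_rng(0))
print(mu_hat)  # approaches the exact means computed by DP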

Page 28: Magic Moments: Moment-based Approaches to Structured Output Prediction

Encoding: define a suitable feature map $\boldsymbol{\phi}(\mathbf{x}, \mathbf{y})$.

Compression: characterize the output space in a synthetic and compact way.

Optimization: define a suitable objective function and use it for learning.

Learning in structured output spaces


Page 29: Magic Moments: Moment-based Approaches to Structured Output Prediction

Z-score


New optimization criterion particularly suited for non-separable cases.

Minimize the number of output vectors with score higher than the score of the correct pairs.

Maximize the Z-score:

$Z(\mathbf{x}, \mathbf{y}) = \dfrac{s(\mathbf{x}, \mathbf{y}) - \mu_s(\mathbf{x})}{\sigma_s(\mathbf{x})}$

Page 30: Magic Moments: Moment-based Approaches to Structured Output Prediction

Z-score


The Z-score can be expressed as a function of the parameters $\mathbf{w}$:

$Z(\mathbf{x}, \mathbf{y}) = \dfrac{s(\mathbf{x}, \mathbf{y}) - \mu_s(\mathbf{x})}{\sigma_s(\mathbf{x})} = \dfrac{\mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}, \mathbf{y}) - \mathbf{w}^T \boldsymbol{\mu}}{\sqrt{\mathbf{w}^T C \mathbf{w}}} = \dfrac{\mathbf{w}^T \mathbf{b}}{\sqrt{\mathbf{w}^T C \mathbf{w}}}, \qquad \mathbf{b} = \boldsymbol{\phi}(\mathbf{x}, \mathbf{y}) - \boldsymbol{\mu}.$

Two equivalent optimization problems:

$\max_{\mathbf{w}} \dfrac{\mathbf{w}^T \mathbf{b}}{\sqrt{\mathbf{w}^T C \mathbf{w}}}$

$\min_{\mathbf{w}} \dfrac{1}{N} \sum_{k=1}^{N} \big( \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}, \mathbf{y}) - \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}, \mathbf{y}_k) \big)^2 \quad \text{s.t.} \quad \mathbf{w}^T \mathbf{b} = 1$

Page 31: Magic Moments: Moment-based Approaches to Structured Output Prediction

Z-score


Ranking loss:

$L_{rk}(\mathbf{x}, \mathbf{y}) = \dfrac{1}{N} \sum_{k=1}^{N} I\big( s(\mathbf{x}, \mathbf{y}_k) \geq s(\mathbf{x}, \mathbf{y}) \big)$

An upper bound on the ranking loss is minimized, so that the number of output vectors with score higher than the score of the correct pairs is minimized:

$L_{rk}(\mathbf{x}, \mathbf{y}) \leq L^u_{rk}(\mathbf{w}, \mathbf{x}, \mathbf{y}) = \dfrac{1}{N} \sum_{k=1}^{N} \big( 1 + \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}, \mathbf{y}_k) - \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}, \mathbf{y}) \big)^2$

The bound follows since $I(t \geq 0) \leq (1 + t)^2$ for every real $t$.

Page 32: Magic Moments: Moment-based Approaches to Structured Output Prediction

Previous approaches

Minimize the number of incorrect macrolabels $\mathbf{y}$: $L_{0/1}(\mathbf{x}, \mathbf{y}) = I\big( h(\mathbf{x}) \neq \mathbf{y} \big)$.

CRFs [Lafferty et al., 01], HMSVM [Altun et al., 03], averaged perceptron [Collins, 02].

Minimize the number of incorrect microlabels $y_j$: $L_{m}(\mathbf{x}, \mathbf{y}) = \sum_j I\big( h(\mathbf{x})_j \neq y_j \big)$.

M3Ns [Taskar et al., 03], SVMISO [Tsochantaridis et al., 04].


Page 33: Magic Moments: Moment-based Approaches to Structured Output Prediction

SODA


Given a training set $T$, the empirical risk associated with the upper bound on the ranking loss is minimized. An equivalent formulation in terms of $C^*$ and $\mathbf{b}^*$ is used to solve it.

SODA (Structured Output Discriminant Analysis):

$\min_{\mathbf{w}} \sum_{i=1}^{\ell} \dfrac{1}{N_i} \sum_{k=1}^{N_i} \big( \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_i, \mathbf{y}_i) - \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_i, \mathbf{y}_{ik}) \big)^2 \quad \text{s.t.} \quad \sum_{i=1}^{\ell} \mathbf{w}^T \mathbf{b}_i = 1$

or, equivalently,

$\max_{\mathbf{w}} \dfrac{\mathbf{w}^T \mathbf{b}^*}{\sqrt{\mathbf{w}^T C^* \mathbf{w}}}, \qquad \mathbf{b}^* = \sum_{i=1}^{\ell} \mathbf{b}_i, \quad C^* = \sum_{i=1}^{\ell} \big( C_i + \mathbf{b}_i \mathbf{b}_i^T \big),$

with $\mathbf{b}_i = \boldsymbol{\phi}(\mathbf{x}_i, \mathbf{y}_i) - \boldsymbol{\mu}_i$ and $C_i$ the moments of the $i$-th training pair.

Page 34: Magic Moments: Moment-based Approaches to Structured Output Prediction

SODA: convex optimization.

If $C^*$ is not positive definite, regularization can be introduced.

Solution: simple matrix inversion (see below).

Fast conjugate gradient methods are available.


$\max_{\mathbf{w}} \dfrac{\mathbf{w}^T \mathbf{b}^*}{\sqrt{\mathbf{w}^T C^* \mathbf{w}}} \quad \Longleftrightarrow \quad \min_{\mathbf{w}} \mathbf{w}^T C^* \mathbf{w} \quad \text{s.t.} \quad \mathbf{w}^T \mathbf{b}^* = 1$

Solution: $\mathbf{w} \propto C^{*-1} \mathbf{b}^*$.
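A sketch of this solve step (our own illustration; numpy/scipy assumed, with a ridge term standing in for the regularization mentioned above):

import numpy as np
from scipy.sparse.linalg import cg

def soda_weights(b_star, C_star, reg=1e-6, use_cg=False):
    # w is proportional to the solution of (C* + reg I) w = b*.
    A = C_star + reg * np.eye(len(b_star))
    if use_cg:
        w, info = cg(A, b_star)         # conjugate gradient solve
        assert info == 0
    else:
        w = np.linalg.solve(A, b_star)  # direct solve ("matrix inversion")
    return w

# Toy PSD moments for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))
print(soda_weights(rng.normal(size=4), X.T @ X / 10, use_cg=True))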

Page 35: Magic Moments: Moment-based Approaches to Structured Output Prediction

Rademacher bound

The bound shows that learning based on the upper bound on the ranking loss is effectively achieved. The bound holds also in the case where $\mathbf{b}^*$ and $C^*$ are estimated by sampling.

Two directions of sampling:

For each $(\mathbf{x}, \mathbf{y}) \in T$ only a limited number $n$ of incorrect outputs is considered to estimate $\mathbf{b}^*$ and $C^*$.

Only a finite number $\ell$ of input-output pairs is given in the training set.

The empirical expectation of the estimated loss, $\hat{E}[\hat{L}^u_{rk}(\mathbf{x}, \mathbf{y})]$ (with $\mathbf{b}^*$ and $C^*$ computed by random sampling), is a good approximate upper bound for the expected loss $E[L^u_{rk}(\mathbf{x}, \mathbf{y})]$. The latter is an upper bound for the ranking loss $L_{rk}(\mathbf{x}, \mathbf{y})$, so the Rademacher bound is also a bound on the expectation of the ranking loss.

Page 36: Magic Moments: Moment-based Approaches to Structured Output Prediction

Rademacher bound

Theorem (Rademacher bound for SODA). With probability at least $1 - \delta$ over the joint of the random sample $T$ and the random samples from the output space for each $(\mathbf{x}, \mathbf{y}) \in T$ that are taken to approximate $\mathbf{b}^*$ and $C^*$, the following bound holds for any $\mathbf{w}$ with squared norm smaller than $c$:

$E\big[ L^u_{rk}(\mathbf{x}, \mathbf{y}) \big] \;\leq\; \hat{E}\big[ \hat{L}^u_{rk}(\mathbf{x}, \mathbf{y}) \big] + \hat{R}_1 + \hat{R}_2 + 3M \sqrt{\dfrac{\log(2/\delta)}{2\ell}} + 3M \sqrt{\dfrac{\log(2/\delta)}{2n}}$

whereby $M$ is a constant and we assume that the number of random samples for each training pair is equal to $n$. The Rademacher complexity terms $\hat{R}_1$ and $\hat{R}_2$ decrease with $1/\sqrt{\ell}$ and $1/\sqrt{n}$ respectively, such that the bound becomes tight for increasing $n$ and $\ell$, as long as $n$ grows faster than $\log(\ell)$.

Page 37: Magic Moments: Moment-based Approaches to Structured Output Prediction

Z-score approach


How to define the Z-score of a training set? Another possible approach (independence assumption):

A convex optimization problem which can again be solved by simple matrix inversion.

By maximizing the Z-score, most of the linear constraints below are satisfied.

Z-score approach:

$\max_{\mathbf{w}} \dfrac{\sum_{i=1}^{\ell} \mathbf{w}^T \mathbf{b}_i}{\sqrt{\sum_{i=1}^{\ell} \mathbf{w}^T C_i \mathbf{w}}}$

Linear constraints:

$\mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_i, \mathbf{y}_i) \geq \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_i, \mathbf{y}_{ik}), \qquad i = 1, \ldots, \ell, \quad \forall\, \mathbf{y}_{ik} \neq \mathbf{y}_i.$

Page 38: Magic Moments: Moment-based Approaches to Structured Output Prediction

One may want to impose the violated constraints explicitly.

This is again a convex optimization problem that can be solved with an iterative algorithm similar to previous approaches (HMSVM [Altun et al., 03], averaged perceptron [Collins, 02]).

Eventually relax constraints (e.g. add slack variables for non-separable problems).

Iterative approach


QP:

$\min_{\mathbf{w}} \mathbf{w}^T C^* \mathbf{w} \quad \text{s.t.} \quad \mathbf{w}^T \mathbf{b}^* = 1 \quad \text{and} \quad \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_i, \mathbf{y}_i) \geq \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_i, \mathbf{y}_{ik}) \;\; \text{for all imposed constraints}.$

Page 39: Magic Moments: Moment-based Approaches to Structured Output Prediction

Input: training set $T$

1: $\mathcal{C} \leftarrow \emptyset$
2: Compute $\mathbf{b}_i$, $C_i$ for all $i = 1, \ldots, \ell$
3: Compute $\mathbf{b}^* = \sum_i \mathbf{b}_i$, $C^* = \sum_i C_i$
4: Find $\mathbf{w}$ solving QP
5: repeat
6:   for $i = 1, \ldots, \ell$ do
7:     Compute $\mathbf{y}_i' = \arg\max_{\mathbf{y}} \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_i, \mathbf{y})$
8:     if $\mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_i, \mathbf{y}_i') > \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_i, \mathbf{y}_i)$
9:       $\mathcal{C} \leftarrow \mathcal{C} \cup \{ \mathbf{w}^T (\boldsymbol{\phi}(\mathbf{x}_i, \mathbf{y}_i) - \boldsymbol{\phi}(\mathbf{x}_i, \mathbf{y}_i')) > 0 \}$
10:      Find $\mathbf{w}$ solving QP s.t. $\mathcal{C}$
11:    endif
12:  endfor
13: until $\mathcal{C}$ has not changed during the current iteration

Iterative approach


(The slide annotates the algorithm's phases: moments computation, Z-score maximization, identification of the most violated constraint, and constrained Z-score maximization.)
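A sketch of this loop in Python (our own illustration: `decode` is an assumed argmax routine such as Viterbi, `phi` an assumed feature map returning numpy arrays, and cvxpy is used as a stand-in QP solver):

import numpy as np
import cvxpy as cp

def iterative_zscore(b_star, C_star, train, phi, decode, max_iter=20):
    d = len(b_star)
    A = C_star + 1e-8 * np.eye(d)  # keep the quadratic form PSD
    added = []                     # the constraint set C
    w_var = cp.Variable(d)
    def solve():
        cons = [b_star @ w_var == 1] + [a @ w_var >= 0 for a in added]
        cp.Problem(cp.Minimize(cp.quad_form(w_var, A)), cons).solve()
        return w_var.value
    w = solve()
    for _ in range(max_iter):
        changed = False
        for x, y in train:
            y_hat = decode(w, x)                     # highest-scoring output
            if w @ phi(x, y_hat) > w @ phi(x, y):    # violated constraint
                added.append(phi(x, y) - phi(x, y_hat))
                w = solve()                          # re-solve the QP
                changed = True
        if not changed:
            break
    return w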

Page 40: Magic Moments: Moment-based Approaches to Structured Output Prediction

Experimental results

Chain CRF with HMM features ($|\Sigma_y| = 2$, $|\Sigma_x| = 4$ and $|\Sigma_y| = 3$, $|\Sigma_x| = 5$). Sequence length: 50. Training set size: 20 pairs. Test set size: 100 pairs. Comparison with SVMISO [Tsochantaridis et al., 04], Perceptron [Collins, 02], CRFs [Lafferty et al., 01]. Average number of incorrect labels varying the level of noise $p$.


Sequence labeling: artificial data.

[Figure: test error vs. noise level $p$ for the two settings; methods: SODA, Perceptron, SVMISO, CRFs, Z-score.]

Page 41: Magic Moments: Moment-based Approaches to Structured Output Prediction

Experimental results

HMM features ($|\Sigma_y| = 3$, $|\Sigma_x| = 5$). Noise level $p = 0.2$. Average number of incorrect labels and computational time as a function of the training set size.


Sequence labeling: artificial data.

[Figure: test error vs. training set size (CRFs, SVMISO, Perceptron, SODA) and training time vs. training set size (SODA, SVMISO).]

Page 42: Magic Moments: Moment-based Approaches to Structured Output Prediction

Chain CRF with HMM features ($|\Sigma_y| = 3$, varying $|\Sigma_x|$). Sequence length: 10. Training set size: 50 pairs. Test set size: 100 pairs. Level of noise $p = 0.2$. Comparison with SVMISO [Tsochantaridis et al., 04]. Labeling error on the test set and average training time as a function of the observation alphabet size.

Experimental results

[Figure: training time (sec) and test error vs. observation alphabet size; methods: SODA (50 sampled paths), SODA (200 sampled paths), SODA (exact DP), SVMISO.]


Sequence labeling: artificial data.


Page 43: Magic Moments: Moment-based Approaches to Structured Output Prediction

Experimental results

Chain CRF with HMM features ($|\Sigma_y| = 2$, $|\Sigma_x| = 4$).

Adding constraints is not very useful when data are noisy and non-linearly separable.

[Figure: average number of correct hidden sequences (%) vs. number of constraints; methods: Z-score (constrained), SVMISO, Perceptron.]


Sequence labeling: artificial data.


Page 44: Magic Moments: Moment-based Approaches to Structured Output Prediction

Experimental results


Sequence labeling:

NER

Spanish news wire article - Special Session of CoNLL02

300 sentences with an average length of 30 words. 9 labels: non-name, beginning and continuation of person, organization, location and miscellaneous names.

Two sets of binary features: S1 (HMM features) and S2 (S1 plus HMM features for the previous and the next word).

Labeling error on the test set (5-fold cross-validation):

Method       S1      S2
Z-score      11.07   7.89
SODA         10.13   8.27
SVMISO       10.97   8.11
Perceptron   20.99   13.78
CRFs         12.01   8.29

Page 45: Magic Moments: Moment-based Approaches to Structured Output Prediction

Experimental results


Sequence alignment: artificial sequences.

Test error (number of incorrectly aligned pairs) as a function of the training set size:

Training set size    5      10     20     50     100
SODA                78.6   62.85  44.6   36.7   30.84
Generative          96.4   94.39  87.12  45.31  31.05


[Figure: original and reconstructed substitution matrices.]

Page 46: Magic Moments: Moment-based Approaches to Structured Output Prediction

Experimental results


Sequence parsing:

G6 grammar in [Dowell and Eddy, 2004]. RNA sequences of five families extracted from the Rfam database [Griffiths-Jones et al., 2003].

Prediction on five-fold cross-validation:

           Z-score with constraints        Generative          Perceptron
Family     sens.   spec.   constraints     sens.    spec.      sens.   spec.
RF00032    100     95.98    2              100      95.53      100     95.59
RF00260    98.77   94.80    6              98.97   100         98.57   98.90
RF00436    91.11   90.61   27.6            44.16    53.30      90.27   86.53
RF00164    76.14   73.74   37.8            65.51    62.55      87.06   78.32
RF00480    99.08   89.89   78.2            99.88    86.43      98.83   94.78

Page 47: Magic Moments: Moment-based Approaches to Structured Output Prediction

Conclusions


New methods for learning in structured output spaces. Accuracy comparable with state-of-the-art techniques. Easy to implement (DP for the matrix computations and a simple optimization problem). Fast for large training sets and a reasonable number of features:

• Mean and variance computations are parallelizable for large training sets.
• Conjugate gradient techniques are used in the optimization phase.

Three applications analyzed: sequence labeling, sequence parsing and sequence alignment.

Future work: test the scalability of this approach using approximate techniques; develop a dual version with kernels.

Page 48: Magic Moments: Moment-based Approaches to Structured Output Prediction


Thank you