Density-Driven Cross-Lingual Transfer of …rasooli/papers/emnlp_presentation...Density-Driven...

49
Density-Driven Cross-Lingual Transfer of Dependency Parsers Mohammad Sadegh Rasooli Michael Collins [email protected] Presented by Owen Rambow EMNLP 2015 Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Transcript of Density-Driven Cross-Lingual Transfer of …rasooli/papers/emnlp_presentation...Density-Driven...

Density-Driven Cross-Lingual Transfer of

Dependency Parsers

Mohammad Sadegh Rasooli Michael Collins

[email protected]

Presented byOwen Rambow

EMNLP 2015

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Motivation

Availability of treebanks

Accurate parsers use annotated treebanks.

There are no gold-standard treebanks for manylanguages.

Annotated treebanks are very expensive to create.

Common approach: using universal linguistic information

Without parallel data; e.g [Zhang and Barzilay, 2015]

With parallel data; e.g [Ma and Xia, 2014]

The best results but still lags behind supervised parsing

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Motivation

Availability of treebanks

Accurate parsers use annotated treebanks.

There are no gold-standard treebanks for manylanguages.

Annotated treebanks are very expensive to create.

Common approach: using universal linguistic information

Without parallel data; e.g [Zhang and Barzilay, 2015]

With parallel data; e.g [Ma and Xia, 2014]

The best results but still lags behind supervised parsing

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Projecting Dependencies from Parallel Data

Bitext

Prepare bitext

The political priorities must be set by this House and the MEPs . ROOT

Die politischen Prioritäten müssen von diesem Parlament und den Europaabgeordneten abgesteckt werden . ROOT

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Projecting Dependencies from Parallel Data

Align

Align bitext (e.g. via Giza++)

The political priorities must be set by this House and the MEPs . ROOT

Die politischen Prioritäten müssen von diesem Parlament und den Europaabgeordneten abgesteckt werden . ROOT

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Projecting Dependencies from Parallel Data

Parse

Parse source sentence with a supervised parser.

The political priorities must be set by this House and the MEPs . ROOT

Die politischen Prioritäten müssen von diesem Parlament und den Europaabgeordneten abgesteckt werden . ROOT

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Projecting Dependencies from Parallel Data

Project

Project dependencies.

The political priorities must be set by this House and the MEPs . ROOT

Die politischen Prioritäten müssen von diesem Parlament und den Europaabgeordneten abgesteckt werden . ROOT

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Projecting Dependencies from Parallel Data

Train

Train on the projected dependencies.

Die politischen Prioritäten müssen von diesem Parlament und den Europaabgeordneten abgesteckt werden . ROOT

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Practical Problems

Most translations are not word-to-word.

Alignment errors!

Supervised parsers are not perfect.

Difference in syntactic behavior across languages.

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Previous Results

MPH

11

ZB

15

MX

14

1st

-ord

Sh-R

70

75

80

85

90

95

71.34

75.476.67

dependency

acc

ura

cy(a

vg.

over

6EU

languages)

[McDonald et al., 2011]

[Zhang and Barzilay, 2015]

[Ma and Xia, 2014]

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Previous work

Previous Results

MPH

11

ZB

15

MX

14

1st

-ord

Sh-R

70

75

80

85

90

95

71.34

75.476.67

84.29

87.5

dependency

acc

ura

cy(a

vg.

over

6EU

languages)

[McDonald et al., 2011]

[Zhang and Barzilay, 2015]

[Ma and Xia, 2014]

[McDonald et al., 2005]

[Rasooli and Tetreault, 2015]

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Previous work Supervised models

8% lower than a first-order supervised model.

Our Approach

We define different sets of dense structures

Full treesDense partial trees

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Dense Structures

A projected full tree t ∈ P100 is:

A projective dependency tree

All words have one parent

Die politischen Prioritäten müssen von diesem Parlament und den Europaabgeordneten abgesteckt werden . ROOT

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Dense Structures

A partial tree t ∈ P80 is:

A projective dependency tree (a collection of projectivetrees)

At least 80% of words have one parent

Die politischen Prioritäten müssen von diesem Parlament und den Europaabgeordneten abgesteckt werden . ROOT

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Dense Structures

A partial tree t ∈ P≥k is:

A projective dependency tree (a collection of projectivetrees)

There is at least one span of length≥ k where all words inthat span have one parent

Die politischen Prioritäten müssen von diesem Parlament und den Europaabgeordneten abgesteckt werden . ROOT

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

k=7 in the above tree

Our Contributions

We demonstrate the utility of dense projected structures.

We describe a training algorithm that builds on densestructures.

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Our Contributions

MPH

11

ZB

15

MX

14

Our

Model

1st

-ord

Sh-R

70

75

80

85

90

95

71.34

75.476.67

84.29

87.5

dependency

acc

ura

cy(a

vg.

over

6EU

languages)

[McDonald et al., 2011]

[Zhang and Barzilay, 2015]

[Ma and Xia, 2014]

[McDonald et al., 2005]

[Rasooli and Tetreault, 2015]

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Previous work Supervised models

Our Contributions

MPH

11

ZB

15

MX

14

Our

Model

1st

-ord

Sh-R

70

75

80

85

90

95

71.34

75.476.67

82.18

84.29

87.5

dependency

acc

ura

cy(a

vg.

over

6EU

languages)

[McDonald et al., 2011]

[Zhang and Barzilay, 2015]

[Ma and Xia, 2014]

Our model

[McDonald et al., 2005]

[Rasooli and Tetreault, 2015]

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Previous work Supervised models

5.5% absolute improvement

Overview

The learning algorithm

Results

Analysis

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Projecting Dependencies

Languages from Google universal treebank:

English (only as source), German, Spanish, French, Italian,Portuguese, and Swedish.English to German transfer data for developing ourmodels.

We use Giza++ intersected alignments on EuroParl data

We use the Yara parser [Rasooli and Tetreault, 2015], ashift-reduce beam parser.

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Functions Used in Our Algorithm

We use the following function definitions:

Train(D)

CDECODE(P, θ)

TOP(D, θ)

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Train(D)

Input D

A set of dependency trees (full trees)

Output θ

A parsing model

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

CDECODE(P, θ)

Input P

A set of partial dependency structures

Input θ

Parsing model

Output D

A set of full trees that are completely consistent with thedependencies in P.Filling in partial trees with dynamic oracles[Goldberg and Nivre, 2013].

Die politischen Prioritäten müssen von diesem Parlament und den Europaabgeordneten abgesteckt werden . ROOT

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

CDECODE(P, θ)

Input P

A set of partial dependency structures

Input θ

Parsing model

Output D

A set of full trees that are completely consistent with thedependencies in P.Filling in partial trees with dynamic oracles[Goldberg and Nivre, 2013].

Die politischen Prioritäten müssen von diesem Parlament und den Europaabgeordneten abgesteckt werden . ROOT

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

TOP(D, θ)

Input D

A set of full dependency trees

Input θ

Parsing model

Output ATop m highest scoring trees in D

We use m=200,000 in our experiments.

Score: Perceptron-based parse score normalized bysentence length

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Definitions

A0 = P100

A1 = P≥7 ∪ P80

A2 = P≥5 ∪ P80

A3 = P≥1 ∪ P80

Note A1 ⊆ A2 ⊆ A3

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Learning Algorithm

Train on full trees

θ0 = Train(A0)for i = 1...3 do

Di = CDECODE(Ai, θi−1)A′i = TOP(Di, θi−1)θi = Train(A0 ∪ A′i)

end forReturn θ3

Given definitions:

A0 = P100

A1 = P≥7 ∪ P80

A2 = P≥5 ∪ P80

A3 = P≥1 ∪ P80

Note A1 ⊆ A2 ⊆ A3

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Learning Algorithm

Gradually decrease density

θ0 = Train(A0)for i = 1 ... 3 do

Di = CDECODE(Ai, θi−1)A′i = TOP(Di, θi−1)θi = Train(A0 ∪ A′i)

end forReturn θ3

Given definitions:

A0 = P100

A1 = P≥7 ∪ P80

A2 = P≥5 ∪ P80

A3 = P≥1 ∪ P80

Note A1 ⊆ A2 ⊆ A3

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Learning Algorithm

Fill in partial trees

θ0 = Train(A0)for i = 1...3 do

Di = CDECODE(Ai, θi−1)A′i = TOP(Di, θi−1)θi = Train(A0 ∪ A′i)

end forReturn θ3

Given definitions:

A0 = P100

A1 = P≥7 ∪ P80

A2 = P≥5 ∪ P80

A3 = P≥1 ∪ P80

Note A1 ⊆ A2 ⊆ A3

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Learning Algorithm

Select high-scoring trees

θ0 = Train(A0)for i = 1...3 do

Di = CDECODE(Ai, θi−1)A′i = TOP(Di, θi−1)θi = Train(A0 ∪ A′i)

end forReturn θ3

Given definitions:

A0 = P100

A1 = P≥7 ∪ P80

A2 = P≥5 ∪ P80

A3 = P≥1 ∪ P80

Note A1 ⊆ A2 ⊆ A3

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Learning Algorithm

Train on the new set

θ0 = Train(A0)for i = 1...3 do

Di = CDECODE(Ai, θi−1)A′i = TOP(Di, θi−1)θi = Train(A0 ∪ A′i)

end forReturn θ3

Given definitions:

A0 = P100

A1 = P≥7 ∪ P80

A2 = P≥5 ∪ P80

A3 = P≥1 ∪ P80

Note A1 ⊆ A2 ⊆ A3

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Learning Algorithm

Return the final model

θ0 = Train(A0)for i = 1...3 do

Di = CDECODE(Ai, θi−1)A′i = TOP(Di, θi−1)θi = Train(A0 ∪ A′i)

end forReturn θ3

Given definitions:

A0 = P100

A1 = P≥7 ∪ P80

A2 = P≥5 ∪ P80

A3 = P≥1 ∪ P80

Note A1 ⊆ A2 ⊆ A3

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Overview

The learning algorithm

Results

Analysis

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Two Settings

Scenario 1

Transfer from English.

Scenario 2 (voting)

The different languages vote on dependencies.

This scenario is true for cases such as Europarl.

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Results on European Languages

German Spanish French Italian Portuguese Swedish

70

75

80

85

70.5

8

75.6

9 77.0

3

77.3

5

75.9

8

76.6

8

UA

S

θ0 (Full trees)

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Results on European Languages

German Spanish French Italian Portuguese Swedish

70

75

80

85

70.5

8

75.6

9 77.0

3

77.3

5

75.9

8

76.6

8

74.3

2

78.1

7 79.9

1

79.4

6

79.3

8

82.1

1

UA

S

θ0 (Full trees) θ3 (Full+dense)

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Results on European Languages

German Spanish French Italian Portuguese Swedish

70

75

80

85

70.5

8

75.6

9 77.0

3

77.3

5

75.9

8

76.6

8

74.3

2

78.1

7 79.9

1

79.4

6

79.3

8

82.1

1

79.6

8 80.8

6 82.7

2

83.6

7

82.0

7

84.0

6

UA

S

θ0 (Full trees) θ3 (Full+dense) θ3 (voting)

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Results on European Languages (Comparison)

German Spanish French Italian Portuguese Swedish

75

80

85

74.3 7

5.5

3

76.5

3 77.7

4

76.6

5

79.2

7

79.6

8 80.8

6

82.7

2

83.6

7

82.0

7

84.0

6

81.6

5

83.9

2

83.5

1

85.4

7

85.6

7

85.5

9

UA

S

[Ma and Xia, 2014] θ3 (voting) Sup. MST-1st

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Comparison to Previous Work

MPH

11

ZB

15

MX

14

θ0

θ3

θ3

(voti

ng)

1st

-ord

Sh-R

70

75

80

85

90

95

71.34

75.476.67

84.29

87.5

dependency

acc

ura

cy(a

vg.

over

6EU

languages)

[McDonald et al., 2011]

[Zhang and Barzilay, 2015]

[Ma and Xia, 2014]

[McDonald et al., 2005]

[Rasooli and Tetreault, 2015]

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Previous work Supervised models

Comparison to Previous Work

MPH

11

ZB

15

MX

14

θ0

θ3

θ3

(voti

ng)

1st

-ord

Sh-R

70

75

80

85

90

95

71.34

75.476.67

75.88

78.89

84.29

87.5

dependency

acc

ura

cy(a

vg.

over

6EU

languages)

[McDonald et al., 2011]

[Zhang and Barzilay, 2015]

[Ma and Xia, 2014]

θ0 (full trees)

θ3 (full+dense)

[McDonald et al., 2005]

[Rasooli and Tetreault, 2015]

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Previous work Supervised models

2.2% absolute improvement

Comparison to Previous Work

MPH

11

ZB

15

MX

14

θ0

θ3

θ3

(voti

ng)

1st

-ord

Sh-R

70

75

80

85

90

95

71.34

75.476.67

75.88

78.89

82.18

84.29

87.5

dependency

acc

ura

cy(a

vg.

over

6EU

languages)

[McDonald et al., 2011]

[Zhang and Barzilay, 2015]

[Ma and Xia, 2014]

θ0 (full trees)

θ3 (full+dense)

θ3 (voting)

[McDonald et al., 2005]

[Rasooli and Tetreault, 2015]

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Previous work Supervised models

5.5% absolute improvement

Overview

The learning algorithm

Results

Analysis

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Accuracy of Full Trees

The accuracy of full trees is high.

Voting increases the number of words per sentence,number of sentences and accuracy of full trees.

Setting English→target Voting

Sen# 17K 77KWord/sen 6.8 10.4Prec. vs supervised 84.7 89.0

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Density of Partial Trees in Voting

The length and number of sentences are increased inpartial dense trees.

The accuracy of partial trees are lower than full trees.

Setting P100 P80 ∪ P≥7

Sen# 77K 243KDeps# 10.4 13.7Words/sen 10.4 27.6Density 100% 50%Prec. vs supervised 89.0 84.7

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Accuracy across Different Languages

LanguageP100 P80 ∪ P≥7 Sup.

#sen words/sen #dep Prec. #sen words/sen #dep Prec.German 47K 8.2 8.2 91.4 75K 23.5 10.8 84.5 85.34Spanish 109K 12.1 12.1 89.2 346K 28.5 17.0 86.1 86.69French 78K 11.7 11.7 91.2 303K 29.9 14.9 87.4 86.24Italian 101K 12.4 12.4 87.9 301K 28.5 15.2 84.5 88.83Portuguese 39K 8.8 8.8 85.8 222K 30.3 12.4 81.3 89.44Swedish 86K 9.5 9.5 88.8 211K 25.2 12.2 84.2 88.06Average 77K 10.4 10.4 89.0 243K 27.6 13.7 84.7 87.50

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Conclusion

We showed the utility of dense structures in projecteddependencies.

We showed a simple and effective learning method toutilize dense structures.

Our performance is very close to a supervised parser.

Future work:

Applying to a broader set of languages.Using this model to improve machine translation.

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

Thanks

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

References I

Goldberg, Y. and Nivre, J. (2013).Training deterministic parsers with non-deterministic oracles.TACL, 1:403–414.

Ma, X. and Xia, F. (2014).Unsupervised dependency parsing with transferring distribution via parallelguidance and entropy regularization.In Proceedings of the 52nd Annual Meeting of the Association for ComputationalLinguistics (Volume 1: Long Papers), pages 1337–1348, Baltimore, Maryland.Association for Computational Linguistics.

McDonald, R., Pereira, F., Ribarov, K., and Hajic, J. (2005).Non-projective dependency parsing using spanning tree algorithms.In Proceedings of the Conference on Human Language Technology andEmpirical Methods in Natural Language Processing, HLT ’05, pages 523–530,Stroudsburg, PA, USA. Association for Computational Linguistics.

McDonald, R., Petrov, S., and Hall, K. (2011).Multi-source transfer of delexicalized dependency parsers.In Proceedings of the 2011 Conference on Empirical Methods in NaturalLanguage Processing, pages 62–72, Edinburgh, Scotland, UK. Association forComputational Linguistics.

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers

References II

Rasooli, M. S. and Tetreault, J. (2015).Yara parser: A fast and accurate dependency parser.arXiv preprint arXiv:1503.06733.

Zhang, Y. and Barzilay, R. (2015).Hierarchical low-rank tensors for multilingual transfer parsing.In Conference on Empirical Methods in Natural Language Processing (EMNLP),Lisbon, Portugal.

Mohammad Sadegh Rasooli, Michael Collins Density-Driven Cross-Lingual Transfer of Dependency Parsers