Context Free Grammar

MT Study Group - Context Free Grammar -
2015/06/18, AHCLab M1, Makoto Morishita
You can download my slides from http://goo.gl/uO2NVU


Quick Overview of Tree-to-String Machine Translation

Why we use Tree-to-String Machine Translation

Phrase-Based Machine Translation
• English ↔ French → Similar word order & vocabulary
• English ↔ Japanese → Different word order & vocabulary

Tree-to-String Machine Translation
• English ↔ French → Similar word order & vocabulary
• English ↔ Japanese → Different word order & vocabulary

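How Tree-to-String works

[Figure: the source string 友達とご飯を食べた is segmented as 友達 と ご飯 を 食べ た and parsed into a tree (nodes N, P, PP, V, SUF, VP). Target-side fragments attached to the tree nodes, such as "a friend", "a meal", "x1 with x0", and "ate", produce the target string "I ate a meal with a friend".]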

How Tree-to-Tree works

[Figure: the source string "he visited the white house" with its parse tree (nodes S, NP, VP, PRP, VBD, DT, NNP) is mapped to the target string 彼 は ホワイト ハウス を 訪問 した with its own parse tree (nodes S, NP, PP, VP, N, P, V).]

Tree-to-String Pros and Cons

Pros
• Works well between languages with different word order & vocabulary

Cons
• Translation takes a lot of time
• Translation accuracy depends on the parser's result

You can try Tree-to-String Machine Translation (Travatar): http://ahclab.naist.jp/travatar/translator/

Context Free Grammar

Motivation

Context Free Grammar (CFG) is used for…

• Parsing

• Tree-to-String Machine Translation

A CFG is composed of…

• Terminal symbols: X

• Non-terminal symbols: N

• Start variable: S

• Rules: R

• Weight of rules: A

CFG Example

Rules
• S → NP VP | VP
• VP → NP V | PP V | VP NP
• NP → PP NP | N P
• PP → N P
• P → が | は | の | に | を (particles)
• V → 開けた ("opened") | 座った ("sat")
• N → 犬 ("dog") | ドア ("door") | 本 ("book")

Non-Terminal
• S, VP, NP, PP, P, V, N

Terminal
• が, は, の, に, を, 開けた, 座った, 犬, ドア, 本

Start Variable
• S

Derivation

S ⇒ NP VP
⇒ N P VP
⇒ 犬 P VP
⇒ 犬 が VP
⇒ 犬 が NP V
⇒ 犬 が N P V
⇒ 犬 が ドア P V
⇒ 犬 が ドア を V
⇒ 犬 が ドア を 開けた ("The dog opened the door")
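The derivation above can be replayed mechanically. The following is a minimal sketch, assuming the grammar is stored as a plain dictionary and that the leftmost non-terminal is always rewritten first; the names `RULES`, `leftmost_derivation`, and `CHOICES` are illustrative and not from the slides.

```python
# Grammar from the CFG example; each non-terminal maps to its alternatives.
RULES = {
    "S": [["NP", "VP"], ["VP"]],
    "VP": [["NP", "V"], ["PP", "V"], ["VP", "NP"]],
    "NP": [["PP", "NP"], ["N", "P"]],
    "PP": [["N", "P"]],
    "P": [["が"], ["は"], ["の"], ["に"], ["を"]],
    "V": [["開けた"], ["座った"]],
    "N": [["犬"], ["ドア"], ["本"]],
}


def leftmost_derivation(start, choices):
    """Rewrite the leftmost non-terminal once per entry in `choices`.

    `choices` lists which alternative (0-based) to use at each step.
    Returns every sentential form, starting with [start].
    """
    form = [start]
    steps = [list(form)]
    for choice in choices:
        # Position of the leftmost symbol that still has rules (a non-terminal).
        i = next(k for k, sym in enumerate(form) if sym in RULES)
        form = form[:i] + RULES[form[i]][choice] + form[i + 1:]
        steps.append(list(form))
    return steps


# Hypothetical choice sequence that reproduces the derivation on the slide.
CHOICES = [0, 1, 0, 0, 0, 1, 1, 4, 0]

for step in leftmost_derivation("S", CHOICES):
    print("⇒", " ".join(step))
```

Each entry of `CHOICES` picks one alternative of the rule being applied, so a different list yields a different derivation from the same grammar.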

Derivation Tree

(S (NP (N 犬) (P が)) (VP (NP (N ドア) (P を)) (V 開けた)))

Parsing

CKY Algorithm (a.k.a. CYK Algorithm)
• One of the most famous parsing algorithms
• The grammar is assumed to be in Chomsky Normal Form (CNF)

Chomsky Normal Form (CNF)

• The right side of every rule has 2 non-terminal symbols or 1 terminal symbol

In CNF (2 non-terminals): S → NP VP, S → PRP VP, VP → VBD NP
In CNF (1 terminal): PRP → "I", VBD → "saw", DT → "a"
Not in CNF: VP → VBD NP PP (3 non-terminals), NP → NN, NP → PRP (1 non-terminal)
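The CNF condition is easy to check per rule. Here is a small sketch, assuming each rule is given as a pair of a left-hand side and a list of right-hand-side symbols; `is_cnf_rule` and `NONTERMINALS` are illustrative names, not part of the slides.

```python
# Non-terminals used in the example rules on this slide.
NONTERMINALS = {"S", "NP", "VP", "PP", "PRP", "VBD", "DT", "NN"}


def is_cnf_rule(rule, nonterminals=NONTERMINALS):
    """True if the right side has 2 non-terminals or exactly 1 terminal."""
    _lhs, rhs = rule
    if len(rhs) == 2:
        return all(sym in nonterminals for sym in rhs)
    if len(rhs) == 1:
        return rhs[0] not in nonterminals
    return False


EXAMPLE_RULES = [
    ("S", ["NP", "VP"]), ("S", ["PRP", "VP"]), ("VP", ["VBD", "NP"]),
    ("PRP", ["I"]), ("VBD", ["saw"]), ("DT", ["a"]),
    ("VP", ["VBD", "NP", "PP"]), ("NP", ["NN"]), ("NP", ["PRP"]),
]

for lhs, rhs in EXAMPLE_RULES:
    status = "CNF" if is_cnf_rule((lhs, rhs)) else "not CNF"
    print(f"{lhs} → {' '.join(rhs)}: {status}")
```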

CKY Algorithm Example

Input sentence: I saw him

• Expand terminal symbols with scores
  PRP0,1 = 1.0, NP0,1 = 0.5, VP1,2 = 3.2, VBD1,2 = 1.4, PRP2,3 = 2.4, NP2,3 = 2.6

• Expand all nodes of 0,2
  SBAR0,2 = 5.3, S0,2 = 4.7

• Expand all nodes of 1,3
  VP1,3 = 5.0

• Expand all nodes of 0,3
  SBAR0,3 = 6.1, S0,3 = 5.9

• Find the S that covers the whole sentence, and expand all of its edges

• Result
  S0,3 (5.9) → NP0,1 (0.5) VP1,3 (5.0)
  VP1,3 (5.0) → VBD1,2 (1.4) NP2,3 (2.6)
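The chart-filling steps can be sketched in a few lines. This is a minimal sketch, not the implementation behind the slides: it assumes the scores are costs where lower is better, and the grammar in `LEXICAL` and `BINARY` (including the rule costs 1.0 and 0.4) is hypothetical, chosen only so that the result matches the final chart.

```python
from collections import defaultdict

# Hypothetical CNF grammar; costs are assumed to be "lower is better".
LEXICAL = {  # word -> [(non-terminal, cost)]
    "I": [("PRP", 1.0), ("NP", 0.5)],
    "saw": [("VBD", 1.4)],
    "him": [("PRP", 2.4), ("NP", 2.6)],
}
BINARY = [  # (parent, left child, right child, rule cost)
    ("VP", "VBD", "NP", 1.0),
    ("S", "NP", "VP", 0.4),
]


def cky(words):
    """Fill the chart bottom-up; chart[(i, j)][X] = (best cost, backpointer)."""
    n = len(words)
    chart = defaultdict(dict)
    # Width-1 spans: expand the terminal symbols.
    for i, word in enumerate(words):
        for label, cost in LEXICAL.get(word, []):
            chart[(i, i + 1)][label] = (cost, word)
    # Wider spans: try every split point k and every binary rule.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for parent, left, right, cost in BINARY:
                    if left in chart[(i, k)] and right in chart[(k, j)]:
                        total = chart[(i, k)][left][0] + chart[(k, j)][right][0] + cost
                        best = chart[(i, j)].get(parent)
                        if best is None or total < best[0]:
                            chart[(i, j)][parent] = (total, (k, left, right))
    return chart


def build_tree(chart, label, i, j):
    """Follow backpointers from a chart entry to a bracketed tree string."""
    _cost, back = chart[(i, j)][label]
    if isinstance(back, str):  # lexical entry: the backpointer is the word
        return f"({label} {back})"
    k, left, right = back
    return f"({label} {build_tree(chart, left, i, k)} {build_tree(chart, right, k, j)})"


chart = cky("I saw him".split())
print(build_tree(chart, "S", 0, 3), round(chart[(0, 3)]["S"][0], 1))
```

The triple loop over width, start position, and split point is what makes CKY cubic in the sentence length; restricting the grammar to CNF is what keeps each cell's work limited to binary combinations.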

Another expression of tree

• By using a deduction system, the tree can be expressed in this way

犬 [N, 0, 1]
が [P, 1, 2]
[NP, 0, 2]
ドア [N, 2, 3]
を [P, 3, 4]
[NP, 2, 4]
開けた [V, 4, 5]
[VP, 2, 5]
[S, 0, 5]

Tree: (S (NP (N 犬) (P が)) (VP (NP (N ドア) (P を)) (V 開けた)))
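Items of the form [label, start, end] can be read off a tree by tracking word positions. The sketch below assumes the tree is written as nested tuples; `tree_to_items` and `TREE` are illustrative names, not from the slides.

```python
def tree_to_items(tree, start=0):
    """Return (items, end): one (label, i, j) item per node, children first."""
    label, children = tree[0], tree[1:]
    # Pre-terminal node: a label directly above one word covers one position.
    if len(children) == 1 and isinstance(children[0], str):
        return [(label, start, start + 1)], start + 1
    items, pos = [], start
    for child in children:
        child_items, pos = tree_to_items(child, pos)
        items.extend(child_items)
    items.append((label, start, pos))
    return items, pos


TREE = ("S",
        ("NP", ("N", "犬"), ("P", "が")),
        ("VP", ("NP", ("N", "ドア"), ("P", "を")), ("V", "開けた")))

items, length = tree_to_items(TREE)
for label, i, j in items:
    print(f"[{label}, {i}, {j}]")
```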

Hyper-Graph

Suppose that there are 2 trees for 犬 が ドア を 開けた

• Tree 1: (S (NP (N 犬) (P が)) (VP (NP (N ドア) (P を)) (V 開けた)))
• Tree 2: (S (NP (N 犬) (P が)) (VP (PP (N ドア) (P を)) (V 開けた)))

The two trees are almost the same; they differ only in the node above ドア を (NP in Tree 1, PP in Tree 2).

• Keep only the edges that are the same in both trees
• Add the edges that exist only in Tree 1 (through NP)
• Add the edges that exist only in Tree 2 (through PP)
• The result contains the edges of both Tree 1 and Tree 2

If we select the blue edges (NP), the tree will be Tree 1. If we select the orange edges (PP), the tree will be Tree 2.

This is also called a Parse Forest.

A Weighted Acyclic Directed Hyper-Graph is composed of…

• Vertices: V

• Hyper-edges: E

• Target vertex: t

• Weight of edges: A

Hyper-edge

e ∈ E = <tails(e), head(e), ω(e)>

• tails(e) ➡ list of start points

• head(e) ➡ end point

• ω(e) ➡ weight of e

• in(v) = {e ∈ E | head(e) = v} ➡ set of hyper-edges which go into v

• out(v) = {e ∈ E | v ∈ tails(e)} ➡ set of hyper-edges which go out of v
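These definitions map directly onto a small data structure. The following is a sketch under assumptions: the class and method names are illustrative (`incoming` and `outgoing` stand for in and out, since `in` is a Python keyword), and the edge weights are made-up values for the parse forest of 犬 が ドア を 開けた.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class HyperEdge:
    tails: tuple   # list of start points
    head: str      # end point
    weight: float  # ω(e)


@dataclass
class HyperGraph:
    edges: list = field(default_factory=list)

    def incoming(self, v):
        """in(v): hyper-edges whose head is v."""
        return [e for e in self.edges if e.head == v]

    def outgoing(self, v):
        """out(v): hyper-edges that have v among their tails."""
        return [e for e in self.edges if v in e.tails]


# Parse forest from the slides; all weights are hypothetical.
forest = HyperGraph([
    HyperEdge(("N0,1", "P1,2"), "NP0,2", 1.0),
    HyperEdge(("N2,3", "P3,4"), "NP2,4", 0.6),
    HyperEdge(("N2,3", "P3,4"), "PP2,4", 0.4),
    HyperEdge(("NP2,4", "V4,5"), "VP2,5", 0.5),
    HyperEdge(("PP2,4", "V4,5"), "VP2,5", 0.5),
    HyperEdge(("NP0,2", "VP2,5"), "S0,5", 1.0),
])

print([e.tails for e in forest.incoming("VP2,5")])
print([e.head for e in forest.outgoing("N2,3")])
```

A vertex with more than one incoming hyper-edge, such as VP2,5 here, is exactly where the forest encodes a choice between trees.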

Another expression of Hyper-Graph

A directed hyper-graph can also be expressed as a directed and/or graph

• every vertex → or-vertex (∨)

• every hyper-edge → and-vertex (∧)

[Figure: the parse forest for 犬 が ドア を 開けた drawn as a hyper-graph (vertices S0,5, NP0,2, VP2,5, N0,1, P1,2, NP2,4, PP2,4, V4,5, N2,3, P3,4 above the words 犬0,1 が1,2 ドア2,3 を3,4 開けた4,5) and as the equivalent and/or graph, where each vertex is marked ∨ and each hyper-edge becomes an ∧ node.]

Why we use a Hyper-Graph

• A hyper-graph can easily express many different parse trees at once

Semiring Parsing

Semiring

Examples of semirings

Semiring | A | ⊕ | ⊗ | 0 | 1
Boolean | {0, 1} | ∨ | ∧ | 0 | 1
Real | [-∞, +∞] | + | × | 0 | 1
Tropical | [-∞, +∞] | max | + | -∞ | 0
LogProb | [-∞, 0] | logsumexp | + | -∞ | 0
LogReal | {-, +} × [-∞, +∞] | + LogReal | × LogReal | <+, -∞> | <+, 0>
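To make the table concrete, each semiring can be written as a bundle of two operations and two identity elements. This is a minimal sketch with illustrative names (`Semiring`, `total`); LogReal is left out because it needs signed log-values, and the example weights are made up.

```python
import math
from dataclasses import dataclass
from functools import reduce
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]   # ⊕
    times: Callable[[Any, Any], Any]  # ⊗
    zero: Any                         # identity of ⊕
    one: Any                          # identity of ⊗


def logsumexp(a, b):
    """log(exp(a) + exp(b)), computed without overflow."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


BOOLEAN = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)
REAL = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
TROPICAL = Semiring(max, lambda a, b: a + b, -math.inf, 0.0)
LOGPROB = Semiring(logsumexp, lambda a, b: a + b, -math.inf, 0.0)


def total(semiring, derivations):
    """⊕ over derivations of the ⊗ of each derivation's edge weights."""
    result = semiring.zero
    for weights in derivations:
        result = semiring.plus(result, reduce(semiring.times, weights, semiring.one))
    return result


# Two hypothetical derivations, each given as a list of edge weights.
probs = [[0.5, 0.6], [0.5, 0.4]]
logs = [[math.log(w) for w in d] for d in probs]
print(total(REAL, probs))                 # sum over both derivations
print(math.exp(total(TROPICAL, logs)))    # weight of the best derivation
print(math.exp(total(LOGPROB, logs)))     # same sum, computed in log space
```

Swapping the semiring changes what the same traversal computes, which is the point of writing the parser against ⊕ and ⊗ rather than against + and ×.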

Semiring Parsing

• D(G): all derivations of hyper-graph G

• d ∈ D(G): a derivation, i.e. a set of hyper-edges

• ω(d) = ⊗_{e∈d} ω(e): weight of d

• ω(G) = ⊕_{d∈D(G)} ⊗_{e∈d} ω(e) = ⊕_{d∈D(G)} ω(d): sum of the weights of G

Real, LogReal Semiring → sum of weights
Tropical Semiring → weight of the Viterbi derivation

• Enumerating all derivations is difficult ➡ use the inside-outside algorithm
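The reason enumeration can be avoided is that ω(G) factorizes over vertices. Below is a sketch of the inside pass only (the outside pass is omitted), assuming hyper-edges are listed in topological order; the function name `inside` and all weights are hypothetical.

```python
import math


def inside(edges, leaves, plus, times, zero, one):
    """Compute, for every vertex, the ⊕-sum over its derivations.

    `edges` is a list of (tails, head, weight). It must be ordered so that
    all hyper-edges into a vertex come before any hyper-edge that uses that
    vertex as a tail (topological order).
    """
    score = {v: one for v in leaves}
    for tails, head, weight in edges:
        value = weight
        for t in tails:
            value = times(value, score[t])
        score[head] = plus(score.get(head, zero), value)
    return score


LEAVES = ["N0,1", "P1,2", "N2,3", "P3,4", "V4,5"]
# Parse forest from the slides with made-up probabilities as weights.
EDGES = [
    (("N0,1", "P1,2"), "NP0,2", 1.0),
    (("N2,3", "P3,4"), "NP2,4", 0.6),
    (("N2,3", "P3,4"), "PP2,4", 0.4),
    (("NP2,4", "V4,5"), "VP2,5", 0.5),
    (("PP2,4", "V4,5"), "VP2,5", 0.5),
    (("NP0,2", "VP2,5"), "S0,5", 1.0),
]
LOG_EDGES = [(t, h, math.log(w)) for t, h, w in EDGES]

# Real semiring: sum of the weights of all derivations.
real = inside(EDGES, LEAVES, lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
# Tropical semiring on log weights: weight of the Viterbi derivation.
tropical = inside(LOG_EDGES, LEAVES, max, lambda a, b: a + b, -math.inf, 0.0)

print(real["S0,5"], math.exp(tropical["S0,5"]))
```

Each hyper-edge is visited once, so the cost is linear in the size of the forest even though the number of derivations can grow exponentially.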

K-Best

K-Best (a.k.a. N-Best)

Given a Parse Forest (with weights):
• What is the best tree? (1-best)
• What is the second-best tree? (2-best)
• What is the k-th best tree? (k-best)

[Figure: parse forest with vertices S0,5, NP0,2, VP2,5, N0,1, P1,2, N1,2, NP2,4, PP2,4, V4,5.]

Each derivation is written as a hyper-edge together with the ranks of the sub-derivations chosen at its tails.

• Left side (NP0,2)
  1-best: <<{N0,1, P1,2}, NP0,2>, (1,1)>
  2-best: <<{N0,1, N1,2}, NP0,2>, (1,1)>
  …

• Right side (VP2,5)
  1-best: <<{NP2,4, V4,5}, VP2,5>, (1,1)>
  2-best: <<{PP2,4, V4,5}, VP2,5>, (1,1)>
  …

• Derivation <<{NP0,2, VP2,5}, S0,5>, (1,2)>
  This combines the 1-best of the left side with the 2-best of the right side, i.e. the tree that goes through PP2,4.

• D(v) = {D1(v), …, Dk(v)}: K-Best list of v

• Finding D(root(G)) is difficult
  ‣ We need to find the k-best of every node, and sort…
  ‣ It's a heavy calculation…

➡ There is an efficient algorithm!

K-Best

• D(NP0,2) = {-0.8, -1.6, -2.4, -3.2}
• D(VP2,5) = {-0.5, -1.5, -2.5}
• ω(<{NP0,2, VP2,5}, S0,5>) = -0.3

All combinations (each cell = D(NP0,2) entry + D(VP2,5) entry + ω):

D(NP0,2) \ D(VP2,5) | -0.5 | -1.5 | -2.5
-0.8 | -1.6 | -2.6 | -3.6
-1.6 | -2.4 | -3.4 | -4.4
-2.4 | -3.2 | -4.2 | -5.2
-3.2 | -4.0 | -5.0 | -6.0

Instead of filling the whole table, compute only the neighbors of the cells selected so far and keep them in cand(v):

• 1-best: -1.6 is selected; cand(v) = {-2.6, -2.4}
• 2-best: -2.4 is selected; -3.4 and -3.2 are added, so cand(v) = {-2.6, -3.4, -3.2}
• 3-best: -2.6 is selected; -3.6 is added, so cand(v) = {-3.4, -3.2, -3.6}

We can omit a lot of calculation: only 6 of the 12 cells are computed to get the 3-best.
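The frontier expansion for a single hyper-edge with two tails can be sketched with a priority queue. This is a simplified illustration under assumptions, not the full algorithm referred to in the slides: it assumes both input lists are already sorted from best to worst, and the name `lazy_k_best` is hypothetical.

```python
import heapq


def lazy_k_best(left, right, edge_weight, k):
    """Return the k best sums left[i] + right[j] + edge_weight.

    `left` and `right` must be sorted from best (largest) to worst.
    Only neighbours of already selected cells are ever scored.
    Returns (best, scored) where `best` holds (score, 1-based ranks)
    and `scored` is the number of cells that were computed.
    """
    # heapq is a min-heap, so scores are negated to pop the largest first.
    heap = [(-(left[0] + right[0] + edge_weight), 0, 0)]
    seen = {(0, 0)}
    best = []
    while heap and len(best) < k:
        neg_score, i, j = heapq.heappop(heap)
        best.append((-neg_score, (i + 1, j + 1)))
        # Push the cell below and the cell to the right into cand(v).
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(left) and nj < len(right) and (ni, nj) not in seen:
                seen.add((ni, nj))
                score = left[ni] + right[nj] + edge_weight
                heapq.heappush(heap, (-score, ni, nj))
    return best, len(seen)


D_NP = [-0.8, -1.6, -2.4, -3.2]  # D(NP0,2)
D_VP = [-0.5, -1.5, -2.5]        # D(VP2,5)

best, scored = lazy_k_best(D_NP, D_VP, -0.3, k=3)
for score, ranks in best:
    print(round(score, 1), ranks)
print("cells scored:", scored, "of", len(D_NP) * len(D_VP))
```

The ranks returned with each score use the same convention as the derivation notation, so (1, 2) means the 1-best of NP0,2 combined with the 2-best of VP2,5.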

Conclusion

• Why we use Tree-to-String Machine Translation: English ↔ French have similar word order & vocabulary, while English ↔ Japanese have different word order & vocabulary

• How Tree-to-String works

• CFG Example: derivation and derivation tree

• CKY Algorithm Example: find the S that covers the whole sentence, and expand all of its edges

• Hyper-Graph: adding the edges of both Tree 1 and Tree 2 gives a Parse Forest

• K-Best: what are the best, the second-best, and the k-th best tree in a weighted Parse Forest?

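References

• COLING2012 参加報告(その3)– 木構造に基づく機械翻訳 – [COLING 2012 Participation Report (Part 3): Tree-Based Machine Translation], 中澤 敏明 (Toshiaki Nakazawa), http://www.anlp.jp/doc/IC/ICreportv02/coling2012-3.pdf

• ALAGIN 機械翻訳セミナー 統語情報に基づく機械翻訳 [ALAGIN Machine Translation Seminar: Syntax-Based Machine Translation], Graham Neubig, http://www.phontron.com/slides/alagin2014-syntax.pdf

• 機械翻訳 [Machine Translation], Graham Neubig, http://www.phontron.com/slides/neubig-alagin-20130117.pdf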

Questions & Comments