University of Alberta Letter-to-phoneme conversion Sittichai Jiampojamarn [email protected] CMPUT...

University of Alberta

Letter-to-phoneme conversion

Sittichai Jiampojamarn [email protected]

CMPUT 500 / HUCO 612
September 26, 2007


Outline

• Part I
– Introduction to letter-to-phoneme conversion

• Part II
– Applying many-to-many alignments and Hidden Markov Models to letter-to-phoneme conversion, NAACL 2007

• Part III
– On-going work: discriminative approaches for letter-to-phoneme conversion

• Part IV
– Possible term projects for CMPUT 500 / HUCO 612


The task

• Converting words to their pronunciations
– study -> [ s t ʌ d I ]
– band -> [ b æ n d ]
– phoenix -> [ f i n I k s ]
– king -> [ k I ŋ ]

• Words are sequences of letters.
• Pronunciations are sequences of phonemes.
– Ignoring syllabification and stress.


Why is it important?

• Major component in speech synthesis systems

• Word similarity based on pronunciation
– Spelling correction (Toutanova and Moore, 2001)

• Linguistic interest in the relationships between letters and phonemes

• Not a trivial task, but tractable.


Trivial solutions?

• Dictionary: look up the answer in a database
– Great effort is needed to construct such a large lexicon database.
– Can’t handle new words and misspellings.

• Rule-based approaches
– Work well on non-complex languages
– Fail on complex languages
  • Each word creates its own rules, so we end up memorizing word-phoneme pairs.


John Kominek and Alan W. Black, “Learning Pronunciation Dictionaries: Language Complexity and Word Selection Strategies”, In Proceedings of HLT-NAACL 2006, June 4-9, pp. 232-239


Learning-based approaches

• Training data
– Examples of words and their phonemes.

• Hidden structure
– band [ b æ n d ]
  • b -> [b], a -> [æ], n -> [n], d -> [d]
– abode [ ə b o d ]
  • a -> [ə], b -> [b], o -> [o], d -> [d], e -> [ _ ]


Alignments

• To train L2P, we need alignments between letters and phonemes

a -> [ə]
b -> [b]
o -> [o]
d -> [d]
e -> [_]

a b o d e

ə b o d _


Overview of the standard process

Training data -> 1-1 aligner -> Aligned data -> Phoneme prediction -> pronunciation


Letter-to-phoneme alignments

• Previous work assumed one-to-one alignment for simplicity (Daelemans and Bosch, 1997; Black et al., 1998; Damper et al., 2005).

• Expectation-Maximization (EM) algorithms are used to optimize the alignment parameters.

• Matching all possible letters and phonemes iteratively until the parameters converge.


1-to-1 alignments

• Initially, the alignment parameters can start from a uniform distribution, or from counts over all possible letter-phoneme mappings. Ex. abode [ə b o d]

a b o d e
_ ə b o d

a b o d e
ə _ b o d

a b o d e
ə b _ o d

a b o d e
ə b o _ d

a b o d e
ə b o d _

P(a, ə) = 4/5
P(b, b) = 3/5
…
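The initial counts in this example can be reproduced by enumerating every 1-to-1 alignment of abode with [ə b o d] that pads the phoneme side with one null. The sketch below is only an illustration: the function names are hypothetical, and each value is the fraction of enumerated alignments that contain the pair, which appears to be how the 4/5 and 3/5 on the slide are obtained.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def one_to_one_alignments(letters, phonemes, null="_"):
    """Yield every monotonic alignment that pads the phonemes with nulls."""
    n_nulls = len(letters) - len(phonemes)
    if n_nulls < 0:
        raise ValueError("more phonemes than letters: null letters are not allowed")
    for null_positions in combinations(range(len(letters)), n_nulls):
        remaining = iter(phonemes)
        yield [
            (letter, null if i in null_positions else next(remaining))
            for i, letter in enumerate(letters)
        ]


def initial_pair_fractions(letters, phonemes):
    """Fraction of all possible alignments in which each pair occurs."""
    alignments = list(one_to_one_alignments(letters, phonemes))
    counts = Counter(pair for alignment in alignments for pair in set(alignment))
    return {pair: Fraction(c, len(alignments)) for pair, c in counts.items()}


pair_fractions = initial_pair_fractions("abode", ["ə", "b", "o", "d"])
print(pair_fractions[("a", "ə")])  # 4/5
print(pair_fractions[("b", "b")])  # 3/5
```

Starting EM from these counts rather than from a uniform distribution already favours the pairs that occur in most candidate alignments.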


1-to-1 alignments

• Find the best possible alignments based on the current alignment parameters.

a b o d e
ə b o d _

• Based on the alignments found, update the parameters.


Finding the best possible alignments

• Dynamic programming:
– In the style of the standard weighted minimum edit distance algorithm.
– The alignment parameter P(l,p) serves as the mapping score component.
– Find the alignments that give the maximum score.
– Allow null phonemes but not null letters
  • It is hard to incorporate null letters in the testing data
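The dynamic program can be sketched as follows, assuming the alignment parameters are given as a dictionary `prob` mapping (letter, phoneme) pairs to probabilities. The function name, the log-score formulation, and the `floor` value for unseen pairs are assumptions of this sketch, not details taken from the slides.

```python
import math


def best_alignment(letters, phonemes, prob, null="_", floor=1e-6):
    """Viterbi-style search: each letter maps to one phoneme or to null."""
    n, m = len(letters), len(phonemes)
    neg_inf = float("-inf")

    def log_score(letter, phoneme):
        # Unseen pairs get a small floor value (an assumption of this sketch).
        return math.log(prob.get((letter, phoneme), floor))

    # best[i][j]: best log-score for the first i letters and first j phonemes.
    best = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(min(i, m) + 1):
            # Option 1: letter i-1 maps to a null phoneme.
            moves = [(best[i - 1][j] + log_score(letters[i - 1], null), (i - 1, j))]
            # Option 2: letter i-1 maps to phoneme j-1.
            if j > 0:
                moves.append((
                    best[i - 1][j - 1] + log_score(letters[i - 1], phonemes[j - 1]),
                    (i - 1, j - 1),
                ))
            best[i][j], back[i][j] = max(moves, key=lambda move: move[0])
    if best[n][m] == neg_inf:
        raise ValueError("no alignment exists without null letters")

    # Follow the back-pointers from the final cell to recover the alignment.
    alignment, i, j = [], n, m
    while i > 0:
        prev_i, prev_j = back[i][j]
        phoneme = null if prev_j == j else phonemes[j - 1]
        alignment.append((letters[i - 1], phoneme))
        i, j = prev_i, prev_j
    return alignment[::-1]


# Toy parameters, invented for illustration only.
toy_prob = {("a", "ə"): 0.8, ("b", "b"): 0.9, ("o", "o"): 0.9,
            ("d", "d"): 0.9, ("e", "_"): 0.7}
print(best_alignment("abode", ["ə", "b", "o", "d"], toy_prob))
```

There is no move that consumes a phoneme without a letter, which is how the sketch enforces "null phonemes but not null letters".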


Visualization

[Figure: dynamic-programming grid with the letters # a b o d e along one axis and the phonemes # ə b o d along the other; the candidate alignments of a b o d e with ə b o d plus one null phoneme (_) are drawn beside the grid.]


Problems with 1-to-1 alignments

• Double letters: two letters map to one phoneme. (e.g. ng [ŋ], sh [ʃ], ph [f])

k i n g

k i ŋ _

k i n g

k i ŋ_

k i n g

k i ŋ


Problems with 1-to-1 alignments

• Double phonemes: one letter maps to two phonemes. (e.g. x [k s], u [j u])

f u m e

f j u m

f u m e

f j u m

_

_

f u m e

f j u m _


Previous solutions for double phonemes

• Preprocess using a fixed list of phonemes.
– [k s] -> [X]
– [j u] -> [U]

f u m e

f j u m

f u m e

f U m

f u m e

f U m _


Applying many-to-many alignments and Hidden Markov Models to Letter-to-Phoneme conversion

Sittichai Jiampojamarn, Grzegorz Kondrak and Tarek Sherif

Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2007), Rochester, NY, April 2007, pp. 372-379.


Overview of the system

[Diagram: Training data, 1-1 aligner, M-M aligner, Aligned data, Chunking prediction, Local prediction, HMM, Phoneme prediction, pronunciation; the boxes are grouped into an alignment process and a prediction process.]


Many-to-many alignments

• EM-based method.

• Extended from the forward-backward training of a one-to-one stochastic transducer (Ristad and Yianilos, 1998).

• Allow one or two letters to map to null, one, or two phonemes.
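To make the search space concrete, the sketch below enumerates every segmentation in which a chunk of one or two letters maps to zero, one, or two phonemes. It only lists the candidates; the EM training itself sums over them with forward-backward rather than enumerating. The function name and the tuple representation are hypothetical.

```python
def many_to_many_alignments(letters, phonemes, max_letters=2, max_phonemes=2):
    """Yield alignments as lists of (letter_chunk, phoneme_chunk) tuples."""
    letters, phonemes = tuple(letters), tuple(phonemes)
    if not letters:
        # A valid alignment must also have consumed every phoneme.
        if not phonemes:
            yield []
        return
    for n_l in range(1, min(max_letters, len(letters)) + 1):
        for n_p in range(0, min(max_phonemes, len(phonemes)) + 1):
            head = (letters[:n_l], phonemes[:n_p])
            for rest in many_to_many_alignments(
                letters[n_l:], phonemes[n_p:], max_letters, max_phonemes
            ):
                yield [head] + rest


candidates = list(many_to_many_alignments("phoenix", ["f", "i", "n", "ɪ", "k", "s"]))
# One of the candidates is ph|oe|n|i|x -> f|i|n|ɪ|k s.
intended = [
    (("p", "h"), ("f",)),
    (("o", "e"), ("i",)),
    (("n",), ("n",)),
    (("i",), ("ɪ",)),
    (("x",), ("k", "s")),
]
print(intended in candidates)  # True
```

An empty phoneme chunk plays the role of the null phoneme, and because every chunk contains at least one letter, null letters are still excluded.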


[Figure: alignment grid for the letters p h o e n i x and the phonemes f i n ɪ k s, with # boundary symbols.]

p h o e n i x
f i n ɪ k s


Prediction problem

• Should the prediction model generate phonemes from one or two letters?

– gash [ g æ ʃ ]
– gasholder [ g æ s h o l d ə r ]

g a sh
g æ ʃ

g a s h o l d e r
g æ s h o l d ə r


Letter chunking

• A bigram letter chunking prediction model automatically discovers double letters.

Ex. longs

l ɒ ŋ z

l o ng s
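A minimal sketch of how per-bigram chunking decisions turn a word into chunks, assuming some classifier has already produced one yes/no decision for each adjacent letter pair. The decisions are written by hand here, and `apply_chunking` is a hypothetical helper, not part of the described system.

```python
def apply_chunking(letters, merge):
    """Merge letters[i] and letters[i + 1] wherever merge[i] is True.

    merge holds one decision per adjacent letter pair, so it has
    len(letters) - 1 entries. Chunks never grow beyond two letters.
    """
    chunks, i = [], 0
    while i < len(letters):
        if i < len(letters) - 1 and merge[i]:
            chunks.append(letters[i] + letters[i + 1])
            i += 2  # both letters are consumed by the chunk
        else:
            chunks.append(letters[i])
            i += 1
    return chunks


# Bigrams of "longs": lo, on, ng, gs. Only "ng" is merged.
print(apply_chunking("longs", [False, False, True, False]))
# Same bigram "sh", different decisions depending on the word.
print(apply_chunking("gash", [False, False, True]))
print(apply_chunking("gasholder", [False] * 8))
```

Making the decision per word position rather than per letter pair is what lets "sh" be one chunk in gash and two letters in gasholder.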


Overview of the system

[Diagram: Training data, 1-1 aligner, M-M aligner, Aligned data, Chunking prediction, Local prediction, HMM, Phoneme prediction, pronunciation; the boxes are grouped into an alignment process and a prediction process.]


Phoneme prediction

• Once the training examples are aligned, we need a phoneme prediction model.

• “Classification task” or “sequence prediction”?

Sequence prediction:
P0 P1 P2 P3
L0 L1 L2 L3

Classification:
# L0 L1 -> P0
L0 L1 L2 -> P1
L1 L2 L3 -> P2
L2 L3 # -> P3
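In the classification view, each letter becomes one instance whose features are the letter and its neighbours, padded with # at the word boundaries. The helper below is a hypothetical sketch that builds those windows; the window size of one letter on each side matches the diagram.

```python
def letter_windows(letters, size=1, pad="#"):
    """Return one context window per letter, padded at the word boundaries."""
    padded = [pad] * size + list(letters) + [pad] * size
    return [tuple(padded[i:i + 2 * size + 1]) for i in range(len(letters))]


for window in letter_windows(["L0", "L1", "L2", "L3"]):
    print(window)
# ('#', 'L0', 'L1')
# ('L0', 'L1', 'L2')
# ('L1', 'L2', 'L3')
# ('L2', 'L3', '#')
```

Each window is then paired with the phoneme aligned to its centre letter to form a training instance.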


Instance based learning

• Store the training examples.

• The predicted class is assigned by searching for the “most similar” training instance.

• Similarity functions:
– Hamming distance, Euclidean distance, etc.

[Figure: a new instance “A” asks “Who do I look like most?”; the stored instances æ, ɑ, and ə each answer “Me!!”]
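A minimal nearest-neighbour sketch using Hamming distance over letter windows. The stored instances are invented for illustration, and this plain 1-nearest-neighbour lookup is a simplification; it is not a description of how any particular package weights features.

```python
def hamming(a, b):
    """Number of positions at which two equal-length windows differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_label(query, instances):
    """Return the label of the stored instance closest to the query.

    instances is a list of (window, phoneme) pairs; ties go to the
    instance that was stored first.
    """
    return min(instances, key=lambda inst: hamming(query, inst[0]))[1]


# Hypothetical stored windows for the letter "a" with different phonemes.
stored = [
    (("#", "a", "b"), "ə"),
    (("b", "a", "n"), "æ"),
    (("f", "a", "r"), "ɑ"),
]
print(nearest_label(("c", "a", "n"), stored))  # æ
```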


Basic HMMs

• A basic sequence-based prediction method.

• In L2P,
– letters are observations
– phonemes are states

• Output phoneme sequences depend on both emission and transition probabilities.


Applying HMM

• Use instance-based learning to produce a list of candidate phones with confidence values conf(phone_i) for each letter_i (emission probability).

• Use a language model of phoneme sequences in the training data to obtain the transition probability P(phone_i | phone_{i-1}, …, phone_{i-n}).


Visualization

[Figure: lattice for “buried” with letter/phoneme nodes b/b, u/E, r/r, i/aI or i/I, e/_, d/d; the edge probabilities shown are 0.048, 0.067, 0.003, 0.700, 0.008, 0.014, and 0.433.]

Conf( i / aI ) = 0.714
Conf( i / I ) = 0.286

Buried -> [ b E r aI d ] = 2.38 x 10^-8
Buried -> [ b E r I d ] = 2.23 x 10^-6
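The way confidences and transitions combine can be sketched with a small Viterbi search over the candidate lists. The two confidence values for the letter i come from the slide; everything else is an assumption of this sketch: the function name, the bigram-only transition table, its toy probabilities, the confidence of 1.0 for unambiguous letters, and the choice to treat the null phoneme "_" as an ordinary symbol.

```python
def viterbi(candidates, transition, start="#", floor=1e-6):
    """Return (score, phoneme sequence) of the best path.

    candidates: one dict {phoneme: confidence} per letter.
    transition: dict {(previous_phoneme, phoneme): probability}.
    """
    # paths[p] holds the best (score, sequence) ending in phoneme p.
    paths = {start: (1.0, [])}
    for options in candidates:
        new_paths = {}
        for phoneme, conf in options.items():
            new_paths[phoneme] = max(
                (
                    (score * transition.get((prev, phoneme), floor) * conf,
                     sequence + [phoneme])
                    for prev, (score, sequence) in paths.items()
                ),
                key=lambda path: path[0],
            )
        paths = new_paths
    return max(paths.values(), key=lambda path: path[0])


buried_candidates = [
    {"b": 1.0}, {"E": 1.0}, {"r": 1.0},
    {"aI": 0.714, "I": 0.286},  # confidences from the slide
    {"_": 1.0}, {"d": 1.0},
]
toy_transition = {  # invented values for illustration
    ("#", "b"): 0.05, ("b", "E"): 0.07, ("E", "r"): 0.4,
    ("r", "aI"): 0.01, ("r", "I"): 0.7,
    ("aI", "_"): 0.5, ("I", "_"): 0.5, ("_", "d"): 0.5,
}
score, phonemes = viterbi(buried_candidates, toy_transition)
print(phonemes)
```

With these toy numbers the local classifier prefers aI, but the transition scores make the path through I win, which mirrors the comparison of the two "buried" pronunciations.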


Evaluation

• Data sets
– English: CMUDict (112K), Celex (65K).
– Dutch: Celex (116K).
– German: Celex (49K).
– French: Brulex (27K).

• The IB1 algorithm implemented in the TiMBL package is used as the classifier (Daelemans et al., 2004).

• Results are reported as word accuracy rates based on 10-fold cross validation.
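Word accuracy counts a word as correct only when its entire predicted phoneme sequence matches the reference. A minimal sketch, with a hypothetical function name and invented example data:

```python
def word_accuracy(predictions, references):
    """Fraction of words whose whole phoneme sequence is correct."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    correct = sum(list(p) == list(r) for p, r in zip(predictions, references))
    return correct / len(references)


predicted = [["b", "E", "r", "I", "d"], ["k", "I", "n", "g"]]
gold = [["b", "E", "r", "I", "d"], ["k", "I", "ŋ"]]
print(word_accuracy(predicted, gold))  # 0.5
```

Under 10-fold cross validation this value would be computed on each held-out fold and then averaged.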


[Figure: bar chart of word accuracy (axis from 50 to 95) on CMUDict, Eng. Celex, Dutch Celex, German Celex, and French Brulex, comparing 1-1 alignments, 1-1 alignments + HMM, and M-M alignments.]


Messages

• Many-to-many alignments show significant improvements over traditional one-to-one alignments.

• The HMM-like approach helps when a local classifier has difficulty predicting phonemes.


Criticism

• Joint models
– Alignments, chunking, prediction, and HMM.

• Error propagation
– Errors propagate from one model to the next and are unlikely to be corrected.

• Can we combine and optimize everything at once? Or at least allow the system to correct past errors?


On-going work

Discriminative approach for letter-to-phoneme conversion


Online discriminative learning

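• Let x be an input word and y an output phoneme sequence.

• $\Phi(x, y)$ represents features describing x and y.

• $\alpha$ is a weight vector for $\Phi(x, y)$.


Online training algorithm

1. Initially, $\alpha = 0$.

2. For k iterations:
   1. For all letter-phoneme sequence pairs (x, y):
      1. $\hat{y} = \arg\max_{y'} \alpha \cdot \Phi(x, y')$
      2. Update the weights $\alpha$ according to $\hat{y}$ and $y$.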


Perceptron update (Collins, 2002)

• Simple update training method.

• Try to move the weights in the direction of the correct answer whenever the prediction is wrong.
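A minimal sketch of perceptron training for this setting. Several pieces are assumptions made for illustration: the feature map simply counts (letter, phoneme) pairs of an already aligned sequence, and `candidates` is a hypothetical function that stands in for the search over possible outputs by returning a short list to choose from.

```python
from collections import Counter


def features(x, y):
    """Hypothetical feature map: counts of (letter, phoneme) pairs."""
    return Counter(zip(x, y))


def score(weights, feats):
    """Dot product between the weight vector and a feature vector."""
    return sum(weights[f] * v for f, v in feats.items())


def train_perceptron(data, candidates, iterations=5):
    """Additive updates toward the gold output on every mistake."""
    weights = Counter()
    for _ in range(iterations):
        for x, y in data:
            y_hat = max(candidates(x), key=lambda c: score(weights, features(x, c)))
            if y_hat != y:
                weights.update(features(x, y))        # reward gold features
                weights.subtract(features(x, y_hat))  # penalize predicted ones
    return weights


word = ("k", "i", "n", "g")
gold_output = ("k", "I", "ŋ", "_")
wrong_output = ("k", "aI", "n", "g")
weights = train_perceptron([(word, gold_output)], lambda x: [wrong_output, gold_output])
print(weights[("i", "I")], weights[("i", "aI")])  # 1 -1
```

Features shared by the gold and predicted outputs, such as (k, k) here, cancel out, so only the features on which the two outputs disagree are changed.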


Examples

• Separable case

Adapted from Dan Klein’s tutorial slides at NAACL 2007.


Examples

• Non-separable case

Adapted from Dan Klein’s tutorial slides at NAACL 2007.


Issues with Perceptron

• Overtraining: test / held-out accuracy usually rises, then falls.

• Regularization:
– If the data isn’t separable, weights often thrash around.
– Finds a “barely” separating solution

Taken from Dan Klein’s tutorial slides at NAACL 2007.


Margin Infused Relaxed Algorithm (MIRA) (Crammer and Singer, 2003)

• Use an n-best list to update the weights.

• Separate the correct output from each n-best candidate by a margin at least as large as the loss function

• and keep the weight changes as small as possible.
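With a single competing candidate, the smallest weight change that achieves a margin of at least the loss has a closed form. The sketch below implements only that 1-best special case, as a simplification; an n-best update has one constraint per candidate and needs a small quadratic program instead. The function name and the sparse dictionary representation are assumptions.

```python
from collections import Counter, defaultdict


def mira_update_1best(weights, gold_feats, pred_feats, loss):
    """Apply the smallest change so that the gold output beats the
    prediction by at least `loss`; return the step size tau."""
    diff = Counter(gold_feats)
    diff.subtract(pred_feats)  # gold features minus predicted features
    norm_sq = sum(v * v for v in diff.values())
    if norm_sq == 0:
        return 0.0  # identical feature vectors: nothing to separate
    margin = sum(weights[f] * v for f, v in diff.items())
    tau = max(0.0, (loss - margin) / norm_sq)
    for f, v in diff.items():
        weights[f] += tau * v
    return tau


mira_weights = defaultdict(float)
gold_feats = Counter({("i", "I"): 1})
pred_feats = Counter({("i", "aI"): 1})
first_tau = mira_update_1best(mira_weights, gold_feats, pred_feats, loss=1.0)
second_tau = mira_update_1best(mira_weights, gold_feats, pred_feats, loss=1.0)
print(first_tau, second_tau)  # 0.5 0.0
```

The second call makes no change because the margin already equals the loss, in contrast to a perceptron step, which always has the same size.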


Loss function in letter-to-phoneme

• Describe the loss of an incorrect prediction compared to the correct one.

• Word error (0/1), phoneme error, or combination.
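The three kinds of loss can be sketched as follows. Two choices here are assumptions, since the slide does not define them: phoneme error is computed as Levenshtein distance between the two phoneme sequences, and the combination is a plain sum.

```python
def word_loss(predicted, gold):
    """0/1 loss: 1 if the pronunciation differs anywhere, else 0."""
    return 0 if list(predicted) == list(gold) else 1


def phoneme_loss(predicted, gold):
    """Levenshtein distance between two phoneme sequences."""
    previous = list(range(len(gold) + 1))
    for i, p in enumerate(predicted, 1):
        current = [i]
        for j, g in enumerate(gold, 1):
            current.append(min(
                previous[j] + 1,             # delete p
                current[j - 1] + 1,          # insert g
                previous[j - 1] + (p != g),  # match or substitute
            ))
        previous = current
    return previous[-1]


def combined_loss(predicted, gold):
    """Assumed combination: word loss plus phoneme loss."""
    return word_loss(predicted, gold) + phoneme_loss(predicted, gold)


print(combined_loss(["b", "E", "r", "aI", "d"], ["b", "E", "r", "I", "d"]))  # 2
```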


Results

• Incomplete!!!
– MIRA outperforms Perceptron.
– Using the 0/1 loss or the combination loss is better than using the phoneme loss function alone.
– Overall, results show better performance than previous work.


Possible term projects


Possible term projects

1. Explore more linguistic features.

2. Explore machine translation systems for letter-to-phoneme conversion.

3. Unsupervised approaches for letter-to-phoneme conversion.

4. Other cool ideas to improve on a partial system
– Data for evaluation are provided.
– Alignments are provided.
– L2P models are provided.


Linguistic features

• Looking for linguistic features to help L2P
– Most systems incorporate letter n-gram features in some way.

• The new features must be obtained using only word information.

• Work already done
– Syllabification: Susan’s thesis
  • Find syllable breaks on letters using an SVM approach.


Machine translation approach

• The L2P problem can be seen as a (simple) machine translation problem,

• where we’d like to translate letters to phonemes.
– Consider: L2P -> MT
  • Letters -> words
  • Words -> sentences
  • Phonemes -> target sentences

• Moses: a baseline SMT system, ACL 2007
– http://www.statmt.org/wmt07/baseline.html
– May also need to look at GIZA++, Pharaoh, Carmel, etc.
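Under this analogy, the training data for an MT system could be prepared by writing each word as a "sentence" of space-separated letters and each pronunciation as a "sentence" of space-separated phonemes. The helper below is a hypothetical sketch of that conversion only; it makes no assumption about the input format or options of any particular toolkit.

```python
def to_parallel_lines(pairs):
    """Turn (word, phoneme list) pairs into source and target 'sentences'."""
    source = [" ".join(word) for word, _ in pairs]
    target = [" ".join(phonemes) for _, phonemes in pairs]
    return source, target


source_lines, target_lines = to_parallel_lines([
    ("king", ["k", "I", "ŋ"]),
    ("band", ["b", "æ", "n", "d"]),
])
print(source_lines)  # ['k i n g', 'b a n d']
print(target_lines)  # ['k I ŋ', 'b æ n d']
```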


Unsupervised approaches

• Assume we don’t have examples of word-phoneme pairs to train a model.

• We can start from a list of possible letter-phoneme mappings.

• Or assume we have a small set of example pairs (~100 pairs).

• Don’t expect to outperform the supervised approach, but take advantage of the method being unsupervised.


References

• Collins, M. 2002. Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, Volume 10. Association for Computational Linguistics, Morristown, NJ, 1-8.

• Crammer, K. and Singer, Y. 2003. Ultraconservative online algorithms for multiclass problems. J. Mach. Learn. Res. 3 (Mar. 2003), 951-991.

• Kristina Toutanova and Robert C. Moore. 2001. “Pronunciation modeling for improved spelling correction.” In ACL’02, pp. 144-151.

• John Kominek and Alan W Black. 2006. “Learning Pronunciation Dictionaries: Language Complexity and Word Selection Strategies.” In Proceedings of HLT-NAACL 2006, pp. 232-239.

• Walter M. P. Daelemans and Antal P. J. van den Bosch. 1997. “Language-independent data-oriented grapheme-to-phoneme conversion.” In Progress in Speech Synthesis, pages 77-89. Springer, New York.

• Alan W Black, Kevin Lenzo, and Vincent Pagel. 1998. “Issues in building general letter to sound rules.” In The Third ESCA Workshop in Speech Synthesis, pages 77-80.

• Robert I. Damper, Yannick Marchand, John D. S. Marsters, and Alexander I. Bazin. 2005. “Aligning text and phonemes for speech technology applications using an EM-like algorithm.” International Journal of Speech Technology, 8(2):147-160.

• Eric Sven Ristad and Peter N. Yianilos. 1998. “Learning string-edit distance.” IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(5):522-532.

• Walter Daelemans, Jakub Zavrel, Ko Van Der Sloot, and Antal Van Den Bosch. 2004. “TiMBL: Tilburg Memory Based Learner, version 5.1, reference guide.” ILK Technical Report Series 04-02.
