AUTOMATIC PHONETIC ANNOTATION OF AN ORTHOGRAPHICALLY TRANSCRIBED
SPEECH CORPUS
Rui Amaral, Pedro Carvalho, Diamantino Caseiro, Isabel Trancoso, Luís Oliveira
IST, Instituto Superior Técnico
INESC, Instituto de Engenharia de Sistemas e Computadores
Summary
• Motivation
• System Architecture
– Module 1: Grapheme-to-phone converter (G2P)
– Module 2: Alternative transcriptions generator (ATG)
– Module 3: Acoustic signal processor
– Module 4: Phonetic decoder and aligner
• Training and Test Corpora
• Results
– Transcription and alignment (Development phase)
– Test corpus annotation (Evaluation phase)
• Conclusions and Future Work
Motivation
• Time-consuming, repetitive task (over 60 x real time)
• Large corpora processing
• No expert intervention
– No widely adopted standard procedures
– Error prone
– Inconsistencies among human annotators
System Architecture
[Block diagram: the orthographically transcribed speech corpus feeds the Grapheme-to-Phone Converter (driven by rules and a lexicon); its output passes through the Alternative Transcriptions Generator to the Phonetic Decoder/Aligner, which also receives the parameters computed by the Acoustic Signal Processor and produces the phonetically annotated speech corpus.]
- Module 1 -
Grapheme-to-Phone Converter
Modules of the Portuguese TTS system (DIXI)
• Text normalisation
– Special symbols, numerals, abbreviations and acronyms
• Broad Phonetic Transcription
– Careful pronunciation of each word
– Set of 200 rules
– Small exceptions dictionary (364 entries)
– SAMPA phonetic alphabet
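A converter of this kind can be sketched as an exceptions-dictionary lookup followed by ordered, longest-match-first rewrite rules. The rules, the exception entry and the SAMPA outputs below are illustrative toys, not the actual DIXI rule set:

```python
# Minimal sketch of a rule-based grapheme-to-phone converter: an
# exceptions dictionary is consulted first; otherwise ordered,
# longest-match-first rewrite rules map letter sequences to SAMPA
# phones. Rules and entries are illustrative only (no stress marks).

EXCEPTIONS = {'muito': 'm"u~j~tu'}  # irregular word handled by the dictionary

RULES = [  # (grapheme sequence, SAMPA phones), longest match first
    ("nh", "J"),
    ("ch", "S"),
    ("s", "s"),
    ("e", "@"),
    ("m", "m"),
    ("a", "6"),
    ("n", "n"),
    ("o", "u"),
]

def g2p(word: str) -> str:
    """Transcribe one lowercase word into a SAMPA phone string."""
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    phones, i = [], 0
    while i < len(word):
        for graph, phone in RULES:
            if word.startswith(graph, i):
                phones.append(phone)
                i += len(graph)
                break
        else:
            i += 1  # skip letters not covered by any rule
    return "".join(phones)

print(g2p("semana"))
```

With this toy rule set, "semana" comes out as s@m6n6, matching the broad transcription in the examples below except for the stress marker, which a real rule set would also place.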
- Module 2 -
Alternative Transcriptions Generator
Transformation of phone sequences into lattices
• Based on optional rules:
– Which account for:
» Sandhi
» Vowel reduction
– Specified using finite-state grammars and simple transduction operators, e.g. A (B C) D
Examples:
Type                              Text         Broad P.T.        Alternative P.T.
sandhi with vowel quality change  de uma       [d@ um6]          [djum6]
                                  mesmo assim  [m"eZmu 6s"i~]    [m"eZmw6s"I~]
sandhi with vowel reduction       de uma       [d@ um6]          [dum6]
                                  mesmo assim  [m"eZmu 6s"i~]    [m"eZm6s"i~]
vowel reduction                   semana       [s@m"6n6]         [sm"6n6]
                                  oito         ["ojtu]           ["ojt]
alternative pronunciations        restaurante  [R@Stawr"6~t]     [R@StOr"6~t]
                                  viagens      [vj"aZ6~j~S]      [vj"aZe~S]
Phrase “vou para a praia.”
Canonical P.T. [v"o p6r6 6 pr"aj6]
Narrow P. T. (most freq.) [v"o pr"a pr"ai6]
= sandhi + vowel reduction
Example (rules application):
Rules:
DEF_RULE 6a, ( (6 NULL) (sil NULL) (6 a) )
DEF_RULE pra, ( p ("6 NULL) r 6 )
[Lattice figure: the canonical phone sequence with optional arcs added by the 6a and pra rules, covering paths from the canonical to the most frequent narrow transcription.]
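The effect of optional rules can be sketched by enumerating every combination of applying or skipping each rule occurrence, a flat list standing in for the pronunciation lattice. The single vowel-reduction rule below, deleting unstressed [@], is a simplified stand-in for the real rule set:

```python
from itertools import product

def alternatives(phones, rules):
    """Enumerate the pronunciations obtained by optionally applying each
    rule occurrence; a flat stand-in for the pronunciation lattice.
    rules: list of (pattern, replacement); "" as replacement = deletion."""
    # collect non-overlapping (start, end, replacement) spans
    spans, covered = [], set()
    for pat, alt in rules:
        start = 0
        while (i := phones.find(pat, start)) != -1:
            if not covered & set(range(i, i + len(pat))):
                spans.append((i, i + len(pat), alt))
                covered |= set(range(i, i + len(pat)))
            start = i + 1
    spans.sort()
    # try every apply / don't-apply combination of the matched spans
    results = set()
    for flags in product([False, True], repeat=len(spans)):
        out, pos = [], 0
        for (b, e, alt), apply_rule in zip(spans, flags):
            out.append(phones[pos:b])
            out.append(alt if apply_rule else phones[b:e])
            pos = e
        out.append(phones[pos:])
        results.add("".join(out))
    return sorted(results)

# vowel reduction as optional deletion of [@] (stress marks omitted)
print(alternatives("s@m6n6", [("@", "")]))
```

For "semana" this yields both the broad form s@m6n6 and the reduced form sm6n6, mirroring the vowel-reduction row in the examples table.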
- Module 3 -
Acoustic Signal Processor
Extraction of acoustical signal characteristics
• Sampling: 16 kHz, 16 bits
• Parameterisation: MFCC (Mel-Frequency Cepstral Coefficients)
– Decoding: 14 coefficients, energy, 1st and 2nd order differences, 25 ms Hamming windows, updated every 10 ms.
– Alignment: 14 coefficients, energy, 1st and 2nd order differences, 16 ms Hamming windows, updated every 5 ms.
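The frame counts implied by these settings follow directly from the window and step sizes; the sample counts at 16 kHz below are derived from the figures above:

```python
# Frame/hop arithmetic for the two MFCC configurations at 16 kHz.
SR = 16000  # sampling rate in Hz

def frames(duration_s, win_ms, hop_ms):
    """Number of full analysis windows fitting in a signal."""
    n = int(duration_s * SR)
    win = SR * win_ms // 1000
    hop = SR * hop_ms // 1000
    return 1 + (n - win) // hop

# decoding: 25 ms Hamming window, 10 ms hop -> 400 / 160 samples
print(frames(1.0, 25, 10))
# alignment: 16 ms Hamming window, 5 ms hop -> 256 / 80 samples
print(frames(1.0, 16, 5))
```

The finer 5 ms hop used for alignment roughly doubles the temporal resolution of the boundaries, which is why the two configurations differ.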
- Module 4 -
Phonetic Decoder and Aligner
Selection of the phonetic transcription which is closest to the utterance
• Viterbi algorithm
• 2 x 60 HMM models
– Architecture:
» left-to-right
» 3-state
» 3-mixture
NOTE: modules 3 and 4 use Hidden Markov Model Toolkit (Entropic Research Labs)
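The Viterbi pass over a 3-state left-to-right model can be sketched as follows; the transition and observation log-probabilities are made-up toy values, and the real system uses HTK rather than this code:

```python
import math

# Toy Viterbi decoding over a 3-state left-to-right HMM, the topology
# named above. All probabilities are illustrative, not trained values.

NEG_INF = float("-inf")

def viterbi(obs_loglik, log_trans):
    """obs_loglik[t][s]: log P(o_t | state s); log_trans[p][s]: log transition.
    Starts in state 0, ends in the last state; returns the best state path."""
    n_states = len(obs_loglik[0])
    delta = [obs_loglik[0][s] if s == 0 else NEG_INF for s in range(n_states)]
    back = []
    for t in range(1, len(obs_loglik)):
        new, ptr = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: delta[p] + log_trans[p][s])
            new.append(delta[best_prev] + log_trans[best_prev][s]
                       + obs_loglik[t][s])
            ptr.append(best_prev)
        delta = new
        back.append(ptr)
    # backtrace from the final state
    path = [n_states - 1]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

lt = math.log
# left-to-right transitions: stay in a state or advance by one
T = [[lt(0.5), lt(0.5), NEG_INF],
     [NEG_INF, lt(0.5), lt(0.5)],
     [NEG_INF, NEG_INF, lt(1.0)]]
# four observations favouring states 0, 0, 1, 2 in turn
O = [[lt(0.8), lt(0.1), lt(0.1)],
     [lt(0.8), lt(0.1), lt(0.1)],
     [lt(0.1), lt(0.8), lt(0.1)],
     [lt(0.1), lt(0.1), lt(0.8)]]
print(viterbi(O, T))
```

The same recursion, run over the pronunciation lattice rather than a single model, is what selects the transcription closest to the utterance.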
Training and Test Corpora
• Subset of the EUROM 1 multilingual corpus
– European Portuguese
– Collected in an anechoic room, 16 kHz, 16 bits.
– 5 male + 5 female speakers (few talkers)
– Prompt texts
» Passages:
• Paragraphs of 5 related sentences
• Free translations of the English version of EUROM 1
• Adapted from books and newspaper text
» Filler sentences:
• 50 sentences grouped in blocks of 5 sentences each
• Built to increase the number of different diphones in the corpus
– Manually annotated.
Training and Test Corpora (cont.)
Speaker  Passages               Filler sentences
1        O0-O4  O5-O9  P0-P4    F5-F9
2        O0-O4  O5-O9  P0-P4    F0-F4
3        P5-P9  Q0-Q4  Q5-Q9    F5-F9
4        P0-P4  P5-P9  Q0-Q4    F5-F9
5        O5-O9  P0-P4  P5-P9    F0-F4
6        P5-P9  Q0-Q4  Q5-Q9    F5-F9
7        O0-O4  O5-O9  P0-P4    F0-F4
8        Q0-Q4  Q5-Q9  R0-R4    F0-F4
9        R5-R9  O0-O4  O5-O9    F5-F9
10       Q5-Q9  R0-R4  R5-R9    F5-F9
(The slide marks which cells form the Training Corpus, Test Corpus 1 and Test Corpus 2.)
Passages: O0-O9, P0-P9: English translations; Q0-Q9, R0-R9: books and newspaper text.
Filler sentences: F0-F9.
Models               Transcription   Alignment
                     Precision       < 10 ms   90th percentile
HMM (transcription)  52.8 %          66.9 %    20 ms
HMM (alignment)      43 %            78.9 %    18 ms
Transcription and alignment results
• Transcription:
– Precision = ((correct - inserted) / total) x 100 %
• Alignment:
– % of cases in which the absolute error is < 10 ms
– absolute error below which 90 % of the cases fall
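The transcription precision formula can be reproduced directly; the counts below are illustrative, not taken from the results tables:

```python
def transcription_precision(correct: int, inserted: int, total: int) -> float:
    """Precision = ((correct - inserted) / total) x 100 %, as defined above:
    insertions are penalised by subtracting them from the correct count."""
    return (correct - inserted) * 100.0 / total

# illustrative counts: 880 correct phones, 30 insertions, 1000 reference phones
print(transcription_precision(correct=880, inserted=30, total=1000))
```

Subtracting insertions means a decoder cannot inflate its score by hypothesising extra phones, which matters when alternative transcriptions enlarge the search space.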
Annotation strategies and Results
Models      Transcription   Alignment
            Precision       < 10 ms   90th percentile
Strategy 1  85.3 %          77.4 %    20 ms
Strategy 2  85.8 %          44 %      29 ms
Strategy 3  85.8 %          78 %      19 ms
NOTE: Alignment evaluated only in places where the decoded sequence matched the manual sequence
            Transcription    Alignment
Strategy 1  HMM alignment    HMM alignment
Strategy 2  HMM recognition  HMM recognition
Strategy 3  HMM recognition  HMM alignment
Annotation results - Transcription -
• Comments
– Better precision achieved for canonical transcriptions of Test 2
– Highest global precision achieved in Test 1
– Successive application of the rules leads to better precision
Rules                                       Precision
                                            Test 1   Test 2
Canonical                                   74 %     76.9 %
Sandhi                                      77.1 %   79.4 %
Vowel reduction and alt. pronunciations     85.1 %   84.5 %
Annotation results - Alignment -
• Comments
– Better alignment obtained with the best decoder
– Some problematic transitions: vowels, nasal vowels and liquids
Rules                                     Test 1              Test 2
                                          < 10 ms   90 %      < 10 ms   90 %
Canonical                                 74.68 %   24 ms     75.18 %   25 ms
Sandhi                                    75.04 %   23 ms     75.41 %   24 ms
Vowel reduction and alt. pronunciations   78.76 %   19 ms     77.27 %   22 ms
Conclusions
• Better annotation results with:
– Alternative transcriptions (compared to canonical)
– Different models for alignment and recognition
• About 84 % precision in transcription and at most 22 ms alignment error for 90 % of the cases
Future Work
• Automatic rule inference
– 1st phase: comparison and selection of rules
– 2nd Phase: validation or phonetic-linguistic interpretation
• Annotation of other speech corpora to build better acoustic models
• Assignment of probabilistic information to the alternative pronunciations generated by rule
TOPIC ANNOTATION IN BROADCAST NEWS
Rui Amaral, Isabel Trancoso
IST, Instituto Superior Técnico
INESC, Instituto de Engenharia de Sistemas e Computadores
Preliminary work
• System Architecture
– Two-stage unsupervised clustering algorithm
» nearest-neighbour search method
» Kullback-Leibler distance measure
– Topic language models
» smoothed unigrams statistics
– Topic Decoder
» based on Hidden Markov Models (HMM)
NOTE: topic models created with CMU Cambridge Statistical Language Modelling Toolkit
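The clustering distance and the smoothed unigram models can be sketched as follows; add-one smoothing and the toy Portuguese word lists are assumptions, not the toolkit's actual configuration:

```python
import math
from collections import Counter

def unigram_model(tokens, vocab):
    """Smoothed unigram probabilities (add-one smoothing assumed here)."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

def kl_divergence(p, q):
    """Kullback-Leibler distance D(p || q) over a shared vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

# toy topic "clusters" and a new story to assign (illustrative words)
docs = {
    "sports": "jogo golo equipa jogo golo".split(),
    "economy": "banco juros mercado banco".split(),
}
query = "golo jogo equipa".split()

vocab = sorted(set(sum(docs.values(), [])) | set(query))
q_model = unigram_model(query, vocab)
# nearest-neighbour search: pick the topic whose model is closest in KL
nearest = min(docs,
              key=lambda t: kl_divergence(q_model,
                                          unigram_model(docs[t], vocab)))
print(nearest)
```

Smoothing keeps every vocabulary word at non-zero probability, which the KL distance requires: an unseen word with zero probability in either model would make the divergence undefined.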
System Architecture
[Block diagram. Training phase: texts from the newspaper text corpus (topic unlabeled) pass through selection & filtering and clustering into clusters C1 … Ci … Ck, from which topic model generation produces TM1 … TMi … TMk and the topic HMM. Decoding phase: the topic HMM performs topic segmentation and labelling (topics T1 … Ti … Tk), producing topic-annotated texts; the newspaper text corpus (topic labeled) serves as reference.]
Training and Test Corpora
• Subset of the BD_PUBLICO newspaper text corpus
– 20,000 stories
– 6-month period (September 95 - February 96)
– topic annotated
– story size between 100 and 2000 words
– normalised text