December 2006 Cairo University Faculty of Computers and Information HMM Based Speech Synthesis Presented by Ossama Abdel-Hamid Mohamed

Page 1:

December 2006

Cairo University, Faculty of Computers and Information

HMM Based Speech Synthesis

Presented by Ossama Abdel-Hamid Mohamed

Page 2:

Agenda

– Speech Synthesis
– HMM Based Speech Synthesis
– Proposed System
– Challenges

Page 3:

Speech Synthesis

What is speech synthesis?
– Generating human-like speech using computers.

Applications:
– Text to speech.

– Conversation systems.

– Speech to speech translation.

– Concept to speech.

Systems built since the late 1970s:
– MITalk 1979

– Klattalk 1980

Page 4:

Speech Synthesis, Cont.

Challenges:
– Intelligibility.

– Naturalness.

– Pleasantness.

– Emotions.

Page 5:

Speech Synthesis, Techniques

Techniques:
• Formant Based
– Rule based
– Difficult to make
– Machine-like
• Concatenative
– Instance based
– Based on a corpus
– Better quality
– Not flexible
• HMM Based
– Statistical
– Based on a corpus
– Newest technique
– More flexible

Page 6:

Agenda

– Speech Synthesis
– HMM Based Speech Synthesis
– Proposed System
– Challenges

Page 7:

HMM Based Speech Synthesis Overview

HMM has been used successfully in speech recognition.

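In recognition:

$$\lambda^* = \arg\max_{\lambda} P(O \mid \lambda)$$

In speech synthesis:

$$O^* = \arg\max_{O} P(O \mid \lambda)$$

where $O$ is the sequence of speech parameter (observation) vectors and $\lambda$ is the HMM.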
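Why the synthesis equation alone gives a step-wise output: if the state sequence is fixed and each state has a single Gaussian over static features only, each frame's term in $P(O \mid \lambda)$ is maximized independently at its state mean. A minimal sketch, assuming 1-D features and a given state sequence with known durations (the function name and the numbers are hypothetical):

```python
def static_ml_trajectory(state_means, durations):
    """Return the O maximizing P(O | lambda) for a fixed state sequence, static features only.

    Each frame's Gaussian is maximized independently at its state mean, so the
    result is piecewise constant: it jumps at every state boundary.
    """
    trajectory = []
    for mean, duration in zip(state_means, durations):
        trajectory.extend([mean] * duration)
    return trajectory


# Hypothetical 1-D means of three consecutive states and their durations in frames
print(static_ml_trajectory([1.0, 3.0, 2.0], [2, 3, 1]))
# [1.0, 1.0, 3.0, 3.0, 3.0, 2.0]
```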

Page 8:

HMM Based Speech Synthesis Overview, Cont.

Include delta and acceleration (delta-delta) features so that the generated output is smooth.
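A minimal sketch of maximum-likelihood parameter generation with delta and delta-delta features, assuming one 1-D static feature, diagonal covariances, a fixed state sequence, and the common regression windows given in the code comment (the window coefficients, edge clamping, and all names are assumptions, not taken from the slides). It solves $W^\top U^{-1} W c = W^\top U^{-1} \mu$, where $W$ maps the static sequence $c$ to its static, delta, and delta-delta features:

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


# Assumed windows: delta = 0.5*(c[t+1] - c[t-1]), delta2 = c[t+1] - 2*c[t] + c[t-1]
WINDOWS = [{0: 1.0}, {-1: -0.5, 1: 0.5}, {-1: 1.0, 0: -2.0, 1: 1.0}]


def mlpg(means, variances):
    """Maximum-likelihood generation of a 1-D static trajectory c[0..T-1].

    means[t] and variances[t] are (static, delta, delta2) tuples taken from the
    state occupying frame t. Solves (W^T U^-1 W) c = W^T U^-1 mu.
    """
    T = len(means)
    a = [[0.0] * T for _ in range(T)]  # accumulates W^T U^-1 W
    b = [0.0] * T                      # accumulates W^T U^-1 mu
    for t in range(T):
        for d, window in enumerate(WINDOWS):
            precision = 1.0 / variances[t][d]
            # Row of W for feature d at frame t; frames outside 0..T-1 are clamped to the edges
            row = {}
            for offset, coef in window.items():
                tau = min(max(t + offset, 0), T - 1)
                row[tau] = row.get(tau, 0.0) + coef
            for i, wi in row.items():
                b[i] += wi * precision * means[t][d]
                for j, wj in row.items():
                    a[i][j] += wi * precision * wj
    return solve(a, b)


# Hypothetical example: static means step from 0 to 1, deltas pulled toward 0
means = [(0.0, 0.0, 0.0)] * 3 + [(1.0, 0.0, 0.0)] * 3
variances = [(1.0, 0.01, 1.0)] * 6
print([round(c, 3) for c in mlpg(means, variances)])
```

The delta and delta-delta terms penalize rapid frame-to-frame changes, which is what smooths the trajectory compared with using the static means alone. A dense solver is used only for brevity; the matrix is banded, so a real system would use a banded (e.g. Cholesky) solver.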

Page 9:

The Overall System

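Training part:
– Speech database → F0 extraction and mel-cepstral analysis → f0 and mel-cepstrum.
– Text analysis → labels and context features.
– HMM training → models.

Synthesis part:
– Text → text analysis → labels and context features.
– Parameter generation from the models → f0 and mel-cepstrum.
– f0 drives a pulse or noise excitation.
– The excitation is filtered by an MLSA (mel log spectrum approximation) filter controlled by the mel-cepstrum → speech.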

Page 10:

The Overall System


F0 is modeled using an MSD-HMM (multi-space probability distribution HMM); the spectrum is represented by 25 mel-cepstral coefficients.
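In an MSD-HMM, each state holds a weight per space: here a one-dimensional space for voiced frames (a Gaussian over log f0) and a zero-dimensional space for unvoiced frames, which carries no value. A minimal sketch of one state's output probability, assuming two spaces, a single Gaussian, and log f0 features (names and numbers are hypothetical):

```python
import math


def msd_f0_likelihood(obs, voiced_weight, mean, variance):
    """Output probability of a two-space MSD state for one f0 observation.

    obs is None for an unvoiced frame (zero-dimensional space, no value) or a
    log f0 value for a voiced frame (one-dimensional space with a Gaussian).
    """
    if obs is None:
        # Unvoiced space: zero-dimensional, so only its space weight contributes
        return 1.0 - voiced_weight
    # Voiced space: space weight times a 1-D Gaussian density over log f0
    density = math.exp(-0.5 * (obs - mean) ** 2 / variance) / math.sqrt(2.0 * math.pi * variance)
    return voiced_weight * density


# Hypothetical state: voiced weight 0.8, log f0 mean log(120 Hz), variance 0.01
print(msd_f0_likelihood(None, 0.8, math.log(120.0), 0.01))
print(msd_f0_likelihood(math.log(125.0), 0.8, math.log(120.0), 0.01))
```

Because both kinds of frames get a well-defined probability, a single HMM can be trained on an f0 track that switches between voiced and unvoiced regions.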

Page 11:

The Overall System


Context-dependent models.

Each model has 5 states.
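A sketch of how context-dependent model and state names might be built, assuming only the previous and next phone as context, the HTK-style `prev-cur+next` notation, `sil` padding at utterance edges, and states numbered 1 to 5 (all assumptions; full systems typically use many more contextual factors):

```python
NUM_STATES = 5  # each context-dependent model has 5 states


def context_labels(phones, pad="sil"):
    """Triphone-style labels 'prev-cur+next'; utterance edges are padded with `pad`."""
    padded = [pad] + list(phones) + [pad]
    return [f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}" for i in range(1, len(padded) - 1)]


def model_states(label):
    """Hypothetical names for the states of one context-dependent model, e.g. 'k-a+t[3]'."""
    return [f"{label}[{s}]" for s in range(1, NUM_STATES + 1)]


for label in context_labels(["k", "a", "t"]):
    print(label, model_states(label))
```

The same phone "a" gets a different model for each distinct neighborhood, which is what lets the models capture coarticulation.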

Page 12:

The Overall System


Page 13:

The Overall System


Each frame is either voiced or unvoiced.
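The pulse-or-noise excitation can be sketched as follows, assuming a frame-wise f0 track where f0 = 0 marks an unvoiced frame, pulse heights of sqrt(period) so that voiced and unvoiced frames have roughly equal power, and unit-variance white Gaussian noise (all assumptions; names are hypothetical). The MLSA filtering step is not shown:

```python
import math
import random


def pulse_or_noise_excitation(f0_per_frame, frame_len, sample_rate, seed=0):
    """Excitation for a frame-wise f0 track: f0 > 0 is voiced, f0 == 0 is unvoiced.

    Voiced frames get a pulse train at f0 with pulse height sqrt(period samples),
    giving roughly unit power; unvoiced frames get unit-variance Gaussian noise.
    """
    rng = random.Random(seed)
    out = []
    phase = 1.0  # fraction of a pitch period elapsed; a pulse fires when it reaches 1
    for f0 in f0_per_frame:
        for _ in range(frame_len):
            if f0 > 0:
                if phase >= 1.0:
                    out.append(math.sqrt(sample_rate / f0))
                    phase -= 1.0
                else:
                    out.append(0.0)
                phase += f0 / sample_rate
            else:
                out.append(rng.gauss(0.0, 1.0))
    return out


# Hypothetical track: two voiced frames at 100 Hz, then one unvoiced frame (5 ms frames at 16 kHz)
excitation = pulse_or_noise_excitation([100.0, 100.0, 0.0], frame_len=80, sample_rate=16000)
print(len(excitation))  # 240
```

This hard per-frame switch is exactly the voiced/unvoiced decision that the proposed system later replaces with band-wise voicing.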

Page 14:

The Overall System


Page 15:

Advantages

1. Its voice characteristics can be easily modified.

2. It can be applied to various languages with little modification.

3. A variety of speaking styles or emotional speech can be synthesized using a small amount of speech data.

4. Techniques developed in ASR can be easily applied.

5. Its footprint is relatively small.

An HMM-based TTS system produced the best results in the Blizzard Challenge.

Page 16:

Agenda

– Speech Synthesis
– HMM Based Speech Synthesis
– Proposed System
– Challenges

Page 17:

Problems we tried to solve

1. Marking each frame as either voiced or unvoiced degrades quality, because most voiced speech segments contain some unvoiced components, and some phonemes have mixed excitation.

2. The speech analysis/synthesis techniques and parameters used degrade quality.

Page 18:

Multi-Band Excitation

In MBE (Multi-Band Excitation), the speech spectrum is divided into a number of frequency bands, and voicing is estimated separately in each band (17 bands were used).
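A simplified sketch of per-band voicing detection. It is not the MBE vocoder's own decision rule (which compares the spectrum with a fitted harmonic model); instead it assumes a known f0 and marks a band voiced when more than half of its energy lies within one DFT bin of a harmonic of f0. The threshold, bin tolerance, equal-width bands, naive DFT, and names are all assumptions:

```python
import cmath
import math


def band_voicing(frame, f0, sample_rate, num_bands, threshold=0.5):
    """Return one voiced/unvoiced decision per equal-width band of bins 1..N/2."""
    n = len(frame)
    half = n // 2
    # Power spectrum via a naive DFT (adequate for a short illustrative frame)
    power = []
    for k in range(half + 1):
        s = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(s) ** 2)
    # Bins within +/-1 bin of a harmonic h*f0 below the Nyquist frequency
    harmonic_bins = set()
    h = 1
    while h * f0 <= sample_rate / 2:
        center = round(h * f0 * n / sample_rate)
        harmonic_bins.update(range(center - 1, center + 2))
        h += 1
    decisions = []
    for b in range(num_bands):
        lo = b * half // num_bands + 1
        hi = (b + 1) * half // num_bands
        total = sum(power[k] for k in range(lo, hi + 1))
        harmonic = sum(power[k] for k in range(lo, hi + 1) if k in harmonic_bins)
        decisions.append(total > 0 and harmonic / total > threshold)
    return decisions


# Illustrative frame: harmonics of 250 Hz below 2 kHz, tones midway between harmonics above
n, fs, f0 = 256, 8000, 250.0
frame = [sum(math.cos(2 * math.pi * 8 * h * t / n) for h in range(1, 8))
         + sum(math.cos(2 * math.pi * (8 * h + 4) * t / n) for h in range(8, 16))
         for t in range(n)]
print(band_voicing(frame, f0, fs, num_bands=4))  # [True, True, False, False]
```

Four bands keep the example short; the system described here used 17.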

Page 19:

Mixed Excitation

At synthesis time, periodic and noise excitations are mixed according to the voicing parameters.
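A minimal sketch of band-wise mixing in the frequency domain, assuming the pulse and noise excitations are already available as DFT spectra of the same frame and that bands have equal width; the voicing values may be binary decisions or soft strengths in [0, 1]. Converting the mixed spectrum back to a waveform (inverse DFT, overlap-add) is not shown, and the names are hypothetical:

```python
def mix_excitation_spectrum(pulse_spec, noise_spec, band_voicing):
    """Mix pulse and noise excitation spectra band by band.

    pulse_spec and noise_spec hold the same DFT bins of one frame (real or complex);
    band_voicing holds one voicing strength per equal-width band
    (1 = fully periodic, 0 = fully noise).
    """
    num_bins = len(pulse_spec)
    num_bands = len(band_voicing)
    mixed = []
    for k in range(num_bins):
        band = min(k * num_bands // num_bins, num_bands - 1)
        v = band_voicing[band]
        mixed.append(v * pulse_spec[k] + (1.0 - v) * noise_spec[k])
    return mixed


# Hypothetical spectra: lower band fully voiced, upper band fully unvoiced
print(mix_excitation_spectrum([1.0] * 8, [0.0] * 8, [1.0, 0.0]))
# [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```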

Page 20:

Spectral Envelope Estimation

Estimate the spectral envelope values at a fixed number of frequency samples.

Use a sinusoidal model for synthesis.
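A sketch of both steps under stated assumptions: the envelope is sampled on a fixed, linearly spaced grid from 0 Hz to the Nyquist frequency by linear interpolation between amplitudes measured at the harmonics, and synthesis sums zero-phase harmonics of f0 whose amplitudes are read back from those samples. The grid spacing, interpolation, phase choice, and names are assumptions, not details from the slides:

```python
import bisect
import math


def sample_envelope(f0, harmonic_amps, num_samples, sample_rate):
    """Resample amplitudes measured at h*f0 (h = 1, 2, ...) onto a fixed grid of
    num_samples frequencies from 0 to sample_rate/2, by linear interpolation."""
    freqs = [(h + 1) * f0 for h in range(len(harmonic_amps))]
    samples = []
    for i in range(num_samples):
        f = i * (sample_rate / 2) / (num_samples - 1)
        if f <= freqs[0]:
            samples.append(harmonic_amps[0])
        elif f >= freqs[-1]:
            samples.append(harmonic_amps[-1])
        else:
            j = bisect.bisect_right(freqs, f) - 1
            frac = (f - freqs[j]) / (freqs[j + 1] - freqs[j])
            samples.append(harmonic_amps[j] + (harmonic_amps[j + 1] - harmonic_amps[j]) * frac)
    return samples


def envelope_at(f, samples, sample_rate):
    """Linearly interpolate the fixed-grid envelope samples at frequency f."""
    pos = f / (sample_rate / 2) * (len(samples) - 1)
    i = min(int(pos), len(samples) - 2)
    frac = pos - i
    return samples[i] + (samples[i + 1] - samples[i]) * frac


def sinusoidal_frame(f0, samples, sample_rate, frame_len):
    """Sum of zero-phase harmonics of f0 below Nyquist, amplitudes taken from the envelope."""
    out = [0.0] * frame_len
    h = 1
    while h * f0 < sample_rate / 2:
        amp = envelope_at(h * f0, samples, sample_rate)
        for t in range(frame_len):
            out[t] += amp * math.cos(2.0 * math.pi * h * f0 * t / sample_rate)
        h += 1
    return out


env = sample_envelope(1000.0, [1.0, 2.0, 3.0], num_samples=9, sample_rate=8000)
print(env)  # [1.0, 1.0, 1.0, 1.5, 2.0, 2.5, 3.0, 3.0, 3.0]
frame = sinusoidal_frame(1000.0, env, 8000, frame_len=80)
```

A fixed number of envelope samples gives every frame a parameter vector of the same length regardless of f0, which is what the HMMs need.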

Page 21:

Modified System

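Training part:
– Speech database → F0 extraction, spectral envelope analysis, and band voicing detection.
– Text analysis → labels and context features.
– HMM training on f0, spectral envelope samples, and band voicing → models.

Synthesis part:
– Text → text analysis → labels and context features.
– Parameter generation from the models → f0, spectral envelope samples, and band voicing.
– Harmonics synthesis from the spectral envelope samples and f0 → voiced speech.
– Noise filtered by an STFT filter → unvoiced speech.
– Bands mixing, controlled by band voicing, combines voiced and unvoiced speech → speech.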

Page 22:

Result

Figure: MOS scores (score axis 1 to 5) for Baseline System, Baseline + MBE, and Proposed System.

Page 23:

Agenda

– Speech Synthesis
– HMM Based Speech Synthesis
– Proposed System
– Challenges

Page 24:

Other Challenges

Speech is overly smoothed:
– Use global variance.

Modeling accuracy: the system uses the same modeling as recognition.
– Hidden semi-Markov models (duration modeling).
– Trajectory HMMs.
– Minimum generation error training.
– More state clusters and use of acoustic context (under research).
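Global variance (GV) is the variance of a generated parameter over a whole utterance; over-smoothed trajectories have a GV that is too small. The published GV method maximizes a likelihood that includes a GV term during parameter generation; the sketch below shows only a simpler variance-scaling approximation, assuming a target GV (for example, the average GV of the training data) is given. Names are hypothetical:

```python
import math


def global_variance(trajectory):
    """Variance of one parameter dimension over a whole utterance."""
    mean = sum(trajectory) / len(trajectory)
    return sum((x - mean) ** 2 for x in trajectory) / len(trajectory)


def scale_to_global_variance(trajectory, target_gv):
    """Stretch a generated trajectory around its mean so its variance equals target_gv."""
    mean = sum(trajectory) / len(trajectory)
    gv = global_variance(trajectory)
    if gv == 0.0:
        return list(trajectory)  # a flat trajectory cannot be stretched
    scale = math.sqrt(target_gv / gv)
    return [mean + scale * (x - mean) for x in trajectory]


# Hypothetical over-smoothed trajectory whose GV (2/3) is a quarter of the target (8/3)
print(scale_to_global_variance([1.0, 2.0, 3.0], 8.0 / 3.0))
```

Scaling keeps the trajectory's shape and mean but restores its dynamic range, which is the intuition behind the GV approach.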

Page 25:

More State Clusters

Instead of computing one Gaussian per state, we store all occurrences of the state and record the context of each occurrence.

At synthesis time, the best sequence of occurrences is found using dynamic programming.

Figure: Previous, Current, Next.
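A sketch of the dynamic-programming search over stored occurrences, written as a Viterbi-style pass. The slides do not specify the costs, so `target_cost(i, occurrence)` (for example, a mismatch between the recorded and the required context) and `join_cost(previous, current)` (for example, an acoustic distance between consecutive occurrences) are hypothetical callbacks, as are the scalar occurrences in the usage example:

```python
def best_sequence(candidates, target_cost, join_cost):
    """Choose one stored occurrence per state, minimizing total target + join cost.

    candidates[i] lists the stored occurrences for the i-th state to synthesize.
    Returns the index of the chosen occurrence for each state.
    """
    cost = [[target_cost(0, occ) for occ in candidates[0]]]
    back = [[None] * len(candidates[0])]
    for i in range(1, len(candidates)):
        row, back_row = [], []
        for occ in candidates[i]:
            # Best predecessor for this occurrence, including the cost of joining them
            totals = [cost[i - 1][j] + join_cost(prev, occ) for j, prev in enumerate(candidates[i - 1])]
            best_j = min(range(len(totals)), key=totals.__getitem__)
            row.append(totals[best_j] + target_cost(i, occ))
            back_row.append(best_j)
        cost.append(row)
        back.append(back_row)
    # Trace back from the cheapest final occurrence
    j = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    path = [j]
    for i in range(len(candidates) - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    return path[::-1]


# Hypothetical scalar occurrences with targets [0, 3, 0]; join cost is the value difference
targets = [0.0, 3.0, 0.0]
print(best_sequence([[0.0], [2.0, 3.0], [0.0]],
                    lambda i, occ: abs(occ - targets[i]),
                    lambda prev, occ: abs(prev - occ)))  # [0, 0, 0]
```

In the example, the occurrence 2.0 is chosen over the exact target match 3.0 because it joins more smoothly with its neighbors, which is the trade-off the dynamic programming resolves.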

Page 26:

Thank You