Design and Implementation of Voice Conversion Application (VOCAL)


Page 1: Design and Implementation of Voice Conversion Application (VOCAL)

Design and Implementation of Voice Conversion Application (VOCAL)

Elizabeth Kwan (26406025)

Supervised by: Ms. Liliana, M.Eng

Mr. Resmana Lim, M.Eng

Page 2: Design and Implementation of Voice Conversion Application (VOCAL)

DEFINITION: What is Voice Conversion?

A method to transform the input speech signal such that the output signal will be perceived as produced by another speaker

Page 3: Design and Implementation of Voice Conversion Application (VOCAL)

BACKGROUND: Why Voice Conversion?

Rapid development in speech technology

Speech recognition and text-to-speech have been the priorities in research efforts to improve human-machine (computer) interaction

Improve the naturalness of human-machine (computer) interaction

Voice conversion is used to personify speech-enabled systems

Page 4: Design and Implementation of Voice Conversion Application (VOCAL)

SCOPE & LIMITATION: Scope and limitations of the project

GENERAL: Format: wave file (.wav), single channel (mono)

INPUT: Source speaker and target speaker speaking the same utterances

Home recording

One person with minimal noise (no background sound)

For speech only

Page 5: Design and Implementation of Voice Conversion Application (VOCAL)

SCOPE & LIMITATION: Scope and limitations of the project

PROCESS: Not real-time; pre-recorded speech is needed

Text-dependent

OUTPUT: The output signal will be perceived as produced by another speaker, as judged subjectively by human auditory perception

Dialect not included

Page 6: Design and Implementation of Voice Conversion Application (VOCAL)

SCOPE & LIMITATION: Scope and limitations of the project

Tested using Mean Opinion Score (MOS)

Developed in the .NET environment (C#, Visual Studio 2005)

Page 7: Design and Implementation of Voice Conversion Application (VOCAL)

VOICE CONVERSION METHOD: Brief explanation of Voice Conversion

Different conversion systems use different methods

General system: A method to represent the speaker-specific characteristics of the speech waveform

A method to map the source and the target acoustical spaces

A method to modify the characteristics of the source speech using the mapping obtained in the previous step

Page 8: Design and Implementation of Voice Conversion Application (VOCAL)

VOICE CONVERSION METHOD (see Page 33)

[Flow chart. Recoverable box labels, in order: SOUND A (source speaker) and SOUND B (target speaker); segmentation into chunks A(i) and B(i) up to A(n) and B(n); resample; LPC; Inverse Filter; Excitation; Filter; Apply Filter to Excitation; Pitch Period Computation; Pitch Replacement; Synthesis of C(i) up to C(n); Window Combination; SOUND C (converted)]

Page 9: Design and Implementation of Voice Conversion Application (VOCAL)

VOICE CONVERSION METHOD: Main Process (flow chart: see Page 30)

SEGMENTATION

ANALYSIS or MODELING

TRANSFORMATION

SYNTHESIS

Page 10: Design and Implementation of Voice Conversion Application (VOCAL)

WHY IT IS DIFFICULT? External Problems

Complexity of human language

Speech is more than sequences of phones that form words and sentences. It also carries information such as rhythm, intonation, and word stress

This information varies from one person to another

This infinite variety raises the application's complexity, especially in segmentation

Page 11: Design and Implementation of Voice Conversion Application (VOCAL)

WHY IT IS DIFFICULT? External Problems

Speaker Variability

Unique voice. Speech produced by one person may vary too
- Realization
- Speaking style
- Sex of speaker
- Anatomy of vocal tract
- Speed of speech
- Dialects

Page 12: Design and Implementation of Voice Conversion Application (VOCAL)

WHY IT IS DIFFICULT? Internal Problems

The digital form contains only amplitude values, one per sampling period

Amplitude cannot be used directly to determine the speech parameters (a problem for the analysis process)

Manipulating (adding or deleting) part of the sound affects the whole sound

Page 13: Design and Implementation of Voice Conversion Application (VOCAL)

VOICE CONVERSION METHOD: Main Process (flow chart: see Page 30)

SEGMENTATION

ANALYSIS or MODELING

TRANSFORMATION

SYNTHESIS

Page 14: Design and Implementation of Voice Conversion Application (VOCAL)

SEGMENTATION (flow chart: see Page 34)

It is difficult to process an entire phrase, as tone, pitch, and other characteristics may vary over the whole signal

Split based on syllables

Use end-point detection: a combination of volume (two volume thresholds) and zero-crossing rate (ZCR)
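A minimal Python sketch of end-point detection with two volume thresholds plus ZCR. It is an illustration under assumptions, not the VOCAL implementation (which the slides say was written in C#): the frame length, the threshold values, and the function names `frame_features` and `detect_segments` are all hypothetical. A frame above the high threshold marks the core of a segment; the segment then grows while the volume stays above the low threshold, and grows further over frames whose ZCR is high.

```python
def frame_features(samples, frame_len):
    """Split samples into non-overlapping frames; return (volumes, zcrs)."""
    volumes, zcrs = [], []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        volumes.append(sum(abs(s) for s in frame))
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        zcrs.append(crossings / (frame_len - 1))
    return volumes, zcrs


def detect_segments(samples, frame_len, vol_high, vol_low, zcr_min):
    """Return a list of (first_frame, last_frame) index pairs."""
    volumes, zcrs = frame_features(samples, frame_len)
    n = len(volumes)
    segments = []
    i = 0
    while i < n:
        if volumes[i] < vol_high:
            i += 1
            continue
        # Core found: grow while the volume stays above the low threshold.
        start = end = i
        while start > 0 and volumes[start - 1] >= vol_low:
            start -= 1
        while end + 1 < n and volumes[end + 1] >= vol_low:
            end += 1
        # Grow further over quiet frames that still have a high ZCR.
        while start > 0 and zcrs[start - 1] >= zcr_min:
            start -= 1
        while end + 1 < n and zcrs[end + 1] >= zcr_min:
            end += 1
        # Merge with the previous segment if the two now touch or overlap.
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
        i = end + 1
    return segments


# Synthetic input: silence, a loud burst, silence, a loud burst, silence.
silence = [0.0] * 8
burst = [0.8, -0.8, 0.8, -0.8] * 2
samples = silence + burst + silence + burst + silence
segments = detect_segments(samples, frame_len=4,
                           vol_high=2.0, vol_low=0.5, zcr_min=0.5)
print(segments)
```

On real recordings, background noise also has a high ZCR, so the thresholds would need tuning per recording setup.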

Page 15: Design and Implementation of Voice Conversion Application (VOCAL)

SEGMENTATION (flow chart: see Page 34)

Volume: loudness of the audio signal

Zero-Crossing Rate (ZCR): rate at which the signal changes from positive to negative, and vice versa

$\text{volume} = \sum_{i=1}^{n} |S_i|$
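The two frame features can be sketched in Python as follows. This assumes volume is the sum of absolute sample values in a frame and that ZCR is expressed as the fraction of adjacent sample pairs with differing sign; the function names are hypothetical.

```python
def volume(frame):
    """Sum of absolute sample amplitudes in one frame."""
    return sum(abs(s) for s in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


frame = [0.5, -0.5, 0.25, -0.25, 0.0, 0.1]
print(volume(frame))
print(zero_crossing_rate(frame))
```

Here a sample of exactly zero is counted as non-negative; other conventions are possible.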

Page 16: Design and Implementation of Voice Conversion Application (VOCAL)

SEGMENTATION (flow chart: see Page 34)

Page 17: Design and Implementation of Voice Conversion Application (VOCAL)

VOICE CONVERSION METHOD: Main Process (flow chart: see Page 30)

SEGMENTATION

ANALYSIS or MODELING

TRANSFORMATION

SYNTHESIS

Page 18: Design and Implementation of Voice Conversion Application (VOCAL)

ANALYSIS OR MODELING: Main Process (flow chart: see Page 36)

ANALYSIS or MODELING

Linear Predictive Coding

Pitch Period Computation

Page 19: Design and Implementation of Voice Conversion Application (VOCAL)

ANALYSIS OR MODELING: Main Process (flow chart: see Page 36)

ANALYSIS or MODELING

Linear Predictive Coding

Pitch Period Computation

Page 20: Design and Implementation of Voice Conversion Application (VOCAL)

ANALYSIS OR MODELING: Modeling Vocal Tract

Page 21: Design and Implementation of Voice Conversion Application (VOCAL)

ANALYSIS OR MODELING: Modeling Vocal Tract

Source: signal x(t) [excitation signal]

Filter: linear time-invariant h(t) [transfer function]

Speech: convolution of source and filter, y(t) = x(t) * h(t)
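The source-filter model can be illustrated in discrete time with a short Python sketch. The pulse train and the three-tap impulse response below are made-up toy values, and `convolve` is a hypothetical helper name.

```python
def convolve(x, h):
    """Discrete linear convolution: y[n] = sum over k of x[k] * h[n - k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


excitation = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]  # two pulses, 3 samples apart
impulse_response = [1.0, 0.5, 0.25]          # toy decaying filter response
speech = convolve(excitation, impulse_response)
print(speech)
```

Each pulse in the excitation triggers one copy of the filter response, which is why separating the two again requires de-convolution.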

Page 22: Design and Implementation of Voice Conversion Application (VOCAL)

ANALYSIS OR MODELING: Modeling Vocal Tract

De-convolution is needed

Use the LPC method: predict a sample of the speech signal from several previous samples

$\hat{s}[n] = \sum_{k=1}^{p} a_k \, s[n-k]$
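One common way to obtain the predictor coefficients is the autocorrelation method solved with the Levinson-Durbin recursion. The slides do not say which solver VOCAL uses, so the Python sketch below is an assumption, with hypothetical function names. It uses the convention that sample n is predicted as the sum of a_k times sample n-k for k = 1..p, and it also shows the inverse filter that yields the excitation (residual).

```python
def autocorrelation(signal, max_lag):
    """Autocorrelation r[0..max_lag] of a finite signal."""
    n = len(signal)
    return [
        sum(signal[i] * signal[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def lpc(signal, order):
    """Return [a_1, ..., a_p] via the Levinson-Durbin recursion."""
    r = autocorrelation(signal, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        if error == 0.0:
            break  # silent frame: leave remaining coefficients at zero
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= 1.0 - k * k
    return a[1:]


def residual(signal, coeffs):
    """Inverse filter: e[n] = s[n] - sum over k of a_k * s[n - k]."""
    out = []
    for n, s in enumerate(signal):
        predicted = sum(
            a * signal[n - k]
            for k, a in enumerate(coeffs, start=1)
            if n - k >= 0
        )
        out.append(s - predicted)
    return out


# Toy signal generated by s[n] = 0.9 * s[n-1], started from a single pulse.
signal = [0.9 ** n for n in range(50)]
coeffs = lpc(signal, order=2)
excitation = residual(signal, coeffs)
print(coeffs)
```

For this toy signal the first coefficient comes out close to 0.9 and the second close to zero, and the residual is close to the single starting pulse.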

Page 23: Design and Implementation of Voice Conversion Application (VOCAL)

ANALYSIS OR MODELING: Linear Predictive Coding

Page 24: Design and Implementation of Voice Conversion Application (VOCAL)

VOICE CONVERSION METHOD: Main Process (flow chart: see Page 36)

ANALYSIS or MODELING

Linear Predictive Coding

Pitch Period Computation

Page 25: Design and Implementation of Voice Conversion Application (VOCAL)

VOICE CONVERSION METHOD: Main Process (flow chart: see Page 36)

Pitch Period Computation

Pitch Analysis

Glottal Pulse Computation

Pitch Tier Computation

Page 26: Design and Implementation of Voice Conversion Application (VOCAL)

ANALYSIS OR MODELING: Pitch Period Computation

Pitch Analysis: based on the autocorrelation method (Boersma 1993)
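The core idea of autocorrelation pitch analysis can be sketched in Python as below. This is a deliberately simplified illustration, not Boersma's full method, which additionally corrects for the analysis window, interpolates the peak, and tracks candidates across frames. The search range of 75 to 500 Hz, the voicing threshold of 0.3, and the name `estimate_pitch` are assumptions.

```python
import math


def estimate_pitch(frame, sample_rate, f_min=75.0, f_max=500.0, voicing=0.3):
    """Return the estimated pitch in Hz, or None for an unvoiced frame."""
    energy = sum(s * s for s in frame)
    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    if energy == 0.0 or lag_min > lag_max:
        return None
    best_lag, best_value = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Raw autocorrelation at this lag; longer lags use fewer terms,
        # which favours the first period over its multiples.
        value = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    if best_lag is None or best_value / energy < voicing:
        return None
    return sample_rate / best_lag


sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(400)]
print(estimate_pitch(tone, sample_rate))  # a 200 Hz test tone
```

The pitch period in samples is the winning lag; dividing the sample rate by it gives the frequency in Hz.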

Page 27: Design and Implementation of Voice Conversion Application (VOCAL)

ANALYSIS OR MODELING: Pitch Period Computation

Glottal Pulse Computation: repeated pattern of voiced sound

τ: glottal pulse

Page 28: Design and Implementation of Voice Conversion Application (VOCAL)

ANALYSIS OR MODELING: Pitch Period Computation

Pitch Tier Computation: the number of points equals the number of voiced frames in the pitch contour obtained in the previous step

Page 29: Design and Implementation of Voice Conversion Application (VOCAL)

VOICE CONVERSION METHOD: Main Process (flow chart: see Page 30)

SEGMENTATION

ANALYSIS or MODELING

TRANSFORMATION

SYNTHESIS

Page 30: Design and Implementation of Voice Conversion Application (VOCAL)

TRANSFORMATION: Transform the speech parameters obtained

Transformation: Extract pitch from the target chunk (the original target chunk, before resampling)

Extract pitch from the filtered source

Replace pitch

Return

Page 31: Design and Implementation of Voice Conversion Application (VOCAL)

SYNTHESIS: Main Process (flow chart: see Page 30)

SEGMENTATION

ANALYSIS or MODELING

TRANSFORMATION

SYNTHESIS

Page 32: Design and Implementation of Voice Conversion Application (VOCAL)

SYNTHESIS (flow chart: see Page 46)

Use the LPC filter to reconstruct the transformed speech
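Reconstruction with an LPC filter amounts to running an excitation signal through an all-pole filter. A minimal Python sketch, assuming the convention that each output sample is the excitation sample plus the sum of a_k times the output k samples earlier; the name `lpc_synthesize` and the toy values are hypothetical.

```python
def lpc_synthesize(excitation, coeffs):
    """All-pole filter: s[n] = e[n] + sum over k of a_k * s[n - k]."""
    out = []
    for n, e in enumerate(excitation):
        value = e
        for k, a in enumerate(coeffs, start=1):
            if n - k >= 0:
                value += a * out[n - k]
        out.append(value)
    return out


# A single pulse through a one-pole filter decays geometrically.
print(lpc_synthesize([1.0, 0.0, 0.0, 0.0], [0.5]))
```

Feeding in an excitation from one speaker together with filter coefficients from another is the usual way such a filter is used for conversion; which signal supplies which part in VOCAL is shown only in the flow chart.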

Page 33: Design and Implementation of Voice Conversion Application (VOCAL)

EXPERIMENTAL RESULT

Page 34: Design and Implementation of Voice Conversion Application (VOCAL)

TESTING: Effect of the choice of recording hardware

Microphone: Philips PC Headset (SHM7410U/1)

Soundcard: Realtek HD Audio

Page 35: Design and Implementation of Voice Conversion Application (VOCAL)

TESTING: Effect of the choice of recording hardware

Microphone: Shure Beta 58

Soundcard: Realtek HD Audio

Page 36: Design and Implementation of Voice Conversion Application (VOCAL)

TESTING: Effect of the choice of recording hardware

Microphone: Shure Beta 58

Soundcard: EMU0404

Page 37: Design and Implementation of Voice Conversion Application (VOCAL)

TESTING: Test on segmentation

Speech: “Hai” from 4 different speakers

Page 38: Design and Implementation of Voice Conversion Application (VOCAL)

TESTING: Test on segmentation

Speech: “Hai” from 4 (four) different speakers

Page 39: Design and Implementation of Voice Conversion Application (VOCAL)

TESTING: Test on segmentation

Speech: “Hai” from 4 (four) different speakers

Percentage result: for speech with only 1 (one) syllable: 100% success

Page 40: Design and Implementation of Voice Conversion Application (VOCAL)

TESTING: Test on segmentation

Speech: “Saya” from 4 different speakers

Page 41: Design and Implementation of Voice Conversion Application (VOCAL)

TESTING: Test on segmentation

Speech: “Saya” from 4 different speakers

Page 42: Design and Implementation of Voice Conversion Application (VOCAL)

TESTING: Test on segmentation

Speech: “Saya” from 4 (four) different speakers

Percentage result: for speech with 2 (two) syllables spoken without a pause: 0% success (all detected as 1 (one) syllable only)

But it works well in the application: 100% success

Page 43: Design and Implementation of Voice Conversion Application (VOCAL)

TESTING: Test on segmentation

Speech: “Sistem Cerdas” from 4 different speakers

Page 44: Design and Implementation of Voice Conversion Application (VOCAL)

TESTING: Test on segmentation

Speech: “Sistem Cerdas” from 4 different speakers

Page 45: Design and Implementation of Voice Conversion Application (VOCAL)

TESTING: Test on segmentation

Speech: “Sistem Cerdas” from 4 (four) different speakers

Percentage result: for speech with more complex forms: 50% success

Related to speaker variability

Page 46: Design and Implementation of Voice Conversion Application (VOCAL)

TESTING: Test on pitch modification

No  Utterance  Source speaker  Source freq (Hz)  Target speaker  Target freq (Hz)  Converted (Hz)
1   Good       Kath            242.04            Liz             266.95            263.16
2   Hai        Kath            227.09            Zefan           176.26            172.41
3   Saya       Liz             259.11            Will            170.14            172.41
4   Hallo      Zefan           162.01            Liz             100.18            100
5   A          Will            151.44            Zefan           191.57            188.68

Page 47: Design and Implementation of Voice Conversion Application (VOCAL)

TESTING: Test on pitch modification

No  Utterance  Target speaker  Target freq (Hz)  Converted (Hz)  Success rate
1   Good       Liz             266.95            263.16          98.58 %
2   Hai        Zefan           176.26            172.41          97.82 %
3   Saya       Will            170.14            172.41          98.66 %
4   Hallo      Liz             100.18            100             99.82 %
5   A          Zefan           191.57            188.68          98.49 %

Average percentage result: 98.67 %
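The slides do not state how the success rate is computed. A plausible reading, assumed here, is 100 × (1 − |converted − target| / target). The Python sketch below checks that assumption against the listed values; the formula and the name `success_rate` are not from the source.

```python
# (utterance, target Hz, converted Hz, success rate listed on the slide)
rows = [
    ("Good", 266.95, 263.16, 98.58),
    ("Hai", 176.26, 172.41, 97.82),
    ("Saya", 170.14, 172.41, 98.66),
    ("Hallo", 100.18, 100.0, 99.82),
    ("A", 191.57, 188.68, 98.49),
]


def success_rate(target_hz, converted_hz):
    """Assumed formula: 100 minus the relative pitch error in percent."""
    return 100.0 * (1.0 - abs(converted_hz - target_hz) / target_hz)


rates = [success_rate(target, converted) for _, target, converted, _ in rows]
for (name, _, _, listed), rate in zip(rows, rates):
    print(f"{name}: computed {rate:.4f} %, listed {listed} %")

average = sum(rates) / len(rates)
print(f"average: {average:.2f} %")
```

Under this assumption every computed value lies within 0.01 percentage point of the listed one, and the average rounds to the listed 98.67 %.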

Page 48: Design and Implementation of Voice Conversion Application (VOCAL)

TESTING: Subjectivity Test

Similarity (based on human auditory perception): test on 20 people, 5 utterances

Overall result: 3.71 of 5.0

Utterance  Source  Target  Avg. score
Good       Kath    Liz     3.55 of 5.0
Hai        Kath    Zefan   4.1 of 5.0
Saya       Liz     Will    3.4 of 5.0
Hallo      Zefan   Liz     3.65 of 5.0
A          Will    Zefan   3.85 of 5.0
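As a quick arithmetic check, the overall result matches the plain mean of the five per-utterance averages. That the overall figure was computed this way is an assumption; the slides give only the numbers.

```python
# Per-utterance average scores from the similarity test (out of 5.0).
avg_scores = {"Good": 3.55, "Hai": 4.1, "Saya": 3.4, "Hallo": 3.65, "A": 3.85}

overall = sum(avg_scores.values()) / len(avg_scores)
print(f"overall: {overall:.2f} of 5.0")
```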

Page 49: Design and Implementation of Voice Conversion Application (VOCAL)

TESTING: Subjectivity Test

Based on gender: test on 22 people, 2 utterances, 4 gender combinations for each utterance

From    To      Overall rank
Female  Female  2.591
Female  Male    1.818
Male    Female  2.727
Male    Male    2.864

Page 50: Design and Implementation of Voice Conversion Application (VOCAL)

TESTING: Subjectivity Test

Similarity of speaker characteristics: test on 22 people, 5 utterances

Overall result: 3.64 of 5.0

No  Utterance  Source speaker  Target speaker  Avg. score
1   Carike     Leonita         Daniel          3.29 of 5.0
2   Mboh yo    Daniel          Leonita         3.95 of 5.0
3   Ndek mana  Melinda         Indro           4.16 of 5.0
4   Ra mangan  Melinda         Angela          3.41 of 5.0
5   Ya toh     Indro           Liz             3.36 of 5.0

Page 51: Design and Implementation of Voice Conversion Application (VOCAL)

CONCLUSION: Conclusions from the experimental results

Segmentation is fairly effective for certain speech; the result depends on the input speech, which can be very diverse

For segmentation, longer speech results in a lower success rate

Segmentation affects the conversion result

Page 52: Design and Implementation of Voice Conversion Application (VOCAL)

CONCLUSION: Conclusions from the experimental results

The pitch modification calculation works successfully (average success rate 98.67 %)

The system is fairly effective at imitating certain target speakers (average score 3.71 of 5.0)

Female-to-male conversion gives the best results (overall rank 1.818 of 4.0)

Speaker characteristics are fairly well recognized by auditory perception (overall score 3.64 of 5.0)

Page 53: Design and Implementation of Voice Conversion Application (VOCAL)

SUGGESTION: For future development

Semi-automatic segmentation is needed for better results

Currently, the system only converts 2 voices saying the same word or phrase (text-dependent). A neural network is needed to build a text-independent system

A real-time system is possible

More research on frequency-domain processing

Page 54: Design and Implementation of Voice Conversion Application (VOCAL)

Design and Implementation of Voice Conversion Application (VOCAL)

THANKS FOR YOUR ATTENTION