Non-linear speech processing: overview of COST-277 current research

Nonlinear speech processing (NOLISP)

Overview of COST-277 current research

Marcos Faúndez-Zanuy

([email protected])

COST-277 Chairman

OUTLINE

1. Overview: what does “nonlinear” mean?

2. Organization of COST-277

3. Activity report, June 2001 – June 2003

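What does “non-linear” mean? (Strict sense)

A system f is non-linear when the superposition principle does not hold.

For a linear system, given f(x1) = y1 and f(x2) = y2:

f(a·x1) = a·y1 and f(x1 + x2) = y1 + y2

A non-linear system violates at least one of these two properties.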
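The two superposition conditions can be checked numerically. The following is a minimal sketch, not taken from the slides: the helper name `satisfies_superposition`, the two-tap averaging filter, and the rounding quantizer are all illustrative assumptions. Passing the check on a few test inputs does not prove linearity; failing it does prove non-linearity.

```python
import numpy as np

def satisfies_superposition(f, x1, x2, a=2.5, tol=1e-9):
    """Return True if f passes both linearity checks on these two inputs."""
    homogeneous = np.allclose(f(a * x1), a * f(x1), atol=tol)   # f(a x1) = a f(x1)
    additive = np.allclose(f(x1 + x2), f(x1) + f(x2), atol=tol)  # f(x1 + x2) = f(x1) + f(x2)
    return homogeneous and additive

def fir_filter(x):
    # Linear system (illustrative): two-tap moving average.
    return np.convolve(x, [0.5, 0.5])

def quantizer(x):
    # Non-linear system (illustrative): rounding to the nearest integer level.
    return np.round(x)

rng = np.random.default_rng(0)
x1 = rng.normal(size=64)
x2 = rng.normal(size=64)

fir_is_linear = satisfies_superposition(fir_filter, x1, x2)
quantizer_is_linear = satisfies_superposition(quantizer, x1, x2)
print(fir_is_linear, quantizer_is_linear)
```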

What does “non-linear” mean? Strict sense: almost “everything” is really nonlinear

Acquisition: quantizer (linear, A-law, etc.)

Parameterization: cepstrum

Models: HMM, VQ

[Figure: input-output characteristic of a uniform 3-bit quantizer; input axis from -8 to 8, output levels from -4 to 3]

$\mathrm{cepstrum}\{x(n)\} = F^{-1}\{\log(F\{x(n)\})\}$
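The cepstrum is non-linear because of the logarithm between the two transforms. The sketch below is an illustration, not the slide's exact definition: it computes the real cepstrum, which takes the logarithm of the spectral magnitude, and the function name `real_cepstrum` and the small constant `eps` are assumptions. Scaling the input by 2 does not scale the cepstrum by 2; it only adds log 2 to the first coefficient.

```python
import numpy as np

def real_cepstrum(x, eps=1e-12):
    """Real cepstrum: inverse FFT of the log magnitude spectrum.

    eps (an assumption of this sketch) avoids log(0) on empty bins.
    """
    spectrum = np.fft.fft(x)
    log_magnitude = np.log(np.abs(spectrum) + eps)
    return np.fft.ifft(log_magnitude).real

rng = np.random.default_rng(0)
frame = rng.normal(size=256)

c1 = real_cepstrum(frame)
c2 = real_cepstrum(2.0 * frame)

# Doubling the input shifts only the zeroth coefficient by log(2),
# so the mapping does not obey f(a x) = a f(x).
print(c2[0] - c1[0], np.log(2.0))
```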

Non-linearities are always present

Nonlinearities of the systems that generate the signal and/or noise

Nonlinearities of the signal acquisition system

Nonlinearities of the transmission channel

Nonlinearities of the human perception mechanism

Classical approach. Wide sense: linear speech processing

The speech signal model consists of a pulse/noise source and a linear filter, both of which change their characteristics on a frame-by-frame basis.

This approach neglects structure known to be present in the speech signal.

Evidence of nonlinearities

Residue comparison

Correlation dimension

Higher order statistics

Probability density functions

Example: Linear vs NL

Drawbacks of NOLISP approaches

Lack of a unifying theory covering the different nonlinear processing tools (neural nets, homomorphic, polynomial, morphological, order statistics filters, and so on)

High computational burden

Well-known analysis tools are not applicable

Usually, a closed-form formulation does not exist, and iterative methods (with local minima problems) must be used.

What are we mainly looking for?

The replacement of the linear filter (or parts thereof) with nonlinear operators (models) should enable us to obtain an accurate description of the speech signal with a lower number of parameters. This in turn should lead to better performance of practical speech processing applications.

What is COST?

Intergovernmental Cooperation

– Created in 1971

– 17 Scientific and Technical Domains

Participation

– 33 COST Countries

– European Commission

– International Organisations

– Organizations from Non-COST Countries on Mutual Benefit Basis

COST Actions

– Concerted Actions of Nationally Funded R&D

COST TIST: Telecommunications, Information Science and Technologies

COST Countries

The fifteen EU Member States

The EFTA Member States

Iceland

Norway

Switzerland

Central and Eastern countries

Estonia

Latvia

Lithuania

Poland

Czech Republic

Slovakia

Slovenia

Croatia

Romania

Bulgaria

Other countries

Cyprus

Malta

Turkey

Hungary

Evolution of COST Actions

[Figure: number of COST Actions per year, 1980 to 2000 (vertical axis from 0 to 250); series: Total Actions, Starting Actions]

WHAT IS A COST ACTION?

Concerted Action

Pan-European

“NON-COMPETITIVE” Research

R&D Financed Nationally

Flexibility

Bottom-up

A la carte participation

Commission funds only coordination activities

COST Senior Officials (CSO)

Responsible for the overall strategy of COST

Decides on the launching of each individual COST Action

Approves participation of institutes from non-COST countries

Approves prolongation of COST Actions

COST Technical Committee (TC)

Selection of new COST Actions

Monitoring of ongoing COST Actions

Evaluation of completed COST Actions

Dissemination and Valorisation of COST activities

Provide Advice to EC on Budget Planning

Management Committee (MC)

Supervises and coordinates the implementation of the Action

Composed of:

– Maximum two representatives of each signatory country; they ensure the scientific coordination at national level

– One representative of any non-COST institution admitted to participate

– The Scientific Secretary

– Representatives of the Commission services

Each signatory has one vote

Working Group (WG)

Small number of researchers per working group

Working group members may be:

– Management Committee members

– Other scientists from the signatory countries

COST TIST

~28 Actions, ~2000 Organisations

Covering Basic Research on

– Antennas and Radio Propagation

– Satellite Technologies and Services

– Mobile Technologies and Services

– Optical Networking Components and Services

– Internet & Multimedia Network Services

– Speech Technologies

– Information and Computer Science

Strong Relationship with IST Program

Evolution of COST TIST Actions

[Figure: number of COST TIST Actions, 1996 to 2000 (vertical axis from 0 to 30); series: Total Actions, Starting Actions]

COST TIST Research Domains & Actions

Special Needs & User Requirements: COST 219bis, 269

Antennas/Radio Propagation: COST 244bis, 255, 260, 261, 271

Mobile & Personal Comm.: COST 259, 273

Satellite Tech. & Services: COST 272

Optical Networking: COST 265, 266, 267, 268, 270

New Internet & Multimedia Services: COST 211 Quad, 256, 257, 263, 264, 269, 275, 279

Speech Technologies: COST 258, 277, 278

Information & Computer Science: COST 274, 276

Other COST Actions in Speech Technologies

COST 275: Biometrics-Based Recognition of People over the Internet

– Involves the use of both voice and face recognition for user authentication over the Internet

COST 278: Spoken Language Interaction in Telecommunications

– Improve knowledge regarding issues and problems related to spoken language interaction, including robustness and multi-lingual aspects

– Human-computer interaction using spoken language in multi-modal context, including dialogue theories and application evaluation

Relationship between COST Actions 275, 277 and 278

[Diagram relating three Actions (275: Biometrics-based Recognition of People over the Internet; 277: Non-linear Speech Processing; 278: Spoken Language Interaction in Telecommunication) across three layers: Application Fields, Interface Components, Generic Functions. Blocks shown: Speaker Recognition, Speech Recognition, Natural Language Processing, Multi-Modality & Data Fusion, Speech Analysis & Coding, Image Analysis & Graphics, Speech Synthesis, Dialogue.]

GRANT CONTRACTS

COST TIST support is provided through annual Grant Contracts with the coordinating organisation

Contract covers costs for:

– Secretariat (manpower to cover administration)

– Meetings (WG and MC)

– Seminars and workshops

– Short Term Scientific Missions

– Publications

SECRETARIAT

Contract Management, Payments

Reimbursement of Meetings

Rebuilding of WWW site

– Repository of Official Documents

– TC and Action Activities and Events

Enhancing Dissemination

– News Letter

– Central Index and Storage of Reports for Retrieval

Links with EC (IST) and National Programmes

Overview: COST-277

[Diagram: synthetic speech, human speech, coded speech and written speech connected through discrete models by analysis, synthesis, recognition and coding; paths labelled TtS, StT, StC and CtS. © ukl 2002]

Organization

Chair: Marcos Faúndez

Vice-Chair: Gernot Kubin

Secretary: Stephen McLaughlin

– WG1: Bastiaan Kleijn

– WG2: Bojan Petek

– WG3: Stephen McLaughlin

– WG4: Gerard Chollet

Countries

Austria Belgium Czech Republic France Germany Greece Ireland Italy Lithuania Portugal Slovakia Slovenia Spain Sweden Switzerland UK

Canada

Dissemination of info

e-mail distribution list:

[email protected]

Subscribe/unsubscribe [email protected]

Website:

http://www.ee.ed.ac.uk/cost277/

Future Meetings of the management committee

Publications and reports

International Journal of Control and Intelligent Systems, special issue on non-linear speech processing techniques and applications (ACTA Press). Invited editor: A. Hussain (COST-277 MC member)

Special sessions in EUSIPCO’02, IWANN’01, IWANN’03, EUSIPCO’04 (TBC)

COST Actions in Speech Technologies

COST 275: Biometrics-Based Recognition of People over the Internet

– Involves the use of both voice and face recognition for user authentication over the Internet

COST 277: Nonlinear speech processing

COST 278: Spoken Language Interaction in Telecommunications

– Improve knowledge regarding issues and problems related to spoken language interaction, including robustness and multi-lingual aspects

– Human-computer interaction using spoken language in multi-modal context, including dialogue theories and application evaluation

COST-277: A different approach

“The four classical areas of speech processing:

Speech Recognition (Speech-to-Text, StT)

Speech Synthesis (Text-to-Speech, TtS and Code-to-Speech, CtS)

Speech Coding (Speech-to-Code, StC with CtS) and

Speaker Verification and Identification (SV)

have all developed their own methodology almost independently from the neighboring areas. This has led to a plurality of tools and methods that are hard to integrate to any small multifunctional speech processing system (a mobile phone performing speaker verification and continuous speech recognition in addition to speech coding should have many separate processes running in parallel).”

Relations between different fields

[Diagram: synthetic speech, human speech, coded speech and written speech connected through discrete models by analysis, synthesis, recognition and coding; paths labelled TtS, StT, StC and CtS. © ukl 2002]

COST-277: Non-linear speech processing

PROGRESS REPORT

Period: June 2001 to June 2003

Speech coding

LINEAR PREDICTION

Scalar linear prediction: AR modeling of order P,

$x[n] = \sum_{i=1}^{P} a_i \, x[n-i] + e[n]$

where $a_i$ are the scalar prediction coefficients, obtained with the Levinson-Durbin recursion.

Vectorial linear prediction: AR-vector modeling of order P,

$\mathbf{x}[n] = \sum_{i=1}^{P} A_i \, \mathbf{x}[n-i] + \mathbf{e}[n]$

where $\{A_i\}_{i=1,\dots,P}$ are $m \times m$ matrices.
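The scalar case can be sketched in a few lines. The code below is an illustrative implementation of the Levinson-Durbin recursion under the sign convention x[n] ≈ Σ a_i x[n-i]. The function name, the synthetic AR(2) test signal, and its coefficients (1.3, -0.6) are assumptions chosen for the example, not values from the slides.

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations recursively.

    r: autocorrelation values r[0], ..., r[order].
    Returns (a, err) where x[n] is predicted as sum_i a[i-1] * x[n-i]
    and err is the final prediction error energy.
    """
    a = np.zeros(order)
    err = r[0]
    for m in range(order):
        # Reflection coefficient for stage m + 1.
        k = (r[m + 1] - np.dot(a[:m], r[m:0:-1])) / err
        a_prev = a[:m].copy()
        a[:m] = a_prev - k * a_prev[::-1]  # update the lower-order coefficients
        a[m] = k
        err *= 1.0 - k * k
    return a, err

# Synthetic AR(2) signal with known (assumed) coefficients.
rng = np.random.default_rng(0)
true_a = np.array([1.3, -0.6])
e = rng.normal(size=20000)
x = np.zeros_like(e)
for n in range(2, len(x)):
    x[n] = true_a[0] * x[n - 1] + true_a[1] * x[n - 2] + e[n]

order = 2
r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
a_hat, residual_energy = levinson_durbin(r, order)
print(a_hat)  # close to true_a for a long enough signal
```

The recursion costs on the order of P² operations, whereas solving the same Toeplitz system by general elimination costs on the order of P³.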

NL SCALAR PREDICTION WITH NNET

[Figure: multilayer perceptron with an input layer, a hidden layer and an output layer; inputs: x[n-p], x[n-p+1], ..., x[n-1]; output: x[n]]
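As a rough illustration of why a network predictor can outperform a linear one, the sketch below predicts a deterministic non-linear series (the logistic map) from its p previous samples. It is not the training method of the slides: to keep the example short and deterministic, the hidden layer uses fixed random weights and only the output layer is fitted by least squares, instead of backpropagation. The series, layer size, and all names are assumptions, and the errors are measured on the training data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Test signal (assumed for illustration): logistic map, a non-linear series.
x = np.empty(2000)
x[0] = 0.3
for n in range(1, len(x)):
    x[n] = 4.0 * x[n - 1] * (1.0 - x[n - 1])

p = 2  # number of past samples fed to the predictor
# Each row holds [x[n-p], ..., x[n-1]]; the target is x[n].
inputs = np.column_stack([x[i:len(x) - p + i] for i in range(p)])
target = x[p:]
ones = np.ones((len(target), 1))

# Linear predictor with bias, fitted by least squares.
lin_design = np.hstack([inputs, ones])
lin_w, *_ = np.linalg.lstsq(lin_design, target, rcond=None)
mse_lin = np.mean((lin_design @ lin_w - target) ** 2)

# One-hidden-layer network: random tanh hidden units, least-squares output layer.
hidden_units = 30
W = rng.normal(size=(p, hidden_units))
b = rng.normal(size=hidden_units)
hidden = np.tanh(inputs @ W + b)
nl_design = np.hstack([hidden, ones])
nl_w, *_ = np.linalg.lstsq(nl_design, target, rcond=None)
mse_nl = np.mean((nl_design @ nl_w - target) ** 2)

print(mse_lin, mse_nl)
```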

NL VECTORIAL PREDICTION WITH NNET

[Figure: multilayer perceptron with an input layer, a hidden layer and an output layer; inputs: x[n-p], x[n-p+1], ..., x[n-1]; outputs: x[n], x[n+1]]

ADPCM NNET PREDICTION

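[Block diagram of the ADPCM coder. Blocks: quantizer Q, inverse quantizer Q^-1, predictors MLP1, MLP2, ..., MLPN, and a combiner (COM.). Signals: input x[n], difference d[n], codeword c[n], dequantized difference d~[n], prediction x^[n], reconstructed signal x~[n], and individual predictor outputs x~1[n], ..., x~N[n].]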
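The ADPCM loop itself is independent of the predictor type. The sketch below shows a backward-adaptive loop with a pluggable `predict` function. A simple linear extrapolator stands in for the MLP predictors and combiner of the diagram, and the step size, bit count, and test tone are assumptions for illustration only.

```python
import numpy as np

def adpcm_codec(x, predict, bits=3, step=0.05, order=2):
    """Encode x with a uniform quantizer on the prediction error.

    predict: function mapping the last `order` reconstructed samples
             (oldest first) to a prediction of the current sample.
    Returns (codes, reconstruction); the decoder can rebuild the same
    reconstruction from the codes alone because the predictor only
    sees previously reconstructed samples.
    """
    lo, hi = -(2 ** bits) // 2, (2 ** bits) // 2 - 1
    recon = np.zeros(len(x) + order)  # zero initial history
    codes = np.zeros(len(x), dtype=int)
    for n in range(len(x)):
        x_hat = predict(recon[n:n + order])             # prediction from the past
        d = x[n] - x_hat                                # prediction error
        codes[n] = int(np.clip(np.round(d / step), lo, hi))  # quantizer Q
        recon[n + order] = x_hat + codes[n] * step      # Q^-1 plus prediction
    return codes, recon[order:]

def linear_extrapolation(history):
    # Stand-in predictor (assumption): straight line through the last two samples.
    return 2.0 * history[-1] - history[-2]

n = np.arange(200)
x = 0.5 * np.sin(2 * np.pi * n / 50)
step = 0.05
codes, x_rec = adpcm_codec(x, linear_extrapolation, bits=3, step=step)
print(np.max(np.abs(x - x_rec)))
```

Replacing `linear_extrapolation` with any other function of the reconstructed history, such as a trained network, leaves the rest of the loop unchanged.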

VECTORIAL NL-ADPCM RESULTS

[Figure: SEGSNR (vertical axis, 6 to 26) versus bits per sample (1 to 4); curves: 1D, 2D, 3D, 4D, 5D]

Very low bit rate speech coder

Demonstration

Broadcast news audio segmentation, classification, clustering and speech recognition

Demonstration

Available at http://193.126.86.80

SPEAKER RECOGNITION

Current systems rely on low-level information in speech.

– Short time extent analysis windows (20-30 ms)

– Spectral energy based (MFCC)

Another possibility: high-level information

– Speaking rate

– Pitch patterns

– Word/phrase usage

– Idiosyncratic pronunciation

SPEAKER RECOGNITION: Possibilities of NOLISP

Low level information:

– Non-linear predictive models instead of LPCC

– Parameters: fractal, Lyapunov exponents, correlation dimension, etc.

High level information:

– To take advantage of the other working groups. For instance, intonation is fundamental in speech synthesis and useful for speaker recognition.

Why use NL models?

Listening to the residual signal of an LPC analysis, it is possible to identify who is speaking.

– Usually the residual signal is discarded.

– NL models offer a better fit and a whiter residual signal.

NL models can offer an improvement in coding and synthesis, so there is room for speaker recognition improvement.

BANDWIDTH EXTENSION: An example of NL processing

A speech signal that has passed through the public switched telephone network (PSTN) generally has a frequency range limited to between 0.3 and 3.4 kHz.

Bandwidth extension algorithms aim to recover the lost low (0 to 0.3 kHz) and/or high (3.4 to 8 kHz) frequency bands, given only the narrow-band speech signal.
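Bandwidth extension needs a non-linear step because a linear time-invariant filter cannot create frequencies that are absent from its input. The sketch below is a toy illustration of that point, not the algorithm used in the Action: a 3 kHz tone inside the telephone band is full-wave rectified, which produces energy above 3.4 kHz. The sampling rate, tone frequency, and helper name `band_energy` are assumptions.

```python
import numpy as np

fs = 16000                     # sampling rate in Hz (assumed)
n = np.arange(1600)            # 0.1 s of signal
narrowband = np.sin(2 * np.pi * 3000 * n / fs)  # tone inside the 0.3-3.4 kHz band

# Memoryless non-linearity: full-wave rectification creates harmonics
# (for example at 6 kHz) that the narrow-band input does not contain.
rectified = np.abs(narrowband)

def band_energy(signal, f_lo, f_hi):
    """Energy of the signal between f_lo (inclusive) and f_hi (exclusive), in Hz."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return power[(freqs >= f_lo) & (freqs < f_hi)].sum()

high_before = band_energy(narrowband, 3400, 8000)
high_after = band_energy(rectified, 3400, 8000)
print(high_before, high_after)
```

A practical system would still have to shape the generated high band so that its spectral envelope is plausible for speech; the rectifier only shows where new frequency content can come from.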

SPECTRAL BAND REPLICATION

[Figure: spectra at successive stages from initial to final, on frequency axes marked 0, fs/8, fs/4 and fs/2 (f in kHz, ticks at 5 and 10), including a low-pass filter (LPF) stage]

BANDWIDTH EXTENSION

Databases:

– Original fullband: [0.3, 7] kHz

– Narrow band: [0.3, 3.4] kHz

– Bandwidth extended: [0.3, 7] kHz

[Diagram: an LPF block and a bandwidth extension block link the three databases]

MIC database: DCF for several MELCEPS-l

[Figure: DCF (vertical axis, 0.02 to 0.09) versus l (8 to 26) with MELCEPS parameterization; curves: [0, 8] kHz, [0.3, 3.4] kHz, and [0.3, 8] kHz BWext]
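The slides do not define the DCF. Assuming it is the detection cost function commonly used in speaker verification evaluations, it weights the miss and false-alarm rates by their costs and by the prior probability of a target speaker. The sketch below uses cost values that are conventional in NIST evaluations (C_miss = 10, C_fa = 1, P_target = 0.01); these values and the example error rates are assumptions and may differ from those behind the plotted results.

```python
def detection_cost(p_miss, p_fa, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Detection cost function for a verification system.

    p_miss: probability of rejecting a true speaker.
    p_fa:   probability of accepting an impostor.
    The default costs and prior are assumed values, not taken from the slides.
    """
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)

# Hypothetical operating point: 10 % misses and 2 % false alarms.
example_dcf = detection_cost(p_miss=0.10, p_fa=0.02)
print(example_dcf)
```

Lower values are better, so a curve lying below another in the plot indicates a better trade-off at the chosen operating point.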

Bandwidth extension

For human beings, recognition is easier with full-band signals.

No new information is added

Experimental results reveal that:

– The bandwidth extension algorithm does not introduce any damaging artifacts

– With MELCEPS parameterization, the results are better than using the narrow band signal.