Speech Recognition Using FPGA Technology


description

David, Kanwen, and Carlos implemented a speech recognition system on an FPGA development board (Altera DE2 board) for the Design Project course at McGill (ECSE 494).

Transcript of Speech Recognition Using FPGA Technology

Page 1: Speech Recognition Using FPGA Technology


Carlos Asmat – David López Sanzò – Kanwen Wu

Speech Recognition Using FPGA Technology

By

Carlos Asmat 260148251

David López Sansò 260146414

Kanwen Wu 260045745

Presentation Date: Wednesday, June 6, 2007

Project Supervisor: Prof. Miguel Marin

Project Coordinator: Prof. Kenneth L. Fraser

Page 2: Speech Recognition Using FPGA Technology


Outline

1) Introduction

2) MATLAB™ Demonstration

3) Hardware Implementation

4) Hardware Demonstration

5) Final remarks

Page 3: Speech Recognition Using FPGA Technology


What is speech recognition?

● Convert analog sound into binary digits.

● Compare with the pre-stored word.

● Not to be confused with speaker recognition.

Introduction ● Hardware Implementation ● Demo ● Final Remarks

Page 4: Speech Recognition Using FPGA Technology


Speech Recognition Performance

● Priority: Accuracy and Reliability.

● Consumer products.


Page 5: Speech Recognition Using FPGA Technology


Objectives

● Hardware implementation of a simple speech recognition system.

● Single word identification.

● Cost efficiency, reliability, and simplicity are the major considerations.


Page 6: Speech Recognition Using FPGA Technology


Background Theory

● The sound identification is based on its frequency content.

● Two steps:

➔ Training

➔ Recognition


Page 7: Speech Recognition Using FPGA Technology


Background Theory

● A MATLAB™ implementation was devised to assess the project feasibility.

● Two files were produced:

➔ train.m

➔ recogniz.m


Page 8: Speech Recognition Using FPGA Technology


Background Theory

● Training:

➔ Input several versions of a sound.

➔ Translate them to the frequency domain by using the FFT.

➔ Average their amplitude in the frequency domain.

● This produces the sound's fingerprint.
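As a rough illustration of the training steps above, the following Python sketch averages the magnitude spectra of several recordings of the same sound. It is not the authors' train.m: the names `dft_magnitudes` and `fingerprint` are hypothetical, and a naive DFT on short toy signals stands in for the 1024-point FFT.

```python
import cmath

def dft_magnitudes(samples):
    """Return |X_k| for k = 0..N-1 using a naive O(N^2) DFT."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, x in enumerate(samples)))
        for k in range(n)
    ]

def fingerprint(versions):
    """Average the magnitude spectra of several equal-length recordings."""
    spectra = [dft_magnitudes(v) for v in versions]
    count = len(spectra)
    return [sum(column) / count for column in zip(*spectra)]

# Two toy "recordings" of the same sound (8 samples each, assumed data).
versions = [
    [0, 1, 0, -1, 0, 1, 0, -1],
    [0, 2, 0, -2, 0, 2, 0, -2],
]
reference = fingerprint(versions)
print([round(v, 3) for v in reference])
```

Averaging magnitudes rather than complex values means that recordings that start at slightly different phases still reinforce each other in the fingerprint.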


Page 9: Speech Recognition Using FPGA Technology


Background Theory

● Note on the FFT:

➔ Only half of it is used.

➔ Five 1024-point FFTs are performed per sound sample.

$$X_k = \sum_{n=0}^{N-1} x_n e^{-\frac{2\pi i}{N} nk}, \qquad k = 0, \dots, N-1$$
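One way to see why only half of the FFT output is needed, assuming the input samples x_n are real (as quantized audio samples are): the second half of the spectrum mirrors the first.

```latex
X_{N-k} = \sum_{n=0}^{N-1} x_n e^{-\frac{2\pi i}{N} n (N-k)}
        = \sum_{n=0}^{N-1} x_n e^{\frac{2\pi i}{N} n k}
        = \overline{X_k}
\quad\Longrightarrow\quad
|X_{N-k}| = |X_k|, \qquad k = 1, \dots, N-1
```

For N = 1024, the magnitudes at k = 513, ..., 1023 therefore repeat those at k = 511, ..., 1 and carry no extra information.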


Page 10: Speech Recognition Using FPGA Technology


Background Theory

● User inputs .wav files.

● Decimate and quantize the input sound files.

● Sound acquisition parameters:

➔ Sound samples are quantized down to 8 bits.

➔ The sampling frequency is 5 kHz.

➔ Around one second (1.024s) of sound is stored.
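The acquisition figures above fit together, as a quick Python check shows. The five 1024-point frames come from the note on the FFT; the storage size, Nyquist limit, and bin spacing are derived here and are not stated on the slides.

```python
SAMPLE_RATE_HZ = 5_000   # sampling frequency from the slide
BITS_PER_SAMPLE = 8      # quantized sample width from the slide
FFT_SIZE = 1_024         # points per FFT
FRAMES = 5               # FFTs per sound sample

total_samples = FFT_SIZE * FRAMES
duration_s = total_samples / SAMPLE_RATE_HZ
storage_bytes = total_samples * BITS_PER_SAMPLE // 8
nyquist_hz = SAMPLE_RATE_HZ / 2             # highest representable frequency
bin_spacing_hz = SAMPLE_RATE_HZ / FFT_SIZE  # frequency resolution of one FFT

print(total_samples, duration_s, storage_bytes, nyquist_hz, bin_spacing_hz)
```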


Page 11: Speech Recognition Using FPGA Technology


Background Theory

● Sound detection:

➔ Compute the average of a window.

➔ Compare it to the average of the next window.

➔ If the difference is significant then the sound is assumed to start at that point.
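The detection steps above can be sketched in Python as follows. Several details are assumptions, since the slides do not give them: the "average" is taken as the mean absolute amplitude, the windows do not overlap, and the window size and threshold in the example are arbitrary. The function names are hypothetical.

```python
def window_average(samples):
    """Mean absolute amplitude of one window (assumed definition)."""
    return sum(abs(s) for s in samples) / len(samples)

def find_sound_start(samples, window, threshold):
    """Return the index where the sound is assumed to start, or None."""
    previous = window_average(samples[0:window])
    for start in range(window, len(samples) - window + 1, window):
        current = window_average(samples[start:start + window])
        # A significant jump between consecutive windows marks the start.
        if abs(current - previous) > threshold:
            return start
        previous = current
    return None

# Toy signal: 8 samples of silence followed by 8 samples of a loud tone.
signal = [0] * 8 + [50, -50] * 4
print(find_sound_start(signal, window=4, threshold=10))
```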


Page 12: Speech Recognition Using FPGA Technology


Background Theory

● Sound detection (cont'd):

w1 = w2 = 1024 samples = 0.2048 s

L = 5120 samples = 1.024 s


Page 13: Speech Recognition Using FPGA Technology


Background Theory

● Store detected sound stream into a vector.

● Apply FFT to the above vector's first 1024 points and put it in 's'.

● Store 's' as the first row in the matrix 'x' and repeat with the following 1024 points until there are five rows in 'x'.
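A minimal Python sketch of how the detected stream is cut into the five rows described above. The FFT step is left out here; in the described procedure each row would be transformed before being stored in 'x'. The name `split_into_frames` is hypothetical.

```python
FFT_SIZE = 1024
FRAMES = 5

def split_into_frames(stream, frame_size=FFT_SIZE, frames=FRAMES):
    """Cut the first frames * frame_size samples into consecutive rows."""
    needed = frame_size * frames
    if len(stream) < needed:
        raise ValueError(f"need {needed} samples, got {len(stream)}")
    return [stream[i * frame_size:(i + 1) * frame_size] for i in range(frames)]

stream = list(range(5120))  # stand-in for the detected sound stream
rows = split_into_frames(stream)
print(len(rows), len(rows[0]), rows[1][0])
```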


Page 14: Speech Recognition Using FPGA Technology


Background Theory

● Sound recognition:

➔ Compute the fingerprint of a sound.

➔ Compute the distance between the sound's fingerprint and the reference fingerprint.

➔ If both are close enough, then the sound is assumed to match the reference sound.


Page 15: Speech Recognition Using FPGA Technology


Background Theory

● Note on the distance computation:

➔ The sound's fingerprint and the reference fingerprint are considered as 1024-dimensional vectors.

➔ The distance between them is computed using the Euclidean distance formula:

$$D = \sqrt{\sum_{i=0}^{1023} (a_i - b_i)^2}$$
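The recognition test described on these slides can be sketched in Python as follows. The function names and the threshold parameter are hypothetical, and the four-element vectors are toy data standing in for 1024-element fingerprints.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def matches(fingerprint, reference, threshold):
    """The sound is assumed to match when the distance is small enough."""
    return euclidean_distance(fingerprint, reference) <= threshold

reference = [0.0, 3.0, 0.0, 4.0]
candidate = [0.0, 0.0, 0.0, 0.0]
print(euclidean_distance(candidate, reference))  # 5.0
print(matches(candidate, reference, threshold=6.0))
```

Because the square root is monotonic, a hardware version could compare the squared distance against a squared threshold and skip the root entirely.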


Page 16: Speech Recognition Using FPGA Technology


System Overview


Page 17: Speech Recognition Using FPGA Technology


Hardware Implementation

● Design approach

● A/D Conversion

● Word detector

● FFT

● Memory Management

● Distance Computation


Page 18: Speech Recognition Using FPGA Technology


Design Approach

● Quartus II

➔ VHDL process blocks

➔ Computer-Aided Design

● Datapath/Overall Controller

● Intermediate controllers


Page 19: Speech Recognition Using FPGA Technology


A/D Conversion

Source: http://www.societyofrobots.com/images/analogdigital.jpg

Page 20: Speech Recognition Using FPGA Technology


A/D – Overall Configuration

[Figure: block diagram of the FPGA connected to the Wolfson CODEC. Labels: MCLK, BCLK, LRCLK, ADCDAT, I2C Bus, MASTER, SLAVE.]

Page 21: Speech Recognition Using FPGA Technology


A/D Conversion

● Internal signals set by bus.

➔ De-mute.

➔ Boost mic.

➔ Change path.

[Figure: codec signal path. Labels: LINEIN, MICIN, MUTE, MUTEMIC, INSEL, MUX, A/D, Digital Filters, D/A, ADCDAT, LINEOUT.]

Page 22: Speech Recognition Using FPGA Technology


I2C Bus

● RADDR → Base address = 0011010

● R/W → Read/Write = 0

● B[15-9] → Control Address = 0000100

● B[8-0] → Control Data = 000001101
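To make the bit fields above concrete, this Python sketch packs them into the three bytes that would go out on the bus. It assumes the usual I2C write framing, in which the 7-bit address and the R/W bit form the first byte and the 16-bit control word follows, most significant byte first; the variable names are hypothetical.

```python
DEVICE_ADDRESS = 0b0011010   # RADDR, 7 bits
WRITE = 0                    # R/W bit
CONTROL_ADDRESS = 0b0000100  # B[15-9], 7 bits
CONTROL_DATA = 0b000001101   # B[8-0], 9 bits

address_byte = (DEVICE_ADDRESS << 1) | WRITE
control_word = (CONTROL_ADDRESS << 9) | CONTROL_DATA

# Bytes in transmission order; the codec acknowledges after each one.
frame = [address_byte, control_word >> 8, control_word & 0xFF]
print([f"0x{b:02X}" for b in frame])  # ['0x34', '0x08', '0x0D']
```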


Source: Wolfson WM8731 data sheets, p.43

Page 23: Speech Recognition Using FPGA Technology


I2C Bus

● B[8-0] → Control Data = 000001101

[Figure: control register bit map. Labels: 'INSEL', 'MUTE MIC', 'MIC BOOST'.]

Source: Wolfson WM8731 data sheets, p.43

Page 24: Speech Recognition Using FPGA Technology


I2C Bus – ACK Signal

● ACK signal goes from the Wolfson to the FPGA

➔ Opposite direction from rest of data

➔ Only one data line


Page 25: Speech Recognition Using FPGA Technology


I2C Bus – ACK Signal

Solution...

Page 26: Speech Recognition Using FPGA Technology


I2C Bus – ACK Signal

Solution...

Tri-state buffer!

[Figure: LPM_BUSTRI block (inst). Ports: data[], enabledt, enabletr, result[], tridata[].]
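The reason a tri-state buffer solves the ACK problem can be modelled in a few lines of Python. This is a behavioural sketch, not the authors' VHDL: it assumes the data line idles high through a pull-up and that either side can only pull it low, and the function names are hypothetical.

```python
def sda_level(master_enable, master_bit, slave_pulls_low):
    """Resolve the level seen on the single shared data line."""
    master_drives_low = master_enable and master_bit == 0
    return 0 if (master_drives_low or slave_pulls_low) else 1

def send_byte(byte, slave_acks=True):
    """Return (line levels over 9 clocks, whether an ACK was seen)."""
    levels = []
    for i in range(7, -1, -1):  # 8 data bits, MSB first, buffer enabled
        levels.append(sda_level(True, (byte >> i) & 1, False))
    # Ninth clock: the FPGA disables its buffer (high impedance) and
    # reads the line, so the codec can pull it low to acknowledge.
    ack_level = sda_level(False, 1, slave_acks)
    levels.append(ack_level)
    return levels, ack_level == 0

print(send_byte(0x34))
```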

Page 27: Speech Recognition Using FPGA Technology


A/D – ADCDAT Fetcher

● Clock module

● MSB available on 2nd rising BCLK edge
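A behavioural Python sketch of what such a fetcher has to do. It assumes the bits are sampled on successive rising BCLK edges counted from the LRCLK transition and that each sample is a 24-bit two's complement word sent MSB first; the function name and the list-based interface are hypothetical.

```python
WORD_BITS = 24

def assemble_sample(bits_on_rising_edges, word_bits=WORD_BITS):
    """Build one signed sample from bits sampled on rising BCLK edges.

    bits_on_rising_edges[0] is the bit seen on the first rising edge;
    it is skipped because the MSB only arrives on the second edge.
    """
    data = bits_on_rising_edges[1:1 + word_bits]
    if len(data) < word_bits:
        raise ValueError("not enough bits for one sample")
    value = 0
    for bit in data:  # shift register: MSB enters first
        value = (value << 1) | bit
    if value & (1 << (word_bits - 1)):  # sign bit set: negative number
        value -= 1 << word_bits
    return value

print(assemble_sample([0] + [1] * 24))
print(assemble_sample([1] + [0] * 23 + [1]))
```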


Source: Wolfson WM8731 data sheets, p.34

Page 28: Speech Recognition Using FPGA Technology


Quantization

● Codec output: two's complement

● Quantize 24 bits into 8.

Decimal number   Binary (2's comp.)   Quantized decimal   Quantized binary (2's comp.)
 3               011                   1                  01
 2               010                   1                  01
 1               001                   0                  00
 0               000                   0                  00
-1               111                  -1                  11
-2               110                  -1                  11
-3               101                  -2                  10
-4               100                  -2                  10
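The 3-bit to 2-bit mapping in the table above is what keeping the most significant bits of a two's complement number produces. A minimal Python sketch, with a hypothetical function name, that reproduces the table and applies the same rule to the 24-bit to 8-bit case:

```python
def quantize(sample, in_bits=24, out_bits=8):
    """Keep the out_bits most significant bits of a two's complement sample."""
    low, high = -(1 << (in_bits - 1)), (1 << (in_bits - 1)) - 1
    if not low <= sample <= high:
        raise ValueError("sample does not fit in in_bits")
    # Python's >> on integers is an arithmetic shift, so the sign is kept.
    return sample >> (in_bits - out_bits)

for value in range(3, -5, -1):
    print(value, quantize(value, in_bits=3, out_bits=2))

print(quantize(-(1 << 23)), quantize((1 << 23) - 1))  # 24-bit extremes
```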

Page 29: Speech Recognition Using FPGA Technology


Downsampler

● Implementation

➔ Flip-flop

➔ Counters (and FSM)

[Figure: Downsampler block. Labels: DATA_IN @ 48 kHz, DATA_OUT @ 5 kHz, READY.]
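Since 48 kHz / 5 kHz = 9.6 is not an integer, a plain divide-by-N counter cannot produce the output rate exactly. One counter-based possibility, shown here as a Python sketch and not necessarily the scheme the authors implemented, is a fractional accumulator that keeps 5 samples out of every 48. No anti-aliasing filter is modelled, and the function name is hypothetical.

```python
def downsample(samples, rate_in=48_000, rate_out=5_000):
    """Keep rate_out samples for every rate_in samples received."""
    kept = []
    accumulator = 0
    for sample in samples:
        accumulator += rate_out
        if accumulator >= rate_in:  # a READY pulse would be raised here
            accumulator -= rate_in
            kept.append(sample)
    return kept

one_second = list(range(48_000))  # sample indices stand in for audio data
out = downsample(one_second)
print(len(out))  # 5000
```

The gaps between kept samples alternate between 9 and 10 input periods, which averages out to the required 9.6.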

Page 30: Speech Recognition Using FPGA Technology


Word Detector

● Detects sharp transitions.

[Block diagram with blocks Average, Register 1, Register 2, Absolute Difference, and Comparator; signals DATA_IN, THRESHOLD, and SOUND_STARTS; bus widths shown: 8, 9, 9]
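The blocks named on this slide suggest comparing two successive averages and raising SOUND_STARTS when they differ by more than THRESHOLD. The Python sketch below follows that reading. The window length, the use of sample magnitudes in the average, and the names `WordDetector` and `clock` are assumptions made for illustration.

```python
class WordDetector:
    """Flags a sharp change between two successive block averages."""

    def __init__(self, threshold: int, window: int = 4):
        self.threshold = threshold
        self.window = window  # samples per average (assumed value)
        self.acc = 0          # running sum for the current window
        self.count = 0        # samples seen in the current window
        self.reg1 = 0         # newest average
        self.reg2 = 0         # previous average

    def clock(self, data_in: int) -> bool:
        """Feed one sample; return True when a sharp transition is seen."""
        self.acc += abs(data_in)  # assumption: average the magnitude
        self.count += 1
        if self.count < self.window:
            return False
        average = self.acc // self.window
        self.acc = 0
        self.count = 0
        # Shift the registers, then compare their absolute difference.
        self.reg2, self.reg1 = self.reg1, average
        return abs(self.reg1 - self.reg2) > self.threshold


# Three quiet windows followed by one loud window.
detector = WordDetector(threshold=20, window=4)
samples = [1, -2, 1, 0] * 3 + [90, -100, 95, -85]
flags = [detector.clock(s) for s in samples]
```

In this run only the last sample raises the flag, because that is when the loud window's average is first compared with the quiet one before it.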


Fast Fourier Transform

● Altera IP MegaCore® 1024-point FFT module:

➔ Natural order streaming data input.

➔ Bit-reversed streaming data output.

➔ Low latency.

➔ Time Limited Version.

[Block symbol: FFT (inst1). Inputs: clk, reset_n, inverse, sink_valid, sink_sop, sink_eop, sink_real[7..0], sink_imag[7..0], sink_error[1..0], source_ready. Outputs: sink_ready, source_error[1..0], source_sop, source_eop, source_valid, source_exp[5..0], source_real[7..0], source_imag[7..0]]
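Because the module streams its results in bit-reversed order, any stage that needs frequency bins in natural order has to undo that permutation. The Python sketch below shows the standard index mapping for a power-of-two frame such as the 1024-point one used here. The helper names `bit_reverse` and `to_natural_order` are illustrative, and the slides do not say whether the project reordered the output at all.

```python
def bit_reverse(index: int, bits: int) -> int:
    """Return index with its lowest `bits` bits in reversed order."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def to_natural_order(frame: list) -> list:
    """Reorder a bit-reversed frame so that element k is frequency bin k."""
    n = len(frame)
    bits = n.bit_length() - 1
    if n == 0 or (1 << bits) != n:
        raise ValueError("frame length must be a power of two")
    out = [None] * n
    for position, value in enumerate(frame):
        # The value streamed at `position` belongs to bin bit_reverse(position).
        out[bit_reverse(position, bits)] = value
    return out


# An 8-point frame whose values are their own bin numbers, in streamed order.
natural = to_natural_order([0, 4, 2, 6, 1, 5, 3, 7])
```

For a 1024-point frame the index is 10 bits wide, so the second value streamed out (position 1) is bin 512.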


Memory Management

● Three memory modules:

➔ Flash (4 MB)

➔ SDRAM (8 MB)

➔ SRAM (512 kB)


Memory Management

● 512 kB SRAM memory module

[Block diagram: SRAM chip with Data I/O (16 bits), Address (18 bits), Chip Enable, Write Enable, Output Enable, High Byte Mask, and Low Byte Mask]


Memory Management

● Memory structure: 2^18 blocks of 16 bits, each holding two 8-bit values

Block address (16-bit view)   Byte addresses (8-bit view)
0                             0, 1
1                             2, 3
2                             4, 5
3                             6, 7
...                           ...
262 140                       524 280, 524 281
262 141                       524 282, 524 283
262 142                       524 284, 524 285
262 143                       524 286, 524 287


Memory Management

● Memory Controller:

[Block diagram: Memory Controller. User side: ADDR (19 bits), DATA_IN (8 bits), DATA_OUT (8 bits), MODE, ENABLE. SRAM side: Address (18 bits), Data I/O (16 bits), Chip Enable, Write Enable, Output Enable, High Byte Mask, Low Byte Mask]
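The controller exposes a 19-bit address and 8-bit data while the SRAM takes an 18-bit address and 16-bit data, so each byte address has to be split into a word address and a byte select. The Python sketch below shows one plausible mapping. Which byte lane holds the even address is not given in the slides, so the choice here (even address in the low byte) is an assumption, as are the names `SramAccess`, `map_byte_address`, `read_byte`, and `write_byte`.

```python
from typing import NamedTuple


class SramAccess(NamedTuple):
    word_addr: int    # 18-bit address sent to the SRAM chip
    low_byte: bool    # True when the low byte lane is selected
    high_byte: bool   # True when the high byte lane is selected


def map_byte_address(addr: int) -> SramAccess:
    """Split a 19-bit byte address into a word address and a lane select."""
    if not 0 <= addr < (1 << 19):
        raise ValueError("address must fit in 19 bits")
    is_high = bool(addr & 1)  # assumption: odd addresses use the high byte
    return SramAccess(addr >> 1, not is_high, is_high)


def read_byte(word: int, access: SramAccess) -> int:
    """Pick the selected byte out of a 16-bit word."""
    return (word >> 8) & 0xFF if access.high_byte else word & 0xFF


def write_byte(word: int, value: int, access: SramAccess) -> int:
    """Return the 16-bit word with only the selected byte replaced."""
    if access.high_byte:
        return (word & 0x00FF) | ((value & 0xFF) << 8)
    return (word & 0xFF00) | (value & 0xFF)
```

On the real chip the two byte-mask pins perform the selection in `write_byte`, so the untouched byte never needs to be read back first.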


Memory Management

● Batch Operations:

[Block diagram: Memory Batch Operator, with signals START_ADDR (19 bits), END_ADDR (19 bits), DATA_IN (8 bits), DATA_OUT (8 bits), ADDR (19 bits), MODE, ENABLE, CLK, DATA_READY, MEM_MODE, and MEM_ENABLE]
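The signal names suggest that the batch operator steps an address counter from START_ADDR to END_ADDR and moves one byte per step through the memory controller. The Python sketch below models that behaviour in software. It assumes an inclusive address range and one byte per address; the function names and the `bytearray` standing in for the SRAM are illustrative only.

```python
ADDR_BITS = 19  # width of START_ADDR, END_ADDR and ADDR on the slide


def batch_addresses(start_addr: int, end_addr: int):
    """Yield every address from start_addr to end_addr, inclusive (assumed)."""
    if not 0 <= start_addr <= end_addr < (1 << ADDR_BITS):
        raise ValueError("need 0 <= start_addr <= end_addr < 2**19")
    addr = start_addr  # models the address counter
    while addr <= end_addr:
        yield addr
        addr += 1


def batch_write(memory: bytearray, start_addr: int, end_addr: int, data) -> None:
    """Store one byte from data at each address in the range."""
    for addr, byte in zip(batch_addresses(start_addr, end_addr), data):
        memory[addr] = byte & 0xFF


def batch_read(memory: bytearray, start_addr: int, end_addr: int) -> list:
    """Return the bytes stored in the range, in address order."""
    return [memory[addr] for addr in batch_addresses(start_addr, end_addr)]


# A 512 kB memory model, written and read back over a four-byte range.
memory = bytearray(1 << ADDR_BITS)
batch_write(memory, 0x100, 0x103, [10, 20, 30, 40])
readback = batch_read(memory, 0x100, 0x103)
```

A batch interface like this lets a whole frame of samples or FFT output be stored or fetched with a single start and end address instead of one request per byte.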


Distance Computation

● The distance computation module:

[Block diagram: Distance module with inputs A (8 bits), B (8 bits), RST, ENABLE, and CLK, and output DISTANCE (8 bits)]


Distance Computation

● The distance computation module (cont'd):

[Block diagram: A (8 bits) and B (8 bits) feed a Square Difference block, followed by an Accumulator (with RST and CLK) and a Square Root block that outputs DISTANCE]


Demonstration

[Annotated photo of the DE2 board, with labels: Sound Detection, I2C Done Signal, Threshold Settings, Assign Threshold, Send I2C Configuration, Current Average]

Original image source: http://users.ece.gatech.edu/~hamblen/DE2/DE2.jpg


Final Remarks

● Deficiencies.

● Strengths.

● Potential Improvements.


Deficiencies

● Lack of accuracy.

● Lack of observability.

● Requires complex hardware.

➔ FFT (Nios II)


Strengths

● Fast.

● Trainable.

● The system is not limited to speech.


Potential Improvements

● Recognize several words

● Improve accuracy

● Support variable-length words

● Recognize sentences

➔ Requires a hidden Markov model (HMM) (very complex!)