© Florida Institute of Technology
Speech Processing and Recognition
Access audio data in real time and apply it to speech recognition
Final Exam Project by Hesheng Li
Instructor: Dr. Kepuska, Department of Electrical and Computer Engineering
Overview
Introduction
Three models to access live audio data
How to get audio data using the low-level API model
Application in speech recognition
Comparison and analysis
Conclusion
Introduction
Why? How?
Live audio data access has a wide range of applications!
Three models to access live audio data
High-level Digital Audio API: MCI
DirectSound
Low-level Digital Audio API: WaveX
High-level Digital Audio API: MCI
The media control interface (MCI) provides standard commands for playing multimedia devices and recording multimedia resource files.
There are two different ways to send a command to a device:
1. Command message interface
2. Command string interface
Command message interface
Passing binary values and structures to an audio device is referred to as using the "command message interface".
We use the function mciSendCommand() to send commands using this approach.
Example:
MCI_OPEN_PARMS waveParams = {0};
/* MCI_OPEN_TYPE_ID requires the device type to be given as an ID */
waveParams.lpstrDeviceType  = (LPCSTR)MCI_DEVTYPE_WAVEFORM_AUDIO;
waveParams.lpstrElementName = "C:\\WINDOWS\\CHORD.WAV";
mciSendCommand(0, MCI_OPEN,
               MCI_WAIT | MCI_OPEN_ELEMENT | MCI_OPEN_TYPE | MCI_OPEN_TYPE_ID,
               (DWORD_PTR)&waveParams);
Command string interface
Passing strings to an audio device is referred to as using the "command string interface". We use the function mciSendString() to send commands using this approach.
Example:
mciSendString("open C:\\WINDOWS\\CHORD.WAV type waveaudio alias A_Chord", NULL, 0, NULL);
MCI
Some other commands:
Command message interface:
1. Start recording with MCI_RECORD
2. Write data to a wave file with MCI_SAVE
3. Stop with MCI_STOP
4. Play with MCI_PLAY
Command string interface:
1. Play with "play %s %s %s"
2. Stop with "stop %s %s %s"
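Because the command string interface is plain text, a program can assemble each command with ordinary string formatting before handing it to mciSendString(). The sketch below is a portable illustration only: the helper mci_command() is hypothetical, it only formats strings (no MCI call is made, so it compiles anywhere), and since the slide does not say what the three %s placeholders in the templates above stand for, it uses its own "verb target [options]" layout. The strings it produces for open, play with wait, and close are ordinary MCI waveaudio commands.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "<verb> <target>[ <options>]" into out.
   Returns 0 on success, -1 if the command did not fit in `size` bytes. */
static int mci_command(char *out, size_t size, const char *verb,
                       const char *target, const char *options)
{
    int n;
    assert(out != NULL && size > 0);
    if (options != NULL && strlen(options) > 0)
        n = snprintf(out, size, "%s %s %s", verb, target, options);
    else
        n = snprintf(out, size, "%s %s", verb, target);
    /* snprintf returns the length it wanted to write; >= size means truncation */
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

On Windows, each resulting string would then be passed as mciSendString(cmd, NULL, 0, NULL); a nonzero return value is an MCI error code that mciGetErrorString() can turn into readable text.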
DirectSound
Like other components of DirectX, DirectSound allows you to use the hardware in the most efficient way.
Here are some other things that DirectSound makes easy:
Querying hardware capabilities at run time to determine the best solution for any given personal computer configuration
Using property sets so that new hardware capabilities can be exploited even when they are not directly supported by DirectSound
Low-latency mixing of audio streams for rapid response
Implementing three-dimensional (3-D) sound
DirectSound
DirectSound playback is built on the IDirectSound Component Object Model (COM) interface and on the IDirectSoundBuffer interface for manipulating sound buffers.
DirectSound capture is based on the IDirectSoundCapture and IDirectSoundCaptureBuffer COM interfaces.
Low level Digital Audio API----WaveX
1. Open audio device
2. Prepare structure for recording
3. Start recording
4. Data processing
5. Release structure
6. Close audio device
Open Audio Device
There are several different approaches you can
take, depending upon how fancy and flexible you
want your program to be.
1. Pass the value WAVE_MAPPER to open the preferred audio input/output device.
2. Call a function to get the list of devices, then open the audio device you want.
3. In either case, the device is opened with waveInOpen() (input) or waveOutOpen() (output).
EXAMPLE
result = waveInOpen(&inHandle, WAVE_MAPPER, &waveFormat,
                    (DWORD_PTR)myWindow, 0, CALLBACK_WINDOW);
if (result)
{
    printf("There was an error opening the preferred Digital Audio in device!\r\n");
}
EXAMPLE
iNumDevs = waveInGetNumDevs();
for (i = 0; i < iNumDevs; i++)
{
    /* Query input-device capabilities (WAVEINCAPS, not WAVEOUTCAPS) */
    if (!waveInGetDevCaps(i, &wic, sizeof(WAVEINCAPS)))
    {
        printf("Device ID #%u: %s\r\n", i, wic.szPname);
    }
}
/* deviceId: the ID chosen from the list above (0 to iNumDevs - 1) */
result = waveInOpen(&inHandle, deviceId, &waveFormat,
                    (DWORD_PTR)myWindow, 0, CALLBACK_WINDOW);
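Approach 2 boils down to scanning the device names and choosing an index. The portable sketch below isolates that selection step: on Windows the names would come from WAVEINCAPS.szPname as filled in by waveInGetDevCaps(), but here they are passed in as a plain array, and the helper find_device() is hypothetical. Returning -1 when nothing matches leaves the caller free to fall back to WAVE_MAPPER.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return the ID of the first device whose name contains
   `wanted`, or -1 if none matches (the caller may then use WAVE_MAPPER). */
static int find_device(const char *const names[], int num_devs, const char *wanted)
{
    int i;
    assert(wanted != NULL);
    for (i = 0; i < num_devs; i++) {
        if (strstr(names[i], wanted) != NULL)
            return i;   /* device IDs are simply 0 .. num_devs - 1 */
    }
    return -1;
}
```

Matching on a substring rather than the full name tolerates driver-specific suffixes; a stricter program could compare whole names instead.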
Structure WAVEFORMATEX
wFormatTag: format type (PCM, μ-law, A-law)
nChannels: mono or stereo
nSamplesPerSec: sample rate, e.g. 8000 Hz
nAvgBytesPerSec: average data-transfer rate (bytes per second)
nBlockAlign: minimum atomic unit of data (bytes)
wBitsPerSample: 8 or 16 bits per sample
cbSize: size of extra format information (bytes)
Example
WAVEFORMATEX waveFormat;
/* Initialize the WAVEFORMATEX for 16-bit, 44.1 kHz, stereo */
waveFormat.wFormatTag      = WAVE_FORMAT_PCM;
waveFormat.nChannels       = 2;
waveFormat.nSamplesPerSec  = 44100;
waveFormat.wBitsPerSample  = 16;
waveFormat.nBlockAlign     = waveFormat.nChannels * (waveFormat.wBitsPerSample / 8);
waveFormat.nAvgBytesPerSec = waveFormat.nSamplesPerSec * waveFormat.nBlockAlign;
waveFormat.cbSize          = 0;
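The same two formulas also give the size of each capture buffer for a chosen duration: a buffer of d milliseconds holds nAvgBytesPerSec × d / 1000 bytes, rounded down to a whole number of blocks (nBlockAlign). The portable sketch below uses a hypothetical struct, pcm_format, that mirrors only the WAVEFORMATEX fields involved, so it compiles without the Windows headers; the durations in the test are illustrative choices, not values from the slides.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in mirroring the PCM-related WAVEFORMATEX fields. */
struct pcm_format {
    uint16_t nChannels;
    uint32_t nSamplesPerSec;
    uint16_t wBitsPerSample;
    uint16_t nBlockAlign;      /* bytes per sample frame (all channels) */
    uint32_t nAvgBytesPerSec;  /* nSamplesPerSec * nBlockAlign */
};

/* Fill the derived fields with the same formulas as the WAVEFORMATEX example. */
static void pcm_format_init(struct pcm_format *f, uint16_t channels,
                            uint32_t rate, uint16_t bits)
{
    assert(bits % 8 == 0);
    f->nChannels = channels;
    f->nSamplesPerSec = rate;
    f->wBitsPerSample = bits;
    f->nBlockAlign = (uint16_t)(channels * (bits / 8));
    f->nAvgBytesPerSec = rate * f->nBlockAlign;
}

/* Bytes needed for `ms` milliseconds of audio, rounded down to whole blocks
   so a buffer never ends in the middle of a sample frame. */
static uint32_t buffer_bytes(const struct pcm_format *f, uint32_t ms)
{
    uint64_t bytes = (uint64_t)f->nAvgBytesPerSec * ms / 1000;
    return (uint32_t)(bytes - bytes % f->nBlockAlign);
}
```

For the 44.1 kHz stereo format above this gives 176400 bytes per second, so a 100 ms buffer is 17640 bytes. Shorter buffers lower latency but raise the number of buffer-full notifications per second.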
Recording engine
[Figure: recording engine with buffer1 to buffer4, AddInBuffer(), waveInStart(), the audio device, messages (msg) to the callback function, and data processing]
Recording engine
[Figure: circular buffer; the buffers are now queued as buffer2, buffer3, buffer4, buffer1 between the audio device and the callback function (msg), which hands each one to data processing]
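The circular-buffer idea can be simulated without any audio hardware. In the portable sketch below, all names are hypothetical and not part of the waveIn API. A FIFO of buffer indices stands in for the device queue. The "device" fills the buffer at the head and returns it, and the "callback" processes it and immediately re-queues it at the tail, which is the role waveInAddBuffer() plays in the real engine. With four buffers, the processing order cycles buffer1, buffer2, buffer3, buffer4, buffer1, and so on.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BUFFERS 4

/* FIFO of buffer numbers standing in for the driver's queue of added buffers. */
struct buffer_queue {
    int slots[NUM_BUFFERS];
    size_t head;   /* next buffer the device will fill */
    size_t count;  /* buffers currently queued */
};

/* Plays the role of waveInAddBuffer(): append a buffer at the tail. */
static void queue_add(struct buffer_queue *q, int buf)
{
    assert(q->count < NUM_BUFFERS);
    q->slots[(q->head + q->count) % NUM_BUFFERS] = buf;
    q->count++;
}

/* The device returns the buffer at the head once it is full. */
static int queue_take(struct buffer_queue *q)
{
    int buf;
    assert(q->count > 0);
    buf = q->slots[q->head];
    q->head = (q->head + 1) % NUM_BUFFERS;
    q->count--;
    return buf;
}

/* Simulate `rounds` buffer-full notifications and record, in `order`,
   which buffer (1..NUM_BUFFERS) was processed each time. */
static void simulate(int *order, size_t rounds)
{
    struct buffer_queue q = { {0}, 0, 0 };
    size_t i;
    for (i = 0; i < NUM_BUFFERS; i++)
        queue_add(&q, (int)i + 1);      /* queue buffer1..buffer4 before starting */
    for (i = 0; i < rounds; i++) {
        int full = queue_take(&q);      /* device filled this buffer */
        order[i] = full;                /* "data processing" would happen here */
        queue_add(&q, full);            /* recycle it to the tail */
    }
}
```

In the real engine, the device can only keep recording while at least one buffer is queued, so the application must hand each processed buffer back promptly.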
1+3+1
Three important methods:
1. Prepare a buffer for waveform-audio input. Function: waveInPrepareHeader()
2. Send the buffer to the audio device; when the buffer is full, the application is notified. Function: waveInAddBuffer()
3. Start recording. Function: waveInStart()
Example
/* waveheader.lpData, dwBufferLength and dwFlags (zero) must be set before preparing */
if (MMSYSERR_NOERROR !=
    waveInPrepareHeader(m_hWaveIn, &waveheader, sizeof(WAVEHDR)))
{
    printf("prepare buffer failure!\n");
}
waveInAddBuffer(m_hWaveIn, &waveheader, sizeof(WAVEHDR));
waveInStart(m_hWaveIn);
Message
Window messages:
MM_WIM_DATA: this message is sent to a window when data is present in the buffer and the buffer is being returned to the application.
Other messages: MM_WIM_CLOSE, MM_WIM_OPEN, MM_WOM_CLOSE, MM_WOM_DONE, MM_WOM_OPEN
Callback function messages:
WIM_DATA: this message is sent to the given callback function when data is present in the input buffer and the buffer is being returned to the application.
Other messages: WIM_CLOSE, WIM_OPEN, WOM_CLOSE, WOM_DONE, WOM_OPEN
Message example
Callback message:
waveInOpen(&m_hWaveIn, WAVE_MAPPER, &m_Format,
           (DWORD_PTR)waveInProc, 0L, CALLBACK_FUNCTION);
waveInProc(…)
{
    switch (msg)
    {
    case WIM_OPEN:
        …
        break;
    case WIM_DATA:
        …
        break;
    case WIM_CLOSE:
        …
        break;
    }
}
Window message:
waveInOpen(&m_hWaveIn, WAVE_MAPPER, &m_Format,
           (DWORD_PTR)hWnd, 0L, CALLBACK_WINDOW);
Application in Real-time Key Word Recognition
[Figure: real-time key-word recognition system with Front-End (Audio Interface), Back-End, Training/Testing/Analysis, Key-Word Recognizer and Monitor blocks]
12/18/2003
To be continued…
Application in Real-time Key Word Recognition
Practical problems when we apply this model to speech recognition:
1. Asynchronism
2. Efficiency
Application in Real-time Key Word Recognition
[Figure: buffer1, buffer2, buffer3, buffer4 … buffer500 with the callback function (msg) and data processing]
Comparison and Analysis
MCI is the easiest model and very convenient, but it offers the least control ("file level").
WaveX is more complicated, but it allows flexible control of the audio data ("buffer level").
DirectSound is the most efficient method, but also the most complicated ("buffer level").
Conclusion
Apply MCI to the audio document part of "video conference".
Apply WaveX to real-time speech recognition and also to "video conference".
DirectSound is widely used in computer game design.