ECE532 Group Report Group #22 April 9, 2012
XAudio: A Voice Controlled Music Player
ECE532 – Group Report
Group #22
April 9, 2012
Michael Cornacchia
Jason Deng
Goce Jankuloski
Table of Contents
Overview
  Project Description
  Manual
  System Block Diagram
  Summary
Outcome
  Proposed Acceptance Criteria
  Results
  Improvements
Project Schedule
  Planned Schedule
  Actual Schedule
  Schedule Comparison
Description of Blocks
  Word Processor
  Vcom_RxTx
  Audio (v1.00a)
  OPB_AC97 (v2.00a)
  XPS_SysACE (v1.00a)
  PLB_OPB_Bridge (v1.00c)
  Voice Recognition (voice_rec) uB (v7.10d)
  Voice Recognition (voice_rec) UART (v1.00a)
  Player uB (v7.10d)
  Player UART (v1.00a)
  XPS_GPIO (v1.00a)
  2to1 Mux
Description of Design Tree
Tips and Tricks
  Xilfatfs_v1_00_a
Overview
Project Description
The objective of this project was to design and build a voice-controlled music player. When the user
speaks a predefined voice command into the device's microphone, the device carries out the requested
action. The songs available for playback and the frequency signatures of the reference voice commands
are stored in memory. The user controls playback with typical commands, such as setting the playback
mode or changing the current track. Spoken commands are converted to the frequency domain using a
hardware-implemented DCT (discrete cosine transform) so that they can be compared with the commands
stored in memory. The goal was to develop a music player that is fun to use because of its
voice-enabled interaction.
Manual
Event | Type | Description
Left push button | Button/Switch | Toggles the operation mode between “voice command” (to accept user voice input) and “training mode” (to store user voice input)
Enter push button | Button/Switch | Saves the most recent voice command reference data to memory
Right push button | Button/Switch | Toggles the method of “training mode” between “overwrite” (erases existing voice reference data) and “average” (averages the existing voice reference data with the new data)
Toggle switch 3 | Button/Switch | Selects RS-232 output between the UARTs of the two MicroBlazes
Switches 0-2 | Button/Switch | Selects each word to train
Say “Play” | Voice Command | Play the current song
Say “Stop” | Voice Command | Stop the current song
Say “Next” | Voice Command | Go to the next song
Say “Pause” | Voice Command | Pause the current song
Say “Shuffle” | Voice Command | Set shuffle mode
Say “Repeat” | Voice Command | Set repeat mode
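The two training methods in the table can be illustrated with a minimal C sketch. It assumes a reference pattern is stored as 128 eight-bit counts (the format that vcom_rxtx sends to the MicroBlaze) and that “average” means an equal-weight average of old and new data; the type and function names are hypothetical and are not taken from the project sources.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_BINS 128  /* one 8-bit count per frequency bin */

/* Hypothetical names for the two training methods. */
typedef enum { TRAIN_OVERWRITE, TRAIN_AVERAGE } train_mode_t;

/* Update a stored reference pattern with a newly captured pattern. */
void train_update(uint8_t ref[NUM_BINS], const uint8_t fresh[NUM_BINS],
                  train_mode_t mode)
{
    for (size_t i = 0; i < NUM_BINS; i++) {
        if (mode == TRAIN_OVERWRITE) {
            ref[i] = fresh[i];  /* discard the old reference data */
        } else {
            /* Assumption: equal-weight average, rounded to nearest. */
            ref[i] = (uint8_t)(((unsigned)ref[i] + fresh[i] + 1u) / 2u);
        }
    }
}
```

Averaging keeps one stored pattern per word while letting repeated training passes smooth out variation between utterances.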
System Block Diagram
[Figure: system block diagram. The legend distinguishes existing IP, custom IP, and external (off-chip) hardware. Blocks shown: player_uB and voice_rec_uB (MicroBlaze), wordprocessor, vcom_rxtx, audio (AUDIO), plb_opb_bridge (PLBv46_OPB_BRIDGE), opb_ac97 (OPB_AC97), player_UART and voice_rec_UART (XPS_UARTLITE), 2to1 mux, GPIO (XPS_GPIO), SysACE (XPS_SYSACE), voice_rec_intr and player_intr (XPS_INTC). External hardware: RS-232, AC97, and the LEDs, switches, and push buttons. Interconnect: PLB, OPB, and FSL.]
Summary
The AC97 codec is used for music playback and for capturing voice input. The custom DCT block
converts the time domain microphone input into a frequency domain representation and segments the
audio stream into words. The frequency information describing each word is then sent to the next
custom block, the voice command receiver (Vcom_rxtx). This block accumulates the frequency
information for a word and manages an FSL connection that forwards the information to a MicroBlaze
processor, where it is compared against a set of predefined word commands loaded in memory. If a
match is found, the corresponding command ID is sent from the voice recognizer MicroBlaze to the
player MicroBlaze, which then alters playback accordingly. The player MicroBlaze controls playback by
writing to the FSL connecting it to the audio block. It also reads music files from the CF card through the
SysACE interface controller.
All IP other than the custom blocks is Xilinx IP available through the IP catalog in XPS. The
exceptions are the opb_ac97 and audio cores, which are also Xilinx IP but come as part of a
reference design rather than the catalog.
Outcome
Proposed Acceptance Criteria:
- Device understands 9 out of every 10 given commands
- Device does not misrecognize one command for another
- Any sound/command outside of the predefined commands is rejected
- Processing of given commands happens within 2 seconds of input
- Eliminates noise: command understood in an environment with a sound level of 65 dB
- Sound feedback is at an audible level in a 65 dB room
- Loudest volume is below 95 dB
- Voice input sampling at a minimum of 44.1 kHz
- Music output bit rate at a minimum of 256 kbps
Results
The input processing hardware performs a 128-sample DCT in 1920 cycles, or 19.2 microseconds at
100 MHz. This is shorter than a single sampling period of the required 44.1 kHz input audio
(22.7 microseconds), so the input is processed in real time. Noise filtering is performed
automatically in hardware by applying a threshold to each frequency bin separately before the
frequency information is forwarded to the MicroBlaze. The software can perform an autocorrelation or
mean squared error comparison in well under one second, meeting the desired two-second processing
time for audio commands.
The requirement for music output bit rate was found to be impossible to satisfy because of the limited
read speed from the Compact Flash card to the MicroBlaze. We determined that 22 kHz single-channel
.wav audio files could be read from the card in real time.
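The real-time claim is plain arithmetic on the figures quoted in the results (1920 cycles, a 100 MHz clock, 44.1 kHz sampling). The C sketch below reproduces it with integer math; the function names are illustrative only.

```c
#include <stdint.h>

/* Duration in nanoseconds of `cycles` clock cycles at `clock_hz`. */
uint64_t cycles_to_ns(uint64_t cycles, uint64_t clock_hz)
{
    return (cycles * 1000000000ull) / clock_hz;
}

/* Period of one audio sample in nanoseconds (truncated). */
uint64_t sample_period_ns(uint64_t sample_rate_hz)
{
    return 1000000000ull / sample_rate_hz;
}

/* Nonzero if the transform finishes within the time it takes
 * `block_len` new samples to arrive. */
int dct_meets_realtime(uint64_t dct_cycles, uint64_t clock_hz,
                       uint64_t sample_rate_hz, uint64_t block_len)
{
    return cycles_to_ns(dct_cycles, clock_hz)
           <= block_len * sample_period_ns(sample_rate_hz);
}
```

With the report's numbers, cycles_to_ns(1920, 100000000) gives 19200 ns and sample_period_ns(44100) gives 22675 ns, so the check passes even with block_len set to 1, the strictest case.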
We are not yet able to determine the command recognition accuracy due to last minute issues with
drivers and meeting area constraints which have just been resolved. The final results for this
requirement will be shown in our presentation after we are able to determine appropriate settings for
microphone boost and training the software to match the outputs on the final hardware.
Improvements
The fundamental tools for the project have been created, including hardware for word segmentation and
frequency domain conversion, software for matching frequency patterns to reference words, and
software and drivers for music player control, file reading, and audio output. With this backbone in
place, it would now be easy to make improvements such as adding commands, adding optional pattern
matching functions to improve accuracy, or improving audio quality by reading higher bit rate music
files from a different memory source.
Project Schedule
Planned Schedule
Week Milestone
Feb 8
- Figure out music format and other logistics like different command sound patterns
- Look at existing FFT IP block and see how it works
- Bandwidth estimation
Feb 15
- Figure out protocols (FSL) and write the Verilog code for custom blocks (Week 1/2)
- Figure out MPMC protocol and store music into DDR, ability to access files for read
Feb 29
- Simulate existing FFT block
- Simulate individual custom blocks to verify correctness
- Figure out protocols and write the Verilog code for custom blocks (Week 2/2)
- Music files playing through AC97 codec
Mar 7
- Interface modules
  o Interface MicroBlaze with the music reader
  o Interface MicroBlaze with the command receiver
Mar 14
- Test out completed design
Mar 21
- Debug
- Optional: Visualizer component
  o Displaying video frame from memory
  o Creating video frame from FFT output
Mar 28
- Make sure everything is functional
Actual Schedule
Week Milestone
Feb 8
- Bandwidth estimation
- Read about FFT IP block and instantiated it with estimated parameters
- Created Modelsim .do scripts for testing FFT
- Discussion of custom block I/O
Feb 15
- Created Matlab model for performing word segmentation and FFT
- Wrote the code needed to read/write from CF card
- Synthesized the hardware to test this out
- Prototype C code for frequency pattern recognition for words
Feb 29
- Created Verilog module for performing word segmentation
- Got the write/read working for CF
- Researched how to implement audio using old blocks
- Coding, testing, and implementation of custom block “vcom_rxtx”
Mar 7
- Added support to “vcom_rxtx” for accumulating frequency data directly from the FFT
- Simulated word segmentation module
- Decided to use DCT instead of FFT
- Created Matlab model for performing 128 sample DCT with integer approximation of DCT matrix
- Wrote and synthesized a design to test audio playback; was not able to get it playing audio
Mar 14
- Created C++ program to generate code for the DCT module multiplications and sums
- Created fast multiplier for the known bit widths to multiply
- Modified word segmenter to work on frequency converted data
- Simulated “wordprocessor” module, combining DCT and word segmentation
- Successfully got the audio working using a reference design from Xilinx that uses OPB
- Revised “vcom_rxtx” to support the DCT
- Started design of a testbench project in XPS to test “vcom_rxtx”, adding in software code for frequency pattern recognition
Mar 21
- Modified wordprocessor module to take input as a slave on an FSL
- Implemented downsampling by 4 before DCT to maximize use of frequency spectrum for 44.1kHz input
- Able to read .wav file from CF and play it onto the AC97
- Modified XilFatFS library to disable cache; otherwise cannot play more than 10s
- Implemented a “dummy_dct” custom block to test the output of “vcom_rxtx”
- Added FSL communication to the frequency pattern recognition software
Mar 28
- Integration and debugging
Extra
- Changed DCT to perform 128 multiplications and sums per cycle for 128 cycles instead of the original parallel design to fit the FPGA LUT and FF constraints
- Further decreased the size of “vcom_rxtx” by reducing wire bit width and removing double buffering functionality to reduce logic utilization
- Modified some xilfatfs driver code to force the configuration controller to reset and give the lock to the sysace controller, which fixes some minor errors
Schedule Comparison
Unsurprisingly, our actual schedule differs from the planned schedule. Most noticeably, the actual
schedule runs behind the planned one, sometimes by as much as 2-3 weeks. The slowdown is attributed
mainly to errors in existing drivers, running out of space on the FPGA chip, and modifications to
custom blocks due to unforeseen complexities.
For example, music playback from the CF card to the AC97 was originally planned to be done by
Feb. 29 but was only completed by Mar. 21. Several unexpected obstacles significantly slowed progress
here. In particular, the read/write functionality for CF took longer because of the way the board
interprets the presence of the card. However, the slack in our planned schedule for this part of the
work in the weeks of March still allowed us to finish on time.
Although the initial implementation of the custom blocks between Feb. 15 and Feb. 29 was completed
on time, design changes in one custom block affected the operation of the other. As a result, the
custom blocks were modified continuously through the week of Mar. 14. Similarly, in the “Extra” weeks,
the system failed to map onto the FPGA chip, forcing further changes to the custom blocks and to
various software settings to reduce logic utilization.
Description of Blocks
Word Processor
The core of this custom block is another custom module which performs a discrete cosine transform on
a sequence of 128 audio samples, each represented as a 16-bit integer. Due to constraints on the
number of LUTs and FFs, only the five most significant bits of each sample are kept. These values are
multiplied by pre-calculated signed four-bit cosine samples (the optimal bit width was determined
through Matlab simulation). A custom multiplier for signed four-bit values performs each
multiplication in three cycles using a sequence of additions. The cosine values are hard coded as
inputs to the multipliers by using a script to generate the Verilog file. The output frequency
components are 17-bit because they are the sum of 128 scaled inputs.
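As a software illustration of the arithmetic described above, the following C sketch truncates a sample to its five most significant bits, multiplies by a signed four-bit coefficient using only additions of shifted copies, and sums 128 products for one frequency bin. It is a behavioural model written for this description, not the generated Verilog. The function names are hypothetical, and the coefficient table is passed in by the caller because the project's actual cosine values are not reproduced here.

```c
#include <stdint.h>

#define DCT_LEN 128

/* Keep the five most significant bits of a 16-bit sample
 * (floor division by 2^11, giving a value in [-16, 15]). */
int8_t top5(int16_t sample)
{
    int32_t s = sample;
    if (s < 0)
        s -= 2047;  /* make the truncating division round toward -infinity */
    return (int8_t)(s / 2048);
}

/* Multiply x by a signed 4-bit coefficient c in [-8, 7] using only
 * additions of shifted copies of x (x*2, x*4, x*8 are shifts in hardware). */
int32_t mul_s4(int32_t x, int8_t c)
{
    uint8_t bits = (uint8_t)c & 0x0Fu;  /* two's-complement pattern of c */
    int32_t acc = 0;
    if (bits & 1u) acc += x;
    if (bits & 2u) acc += x * 2;
    if (bits & 4u) acc += x * 4;
    if (bits & 8u) acc -= x * 8;        /* the sign bit has weight -8 */
    return acc;
}

/* One frequency bin: dot product of 128 truncated samples with one row
 * of the quantized cosine matrix (row supplied by the caller). */
int32_t dct_bin(const int16_t samples[DCT_LEN], const int8_t coeff[DCT_LEN])
{
    int32_t acc = 0;
    for (int n = 0; n < DCT_LEN; n++)
        acc += mul_s4(top5(samples[n]), coeff[n]);
    return acc;
}
```

With these ranges each product has a magnitude of at most 128, so a sum of 128 products has a magnitude of at most 16384, which fits within the 17-bit output width stated above.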
The DCT block uses an FSM to store a sample each time iwrite is driven high. Once it accumulates 128
time values it performs 128 dot products as described above to generate the frequency data. It then
outputs the frequency information by raising owrite whenever a new value is ready.
The wordprocessor block is an FSM which downsamples the input by four. This value was determined
using a Matlab model to optimize the use of the frequency bins for a 128-DCT considering 44.1kHz
sampled audio and the range of frequencies observed when saying the keywords. It also manages the
communication protocols for receiving input from the FSL, using the DCT block and outputting to the
next custom block.
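A short C sketch of the downsampling step and its effect on frequency resolution follows. It assumes plain decimation (keeping every fourth sample with no filtering), which the text does not specify, and it assumes DCT bin k corresponds to frequency k * fs / (2N), as for a standard DCT-II. The function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

#define DECIMATION 4

/* Keep every fourth input sample. Returns the number of samples written.
 * Assumption: plain decimation, no averaging or anti-alias filtering. */
size_t downsample_by_4(const int16_t *in, size_t n_in, int16_t *out)
{
    size_t n_out = 0;
    for (size_t i = 0; i < n_in; i += DECIMATION)
        out[n_out++] = in[i];
    return n_out;
}

/* Width of one DCT bin in millihertz after decimation:
 * (input_rate / DECIMATION) / (2 * dct_len). */
uint32_t dct_bin_width_mhz(uint32_t input_rate_hz, uint32_t dct_len)
{
    uint64_t num = (uint64_t)input_rate_hz * 1000u;
    return (uint32_t)(num / ((uint64_t)DECIMATION * 2u * dct_len));
}
```

For 44.1 kHz input the decimated rate is 11.025 kHz, so under these assumptions the 128 bins cover 0 to roughly 5.5 kHz at about 43 Hz per bin, instead of spreading across the full 22.05 kHz band.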
The information is output to the adjacent block as a 128-bit wide vector with one bit per frequency
component. Each bit is set to one or zero by comparing the magnitude of that frequency component
to an experimentally determined threshold chosen to optimize the distinction between keywords. The
number of ones in a vector is used as the power of the signal during the 128 time samples considered.
A single vector with power greater than a threshold starts a word, and two consecutive vectors with
power below the threshold end the word.
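The thresholding and word segmentation rules can be modelled in C as below. This is a minimal sketch: the threshold values are parameters because the experimentally determined values are not given, a vector whose power is not above the threshold is treated as "below" it, and all type and function names are hypothetical.

```c
#include <stdint.h>

#define NUM_BINS 128

typedef enum { SEG_NONE, SEG_WORD_START, SEG_WORD_END } seg_event_t;

typedef struct {
    int in_word;    /* nonzero while a word is in progress */
    int quiet_run;  /* consecutive low-power vectors seen inside a word */
} segmenter_t;

/* Threshold 128 frequency components into a 0/1 vector.
 * Returns the number of ones, used as the power of the vector. */
int threshold_bins(const int32_t freq[NUM_BINS], int32_t mag_threshold,
                   uint8_t vec[NUM_BINS])
{
    int ones = 0;
    for (int k = 0; k < NUM_BINS; k++) {
        int32_t mag = freq[k] < 0 ? -freq[k] : freq[k];
        vec[k] = (uint8_t)(mag > mag_threshold);
        ones += vec[k];
    }
    return ones;
}

/* Advance the segmenter by one vector: one high-power vector starts a
 * word, two consecutive low-power vectors end it. */
seg_event_t segmenter_step(segmenter_t *s, int power, int power_threshold)
{
    if (!s->in_word) {
        if (power > power_threshold) {
            s->in_word = 1;
            s->quiet_run = 0;
            return SEG_WORD_START;
        }
        return SEG_NONE;
    }
    if (power > power_threshold) {
        s->quiet_run = 0;  /* a loud vector resets the quiet counter */
        return SEG_NONE;
    }
    if (++s->quiet_run >= 2) {
        s->in_word = 0;
        s->quiet_run = 0;
        return SEG_WORD_END;
    }
    return SEG_NONE;
}
```

Requiring two quiet vectors before ending a word keeps a brief dip in the middle of a word from splitting it in two.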
[Figure: port diagrams. wordprocessor ports: clk, reset, FSL_S_Data (32 bits), FSL_S_Control, FSL_S_Exists, FSL_S_Read, FSL_S_Clk, vec_out (128 bits), send, write, sendready. dct ports: clk, reset, in (16 bits), iwrite, out (17 bits), owrite.]
Vcom_RxTx
This custom block builds up the frequency data into a histogram (bar graph) and, once signaled, sends
the data to “Voice Recognition uB” through an FSL. It receives a 128-bit set of frequency data from
“Word Processor” on every clock cycle in which “dct_raw_write” is high. Each of the 128 bits is summed
in a separate 8-bit counter. The data sent to the MicroBlaze therefore consists of 128 8-bit numbers,
packed into 32-bit words to fit the FSL. The accumulated data is sent to the FSL when “dct_raw_send”
is high. “dct_sendready” is used for handshaking to indicate that the send has completed.
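The accumulate-and-pack behaviour can be sketched in C as follows. Two details are assumptions, since the text does not state them: the counters saturate at 255 rather than wrapping, and the lowest-numbered bin occupies the least significant byte of each 32-bit word. The names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define NUM_BINS  128
#define FSL_WORDS (NUM_BINS / 4)  /* four 8-bit counts per 32-bit FSL word */

typedef struct {
    uint8_t count[NUM_BINS];      /* one 8-bit counter per frequency bin */
} histogram_t;

void hist_clear(histogram_t *h)
{
    memset(h->count, 0, sizeof h->count);
}

/* Add one 0/1 vector to the histogram.
 * Assumption: counters saturate at 255 instead of wrapping. */
void hist_accumulate(histogram_t *h, const uint8_t vec[NUM_BINS])
{
    for (int k = 0; k < NUM_BINS; k++) {
        if (vec[k] && h->count[k] < UINT8_MAX)
            h->count[k]++;
    }
}

/* Pack the 128 counters into 32 words for the FSL.
 * Assumption: bin 4*w sits in the least significant byte of word w. */
void hist_pack(const histogram_t *h, uint32_t words[FSL_WORDS])
{
    for (int w = 0; w < FSL_WORDS; w++) {
        words[w] = (uint32_t)h->count[4 * w]
                 | ((uint32_t)h->count[4 * w + 1] << 8)
                 | ((uint32_t)h->count[4 * w + 2] << 16)
                 | ((uint32_t)h->count[4 * w + 3] << 24);
    }
}
```

Packing four counters per word means a whole word pattern crosses the FSL in 32 transfers.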
Audio (v1.00a)
The audio block is a custom core that comes bundled with the opb_ac97 block in the Xilinx reference
designs for audio playback. As shown in the block diagram, the audio block interfaces with the player
MicroBlaze via FSL for audio playback, and with the word processor block for microphone voice
input. The FSLs are implicitly connected to the opb_ac97 block, through which the audio is sent and
received. It is unclear why Xilinx did not merge these two blocks into one simple block.
OPB_AC97 (v2.00a)
This is the main block that controls and configures the AC97 codec. As mentioned above, it is taken from
Xilinx’s own reference designs and interfaces with OPB instead of PLB. Its driver makes it easy to
set the sampling rate and volume level, enable line in/out, and control the microphone.
No modifications were necessary to get it working.
XPS_SysACE (v1.00a)
This is a standard Xilinx IP block used to communicate with the compact flash storage. No hardware
changes were made to this block, but some library functions were changed in order to disable
features such as cache buffering.
[Figure: Vcom_rxtx port diagram. Ports: clk, rst, dct_raw_data (128 bits), dct_raw_write, dct_raw_send, dct_sendready, fsl_m_data (32 bits), fsl_m_control, fsl_m_write, fsl_m_full.]
PLB_OPB_Bridge (v1.00c)
This is a deprecated core that needed to be used in order to be able to communicate with the opb_ac97.
Voice Recognition (voice_rec) uB (v7.10d)
This MicroBlaze determines which word command the user issued and, on a valid command, sends the
request to “Player uB”. The frequency data input comes from “Vcom_Rxtx” through an FSL. The
MicroBlaze is configured to be interrupted when FSL input is available. The frequency pattern
comparison is performed with an autocorrelation function. The word command is likewise output
through an FSL to “Player uB”.
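To make the comparison step concrete, here is a minimal C sketch of pattern matching against stored references. It uses the mean squared error comparison that the Results section mentions as an alternative, not the autocorrelation function described in this block, and it is not the project's code. The reject threshold, the flat row-major reference layout, and all names are assumptions introduced for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_BINS 128
#define NO_MATCH (-1)

/* Sum of squared differences between two 128-bin patterns. With a fixed
 * number of bins this ranks patterns the same way as mean squared error. */
uint32_t pattern_sse(const uint8_t *a, const uint8_t *b)
{
    uint32_t sse = 0;
    for (int k = 0; k < NUM_BINS; k++) {
        int32_t d = (int32_t)a[k] - (int32_t)b[k];
        sse += (uint32_t)(d * d);
    }
    return sse;
}

/* Return the index of the closest reference pattern, or NO_MATCH if no
 * reference has an error below reject_sse (a hypothetical threshold).
 * refs holds num_refs patterns of NUM_BINS bytes each, row-major. */
int match_command(const uint8_t *input, const uint8_t *refs,
                  int num_refs, uint32_t reject_sse)
{
    int best = NO_MATCH;
    uint32_t best_sse = reject_sse;
    for (int r = 0; r < num_refs; r++) {
        uint32_t sse = pattern_sse(input, refs + (size_t)r * NUM_BINS);
        if (sse < best_sse) {
            best_sse = sse;
            best = r;
        }
    }
    return best;
}
```

The returned index would play the role of the command ID forwarded to the player, and the reject threshold is one way to address the acceptance criterion that sounds outside the predefined commands be rejected.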
Voice Recognition (voice_rec) UART (v1.00a)
Standard UART block used in order to be able to display messages on RS232. It is used for debugging
purposes to ensure proper FSL functionality and voice command recognition.
Player uB (v7.10d)
This MicroBlaze controls music playback and reads from and writes to the compact flash card. There
were no hardware changes other than configuring it with the correct number of FSL links. On the
software side, the XilFatFS (v1.00a) library was included; it provides higher-level read/write access
to the card, similar to the file I/O of an ordinary C program.
Player UART (v1.00a)
Standard UART block used in order to be able to display messages on RS232. It was also used to display
the playback state of the player, including the track number, track progress, and play mode.
XPS_GPIO (v1.00a)
This general purpose input/output block, which is connected to “Voice Recognition uB”, connects to
the switches and push buttons for user input and testing purposes. It also drives the LEDs to indicate
some states, mainly for debugging and notification.
2to1 Mux
This custom block is a simple two-input multiplexer. It selects which of the two UARTs (one for each
MicroBlaze) outputs its print messages through RS232. The select pin is connected to a switch.
Description of Design Tree
File directory | File/Folder Name | Description
doc | | Documents
Xaudio | player_ub | Software project for music player
Xaudio | voice_rec_microblaze | Software project for voice recognition
Xaudio/code | Main.c | Software code for the music player
Xaudio/code | Audio | Software code to communicate with the AC97
Xaudio/code | Cf | Software code to communicate with the compact flash card
Xaudio/code | Voice_rec | Software code for voice recognition
Xaudio/data | System.ucf | User and pin constraints
Xaudio/drivers | ac97_v2_00_a | AC97 driver files
Xaudio/drivers | audio_v1_00_a | Audio driver files
All other files are automatically generated by XPS.
Tips and Tricks
Xilfatfs_v1_00_a
The existing driver code for the sysace controller has trouble acquiring a lock on the controller. The
problem is in “xilfatfs_sysace.c”, in the function “init_ace()” where the sysace is initialized. At this
point the System ACE ERROR LED is solid red. The configuration controller holds the lock and, for some
reason, does not release it even though no configuration file is present on the compact flash card. In
the above-mentioned file, the following lines replace the call to XSysace_lock(…) in “init_ace()”:
    XSysAce_ResetCfg(&Ace);      /* reset the configuration controller */
    XSysAce_Unlock(&Ace);        /* release any lock that is still held */
    XSysAce_Lock(&Ace, XTRUE);   /* force acquisition of the lock */
This resets the configuration controller and forcefully acquires the lock for the sysace controller. The
sysace is only used for file I/O in this project.
Second, in order to read large amounts of data from the CF card we had to disable caching. As
soon as the cache filled up, all of its sectors were marked as valid, so the software got stuck because
it could not evict any sector from the cache. The fix is to disable caching in the same file
as above, in the function read_sector(). Remove the entire function body and replace it with this line:
    return read_sector_cf(sector, sector_buf);