Final Neww
Upload
mrityunjay-gogoi -
7/31/2019 Final Neww
CHAPTER 1
INTRODUCTION
Wireless communication has grown tremendously over the last two decades, with more than 2 billion
mobile users worldwide. Owing to the widespread growth of cellular networks and a drastic
reduction in call rates and in the price of low-end handsets, mobile usage has percolated through all
sections of society. Any mobile phone that has a messaging facility and supports common AT commands
can be used in this system. The Nokia 6610 was chosen because it supports AT commands. Most
mobile manufacturers, such as Siemens, Motorola, LG and Samsung, also provide AT command
capability.
SMS is a store-and-forward way of transmitting messages to and from mobiles. Each short
message can be no longer than 160 characters of 7-bit text (140 bytes of binary data). Since SMS uses
signaling channels instead of dedicated data channels for transmission and reception, these messages can be
sent and received simultaneously with voice, fax and data services over the GSM network. The major
advantages of SMS are that the sender is notified when the message is delivered at the
destination, and that the SMSC keeps trying to deliver the message for the specified
validity period if the network is busy or the user is outside the coverage area.
A system has been developed for remotely controlling various electrical devices from a mobile
phone through spoken commands. The system offers several attractive features:
Control from anywhere in the world where cellular coverage is available
Acknowledgement of command execution from the system to the user
Use of spoken commands from the user for control
Ease of implementation and a cost-effective approach
1.1 Overview
On the user side, a microphone is used to translate the voice signal into an electrical signal. The
microphone is connected to the HM2007, a voice recognition module.
In this approach, predetermined phrases of words are selected for various commands. The Mel
cepstrum features are extracted from the spoken words for recognition. Mel cepstrum exploits
auditory principles as well as discriminating property of the cepstrum and is proven to be one of
the most successful feature representations in speech related recognition tasks [1, 2]. The spoken
words are isolated and recognized after extraction of features. Learning Vector Quantization
Neural Network is used for recognition of various words used in the command. A text message is
generated if all spoken words are identified as per the specified format. This message is transmitted in
the form of an SMS to the control system mobile using AT commands [3].
On the control side, the system mobile is connected to an AVR microcontroller-based system
through an RS-232C cable. The process block consists of 8 digital output ports, 8 digital input ports and
one analog input port. The number of input, output and analog input ports can be
varied as per the needs of the application. Presently, LEDs are used to indicate the status of the digital
output ports, DIP switches to change the status of the digital input ports, and a potential divider
to vary the analog input voltage.
1.2 AT COMMANDS
An extensive list of mobile-related AT commands is now available for carrying out various
activities such as sending SMS, using GPRS services, sending fax, controlling speaker volume and
indicating battery status [4-6]. An AT command is issued by sending the characters A and T, followed by
the specified command string, through the serial port to the mobile; it is executed on receipt of a carriage
return. The mobile sends result codes to the Terminal Equipment (TE) to indicate the response
after executing the command. A text message is sent to the mobile using the CMGS command. The CNMI
command is used to indicate to the TE the receipt of an incoming SMS message from the network.
On receipt of an SMS message, the text words are checked against a predetermined format, which
includes a password and the desired device ON/OFF commands or status
query. After interpreting a valid control message, the microcontroller carries out the specified tasks
and then sends an SMS to a pre-specified mobile number as acknowledgement of fulfillment of the
command or to report an error during its execution. A variety of commands is
available at our disposal, such as directly storing various predefined messages in phone memory,
sending messages at the appropriate time by calling the relevant message number depending on
present conditions, storing incoming SMS in phone memory, deleting the message after execution
of command, etc. However, it was decided to discard these features to ensure easy adaptation to any
mobile model with limited AT command interpretation capability. So in our case, any
incoming SMS message is routed directly to the microcontroller (TE), and any outgoing text message
is sent directly by the microcontroller to the designated mobile number without being stored in the
control system mobile's phone memory.
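The command sequence described in this section can be sketched in Python as the byte strings a TE would write to the serial port. This is only an illustrative sketch: the function names, the phone number and the message text are hypothetical, text mode (AT+CMGF=1) is assumed, and the CNMI parameter values shown are one common choice for routing incoming messages straight to the TE; the values a particular handset accepts should be checked against its AT command reference.

```python
from typing import List

CTRL_Z = b"\x1a"  # Ctrl-Z terminates the message body that follows AT+CMGS


def build_sms_commands(number: str, text: str) -> List[bytes]:
    """Return, in order, the byte strings to send an SMS in text mode."""
    if len(text) > 160:
        raise ValueError("a single SMS carries at most 160 text characters")
    return [
        b"AT+CMGF=1\r",                           # assumed: select text mode
        f'AT+CMGS="{number}"\r'.encode("ascii"),  # start a message to `number`
        text.encode("ascii") + CTRL_Z,            # message body, ended by Ctrl-Z
    ]


def build_incoming_routing_command() -> bytes:
    """AT+CNMI with assumed parameters that forward new SMS directly to the TE."""
    return b"AT+CNMI=1,2,0,0,0\r"


if __name__ == "__main__":
    # Hypothetical number and control message, for illustration only.
    for chunk in build_sms_commands("+10000000000", "1234 LAMP ON"):
        print(chunk)
    print(build_incoming_routing_command())
```

Each command ends with a carriage return, which is what triggers execution on the mobile; the result codes returned by the phone would be read back from the same serial port.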
CHAPTER 2
SPEECH RECOGNITION
2.1 Design Introduction
With the coming of the eras of globalization, networking, information and digitization, the demand for
highly reliable identity verification is growing. An efficient means to this end is authenticating
users through biometric methods. Among the existing biometric methods, voice biometrics can be
an affordable and accurate authentication technology that has already been successfully and widely
employed. The voiceprint, as a basic human physiological characteristic, plays a unique role and
is difficult to counterfeit, imitate or replace. As a non-contact identification technology, voice
recognition technology is being accepted by users.
Voice authentication refers to the process of accepting or rejecting the identity claim
of a speaker on the basis of individual information present in the speech waveform. It
has received increasing attention over the past two decades as a convenient, user-friendly way of
replacing (or supplementing) standard password-type matching. The authentication procedure
asks the user to pronounce a random sequence of digits. After the speech is captured and the
voice features are extracted, individual voice characteristics are generated by a registration algorithm.
The central processing unit decides whether the received features match the stored voiceprint of the
customer the user claims to be, and grants authentication accordingly.
2.2 Voice Recognition Technology Principle
Voice recognition, also known as speaker recognition, has two categories:
speaker identification and speaker verification. Speaker identification is used to determine which
one of a group of people is speaking, i.e. a "one out of many" selection, and speaker verification is used to
determine whether a specified person is speaking, i.e. "one-to-one recognition".
According to the speech material used, voice recognition can be divided into
text-dependent and text-independent technology. A text-dependent voice recognition system
requires the speaker to pronounce the contents of a given text. Each person's individual
sound profile model is established accurately. For better results, people must also speak the contents of the
text during recognition. Text-independent recognition system does not
require fixed contents of words, which makes it relatively difficult to model, but it is convenient for the
user and can be applied widely. Voiceprint recognition is an application based on the physiological
and behavioral characteristics of the speaker's voice and linguistic patterns. Unlike speech
recognition, voiceprint recognition disregards the content of speech. Rather, the unique features
of the voice are analyzed to identify the speaker. From voice samples, the unique features are
extracted and converted to digital symbols, and these symbols are then stored as that person's
character template. This template is stored in a computer database, a smart card or bar-coded cards.
User authentication is processed inside the recognition system to determine whether or not there is a match.
2.3 Classification of Speech Recognition System
Speech recognition systems have different design performance requirements depending on the point of
view and the scope of the application. Their implementations are of the following types:
Isolated word, connected word, continuous speech recognition, and speech
understanding and conversation systems
Large vocabulary and small vocabulary systems
Speaker-specific and non-specific speech recognition systems
CHAPTER 3
DYNAMIC TIME WARPING
3.1 Principle
A distance measurement between time series is needed to determine similarity between time
series and for time series classification. Euclidean distance is an efficient distance
measurement that can be used. The Euclidean distance between two time series is simply the
sum of the squared distances from each nth point in one time series to the nth point in the
other. The main disadvantage of using Euclidean distance for time series data is that its results
are very unintuitive. If two time series are identical, but one is shifted slightly along the time
axis, then Euclidean distance may consider them to be very different from each other.
Dynamic time warping (DTW) was introduced to overcome this limitation and give intuitive
distance measurements between time series by ignoring both global and local shifts in the time
dimension.
Problem Formulation. The dynamic time warping problem is stated as follows:
Given two time series X and Y, of lengths |X| and |Y|,
$$X = x_1, x_2, \ldots, x_i, \ldots, x_{|X|}$$
$$Y = y_1, y_2, \ldots, y_j, \ldots, y_{|Y|}$$
construct a warp path W
$$W = w_1, w_2, \ldots, w_K, \qquad \max(|X|, |Y|) \le K < |X| + |Y|$$
where K is the length of the warp path and the kth element of the warp path is
$$w_k = (i, j)$$
where i is an index from time series X, and j is an index from time series Y. The warp path must
start at the beginning of each time series at $w_1 = (1, 1)$ and finish at the end of both time series at
$w_K = (|X|, |Y|)$. This ensures that every index of both time series is used in the warp path. There is
also a constraint on the warp path that forces i and j to be monotonically increasing in the warp
path, which is why the lines representing the warp path in Figure 1 do not overlap. Every index of
each time series must be used. Stated more formally:
$$w_k = (i, j),\; w_{k+1} = (i', j'), \qquad i \le i' \le i + 1,\; j \le j' \le j + 1$$
The optimal warp path is the minimum-distance warp path, where the
distance of a warp path W is
$$Dist(W) = \sum_{k=1}^{K} Dist(w_{ki}, w_{kj})$$
Dist(W) is the distance (typically Euclidean distance) of warp path W, and $Dist(w_{ki}, w_{kj})$
is the distance between the two data point indexes (one from X and one from Y) in the kth element of the warp path.
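To make the contrast between the point-by-point distance and the minimum-distance warp path concrete, the following is a minimal Python sketch using the classic dynamic-programming formulation. The function names and the sample series are illustrative, and the squared difference between two points is assumed as the per-element distance.

```python
def pointwise_distance(x, y):
    """Sum of squared distances between the nth points of two equal-length series."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def dtw_distance(x, y):
    """Distance of the minimum-distance warp path from (1, 1) to (|X|, |Y|)."""
    inf = float("inf")
    n, m = len(x), len(y)
    # cost[i][j] holds the best warp-path distance ending at element (i, j).
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            # The path may advance in X, in Y, or in both, never backwards.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


if __name__ == "__main__":
    x = [0, 0, 1, 2, 1, 0, 0]
    y = [0, 1, 2, 1, 0, 0, 0]  # same shape as x, shifted one step earlier
    print(pointwise_distance(x, y))  # 4
    print(dtw_distance(x, y))        # 0.0
```

The two series have the same shape and differ only by a shift along the time axis, so the point-by-point measure reports a nonzero distance while the warp path aligns them exactly. This full-table version takes time proportional to |X| times |Y|.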
3.2 Mel-Frequency Cepstral Coefficients
The Mel-cepstrum exploits auditory principles as well as decorrelating property of the
cepstrum. In MFCC implementation, triangular filters are used. These filters follow Mel scale
whereby band edges and corner frequencies are linear for low frequencies (
Finally, the discrete cosine transform (DCT) of the filter bank coefficients is taken to get the MFCCs as under:
$$C_{mel}(k) = \sum_{l=1}^{N} \log\{E_{mel}(l)\} \cos\left[k\left(l - \tfrac{1}{2}\right)\frac{\pi}{N}\right]$$
where $\log\{E_{mel}(l)\}$ are the log filter bank energies, $C_{mel}(k)$ is the kth MFCC and N is the number of
filters. It has been observed that performance is reasonably good for 24 filters in the MFCC
implementation [14]. If there are n frames in a word and 12 MFCCs are computed for each frame,
we get a feature vector of length 12 × n. However, the number of frames (n) varies from word to
word, which in turn changes the length of the feature vector. In order to obtain a feature vector of
constant length, the n values of each Mel Frequency Cepstral Coefficient are converted into 10 values
using a resampling technique. Thus for each word, a constant-length feature vector of 120 (12 × 10)
elements is obtained. Principal Component Analysis (PCA) is carried out on the MFCC data thus
obtained. PCA transforms the input data so that the elements of the input vectors are uncorrelated.
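The conversion of a variable number of frames into a constant-length feature vector can be sketched as follows. This is a sketch under stated assumptions rather than the implementation used here: the DCT is written in its commonly used form, linear interpolation is assumed as the resampling technique (the text does not name one), the function names are illustrative, and the PCA step is not shown.

```python
import math


def mfcc_from_log_energies(log_energies, num_coeffs=12):
    """DCT of N log filter bank energies, giving coefficients k = 1..num_coeffs."""
    n = len(log_energies)
    return [
        sum(e * math.cos(k * (l - 0.5) * math.pi / n)
            for l, e in enumerate(log_energies, start=1))
        for k in range(1, num_coeffs + 1)
    ]


def resample(values, target_len=10):
    """Linearly interpolate a sequence onto target_len evenly spaced points."""
    n = len(values)
    if n == 1:
        return [values[0]] * target_len
    out = []
    for t in range(target_len):
        pos = t * (n - 1) / (target_len - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


def word_feature_vector(frames_mfcc, target_len=10):
    """frames_mfcc: n frames of 12 MFCCs each -> 12 * target_len values."""
    num_coeffs = len(frames_mfcc[0])
    vector = []
    for c in range(num_coeffs):
        track = [frame[c] for frame in frames_mfcc]  # one coefficient over time
        vector.extend(resample(track, target_len))
    return vector


if __name__ == "__main__":
    frames = [[float(f + c) for c in range(12)] for f in range(17)]  # 17 frames
    print(len(word_feature_vector(frames)))  # 120
```

Whatever the number of frames in a word, each of the 12 coefficient tracks is reduced to 10 values, so every word yields 120 elements.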
3.3 LVQ Classifier:
The LVQ is an algorithm for learning classifiers from labeled data samples. It models the
discrimination function defined by the set of labeled codebook vectors and the nearest-neighbor
search between the codebook and the data. In classification, a data point $x_i$ is assigned
to a class according to the class label of the closest codebook vector. The training algorithm
involves an iterative gradient update of the winner unit [15, 16].
The winner unit $w_c$ is defined by
$$c = \arg\min_k \| x_i - w_k \|$$
The update equation for the winner unit $w_c$, defined by the nearest neighbor and a data sample x(t),
is
$$w_c(t+1) = w_c(t) \pm \alpha(t)\,[x(t) - w_c(t)]$$
where the sign depends on whether the data sample is correctly classified (+) or misclassified (-), and
$\alpha(t)$ is the learning rate, which must decrease monotonically in time.
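The winner selection and update rule can be sketched in a few lines of Python. This is a minimal LVQ1-style sketch: the function names, the linearly decaying learning-rate schedule and the toy two-dimensional codebook are assumptions made for illustration.

```python
def nearest_codebook(x, codebook):
    """Index c of the codebook vector closest to x (the winner unit)."""
    def sqdist(w):
        return sum((a - b) ** 2 for a, b in zip(x, w))
    return min(range(len(codebook)), key=lambda k: sqdist(codebook[k]))


def lvq1_step(x, label, codebook, labels, alpha):
    """Move the winner towards x if its label matches, away from x otherwise."""
    c = nearest_codebook(x, codebook)
    sign = 1.0 if labels[c] == label else -1.0
    codebook[c] = [w + sign * alpha * (a - w) for a, w in zip(x, codebook[c])]
    return c


def train(samples, codebook, labels, alpha0=0.3, epochs=20):
    """Repeated LVQ1 steps with a learning rate that decreases monotonically."""
    total = epochs * len(samples)
    t = 0
    for _ in range(epochs):
        for x, label in samples:
            alpha = alpha0 * (1.0 - t / total)  # assumed linear decay schedule
            lvq1_step(x, label, codebook, labels, alpha)
            t += 1
    return codebook


def classify(x, codebook, labels):
    """Class label of the closest codebook vector."""
    return labels[nearest_codebook(x, codebook)]


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0]]
    labels = ["on", "off"]
    samples = [([0.1, 0.2], "on"), ([0.9, 0.8], "off")]
    train(samples, codebook, labels)
    print(classify([0.2, 0.1], codebook, labels))
```

In the recognition task described here, each sample would be a 120-element word feature vector and each label a command word.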
CHAPTER 4
Hardware Implementations
4.1 Transmitter Part
On the transmitter side, a microphone is used to translate the voice signal into an electrical signal. The
microphone is connected to the HM2007, a voice recognition module.
In this approach, predetermined phrases of words are selected for various commands. The Mel
cepstrum features are extracted from the spoken words for recognition. Mel cepstrum exploits
auditory principles as well as the discriminating property of the cepstrum and is proven to be one of
the most successful feature representations in speech-related recognition tasks [1, 2]. The spoken
words are isolated and recognized after extraction of features. A Learning Vector Quantization
neural network is used for recognition of the various words used in the command. A text message is
generated if all spoken words are identified as per the specified format. This message is transmitted in
the form of an SMS to the control system mobile using AT commands.
Figure: BLOCK DIAGRAM OF TRANSMITTER
4.1.1 NEURAL NETWORK FOR SPEECH RECOGNITION:
4.1.1.1 Overview of Neural Networks:
Artificial neural networks are computers whose architecture is modeled
after the brain. They typically consist of many hundreds of simple processing units
which are wired together in a complex communication network. Each unit
or node is a simplified model of a real neuron which fires (sends off a new signal) if
it receives a sufficiently strong input signal from the other nodes to which it is
connected. The strength of these connections may be varied in order for the
network to perform different tasks corresponding to different patterns of node firing
activity. This structure is very different from traditional computers.
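The firing rule described above can be sketched as a single threshold unit. The function name, the weights and the threshold below are illustrative values chosen for this example.

```python
def fires(inputs, weights, threshold):
    """A node fires when the weighted sum of its inputs reaches the threshold."""
    total = sum(i * w for i, w in zip(inputs, weights))
    return total >= threshold


if __name__ == "__main__":
    # With weak connections the node needs both inputs active to fire.
    print(fires([1, 1], [0.6, 0.6], 1.0))  # True
    print(fires([1, 0], [0.6, 0.6], 1.0))  # False
    # Strengthening the connections lets a single active input fire the node.
    print(fires([1, 0], [1.0, 1.0], 1.0))  # True
```

The same node performs a different task once its connection strengths are changed, which is the property that training procedures exploit.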
4.1.1.2 Fundamentals of Neural Network:
There are many different types of neural networks, but they all have four
basic attributes:
A set of processing units;
A set of connections;
A computing procedure;
A training procedure.
4.1.1.3 Artificial neural networks from the viewpoint of speech recognition:
Artificial neural networks (ANNs) are systems consisting of
interconnected computational nodes working somewhat similarly to human neurons. Neural
networks can be used, for example, to approximate functions or to classify data into similar classes,
which in the speech recognition domain can be phonemes, sub-phoneme units, syllables or words. The
ability to learn by adapting the strengths of inter-neuron connections (synapses) is a fundamental
property of artificial neural networks. Speech recognition has been another proving ground for
neural networks.
Researchers quickly achieved excellent results in such basic tasks as voiced/unvoiced
discrimination (Watrous 1988), phoneme recognition (Waibel et al., 1989), and spoken digit
recognition (Franzini et al., 1989). However, in 1990, when this thesis was proposed, it still
remained to be seen whether neural networks could support a large-vocabulary, speaker-independent,
continuous speech recognition system. Of the two types of variability in speech,
acoustic and temporal, the former is more naturally posed as a static pattern matching problem
that is amenable to neural networks; therefore we use neural networks for acoustic modeling, while
we rely on conventional Hidden Markov Models for temporal modeling.
4.1.2 HM2007 FUNCTIONING:
The HM2007 is a CMOS voice recognition LSI (Large Scale Integration) circuit. The chip
contains an analog front end, voice analysis, recognition, and system control functions. The chip
may be used in stand-alone or CPU-connected mode.
Pin configuration of HM2007
The functioning of the HM2007 IC involves the following steps:
4.1.2.1 Speech Acquisition:
Speech acquisition is easily implemented with the HM2007 IC. During
speech acquisition, speech samples are obtained from the speaker in real time and stored in
memory for preprocessing. Speech acquisition requires a microphone coupled with an
analog-to-digital converter (ADC) that has the proper amplification to receive the speech signal,
sample it, and convert it into digital speech. The system sends the analog speech through a
transducer, amplifies it, and sends it through an ADC. The received samples are stored in
RAM. The microphone input port with the audio codec receives the signal, amplifies it, and
converts it into 8-bit PCM digital samples; the HM2007 is clocked at 3.57 MHz. The HM2007 IC
requires initial configuration, or training, of words, which is performed using a programming board.
In the training process, the user trains the IC by speaking words into the microphone and assigning a
particular value to each word. For example, the word "hello" can be assigned the value 02 or 05. The IC
can later be connected to a microcontroller for further functions.
4.1.2.2 Speech Preprocessing:
Preprocessing reduces the amount of processing required in later
stages. Generally, preprocessing involves taking the speech samples as input, blocking the samples
into frames, and returning a unique pattern for each sample, as described in the following steps.
1. The system must identify useful or significant samples from the speech signal. To accomplish
this goal, the system divides the speech samples into overlapped frames.
2. The system checks the frames for voice activity using endpoint detection and energy threshold
calculations.
3. The speech samples are passed through a pre-emphasis filter.
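The three preprocessing steps above can be sketched in Python as follows. All numeric choices here (frame length, hop size, energy threshold and the pre-emphasis coefficient of 0.95) are assumptions made for the example, as are the function names; the sketch does not describe the internal processing of the HM2007.

```python
def split_frames(samples, frame_len, hop):
    """Step 1: block the samples into overlapped frames (hop < frame_len)."""
    return [samples[s:s + frame_len]
            for s in range(0, len(samples) - frame_len + 1, hop)]


def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(v * v for v in frame) / len(frame)


def voiced_frames(frames, threshold):
    """Step 2: indices of frames whose energy exceeds the threshold."""
    return [i for i, f in enumerate(frames) if frame_energy(f) > threshold]


def pre_emphasis(samples, coeff=0.95):
    """Step 3: first-order filter y[n] = x[n] - coeff * x[n-1]."""
    return [samples[0]] + [samples[i] - coeff * samples[i - 1]
                           for i in range(1, len(samples))]


if __name__ == "__main__":
    # Silence, a short burst of signal, then silence again.
    samples = [0] * 8 + [1, -1] * 4 + [0] * 8
    frames = split_frames(samples, frame_len=4, hop=2)
    active = voiced_frames(frames, threshold=0.25)
    print(len(frames), active)  # 11 [3, 4, 5, 6, 7]
    emphasized = [pre_emphasis(frames[i]) for i in active]
    print(emphasized[1])
```

Only the frames flagged as active are passed on, which is how preprocessing reduces the work left for the later stages.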
4.1.2.3 Training the IC:
An important part of speech-to-text conversion using pattern
recognition is training. Training involves creating a pattern representative of the features of a class
using one or more test patterns that correspond to speech sounds of the same class. A model
commonly used for speech recognition is the HMM, which is a statistical model used for modeling
an unknown system using an observed output sequence. The keypad and digital display are used to
communicate with and program the HM2007 chip.
The keypad is made up of 12 normally open momentary contact switches. When the circuit
is turned on, 00 is on the digital display, the red LED (READY) is lit and the circuit waits for a
command.
4.2 Receiver Part
On the control side, i.e. the receiver side, the system mobile is connected to an AVR microcontroller-based
system through an RS-232C cable. The process block consists of 8 digital output ports, 8 digital input ports
and one analog input port. The number of input, output and analog input ports can be
varied as per the needs of the application.
Figure: BLOCK DIAGRAM OF THE RECEIVER SECTION
4.2.1 Overview of Serial Communication
Computers can transfer data in two ways: parallel and serial. In parallel data transfers, often 8 or more
lines (wire conductors) are used to transfer data to a device that is only a few feet away. Examples of parallel
data transfer are printers and hard disks; each uses cables with many wire strips. Although in such cases a lot of
data can be transferred in a short amount of time by using many wires in parallel, the distance cannot be great.
To transfer to a device located many meters away, the serial method is used. In serial communication, the data
is sent one bit at a time, in contrast to parallel communication, in which the data is sent a byte or more at a time.
Serial communication between the 89S52 and the peripheral is the topic of this section.
If data is to be transferred on the telephone line, it must be converted from 0s and 1s to audio tones,
which are sinusoidal-shaped signals. A peripheral device called a modem, which stands for
modulator/demodulator, performs this conversion.
Serial data communication uses two methods, asynchronous and synchronous. The synchronous
method transfers a block of data at a time, while the asynchronous method transfers a single byte at a time.
In data transmission if the data can be transmitted and received, it is a duplex transmission. This is in
contrast to simplex transmissions such as with printers, in which the computer only sends data. Duplex
transmissions can be half or full duplex, depending on whether or not the data transfer can be simultaneous. If
data is transmitted one way at a time, it is referred to as half duplex. If the data can go both ways at the same
time, it is full duplex. Of course, full duplex requires two wire conductors for the data lines, one for
transmission and one for reception, in order to transfer and receive data simultaneously.
Asynchronous serial communication and data framing
The data coming in at the receiving end of the data line in a serial data transfer is all 0s and 1s; it is
difficult to make sense of the data unless the sender and receiver agree on a set of rules, a protocol, on how the
data is packed, how many bits constitute a character, and when the data begins and ends.
Start and stop bits
Asynchronous serial data communication is widely used for character-oriented transmissions, while
block-oriented data transfers use the synchronous method. In the asynchronous method, each character is
placed between start and stop bits. This is called framing. In the data framing for asynchronous
communications, the data, such as ASCII characters, are packed between a start bit and a stop bit. The start bit
is always one bit, but the stop bit can be one or two bits. The start bit is always a 0 (low) and the stop bit(s) is 1
(high).
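The framing described above can be sketched as follows, assuming 8 data bits per character sent least-significant bit first (the usual UART convention, which the text does not state) and no parity bit. The function names are illustrative.

```python
def frame_byte(byte, stop_bits=1):
    """Return the line bits for one character: start bit, data bits, stop bit(s)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0..255")
    if stop_bits not in (1, 2):
        raise ValueError("stop_bits must be 1 or 2")
    data = [(byte >> i) & 1 for i in range(8)]  # assumed order: LSB first
    return [0] + data + [1] * stop_bits


def unframe(bits, stop_bits=1):
    """Check the start and stop bits of a frame and recover the data byte."""
    if len(bits) != 1 + 8 + stop_bits:
        raise ValueError("wrong frame length")
    if bits[0] != 0:
        raise ValueError("start bit must be 0 (low)")
    if any(b != 1 for b in bits[9:]):
        raise ValueError("stop bit(s) must be 1 (high)")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))


if __name__ == "__main__":
    frame = frame_byte(ord("A"))  # ASCII 'A' is 0x41
    print(frame)                  # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
    print(chr(unframe(frame)))    # A
```

With one stop bit, each 8-bit character therefore occupies 10 bit times on the line.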
Data transfer rate
The rate of data transfer in serial data communication is stated in bps (bits per second). Another
widely used terminology for bps is baud rate. However, the baud and bps rates are not necessarily equal. This
is due to the fact that baud rate is modem terminology and is defined as the number of signal changes per
second. In modems, a single change of signal sometimes transfers several bits of data. As far as the conductor
wire is concerned, the baud rate and bps are the same, and for this reason we use bps and baud
interchangeably.
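A small worked example may help separate the two quantities. The figures used (9600 bps, 8 data bits, 1 stop bit, and a modem carrying 4 bits per signal change) are illustrative assumptions, and the function names are hypothetical.

```python
def chars_per_second(bps, data_bits=8, stop_bits=1):
    """Characters per second on an asynchronous link, counting framing bits."""
    bits_per_char = 1 + data_bits + stop_bits  # start bit + data + stop bit(s)
    return bps / bits_per_char


def modem_bps(baud, bits_per_signal_change):
    """Bit rate of a modem whose every signal change carries several bits."""
    return baud * bits_per_signal_change


if __name__ == "__main__":
    print(chars_per_second(9600))  # 960.0
    print(modem_bps(2400, 4))      # 9600
```

On the wire between two devices each signal change carries exactly one bit, which is why the two terms coincide there.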
The data transfer rate of a given computer system depends on the communication ports incorporated into
that system. For example, the early IBM PC/XT could transfer data at rates of 100 to 9600 bps. In recent
years, however, Pentium-based PCs transfer data at rates as high as 56K bps. It must be noted that in
asynchronous serial data communication, the baud rate is generally limited to 100,000 bps.
The 8051 has serial communication capability built into it, thereby making possible fast data transfer using
only a few wires. The PC uses RS232 as its serial communication standard.
4.2.2 RS232 Standards
To allow compatibility among data communication equipment made by various manufacturers, an
interfacing standard called RS232 was set by the Electronics Industries Association (EIA) in 1960. In 1963 it
was modified and called RS232A. RS232B and RS232C were issued in 1965 and 1969, respectively. Today,
RS232 is the most widely used serial I/O interfacing standard. This standard is used in PCs and numerous types
of equipment. However, since the standard was set long before the advent of the TTL logic family, its input and
output voltage levels are not TTL compatible. In RS232, a 1 is represented by -3 to -25V, while a 0 bit is +3 to
+25V, making -3 to +3V undefined. For this reason, to connect any RS232 to a microcontroller system we must
use voltage converters such as the MAX232 to convert the TTL logic levels to the RS232 voltage levels, and vice
versa. MAX232 IC chips are commonly referred to as line drivers.
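The RS232 voltage ranges quoted above can be expressed as a small classification function. This is only a sketch of the stated ranges; the function name is illustrative, and voltages outside the defined ranges are reported as None.

```python
def rs232_logic(voltage):
    """Map an RS232 line voltage to its logic value, or None if undefined."""
    if -25 <= voltage <= -3:
        return 1     # negative voltage represents a 1
    if 3 <= voltage <= 25:
        return 0     # positive voltage represents a 0
    return None      # between -3V and +3V, or beyond the +/-25V limits


if __name__ == "__main__":
    for v in (-12, 12, 0, 5):
        print(v, rs232_logic(v))
```

A 0V or +5V TTL signal applied directly therefore never reads as a 1, which illustrates why a level converter is needed between the two.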
RS232 pins
The RS232 cable connector is commonly referred to as the DB-25 connector. In labeling, DB-25P
refers to the plug connector (male) and DB-25S to the socket connector (female). Since not all
the pins are used in PC cables, IBM introduced the DB-9 version of the serial I/O standard,
which uses only 9 pins, as shown in the table.
DB-9 pin connector
1 2 3 4 5
6 7 8 9
(Out of computer and exposed end of cable)
Pin Functions:
Pin   Description
1     Data carrier detect (DCD)
2     Received data (RXD)
3     Transmitted data (TXD)
4     Data terminal ready (DTR)
5     Signal ground (GND)
6     Data set ready (DSR)
7     Request to send (RTS)
8     Clear to send (CTS)
9     Ring indicator (RI)
Note: DCD, DSR, RTS and CTS are active low pins.
The method used by RS-232 for communication allows for a simple connection of three lines, namely Tx, Rx, and Ground.
TXD: carries data from DTE to the DCE.
RXD: carries data from DCE to the DTE
SG: signal ground
4.2.3 8051 connection to RS232:
Figure: Embedded controller (RXD, TXD) connected through the MAX 232 to the RS232 connector (pins 2, 3 and 5, GND)
The RS232 standard is not TTL compatible; therefore, it requires a line driver such as the MAX232
chip to convert RS232 voltage levels to TTL levels, and vice versa.
The 8051 has two pins that are used specifically for transmitting and receiving data serially. These two
pins are called TXD and RXD and are part of port 3 (P3.1 and P3.0, respectively). Pin 11 of the 8051 is
designated as TXD and pin 10 as RXD. These pins are TTL compatible; therefore, they require a line driver
to make them RS232 compatible. One such line driver is the MAX232 chip.
The MAX232 converts from RS232 voltage levels to TTL voltage levels, and vice versa. One advantage of
the MAX232 chip is that it uses a +5V power source, which is the same as the source voltage for the 8051. In
other words, with a single +5V power supply we can power both the 8051 and the MAX232, with no need for
dual power supplies. The MAX232 has two sets of line drivers for transmitting and receiving data. The line
drivers used for TXD are called T1 and T2, while the line drivers for RXD are designated as R1 and R2. In
many applications only one of each is used.
4.2.4 MAX-232
Logic Signal Voltage
Serial RS-232 (V.24) communication works with voltages (-15V ... -3V to transmit a binary
'1' and +3V ... +15V to transmit a binary '0') which are not compatible with today's computer logic voltages. On
the other hand, classic TTL computer logic operates between 0V ... +5V (roughly 0V ... +0.8V referred to as
low for binary '0', +2V ... +5V for high binary '1'). Modern low-power logic operates in the range of 0V ...
+3.3V or even lower.
So, the maximum RS-232 signal levels are far too high for today's computer logic electronics, and the
negative RS-232 voltage cannot be handled at all by the computer logic. Therefore, to receive serial data from an
RS-232 interface the voltage has to be reduced, and the 0 and 1 voltage levels inverted. In the other direction
(sending data from some logic over RS-232) the low logic voltage has to be "bumped up", and a negative
voltage has to be generated, too.
RS-232              TTL                 Logic
--------------------------------------------------------
-15V ... -3V        +2V ... +5V         1
+3V ... +15V        0V ... +0.8V        0
All this can be done with conventional analog electronics, e.g. a particular power supply and a couple of
transistors or the once popular 1488 (transmitter) and 1489 (receiver) ICs. However, for more than a decade it
has been standard in amateur electronics to do the necessary signal level conversion with an integrated
circuit (IC) from the MAX232 family (typically a MAX232A or some clone). In fact, it is hard to find any
RS-232 circuitry in amateur electronics without a MAX232A or some clone.
The MAX232 & MAX232A
Figure: A MAX232 integrated circuit
The MAX232 from Maxim was the first IC which in one package contains the necessary drivers (two)
and receivers (also two) to adapt the RS-232 signal voltage levels to TTL logic. It became popular because it
needs just one voltage (+5V) and generates the necessary RS-232 voltage levels (approx. -10V and +10V)
internally. This greatly simplified the design of circuitry. Circuit designers no longer needed to design and build
a power supply with three voltages (e.g. -12V, +5V, and +12V), but could just provide one +5V power supply,
e.g. with the help of a simple 78x05 voltage regulator.
The MAX232 has a successor, the MAX232A. The ICs are almost identical, however, the MAX232A is
much more often used than the original MAX232, and the MAX232A only needs external capacitors 1/10th the
capacity of what the original MAX232 needs.
It should be noted that the MAX232(A) is just a driver/receiver. It does not generate the necessary
RS-232 sequence of marks and spaces with the right timing, it does not decode the RS-232 signal, and it does not
provide a serial/parallel conversion. All it does is convert signal voltage levels. Generating serial data with
the right timing and decoding serial data has to be done by additional circuitry, e.g. by a 16550 UART or one of
the small microcontrollers (e.g. Atmel AVR, Microchip PIC) that are becoming more and more popular.
The MAX232 and MAX232A were once rather expensive ICs, but today they are cheap. It has also helped
that many companies now produce clones (e.g. Sipex). These clones sometimes need different external circuitry,
e.g. the capacities of the external capacitors vary. It is recommended to check the data sheet of the particular
manufacturer of an IC instead of relying on Maxim's original data sheet.
The original manufacturer (and now some clone manufacturers, too) offers a large series of similar ICs, with
different numbers of receivers and drivers, voltages, built-in or external capacitors, etc. For example, the MAX232 and
MAX232A need external capacitors for the internal voltage pump, while the MAX233 has these capacitors
built in. The MAX233 is also between three and ten times more expensive in electronic shops than the
MAX232A because of its internal capacitors. It is also more difficult to get the MAX233 than the garden
variety MAX232A.
A similar IC, the MAX3232, is nowadays available for low-power 3V logic.
MAX232(A) DIP Package

No.  Name   Purpose                             Signal voltage                                   Capacitor (MAX232)  Capacitor (MAX232A)
1    C1+    + connector for capacitor C1        capacitor should withstand at least 16 V         1 µF                100 nF
2    V+     output of voltage pump              +10 V, capacitor should withstand at least 16 V  1 µF to VCC         100 nF to VCC
3    C1-    - connector for capacitor C1        capacitor should withstand at least 16 V         1 µF                100 nF
4    C2+    + connector for capacitor C2        capacitor should withstand at least 16 V         1 µF                100 nF
5    C2-    - connector for capacitor C2        capacitor should withstand at least 16 V         1 µF                100 nF
6    V-     output of voltage pump / inverter   -10 V, capacitor should withstand at least 16 V  1 µF to GND         100 nF to GND
7    T2out  Driver 2 output                     RS-232
8    R2in   Receiver 2 input                    RS-232
9    R2out  Receiver 2 output                   TTL
10   T2in   Driver 2 input                      TTL
11   T1in   Driver 1 input                      TTL
12   R1out  Receiver 1 output                   TTL
13   R1in   Receiver 1 input                    RS-232
14   T1out  Driver 1 output                     RS-232
15   GND    Ground                              0 V                                              1 µF to VCC         100 nF to VCC
16   VCC    Power supply                        +5 V                                             see above           see above
V+ (pin 2) is also connected to VCC via a capacitor (C3). V- (pin 6) is connected to GND via a capacitor (C4).
GND (pin 15) and VCC (pin 16) are also connected by a capacitor (C5), placed as close as possible to the pins.
A Typical Application
The MAX232(A) has two receivers (which convert from RS-232 to TTL voltage levels) and two drivers (which convert
from TTL logic to RS-232 voltage levels). This means only two of the RS-232 signals can be converted in each
direction. The old MC1488/MC1489 combo provided four drivers and four receivers.
Typically one driver/receiver pair of the MAX232 is used for TX and RX, and the second pair for CTS and RTS.
There are not enough drivers/receivers in the MAX232 to also connect the DTR, DSR, and DCD signals.
Usually these signals can be omitted when e.g. communicating with a PC's serial interface. If the DTE really
requires these signals either a second MAX232 is needed, or some other IC from the MAX232 family can be
used (if it can be found in consumer electronic shops at all). An alternative for DTR/DSR is also given below.
The circuitry is completed by connecting five capacitors to the IC as follows. The MAX232 needs 1.0 µF
capacitors, the MAX232A needs 0.1 µF capacitors. MAX232 clones show similar differences. It is
recommended to consult the corresponding data sheet. Capacitors rated for at least 16 V should be used. If
electrolytic or tantalum capacitors are used, the polarity has to be observed. The first pin listed in the following
table is always the one to which the plus pole of the capacitor should be connected.
Capacitor  + Pin  - Pin  Remark
C1         1      3
C2         4      5
C3         2      16
C4         GND    6      This looks non-intuitive, but because pin 6 is at -10 V, GND gets the + connector and not the -.
C5         16     GND
The 5V power supply is connected to
+5V: Pin 16
GND: Pin 15
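As a cross-check of the wiring rules above, the capacitor table can be written down as data and turned into a small parts list. This is a minimal sketch that only restates the values given in this section; the names `CAPACITOR_WIRING` and `bill_of_materials` are hypothetical and chosen for illustration.

```python
# (plus_node, minus_node) for each capacitor, as listed in the capacitor table.
CAPACITOR_WIRING = {
    "C1": (1, 3),
    "C2": (4, 5),
    "C3": (2, 16),       # V+ (pin 2) to VCC (pin 16)
    "C4": ("GND", 6),    # GND is the + side because pin 6 sits at about -10 V
    "C5": (16, "GND"),   # decoupling between VCC and GND
}

# Capacitance per IC variant in microfarads, and the minimum voltage rating.
CAPACITANCE_UF = {"MAX232": 1.0, "MAX232A": 0.1}
MIN_VOLTAGE_RATING_V = 16


def bill_of_materials(variant):
    """Return (name, capacitance_uF, min_rating_V, plus_node, minus_node) tuples."""
    value = CAPACITANCE_UF[variant]
    return [
        (name, value, MIN_VOLTAGE_RATING_V, plus, minus)
        for name, (plus, minus) in CAPACITOR_WIRING.items()
    ]


for part in bill_of_materials("MAX232A"):
    print(part)
```

For a clone IC the capacitance entry would have to be taken from that manufacturer's data sheet, as recommended above.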
The output of the VT pin is high only while the transmission is valid; otherwise it is always low.
Output type: There are 2 types of output to select from:
Momentary type: The data outputs follow the encoder during a valid transmission and then reset.
Latch type: The data outputs follow the encoder during a valid
4.3 Microcontroller AT89S52
4.3.1 Overview
The AT89S52 is a member of the MCS-51/52 family equipped with 8 Kbytes of internal Flash
EPROM (Erasable Programmable Read-Only Memory), which allows the program memory to be
reprogrammed. Atmel designed the AT89S52 to be compatible with the standard 80C51 instruction
set and pin layout.
AT89S52 Microcontroller Features:
An 8-bit CPU (Central Processing Unit).
256 bytes of internal RAM (Random Access Memory).
Four I/O ports, each consisting of eight bits.
An internal oscillator and timing circuits.
Three 16-bit timer/counters.
Six interrupt vectors (two external interrupts, three timer interrupts and the serial port interrupt).
A serial port with a full-duplex UART (Universal Asynchronous Receiver
Transmitter).
Hardware support for multiplication, division and Boolean (bit) operations.
8 Kbytes of Flash EPROM for program memory.
The shortest instruction execution time is one machine cycle, which is 0.5 µs at a 24 MHz clock
frequency.
With a 12 MHz clock, the machine cycle (and thus the shortest instruction execution time) is 1 µs.
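The two timing figures quoted in the feature list follow from one relation: both imply 12 oscillator periods per machine cycle. The short sketch below only restates that arithmetic; the function name `machine_cycle_us` is hypothetical, and the factor of 12 is inferred from the figures above rather than taken from a data sheet.

```python
CLOCKS_PER_MACHINE_CYCLE = 12  # implied by 0.5 us at 24 MHz and 1 us at 12 MHz


def machine_cycle_us(f_osc_hz):
    """Duration of one machine cycle in microseconds for a given crystal frequency."""
    if f_osc_hz <= 0:
        raise ValueError("oscillator frequency must be positive")
    return CLOCKS_PER_MACHINE_CYCLE * 1e6 / f_osc_hz


for f_osc in (12e6, 24e6):
    print(f"{f_osc / 1e6:.0f} MHz -> {machine_cycle_us(f_osc):.2f} us per machine cycle")
```

Instructions that take more than one machine cycle scale in the same way, so halving the crystal frequency doubles every execution time.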
4.3.2 Pin Configuration
The AT89S52 microcontroller has 40 pins and runs from a single 5 V power supply. The 40-pin layout is
illustrated below.
Figure: AT89S52 Microcontroller pin diagram
The function of each AT89S52 pin is as follows:
Pins 1 to 8 (Port 1) form an 8-bit bidirectional parallel port that can be used for
general purposes.
Pin 9 (RST) is the reset pin; reset is active when the pin is driven high.
P3.0 (10): RXD (serial port data receiver)
P3.1 (11): TXD (serial port data transmitter)
P3.2 (12): INT0 (external interrupt 0 input, active low)
P3.3 (13): INT1 (external interrupt 1 input, active low)
P3.4 (14): T0 (external input of timer/counter 0)
P3.5 (15): T1 (external input of timer/counter 1)
P3.6 (16): WR (write strobe, active low), the control signal for writing data from Port 0 to
external data memory and external input/output devices.
P3.7 (17): RD (read strobe, active low), the control signal for reading data from external data
memory and external input/output devices into Port 0.
Pin 18 (XTAL2) is the output of the oscillator amplifier and is connected to the crystal.
Pin 19 (XTAL1) is the input of the high-gain oscillator amplifier and is also connected to the crystal.
Pin 20 (Vss) is connected to 0 V (ground) of the circuit.
Pins 21 to 28 (Port 2) form an 8-bit bidirectional parallel port. This port sends the high-order
address byte when external memory is accessed.
Pin 29 (PSEN, Program Store Enable, active low) is the read strobe used to fetch program code from
external program memory (ROM/EPROM) into the microcontroller.
Pin 30 (ALE, Address Latch Enable) latches the address when external memory is accessed. This pin
also functions as PROG (active low), the program pulse input that is activated when the internal
Flash memory of the microcontroller is programmed.
Pin 31 (EA, External Access) selects the program memory to be used: the internal program memory
(EA = Vcc) or external program memory (EA = Vss). It also serves as Vpp (programming supply
voltage) when the internal Flash memory is programmed.
Pins 32 to 39 (Port 0) form an 8-bit bidirectional parallel port. It also functions as the multiplexed
address/data bus for accessing external program and data memory.
Pin 40 (Vcc) is connected to +5 V as the supply for the microcontroller.
All single-chip devices in the MCS-51 family have separate address spaces for program memory and
data memory. This separation allows data memory to be accessed with 8-bit addresses. Even so,
16-bit data memory addresses can be generated through the DPTR register (Data Pointer).
During normal operation, program memory can only be read and not written, because it is stored in the
Flash EPROM. In the AT89S52, 8 Kbytes of this memory are available on the chip.
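The address widths mentioned above translate directly into memory ranges. The following sketch shows the arithmetic only; the function and constant names are hypothetical, and it assumes that the on-chip program memory starts at address 0.

```python
def address_space_bytes(address_bits):
    """Number of distinct byte addresses reachable with the given address width."""
    return 1 << address_bits


FLASH_BYTES = 8 * 1024  # on-chip program memory of the AT89S52 (8 Kbytes)

print(address_space_bytes(8))    # locations reachable with an 8-bit address
print(address_space_bytes(16))   # locations reachable with the 16-bit DPTR
print(hex(FLASH_BYTES - 1))      # last on-chip program address, assuming a start at 0
```

An 8-bit address therefore covers exactly the 256 bytes of internal RAM, while the 16-bit DPTR can reach up to 64 Kbytes of data memory.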
Figure: AT89S52 Microcontroller memory
CHAPTER 5
Transmission Of Digitized Speech Over Wireless Network
5.1 Overview Of GSM Modem
A GSM modem is a specialized type of modem which accepts a SIM card, and operates
over a subscription to a mobile operator, just like a mobile phone. From the mobile operator
perspective, a GSM modem looks just like a mobile phone.
When a GSM modem is connected to a computer, this allows the computer to use the GSM
modem to communicate over the mobile network. While these GSM modems are most frequently
used to provide mobile internet connectivity, many of them can also be used for sending and
receiving SMS and MMS messages.
For the purpose of this project, the term GSM modem is used as a generic term to refer to
any modem that supports one or more of the protocols in the GSM evolutionary family, including
the 2.5G technologies GPRS and EDGE, as well as the 3G technologies WCDMA, UMTS,
HSDPA and HSUPA.
A GSM modem exposes an interface that allows applications such as NowSMS to send and
receive messages over the modem interface. The mobile operator charges for this message sending
and receiving as if it was performed directly on a mobile phone. To perform these tasks, a GSM
modem must support an extended AT command set for sending/receiving SMS messages, as
defined in the ETSI GSM 07.05 and 3GPP TS 27.005 specifications.
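To make the extended AT command set more concrete, the sketch below builds the byte sequences for sending one SMS in text mode with the standard `AT+CMGF` and `AT+CMGS` commands. It is a hedged illustration, not the project's actual software: the function name `build_sms_text_mode` and the phone number are hypothetical, the serial port handling is omitted, and the 160-character limit is the one stated in the introduction.

```python
CTRL_Z = b"\x1a"       # terminates the message body and triggers sending
MAX_SMS_CHARS = 160    # limit for a single short message, as stated in Chapter 1


def build_sms_text_mode(number, text):
    """Return the byte strings to write to the modem, in order.

    After the AT+CMGS line the modem answers with a '> ' prompt; a real
    program must wait for that prompt before writing the message body.
    """
    if len(text) > MAX_SMS_CHARS:
        raise ValueError("message exceeds 160 characters")
    return [
        b"AT+CMGF=1\r",                                   # select SMS text mode
        b'AT+CMGS="' + number.encode("ascii") + b'"\r',   # destination number
        text.encode("ascii") + CTRL_Z,                    # message body + Ctrl-Z
    ]


# Hypothetical destination number and command text, for illustration only.
for chunk in build_sms_text_mode("+10000000000", "ALPHA DEVICE SIX ON"):
    print(chunk)
```

In a complete program each chunk would be written to the serial port connected to the modem, and the modem's responses would be checked before the next chunk is sent.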
GSM modems can be a quick and efficient way to get started with SMS, because a special
subscription to an SMS service provider is not required. In most parts of the world, GSM modems
are a cost effective solution for receiving SMS messages, because the sender is paying for the
message delivery.
5.2 Overview of SIM 300
For the purpose of our project we are using a GSM modem SIM 300 both at the
transmitter and receiver side.
5.2.1 FEATURES
COMPLETE GSM MODEM
HANDLES VOICE / DATA / SMS / FAX
DUAL BAND 900 / 1800 MHz GSM TRANSMISSION
ACCEPTS STANDARD SIM CARD
CAN BE USED ON STANDARD GSM NETWORK
RS232 INTERFACE
USES STANDARD AT COMMANDS
SUPPORTS CLASS 1 FAX COMMANDS
DATA TRANSMISSION UP TO 14400 BAUD
CHAPTER 6
RESULTS
Fig. 4 shows the speech signal waveform for the spoken command phrase "Alpha Device Six On"
and its energy. Fig. 5 shows the plot of the MFCC coefficients {Cmel(1), Cmel(2), Cmel(3), Cmel(4)}
for the spoken word "Alpha". Fifty samples of each spoken word are stored, of which 25 are
used for training and the remaining 25 for testing. Thus a database of 650 word samples is used for
experimentation. The accuracy of correct recognition for the various words in the spoken commands with
principal component analysis (PCA) is shown in Table II. The accuracy for the words "on" and
"one" is relatively low because of the phonetic similarity of these words. However, these words can
easily be discriminated by their different utterance positions in the spoken command phrase.
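The position-based discrimination mentioned above can be sketched as a simple slot grammar. This is an illustration under stated assumptions, not the method used in the experiments: the slot vocabulary below is hypothetical and only inferred from the example phrase "Alpha Device Six On", and the recogniser is assumed to return a ranked list of candidate words for each position.

```python
# Hypothetical slot grammar inferred from the example phrase "Alpha Device Six On".
SLOT_VOCABULARY = [
    {"alpha"},                                        # slot 0: system name
    {"device"},                                       # slot 1: keyword
    {"one", "two", "three", "four", "five", "six"},   # slot 2: device number
    {"on", "off"},                                    # slot 3: action
]


def resolve_phrase(ranked_candidates):
    """Pick, for each slot, the best-ranked candidate that the slot allows."""
    if len(ranked_candidates) != len(SLOT_VOCABULARY):
        raise ValueError("unexpected number of words in phrase")
    resolved = []
    for allowed, candidates in zip(SLOT_VOCABULARY, ranked_candidates):
        choice = next((word for word in candidates if word in allowed), None)
        if choice is None:
            raise ValueError(f"no valid candidate among {candidates}")
        resolved.append(choice)
    return resolved


# The recogniser ranks "one" above "on" in the last slot; the grammar corrects it.
print(resolve_phrase([["alpha"], ["device"], ["six"], ["one", "on"]]))
```

Because "one" can only occur in the device-number slot and "on" only in the action slot, a confusion between them never changes the decoded command.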
FIGURE: Plot of spoken phrase alpha device six on and its energy
FIGURE: Plot of MFCC Coefficients for spoken word alpha
CHAPTER 7
CONCLUSION AND FUTURE ASPECTS
In the near future, speech recognition will become the method of choice for controlling
appliances, toys, tools, computers and robotics. There is a huge commercial market waiting for this
technology to mature.
This project demonstrates in detail the construction of a stand-alone trainable
speech recognition circuit that may be interfaced to control just about anything electrical, such as
appliances, robots, test instruments, VCRs, TVs, etc. With suitable modifications the project can
be extended to various kinds of industrial automation. Controlling and commanding an appliance (computer,
VCR, TV, security system, etc.) by speaking to it makes it easier to use, while increasing the
efficiency and effectiveness of working with that device.
At its most basic level, speech recognition allows the user to perform parallel tasks (i.e. while hands and
eyes are busy elsewhere) while continuing to work with the computer or appliance. Remote
control of devices and retrieval of information about the present status of inputs using spoken
commands have been successfully demonstrated. There is scope for a lot of improvement depending
upon the user requirements, such as inclusion of a greater number of desired commands, selection of
suitable sensors for measurement of analog parameters, etc. This approach can easily be extended to
develop many exciting products, from remote process control to high-end security solutions. It can
prove to be a great boon to blind or physically handicapped persons because of its capability for remote
control through speech commands.
ACCURACY OF WORD RECOGNITION:
However, this method is not suitable for time-critical applications, as the message transfer time to the
destination is variable. This problem can be alleviated to a certain extent by including in the message the
time at which the device should respond and sending the SMS well in advance of the
scheduled event. The software can be modified in this case to check whether timing information is
present and schedule the event accordingly. Alternatively, it is recommended to use data calls
(Fax/GPRS) or DTMF-based calls for an immediate response from the system, with suitable
modifications. The accuracy of the spoken command recognition system is about 98% (much better
than our previous work [17]). Moreover, the spoken phrase can be extended to carry out additional
tasks such as adding a time duration for the device ON/OFF condition. Further, adding a speaker
verification feature can enhance the security level. Presently a PC is used to generate the text
message from the voice command. For dedicated applications, the PC can be replaced by a DSP processor or
FPGA-based system, at a higher initial development cost.
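The suggested software modification, checking a received message for timing information and scheduling the event accordingly, could look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the message format "DEVICE <n> ON|OFF [AT hh:mm]", the function name `parse_command`, and the policy of executing immediately when the requested time has already passed on arrival.

```python
import re
from datetime import datetime

# Hypothetical message format: "DEVICE <n> ON|OFF" with an optional " AT hh:mm".
COMMAND_RE = re.compile(r"^DEVICE (\d+) (ON|OFF)(?: AT (\d{2}):(\d{2}))?$")


def parse_command(message, received_at):
    """Return (device, state, run_at) for a received text message.

    Without timing information the command runs immediately (run_at equals
    received_at). If the requested time has already passed when the message
    arrives, this sketch also runs it immediately (an assumed policy).
    """
    match = COMMAND_RE.match(message.strip().upper())
    if match is None:
        raise ValueError("unrecognised command: " + message)
    device, state = int(match.group(1)), match.group(2)
    if match.group(3) is None:
        return device, state, received_at
    hour, minute = int(match.group(3)), int(match.group(4))
    if hour > 23 or minute > 59:
        raise ValueError("invalid time in command: " + message)
    run_at = received_at.replace(hour=hour, minute=minute, second=0, microsecond=0)
    return device, state, max(run_at, received_at)


arrival = datetime(2019, 7, 31, 17, 0)
print(parse_command("Device 6 on at 18:30", arrival))
print(parse_command("DEVICE 2 OFF", arrival))
```

The returned `run_at` value could then be handed to a timer in the control software, so that variable SMS delivery delay no longer determines when the device switches.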
CHAPTER 8
BIBLIOGRAPHY:
1. S. B. Davis and P. Mermelstein, "Comparison of Parametric Representations for Monosyllabic
Word Recognition in Continuously Spoken Sentences", IEEE Transactions on Acoustics, Speech &
Signal Processing, vol. ASSP-28, Aug 1980, pp 357-366.
2. Thomas F. Quatieri, "Discrete-Time Speech Signal Processing: Principles and Practice",
Pearson Education (Singapore) Pvt. Ltd., Indian Branch, Delhi, India, 2004.
3. N. P. Jawarkar, Vasif Ahmed & R. D. Thakare, "Remote world-wide control through SMS using
Nokia Mobile", IETE Journal of Education, Vol 46, No. 4, Oct-Dec 2005, pp 165-170.
4. http://forum.nokia.com, "AT Commands Set for Nokia
GSM and WCDMA products", Version 1.2, July 2005.
5. http://www.atmel.com/avr
6. G. M. White and R. B. Neely, "Speech Recognition Experiments with Linear Prediction,
Bandpass Filtering & Dynamic Programming", IEEE Transactions on Acoustics, Speech & Signal
Processing, vol. ASSP-24(2), 1976, pp 183-188.
7. L. R. Rabiner and B. H. Juang, "Fundamentals of Speech Recognition", Pearson Education
(Singapore) Pvt. Ltd., 2005.
8. S. Umesh, L. Cohen and D. Nelson, "Frequency warping and the Mel scale", IEEE Signal
Processing Letters, vol. 9, No. 3, March 2001, pp 104-107.
9. J. Hollmen, V. Tresp and O. Simula, "A Learning Vector Quantization Algorithm for Probabilistic
Models", Proceedings of EUSIPCO 2000, X European Signal Processing Conference, Volume II,
pp 721-724.
10. T. Kohonen, "Improved versions of Learning Vector Quantization", International Joint
Conference on Neural Networks, San Diego, CA, 1990, vol. I, pp 545-550.
11. N. N. Katugampala, K. T. Al-Naimi, S. Villette and A. M. Kondoz, "Real Time Data Transmission
over GSM Voice Channel for Secure Voice & Data Applications", University of Surrey, United
Kingdom.