
An Operator Interface for an Autonomous Mobile System
NILESH GOEL, ADITYA AGARWAL, SUBHAYAN BANERJEE, CHANDRA VEER SINGH
Department of Electronics and Communication Engineering,
MNNIT, Allahabad - 211 004, India
Email: nilesh.goel@gmail.com

Abstract— In this paper, a system for reliable detection of the direction and efficient recognition of human voice is proposed, to be used in an autonomous humanoid robot. After detection and recognition, the system performs tasks according to the given commands. Compared with previous research, this system comprises simpler, faster and more accurate algorithms. The system consists of a microphone assembly of three microphones for sound detection and one separate microphone for voice recognition, a band pass filter, a microcontroller as the processing unit for sound detection, a PC (Personal Computer) as the processing unit for speech recognition using MATLAB 7.0.1, a motor controller unit and a mechanical assembly. The robot senses human voice lying in the frequency range 300 Hz – 3400 Hz, detects the direction of the voice using the delay of arrival (DOA) mechanism and then recognizes the voice commands. To show the viability of the proposed algorithms, they have been implemented in an experimental autonomous robot named 'ACFRO' (Autonomous Command Following RObot).

I. INTRODUCTION

Sound detection and its recognition is an interesting field that has drawn the attention of many researchers in recent years [1], [2], [5]-[8], as it is a requirement of any humanoid robot. In [1], [2], sound detection mechanisms have been developed for human-robot interaction. A sound detecting humanoid robot that can also recognize speech is able to work in very harsh, inhuman conditions, such as in nuclear reactors where extremely high temperatures and harmful radiation prevail. With speech recognition, the robot can follow the instructions given by its operator.

The aim of this paper is to design an operator interface for an autonomous mobile humanoid robot, using a microcontroller, which follows its operator whenever voice commands are received and performs the tasks accordingly.

The rest of the paper is organized as follows. The description of the systems for reliable detection and efficient recognition of the voice commands is given in Section II. In Section III, an algorithm to track the sound direction is described, and in Section IV an algorithm for faithful recognition of voice commands is proposed. Experimental results are given in Section V to show the improvement of the proposed algorithms over previously reported algorithms.

II. PROPOSED SYSTEMS

The block diagrams of the proposed systems for reliable detection of the direction of the voice commands and for their recognition are shown in Fig. 1 and Fig. 2 respectively.

Figure 1: Block diagram of the proposed sound detection system.

Figure 2: Block diagram of the proposed voice command recognition system.

A brief description of the various modules of the proposed sound detection system is given below.
MICROPHONE ASSEMBLY

The microphone works as a transducer which converts the audio signal into an electrical signal. The microphone assembly shown in Fig. 3 consists of three condenser microphones (M1, M2 and M3) oriented at an angle of 120° with respect to each other. This arrangement uses the minimum number of microphones to cover the whole 360° uniformly for reliable detection of the direction of the sound source.

Figure 3: Microphone assembly

BAND PASS FILTER

The power spectral density of noise shows that maximum power is concentrated at low frequencies as compared to higher frequencies. Therefore a band pass filter is used to eliminate environmental noise (<300 Hz) as well as to pass the audio signal. Here it is an active second-order Butterworth band pass filter designed using operational amplifiers [4] (IC LM324). The pass band of this filter lies within the audio range (300 Hz – 3400 Hz).
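The filter on the robot is an analog op-amp circuit, but the same pass band can be illustrated digitally. The following Python sketch (an illustration only, not the circuit used on 'ACFRO') designs an equivalent second-order Butterworth band pass filter with SciPy and applies it to a test signal; the 8000 samples/s rate is the one used later for speech recognition.

```python
# Illustrative sketch only: the robot uses an analog op-amp (LM324) filter, but an
# equivalent digital second-order Butterworth band-pass with the same
# 300 Hz - 3400 Hz pass band can be designed and inspected with SciPy.
import numpy as np
from scipy.signal import butter, lfilter

FS = 8000.0            # assumed sampling rate (samples/s)
LOW, HIGH = 300.0, 3400.0

def bandpass(signal, fs=FS, low=LOW, high=HIGH, order=2):
    """Apply a second-order Butterworth band-pass filter to `signal`."""
    nyq = fs / 2.0
    b, a = butter(order, [low / nyq, high / nyq], btype="band")
    return lfilter(b, a, signal)

if __name__ == "__main__":
    t = np.arange(0, 0.1, 1.0 / FS)
    # 50 Hz hum (to be rejected) plus a 1 kHz tone (to be passed)
    x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)
    y = bandpass(x)
    print("input RMS: %.3f, filtered RMS: %.3f"
          % (np.sqrt(np.mean(x ** 2)), np.sqrt(np.mean(y ** 2))))
```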

PROCESSING AND DECISION MAKING UNIT

The processing and decision making is done by ATMEL's AVR family ATMEGA32L microcontroller [3], [9]. This microcontroller has 32 kB of programmable flash memory and a maximum clock frequency of 8 MHz. It has an inbuilt 8-channel, 10-bit A/D converter. Here the A/D converter is used for converting the analog signal at the output of the band pass filter into a digital signal, which is processed by the microcontroller; accordingly, the microcontroller generates appropriate control signals to drive the motors used in the mechanical assembly.

MOTOR CONTROLLER UNIT

The dual full H-bridge motor driver IC L298 [10] is used to control the movement of the motors. Each H-bridge is capable of moving its motor 'clockwise' or 'anticlockwise' depending upon the direction of current flow through the circuit. Using IC L298, it is also possible to 'jam' or 'free' the motors if required. Basically, the L298 [10] acts as an interface between the low power control signals generated by the microcontroller and the motor assembly, which requires relatively high power for driving the motors.
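As an illustration of this interface, the table below maps high-level motion commands onto the enable and input pins of the L298's two H-bridges. The pin assignment and motor polarity are hypothetical (the paper does not give the actual wiring of 'ACFRO'); only the idea of driving the two bridges from low power logic signals is taken from the text.

```python
# Hypothetical mapping of motion commands onto L298 control lines.
# Each tuple is (ENA, IN1, IN2, ENB, IN3, IN4): bridge A drives motor 1 (left
# wheels), bridge B drives motor 2 (right wheels).  Which input polarity means
# "forward" depends on the actual wiring, so this is only an illustration.
L298_STATES = {
    "forward":   (1, 1, 0, 1, 1, 0),   # both motors forward
    "backward":  (1, 0, 1, 1, 0, 1),   # both motors reverse
    "turn_cw":   (1, 1, 0, 1, 0, 1),   # left forward, right reverse: clockwise spin
    "turn_ccw":  (1, 0, 1, 1, 1, 0),   # left reverse, right forward: anticlockwise spin
    "jam":       (1, 1, 1, 1, 1, 1),   # inputs equal with enable high: fast motor stop
    "free":      (0, 0, 0, 0, 0, 0),   # enable low: motors free-run
}

def control_signals(command: str):
    """Return the (ENA, IN1, IN2, ENB, IN3, IN4) logic levels for a command."""
    return L298_STATES[command]

print(control_signals("turn_cw"))   # -> (1, 1, 0, 1, 0, 1)
```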
MECHANICAL ASSEMBLY

This module mainly consists of two brushed DC motors, gear boxes and the vehicle chassis. The side steering mechanism implemented in the mechanical assembly can effectively control the motors for taking sharp turns. Motor 1 controls the motion of the left wheels and motor 2 controls the right wheels, as shown in Fig. 4.

Figure 4: Bottom view of mechanical assembly

Apart from the processing unit, all the modules used in the speech recognition system are the same as in the sound detection system and need no further description. For the voice recognition system, a PC (using MATLAB 7.0.1) acts as the processing and decision making unit; it is explained in detail in Section IV.

III. ALGORITHM TO TRACK DIRECTION OF VOICE COMMANDS

The Delay of Arrival (DOA) [1] mechanism is used for efficient detection of the direction of sound. This mechanism uses the time delay from the sound source to each microphone. The microphones are placed at the vertices of an equilateral triangle. The microcontroller samples the analog electrical signal from each microphone one by one in a predefined cyclic manner (…M1, M2, M3, M1, M2, M3…) and simultaneously converts it to a digital signal with the help of the inbuilt A/D converter.

First, the microcontroller takes a predefined number of samples from each microphone to set the threshold level, which depends on the amplitude of local disturbances. This makes the system immune to local disturbances, as the microcontroller then recognizes only those signals whose amplitude is higher than the set threshold level.
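A minimal sketch of this threshold setting and cyclic sampling logic is given below. It is written in Python for readability; the actual routine runs on the ATmega32L, and the helper read_adc() as well as the number of calibration samples and the safety margin are assumptions made only for illustration.

```python
# Minimal sketch of the threshold setting and cyclic sampling described above.
# read_adc(mic) is a hypothetical stand-in for one 10-bit A/D conversion.
MICS = ("M1", "M2", "M3")
CALIBRATION_SAMPLES = 64      # assumed number of samples taken per microphone
MARGIN = 1.2                  # assumed safety margin above the ambient level

def read_adc(mic):
    """Placeholder for one A/D conversion on the given microphone channel."""
    raise NotImplementedError

def set_threshold():
    """Estimate the ambient-noise threshold from a few samples of each microphone."""
    peak = 0
    for mic in MICS:
        for _ in range(CALIBRATION_SAMPLES):
            peak = max(peak, read_adc(mic))
    return peak * MARGIN

def first_microphone(threshold):
    """Cyclically sample M1, M2, M3 until one of them crosses the threshold."""
    while True:
        for mic in MICS:
            if read_adc(mic) > threshold:
                return mic
```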
Next, the microcontroller samples the three microphones continuously and detects which microphone the sound reaches first (i.e., which signal first exceeds the threshold level). After determining the first microphone that receives the sound, the microcontroller sets the offset angle (0° for M1, 120° for M2 and 240° for M3) according to the orientations of microphones M1, M2 and M3, and it then samples only the remaining two microphones, so that problems due to echo are avoided.

The remaining two microphones (say M2 and M3) are then monitored continuously, and the microcontroller determines which microphone the sound reaches next; accordingly, it decides whether the robot has to take a clockwise or an anticlockwise turn and generates the control signals for the motor controller IC. After this, it samples only the remaining microphone and calculates the delay in the arrival of sound between the two microphones (M2 and M3) using the inbuilt timer. This delay Td determines the angle of deviation from the offset angle, and the offset combined with this deviation gives the final angle through which the robot has to rotate to face the operator. A conversion formula is derived to convert the final angle into the corresponding time duration for which the robot has to turn. To explain the above algorithm, consider the following example.

Suppose the three microphones are sampled in the sequence M1, M2, M3 and this sequence is repeated continuously. The sequence depends entirely on the programming of the microcontroller. There are six possible orders in which the sound can reach the microphones:
1. M1, M2, M3
2. M1, M3, M2
3. M2, M1, M3
4. M2, M3, M1
5. M3, M1, M2
6. M3, M2, M1
Here sequence 1 (M1, M2, M3) means that the sound reaches microphone M1 first, then M2 and then M3. As M1 detects the sound first in this case, the sound source must lie in the angular range θ1 shown in Fig. 5.

Figure 5: Angular range of microphones

Now, as the sound is detected next by M2, the angular region from which the sound is coming is limited to θ2. The DOA of the sound, Td, between M2 and M3 determines the deviation from the offset angle, which is 0° in this case since M1 is the first microphone to detect the sound. As M2 is the second microphone after M1 to detect the sound, the microcontroller generates control signals for the motor controller IC to turn the robot in the clockwise direction for the time duration given by the conversion formula.

For reliable calculation of Td, it is necessary that

Td > 1/f   (1)

where f is the frequency at which the microcontroller samples the digital data received from the microphones. If condition (1) is not satisfied, i.e., the time taken by the microcontroller to sample one microphone (1/f) is more than the DOA Td, the microcontroller will not be able to judge which microphone received the sound first and the proposed algorithm will not be effective. To avoid this situation, the sampling frequency f is kept as high as possible.

The maximum delay of arrival is observed when the sound source lies exactly between two microphones, for example when the angle between the source and M1 is 60°. In this situation, the maximum delay of arrival between M2 and M3, obtained from Fig. 6, is given by

Figure 6: Calculation of maximum time delay

Td max = (L + L cos 60°)/Vs
       = 3L/(2Vs)   (2)

where Vs is the speed of sound and L is the distance of each microphone from the center O of the microphone assembly. Using (1) and (2), we obtain

1/f < 3L/(2Vs).   (3)
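For a feel of what condition (3) implies, the following sketch evaluates (2) and (3) for assumed values of L and Vs; the paper does not state the microphone spacing or the sampling frequency actually used on 'ACFRO'.

```python
# Numerical illustration of conditions (1)-(3) with assumed parameter values.
Vs = 343.0    # speed of sound in air (m/s), at about 20 °C
L = 0.10      # assumed distance of each microphone from the centre O (m)

Td_max = 3 * L / (2 * Vs)     # equation (2): maximum delay of arrival
f_min = 1.0 / Td_max          # from (3): per-microphone sampling must be faster than this

print("Td_max = %.3f ms" % (Td_max * 1e3))                 # ~0.437 ms
print("f must exceed %.1f kHz" % (f_min / 1e3))            # ~2.3 kHz
```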
TIME CALCULATION FOR CONTROL SIGNALS

From Fig. 7, it is clear that

d = 2L cos 30° sin θ
  = √3 L sin θ   (4)

DOA = d/Vs.   (5)

Figure 7: Sound source at an angle

For this DOA the timer counts; let the count of the counter after this time delay be N and the clock frequency of the timer be fT. Therefore,

N = DOA × fT
  = √3 L sin θ × fT/Vs   (6)

where use has been made of (4) and (5). Equation (6) can be rewritten as

θ = sin⁻¹( N Vs / (√3 fT L) ).   (7)

Total angle turned by the robot = offset value ± θ, where

offset value = 0°    if M1 detects the sound first
             = 120°  if M2 detects the sound first
             = −120° if M3 detects the sound first

and the '+' sign is taken when the second microphone to detect the sound lies in the clockwise direction from the first detected microphone, while the '−' sign is taken when it lies in the anticlockwise direction from the first detected microphone. Hence,

Total angle to turn by the robot = offset value ± sin⁻¹( N Vs / (√3 fT L) ).

If the total angle to turn is more than 180°, the effective angle is obtained by subtracting the total angle from 360°, and the microcontroller generates control signals to make the robot take an anticlockwise turn.

Total rotation time of the robot:

t = total angle turned / turning speed of the robot
  = { offset value ± sin⁻¹( N Vs / (√3 fT L) ) } / ωr   (8)

where ωr is the turning speed of the robot.
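The sketch below puts equations (7) and (8) together: it converts a timer count N into the deviation angle, applies the offset and the sign convention, folds angles larger than 180° onto a turn in the opposite sense, and converts the result into a rotation time. The values of L, fT and ωr are assumptions; in the firmware the turn direction is decided from the order in which the microphones fire, which is passed in here as a boolean.

```python
# Sketch of equations (7) and (8).  L, F_T and OMEGA_R are assumed values; the
# offsets and the sign convention follow the text above.
import math

Vs = 343.0        # speed of sound (m/s)
L = 0.10          # assumed distance of each microphone from the centre O (m)
F_T = 1.0e6       # assumed timer clock frequency fT (Hz)
OMEGA_R = 90.0    # assumed turning speed of the robot (degrees/s)

OFFSET = {"M1": 0.0, "M2": 120.0, "M3": -120.0}   # offset value per first microphone

def turn_command(first_mic, second_is_clockwise, timer_count):
    """Return (signed turn angle in degrees, rotation time in s); + means clockwise."""
    theta = math.degrees(math.asin(timer_count * Vs / (math.sqrt(3) * F_T * L)))  # (7)
    total = OFFSET[first_mic] + (theta if second_is_clockwise else -theta)
    # An angle of more than 180° is taken the other way round instead.
    if total > 180.0:
        total -= 360.0
    elif total < -180.0:
        total += 360.0
    return total, abs(total) / OMEGA_R                                             # (8)

# Example: M1 fires first, M2 (its clockwise neighbour) second, N = 200 counts.
print(turn_command("M1", True, 200))
```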
ERROR CALCULATION

Figure 8: Setup for maximum error calculation

As the time taken by the microcontroller to sample one microphone is 1/f, there is an error in the calculation of the angle; in other words, this is the minimum angle (the resolution of the system) that can be reliably detected by the robot. From Fig. 8, we obtain

a = 2L cos 30° sin θ
  = √3 L sin θ   (9)

Time delay = a/Vs
           = √3 L sin θ / Vs
           = 1/f (for one sampling time period).   (10)

From (9) and (10), we have

θ_error = sin⁻¹( Vs / (√3 f L) )   (11)

θ_error is the maximum error, or the minimum detectable angle (the resolution of the system), in determining the angle.

This θ_error is calculated for an assembly of three microphones. If we generalize it to an assembly of, say, n microphones, then from Fig. 9

Figure 9: Maximum error calculation for n microphones

it can be calculated that

θ_error = sin⁻¹( Vs / (2 L f sin(180°/n)) )   (12)

The curve of θ_error versus n, plotted with the help of MATLAB, is given below.

Fig. 10: Plot of θ_error vs. n
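The following Python/Matplotlib sketch reproduces the shape of the curve in Fig. 10 directly from equation (12); the values of L and f are assumptions, as the paper does not list the parameters used for its MATLAB plot.

```python
# Reproduces the shape of Fig. 10 (theta_error versus n) from equation (12).
# L and f are assumed values; the paper's own plot was generated in MATLAB.
import numpy as np
import matplotlib.pyplot as plt

Vs = 343.0      # speed of sound (m/s)
L = 0.10        # assumed microphone distance from centre O (m)
f = 20000.0     # assumed per-microphone sampling frequency (Hz)

n = np.arange(3, 13)                                      # number of microphones
arg = Vs / (2 * L * f * np.sin(np.radians(180.0 / n)))    # equation (12)
theta_error = np.degrees(np.arcsin(arg))

plt.plot(n, theta_error, marker="o")
plt.xlabel("number of microphones n")
plt.ylabel("theta_error (degrees)")
plt.title("Angular resolution versus number of microphones")
plt.grid(True)
plt.show()
```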
IV. ALGORITHM TO RECOGNIZE VOICE COMMANDS

To recognize voice commands efficiently, different parameters of speech such as pitch, amplitude pattern or power/energy can be used. Here the energy of the speech signal is used to recognize the voice commands.

First, the voice commands are taken with the help of a microphone that is directly connected to the PC. The analog voice signals are then sampled using MATLAB. As speech signals generally lie in the range of 300 Hz – 4000 Hz, according to the Nyquist sampling theorem the minimum sampling rate required is 8000 samples/second.
After sampling, the discrete data obtained is passed through a band pass filter with a pass band of 300 – 4000 Hz. The basic purpose of the band pass filter is to eliminate the noise that lies at low frequencies (below 300 Hz); above 4000 Hz there is generally no speech signal.

The proposed voice recognition algorithm is based on speech templates. A template basically consists of the energy of the discrete signal. To create a template, the energy of each sample is calculated and the accumulated energy of 250 subsequent samples is represented by one value. For example, if 8000 samples are taken, the energy of the discrete data is represented by 32 discrete values when the energy of each block of 250 consecutive samples (i.e. 1-250, 251-500, …, 7751-8000) is accumulated and represented by one value. The number of samples taken and grouped is entirely flexible and can be changed keeping in mind the available memory space and the processing time.
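A Python sketch of this template construction is given below (the implementation reported in the paper is in MATLAB); the block length of 250 samples and the 8000-sample command length follow the text, everything else is an assumption.

```python
# Sketch of the energy-template construction described above.  An 8000-sample
# command is reduced to a 32-value template by accumulating the energy of
# successive 250-sample blocks.
import numpy as np

BLOCK = 250          # samples per block (from the text)
N_SAMPLES = 8000     # samples per command at 8000 samples/s (one second)

def make_template(signal):
    """Return the energy template of a (band-pass filtered) command signal."""
    x = np.asarray(signal, dtype=float)[:N_SAMPLES]
    x = np.pad(x, (0, N_SAMPLES - len(x)))     # pad short recordings with zeros
    blocks = x.reshape(-1, BLOCK)              # 32 blocks of 250 samples
    return np.sum(blocks ** 2, axis=1)         # accumulated energy per block

# Example with random data standing in for a recorded command:
template = make_template(np.random.randn(N_SAMPLES))
print(template.shape)    # (32,)
```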
For recognition of the commands, a dictionary is first created that consists of templates of all the commands that 'ACFRO' has to follow (in our case 'Turn Left', 'Move Right', 'Come Forward' and 'Go Back'). For creating the dictionary, the same command is recorded several times (15 times in the case of 'ACFRO') and a template is created each time. The final template is the average of all these templates, and it is stored in the dictionary.
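Continuing the sketch above, the dictionary step reduces the 15 templates recorded for each command to a single averaged template:

```python
# Dictionary construction: the final template of a command is the average of the
# templates of its repeated recordings (15 in the case of 'ACFRO').
import numpy as np

def build_dictionary(templates_per_command):
    """templates_per_command maps a command name to a list of energy templates,
    each produced by make_template() in the previous sketch."""
    return {cmd: np.mean(np.stack(tpls), axis=0)
            for cmd, tpls in templates_per_command.items()}
```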
After the dictionary of templates has been created, the command to be followed is taken with the help of the microphone, and the template of the input command signal is created using the same procedure as described above.

The template of the received command is then compared with the templates of the dictionary using the Euclidian distance, which is the accumulation of the squared differences between the values of the dictionary template and those of the command template at each sample point. The formula can be given as

D = Σ_i (d_i − c_i)²

where i runs over the sample points (32 in the proposed algorithm), d_i is the i-th value of the dictionary template and c_i is the i-th value of the command template.
After the Euclidian distance to each dictionary template has been calculated, the distances are sorted in ascending order to find the smallest among them. This smallest distance corresponds to a particular dictionary template, and hence to a particular dictionary command. The robot then takes this as the command given by the operator and performs the task accordingly.

If the command given by the operator does not match any of the dictionary commands, the robot should not follow it. To incorporate this feature in 'ACFRO', an individual maximum range of Euclidian distance values has been set for each dictionary command. If the calculated Euclidian distance of the received command does not lie in the range of any dictionary command, the received command is considered a strange one and 'ACFRO' requests a familiar command. The efficiency of the proposed algorithm depends on the mechanism of dictionary creation, on the method of comparing the dictionary templates with the received command template, and on the range of values allowed for the Euclidian distance. If the number of times the same command is recorded for creating the dictionary is increased, the efficiency of the proposed algorithm goes up.
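The matching and rejection step can then be sketched as below, reusing the make_template and build_dictionary sketches above. The per-command distance limits are assumptions; the paper states that an individual maximum Euclidian distance is set for each dictionary command but does not give the values.

```python
# Matching and rejection step, tying the sketches above together.  The limits in
# `max_distance` are assumptions made only for illustration.
import numpy as np

def recognize(command_template, dictionary, max_distance):
    """Return the best-matching command, or None if the input looks unfamiliar."""
    distances = {
        cmd: float(np.sum((tpl - command_template) ** 2))   # Euclidian distance D
        for cmd, tpl in dictionary.items()
    }
    best = min(distances, key=distances.get)                # smallest distance wins
    if distances[best] > max_distance[best]:                # outside the allowed range
        return None                                         # 'ACFRO' asks for a familiar command
    return best
```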
V. EXPERIMENTAL RESULTS

The experimental setup of the robot 'ACFRO' is shown in Fig. 11. The experiment is performed in a living room where the sounds of the moving motors, a fan, a tube light and an air conditioner act as background noise. The audio source is placed at intervals of 30° on a circular path of 1 m radius. The angle detected by the robot and the corresponding error in detection are therefore measured at 12 points on the circumference of the circle, as shown in Table I.

Figure 11: Experimental setup for sound direction detection

From Table I it can be observed that the average error in angle detection is very small. Moreover, the practical value of the maximum error (6°) in angle measurement closely matches the theoretical value (1.5°).

To show the viability of the proposed algorithm for speech recognition, the algorithm is implemented with the help of MATLAB and the results are shown in Table II. To construct the table, each dictionary word is tested 15 times and the accuracy of detection is obtained from the correct recognitions.

Table I: EXPERIMENTAL OBSERVATIONS FOR SOUND DIRECTION DETECTION
(Location: living room; distance of the source: 1 m)

Angle    Angle Detected    Error
0°       6°                +6°
30°      34°               +4°
60°      64°               +4°
90°      94°               +4°
120°     125°              +5°
150°     156°              +6°
180°     178°              -2°
210°     206°              -4°
240°     235°              -5°
270°     264°              -6°
300°     296°              -4°
330°     327°              -3°
Average error: +4°

Table II: EXPERIMENTAL OBSERVATIONS FOR VOICE RECOGNITION

Word            Correct Recognition Percentage
TURN LEFT       80
MOVE RIGHT      65
COME FORWARD    80
GO BACK         75

VI. CONCLUSION

Algorithms for the proposed interface have been developed, and the experimental results show that the proposed algorithms are efficient enough to fulfil the objective of successfully interfacing a robot with its operator through voice direction detection and voice recognition. Compared to previous research [1], this system is more accurate, simpler and faster.

REFERENCES

[1] Kim H.D., Choi J.S., Kim M., and Lee C.H., 2004, "Reliable Detection of Sound's Direction for Human Robot Interaction", IEEE/RSJ International Conference on Intelligent Robots and Systems, Sendai, Japan, Sept. 28 - Oct. 2, 2004.

[2] Yangisawa K., Ohya A., and Yuta S., 1995, "An Operator Interface for an Autonomous Mobile Robot using Whistle Sound and a Source Direction Detection System", Proc. of 21st Int. Conference on IEEE IECON-1995, vol. 2, pp. 1118-1123, 6-10 Nov. 1995.

[3] Gadre D.V., 2003, Programming and Customizing the AVR Microcontroller, Tata McGraw-Hill, New Delhi, India, 2006 reprint.

[4] Gayakwad R.A., 2000, Op-Amps and Linear Integrated Circuits, Pearson Education (Singapore) Pte. Ltd., Delhi, India, pp. 268-273, 2004 reprint.

[5] Shindoi T., Hirai T., Takashima K., and Usami T., 1999, "Plant Equipment Diagnosis By Sound Processing", Proc. of 25th Annual Conference of the Industrial Electronics Society, IECON-99, IEEE, vol. 2, pp. 1020-1026, 29 Nov. - 3 Dec. 1999.

[6] Shun-H. and Chang F. T. W., 2003, "Underwater Sound Detection Based on Hilbert Transform Pairs of Wavelet Bases", Proc. of OCEANS-2003, vol. 3, pp. 1680-1684, 22-26 Sept. 2003.

[7] Estrada R.F. and Starr E. A., 2005, "50 Years of Acoustic Signal Processing for Detection: Coping with the Digital Revolution", Annals of the History of Computing, IEEE, vol. 27, issue 2, pp. 65-78, April-June 2005.

[8] Kikuchi M., Tanisawa S. and Hirose H., 2004, "Measurement of Low-Pressure Gas Flow Rate by the Active Sound Detection Method", Annual Conference of SICE-2004, vol. 1, pp. 471-474, 4-6 Aug. 2004.
