
Isolated Digit Recognition

Saad Shahid Khokhar, Muhammad Junaid Ashfaq
Telecommunication Engineering Department, University of Engineering and Technology (UET) Taxila, Taxila-47050, Rawalpindi, Pakistan
saad_skhokhar@yahoo.com, m_junaid_a@yahoo.com

Abstract: Voice-activated routing systems at customer call centres, voice dialling on mobile phones, and many other everyday applications now embed speech recognition technology. This paper presents an isolated digit recognition system. The system is implemented in MATLAB using its built-in functionality. The key idea is to recognize isolated digits in an utterance captured with a microphone. The output is displayed graphically using MATLAB's GUI features. The overall system is speaker-dependent; that is, it recognizes speech only from one particular speaker's voice.

Keywords: embedded systems, speech recognition, speaker dependent, GUI

I. INTRODUCTION

The speech signal carries several levels of information. Primarily, it conveys the word or message being spoken; on a secondary level, it conveys information about the identity of the talker. The field of speech recognition is concerned with extracting the underlying linguistic message in an utterance and also with extracting the identity of the person speaking it. As speech interaction with computers becomes more pervasive in activities such as telephone financial transactions and information retrieval from speech databases, the utility of automatically recognizing a speaker based solely on vocal characteristics increases. A robust speech-recognition system combines accuracy of identification with the ability to filter out noise and adapt to other acoustic conditions, such as the speaker's speech rate and accent. Designing a robust speech-recognition algorithm requires detailed knowledge of signal processing and statistical modelling.

Most speech-recognition systems are classified as isolated or continuous. In isolated word recognition, there is a brief pause between each spoken word, whereas continuous speech recognition does not require such pauses. Isolated word/digit recognition involves two major stages: a training stage and a testing stage. Training involves teaching the system by building its dictionary, that is, an acoustic model for each word that the system needs to recognize. In our case, we built the system's vocabulary by training it to recognize the digits from zero to nine. In the testing stage, we use the acoustic models of these digits to recognize isolated words using a classification algorithm.

II. RESEARCH WORK

The development workflow consists of three steps: speech acquisition, speech analysis, and user interface development.

A. SPEECH ACQUISITION

The training stage requires acquisition of speech from the trainer. The speech is acquired from a microphone and brought into the development environment for offline analysis. During the training stage, it is necessary to record repeated utterances of each digit in the dictionary. For example, we repeat the word "one" many times with a pause between each utterance. In the MATLAB environment, the command

>> y = wavrecord(30*8000,8000);

captures 30 seconds of speech from a microphone input at 8000 samples per second, and the command

>> wavwrite(y,8000,'X.wav');

where X = one, two, ..., zero, saves the data to disk. Repeating these commands for each digit leaves us with the system's dictionary, comprising the digits from zero to nine.

This approach works well for training data. In the testing stage, however, we need to continuously acquire and buffer speech samples and, at the same time, process the incoming speech frame by frame or in continuous groups of samples.
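The buffering and frame-by-frame processing needed in the testing stage can be illustrated with a minimal sketch. It is written in Python rather than the MATLAB used in this work, and it is not the authors' code. The frame length of 160 samples (20 ms at 8000 samples per second) and the name `frames_from_stream` are assumptions chosen for illustration only.

```python
from typing import Iterable, Iterator, List

SAMPLE_RATE = 8000   # samples per second, as in the recording command
FRAME_LEN = 160      # hypothetical frame size: 20 ms at 8000 samples/s


def frames_from_stream(chunks: Iterable[List[float]],
                       frame_len: int = FRAME_LEN) -> Iterator[List[float]]:
    """Buffer incoming sample chunks and yield fixed-length frames."""
    buffer: List[float] = []
    for chunk in chunks:
        buffer.extend(chunk)
        # Emit every complete frame currently held in the buffer.
        while len(buffer) >= frame_len:
            yield buffer[:frame_len]
            buffer = buffer[frame_len:]
    # Trailing samples that do not fill a whole frame are discarded.


if __name__ == "__main__":
    # A 30-second recording at 8000 samples per second holds 240000 samples.
    total_samples = 30 * SAMPLE_RATE
    # Simulate an input device that delivers chunks of uneven size.
    chunks = [[0.0] * 100, [0.0] * 250, [0.0] * 130]
    frames = list(frames_from_stream(chunks))
    print(total_samples, len(frames))  # prints: 240000 3
```

Because the generator yields a frame as soon as enough samples have arrived, analysis can run while acquisition continues, which is the behaviour the testing stage calls for.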

Fig1: Plot of the speech signal for "one" and the outdetect signal of the detected word.

B. SPEECH ANALYSIS

This step provides a complete analysis of the input speech. We developed a word-detection algorithm that separates each word from ambient noise. The algorithm takes the input speech in the form of frames and calculates the energy and the number of zero crossings in each frame. These values are then compared with thresholds to determine whether there is possible voice activity and hence a spoken digit. A benchmark is developed for the threshold.

If either the energy or the number of zero crossings exceeds its threshold, the algorithm continues analysing frames and starts buffering. The algorithm specifies a buffer length. If the number of contiguous frames does not exceed the buffer length, we have a false alarm and the analysis of frames continues. If the number of contiguous frames exceeds the buffer length, we have detected a word. The algorithm keeps analysing frames until it encounters "buffer length" contiguous frames in which neither the energy nor the zero-crossing count exceeds the threshold. This means we have analysed past the end of the spoken digit. The duration of the detected digit is then compared with a duration threshold (0.25 s). If the duration exceeds the threshold, voice activity is marked; otherwise, the digit is disregarded.

In speech analysis, we use Gaussian mixture models for our speech comparison. To obtain the pattern of the detected speech, mel-frequency cepstral coefficients (MFCCs) are computed in this algorithm. In sound processing, the mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency. The coefficients that collectively make up an MFC are called mel-frequency cepstral coefficients.

Fig3: MFCC of digit one.
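The word-detection procedure described above can be sketched as follows. This is an illustrative Python version, not the MATLAB code used in this work. The function names, the threshold values passed by the caller, and the choice to report frame indices are all assumptions. Only the 0.25 s minimum duration comes from the text.

```python
from typing import List, Sequence, Tuple


def frame_energy(frame: Sequence[float]) -> float:
    """Sum of squared samples in the frame."""
    return sum(x * x for x in frame)


def zero_crossings(frame: Sequence[float]) -> int:
    """Number of sign changes between consecutive samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def detect_words(frames: Sequence[Sequence[float]], energy_thr: float,
                 zc_thr: float, buffer_len: int, frame_dur: float,
                 min_dur: float = 0.25) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of detected words, end exclusive."""
    words: List[Tuple[int, int]] = []
    start = None        # first frame of the current candidate word
    run = 0             # contiguous active frames seen for the candidate
    quiet = 0           # contiguous inactive frames after confirmation
    confirmed = False   # True once run has exceeded buffer_len
    count = 0
    for i, frame in enumerate(frames):
        count = i + 1
        active = (frame_energy(frame) > energy_thr
                  or zero_crossings(frame) > zc_thr)
        if start is None:
            if not active:
                continue
            start, run, quiet, confirmed = i, 0, 0, False
        if not confirmed:
            if active:
                run += 1
                confirmed = run > buffer_len
            else:
                start = None          # false alarm: activity was too short
            continue
        if active:
            quiet = 0
            continue
        quiet += 1
        if quiet >= buffer_len:       # analysed past the end of the word
            end = i - quiet + 1
            if (end - start) * frame_dur > min_dur:
                words.append((start, end))
            start = None
    # Close a confirmed word that is still open when the input ends.
    if start is not None and confirmed:
        end = count - quiet
        if (end - start) * frame_dur > min_dur:
            words.append((start, end))
    return words
```

In this sketch, a short burst of activity that never exceeds the buffer length is dropped as a false alarm, and a confirmed word is kept only if it lasts longer than `min_dur` seconds.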
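The use of Gaussian mixture models for comparison can also be illustrated. The text does not give the decision rule, so the sketch below assumes a common one: each digit has its own mixture model with diagonal covariances, and the digit whose model gives the highest total log-likelihood over the feature vectors is chosen. The code is Python, the function names are hypothetical, and the model parameters in the usage note are made up for illustration. They are not trained values from this work.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One mixture component: (weight, per-dimension means, per-dimension variances)
Component = Tuple[float, Sequence[float], Sequence[float]]


def log_gauss_diag(x: Sequence[float], mean: Sequence[float],
                   var: Sequence[float]) -> float:
    """Log density of a Gaussian with diagonal covariance at point x."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(features: Sequence[Sequence[float]],
                       gmm: List[Component]) -> float:
    """Total log-likelihood of all feature vectors under one mixture model."""
    total = 0.0
    for x in features:
        logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        peak = max(logs)
        # Log-sum-exp over the components, shifted by the peak for stability.
        total += peak + math.log(sum(math.exp(l - peak) for l in logs))
    return total


def classify(features: Sequence[Sequence[float]],
             models: Dict[str, List[Component]]) -> str:
    """Return the label whose model scores the features highest."""
    return max(models, key=lambda d: gmm_log_likelihood(features, models[d]))
```

As a usage example, with two toy one-dimensional models `{"one": [(1.0, [0.0], [1.0])], "two": [(1.0, [5.0], [1.0])]}`, calling `classify([[4.8], [5.1]], models)` selects `"two"`, since both feature values lie close to that model's mean. In a real system the feature vectors would be the MFCCs of the detected word.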

Fig2: Code for finding frame energy and zero crossings.

C. USER INTERFACE DEVELOPMENT

The appeal of the project lies in its GUI environment. The detected word is displayed in a GUI built with MATLAB's built-in functions. After developing the isolated digit recognition system in an offline environment with pre-recorded speech, we migrate the system to operate on streaming speech from a microphone input. We use the MATLAB GUIDE tools to create an interface that displays the time-domain plot of each detected word as well as the classified digit.

Fig4: MATLAB GUI

III. CONCLUSION

The algorithm recognizes spoken digits robustly, with a high detection rate, while keeping the effect of noise to a minimum. The spoken digit is recognized efficiently, even in a noisy environment.

IV. FUTURE WORK

This work addresses isolated digits. It can be extended to continuous speech and to a larger dictionary.

V. REFERENCES

[1] Daryl Ning, "Developing an Isolated Word Recognition System in MATLAB."
[2] David Roberts, "Voice Recognition Using MATLAB."
[3] Douglas Reynolds, "Gaussian Mixture Models," MIT Lincoln Laboratory, 244 Wood St., Lexington, MA 02140, USA. dar@ll.mit.edu
[4] Douglas A. Reynolds and Richard C. Rose, "Robust text independent speaker identification using GMM."
