Speech Recognition
P. Sudhakara Rao
Central Electronics Engineering Research Institute Centre,
CSIR Madras Campus, Tharamani,
Chennai – 600 113, India
Introduction
• Robust Speech Recognition and Synthesis in
Embedded Systems provides a link between
the technology and the application worlds
• Language dependent
• Reverse Engineering not possible
• Interdisciplinary Research
• Technology Matured now
– Some practical Applications available
– Wide range of applications in sight
Speech Technology Helps In
• Communication with Computers
• Information Services
• Language Translation
• Aids for Handicapped
• Recognition of messages
• Recognition of Speakers
• Text to Speech Conversion
• Language Identification
Applied Research Problems
• Fluent Speech recognition
– (All speakers, all accents, all environments, etc.)
• Natural sounding Synthetic Speech
– (Desired accent, language, voice, etc.)
• Speaker Recognition
– (Text independent identification and verification of a
speaker)
• Speech Communication / understanding messages in high
ambient noise conditions
Embedded Speech Technology
[Block diagram; layout lost in extraction. Recoverable labels:]
• Inputs: Speech (A/D), Input Text
• Knowledge sources: Linguistic, Phonetics, Parsing
• Pre-Processing
• Speech Analysis (Extraction of Parameters)
• Data Compression and Representation
• Knowledge Representation, Database, Rules etc.
• Text Processing
• Concatenation of Speech Parameters / Samples (Rules)
• Rule of Prosody
• High Quality Speech Synthesizer
• Speech Recognition (output: Message)
• Speaker Recognition (output: Speaker Identity)
APPLICATIONS
• Command & Control
• Information Retrieval / Entry
• Telecomm. Services
• Aids for Handicapped
• Machine Translation
Speech: Natural, Efficient &
Economical Way of Communication
Specific Properties of Indian
Spoken Languages
• Phonetic in Nature
• Better Articulatory discipline
• Systematic manner of production
• Very few Flaps/Taps or Trills
• Five or Six distinct places of Articulation
• Few fricatives compared to English
Acoustic Phonetic classification
of Hindi and Bengali sounds
Major Achievements in
Speech Recognition
• TECHNICAL SPECIFICATIONS:
INPUT: Close-talking, head-worn microphone.
OUTPUT: Parallel and serial (RS-232C) ports
Built in 7-segment LED display.
PRE-PROCESSOR:
Input Bandwidth: 200 Hz - 7000 Hz.
Filter Bank analyzer: 16 critical bands; lowpass filter 40 Hz.
DATA PROCESSOR: MC68000 based microcomputer.
WORD BOUNDARIES: Based on silence / background noise level.
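The word-boundary entry above can be illustrated with a minimal sketch of energy-based endpoint detection. This is not the original firmware: the frame length, the number of leading frames used to estimate the background noise, and the threshold factor are all assumed values, and the function names are hypothetical.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def find_word_boundaries(samples, frame_len=160, noise_frames=5, factor=4.0):
    """Return (start, end) sample indices of regions louder than the noise floor.

    Assumption: the first `noise_frames` frames contain only background noise.
    """
    energies = frame_energies(samples, frame_len)
    lead = energies[:noise_frames]
    if not lead:
        return []
    # Background noise level estimated from the leading (assumed silent) frames.
    threshold = factor * sum(lead) / len(lead)

    words, start = [], None
    for idx, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = idx                      # speech begins
        elif energy <= threshold and start is not None:
            words.append((start * frame_len, idx * frame_len))
            start = None                     # speech ends
    if start is not None:                    # speech runs to the end of the signal
        words.append((start * frame_len, len(energies) * frame_len))
    return words


# Synthetic example: quiet background, one loud "word", quiet background.
silence = [1, -1] * 400      # 800 samples
word = [100, -100] * 240     # 480 samples
signal = silence + word + silence
print(find_word_boundaries(signal))  # [(800, 1280)]
```

A practical detector would usually add a minimum word duration and a hangover period so that short pauses inside a word do not split it.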
DATA COMPRESSION:
Removal of redundant information.
Variable segment encoding (100:1 approx.)
PATTERN MATCHING:
Template matching ( Dynamic time warping / Discrete HMM).
AVAILABLE INTERFACES:
Stepper motor controller.
Multilingual (GIST) CRT Terminal.
Telephone Dialer
Speech Synthesizer (CVSD CODEC)
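The pattern-matching entry in the specification above names template matching with dynamic time warping (DTW). The following is a minimal sketch of that idea, not the system's actual implementation: it assumes each utterance is a sequence of feature frames (for example, filter-bank outputs), uses Euclidean distance between frames, and applies the basic unconstrained step pattern. The names `dtw_distance` and `recognise` and the toy templates are hypothetical.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Accumulated DTW cost between two sequences of feature frames."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = best accumulated cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # seq_a frame repeated against seq_b[j-1]
                cost[i][j - 1],      # seq_b frame repeated against seq_a[i-1]
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def recognise(utterance, templates):
    """Return the label of the stored template closest to the utterance."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


# Toy two-dimensional "features"; real frames would have one value per band.
templates = {
    "left": [(0, 0), (1, 0), (2, 0)],
    "right": [(0, 0), (0, 1), (0, 2)],
}
utterance = [(0, 0), (1, 0), (1, 0), (2, 0)]  # "left" spoken more slowly
print(recognise(utterance, templates))  # left
```

Because the warping path may repeat frames, the slower utterance aligns with the "left" template at zero cost, which is the property that makes DTW tolerant of speaking-rate variation.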
Voice Controlled Wheel Chair
INDIGENOUS TECHNOLOGY
• On-spot chair rotation (spin movement)
• Boon for quadriplegic and paraplegic patients
• Automatic and manual control options
Motors: 24 V / 120 W
Speed: 0 - 4 km/hour
Braking: Electromagnetic
Major Achievements in
Basic Speech Research
[Flow chart of the database recording and labeling procedure; layout lost in extraction. Steps as recovered:]
1. Channel 1 and Channel 2 feed the Computerised Speech Lab (CSL) System
2. Setting the recording condition through software (S/Rate: 16,000; Duration: 40 Sec)
3. Recording five sentences (approx.) at a time
4. Marking sentence boundaries
5. Confirm sentence boundaries by listening back
6. Store the data in a file
7. Copy the digital speech database into CD-ROM
8. Process this data file using semi-automatic labeling tool
9. Manual inspection and correction of segment boundaries
10. Organisation and distribution of the labeled data
General Purpose Speech Data Base for
Hindi
• A general-purpose database has been created for a vocabulary of the 1000 most frequently used
words.
– The specifications of the data base are as follows
• This database will be very useful for developing general-purpose speech recognition systems
Data Base of 1000 sentences
• A phonetically rich database has been created for 1000 sentences
– The specifications of the data base are as follows:
1. Language : Standard Hindi
2. Vocabulary Size : A set of 800 phonetically compact
and 2 phonetically rich sentences
3. Speakers : 100 speakers (60 male and 40 female)
4. Utterances : one
5. Audio Recording : 2-channel recording using two different
microphones
6. Microphones : SHURE microphone and an ordinary microphone
7. Signal to Noise ratio : First Channel (50 dB), Second Channel (20 dB)
8. Digitisation : 16 kHz sampling, 16-bit quantization.
9. Storage Media : Floppies, Tape cartridge, CD-ROM.
10. Recording Platform : directly on a Pentium PC
11. Specialised H/W : Kay’s Computerised Speech Lab
12. Labelling : Manual Labelling using Sensimetrics
Speech Station Software
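As a rough check on storage needs, the digitisation figures above (16 kHz sampling, 16-bit quantization, 2 channels) give the uncompressed data rate directly. The sketch below is illustrative arithmetic only; the 40-second duration is the session length quoted in the recording procedure, and the function name is hypothetical.

```python
def recording_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Uncompressed PCM size in bytes for a recording of the given length."""
    bytes_per_sample = bits_per_sample // 8
    return sample_rate_hz * bytes_per_sample * channels * seconds


# Figures taken from the database specification: 16 kHz, 16 bit, 2 channels.
per_second = recording_bytes(16_000, 16, 2, 1)
per_session = recording_bytes(16_000, 16, 2, 40)  # one 40-second recording

print(per_second)   # 64000
print(per_session)  # 2560000
```

At about 2.56 MB per 40-second two-channel session, a single recording already exceeds the capacity of a standard 1.44 MB floppy, which is consistent with tape cartridge and CD-ROM appearing in the storage media list.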
A sample set of 10 sentences
Main Window of the Data base
Sensimetrics Speech Station
Software used for Labeling
Challenges in Speech Technology
• For Applications
– Speech to Speech Translation
– Text Reading Machines
– Multi-lingual dialogue in speech mode
– Voice Operated Telecom Services
– Voice interactive (2 mode / 3 mode) communication services
for multi-media, financial transactions, enquiries, etc.
– Voice commands/control in noisy and hazardous
environment, security applications
THANK YOU