You are on page 1of 6

Neuro-Genetic based Speech Processing for

Promoting Global Cyber Security using


Steganography technique
Tilendra Shishir Sinha, and Gautam Sanyal

that no one can decode the message unless the master image –
Abstract—The goal of Cryptography is to provide voiced frame is found. The format of this master image –
confidentiality and privacy by encrypting the information. If any voiced frame is shown in Fig. 1.
unauthorized human body does cryptanalysis then this goal may
be defeated. In this paper the authors have attempted an FrmNo FrmSeq ImgparBG ImgparFG Vpara
algorithm called STEG_NGBSP (Steganography/Neuro-Genetic Fig. 1. Master image – voiced file format required for the successful
based Speech Processing) using a technique called Steganalysis at the Receiver’s end.
Steganography that inherits some features of Cryptography in
such a way, that the information is encrypted as well as hidden.
The methodology adopted here is the concept of null ciphers,
There are five fields, FrmNo, FrmSeq, ImgparBG, ImgparFG
which are extracted from the speech signal of a Bengali speaker and Vpara. The FrmNo field is used for keeping the track of
using Neuro – Genetic approach. The Speech signal is processed the frame number, which acts as a primary key. The field
and then veiled with an image. Every image is encapsulated with FrmSeq, which acts as a secondary key, maintains the frame
some information, which may be decrypted at the receivers end sequence. The field ImgparBG holds the details of the
using the master image – voiced frame, in which the speech data background image and ImgparFG holds the details of
is covered by another image. The proposed algorithm is unique
in handling the security measures from Steganalysis, thus
foreground image. The field Vpara holds the various
promoting global cyber security and bridging the gap between parameters extracted after speech processing. The format of
the computer and the user with more security through Natural this field is shown in Fig. 2.
Language Processing.
A B C D E F G H I J K L M
Index Terms — Cryptography, Global Cyber Security, Fig. 2. Processed Speech Parameters format using GSAMTSS Algorithm.
Principal Component Analysis, Steganography, Steganalysis,
Wavelet.
Where A : lower bound pitch; B : actual pitch; C : upper
I. INTRODUCTION bound pitch, D : lower bound amplitude, E : actual amplitude;
F : upper bound amplitude; G : auto-correlation coefficient; H
C RYPTOGRAPHY [1] is the art of writing secret
information in such a way that no one can understand
except the intended viewer. It provides privacy, integrity and
: power spectral density; I : eigen vector; J : eigen value; K :
adaptive vector Quantization coefficient; L : wavelet
coefficient; M : equivalent Bengali text. The general literature
authenticity. If the methodology is known then it is very survey in this area of research has been briefly expressed in
difficult to secure the data from the hackers. These hackers the subsequent paragraphs.
can be categorized into two groups, first type is the passive Most of the researchers organize the data in a more
hacker, who just tries to read the information, and the second compressed way by adopting various methods, but the work
type is the active hacker, who not only tries to read the done by Hongwei et al [4], describes the proper usage of
message but also modifies the data and may forward it to its Vector Quantization using Genetic Algorithm [11], [12]. The
destination. In cryptography, the data cannot be hidden. This work has been carried out for an efficient image processing
is possible by adopting a new technology called using digital signal processing. Further work has been carried
Steganography. Steganography [2], [3] is a Greek word, out by Fernando Bacaa et al [5], to divide the outputs into
meaning “Covered Writing”. Through this technology, the zones, here only five zones have been created and each zone
secret message is hidden in another message, in such a way parameters are aggregated in the network model and the
resultant is fed as an input for the best-fit search. This work
Tilendra Shishir Sinha is with the Department of Computer Science and has been further carried out by, Gautam Sanyal et al [6], for
Engineering, as Research Scholar at NIT, Durgapur, West Bengal, India (e- the recognition of the speech and the speaker using
mail: tssinha2006@yahoo.co.in).
Dr. Gautam Sanyal is with the Department of MCA & CSE as HOD, NIT, GSAMTSS algorithm. The algorithm interprets up to certain
Durgapur, West Bengal, India (e-mail: gs_comp@nitdgp.ac.in). number of words and this has been verified with tested data.
Further the technique adopted by Neil F Johnson [7], is the
1-4244-0370-7/06/$20.00 ©2006 IEEE

Authorized licensed use limited to: IEEE Xplore. Downloaded on February 15, 2009 at 21:42 from IEEE Xplore. Restrictions apply.
null cipher, which discusses that a complete plain text is sent frame, which is encapsulated within the images. This master
and the cipher text is generated by extracting the specified image – voiced frame is sent to its destination as per
character from each of the words of plain text. From this, new requirement. If this frame is not received positively then the
sentences are generated, forming null ciphers. The work speech processed at the sender’s end will not be recognized at
carried out by Yasser Ghanbari et al [8], describes the the receiver’s end. Hence this master – voiced frame is
computation of wavelet coefficient, which has been adopted playing the key role in this proposed work. The solution
by the present authors for the betterment of the results and methodology is discussed in the subsequent subsection.
comparisons with the existing algorithms. The comparison has
been attempted with respect to the performance measures like A. Proposed Solution Methodology
Accuracy and Precision with the work of C. Orasan et al [15] Adopting the proposed algorithm can summarize the solution
and Hang Li et al [16]. of the present work with some mathematical analysis. The
The paper has been organized in the following manner, mathematical analysis starts with the extraction of various
section II proposes problem formulation and solution speech parameters which includes wavelet coefficient, ranges
methodology, section III describes the results and discussions, of pitch and amplitude values as a Genetic number, power
section IV gives the concluding remarks and further work and spectral density, auto-correlation coefficient value, eigen
finally section V incorporates all the references been made for value and eigen vector after performing principal component
the completion of this work. analysis and Adaptive Vector Quantization coefficient. The
compression of speech data is to be done by employing
II. PROBLEM FORMULATION AND SOLUTION Discrete Cosine transform and Karhunen – Loeve Transform
METHODOLOGY [13], [14]. Using these parameters an Artificial Neural
Promoting global cyber security is gradually becoming a Network model is created with an arrangement of word maps.
social responsibility of computer professionals. Serious This strategy of forming a word class is called Part-of-speech
research is underway in this area, but maximum work is being (POS). Incorporating word classes into Network model yields
done using cryptography technique with Digital Signature [9], good smoothing or hopefully meaningful generation of
[10] in which RSA algorithm is used but with unhidden sentences in Bengali Language (Indian Language). The
encrypted data. With the sole aim of promoting global cyber proposed algorithm has been compared with the algorithm as
security, a dubious system has been created, such that the devised by Hang Li et al [15], as per the size of the test data.
encrypted data is cleverly hidden from the hacker. This unique Further measures are computed using the concepts adopted by
system has been shown in Fig. 3. Constantin Orasan et al [16]. The complexity of the proposed
algorithm is also being computed.
Processed Speech
B. Mathematical Analysis
Based on the assumption that the original spectral is additive
Image (BG) Image (FG)
with noise. To compute the approximate shape of the wavelet
(i.e., Any real valued function of time possessing some
Hiding Mechanism structure), in a noisy signal and also to estimate its time of
occurrence, two methods are available, first one is a simple
structural analysis and the second one is the template
matching technique. For the detection of wavelets in noisy
Sandwiched Processed Speech Data between two images
signal, assume a class of wavelets, Si(t), I = 0,2,…..N-1, all
having some common structure. Based on this assumption that
noise is additive, then the corrupted signal is modeled by
Master Data Extraction Images are
Cropped
following equation -:
Mechanism
x(n) = s(n) + G d(n)
Where s(n) is the clean speech signal, d(n) is the noise and G
Speech and Voice Recognition at Receivers end is the term for signal-to-noise ratio control. To de-noise this
speech signal, wavelet transform is to be applied. Let the
Fig. 3. Block diagram for Steganalysis. mother wavelet or basic wavelet be ψ(t), which is given by,
ψ(t) = exp (j2πft – t2/2) (1)
The system involves utterance of Bengali words at the Now as per the definition of Continuous Wavelet transform
sender’s end and these words are processed using the CWT(a,τ), yields,
methodology adopted by Gautam Sanyal et al [6]. This
produces a processed speech data, which is then sandwiched
CWT(a, τ)=(1/ a ) ∫ x(t ) ψ{(t-τ)/a}dt
between two digital images just like butter or jam is The parameters in the above equation may be discretized to
sandwiched between two bread slices. The details of this yield the Discrete Parameter Wavelet transform, DPWT(m, n),
processed speech data is kept in a master image – voiced

Authorized licensed use limited to: IEEE Xplore. Downloaded on February 15, 2009 at 21:42 from IEEE Xplore. Restrictions apply.
m Now finding the covariance of equation (8), it yields, the
, τ = n τ0 a
m
by substituting a = a . Thus it yields,
0 0 corresponding Eigen vector
DPWT (m, n) = 2-m/2 ∑ k
x(k ) ψ(2-mk – n) (2) P = cov( p n ) (9)
Where ‘m’ and ‘n’ are the integers, a0 and τ0 are the sampling and P . M i = λi . M i (10)
intervals for ‘a’ and ‘τ’, x(k) is the speech signal. Equation (2)
has been obtained by substituting a0 = 2 and τ0 = 1. This gives where λi are the corresponding Eigen values.
Now the resultant parameters are taken as weights and then
the wavelet coefficient.
formed a large class of Artificial Neural Model called
Now assume that a speech signal has been sampled at regular
Perceptron. The arrangement is done in the form of word
time interval ‘T’ to produce a sample sequence {x (nT)},
maps, with averaged weights of each word during learning
which is denoted as,
period. The matching is to be done in such a way that the
{x (nT) } = x(0), x(T), x(2T),…….,x[(N-1)T] of N sample
values, where ‘n’ is a sample number from n = 0 to n = N-1. output response must be achieved as per the input weights. If
After employing proper Discrete Fourier Transformation the result is not achieved then it should be updated or
(DFT) method, it yields to the equation of the form, searched next averaged weight from the word map.
N−1 Using the above-discussed mathematical parameters, the
X(k)=FD[x(nT)]= ∑x(nT) exp (− jk(2π / N)n)
n=0
(3) speech is processed and stored in a template, after selecting a
proper view port on the background image. This image is to
where k = 0, 1, 2, ……..,N-1 be blurred using Gaussian Low pass filtering, whose transfer
Let WN = exp (-j2π/N) function, is given by,
Thus equation (3) becomes, GLP(u, v) = exp(-(D2(u, v) / 2σ2)) (11)
N −1 Where ‘σ’ is the Standard deviation and D(u, v) is the
X(k) = FD[x(nT)] = ∑ x(nT )W
n =0
kn
N (4) distance from the point (u, v) to the center of the filter.
For De-blurring the image, Gaussian High Pass filtering has
Now to find the power spectral density, it yields, been used, whose transfer function, yields,
N −1 GHP(u, v) = 1 – GLP(u, v) (12)
1
E[x(nT)] =
N
∑ x(nT )
n =0
Mirroring of the foreground image has been done with the
help of transformation matrix, shown below,
where ‘E’ denotes expectation. The variance is, 1 0 
Var[x(nT)] = E{[x(nT) – x’(nT)]2} [x’ y’] = [x y]   (13)
The Auto-Covariance of the signal is given by, 0 − 1
Cxx(g) = E{[x(nT) – x’(nT)][x(nT) – x’(nT)]} (5) Considering two measures, Accuracy and Precision has been
Thus the Power Spectrum Density is, derived to assess the performance of the system, which may
N −1
(6) be formulated as,
PE ( f ) = ∑C xx ( m )W ( m ) exp( − j 2π fm )
m =0 Correctly Re cognized words
where Cxx(m) is the Auto-covariance function with Accuracy = (14)
‘m’ sample and W(m) is the Blackman window function with
Total number of words
‘m’ sample. TPR
Now to perform the data compression, employ Discrete Precision = (15)
Cosine Transform (DCT), which yields only Real Values, and
TPR + FPR
can be easily employed using algorithms, Hence, Where TPR = True positive recognition and FPR = False
N −1
 k 2π nT  positive recognition.
Xc(k) = Re[X(k)] = ∑ x ( nT ) cos 
n=0 N


(7)
C. Proposed Algorithm
Further data compression is to be done using Karhunen- STEG-NGBSP Algorithm {Starts}
Loeve (KL) method, which is widely used for non-linear Step 1. Employ GSAMTSS [6] algorithm for processing the
signals. Using this method, principal component values i.e.,
Speech.
Eigen Values and the corresponding Eigen Vectors, can be
Step 2. Store the parameters in a template and append in the
computed for the proper recognition of speech and the
master – voiced frame file
speaker. Thus, equation (3) yields, a pattern vector p n can be Step 3. Find the size of the template, say, TS
represented by another vector q n of lower dimension by a Step 4. Read the image and find its size, say GS.
Step 5. Select a view port of size TS, over the image, only
linear transformation. when TS < GS otherwise select another image and
Thus p n = [M ] qn (8) do step 1.
where [M ] = [X (k )] for k = 0 to N-1 from equation (3). Step 6. Employ Gaussian Low pass filtering algorithm to
blur the image
and q n = min([M]), such that q n > 0 Step 7. Superimpose one image over the other.

Authorized licensed use limited to: IEEE Xplore. Downloaded on February 15, 2009 at 21:42 from IEEE Xplore. Restrictions apply.
Step 8. Read the incoming image frame.
Step 9. Check for the master image – voiced frame.
Step 10. If found then do step 11 else do step 17. TABLE II
SPOKEN WORDS ARE /TAAKA/ /JOMA/ /KORE/ /DAO/
Step 11. Take the Mirror Reflection of the image
Important Lower Bound Actual Bound Upper Bound
Step 12. Clear the View port parameters
Step 13. Employ now Gaussian High pass filtering Amplitude -0.213701 0.0370962 0.172668
algorithm to de-blur the image. Pitch -32.11065 -9.458762 30.77158
Step 14. Read the template parameters again RXX 0.0 1.51249e+010 1.0
PSD 0.0 5.24789e+009 1.0
Step 15. Employ GSAMTSS [6] algorithm for the Evector 0.0 4.34587e-005 1.0
recognition of the spoken speech. Evalue 0.0 1.65892e-005 1.0
Step 16. Display the desired message in Bengali Font. PDF 0.0 4.14651e+012 1.0
Step 17. Display the error and stop. AVQC 0.0 0.0548569 1.0
WLC 0.0 0.09486995 1.0
STEG-NGBSP Algorithm {Ends}
An original image, which is acting as a background image, is
D. Computation of Complexity selected as shown in Fig. 4a, and its histogram is shown in
Let us assume that ‘n’ is the average length of a word. So for Fig. 4b. The Processed speech data is then entered in the
a single word matching the time complexity of the proposed selected view port and then it is blurred using Gaussian Low
algorithm is almost or roughly linear to the size of the pass filtering and the histogram is plotted, as shown in Fig. 4c
vocabulary, O(n log n). But when three words are present in a and Fig. 4d respectively. Again the original image, which is
particular sentence, then the time complexity of the proposed now acting as a foreground image, is then superimposed over
algorithm is O(n2). In a worst case, the time complexity for the blurred image, as shown in Fig. 4e and its histogram is
the proposed algorithm expected to be F * O(n log n), where shown in Fig. 4f., and next the mirroring has been done
‘F’ is the size of the complete phrase or sentence. during recognition phase at the receiver’s end after employing
Gaussian High pass filtering for de-blurring the image
III. RESULTS AND DISCUSSIONS received.
The Algorithm has been tested over a standalone machine,
and the result is remarkable. The threshold value of
recognizing the speech is kept as 5% tolerance and this is
verified from the trained data stored in the master image –
voiced frame file. Some of the vocabularies extracted are
shown in Table I and Table II.
TABLE I
SPOKEN WORDS ARE /TUMI/ /KE/
Important Lower Bound Actual Bound Upper Bound
parameters
Amplitude -0.063701 0.0375432 0.101245
Pitch -41.6458 -8.67741 30.2124 Fig. 4a. Original Image used as Background.
RXX 0.0 1.08268e+010 1.0
PSD 0.0 3.7574e+009 1.0
Evector 0.0 4.33129e-006 1.0
Evalue 0.0 7.14436e-007 1.0
PDF 0.0 2.54233e+012 1.0
AVQC 0.0 0.0470479 1.0
WLC 0.0 0.0987955 1.0
Where the symbols are illustrated as -:
RXX Auto-correlation coefficient;
PSD Power Spectral density
Evector Eigen Vector
Evalue Eigen Value Fig. 4b. Histogram of Fig. 4a.
PDF Probability Density Function
AVQC Adaptive Vector Quantization coefficient
WLC Wavelet Coefficient

Fig. 4c. After performing Gaussian Low pass filtering.

Authorized licensed use limited to: IEEE Xplore. Downloaded on February 15, 2009 at 21:42 from IEEE Xplore. Restrictions apply.
TABLE IV
PERFORMANCE MEASURES (PRECISION)
Size of Test Li’s Precision Our Method’s Precision
Data (in KB)
1 0.802 0.825
2 0.799 0.815
3 0.793 0.796
4 0.786 0.786
5 0.784 0.789
8 0.791 0.799
Fig. 4d. Histogram of Fig. 4c.

Performance Measures (Precision)

0.84

Precision Values
0.82 Li's precision

0.8 Our Method's


0.78 Precision
Fig. 4e. Another image similar to background image is superimposed by
0.76
another image as foreground image like sandwich of images.
1 2 3 4 5 8
Size of the Test data (KB)

Fig. 5b. Graphical Representation of Precision Measures.

Then at the receiver’s end, the speech will be processed for


the recognition and the result will be obtained with the actual
speech spoken at the sender’s end in text form, here in
Bengali font, as shown in Fig. 6.
Fig. 4f. Histogram of Fig. 4e.

The performance measures for Accuracy and Precision are


compared with the method adopted by Constantine Orasan et
al [15] and Hang Li et al [16], with the data shown in Table
III and Table IV and the corresponding graphical
representations are shown in Fig. 5a. and Fig. 5b. respectively.

TABLE III
PERFORMANCE MEASURES (ACCURACY)
Size of Test Li’s Accuracy Our Method’s Accuracy
Data (in KB)
1 0.88 0.98
2 0.91 0.95
3 0.95 0.97
4 0.93 0.95 Fig. 6. The desired output waveforms and the Bengali Font /tumi/ /ke/
5 0.91 0.95
8 0.95 0.96 Here the Bengali words are used and the equivalent English
meanings are given below -:
Performance Measures (Accuracy)
In Bengal, /tumi/ /ke/ ?
In English, “Who are you?”
Accuracy Values

1 Li's Accuracy In Bengali, /taaka/ /joma/ /kore/ /dao/


0.95 In English, “Deposit the Amount.”
0.9
Our Method's In Bengali, /aami/ /khabar/ /khabo/
0.85
Accuracy In English, “I will take food.”
0.8
1 2 3 4 5 8
Size of the Test data (KB)

Fig. 5a. Graphical Representation of Accuracy Measures.

Authorized licensed use limited to: IEEE Xplore. Downloaded on February 15, 2009 at 21:42 from IEEE Xplore. Restrictions apply.
IV. CONCLUDING REMARKS AND FURTHER WORK [16] Hang Li and Naoki Abe, ’Word Clustering and Disambiguation Based
on Co-occurrence data,’ Proceedings of the 18th International
An attempt has been made successfully to verify the Conference on Computational Linguistics and the 36th Annual Meeting
technology called Steganography, using Bengali words and of Association for Computational Linguistics, 749-755, 1998.
phrases with the Bengali font displayed in the result, as shown
in Fig. 6. The result has been tested in a standalone machine. VI. BIOGRAPHIES
However, this approach is limited to Bengali words
commonly used in daily conversation. The work may be Tilendra Shishir Sinha was born in July 1968 in
Dhanbad district, Jharkhand, India. He is graduated
further carried out for more complex Bengali words and from Nagpur University, Nagpur, Maharashtra
phrases for more secured data transmission promoting global State, India, in Computer Technology Engineering
cyber security. in the year 1992 and later did his post graduation
The Accuracy and the Precision measures for the performance from National Institute of Technology,
Jamshedpur, Jharkhand, India. Now he is pursuing
of the system is computed and compared, and the result of the his Ph.D in Computer Science and Engineering
proposed algorithm is satisfactory. from NIT, Durgapur, West Bengal, India.

V. REFERENCES
Gautam Sanyal was born in August 1954 in
Periodicals: Burdwan district, West Bengal, India. He has
[1] David Clark, “Encryption Advances to Meet Internet Challenges,” IEEE received his B.E and M.Tech degree from REC,
Transactions on Computer Communication, pp. 20-24, August, 2004. Durgapur, now, National Institute of Technology,
[2] 2nd Lt. James Caldwell, “U. S. Air Force, Steganography, Software Durgapur. He has received his Ph.D (Engg.) from
Technology Support Center,” June 2003. [Online] Available: Jadavpur University, Kolkata, West Bengal, India,
http://www.stsc.hill.af.mil/crostalk/2003/06/caldwell.html, 12th in the area of Robot Vision. He is presently
November, 2005. working as an HOD (CSE & MCA) at NIT,
[3] Clair, Bryan, “Steganography: How to Send a Secret Message”, 8th Durgapur. His area of interest includes Soft
October, 2001. [Online] Available : Computing, Networking and NLP.
http://www.strangehorizons.com/2001/20011008/steganography.shtml,
12th November, 2005.
[4] Hongwei Sun, Kwok-Yan Lam, Siu-Leung Chung, Weiming Dong,
Ming Gu, Jiaguang Sun,’ Efficient Vector Quantization using Genetic
Algorithm, ‘ Neural – Computing and Applications (2005) 14 : 203 –
211, DOI 10.1007/s00521-004-0413-4, Springer – Verlag London
Limited.
[5] Fernando Bacao, Victor Lobo, Marco Painbo, ‘Applying Genetic
Algorithms to Zone design, ‘ Soft Computing(2005) 9 :341 – 348, DOI
10.1007/s00500-004-0413-4, Springer – Verlag London Limited.
[6] Tilendra Shishir Sinha, Gautam Sayal and Abhijit Mukherji,’Some
Aspects of Modelling and Simulation for the Recognition of Speech and
Speaker of Bengali Language using proposed GSAMTSS Algorithm,’
International Journal of Systemics, Cybernetics and Informatics, April,
2006, pp. no. 69 – 75.
[7] Neil F. Johnson, ‘History and Steganography, ‘ Johnson and Johnson
Consultants, LLC, 1995 – 2003, [Online] Available :
http://www.jjtc.com/stegdoc/sec202.html, 12th November, 2005.
[8] Yasser Ghanbari and Mohammad Reza Karami,’Spectral Subtraction in
the Wavelet Domain for Speech Enhancement’,International Journal of
Softwares and Information Technologies, Vol 1, No.1, August, 2004, pp.
no. 26 – 29.
[9] Yu Lei, Deren Chen, Zhongding Jiang, ’Peer Generating Digital
Signatures on Mobile Devices,’ IEEE Transactions on Mobile
Computing, pp. 532, March, 2004.
[10] Scott Cambell,’Supporting Digital Signatures in Mobile Environments,’
IEEE Transactions on Mobile Computing, pp. 238, June, 2003.

Books:
[11] David E. Goldberg, ’Genetic Algorithms in search Optimization &
Machine Learning,’ Pearson Education Asia, ISBN 81-7808-130-X, pp.
185-213.
[12] Douglas O’ Shaughnessy, ’Speech Communication Human and
Machine,’ Universities Press, Second Edition, ISBN 81-7371-374-X.
[13] J.-S. R. Jang, C.-T. Sun, E. Mizutani, ‘Neuro-Fuzzy and Soft
Computing, - A Computational approach to Learning and Machine
Intelligence,’ First Edition, PHI, ISBN 81-297-0324-6.
[14] Mohammad H. Hassoun, ‘Fundamentals of Artificial Neural Networks,’
Eastern Economy Edition, PHI, ISBN 81-203-1356-9.

Papers from Conference Proceedings (Published):


[15] Constantine Orasan and Richard Evans, 2001, ‘Learning to identify
animate references,’ In Proceedings of Learning (CoNLL-2001), pages
129 – 136, Toulouse, France, July 6 –7, 2001.

Authorized licensed use limited to: IEEE Xplore. Downloaded on February 15, 2009 at 21:42 from IEEE Xplore. Restrictions apply.

You might also like