
Knowledge-Based Systems 105 (2016) 248–269

Contents lists available at ScienceDirect

Knowledge-Based Systems
journal homepage: www.elsevier.com/locate/knosys

ZCR-aided neurocomputing: A study with applications


Rodrigo Capobianco Guido
Instituto de Biociências, Letras e Ciências Exatas, Unesp - Univ Estadual Paulista (São Paulo State University), Rua Cristóvão Colombo 2265, Jd Nazareth,
15054-000, São José do Rio Preto - SP, Brazil

Article info

Article history: Received 18 March 2016; Revised 1 May 2016; Accepted 7 May 2016; Available online 10 May 2016

Keywords: Zero-crossing rates (ZCRs); Pattern recognition and knowledge-based systems (PRKbS); Feature extraction (FE); Speech segmentation; Image border extraction; Biomedical signal analysis

Abstract

This paper covers a particular area of interest in pattern recognition and knowledge-based systems (PRKbS), being intended for both young researchers and academic professionals who are looking for a polished and refined material. Its aim, playing the role of a tutorial that introduces three feature extraction (FE) approaches based on zero-crossing rates (ZCRs), is to offer cutting-edge algorithms in which clarity and creativity are predominant. The theory, smoothly shown and accompanied by numerical examples, innovatively characterises ZCRs as being neurocomputing agents. Source codes in the C/C++ programming language and interesting applications on speech segmentation, image border extraction and biomedical signal analysis complement the text.

© 2016 Elsevier B.V. All rights reserved.

1. Introduction

1.1. Objective and tutorial structure

In a previous work, I published a tutorial on signal energy and its applications [1], introducing alternative and innovative digital signal processing (DSP) algorithms designed for feature extraction (FE) [2–4] in pattern recognition and knowledge-based systems (PRKbS) [5,6]. At that time, I intended to address the lack of novelty in related approaches, based on consistency among creativity, simplicity and accuracy. The same holds at present, an opportunity in which three methods for FE from unidimensional (1D) and bidimensional (2D) data are defined, explained and exemplified, pursuing and taking advantage of my own three previous formulations [1]. The differences between that work and this one are related to the concepts, and their corresponding physical meanings, adopted to substantiate them: previously, signal energy was used to provide information on workload; here, on the other hand, zero-crossing rates (ZCRs) are handled to retrieve the spectral behaviour [7] of signals. Complementarily, ZCRs are interpreted as being neurocomputing agents, which characterises an innovation that this work offers to the scientific community. Another remarkable contribution consists of the use of ZCRs for 2D signal processing and pattern recognition, a concept practically non-existent to date.

As in the previous tutorial, this essay suggests possible future trends for the PRKbS community. In doing so, it is organised as follows. The concept of ZCRs and some recent related work pertaining to them constitute the next subsections of these introductory notes. Then, Section 2 presents the proposed algorithms for FE, their corresponding implementations in the C/C++ programming language [8] and my particular point of view, which characterises ZCRs as being neurocomputing agents. Moving forward, Section 3 shows numerical examples and Section 4 describes the tests and results obtained during the analyses of both 1D and 2D data. Lastly, Section 5 reports the conclusions, which are followed by the references.

Throughout this document, detailed descriptions, graphics, tables and algorithms are abundant; however, for a much better understanding, I strongly encourage you, the reader of this tutorial, to study my previous text [1] before proceeding any further.

1.2. A review on ZCRs and their applications

Although their roots can be traced back to before [9] and throughout [10,11] the beginning of DSP, the suitability of ZCRs has been intensively pointed out by the speech processing community, the one in which their applications are most frequent [12]. Thus, ZCRs, as the simplest existing tools used to extract basic spectral information from time-domain signals without their explicit conversion to the frequency domain [13], play an important role in DSP and PRKbS.

Corresponding author.
E-mail address: guido@ieee.org
URL: http://www.sjrp.unesp.br/guido/
http://dx.doi.org/10.1016/j.knosys.2016.05.011
0950-7051/© 2016 Elsevier B.V. All rights reserved.

Despite the word "rate" in its name, ZCR is defined, in its elementary form, as being the number of times a signal waveform
crosses the amplitude zero. An alternative and formal manner to express this concept, letting s[] = {s_0, s_1, s_2, ..., s_{M−1}} be a discrete-time signal of length M > 1, is

$$\mathrm{ZCR}(s[\,]) = \frac{1}{2}\sum_{j=0}^{M-2}\left|\operatorname{sign}(s_j)-\operatorname{sign}(s_{j+1})\right|, \tag{1}$$

being ZCR(s[]) ≥ 0 for any s[] and

$$\operatorname{sign}(x) = \begin{cases} 1 & \text{if } x \ge 0,\\ -1 & \text{otherwise}. \end{cases}$$

In the next section, distinct normalisation procedures will be applied to ZCRs in order for the word "rate" to make the intended sense.

As an example, let s[], of size M = 7, be the discrete-time signal whose samples are {−2, 3, −5, 4, 2, 3, −5}. Then,

$$\begin{aligned}\mathrm{ZCR}(s[\,]) &= \frac{1}{2}\sum_{j=0}^{M-2}\left|\operatorname{sign}(s_j)-\operatorname{sign}(s_{j+1})\right| = \frac{1}{2}\sum_{j=0}^{5}\left|\operatorname{sign}(s_j)-\operatorname{sign}(s_{j+1})\right| \\ &= \frac{1}{2}\left(|-1-1| + |1-(-1)| + |-1-1| + |1-1| + |1-1| + |1-(-1)|\right) \\ &= \frac{1}{2}\left(|-2| + |2| + |-2| + |0| + |0| + |2|\right) = \frac{1}{2}(2+2+2+0+0+2) = 4,\end{aligned}$$

i.e., the waveform of s[] crosses its amplitude axis four times at the value 0, as can be easily seen in Fig. 1.

Fig. 1. The example signal s[] = {−2, 3, −5, 4, 2, 3, −5} and its four zero-crossings represented as red square dots. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

The elementary example I have just described is really quite simple; however, I ask for your attention in order to figure out the correct physical meaning of ZCRs, avoiding underestimations. For that, a basic input drawn from Fourier's theory and his mathematical series [14] is required: the statement which confirms that any signal waveform distinct from the sinusoidal can be decomposed as an infinite linear combination of sinusoids with multiple frequencies, called harmonics. Thus, a signal waveform that matches exactly a sinusoidal function, with a certain period, phase and amplitude, is classified as being pure. Conversely, any other type of signal waveform consists of a main sinusoid, called the fundamental or first harmonic, owning the lowest frequency among the set, added together with the other sinusoids of higher frequencies, i.e., the second harmonic, the third harmonic, the fourth harmonic, and so on, in a descending order of magnitude.

The connection between ZCRs and Fourier's series is now explained on the basis of the following example, illustrated in Fig. 2. In blue, red and brown, respectively, a pure sine wave, a composition of two sine waves and a square wave, which is essentially the sum of infinitely many sinusoids, are shown, all with the same length. Interestingly, the three curves have exactly the same number of ZCRs; however, according to Fourier's theory, their frequency contents are considerably different. Based on the example, the lesson learnt is: the first harmonics of a non-pure signal are dominant over the others, whilst mandatory to define its general waveform shape. Consequently, it is often the case that the minor oscillations produced by the higher harmonics do not generate zero-crossings. Therefore, the ZCR of a given signal is much more likely to provide information on its fundamental frequency than a detailed description of its complete frequency content.

Fig. 2. In blue, the pure sine wave; in red, the composed sine wave; in brown, the square wave. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Another relevant concept is the direct relationship between the fundamental frequency of a signal and its ZCR. Since sinusoids are periodic in 2π, each period contains two zero-crossings, as shown in Fig. 3. Thus, if a 1D signal s[] of length M crosses the amplitude zero G times, it contains G/2 sinusoidal periods at that frequency. Considering that, at the time the signal was converted from its analog to its digital version [14], the sampling rate was R samples per second, then 1/R is the period of time between consecutive samples, entailing that M · (1/R) = M/R is the time extension of the analog signal in seconds. Concluding, in M/R seconds there are G/2 sinusoidal periods, implying that, proportionally, there are GR/(2M) periods per second, i.e., the frequency, F, caught by the ZCRs is

$$F(\mathrm{ZCR}(s[\,])) = \frac{GR}{2M}\ \text{Hz}. \tag{2}$$

Fig. 3. A sine wave and its zero-crossings.

Obviously, the previous formulation is only valid if the sinusoids are not shifted on the amplitude axis, i.e., no constant value is added to them. Equivalently, the signal under analysis is required to have its arithmetic mean equal to zero, implying that an initial adjustment may be necessary prior to counting the ZCRs; otherwise, they would not be physically meaningful. The simplest process to normalise a signal s[] in order to turn its mean to zero is to shift each one of its samples, subtracting its original mean, i.e.,

$$s_k \leftarrow s_k - \frac{\sum_{j=0}^{M-1} s_j}{M}, \quad (0 \le k \le M-1). \tag{3}$$

In order to illustrate the concepts I have just exposed, the readers are requested to consider the signal s[] = {12/10, 3, 12/10, 3, 12/10, 3, 12/10, 3, 12/10}, of length M = 9, that was sampled at 36 samples per second and is illustrated in Fig. 4. Its arithmetic mean is

$$\frac{\frac{12}{10}+3+\frac{12}{10}+3+\frac{12}{10}+3+\frac{12}{10}+3+\frac{12}{10}}{9} = 2 \neq 0,$$

i.e., the normalisation defined in Eq. (3) must be applied before the ZCRs are counted. Thus, s[] becomes

$$\left\{\tfrac{12}{10}-2,\ 3-2,\ \tfrac{12}{10}-2,\ 3-2,\ \tfrac{12}{10}-2,\ 3-2,\ \tfrac{12}{10}-2,\ 3-2,\ \tfrac{12}{10}-2\right\} = \left\{-\tfrac{8}{10},\ 1,\ -\tfrac{8}{10},\ 1,\ -\tfrac{8}{10},\ 1,\ -\tfrac{8}{10},\ 1,\ -\tfrac{8}{10}\right\},$$

which has its mean equal to zero and is also shown in Fig. 4. ZCRs are now ready to be counted, according to Eq. (1), resulting in G = 8 zero-crossings. Therefore, the signal fundamental frequency is

$$\frac{GR}{2M} = \frac{8 \cdot 36}{2 \cdot 9} = 16\ \text{Hz}.$$

Fig. 4. The example original signal s[] = {12/10, 3, 12/10, 3, 12/10, 3, 12/10, 3, 12/10}, in olive, and its translated version, {−8/10, 1, −8/10, 1, −8/10, 1, −8/10, 1, −8/10}, with zero mean, in red. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

In comparison with the most common features, ZCRs have the following advantages. First, they are extremely simple to compute, with linear time and space complexities [15]. Second, as mentioned above, they are the only features which reveal spectral information on input data without an explicit conversion from the time to the frequency domain. Third, as consolidated in the literature, relevant problems in DSP and PRKbS, such as those involving speech processing [12], can benefit from them. Nonetheless, the disadvantages presented by ZCRs must be considered. Initially, the spectral information they provide on the signal under analysis is not complete, as it is, for instance, with the Discrete Fourier Transform [13] or the Discrete Wavelet Transform [69]. In addition, a joint time-frequency mapping is not possible with them, i.e., frequency localisation can only be performed based on a manually controlled partition of the signal. Lastly, features extracted on their basis may be considerably disturbed if originated from noisy inputs.

Summarising, ZCRs are neither better nor worse than other features from a general point of view. Instead, they present advantages and disadvantages that have to be taken into account for each particular PRKbS problem. In any event, they should be considered every time that modest or incomplete spectral information is found to be useful. In order to complement the review on ZCRs, the remainder of this section is dedicated to describing their recent applications found in the literature.

In article [16], the authors used ZCRs in addition to a least-mean-squares filter for speech enhancement. Their successful strategy consists of using ZCRs at the final stage of their algorithm in order to identify patterns, providing the desired improvements. The signal-to-noise ratio (SNR) increased by about 22 dB in the signals analysed, confirming the important role of ZCR-based PRKbS. Speech enhancement based on ZCRs is also the focus of paper [17], which points out that zero-crossing information is more accurate than cross-correlation for the proposed task.

In association with short-time energy, the authors of article [18] applied ZCRs to distinguish speech from music. In conjunction with an ordinary k-means algorithm, ZCRs allowed the identification of quasi-periodic patterns that are key for their task, attaining 96.10% accuracy with 110 music clips and 140 speech files. In turn, the authors of article [19] showed a ZCR-based estimator that works better than the sample autocorrelation method to analyse stationary Gaussian processes. Their work was successfully developed on the basis of a mathematical analysis of random noise. In [20], researchers proposed a method to estimate the frequency of a harmonic trend based on ZCRs. In addition to a low computation time, their results demonstrate the possible use of such features in practical applications.

An interesting aspect of ZCRs was shown by the authors of article [21], who analysed transient signals to demonstrate that these can be accurately found based on zero-crossings. Applications related to the estimation of epochs in speech signals were performed, confirming the authors' assumptions. In conjunction with other features, such as energy, the authors of paper [22] present different methods to distinguish voiced from unvoiced segments in speech signals. The sizes of the speech segments were determined empirically for better accuracy during their successful analyses. Similar experiments were also performed by the authors of paper [23], confirming the findings. In [24], a practical and noise-robust speech recognition system based on ZCRs was developed; the authors reported improvements of about 18.8% over baseline approaches. Humanoid robots also benefit from ZCRs, for speech recognition and segregation purposes, according to the experiments described in [25].

In order to successfully predict epileptic seizures in scalp electroencephalogram signals, the authors of paper [26] modelled the intervals of occurrence of ZCRs with a Gaussian mixture, obtaining relevant results. Interestingly, the authors of paper [27] used a modified ZCR to determine fractal dimensions of biomedical signals. Similarly, in paper [28], the authors evaluate a modified ZCR approach for the detection of heart arrhythmias. A prominent application of ZCRs can be found in paper [29], in which the authors developed a brain-computer interface on their basis. A health monitoring scheme based on ZCRs characterises the work described in [30], in which interesting aspects of such features are pointed out.

Not surprisingly, a wide search on Web of Science and other scientific databases, aiming to find research articles describing applications of ZCRs to image processing and computer vision, i.e., 2D signals, returned a modest number of results: two conference papers, one recent [31] and the other published twenty years ago [32], and one journal paper published almost thirty years ago [33]. Possibly, this is due to the fact that digital images usually have their pixels represented as non-negative integer numbers, inhibiting the occurrence of zero-crossings. In this study, in addition to the novel ZCR-based algorithms designed for 1D signals, 2D ones are also considered, just after a proper pre-processing strategy discussed herein.

2. The proposed methods

Three different methods, i.e., B1, B2 and B3, respectively inspired by A1, A2 and A3 introduced in [1], are proposed in this section. Their corresponding details follow.

2.1. Method B1

B1, illustrated in Fig. 5, is the simplest method I present in this study, in which a 1D discrete-time signal s[] of length M is considered as being the input. The procedure consists of a sliding rectangular window, w, of length L traversing the signal so that, for each placement, the ZCR over that position is determined. Each subsequent placement overlaps the previous one by V%, and the surplus samples at the end of the signal, too few to be covered by a further L-sample window, are discarded. The restrictions (2 ≤ L ≤ M) and (0 ≤ V < 100) are mandatory.

Fig. 5. 1D example for B1 aiming DSP and PRKbS with its variants TA, PA and EA: sliding window with length L = 8 traversing s[] with overlap V = 50%. The symbols $w_k$ represent the kth positioning of the window, for k = 0, 1, 2, ..., T − 1, and $w_h$ is the window that contains the highest number of ZCRs.

In a DSP context, the ZCRs computed over the fragments of s[] may be directly used to determine the fundamental frequencies it contains, based on Eq. (2). On the other hand, in case PRKbS associated with handcrafted FE is the objective, as explained in [1]-pp.2, s[] requires its conversion to a feature vector, f[], of length

$$T = \left\lfloor \frac{100M - LV}{(100-V)L} \right\rfloor,$$

being ⌊ ⌋ the floor operator. In this case, each $f_k$, $(0 \le k \le T-1)$, corresponds to the ZCR computed over the kth position of the window w. Of fundamental importance is the fact that, for handcrafted FE, f[] requires normalisation prior to its use as an input for a classifier, as documented in [1]-pp.2.

There are, basically, three possible ways to normalise f[]: in relation to the total amount (TA) of zero-crossings, in relation to the maximum possible amount (PA) of zero-crossings and in relation to the maximum existing amount (EA) of zero-crossings. Each of the normalisations characterises a particular physical meaning for the ZCRs contained in f[], being adequate for a specific task in PRKbS. Comments on each of them follow; nonetheless, all the normalisations force f[] to express a rate, bringing the proper sense to the letter R used in the abbreviation ZCR.

The division of each individual ZCR in f[] by the sum of all ZCRs it contains, i.e.,

$$f_r \leftarrow \frac{f_r}{\sum_{k=0}^{T-1} f_k}, \quad (0 \le r \le T-1),$$

characterises TA. Once this procedure is adopted, $\sum_{k=0}^{T-1} f_k = 1$. Physically, TA forces f[] to express the fraction of ZCRs in each segment of s[], being ideal to describe the way the fundamental frequencies of an input signal, s[], vary in relation to its overall behaviour.

To force f[] to express individual spectral properties related to each fragment of s[], in isolation, PA is required. The corresponding normalisation consists of dividing each ZCR in f[] by L − 1, i.e., the highest possible number of ZCRs inside a window of length L:

$$f_r \leftarrow \frac{f_r}{L-1}, \quad (0 \le r \le T-1).$$

With this procedure, $f_k \le 1$, for $(0 \le k \le T-1)$. The closer a certain $f_k$ is to 0 or to 1, respectively, the lower or higher the fundamental frequency at the corresponding window is, disregarding the remaining fragments of s[].

Lastly, EA is chosen whenever evaluation by comparison is needed, particularly forcing the highest ZCR in f[] to be 1 and adjusting the remaining ones, proportionally, within the range [0, 1]. The corresponding procedure consists of dividing each individual ZCR in f[] by the highest unnormalised ZCR contained in it, the one computed over the window placement named $w_h$, i.e.,

$$f_r \leftarrow \frac{f_r}{\mathrm{ZCR}(w_h)}, \quad (0 \le r \le T-1).$$

All the previous formulations and concepts can be easily extended to a 2D signal, m[][], with N rows and M columns, which represent, respectively, the height and width of the corresponding image. As in the unidimensional case, the computation of a bidimensional ZCR requires all the values in m[][] to be previously shifted so that its arithmetic mean becomes equal to zero, i.e.,

$$m_{p,q} \leftarrow m_{p,q} - \frac{\sum_{i=0}^{N-1}\sum_{j=0}^{M-1} m_{i,j}}{MN}, \quad (0 \le p \le N-1),\ (0 \le q \le M-1). \tag{4}$$

Once m[][] presents zero mean, its ZCR is simply the sum of the individual ZCRs in each row and column, i.e.,

$$\mathrm{ZCR}(m[\,][\,]) = \frac{1}{2}\sum_{i=0}^{N-1}\sum_{j=0}^{M-2}\left|\operatorname{sign}(m_{i,j})-\operatorname{sign}(m_{i,j+1})\right| + \frac{1}{2}\sum_{j=0}^{M-1}\sum_{i=0}^{N-2}\left|\operatorname{sign}(m_{i,j})-\operatorname{sign}(m_{i+1,j})\right|. \tag{5}$$

Algorithm 1: fragment of C++ code for method B1 in 1D, adopting the normalisation TA.

    //...
    // ensure that s[], of length M, is available as input
    double mean = 0;
    for(int k = 0; k < M; k++)
        mean += s[k]/(double)(M);
    for(int k = 0; k < M; k++)
        s[k] -= mean; // at this point, the arithmetic mean of the input signal is 0
    int L = /* the desired positive value, not higher than M */;
    int V = /* the desired non-negative value, lower than 100 */;
    int T = (int)((100*M - L*V)/((100-V)*L));
    int hop = (int)(((100-V)/100.0)*L); // distance, in samples, between consecutive window placements
    int ZCR = 0; // ZCR is the total number of zero-crossings over all the window placements, required for normalisation
    double *f = new double[T]; // dynamic vector declaration
    for(int k = 0; k < T; k++)
    {
        f[k] = 0;
        for(int i = k*hop; i < k*hop + L - 1; i++)
            f[k] += (s[i]*s[i+1] < 0)?1:0; /* multiplying subsequent samples results in a negative value if they lie on opposite sides of 0. This is equivalent to the theoretical procedure described in the text and based on Eq. (1), provided that no sample is exactly 0 */
        ZCR += (int)f[k];
    }
    if(ZCR > 0) // guard: a signal without zero-crossings would otherwise cause a division by zero
        for(int k = 0; k < T; k++) // normalisation
            f[k] /= (double)(ZCR); /* the casting, i.e., the explicit conversion of ZCR from int to double, is, theoretically, not required; however, some C/C++ compilers have presented problems when a double-precision variable is divided by an int one, resulting in 0. To avoid this issue, the casting is used. */
    // at this point, the feature vector, f[], is ready
    //...
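For readers who want to check Eqs. (1) to (5) numerically, the following standalone C++ sketch (not one of the paper's listings) implements them with std::vector. It assumes the sign convention stated after Eq. (1), so a sample equal to 0 counts as positive, whereas Algorithm 1 multiplies neighbouring samples and therefore ignores exact zeros. The function names zcr1d, removeMean, zcrFrequency and zcr2d are hypothetical, introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sign convention of Eq. (1): sign(x) = 1 if x >= 0 and -1 otherwise.
inline int zcrSign(double x) { return x >= 0.0 ? 1 : -1; }

// Eq. (1): each term |sign(s_j) - sign(s_{j+1})| is either 0 or 2, so half
// of the sum equals the number of sign changes between neighbouring samples.
int zcr1d(const std::vector<double>& s) {
    int count = 0;
    for (std::size_t j = 0; j + 1 < s.size(); ++j)
        if (zcrSign(s[j]) != zcrSign(s[j + 1])) ++count;
    return count;
}

// Eq. (3): subtract the arithmetic mean so that the signal has zero mean.
void removeMean(std::vector<double>& s) {
    assert(!s.empty());
    double sum = 0.0;
    for (double v : s) sum += v;
    const double mean = sum / static_cast<double>(s.size());
    for (double& v : s) v -= mean;
}

// Eq. (2): F = G R / (2 M) Hz, for G crossings, R samples per second and
// M samples; only meaningful after the mean has been removed.
double zcrFrequency(int G, double R, std::size_t M) {
    return G * R / (2.0 * static_cast<double>(M));
}

// Eq. (5): crossings along every row plus crossings along every column
// of a zero-mean matrix m with N rows and M columns.
int zcr2d(const std::vector<std::vector<double>>& m) {
    const std::size_t N = m.size();
    const std::size_t M = N > 0 ? m[0].size() : 0;
    int count = 0;
    for (std::size_t i = 0; i < N; ++i)          // horizontal neighbours
        for (std::size_t j = 0; j + 1 < M; ++j)
            if (zcrSign(m[i][j]) != zcrSign(m[i][j + 1])) ++count;
    for (std::size_t j = 0; j < M; ++j)          // vertical neighbours
        for (std::size_t i = 0; i + 1 < N; ++i)
            if (zcrSign(m[i][j]) != zcrSign(m[i + 1][j])) ++count;
    return count;
}

// Usage, reproducing the worked examples of this section:
//   zcr1d on {-2, 3, -5, 4, 2, 3, -5} returns 4;
//   for b = {1.2, 3, 1.2, 3, 1.2, 3, 1.2, 3, 1.2}, removeMean(b) shifts the
//   mean from 2 to 0 (up to rounding), zcr1d(b) returns 8 and
//   zcrFrequency(8, 36.0, 9) returns 16.0.
```

Counting sign changes directly, instead of evaluating the absolute differences of Eq. (1), removes the factor 1/2 while producing the same integer.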
Similarly to 1D signals, ZCRs computed in 2D are useful for both DSP and FE in PRKbS, as illustrated in Fig. 6. In the latter, the case of interest, the feature vector, f[], contains not only T, but T·P elements, being

$$P = \left\lfloor \frac{100N - LV}{(100-V)L} \right\rfloor,$$

as in the bidimensional case of method A1, explained in [1]-pp.3. During the analysis, m[][] is traversed along the horizontal orientation based on T placements of the square window w of side L, being L < M and L < N. Then, the process is repeated for each one of the P shifts along the vertical orientation.

TA, PA and EA are also the possible normalisations for the 2D version of B1 applied for FE in PRKbS. Particularly, TA requires each component of f[] to be divided by the sum of all ZCRs it contains, i.e., the sum of all the values in that vector prior to any normalisation. On the other hand, if PA is adopted, each element in f[] is divided by the maximum possible number of ZCRs inside w, i.e.,

$$\underbrace{(L-1)}_{\substack{\text{maximum ZCR}\\ \text{in one row}}}\ \underbrace{L}_{\substack{\text{number of}\\ \text{rows}}} + \underbrace{(L-1)}_{\substack{\text{maximum ZCR}\\ \text{in one column}}}\ \underbrace{L}_{\substack{\text{number of}\\ \text{columns}}} = 2L(L-1).$$

Lastly, the choice for EA implies that each component of f[] is divided by the highest ZCR contained in it, the one computed over the window placement named $w_h$.

Algorithm 2: fragment of C++ code for method B1 in 1D, adopting the normalisation PA.

    //...
    // ensure that s[], of length M, is available as input
    double mean = 0;
    for(int k = 0; k < M; k++)
        mean += s[k]/(double)(M);
    for(int k = 0; k < M; k++)
        s[k] -= mean; // at this point, the arithmetic mean of the input signal is 0
    int L = /* the desired positive value, not higher than M */;
    int V = /* the desired non-negative value, lower than 100 */;
    int T = (int)((100*M - L*V)/((100-V)*L));
    int hop = (int)(((100-V)/100.0)*L); // distance, in samples, between consecutive window placements
    double *f = new double[T]; // dynamic vector declaration
    for(int k = 0; k < T; k++)
    {
        f[k] = 0;
        for(int i = k*hop; i < k*hop + L - 1; i++)
            f[k] += (s[i]*s[i+1] < 0)?1:0; /* multiplying subsequent samples results in a negative value if they lie on opposite sides of 0. This is equivalent to the theoretical procedure described in the text and based on Eq. (1), provided that no sample is exactly 0 */
        f[k] /= (double)(L - 1); /* the casting, i.e., the explicit conversion of L - 1 from int to double, is, theoretically, not required; however, some C/C++ compilers have presented problems when a double-precision variable is divided by an int one, resulting in 0. To avoid this issue, the casting is used. */
    }
    // at this point, the feature vector, f[], is ready
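The three 1D listings differ only in their final division, so the following C++ sketch (an illustration, not the author's code) folds TA, PA and EA into one hypothetical helper, normalise, and adds a 2D counterpart. Three assumptions are made explicit: a crossing is counted with the sign convention of Eq. (1) rather than the sample product used in the listings (the two agree when no sample is exactly 0); a vector without crossings is left as all zeros instead of being divided by 0; and, in 2D, window k is assumed to sit at vertical placement k / T and horizontal placement k % T, consistent with the T horizontal placements repeated over P vertical shifts described above. The names Norm, zcrSign, normalise, b1Features1D and b1Features2D are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; TA, PA and EA follow the text.
enum class Norm { TA, PA, EA };  // total, maximum possible, maximum existing amount

inline int zcrSign(double x) { return x >= 0.0 ? 1 : -1; }  // convention of Eq. (1)

// Divides the raw window counts by the TA, PA or EA denominator.
// maxPossible is the PA denominator: L - 1 in 1D and 2L(L - 1) in 2D.
void normalise(std::vector<double>& f, Norm norm, double maxPossible) {
    double denom = (norm == Norm::PA) ? maxPossible : 0.0;
    for (double v : f) {
        if (norm == Norm::TA) denom += v;              // sum of all ZCRs
        if (norm == Norm::EA && v > denom) denom = v;  // ZCR(w_h)
    }
    if (denom > 0.0)  // a vector without crossings is left as all zeros
        for (double& v : f) v /= denom;
}

// Method B1 in 1D: s must already have zero mean (Eq. (3)); window L, overlap V%.
std::vector<double> b1Features1D(const std::vector<double>& s, int L, int V, Norm norm) {
    const int M = static_cast<int>(s.size());
    assert(2 <= L && L <= M && 0 <= V && V < 100);
    const int T = (100 * M - L * V) / ((100 - V) * L);          // floor, as in the text
    const int hop = static_cast<int>(((100 - V) / 100.0) * L);  // as in Algorithms 1-3
    assert(hop > 0);                                            // windows must advance
    std::vector<double> f(static_cast<std::size_t>(T), 0.0);
    for (int k = 0; k < T; ++k)
        for (int i = k * hop; i < k * hop + L - 1; ++i)
            if (zcrSign(s[i]) != zcrSign(s[i + 1])) f[k] += 1.0;
    normalise(f, norm, L - 1.0);
    return f;
}

// Method B1 in 2D: m (N rows, M columns) must already have zero mean (Eq. (4)).
// Assumed ordering: window k sits at vertical placement k / T, horizontal placement k % T.
std::vector<double> b1Features2D(const std::vector<std::vector<double>>& m, int L, int V, Norm norm) {
    const int N = static_cast<int>(m.size());
    const int M = N > 0 ? static_cast<int>(m[0].size()) : 0;
    assert(2 <= L && L <= M && L <= N && 0 <= V && V < 100);
    const int T = (100 * M - L * V) / ((100 - V) * L);
    const int P = (100 * N - L * V) / ((100 - V) * L);
    const int hop = static_cast<int>(((100 - V) / 100.0) * L);
    assert(hop > 0);
    std::vector<double> f(static_cast<std::size_t>(T) * P, 0.0);
    for (int k = 0; k < T * P; ++k) {
        const int r0 = (k / T) * hop, c0 = (k % T) * hop;  // top-left corner of window k
        for (int i = r0; i < r0 + L; ++i)
            for (int j = c0; j < c0 + L; ++j) {
                if (j + 1 < c0 + L && zcrSign(m[i][j]) != zcrSign(m[i][j + 1])) f[k] += 1.0;
                if (i + 1 < r0 + L && zcrSign(m[i][j]) != zcrSign(m[i + 1][j])) f[k] += 1.0;
            }
    }
    normalise(f, norm, 2.0 * L * (L - 1));
    return f;
}
```

For instance, the signal {−2, 3, −5, 4, 2, 3, −5} already has zero mean; with L = 4 and V = 50 it yields T = 2 windows holding 3 and 1 crossings, hence {0.75, 0.25} under TA and {1, 1/3} under both PA and EA.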
Exactly as in A1 [1], for both 1D and 2D signals, respectively, B1 is only capable of generating a T-, or a T·P-, sample-long vector f[] if the value of L is subjected to the value of M, or of M and N. Thus, the value of L intrinsically depends on the length of the input 1D signal s[], or on the dimensions of the input 2D matrix m[][], bringing a disadvantage: irregular temporal or spatial analysis. Conversely, the advantage is that a few sequential elements of f[], obtained by predefining L, T and P, allow the detection of some particular event in the 1D or 2D signal under analysis.

The algorithms 1, 2 and 3, respectively, contain the source code in the C/C++ programming language that implements method B1 with the normalisations TA, PA and EA, all of them for 1D input signals. Correspondingly, algorithms 4, 5 and 6 are, respectively, the 2D versions of B1 with the same normalisations.

Fig. 6. 2D example for B1 aiming DSP and PRKbS with its variants TA, PA and EA: sliding square with length L = 3 traversing m[][] with overlap V = 66.67%. Again, $w_k$ indicates the kth position of the window, for k = 0, 1, 2, ..., (T·P) − 1. Dashed squares in the arbitrary positions 0, 1, 15, 16, 41 and 63 are shown.

2.2. Method B2

B2, as A2 in [1], is also based on a sliding window w traversing s[], or m[][]. Two differences, however, exist: there are no overlaps, and the window length for 1D, or the rectangle sizes for 2D, vary. Thus, s[] or m[][] are inspected at different levels of resolution.

After applying Eq. (3) or Eq. (4), respectively, to remove the mean of s[] or of m[][], the feature vector, f[], is defined as the concatenation of Q sub-vectors of different dimensions, i.e., f[] = {sub-vector 1} ∪ {sub-vector 2} ∪ {sub-vector 3} ∪ ... ∪ {sub-vector Q}. For 1D, each sub-vector is created by placing w over T non-overlapping sequential positions of s[] and then calculating the normalised ZCRs using TA, PA or EA, as previously explained in the description of B1, i.e.:

• sub-vector 1 is obtained by letting $L = \lfloor M/2 \rfloor$ and V = 0%, which implies that $T = \left\lfloor \frac{100M - LV}{(100-V)L} \right\rfloor = 2$, to traverse s[] and get the normalised ZCRs;
• idem for sub-vector 2, obtained by letting $L = \lfloor M/3 \rfloor$ and V = 0%, which implies that $T = \left\lfloor \frac{100M - LV}{(100-V)L} \right\rfloor = 3$;
• idem for sub-vector 3, obtained by letting $L = \lfloor M/5 \rfloor$ and V = 0%, which implies that $T = \left\lfloor \frac{100M - LV}{(100-V)L} \right\rfloor = 5$;
• ...
• idem for sub-vector Q, obtained by letting $L = \lfloor M/X \rfloor$ and V = 0%, which implies that $T = \left\lfloor \frac{100M - LV}{(100-V)L} \right\rfloor = X$.

Q is defined on the basis of the desired refinement and, accordingly, the values 2, 3, 5, 7, 11, 13, 17, ..., X are the choices for T, which is essentially restricted to prime numbers in order to prevent one sub-vector from being a linear combination of another, which would bring no gain for classification.

For the 2D case, each sub-vector is created by framing m[][] with T·P non-overlapping rectangles, T = P being prime numbers, to compute the corresponding normalised ZCRs, i.e.:

• sub-vector 1 is created by letting $L = \lfloor M/2 \rfloor$ and V = 0% to obtain $T = \left\lfloor \frac{100M - LV}{(100-V)L} \right\rfloor = 2$ and then by letting $L = \lfloor N/2 \rfloor$ and V = 0% to obtain $P = \left\lfloor \frac{100N - LV}{(100-V)L} \right\rfloor = 2$. Subsequently, m[][] is traversed by T·P = 2·2 = 4 non-overlapping rectangles;
• idem for sub-vector 2, obtained by letting $L = \lfloor M/3 \rfloor$ and V = 0% and then $L = \lfloor N/3 \rfloor$ and V = 0%, which implies that $T = \left\lfloor \frac{100M - LV}{(100-V)L} \right\rfloor = 3$ and $P = \left\lfloor \frac{100N - LV}{(100-V)L} \right\rfloor = 3$, respectively, so that T·P = 3·3 = 9 non-overlapping rectangles traverse m[][];
• idem for sub-vector 3, obtained by letting $L = \lfloor M/5 \rfloor$ and V = 0% and then $L = \lfloor N/5 \rfloor$ and V = 0%, which implies that $T = \left\lfloor \frac{100M - LV}{(100-V)L} \right\rfloor = 5$ and $P = \left\lfloor \frac{100N - LV}{(100-V)L} \right\rfloor = 5$, respectively, so that T·P = 5·5 = 25 non-overlapping rectangles traverse m[][];
• ...
• idem for sub-vector Q, obtained by letting $L = \lfloor M/X \rfloor$ and V = 0% and then $L = \lfloor N/X \rfloor$ and V = 0%, which implies that $T = \left\lfloor \frac{100M - LV}{(100-V)L} \right\rfloor = X$ and $P = \left\lfloor \frac{100N - LV}{(100-V)L} \right\rfloor = X$, respectively, so that T·P = X·X = X² non-overlapping rectangles traverse m[][].

In the 2D version of B2, the normalisations TA and EA are implemented exactly as they were in B1. One particular note regarding the normalisation PA is, however, important. Differently from B1 in 2D, in which L is the same for both the horizontal and vertical orientations, B2 divides the input image into rectangles, i.e., the horizontal and vertical sides are $\lfloor M/X \rfloor$ and $\lfloor N/X \rfloor$, respectively. Thus, each element of f[] is not divided by 2L(L − 1), but by

$$\underbrace{\left(\left\lfloor\frac{M}{X}\right\rfloor-1\right)}_{\substack{\text{maximum ZCR}\\ \text{in one row}}}\underbrace{\left\lfloor\frac{N}{X}\right\rfloor}_{\substack{\text{number of}\\ \text{rows}}} + \underbrace{\left(\left\lfloor\frac{N}{X}\right\rfloor-1\right)}_{\substack{\text{maximum ZCR}\\ \text{in one column}}}\underbrace{\left\lfloor\frac{M}{X}\right\rfloor}_{\substack{\text{number of}\\ \text{columns}}} = 2\left\lfloor\frac{M}{X}\right\rfloor\left\lfloor\frac{N}{X}\right\rfloor - \left\lfloor\frac{M}{X}\right\rfloor - \left\lfloor\frac{N}{X}\right\rfloor,$$
that corresponds to the maximum possible number of zero-crossings in each rectangular sub-image.

Figs. 7 and 8 show the sliding window for 1D and the sliding rectangle for 2D, respectively, for TA, PA and EA. In addition, the algorithms 8, 7 and 9 contain the corresponding 1D implementations. The 2D ones are in the algorithms 10–12.

Algorithm 3: fragment of C++ code for method B1 in 1D, adopting the normalisation EA.

    // ensure that s[], of length M, is available as input
    double mean = 0;
    for(int k = 0; k < M; k++)
        mean += s[k]/(double)(M);
    for(int k = 0; k < M; k++)
        s[k] -= mean; // at this point, the arithmetic mean of the input signal is 0
    int L = /* the desired positive value, not higher than M */;
    int V = /* the desired non-negative value, lower than 100 */;
    int T = (int)((100*M - L*V)/((100-V)*L));
    int hop = (int)(((100-V)/100.0)*L); // distance, in samples, between consecutive window placements
    int highest_ZCR = 0;
    double *f = new double[T]; // dynamic vector declaration
    for(int k = 0; k < T; k++)
    {
        f[k] = 0;
        for(int i = k*hop; i < k*hop + L - 1; i++)
            f[k] += (s[i]*s[i+1] < 0)?1:0; /* multiplying subsequent samples results in a negative value if they lie on opposite sides of 0. This is equivalent to the theoretical procedure described in the text and based on Eq. (1), provided that no sample is exactly 0 */
        if(f[k] > highest_ZCR)
            highest_ZCR = (int)f[k];
    }
    if(highest_ZCR > 0) // guard: a signal without zero-crossings would otherwise cause a division by zero
        for(int k = 0; k < T; k++)
            f[k] /= (double)(highest_ZCR); /* the casting, i.e., the explicit conversion of highest_ZCR from int to double, is, theoretically, not required; however, some C/C++ compilers have presented problems when a double-precision variable is divided by an int one, resulting in 0. To avoid this issue, the casting is used. */
    // at this point, the feature vector, f[], is ready

Algorithm 5: fragment of C++ code for method B1 in 2D, adopting the normalisation PA.

    // ensure that m[][], with height N and width M, is available as input
    double mean = 0;
    for(int p = 0; p < N; p++)
        for(int q = 0; q < M; q++)
            mean += m[p][q]/(double)(M*N);
    for(int p = 0; p < N; p++)
        for(int q = 0; q < M; q++)
            m[p][q] -= mean; // at this point, the arithmetic mean of the input signal is 0
    int L = /* the desired positive value, not higher than the lower between M and N */;
    int V = /* the desired non-negative value, lower than 100 */;
    int T = (int)((100*M - L*V)/((100-V)*L));
    int P = (int)((100*N - L*V)/((100-V)*L));
    int hop = (int)(((100-V)/100.0)*L); // distance between consecutive window placements
    double *f = new double[T*P]; // dynamic vector declaration
    for(int k = 0; k < T*P; k++)
    {
        f[k] = 0;
        int r0 = (k/T)*hop, c0 = (k%T)*hop; // top-left corner: vertical placement k/T, horizontal placement k%T
        for(int i = r0; i < r0 + L; i++)
            for(int j = c0; j < c0 + L - 1; j++)
                f[k] += (m[i][j]*m[i][j+1] < 0)?1:0; /* horizontal neighbours: multiplying subsequent samples results in a negative value if they lie on opposite sides of 0, which is equivalent to Eq. (5) provided that no sample is exactly 0 */
        for(int i = r0; i < r0 + L - 1; i++)
            for(int j = c0; j < c0 + L; j++)
                f[k] += (m[i][j]*m[i+1][j] < 0)?1:0; /* vertical neighbours, as above */
    }
    for(int k = 0; k < T*P; k++)
        f[k] /= (double)(2*L*(L-1)); /* the casting, i.e., the explicit conversion of 2L(L-1) from int to double, is, theoretically, not required; however, some C/C++ compilers have presented problems when a double-precision variable is divided by an int one, resulting in 0. To avoid this issue, the casting is used. */
    // at this point, the feature vector, f[], is ready

Algorithm 4: fragment of C++ code for method B1 in 2D, adopting the normalisation TA.

    // ensure that m[][], with height N and width M, is available as input
    double mean = 0;
    for(int p = 0; p < N; p++)
        for(int q = 0; q < M; q++)
            mean += m[p][q]/(double)(M*N);
    for(int p = 0; p < N; p++)
        for(int q = 0; q < M; q++)
            m[p][q] -= mean; // at this point, the arithmetic mean of the input signal is 0
    int L = /* the desired positive value, not higher than the lower between M and N */;
    int V = /* the desired non-negative value, lower than 100 */;
    int T = (int)((100*M - L*V)/((100-V)*L));
    int P = (int)((100*N - L*V)/((100-V)*L));
    int hop = (int)(((100-V)/100.0)*L); // distance between consecutive window placements
    int ZCR = 0; // ZCR is the total number of zero-crossings over all the window placements, required for normalisation
    double *f = new double[T*P]; // dynamic vector declaration
    for(int k = 0; k < T*P; k++)
    {
        f[k] = 0;
        int r0 = (k/T)*hop, c0 = (k%T)*hop; // top-left corner: vertical placement k/T, horizontal placement k%T
        for(int i = r0; i < r0 + L; i++)
            for(int j = c0; j < c0 + L - 1; j++)

2.3. Method B3

As described above, B1 and B2 focus on measuring the levels of normalised ZCRs over windows or rectangles of certain dimensions. B3, on the other hand, is quite similar to A3 [1] and consists
f [k]+ = (m[i][ j] m[i][ j + 1] < 0)?1:0; / multiplying sub- of determining the proportional lengths, or areas, of the signal un-
sequent samples results in a negative value if they are between 0. This is equivalent der analysis that are required to reach predened percentages of
to the theoretical procedure described in the text and based on equation 1. /
the total ZCR. Normalisations do not apply in this case. The direct
for(int i = k ((int )(((100 V )/100.0 ) L )); i < k ((int )(((100
V )/100.0 ) L )) + L 1; i + +) consequence of this approach is the characterisation of B3 as being
for(int j = k ((int )(((100 V )/100.0 )) L ); j < k ((int )(((100 ideal to inspect the constancy in frequency of the physical entity
V )/100.0 )) L ) + L; j + +) responsible for generating s[], or m[][].
f [k]+ = (m[i][ j] m[i + 1][ j] < 0)?1:0; / multiplying sub-
Specically, C is dened as being the critical base-level of ZCRs,
sequent samples results in a negative value if they are between 0. This is equivalent
to the theoretical procedure described in the text and based on equation 1. / (0 < C < 100), and then, for 1D, the feature vector f[] of size T is
ZCR+ = f [k]; determined as follows:
}
for(int k = 0; k < T P; k + +) f0 is the proportion of the length of s[], i.e., M, starting from its
f [k]/ = (double )(ZCR ); / the casting, i.e., the explicit conversion of ZCR
from int to double is, theoretically, not required, however, some C/C++ compilers have
beginning, which is covered by the window placement w0 [ ],
presented problems when a double-precision variable is divided by an int one, result- required to reach C% of the total ZCR;
ing in 0. To avoid this issue, the casting is used. / fB is the proportion of the length of s[], i.e., M, starting from its
// at this point, the feature vector, f [], is ready beginning, which is covered by the window placement w1 [ ],
required to reach 2 C% of the total ZCR;
• f_2 is the proportion of the length of s[], i.e., M, starting from its beginning, which is covered by the window placement w_2[], required to reach 3C% of the total ZCR;
• ...
• f_{T−1} is the proportion of the length of s[], i.e., M, starting from its beginning, which is covered by the window placement w_{T−1}[], required to reach (T·C)% of the total ZCR, so that (T·C) < 100%.

For B3, the value of T is defined as being:

$$T = \begin{cases} \dfrac{100}{C} - 1 & \text{if 100 is a multiple of } C,\\[4pt] \left\lfloor \dfrac{100}{C} \right\rfloor & \text{otherwise.} \end{cases}$$

The 2D version of B3 implies that f[], with the same size T, is determined as follows:

• f_0 is the proportion of the m[][] area, i.e., M·N, starting from m_{0,0} and covered by the ⌊α_0⌋ × ⌊β_0⌋ rectangle w_0[][], required to reach C% of the total ZCR;
• f_1 is the proportion of the m[][] area, i.e., M·N, starting from m_{0,0} and covered by the ⌊α_1⌋ × ⌊β_1⌋ rectangle w_1[][], required to reach 2C% of the total ZCR;
• f_2 is the proportion of the m[][] area, i.e., M·N, starting from m_{0,0} and covered by the ⌊α_2⌋ × ⌊β_2⌋ rectangle w_2[][], required to reach 3C% of the total ZCR;
• ...
• f_{T−1} is the proportion of the m[][] area, i.e., M·N, starting from m_{0,0} and covered by the ⌊α_{T−1}⌋ × ⌊β_{T−1}⌋ rectangle w_{T−1}[][], required to reach (T·C)% of the total ZCR, so that (T·C) < 100%.

The values of α_i and β_i, (0 ≤ i ≤ T − 1), are determined according to the following rule, the exact same one used for A3 in 2D¹ [1]:

Beginning: (α_i ← 0) and (β_i ← 0), unconditionally.
repeat
    (α_i ← α_i + 1) and (β_i ← β_i + 1) if (N = M);
    (α_i ← α_i + 1) and (β_i ← β_i + M/N) if (N > M);
    (α_i ← α_i + N/M) and (β_i ← β_i + 1) otherwise;
until the desired ZCR level, i.e., C%, 2C%, 3C%, ..., (T·C)% of the total ZCR, is reached.
End.

Figs. 9 and 10, and Algorithms 13 and 14², complement my explanations regarding B3, for 1D and 2D, respectively.

¹ I will take this opportunity to correct an error in my previous published tutorial [1, p. 270] regarding the description of A3 in 2D: the way α_i and β_i vary is in accordance with a relationship between N and M, as shown above, instead of α_i and β_i themselves, as originally documented in that paper. A corrigendum with details is available online at http://dx.doi.org/10.1016/j.neucom.2016.04.001.
² The algorithm for A3 in 2D, originally described in [1, p. 273], also requires the same corrections mentioned in the previous footnote, as described in the corrigendum.

Fig. 7. 1D example for B2 assuming Q = 3: (a) sliding window with length L = ⌊M/2⌋ = ⌊20/2⌋ = 10 traversing s[] in order to compose φ_1[]; (b) sliding window with length L = ⌊M/3⌋ = ⌊20/3⌋ = 6 traversing s[] in order to compose φ_2[]; (c) sliding window with length L = ⌊M/5⌋ = ⌊20/5⌋ = 4 traversing s[] in order to compose φ_3[]. The window positions do not overlap and the symbols w_i indicate the ith window position, for i = 0, 1, 2, ..., T − 1.

Fig. 8. 2D example for B2 assuming Q = 3 subvectors: [above] sliding rectangle with dimensions {⌊M/2⌋ × ⌊N/2⌋} = {⌊10/2⌋ × ⌊20/2⌋} = 5 × 10 traversing m[][] in order to compose φ_1[]; [middle] sliding rectangle with dimensions {⌊M/3⌋ × ⌊N/3⌋} = {⌊10/3⌋ × ⌊20/3⌋} = 3 × 6 traversing m[][] in order to compose φ_2[]; [below] sliding rectangle with dimensions {⌊M/5⌋ × ⌊N/5⌋} = {⌊10/5⌋ × ⌊20/5⌋} = 2 × 4 traversing m[][] in order to compose φ_3[]. Again, w_i indicates the ith window position, for i = 0, 1, 2, ..., (T·P) − 1, with no overlap. Dashed rectangles represent the sliding window in all possible positions.

Fig. 9. 1D example for B3, where L_{w_i} represents the length of the window w_i[], for i = 0, 1, 2, 3, ..., T − 1.

Fig. 10. 2D example for B3, where w_i[][] corresponds to the ith window, and α_i and β_i represent, respectively, its height and width, for i = 0, 1, 2, 3, ..., T − 1.

2.4. ZCRs are neurocomputing agents

In this subsection, I explain an interesting point of view. In addition to Eq. 1, ZCRs may also be counted based on a different strategy, which is the one I use in my algorithms: two adjacent samples of a discrete-time signal, let's say s_i and s_{i+1}, cross zero whenever their product is negative. Thus,

$$\frac{s_i\, s_{i+1}}{|s_i\, s_{i+1}|} = \begin{cases} -1 & \text{if there is a zero-crossing between } s_i \text{ and } s_{i+1},\\ 1 & \text{otherwise,} \end{cases}$$

the denominator being used for normalisation. Purposely, I am inverting the polarities hereafter so that $-\frac{s_i s_{i+1}}{|s_i s_{i+1}|}$ becomes either 1 or −1, respectively, in response to the presence or absence of a zero-crossing. Furthermore, despite the fact that |s_i s_{i+1}| is the simplest existing normalisation, I am going to replace it by a more convenient formulation to reach my objective: the sigmoid function parametrised with a slope λ ≫ 0, as shown in Fig. 11. We therefore have

$$\frac{1}{1 + e^{\lambda\,(s_i s_{i+1})}} \approx \begin{cases} 1 & \text{if there is a zero-crossing between } s_i \text{ and } s_{i+1},\\ 0 & \text{otherwise.} \end{cases}$$

In order to traverse a window of length L and count its ZCRs, the summation $\sum_{i=0}^{L-2} \frac{1}{1+e^{\lambda (s_i s_{i+1})}}$ is adopted. The readers may have learnt that, for FE, we are interested in the normalised number of ZCRs instead of its raw amount. Thus,

$$\mathrm{ZCR}(s[\,]) = \frac{1}{\gamma} \sum_{i=0}^{L-2} \frac{1}{1+e^{\lambda (s_i s_{i+1})}} = \sum_{i=0}^{L-2} \frac{1}{\gamma}\,\frac{1}{1+e^{\lambda (s_i s_{i+1})}}$$

is the simplest possibility to obtain a bounded outcome within the range from 0 to 1. Particularly, TA, PA and EA can be addressed as a function of γ, respectively, by letting it be equal to $\sum_{k=0}^{T-1} \mathrm{ZCR}(w_k[\,])$, L − 1 and ZCR(w_h[]), as I defined previously.

The structure I propose corresponds to the original multilayer perceptron defined by Frank Rosenblatt [34], as shown in Fig. 12, with some peculiarities. Its ith input, ith weight between the input and the hidden layers, and ith weight between the hidden and the output layers are, respectively, s_i, s_{i+1} and 1/γ. Moreover, the ith neuron of the input layer connects forward only with the ith neuron of the hidden one. Another possible interpretation for the proposed structure is that of a weightless neural network, also known as random access memory (RAM) network [35–37], in the sense that its weights, although present, are pre-defined, implying that there is no learning procedure.

In conclusion, when we count the normalised ZCRs of a certain signal, we are, in a sense, neurocomputing it, on the basis of neurons which were born with pre-established knowledge. Characterising ZCRs as specific neurocomputing agents makes their potential more attractive; thus, my expectation is that the interdisciplinary community interested in PRKbS, FE, computational intelligence, artificial neural networks, DSP and related fields will frequently take advantage of the methods I present.

Fig. 11. The sigmoid function, $y = \frac{1}{1+e^{-\lambda x}}$, exemplified for different values of λ: 2, 5 and 1000, respectively drawn in green, blue and brown. The proposed strategy requires λ ≫ 0, aiming at a response such as the one drawn in brown. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 12. The proposed structure, with pre-defined weights.

3. Numerical examples

In order to shed some light on the proposed approaches, one numerical example follows for each case: methods B1, B2 and B3,

both in 1D and 2D, assuming the normalisations previously described and based on hypothetical data.

Algorithm 6: fragment of C++ code for method B1 in 2D, adopting the normalisation EA.
// ensure that m[][], with height N and width M, is available as input
double mean = 0;
for(int p = 0; p < N; p++)
    for(int q = 0; q < M; q++)
        mean += m[p][q]/(double)(M*N);
for(int p = 0; p < N; p++)
    for(int q = 0; q < M; q++)
        m[p][q] -= mean; // at this point, the arithmetic mean of the input signal is 0
int L = /* the desired positive value, not higher than the lower of M and N */;
int V = /* the desired positive value, lower than 100 */;
int T = (int)((100*M - L*V)/((100 - V)*L)); // window placements along the width
int P = (int)((100*N - L*V)/((100 - V)*L)); // window placements along the height
int highest_ZCR = 0; // highest number of zero-crossings in a single window placement, required for normalisation
double* f = new double[T*P]; // dynamic vector declaration
for(int k = 0; k < T*P; k++)
{
    int r0 = (k/T)*((int)(((100 - V)/100.0)*L)); // first row of the kth window
    int c0 = (k%T)*((int)(((100 - V)/100.0)*L)); // first column of the kth window
    f[k] = 0;
    for(int i = r0; i < r0 + L; i++)
        for(int j = c0; j < c0 + L - 1; j++)
            f[k] += (m[i][j]*m[i][j + 1] < 0)?1:0; /* horizontal neighbours: the product is negative only when they lie on opposite sides of zero (Eq. 1) */
    for(int i = r0; i < r0 + L - 1; i++)
        for(int j = c0; j < c0 + L; j++)
            f[k] += (m[i][j]*m[i + 1][j] < 0)?1:0; // vertical neighbours
    if(f[k] > highest_ZCR)
        highest_ZCR = (int)(f[k]);
}
for(int k = 0; k < T*P; k++)
    if(highest_ZCR > 0) // a signal without zero-crossings is left unchanged
        f[k] /= (double)(highest_ZCR); /* the explicit cast is kept for clarity; the usual arithmetic conversions already promote an int divisor to double */
// at this point, the feature vector, f[], is ready

Algorithm 7: fragment of C++ code for method B2 in 1D, adopting the normalisation TA.
// ensure that s[], of length M, is available as input
double mean = 0;
for(int k = 0; k < M; k++)
    mean += s[k]/(double)(M);
for(int k = 0; k < M; k++)
    s[k] -= mean; // at this point, the arithmetic mean of the input signal is 0
int L; // window length
int ZCR; // total ZCR over all the window positions of a subvector, required to normalise f[]
int X[] = {2, 3, 5, 7, 9, 11, 13, 17}; /* numbers of non-overlapping windows of interest; they can be changed according to the experiment */
int total_size_of_f = 0;
for(int i = 0; i < (int)(sizeof(X)/sizeof(int)); i++) // number of elements in X[]
    total_size_of_f += X[i];
double* f = new double[total_size_of_f]; /* the total size of f[] is the sum of the elements in X[], i.e., the size of the subvector φ_1[] plus the size of the subvector φ_2[], plus the size of the subvector φ_3[], ..., and so on */
int jump = 0; // helps to control the correct positions to write in f[]
for(int j = 0; j < (int)(sizeof(X)/sizeof(int)); j++)
{
    ZCR = 0;
    L = (int)(M/X[j]);
    for(int k = 0; k < X[j]; k++)
    {
        f[jump + k] = 0;
        for(int i = k*L; i < k*L + L - 1; i++) // pairs inside the kth window only
            f[jump + k] += (s[i]*s[i + 1] < 0)?1:0;
        ZCR += (int)(f[jump + k]);
    }
    if(ZCR > 0) // a subvector without zero-crossings is left unchanged
        for(int k = 0; k < X[j]; k++)
            f[jump + k] /= (double)(ZCR);
    jump += X[j];
}
// at this point, the feature vector, f[], is ready

3.1. Numerical example for B1 in 1D

Problem statement: Let s[] = {1, 2, 3, 4, 5, 5, 4, 3, 2, 1}, implying M = 10, and let L = 4 be the window length, with overlaps of V = 50%. Obtain the feature vector, f[], according to method B1.

Solution: First, the 1D signal mean, (1+2+3+4+5+5+4+3+2+1)/10 = 30/10 = 3, is subtracted from each component of s[], resulting in {1−3, 2−3, 3−3, 4−3, 5−3, 5−3, 4−3, 3−3, 2−3, 1−3} = {−2, −1, 0, 1, 2, 2, 1, 0, −1, −2}. The corresponding feature vector, which has length T = ⌊(100M − LV)/((100 − V)L)⌋ = ⌊(100·10 − 4·50)/((100 − 50)·4)⌋ = ⌊800/200⌋ = 4, is obtained as follows:

• w_0[], which covers the sub-signal {−2, −1, 0, 1}, contains 1 zero-crossing, implying that f_0 = 1;
• w_1[], which covers the sub-signal {0, 1, 2, 2}, contains no zero-crossings, implying that f_1 = 0;
• w_2[], which covers the sub-signal {2, 2, 1, 0}, contains no zero-crossings, implying that f_2 = 0;
• w_3[], which covers the sub-signal {1, 0, −1, −2}, contains 1 zero-crossing, implying that f_3 = 1.

For the normalisation TA, each component of f[] is divided by Σ_{k=0}^{3} f_k = 1 + 0 + 0 + 1 = 2. Thus, it becomes {1/2, 0/2, 0/2, 1/2} = {1/2, 0, 0, 1/2}. On the other hand, for PA, each component of f[] is divided by the maximum number of zero-crossings, i.e., L − 1 = 3. Thus, it becomes {1/3, 0/3, 0/3, 1/3} = {1/3, 0, 0, 1/3}. Lastly, for EA, each component of f[] is divided by the highest component of its unnormalised version, i.e., 1. Thus, it becomes {1/1, 0/1, 0/1, 1/1} = {1, 0, 0, 1}.
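The hand-worked example above counts a transition that passes through an exact zero (for instance −1, 0, 1 inside w_0[]) as one zero-crossing, whereas the strict test s[i]*s[i+1] < 0 used in the algorithms does not, because any product with 0 equals 0. The following C++ sketch of B1 in 1D is one possible way to reconcile the two. It assumes a hypothetical helper, signChanges(), that counts sign changes while skipping zero samples, a hypothetical enum Norm selecting TA, PA or EA, and a hypothetical function b1Features(); none of these names come from the original algorithms. The mean is computed as a plain sum divided by M so that integer-valued signals stay exact after subtraction.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical helper: counts sign changes in s[begin, end), skipping exact
// zeros, so that a transition such as -1, 0, 1 counts as one crossing.
int signChanges(const std::vector<double>& s, int begin, int end) {
    int count = 0;
    int lastSign = 0;  // 0 means "no non-zero sample seen yet"
    for (int i = begin; i < end; ++i) {
        const int sign = (s[i] > 0) - (s[i] < 0);
        if (sign == 0) continue;  // exact zeros do not reset the sign history
        if (lastSign != 0 && sign != lastSign) ++count;
        lastSign = sign;
    }
    return count;
}

enum class Norm { TA, PA, EA };  // hypothetical selector for the normalisation

// Sketch of method B1 in 1D: windows of length L with V% overlap.
std::vector<double> b1Features(std::vector<double> s, int L, int V, Norm norm) {
    const int M = static_cast<int>(s.size());
    const double mean = std::accumulate(s.begin(), s.end(), 0.0) / M;
    for (double& x : s) x -= mean;  // zero-mean signal

    const int T = (100 * M - L * V) / ((100 - V) * L);           // window placements
    if (T <= 0) return {};
    const int step = static_cast<int>(((100 - V) / 100.0) * L);  // shift between placements

    std::vector<double> f(T);
    for (int k = 0; k < T; ++k)
        f[k] = signChanges(s, k * step, k * step + L);

    double divisor = 0.0;
    switch (norm) {
        case Norm::TA: divisor = std::accumulate(f.begin(), f.end(), 0.0); break;
        case Norm::PA: divisor = L - 1; break;  // maximum crossings in one window
        case Norm::EA: divisor = *std::max_element(f.begin(), f.end()); break;
    }
    if (divisor > 0)  // a signal without crossings is left unchanged
        for (double& v : f) v /= divisor;
    return f;
}
```

For the signal of Section 3.1, b1Features({1, 2, 3, 4, 5, 5, 4, 3, 2, 1}, 4, 50, Norm::TA) yields {0.5, 0, 0, 0.5}, and Norm::EA yields {1, 0, 0, 1}, matching the results above. Replacing signChanges() with the strict product test would give {0, 0, 0, 0} for this particular signal, since every crossing in it passes through an exact zero.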
Algorithm 8: fragment of C++ code for method B2 in 1D, adopting the normalisation PA.
// ensure that s[], of length M, is available as input
double mean = 0;
for(int k = 0; k < M; k++)
    mean += s[k]/(double)(M);
for(int k = 0; k < M; k++)
    s[k] -= mean; // at this point, the arithmetic mean of the input signal is 0
int L; // window length
int X[] = {2, 3, 5, 7, 9, 11, 13, 17}; /* numbers of non-overlapping windows of interest; they can be changed according to the experiment */
int total_size_of_f = 0;
for(int i = 0; i < (int)(sizeof(X)/sizeof(int)); i++) // number of elements in X[]
    total_size_of_f += X[i];
double* f = new double[total_size_of_f]; /* the total size of f[] is the sum of the elements in X[], i.e., the size of the subvector φ_1[] plus the size of the subvector φ_2[], plus the size of the subvector φ_3[], ..., and so on */
int jump = 0; // helps to control the correct positions to write in f[]
for(int j = 0; j < (int)(sizeof(X)/sizeof(int)); j++)
{
    L = (int)(M/X[j]);
    for(int k = 0; k < X[j]; k++)
    {
        f[jump + k] = 0;
        for(int i = k*L; i < k*L + L - 1; i++) // pairs inside the kth window only
            f[jump + k] += (s[i]*s[i + 1] < 0)?1:0;
    }
    if(L > 1)
        for(int k = 0; k < X[j]; k++)
            f[jump + k] /= (double)(L - 1); // maximum number of zero-crossings in a window of length L
    jump += X[j];
}
// at this point, the feature vector, f[], is ready

Algorithm 9: fragment of C++ code for method B2 in 1D, adopting the normalisation EA.
// ensure that s[], of length M, is available as input
double mean = 0;
for(int k = 0; k < M; k++)
    mean += s[k]/(double)(M);
for(int k = 0; k < M; k++)
    s[k] -= mean; // at this point, the arithmetic mean of the input signal is 0
int L; // window length
double highest_ZCR; // highest ZCR among the windows of a subvector, required to normalise f[]
int X[] = {2, 3, 5, 7, 9, 11, 13, 17}; /* numbers of non-overlapping windows of interest; they can be changed according to the experiment */
int total_size_of_f = 0;
for(int i = 0; i < (int)(sizeof(X)/sizeof(int)); i++) // number of elements in X[]
    total_size_of_f += X[i];
double* f = new double[total_size_of_f]; /* the total size of f[] is the sum of the elements in X[], i.e., the size of the subvector φ_1[] plus the size of the subvector φ_2[], plus the size of the subvector φ_3[], ..., and so on */
int jump = 0; // helps to control the correct positions to write in f[]
for(int j = 0; j < (int)(sizeof(X)/sizeof(int)); j++)
{
    highest_ZCR = 0;
    L = (int)(M/X[j]);
    for(int k = 0; k < X[j]; k++)
    {
        f[jump + k] = 0;
        for(int i = k*L; i < k*L + L - 1; i++) // pairs inside the kth window only
            f[jump + k] += (s[i]*s[i + 1] < 0)?1:0;
        if(f[jump + k] > highest_ZCR)
            highest_ZCR = f[jump + k];
    }
    if(highest_ZCR > 0) // a subvector without zero-crossings is left unchanged
        for(int k = 0; k < X[j]; k++)
            f[jump + k] /= highest_ZCR;
    jump += X[j];
}
// at this point, the feature vector, f[], is ready

3.2. Numerical example for B1 in 2D
Problem statement: Let

$$m[\,][\,] = \begin{pmatrix} 1 & 2 & 3 & 4\\ 4 & 2 & 4 & 6\\ 7 & 8 & 9 & 10 \end{pmatrix},$$

implying N = 3 and M = 4. Assume that the square window has size L = 2, with overlaps of V = 50%. Obtain the feature vector, f[], following method B1.

Solution: First, the 2D signal mean, (1+2+3+4+4+2+4+6+7+8+9+10)/12 = 60/12 = 5, is subtracted from each component of m[][], resulting in

$$\begin{pmatrix} 1-5 & 2-5 & 3-5 & 4-5\\ 4-5 & 2-5 & 4-5 & 6-5\\ 7-5 & 8-5 & 9-5 & 10-5 \end{pmatrix} = \begin{pmatrix} -4 & -3 & -2 & -1\\ -1 & -3 & -1 & 1\\ 2 & 3 & 4 & 5 \end{pmatrix}.$$

Then, the feature vector with length T·P = ⌊(100M − LV)/((100 − V)L)⌋·⌊(100N − LV)/((100 − V)L)⌋ = ⌊(100·4 − 2·50)/((100 − 50)·2)⌋·⌊(100·3 − 2·50)/((100 − 50)·2)⌋ = 3·2 = 6 is obtained as follows:

• w_0[][] covers the sub-matrix $\begin{pmatrix}-4 & -3\\ -1 & -3\end{pmatrix}$, which contains no zero-crossings, implying that f_0 = 0;
• w_1[][] covers the sub-matrix $\begin{pmatrix}-3 & -2\\ -3 & -1\end{pmatrix}$, which contains no zero-crossings, implying that f_1 = 0;
• w_2[][] covers the sub-matrix $\begin{pmatrix}-2 & -1\\ -1 & 1\end{pmatrix}$, which contains 2 zero-crossings, implying that f_2 = 2;
• w_3[][] covers the sub-matrix $\begin{pmatrix}-1 & -3\\ 2 & 3\end{pmatrix}$, which contains 2 zero-crossings, implying that f_3 = 2;
• w_4[][] covers the sub-matrix $\begin{pmatrix}-3 & -1\\ 3 & 4\end{pmatrix}$, which contains 2 zero-crossings, implying that f_4 = 2;
• w_5[][] covers the sub-matrix $\begin{pmatrix}-1 & 1\\ 4 & 5\end{pmatrix}$, which contains 2 zero-crossings, implying that f_5 = 2.

For the normalisation TA, each component of f[] is divided by Σ_{k=0}^{5} f_k = 0 + 0 + 2 + 2 + 2 + 2 = 8. Thus, it becomes {0/8, 0/8, 2/8, 2/8, 2/8, 2/8} = {0, 0, 1/4, 1/4, 1/4, 1/4}. On the other hand, for PA, each component of f[] is divided by the maximum number of zero-crossings, i.e., (4 − 1)·3 + (3 − 1)·4 = 9 + 8 = 17. Thus, it becomes {0/17, 0/17, 2/17, 2/17, 2/17, 2/17}. Lastly, for EA, each component of f[] is divided by the highest component of its unnormalised version, i.e., 2. Thus, it becomes {0/2, 0/2, 2/2, 2/2, 2/2, 2/2} = {0, 0, 1, 1, 1, 1}.

3.3. Numerical example for B2 in 1D

Problem statement: Let s[] = {1, 2, 4, 6, 6, 6, 6, 5, 3, 1}, implying M = 10. Assuming that Q = 3, with no overlaps between window positions, obtain the feature vector, f[], following method B2.

Solution: First, the 1D signal mean, (1+2+4+6+6+6+6+5+3+1)/10 = 40/10 = 4, is subtracted from each component of s[], resulting in {1−4, 2−4, 4−4, 6−4, 6−4, 6−4, 6−4, 5−4, 3−4, 1−4} = {−3, −2, 0, 2, 2, 2, 2, 1, −1, −3}. The feature vector is composed by the concatenation of Q = 3 sub-vectors, i.e., f[] = {φ_1[]} ∪ {φ_2[]} ∪ {φ_3[]}, which are obtained as follows:

• The first subvector, φ_1[], comes from two non-overlapping windows, w_0[] = {−3, −2, 0, 2, 2} and w_1[] = {2, 2, 1, −1, −3}, which are positioned over s[]. The corresponding results are: φ_1[0] = 1; φ_1[1] = 1.
• The second subvector, φ_2[], comes from three non-overlapping windows, w_0[] = {−3, −2, 0}, w_1[] = {2, 2, 2} and w_2[] = {2, 1, −1}, which are positioned over s[], discarding its last element, i.e., the amplitude −3. The corresponding results are: φ_2[0] = 1; φ_2[1] = 0; φ_2[2] = 1.
• The third subvector, φ_3[], comes from five non-overlapping windows, w_0[] = {−3, −2}, w_1[] = {0, 2}, w_2[] = {2, 2}, w_3[] = {2, 1} and w_4[] = {−1, −3}, which are positioned over s[]. The corresponding results are: φ_3[0] = 0; φ_3[1] = 0; φ_3[2] = 0; φ_3[3] = 0; φ_3[4] = 0.

The concatenation of the three sub-vectors produces f[] = {1, 1, 1, 0, 1, 0, 0, 0, 0, 0}. Now, each sub-vector is normalised separately. Considering TA, each component of φ_1 in f[] is divided by Σ_{k=0}^{1} φ_1[k] = 1 + 1 = 2; each component of φ_2 in f[] is divided by Σ_{k=0}^{2} φ_2[k] = 1 + 0 + 1 = 2; and each component of φ_3 in f[] remains unchanged because Σ_{k=0}^{4} φ_3[k] = 0. Thus, f[] becomes {1/2, 1/2, 1/2, 0/2, 1/2, 0, 0, 0, 0, 0} = {1/2, 1/2, 1/2, 0, 1/2, 0, 0, 0, 0, 0}.

On the other hand, considering PA, each component of φ_1 in f[] is divided by the maximum number of zero-crossings possible for the window that originated it, i.e., L − 1 = 5 − 1 = 4. Equally, each component of φ_2 in f[] is divided by L − 1 = 3 − 1 = 2, and each component of φ_3 in f[] is divided by L − 1 = 2 − 1 = 1. Thus, f[] becomes {1/4, 1/4, 1/2, 0/2, 1/2, 0/1, 0/1, 0/1, 0/1, 0/1} = {1/4, 1/4, 1/2, 0, 1/2, 0, 0, 0, 0, 0}.

Lastly, considering EA, each component of φ_1 in f[] is divided by the highest component in it, i.e., 1; each component of φ_2 in f[] is divided by the highest component in it, i.e., 1; and each component of φ_3 in f[] remains unchanged because its highest component is 0. Thus, f[] becomes {1/1, 1/1, 1/1, 0/1, 1/1, 0, 0, 0, 0, 0} = {1, 1, 1, 0, 1, 0, 0, 0, 0, 0}.

Algorithm 10: fragment of C++ code for method B2 in 2D, adopting the normalisation TA.
// ensure that m[][], with height N and width M, is available as input
double mean = 0;
for(int p = 0; p < N; p++)
    for(int q = 0; q < M; q++)
        mean += m[p][q]/(double)(M*N);
for(int p = 0; p < N; p++)
    for(int q = 0; q < M; q++)
        m[p][q] -= mean; // at this point, the arithmetic mean of the input signal is 0
double ZCR; // total ZCR over all the window positions of a subvector, required to normalise f[]
int X[] = {2, 3, 5, 7, 9, 11, 13, 17}; /* numbers of divisions per dimension; they can be changed according to the experiment */
int total_size_of_f = 0;
for(int i = 0; i < (int)(sizeof(X)/sizeof(int)); i++) // number of elements in X[]
    total_size_of_f += X[i]*X[i];
double* f = new double[total_size_of_f]; /* the total size of f[] is the sum of the squares of the elements in X[], i.e., the size of the subvector φ_1[] plus the size of the subvector φ_2[], plus the size of the subvector φ_3[], ..., and so on */
int jump = 0; // helps to control the correct positions to write in f[]
for(int i = 0; i < total_size_of_f; i++)
    f[i] = 0;
int L1, L2;
for(int k = 0; k < (int)(sizeof(X)/sizeof(int)); k++)
{
    ZCR = 0;
    L1 = (int)(M/X[k]); // window width
    L2 = (int)(N/X[k]); // window height
    for(int i = 0; i < L2*X[k] - 1; i++)
        for(int j = 0; j < L1*X[k]; j++)
            if((i + 1) % L2 != 0) // both samples lie inside the same window
            {
                int zc = (m[i][j]*m[i + 1][j] < 0)?1:0; // vertical neighbours
                f[jump + (i/L2)*X[k] + j/L1] += zc;
                ZCR += zc;
            }
    for(int i = 0; i < L2*X[k]; i++)
        for(int j = 0; j < L1*X[k] - 1; j++)
            if((j + 1) % L1 != 0) // both samples lie inside the same window
            {
                int zc = (m[i][j]*m[i][j + 1] < 0)?1:0; // horizontal neighbours
                f[jump + (i/L2)*X[k] + j/L1] += zc;
                ZCR += zc;
            }
    if(ZCR > 0) // a subvector without zero-crossings is left unchanged
        for(int i = jump; i < jump + X[k]*X[k]; i++)
            f[i] /= ZCR;
    jump += X[k]*X[k];
}
// at this point, the feature vector, f[], is ready.

Algorithm 11: fragment of C++ code for method B2 in 2D, adopting the normalisation PA.
// ensure that m[][], with height N and width M, is available as input
double mean = 0;
for(int p = 0; p < N; p++)
    for(int q = 0; q < M; q++)
        mean += m[p][q]/(double)(M*N);
for(int p = 0; p < N; p++)
    for(int q = 0; q < M; q++)
        m[p][q] -= mean; // at this point, the arithmetic mean of the input signal is 0
int L1, L2;
int X[] = {2, 3, 5, 7, 9, 11, 13, 17}; /* numbers of divisions per dimension; they can be changed according to the experiment */
int total_size_of_f = 0;
for(int i = 0; i < (int)(sizeof(X)/sizeof(int)); i++) // number of elements in X[]
    total_size_of_f += X[i]*X[i];
double* f = new double[total_size_of_f]; /* the total size of f[] is the sum of the squares of the elements in X[], i.e., the size of the subvector φ_1[] plus the size of the subvector φ_2[], plus the size of the subvector φ_3[], ..., and so on */
int jump = 0; // helps to control the correct positions to write in f[]
for(int i = 0; i < total_size_of_f; i++)
    f[i] = 0;
for(int k = 0; k < (int)(sizeof(X)/sizeof(int)); k++)
{
    L1 = (int)(M/X[k]); // window width
    L2 = (int)(N/X[k]); // window height
    for(int i = 0; i < L2*X[k] - 1; i++)
        for(int j = 0; j < L1*X[k]; j++)
            if((i + 1) % L2 != 0) // both samples lie inside the same window
                f[jump + (i/L2)*X[k] + j/L1] += (m[i][j]*m[i + 1][j] < 0)?1:0;
    for(int i = 0; i < L2*X[k]; i++)
        for(int j = 0; j < L1*X[k] - 1; j++)
            if((j + 1) % L1 != 0) // both samples lie inside the same window
                f[jump + (i/L2)*X[k] + j/L1] += (m[i][j]*m[i][j + 1] < 0)?1:0;
    int max_zcr = 2*L1*L2 - L1 - L2; // maximum number of zero-crossings in an L2 x L1 window
    if(max_zcr > 0)
        for(int i = jump; i < jump + X[k]*X[k]; i++)
            f[i] /= (double)(max_zcr);
    jump += X[k]*X[k];
}
// at this point, the feature vector, f[], is ready.
// ...

3.4. Numerical example for B2 in 2D

Problem statement: Let

$$m[\,][\,] = \begin{pmatrix} 0 & 1 & 2 & 3\\ 4 & 5 & 6 & 7\\ 8 & 9 & 10 & 53\\ 3 & 0 & 1 & 0 \end{pmatrix},$$

implying N = 4 and M = 4. Assume that Q = 2 with no overlaps between windows. Obtain the feature vector, f[], following method B2.

Solution: First, the 2D signal mean, (0+1+2+3+4+5+6+7+8+9+10+53+3+0+1+0)/16 = 112/16 = 7, is subtracted from each component of m[][], resulting in

$$\begin{pmatrix} 0-7 & 1-7 & 2-7 & 3-7\\ 4-7 & 5-7 & 6-7 & 7-7\\ 8-7 & 9-7 & 10-7 & 53-7\\ 3-7 & 0-7 & 1-7 & 0-7 \end{pmatrix} = \begin{pmatrix} -7 & -6 & -5 & -4\\ -3 & -2 & -1 & 0\\ 1 & 2 & 3 & 46\\ -4 & -7 & -6 & -7 \end{pmatrix}.$$

The feature vector is composed by the concatenation of Q = 2 sub-vectors, i.e., f[] = {φ_1[]} ∪ {φ_2[]}. They are obtained as follows:

• for φ_1[], a total of 2 × 2 = 4 non-overlapping windows, w_0[][] = $\begin{pmatrix}-7 & -6\\ -3 & -2\end{pmatrix}$, w_1[][] = $\begin{pmatrix}-5 & -4\\ -1 & 0\end{pmatrix}$, w_2[][] = $\begin{pmatrix}1 & 2\\ -4 & -7\end{pmatrix}$ and w_3[][] = $\begin{pmatrix}3 & 46\\ -6 & -7\end{pmatrix}$, are positioned over m[][]. The result is: φ_1[0] = 0; φ_1[1] = 2; φ_1[2] = 2; φ_1[3] = 2.
R.C. Guido / Knowledge-Based Systems 105 (2016) 248269 261

Algorithm 12 : fragment of C++ code for method B2 in 2D, adopt- Algorithm 14 : fragment of C++ code for method B3 in 2D.
ing the normalisation EA. // ...
// ensure that m[][], with height N and width M, is available as input
// ensure that m[][], with height N and width M, is available as input
double mean = 0;
double mean = 0;
for(int p = 0; p < N; p + +)
for(int p = 0; p < N; p + +)
for(int q = 0; q < M; q + +)
for(int q = 0; q < M; q + +)
mean+ = m[ p][q]/(double )(M N );
mean+ = m[ p][q]/(double )(M N );
for(int p = 0; p < N; p + +)
for(int p = 0; p < N; p + +)
for(int q = 0; q < M; q + +)
for(int q = 0; q < M; q + +)
m[ p][q] = mean; //at this point, the arithmetic mean of the
m[ p][q] = mean; //at this point, the arithmetic mean of the
input signal is 0
input signal is 0
double alpha = 0; // partial height
int X[] = {2, 3, 5, 7, 9, 11, 13, 17}; /* vector containing the prime numbers of
double beta = 0; // partial width
interest. It can be changed according to the experiment */
double C =; // the desired value, being 0 < C < 100
int total_size_of_f = 0;
int T = ((100/C ) ((int )(100/C )) == 0 )?(100/C ) 1 : (int )(100/C ) ; //
for(int i = 0; i <(int)(sizeof(X)/sizeof(int));i + +) // number of elements in
the number of elements in T
X[]
double f = new double[T ]; // dynamic vector declaration
total_size_of_f+=pow(X[i], 2);
double z = zcr(m, N, M)*((double)(C)/100.0);
double f = new double[total_size_of_f]; /* The total size of f [] is the sum
for(int k = 0; k < T ; k + +)
of the squares of the elements in X[], i.e., the size of the subvector 1 [] plus
{
the size of the subvector 2 [], plus the size of the subvector 3 [], ..., and so
while(zcr(&m[0][0],(int)(alpha),(int)(beta))<((k + 1 ) z))
on */
if (N == M)
int jump = 0; // helps to control the correct positions to write in f []
{
for(int i = 0;i <total_size_of_f;i + +)
alpha++;
f [i] = 0;
beta++;
int L1, L2;
}
for(int k = 0;k <(int)(sizeof(X)/sizeof(int));k + +)
else if (N > M)
{
{
L1 = (int)(N/X[k]);
alpha++;
L2 = (int)(M/X[k]);
beta+=(double)(M)/(double)(N);
for(int i = 0;i <((int)(N/X [k]))*X [k] - 1;i + +)
}
for(int j = 0; j <((int)(M/X [k]))*X [k]; j + +)
else
f [jump+(((int)(i/L2))*(X[k]))+((int)( j/L1))]+ =(m[i][ j]
{
m[i + 1][ j] < 0)?1:0;
alpha+=(double)(N)/(double)(M);
for(int i = 0;i <((int)(N/X [k]))*X [k];i + +)
beta++;
for(int j = 0; j <((int)(M/X [k]))*X [k] - 1; j + +)
}
f [jump+(((int)(i/L2))*(X[k]))+((int)( j/L1))]+ =(m[i][ j]
f [k] =((al pha beta)<=(N M))?((al pha beta)/(N M)):(1); // ex-
m[i][ j + 1] < 0)?1:0;
ceptionally, as alpha or beta increases, f [k] > 1 for k close to T , so this is
jump+=pow(X[k],2);
a correction.
}
}
jump=0;
// at this point, the feature vector, f [], is ready
double highest _ZCR;
// . . .
for(int k = 0;k <(int)(sizeof(X)/sizeof(int));k + +)
// function zcr in 2D
{
double zcr(double** input_matrix, int height, int width)
highest _ZCR = 0;
{
for(int l = 0;l <pow(X[k], 2);l + +)
double z = 0;
if(f[jump+l]> highest _ZCR)
for(int i=0;i<height - 1;i++)
highest _ZCR = f [jump+l];
for(int j=0;j<width;j++)
for(int l = 0;l <pow(X[k], 2);l + +)
z +=(input _matrix[i][ j] input _matrix[i + 1][ j] < 0)?1:0;
f[jump+l]/ = highest _ZCR;
for(int i=0;i<height;i++)
jump+=pow(X[k], 2);
for(int j=0;j<width - 1;j++)
}
z +=(input _matrix[i][ j] input _matrix[i][ j + 1] < 0)?1:0;
// at this point, the feature vector, f [], is ready.
return(z);
}
// ...
for ψ2[], a total of 3 × 3 = 9 non-overlapping windows, w0[][] = (7), w1[][] = (6), w2[][] = (5), w3[][] = (3), w4[][] = (2), w5[][] = (1), w6[][] = (1), w7[][] = (2) and w8[][] = (3), are positioned over m[][], its fourth row and fourth column being discarded. The result is: ψ2[0] = 0; ψ2[1] = 0; ψ2[2] = 0; ψ2[3] = 0; ψ2[4] = 0; ψ2[5] = 0; ψ2[6] = 0; ψ2[7] = 0; ψ2[8] = 0.

The concatenation of both sub-vectors produces f[] = {0, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0}. As in the previous example, each sub-vector is normalised separately. Considering TA, each component of ψ1 in f[] is divided by $\sum_{k=0}^{3}\psi_1[k] = 0 + 2 + 2 + 2 = 6$, and each component of ψ2 in f[] remains unchanged because $\sum_{k=0}^{8}\psi_2[k] = 0$. Thus, f[] becomes {0/6, 2/6, 2/6, 2/6, 0, 0, 0, 0, 0, 0, 0, 0, 0} = {0, 1/3, 1/3, 1/3, 0, 0, 0, 0, 0, 0, 0, 0, 0}.

On the other hand, considering PA, each component of ψ1 in f[] is divided by the maximum number of zero-crossings possible for the 2 × 2 window that originated it, i.e., 1 × 2 + 2 × 1 = 4, and each component of ψ2 in f[] remains unchanged because that sub-vector does not cross zero. Thus, f[] becomes {0/4, 2/4, 2/4, 2/4, 0, 0, 0, 0, 0, 0, 0, 0, 0} = {0, 1/2, 1/2, 1/2, 0, 0, 0, 0, 0, 0, 0, 0, 0}.

Algorithm 13: fragment of C++ code for method B3 in 1D.

// ensure that s[], of length M, is available as input
double mean = 0;
for(int k = 0; k < M; k++)
    mean += s[k]/(double)(M);
for(int k = 0; k < M; k++)
    s[k] -= mean; // at this point, the arithmetic mean of the input signal is 0
int L = 0; // partial lengths
double C = ; // the desired value, being 0 < C < 100
int T = ((100/C - ((int)(100/C))) == 0) ? (int)(100/C) - 1 : (int)(100/C); // T, the number of elements in f[]
double* f = new double[T]; // dynamic vector declaration
double z = zcr(&s[0], M)*((double)(C)/100);
for(int k = 0; k < T; k++)
{
    while((L < M) && (zcr(&s[0], L) < ((k + 1)*z))) // L never exceeds the signal length
        L++;
    f[k] = (double)(L)/(double)(M);
}
// at this point, the feature vector, f[], is ready
// ...
// function zcr
double zcr(double* input_vector, int length)
{
    double z = 0;
    for(int i = 0; i < length - 1; i++)
        z += (input_vector[i]*input_vector[i + 1] < 0)?1:0;
    return(z);
}
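To run method B3 in 1D end to end, the following minimal C++ sketch mirrors Algorithm 13 under stated assumptions: it uses std::vector instead of raw arrays, the names zcr1d and b3_1d are hypothetical, and it keeps an explicit L < M guard. On the signal of the numerical example in Section 3.5, s[] = {1, −1, 1, −1, 1, 1, −1, −1} with C = 20%, it returns {0.25, 0.375, 0.5, 0.625}; the first element follows from the fact that 20% of the 5 zero-crossings is exactly 1, which the first two samples already contain.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Counts zero-crossings among the first `length` samples of s.
double zcr1d(const std::vector<double>& s, int length) {
    double z = 0;
    for (int i = 0; i < length - 1; i++)
        z += (s[i] * s[i + 1] < 0) ? 1 : 0;
    return z;
}

// Hypothetical vector-based variant of Algorithm 13 (method B3 in 1D):
// f[k] = L / M, where L is the shortest prefix of s[] holding at least
// (k + 1) * C% of the total number of zero-crossings.
std::vector<double> b3_1d(std::vector<double> s, double C) {
    assert(C > 0 && C < 100);
    const int M = static_cast<int>(s.size());
    double mean = 0;
    for (double v : s) mean += v / M;
    for (double& v : s) v -= mean;  // zero-mean input
    const double ratio = 100.0 / C;
    const int T = (ratio == std::floor(ratio)) ? static_cast<int>(ratio) - 1
                                               : static_cast<int>(ratio);
    const double z = zcr1d(s, M) * C / 100.0;  // C% of the total ZCR
    std::vector<double> f(T);
    int L = 0;  // never reset: each target is larger than the previous one
    for (int k = 0; k < T; k++) {
        while (L < M && zcr1d(s, L) < (k + 1) * z) L++;
        f[k] = static_cast<double>(L) / M;
    }
    return f;
}
```

Because zcr1d is recomputed for every candidate prefix, this sketch, like Algorithm 13, is quadratic in M; keeping a running count of zero-crossings while L advances would make it linear.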

Lastly, considering EA, each component of ψ1 in f[] is divided by the highest component in it, i.e., 2, and each component of ψ2 in f[] remains unchanged because its highest component is 0. Thus, f[] becomes {0/2, 2/2, 2/2, 2/2, 0, 0, 0, 0, 0, 0, 0, 0, 0} = {0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0}.

3.5. Numerical example for B3 in 1D

Problem statement: Let s[] = {1, −1, 1, −1, 1, 1, −1, −1}, implying M = 8, and assume that C = 20% is the critical level. Obtain the feature vector, f[], according to the method B3.

Solution: The signal mean is (1 + (−1) + 1 + (−1) + 1 + 1 + (−1) + (−1))/8 = 0, i.e., the ZCRs are ready to be counted. The feature vector, of length T = 100/20 − 1 = 4, is composed of the proportional lengths of s[] required to reach 1 × 20% = 20%, 2 × 20% = 40%, 3 × 20% = 60% and 4 × 20% = 80% of its total number of zero-crossings. They are obtained as follows:

• s[] has 5 zero-crossings;
• 20% of 5 is 0.2 × 5 = 1, so 1 zero-crossing is required. The proportion of the length of s[] needed to reach it, from the beginning, is 2/8;
• 40% of 5 is 0.4 × 5 = 2 zero-crossings. The proportion of the length of s[] needed to reach them, from the beginning, is 3/8;
• 60% of 5 is 0.6 × 5 = 3. The proportion of the length of s[] needed to reach them, from the beginning, is 4/8 = 1/2;
• 80% of 5 is 0.8 × 5 = 4. The proportion of the length of s[] needed to reach them, from the beginning, is 5/8.

Thus, f[] = {2/8, 3/8, 1/2, 5/8} = {1/4, 3/8, 1/2, 5/8}. As explained in the previous section, no further normalisation applies.

3.6. Numerical example for B3 in 2D

Problem statement: Let
$$m[][] = \begin{bmatrix} 5 & 2 & 5 & 6 \\ 2 & 3 & 3 & 7 \\ 8 & 4 & -4 & 6 \\ 1 & 4 & 12 & 0 \end{bmatrix},$$
implying N = 4 and M = 4. Assume that C = 25% is the critical level. Obtain the feature vector, f[], according to the method B3.

Solution: The signal mean is (5 + 2 + 5 + 6 + 2 + 3 + 3 + 7 + 8 + 4 − 4 + 6 + 1 + 4 + 12 + 0)/16 = 64/16 = 4 ≠ 0. Thus, m[][] is translated so that it becomes
$$\begin{bmatrix} 5-4 & 2-4 & 5-4 & 6-4 \\ 2-4 & 3-4 & 3-4 & 7-4 \\ 8-4 & 4-4 & -4-4 & 6-4 \\ 1-4 & 4-4 & 12-4 & 0-4 \end{bmatrix} = \begin{bmatrix} 1 & -2 & 1 & 2 \\ -2 & -1 & -1 & 3 \\ 4 & 0 & -8 & 2 \\ -3 & 0 & 8 & -4 \end{bmatrix}.$$
The feature vector, of length T = 100/25 − 1 = 3, is composed of the proportional areas of m[][] required to reach 1 × 25% = 25%, 2 × 25% = 50% and 3 × 25% = 75% of its total ZCR. They are obtained as follows:

• m[][] has 11 zero-crossings, 6 between vertically adjacent and 5 between horizontally adjacent elements. Its area is N × M = 4 × 4 = 16;
• 25% of 11 is 0.25 × 11 = 2.75. Ceiling it, 3 zero-crossings are considered. The 2 × 2 sub-matrix from $m_{0,0}$ contains only 2, so the 3 × 3 rectangle from $m_{0,0}$, which contains 5, is required to reach at least 3 zero-crossings. Since 3 × 3 = 9, the proportion of the m[][] area covered is 9/16 = 0.5625;
• 50% of 11 is 0.5 × 11 = 5.5. Ceiling it, 6 zero-crossings are considered. Since the 3 × 3 rectangle contains only 5, the 4 × 4 rectangle from $m_{0,0}$ is required to reach at least 6 zero-crossings. Since 4 × 4 = 16, the proportion of the m[][] area covered is 16/16 = 1;
• 75% of 11 is 0.75 × 11 = 8.25. Ceiling it, 9 zero-crossings are considered, which again requires the 4 × 4 rectangle from $m_{0,0}$, so the proportion of the m[][] area covered is 16/16 = 1.

Thus, f[] = {0.5625, 1, 1}. As in the previous example, no further normalisation applies.

4. Example applications

In this section, example applications involving 1D and 2D real-life data are shown to consolidate the proposed approaches, demonstrating their usability.

4.1. Speech classification and segmentation

There are many subclassifications for speech data; however, voiced, unvoiced and silent, respectively originating from quasi-periodic, non-periodic and inactive sources, are the root ones [38]-pp.77, 78. Usual applications in which the differentiation between voiced, unvoiced and silent segments (VUSS) is relevant include large-vocabulary speech recognition [39], speaker identification [40], voice conversion [41] and speech coding [42]. Thus, I dedicate this section to initially presenting a ZCR-based algorithm for the distinction among VUSS and, taking advantage of that formulation, to introducing my proposal for isolated-sentence word segmentation.

Neither B2 nor B3 can be used in this experiment because, a priori, the number of VUSS is unknown, implying that the feature vector generated to store their positions has a variable length that depends not only on the duration but also on the content of the spoken words. Thus, B1 is the only adequate method, among the three I presented, to carry out this task. Complementarily, VUSS are classified according to the characteristics of each speech frame in isolation, i.e., disregarding the remainder of the signal, suggesting that PA is the proper normalisation.

As reported in [43], ZCRs are usually associated with signal energy [1] to allow the implementation of accurate algorithms for the detection of VUSS. Thus, B1 normalised with PA was associated with A1, fully described in [1]. Independently, B1 and A1 were applied to the input speech signal s[], producing the feature vectors I named f_B[] and f_A[], respectively, with the same variable length. Consequently, for each window placement, there are two types of information: spectral, based on the normalised ZCRs available in f_B[], and temporal, based on the normalised energies contained in f_A[]. Considered in conjunction, the feature sets produce a variable-length description of s[], so that there is a correspondence between this signal and those vectors for each transition between different types of segments. Notably, the frontiers which delimit VUSS could not be easily found on the basis of a direct inspection of s[].

Fig. 13 shows the input signal, s[], that I use to explain the proposed approach. It was digitised at 16,000 samples per second, 16-bit, corresponding to the 45,520-sample-long raw speech data extracted from the sentence sa1 contained in the directory /test/dr1/mdab0/ of the TIMIT speech corpus [44], which reads as "She had your dark suit in greasy wash water all year". In the figure, the horizontal axis contains the tags $J_k$, (0 ≤ k ≤ 37), which correspond to the transitions between consecutive phonemes of sa1, as described in the respective phn file included in the above-mentioned directory. In addition to the silent periods, labelled as SIL and indicated in orange, the exact voiced and unvoiced intervals, respectively designated as VOI and UNV, are shown in teal and violet, in accordance with the documentation found in [44] and [45].

The value I chose for L, equivalent to 32 ms of speech, is 512 samples. As documented in [12]-pp.32 and [46]-pp.25, the speech

processing community considers this adequate to analyse speech data such as s[], the signal that appears as an olive curve in Fig. 13. Furthermore, V = 50% follows the usual procedure adopted during the short-time analysis of speech signals, albeit this is not a critical choice. Additionally, the options for L and V allowed a comfortable visualisation of f_B[] and f_A[], plotted respectively in red and blue in Fig. 13, in which the three signals are time-aligned so that the kth sample of f_B[] and of f_A[], (0 ≤ k ≤ T − 1), corresponds to the window placement that covers the interval from $s_{\lfloor kL(100-V)/100 \rfloor}$ to $s_{\lfloor kL(100-V)/100 \rfloor + L}$, i.e., from $s_{256k}$ to $s_{256k+512}$. Since
$$T = \left\lfloor \frac{100M - LV}{(100-V)L} \right\rfloor = \left\lfloor \frac{100(45520) - (512)(50)}{(100-50)512} \right\rfloor = 176,$$
both f_B[] and f_A[], indexed from 0 to 175, were dilated on the horizontal axis prior to being plotted, allowing a better visualisation, comparison and understanding of the proposed ideas.

Fig. 13. The speech voiced/unvoiced/silent decision experiment. SIL, UNV and VOI, respectively, mean silent, unvoiced and voiced. Olive, blue and red, respectively, are the colors used to plot the input speech signal s[], i.e., the file /test/dr1/mdab0/sa1.wav from TIMIT, the ZCR feature vector f_B[] and the energy feature vector f_A[]. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

On one hand, each unvoiced segment of s[] matches a region with two characteristics: an inconstant and high ZCR in f_B[], and a high energy in f_A[]. On the other hand, voiced parts of s[] are the ones that correspond to low and relatively constant ZCRs in f_B[] associated with high energies in f_A[]. Finally, silent segments correspond to those with low energies in f_A[], regardless of f_B[]. To be considered either high or low, ZCRs and energies obey the hard thresholds named H_B and H_A, respectively. Summarising, the strategy is:

For the kth sample of f_B[] and f_A[], (0 ≤ k ≤ T − 1):
{
    If (f_A[k] < H_A)
        the corresponding region in s[] is silent;
    else if (f_B[k] < H_B)
        the corresponding region in s[] is voiced;
    else
        the corresponding region in s[] is unvoiced.
}

As documented in [12]-pp.34, ZCR measures for voiced and unvoiced speech are, respectively, close to 1400 and 4900 zero-crossings per second. Thus, a reasonable value for H_B is their mean, i.e., (1400 + 4900)/2 = 3150 zero-crossings per second. For 32 milliseconds (ms), H_B = 3150 × 0.032 = 100.8 zero-crossings per 512 samples. Since PA was applied, H_B becomes 100.8/(L − 1) = 100.8/511 ≈ 0.197, the value I adopted.

For energy, contrastingly, I observe that the threshold of hearing for human beings is 0 dB at the pressure intensity of 20 μPa, i.e., 20 micro-Pascal, at 1000 Hz and 25 °C, as documented in [71]-pp.150–155. In order to compare a spoken signal with the threshold of hearing, its specific playback level should be known, but this is not simple in practice. Notwithstanding, a usual assumption is to consider such a level as being the smallest possible signal represented by means of the speech coding system defined at the time the signal was digitised and quantised. Equivalently, the fairly flat bottom of the threshold of hearing, for the frequencies within the main range of speech, is simply aligned to the energy level represented by the least significant coding bit. TIMIT speech files were quantised with 16 bits, one of them reserved for signalling, i.e., positive or negative, and the remaining ones for amplitude description; hence, the amplitude axis, for both positive and negative amplitudes, varies in steps of $1/2^{16-1} = 2^{-15}$. Considering that each window placement covers L = 512 samples, the normalised level is $512 \times 2^{-15} = 2^{9}/2^{15} = 2^{-6} = 0.015625 \approx 0.016$, which is the value I chose for H_A.

If the readers repeat this experiment for the entire TIMIT corpus, which contains 6300 sentences spoken by 630 speakers, the same successful style of discrimination observed in Fig. 13 will be obtained. In fact, many similar descriptions for the distinction among VUSS based on signal energy and ZCRs have already been documented in the literature, such as [47], published forty years ago, and [23], a more recent work. Aiming to offer the readers a more attractive and novel classification scheme, taking advantage of the previous formulation, I shall describe my proposal for isolated-sentence word segmentation.

According to [48]-pp.125, no known detail contained in a raw speech waveform corresponds directly to the white spaces that separate two words in written form. In fact, as I have observed through the years, the raw waveform rarely presents distinct and clear silent spaces between words, making this kind of segmentation a hard and complex challenge. In some particular cases, only the context can serve as the basis to find whether or not a boundary exists. One example refers to the words "some. What age" and "Somewhat aged" in the next pair of sentences:

... Is there a good wine? Yes, there is some. What age is it? ...
... Is there a good wine? Yes, there is. Somewhat aged? ...

Surely, not only an isolated speech fragment but also the entire sentence has to be taken into account to solve this issue. Notably, the authors of paper [50] have already shown the advantages

of using a few seconds of bootstrapping data, instead of a particular speech segment only, for word segmentation. Inspired by such formulations, my experimental proposal is
$$f_C[k] = f_B[k]\, f_A[k], \quad (0 \le k \le T-1),$$
which reflects a ZCR-based weighted set of energy components. Although f_A[], originally described in [1], is normalised considering the whole input signal energy distributed along time, so that its kth position is influenced by the entire speech, f_B[] associated with PA contains only local spectral information. As a result, the global temporal workload [1] is weighted on the basis of local frequency content, semantically characterising the intended physical principle. Once f_C[] is obtained, the strategy for decision becomes:

For (1 ≤ k ≤ T − 2):
    If (f_C[k] is the mid-point between the beginning and the end of a region below H_C)
        $s_{\lfloor kL(100-V)/100 \rfloor}$ is defined as being a word boundary.

The threshold
$$H_C = H_B \cdot \frac{1}{T}\sum_{k=0}^{T-1} f_C[k] = 0.197 \cdot \frac{1}{T}\sum_{k=0}^{T-1} f_C[k],$$
a hard threshold for usage in conjunction with the new feature vector f_C[], is formed on the basis of two contributions, just as that vector was: the local spectral threshold, H_B, and the temporal vector mean, which represents a distributed piece of information regarding it.

Fig. 14 illustrates s[], f_C[], H_C and the exact word boundaries, according to the corresponding wrd text file contained in the same TIMIT folder that includes s[]. The close, and certainly acceptable, matches between the points that satisfy the condition imposed by the proposed strategy and the real boundaries are clearly perceptible. Particularly, the highest positive spike in the black curve of Fig. 14, close to the position of $I_5$, expresses the most improbable point for a word boundary in s[], since it represents a considerable activity of the speaker's vocal apparatus. Consequently and in opposition, its deepest downward spikes, the ones we are interested in, indicate the most probable points for the word boundaries in s[], because they represent a more relaxed speaker's work to utter, in view of a semantic shift in the way of speaking.

Fig. 14. The speech segmentation experiment, with the tags $I_k$, (0 ≤ k ≤ 13), delimiting each spoken word. Olive and black, respectively, are the colors used to plot the input speech signal s[], i.e., the file /test/dr1/mdab0/sa1.wav from TIMIT, and the feature vector f_C[]. The red horizontal line corresponds to the threshold H_C. As in the previous figure, SIL identifies the silent periods. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Algorithm 15 presents the complete source code that implements the procedures described for isolated-sentence word segmentation, so that the readers can easily replicate the experiments.

The global accuracy of the proposed approach was evaluated on the basis of a recent tool called R-value, explained in [49] and defined as
$$R = 1 - \frac{|r_1| + |r_2|}{200},$$
considering
$$r_1 = \sqrt{(100 - HR)^2 + (OS)^2} \quad \text{and} \quad r_2 = \frac{-OS + HR - 100}{\sqrt{2}}.$$
OS and HR are known as over-segmentation and hit-rate, respectively. The latter indicates the percentage of boundaries correctly detected in relation to the total number of existing ones. The former, on the other hand, measures, in percent, how much the total number of boundaries detected, both correctly and incorrectly, exceeds the total number of existing ones. Prominent results, fully acceptable in terms of the points labelled as word boundaries, were observed when the proposed approach was applied to all the 630 sentences sa1 spoken by the TIMIT speakers. Disregarding their borders, i.e., their start and end points, there are a total of 10 × 630 = 6300 boundaries between the 11 × 630 = 6930 existing words in those sentences. Since OS and HR were measured as 1.7378% and 93.4755%, respectively, R = 0.9370 = 93.70%, consolidating the hypothesis that f_C[] contains useful information for word segmentation and motivating its further investigation.

In comparison with state-of-the-art procedures [50–56], the proposed algorithm is novel, introducing the application of ZCR-based weighted energy components for isolated-sentence word

segmentation. Furthermore, it presents the following advantages. First, it is based on a simple inspection of a feature vector originating from the most humble concepts involving spectral and temporal analysis, i.e., ZCRs and signal energy, respectively. Second, and in contrast with some of the usual procedures just cited, its order of computational complexity, both in terms of space and time, is linear (O(M)) in relation to the input signal length (M), allowing real-time implementations. This linearity is a direct consequence of the strategies adopted to define B1, A1 [1] and Algorithm 15, because all of them work based on a direct traversal of the input signal under analysis. Lastly, saving the appropriate proportions, it is as accurate as the above-referenced similar techniques, which report their successful results within the range from 54 to 95% for English, French, Chinese and also ancient language corpora.

Obviously, the strategy I have just presented does not take into account complex lexical issues and cannot be regarded as the definitive algorithm for word segmentation; nevertheless, it balances creativity, simplicity and accuracy, satisfying the objective of this study and providing the initial insights for additional research in the same direction.

Algorithm 15: C++ function for word segmentation.

#include <cstdio>

void word_segmentation(double* s, int M)
{
    double mean = 0;
    for(int k = 0; k < M; k++)
        mean += s[k]/(double)(M);
    for(int k = 0; k < M; k++)
        s[k] -= mean; // at this point, the arithmetic mean of the input signal is 0
    int L = 512; // window length (32 ms at 16,000 samples per second)
    int V = 50;  // overlap between consecutive windows, in %
    int T = (int)((100*M - L*V)/((100 - V)*L));
    double* f_B = new double[T];
    double E = 0; // E represents the total energy over all the positions of the window, required to normalise f_A[]
    double* f_A = new double[T];
    for(int k = 0; k < T; k++)
    {
        f_B[k] = 0;
        f_A[k] = 0;
        for(int i = k*((int)(((100 - V)/100.0)*L)); i < k*((int)(((100 - V)/100.0)*L)) + L - 1; i++)
        {
            f_B[k] += (s[i]*s[i + 1] < 0)?1:0;
            f_A[k] += s[i]*s[i];
        }
        f_B[k] /= (double)(L - 1); // PA normalisation
        E += f_A[k];
    }
    for(int k = 0; k < T; k++)
        f_A[k] /= E; // normalisation
    double* f_C = new double[T];
    for(int k = 0; k < T; k++)
        f_C[k] = f_B[k]*f_A[k];
    mean = 0;
    for(int k = 0; k < T; k++)
        mean += f_C[k];
    mean /= (double)(T);
    double H_C = 0.197*mean;
    int k = 0;
    int start_region, end_region;
    while((k < T) && (f_C[k] < H_C)) // skip an initial region below H_C
        k++;
    do // search for regions below H_C
    {
        while((k < T) && (f_C[k] > H_C))
            k++;
        start_region = k;
        while((k < T) && (f_C[k] < H_C))
            k++;
        end_region = k;
        if(k < T)
            std::printf("\n Word boundary detected at s[%d].",
                        (int)((end_region + start_region)/2.0) * (int)(((100 - V)/(100.0)) * L));
    } while(k < T);
    delete[] f_A;
    delete[] f_B;
    delete[] f_C;
}

4.2. Image analysis and re-synthesis for border extraction

At an early stage, most of the image processing algorithms for PRKbS have to deal with a segmentation step [57–59], which refers to the analysis of the input signal aiming to highlight meaningful areas of interest. Usually, such a process is based on border extraction, a strategy that identifies the pixels responsible for defining the boundaries between objects or specific parts of the image. Particular regions in which the fundamental frequencies are relatively high are most likely to represent the borders; thus, ZCRs, capturing those spectral components, are considered in this experiment instead of the usual methods, such as watershed [57] and filtering operations [60].

In order to establish an overall comparison, I state that ZCRs have one great advantage over the usual techniques: they do not take into account the frequencies higher than the fundamental one, completely disregarding minor variabilities among neighbour pixels which could hinder a precise characterisation of the borders. The direct consequence is that filtering approaches, such as those detailed in [60–63], are not better than ZCRs for border extraction, because filters are never ideal in practice, i.e., contaminations resulting from imperfect filtering are not present when ZCRs are used. Thus, the proposed approach, which should definitely not be understood as a filtering operation, consists of an FE procedure for PRKbS playing the role of a DSP algorithm in which convolutional filtering is advantageously replaced by an analysis followed by re-synthesis of the input 2D signal.

My strategy requires ZCRs from all the minor constant-area regions over the input image to be interrelated. Neither B1 nor B3 is suitable to perform this task. Instead, B2, particularly associated with EA, is ideal and so became the choice. The detailed scheme follows, where the ideal number of pixels per rectangle is determined empirically. Obviously, it is at least equal to four; otherwise, no ZCR could be counted. My tests have shown that sixteen is a great option for synthesising target images of different resolutions, allowing a reasonable visual quality.

• Extract the raw data from the input gray-scaled image, storing it in the N×M matrix m[][];
• Apply B2 normalised with EA, choosing Q so that ψQ contains elements extracted from non-overlapping rectangles with about that number of pixels;
• Select either ψ1, or ψ2, or ..., or ψQ to synthesise the target image, ψ1 and ψQ being, respectively, the options for worst and best resolutions;
• Synthesise the target image, m[][], as follows: for the kth sample of ψQ[], (0 ≤ k < X²), draw a ⌊N/X⌋ × ⌊M/X⌋ rectangle, using the color (maximum_level_of_black_color) · ψQ[k], from the point (⌊k/X⌋, k%X) to the point (⌊k/X⌋ + ⌊N/X⌋, k%X + ⌊M/X⌋), % being the remainder of the integer division, just as in the C/C++ programming language.

The strategy adopted to define the color of each rectangle consists of a simple linear proportion of the maximum level of black color, usually equal to 255 for 8-bit images [60], based on the corresponding kth value of ψQ[], which varies from 0 to 1. The former extreme forces a white rectangle to be plotted, which corresponds, in practice, to the absence of plot, since a white background is assumed. The latter creates a black rectangle. Intermediary gray colors, produced for 0 < ψQ[k] < 1, usually appear close to the black ones, stimulating the completion effects during the paintings.

Aiming to exemplify the proposed approach, Fig. 15(a) shows the digit 5, handwritten and digitised as a matrix with N = 508 rows and M = 508 columns. The image was analysed at the resolution provided by using Q = 31, which produces a feature vector containing 2² + 3² + 5² + ... + 113² + 127² elements, according to Section 2. Then, sub-vectors ψ1, ψ2, ψ7, ψ11, ψ18 and ψ31,

composed respectively of X² = 2² = 4, X² = 3² = 9, X² = 17² = 289, X² = 31² = 961, X² = 61² = 3721 and X² = 127² = 16129 elements, were separately used to synthesise the target images shown in Fig. 15(b)–(g). Clearly, the higher the resolution is, the better the characterisation of the border will be.

Fig. 15. The input image and its synthesised versions.

When performed using the entire database of handwritten digits downloaded from [64], this experiment succeeds. Complementarily, Fig. 16(a) and (b) show, respectively, the 508 × 508 Elsevier logo and its corresponding borders extracted on the basis of the proposed approach adopting ψ55. All the tests performed allow me to state that the proposed approach produces equivalent perceptual results in comparison with those obtained when algorithms of cutting-edge nature, such as [65–67], are adopted. Furthermore, in addition to the advantages discussed above, it also presents an attractive order of time and space complexities [15]. Finally, based on the fact that my technique completely differs from the current ones, in its nature and essence, I refrain from establishing more detailed analogies. ZCR-based algorithms for 2D signal processing cannot even be found in the literature, especially when it comes to border extraction.

Fig. 16. (a): The Elsevier logo; (b): corresponding image containing only borders.

Algorithm 16 contains a C/C++ function that receives m[][] and its dimensions as input, modifying them so that the border image is synthesised. Basically, the function corresponds to Algorithm 12 adapted so that f[] stores just the analysis of m[][] at one particular resolution, i.e., f[] = ψQ[], implying that X becomes a unique integer number instead of a vector. For instance, if m[][] corresponds to the raw data extracted from the image shown in Fig. 15(a), then X is set to 127 and f[] = ψ31[] becomes a 127² = 16,129-sample-long vector. In the specific case treated in this example application, for which f[] corresponds exactly to ψQ, X may also assume values of non-prime numbers. An intelligent choice is to set it as a divisor of both the original N and M in order to avoid discarding some parts of the image, as exemplified in Fig. 8b of Section 2.

Algorithm 16: C++ code for image border extraction. The function receives three parameters, i.e., m[][] and the addresses of N and M, modifying all of them: the first one, so that it becomes the synthesised squared image (m[][]), and the second and third ones, so that a new size can be set.

void border_extraction(double** m, int* N, int* M)
{
    double mean = 0; // adjust mean to be zero
    for(int p = 0; p < (*N); p++)
        for(int q = 0; q < (*M); q++)
            mean += m[p][q]/(double)((*M)*(*N));
    for(int p = 0; p < (*N); p++)
        for(int q = 0; q < (*M); q++)
            m[p][q] -= mean;
    int X = ; // the desired value, i.e., number of image divisions to produce psi_Q; in the example of Fig. 15(a), it was set to 127
    int total_size_of_f = X*X;
    double* f = new double[total_size_of_f];
    for(int i = 0; i < total_size_of_f; i++)
        f[i] = 0;
    int L1 = (int)((*N)/(double)(X)); // rows per window
    int L2 = (int)((*M)/(double)(X)); // columns per window
    for(int i = 0; i < L1*X - 1; i++) // vertically adjacent pixels
        for(int j = 0; j < L2*X; j++)
            f[(i/L1)*X + (j/L2)] += (m[i][j]*m[i + 1][j] < 0)?1:0;
    for(int i = 0; i < L1*X; i++) // horizontally adjacent pixels
        for(int j = 0; j < L2*X - 1; j++)
            f[(i/L1)*X + (j/L2)] += (m[i][j]*m[i][j + 1] < 0)?1:0;
    double highest_ZCR = 0; // image normalisation (EA)
    for(int l = 0; l < total_size_of_f; l++)
        if(f[l] > highest_ZCR)
            highest_ZCR = f[l];
    if(highest_ZCR > 0)
        for(int l = 0; l < total_size_of_f; l++)
            f[l] /= highest_ZCR;
    for(int k = 0; k < total_size_of_f; k++) // image synthesis
        for(int p = k/X; p < k/X + L1 - 1; p++)
            for(int q = k%X; q < k%X + L2 - 1; q++)
                m[p][q] = 255*f[k];
    delete[] f;
    *N = X; // adjust size
    *M = X; // adjust size
}

4.3. Biomedical signal analysis

In one of my previous works from 2007 [68], an algorithm to distinguish between healthy speech (HS) and pathologically-affected speech (PAS) was presented. The former and the latter encompass, respectively, the individuals with no abnormality in their vocal apparatus and the ones with a pathology in their larynxes. In that study, time-frequency features [69] served as input to a Support Vector Machine (SVM) [70] dedicated to examining the four-second sustained /a/ vowel sounds, as in the word "dogma", emitted by people enrolled in our system. Similarly, in this experiment, I use speech data digitised at 22,050 Hz, 16-bit [71], using a wideband microphone in a sound cabinet, from fourteen healthy subjects and from fourteen individuals with Reinke's edema in their larynx. All the twenty-eight voices were accredited by medical professionals based on specific hardware tools which allow precise image examinations and detailed vocal analyses.

Initially, the radiation effects from the speaker's lips were removed, as a pre-processing stage known as pre-emphasis [48]-pp.168, [72]-pp.25, before applying the proposed approach. The first-order finite impulse response (FIR) high-pass filter [13] whose coefficients are g[] = {1, −0.95} was used, via convolution [13], to perform this task. Considering s[], of length M, as being an input

speech signal, the procedure is:
$$s_k \leftarrow s_k - 0.95\, s_{k-1}, \quad (1 \le k < M).$$

Like any usual classification scheme, this approach intends to offer the classifier a set of fixed-length feature vectors, implying that B1 is useless. On the other hand, both B2 and B3 output a set of T elements regardless of M, the latter being preferable because it allows the regularity of ZCRs to be analysed along a period. From Section 1.2 we have learnt that ZCRs are more likely to capture the fundamental frequencies of the speech signals, i.e., their pitch [12] (F0), disregarding the resonances of the vocal tract, also known as the formants (F1, F2, ...), which correspond to their higher frequencies. Complementarily, based on the fact that B3 intrinsically normalises the frequencies it captures, the proposed approach registers only the way pitches vary, disregarding their values themselves. Thus, for any speaker, HS is expected to keep an almost linear variation in response to the regular and non-excessive effort performed by the vocal folds and associated organs during vibration. On the other hand, PAS usually shows irregular variations due to an excessive, and sometimes unresponsive, effort performed by the speaker's vocal system.

Table 1
Results obtained from the experiment on biomedical signal analysis.

Value chosen for C              19.9%                24.9%                33.3%
Corresponding value of T        ⌊100/19.9⌋ = 5       ⌊100/24.9⌋ = 4       ⌊100/33.3⌋ = 3
Corresponding confusion matrix       PAS  HS              PAS  HS              PAS  HS
                                PAS   7    0         PAS   7    0         PAS   6    1
                                HS    0    7         HS    2    5         HS    1    6
Resulting accuracy              (7+7)/14 = 100%      (7+5)/14 = 85.71%    (6+6)/14 = 85.71%

Considering that seven signals from each class were adopted as their representative models and the other seven were used to test the proposed approach, an ordinary absolute distance (AD)

5. Conclusions

This study, dedicated to our neurocomputing community, was carefully written, polished and reviewed to serve as a tutorial on ZCRs for both 1D and 2D DSP applications designed for PRKbS. All the concepts I described correspond to the outcome of a wide and detailed research work. The readers can observe that, despite the fact that ZCRs are well known in the literature, the different ways I show their applicability, mainly concerning the 2D cases, are totally new.

Specifically, three methods for feature extraction based on ZCRs were presented just after the literature review section: B1, which is the simplest one and is intended to produce variable-length feature vectors, is useful to search for a specific event or characteristic in a digital signal, such as word segmentation or the distinction between voiced and unvoiced frames of a speech signal. B2, on the other hand, for which an application on image border extraction based on analysis and re-synthesis was exhibited, is adopted to inspect how ZCRs are distributed along time, or space, at different levels of resolution. Finally, B3, exemplified for the distinction of healthy and pathologically-affected biomedical signals, is useful in searching for the possible variabilities of ZCRs as time or space advances. Three different types of normalisations, named TA, PA and EA, were also designed to work together with B1 and B2. Furthermore, sixteen algorithms, sixteen figures, six numerical examples and one table were included in the text.

The readers may have learnt from many references, such as [1], that ordinary PRKbS require a classifier to be associated with the features extracted from raw data. The more such features linearly separate the mixed data from different classes in a correct manner, the less sophisticated the classifier needs to be, and vice-versa. Particularly based on the example applications described in Section 4, I highlight one aspect of the proposed approaches: the simplest existing classifiers, i.e., HT and AD, were used, implying that the ZCR-based features performed their work remarkably well. Possibly, this is due to the fact that ZCRs are, by themselves, neurocomputing agents, as shown in Section 2. To treat much more complex problems, the potential of ZCRs may be enlarged by associat-
measurement [73] was selected to serve as the classier. The dis- ing them with different types of neural networks [80], SVMs [[81
tances from each testing vector to all the template models are 83]], hidden markov models (HMMs) [84], paraconsistent [85], and
registered, then, the class for which the lowest one belongs to, others.
guides the assignment. Table 1 shows the best results obtained
 2 2 2
In relation to noisy inputs, B3 is less inuenced than B1 and
for all the possible 147 = ( 7!(14
14!
7 )!
) = (3432 ) hold-out cross- B2 when the noise, regardless of being white, pink, red, and so
validations procedures [74], considering different options for C. on [7,13], is uniformly distributed along the signal under analysis.
The respective accuracies suggest that a ner analysis, i.e., C = This is because that method describes the way ZCRs vary, instead
19.9%, is required to characterise important data. For C = 33.3%, of counting them, implying that the artifacts introduced affect the
excessive information is grouped together in each of the T = 3 co- entire signal more or less in the same manner, vanishing their ef-
ecients of the feature vectors, causing the misclassication of one fect over B3 .
PAS member that was labelled as being an HS one. Although for Concluding, the proposed approaches provide a valid contribu-
C = 24.9%, there are also incorrect assignments, HS members la- tion for both young researchers, who are expected to take advan-
belled as being PAS do not cause serious consequences, as in the tage of the fundamental concepts and basic inputs drawn from
previous case in which the opposite occurs. DSP and PRKbS theory, and experienced professionals, for whom
The above-mentioned characteristics of HS and PAS allow a rel- this text may serve as an initial insight to the project of fruit-
ative generalisation of the results presented in Table 1, despite ful and prominent algorithms. As mentioned in [1], this study also
the modest size of the database, for which the lack of volunteers draws the DSP and PRKbS scientic communities attention to con-
and the rigorous accreditations prevented further expansion. Thus, sider the use of ZCRs, somewhen in conjunction with signal en-
I consider a meaningful and relevant outcome was obtained. De- ergy [1] or other features, to conform creativity, simplicity, and
tailed comparisons with similar state-of-the-art algorithms, such accuracy.
as those documented in [68,7579], were avoided because their All the data used during the experiments, excluding TIMIT
databases and pathologies differ, however, the proposed approach which is controlled by the Linguistic Data Consortium (LDC), are
is overall as accurate as, and much simpler than, those strategies. available to the scientic community upon prior request 3 so that
Furthermore, AD was purposely selected to play the role of the the procedures could be reproduced. Further research related with
classier just to emphasise the relevance of the ZCR-based features, ZCRs focuses both on minor changes in the proposed approaches
i.e., due to the potential solution offered by the latter, the former,
consisting of the simplest existing possibility, performs an effort-
less job. 3
Please, send requests to guido@ieee.org
so that more specific issues are properly addressed. An intriguing open question remains: are there humbler features than ZCRs that are capable of achieving similar or better results for basic spectral signal description?

Acknowledgements

I am very grateful to CNPq (Conselho Nacional de Pesquisa e Desenvolvimento), in Brazil, for the grants provided through process 306811/2014-6 to support this research.

References

[1] R.C. Guido, A tutorial on signal energy and its applications, Neurocomputing 179 (2016) 264–282.
[2] J. Xu, A multi-label feature extraction algorithm via maximizing feature variance and feature-label dependence simultaneously, Knowl. Based Syst. 98 (2016) 172–184.
[3] Q. Zhou, H. Zhou, T. Li, Cost-sensitive feature selection using random forest: selecting low-cost subsets of information features, Knowl. Based Syst. 95 (2016) 1–11.
[4] L. Yijing, Adapted ensemble classification algorithm based on multiple classification systems and feature selection for classifying multi-class unbalanced data, Knowl. Based Syst. 94 (2016) 88–104.
[5] S. Garcia, J. Luengo, F. Herrera, Tutorial on practical tips of the most influential data preprocessing algorithm in data mining, Knowl. Based Syst. 98 (2016) 1–29.
[6] Y. Meng, J. Liang, Y. Qian, Comparison study of orthonormal representations of functional data in classification, Knowl. Based Syst. 97 (2016) 224–236.
[7] S.M. Alessio, Digital Signal Processing and Spectral Analysis for Scientists: Concepts and Applications, 1st ed., Springer, 2016.
[8] B. Stroustrup, The C++ Programming Language, 4th ed., Addison-Wesley Professional, 2013.
[9] M. Steenbeck, A contribution to the behavior of short AC arcs during the current zero crossing, Z. Phys. 65 (1-2) (1930) 88–91.
[10] F.M. Young, J.C. Grace, Zero crossing intervals of a sine wave in noise, J. Acoust. Soc. Am. 25 (4) (1953) 832.
[11] J.P. Ertl, Detection of evoked potentials by zero crossing analysis, Electroencephalogr. Clin. Neurophysiol. 18 (6) (1965) 630–631.
[12] L. Deng, D. O'Shaughnessy, Speech Processing: A Dynamic and Optimization-oriented Approach, CRC Press, 2003.
[13] A.V. Oppenheim, R.W. Schafer, Discrete-time Signal Processing, 3rd ed., Prentice-Hall, 2009.
[14] S. Haykin, B.V. Veen, Signals and Systems, 2nd ed., Wiley, 2002.
[15] S. Arora, Computational Complexity: a modern approach, Cambridge University Press, 2009.
[16] S. Goswami, P. Deka, B. Bardoloi, D. Dutta, D. Sarma, A novel approach for design of a speech enhancement system using NLMS adaptive filter and ZCR based pattern identification, in: Proceedings of the 2013 1st International Conference on Emerging Trends and Applications in Computer Science (ICETACS), 2013, pp. 125–129.
[17] H.-M. Park, R.M. Stern, Spatial separation of speech signals using amplitude estimation based on interaural comparisons of zero-crossings, Speech Commun. 51 (1) (2009) 15–25.
[18] A. Ghosal, R. Chakraborty, R. Chakraborty, S. Haty, B.C. Dhara, S.K. Saha, Speech/music classification using occurrence pattern of ZCR and STE, in: Proceedings of the Third International Symposium on Intelligent Information Technology Application (IITA), 3, 2009, pp. 435–438.
[19] R.R. Shenoy, C.S. Seelamantula, A zero-crossing rate property of power complementary analysis filterbank outputs, IEEE Signal Process. Lett. 22 (12) (2015) 2354–2358.
[20] A.V. Levenets, C.E. Un, Method for evaluating periodic trends in measured signals based on the number of zero crossings, Meas. Tech. 58 (4) (2015) 381–384.
[21] R.R. Shenoy, C.S. Seelamantula, Spectral zero-crossings: localization properties and application to epoch extraction in speech signals, in: Proceedings of the International Conference on Signal Processing and Communications (SPCOM), 2012, pp. 1–5.
[22] M. Jalil, F.A. Butt, A. Malik, Short-time energy, magnitude, zero crossing rate and autocorrelation measurement for discriminating voiced and unvoiced segments of speech signals, in: Proceedings of the International Conference on Technological Advances in Electrical, Electronics and Computer Engineering (TAEECE), 2013, pp. 208–212.
[23] R.G. Bachu, S. Kopparthi, B. Adapa, B.D. Barkana, Voiced/unvoiced decision for speech signals based on zero-crossing rate and energy, in: K. Elleithy (Ed.), Advanced Techniques in Computing Sciences and Software Engineering, Springer, 2010, pp. 279–282.
[24] Y.-I. Kim, H.-Y. Cho, S.-H. Kim, Zero-crossing-based channel attentive weighting of cepstral features for robust speech recognition: the ETRI 2011 CHiME challenge system, in: Proceedings of the Interspeech, 2011, pp. 1649–1652.
[25] S.J. An, R.M. Kil, Y.-I. Kim, Zero-crossing-based speech segregation and recognition for humanoid robots, IEEE Trans. Consum. Electron. 55 (4) (2009) 2341–2348.
[26] A.S. Zandi, R. Tafreshi, M. Javidan, Predicting epileptic seizures in scalp EEG based on a variational bayesian gaussian mixture model of zero-crossing intervals, IEEE Trans. Biom. Eng. 60 (5) (2013) 1401–1413.
[27] M. Phothisonothai, M. Nakagawa, A complexity measure based on modified zero-crossing rate function for biomedical signal processing, in: Proceedings of the 13th International Conference on Biomedical Engineering (ICBME), 23, 2009, pp. 240–244.
[28] M.I. Khan, M.B. Hossain, A.F.M.N. Uddin, Performance analysis of modified zero crossing counts method for heart arrhythmias detection and implementation in HDL, in: Proceedings of the International Conference on Informatics, Electronics and Vision (ICIEV), 2013, pp. 1–6.
[29] C.-H. Wu, H.-C. Chang, P.-L. Lee, Frequency recognition in an SSVEP-based brain computer interface using empirical mode decomposition and refined generalized zero-crossing, J. Neurosci. Methods 196 (1) (2011) 170–181.
[30] D. Guyomar, M. Lallart, K. Li, A self-synchronizing and low-cost structural health monitoring scheme based on zero crossing detection, Smart Mater. Struct. 19 (4) (2010) 045017.
[31] L. Florea, C. Florea, R. Vranceanu, C. Vertan, Zero-crossing based image projections encoding for eye localization, in: Proceedings of the 20th European Signal Processing Conference (EUSIPCO), 2012, pp. 150–154.
[32] S. Watanube, T. Kotnatsu, T. Saito, A stabilized zero-crossing representation in the wavelet transform domain and its extension to image representation for early vision, in: IEEE TENCON - Digital Signal Processing Applications, 1996, pp. 496–501.
[33] J.G. Daugman, Pattern and motion vision without laplacian zero crossings, J. Opt. Soc. Am. A-5 (7) (1988) 1142–1148.
[34] K.-L. Du, M.N.S. Swamy, Neural Networks and Statistical Learning, Springer, 2014.
[35] N. Nedjaha, F.M.G. França, M.D. Gregorio, L.M. Mourelle, Weightless neural systems, Neurocomputing 183 (2016) 1–2.
[36] H.C.C. Carneiro, F.M.G. Franca, P.M.V. Lima, Multilingual part-of-speech tagging with weightless neural networks, Neural Netw. 66 (2015) 11–21.
[37] G.G. Lockwood, I. Aleksander, Predicting the behaviour of g-RAM networks, Neural Netw. 16 (1) (2003) 91–100.
[38] T.F. Quatieri, Discrete-time Speech Signal Processing: Principles and Practice, Upper Saddle River, NJ: Prentice Hall, 2001.
[39] W. Chou, B.H. Juang, Pattern Recognition in Speech and Language Processing, Boca Raton: CRC Press, 2003.
[40] H. Beigi, Fundamentals of Speaker Recognition, New York: Springer, 2011.
[41] R.C. Guido, L.S. Vieira, S. Barbon Jr., A neural-wavelet architecture for voice conversion, Neurocomputing 71 (1-3) (2007) 174–180.
[42] T. Ogunfunmi, M. Narasimha, Principles of Speech Coding, CRC Press, 2010.
[43] K. Skoruppa, et al., The role of vowel phonotactics in native speech segmentation, J. Phonet. 49 (2015) 67–76.
[44] TIMIT speech corpus, Linguistic Data Consortium (LDC), https://catalog.ldc.upenn.edu/LDC93S1.
[45] C. Kim, K.-d. Seo, Robust DTW-based recognition algorithm for hand-held consumer devices, IEEE Trans. Consum. Electron. 51 (2) (2005) 699–709.
[46] X. He, L. Deng, Discriminative Learning for Speech Recognition, Morgan and Claypool Publishers, 2008.
[47] B. Atal, L. Rabiner, A pattern recognition approach to voiced-unvoiced-silence classification with applications to speech recognition, IEEE Trans. Audio, Speech, Lang. Process. 1 (24) (1976) 201–212.
[48] J. Harrington, S. Cassidy, Techniques in Speech Acoustics, The Netherlands: Kluwer Academic Publishers, 1999.
[49] O.J. Rosanen, U.K. Laine, T. Altosaar, An improved speech segmentation quality measure: the r-value, in: Proceedings of the Interspeech, 2009, pp. 1851–1854.
[50] S. Brognaux, T. Drugman, HMM-based speech segmentation: improvements of fully automatic approaches, IEEE-ACM Trans. Audio, Speech, Lang. Process. 24 (1) (2016) 5–15.
[51] A. Stan, et al., ALISA: an automatic lightly supervised speech segmentation and alignment tool, Comput. Speech Lang. 35 (2016) 116–133.
[52] R.H. Baayen, C. Shaoul, J. Willits, M. Ramscar, Comprehension without segmentation: a proof of concept with naive discriminative learning, Lang. Cognit. Neurosci. 31 (1) (2016) 106–128.
[53] F. Stahlberg, T. Schlippe, S. Vogel, T. Schultz, Word segmentation and pronunciation extraction from phoneme sequences through cross-lingual word-to-phoneme alignment, Comput. Speech Lang. 35 (2016) 234–261.
[54] K.G. Estes, C. Lew-Williams, Listening through voices: infant statistical word segmentation and meaning acquisition through cross-situational learning, Dev. Psychol. 51 (11) (2015) 1517–1528.
[55] Ok. Rasanen, H. Rasilo, A joint model for word segmentation and meaning acquisition through cross-situational learning, Psychol. Rev. 122 (4) (2015) 792–829.
[56] L. White, S.L. Mattys, L. Stefansdottir, Beating the bounds: localised timing cues for word segmentation, J. Acoust. Soc. Am. 138 (2) (2015) 1214–1220.
[57] F. Nery, J.S. Silva, N.C. Ferreira, F. Caramelo, R. Faustino, An algorithm for the pulmonary border extraction in PET images, Proc. Technol. 5 (2012) 876–884.
[58] L.H. Son, T.M. Tuan, A cooperative semi-supervised fuzzy clustering framework for dental x-ray image segmentation, Expert Syst. Appl. 46 (2016) 380–393.
[59] X.-Y. Wang, Pixel classification based color image segmentation using quaternion exponent moments, Neural Netw. 74 (2016) 1–13.
[60] M. Nixon, Feature Extraction & Image Processing for Computer Vision, 3rd ed., Academic Press, 2012.
[61] P. Zhang, T.D. Bui, C.Y. Suen, Wavelet feature extraction for the recognition and verification of handwritten numerals, Keynote Address at 6th International Program on Wavelet Analysis and Active Media Technology. Available at http://users.encs.concordia.ca/bui/pdf/Keynote.pdf.
[62] S.E.N. Correia, J.M. Carvalho, R. Sabourin, On the performance of wavelets for handwritten numerals recognition, in: Proceedings of the 16th International Conference on Pattern Recognition, 3, 2002, pp. 127–130.
[63] X. You, L. Du, Y. Cheung, Q. Chen, A blind watermarking scheme using new nontensor product wavelet filter banks, IEEE Trans. Image Process. 19 (12) (2010) 3271–3284.
[64] The MNIST database of handwritten digits, Available at http://yann.lecun.com/exdb/mnist/.
[65] A. Pratondo, C.-K. Chui, S.-H. Ong, Robust edge-stop functions for edge-based active contour models in medical image segmentation, IEEE Signal Process. Lett. 23 (2) (2016) 222–226.
[66] Z.M. Hadrich A, A. Masmoudi, Bayesian expectation maximization algorithm by using b-splines functions: application in image segmentation, Math. Comput. Simulat. 120 (2016) 50–63.
[67] M. Liao, Automatic segmentation for cell images based on bottleneck detection and ellipse fitting, Neurocomputing 173 (2016) 615–622.
[68] E. Fonseca, R.C. Guido, P.R. Scalassara, C.D. Maciel, J.C. Pereira, Wavelet time-frequency analysis and least-squares support vector machine for the identification of voice disorders, Comput. Biol. Med. 37 (4) (2007) 571–578.
[69] P. Addison, J. Walker, R.C. Guido, Time-frequency analysis of biosignals, IEEE Eng. Biol. Med. Mag. 28 (5) (2009) 14–29.
[70] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, 2nd ed., Wiley-Interscience, 2000.
[71] M. Bossi, E. Goldberg, Introduction to Digital Audio Coding and Standards, Kluwer, 2003.
[72] F. Muller, Invariant Features and Enhanced Speaker Normalization for Automatic Speech Recognition, Logos Verlag, 2013.
[73] V. Serdarushich, Analytic Geometry, CreateSpace Independent Publishing Platform, 2015.
[74] J.H. Kim, Estimating classification error rate: repeated cross-validation, repeated hold-out and bootstrap, Comput. Stat. Data Anal. 53 (11) (2009) 3735–3745.
[75] Z. Ali, I. Elamvazuthi, M. Alsulaiman, G. Muhammad, Detection of voice pathology using fractal dimension in a multiresolution analysis of normal and disordered speech signals, J. Med. Syst. 40 (1) (2016) 1–10.
[76] D. Panek, A. Skalski, J. Gajda, R. Tadeusiewicz, Acoustic analysis assessment in speech pathology detection, Int. J. Appl. Math. Comput. Sci. 25 (3) (2015) 631–643.
[77] M. Alsulaiman, Voice pathology assessment systems for dysphonic patients: detection, classification, and speech recognition, IETE J. Res. 60 (2) (2014) 156–167.
[78] M.J. Pulga, A.C. Spinardi-Panes, S.A. Lopes-Herrera, L.P. Maximino, Evaluating a speech-language pathology technology, Telemed. e-health 20 (3) (2014) 269–271.
[79] T.L. Whitehill, S. Bridges, K. Chan, Problem-based learning (PBL) and speech-language pathology: a tutorial, Clin. Linguist. Phonet. 28 (1-2) (2014) 5–23.
[80] S. Haykin, Neural Networks and Learning Machines, 3rd ed., Prentice Hall, 2008.
[81] M. Jandel, Biologically relevant neural network architectures for support vector machines, Neural Netw. 49 (2014) 39–50.
[82] Y. Leng, Employing unlabeled data to improve the classification performance of SVM and its applications in audio event classification, Knowl. Based Syst. 98 (2016) 117–129.
[83] L. Shen, Evolving support vector machines using fruit fly optimization for medical data classification, Knowl. Based Syst. 96 (2016) 61–75.
[84] A.M. Fraser, Hidden Markov Models and Dynamical Systems, Society for Industrial and Applied Mathematics, 2009.
[85] R.C. Guido, S. Barbon Jr., R.D. Solgon, K.C.S. Paulo, L.C. Rodrigues, I.N. Silva, J.P.L. Escola, Introducing the discriminative paraconsistent machine (DPM), Inf. Sci. 221 (2013) 389–402.
