LIAO LEI
ATTENTION: The Singapore Copyright Act applies to the use of this document. Nanyang Technological University Library
To Andy W. H. Khong
Abstract
A class of adaptive algorithms for acoustic echo cancellation (AEC) and speech dereverberation has been developed and analyzed in this thesis. The starting point of
this work is the affine projection (AP) based non-blind channel identification algorithms for AEC. The proposed sparseness constrained improved proportionate AP
algorithms (SC-IPAPA-I and SC-IPAPA-II) exploit the sparseness of the estimated
channel and allocate different effective step sizes accordingly. The performance of
these algorithms has been studied in the context of single-channel AEC and tracking
capability.
The performance of a blind channel identification algorithm with additive
noise is studied by providing reasons for the degradation in performance of the multichannel least-mean-square (MCLMS) algorithm in the presence of additive noise.
Subsequently, it is shown mathematically, through a cross-relation based cost
function, that minimizing the power of a filtered version of the received signals can
suppress the noise effect. The performance of MCLMS and improved MCLMS (IMCLMS) is evaluated using the normalized projection misalignment via Monte Carlo
simulations.
It is shown through simulations that the well-known normalized multichannel
frequency-domain least-mean-square (NMCFLMS) algorithm, originally developed
for acoustic impulse response (AIR) identification in the frequency domain, also
suffers from the noise robustness problem. A noise robust blind AIR estimation
algorithm is then proposed. Inspired by the trivial solution achieved by NMCFLMS
in the presence of noise, the proposed direct-path NMCFLMS algorithm with power
constraint (DP-NMCFLMS-PC) jointly applies direct-path and power constraints to
improve its robustness against noise. The DP-NMCFLMS-PC not only addresses the
noise robustness issue but also achieves fast convergence compared to NMCFLMS.
The adaptive multiple input/output inversion theorem (A-MINT) algorithm
has been developed for channel equalization with application to speech dereverberation. In order to increase its convergence rate, the proposed algorithm iteratively suppresses
any undesired non-zero coefficients in the estimated Kronecker delta function. This is achieved by computing the sparseness measure of the estimated
Kronecker delta function and using it as an additional constraint to A-MINT.
Unlike existing channel equalization systems, the proposed auto-relation
aided MINT (A-RAM) algorithm, which achieves good equalization performance,
takes into account how the received signals are generated during its adaptation process. The differential output signals from two sub-systems are then utilized in the
cost function during equalization. Simulation results have shown that the proposed
A-RAM algorithm can achieve a higher rate of convergence, leading to a better dereverberated speech signal compared to existing MINT-based equalization algorithms.
Acknowledgment
This thesis is the result of four years of research and I would like to take this
opportunity to thank many people who have contributed towards this work in one
way or another.
Foremost, I would like to express my sincere gratitude and appreciation to my
advisor Dr. Andy W. H. Khong, who has given me tremendous guidance and advice
on my research work. I am very grateful for his constant patience and enthusiasm
over the past four years, without which my research could not have been so enjoyable.
I am also very grateful to Dr. Woon Seng Gan, who generously hosted my
first year of research in the Digital Signal Processing Lab. I sincerely thank him for
his valuable comments and advice on my research.
For the non-scientific side of my thesis, I particularly want to thank my wife
Zhang Yuan for being my eternal sunshine. I thank my parents, parents-in-law and
my sister for their care and support shown to me through the entire process. I also
thank the technical staff Mr. Chu Chun Chung, Mr. Yeo Sung Kheng and Mr. Ong
Say Cheng for their great assistance in the Lab. I would also like to express my sincere
thanks to my teammates for their valuable comments.
Liao Lei
Contents
Abstract
Acknowledgment
Abbreviations
1 Introduction
2.1 Introduction
2.4 Stability analysis
2.5 Computational complexity
2.6 Simulation results
2.7 Conclusion
3.1 Introduction
3.2 Problem formulation
3.4.1 Convergence behaviour
3.5 Simulation results
3.6 Conclusion
4.1.2 Harmonic filtering
4.4.1 Determination of [...] estimation
4.5 Simulation results
4.6 Conclusion
5.1 Introduction
5.5 Simulation Results
5.6 Conclusion
6 Adaptive Channel Equalization Exploiting Segregated Sub-systems
6.1 Introduction
6.2 Algorithmic development
6.2.2 Derivation of closed-form [...]
6.2.5 Computational complexity
Conclusion
7.1 Summary
Appendices
List of Figures
2.5 Eigenvalue spread for IPAPA, SC-IPAPA-I and SC-IPAPA-II for 128 < L [...] 1024.
Simulation setup used in the image model to generate room impulse response.
4.6 Illustration of NPM results from direct-path and unit-norm constraint NMCFLMS at SNR = 25 dB.
(a) NPM of NMCFLMS and variation of (b) ||h(m)|| [...], (c) [...] ||h(m)|| [...] and (d) cost function J(m) with time at SNR = 25 dB.
5.3 Steady-state performance comparison between A-MINT and proposed SCMINT when CNR [...]
6.1 Proposed inverse filtering method for a SIMO system with two sub-systems.
BSD comparison between A-MINT, A-RMINT and proposed A-RAM algorithm using recorded AIRs with L [...]
6.8 Equalized AIRs from (a) A-MINT, (b) A-RMINT and (c) A-RAM.
Abbreviations
AIR: Acoustic impulse response
BSD:
BSI: Blind system identification
CFE:
DNE: Delta-norm estimation
DP-NMCFLMS: Direct-path NMCFLMS
DP-NMCFLMS-PC: DP-NMCFLMS with power constraint
FFT: Fast Fourier transform
HOS: Higher-order statistical
MCLMS: Multichannel least-mean-square
MINT: Multiple-input/output inversion theorem
NMCFLMS: Normalized multichannel frequency-domain least-mean-square
NPM: Normalized projection misalignment
SDR: Signal-to-distortion ratio
SOS: Second-order statistical
SRR:
List of Notations
Chapter 1
n
Sample iteration
ST(n)
SR(n)
g(n)
h(n)
ĥ(n)
Estimated h(n)
w(n)
x(n)
y(n)
ŷ(n)
Estimated y(n)
e(n)
s(n)
hi(n)
Xi(n)
Chapter 2
L
h(n)
x(n)
v(n)
Additive noise
y(n)
X(n)
Lagrangian multiplier
Lagrangian function
Identity matrix of dimension m rows and n columns
Omxn
G(n)
9z(n)
Lz
Lu
Decay constant
JC
Positive scalar
c(n)
Q(n)
Unitary matrix
T'{n)
ryi (n)
TJ( n )
Normalized misalignment
Chapter 3
The ith channel time-invariant AIR
M
Number of microphones
Concatenated
Yi(n)
eij(n)
eYj ( n)
e(n)
Xx(n)
Xv(n)
x(n)
Xp(n)
J(n)
Jp(n)
Ryiyj(n)
hi,
i.e.,
[hi hf h~]T
+ vi(n)
I~1 CLMS
Forgetting factor
(3
Lagrangian multiplier
(J'
~)(n)
1jJ' (n)
Projection misalignment
a(n)
u(n)
Difference between the true and estimated AIRs with scaling, i.e. h - a(n)h
z(n)
Difference between the estimated AIR at the nth iteration and the final estimate,
i.e., ĥ(n) - ĥ(∞)
Chapter 4
m
Frame index
b(n)
s(n)
Estimated s(n)
Is
Sampling frequency
-'tJ
e..(m)
hi(m)
hi,dp(m)
Vyi(m)
J(m)
'l9 i (m)
cpi(m)
Chapter 5
Length of inverse filter
Inverse filter of the ith channel
gi(n)
Estimated gi
g(n)
H1-
Hi(z)
Gi(Z)
D(z)
1-
hd
Sd(n)
Number of frames
Frame length
J(n)
.Jsc(n)
x(n)
g(n)
e(n)
Jar(n)
R 11
R 12
JAM(n)
hi
z(n)
Difference between the estimated and true inverse filters, i.e., g(n) - 9
a;
-1
B~
B~
N;
Chapter 1
Introduction
1.1
An echo is the repetition of sound caused by the delayed transmission of sound waves.
There are generally two major sources of echo in telecommunication systems: hybrid
echo and acoustic echo [3] [4]. A telephone network for private premises adopts a
two-wire subscriber line which is connected to the four-wire local exchange (for
long distance transmission) via a two/four-wire hybrid bridge circuit. A hybrid
echo therefore occurs when an impedance mismatch exists between the two-wire
subscriber line and the four-wire trunk line [5] [6].
[...] quality of telephone speech and becomes annoying when the echo delay is significant
or its energy is relatively high. For long distance calls that are connected via a
satellite, the echo delay could last up to a few hundred milliseconds. Taking into
account the inherent delay of up to 200 ms in the telephone network, the hybrid echo
Figure 1.1: Schematic diagram of single-channel acoustic echo cancellation (after [1]).
[Figure 1.2: acoustic propagation from a source to a microphone array in a room, showing the direct path and wall reflections.]
a direct path as well as multipaths due to the reflections from the walls and ceiling.
Figure 1.2 illustrates the acoustic propagation from the source to a microphone array
in a room. The reflected waves from the walls, ceilings and other surfaces require
a longer time to arrive at the microphones.
[...] due to the absorption coefficient of these surfaces [10] and as a result, the received
reverberant signal is a mixture of the direct-path and reflected sounds.
Although reverberation adds warmth to the sound, which is essential for music
appreciation and for enabling humans to better orientate themselves in the listening
environment [11] [12], it leads to temporal and spectral smearing of the clean speech.
This in turn distorts the frequency content of the signal. As a result, the received
speech sounds 'fuzzy', which reduces intelligibility, especially for the hearing-impaired
and elderly people [13] [14]. In addition, since reverberation alters the characteristics
and degrades the auditory quality of speech captured by a distant microphone in the
room, it degrades the performance of algorithms that have been developed for a wide
1.2
This thesis is organized as follows: Chapter 2 addresses the problem of slow convergence of the affine projection (AP) based non-blind channel identification algorithms. It begins by reviewing various AP algorithms (APAs). As will be shown,
the development of existing algorithms does not take the sparseness of an acoustic
impulse response (AIR) into account. This problem is particularly important since
an AIR can vary in sparseness depending on the environment and/or the source-microphone distance. Existing proportionate APA (PAPA) [17] and improved PAPA
(IPAPA) [18] [19] have been proposed for AEC by updating the estimated coefficients in a manner that is proportional to their magnitudes. It is worth noting that
the success of IPAPA depends on a predetermined control parameter; a pre-defined
and time-invariant control parameter may not be suitable for every system since
the sparseness of the AIR may vary depending on the acoustic environment. In
view of this, two sparseness-constrained IPAPAs (SC-IPAPA-I and SC-IPAPA-II)
are presented for fast convergence.
Chapter 3 addresses the noise robustness issue of a blind channel identification
algorithm when additive noise is present. The effect of noise on the performance
of the multichannel least-mean-square (MCLMS) algorithm [20] is first analyzed
using a cross-relation based cost function. It is shown via mathematical analysis
that constraining the power of the received signals across all channels can prevent
the algorithm from converging to the trivial solutions due to the additive noise.
A constrained cross-relation based cost function is then developed to address the
performance degradation of MCLMS in a noisy scenario. Monte Carlo simulation
results using different signal-to-noise ratios (SNRs) verify the effectiveness of the
proposed improved MCLMS (IMCLMS) algorithm.
In Chapter 4, the misconvergence problem of the normalized multichannel
frequency-domain least-mean-square (NMCFLMS) algorithm [21] is addressed for
the blind estimation of AIRs. It begins by reviewing the NMCFLMS algorithm. As will be shown, the estimated AIRs from NMCFLMS will converge
towards the null vectors in the presence of noise.
1.3
This thesis details contributions made by the author in channel identification (both
non-blind and blind, i.e., Chapters 2, 3 and 4) and acoustic channel equalization
(Chapters 5 and 6). The novelty of the proposed algorithms described in Chapter 2
is the utilization of the sparseness measure of an impulse response. Two mechanisms are proposed to achieve this.
In the first proposed sparseness-controlled IPAPA (SC-IPAPA-I), similar to SC-IPNLMS [23], additional weighting terms computed based on the sparseness of the estimated impulse response are multiplied with the
proportionate and non-proportionate terms in the conventional IPAPA. It is noted
that the success of IPAPA depends on the value of a control parameter which may
not be optimal under various operating environments. Therefore, in the second
proposed SC-IPAPA-II, the constant control parameter is replaced by a sparseness-dependent parameter. This reduces the need to pre-define the control parameter
prior to adaptation. In addition to the development of the two algorithms, stability and performance analysis is provided, which allows one to gain insights as to
why the proposed SC-IPAPA-I and SC-IPAPA-II can achieve fast convergence. The
publication related to this contribution is [24].
Another contribution of this thesis is the analysis of how noise degrades the
performance of MCLMS using a cross-relation based cost function as presented in
Chapter 3. One of the main contributions in this chapter is to show that minimizing
the cost function of MCLMS in a noisy scenario would lead to a solution where all
the estimated channels are equivalent. Further analysis shows that minimizing the
power of a filtered version of the received signals and the cross-relation based cost
function simultaneously can suppress the degradation due to noise effectively. This
chapter also forms the foundation to frequency-domain blind system identification
(BSI) algorithms described in Chapter 4.
The main contribution in Chapter 4 is the development of a fast converging
noise robust BSI algorithm. This is achieved via a two-step approach: incorporating
a power constraint to the adaptation and subsequently estimating the time instance
when the algorithm starts to misconverge. The starting point of the algorithmic
development is NMCFLMS [21] which achieves fast convergence and computational
efficiency by exploiting the inherent properties of filter adaptation in the frequency
domain [25]. However, it has been shown in [26] that NMCFLMS lacks robustness to additive noise. This misconvergence problem is addressed by exploiting
the DP-NMCFLMS algorithm [22]. By imposing a power constraint on the adaptation,
the proposed DP-NMCFLMS-PC algorithm not only achieves fast convergence but
also gains an improvement in the steady-state performance. As will be shown in
Chapter 4, another contribution is the delta-norm estimation (DNE) algorithm for
DP-NMCFLMS-PC. This algorithm performs an online misconvergence point estimation using a power constraint that is close to the power of the true AIRs. In
practice, this is only achievable by computing the power of the estimated AIRs
before the algorithm misconverges. Therefore, estimation of when the algorithm
misconverges is important for DP-NMCFLMS-PC. The proposed DNE algorithm
achieves this by monitoring the gradient of the l2-norm of the estimated AIRs. The
misconvergence point is then identified as the instance at which the change in the
gradient of the estimated AIRs is smaller than a pre-defined threshold. This
computational complexities of various algorithms are compared, which shows that A-RAM is
more computationally efficient than existing MINT-based algorithms. This work has
been accepted for publication in IEEE Transactions on Circuits and Systems I and
some parts of this work have been presented in [30].
Chapter 2
An Improved Affine Projection
Algorithm Employing Sparseness
2.1
Introduction
The use of adaptive filters for system identification has found applications in both
network echo cancellation (NEC) and acoustic echo cancellation (AEC) [23] [31].
Such adaptive filters are employed to estimate the unknown impulse response of the
system and algorithms developed for such applications require fast convergence as
well as good tracking performance. Although these requirements are important
for both applications, it is important to note that network impulse responses (NIRs)
and AIRs have different characteristics and hence adaptive algorithms developed for
algorithms is exploited when the AIR is sparse. This chapter begins by reviewing sparse adaptive algorithms in the context of the affine projection algorithm
(APA) [38] framework developed originally for NEC whereby the unknown impulse
response is sparse. One such algorithm is the proportionate APA (PAPA) [17],
which jointly utilizes the APA and the proportionate step-size technique
presented in [33]. Similar to PNLMS, the PAPA achieves fast initial convergence but slows down subsequently due to the slow convergence of the coefficients
having significantly small magnitude [35]. The improved PAPA (IPAPA) [18] [19]
combines a weighted APA and PAPA such that the proportionate term associated
with PAPA is aimed at enhancing the convergence speed of the coefficients in the
active region while the non-proportionate term arising from APA ensures fast convergence for the coefficients having small magnitude in the non-active region. In
addition, frequency-domain approaches have also been proposed for sparse system
identification [39] [40].
The contribution of this work is to further enhance the performance of IPAPA
for AEC by utilizing the sparseness measure [41] of an impulse response. As will be
discussed in Section 2.2.3, the success of IPAPA depends on a pre-defined control
parameter and it is foreseeable that the value needed for fast convergence may vary across different acoustic environments. In view of this, the sparseness of the estimated impulse response is incorporated in order to compute the time-varying weights assigned
to the proportionate and non-proportionate terms. Two mechanisms are proposed to
achieve this. In the first proposed sparseness-controlled IPAPA (SC-IPAPA-I), similar to SC-IPNLMS [23], additional weighting terms computed based on the sparseness measure of an estimated impulse response are multiplied with the proportionate
and non-proportionate terms in the conventional IPAPA. In the second proposed
SC-IPAPA-II, as opposed to existing IPAPA, SC-IPNLMS and SC-IPAPA, the pre-
2.2
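response $\mathbf{h}$ given by
\[ \mathbf{x}(n) = [x(n)\ \ x(n-1)\ \ \dots\ \ x(n-L+1)]^T, \tag{2.1} \]
\[ \mathbf{h} = [h_0\ \ h_1\ \ \dots\ \ h_{L-1}]^T, \tag{2.2} \]
where $L$ is the filter length and $[\cdot]^T$ denotes the transpose operator. The received signal is given by
\[ y(n) = \mathbf{x}^T(n)\mathbf{h} + v(n), \tag{2.3} \]
where $v(n)$ is the observation noise. For clarity of presentation, the effects of $v(n)$
in the description of algorithms will be temporarily ignored.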
2.2.1
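The APA [38] with projection order $p \le L$ is derived based on optimization subject
to multiple equality constraints such that
\[ y(n) = \mathbf{x}^T(n)\hat{\mathbf{h}}(n), \tag{2.4} \]
\[ \vdots \]
\[ y(n-p+1) = \mathbf{x}^T(n-p+1)\hat{\mathbf{h}}(n), \tag{2.5} \]
where $\hat{\mathbf{h}}(n)$ is an estimate of $\mathbf{h}$. To compute $\hat{\mathbf{h}}(n)$, the following constrained minimization is solved:
\[ \min_{\hat{\mathbf{h}}(n)} \|\hat{\mathbf{h}}(n) - \hat{\mathbf{h}}(n-1)\|_2^2, \quad \text{s.t.}\ \ \mathbf{y}(n) - \mathbf{X}^T(n)\hat{\mathbf{h}}(n) = \mathbf{0}_{p\times 1}, \tag{2.6} \]
where
\[ \mathbf{y}(n) = [y(n)\ \ y(n-1)\ \ \dots\ \ y(n-p+1)]^T \tag{2.7} \]
and the $L \times p$ matrix $\mathbf{X}(n) = [\mathbf{x}(n)\ \ \mathbf{x}(n-1)\ \ \dots\ \ \mathbf{x}(n-p+1)]$.
The solution of $\hat{\mathbf{h}}(n)$ can be obtained by taking the Lagrangian for (2.6) such that
\[ \hat{\mathbf{h}}(n) = \hat{\mathbf{h}}(n-1) + \frac{1}{2}\mathbf{X}(n)\boldsymbol{\lambda}, \tag{2.8} \]
where $\boldsymbol{\lambda}$ is a $p \times 1$ vector containing the Lagrangian multipliers. Substituting (2.8) into the constraint of (2.6) and defining
\[ \mathbf{e}(n) = \mathbf{y}(n) - \mathbf{X}^T(n)\hat{\mathbf{h}}(n-1), \tag{2.9} \]
the APA update is given by
\[ \hat{\mathbf{h}}(n) = \hat{\mathbf{h}}(n-1) + \mu\mathbf{X}(n)\left[\mathbf{X}^T(n)\mathbf{X}(n) + \delta\mathbf{I}_{p\times p}\right]^{-1}\mathbf{e}(n), \tag{2.10} \]
where $\mu$ is the step-size and $\delta$ is a positive regularization parameter [42].
It has been shown in [43] that the choice of $\delta$ is important:
if it is not chosen properly under low SNR conditions, the APA may not converge.
In order to mitigate the effects of noise, a regularized APA (RAPA) was proposed
such that [44]
\[ \delta_{\mathrm{RAPA}} = \frac{L\left(1+\sqrt{1+\mathrm{ENR}}\right)}{\mathrm{ENR}}\,\sigma_x^2, \tag{2.11} \]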
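As an illustration of the regularized APA recursion reviewed in this section, the following is a minimal NumPy sketch under stated assumptions rather than the thesis implementation. It assumes the standard APA form, in which the error vector e(n) = y(n) - X^T(n)ĥ(n-1) drives the update ĥ(n) = ĥ(n-1) + μX(n)[X^T(n)X(n) + δI]^{-1}e(n). The function names `apa_update` and `identify`, the white Gaussian input, the noiseless setting and the values of `mu`, `delta` and `p` are all hypothetical choices made for the example.

```python
import numpy as np


def apa_update(h_hat, X, y, mu=0.5, delta=1e-6):
    """One regularized APA iteration (assumed standard form).

    h_hat : (L,) current estimate, X : (L, p) input matrix, y : (p,) outputs.
    """
    p = X.shape[1]
    e = y - X.T @ h_hat  # a priori error vector
    # Solve [X^T X + delta I] z = e instead of forming the inverse explicitly.
    z = np.linalg.solve(X.T @ X + delta * np.eye(p), e)
    return h_hat + mu * (X @ z), e


def identify(h_true, num_iter=2000, p=4, mu=0.5, delta=1e-6, seed=0):
    """Noiseless system identification with a white Gaussian input (illustrative)."""
    rng = np.random.default_rng(seed)
    L = h_true.size
    x = rng.standard_normal(num_iter + L + p)
    h_hat = np.zeros(L)
    for n in range(L + p, num_iter + L + p):
        # Column j holds x(n - j) = [x[n-j], x[n-j-1], ..., x[n-j-L+1]]^T.
        X = np.column_stack(
            [x[n - j - L + 1 : n - j + 1][::-1] for j in range(p)]
        )
        y = X.T @ h_true  # observation noise v(n) ignored, as in the text
        h_hat, _ = apa_update(h_hat, X, y, mu, delta)
    return h_hat
```

A call such as `identify(np.random.default_rng(1).standard_normal(16))` returns an estimate of the 16-tap response; in this noiseless setting the estimate should approach the true response closely.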
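2.2.2
The proportionate affine projection algorithm (PAPA)
Extending APA, and in order to update the coefficients in a manner that is proportional to their
magnitude for sparse system identification, a proportionate APA (PAPA) update
equation can be written as [17]
\[ \hat{\mathbf{h}}(n) = \hat{\mathbf{h}}(n-1) + \frac{1}{2}\mathbf{G}(n)\mathbf{X}(n)\boldsymbol{\lambda}', \tag{2.12} \]
where $\boldsymbol{\lambda}'$ is a $p \times 1$ vector containing the PAPA Lagrangian multipliers and $\mathbf{G}(n)$ is a
diagonal proportionate step-size matrix of dimension $L \times L$. The diagonal elements
of $\mathbf{G}(n)$ are related to the magnitude of the coefficients in $\hat{\mathbf{h}}(n-1)$ and can be
expressed as [33]
\[ g_l(n) = \frac{\kappa_l(n)}{\frac{1}{L}\sum_{i=0}^{L-1}\kappa_i(n)}, \quad l = 0, 1, \dots, L-1, \tag{2.13} \]
with
\[ \kappa_l(n) = \max\left\{\rho \times \max\left\{\gamma, |\hat{h}_0(n-1)|, \dots, |\hat{h}_{L-1}(n-1)|\right\}, |\hat{h}_l(n-1)|\right\}, \tag{2.14} \]
where $\gamma$ prevents the coefficients from stalling during initialization when $\hat{\mathbf{h}}(0) = \mathbf{0}_{L\times 1}$ while $\rho$ prevents
coefficients from stalling when they are much smaller than the largest coefficient [33].
Solving (2.12) together with $\mathbf{e}(n)$ defined in (2.9) for $\boldsymbol{\lambda}'$, the PAPA update is given by
\[ \hat{\mathbf{h}}(n) = \hat{\mathbf{h}}(n-1) + \mu\mathbf{G}(n)\mathbf{X}(n)\left[\mathbf{X}^T(n)\mathbf{G}(n)\mathbf{X}(n) + \delta\mathbf{I}_{p\times p}\right]^{-1}\mathbf{e}(n). \tag{2.15} \]
It is therefore noted that, similar to PNLMS, the PAPA assigns a higher step-size
to filter coefficients with higher magnitudes.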
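The proportionate gains described above can be sketched as follows. This is a hedged illustration that assumes the usual PNLMS-style rule, in which each coefficient magnitude is floored at ρ times the largest magnitude (itself floored at γ) and the result is normalized by its mean. The function name `papa_gains` and the default values of `rho` and `gamma` are assumptions for the example, not values taken from the thesis.

```python
import numpy as np


def papa_gains(h_hat, rho=0.01, gamma=0.01):
    """Diagonal of the proportionate step-size matrix G (assumed PNLMS-style rule)."""
    mag = np.abs(np.asarray(h_hat, dtype=float))
    # gamma keeps the floor positive when h_hat is all zeros at initialization;
    # rho keeps small coefficients from stalling relative to the largest one.
    floor = rho * max(gamma, mag.max())
    kappa = np.maximum(floor, mag)
    # Normalizing by the mean makes the gains average to one.
    return kappa / kappa.mean()
```

With an all-zero estimate every gain equals one, so the first iteration behaves like the non-proportionate APA; once a few coefficients dominate, they receive most of the step-size.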
2.2.3
Similar to PNLMS, the drawback of PAPA is that it suffers from slow convergence
due to the small step-sizes allocated to coefficients with small magnitude. In order to
address this, an improved PAPA (IPAPA) was proposed in [18] [19]. This algorithm
is a combination of weighted PAPA and APA where elements of the control matrix
in (2.13) are now defined, similar to IPNLMS [34], as
\[ g_l(n) = \frac{1-\alpha}{2L} + \frac{(1+\alpha)|\hat{h}_l(n)|}{2\|\hat{\mathbf{h}}(n)\|_1 + \epsilon}, \quad l = 0, 1, \dots, L-1, \tag{2.16} \]
given that $\alpha$ is the control parameter, $\epsilon$ is a small positive value that prevents
division by zero during initialization when $\hat{\mathbf{h}}(0) = \mathbf{0}_{L\times 1}$ and $\|\cdot\|_1$ is the $l_1$-norm. When $\alpha = -1$,
\[ g_l(n) = 1/L, \quad l = 0, 1, \dots, L-1, \tag{2.17} \]
and as a consequence, the IPAPA is equivalent to the APA. On the other hand,
when $\alpha = 1$,
\[ g_l(n) = \frac{|\hat{h}_l(n)|}{\|\hat{\mathbf{h}}(n)\|_1 + \epsilon/2}, \quad l = 0, 1, \dots, L-1, \tag{2.18} \]
the estimated filter coefficients are normalized by the $l_1$-norm of $\hat{\mathbf{h}}(n)$ such that
the effective step-size of IPAPA is proportional to the magnitude of the estimated
filter coefficients. Therefore, the performance of IPAPA is equivalent to APA when
$\alpha = -1$ while for $\alpha$
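The role of the control parameter can be checked numerically with a short sketch. It assumes the IPNLMS-style gain rule, a non-proportionate term (1 - α)/(2L) plus a proportionate term (1 + α)|ĥ_l(n)| / (2‖ĥ(n)‖₁ + ε). The function name `ipapa_gains` and the default `eps` are hypothetical.

```python
import numpy as np


def ipapa_gains(h_hat, alpha=0.0, eps=1e-8):
    """IPAPA control-matrix diagonal under the assumed IPNLMS-style rule."""
    h_hat = np.asarray(h_hat, dtype=float)
    L = h_hat.size
    mag = np.abs(h_hat)
    nonproportionate = (1.0 - alpha) / (2.0 * L)
    proportionate = (1.0 + alpha) * mag / (2.0 * mag.sum() + eps)
    return nonproportionate + proportionate
```

Setting `alpha=-1` returns the uniform gains 1/L (APA-like behaviour), while `alpha=1` returns gains proportional to the coefficient magnitudes; intermediate values blend the two.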
2.3
One of the main weaknesses of the IPAPA algorithm is the need to determine $\alpha$
that offers a high rate of convergence for different impulse responses. When the
estimated impulse response $\hat{\mathbf{h}}(n)$ approaches the desired impulse response
from an initialized vector during adaptation, its magnitude and phase responses
may vary and therefore it may be desirable for $\alpha$ to vary with iterations. It is
therefore proposed to assign a time-varying weighting to the proportionate and non-proportionate terms in IPAPA iteratively according to the sparseness of $\hat{\mathbf{h}}(n)$, rather
than using a pre-defined constant control parameter.
2.3.1
As stated in [41] [35] [45], the sparseness of an impulse response of length $L$ can be
quantified using
\[ \xi = \frac{L}{L-\sqrt{L}}\left[1 - \frac{\|\mathbf{h}\|_1}{\sqrt{L}\,\|\mathbf{h}\|_2}\right], \tag{2.19} \]
where $\|\mathbf{h}\|_1$ and $\|\mathbf{h}\|_2$ are defined as
\[ \|\mathbf{h}\|_1 = \sum_{l=0}^{L-1}|h_l|, \tag{2.20} \]
\[ \|\mathbf{h}\|_2 = \sqrt{\sum_{l=0}^{L-1}h_l^2}. \tag{2.21} \]
As shown in [34] [35], a perfectly sparse impulse response with a single non-zero
coefficient has a sparseness measure $\xi = 1$ while a perfectly dispersive impulse response has $\xi = 0$. [...]
\[ \mathbf{u} = \left[\mathbf{0}_{L_z\times 1}^T\ \ 1\ \ e^{-1/\tau}\ \ e^{-2/\tau}\ \ \dots\ \ e^{-(L_u-1)/\tau}\right]^T, \tag{2.22} \]
where $L_z$ is the length of leading zeros and $L_u = L - L_z$ is the length of the decaying
window while $\tau$ is the decay constant. Defining $\mathbf{b}$ as an $L_u \times 1$ zero mean
white Gaussian noise (WGN) sequence with variance $\sigma_b^2$, the impulse response is
subsequently generated by
\[ \mathbf{B}_{L_u\times L_u} = \mathrm{diag}\{\mathbf{b}\}, \qquad \mathbf{h} = \begin{bmatrix} \mathbf{0}_{L_z\times L_z} & \mathbf{0}_{L_z\times L_u} \\ \mathbf{0}_{L_u\times L_z} & \mathbf{B}_{L_u\times L_u} \end{bmatrix}\mathbf{u} + \mathbf{a}, \tag{2.23} \]
where $\mathbf{a}$ is an $L \times 1$ vector generated using another zero mean WGN sequence with
variance $\sigma_a^2$. This vector $\mathbf{a}$ ensures that the elements in the 'inactive' region are
small but non-zero.
[Figure 2.2: impulse responses plotted against coefficient index for (a) $\tau = 5$ ($\xi = 0.9125$), (b) $\tau = 15$ ($\xi = 0.8518$), (c) $\tau = 35$ ($\xi = 0.7787$) and (d) $\tau = 65$ ($\xi = 0.7093$).]
Figure 2.2 shows an example of impulse responses generated
using (2.23) with $\sigma_b^2 = 1$, $\sigma_a^2$ [...]
following decay constants (a) $\tau = 5$, (b) $\tau = 15$, (c) $\tau = 35$ and (d) $\tau = 65$ have
been used. These decay constants result in sparseness measures of (a) $\xi = 0.9125$,
(b) $\xi = 0.8518$, (c) $\xi = 0.7787$ and (d) $\xi = 0.7093$, respectively. Therefore, it can be
observed that a higher
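The relationship between the decay constant and the sparseness measure can be explored with a small NumPy sketch. It assumes the sparseness measure ξ = L/(L - √L) · [1 - ‖h‖₁ / (√L‖h‖₂)] and the synthetic model described above (leading zeros, an exponentially decaying window modulated by white Gaussian noise, plus a small noise floor). The function names, the number of leading zeros `Lz`, the noise variances and the seed are hypothetical choices, so the values produced will not match Figure 2.2 exactly.

```python
import numpy as np


def sparseness(h):
    """Sparseness measure of a non-zero impulse response h (1 = sparse, 0 = dispersive)."""
    h = np.asarray(h, dtype=float)
    L = h.size
    l1 = np.abs(h).sum()
    l2 = np.sqrt(np.sum(h ** 2))  # must be non-zero
    return L / (L - np.sqrt(L)) * (1.0 - l1 / (np.sqrt(L) * l2))


def synthetic_ir(L=1024, Lz=30, tau=5.0, var_b=1.0, var_a=1e-6, seed=0):
    """Synthetic impulse response: leading zeros, noisy exponential decay, noise floor."""
    rng = np.random.default_rng(seed)
    Lu = L - Lz
    envelope = np.exp(-np.arange(Lu) / tau)   # 1, e^{-1/tau}, ..., e^{-(Lu-1)/tau}
    b = rng.normal(0.0, np.sqrt(var_b), Lu)   # WGN modulating the decaying window
    a = rng.normal(0.0, np.sqrt(var_a), L)    # small non-zero 'inactive' region
    h = a.copy()
    h[Lz:] += b * envelope                    # equivalent to [0 0; 0 diag(b)] u + a
    return h


if __name__ == "__main__":
    for tau in (5.0, 15.0, 35.0, 65.0):
        print(tau, round(sparseness(synthetic_ir(tau=tau)), 4))
```

Running the loop prints one sparseness value per decay constant, which makes it easy to see how a slower decay spreads energy over more coefficients.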
2.3.2
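$\hat{\mathbf{h}}(n)$ given by
\[ \hat{\xi}(n) = \frac{L}{L-\sqrt{L}}\left[1 - \frac{\|\hat{\mathbf{h}}(n)\|_1}{\sqrt{L}\,\|\hat{\mathbf{h}}(n)\|_2}\right]. \tag{2.24} \]
The key feature of the proposed algorithms is their ability to control the relative
significance of the non-proportionate term $(1-\alpha)/(2L)$ and the proportionate term
$(1+\alpha)|\hat{h}_l(n)| / (2\|\hat{\mathbf{h}}(n)\|_1 + \epsilon)$.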
SC-IPAPA-I
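Similar to [35], it is proposed to control the weighting of the non-proportionate and
proportionate terms by using $1 - 0.5\hat{\xi}(n)$ and $1 + 0.5\hat{\xi}(n)$, respectively. It is
worthwhile to note that if $\hat{\mathbf{h}}(n)$ is initialized as a null vector, i.e., $\hat{\mathbf{h}}(0) = \mathbf{0}_{L\times 1}$, the
$l_2$-norm of the filter coefficients is $\|\hat{\mathbf{h}}(0)\|_2 = 0$. Hence, in order to prevent
division by zero, a small positive constant $\epsilon_{sc}$ is incorporated such that
\[ \hat{\xi}(n) = \frac{L}{L-\sqrt{L}}\left[1 - \frac{\|\hat{\mathbf{h}}(n)\|_1}{\sqrt{L}\,\|\hat{\mathbf{h}}(n)\|_2 + \epsilon_{sc}}\right]. \tag{2.25} \]
By incorporating $\epsilon_{sc}$, the sparseness measure can be computed
from $n = 0$. Similar to (2.16), elements in the control matrix for the proposed SC-IPAPA-I can be expressed as
\[ g_l(n) = \left[1 - 0.5\hat{\xi}(n)\right]\frac{1-\alpha}{2L} + \left[1 + 0.5\hat{\xi}(n)\right]\frac{(1+\alpha)|\hat{h}_l(n)|}{2\|\hat{\mathbf{h}}(n)\|_1 + \epsilon} \tag{2.26} \]
for $n \ge 0$ and $l = 0, \dots, L-1$. It can be seen from (2.26) that, when $\hat{\xi}(n)$ is
high such as for a sparse impulse response, a higher weighting is assigned to the
proportionate term. Therefore the proposed SC-IPAPA-I achieves fast convergence
by taking the sparseness of the unknown system into account during adaptation.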
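A compact sketch of the SC-IPAPA-I weighting is given below. It is an illustration under assumptions rather than the author's code: the sparseness estimate is assumed to carry a small constant `eps_sc` in its denominator, and the IPAPA non-proportionate and proportionate terms are assumed to be scaled by 1 - 0.5ξ̂(n) and 1 + 0.5ξ̂(n), respectively, as described in the text. The function names and default constants are hypothetical.

```python
import numpy as np


def estimated_sparseness(h_hat, eps_sc=1e-8):
    """Sparseness estimate with an assumed guard constant against division by zero."""
    h_hat = np.asarray(h_hat, dtype=float)
    L = h_hat.size
    l1 = np.abs(h_hat).sum()
    l2 = np.sqrt(np.sum(h_hat ** 2))
    return L / (L - np.sqrt(L)) * (1.0 - l1 / (np.sqrt(L) * l2 + eps_sc))


def sc_ipapa1_gains(h_hat, alpha=0.0, eps=1e-8, eps_sc=1e-8):
    """SC-IPAPA-I control-matrix diagonal (sketch of the sparseness-weighted rule)."""
    h_hat = np.asarray(h_hat, dtype=float)
    L = h_hat.size
    xi = estimated_sparseness(h_hat, eps_sc)
    mag = np.abs(h_hat)
    nonproportionate = (1.0 - 0.5 * xi) * (1.0 - alpha) / (2.0 * L)
    proportionate = (1.0 + 0.5 * xi) * (1.0 + alpha) * mag / (2.0 * mag.sum() + eps)
    return nonproportionate + proportionate
```

For a dispersive estimate the sparseness is close to zero and the rule reduces to the IPAPA gains, whereas a sparse estimate shifts weight from the uniform term to the proportionate term.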
SC-IPAPA-II
The PAPA achieves fast initial convergence but slows down subsequently due to
the small step-sizes allocated to coefficients with small rnagnitude.
pararneter
The control
in IPAPA addresses this issue and it has been shown in [19] that IPAPA
==
irnpulse responses.
Unlike [19] where a fixed value of
rnechanism for n is proposed.
ATTENTION: The Singapore Copyright Act applies to the use of this document. Nanyang Technological University Library
estimated sparseness
23
==
[1 + cxsc(n)] Ihz(n)1
.21Ih(n)lh + E
while on the other hand, when ~c (n)
==
==
0,
(2.28)
is desirable to have
1 - cxsc(n)
- - L - - ==0.
(2.29)
From (2.28) and (2.29), two conditions for dispersive and sparse impulse response,
respectively, are therefore defined by
$$\alpha_{sc}(n) = \psi\{\xi_{sc}(n) = 0\} = -1, \qquad (2.30)$$
$$\alpha_{sc}(n) = \psi\{\xi_{sc}(n) = 1\} = 1. \qquad (2.31)$$
There are various non-linear functions that can fulfill the conditions of (2.30)
and (2.31). Although it is useful to know that impulse responses, in general, follow an
exponential decay model [46], obtaining the time constant may be computationally
expensive [47]. For this reason, a third-order power function of $\xi_{sc}(n)$ is proposed as
(2.32)
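As a hedged illustration, one third-order power function that satisfies both end-point conditions in (2.30) and (2.31) is $2\xi^3 - 1$. The sketch below uses that form purely as an assumption, since the expression of (2.32) is not legible in this copy; `alpha_sc` is a hypothetical name.

```python
import numpy as np

def alpha_sc(xi):
    """Map a sparseness value in [0, 1] to a control parameter in [-1, 1].

    Assumed cubic form: it equals -1 at xi = 0 (dispersive) and +1 at
    xi = 1 (sparse), which are the two conditions stated in the text.
    """
    xi = np.asarray(xi, dtype=float)
    return 2.0 * xi ** 3 - 1.0
```

Because the cubic is flat near zero and steep near one, the control parameter stays close to -1 for moderately dispersive responses and rises quickly only as the response becomes highly sparse.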
Figure 2.3 illustrates how $\alpha_{sc}(n)$ varies with $\xi_{sc}(n)$ in the proposed model. It
can be seen that $\alpha_{sc}(n)$ increases as the sparseness of the impulse
response increases. This implies that the proportionate term achieves higher weighting when $\xi(n) \to 1$. On the other hand, the non-proportionate term achieves higher
weighting when $\xi(n) \to 0$. Similar to (2.16), elements in the control matrix for the
proposed SC-IPAPA-II can be expressed as
(2.33)
that, unlike [19] where the diagonal elements in the control matrix are dependent on
the predefined control parameter
$\alpha$,
$\xi$.
$\xi$,
magnitude of the impulse response. It is also important to note that the gradient of
the slope increases with
$\xi$.
will adapt using a larger step-size for a sparse system compared to a dispersive system. On the other hand, filter coefficients with small magnitudes in a sparse system
will be assigned a smaller step-size compared to a dispersive system. Therefore,
the proposed SC-IPAPA-II does not require a pre-defined control parameter and it
results in a high rate of convergence that is robust to the sparseness of the impulse
response.
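The behaviour described above can be sketched in Python under two stated assumptions: the control elements follow the IPAPA form with the fixed parameter replaced by a sparseness-dependent one, and that parameter is the cubic $2\xi^3 - 1$ (an assumed form that meets the conditions in (2.30) and (2.31)). The sparseness measure is assumed to follow [35]; all names and the `eps` default are hypothetical.

```python
import numpy as np

def sparseness(h):
    """Sparseness measure in [0, 1]; form assumed from [35]."""
    L = h.size
    l2 = np.sqrt(np.sum(h ** 2))
    if l2 == 0.0:
        return 0.0  # assumption: all-zero estimate treated as non-sparse
    return (L / (L - np.sqrt(L))) * (1.0 - np.sum(np.abs(h)) / (np.sqrt(L) * l2))

def sc_ipapa2_gains(h_hat, eps=1e-6):
    """Control elements with a time-varying, sparseness-driven parameter."""
    L = h_hat.size
    a = 2.0 * sparseness(h_hat) ** 3 - 1.0      # assumed cubic mapping
    non_prop = (1.0 - a) / (2.0 * L)            # dominates when a -> -1
    prop = (1.0 + a) * np.abs(h_hat) / (2.0 * np.sum(np.abs(h_hat)) + eps)
    return non_prop + prop
```

With a flat estimate every coefficient receives the same gain of 1/L, while a single-spike estimate places essentially all of the gain on the active coefficient.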
2.4
Stability analysis
Further insights into the proposed algorithms are gained by analyzing their stability
as well as their convergence performance. This allows us to determine the step-size $\mu$ to ensure convergence as well as to justify the improvement in convergence
performance over the IPAPA. Following the approach in [42], the analysis is begun
by defining an impulse response error vector at time $n$ as
$$\mathbf{c}(n) = \hat{\mathbf{h}}(n) - \mathbf{h}. \qquad (2.34)$$
Figure 2.4: Magnitude of the elements in the control matrix with different sparseness ($\xi$) values.
Exploiting the APA update equation given by (2.10) and ignoring the effect of regularization parameter similar to [42], it can be derived that
(2.35)
(2.36)
$$\mathbf{R}(n) = \mathbf{Q}(n)\boldsymbol{\Gamma}(n)\mathbf{Q}^T(n), \qquad (2.37)$$
where $\mathbf{Q}(n)$ is a unitary matrix whose columns are an orthogonal set of eigenvectors
associated with the eigenvalues of $\mathbf{R}(n)$ while $\boldsymbol{\Gamma}(n)$ is a diagonal matrix with its
diagonal elements being the eigenvalues of $\mathbf{R}(n)$. These eigenvalues, denoted by
$\gamma_0(n), \gamma_1(n), \ldots, \gamma_{L-1}(n)$, are all non-negative and real. Assuming that the second-order statistics of the input signal are slowly varying, i.e., $\mathbf{Q}(n + 1) \approx \mathbf{Q}(n)$, substituting
(2.36) and (2.37) into (2.35) and premultiplying the resultant equation by $\mathbf{Q}^T(n)$,
the following can be obtained
(2.38)
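Defining $\mathbf{z}(n) = \mathbf{Q}^T(n)\mathbf{c}(n)$, $\mathbf{z}(n + 1)$ can be
expressed as
$$\mathbf{z}(n + 1) = [\mathbf{I}_{L \times L} - \mu\boldsymbol{\Gamma}(n)]\,\mathbf{z}(n) = \prod_{m=1}^{n} [\mathbf{I}_{L \times L} - \mu\boldsymbol{\Gamma}(m)]\,\mathbf{z}(1) \qquad (2.39)$$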
where
R=E{R(n)},
(2.40)
given that $E\{\cdot\}$ is the mathematical expectation operation. For the $k$th mode of
APA, the following can be obtained
k == 0,1, ... ,L - 1;
(2.41)
n -+
00,
r.
== 0,1, ... ,L - 1.
(2.42)
(2.43)
!max
where
2
=;--
(2.44)
!max
where
;Y~ax
(2.45)
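The modal recursion in (2.39) can be explored numerically with a short Python sketch. It assumes time-invariant eigenvalues, so that the product in (2.39) reduces to a power, and it uses the familiar bound of twice the reciprocal of the largest eigenvalue; the function names are illustrative.

```python
import numpy as np

def max_stable_step(eigenvalues):
    """Upper bound on the step-size for all modes to decay (assumed 2/gamma_max)."""
    return 2.0 / np.max(np.asarray(eigenvalues, dtype=float))

def modal_error(eigenvalues, mu, z1, n_iter):
    """Evolve each mode as z_k(n+1) = (1 - mu * gamma_k) * z_k(n)."""
    gam = np.asarray(eigenvalues, dtype=float)
    return (1.0 - mu * gam) ** n_iter * np.asarray(z1, dtype=float)
```

Each mode decays only if the magnitude of its factor is below one, so the mode with the largest eigenvalue sets the limit on the step-size while the mode with the smallest eigenvalue converges most slowly.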
It is well known that the eigenvalue spread, which is defined as the ratio
between the largest eigenvalue and smallest eigenvalue of the correlation matrix (R
-I
and R ), has a significant impact on the convergence speed of steepest descent and
stochastic gradient algorithms [42] [48]. As the IPAPA and the proposed SC-IPAPA-I and SC-IPAPA-II have different diagonal proportionate matrices $\mathbf{G}(n)$, it is therefore
expected that these algorithms will have different eigenvalue spreads. Figure 2.5
illustrates how the eigenvalue spread of the correlation matrix of IPAPA, SC-IPAPA-I and SC-IPAPA-II varies for $128 \leq L \leq 1024$. The input signal used for this
Figure 2.5: Eigenvalue spread for IPAPA, SC-IPAPA-I and SC-IPAPA-II for $128 \leq L \leq 1024$.
Table 2.1: Number of multiplications of IPAPA, RAPA, SC-IPAPA-I and SC-IPAPA-II per iteration.
Algorithm      Number of multiplications
IPAPA          (p + 1)L^2 + (p^2 + 2p + 2)L + p^3 + p^2 - p - 9
RAPA           (p^2 + 2p)L + p^3 + p^2 + 3
SC-IPAPA-I     (p + 1)L^2 + (p^2 + 2p + 5)L + p^3 + p^2 - p + 1
SC-IPAPA-II    (p + 1)L^2 + (p^2 + 2p + 2)L + p^3 + p^2 - p - 4
Algorithm      Number of additions
IPAPA
RAPA
SC-IPAPA-I
SC-IPAPA-II
2.5
Computational complexity
Tables 2.1 and 2.2 present the number of multiplications and additions per iteration
of IPAPA, RAPA, SC-IPAPA-I and SC-IPAPA-II, respectively. As can be seen from
Tables 2.1 and 2.2, IPAPA, SC-IPAPA-I and SC-IPAPA-II require $O(L^2)$ flops while
RAPA incurs the lowest computational complexity of $O(L)$ flops since it does not
require computation of the proportionate matrix $\mathbf{G}(n)$ at each iteration. As will be
shown in Section 2.6, although SC-IPAPA-I and SC-IPAPA-II incur modestly higher
computational costs, the proposed algorithms can achieve significant performance
improvement over RAPA and IPAPA.
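To make the comparison concrete, the multiplication counts of Table 2.1 can be evaluated for a given projection order and filter length. The following Python sketch does so; note that the powers of L in the IPAPA row are an assumption (they are not legible in this copy and are inferred from the stated $O(L^2)$ cost), and the function name is hypothetical.

```python
def multiplications(p, L):
    """Multiplications per iteration as read from Table 2.1.

    The IPAPA entry assumes the same L^2 and L factors as SC-IPAPA-II.
    """
    common = p ** 3 + p ** 2
    return {
        "IPAPA": (p + 1) * L ** 2 + (p ** 2 + 2 * p + 2) * L + common - p - 9,
        "RAPA": (p ** 2 + 2 * p) * L + common + 3,
        "SC-IPAPA-I": (p + 1) * L ** 2 + (p ** 2 + 2 * p + 5) * L + common - p + 1,
        "SC-IPAPA-II": (p + 1) * L ** 2 + (p ** 2 + 2 * p + 2) * L + common - p - 4,
    }

counts = multiplications(p=5, L=1024)
```

Under these expressions the two proposed algorithms differ from each other by only 3L + 5 multiplications, which is small relative to the shared quadratic term.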
2.6
Simulation results
The performance of the proposed SC-IPAPA-I and SC-IPAPA-II is compared with
IPAPA using the normalized misalignment defined by
$$\eta(n) = \frac{\|\mathbf{h} - \hat{\mathbf{h}}(n)\|_2^2}{\|\mathbf{h}\|_2^2}. \qquad (2.46)$$
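A minimal Python sketch of this performance measure is given below. It assumes the squared-norm ratio reading of (2.46) and reports the value in dB as 10 log10 of that ratio, which is an assumption about how the convergence curves are plotted; the function names are illustrative.

```python
import numpy as np

def normalized_misalignment(h, h_hat):
    """Squared error norm between true and estimated responses, normalized
    by the squared norm of the true response."""
    h = np.asarray(h, dtype=float)
    h_hat = np.asarray(h_hat, dtype=float)
    return np.sum((h - h_hat) ** 2) / np.sum(h ** 2)

def normalized_misalignment_db(h, h_hat):
    """Same quantity in dB (assumed plotting convention)."""
    return 10.0 * np.log10(normalized_misalignment(h, h_hat))
```

An all-zero estimate therefore starts at 0 dB, and the curve moves towards more negative values as the estimate approaches the true response.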
Throughout the simulation, it is assumed that the length of the adaptive filter $L$
is equivalent to that of the unknown system. It has been studied in [50] that the
improvement in convergence reduces with increasing $p$. Therefore, to achieve a good
balance between fast convergence and computational complexity, the performance
of the algorithms is shown using an illustrative value of $p = 5$ throughout the
simulations.
2.6.1
In this simulation, it is assumed that there is an echo path change midway during
the simulation where the impulse response is changed from a sparse one to one that is
less sparse, as shown in Figs. 2.2(b) and (d). The input signal is a zero-mean WGN
sequence while another zero-mean WGN $v(n)$ is added to $y(n)$ as shown in Fig. 2.1
to achieve an SNR of 20 dB. In this simulation,
$\mu_{\text{SC-IPAPA-I}} = \mu_{\text{SC-IPAPA-II}} = \mu_{\text{IPAPA}} = 0.18$, $\mu_{\text{RAPA}} = 0.22$ and
$\alpha_{\text{SC-IPAPA-I}}$
$\delta_{\text{RAPA}}$
is determined by
Figure 2.6: Convergence of IPAPA, RAPA, SC-IPAPA-I and SC-IPAPA-II using WGN
input with echo path changed midway during simulation.
(2.11). It can be observed from Fig. 2.6 that the normalized misalignment curves
of these algorithms increase midway during the simulation due to the change in
echo path. It is important to note that, before and after the echo path change, the
proposed SC-IPAPA-I and SC-IPAPA-II achieve an improvement in steady-state
performance of more than 10 dB compared to IPAPA and they offer a higher rate
of convergence than that of RAPA. In addition, Fig. 2.7 illustrates the convergence
performance of these algorithms using the same step-size of 0.16. As can be seen,
the proposed SC-IPAPA-I and SC-IPAPA-II consistently outperform RAPA and
IPAPA by offering a higher rate of convergence and a lower steady-state misalignment.
Figure 2.7: Convergence of IPAPA, RAPA, SC-IPAPA-I and SC-IPAPA-II using the same
step-size and WGN input with echo path change introduced midway during simulation.
2.6.2
parameters for each of the algorithms described in the previous section have been
used as well. Similar to the results shown in Section 2.6.1, the proposed SC-IPAPA-II achieves the highest rate of convergence.
Figure 2.8: Convergence of IPAPA, SC-IPAPA-I and SC-IPAPA-II using speech input
with echo path change introduced midway during simulation.
2.6.3
Next, the performance of the proposed SC-IPAPA-I and SC-IPAPA-II is verified using impulse responses generated by the method of images [51]. Similar to
Figs. 2.6-2.8, it is assumed that there is an echo path change midway during the
simulation. The dimension of the room is 5 m × 6 m × 5 m and a loudspeaker is
placed at the center of the room. A sampling frequency
$f_s$
reverberation time $T_{60}$ = 300 ms are used. The AIRs are subsequently truncated to a length
of $L = 1024$ samples. The position of the microphone before the echo path change
was (2.5, 1, 1.6) m and the microphone is moved to (3.9, 2.3, 1.6) m to simulate the
echo path change. The simulation setup for this image model is shown in Fig. 2.9
while Figs. 2.10 (a) and (b) show the room impulse responses generated at positions
Figure 2.9: Simulation setup used in the image model to generate room impulse response.
'a' and 'b' labeled in Fig. 2.9. The sparseness measures of these impulse responses,
computed using (2.19), are $\xi_a = 0.6788$ and $\xi_b = 0.8202$, respectively. In this simulation, the input signal is a zero-mean WGN sequence while the SNR
is set to 20 dB. Similar to Fig. 2.6, the step-sizes are
$\mu_{\text{SC-IPAPA-I}} = \mu_{\text{SC-IPAPA-II}} = \mu_{\text{IPAPA}} = 0.18$, $\mu_{\text{RAPA}} = 0.22$,
SC-IPAPA-I are
$\alpha_{\text{IPAPA}} = \alpha_{\text{SC-IPAPA-I}}$
are shown in Fig. 2.11. As before, it is noted that the proposed SC-IPAPA-I and SC-IPAPA-II achieve a steady-state misalignment approximately 10 dB lower than that of IPAPA.
In addition, the proposed algorithms offer a higher rate of convergence than that of
RAPA before and after the echo path change.
Figure 2.10: Impulse responses (a) and (b) generated using the image model, $L = 1024$.
2.6.4
Figure 2.11: Convergence of IPAPA, SC-IPAPA-I and SC-IPAPA-II using WGN input
with impulse responses generated using the method of images.
proposed SC-IPAPA-II algorithm achieves approximately 3 dB improvement in normalized misalignment while SC-IPAPA-I achieves approximately 1.5 dB improvement,
both as compared to IPAPA.
2.7
Conclusion
SC-IPAPA-II. More
Figure 2.12: Convergence of IPAPA, SC-IPAPA-I and SC-IPAPA-II using speech input
with impulse responses generated using the method of images.
in the conventional IPAPA. The proposed SC-IPAPA-II adopts this sparseness measure as a time-varying control parameter $\alpha(n)$ which assigns weight proportionately
to the coefficients in the estimated impulse response. This sparseness-dependent
weighting mechanism overcomes one of the main weaknesses of IPAPA since determination of $\alpha$ is no longer required for the proposed SC-IPAPA-II. Simulation
results using impulse responses generated by the exponential decay model described
in [35] and the method of images [51] have shown that the proposed SC-IPAPA-I
and SC-IPAPA-II can achieve a higher convergence rate than that of IPAPA.
Chapter 3
An Improved Multichannel
Least-Mean-Square Algorithm for
3.1
Introduction
Channel identification is the technique of building a mathematical model of an unknown dynamic system by analyzing its input/output data [21]. This problem of
fundamental interest arises in a variety of signal processing and communications applications [52] [53]. The self-recovering identification or blind channel identification
problem was originally described by Sato [54]. Since then, this research problem has
drawn widespread attention from many researchers who have developed algorithms
such as those presented in [55] [56] [57] [58]. Although these algorithms provide
reasonable channel estimates under certain conditions, they often require a relatively
large number of data samples, which may limit their applications in an environment
where the impulse responses are highly time-varying [59]. To address this issue,
many second-order statistics based algorithms have been proposed.
The least-squares approach [59] to blind channel identification introduces the
concept of cross-relation where the observed signal of one channel when convolved
with another channel's impulse response is equivalent to the convolution between
the impulse response obtained from the former channel and the observed signal of
the latter channel. It has also been shown that, for the algorithm to identify the
channels uniquely and blindly, the polynomials of the channels must be co-prime (i.e.,
they do not share any common roots) and the auto-correlation matrix of the
source signal must be full rank [59].
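The cross-relation described above can be checked numerically with a short Python sketch. The two channel responses, the signal length and the noise level are arbitrary illustrative values, and the function name is hypothetical; the identity itself follows from the commutativity of convolution.

```python
import numpy as np

def cross_relation_residual(y1, y2, h1, h2):
    """Difference between y1 convolved with h2 and y2 convolved with h1."""
    return np.convolve(y1, h2) - np.convolve(y2, h1)

rng = np.random.default_rng(0)
s = rng.standard_normal(200)                 # unknown source signal
h1 = np.array([1.0, -0.5, 0.25])             # illustrative channel 1
h2 = np.array([0.8, 0.3, -0.1])              # illustrative channel 2
x1, x2 = np.convolve(s, h1), np.convolve(s, h2)

clean = cross_relation_residual(x1, x2, h1, h2)
# With additive noise on each observation the identity no longer holds exactly.
y1 = x1 + 0.1 * rng.standard_normal(x1.size)
y2 = x2 + 0.1 * rng.standard_normal(x2.size)
noisy = cross_relation_residual(y1, y2, h1, h2)
```

The noise-free residual vanishes up to rounding error, whereas the noisy residual does not, which is the effect that motivates the error function used by cross-relation based adaptive algorithms.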
Based on the cross-relation concept, an adaptive multichannel least-mean-squares (MCLMS) algorithm has been developed in [20] [60]. In these two papers, a
multichannel Newton (MCN) algorithm is also proposed. In order for the MCN algorithm to achieve a low steady-state, knowledge of the source signal auto-correlation
and the expectation of the cost function are required. However, as reported in [60],
such information is difficult to obtain in practice. In addition, the MCN algorithm is computationally more expensive than MCLMS since matrix inversion is
required [61]. In this chapter, the MCLMS algorithm is described, which serves as
the foundation for frequency-domain cross-relation based approaches in Chapter 4.
As will be shown in Section 3.5, MCLMS suffers from performance degradation in
a noisy environment.
The contribution of this chapter is the analysis of the cost function of MCLMS
which allows one to gain new insights as to why MCLMS is not robust to noise.
Thereafter, a constrained cross-correlation cost function is proposed to overcome
the performance degradation of MCLMS in a noisy environment.
Monte Carlo
Figure 3.1: Illustration of the relationship between the input $s(n)$ and the observations
$y_i(n)$ in a SIMO system.
simulation results provided in Section 3.5 will show that the proposed improved
MCLMS (IMCLMS) algorithm is more robust to noise and can gain significant
improvement in steady-state performance.
3.2
Problem formulation
A single-input multiple-output (SIMO) finite impulse response (FIR) system is considered as shown in Fig. 3.1. The observed signal $y_i(n)$ of each channel is a combination of the additive noise $v_i(n)$ and $x_i(n)$, which is a filtered version of the source
signal $s(n)$, such that
where hi
,1\1,
(3.1)
1, ... , AI,
(3.2)
= 1, ...
+ 1)] T,
total number of channels. The aim of blind channel identification is to estimate the
channels hi, i
The MCLMS algorithm begins with the cross-relation between the sensor
outputs by considering
* denotes
ii=j~
(3.3)
(3.4)
where Yi(n)
+ l)]T.
noise, the above cross-relation in (3.4) no longer holds and an error function can be
defined as
$$e_{ij}(n) = \begin{cases} \mathbf{y}_i^T(n)\mathbf{h}_j - \mathbf{y}_j^T(n)\mathbf{h}_i, & i \neq j, \\ 0, & i = j, \end{cases} \qquad (3.5)$$
$$\chi(n) = \sum_{i=1}^{M-1} \sum_{j=i+1}^{M} e_{ij}^2(n), \qquad (3.6)$$
where $e_{ij}(n)$ is now re-defined, similar to (3.5) but using the estimated impulse
response $\hat{\mathbf{h}}_i(n)$, as
(3.7)
The channel impulse responses can be estimated by minimizing (3.6). In order to
avoid a trivial estimate with all zero elements, a unit-norm constraint is imposed on
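the $ML \times 1$ concatenated channel response vector $\hat{\mathbf{h}}(n) = [\hat{\mathbf{h}}_1^T(n)\ \hat{\mathbf{h}}_2^T(n)\ \cdots\ \hat{\mathbf{h}}_M^T(n)]^T$
(3.8)
$$J(n) = \sum_{i=1}^{M-1} \sum_{j=i+1}^{M} \epsilon_{ij}^2(n) = \frac{\chi(n)}{\|\hat{\mathbf{h}}(n)\|^2}. \qquad (3.9)$$
$$\mathbf{R}_{y_i y_i}(n) = \mathbf{y}_i(n)\mathbf{y}_i^T(n), \qquad (3.10)$$
$$\mathbf{R}(n) = \begin{bmatrix} \sum_{i \neq 1} \mathbf{R}_{y_i y_i}(n) & -\mathbf{R}_{y_2 y_1}(n) & \cdots & -\mathbf{R}_{y_M y_1}(n) \\ -\mathbf{R}_{y_1 y_2}(n) & \sum_{i \neq 2} \mathbf{R}_{y_i y_i}(n) & \cdots & -\mathbf{R}_{y_M y_2}(n) \\ \vdots & \vdots & \ddots & \vdots \\ -\mathbf{R}_{y_1 y_M}(n) & -\mathbf{R}_{y_2 y_M}(n) & \cdots & \sum_{i \neq M} \mathbf{R}_{y_i y_i}(n) \end{bmatrix}_{ML \times ML} \qquad (3.11)$$
$$\hat{\mathbf{h}}(n + 1) = \frac{\hat{\mathbf{h}}(n) - 2\mu[\mathbf{R}(n)\hat{\mathbf{h}}(n) - \chi(n)\hat{\mathbf{h}}(n)]}{\|\hat{\mathbf{h}}(n) - 2\mu[\mathbf{R}(n)\hat{\mathbf{h}}(n) - \chi(n)\hat{\mathbf{h}}(n)]\|_2}, \qquad (3.12)$$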
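One iteration of the unit-norm constrained update in (3.12) can be sketched in Python as follows. The sketch assumes the block structure of (3.11), with off-diagonal blocks formed from outer products of the observation vectors, and assumes that the estimate has unit norm so that the cost equals the quadratic form of the estimate with that matrix; function names and the data layout (one row per channel) are illustrative assumptions.

```python
import numpy as np

def build_R(Y):
    """Assemble the ML x ML matrix of (3.11); Y has shape (M, L), row i = y_i(n)."""
    M, L = Y.shape
    R = np.zeros((M * L, M * L))
    for k in range(M):
        for m in range(M):
            if k == m:
                blk = sum(np.outer(Y[i], Y[i]) for i in range(M) if i != k)
            else:
                blk = -np.outer(Y[m], Y[k])
            R[k * L:(k + 1) * L, m * L:(m + 1) * L] = blk
    return R

def cross_relation_cost(Y, h):
    """Sum of squared cross-relation errors over all channel pairs, as in (3.6)."""
    M, L = Y.shape
    H = h.reshape(M, L)
    return sum((Y[i] @ H[j] - Y[j] @ H[i]) ** 2
               for i in range(M - 1) for j in range(i + 1, M))

def mclms_step(Y, h, mu):
    """One normalized update; h is assumed to have unit norm on entry."""
    R = build_R(Y)
    chi = h @ R @ h
    g = h - 2.0 * mu * (R @ h - chi * h)
    return g / np.linalg.norm(g)
```

Writing the cost as a quadratic form makes the gradient direction explicit, and the final normalization keeps the estimate on the unit sphere so that the all-zero solution is excluded.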
3.3
==
1, ... ,1\1.
(3.13)
+ n]",
(3.14)
(3.15)
(3.16)
(3.17)
eX(n)
+ eV(n)
(3.18)
lSI
and employing
E{ e;(n)ev(n)} =
45
be expressed as
E{ eT(n)e(n)}
E{x(n) }
E{Xx(n)}
M-l
where Xx(n)
+ E{Xv(n)},
L L
M-l
i=l j=i+l
L L
i=l j=i+l
where R X i X i
(3.19)
0 and
all the channels are assurned to be independent and identically distributed, i.e.,
R Vi Vi
Rvjvj'
-l,j = 1: ... , M, i
hi(n,)
hi(n)
(3.22)
i
-::I
j,
(3.23)
cannot be achieved simultaneously. Therefore, the presence of noise introduces disturbance to the MCLMS algorithm.
3.4
In order to overcome the problem due to the presence of noise as described in (3.22)
and (3.23), it is proposed to estimate the channels by solving a constrained optimization problem given by
(3.24)
where h(n)
constrained optimization problem can address the problem of noise robustness, (3.24)
can be expressed as follows
.
~lln
~T (n)Yi(n)YiT (n)hi(n)
~ }
E {~~
LJ LJ hi
hen)
min
i=l j=i+l
I: t
E{h;(n)Yi(n)y;(n)hi(n)}
min
L L
h;(n)RYiYihi(n)
min
L L
(3.25)
(3.26)
(3.27)
Similar to the derivation of (3.20) and (3.21), the desired solutions of minimizing (3.27) are hi(n)
hi and hi(n)
= 0Lxl
(3.28)
and
(3.29)
==
OLxl
cs
==
Rvjvj'
== R X i X i + R V i V i , ar
(3.31
i=l j=i+l
Similar to MCLMS as shown in (3.9), a modified cost function exploiting the unit-norm
constraint can subsequently be obtained as
(3.3
\1 J (n)
p
where
a~(n)
Bh(n)
~1
IIh(n)1I2
[a~(n)
Bh(n)
- 2J (n)h(n)]
p
(3.3~
,
(3.33)
where
(3.34)
(3.35)
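$$\mathbf{R}_a = \begin{bmatrix} \mathbf{R}_{y_1 y_1} & & & \\ & \mathbf{R}_{y_2 y_2} & & \\ & & \ddots & \\ & & & \mathbf{R}_{y_M y_M} \end{bmatrix}_{ML \times ML} \qquad (3.36)$$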
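$$\mathbf{R}_b = \begin{bmatrix} \sum_{i \neq 1} \mathbf{R}_{y_i y_i} & -\mathbf{R}_{y_2 y_1} & \cdots & -\mathbf{R}_{y_M y_1} \\ -\mathbf{R}_{y_1 y_2} & \sum_{i \neq 2} \mathbf{R}_{y_i y_i} & \cdots & -\mathbf{R}_{y_M y_2} \\ \vdots & \vdots & \ddots & \vdots \\ -\mathbf{R}_{y_1 y_M} & -\mathbf{R}_{y_2 y_M} & \cdots & \sum_{i \neq M} \mathbf{R}_{y_i y_i} \end{bmatrix}_{ML \times ML} \qquad (3.37)$$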
'"
h (n + 1)
==
h(n) - fl[Rb
Ilh(n) -ll[R
b -
where
13' ==
...,..,..-----=----------------''--------:-:--
R YiYj
(3.38)
2: 1,
(3.39)
3.4.1
Convergence behaviour
It has been stated in [20] that the MCLMS algorithm converges in the mean if
1
state when n
-t
at the steady-
Ilh(n)112 ==
and therefore
z(n + 1)
~ x(n)IMLxMLJ)z(n)
(3.40)
E{z(n + I)}
(3.41)
is obtained. It has been stated in [20] [60] that h(n) converges in the mean to the
eigenvectors of R
OMLxML.
(3.42)
(h( n) - h((0))
i.e., E{ R(n)
w(n
+ 1)
(I Al L x Al L
2/LA) w (n )
(IA1 LxNJ L
2j1,A) n W (l ).
(3.43)
w(n + 1) ==
0MLxl.
52
== OMLxl, i.e.,
(3.44)
where Ai is the ith diagonal element of A. Since the diagonal elements of A are all
positive, (3.44) is valid when
(3.45)
Following the same concept, in order for IMCLMS to converge in the mean, the
Jl should satisfy 0
<
x(n)IMLxML - P'R a ]
3.4.2
Jl
<
>.Lx'
where
>.:nax
An analysis to explain why IMCLMS can achieve a lower steady-state than MCLMS
in the presence of noise is next provided under the same assumptions. For mathematical tractability, it is assumed in the analysis:
do not share any common zeros [62] [63]. This assumption is consistent with
the channel identifiability conditions.
2. E{hTh} ==
IMLxML
and E{h}
53
== 0MLxl.
and variance
a; [42].
( ) _ Ilh - a(n)h(n)ll~
IIhll~
,
1/J n -
where a(n)
(3.46)
in [64] that the NPM is preferred over normalized misalignment for blind channel
identification applications since it quantifies the closeness of the estimate to the
original channels up to a scaling factor. Defining
u(n)
h - a(n)h(n),
(3.47)
1)/(n) =
II h
(3.48)
where tr{·} denotes the trace of a matrix. Assume that, during the steady-state, the
scaling parameter $\alpha(n)$ is stable such that $\alpha$(n
+ 1) =
tion, the time dependency factor $n$ of $\alpha(n)$ will be ignored in the following analysis.
Substituting (3.12) into (3.47), the projection misalignment of MCLMS is obtained
as
u(n + 1)
h - ah(n + 1)
h - ah(n)
+ 2ILaR(n) -
(3.49)
2ILax(n)h(n)
+ 2p,QR(n)h - 2p,Qx(n)h(n).
00
(3.50)
Since a small step-size is adopted similar to [42], $\mu\chi(n)$ can be regarded as sufficiently
small so that the last term on the right-hand side of (3.49) can be ignored. For clarity
of presentation, the subscript of $\mathbf{I}$ in (3.51) will be temporarily ignored. Exploiting
the independence between $\mathbf{R}(n)$ and $\mathbf{u}(n)$, the following can be obtained
R u ,n + l
(3.5J
+4p,2E {Q 2R(n)hhTRT(n)} .
Ru,n+l
OMLxl.
IMLxML
00,
E{R(n) }Ru,n
(3.52)
+ Ru,nE{RT(n)}
2fLQE{R(n) }Ru,nE{RT(n)}
+ 2fLQE{R(n)RT (n)}.
(3.53)
Determination of Rand R
Next, assume a white Gaussian input with zero mean and variance of $\sigma_s^2$, i.e.,
$E\{s(n)\} = 0$, $E\{s(n)s(n)\} = \sigma_s^2$. Denoting
$y_i(n - l)$ as the $l$th observed signal at the $n$th frame in channel $i$, where $0 \leq l \leq L - 1$,
E{Ryiyi(n)}
E{Yi(n)y[(n) }
(3.54)
u, (n )Yi (n)
E
LxL
E{ x;(n -l)}
+ a;.
(3.55)
It is noted that
E { [h[ s( n )] 2 }
+ h;,L-ls(n -
L+ 1)]2}
(3.56)
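$$E\{h_{i,l} h_{i,k}\} = \begin{cases} 1, & l = k; \\ 0, & l \neq k. \end{cases} \qquad (3.57)$$
+ $\sigma_v^2$ is obtained. Following the same concept from (3.55) to (3.57) and noting that $s(n)$ is uncorrelated,
$$E\{\mathbf{R}_{y_i y_i}(n)\} = \begin{bmatrix} L\sigma_s^2 + \sigma_v^2 & & 0 \\ & \ddots & \\ 0 & & L\sigma_s^2 + \sigma_v^2 \end{bmatrix}_{L \times L} = (L\sigma_s^2 + \sigma_v^2)\mathbf{I}_{L \times L}, \qquad (3.58)$$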
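The diagonal value in (3.58) can be checked with a small Monte Carlo experiment. The Python sketch below assumes that the channel coefficients are drawn as independent zero-mean, unit-variance Gaussian variables (one way of satisfying (3.57)), and that the source and noise are white Gaussian; the filter length, variances, seed and trial count are arbitrary illustrative choices.

```python
import numpy as np

def observed_power(L, var_s, var_v, trials, seed=0):
    """Estimate the mean squared observation h^T s + v by simulation."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((trials, L))                  # E{h_l h_k} = delta_lk
    s = np.sqrt(var_s) * rng.standard_normal((trials, L)) # white source frame
    v = np.sqrt(var_v) * rng.standard_normal(trials)      # additive noise
    y = np.sum(h * s, axis=1) + v
    return np.mean(y ** 2)

estimate = observed_power(L=8, var_s=1.0, var_v=0.1, trials=200_000)
predicted = 8 * 1.0 + 0.1
```

The simulated power agrees with the predicted sum of L times the source variance and the noise variance to within sampling error, which supports the scaled-identity form of the expected correlation matrix under these assumptions.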
(1\1 - l)(La;
OLxL
(M - l)(La;
+ a~)ILxL
+ a;)IMLxML.
MLxML
(3.59)
E {[xi(n
~ I) + vi(n
-l)t}
(3.60)
E{x;(n-l)}E{v;(n-l)}
== a;a~
and E{vt(n-l)}
== a~.
E{ :];(n)}
==
==
(L
+ 1)].
0 andE{ :1;(n):(;(n)}
==
(L
==
+ 2)a~IMLxML
== [( (]V!
- l)(L
(3.61)
+j-la
(3.63:
[( (M - 1 - (3)(L + 2)8; +
68;a;
+a~) ]IMLxML
(3.M
is obtained. Taking the trace of (3.53) and employing (3.63) and (3.64), the projected
misalignment of IMCLMS is obtained as
$\psi_{IM}$ =
Comparing (3.62) and (3.65), it is noted that $\psi_{IM} < \psi_{M}$ because the denominator
of (3.62) is greater than that of (3.65) since $0 \leq \beta \leq 1$. It is therefore expected that
the proposed IMCLMS algorithm can achieve a lower misalignment than MCLMS.
Figure 3.2: The NPM performance of MCLMS and proposed IMCLMS algorithms using
Monte Carlo simulations with 100 trials while SNR = 20 dB, and $\beta$ = 0.2, 0.4, 0.6, 0.8
and 1 for IMCLMS.
3.5
Simulation results
1] T
and h 2
[1 -
2 cos( 1r /5)
1] T
[1 -
responses are used to model impulse responses for communication applications [20]
Figure 3.3: The NPM performance of MCLMS and proposed IMCLMS algorithms under
SNR = 15 dB using Monte Carlo simulations with 100 trials while $\beta = 1$ for IMCLMS.
[60]. In addition, noise is added to the $i$th channel observations such that the SNR
corresponds to 20 dB and 15 dB. Following the same simulation setup as [20], WGN
is chosen to demonstrate the principle of the algorithm. A similar approach can be
found in [66], where WGN is chosen for diffuse noise to demonstrate the principle
of the algorithm. The step-size adopted for these algorithms is $\mu = 0.01$.
Figure 3.2 illustrates the NPM of MCLMS and IMCLMS for SNR = 20 dB
and $\beta$ = 0.2, 0.4, 0.6, 0.8 and 1 for IMCLMS, respectively. Each of these plots is
generated from Monte Carlo simulations and averaged across one hundred trials.
Different $\beta$ values are employed in order to verify the significance of the constraint
shown in (3.30). As can be seen from Fig. 3.2, the steady-state performance of
IMCLMS improves as $\beta \to 1$ and gains an improvement in NPM of approximately
5 dB when $\beta = 1$. Figure 3.3 shows another simulation using the same simulation
Figure 3.4: Variation of steady-state NPM for MCLMS and the proposed IMCLMS algorithms using Monte Carlo simulations with 100 trials from SNR = 6 to 24 dB.
be observed from Fig. 3.3 that the proposed IMCLMS algorithm achieves a lower
steady-state and gains an improvement in NPM of approximately 4 dB compared
to MCLMS.
Figure 3.4 illustrates the variation of steady-state NPM with SNR for different
$\beta$. Similar to the above, these results were averaged over 100 trials and the
same parameters are adopted as those used to generate Fig. 3.2. As can be seen, the
proposed IMCLMS algorithm consistently outperforms MCLMS by achieving lower
steady-state NPM values. In addition, the steady-state NPM value of IMCLMS
reduces as $\beta \to 1$, which justifies the need for the constrained cross-relation cost
function to reduce the noise disturbance effect as described in (3.20) and (3.21).
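The NPM values compared above can be computed with a short Python sketch. It assumes the squared-norm ratio form of (3.46) and takes the scaling factor to be the least-squares projection coefficient, which is the usual choice in [64] but is not legible in this copy; the function names and the dB convention are illustrative assumptions.

```python
import numpy as np

def npm(h, h_hat):
    """Normalized projection misalignment (squared-norm ratio).

    The scaling factor is assumed to be the projection coefficient of h
    onto h_hat, so the measure ignores an arbitrary gain in the estimate.
    """
    h = np.asarray(h, dtype=float)
    h_hat = np.asarray(h_hat, dtype=float)
    alpha = (h @ h_hat) / (h_hat @ h_hat)
    u = h - alpha * h_hat
    return (u @ u) / (h @ h)

def npm_db(h, h_hat):
    """NPM in dB (assumed convention: 10 log10 of the squared-norm ratio)."""
    return 10.0 * np.log10(npm(h, h_hat))
```

Because blind identification can recover the channels only up to a scale factor, an estimate that is any non-zero multiple of the true response yields an NPM of zero, while an estimate orthogonal to the true response yields an NPM of one.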
Figure 3.5: Comparison of NPM of the proposed IMCLMS algorithm with $\beta$ = 1.0, 1.2
and 1.4 using Monte Carlo simulations with 100 trials from SNR = 10 to 20 dB.
= 1.0, 1.2 and 1.4. As can be seen from Fig. 3.5,
$\beta = 1.2$ are smaller than those of $\beta = 1.4$, which implies that a larger $\beta$ leads to a
poorer performance of IMCLMS when $\beta > 1$. This is consistent with
the convergence analysis presented in (3.65). It can be observed from (3.65) that a
lower NPM will be achieved as $\beta$ increases within the interval [0, 1]. However, the
NPM performance will start to degrade once $\beta > 1$ because the denominator will
start to increase. When $\beta > 1$, a larger $\beta$ will lead to a larger denominator, resulting
in a poorer NPM performance.
3.6
Conclusion
In this chapter, the cost function of the MCLMS algorithm was analyzed in a noisy
environment and it was shown that additive noise can mislead the adaptive algorithm to a trivial solution. This trivial solution requires the estimated channels to
be equivalent, which contradicts the desired solution since the channels are often
distinct in practice. In the proposed IMCLMS algorithm, this issue is addressed by
solving a constrained cross-relation cost function. It has been shown through mathematical derivation that minimizing such a constrained cost function can mitigate
the cross-relation error due to noise, thus achieving noise robustness. Misalignment
analysis has also been provided for MCLMS and IMCLMS under the same assumptions, which explains why the proposed IMCLMS algorithm can achieve a lower
NPM value. This is because the additional constraint in the proposed IMCLMS algorithm mitigates the cross-relation error due to noise. Monte Carlo simulations
under different SNRs have verified that the proposed IMCLMS algorithm is more
robust to noise and can outperform MCLMS by achieving an improvement in NPM
of approximately 5 dB.
Chapter 4
Adaptive Blind Room Impulse
Response Estimation Algorithms
for Speech Dereverberation
4.1
Review of Speech Dereverberation Algorithms
The primary aim of speech dereverberation is to recover a source signal that has been distorted by the multipath effect in an enclosed environment [63]. The problem of speech dereverberation is often blind since the source signal is unknown. There are many different ways to classify speech dereverberation algorithms. From the perspective of the number of microphones deployed, speech dereverberation can be classified into single- or multi-channel approaches. On the other hand, speech dereverberation methods can also be categorized into (i) source model-based speech dereverberation
Figure 4.1: Simplified block diagram of source-filter model of speech production (after [2]).
4.1.1
$$ s(n) = \sum_{p=1}^{P} a_p\, s(n-p) + G\, b(n), \tag{4.1} $$
where b(n), G, P and $a_p$ are the noise/pulse input, gain of the input, prediction order and prediction coefficients of the all-pole filter, respectively. Equation (4.1) is the well-known difference equation for linear-prediction coefficients (LPC), which states that the value of the present output speech signal s(n) can be determined by summing G b(n) with a weighted sum of the past output samples. The coefficients of
Figure 4.2: General block diagram of an LP enhancement method for speech dereverberation. (Block labels: microphone received signal x(n); LP coefficient; distorted speech residual; LP residual enhancement; dereverberated speech.)
the original and estimated source, both autocorrelation and covariance methods minimize the sum squared error within a time interval. The main difference between these two methods is that the autocorrelation method introduces the Toeplitz autocorrelation matrix, which in turn gives rise to computational efficiency.
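As an illustration of the autocorrelation method mentioned above, the following is a minimal sketch, assuming a single analysis frame and the model of (4.1). The function names `lpc_autocorrelation` and `lp_residual` are hypothetical and are not taken from the cited works; the Toeplitz system is solved with a general solver for brevity rather than with a fast Toeplitz recursion.

```python
import numpy as np

def lpc_autocorrelation(s, order):
    """Estimate a_1..a_P of (4.1) with the autocorrelation method (sketch)."""
    s = np.asarray(s, dtype=float)
    n = len(s)
    # Autocorrelation r[0..P] of the analysis frame.
    r = np.array([np.dot(s[: n - k], s[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:], with R[i, j] = r[|i - j|].
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    R = r[idx]
    return np.linalg.solve(R, r[1:])

def lp_residual(s, a):
    """Prediction error e(n) = s(n) - sum_p a_p s(n - p)."""
    s = np.asarray(s, dtype=float)
    e = s.copy()
    for p, ap in enumerate(a, start=1):
        e[p:] -= ap * s[:-p]
    return e
```

For a signal generated by a known all-pole filter, the estimated coefficients should approach the true ones as the frame length grows, and the residual should resemble the excitation.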
According to the source production model discussed above, s(n) is produced by an all-pole filter excited either by a pulse train or a random noise signal. Assuming that an AIR can be expressed as a finite impulse response (FIR) filter which consists of only zeros in its z-transform [51], it is foreseeable that room reverberation will introduce only zeros to the signals received by the microphones. For this reason, reverberation will affect the excitation sequence but not the all-pole filter in the speech production model [67]. In order to perform speech dereverberation, this class of linear-prediction (LP) residual enhancement based algorithms, such as those presented in [9] [70] [71], modifies the speech excitation signals, leaving the LP coefficients unaffected.
A block diagram for this class of algorithm is depicted in Fig. 4.2. The residual of a clean speech signal is desired to be a well-structured pulse train and a noise-like signal for voiced and unvoiced regions, respectively. For reverberant speech, however,
the pulse train structure will be smeared [71]. Therefore, one can clearly identify the distorted region from the LP residual e(n) of the received signal. Several methods have been proposed to enhance the LP residual. In [9], the authors proposed to compute a weighting function derived from the multiplication of a short-time fine weighting function (with frame size of 2 ms) and a long-time weighting function (with frame size of 20 ms) of the residual signal. The modified LP residual is designed to emphasize regions of high signal-to-reverberation ratio while attenuating regions of low signal-to-reverberation ratio.
Following a similar concept, the authors of [72] proposed to reconstruct the residual signal using Hilbert envelope weighting. For an LP residual signal e(n), its Hilbert envelope is defined as $e_{HE}(n) = \sqrt{e^2(n) + e_H^2(n)}$, where $e_H(n)$ denotes the Hilbert transform of e(n) obtained by switching the real and imaginary parts of the discrete Fourier transform (DFT) of e(n) and subsequently taking its inverse DFT [73]. It has been shown in [72] that the Hilbert envelope has a larger amplitude when there is a strong excitation, therefore making it a good indicator of voiced speech. Applying the Hilbert envelope weighting can enhance pulse train excited voiced speech, thus leading to a less reverberant speech signal. An alternative method aimed at reducing the unwanted peaks in the residual signal is presented in [74] through the use of wavelet clustering. The idea revolves around clustering multichannel residual signals according to their wavelet extrema and subsequently finding an averaged single-channel residual signal. Moreover, a code excited linear-prediction
(CELP)
for those extracted parameters used in either the speech envelope or excitation signal reconstruction. An alternative method presented in [75] modifies the residual signal by exploiting its kurtosis. Since a reverberant speech signal is a mixture of the direct-path and delayed speech signal, it will be close to a Gaussian-distributed
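The Hilbert envelope of an LP residual described above can be sketched as follows. This is a minimal illustration, assuming the common FFT-based analytic-signal construction (zeroing the negative-frequency bins) rather than the exact procedure of [73]; the function name `hilbert_envelope` is hypothetical.

```python
import numpy as np

def hilbert_envelope(e):
    """Return sqrt(e^2(n) + e_H^2(n)) via the FFT-based analytic signal."""
    e = np.asarray(e, dtype=float)
    n = len(e)
    spectrum = np.fft.fft(e)
    # Keep DC (and Nyquist for even n), double the positive frequencies and
    # zero the negative ones: the inverse FFT is then e(n) + j e_H(n).
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1 : n // 2] = 2.0
    else:
        gain[1 : (n + 1) // 2] = 2.0
    analytic = np.fft.ifft(spectrum * gain)
    return np.abs(analytic)
```

For a pure tone with an integer number of cycles in the frame, the envelope is flat, whereas a pulse-like residual gives a peaked envelope around each excitation instant.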
4.1.2 Harmonic filtering
Another approach to source model-based speech dereverberation is via the use of harmonic filtering. Instead of utilizing the speech production model, this class of speech dereverberation techniques exploits the characteristic of a speech signal, which is assumed to be composed of a fundamental frequency and a series of harmonics [76]; the speech signal sounds clear if all the harmonics correspond to multiples of the fundamental frequency. Following this principle, the authors of [77] proposed to decorrelate the speech signal by suppressing the non-harmonic components. However, it should be noted that estimating the fundamental frequency can be challenging. Furthermore, the unvoiced speech segments are not analyzed or processed. As a result of these shortcomings, algorithms based on such techniques may achieve limited performance in dereverberation.
4.1.3
The use of blind channel identification and inverse filtering for speech dereverberation has attracted a lot of interest recently. The motivation of this approach relies
on the fact that reverberation can be modeled by the convolution of a clean speech signal and the AIR from the source to the distant microphone. This approach involves a two-step process: estimating the AIR and inverse filtering the received signal with the estimated AIR. Estimation of the AIR is a blind problem since the source signal is unknown and only the reverberant speech signal received at the microphone is used for AIR estimation. Algorithms for acoustic BSI have gained much research interest in recent years due to the advent of multimedia signal processing and wireless communication systems. For the case of multi-channel speech dereverberation, the estimated AIRs are then used to design equalization filters in order to mitigate reverberation introduced by the AIRs. This chapter involves the development of BSI algorithms for dereverberation. A review of existing techniques for BSI will be presented in Section 4.2.
4.1.4
pression
It is also interesting to note that reverberation can be categorized into early reflections and late reverberation, where the early reflections are perceived to reinforce the speech and the late reverberation is known to degrade the intelligibility of the original speech [78] [79]. In view of this, reverberant speech can be separated into two parts, i.e., early speech and late reverberant speech [80] [81]. While most speech dereverberation algorithms have been developed to recover the anechoic signal, many algorithms have been developed recently to mitigate the effect of late reverberation via spectral enhancement. In [82], the author proposed to adopt a statistical model of late reverberation and subsequently estimate the power spectral density of the late reverberant speech. As a result, an estimate of the clean speech signal can
be obtained by magnitude spectral subtraction [83]. As has been stated in [84], the
power spectral density (PSD) estimation method described in [82] is only suitable
in noise-free data frames. In [84], the authors showed how the PSD of late reverberant speech can be estimated by exploiting an Optimally Modified Log Spectral
Amplitude estimator [85] in the presence of noise. To further enhance estimation
of the late reverberant spectral variance, a statistical model which incorporates the
energy contribution of the direct-path has been proposed in [86].
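The magnitude spectral subtraction step mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: the late reverberant magnitude `late_mag` is taken as given (its estimation is the subject of [82], [84] and [86] and is not reproduced here), and the spectral floor `floor` is a hypothetical parameter added to avoid negative magnitudes.

```python
import numpy as np

def spectral_subtraction(X, late_mag, floor=0.1):
    """Subtract an estimated late-reverberant magnitude from STFT bins X."""
    X = np.asarray(X, dtype=complex)
    mag = np.abs(X)
    # Gain limited to [floor, 1]; the small constant avoids division by zero.
    gain = np.maximum(1.0 - np.asarray(late_mag) / np.maximum(mag, 1e-12), floor)
    # The reverberant phase is reused for the enhanced spectrum.
    return gain * X
```

With a zero late-reverberation estimate the spectrum is returned unchanged, while bins dominated by late reverberation are attenuated down to the floor.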
4.2
Estimation of AIRs can be achieved via BSI, which was first proposed by Sato for the purpose of equalizing communication channels [54]. Since then, many BSI algorithms have been proposed and they can broadly be classified into second-order statistical (SOS) and higher-order statistical (HOS) methods. These methods can be further sub-divided into adaptive and non-adaptive approaches.
Typical non-adaptive approaches include the subspace method [87], for which only an eigenvalue decomposition (EVD) is needed, making the algorithm computationally efficient and attractive for the equalization of narrow-band signals. In this subspace method, the orthogonality property between the 'signal' and 'noise' subspaces is exploited and a quadratic cost function is minimized such that the desired unknown impulse filter coefficients are estimated up to a scaling factor. As discussed in [88], the main limitation of subspace methods is the lack of robustness to overestimation of the channel order. Furthermore, the time-varying property of an AIR implies that it is necessary for the algorithm to track the changes of the
AIR.
In view of the above, adaptive algorithms have drawn a lot of interest in recent years. It should be noted, however, that a cost function based on HOS is barely concave and, as a result, suffers from slow convergence and can lead to a local minimum in the presence of noise [21]. On the other hand, many SOS based algorithms can achieve, to some extent, good BSI performance. These algorithms include the linear prediction based subspace (LP-SS) algorithm [89] and the two-step maximum likelihood (TSML) algorithm [90]. The first step of the TSML algorithm yields an exact solution of the AIR by determining a unique null vector of the covariance matrix derived from the channel output signals. The second step can be considered as another iteration with an additional weighting constraint. The authors of [90] did not propose further iterations beyond the second step due to the high computational load and because the accuracy of the algorithm might not improve asymptotically. It has been subsequently shown in [91] that although these algorithms can approach the Cramér-Rao bound, LP-LS and TSML require a substantial amount of input data in order to achieve convergence.
As discussed in [91], a blind channel identification algorithm needs to satisfy three design requirements: fast convergence, adaptability to variations in the AIR and computational efficiency. In order to cater for these design constraints, one of the adaptive algorithms based on SOS is the normalized multichannel frequency-domain least-mean-square (NMCFLMS) algorithm [21]. The NMCFLMS algorithm is a frequency-domain extension of MCLMS described in Chapter 3. The advantage of NMCFLMS over
MCLMS
[Figure: block diagram relating the input s(n), the channels, the additive noise v1(n), ..., vM(n), the signal xM(n) and the observations y1(n), y2(n), ..., yM(n).]
fast Fourier-transform (FFT) [21] [25]. Due to its computational efficiency, another advantage of NMCFLMS is that it can estimate longer AIRs. Therefore, the NMCFLMS algorithm is more practical and popular than MCLMS for BSI. However, as will be shown through simulations in Section 4.2.1, the NMCFLMS algorithm suffers from misconvergence when the observation is contaminated with additive noise. In this chapter, a noise robust AIR estimation algorithm is proposed. The proposed direct-path NMCFLMS with power constraint (DP-NMCFLMS-PC) algorithm can also achieve a high rate of convergence.
4.2.1
i = 1, 2, ..., M, (4.2)
$$ h_i = [h_{i,0}\;\; h_{i,1}\;\; \ldots\;\; h_{i,L-1}]^T, $$
$$ s(n) = [s(n)\;\; s(n-1)\;\; \ldots\;\; s(n-L+1)]^T, $$
such that $h_i$ is the ith channel AIR and L is the length of each AIR. The additive noise
E{ vi(n)s(n)}
T)} == 0 where I
=1=
==
E{vi(n)Vj(n)}
0 and that
==
0 and
E{vi(n - l)s(n -
NP~;l
defined in (3.46).
As discussed above, the NMCFLMS algorithm is a frequency-domain extension of MCLMS which has been reviewed in Chapter 3. This implies that NMCFLMS exploits the same cost function as described in (3.9) but implements it in the frequency domain. To describe the implementation of NMCFLMS, the L × L identity, null and Fourier matrices are defined by $I_{L\times L}$, $0_{L\times L}$ and $F_{L\times L}$, respectively, where the (p, q)th element of $F_{L\times L}$ is given by
$$ (F_{L\times L})_{p,q} = e^{-j2\pi pq/L}, \qquad p, q = 0, \ldots, L-1. \tag{4.3} $$
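The Fourier matrix of (4.3) can be checked numerically against a standard FFT routine. The following is a minimal sketch; the function name `fourier_matrix` is hypothetical and the size L = 8 is chosen only for illustration.

```python
import numpy as np

def fourier_matrix(L):
    """L x L matrix whose (p, q)th element is exp(-j 2 pi p q / L), as in (4.3)."""
    p = np.arange(L)
    return np.exp(-2j * np.pi * np.outer(p, p) / L)

L = 8
F = fourier_matrix(L)
x = np.random.default_rng(0).standard_normal(L)
# Multiplying by F gives the same result as the FFT of x.
matches_fft = np.allclose(F @ x, np.fft.fft(x))
# The inverse of F is its conjugate transpose scaled by 1 / L.
inverse_ok = np.allclose(np.linalg.inv(F), F.conj().T / L)
print(matches_fft, inverse_ok)
```

In practice the matrix products are never formed explicitly; the FFT is used instead, which is the source of the computational efficiency discussed above.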
[Figure: NPM (dB) against time (s), with curves labelled SNR=25, SNR=30 and SNR=infinite.]
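Following the notation of [92], the following matrices are also defined
$$ W^{10}_{2L\times L} = [I_{L\times L}\;\; 0_{L\times L}]^T, \tag{4.4} $$
$$ W^{01}_{2L\times L} = [0_{L\times L}\;\; I_{L\times L}]^T, \tag{4.5} $$
$$ W^{10}_{L\times 2L} = [I_{L\times L}\;\; 0_{L\times L}], \tag{4.6} $$
$$ W^{01}_{L\times 2L} = [0_{L\times L}\;\; I_{L\times L}], \tag{4.7} $$
$$ \mathcal{W}^{10}_{2L\times L} = F_{2L\times 2L}\, W^{10}_{2L\times L}\, F^{-1}_{L\times L}, \tag{4.8} $$
$$ \mathcal{W}^{01}_{2L\times L} = F_{2L\times 2L}\, W^{01}_{2L\times L}\, F^{-1}_{L\times L}, \tag{4.9} $$
$$ \mathcal{W}^{10}_{L\times 2L} = F_{L\times L}\, W^{10}_{L\times 2L}\, F^{-1}_{2L\times 2L}, \tag{4.10} $$
$$ \mathcal{W}^{01}_{L\times 2L} = F_{L\times L}\, W^{01}_{L\times 2L}\, F^{-1}_{2L\times 2L}, \tag{4.11} $$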
hi (m)
W~1XLhi(m),
(4.12)
+ 1) Yj(mL + L - 1)]T}
(4.13)
such that $\mathcal{D}_{y_j}(m)$ is a 2L × 2L matrix with diagonal elements containing the Fourier transform of the tap-input vector $y_j(m)$ and m is the frame index. As has been shown in [21], the update equation of the ith channel AIR is
(4.14)
where
(4.15 )
It has been investigated in [26] that the NMCFLMS algorithm lacks robustness to the additive noise $v_i(n)$. Figure 4.4 shows an illustrative example of how NPM varies with time for NMCFLMS in the presence of noise. The algorithm was evaluated using AIRs each of length L = 512 generated from the method of images [51] with M = 5 microphones and sampling frequency
$f_s$ =
these tests, $v_i(n)$ is added, as described in (4.2), giving SNRs of 35, 30 and 25 dB. As
can be seen, the
NMCFLMS algorithm
SNRs.
the estimated AIRs deviate from the true AIRs and it is therefore expected that
4.2.2
To address the problem of noise robustness of the NMCFLMS algorithm, the direct-path NMCFLMS (DP-NMCFLMS) algorithm was proposed in [26]. This algorithm prevents misconvergence by constraining the direct-path component of the estimated AIRs to that of the true AIRs. To describe DP-NMCFLMS,
$h_{i,dp}$
component of the true AIR of the ith channel such that its elemental position within
$\hat{h}_i(m)$ is determined by the distance between the source and the ith microphone,
the DP-NMCFLMS algorithm constrains the estimated direct-path component $\hat{h}_{i,dp}$
using the following equalities
[0 0
o hi,dp
hi,dP(m)
0... 0]
---.
hi,L-l
(4.16)
(m)
] T
hi(m,) + ~hi(rn),
F2Lx2L [ hi(m) ] ==
OLxl
(4.17)
F2Lx2L
W~~xLhi(m).
(4.18)
(4.19)
where
pdPi(m) + 5I2LX2L]-I,
(4.20)
L1)~j(m)~~:(m) - PI[Pi(m)
j=1
+ 5I2Lx2L]-I,
(4.21)
L
j=1
; (rn)f1~~:(m),
10
(4.22)
---.
1)u. (m) W
10
---.
2LxLf1hj(m):
(4.23)
F LxLf1h i (m),
(4.24)
M
Pi(rn - 1) + (1 - (}) L
1)~j (m)1)Yj (m)
j=l,j=i
(4.25)
given that PI is the step-size, 5 is the regularization parameter and {} is the forgetting
factor.
It is therefore important to note that the direct-path constraint addresses
the misconvergence of NMCFLMS by constraining the direct path of the estimated
AIR
hi,dP
hi,dp
4.3
II h(m) 112
with
h( m)
==
Vi (n)
achieve an SNR = 25 dB. In addition, the same simulation settings are adopted as those used in Fig. 4.4, in which the AIRs are generated from the image model [51] with a room dimension of 4 m × 5 m × 6 m using a reverberation time T60 = 300 ms with a sampling frequency $f_s$ = 16 kHz. As can be seen from Fig. 4.5, $\|\hat{h}(m)\|_2^2$ estimated from the NMCFLMS algorithm reduces towards zero with time. Comparing with the simulation results shown in Fig. 4.4, the NMCFLMS algorithm misconverges at SNR = 25 dB while the NPM $\psi(m)$ defined by (3.46) approaches zero with time. This effect can be explained from
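$$ \psi(m) \to 0 \;\;\Rightarrow\;\; h - \frac{h^T\hat{h}(m)}{\hat{h}^T(m)\hat{h}(m)}\,\hat{h}(m) \to h \;\;\Rightarrow\;\; \hat{h}(m) \to 0, \tag{4.26} $$
or
$$ h^T\hat{h}(m) \to 0. \tag{4.27} $$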
Hence, it is important to note that after misconvergence, either the estimated (concatenated) AIR $\hat{h}(m)$ or $h^T\hat{h}(m)$ reduces to 0. The condition in (4.26) is a sufficient but not a necessary condition of the trivial solution as presented in (3.21) for the case of time-domain MCLMS. This explains the difference between the performance degradation of MCLMS and the misconvergence of NMCFLMS in the presence of noise. To address the misconvergence of NMCFLMS due to the null estimate, one possible solution is to adopt a power constraint at each iteration.
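The behaviour of the NPM discussed above can be illustrated numerically. The sketch below assumes the usual projection-based definition, consistent with the expression appearing in (4.26); since (3.46) is not reproduced in this section, the exact form used in the thesis is an assumption, and the function name `npm_db` is hypothetical.

```python
import numpy as np

def npm_db(h, h_hat):
    """Normalized projection misalignment in dB between h and its estimate."""
    h = np.asarray(h, dtype=float)
    h_hat = np.asarray(h_hat, dtype=float)
    # Project h onto h_hat so that an arbitrary scaling of h_hat is ignored.
    projection = (h @ h_hat) / (h_hat @ h_hat) * h_hat
    return 20.0 * np.log10(np.linalg.norm(h - projection) / np.linalg.norm(h))

h_true = np.array([1.0, 0.0])
# An estimate orthogonal to the true response, as in (4.27), gives 0 dB.
print(npm_db(h_true, np.array([0.0, 2.0])))
```

An estimate that is nearly a scaled copy of the true response gives a large negative NPM, whereas an estimate orthogonal to it gives 0 dB regardless of its norm, which is why a unit-norm constraint alone does not prevent this outcome.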
[Figure: squared l2-norm of the estimated AIRs against time (s), with a curve labelled NMCFLMS.]
divided by its $l_2$-norm after each iteration, i.e., $\hat{h}(n) = \hat{h}(n)/\|\hat{h}(n)\|_2$, such that
$\|\hat{h}(n)\|_2$
As can be seen,
This is
hi,dp
is
Figure 4.6: Illustration of NPM results from direct-path and unit-norm constraint NMCFLMS at SNR = 25 dB.
often large in terms of magnitude compared to other coefficients of the AIR. Figures 4.7 (a)-(c) show the true AIR of the first channel and the estimates of this AIR using DP-NMCFLMS with and without the unit-norm constraint, respectively. As can be seen from Fig. 4.7 (b), with the unit-norm normalization, the direct-path component
$h_{i,dp}$
whole AIR. As a result, the effect of the direct-path constraint is reduced, giving rise to misconvergence as shown in Fig. 4.6. It is interesting to note that the NPM approaches 0 dB despite the unit-norm constraint being applied. Although the unit-norm constrained NMCFLMS does not lead to a null vector since the norm of $\hat{h}_i(m)$ is constrained to be unity, it leads the DP-NMCFLMS algorithm to another trivial solution as shown in Fig. 4.7 (b). This set of
$\hat{h}(m)$
which satisfies (4.27). It was subsequently found, in this simulation example, that
Figure 4.7: (a) True h, (b) $\hat{h}(m)$ using DP-NMCFLMS with unit-norm constraint, (c) $\hat{h}(m)$ using DP-NMCFLMS at SNR = 25 dB.
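$h^T\hat{h}(m) = 0.0054$ giving
orthogonality between
$\left[ \left\| h - \frac{h^T\hat{h}(m)}{\hat{h}^T(m)\hat{h}(m)}\hat{h}(m) \right\|_2^2 / \|h\|_2^2 \right] =$
4.4 Algorithmic derivation
$$ \|\hat{h}_i(m)\|_2^2 = \vartheta_i^2(m), $$
where $\vartheta_i(m)$ is a scalar denoting the constrained power at the mth iteration. The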
$$ \min_{\hat{h}} \sum_{i=1}^{M-1} \sum_{j=i+1}^{M} \left[ y_i^T(m)\hat{h}_j(m) - y_j^T(m)\hat{h}_i(m) \right]^2 \quad \text{s.t.} \quad \|\hat{h}_i(m)\|_2^2 = \vartheta_i^2(m). \tag{4.28} $$
Following similar variable and matrix notations as defined in (4.4)-(4.7), another matrix is defined as
$$ W^{10}_{2L\times 2L} = \begin{bmatrix} I_{L\times L} & 0_{L\times L} \\ 0_{L\times L} & 0_{L\times L} \end{bmatrix} \tag{4.29} $$
such that pre-multiplying a vector by this matrix will null out its last L elements.
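The nulling property of the matrix in (4.29) can be verified directly. The following is a small sketch with L = 4 chosen only for illustration; the variable names are hypothetical.

```python
import numpy as np

L = 4
I, O = np.eye(L), np.zeros((L, L))
# Block matrix of (4.29): identity in the top-left block, zeros elsewhere.
W10_2Lx2L = np.block([[I, O], [O, O]])

v = np.arange(1.0, 2 * L + 1)      # test vector [1, 2, ..., 2L]
nulled = W10_2Lx2L @ v
print(nulled)                      # first L elements kept, last L set to zero
```

Applying the matrix twice gives the same result as applying it once, so it acts as a projection onto the first L coordinates.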
Similar to DP-NMCFLMS, and applying the concept of Lagrange multipliers, the update equation is derived for the proposed DP-NMCFLMS-PC algorithm as
(4.30)
where $\varphi_i(m)$ is defined as the vector rotation factor, and A, B and C are defined in (4.20)-(4.22), respectively.
Comparing (4.30) and (4.19), the additional term $\varphi_i(m)W^{10}_{2L\times 2L}$ modifies the first term on the right-hand side of (4.30), leaving the term ABC unaffected. In addition, by constraining $\|\hat{h}_i(m)\|_2^2 = \vartheta_i^2(m)$, the term $\varphi_i(m)W^{10}_{2L\times 2L}$ imposes a power constraint by rotating
$\hat{h}(m)$
towards
$\hat{h}(m+1)$
$\|\hat{h}_i(m)\|_2^2 = \vartheta_i^2(m)$ at each iteration [93]. It is therefore important to note that the proposed constraint not only constrains $\|\hat{h}_i(m)\|_2^2$ to $\vartheta_i^2(m)$ but also rotates the estimated AIRs towards the true AIRs. To solve for $\varphi_i(m)$, let
(4.31)
73;(m)
A closed-form solution of $\varphi_i(m)$ can therefore be achieved by solving for $\varphi_i(m)$ giving
where Q
~ 19;(m)) / 19;(m)
(4.33)
NMCFLMS-PC algorithm. The proposed algorithm therefore contains not only the
direct-path constraint which has been exploited to increase the robustness to noise
but also a power constraint by rotating the filter coefficients along the tangential
surface of
4.4.1
Since $\vartheta_i^2(m)$ is required in (4.31), it is important to determine its value for the proposed DP-NMCFLMS-PC algorithm. This value should be estimated such that the power of the estimated AIRs is close to that of the true AIRs. To address this, $\vartheta_i^2(m)$ can be estimated by computing $\|\hat{h}_i(m)\|_2^2$ when $\hat{h}_i(m)$ is close to $h_i$. In practice, however, one often does not have any information about the true AIRs. Therefore, it becomes impractical to compute the NPM of the algorithm, such as that shown in Fig. 4.8 (a), in order to determine how close the estimated
Figure 4.8: (a) NPM of NMCFLMS and variation of (b) $\|\hat{h}(m)\|_2^2$, (c) $\Delta\|\hat{h}(m)\|_2^2$ and (d) cost function J(m) with time at SNR = 25 dB.
AIRs are to the true AIRs. An online cost function flattening estimation (CFE) was first proposed in [92] to address a similar problem, where the cost function is defined as
$$ J(m) = \sum_{i=1}^{M-1} \sum_{j=i+1}^{M} \left[ y_i^T(m)\hat{h}_j(m) - y_j^T(m)\hat{h}_i(m) \right]^2. \tag{4.34} $$
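The cross-relation cost of (4.34) can be sketched as follows. This is a minimal illustration for a single time instant, assuming noise-free observations and tap-input vectors of the form $[x_i(n), x_i(n-1), \ldots, x_i(n-L+1)]^T$; the function name `cross_relation_cost` and the simulation values are hypothetical.

```python
import numpy as np

def cross_relation_cost(y_frames, h_hat):
    """J = sum over i < j of [y_i^T h_j - y_j^T h_i]^2 for one time instant.

    y_frames: (M, L) array, row i is the tap-input vector of channel i.
    h_hat:    (M, L) array, row i is the estimated AIR of channel i.
    """
    M = y_frames.shape[0]
    J = 0.0
    for i in range(M - 1):
        for j in range(i + 1, M):
            J += (y_frames[i] @ h_hat[j] - y_frames[j] @ h_hat[i]) ** 2
    return J

rng = np.random.default_rng(1)
M, L, n = 3, 8, 40
h = rng.standard_normal((M, L))                      # hypothetical true AIRs
s = rng.standard_normal(100)                         # hypothetical source signal
x = np.array([np.convolve(s, hi) for hi in h])       # noise-free observations
y = x[:, n - L + 1 : n + 1][:, ::-1]                 # tap-input vectors at time n
J_true = cross_relation_cost(y, h)                   # close to zero
J_random = cross_relation_cost(y, rng.standard_normal((M, L)))
print(J_true, J_random)
```

In the noise-free case the true AIRs drive the cost to (numerically) zero, which is the cross-relation property exploited by the adaptive algorithms; with additive noise this no longer holds exactly.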
It was found in [92] that the flattening point of J(m) corresponds to the misconvergence point of NMCFLMS. However, since a typical speech signal often has a large dynamic range, J(m) suffers from large local fluctuations as shown in Fig. 4.8 (d). Figure 4.8 (b) shows how $\|\hat{h}(m)\|_2^2$ varies with time and it varies more
$\|\hat{h}(m)\|_2^2$
(4.35)
as the change in $\|\hat{h}(m)\|_2^2$. Figures 4.8 (a)-(d) show the correspondence between the misconvergence point of NPM and the flattening point of $\|\hat{h}(m)\|_2^2$, $\Delta\|\hat{h}(m)\|_2^2$ and J(m). It is observed that the time instants corresponding to the flattening points of both $\Delta\|\hat{h}(m)\|_2^2$ and $\|\hat{h}(m)\|_2^2$ are aligned with that of the misconvergence point of NPM, but the former approximates the actual time instant of the misconvergence point $m_{mc}$ better. More importantly, it can be observed that $\|\hat{h}(m)\|_2^2$ and $\Delta\|\hat{h}(m)\|_2^2$ fluctuate less significantly than J(m). An accurate estimation of $m_{mc}$ is crucial since
it is desired that
$\vartheta_i^2(m)$
$$ \left| \Delta\|\hat{h}(m)\|_2^2 \right| < 0.0025 \tag{4.36} $$
to occur. Once this condition is satisfied, $\|\hat{h}(m)\|_2^2 \approx \|h\|_2^2$ and hence the power of the estimated AIR of each channel will be confined by (4.31). Furthermore, the power for each channel is constrained to
(4.37)
for $m > m_{mc}$, where $\|\hat{h}_i(m_{mc})\|_2^2$ is the corresponding power of the ith channel AIR at the $m_{mc}$th iteration. Figure 4.9 illustrates an example of how $\|\hat{h}(m)\|_2^2$ varies with time for the proposed DP-NMCFLMS-PC algorithm. In this simulation, the
[Figure: squared l2-norm of the estimated AIRs against time (s), with curves labelled Proposed DP-NMCFLMS-PC and NMCFLMS.]
same simulation setup as that used to generate Fig. 4.5 is adopted. As can be seen from Fig. 4.9, after the vector rotation power constraint is applied, $\|\hat{h}(m)\|_2^2$ is prevented from converging towards zero, unlike in the NMCFLMS algorithm.
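The flattening-point detection based on the threshold in (4.36) can be sketched as follows. This is an illustrative sketch only: the change in the squared norm is assumed here to be a simple frame-to-frame difference, which may differ from the definition in (4.35), and the `hold` parameter (number of consecutive frames required) is a hypothetical safeguard that is not taken from the text.

```python
import numpy as np

def detect_flattening(norm_sq, threshold=0.0025, hold=5):
    """Return the first frame index at which |delta| stays below threshold.

    norm_sq: sequence of squared l2-norms of the estimated AIRs, one per frame.
    hold:    hypothetical number of consecutive frames required.
    """
    delta = np.abs(np.diff(np.asarray(norm_sq, dtype=float)))
    run = 0
    for m, below in enumerate(delta < threshold, start=1):
        run = run + 1 if below else 0
        if run >= hold:
            return m - hold + 1
    return None
```

Once a frame index is returned, the squared norm of each channel estimate at that frame could be stored and used as the constrained power for subsequent iterations, in the spirit of the procedure described above.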
4.5 Simulation results
The performance of the proposed algorithm is evaluated using AIRs generated from the method of images [51]. The dimension of the room is taken to be 5 m × 6 m × 5 m and a linear array consisting of M = 5 microphones with a uniform separation of 0.8 m is deployed. The array center is located at (1.6, 3, 1.6) m while the first
T60
= 400 ms is used and the true AIRs are each of length
[Figure: NPM (dB) against time (s), with curves labelled NMCFLMS, DP-NMCFLMS and proposed DP-NMCFLMS-PC.]
L = 512. In all cases, the source signal is a male speech signal sampled at 16 kHz.
Figure 4.10 shows the performance of DP-NMCFLMS and the proposed DP-NMCFLMS-PC algorithm with
mmc
0.09 at an SNR
mmc
is to compare
fn mc
mmc
is avail-
mmc
vergence point is detected at approximately 33 s and the power constraint for each
by the proposed DNE detection algorithm. It can be noted from Fig. 4.10 that
[Figure: NPM (dB) against time (s), with curves labelled NMCFLMS, DP-NMCFLMS and two Proposed DP-NMCFLMS-PC curves.]
$m_{mc}$
obtained through direct observation and that the performance of the proposed DP-NMCFLMS-PC algorithm based on these two
$m_{mc}$
same. It can be seen that the proposed DP-NMCFLMS-PC algorithm with DNE achieves a higher rate of convergence compared to DP-NMCFLMS [26]. More specifically, the proposed algorithm is able to reach its steady-state performance in less than 40 s, compared to DP-NMCFLMS which requires more than 200 s to achieve its steady state.
Figures 4.11 and 4.12 show additional results with SNR = 30 and 25 dB, respectively, while the step-size for both algorithms is chosen as 0.08. The misconvergence points are approximated by the proposed DNE method at about 28 s and 17 s for the cases of SNR = 30 and 25 dB, respectively. These results are consistent with the fact that misconvergence occurs earlier with a lower SNR as shown in Fig. 4.4. In Fig. 4.11, the power constraint for each channel is estimated as
[Figure: NPM (dB) against time (s), with curves labelled NMCFLMS and DP-NMCFLMS.]
4.6 Conclusion
DP-NMCFLMS-PC algorithm
Chapter 5
A Sparseness-Constrained Channel Equalization Algorithm with Application to Speech Dereverberation
5.1 Introduction
As discussed in Chapter 4, reverberation occurs when an acoustic signal propagates in an enclosed environment. It is well known that reverberation reduces the intelligibility of speech and degrades the performance of automatic speech recognizers [94], particularly for hands-free mobile devices. One possible way to address this problem is to deploy speech dereverberation algorithms via estimation of AIRs and channel equalization. This class of dereverberation algorithms is popular because of its computational efficiency and its potential ability to deal
with long reverberation time [21]. This chapter deals with the second stage of the process: estimation of inverse filters for channel equalization.
By inverse filtering the reverberant speech signals with the estimated AIRs, a good estimate of the original signal can be achieved. It is therefore important to achieve a good estimate of the inverse filters for channel equalization. Approaches for single-input/single-output (SISO) as well as SIMO equalization have been proposed. Algorithms developed for single-channel equalization include single-channel least-squares (SCLS) and homomorphic equalization [95] [96]. In SCLS, an adaptive algorithm minimizes the squared error between the outputs of the inverse filter and a desired system, while in homomorphic inverse filtering, the AIR is first decomposed into minimum phase and all-pass components. An inverse can then be estimated for the minimum phase component while the all-pass component is equalized using a matched filter [96]. However, it was found that using a matched filter for the equalization of the non-minimum phase component will result in audible residual echoes [97]. In addition, the least-square error (LSE) inverse filters often require many coefficients which, in turn, introduces significant delay [94]. It was subsequently concluded that although SCLS achieves less accurate inversion, it is more efficient in practice [95]. One of the most popular multi-channel equalization algorithms proposed for dereverberation is the multiple-input/output inversion theorem (MINT) [98]. In the context of room acoustics, achieving an exact inverse of an AIR is challenging since an AIR is often non-minimum phase [99]. The MINT algorithm addresses this problem and estimates the inverse filters by exploiting spatial diversity using multiple microphones. It is further shown, using the Bezout theorem, that as long as the AIRs are co-prime, the SIMO system is irreducible and there exists a set of inverse filters which can subsequently be used to recover the source signal [67].
5.2
The estimation of a single-channel inverse filter is first reviewed by considering a single-input single-output system as shown in Fig. 5.1, where $g_1 = [g_{1,0}\;\; g_{1,1}\;\; \ldots\;\; g_{1,L_g-1}]^T$
If gl is an exact
a; ==
91,L g - 1 - l h1,l
l=O
when k
==
otherwise,
Ld
+ 1,
(5.1 )
[Figure: single-input single-output system in which the input passes through the FIR filter h1 followed by the FIR filter g1 to give the output.]
and $L_d$ is the arbitrary delay [97]. Speech dereverberation can be achieved under this condition by convolving the received signal $x_1(n)$ with $g_1$ since
$$ x_1(n) * g_1 = s(n) * h_1 * g_1 = s(n) * d_k. \tag{5.2} $$
(5.3)
where
$$ H_1 = \begin{bmatrix} h_{1,0} & 0 & \cdots & 0 \\ \vdots & h_{1,0} & \ddots & \vdots \\ h_{1,L-1} & \vdots & \ddots & 0 \\ 0 & h_{1,L-1} & & h_{1,0} \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & h_{1,L-1} \end{bmatrix} \tag{5.4} $$
is the (L + $L_g$ -
(5.5)
where $H_1(z)$ and $G_1(z)$ are the z-transforms of $h_1$ and $g_1$, respectively.
When
(5.6)
The inverse filter $g_1$ can subsequently be found by the inverse z-transform of $G_1(z)$.
Although the above can be implemented in theory, equalization of a single-channel AIR is not straightforward in practice since an AIR is often non-minimum phase [99]. As a result, (5.6) does not provide a stable causal solution for $G_1(z)$. An alternative way to estimate the inverse filter $g_1$ is through the use of an LSE adaptive algorithm. However, in order to minimize the error, an inverse filter with high order is required [94], which in turn translates to a high processing delay. Defining
(5.7)
in which
gl (n) =
regardless of the order of the inverse filter since the single-channel AIR is non-minimum phase [98].
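The non-minimum-phase condition discussed above can be checked numerically. The following is a minimal sketch only: the helper name `is_minimum_phase` and the example coefficients are assumptions made for illustration and are not taken from the thesis. It tests whether every zero of $H_1(z)$ lies inside the unit circle, which is the condition under which $1/H_1(z)$ has a stable causal realization.

```python
import numpy as np

def is_minimum_phase(h, tol=1e-12):
    """Return True if all zeros of H(z) = sum_k h[k] z^-k lie strictly inside the unit circle."""
    zeros = np.roots(h)  # roots of h[0] z^(L-1) + ... + h[L-1]
    return bool(np.all(np.abs(zeros) < 1.0 - tol))

# Hypothetical example responses (not measured AIRs).
h_min = np.array([1.0, -0.5])          # single zero at z = 0.5
h_nonmin = np.array([1.0, -2.5, 1.0])  # zeros at z = 2 and z = 0.5

min_ok = is_minimum_phase(h_min)
nonmin_ok = is_minimum_phase(h_nonmin)
```

For the second response, the zero at z = 2 would become a pole outside the unit circle in the inverse, so a stable causal inverse filter does not exist.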
5.3
In a multichannel acoustic system, the exact inverse of the system can, in theory,
be achieved based on a single-input $M$-output system as shown in Fig. 5.2 [100]. In
order to achieve this, defining $i$ as the channel index and $M$ as the total number of
channels, $H_i(z)$ and $G_i(z)$ must satisfy the relation
$$D(z) = \sum_{i=1}^{M} H_i(z) G_i(z) = 1, \tag{5.8}$$
where $D(z)$, $H_i(z)$ and $G_i(z)$ are the z-transforms of $\mathbf{d}$, $\mathbf{h}_i$ and $\mathbf{g}_i$, respectively.
Although MINT does not require the AIRs to be minimum phase, it requires $H_i(z)$,
$i = 1, 2, \ldots, M$, to be co-prime such that the exact inverse of the system can be obtained
using
(5.9)
where $\mathcal{H} = [\mathbf{H}_1\ \mathbf{H}_2\ \cdots\ \mathbf{H}_M]$
$H_i(z)$, $i = 1, 2, \ldots, M$, are co-prime becomes invalid. As a result, the convolution
matrix $\mathcal{H}$ becomes rank deficient, leading to an unreliable matrix inversion in (5.9).
Results presented in [40] showed a degradation in equalization performance of the
MINT algorithm in the presence of such near-common zeros.
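The role of the co-prime condition can be illustrated with a small numerical sketch. Everything below is an assumption made for illustration: the helper names `conv_matrix` and `mint_inverse`, the least-squares solver and the three-tap responses are hypothetical and are not the thesis implementation. The sketch builds $\mathcal{H} = [\mathbf{H}_1\ \mathbf{H}_2]$, solves $\mathcal{H}\mathbf{g} = \mathbf{d}$, and shows that a shared zero makes $\mathcal{H}$ rank deficient.

```python
import numpy as np

def conv_matrix(h, Lg):
    """(L + Lg - 1) x Lg convolution matrix: conv_matrix(h, Lg) @ g equals np.convolve(h, g)."""
    L = len(h)
    H = np.zeros((L + Lg - 1, Lg))
    for j in range(Lg):
        H[j:j + L, j] = h
    return H

def mint_inverse(hs, Lg, delay=0):
    """Least-squares solution of [H_1 ... H_M] g = d for a (delayed) unit impulse d."""
    H = np.hstack([conv_matrix(h, Lg) for h in hs])
    d = np.zeros(H.shape[0])
    d[delay] = 1.0
    g, *_ = np.linalg.lstsq(H, d, rcond=None)
    return H, d, g.reshape(len(hs), Lg)

# Hypothetical two-channel example (toy responses, not measured AIRs).
h1 = np.array([1.0, -2.5, 1.0])    # zeros at 2 and 0.5: non-minimum phase
h2 = np.array([1.0, 0.2, -0.48])   # zeros at 0.6 and -0.8: co-prime with h1
H, d, g = mint_inverse([h1, h2], Lg=2)
equalized = np.convolve(h1, g[0]) + np.convolve(h2, g[1])  # approximately [1, 0, 0, 0]

# A second channel sharing the zero z = 0.5 with h1 breaks the co-prime condition.
h2_common = np.array([1.0, 0.3, -0.4])  # zeros at 0.5 and -0.8
H_bad = np.hstack([conv_matrix(h1, 2), conv_matrix(h2_common, 2)])
rank_good = np.linalg.matrix_rank(H)
rank_bad = np.linalg.matrix_rank(H_bad)
```

Although `h1` alone has no stable causal inverse, the pair (`h1`, `h2`) is equalized exactly, whereas the pair with a common zero yields a rank-deficient matrix.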
5.3.1
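$$J(n) = \|\mathbf{d} - \mathcal{H}\mathbf{g}(n)\|_2^2, \tag{5.10}$$
where $\mathbf{g}(n) = [\mathbf{g}_1^T(n)\ \mathbf{g}_2^T(n)\ \cdots\ \mathbf{g}_M^T(n)]^T$
$$\nabla J(n) = \frac{\partial J(n)}{\partial \mathbf{g}(n)} = -2\mathcal{H}^T\mathbf{d} + 2\mathcal{H}^T\mathcal{H}\mathbf{g}(n), \tag{5.11}$$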
(5.12)
computational complexity, but also its ability to efficiently adapt to changes of the
AIRs. However, as will be shown in Section 5.5, one of the main weaknesses of this
algorithm is its relatively slow convergence.
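The adaptive procedure can be sketched as a steepest-descent recursion on $\|\mathbf{d} - \mathcal{H}\mathbf{g}(n)\|_2^2$ using the gradient $-2\mathcal{H}^T\mathbf{d} + 2\mathcal{H}^T\mathcal{H}\mathbf{g}(n)$. The code below is a hedged illustration: the function names, the zero initialization, the toy responses and the step-size rule (tied to the largest eigenvalue of $\mathcal{H}^T\mathcal{H}$ so that the recursion is stable) are assumptions and are not taken from the thesis.

```python
import numpy as np

def conv_matrix(h, Lg):
    """(L + Lg - 1) x Lg convolution matrix of the FIR response h."""
    L = len(h)
    H = np.zeros((L + Lg - 1, Lg))
    for j in range(Lg):
        H[j:j + L, j] = h
    return H

def a_mint(hs, Lg, n_iter=2000, delay=0):
    """Steepest descent on ||d - H g||^2; returns the final g and the cost per iteration."""
    H = np.hstack([conv_matrix(h, Lg) for h in hs])
    d = np.zeros(H.shape[0])
    d[delay] = 1.0
    g = np.zeros(H.shape[1])                       # assumed initialization
    mu = 0.5 / np.linalg.eigvalsh(H.T @ H).max()   # assumed step-size, small enough for stability
    cost = []
    for _ in range(n_iter):
        grad = -2.0 * H.T @ d + 2.0 * H.T @ H @ g  # gradient of the quadratic cost
        g = g - mu * grad                          # steepest-descent update
        cost.append(np.sum((d - H @ g) ** 2))
    return g, np.array(cost)

# Hypothetical co-prime two-channel example.
h1 = np.array([1.0, -2.5, 1.0])
h2 = np.array([1.0, 0.2, -0.48])
g_hat, cost = a_mint([h1, h2], Lg=2)
```

The cost decreases at every iteration for this step-size, but the rate is governed by the eigenvalue spread of $\mathcal{H}^T\mathcal{H}$, which is consistent with the slow convergence noted above.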
5.4
To address the slow convergence of A-MINT, one can exploit the sparseness property
of the Kronecker delta function. The sparseness of a vector has been defined in (2.19).
However, as opposed to Chapter 2 where sparseness is defined for the AIR, the
sparseness of $\mathbf{d}$ is employed to achieve fast convergence for the estimation of inverse
filters. It is important to note that the target response $\mathbf{d}$, which is in the form of a
Kronecker delta function, is perfectly sparse. Therefore, the sparseness measure of
the target response, which is constructed using $\mathbf{g}(n)$, i.e.,
$$\hat{\mathbf{d}}(n) = \mathcal{H}\mathbf{g}(n), \tag{5.13}$$
is exploited for the development of the equalization filters. The proposed sparseness-controlled A-MINT (SC-MINT) algorithm is obtained by minimizing the $\ell_2$-norm
of the difference between $\mathbf{d}$ and $\hat{\mathbf{d}}(n)$ while maximizing $\xi(\hat{\mathbf{d}}(n))$, i.e.,
1.
(5.14)
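Using a scaling parameter $\beta$, the constrained cost function can be expressed as
$$J_{sc}(n) = \|\mathbf{d} - \mathcal{H}\mathbf{g}(n)\|_2^2 + \beta\left[1 - \xi(\hat{\mathbf{d}}(n))\right]^2. \tag{5.15}$$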
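The sparseness measure referred to above can be illustrated numerically. Equation (2.19) is not reproduced in this chapter, so the sketch below assumes the commonly used $\ell_1/\ell_2$-based form $\xi(\mathbf{x}) = \frac{N}{N - \sqrt{N}}\left(1 - \frac{\|\mathbf{x}\|_1}{\sqrt{N}\|\mathbf{x}\|_2}\right)$ for a length-$N$ vector; the function name `sparseness` is hypothetical.

```python
import numpy as np

def sparseness(x):
    """Assumed sparseness measure: 1 for a single non-zero tap, 0 for equal-magnitude taps."""
    x = np.asarray(x, dtype=float)
    N = x.size
    l1 = np.sum(np.abs(x))
    l2 = np.sqrt(np.sum(x ** 2))
    return (N / (N - np.sqrt(N))) * (1.0 - l1 / (np.sqrt(N) * l2))

delta = np.zeros(16)
delta[3] = 1.0          # Kronecker delta: perfectly sparse
flat = np.ones(16)      # all taps equal: fully dispersive

xi_delta = sparseness(delta)
xi_flat = sparseness(flat)
```

Under this assumed definition the Kronecker delta attains the maximum value of one, which is why maximizing $\xi(\hat{\mathbf{d}}(n))$ steers the equalized response towards the target.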
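To incorporate the above cost function into the adaptive framework, the derivative of
(5.15) is taken with respect to $\mathbf{g}(n)$, giving
$$\frac{\partial J_{sc}(n)}{\partial \mathbf{g}(n)} = -2\mathcal{H}^T\mathbf{d} + 2\mathcal{H}^T\mathcal{H}\mathbf{g}(n) + 2\beta\left[1 - \xi(\hat{\mathbf{d}}(n))\right]\frac{\partial\left(1 - \xi(\hat{\mathbf{d}}(n))\right)}{\partial \mathbf{g}(n)}, \tag{5.16}$$
$$\frac{\partial\left(1 - \xi(\hat{\mathbf{d}}(n))\right)}{\partial \mathbf{g}(n)} = \frac{\sqrt{ML_g}}{ML_g - \sqrt{ML_g}}\,\frac{\partial}{\partial \mathbf{g}(n)}\left[\frac{\|\mathcal{H}\mathbf{g}(n)\|_1}{\|\mathcal{H}\mathbf{g}(n)\|_2}\right]. \tag{5.17}$$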
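Furthermore, if one defines
$$\mathbf{a}(n) = \frac{\partial \|\mathcal{H}\mathbf{g}(n)\|_1}{\partial \mathbf{g}(n)}, \tag{5.18}$$
$$\mathbf{b}(n) = \frac{\partial \|\mathcal{H}\mathbf{g}(n)\|_2}{\partial \mathbf{g}(n)}, \tag{5.19}$$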
then
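$$\frac{\partial |ax + by + cz|}{\partial x} = \frac{\partial \sqrt{|ax + by + cz|^2}}{\partial x} = \frac{(ax + by + cz)}{|ax + by + cz|}\,a \tag{5.21}$$
$$= \begin{cases} a & \text{when } ax + by + cz \geq 0; \\ -a & \text{otherwise.} \end{cases} \tag{5.22}$$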
(5.23)
81 1 X L g
and applying the above Lemma, one can obtain the L g x 1 vector
(5.24)
where 1
<l
<
obtained using
(5.25)
where the L g x 1 vector ai(n) == [11 X Llhi l,... , 11 X Llhi l]T, 1 ~ i ~ M. The derivation
of b(n) in (5.19) is illustrated as follows
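$$\mathbf{b}(n) = \frac{\partial \|\mathcal{H}\mathbf{g}(n)\|_2}{\partial \mathbf{g}(n)} \tag{5.26}$$
$$= \frac{\|\mathcal{H}\mathbf{g}(n)\|_2\,\mathcal{H}^T\mathcal{H}\mathbf{g}(n)}{\|\mathcal{H}\mathbf{g}(n)\|_2^2}. \tag{5.27}$$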
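The gradient of the sparseness-constrained cost can be sketched by combining the derivative of the $\ell_1$-norm (from the Lemma, $\mathcal{H}^T\,\mathrm{sgn}(\mathcal{H}\mathbf{g}(n))$), the derivative of the $\ell_2$-norm and the quotient rule. The code is illustrative only and rests on stated assumptions: the cost is taken as $\|\mathbf{d} - \mathcal{H}\mathbf{g}\|_2^2 + \beta[1 - \xi(\mathcal{H}\mathbf{g})]^2$, the sparseness measure uses the assumed $\ell_1/\ell_2$ form with $N$ equal to the length of $\mathcal{H}\mathbf{g}$ (the thesis normalizes with $ML_g$), and all function names are hypothetical.

```python
import numpy as np

def sparseness(x):
    """Assumed l1/l2-based sparseness measure of a length-N vector."""
    N = x.size
    return (N / (N - np.sqrt(N))) * (1.0 - np.sum(np.abs(x)) / (np.sqrt(N) * np.linalg.norm(x)))

def sc_cost(H, d, g, beta):
    """Assumed SC-MINT cost: squared error plus beta * (1 - sparseness)^2."""
    d_hat = H @ g
    return np.sum((d - d_hat) ** 2) + beta * (1.0 - sparseness(d_hat)) ** 2

def sc_gradient(H, d, g, beta):
    """Analytic gradient of sc_cost with respect to g (valid when no entry of H g is zero)."""
    d_hat = H @ g
    N = d_hat.size
    l1 = np.sum(np.abs(d_hat))
    l2 = np.linalg.norm(d_hat)
    a = H.T @ np.sign(d_hat)              # derivative of ||H g||_1
    b = H.T @ d_hat / l2                  # derivative of ||H g||_2
    dratio = (a * l2 - l1 * b) / l2 ** 2  # quotient rule for ||H g||_1 / ||H g||_2
    scale = np.sqrt(N) / (N - np.sqrt(N))
    return (-2.0 * H.T @ d + 2.0 * H.T @ d_hat
            + 2.0 * beta * (1.0 - sparseness(d_hat)) * scale * dratio)

def sc_mint_step(H, d, g, beta, mu):
    """One steepest-descent update of the inverse filters."""
    return g - mu * sc_gradient(H, d, g, beta)
```

Setting `beta` to zero reduces the update to plain steepest descent on the squared error, which makes the role of the sparseness term easy to isolate in experiments.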
Substituting (5.16), (5.17) and (5.27) into (5.15), the update equation of SC-MINT
is obtained as
(5.28)
where
o.
p.
This analysis requires one to equate (5.29) to zero and make $\mathbf{g}(n)$ the subject of
the equality. However, it is noted that, due to the non-linearity of (5.29)
with respect to $\mathbf{g}(n)$, obtaining a closed-form solution of $\beta$ is impractical. As such,
$\beta$ is proposed to be determined empirically, as will be illustrated in Section 5.5.
Comparing (5.29) with (5.10), it can be observed that SC-MINT takes into account
the sparseness measure of $\hat{\mathbf{d}}(n)$.
search space of SC-MINT such that it avoids, at each iteration, solutions where
more than one coefficient in $\hat{\mathbf{d}}(n)$
5.5
Simulation Results
is == 16 kHz
==
==
i == 1, 2~ ... , M.
(5.31)
Figure 5.3 shows the steady-state performance of A-MINT and SC-MINT in a five-channel model with CNR = 35 dB
of the sparseness constraint, the variation of the steady state does not exhibit a fixed
pattern with different $\beta$ values, as shown in Fig. 5.3. It can be observed that, with
approximately the same steady-state performance, the convergence of SC-MINT
with $\beta$
Figures 5.4 and 5.5 show the convergence performance of A-MINT and SC-MINT using the same AIRs as those adopted in Fig. 5.3 with CNR = 30 and 20 dB,
respectively.
Figure 5.3: Steady-state performance comparison between A-MINT and proposed SC-MINT when CNR = 35 dB using AIRs generated by the image model with $\beta$ = 0.01, 0.02, ..., 0.1 (horizontal axis: number of iterations).
$\mu_{\text{A-MINT}} = 0.03$, $\mu_{\text{SC-MINT}} = 0.05$ for $\beta \neq 0$.
misalignment performance. As can be seen from Figs. 5.4 and 5.5, the proposed
SC-MINT algorithm achieves a higher rate of convergence than that of A-MINT; it
achieves a 3 dB and 5 dB improvement in normalized misalignment over A-MINT
for $\beta$ = 0.02 and 0.05 during initial convergence, respectively. This shows that the
sparseness constraint of SC-MINT confines the search space such that $\mathbf{g}(n)$ converges
faster to $\mathbf{g}$ in order for $\hat{\mathbf{d}}(n)$
Figure 5.4: Convergence comparison between A-MINT and proposed SC-MINT when
CNR = 30 dB using AIRs generated by the image model.
ulation setup,
Is == 16 kHz
and AI
$$\mathrm{SRR} = \frac{10}{K}\sum_{k=0}^{K-1}\log_{10}\frac{\sum_{n=kN}^{kN+N-1} s_d^2(n)}{\sum_{n=kN}^{kN+N-1}\left[s_d(n) - \hat{s}(n)\right]^2}, \tag{5.32}$$
Figure 5.5: Convergence comparison between A-MINT and proposed SC-MINT when CNR = 20 dB (horizontal axis: iterations).
$s(n)$ is the clean speech, $K$ is the number of frames and $N$ is the length of each
frame. In the recorded AIRs, the direct-path components of various channels ($\mathbf{h}_d$
of channels 1-5) have approximately the same magnitude and the mean of them
has been adopted in (5.32). In this simulation, a male speech signal sampled at 16 kHz
with a duration of approximately 5 s is used. Similar to all other simulations in
this chapter, a five-channel model is adopted and WGN is added to $\mathbf{h}_i$ to achieve
CNR
* rli(n).
Figure 5.6: (a) Recorded AIR; (b) Convergence comparison between A-MINT and proposed SC-MINT when CNR = 20 dB (horizontal axis: iterations).
a good choice of $\beta$ = 0.02 and 0.05 can be used for the SC-MINT algorithm.
Figure 5.7: SRR performance of A-MINT and proposed SC-MINT when CNR = 20 dB
using recorded AIRs (horizontal axis: iterations).
5.6
Conclusion
The concept of sparseness control was introduced into the estimation of inverse filters for speech dereverberation. For successful equalization of the AIRs, the convolution
between the AIRs and inverse filters is expected to be a Kronecker delta function,
which is perfectly sparse. In the proposed SC-MINT algorithm, the sparseness of
the Kronecker delta function that is constructed from the estimated inverse filters
is maximized. Subsequently, such sparseness is utilized as an additional constraint
to A-MINT. As a result, the SC-MINT algorithm avoids solutions where more than
one coefficient is non-zero for the Kronecker delta function and, therefore, a higher
rate of convergence can be expected from SC-MINT. Simulation results using AIRs
generated by the method of images and recorded AIRs show that SC-MINT outperforms A-MINT by offering a higher rate of convergence and can achieve approximately 5 dB improvement in terms of normalized misalignment during the initial
convergence.
Chapter 6
Adaptive Channel Equalization
Exploiting Segregated Sub-systems
6.1
Introduction
algorithm.
In this chapter, a new adaptive algorithm for the equalization of AIRs in
a SIMO system is presented. The proposed auto-relation aided MINT (A-RAM)
algorithm [30], which belongs to the category of non-blind channel equalization,
takes into account how the reverberant signals are generated from the convolution
between the source and the AIRs. It has been shown mathematically that for a SIMO
system, speech dereverberation can be achieved by deconvolving the reverberant
signals and their corresponding AIRs. System equalization is achieved by segregating
a SIMO system into two sub-systems such that each sub-system performs speech
dereverberation. These dereverberated signals are desired to be equivalent since
the received signals at different microphones are generated from a common source.
Such a source-channel relationship is defined as the auto-relation. The proposed A-RAM algorithm utilizes this auto-relation constraint, which minimizes the difference
between the dereverberated signals of the two sub-systems iteratively.
6.2
Algorithmic development
It begins by first considering how $s(n)$ can be recovered using a two-channel system
with channel indices $i$ and $j$, where the word "channel" refers to the AIRs $\mathbf{h}_i$ and $\mathbf{h}_j$.
Similar to Chapter 2, for clarity of presentation, the development of the algorithm is
described for a noiseless case. For this two-channel case, the estimated source signal
corresponding to each channel is given by
$$\hat{s}_i(n) = \mathbf{x}_i^T(n)\mathbf{g}_i(n), \tag{6.1}$$
$$\hat{s}_j(n) = \mathbf{x}_j^T(n)\mathbf{g}_j(n), \tag{6.2}$$
Figure 6.1: Proposed inverse filtering method for a SIMO system with two sub-systems.
+ 1)JT
gi (n)
respectively. Since the received signals of a SIMO system are generated by a common
source, an important relation is obtained
(6.3)
$\mathbf{h}_i(n)$
and $\mathbf{h}_j(n)$ is undesirable since inverse filters $\mathbf{g}_i(n)$ and $\mathbf{g}_j(n)$ in (6.1) and (6.2) do not
normally exist due to the non-minimum phase property of a single AIR. Therefore, an
$M$-channel system is segregated into two sub-systems as illustrated in Fig. 6.1 and,
as a result, (6.3) is extended to a multichannel system such that
$$\sum_{i=1}^{M_1} \mathbf{x}_i^T(n)\mathbf{g}_i(n) = \sum_{i=M_1+1}^{M} \mathbf{x}_i^T(n)\mathbf{g}_i(n), \tag{6.4}$$
where $M_1$ ($1 < M_1 < M$) is the last channel index of the first sub-system. Since
the output of each sub-system (containing the dereverberated signal) is desired to
be equivalent, the difference between these outputs can be utilized for estimating
the inverse filters. This difference at iteration $n$ is defined as
$$e(n) = \sum_{i=1}^{M_1} \mathbf{x}_i^T(n)\mathbf{g}_i(n) - \sum_{i=M_1+1}^{M} \mathbf{x}_i^T(n)\mathbf{g}_i(n) = \tilde{\mathbf{x}}_1^T(n)\tilde{\mathbf{g}}_1(n) - \tilde{\mathbf{x}}_2^T(n)\tilde{\mathbf{g}}_2(n), \tag{6.5}$$
the concatenated inverse filters of the first and second sub-systems, respectively.
$[\mathbf{x}_{M_1+1}^T(n)\ \mathbf{x}_{M_1+2}^T(n)\ \cdots\ \mathbf{x}_M^T(n)]^T$ are the concatenated received signals corresponding to the first and second sub-systems, respectively. Similar to A-MINT described
in Section 5.3.1, a cost function can therefore be defined as
(6.6)
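To describe why minimizing (6.6) alone will lead to multiple solutions, the
auto-relation is utilized
$$\tilde{\mathbf{x}}_1^T(n)\tilde{\mathbf{g}}_1(n) = \tilde{\mathbf{x}}_2^T(n)\tilde{\mathbf{g}}_2(n), \tag{6.7}$$
$$\tilde{\mathbf{x}}_1(n)\tilde{\mathbf{x}}_1^T(n)\tilde{\mathbf{g}}_1(n) = \tilde{\mathbf{x}}_1(n)\tilde{\mathbf{x}}_2^T(n)\tilde{\mathbf{g}}_2(n), \tag{6.8}$$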
(6.9)
Hence $\tilde{\mathbf{g}}_1(n)$ depends on $\tilde{\mathbf{g}}_2(n)$.
Although minimizing (6.6) alone may not lead to the desired inverse filters,
(6.6) can be used as a constraint to confine the search space during adaptation for
fast convergence. This confined solution space can be shown by expressing (6.8) as
(6.10)
where R ll
noted from (6.10) that although multiple solutions exist [102], (6.6) confines the
solution of
91 (n)
[R n
RI 2 ] .
Since the two sub-systems are desired to operate similar to two MINT frameworks, (6.6) can therefore be incorporated as a constraint to the existing A-MINT
algorithm such that the cost function for the proposed A-RAM algorithm can be
written as
(6.11)
where $\mathcal{H}_1 = [\mathbf{H}_1\ \cdots\ \mathbf{H}_{M_1}]$ and $\mathcal{H}_2 = [\mathbf{H}_{M_1+1}\ \cdots\ \mathbf{H}_M]$ are the concatenated channel
convolutive matrices for the first and second sub-systems and $\beta$ is the Lagrangian
multiplier. It can be seen from (6.11) that the proposed A-RAM algorithm not only
segregates an $M$-channel framework into two sub-systems, it also utilizes the auto-relation between the two sub-systems to confine the search space of
$\tilde{\mathbf{g}}_1(n)$ and $\tilde{\mathbf{g}}_2(n)$ such that
the nullspace of $[\mathbf{R}_{11}\ \ \mathbf{R}_{12}]$.
(6.12)
where g(n)
[gi(n)
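first invoke the linearity property of the $\nabla$ operator such that $\nabla J_{\mathrm{AM}}(n) = \left[\left(\frac{\partial J_{\mathrm{AM}}(n)}{\partial \tilde{\mathbf{g}}_1(n)}\right)^T \left(\frac{\partial J_{\mathrm{AM}}(n)}{\partial \tilde{\mathbf{g}}_2(n)}\right)^T\right]^T$, where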
(6.13)
(6.14)
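The structure of the two sub-system gradients can be sketched as follows. Equations (6.11), (6.13) and (6.14) are not legible in this copy, so the code assumes a cost of the form $\|\mathbf{d} - \mathcal{H}_1\tilde{\mathbf{g}}_1\|_2^2 + \|\mathbf{d} - \mathcal{H}_2\tilde{\mathbf{g}}_2\|_2^2 + \beta e^2(n)$, with $e(n)$ as in (6.5). This form, the function names and the argument layout are assumptions for illustration and should not be read as the thesis implementation.

```python
import numpy as np

def a_ram_cost(H1, H2, d, x1, x2, g1, g2, beta):
    """Assumed A-RAM cost: two MINT costs plus beta times the squared auto-relation error."""
    e = x1 @ g1 - x2 @ g2                      # difference between the two sub-system outputs
    return (np.sum((d - H1 @ g1) ** 2)
            + np.sum((d - H2 @ g2) ** 2)
            + beta * e ** 2)

def a_ram_gradient(H1, H2, d, x1, x2, g1, g2, beta):
    """Gradients of the assumed cost with respect to g1 and g2."""
    e = x1 @ g1 - x2 @ g2
    grad1 = -2.0 * H1.T @ (d - H1 @ g1) + 2.0 * beta * e * x1
    grad2 = -2.0 * H2.T @ (d - H2 @ g2) - 2.0 * beta * e * x2
    return grad1, grad2

def a_ram_step(H1, H2, d, x1, x2, g1, g2, beta, mu):
    """One steepest-descent update of both sub-system inverse filters."""
    grad1, grad2 = a_ram_gradient(H1, H2, d, x1, x2, g1, g2, beta)
    return g1 - mu * grad1, g2 - mu * grad2
```

The opposite signs of the $\beta e(n)$ terms show how the auto-relation error pulls the two sub-system outputs towards each other, while the gradients of the two sub-systems otherwise remain separate.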
6.2.1
It will be explained, through the use of convergence analysis, why A-RAM can
achieve a higher convergence rate than A-MINT. As derived in Appendix A, the
convergence behavior of A-MINT is given by
(6.15)
where $\mathcal{H}$ is defined after (5.9), $\mathbf{g}$ is the true inverse filter and $\mathbf{z}(n) = \mathbf{g}(n) - \mathbf{g}$.
$\mathbf{0}_{ML_g \times 1}$
$\mu$ is suitably chosen.
Initialization: $\mathbf{g}_i(0) = [1\ 0\ \cdots\ 0]^T$, $i = 1, 2, \ldots, M$;
Computation: for $n = 1, 2, \ldots$
$\mathbf{g}(n+1) = \mathbf{g}(n) - \mu\nabla J_{\mathrm{AM}}(n)$, where $\nabla J_{\mathrm{AM}}(n)$ follows from (6.13) and (6.14).
It has been shown in [101] that the utilization of a regularization parameter $\delta$
in RMINT (i.e., $\mathcal{H}^T\mathcal{H} + \delta\mathbf{I}_{ML_g \times ML_g}$ in place of $\mathcal{H}^T\mathcal{H}$)
which results in improved performance over MINT. When implemented in an adaptive framework, the update equation of adaptive RMINT (A-RMINT) is similar to
that of A-MINT (defined in (5.12)) but uses a different gradient
(6.16)
The convergence behavior of A-RMINT has also been derived in Appendix A and
is given by
It can be noted that the effect of regularization is reflected in the first term on the
$\delta$
can increase the convergence speed and gain robustness to additive noise.
As derived in Appendix A, the convergence behavior of A-RAM can be
summarized by
E {zl(n)}
(6.18)
E {z2(n)}
2
-2j1Jj3asICM-Ml)LgxCM-Ml)Lg
]n 92'
(6.19)
where $\sigma_s^2$ is the variance of $s(n)$, $\tilde{\mathbf{g}}_1$ and $\tilde{\mathbf{g}}_2$ are the true inverse filters of the first
and second sub-systems, respectively, and $\mathbf{z}_1(n) = \tilde{\mathbf{g}}_1(n) - \tilde{\mathbf{g}}_1$, $\mathbf{z}_2(n) = \tilde{\mathbf{g}}_2(n) - \tilde{\mathbf{g}}_2$. Comparing (6.18) and (6.19) with (6.15) and (6.17), it is observed that the
$\mathcal{H}_2$. Therefore, it can be expected that A-RAM with a suitable $\beta$ value can
achieve a higher rate of convergence than A-MINT.
Comparing (6.18) and (6.19) with (6.17), it can be observed that although
the effect of regularization is present in both A-RAM and A-RMINT, the second
term $2\mu\delta\mathbf{g}$ in (6.17) forms a bias and prevents A-RMINT from achieving $\mathbf{g}(n) \to \mathbf{g}$.
It can therefore be expected that A-RAM will achieve a higher rate of convergence
than A-RMINT.
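The two effects of regularization described above (a faster-decaying error recursion and a biased fixed point) can be illustrated with a toy computation. The sketch is built on assumptions: a random matrix stands in for the convolution matrix, the recursion matrices $\mathbf{I} - 2\mu\mathcal{H}^T\mathcal{H}$ and $\mathbf{I} - 2\mu(\mathcal{H}^T\mathcal{H} + \delta\mathbf{I})$ follow from steepest descent on the unregularized and regularized quadratic costs, and the values of $\delta$ and $\mu$ are arbitrary. None of this reproduces (6.15) to (6.19).

```python
import numpy as np

rng = np.random.default_rng(1)
H = rng.standard_normal((12, 8))   # stand-in for the convolution matrix (assumption)
g_true = rng.standard_normal(8)
d = H @ g_true                     # consistent target, so g_true is an exact solution
A = H.T @ H
I = np.eye(8)
delta = 0.1                        # assumed regularization parameter
mu = 0.25 / np.linalg.eigvalsh(A + delta * I).max()   # assumed stable step-size

def spectral_radius(M):
    """Largest absolute eigenvalue of a symmetric matrix."""
    return np.abs(np.linalg.eigvalsh(M)).max()

rho_plain = spectral_radius(I - 2.0 * mu * A)                 # unregularized recursion
rho_reg = spectral_radius(I - 2.0 * mu * (A + delta * I))     # regularized recursion

# Fixed point of the regularized recursion and its distance from the true solution.
g_reg = np.linalg.solve(A + delta * I, H.T @ d)
bias = np.linalg.norm(g_reg - g_true)
```

A smaller spectral radius means faster decay of the mean error, while the non-zero `bias` shows that the regularized recursion does not converge to the true inverse filters.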
6.2.2
Derivation of closed-form $\beta$
(6.20)
+ (3e(n)X2(n)] ,
(6.21)
R:;l [1-f d
where R
e 2 (n )
== 0,
the closed-form solution for $\beta$ can be determined by substituting (6.20) and (6.21)
into (6.5) and equating it to zero, which results in
(6.22)
It is noted that $e(n)$ in the denominator of (6.22) will cancel out with that in (6.20)
and (6.21), therefore preventing the division by a null term when $e(n) \to 0$. It can
also be noted from (6.22) that the time-varying $\beta_{\mathrm{cf}}(n)$ requires high computational
cost due to the need to compute $\mathbf{R}_1^{-1}$ and $\mathbf{R}_2^{-1}$. This motivates the use of a constant
6.2.3
It is important to note, from Fig. 6.1, that $\hat{s}_1(n)$ and $\hat{s}_2(n)$ are derived in the
proposed algorithm. In addition, (6.13) and (6.14) imply that the gradients of the
two sub-systems are different. Therefore, it is foreseeable that the performance of
the two sub-systems may differ from each other. Since it is desirable and sufficient
to obtain only a single dereverberated signal for the case of a SIMO model, it is
While it is straightfor-
$$\mathrm{SDR} = \frac{10}{K}\sum_{k=0}^{K-1}\log_{10}\frac{\sum_{n=kN}^{kN+N-1} s^2(n)}{\sum_{n=kN}^{kN+N-1}\left[s(n) - \hat{s}(n)\right]^2}, \tag{6.23}$$
where $k$ is the segment index of the original and estimated source signals, such a
measure cannot be deployed for online applications since $s(n)$ is unknown. To track
the performance of $J_{\mathrm{AM1}}(n)$ and $J_{\mathrm{AM2}}(n)$ is proposed, where
(6.24)
(6.25)
.I~ ==
Figure 6.2: SDR of the first and second sub-systems (horizontal axis: number of iterations).
The SDR is adopted to quantify the performance of the first and second sub-systems
of A-RAM. Although it is not feasible to employ the SDR measure in practice, this
is included to illustrate the correlation between the performance of each sub-system
and the convergence of the cost functions $J_{\mathrm{AM1}}(n)$ and $J_{\mathrm{AM2}}(n)$.
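A segmental SDR of the kind used above can be sketched in a few lines. The code assumes the measure is the average over $K$ frames of length $N$ of $10\log_{10}$ of the ratio between the frame energy of $s(n)$ and the frame energy of $s(n) - \hat{s}(n)$; the function name, the test signal and the absence of any guard against silent or perfectly reconstructed frames are choices made for this illustration only.

```python
import numpy as np

def segmental_sdr(s, s_hat, N):
    """Average over frames of 10*log10(sum s^2 / sum (s - s_hat)^2), in dB."""
    s = np.asarray(s, dtype=float)
    s_hat = np.asarray(s_hat, dtype=float)
    K = len(s) // N                      # number of complete frames
    ratios = []
    for k in range(K):
        seg = slice(k * N, (k + 1) * N)
        num = np.sum(s[seg] ** 2)
        den = np.sum((s[seg] - s_hat[seg]) ** 2)
        ratios.append(10.0 * np.log10(num / den))
    return float(np.mean(ratios))

# Hypothetical signals: the estimate is the source scaled by 0.9,
# so every frame has an energy ratio of 100, i.e. 20 dB.
s = np.sin(2 * np.pi * 0.05 * np.arange(400))
sdr = segmental_sdr(s, 0.9 * s, N=100)
```

Because the measure needs the clean source, it is usable only in simulation, which is the motivation for tracking the cost functions instead.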
Figures 6.2 and 6.3 illustrate the performance of the first and second sub-systems in terms of SDR and the convergence of the cost functions, respectively.
As can be seen from Fig. 6.2, the SDR for the first sub-system is higher than that
for the second sub-system. This implies that the first sub-system outperforms the
second in terms of dereverberation. As shown in Fig. 6.3, $J_{\mathrm{AM1}}(n)$ converges at a
higher rate and achieves a lower value than $J_{\mathrm{AM2}}(n)$.
can select the dereverberated signal corresponding to the sub-system which has a
higher rate of convergence and a lower value of cost function $J_i(n)$.
Figure 6.3: Convergence of the cost functions of the first and second sub-systems (horizontal axis: number of iterations).
6.2.4
Significance of $J_{\mathrm{ar}}(n)$
To further examine the significance of $J_{\mathrm{ar}}(n)$ in (6.11) when the estimated AIRs are
imperfect, the performance of A-RAM with different values of $\beta$ for $M = 5$ is compared. Similar to Section 6.2.3, the SIMO system is partitioned into two sub-systems
where the first sub-system comprises the first three channels while the second sub-system comprises the remaining two channels. The AIRs are generated using the
method of images [51] with the same simulation setup described in Section 6.2.3.
Then, the performance of different algorithms is evaluated using the SDR measure.
To investigate the significance of the constraint $J_{\mathrm{ar}}(n)$ in the presence of estimation
error in AIRs, noise is added to the AIRs such that CNR
Figure 6.4: Convergence of SDR for A-MINT and A-RAM with CNR = 20 dB, $L = L_g = 1600$, $\mu = 0.001$ using WGN (horizontal axis: number of iterations).
$\beta = 0.05$
for A-RAM. It is also interesting to note from Fig. 6.4 that, in the presence of BSI
estimation errors, a fixed value of $\beta$ will incur a higher amount of gradient noise.
Table: Number of multiplications (columns: pre-calculation, every iteration).
Table: Number of additions (columns: pre-calculation, every iteration); surviving entry: $(M_1^2 + M_2^2)L^2 + 4(M_1 + M_2)L + 2$.
6.2.5
Computational complexity
Tables 6.2 and 6.3 illustrate the computational complexity of A-MINT and A-RAM
using a fixed value of $\beta$. For both algorithms, some terms in the update equation
can be computed before adaptation. For example, in A-MINT, $\mathcal{H}^T\mathbf{d}$ and $\mathcal{H}^T\mathcal{H}$ do not
vary with the number of iterations and can therefore be pre-computed. It is worth
noting that the only difference between A-RMINT and A-MINT is the regularization
term in $\mathcal{H}^T\mathcal{H} + \delta\mathbf{I}_{ML_g \times ML_g}$,
M 2 L 2 additions for the pre-calculation. As can be seen from Tables 6.2 and 6.3,
33-
complexity of (){ T}} per iteration and O{ I}} in the pre-calculation. A closer observation reveals that A-RAM with two sub-systems requires a lower computational
load than A-MINT and A-RMINT. This is because, taking the number of additions
per iteration as an example,
$$\left(M_1^2 + (M - M_1)^2\right)L^2 = \left(M^2 + 2M_1(M_1 - M)\right)L^2 < M^2L^2$$
since $0 < M_1 < M$. For the example case of $M = 5$ and $M_1 = 3$, the computational
complexity of A-RAM is reduced approximately by half since $M_1^2 + M_2^2 = 13 \approx \frac{1}{2}M^2$.
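The arithmetic behind this reduction can be checked directly. The short sketch below evaluates only the ratio of the dominant $L^2$ terms, $(M_1^2 + (M - M_1)^2)/M^2$; the function name is hypothetical and lower-order terms in the tables are deliberately ignored.

```python
def addition_ratio(M, M1):
    """Ratio of the dominant L^2 addition terms: (M1^2 + (M - M1)^2) / M^2."""
    M2 = M - M1
    return (M1 ** 2 + M2 ** 2) / M ** 2

ratio = addition_ratio(5, 3)   # (9 + 4) / 25
# Every admissible split 0 < M1 < M gives a ratio below one.
all_ratios = [addition_ratio(5, m1) for m1 in range(1, 5)]
```

For $M = 5$ the ratio is smallest for the most balanced splits ($M_1 = 2$ or $3$), which is the case considered in the text.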
6.3
6.3.1
In this simulation, the same AIRs are adopted as those described in Section 6.2.3.
The scalar multiplier $\beta = 0.05$ is chosen for A-RAM while $\delta = 0.01$ is selected for
A-RMINT [101]. A step-size $\mu$
order to be consistent with the simulations in Section 6.3.2 where these algorithms
achieve approximately the same initial convergence. Since a speech signal has been
used, the performance of these algorithms is assessed by evaluating the quality of
the dereverberated speech using the bark spectral distortion (BSD) measure [104]
defined by
$$\mathrm{BSD} = \frac{\frac{1}{K}\sum_{k=1}^{K}\sum_{i=1}^{N_c}\left[B_k^{s}(i) - B_k^{\hat{s}}(i)\right]^2}{\frac{1}{K}\sum_{k=1}^{K}\sum_{i=1}^{N_c}\left[B_k^{s}(i)\right]^2}, \tag{6.26}$$
where $B_k^{s}(i)$ and $B_k^{\hat{s}}(i)$ are the bark spectral components of the $k$th segment of the
original and dereverberated speech, respectively, and $N_c$ is the number of critical
bands. It has been reported in [104] that the BSD is a perceptually motivated
objective measure of speech quality which has a statistically linear relationship with
the mean opinion score. One can interpret from (6.26) that a lower BSD value
corresponds to a better dereverberated speech.
Figure 6.5: BSD comparison between A-MINT, A-RMINT and the proposed A-RAM
algorithm using speech input and AIRs generated using the image model with $L = 1600$ (horizontal axis: number of iterations).
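Given Bark spectra that have already been computed, the BSD ratio can be evaluated as sketched below. The sketch assumes the measure is the frame-averaged squared difference of the Bark spectra normalized by the frame-averaged energy of the reference Bark spectrum; the function name and the toy arrays (of shape $K \times N_c$) are hypothetical, and the computation of the Bark spectra themselves (critical-band filtering and loudness mapping) is outside the scope of this example.

```python
import numpy as np

def bark_spectral_distortion(B_ref, B_est):
    """BSD from Bark spectra of shape (K frames, Nc critical bands)."""
    B_ref = np.asarray(B_ref, dtype=float)
    B_est = np.asarray(B_est, dtype=float)
    K = B_ref.shape[0]
    num = np.sum((B_ref - B_est) ** 2) / K   # mean squared Bark-spectral difference
    den = np.sum(B_ref ** 2) / K             # mean Bark-spectral energy of the reference
    return num / den

# Toy Bark spectra (placeholders, not derived from speech).
B_clean = np.array([[1.0, 2.0, 3.0],
                    [2.0, 1.0, 0.5]])
bsd_same = bark_spectral_distortion(B_clean, B_clean)
bsd_half = bark_spectral_distortion(B_clean, 0.5 * B_clean)
```

Identical spectra give a BSD of zero and larger deviations give larger values, in line with the interpretation that a lower BSD indicates better dereverberated speech.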
Figure 6.5 illustrates the BSD performance of A-MINT, A-RMINT and A-RAM. As can be seen, the proposed A-RAM algorithm achieves the highest rate of
convergence compared to A-MINT and A-RMINT. The performance of the MINT
algorithm has also been included in Fig. 6.5 for comparison. It can be seen that all
the algorithms can achieve nearly perfect speech dereverberation in the steady state
since no noise is assumed in this illustrative example.
Figure 6.6: BSD comparison between A-MINT, A-RMINT and proposed A-RAM algorithm using recorded AIRs with $L = 1600$, SNR = 35 dB, for $\mu$ = 0.02, 0.025 and 0.03 (horizontal axis: number of iterations).
6.3.2
Next, Monte Carlo simulation results of A-MINT, A-RMINT and A-RAM
using recorded AIRs in a typical classroom environment, obtained from [105], are
shown. In total, twenty sets of five AIRs, i.e., one hundred AIRs, are used in
this simulation, where each AIR is re-sampled at 8 kHz and is of length $L = 1600$.
Here two noisy scenarios are considered where pink noise is added to the reverberant
speech $x_i(n)$ to achieve SNRs of 35 dB and 15 dB. As before, $\beta = 0.05$ is adopted for
A-RAM and the suggested optimal regularization parameters $\delta = 0.018$ and 0.18 [101]
are used in A-RMINT for SNR
Figure 6.6 shows BSD results averaged across these twenty sets of AIRs for
SNR = 35 dB with $\mu$ = 0.02, 0.025 and 0.03. As can be seen, the convergence and
steady-state performance of A-MINT, A-RMINT and A-RAM vary with step-size
as expected, i.e., a larger step-size will achieve a faster convergence with a trade-off
==
==
0.02 when
15 dB. These parameters are chosen since the algorithms can achieve approximately the same initial convergence with this step-size. As can be seen from Figs. 6.6
and 6.7, A-MINT can achieve only limited performance when noise is present in the
received signals, while A-RMINT and A-RAM can achieve significant improvement
in equalization performance, given by a much lower steady-state BSD value. The
improvement of A-RMINT is due to the regularization parameter while, for A-RAM,
this improvement is due to the additional auto-relation constraint that requires the
dereverberated speech from the two sub-systems to be equivalent. As observed from
Figs. 6.6 and 6.7, A-RAM outperforms A-RMINT by offering a higher rate of convergence. The standard deviations of these results (averaged across the twenty sets of
AIRs) are 0.1889 for A-MINT, 0.1323 for A-RMINT and 0.1147 for A-RAM when
SNR
are 0.2218 for A-MINT, 0.1086 for A-RMINT and 0.114 for A-RAM.
It can also be noted from Fig. 6.6 that the steady state of A-RAM is modestly lower than
MINT
implies that A-RAM can achieve better speech recovery in the presence of noise.
Such noise robustness is due to the cost function of A-RAM, which aims to reconstruct the received signal by exploiting the relationship between $\hat{s}(n)$ and $x_i(n)$ in
(6.1) and (6.2). While A-MINT and A-RMINT do not take such a relationship into
account, by minimizing the cost function of A-RAM, the dereverberated signals of
the two sub-systems are constrained to be equivalent, which results, to some extent, in robustness to additive noise that is uncorrelated across the channels. The
above also explains why, without noise, all algorithms can achieve equal steady-state
performance as shown in Fig. 6.5.
Figure 6.7: BSD comparison between A-MINT, A-RMINT and proposed A-RAM algorithm using recorded AIRs with $L = 1600$, SNR = 15 dB (horizontal axis: number of iterations).
In addition to the BSD comparison, Fig. 6.8 illustrates $\hat{\mathbf{d}}$ computed using the estimated inverse filters from A-MINT, A-RMINT and A-RAM at the 820th iteration
of Fig. 6.7. This number of iterations is chosen for illustration because A-RAM has
approximately converged while A-MINT and A-RMINT have not. The corresponding BSD values are 3.167, 1.074 and 0.4677, which are denoted by $B_1$, $B_2$ and $B_3$
in Fig. 6.7. As can be seen from Fig. 6.8, $\hat{\mathbf{d}}$ obtained from A-RAM is closer to the
Kronecker delta function than that of A-MINT and A-RMINT.
Figure 6.8: Equalized AIRs from (a) A-MINT, (b) A-RMINT and (c) A-RAM (horizontal axis: sample index).
In order to visualize the quality of dereverberated speech, the spectrograms
of the (a) clean, (b) reverberant and dereverberated speech corresponding to (c) $B_1$
from A-MINT, (d) $B_2$ from A-RMINT and (e) $B_3$ from A-RAM are plotted in
Fig. 6.9, where $B_1$, $B_2$ and $B_3$ are denoted in Fig. 6.7. These spectrograms are
computed using a Hanning window of length 256 samples with an overlapping factor
of 50%. The different colors depict the energy distribution of the speech, where a dark
area represents a speech frame with high energy while a light area corresponds to
a low-energy frame.
at 8 kHz and, for clarity of presentation, the spectrogram of the speech signal for
the first 3.2 s is shown. As can be seen from Fig. 6.9 (b), the spectrogram of the
reverberant speech is smeared compared to that of the clean speech in Fig. 6.9 (a).
This effect can be clearly seen in the region of 0-2 kHz at 0.3 s, 0.5 s, 1.5 s and
3 s. It can be seen from Figs. 6.9 (c)-(e) that speech is being dereverberated since
their spectrograms look more similar to that of the clean speech. More importantly,
comparing frequencies within the region of 0-2 kHz from time 0.4 to 0.5 s, the
spectrogram of Fig. 6.9 (e) shows less energy smearing along the time axis. In
addition, within the region of 0-2 kHz from time 1 to 1.2 s, 2 to 2.1 s and 2.4 to
2.6 s, the spectrogram of Fig. 6.9 (e) shows distinct frequency separation compared
to Figs. 6.9 (c) and (d). As noted in Figs. 6.9 (a) and (c)-(e), the A-RAM algorithm
reduces the smearing effect significantly, which in turn suggests that the quality of
the dereverberated speech of A-RAM is better than that of A-MINT and A-RMINT.
Figure 6.9: Spectrograms of (a) clean, (b) reverberant speech and dereverberated speech
corresponding to (c) $B_1$ of A-MINT, (d) $B_2$ of A-RMINT and (e) $B_3$ of A-RAM (horizontal axis: time (s)).
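A spectrogram with the stated analysis settings (Hanning window of 256 samples, 50% overlap, 8 kHz sampling) can be sketched with NumPy alone. The function name, the use of the squared magnitude of the short-time FFT and the 1 kHz test tone are assumptions for illustration; the speech data and plotting conventions of the thesis are not reproduced.

```python
import numpy as np

def spectrogram(x, win_len=256, overlap=0.5):
    """Power spectrogram: rows are frequency bins, columns are frames."""
    hop = int(win_len * (1.0 - overlap))          # 128 samples for 50% overlap
    window = np.hanning(win_len)
    starts = range(0, len(x) - win_len + 1, hop)
    frames = np.stack([x[i:i + win_len] * window for i in starts])
    return (np.abs(np.fft.rfft(frames, axis=1)) ** 2).T

fs = 8000                                  # sampling rate in Hz
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000.0 * t)      # 1 s test tone (assumed input, not thesis data)
S = spectrogram(tone)
peak_bins = np.argmax(S, axis=0)           # strongest frequency bin in each frame
bin_hz = fs / 256                          # frequency resolution: 31.25 Hz per bin
```

With these settings each column covers 32 ms of signal and consecutive columns are 16 ms apart, which sets the time resolution at which smearing along the time axis can be observed.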
Lastly, the quality of the dereverberated speech is further assessed by the
mean opinion score (MOS) test. With the same simulation setup as that described
in Fig. 6.6, five pairs of dereverberated speech are selected at the 200th, 400th, 600th,
800th and 1000th iterations from A-MINT and A-RAM, respectively. In this subjective test, twenty fluent English speakers are employed, among which fifteen are male
while five are female. To better describe the fine differences between the dereverberated
speech from these two algorithms, the subjects are allowed to give decimal scores. To
illustrate the improvement in the quality of the dereverberated speech when compared
[Figure 6.10 plot; horizontal axis: number of iterations (100 to 1000).]
Figure 6.10: Mean opinion scores of the selected 5-pair dereverberated speeches.
to the reverberant speech, we included the MOS measure for the reverberant speech,
which has an MOS score of 2, as the starting point of the MOS result curves for
both algorithms. As can be seen in Fig. 6.10, (i) the MOS results of the
proposed A-RAM algorithm are always higher than those of the A-MINT algorithm;
(ii) the improvement in quality of dereverberated speech in terms of MOS from the
proposed A-RAM algorithm is more significant than that of the A-MINT algorithm
within the simulation interval of 0-400 iterations. These show that the proposed
A-RAM algorithm outperforms the A-MINT algorithm by offering a higher rate of
convergence for speech dereverberation. In addition, it is important to note from
Figs. 6.6 and 6.10 that a lower BSD measure corresponds to a higher MOS evaluation score. The results presented for the BSD and MOS measures indicate that these
two speech reverberation measurement methods are reliable.
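The MOS of a speech sample is the arithmetic mean of the scores given by the listeners. The sketch below illustrates this computation for one pair of samples. The listener scores are hypothetical placeholders and not the results reported in Fig. 6.10, and the 1-to-5 rating scale is an assumption.

```python
from statistics import mean

def mean_opinion_score(scores, low=1.0, high=5.0):
    """Average the (possibly decimal) listener scores for one speech sample."""
    if not scores:
        raise ValueError("at least one score is required")
    if any(not (low <= s <= high) for s in scores):
        raise ValueError("scores must lie within the rating scale")
    return mean(scores)

# Hypothetical decimal scores from four listeners (placeholders only).
hypothetical_scores = {
    "A-MINT": [2.5, 3.0, 2.8, 2.7],
    "A-RAM": [3.5, 3.8, 3.6, 3.7],
}
mos = {name: mean_opinion_score(s) for name, s in hypothetical_scores.items()}
```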
6.4 Conclusion
Chapter 7
Discussion and Conclusions
7.1 Summary
determined control parameter. Since the value of this control parameter is expected to vary with the environment, the sparseness measure of the estimated impulse response is incorporated in order to compute the time-varying weights assigned to the proportionate and non-proportionate terms. Specifically, two mechanisms are proposed to achieve this. In the proposed SC-IPAPA-I,
additional weighting terms computed from the sparseness measure of the estimated
impulse response are multiplied with the proportionate and non-proportionate terms of
the conventional IPAPA. Furthermore, the proposed SC-IPAPA-II reduces the need
to determine the control parameter by employing a sparseness-dependent control
parameter. The proposed SC-IPAPA-I and SC-IPAPA-II ensure that the filter coefficients with large magnitudes adapt with a larger step-size for a sparse system
than for a dispersive system. On the other hand, the small filter coefficients
in a sparse system are assigned a smaller step-size than in a dispersive system. The proposed SC-IPAPA-II removes the need for a
pre-defined control parameter, which results in a higher rate of convergence that is
robust to the sparseness of the impulse response.
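The dependence of the effective step-size on sparseness can be illustrated with the following sketch. It assumes the commonly used sparseness measure based on the ratio of the l1- and l2-norms and the per-tap gains of the improved proportionate approach [34]. The function names and the control parameter value are illustrative, and the sparseness-dependent weighting of SC-IPAPA-I and SC-IPAPA-II is not reproduced here.

```python
import numpy as np

def sparseness(h, eps=1e-12):
    """Sparseness in [0, 1]: 0 for a flat response, 1 for a single non-zero tap."""
    h = np.asarray(h, dtype=float)
    L = h.size
    l1 = np.sum(np.abs(h))
    l2 = np.sqrt(np.sum(h ** 2))
    return (L / (L - np.sqrt(L))) * (1.0 - l1 / (np.sqrt(L) * l2 + eps))

def improved_proportionate_gains(h_est, alpha=0.0, eps=1e-8):
    """Per-tap gains: a non-proportionate term plus a proportionate term."""
    h_est = np.asarray(h_est, dtype=float)
    L = h_est.size
    non_proportionate = (1.0 - alpha) / (2.0 * L)
    proportionate = (1.0 + alpha) * np.abs(h_est) / (2.0 * np.sum(np.abs(h_est)) + eps)
    return non_proportionate + proportionate

# Illustrative impulse responses (assumptions): one sparse, one dispersive.
sparse_h = np.zeros(64)
sparse_h[5] = 1.0
dispersive_h = np.ones(64)

xi_sparse = sparseness(sparse_h)
xi_dispersive = sparseness(dispersive_h)
gains_sparse = improved_proportionate_gains(sparse_h)
gains_dispersive = improved_proportionate_gains(dispersive_h)
```

With these inputs the dominant tap of the sparse response receives a much larger gain than any tap of the dispersive response, while the inactive taps of the sparse response receive a smaller gain, which is the behaviour described above.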
Chapters 3 and 4 are dedicated to BSI. In these chapters, the MCLMS [20] and
NMCFLMS [21] algorithms are reviewed. It has been shown that these algorithms
lack robustness to additive noise that is uncorrelated across
the channels. The time-domain analysis of MCLMS in Chapter 3 revealed that the
additive noise requires the estimates of the different channels to be equivalent. These
solutions are trivial since the channels differ in practice. The proposed IMCLMS
algorithm in Chapter 3 addresses this issue by utilizing a constrained cross-relation
cost function which mitigates the cross-relation error due to noise, thus achieving
noise robustness. The misalignment analysis performed on MCLMS and IMCLMS
under the same assumptions also showed that the proposed IMCLMS algorithm
improves steady-state performance. In addition, Monte Carlo simulation
results showed that IMCLMS can achieve an improvement in NPM of approximately
5 dB over MCLMS.
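The cross-relation between two channels and the NPM used to evaluate the estimates can be sketched as follows. The code assumes a two-channel SIMO system driven by a white source and uses the usual definition of the NPM, in which the true impulse response is projected onto the estimate so that an arbitrary scaling factor is ignored. All signal lengths, the noise level and the perturbation of the estimate are illustrative.

```python
import numpy as np

def cross_relation_error(x1, x2, h1_est, h2_est):
    """x1 * h2 - x2 * h1 (convolutions); zero for noise-free data and exact estimates."""
    return np.convolve(x1, h2_est) - np.convolve(x2, h1_est)

def npm_db(h_true, h_est):
    """Normalized projection misalignment in dB, insensitive to a scale factor."""
    scale = np.dot(h_true, h_est) / np.dot(h_est, h_est)
    residual = h_true - scale * h_est
    return 20.0 * np.log10(np.linalg.norm(residual) / np.linalg.norm(h_true))

rng = np.random.default_rng(0)
s = rng.standard_normal(400)                       # white source (assumption)
h1, h2 = rng.standard_normal(8), rng.standard_normal(8)
x1, x2 = np.convolve(s, h1), np.convolve(s, h2)    # noise-free received signals

clean_error = np.linalg.norm(cross_relation_error(x1, x2, h1, h2))
noisy_error = np.linalg.norm(cross_relation_error(
    x1 + 0.1 * rng.standard_normal(len(x1)),
    x2 + 0.1 * rng.standard_normal(len(x2)),
    h1, h2))

# A scaled and slightly perturbed estimate of the stacked channel vector.
h = np.concatenate([h1, h2])
h_est = 2.5 * (h + 0.05 * rng.standard_normal(len(h)))
npm = npm_db(h, h_est)
```

The non-zero `noisy_error` for the true channels shows why a cost function built on the cross-relation alone is biased once uncorrelated noise is added to the received signals.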
As shown in Chapter 4, although the DP-NMCFLMS algorithm [26] can address the misconvergence problem associated with additive noise for NMCFLMS,
it suffers from slow convergence. The proposed DP-NMCFLMS-PC algorithm in
Chapter 4 not only addresses the noise robustness issue of NMCFLMS, but also
achieves a higher rate of convergence than DP-NMCFLMS. The main contribution of
DP-NMCFLMS-PC is the additional power constraint, which is achieved by rotating
the estimated $\mathbf{h}_i(m)$ to its following updated value $\mathbf{h}_i(m+1)$ along the tangential
gradient of the surface $\|\mathbf{h}_i(m+1)\|_2^2 = \vartheta_i(m)$.
Another contribution of Chapter 4 is the DNE technique for DP-NMCFLMS-PC. This technique performs an online misconvergence point estimation using a
power constraint that is close to the power of the true AIRs. In a practical implementation, this is achievable by computing the power of the estimated AIRs before
the algorithm misconverges. Therefore, estimating when the algorithm misconverges is important for DP-NMCFLMS-PC. The proposed DNE technique achieves
this by monitoring the gradient of the l2-norm of the estimated AIRs. In the context of the DNE technique, the misconvergence point is defined as the time
point at which the change in the gradient of the l2-norm of the estimated AIRs is smaller than
a predefined threshold. It is shown through simulations that the convergence time
of the proposed DP-NMCFLMS-PC algorithm is approximately one quarter of that of the
DP-NMCFLMS algorithm. In addition, the proposed DP-NMCFLMS-PC algorithm
offers an improvement in steady-state NPM of approximately 1.5 dB.
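The detection rule described above can be sketched as a simple test on the trajectory of the l2-norm of the estimated AIRs. The sketch below is an illustrative interpretation and not the DNE implementation of Chapter 4: the gradient is approximated by a first difference, the change in the gradient by a second difference, and the function name, threshold and synthetic trajectory are all assumptions.

```python
import numpy as np

def misconvergence_point(norm_history, threshold=1e-3):
    """Return the first frame index at which the change in the gradient of the
    norm trajectory drops below `threshold`, or None if it never does."""
    norms = np.asarray(norm_history, dtype=float)
    gradient = np.diff(norms)                 # first difference approximates the gradient
    change = np.abs(np.diff(gradient))        # change in the gradient
    below = np.nonzero(change < threshold)[0]
    # change[k] is computed from norms[k], norms[k + 1] and norms[k + 2].
    return int(below[0]) + 2 if below.size else None

# Synthetic norm trajectory that rises and then flattens (assumption).
trajectory = 1.0 - np.exp(-np.arange(200) / 20.0)
point = misconvergence_point(trajectory, threshold=1e-3)
```

In practice the threshold would have to be chosen with care, since a trajectory that happens to be smooth during the initial convergence could trigger the test too early.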
Chapters 5 and 6 are dedicated to the problem of channel equalization. Chapter 5 details improvements made to the MINT and A-MINT algorithms using the
sparseness constraint.
avoids matrix inversion via an adaptive approach, thus achieving computational efficiency. However, the main weakness of A-MINT is its slow convergence. To address
this problem, it is proposed to exploit the sparseness property of the Kronecker delta
function. In view of this, the SC-MINT algorithm is developed for acoustic channel
equalization. The main contribution is to improve the convergence speed by suppressing any undesirable non-zero coefficients in the Kronecker delta function that
is constructed from the estimated inverse filters at each iteration. Simulation results
using AIRs generated by the method of images and recorded AIRs showed that the
proposed SC-MINT algorithm can achieve faster convergence than A-MINT.
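For reference, the non-adaptive MINT solution that A-MINT approximates iteratively can be sketched as a least-squares problem: the inverse filters are chosen so that the sum of the channel-filter convolutions equals a Kronecker delta. The random two-channel AIRs, the filter length and the helper names below are assumptions for illustration only.

```python
import numpy as np

def convolution_matrix(h, n_cols):
    """Matrix C of size (len(h) + n_cols - 1) x n_cols with C @ g == np.convolve(h, g)."""
    L = len(h)
    C = np.zeros((L + n_cols - 1, n_cols))
    for j in range(n_cols):
        C[j:j + L, j] = h
    return C

def mint_inverse_filters(channels, filter_len, delay=0):
    """Least-squares inverse filters g_i such that sum_i h_i * g_i approximates a delta."""
    H = np.hstack([convolution_matrix(h, filter_len) for h in channels])
    d = np.zeros(H.shape[0])
    d[delay] = 1.0                              # Kronecker delta target
    g, *_ = np.linalg.lstsq(H, d, rcond=None)
    return g.reshape(len(channels), filter_len)

rng = np.random.default_rng(1)
channels = [rng.standard_normal(8) for _ in range(2)]   # hypothetical AIRs, L = 8
inverse_filters = mint_inverse_filters(channels, filter_len=7)

# Equalized response: sum over channels of h_i convolved with g_i.
equalized = sum(np.convolve(h, g) for h, g in zip(channels, inverse_filters))
delta = np.zeros_like(equalized)
delta[0] = 1.0
```

The sparseness exploited by SC-MINT refers to this target: apart from one coefficient, every entry of `delta` is zero.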
In Chapter 6, an adaptive sub-system based multichannel system equalization
algorithm that performs inverse filtering of room acoustics is proposed. Conventional channel equalization algorithms do not take the source signal into account
and consider only the estimated AIRs obtained from BSI algorithms. Given that
such equalization techniques have been developed independently of BSI, the performance of existing approaches for dereverberation is limited to a great extent. In the
proposed A-RAM algorithm, however, how the reverberant speech was generated is first
taken into consideration by utilizing a sub-system configuration. In addition to the
constraints required by conventional equalization algorithms, the difference between
the outputs of the two sub-systems is further minimized. It is subsequently shown
through simulations that the proposed algorithm is able to achieve fast convergence
for the inverse filtering of room acoustics. Moreover, it has been shown through
complexity analysis that the proposed A-RAM algorithm with two sub-systems is
more computationally efficient than the existing MINT-based algorithms.
7.2
This research has focused on the development of adaptive algorithms for channel identification and equalization with applications to echo cancellation and speech
dereverberation. The following are suggestions for future research:
3. Speech dereverberation in multiple-input multiple-output (MIMO) system. This research has also focused on a SIMO model where only
one source signal is present. In a more realistic scenario, multiple speakers
can be involved, which brings speech dereverberation into the case of a MIMO
system. Existing research such as blind source separation (BSS) attempts to
separate different source signals. However, all the separated signals are filtered versions of the original speech. In view of this, speech dereverberation
is desired in BSS as well. Therefore, one possible future work is to extend and
develop algorithms for joint speech dereverberation and BSS, where
the aim is to improve the quality of the separated speech.
Appendix A
Convergence analysis of A-MINT,
A-RMINT and A-RAM
Convergence analysis of the A-MINT, A-RMINT and A-RAM algorithms is provided
in this appendix. The motivation for this analysis is to explain why the proposed A-RAM algorithm, described in Section 6.2.1, can achieve fast convergence.
For mathematical tractability, the following are assumed in the analysis:
1. The channel hi, i = 1,
formed from hi, i
1,
a; [42].
(A.I)
E {Z A-MINT ( n )}
[IMLgxMLg -
2J-L1T1r z (O)
- [IMLgxMLg -
2/l1T1
OMLgXl,
rg,
(A.2)
as common in practice.
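The role of the step-size in this type of convergence result can be checked numerically. The sketch below assumes a gradient-descent update of the form g(n) = g(n-1) - 2μ H^T [H g(n-1) - d], for which the coefficient error z(n) = g(n) - g_opt obeys z(n) = [I - 2μ H^T H] z(n-1). The random matrix H, the target d and the choice of μ are illustrative and do not correspond to the AIRs used in the simulations.

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.standard_normal((20, 6))          # hypothetical system matrix
d = np.zeros(20)
d[0] = 1.0                                # target response
g_opt, *_ = np.linalg.lstsq(H, d, rcond=None)

R = H.T @ H
mu = 0.5 / np.linalg.eigvalsh(R).max()    # small step-size: eigenvalues of A lie in [0, 1)
A = np.eye(6) - 2.0 * mu * R

n_iter = 200
g = np.zeros(6)
errors = []
for _ in range(n_iter):
    g = g - 2.0 * mu * (H.T @ (H @ g - d))   # gradient-descent update
    errors.append(np.linalg.norm(g - g_opt))

# Closed-form prediction: z(n) = A^n z(0) with z(0) = g(0) - g_opt = -g_opt.
predicted = np.linalg.matrix_power(A, n_iter) @ (-g_opt)
```

The iterated error matches the closed-form prediction, and its norm decays monotonically because every eigenvalue of `A` has magnitude below one for this step-size.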
In the proposed A-RAM algorithm, the update equations for the two sub-systems are
1)] ,(A.3)
The update equation (A.3) is now used to illustrate the convergence of the first
sub-system of A-RAM. For clarity of presentation, the time index n
is omitted from e(n), g1(n) and x1(n). Following an approach similar to that of (A.2), one may
$$\mathbf{z}_1(n) = \left[\mathbf{I}_{M_1L_g \times M_1L_g} - 2\mu\mathbf{H}_1^T\mathbf{H}_1\right]\mathbf{z}_1(n-1) - 2\mu\beta e\,\mathbf{x}_1$$
[I M l L g XMlL g
[IMlLgXMlLg -
(A.5
( L
X1'iZ1'i) Xl
1,=0
(A.6)
xaZI(n - 1),
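where
$$\mathbf{X}_a = \begin{bmatrix} x_{1,0}^2 & x_{1,0}x_{1,1} & \cdots & x_{1,0}x_{1,M_1L_g-1} \\ x_{1,0}x_{1,1} & x_{1,1}^2 & \cdots & x_{1,1}x_{1,M_1L_g-1} \\ \vdots & \vdots & \ddots & \vdots \\ x_{1,0}x_{1,M_1L_g-1} & x_{1,1}x_{1,M_1L_g-1} & \cdots & x_{1,M_1L_g-1}^2 \end{bmatrix} \tag{A.7}$$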
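$$\mathbf{z}_1(n) = \left[\mathbf{I}_{M_1L_g \times M_1L_g} - 2\mu\mathbf{H}_1^T\mathbf{H}_1 - 2\mu\beta\mathbf{X}_a\right]\mathbf{z}_1(n-1) + 2\mu\beta\,\mathbf{x}_2^T\mathbf{z}_2(n-1)\,\mathbf{x}_1 \tag{A.8}$$
$$\mathbf{z}_2(n) = \left[\mathbf{I}_{(M-M_1)L_g \times (M-M_1)L_g} - 2\mu\mathbf{H}_2^T\mathbf{H}_2 - 2\mu\beta\mathbf{X}_b\right]\mathbf{z}_2(n-1) + 2\mu\beta\,\mathbf{x}_1^T\mathbf{z}_1(n-1)\,\mathbf{x}_2,$$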
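where
$$\mathbf{X}_b = \begin{bmatrix} x_{2,0}^2 & x_{2,0}x_{2,1} & \cdots & x_{2,0}x_{2,(M-M_1)L_g-1} \\ x_{2,0}x_{2,1} & x_{2,1}^2 & \cdots & x_{2,1}x_{2,(M-M_1)L_g-1} \\ \vdots & \vdots & \ddots & \vdots \\ x_{2,0}x_{2,(M-M_1)L_g-1} & x_{2,1}x_{2,(M-M_1)L_g-1} & \cdots & x_{2,(M-M_1)L_g-1}^2 \end{bmatrix} \tag{A.10}$$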
zl(n)
[IMILgXMILg -
+2jL,Bxr{ [I(M-Ml)LgX(M-MI)L g
+2jL,Bxfzl(n - 2)X2
[IMILgXMILg -
}XI
+2jLt:ixfzl(n -
2)X2
}XI'
3)
(A.H)
Similar to [42], a small step-size is assumed, such that the higher orders of $\mu$ are
sufficiently small to be approximated by 0, i.e., $\mu^n \to 0$ for $n \geq 2$.
Hence,
[IMILgXMILg
2/-L1-lf1-l2
}Xl}
2/-L1-l'[1-l 1
-2/-L,BE{xI {[I(M-Ml)LgX(M-Ml)Lg
(A.I2
2/-L1-lI1-l2 - 2/-L,BE{x
b}rg2
}Xl}
(J"2
(J"2
E{X a }
(A.13)
(J"2
MILgxMIL g
and similarly
o
o
In addition, denoting
as
A(M-MdLg x],
(A.14)
[I(M-M1)LgX(M-MdLg -
2/--l1-lr1-l 2
E{ xf
{[I(M-M1)LgX(M-MdLg
~ 2j1,1-lI1-l2 -
(M - M l )L9 - l
XI,O
X2,i Ai
(M - M l )L9 - l
XI,1
Xl,MlLg-l
X2,i Ai
J\;11L gxl
(A.15)
OMILgXlo
1, .
M', i.e.,
{Xl,OX2,O}
o.
(A.16)
E {zl(n)}
(A.17)
It should be noted that the update equation of A-RMINT, which can be formulated
g(n
+ 1) == g(n)
- M (-21i T d
(A.lS)
Following the same derivation of (A.2), the convergence behavior of A-RMINT can
be obtained as
Appendix B
Publications arising from this
thesis
Journal papers
1. L. Liao and A. W. H. Khong, "An adaptive sub-systems based algorithm for
275-278, 2011.
Conference papers
1. L. Liao and A. W. H. Khong, "Equalization of multichannel acoustic system
using sub-systems for speech dereverberation," in Proc. IEEE Int. Conf.
Acoust., Speech and Signal Process., May 22-27, 2011, pp. 313-316.
Workshop on Acoust.
Bibliography
[1] A. W. H. Khong, "Adaptive algorithms employing tap selection for single channel and stereophonic acoustic echo cancellation," Ph.D. dissertation, Imperial
College London, 2006.
[2] L. Rabiner and R. Schafer, Digital processing of speech signals. NJ: Prentice-Hall, 1978.
[5] R. Mofett, "Echo and delay problems in some digital communication systems,"
IEEE Commun. Mag., vol. 25, pp. 41-47, 1987.
in
[7] K. Miura, H. Fujiya, T. Mizuno, and T. Ushiki, "Cell-based echo canceller for
voice communications over ATM networks," in Proc. IEEE Telecommunications
Conference, vol. 1, 1995, pp. 77-82.
[14] P. Jinachitra and R. E. Prieto, "Towards speech recognition oriented dereverberation," in Proc. IEEE Int. Conf. Acoust., Speech and Signal Process., 2005,
pp. 437-440.
[15] B. W. Gillespie and L. E. Atlas, "Acoustic diversity for improved speech recognition in reverberant environments," in Proc. IEEE Int. Conf. Acoust., Speech
and Signal Process., vol. 1, 2002, pp. 557-560.
[16] A. Sehr and W. Kellermann, "Strategies for modeling reverberant speech in the
feature domain," in Proc. IEEE Int. Conf. Acoust., Speech and Signal Process.,
2009, pp. 3725-3728.
[17] T. Gansler, J. Benesty, S. L. Gay, and M. M. Sondhi, "A robust proportionate
affine projection algorithm for network echo cancelation," in Proc. IEEE Int.
Conf. Acoust., Speech and Signal Process., vol. 2, 2000, pp. 793-796.
[18] O. Hoshuyama, R. A. Goubran, and A. Sugiyama, "A generalized proportionate variable step-size algorithm for fast changing acoustic environments," in
Proc. IEEE Int. Conf. Acoust., Speech and Signal Process., vol. IV, 2004, pp.
161-164.
[19] K. Sakhnov, "An improved proportionate affine projection algorithm for network echo cancelation," in Proc. IEEE Int. Conf. System Signals and Image
Process., Jun. 2008, pp. 125-128.
controlled algorithms for echo cancellation," IEEE Trans. Audio, Speech and
for echo cancellation," in Proc. Asia Pacific Signal and Information Processing
exploiting sparseness constraint," IEEE Signal Process. Lett., vol. 18, pp. 275-278, 2011.
[30] - - , "Equalization of multichannel acoustic system using sub-systems for
speech dereverberation," in Proc. IEEE Int. Conf. Acoust., Speech and Sig-
[31] M. Nekuii and M. Atarodi, "A fast converging algorithm for network echo
cancelation," IEEE Signal Process. Lett., vol. 11, pp. 427-430, 2004.
[32] D. Sabolic, R. Malaric, and A. Bazant, "Some properties of impulse response
of distribution networks with periodic tree-type topology," IEEE Trans. Power
Delivery, vol. 26, pp. 2416-2427, 2011.
[33] D. L. Duttweiler, "Proportionate normalized least mean squares adaptation in
echo cancellers," IEEE Trans. Speech and Audio Process., vol. 8, pp. 508-518,
Sep. 2000.
[34] J. Benesty and S. L. Gay, "An improved PNLMS algorithm," in Proc. IEEE
Int. Conf. Acoust., Speech and Signal Process., vol. 2, 2002, pp. 1881-1884.
[35] A. W. H. Khong and P. A. Naylor, "Efficient use of sparse adaptive filters," in
Proc. Int. Conf. Signals, Systems and Computers, Oct. 2006, pp. 1375-1379.
[36] A. N. Birkett and R. A. Goubran, "Acoustic echo cancellation for hands-free
telephony using neural networks," in Proc. IEEE Workshop Neural Networks
for Signal Process., Sep. 1994, pp. 249-258.
[37] J. Beh, T. Lee, I. Lee, H. Kim, S. Ahn, and H.
Ko,
[39] A. W. H. Khong, P. A. Naylor, and J. Benesty, "A low delay and fast converging
improved proportionate algorithm for sparse system identification," EURASIP
Journal Audio, Speech and Music Process., vol. 2007, Jan. 2007.
[40] A. W. H. Khong, X. S. Lin, and P. A. Naylor, "Algorithms for identifying
clusters of near common zeros in multichannel blind system identification and
equalization," in Proc. IEEE Int. Conf. Acoust. Speech and Signal Process.,
2008, pp. 389-392.
[41] J. Benesty, Y. A. Huang, J. Chen, and P. A. Naylor, "Adaptive algorithms for
the identification of sparse impulse responses," in Topics in Acoust. Echo and
[43] C. Paleologu, J. Benesty, and S. Ciochina, "Regularization of the affine projection algorithm," IEEE Trans. Circuits and Systems-II: Express Briefs, vol. 58,
pp. 366-370, Jun. 2011.
[44] H.-C. Shin and W.-J. Song, "Affine projection algorithms with adaptive regularization matrix," in Proc. IEEE Int. Conf. Acoust., Speech and Signal Pro-
Reference to Other Acoustical Parameters, Switzerland: International Standards Organization, 1997, ISO 3382-1997.
[48] E. K. Miller, "A comparison of solution accuracy resulting from factoring and
inverting ill-conditioned matrices," in Proc. IEEE Int. Symposium Antennas
Press., 1988.
[50] H. R. Abutalebi, H. Sheikhzadeh, R. L. Brennan, and G. H. Freeman, "Affine
projection algorithm for oversampled subband adaptive filters," in Proc. IEEE
Int. Conf. on Acoust., Speech and Signal Process., vol. 6, 2003, pp. 209-212.
[51] J. B. Allen and D. A. Berkley, "Image method for efficiently simulating small-room acoustics," J. Acoust. Soc. Amer., vol. 65, pp. 943-950, Apr. 1979.
[52] D. H. Pham and J. H. Manton, "A subspace algorithm for guard interval based
channel identification and source recovery requiring just two received blocks,"
in Proc. IEEE Int. Conf. Acoust., Speech and Signal Process., 2003.
[53] H. Xu, S. Dasgupta, and Z. Ding, "A novel channel identification method for
fast wireless communication systems," in Proc. IEEE Int. Conf. Commun.,
2001, pp. 2443-2448.
[54] Y. Sato, "A method of self-recovering equalization for multilevel amplitude-modulation," IEEE Trans. Commun., vol. COM-23, pp. 679-682, 1975.
[55] A. Benveniste, M. Goursat, and G. Ruget, "Robust identification of a nonminimum phase system: Blind adjustment of a linear equalizer in data communications," IEEE Trans. Automat. Contr., vol. 2, pp. 385-399, Jun. 1980.
[60] Y. Huang and J. Benesty, "Adaptive multi-channel least mean square and
Newton algorithms for blind channel identification," Signal Process., vol. 82,
pp. 1127-1138, Aug. 2002.
Conf. Acoust., Speech and Signal Process., May 2001, pp. 4140-4143.
[62] L. Tong and Q. Zhao, "Joint order detection and blind channel estimation by
least squares smoothing," IEEE Trans. Signal Process., vol. 47, pp. 2345-2355,
Sep. 1999.
[63] N. D. Gaubitch, J. Benesty, and P. A. Naylor, "Adaptive common root estimation and common zeros problem in blind channel identification," in Proc.
Speech, and Lang. Process., vol. 15, pp. 1681-1695, Jul. 2007.
[66] M. Jeub, C. Nelke, C. Beaugeant, and P. Vary, "Blind estimation of the
coherent-to-diffuse energy ratio from noisy speech signals," in Proc. European
Springer-Verlag New
[73] R. G. Lyons, Understanding Digital Signal Processing, 2nd ed. Prentice Hall,
Mar. 2004.
[74] S. Griebel and M. Brandstein, "Wavelet transform extrema clustering for multichannel speech dereverberation," in Proc. Int. Workshop on Acoust. Echo and
Noise Control, 1999.
[77] T. Nakatani, M. Miyoshi, and K. Kinoshita, Single microphone blind dereverberation. In: Speech Enhancement. Springer, Berlin, Heidelberg, 2005, ch. 11,
pp. 247-270.
[78] T. A. Palka and D. W. Tufts, "Reverberation characterization and suppression
by means of principal components," in Proc. IEEE OCEANS, vol. 3, 1998, pp.
1501-1506.
[81] H. W. Lollmann and P. Vary, "A blind speech enhancement algorithm for the
suppression of late reverberation and noise," in Proc. IEEE Conf. Acoust.,
Speech and Signal Process., 2009, pp. 3989-3992.
[83] K. Lebart, J. M. Boucher, and P. N. Denbigh, "A new method based on spectral
subtraction for speech dereverberation," in Acta Acustica united with Acustica,
vol. 87, no. 3, 2001, pp. 359-368.
[85] I. Cohen, "Relaxed statistical model for speech enhancement and a priori SNR
estimation," IEEE Trans. Speech and Audio Process., vol. 13, pp. 870-881,
Sep. 2005.
Conf. Acoust., Speech and Signal Process., vol. 4, Apr. 1994, pp. 573-576.
[88] L. Hoteit, "Extending the subspace method for blind identification," in Proc.
IEEE Int. Conf. Signal Process., vol. 1, Oct. 1998, pp. 347-350.
[89] D. Slock, "Blind fractionally-spaced equalization, perfect reconstruction filterbanks, and multilinear prediction," in Proc. IEEE Int. Conf. Acoust., Speech
[95] J. Mourjopoulos,
squares and
nOJmOlm()m'UU~'t4~enltll(:JUE~Sfor
pp. 1958-1961.
[96] B. Radlovic and R. Kennedy, "Nonminimum-phase equalization and its sub-
tems in oversampled subbands," IEEE Trans. on Audio Speech and Lang. Process., vol. 17, pp. 1061-1070, 2009.
[98] M. Miyoshi and Y. Kaneda, "Inverse filtering of room acoustics," IEEE Trans.
Acoust. Speech Signal Process., vol. 36, pp. 145-152, Feb. 1988.
[99] S. T. Neely and J. B. Allen, "Invertibility of a room impulse response," J.
[103] A. Blin, S. Araki, and S. Makino, "Blind source separation when speech signals
of speech coders," IEEE Trans. Selected Areas in Commun., vol. 10, pp. 819-829, 1992.
[105] S. Rebecca and S. Mark, "Database of omnidirectional and b-format impulse
responses," in Proc. of IEEE Int. Conf. Acoust., Speech and Signal Process.,
Mar. 2010, pp. 165-168.