
ATTENTION: The Singapore Copyright Act applies to the use of this document.

Nanyang Technological University Library

ADAPTIVE SYSTEM IDENTIFICATION AND


EQUALIZATION ALGORITHMS FOR ACOUSTIC ECHO
CANCELLATION AND SPEECH DEREVERBERATION

LIAO LEI

School of Electrical and Electronic Engineering


A thesis submitted to the Nanyang Technological University
in partial fulfillment of the requirement for the degree of
Doctor of Philosophy
2013


To my wife Zhang Yuan, my parents


Liao Qinsong, Peng Yunnian, my parents-in-law
Zhang Jian, Yuan Weiling, my sister, Liao Jinxia and

To Andy W. H. Khong


Abstract
A class of adaptive algorithms for acoustic echo cancellation (AEC) and speech dereverberation has been developed and analyzed in this thesis. The starting point of this work is the affine projection (AP) based non-blind channel identification algorithms for AEC. The proposed sparseness-constrained improved proportionate AP algorithms (SC-IPAPA-I and SC-IPAPA-II) exploit the sparseness of the estimated channel and allocate different effective step sizes accordingly. The performance of these algorithms has been studied in the context of single-channel AEC and tracking capability.
The performance of a blind channel identification algorithm with additive noise is studied by providing reasons for the degradation in performance of the multichannel least-mean-square (MCLMS) algorithm in the presence of additive noise. Subsequently, it is shown mathematically, through a cross-correlation based cost function, that minimizing the power of a filtered version of the received signals can suppress the noise effect. The performance of MCLMS and the improved MCLMS (IMCLMS) is evaluated using the normalized projection misalignment via Monte Carlo simulations.
It is shown through simulations that the well-known normalized multichannel frequency-domain least-mean-square (NMCFLMS) algorithm, originally developed for acoustic impulse response (AIR) identification in the frequency domain, also suffers from the noise robustness problem. A noise-robust blind AIR estimation algorithm is then proposed. Inspired by the trivial solution achieved by NMCFLMS in the presence of noise, the proposed direct-path NMCFLMS algorithm with power constraint (DP-NMCFLMS-PC) jointly applies direct-path and power constraints to improve its robustness against noise. The DP-NMCFLMS-PC not only addresses the noise robustness issue but also achieves fast convergence compared to NMCFLMS.
The adaptive multiple input/output inversion theorem (A-MINT) algorithm has been developed for channel equalization with application to speech dereverberation. In order to increase its convergence rate, the proposed algorithm iteratively suppresses any undesired non-zero coefficients in the estimated Kronecker delta function. This is achieved by computing the sparseness measure of the estimated Kronecker delta function and using it as an additional constraint to A-MINT.
Unlike existing channel equalization systems, the proposed auto-relation aided MINT (A-RAM) algorithm, which achieves good equalization performance, takes into account how the received signals are generated during its adaptation process. The differential output signals from two sub-systems are then utilized in the cost function during equalization. Simulation results have shown that the proposed A-RAM algorithm can achieve a higher rate of convergence, leading to a better dereverberated speech signal compared to existing MINT-based equalization algorithms.


Acknowledgment
This thesis is the result of four years of research and I would like to take this opportunity to thank the many people who have contributed towards this work in one way or another.
Foremost, I would like to express my sincere gratitude and appreciation to my advisor Dr. Andy W. H. Khong, who has given me tremendous guidance and advice on my research work. I am very grateful for his constant patience and enthusiasm over the past four years, without which my research would not have been so enjoyable.
I am also very grateful to Dr. Woon Seng Gan, who generously hosted my first year of research in the Digital Signal Processing Lab. I sincerely thank him for his valuable comments and advice on my research.
For the non-scientific side of my thesis, I particularly want to thank my wife Zhang Yuan for being my eternal sunshine. I thank my parents, parents-in-law and my sister for the care and support they have shown me through the entire process. I also thank the technical staff Mr. Chu Chun Chung, Mr. Yeo Sung Kheng and Mr. Ong Say Cheng for their great assistance in the Lab. I would also like to express my sincere thanks to our teammates for their valuable comments.

Liao Lei


Contents

Abstract
Acknowledgment
Abbreviations

1 Introduction
  1.1 General introduction to echo cancellation and speech reverberation
  1.2 Organization of the thesis
  1.3 Statement of the originality, main contributions and related publications

2 An Improved Affine Projection Algorithm Employing Sparseness Constraint for Acoustic Echo Cancellation
  2.1 Introduction
  2.2 Review of APAs for non-blind system identification
    2.2.1 Affine projection and regularized APAs
    2.2.2 The proportionate affine projection algorithm (PAPA)
    2.2.3 The improved proportionate affine projection algorithm (IPAPA)
  2.3 The proposed sparseness-controlled APAs
    2.3.1 The sparseness measure
    2.3.2 The proposed SC-IPAPA-I and SC-IPAPA-II
  2.4 Stability analysis
  2.5 Computational complexity
  2.6 Simulation results
    2.6.1 Simulations using WGN input
    2.6.2 Simulation using speech input
    2.6.3 Simulation using impulse responses generated by the image model
    2.6.4 Simulation using impulse responses generated by the image model and speech input
  2.7 Conclusion

3 An Improved Multichannel Least-Mean-Square Algorithm for Blind Channel Identification
  3.1 Introduction
  3.2 Problem formulation
  3.3 Analysis of MCLMS in the presence of noise
  3.4 Development of the IMCLMS algorithm
    3.4.1 Convergence behaviour
    3.4.2 Misalignment behaviour of MCLMS and IMCLMS
  3.5 Simulation results
  3.6 Conclusion

4 Adaptive Blind Room Impulse Response Estimation Algorithms for Speech Dereverberation
  4.1 Review of Speech Dereverberation Algorithms
    4.1.1 LP residual enhancement methods
    4.1.2 Harmonic filtering
    4.1.3 Speech dereverberation via channel identification and inverse filtering
    4.1.4 Speech dereverberation via late reverberation suppression
  4.2 Review of BSI algorithms for acoustic channels
    4.2.1 Review of the NMCFLMS algorithm
    4.2.2 Review of the DP-NMCFLMS algorithm
  4.3 The proposed DP-NMCFLMS-PC algorithm
  4.4 Algorithmic derivation
    4.4.1 Determination of ϑ_i(m) using proposed misconvergence point estimation
  4.5 Simulation results
  4.6 Conclusion

5 A Sparseness-Constrained Channel Equalization Algorithm with Application to Speech Dereverberation
  5.1 Introduction
  5.2 Review of single-channel equalization
  5.3 Review of the MINT algorithm
    5.3.1 Review of the A-MINT algorithm
  5.4 Proposed sparseness-controlled A-MINT algorithm
  5.5 Simulation Results
  5.6 Conclusion

6 Adaptive Channel Equalization Exploiting Segregated Sub-systems
  6.1 Introduction
  6.2 Algorithmic development
    6.2.1 Convergence behavior of A-RAM
    6.2.2 Derivation of closed-form β
    6.2.3 Selection of equalization results from sub-systems
    6.2.4 Significance of J_ar(n)
    6.2.5 Computational complexity
  6.3 Simulations results with application to speech dereverberation
    6.3.1 Channel equalization using AIRs generated from the image model
    6.3.2 Monte Carlo simulation results using recorded AIRs
  6.4 Conclusion

7 Discussion and Conclusions
  7.1 Summary
  7.2 Future research direction

Appendices
  A Convergence analysis of A-MINT, A-RMINT and A-RAM
  B Publications arising from this thesis


List of Figures

1.1 Schematic diagram of single-channel acoustic echo cancellation (after [1]).
1.2 Illustration of acoustic propagation in a room.
2.1 Schematic diagram of an acoustic echo canceller.
2.2 Impulse response generated using (a) T = 65 with as = 5, (b) T = 15, (c) = 35, (d) = 1.
2.3 Control parameter varies with sparseness measure ~c(n).
2.4 Magnitude of the elements in the control matrix with different sparseness (() values.
2.5 Eigenvalue spread for IPAPA, SC-IPAPA-I and SC-IPAPA-II for L between 128 and 1024.
2.6 Convergence of IPAPA, RAPA, SC-IPAPA-I and SC-IPAPA-II using WGN input with echo path changed midway during simulation.
2.7 Convergence of IPAPA, RAPA, SC-IPAPA-I and SC-IPAPA-II using the same step-size and WGN input with echo path change introduced midway during simulation.
2.8 Convergence of IPAPA, SC-IPAPA-I and SC-IPAPA-II using speech input with echo path change introduced midway during simulation.
2.9 Simulation setup used in the image model to generate room impulse response.
2.10 Impulse responses generated using the image model, L = 1024.
2.11 Convergence of IPAPA, SC-IPAPA-I and SC-IPAPA-II using WGN input with impulse responses generated using the method of images.
2.12 Convergence of IPAPA, SC-IPAPA-I and SC-IPAPA-II using speech input with impulse responses generated using the method of images.
3.1 Illustration of the relationship between the input s(n) and the observations y_i(n) in a SIMO system.
3.2 The NPM performance of MCLMS and proposed IMCLMS algorithms using Monte Carlo simulations with 100 trials while SNR = 20 dB, and β = 0.2, 0.4, 0.6, 0.8 and 1 for IMCLMS.
3.3 The NPM performance of MCLMS and proposed IMCLMS algorithms under SNR = 15 dB using Monte Carlo simulations with 100 trials while β = 1 for IMCLMS.
3.4 Variation of steady-state NPM for MCLMS and the proposed IMCLMS algorithms using Monte Carlo simulations with 100 trials from SNR = 6 to 24 dB.
3.5 Comparison of NPM of the proposed IMCLMS algorithm with β = 1.0, 1.2 and 1.4 using Monte Carlo simulations with 100 trials from SNR = 10 to 20 dB.
4.1 Simplified block diagram of source-filter model of speech production (after [2]).
4.2 General block diagram of an LP enhancement method for speech dereverberation.
4.3 Relationship between input and output in a SIMO model.
4.4 Effect of noise on NPM for the NMCFLMS algorithm.
4.5 Illustration of ||ĥ(m)||_2^2 using NMCFLMS algorithm at SNR = 25 dB.
4.6 Illustration of NPM results from direct-path and unit-norm constraint NMCFLMS at SNR = 25 dB.
4.7 (a) True h, (b) ĥ(m) using DP-NMCFLMS with unit-norm constraint, (c) ĥ(m) using DP-NMCFLMS at SNR = 25 dB.
4.8 (a) NPM of NMCFLMS and variation of (b) ||ĥ(m)||_2^2, (c) ~||ĥ(m)||_2^2 and (d) cost function J(m) with time at SNR = 25 dB.
4.9 Illustration of ||ĥ(m)||_2^2 using NMCFLMS algorithm at SNR = 25 dB.
4.10 Comparison of BSI algorithms with SNR = 35 dB.
4.11 Comparison of BSI algorithms with SNR = 30 dB.
4.12 Comparison of BSI algorithms with SNR = 25 dB.
5.1 Conventional least-square error based single-channel inverse filtering.
5.2 Inverse filtering to a single-input M-output system.
5.3 Steady-state performance comparison between A-MINT and proposed SC-MINT when CNR = 35 dB using AIRs generated by the image model with β = 0.01, 0.02, ..., 1.
5.4 Convergence comparison between A-MINT and proposed SC-MINT when CNR = 30 dB using AIRs generated by the image model.
5.5 Convergence comparison between A-MINT and proposed SC-MINT when CNR = 20 dB using AIRs generated by image model.
5.6 (a) Recorded AIR; (b) Convergence comparison between A-MINT and proposed SC-MINT when CNR = 20 dB.
5.7 SRR performance of A-MINT and proposed SC-MINT when CNR = 20 dB using recorded AIRs.
6.1 Proposed inverse filtering method for a SIMO system with two sub-systems.
6.2 Performance of two sub-systems in terms of SDR.
6.3 Convergence for the cost functions of two sub-systems.
6.4 Convergence of SDR for A-MINT and A-RAM with CNR = 20 dB, L = Lg = 1600, μ = 0.001 using WGN.
6.5 BSD comparison between A-MINT, A-RMINT and the proposed A-RAM algorithm using speech input and AIRs generated using the image model with L = 1600.
6.6 BSD comparison between A-MINT, A-RMINT and proposed A-RAM algorithm using recorded AIRs with L = 1600, SNR = 35 dB.
6.7 BSD comparison between A-MINT, A-RMINT and proposed A-RAM algorithm using recorded AIRs with L = 1600, SNR = 15 dB.
6.8 Equalized AIRs from (a) A-MINT, (b) A-RMINT and (c) A-RAM.
6.9 Spectrograms of (a) clean (b) reverberant speech and dereverberated speech corresponding to (c) B1 of A-MINT, (d) B2 of A-RMINT and (e) B3 of A-RAM.

List of Tables

2.1 Number of multiplications of IPAPA, RAPA, SC-IPAPA-I and SC-IPAPA-II per iteration.
2.2 Number of additions of IPAPA, RAPA, SC-IPAPA-I and SC-IPAPA-II per iteration.
6.1 Proposed A-RAM algorithm for SIMO system.
6.2 Number of additions and multiplications of A-MINT.
6.3 Number of additions and multiplications of A-RAM.


Abbreviations

AIR: Acoustic impulse response
BSD: Bark spectral distortion
BSI: Blind system identification
CFE: Cost function online estimation
DNE: Delta-norm estimation
DP-NMCFLMS: Direct-path NMCFLMS
DP-NMCFLMS-PC: DP-NMCFLMS with power constraint
FFT: Fast Fourier transform
HOS: Higher-order statistical
MCLMS: Multichannel least-mean-square
MINT: Multiple-input/output inversion theorem
NMCFLMS: Normalized multichannel frequency-domain least-mean-square
NPM: Normalized projection misalignment
SDR: Signal-to-distortion ratio
SOS: Second-order statistical
SRR: Segmental signal-to-reverberation ratio


List of Notations
Chapter 1

n           Sample iteration
s_T(n)      Source signal in the transmission room
s_R(n)      Source signal in the receiving room
g(n)        Time-varying AIR in the transmission room
h(n)        Time-varying AIR in the receiving room
ĥ(n)        Estimated h(n)
w(n)        The additive noise in the receiving room
x(n)        Signal in the transmission medium from the transmission room
y(n)        Signal in the transmission medium from the receiving room
ŷ(n)        Estimated y(n)
e(n)        The difference between y(n) and ŷ(n), i.e., y(n) - ŷ(n)
s(n)        Clean speech signal in a reverberant room
h_i(n)      The ith channel AIR for a multichannel system
x_i(n)      The ith channel reverberant signal received by a multichannel system


Chapter 2

L           Length of the AIRs
            Projection order of APA
            Time-invariant AIR (single channel)
ĥ(n)        Estimated h (single channel)
x(n)        Tap-input vector in AEC
v(n)        Additive noise
y(n)        Received signal with additive noise in AEC
X(n)        Concatenated x(n) of a pth-order APA
            Lagrangian multiplier
            Vector notation of multiple Lagrangian multipliers
            Lagrangian function
            Identity matrix of dimension m rows and n columns
0_{m×n}     Null matrix of dimension m rows and n columns
            Step-size
            Regularization parameter
            Variance of signal x(n)
G(n)        Diagonal proportionate step-size matrix
g_l(n)      The lth diagonal element of G(n)
            Covariance matrix of the tap-input source signal
            Instantaneous correlation matrix
            Control parameter
            Positive constant to prevent division by zero
            Positive constant for the design of the proportionate step-size matrix
            Sparseness value
L_z         Length of leading zero
L_u         Length of decay window
            Decay constant
JC          Positive scalar
c(n)        Difference between the estimated and true AIRs, i.e., ĥ(n) - h
Q(n)        Unitary matrix
T(n)        Square eigenvalue matrix
γ_i(n)      The ith diagonal element of T(n)
η(n)        Normalized misalignment


Chapter 3

h_i         The ith channel time-invariant AIR
M           Number of microphones
h           Concatenated h_i, i.e., [h_1^T h_2^T ... h_M^T]^T
y_i(n)      The ith channel observed signal, i.e., x_i(n) + v_i(n)
            Linear convolution operator
e_ij(n)     Cross-relation error due to source signal
eYj(n)      Cross-relation error due to noise
e(n)        Cross-relation error vector
X_x(n)      Cross-relation based cost function due to source signal
X_v(n)      Cross-relation based cost function due to noise
x(n)        Cross-relation based cost function due to source and noise
X_p(n)      Proposed cross-relation based cost function of IMCLMS
J(n)        Cost function of MCLMS with unit-norm constraint
J_p(n)      Cost function of IMCLMS with unit-norm constraint
R_yiyj(n)   Estimated covariance matrix between observations y_i(n) and y_j(n)
            Forgetting factor
β           Lagrangian multiplier
β'          Effective Lagrangian multiplier
~)(n)       Normalized projection misalignment
ψ'(n)       Projection misalignment
a(n)        Scaling parameter in normalized projection misalignment
u(n)        Difference between the true and estimated AIRs with scaling, i.e., h - a(n)ĥ
z(n)        Difference between the estimated AIR at the nth iteration and the final estimate, i.e., ĥ(n) - ĥ(∞)


Chapter 4

m           Frame index
b(n)        Noise/pulse input in the source-filter model of speech production
ŝ(n)        Estimated s(n)
f_s         Sampling frequency
            Prediction order of linear prediction
            Gain of input in the source-filter model of speech production
            The pth coefficient of the all-pole filter
            Fourier matrix of dimension L x L
e_ij(m)     Frequency-domain cross-correlation of the mth frame
ĥ_i(m)      Estimated h_i in the frequency domain of the mth frame
ĥ_i,dp(m)   Direct-path component of estimated h_i in the time domain
V_yi(m)     Diagonal matrix of the Fourier transform of the tap-input vector
            Forgetting factor
J(m)        Cost function of NMCFLMS
ϑ_i(m)      Power constraint value
φ_i(m)      Vector rotation factor


Chapter 5

            Length of inverse filter
            Inverse filter of the ith channel
ĝ_i(n)      Estimated g_i
ĝ(n)        Concatenation of the estimated inverse filters
            Kronecker delta function
            Estimated Kronecker delta function
H_i         Convolution matrix of the ith channel AIR
H_i(z)      z-transform of the ith channel AIR
G_i(z)      z-transform of the ith channel inverse filter
D(z)        z-transform of the Kronecker delta function
H           Concatenated convolution matrix of AIRs
h_d         Direct-path component of AIR
s_d(n)      Direct acoustic path clean speech signal
            Number of frames
            Frame length
J(n)        Cost function of A-MINT
J_sc(n)     Cost function of SC-MINT


Chapter 6 and Appendix A

ŝ(n)        Estimated speech signal of the sub-system for A-RAM
x(n)        Concatenated received signals of the sub-system for A-RAM
            Concatenated true inverse filters
ĝ(n)        Concatenated estimated inverse filters
e(n)        Differential output signal from the two sub-systems for A-RAM
J_ar(n)     Cost function due to auto-relation
R_11        Expected value of the auto-correlation matrix of the received signals in the first sub-system of A-RAM
R_12        Expected value of the correlation matrix of the received signals in the first and second sub-systems of A-RAM
J_AM(n)     Cost function of A-RAM
hi          Estimated error for the ith channel
            Regularization parameter for RMINT/A-RMINT
z(n)        Difference between the estimated and true inverse filters, i.e., ĝ(n) - g
σ_s^2       Variance of the clean source signal
H^{-1}      Inverse of the concatenated convolution matrix of the AIRs
            Segment index of the original and estimated source signals
            Number of segments for the computation of the bark spectral calculation
B~          Bark spectral component of the kth segment of the original speech
B~          Bark spectral component of the kth segment of the dereverberated speech
N;          Number of critical bands in bark spectral calculation


Chapter 1

Introduction

1.1 General introduction to echo cancellation and speech reverberation

An echo is the repetition of sound caused by the delayed transmission of sound waves. There are generally two major sources of echo in telecommunication systems: hybrid echo and acoustic echo [3] [4]. A telephone network for private premises adopts a two-wire subscriber line which is connected to the four-wire local exchange (for long distance transmission) via a two/four-wire hybrid bridge circuit. A hybrid echo therefore occurs when an impedance mismatch exists between the two-wire subscriber line and the four-wire trunk line [5] [6]. A hybrid echo degrades the quality of telephone speech and becomes annoying when the echo delay is significant or its energy is relatively high. For long distance calls that are connected via a satellite, the echo delay could last up to a few hundred milliseconds. Taking into account the inherent delay of up to 200 ms in the telephone network, the hybrid echo will become significantly noticeable in digital mobile communication. As a consequence, a network echo canceller is required in mobile switching centers [6] [7].

Figure 1.1: Schematic diagram of single-channel acoustic echo cancellation (after [1]).

As opposed to network echo, an acoustic echo is mainly caused by the acoustic and possibly the mechanical coupling between the microphone and loudspeaker in a hands-free system, such as an in-car telephony or tele-conferencing system [1]. Figure 1.1 illustrates a schematic diagram of a typical single-channel acoustic echo cancellation (AEC) system. As can be seen, acoustic echoes occur when an input signal x(n) from the transmission room is transmitted to the loudspeaker in the receiving room, which is in turn acoustically coupled to the receiving room's microphone [1].
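
The signal flow in Figure 1.1 can be sketched in a few lines of Python. This is only an illustrative sketch, not one of the algorithms developed in this thesis: it uses a plain normalized LMS (NLMS) update in place of the affine projection family, a short hypothetical echo path `h_true`, white Gaussian input and no near-end noise w(n). The names `nlms_echo_canceller`, `simulate_echo`, `mu` and `delta` are assumptions introduced for the example.

```python
import random

def simulate_echo(x, h):
    """Echo y(n) produced by passing the far-end signal x(n) through the echo path h."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def nlms_echo_canceller(x, y, num_taps, mu=0.5, delta=1e-6):
    """Adapt h_hat so that y_hat(n) tracks y(n); return e(n) = y(n) - y_hat(n) and h_hat."""
    h_hat = [0.0] * num_taps
    x_buf = [0.0] * num_taps            # tap-input vector, newest sample first
    errors = []
    for x_n, y_n in zip(x, y):
        x_buf = [x_n] + x_buf[:-1]
        y_hat = sum(h * xb for h, xb in zip(h_hat, x_buf))
        e_n = y_n - y_hat               # residual echo sent back to the far end
        norm = sum(xb * xb for xb in x_buf) + delta
        h_hat = [h + mu * e_n * xb / norm for h, xb in zip(h_hat, x_buf)]
        errors.append(e_n)
    return errors, h_hat

random.seed(0)
h_true = [0.0, 0.8, 0.0, -0.3, 0.1]     # hypothetical short echo path
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
y = simulate_echo(x, h_true)
errors, h_hat = nlms_echo_canceller(x, y, num_taps=len(h_true))
```

In this noiseless toy setting the estimate `h_hat` approaches `h_true` and the residual e(n) decays towards zero; realistic AIRs are far longer, which is where the convergence speed of the adaptive algorithm becomes the limiting factor.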
In addition to background noise, room reverberation is the other major cause of speech degradation [8] [9]. For any acoustic source in an enclosed environment, the sound travels from the source to the receiver (e.g., human ears, microphones) via a direct path as well as multipaths due to the reflections from the walls and ceiling. Figure 1.2 illustrates the acoustic propagation from the source to a microphone array in a room. The reflected waves from the walls, ceilings and other surfaces require a longer time to arrive at the microphones. These signals are often attenuated due to the absorption coefficient of these surfaces [10] and, as a result, the received reverberant signal is a mixture of the direct-path and reflected sounds.

Figure 1.2: Illustration of acoustic propagation in a room.

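
The mixture of direct-path and reflected sounds can be illustrated by convolving a clean signal with an impulse response. The sketch below is a toy example under stated assumptions: the three-tap pattern in `air` (a direct path followed by two attenuated, delayed reflections) and the short `clean` sequence are hypothetical values chosen for readability, not measured data.

```python
def convolve(signal, air):
    """Linear convolution of a signal with an acoustic impulse response."""
    out = [0.0] * (len(signal) + len(air) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(air):
            out[n + k] += s * h
    return out

# Hypothetical AIR: direct path at a delay of 2 samples, followed by two
# reflections that arrive later and are attenuated by surface absorption.
air = [0.0] * 13
air[2] = 1.0     # direct path
air[7] = 0.5     # first reflection
air[12] = 0.25   # second reflection

clean = [1.0, -1.0, 0.5]
reverberant = convolve(clean, air)
```

Each sample of `reverberant` is a sum of delayed and scaled copies of `clean`, which is the temporal smearing that dereverberation algorithms aim to undo.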
Although reverberation adds warmth to the sound, which is essential for music appreciation and enables humans to better orientate themselves in the listening environment [11] [12], it leads to temporal and spectral smearing of the clean speech. This in turn distorts the frequency content of the signal. As a result, the received speech sounds 'fuzzy', which reduces intelligibility, especially for the hearing-impaired and elderly people [13] [14]. In addition, since reverberation alters the characteristics and degrades the auditory quality of speech captured by a distant microphone in the room, it degrades the performance of algorithms that have been developed for a wide range of speech applications including hands-free technology, video conferencing, automatic speech recognition and hearing-aid devices [15] [16]. One way to address such challenges is to employ dereverberation signal processing algorithms.

1.2 Organization of the thesis

This thesis is organized as follows: Chapter 2 addresses the problem of slow convergence of the affine projection (AP) based non-blind channel identification algorithms. It begins by reviewing various AP algorithms (APAs). As will be shown, the development of existing algorithms does not take the sparseness of an acoustic impulse response (AIR) into account. This problem is particularly important since an AIR can vary in sparseness depending on the environment and/or the source-microphone distance. The existing proportionate APA (PAPA) [17] and improved PAPA (IPAPA) [18] [19] have been proposed for AEC; they update the estimated coefficients in a manner that is proportional to their magnitudes. It is worth noting that the success of IPAPA depends on a predetermined control parameter; a pre-defined and time-invariant control parameter may not be suitable for every system since the sparseness of the AIR may vary depending on the acoustic environment. In view of this, two sparseness-constrained IPAPAs (SC-IPAPA-I and SC-IPAPA-II) are presented for fast convergence.
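
To make the two ingredients above concrete, the following sketch computes a sparseness value for an impulse response and a set of proportionate step-size gains. Both formulas are assumptions taken from the general sparse adaptive filtering literature (an l1/l2-norm based sparseness measure and an IPNLMS-style gain rule with control parameter `alpha`); the exact definitions used for SC-IPAPA-I and SC-IPAPA-II are given in Chapter 2 and may differ. The function names and the `eps` constant are hypothetical.

```python
import math

def sparseness(h):
    """l1/l2-norm based sparseness: close to 1 for a single spike, 0 for a flat response."""
    L = len(h)
    if L < 2:
        raise ValueError("sparseness needs at least two coefficients")
    l1 = sum(abs(v) for v in h)
    l2 = math.sqrt(sum(v * v for v in h))
    if l2 == 0.0:
        return 0.0
    return L / (L - math.sqrt(L)) * (1.0 - l1 / (math.sqrt(L) * l2))

def proportionate_gains(h_hat, alpha=0.0, eps=1e-8):
    """IPNLMS-style gains: a uniform term plus a term proportional to |h_hat[l]|."""
    L = len(h_hat)
    l1 = sum(abs(v) for v in h_hat)
    return [(1 - alpha) / (2 * L) + (1 + alpha) * abs(v) / (2 * l1 + eps)
            for v in h_hat]

sparse_air = [1.0, 0.0, 0.0, 0.0]
dispersive_air = [1.0, 1.0, 1.0, 1.0]
xi_sparse = sparseness(sparse_air)
xi_dispersive = sparseness(dispersive_air)
gains = proportionate_gains(sparse_air, alpha=0.0)
uniform_gains = proportionate_gains(sparse_air, alpha=-1.0)
```

With `alpha = -1` every coefficient receives the same gain, while larger values of `alpha` shift the adaptation effort towards the large coefficients. A fixed `alpha` therefore suits only one degree of sparseness, which is the motivation for tying the control parameter to the measured sparseness.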
Chapter 3 addresses the noise robustness issue of a blind channel identification algorithm when additive noise is present. The effect of noise on the performance of the multichannel least-mean-square (MCLMS) algorithm [20] is first analyzed using a cross-relation based cost function. It is shown via mathematical analysis that constraining the power of the received signals across all channels can prevent the algorithm from converging to the trivial solutions due to the additive noise. A constrained cross-relation based cost function is then developed to address the performance degradation of MCLMS in a noisy scenario. Monte Carlo simulation results using different signal-to-noise ratios (SNRs) verify the effectiveness of the proposed improved MCLMS (IMCLMS) algorithm.
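
The cross-relation that underlies this family of blind algorithms can be checked numerically. In a noiseless two-channel system with x_i = s * h_i, convolving x_1 with h_2 gives the same result as convolving x_2 with h_1, so the true channels drive the cross-relation error to zero without knowledge of s(n). The sketch below is a minimal illustration under assumed values: the channels `h1` and `h2`, the noise level and the helper `cross_relation_error` are hypothetical and are not the cost function derived in Chapter 3.

```python
import random

def convolve(a, b):
    """Linear convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, u in enumerate(a):
        for j, v in enumerate(b):
            out[i + j] += u * v
    return out

def cross_relation_error(y1, y2, g1, g2):
    """Energy of y1 * g2 - y2 * g1 for candidate channel estimates g1 and g2."""
    a = convolve(y1, g2)
    b = convolve(y2, g1)
    return sum((p - q) ** 2 for p, q in zip(a, b))

random.seed(1)
s = [random.gauss(0.0, 1.0) for _ in range(50)]   # unknown source signal
h1 = [1.0, 0.5, -0.2]                             # hypothetical channels
h2 = [0.7, -0.4, 0.1]
x1, x2 = convolve(s, h1), convolve(s, h2)

noiseless = cross_relation_error(x1, x2, h1, h2)

# Additive noise breaks the identity, even for the true channels.
y1 = [v + 0.1 * random.gauss(0.0, 1.0) for v in x1]
y2 = [v + 0.1 * random.gauss(0.0, 1.0) for v in x2]
noisy = cross_relation_error(y1, y2, h1, h2)

# The all-zero estimate gives zero error regardless of noise.
trivial = cross_relation_error(y1, y2, [0.0] * 3, [0.0] * 3)
```

The last line shows why a constraint is needed: once noise is present, the trivial all-zero estimate scores a lower error than the true channels, so an unconstrained minimization is drawn away from the desired solution.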
In Chapter 4, the misconvergence problem of the normalized multichannel frequency-domain least-mean-square (NMCFLMS) algorithm [21] is addressed for the blind estimation of AIRs. The chapter begins by reviewing the NMCFLMS algorithm. As will be shown, the estimated AIRs from NMCFLMS will converge towards the null vectors in the presence of noise. The direct-path NMCFLMS (DP-NMCFLMS) algorithm [22], which is capable of achieving noise robustness but suffers from slow convergence, is also reviewed. Exploiting an additional power constraint which confines the power of the estimated AIRs, a DP-NMCFLMS with power constraint (DP-NMCFLMS-PC) algorithm is developed. It is shown through simulations that the proposed algorithm not only addresses the misconvergence of NMCFLMS, but also achieves a higher rate of convergence compared to that of the DP-NMCFLMS algorithm.
Equalization of the estimated AIRs is irnportant to achieve speech dereverberation. In view of this, Chapter 5 presents a sparseness-constraint adaptive rnultiple
input/output inversion theorem (SC-MINT) algorithrn for channel equalization. It
has been shown that the SC- MINT algorithrn is able to achieve a high rate of convergence in the context of speech dereverberation. In Chapter 6, an auto-relation aided
MINT (A-RAM) algorithrn which takes the received reverberant speech into consideration is further proposed. Simulation results, using both synthetic and recorded
AIRs, show that the proposed A-RAM algorithrn can achieve a fast convergence

compared to the existing MINT-based algorithms. Finally, Chapter 7 presents the
summary and discusses future research directions.

1.3 Statement of the originality, main contributions and related publications

This thesis details contributions made by the author in channel identification (both
non-blind and blind, i.e., Chapters 2, 3 and 4) and acoustic channel equalization
(Chapters 5 and 6). The novelty of the proposed algorithms described in Chapter 2
is the utilization of the sparseness measure of an impulse response. Two mechanisms are proposed to achieve this. In the first proposed sparseness-controlled
IPAPA (SC-IPAPA-I), similar to SC-IPNLMS [23], additional weighting terms computed from the sparseness of the estimated impulse response are applied to the
proportionate and non-proportionate terms in the conventional IPAPA. It is noted
that the success of IPAPA depends on the value of a control parameter which may
not be optimal under various operating environments. Therefore, in the second
proposed SC-IPAPA-II, the constant control parameter is replaced by a sparseness-dependent parameter. This reduces the need to pre-define the control parameter
prior to adaptation. In addition to the development of the two algorithms, stability and performance analyses are provided which allow one to gain insights as to
why the proposed SC-IPAPA-I and SC-IPAPA-II can achieve fast convergence. The
publication related to this contribution is [24].
Another contribution of this thesis is the analysis of how noise degrades the
performance of MCLMS using a cross-relation based cost function as presented in
Chapter 3. One of the main contributions in this chapter is to show that minimizing

the cost function of MCLMS in a noisy scenario would lead to a solution where all
the estimated channels are equivalent. Further analysis shows that minimizing the
power of a filtered version of the received signals and the cross-relation based cost
function simultaneously can suppress the degradation due to noise effectively. This
chapter also forms the foundation for the frequency-domain blind system identification
(BSI) algorithms described in Chapter 4.
The main contribution in Chapter 4 is the development of a fast-converging,
noise-robust BSI algorithm. This is achieved via a two-step approach: incorporating
a power constraint into the adaptation and subsequently estimating the time instant
at which the algorithm starts to misconverge. The starting point of the algorithmic
development is NMCFLMS [21], which achieves fast convergence and computational
efficiency by exploiting the inherent properties of filter adaptation in the frequency
domain [25]. However, it has been shown in [26] that NMCFLMS lacks robustness to additive noise. This misconvergence problem is addressed by exploiting
the DP-NMCFLMS algorithm [22]. By imposing a power constraint on the adaptation,
the proposed DP-NMCFLMS-PC algorithm not only achieves fast convergence but
also gains an improvement in steady-state performance. As will be shown in
Chapter 4, another contribution is the delta-norm estimation (DNE) algorithm for
DP-NMCFLMS-PC. This algorithm performs an online misconvergence point estimation using a power constraint that is close to the power of the true AIRs. In
practice, this is only achievable by computing the power of the estimated AIRs
before the algorithm misconverges. Therefore, estimating when the algorithm
misconverges is important for DP-NMCFLMS-PC. The proposed DNE algorithm
achieves this by monitoring the gradient of the $\ell_2$-norm of the estimated AIRs. The
misconvergence point is then identified as the instant at which the change in
this gradient is smaller than a pre-defined threshold. This

contribution has been presented in [27].


In the domain of channel equalization, one of the main contributions is to
improve the convergence speed of the adaptive MINT (A-MINT) algorithm [28] by
suppressing any undesirable non-zero coefficients in the estimated Kronecker delta
function at each iteration. As will be described in Chapter 5, the novelty of the
proposed approach is the utilization of the sparseness of the estimated Kronecker
delta function as an additional constraint on the A-MINT algorithm.
The publication associated with this contribution is [29].
Conventional channel equalization algorithms do not take into account the
source signal and consider only the estimated AIRs obtained from BSI algorithms.
Given that such equalization techniques have been developed independently of BSI,
the performance of existing approaches for dereverberation is limited to a great extent. One of the contributions in Chapter 6 is the development of an algorithm which
takes into consideration how the reverberant speech signals are generated by
utilizing a sub-system configuration. In addition to the constraints required by conventional equalization algorithms, the difference between the outputs of two sub-systems
is further minimized. The equivalence between the deconvolved received signals of
these sub-systems is termed auto-relation. This auto-relation is subsequently exploited as an additional constraint on the existing A-MINT algorithm. In addition,
a mathematical analysis of the auto-relation constraint is provided which shows that
this constraint confines the solution of the equalization filters within a multi-dimensional
space. It is also explained, through convergence analysis, why the proposed
A-RAM algorithm can achieve a higher rate of convergence compared to the existing MINT-based algorithms. Simulation results have shown that A-RAM is able to
achieve fast convergence for the inverse filtering of room acoustics. The computational complexities of various algorithms are also compared, showing that A-RAM is
more computationally efficient than existing MINT-based algorithms. This work has
been accepted for publication in IEEE Transactions on Circuits and Systems I and
some parts of this work have been presented in [30].


Chapter 2
An Improved Affine Projection Algorithm Employing Sparseness Constraint for Acoustic Echo Cancellation

2.1 Introduction

The use of adaptive filters for system identification has found applications in both
network echo cancellation (NEC) and acoustic echo cancellation (AEC) [23] [31].
Such adaptive filters are employed to estimate the unknown impulse response of the
system, and algorithms developed for such applications require fast convergence as
well as good tracking performance. Although these requirements are important
for both applications, it is important to note that network impulse responses (NIRs)
and AIRs have different characteristics and hence adaptive algorithms developed for


such applications differ.


Impulse responses of network hybrids are typically of length 64-128 ms. These
NIRs possess an active region of large amplitudes with a duration that is typically
of 8-12 ms [4] [32] and as a consequence, they are often considered sparse. Development of adaptive algorithms for NEC revolves around the use of proportionate-type
algorithms, such as in proportionate normalized least-mean-square (PNLMS) [33],
where the step-size of each filter coefficient is made proportional to the corresponding magnitude of the estimated filter coefficient. It was subsequently found that
the convergence rate of PNLMS reduces considerably with time due to the small
step-sizes used for filter coefficients with smaller magnitudes. To address this issue, the improved PNLMS (IPNLMS) algorithm [34] incorporates the proportionate
(PNLMS) as well as the non-proportionate (NLMS) terms. It was found in [35] that
the IPNLMS algorithm outperforms PNLMS for sparse AIRs in most cases.
In contrast to NIRs, AIRs are comparatively dispersive. This dispersive nature is brought about by the early as well as late reflections of an enclosed environment. This effect becomes more prominent for applications such as
hands-free telephony [36] and/or robotic control [37] where the acoustic coupling between the loudspeaker and the microphone is significant due to the multipath effect
of the enclosure. Since such multipath propagation contributes to the late reflections
of an AIR, it is foreseeable that the AIR will become comparatively sparse
when these reflections are reduced. Such sparse AIRs can occur in, for example, an
outdoor environment.
In this chapter, sparse system identification algorithms are exploited for AEC,
particularly when the AIR is sparse. Although proportionate-type adaptive filtering
algorithms were originally developed for NEC, the fast convergence of such


algorithms can be exploited when the AIR is sparse. This chapter begins by reviewing sparse adaptive algorithms in the context of the affine projection algorithm
(APA) [38] framework developed originally for NEC, whereby the unknown impulse
response is sparse. One such algorithm is the proportionate APA (PAPA) [17],
which jointly utilizes the APA and the proportionate step-size technique
presented in [33]. Similar to PNLMS, the PAPA achieves fast initial convergence but slows down subsequently due to the slow convergence of the coefficients
having significantly small magnitude [35]. The improved PAPA (IPAPA) [18] [19]
combines a weighted APA and PAPA such that the proportionate term associated
with PAPA enhances the convergence speed of the coefficients in the
active region while the non-proportionate term arising from APA ensures fast convergence for the coefficients having small magnitude in the non-active region. In
addition, frequency-domain approaches have also been proposed for sparse system
identification [39] [40].
The contribution of this work is to further enhance the performance of IPAPA
for AEC by utilizing the sparseness measure [41] of an impulse response. As will be
discussed in Section 2.2.3, the success of IPAPA depends on a pre-defined control
parameter, and it is foreseeable that the value required for fast convergence may vary across different acoustic environments. In view of this, the sparseness of the estimated impulse response is incorporated in order to compute the time-varying weights assigned
to the proportionate and non-proportionate terms. Two mechanisms are proposed to
achieve this. In the first proposed sparseness-controlled IPAPA (SC-IPAPA-I), similar to SC-IPNLMS [23], additional weighting terms computed from the sparseness measure of the estimated impulse response are applied to the proportionate
and non-proportionate terms in the conventional IPAPA. In the second proposed
SC-IPAPA-II, as opposed to existing IPAPA, SC-IPNLMS and SC-IPAPA-I, the pre-determined time-invariant control parameter in conventional IPAPA is replaced by
a sparseness-dependent control parameter. This reduces the need to pre-determine
the control parameter prior to adaptation. In addition to the development of the
two algorithms, the convergence performance and stability conditions have also been
analyzed, which allows one to gain insights as to why the proposed SC-IPAPA-I and
SC-IPAPA-II can achieve fast convergence. Results presented in Section 2.6 show
that the proposed SC-IPAPA-I and SC-IPAPA-II algorithms outperform the IPAPA
as they achieve a higher rate of convergence and a better steady-state performance.

Figure 2.1: Schematic diagram of an acoustic echo canceller (input signal x(n), unknown impulse response h, received signal y(n), adaptive filter and error signal e(n)).

2.2 Review of APAs for non-blind system identification

Adaptive algorithms in the context of non-blind system identification are reviewed
by defining, as shown in Fig. 2.1, the tap-input vector $\mathbf{x}(n)$ and the unknown impulse
response $\mathbf{h}$ given by

$$\mathbf{x}(n) = [x(n)\ x(n-1)\ \dots\ x(n-L+1)]^T, \tag{2.1}$$

$$\mathbf{h} = [h_0\ h_1\ \dots\ h_{L-1}]^T, \tag{2.2}$$

where $L$ is the filter length and $[\cdot]^T$ denotes the transpose operator. The received signal
$y(n)$ can be expressed as

$$y(n) = \mathbf{x}^T(n)\mathbf{h} + v(n), \tag{2.3}$$

where $v(n)$ is the observation noise. For clarity of presentation, the effects of $v(n)$
in the description of algorithms will be temporarily ignored.

2.2.1 Affine projection and regularized APAs

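The APA [38] with projection order $p \le L$ is derived based on optimization subject
to multiple equality constraints such that

$$\mathbf{x}^T(n)\hat{\mathbf{h}}(n) = y(n), \tag{2.4}$$

$$\vdots$$

$$\mathbf{x}^T(n-p+1)\hat{\mathbf{h}}(n) = y(n-p+1), \tag{2.5}$$

where $\hat{\mathbf{h}}(n)$ is an estimate of $\mathbf{h}$. To compute $\hat{\mathbf{h}}(n)$ iteratively, this optimization
problem can be formulated, based on the principle of minimal disturbance [42], as

$$\min_{\hat{\mathbf{h}}(n)} \|\hat{\mathbf{h}}(n) - \hat{\mathbf{h}}(n-1)\|_2^2, \quad \text{s.t.} \quad \mathbf{y}(n) - \mathbf{X}^T(n)\hat{\mathbf{h}}(n) = \mathbf{0}_{p\times 1}, \tag{2.6}$$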

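where $\|\hat{\mathbf{h}}(n)\|_2^2 = \sum_{l=0}^{L-1} \hat{h}_l^2(n)$, $\mathbf{0}_{p\times 1}$ is the $p \times 1$ null vector, $\mathbf{y}(n) = [y(n)\ y(n-1)\ \dots\ y(n-p+1)]^T$
and the $L \times p$ matrix $\mathbf{X}(n) = [\mathbf{x}(n)\ \mathbf{x}(n-1)\ \dots\ \mathbf{x}(n-p+1)]$.
The solution for $\hat{\mathbf{h}}(n)$ can be obtained by taking the Lagrangian for (2.6) such that

$$\mathcal{J}(n) = \|\hat{\mathbf{h}}(n) - \hat{\mathbf{h}}(n-1)\|_2^2 + [\mathbf{y}(n) - \mathbf{X}^T(n)\hat{\mathbf{h}}(n)]^T \boldsymbol{\lambda}, \tag{2.7}$$

where $\boldsymbol{\lambda} = [\lambda_0\ \dots\ \lambda_{p-1}]^T$ is a $p \times 1$ vector containing the Lagrangian multipliers.
Differentiating (2.7) with respect to $\hat{\mathbf{h}}(n)$ and setting the result to zero gives

$$\hat{\mathbf{h}}(n) = \hat{\mathbf{h}}(n-1) + \frac{1}{2}\mathbf{X}(n)\boldsymbol{\lambda}. \tag{2.8}$$

Substituting (2.8) into the constraint

$$\mathbf{y}(n) - \mathbf{X}^T(n)\hat{\mathbf{h}}(n) = \mathbf{0}_{p\times 1} \tag{2.9}$$

and writing the a priori error vector as $\mathbf{e}(n) = \mathbf{y}(n) - \mathbf{X}^T(n)\hat{\mathbf{h}}(n-1)$, the vector of Lagrangian multipliers is given by $\boldsymbol{\lambda} = 2[\mathbf{X}^T(n)\mathbf{X}(n)]^{-1}\mathbf{e}(n)$, which
can then be substituted back into (2.8) to give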

$$\hat{\mathbf{h}}(n) = \hat{\mathbf{h}}(n-1) + \mu\mathbf{X}(n)[\mathbf{X}^T(n)\mathbf{X}(n) + \delta\mathbf{I}_{p\times p}]^{-1}\mathbf{e}(n), \tag{2.10}$$

where $\mathbf{I}_{p\times p}$ is a $p \times p$ identity matrix, $\mu$ is the step-size and $\delta$ is a positive regularization
parameter [42].
It has been shown in [43] that the choice of $\delta$ is important in practice since,
if it is not chosen properly under low SNR conditions, the APA may not converge.
In order to mitigate the effects of noise, a regularized APA (RAPA) was proposed
such that [44]

$$\delta_{\mathrm{RAPA}} = \frac{L(1 + \sqrt{1 + \mathrm{ENR}})}{\mathrm{ENR}}\,\sigma_x^2, \tag{2.11}$$

where the echo-to-noise ratio $\mathrm{ENR} = \sigma_y^2/\sigma_v^2$ while $\sigma_x^2$, $\sigma_y^2$ and $\sigma_v^2$ are the variances of
$x(n)$, $y(n)$ and $v(n)$, respectively.
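The regularized APA recursion reviewed above can be sketched in a few lines. The following is a minimal, noise-free illustration that uses only the Python standard library; the function names (`solve`, `apa_update`, `identify`), the default values of `mu` and `delta`, and the white Gaussian input are assumptions made for this sketch and are not taken from the thesis.

```python
import random

def solve(A, b):
    """Solve the small p x p system A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z

def apa_update(h_hat, X_cols, y_vec, mu=1.0, delta=1e-6):
    """One regularized APA iteration.
    X_cols holds the p tap-input vectors x(n), ..., x(n-p+1) (columns of X(n));
    y_vec holds y(n), ..., y(n-p+1)."""
    p, L = len(X_cols), len(h_hat)
    # A priori error e(n) = y(n) - X^T(n) h_hat(n-1)
    e = [y_vec[j] - sum(X_cols[j][l] * h_hat[l] for l in range(L)) for j in range(p)]
    # Regularized p x p matrix X^T(n) X(n) + delta * I
    A = [[sum(X_cols[i][l] * X_cols[j][l] for l in range(L)) + (delta if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    z = solve(A, e)
    # h_hat(n) = h_hat(n-1) + mu * X(n) z
    return [h_hat[l] + mu * sum(X_cols[j][l] * z[j] for j in range(p)) for l in range(L)]

def identify(h_true, p=4, n_iter=400, seed=0):
    """Identify h_true from a white Gaussian input without observation noise."""
    rng = random.Random(seed)
    L = len(h_true)
    x = [rng.gauss(0.0, 1.0) for _ in range(n_iter + L + p)]
    h_hat = [0.0] * L
    for n in range(L + p, len(x)):
        X_cols = [[x[n - j - l] for l in range(L)] for j in range(p)]
        y_vec = [sum(col[l] * h_true[l] for l in range(L)) for col in X_cols]
        h_hat = apa_update(h_hat, X_cols, y_vec)
    return h_hat
```

Solving the $p \times p$ system directly, rather than forming an explicit inverse, is a design choice of this sketch; for the small projection orders typically used it keeps the per-iteration cost dominated by the $L \times p$ products.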

2.2.2 The proportionate affine projection algorithm (PAPA)
Extending the APA so that each coefficient is updated in proportion to its
magnitude for sparse system identification, a proportionate APA (PAPA) update
equation can be written as [17]

$$\hat{\mathbf{h}}(n) = \hat{\mathbf{h}}(n-1) + \frac{1}{2}\mathbf{G}(n)\mathbf{X}(n)\boldsymbol{\lambda}', \tag{2.12}$$

where $\boldsymbol{\lambda}'$ is a $p \times 1$ vector containing the PAPA Lagrangian multipliers and $\mathbf{G}(n)$ is a
diagonal proportionate step-size matrix of dimension $L \times L$. The diagonal elements
of $\mathbf{G}(n)$ are related to the magnitude of the coefficients in $\hat{\mathbf{h}}(n-1)$ and can be
expressed as [33]

$$g_l(n) = \frac{\kappa_l(n)}{\sum_{i=0}^{L-1}\kappa_i(n)}, \tag{2.13}$$

$$\kappa_l(n) = \max\{\rho \times \max\{\gamma, |\hat{h}_0(n)|, \dots, |\hat{h}_{L-1}(n)|\}, |\hat{h}_l(n)|\}, \tag{2.14}$$

with $l = 0, \dots, L-1$ being the tap-indices. The scalar $\gamma$ is included in (2.14) to
prevent $\hat{h}_l(n)$ from stalling during initialization when $\hat{\mathbf{h}}(0) = \mathbf{0}_{L\times 1}$ while $\rho$ prevents
coefficients from stalling when they are much smaller than the largest coefficient [33].
Solving (2.12) subject to (2.9) gives $\boldsymbol{\lambda}' = 2[\mathbf{X}^T(n)\mathbf{G}(n)\mathbf{X}(n)]^{-1}\mathbf{e}(n)$ and hence,
the update equation of PAPA is given by

$$\hat{\mathbf{h}}(n) = \hat{\mathbf{h}}(n-1) + \mu\mathbf{G}(n)\mathbf{X}(n)\mathbf{R}_G^{-1}(n)\mathbf{e}(n), \tag{2.15}$$

where $\mathbf{R}_G^{-1}(n) = [\mathbf{X}^T(n)\mathbf{G}(n)\mathbf{X}(n) + \delta\mathbf{I}_{p\times p}]^{-1}$ and $\delta$ is the regularization parameter.

It is therefore noted that, similar to PNLMS, the PAPA assigns a higher step-size
to filter coefficients with higher magnitudes.

2.2.3 The improved proportionate affine projection algorithm (IPAPA)

Similar to PNLMS, the drawback of PAPA is that it suffers from slow convergence
due to the small step-sizes allocated to coefficients with small magnitude. In order to
address this, an improved PAPA (IPAPA) was proposed in [18] [19]. This algorithm
is a combination of weighted PAPA and APA where elements of the control matrix
in (2.13) are now defined, similar to IPNLMS [34], as

$$g_l(n) = \frac{1-\alpha}{2L} + \frac{(1+\alpha)|\hat{h}_l(n)|}{2\|\hat{\mathbf{h}}(n)\|_1 + \epsilon}, \quad l = 0, 1, \dots, L-1, \tag{2.16}$$

given that $\alpha$ is the control parameter, $\epsilon$ is a small positive value that prevents
division by zero during initialization when $\hat{\mathbf{h}}(0) = \mathbf{0}_{L\times 1}$ and $\|\cdot\|_1$ is the $\ell_1$-norm
operator. The parameter $\alpha$ is pre-defined and it determines the relative significance
of APA and PAPA. As can be seen from (2.16), when $\alpha = -1$,

$$g_l(n) = 1/L, \quad l = 0, 1, \dots, L-1, \tag{2.17}$$

and as a consequence, the IPAPA is equivalent to the APA. On the other hand,
when $\alpha = 1$,

$$g_l(n) = \frac{2|\hat{h}_l(n)|}{2\|\hat{\mathbf{h}}(n)\|_1 + \epsilon}, \quad l = 0, 1, \dots, L-1, \tag{2.18}$$

the estimated filter coefficients are normalized by the $\ell_1$-norm of $\hat{\mathbf{h}}(n)$ such that
the effective step-size of IPAPA is proportional to the magnitude of the estimated
filter coefficients. Therefore, the performance of IPAPA is equivalent to APA when
$\alpha = -1$ while for $\alpha = 1$, IPAPA performs similarly to PAPA. When $-1 \le \alpha \le 1$,
fast convergence of IPAPA is achieved by combining an APA term $(1-\alpha)/(2L)$
and a proportionate term $(1+\alpha)|\hat{h}_l(n)|/(2\|\hat{\mathbf{h}}(n)\|_1 + \epsilon)$. In practice, good choices
for $\alpha$ are 0 or -0.5 [34] and results presented in [19] showed that $\alpha = 0.5$ is suitable
for very sparse AIRs.
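The role of the control parameter in (2.16) can be illustrated numerically. The following is a minimal sketch of the diagonal elements of the IPAPA control matrix; the function name `ipapa_gains` and the default value of `eps` are assumptions made for illustration.

```python
def ipapa_gains(h_hat, alpha=0.0, eps=1e-8):
    """Diagonal elements g_l(n) of the IPAPA control matrix following (2.16)."""
    L = len(h_hat)
    l1 = sum(abs(c) for c in h_hat)  # l1-norm of the current estimate
    return [(1.0 - alpha) / (2.0 * L) + (1.0 + alpha) * abs(c) / (2.0 * l1 + eps)
            for c in h_hat]
```

For example, `ipapa_gains(h, alpha=-1.0)` assigns the same value $1/L$ to every tap regardless of the estimate, whereas `alpha=1.0` assigns zero to taps whose estimates are zero, which shows why a purely proportionate update can stall on small coefficients.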

2.3 The proposed sparseness-controlled APAs

One of the main weaknesses of the IPAPA algorithm is the need to determine a value of $\alpha$
that offers a high rate of convergence for different impulse responses. As the
estimated impulse response $\hat{\mathbf{h}}(n)$ approaches the desired impulse response
from an initialized vector during adaptation, its magnitude and phase responses
may vary and therefore it may be desirable for $\alpha$ to vary with iterations. It is
therefore proposed to assign a time-varying weighting to the proportionate and non-proportionate terms in IPAPA iteratively according to the sparseness of $\hat{\mathbf{h}}(n)$, rather
than using a pre-defined constant control parameter.

2.3.1 The sparseness measure

As stated in [41] [35] [45], the sparseness of an impulse response of length $L$ can be
quantified using

$$\xi = \frac{L}{L - \sqrt{L}}\left[1 - \frac{\|\mathbf{h}\|_1}{\sqrt{L}\,\|\mathbf{h}\|_2}\right], \tag{2.19}$$

where $\|\mathbf{h}\|_1$ and $\|\mathbf{h}\|_2$ are defined as

$$\|\mathbf{h}\|_1 = \sum_{l=0}^{L-1} |h_l|, \tag{2.20}$$

$$\|\mathbf{h}\|_2 = \sqrt{\sum_{l=0}^{L-1} h_l^2}. \tag{2.21}$$

As shown in [34] [35], a perfectly sparse impulse response with a single non-zero
coefficient has a sparseness measure $\xi = 1$ while a perfectly dispersive system with
constant magnitude has a sparseness measure $\xi = 0$.
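The two limiting cases of the sparseness measure can be checked directly. The following is a minimal sketch of (2.19) using only the standard library; the function name `sparseness` is an assumption, and the input is assumed to be a non-zero vector with $L > 1$.

```python
import math

def sparseness(h):
    """Sparseness measure of (2.19) for a non-zero impulse response h of length L > 1."""
    L = len(h)
    l1 = sum(abs(c) for c in h)            # l1-norm
    l2 = math.sqrt(sum(c * c for c in h))  # l2-norm
    return L / (L - math.sqrt(L)) * (1.0 - l1 / (math.sqrt(L) * l2))
```

A vector with a single non-zero coefficient gives $\|\mathbf{h}\|_1 = \|\mathbf{h}\|_2$ and hence a measure of one, whereas a constant-magnitude vector gives $\|\mathbf{h}\|_1 = \sqrt{L}\,\|\mathbf{h}\|_2$ and hence a measure of zero.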

In order to further illustrate this concept, synthetic impulse responses with
various sparseness are generated using an exponential model and a random input
sequence [35]. This is achieved by first defining an $L \times 1$ vector

$$\mathbf{u} = [\mathbf{0}_{L_z\times 1}^T\ \ 1\ \ e^{-1/\tau}\ \ e^{-2/\tau}\ \dots\ e^{-(L_u-1)/\tau}]^T, \tag{2.22}$$

where $L_z$ is the length of leading zeros and $L_u = L - L_z$ is the length of the decaying
window while $\tau$ is the decay constant. Defining an $L_u \times 1$ vector $\mathbf{b}$ as a zero-mean
white Gaussian noise (WGN) sequence with variance $\sigma_b^2$, the impulse response is
subsequently generated by

$$\mathbf{B}_{L_u\times L_u} = \mathrm{diag}\{\mathbf{b}\}, \qquad \mathbf{h} = \begin{bmatrix} \mathbf{0}_{L_z\times L_z} & \mathbf{0}_{L_z\times L_u} \\ \mathbf{0}_{L_u\times L_z} & \mathbf{B}_{L_u\times L_u} \end{bmatrix}\mathbf{u} + \mathbf{a}, \tag{2.23}$$

where $\mathbf{a}$ is an $L \times 1$ vector generated using another zero-mean WGN sequence with
variance $\sigma_a^2$. This vector $\mathbf{a}$ ensures that the elements in the 'inactive' region are
small but non-zero. Figure 2.2 shows an example of impulse responses generated
using (2.23) with $\sigma_b^2 = 1$, $\sigma_a^2 = 3 \times 10^{-3}$, $L = 1024$ and $L_z = 40$, in which the
following decay constants (a) $\tau = 5$, (b) $\tau = 15$, (c) $\tau = 35$ and (d) $\tau = 65$ have
been used. These decay constants result in sparseness measures of (a) $\xi = 0.9125$,
(b) $\xi = 0.8518$, (c) $\xi = 0.7787$ and (d) $\xi = 0.7093$, respectively. Therefore, it can be
observed that a higher $\xi$ corresponds to a more sparse impulse response.

Figure 2.2: Impulse responses generated using (a) $\tau = 5$, (b) $\tau = 15$, (c) $\tau = 35$ and (d) $\tau = 65$ with $\sigma_b^2 = 1$ (horizontal axis: coefficient index).
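The exponential model of (2.22) and (2.23) can be sketched as follows. The function name `synthetic_ir`, the use of Python's `random` module and the seed handling are assumptions made for illustration, so the sketch is not expected to reproduce the exact realizations or sparseness values reported for Fig. 2.2.

```python
import math
import random

def synthetic_ir(L=1024, L_z=40, tau=5.0, var_b=1.0, var_a=3e-3, seed=0):
    """Synthetic impulse response: L_z leading zeros followed by an exponentially
    decaying window weighted by WGN b, plus a small WGN floor a over all L taps."""
    rng = random.Random(seed)
    L_u = L - L_z
    # u = [0 ... 0, 1, e^{-1/tau}, ..., e^{-(L_u-1)/tau}]
    u = [0.0] * L_z + [math.exp(-k / tau) for k in range(L_u)]
    b = [rng.gauss(0.0, math.sqrt(var_b)) for _ in range(L_u)]
    a = [rng.gauss(0.0, math.sqrt(var_a)) for _ in range(L)]
    h = []
    for l in range(L):
        # The block-diagonal matrix in (2.23) scales only the last L_u entries of u
        active = b[l - L_z] * u[l] if l >= L_z else 0.0
        h.append(active + a[l])
    return h
```

Calling `synthetic_ir(tau=65.0)` instead of `synthetic_ir(tau=5.0)` lengthens the decaying window, so the energy is spread over more coefficients and the response becomes more dispersive.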

2.3.2 The proposed SC-IPAPA-I and SC-IPAPA-II

Two sparseness-controlled algorithms, SC-IPAPA-I and SC-IPAPA-II, are proposed
to improve the performance of IPAPA by taking into account the sparseness of the
estimated impulse response $\hat{\mathbf{h}}(n)$ given by

$$\hat{\xi}(n) = \frac{L}{L - \sqrt{L}}\left[1 - \frac{\|\hat{\mathbf{h}}(n)\|_1}{\sqrt{L}\,\|\hat{\mathbf{h}}(n)\|_2}\right]. \tag{2.24}$$

The key feature of the proposed algorithms is their ability to control the relative
significance of the non-proportionate term $(1-\alpha)/(2L)$ and the proportionate term
$(1+\alpha)|\hat{h}_l(n)|/(2\|\hat{\mathbf{h}}(n)\|_1 + \epsilon)$ in (2.16) according to $\xi_{sc}(n)$. The improved convergence
speed is achieved by assigning a higher weighting to the proportionate term
if the estimated impulse response is sparse. On the other hand, for an estimated
impulse response that is dispersive, a higher weighting will be allocated to the non-proportionate term.

SC-IPAPA-I
Similar to [35], it is proposed to control the weighting of the non-proportionate and
proportionate terms by using $1 - 0.5\xi_{sc}(n)$ and $1 + 0.5\xi_{sc}(n)$, respectively. It is
worthwhile to note that if $\hat{\mathbf{h}}(n)$ is initialized as a null vector, i.e., $\hat{\mathbf{h}}(0) = \mathbf{0}_{L\times 1}$, the
$\ell_2$-norm of the estimated filter coefficients is $\|\hat{\mathbf{h}}(0)\|_2 = 0$. Hence, in order to prevent
division by zero, and in contrast to [35], a small positive number $\epsilon_{sc}$ is incorporated
in (2.24) such that

$$\xi_{sc}(n) = \frac{L}{L - \sqrt{L}}\left[1 - \frac{\|\hat{\mathbf{h}}(n)\|_1}{\sqrt{L}\,\|\hat{\mathbf{h}}(n)\|_2 + \epsilon_{sc}}\right]. \tag{2.25}$$

By incorporating $\epsilon_{sc}$, the proposed sparseness measure given by (2.25) can be applied
from $n = 0$. Similar to the control matrix elements in [19] and [35], computation of $g_l(n)$ for SC-IPAPA-I
can be expressed as

$$g_l(n) = [1 - 0.5\xi_{sc}(n)]\frac{1-\alpha}{2L} + [1 + 0.5\xi_{sc}(n)]\frac{(1+\alpha)|\hat{h}_l(n)|}{2\|\hat{\mathbf{h}}(n)\|_1 + \epsilon} \tag{2.26}$$

for $n \ge 0$ and $l = 0, \dots, L-1$. It can be seen from (2.26) that, when $\xi_{sc}(n)$ is
high such as for a sparse impulse response, a higher weighting is assigned to the
proportionate term. Therefore the proposed SC-IPAPA-I achieves fast convergence
by taking the sparseness of the unknown system into account during adaptation.
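The SC-IPAPA-I weighting can be sketched as follows. The function names are hypothetical, and placing `eps_sc` in the denominator of the norm ratio is an assumption about how the regularized sparseness measure is formed.

```python
import math

def sparseness_sc(h_hat, eps_sc=1e-8):
    """Regularized sparseness estimate; adding eps_sc to the denominator is an
    assumption of this sketch so that the all-zero initial estimate is handled."""
    L = len(h_hat)
    l1 = sum(abs(c) for c in h_hat)
    l2 = math.sqrt(sum(c * c for c in h_hat))
    return L / (L - math.sqrt(L)) * (1.0 - l1 / (math.sqrt(L) * l2 + eps_sc))

def sc_ipapa1_gains(h_hat, alpha=0.0, eps=1e-8, eps_sc=1e-8):
    """Diagonal elements g_l(n) for SC-IPAPA-I following (2.26)."""
    L = len(h_hat)
    xi = sparseness_sc(h_hat, eps_sc)
    l1 = sum(abs(c) for c in h_hat)
    # Non-proportionate term, de-emphasized as the estimate becomes sparser
    non_prop = (1.0 - 0.5 * xi) * (1.0 - alpha) / (2.0 * L)
    # Proportionate term, emphasized as the estimate becomes sparser
    return [non_prop + (1.0 + 0.5 * xi) * (1.0 + alpha) * abs(c) / (2.0 * l1 + eps)
            for c in h_hat]
```

For a sparse estimate, the dominant tap receives a larger gain than under IPAPA with the same $\alpha$, while the inactive taps receive a smaller one.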

SC-IPAPA-II
The PAPA achieves fast initial convergence but slows down subsequently due to
the small step-sizes allocated to coefficients with small magnitude. The control
parameter $\alpha$ in IPAPA addresses this issue and it has been shown in [19] that IPAPA
with a properly chosen $\alpha$ can improve the convergence speed of PAPA. Although it
has been shown in [19] that a choice of $\alpha = 0.5$ achieves fast convergence in IPAPA,
the convergence of IPAPA may differ for different $\alpha$ values when estimating different
impulse responses.
Unlike [19] where a fixed value of $\alpha$ is used for IPAPA, a new time-varying
mechanism for $\alpha$ is proposed. This new control parameter is a function of the
estimated sparseness $\xi_{sc}(n)$ such that

$$\alpha_{sc}(n) = \psi\{\xi_{sc}(n)\}, \tag{2.27}$$

where $\psi\{\cdot\}$ is a non-linear scaling function. Similar to the proposed SC-IPAPA-I,
the motivation for including the sparseness measure in the control matrix is to weigh
the non-proportionate and proportionate terms at each time iteration. When the
sparseness of the impulse response $\xi_{sc}(n) = 0$, it is desirable to have the proportionate
term in (2.16) as

$$\frac{[1 + \alpha_{sc}(n)]|\hat{h}_l(n)|}{2\|\hat{\mathbf{h}}(n)\|_1 + \epsilon} = 0, \tag{2.28}$$

while on the other hand, when $\xi_{sc}(n) = 1$ for a perfectly sparse impulse response, it
is desirable to have

$$\frac{1 - \alpha_{sc}(n)}{2L} = 0. \tag{2.29}$$

From (2.28) and (2.29), two conditions for dispersive and sparse impulse responses,
respectively, are therefore defined by

$$\alpha_{sc}(n) = \psi\{\xi_{sc}(n) = 0\} = -1, \tag{2.30}$$

$$\alpha_{sc}(n) = \psi\{\xi_{sc}(n) = 1\} = 1. \tag{2.31}$$

There are various non-linear functions that can fulfill the conditions of (2.30)
and (2.31). Although it is useful to know that impulse responses, in general, follow an
exponential decay model [46], obtaining the time constant may be computationally
expensive [47]. For this reason, a third-order power function of $\xi_{sc}(n)$ is proposed as

$$\alpha_{sc}(n) = \psi\{\xi_{sc}(n)\} = K[\xi_{sc}(n) - 0.5]^3, \tag{2.32}$$

where $K$ is a scalar. Note that $K = 8$ satisfies conditions (2.30) and (2.31).

Figure 2.3 illustrates how $\alpha_{sc}(n)$ varies with $\xi_{sc}(n)$ in the proposed model. It
can be seen that $\alpha_{sc}(n)$ increases monotonically as the sparseness of the impulse
response increases. This implies that the proportionate term achieves a higher weighting when $\xi_{sc}(n) \to 1$. On the other hand, the non-proportionate term achieves a higher
weighting when $\xi_{sc}(n) \to 0$. Similar to (2.16), elements in the control matrix for the
proposed SC-IPAPA-II can be expressed as

$$g_l(n) = \frac{1 - \psi\{\xi_{sc}(n)\}}{2L} + \frac{[1 + \psi\{\xi_{sc}(n)\}]|\hat{h}_l(n)|}{2\|\hat{\mathbf{h}}(n)\|_1 + \epsilon}, \tag{2.33}$$

where $l = 0, \dots, L-1$ and $\psi\{\xi_{sc}(n)\}$ is defined in (2.32). It is important to note
that, unlike [19] where the diagonal elements in the control matrix are dependent on
the pre-defined control parameter $\alpha$, the relative proportionate/non-proportionate
weighting in the proposed SC-IPAPA-II is computed based on the sparseness of the
estimated impulse response $\hat{\mathbf{h}}(n)$ at every iteration.
To better understand how (2.33) allocates step-sizes to different coefficients
according to their magnitudes, Fig. 2.4 shows how the values of the diagonal elements
$g_l(n)$ of the control matrix defined in (2.33) vary with $|\hat{h}_l(n)|/\max\{|\hat{h}_l(n)|\}$ for
different $\xi$. It can be seen that, for each $\xi$, the effective step-size increases with the
magnitude of the impulse response. It is also important to note that the slope
increases with $\xi$. This implies that filter coefficients with large magnitudes
will adapt using a larger step-size for a sparse system compared to a dispersive system. On the other hand, filter coefficients with small magnitudes in a sparse system
will be assigned a smaller step-size compared to a dispersive system. Therefore,
the proposed SC-IPAPA-II does not require a pre-defined control parameter and it
results in a high rate of convergence that is robust to the sparseness of the impulse
response.

Figure 2.3: Control parameter varies with sparseness measure $\xi_{sc}(n)$.
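The sparseness-dependent control parameter can be sketched as follows. The cubic form $K(\xi - 0.5)^3$ used here is one third-order function that meets the two end-point conditions with $K = 8$ and is adopted as an assumption of this sketch; the function names, the clamping of the sparseness estimate to $[0, 1]$ and the placement of `eps_sc` are likewise illustrative assumptions.

```python
import math

def sparseness_sc(h_hat, eps_sc=1e-8):
    """Regularized sparseness estimate (eps_sc placement is an assumption)."""
    L = len(h_hat)
    l1 = sum(abs(c) for c in h_hat)
    l2 = math.sqrt(sum(c * c for c in h_hat))
    return L / (L - math.sqrt(L)) * (1.0 - l1 / (math.sqrt(L) * l2 + eps_sc))

def alpha_sc(xi, K=8.0):
    """Assumed cubic mapping: psi(0) = -1 and psi(1) = 1 when K = 8."""
    xi = min(max(xi, 0.0), 1.0)  # safeguard added in this sketch only
    return K * (xi - 0.5) ** 3

def sc_ipapa2_gains(h_hat, eps=1e-8, eps_sc=1e-8):
    """Diagonal elements g_l(n) with alpha replaced by the sparseness-dependent value."""
    L = len(h_hat)
    a = alpha_sc(sparseness_sc(h_hat, eps_sc))
    l1 = sum(abs(c) for c in h_hat)
    return [(1.0 - a) / (2.0 * L) + (1.0 + a) * abs(c) / (2.0 * l1 + eps)
            for c in h_hat]
```

With this mapping a constant-magnitude estimate yields a uniform gain of approximately $1/L$ on every tap (APA-like behaviour), while an estimate with a single non-zero tap places essentially all of the gain on that tap (PAPA-like behaviour).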

2.4 Stability analysis

Further insights into the proposed algorithms are gained by analyzing their stability
as well as their convergence performance. This allows the step-size $\mu$ that ensures
convergence to be determined and justifies the improvement in convergence
performance over the IPAPA. Following the approach in [42], the analysis begins
by defining an impulse response error vector at time $n$ as

$$\mathbf{c}(n) = \hat{\mathbf{h}}(n) - \mathbf{h}. \tag{2.34}$$

Figure 2.4: Magnitude of the elements in the control matrix with different sparseness ($\xi$) values (horizontal axis: normalized $|\hat{h}_l(n)|$ coefficient).

Exploiting the APA update equation given by (2.10) and ignoring the effect of regularization parameter similar to [42], it can be derived that

(2.35)

for which the instantaneous correlation matrix can be expressed as

(2.36)

The matrix $\mathbf{R}(n)$ can be further decomposed as

$$\mathbf{R}(n) = \mathbf{Q}(n)\boldsymbol{\Gamma}(n)\mathbf{Q}^T(n), \tag{2.37}$$

where $\mathbf{Q}(n)$ is a unitary matrix whose columns are an orthogonal set of eigenvectors
associated with the eigenvalues of $\mathbf{R}(n)$ while $\boldsymbol{\Gamma}(n)$ is a diagonal matrix with its
diagonal elements being the eigenvalues of $\mathbf{R}(n)$. These eigenvalues, denoted by
$\gamma_0(n), \gamma_1(n), \dots, \gamma_{L-1}(n)$, are all non-negative and real. Assuming that the second-order statistics of the input signal are slowly varying, i.e., $\mathbf{Q}(n+1) \approx \mathbf{Q}(n)$, substituting
(2.36) and (2.37) into (2.35) and premultiplying the resultant equation by $\mathbf{Q}^T(n)$,
the following can be obtained

(2.38)

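Defining $\mathbf{z}(n) = \mathbf{Q}^T(n)\mathbf{c}(n) = [z_0(n), \dots, z_{L-1}(n)]^T$, $\mathbf{z}(n+1)$ can be
expressed as

$$\mathbf{z}(n+1) = [\mathbf{I}_{L\times L} - \mu\boldsymbol{\Gamma}(n)]\mathbf{z}(n) = \prod_{m=1}^{n}[\mathbf{I}_{L\times L} - \mu\boldsymbol{\Gamma}(m)]\mathbf{z}(1) \approx [\mathbf{I}_{L\times L} - \mu\bar{\boldsymbol{\Gamma}}]^n\mathbf{z}(1), \tag{2.39}$$

where $\bar{\boldsymbol{\Gamma}}$ is a matrix with the diagonal elements corresponding to the eigenvalues of

$$\bar{\mathbf{R}} = E\{\mathbf{R}(n)\}, \tag{2.40}$$

given that $E\{\cdot\}$ is the mathematical expectation operator. For the $k$th mode of
APA, the following can be obtained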

$k = 0, 1, \dots, L-1$;

(2.41)

where $\bar{\gamma}_k$ is the $k$th diagonal element in $\bar{\boldsymbol{\Gamma}}$. In order for $\hat{\mathbf{h}}(n)$ to converge to $\mathbf{h}$ when
$n \to \infty$, the step-size $\mu$ should satisfy

$$-1 < 1 - \mu\bar{\gamma}_k < 1, \quad k = 0, 1, \dots, L-1. \tag{2.42}$$

Therefore, the stability of the APA is guaranteed when

$$0 < \mu < \frac{2}{\bar{\gamma}_{\max}}, \tag{2.43}$$

where $\bar{\gamma}_{\max}$ is the largest eigenvalue of $\bar{\mathbf{R}}$.
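The step-size bound can be evaluated numerically once an estimate of the averaged matrix is available. The following is a minimal sketch that estimates the largest eigenvalue by power iteration; the function names and the choice of power iteration are assumptions made for illustration, and the input is assumed to be a symmetric non-negative definite matrix given as a list of rows.

```python
def largest_eigenvalue(R, n_iter=500):
    """Power iteration for a symmetric non-negative definite matrix R.
    The all-ones starting vector must not be orthogonal to the dominant eigenvector."""
    n = len(R)
    v = [1.0] * n
    lam = 0.0
    for _ in range(n_iter):
        w = [sum(R[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(c * c for c in w) ** 0.5
        if norm == 0.0:
            return 0.0
        v = [c / norm for c in w]
        # Rayleigh quotient v^T R v with a unit-norm v
        lam = sum(v[i] * sum(R[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam

def step_size_bound(R):
    """Upper limit 2 / gamma_max on the step-size mu."""
    return 2.0 / largest_eigenvalue(R)

def mode_converges(mu, gamma_k):
    """Per-mode condition -1 < 1 - mu * gamma_k < 1."""
    return -1.0 < 1.0 - mu * gamma_k < 1.0
```

For a diagonal matrix with eigenvalues 2 and 1, for instance, the bound evaluates to approximately 1, so a step-size of 0.9 satisfies the per-mode condition for the largest eigenvalue while 1.1 does not.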


Similarly, for the proportionate-type APAs, i.e., PAPA, IPAPA as well as
the proposed SC-IPAPA-I and SC-IPAPA-II, convergence is guaranteed when the
step-size parameter satisfies

$$0 < \mu < \frac{2}{\bar{\gamma}'_{\max}}, \tag{2.44}$$

where $\bar{\gamma}'_{\max}$ is the largest eigenvalue of the correlation matrix

(2.45)

It is well known that the eigenvalue spread, which is defined as the ratio between the largest eigenvalue and the smallest eigenvalue of the correlation matrix ($\mathbf{R}$ and $\tilde{\mathbf{R}}$), has a significant impact on the convergence speed of steepest descent and stochastic gradient algorithms [42] [48]. As the IPAPA and the proposed SC-IPAPA-I and SC-IPAPA-II have different diagonal proportionate matrices $\mathbf{G}(n)$, it is expected that these algorithms will have different eigenvalue spreads. Figure 2.5 illustrates how the eigenvalue spread of the correlation matrix of IPAPA, SC-IPAPA-I and SC-IPAPA-II varies for $128 \le L \le 1024$. The input signal used for this

Figure 2.5: Eigenvalue spread for IPAPA, SC-IPAPA-I and SC-IPAPA-II for $128 \le L \le 1024$. (Horizontal axis: length of impulse response.)

illustrative example is a zero-mean WGN and the impulse response depicted in Fig. 2.2(a) is used as the true impulse response. As can be seen from Fig. 2.5, the eigenvalue spread of these algorithms increases with increasing length of the impulse response, which implies that the correlation matrix tends to be more ill-conditioned when its dimension increases [49]. More importantly, it is observed that the eigenvalue spread of SC-IPAPA-I and SC-IPAPA-II is consistently smaller than that of IPAPA, with SC-IPAPA-II achieving the smallest eigenvalue spread. Therefore, in addition to the effective allocation of proportionate step-sizes, the reduced eigenvalue spread shown here also justifies the desired improvement in convergence of SC-IPAPA-I and SC-IPAPA-II over that of IPAPA.
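
To make the definition of eigenvalue spread concrete, the following minimal sketch computes it for a symmetric $2 \times 2$ correlation matrix using the closed-form eigenvalues. The matrices and the helper names `eig_sym_2x2` and `eigenvalue_spread` are hypothetical; they are not the matrices used to produce Fig. 2.5.

```python
import math


def eig_sym_2x2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], ascending."""
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    return mean - radius, mean + radius


def eigenvalue_spread(a, b, c):
    """Ratio of the largest to the smallest eigenvalue."""
    lam_min, lam_max = eig_sym_2x2(a, b, c)
    return lam_max / lam_min


# Hypothetical correlation matrices with unit diagonal.
white = eigenvalue_spread(1.0, 0.0, 1.0)      # uncorrelated input
coloured = eigenvalue_spread(1.0, 0.9, 1.0)   # strongly correlated input

print("spread, uncorrelated:", white)
print("spread, correlated  :", coloured)
```

A spread close to one corresponds to a well-conditioned matrix, while a large spread indicates ill-conditioning and, for gradient-type algorithms, slower convergence along the weakest mode.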

Table 2.1: Number of multiplications of IPAPA, RAPA, SC-IPAPA-I and SC-IPAPA-II per iteration.

Algorithm      Number of multiplications
IPAPA          $(p+1)L^2 + (p^2+2p+2)L + p^3 + p^2 - p - 9$
RAPA           $(p^2+2p)L + p^3 + p^2 + 3$
SC-IPAPA-I     $(p+1)L^2 + (p^2+2p+5)L + p^3 + p^2 - p + 1$
SC-IPAPA-II    $(p+1)L^2 + (p^2+2p+2)L + p^3 + p^2 - p - 4$

Table 2.2: Number of additions of IPAPA, RAPA, SC-IPAPA-I and SC-IPAPA-II per iteration.

Algorithm      Number of additions
IPAPA          $(p+1)L^2 + (p^2+p+3)L - p + 1$
RAPA           $(p^2+2p-1)L + p + 3$
SC-IPAPA-I     $(p+1)L^2 + (p^2+p+6)L - p + 2$
SC-IPAPA-II    $(p+1)L^2 + (p^2+p+4)L - p + 3$

2.5 Computational complexity

Tables 2.1 and 2.2 present the number of multiplications and additions per iteration, respectively, of IPAPA, RAPA, SC-IPAPA-I and SC-IPAPA-II. As can be seen from Tables 2.1 and 2.2, IPAPA, SC-IPAPA-I and SC-IPAPA-II require $O(L^2)$ flops while RAPA incurs the lowest computational complexity of $O(L)$ flops since it does not require computation of the proportionate matrix $\mathbf{G}(n)$ at each iteration. As will be shown in Section 2.6, although SC-IPAPA-I and SC-IPAPA-II incur modestly higher computational costs, the proposed algorithms can achieve significant performance improvement over RAPA and IPAPA.

2.6 Simulation results

The performance of the proposed SC-IPAPA-I and SC-IPAPA-II is compared with that of IPAPA using the normalized misalignment defined by

$$\eta(n) = \frac{\|\mathbf{h} - \hat{\mathbf{h}}(n)\|_2^2}{\|\mathbf{h}\|_2^2}. \tag{2.46}$$

Throughout the simulations, it is assumed that the length of the adaptive filter $L$ is equal to that of the unknown system. It has been shown in [50] that the improvement in convergence reduces with increasing $p$. Therefore, to achieve a good balance between fast convergence and computational complexity, the performance of the algorithms is shown using an illustrative value of $p = 5$ throughout the simulations.
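
The normalized misalignment in (2.46) can be computed as in the following minimal sketch. The impulse response values are hypothetical, and the conversion to decibels via $10\log_{10}$ is an assumption made here because the convergence figures in this section are plotted on a dB scale.

```python
import math


def normalized_misalignment(h_true, h_est):
    """eta = ||h - h_est||_2^2 / ||h||_2^2, following (2.46)."""
    if len(h_true) != len(h_est):
        raise ValueError("both impulse responses must have the same length")
    err = sum((a - b) ** 2 for a, b in zip(h_true, h_est))
    ref = sum(a ** 2 for a in h_true)
    return err / ref


def to_db(power_ratio):
    """Convert a power ratio to decibels (assumed plotting convention)."""
    return 10.0 * math.log10(power_ratio)


# Hypothetical true and estimated impulse responses.
h = [1.0, 0.5, -0.25, 0.0]
h_hat = [0.9, 0.5, -0.25, 0.0]

eta = normalized_misalignment(h, h_hat)
print("normalized misalignment:", eta)
print("in dB:", to_db(eta))
```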

2.6.1 Simulations using WGN input

In this simulation, it is assumed that there is an echo path change midway during the simulation, where the impulse response is changed from a sparse one to one that is less sparse, as shown in Figs. 2.2(b) and (d). The input signal is a zero-mean WGN sequence while another zero-mean WGN $v(n)$ is added to $y(n)$, as shown in Fig. 2.1, to achieve an SNR of 20 dB. In this simulation, $\mu_{\text{SC-IPAPA-I}} = \mu_{\text{SC-IPAPA-II}} = 0.18$, $\mu_{\text{RAPA}} = 0.22$ and $\mu_{\text{IPAPA}} = 0.16$ have been used. These step-sizes have been chosen in order for RAPA, SC-IPAPA-I and SC-IPAPA-II to achieve approximately the same steady-state performance and for IPAPA to achieve the same initial convergence performance. The control parameters for IPAPA and SC-IPAPA-I are $\alpha_{\text{IPAPA}} = \alpha_{\text{SC-IPAPA-I}} = 0.5$. The regularization parameter for IPAPA, SC-IPAPA-I and SC-IPAPA-II is chosen as $\delta = 0.001$ while for RAPA, $\delta_{\text{RAPA}}$ is determined by

Figure 2.6: Convergence of IPAPA, RAPA, SC-IPAPA-I and SC-IPAPA-II using WGN input with echo path changed midway during simulation. (Vertical axis in dB; horizontal axis: number of iterations.)

(2.11). It can be observed from Fig. 2.6 that the normalized misalignment curves of these algorithms increase midway during the simulation due to the change in echo path. It is important to note that, both before and after the echo path change, the proposed SC-IPAPA-I and SC-IPAPA-II achieve an improvement in steady-state performance of more than 10 dB compared to IPAPA and offer a higher rate of convergence than that of RAPA. In addition, Fig. 2.7 illustrates the convergence performance of these algorithms using the same step-size of 0.16. As can be seen, the proposed SC-IPAPA-I and SC-IPAPA-II consistently outperform RAPA and IPAPA by offering a higher rate of convergence and a lower steady-state misalignment.

Figure 2.7: Convergence of IPAPA, RAPA, SC-IPAPA-I and SC-IPAPA-II using the same step-size and WGN input with echo path change introduced midway during simulation. (Vertical axis in dB; horizontal axis: number of iterations.)

2.6.2 Simulation using speech input

In this simulation, the performance of the proposed SC-IPAPA-I and SC-IPAPA-II is verified using a male speech input. As before, the same impulse responses as described in Section 2.6.1 have been used. The same step-sizes and control parameters for each of the algorithms described in the previous section have been used as well. Similar to the results shown in Section 2.6.1, the proposed SC-IPAPA-II achieves the highest rate of convergence. Compared to RAPA, the proposed SC-IPAPA-I and SC-IPAPA-II achieve an improvement of approximately 3 dB and 5 dB in normalized misalignment, respectively.

Figure 2.8: Convergence of IPAPA, SC-IPAPA-I and SC-IPAPA-II using speech input with echo path change introduced midway during simulation. (Horizontal axis: number of iterations.)

2.6.3 Simulation using impulse responses generated by the image model

Next, the performance of the proposed SC-IPAPA-I and SC-IPAPA-II is verified using impulse responses generated by the method of images [51]. Similar to Figs. 2.6-2.8, it is assumed that there is an echo path change midway during the simulation. The dimension of the room is 5 m x 6 m x 5 m and a loudspeaker is placed at the center of the room. A sampling frequency of $f_s = 8$ kHz and a reverberation time of $T_{60} = 300$ ms are used. The AIRs are subsequently truncated to a length of $L = 1024$ samples. The position of the microphone before the echo path change is (2.5, 1, 1.6) m and the microphone is moved to (3.9, 2.3, 1.6) m to simulate the echo path change. The simulation setup for this image model is shown in Fig. 2.9 while Figs. 2.10(a) and (b) show the room impulse responses generated at positions

Figure 2.9: Simulation setup used in the image model to generate room impulse response. (Room plan with microphone positions (a) and (b).)

'a' and 'b' labeled in Fig. 2.9. The sparseness measures of these impulse responses, computed using (2.19), are $\xi_a = 0.6788$ and $\xi_b = 0.8202$, respectively. In this simulation, the input signal is a zero-mean WGN sequence while the SNR is set to 20 dB. Similar to Fig. 2.6, the step-sizes are $\mu_{\text{SC-IPAPA-I}} = \mu_{\text{SC-IPAPA-II}} = 0.18$, $\mu_{\text{RAPA}} = 0.22$ and $\mu_{\text{IPAPA}} = 0.16$, while the control parameters for IPAPA and SC-IPAPA-I are $\alpha_{\text{IPAPA}} = \alpha_{\text{SC-IPAPA-I}} = 0.5$. The convergence of the algorithms is shown in Fig. 2.11. As before, it is noted that the proposed SC-IPAPA-I and SC-IPAPA-II achieve a steady-state misalignment that is lower by approximately 10 dB compared to IPAPA. In addition, the proposed algorithms offer a higher rate of convergence than that of RAPA before and after the echo path change.

Figure 2.10: Impulse responses generated using the image model, $L = 1024$. (Panels (a) and (b); horizontal axis: coefficient index.)

2.6.4 Simulation using impulse responses generated by the image model and speech input

In addition to Section 2.6.3, the performance of SC-IPAPA-I and SC-IPAPA-II is verified using impulse responses generated by the method of images and speech input. The same impulse responses as described in Section 2.6.3 are adopted and the same speech input used in Section 2.6.2 is utilized. Similar to Fig. 2.11, the same step-sizes and control parameters are adopted for IPAPA, SC-IPAPA-I and SC-IPAPA-II. As before, it can be seen from Fig. 2.12 that the proposed SC-IPAPA-I and SC-IPAPA-II can achieve a higher convergence rate than that of IPAPA both before and after the echo path change. More importantly, the proposed SC-IPAPA-II algorithm achieves approximately 3 dB improvement in normalized misalignment while SC-IPAPA-I achieves approximately 1.5 dB improvement, compared to IPAPA.

Figure 2.11: Convergence of IPAPA, SC-IPAPA-I and SC-IPAPA-II using WGN input with impulse responses generated using the method of images. (Vertical axis in dB; horizontal axis: number of iterations.)

2.7 Conclusion

In this chapter, two sparseness-controlled affine projection algorithms are proposed for AEC. The sparseness measure of the estimated AIRs is utilized in the proposed SC-IPAPA-I and SC-IPAPA-II. More specifically, in the proposed SC-IPAPA-I, additional weighting terms computed from the sparseness measure of the estimated impulse response are multiplied with the proportionate and non-proportionate terms

Figure 2.12: Convergence of IPAPA, SC-IPAPA-I and SC-IPAPA-II using speech input with impulse responses generated using the method of images. (Top panel: speech input; horizontal axis: iterations.)

in the conventional IPAPA. The proposed SC-IPAPA-II adopts this sparseness measure as a time-varying control parameter $\alpha(n)$ which assigns weight proportionately to the coefficients in the estimated impulse response. This sparseness-dependent weighting mechanism overcomes one of the main weaknesses of IPAPA since determination of $\alpha$ is no longer required for the proposed SC-IPAPA-II. Simulation results using impulse responses generated by the exponentially decaying model described in [35] and the method of images [51] have shown that the proposed SC-IPAPA-I and SC-IPAPA-II can achieve a higher convergence rate than that of IPAPA.

Chapter 3
An Improved Multichannel Least-Mean-Square Algorithm for Blind Channel Identification

3.1 Introduction

Channel identification is the technique of building a mathematical model of an unknown dynamic system by analyzing its input/output data [21]. This problem of fundamental interest arises in a variety of signal processing and communications applications [52] [53]. The self-recovering identification or blind channel identification problem was originally described by Sato [54]. Since then, this research problem has drawn widespread attention from many researchers who have developed algorithms such as those presented in [55] [56] [57] [58]. Although these algorithms provide reasonable channel estimates under certain conditions, they often require a relatively large number of data samples, which may limit their applications in an environment


where the impulse responses are highly time-varying [59]. To address this issue, many second-order statistics based algorithms have been proposed.

The least-squares approach [59] to blind channel identification introduces the concept of cross-relation, where the observed signal of one channel convolved with another channel's impulse response is equivalent to the convolution between the impulse response of the former channel and the observed signal of the latter channel. It has also been shown that for the algorithm to identify the channels uniquely and blindly, the polynomials of the channels must be co-prime (i.e., they do not share any common roots) and the auto-correlation matrix of the source signal must be full rank [59].

Based on the cross-relation concept, an adaptive multichannel least-mean-squares (MCLMS) algorithm has been developed in [20] [60]. In these two papers, a multichannel Newton (MCN) algorithm is also proposed. In order for the MCN algorithm to achieve a low steady-state misalignment, knowledge of the source signal auto-correlation and the expectation of the cost function is required. However, as reported in [60], such information is difficult to obtain in practice. In addition, the MCN algorithm is computationally more expensive than MCLMS since matrix inversion is required [61]. In this chapter, the MCLMS algorithm is described, which serves as the foundation for frequency-domain cross-relation based approaches in Chapter 4. As will be shown in Section 3.5, MCLMS suffers from performance degradation in a noisy environment.

The contribution of this chapter is the analysis of the cost function of MCLMS, which allows one to gain new insights as to why MCLMS is not robust to noise. Thereafter, a constrained cross-correlation cost function is proposed to overcome the performance degradation of MCLMS in a noisy environment. Monte Carlo

Figure 3.1: Illustration of the relationship between the input $s(n)$ and the observations $y_i(n)$ in a SIMO system. (Block diagram: input $s(n)$, channels, additive noise $v_i(n)$, observations $y_1(n), y_2(n), \ldots, y_M(n)$.)

simulation results provided in Section 3.5 will show that the proposed improved MCLMS (IMCLMS) algorithm is more robust to noise and can gain significant improvement in steady-state performance.

3.2 Problem formulation

A single-input multiple-output (SIMO) finite impulse response (FIR) system is considered as shown in Fig. 3.1. The observed signal $y_i(n)$ of each channel is a combination of the additive noise $v_i(n)$ and $x_i(n)$, which is a filtered version of the source signal $s(n)$, such that

$$y_i(n) = x_i(n) + v_i(n), \quad i = 1, \ldots, M, \tag{3.1}$$

$$x_i(n) = \mathbf{h}_i^T\mathbf{s}(n), \quad i = 1, \ldots, M, \tag{3.2}$$

where $\mathbf{h}_i = [h_{i,0}\ h_{i,1}\ \cdots\ h_{i,L-1}]^T$ is the $i$th channel impulse response, $\mathbf{s}(n) = [s(n)\ s(n-1)\ \cdots\ s(n-L+1)]^T$, $L$ is the length of the channel and $M$ is the total number of channels. The aim of blind channel identification is to estimate the


channels $\mathbf{h}_i$, $i = 1, \ldots, M$ without any prior knowledge of the source signal $s(n)$.

The MCLMS algorithm begins with the cross-relation between the sensor outputs by considering

$$y_i(n) * h_j = y_j(n) * h_i, \quad i, j = 1, \ldots, M, \quad i \neq j, \tag{3.3}$$

in the absence of noise, where $*$ denotes the linear convolution operator. Expressing (3.3) in vector form, the following can be obtained

$$\mathbf{y}_i^T(n)\mathbf{h}_j = \mathbf{y}_j^T(n)\mathbf{h}_i, \tag{3.4}$$

where $\mathbf{y}_i(n) = [y_i(n)\ y_i(n-1)\ \cdots\ y_i(n-L+1)]^T$.
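
The cross-relation can be verified numerically with a small sketch. The two channels, the source sequence and the noise level below are hypothetical, and `convolve` is a helper written for this illustration rather than a routine from the thesis. In the noise-free case the two convolutions agree to within rounding error, whereas adding independent noise to each observation breaks the equality.

```python
import random


def convolve(a, b):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


random.seed(0)

# Hypothetical two-channel SIMO system with L = 4.
h1 = [1.0, -0.5, 0.25, 0.1]
h2 = [0.8, 0.3, -0.2, 0.05]
s = [random.gauss(0.0, 1.0) for _ in range(50)]

# Noise-free observations: y_i = s * h_i.
y1 = convolve(s, h1)
y2 = convolve(s, h2)
diff_clean = max(abs(a - b)
                 for a, b in zip(convolve(y1, h2), convolve(y2, h1)))

# Noisy observations: independent noise added to each channel.
y1n = [y + random.gauss(0.0, 0.1) for y in y1]
y2n = [y + random.gauss(0.0, 0.1) for y in y2]
diff_noisy = max(abs(a - b)
                 for a, b in zip(convolve(y1n, h2), convolve(y2n, h1)))

print("max cross-relation error, noise-free:", diff_clean)
print("max cross-relation error, noisy     :", diff_noisy)
```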

However, in the presence of noise, the above cross-relation in (3.4) no longer holds and an error function can be defined as

$$e_{ij}(n) = \begin{cases} \mathbf{y}_i^T(n)\mathbf{h}_j - \mathbf{y}_j^T(n)\mathbf{h}_i, & i \neq j, \quad i, j = 1, \ldots, M, \\ 0, & i = j, \quad i, j = 1, \ldots, M. \end{cases} \tag{3.5}$$

As a result, a cross-relation based cost function is

$$\chi(n) = \sum_{i=1}^{M-1}\sum_{j=i+1}^{M} e_{ij}^2(n), \tag{3.6}$$

where $e_{ij}(n)$ is now re-defined, similar to (3.5) but using the estimated impulse response $\hat{\mathbf{h}}_i(n)$, as

$$e_{ij}(n) = \mathbf{y}_i^T(n)\hat{\mathbf{h}}_j(n) - \mathbf{y}_j^T(n)\hat{\mathbf{h}}_i(n). \tag{3.7}$$

The channel impulse responses can be estimated by minimizing (3.6). In order to avoid a trivial estimate with all zero elements, a unit-norm constraint is imposed on


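the $ML \times 1$ concatenated channel response vector $\hat{\mathbf{h}}(n) = [\hat{\mathbf{h}}_1^T(n)\ \hat{\mathbf{h}}_2^T(n)\ \cdots\ \hat{\mathbf{h}}_M^T(n)]^T$ at each iteration [20] such that the error signal becomes

$$\epsilon_{ij}(n) = \frac{e_{ij}(n)}{\|\hat{\mathbf{h}}(n)\|_2}. \tag{3.8}$$

The corresponding cross-relation based cost function becomes

$$J(n) = \sum_{i=1}^{M-1}\sum_{j=i+1}^{M}\epsilon_{ij}^2(n) = \frac{\chi(n)}{\|\hat{\mathbf{h}}(n)\|_2^2}. \tag{3.9}$$

The result of minimizing (3.9) is the MCLMS algorithm given by [20]

$$\mathbf{R}_{y_iy_j}(n) = \mathbf{y}_i(n)\mathbf{y}_j^T(n), \tag{3.10}$$

$$\mathbf{R}(n) = \begin{bmatrix} \sum_{i\neq 1}\mathbf{R}_{y_iy_i}(n) & -\mathbf{R}_{y_2y_1}(n) & \cdots & -\mathbf{R}_{y_My_1}(n) \\ -\mathbf{R}_{y_1y_2}(n) & \sum_{i\neq 2}\mathbf{R}_{y_iy_i}(n) & \cdots & -\mathbf{R}_{y_My_2}(n) \\ \vdots & \vdots & \ddots & \vdots \\ -\mathbf{R}_{y_1y_M}(n) & -\mathbf{R}_{y_2y_M}(n) & \cdots & \sum_{i\neq M}\mathbf{R}_{y_iy_i}(n) \end{bmatrix}_{ML\times ML}, \tag{3.11}$$

$$\hat{\mathbf{h}}(n+1) = \frac{\hat{\mathbf{h}}(n) - 2\mu\left[\mathbf{R}(n)\hat{\mathbf{h}}(n) - \chi(n)\hat{\mathbf{h}}(n)\right]}{\left\|\hat{\mathbf{h}}(n) - 2\mu\left[\mathbf{R}(n)\hat{\mathbf{h}}(n) - \chi(n)\hat{\mathbf{h}}(n)\right]\right\|_2}, \tag{3.12}$$

where $\mu$ is the step-size.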
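
A single MCLMS iteration can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the thesis: it relies on the observation that the $k$th block of $\mathbf{R}(n)\hat{\mathbf{h}}(n)$ in (3.11) equals $\sum_{i\neq k} e_{ik}(n)\mathbf{y}_i(n)$, so the $ML \times ML$ matrix never needs to be formed. The channels, source sequence, step-size and the helper names `mclms_step`, `frame`, `convolve` and `dot` are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def convolve(a, b):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def frame(y, n, L):
    """Observation vector [y(n), y(n-1), ..., y(n-L+1)]."""
    return [y[n - l] for l in range(L)]


def mclms_step(h_est, y_vecs, mu):
    """One update in the spirit of (3.12); returns (new estimate, chi)."""
    M, L = len(h_est), len(h_est[0])
    # Cross-relation errors e_ij(n) = y_i^T h_j - y_j^T h_i, as in (3.7).
    e = [[dot(y_vecs[i], h_est[j]) - dot(y_vecs[j], h_est[i])
          for j in range(M)] for i in range(M)]
    # chi(n): sum of squared errors over i < j, as in (3.6).
    chi = sum(e[i][j] ** 2 for i in range(M - 1) for j in range(i + 1, M))
    updated = []
    for k in range(M):
        # k-th block of R(n) h(n): sum over i != k of e_ik(n) * y_i(n).
        grad_k = [sum(e[i][k] * y_vecs[i][l] for i in range(M) if i != k)
                  for l in range(L)]
        updated.append([h_est[k][l] - 2.0 * mu * (grad_k[l] - chi * h_est[k][l])
                        for l in range(L)])
    # Unit-norm normalization of the concatenated vector.
    norm = math.sqrt(sum(x * x for hk in updated for x in hk))
    return [[x / norm for x in hk] for hk in updated], chi


random.seed(1)
h_true = [[1.0, -0.4, 0.2], [0.6, 0.5, -0.3]]   # hypothetical channels
scale = math.sqrt(sum(x * x for hk in h_true for x in hk))
h_unit = [[x / scale for x in hk] for hk in h_true]

s = [random.gauss(0.0, 1.0) for _ in range(40)]
ys = [convolve(s, hk) for hk in h_true]          # noise-free observations
y_vecs = [frame(y, 10, 3) for y in ys]

# Starting from the (scaled) true channels, the update should not move.
h_next, chi = mclms_step(h_unit, y_vecs, mu=0.01)
drift = max(abs(a - b) for hk, hn in zip(h_unit, h_next)
            for a, b in zip(hk, hn))
print("chi at the true solution:", chi)
print("largest coefficient change:", drift)
```

In the noise-free case the scaled true channels form a fixed point of the update, since every $e_{ij}(n)$ and hence $\chi(n)$ vanish there. This is the property that the presence of noise disturbs, as analyzed in the following section.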

3.3 Analysis of MCLMS in the presence of noise

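In this section, the behavior of MCLMS is analyzed when $v_i(n) \neq 0$, $i = 1, \ldots, M$. To do so, the error function (3.7) is first expanded as

$$e_{ij}(n) = [\mathbf{x}_i(n) + \mathbf{v}_i(n)]^T\hat{\mathbf{h}}_j(n) - [\mathbf{x}_j(n) + \mathbf{v}_j(n)]^T\hat{\mathbf{h}}_i(n) = e_{ij}^x(n) + e_{ij}^v(n), \tag{3.13}$$

where $\mathbf{x}_i(n) = [x_i(n)\ x_i(n-1)\ \cdots\ x_i(n-L+1)]^T$, $\mathbf{v}_i(n) = [v_i(n)\ v_i(n-1)\ \cdots\ v_i(n-L+1)]^T$,

$$e_{ij}^x(n) = \mathbf{x}_i^T(n)\hat{\mathbf{h}}_j(n) - \mathbf{x}_j^T(n)\hat{\mathbf{h}}_i(n) \tag{3.14}$$

is the cross-relation error due to the source signal while

$$e_{ij}^v(n) = \mathbf{v}_i^T(n)\hat{\mathbf{h}}_j(n) - \mathbf{v}_j^T(n)\hat{\mathbf{h}}_i(n) \tag{3.15}$$

is the cross-relation error due to noise. Defining

$$\mathbf{e}^x(n) = [e_{12}^x(n)\ e_{13}^x(n)\ \cdots\ e_{(M-1)M}^x(n)]^T, \tag{3.16}$$

$$\mathbf{e}^v(n) = [e_{12}^v(n)\ e_{13}^v(n)\ \cdots\ e_{(M-1)M}^v(n)]^T, \tag{3.17}$$

$$\mathbf{e}(n) = \mathbf{e}^x(n) + \mathbf{e}^v(n) = [e_{12}(n)\ e_{13}(n)\ \cdots\ e_{(M-1)M}(n)]^T, \tag{3.18}$$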

and employing $E\{[\mathbf{e}^x(n)]^T\mathbf{e}^v(n)\} = 0$, the expectation of the cost function in (3.6) can be expressed as

$$E\{\chi(n)\} = E\{\mathbf{e}^T(n)\mathbf{e}(n)\} = E\{\chi_x(n)\} + E\{\chi_v(n)\}, \tag{3.19}$$

where $\chi_x(n) = \sum_{i=1}^{M-1}\sum_{j=i+1}^{M}\left(e_{ij}^x(n)\right)^2$ and $\chi_v(n) = \sum_{i=1}^{M-1}\sum_{j=i+1}^{M}\left(e_{ij}^v(n)\right)^2$. It can therefore be observed that minimizing (3.19) is equivalent to $E\{(e_{ij}^x(n))^2\} \to 0$ and $E\{(e_{ij}^v(n))^2\} \to 0$, which can be subsequently written as

(3.20)

(3.21)

where $\mathbf{R}_{x_ix_i} = E\{\mathbf{x}_i(n)\mathbf{x}_i^T(n)\}$ and $\mathbf{R}_{v_iv_i} = E\{\mathbf{v}_i(n)\mathbf{v}_i^T(n)\}$. Since the noise signals across all the channels are assumed to be independent and identically distributed, i.e., $\mathbf{R}_{v_iv_i} = \mathbf{R}_{v_jv_j}$, $i, j = 1, \ldots, M$, $i \neq j$, it can be observed that (3.20) and (3.21) contradict each other since

$$\hat{\mathbf{h}}_i(n) = \mathbf{h}_i, \tag{3.22}$$

$$\hat{\mathbf{h}}_i(n) = \hat{\mathbf{h}}_j(n), \quad i \neq j, \tag{3.23}$$

cannot be achieved simultaneously. Therefore, the presence of noise introduces disturbance to the MCLMS algorithm.

3.4 Development of the IMCLMS algorithm

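In order to overcome the problem due to the presence of noise as described in (3.22) and (3.23), it is proposed to estimate the channels by solving a constrained optimization problem given by

(3.24)

where $\hat{\mathbf{h}}(n) = [\hat{\mathbf{h}}_1^T(n)\ \hat{\mathbf{h}}_2^T(n)\ \cdots\ \hat{\mathbf{h}}_M^T(n)]^T$. In order to understand why the above constrained optimization problem can address the problem of noise robustness, (3.24) can be expressed as follows

$$\begin{aligned}
&\min_{\hat{\mathbf{h}}(n)} E\left\{\sum_{i=1}^{M-1}\sum_{j=i+1}^{M}\hat{\mathbf{h}}_i^T(n)\mathbf{y}_i(n)\mathbf{y}_i^T(n)\hat{\mathbf{h}}_i(n)\right\} \\
&= \min_{\hat{\mathbf{h}}(n)} \sum_{i=1}^{M-1}\sum_{j=i+1}^{M} E\left\{\hat{\mathbf{h}}_i^T(n)\mathbf{y}_i(n)\mathbf{y}_i^T(n)\hat{\mathbf{h}}_i(n)\right\} \\
&= \min_{\hat{\mathbf{h}}(n)} \sum_{i=1}^{M-1}\sum_{j=i+1}^{M} \hat{\mathbf{h}}_i^T(n)\mathbf{R}_{y_iy_i}\hat{\mathbf{h}}_i(n) \\
&= \min_{\hat{\mathbf{h}}(n)} \sum_{i=1}^{M-1}\sum_{j=i+1}^{M} \hat{\mathbf{h}}_i^T(n)\left[\mathbf{R}_{x_ix_i} + \mathbf{R}_{v_iv_i}\right]\hat{\mathbf{h}}_i(n)
\end{aligned} \tag{3.25}$$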


subject to the constraint

E{ ~ j~l [yT(n)hj(n) - yJ(n)hi(n)f}


~ j~l E{ [yf(n)hj(n) - yJ(n)hi(n)f}
M-l

"c: "c: ["-"'


. -. .
"-"'T (n)Rxjxjhi(n)
. -. .
h jT (n)RXixihj(n)
- hi
i=l j=i+l

(3.26)

Minimizing (3.25) and (3.26) simultaneously is equivalent to the following

(3.27)

Similar to the derivation of (3.20) and (3.21), the desired solutions of minimizing (3.27) are $\hat{\mathbf{h}}_i(n) = \mathbf{h}_i$ and $\hat{\mathbf{h}}_i(n) = \mathbf{0}_{L\times 1}$, which arise from

(3.28)

and

(3.29)

respectively. It is important to note that the trivial solution $\hat{\mathbf{h}}_i(n) = \mathbf{0}_{L\times 1}$ can be avoided by the unit-norm constraint as described in (3.8). It can now be seen that the disturbance effect of the noise, i.e., $\hat{\mathbf{h}}_i(n) = \hat{\mathbf{h}}_j(n)$ as described in (3.23), is eliminated. It can therefore be expected that the proposed IMCLMS algorithm based on (3.24) will enhance the performance of MCLMS.

Using the Lagrangian multiplier $0 < \beta < 1$, $\mathbf{R}_{y_iy_i} = \mathbf{R}_{x_ix_i} + \mathbf{R}_{v_iv_i}$, and assuming $\mathbf{R}_{v_iv_i} = \mathbf{R}_{v_jv_j}$, a constrained cross-relation can be defined as

(3.30)

Similar to MCLMS as shown in (3.9), a modified cost function exploiting the unit-norm constraint can subsequently be obtained as

(3.31)

It is important to note that the unit-norm normalization is introduced in (3.31) to avoid the null estimate rather than to address the noise robustness issue. To obtain the gradient of (3.31), the derivative of $J_p(n)$ with respect to $\hat{\mathbf{h}}(n)$ is taken giving

\1 J (n)
p

where

a~(n)
Bh(n)

~1

IIh(n)1I2

[a~(n)
Bh(n)

- 2J (n)h(n)]
p

(3.3~
,


Furthermore, the following is evaluated

(3.33)

where

(3.34)

As a result of (3.31)-(3.34), the gradient of Jp(n) can be written as

(3.35)

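where $\chi(n)$ is defined in (3.6),

$$\mathbf{R}_a = \begin{bmatrix} \mathbf{R}_{y_1y_1} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{R}_{y_2y_2} & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{R}_{y_My_M} \end{bmatrix}_{ML\times ML}, \tag{3.36}$$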

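$$\mathbf{R}_b = \begin{bmatrix} \sum_{i\neq 1}\mathbf{R}_{y_iy_i} & -\mathbf{R}_{y_2y_1} & \cdots & -\mathbf{R}_{y_My_1} \\ -\mathbf{R}_{y_1y_2} & \sum_{i\neq 2}\mathbf{R}_{y_iy_i} & \cdots & -\mathbf{R}_{y_My_2} \\ \vdots & \vdots & \ddots & \vdots \\ -\mathbf{R}_{y_1y_M} & -\mathbf{R}_{y_2y_M} & \cdots & \sum_{i\neq M}\mathbf{R}_{y_iy_i} \end{bmatrix}_{ML\times ML}. \tag{3.37}$$

The update equation of the proposed IMCLMS algorithm can be summarized as

$$\hat{\mathbf{h}}(n+1) = \frac{\hat{\mathbf{h}}(n) - \mu\left[\mathbf{R}_b - \chi(n)\mathbf{I}_{ML\times ML} - \beta'\mathbf{R}_a\right]\hat{\mathbf{h}}(n)}{\left\|\hat{\mathbf{h}}(n) - \mu\left[\mathbf{R}_b - \chi(n)\mathbf{I}_{ML\times ML} - \beta'\mathbf{R}_a\right]\hat{\mathbf{h}}(n)\right\|_2}, \tag{3.38}$$

where $\beta' = 2M\beta$. In practice, the matrix $\mathbf{R}_{y_iy_j}$ can be estimated using

$$n \geq 1, \tag{3.39}$$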

where 0 < e < 1 is an exponential forgetting factor.

3.4.1 Convergence behaviour

It has been stated in [20] that the MCLMS algorithm converges in the mean if

$$0 < \mu < \frac{1}{\lambda_{\max}},$$

where $\lambda_{\max}$ is the largest eigenvalue of $E\{\mathbf{R}(n) - \chi(n)\mathbf{I}_{ML\times ML}\}$ and $\mathbf{R}(n)$ is defined in (3.11). This can be understood through the following convergence analysis. Define $\mathbf{z}(n+1) = \hat{\mathbf{h}}(n+1) - \hat{\mathbf{h}}(\infty)$, where $\hat{\mathbf{h}}(\infty)$ is the estimate of $\mathbf{h}$ at the steady-state when $n \to \infty$. Since the unit-norm normalization is applied, $\|\hat{\mathbf{h}}(n)\|_2 = 1$ and therefore

$$\mathbf{z}(n+1) = \left(\mathbf{I}_{ML\times ML} - 2\mu\left[\mathbf{R}(n) - \chi(n)\mathbf{I}_{ML\times ML}\right]\right)\mathbf{z}(n) + 2\mu\left[\mathbf{R}(n) - \chi(n)\mathbf{I}_{ML\times ML}\right]\hat{\mathbf{h}}(\infty). \tag{3.40}$$

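Taking expectation on both sides of (3.40),

$$E\{\mathbf{z}(n+1)\} = E\left\{\left(\mathbf{I}_{ML\times ML} - 2\mu\left[\mathbf{R}(n) - \chi(n)\mathbf{I}_{ML\times ML}\right]\right)\right\}E\{\mathbf{z}(n)\} + 2\mu E\left\{\left[\mathbf{R}(n) - \chi(n)\mathbf{I}_{ML\times ML}\right]\right\}E\{\hat{\mathbf{h}}(\infty)\} \tag{3.41}$$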

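is obtained. It has been stated in [20] [60] that $\hat{\mathbf{h}}(n)$ converges in the mean to the eigenvectors of $\mathbf{R} = E\{\mathbf{R}(n)\}$, which implies that $E\{[\mathbf{R}(n) - \chi(n)\mathbf{I}_{ML\times ML}]\} = \mathbf{0}_{ML\times ML}$. As a result, (3.41) can be simplified to

$$E\{\mathbf{z}(n+1)\} = E\left\{\left(\mathbf{I}_{ML\times ML} - 2\mu\left[\mathbf{R}(n) - \chi(n)\mathbf{I}_{ML\times ML}\right]\right)\right\}E\{\mathbf{z}(n)\}. \tag{3.42}$$

To achieve such a steady-state, it is required that $\lim_{n\to\infty}\left(\hat{\mathbf{h}}(n) - \hat{\mathbf{h}}(\infty)\right) = \mathbf{0}_{ML\times 1}$. The eigenvalue decomposition is then performed on $E\{\mathbf{R}(n) - \chi(n)\mathbf{I}_{ML\times ML}\}$, i.e., $E\{\mathbf{R}(n) - \chi(n)\mathbf{I}_{ML\times ML}\} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^T$, where $\mathbf{Q}$ is a square matrix whose columns are formed by the eigenvectors of $E\{\mathbf{R}(n) - \chi(n)\mathbf{I}_{ML\times ML}\}$ and $\boldsymbol{\Lambda}$ is a diagonal matrix whose diagonal elements are the corresponding eigenvalues. Subsequently, by defining $\mathbf{w}(n) = \mathbf{Q}^T\mathbf{z}(n)\mathbf{Q}$ and pre-multiplying $\mathbf{Q}^T$ and post-multiplying $\mathbf{Q}$ to both sides of (3.42), the following is obtained

$$\mathbf{w}(n+1) = \left(\mathbf{I}_{ML\times ML} - 2\mu\boldsymbol{\Lambda}\right)\mathbf{w}(n) = \left(\mathbf{I}_{ML\times ML} - 2\mu\boldsymbol{\Lambda}\right)^n\mathbf{w}(1). \tag{3.43}$$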

To achieve its steady-state, it is required that $\lim_{n\to\infty}\left(\hat{\mathbf{h}}(n) - \hat{\mathbf{h}}(\infty)\right) = \mathbf{0}_{ML\times 1}$, i.e., $\mathbf{w}(n+1) = \mathbf{0}_{ML\times 1}$. This requires that the step-size $\mu$ of (3.43) satisfies

(3.44)

where $\lambda_i$ is the $i$th diagonal element of $\boldsymbol{\Lambda}$. Since the diagonal elements of $\boldsymbol{\Lambda}$ are all positive, (3.44) is valid when

(3.45)

where $\lambda_{\max}$ is the maximum diagonal element of $\boldsymbol{\Lambda}$. It can be deduced from (3.45) that

$$0 < \mu < \frac{1}{\lambda_{\max}}.$$

Following the same concept, in order for IMCLMS to converge in the mean, the step-size $\mu$ should satisfy $0 < \mu < 1/\lambda'_{\max}$, where $\lambda'_{\max}$ is the largest eigenvalue of $[\mathbf{R}_b - \chi(n)\mathbf{I}_{ML\times ML} - \beta'\mathbf{R}_a]$.

3.4.2 Misalignment behaviour of MCLMS and IMCLMS

An analysis to explain why IMCLMS can achieve a lower steady-state misalignment than MCLMS in the presence of noise is next provided under the same assumptions. For mathematical tractability, the following assumptions are made in the analysis:

1. The channels $\mathbf{h}_i$, $i = 1, \ldots, M$ are independent processes and the polynomials formed from $\mathbf{h}_i$, $i = 1, \ldots, M$ are co-prime, i.e., the channel transfer functions do not share any common zeros [62] [63]. This assumption is consistent with the channel identifiability conditions.

2. $E\{\mathbf{h}\mathbf{h}^T\} = \mathbf{I}_{ML\times ML}$ and $E\{\mathbf{h}\} = \mathbf{0}_{ML\times 1}$.

3. The step-size $\mu$ is sufficiently small [42].

4. The input $s(n)$ is a sequence of i.i.d. Gaussian random variables of zero mean and variance $\sigma_s^2$ [42].

Define the normalized projection misalignment (NPM) [64]

$$\psi(n) = \frac{\|\mathbf{h} - \alpha(n)\hat{\mathbf{h}}(n)\|_2^2}{\|\mathbf{h}\|_2^2}, \tag{3.46}$$

where $\alpha(n) = \dfrac{\mathbf{h}^T\hat{\mathbf{h}}(n)}{\hat{\mathbf{h}}^T(n)\hat{\mathbf{h}}(n)}$ and $\|\cdot\|_2$ denotes the $l_2$-norm. It has been discussed in [64] that the NPM is preferred over normalized misalignment for blind channel identification applications since it quantifies the closeness of the estimate to the original channels up to a scaling factor. Defining

$$\mathbf{u}(n) = \mathbf{h} - \alpha(n)\hat{\mathbf{h}}(n), \tag{3.47}$$

the $l_2$-norm of the projection misalignment can be expressed as

$$\psi'(n) = \|\mathbf{h} - \alpha(n)\hat{\mathbf{h}}(n)\|_2^2 = \mathbf{u}^T(n)\mathbf{u}(n) = \mathrm{tr}\{\mathbf{u}(n)\mathbf{u}^T(n)\}, \tag{3.48}$$

where $\mathrm{tr}\{\cdot\}$ denotes the trace of a matrix. Assume that, during the steady-state, the scaling parameter $\alpha(n)$ is stable such that $\alpha(n+1) = \alpha(n)$. For clarity of presentation, the time dependency $n$ of $\alpha(n)$ will be omitted in the following analysis.
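
The NPM in (3.46) can be computed as in the following minimal sketch, which also illustrates its invariance to the scale and sign of the estimate. The channel values and the helper names `npm` and `dot` are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def npm(h_true, h_est):
    """Normalized projection misalignment, following (3.46)."""
    alpha = dot(h_true, h_est) / dot(h_est, h_est)   # scaling factor
    err = sum((a - alpha * b) ** 2 for a, b in zip(h_true, h_est))
    return err / dot(h_true, h_true)


# Hypothetical concatenated channel vector.
h = [1.0, -0.4, 0.2, 0.6, 0.5, -0.3]

scaled = npm(h, [3.0 * x for x in h])      # scaled copy of h
flipped = npm(h, [-2.0 * x for x in h])    # scaled and sign-flipped copy
orthogonal = npm([1.0, 0.0], [0.0, 1.0])   # estimate orthogonal to h

print("NPM, scaled estimate     :", scaled)
print("NPM, sign-flipped        :", flipped)
print("NPM, orthogonal estimate :", orthogonal)
```

An estimate that differs from the true channels only by a scalar factor yields an NPM of zero, whereas an estimate orthogonal to the true channels yields an NPM of one, which is why the measure suits blind identification where the scale cannot be recovered.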
Substituting (3.12) into (3.47), the projection misalignment of MCLMS is obtained as

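$$\begin{aligned}
\mathbf{u}(n+1) &= \mathbf{h} - \alpha\hat{\mathbf{h}}(n+1) \\
&= \mathbf{h} - \alpha\hat{\mathbf{h}}(n) + 2\mu\alpha\mathbf{R}(n)\hat{\mathbf{h}}(n) - 2\mu\alpha\chi(n)\hat{\mathbf{h}}(n) \\
&= \left[\mathbf{I}_{ML\times ML} - 2\mu\alpha\mathbf{R}(n)\right]\mathbf{u}(n) + 2\mu\alpha\mathbf{R}(n)\mathbf{h} - 2\mu\alpha\chi(n)\hat{\mathbf{h}}(n).
\end{aligned} \tag{3.49}$$

It is important to note that $\chi(n) \to 0$ as $n \to \infty$ when MCLMS converges. Therefore, (3.49) can be simplified as

$$\mathbf{u}(n+1) = \left[\mathbf{I}_{ML\times ML} - 2\mu\alpha\mathbf{R}(n)\right]\mathbf{u}(n) + 2\mu\alpha\mathbf{R}(n)\mathbf{h}. \tag{3.50}$$

Since a small step-size is adopted similar to [42], $\mu\chi(n)$ can be regarded as sufficiently small so that the last term on the right-hand side of (3.49) can be ignored. For clarity of presentation, the subscript of $\mathbf{I}$ in (3.51) will be temporarily omitted. Exploiting the independence between $\mathbf{R}(n)$ and $\mathbf{u}(n)$, the following can be obtained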

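$$\begin{aligned}
\mathbf{R}_{u,n+1} &= E\{\mathbf{u}(n+1)\mathbf{u}^T(n+1)\} \\
&= \left[\mathbf{I} - 2\mu\alpha E\{\mathbf{R}(n)\}\right]\mathbf{R}_{u,n}\left[\mathbf{I} - 2\mu\alpha E\{\mathbf{R}^T(n)\}\right] \\
&\quad + 2\mu E\left\{\left[\mathbf{I} - 2\mu\alpha\mathbf{R}(n)\right]\mathbf{u}(n)\alpha\mathbf{h}^T\mathbf{R}^T(n)\right\} + 2\mu E\left\{\alpha\mathbf{R}(n)\mathbf{h}\mathbf{u}^T(n)\left[\mathbf{I} - 2\mu\alpha\mathbf{R}^T(n)\right]\right\} \\
&\quad + 4\mu^2 E\left\{\alpha^2\mathbf{R}(n)\mathbf{h}\mathbf{h}^T\mathbf{R}^T(n)\right\} \\
&= \left[\mathbf{I} - 2\mu\alpha E\{\mathbf{R}(n)\}\right]\mathbf{R}_{u,n}\left[\mathbf{I} - 2\mu\alpha E\{\mathbf{R}^T(n)\}\right] \\
&\quad + 2\mu E\left\{\left[\mathbf{I} - 2\mu\alpha\mathbf{R}(n)\right]\mathbf{u}(n)\alpha\mathbf{h}^T\mathbf{R}^T(n) + \alpha\mathbf{R}(n)\mathbf{h}\mathbf{u}^T(n)\left[\mathbf{I} - 2\mu\alpha\mathbf{R}^T(n)\right]\right\} \\
&\quad + 4\mu^2 E\left\{\alpha^2\mathbf{R}(n)\mathbf{h}\mathbf{h}^T\mathbf{R}^T(n)\right\}.
\end{aligned} \tag{3.51}$$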

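In this analysis, and for mathematical tractability, it is assumed that $E\{\mathbf{h}\mathbf{h}^T\} = \mathbf{I}_{ML\times ML}$ and $E\{\mathbf{h}\} = \mathbf{0}_{ML\times 1}$. As a result, (3.51) can be written as

$$\begin{aligned}
\mathbf{R}_{u,n+1} &= \mathbf{R}_{u,n} - 2\mu\alpha E\{\mathbf{R}(n)\}\mathbf{R}_{u,n} - 2\mu\alpha\mathbf{R}_{u,n}E\{\mathbf{R}^T(n)\} \\
&\quad + 4\mu^2\alpha^2 E\{\mathbf{R}(n)\}\mathbf{R}_{u,n}E\{\mathbf{R}^T(n)\} + 4\mu^2\alpha^2 E\{\mathbf{R}(n)\mathbf{R}^T(n)\}.
\end{aligned} \tag{3.52}$$

Since $\mathbf{R}_{u,n+1} = \mathbf{R}_{u,n}$ when $n \to \infty$, it can be derived from (3.52) that

$$E\{\mathbf{R}(n)\}\mathbf{R}_{u,n} + \mathbf{R}_{u,n}E\{\mathbf{R}^T(n)\} = 2\mu\alpha E\{\mathbf{R}(n)\}\mathbf{R}_{u,n}E\{\mathbf{R}^T(n)\} + 2\mu\alpha E\{\mathbf{R}(n)\mathbf{R}^T(n)\}. \tag{3.53}$$

For clarity, define $\bar{\mathbf{R}} = E\{\mathbf{R}(n)\}$ and $\tilde{\mathbf{R}} = E\{\mathbf{R}(n)\mathbf{R}^T(n)\}$.

Determination of $\bar{\mathbf{R}}$ and $\tilde{\mathbf{R}}$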
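Next, assume a white Gaussian input with zero mean and variance $\sigma_s^2$, i.e., $E\{s(n)\} = 0$, $E\{s(n)s(n)\} = \sigma_s^2$ and $E\{s(n-l)s(n-\tau)\} = 0$ where $l \neq \tau$. Denoting $y_i(n-l)$ as the $l$th observed signal at the $n$th frame in channel $i$, where $0 \le l \le L-1$, it can be deduced that

$$E\{\mathbf{R}_{y_iy_i}(n)\} = E\{\mathbf{y}_i(n)\mathbf{y}_i^T(n)\} = E\left\{\begin{bmatrix} y_i(n)y_i(n) & \cdots & y_i(n)y_i(n-L+1) \\ \vdots & \ddots & \vdots \\ y_i(n-L+1)y_i(n) & \cdots & y_i(n-L+1)y_i(n-L+1) \end{bmatrix}_{L\times L}\right\}, \tag{3.54}$$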

ATTENTION: The Singapore Copyright Act applies to the use of this document. Nanyang Technological University Library

3.4 Development of the IMCLMS algorithm

56

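where each diagonal element of $E\{\mathbf{R}_{y_iy_i}(n)\}$ is given by

$$
E\{y_i(n-l)y_i(n-l)\} = E\{x_i^2(n-l)\} + \sigma_v^2.
\tag{3.55}
$$

It is noted that

$$
E\{x_i^2(n)\} = E\left\{[\mathbf{h}_i^T\mathbf{s}(n)]^2\right\} = E\left\{[h_{i,0}s(n) + h_{i,1}s(n-1) + \cdots + h_{i,L-1}s(n-L+1)]^2\right\}
\tag{3.56}
$$

in which the following assumption is utilized

$$
E\{h_{i,l}h_{i,k}\} = \begin{cases} 1, & l = k; \\ 0, & l \neq k. \end{cases}
\tag{3.57}
$$

Substituting (3.57) into (3.55), $E\{y_i(n-l)y_i(n-l)\} = L\sigma_s^2 + \sigma_v^2$ is obtained. Following the same concept from (3.55) to (3.57) and noting that $s(n)$ is uncorrelated, $E\{y_i(n-l)y_i(n-\tau)\} = 0$ is obtained when $l \neq \tau$. Therefore,

$$
E\{\mathbf{R}_{y_iy_i}(n)\} = \begin{bmatrix}
L\sigma_s^2 + \sigma_v^2 & & \mathbf{0} \\
& \ddots & \\
\mathbf{0} & & L\sigma_s^2 + \sigma_v^2
\end{bmatrix}_{L\times L}
= (L\sigma_s^2 + \sigma_v^2)\mathbf{I}_{L\times L}.
\tag{3.58}
$$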
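The diagonal value in (3.58) can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the original analysis: it assumes Gaussian channel taps with unit variance (one choice that satisfies (3.57)), a white Gaussian source and white Gaussian noise. The function name and parameter values are hypothetical.

```python
import random


def mean_square_observation(L, sigma_s2, sigma_v2, trials, seed=0):
    """Monte Carlo estimate of E{y_i^2(n)} for y_i(n) = h_i^T s(n) + v_i(n)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Assumption: taps are i.i.d. unit-variance Gaussian, so that
        # E{h_l h_k} = 1 for l == k and 0 otherwise, as required by (3.57).
        h = [rng.gauss(0.0, 1.0) for _ in range(L)]
        # White Gaussian source samples with variance sigma_s2.
        s = [rng.gauss(0.0, sigma_s2 ** 0.5) for _ in range(L)]
        # Additive white Gaussian noise sample with variance sigma_v2.
        v = rng.gauss(0.0, sigma_v2 ** 0.5)
        y = sum(hl * sl for hl, sl in zip(h, s)) + v
        total += y * y
    return total / trials


# Illustrative values only.
L, sigma_s2, sigma_v2 = 4, 1.0, 0.5
estimate = mean_square_observation(L, sigma_s2, sigma_v2, trials=50000)
predicted = L * sigma_s2 + sigma_v2  # diagonal entry of (3.58)
print(estimate, predicted)
```

With enough trials the sample average should settle close to $L\sigma_s^2 + \sigma_v^2$, which is the value that populates the diagonal of (3.58).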


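Following the same approach of (3.54)-(3.58), $E\{\mathbf{R}_{y_iy_j}(n)\} = \mathbf{0}_{L\times L}$ when $i \neq j$. As a result of the above, $\bar{\mathbf{R}} = E\{\mathbf{R}(n)\}$ can be expressed as

$$
\bar{\mathbf{R}} = \begin{bmatrix}
(M-1)(L\sigma_s^2 + \sigma_v^2)\mathbf{I}_{L\times L} & & \mathbf{0}_{L\times L} \\
& \ddots & \\
\mathbf{0}_{L\times L} & & (M-1)(L\sigma_s^2 + \sigma_v^2)\mathbf{I}_{L\times L}
\end{bmatrix}_{ML\times ML}
= (M-1)(L\sigma_s^2 + \sigma_v^2)\mathbf{I}_{ML\times ML}.
\tag{3.59}
$$

In addition, it is observed that

$$
E\{y_i^4(n-l)\} = E\left\{[x_i(n-l) + v_i(n-l)]^4\right\} = E\{x_i^4(n-l)\} + 6E\{x_i^2(n-l)\}E\{v_i^2(n-l)\} + E\{v_i^4(n-l)\},
\tag{3.60}
$$

where $E\{x_i^2(n-l)\}E\{v_i^2(n-l)\} = \sigma_s^2\sigma_v^2$ and $E\{v_i^4(n-l)\} = \sigma_v^4$. For an independent and identically distributed Gaussian signal $x(n)$ with $E\{x(n)\} = 0$ and $E\{x(n)x(n)\} = \sigma_x^2$, it has been shown in [65] that $E\{\mathbf{x}(n)\mathbf{x}^T(n)\mathbf{x}(n)\mathbf{x}^T(n)\} = (L+2)\sigma_x^4\mathbf{I}_{L\times L}$ where $\mathbf{x}(n) = [x(n)\ x(n-1)\ \cdots\ x(n-L+1)]^T$. Therefore, the result of (3.60) can be derived as $E\{y_i^4(n-l)\} = (L+2)\sigma_s^4 + 6\sigma_s^2\sigma_v^2 + \sigma_v^4$. Following similar approach from (3.54) to (3.59),

$$
\tilde{\mathbf{R}} = E\{\mathbf{R}(n)\mathbf{R}^T(n)\} = \left[(M-1)\left((L+2)\sigma_s^4 + 6\sigma_s^2\sigma_v^2 + \sigma_v^4\right)\right]\mathbf{I}_{ML\times ML}
\tag{3.61}
$$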


is obtained. In order to obtain the projected misalignment of the MCLMS algorithm, the trace of (3.53) is taken which results in

$$
\psi_{\mathrm{M}} = \frac{\mu\alpha\left[ML\left((L+2)\sigma_s^4 + 6\sigma_s^2\sigma_v^2 + \sigma_v^4\right)\right]}{(L\sigma_s^2 + \sigma_v^2) - \mu\alpha(M-1)(L\sigma_s^2 + \sigma_v^2)^2}.
\tag{3.62}
$$

Comparing the update equations of MCLMS and IMCLMS as shown in (3.12) and (3.38) respectively, it is noted that the derivation of misalignment for IMCLMS will follow the same procedure from (3.49) to (3.53) as for MCLMS, except that $\mathbf{R}(n)$ in (3.53) will be replaced by $[\mathbf{R}(n) - \beta\mathbf{R}_a(n)]$ where $\mathbf{R}_a(n)$ is defined in (3.36) and $\beta$ is defined after (3.38). Following the same procedure from (3.54) to (3.61), for the IMCLMS algorithm,

$$
\bar{\mathbf{R}} = (M-1-\beta)(L\sigma_s^2 + \sigma_v^2)\mathbf{I}_{ML\times ML},
\tag{3.63}
$$

$$
\tilde{\mathbf{R}} = \left[(M-1-\beta)\left((L+2)\sigma_s^4 + 6\sigma_s^2\sigma_v^2 + \sigma_v^4\right)\right]\mathbf{I}_{ML\times ML}
\tag{3.64}
$$

is obtained. Taking the trace of (3.53) and employing (3.63) and (3.64), the projected misalignment of IMCLMS is obtained as

$$
\psi_{\mathrm{IM}} = \frac{\mu\alpha\left[ML\left((L+2)\sigma_s^4 + 6\sigma_s^2\sigma_v^2 + \sigma_v^4\right)\right]}{(L\sigma_s^2 + \sigma_v^2) - \mu\alpha(M-1-\beta)(L\sigma_s^2 + \sigma_v^2)^2}.
\tag{3.65}
$$

Comparing (3.62) and (3.65), it is noted that $\psi_{\mathrm{IM}} < \psi_{\mathrm{M}}$ because the denominator of (3.62) is smaller than that of (3.65) since $0 \leq \beta \leq 1$. It is therefore expected that


the proposed IMCLMS algorithm can achieve a lower misalignment than MCLMS.
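To make the comparison between (3.62) and (3.65) concrete, the closed forms can be evaluated for sample parameters. The sketch below is illustrative only: the function name is hypothetical, `mu_alpha` stands for the product $\mu\alpha$, and the numerical values are assumptions chosen so that the denominator stays positive. Setting `beta = 0` reduces the expression to the MCLMS case in (3.62).

```python
def projected_misalignment(mu_alpha, M, L, sigma_s2, sigma_v2, beta=0.0):
    """Evaluate the closed form (3.65); beta = 0 gives the MCLMS form (3.62)."""
    # Fourth-order term (L + 2) sigma_s^4 + 6 sigma_s^2 sigma_v^2 + sigma_v^4.
    fourth = (L + 2) * sigma_s2 ** 2 + 6 * sigma_s2 * sigma_v2 + sigma_v2 ** 2
    # Second-order term L sigma_s^2 + sigma_v^2.
    power = L * sigma_s2 + sigma_v2
    denominator = power - mu_alpha * (M - 1 - beta) * power ** 2
    return mu_alpha * M * L * fourth / denominator


# Illustrative values only (not the thesis simulation settings).
params = dict(mu_alpha=0.01, M=2, L=3, sigma_s2=1.0, sigma_v2=0.01)
psi_m = projected_misalignment(beta=0.0, **params)   # MCLMS, (3.62)
psi_im = projected_misalignment(beta=1.0, **params)  # IMCLMS with beta = 1, (3.65)
print(psi_m, psi_im)
```

For these values the IMCLMS expression evaluates to a smaller number than the MCLMS one, in line with the ordering stated for $0 \leq \beta \leq 1$.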

[Figure 3.2 plot: NPM (dB) versus number of iterations for MCLMS and for IMCLMS with β = 0.2, 0.4, 0.6, 0.8 and 1.]

Figure 3.2: The NPM performance of MCLMS and proposed IMCLMS algorithms using Monte Carlo simulations with 100 trials while SNR = 20 dB, and β = 0.2, 0.4, 0.6, 0.8 and 1 for IMCLMS.

3.5 Simulation results

As explained in (3.46), the NPM is adopted to quantify the performance of MCLMS and IMCLMS. For these simulations, similar to that of [20], $M = 2$ with $\mathbf{h}_1 = [1\ \ {-2\cos(\pi/10)}\ \ 1]^T$ and $\mathbf{h}_2 = [1\ \ {-2\cos(\pi/5)}\ \ 1]^T$ are adopted. These synthetic impulse responses are used to model impulse responses for communication applications [20] [60]. In addition, noise is added to the $i$th channel observations such that the SNR corresponds to 20 dB and 15 dB. Following the same simulation setup of [20], WGN is chosen to demonstrate the principle of the algorithm. A similar approach can be found in [66], where WGN is chosen to model diffuse noise for the same purpose. The step-size adopted for these algorithms is $\mu = 0.01$.

[Figure 3.3 plot: NPM (dB) versus number of iterations for MCLMS and IMCLMS.]

Figure 3.3: The NPM performance of MCLMS and proposed IMCLMS algorithms under SNR = 15 dB using Monte Carlo simulations with 100 trials while β = 1 for IMCLMS.
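As a point of reference for the simulations, the NPM can be computed directly from a true and an estimated channel vector. The sketch below assumes that (3.46) takes the standard normalized projection misalignment form, in which the estimate is first projected onto the true channel so that an arbitrary scale factor does not count as error. The function name, the perturbation values and the orthogonal test vector are hypothetical.

```python
import math


def npm_db(h_true, h_est):
    """Normalized projection misalignment in dB (assumed standard form)."""
    dot_he = sum(a * b for a, b in zip(h_true, h_est))
    dot_ee = sum(b * b for b in h_est)
    # Scale factor that projects h_true onto the direction of h_est.
    scale = dot_he / dot_ee
    err = [a - scale * b for a, b in zip(h_true, h_est)]
    err_norm = math.sqrt(sum(e * e for e in err))
    true_norm = math.sqrt(sum(a * a for a in h_true))
    return 20.0 * math.log10(err_norm / true_norm)


# Synthetic channels used in the simulations (M = 2, L = 3).
h1 = [1.0, -2.0 * math.cos(math.pi / 10), 1.0]
h2 = [1.0, -2.0 * math.cos(math.pi / 5), 1.0]
h = h1 + h2  # concatenated ML x 1 channel vector

# Hypothetical estimate: small perturbation of h with an arbitrary gain of 3.
perturbation = [0.05, -0.02, 0.03, 0.01, -0.04, 0.02]
h_est = [3.0 * (a + d) for a, d in zip(h, perturbation)]
h_est_rescaled = [v / 3.0 for v in h_est]

# Hypothetical estimate that is orthogonal to h.
h_orth = [1.0, 0.0, -1.0, 0.0, 0.0, 0.0]

print(npm_db(h, h_est), npm_db(h, h_est_rescaled), npm_db(h, h_orth))
```

The first two values coincide because the measure ignores the gain of the estimate, while an estimate orthogonal to the true channel gives an NPM of 0 dB.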
Figure 3.2 illustrates the NPM of MCLMS and IMCLMS for SNR = 20 dB, with β = 0.2, 0.4, 0.6, 0.8 and 1 for IMCLMS. Each of these plots is generated from Monte Carlo simulations and averaged across one hundred trials. Different β values are employed in order to verify the significance of the constraint shown in (3.30). As can be seen from Fig. 3.2, the steady-state performance of IMCLMS improves as β → 1 and gains an improvement in NPM of approximately 5 dB when β = 1. Figure 3.3 shows another simulation using the same simulation parameters of Fig. 3.2 but with SNR = 15 dB and β = 1. Similar to Fig. 3.2, it can be observed from Fig. 3.3 that the proposed IMCLMS algorithm achieves a lower steady-state NPM and gains an improvement of approximately 4 dB compared to MCLMS.

[Figure 3.4 plot: steady-state NPM (dB) versus SNR (dB) for MCLMS and for IMCLMS with several values of β.]

Figure 3.4: Variation of steady-state NPM for MCLMS and the proposed IMCLMS algorithms using Monte Carlo simulations with 100 trials from SNR = 6 to 24 dB.
Figure 3.4 illustrates the variation of steady-state NPM with SNR for different β. Similar to the above, these results were averaged over 100 trials and the same parameters are adopted as those used to generate Fig. 3.2. As can be seen, the proposed IMCLMS algorithm consistently outperforms MCLMS by achieving lower steady-state NPM values. In addition, the steady-state NPM value of IMCLMS reduces as β → 1, which justifies the need for the constrained cross-relation cost function to reduce the noise disturbance effect as described in (3.20) and (3.21).


[Figure 3.5 plot: NPM (dB) versus SNR (dB) for IMCLMS with β = 1.0, 1.2 and 1.4.]

Figure 3.5: Comparison of NPM of the proposed IMCLMS algorithm with β = 1.0, 1.2 and 1.4 using Monte Carlo simulations with 100 trials from SNR = 10 to 20 dB.

In order to verify the effectiveness of the proposed IMCLMS algorithm when β > 1, additional simulations are conducted with β = 1.2 and 1.4 using the same simulation setup as those in Fig. 3.4 from SNR = 10 to 20 dB. For clarity, Fig. 3.5 shows only the results of β = 1.0, 1.2 and 1.4. As can be seen from Fig. 3.5, the NPM performance of the proposed IMCLMS algorithm degrades as β increases. It can also be observed from Fig. 3.5 that the NPM values of β = 1.2 and 1.4 are consistently greater than those of β = 1.0. Furthermore, the NPM values of β = 1.2 are smaller than those of β = 1.4, which implies that a larger β leads to a poorer performance of IMCLMS when β > 1. This is actually consistent with the convergence analysis presented in (3.65). It can be observed from (3.65) that a lower NPM will be achieved as β increases within the interval [0, 1]. However, the NPM performance will start to degrade once β > 1 because the denominator will start to increase. When β > 1, a larger β will lead to a larger denominator resulting in a poorer NPM performance.


3.6 Conclusion

In this chapter, the cost function of the MCLMS algorithm is analyzed in a noisy environment, and it is shown that the additive noise can mislead the adaptive algorithm to a trivial solution. This trivial solution requires the estimated channels to be equivalent, which contradicts the desired solution since the channels are often distinct in practice. In the proposed IMCLMS algorithm, this issue is addressed by solving a constrained cross-relation cost function. It has been shown through mathematical derivation that minimizing such a constrained cost function can mitigate the cross-relation error due to noise, thus achieving noise robustness. Misalignment analysis has also been provided for MCLMS and IMCLMS under the same assumptions, which explains why the proposed IMCLMS algorithm can achieve a lower NPM value. This is because the additional constraint in the proposed IMCLMS algorithm mitigates the cross-relation error due to noise. Monte Carlo simulations under different SNRs have verified that the proposed IMCLMS algorithm is more robust to noise and can outperform MCLMS by achieving an improvement in NPM of approximately 5 dB.


Chapter 4
Adaptive Blind Room Impulse Response Estimation Algorithms for Speech Dereverberation

4.1 Review of Speech Dereverberation Algorithms
The primary aim of speech dereverberation is to recover a source signal that has been distorted by the multipath effect in an enclosed environment [63]. The problem of speech dereverberation is often blind since the source signal is unknown. There are many different ways to classify speech dereverberation algorithms. From the perspective of the number of microphones deployed, speech dereverberation can be classified into single- or multi-channel approaches. On the other hand, speech dereverberation methods can also be categorized into (i) source model-based speech dereverberation, (ii) separation of speech reverberation via homomorphic transformation and (iii) speech dereverberation via channel identification and equalization [67].

Figure 4.1: Simplified block diagram of source-filter model of speech production (after [2]).

4.1.1 LP residual enhancement methods

Before reviewing different speech dereverberation algorithms, the speech production model is first discussed, since many source model-based speech dereverberation algorithms are based on this knowledge. The source-filter model is considered as one of the most well-known speech production models which simulates the articulation systems [2] [68]. A simplified block diagram of this model is shown in Fig. 4.1. In this model the excitation signal is either a pulse train for voiced speech or a random noise for unvoiced speech. The speech signal is then written as

$$
s(n) = \sum_{p=1}^{P} a_p s(n-p) + Gb(n),
\tag{4.1}
$$

where $b(n)$, $G$, $P$ and $a_p$ are the noise/pulse input, gain of the input, prediction order and prediction coefficients of the all-pole filter respectively. Equation (4.1) is the well-known difference equation for linear-prediction coefficients (LPC), which states that the value of the present output speech signal $s(n)$ can be determined by summing $Gb(n)$ with a weighted sum of the past output samples. The coefficients of the all-pole filter $a_p$ can be obtained by analyzing $s(n)$ using the autocorrelation or the covariance method [69]. Defining $e(n) = s(n) - \hat{s}(n)$ as the difference between the original and estimated source, both autocorrelation and covariance methods minimize the sum squared error within a time interval. The main difference between these two methods is that the autocorrelation method introduces the Toeplitz autocorrelation matrix which in turn gives rise to computational efficiency.

[Figure 4.2 block diagram labels: microphone received signal x(n), distorted speech residual, LP residual enhancement, LP coefficient, dereverberated speech.]

Figure 4.2: General block diagram of an LP enhancement method for speech dereverberation.
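As a rough illustration of the autocorrelation method mentioned above, the sketch below synthesizes a signal from the difference equation (4.1) and recovers the prediction coefficients with the Levinson-Durbin recursion, which is one common way of exploiting the Toeplitz structure. This is not taken from the thesis: the order, the coefficient values, the white Gaussian excitation and all function names are assumptions made for the example.

```python
import random


def autocorrelation(x, max_lag):
    """Biased autocorrelation estimates r[0..max_lag]."""
    n = len(x)
    return [sum(x[t] * x[t - k] for t in range(k, n)) / n
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations for the predictor coefficients."""
    a = [0.0] * (order + 1)  # a[1..order] are used; a[0] is a placeholder
    err = r[0]               # prediction error power
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err        # reflection coefficient of stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


# Synthesize s(n) = a_1 s(n-1) + a_2 s(n-2) + G b(n), following (4.1).
rng = random.Random(1)
a_true = [0.5, -0.3]  # hypothetical stable all-pole coefficients
G = 1.0
s = [0.0, 0.0]
for _ in range(20000):
    b = rng.gauss(0.0, 1.0)  # unvoiced-like random excitation
    s.append(a_true[0] * s[-1] + a_true[1] * s[-2] + G * b)

r = autocorrelation(s, 2)
a_est, residual_power = levinson_durbin(r, 2)
print(a_est, residual_power)
```

With a long enough signal the estimated coefficients should be close to the ones used for synthesis, and the residual power should be close to the power of $Gb(n)$.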
According to the source production model discussed above, $s(n)$ is produced by an all-pole filter excited either by a pulse train or a random noise signal. Assuming that an AIR can be expressed as a finite impulse response (FIR) filter which consists of only zeros in its z-transform [51], it is foreseeable that room reverberation will introduce only zeros to the signals received by the microphones. For this reason, reverberation will affect the excitation sequence but not the all-pole filter in the speech production model [67]. In order to perform speech dereverberation, this class of linear-prediction (LP) residual enhancement based algorithms, such as presented in [9] [70] [71], modifies the speech excitation signals leaving the LP coefficients unaffected.
A block diagram for this class of algorithm is depicted in Fig. 4.2. The residual of a clean speech signal is desired to be a well-structured pulse train and a noise-like signal for voiced and unvoiced regions, respectively. For a reverberant speech however, the pulse train structure will be smeared [71]. Therefore, one can clearly identify the distorted region from the LP residual $e(n)$ of the received signal. Several methods have been proposed to enhance the LP residual. In [9], the authors proposed to compute a weighting function derived from the multiplication of a short-time fine weighting function (with frame size of 2 ms) and a long-time weighting function (with frame size of 20 ms) of the residual signal. The modified LP residual is designed to emphasize high signal-to-reverberation ratio regions while attenuating regions of low signal-to-reverberation ratio.
Following a similar concept, the authors of [72] proposed to reconstruct the residual signal using the Hilbert envelope weighting. For an LP residual signal $e(n)$, its Hilbert envelope is defined as $e_{HE}(n) = \sqrt{e^2(n) + e_H^2(n)}$, where $e_H(n)$ denotes the Hilbert transform of $e(n)$ obtained by switching the real and imaginary parts of the discrete Fourier transform (DFT) of $e(n)$ and subsequently taking its inverse DFT [73]. It has been shown in [72] that the Hilbert envelope has larger amplitude when there is a strong excitation, therefore making it a good indicator for voiced speech. Applying the Hilbert envelope weighting can enhance pulse train excited voiced speech thus leading to a less reverberant speech signal. An alternative method aimed at reducing the unwanted peaks in the residual signal is presented in [74] through the use of wavelet clustering. The idea revolves around clustering multichannel residual signals according to their wavelet extremes and subsequently finding an averaged single-channel residual signal. Moreover, a code excited linear-prediction (CELP) post-filter was proposed in [70] to reduce the quantization error for those extracted parameters used in either the speech envelope or excitation signal reconstruction. An alternative method presented in [75] modifies the residual signal by exploiting its kurtosis. Since a reverberant speech signal is a mixture of the direct-path and delayed speech signals, it will be close to a Gaussian-distributed signal according to the central limit theorem. It is therefore foreseeable that a clean speech signal has a higher kurtosis value compared to its reverberant counterpart. The authors of [75] proposed an adaptive algorithm to maximize the kurtosis of the residual signal hence reducing the reverberation effect. All the algorithms reviewed above can mitigate the effect of reverberation due to early reflections but show limited performance in removing effects due to the late reverberation [67].
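The Hilbert envelope used for the weighting in [72] can be sketched as follows. This is an illustrative implementation only: it uses the standard analytic-signal construction (zeroing the negative-frequency DFT bins and doubling the positive ones), whose magnitude equals $\sqrt{e^2(n) + e_H^2(n)}$, rather than the exact procedure of [73]. The plain $O(N^2)$ DFT and all function names are assumptions chosen to keep the example dependency-free.

```python
import cmath
import math


def dft(x):
    """Plain O(N^2) forward DFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Plain O(N^2) inverse DFT with 1/N scaling."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def hilbert_envelope(e):
    """Magnitude of the analytic signal of e(n), i.e. sqrt(e^2 + e_H^2)."""
    n = len(e)
    spectrum = dft(e)
    # Keep DC (and Nyquist for even n), double positive bins, zero negative bins.
    gain = [0.0] * n
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        for k in range(1, n // 2):
            gain[k] = 2.0
    else:
        for k in range(1, (n + 1) // 2):
            gain[k] = 2.0
    analytic = idft([g * c for g, c in zip(gain, spectrum)])
    return [abs(z) for z in analytic]


# A pure cosine with an integer number of cycles has a flat envelope.
n = 64
e = [math.cos(2.0 * math.pi * 4 * t / n) for t in range(n)]
envelope = hilbert_envelope(e)
print(min(envelope), max(envelope))
```

For a real LP residual the envelope would instead peak around strong excitation instants, which is the property the weighting relies on.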

4.1.2 Harmonic filtering

Another approach to source model-based speech dereverberation is via the use of harmonic filtering. Instead of utilizing the speech production model, this class of speech dereverberation techniques exploits the characteristic of a speech signal which assumes that it is composed of a fundamental frequency and a series of harmonics [76]; the speech signal sounds clear if all the harmonics correspond to the multiples of the fundamental frequency components. Following this principle, the authors of [77] proposed to decorrelate the speech signal by suppressing the non-harmonic components. However, it should be noted that estimating the fundamental frequency can be challenging. Furthermore, the unvoiced speech segments are not analyzed or processed. As a result of these shortcomings, algorithms based on such techniques may achieve limited performance in dereverberation.

4.1.3 Speech dereverberation via channel identification and inverse filtering

The use of blind channel identification and inverse filtering for speech dereverberation has attracted a lot of interest recently. The motivation of this approach relies on the fact that reverberation can be modeled by the convolution of a clean speech signal and the AIR from the source to the distant microphone. This approach involves a two-step process: estimating the AIR and inverse filtering of the received signal with the estimated AIR. Estimation of the AIR is a blind problem since the source signal is unknown and only the reverberant speech signal received at the microphone is used for AIR estimation. Algorithms for acoustic BSI have gained much research interest in recent years due to the advent of multimedia signal processing and wireless communication systems. For the case of multi-channel speech dereverberation, the estimated AIRs are then used to design equalization filters in order to mitigate reverberation introduced by the AIRs. This chapter involves the development of BSI algorithms for dereverberation. A review of existing techniques for BSI will be presented in Section 4.2.

4.1.4 Speech dereverberation via late reverberation suppression
It is also interesting to note that reverberation can be categorized into early reflections and late reverberation where the early reflections are perceived to reinforce the speech and the late reverberation is known to degrade the intelligibility of the original speech [78] [79]. In view of this, reverberant speech can be separated into two parts, i.e., early speech and late reverberant speech [80] [81]. While most of the speech dereverberation algorithms have been developed to recover the anechoic signal, many algorithms have been developed recently to mitigate the effect of late reverberation via spectral enhancement. In [82], the author proposed to adopt a statistical model of late reverberation and subsequently estimate the power spectral density (PSD) of the late reverberant speech. As a result, an estimate of the clean speech signal can be obtained by magnitude spectral subtraction [83]. As has been stated in [84], the PSD estimation method described in [82] is only suitable in noise-free data frames. In [84], the authors showed how the PSD of late reverberant speech can be estimated by exploiting an Optimally Modified Log Spectral Amplitude estimator [85] in the presence of noise. To further enhance estimation of the late reverberant spectral variance, a statistical model which incorporates the energy contribution of the direct-path has been proposed in [86].

4.2 Review of BSI algorithms for acoustic channels

Estimation of AIRs can be achieved via BSI which was first proposed by Sato for the purpose of equalizing communication channels [54]. Since then, many BSI algorithms have been proposed and they can broadly be classified into second-order statistical (SOS) and higher-order statistical (HOS) methods. These methods can be further sub-divided into adaptive and non-adaptive approaches.
Typical non-adaptive approaches include the subspace method [87] from which only an eigen-value decomposition (EVD) is needed, making the algorithm computationally efficient and attractive for the equalization of narrow-band signals. In this subspace method, the orthogonality property between the 'signal' and 'noise' subspace is exploited and a quadratic cost function is minimized such that the desired unknown impulse filter coefficients are estimated up to a scaling factor. As discussed in [88], the main limitation of subspace methods is the lack of robustness to the over-estimation of the channel order. Furthermore, the time-varying property of an AIR implies that it is necessary for the algorithm to track the changes of the AIR.
In view of the above, adaptive algorithms have drawn a lot of interest in recent years. It should be noted however that a cost function based on the HOS is barely concave which, as a result, suffers from slow convergence and can lead to a local minimum in the presence of noise [21]. On the other hand, many SOS based algorithms can achieve, to some extent, good BSI performance. These algorithms include the linear prediction based subspace (LP-SS) algorithm [89] and the two-step maximum likelihood (TSML) algorithm [90]. The first step of the TSML algorithm yields an exact solution of the AIR by determining a unique null vector of the covariance matrix derived from the channel output signals. The second step can be considered as another iteration with an additional weighting constraint. The authors of [90] did not propose further iterations beyond the second step due to its high computational load and that the accuracy of the algorithm might not improve asymptotically. It has been subsequently shown in [91] that although these algorithms can approach the Cramér-Rao bound, LP-SS and TSML require a substantial amount of input data in order to achieve convergence.
As discussed in [91], a blind channel identification algorithm needs to satisfy three design requirements: fast convergence, adaptability to variations in the AIR and computational efficiency. In order to cater for these design constraints, one of the adaptive algorithms based on SOS is the normalized multichannel frequency-domain least-mean-square (NMCFLMS) algorithm [21]. The NMCFLMS algorithm is a frequency-domain extension of MCLMS described in Chapter 3. The advantage of NMCFLMS over MCLMS is the computational efficiency brought about by the computation of convolution in the frequency domain [25]. The NMCFLMS algorithm also pre-whitens a colored input to some extent due to the inherent property of the fast Fourier-transform (FFT) [21] [25]. Due to its computational efficiency, another advantage of NMCFLMS is that it can estimate AIRs that are longer. Therefore, the NMCFLMS algorithm is more practical and popular than MCLMS for BSI. However, as will be shown through simulations in Section 4.2.1, the NMCFLMS algorithm suffers from misconvergence when the observation is contaminated with additive noise. In this chapter, a noise robust AIR estimation algorithm is proposed. The proposed direct-path NMCFLMS with power constraint (DP-NMCFLMS-PC) algorithm can also achieve a high rate of convergence.

[Figure 4.3 block diagram labels: input s(n); channels; additive noise v_1(n), ..., v_M(n); observations y_1(n), y_2(n), ..., y_M(n).]

Figure 4.3: Relationship between input and output in a SIMO model.

4.2.1 Review of the NMCFLMS algorithm

Consider a speech signal $s(n)$ in a reverberant room and a SIMO model as shown in Fig. 4.3. Defining $v_i(n)$ as the background noise at microphone index $i = 1, 2, \ldots, M$, the received signal of the $i$th channel is given by

$$
y_i(n) = \mathbf{h}_i^T\mathbf{s}(n) + v_i(n), \qquad i = 1, 2, \ldots, M,
\tag{4.2}
$$

where $M$ is the number of microphones while

$$
\mathbf{h}_i = [h_{i,0}\ h_{i,1}\ \cdots\ h_{i,L-1}]^T,
$$

$$
\mathbf{s}(n) = [s(n)\ s(n-1)\ \cdots\ s(n-L+1)]^T,
$$

such that $\mathbf{h}_i$ is the $i$th channel AIR and $L$ is the length of each AIR. The additive noise $v_i(n)$ on each of the $M$ microphones is assumed to be uncorrelated with the source signal, i.e., $E\{v_i(n)s(n)\} = 0$ and that $E\{v_i(n)v_j(n)\} = 0$ for $i \neq j$ and $E\{v_i(n-l)s(n-\tau)\} = 0$ where $l \neq \tau$. The channel identification problem is therefore to estimate $\mathbf{h}_i$ using received signals $y_i(n)$. For channel identifiability, it is assumed, similar to Section 3.4.2, that the polynomials formed from the channel transfer functions $\mathbf{h}_i$ are co-prime and that the autocorrelation matrix of the source signal is full rank [20]. Similar to Section 3.4.2, the performance of these BSI algorithms is quantified using the NPM defined in (3.46).
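The SIMO model in (4.2) can be illustrated with a short simulation. The sketch below assumes that the cross-relation used in Chapter 3 takes its standard form, namely that in the absence of noise the output of channel 1 filtered by $\mathbf{h}_2$ equals the output of channel 2 filtered by $\mathbf{h}_1$, and shows how additive noise at a chosen SNR breaks this equality. The channel values, signal length and helper names are hypothetical.

```python
import math
import random


def convolve(x, h):
    """Full linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def add_noise(x, snr_db, rng):
    """Add white Gaussian noise scaled to the requested SNR in dB."""
    signal_power = sum(v * v for v in x) / len(x)
    noise_std = math.sqrt(signal_power / 10.0 ** (snr_db / 10.0))
    return [v + rng.gauss(0.0, noise_std) for v in x]


def cross_relation_error(y1, y2, h1, h2):
    """Energy of y1 * h2 - y2 * h1, which vanishes for noiseless outputs."""
    a = convolve(y1, h2)
    b = convolve(y2, h1)
    return sum((p - q) ** 2 for p, q in zip(a, b))


rng = random.Random(0)
s = [rng.gauss(0.0, 1.0) for _ in range(500)]  # stand-in for the source signal
h1 = [1.0, -0.6, 0.3]                          # hypothetical short channels
h2 = [0.8, 0.5, -0.2]

x1, x2 = convolve(s, h1), convolve(s, h2)      # noiseless channel outputs
clean_error = cross_relation_error(x1, x2, h1, h2)

y1, y2 = add_noise(x1, 20.0, rng), add_noise(x2, 20.0, rng)
noisy_error = cross_relation_error(y1, y2, h1, h2)
print(clean_error, noisy_error)
```

The noiseless error is zero up to rounding, whereas the noisy observations leave a residual even when the true channels are used, which is the effect that motivates noise-robust variants.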

As discussed above, the NMCFLMS algorithm is a frequency-domain extension of MCLMS which has been reviewed in Chapter 3. This implies that NMCFLMS exploits the same cost function as described in (3.9) but implements it in the frequency domain. To describe the implementation of NMCFLMS, the $L \times L$ identity, null and Fourier matrices are defined by $\mathbf{I}_{L\times L}$, $\mathbf{0}_{L\times L}$ and $\mathbf{F}_{L\times L}$ such that the $(p, q)$th element of $\mathbf{F}_{L\times L}$ is given by

$$
(\mathbf{F}_{L\times L})_{p,q} = e^{-j2\pi pq/L}, \qquad p, q = 0, \ldots, L-1.
\tag{4.3}
$$


[Figure 4.4 plot: NPM (dB) versus time (s) for NMCFLMS, with curves labelled SNR = 25, SNR = 30 and SNR = infinite.]

Figure 4.4: Effect of noise on NPM for the NMCFLMS algorithm.

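Following the notation of [92], the following matrices are also defined

$$
\mathbf{W}^{10}_{2L\times L} = [\mathbf{I}_{L\times L}\ \ \mathbf{0}_{L\times L}]^T,
\tag{4.4}
$$

$$
\mathbf{W}^{01}_{2L\times L} = [\mathbf{0}_{L\times L}\ \ \mathbf{I}_{L\times L}]^T,
\tag{4.5}
$$

$$
\mathbf{W}^{10}_{L\times 2L} = [\mathbf{I}_{L\times L}\ \ \mathbf{0}_{L\times L}],
\tag{4.6}
$$

$$
\mathbf{W}^{01}_{L\times 2L} = [\mathbf{0}_{L\times L}\ \ \mathbf{I}_{L\times L}],
\tag{4.7}
$$

$$
\mathcal{W}^{10}_{2L\times L} = \mathbf{F}_{2L\times 2L}\mathbf{W}^{10}_{2L\times L}\mathbf{F}^{-1}_{L\times L},
\tag{4.8}
$$

$$
\mathcal{W}^{01}_{2L\times L} = \mathbf{F}_{2L\times 2L}\mathbf{W}^{01}_{2L\times L}\mathbf{F}^{-1}_{L\times L},
\tag{4.9}
$$

$$
\mathcal{W}^{10}_{L\times 2L} = \mathbf{F}_{L\times L}\mathbf{W}^{10}_{L\times 2L}\mathbf{F}^{-1}_{2L\times 2L},
\tag{4.10}
$$

$$
\mathcal{W}^{01}_{L\times 2L} = \mathbf{F}_{L\times L}\mathbf{W}^{01}_{L\times 2L}\mathbf{F}^{-1}_{2L\times 2L},
\tag{4.11}
$$

and the estimated AIR in the Fourier transformed domain given by

$$
\underline{\hat{\mathbf{h}}}_i(m) = \mathbf{F}_{2L\times 2L}\begin{bmatrix}\hat{\mathbf{h}}_i(m) \\ \mathbf{0}_{L\times 1}\end{bmatrix} = \mathbf{F}_{2L\times 2L}\mathbf{W}^{10}_{2L\times L}\hat{\mathbf{h}}_i(m),
\tag{4.12}
$$

$$
\mathcal{D}_{y_j}(m) = \mathrm{diag}\left\{\mathbf{F}_{2L\times 2L}[y_j(mL-L)\ \ y_j(mL-L+1)\ \cdots\ y_j(mL+L-1)]^T\right\}
\tag{4.13}
$$

such that $\mathcal{D}_{y_j}(m)$ is a $2L \times 2L$ matrix with diagonal elements containing the Fourier transform of the tap-input vector $\mathbf{y}_j(m)$ and $m$ is the frame index. As has been shown in [21], the update equation of the $i$th channel AIR is

(4.14)

where

(4.15)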
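The role of the $2L$-sample frame in (4.13) and of the selection matrices in (4.4)-(4.7) can be illustrated with an overlap-save computation. The sketch below is an assumption-laden illustration rather than the NMCFLMS update itself: it shows that multiplying the DFT of a frame by the DFT of a zero-padded length-$L$ filter and keeping the last $L$ samples of the inverse DFT reproduces $L$ samples of linear filtering. The helper names and test values are hypothetical, and a plain $O(N^2)$ DFT is used to keep the example dependency-free.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Plain O(N^2) DFT; the inverse includes the 1/N scaling."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def block_filter_output(y, h, m):
    """Frequency-domain filtering of frame m, keeping the last L samples."""
    L = len(h)
    block = y[m * L - L: m * L + L]       # 2L samples, as in (4.13)
    D = dft(block)                         # diagonal entries of D_y(m)
    H = dft(list(h) + [0.0] * L)           # zero-padded filter, as in (4.12)
    circ = dft([d * g for d, g in zip(D, H)], inverse=True)
    return [z.real for z in circ[L:]]      # selection by [0 I], as in (4.7)


def direct_filter_output(y, h, m):
    """Time-domain reference: sum_l h[l] y[n - l] for n = mL, ..., mL + L - 1."""
    L = len(h)
    return [sum(h[l] * y[n - l] for l in range(L))
            for n in range(m * L, m * L + L)]


rng = random.Random(0)
L, m = 4, 2
y = [rng.gauss(0.0, 1.0) for _ in range(16)]
h = [0.9, -0.4, 0.2, 0.1]
fast = block_filter_output(y, h, m)
slow = direct_filter_output(y, h, m)
print(fast, slow)
```

The first $L$ samples of the circular convolution are contaminated by wrap-around, which is why only the last $L$ samples are retained.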

It has been shown in [26] that the NMCFLMS algorithm lacks robustness to the additive noise $v_i(n)$. Figure 4.4 shows an illustrative example of how NPM varies with time for NMCFLMS in the presence of noise. The algorithm was evaluated using AIRs each of length $L = 512$ generated from the method of images [51] with $M = 5$ microphones and sampling frequency $f_s = 16$ kHz. For each of these tests, $v_i(n)$ is added, as described in (4.2), giving SNRs of 35, 30 and 25 dB. As can be seen, the NMCFLMS algorithm misconverges under different SNRs with the effect of misconvergence being more significant for lower SNRs. This implies that the estimated AIRs deviate from the true AIRs and it is therefore expected that the performance of the speech dereverberation algorithm will degrade significantly if such estimated AIRs are employed for equalization. It is also important to note that the performance of the NMCFLMS algorithm degrades with increasing noise power.

4.2.2 Review of the DP-NMCFLMS algorithm

To address the problem of noise robustness of the NMCFLMS algorithm, the direct-path NMCFLMS (DP-NMCFLMS) algorithm was proposed in [26]. This algorithm prevents misconvergence by constraining the direct-path component of the AIRs to that of the true AIRs. To describe DP-NMCFLMS, $h_{i,dp}$ is defined as the direct-path component of the true AIR of the $i$th channel such that its elemental position within $\hat{\mathbf{h}}_i(m)$ is determined by the distance between the source and the $i$th microphone. The DP-NMCFLMS algorithm constrains the estimated direct-path component $\hat{h}_{i,dp}$ using the following equalities

$$
\hat{\mathbf{h}}_{i,dp}(m) = [0\ \ 0\ \cdots\ 0\ \ h_{i,dp}\ \ 0\ \cdots\ 0]^T,
\tag{4.16}
$$

$$
\hat{\mathbf{h}}_i(m) = [\hat{h}_{i,0}(m)\ \ \hat{h}_{i,1}(m)\ \cdots\ h_{i,dp}\ \cdots\ \hat{h}_{i,L-1}(m)]^T = \hat{\mathbf{h}}_{i,dp}(m) + \Delta\hat{\mathbf{h}}_i(m),
\tag{4.17}
$$

$$
\underline{\hat{\mathbf{h}}}_i(m) = \mathbf{F}_{2L\times 2L}\begin{bmatrix}\hat{\mathbf{h}}_i(m) \\ \mathbf{0}_{L\times 1}\end{bmatrix} = \mathbf{F}_{2L\times 2L}\mathbf{W}^{10}_{2L\times L}\hat{\mathbf{h}}_i(m).
\tag{4.18}
$$

For each channel, the AIR is then estimated iteratively using

(4.19)


where

pdPi(m) + 5I2LX2L]-I,

(4.20)

L1)~j(m)~~:(m) - PI[Pi(m)
j=1

+ 5I2Lx2L]-I,

(4.21)

L
j=1

; (rn)f1~~:(m),
10

(4.22)

---.

1)yj(m) W 2Lx Lf1hi (m) -

1)u. (m) W

10

---.

2LxLf1hj(m):

(4.23)

F LxLf1h i (m),

(4.24)
M

Pi(rn - 1) + (1 - (}) L
1)~j (m)1)Yj (m)
j=l,j=i

(4.25)

given that PI is the step-size, 5 is the regularization parameter and {} is the forgetting
factor.
It is therefore important to note that the direct-path constraint addresses the misconvergence of NMCFLMS by constraining the direct path of the estimated AIR $\hat{h}_{i,dp}$ to that of the true impulse response $h_{i,dp}$ as can be seen from (4.17). This constraint allows DP-NMCFLMS to search its solution within a constrained multidimensional space thus preventing it from misconverging to a solution that does not satisfy $\hat{\mathbf{h}}_i(m) = \mathbf{h}_i$.
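The decomposition in (4.16) and (4.17) can be sketched in a few lines. The example below is a hypothetical illustration and not the DP-NMCFLMS update: it assumes that the direct-path tap index follows from the propagation delay, using an assumed speed of sound of 343 m/s, and that the true direct-path value is available. All function names and numbers are made up for the example.

```python
def direct_path_index(distance_m, fs_hz, speed_of_sound=343.0):
    """Tap index of the direct path, assuming delay = distance / speed of sound."""
    return int(round(distance_m * fs_hz / speed_of_sound))


def apply_direct_path_constraint(h_est, dp_index, h_dp_true):
    """Overwrite the direct-path tap of the estimate with the known value."""
    constrained = list(h_est)
    constrained[dp_index] = h_dp_true
    return constrained


def split_direct_path(h, dp_index):
    """Split h into the direct-path vector of (4.16) and the remainder of (4.17)."""
    h_dp = [0.0] * len(h)
    h_dp[dp_index] = h[dp_index]
    delta = list(h)
    delta[dp_index] = 0.0
    return h_dp, delta


# A source 2 m from the microphone at a 16 kHz sampling rate.
index = direct_path_index(2.0, 16000.0)

# Toy estimate with the direct path at tap 2.
h_est = [0.1, 0.0, 0.7, 0.2]
h_constrained = apply_direct_path_constraint(h_est, dp_index=2, h_dp_true=1.0)
h_dp, delta_h = split_direct_path(h_constrained, dp_index=2)
print(index, h_constrained, h_dp, delta_h)
```

Only the remainder `delta_h` is left free to adapt in this picture, while the direct-path tap is held at its constrained value.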

4.3 The proposed DP-NMCFLMS-PC algorithm

In order to develop a fast converging noise robust algorithm, the problem of misconvergence is first investigated by studying the variation of $\|\hat{\mathbf{h}}(m)\|_2^2$ across time iteration for NMCFLMS as shown in Fig. 4.5, where $\hat{\mathbf{h}}(m) = [\hat{\mathbf{h}}_1^T(m)\ \hat{\mathbf{h}}_2^T(m)\ \cdots\ \hat{\mathbf{h}}_M^T(m)]^T$ denotes the $ML \times 1$ concatenated impulse response vector. In this illustrative example, noise $v_i(n)$ is added to the received signals to achieve an SNR = 25 dB. In addition, the same simulation settings are adopted as those used in Fig. 4.4, in which the AIRs are generated from the image model [51] with a room dimension of 4 m x 5 m x 6 m using a reverberation time $T_{60} = 300$ ms with a sampling frequency $f_s = 16$ kHz and then truncated to $L = 512$. The input signal is a continuous male speech with the same sampling frequency $f_s = 16$ kHz. As can be seen from Fig. 4.5, $\|\hat{\mathbf{h}}(m)\|_2^2$ estimated from the NMCFLMS algorithm reduces towards zero with time. Comparing with simulation results shown in Fig. 4.4, the NMCFLMS algorithm misconverges at SNR = 25 dB while the NPM $\psi(m)$ defined by (3.46) approaches zero with time. This effect can be explained from
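$$\psi(m) \to 0\ \text{dB} \;\Rightarrow\; \frac{\mathbf{h}^T\hat{\mathbf{h}}(m)}{\hat{\mathbf{h}}^T(m)\hat{\mathbf{h}}(m)}\,\hat{\mathbf{h}}(m) \to \mathbf{0} \;\Rightarrow\; \hat{\mathbf{h}}(m) \to \mathbf{0}, \tag{4.26}$$

or

$$\mathbf{h}^T\hat{\mathbf{h}}(m) \to 0. \tag{4.27}$$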

Hence, it is important to note that after misconvergence, either the estimated (concatenated) AIR $\hat{\mathbf{h}}(m)$ reduces towards a null vector or the projection term $\mathbf{h}^T\hat{\mathbf{h}}(m)$ reduces to 0. It is also interesting to note that the null estimate of $\mathbf{h}$ shown in (4.26) is a sufficient but not a necessary condition of the trivial solution as presented in (3.21) for the case of time-domain MCLMS. This explains the difference between the performance degradation of MCLMS and the misconvergence of NMCFLMS in the presence of noise. To address the misconvergence of NMCFLMS due to the null estimate, one possible solution is to adopt a power constraint at each iteration.
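The orthogonality condition in (4.27) can be illustrated numerically. The following is a minimal sketch rather than the implementation used in this thesis: it assumes the common NPM form $20\log_{10}(\|\mathbf{h} - \frac{\mathbf{h}^T\hat{\mathbf{h}}}{\hat{\mathbf{h}}^T\hat{\mathbf{h}}}\hat{\mathbf{h}}\|_2 / \|\mathbf{h}\|_2)$ for (3.46), and the function name `npm_db` and the toy vectors are hypothetical.

```python
import numpy as np

def npm_db(h_true, h_est):
    """Normalized projection misalignment in dB (assumed form of (3.46))."""
    scale = (h_true @ h_est) / (h_est @ h_est)   # projection coefficient
    residual = h_true - scale * h_est            # part of h not explained by h_est
    return 20.0 * np.log10(np.linalg.norm(residual) / np.linalg.norm(h_true))

h = np.array([1.0, 0.5, -0.25, 0.125])                 # toy "true" response
close = h + 0.01 * np.array([1.0, -1.0, 1.0, -1.0])    # slightly perturbed estimate
orthogonal = np.array([0.5, -1.0, 0.0, 0.0])           # h @ orthogonal == 0

print(npm_db(h, close))          # strongly negative: good estimate
print(npm_db(h, orthogonal))     # 0 dB: orthogonal estimate, as in (4.27)
print(npm_db(h, 1e-3 * close))   # same value as for `close`: NPM ignores scale
```

The last call shows that this NPM form is insensitive to the scale of the estimate, which is one reason the norm $\|\hat{\mathbf{h}}(m)\|_2^2$ is examined separately in Fig. 4.5.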

It is natural, at this juncture, to consider the unit-norm constraint as discussed in [21] [20]. In this unit-norm approach, the estimated AIR $\hat{\mathbf{h}}(n)$ is divided by its $l_2$-norm after each iteration, i.e., $\hat{\mathbf{h}}(n) = \hat{\mathbf{h}}(n)/\|\hat{\mathbf{h}}(n)\|_2$, such that $\|\hat{\mathbf{h}}(n)\|_2$ is constrained to unity after every adaptation process. Therefore, a null estimate as shown in (4.26) is avoided effectively. However, it can be shown that imposing such a unit-norm constraint on DP-NMCFLMS will not address the misconvergence problem. Figure 4.6 illustrates the performance of both DP-NMCFLMS and DP-NMCFLMS with unit-norm constraint at an SNR of 25 dB. The simulation settings are the same as those used to generate Fig. 4.5. As can be seen, the additional unit-norm constraint degrades the performance of DP-NMCFLMS.

Figure 4.5: Illustration of $\|\hat{\mathbf{h}}(m)\|_2^2$ using NMCFLMS algorithm at SNR = 25 dB. (Vertical axis: squared $l_2$-norm of estimated AIRs; horizontal axis: time (s).)


The original motivation of introducing a direct-path constraint alone in [92] is to guide the adaptive algorithm's search direction towards the true AIRs. This is achieved without the unit-norm constraint since the direct-path component $h_{i,dp}$ is often large in terms of magnitude compared to other coefficients of the AIR. Figures 4.7 (a)-(c) show the true AIR of the first channel and the estimates of this AIR using DP-NMCFLMS with and without unit-norm constraint, respectively. As can be seen from Fig. 4.7 (b), with the unit-norm normalization, the direct-path component $\hat{h}_{i,dp}$ achieves similar magnitude compared to other coefficients within the whole AIR. As a result, the effect of the direct-path constraint is reduced, giving rise to misconvergence as shown in Fig. 4.6. It is interesting to note that the NPM approaches 0 dB despite the unit-norm constraint being applied. Although the unit-norm constrained NMCFLMS does not lead to a null vector since the norm of $\hat{\mathbf{h}}_i(m)$ is constrained to be unity, it leads the DP-NMCFLMS algorithm to another trivial solution as shown in Fig. 4.7 (b). This set of $\hat{\mathbf{h}}(m)$ is orthogonal to the true AIR, which satisfies (4.27). It was subsequently found, in this simulation example, that $\mathbf{h}^T\hat{\mathbf{h}}(m) = 0.0054$, giving $\left\|\mathbf{h} - \frac{\mathbf{h}^T\hat{\mathbf{h}}(m)}{\hat{\mathbf{h}}^T(m)\hat{\mathbf{h}}(m)}\hat{\mathbf{h}}(m)\right\|_2 / \|\mathbf{h}\|_2 = 0.99$. This shows that the orthogonality between $\hat{\mathbf{h}}(m)$ and $\mathbf{h}$ contributes to the misconvergence of NMCFLMS.

Figure 4.6: Illustration of NPM results from direct-path and unit-norm constraint NMCFLMS at SNR = 25 dB. (Curves: DP-NMCFLMS with unit norm constraint, DP-NMCFLMS; horizontal axis: time (s).)

Figure 4.7: (a) True $\mathbf{h}$, (b) $\hat{\mathbf{h}}(m)$ using DP-NMCFLMS with unit-norm constraint, (c) $\hat{\mathbf{h}}(m)$ using DP-NMCFLMS at SNR = 25 dB.

4.4 Algorithmic derivation

To develop a fast converging BSI algorithm that is robust to noise, DP-NMCFLMS is exploited for noise robustness while, to achieve fast convergence, a power constraint is incorporated using a vector rotation algorithm similar to that proposed in [93]. The main motivation of introducing this power constraint is to prevent the estimated AIR from approaching a null vector while maintaining the effect of the direct-path constraint of the DP-NMCFLMS algorithm. A power constraint is imposed on DP-NMCFLMS using the vector rotation method such that $\|\hat{\mathbf{h}}_i(m)\|_2^2 = \vartheta_i^2(m)$, where $\vartheta_i^2(m)$ is a scalar denoting the constrained power at the $m$th iteration. The problem of this power constraint can therefore be reformulated as

$$\min_{\hat{\mathbf{h}}} \sum_{i=1}^{M-1}\sum_{j=i+1}^{M} \left[\mathbf{y}_i^T(m)\hat{\mathbf{h}}_j(m) - \mathbf{y}_j^T(m)\hat{\mathbf{h}}_i(m)\right]^2, \tag{4.28}$$

$$\text{s.t.}\quad \hat{h}_{i,dp}(m) = h_{i,dp} \quad\text{and}\quad \|\hat{\mathbf{h}}_i(m)\|_2^2 = \vartheta_i^2(m).$$

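Following similar variable and matrix notations as defined in (4.4)-(4.7), another matrix is defined as

$$\mathbf{W}^{10}_{2L\times 2L} = \begin{bmatrix} \mathbf{I}_{L\times L} & \mathbf{0}_{L\times L} \\ \mathbf{0}_{L\times L} & \mathbf{0}_{L\times L} \end{bmatrix} \tag{4.29}$$

such that pre-multiplying a vector by this matrix will null out its last $L$ elements.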
Similar to that of DP-NMCFLMS and applying the concept of Lagrangian multiplier,
the update equation is derived for the proposed DP-NMCFLMS-PC algorithm as

(4.30)

where $\varphi_i(m)$ is defined as the vector rotation factor and $\mathbf{A}$, $\mathbf{B}$ and $\mathbf{C}$ are defined in (4.20)-(4.22), respectively.
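The effect of the matrix in (4.29) can be checked with a few lines of NumPy. This is only an illustrative sketch; the helper name `zero_tail_matrix` and the toy vector are hypothetical.

```python
import numpy as np

def zero_tail_matrix(L):
    """Build the 2L x 2L block matrix [[I, 0], [0, 0]] of (4.29)."""
    W = np.zeros((2 * L, 2 * L))
    W[:L, :L] = np.eye(L)
    return W

L = 3
v = np.arange(1.0, 2 * L + 1)        # toy vector [1, 2, 3, 4, 5, 6]
masked = zero_tail_matrix(L) @ v     # first L entries kept, last L entries nulled
print(masked)
```

In an implementation one would normally zero the last $L$ entries directly instead of forming the matrix; the explicit matrix is kept here only to mirror the notation.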
Comparing (4.30) and (4.19), the additional term $\varphi_i(m)\mathbf{W}^{10}_{2L\times 2L}$ modifies the first term on the right hand side of (4.30), leaving the term $\mathbf{ABC}$ unaffected. In addition, by constraining $\|\hat{\mathbf{h}}_i(m)\|_2^2 = \vartheta_i^2(m)$, the term $\varphi_i(m)\mathbf{W}^{10}_{2L\times 2L}$ imposes a power constraint by rotating $\hat{\mathbf{h}}(m)$ towards $\hat{\mathbf{h}}(m+1)$ along the gradient of the surface $\|\hat{\mathbf{h}}_i(m)\|_2^2 = \vartheta_i^2(m)$ at each iteration [93]. It is therefore important to note that the proposed constraint not only constrains $\|\hat{\mathbf{h}}_i(m)\|_2^2$ to the target power constraint $\vartheta_i^2(m)$ but also rotates the estimated AIRs towards the true AIRs. To solve for $\varphi_i(m)$, let

(4.31)


Expanding (4.31), the following can be obtained

73;(m)

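A closed-form solution of $\varphi_i(m)$ can therefore be achieved by solving for $\varphi_i(m)$, giving

$$\varphi_i(m) = \frac{\left[\mathbf{W}^{10}_{2L\times 2L}\hat{\underline{\mathbf{h}}}^{10}_i(m)\right]^T\mathbf{Q}}{\vartheta_i^2(m)} + \left[\frac{\left(\left[\mathbf{W}^{10}_{2L\times 2L}\hat{\underline{\mathbf{h}}}^{10}_i(m)\right]^T\mathbf{Q}\right)^2}{\vartheta_i^4(m)} - \frac{\mathbf{Q}^T\mathbf{Q} - \vartheta_i^2(m)}{\vartheta_i^2(m)}\right]^{1/2}, \tag{4.33}$$

where $\mathbf{Q} = \mathbf{ABC}$. In summary, (4.30) and (4.33) constitute the proposed DP-NMCFLMS-PC algorithm. The proposed algorithm therefore contains not only the direct-path constraint, which has been exploited to increase the robustness to noise, but also a power constraint that rotates the filter coefficients along the tangential surface of $\|\hat{\mathbf{h}}_i(m)\|_2^2 = \vartheta_i^2(m)$ in order to achieve a high convergence rate.

4.4.1 Determination of $\vartheta_i^2(m)$ using proposed misconvergence point estimation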

Since $\vartheta_i^2(m)$ is required in (4.31), it is therefore important to determine its value for the proposed DP-NMCFLMS-PC algorithm. This value should be estimated such that the power of the estimated AIRs is close to that of the true AIRs.
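The role of the target power can be illustrated with a small numerical check of a rotation factor of the form (4.33). The sketch below is an assumption-laden illustration, not the algorithm of this thesis: it uses real vectors, assumes an update of the form $\varphi\,\mathbf{a} - \mathbf{q}$ (the sign convention of $\mathbf{Q}$ is an assumption), and assumes the current estimate already has the target power. The names `rotation_factor`, `a` and `q` are hypothetical.

```python
import numpy as np

def rotation_factor(a, q, target_power):
    """Larger root phi of ||phi * a - q||^2 = target_power, given ||a||^2 = target_power."""
    cross = (a @ q) / target_power
    disc = cross**2 - (q @ q - target_power) / target_power
    if disc < 0:
        raise ValueError("no real rotation factor for this correction term")
    return cross + np.sqrt(disc)

rng = np.random.default_rng(0)
a = rng.standard_normal(8)            # stands in for the zero-padded current estimate
q = 0.1 * rng.standard_normal(8)      # stands in for the correction term Q
power = a @ a                         # target power, here equal to the current power
phi = rotation_factor(a, q, power)
updated = phi * a - q                 # rotated update keeps the power fixed
print(np.isclose(updated @ updated, power))   # True
```

Whatever correction term is applied, the updated vector lands on the sphere of the chosen power, so the quality of the result depends on how well that power matches the true AIRs.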
To obtain such an estimate, $\vartheta_i^2(m)$ can be computed as $\|\hat{\mathbf{h}}_i(m)\|_2^2$ when $\hat{\mathbf{h}}_i(m)$ is close to $\mathbf{h}_i$. In practice, however, one often does not have any information about the true AIRs. It is therefore impractical to compute the NPM of the algorithm, such as that shown in Fig. 4.8 (a), in order to determine how close the estimated AIRs are to the true AIRs. An online cost function flattening estimation (CFE) was first proposed in [92] to address a similar problem, where the cost function is defined as

$$J(m) = \sum_{i=1}^{M-1}\sum_{j=i+1}^{M} \left[\mathbf{y}_i^T(m)\hat{\mathbf{h}}_j(m) - \mathbf{y}_j^T(m)\hat{\mathbf{h}}_i(m)\right]^2. \tag{4.34}$$

Figure 4.8: (a) NPM of NMCFLMS and variation of (b) $\|\hat{\mathbf{h}}(m)\|_2^2$, (c) $\Delta\|\hat{\mathbf{h}}(m)\|_2^2$ and (d) cost function $J(m)$ with time at SNR = 25 dB.

It was found in [92] that the flattening point of $J(m)$ corresponds to the misconvergence point of NMCFLMS. However, since a typical speech signal often has a large dynamic range, $J(m)$ suffers from large local fluctuations as shown in Fig. 4.8 (d). Figure 4.8 (b) shows how $\|\hat{\mathbf{h}}(m)\|_2^2$ varies with time, and it varies more smoothly compared to $J(m)$. Therefore, employing $\|\hat{\mathbf{h}}(m)\|_2^2$ is proposed to estimate the misconvergence point. This proposed delta-norm estimation (DNE) algorithm is achieved by defining

$$\Delta\|\hat{\mathbf{h}}(m)\|_2^2 = \|\hat{\mathbf{h}}(m+1)\|_2^2 - \|\hat{\mathbf{h}}(m)\|_2^2 \tag{4.35}$$

as the change in $\|\hat{\mathbf{h}}(m)\|_2^2$. Figures 4.8 (a)-(d) show the correspondence between the misconvergence point of NPM and the flattening points of $\|\hat{\mathbf{h}}(m)\|_2^2$, $\Delta\|\hat{\mathbf{h}}(m)\|_2^2$ and $J(m)$. It is observed that the time instants corresponding to the flattening points of both $\Delta\|\hat{\mathbf{h}}(m)\|_2^2$ and $\|\hat{\mathbf{h}}(m)\|_2^2$ are aligned with that of the misconvergence point of NPM, but the former approximates the actual time instant of the misconvergence point $m_{mc}$ better. More importantly, it can be observed that $\|\hat{\mathbf{h}}(m)\|_2^2$ and $\Delta\|\hat{\mathbf{h}}(m)\|_2^2$ fluctuate less significantly than $J(m)$. An accurate estimation of $m_{mc}$ is crucial since it is desired that $\vartheta_i^2(m)$ approximates the power of the unknown AIRs. To achieve this, the estimate $\hat{m}_{mc}$ is defined as the time taken for

$$\left|\Delta\|\hat{\mathbf{h}}(m)\|_2^2\right| < 0.0025 \tag{4.36}$$

to occur. Once this condition is satisfied, $\|\hat{\mathbf{h}}(m)\|_2^2 \approx \|\mathbf{h}\|_2^2$ and hence the power of the estimated AIR of each channel will be confined by (4.31). Furthermore, the power for each channel is constrained to

$$\vartheta_i^2(m) = \|\hat{\mathbf{h}}_i(m_{mc})\|_2^2 \tag{4.37}$$

for $m > m_{mc}$, where $\|\hat{\mathbf{h}}_i(m_{mc})\|_2^2$ is the corresponding power of the $i$th channel AIR at the $m_{mc}$th iteration. Figure 4.9 illustrates an example of how $\|\hat{\mathbf{h}}(m)\|_2^2$ varies with time for the proposed DP-NMCFLMS-PC algorithm. In this simulation, the same simulation setup as that used to generate Fig. 4.5 is adopted. As can be seen from Fig. 4.9, after the vector rotation power constraint is applied, unlike the NMCFLMS algorithm, $\|\hat{\mathbf{h}}(m)\|_2^2$ is prevented from converging towards zero.

Figure 4.9: Illustration of $\|\hat{\mathbf{h}}(m)\|_2^2$ using NMCFLMS algorithm at SNR = 25 dB. (Curves: proposed DP-NMCFLMS-PC, NMCFLMS; vertical axis: squared $l_2$-norm of estimated AIRs; horizontal axis: time (s).)
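The detection rule in (4.35) and (4.36) can be sketched as follows. This is a minimal illustration on a made-up norm trace, not the thesis implementation; the function name `detect_misconvergence` and the trace values are hypothetical, and only the threshold 0.0025 is taken from (4.36).

```python
import numpy as np

def detect_misconvergence(norm_sq, threshold=0.0025):
    """Return the first index m with |norm_sq[m+1] - norm_sq[m]| < threshold, or None."""
    delta = np.diff(np.asarray(norm_sq, dtype=float))   # delta-norm of (4.35)
    hits = np.flatnonzero(np.abs(delta) < threshold)    # indices satisfying (4.36)
    return int(hits[0]) if hits.size else None

# Toy trace of the squared norm: rises, flattens, then collapses towards zero.
trace = [0.0, 0.5, 0.8, 0.9, 0.901, 0.85, 0.6]
m_mc = detect_misconvergence(trace)
print(m_mc)   # 3: the norm changes by about 0.001 between indices 3 and 4
```

Once the index is found, the per-channel powers at that iteration would be frozen as the targets in (4.37). A real trace may also be flat during start-up, which this sketch does not handle.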

4.5 Simulation results

The performance of the proposed algorithm is evaluated using AIRs generated from the method of images [51]. The dimension of the room is taken to be 5 m x 6 m x 5 m and a linear array consisting of $M = 5$ microphones with a uniform separation of 0.8 m is deployed. The array center is located at (1.6, 3, 1.6) m while the first and last microphones are placed at (1.6, 1.4, 1.6) m and (1.6, 4.6, 1.6) m, respectively. A reverberation time $T_{60} = 400$ ms is used and the true AIRs are each of length $L = 512$. In all cases, the source signal is a male speech sampled at 16 kHz.

Figure 4.10: Comparison of BSI algorithms with SNR = 35 dB. (Curves: NMCFLMS, DP-NMCFLMS, proposed DP-NMCFLMS-PC with DNE, proposed DP-NMCFLMS-PC with direct observation of $m_{mc}$; horizontal axis: training time (s).)

Figure 4.10 shows the performance of DP-NMCFLMS and the proposed DP-NMCFLMS-PC algorithm with $m_{mc}$ obtained from direct observation and that estimated by the proposed DNE method with $\rho_f = 0.09$ at an SNR = 35 dB. The purpose of including the result obtained from direct observation of $m_{mc}$ is to compare the performance of DP-NMCFLMS-PC using an estimated misconvergence point $\hat{m}_{mc}$ as opposed to an unlikely scenario where perfect knowledge of $m_{mc}$ is available. This allows us to study the degradation in convergence performance of the algorithm due to the estimation of $m_{mc}$ using DNE. In this simulation, the misconvergence point is detected at approximately 33 s and the power constraint for each channel is estimated by the proposed DNE detection algorithm. It can be noted from Fig. 4.10 that the proposed DNE estimation of $m_{mc}$ is close to the actual misconvergence point obtained through direct observation and that the performance of the proposed DP-NMCFLMS-PC algorithm based on these two $m_{mc}$ values is approximately the same. It can be seen that the proposed DP-NMCFLMS-PC algorithm with DNE achieves a higher rate of convergence compared to DP-NMCFLMS [26]. More specifically, the proposed algorithm is able to reach its steady-state performance in less than 40 s compared to DP-NMCFLMS, which requires more than 200 s to achieve its steady-state.

Figure 4.11: Comparison of BSI algorithms with SNR = 30 dB. (Curves: NMCFLMS, DP-NMCFLMS, proposed DP-NMCFLMS-PC with direct observation of $m_{mc}$, proposed DP-NMCFLMS-PC with DNE; horizontal axis: time (s).)
Figures 4.11 and 4.12 show additional results with SNR = 30 and 25 dB, respectively, while the step-sizes for both algorithms are chosen as $\rho_f = 0.08$. The misconvergence points are approximated by the proposed DNE method at about 28 s and 17 s for the cases of SNR = 30 and 25 dB, respectively. These findings are consistent with the fact that misconvergence occurs earlier with a lower SNR as shown in Fig. 4.4. In Fig. 4.11, the power constraint for each channel is estimated as $[\vartheta_1^2, \vartheta_2^2, \ldots, \vartheta_5^2] = [0.1646^2, 0.06306^2, 0.06306^2, 0.06306^2, 0.06306^2]$ while in Fig. 4.12, it is estimated as $[\vartheta_1^2, \vartheta_2^2, \ldots, \vartheta_5^2] = [0.07258^2, 0.02705^2, 0.02705^2, 0.02705^2, 0.02705^2]$. As can be seen from Figs. 4.11 and 4.12, DP-NMCFLMS reaches its steady-state at about 250 s while the proposed DP-NMCFLMS-PC algorithm converges much faster and reaches its steady-state in less than 40 s. This shows that the proposed DP-NMCFLMS-PC algorithm can achieve a higher rate of convergence than that of DP-NMCFLMS. It is also interesting to note from Figs. 4.10-4.12 that the proposed DP-NMCFLMS-PC algorithm achieves an improvement in steady-state of approximately 1.5 dB when SNR decreases to 25 dB. This implies that the proposed DP-NMCFLMS-PC algorithm is more robust to noise compared to DP-NMCFLMS.

Figure 4.12: Comparison of BSI algorithms with SNR = 25 dB. (Curves: NMCFLMS, DP-NMCFLMS, proposed DP-NMCFLMS-PC with DNE; horizontal axis: time (s).)

4.6 Conclusion

A power constrained DP-NMCFLMS algorithm is developed for BSI to address the noise robustness problem of NMCFLMS as well as the slow convergence problem of the DP-NMCFLMS algorithm in the presence of additive noise. It has been shown in this chapter that the misconvergence of NMCFLMS is caused by either of the following two factors: the estimated channels converging towards null vectors, or the estimated channels being orthogonal to the real ones. It has also been shown that the DP-NMCFLMS algorithm with unit-norm constraint cannot address the misconvergence issue since the estimated channels converge towards a solution that is approximately orthogonal to the real channels. As opposed to DP-NMCFLMS with unit-norm constraint, the proposed DP-NMCFLMS-PC algorithm employs a vector rotation approach and the DNE algorithm in order to constrain the power of the estimated channels to that of the unknown AIRs. Simulation results show that the proposed noise robust DP-NMCFLMS-PC algorithm not only achieves fast convergence, but also gains an improvement in NPM of approximately 1.5 dB.

Chapter 5
A Sparseness-Constrained Channel Equalization Algorithm with Application to Speech Dereverberation

5.1 Introduction

As discussed in Chapter 4, reverberation occurs when an acoustic signal propagates in an enclosed environment. It is well known that reverberation reduces the intelligibility of speech and degrades the performance of an automatic speech recognizer [94], particularly for hands-free mobile devices. One possible way to address this problem is to deploy speech dereverberation algorithms via estimation of AIRs and channel equalization. This class of dereverberation algorithms is popular because of its computational efficiency and its potential ability to deal with long reverberation time [21]. This chapter deals with the second stage of the process: estimation of inverse filters for channel equalization.
By inverse filtering the reverberant speech signals with the estimated AIRs, a good estimate of the original signal can be achieved. It is therefore important to achieve a good estimate of the inverse filters for channel equalization. Approaches for single-input/single-output (SISO) as well as SIMO equalization techniques have been proposed. Algorithms developed for single-channel equalization include single-channel least-squares (SCLS) and homomorphic equalization [95] [96]. In SCLS, an adaptive algorithm minimizes the squared error between the outputs of the inverse filter and a desired system while in homomorphic inverse filtering, the AIR is first decomposed into minimum phase and all-pass components. An inverse can then be estimated for the minimum phase component while the all-pass component is equalized using a matched filter [96]. However, it was found that using a matched filter for the equalization of the non-minimum phase component will result in audible residual echoes [97]. In addition, the least-square error (LSE) inverse filters often require many coefficients which, in turn, introduces significant delay [94]. It was subsequently concluded that although SCLS achieves less accurate inversion, it is more efficient in practice [95]. One of the most popular multi-channel equalization algorithms proposed for dereverberation is the use of the multiple-input/output inversion theorem (MINT) [98]. In the context of room acoustics, achieving an exact inverse of an AIR is challenging since an AIR is often non-minimum phase [99]. The MINT algorithm addresses this problem and estimates the inverse filters by exploiting spatial diversity using multiple microphones. It is further shown, using the Bezout theorem, that as long as the AIRs are co-prime, the SIMO system is irreducible and there exists a set of inverse filters which can subsequently be used to recover the source signal [67].

In order to reduce computational complexity, the adaptive MINT (A-MINT) algorithm has been proposed. The main contribution of this chapter is to enhance the performance of A-MINT by developing a fast converging equalization algorithm for the estimation of inverse filters. This is achieved by suppressing any undesired non-zero coefficients in the Kronecker delta function constructed from the estimated inverse filters at each iteration. As will be shown, the above is achieved by introducing the concept of sparseness measure of the Kronecker delta function and using this as an additional constraint to A-MINT. Simulation results presented in Section 5.5 illustrate that the proposed algorithm can achieve faster convergence than A-MINT.

5.2 Review of single-channel equalization

The estimation of a single-channel inverse filter is first reviewed by considering a single-input single-output system as shown in Fig. 5.1, where $\mathbf{g}_1 = [g_{1,0}\ g_{1,1}\ \cdots\ g_{1,L_g-1}]^T$ and $L_g$ is the length of the inverse filter. If $\mathbf{g}_1$ is an exact inverse of the AIR $\mathbf{h}_1 = [h_{1,0}\ h_{1,1}\ \cdots\ h_{1,L-1}]^T$, the target response given by

$$d_k = \sum_{l=0}^{L_g-1} g_{1,L_g-1-l}\,h_{1,l}$$

will be the Kronecker delta function such that

$$d_k = \begin{cases} 1 & \text{when } k = L_d + 1, \\ 0 & \text{otherwise,} \end{cases} \tag{5.1}$$

and $L_d$ is the arbitrary delay [97]. Speech dereverberation can be achieved under this condition by convolving the received signal $x_1(n)$ with $\mathbf{g}_1$ since

$$x_1(n) * \mathbf{g}_1 = s(n) * \mathbf{h}_1 * \mathbf{g}_1 = s(n) * d_k. \tag{5.2}$$

Figure 5.1: Conventional least-square error based single-channel inverse filtering. (Block diagram: input, FIR filter $\mathbf{h}_1$, FIR filter $\mathbf{g}_1$, output.)

In order to obtain a closed-form solution of (5.1), it is required that

(5.3)

where

$$\mathbf{H}_1 = \begin{bmatrix} h_{1,0} & 0 & \cdots & 0 \\ \vdots & h_{1,0} & \ddots & \vdots \\ h_{1,L-1} & \vdots & \ddots & 0 \\ 0 & h_{1,L-1} & \ddots & h_{1,0} \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & h_{1,L-1} \end{bmatrix} \tag{5.4}$$

is the $(L + L_g - 1) \times L_g$ convolution matrix. The coefficients of the inverse filter $\mathbf{g}_1$ can be found by solving, in the z-domain,

(5.5)

where $H_1(z)$ and $G_1(z)$ are the z-transforms of $\mathbf{h}_1$ and $\mathbf{g}_1$, respectively. When $H_1(z)$ is a minimum-phase system, a stable inverse filter can be found by replacing the zeros of $H_1(z)$ with poles such that

(5.6)

The inverse filter $\mathbf{g}_1$ can subsequently be found by the inverse z-transform of $G_1(z)$.
Although the above can be implemented in theory, equalization of a single-channel AIR is not straightforward in practice since an AIR is often non-minimum phase [99]. As a result, (5.6) does not provide a stable causal solution for $G_1(z)$. An alternative way to estimate the inverse filter $\mathbf{g}_1$ is through the use of an LSE adaptive algorithm. However, in order to minimize the error, an inverse filter with high order is required [94], which in turn translates to a high processing delay. Defining

$$\hat{\mathbf{g}}_1(n) = [\hat{g}_{1,0}(n),\ \hat{g}_{1,1}(n),\ \ldots,\ \hat{g}_{1,L_g-1}(n)]^T \tag{5.7}$$

as the estimate of $\mathbf{g}_1$ at time iteration $n$, another drawback of this approach is that the error energy $[\mathbf{d} - \mathbf{H}_1\hat{\mathbf{g}}_1(n)]^T[\mathbf{d} - \mathbf{H}_1\hat{\mathbf{g}}_1(n)]$ does not converge to zero regardless of the order of the inverse filter since the single-channel AIR is non-minimum phase [98].
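The residual error of a single-channel least-squares inverse can be seen on a toy example. The sketch below is illustrative only: it uses a batch least-squares solve instead of an adaptive algorithm, the two-tap channel is made up, and the names `convolution_matrix`, `ls_inverse` and `delay` are hypothetical.

```python
import numpy as np

def convolution_matrix(h, Lg):
    """(L + Lg - 1) x Lg convolution matrix of h, in the form of (5.4)."""
    L = len(h)
    H = np.zeros((L + Lg - 1, Lg))
    for col in range(Lg):
        H[col:col + L, col] = h     # each column is a shifted copy of h
    return H

def ls_inverse(h, Lg, delay):
    """Least-squares inverse filter for a delayed Kronecker delta target."""
    H = convolution_matrix(h, Lg)
    d = np.zeros(H.shape[0])
    d[delay] = 1.0
    g, *_ = np.linalg.lstsq(H, d, rcond=None)
    return g, float(np.sum((d - H @ g) ** 2))   # filter and error energy

h = np.array([1.0, -2.0])     # zero at z = 2, outside the unit circle: non-minimum phase
g, err = ls_inverse(h, Lg=8, delay=4)
print(err)                    # nonzero: a finite single-channel inverse is not exact
```

The system is overdetermined ($L + L_g - 1$ equations for $L_g$ unknowns), which is why the error energy cannot in general be driven to zero with a single channel.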

5.3 Review of the MINT algorithm

In a multichannel acoustic system, the exact inverse of the system can, in theory, be achieved based on a single-input $M$-output system as shown in Fig. 5.2 [100]. In order to achieve this, defining $i$ as the channel index and $M$ as the total number of channels, $H_i(z)$ and $G_i(z)$ must satisfy the relation

$$D(z) = \sum_{i=1}^{M} H_i(z)G_i(z) = 1, \tag{5.8}$$

where $D(z)$, $H_i(z)$ and $G_i(z)$ are the z-transforms of $\mathbf{d}$, $\mathbf{h}_i$ and $\mathbf{g}_i$, respectively. Although MINT does not require the AIRs to be minimum phase, it requires $H_i(z)$, $i = 1, 2, \ldots, M$, to be co-prime such that the exact inverse of the system can be obtained using

(5.9)

where $\mathcal{H} = [\mathbf{H}_1\ \mathbf{H}_2\ \cdots\ \mathbf{H}_M]$ is the concatenated convolution matrix with $\mathbf{H}_i$ being defined in the same form as (5.4).


Although the MINT algorithm is able to estimate the inverse of an $M$-channel system, its main disadvantage is its high computational complexity. Since the length of $\mathbf{h}_i$ can be in the order of several hundreds of samples, the dimension of the convolution matrix $\mathcal{H}$ can become prohibitively large. As a result, the MINT algorithm requires a high computational load to invert $\mathcal{H}^T\mathcal{H}$ as required by (5.9). When common or near-common zeros exist between different channels, the assumption that $H_i(z)$, $i = 1, 2, \ldots, M$, are co-prime becomes invalid. As a result, the convolution matrix $\mathcal{H}$ becomes rank deficient, leading to an unreliable matrix inversion in (5.9). Results presented in [40] showed a degradation in equalization performance of the MINT algorithm in the presence of such near-common zeros.
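The exact multichannel inverse promised by (5.8) can be demonstrated on a two-channel toy system. This is a hedged sketch: it solves the linear system with a least-squares solver rather than forming the inverse of $\mathcal{H}^T\mathcal{H}$ explicitly, the three-tap channels are made up (chosen to share no zeros), and the names `convolution_matrix` and `mint_inverse` are hypothetical.

```python
import numpy as np

def convolution_matrix(h, Lg):
    """(L + Lg - 1) x Lg convolution matrix of h, in the form of (5.4)."""
    L = len(h)
    H = np.zeros((L + Lg - 1, Lg))
    for col in range(Lg):
        H[col:col + L, col] = h
    return H

def mint_inverse(channels, Lg, delay=0):
    """Solve the concatenated system for a delta target; one filter per channel."""
    H = np.hstack([convolution_matrix(h, Lg) for h in channels])
    d = np.zeros(H.shape[0])
    d[delay] = 1.0
    g, *_ = np.linalg.lstsq(H, d, rcond=None)
    return np.split(g, len(channels))

h1 = np.array([1.0, -2.0, 0.5])    # zeros near 1.707 and 0.293 (non-minimum phase)
h2 = np.array([1.0, 0.3, -0.4])    # zeros at 0.5 and -0.8, none shared with h1
g1, g2 = mint_inverse([h1, h2], Lg=2)
equalized = np.convolve(h1, g1) + np.convolve(h2, g2)
print(np.round(equalized, 6))      # a Kronecker delta, up to rounding
```

With two co-prime channels of length $L = 3$ and $L_g = L - 1 = 2$, the concatenated matrix is square and invertible, so the delta is reproduced exactly even though `h1` is non-minimum phase.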

Figure 5.2: Inverse filtering to a single-input M-output system.

5.3.1 Review of the A-MINT algorithm

In order to reduce the computational complexity of MINT, the A-MINT algorithm has been proposed in [28]. This algorithm is based on the least-mean-square (LMS) algorithm and is developed based on a cost function defined by

$$J(n) = \|\mathbf{d} - \mathcal{H}\hat{\mathbf{g}}(n)\|_2^2, \tag{5.10}$$

where $\hat{\mathbf{g}}(n) = [\hat{\mathbf{g}}_1^T(n)\ \cdots\ \hat{\mathbf{g}}_M^T(n)]^T$ is an $ML_g \times 1$ vector of inverse filter estimates. The gradient of this cost function is given by

$$\nabla J(n) = \frac{\partial J(n)}{\partial \hat{\mathbf{g}}(n)} = -2\mathcal{H}^T\mathbf{d} + 2\mathcal{H}^T\mathcal{H}\hat{\mathbf{g}}(n), \tag{5.11}$$

which is subsequently employed in the update equation

$$\hat{\mathbf{g}}(n+1) = \hat{\mathbf{g}}(n) - \mu\nabla J(n)\big|_{\hat{\mathbf{g}}=\hat{\mathbf{g}}(n)}, \tag{5.12}$$

where $\mu$ is the step-size.


It is noted that the advantage of A-MINT is not limited to the reduction of computational complexity, but also includes its ability to efficiently adapt to changes of the AIRs. However, as will be shown in Section 5.5, one of the main weaknesses of this algorithm is its relatively slow convergence.
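The update in (5.11) and (5.12) amounts to gradient descent on a quadratic cost and can be sketched in a few lines. The example below is illustrative only; the toy channels, the step-size, the iteration count and the names `convolution_matrix` and `a_mint` are assumptions made for this sketch.

```python
import numpy as np

def convolution_matrix(h, Lg):
    """(L + Lg - 1) x Lg convolution matrix of h, in the form of (5.4)."""
    L = len(h)
    H = np.zeros((L + Lg - 1, Lg))
    for col in range(Lg):
        H[col:col + L, col] = h
    return H

def a_mint(channels, Lg, mu, iterations, delay=0):
    """Iterate (5.12) with the gradient of (5.11); return filters and error history."""
    H = np.hstack([convolution_matrix(h, Lg) for h in channels])
    d = np.zeros(H.shape[0])
    d[delay] = 1.0
    g = np.zeros(H.shape[1])
    errors = []
    for _ in range(iterations):
        grad = -2.0 * H.T @ d + 2.0 * H.T @ (H @ g)    # gradient (5.11)
        g = g - mu * grad                              # update (5.12)
        errors.append(float(np.sum((d - H @ g) ** 2)))
    return g, np.array(errors)

h1 = np.array([1.0, -2.0, 0.5])
h2 = np.array([1.0, 0.3, -0.4])
g, errors = a_mint([h1, h2], Lg=2, mu=0.05, iterations=200)
print(errors[0], errors[-1])    # the error energy decreases over the iterations
```

The step-size here is kept below the reciprocal of the largest eigenvalue of $\mathcal{H}^T\mathcal{H}$ so that the error decreases at every iteration; the slowest mode is governed by the smallest eigenvalue, which is where slow convergence comes from.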

5.4 Proposed sparseness-controlled A-MINT algorithm

To address the slow convergence of A-MINT, one can exploit the sparseness property of the Kronecker delta function. The sparseness of a vector has been defined in (2.19). However, as opposed to Chapter 2 where sparseness is defined for the AIR, the sparseness of $\mathbf{d}$ is employed to achieve fast convergence for the estimation of inverse filters. It is important to note that the target response $\mathbf{d}$, which is in the form of a Kronecker delta function, is perfectly sparse. Therefore, the sparseness measure of the target response, which is constructed using $\hat{\mathbf{g}}(n)$, i.e.,

$$\hat{\mathbf{d}}(n) = \mathcal{H}\hat{\mathbf{g}}(n), \tag{5.13}$$

is exploited for the development of the equalization filters. The proposed sparseness-controlled A-MINT (SC-MINT) algorithm is obtained by minimizing the $l_2$-norm of the difference between $\mathbf{d}$ and $\hat{\mathbf{d}}(n)$ while maximizing $\xi(\hat{\mathbf{d}}(n))$, i.e.,

$$\min_{\hat{\mathbf{g}}(n)} \|\mathbf{d} - \mathcal{H}\hat{\mathbf{g}}(n)\|_2^2, \quad \text{s.t.}\ \xi(\hat{\mathbf{d}}(n)) = 1. \tag{5.14}$$

Using a scaling parameter $\beta$, the constrained cost function can be expressed as

$$J_{sc}(n) = \|\mathbf{d} - \mathcal{H}\hat{\mathbf{g}}(n)\|_2^2 + \beta\left(1 - \xi(\hat{\mathbf{d}}(n))\right)^2. \tag{5.15}$$
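The behaviour of the penalty term in (5.15) can be made concrete with a small sketch. Since (2.19) is not restated in this section, the code assumes the commonly used sparseness measure $\xi(\mathbf{x}) = \frac{N}{N-\sqrt{N}}\left(1 - \frac{\|\mathbf{x}\|_1}{\sqrt{N}\|\mathbf{x}\|_2}\right)$ with $N$ taken as the length of $\mathbf{x}$; the names `sparseness` and `sc_cost` are hypothetical.

```python
import numpy as np

def sparseness(x):
    """Sparseness measure in [0, 1] (assumed form of (2.19))."""
    x = np.asarray(x, dtype=float)
    n = x.size
    l1 = np.sum(np.abs(x))
    l2 = np.linalg.norm(x)
    return n / (n - np.sqrt(n)) * (1.0 - l1 / (np.sqrt(n) * l2))

def sc_cost(d, d_hat, beta):
    """Cost of the form (5.15): squared error plus weighted sparseness penalty."""
    return float(np.sum((d - d_hat) ** 2) + beta * (1.0 - sparseness(d_hat)) ** 2)

delta = np.array([0.0, 0.0, 1.0, 0.0])
print(sparseness(delta))            # 1.0 for a Kronecker delta
print(sparseness(np.ones(4)))       # 0.0 for a uniform vector
print(sc_cost(delta, delta, 0.05))  # 0.0 when the target is matched exactly
```

The penalty vanishes only when the constructed response is perfectly sparse, so any estimate that spreads energy over several taps is charged in addition to its squared error.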

To incorporate the above cost function into the adaptive framework, the derivative of (5.15) is taken with respect to $\hat{\mathbf{g}}(n)$, giving

$$\frac{\partial J_{sc}(n)}{\partial \hat{\mathbf{g}}(n)} = -2\mathcal{H}^T\mathbf{d} + 2\mathcal{H}^T\mathcal{H}\hat{\mathbf{g}}(n) + 2\beta\left[1 - \xi(\hat{\mathbf{d}}(n))\right]\frac{\partial\left(1 - \xi(\hat{\mathbf{d}}(n))\right)}{\partial \hat{\mathbf{g}}(n)}, \tag{5.16}$$

$$\frac{\partial\left(1 - \xi(\hat{\mathbf{d}}(n))\right)}{\partial \hat{\mathbf{g}}(n)} = \frac{\sqrt{ML_g}}{ML_g - \sqrt{ML_g}}\,\frac{\partial}{\partial \hat{\mathbf{g}}(n)}\left[\frac{\|\mathcal{H}\hat{\mathbf{g}}(n)\|_1}{\|\mathcal{H}\hat{\mathbf{g}}(n)\|_2}\right]. \tag{5.17}$$

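Furthermore, if we define

$$\mathbf{a}(n) = \frac{\partial\|\mathcal{H}\hat{\mathbf{g}}(n)\|_1}{\partial \hat{\mathbf{g}}(n)}, \tag{5.18}$$

$$\mathbf{b}(n) = \frac{\partial\|\mathcal{H}\hat{\mathbf{g}}(n)\|_2}{\partial \hat{\mathbf{g}}(n)}, \tag{5.19}$$

then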

In order to simplify (5.20), the following Lemma is utilized: If $x$, $y$ and $z$ are variables, $a$, $b$ and $c$ are constants and $|\cdot|$ denotes the absolute value operation,

$$\frac{\partial|ax + by + cz|}{\partial x} = \frac{\partial\sqrt{|ax + by + cz|^2}}{\partial x} = \frac{(ax + by + cz)}{|ax + by + cz|}\,a \tag{5.21}$$

$$= \begin{cases} a & \text{when } ax + by + cz \geq 0; \\ -a & \text{otherwise.} \end{cases} \tag{5.22}$$

Defining $\mathbf{1}_{1\times L_g} = [1, 1, \ldots, 1]$ as a $1 \times L_g$ vector of ones, the $ML_g \times 1$ vector

(5.23)
81 1 X L g

12:t!:1 Higi(n) '8gM

and applying the above Lemma, one can obtain the L g x 1 vector

(5.24)

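where $1 \leq l \leq M$ and $|\mathbf{h}_i| = [|h_{i,0}|, \ldots, |h_{i,L-1}|]^T$. Therefore, $\mathbf{a}(n)$ in (5.18) can be obtained using

(5.25)

where the $L_g \times 1$ vector $\mathbf{a}_i(n) = [\mathbf{1}_{1\times L}|\mathbf{h}_i|, \ldots, \mathbf{1}_{1\times L}|\mathbf{h}_i|]^T$, $1 \leq i \leq M$. The derivation of $\mathbf{b}(n)$ in (5.19) is illustrated as follows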

b(n)

(5.26)

Substituting (5.26) into (5.20), one can obtain

II1g('(1,) 1111T1g(n)
lI1g(n) II~

(5.27)

Substituting (5.16), (5.17) and (5.27) into (5.15), the update equation of SC-MINT is obtained as

$$\hat{\mathbf{g}}(n+1) = \hat{\mathbf{g}}(n) - \mu\nabla J_{sc}(n)\big|_{\hat{\mathbf{g}}=\hat{\mathbf{g}}(n)}, \tag{5.28}$$

where

with $\mathbf{a}(n)$ being defined in (5.25).


It is important to note that SC-MINT is equivalent to A-MINT when $\beta = 0$. Analysis of (5.14) and (5.29) is required to achieve a closed-form solution of $\beta$. This analysis requires one to equate (5.29) to zero and make $\hat{\mathbf{g}}(n)$ the subject of the equality. However, it is noted that due to the non-linear complexity of (5.29) with respect to $\hat{\mathbf{g}}(n)$, obtaining a closed-form solution of $\beta$ is impractical. As such, $\beta$ is proposed to be determined empirically as will be illustrated in Section 5.5.

Comparing (5.29) with (5.10), it can be observed that SC-MINT takes into account the sparseness measure of $\hat{\mathbf{d}}(n)$. The use of such a sparseness measure confines the search space of SC-MINT such that it avoids, at each iteration, solutions where more than one coefficient in $\hat{\mathbf{d}}(n)$ is non-zero. This results in a higher rate of convergence compared to A-MINT.
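The structure of the SC-MINT gradient can be sketched by differentiating a cost of the form (5.15) directly. This is not the thesis' closed form (5.29): it assumes the sparseness measure $\xi(\mathbf{x}) = \frac{N}{N-\sqrt{N}}\left(1 - \frac{\|\mathbf{x}\|_1}{\sqrt{N}\|\mathbf{x}\|_2}\right)$ with $N$ the length of $\hat{\mathbf{d}}(n)$, applies the quotient rule to the ratio of norms, and uses a small stand-in matrix for $\mathcal{H}$. The names `sc_cost` and `sc_mint_gradient` are hypothetical.

```python
import numpy as np

def sc_cost(H, d, g, beta):
    """Cost of the form (5.15) under the assumed sparseness measure."""
    d_hat = H @ g
    n = d_hat.size
    xi = n / (n - np.sqrt(n)) * (1 - np.sum(np.abs(d_hat)) / (np.sqrt(n) * np.linalg.norm(d_hat)))
    return float(np.sum((d - d_hat) ** 2) + beta * (1 - xi) ** 2)

def sc_mint_gradient(H, d, g, beta):
    """Gradient of sc_cost with respect to g (valid when no entry of H @ g is zero)."""
    d_hat = H @ g
    n = d_hat.size
    l1, l2 = np.sum(np.abs(d_hat)), np.linalg.norm(d_hat)
    xi = n / (n - np.sqrt(n)) * (1 - l1 / (np.sqrt(n) * l2))
    a = H.T @ np.sign(d_hat)              # derivative of the l1-norm, cf. (5.18)
    b = H.T @ d_hat / l2                  # derivative of the l2-norm, cf. (5.19)
    ratio_grad = (a * l2 - l1 * b) / l2 ** 2
    penalty_grad = np.sqrt(n) / (n - np.sqrt(n)) * ratio_grad
    return -2 * H.T @ d + 2 * H.T @ d_hat + 2 * beta * (1 - xi) * penalty_grad

H = np.array([[1.0, 0.0, 0.0], [2.0, 1.0, 0.0], [0.0, 2.0, 1.0], [0.0, 0.0, 2.0]])
d = np.array([1.0, 0.0, 0.0, 0.0])
g = np.array([1.0, 0.5, 0.25])
print(sc_mint_gradient(H, d, g, beta=0.05))
print(np.allclose(sc_mint_gradient(H, d, g, beta=0.0),
                  -2 * H.T @ d + 2 * H.T @ H @ g))   # True: reduces to the A-MINT gradient
```

Setting `beta` to zero recovers the A-MINT gradient of (5.11), consistent with the equivalence noted above; for non-zero `beta` the extra term pushes the constructed response towards a single dominant tap.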

5.5 Simulation Results

The performance of SC-MINT is compared to A-MINT by the normalized misalignment defined by


(5.30)
The performance of A-MINT and SC-MINT is verified using AIRs generated by the image model [51]. In this simulation, a room of dimension 5 m x 6 m x 5 m is used and the loudspeaker is placed at (1.6, 3, 1.6) m. A sampling frequency of $f_s = 16$ kHz and a reverberation time of $T_{60} = 400$ ms have also been used. The AIRs are subsequently truncated to a length of $L = 1024$ samples. To illustrate the steady-state performance of A-MINT and SC-MINT, a WGN $\mathbf{b}_i$ is added into $\mathbf{h}_i$ prior to the equalization with the channel-to-noise ratio (CNR) being defined as

$i = 1, 2, \ldots, M$. (5.31)

Figure 5.3 shows the steady-state performance of A-MINT and SC-MINT in a five-channel model with CNR = 35 dB. As can be seen from Fig. 5.3, the steady-state performance of the proposed SC-MINT algorithm varies with different $\beta$ values. With approximately the same initial convergence, $\beta = 0.05$ results in the lowest steady state while $\beta = 0.1$ leads to the highest. Because of the non-linear property of the sparseness constraint, the variation of steady state does not exhibit a fixed pattern with different $\beta$ values as shown in Fig. 5.3. It can be observed that with approximately the same steady-state performance, the convergence of SC-MINT with $\beta = 0.05$ is increased significantly compared to A-MINT.

Figure 5.3: Steady-state performance comparison between A-MINT and proposed SC-MINT when CNR = 35 dB using AIRs generated by the image model with $\beta$ = 0.01, 0.02, ..., 0.1.

Figures 5.4 and 5.5 show the convergence performance of A-MINT and SC-MINT using the same AIRs as those adopted in Fig. 5.3 with CNR = 30 and 20 dB, respectively. For clarity, only two $\beta$ values of 0.02 and 0.05 are chosen for SC-MINT to illustrate the significance of the sparseness control constraint when $\beta \neq 0$. The step-sizes are chosen as $\mu_{\text{A-MINT}} = 0.03$, $\mu_{\text{SC-MINT}} = 0.03$ for $\beta = 0.02$ and $\mu_{\text{SC-MINT}} = 0.05$ for $\beta = 0.05$ in order to achieve the same steady-state normalized misalignment performance. As can be seen from Figs. 5.4 and 5.5, the proposed SC-MINT algorithm achieves a higher rate of convergence than A-MINT; it achieves a 3 dB and 5 dB improvement in normalized misalignment over A-MINT for $\beta = 0.02$ and 0.05 during initial convergence, respectively. This shows that the sparseness constraint of SC-MINT confines the search space such that $\mathbf{g}(n)$ converges faster to $\mathbf{g}$ in order for $\mathbf{d}(n)$ to achieve a perfectly sparse Kronecker delta function.

Figure 5.4: Convergence comparison between A-MINT and proposed SC-MINT when CNR = 30 dB using AIRs generated by the image model.

Next, the application of equalization filters in the context of inverse filtering for speech dereverberation is illustrated using recorded AIRs. In this simulation setup, $f_s = 16$ kHz and M = 5 AIRs were recorded in a room of dimension 8 m × 10 m × 3.4 m. The centroid of the microphone array (with a microphone spacing of 12 cm) and the source were positioned at (4, 4, 1.6) m and (4, 2, 1.6) m, respectively. An example of one of the channels of the AIRs is shown in Fig. 5.6 (a). The significance of the convergence rate of A-MINT and SC-MINT in speech dereverberation is verified by comparing their performance in terms of the segmental signal-to-reverberation ratio (SRR) [94] defined as

$$\mathrm{SRR} = \frac{10}{K}\sum_{k=0}^{K-1}\log_{10}\frac{\sum_{n=kN}^{kN+N-1} s_d^2(n)}{\sum_{n=kN}^{kN+N-1}\left[s_d(n)-\hat{s}(n)\right]^2}, \tag{5.32}$$

where the variable $s_d(n) = s(n) * h_d$, $h_d$ is the direct-path component of the AIR, $s(n)$ is the clean speech, K is the number of frames and N is the length of each frame. In the recorded AIRs, the direct-path components of the various channels ($h_d$ of channels 1-5) have approximately the same magnitude and their mean has been adopted in (5.32). In this simulation, a male speech signal sampled at 16 kHz with a duration of approximately 5 s is used. Similar to all other simulations in this chapter, a five-channel model is adopted and a WGN is added to $\mathbf{h}_i$ to achieve CNR = 20 dB. The reverberant speech $x_i(n)$ is obtained by convolving $s(n)$ with $\mathbf{h}_i$ while the dereverberated speech is obtained as $\hat{s}(n) = \sum_{i=1}^{M} x_i(n) * g_i(n)$.

Figure 5.5: Convergence comparison between A-MINT and proposed SC-MINT when CNR = 20 dB using AIRs generated by the image model.
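The segmental SRR of (5.32) can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function name `segmental_srr`, the small floor `eps` and the choice to drop an incomplete final frame are assumptions made here and are not taken from the thesis.

```python
import numpy as np

def segmental_srr(s_d, s_hat, frame_len):
    """Segmental SRR in dB, following the form of (5.32).

    s_d       : direct-path speech s(n) * h_d (1-D array)
    s_hat     : dereverberated speech (1-D array, same length)
    frame_len : frame length N in samples
    """
    s_d = np.asarray(s_d, dtype=float)
    s_hat = np.asarray(s_hat, dtype=float)
    num_frames = len(s_d) // frame_len  # K full frames; any tail is dropped
    eps = 1e-12  # assumption: small floor to avoid log(0) and division by zero
    ratios = []
    for k in range(num_frames):
        seg = slice(k * frame_len, (k + 1) * frame_len)
        signal_energy = np.sum(s_d[seg] ** 2)
        error_energy = np.sum((s_d[seg] - s_hat[seg]) ** 2)
        ratios.append(10.0 * np.log10((signal_energy + eps) / (error_energy + eps)))
    # Average of the per-frame log ratios, i.e. (10 / K) * sum of log10 terms
    return float(np.mean(ratios))
```

For instance, a dereverberated signal that deviates from the direct-path signal by 10 % in every sample gives an energy ratio of 100 in each frame, i.e. an SRR of about 20 dB.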

The normalized misalignment convergence of A-MINT and SC-MINT using the same step-sizes and $\beta$ values as in Figs. 5.4 and 5.5 is shown in Fig. 5.6 (b) while the SRR performance is illustrated in Fig. 5.7. It can be noted from Fig. 5.6 (b) that SC-MINT achieves a higher rate of convergence compared to A-MINT. In addition, as noted from Fig. 5.7, SC-MINT requires only approximately a third of the adaptation time of A-MINT to achieve the same SRR. As can be seen from Figs. 5.4-5.7, $\beta = 0.02$ and $\beta = 0.05$ are both good choices for the SC-MINT algorithm.

Figure 5.6: (a) Recorded AIR; (b) Convergence comparison between A-MINT and proposed SC-MINT when CNR = 20 dB.

Figure 5.7: SRR performance of A-MINT and proposed SC-MINT when CNR = 20 dB using recorded AIRs.

5.6 Conclusion

The concept of sparseness control was introduced into the estimation of inverse filters for speech dereverberation. For successful AIR equalization, the convolution between the AIRs and the inverse filters is expected to be a Kronecker delta function, which is perfectly sparse. In the proposed SC-MINT algorithm, the sparseness of the Kronecker delta function that is constructed from the estimated inverse filters is maximized. Subsequently, such sparseness is utilized as an additional constraint to A-MINT. As a result, the SC-MINT algorithm avoids solutions where more than one coefficient is non-zero for the Kronecker delta function and, therefore, a higher rate of convergence can be expected from SC-MINT. Simulation results using AIRs generated by the method of images and recorded AIRs show that SC-MINT outperforms A-MINT by offering a higher rate of convergence and can achieve approximately 5 dB improvement in terms of normalized misalignment during the initial convergence.


Chapter 6
Adaptive Channel Equalization
Exploiting Segregated Sub-systems

6.1 Introduction

It has been discussed in Chapters 4 and 5 that speech dereverberation can be achieved by inverse filtering the reverberant speech signals with the estimated AIRs. Obtaining accurate inverse filters is as important as the estimation of AIRs. In Chapter 5, well-known AIR equalization algorithms were reviewed, such as the multiple-input/output inverse theorem (MINT) [98] and adaptive MINT (A-MINT) [28] algorithms. Similar to MINT, the regularized MINT (RMINT) algorithm [101] was proposed to improve the robustness of the estimates to any fluctuation in AIRs and noise. It achieves this by including a regularization parameter in the matrix inversion process described by (5.9). It has been shown in [101] that a suitably chosen regularization parameter can increase the minimum eigenvalue of $\mathcal{H}^T\mathcal{H}$, hence leading to a performance improvement of RMINT compared to the MINT algorithm.
In this chapter, a new adaptive algorithm for the equalization of AIRs in
a SIMO system is presented. The proposed auto-relation aided MINT (A-RAM)
algorithm [30], which belongs to the category of non-blind channel equalization,
takes into account how the reverberant signals are generated from the convolution
between the source and the AIRs. It has been shown mathematically that for a SIMO
system, speech dereverberation can be achieved by deconvolving the reverberant
signals and their corresponding AIRs. System equalization is achieved by segregating
a SIMO system into two sub-systems such that each sub-system performs speech
dereverberation. These dereverberated signals are desired to be equivalent since
the received signals at different microphones are generated from a common source.
Such source-channel relationship is defined as the auto-relation. The proposed A-RAM algorithm utilizes this auto-relation constraint which minimizes the difference
between the dereverberated signals of the two sub-systems iteratively.

6.2 Algorithmic development

The development begins by considering how $s(n)$ can be recovered using a two-channel system with channel indices $i$ and $j$, where the word "channel" refers to the AIRs $\mathbf{h}_i$ and $\mathbf{h}_j$. Similar to Chapter 2, for clarity of presentation, the development of the algorithm is described for a noiseless case. For this two-channel case, the estimated source signal corresponding to each channel is given by

$$\hat{s}_i(n) = \mathbf{x}_i^T(n)\,\mathbf{g}_i(n), \tag{6.1}$$

$$\hat{s}_j(n) = \mathbf{x}_j^T(n)\,\mathbf{g}_j(n), \tag{6.2}$$

Figure 6.1: Proposed inverse filtering method for a SIMO system with two sub-systems.

where $\mathbf{x}_i(n) = [x_i(n)\ x_i(n-1)\ \cdots\ x_i(n-L+1)]^T$ is the received signal of the $i$th microphone and the estimated $i$th-channel inverse filter is given by $\mathbf{g}_i(n)$. Since the received signals of a SIMO system are generated by a common source, an important relation is obtained

$$\mathbf{x}_i^T(n)\,\mathbf{g}_i(n) = \mathbf{x}_j^T(n)\,\mathbf{g}_j(n). \tag{6.3}$$

It is important to note that, as opposed to the cross-relation defined in (3.4) for BSI, the auto-relation given by (6.3) governs the recovery of $s(n)$ in the context of channel equalization.

It is noted that estimation of $\mathbf{g}_i(n)$ and $\mathbf{g}_j(n)$ independently from AIRs $\mathbf{h}_i(n)$ and $\mathbf{h}_j(n)$ is undesirable since inverse filters $\mathbf{g}_i(n)$ and $\mathbf{g}_j(n)$ in (6.1) and (6.2) do not normally exist due to the non-minimum phase property of a single AIR. Therefore, an M-channel system is segregated into two sub-systems as illustrated in Fig. 6.1 and, as a result, (6.3) is extended to a multichannel system such that

$$\sum_{i=1}^{M_1}\mathbf{x}_i^T(n)\,\mathbf{g}_i(n) = \sum_{i=M_1+1}^{M}\mathbf{x}_i^T(n)\,\mathbf{g}_i(n), \tag{6.4}$$


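where $M_1$ ($1 < M_1 < M$) is the last channel index of the first sub-system. Since the output of each sub-system (containing the dereverberated signal) is desired to be equivalent, the difference between these outputs can be utilized for estimating the inverse filters. This difference at iteration $n$ is defined as

$$e(n) = \sum_{i=1}^{M_1}\mathbf{x}_i^T(n)\,\mathbf{g}_i(n) - \sum_{i=M_1+1}^{M}\mathbf{x}_i^T(n)\,\mathbf{g}_i(n) = \underline{\mathbf{x}}_1^T(n)\,\underline{\mathbf{g}}_1(n) - \underline{\mathbf{x}}_2^T(n)\,\underline{\mathbf{g}}_2(n), \tag{6.5}$$

where $\underline{\mathbf{g}}_1(n) = [\mathbf{g}_1^T(n)\ \cdots\ \mathbf{g}_{M_1}^T(n)]^T$ and $\underline{\mathbf{g}}_2(n) = [\mathbf{g}_{M_1+1}^T(n)\ \cdots\ \mathbf{g}_M^T(n)]^T$ are the concatenated inverse filters of the first and second sub-systems, respectively. The variables $\underline{\mathbf{x}}_1(n) = [\mathbf{x}_1^T(n)\ \mathbf{x}_2^T(n)\ \cdots\ \mathbf{x}_{M_1}^T(n)]^T$ and $\underline{\mathbf{x}}_2(n) = [\mathbf{x}_{M_1+1}^T(n)\ \mathbf{x}_{M_1+2}^T(n)\ \cdots\ \mathbf{x}_M^T(n)]^T$ are the concatenated received signals corresponding to the first and second sub-systems, respectively. Similar to A-MINT described in Section 5.3.1, a cost function can therefore be defined as

$$J_{\mathrm{ar}}(n) = e^2(n). \tag{6.6}$$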

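To describe why minimizing (6.6) alone will lead to multiple solutions, the auto-relation is utilized

$$\underline{\mathbf{x}}_1^T(n)\,\underline{\mathbf{g}}_1(n) = \underline{\mathbf{x}}_2^T(n)\,\underline{\mathbf{g}}_2(n), \tag{6.7}$$

$$\underline{\mathbf{x}}_1(n)\,\underline{\mathbf{x}}_1^T(n)\,\underline{\mathbf{g}}_1(n) = \underline{\mathbf{x}}_1(n)\,\underline{\mathbf{x}}_2^T(n)\,\underline{\mathbf{g}}_2(n), \tag{6.8}$$

from which it can be noted that the solution of $\underline{\mathbf{g}}_1(n)$ is given by

(6.9)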

Hence $\underline{\mathbf{g}}_1(n)$ depends on the initial value assigned to $\underline{\mathbf{g}}_2(0)$.

Although minimizing (6.6) alone may not lead to the desired inverse filters, (6.6) can be used as a constraint to confine the search space during adaptation for fast convergence. This confined solution space can be shown by expressing (6.8) as

(6.10)

where $\mathbf{R}_{11} = E\{\underline{\mathbf{x}}_1(n)\,\underline{\mathbf{x}}_1^T(n)\}$ and $\mathbf{R}_{12} = E\{\underline{\mathbf{x}}_1(n)\,\underline{\mathbf{x}}_2^T(n)\}$. It can therefore be noted from (6.10) that although multiple solutions exist [102], (6.6) confines the solution of $\underline{\mathbf{g}}_1(n)$ and $\underline{\mathbf{g}}_2(n)$ within the nullspace of $[\mathbf{R}_{11}\ \mathbf{R}_{12}]$.

Since the two sub-systems are desired to operate similar to two MINT frameworks, (6.6) can therefore be incorporated as a constraint to the existing A-MINT algorithm such that the cost function for the proposed A-RAM algorithm can be written as

$$J_{\mathrm{AM}}(n) = \|\mathbf{d} - \mathcal{H}_1\,\underline{\mathbf{g}}_1(n)\|_2^2 + \|\mathbf{d} - \mathcal{H}_2\,\underline{\mathbf{g}}_2(n)\|_2^2 + \beta J_{\mathrm{ar}}(n), \tag{6.11}$$

where $\mathcal{H}_1 = [\mathbf{H}_1\ \cdots\ \mathbf{H}_{M_1}]$ and $\mathcal{H}_2 = [\mathbf{H}_{M_1+1}\ \cdots\ \mathbf{H}_M]$ are the concatenated channel convolutive matrices for the first and second sub-systems and $\beta$ is the Lagrangian multiplier. It can be seen from (6.11) that the proposed A-RAM algorithm not only segregates an M-channel framework into two sub-systems, it also utilizes the auto-relation between the two sub-systems to confine the search space of $\underline{\mathbf{g}}_1(n)$ and $\underline{\mathbf{g}}_2(n)$ such that $\underline{\mathbf{g}}_1(n)$ and $\underline{\mathbf{g}}_2(n)$ lie in the nullspace of $[\mathbf{R}_{11}\ \mathbf{R}_{12}]$. These factors ensure that the dereverberated signal from each sub-system is equivalent.


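Similar to algorithms proposed in earlier chapters, an LMS algorithm is proposed to solve this optimization problem using an update equation

$$\underline{\mathbf{g}}(n+1) = \underline{\mathbf{g}}(n) - \mu\nabla J_{\mathrm{AM}}(n), \tag{6.12}$$

where $\underline{\mathbf{g}}(n) = [\underline{\mathbf{g}}_1^T(n)\ \underline{\mathbf{g}}_2^T(n)]^T$ for A-RAM. To solve for $\nabla J_{\mathrm{AM}}(n)$, we first invoke the linearity property of the $\nabla$ operator such that $\nabla J_{\mathrm{AM}}(n) = \left[\left(\partial J_{\mathrm{AM}}(n)/\partial\underline{\mathbf{g}}_1(n)\right)^T\ \left(\partial J_{\mathrm{AM}}(n)/\partial\underline{\mathbf{g}}_2(n)\right)^T\right]^T$ where, for each sub-system,

$$\frac{\partial J_{\mathrm{AM}}(n)}{\partial\underline{\mathbf{g}}_1(n)} = -2\mathcal{H}_1^T\mathbf{d} + 2\mathcal{H}_1^T\mathcal{H}_1\,\underline{\mathbf{g}}_1(n) + 2\beta e(n)\,\underline{\mathbf{x}}_1(n), \tag{6.13}$$

$$\frac{\partial J_{\mathrm{AM}}(n)}{\partial\underline{\mathbf{g}}_2(n)} = -2\mathcal{H}_2^T\mathbf{d} + 2\mathcal{H}_2^T\mathcal{H}_2\,\underline{\mathbf{g}}_2(n) - 2\beta e(n)\,\underline{\mathbf{x}}_2(n). \tag{6.14}$$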
The proposed A-RAM algorithm is summarized in Table 6.1.
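One iteration of the update summarized in Table 6.1 can be sketched as follows. This is a minimal sketch under stated assumptions rather than the thesis implementation: the helper names `conv_matrix` and `a_ram_step` are hypothetical, the convolution matrices are assumed to have size $(L + L_g - 1) \times L_g$, and the caller is assumed to supply the concatenated received-signal vectors of each sub-system.

```python
import numpy as np

def conv_matrix(h, filt_len):
    """Convolution matrix H such that H @ g equals np.convolve(h, g).

    Assumed shape: (len(h) + filt_len - 1, filt_len).
    """
    h = np.asarray(h, dtype=float)
    H = np.zeros((len(h) + filt_len - 1, filt_len))
    for j in range(filt_len):
        H[j:j + len(h), j] = h  # each column is a shifted copy of h
    return H

def a_ram_step(g1, g2, H1, H2, d, x1, x2, mu, beta):
    """One LMS iteration of A-RAM following the gradients in Table 6.1.

    g1, g2 : concatenated inverse filters of sub-systems 1 and 2
    H1, H2 : concatenated convolution matrices of sub-systems 1 and 2
    d      : target (Kronecker delta) vector
    x1, x2 : concatenated received-signal vectors at the current iteration
    """
    # Auto-relation error between the two sub-system outputs
    e = x1 @ g1 - x2 @ g2
    # Gradient of each sub-system: A-MINT term plus auto-relation term
    grad1 = -2.0 * H1.T @ d + 2.0 * H1.T @ (H1 @ g1) + 2.0 * beta * e * x1
    grad2 = -2.0 * H2.T @ d + 2.0 * H2.T @ (H2 @ g2) - 2.0 * beta * e * x2
    return g1 - mu * grad1, g2 - mu * grad2, e
```

Setting `beta = 0` removes the auto-relation term, so that each sub-system reduces to an independent A-MINT style update on its own channels.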

6.2.1 Convergence behavior of A-RAM

It will be explained, through the use of convergence analysis, why A-RAM can achieve a higher convergence rate than A-MINT. As derived in Appendix A, the convergence behavior for A-MINT is given by

(6.15)

where $\mathcal{H}$ is defined after (5.9), $\mathbf{g}$ is the true inverse filter and $\mathbf{z}(n) = \mathbf{g}(n) - \mathbf{g}$. It can be seen from (6.15) that $E\{\mathbf{z}_{\text{A-MINT}}(n)\} \rightarrow \mathbf{0}_{ML_g \times 1}$ as $n \rightarrow \infty$, i.e., $\mathbf{g}(n) \rightarrow \mathbf{g}$, if $\mu$ is suitably chosen.
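Table 6.1: Proposed A-RAM algorithm for SIMO system.

Initialization:
    $\mathbf{g}_i(0) = [1\ 0\ \cdots\ 0]^T$, $i = 1, 2, \ldots, M$;
Computation:
    for $n = 1, 2, \ldots$
        $e(n) = \underline{\mathbf{x}}_1^T(n)\,\underline{\mathbf{g}}_1(n) - \underline{\mathbf{x}}_2^T(n)\,\underline{\mathbf{g}}_2(n)$
        $\nabla J_{\mathrm{AM}}(n) = [\nabla J_1(n);\ \nabla J_2(n)]$, where
            $\nabla J_1(n) = -2\mathcal{H}_1^T\mathbf{d} + 2\mathcal{H}_1^T\mathcal{H}_1\,\underline{\mathbf{g}}_1(n) + 2\beta e(n)\,\underline{\mathbf{x}}_1(n)$
            $\nabla J_2(n) = -2\mathcal{H}_2^T\mathbf{d} + 2\mathcal{H}_2^T\mathcal{H}_2\,\underline{\mathbf{g}}_2(n) - 2\beta e(n)\,\underline{\mathbf{x}}_2(n)$
        $\underline{\mathbf{g}}(n+1) = \underline{\mathbf{g}}(n) - \mu\nabla J_{\mathrm{AM}}(n)$, where
            $\underline{\mathbf{g}}(n+1) = [\underline{\mathbf{g}}_1^T(n+1)\ \underline{\mathbf{g}}_2^T(n+1)]^T$
            $\underline{\mathbf{g}}_1(n+1) = \underline{\mathbf{g}}_1(n) - \mu\nabla J_1(n)$
            $\underline{\mathbf{g}}_2(n+1) = \underline{\mathbf{g}}_2(n) - \mu\nabla J_2(n)$
    end for

It has been shown in [101] that the utilization of a regularization parameter $\delta$ in RMINT (i.e., $\mathcal{H}^T\mathcal{H} + \delta\mathbf{I}_{ML_g \times ML_g}$) increases the minimum eigenvalue of $\mathcal{H}^T\mathcal{H}$, which results in improved performance over MINT. When implemented in an adaptive framework, the update equation of adaptive RMINT (A-RMINT) is similar to that of A-MINT (defined in (5.12)) but using a different gradient

(6.16)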

The convergence behavior of A-RMINT has also been derived in Appendix A and is given by

(6.17)

It can be noted that the effect of regularization is reflected in the first term on the R.H.S. of (6.17). The significance of regularization in adaptive channel equalization has also been described in [43], where it has been proven that a suitably chosen $\delta$ can increase the convergence speed and gain robustness to additive noise.
As derived in Appendix A, the convergence behavior of A-RAM can be
summarized by
E {zl(n)}

(6.18)

E {z2(n)}
2
-2j1Jj3asICM-Ml)LgxCM-Ml)Lg

]n 92'

(6.19)

where $\sigma_s^2$ is the variance of $s(n)$, $\underline{\mathbf{g}}_1$ and $\underline{\mathbf{g}}_2$ are the true inverse filters of the first and second sub-systems respectively, $\mathbf{z}_1(n) = \underline{\mathbf{g}}_1(n) - \underline{\mathbf{g}}_1$ and $\mathbf{z}_2(n) = \underline{\mathbf{g}}_2(n) - \underline{\mathbf{g}}_2$. Comparing (6.18) and (6.19) with (6.15) and (6.17), it is observed that the auto-relation constraint for A-RAM results in a convergence behavior that is similar in form to that of A-RMINT, i.e., the term $\beta\sigma_s^2$ provides regularization to $\mathcal{H}_1^T\mathcal{H}_1$ and $\mathcal{H}_2^T\mathcal{H}_2$. Therefore, it can be expected that A-RAM with a suitable $\beta$ value can achieve a higher rate of convergence than A-MINT.

Comparing (6.18) and (6.19) with (6.17), it can be observed that although the effect of regularization is present in both A-RAM and A-RMINT, the second term $2\mu\delta\mathbf{g}$ in (6.17) forms a bias and prevents A-RMINT from achieving $\mathbf{g}(n) \rightarrow \mathbf{g}$. It can therefore be expected that A-RAM will achieve a higher rate of convergence than A-RMINT.
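The eigenvalue argument above can be checked numerically on a toy matrix. The sketch below is an illustration only: the random matrix `H` is a hypothetical stand-in for a channel convolution matrix and `delta` is an arbitrary value; neither is taken from the thesis simulations. It shows that adding a scaled identity shifts every eigenvalue of $\mathcal{H}^T\mathcal{H}$ upwards by the same amount, which raises the smallest eigenvalue that limits the slowest convergence mode.

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.standard_normal((12, 8))   # hypothetical stand-in for a channel matrix
A = H.T @ H                        # symmetric positive semi-definite
delta = 0.05                       # illustrative regularization value

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order
eig_plain = np.linalg.eigvalsh(A)
eig_reg = np.linalg.eigvalsh(A + delta * np.eye(A.shape[0]))

# The minimum eigenvalue increases by delta (up to rounding error)
min_shift = eig_reg.min() - eig_plain.min()
```

In the A-RAM analysis the role of `delta` is played by the term $\beta\sigma_s^2$, but without the additional bias term that appears for A-RMINT.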

6.2.2 Derivation of closed-form $\beta$

Although $\beta$ can be determined empirically, a closed-form solution for $\beta$ is derived by equating (6.13) and (6.14) to zero, giving

$$\underline{\mathbf{g}}_1(n) = \mathbf{R}_1^{-1}\left[\mathcal{H}_1^T\mathbf{d} - \beta e(n)\,\underline{\mathbf{x}}_1(n)\right], \tag{6.20}$$

$$\underline{\mathbf{g}}_2(n) = \mathbf{R}_2^{-1}\left[\mathcal{H}_2^T\mathbf{d} + \beta e(n)\,\underline{\mathbf{x}}_2(n)\right], \tag{6.21}$$

where $\mathbf{R}_1 = \mathcal{H}_1^T\mathcal{H}_1$ and $\mathbf{R}_2 = \mathcal{H}_2^T\mathcal{H}_2$. Since it is required that $J_{\mathrm{ar}}(n) = e^2(n) = 0$, the closed-form solution for $\beta$ can be determined by substituting (6.20) and (6.21) into (6.5) and equating it to zero, which results in

(6.22)

It is noted that $e(n)$ in the denominator of (6.22) will cancel out with that in (6.20) and (6.21), therefore preventing division by a null term when $e(n) \rightarrow 0$. It can also be noted from (6.22) that the time-varying $\beta_{\mathrm{cf}}(n)$ requires a high computational cost due to the need to compute $\mathbf{R}_1^{-1}$ and $\mathbf{R}_2^{-1}$. This motivates the use of a constant $\beta$ that will be determined empirically in Section 6.2.4.
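The substitution step described above can be sketched as follows, using only (6.5), (6.20) and (6.21). This is a worked sketch of the algebra and should be read as an assumption about the form of the result; the expression printed as (6.22) in the thesis may arrange the terms differently.

```latex
\begin{align*}
e(n) &= \underline{\mathbf{x}}_1^T(n)\,\underline{\mathbf{g}}_1(n)
      - \underline{\mathbf{x}}_2^T(n)\,\underline{\mathbf{g}}_2(n) \\
     &= \underline{\mathbf{x}}_1^T(n)\,\mathbf{R}_1^{-1}
        \left[\mathcal{H}_1^T\mathbf{d} - \beta e(n)\,\underline{\mathbf{x}}_1(n)\right]
      - \underline{\mathbf{x}}_2^T(n)\,\mathbf{R}_2^{-1}
        \left[\mathcal{H}_2^T\mathbf{d} + \beta e(n)\,\underline{\mathbf{x}}_2(n)\right] = 0 \\
\Rightarrow\quad
\beta_{\mathrm{cf}}(n) &=
  \frac{\underline{\mathbf{x}}_1^T(n)\,\mathbf{R}_1^{-1}\mathcal{H}_1^T\mathbf{d}
      - \underline{\mathbf{x}}_2^T(n)\,\mathbf{R}_2^{-1}\mathcal{H}_2^T\mathbf{d}}
       {e(n)\left[\underline{\mathbf{x}}_1^T(n)\,\mathbf{R}_1^{-1}\underline{\mathbf{x}}_1(n)
      + \underline{\mathbf{x}}_2^T(n)\,\mathbf{R}_2^{-1}\underline{\mathbf{x}}_2(n)\right]}
\end{align*}
```

In this form $e(n)$ appears in the denominator, and the product $\beta_{\mathrm{cf}}(n)\,e(n)$ that enters (6.20) and (6.21) remains finite as $e(n) \rightarrow 0$, which agrees with the remark on cancellation above.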

6.2.3 Selection of equalization results from sub-systems

It is important to note, from Fig. 6.1, that $\hat{s}_1(n)$ and $\hat{s}_2(n)$ are derived in the proposed algorithm. In addition, (6.13) and (6.14) imply that the gradients of the two sub-systems are different. Therefore, it is foreseeable that the performance of the two sub-systems may differ from each other. Since it is desirable and sufficient to obtain only a single dereverberated signal for the case of a SIMO model, it is proposed to select the best estimate from $\hat{s}_1(n)$ and $\hat{s}_2(n)$. While it is straightforward to employ a quantitative measure such as the signal-to-distortion ratio (SDR) [103] defined by

$$\mathrm{SDR} = \frac{10}{K}\sum_{k=0}^{K-1}\log_{10}\frac{\sum_{n=kN}^{kN+N-1} s^2(n)}{\sum_{n=kN}^{kN+N-1}\left[s(n)-\hat{s}(n)\right]^2}, \tag{6.23}$$

where $k$ is the segment index of the original and estimated source signals, such a measure cannot be deployed for online applications since $s(n)$ is unknown. To track the performance of the two sub-systems, the use of $J_{\mathrm{AM1}}(n)$ and $J_{\mathrm{AM2}}(n)$ is proposed, where

$$J_{\mathrm{AM1}}(n) = \|\mathbf{d} - \mathcal{H}_1\,\underline{\mathbf{g}}_1(n)\|_2^2 + \beta J_{\mathrm{ar}}(n), \tag{6.24}$$

$$J_{\mathrm{AM2}}(n) = \|\mathbf{d} - \mathcal{H}_2\,\underline{\mathbf{g}}_2(n)\|_2^2 + \beta J_{\mathrm{ar}}(n). \tag{6.25}$$

It can be expected that a lower cost corresponds to better dereverberation. By monitoring the convergence rate and the value of these cost functions, A-RAM is therefore able to compute $\hat{s}(n)$ that corresponds to the sub-system having the best performance.
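The selection rule can be sketched as below. This is a simplified sketch, not the thesis procedure: the function names `subsystem_costs` and `select_output` are hypothetical, and the rule shown compares only the current values of (6.24) and (6.25), whereas the text also mentions monitoring the rate of convergence.

```python
import numpy as np

def subsystem_costs(d, H1, H2, g1, g2, e, beta):
    """Evaluate the two monitoring costs of (6.24) and (6.25)."""
    j_ar = e ** 2  # auto-relation cost, shared by both sub-systems
    j_am1 = np.sum((d - H1 @ g1) ** 2) + beta * j_ar
    j_am2 = np.sum((d - H2 @ g2) ** 2) + beta * j_ar
    return j_am1, j_am2

def select_output(s_hat_1, s_hat_2, j_am1, j_am2):
    """Return the output of the sub-system with the lower cost.

    Assumption: ties are resolved in favour of the first sub-system.
    """
    return s_hat_1 if j_am1 <= j_am2 else s_hat_2
```

Because both costs share the same $\beta J_{\mathrm{ar}}(n)$ term, the comparison is decided entirely by how closely each sub-system equalizes its own channels to the target $\mathbf{d}$.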
The above process is described through the use of an illustrative example. In this simulation example, a single-input five-output system is divided into two sub-systems in which the first sub-system contains channel indices 1 to 3 while the second sub-system contains channel indices 4 and 5. The AIRs are each of length L = 1600 and are generated using the method of images [51] in the context of acoustic equalization with sampling frequency $f_s = 8$ kHz. In this illustrative example, $s(n)$ is a WGN. The dimension of the room is taken to be 5 m × 6 m × 5 m. A linear array consisting of M = 5 microphones with a uniform separation of 16 cm is used in this simulation. The array centroid is located at (1.6, 3, 1.6) m while the first and last microphones are located at (1.6, 2.68, 1.6) m and (1.6, 3.32, 1.6) m, respectively. The length of the inverse filters is $L_g = L = 1600$ and a step-size $\mu = 0.001$ is used for (6.12). The SDR is adopted to quantify the performance of the first and second sub-systems of A-RAM. Although it is not feasible to employ the SDR measure in practice, it is included to illustrate the correlation between the performance of each sub-system and the convergence of the cost functions $J_{\mathrm{AM1}}(n)$ and $J_{\mathrm{AM2}}(n)$.

Figure 6.2: Performance of two sub-systems in terms of SDR.

Figures 6.2 and 6.3 illustrate the performance of the first and second sub-systems in terms of SDR and the convergence of the cost functions, respectively. As can be seen from Fig. 6.2, the SDR for the first sub-system is higher than that for the second sub-system. This implies that the first sub-system outperforms the second in terms of dereverberation. As shown in Fig. 6.3, $J_{\mathrm{AM1}}(n)$ converges at a higher rate and achieves a lower value than $J_{\mathrm{AM2}}(n)$. These results imply that one can select the dereverberated signal corresponding to the sub-system which has a higher rate of convergence and a lower value of the cost function $J_{\mathrm{AM}i}(n)$.

Figure 6.3: Convergence for the cost functions of two sub-systems.

6.2.4 Significance of $J_{\mathrm{ar}}(n)$

To further examine the significance of $J_{\mathrm{ar}}(n)$ in (6.11) when the estimated AIRs are imperfect, the performance of A-RAM with different values of $\beta$ for M = 5 is compared. Similar to Section 6.2.3, the SIMO system is partitioned into two sub-systems where the first sub-system comprises the first three channels while the second sub-system comprises the remaining two channels. The AIRs are generated using the method of images [51] with the same simulation setup described in Section 6.2.3. Then, the performance of the different algorithms is evaluated using the SDR measure. To investigate the significance of the constraint $J_{\mathrm{ar}}(n)$ in the presence of estimation error in the AIRs, noise is added to the AIRs such that CNR = 20 dB, where CNR has been defined in (5.31).


As can be seen from Fig. 6.4, with $\beta = 0.05$ and with $\beta_{\mathrm{cf}}(n)$ computed using (6.22), the performance of A-RAM is improved significantly compared to the five-channel A-MINT algorithm. It can be observed that the two sub-systems of A-RAM achieve a higher performance than A-MINT, gaining approximately 12 dB improvement in the steady-state performance. More importantly, these results illustrate that A-RAM achieves better equalization performance compared to A-MINT in the presence of BSI errors.

Figure 6.4: Convergence of SDR for A-MINT and A-RAM with CNR = 20 dB, $L = L_g = 1600$, $\mu = 0.001$ using WGN.

Table 6.2: Number of additions and multiplications of A-MINT.

Number of additions

Number of multiplications

Pre-calculation
Every iteration
In addition, it can be observed from Fig. 6.4 that $\beta = 0.05$ and $\beta_{\mathrm{cf}}(n)$ have comparable performance. As noted in (6.22), computation of $\beta_{\mathrm{cf}}(n)$ is expensive and therefore one can choose to use an empirically determined value of $\beta = 0.05$ for A-RAM. It is also interesting to note from Fig. 6.4 that, in the presence of BSI estimation errors, a fixed value of $\beta$ will incur a higher amount of gradient noise.

Table 6.3: Number of additions and multiplications of A-RAM.

Pre-calculation
    Number of additions: $2(M_1^2 + M_2^2)L^3 + \left(2(M_1 + M_2) - 2(M_1^2 + M_2^2)\right)L^2 - 2(M_1 + M_2)L$
    Number of multiplications: $2(M_1^2 + M_2^2)L^3 + \left(2(M_1 + M_2) - (M_1^2 + M_2^2)\right)L^2 - (M_1 + M_2)L$
Every iteration
    Number of additions: $(M_1^2 + M_2^2)L^2 + 2(M_1 + M_2)L$
    Number of multiplications: $(M_1^2 + M_2^2)L^2 + 4(M_1 + M_2)L + 2$

6.2.5 Computational complexity

Tables 6.2 and 6.3 illustrate the computational complexity of A-MINT and A-RAM using a fixed value of $\beta$. For both algorithms, some terms in the update equation can be computed before adaptation. For example, in A-MINT, $\mathcal{H}^T\mathbf{d}$ and $\mathcal{H}^T\mathcal{H}$ do not vary with the number of iterations and can therefore be pre-computed. It is worth noting that the only difference between A-RMINT and A-MINT is the regularization term in $\mathcal{H}^T\mathcal{H} + \delta\mathbf{I}_{ML_g \times ML_g}$, which results in an additional $M^2L^2$ multiplications and $M^2L^2$ additions for the pre-calculation. As can be seen from Tables 6.2 and 6.3, assuming the case of $L_g = L$, A-MINT, A-RMINT and A-RAM incur a computational complexity of $O\{L^2\}$ per iteration and $O\{L^3\}$ in the pre-calculation. A closer observation reveals that A-RAM with two sub-systems requires a lower computational load than A-MINT and A-RMINT. This is because, taking the number of additions per iteration as an example,

$$\left(M_1^2 + (M - M_1)^2\right)L^2 = \left(M^2 + 2M_1(M_1 - M)\right)L^2 < M^2L^2$$

since $0 < M_1 < M$. For the example case of M = 5 and $M_1 = 3$, the computational complexity of A-RAM is reduced approximately by half since $M_1^2 + M_2^2 = 13 \approx \frac{1}{2}M^2$.
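The arithmetic behind the "approximately half" statement can be reproduced directly. The sketch below compares only the dominant $L^2$ terms of the per-iteration cost, which is an assumption made here for illustration; the lower-order terms of Tables 6.2 and 6.3 are ignored and the function name `dominant_terms` is hypothetical.

```python
def dominant_terms(M, M1, L):
    """Dominant per-iteration terms for a two-sub-system split versus a full M-channel update."""
    M2 = M - M1                                # channels in the second sub-system
    split_cost = (M1 ** 2 + M2 ** 2) * L ** 2  # two sub-systems
    full_cost = M ** 2 * L ** 2                # single M-channel system
    return split_cost, full_cost

# Example case from the text: M = 5, M1 = 3, L = 1600
split_cost, full_cost = dominant_terms(5, 3, 1600)
ratio = split_cost / full_cost  # (9 + 4) / 25
```

For this split the ratio is 13/25, i.e. slightly more than half of the dominant cost of the full five-channel update.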

6.3 Simulations results with application to speech dereverberation

The performance of A-RAM is verified using synthetic as well as recorded AIRs in the context of room acoustics for speech dereverberation [94]. In all the simulations, the source signal is a male speech signal sampled at 8 kHz. As before, a single-input five-output model is adopted and, for A-RAM, the first sub-system consists of the first three channels while the second sub-system consists of the last two channels, i.e., M = 5, $M_1 = 3$. The best dereverberated speech signal arising from one of the two sub-systems in A-RAM is selected for comparison with the dereverberated signals from the baseline algorithms.

6.3.1 Channel equalization using AIRs generated from the image model

In this simulation, the same AIRs are adopted as those described in Section 6.2.3. The scalar multiplier $\beta = 0.05$ is chosen for A-RAM while $\delta = 0.01$ is selected for A-RMINT [101]. A step-size $\mu = 0.02$ is adopted for all the adaptive algorithms in order to be consistent with the simulations in Section 6.3.2, where these algorithms achieve approximately the same initial convergence. Since a speech signal has been used, the performance of these algorithms is assessed by evaluating the quality of the dereverberated speech using the Bark spectral distortion (BSD) measure [104] defined by

$$\mathrm{BSD} = \frac{\frac{1}{K}\sum_{k=1}^{K}\sum_{i=1}^{N_c}\left[B_k^{s}(i) - B_k^{\hat{s}}(i)\right]^2}{\frac{1}{K}\sum_{k=1}^{K}\sum_{i=1}^{N_c}\left[B_k^{s}(i)\right]^2}, \tag{6.26}$$

Figure 6.5: BSD comparison between A-MINT, A-RMINT and the proposed A-RAM algorithm using speech input and AIRs generated using the image model with L = 1600.

where $B_k^{s}(i)$ and $B_k^{\hat{s}}(i)$ are the Bark spectral components of the $k$th segment of the original and dereverberated speech, respectively, and $N_c$ is the number of critical bands. It has been reported in [104] that the BSD is a perceptually motivated objective measure of speech quality which has a statistically linear relationship with the mean opinion score. One can interpret from (6.26) that a lower BSD value corresponds to better dereverberated speech.

Figure 6.5 illustrates the BSD performance of A-MINT, A-RMINT and A-RAM. As can be seen, the proposed A-RAM algorithm achieves the highest rate of convergence compared to A-MINT and A-RMINT. The performance of the MINT algorithm has also been included in Fig. 6.5 for comparison. It can be seen that all the algorithms can achieve nearly perfect speech dereverberation in the steady state since no noise is assumed in this illustrative example.

Figure 6.6: BSD comparison between A-MINT, A-RMINT and proposed A-RAM algorithm using recorded AIRs with L = 1600, SNR = 35 dB.

6.3.2 Monte Carlo simulation results using recorded AIRs

Next, Monte Carlo simulation results of A-MINT, A-RMINT and A-RAM using recorded AIRs in a typical classroom environment, obtained from [105], will be shown. In total, twenty sets of five AIRs, i.e., one hundred AIRs, are used in this simulation, where each AIR is re-sampled at 8 kHz and is of length L = 1600. Here two noisy scenarios are considered where pink noise is added to the reverberant speech $x_i(n)$ to achieve SNRs of 35 dB and 15 dB. As before, $\beta = 0.05$ is adopted for A-RAM and the suggested optimal regularization parameters $\delta = 0.018$ and 0.18 [101] are used in A-RMINT for SNR = 35 dB and 15 dB, respectively.

Figure 6.6 shows BSD results averaged across these twenty sets of AIRs for SNR = 35 dB with $\mu$ = 0.02, 0.025 and 0.03. As can be seen, the convergence and steady-state performance of A-MINT, A-RMINT and A-RAM vary with step-size as expected, i.e., a larger step-size achieves faster convergence with a trade-off of a higher steady-state BSD. More importantly, A-RAM consistently outperforms A-MINT and A-RMINT; A-RMINT requires approximately 1000 more iterations to achieve the same steady-state performance as A-RAM.
Additional simulation results are presented in Fig. 6.7 using $\mu = 0.02$ when SNR = 15 dB. These parameters are chosen since the algorithms can achieve approximately the same initial convergence with this step-size. As can be seen from Figs. 6.6 and 6.7, A-MINT can achieve only limited performance when noise is present in the received signals while A-RMINT and A-RAM can achieve significant improvement in equalization performance, given by a much lower steady-state BSD value. The improvement of A-RMINT is due to the regularization parameter while, for A-RAM, this improvement is due to the additional auto-relation constraint that requires the dereverberated speech from the two sub-systems to be equivalent. As observed from Figs. 6.6 and 6.7, A-RAM outperforms A-RMINT by offering a higher rate of convergence. The standard deviations of these results (averaged across the twenty sets of AIRs) are 0.1889 for A-MINT, 0.1323 for A-RMINT and 0.1147 for A-RAM when SNR = 35 dB. For the case when SNR = 15 dB, the average standard deviations are 0.2218 for A-MINT, 0.1086 for A-RMINT and 0.114 for A-RAM.
Figure 6.7: BSD comparison between A-MINT, A-RMINT and proposed A-RAM algorithm using recorded AIRs with L = 1600, SNR = 15 dB (horizontal axis: number of iterations).

It can also be noted from Fig. 6.6 that the steady-state BSD of A-RAM is modestly lower
than that of A-RMINT, while this improvement is more significant in Fig. 6.7. This
implies that A-RAM can achieve better speech recovery in the presence of noise.
Such noise robustness is due to the cost function of A-RAM, which aims to reconstruct
the received signal by exploiting the relationship between $s(n)$ and $x_i(n)$ in
(6.1) and (6.2). While A-MINT and A-RMINT do not take this relationship into
account, minimizing the cost function of A-RAM constrains the dereverberated signals of
the two sub-systems to be equivalent, which results, to some extent, in robustness
to additive noise that is uncorrelated across the channels. The above also explains
why, without noise, all algorithms achieve equal steady-state performance, as shown
in Fig. 6.5.
In addition to the BSD comparison, Fig. 6.8 illustrates the equalized AIR d computed
using the estimated inverse filters from A-MINT, A-RMINT and A-RAM at the 820th
iteration of Fig. 6.7. This iteration is chosen for illustration because A-RAM has
approximately converged while A-MINT and A-RMINT have not. The corresponding
BSD values are 3.167, 1.074 and 0.4677, which are denoted by B1, B2 and B3
in Fig. 6.7. As can be seen from Fig. 6.8, d obtained from A-RAM is closer to the
Kronecker delta function than that of A-MINT and A-RMINT.
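How an equalized response is formed from inverse filters, and how its distance from the Kronecker delta can be measured, is sketched below for a toy two-channel case. The channels are short random vectors rather than recorded AIRs, and the inverse filters are obtained by a direct least-squares solve as a stand-in for the adaptive estimates discussed in the text; the helper name `convolution_matrix` is hypothetical.

```python
import numpy as np

def convolution_matrix(h, n_cols):
    """Return the (len(h)+n_cols-1) x n_cols matrix C such that C @ g == np.convolve(h, g)."""
    rows = len(h) + n_cols - 1
    c = np.zeros((rows, n_cols))
    for k in range(n_cols):
        c[k:k + len(h), k] = h
    return c

rng = np.random.default_rng(1)
L, Lg = 8, 7                         # toy lengths; Lg = L - 1 suffices for two co-prime channels
h = [rng.standard_normal(L) for _ in range(2)]

# Stack the per-channel convolution matrices: H @ [g1; g2] = h1 * g1 + h2 * g2
H = np.hstack([convolution_matrix(hi, Lg) for hi in h])
d = np.zeros(L + Lg - 1)
d[0] = 1.0                           # target response: Kronecker delta

g, *_ = np.linalg.lstsq(H, d, rcond=None)
g1, g2 = g[:Lg], g[Lg:]

# Equalized response: sum over channels of each AIR convolved with its inverse filter
d_hat = np.convolve(h[0], g1) + np.convolve(h[1], g2)
print("deviation from delta:", np.linalg.norm(d_hat - d))
```

With inverse filters taken from an adaptive algorithm that has not yet converged, the same `d_hat` computation would show residual non-zero taps away from the first sample.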
Figure 6.8: Equalized AIRs from (a) A-MINT, (b) A-RMINT and (c) A-RAM (horizontal axis: sample index).

In order to visualize the quality of the dereverberated speech, the spectrograms
of the (a) clean, (b) reverberant and dereverberated speech corresponding to (c) B1
from A-MINT, (d) B2 from A-RMINT and (e) B3 from A-RAM are plotted in
Fig. 6.9, where B1, B2 and B3 are denoted in Fig. 6.7. These spectrograms are
computed using a Hanning window of length 256 samples with an overlapping factor
of 50%. The different colors depict the energy distribution of the speech, where dark
areas represent speech frames with high energy while light areas correspond to
low-energy frames.
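A spectrogram with the stated settings (Hanning window of 256 samples, 50% overlap, 8 kHz sampling) can be sketched with NumPy alone. The function name and the 1 kHz test tone standing in for speech are assumptions of this example; only the window type, frame length and overlap come from the text.

```python
import numpy as np

def spectrogram_db(x, frame_len=256, overlap=0.5):
    """Log-power spectrogram using a Hanning window and fractional frame overlap."""
    hop = int(frame_len * (1.0 - overlap))
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return 10.0 * np.log10(power + 1e-12).T      # shape: (frequency bins, frames)

fs = 8000
t = np.arange(int(3.2 * fs)) / fs                # 3.2 s, matching the plotted duration
x = np.sin(2 * np.pi * 1000.0 * t)               # 1 kHz tone as a hypothetical test signal
S = spectrogram_db(x)
peak_bin = int(np.argmax(S.mean(axis=1)))
print(S.shape, peak_bin * fs / 256)              # bin spacing is fs / frame_len = 31.25 Hz
```

At 8 kHz, a 256-sample frame spans 32 ms and advances by 16 ms, which is the time resolution at which smearing along the time axis becomes visible.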

Figure 6.9: Spectrograms of (a) clean, (b) reverberant speech and dereverberated speech
corresponding to (c) B1 of A-MINT, (d) B2 of A-RMINT and (e) B3 of A-RAM (horizontal axis: time in seconds).

The speech used for evaluating the algorithms is sampled at 8 kHz and, for clarity
of presentation, the spectrogram of the first 3.2 s of the speech signal is shown.
As can be seen from Fig. 6.9 (b), the spectrogram of the
reverberant speech is smeared compared to that of the clean speech in Fig. 6.9 (a).
This effect can be clearly seen in the region of 0-2 kHz at 0.3 s, 0.5 s, 1.5 s and
3 s. It can be seen from Figs. 6.9 (c)-(e) that the speech is dereverberated, since
these spectrograms look more similar to that of the clean speech. More importantly,
comparing frequencies within the region of 0-2 kHz from 0.4 to 0.5 s, the
spectrogram of Fig. 6.9 (e) shows less energy smearing along the time axis. In
addition, within the region of 0-2 kHz from 1 to 1.2 s, 2 to 2.1 s and 2.4 to
2.6 s, the spectrogram of Fig. 6.9 (e) shows more distinct frequency separation compared
to Figs. 6.9 (c) and (d). As noted in Figs. 6.9 (a) and (c)-(e), the A-RAM algorithm
reduces the smearing effect significantly, which in turn suggests that the quality of
the dereverberated speech of A-RAM is better than that of A-MINT and A-RMINT.
Lastly, the quality of the dereverberated speech is further assessed by the
mean opinion score (MOS) test. With the same simulation setup as that described
for Fig. 6.6, five pairs of dereverberated speech are selected at the 200th, 400th, 600th,
800th and 1000th iterations from A-MINT and A-RAM, respectively. In this subjective
test, twenty fluent English speakers are employed, of whom fifteen are male
and five are female. To better describe the fine differences between the dereverberated
speech from these two algorithms, the subjects are allowed to give decimal scores. To
illustrate the improvement in the quality of the dereverberated speech compared
to the reverberant speech, the MOS of the reverberant speech, which is 2, is included
as the starting point of the MOS curves for both algorithms. As can be seen in
Fig. 6.10, (i) the MOS results of the
proposed A-RAM algorithm are always higher than those of the A-MINT algorithm;
(ii) the improvement in the quality of the dereverberated speech in terms of MOS from the
proposed A-RAM algorithm is more significant than that of the A-MINT algorithm
within the simulation interval of 0-400 iterations. These results show that the proposed
A-RAM algorithm outperforms the A-MINT algorithm by offering a higher rate of
convergence for speech dereverberation. In addition, it is important to note from
Figs. 6.6 and 6.10 that a lower BSD measure corresponds to a higher MOS.
The agreement between the BSD and MOS results suggests that these
two speech reverberation measures are reliable.

Figure 6.10: Mean opinion scores of the selected five pairs of dereverberated speech (horizontal axis: number of iterations).

6.4 Conclusion

A sub-system based channel equalization algorithm is proposed for a SIMO system.
In the proposed A-RAM algorithm, the auto-relation is utilized as an additional
constraint for A-MINT. This auto-relation constraint is found to confine the solution
search within a multidimensional space. In addition, through convergence
behavior analysis, the auto-relation constraint is found to be equivalent to having
a regularization effect on the channel convolutive matrix which, as a consequence,
results in improved convergence performance. Although the closed-form solution
of the Lagrangian multiplier has been determined for the auto-relation constraint, it
has been shown that this closed-form solution is computationally expensive.
It is observed that the proposed A-RAM algorithm achieves comparable
performance whether an empirically determined or a closed-form Lagrangian multiplier
is used. Therefore, it is proposed to adopt the empirically determined Lagrangian
multiplier for A-RAM. It has also been proposed to select the equalized AIRs from
the sub-systems by tracking the convergence performance of the cost function of
each sub-system. It has been shown via complexity analysis that the proposed
A-RAM algorithm with two sub-systems requires a lower computational load than the
existing MINT-based algorithms. Simulation results using synthetic and recorded
AIRs have verified that the proposed A-RAM algorithm achieves a higher rate of
convergence and a lower error, resulting in better dereverberated speech in the
context of room acoustics.

Chapter 7
Discussion and Conclusions

7.1 Summary

This thesis began by presenting a class of non-blind channel identification algorithms
for sparseness-controlled AEC. Various APAs, such as PAPA [17] and IPAPA [18] [19],
are reviewed. It has been shown that the success of IPAPA depends on a pre-determined
control parameter. Since the value of this control parameter is expected to vary with
the environment, it is proposed to incorporate the sparseness measure of the estimated
impulse response in order to compute the time-varying weights assigned to the
proportionate and non-proportionate terms. Specifically, two mechanisms are proposed
to achieve this. In the proposed SC-IPAPA-I,
additional weighting terms computed from the sparseness measure of the estimated
impulse response are applied to the proportionate and non-proportionate terms of
the conventional IPAPA. Furthermore, the proposed SC-IPAPA-II reduces the need
to determine the control parameter by employing a sparseness-dependent control
parameter. The proposed SC-IPAPA-I and SC-IPAPA-II ensure that filter coefficients
with large magnitudes adapt with a larger step-size for a sparse system
than for a dispersive system. Conversely, small filter coefficients
in a sparse system are assigned a smaller step-size than in a dispersive system.
The contribution of the proposed SC-IPAPA-II is that it removes the need for a
pre-defined control parameter, which results in a higher rate of convergence that is
robust to the sparseness of the impulse response.
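To make the notion of a sparseness measure concrete, the sketch below evaluates one widely used definition, based on the ratio of the l1- and l2-norms of the impulse response (the form given in [45]). That the thesis uses exactly this form is an assumption here, and the function name is hypothetical; the sketch does not reproduce the SC-IPAPA weighting itself.

```python
import numpy as np

def sparseness(h, eps=1e-12):
    """Sparseness measure in [0, 1]: close to 1 for a single non-zero tap, 0 for equal-magnitude taps."""
    L = h.size
    l1 = np.sum(np.abs(h))
    l2 = np.sqrt(np.sum(h ** 2))
    return (L / (L - np.sqrt(L))) * (1.0 - l1 / (np.sqrt(L) * l2 + eps))

L = 64
sparse_ir = np.zeros(L)
sparse_ir[10] = 1.0                                   # a single dominant tap
dispersive_ir = np.ones(L)                            # energy spread evenly over all taps
decaying_ir = np.exp(-np.arange(L) / 8.0)             # something in between

for name, h in [("sparse", sparse_ir), ("decaying", decaying_ir), ("dispersive", dispersive_ir)]:
    print(name, round(float(sparseness(h)), 3))
```

A measure of this kind can be evaluated on the current filter estimate at each iteration, which is what allows the weighting between proportionate and non-proportionate terms to vary over time.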
Chapters 3 and 4 are dedicated to BSI. In these chapters, the MCLMS [20] and
NMCFLMS [21] algorithms are reviewed. It has been shown that these algorithms
lack robustness to additive noise that is uncorrelated across
the channels. The time-domain analysis of MCLMS in Chapter 3 revealed that the
additive noise requires the estimates of the different channels to be equivalent. Such
solutions are trivial since the channels differ in practice. The proposed IMCLMS
algorithm in Chapter 3 addresses this issue by utilizing a constrained cross-relation
cost function which mitigates the cross-relation error due to noise, thus achieving
noise robustness. The misalignment analysis performed on MCLMS and IMCLMS
under the same assumptions also showed that the proposed IMCLMS algorithm
achieves an improvement in steady-state performance. In addition, Monte Carlo simulation
results showed that IMCLMS achieves an improvement in NPM of approximately
5 dB over MCLMS.
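For reference, the normalized projection misalignment (NPM) used to report these results can be computed as sketched below. The definition used is the standard one from the blind channel identification literature, which projects out the arbitrary scaling of a blind estimate; the function name and the random test vectors are assumptions of this example.

```python
import numpy as np

def npm_db(h_true, h_est):
    """Normalized projection misalignment in dB between the true and estimated responses."""
    scale = (h_true @ h_est) / (h_est @ h_est)        # least-squares scaling of h_est onto h_true
    return 20.0 * np.log10(np.linalg.norm(h_true - scale * h_est) / np.linalg.norm(h_true))

rng = np.random.default_rng(0)
h = rng.standard_normal(64)                           # hypothetical true (stacked) impulse response
h_hat = h + 0.1 * rng.standard_normal(64)             # estimate with a small error

print(npm_db(h, h_hat), npm_db(h, 3.0 * h_hat))       # the two values coincide
```

Because blind identification recovers the channels only up to a scale factor, the invariance shown in the last line is what makes NPM a fair comparison between algorithms.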
As shown in Chapter 4, although the DP-NMCFLMS algorithm [26] can address the
misconvergence problem associated with additive noise for NMCFLMS,
it suffers from slow convergence. The proposed DP-NMCFLMS-PC algorithm in
Chapter 4 not only addresses the noise robustness issue of NMCFLMS, but also
achieves a higher rate of convergence than DP-NMCFLMS. The main contribution of
DP-NMCFLMS-PC is the additional power constraint, which is achieved by rotating
the estimated $\mathbf{h}_i(m)$ to its updated value $\mathbf{h}_i(m+1)$ along the tangential
gradient of the surface $\|\mathbf{h}_i(m+1)\|_2^2 = \vartheta_i(m)$.

Another contribution of Chapter 4 is the DNE technique for DP-NMCFLMS-PC.
This technique performs online estimation of the misconvergence point so that a
power constraint close to the power of the true AIRs can be used. In a practical
implementation, this is achievable by computing the power of the estimated AIRs before
the algorithm misconverges. Therefore, estimating when the algorithm misconverges
is important for DP-NMCFLMS-PC. The proposed DNE technique achieves
this by monitoring the gradient of the $\ell_2$-norm of the estimated AIRs. In the
context of the DNE technique, the misconvergence point is defined as the time
point at which the change in the gradient of the $\ell_2$-norm of the estimated AIRs
becomes smaller than a predefined threshold. It is shown through simulations that the
convergence time of the proposed DP-NMCFLMS-PC algorithm is approximately one quarter
of that of the DP-NMCFLMS algorithm. In addition, the proposed DP-NMCFLMS-PC algorithm
offers an improvement in steady-state NPM value of approximately 1.5 dB.
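The thresholding idea behind this kind of detection can be sketched as follows. This is only an illustration under stated assumptions: the gradient is approximated by a first difference of the norm trajectory, its change by a second difference, and the trajectory itself is synthetic. The function name, the threshold value and the curve are hypothetical and are not taken from the thesis.

```python
import numpy as np

def detect_misconvergence_point(norms, threshold):
    """Return the first iteration index at which the change in the gradient of
    the l2-norm trajectory falls below `threshold`, or None if it never does.

    norms[m] is the l2-norm of the estimated AIRs at iteration m.
    """
    gradient = np.diff(norms)                  # gradient[m] = norms[m+1] - norms[m]
    change = np.abs(np.diff(gradient))         # change[m] = |gradient[m+1] - gradient[m]|
    below = np.flatnonzero(change < threshold)
    return int(below[0]) + 2 if below.size else None   # +2 maps back to an index in `norms`

# Synthetic norm trajectory: fast initial growth that gradually flattens out.
m = np.arange(1000)
norms = 1.0 - np.exp(-m / 50.0)
point = detect_misconvergence_point(norms, threshold=1e-6)
print("estimated point:", point)
```

In an online setting the same test would be applied to a running buffer of the last three norm values, so that the power of the estimated AIRs can be captured as soon as the condition is met.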
Chapters 5 and 6 are dedicated to the problem of channel equalization. Chapter 5
details improvements made to the MINT and A-MINT algorithms using the
sparseness constraint. The existing A-MINT algorithm [40] equalizes AIRs and
avoids matrix inversion via an adaptive approach, thus achieving computational
efficiency. However, the main weakness of A-MINT is its slow convergence. To address
this problem, it is proposed to exploit the sparseness property of the Kronecker delta
function. In view of this, the SC-MINT algorithm is developed for acoustic channel
equalization. The main contribution is to improve the convergence speed by suppressing
any undesirable non-zero coefficients in the Kronecker delta function that
is constructed from the estimated inverse filters at each iteration. Simulation results
using AIRs generated by the method of images and recorded AIRs showed that the
proposed SC-MINT algorithm achieves faster convergence than A-MINT.
In Chapter 6, an adaptive sub-system based multichannel system equalization
algorithm that performs inverse filtering of room acoustics is proposed. Conventional
channel equalization algorithms do not take into account the source signal
and consider only the estimated AIRs obtained from BSI algorithms. Given that
such equalization techniques have been developed independently of BSI, the performance
of existing approaches for dereverberation is limited to a great extent. In the
proposed A-RAM algorithm, however, how the reverberant speech was generated is first
taken into consideration by utilizing a sub-system configuration. In addition to the
constraints required by conventional equalization algorithms, the difference between
the outputs of the two sub-systems is further minimized. It is subsequently shown
through simulations that the proposed algorithm is able to achieve fast convergence
for the inverse filtering of room acoustics. Moreover, it has been shown through
complexity analysis that the proposed A-RAM algorithm with two sub-systems is
more computationally efficient than the existing MINT-based algorithms.

7.2 Future research direction

This research has focused on the development of adaptive algorithms for channel
identification and equalization with applications to echo cancellation and speech
dereverberation. The following are suggestions for future research:

1. Channel identification in a noisy situation. Future research effort can
focus on a more realistic scenario in channel identification by taking noise
into consideration. The proposed IMCLMS algorithm has shown an improvement
in steady-state performance for short impulse response identification in
a noisy environment for typical communication applications. A possible future
research direction is to develop an algorithm that is noise robust for the
identification of long AIRs, one that not only avoids misconvergence but also
converges to a lower steady-state NPM in the presence of noise.
2. One-step speech dereverberation by inverse filtering. The algorithms
developed for speech dereverberation focus on the two-step approach of first
estimating the AIRs and subsequently estimating the inverse
filters. Although this two-step approach is popular, it limits
the dereverberation process if the first-stage estimation of the AIRs is inaccurate.
The proposed A-RAM algorithm has attempted to overcome this drawback by
taking into consideration how the reverberant speech signal is generated. To
completely avoid this problem, possible future work may involve the development
of a one-step inverse filtering speech dereverberation algorithm which
estimates the inverse filters of the AIRs directly using the multichannel model.

3. Speech dereverberation in multiple-input multiple-output (MIMO) system.
This research has focused on a SIMO model where only
one source signal is present. In a more realistic scenario, multiple speakers
can be involved, which brings speech dereverberation into the case of a MIMO
system. Existing research such as blind source separation (BSS) attempts to
separate different source signals. However, each separated signal is a filtered
version of the original speech. In view of this, speech dereverberation
is desired in BSS as well. Therefore, one possible future work is to extend and
develop algorithms for joint speech dereverberation and BSS, where
the aim is to improve the quality of the separated speech.

Appendix A
Convergence analysis of A-MINT, A-RMINT and A-RAM
Convergence analysis of the A-MINT, A-RMINT and A-RAM algorithms is provided
in this appendix. The motivation of this analysis is to explain why the proposed
A-RAM algorithm, described in Section 6.2.1, can achieve fast convergence.
For mathematical tractability, the following are assumed in the analysis:
1. The channels $\mathbf{h}_i$, $i = 1, \ldots, M$, are M independent processes and the
polynomials formed from $\mathbf{h}_i$, $i = 1, \ldots, M$, are co-prime, i.e., the channel
transfer functions $H_i$ do not share any common zeros [62] [63].
2. The step-size $\mu$ is sufficiently small [42].
3. The input $s(n)$ is a sequence of i.i.d. random Gaussian variables of zero mean
and variance $\sigma_s^2$ [42].
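The co-primeness condition in the first assumption can be checked numerically by comparing the zeros of the channel polynomials. The sketch below does this for toy channels constructed from chosen zeros; the helper name and the example zeros are hypothetical, and for long measured AIRs root finding becomes numerically delicate, so this is an illustration of the condition rather than a practical test.

```python
import numpy as np

def min_zero_distance(h_a, h_b):
    """Smallest distance in the z-plane between any zero of h_a and any zero of h_b."""
    zeros_a, zeros_b = np.roots(h_a), np.roots(h_b)
    return float(np.min(np.abs(zeros_a[:, None] - zeros_b[None, :])))

# Toy channels built from chosen zeros (np.poly returns the polynomial coefficients).
h1 = np.poly([0.5, -0.8])
h2 = np.poly([0.3, 0.9])      # no zero in common with h1
h3 = np.poly([0.5, 0.2])      # shares the zero at 0.5 with h1

print(min_zero_distance(h1, h2))   # clearly positive: h1 and h2 are co-prime
print(min_zero_distance(h1, h3))   # numerically zero: h1 and h3 share a zero
```

A distance close to zero indicates common or near-common zeros, in which case the channels cannot be equalized exactly by finite-length inverse filters.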


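The update equation of A-MINT is [28]

$$\mathbf{g}(n) = \mathbf{g}(n-1) - \mu\left[-2\mathcal{H}^T\mathbf{d} + 2\mathcal{H}^T\mathcal{H}\,\mathbf{g}(n-1)\right]. \tag{A.1}$$

Defining $\mathbf{z}(n) = \mathbf{g}(n) - \mathbf{g}$ as the difference between the estimated and true inverse
filters, where $\mathbf{g}$ is the true solution of the inverse filters, we have

$$\begin{aligned}
E\{\mathbf{z}_{\text{A-MINT}}(n)\} &= \mathbf{z}(n-1) - \mu\left[-2\mathcal{H}^T\mathbf{d} + 2\mathcal{H}^T\mathcal{H}\,\mathbf{z}(n-1) + 2\mathcal{H}^T\mathcal{H}\,\mathbf{g}\right] \\
&= \left[\mathbf{I}_{ML_g \times ML_g} - 2\mu\mathcal{H}^T\mathcal{H}\right]\mathbf{z}(n-1) \\
&= \left[\mathbf{I}_{ML_g \times ML_g} - 2\mu\mathcal{H}^T\mathcal{H}\right]^n\mathbf{z}(0) \\
&= -\left[\mathbf{I}_{ML_g \times ML_g} - 2\mu\mathcal{H}^T\mathcal{H}\right]^n\mathbf{g},
\end{aligned} \tag{A.2}$$

where it has been assumed that $\mathbf{g}(0) = \mathbf{0}_{ML_g \times 1}$, as is common in practice.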
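The closed-form error recursion in (A.2) can be checked numerically on a small example. The sketch assumes the gradient-descent update $\mathbf{g}(n) = \mathbf{g}(n-1) - \mu[-2\mathcal{H}^T\mathbf{d} + 2\mathcal{H}^T\mathcal{H}\mathbf{g}(n-1)]$, which is the form implied by the first line of (A.2), and uses a small random matrix as a stand-in for the channel convolution matrix; the sizes and step-size rule are choices made for the example only.

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.standard_normal((6, 6))          # toy stand-in for the channel convolution matrix
g_true = rng.standard_normal(6)
d = H @ g_true                           # constructed so that g_true solves H g = d exactly

R = H.T @ H
mu = 0.4 / np.linalg.eigvalsh(R).max()   # keeps |1 - 2*mu*lambda| < 1 for every eigenvalue of R
n_iter = 50

g = np.zeros(6)                          # g(0) = 0, as assumed in the analysis
for _ in range(n_iter):
    g = g - mu * (-2.0 * H.T @ d + 2.0 * R @ g)

z_simulated = g - g_true
z_predicted = -np.linalg.matrix_power(np.eye(6) - 2.0 * mu * R, n_iter) @ g_true
print(np.max(np.abs(z_simulated - z_predicted)))
```

The agreement between the two vectors reflects the fact that each eigen-mode of $\mathcal{H}^T\mathcal{H}$ decays geometrically with factor $1 - 2\mu\lambda$, so the smallest eigenvalues determine the overall convergence rate.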

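In the proposed A-RAM algorithm, the update equations for the two sub-systems are

$$\mathbf{g}_1(n) = \mathbf{g}_1(n-1) - \mu\left[-2\mathcal{H}_1^T\mathbf{d} + 2\mathcal{H}_1^T\mathcal{H}_1\mathbf{g}_1(n-1) + 2\beta e(n-1)\mathbf{x}_1(n-1)\right], \tag{A.3}$$

$$\mathbf{g}_2(n) = \mathbf{g}_2(n-1) - \mu\left[-2\mathcal{H}_2^T\mathbf{d} + 2\mathcal{H}_2^T\mathcal{H}_2\mathbf{g}_2(n-1) - 2\beta e(n-1)\mathbf{x}_2(n-1)\right]. \tag{A.4}$$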

The update equation (A.3) is now used to illustrate the convergence of the first
sub-system of A-RAM. For clarity of presentation, the time index n of
$e(n)$, $\mathbf{g}_1(n)$ and $\mathbf{x}_1(n)$ is omitted. Following a similar approach to (A.2), one may
express $\mathbf{z}_1(n)$ using (A.3) as

$$\begin{aligned}
\mathbf{z}_1(n) &= \left[\mathbf{I}_{M_1L_g \times M_1L_g} - 2\mu\mathcal{H}_1^T\mathcal{H}_1\right]\mathbf{z}_1(n-1) - 2\mu\beta e\,\mathbf{x}_1 \\
&= \left[\mathbf{I}_{M_1L_g \times M_1L_g} - 2\mu\mathcal{H}_1^T\mathcal{H}_1\right]\mathbf{z}_1(n-1) - 2\mu\beta\left[\mathbf{x}_1^T\mathbf{g}_1\mathbf{x}_1 - \mathbf{x}_2^T\mathbf{g}_2\mathbf{x}_1\right] \\
&= \left[\mathbf{I}_{M_1L_g \times M_1L_g} - 2\mu\mathcal{H}_1^T\mathcal{H}_1\right]\mathbf{z}_1(n-1) - 2\mu\beta\left[\mathbf{x}_1^T\mathbf{z}_1(n-1)\mathbf{x}_1 - \mathbf{x}_2^T\mathbf{z}_2(n-1)\mathbf{x}_1\right].
\end{aligned} \tag{A.5}$$
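The intermediate steps behind (A.5) can be spelled out as follows. This is a sketch under two assumptions that the text leaves implicit: the true inverse filters of the first sub-system satisfy $\mathcal{H}_1^T\mathcal{H}_1\mathbf{g}_1 = \mathcal{H}_1^T\mathbf{d}$, and the auto-relation holds exactly for the true inverse filters, i.e., $\mathbf{x}_1^T\mathbf{g}_1 = \mathbf{x}_2^T\mathbf{g}_2$. Here $\mathbf{g}_1$ and $\mathbf{g}_2$ without a time index denote the true solutions, and $e$ is taken to be the difference between the outputs of the two sub-systems at iteration $n-1$.

```latex
\begin{align*}
\mathbf{z}_1(n) &= \mathbf{g}_1(n) - \mathbf{g}_1 \\
  &= \mathbf{z}_1(n-1) - \mu\left[-2\mathcal{H}_1^T\mathbf{d}
     + 2\mathcal{H}_1^T\mathcal{H}_1\left(\mathbf{z}_1(n-1) + \mathbf{g}_1\right)
     + 2\beta e\,\mathbf{x}_1\right] \\
  &= \left[\mathbf{I} - 2\mu\mathcal{H}_1^T\mathcal{H}_1\right]\mathbf{z}_1(n-1)
     - 2\mu\beta e\,\mathbf{x}_1
     && \text{since } \mathcal{H}_1^T\mathcal{H}_1\mathbf{g}_1 = \mathcal{H}_1^T\mathbf{d}, \\
e &= \mathbf{x}_1^T\left(\mathbf{z}_1(n-1) + \mathbf{g}_1\right)
     - \mathbf{x}_2^T\left(\mathbf{z}_2(n-1) + \mathbf{g}_2\right) \\
  &= \mathbf{x}_1^T\mathbf{z}_1(n-1) - \mathbf{x}_2^T\mathbf{z}_2(n-1)
     && \text{since } \mathbf{x}_1^T\mathbf{g}_1 = \mathbf{x}_2^T\mathbf{g}_2 .
\end{align*}
```

Substituting the last expression for $e$ into the third line gives the final form of the error recursion for the first sub-system.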

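It is noted that, since

$$\mathbf{x}_1^T\mathbf{z}_1(n-1)\,\mathbf{x}_1 = \left(\sum_{i=0}^{M_1L_g-1} x_{1,i}\,z_{1,i}\right)\mathbf{x}_1 = \mathbf{X}_a\,\mathbf{z}_1(n-1), \tag{A.6}$$

where

$$\mathbf{X}_a = \begin{bmatrix}
x_{1,0}^2 & x_{1,0}x_{1,1} & \cdots & x_{1,0}x_{1,M_1L_g-1} \\
x_{1,0}x_{1,1} & x_{1,1}^2 & \cdots & x_{1,1}x_{1,M_1L_g-1} \\
\vdots & \vdots & \ddots & \vdots \\
x_{1,0}x_{1,M_1L_g-1} & x_{1,1}x_{1,M_1L_g-1} & \cdots & x_{1,M_1L_g-1}^2
\end{bmatrix}, \tag{A.7}$$

one can express (A.5) as

$$\mathbf{z}_1(n) = \left[\mathbf{I}_{M_1L_g \times M_1L_g} - 2\mu\mathcal{H}_1^T\mathcal{H}_1 - 2\mu\beta\mathbf{X}_a\right]\mathbf{z}_1(n-1) + 2\mu\beta\,\mathbf{x}_2^T\mathbf{z}_2(n-1)\,\mathbf{x}_1. \tag{A.8}$$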

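Similarly, from (A.4),

$$\mathbf{z}_2(n) = \left[\mathbf{I}_{(M-M_1)L_g \times (M-M_1)L_g} - 2\mu\mathcal{H}_2^T\mathcal{H}_2 - 2\mu\beta\mathbf{X}_b\right]\mathbf{z}_2(n-1) + 2\mu\beta\,\mathbf{x}_1^T\mathbf{z}_1(n-1)\,\mathbf{x}_2, \tag{A.9}$$

where

$$\mathbf{X}_b = \begin{bmatrix}
x_{2,0}^2 & x_{2,0}x_{2,1} & \cdots & x_{2,0}x_{2,(M-M_1)L_g-1} \\
x_{2,0}x_{2,1} & x_{2,1}^2 & \cdots & x_{2,1}x_{2,(M-M_1)L_g-1} \\
\vdots & \vdots & \ddots & \vdots \\
x_{2,0}x_{2,(M-M_1)L_g-1} & x_{2,1}x_{2,(M-M_1)L_g-1} & \cdots & x_{2,(M-M_1)L_g-1}^2
\end{bmatrix}. \tag{A.10}$$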

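Substituting (A.9) into (A.8), the following can be obtained:

$$\begin{aligned}
\mathbf{z}_1(n) &= \left[\mathbf{I}_{M_1L_g \times M_1L_g} - 2\mu\mathcal{H}_1^T\mathcal{H}_1 - 2\mu\beta\mathbf{X}_a\right]\mathbf{z}_1(n-1) \\
&\quad + 2\mu\beta\,\mathbf{x}_2^T\Big\{\left[\mathbf{I}_{(M-M_1)L_g \times (M-M_1)L_g} - 2\mu\mathcal{H}_2^T\mathcal{H}_2 - 2\mu\beta\mathbf{X}_b\right]\mathbf{z}_2(n-2) + 2\mu\beta\,\mathbf{x}_1^T\mathbf{z}_1(n-2)\,\mathbf{x}_2\Big\}\mathbf{x}_1 \\
&= \left[\mathbf{I}_{M_1L_g \times M_1L_g} - 2\mu\mathcal{H}_1^T\mathcal{H}_1 - 2\mu\beta\mathbf{X}_a\right]\mathbf{z}_1(n-1) \\
&\quad + 2\mu\beta\,\mathbf{x}_2^T\Big\{\left[\mathbf{I}_{(M-M_1)L_g \times (M-M_1)L_g} - 2\mu\mathcal{H}_2^T\mathcal{H}_2 - 2\mu\beta\mathbf{X}_b\right] \\
&\qquad \cdot \Big(\left[\mathbf{I}_{(M-M_1)L_g \times (M-M_1)L_g} - 2\mu\mathcal{H}_2^T\mathcal{H}_2 - 2\mu\beta\mathbf{X}_b\right]\mathbf{z}_2(n-3) + 2\mu\beta\,\mathbf{x}_1^T\mathbf{z}_1(n-3)\,\mathbf{x}_2\Big) \\
&\qquad + 2\mu\beta\,\mathbf{x}_1^T\mathbf{z}_1(n-2)\,\mathbf{x}_2\Big\}\mathbf{x}_1.
\end{aligned} \tag{A.11}$$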

Similar to [42], a small step-size is assumed such that the higher-order terms of $\mu$ are
sufficiently small to be approximated by 0, i.e., $\mu^n \to 0$ for $n \geq 2$.

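Hence,

$$\begin{aligned}
E\{\mathbf{z}_1(n)\} &= \left[\mathbf{I}_{M_1L_g \times M_1L_g} - 2\mu\mathcal{H}_1^T\mathcal{H}_1 - 2\mu\beta E\{\mathbf{X}_a\}\right]E\{\mathbf{z}_1(n-1)\} \\
&\quad + 2\mu\beta\,E\Big\{\mathbf{x}_2^T\Big\{\left[\mathbf{I}_{(M-M_1)L_g \times (M-M_1)L_g} - 2\mu\mathcal{H}_2^T\mathcal{H}_2 - 2\mu\beta E\{\mathbf{X}_b\}\right]^2\mathbf{z}_2(n-2)\Big\}\mathbf{x}_1\Big\} \\
&= \left[\mathbf{I}_{M_1L_g \times M_1L_g} - 2\mu\mathcal{H}_1^T\mathcal{H}_1 - 2\mu\beta E\{\mathbf{X}_a\}\right]E\{\mathbf{z}_1(n-1)\} \\
&\quad - 2\mu\beta\,E\Big\{\mathbf{x}_2^T\Big\{\left[\mathbf{I}_{(M-M_1)L_g \times (M-M_1)L_g} - 2\mu\mathcal{H}_2^T\mathcal{H}_2 - 2\mu\beta E\{\mathbf{X}_b\}\right]^n\mathbf{g}_2\Big\}\mathbf{x}_1\Big\}.
\end{aligned} \tag{A.12}$$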

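Furthermore, it can be observed from (A.7) that

$$E\{\mathbf{X}_a\} = \begin{bmatrix}
\sigma^2 & & & 0 \\
 & \sigma^2 & & \\
 & & \ddots & \\
0 & & & \sigma^2
\end{bmatrix}_{M_1L_g \times M_1L_g} \tag{A.13}$$

and similarly

$$E\{\mathbf{X}_b\} = \begin{bmatrix}
\sigma^2 & & & 0 \\
 & \sigma^2 & & \\
 & & \ddots & \\
0 & & & \sigma^2
\end{bmatrix}_{(M-M_1)L_g \times (M-M_1)L_g}. \tag{A.14}$$

In addition, denoting $\left[\mathbf{I}_{(M-M_1)L_g \times (M-M_1)L_g} - 2\mu\mathcal{H}_2^T\mathcal{H}_2 - 2\mu\beta E\{\mathbf{X}_b\}\right]^n\mathbf{g}_2$ in (A.12)
as $\mathbf{A}_{(M-M_1)L_g \times 1}$, one can approximate

$$E\left\{\mathbf{x}_2^T\mathbf{A}\,\mathbf{x}_1\right\} = E\left\{\begin{bmatrix}
x_{1,0}\sum_{i=0}^{(M-M_1)L_g-1} x_{2,i}A_i \\
x_{1,1}\sum_{i=0}^{(M-M_1)L_g-1} x_{2,i}A_i \\
\vdots \\
x_{1,M_1L_g-1}\sum_{i=0}^{(M-M_1)L_g-1} x_{2,i}A_i
\end{bmatrix}_{M_1L_g \times 1}\right\} = \mathbf{0}_{M_1L_g \times 1}. \tag{A.15}$$

The above results from the independence of $\mathbf{h}_i$, $i = 1, \ldots, M$, i.e.,

$$E\{x_{1,0}\,x_{2,0}\} = 0. \tag{A.16}$$

As a result of (A.15), (A.12) can be simplified to

$$E\{\mathbf{z}_1(n)\} = \left[\mathbf{I}_{M_1L_g \times M_1L_g} - 2\mu\mathcal{H}_1^T\mathcal{H}_1 - 2\mu\beta E\{\mathbf{X}_a\}\right]E\{\mathbf{z}_1(n-1)\}. \tag{A.17}$$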
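The two expectations used in (A.13) to (A.16) can be checked with a short Monte Carlo experiment. The sketch idealises the received-signal vectors of the two sub-systems as independent, zero-mean, white Gaussian vectors, which is a simplifying assumption made for illustration; the dimensions, the variance and the fixed vector `A` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, trials, sigma = 4, 3, 200_000, 1.5
A = rng.standard_normal(n2)                        # fixed vector playing the role of A in (A.15)

x1 = sigma * rng.standard_normal((trials, n1))     # each row is one draw of x_1
x2 = sigma * rng.standard_normal((trials, n2))     # drawn independently of x_1

Xa_mean = x1.T @ x1 / trials                       # sample estimate of E{x_1 x_1^T}
cross_mean = ((x2 @ A)[:, None] * x1).mean(axis=0) # sample estimate of E{(x_2^T A) x_1}

print(np.round(Xa_mean, 2))                        # close to sigma^2 times the identity
print(np.round(cross_mean, 3))                     # close to the zero vector
```

Under this idealisation the auto-relation term contributes $2\mu\beta\sigma^2$ to every diagonal entry of the matrix governing the error recursion, which is the regularization-like effect referred to in the analysis.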

It should be noted that the update equation of A-RMINT can be formulated
from (5.12) and (6.16) as

$$\mathbf{g}(n+1) = \mathbf{g}(n) - \mu\left(-2\mathcal{H}^T\mathbf{d} + 2\left[\mathcal{H}^T\mathcal{H} + \delta\mathbf{I}_{ML_g \times ML_g}\right]\mathbf{g}(n)\right). \tag{A.18}$$
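The effect of the regularization term in this update can be illustrated numerically. The sketch below iterates $\mathbf{g}(n+1) = \mathbf{g}(n) - \mu(-2\mathcal{H}^T\mathbf{d} + 2[\mathcal{H}^T\mathcal{H} + \delta\mathbf{I}]\mathbf{g}(n))$ on a small random matrix standing in for the channel convolution matrix, and compares the contraction factor of the slowest mode with and without $\delta$. The matrix size, step-size rule and iteration count are assumptions chosen for the example; only $\delta = 0.018$ is taken from the simulations.

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.standard_normal((20, 6))             # toy stand-in for the channel convolution matrix
d = np.zeros(20)
d[0] = 1.0                                   # target: Kronecker delta
delta = 0.018

R = H.T @ H
lam = np.linalg.eigvalsh(R)                  # eigenvalues of H^T H in ascending order
mu = 0.4 / (lam.max() + delta)               # keeps every mode inside the stable region

g = np.zeros(6)
for _ in range(2000):
    g = g - mu * (-2.0 * H.T @ d + 2.0 * (R + delta * np.eye(6)) @ g)

g_fixed = np.linalg.solve(R + delta * np.eye(6), H.T @ d)   # fixed point of the regularized update
slowest_plain = 1.0 - 2.0 * mu * lam.min()                  # slowest-mode factor without delta
slowest_regularized = 1.0 - 2.0 * mu * (lam.min() + delta)  # slowest-mode factor with delta
print(np.linalg.norm(g - g_fixed), slowest_plain, slowest_regularized)
```

For the same step-size, adding $\delta$ shifts every eigenvalue upwards, so the slowest mode contracts faster, at the cost of converging to the regularized solution rather than the exact inverse.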

Following the same derivation of (A.2), the convergence behavior of A-RMINT can
be obtained as

Appendix B
Publications arising from this thesis
Journal papers
1. L. Liao and A. W. H. Khong, "An adaptive sub-systems based algorithm for channel equalization in a SIMO system," IEEE Trans. Circuits and Systems-I, (accepted for publication), 2012.
2. L. Liao and A. W. H. Khong, "Adaptive channel equalization of room acoustics exploiting sparseness constraint," IEEE Signal Process. Lett., vol. 18, pp. 275-278, 2011.

Conference papers
1. L. Liao and A. W. H. Khong, "Equalization of multichannel acoustic system using sub-systems for speech dereverberation," in Proc. IEEE Int. Conf. Acoust., Speech and Signal Process., May 22-27, 2011, pp. 313-316.
2. L. Liao and A. W. H. Khong, "Sparseness-controlled affine projection algorithm for echo cancellation," in Proc. Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA-ASC), Dec. 2010.
3. L. Liao and A. W. H. Khong, "A noise robust multichannel algorithm for blind estimation of room impulse responses," in Proc. Int. Workshop on Acoust. Echo and Noise Control, 2010.

Bibliography
[1] A. W. H. Khong, "Adaptive algorithms employing tap selection for single channel and stereophonic acoustic echo cancellation," Ph.D. dissertation, Imperial College London, 2006.

[2] L. Rabiner and R. Schafer, Digital processing of speech signals. NJ: Prentice-Hall, 1978.

[3] S. Weinstein, "Echo cancellation in the telephone network," in IEEE Commun. Mag., vol. 15, no. 1, Jan. 1977, pp. 8-15.

[4] M. M. Sondhi and D. A. Berkley, "Silencing echoes on the telephone network," IEEE Proceedings, vol. 68, pp. 948-963, Aug. 1980.

[5] R. Mofett, "Echo and delay problems in some digital communication systems," IEEE Commun. Mag., vol. 25, pp. 41-47, 1987.

[6] S. V. Vaseghi, Multimedia Signal Processing: Theory and Applications in Speech, Music and Communications. John Wiley & Sons, Ltd, 2007.

[7] K. Miura, H. Fujiya, T. Mizuno, and T. Ushiki, "Cell-based echo canceller for voice communications over ATM networks," in Proc. IEEE Telecommunications Conference, vol. 1, 1995, pp. 77-82.


[8] M. Berouti, R. Schwartz, and J. Makhoul, "Enhancement of speech corrupted by acoustic noise," in Proc. IEEE Int. Conf. Acoust., Speech and Signal Process., Apr. 1979, pp. 208-211.

[9] B. Yegnanarayana and P. S. Murthy, "Enhancement of reverberant speech using LP residual signal," IEEE Trans. Speech and Audio Process., vol. 8, pp. 267-281, May 2000.

[10] Y. Song, Z. Q. Zhao, W. Zhang, and S. Q. Yu, "Research on the reverberation absorption coefficient of material measured by underwater reverberation field method," in Proc. IEEE Symposium Piezoelectricity, Acoustic Waves and Device Applications, 2011, pp. 130-133.

[11] J. A. Moorer, "About this reverberation business," Computer Music Journal, vol. 3, no. 2, pp. 13-28, 1979.

[12] D. T. Murphy, D. M. Howard, and A. M. Tyrrell, "Multi-channel reverberation for computer music application," in Proc. IEEE Workshop Signal Process. Systems, 1998, pp. 210-219.

[13] K. S. Helfer, "Aging and binaural advantage in reverberation and noise," Journal of Speech and Hearing Research, vol. 35, pp. 1394-1401, Dec. 1992.

[14] P. Jinachitra and R. E. Prieto, "Towards speech recognition oriented dereverberation," in Proc. IEEE Int. Conf. Acoust., Speech and Signal Process., 2005, pp. 437-440.

[15] B. W. Gillespie and L. E. Atlas, "Acoustic diversity for improved speech recognition in reverberant environments," in Proc. IEEE Int. Conf. Acoust., Speech and Signal Process., vol. 1, 2002, pp. 557-560.


[16] A. Sehr and W. Kellermann, "Strategies for modeling reverberant speech in the feature domain," in Proc. IEEE Int. Conf. Acoust., Speech and Signal Process., 2009, pp. 3725-3728.

[17] T. Gansler, J. Benesty, S. L. Gay, and M. M. Sondhi, "A robust proportionate affine projection algorithm for network echo cancelation," in Proc. IEEE Int. Conf. Acoust., Speech and Signal Process., vol. 2, 2000, pp. 793-796.

[18] O. Hoshuyama, R. A. Goubran, and A. Sugiyama, "A generalized proportionate variable step-size algorithm for fast changing acoustic environments," in Proc. IEEE Int. Conf. Acoust., Speech and Signal Process., vol. IV, 2004, pp. 161-164.

[19] K. Sakhnov, "An improved proportionate affine projection algorithm for network echo cancelation," in Proc. IEEE Int. Conf. System Signals and Image Process., Jun. 2008, pp. 125-128.

[20] Y. Huang and J. Benesty, "Adaptive blind channel identification: Multichannel least mean square and Newton algorithms," in Proc. IEEE Int. Conf. Acoust., Speech and Signal Process., May 2002.

[21] Y. Huang and J. Benesty, "A class of frequency-domain adaptive approaches to blind multichannel identification," IEEE Trans. Signal Process., vol. 51, pp. 11-24, Jan. 2003.

[22] R. Ahmad, A. W. H. Khong, M. K. Hasan, and P. A. Naylor, "An extended normalized multichannel FLMS algorithm for blind channel identification," in Proc. European Signal Process. Conference, 2006.


[23] P. Loganathan, A. W. H. Khong, and P. A. Naylor, "A class of sparseness-controlled algorithms for echo cancellation," IEEE Trans. Audio, Speech and Lang. Process., vol. 17, pp. 1591-1601, Nov. 2009.

[24] L. Liao and A. W. H. Khong, "Sparseness-controlled affine projection algorithm for echo cancellation," in Proc. Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA-ASC), Dec. 2010.

[25] J. J. Shynk, "Frequency-domain and multirate adaptive filtering," IEEE Signal Process. Mag., vol. 9, pp. 14-37, Jan. 1992.

[26] M. K. Hasan, J. Benesty, P. A. Naylor, and D. B. Ward, "Improving robustness of blind adaptive multichannel identification algorithms using constraints," in Proc. European Signal Process., 2005.

[27] L. Liao and A. W. H. Khong, "A noise robust multichannel algorithm for blind estimation of room impulse responses," in Proc. Int. Workshop on Acoust. Echo and Noise Control, 2010.

[28] W. Zhang, A. W. H. Khong, and P. A. Naylor, "Adaptive inverse filtering of room acoustics," in Proc. Asilomar Conf. Signals, Systems, Computers, Oct. 2008, pp. 788-792.

[29] L. Liao and A. W. H. Khong, "Adaptive channel equalization of room acoustics exploiting sparseness constraint," IEEE Signal Process. Lett., vol. 18, pp. 275-278, 2011.

[30] L. Liao and A. W. H. Khong, "Equalization of multichannel acoustic system using sub-systems for speech dereverberation," in Proc. IEEE Int. Conf. Acoust., Speech and Signal Process., May 22-27 2011, pp. 313-316.


[31] M. Nekuii and M. Atarodi, "A fast converging algorithm for network echo
cancelation," IEEE Signal Process. Lett., vol. 11, pp. 427-430, 2004.
[32] D. Sabolic, R. Malaric, and A. Bazant, "Some properties of impulse response
of distribution networks with periodic tree-type topology," IEEE Trans. Power
Delivery, vol. 26, pp. 2416-2427, 2011.
[33] D. L. Duttweiler, "Proportionate normalized least mean squares adaptation in
echo cancellers," IEEE Trans. Speech and Audio Process., vol. 8, pp. 508-518,
Sep. 2000.
[34] J. Benesty and S. L. Gay, "An irnproved PNLMS algorithm," in Proc. IEEE
Int. Conf. Acoust., Speech and Signal Process., vol. 2,2002, pp. 1881-1884.
[35] A. W. H. Khong and P. A. Naylor, "Efficient use of sparse adaptive filters," in
Proc. Int. Conf. Signals, Systerns and Computers, Oct. 2006, pp. 1375-1379.
[36] A. N. Birkett and R. A. Goubran, "Acoustic echo cancellation for hands-free
telephony using neural networks," in Proc. IEEE Workshop Neural Networks
for Signal Process., Sep. 1994, pp. 249-258.
[37] J. Beh, T. Lee, I. Lee, H. Kim, S. Ahn, and H. Ko, "Combining acoustic echo
cancellation and adaptive beamforming for achieving robust speech interface
in mobile robot," in Proc. IEEE Int. Conf. Intelligent Robots and Systems,
Sep. 2008, pp. 1693-1698.
[38] K. Ozeki and T. Umeda, "An adaptive filtering algorithm using an orthogonal
projection to an affine subspace and its properties," Trans. Electronics and
Commun. in Japan, vol. 67-A, no. 2, pp. 126-132, Feb. 1984.


[39] A. W. H. Khong, P. A. Naylor, and J. Benesty, "A low delay and fast converging
improved proportionate algorithm for sparse system identification," EURASIP
Journal Audio, Speech and Music Process., vol. 2007, Jan. 2007.
[40] A. W. H. Khong, X. S. Lin, and P. A. Naylor, "Algorithms for identifying
clusters of near common zeros in multichannel blind system identification and
equalization," in Proc. IEEE Int. Conf. Acoust. Speech and Signal Process.,
2008, pp. 389-392.
[41] J. Benesty, Y. A. Huang, J. Chen, and P. A. Naylor, "Adaptive algorithms for
the identification of sparse impulse responses," in Topics in Acoust. Echo and
Noise Control, 2006, ch. 5, pp. 125-153.


[42] S. Haykin, Adaptive Filter Theory, 4th ed. Prentice Hall, 2002.

[43] C. Paleologu, J. Benesty, and S. Ciochina, "Regularization of the affine projection algorithm," IEEE Trans. Circuits and Systems-II: Express Briefs, vol. 58,
pp. 366-370, Jun. 2011.
[44] H.-C. Shin and W.-J. Song, "Affine projection algorithms with adaptive
regularization matrix," in Proc. IEEE Int. Conf. Acoust., Speech and Signal
Process., vol. 3, 2006, p. III.


[45] P. O. Hoyer, "Non-negative matrix factorization with sparseness constraints,"
Journal of Machine Learning Research, vol. 5, pp. 1457-1469, Nov. 2004.


[46] J. Polack, "La transmission de l'énergie sonore dans les salles,"
Ph.D. dissertation, Université du Maine, 1988.


[47] Acoustics - Measurement of the Reverberation Time of Rooms with
Reference to Other Acoustical Parameters, Geneva, Switzerland: International
Standards Organization, 1997, ISO 3382-1997.
[48] E. K. Miller, "A comparison of solution accuracy resulting from factoring and
inverting ill-conditioned matrices," in Proc. IEEE Int. Symposium Antennas
and Propagation Society, 1999, pp. 866-869.


[49] G. H. Golub and C. F. Van Loan, Matrix Computations. Johns Hopkins Univ.
Press, 1988.
[50] H. R. Abutalebi, H. Sheikhzadeh, R. L. Brennan, and G. H. Freeman, "Affine
projection algorithm for oversampled subband adaptive filters," in Proc. IEEE
Int. Conf. on Acoust., Speech and Signal Process., vol. 6, 2003, pp. 209-212.
[51] J. B. Allen and D. A. Berkley, "Image method for efficiently simulating
small-room acoustics," J. Acoust. Soc. Amer., vol. 65, pp. 943-950, Apr. 1979.
[52] D. H. Pham and J. H. Manton, "A subspace algorithm for guard interval based
channel identification and source recovery requiring just two received blocks,"
in Proc. IEEE Int. Conf. Acoust., Speech and Signal Process., 2003.
[53] H. Xu, S. Dasgupta, and Z. Ding, "A novel channel identification method for
fast wireless communication systems," in Proc. IEEE Int. Conf. Commun.,
2001, pp. 2443-2448.
[54] Y. Sato, "A method of self-recovering equalization for multilevel
amplitude-modulation," IEEE Trans. Commun., vol. COM-23, pp. 679-682, 1975.


[55] A. Benveniste, M. Goursat, and G. Ruget, "Robust identification of a nonminimum phase system: Blind adjustment of a linear equalizer in data communications," IEEE Trans. Automat. Contr., vol. 2, pp. 385-399, Jun. 1980.

[56] A. Benveniste and M. Goursat, "Blind equalizers," IEEE Trans. Commun.,
vol. COM-32, no. 8, pp. 871-883, Aug. 1984.

[57] Z. Ding, R. A. Kennedy, B. D. O. Anderson, and C. R. Johnson, Jr.,
"Ill-convergence of Godard blind equalizers in data communication systems,"
IEEE Trans. Commun., vol. 39, pp. 1313-1327, Sep. 1991.


[58] W. A. Gardner, "A new method of channel identification," IEEE Trans.
Commun., vol. 39, no. 6, pp. 813-817, Jun. 1991.


[59] G. Xu, H. Liu, L. Tong, and T. Kailath, "A least-squares approach to blind
channel identification," IEEE Trans. Signal Process., vol. 43, pp. 2982-2993,
Dec. 1995.

[60] Y. Huang and J. Benesty, "Adaptive multi-channel least mean square and
Newton algorithms for blind channel identification," Signal Process., vol. 82,
pp. 1127-1138, Aug. 2002.

[61] D. Schmid and G. Enzner, "Evaluation of adaptive blind SIMO identification
in terms of a normalized filter-projection misalignment," in Proc. IEEE Int.
Conf. Acoust., Speech and Signal Process., May 2001, pp. 4140-4143.
[62] L. Tong and Q. Zhao, "Joint order detection and blind channel estimation by
least squares smoothing," IEEE Trans. Signal Process., vol. 47, pp. 2345-2355,
Sep. 1999.


[63] N. D. Gaubitch, J. Benesty, and P. A. Naylor, "Adaptive common root
estimation and common zeros problem in blind channel identification," in Proc.
European Signal Process. Conference, 2005.


[64] D. R. Morgan, J. Benesty, and M. M. Sondhi, "On the evaluation of estimated
impulse responses," IEEE Signal Process. Lett., vol. 3, pp. 174-176, Jul. 1998.
[65] A. W. H. Khong and P. A. Naylor, "Selective-tap adaptive filtering with
performance analysis for identification of time-varying systems," IEEE Trans.
Audio, Speech, and Lang. Process., vol. 15, pp. 1681-1695, Jul. 2007.
[66] M. Jeub, C. Nelke, C. Beaugeant, and P. Vary, "Blind estimation of the
coherent-to-diffuse energy ratio from noisy speech signals," in Proc. European
Signal Processing Conference, Aug.-Sep. 2011.


[67] J. Benesty, M. M. Sondhi, and Y. Huang, Springer Handbook of Speech
Processing, J. Benesty, M. M. Sondhi, and Y. Huang, Eds. Springer-Verlag New
York, Inc., 2007.


[68] S. Hiroya and M. Honda, "Estimation of articulatory movements from speech
acoustics using an HMM-based speech production model," IEEE Trans. Speech
and Audio Process., vol. 12, pp. 175-185, 2004.


[69] A. M. Kondoz, Digital speech coding for low bit rate communication systems.
John Wiley and Sons, Ltd, 2004, ch. 4: Speech signal analysis and modelling,
pp. 57-85.
[70] M. Jeub and P. Vary, "Enhancement of reverberant speech using the CELP
postfilter," in Proc. IEEE Int. Conf. Acoust., Speech and Signal Process., 2009,
pp. 3993-3996.


[71] K. Kinoshita, T. Nakatani, and M. Miyoshi, "Blind upmix of stereo music
signals using multi-step linear prediction based reverberation extraction," in
Proc. IEEE Int. Conf. Acoust., Speech and Signal Process., 2011, pp. 49-52.

[72] B. Yegnanarayana, S. R. M. Prasanna, and K. S. Rao, "Speech enhancement
using excitation source information," in Proc. IEEE Int. Conf. Acoust., Speech
and Signal Process., vol. 1, 2002, pp. 541-544.

[73] R. G. Lyons, Understanding Digital Signal Processing, 2nd ed. Prentice Hall,
Mar. 2004.
[74] S. Griebel and M. Brandstein, "Wavelet transform extrema clustering for
multi-channel speech dereverberation," in Proc. Int. Workshop on Acoust. Echo and
Noise Control, 1999.

[75] B. W. Gillespie, H. S. Malvar, and D. A. F. Florencio, "Speech dereverberation
via maximum-kurtosis subband adaptive filtering," in Proc. IEEE Int. Conf.
Acoust., Speech and Signal Process., 2001, pp. 3701-3704.

[76] T. Nakatani and M. Miyoshi, "Blind dereverberation of single channel speech
signal based on harmonic structure," in Proc. IEEE Int. Conf. Acoust., Speech
and Signal Process., vol. 1, Apr. 2003, pp. 1-92-5.

[77] T. Nakatani, M. Miyoshi, and K. Kinoshita, "Single microphone blind
dereverberation," in Speech Enhancement. Springer, Berlin, Heidelberg, 2005, ch. 11,
pp. 247-270.
[78] T. A. Palka and D. W. Tufts, "Reverberation characterization and suppression
by means of principal components," in Proc. IEEE OCEANS, vol. 3, 1998, pp.
1501-1506.


[79] M. Wu and D. Wang, "A two-stage algorithm for enhancement of reverberant
speech," in Proc. IEEE Int. Conf. Acoust., Speech and Signal Process., 2005,
pp. 1085-1088.

[80] E. A. P. Habets, S. Gannot, and I. Cohen, "Speech dereverberation using
backward estimation of the late reverberant spectral variance," in Proc. IEEE
Conv. Electrical and Electronics Engineers in Israel, 2008, pp. 384-388.

[81] H. W. Löllmann and P. Vary, "A blind speech enhancement algorithm for the
suppression of late reverberation and noise," in Proc. IEEE Conf. Acoust.,
Speech and Signal Process., 2009, pp. 3989-3992.

[82] E. A. P. Habets, "Multi-channel speech dereverberation based on statistical
model of late reverberation," in Proc. IEEE Int. Conf. Acoust., Speech and
Signal Process., 2005, pp. 173-176.

[83] K. Lebart, J. M. Boucher, and P. N. Denbigh, "A new method based on spectral
subtraction for speech dereverberation," in Acta Acustica united with Acustica,
vol. 87, no. 3, 2001, pp. 359-368.

[84] E. A. P. Habets and S. Gannot, "Dual-microphone speech dereverberation in
a noisy environment," in Proc. IEEE Symposium Signal Process. and
Information Technology, 2006, pp. 651-655.

[85] I. Cohen, "Relaxed statistical model for speech enhancement and a priori SNR
estimation," IEEE Trans. Speech and Audio Process., vol. 13, pp. 870-881,
Sep. 2005.


[86] E. A. P. Habets and S. Gannot, "Late reverberant spectral variance estimation
based on a statistical model," IEEE Signal Process. Lett., vol. 16, pp. 770-773,
Sep. 2009.
[87] E. Moulines, P. Duhamel, J. Cardoso, and S. Mayrargue, "Subspace methods
for the blind identification of multichannel FIR filters," in Proc. IEEE Int.
Conf. Acoust., Speech and Signal Process., vol. 4, Apr. 1994, pp. 573-576.
[88] L. Hoteit, "Extending the subspace method for blind identification," in Proc.
IEEE Int. Conf. Signal Process., vol. 1, Oct. 1998, pp. 347-350.
[89] D. Slock, "Blind fractionally-spaced equalization, perfect reconstruction
filterbanks, and multilinear prediction," in Proc. IEEE Int. Conf. Acoust., Speech
and Signal Process., Apr. 1994, pp. 585-588.


[90] Y. Hua, "Fast maximum likelihood for blind identification of multiple FIR
channels," IEEE Trans. Signal Process., vol. 86, pp. 1951-1968, Oct. 1996.
[91] Q. Zhao and L. Tong, "Adaptive blind channel estimation by least squares
smoothing," IEEE Trans. Signal Process., vol. 47, pp. 3000-3012, Nov. 1999.
[92] R. Ahmad, A. W. H. Khong, and P. A. Naylor, "A practical adaptive blind
multichannel estimation algorithm with application to acoustic impulse
responses," in Proc. IEEE Int. Conf. Digital Signal Process., 2007, pp. 31-34.
[93] S. Douglas, S. Amari, and S. Y. Kung, "On gradient adaptation with
unit-norm constraints," Proc. IEEE Int. Conf. Acoust., Speech and Signal Process.,
vol. 48, pp. 1843-1847, 2000.
[94] P. A. Naylor and N. D. Gaubitch, "Speech dereverberation," in Proc. Int.
Workshop Acoust. Echo Noise Control, Sep. 2005.


[95] J. Mourjopoulos, P. Clarkson, and J. Hammond, "A comparative study of
least-squares and homomorphic techniques for the inversion of mixed phase
signals," in Proc. IEEE Int. Conf. Acoust., Speech and Signal Process., vol. 7,
May 1982, pp. 1958-1961.
[96] B. Radlovic and R. Kennedy, "Nonminimum-phase equalization and its
subjective importance in room acoustics," IEEE Trans. Speech Audio Process.,
vol. 8, no. 6, pp. 728-737, Nov. 2000.
[97] N. D. Gaubitch and P. A. Naylor, "Equalization of multichannel acoustic
systems in oversampled subbands," IEEE Trans. on Audio Speech and Lang.
Process., vol. 17, pp. 1061-1070, 2009.
[98] M. Miyoshi and Y. Kaneda, "Inverse filtering of room acoustics," IEEE Trans.
Acoust. Speech Signal Process., vol. 36, pp. 145-152, Feb. 1988.
[99] S. T. Neely and J. B. Allen, "Invertibility of a room impulse response," J.
Acoust. Soc. Amer., vol. 66, pp. 165-169, Jul. 1979.


[100] M. Miyoshi and Y. Kaneda, "Inverse control of room acoustics using multiple
loudspeakers and/or microphones," in Proc. IEEE Int. Conf. Acoust., Speech
and Signal Process., Apr. 1986, pp. 917-920.
[101] T. Hikichi, M. Delcroix, and M. Miyoshi, "Inverse filtering for speech
dereverberation less sensitive to noise and room transfer function fluctuations,"
EURASIP Journal on Advances in Signal Process., vol. 2007, Jan. 2007.
[102] T. K. Moon, Mathematical methods and algorithms for signal processing.
Prentice Hall, 2000.


[103] A. Blin, S. Araki, and S. Makino, "Blind source separation when speech signals
outnumber sensors using a Sparseness-Mixing Matrix Estimation (SMME)," in
Proc. Int. Workshop on Acoust. Echo and Noise Control, Sep. 2003.
[104] S. Wang and A. Sekey, "An objective measure for predicting subjective quality
of speech coders," IEEE Trans. Selected Areas in Commun., vol. 10,
pp. 819-829, 1992.
[105] R. Stewart and M. Sandler, "Database of omnidirectional and B-format impulse
responses," in Proc. IEEE Int. Conf. Acoust., Speech and Signal Process.,
Mar. 2010, pp. 165-168.
