
Subband Weighting With Pixel Connectivity for 3-D Wavelet Coding

Presented by R. Kanagavalli (ECE, 3rd year)

IEEE TRANSACTIONS ON IMAGE PROCESSING
Abstract: Performing optimal bit-allocation with 3-D wavelet coding methods is difficult because energy is not conserved after applying the motion-compensated temporal filtering (MCTF) process and the spatial wavelet transform. The problem cannot be solved by extending the 2-D wavelet coefficients weighting method directly and then applying the result to 3-D wavelet coefficients, since this approach does not consider the complicated pixel connectivity that results from the lifting-based MCTF process. In this paper, we propose a novel weighting method, which takes account of the pixel connectivity, to solve the problem and derive the effect of the quantization error of a subband on the reconstruction error of a group of pictures. We employ the proposed method on a 2-D+t structure with different temporal filters, namely the 5-3 filter and the 9-7 filter. Experiments on various coding parameters and sequences show that the proposed approach improves the bit-allocation performance over that obtained by using the weightings derived without considering the pixel connectivity in the MCTF process.

Index Terms: Bit allocation, motion-compensated temporal filtering (MCTF), 3-D wavelet coding.
Manuscript received September 24, 2007; revised August 08, 2008. First published December 2, 2008; current version published December 12, 2008. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Giovanni Poggi. C.-C. Cheng is with the Institute of Information Science, Academia Sinica, Nankang, Taipei, 11529 Taiwan, R.O.C. (e-mail: eddie.cheng@oba.co.uk). W.-L. Hwang is with the Institute of Information Science, Academia Sinica, Nankang, Taipei, 11529 Taiwan, R.O.C., and also with the Digital Technology Department, Kainan University, Taoyuan, Taiwan, R.O.C. (e-mail: whwang@iis.sinica.edu.tw). G. J. Peng is with the Graduate Institute of Electronics Engineering, National Taiwan University, Taipei, Taiwan, R.O.C. (e-mail: B89052@csie.ntu.edu.tw). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIP.2008.2007067

I. INTRODUCTION
MULTICAST is a promising technique that delivers multimedia data to a large number of subscribers simultaneously over the Internet. However, this is a very challenging task because it is necessary to handle content that has vastly different characteristics in terms of display resolution, play rate, and quality, all of which are constrained by limited bandwidth. In recent years, scalable video codecs have become increasingly important because they facilitate multimedia multicast over heterogeneous network environments [1], [21]. The multiresolution property of 3-D wavelet representation based on motion-compensated temporal filtering (MCTF) is a natural way to solve the scalability issue in video coding [2], [3], [8], [13]. Even so, to compete with the considerable success of conventional scalable coding methods based on H.264, the MCTF-based 3-D wavelet video codec must be constantly improved.

The technique used to solve the optimal bit-allocation problem is an important feature that contributes to the success of current video coding. For example, in wavelet-based image coding methods, it is widely recognized that the reconstruction error variance in the pixel domain does not equal the variance resulting from quantizing the subband coefficients. The problem has been elegantly solved for 2-D wavelet coefficients [16], [18] by assigning different weights to the subbands, resulting in equivalent reconstruction error variance and quantization error variance in the wavelet domain. The analysis employs the high bit rate assumption to simplify the derivation process. The assumption supposes that the bit rate is so high that the quantization error of the coefficients of a subband resulting from applying the optimal bit-allocation procedure can be modeled as independent and identically distributed white noise.

Solving the optimal bit-allocation problem in 3-D wavelet coding is more complicated because the energy difference between the pixel and wavelet domains results from using a bi-orthogonal wavelet and the MCTF process. The latter imposes a different connectivity status (single-connected, multiple-connected, or unconnected) on each pixel during motion prediction.
Thus, direct extension of the methods in [16] and [18] from 2-D wavelet coefficients to 3-D wavelet coefficients cannot solve the 3-D wavelet optimal bit-allocation problem, since the extension does not take account of the pixel connectivity in the MCTF process. For the Haar wavelet used in temporal filtering, Ohm [9] gives a detailed description of how to derive different quantization weights for low-pass and high-pass components. The weights are adjusted according to the positions of unconnected and multiple-connected pixels. However, it is very hard to extend the approach to other temporal filters because adjusting the weights of unconnected and multiple-connected pixels is more complicated.
In this paper, we address the above problem for a 3-D wavelet coding method. We derive the weighting factors based on the analysis of pixel connectivity in the MCTF process and show that the reconstruction error of a group of pictures (GOP) can be derived from a weighting of the quantization error on each spatial-temporal subband. Our analysis is based on Usevitch's derivation of the weighting factors for image coding under the high bit rate assumption and Girod and Han's representation of MCTF's prediction and update steps.

Depending on the way a spatial-temporal subband is obtained, current 3-D wavelet coding schemes can be divided into two categories: t+2-D (MCTF is applied first) and 2-D+t (the spatial wavelet transform is applied first). The t+2-D scheme yields a high coding gain and a low coding complexity in full-resolution frames. The issues of optimal bit allocation under the t+2-D scheme are discussed in [19]. Since the motion vectors of the 2-D+t scheme can be estimated more accurately and can be subband-dependent, the scheme is a good candidate for scalable video coding [10]. In this paper, we base our analysis and experiments on the 2-D+t scheme. The experiment results show that our approach improves the bit-allocation performance over that obtained by applying the weighting factors derived without considering pixel connectivity in the MCTF process. Note that, although our derivation is based on the 2-D+t structure, it can also be applied to the t+2-D structure.

CHENG et al.: SUBBAND WEIGHTING WITH PIXEL CONNECTIVITY FOR 3-D WAVELET CODING 53

The remainder of this paper is organized as follows. In Section II, we briefly review the 2-D+t wavelet coding scheme. In Section III, we explain how to derive spatial weighting factors, and discuss the derivation of the weighting factors for the 2-D+t scheme. In Section IV, we formulate the bit-allocation problem of an MCTF-based wavelet codec, and propose an efficient algorithm to obtain its suboptimal solution. We report the experiment results in Section V, and then summarize our conclusions in Section VI.
II. 2-D+t WAVELET VIDEO CODING SCHEME
In a 2-D+t coding scheme, video frames are first spatially decomposed into multiple subbands, after which lifting-based MCTF is applied to each subband separately. In lifting-based MCTF, motion compensation is implemented by using even frames to predict odd ones. We use the 5-3 temporal filter to demonstrate the MCTF process [4]. The lifting-based MCTF process using the 9-7 temporal filter is described in [20].

For each frame, consider the discrete wavelet coefficients (obtained with decimated filter banks) of each subband at each level of decomposition, with levels numbered from zero. In addition, consider the forward and backward motion vectors obtained by applying motion estimation methods on the dyadic wavelet coefficients of the frames, where the dyadic wavelet coefficients are obtained by an un-decimated filter bank [12]. The high-pass coefficients of a block are given by (1), which is expressed in terms of the dyadic wavelet coefficients of the same subband and level in the frames concerned and the forward and backward motion vectors of the block. The low-pass coefficients of the block are given by (2), which is expressed in terms of the dyadic wavelet coefficients of the same subband and level in two frames. Note that the motion vectors in (1) and (2) are not scalable. Taubman and Secker investigated the motion information scalability problem. Interested readers can refer to [14] for details of their framework.

From (1) and (2), we observe that the energy in the pixel domain can be altered after application of the spatial wavelet transform, temporal wavelet transform, and motion estimation in the MCTF process. To preserve the energy between the pixel domain and the wavelet domain, we derive the weighting factors of the spatial-temporal subbands.
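The observation that lifting does not conserve energy can be checked numerically. The following is a minimal sketch, assuming plain 5-3 lifting (predict coefficient 1/2, update coefficient 1/4) applied along time to a single pixel trajectory, with no motion compensation and mirrored boundaries. The function names and the sample values are illustrative and are not taken from the paper.

```python
def lift_53_forward(x):
    """One level of 5-3 lifting on an even-length sequence (mirrored edges)."""
    even, odd = x[0::2], x[1::2]
    half = len(even)
    # Predict: odd sample minus the average of its two even neighbours.
    high = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]
        high.append(odd[i] - 0.5 * (even[i] + right))
    # Update: even sample plus a quarter of its two high-pass neighbours.
    low = []
    for i in range(half):
        left = high[i - 1] if i > 0 else high[i]
        low.append(even[i] + 0.25 * (left + high[i]))
    return low, high


def lift_53_inverse(low, high):
    """Undo lift_53_forward by reversing the update and predict steps."""
    half = len(low)
    even = []
    for i in range(half):
        left = high[i - 1] if i > 0 else high[i]
        even.append(low[i] - 0.25 * (left + high[i]))
    odd = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]
        odd.append(high[i] + 0.5 * (even[i] + right))
    x = [0.0] * (2 * half)
    x[0::2] = even
    x[1::2] = odd
    return x


def energy(values):
    return sum(v * v for v in values)


signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
low, high = lift_53_forward(signal)
restored = lift_53_inverse(low, high)

print("pixel-domain energy:  ", energy(signal))
print("wavelet-domain energy:", energy(low) + energy(high))
```

The transform is perfectly invertible, yet the two printed energies differ, which is why unit quantization noise in a subband does not translate into unit reconstruction noise.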
III. SUBBAND WEIGHTING
The weighting factor indicates how much a unit quantization power in the subband contributes to the overall distortion in the reconstructed GOP. The error propagation model in [17] can be used to derive the weighting factor of the spatial wavelet transform. However, to obtain the weighting factor of the MCTF process, we need a new derivation, as the process applies a 1-D wavelet transform in the temporal direction. It also involves a complicated motion compensation process.
A. Spatial Subband Weighting
We now review the error propagation model proposed in [17] for deriving the weighting factors of spatial wavelet transforms. The model uses an analysis matrix and a synthesis matrix. Both matrices are double-subscripted: the first subscript represents the level of decomposition (starting from zero), and the second represents the subband channel (0 or 1). Applying one level of 2-D discrete wavelet decomposition to an image of a given size produces four subband matrices. Because the synthesis matrices invert the analysis matrices, the image can be reconstructed from the four subbands, as expressed in (3).

Consider the error matrix resulting from the reconstruction of an image, and the quantization error matrix of each subband in the first decomposition level of that image. According to (3), the reconstruction error matrix is related to the subband quantization error matrices as expressed in (4). The property of Kronecker products allows a product of matrices to be rewritten as a matrix-vector product in which the column vectors are constructed row-wise from the corresponding matrices. Applying this identity to (4) gives (5), where the column vectors are constructed row-wise from the reconstruction error matrix and the subband quantization error matrices, respectively. The reconstruction mean square error (MSE) of an image is given in (6).

The equation can be solved by the high bit rate assumption as follows. At a high bit rate, it is assumed that the quantization errors of wavelet subbands are white and mutually uncorrelated [7]. In this situation, we obtain the identities in (7) for the vector representation of errors in any two subbands: the errors of two different subbands are uncorrelated, and within one subband every element of the error vector has the same MSE. Substituting (5) into (6) and applying (7) yields (8).

The above derivation shows that the MSE measured in the pixel domain is the weighted sum of the MSE of subbands in the wavelet domain. Note that the weighting factors are determined by the filters. The derivations from (4)-(8) for the one-level decomposition case can be used to derive the MSE of a multilevel decomposition case, as described in [17].
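The statement that the weighting factors are determined by the filters can be illustrated in one dimension. The sketch below is an assumption-laden simplification, not the matrix derivation of [17]: it uses unnormalized 5-3 lifting, places a unit error in a single low-pass or high-pass coefficient, reconstructs, and takes the energy of the result as that subband's weight. Values outside the signal are treated as zero, and all names are hypothetical.

```python
def inverse_53(low, high):
    """Inverse 5-3 lifting; coefficients outside the signal count as zero."""
    half = len(low)

    def h(i):
        return high[i] if 0 <= i < half else 0.0

    even = [low[i] - 0.25 * (h(i - 1) + h(i)) for i in range(half)]

    def e(i):
        return even[i] if 0 <= i < half else 0.0

    odd = [high[i] + 0.5 * (e(i) + e(i + 1)) for i in range(half)]
    out = []
    for a, b in zip(even, odd):
        out += [a, b]
    return out


def synthesis_gain(band, half=8):
    """Energy of the reconstruction caused by a unit error in one coefficient."""
    low = [0.0] * half
    high = [0.0] * half
    target = low if band == "low" else high
    target[half // 2] = 1.0  # interior coefficient, away from the edges
    return sum(v * v for v in inverse_53(low, high))


def pixel_domain_mse(var_low, var_high):
    """White, uncorrelated subband errors: each band holds half the samples."""
    return 0.5 * synthesis_gain("low") * var_low + 0.5 * synthesis_gain("high") * var_high


w_low = synthesis_gain("low")
w_high = synthesis_gain("high")
print(w_low, w_high, pixel_domain_mse(1.0, 1.0))
```

For this normalization the low-pass weight exceeds 1 and the high-pass weight is below 1, so equal quantization noise in the two bands contributes unequally to the pixel-domain MSE. A different filter scaling changes the numbers but not the principle.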
B. Temporal Subband Weighting
To represent the prediction and update steps of the lifting-based MCTF process, we follow the scheme proposed in [5], whereby all the predictions and updates of a frame of a given size are integrated to form two matrices, the prediction matrix P and the update matrix U. Fig. 1 shows an example of motion estimation, and the corresponding pixel connectivity matrices are given in (9). A nonzero entry in these matrices indicates that there is a connection between the corresponding pixels of the two frames; 1/4 and 1/2 are, respectively, the scaling factors for the high-band and low-band signals of the 5-3 wavelet filter. Without loss of generality, in the following, we derive the temporal weighting factor by using the 5-3 filter. A similar derivation can be easily extended to the 9-7 temporal filter, as described in the Appendix.

Fig. 1. Example of MCTF motion estimation. The types of connectivity pixels in the example are single-connected, multiple-connected, and unconnected pixels. The corresponding prediction (P) and update (U) matrices are given in (9).

1) One-Level Temporal Decomposition: We use a lowercase letter to denote the vector formed by row-wise concatenation of an image, which is denoted by a capital letter. Using Girod and Han's notations [5], all the block-based one-level MCTF processes with 5-3 filters, as given in (1) and (2), can be integrated and written as (10) and (11), in terms of the vector representations of the high-pass and low-pass subbands, respectively. The superscripts of the connectivity matrices represent the motion directions (forward and backward prediction) relative to the frame concerned. From (10) and (11), we observe that the MCTF process involves the temporal wavelet filters and the pixel connectivity matrices, i.e., P and U.

Consider the quantization errors of the high-pass and low-pass subbands resulting from lossy source coding, and the reconstructed even and odd frames. From (11), we can obtain the reconstruction error of an even frame, given in (12). Substituting (12) into (10) for the two neighboring even frames, we obtain the reconstruction error of an odd frame, given in (13). Equations (12) and (13) represent a motion-dependent error propagation model for a one-level MCTF process using the 5-3 temporal wavelet filter. The reconstructed MSE of an even frame can be derived as in (14). Using the high bit rate assumption that derives the identities in (7), the last three cross terms are zero and (14) becomes (15), whose terms are the reconstruction errors of the frame induced by quantizing an element in each of the three subbands involved. Following the same derivation, the reconstruction error of an odd frame is given in (16). By applying derivations similar to those in (15) and (16) to each even frame and each odd frame in a GOP, respectively (an example of a GOP is shown in Fig. 2), we can relate the reconstruction errors of the GOP to the quantization errors of subbands by a linear relation.

To measure the consequence of quantizing a temporal subband in the GOP, we should aggregate the errors induced in all the frames by quantizing the subband. This is exactly the summation of the corresponding column of the matrix of the linear relation for the subband. For example, the reconstruction error resulting from one of the subbands is the summation of the values in the fifth column of the matrix. The summation is the temporal weighting factor of the subband, and the temporal weighting factors of all the subbands are derived from the matrix in the same way. Note that if the matrix is orthonormal, then each temporal weighting factor is equal to 1. In this case, the quantization error in the temporal subbands becomes the reconstruction error of the GOP.

Fig. 2. Example of two-level temporal wavelet decomposition of a GOP. The frames outside the boundary of the GOP can be dealt with by pasting blank frames or by changing the bi-directional prediction mode to the uni-directional prediction.

2) Multilevel Temporal Decomposition: To represent a given level of temporal decomposition, we add a subscript (starting from 0) to the prediction and update matrices. The error at level 2 that causes the error at level 1 can be derived by extending the analysis to the subbands shown in Fig. 2, which are related by (19)-(21). From (19)-(21), the quantization errors of the three subbands can be derived, respectively, as in (22)-(24). Following the same derivations as those in (14) and (15), we obtain (25). Substituting (22) and (24) into (23), and applying the derivations in (14) and (15), we obtain (26). Here, we omit the derivations of the other subbands at level 1 because they can be obtained by a similar derivation to (25) or (26). We should also point out that the errors at level 2 only affect the low-frequency subbands at level 1. To sum up, the relation between errors at the two levels is expressed by the matrix in (27); note that the rows in this matrix corresponding to high-frequency subbands at level 1 are zero. The reconstructed MSE of the frames (at level 0) obtained by quantizing the subbands at level 2 can be derived by substituting (27) into the one-level relation, which gives (28). Accordingly, the summation of the values in a column of the matrix in (28), used to measure the error caused by quantizing a subband, becomes the reconstructed MSE of the GOP. As in the one-level temporal decomposition case, we call the summation the temporal weighting factor of the subband. The temporal weighting factors of the subbands at level 2 are calculated as in (29).

The above derivation can also be used to generate the temporal error propagation model for temporal decomposition of more than two levels; thus, we omit the derivation of that case here. We have analyzed the error propagation models of spatial wavelet subbands and temporal wavelet subbands individually. Next, we combine the analyses to construct the error propagation model of spatial-temporal subbands.
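The effect of pixel connectivity on the temporal weights can be made concrete with a small numerical sketch. It is deliberately simpler than the 5-3 case derived above: it assumes a two-frame, Haar-like lifting step h = f_odd - P f_even, l = f_even + U h, with P equal to a 0/1 connectivity matrix and U equal to half its transpose. Under the high bit rate assumption the reconstruction errors are e_even = e_l - U e_h and e_odd = P e_l + (I - P U) e_h, so each weight is a sum of squared matrix entries divided by the number of coefficients. The scaling factors and all function names are assumptions made for illustration.

```python
def connectivity(conn):
    """conn[i] = even-frame pixel used to predict odd-frame pixel i."""
    n = len(conn)
    c = [[0.0] * n for _ in range(n)]
    for i, j in enumerate(conn):
        c[i][j] = 1.0
    return c


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def scaled(a, s):
    return [[s * v for v in row] for row in a]


def identity_minus(a):
    return [[(1.0 if i == j else 0.0) - a[i][j] for j in range(len(a))]
            for i in range(len(a))]


def frob2(a):
    """Squared Frobenius norm: total error energy for unit-variance white noise."""
    return sum(v * v for row in a for v in row)


def temporal_weights(conn, p_scale=1.0, u_scale=0.5):
    n = len(conn)
    c = connectivity(conn)
    p = scaled(c, p_scale)
    u = scaled(transpose(c), u_scale)
    # Low band: reaches the even frame directly and the odd frame through P.
    w_low = (n + frob2(p)) / n
    # High band: reaches the even frame through U, the odd frame through I - P U.
    w_high = (frob2(u) + frob2(identity_minus(matmul(p, u)))) / n
    return w_low, w_high


no_motion = temporal_weights([0, 1, 2, 3])
# Even-frame pixel 0 is multiple-connected and pixel 3 is unconnected.
with_motion = temporal_weights([0, 0, 1, 2])
print(no_motion, with_motion)
```

With one-to-one connectivity the weights reduce to those of the unnormalized Haar filter. Once a pixel becomes multiple-connected and another unconnected, the high-band weight changes, which is the effect that weightings derived without pixel connectivity cannot capture.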

C. Spatial-Temporal Subband Weighting


In wavelet video coding, each subband is indexed by both spatial and temporal decomposition. Each spatial-temporal subband of a frame is therefore identified by two pairs of indices: one pair gives the subband and level of the spatial decomposition, and the other gives the subband and level of the temporal decomposition. For simplicity, but without loss of generality, we derive the spatial-temporal subband weighting factors based on the two-level 5-3 temporal filtering example given in Section III-B. Note that, in the 2-D+t approach, the MCTF process is applied to the spatial subbands of a GOP. Thus, the spatial-temporal weighting can be derived by using (29) to calculate the temporal weighting for each spatial subband, after which we can obtain the spatial weighting of each spatial subband. The spatial-temporal weighting matrix that relates the reconstructed MSE of the GOP and the errors on the spatial-temporal subbands combines the temporal weighting matrix of each spatial subband (see (28)) with the spatial weighting of that subband (the derivations can be found in [17]). Without loss of generality, we use (31) to illustrate how to derive the spatial-temporal weighting of a subband. The spatial-temporal weighting of the subband is the summation of the values in the second column of its temporal weighting matrix multiplied by its spatial weighting. The example can be generalized to obtain the weighting of any spatial-temporal subband, as in (32). This weighting indicates the reconstruction MSE of a GOP resulting from quantizing the subband. After deriving the spatial-temporal weighting, we can perform rate-distortion analysis of the 2-D+t coding in the wavelet domain.
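The rule described above (column sum of the temporal weighting matrix, multiplied by the spatial weighting) can be sketched as follows. The subband labels and every numeric value are made up for illustration, and the data layout (one temporal matrix per spatial subband, rows for frames, columns for temporal subbands) is an assumption.

```python
def spatial_temporal_weights(temporal_matrices, spatial_weights):
    """Return {(spatial_subband, temporal_column): weight}."""
    weights = {}
    for subband, matrix in temporal_matrices.items():
        for column in range(len(matrix[0])):
            # Aggregate the error that this temporal subband induces in all frames.
            column_sum = sum(row[column] for row in matrix)
            weights[(subband, column)] = spatial_weights[subband] * column_sum
    return weights


# Illustrative inputs only: two spatial subbands, two frames, two temporal subbands.
temporal_matrices = {
    "LL": [[1.0, 0.25], [1.0, 0.25]],
    "HH": [[1.0, 0.5], [0.5, 0.5]],
}
spatial_weights = {"LL": 1.5, "HH": 0.5}

weights = spatial_temporal_weights(temporal_matrices, spatial_weights)
for key in sorted(weights):
    print(key, weights[key])
```

Because each spatial subband carries its own temporal matrix, motion that differs between subbands leads to different spatial-temporal weights even when the filters are identical.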
IV. OPTIMAL BIT ALLOCATION
The objective of optimal bit allocation is to assign bits to different subbands so that the least distortion of decoded videos can be achieved under a certain rate limitation. For simplicity, we assume that the rate-distortion function of each subband can be derived independently. Therefore, the optimal bit-allocation of a GOP can be formulated as a constrained minimization over the set of spatial-temporal subbands of the GOP, which involves the rate of motion vectors and the spatial-temporal weighting of each subband, whose value is calculated according to (32). The formulation implies that we should modify the encoding phase of 2-D+t, shown in the top subfigure of Fig. 3, to incorporate the weighting factors. This can be done by multiplying the spatial weighting factors by the spatial wavelet subbands, followed by multiplying the temporal weighting matrix by the result of the MCTF process, as shown in the bottom subfigure of Fig. 3. Although the solution of the above optimization problem can be obtained by using a Lagrangian optimization approach [15] or a dynamic programming algorithm [11], we propose a simpler, more efficient algorithm to derive a suboptimal solution.

Fig. 3. Top: Bit-allocation of a 2-D+t structure without weighting. Bottom: Proposed bit-allocation method, where spatial and temporal weighting are applied to a 2-D+t structure.

Fig. 4. Top: Schematic lifting structure of the 5-3 temporal filter. Bottom: Lifting structure of the 9-7 temporal filter. Conceptually, the lifting structure of the 9-7 filter can be regarded as the concatenation of two 5-3 lifting structures, but with different coefficients on the prediction and update matrices.

Conceptually, our algorithm is based on the premise that the magnitudes of the slopes of the optimal rate-distortion function indicate the amount of distortion reduction. Therefore, the algorithm assigns bits to encode subbands according to each slope's magnitude. First, we compute the rate-distortion function of each subband. Then, we calculate the slopes of all rate-distortion functions, and arrange the slopes in decreasing order of magnitude. Finally, we assign bits to encode a subband according to that order and stop when the bit budget is exhausted.

Although the rate-distortion function would be convex for an optimal coder, in practice, the optimal rate-distortion function can only be approximated. Therefore, it is possible that the slope of a segment at a lower bit rate will be smaller than that of the segment at a higher bit rate. As a consequence, the above procedure selects the segment at the higher bit rate before the segment at the lower bit rate. To summarize, when the high bit rate segment is selected, all segments between the lower bit rate segment and the higher bit rate segment are encoded.
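The slope-ordered procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each subband is given as a list of (rate, distortion) points with increasing rate, the slope of a segment is the weighted distortion drop per bit, and allocation stops at the first segment that no longer fits the budget. The curves, weights, and names are hypothetical.

```python
def allocate_bits(rd_curves, weights, budget):
    """Greedy allocation by decreasing weighted rate-distortion slope.

    rd_curves[s] is a list of (rate, distortion) points with increasing rate;
    point 0 is the state in which nothing of subband s has been coded.
    Returns the selected point index per subband and the unused budget.
    """
    segments = []
    for s, points in rd_curves.items():
        for k in range(1, len(points)):
            (r0, d0), (r1, d1) = points[k - 1], points[k]
            slope = weights[s] * (d0 - d1) / (r1 - r0)
            segments.append((slope, s, k))
    segments.sort(key=lambda seg: seg[0], reverse=True)

    level = {s: 0 for s in rd_curves}
    remaining = budget
    for _, s, k in segments:
        if k <= level[s]:
            continue  # already covered when a higher-rate segment was selected
        # Selecting segment k also encodes every segment below it.
        cost = rd_curves[s][k][0] - rd_curves[s][level[s]][0]
        if cost > remaining:
            break  # bit budget exhausted
        remaining -= cost
        level[s] = k
    return level, remaining


# Subband "B" is non-convex: its second segment is steeper than its first.
rd_curves = {
    "A": [(0, 100.0), (10, 40.0), (20, 20.0)],
    "B": [(0, 50.0), (10, 30.0), (20, 5.0)],
}
weights = {"A": 1.0, "B": 2.0}
level, remaining = allocate_bits(rd_curves, weights, budget=30)
print(level, remaining)
```

In the example, the steep second segment of subband "B" is selected before its first segment, so both segments are encoded at once, which mirrors the non-convex case discussed in the text.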
V. EXPERIMENT RESULTS
We now demonstrate the efficacy of applying the proposed weighting methods to 2-D+t wavelet coding. In the experiment, we compare the coding performance of the proposed method with that of the method that does not adjust the coefficients and the method that adjusts the wavelet coefficients without considering the pixel connectivity status (single-connected, multiple-connected, or unconnected) in the MCTF process. Note that the last method is the same as the case of extending the weighting method in [19] to a 2-D+t structure.

In our experiment, a GOP has 32 CIF frames. Each frame of a video sequence is first decomposed by applying a three-level spatial wavelet transform using either the 5-3 or the 9-7 wavelet filter. Then, a five-level MCTF process using the 5-3 or the 9-7 temporal wavelet filter is applied to each spatial subband. The lifting-based 9-7 temporal filter is implemented by concatenating two 5-3-like lifting structures. The derivation of its weighting factors is described in the Appendix. During the process, motion estimation is implemented using a full search with integer-pixel accuracy on the dyadic wavelet coefficients. The block size is 16 x 16, and the search range for both the horizontal and vertical dimensions is [-16, 15]. Finally, we use 2-D EZBC [6] to encode individual spatial subbands that result from the temporal MCTF process.

We demonstrate and compare the results on three different parameter settings. Scheme 1 does not use spatial weighting or temporal weighting in the wavelet domain. Scheme 2 uses the spatial and temporal weighting factors, which are derived without considering the pixel connectivity in the MCTF process. Scheme 3 applies the proposed method in the wavelet domain.

Figs. 5 and 6 show, respectively, the absolute MSE difference between the reconstruction error and the quantization error in the wavelet domain by using the 5-3 and 9-7 temporal filters versus the bit rate for various CIF sequences. We observe that both Scheme 2 and Scheme 3 reduce the absolute MSE difference more effectively than Scheme 1, which does not apply any weighting factors. With regard to the weighting factors, Scheme 2 reduces the absolute value of the MSE difference; however, Scheme 3, which incorporates the pixel connectivity into the weighting factors, reduces the absolute value of the MSE difference more effectively than Scheme 2 at high bit rates. Scheme 3 yields a substantial absolute MSE reduction at all bit rates compared to Scheme 2 when the 9-7 temporal filter is used.

Fig. 5. Absolute MSE difference between the reconstruction error and the quantization error at various bit rates by applying the 5-3 temporal filter and different spatial wavelets on different video sequences (CIF) at a frame rate of 30 fps. Scheme 1: Neither spatial nor temporal weighting is used. Scheme 2: Both spatial and temporal subbands are weighted, but the pixel connectivity is not considered. Scheme 3: Proposed method is used. Scheme 3 yields the smallest absolute MSE difference at high bit rates.

We also compare the coding performances of the three schemes for bit-allocation in a 2-D+t structure. Figs. 7 and 8 illustrate the PSNR coding gain of Scheme 2 and Scheme 3 over that of Scheme 1. The average coding performance of each scheme is measured on the CIF resolution of a GOP with different spatial and temporal filter combinations. As shown in Fig. 7, for the 5-3 temporal filter, the coding gains of Scheme 2 and Scheme 3 are almost the same for all bit rates. If we average the coding gains over all the bit rates, both schemes achieve approximately 0.5 dB coding gain. However, as shown in Fig. 8, for the 9-7 temporal filter, a significant coding gain is achieved by using Scheme 3 rather than Scheme 2. Scheme 2 yields an average 1 dB coding gain over the coding derived by Scheme 1, and Scheme 3 achieves an average coding gain of more than 0.5 dB over that of Scheme 2. Because the lifting-based 9-7 filter is the cascade of two lifting-based 5-3-like structures, as shown in Fig. 4, the effects of the 9-7 temporal filter's pixel connectivity matrices on the reconstruction error are higher than those of the 5-3 temporal filter. Thus, Scheme 3 using the 9-7 temporal filter yields a better coding performance result than Scheme 2.
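The two quantities reported in this section, the absolute MSE difference and the PSNR coding gain, can be computed as in the following sketch. It assumes the conventional PSNR definition with a peak value of 255 (8-bit samples), which the text does not state explicitly; the function names and sample numbers are illustrative and are not measured results.

```python
import math


def psnr(mse, peak=255.0):
    """PSNR in dB for a given MSE; peak=255 assumes 8-bit samples."""
    return 10.0 * math.log10(peak * peak / mse)


def psnr_gain(mse_reference, mse_candidate, peak=255.0):
    """Positive when the candidate scheme has lower MSE than the reference."""
    return psnr(mse_candidate, peak) - psnr(mse_reference, peak)


def absolute_mse_difference(pixel_domain_mse, wavelet_domain_mse):
    """Gap between the reconstruction error and the weighted quantization error."""
    return abs(pixel_domain_mse - wavelet_domain_mse)


# Illustrative numbers only.
gain = psnr_gain(mse_reference=20.0, mse_candidate=10.0)
gap = absolute_mse_difference(12.0, 9.5)
print(round(gain, 2), gap)
```

Halving the MSE corresponds to a gain of about 3 dB, so the gains of 0.5 dB to 1 dB discussed above correspond to MSE reductions of roughly 11% to 21%.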
VI. CONCLUSION

In this paper, we propose a novel weighting method, which takes account of the pixel connectivity in the MCTF process, to derive the effect of the quantization error of a subband on the reconstruction error of a group of pictures. We employ the proposed method on a 2-D+t structure and show that it improves the bit-allocation performance over that obtained by using weightings derived without considering the pixel connectivity in the MCTF process. In future work, we will experiment with different coding parameters, such as variable block-size coding and subpixel motion vectors, to obtain a more compelling result.

Fig. 6. Absolute MSE difference between the reconstruction error and the quantization error at various bit rates by applying the 9-7 temporal filter and different spatial wavelets on different video sequences (CIF) at a frame rate of 30 fps. Scheme 1: Neither spatial nor temporal weighting is used. Scheme 2: Both spatial and temporal subbands are weighted, but the pixel connectivity is not considered. Scheme 3: Proposed method is used. Scheme 3 yields the smallest absolute MSE difference at all bit rates.
APPENDIX
WEIGHTING FACTORS OF THE 9-7 TEMPORAL FILTER
The 9-7 lifting structure comprises two steps, which are similar to those of the 5-3 lifting structure, but with different filter coefficients. As shown in Fig. 4, after the first step, intermediate frames (IH and IL) are generated. Then, the intermediate frames are used to generate the L and H frames in the second step. The notation used in the following covers the frame before the lifting procedure, the respective intermediate frames, the frames in the low-pass sequence and high-pass sequence, and the pixel connectivity matrices discussed in Section III-B. The lifting-based 9-7 temporal filtering scheme is given in (34)-(37). Note that we modify the superscripts of the prediction and update matrices to show which step of the 9-7 lifting scheme these matrices belong to.

The derivation of the weighting factor for 9-7 temporal filtering is similar to that for 5-3 temporal filtering. Therefore, for simplicity, we only show the derivation of the weighting factors for one-level temporal decomposition using the 9-7 filter. From (34)-(37), we obtain the error terms in (38)-(41). Substituting the errors of the intermediate sequences in (39) and (41) into (38) and (40), respectively, and rearranging the terms, we obtain the relationship between the quantization error and the reconstruction error for even frames and for odd frames. Using the high bit rate assumption, we can then obtain the weighting factors.

Fig. 7. Comparison of the average PSNR gain of the CIF resolution of a GOP using different spatial-temporal weighting schemes. The temporal filter is 5-3 and the spatial filter is either 5-3 or 9-7. The performance gains of Scheme 2 and Scheme 3 over Scheme 1 range from 0.1 to 1 dB. Although the results in Fig. 5 show that Scheme 3 yields a smaller absolute MSE difference between the reconstruction error and the quantization error at high bit rates than Scheme 2, the PSNR gains of both schemes are almost the same.

Fig. 8. Comparison of the average PSNR gain of the CIF resolution of a GOP using different spatial-temporal weighting schemes. The temporal filter is 9-7 and the spatial filter is either 5-3 or 9-7. The performance gains of Scheme 2 and Scheme 3 over Scheme 1 are significant. The obvious PSNR gain of Scheme 3 over Scheme 2 at all bit rates indicates that, when the 9-7 temporal filter is used, including the pixel connectivity in the weighting of the wavelet coefficients improves the coding performance.
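The two-step structure of the 9-7 lifting scheme can be sketched by reusing one predict function and one update function twice, with different coefficients in each step. The constants below are the commonly used lifting coefficients of the CDF 9/7 wavelet; the source does not list its coefficients, so they are an assumption here. Motion compensation and the final subband scaling are omitted, edges are mirrored, and all names are hypothetical.

```python
# Commonly used CDF 9/7 lifting constants (assumed, not taken from the paper).
ALPHA, BETA = -1.586134342, -0.05298011854   # first predict / update step
GAMMA, DELTA = 0.8829110762, 0.4435068522    # second predict / update step


def predict(even, odd, coef):
    """odd[i] + coef * (even[i] + even[i + 1]), mirrored at the right edge."""
    n = len(even)
    return [odd[i] + coef * (even[i] + even[min(i + 1, n - 1)]) for i in range(n)]


def update(even, high, coef):
    """even[i] + coef * (high[i - 1] + high[i]), mirrored at the left edge."""
    n = len(even)
    return [even[i] + coef * (high[max(i - 1, 0)] + high[i]) for i in range(n)]


def forward_97(samples):
    even, odd = samples[0::2], samples[1::2]
    ih = predict(even, odd, ALPHA)   # intermediate high-pass sequence (IH)
    il = update(even, ih, BETA)      # intermediate low-pass sequence (IL)
    h = predict(il, ih, GAMMA)       # final high-pass sequence (H)
    l = update(il, h, DELTA)         # final low-pass sequence (L)
    return l, h


def inverse_97(l, h):
    # Undo the four lifting steps in reverse order with negated coefficients.
    il = update(l, h, -DELTA)
    ih = predict(il, h, -GAMMA)
    even = update(il, ih, -BETA)
    odd = predict(even, ih, -ALPHA)
    out = [0.0] * (2 * len(even))
    out[0::2] = even
    out[1::2] = odd
    return out


samples = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
l_band, h_band = forward_97(samples)
recovered = inverse_97(l_band, h_band)
print(max(abs(a - b) for a, b in zip(recovered, samples)))
```

Because each step has the same predict/update form as the 5-3 filter, the error propagation analysis of Section III-B can be applied once per step, which is why the pixel connectivity matrices enter the 9-7 weights twice.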
ACKNOWLEDGMENT
The authors would like to thank the anonymous reviewers for their insightful comments.

REFERENCES
[1] Y. Andreopoulos, A. Munteanu, J. Barbarien, M. Van der Schaar, J. Cornelis, and P. Schelkens, "In-band motion compensated temporal filtering," Signal Process.: Image Commun., vol. 19, pp. 653-673, Aug. 2004.
[2] P. Chen and J. W. Woods, "Bidirectional MC-EZBC with lifting implementation," IEEE Trans. Circuits Syst. Video Technol., vol. 14, no. 10, pp. 1183-1194, Oct. 2004.
[3] S.-J. Choi and J. W. Woods, "Motion-compensated 3-D subband coding of video," IEEE Trans. Image Process., vol. 8, no. 2, pp. 155-167, Feb. 1999.
[4] M. Flierl and B. Girod, "Investigation of motion-compensated lifted wavelet transforms," in Proc. Picture Coding Symp., Apr. 2003, pp. 59-62.
[5] B. Girod and S. Han, "Optimum update for motion-compensated lifting," IEEE Signal Process. Lett., vol. 12, no. 2, pp. 150-153, Feb. 2005.
[6] S.-T. Hsiang and J. W. Woods, "Embedded video coding using invertible motion compensated 3-D subband/wavelet filter bank," Signal Process.: Image Commun., vol. 16, pp. 705-724, May 2001.
[7] N. Jayant and P. Noll, Digital Coding of Waveforms. Englewood Cliffs, NJ: Prentice-Hall, 1984.
[8] J.-R. Ohm, "Three-dimensional subband coding with motion compensation," IEEE Trans. Image Process., vol. 3, no. 5, pp. 559-571, Sep. 1994.
[9] J.-R. Ohm, Multimedia Communication Technology. New York: Springer Verlag, 2004.
[10] J.-R. Ohm, M. Van der Schaar, and J. W. Woods, "Interframe wavelet coding - Motion picture representation for universal scalability," Signal Process.: Image Commun., vol. 19, pp. 877-908, 2004.
[11] A. Ortega and K. Ramchandran, "Rate-distortion methods for image and video compression," IEEE Signal Process. Mag., vol. 15, no. 6, pp. 23-50, Nov. 1998.
[12] H.-W. Park and H.-S. Kim, "Motion estimation using low-band-shift method for wavelet-based moving-picture coding," IEEE Trans. Image Process., vol. 9, no. 4, pp. 577-587, Apr. 2000.
[13] T. Rusert, K. Hanke, and J.-R. Ohm, "Transition filtering and optimized quantization in interframe wavelet video coding," Proc. SPIE Vis. Commun. Image Process., vol. 5150, pp. 682-694, 2003.
[14] A. Secker and D. Taubman, "Highly scalable video compression with scalable motion coding," IEEE Trans. Image Process., vol. 13, no. 8, pp. 1029-1041, Aug. 2004.
[15] G. J. Sullivan and T. Wiegand, "Rate-distortion optimization for video compression," IEEE Signal Process. Mag., vol. 15, no. 6, pp. 74-90, Nov. 1998.
[16] D. S. Taubman and M. W. Marcellin, JPEG2000. Norwell, MA: Kluwer, 2002.
[17] B. Usevitch, "Optimal bit allocation for biorthogonal wavelet coding," in Proc. Data Compression Conf., 1996, pp. 387-395.
[18] B. Usevitch, "A tutorial on modern lossy wavelet image compression: Foundations of JPEG 2000," IEEE Signal Process. Mag., vol. 18, no. 5, pp. 22-34, Sep. 2001.
[19] R. Xiong, J. Xu, F. Wu, S. Li, and Y.-Q. Zhang, "Optimal subband rate allocation for spatial scalability in 3D wavelet video coding with motion aligned temporal filtering," Proc. SPIE Vis. Commun. Image Process., vol. 5960, pp. 381-392, 2005.
[20] C. H. Yang, J. C. Wang, J. F. Wang, and C. W. Chang, "A block-based architecture for lifting scheme discrete wavelet transform," IEICE Trans. Fundam. Electron., Commun., Comput. Sci., pp. 1062-1071, May 2007.
[21] Q. Zhang, Q. Guo, Q. Ni, W. Zhu, and Y.-Q. Zhang, "Sender-adaptive and receiver-driven layered multicast for scalable video over the internet," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 4, pp. 482-495, Apr. 2005.
