ETRI Journal, Volume 24, Number 3, June 2002 Jongwon Seok et al. 181
Manuscript received Aug. 16, 2001; revised Feb. 20, 2002.
This work is supported by the Ministry of Information and Communication (MIC) of Korea.
Jongwon Seok (phone: +82 42 860 1306, e-mail: jwseok@etri.re.kr) and Jinwoo Hong (e-mail: jwhong@etri.re.kr) are with Broadcasting Content Protection Research Team, ETRI, Daejeon, Korea.
Jinwoong Kim (e-mail: jwkim@video.etri.re.kr) is with Broadcasting Media Research Department, ETRI, Daejeon, Korea.

Perceptual transparency
The main requirement for watermarking is perceptual transparency. The embedding process should not introduce any perceptible artifacts, that is, the watermark should not affect the quality of the original signal. However, for robustness, the watermark energy should be maximized under the constraint of
keeping perceptual artifacts as low as possible. Thus, there must be a trade-off between perceptual transparency and robustness. This problem can be solved by applying human perceptual modeling in the watermark embedding process.

Security
The watermark must be strongly resistant to unauthorized detection. It is also desirable that watermarks be difficult for an unauthorized agent to forge. Watermark security addresses the secrecy of the embedded information. Where secrecy is necessary, a secret key has to be used for the embedding and extraction process.

Robustness
For all watermarking applications, robustness is one of the major algorithm design issues because it determines the algorithm behavior towards data distortions introduced through standard and malicious data processing. The watermark should be robust to common signal processing, including digital-to-analog and analog-to-digital conversion, linear and nonlinear filtering, compression, and scaling.

Watermark recovery without the original signal
In most applications, watermark extraction processes do not have access to the original signal because of the large size of the data. This is called public or blind watermarking. In fact, this property is essential to real environments such as broadcasting and portable players. Except for some special cases, embedded information should be recovered without the original signal.

Data rate
The watermarking scheme should have a data rate sufficient to support the copyright information. In most applications, copyright information consists of copy control information, usage information, and the serial number of the contents. To meet the requirement for copyright information, at least several tens of bits are needed.

Several digital watermark algorithms have been proposed, but most watermark algorithms focus on image and video. Only a few audio watermark algorithms have been reported. Bender et al. [6] proposed several watermarking techniques, which include the following: spread-spectrum coding, which uses a direct sequence spread-spectrum method; echo coding, which employs multiple decaying echoes to place a peak in the cepstrum domain at a known location; and phase coding, which uses phase information as a data space. Unfortunately, these watermarking algorithms cause perceptible signal distortion and show low robustness. Furthermore, they have a relatively high complexity in the detection process. Swanson et al. [7] presented an audio watermarking algorithm that exploits temporal and frequency masking by adding a perceptually shaped spread-spectrum sequence. However, the disadvantage of this scheme is that the original audio signal is needed in the watermark detection process. Neubauer et al. [8]-[10] introduced the bit stream watermarking concept. The basic idea of this method was to partly decode the input bitstream, add a perceptually hidden watermark in the frequency domain, and finally quantize and code the signal again. As an actual application, they embedded the watermarks directly into compressed MPEG-2 AAC bit streams so that the watermark could be detected in the decompressed audio data. Bassia et al. [11] presented a watermarking scheme in the time domain. They embedded a watermark in the time domain of a digital audio signal by slightly modifying the amplitude of each audio sample. The characteristics of this modification were determined both by the original signal and the copyright owner key. Solana Technology [12] proposed a watermark algorithm based on spread-spectrum coding using a linear predictive coding technique and fast Fourier transform (FFT) to determine the spectral shape. In the watermark detection process, the watermarked audio signal is whitened before correlation using the rake receiver.

This paper presents a novel audio watermarking scheme for copyright protection of digital audio. The watermark embedding scheme accomplishes perceptual transparency after watermark embedding by exploiting the masking effect of the human auditory system, as in [7], [13]-[15]. This approach can be applied to other signals, such as image and video. For images, this can be accomplished by adopting a human visual model [16]. In the detection procedure, by applying whitening, or de-correlation, before correlation, the proposed scheme does not need the original audio signal to extract the watermark information. Linnartz et al. [17] and Depovere et al. [18] used a similar approach in image watermarking. They applied the whitening procedure in watermark detection to remove correlation in pixels using a lowpass filter. We achieved this by removing the audio spectrum in the watermarked audio using a linear prediction analysis, which is a frequently used technique in speech signal processing.

This paper is organized as follows. Section II introduces considerations for improving watermark detection performance in the correlation scheme. Section III presents the proposed watermarking scheme, including the embedding and detection process. We show the experimental results of the proposed algorithm in section IV and conclude the paper in section V.

II. SOME CONSIDERATIONS
Figure 1 shows a basic watermark embedding and detection process using the correlation scheme. As Fig. 1 illustrates, in principle, we can regard most correlation-based embedding processes as adding an additional signal to the original signal to obtain a watermarked signal. The watermark embedding process from Fig. 1 can be represented as

$$ y(n) = x(n) + w(n), \tag{1} $$

where $y(n)$ is the watermarked audio signal, $x(n)$ the original signal, and $w(n)$ the watermark signal. Performing the watermark detection by correlation, the resulting decision value $d$ will be

$$ d = d_x + d_w = \frac{1}{N}\sum_{i} x(i)\,w(i) + \frac{1}{N}\sum_{i} w^2(i). \tag{2} $$

From (2), it is clear that $d_w$ is the energy of the watermark $w(n)$, and the expected value of $d_x$ is zero. Because of the central limit theorem, if $N$ is sufficiently large and if the contributions of the sums are sufficiently independent, $d_x$ has a Gaussian distribution. If we set the threshold for the watermark detection to $T = d_w / 2$, the probability of a false negative $P_-$ (the watermark is present, but the detector decides it is not) equals the probability of a false positive $P_+$ (the watermark is not present, but the detector decides it is):

$$ P_+ = P_- = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{d_w}{\sqrt{8}\,\sigma_x}\right), \tag{3} $$

where $\mathrm{erfc}(\cdot)$ is the complementary error function. From (3), we may draw the following preliminary conditions to decrease the false positive and negative probabilities.

1. The false positive and negative probabilities decrease as the strength of watermark $d_w$ to be embedded increases. However, the strength of the embedded watermark signal is limited and depends on the human perceptual characteristics of the audio signal.
2. The false positive and negative probabilities decrease as the standard deviation $\sigma_x$ decreases, because the argument of erfc in (3) then grows. This can be obtained by applying whitening, or de-correlation, before correlation in the detection procedure. Application of the whitening procedure should significantly remove the correlation in the signal.

[Fig. 1: block diagram not recoverable; surviving labels: x, w, y, Watermark Generator, Message, Decision Value.]

III. WATERMARKING SCHEME

1. Watermark Embedding
Our watermark embedding scheme is based on a direct-sequence spread-spectrum (DSSS) method. The auxiliary information that is to be embedded is modulated by a pseudo-noise (PN) sequence. The spread-spectrum signal is then shaped in the frequency domain and inserted into the original audio signal. The embedding strength determines the energy of the watermark signal and can be considered as the energy of the noise added to the audio signal.
Our goal was to design the weighting function so that the watermark energy was maximized subject to a required maximal acceptable distortion. The strength of the embedded watermark signal depends on the human perceptual characteristics of the audio signal. We accomplished our goal with an embedding scheme that exploits the masking effect of the human auditory system. We intended to adapt the watermark so that
the energy of the watermark was maximized and the auditory artifact was kept as low as possible. We used the masking model defined in the ISO-MPEG Audio Psychoacoustic Model [19]. The detailed procedure to obtain the Psychoacoustic Model is well described in [15], [19], [20].
The inner ear performs a short-time critical band analysis where the frequency-to-place transform occurs along the basilar membrane. The power spectra are not represented on a linear frequency scale but on limited frequency bands called critical bands. The auditory system can roughly be described as a band-pass filter bank. Simultaneous masking is a frequency domain phenomenon where a low-level signal can be made inaudible by a simultaneously occurring stronger signal. Such masking is largest in the critical band in which the masker is located, and it is effective to a lesser degree in neighboring bands. The masking threshold can be measured, and low-level signals below this threshold will not be audible.
The final masking threshold curve obtained from the analysis audio segment is approximated with a 12th order all-pole filter. The filtering is then applied to the PN sequence using the filter coefficients obtained from the all-pole modeling to shape the watermark signal to be imperceptible in the frequency domain. This masking threshold is the limit for maximizing the watermark energy while keeping the perceptual audio quality.
In our embedding process, we repeatedly apply an embedding operation on short segments of the audio signal. Each one of these segments is called a frame. A diagram of the audio watermark embedding scheme is shown in Fig. 2. Using the concept of the Psychoacoustic model and DSSS method, our watermark embedding works as follows:

a) Calculate the masking threshold of the current analysis audio frame using the Psychoacoustic model with an analysis audio frame size of 1024 samples and a 1024 point FFT.
b) Generate the PN sequence with a length of 1024.
c) Apply the FFT to the copyright information, which is modulated by the PN sequence.
d) Using the masking threshold, shape the watermark signal to be imperceptible in the frequency domain.
e) Compute the inverse FFT of the shaped watermark signal.
f) Create the final watermarked audio signal by adding the watermark signal to the original audio signal in the time domain.

[Fig. 2: block diagram of the audio watermark embedding scheme, not recoverable; surviving labels: Copyright Information, PN Sequence, FFT, Watermarked Audio.]

2. Watermark Detection
When designing a watermark detection system, we need to consider the desired performance and robustness of the system. The watermark should be detectable under common signal processing operations, such as digital-to-analog and analog-to-digital conversion, linear and nonlinear filtering, compression, and scaling. Furthermore, in most applications, watermark extraction processes do not have access to the original signal. In fact, not having access to the original signal is essential in real environments such as broadcasting.
The proposed watermark detection procedure does not require access to the original audio signal to detect the watermark signal. Most correlation detector schemes assume that the channel is white Gaussian; however, this is not the case for real audio signals because the audio samples are highly correlated. For a non-white Gaussian channel, it is possible to increase detection performance by pre-processing before correlation. This can be achieved by applying whitening or de-correlation before correlation.
Application of the whitening procedure should significantly remove the correlation in the signal to achieve optimum detection. To remove the correlation in the audio signal, we used an autoregressive modeling technique named linear predictive coding (LPC) [21].
The LPC method can approximate the original audio signal $x(n)$ as a linear combination of the past $p$ audio samples, such that

$$ x(n) \approx a_1 x(n-1) + a_2 x(n-2) + \cdots + a_p x(n-p), \tag{4} $$

where the coefficients $a_1, a_2, \ldots, a_p$ are assumed constant over the analysis audio segment. We convert (4) to an equality by including an error term $u(n)$, giving

$$ x(n) = \sum_{i=1}^{p} a_i x(n-i) + u(n). \tag{5} $$

Substituting (5) into (1), the watermarked audio signal is

$$ y(n) = \sum_{i=1}^{p} a_i x(n-i) + u(n) + w(n). \tag{6} $$

The watermarked signal can be expressed as

$$ y(n) = \sum_{k=1}^{p} a_k y(n-k) + m(n), \tag{7} $$

where $m(n)$ is the residual signal of the watermarked audio signal. The linear prediction of $y(n)$ is

$$ \tilde{y}(n) = \sum_{k=1}^{p} a_k y(n-k), \tag{8} $$

and the prediction error is

$$ e(n) = y(n) - \tilde{y}(n) = y(n) - \sum_{k=1}^{p} a_k y(n-k) = m(n). \tag{9} $$

Eq. (9) reveals that we can estimate the signal $m(n)$ with the audio spectrum removed, which has the characteristics of both the excitation signal of the original audio $u(n)$ and the watermark signal $w(n)$, from the watermarked audio signal using the linear predictive analysis.
This method transforms the non-white watermarked audio signal to a whitened signal by removing the audio spectrum. Figures 3(a) and (b) show the probability density function of a small segment of the watermarked audio signal before LPC filtering and the residual or error signal of the watermarked audio signal after LPC filtering. The probability density function of the watermarked audio signal is clearly not smooth and has a large variance. On the other hand, the probability density function of the residual signal after LPC filtering has a smoother distribution and a smaller variance than the watermarked audio signal. Furthermore, this distribution can be easily modeled using the Gaussian probability density function.

Fig. 3. Probability density function before and after linear prediction filtering: (a) watermarked audio signal; (b) residual of watermarked audio signal. [Plots not recoverable; axes: Bin (0 to 1000) versus Probability.]

Using the residual signal and the pseudo-noise sequence as a template, we applied matched filtering to extract the embedded information. Figure 4 shows a diagram of the audio watermark detection scheme.
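The embedding and whitening-before-correlation detection described in this section can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: a constant gain `ALPHA` stands in for the psychoacoustic shaping, a synthetic AR(1) process stands in for real audio, the LPC order of 10 is chosen arbitrarily, each frame carries one bit as the sign of the PN sequence (so the decoding threshold is zero rather than $d_w/2$), and all function names are hypothetical. Because the PN sequence itself is used as the correlation template, the watermark term of the decision value is `ALPHA`.

```python
import math
import random

FRAME = 1024   # frame length, matching the 1024-sample PN sequence in the text
ORDER = 10     # LPC order: an assumption, the text does not state it
ALPHA = 0.3    # constant gain: a stand-in for the psychoacoustic shaping


def make_pn(seed, n=FRAME):
    """Pseudo-noise sequence of +1/-1 values derived from a secret seed."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def embed_frame(x, bit, pn, alpha=ALPHA):
    """Eq. (1): y(n) = x(n) + w(n), here with w(n) = alpha * b * pn(n)."""
    b = 1.0 if bit else -1.0
    return [xi + alpha * b * ci for xi, ci in zip(x, pn)]


def lpc(y, order=ORDER):
    """Levinson-Durbin recursion; returns a_1..a_p of the predictor in Eq. (8)."""
    n = len(y)
    r = [sum(y[i] * y[i - k] for i in range(k, n)) for k in range(order + 1)]
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:          # degenerate (e.g. all-zero) frame
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]


def whiten(y, a):
    """Eq. (9): e(n) = y(n) - sum_k a_k y(n-k); samples before the frame count as 0."""
    return [
        y[n] - sum(a[k] * y[n - 1 - k] for k in range(min(len(a), n)))
        for n in range(len(y))
    ]


def decision(sig, pn):
    """Correlation decision value d = (1/N) * sum_i sig(i) * pn(i)."""
    return sum(s * c for s, c in zip(sig, pn)) / len(pn)


def false_prob(d_w, sigma):
    """Eq. (3): P+ = P- = 0.5 * erfc(d_w / (sqrt(8) * sigma))."""
    return 0.5 * math.erfc(d_w / (math.sqrt(8.0) * sigma))


def demo(num_bits=40, seed=1):
    rng = random.Random(seed)
    pn = make_pn(seed=1234)
    bits = [rng.random() < 0.5 for _ in range(num_bits)]
    decoded, plain_sq, white_sq = [], 0.0, 0.0
    x_prev = 0.0
    for bit in bits:
        # Synthetic, strongly correlated host frame (AR(1)), not real audio.
        x = []
        for _ in range(FRAME):
            x_prev = 0.95 * x_prev + rng.gauss(0.0, 1.0)
            x.append(x_prev)
        y = embed_frame(x, bit, pn)
        d_plain = decision(y, pn)
        d_white = decision(whiten(y, lpc(y)), pn)
        decoded.append(d_white > 0.0)   # blind decoding: the host x is never used
        target = ALPHA if bit else -ALPHA
        plain_sq += (d_plain - target) ** 2
        white_sq += (d_white - target) ** 2
    rms_plain = math.sqrt(plain_sq / num_bits)
    rms_white = math.sqrt(white_sq / num_bits)
    return bits, decoded, rms_plain, rms_white


if __name__ == "__main__":
    bits, decoded, rms_plain, rms_white = demo()
    print("bit errors after whitening:", sum(b != d for b, d in zip(bits, decoded)))
    print("decision noise (rms): plain %.4f, whitened %.4f" % (rms_plain, rms_white))
    print("Eq. (3) error probability: plain %.2e, whitened %.2e"
          % (false_prob(ALPHA, rms_plain), false_prob(ALPHA, rms_white)))
```

Running the script compares the spread of the decision value with and without LPC whitening on the same frames. Under these assumptions the whitened residual yields the smaller spread, which is the effect that Eq. (3) rewards with a lower error probability. A real embedder would replace the constant gain with the masking-threshold shaping of steps a) to f).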
[Fig. 4: block diagram of the audio watermark detection scheme, not recoverable; surviving labels: Linear Prediction Filtering, Residual Signal, Matched Filtering, PN Sequence, Copyright Information.]

[Test-item table, surviving rows only:]
Grandmas Hand | Livingston Taylor | Male a cappella
Flute Concerto in D | Vivaldi | Flute Concerto

[...] was embedded into the audio at a rate of 128 bits per 15 seconds. Figure 5 shows one frame of the audio signal and the watermark signal to be embedded.

Fig. 5. A frame of the audio signal and resulting watermark signal to be embedded. [Plot not recoverable; axes: Sample (100 to 1000) versus Normalized Amplitude (-0.1 to 0.1).]

1. Robustness Test
To evaluate the performance of the proposed watermarking algorithm, we tested its robustness according to the SDMI (Secure Digital Music Initiative) Phase-1 robustness test procedure [22]. The detailed robustness test procedure is as follows.

D/A, A/D conversion
Using a 16-bit full duplex sound card and a SONY TCD-D8 DAT, we performed the D/A and A/D conversions. First, the watermarked audio signal was sent to the DAT through the line-out terminal in the PC sound card. Next, the recorded watermarked audio signal in the DAT was sent to the PC. This means that the D/A and A/D conversions occurred twice.

Mix down and amplitude compression
A two-channel stereo audio signal was mixed down to a one-channel mono signal, and the amplitude of the audio signal was compressed with a non-linear gain factor.

[...] we selected MPEG-1 Audio Layer 3 with a bit rate of 64 kbps/stereo and MPEG-2 AAC with a bit rate of 128 kbps/stereo.

Table 2 shows the robustness test detection results for the various attacks. We obtained a perfect detection result for the mix down, echo addition, and amplitude compression. Although some errors occurred in the MPEG-1 Audio Layer 3 coding and band-pass filtering, their detection results were acceptable.

Table 2. Robustness test: detection results for various attacks.
Equalization | 0/768 | 0.00
[Remaining rows not recoverable.]

2. Subjective Listening Test
To further evaluate the watermarked audio quality, we also performed an informal subjective listening test according to the double-blind triple-stimulus with hidden reference method [23]. The subjective listening test results are summarized in Table 3 under Diffgrade and # of Transparent Items. The Diffgrade is equal to the subjective rating given to the watermarked test item minus the rating given to the hidden reference: a Diffgrade near 0.00 indicates a high level of quality. The Diffgrade can even be positive, which indicates an incorrect identification of the watermarked item, i.e., the quality is transparent. The Diffgrade scale is partitioned into five ranges: imperceptible (> 0.00), not annoying (0.00 to -1.00), slightly annoying (-1.00 to -2.00), annoying (-2.00 to -3.00), and very annoying (-3.00 to -4.00). The number of transparent items represents the number of incorrectly identified items. Fifteen listeners participated in the listening test. The quality of the watermarked audio signal was acceptable for all the test signals.

V. CONCLUSION
In this paper, we described a new algorithm for digital audio watermarking. The proposed watermark embedding scheme accomplishes perceptual transparency after watermark embedding by exploiting the masking effect of the human auditory system. This embedding scheme adapts the watermark so that the energy of the watermark is maximized under the constraint of keeping the auditory artifact as low as possible. Our detection procedure extracts copyright information without access to the original signal by applying whitening, or de-correlation, before correlation. The whitening procedure effectively removes the correlation in the signal. Experimental results showed that our watermarking scheme is robust to common signal processing attacks, such as mix down, amplitude compression, and data compression.
Despite the success of the proposed method, it also has a drawback: a synchronization problem. Because a PN sequence is used to generate the watermark signal, the scheme is vulnerable to time scale modification attacks, such as linear speed change, pitch-invariant time scale modification, and wow and flutter. This vulnerability is caused by the loss of synchronization between the PN sequence and the watermarked audio signal under time scale modification. Further research will focus on overcoming this problem.

REFERENCES

[1] M. Swanson, M. Kobayashi, and A. Tewfik, "Multimedia Data Embedding and Watermarking Technologies," Proc. of IEEE, vol. 86, no. 6, June 1998, pp. 1064-1087.
[2] C. Cox, J. Killian, T. Leighton, and T. Shamoon, "Secure Spread Spectrum Communication for Multimedia," Technical report, N.E.C. Research Institute, 1995.
[3] R.J. Anderson and F. Peticolas, "On the Limit of Steganography," IEEE J. Select. Areas Comm., vol. 16, May 1998, pp. 474-481.
[4] A. Shizaki, J. Tanimoto, M. Iwata, "A Digital Image Watermarking Scheme Withstanding Malicious Attacks," IEICE Trans. Fundamentals, vol. E-83-A, no. 10, Oct. 2000, pp. 2015-2022.
[5] F. Hartung and M. Kutter, "Multimedia Watermarking Techniques," Proc. of IEEE, vol. 86, no. 6, June 1998, pp. 1079-1107.
[6] W. Bender, D. Gruhl, and N. Morimoto, "Techniques for Data Hiding," Proc. SPIE, vol. 2420, San Jose, CA, Feb. 1995, p. 40.
[7] M. Swanson, B. Zhu, A. Tewfik, and L. Boney, "Robust Audio Watermarking Using Perceptual Masking," Signal Processing, vol. 66, no. 3, May 1998, pp. 337-355.
[8] C. Neubauer and J. Herre, "Digital Watermarking and its Influence on Audio Quality," 105th AES Convention, Audio Engineering Society preprint 4823, San Francisco, Sept. 1998.
[9] C. Neubauer and J. Herre, "Audio Watermarking of MPEG-2 AAC Bitstreams," 108th AES Convention, Audio Engineering Society preprint 5101, Paris, Feb. 2000.
[10] C. Neubauer and J. Herre, "Advanced Audio Watermarking and its Applications," 109th AES Convention, Audio Engineering Society preprint 5176, Los Angeles, Sept. 2000.
[11] P. Bassia, I. Pitas, and N. Nikolaidis, "Robust Audio Watermarking in the Time Domain," IEEE Trans. on Multimedia, vol. 3, no. 2, June 2001, pp. 232-241.
[12] C. Lee, K. Moallemi, and R. Warren, "Method and Apparatus for Transporting Auxiliary Data in Audio Signals," U.S. Patent 5,822,360, Oct. 1998.
[13] Recardo A. Garcia, "Digital Watermarking of Audio Signals Using a Psychoacoustic Auditory Model and Spread Spectrum Theory," Proc. of 107th AES Convention, preprint 5073, Sept. 1999.
[14] M. Veen, W. Oomen, F. Bruekers, J. Haitsma, T. Kalker, and A. Lemma, "Robust, Multi-Functional, and High Quality Audio Watermarking Technology," Proc. of 110th AES Convention, preprint 5345, May 2001.
[15] L. Boney, A. Tewfik, and K. Hamdy, "Digital Watermarks for Audio Signals," Proc. of Multimedia '96, 1996, pp. 473-480.
[16] C. Podilchuk and W. Zeng, "Image-Adaptive Watermarking Using Visual Models," IEEE J. on Selected Areas in Comm., special issue on Copyright and Privacy Protection, vol. 16, no. 4, May 1998, pp. 525-539.
[17] J.P.M.G. Linnartz, A.A.C. Kalker, G. Depovere, "On the Design of a Watermarking System: Considerations and Rationales," Information Hiding Workshop, Preliminary Proc. and Springer Verlag, LNCS 1768, Sept. 1999, pp. 254-269.
[18] G. Depovere, T. Kalker, and J.-P. Linnartz, "Improved Watermark Detection Reliability Using Filtering before Correlation," Proc. ICIP, vol. 1, 1998, pp. 430-434.
[19] ISO/IEC IS 11172, "Information Technology - Coding of Moving Pictures and Associated Audio for Digital Storage up to about 1.5 Mbits/s."
[20] T. Painter and A. Spanias, "Perceptual Coding of Digital Audio," Proc. of IEEE, vol. 88, no. 4, Apr. 2000, pp. 451-515.
[21] B.S. Atal and S.L. Hanauer, "Speech Analysis and Synthesis by Linear Prediction of the Speech Wave," J. Acoust. Soc. Am., vol. 50, 1971, pp. 637-655.
[22] SDMI Portable Device Specification 1.0, PDWG99070802, July 1999.
[23] ITU-R Rec. BS. 1116, "Method for the Subjective Assessment of Small Impairments in Audio Systems Including Multichannel Sound Systems," International Telecommunication Union, Geneva, Switzerland, 1994.

Jongwon Seok received his BS, MS and PhD degrees in electronics from Kyungpook National University, Taegu, Korea in 1993, 1995 and 1999. Since 1999, he has been a Senior Member of Research Staff in Electronics and Telecommunications Research Institute (ETRI), Korea. He has been engaged in the development of multichannel audio codec system, digital broadcast content protection and management. His research interests include digital signal processing in the field of multimedia communications, multimedia systems, and digital content protection and management.
Jinwoo Hong was born on April 15, 1959 in
Yujoo, Korea. He received his BS and MS de-
grees in electronic engineering from Kwang-
woon University, Seoul, Korea, in 1982 and
1984, respectively. He also received his PhD
degree in computer engineering from the same
university in 1993. Since 1984, he has been with
ETRI in Daejeon, Korea, as a Principal Member of Engineering Staff,
where he is currently a Team Manager of the Broadcasting Content
Protection Research Team. From 1998 to 1999, he researched at
Fraunhofer Institute in Erlangen, Germany, as a Visiting Researcher.
He has been engaged in the development of the TDX digital switching
system, multichannel audio codec system, digital broadcast content
protection and management. His research interests include audio &
speech coding, acoustic signal processing, audio watermarking, and
IPMP.