Session 02 - Paper 26

* Corresponding author: Abhishek Verma

Proceedings of International Conference on Computing Sciences
WILKES100 ICCS 2013
ISBN: 978-93-5107-172-3
A passive approach to detect copy-move forgery in digital speech audio signal
Abhishek Verma 1,*, Vivek Kumar Singh 2 and R. C. Tripathi 3

1 M.Tech., Speech and Image Processing Lab, Indian Institute of Information Technology Allahabad 211012, Uttar Pradesh, India.
2 Research Scholar, Speech and Image Processing Lab, Indian Institute of Information Technology Allahabad 211012, Uttar Pradesh, India.
3 Professor, Department of Human-Computer Interaction, Indian Institute of Information Technology Allahabad 211012, Uttar Pradesh, India.
Abstract
Advances in digital technology have made it very easy to tamper with multimedia content such as digital images and digital audio signals. Recent news about forgery in speech clips has attracted public attention in India as well as in other countries. With advanced and powerful audio editing software and tools, tampering with digital audio has become very common. In digital speech tampering, some words or part of a sentence are deleted from, or inserted into, the same signal or a different signal, and are sometimes mixed with another signal of the same person recorded at a different time. In this paper, the focus is on detecting 'copy-paste', also called 'copy-move', forgery. In this type of forgery, one portion of the signal (either a word or a part of a sentence) is copied and then pasted somewhere else in the same digital audio signal. The aim of this forgery is to alter the useful information in the original signal. The most common forgery cases are those in which an audio clip is the main evidence in a case, which requires that the evidence be authentic. We propose a method to detect copy-move forgery in digital speech clips. Using features of the audio signal, we examine the whole signal on a window-by-window basis. We show results on tampered speech clips by highlighting those parts of the signal that are suspected to be forged.
© 2013 Elsevier Science. All rights reserved.
Keywords: Digital speech Tampering, Copy-Move Forgery, Audio Features, Neighbor Shift matching
1. Introduction
Digital audio has become ubiquitous with the popularity of the internet and portable digital devices such as personal music players and smartphones. Meanwhile, the rapid development of low-cost, sophisticated editing software has made modifying audio files much easier for untrained users.
Digital forgery is a threat in which digital content is modified, altered or tampered with in order to mislead people for some benefit. With advances in internet technology, it is now very easy to collect and store digital multimedia content, and to upload it to or download it from websites. Because audio editing software is easily and freely available, altering audio in any manner has become very easy. Digital audio forgery has thus become a serious problem, and detecting it is itself a challenge. Forgery cases involving digital content have been reported all over the world.
Elsevier Publications, 2013
Figure 1.
The following are some high-profile cases that were, or still are, in controversy:
1. The controversial Amar Singh telephone recordings released in 2011 (not proven).
2. American actor Mel Gibson's controversial audio recording in 2010 [15].
3. 'Bin Laden's tape not genuine': the voice on the tape released in 2002 was reported not to be the voice of bin Laden.
The large number of forged audio recordings calls for more effective tools for the authentication and forgery detection of digital audio.
1.1 Types of forgery in the audio contents
Some common types of forgery in digital audio/speech signals are:
Insertion: A part of, or a complete, different audio clip is inserted at some place in the original audio clip.
Copy-Move: Some part of the signal is copied and then moved/pasted to a different location in the same signal. This is the type of forgery detected in this paper [Fig. 1].
Deletion: Some part is deleted from the audio/speech signal/clip.
Substitution: A part of some other signal is substituted into the original signal in place of a particular word, while the size of the original signal is not modified.
Splicing: Two audio/speech clips are mixed: sections from one audio clip are inserted into another.
1.2 Previous work
Forgery has become a very common and serious threat. A considerable number of image forgery detection techniques are already in use, whereas the detection of attacks on audio is still in its early phase. Math et al. [1] give a hierarchical overview of various kinds of digital forgeries and mention lineage (flow) mapping and source (origin) identification approaches for detecting forgery in digital content. Greige et al. [2] discuss biometric identity verification problems in audio-visual biometric systems, which select suitable audio signal features and extract them from the questioned speech signal. A fixed number of Mel-frequency cepstral coefficients (MFCC) are used to represent the spectral envelope of the speech signal to verify the speaker. Block matching is a very popular method for detecting copy-move forgery in digital images; it is used in the methods proposed in [3,5,10]. Singh et al. [3] propose an efficient and robust method to detect copy-move forgery, which is the basis of our approach. In that method, features are calculated for the diagonally divided blocks of the image, and lexicographical sorting and neighbor shift matching are then used to mark the suspected copied and pasted regions; in this way, it detects region duplication in the image. Lin et al. [5] present an integrated technique for splicing and copy-move forgery detection in images. Images are converted from the RGB color space into the YCbCr color space. Splicing is detected by computing the probability that a block is forged from the statistics of the histogram, which in turn is computed by checking the DQ effect of this histogram. For the detection of copy-move forgery, the SURF algorithm is used. Bayram et al. [10] use a block-by-block approach as in
[3]. The duplicated regions that are candidates for copy-move forgery are detected by calculating the feature vector of the image on a block-by-block basis. Lexicographical sorting and Bloom filters with suitable distance measures are then applied to confirm the copy-move forgery in the image.
Apart from this, very little work has been done on forgery in audio signals; it remains an open research area in digital signal processing. Work on digital audio forgery detection has been done by Yang et al. [8], whose method detects tampering in the MP3 audio format by using the number of active (non-zero) spectral coefficients as a function of frame offsets. Korycki [9] detects tampering by visual inspection of the audio signal in the time-frequency domain, from which the possibility of tampering is decided; the short-time Fourier transform and analysis based on the ENF criterion are used. Maher [4] explains common digital signal processing approaches such as the ENF (electric network frequency) criterion, time-domain level detection and frequency-domain filtering. Manual tampering detection procedures, such as physical examination and inspection of audio recordings, critical listening, waveform observation, and careful listening to the end of the recordings, are also mentioned; these are also used for analog audio recordings. Koenig et al. [11] present various approaches to the forensic examination of audio. Spectrographic analysis software, fast Fourier transform (FFT) analyzers and software, high-resolution waveform analysis software, etc. are described and explained. Forensic evidence examination procedures are given, such as evidence marking, physical inspection, digital data imaging, playback/conversion optimization, critical listening, high-resolution waveform analysis, narrow-band spectrum analysis, spectrographic analysis, digital data analysis, miscellaneous techniques, and work notes and reporting.
Digital audio signal features play a very important role in speaker verification, audio matching and establishing the authenticity of an audio signal. Various audio features are discussed in [6, 7]. Mitrovic et al. [6] describe various audio features that are useful in audio classification and matching. Temporal features, physical frequency features, perceptual frequency features, cepstral features, modulation frequency features, eigendomain features and phase space features are discussed, which are widely applicable in audio and speech signal processing. Herre et al. [7] propose audio features that are useful in audio matching: spectral peaks, the spectral flatness measure and the spectral crest factor. These features are well known for their predictability and robustness when matching two given audio signals. We also use these features in our method.
In addition, [12,13] give short surveys of the audio forensics field. Gupta et al. [12] discuss recent developments in forgery detection for digital audio files by describing audio processing systems with forgery detection. Techniques such as spectrogram analysis, device-based authentication and environment-based authentication are discussed. Kraetzer et al. [13] present approaches for digital media forensics that determine the microphones used and the recording environments of digital audio samples by using known audio steganalysis features. AAST and the WEKA data mining software are used for feature computation and classification.
In this paper, we describe a method to detect a common form of tampering in digital audio signals known as copy-move forgery, where a section of an audio signal is copied and pasted at another location in the same signal. Our method detects matching pairs of windows over the signal by comparing audio features using shift vector calculation.
2. Features Used
In digital audio signal processing there are several classes of parameters (features), such as low-level signal parameters, MFCC, and psychoacoustic features including roughness, loudness and sharpness. There is also a class of statistical features of the audio signal that are known for their robustness and low computational overhead. These statistical features perform well in the matching process. The features used in the proposed method are as follows.
Root mean square value (RMS value): It is given as:
$$x_{rms} = \sqrt{\frac{1}{N} \sum_{n=1}^{N} x(n)^2}$$
where x(n) is the signal's sample value at n, N is the length of the signal and x_rms is the root mean square value. It is useful for sinusoids, that is, where the variates are both positive and negative. For a periodic signal, it gives the same value as the RMS of one period of the signal.
Zero-crossing rate (ZCR): The rate at which the sign of the signal changes is called the zero-crossing rate. It is defined as:
$$ZCR = \frac{f_s}{2N} \sum_{i=2}^{N} \left| \mathrm{sign}(x(i)) - \mathrm{sign}(x(i-1)) \right|$$
where x(i) is the signal sample value at i, sign(·) gives the sign of the value at the ith position, f_s is the sampling rate of the signal and N is the signal length. This feature has been used heavily in both speech recognition and music information retrieval, being a key feature for classifying percussive sounds.
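The two time-domain features can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' MATLAB implementation: the function names are hypothetical, the ZCR normalisation fs/(2N) is one common convention that matches the symbols described above, and a zero-valued sample is given sign 0.

```python
import math

def sign(v):
    """Return +1 for positive values, -1 for negative values and 0 for zero."""
    return (v > 0) - (v < 0)

def rms(x):
    """Root mean square of the samples in x."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def zero_crossing_rate(x, fs):
    """Sign changes per second: fs / (2N) * sum |sign(x[i]) - sign(x[i-1])|."""
    n = len(x)
    changes = sum(abs(sign(x[i]) - sign(x[i - 1])) for i in range(1, n))
    return fs * changes / (2.0 * n)

# A toy window that alternates in sign at every sample (assumed fs = 4 Hz).
window = [1.0, -1.0, 1.0, -1.0]
print(rms(window))                    # 1.0
print(zero_crossing_rate(window, 4))  # 3.0
```

Two windows that contain the same copied samples produce identical values for both functions, which is the property the matching step relies on.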
Spectral flatness measure (SFM): Spectral flatness is calculated by dividing the geometric mean of the power spectrum by the arithmetic mean of the power spectrum. It is formulated as:
$$SFM = \frac{\left( \prod_{k=1}^{N} P(k) \right)^{1/N}}{\frac{1}{N} \sum_{k=1}^{N} P(k)}$$
where P(k) denotes the power spectral density (PSD) coefficient in frequency band k and N is the signal length. SFM shows robustness in the matching process between pairs of audio signals.
Spectral crest factor (SCF): It is the ratio of the largest power spectral density (PSD) value to the mean PSD value in the frequency band:
$$SCF = \frac{\max_{k} P(k)}{\frac{1}{N} \sum_{k=1}^{N} P(k)}$$
where P(k) denotes the PSD coefficient in frequency band k and N is the length of the signal.
The two spectral features SFM and SCF have the property that they are not significantly influenced by the speech coding system or by other intended modifications of the audio signal.
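As a rough illustration of the two spectral ratios, the sketch below computes them from a list of PSD coefficients. The function names are hypothetical, and the code assumes all coefficients are strictly positive (the logarithm used for the geometric mean is undefined otherwise).

```python
import math

def spectral_flatness(psd):
    """Geometric mean of the PSD coefficients divided by their arithmetic mean."""
    n = len(psd)
    # Average the logarithms so that the product cannot overflow or underflow.
    log_mean = sum(math.log(p) for p in psd) / n
    return math.exp(log_mean) / (sum(psd) / n)

def spectral_crest(psd):
    """Largest PSD coefficient divided by the arithmetic mean of the PSD."""
    return max(psd) / (sum(psd) / len(psd))

flat = [2.0, 2.0, 2.0, 2.0]     # flat, noise-like spectrum
peaky = [1.0, 1.0, 1.0, 13.0]   # one dominant peak, tone-like spectrum
print(spectral_flatness(flat), spectral_crest(flat))
print(spectral_flatness(peaky), spectral_crest(peaky))
```

A flat spectrum gives a flatness close to 1 and a crest factor of 1, while a spectrum with one dominant peak gives a lower flatness and a higher crest factor.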
Mel-Frequency Cepstral Coefficients (MFCC):
The mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound, and the mel-frequency cepstral coefficients are the coefficients that make up the MFC of a sound. This feature is very useful and popular in speaker recognition and verification systems. In our system, however, we focus on matching pairs that have the same vector of feature values, so MFCC is used here only for matching, not for recognition.
Power spectral density (spectral density mean and spectral density standard deviation): The power spectral density (PSD) describes how the power of a signal or time series is distributed over frequency. The PSD coefficients of the signal are calculated using a formula in which P(k) denotes the PSD coefficient in frequency band k, Nw denotes the number of overlapped segments of x(n), x(n) is divided into windows of length Nd, w(n) is the chosen window (Hamming window), and the expression inside |·| is simply the discrete Fourier transform (DFT) of the signal.
SFM and SCF are computed from the power spectral density, so statistics of the PSD such as its mean, standard deviation, minimum and maximum can also be helpful. For this reason, the spectral density mean and the spectral density standard deviation are used as features in the proposed method.
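The description of the PSD (overlapped segments, a Hamming window and a DFT inside a magnitude) resembles an averaged periodogram. The sketch below is one plain-Python reading of that description, not the authors' code: the function names, the absence of any normalisation constant, the one-sided band range and the use of the population standard deviation are all assumptions made for illustration.

```python
import cmath
import math

def hamming(nd):
    """Hamming window of length nd."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (nd - 1)) for n in range(nd)]

def averaged_periodogram(x, nd, overlap):
    """Average |DFT|^2 over Hamming-windowed, overlapping segments of x."""
    w = hamming(nd)
    step = nd - overlap
    starts = range(0, len(x) - nd + 1, step)
    psd = [0.0] * (nd // 2 + 1)          # one-sided: bands 0 .. nd/2
    for s in starts:
        seg = [x[s + n] * w[n] for n in range(nd)]
        for k in range(len(psd)):
            dft = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / nd)
                      for n in range(nd))
            psd[k] += abs(dft) ** 2
    return [p / len(starts) for p in psd]

def mean_and_std(values):
    """Arithmetic mean and population standard deviation."""
    m = sum(values) / len(values)
    return m, math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

# A pure tone with 4 cycles per 16-sample segment: power peaks in band k = 4.
x = [math.sin(2 * math.pi * 4 * n / 16) for n in range(64)]
psd = averaged_periodogram(x, nd=16, overlap=8)
print(psd.index(max(psd)))   # 4
print(mean_and_std(psd))
```

The direct DFT loop is quadratic in the segment length and is only meant to keep the example dependency-free; an FFT routine would normally be used for real windows.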
3. Proposed Method
Figure 2. Flow chart of the proposed methodology.
The proposed method takes a digital speech audio signal as input and processes it in the time domain. To reduce the size of the input signal, downsampling can be done.
After this, the signal is divided into overlapping windows of the same size. These windows are then matched to detect whether any copy-move forgery is present in the signal.
The detailed procedure is divided into the following parts:
Downsampling (optional): In its time-domain representation, the signal has a large number of sample values, which take a long time to process. Downsampling with a factor of 1 is therefore done on the signal to reduce its size while respecting the Nyquist rate. Because downsampling risks losing sample values that may have been tampered with, this step can be skipped.
Windowing of samples: After step 1, the windowing process starts. Consider M samples in a window, with N overlapping sample values between two adjacent windows of the signal; every window is treated as a signal in itself.
So for a signal of length L, there will be
$$W = \begin{cases} L/M, & \text{if } N = 0 \\ L - (M - (M - N)), & \text{if } N = M - 1 \end{cases}$$
windows in the given signal.
For example, if the signal length is 30,000, the value taken for M is 1000 samples and N = 999 overlapping samples between adjacent windows, then the total number of windows for the given signal will be
$$W = 30{,}000 - (1000 - (1000 - 999)) = 29{,}001$$
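Both cases of the window count can be checked with a short sketch. The general hop-based expression used here, floor((L - M) / (M - N)) + 1, is an assumption that is not stated in the text; it reduces to L/M for N = 0 (when M divides L) and to L - M + 1 for N = M - 1. The function name is hypothetical.

```python
def count_windows(length, m, n):
    """Number of full windows of m samples when adjacent windows overlap by n."""
    if not 0 <= n < m <= length:
        raise ValueError("need 0 <= n < m <= length")
    hop = m - n                      # samples between the starts of adjacent windows
    return (length - m) // hop + 1

print(count_windows(30000, 1000, 0))     # 30
print(count_windows(30000, 1000, 999))   # 29001
```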
Feature extraction/computation: The next step is to extract/compute the features discussed in Section 2 for every window, treating every window as an individual signal.
Let Bi denote the ith window (i = 1, 2, ..., W), where W is the total number of windows for the given input signal. With the feature set Fij (i = 1, 2, ..., W; j = 1, 2, ..., 7), a vector of the form
(Fij, Bi)
is generated, where Bi is the window number and Fij is the vector of seven feature values.
Features are extracted/calculated directly from each window; the computation is as follows.
Experiments show that the root mean square (RMS) and the zero-crossing rate (ZCR) have exactly the same values for a matched pair. The first feature computed, Fij (j = 1), is therefore the RMS, and the ZCR is then computed as Fij (j = 2).
After these two statistical features, the MFCC is computed as Fij (j = 3). The first coefficients of the resulting MFCC array for the window are taken. Because the MFCC is characteristic of the speech, it gives almost the same coefficient values for the same speech signal, so it is computed as the third feature.
Next, the two spectral features SFM and SCF are computed as Fij (j = 4 and j = 5); as already mentioned, these two are known for their robustness and are also used in audio matching. The power spectral density (PSD) coefficients are used for SFM and SCF. The feature computation ends with the last two features: the mean and standard deviation of the PSD are calculated as Fij (j = 6 and j = 7). These two feature values are comparatively less informative about matched pairs, but if thresholding with a very narrow range is used, the chance of finding almost identical feature values increases. Now store the set (Fij, Bi) in a multidimensional array A, which has as many rows as there are windows and eight columns. A contains only real values.
The array A has the following form:
Fig 3. Structure of the array
After all the features have been calculated for every window, the matching process starts.
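The construction of the array can be sketched as a sliding window that emits one row per window: the feature values followed by the window's starting sample. To stay short, this illustration uses only two stand-in features instead of the seven used in the paper, and every name in it is hypothetical.

```python
import math

def rms(x):
    """Root mean square of the samples in x."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def sign_changes(x):
    """Count sign changes between consecutive samples (zero counts as non-positive)."""
    return sum(1 for a, b in zip(x, x[1:]) if (a > 0) != (b > 0))

def build_feature_array(signal, m, n, feature_functions):
    """Return one row per window: the feature values, then the start index."""
    hop = m - n
    rows = []
    for start in range(0, len(signal) - m + 1, hop):
        window = signal[start:start + m]
        rows.append([f(window) for f in feature_functions] + [start])
    return rows

# The last four samples repeat the first four, imitating a copy-move forgery.
signal = [0.0, 1.0, -1.0, 2.0, 0.0, 1.0, -1.0, 2.0]
A = build_feature_array(signal, m=4, n=3, feature_functions=[rms, sign_changes])
for row in A:
    print(row)
```

The rows for the windows starting at samples 0 and 4 carry identical feature values, which is what the later sorting and matching steps look for.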
Lexicographical sorting: All these values are now stored in the multidimensional array A. Every row of this array has eight values: one window number, which is the starting sample position of the window, and the seven feature values described above. The array therefore has W rows and eight columns. The seven feature columns are used for sorting: the highest priority is given to features f4 and f5, then f1 and f2, and then all the other features.
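A prioritised lexicographical sort can be expressed with a tuple key. In this sketch the order among the remaining features (third, sixth, seventh) is an assumption, since the text only says they come last, and the rows are made-up values used purely for illustration.

```python
# Column order in each row: F1 (RMS), F2 (ZCR), F3 (MFCC), F4 (SFM), F5 (SCF),
# F6 (PSD mean), F7 (PSD standard deviation), then the window number.
PRIORITY = [3, 4, 0, 1, 2, 5, 6]   # zero-based: F4, F5, F1, F2, then the rest

def lexicographic_sort(rows):
    """Sort rows by the seven features, comparing columns in PRIORITY order."""
    return sorted(rows, key=lambda row: tuple(row[c] for c in PRIORITY))

A = [
    [0.30, 10, 1.2, 0.50, 2.0, 0.7, 0.1, 0],
    [0.10, 12, 1.1, 0.20, 3.0, 0.6, 0.2, 1],
    [0.30, 10, 1.2, 0.50, 2.0, 0.7, 0.1, 2],
    [0.20, 11, 1.3, 0.20, 1.0, 0.5, 0.3, 3],
]
for row in lexicographic_sort(A):
    print(row[-1], row[:-1])
```

After sorting, windows 0 and 2, which share all seven feature values, end up in adjacent rows.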
Detection of suspected copied region: Once array A is sorted, rows representing windows with very similar features lie next to each other. Matched pairs, or sets of pairs, that are candidates for a forged part are extracted by adding/subtracting preset threshold values, which differ between features:
±0.001 for the MFCC, SFM and SCF feature values,
±0.007 for the standard deviation of the PSD values and ±0.01 for the mean of the PSD values.
These thresholds were used in all the experiments and, apart from a few cases, almost all of the tampered signals were detected.
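The thresholding step might look like the following sketch, which compares each row of the sorted array with the row below it. The tolerances for the third to seventh features are the values quoted above; requiring exact equality for RMS and ZCR, and comparing only adjacent rows, are assumptions made for this illustration, as are all names and sample values.

```python
# Per-feature tolerances in column order F1..F7 (RMS, ZCR, MFCC, SFM, SCF,
# PSD mean, PSD standard deviation). Zero tolerance for RMS and ZCR is assumed.
TOLERANCE = [0.0, 0.0, 0.001, 0.001, 0.001, 0.01, 0.007]

def is_match(row_a, row_b):
    """True when every feature of the two rows agrees within its tolerance."""
    return all(abs(a - b) <= tol
               for a, b, tol in zip(row_a[:7], row_b[:7], TOLERANCE))

def suspected_pairs(sorted_rows):
    """Compare each row of the sorted array with the next one and keep matches."""
    pairs = []
    for upper, lower in zip(sorted_rows, sorted_rows[1:]):
        if is_match(upper, lower):
            pairs.append((upper[7], lower[7]))   # window numbers of the pair
    return pairs

sorted_rows = [
    [0.2, 11, 1.3000, 0.2000, 1.0, 0.500, 0.300, 3],
    [0.3, 10, 1.2000, 0.5000, 2.0, 0.700, 0.100, 0],
    [0.3, 10, 1.2004, 0.5003, 2.0, 0.705, 0.104, 2],
]
print(suspected_pairs(sorted_rows))   # [(0, 2)]
```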
Neighbor shift matching: It is quite possible that the matching windows detected after step 5 are simply adjacent to each other. Such matching pairs are discarded because they cannot be considered a forged part: the windows are offset by just one sample, and a one-sample forgery is meaningless, since a single sample cannot represent a word or a letter in the signal. In neighbor shift matching, those (Fij, Bi) vectors that form a long chain of corresponding matching windows are kept, and the other vectors are discarded from the array obtained in step 5. The neighbor shift value is calculated by subtracting two corresponding feature vectors. The shift vector is the same over the entire suspected duplicated region [3], as in (1):
$$\mathrm{Shift}(\text{pair } j) = \mathrm{Shift}(\text{pair } k) \qquad (1)$$
Two copied and moved areas yield pairs of almost identical features, and these pairs yield the same shift vector. The shift vector is checked for a particular number of neighbors; a repeated shift vector indicates the duplicated region.
The following results show the effect of neighbor shift matching.
Fig 5. Before neighbor shift matching
Fig 6. After neighbor shift matching
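One way to read the neighbor shift step is as a vote over shifts: the shift of a matched pair is taken here as the difference between the two window positions, pairs that are too close together are dropped, and only shifts shared by enough pairs survive. This is a simplified sketch under those assumptions; it does not check that the surviving windows are contiguous, and the parameter names and values are hypothetical.

```python
from collections import Counter

def neighbor_shift_filter(pairs, min_shift, min_count):
    """Keep matched pairs whose shift is shared by at least min_count pairs.

    pairs     : iterable of (window_a, window_b) start positions
    min_shift : pairs closer together than this are treated as plain neighbours
    min_count : how many pairs must share a shift to count as a chain
    """
    shifts = {}
    for a, b in pairs:
        shift = abs(b - a)
        if shift >= min_shift:               # discard adjacent, overlapping windows
            shifts[(a, b)] = shift
    counts = Counter(shifts.values())
    return sorted(p for p, s in shifts.items() if counts[s] >= min_count)

pairs = [(10, 11), (100, 600), (101, 601), (102, 602), (40, 900)]
print(neighbor_shift_filter(pairs, min_shift=50, min_count=3))
# [(100, 600), (101, 601), (102, 602)]
```

The pair (10, 11) is removed as a pair of neighbouring windows, and (40, 900) is removed because no other pair shares its shift.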
Detection and marking of duplicated region: The windows suspected of being tampered with are now in hand, and after step 6 the decision about the tampered region can be taken. The proposed method marks the tampered regions in the time-domain plot of the signal. The two marked regions in the result indicate that one of them was copied and pasted at the other place, but the method cannot tell which one was copied and which was pasted. Except for some false matches, the system detects the tampered region successfully.
The overall procedure is given below in the form of an algorithm.
3.1 Algorithm: copy-move forgery detection of digital audio speech signal
Let x(n) be the digital speech audio signal represented in the time domain. M is the window length set by the user and L is the length of the signal. Fij (i = 1, 2, 3, ..., W; j = 1, 2, ..., 7) is the feature vector for every window. Bi is the ith window, where i is the first sample location as well as the window number. W is the total number of windows for the input signal, computed by the formula given in Section 3. N is the number of samples that overlap between adjacent windows. A is the multidimensional array of size (W × 8), where the ith row of A is the vector (Fij, Bi) of length 8. The algorithm for the proposed method is as follows:
Table 1. Copy-move forgery detection algorithm
// Set the values of M and N
Initialize M, N
// Determine the total number of windows in the given signal
Calculate W
// Create and initialize the multidimensional array
Create and initialize array A
// Do windowing and calculate/extract the features one by one for every window
Begin
    For i = 1 to W
        Get window Bi
        For j = 1 to 7
            Extract/calculate feature Fij for Bi
        End
        Store (Fij, Bi) in A
    End
    Do lexicographical sorting of A
    For i = 1 to W
        Threshold the window features to find the suspected pairs
    End
    For i = 1 to W
        Do neighbor shift matching over array A
    End
    Decide and mark
End
4. Experimental result/Description
The proposed copy-move forgery detection algorithm presented in Section 3 was implemented in MATLAB 7.10.0 on a computer with a 3.0 GHz CPU, 4 GB of main memory and 460 GB of secondary storage. We collected a set of 100 high-quality speech audio signals with a 32 kHz sampling rate and a length of around 4-5 seconds from the TIMIT dataset [14]. Each speech clip is recorded in either a male or a female voice, but not both. The forged signals were made with popular audio editing software (such as Audacity and Cool Edit). We made 100 copy-move forged signals corresponding to the 100 original signals chosen from the dataset. Apart from the copy-move forgery, all the test audio clips are free from any other type of audio tampering discussed in Section 1.1.
For all the results, we used a window size of 1000 samples with 999 samples overlapping between two adjacent windows. Downsampling was also done for some audio clips. Thresholding was used whenever needed to retrieve the chains of matched window pairs that are very similar to each other; the threshold range used is very small.
For example, the audio clip shown here is recorded in a female voice. The sentence in the clip is 'guess the question from the answers' in the original TIMIT [14] dataset. We forged this audio clip as 'guess the question from the answers guess' and checked it for the tampering that is known to be present. We started the procedure without downsampling the forged clip. The window size was 1000 samples with 999 samples overlapping between adjacent windows. No other manipulation was done to, or present in, the clip under test. The procedure takes a long time, but we gave priority to the accuracy of the result regardless of the time taken. Various attempts were made with different thresholds and different shift vector counts, and the process was repeated more than once on the audio clip to improve accuracy. Apart from some false detections, we were able to decide that a forgery had been attempted on the given audio by marking the regions suspected of being tampered with.
Fig 6a. Forged audio clip    Fig 6b. Detection results
The two rectangular boxes show the suspected parts that were copied and pasted, but the method cannot tell which part was copied and which was pasted. One of the boxed parts of the signal was copied and then pasted at the location of the other box.
The two oval boxes mark some false matching pairs that were also detected. Because total accuracy is not achievable, we tried to detect as much of the doctored part as possible, and we succeeded in almost all the tested audio clips.
To characterize the accuracy of the proposed method, we first explore the performance and contribution of the features used in the method.
We used a set of seven features: zero-crossing rate (ZCR), root mean square (RMS), mel-frequency cepstral coefficients (MFCC), spectral flatness measure (SFM), spectral crest factor (SCF), the mean of the power spectral density (PSD) coefficients and the standard deviation of the PSD coefficients. Using this many features in the matching process is essential.
Let L be the length of the signal and r the region that is copied and pasted within it. The resulting tampered signal has length (L + r), and the region (L − r) is unmanipulated. A genuine region of the signal that is retrieved by our method is called a false matching pair. The reason for using seven features can be understood by analyzing the percentage of false matching pairs retrieved:
Table 2. Comparison of results on the basis of different features
Features used | Percentage of false matching pairs
RMS value and ZCR value only | 70
RMS value, ZCR value, MFCC with threshold | 45
RMS value, ZCR value, MFCC value with appropriate threshold value, PSD coefficients' mean and standard deviation value | 30
SFM value and SCF value with appropriate thresholding | 20
All features used | 8-10
5. Conclusion
In this paper, copy-move forgery in digital speech audio signals is addressed, for the purpose of authentication, by means of a matching process. The signal is windowed in the time domain and each window is treated as a signal in itself. Using suitable audio features, the method shows the suspected forged regions of the signal through the matching pairs of windows returned by shift vector calculation. Features known for their robustness, such as spectral flatness and spectral crest factor, were chosen. These features allow the matching pairs to be found within a threshold constraint. Apart from the time taken by the experiment, the method is able to decide on the authenticity of the signal. We are currently developing further methods of detecting copy-move forgery that exploit audio features.
Acknowledgement:
We are thankful to IIIT Allahabad for providing the necessary resources in the Speech, Image and Language Processing Lab. We are also thankful to TCS RSP for their support of this work.
References
[1] Shrishail Math and R. C. Tripathi. Digital Forgeries: Problems and Challenges. International Journal of Computer Applications (0975-8887), Vol. 5, No. 12, August 2010.
[2] Hanna Greige and Walid Karam. Audio-Visual Biometrics and Forgery. Advanced Biometric Technologies, Dr. Girija Chetty (Ed.), ISBN: 978-953-307-487-0, 2011.
[3] V. K. Singh and R. C. Tripathi. Fast and Efficient Region Duplication Detection in Digital Images using Sub-Blocking Method. International Journal of Advanced Science and Technology, Vol. 35, pp. 93-102, 2011.
[4] Robert C. Maher. Audio Forensic Examination [Authenticity, Enhancement, and Interpretation]. IEEE Signal Processing Magazine [84], March 2009.
[5] Shinfeng D. Lin and Tszan Wu. An Integrated Technique for Splicing and Copy-move Forgery Image Detection. 4th International Congress on Image and Signal Processing, 2011.
[6] Dalibor Mitrovic, Matthias Zeppelzauer and Christian Breiteneder. Features for Content-Based Audio Retrieval. Advances in Computers, Vol. 78, pp. 71-150, 2010.
[7] Jürgen Herre, Eric Allamanche and Oliver Hellmuth. Robust Matching of Audio Signals Using Spectral Flatness Features. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics 2001, 21-24 October 2001.
[8] Rui Yang, Zhenhua Qu and Jiwu Huang. Detecting Digital Audio Forgeries by Checking Frame Offsets. ACM 978-1-60558-058, MM&Sec'08, September 22-23, 2008.
[9] Rafal Korycki. Methods of Time-Frequency Analysis in Authentication of Digital Audio Recordings. International Journal of Electronics and Telecommunications, Vol. 56, September 2010.
[10] Sevinc Bayram, Husrev Taha Sencar and Nasir Memon. An Efficient and Robust Method for Detecting Copy-Move Forgery. IEEE ICASSP 2009.
[11] Bruce E. Koenig and Douglas S. Lacey. Forensic Authentication of Digital Audio Recordings. J. Audio Eng. Soc., Vol. 57, No. 9, September 2009.
[12] Swati Gupta. Current Developments and Future Trends in Audio Authentication. IEEE Computer Society, 2012.
[13] Christian Kraetzer, Andrea Oermann, Jana Dittmann and Andreas Lang. Digital Audio Forensics: A First Practical Evaluation on Microphone and Environment Classification. ACM 978-1-59593-857-2, September 2007.
[14] J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, and N. L. Dahlgren. DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM, 1993. National Institute of Standards and Technology, NISTIR 4930.
[15] ABC News. Did someone mess with Mel Gibson's audio recordings? http://abcnews.go.com/Entertainment/mel-gibsons-rants-messed/story?id=11169736, July 2010.