This article is reproduced from the final original delivered by the author, without edits, corrections or considerations by the technical committee. AES Brasil is not responsible for its content. Other articles may be obtained through the Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA, www.aes.org. Information about the Brazilian section is available at www.aesbrasil.org. All rights reserved. Total or partial reproduction of this article without the express authorization of AES Brasil is not permitted.
_________________________________
Automatic Estimation of Harmonic Complexity in Audio
José Fornari¹ & Tuomas Eerola¹
¹Finnish Centre of Excellence in Interdisciplinary Music Research
Department of Music, University of Jyväskylä
Jyväskylä, P.O. Box 35(M), Finland
{fornari,tuomas.eerola}@campus.jyu.fi
ABSTRACT
Music complexity is here defined as the perceived entropy conveyed by musical stimuli. Part of this information is given by musical chords, more specifically by their structure (static) and progression (dynamic). We name this subset of music complexity Harmonic Complexity (HC). A computational model able to automatically estimate HC in musical audio files can be useful in a broad range of applications in sound processing, computer music and music information retrieval. For instance, it could be used in large-scale music database searches or in adaptive audio processing models. Studies on the automatic estimation of HC directly from audio files are rare in the literature. This work studies several principles related to the perception of HC in order to propose an initial audio model for its automatic estimation. The predictions of these principles were compared with data acquired in a behavioral experiment in which thirty-three listeners rated one hundred music excerpts. Lastly, a linear model combining these principles is presented. The results are shown and described based on the comparison of the listeners' mean ratings with the principle and model predictions.
(from simple triads, major, minor, augmented, diminished, to complex clusters), 2) chord functions (fundamental, dominant, subdominant, etc.), 3) chord tensions (related to the presence of 7ths, 9ths, 13ths, etc.) and 4) chord progressions (diatonic, modal, chromatic, atonal). As these principles are based on the musical context, they lead to the development of a high-level descriptor, as described in [4], in order to properly predict the perception of HC in a computational model.

A high-level musical descriptor of HC is expected to deliver a scalar measurement that represents the overall HC of a musical excerpt. This perception is conveyed only in excerpts that are longer than the cognitive "now time" of music, around three to six seconds of duration [5].

The work presented here describes the study of several principles that seem to be related to the human perception of HC. We built a computational model with these principles, which were then analyzed and improved toward the best prediction of HC we could achieve, as shown in the experimental results.

1 COMPUTATIONAL MODEL DESIGN

The first step in creating a computational model for HC was to investigate principles related to the amount of complexity found in chord structures and their progressions. Chords arise in the region of the musical scale where note events occur close enough in time to be perceived as simultaneous and are therefore interpreted by human audition as chords. In audio signal processing, this corresponds to the fundamental partials that coexist within a narrow time frame of approximately fifty milliseconds and are gathered in the frequency region where musical chords mostly occur. In musical terms, this region normally lies below the melody and above the bass line. However, as the bass line also influences the interpretation of harmony, it too had to be taken into consideration in the HC model.

The model starts by calculating chromagrams for these two regions (bass and harmony). A chromagram is a form of spectrogram whose partials are folded into one musical octave, with twelve bins corresponding to the notes of the chromatic scale. We separate each region using band-pass digital filters in order to attenuate the effects of partials not related to the musical harmonic structure.

These two chromagrams are calculated for each time frame of audio corresponding to the window size of its lowest frequency. Each chromagram is therefore presented as a matrix with twelve rows, each corresponding to one note of the chromatic scale, and several columns corresponding to the number of time frames. We then tested three principles that initially seemed to be related to the perception of HC, as they were derived from the studies and evidence mentioned in the introduction of this work.

The first principle is what we name here auto-similarity. It measures how similar each chromagram column is to the others. This calculation returns an array whose size equals the number of columns in the chromagram. The variation of this array is inversely proportional to the auto-similarity. The rationale behind this principle is that auto-similarity may be proportional to the randomness of the chromagram, which in turn seems to be related to chord progression, one of the studied aspects of HC perception.

Next, these two chromagram matrices are collapsed in time, meaning that their columns are summed and the resulting twelve-point arrays normalized. As the music excerpts rated in this experiment were all five seconds long, collapsing the chromagram seemed reasonable, since this duration is close to the cognitive "now time" of music; for longer audio files, however, a more sophisticated approach should be developed. In this work, the collapsed chromagram array proved to efficiently represent the overall harmonic information over this particular short duration. We then calculate two principles from these two arrays, named here energy and density. Energy is the sum of the squares of the array elements; the idea was to investigate whether the array energy was somehow related to chord structure. The second principle, density, accounts for the array's sparsity: for each array, the fewer the zeros between nonzero elements, the greater its density. The intention was to investigate whether the collapsed chromagram density is related to chord structure. This came from the notion that, in terms of musical harmony, the closer the notes in a chord are, the more complex the harmony tends to be. Figure 1 shows the diagram for the calculation of these six principles.

Figure 1 HC Principles Diagram.

This computational model was written and simulated in Matlab. As seen in Figure 1, the model calculates these three principles (auto-similarity, density and energy) for the two chromagrams representing the frequency regions corresponding to the musical bass line and the harmony. Results of our measurements are described in the next section.

2 HC LISTENERS RATING

For the evaluation of the computational model, a group of thirty-three music students was asked to rate the harmonic complexity of one hundred music excerpts, each five seconds long. These excerpts were extracted from instrumental (without singing voice) movie soundtracks. The experiment was run individually on a computer and the order of the examples was randomized. Ratings were given on a nine-point Likert scale [6] ranging from "no complexity" (at the leftmost side) to "extremely complex" (at the rightmost side).
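The three principles described above (the model itself was written in Matlab) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the exact similarity measure and density formula are not fully specified in the text, so the choices below (cosine similarity against the mean column; density as the fraction of nonzero bins within the span of nonzero elements) are one plausible reading.

```python
import numpy as np

def collapse_chromagram(chroma):
    """Collapse a 12 x n_frames chromagram in time: sum the columns and
    normalize the resulting twelve-point array to a peak of 1."""
    v = chroma.sum(axis=1)
    peak = v.max()
    return v / peak if peak > 0 else v

def energy(collapsed):
    """Energy principle: sum of the squares of the array elements."""
    return float(np.sum(collapsed ** 2))

def density(collapsed, eps=1e-6):
    """Density principle (one reading of the paper's description): fraction
    of nonzero bins within the span from the first to the last nonzero
    element -- fewer internal zeros means higher density."""
    nz = np.flatnonzero(collapsed > eps)
    if nz.size == 0:
        return 0.0
    span = nz[-1] - nz[0] + 1
    return float(nz.size / span)

def auto_similarity_variation(chroma):
    """Auto-similarity principle: cosine similarity of each column with the
    mean column; the paper takes the variation of the similarity array as an
    inverse index of auto-similarity (here, its variance)."""
    ref = chroma.mean(axis=1)
    norms = np.linalg.norm(chroma, axis=0) * np.linalg.norm(ref)
    sims = (chroma.T @ ref) / np.where(norms > 0, norms, 1.0)
    return float(np.var(sims))
```

Consistent with the notion above, a tight chromatic cluster (C, C#, D) yields a higher density than a spread C major triad (C, E, G), while a steady, unchanging progression yields near-zero variation in the similarity array.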
6º CONGRESSO / 12ª CONVENÇÃO NACIONAL DA AES BRASIL, SÃO PAULO, 5 A 7 DE MAIO DE 2008
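The abstract mentions a linear model combining the principles, fitted against the listeners' mean ratings. The excerpt does not give the fitted weights or which principles enter, so the following is only a sketch of such a fit on synthetic data: six hypothetical principle values (energy, density and auto-similarity for the bass and harmony regions) per excerpt, combined by ordinary least squares and compared to the ratings by Pearson correlation.

```python
import numpy as np

# Synthetic stand-in data: 100 excerpts x 6 principle values, plus
# mean listener ratings generated from made-up weights with light noise.
rng = np.random.default_rng(0)
X = rng.random((100, 6))
true_w = np.array([0.8, 1.5, -0.4, 0.3, 1.1, 0.2])  # hypothetical weights
y = X @ true_w + 0.05 * rng.standard_normal(100)     # synthetic mean ratings

# Ordinary least squares with an intercept column.
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = A @ coef

# Pearson correlation between model predictions and ratings.
r = np.corrcoef(pred, y)[0, 1]
```

With real data, `X` would hold the principle values computed from the audio excerpts and `y` the mean ratings from the listening experiment.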