
Musical Acoustics and Human Vowel Sounds

Rik Ghosh
Lab Partners: Charlie Rose, Harrison Chang
April 26, 2016

Abstract
We quantitatively explore sine waves, square waves, and ramp-up waves, both sound and electronic, using Fourier Analysis and the Fast Fourier Transform (FFT). Comparing our data to theoretically predicted results, we confirm the reliability of the FFT program and that any of these waveforms can be represented as a sum of simpler sinusoidal waves, though some results are inconsistent with theory at higher FFT peak frequencies; possible sources of error are discussed. We then sing three different human vowel sounds and gather FFT data to confirm that different vowels can be characterized by a unique decomposition of FFT peaks. To conclude, we effectively generate one of these vowel sounds by layering multiple computer-generated sine waves driven at the relevant experimentally determined peak frequencies, further confirming our results.

INTRODUCTION
All sound is created by periodic waves of differing air pressure, made when air molecules go through cycles of compression and decompression. From vibrating strings to standing waves in a column of air, different types of instruments have varying methods of bringing this about, but the ultimate result is always a longitudinal, periodic sound wave.
As such, the displacement of these air particles, sound itself, can be modeled by 2π-periodic functions. Actual sounds, however, are complicated, since they can be made up of waves that are not necessarily sinusoidal. Such waves can nevertheless be very well approximated by summing multiple sine waves of various frequencies and amplitudes; such a sum is known as a wave's Fourier Series. Audio analysis software is capable of executing a Fast Fourier Transform (FFT), which, given an audio input, returns the frequencies and amplitudes (relative to the wave with the greatest amplitude) of the sine waves that make up its Fourier Series.
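In standard notation (ours, not the report's), a periodic wave x(t) with fundamental frequency f_0 has the Fourier Series below. A_n and φ_n are the amplitude and phase of the n-th harmonic, and dividing each A_n by the largest one gives the kind of relative amplitude the FFT program reports.

$$
x(t) = a_0 + \sum_{n=1}^{\infty} \left[ a_n \cos(2\pi n f_0 t) + b_n \sin(2\pi n f_0 t) \right]
     = a_0 + \sum_{n=1}^{\infty} A_n \sin(2\pi n f_0 t + \varphi_n),
$$
$$
A_n = \sqrt{a_n^2 + b_n^2}, \qquad \text{relative amplitude of harmonic } n = \frac{A_n}{\max_m A_m}.
$$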
In this experiment, we use such Fourier Analysis to find the sine waves that make up various complicated waveforms, both sound and electronically generated. Then, noting that human vowel sounds are characterized by a distinct combination of waves of different frequencies (called formants), we find the formants that make up three different human vowel sounds using our own singing as an input. Finally, we layer electronically generated sine waves of these frequencies to successfully form one particular human vowel from multiple computer-generated straight tones.

EXPERIMENTAL DESIGN

Figure 1: A PASCO power amplifier outputs various waves at 630 Hz that are fed into a computer running
the FFT program on DataStudio

The setup used to conduct Fourier Analysis on electronic waveforms is shown in Figure 1. The PASCO power amplifier generates each type of waveform at 630 Hz, and the signal is fed directly into DataStudio's FFT program. In separate trials, the power amplifier outputs sine waves, square waves, and ramp-up waves. The peak frequencies (and corresponding amplitudes) are recorded, and the values are compared to theory.
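The FFT step can be sketched in Python with NumPy. This is not DataStudio's implementation: the 44.1 kHz sample rate, the 1 s window, the 5% peak threshold, and the helper names `fft_peaks`, `square_wave`, and `ramp_wave` are assumptions chosen for illustration. The 630 Hz test signals are synthesized so that each period spans exactly 70 samples, which avoids spectral leakage.

```python
import numpy as np

FS = 44_100   # assumed sample rate (Hz)
F0 = 630      # drive frequency used in the experiment (Hz)


def square_wave(f0: int, fs: int, n_samples: int) -> np.ndarray:
    """Ideal +/-1 square wave; integer phase arithmetic avoids rounding at the edges."""
    k = np.arange(n_samples)
    return np.where((k * f0) % fs < fs / 2, 1.0, -1.0)


def ramp_wave(f0: int, fs: int, n_samples: int) -> np.ndarray:
    """Ramp-up (sawtooth) wave rising from -1 toward +1 in every period."""
    k = np.arange(n_samples)
    return 2.0 * ((k * f0) % fs) / fs - 1.0


def fft_peaks(x: np.ndarray, fs: int, threshold: float = 0.05):
    """Return (frequency, relative amplitude) for each local maximum of the
    magnitude spectrum that reaches at least `threshold` of the largest peak."""
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    rel = spectrum / spectrum.max()   # amplitudes relative to the largest component
    mid = rel[1:-1]
    is_peak = (mid >= threshold) & (mid >= rel[:-2]) & (mid >= rel[2:])
    idx = np.nonzero(is_peak)[0] + 1
    return [(float(freqs[i]), float(rel[i])) for i in idx]


if __name__ == "__main__":
    x_sq = square_wave(F0, FS, FS)    # 1 s of signal gives 1 Hz bin spacing
    x_ramp = ramp_wave(F0, FS, FS)
    for f, a in fft_peaks(x_sq, FS)[:4]:
        print(f"square  {f:7.1f} Hz  {a:.3f}")
    for f, a in fft_peaks(x_ramp, FS)[:4]:
        print(f"ramp    {f:7.1f} Hz  {a:.3f}")
```

The square wave should show only odd multiples of 630 Hz with relative amplitudes near 1/3, 1/5, ..., while the ramp-up wave shows every multiple with amplitudes near 1/2, 1/3, .... Small departures from exactly 1/n come from sampling the waveform discretely.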

The setup used to conduct Fourier Analysis on electronically-generated sound waves is shown in Figure 2.

Figure 2: A PASCO power amplifier outputs various waves at 630 Hz that are fed into a speaker. The speaker plays the resulting sound, which is recorded by the microphone and then fed into the FFT program on DataStudio for analysis

The PASCO power amplifier generates each type of waveform at 630 Hz, and the signal is fed into a speaker, which plays it. A microphone is placed near the speaker, and the resulting recording is fed into DataStudio's FFT program. Square waves and ramp-up waves are again examined, and peak frequencies and corresponding amplitudes are again recorded. To account for the frequency-dependent response of the speaker and microphone (which causes a varying degree of error between amplitude values at different frequencies), a 1 V sine wave is played at each of the peak frequencies to measure the amplitude at each frequency and generate a calibration factor. The amplitudes from the square and ramp-up wave data are adjusted using these calibration factors and normalized; the resulting data are compared to theory.
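The calibration arithmetic can be sketched as follows. The report does not spell out the formula, so dividing each relative amplitude by the amplitude the 1 V reference sine produced at the same frequency, and then dividing by the fundamental's adjusted value, is an assumption. It does reproduce the first rows of Table 3 when the reference amplitudes are back-calculated from that table as relative ÷ adjusted (for example 0.66 ÷ 0.493273543 ≈ 1.338); those reference values are inferred, not separately reported. The function name `calibrate` is illustrative.

```python
def calibrate(peaks, reference):
    """Correct for speaker/microphone response, then normalize to the fundamental.

    peaks: list of (frequency_hz, relative_amplitude), fundamental first.
    reference: dict mapping frequency_hz to the amplitude measured for the
        1 V calibration sine at that frequency.
    Returns a list of (frequency_hz, adjusted_amplitude, normalized_amplitude).
    """
    adjusted = [(f, a / reference[f]) for f, a in peaks]
    fundamental = adjusted[0][1]
    return [(f, adj, adj / fundamental) for f, adj in adjusted]


# First three square-wave rows of Table 3; reference values back-calculated (assumption).
square_peaks = [(632, 1.0), (1892, 0.66), (3152, 0.51)]
reference = {632: 0.722, 1892: 1.338, 3152: 1.631}

for f, adj, norm in calibrate(square_peaks, reference):
    print(f"{f:5d} Hz  adjusted {adj:.9f}  normalized {norm:.9f}")
```

Printed to nine decimal places, the adjusted and normalized values match the corresponding entries of Table 3.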

The setup used to conduct Fourier Analysis on human-generated vowel sounds is shown in Figure 3.

Figure 3: A human sings vowel sounds at a pitch of about 183 Hz into a microphone that is fed into the FFT program on DataStudio

A human sings three different vowel sounds (/ah/, /ee/, /oo/) at a constant pitch of about 183 Hz into a microphone that is fed into DataStudio's FFT program. The frequency peaks in the FFT of each vowel sound are recorded (with corresponding amplitudes) in an effort to locate the formants that make up each vowel. The setup in Figure 2 is then used to generate a sine wave (with the appropriate amplitude) at each of the peak frequencies found in the FFT of the /ah/ vowel recorded with the setup in Figure 3, but now the microphone feeds into Garage Band. The recordings of the individual sine waves are layered with the aim of reconstructing the /ah/ sound from computer-generated straight tones alone.
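The layering step can be sketched in Python with NumPy and the standard-library `wave` module, as an offline stand-in for recording tones through the speaker into Garage Band. The peak list is the /ah/ column of Table 5; the sample rate, the 2 s duration, the 16-bit output format, and the helper names `layer_sines` and `write_wav` are assumptions. All phases are set to zero because the FFT data record no phase information.

```python
import wave

import numpy as np

# /ah/ peaks from Table 5: (frequency in Hz, relative amplitude)
AH_PEAKS = [
    (183, 0.44), (369, 0.279), (549, 0.83), (730, 1.0), (910, 0.895),
    (1096, 0.415), (1277, 0.187), (1457, 0.078), (2370, 0.014), (2551, 0.031),
    (2732, 0.061), (2912, 0.091), (3098, 0.041), (3279, 0.06), (3459, 0.049),
    (3640, 0.044),
]


def layer_sines(peaks, duration_s=2.0, fs=44_100):
    """Sum one sine wave per (frequency, amplitude) pair, scaled to peak at 1."""
    t = np.arange(int(duration_s * fs)) / fs
    mix = sum(a * np.sin(2 * np.pi * f * t) for f, a in peaks)
    return mix / np.max(np.abs(mix))  # prevents clipping in the 16-bit conversion


def write_wav(path, samples, fs=44_100):
    """Write mono, 16-bit little-endian PCM audio."""
    pcm = np.round(samples * 32767).astype("<i2")
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(fs)
        wav.writeframes(pcm.tobytes())


if __name__ == "__main__":
    write_wav("ah_synth.wav", layer_sines(AH_PEAKS))
```

Adding the layers one at a time (for example `layer_sines(AH_PEAKS[:k])` for increasing `k`) mimics the way the /ah/ quality emerged as each recorded tone was added.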

PROCEDURE
We first experimented with waves generated by a PASCO power amplifier. To measure the nature of each type of wave, we used DataStudio, first looking at the FFT of a sine wave at 630 Hz. Then we considered (separately) ramp-up and square waves, also produced at 630 Hz, noting the frequency and relative amplitude of each FFT spike and comparing these to theoretically predicted values; these spikes characterize the sine waves that make up the Fourier Series of each waveform.
Next we experimented with sound waves generated by the power amplifier, played through a speaker,
and recorded by a microphone. We first recorded FFT data for square and ramp-up waves generated at 630
Hz, and after noting the frequency and relative amplitude of each peak, we generated a sine wave at each
peak frequency to calibrate the speaker-mic combination and arrive at adjusted amplitude values, which we
normalized.
After that, we continued using the same setup with the microphone, but instead of playing computer-generated sounds through a speaker, we sang three different vowel sounds, /ah/, /ee/, and /oo/, at a fundamental frequency of approximately 183 Hz into the microphone. With this, we gathered FFT data and recorded amplitude and frequency values. Then we used Garage Band to synthetically recreate the human /ah/ vowel by layering computer-generated, single-frequency sine waves (played through the speaker) at each of the peak frequencies found earlier in the /ah/ FFT.

RESULTS AND ANALYSIS


A table of frequencies and corresponding amplitudes for the peaks in an FFT for a square wave
generated at 630 Hz by a power amplifier is shown in Table 1. Also shown are corresponding
theoretically calculated amplitudes.

Experimental Results                Theoretical Results
Frequency (Hz)  Relative Amplitude  Frequency (Hz)  Relative Amplitude
630             1                   630             1
1889            0.33                1890            0.333333333
3148            0.2                 3150            0.2
4407            0.14                4410            0.142857143
5766            0.11                5670            0.111111111
6926            0.08                6930            0.090909091
8185            0.07                8190            0.076923077
9444            0.06                9450            0.066666667
Table 1. Experimental and theoretically predicted frequencies and corresponding relative amplitudes
of peaks in an FFT of a square wave generated at 630 Hz by a power amplifier.

Both the experimental frequencies and the relative amplitudes agree closely with those predicted by theory. Since the power amplifier fed the waveform directly into DataStudio, this is to be expected: there is very little room for error. Ultimately, this confirms the reliability of DataStudio's FFT for use in later experiments that may have more room for error, ruling out the FFT itself as a potential source of error.
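The theoretical columns of Table 1 follow from the standard Fourier Series of an ideal square wave of amplitude 1 and fundamental frequency f_0 (notation ours). Only odd harmonics appear, and harmonic n has amplitude 1/n relative to the fundamental:

$$
x_{\text{square}}(t) = \frac{4}{\pi} \sum_{n = 1, 3, 5, \dots} \frac{1}{n} \sin(2\pi n f_0 t)
\quad\Longrightarrow\quad
f_n = n f_0, \qquad \frac{A_n}{A_1} = \frac{1}{n}.
$$

With f_0 = 630 Hz and n = 3, this gives f_3 = 1890 Hz and A_3/A_1 = 1/3 ≈ 0.333, the second row of Table 1.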
Table 2 shows a similar table of frequencies and corresponding amplitudes for the peaks in an FFT, this time for a ramp-up wave generated at 630 Hz by a power amplifier. The corresponding theoretically calculated amplitudes are again shown.

     Experimental Results                Theoretical Results
n    Frequency (Hz)  Relative Amplitude  Frequency (Hz)  Relative Amplitude
1    632             1                   630             1
2    1262            0.5                 1260            0.5
3    1892            0.33                1890            0.333333333
4    2522            0.35                2520            0.25
5    3152            0.2                 3150            0.2
6    3782            0.16                3780            0.166666667
7    4412            0.14                4410            0.142857143
8    5041            0.12                5040            0.125
9    5671            0.11                5670            0.111111111
10   6301            0.095               6300            0.1
11   6931            0.086               6930            0.090909091
12   7561            0.078               7560            0.083333333
13   8191            0.071               8190            0.076923077
14   8821            0.065               8820            0.071428571
15   9450            0.06                9450            0.066666667
Table 2. Experimental and theoretically predicted frequencies and corresponding relative amplitudes of peaks in an FFT of a ramp-up wave generated at 630 Hz by a power amplifier.

Again, most of the experimental values are very close to those predicted by theory. However, the relative amplitude data appear less accurate at the very high frequencies, which is understandable given that the relative amplitude values there are very small. As a consequence, relative amplitude values at such frequencies should be treated with caution in later experiments, since the FFT itself may contribute some error.
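Similarly, the theoretical columns of Table 2 follow from the standard Fourier Series of an ideal ramp-up (sawtooth) wave rising from -1 to 1 (notation ours; a DC offset or a different peak amplitude changes only the constant term or the overall scale, not the amplitude ratios):

$$
x_{\text{ramp}}(t) = 2\left(f_0 t - \left\lfloor f_0 t + \tfrac{1}{2} \right\rfloor\right)
= \frac{2}{\pi} \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} \sin(2\pi n f_0 t)
\quad\Longrightarrow\quad
f_n = n f_0, \qquad \frac{A_n}{A_1} = \frac{1}{n}.
$$

Every harmonic is present, which is why Table 2 lists all multiples of 630 Hz; for example, n = 4 gives 2520 Hz and 1/4 = 0.25.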
Various measurements (including amplitude) of a sound wave produced by a speaker vary with
the frequency of that sound. To produce useful data, these measurements must be adjusted by a
correction factor that is unique to each frequency. A table of FFT peak frequencies, corresponding
relative amplitudes, adjusted amplitudes (by the correction factor), and normalized amplitudes (so
that the lowest frequency peak has amplitude 1) is shown in Table 3. The input is a square sound
wave generated at 630 Hz by a speaker.

Frequency (Hz)  Relative Amplitude  Adjusted Amplitude  Normalized Amplitude
632             1                   1.385041551         1
1892            0.66                0.493273543         0.356143498
3152            0.51                0.3126916           0.225763335
4411            0.2                 0.201005025         0.145125628
5671            0.17                0.160984848         0.116231061
6931            0.1                 0.132802125         0.095883134
8191            0.036               0.125               0.09025
9294            0.063               0.134042553         0.096778723
Table 3. The FFT peak frequencies and corresponding amplitudes of a square wave generated at 630
Hz by a speaker.

We can see that the data overall support the conclusion that a peak whose frequency is roughly an (odd) multiple of the fundamental frequency has a normalized amplitude of roughly one over that multiple, as predicted by theory. That is, for a peak with frequency nf, where f is the fundamental frequency, the corresponding normalized amplitude is roughly 1/n. This is less true for the data points of very high frequency, but, as we found before, the FFT data are less reliable at high frequencies due to very small amplitude values. This, coupled with the additional error in measuring actual sound (as opposed to a wave sent directly through a wire), can account for this variation from theory.
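The 1/n comparison can be checked row by row: if the normalized amplitude of a peak at about n times the fundamental is roughly 1/n, then n times that amplitude should be close to 1. The sketch below applies this to the normalized column of Table 3; taking n as the nearest integer to f / 632 Hz, and the name `harmonic_check`, are our assumptions.

```python
# Normalized amplitudes from Table 3: (frequency in Hz, normalized amplitude)
TABLE3 = [
    (632, 1.0), (1892, 0.356143498), (3152, 0.225763335), (4411, 0.145125628),
    (5671, 0.116231061), (6931, 0.095883134), (8191, 0.09025), (9294, 0.096778723),
]


def harmonic_check(rows, fundamental_hz):
    """Return (frequency, harmonic number n, n * normalized amplitude) per row.

    For an ideal square or ramp-up wave the last value should be close to 1.
    """
    results = []
    for freq, amp in rows:
        n = round(freq / fundamental_hz)   # nearest whole harmonic number
        results.append((freq, n, n * amp))
    return results


for freq, n, product in harmonic_check(TABLE3, 632):
    print(f"{freq:5d} Hz  n = {n:2d}  n * A = {product:.3f}")
```

Through 8191 Hz the products stay between about 1.02 and 1.17, while the 9294 Hz row gives about 1.45, the largest departure, in line with the reduced reliability of the high-frequency peaks. The same function can be applied to the normalized column of Table 4.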
A similar table of Frequency, Relative Amplitude, Adjusted Amplitude, and Normalized
Amplitude values is shown in Table 4. This time, however, the input signal was a ramp-up sound
wave generated by a speaker.

Frequency (Hz)  Relative Amplitude  Adjusted Amplitude  Normalized Amplitude
632             0.63                0.872576177         1
1262            1                   0.498753117         0.571586906
1892            0.4                 0.298953662         0.342610388
2522            0.28                0.222575517         0.255078608
3152            0.3                 0.183936235         0.210796765
3782            0.23                0.154362416         0.176904229
4412            0.13                0.130653266         0.149732791
5041            0.1                 0.105932203         0.121401668
5671            0.11                0.104166667         0.119378307
6301            0.045               0.100222717         0.114858416
6931            0.061               0.081009296         0.092839225
7561            0.042               0.085714286         0.098231293
8191            0.024               0.083333333         0.095502646
Table 4. The FFT peak frequencies and corresponding amplitudes of a ramp-up wave generated at
630 Hz by a speaker.

The conclusion here is again similar: the ratio of any frequency and the fundamental frequency
is roughly equal to the inverse ratio of their corresponding normalized amplitudes, as predicted by
theory. The data again diverge slightly at higher frequency data points, but this is understandable
for the same reasons listed above.
A table of frequencies and corresponding relative amplitudes for the FFT of three different
vowel sounds sung into a microphone is shown in Table 5. The fundamental frequency (which
corresponds to the note being sung) remained roughly constant at 183 Hz between the different
vowels.

/ah/                                /oo/                                /ee/
Frequency (Hz)  Relative Amplitude  Frequency (Hz)  Relative Amplitude  Frequency (Hz)  Relative Amplitude
183             0.44                183             0.245               178             0.703
369             0.279               359             1                   359             1
549             0.83                539             0.094               535             0.117
730             1                   725             0.054               715             0.068
910             0.895               911             0.036               891             0.038
1096            0.415               1082            0.078               2136            0.087
1277            0.187               1267            0.028               2312            0.092
1457            0.078                                                   2487            0.159
2370            0.014                                                   2668            0.045
2551            0.031                                                   2844            0.14
2732            0.061                                                   3020            0.035
2912            0.091                                                   3201            0.046
3098            0.041
3279            0.06
3459            0.049
3640            0.044

Table 5. The FFT peak frequencies and corresponding amplitudes of three vowel sounds being sung
into a microphone, all with a fundamental frequency near 183 Hz

Theory¹ predicts that the /ah/ vowel has formants centered near 710 Hz, 1100 Hz, and 2640 Hz, with the formant at 2640 Hz having a smaller amplitude. Indeed, our data show frequency peaks near these values, at 730 Hz, 1096 Hz, and 2912 Hz, respectively, and the peak at 2912 Hz does have a smaller amplitude. The peaks at 730 Hz and 2912 Hz serve well as centers of peak clusters, since the amplitudes of the surrounding peaks fall off at roughly the same rate on either side of them. The peak at 1096 Hz functions less well as a center, but it is reassuring that such a peak exists. The peak at 2912 Hz deviates most from its predicted value of 2640 Hz, but, as we have shown before, the FFT data are less reliable at high frequencies, perhaps due to the very small amplitude values.

¹ Wright, David. Mathematics and Music. Providence, RI: American Mathematical Society, 2009. 134. Print.
Theory predicts that the /oo/ vowel has formants centered near 310 Hz, 870 Hz, and 2250 Hz. In our data we see a comparable peak at 359 Hz, but no comparable peaks near the other two values. The formant at 2250 Hz is predicted to have very low amplitude, and given that it is at a high frequency, there is much room for error. Furthermore, there may have been differences between how we sang (pronounced) the /oo/ vowel and how the speakers behind the theoretical results pronounced it, which may explain the other missing peak.
Theory predicts that the /ee/ vowel has formants centered near 280 Hz, 2250 Hz, and 2900 Hz. We see corresponding peaks at 359 Hz, 2487 Hz, and 2844 Hz. These peaks are all relatively close to the predicted values in terms of frequency, and they follow the expected trend in relative amplitude (that is, the peak near 280 Hz is higher than those near 2250 Hz and 2900 Hz, which are roughly equal). Furthermore, each of these peaks serves well as a center of the surrounding peaks, indicating that the data for this vowel contain much less error than those for the other vowels.
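The "center of peaks" criterion used above can be made explicit: treat a listed peak as a formant candidate if its amplitude exceeds that of both of its neighbours in the list. This simple local-maximum rule is our own illustrative choice rather than the method used in the report, and `local_maxima` is a hypothetical helper; the data are the /ee/ and /ah/ columns of Table 5.

```python
# (frequency in Hz, relative amplitude) from Table 5
EE = [(178, 0.703), (359, 1.0), (535, 0.117), (715, 0.068), (891, 0.038),
      (2136, 0.087), (2312, 0.092), (2487, 0.159), (2668, 0.045), (2844, 0.14),
      (3020, 0.035), (3201, 0.046)]
AH = [(183, 0.44), (369, 0.279), (549, 0.83), (730, 1.0), (910, 0.895),
      (1096, 0.415), (1277, 0.187), (1457, 0.078), (2370, 0.014), (2551, 0.031),
      (2732, 0.061), (2912, 0.091), (3098, 0.041), (3279, 0.06), (3459, 0.049),
      (3640, 0.044)]


def local_maxima(peaks):
    """Frequencies whose amplitude is strictly greater than both list neighbours.

    The first and last entries are skipped because they have only one neighbour.
    """
    return [peaks[i][0] for i in range(1, len(peaks) - 1)
            if peaks[i - 1][1] < peaks[i][1] > peaks[i + 1][1]]


print("/ee/ candidates:", local_maxima(EE))
print("/ah/ candidates:", local_maxima(AH))
```

For /ee/ the rule returns 359, 2487, and 2844 Hz, the three peaks identified above. For /ah/ it returns 730, 2912, and 3279 Hz: it misses 1096 Hz, consistent with that peak working less well as a center, and it also flags 3279 Hz, so it is only a rough aid.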
Finally, though this did not yield quantitative data, layering computer-generated sine waves in
Garage Band at frequencies and amplitudes listed in the /ah/ column of Table 5 does indeed yield
somewhat of an /ah/ sound! The result is clearly not human, but as each layer gets added, the
overall tone of the sound distinctly becomes more /ah/-like, as opposed to just a straight tone, which
further confirms that human vowel sounds can actually be characterized by their (unique)
constituent frequencies or formants.


CONCLUSION
This experiment has verified the merits of Fourier Analysis as well as applied it to effectively
determine the frequencies of the sine waves that make up three human vowel sounds. Furthermore,
it has successfully recreated one of these vowel sounds using layered, computer-generated sound.
Finding that the FFT for a sine wave is simply a peak at the frequency at which it is being driven,
we verify that more complicated waveforms like square waves, ramp-up waves, and human vowels
are indeed a sum of constituent sine waves using data that corresponds very well to theoretically
predicted values. Our amplitude data do slightly diverge from theory, however, for peaks at higher
frequency values.
Given the close agreement with theory for the lower-frequency peaks, this divergence is most likely due to experimental error. In fact, there may be the same amount of (minuscule) experimental error in the trials for all of the frequency values, and it may be visible only in the high-frequency peaks because those peaks have very low amplitude to begin with, so the same amount of error appears magnified. In any case, measurement issues do arise when dealing with very small values.
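The magnification argument can be made concrete with a hypothetical fixed absolute error δ in every relative-amplitude reading (δ = 0.01 is an illustrative value, not a measured one). The relative error is then δ/A:

$$
\varepsilon = \frac{\delta}{A}: \qquad A = 1 \;\Rightarrow\; \varepsilon = \frac{0.01}{1} = 1\%, \qquad A = 0.06 \;\Rightarrow\; \varepsilon = \frac{0.01}{0.06} \approx 17\%.
$$

An amplitude of about 0.06 is typical of the highest-frequency peaks in Tables 1 and 2.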
Future experimenters might repeat the experiment in a quieter room to reduce noise and similar sources of error; with many other lab groups present, there was quite a lot of external noise in our experiment. Alternatively, they might choose to simply ignore these high-frequency, low-amplitude peaks, since their data are difficult to gather and, compared to the peaks with large amplitude, they have a small effect on the overall waveform.

