Masaryk University
Faculty of Informatics
Bachelor Thesis
Roman Kollar
Acknowledgement
I would like to thank RNDr. Bc. Jonas Sevck for suggestions and ideas
about the content of this work.
Abstract
Keywords
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2 Audio and video coding . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1 Video . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.1 Temporal . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.2 Spatial . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Audio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Frame types . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.4 Aspect ratios . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.5 Timestamps . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3 Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.1 H.264 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.2 Advanced Audio Coding . . . . . . . . . . . . . . . . . . . . . 9
3.3 MPEG transport stream . . . . . . . . . . . . . . . . . . . . . 9
3.4 FLV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4 FFmpeg . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4.1 Command line options syntax . . . . . . . . . . . . . . . . . . 11
4.1.1 Useful options . . . . . . . . . . . . . . . . . . . . . . . 11
4.2 Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
4.2.1 Useful filters . . . . . . . . . . . . . . . . . . . . . . . . 13
4.3 H.264 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4.3.1 x264 options . . . . . . . . . . . . . . . . . . . . . . . . 13
4.4 AAC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
4.4.1 fdk_aac . . . . . . . . . . . . . . . . . . . . . . . . . 14
4.4.2 faac . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.4.3 aac . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.4.4 vo_aacenc . . . . . . . . . . . . . . . . . . . . . 15
4.5 LibAV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
5 Known problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
5.1 A/V synchronization . . . . . . . . . . . . . . . . . . . . . . . 16
5.2 Timestamp overflow . . . . . . . . . . . . . . . . . . . . . . . 16
5.3 Sample aspect ratio in FLV format . . . . . . . . . . . . . . . 17
5.4 re option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
5.5 Start delay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
5.5.1 Seamless restart proposal . . . . . . . . . . . . . . . . 18
6 Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
6.1 Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
6.2 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
6.2.1 Versions . . . . . . . . . . . . . . . . . . . . . . . . . . 21
6.2.2 Multiple outputs . . . . . . . . . . . . . . . . . . . . . 22
6.2.3 x264 presets . . . . . . . . . . . . . . . . . . . . . . . . 22
6.2.4 AAC libraries . . . . . . . . . . . . . . . . . . . . . . . 25
6.3 Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
6.3.1 Most common cases . . . . . . . . . . . . . . . . . . . 26
7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
8 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
A Attachments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
A.1 Patches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
A.2 Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Chapter 1
Introduction
In the past, video and audio were transmitted over analog channels. With
the development of technology, digital channels have taken their place.
In recent years, a new medium caused a revolution in this area: the
Internet. It created a way of distributing multimedia data to every device
connected to the Internet without the need for special hardware. This
allowed exponential growth in the industry and also the development of
better audio and video formats.
Today, transport of multimedia data can be done in numerous ways.
This work is about live streaming such data over the Internet. Depending
on the device used by the receiver, this may require the use of different
formats and settings.
A tool called ffmpeg from the FFmpeg project allows converting (i.e.
decoding and encoding) video and audio to formats suitable for distribution
over the Internet. These formats, for example, reduce the quality of the
multimedia data in order to reduce its bit rate.
The goal of this work is to prepare the current version of the ffmpeg
tool and some of the external libraries for the purposes defined above. There
are some already known problems in the current version. With a particular
combination of input format and configuration, ffmpeg is unable to run
longer than 26.5 hours. Together with another problem, the start delay,
this causes an unacceptable discontinuity in the output stream. When
running continuously for a long period of time, there can be permanent
problems, like audio to video synchronization. It is impossible for these
kinds of problems to be detected automatically, so they have to be avoided.
The last problem is the inability of video players for web browsers to handle
some aspects of the video. These problems are analyzed, explained and fixed
either by a change in the configuration or by a patch to the source code.
When running multiple instances of ffmpeg, there is a need for an optimal
configuration in order to save CPU power and memory capacity. Testing
has to be done to find the configuration offering the best output quality for
an acceptable CPU usage. Also, if the source of the input data is transmitted
over the network, there is a possibility of random errors. The second part
of the testing covers tests of such damaged data and shows the ability of
ffmpeg to handle the errors.
Chapter 2 contains an explanation of terms and principles used in audio
and video coding. Chapter 3 is about the video and file formats used later
on. Chapter 4 explains what FFmpeg is and describes some of its relevant
configuration methods. It also lists the external libraries and their options used
in the practical part. Chapter 5 describes and solves the issues with the
ffmpeg tool. Chapter 6 contains performance and stability tests of both
ffmpeg and avconv.
Chapter 2
Audio and video coding
2.1 Video
Digital video is sampled (Figure 2.1) spatially into pixels and temporally
into frames. Multiple temporal samples form an appearance of motion.
Video coding exploits both spatial and temporal redundancy.
2.1.1 Temporal
2.1.2 Spatial
2.2 Audio
2.4 Aspect ratios
The ratio of the width and height of a single pixel is the sample aspect
ratio (SAR). In a typical video, it defines how each pixel should be
stretched in width. The resulting ratio of the displayed image is computed
by multiplying the SAR by the ratio of the dimensions in pixels. This
ratio is called the display aspect ratio (DAR).
The terminology described above is the one used by the FFmpeg project.
There is also an alternative and conflicting terminology [6].
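The SAR/DAR relation above can be made concrete with a short sketch. The anamorphic PAL DV numbers below are a standard illustration, not taken from this thesis:

```python
from fractions import Fraction

def display_aspect_ratio(width, height, sar):
    """DAR = (width / height) * SAR, i.e. the ratio of the pixel grid
    stretched by the pixel's own aspect ratio."""
    return Fraction(width, height) * sar

# Square pixels (SAR 1:1): DAR equals the pixel-dimension ratio.
display_aspect_ratio(1920, 1080, Fraction(1, 1))   # 16/9

# Anamorphic PAL DV: 720x576 storage with SAR 64:45 also displays as 16:9.
display_aspect_ratio(720, 576, Fraction(64, 45))   # 16/9
```

Using `Fraction` keeps the ratios exact, which is how FFmpeg itself stores aspect ratios (as rationals).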
2.5 Timestamps
The decoding timestamp (DTS) is the time when the frame
should be decoded. The presentation timestamp (PTS) is the time when the
frame should be presented to the viewer. Having two types of timestamps
is relevant only if the codec uses B-frames; otherwise, PTS is equal to DTS.
For example (Table 2.1), if the frames are in presentation order IBBBBP
and the B-frames use the P-frame as a reference, then the P-frame has to be
decoded before the B-frames. Thus, the decoding order would be IPBBBB:

Frame   I    P    B    B    B    B
DTS     0   10   20   30   40   50
PTS     0   50   10   20   30   40
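The reordering in Table 2.1 can be checked with a few lines of Python:

```python
# Table 2.1 as data: frames in decoding order with their timestamps.
decoding_order = [
    ("I",  0,  0),   # (type, DTS, PTS)
    ("P", 10, 50),   # decoded early because the B-frames reference it
    ("B", 20, 10),
    ("B", 30, 20),
    ("B", 40, 30),
    ("B", 50, 40),
]

# DTS increases monotonically in decoding order...
assert [f[1] for f in decoding_order] == sorted(f[1] for f in decoding_order)

# ...and sorting by PTS recovers the presentation order IBBBBP.
presentation = sorted(decoding_order, key=lambda f: f[2])
print("".join(f[0] for f in presentation))  # IBBBBP
```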
Chapter 3
Formats
3.1 H.264
3.4 FLV
FLV is one of the Flash Video container formats. It is used to store a single
video stream and a single audio stream. Currently, FLV is used as a format for
playing video and audio on the web using the Adobe Flash Player5.
An FLV file starts with a header indicating the presence of video or audio
streams. The following body consists of alternating back-pointers and tags.
The back-pointer is the size of the previous tag and is 0 for the first one.
Every tag contains, among other things, its type, a timestamp and the
data itself. The timestamps are equivalent to DTS (section 2.5) in case the
video codec uses B-frames (section 2.3). The data is composed of a set of
headers indicating the type of the payload. In the case of video, the payload
is a packet whose format depends on the type of the video. For an H.264
packet, there is a field with the composition time offset. This offset can be
added to the timestamp in the tag to obtain timestamps equivalent to PTS [11].
In addition to video and audio tags, there is a special type of tag:
script data. This tag can contain an onMetaData tag, which can hold additional
information about the file, for example the width, height and frame rate of the
video.
5. http://www.adobe.com/products/flashplayer.html.
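As a sketch of the tag layout described above (based on the public FLV specification; the example header bytes are invented), the 11-byte tag header can be unpacked as follows:

```python
def parse_flv_tag_header(buf):
    """Unpack an 11-byte FLV tag header: type (1 byte), data size
    (24-bit), timestamp (24-bit plus one extended upper byte) and
    stream ID (24-bit, always 0)."""
    tag_type = buf[0]                           # 8 audio, 9 video, 18 script data
    data_size = int.from_bytes(buf[1:4], "big")
    # The 24-bit timestamp is extended to 32 bits by a separate upper byte.
    timestamp = int.from_bytes(buf[4:7], "big") | (buf[7] << 24)
    stream_id = int.from_bytes(buf[8:11], "big")
    return tag_type, data_size, timestamp, stream_id

# A made-up video tag header: type 9, 1000 data bytes, timestamp 70000 ms.
header = (bytes([9]) + (1000).to_bytes(3, "big")
          + (70000).to_bytes(3, "big") + bytes([0]) + bytes(3))
parse_flv_tag_header(header)   # (9, 1000, 70000, 0)
```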
Chapter 4
FFmpeg
b
Sets the bit rate of the encoder.
In a live streaming environment it is important to have an optimal bit
rate so that receivers are able to download the data in real time. Using
stream specifiers [12, section 21.1] it can be set for video (-b:v) and
for audio (-b:a). What kind of bit rate is computed and what changes are
made to match the target bit rate are specific to the codec it applies to.
1. https://www.ffmpeg.org/documentation.html.
2. https://trac.ffmpeg.org/wiki.
The size of the buffer over which the bit rate is computed is specified
by the -bufsize option. Limits can be specified with the -minrate
and -maxrate options and can be used to set a constant bit rate even
for video.
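As an illustration of combining these options (the values and file names here are invented for the example, and the helper is hypothetical), the rate-control part of an ffmpeg command line could be assembled as:

```python
def rate_control_args(target, maxrate, bufsize):
    """Video rate-control options: the target bit rate, its upper limit
    and the buffer over which the bit rate is computed."""
    return ["-b:v", target, "-maxrate", maxrate, "-bufsize", bufsize]

# Near-constant bit rate: clamp -maxrate to the target itself.
cmd = (["ffmpeg", "-i", "input.ts"]
       + rate_control_args("2M", "2M", "4M")
       + ["output.flv"])
```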
g
Sets the maximum GOP length (section 2.3). For live streaming, it can
be lowered so that the receiver has to wait less before receiving an
I-frame and being able to decode the video.
copyts
Copies the input timestamps without any processing.
It also keeps the offset of the input timestamps. This can be disabled
using the -start_at_zero option [12, section 5.11].
probesize
Before starting to encode the input data, ffmpeg first analyzes a chunk
of it to detect the individual streams and their parameters. This option
sets the probing size in bytes over which the stream is analyzed.
The -analyzeduration option is the same, but in microseconds [12,
section 21].
re
Has to be specified for an input. Reads the input at its native frame
rate. This allows slowing down reading of the input for a real-time
output [12, section 5.11].
loglevel
The loglevel option changes the verbosity level [12, section 5.2].
debug ts
This option provides information about the timestamps (section 2.5)
in every stage of the transcoding and remuxing process [12,
section 5.4].
4.2 Filtering
After demuxing and decoding, ffmpeg can apply filters to audio and video.
Multiple filters can be chained together to form a filtergraph. There are two
kinds of filtergraphs.
A simple filtergraph has only one input and one output of the same type,
and it is applied only to the next output. It can, for example, change the
resolution of a video stream. Configuration is done via the -af and -vf
options (for audio and video, respectively).
A complex filtergraph is defined globally and can have multiple
inputs and outputs, even of mixed types. It is configured via the
-filter_complex option. Note that the same streams cannot be used in
both simple filtergraphs and complex filtergraphs [12, section 3.1].
A special case are bitstream filters. As opposed to normal filters, they
operate on encoded data without the need for decoding [12, section 20].
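A filtergraph passed to -vf is a comma-separated chain of filters, each with colon-separated arguments. Building one can be sketched like this (scale and fps are standard FFmpeg filters; the helper itself is hypothetical):

```python
def build_filtergraph(*filters):
    """Join (name, params) pairs into an FFmpeg filtergraph string."""
    chain = []
    for name, params in filters:
        args = ":".join(f"{key}={value}" for key, value in params.items())
        chain.append(f"{name}={args}" if args else name)
    return ",".join(chain)

vf = build_filtergraph(("scale", {"w": 960, "h": 540}),
                       ("fps", {"fps": 25}))
# "scale=w=960:h=540,fps=fps=25"
```

The resulting string would be passed to ffmpeg as `-vf <string>`.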
4.3 H.264
3. http://www.videolan.org/developers/x264.html.
Quantization parameter
Can be set using the -qp option. This allows setting a fixed quantization
parameter (section 3.1) for the encoder without regard to bit rate. Setting
QP to 0 results in lossless encoding. However, not all players can decode
such video [15].
Profile
Can be set using the -profile option. A profile is a subset of H.264
features. This is only needed when the target device supports only a certain
profile.
Preset
Can be set using the -preset option. A preset is a collection of
option changes from the default configuration. A slower preset provides a
better compression ratio (lower bit rate) but requires more computational
power. The presets can be viewed by running the x264 utility with the
--fullhelp option.
4.4 AAC
There are four different AAC encoders available in FFmpeg. The encoders
in this section are ordered from best to worst output quality.
4. Not drastically
5. https://github.com/mstorsjo/fdk-aac.
4.4.1 fdk_aac
It supports CBR via the -b:a option and VBR via the -vbr option [12,
section 17.4].
4.4.2 faac
This is the Freeware Advanced Audio Coder6 . It supports ABR via the -b:a
option and VBR via the -q:a option [12, section 17.3].
4.4.3 aac
This is the internal experimental AAC encoder in FFmpeg. It supports CBR
via the -b:a option and has an experimental support for VBR via the -q:a
option [12, section 17.1].
4.4.4 vo_aacenc
This encoder has very poor quality [17] and, according to the
documentation, it is the worst of the available encoders. It supports only
CBR via the -b:a option [12, section 17.9]. For these reasons it will
not be considered in the performance testing.
4.5 LibAV
6. http://www.audiocoding.com/faac.html.
7. https://libav.org/.
8. https://github.com/FFmpeg/FFmpeg.
9. At least feature-wise.
Chapter 5
Known problems
5.1 A/V synchronization
When running ffmpeg for a long period of time, there are issues where the
audio stops being in sync with the video. Samples capturing the moment
when this starts were obtained and the results were analyzed.
Multiple configurations of the aresample audio filter (subsection 4.2.1)
were tested. Tested were also the asyncts alternative in avconv and a
deprecated option -async in ffmpeg. None of these were able to fix the
synchronization problem.
It is caused by missing audio data in the input. Audio timestamps in
ffmpeg are computed only from the number of samples and the sampling
frequency (section 2.2). If a part is missing, it is skipped and the audio
is then continuous. This results in the audio being ahead of the video for
the rest of the processing, and it cannot be automatically detected. None
of the audio filters were able to compensate, because they need correct
timestamps to work.
The only solution is to use the -copyts option (subsection 4.1.1), so that
the timestamps are preserved and the audio filters for synchronization are
able to use these timestamps to correctly insert silence in the audio stream
to compensate for the missing data.
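The drift mechanism can be modeled in a few lines (the numbers are invented): timestamps derived only from the received sample count make the audio run ahead by exactly the duration of the lost data.

```python
sample_rate = 48000          # samples per second

# Ten seconds of audio were produced, but one second was lost in transit.
produced_samples = 10 * sample_rate
lost_samples = 1 * sample_rate
received_samples = produced_samples - lost_samples

# Timestamp assigned to the next sample from the sample count alone,
# versus the real input time the video has reached by now.
audio_pts = received_samples / sample_rate   # 9.0 s
video_pts = produced_samples / sample_rate   # 10.0 s
drift = video_pts - audio_pts                # audio plays 1.0 s early
```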
5.2 Timestamp overflow
Since MPEG transport stream timestamps (section 2.5) are stored in 33 bits
at a frequency of 90 kHz, they overflow exactly after 2^33 / 90000 ≈
95443.72 s ≈ 26.51 h. The patch (appendix patch 1) handles such overflow
by unrolling them.
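The wraparound arithmetic can be sketched as follows. This is a from-scratch model of the unrolling idea, not the referenced patch itself:

```python
TS_MOD = 1 << 33          # MPEG-TS timestamps are 33-bit 90 kHz ticks
# 2**33 / 90000 ticks per second ~= 95443.72 s ~= 26.51 h until overflow
assert round(TS_MOD / 90000, 2) == 95443.72

def make_unroller():
    """Return a function mapping wrapped 33-bit timestamps to a
    monotonically increasing value."""
    state = {"last": None, "offset": 0}
    def unroll(ts):
        # A jump backwards by more than half the range means the
        # 33-bit counter wrapped.
        if state["last"] is not None and ts < state["last"] - TS_MOD // 2:
            state["offset"] += TS_MOD
        state["last"] = ts
        return ts + state["offset"]
    return unroll

unroll = make_unroller()
unroll(TS_MOD - 100)   # just before the overflow
unroll(5)              # wrapped: unrolled to 2**33 + 5
```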
them.
5.3 Sample aspect ratio in FLV format
If a web-based FLV (section 3.4) player ignores the SAR (section 2.4) stored
inside the video stream, then an invalid DAR is computed and the video is
disproportional. Note that FLV has no means of storing the SAR.
The only way to get a correct DAR is to change the width or height of the
video. This can be done by resending the onMetaData tag with the correct
width or height every time the SAR in the input video changes. Note that
the onMetaData tag is ignored by conventional video players.
A patch (appendix patch 2) was created that adds this ability to ffmpeg.
It splits the flv_write_header function into two1: one that writes only
the header and one that writes the onMetaData tag. This provides the
ability to resend the tag when a SAR change is detected, and the width of
the video is then properly scaled. However, the muxer writing function only
has the first SAR available during the muxing process, and any changes are
not manifested in the available structures. Thus, the aspect ratio has to be
updated manually in the main code of ffmpeg by a workaround.
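The width correction itself is simple arithmetic (a sketch; the rounding to an even value is a common encoder requirement assumed here, not a detail taken from the patch):

```python
from fractions import Fraction

def sar_corrected_width(width, sar):
    """Bake the SAR into the frame width so that square-pixel players
    compute the correct DAR; round to an even value."""
    w = round(width * sar)
    return w + (w % 2)

sar_corrected_width(720, Fraction(64, 45))   # 1024
sar_corrected_width(720, Fraction(1, 1))     # square pixels: unchanged
```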
5.4 re option
Using the -re option together with the -copyts option (subsection 4.1.1)
causes ffmpeg not to start reading and processing the input. The -re option
works by computing the relative time from the start in the same units as the
produced timestamps. When the timestamps are ahead of the relative time,
reading sleeps for a short period of time.
However, when using the -copyts option, the timestamps have an
offset. This results in sleeping until the relative time catches up to the offset.
There are two possible solutions. The -start_at_zero option2 can be
used to globally remove the offset. Alternatively, a patch (appendix patch
3) that subtracts the offset from the timestamp locally, in the function that
compares it to the relative time, can be used. With either of these options,
ffmpeg starts immediately as normal.
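The interaction can be modeled with the pacing rule -re uses. This is a simplified model with all values in seconds; the offset value is invented:

```python
def sleep_needed(ts, elapsed, first_ts=0.0):
    """Seconds -re would sleep before consuming a packet with timestamp
    'ts', given wall-clock time 'elapsed' since start; 'first_ts'
    removes the -copyts offset, as the patch described above does."""
    return max(0.0, (ts - first_ts) - elapsed)

offset = 3600.0   # input timestamps start at one hour
# Without correction, the first packet stalls for the whole offset:
sleep_needed(offset, 0.0)                    # 3600.0
# Subtracting the offset makes ffmpeg start immediately:
sleep_needed(offset, 0.0, first_ts=offset)   # 0.0
```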
1. During creation of this work, this has been already done in LibAV and merged into
FFmpeg.
2. This option has been added October 20 2014.
For example, if there are 10 seconds of video in the buffer of the backup
process and the normal process is killed after 10 seconds of not producing
any output, the lost data are recovered by processing the contents of the
buffer. If sending data to a receiver is delayed by at least 10 seconds, this
minimizes any major discontinuities on the receiver's end.
An experimental patch (appendix patch 4) was created for reading
input via UDP3. It keeps the circular buffer used by ffmpeg when reading
from a UDP source filled to no more than half of its capacity. This
capacity can be set via the fifo_size option in the URL [12, section 25.33].
Waiting for the signal itself is activated via the wait_for_signal option
in the URL and then ffmpeg waits for the SIGUSR1 signal. After receiving
the signal it continues as usual.
To further improve this method, the backup process buffer can be
synchronized by flushing it via another signal when the main process is
producing valid output. The signal is sent by the application that
detects errors in the output of the processes. When the main process
stops producing data, the backup process buffer then contains exactly the
data from the point when the main process stopped producing valid output.
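The wait-for-signal behaviour can be sketched with POSIX signals. This is a Python stand-in for the patched C code; sending the signal to ourselves stands in for the supervising application:

```python
import os
import signal

resumed = False

def on_usr1(signum, frame):
    """Handler: mark that the backup process may start consuming
    its input buffer."""
    global resumed
    resumed = True

signal.signal(signal.SIGUSR1, on_usr1)

# Normally the supervising application sends this when the main
# process stops producing valid output; here we send it to ourselves.
os.kill(os.getpid(), signal.SIGUSR1)
# From this point on, processing continues as usual.
```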
3. http://tools.ietf.org/html/rfc768.
Chapter 6
Testing
6.1 Tools
1. https://github.com/pyke369/sffmpeg.
The memory used by the process is obtained using a tool called smem2.
The memory size used in the tests is the unique set size (USS), which is the
memory unique to the process, thus not including any shared libraries3.
Data gathering is done using a script that saves the data in an RRD
database and produces graphs (appendix tool 1) using the rrdtool utility4.
Note that the tests were performed on a GNU/Linux server and everything
stated above applies only to this operating system.
6.2 Performance
6.2.1 Versions
This test compares the new and old versions of ffmpeg and the current
version of avconv.
2. http://www.selenic.com/smem/.
3. This does not really matter in this case since the tools were linked statically.
4. http://oss.oetiker.ch/rrdtool/.
There is no difference between the old and the new versions of ffmpeg. Only
avconv has a slightly higher CPU usage and uses about 20% more memory.
With two separate ffmpeg processes, the CPU usage is higher by around
12% and the memory usage is higher by around 30% because of having to
decode the input twice.
6.2.3 x264 presets
These are tests of the x264 library presets. The goal is to test how the presets
affect CPU usage and the bit rate. Encoder settings are left at their default
values and the output video is scaled down to a 960×540 resolution.
Table 6.2 shows the relation between the CPU/memory usage and the bit
rate. The results show that using a faster preset than veryfast is not worth
the saved CPU power. At these output settings, using slower presets offers
only a small decrease in the bit rate. However, presets slower than medium
have significantly better output quality, and thus the slower or slow preset
should be used when high-quality output is required, since their CPU usage
penalty is worth the quality increase [24].
6.2.4 AAC libraries
The only difference is in the memory usage of the faac library, which uses
about twice as much memory as the other encoders.
6.3 Stability
After starting to receive undamaged input, both ffmpeg and avconv mostly
never resumed producing valid data and the CPU usage dropped to a
minimum. In some cases, one of the tools was able to produce valid output;
it did, however, start to use more CPU even after it began reading
undamaged input.
In this case, avconv recovered and produced valid data. However, it started
to use more memory than before, probably due to a memory leak during
the processing of the damaged input. The size of the leak varied between
different inputs.
In this case, neither tool continued to produce valid data, and avconv
started to use more CPU while endlessly allocating memory.
Note that the drop in memory usage at the end (Figure 6.16) is a
result of closing the output socket.
Chapter 7
Conclusion
1. Not if the input has an error-correcting code with enough redundancy attached
Chapter 8
Bibliography
Appendix A
Attachments
Any input data, exact configurations or logs of ffmpeg and avconv are
not included, as requested by the advisor. Some of the patches may not be
public.
A.1 Patches
1. ffmpeg-2.4.3-mpegts_ts_unroll.patch.
Source: https://github.com/arut/ffmpeg-patches/blob/master/mpegts-33bit.
2. ffmpeg-2.4.3-flv_sar.patch.
3. ffmpeg-2.4.3-re.patch.
4. ffmpeg-2.4.3-wait_for_signal.patch.
A.2 Tools
1 rrd.sh: a script for gathering data and creating graphs using rrdtool.