
Masaryk University
Faculty of Informatics

Configuration of FFmpeg for High Stability During Encoding

Bachelor Thesis

Roman Kollar

Brno, autumn 2014


Declaration

I hereby declare that this paper is my original authorial work, which I have worked out on my own. All sources, references and literature used or excerpted during the elaboration of this work are properly cited and listed in complete reference to the due source.

Advisor: RNDr. Bc. Jonas Sevcik

Acknowledgement

I would like to thank RNDr. Bc. Jonas Sevcik for his suggestions and ideas about the content of this work.

Abstract

The goal of this work is to explain basic principles of video coding and of multiplexing video and audio, to introduce the FFmpeg and LibAV projects, and to test the ffmpeg/avconv tools for live streaming usage.

Keywords

FFmpeg, LibAV, streaming, MPEG-TS, FLV, aspect ratio, timestamps

Contents

1 Introduction
2 Audio and video coding
  2.1 Video
    2.1.1 Temporal
    2.1.2 Spatial
  2.2 Audio
  2.3 Frame types
  2.4 Aspect ratios
  2.5 Timestamps
3 Formats
  3.1 H.264
  3.2 Advanced Audio Coding
  3.3 MPEG transport stream
  3.4 FLV
4 FFmpeg
  4.1 Command line options syntax
    4.1.1 Useful options
  4.2 Filtering
    4.2.1 Useful filters
  4.3 H.264
    4.3.1 x264 options
  4.4 AAC
    4.4.1 fdk aac
    4.4.2 faac
    4.4.3 aac
    4.4.4 vo aacenc
  4.5 LibAV
5 Known problems
  5.1 A/V synchronization
  5.2 Timestamp overflow
  5.3 Sample aspect ratio in FLV format
  5.4 re option
  5.5 Start delay
    5.5.1 Seamless restart proposal
6 Testing
  6.1 Tools
  6.2 Performance
    6.2.1 Versions
    6.2.2 Multiple outputs
    6.2.3 x264 presets
    6.2.4 AAC libraries
  6.3 Stability
    6.3.1 Most common cases
7 Conclusion
8 Bibliography
A Attachments
  A.1 Patches
  A.2 Tools

Chapter 1

Introduction

In the past, video and audio were transmitted using analog channels. With the development of technology, digital channels have taken their place. In recent years, a new medium caused a revolution in this area: the Internet. It created a way of distributing multimedia data to every device connected to the Internet without the need for special hardware. This allowed exponential growth in the industry and also the development of better audio and video formats.
Today, the transport of multimedia data can be done in numerous ways. This work is about live streaming such data over the Internet. Depending on the device used by the receiver, this may require the use of different formats and settings.
A tool called ffmpeg from the FFmpeg project allows converting (i.e. decoding and encoding) video and audio to formats suitable for distribution over the Internet. These formats, for example, reduce the quality of the multimedia data in order to reduce its bit rate.
The goal of this work is to prepare the current version of the ffmpeg tool and some of the external libraries for the purposes defined above. There are some already known problems in the current version. With a particular combination of input format and configuration, ffmpeg is unable to run longer than 26.5 hours. Together with another problem, the start delay, this causes an unacceptable discontinuity in the output stream. When running continuously for a long period of time, there can be permanent problems, like audio-to-video synchronization. These kinds of problems cannot be detected automatically and have to be avoided. The last problem is the inability of video players for web browsers to handle some aspects of the video. These problems are analyzed, explained and fixed either by a change in the configuration or by a patch to the source code.
When running multiple instances of ffmpeg, an optimal configuration is needed in order to save CPU power and memory capacity. Testing has to be done to find the configuration offering the best output quality for an acceptable CPU usage. Also, if the source of the input data is transmitted over the network, there is a possibility of random errors. The second part of the testing covers tests of such damaged data and shows the ability of ffmpeg to handle the errors.
Chapter 2 contains an explanation of terms and principles used in audio and video coding. Chapter 3 is about the video and file formats used later on. Chapter 4 explains what FFmpeg is and describes some of its relevant configuration methods. It also lists the external libraries and their options used in the practical part. Chapter 5 describes and solves the issues with the ffmpeg tool. Chapter 6 contains performance and stability tests of both ffmpeg and avconv.

Chapter 2

Audio and video coding

Encoding is the process of compressing a digital input in order to reduce the bit rate of the output. The opposite process is called decoding (decompression). Encoding and decoding algorithms together form a codec (or a video format). After the specific compression methods are applied, there is also a removal of statistical redundancy (i.e. entropy coding) [1, chapter 3]. Compression algorithms can be either lossy or lossless. The ratio between compressed and uncompressed data is called the compression ratio [2, chapter 1].
Different types of encoded multimedia data can be multiplexed together by muxing (short for multiplexing). The opposite of this process is called demuxing. Muxing and demuxing algorithms together form a container format.

2.1 Video

Digital video is sampled (Figure 2.1) spatially into pixels and temporally
into frames. Multiple temporal samples form an appearance of motion.
Video coding uses both spatial and temporal redundancy.


Figure 2.1: Temporal and spatial sampling [1]

2.1.1 Temporal

Temporal compression uses redundancy between consecutive frames. Modern encoders use a prediction model that tries to predict the content of the current frame using previously encoded frames¹. This is usually done by motion compensation, which tracks the position of blocks of pixels (macroblocks) between frames. Frames that are used in this prediction are called reference frames. The predicted frame is then subtracted from the original one, leaving a residual frame. The decoder creates the same prediction from previously decoded frames and adds the current residual frame to it to create an approximation of the original frame [1, chapter 3].

2.1.2 Spatial

Spatial compression is done after the temporal one. It uses redundancy in the individual frames the same way as normal static image compression [1, chapter 3].
One of the techniques used in image compression is a process called quantization. This process maps a set of numbers to a smaller set of numbers. A simple example is mapping (i.e. rounding) real numbers to integers [3].

1. This can include future frames.


2.2 Audio

Digital audio is typically represented using pulse-code modulation (PCM). PCM samples the amplitude of an analog signal at regular intervals; the number of samples per second is the sampling rate (or sampling frequency). The available range of quantization values for the amplitude is the bit depth [4]. The compression algorithm then, for example, removes parts that are less important or inaudible to the human ear, like frequencies above 20 kHz.

2.3 Frame types

Depending on what the encoded frame uses as a reference, there are three kinds of frames. An I-frame does not require another frame for decoding. It is also called the key frame. This frame type requires the most bits to store. A P-frame uses previous frames as reference. A B-frame uses both previous and next frames as reference. This frame type requires the least amount of bits to store [5, section 2.10].
A continuous set of frames that only use frames from this set for
reference is called a group of pictures (GOP). In a GOP, an I-frame is always
decoded first for other frames to use. Therefore, the GOP length is the distance between two I-frames in the decoding order.

2.4 Aspect ratios

The ratio of the width and height of a single pixel is the sample aspect ratio (SAR). In a typical video, it defines how much each pixel should be stretched in width. The resulting ratio of the displayed image is computed by multiplying the SAR by the ratio of the frame's width and height in pixels. This ratio is called the display aspect ratio (DAR).
The terminology described above is the one used by the FFmpeg project. There is also an alternative and conflicting terminology [6].
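As a concrete example, PAL DV video stored as 720×576 pixels with a SAR of 16:15 is displayed with DAR = (720/576) × (16/15) = 4:3.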

2.5 Timestamps

Timestamps allow the individual data streams in a container to be synchronized when presented to a viewer. They are stored in the container².
The decoding timestamp (DTS) denotes the time when the frame³ should be decoded.

2. Note that not every container has to use timestamps.


3. For audio, this can mean a set of audio samples.


The presentation timestamp (PTS) is the time when the frame should be presented to the viewer. Having two types of timestamps is relevant only if the codec uses B-frames. Otherwise, PTS is equal to DTS.
For example (Table 2.1), if the frames are in presentation order IBBBBP
and the B-frames use the P-frame as reference, then the P-frame has to be
decoded before the B-frames. Thus, the decoding order would be IPBBBB:

Frame I P B B B B
DTS 0 10 20 30 40 50
PTS 0 50 10 20 30 40

Table 2.1: DTS and PTS example

Chapter 3

Formats

3.1 H.264

H.264 (or AVC) is a video compression standard defined in the MPEG-4 Part 10¹ standard. It only defines the syntax of its bitstream, so that a compliant decoder can work. The encoder can use its own algorithms as long as it produces a valid bitstream. It can also implement only a subset (a profile) of the available features.
Quantization (subsection 2.1.2) is controlled by the quantization parameter (QP). It can take a value between 0 (lossless) and 51, where 0 is the best and 51 the worst quality [7].
H.264 is a successor of H.263² and provides better quality at the same bit rate. There already exists a successor of H.264, H.265, which has an even better compression ratio, but it is not widespread yet [8].

3.2 Advanced Audio Coding

Advanced Audio Coding (AAC) is an audio codec defined in the MPEG-2 Part 7³ standard. It replaces older codecs like MP3 and offers better quality at the same bit rate [9].

3.3 MPEG transport stream

MPEG-2 Part 1⁴ defines the multiplexing of multimedia data to form a single stream of data. There are two different types of multiplexers:

1. ISO/IEC 14496-10 (Advanced Video Coding), http://www.iso.org/iso/home/store/catalogue_ics/catalogue_detail_ics.htm?csnumber=66069.
2. http://en.wikipedia.org/wiki/H.263.
3. ISO/IEC 13818-7 (Advanced Audio Coding (AAC)), http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=43345.
4. ISO/IEC 13818-1 (Systems Specification), http://www.iso.org/iso/home/store/catalogue_ics/catalogue_detail_ics.htm?csnumber=62074.


the transport stream (MPEG-TS) and the program stream (MPEG-PS). For the purposes of this work, only the transport stream multiplexer is considered.
A program is a set of somehow related elementary streams (representing, for example, a television channel). An MPEG-TS can contain multiple programs.
Elementary streams are encoded streams of multimedia data, typically video or audio. A single program can contain an arbitrary number of elementary streams of the same type, for example video streams of different resolutions or audio streams of different languages.
Each elementary stream is converted into a packetized elementary stream (PES) by dividing and wrapping the elementary stream into PES-packets. This is done by adding a header to a payload containing a part of the elementary stream. In the case of the transport stream, these are further divided into transport stream packets with a fixed size of 188 bytes. Among the things located in the PES-packet header are timestamps. A timestamp (section 2.5) in MPEG-TS is a 33-bit sample of a 90 kHz clock called the program clock. Indicated by a bit flag in the PES-packet header, a PES-packet can contain both a PTS and a DTS [5, 10].

3.4 FLV

FLV is one of the Flash Video container formats. It is used to store a single video stream and a single audio stream. Currently, FLV is used as a format for playing video and audio on the web using the Adobe Flash Player⁵.
An FLV file starts with a header indicating the presence of video or audio streams. The following body consists of alternating back-pointers and tags. A back-pointer is the size of the previous tag and is 0 for the first one. Every tag contains, among other things, its type, a timestamp and the data itself. The timestamps are equivalent to DTS (section 2.5) in case the video codec uses B-frames (section 2.3). The data is composed of a set of headers indicating the type of the payload. In the case of video, the payload is a packet whose format depends on the video codec. For an H.264 packet, there is a field with the composition time offset. This offset can be added to the timestamp in the tag to obtain a timestamp equivalent to PTS [11].
In addition to video and audio tags, there is a special type of tag: script data. This tag can contain an onMetaData tag, which can hold additional information about the file, for example the width, height and frame rate of the video.

5. http://www.adobe.com/products/flashplayer.html.

Chapter 4

FFmpeg

FFmpeg is a multimedia framework for encoding, decoding, muxing, demuxing, streaming and playing multimedia data. It contains individual libraries (e.g. libavformat for muxing and demuxing) for developers and also utilities (e.g. ffmpeg and ffplay) for users. For the purposes of this work, only the ffmpeg utility is considered; it is used for transcoding and remuxing.
The FFmpeg project has documentation available for both the tools and the libraries¹. There is also a community-maintained wiki page and a bug tracker².

1. https://www.ffmpeg.org/documentation.html.
2. https://trac.ffmpeg.org/wiki.

4.1 Command line options syntax

The command line consists of an arbitrary number of inputs and an arbitrary number of outputs. Everything on the command line that is not an option is considered an output (inputs are marked by the -i option). Options apply to the next input or output and are reset for the next input or output, with the exception of global options. Even among options for the same input or output, the order matters. Also, according to the documentation, all inputs should be specified before outputs [12, section 2].
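For illustration, a hypothetical invocation with one input and two outputs might look as follows (file names and bit rates are placeholders):

ffmpeg -i input.ts \
    -b:v 1500k -b:a 128k output1.flv \
    -b:v 500k -b:a 64k output2.flv

The first pair of bit rate options applies only to output1.flv and is reset before output2.flv is parsed.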

4.1.1 Useful options

b
Sets the bit rate of the encoder.
In a live streaming environment it is important to have an optimal bit rate so that receivers are able to download the data in real time. Using stream specifiers [12, section 21.1], it can be set separately for video (-b:v) and for audio (-b:a). What kind of bit rate is computed and what changes are made to match the target bit rate is specific to the codec it applies to.
The size of the buffer over which the bit rate is computed is specified by the -bufsize option. Limits can be specified with the -minrate and -maxrate options and can be used to enforce a constant bit rate even for video (see the combined example after this list).
g
Sets the maximum GOP length (section 2.3). For live streaming it can
be lowered so that the receiver has to wait less before receiving an
I-frame and being able to decode the video.
copyts
Copies the input timestamps without any processing.
It also keeps the offset of the input timestamps. This can be disabled
using the -start_at_zero option [12, section 5.11].
probesize
Before starting to encode the input data, ffmpeg first analyzes a chunk of it to detect the individual streams and their parameters. This option sets the probing size in bytes over which the stream is analyzed. The -analyzeduration option is the same, but in microseconds [12, section 21].
re
Has to be specified for an input. Reads the input at its native frame rate. This allows slowing down the reading of the input for a real-time output [12, section 5.11].
loglevel
The loglevel option changes the verbosity level [12, section 5.2].
debug_ts
This option provides information about the timestamps (section 2.5) at every stage of the transcoding and remuxing process [12, section 5.4].
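The following sketch combines several of the options above in a single live-streaming invocation; the addresses and values are illustrative only:

# read at the native frame rate, enforce a near-constant bit rate,
# and cap the GOP length at 50 frames
ffmpeg -re -i udp://127.0.0.1:1234 \
    -b:v 1000k -minrate 1000k -maxrate 1000k -bufsize 2000k \
    -g 50 -f mpegts udp://127.0.0.1:5678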

4.2 Filtering

After demuxing and decoding, ffmpeg can apply filters to audio and video.
Multiple filters can be chained together to form a filtergraph. There are two
kinds of filtergraphs.


A simple filtergraph has only one input and one output of the same type, and it is applied only to the next output. It can, for example, change the resolution of a video stream. Configuration is done via the -af and -vf options (for audio and video, respectively).
A complex filtergraph is defined globally and can have multiple inputs and outputs, even of mixed types. It is configured via the -filter_complex option. Note that the same streams cannot be used in both simple and complex filtergraphs [12, section 3.1].
A special case are bitstream filters. As opposed to normal filters, they operate on encoded data without the need of decoding [12, section 20].
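A minimal sketch of both kinds of filtergraphs (file names are illustrative):

# simple filtergraph: scale the video of the next output
ffmpeg -i input.ts -vf scale=960:540 output.flv

# complex filtergraph: overlay the second input over the first
ffmpeg -i main.ts -i logo.png \
    -filter_complex "[0:v][1:v]overlay=10:10" output.flv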

4.2.1 Useful filters


aresample
A filter for resampling audio (changing the sample rate (section 2.2)) to match the specified parameters. One of its useful functions is the ability to stretch/squeeze the audio and to inject silence or cut parts of the audio out to match the timestamps [12, section 35.12].
The LibAV project does not have this filter. A comparable alternative in avconv is the asyncts filter [13, section 21.8].
overlay
Can place one video over another. It can, for example, be used to insert bitmap subtitles into the video [12, section 38.65] or to create a mosaic from multiple input videos (appendix tool 2), as sketched below.
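A sketch of such a mosaic with two equally sized inputs placed side by side (the pad filter first doubles the canvas width, then the overlay filter places the second video in the right half; file names are illustrative):

ffmpeg -i left.ts -i right.ts -filter_complex \
    "[0:v]pad=2*iw:ih[base];[base][1:v]overlay=W/2:0" mosaic.flv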

4.3 H.264

The H.264 video codec is supported through an external library called x264³. Configuration options of x264 can be specified as corresponding options in ffmpeg [14].

4.3.1 x264 options


Bit rate
For the x264 library, the -b:v option sets the average bit rate. The encoder changes the QP (section 3.1) to match the target bit rate.

3. http://www.videolan.org/developers/x264.html.


Quantization parameter
Can be set using the -qp option. Allows setting a fixed quantization parameter (section 3.1) for the encoder without regard to bit rate. Setting the QP to 0 results in lossless encoding. However, not all players can decode such video [15].

Constant rate factor
Can be set using the -crf option. CRF changes the overall quality of the output without regard to bit rate. Like QP, it takes values from 0 to 51. The difference between QP and CRF is that CRF changes the QP dynamically⁴ depending on the input, e.g. using a higher QP (worse quality) when there is more motion between frames. Because of this, using CRF produces subjectively better quality video [16].

Profile
Can be set using the -profile option. A profile is a subset of H.264 features. This only needs to be set when the target device supports only a certain profile.

Preset
Can be set using the -preset option. A preset is a collection of option changes relative to the default configuration. A slower preset provides a better compression ratio (lower bit rate) but requires more computational power. The presets can be viewed by running the x264 utility with the --fullhelp option.
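A sketch combining these options (values are illustrative):

# CRF-based encoding with a slower preset and a restricted profile
ffmpeg -i input.ts -c:v libx264 -preset slow -crf 23 -profile:v main output.flv

# alternatively, encoding with a fixed quantization parameter
ffmpeg -i input.ts -c:v libx264 -preset slow -qp 23 output.flv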

4.4 AAC

There are four different AAC encoders available in FFmpeg. The encoders
in this section are ordered from best to worst output quality.

4.4.1 fdk aac


This is the Fraunhofer FDK AAC library⁵. According to the FFmpeg documentation, the fdk aac library has the best output quality. It supports CBR via the -b:a option and VBR via the -vbr option [12, section 17.4].

4. Not drastically.
5. https://github.com/mstorsjo/fdk-aac.
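A sketch of both modes (the bit rate and the VBR level are illustrative):

# constant bit rate at 128 kbit/s
ffmpeg -i input.ts -c:a libfdk_aac -b:a 128k output.flv

# variable bit rate, quality levels 1 (lowest) to 5 (highest)
ffmpeg -i input.ts -c:a libfdk_aac -vbr 4 output.flv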

4.4.2 faac
This is the Freeware Advanced Audio Coder⁶. It supports ABR via the -b:a
option and VBR via the -q:a option [12, section 17.3].

4.4.3 aac
This is the internal, experimental AAC encoder in FFmpeg. It supports CBR via the -b:a option and has experimental support for VBR via the -q:a option [12, section 17.1].

4.4.4 vo aacenc
This encoder has very poor quality [17] and, according to the documentation, it is the worst of the available encoders. It supports only CBR via the -b:a option [12, section 17.9]. For these reasons, it is not considered in the performance testing.

4.5 LibAV

LibAV⁷ is a fork of FFmpeg (with an ffmpeg equivalent called avconv). It was created in 2011 by a group of developers unsatisfied with the management after a failed takeover of the FFmpeg project [18].
Some GNU/Linux distributions (mainly Debian and therefore also Ubuntu) switched to LibAV, since the maintainers of their FFmpeg packages were the same people who forked LibAV [19]. However, Debian maintainers are considering switching back to FFmpeg [20].
At the moment, changes in LibAV are merged into FFmpeg's git repository⁸ and LibAV only occasionally picks changes from FFmpeg [21]. FFmpeg is therefore a superset of LibAV⁹ and has fewer bugs [19]. According to one of the Debian security team members, FFmpeg developers are also more responsive to security issues [22].

6. http://www.audiocoding.com/faac.html.
7. https://libav.org/.
8. https://github.com/FFmpeg/FFmpeg.
9. At least feature-wise.

Chapter 5

Known problems

5.1 A/V synchronization

When running ffmpeg for a long period of time, there are issues where the audio stops being in sync with the video. Samples capturing the moment when this starts were obtained and the results were analyzed.
Multiple configurations of the aresample audio filter (subsection 4.2.1) were tested, as well as the asyncts alternative in avconv and the deprecated -async option in ffmpeg. None of these were able to fix the synchronization problem.
The problem is caused by missing audio data in the input. Audio timestamps in ffmpeg are computed only from the number of samples and the sampling frequency (section 2.2). If a part is missing, it is skipped and the audio is then continuous. This results in the audio being ahead of the video for the rest of the processing, and it cannot be automatically detected. None of the audio filters were able to compensate, because they need correct timestamps to work.
The only solution is to use the -copyts option (subsection 4.1.1), so that the timestamps are preserved and the audio synchronization filters can use them to correctly insert silence into the audio stream to compensate for the missing data.
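A sketch of such an invocation, assuming the aresample filter is used for the compensation (the async value, which limits how fast the audio may be stretched or squeezed, is illustrative):

ffmpeg -copyts -i input.ts \
    -af aresample=async=1 \
    -c:v libx264 -c:a libfdk_aac -f flv output.flv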

5.2 Timestamp overflow

When the -copyts option is used, ffmpeg starts to produce invalid output after 26.51 hours. The logs show that the timestamps are non-monotonous:

[mpegts @ 0x38e3a40] Non-monotonous DTS in output stream 0:0;
previous: 8589931200, current: 208; changing to 8589931201.
This may result in incorrect timestamps in the output file.

This happens because of a timestamp overflow in the MPEG-TS (section 3.3) demuxer. Since timestamps in MPEG-TS have 33 bits and a frequency of 90 kHz, they overflow exactly after 2^33 / 90000 ≈ 95443.72 s ≈ 26.51 h. The patch (appendix patch 1) handles such overflows by unrolling them.

5.3 Sample aspect ratio in FLV format

If a web-based FLV (section 3.4) player ignores the SAR (section 2.4) stored inside the video stream, then an invalid DAR is computed and the video is disproportional. Note that FLV has no means of storing the SAR.
The only way to get a correct DAR is to change the width or height of the video. This can be done by resending the onMetaData tag with the correct width or height every time the SAR in the input video changes. Note that the onMetaData tag is ignored by conventional video players.
A patch (appendix patch 2) was created that adds this ability to ffmpeg. It splits the flv_write_header function into two¹: one that writes only the header and one that writes the onMetaData tag. This adds the ability to resend the tag when a SAR change is detected; the width of the video is then properly scaled. However, the muxer's writing function only has the first SAR available during the muxing process, and any changes are not reflected in the structures available to it. Thus, the aspect ratio has to be updated manually in the main code of ffmpeg by a workaround.

5.4 re option

Using the -re option together with the -copyts option (subsection 4.1.1) causes ffmpeg not to start reading and processing the input. The -re option works by computing the relative time since the start in the same units as the produced timestamps. When the timestamps are ahead of the relative time, reading sleeps for a short period of time.
However, when the -copyts option is used, the timestamps have an offset. This results in sleeping until the relative time catches up with the offset.
There are two possible solutions. The -start_at_zero option² can be used to globally remove the offset. Alternatively, a patch (appendix patch 3) can be used that subtracts the offset from the timestamp locally in the function that compares it to the relative time. With either of these, ffmpeg starts immediately as normal.
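A sketch of the configuration-based fix (input and output are illustrative):

ffmpeg -re -copyts -start_at_zero -i input.ts -f mpegts udp://127.0.0.1:5678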

1. During the creation of this work, this had already been done in LibAV and merged into FFmpeg.
2. This option was added on October 20, 2014.


5.5 Start delay

Restarting ffmpeg is needed when the input is corrupted and ffmpeg is unable to recover from it. However, there is an analysis phase (controlled by the -probesize and -analyzeduration options (subsection 4.1.1)) before it starts to produce output. With a live source, ffmpeg has to wait for enough data to be buffered, which adds a discontinuity (after a restart) in the output on the receiver's end, depending on the options mentioned above. Lowering the -probesize and -analyzeduration options is not a solution, because it raises the probability of incorrect detection of the individual streams or their parameters.

5.5.1 Seamless restart proposal

The previous problem can be fixed by running a second instance of ffmpeg in the background that only buffers data and does not process it. When needed, processing in the backup process can be resumed by sending it a signal. After that, another backup process is started in the background, waiting for a signal.
Because enough data for the analysis phase is buffered, there is no additional delay. If the time length of the buffered data corresponds to the invalid-output detection time (Figure 5.1), this also removes any output discontinuity caused by the detection.

Figure 5.1: Seamless restart proposal

For example, if there are 10 seconds of video in the buffer of the backup process and the normal process is killed after 10 seconds of not producing any output, the lost data is recovered by processing the contents of the buffer. If sending data to a receiver is delayed by at least 10 seconds, this minimizes any major discontinuities on the receiver's end.
An experimental patch (appendix patch 4) was created for reading input via UDP³. It keeps the circular buffer used by ffmpeg when reading from a UDP source filled to no more than half of its capacity. This capacity can be set via the fifo_size option in the URL [12, section 25.33]. Waiting for the signal itself is activated via the wait_for_signal option in the URL; ffmpeg then waits for the SIGUSR1 signal. After receiving the signal, it continues as usual.
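A sketch of starting such a backup process; fifo_size is a standard option of FFmpeg's UDP protocol, while wait_for_signal exists only with the experimental patch (appendix patch 4), and all values are illustrative:

# backup instance: buffer the input and wait for SIGUSR1
ffmpeg -i "udp://239.0.0.1:1234?fifo_size=100000&wait_for_signal=1" \
    -c:v libx264 -c:a libfdk_aac -f mpegts udp://127.0.0.1:5678 &
# later, when the main instance fails, resume the backup instance
kill -USR1 $!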
To further improve this method, the buffer of the backup process can be synchronized by flushing it via another signal while the main process is producing valid output. Sending the signal is done by the application that detects errors in the output of the processes. When the main process stops producing data, the buffer of the backup process then contains exactly the data from the point when the main process stopped producing valid data.

3. http://tools.ietf.org/html/rfc768.

Chapter 6

Testing

Since encoding video and audio is CPU intensive, it is important to use an optimal configuration and optimal libraries. In a live streaming environment it is also important to know whether the tools can survive any kind of input, so that they can run continuously. The following tests cover both of these aspects. For tests performed multiple times, only one example is provided if all the tests have the same result.

6.1 Tools

The following versions of ffmpeg, avconv and the external libraries are used (Table 6.1). Note that the 0.142.x version of the x264 library is the git snapshot snapshot-20141116-2245-stable. For easy statically linked compilation of FFmpeg, LibAV and the libraries, the sffmpeg project¹ is used.

            FFmpeg/LibAV   x264         faac   fdk aac
ffmpeg-new  2.4.3          0.142.x      1.28   0.1.3
ffmpeg-old  0.8.16         0.122.2184   1.28   not used
avconv      11             0.142.x      1.28   0.1.3

Table 6.1: Versions

The CPU usage in the tests is collected from the /proc/<pid>/stat file, where the CPU time spent running the process is available. These times are in clock ticks [23]. The times spent in kernel space and in user space are added together. The values have a resolution of 100 ticks per second, so the actual time in seconds can be obtained by dividing the number of ticks by 100. Because the graphs show the tick increment per second, the number of ticks per second also represents the CPU usage in percent in that second.
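A minimal sketch of this measurement ($PID is a placeholder for the process ID; the field numbers follow proc(5)):

# utime (field 14) and stime (field 15) are CPU times in clock ticks;
# summed, they give the total CPU time of the process (this assumes the
# process name in field 2 contains no spaces, which would shift the fields)
awk '{ print $14 + $15 }' /proc/$PID/stat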

1. https://github.com/pyke369/sffmpeg.


The memory used by the process is obtained using a tool called smem². The memory size used in the tests is the unique set size (USS), which is the memory unique to the process, thus not including any shared libraries³.
Data gathering is done using a script that saves the data in an RRD database and produces graphs (appendix tool 1) using the rrdtool utility⁴. Note that the tests were performed on a GNU/Linux server and everything stated above applies only to this operating system.

6.2 Performance

6.2.1 Versions

This test compares the new and old versions of ffmpeg and the current version of avconv.

Figure 6.1: CPU usage

Figure 6.2: Memory usage

2. http://www.selenic.com/smem/.
3. This does not really matter in this case since the tools were linked statically.
4. http://oss.oetiker.ch/rrdtool/.


There is no difference between the old and the new version of ffmpeg. Only avconv has a slightly higher CPU usage and uses about 20% more memory.

6.2.2 Multiple outputs

Figure 6.3: CPU usage

Figure 6.4: Memory usage

When using two separate ffmpeg processes, the CPU usage is higher by around 12% and the memory usage is higher by around 30%, because the input has to be decoded twice.

6.2.3 x264 presets

Tests of the x264 library presets. The goal is to test how these presets affect the CPU usage and the bit rate. Encoder settings are left at their default values and the output video is scaled down to a 960×540 resolution.


Figure 6.5: CPU usage

Figure 6.6: Memory usage

Figure 6.7: CPU usage


Figure 6.8: Memory usage

Preset      CPU [tps]   Memory [MB]   Bit rate [kbit/s]
veryslow    1202.17     514.64         928.358
slower       682.20     457.14        1007.65
slow         514.27     429.47        1044.03
medium       391.60     408.83        1054.41
fast         361.91     375.43        1067.69
faster       325.06     358.13        1009.18
veryfast     285.41     339.60         908.94
superfast    248.06     327.20        1571.77
ultrafast    230.33     240.65        2837.75

Table 6.2: Average values

Table 6.2 shows the relation between the CPU/memory usage and the bit rate. The results show that using a faster preset than veryfast is not worth the saved CPU power. At these output settings, using slow presets offers only a small decrease in the bit rate. However, presets slower than medium have a significantly better output quality, and thus the slower or slow preset should be used when high-quality output is required, since their CPU usage penalty is worth the quality increase [24].


6.2.4 AAC libraries

Figure 6.9: CPU usage

Figure 6.10: Memory usage

The only difference is in the memory usage of the faac library, which uses about twice as much memory as the other encoders.

6.3 Stability

The goal of these tests is to check the behavior on damaged MPEG-TS (section 3.3) input, specifically crashes and recovery after such input. The damaged input was generated by a fuzzer (appendix tool 4), which flips every bit in a part of the input with some probability. The validity of the output was checked manually. Tests were performed on both H.264 and MPEG-2 encoded input with similar results. In some cases the processes continued running even after the output socket was closed.

6.3.1 Most common cases


For an error probability of 1/10000, the most common case was a successful recovery by both ffmpeg and avconv. This section covers the most common cases where the recovery failed. The fuzzer injects errors between 1:30 and 1:50.

Case 1: no output in both tools

After starting to receive undamaged input again, both ffmpeg and avconv never resumed producing valid data and the CPU usage dropped to a minimum.

Figure 6.11: Case 1: CPU usage

Case 2: excessive CPU usage

In some cases, one of the tools was able to keep producing valid output. However, it started to use more CPU, even after it went back to reading undamaged input.

Figure 6.12: Case 2: CPU usage


Case 3: one-time memory leak in avconv

In this case, avconv recovered and produced valid data. However, it started
to use more memory than before, probably due to a memory leak during
processing of the damaged input. The size of the leak varied between
different inputs.

Figure 6.13: Case 3: CPU usage

Figure 6.14: Case 3: Memory usage

In this example, the memory usage increased by over 360%.

Case 4: memory exhaustion by avconv

In this case, neither tool continued to produce valid data, and avconv started to use more CPU while endlessly allocating memory.


Figure 6.15: Case 4: CPU usage

Figure 6.16: Case 4: Memory usage

Note that the drop in memory usage at the end (Figure 6.16) is a result of closing the output socket.

Chapter 7

Conclusion

The known problems were fixed either by a configuration change (section 5.1, section 5.4) or by a patch (section 5.2, section 5.3, section 5.5). These fixes were tested to verify that they solve the problems. The solution to the timestamp overflow problem (section 5.2) removes the 26.5-hour runtime limit of an ffmpeg process reading MPEG-TS input.
The performance of multiple versions of the tools and external libraries was tested (section 6.2). The results show that it is optimal to use a single ffmpeg process for multiple outputs in order to save CPU power by not decoding the input multiple times (subsection 6.2.2). For H.264, they show that the slow and slower presets are the most preferable, offering a good balance between CPU usage, bit rate and output quality (subsection 6.2.3). For the AAC libraries, they show that there is no substantial performance reason to prefer one of them (subsection 6.2.4). This makes the Fraunhofer FDK AAC library the best choice for encoding AAC because of its superior output quality (subsection 4.4.1). The tests therefore allow for a better selection of tools and external libraries and for a more optimized configuration.
The stability of ffmpeg and avconv was tested on damaged input using a fuzzer (section 6.3). All the found problems were present for both MPEG-2 and H.264 encoded input. The results also show that avconv has severe memory management problems, which makes it unsuitable for a production environment. No crashes occurred during the testing, with the exception of avconv allocating too much memory. Since ffmpeg is also unable to handle damaged input (i.e. recover from it), the only solution¹ at the moment is to detect such errors and use the seamless restart proposal (section 5.5).
Considering the current state of the FFmpeg/LibAV conflict (section 4.5) and the memory management issues of LibAV, there is currently no reason to prefer LibAV over FFmpeg.

1. Unless the input has an error-correcting code with enough redundancy attached.


Since the input in a live-streaming environment can typically be damaged, fixing the handling of such input is one of the possible directions for further improvement of the ffmpeg and avconv tools.

Chapter 8

Bibliography

[1] Iain E. Richardson. The H.264 Advanced Video Compression Standard. 2nd ed. John Wiley & Sons, Ltd, Apr. 2010.
[2] Ida M. Pu. Fundamental Data Compression. Butterworth-Heinemann, Nov. 2005.
[3] Mario A. T. Figueiredo. Scalar and Vector Quantization. Tech. rep. Nov. 2008.
[4] Pulse-code modulation. URL: http://en.wikipedia.org/wiki/Pulse-code_modulation (visited on 01/01/2015).
[5] A Guide to MPEG Fundamentals and Protocol Analysis. Tutorial. Tektronix, 2000.
[6] Pixel aspect ratio. URL: http://en.wikipedia.org/wiki/Pixel_aspect_ratio (visited on 01/01/2015).
[7] Understanding H.264 video. Tech. rep. IndigoVision, 2008. URL: http://www.vdtsi.com/indigo/11.pdf.
[8] Jan Ozer. "HEVC: Are We There Yet?" In: Streaming Media (Sept. 2014). URL: http://www.streamingmedia.com/Articles/Editorial/Featured-Articles/HEVC-Are-We-There-Yet-99363.aspx (visited on 01/04/2015).
[9] Advanced Audio Coding. URL: http://en.wikipedia.org/wiki/Advanced_Audio_Coding (visited on 01/01/2015).
[10] Peter A. Sarginson. MPEG-2: Overview of the systems layer. Research and Development Report. British Broadcasting Corporation, 1996. URL: http://www.bbc.co.uk/rd/publications/rdreport_1996_02.
[11] Video File Format Specification. Version 10. Adobe Systems Incorporated, 2008. URL: http://www.adobe.com/content/dam/Adobe/en/devnet/flv/pdfs/video_file_format_spec_v10.pdf.
[12] ffmpeg Documentation. URL: https://www.ffmpeg.org/ffmpeg-all.html (visited on 01/01/2015).
[13] avconv Documentation. URL: https://libav.org/avconv.html (visited on 01/01/2015).
[14] x264 FFmpeg Options Guide. URL: https://sites.google.com/site/linuxencoding/x264-ffmpeg-mapping (visited on 01/04/2015).
[15] FFmpeg and H.264 Encoding Guide. URL: https://trac.ffmpeg.org/wiki/Encode/H.264 (visited on 01/01/2015).
[16] Werner Robitza. CRF Guide. 2014. URL: http://slhck.info/articles/crf (visited on 01/01/2015).
[17] Hendrik Leppkes. [PATCH] doc/encoders: Add libvo-aacenc doc. 2013. URL: http://ffmpeg.org/pipermail/ffmpeg-devel/2013-June/144589.html (visited on 01/01/2015).
[18] Michael Larabel. "A Group Of FFmpeg Developers Just Forked As Libav". In: Phoronix (Mar. 2011). URL: http://www.phoronix.com/scan.php?page=news_item&px=OTIwNw (visited on 01/01/2015).
[19] FFmpeg versus Libav. URL: https://github.com/mpv-player/mpv/wiki/FFmpeg-versus-Libav (visited on 01/01/2015).
[20] Jonathan Corbet. "Reconsidering ffmpeg in Debian". In: LWN (Aug. 2014). URL: http://lwn.net/Articles/607591/ (visited on 01/01/2015).
[21] The FFmpeg/Libav situation. 2012. URL: http://blog.pkh.me/p/13-the-ffmpeg-libav-situation.html (visited on 01/01/2015).
[22] Moritz Muhlenhoff. Re: Bug#729203: [FFmpeg-devel] Reintroducing FFmpeg to Debian. Aug. 2014. URL: http://lwn.net/Articles/607596/ (visited on 01/01/2015).
[23] proc(5) - Linux man page. URL: http://linux.die.net/man/5/proc (visited on 01/01/2015).
[24] James Church. Preset settings in x264: the quality and compression speed test. Mar. 2014. URL: http://www.videoquality.pl/preset-settings-x264-quality-compression-speed-test/ (visited on 01/03/2015).

Appendix A

Attachments

Any input data, exact configurations and logs of ffmpeg and avconv are not included, as requested by the advisor. Some of the patches may not be public.

A.1 Patches

1 ffmpeg-2.4.3-mpegts_ts_unroll.patch.
Source: https://github.com/arut/ffmpeg-patches/blob/master/mpegts-33bit.

2 ffmpeg-2.4.3-flv_sar.patch.

3 ffmpeg-2.4.3-re.patch.

4 ffmpeg-2.4.3-wait_for_signal.patch.

5 coreutils-8.23-tee_pw.patch: a patch for tee created for some of the tests where a single input has to be copied to multiple instances of ffmpeg. It allows ignoring the SIGPIPE signal and write errors.

A.2 Tools

1 rrd.sh: a script for gathering data and creating graphs using rrdtool.

2 mosaic.sh: an example script that utilizes the overlay filter in FFmpeg to create a mosaic of multiple input videos.

3 Directory tester/: helper scripts for running multiple instances of ffmpeg.

4 Directory fuzzer/: the fuzzer used in the stability tests.
