Ffmpeg HOWTO

Ffmpeg is the backbone of many video-related applications, from players to editors (and it is, in fact, both of those things in itself). For the most flexible video and audio system possible, Ffmpeg should be installed directly from the latest source code with as many codec options turned on as possible. Find the code at http://ffmpeg.org.

Strengths

Codec Indifference

Ffmpeg contains one of the most powerful multimedia libraries on any platform, as evidenced by how commonly it is used in both open and closed source applications. A complete build of ffmpeg can ingest, process, and output nearly every video, audio, and image format in common use (and then some).

Scripting

Ffmpeg is a shell command as well as a set of libraries, so it can be invoked directly or as a sub-process.
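
For instance, here is a minimal sketch of batch processing from the shell (assuming a directory of .mts camera files, all to be converted to .mkv):

$ for video in *.mts; do ffmpeg -i "$video" "${video%.mts}.mkv"; done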

Multi-threaded

Encoding media is famously intensive, so why not use every core your computer has to offer?

Weaknesses

Complex

This is an application filled with features, and it deals with the tangled mess that is media codecs as well as anything can. It does take time to learn it, and then more time to understand how all of the features of encoding actually work. This is no worse than any other option; media encoding is a difficult enterprise.

Moving Target

Compatibility with media codecs is a moving target, so ffmpeg is subject to sometimes drastic changes on a fairly regular basis. Upgrading too often is generally not advised.

ffmpeg vs. libav

TL;DR
It is commonly said that libav and avconv can replace ffmpeg, but this is not the case. Use ffmpeg unless you have tested libav compatibility with the applications you depend upon.

Ffmpeg was forked into a project called libav (http://libav.org), which in programming terminology means that the code was copied and released as a new application. The applications are basically the same to most users, but not necessarily to the Slackermedia user.

Like ffmpeg, libav provides a collection of libraries as well as a shell application called avconv. To confuse matters further, some Linux distributions aliased the ffmpeg command to point to avconv, such that when users typed ffmpeg, they were actually running avconv commands using libav libraries. Since the two were so similar, this seemed a harmless technicality, and at first it was, but as the projects drifted apart in functionality, problems started cropping up, such that an application calling specifically for ffmpeg really did need ffmpeg, not avconv.

After extensively testing both ffmpeg and libav/avconv since the previous edition of this handbook, it is now Slackermedia's recommendation that you use ffmpeg proper for maximum assured compatibility. The greater Internet may try to assure you that the two are completely interchangeable, but to be safe, use ffmpeg until applications start adopting libav, or until you have tested the applications that you rely upon and feel confident that they are [and always will be] compatible with libav.

Media Compression with Ffmpeg

Understanding how media compression and ffmpeg work doesn't entirely remove the element of chance from media encoding, but it does help you understand how to begin an attempt at transcoding or compressing. In a worst case scenario, it can help you determine what went wrong with a badly compressed video or audio file, and how to fix it.

Nobody compresses or transcodes their media because they want to. You process audio and video through ffmpeg for one of two reasons:

File Size

The file size is too large for convenient storage (on a hard drive) or transmission (over a network).

In this case, you're intentionally compressing media in order to reduce file size. You're taking a loss in quality in favour of convenience.

File Format

A file's video format cannot be played (or cannot be played well) on a device (like a media player) or used in a program (like a video editor, or a streaming server) that you want to use.

In this scenario, you are exclusively concerned with compatibility and would rather not take any loss in quality. Ideally, transcoding from one format to another would provide a file with the exact same quality; it would be a 1:1 process. In fact, however, all pixels are being re-analyzed and re-written, so it's best to think of it as compression even though compression is precisely what you might be trying to avoid.

Know Your Media

Before making an attempt to convert your media to some other format or size, you should first look at the file's current attributes so that you can make intelligent choices about how you convert. To see all the most important information about a video file, use the command:

ffmpeg -i myVideo.mts

The output of this command can be pretty messy (especially if you're not entirely sure what you're looking for), and since no output file is given, ffmpeg ends with a complaint that one must be specified; that complaint is safe to ignore. If you prefer a front end that parses the output for you, install and use mediainfo:

mediainfo myVideo.mts

Or video-meta:

video-meta myVideo.mts

Or use mediainfo-gui, which presents the same information in a graphical application; it's handy, since it accepts drag-and-drop input.


Take note of attributes such as the frame size (the dimensions of the image, in pixels) of the video, the bit rate of both the video and the audio, the sample rate of the audio, the codecs of the video and audio, and the container format that all of this is saved into.

Codecs and Containers

Which video and audio codec you choose for your media is often not decided by you, but dictated to you by either a client or the restrictions of a software application. If you're encoding so that you can play a video on a specific media player, gadget, or application that will only play, for instance, h.264, then your choice is clear: h.264. On the other hand, if you're encoding for HTML5, then your choices are Theora, WebM, or h.264. If it's for personal use or for software that is flexible in the codecs it can work with, then your choices may be nearly limitless.

To complicate matters, your media also must be placed into a specific file format, or “container”. While some container formats are very flexible and allow nearly any combination of video, audio, frame size, aspect ratio, and frame rate, others are very strict. The .mpg container, for instance, specifically holds MPEG video (and then only MPEG-1 or MPEG-2), and that's all it can do. The .mkv container, on the other hand, is designed to be as flexible as possible and holds nearly anything you put into it.
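
Because the container is separate from the codecs inside it, ffmpeg can also move existing streams into a new container without re-encoding them at all, using its stream copy mode. A quick sketch (the file names are hypothetical, and this only works when the destination container accepts the source's codecs):

$ ffmpeg -i movie.mp4 -vcodec copy -acodec copy movie.mkv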

If you are encoding for your own use, then you are free to use any container format you please; as long as your media software is happy with it, anything is fair game. However, if you expect your media to be playable on devices or software that you do not control, then you need to choose carefully by reading up on what is supported by the device or software you are targeting.

Ffmpeg will not always prevent you from creating files that are impossible to play back.

Some players will even play containers holding codecs they are not specified to hold (think of it as a “quirks-mode” for media players), but that hardly means that they are “supposed” to, or that they will continue to do so after an update. The safest thing to do is to look at the specs and requirements of your targets, and encode according to their own specifications.

It sounds confusing, and it is, but the good news is that, broadly speaking, all video codecs do basically the same thing: they encode visual images into a signal that can be decoded back into an array of pixels, providing the audience with the illusion of moving pictures. Audio codecs, similarly, render sound waves in a digital form (samples); the higher the resolution, the better the sound being reproduced.

Strictly speaking, certain codecs have characteristics and quirks that might make you prefer one to another either artistically (for instance, maybe you prefer one codec's handling of a shallow depth of field, or how another handles shadows) or technically (maybe you need a codec's streaming ability), but these are usually opinions and not things you can find in technical specifications. Don't get too caught up trying to determine which codec is “best” for a specific media file you need to process. Eventually you'll form your own opinions on the subject, but until you have enough experience to intelligently form those opinions, just settle for the codec that best suits your real world needs, and start learning how to process video into it well.

Ffmpeg chooses what codec to use if you keep your command simple, so if you transcode a file from one format to another, you can just let the output file format define your codec parameters:

$ ffmpeg -i DSC0023.MTS maria_1_LS_2.mkv

From that command, ffmpeg converts an .mts (MPEG transport stream) file to the .mkv (open source Matroska) format, making codec choices that make sense between those two file formats. This abstracts the choice away from you, saving you from potentially illegal [out of spec] choices, although if you know your formats and codecs well enough, you can define them explicitly:

$ ffmpeg -i DSC10049.MOV -vcodec libxvid -acodec libvorbis aran_18_MCU_1.mkv

These simple format conversions don't take into account the quality or size of your video, but of course there are additional, important attributes of video files, all of which you can manipulate with ffmpeg to change or remain as close to the original as possible.

Frame Size

To reduce the amount of data in a video file, you can reduce the array of pixels it contains. In other words, reduce the frame size. It's not rocket science to understand why a smaller frame size produces a smaller file; 1920 pixels by 1080 pixels means that the luma and chroma values of 2,073,600 pixels are being individually managed by your computer…30 times (or sometimes more) per second, so reducing the volume of pixels reduces the file size dramatically.

The -s flag denotes frame size in an ffmpeg command, and takes as its argument either a literal pixel count, such as 1920x1080 or 1280x720, or a common name, such as hd1080, hd720, vga, and so on. Here is an example of a command that converts the format and slashes the file size:


$ ffmpeg -i editors-cut_goldmaster.mkv -s hd720 editors-cut_reference.mov

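The same conversion works with a literal pixel count instead of a named size (assuming a 16:9 source, so that the picture isn't distorted):

$ ffmpeg -i editors-cut_goldmaster.mkv -s 1280x720 editors-cut_reference.mov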

The eternal question is how much you are willing to sacrifice quality for file size. You could reduce the frame size until your video is down to a more manageable file size, but any full-screen playback would then require software to up-res more and more, and screen resolutions of monitors on the market are, of course, continually increasing. Luckily, the frame size isn't the only method of reducing the size of the resultant file.

Bit Rate

Rather than reducing the number of pixels contained in an image, you can reduce the quality of how the original pixels or audio samples are represented.

This is the bit rate of a media file: the potential maximum amount of data that the encoder will allot to re-creating your image or your sound. It is usually defined in either kilobits per second (kbps) or megabits per second (Mbps).
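
The video bit rate is set with the -b:v flag. For instance, a hypothetical re-encode targeting roughly 5 megabits per second:

$ ffmpeg -i myVideo.mts -b:v 5000k myVideo_5mbps.mkv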

Still frame from video with a low bit rate. Image by Klaatu.

Think of all the advantages that Blu-ray, for instance, has over DVD, as a way to visualize the impact that bit rate has on video. Blu-ray movies have a larger frame size, and yet they are clearer and sharper than DVDs, the textures and objects in the background are less likely to become “muddy”, there's no accidental posterisation in the extreme dark areas of the image, and there are fewer digital artifacts in general.

Still frame from higher bit rate video. Image by Klaatu.

To decide what bit rate to use for your own video, you should first know its current bit rate, which you'll get from that initial ffmpeg -i (or mediainfo or video-meta) command you ran against your source video file.

Since finding out the bit rate is simple enough, arbitrarily deciding upon a lower bit rate to reduce the file size is easy. But how do you make an informed decision about bit rate? First, you must understand how encoders “see” audio and video. It's certainly not how our human eyes perceive them.

I,B,P Frames

The term “frames” in video is misleading, even when speaking of progressive (1080p, 720p) video. On celluloid film there are literal frames: one new image per 1/24th or 1/25th of a second. The human eye can see each frame, and on a traditional film editing bench, editors do in fact look at each frame as they decide where to make a splice. Those are frames.

Photo by mannyisdead

With video, data-complete frames (i.e., a complete, coherent picture) don't need to happen every 1/24th of a second, since a computer can retain pixels that don't change from previous frames. Imagine a video of a university professor standing in front of a whiteboard giving a lecture, in which half the whiteboard never changes and nothing ever crosses in front of it; the encoder could theoretically draw it once at the beginning of the video and never re-draw it for the full three hours of the lecture.

An encoder may borrow pixels from previous or future frames.

In ffmpeg, the frequency of high quality “intra” frames (I-frames) is controlled with the GOP (Group of Pictures) setting. Intuition might lead you to believe that a video consisting exclusively of I-frames would be a lossless video, but in practise it tends to waste bit rate with little or no gain in quality. This is because effective bit budgeting depends not upon whether a frame is “complete”, but upon how many individual pixels are being calculated and drawn on screen.

If the encoder can borrow pixels from the previous or next frame and save bandwidth by not re-drawing those pixels, then a forced I-frame that necessarily re-draws all pixels would actually demand maximum bit rate for that frame either at the expense of the next few frames, or the overall file size. Alternatively, the encoder can blob a group of similar pixels together and, essentially, draw three or 10 or 30 pixels for the price of a single sample.

More important is the timing of the I-frames, and this would be a tedious thing for us to control; luckily, it's one of the functions of a video encoder. An encoder compares frames and determines what would benefit most from a full re-draw (producing an I-frame), what would benefit from borrowing pixels from the previous frame (producing a P-frame), and what would benefit from borrowing pixels from both the previous and the following frame (a B-frame).

Most of the modern encoders, like Theora (Ogg), VPX (WebM), Xvid, and x264, are very good at setting GOP sizes so that the I-frames are intelligently placed throughout the video, so under normal circumstances most people will never need to set the GOP size. But if you are having problems with your results, then you might need to alter the GOP size as part of your command. For instance, if your encoded video is too “blocky”, then you need more I-frames so that there are more frequent high-quality images to sample from. If your encoded video is still too big (in file size), then you might need to decrease the frequency of I-frames by increasing the GOP number.

If you want to try different GOP sizes, you might want to start with your frame rate multiplied by 2. The GOP size is set with the -g flag. For example:

$ ffmpeg -i chase_12_LS_2.mkv -s hd720 -g 60 -b:v 21000k chase_ref_g60.mkv

A GOP size of 60 at a frame rate of 29.97 produces an I-frame roughly every 2 seconds (60 divided by 29.97). This may be too frequent, or possibly not often enough; it depends on the content of the shot, the size of the frame, and the bit rate.

If that doesn't work for you, then increase or decrease the GOP size by the frame rate, and then work with smaller increments until you reach a quality and file size that makes you happy.

Once you're satisfied, take note of what settings you used, and what kind of video file it was (original codec, destination codec, fast or slow paced content, and so on) for future reference.

Variable and Constant Bit Rates

A computer is particularly well suited for determining just how much each of a few million pixels has changed from one fraction of a second to the next, so one of the best ways to decide a bit rate is to just leave it up to the encoder. The -qscale flag in ffmpeg provides a variable bit rate (VBR), a powerful option that lets the encoder decide what bit rate to use depending on the complexity of some set of frames. In an action film, for example, the bit rate would be kept low during an expository conversation in a quiet restaurant wherein groups of pixels can be re-used for seconds at a time, but it would be boosted once the conversation inevitably turns into an explosive action scene.

Set to 1, -qscale tells the encoder to maintain excellent overall quality, whatever bit rate it takes. Conversely, setting -qscale to 31 tells the encoder to allow quality to suffer. Depending on what you choose between these two extremes, the file size savings may not be enough for your purposes. In that case, you might need to hard-code the bit rate, which turns out to be more of an art than a science.

While overall quality does bear a relationship to frame size and bit rate (the larger the frame size, the greater the bit rate “wants” to be), there is no algorithm for the process, because bit rate is also bound to pixel activity within the frame.

A three-hour college lecture will look fine at a lower bit rate while a 90-minute action film would suffer under the same rate. This isn't only because of the difference in pixel activity; there's an artistic quality to defining “quality”. Your eye is more discerning of the action film; most people are a lot less forgiving of digital artifacts in their entertainment than in a boring lecture video.

So it comes back to knowing your video. Skim through your video and make some general classifications; is it fast-paced action video, or an educational lecture, or a travel video, a student film or a million-dollar blockbuster, or a family vacation? Let this guide you toward what range of bit rates to consider. It helps to think in familiar terms like below-DVD-quality, DVD-quality, or Blu-ray Quality. For below-DVD-quality, start in the high-hundreds for standard definition video or the low thousands for high definition video. For DVD-quality, start at 7000kbps or 8000kbps for standard definition and 12000kbps to 15000kbps for HD. For Blu-ray quality, look at 25000kbps to 35000kbps.

A command providing a high quality VBR encode with reduced file size:

$ ffmpeg -i DSC0032.MOV -s hd720 -aspect 16:9 \
-qscale 1 -vcodec libxvid -acodec libvorbis 32_review.mkv

Or in the case that you require more control over bit rate, you might specify a constant bit rate:

$ ffmpeg -i DSC0032.MOV -s hd720 -aspect 16:9 \
-b:v 15000k -vcodec libxvid -acodec libvorbis 32_review.mkv

Notice that in both examples the size is reduced from the original 1080p to 720p. This, combined with lowering the bit rates from the original video, compresses the file to a more reasonable file size.

If you were only transcoding and trying to achieve zero loss of quality, then a simpler and less specific command would do:

$ ffmpeg -i DSC0032.MOV -qscale 1 32_review.mkv

2-pass Encoding

One way to help the encoder achieve the best possible quality at the best possible file size is to use 2-pass encoding. The first pass analyses your video and creates a log file which ffmpeg can then use during the second pass to do the actual encoding of video and audio. A 2-pass encode doesn't mean your file will be any smaller, necessarily, but it almost always ensures that the encoding will be of a higher quality for the file size you end up with. In short, you get “more bang for your buck”.

To perform a 2-pass encode, the first pass must be exclusively analytical:

$ ffmpeg -i well_3_MLS_1.mkv -vcodec libxvid -an \
-pass 1 -f rawvideo -y /dev/null

This causes ffmpeg to write the new video to /dev/null (in other words, it throws the results out) while writing data about the frames to a log file (named with the prefix ffmpeg2pass) in the current directory. Since audio is not accounted for during this process, you can use the -an flag to ignore the audio stream (think “audio: null” or just “audio? no!”).

The second pass is performed like your usual ffmpeg command, with the addition of the -pass 2 flag and the name of the log file to which ffmpeg should refer, such as -passlogfile ffmpeg2pass.

$ ffmpeg -i well_3_MLS_1.mkv -vcodec libxvid -acodec libvorbis \
-r 25 -ar 44100 -b:a 128k -qscale 5 -s hd720 -pass 2 \
-passlogfile ffmpeg2pass well_ref.mkv

Frame Rate

Any video has a native frame rate, depending on how it was recorded. As with everything else, you can find out a video's native frame rate with ffmpeg -i, or with a tool like mediainfo or video-meta, or even a good player like VLC. Common values are 29.97 (NTSC) and 25 (PAL) for standard definition video, 24 (and 23.98 for film transferred to NTSC) from the tradition of film, and 48 and 60 for higher frame rate video (often used to reduce strobing). As you might expect, reducing the frame rate reduces the resulting file size, although its primary benefit is in streaming.

The trade-off with a reduced frame rate is mostly aesthetic; the motion is not as smooth as at the video's native frame rate. The more you reduce the frame rate, the more drastic and noticeable this becomes. Exactly when you or your audience actually start to notice it is an entirely different matter, and depends on how much movement there is in the frame in the first place, and on whether or not the viewer tends to notice things like that. Experiment with different frame rates to witness the practical difference between them.

In practise, lowering the frame rate does not reduce the file size as much as you might think (taking a 60 fps video down to 30 with no other change, for instance, does not reduce the file size by half as you might expect). It's usually safe to leave the frame rate at its native value, or at 24 or so in the cases of high frame rate source files, unless your destination device or application demands a change.

Frame rate is controlled in ffmpeg with the -r flag:

$ ffmpeg -i desk_b-roll.mkv -r 18 desk_18fps.mkv

Audio

Since most videos have sound, at least some portion of your overall file size is determined by how its audio has been encoded. The same general principles that apply to video also apply to audio, with a few variations.

Audio has a bit rate, assigned with the -b:a flag, which determines how much data is used to recreate the audio waves. You're probably already familiar with this idea, since online music stores usually advertise their song quality as either 128kbps or the higher quality 192kbps or 256kbps versions. The higher bit rates preserve more subtlety, the middle ranges (128kbps is a good middle-of-the-road number) provide a “good enough” quality, and the lower ranges start to blatantly sacrifice quality. Once again, how much this matters depends on you, and on the content of the media itself. A lecture video, for instance, can tolerate 80kbps encoding, while a lush musical soundtrack would arguably suffer even at 192kbps.

Audio also has channels. As you might expect, the more channels you have, the larger the file will be. It's common practise to reduce any surround sound soundtrack to stereo, and in some cases to simply use one mono channel. ffmpeg uses the -ac flag to define how many channels you want, with 2 being stereo and 1 being mono.

The sample rate of audio defines how many samples of a sound wave are used per second, and is measured in thousands of samples per second. DVD quality is considered 48000Hz (48kHz), while CD quality is 44100Hz (44.1kHz). Anything lower (32000Hz, 22050Hz, 16000Hz) suffers noticeably in quality, although the savings in file size are remarkable. However, when transcoding, drastically changing the sample rate of the audio could throw the audio track out of sync with your video, so use this ability carefully. Ffmpeg uses the -ar flag to define the sample rate.

Here is an example of defining the type of sound encoding during video compression:

$ ffmpeg -i hackmovie_goldmaster.mkv -s hd720 \
-b:v 18000k -b:a 128k -ar 44100 -ac 2 hackmovie_online.mkv
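
If only the audio needs adjusting, the video stream can be copied through untouched so that it loses no quality at all. A sketch (the output name is hypothetical, and this assumes the container accepts the original video codec unchanged):

$ ffmpeg -i hackmovie_goldmaster.mkv -vcodec copy -b:a 128k -ar 44100 -ac 2 hackmovie_audiofix.mkv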

Threads

If you're using a computer with multiple CPU cores, take advantage of the -threads flag. It's simple:

$ ffmpeg -i hackmovie_goldmaster.mkv -threads 8 hackmovie.webm

The rule of thumb for calculating the number of threads your computer can handle is to take the number of CPU cores and either:

  • To encode and still be able to use your computer: Number of cores = threads
  • To encode overnight: (Number of cores) * 2 = threads

To find out how many cores your CPU has, look into /proc (the numbering starts at 0, so this example machine has 8 cores):

$ grep processor /proc/cpuinfo
processor : 0
processor : 1
processor : 2
processor : 3
processor : 4
processor : 5
processor : 6
processor : 7
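
Alternatively, the nproc command (part of GNU coreutils) prints the count directly:

$ nproc
8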

Adjust as needed, depending on the actual performance of your computer and what you need to work on while it encodes.

How to Test Before Encoding

Because a computer is doing the work, many people view video encoding as a strictly technical process. And certainly, the encoders we use should be respected; no sane artist or programmer would want to decide how each individual frame is encoded when there are 30 of them per second across a 90-minute movie.

On the other hand, there's a lot of artistry in looking at a video and making intelligent choices when issuing the encoding commands to the computer. Take into account what the video content is, how people will be watching it, what action is happening within the frames, and which qualities are important. Use these artistic impressions to guide you in the choices you make about frame size, bit rate, and frame rate. Encode with two passes, and encode multiple versions of the same video. Compare the results. In no time, you'll get a good feeling for what different codecs have to offer, and for what kinds of video can handle which kinds of compression.

It's not easy to test encoding when each encode takes 6 hours, only to be thrown out for another try. Luckily, ffmpeg accepts a start time and a duration, allowing you to encode small sections of a video.

The -ss option dictates what time to start encoding, and the -t dictates how long to encode for (not the timecode at which to stop, as a video editor would expect; the values are start time and duration, not in and out). For example, to start encoding at 3 minutes and 30 seconds into a video, and to encode for 1 minute:

$ ffmpeg -i hackmovie_snippet.mkv -ss 00:03:30 -t 00:01:00 \
-s hd720 -threads 8 compression-test.webm

Run a few hundred encoding tests overnight, study the results, and you'll be an expert in no time.

Performance Boost

Linux video and audio editors generally support every possible codec they can, and that's quite a lot. This affords you great freedom, but should you experience performance issues while working with a compressed file format (such as MP3, Vorbis, Theora, Dirac, and so on), consider extracting the video and audio streams from their compressed containers and working with them as native or [nearly] uncompressed files.

If you do find that you need to convert media, you obviously want to avoid losing quality, so ensure that you are converting to a less compressed format, and that you are retaining the exact same settings as the source video.

If mediainfo or video-meta reports a bit rate of 67M at 29.97 fps, then when you convert, use at least 67M for the bit rate and 29.97 for the frame rate.

Here are some example commands for the various native and [mostly] uncompressed formats:

WAV

Uncompressed PCM audio.

ffmpeg -i foo.bar -vn -ar 48000 foo.wav

AIFF

Uncompressed PCM audio.

ffmpeg -i foo.bar -vn -ar 48000 foo.aiff

AU

Sun Microsystems uncompressed PCM data.

ffmpeg -i foo.bar -vn -ar 48000 foo.au

Native and [mostly] uncompressed video formats:

FFV1

Native ffmpeg video format.

ffmpeg -i foo.bar -an -vcodec ffv1 -b:v 80M -threads 8 foo.mkv

Huff YUV

Lossless video format.

ffmpeg -i foo.bar -an -vcodec huffyuv -b:v 80M -threads 8 foo.mkv

MOV

Quicktime movie file, encoded here with QuickTime Animation (qtrle), a lossless codec that ffmpeg can write into .mov:

ffmpeg -i foo.bar -an -vcodec qtrle -b:v 80M -threads 8 foo.mov

There are other formats, but these are well supported and tested.

Lossless Codecs

Ffmpeg supports a number of lossless formats ideal for Gold Masters and long-term storage, including FFV1 and HuffYUV for video, and FLAC and WavPack for sound.
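
For instance, a sketch of archiving just the audio track losslessly with FLAC:

ffmpeg -i foo.bar -vn -acodec flac foo.flac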
