
Ffmpeg

Ffmpeg is the backbone of many video-related applications, from players to editors (and it is, in fact, both of those things in itself). For the most flexible video and audio system possible, Ffmpeg should be installed directly from the latest source code with as many codec options enabled as possible. Find the code at http://ffmpeg.org.

Strengths

Codec Indifference

Ffmpeg contains one of the most powerful multimedia libraries on any platform, as evidenced by how commonly it is used in both open and closed source applications. A complete build of ffmpeg can ingest, process, and output nearly every video, audio, and image format in common use (and then some).

Scripting

Ffmpeg is a shell command as well as a set of libraries, so it can be invoked directly or as a sub-process.
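
As a minimal sketch of scripted use (the filenames and output format here are hypothetical), a shell loop can batch-convert a whole directory:

$ for f in *.avi; do ffmpeg -i "$f" "${f%.avi}.mkv"; done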

Multi-threaded

Encoding media is famously intensive, so why not use every core your computer has to offer?

Weaknesses

Complex

This is an application filled with features, and it deals with the tangled mess that is media codecs as well as anything can. It does take time to learn it, and then more time to understand how all of the features of encoding actually work. This is no worse than any other option; media encoding is a difficult enterprise.

Moving Target

Compatibility with media codecs is a moving target, so ffmpeg is subject to sometimes drastic changes on a fairly regular basis. Upgrading too often is generally not advised.

ffmpeg2theora

ffmpeg2theora is not a part of ffmpeg but an independent frontend for converting any format that ffmpeg can read into the free Theora (video) and Vorbis (audio) codecs in an Ogg container.

There may be no real advantage to using ffmpeg2theora rather than just using ffmpeg, but it does have an abbreviated set of options since its goal is far more focused than the near-infinite possibilities of ffmpeg.
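
In its simplest form (a sketch assuming the default settings are acceptable), a single command does the whole job, typically writing input_file.ogv alongside the source:

$ ffmpeg2theora input_file.avi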

Tip

For some video (especially high-resolution video meant to be dynamically scaled on any variety of device with any variety of screen size), you should also consider the WebM format, which, like Theora, is free and open source, and which ffmpeg can produce directly.
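
For instance, a WebM encode through ffmpeg might look like the following (a sketch assuming your ffmpeg build includes the libvpx and libvorbis encoders; the filenames and rates are hypothetical):

$ ffmpeg -i input_file.avi -s hd720 -b:v 2000k -b:a 128k output.webm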

The command structure is similar, so if you've used ffmpeg or mencoder to transcode media then it will all feel very familiar:

$ ffmpeg2theora input_file.avi -x 1920 -y 1080 -V 21000 -A 320 -c 2 -H 44100 -o output.ogv

In other words:

$ ffmpeg2theora [input filename] -x [target horizontal pixel count] -y [target vertical pixel count] -V [target video bit rate in kbps] -A [audio bit rate in kbps] -c [audio channels] -H [audio sample rate in Hz] -o [output filename]

ffmpeg2theora is available from slackbuilds.org and requires no special compile options.

FFmpeg HOWTO

Unlike proprietary video converters, ffmpeg is versatile; since it has no commercial interests, it will decode and encode any format aside from those that proprietary vendors successfully obfuscate. ffmpeg is primarily a command line application, so controlling it is direct, simple, and can be scripted to handle repetitious tasks.

Understanding how video compression and ffmpeg work doesn't entirely remove the element of chance from video encoding, but it does help you understand how to begin an attempt at transcoding or compressing. In a worst case scenario, it can help you determine what went wrong with a badly compressed video and how to fix it.

Why Compress Video?

Nobody compresses or transcodes their video because they want to. We process video through ffmpeg for one of two reasons:

  • The file size is too large for convenient storage (on a hard drive) or transmission (over a network).

    In this case, you're intentionally compressing video in order to reduce file size. You're taking a loss in quality in favour of convenience.

  • A file's video format cannot be played (or cannot be played well) on a device (like a media player) or used in a program (like a video editor, or a streaming server) that we want to use.

    In this scenario, you are exclusively concerned with compatibility and would rather not take any loss in quality. Ideally, transcoding from one format to another would provide a file with the exact same quality; it would be a 1:1 process. In fact, however, all pixels are being re-analyzed and re-written, so it's best to think of it as compression even though compression is precisely what you might be trying to avoid.

Know Your Video

Before making an attempt to convert your video to some other format or size, you should first look at the video's current attributes so that you can make intelligent choices about how you convert. To see all the most important information about a video file, use the command:

$ ffmpeg -i myVideo.mts

The output of this command can be pretty messy (especially if you're not entirely sure what you're looking for), so if you prefer to use a front end that parses the output then install and use either ??? or ??? or even vlc, all of which provide the same information in easy-to-read output.

Codecs and Containers

Which video and audio codec you choose for your video is often dictated to you. If you're encoding so that you can play a video on a specific media player, gadget, or software that will only play h.264, for instance, then your choice is clear. On the other hand, if you're encoding for HTML5 then your choices are Theora, Webm, or h.264. If it's for personal use or for some software that is flexible in the codecs it can work with, then your choices may be nearly limitless.

Your video probably also contains audio, so you'll need to choose an audio codec as well.

To complicate matters, your video also must be placed into a specific file format, or “container”. While some container formats are very flexible and allow you to put nearly any combination of video, audio, frame size, aspect ratio, and frame rate into them, others are very strict. The .mpg container, for instance, holds only MPEG video (specifically MPEG-1 or MPEG-2), and that's all it can do. The .mkv container, on the other hand, is designed to be as flexible as possible and holds nearly anything you put into it.

If you are encoding for your own use, then you are free to use any container format you please; as long as your media software is happy with it, anything is fair game. However, if you expect your media to be playable on devices or software that you do not control, then you need to choose carefully by reading up on what is supported by the device or software you are targeting. Ffmpeg will not always prevent you from creating files that are impossible to play back, and some players will even play containers that contain incorrect codecs (think of it as a “quirks mode” for video players), so if you're experimenting with new file formats, look them up to make sure you stay safely within spec.

It sounds confusing, and it is, but the good news is that (broadly speaking) all video codecs do basically the same thing: they encode visual images into a signal that can be decoded into an array of pixels, giving the audience the illusion of moving pictures. Strictly speaking, certain codecs have characteristics and quirks that might make you prefer one to another either artistically (for instance, maybe you prefer one codec's handling of a shallow depth of field, or how another handles shadows) or technically (maybe you need a codec's streaming ability), but these are usually opinions and not things you can find in technical specifications. Don't get too caught up trying to determine which codec is “best” for a specific video you need to process. Eventually you'll form your own opinions on the subject, but until you have enough experience to intelligently form those opinions, just settle for the codec that best suits your real-world needs, and start learning how to process video into it well.

Ffmpeg chooses which codecs to use if you keep your command simple, so when you transcode a file from one format to another you can let the output file format imply the codec choices:

$ ffmpeg -i sintel_1080p.mkv sintel.mov

From that command, ffmpeg converts the .mkv to a .mov, making codec choices that make sense for Quicktime decoders. This abstracts the choice away from you, saving you from potentially illegal (out of spec) combinations, although if you know your formats and codecs well enough, you can define them explicitly:

$ ffmpeg -i DSC10049.MOV -vcodec libxvid -acodec libvorbis output.mkv

These simple format conversions don't take into account the quality or size of your video, but of course there are additional, important attributes of video files, all of which ffmpeg can manipulate, whether to change them or to keep them as close to the original as possible.

Frame Size

To reduce the amount of data in a video file, you can reduce the array of pixels it contains. In other words, reduce the frame size. It's not rocket science to understand why a smaller frame size produces a smaller file; 1920 pixels by 1080 pixels means that the luma and chroma values of 2,073,600 pixels are being individually managed by your computer, 30 times (or sometimes more) per second, so reducing the number of pixels substantially reduces the file size.
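For example, scaling that 1920x1080 frame down to 1280x720 leaves 921,600 pixels per frame, less than half the original count, before any compression has even been applied.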

The -s flag denotes frame size in an ffmpeg command and takes as its argument either a literal pixel count such as 1920x1080 or 1280x720, or a common name such as hd1080, hd720, vga, and so on. Here is an example of a command that converts the format and slashes the file size:

$ ffmpeg -i sintel_1080p.mkv -s hd720 sintel.mov

The eternal question is how much you are willing to sacrifice quality for file size. You could reduce the frame size until your video is down to a more manageable file size, but any full-screen playback would then require software to up-res more and more, and screen resolutions of monitors on the market are, of course, continually increasing. Luckily, the frame size isn't the only method of reducing the size of the resultant file.

Bit Rate

Rather than reducing the number of pixels contained in an image, you can reduce the quality of how the original pixels are represented. This is the bit rate of a video and it is the potential maximum amount of data that the video will allot to re-creating your image, per second. It is generally defined in either kilobits per second (Kbps) or megabits per second (Mbps).
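To put that in perspective, a video stream encoded at 5,000 kbps works out to roughly 5,000 × 5,400 ÷ 8 ≈ 3,375,000 kilobytes, or about 3.4 GB, for a 90-minute film, before the audio is even counted.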

Think of all the advantages that Blu-ray, for instance, has over DVD, as a way to visualize the impact that bit rate has on video. Blu-ray movies have a larger frame size, and yet they are clearer and sharper than DVDs, the textures and objects in the background are less likely to become “muddy”, there's no accidental posterization in the extreme dark areas of the image, and there are fewer digital artifacts in general.

To decide what bit rate to use for your own video, you should first know its current bit rate, which you'll get from that initial ffmpeg -i (or mediainfo or video-meta) command you ran against your source video file.

Since finding out the bit rate is simple enough, arbitrarily deciding upon a lower bit rate to reduce the file size is easy. But how do you make an informed decision about bit rate? First, you must understand how encoders “see” video. It's certainly not how our human eyes perceive them.

I, B, and P Frames

The term “frames” in video is misleading, even when speaking of progressive (1080p, 720p) video. On celluloid film there are literal frames: one new image per 1/24th (or 1/25th) of a second. The human eye can see each frame, and on a traditional film editing bench editors do in fact look at each frame as they decide where to make a splice. Those are frames.

With video, data-complete frames (i.e., a complete, coherent picture) don't need to happen every 1/24th of a second, since a computer can retain pixels that don't change from previous frames. Imagine a video of a university professor standing in front of a whiteboard giving a lecture, in which half the whiteboard never changes and nothing ever crosses in front of it; the encoder could theoretically draw it once at the beginning of the video and never re-draw it for the full three hours of the lecture.

An encoder may borrow pixels from previous or future frames

In ffmpeg, the frequency of high quality “intra” frames (I-frames) is controlled with the GOP (Group of Pictures) setting. Intuition might lead you to believe that a video consisting exclusively of I-frames would be a lossless video, but in practice it tends to waste bit rate with little or no gain in quality. This is because effective bit budgeting depends not upon whether a frame is “complete” but on how many individual pixels are being calculated and drawn on screen.

If the encoder can borrow pixels from the previous or next frame and save bandwidth by not re-drawing those pixels, then a forced I-frame that necessarily re-draws all pixels would actually demand maximum bitrate for that frame either at the expense of the next few frames, or the overall file size. Alternatively, the encoder can blob a group of similar pixels together and, essentially, draw three or 10 or 30 pixels for the price of a single sample.

More important is the timing of the I-frames, and this would be a tedious thing for us to control; luckily it's one of the functions of a video encoder. An encoder compares frames and determines what would benefit most from a full re-draw (producing an I-frame), what would benefit from borrowing pixels from the previous frame (producing a P-frame), and what would benefit from borrowing pixels from both the previous and the following frame (a B-frame).

Most modern encoders, like ogg (theora), vpx (webm), xvid, and x264, are very good at setting GOP sizes so that the I-frames are intelligently placed throughout the video, so under normal circumstances most people will never need to set the GOP size. But if you are having problems with your results, then you might need to alter the GOP size as part of your command. For instance, if your encoded video is too “blocky”, then you need more I-frames so that there are more frequent high-quality images to sample from. If your encoded video is still too big (in file size), then you might need to decrease the frequency of I-frames by increasing the GOP number.

If you want to try different GOP sizes, you might want to start with your frame rate multiplied by 10. The GOP size is set with the -g flag. For example:

$ ffmpeg -i foo.avi -s vga -g 300 -b:v 2500k foo.mkv

A GOP size of 300 at a frame rate of 29.97 would place an I-frame roughly every 10 seconds (300 divided by 30).

If that is too infrequent, decrease the GOP size by multiples of your frame rate until you hit on something that works for you. Then take note of what you used, and what kind of video file it was (original codec, destination codec, fast- or slow-paced content, and so on) for future reference.
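
For example, dropping from -g 300 to -g 150 at roughly 30 frames per second places an I-frame about every 5 seconds (the filename and bit rate here are hypothetical):

$ ffmpeg -i foo.avi -s vga -g 150 -b:v 2500k foo_gop150.mkv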

Variable and Constant Bit Rates

A computer is particularly well suited for determining just how much each of a few million pixels has changed from one fraction of a second to the next, so one of the best ways to decide a bit rate is to just leave it up to the encoder. The -qscale flag in ffmpeg provides a variable bit rate (VBR), a powerful option that lets the encoder decide what bit rate to use depending on the complexity of some set of frames. In an action film, for example, the bit rate would be kept low during an expository conversation in a quiet restaurant wherein groups of pixels can be re-used for seconds at a time, but it would be boosted once the conversation inevitably turns into an explosive action scene.

Set to 1, -qscale tells the encoder to maintain excellent overall quality, whatever bit rate it takes. Conversely, setting -qscale to 31 tells the encoder to allow quality to suffer. Depending on what you choose between these two extremes, the file size savings may not be enough for your purposes. In that case, you might need to hard-code the bit rate, which turns out to be more of an art than a science.

While overall quality does depend on the relationship between frame size and bit rate (the larger the frame size, the higher the bit rate “wants” to be), there is no simple formula for the process, because bit rate is also bound to pixel activity within the frame.

A three-hour college lecture will look fine at a lower bit rate while a 90-minute action film would suffer under the same rate. This isn't only because of the difference in pixel activity; there's an artistic quality to defining “quality”. Your eye is more discerning of the action film; most people are a lot less forgiving of digital artifacts in their entertainment than in a boring lecture video.

So it comes back to knowing your video. Skim through your video and make some general classifications; is it fast-paced action video, or an educational lecture, or a travel video, a student film or a million-dollar blockbuster, or a family vacation? Let this guide you toward what range of bit rates to consider. It helps to think in familiar terms like below-DVD-quality, DVD-quality, or Blu-ray Quality. For below-DVD-quality, start in the high-hundreds for standard definition video or the low thousands for high definition video. For DVD-quality, start at 7000kbps or 8000kbps for standard definition and 12000kbps to 15000kbps for HD. For Blu-ray quality, look at 25000kbps to 35000kbps.

A command providing a high quality VBR encode with reduced file size:

$ ffmpeg -i sintel_1080p.avi -s hd720 -aspect 16:9 -qscale 1 sintel.mkv

Or in the case that you require more control over bit rate, you might specify a constant bit rate:

$ ffmpeg -i sintel_1080p.avi -s hd720 -aspect 16:9 -b:v 15000k sintel.mkv

Notice that in both examples the size is reduced from the original 1080p to 720p. This, combined with lowering the bitrates from the original video, compresses the file to a more reasonable file size.

If you were only transcoding and trying to achieve zero loss of quality, then a simpler and less specific command would do:

$ ffmpeg -i sintel_1080p.avi -qscale 1 sintel.mkv

2-pass Encoding

One way to help the encoder achieve the best possible quality at the best possible file size is to use 2-pass encoding. The first pass analyses your video and creates a log file which ffmpeg can use during the second pass when it actually encodes video and audio. A 2-pass encode doesn't mean your file will be any smaller, necessarily, but it does mean that the encoding will be better and possibly more efficient with budgeting bits.

To perform a 2-pass encode, the first pass must be exclusively analytical:

$ ffmpeg -i sintel.mkv -vcodec libxvid -an -pass 1 -f rawvideo -y /dev/null

This causes ffmpeg to write the new video to /dev/null (in other words, it throws the results out) while writing data about the frames to a log file whose name begins with ffmpeg2pass, saved in the current directory. Since audio is not accounted for during this pass, the -an flag tells ffmpeg to ignore the audio stream.

The second pass is performed like your usual ffmpeg command, with the addition of the -pass 2 flag and the name of the log file to which ffmpeg should refer: -passlogfile ffmpeg2pass.

$ ffmpeg -i sintel.mkv -vcodec libxvid -acodec libvorbis -r 18 -ar 44100 -b:a 80k -qscale 10 -s vga -pass 2 -passlogfile ffmpeg2pass sintel_small.mkv

Frame Rate

Any video has a native frame rate, depending on how it was recorded. As with everything else, you can find out a video's native frame rate with ffmpeg -i or a tool like mediainfo or vlc. Common values are 29.97 (NTSC) or 25 (PAL) for standard definition video, 23.98 or 24 for film-derived material (including many DVDs), and 50 or 60 for much high definition content. As you might expect, reducing the frame rate will reduce the resulting file size and increase the streamability of your video.

The trade-off with a reduced frame rate is mostly aesthetic; the motion is not as smooth as at the video's native frame rate. The more you reduce the frame rate, the more drastic and noticeable this becomes. Exactly when you or your audience actually start to notice is an entirely different matter and depends on how much movement there is in the frame in the first place, and on whether or not the viewer tends to notice things like that. Experiment with different frame rates to witness the practical difference between them.

In practice, lowering the frame rate does not reduce the file size as much as you might think (taking a 60 fps video down to 30 with no other change, for instance, does not cut the file size in half as you might expect). It's usually safe to leave the frame rate at its native value, or at 24 or so for high-frame-rate source files, unless your destination device or application demands a change.

Frame rate is controlled in ffmpeg with the -r flag:

$ ffmpeg -i foo.mkv -r 18 foo_18fps.mkv

Audio

Since most videos have sound, at least some portion of your overall file size is determined by how its audio has been encoded. The same principles apply to audio, with a few variations in terminology.

Audio has a bit rate, assigned with the -b:a flag, which determines how much data is used to recreate the audio waves. You're probably already familiar with this idea, since online music stores usually advertise their songs as either 128kbps or the higher quality 192kbps or 256kbps versions. The higher bit rates preserve more subtlety, the middle range (128kbps is a good middle-of-the-road number) provides “good enough” quality, and the lower rates start to noticeably sacrifice quality. Once again, how much this matters depends on you and on the content of the video itself. A lecture video can tolerate 80kbps encoding, while a video with a lush soundtrack would suffer.

Audio also has channels. As you might expect, the more channels you have, the larger the file will be. It's common practice to reduce a surround sound soundtrack to stereo, and in some cases to simply use one mono channel. ffmpeg uses the -ac flag to define how many channels you want, with 2 being stereo and 1 being mono.

The sample rate of audio defines how many samples of a sound wave are used per second, measured in samples per second (Hz). DVD quality is considered 48000 Hz, while CD quality is 44100 Hz. Anything lower (32000 Hz, 22050 Hz, 16000 Hz) suffers noticeably in quality, although the file size savings can be considerable. When transcoding, changing the sample rate of the audio drastically could throw the audio track out of sync with your video, so use this ability carefully. ffmpeg uses the -ar flag to define the sample rate.

Here is an example of defining the type of sound encoding during video compression:

$ ffmpeg -i sintel_1080p.avi -s hd720 -b:v 8000k -b:a 128k -ar 44100 -ac 1 sintel.mkv

Threads

If you're using a computer with multiple CPU cores, you can take advantage of the -threads flag. It's simple:

$ ffmpeg -i sintel.mkv -threads 8 sintel.mov
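
Many encoders also accept -threads 0, which asks ffmpeg to choose a thread count automatically (assuming the codec in use supports it):

$ ffmpeg -i sintel.mkv -threads 0 sintel.mov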

How to Test Before Encoding

Because a computer is doing the work, many people view video encoding as a strictly technical process. And certainly, the encoders we use deserve respect; no sane artist or programmer would want to decide how each individual frame should be encoded when there are 30 of them per second across a 90-minute movie.

On the other hand, there's a lot of artistry in looking at a video and making intelligent choices when issuing the encoding commands to the computer. Take into account what the video content is, how people will be watching it, what action is happening within the frames, and what qualities are important. Use these artistic impressions to guide you in the choices you make about frame size, bit rate, and frame rate. Encode with two passes, and encode multiple versions of the same video. Compare the results. In no time, you'll get a good feeling for what different codecs have to offer and what kinds of video can handle which kinds of compression.

It's not easy to test encoding when each encode takes six hours, only to be thrown out for another try. Luckily, ffmpeg accepts a start time and a duration, allowing you to encode small sections of a video.

The -ss option dictates what time to start encoding, and the -t dictates how long to encode for (not the timecode at which to stop, as a video editor would expect; the values are start time and duration, not in and out). For example, to start encoding at 3 minutes and 30 seconds into a video, and to encode for 1 minute:

$ ffmpeg -i sintel.mov -ss 00:03:30 -t 00:01:00 -threads 8 sintel.mkv
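
Building on that, a hedged sketch of a comparison run encodes the same one-minute excerpt at several bit rates side by side (the filenames and rates are hypothetical):

$ for rate in 2000k 4000k 8000k; do ffmpeg -i sintel.mov -ss 00:03:30 -t 00:01:00 -b:v $rate test_${rate}.mkv; done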

Run a few hundred encoding tests overnight, study the results, and you'll be an expert in no time.

See Also
mencoder