Although the ffmpeg (and avconv) program has a relatively intuitive command-line interface, given the diversity and complexity of the functionality that it exposes, there are still many operations which can be difficult to express. I found letterboxing (and pillarboxing) to be one of those operations, so in order to save others the trouble of working out the details, this post will develop a command for doing boxing with ffmpeg/avconv.
A Quick Note on Terminology
For the unfamiliar, ffmpeg is both a command and the name of the project (more properly written FFmpeg) which developed the ffmpeg command as well as a significant amount of other multimedia software. The avconv program is a fork of ffmpeg by the Libav project. The relationship between the two projects is a bit complex (see this StackOverflow question and linked pages for some details), but all commands in this post should work with either ffmpeg or avconv. Feel free to use whichever you prefer.
The process that this post is attempting to simplify can be either letterboxing, adding horizontal bars to an image, or pillarboxing, adding vertical bars to an image. From this point forward, either process will simply be referred to as “boxing”.
Objective
Why would someone want to box video? My particular motivation is to convert video for use on mobile devices, which often require video to have particular resolutions in order to take advantage of hardware acceleration. When the aspect ratio of the video does not match the ratio of the desired resolution it’s necessary to either box or stretch the video. So, I’m boxing it. More generally, any time the aspect of the source video does not match the aspect of the desired output, boxing may be necessary.
Filtering the Video
Ffmpeg/avconv supports a number of different filters for performing video manipulation, along with a formula evaluation syntax for configuring them. This allows users to write arithmetic formulas, using predefined constants such as the input width and height, so that the behavior of a filter adapts to the multimedia input.
Boxing the video will require both the scale and pad filters: scale to fit the video into the target resolution and pad to add the bars. First, scaling the video requires calculating the output resolution of the video without any boxing. The most desirable output resolution in this case is one which preserves the source aspect ratio (so the video is not stretched) and fills the screen either horizontally or vertically. This can be done by calculating a scale factor and applying it equally to both dimensions, as follows:
scale=iw*min($MAX_WIDTH/iw\,$MAX_HEIGHT/ih):ih*min($MAX_WIDTH/iw\,$MAX_HEIGHT/ih)
Note that $MAX_WIDTH and $MAX_HEIGHT should be replaced with the desired output width and height, or set as variables in a shell script. Also note that the scale factor (the min function) is calculated twice. It could probably be stored in a variable and reused, but I am not sure of the correct syntax. Finally, note that the backslash before each comma is required because commas normally separate filters, whereas here they separate function arguments.
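If the command lives in a shell script, one way to avoid typing the factor twice is to let the shell do the reuse: keep the expression in a shell variable and expand it in both places. This is only a sketch of that idea, not a feature of the filter syntax itself. The names FACTOR and SCALE_FILTER are made up for this example, and 640x480 is an arbitrary target.

```shell
#!/bin/sh
# Target box; 640x480 is only an illustrative value.
MAX_WIDTH=640
MAX_HEIGHT=480

# The scale factor expression, written once. Inside double quotes the shell
# leaves "\," untouched, so the filter parser still sees the escaped comma.
FACTOR="min(${MAX_WIDTH}/iw\,${MAX_HEIGHT}/ih)"

# Expand the same factor for both the width and the height.
SCALE_FILTER="scale=iw*${FACTOR}:ih*${FACTOR}"

printf '%s\n' "$SCALE_FILTER"
```

The printed string is the same scale expression shown above with the numbers filled in, and it can be passed to -vf in double quotes.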
Now that the video has been scaled to fit the desired output resolution, the pad filter can be used to add the appropriate bars. pad requires the output width and height as well as the offset at which the input should be placed within that output and, optionally, a color. Adding equal-sized bars simply requires that the offset be half of the size difference between the output and the input in each dimension, as follows:
pad=$MAX_WIDTH:$MAX_HEIGHT:(ow-iw)/2:(oh-ih)/2
Again, $MAX_WIDTH and $MAX_HEIGHT should be replaced with the desired output width and height, or set as variables in a shell script.
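To make the arithmetic concrete, here is a rough sketch in plain shell of the numbers the two filters work out for a given input. The box_geometry function is a made-up helper for illustration, not part of ffmpeg or avconv. It uses integer arithmetic and truncates, so its results could differ by a pixel from what the filters compute internally.

```shell
#!/bin/sh
# box_geometry IN_W IN_H MAX_W MAX_H
# Prints WxH+X+Y: the scaled size followed by the pad offsets.
# Hypothetical helper; integer arithmetic only, results are truncated.
box_geometry() {
    in_w=$1 in_h=$2 max_w=$3 max_h=$4

    # Compare MAX_W/IN_W with MAX_H/IN_H by cross-multiplying,
    # which picks the smaller scale factor without using fractions.
    if [ $((max_w * in_h)) -le $((max_h * in_w)) ]; then
        # Width is the limiting dimension.
        out_w=$max_w
        out_h=$((in_h * max_w / in_w))
    else
        # Height is the limiting dimension.
        out_h=$max_h
        out_w=$((in_w * max_h / in_h))
    fi

    # Same idea as (ow-iw)/2 and (oh-ih)/2 in the pad filter.
    pad_x=$(( (max_w - out_w) / 2 ))
    pad_y=$(( (max_h - out_h) / 2 ))

    echo "${out_w}x${out_h}+${pad_x}+${pad_y}"
}

box_geometry 1920 800 640 480   # wide source, letterboxed: 640x266+0+107
box_geometry 480 640 640 480    # tall source, pillarboxed: 360x480+140+0
```

In the first case the width fills the 640x480 box and 107 pixels of bar go above the picture (the remainder goes below); in the second the height fills the box and 140-pixel bars go on each side.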
Dealing with Anamorphic Video (Advanced)
It’s possible that the input video is intended to be displayed at a resolution which has a different aspect ratio than the stored frame, which is called an anamorphic format. I have not had success playing anamorphic video on mobile devices (probably in part because I don’t really understand it and in part because it is rather esoteric and poorly supported), so since the video is being scaled anyway this is a great time to get rid of the anamorphism and make the pixels square. All that this requires is to take the Sample Aspect Ratio (SAR), the width-to-height ratio of a single pixel, into account in the scaling calculation:
scale=iw*sar*min($MAX_WIDTH/(iw*sar)\,$MAX_HEIGHT/ih):ih*min($MAX_WIDTH/(iw*sar)\,$MAX_HEIGHT/ih)
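As a worked example of what the sar term changes, the sketch below redoes the fitting calculation in plain shell with the SAR given as a fraction. The fit_anamorphic function is a hypothetical helper written for this post, not something ffmpeg or avconv provides, and it truncates to whole pixels. The sample input, a 720x480 frame with a 32:27 SAR, is chosen because it displays as 16:9.

```shell
#!/bin/sh
# fit_anamorphic IN_W IN_H SAR_NUM SAR_DEN MAX_W MAX_H
# Prints the square-pixel output size WxH that fits inside MAX_W x MAX_H.
# Hypothetical helper; integer arithmetic only, results are truncated.
fit_anamorphic() {
    in_w=$1 in_h=$2 sar_num=$3 sar_den=$4 max_w=$5 max_h=$6

    # The displayed width is in_w * sar_num / sar_den. Keep it as the
    # fraction disp_num / sar_den so nothing is truncated early.
    disp_num=$((in_w * sar_num))

    # Cross-multiplied form of: MAX_W / displayed width <= MAX_H / IN_H
    if [ $((max_w * in_h * sar_den)) -le $((max_h * disp_num)) ]; then
        # Width is the limiting dimension.
        out_w=$max_w
        out_h=$((in_h * max_w * sar_den / disp_num))
    else
        # Height is the limiting dimension.
        out_h=$max_h
        out_w=$((disp_num * max_h / (in_h * sar_den)))
    fi

    echo "${out_w}x${out_h}"
}

fit_anamorphic 720 480 32 27 640 480   # prints 640x360
```

Without the SAR the same 720x480 frame would be treated as 3:2 and scaled to 640x426, which would leave the picture horizontally squeezed.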
Putting it Together
With the filters defined above, all that is required is to put them together into a complete command. It is possible to produce H.264 video with AAC audio by using the following command:
avconv \
-i "$INPUT_FILE" \
-map 0 \
-vf "scale=iw*sar*min($MAX_WIDTH/(iw*sar)\,$MAX_HEIGHT/ih):ih*min($MAX_WIDTH/(iw*sar)\,$MAX_HEIGHT/ih),pad=$MAX_WIDTH:$MAX_HEIGHT:(ow-iw)/2:(oh-ih)/2" \
-c:v libx264 \
-vprofile baseline -level 30 \
-c:a libvo_aacenc \
"$OUTPUT_FILE"
Simply replace $MAX_WIDTH, $MAX_HEIGHT, $INPUT_FILE, and $OUTPUT_FILE (or define them as environment variables) as desired. That’s it.