FFmpeg

Latest revision as of 06:01, 5 June 2025

FFmpeg (Fast Forward MPEG) is a library for encoding and decoding multimedia.

You can use FFmpeg through its command-line interface (CLI) or through its C API.
Many tasks that only involve decoding or encoding can be done by calling the CLI and piping data to its stdin or from its stdout.

CLI

You can download static builds of FFmpeg from:

Linux: https://johnvansickle.com/ffmpeg/
Windows: https://ffmpeg.zeranoe.com/builds/ (discontinued in September 2020; see https://ffmpeg.org/download.html for current builds)

If you need nvenc support, you can build FFmpeg with https://github.com/markus-perl/ffmpeg-build-script.

Basic usage is as follows:

ffmpeg [-ss start_second] -i input_file [-s resolution] [-b bitrate] [-t time] [-r output_framerate] output.mp4

Use -pattern_type glob for wildcards (e.g. all images in a folder)

x264

x264 is a software H.264 encoder. It does not decode; FFmpeg decodes H.264 with its own built-in decoder.
[1]

Changing Pixel Format

Encode to h264 with YUV420p pixel format

ffmpeg -i input.mp4 -c:v libx264 -profile:v high -pix_fmt yuv420p output.mp4

Images to Video

Reference
Assume the images were captured at 60 per second and you want a 30 fps video.

# Make sure -framerate is before -i
ffmpeg -framerate 60 -i image-%03d.png -r 30 video.mp4
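
As a rough check on how -framerate and -r interact, the arithmetic can be sketched in Python. The helper name output_frame_count is hypothetical, and the rounding only approximates how ffmpeg drops frames when the output rate is lower than the input rate.

```python
def output_frame_count(num_images, input_fps, output_fps):
    """Return (duration in seconds, approximate number of output frames)."""
    # -framerate decides how long the image sequence lasts.
    duration = num_images / input_fps
    # -r decides how many frames are kept for that duration.
    return duration, round(duration * output_fps)


duration, frames = output_frame_count(600, 60, 30)
print(f"{duration} s, about {frames} frames")
```

With 600 images read at 60 per second, the video lasts 10 seconds, so a 30 fps output keeps roughly every second image.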

Video to Images

Extracting frames from a video

ffmpeg -i video.mp4 frames/%d.png

Use -ss H:M:S before -i to specify where to start in the video
Use -vframes 1 to extract a single frame
Use -vf "select=not(mod(n\,10))" to select every 10th frame
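
If you launch ffmpeg from Python rather than a shell, the select expression still needs its escaped comma, because the backslash is consumed by the filtergraph parser and not by the shell. Below is a minimal sketch; the function name every_nth_frame_command is hypothetical, and it only builds the argument list without running anything.

```python
import shlex


def every_nth_frame_command(video, out_pattern, n):
    """Build an ffmpeg argument list that keeps every n-th frame."""
    # "\\," in Python source is the two characters "\," in the argument.
    # The backslash stops the comma from being read as a filter separator.
    select = f"select=not(mod(n\\,{n}))"
    return ["ffmpeg", "-i", video, "-vf", select, out_pattern]


cmd = every_nth_frame_command("video.mp4", "frames/%d.png", 10)
print(shlex.join(cmd))  # shell-quoted form, handy for debugging
```

Pass the list to subprocess.run(cmd, check=True) to actually run it.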

Get a list of encoders/decoders

Reference

# List only the NVIDIA-related encoders, decoders, and filters
for i in encoders decoders filters; do
    echo "$i:"; ffmpeg -hide_banner "-${i}" | grep -Ei "npp|cuvid|nvenc|cuda"
done

PSNR/SSIM

Reference
FFmpeg can compare two videos and output the PSNR or SSIM values for each of the Y, U, and V channels.

ffmpeg -i distorted.mp4 -i reference.mp4 \
       -lavfi "ssim;[0:v][1:v]psnr" -f null -

ffmpeg -i distorted.mp4 -i reference.mp4 -lavfi  psnr -f null -
ffmpeg -i distorted.mp4 -i reference.mp4 -lavfi  ssim -f null -
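
To interpret the PSNR numbers, it can help to see the textbook formula, PSNR = 10 * log10(MAX^2 / MSE), written out. The sketch below works on plain lists of 8-bit sample values; it illustrates the definition and is not FFmpeg's implementation.

```python
import math


def psnr(reference, distorted, max_value=255):
    """PSNR in dB between two equal-length sequences of sample values."""
    if len(reference) != len(distorted):
        raise ValueError("inputs must have the same number of samples")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10 * math.log10(max_value ** 2 / mse)


reference = [100] * 16
distorted = [110] + [100] * 15  # one sample off by 10, so MSE = 100 / 16
print(round(psnr(reference, distorted), 2))
```

Higher is better: here a single sample that is off by 10 out of 16 gives roughly 40 dB, and identical inputs give infinity.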

Generate Thumbnails

Reference
Below is a bash script that generates a thumbnail for every MP4 file in a folder. It uses the first frame at or after frame 300.

Script

#!/usr/bin/env bash

OUTPUT_FOLDER="thumbnails"

mkdir -p "$OUTPUT_FOLDER"
for file in *.mp4; do
  ffmpeg -i "$file" -vf "select=gte(n\,300)" -vframes 1 "$OUTPUT_FOLDER/${file%.mp4}.png"
done
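
The same loop can be driven from Python. This is a minimal sketch that only builds the command for one file; the function name thumbnail_command and its defaults are hypothetical and mirror the bash script above.

```python
from pathlib import Path


def thumbnail_command(video, output_folder="thumbnails", frame_index=300):
    """Build the ffmpeg argument list that writes one PNG thumbnail."""
    video = Path(video)
    # Same naming as the bash script: clip.mp4 -> thumbnails/clip.png
    target = Path(output_folder) / (video.stem + ".png")
    return [
        "ffmpeg", "-i", str(video),
        "-vf", f"select=gte(n\\,{frame_index})",  # first frame at or after frame_index
        "-vframes", "1",
        str(target),
    ]


print(thumbnail_command("clip.mp4"))
```

To process a folder, create the output folder first and call subprocess.run(thumbnail_command(path), check=True) for each path in sorted(Path(".").glob("*.mp4")).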

MP4 to GIF

Normally you can just do

ffmpeg -i my_video.mp4 my_video.gif

If you want better quality, you can use the following filter_complex:

[0]split=2[v1][v2];[v1]palettegen=stats_mode=full[palette];[v2][palette]paletteuse=dither=sierra2_4a

Here is another script from https://superuser.com/questions/556029/how-do-i-convert-a-video-to-gif-using-ffmpeg-with-reasonable-quality

mp4 to gif script

#!/bin/sh
ffmpeg -i "$1" -vf "fps=15,split[s0][s1];[s0]palettegen[p];[s1][p]paletteuse" -loop 0 "$2"

Pipe to stdout

Below is an example of piping the video only to stdout:

ffmpeg -i video.webm -pix_fmt rgb24 -f rawvideo -

In Python, you can read it as follows:

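import subprocess
import numpy as np

video_width = 1920
video_height = 1080
# The frame size must match the video; these values are only an example.
ffmpeg_command = ["ffmpeg", "-i", "video.webm",
                  "-pix_fmt", "rgb24", "-f", "rawvideo", "-"]
# Discard stderr: an unread stderr pipe can fill up and stall ffmpeg.
ffmpeg_process = subprocess.Popen(ffmpeg_command,
                                  stdout=subprocess.PIPE,
                                  stderr=subprocess.DEVNULL)
# Read exactly one rgb24 frame (3 bytes per pixel).
raw_image = ffmpeg_process.stdout.read(
              video_width * video_height * 3)
image = (np.frombuffer(raw_image, dtype=np.uint8)
           .reshape(video_height, video_width, 3))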
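
The snippet above reads a single frame. To read every frame, you can keep reading fixed-size chunks until the stream ends. Below is a minimal sketch; the generator name iter_raw_frames is hypothetical, and it works on any binary stream, so an in-memory buffer stands in for ffmpeg_process.stdout here.

```python
import io


def iter_raw_frames(stream, width, height, channels=3):
    """Yield one bytes object per complete frame read from a binary stream."""
    frame_size = width * height * channels
    while True:
        chunk = stream.read(frame_size)
        # A short read means the stream ended (possibly mid-frame).
        if len(chunk) < frame_size:
            break
        yield chunk


# Stand-in for ffmpeg_process.stdout: two 2x2 rgb24 frames (12 bytes each).
fake_stdout = io.BytesIO(bytes(range(24)))
frames = list(iter_raw_frames(fake_stdout, width=2, height=2))
print(len(frames))
```

Each yielded chunk can be passed to np.frombuffer and reshaped exactly as in the single-frame example.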

Filters

Filters are part of the CLI
https://ffmpeg.org/ffmpeg-filters.html

Crop

ffmpeg -i input_filename -vf  "crop=w:h:x:y" output_filename

Here x and y are the coordinates of the top-left corner of your crop. w and h are the width and height of the final image or video.
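
As an illustration of how the four values relate, the sketch below computes a crop string for a region centered in the input. The helper name centered_crop_filter is hypothetical.

```python
def centered_crop_filter(in_w, in_h, out_w, out_h):
    """Return a crop=w:h:x:y string for a region centered in the input."""
    if out_w > in_w or out_h > in_h:
        raise ValueError("crop region is larger than the input")
    # x and y locate the top-left corner of the crop region.
    x = (in_w - out_w) // 2
    y = (in_h - out_h) // 2
    return f"crop={out_w}:{out_h}:{x}:{y}"


print(centered_crop_filter(1920, 1080, 1280, 720))
```

For a 1920x1080 input and a 1280x720 crop this gives crop=1280:720:320:180.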

Resizing/Scaling

FFmpeg Scaling
scale filter

ffmpeg -i input.avi -vf scale=320:240 output.avi

ffmpeg -i input.jpg -vf scale=iw*2:ih input_double_width.png

If the aspect ratio is not what you expect, try using the setdar filter.
- E.g. setdar=ratio=2/1

Resizing with transparent padding

Useful for generating logos

ffmpeg -i icon.svg -vf "scale=h=128:w=128:force_original_aspect_ratio=decrease,pad=128:128:(ow-iw)/2:(oh-ih)/2:color=0x00000000" -y icon.png

More sizes

256

ffmpeg -i icon.svg -vf "scale=h=256:w=256:force_original_aspect_ratio=decrease,pad=256:256:(ow-iw)/2:(oh-ih)/2:color=0x00000000" -y icon.png

512

ffmpeg -i icon.svg -vf "scale=h=512:w=512:force_original_aspect_ratio=decrease,pad=512:512:(ow-iw)/2:(oh-ih)/2:color=0x00000000" -y icon.png

Rotation

transpose filter

To rotate 180 degrees

ffmpeg -i input.mp4 -vf "transpose=1,transpose=1" output.mp4

0 – Rotate by 90 degrees counter-clockwise and flip vertically.
1 – Rotate by 90 degrees clockwise.
2 – Rotate by 90 degrees counter-clockwise.
3 – Rotate by 90 degrees clockwise and flip vertically.
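
To see why two transpose=1 passes give a 180 degree rotation, and why value 0 is a plain matrix transpose, you can model the four values on a tiny image. The sketch below uses nested Python lists; it is a model of the geometry and not FFmpeg's code.

```python
def rotate_cw(image):
    """Rotate a row-major 2D list 90 degrees clockwise (like transpose=1)."""
    return [list(row) for row in zip(*image[::-1])]


def rotate_ccw(image):
    """Rotate 90 degrees counter-clockwise (like transpose=2)."""
    return [list(row) for row in zip(*image)][::-1]


def vflip(image):
    """Flip vertically: the top row becomes the bottom row."""
    return image[::-1]


image = [[1, 2, 3],
         [4, 5, 6]]

print(rotate_cw(rotate_cw(image)))  # 180 degree rotation
print(vflip(rotate_ccw(image)))     # transpose=0: rows and columns swapped
```

The first result is the image with both the row order and the column order reversed, which is a 180 degree rotation.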

360 Video

See v360 filter

Converting EAC to equirectangular

YouTube sometimes uses an EAC (equi-angular cubemap) format. You can convert this to the traditional equirectangular format as follows:

ffmpeg -i input.mp4 -vf "v360=eac:e" output.mp4

Sometimes you may run into errors where height or width is not divisible by 2.
Apply a scale filter to fix this issue.

ffmpeg -i input.mp4 -vf "v360=eac:e,scale=iw:-2" output.mp4

Converting to rectilinear

ffmpeg -i input.mp4 -vf "v360=e:rectilinear:h_fov=90:v_fov=90" output.mp4

Metadata

To add 360 video metadata, you should use Google's spatial-media. This will add the following side data, which you can see using ffprobe:

Side data:
 spherical: equirectangular (0.000000/0.000000/0.000000)

Removing Duplicate Frames

Reference
mpdecimate filter

Useful for extracting frames from timelapses.

ffmpeg -i input.mp4 -vf mpdecimate,setpts=N/FRAME_RATE/TB out.mp4

Stack and Unstack

To stack, see hstack, vstack.
To unstack, see crop.

Filter-Complex

Filter complex allows you to create a graph of filters.

Suppose you have 3 inputs: $1, $2, $3.
Then you can access them as streams [0], [1], [2].
The filter syntax lets you chain multiple filters, where each filter is an edge and each labeled stream is a vertex.
For example, [0]split[t1][t2] creates two vertices t1 and t2 from input 0. The output of the last filter, if left unlabeled, becomes the output of your command:
E.g. [t1][t2]vstack

ffmpeg -i $1 -i $2 -i $3 -filter_complex "[0]split[t1][t2];[t1][t2]vstack" output.mkv -y
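
When graphs get longer, it can be easier to assemble the string programmatically. Below is a minimal sketch that rebuilds the graph above; the helper name filter_chain is hypothetical, and it does no validation of the filter names.

```python
def filter_chain(inputs, filter_spec, outputs=()):
    """Format one filter as '[in1][in2]filter[out1][out2]'."""
    pads_in = "".join(f"[{name}]" for name in inputs)
    pads_out = "".join(f"[{name}]" for name in outputs)
    return f"{pads_in}{filter_spec}{pads_out}"


# Filters are joined with ';'. The last one has no output labels,
# so its result goes to the output file.
graph = ";".join([
    filter_chain(["0"], "split", ["t1", "t2"]),
    filter_chain(["t1", "t2"], "vstack"),
])
print(graph)
```

The resulting string can be passed as the argument that follows -filter_complex.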

Concatenate Videos

ffmpeg -i part_1.mp4 \
    -i part_2.mp4 \
    -i part_3.mp4 \
    -filter_complex \
    "[0]scale=1920:1080[0s];\
     [1]scale=1920:1080[1s];\
     [2]scale=1920:1080[2s];\
     [0s][0:a][1s][1:a][2s][2:a]concat=n=3:v=1:a=1[v][a]" \
    -map "[v]" -map "[a]" \
    -vsync 2 \
    all_parts.mp4 -y
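
The graph above follows a regular pattern, so for an arbitrary number of parts it can be generated. Below is a minimal sketch, assuming every input has exactly one video and one audio stream; the function name concat_filter_graph is hypothetical.

```python
def concat_filter_graph(num_inputs, width=1920, height=1080):
    """Build a filter_complex string that scales and concatenates inputs."""
    # One scale filter per input so every segment has the same resolution.
    scales = [f"[{i}]scale={width}:{height}[{i}s]" for i in range(num_inputs)]
    # concat expects its inputs interleaved: video 0, audio 0, video 1, ...
    pairs = "".join(f"[{i}s][{i}:a]" for i in range(num_inputs))
    concat = f"{pairs}concat=n={num_inputs}:v=1:a=1[v][a]"
    return ";".join(scales + [concat])


print(concat_filter_graph(3))
```

For three inputs this produces the same graph as the command above, without the line breaks.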

Replace transparency

Reference
Add a background to transparent images.

ffmpeg -i in.mov -filter_complex "[0]format=pix_fmts=yuva420p,split=2[bg][fg];[bg]drawbox=c=white@1:replace=1:t=fill[bg];[bg][fg]overlay=format=auto" -c:a copy new.mov

Draw Text

https://stackoverflow.com/questions/15364861/frame-number-overlay-with-ffmpeg

ffmpeg -i input -vf "drawtext=fontfile=Arial.ttf: text='%{frame_num}': start_number=1: x=(w-tw)/2: y=h-(2*lh): fontcolor=black: fontsize=20: box=1: boxcolor=white: boxborderw=5" -c:a copy output

C API

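A Doxygen reference manual for the C API is available at [2].
Note that FFmpeg is licensed under the LGPL v2.1 or later by default; builds that enable GPL components such as libx264 (--enable-gpl) are covered by the GPL.
If you only need to do encoding and decoding, you can simply pipe the inputs and outputs of the FFmpeg CLI to your program [3].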

Getting Started

The best way to get started is to look at the official examples.

Structs

AVInputFormat/AVOutputFormat Represents a container type.
AVFormatContext Represents your specific container.
AVStream Represents a single audio, video, or data stream in your container.
AVCodec Represents a single codec (e.g. H.264)
AVCodecContext Represents your specific codec and contains all associated parameters (e.g. resolution, bitrate, fps).
AVPacket Compressed Data.
AVFrame Decoded audio or video data.
SwsContext Used for image scaling and colorspace and pixel format conversion operations.

Pixel Formats

Reference
Pixel formats are stored as AVPixelFormat enums.
Below are descriptions for a few common pixel formats.
Note that the exact sizes of buffers may vary depending on alignment.

AV_PIX_FMT_RGB24

This is your standard 24 bits per pixel RGB.
In your AVFrame, data[0] will contain your single buffer RGBRGBRGB.
The linesize is typically $\displaystyle 3 \times width$ bytes per row, with $\displaystyle 3$ bytes per pixel.

AV_PIX_FMT_YUV420P

This is a planar YUV pixel format with chroma subsampling.
Each pixel will have its own luma component (Y), but each $\displaystyle 2 \times 2$ block of pixels will share chrominance components (U, V).
In your AVFrame, data[0] will contain your Y plane, data[1] your U plane, and data[2] your V plane.
data[0] will typically be $\displaystyle width \times height$ bytes.
data[1] and data[2] will typically be $\displaystyle width \times height / 4$ bytes each.
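
The sizes above can be checked with a little arithmetic. The sketch below ignores linesize padding, so real AVFrame buffers may be larger; the function names are hypothetical.

```python
def yuv420p_plane_sizes(width, height):
    """Return (Y, U, V) plane sizes in bytes for 8-bit YUV420P, no padding."""
    # Chroma is subsampled by 2 in each direction; odd sizes round up.
    chroma_w = (width + 1) // 2
    chroma_h = (height + 1) // 2
    return width * height, chroma_w * chroma_h, chroma_w * chroma_h


def rgb24_size(width, height):
    """Return the size in bytes of a packed RGB24 image, no padding."""
    return 3 * width * height


y, u, v = yuv420p_plane_sizes(1920, 1080)
print(y, u, v, y + u + v)
print(rgb24_size(1920, 1080))
```

A 1920x1080 YUV420P frame therefore needs 1.5 bytes per pixel in total, half of what RGB24 needs.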

Muxing to memory

You can specify a custom AVIOContext and attach it to your AVFormatContext->pb to mux directly to memory or to implement your own buffering.

NVENC

Options Reference

When encoding using NVENC, your codec_ctx->priv_data is a pointer to a NvencContext.

To list all of the things you can set in the private data, you can type the following in bash

ffmpeg -hide_banner -h encoder=h264_nvenc

NVENC Codec Ctx

  if ((ret = av_hwdevice_ctx_create(&hw_device_ctx, AV_HWDEVICE_TYPE_CUDA, NULL,
                                    NULL, 0)) < 0) {
    cerr << "[VideoEncoder::VideoEncoder] Failed to create hw context" << endl;
    return;
  }

  if (!(codec = avcodec_find_encoder_by_name("h264_nvenc"))) {
    cerr << "[VideoEncoder::VideoEncoder] Failed to find h264_nvenc encoder"
         << endl;
    return;
  }
  codec_ctx = avcodec_alloc_context3(codec);
  codec_ctx->bit_rate = 2500000;
  codec_ctx->width = source_codec_ctx->width;
  codec_ctx->height = source_codec_ctx->height;
  codec_ctx->codec_type = AVMEDIA_TYPE_VIDEO;
  codec_ctx->time_base = source_codec_ctx->time_base;
  input_timebase = source_codec_ctx->time_base;
  codec_ctx->framerate = source_codec_ctx->framerate;
  codec_ctx->pix_fmt = AV_PIX_FMT_CUDA;
  codec_ctx->profile = FF_PROFILE_H264_CONSTRAINED_BASELINE;
  codec_ctx->max_b_frames = 0;
  codec_ctx->delay = 0;
  codec_ctx->gop_size = 0;
  // TODO: figure out which of these do nothing
  av_opt_set(codec_ctx->priv_data, "cq", "23", AV_OPT_SEARCH_CHILDREN);
  av_opt_set(codec_ctx->priv_data, "preset", "llhp", 0);
  av_opt_set(codec_ctx->priv_data, "tune", "zerolatency", 0);
  av_opt_set(codec_ctx->priv_data, "look_ahead", "0", 0);
  av_opt_set(codec_ctx->priv_data, "zerolatency", "1", 0);
  av_opt_set(codec_ctx->priv_data, "nb_surfaces", "0", 0);

C++ API

FFmpeg does not have an official C++ API.
There are wrappers such as Raveler/ffmpeg-cpp which you can use.
However, I recommend just using the C API and wrapping things in smart pointers.

Python API

You can try PyAV, which contains bindings for the library. However, I haven't tried it.
If you just need to call the CLI, you can use ffmpeg-python to help build calls.

JavaScript API

To use FFmpeg in a browser, see ffmpegwasm.
This is used in https://davidl.me/apps/media/index.html.

My Preferences

My preferences for encoding video

AV1

Prefer AV1 for encoding video on modern devices.

H265/HEVC

H265/HEVC is now a good tradeoff between size, quality, and compatibility. It has been supported on Android devices since Android 5.0 (2014).

ffmpeg -i "$1" -c:v libx265 -crf 23 -preset slow -pix_fmt yuv444p10le -c:a libopus -b:a 128K "$2"

Notes

The pixel format yuv444p10le is 10 bit color without chroma subsampling. If your source is lower, you can use yuv420p instead for 8-bit color and 4:2:0 chroma subsampling.

H264

Use H264 if you need compatibility with very old or low-end devices.

ffmpeg -i "$1" -c:v libx264 -crf 28 -preset medium -pix_fmt yuv420p -c:a libfdk_aac -b:a 128K "$2"

Opus

For streaming:

ffmpeg -i input.wav -c:a libopus -b:a 96k output.opus

See https://wiki.xiph.org/Opus_Recommended_Settings