FFmpeg
FFmpeg (Fast Forward MPEG) is a library for encoding and decoding multimedia.
You can interact with FFmpeg through its command-line interface or its C API.
I find it useful for converting videos to GIFs. You can also extract a video into a sequence of images or vice versa.
CLI
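You can download static builds of FFmpeg from:
- Linux: https://johnvansickle.com/ffmpeg/
- Windows: https://ffmpeg.zeranoe.com/builds/ (this site shut down in September 2020; the official FFmpeg download page lists current Windows builds)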
Basic usage is as follows:
ffmpeg [-ss start_second] -i input_file [-s resolution] [-b:v bitrate] [-t time] [-r output_framerate] output.mp4
- Use -pattern_type glob for wildcards (e.g. all images in a folder)
x264
x264 is a software H.264 encoder. [1]
FFmpeg uses it through the libx264 encoder; decoding is handled by FFmpeg's own H.264 decoder.
Changing Pixel Format
Encode to h264 with YUV420p pixel format
ffmpeg -i input.mp4 -c:v libx264 -profile:v high -pix_fmt yuv420p output.mp4
Images to Video
Reference
Assuming the images were captured at 60 images per second and you want a 30 fps video:
# Make sure -framerate is before -i
ffmpeg -framerate 60 -i image-%03d.png -r 30 video.mp4
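As a rough check of what the command above produces, the sketch below estimates the clip length and output frame count from the number of images. The function name sequence_stats is hypothetical, and the integer arithmetic is only an approximation of what FFmpeg does when it drops frames to go from the input rate to the output rate.

```shell
#!/bin/sh
# Estimate clip length and output frame count for an image sequence.
# Arguments: number of images, input rate (-framerate), output rate (-r).
sequence_stats() {
    images=$1
    in_fps=$2
    out_fps=$3
    # The duration is fixed by the input rate: images / in_fps seconds.
    duration=$((images / in_fps))
    # The output keeps roughly out_fps frames for each second of video.
    out_frames=$((images * out_fps / in_fps))
    echo "duration=${duration}s frames=${out_frames}"
}

sequence_stats 600 60 30   # prints: duration=10s frames=300
```

So 600 images read at 60 images per second give a 10 second clip, and writing it at 30 fps keeps about half of the frames.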
Video to Images
Extracting frames from a video
ffmpeg -i video.mp4 frames/%d.png
- Put -ss H:M:S before -i to specify where to start.
- Use -vframes 1 to extract one frame.
- Use -vf "select=not(mod(n\,10))" to select every 10th frame. Add -vsync vfr so the dropped frames are not filled in with duplicates.
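To see which frames the select expression keeps, the sketch below evaluates the same condition, not(mod(n, step)), over the frame indices in plain shell. The function name selected_frames is hypothetical; it only mirrors the arithmetic and does not call FFmpeg.

```shell
#!/bin/sh
# List the frame indices kept by select=not(mod(n\,step)).
# Arguments: total number of frames, step.
selected_frames() {
    total=$1
    step=$2
    n=0
    out=""
    while [ "$n" -lt "$total" ]; do
        # mod(n, step) is 0 for multiples of step, so not(...) is true there.
        if [ $((n % step)) -eq 0 ]; then
            out="$out $n"
        fi
        n=$((n + 1))
    done
    # Drop the leading space before printing.
    echo "${out# }"
}

selected_frames 35 10   # prints: 0 10 20 30
```

Frame numbering starts at 0, so the first frame is always kept.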
Crop
ffmpeg -i input_filename -vf "crop=w:h:x:y" output_filename
- Here x and y are the coordinates of the top-left corner of your crop. w and h are the width and height of the final image or video.
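As an illustration of the crop parameters, the sketch below builds the filter string for a crop centered in the input. The function name center_crop_filter is hypothetical, and the input size has to be known in advance (for example from ffprobe).

```shell
#!/bin/sh
# Build a crop filter string for a centered crop.
# Arguments: input width, input height, crop width, crop height.
center_crop_filter() {
    in_w=$1
    in_h=$2
    out_w=$3
    out_h=$4
    # The top-left corner sits half of the leftover space in from each edge.
    x=$(( (in_w - out_w) / 2 ))
    y=$(( (in_h - out_h) / 2 ))
    echo "crop=${out_w}:${out_h}:${x}:${y}"
}

center_crop_filter 1920 1080 1280 720   # prints: crop=1280:720:320:180
```

The result can be passed directly to -vf. The crop filter also centers the crop by default when x and y are omitted, so the explicit offsets matter mostly for off-center crops.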
Get a list of encoders/decoders
# List NVIDIA GPU-related encoders, decoders, and filters
for i in encoders decoders filters; do
  echo "$i:"; ffmpeg -hide_banner "-$i" | grep -Ei "npp|cuvid|nvenc|cuda"
done
PSNR/SSIM
Reference
FFmpeg can compare two videos and output the PSNR or SSIM values for each of the Y, U, and V channels.
ffmpeg -i distorted.mp4 -i reference.mp4 \
  -lavfi "ssim;[0:v][1:v]psnr" -f null -
ffmpeg -i distorted.mp4 -i reference.mp4 -lavfi psnr -f null -
ffmpeg -i distorted.mp4 -i reference.mp4 -lavfi ssim -f null -
Generate Thumbnails
Reference
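Below is a bash script that generates a thumbnail for every MP4 video in a folder. It saves frame 300 of each video as a PNG.
#!/usr/bin/env bash
OUTPUT_FOLDER="thumbnails"
mkdir -p "$OUTPUT_FOLDER"
for file in *.mp4; do
  # select=gte(n\,300) passes frames from index 300 onward; -vframes 1 keeps only the first of them
  ffmpeg -i "$file" -vf "select=gte(n\,300)" -vframes 1 "$OUTPUT_FOLDER/${file%.mp4}.png"
done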
MP4 to GIF
Normally you can just do
ffmpeg -i my_video.mp4 my_video.gif
However, Ruofei has a more advanced script below:
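#!/bin/sh
# Usage: pass the video name without the .mp4 extension as $1
start_time=0:0
duration=17
palette="/tmp/palette.png"
filters="fps=15,scale=320:-1:flags=lanczos"
# Pass 1: generate a color palette for the clip
ffmpeg -v warning -ss "$start_time" -t "$duration" -i "$1.mp4" -vf "$filters,palettegen" -y "$palette"
# Pass 2: encode the GIF using that palette
ffmpeg -v warning -ss "$start_time" -t "$duration" -i "$1.mp4" -i "$palette" -lavfi "$filters [x]; [x][1:v] paletteuse" -y "$1.gif"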
Here is another script from https://superuser.com/questions/556029/how-do-i-convert-a-video-to-gif-using-ffmpeg-with-reasonable-quality
#!/bin/sh
ffmpeg -i "$1" -vf "fps=15,split[s0][s1];[s0]palettegen[p];[s1][p]paletteuse" -loop 0 "$2"
Filters
Resizing/Scaling
ffmpeg -i input.avi -vf scale=320:240 output.avi
ffmpeg -i input.jpg -vf scale=iw*2:ih input_double_width.png
- If the aspect ratio is not what you expect, try using the setdar filter.
  - E.g. setdar=ratio=2/1
- Resizing with transparent padding
Useful for generating logos
ffmpeg -i icon.svg -vf "scale=h=128:w=128:force_original_aspect_ratio=decrease,pad=128:128:(ow-iw)/2:(oh-ih)/2:color=0x00000000" -y icon.png
- 256
ffmpeg -i icon.svg -vf "scale=h=256:w=256:force_original_aspect_ratio=decrease,pad=256:256:(ow-iw)/2:(oh-ih)/2:color=0x00000000" -y icon.png
- 512
ffmpeg -i icon.svg -vf "scale=h=512:w=512:force_original_aspect_ratio=decrease,pad=512:512:(ow-iw)/2:(oh-ih)/2:color=0x00000000" -y icon.png
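To make the scale and pad arguments above more concrete, the sketch below computes the size an image is scaled to with force_original_aspect_ratio=decrease and the offsets that (ow-iw)/2 and (oh-ih)/2 evaluate to. The function name fit_and_pad is hypothetical, and the integer division is an assumption, so FFmpeg's own rounding may differ by a pixel for awkward aspect ratios.

```shell
#!/bin/sh
# Compute the scaled size and padding offsets for fitting an image
# into a square box. Arguments: input width, input height, box size.
fit_and_pad() {
    in_w=$1
    in_h=$2
    box=$3
    # Fit the longer side to the box and scale the other side proportionally.
    if [ "$in_w" -ge "$in_h" ]; then
        out_w=$box
        out_h=$((in_h * box / in_w))
    else
        out_h=$box
        out_w=$((in_w * box / in_h))
    fi
    # Center the scaled image: half of the leftover space on each side.
    pad_x=$(( (box - out_w) / 2 ))
    pad_y=$(( (box - out_h) / 2 ))
    echo "scaled=${out_w}x${out_h} pad_offset=${pad_x},${pad_y}"
}

fit_and_pad 400 200 128   # prints: scaled=128x64 pad_offset=0,32
```

A 400x200 logo therefore ends up as a 128x64 image with 32 transparent rows above and below it.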
Rotation
To rotate 180 degrees
ffmpeg -i input.mp4 -vf "transpose=1,transpose=1" output.mp4
The transpose filter accepts the following values:
- 0 – Rotate by 90 degrees counter-clockwise and flip vertically.
- 1 – Rotate by 90 degrees clockwise.
- 2 – Rotate by 90 degrees counter-clockwise.
- 3 – Rotate by 90 degrees clockwise and flip vertically.
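The transpose values can be combined into the common clockwise rotations. The sketch below maps an angle to a filter string; the function name rotation_filter is hypothetical and only handles 90, 180, and 270 degrees.

```shell
#!/bin/sh
# Map a clockwise rotation angle to a transpose filter chain.
rotation_filter() {
    case "$1" in
        90)  echo "transpose=1" ;;               # 90 degrees clockwise
        180) echo "transpose=1,transpose=1" ;;   # two clockwise quarter turns
        270) echo "transpose=2" ;;               # same as 90 counter-clockwise
        *)   echo "unsupported angle: $1" >&2; return 1 ;;
    esac
}

rotation_filter 180   # prints: transpose=1,transpose=1
```

The output can be used as the argument to -vf, e.g. ffmpeg -i input.mp4 -vf "$(rotation_filter 270)" output.mp4.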
360 Video
See v360 filter
Converting EAC to equirectangular
YouTube sometimes uses the EAC (equi-angular cubemap) format. You can convert this to the traditional equirectangular format as follows:
ffmpeg -i input.mp4 -vf "v360=eac:e" output.mp4
Sometimes you may run into errors where height or width is not divisible by 2.
Apply a scale filter to fix this issue.
ffmpeg -i input.mp4 -vf "v360=eac:e,scale=iw:-2" output.mp4
Converting to rectilinear
ffmpeg -i input.mp4 -vf "v360=e:rectilinear:h_fov=90:v_fov=90" output.mp4
Metadata
To add 360 video metadata, you should use Google's spatial-media.
This will add the following side data, which you can see using ffprobe:
Side data: spherical: equirectangular (0.000000/0.000000/0.000000)
Removing Duplicate Frames
Useful for extracting frames from timelapses.
ffmpeg -i input.mp4 -vf mpdecimate,setpts=N/FRAME_RATE/TB out.mp4
Stack and Unstack
To stack, see hstack, vstack.
To unstack, see crop.
Filter-Complex
Filter complex (-filter_complex) allows you to create a graph of filters.
Suppose you have 3 inputs: $1, $2, $3.
Then you can access them as streams [0], [1], [2].
The filter syntax allows you to chain multiple filters where each filter is an edge.
For example, [0]split[t1][t2] creates two vertices t1 and t2 from input 0.
The unlabeled output of the last filter in your graph will be the output of your command:
E.g. [t1][t2]vstack
ffmpeg -i "$1" -i "$2" -i "$3" -filter_complex "[0]split[t1][t2];[t1][t2]vstack" output.mkv -y
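Filter graphs are plain strings, so they can be generated. As a minimal sketch, the hypothetical function stack_graph below builds a graph that stacks N inputs vertically, using the input labels [0], [1], ... and the inputs option of vstack. It assumes all inputs have the same width, which vstack requires.

```shell
#!/bin/sh
# Build a filter graph that vertically stacks the first N inputs.
stack_graph() {
    count=$1
    graph=""
    i=0
    while [ "$i" -lt "$count" ]; do
        # Inputs are numbered from 0 in the order of the -i options.
        graph="${graph}[${i}]"
        i=$((i + 1))
    done
    echo "${graph}vstack=inputs=${count}"
}

stack_graph 3   # prints: [0][1][2]vstack=inputs=3
```

It could be used as ffmpeg -i a.mp4 -i b.mp4 -i c.mp4 -filter_complex "$(stack_graph 3)" output.mkv, where the file names are placeholders.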
Concatenate Videos
ffmpeg -i part_1.mp4 \
  -i part_2.mp4 \
  -i part_3.mp4 \
  -filter_complex \
  "[0]scale=1920:1080[0s];\
  [1]scale=1920:1080[1s];\
  [2]scale=1920:1080[2s];\
  [0s][0:a][1s][1:a][2s][2:a]concat=n=3:v=1:a=1[v][a]" \
  -map "[v]" -map "[a]" \
  -vsync 2 \
  all_parts.mp4 -y
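The graph in the command above follows a regular pattern: one scale filter per input, then a concat filter that takes each scaled video stream together with the audio stream of the same input. The sketch below generates that string for any number of parts. The function name concat_graph is hypothetical, and it assumes every input has exactly one video and one audio stream.

```shell
#!/bin/sh
# Build a scale + concat filter graph for N inputs.
# Arguments: number of inputs, target size as width:height.
concat_graph() {
    count=$1
    size=$2
    scales=""
    pairs=""
    i=0
    while [ "$i" -lt "$count" ]; do
        # Scale input i and label the result [is], e.g. [0s].
        scales="${scales}[${i}]scale=${size}[${i}s];"
        # concat expects the streams interleaved: video, audio, video, audio, ...
        pairs="${pairs}[${i}s][${i}:a]"
        i=$((i + 1))
    done
    echo "${scales}${pairs}concat=n=${count}:v=1:a=1[v][a]"
}

concat_graph 2 1920:1080
# prints: [0]scale=1920:1080[0s];[1]scale=1920:1080[1s];[0s][0:a][1s][1:a]concat=n=2:v=1:a=1[v][a]
```

The generated string goes after -filter_complex, and the [v] and [a] labels are then selected with -map as in the command above.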
C API
A Doxygen reference manual for the C API is available at [2].
Getting Started
Structs
- AVInputFormat/AVOutputFormat: Represents a container type.
- AVFormatContext: Represents your specific container.
- AVStream: Represents a single audio, video, or data stream in your container.
- AVCodec: Represents a single codec (e.g. H.264).
- AVCodecContext: Represents your specific codec and contains all associated parameters (e.g. resolution, bitrate, fps).
- AVPacket: Compressed data.
- AVFrame: Decoded audio or video data.
- SwsContext: Used for image scaling, colorspace conversion, and pixel format conversion.
Pixel Formats
Reference
Pixel formats are stored as AVPixelFormat enums.
Below are descriptions for a few common pixel formats.
Note that the exact sizes of buffers may vary depending on alignment.
- AV_PIX_FMT_RGB24
- This is your standard 24 bits per pixel RGB.
- In your AVFrame, data[0] will contain your single buffer RGBRGBRGB.
- Where the linesize is typically \(\displaystyle 3 * width\) bytes per row and \(\displaystyle 3\) bytes per pixel.
- AV_PIX_FMT_YUV420P
- This is a planar YUV pixel format with chroma subsampling.
- Each pixel will have its own luma component (Y) but each \(\displaystyle 2 \times 2\) block of pixels will share chrominance components (U, V)
- In your AVFrame, data[0] will contain your Y image, data[1] will contain your U image, and data[2] will contain your V image.
- data[0] will typically be \(\displaystyle width * height\) bytes.
- Data[1] and data[2] will typically be \(\displaystyle width * height / 4\) bytes.
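The sizes above can be checked with a little arithmetic. The sketch below computes them for a given resolution, assuming tightly packed buffers with no row alignment padding (real AVFrame linesizes are often larger). The function names rgb24_size and yuv420p_sizes are hypothetical, and the chroma planes are rounded up for odd dimensions.

```shell
#!/bin/sh
# Size in bytes of a packed RGB24 image: 3 bytes per pixel.
rgb24_size() {
    echo $(( $1 * $2 * 3 ))
}

# Sizes in bytes of the three YUV420P planes.
yuv420p_sizes() {
    w=$1
    h=$2
    # One luma byte per pixel.
    y=$((w * h))
    # Each chroma plane covers 2x2 pixel blocks, rounding up for odd sizes.
    cw=$(( (w + 1) / 2 ))
    ch=$(( (h + 1) / 2 ))
    c=$((cw * ch))
    echo "Y=$y U=$c V=$c total=$((y + 2 * c))"
}

rgb24_size 1920 1080      # prints: 6220800
yuv420p_sizes 1920 1080   # prints: Y=2073600 U=518400 V=518400 total=3110400
```

For a 1080p frame, YUV420P therefore needs half as many bytes as RGB24.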
Muxing to memory
You can specify a custom AVIOContext and attach it to your AVFormatContext->pb to mux directly to memory or to implement your own buffering.
NVENC
When encoding using NVENC, your codec_ctx->priv_data is a pointer to a NvencContext.
To list all of the things you can set in the private data, you can type the following in bash
ffmpeg -hide_banner -h encoder=h264_nvenc
The C++ snippet below, taken from an encoder class constructor, sets up an h264_nvenc codec context:
// Create a CUDA hardware device context for the encoder.
if ((ret = av_hwdevice_ctx_create(&hw_device_ctx, AV_HWDEVICE_TYPE_CUDA, NULL,
                                  NULL, 0)) < 0) {
  cerr << "[VideoEncoder::VideoEncoder] Failed to create hw context" << endl;
  return;
}
// Look up the NVENC H.264 encoder by name.
if (!(codec = avcodec_find_encoder_by_name("h264_nvenc"))) {
  cerr << "[VideoEncoder::VideoEncoder] Failed to find h264_nvenc encoder"
       << endl;
  return;
}
// Copy the basic video parameters from the source codec context.
codec_ctx = avcodec_alloc_context3(codec);
codec_ctx->bit_rate = 2500000;
codec_ctx->width = source_codec_ctx->width;
codec_ctx->height = source_codec_ctx->height;
codec_ctx->codec_type = AVMEDIA_TYPE_VIDEO;
codec_ctx->time_base = source_codec_ctx->time_base;
input_timebase = source_codec_ctx->time_base;
codec_ctx->framerate = source_codec_ctx->framerate;
codec_ctx->pix_fmt = AV_PIX_FMT_CUDA;
codec_ctx->profile = FF_PROFILE_H264_CONSTRAINED_BASELINE;
// Low-latency settings: no B-frames, no delay, intra-only GOP.
codec_ctx->max_b_frames = 0;
codec_ctx->delay = 0;
codec_ctx->gop_size = 0;
// Encoder-private options.
// Todo: figure out which ones of these do nothing
av_opt_set(codec_ctx->priv_data, "cq", "23", AV_OPT_SEARCH_CHILDREN);
av_opt_set(codec_ctx->priv_data, "preset", "llhp", 0);
av_opt_set(codec_ctx->priv_data, "tune", "zerolatency", 0);
av_opt_set(codec_ctx->priv_data, "look_ahead", "0", 0);
av_opt_set(codec_ctx->priv_data, "zerolatency", "1", 0);
av_opt_set(codec_ctx->priv_data, "nb_surfaces", "0", 0);
C++ API
FFmpeg does not have an official C++ API.
There are wrappers such as Raveler/ffmpeg-cpp which you can use.
However, I recommend just using the C API and wrapping things in smart pointers.
My Preferences
My preferences for encoding video
- H264
I mostly use H264 when working on projects for compatibility purposes. Here I typically don't need the smallest file size or best quality, prioritizing encoding speed.
#!/bin/bash
ffmpeg -i "$1" -c:v libx264 -crf 28 -preset medium -pix_fmt yuv420p -c:a libfdk_aac -b:a 128K "$2"
- Notes
- MP4 is ok
- H265/HEVC
H265/HEVC is used for archival purposes to minimize the file size and maximize the quality.
#!/bin/bash
ffmpeg -i "$1" -c:v libx265 -crf 23 -preset slow -pix_fmt yuv444p10le -c:a libopus -b:a 128K "$2"
- Notes
- You need to output to an MKV file
- The pixel format yuv444p10le is 10-bit color without chroma subsampling. If your source is lower quality, you can use yuv420p instead for 8-bit color and 4:2:0 chroma subsampling.