FFmpeg
FFmpeg (Fast Forward MPEG) is a library for encoding and decoding multimedia.
You can interact with FFmpeg using its command-line interface or its C API.
Note that many tasks that only involve decoding or encoding can be done by calling the CLI application and piping data to stdin or from stdout.
CLI
You can download static builds of FFmpeg from
- Linux: https://johnvansickle.com/ffmpeg/
- Windows: https://ffmpeg.zeranoe.com/builds/
If you need nvenc support, you can build FFmpeg with https://github.com/markus-perl/ffmpeg-build-script.
Basic usage is as follows:
ffmpeg [-ss start_second] -i input_file [-s resolution] [-b bitrate] [-t time] [-r output_framerate] output.mp4
- Use -pattern_type glob for wildcards (e.g. all images in a folder)
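If you call the CLI from a script, it can help to assemble the arguments programmatically. Below is a minimal Python sketch of the basic usage above. The helper name build_ffmpeg_args and its parameters are hypothetical, it uses -b:v (the video-specific form of the bitrate option), and it only builds the argument list without running FFmpeg.

```python
def build_ffmpeg_args(input_file, output_file, start=None, resolution=None,
                      bitrate=None, duration=None, framerate=None):
    """Assemble an argument list that mirrors the basic usage line."""
    args = ["ffmpeg"]
    if start is not None:
        # Input options such as -ss must come before -i
        args += ["-ss", str(start)]
    args += ["-i", input_file]
    if resolution is not None:
        args += ["-s", resolution]        # e.g. "1280x720"
    if bitrate is not None:
        args += ["-b:v", bitrate]         # e.g. "2M"
    if duration is not None:
        args += ["-t", str(duration)]     # seconds of output to write
    if framerate is not None:
        args += ["-r", str(framerate)]    # output frame rate
    args.append(output_file)
    return args


print(build_ffmpeg_args("input.mp4", "output.mp4", start=5, duration=10))
```

You can pass the resulting list to subprocess.run(args, check=True) to actually run the command.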
x264
x264 is a software H.264 encoder.
[1]
Changing Pixel Format
Encode to h264 with YUV420p pixel format
ffmpeg -i input.mp4 -c:v libx264 -profile:v high -pix_fmt yuv420p output.mp4
Images to Video
Reference
Assuming 60 images per second and you want a 30 fps video.
# Make sure -framerate is before -i
ffmpeg -framerate 60 -i image-%03d.png -r 30 video.mp4
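The input rate (-framerate) decides how long the video is, and the output rate (-r) decides how many frames are kept. The arithmetic can be sketched in Python as follows. The function name is hypothetical and the frame count is only approximate, since the exact number depends on how FFmpeg drops or duplicates frames.

```python
def images_to_video_stats(num_images, input_fps, output_fps):
    """Return (duration in seconds, approximate number of output frames)."""
    # The images are read at input_fps, which fixes the duration
    duration = num_images / input_fps
    # FFmpeg drops or duplicates frames to reach output_fps
    approx_frames = round(duration * output_fps)
    return duration, approx_frames


# 120 images read at 60 images per second, written as a 30 fps video
print(images_to_video_stats(120, 60, 30))
```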
Video to Images
Extracting frames from a video
ffmpeg -i video.mp4 frames/%d.png
- Use -ss H:M:S before -i to specify where to start
- Use -vframes 1 to extract one frame
- Use -vf "select=not(mod(n\,10))" to select every 10th frame
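The select expression not(mod(n\,10)) is true whenever the frame index n (starting at 0) is a multiple of 10. A small Python sketch of which frames are kept is shown below; the function name is hypothetical. When writing images you may also need -vsync vfr so that dropped frames are not filled in with duplicates.

```python
def selected_frames(total_frames, step=10):
    """Frame indices kept by select=not(mod(n\\,step))."""
    # mod(n, step) == 0 makes not(...) true, so the frame is kept
    return [n for n in range(total_frames) if n % step == 0]


print(selected_frames(35))
```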
Get a list of encoders/decoders
for i in encoders decoders filters; do
echo "$i:"; ffmpeg -hide_banner "-${i}" | grep -Ei "npp|cuvid|nvenc|cuda"
done
PSNR/SSIM
Reference
FFmpeg can compare two videos and output the psnr or ssim numbers for each of the y, u, and v channels.
ffmpeg -i distorted.mp4 -i reference.mp4 \
-lavfi "ssim;[0:v][1:v]psnr" -f null -
ffmpeg -i distorted.mp4 -i reference.mp4 -lavfi psnr -f null -
ffmpeg -i distorted.mp4 -i reference.mp4 -lavfi ssim -f null -
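For reference, PSNR is computed from the mean squared error between the two signals: PSNR = 10 * log10(MAX^2 / MSE). Below is a minimal pure-Python sketch for a single channel of 8-bit samples. It illustrates the formula only and is not FFmpeg's implementation; the function name and the flat-list input are assumptions.

```python
import math


def psnr(reference, distorted, max_value=255):
    """PSNR in dB between two equal-length sequences of samples."""
    if len(reference) != len(distorted) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all samples
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(max_value ** 2 / mse)


print(psnr([0, 0, 0, 0], [1, 1, 1, 1]))
```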
Generate Thumbnails
Reference
Below is a bash script to generate all thumbnails in a folder
#!/usr/bin/env bash
OUTPUT_FOLDER="thumbnails"
mkdir -p $OUTPUT_FOLDER
for file in *.mp4;
do ffmpeg -i "$file" -vf "select=gte(n\,300)" -vframes 1 "$OUTPUT_FOLDER/${file%.mp4}.png";
done
MP4 to GIF
Normally you can just do
ffmpeg -i my_video.mp4 my_video.gif
If you want better quality, you can use the following filter_complex:
[0]split=2[v1][v2];[v1]palettegen=stats_mode=full[palette];[v2][palette]paletteuse=dither=sierra2_4a
Here is another script from https://superuser.com/questions/556029/how-do-i-convert-a-video-to-gif-using-ffmpeg-with-reasonable-quality
#!/bin/sh
ffmpeg -i "$1" -vf "fps=15,split[s0][s1];[s0]palettegen[p];[s1][p]paletteuse" -loop 0 "$2"
Pipe to stdout
Below is an example of piping the video only to stdout:
ffmpeg -i video.webm -pix_fmt rgb24 -f rawvideo -
In Python, you can read it as follows:
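import subprocess
import numpy as np

video_width = 1920
video_height = 1080
# Same command as above, written as an argument list
ffmpeg_command = ["ffmpeg", "-i", "video.webm",
                  "-pix_fmt", "rgb24", "-f", "rawvideo", "-"]
ffmpeg_process = subprocess.Popen(ffmpeg_command,
                                  stdout=subprocess.PIPE,
                                  stderr=subprocess.DEVNULL)
# One rgb24 frame is width * height * 3 bytes
raw_image = ffmpeg_process.stdout.read(
    video_width * video_height * 3)
image = (np.frombuffer(raw_image, dtype=np.uint8)
         .reshape(video_height, video_width, 3))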
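The snippet above reads a single frame. To read every frame, you can keep reading fixed-size chunks until the stream runs out. Below is a minimal sketch using only the standard library; the generator name iter_raw_frames is hypothetical, and an in-memory io.BytesIO stands in for ffmpeg_process.stdout so the example runs without FFmpeg.

```python
import io


def iter_raw_frames(stream, width, height, channels=3):
    """Yield one frame's worth of bytes at a time from a raw video stream."""
    frame_size = width * height * channels
    while True:
        data = stream.read(frame_size)
        if len(data) < frame_size:
            break  # end of stream (a trailing partial frame is discarded)
        yield data


# Two 2x2 RGB frames (12 bytes each) followed by one stray byte
fake_stdout = io.BytesIO(bytes(range(24)) + b"\x00")
frames = list(iter_raw_frames(fake_stdout, width=2, height=2))
print(len(frames))
```

Each yielded chunk can be passed to np.frombuffer and reshaped in the same way as the single frame above.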
Filters
Filters are part of the CLI
https://ffmpeg.org/ffmpeg-filters.html
Crop
ffmpeg -i input_filename -vf "crop=w:h:x:y" output_filename
- Here x and y are the coordinates of the top-left corner of your crop. w and h are the width and height of the final image or video.
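As an example of filling in these values, the sketch below computes a crop filter string that takes a centered region out of a larger frame. The helper name centered_crop is hypothetical.

```python
def centered_crop(in_w, in_h, out_w, out_h):
    """Build a crop=w:h:x:y filter string for a centered crop."""
    if out_w > in_w or out_h > in_h:
        raise ValueError("crop size must not exceed the input size")
    # x and y are the top-left corner of the cropped region
    x = (in_w - out_w) // 2
    y = (in_h - out_h) // 2
    return f"crop={out_w}:{out_h}:{x}:{y}"


print(centered_crop(1920, 1080, 1280, 720))
```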
Resizing/Scaling
ffmpeg -i input.avi -vf scale=320:240 output.avi
ffmpeg -i input.jpg -vf scale=iw*2:ih input_double_width.png
- If the aspect ratio is not what you expect, try using the setdar filter.
  - E.g. setdar=ratio=2/1
- Resizing with transparent padding
Useful for generating logos
ffmpeg -i icon.svg -vf "scale=h=128:w=128:force_original_aspect_ratio=decrease,pad=128:128:(ow-iw)/2:(oh-ih)/2:color=0x00000000" -y icon.png
- 256
ffmpeg -i icon.svg -vf "scale=h=256:w=256:force_original_aspect_ratio=decrease,pad=256:256:(ow-iw)/2:(oh-ih)/2:color=0x00000000" -y icon.png
- 512
ffmpeg -i icon.svg -vf "scale=h=512:w=512:force_original_aspect_ratio=decrease,pad=512:512:(ow-iw)/2:(oh-ih)/2:color=0x00000000" -y icon.png
Rotation
To rotate 180 degrees
ffmpeg -i input.mp4 -vf "transpose=1,transpose=1" output.mp4
- 0 – Rotate by 90 degrees counter-clockwise and flip vertically.
- 1 – Rotate by 90 degrees clockwise.
- 2 – Rotate by 90 degrees counter-clockwise.
- 3 – Rotate by 90 degrees clockwise and flip vertically.
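Using the values listed above, a rotation by a multiple of 90 degrees can be mapped to a transpose chain. Below is a minimal sketch; the helper name rotation_filter is hypothetical and the angle is measured clockwise.

```python
def rotation_filter(degrees_clockwise):
    """Return a -vf filter string for a clockwise rotation in 90 degree steps."""
    chains = {
        0: "",                          # no rotation needed
        90: "transpose=1",              # 90 degrees clockwise
        180: "transpose=1,transpose=1", # two clockwise turns
        270: "transpose=2",             # 90 degrees counter-clockwise
    }
    angle = degrees_clockwise % 360
    if angle not in chains:
        raise ValueError("only multiples of 90 degrees are supported")
    return chains[angle]


print(rotation_filter(180))
```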
360 Video
See v360 filter
Converting EAC to equirectangular
YouTube sometimes uses an EAC format. You can convert this to the traditional equirectangular format as follows:
ffmpeg -i input.mp4 -vf "v360=eac:e" output.mp4
Sometimes you may run into errors where height or width is not divisible by 2.
Apply a scale filter to fix this issue.
ffmpeg -i input.mp4 -vf "v360=eac:e,scale=iw:-2" output.mp4
Converting to rectilinear
ffmpeg -i input.mp4 -vf "v360=e:rectilinear:h_fov=90:v_fov=90" output.mp4
Metadata
To add 360 video metadata, you should use Google's spatial-media.
This will add the following side data, which you can see using ffprobe:
Side data: spherical: equirectangular (0.000000/0.000000/0.000000)
Removing Duplicate Frames
Useful for extracting frames from timelapses.
ffmpeg -i input.mp4 -vf mpdecimate,setpts=N/FRAME_RATE/TB out.mp4
Stack and Unstack
To stack, see hstack, vstack.
To unstack, see crop.
Filter-Complex
Filter complex allows you to create a graph of filters.
Suppose you have 3 inputs: $1, $2, $3.
Then you can access them as streams [0], [1], [2].
The filter syntax allows you to chain multiple filters where each filter is an edge.
For example, [0]split[t1][t2] creates two vertices t1 and t2 from input 0.
The last statement in your graph will be the output of your command, e.g. [t1][t2]vstack
ffmpeg -i $1 -i $2 -i $3 -filter_complex "[0]split[t1][t2];[t1][t2]vstack" output.mkv -y
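To stack all of the inputs rather than two copies of the first one, you can list every input label in front of the filter and set its inputs option. Below is a minimal sketch that builds such a graph string; the helper name stack_filter is hypothetical. Note that vstack expects all inputs to have the same width, and hstack the same height.

```python
def stack_filter(num_inputs, direction="vstack"):
    """Build a filter_complex string that stacks inputs 0..num_inputs-1."""
    if direction not in ("vstack", "hstack"):
        raise ValueError("direction must be 'vstack' or 'hstack'")
    # Each input file is referenced by its index in square brackets
    labels = "".join(f"[{i}]" for i in range(num_inputs))
    return f"{labels}{direction}=inputs={num_inputs}"


print(stack_filter(3))
```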
Concatenate Videos
ffmpeg -i part_1.mp4 \
  -i part_2.mp4 \
  -i part_3.mp4 \
  -filter_complex \
"[0]scale=1920:1080[0s];\
[1]scale=1920:1080[1s];\
[2]scale=1920:1080[2s];\
[0s][0:a][1s][1:a][2s][2:a]concat=n=3:v=1:a=1[v][a]" \
  -map "[v]" -map "[a]" \
  -vsync 2 \
  all_parts.mp4 -y
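The filter graph in the command above follows a regular pattern, so it can be generated for any number of parts. Below is a minimal sketch that only builds the graph string; the helper name concat_filter is hypothetical, and it assumes every part has exactly one video and one audio stream.

```python
def concat_filter(num_parts, width=1920, height=1080):
    """Build the scale + concat graph for num_parts inputs."""
    # Scale every video stream to a common size first
    scales = [f"[{i}]scale={width}:{height}[{i}s]" for i in range(num_parts)]
    # concat expects the streams interleaved: video 0, audio 0, video 1, ...
    pairs = "".join(f"[{i}s][{i}:a]" for i in range(num_parts))
    concat = f"{pairs}concat=n={num_parts}:v=1:a=1[v][a]"
    return ";".join(scales + [concat])


print(concat_filter(3))
```

The result is passed to -filter_complex together with -map "[v]" -map "[a]", as in the command above.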
Replace transparency
Reference
Add a background to transparent images.
ffmpeg -i in.mov -filter_complex "[0]format=pix_fmts=yuva420p,split=2[bg][fg];[bg]drawbox=c=white@1:replace=1:t=fill[bg];[bg][fg]overlay=format=auto" -c:a copy new.mov
Draw Text
https://stackoverflow.com/questions/15364861/frame-number-overlay-with-ffmpeg
ffmpeg -i input -vf "drawtext=fontfile=Arial.ttf: text='%{frame_num}': start_number=1: x=(w-tw)/2: y=h-(2*lh): fontcolor=black: fontsize=20: box=1: boxcolor=white: boxborderw=5" -c:a copy output
C API
A Doxygen reference manual for the C API is available at [2].
Note that FFmpeg is licensed under the LGPL by default; builds that enable GPL components (such as libx264) are covered by the GPL.
If you only need to do encoding and decoding, you can simply pipe the inputs and outputs of the FFmpeg CLI to your program [3].
Getting Started
Best way to get started is to look at the official examples.
Structs
- AVInputFormat/AVOutputFormat: Represents a container type.
- AVFormatContext: Represents your specific container.
- AVStream: Represents a single audio, video, or data stream in your container.
- AVCodec: Represents a single codec (e.g. H.264).
- AVCodecContext: Represents your specific codec and contains all associated parameters (e.g. resolution, bitrate, fps).
- AVPacket: Compressed data.
- AVFrame: Decoded audio or video data.
- SwsContext: Used for image scaling and colorspace and pixel format conversion operations.
Pixel Formats
Reference
Pixel formats are stored as AVPixelFormat enums.
Below are descriptions for a few common pixel formats.
Note that the exact sizes of buffers may vary depending on alignment.
- AV_PIX_FMT_RGB24
- This is your standard 24 bits per pixel RGB.
- In your AVFrame, data[0] will contain your single buffer RGBRGBRGB.
- Where the linesize is typically \(\displaystyle 3 * width\) bytes per row and \(\displaystyle 3\) bytes per pixel.
- AV_PIX_FMT_YUV420P
- This is a planar YUV pixel format with chroma subsampling.
- Each pixel will have its own luma component (Y) but each \(\displaystyle 2 \times 2\) block of pixels will share chrominance components (U, V)
- In your AVFrame, data[0] will contain your Y image, data[1] will contain your U image, and data[2] will contain your V image.
- Data[0] will typically be \(\displaystyle width * height\) bytes.
- Data[1] and data[2] will typically be \(\displaystyle width * height / 4\) bytes.
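The buffer sizes above can be checked with a little arithmetic. Below is a minimal sketch that ignores alignment padding, so real linesizes and buffers may be larger. The function names are hypothetical, and odd dimensions are handled by rounding the chroma planes up.

```python
def rgb24_size(width, height):
    """Bytes in a packed RGB24 image: 3 bytes per pixel, one buffer."""
    return 3 * width * height


def yuv420p_plane_sizes(width, height):
    """Bytes in the Y, U and V planes of a YUV420P image."""
    # Each 2x2 block of pixels shares one U and one V sample
    chroma_w = (width + 1) // 2
    chroma_h = (height + 1) // 2
    return width * height, chroma_w * chroma_h, chroma_w * chroma_h


print(rgb24_size(1920, 1080))
print(yuv420p_plane_sizes(1920, 1080))
```

For a 1920x1080 frame, YUV420P needs half as many bytes as RGB24.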
Muxing to memory
You can specify a custom AVIOContext and attach it to your AVFormatContext->pb to mux directly to memory or to implement your own buffering.
NVENC
When encoding using NVENC, your codec_ctx->priv_data is a pointer to a NvencContext.
To list all of the options you can set in the private data, you can type the following in bash:
ffmpeg -hide_banner -h encoder=h264_nvenc
if ((ret = av_hwdevice_ctx_create(&hw_device_ctx, AV_HWDEVICE_TYPE_CUDA, NULL,
NULL, 0)) < 0) {
cerr << "[VideoEncoder::VideoEncoder] Failed to create hw context" << endl;
return;
}
if (!(codec = avcodec_find_encoder_by_name("h264_nvenc"))) {
cerr << "[VideoEncoder::VideoEncoder] Failed to find h264_nvenc encoder"
<< endl;
return;
}
codec_ctx = avcodec_alloc_context3(codec);
codec_ctx->bit_rate = 2500000;
codec_ctx->width = source_codec_ctx->width;
codec_ctx->height = source_codec_ctx->height;
codec_ctx->codec_type = AVMEDIA_TYPE_VIDEO;
codec_ctx->time_base = source_codec_ctx->time_base;
input_timebase = source_codec_ctx->time_base;
codec_ctx->framerate = source_codec_ctx->framerate;
codec_ctx->pix_fmt = AV_PIX_FMT_CUDA;
codec_ctx->profile = FF_PROFILE_H264_CONSTRAINED_BASELINE;
codec_ctx->max_b_frames = 0;
codec_ctx->delay = 0;
codec_ctx->gop_size = 0;
// Todo: figure out which ones of these do nothing
av_opt_set(codec_ctx->priv_data, "cq", "23", AV_OPT_SEARCH_CHILDREN);
av_opt_set(codec_ctx->priv_data, "preset", "llhp", 0);
av_opt_set(codec_ctx->priv_data, "tune", "zerolatency", 0);
av_opt_set(codec_ctx->priv_data, "look_ahead", "0", 0);
av_opt_set(codec_ctx->priv_data, "zerolatency", "1", 0);
av_opt_set(codec_ctx->priv_data, "nb_surfaces", "0", 0);
C++ API
FFmpeg does not have an official C++ API.
There are wrappers such as Raveler/ffmpeg-cpp which you can use.
However, I recommend just using the C API and wrapping things in smart pointers.
Python API
You can try pyav which contains bindings for the library. However I haven't tried it.
If you just need to call the CLI, you can use ffmpeg-python to help build calls.
JavaScript API
To use FFmpeg in a browser, see ffmpegwasm.
This is used in https://davidl.me/apps/media/index.html.
My Preferences
My preferences for encoding video
H264
H264 is best when you need the most compatibility, especially with older or low-end devices.
#!/bin/bash
ffmpeg -i "$1" -c:v libx264 -crf 28 -preset medium -pix_fmt yuv420p -c:a libfdk_aac -b:a 128K "$2"
- Notes
- MP4 is ok
H265/HEVC
H265/HEVC is now a good tradeoff between size, quality, and compatibility.
#!/bin/bash
ffmpeg -i "$1" -c:v libx265 -crf 23 -preset slow -pix_fmt yuv444p10le -c:a libopus -b:a 128K "$2"
- Notes
- You need to output to an MKV file
- The pixel format yuv444p10le is 10-bit color without chroma subsampling. If your source is lower, you can use yuv420p instead for 8-bit color and 4:2:0 chroma subsampling.