[https://ffmpeg.org/ FFmpeg] (Fast Forward MPEG) is a library for encoding and decoding multimedia.
 
I find it useful for converting videos to gifs. You can also [https://en.wikibooks.org/wiki/FFMPEG_An_Intermediate_Guide/image_sequence extract videos into a sequence of images or vice-versa].
You can interact with FFmpeg using their command-line interface or using their [https://ffmpeg.org/doxygen/trunk/index.html C API].
Note that many tasks that only involve decoding or encoding can be done by calling the CLI application and piping things to stdin or from stdout.


==CLI==
You can download static builds of FFmpeg from
* Linux: [https://johnvansickle.com/ffmpeg/ https://johnvansickle.com/ffmpeg/]
* Windows: [https://ffmpeg.zeranoe.com/builds/ https://ffmpeg.zeranoe.com/builds/]
If you need nvenc support, you can build FFmpeg with https://github.com/markus-perl/ffmpeg-build-script.
Basic usage is as follows:
<pre>
ffmpeg [-ss start_second] -i input_file [-s resolution] [-b bitrate] [-t time] [-r output_framerate] output.mp4
</pre>
* Use <code>-pattern_type glob</code> for wildcards (e.g. all images in a folder)
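For example, a minimal sketch that encodes every PNG in the current folder into a video (filenames are hypothetical):
<syntaxhighlight lang="bash">
# -framerate must come before -i; quote the glob so the shell does not expand it
ffmpeg -framerate 30 -pattern_type glob -i '*.png' output.mp4
</syntaxhighlight>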


===x264===
<syntaxhighlight lang="bash">
ffmpeg -i input.mp4 -c:v libx264 -profile:v high -pix_fmt yuv420p output.mp4
</syntaxhighlight>
 
===Images to Video===
[https://en.wikibooks.org/wiki/FFMPEG_An_Intermediate_Guide/image_sequence Reference]<br>
Assuming you have 60 images per second and want a 30 fps video:
<syntaxhighlight lang="bash">
# Make sure -framerate is before -i
ffmpeg -framerate 60 -i image-%03d.png -r 30 video.mp4
</syntaxhighlight>
 
===Video to Images===
Extracting frames from a video
 
<syntaxhighlight lang="bash">
ffmpeg -i video.mp4 frames/%d.png
</syntaxhighlight>
 
* Use <code>-ss H:M:S</code> before the input to specify where to start extracting
* Use <code>-vframes 1</code> to extract a single frame
* Use <code>-vf "select=not(mod(n\,10))"</code> to select every 10th frame
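For example, a sketch combining these options to grab a single frame 90 seconds into the video:
<syntaxhighlight lang="bash">
# Seek to 1 minute 30 seconds, then extract exactly one frame
ffmpeg -ss 0:01:30 -i video.mp4 -vframes 1 frame.png
</syntaxhighlight>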
 
===Get a list of encoders/decoders===
[https://superuser.com/questions/1236275/how-can-i-use-crf-encoding-with-nvenc-in-ffmpeg Reference]
<syntaxhighlight lang="bash">
for i in encoders decoders filters; do
    echo $i:; ffmpeg -hide_banner -${i} | egrep -i "npp|cuvid|nvenc|cuda"
done
</syntaxhighlight>
 
===PSNR/SSIM===
[https://github.com/stoyanovgeorge/ffmpeg/wiki/How-to-Compare-Video Reference]<br>
FFmpeg can compare two videos and output the PSNR or SSIM numbers for each of the Y, U, and V channels.<br>
<syntaxhighlight lang="bash">
ffmpeg -i distorted.mp4 -i reference.mp4 \
      -lavfi "ssim;[0:v][1:v]psnr" -f null –
 
ffmpeg -i distorted.mp4 -i reference.mp4 -lavfi  psnr -f null -
ffmpeg -i distorted.mp4 -i reference.mp4 -lavfi  ssim -f null -
</syntaxhighlight>
 
===Generate Thumbnails===
[https://superuser.com/questions/1099491/batch-extract-thumbnails-with-ffmpeg Reference]<br>
Below is a bash script to generate thumbnails for every MP4 in a folder.
{{hidden|Script|
<syntaxhighlight lang="bash">
#!/usr/bin/env bash

OUTPUT_FOLDER="thumbnails"

mkdir -p "$OUTPUT_FOLDER"
for file in *.mp4; do
  # Use the first frame at or after frame 300 as the thumbnail.
  ffmpeg -i "$file" -vf "select=gte(n\,300)" -vframes 1 "$OUTPUT_FOLDER/${file%.mp4}.png"
done
</syntaxhighlight>
}}
 
===MP4 to GIF===
Normally you can just do
<syntaxhighlight lang="bash">
ffmpeg -i my_video.mp4 my_video.gif
</syntaxhighlight>
 
If you want better quality, you can use the following filter_complex:
<pre>
[0]split=2[v1][v2];[v1]palettegen=stats_mode=full[palette];[v2][palette]paletteuse=dither=sierra2_4a
</pre>
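Combined into a full command using the same filenames as above:
<syntaxhighlight lang="bash">
ffmpeg -i my_video.mp4 -filter_complex "[0]split=2[v1][v2];[v1]palettegen=stats_mode=full[palette];[v2][palette]paletteuse=dither=sierra2_4a" my_video.gif
</syntaxhighlight>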
 
Here is another script, from [https://superuser.com/questions/556029/how-do-i-convert-a-video-to-gif-using-ffmpeg-with-reasonable-quality this superuser answer]:
{{hidden | mp4 to gif script |
<syntaxhighlight lang="bash">
#!/bin/sh
ffmpeg -i "$1" -vf "fps=15,split[s0][s1];[s0]palettegen[p];[s1][p]paletteuse" -loop 0 "$2"
</syntaxhighlight>
}}
 
===Pipe to stdout===
Below is an example of piping only the video stream to stdout:
<pre>
ffmpeg -i video.webm -pix_fmt rgb24 -f rawvideo -
</pre>
 
In Python, you can read it as follows:
<syntaxhighlight lang="python">
video_width = 1920
video_height = 1080
ffmpeg_process = subprocess.Popen(ffmpeg_command,
                                  stdout=subprocess.PIPE,
                                  stderr=subprocess.PIPE)
raw_image = ffmpeg_process.stdout.read(
              video_width * video_height * 3)
image = (np.frombuffer(raw_image, dtype=np.uint8)
          .reshape(video_height, video_width, 3))
</syntaxhighlight>
 
==Filters==
Filters are part of the CLI and are applied with <code>-vf</code> or <code>-filter_complex</code>.<br>
[https://ffmpeg.org/ffmpeg-filters.html https://ffmpeg.org/ffmpeg-filters.html]
 
===Crop===
<syntaxhighlight lang="bash">
ffmpeg -i input_filename -vf "crop=w:h:x:y" output_filename
</syntaxhighlight>
 
* Here <code>x</code> and <code>y</code> specify the top-left corner of the crop. <code>w</code> and <code>h</code> are the width and height of the final image or video.
 
===Resizing/Scaling===
[https://trac.ffmpeg.org/wiki/Scaling FFMpeg Scaling]<br>
[https://ffmpeg.org/ffmpeg-filters.html#scale scale filter]
 
<syntaxhighlight lang="bash">
ffmpeg -i input.avi -vf scale=320:240 output.avi


ffmpeg -i input.jpg -vf scale=iw*2:ih input_double_width.png
</syntaxhighlight>
* If the aspect ratio is not what you expect, try using the <code>setdar</code> filter.
** E.g. <code>setdar=ratio=2/1</code>
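For example, a sketch that scales and then forces a 4:3 display aspect ratio (values are illustrative):
<syntaxhighlight lang="bash">
ffmpeg -i input.mp4 -vf "scale=640:480,setdar=ratio=4/3" output.mp4
</syntaxhighlight>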
;Resizing with transparent padding
Useful for generating logos
<syntaxhighlight lang="bash">
ffmpeg -i icon.svg -vf "scale=h=128:w=128:force_original_aspect_ratio=decrease,pad=128:128:(ow-iw)/2:(oh-ih)/2:color=0x00000000" -y icon.png
</syntaxhighlight>
{{hidden | More sizes |
;256
<syntaxhighlight lang="bash">
ffmpeg -i icon.svg -vf "scale=h=256:w=256:force_original_aspect_ratio=decrease,pad=256:256:(ow-iw)/2:(oh-ih)/2:color=0x00000000" -y icon.png
</syntaxhighlight>
;512
<syntaxhighlight lang="bash">
ffmpeg -i icon.svg -vf "scale=h=512:w=512:force_original_aspect_ratio=decrease,pad=512:512:(ow-iw)/2:(oh-ih)/2:color=0x00000000" -y icon.png
</syntaxhighlight>
}}
===Rotation===
[https://ffmpeg.org/ffmpeg-filters.html#transpose transpose filter]<br>
To rotate 180 degrees:
<syntaxhighlight lang="bash">
ffmpeg -i input.mp4 -vf "transpose=1,transpose=1" output.mp4
</syntaxhighlight>
* 0 – Rotate by 90 degrees counter-clockwise and flip vertically.
* 1 – Rotate by 90 degrees clockwise.
* 2 – Rotate by 90 degrees counter-clockwise.
* 3 – Rotate by 90 degrees clockwise and flip vertically.
===360 Video===
See [https://ffmpeg.org/ffmpeg-filters.html#v360 v360 filter]
====Converting EAC to equirectangular====
YouTube sometimes uses an EAC format. You can convert this to the traditional equirectangular format as follows:
<pre>
ffmpeg -i input.mp4 -vf "v360=eac:e" output.mp4
</pre>
Sometimes you may run into errors where height or width is not divisible by 2.<br>
Apply a scale filter to fix this issue.
<pre>
ffmpeg -i input.mp4 -vf "v360=eac:e,scale=iw:-2" output.mp4
</pre>
====Converting to rectilinear====
<pre>
ffmpeg -i input.mp4 -vf "v360=e:rectilinear:h_fov=90:v_fov=90" output.mp4
</pre>
====Metadata====
To add 360 video metadata, you should use [https://github.com/google/spatial-media Google's spatial-media].
This will add the following sidedata which you can see using <code>ffprobe</code>:
<pre>
Side data:
spherical: equirectangular (0.000000/0.000000/0.000000)
</pre>
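A sketch of the injection workflow, assuming a clone of spatial-media (the exact invocation may vary by version):
<syntaxhighlight lang="bash">
# Inject equirectangular metadata (hypothetical filenames)
python spatialmedia -i input.mp4 input_injected.mp4
# Verify the side data was written
ffprobe input_injected.mp4
</syntaxhighlight>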
===Removing Duplicate Frames===
[https://stackoverflow.com/questions/37088517/remove-sequentially-duplicate-frames-when-using-ffmpeg Reference]<br>
[https://ffmpeg.org/ffmpeg-filters.html#mpdecimate mpdecimate filter]<br>
Useful for extracting frames from timelapses.
<syntaxhighlight lang="bash">
ffmpeg -i input.mp4 -vf mpdecimate,setpts=N/FRAME_RATE/TB out.mp4
</syntaxhighlight>
===Stack and Unstack===
To stack, see [https://ffmpeg.org/ffmpeg-all.html#hstack <code>hstack</code>], [https://ffmpeg.org/ffmpeg-all.html#vstack <code>vstack</code>]. 
To unstack, see <code>crop</code>.
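For example, a sketch that stacks two equally-sized videos side by side and then crops the left half back out (filenames are hypothetical):
<syntaxhighlight lang="bash">
# Stack two equally-sized videos side by side
ffmpeg -i left.mp4 -i right.mp4 -filter_complex "[0:v][1:v]hstack" stacked.mp4
# Recover the left half: w=iw/2, h=ih, x=0, y=0
ffmpeg -i stacked.mp4 -vf "crop=iw/2:ih:0:0" left_again.mp4
</syntaxhighlight>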
===Filter-Complex===
Filter-complex allows you to create a graph of filters.
Suppose you have 3 inputs: $1, $2, $3.
Then you can access them as streams [0], [1], [2].
The filter syntax allows you to chain multiple filters, where each filter is an edge.
For example, <code>[0]split[t1][t2]</code> creates two vertices t1 and t2 from input 0.
The last statement in your chain will be the output of your command, e.g. <code>[t1][t2]vstack</code>.
<pre>
ffmpeg -i $1 -i $2 -i $3 -filter_complex "[0]split[t1][t2];[t1][t2]vstack" output.mkv -y
</pre>
===Concatenate Videos===
<pre>
ffmpeg -i part_1.mp4 \
    -i part_2.mp4 \
    -i part_3.mp4 \
    -filter_complex \
    "[0]scale=1920:1080[0s];\
    [1]scale=1920:1080[1s];\
    [2]scale=1920:1080[2s];\
    [0s][0:a][1s][1:a][2s][2:a]concat=n=3:v=1:a=1[v][a]" \
    -map "[v]" -map "[a]" \
    -vsync 2 \
    all_parts.mp4 -y
</pre>
===Replace transparency===
[https://superuser.com/questions/1341674/ffmpeg-convert-transparency-to-a-certain-color Reference]<br>
Add a background to transparent images.<br>
<pre>
ffmpeg -i in.mov -filter_complex "[0]format=pix_fmts=yuva420p,split=2[bg][fg];[bg]drawbox=c=white@1:replace=1:t=fill[bg];[bg][fg]overlay=format=auto" -c:a copy new.mov
</pre>
===Draw Text===
[https://stackoverflow.com/questions/15364861/frame-number-overlay-with-ffmpeg Reference]
<pre>
ffmpeg -i input -vf "drawtext=fontfile=Arial.ttf: text='%{frame_num}': start_number=1: x=(w-tw)/2: y=h-(2*lh): fontcolor=black: fontsize=20: box=1: boxcolor=white: boxborderw=5" -c:a copy output
</pre>


==C API==
A doxygen reference manual for their C API is available at [https://ffmpeg.org/doxygen/trunk/index.html].<br>
Note that FFmpeg is licensed under the LGPL, or the GPL if certain optional components are enabled.<br>
If you only need to do encoding and decoding, you can simply pipe the inputs and outputs of the FFmpeg CLI to your program [https://batchloaf.wordpress.com/2017/02/12/a-simple-way-to-read-and-write-audio-and-video-files-in-c-using-ffmpeg-part-2-video/].<br>
 
===Getting Started===
Best way to get started is to look at the [https://ffmpeg.org/doxygen/trunk/examples.html official examples].
 
====Structs====
* [https://www.ffmpeg.org/doxygen/trunk/structAVInputFormat.html <code>AVInputFormat</code>]/[https://www.ffmpeg.org/doxygen/trunk/structAVOutputFormat.html <code>AVOutputFormat</code>] Represents a container type.
* [https://www.ffmpeg.org/doxygen/trunk/structAVFormatContext.html <code>AVFormatContext</code>] Represents your specific container.
* [https://www.ffmpeg.org/doxygen/trunk/structAVStream.html <code>AVStream</code>] Represents a single audio, video, or data stream in your container.
* [https://www.ffmpeg.org/doxygen/trunk/structAVCodec.html <code>AVCodec</code>] Represents a single codec (e.g. H.264).
* [https://www.ffmpeg.org/doxygen/trunk/structAVCodecContext.html <code>AVCodecContext</code>] Represents your specific codec and contains all associated parameters (e.g. resolution, bitrate, fps).
* [https://www.ffmpeg.org/doxygen/trunk/structAVPacket.html <code>AVPacket</code>] Compressed data.
* [https://www.ffmpeg.org/doxygen/trunk/structAVFrame.html <code>AVFrame</code>] Decoded audio or video data.
* [https://www.ffmpeg.org/doxygen/trunk/structSwsContext.html <code>SwsContext</code>] Used for image scaling and colorspace and pixel format conversion operations.
 
====Pixel Formats====
[https://www.ffmpeg.org/doxygen/4.0/pixfmt_8h.html Reference]<br>
Pixel formats are stored as <code>AVPixelFormat</code> enums.<br>
Below are descriptions for a few common pixel formats.<br>
Note that the exact sizes of buffers may vary depending on alignment.<br>
 
;AV_PIX_FMT_RGB24
* This is your standard 24 bits-per-pixel RGB.<br>
* In your AVFrame, data[0] will contain your single buffer of RGBRGBRGB values.<br>
* The linesize is typically <math>3 \times width</math> bytes per row, i.e. <math>3</math> bytes per pixel.
 
;AV_PIX_FMT_YUV420P
* This is a planar YUV pixel format with chroma subsampling.<br>
* Each pixel will have its own luma component (Y) but each <math>2 \times 2</math> block of pixels will share chrominance components (U, V).<br>
* In your AVFrame, data[0] will contain your Y plane, data[1] your U plane, and data[2] your V plane.<br>
* Data[0] will typically be <math>width \times height</math> bytes.<br>
* Data[1] and data[2] will typically be <math>width \times height / 4</math> bytes.<br>
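To see these sizes concretely, you can dump raw frames with the CLI; for <code>yuv420p</code>, each frame occupies <math>width \times height \times 1.5</math> bytes:
<syntaxhighlight lang="bash">
# Dump decoded frames as raw planar YUV 4:2:0 (Y, U, V planes back to back)
ffmpeg -i input.mp4 -pix_fmt yuv420p -f rawvideo output.yuv
# The file size should be num_frames * width * height * 1.5 bytes
</syntaxhighlight>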
 
===Muxing to memory===
You can specify a custom <code>AVIOContext</code> and attach it to your <code>AVFormatContext->pb</code> to mux directly to memory or to implement your own buffering.
 
===NVENC===
[https://superuser.com/questions/1296374/best-settings-for-ffmpeg-with-nvenc Options Reference]
 
When encoding using NVENC, your <code>codec_ctx->priv_data</code> is a pointer to a <code>NvencContext</code>.
 
To list all of the things you can set in the private data, you can type the following in bash
<syntaxhighlight lang="bash">
ffmpeg -hide_banner -h encoder=h264_nvenc
</syntaxhighlight>
 
{{ hidden | NVENC Codec Ctx |
<syntaxhighlight lang="c++">
  if ((ret = av_hwdevice_ctx_create(&hw_device_ctx, AV_HWDEVICE_TYPE_CUDA, NULL,
                                    NULL, 0)) < 0) {
    cerr << "[VideoEncoder::VideoEncoder] Failed to create hw context" << endl;
    return;
  }
 
  if (!(codec = avcodec_find_encoder_by_name("h264_nvenc"))) {
    cerr << "[VideoEncoder::VideoEncoder] Failed to find h264_nvenc encoder"
        << endl;
    return;
  }
  codec_ctx = avcodec_alloc_context3(codec);
  codec_ctx->bit_rate = 2500000;
  codec_ctx->width = source_codec_ctx->width;
  codec_ctx->height = source_codec_ctx->height;
  codec_ctx->codec_type = AVMEDIA_TYPE_VIDEO;
  codec_ctx->time_base = source_codec_ctx->time_base;
  input_timebase = source_codec_ctx->time_base;
  codec_ctx->framerate = source_codec_ctx->framerate;
  codec_ctx->pix_fmt = AV_PIX_FMT_CUDA;
  codec_ctx->profile = FF_PROFILE_H264_CONSTRAINED_BASELINE;
  codec_ctx->max_b_frames = 0;
  codec_ctx->delay = 0;
  codec_ctx->gop_size = 0;
  // TODO: figure out which of these do nothing
  av_opt_set(codec_ctx->priv_data, "cq", "23", AV_OPT_SEARCH_CHILDREN);
  av_opt_set(codec_ctx->priv_data, "preset", "llhp", 0);
  av_opt_set(codec_ctx->priv_data, "tune", "zerolatency", 0);
  av_opt_set(codec_ctx->priv_data, "look_ahead", "0", 0);
  av_opt_set(codec_ctx->priv_data, "zerolatency", "1", 0);
  av_opt_set(codec_ctx->priv_data, "nb_surfaces", "0", 0);
</syntaxhighlight>
}}
 
==C++ API==
FFmpeg does not have an official C++ API.<br>
There are wrappers such as [https://github.com/Raveler/ffmpeg-cpp Raveler/ffmpeg-cpp] which you can use.<br>
However, I recommend just using the C API and wrapping things in smart pointers.
 
==Python API==
You can try [https://github.com/PyAV-Org/PyAV pyav], which contains bindings for the library, though I haven't tried it.
If you just need to call the CLI, you can use [https://github.com/kkroening/ffmpeg-python ffmpeg-python] to help build calls.
 
==JavaScript API==
To use FFmpeg in a browser, see [https://ffmpegwasm.netlify.app/ ffmpegwasm]. 
This is used in https://davidl.me/apps/media/index.html.
 
==My Preferences==
My preferences for encoding video:
 
===AV1===
Prefer AV1 for encoding video on modern devices.
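A sketch using SVT-AV1, assuming your FFmpeg build includes <code>libsvtav1</code> (CRF and preset are tunable):
<syntaxhighlight lang="bash">
# Lower -crf means higher quality; higher -preset means faster encoding
ffmpeg -i input.mp4 -c:v libsvtav1 -crf 30 -preset 6 -c:a libopus -b:a 128k output.mkv
</syntaxhighlight>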
 
 
===H265/HEVC===
H265/HEVC is now a good tradeoff between size, quality, and compatibility.
It has been supported on devices since Android 5.0 (2014).
<syntaxhighlight lang="bash">
ffmpeg -i $1 -c:v libx265 -crf 23 -preset slow -pix_fmt yuv444p10le -c:a libopus -b:a 128K $2
</syntaxhighlight>
 
;Notes
* The pixel format <code>yuv444p10le</code> is 10 bit color without chroma subsampling. If your source is lower, you can use <code>yuv420p</code> instead for 8-bit color and 4:2:0 chroma subsampling.
 
===H264===
If you need compatibility with very old and low-end devices:
<syntaxhighlight lang="bash">
ffmpeg -i $1 -c:v libx264 -crf 28 -preset medium -pix_fmt yuv420p -c:a libfdk_aac -b:a 128K $2
</syntaxhighlight>
 
===Opus===
 
For streaming:
<syntaxhighlight lang="bash">
ffmpeg -i input.wav -c:a libopus -b:a 96k output.opus
</syntaxhighlight>
 
See https://wiki.xiph.org/Opus_Recommended_Settings