A categorized collection of FFmpeg commands for video automation pipelines
The original (and maintained) cheatsheet can be found in our GitHub repo: https://github.com/rendi-api/ffmpeg-cheatsheet. Use this as inspiration for your own work, to troubleshoot your FFmpeg commands, or to explore what others are building in automated media apps.
“I know it burns a tree every time you ask gpt a question, but it beats slogging through 10 year old answers on stackexchange”
-vf (also -filter:v) Video filter
-af (also -filter:a) Audio filter
-filter_complex Complex filter graph - used for general filtering, controlling both audio and video across all streams
[0] Select all streams from the first input (0-based index)
[0:v] Select the video stream from the first input
[1:a] Select the audio stream from the second input
0:v:0 From the first input, the first video stream (0-based index)
0:a:1 From the first input, the second audio stream (0-based index)
[name] Select a named stream, usually used with -filter_complex
if, lte, gte and more - expression functions that can be used inside filter parameters
-y
Automatically overwrite existing output files. Add this flag to the beginning of every FFmpeg command to avoid it asking for confirmation before overwriting
-c copy
- Read below
MKV and MP4: Both are video containers and can store H264- and H265-encoded video and AAC- and MP3-encoded audio. The video quality itself is not determined by the container format but by the video codec used to compress the video data. MKV can contain several streams of video, while MP4 is more widely supported on different platforms and devices. Remux MP4 to MOV:
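A minimal sketch of such a remux (file names are placeholders):

```
# Rewrap the streams into a MOV container without re-encoding
ffmpeg -y -i input.mp4 -c copy output.mov
```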
scale=w=1080:h=-1 Sets the width to 1080 and lets FFmpeg pick the correct height, keeping the original aspect ratio while making sure the maximum width is 1080.
Specifying -2, as in scale=w=1080:h=-2, forces FFmpeg to use dimension sizes that are divisible by 2.
Notice that we can't use scale=w=-1:1920 here because it would make FFmpeg pick a width larger than 1080, conflicting with the output width we are looking for (1080) - resulting in an error.
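For example, a sketch of downscaling to a 1080-wide output (file names are placeholders):

```
# Scale to width 1080, let FFmpeg pick an even height that preserves the aspect ratio
ffmpeg -y -i input.mp4 -vf "scale=w=1080:h=-2" -c:a copy output.mp4
```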
force_original_aspect_ratio:
Achievable with "force_original_aspect_ratio", which has 3 possible values:
|0| "disable" (default)
|1| "decrease": auto-decrease output dimensions when needed.
|2| "increase": auto-increase output dimensions when needed.
pad=1080:1920:(ow-iw)/2:(oh-ih)/2:color=black Centers the resized video and fills the rest with black padding. The values are width:height:x:y, where x:y is the top-left corner. Negative values also place the image at the center, so you can use pad=1080:1920:-1:-1:color=black for a similar effect.
setsar=1:1 Sets the sample aspect ratio - ensures the output pixels are exactly square (1x1). It could also be set to 1 or 1/1 - these are all the same. In some cases, FFmpeg may set the sample aspect ratio to compensate for an aspect ratio change; explicitly stating SAR 1:1 makes things work as intended.
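Putting it together, a sketch of fitting a landscape video into a 1080x1920 vertical frame with black padding (file names are placeholders):

```
# Scale down to fit inside 1080x1920, pad the rest with black, force square pixels
ffmpeg -y -i input.mp4 \
  -vf "scale=1080:1920:force_original_aspect_ratio=decrease,pad=1080:1920:-1:-1:color=black,setsar=1" \
  -c:a copy output_vertical.mp4
```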
Create two scaled videos from the same input video using one FFmpeg command - one horizontal and another vertical. Add an overlay/logo to the top of the vertical video:
Note: The above command is unexpected in that it has -c:v copy - it trims at points that are not keyframes without re-encoding, so I would have expected to see black frames. But the output video looks perfect. Also, when trying to explicitly re-encode with -c:v libx264, the output video turned out to be 7 seconds long, longer than the shortest 5-second audio. Searching online, I couldn't find an explanation for either of these things.
-ar Sample rate 16KHz - the number of digital audio wave samples per second
-b:a 48k (which is the same as -ab) Bitrate 48KBit/s - the amount of data stored per second (Stackoverflow reference)
-ac 1 Audio channels - 1 (mono)
[1:a] means audio from file 1, in a 0-based index. [a1] marks the changed-volume audio so that we can mix it with the video's audio.
[0:a][a1]amix=inputs=2 Takes the audio from the first stream (the video) and the changed-volume audio and mixes them together
If you don’t want to change volumes, you can just use this filter instead: -filter_complex "[0:a][1:a]amix=inputs=2:duration=shortest"
:duration=shortest
makes the new audio as short as the shortest audio, the next -shortest
flag is still required because it controls the length of the final output video (and not just its audio)
Nice discussion about cases when video is shorter or longer than audio and you want to align the output video’s length accordingly
An open bug around this topic. Using duration=shortest together with -shortest avoids the implications of the bug.
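A sketch of the full mixing command, assuming a video file and a background music file (names and the 0.3 volume factor are placeholders):

```
# Lower the music volume, mix it with the video's own audio, and stop at the shorter stream
ffmpeg -y -i video.mp4 -i music.mp3 \
  -filter_complex "[1:a]volume=0.3[a1];[0:a][a1]amix=inputs=2:duration=shortest[aout]" \
  -map 0:v -map "[aout]" -c:v copy -shortest output.mp4
```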
[0:a] gets a 3-second (d=3) fade-out (t=out) starting from its 2nd second (st=2)
[1:a] gets a 3-second (d=3) fade-in (t=in) at the start of the audio (st=0)
The concatenation outputs audio only, no video (v=0).
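A sketch of concatenating two audio files with these fades (file names are placeholders):

```
# Fade out the first clip, fade in the second, then concatenate audio-only
ffmpeg -y -i first.mp3 -i second.mp3 \
  -filter_complex "[0:a]afade=t=out:st=2:d=3[a0];[1:a]afade=t=in:st=0:d=3[a1];[a0][a1]concat=n=2:v=0:a=1[a]" \
  -map "[a]" output.mp3
```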
-q:a 2 - High quality audio output with an average stereo bitrate of 170-210 KBit/s
pan=mono|c0=.5*c0+.5*c1 The output channel (c0) is made by blending 50% of the left input (c0) and 50% of the right input (c1).
dynaudnorm
Applies dynamic audio normalization (smoothens loud/quiet parts)
FFmpeg docs about panning and stereo to mono
atempo=1.5
speeds up audio playback rate while preserving pitch
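A sketch combining these options to extract a normalized, sped-up mono MP3 from a video (file names are placeholders):

```
# Drop the video, downmix stereo to mono, normalize loudness, speed up 1.5x, output high-quality MP3
ffmpeg -y -i input.mp4 -vn \
  -af "pan=mono|c0=.5*c0+.5*c1,dynaudnorm,atempo=1.5" \
  -q:a 2 output.mp3
```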
N The count of consumed frames/audio samples, not including the current frame for audio, starting from 0
FRAME_RATE / SR Video frame rate and audio sample rate
TB The timebase of the input timestamps
[1] [2] [3] trim=0.0:4.5 Each trimmed chunk is a temporary new video stream starting at the start time and ending at the end time. [3] trim=8.5 does not specify an end time, so it will run until the end of the video.
Resetting timestamps with setpts=PTS-STARTPTS is required when using trim and concat, to make sure that concat works correctly over the seemingly separate (trimmed) video streams.
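A sketch of cutting two chunks and concatenating them (times follow the examples above; the file name is a placeholder):

```
# Keep 0.0-4.5s and 8.5s-end, reset timestamps, then concatenate the two chunks
ffmpeg -y -i input.mp4 -filter_complex \
  "[0:v]trim=0.0:4.5,setpts=PTS-STARTPTS[v1];[0:v]trim=8.5,setpts=PTS-STARTPTS[v2];[v1][v2]concat=n=2:v=1:a=0[v]" \
  -map "[v]" -an output.mp4
```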
crop=min(in_w-300,480):min(in_h-0,720):300:0 The values are width:height:x:y, where x,y is the top-left corner. The min dimensions ensure FFmpeg won't crop outside the designated size of the output frame, before scaling. The minimum calculations are not required in this scenario; they are there as placeholders in case you require different dimensions or x,y positioning.
If cropping is outside the boundaries of the frame - the crop will distort the video. In order to handle this, we can use black padding to fill in the gaps:
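A sketch of cropping and padding back to a fixed frame size (values mirror the example above; the file name is a placeholder, and the commas inside min() are escaped for the filtergraph parser):

```
# Crop a 480x720 region starting at x=300,y=0, clamped to the frame, then pad back to 480x720 with black
ffmpeg -y -i input.mp4 \
  -vf "crop=min(in_w-300\,480):min(in_h-0\,720):300:0,pad=480:720:-1:-1:color=black" \
  -c:a copy output.mp4
```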
Specify a custom font with fontfile=<path_to_file>, for example: drawtext=text='Get ready':x=50:y=100:fontsize=80:fontcolor=black:fontfile=arial.ttf
drawtext Explanation of the "Get ready" overlay: drawtext=text='Get ready':x=50:y=100:fontsize=80:fontcolor=black:alpha='if(gte(t,1)*lte(t,3),(t-1)/2,1)':box=1:boxcolor=#6bb666@0.6:boxborderw=7:enable='gte(t,1)'
enable='gte(t,1)' Display from t = 1s; * is the AND operator
alpha='if(gte(t,1)*lte(t,3),(t-1)/2,1)' Alpha fades in between t=1 and t=3; at all other times it equals 1 (fully opaque)
box=1 draws a background behind the text, with 7px padding (boxborderw=7)
x=50:y=100 Top-left position of the box
textfile= and fontfile= require local files, therefore you need to download the files manually from https://storage.rendi.dev/sample/sample_text.txt and https://storage.rendi.dev/sample/Poppins-Regular.ttf
It is recommended to use textfile instead of specifying the text within the FFmpeg command itself, to avoid issues with special characters that could interfere with the command line syntax.
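A hedged sketch of such a drawtext command, assuming the downloaded text and font files sit in the working directory:

```
# Draw the text from sample_text.txt with the Poppins font, green box, fading in between t=1 and t=3
ffmpeg -y -i input.mp4 \
  -vf "drawtext=textfile=sample_text.txt:fontfile=Poppins-Regular.ttf:x=50:y=100:fontsize=80:fontcolor=black:alpha='if(gte(t,1)*lte(t,3),(t-1)/2,1)':box=1:boxcolor=#6bb666@0.6:boxborderw=7:enable='gte(t,1)'" \
  -c:a copy output.mp4
```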
Burn subtitles using a custom font, Poppins, with custom subtitles style. Notice to use the FontName (and not the file name) - you can find it when you open the font file. Also, specify the fontsdir which holds the font file.
You can download the Poppins font file from https://storage.rendi.dev/sample/Poppins-Regular.ttf
Colors are either &HBBGGRR - blue, green, red - or &HAABBGGRR if you want to add an alpha channel (transparency), with FF being 100% transparent and 00 being no transparency.
PrimaryColour is the font color.
🛠️ OutlineColour=&H4066B66B,Outline=1,BorderStyle=3 Configures the green background (alpha 40/FF in hex is about 25% transparency) and the #6bb666 color in RGB. In order to make the background appear you have to set Outline=1,BorderStyle=3.
Stylizing the background is a bit tricky, this reddit thread has useful info.
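A hedged sketch of burning styled subtitles, assuming subs.srt and the Poppins font file sit in the working directory:

```
# Burn subtitles with the Poppins font and a green boxed background
ffmpeg -y -i input.mp4 \
  -vf "subtitles=subs.srt:fontsdir=.:force_style='FontName=Poppins,OutlineColour=&H4066B66B,Outline=1,BorderStyle=3'" \
  -c:a copy output.mp4
```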
Official FFmpeg documentation: How To Burn Subtitles Into Video ; Subtitles filter
If you want to really customize your subtitles' appearance, the best option is using the ASS subtitles format. A good source of info which I use constantly.
For pixel-perfect subtitle burning with special effects and unique appearances, it is best to create opaque images outside of any subtitle format and burn images on the video with FFmpeg.
Add a default subtitles srt track to the video and store it in an MKV container, without re-encoding the video, the codec remains H264:
-c:s srt Subtitle codec is srt
-disposition:s:0 default Marks the first subtitles track as the default
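A minimal sketch of such a command (file names are placeholders):

```
# Mux an srt track into an MKV without re-encoding video or audio, and mark it as the default subtitles
ffmpeg -y -i input.mp4 -i subs.srt -map 0 -map 1 -c copy -c:s srt -disposition:s:0 default output.mkv
```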
Extract the subtitles from the mkv file:
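A sketch of the extraction (the stream index and file names are assumptions):

```
# Pull the first subtitle stream out of the MKV into a standalone .srt file
ffmpeg -y -i input.mkv -map 0:s:0 subs_extracted.srt
```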
x=(main_w-overlay_w)/8:y=(main_h-overlay_h)/8 Positions the overlay's top-left corner at 1/8th of the remaining space from the left and from the top
main_w/main_h are the width and height of the main video
overlay_w/overlay_h are the width and height of the overlay image
enable='gte(t,1)*lte(t,7)' Controls when the overlay is visible - greater than or equal to 1 second and less than or equal to 7 seconds; * is the AND operator
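A sketch of such a logo overlay (file names are placeholders):

```
# Overlay a logo at 1/8th of the free space from the top-left, visible between t=1s and t=7s
ffmpeg -y -i input.mp4 -i logo.png \
  -filter_complex "[0:v][1:v]overlay=x=(main_w-overlay_w)/8:y=(main_h-overlay_h)/8:enable='gte(t,1)*lte(t,7)'" \
  -c:a copy output.mp4
```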
🛠 If you want FFmpeg to control the overlay’s transparency you can use this command:
[1:v]format=argb,geq=r='r(X,Y)':a='0.5*alpha(X,Y)'[v1] Creates the transparent logo:
[1:v] Selects the video stream from the second input (the logo)
format=argb Converts the image to ARGB format, so it also works with overlay images that don't have an alpha channel
a='0.5*alpha(X,Y)' makes the logo 50% transparent by multiplying the alpha channel by 0.5
[v1] marks this processed logo as a new video stream
[1:v][0:v] First puts the image (background) and on top puts the video.
(W-w)/2:(H-h)/2 centers the video horizontally and vertically on the background image by picking the video's top-left corner accordingly. W/H are the background's width and height, w/h are the video's width and height; the capital letters belong to the first specified stream [1:v] and the lower case to the second specified stream [0:v]. Notice that the order is based on [1:v][0:v] and not on the order of the input files.
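A hedged sketch of centering a video on a background image (file names are placeholders; -loop 1 keeps the still image going and shortest=1 ends the overlay when the video ends):

```
# Loop the background image, center the video on top of it, stop when the video ends
ffmpeg -y -i video.mp4 -loop 1 -i background.png \
  -filter_complex "[1:v][0:v]overlay=(W-w)/2:(H-h)/2:shortest=1" \
  -c:a copy output.mp4
```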
Combine intro, main and outro into one video and mix with background music:
duration=first The output audio stream's duration should match the first input stream (the combined audio); dropout_transition=2 creates a fade-out effect for the shorter audio so that it won't cut off abruptly
aformat=sample_fmts=fltp Converts the audio format to 32-bit float planar (a commonly used format in FFmpeg) - I couldn't find good simple sources for it online
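A hedged sketch of the full pipeline, assuming three videos that share resolution, frame rate and audio parameters, plus one music track (all names are placeholders):

```
# Concatenate intro+main+outro, then mix the combined audio with background music
ffmpeg -y -i intro.mp4 -i main.mp4 -i outro.mp4 -i music.mp3 \
  -filter_complex "[0:v][0:a][1:v][1:a][2:v][2:a]concat=n=3:v=1:a=1[v][a];[a][3:a]amix=inputs=2:duration=first:dropout_transition=2,aformat=sample_fmts=fltp[aout]" \
  -map "[v]" -map "[aout]" output.mp4
```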
🛠️ Stack two videos vertically and keep the audio of the second video:
-shortest Between the stacked output video and the audio from the second input video, pick the shortest.
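A sketch, assuming both videos have the same width (file names are placeholders):

```
# Stack the two videos vertically and keep only the second video's audio
ffmpeg -y -i top.mp4 -i bottom.mp4 \
  -filter_complex "[0:v][1:v]vstack=inputs=2[v]" \
  -map "[v]" -map 1:a -shortest output.mp4
```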
-loop 1
Infinitely loop over the input image. -t 10 limits the looped input to 10 seconds, so even though we loop the image indefinitely, we stop after 10 seconds.
Excellent stackoverflow reference about loops - must read
fade=t=in:st=0:d=1 1-second (d=1) fade-in (t=in) at the start of the video (st=0)
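A sketch of turning a still image into a 10-second clip with a fade-in (the file name is a placeholder):

```
# Loop the image for 10 seconds, fade in over the first second, encode as H264
ffmpeg -y -loop 1 -t 10 -i image.png \
  -vf "fade=t=in:st=0:d=1,format=yuv420p" \
  -c:v libx264 output.mp4
```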
🛠️ Create a slideshow video of 5 seconds per input image and background audio, fading between images:
[0:v]...[v0];[1:v]...[v1];[v0][v1]...[v] The first input video stream [0:v] is filtered with a fade-in and its result is marked as v0, then the second input video stream is filtered and its result is marked as v1, then the two are joined with xfade and the output video result is marked as v
xfade=transition=fade:duration=0.5:offset=4.5 Starts the fade-out transition of the first image at its 4.5th second, lasting 0.5 seconds, while blending in the second image during the transition.
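A hedged sketch with two images and background audio (file names are placeholders; the images are assumed to share the same resolution):

```
# 5 seconds per image, 0.5s crossfade starting at 4.5s, audio cut to the video with -shortest
ffmpeg -y -loop 1 -t 5 -i img1.png -loop 1 -t 5 -i img2.png -i music.mp3 \
  -filter_complex "[0:v]format=yuv420p,fade=t=in:st=0:d=1[v0];[1:v]format=yuv420p,fade=t=out:st=4:d=1[v1];[v0][v1]xfade=transition=fade:duration=0.5:offset=4.5[v]" \
  -map "[v]" -map 2:a -shortest -c:v libx264 -pix_fmt yuv420p output.mp4
```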
🛠️ Create a Ken Burns style video from images:
z='zoom+0.005'
Every new frame generated adds 0.005 zoom to the previous frame, or, zooms in the previous frame by 1.005
x='iw/2-(iw/zoom/2)':y='ih/2-(ih/zoom/2)'
Pan to the center of the frame
d=100:s=1920x1080:fps=25 specifies that the effect will generate 100 frames (d), at output resolution s=1920x1080 and 25 frames per second (fps), which is a 4-second effect (100 frames divided by 25 fps)
scale=8000:-1 Used to first upscale the frame and then zoom on it; this avoids a jitteriness bug which occurs with the zoompan filter, at the cost of more compute time for upscaling. -1 means to adjust the height so that the aspect ratio is preserved relative to the 8000px width. Good reads: https://superuser.com/a/1112680/431710 , https://superuser.com/questions/1112617/ffmpeg-smooth-zoompan-with-no-jiggle
zoompan=z='if(lte(zoom,1.0),1.5,max(zoom-0.005,1.005))' This part zooms out of a zoomed-in starting frame. If the zoom factor is less than or equal to 1.0, it is set to 1.5 - this corresponds to the starting frame. Then the command zooms out by 0.005 on each frame until it reaches a zoom factor of 1.005, giving the zoom-out effect, and then stops changing the zoom - preventing it from resetting the zoom-out effect.
trim=duration=4 It was not possible to specify -t 4 before the input file and keep this image chunk at 4 seconds (as above, when creating a video from a looping input image). When trying to do that, the first chunk has the correct length because xfade limits it to 4 seconds, but the second chunk gets repeated so that the total output video matches the audio's length. I tried different ways of solving it, but nothing helped. This is probably due to the zoompan filter, which basically defeats the purpose of -t by specifying the fps and the number of frames without a hard maximum cap.
The only thing that worked is to specify the trim duration after the zoompan.
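A hedged sketch of the zoom-in variant on a single image (file name, duration and resolution are assumptions):

```
# Upscale to avoid zoompan jitter, zoom in towards the center, 100 frames at 25fps = 4 seconds
ffmpeg -y -i image.jpg \
  -vf "scale=8000:-1,zoompan=z='zoom+0.005':x='iw/2-(iw/zoom/2)':y='ih/2-(ih/zoom/2)':d=100:s=1920x1080:fps=25,trim=duration=4" \
  -c:v libx264 -pix_fmt yuv420p output.mp4
```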
Ken Burns Effect
Blog post about ken burns and FFmpeg:
Select the first frame of every 2-second interval (lte(n\,1)+gt(trunc(t/2),trunc(prev_t/2))) and accelerate the playback speed by 10x (setpts='PTS*0.1')
-loop 0 is the default and can actually be omitted; it means the output loops indefinitely. To loop only once, use -loop 1
Good reference
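A hedged sketch, assuming the goal is an animated GIF preview (file names are placeholders):

```
# Grab roughly one frame every 2 seconds, speed playback up 10x, loop the GIF forever
ffmpeg -y -i input.mp4 \
  -vf "select='lte(n\,1)+gt(trunc(t/2),trunc(prev_t/2))',setpts='PTS*0.1'" \
  -loop 0 output.gif
```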
trim=start=X:end=Y Cuts video to the specified time range; atrim is the corresponding filter for audio
setpts=PTS-STARTPTS
Resets timestamps to start from 0
fade=t=in:st=0:d=0.5...fade=t=out:st=3.5:d=0.5
See above in creating a slideshow
afade
See above in audio processing
concat=n=2:v=1:a=1
Combines two segments with both video and audio
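A hedged sketch of cutting two segments with fades and joining them (times and file names are placeholders; the input is assumed to have audio):

```
# Cut 0-4s and 10-14s, reset timestamps, fade each segment in/out, then concatenate video+audio
ffmpeg -y -i input.mp4 -filter_complex \
  "[0:v]trim=start=0:end=4,setpts=PTS-STARTPTS,fade=t=in:st=0:d=0.5,fade=t=out:st=3.5:d=0.5[v0];[0:a]atrim=start=0:end=4,asetpts=PTS-STARTPTS,afade=t=in:st=0:d=0.5,afade=t=out:st=3.5:d=0.5[a0];[0:v]trim=start=10:end=14,setpts=PTS-STARTPTS,fade=t=in:st=0:d=0.5,fade=t=out:st=3.5:d=0.5[v1];[0:a]atrim=start=10:end=14,asetpts=PTS-STARTPTS,afade=t=in:st=0:d=0.5,afade=t=out:st=3.5:d=0.5[a1];[v0][a0][v1][a1]concat=n=2:v=1:a=1[v][a]" \
  -map "[v]" -map "[a]" output.mp4
```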
-q:v Output image quality - for JPEG output, lower values mean higher quality
-frames:v 1
Output only 1 video frame
Create a thumbnail from the first frame of a scene change:
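A hedged sketch of such a command (the 0.4 scene-change threshold and file names are assumptions):

```
# Grab the first frame whose scene-change score exceeds 0.4 and save it as a high-quality JPEG
ffmpeg -y -i input.mp4 -vf "select='gt(scene,0.4)'" -frames:v 1 -q:v 2 thumbnail.jpg
```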
tile=2x2 Creates a 2x2 storyboard from the scenes in a video. Example from FFmpeg's documentation
vsync is fine to use but is deprecated in newer versions of FFmpeg; -fps_mode is its replacement. Reference: FFmpeg docs
-skip_frame nokey As the name suggests - skips frames that are not keyframes; the remaining keyframes are laid out in a grid by the tile=4x2 part.
-preset ultrafast can be added to speed up encoding.
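A hedged sketch of building a keyframe storyboard (file names and the scaling are assumptions):

```
# Decode only keyframes, shrink them, and lay out the first 8 in a 4x2 grid image
ffmpeg -y -skip_frame nokey -i input.mp4 \
  -vf "scale=320:-1,tile=4x2" -frames:v 1 -q:v 2 storyboard.jpg
```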
-threads 0
Specifies how many system threads to use. 0 is the default and lets FFmpeg decide, which is usually optimal, so it is often best to omit this parameter. But sometimes you may want to tweak it, depending on your system and command
-c:v Specifies the video encoder and -c:a specifies the audio encoder.
-c:a aac
AAC encoded audio. This is also the default for FFmpeg, and a good practice to specify.
-c:a libmp3lame The encoding library for MP3
-an
Disable audio in the output
Apple devices sometimes have issues with FFmpeg-generated H265 videos (for example in iOS AirDrop); use -vtag hvc1 to solve it. Thanks! Also related
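A sketch of an H265 encode tagged for Apple compatibility (file names are placeholders):

```
# Encode H265 with the hvc1 tag so Apple players recognize it
ffmpeg -y -i input.mp4 -c:v libx265 -vtag hvc1 -c:a aac output.mp4
```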
You may need to use -vf format=yuv420p (or the alias -pix_fmt yuv420p) for your output to work in QuickTime and most other players. These players only support the YUV planar color space with 4:2:0 chroma subsampling for H.264 video. Otherwise, depending on your source, ffmpeg may output to a pixel format that is incompatible with these players. Good info about yuv420p in this reddit thread
-crf
Constant Rate Factor (CRF) - It is the default bitrate control option for libx264 and libx265:
Use this rate control mode if you want to keep the best quality and care less about the file size. This is the recommended rate control mode for most uses. This method allows the encoder to attempt to achieve a certain output quality for the whole file when output file size is of less importance. This provides maximum compression efficiency with a single pass. By adjusting the so-called quantizer for each frame, it gets the bitrate it needs to keep the requested quality level. The downside is that you can't tell it to get a specific filesize or not go over a specific size or bitrate, which means that this method is not recommended for encoding videos for streaming. The range of the CRF scale is 0–51, where 0 is lossless (for 8 bit only, for 10 bit use -qp 0), 23 is the default, and 51 is worst quality possible. A lower value generally leads to higher quality, and a subjectively sane range is 17–28. Consider 17 or 18 to be visually lossless or nearly so; it should look the same or nearly the same as the input but it isn't technically lossless. The range is exponential, so increasing the CRF value +6 results in roughly half the bitrate / file size, while -6 leads to roughly twice the bitrate.
Common advice is to use -crf 18 for very high quality H264 output; I found that using -crf 10 results in even better quality video.
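A sketch of a typical high-quality H264 encode (file names are placeholders):

```
# Re-encode with libx264 at CRF 18, keeping the audio as-is
ffmpeg -y -i input.mp4 -c:v libx264 -crf 18 -pix_fmt yuv420p -c:a copy output.mp4
```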
Use -movflags +faststart to make videos start playing faster online, optimizing for web viewing, by moving metadata to the front of the container:
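A minimal sketch (file names are placeholders):

```
# Remux with the moov atom moved to the front of the file for fast web playback
ffmpeg -y -i input.mp4 -c copy -movflags +faststart output_faststart.mp4
```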
To verify that big_buck_bunny_720p_16sec_h265_faststart.mp4 is indeed encoded with moov faststart, run:
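One way to check (a hedged sketch - the trace log level prints the container atoms as they are parsed, so the moov atom should appear before mdat):

```
# If faststart worked, a moov line like the one below appears before the mdat atom
ffmpeg -v trace -i big_buck_bunny_720p_16sec_h265_faststart.mp4 2>&1 | grep -e "type:'moov'" -e "type:'mdat'"
```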
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x...] type:'moov' size:... pos:...
libvpx-vp9 can save about 20–50% bitrate compared to libx264 (the default H264 encoder), while retaining the same visual quality.
Constant Quality -crf encoding in libvpx-vp9 - similar to constant rate factor in libx264:
To trigger this mode, you must use a combination of -crf and -b:v 0. Note that -b:v MUST be 0. Setting it to anything higher or omitting it entirely will instead invoke the Constrained Quality mode.
The CRF value can be from 0–63. Lower values mean better quality. Recommended values range from 15–35, with 31 being recommended for 1080p HD video.
-c:a libopus The default audio encoder for WebM is libopus; the above command re-encodes the AAC audio from the MP4 to Opus in the WebM.
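A sketch of a constant-quality VP9 encode for 1080p (file names are placeholders):

```
# Constant Quality mode: -crf together with -b:v 0; Opus audio for the WebM container
ffmpeg -y -i input.mp4 -c:v libvpx-vp9 -crf 31 -b:v 0 -c:a libopus output.webm
```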
CPU, speed and multithread controls for vp9
To summarize, here's what you should do [which bitrate encoding configuration to use], depending on your use case:
- Archival — CRF that gives you the quality you want.
- Streaming — Two-pass CRF or ABR with VBV-constrained bitrate.
- Live Streaming — One-pass CRF or ABR with VBV-constrained bitrate, or CBR if you can waste bits.
- Encoding for Devices — Two-pass ABR, typically.
slhck probably meant two-pass CRF in VP9 for streaming - the first pass lets libvpx-vp9 gather the statistics it needs so the second pass can compress harder, reducing file size while keeping quality. This method is more optimized for web-hosted videos. Good references:
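A hedged sketch of a two-pass VP9 encode aimed at streaming (the bitrate and file names are assumptions; on Windows, replace /dev/null with NUL):

```
# Pass 1 gathers statistics (no audio, output discarded); pass 2 does the real encode
ffmpeg -y -i input.mp4 -c:v libvpx-vp9 -b:v 2M -pass 1 -an -f null /dev/null
ffmpeg -y -i input.mp4 -c:v libvpx-vp9 -b:v 2M -pass 2 -c:a libopus output.webm
```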
Use -c copy whenever possible; it re-muxes the video and audio instead of re-encoding, which is compute intensive (especially video re-encoding). -c:v copy specifically copies video without re-encoding and -c:a copy does the same for audio (and is the same as -acodec copy).
Remuxing involves rewrapping streams into a new container without altering them, unlike transcoding, which changes compression and quality. For example, MP4 can be remuxed to MKV and MOV because all of these containers can hold H264 streams.
When not to use -c copy?
-ss (before input): Input seeking - FFmpeg jumps to the nearest seek point (keyframe) before the requested position, so it is fast but not necessarily frame accurate.
-ss (after input): Output seeking - FFmpeg "decodes but discards input until the timestamps reach position". It is frame accurate, but can take a longer time to process because it needs to decode.
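A sketch contrasting the two (timestamps and file names are placeholders):

```
# Input seeking: fast, snaps to the nearest seek point before 00:01:00
ffmpeg -y -ss 00:01:00 -i input.mp4 -t 10 -c copy cut_fast.mp4
# Output seeking: frame accurate, decodes and discards everything up to 00:01:00
ffmpeg -y -i input.mp4 -ss 00:01:00 -t 10 -c:v libx264 -c:a aac cut_accurate.mp4
```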
🛠️ When trimming, it is advised to use output seeking and re-encode the output video rather than relying on -c:v copy.
The reasons being:
Trimming with -c:v copy has known issues - see the Stackoverflow discussion and the FFmpeg repo bug report.
Therefore, it is advised to trim with output seeking (with or without -c:v copy).
When trimming with -c:v copy you can see black frames in the output video. This is due to -c:v copy copying frames that come after a keyframe without the keyframe itself, which misses out on the data required to decode those frames. Read more in FFmpeg's trac documentation
When used as an input option, seeks in this input file to position. Note that in most formats it is not possible to seek exactly, so ffmpeg will seek to the closest seek point before position. When transcoding and -accurate_seek is enabled (the default), this extra segment between the seek point and position will be decoded and discarded. When doing stream copy or when -noaccurate_seek is used, it will be preserved. Nice answer about this issue on Stackoverflow. The re-encoded output video could have a different bitrate, so you might need to adjust the output bitrate accordingly (see below).