FFmpeg Cheatsheet
A categorized collection of FFmpeg commands for video automation pipelines
The original (and maintained) cheatsheet can be found in our github repo: https://github.com/rendi-api/ffmpeg-cheatsheet
Use this as inspiration for your own work, to troubleshoot your FFmpeg commands, or to explore what others are building in automated media apps.
How to use this cheatsheet?
- Use the table of contents to browse through the main topics covered.
- Use Ctrl + F to find what you need. All commands, filters and flags are explained throughout the document. If you don’t see a specific explanation, it means that it appears somewhere else in this document and you can search for it (and if you don’t find it, please open an issue and I will take care of it).
- All sample commands can be run immediately from your local machine, since they use sample files that are stored online and FFmpeg is able to download them locally.
- I have attached the original reference to each command, filter, flag, keyword, and explanation. Use these to dig deeper into FFmpeg and, in general, video formats. Some of the findings are my own, in which case no reference is specified.
- To make the most of this cheatsheet, it is best to use it along with your favorite LLM (or MCP server or AI agent). A few ways of doing that:
  - Copy the full text into your LLM and let it index all the information found here. Make sure to copy the RAW version of the MD file for best results.
  - You can also just refer the LLM to the URL of this file and have it index it.
  - If you’re interested in a specific command, copy it into the LLM and chat with it to alter the command according to your specific needs.
  - You should find all the explanations you require for all the commands within this document and the references it provides. Still, you can always copy a command into the chat interface and have the LLM elaborate.
“I know it burns a tree every time you ask gpt a question, but it beats slogging through 10 year old answers on stackexchange”
About LLMs and FFmpeg
I used LLMs as much as I could to make the work on this file as easy as possible. Still, all commands and explanations have been tested and vetted by me manually. Many of them I have used in the pre-GPT era - hinting at how old I’m getting (🙈)
LLMs fall short with FFmpeg because it sometimes requires accuracy and attention to fine details that are hard to find online, especially when working with complex filters. I like to use them as a sophisticated search and summarization engine - pointing out specific details and keywords that I then validate online.
🛠️ - Headlines marked with 🛠️ are ones for which it was especially hard to find correct solutions or explanations with LLMs, or that are too important to trust LLMs with, so I did manual research and trial and error.
Glossary of common flags\filters
For those looking to optimize their existing FFmpeg commands, skip to the section starting at Command settings
-vf (also -filter:v) Video filter
-af (also -filter:a) Audio filter
-filter_complex Complex filter graph - used for general filtering, controlling both audio and video across all streams
Common filter keywords (you can change the numbers to specify the required index):
[0] Select all streams from the first input (0-based index)
[0:v] Select the video stream from the first input
[1:a] Select the audio stream from the second input
0:v:0 From the first input, the first video stream (0-based index)
0:a:1 From the first input, the second audio stream (0-based index)
[name] Select a named stream, usually used with -filter_complex
-map [name] Select a named stream for the output
Expression evaluations: if, lte, gte and more
-y Automatically overwrite existing output files. Add this flag to the beginning of every FFmpeg command to avoid FFmpeg asking for confirmation before overwriting
Simple editing
Converting formats
Remux MP4 to MKV:
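A minimal sketch (file names are placeholders for the hosted samples):
```
ffmpeg -i input.mp4 -c copy -y output.mkv
```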
-c copy - Read below
MKV and MP4: Both are video containers and can store H264 and H265 encoded videos and AAC and MP3 encoded audio. The video quality itself is not determined by the container format but rather by the video codec used to compress the video data.
MKV can contain several streams of video, while MP4 is more widely supported on different platforms and devices.
Remux MP4 to MOV:
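For example (placeholder file names):
```
ffmpeg -i input.mp4 -c copy -y output.mov
```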
Encode MP4 to AVI:
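A sketch of such an encode (placeholder names; mpeg4 video and MP3 audio are common choices for AVI, assumed here):
```
ffmpeg -i input.mp4 -c:v mpeg4 -c:a libmp3lame -y output.avi
```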
More about video encoding below
Resizing and padding
🛠️ Upscale the video to 1080X1920 preserving the original aspect ratio and adding black padding to fill in gaps as needed:
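A sketch of the full filter chain described below (input/output names are placeholders):
```
ffmpeg -i input.mp4 -vf "scale=w=1080:h=1920:force_original_aspect_ratio=decrease,pad=1080:1920:(ow-iw)/2:(oh-ih)/2:color=black,setsar=1:1" -c:a copy -y output_padded.mp4
```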
scale=w=1080:h=1920:force_original_aspect_ratio=decrease resize video to fit inside 1080x1920, will automatically lower output dimensions to be equal or below the specified width and height, while fitting the original aspect ratio of the input. In this case, will down-scale the input to 1080X810, before adding padding.
If you are unsure about the height (or width) required to keep the original aspect ratio, you can specify scale=w=1080:h=-1 and let FFmpeg pick the correct height, while keeping the original aspect ratio and the maximum width of 1080.
Specifying -2, as in scale=w=1080:h=-2, forces FFmpeg to use dimension sizes that are divisible by 2.
Notice that we can’t use scale=w=-1:h=1920 here because it will make FFmpeg pick a width which is larger than 1080, conflicting with the output width we are looking for (1080) - resulting in an error.
Achievable with “force_original_aspect_ratio”, which has 3 possible values:
- “disable” (default)
- “decrease”: auto-decrease output dimensions as needed.
- “increase”: auto-increase output dimensions as needed.
pad=1080:1920:(ow-iw)/2:(oh-ih)/2:color=black Center the resized video and fill the rest with black padding. Values are width:height:x:y
where x:y is the top left corner. Negative values also place the image at the center, so you can use pad=1080:1920:-1:-1:color=black
for a similar effect.
setsar=1:1 Sample aspect ratio - ensures the output pixels scale exactly to 1x1 per pixel. It could also be set to 1 or 1/1 - these are all the same. In some cases, FFmpeg may set the sample aspect ratio to compensate for a ratio change. Explicitly state SAR 1:1 to make things work as intended.
Create two scaled videos from the same input video using one FFmpeg command - one horizontal and another vertical. To the vertical video add an overlay\logo to the top:
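A sketch of one way to do this with a single command (file names, sizes and the logo position are placeholders):
```
ffmpeg -y -i input.mp4 -i logo.png \
  -filter_complex "[0:v]split=2[hor][ver];[hor]scale=1920:-2[horizontal];[ver]scale=w=1080:h=1920:force_original_aspect_ratio=decrease,pad=1080:1920:-1:-1[vpad];[vpad][1:v]overlay=(main_w-overlay_w)/2:50[vertical]" \
  -map "[horizontal]" -map 0:a -c:a copy horizontal.mp4 \
  -map "[vertical]" -map 0:a -c:a copy vertical.mp4
```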
Two stackoverflow sources of info I constantly use Link 1 ; Link 2
Trim by time
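An accurate (re-encoding) trim, as a sketch with placeholder names and times:
```
ffmpeg -i input.mp4 -ss 00:00:02 -to 00:00:07 -c:v libx264 -c:a aac -y trimmed.mp4
```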
There are faster ways to trim, but they are less accurate or can create black frames.
For the advanced explanation see input\output seeking below
Audio Processing
Replace audio in video
Replace the audio in the video with a new audio file
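A sketch (placeholder file names):
```
ffmpeg -i input.mp4 -i new_audio.mp3 -map 0:v -map 1:a -c:v copy -c:a aac -shortest -y output.mp4
```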
-shortest Trims the video’s end to be as short as the audio. If you want to keep the video length you can remove this flag (and the output will be muted after 5 seconds)
Note: The above command is unexpected in that it has c:v copy - it trims in places that are not keyframes without re-encoding, so I would have expected to see black frames. But the output video looks perfect. Also, when trying to explicitly re-encode with -c:v libx264, the output video turned out to be 7 seconds long - longer than the shortest 5-second audio. Searching online, I couldn’t find an explanation for either of these things.
Extract audio from video
Encode MP4 to MP3:
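For example (placeholder names):
```
ffmpeg -i input.mp4 -vn -c:a libmp3lame -q:a 2 -y output.mp3
```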
Extract the audio from an MP4 video, downsample it to 16,000 Hz, convert it to mono MP3, also extract the video (muted):
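A sketch with two outputs in one command (placeholder names):
```
ffmpeg -y -i input.mp4 \
  -map 0:a -ac 1 -ar 16000 -b:a 48k audio_16k_mono.mp3 \
  -map 0:v -c:v copy video_muted.mp4
```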
-ar Sample rate 16KHz - the amount of digital audio wave samples per second
-b:a 48k (which is the same as -ab) Bitrate 48KBit/s - the amount of data stored per second. Stackoverflow reference
-ac 1 Audio channels - 1 (mono)
Extract AAC audio from MP4 without encoding it:
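For example (placeholder names):
```
ffmpeg -i input.mp4 -vn -c:a copy -y output.aac
```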
Mix the audio in video
Mix the audio in the video with a new audio file and lower its volume:
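A sketch of such a command (placeholder names):
```
ffmpeg -i input.mp4 -i music.mp3 -filter_complex "[1:a]volume=0.2[a1];[0:a][a1]amix=inputs=2:duration=shortest[aout]" -map 0:v -map "[aout]" -c:v copy -shortest -y output.mp4
```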
[1:a]volume=0.2[a1] Lowers the volume of the audio file so we could also hear the audio from the video file. [1:a] means audio from file 1, in a 0-based index. [a1] marks the changed-volume audio so that we could mix it with the video’s audio.
[0:a][a1]amix=inputs=2 Takes the audio from the first stream (the video) and the changed-volume audio and mixes them together
If you don’t want to change volumes, you can just use this filter instead: -filter_complex "[0:a][1:a]amix=inputs=2:duration=shortest"
duration=shortest makes the new audio as short as the shortest audio; the -shortest flag is still required because it controls the length of the final output video (and not just its audio)
Nice discussion about cases when video is shorter or longer than audio and you want to align the output video’s length accordingly
An open bug around this topic. Using duration=shortest and -shortest avoids the implications of the bug.
Combine two mp3 tracks
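A sketch of the command explained below (placeholder track names):
```
ffmpeg -i track1.mp3 -i track2.mp3 -filter_complex "[0:a]afade=t=out:st=2:d=3[a0];[1:a]afade=t=in:st=0:d=3[a1];[a0][a1]concat=n=2:v=0:a=1" -q:a 2 -y combined.mp3
```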
[0:a]afade=t=out:st=2:d=3…[1:a]afade=t=in:st=0:d=3 Fade out the first and fade in the second:
- First input audio [0:a]: 3-second (d=3) fade out (t=out) starting from its 2nd second (st=2)
- Second input audio [1:a]: 3-second (d=3) fade in (t=in) at the start of the audio (st=0)
[a0][a1]concat=n=2:v=0:a=1 Concatenates the two faded audio streams back together to create 1 output audio stream, no video (v=0).
-q:a 2 High quality audio output with an average stereo bitrate of 170-210 KBit/s
Crossfade
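For example (placeholder names):
```
ffmpeg -i track1.mp3 -i track2.mp3 -filter_complex "acrossfade=d=3:c1=exp:c2=qsin" -y crossfaded.mp3
```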
acrossfade=d=3:c1=exp:c2=qsin 3-second audio crossfade where first track fades out quickly while second track fades in slowly
Change audio format
MP3 to WAV in pcm_s32le (signed 32-bit little-endian) format, mono and 48KHz sample frequency:
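For example (placeholder names):
```
ffmpeg -i input.mp3 -c:a pcm_s32le -ac 1 -ar 48000 -y output.wav
```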
Merge the audio from two mp4 files, mix them equally into mono, normalize the volume, downsample to 16 kHz, and encode as MP3 at 64 KBit/s:
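A sketch of one way to chain these filters (placeholder names; the exact filter order in the original command may differ):
```
ffmpeg -i first.mp4 -i second.mp4 -filter_complex "[0:a][1:a]amix=inputs=2,pan=mono|c0=.5*c0+.5*c1,dynaudnorm" -ar 16000 -c:a libmp3lame -b:a 64k -y mixed.mp3
```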
pan=mono|c0=.5*c0+.5*c1 The output channel (c0) is made by blending 50% of the left input (c0) and 50% of the right input (c1).
dynaudnorm Applies dynamic audio normalization (smooths loud/quiet parts)
FFmpeg docs about panning and stereo to mono
Advanced editing
Change playback speed without distorting audio
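For example (placeholder names):
```
ffmpeg -i input.mp4 -filter_complex "[0:v]setpts=PTS/1.5[v];[0:a]atempo=1.5[a]" -map "[v]" -map "[a]" -y output_1.5x.mp4
```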
setpts=PTS/1.5 Speeds up the video by 1.5x. atempo=1.5 speeds up the audio playback rate while preserving pitch
Change video frames per second without changing audio speed
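A sketch (placeholder names; 60 fps is an arbitrary example):
```
ffmpeg -i input.mp4 -vf fps=60 -c:a copy -y output_60fps.mp4
```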
Jump cuts
Used for making clips shorter, silence removal, removing transitions, etc.
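A sketch that keeps only two ranges of the input (placeholder names and times):
```
ffmpeg -i input.mp4 -filter_complex "[0:v]select='between(t,0,3)+between(t,6,9)',setpts=N/FRAME_RATE/TB[v];[0:a]aselect='between(t,0,3)+between(t,6,9)',asetpts=N/SR/TB[a]" -map "[v]" -map "[a]" -y jumpcut.mp4
```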
setpts=N/FRAME_RATE/TB…asetpts=N/SR/TB Reset video and audio presentation timestamps according to the trims requested
N The count of consumed frames\audio samples (not including the current frame, for audio), starting from 0
FRAME_RATE\SR Video frame rate and audio sample rate
TB The timebase of the input timestamps
🛠️ Video cropping for social media
Crop a 1080X720 video to 720X1080 by cropping chunks of video to 480X720 and upscaling them by 1.5 at specific time frames to create a vertical social media video:
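A sketch of the approach described below (placeholder names; the crop x offsets per chunk are arbitrary examples):
```
ffmpeg -i input.mp4 -filter_complex "split=3[1][2][3];[1]trim=0.0:4.5,setpts=PTS-STARTPTS,crop=min(in_w-300\,480):min(in_h-0\,720):300:0,scale=720:1080[c1];[2]trim=4.5:8.5,setpts=PTS-STARTPTS,crop=480:720:100:0,scale=720:1080[c2];[3]trim=8.5,setpts=PTS-STARTPTS,crop=480:720:0:0,scale=720:1080[c3];[c1][c2][c3]concat=n=3:v=1:a=0[v]" -map "[v]" -map 0:a -c:a copy -y vertical_cuts.mp4
```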
split=3[1][2][3] Splits the input video into 3 chunks and names them [1] [2] [3]
trim=0.0:4.5 Each crop chunk is a temporary new video starting at the start time and ending at the end time. [3]trim=8.5 does not specify an end time, so it will end with the video
Resetting timestamps with setpts=PTS-STARTPTS is required when using trim and concat to make sure that concat works correctly over seemingly separate video streams (the trimmed streams)
crop=min(in_w-300,480):min(in_h-0,720):300:0 The values are width:height:x:y where x,y is the top left corner. The min dimensions ensure FFmpeg won’t crop outside the designated size of the output frame, before scaling. The minimum calculations are not required in this scenario; they are there as placeholders in case you require different dimensions or x,y positioning
If cropping is outside the boundaries of the frame - the crop will distort the video. In order to handle this, we can use black padding to fill in the gaps:
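A sketch of padding first and then cropping (placeholder sizes and offsets):
```
ffmpeg -i input.mp4 -vf "pad=iw+480:ih:240:0:color=black,crop=480:720:900:0" -y padded_crop.mp4
```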
Overlay text on video
Overlay three different text messages on a video, each appearing at a specific time, with a fade-in alpha effect and a semi-transparent background box:
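A sketch with three drawtext filters chained together (the second and third messages, positions and times are placeholders), assuming your FFmpeg build has fontconfig so a default font is available when no fontfile is specified:
```
ffmpeg -i input.mp4 -vf "drawtext=text='Get ready':x=50:y=100:fontsize=80:fontcolor=black:alpha='if(gte(t,1)*lte(t,3),(t-1)/2,1)':box=1:boxcolor=#6bb666@0.6:boxborderw=7:enable='gte(t,1)',drawtext=text='Here it comes':x=50:y=300:fontsize=80:fontcolor=black:box=1:boxcolor=#6bb666@0.6:boxborderw=7:enable='gte(t,4)',drawtext=text='Boom':x=50:y=500:fontsize=80:fontcolor=black:box=1:boxcolor=#6bb666@0.6:boxborderw=7:enable='gte(t,7)'" -c:a copy -y texted.mp4
```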
If you have a locally stored font file, you can specify it using: fontfile=<path_to_file>
, for example: drawtext=text='Get ready':x=50:y=100:fontsize=80:fontcolor=black:fontfile=arial.ttf
Explanation of the “Get ready” overlay drawtext=text='Get ready':x=50:y=100:fontsize=80:fontcolor=black:alpha='if(gte(t,1)*lte(t,3),(t-1)/2,1)':box=1:boxcolor=#6bb666@0.6:boxborderw=7:enable='gte(t,1)':
- enable='gte(t,1)' Controls when the overlay is visible - greater than or equal to 1 second, i.e. display from t = 1s.
- alpha='if(gte(t,1)*lte(t,3),(t-1)/2,1)' Alpha fades in between t=1 and t=3; at all other times it equals 1 (fully opaque). * is the AND operator.
- box=1 Draws a background behind the text, with 7px padding (boxborderw=7).
- boxcolor=#6bb666@0.6 Greenish background #6bb666 at 60% opacity.
- x=50:y=100 Top left position of the box.
Add text overlay to video from a text file and font file:
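A sketch, assuming the text and font files were downloaded next to the command (position and size are placeholders):
```
ffmpeg -i input.mp4 -vf "drawtext=textfile=sample_text.txt:fontfile=Poppins-Regular.ttf:x=50:y=100:fontsize=60:fontcolor=white" -c:a copy -y texted.mp4
```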
FFmpeg does not download the files within textfile=
and fontfile=
, therefore you need to download the file manually from https://storage.rendi.dev/sample/sample_text.txt and https://storage.rendi.dev/sample/Poppins-Regular.ttf
It is recommended to use textfile instead of specifying the text within the FFmpeg command itself, to avoid issues with special characters that could interfere with the command line syntax.
🛠️ Add subtitles to a video
This command burns subtitles with a custom font - Poppins - and a custom subtitle style. Notice that you use the FontName (and not the file name) - you can find it when you open the font file. Also, specify the fontsdir which holds the font file.
You can download the Poppins font file from https://storage.rendi.dev/sample/Poppins-Regular.ttf
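A sketch (the subtitle file name and any style values other than the ones discussed below are placeholders):
```
ffmpeg -i input.mp4 -vf "subtitles=subs.srt:fontsdir=.:force_style='FontName=Poppins,FontSize=24,PrimaryColour=&H00FFFFFF,OutlineColour=&H4066B66B,Outline=1,BorderStyle=3'" -c:a copy -y subtitled.mp4
```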
Colors are either &HBBGGRR - blue, green, red - or &HAABBGGRR if you want to add an alpha channel (transparency), with FF being 100% transparent and 00 being no transparency.
PrimaryColour is the font color.
🛠️ OutlineColour=&H4066B66B,Outline=1,BorderStyle=3 Configures the green background (40 out of FF in hex is about 25% transparency) and the #6bb666 color in RGB. In order to make the background appear you have to set Outline=1,BorderStyle=3
Stylizing the background is a bit tricky, this reddit thread has useful info.
Official FFmpeg documentation: How To Burn Subtitles Into Video ; Subtitles filter
If you want to really customize your subtitles’ appearance, the best option is using the ASS subtitles format. A good source of info which I use constantly.
For pixel-perfect subtitle burning with special effects and unique appearances, it is best to create opaque images outside of any subtitle format and burn images on the video with FFmpeg.
Add a default subtitles srt track to the video and store it in an MKV container, without re-encoding the video, the codec remains H264:
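For example (placeholder names):
```
ffmpeg -i input.mp4 -i subs.srt -map 0 -map 1 -c copy -c:s srt -disposition:s:0 default -y output.mkv
```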
-c:s srt Subtitle format is srt
-disposition:s:0 default Marks the first subtitle stream as the default track
Extract the subtitles from the mkv file:
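For example (placeholder names):
```
ffmpeg -i input.mkv -map 0:s:0 -y subs.srt
```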
Combine media assets
Overlay an image on video - add logo\watermark to video:
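A sketch of such a command (placeholder names):
```
ffmpeg -i input.mp4 -i logo.png -filter_complex "overlay=x=(main_w-overlay_w)/8:y=(main_h-overlay_h)/8:enable='gte(t,1)*lte(t,7)'" -c:a copy -y watermarked.mp4
```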
The above command puts an overlay with a transparent background on top of the video
x=(main_w-overlay_w)/8:y=(main_h-overlay_h)/8 Positions the overlay’s top left corner horizontally at 1/8th of the remaining space from the left and top
main_w\main_h is the width and height of the main video, overlay_w\overlay_h is the width and height of the overlay image
enable='gte(t,1)*lte(t,7)' Controls when the overlay is visible - greater than or equal to 1 second and less than or equal to 7 seconds. * is the AND operator
🛠 If you want FFmpeg to control the overlay’s transparency you can use this command:
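A sketch combining the transparency trick with the overlay (placeholder names):
```
ffmpeg -i input.mp4 -i logo.png -filter_complex "[1:v]format=argb,geq=r='r(X,Y)':a='0.5*alpha(X,Y)'[v1];[0:v][v1]overlay=x=(main_w-overlay_w)/8:y=(main_h-overlay_h)/8" -c:a copy -y watermarked.mp4
```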
[1:v]format=argb,geq=r='r(X,Y)':a='0.5*alpha(X,Y)'[v1] Creates the transparent logo:
- [1:v] Selects the video stream from the second input (the logo)
- format=argb Converts the image to ARGB format, so it works with overlay images that don’t have an alpha channel
- geq=r='r(X,Y)' Defines the color of the logo’s pixel at point X,Y to be the color from the original image. It is required in order to exactly control the transparency of the pixel
- a='0.5*alpha(X,Y)' Makes the logo 50% transparent by multiplying the alpha channel by 0.5
- [v1] Marks this processed logo as a new video stream
Put video on top of a background image - creating a video in a new resolution and aspect ratio:
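For example (placeholder names):
```
ffmpeg -i input.mp4 -i background.png -filter_complex "[1:v][0:v]overlay=(W-w)/2:(H-h)/2" -c:a copy -y framed.mp4
```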
[1:v][0:v] First puts the image (background) and on top puts the video.
(W-w)/2:(H-h)/2 Centers the video horizontally and vertically on the background image by picking the video’s top left corner accordingly. W\H are the background width and height, w\h are the video width and height; the capital letters belong to the first specified stream [1:v] and lower case to the second specified stream [0:v]. Notice that the order is based on [1:v][0:v] and not the order of the input files.
Combine intro, main and outro into one video and mix with background music:
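A sketch, assuming the three videos share resolution, frame rate and codec parameters (placeholder names):
```
ffmpeg -i intro.mp4 -i main.mp4 -i outro.mp4 -i music.mp3 \
  -filter_complex "[0:v][0:a][1:v][1:a][2:v][2:a]concat=n=3:v=1:a=1[v][a];[a][3:a]amix=inputs=2:duration=first:dropout_transition=2,aformat=sample_fmts=fltp[aout]" \
  -map "[v]" -map "[aout]" -c:v libx264 -c:a aac -y combined.mp4
```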
duration=first The output audio stream duration should be like the first input stream (the combined audio); dropout_transition=2 creates a fade-out effect for the shorter audio so that it won’t cut off abruptly
aformat=sample_fmts=fltp Converts the audio format to 32-bit float planar (a commonly used format in FFmpeg); I couldn’t find good simple sources for it online
🛠️ Stack two videos vertically and keep the audio of the second video:
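A sketch, assuming both videos have the same width (placeholder names):
```
ffmpeg -i top.mp4 -i bottom.mp4 -filter_complex "[0:v][1:v]vstack=shortest=1[v]" -map "[v]" -map 1:a -c:a copy -shortest -y stacked.mp4
```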
shortest=1 Of the two video streams, we follow the shorter one when vstacking both. -shortest Between the output video and the audio from the second input video, we pick the shorter.
Asset generation
Image to video
Create a 10 second video from a looping input image and audio file, image fades into view:
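A sketch (placeholder names; the original cheatsheet uses hosted sample files):
```
ffmpeg -loop 1 -t 10 -i image.png -i audio.mp3 -vf "fade=t=in:st=0:d=1,format=yuv420p" -c:v libx264 -c:a aac -shortest -y image_video.mp4
```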
Above command runs slowly because it is downloading the image frame for every video frame. To make it run faster, download the png locally and run the command with the local file.
-loop 1 Infinitely loop over the input image. -t 10 The duration of the input loop is 10 seconds, so even though we infinitely loop the input image, we stop after 10 seconds.
Excellent stackoverflow reference about loops - must read
fade=t=in:st=0:d=1 1-second (d=1) fade-in (t=in) at the start of the video (st=0)
🛠️ Create a slideshow video of 5 seconds per input image and background audio, fading between images:
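A sketch, assuming both images share the same resolution (placeholder names):
```
ffmpeg -loop 1 -t 5 -i image1.png -loop 1 -t 5 -i image2.png -i music.mp3 \
  -filter_complex "[0:v]fade=t=in:st=0:d=1[v0];[1:v]fade=t=out:st=4.5:d=0.5[v1];[v0][v1]xfade=transition=fade:duration=0.5:offset=4.5,format=yuv420p[v]" \
  -map "[v]" -map 2:a -c:v libx264 -c:a aac -shortest -y slideshow.mp4
```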
The resulting video is 9.5 seconds because there is an overlap of 0.5 second when fading from the first image to the second image. First image is faded in and last image is faded out.
[0:v]...[v0];[1:v]...[v1];[v0][v1]...[v] The first input video stream [0:v] is filtered with a fade-in and its result is marked as v0, then the second input video stream is filtered and its result is marked as v1, then they are concatenated together with xfade and the output video result is marked as v
xfade=transition=fade:duration=0.5:offset=4.5 Starts fade out transition of the first image at its 4.5 second, which lasts 0.5 second, while adding the second image during the transition.
🛠️ Create a Ken Burns style video from images:
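A sketch of the approach explained below (placeholder names; the pan targets are illustrative):
```
ffmpeg -i image1.png -i image2.png -i audio.mp3 \
  -filter_complex "[0:v]scale=8000:-1,zoompan=z='zoom+0.005':x='iw/2-(iw/zoom/2)':y='ih/2-(ih/zoom/2)':d=100:s=1920x1080:fps=25,trim=duration=4[v0];[1:v]scale=8000:-1,zoompan=z='if(lte(zoom,1.0),1.5,max(zoom-0.005,1.005))':x=0:y='ih/2-(ih/zoom/2)':d=100:s=1920x1080:fps=25,trim=duration=4[v1];[v0][v1]xfade=transition=fade:duration=1:offset=3,format=yuv420p[v]" \
  -map "[v]" -map 2:a -c:v libx264 -c:a aac -shortest -y kenburns.mp4
```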
The command creates a video from two input images and background audio. It zooms in on the first image’s center, plays it for 4 seconds and fade-transitions to the next image. The second image zooms out from its left side while playing for 4 seconds. The output is 7 seconds long because of the 1-second fade transition between the two image chunks, and the command shortens the audio to match the video.
z='zoom+0.005' Every new frame generated adds 0.005 zoom to the previous frame, or, in other words, zooms in on the previous frame by 1.005
x='iw/2-(iw/zoom/2)':y='ih/2-(ih/zoom/2)' Pans to the center of the frame
d=100:s=1920x1080:fps=25 Specifies that the effect will generate 100 frames (d), at output resolution s=1920x1080 and 25 frames per second (fps), which is a 4-second effect (100 frames divided by 25 fps)
scale=8000:-1 Used to first upscale the frame and then zoom on it; this avoids a jitteriness bug which occurs with the zoompan filter, at the cost of more compute time for upscaling. -1 means to adjust the height so that the aspect ratio is preserved according to the 8000px width. Good reads: https://superuser.com/a/1112680/431710 , https://superuser.com/questions/1112617/ffmpeg-smooth-zoompan-with-no-jiggle
zoompan=z='if(lte(zoom,1.0),1.5,max(zoom-0.005,1.005))' This part zooms out of a zoomed-in starting frame. If the zoom factor is less than 1.0 then we set it to 1.5 - this corresponds to the starting frame. Then the command zooms out by 0.005 at each frame until it reaches a zoom factor of 1.005, giving the zoom-out effect, and then stops changing the zoom - keeping it from resetting the zoom-out effect.
trim=duration=4 It was not possible to specify -t 4 before the input file and keep this image chunk at 4 seconds (like above in creating a video from a looping input image). When trying to do that, the first chunk has the correct length because xfade cuts it at 4 seconds, but the second chunk gets repeated so that the total output video matches the audio’s length. I tried different ways of solving it, but nothing helped. This is probably due to the zoompan filter, which basically eliminates the purpose of -t by specifying the fps and the number of frames without specifying a hard maximum cap.
The only thing that worked was to specify the trim duration after the zoompan.
Blog post about ken burns and FFmpeg:
Create GIFs
Create a looping gif from video, auto-scaled to 320px width, taking one frame every 2 seconds (lte(n\,1)+gt(trunc(t/2),trunc(prev_t/2))) and accelerating the playback speed by 10x (setpts='PTS*0.1')
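A sketch (placeholder names):
```
ffmpeg -i input.mp4 -vf "select='lte(n\,1)+gt(trunc(t/2),trunc(prev_t/2))',setpts='PTS*0.1',scale=320:-1" -loop 0 -y output.gif
```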
-loop 0 is the default, and can actually be omitted, stating that the loop is indefinite. To loop only once use -loop 1
Turn video frames into a video compilation
Create a video compilation based on a single input video which gets split into parts, with fade effects:
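A sketch of such a command (placeholder names):
```
ffmpeg -i input.mp4 \
  -filter_complex "[0:v]split=2[vin0][vin1];[0:a]asplit=2[ain0][ain1];[vin0]trim=start=11:end=15,setpts=PTS-STARTPTS,fade=t=in:st=0:d=0.5,fade=t=out:st=3.5:d=0.5[v0];[ain0]atrim=start=11:end=15,asetpts=PTS-STARTPTS,afade=t=in:st=0:d=0.5,afade=t=out:st=3.5:d=0.5[a0];[vin1]trim=start=21:end=25,setpts=PTS-STARTPTS,fade=t=in:st=0:d=0.5,fade=t=out:st=3.5:d=0.5[v1];[ain1]atrim=start=21:end=25,asetpts=PTS-STARTPTS,afade=t=in:st=0:d=0.5,afade=t=out:st=3.5:d=0.5[a1];[v0][a0][v1][a1]concat=n=2:v=1:a=1[v][a]" \
  -map "[v]" -map "[a]" -y compilation.mp4
```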
The command takes two segments from the input video (11-15 seconds and 21-25 seconds), applies fade in/out effects to each segment and concatenates both.
trim=start=X:end=Y Cuts video to the specified time range; atrim is the corresponding audio filter
setpts=PTS-STARTPTS Resets timestamps to start from 0
fade=t=in:st=0:d=0.5...fade=t=out:st=3.5:d=0.5 See above in creating a slideshow
afade See above in audio processing
concat=n=2:v=1:a=1 Combines the two segments, with both video and audio
Create thumbnails from video
Create a thumbnail from the frame in second 7:
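For example (placeholder names):
```
ffmpeg -ss 7 -i input.mp4 -frames:v 1 -q:v 2 -y thumbnail.jpg
```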
To control the output image quality use -q:v. Values are from 2 to 31, 2 being the best and 31 being the worst. References: Stackoverflow 1 Stackoverflow 2
Create two thumbnails - one from the first frame after second 5 and one from the first frame after second 15:
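A sketch with two image outputs in one command (placeholder names):
```
ffmpeg -y -i input.mp4 -ss 5 -frames:v 1 thumb_05s.jpg -ss 15 -frames:v 1 thumb_15s.jpg
```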
-frames:v 1 Output only 1 video frame
Create a thumbnail from the first frame of a scene change:
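A sketch (placeholder names):
```
ffmpeg -i input.mp4 -vf "select='gt(scene,0.4)'" -frames:v 1 -fps_mode vfr -y scene_thumbnail.jpg
```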
gt(scene,0.4) This parameter determines FFmpeg’s sensitivity to changes between frames that indicate a scene change. The value is from 0 to 1; lower values mean FFmpeg will be more sensitive and will recognize more scene changes. Recommended values are from 0.3 to 0.5
Good stackoverflow discussion about detecting scenes with FFmpeg
Create an image thumbnail from input images
Create a storyboard from a video
All commands below extract frames from video to create different storyboards
🛠️ Use tile=2x2 to create a 2x2 storyboard from the scenes in a video. Example from FFmpeg’s documentation
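A sketch (placeholder names and scale):
```
ffmpeg -i input.mp4 -vf "select='gt(scene,0.4)',scale=320:-1,tile=2x2" -frames:v 1 -fps_mode vfr -y storyboard.png
```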
Create the same storyboard but with separate image files per scene
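For example (placeholder names):
```
ffmpeg -i input.mp4 -vf "select='gt(scene,0.4)',scale=320:-1" -vsync 0 -y scene_%03d.png
```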
🛠️ -vsync 0 drops frames that belong to the same scene so there is no duplication. This parameter is complex to use - good explanation
vsync is fine to use but is deprecated in newer versions of FFmpeg; -fps_mode is its replacement. Reference: FFmpeg docs
Create a storyboard with several tiled files, basing the frames on the video’s keyframes instead of scenes. Example is from FFmpeg’s documentation
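A sketch close to the documentation example (placeholder input name):
```
ffmpeg -skip_frame nokey -i input.mp4 -vf "scale=128:72,tile=8x8" -an -vsync 0 -y keyframes_%03d.png
```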
-skip_frame nokey As the name suggests - skips frames that are not keyframes.
Create 4x2 tile files from every 10th frame of a video. To just create images per frame, remove the ,tile=4x2 part
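A sketch (placeholder names and scale):
```
ffmpeg -i input.mp4 -vf "select='not(mod(n,10))',scale=320:-1,tile=4x2" -fps_mode vfr -y tiles_%03d.png
```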
Command settings
Generic, simple and optimized FFmpeg command for daily use
This command re-sizes the input video and is good for archiving, streaming (non-live) and playing on many different edge devices. You can usually use the flags in this command for all your FFmpeg commands, unless you have a specific reason not to.
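A sketch of such a command, combining the flags discussed below with a modest resize (placeholder names and target width):
```
ffmpeg -i input.mp4 -vf "scale=1280:-2,format=yuv420p" -c:v libx264 -crf 23 -preset veryslow -tune fastdecode -c:a aac -movflags +faststart -y output.mp4
```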
The parameters in this command have different configuration options. Make sure to read through their FFmpeg references. Let me know if you would like me to elaborate about them here.
-tune fastdecode The encoded output will require less computational power to decode - good for viewing on many different edge devices. You can use zerolatency to optimize for fast encoding and low-latency streaming
-preset veryslow Slower encoding, but with a more compressed output keeping the high quality - good when optimizing for web viewing (VOD, archiving, non-live streaming). If you require very fast encoding, at the cost of a larger output file, use ultrafast.
-threads 0 Specifies how many system threads to use. Optimal is 0 (and is the default); usually it is best to just not use this parameter and let FFmpeg optimize. But sometimes you want to tweak it, depending on your system and command
🛠️ Video\Audio encoding, codecs and bitrate
-c:v Specifies the video encoder and -c:a specifies the audio encoder.
-c:a aac AAC encoded audio. This is also the default for FFmpeg, and a good practice to specify.
-c:a libmp3lame The encoding library for MP3
-an Disable audio in the output
-c:v libx264 - H264 (AVC)
Generally FFmpeg will default to H264 when asking for MP4 output, unless your FFmpeg build doesn’t include libx264. It’s good practice to always specify the codec.
libx265 - H265 (HEVC), the newer codec, is very similar in behavior and controls. H264 is still the most commonly used.
Apple devices sometimes have issues with FFmpeg-generated H265 videos (for example in iOS AirDrop); use -vtag hvc1 to solve it. Thanks! Also related
format=yuv420p H264 YUV planar color format, is used for playback compatibility in most players. Use this flag when transforming images to video, unless you have specific reasons not to.
You may need to use -vf format=yuv420p (or the alias -pix_fmt yuv420p) for your output to work in QuickTime and most other players. These players only support the YUV planar color space with 4:2:0 chroma subsampling for H.264 video. Otherwise, depending on your source, ffmpeg may output to a pixel format that may be incompatible with these players.
Good info about yuv420p in this reddit thread
-crf Constant Rate Factor (CRF) - the default bitrate control option for libx264 and libx265:
Use this rate control mode if you want to keep the best quality and care less about the file size. This is the recommended rate control mode for most uses.
This method allows the encoder to attempt to achieve a certain output quality for the whole file when output file size is of less importance. This provides maximum compression efficiency with a single pass. By adjusting the so-called quantizer for each frame, it gets the bitrate it needs to keep the requested quality level. The downside is that you can’t tell it to get a specific filesize or not go over a specific size or bitrate, which means that this method is not recommended for encoding videos for streaming.
The range of the CRF scale is 0–51, where 0 is lossless (for 8 bit only, for 10 bit use -qp 0), 23 is the default, and 51 is worst quality possible. A lower value generally leads to higher quality, and a subjectively sane range is 17–28. Consider 17 or 18 to be visually lossless or nearly so; it should look the same or nearly the same as the input but it isn’t technically lossless. The range is exponential, so increasing the CRF value +6 results in roughly half the bitrate / file size, while -6 leads to roughly twice the bitrate.
Common advice is to use -crf 18 for very high quality H264 output; I found that using -crf 10 results in better quality video.
Use -movflags +faststart to make videos start playing faster online, optimizing for web viewing, by moving metadata to the front of the container:
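For example (placeholder names):
```
ffmpeg -i input.mp4 -c copy -movflags +faststart -y output_faststart.mp4
```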
YouTube recommends uploading MP4 files with faststart. They will then re-encode these to VP9.
Fast start is supported in MP4, M4A and MOV and could take a few seconds to process. I couldn’t find an official place that states that faststart works with libx265, but the following command shows that it does work:
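A sketch (the input sample name is assumed):
```
ffmpeg -i big_buck_bunny_720p_16sec.mp4 -c:v libx265 -c:a copy -movflags +faststart -y big_buck_bunny_720p_16sec_h265_faststart.mp4
```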
In order to make sure that big_buck_bunny_720p_16sec_h265_faststart.mp4 is indeed encoded with moov faststart, run a trace on the file and check that towards the beginning there is a line that resembles [mov,mp4,m4a,3gp,3g2,mj2 @ 0x...] type:'moov' size:... pos:...
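One way to run such a trace, assuming a Unix shell with grep:
```
ffmpeg -v trace -i big_buck_bunny_720p_16sec_h265_faststart.mp4 2>&1 | grep -e "type:'moov'" -e "type:'mdat'"
```
The moov line should appear before the mdat line.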
libvpx-vp9
It is the VP9 video encoder for WebM, an open, royalty-free media file format. VP9 is owned by Google, and most videos on YouTube are encoded with it. It is an encoding designed and optimized for static web-hosted video. libvpx-vp9 can save about 20–50% bitrate compared to libx264 (the default H264 encoder), while retaining the same visual quality.
Constant Quality -crf encoding in libvpx-vp9 - similar to constant rate factor in libx264:
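For example (placeholder names):
```
ffmpeg -i input.mp4 -c:v libvpx-vp9 -crf 31 -b:v 0 -c:a libopus -y output.webm
```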
To trigger this mode, you must use a combination of -crf and -b:v 0. Note that -b:v MUST be 0. Setting it to anything higher or omitting it entirely will instead invoke the Constrained Quality mode.
The CRF value can be from 0–63. Lower values mean better quality. Recommended values range from 15–35, with 31 being recommended for 1080p HD video.
-c:a libopus The default audio encoder for WebM is libopus; the above command re-encodes the AAC audio in the mp4 to Opus in webm.
CPU, speed and multithread controls for vp9
VP9, libx264 and libx265 support 1-pass and 2-pass encodings (you can read about these in their respective references). slhck summarized it well:
To summarize, here’s what you should do [which bitrate encoding configuration to use], depending on your use case:
- Archival — CRF that gives you the quality you want.
- Streaming — Two-pass CRF or ABR with VBV-constrained bitrate.
- Live Streaming — One-pass CRF or ABR with VBV-constrained bitrate, or CBR if you can waste bits.
- Encoding for Devices — Two-pass ABR, typically.
slhck probably meant two-pass CRF in VP9 for streaming - the first pass lets libvpx-vp9 calculate the desired measures to encode in higher compression in the second pass for reduced file size while keeping quality. This method is more optimized for web hosted videos.
Good references:
- Another very good reference by slhck about crf
- Stackoverflow CRF in FFmpeg
- Reddit discussion about CRF
- Reddit discussion about CRF VS CQP VS CBR and GPU encoding
- Reddit discussion about CBR and CQP
- Reddit discussion about CRF and 2-pass
🛠️ -c copy
Use -c copy whenever possible - it re-muxes the video and audio instead of re-encoding, which is compute intensive (especially video re-encoding). -c:v copy specifically copies video without re-encoding and -c:a copy does the same for audio (and is the same as -acodec copy).
Remuxing involves rewrapping streams into a new container without altering them, unlike transcoding, which changes compression and quality. For example - MP4 can be remuxed to MKV and MOV because they are all containers that can hold the H264 codec.
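For instance, a sketch that copies the video stream while re-encoding only the audio (placeholder names):
```
ffmpeg -i input.mp4 -c:v copy -c:a aac -b:a 128k -y output.mkv
```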
When not to use -c copy?
- When applying video filters (scale, overlay, subtitles, trim, fade) or mixing or modifying audio (amix, atempo, volume) - these require re-encoding
- For precise trimming (frame-accurate) -c copy can only cut at keyframes, leading to rough/inaccurate edits.
- Burning subtitles into the video requires re-encoding
- Transcoding between different codecs requires re-encoding
- If you want to compress media
🛠️ Input\Output seeking
Input seeking (-ss before input):
Parses the video by keyframe, making it very fast, but less accurate (in h264 with 25fps there is a keyframe every 10 seconds by default).
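For example, input seeking with stream copy (placeholder names; note the advice below about trimming):
```
ffmpeg -ss 00:00:10 -i input.mp4 -t 5 -c copy -y cut_input_seek.mp4
```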
If you trim the video with input seeking, it resets the timestamps of the video to the trimmed version, so when using filters you need to make sure to adhere to the video times after trim.
Output seeking (-ss after input) “decodes but discards input until the timestamps reach position” - it is frame accurate, but can take a longer time to process because it needs to decode.
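For example, output seeking with re-encoding (placeholder names):
```
ffmpeg -i input.mp4 -ss 00:00:10 -t 5 -c:v libx264 -c:a aac -y cut_output_seek.mp4
```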
🛠️ When trimming, it is advised to use output seeking without -c:v copy, re-encoding the output video.
The reasons being:
- There is an open bug with trimming with input seeking and -c:v copy: Stackoverflow discussion - FFmpeg repo bug report. Therefore, it is advised to trim with output seeking (with or without -c:v copy).
- When trimming with output seeking with -c:v copy you can see black frames in the output video. This is due to c:v copy copying frames that started after a keyframe, but not the keyframe itself, which misses out on the data required to decode those frames. Read more in FFmpeg’s trac documentation
This excerpt from FFmpeg’s documentation sums it all:
When used as an input option, seeks in this input file to position. Note that in most formats it is not possible to seek exactly, so ffmpeg will seek to the closest seek point before position. When transcoding and -accurate_seek is enabled (the default), this extra segment between the seek point and position will be decoded and discarded. When doing stream copy or when -noaccurate_seek is used, it will be preserved.
Nice answer about this issue in Stackoverflow
The re-encoded output video could be in a different bitrate, so you might need to adjust the output bitrate accordingly (see below).
🛠️ Use GPU for acceleration
Transcode video from AVI to H264 (AVC) using Nvidia GPU:
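A sketch, assuming an FFmpeg build with NVENC support (placeholder names):
```
ffmpeg -hwaccel cuda -i input.avi -c:v h264_nvenc -c:a aac -y output.mp4
```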
Transcode video from AVI to H265 (HEVC) using Nvidia GPU:
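A sketch (same assumptions as above):
```
ffmpeg -hwaccel cuda -i input.avi -c:v hevc_nvenc -c:a aac -y output.mp4
```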
Transcode using the Intel GPU - Quick Sync Video (QSV) encoder:
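A sketch, assuming an FFmpeg build with QSV support (placeholder names):
```
ffmpeg -i input.avi -c:v h264_qsv -c:a aac -y output.mp4
```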
More complicated, and less supported, is encoding via AMD GPUs using the Mesa VAAPI driver
Misc
FFmpeg Installation
List the formats your FFmpeg build supports:
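For example:
```
ffmpeg -formats
```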
List the codecs your FFmpeg build supports:
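For example:
```
ffmpeg -codecs
```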
FFprobe
It provides structured metadata about media files. Show detailed stream information of a video file:
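For example, JSON output (placeholder name):
```
ffprobe -v error -show_format -show_streams -print_format json input.mp4
```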
Credits
- www.bigbuckbunny.org for the video and image files.
- Music credit to https://www.fiftysounds.com/music/neon-lights.mp3
- Library of Congress and Paramount Pictures for Popeye https://www.loc.gov/item/2023602008/