FFmpeg Extract Audio From Video: Command for WAV, MP3, AAC With Codec, Channel, and Sample Rate Control

Extracting audio from a video file is the most common FFmpeg task. You have an MP4, MOV, or MKV and you need the audio in a specific format. Maybe you need mono WAV at 16kHz for a speech-to-text pipeline. Maybe you need high-quality MP3 for a podcast. Maybe you just want the audio stream copied out without any quality loss.

FFmpeg handles all of these scenarios with a handful of flags.

This article covers the exact commands for every audio extraction scenario. You get working commands for WAV, MP3, and AAC with full control over codec, channel count, and sample rate. Then we show how to run the same commands at scale using a hosted FFmpeg API when local processing stops making sense.

Key Takeaways

Output	Command	Best For
WAV mono 16kHz	`ffmpeg -i input.mp4 -vn -ac 1 -ar 16000 -c:a pcm_s16le output.wav`	Speech-to-text, Whisper, transcription
MP3 high quality	`ffmpeg -i input.mp4 -vn -c:a libmp3lame -q:a 0 output.mp3`	Podcasts, distribution
AAC stream copy (lossless)	`ffmpeg -i input.mp4 -vn -c:a copy output.m4a`	Maximum quality, no re-encode

Key flags to remember: -vn strips video, -ac controls channel count, -ar sets sample rate, -c:a selects the audio codec. Stream copy with -c:a copy is lossless but cannot change format. Re-encoding is required for WAV output or any channel or sample rate change.

Need to process hundreds of files? Offload extraction to Very Good FFmpeg. Same commands, cloud infrastructure, no server management.

Background

What does "extract audio from video" actually mean with FFmpeg?

A video file is a container (MP4, MKV, MOV, AVI) that holds one or more video streams, audio streams, and metadata all muxed together. Extracting audio means demuxing the audio stream and writing it to its own file.

There are two approaches to this.

Approach	Speed	Quality Change	Can Change Format?
Stream copy (`-c:a copy`)	Fast (no decode)	Lossless	No
Re-encode (default or `-c:a X`)	Slower (decode + encode)	Depends on codec	Yes

Stream copy copies the audio packets as-is. It is fast and has zero quality loss. But it cannot change the codec, sample rate, or channel count. Re-encoding decodes the audio and then re-encodes it. This is slower and can reduce quality, but it gives you full control over the output format.

What are the key FFmpeg flags for audio extraction?

The FFmpeg command line uses these flags to control audio extraction:

-vn -- disable video recording. FFmpeg skips all video streams.
-c:a codec or -acodec codec -- set the audio codec. Common values are libmp3lame for MP3, pcm_s16le for WAV, aac for AAC, copy for stream copy.
-ac N -- set the number of audio channels. 1 for mono, 2 for stereo.
-ar freq -- set the audio sample rate in Hz. Typical values: 8000, 16000, 22050, 44100, 48000.
-b:a bitrate -- set the audio bitrate. For example, -b:a 192k.
-map 0:a -- select all audio streams and exclude video and subtitles.
-c copy -- enable stream copy mode. No decoding or encoding.

Why does sample rate and channel count matter?

Sample rate is the number of audio samples captured per second. Higher sample rates capture more detail but produce larger files. The standard CD quality sample rate is 44.1kHz. The industry standard for speech recognition is 16kHz.

Channel count determines whether the output is mono or stereo. Transcription and diarization engines expect mono input. Music and podcast delivery typically use stereo.

Bit depth matters for WAV output. FFmpeg defaults to pcm_s16le, which is 16-bit signed little-endian PCM. This gives a good balance of quality and file size. For mono 16kHz 16-bit audio, expect roughly 1.5 MB of output per minute of audio. For stereo 44.1kHz 16-bit, expect roughly 10 MB per minute.

Main Sections

What is the exact FFmpeg command to extract audio to WAV in mono at a specific sample rate?

The canonical command for extracting audio as WAV mono at a specific sample rate is:

bash

ffmpeg -i input.mp4 -vn -ac 1 -ar 16000 -c:a pcm_s16le output.wav

Here is what each flag does:

-i input.mp4 -- the source video file.
-vn -- no video. Only process audio.
-ac 1 -- downmix to mono. If the source has stereo or 5.1 audio, FFmpeg performs an automatic downmix.
-ar 16000 -- resample to 16kHz. The standard sample rate for speech recognition.
-c:a pcm_s16le -- 16-bit PCM codec. This is the default for WAV output but it is good practice to be explicit.
output.wav -- the output file. The .wav extension tells FFmpeg to use the WAV container.

This exact combination is the universal preprocessing step for speech-to-text engines. OpenAI Whisper, Google Speech-to-Text, Deepgram, and AssemblyAI all expect mono 16-bit 16kHz WAV as input.

If you need different sample rates for different use cases, use these variants:

-ar 44100 -- CD quality, suitable for music analysis or high-fidelity archiving.
-ar 22050 -- voice quality, smaller file than CD quality.
-ar 8000 -- telephone quality, used for telephony or bandwidth-constrained applications.

The file size depends on the sample rate and channel count. Mono 16kHz audio produces about 1.5 MB per minute. Stereo 44.1kHz audio produces about 10 MB per minute. Keep this in mind when planning disk space or storage costs for large batches.

How do I extract audio without re-encoding (lossless stream copy)?

If the audio codec in your source video is already the format you need, use stream copy. This copies the audio packets directly with zero quality loss and near-instant speed.

bash

ffmpeg -i input.mp4 -vn -c:a copy output.m4a

The file extension must match the source codec. Here is how to handle different scenarios:

Run ffprobe input.mp4 to check the audio codec of your input file. Look for the codec_name field under the audio stream section.
If the source audio is AAC (common in MP4 files), use .m4a or .aac as the output extension.
If the source audio is MP3 (common in AVI or old files), use .mp3.
If the source audio is PCM (common in MOV files from cameras), use .wav.

Stream copy cannot change the codec, sample rate, or channel count. If you need any of those to be different, you must re-encode. The advantage is that stream copy is lossless and nearly instant even for large files.

What is the best audio codec for my use case (WAV vs MP3 vs AAC)?

Each audio codec serves a different purpose. This table shows the tradeoffs.

Codec	Container	Compression	Quality	Best Use Case
PCM (`pcm_s16le`)	WAV	None (lossless)	Perfect	Transcription, editing, archival
MP3 (`libmp3lame`)	MP3	Lossy	Good at 192kbps+	Podcasts, distribution
AAC (`aac`)	M4A	Lossy	Better than MP3 at same bitrate	Streaming, Apple ecosystem
FLAC (`flac`)	FLAC	Lossless	Perfect	Music archival

For transcription pipelines, use WAV mono 16kHz. This is the only format that every speech-to-text engine accepts without conversion. Do not use lossy codecs for transcription. The compression artifacts confuse speech models and reduce accuracy.

For podcast delivery, use MP3 at 192kbps stereo. MP3 has universal player support and produces reasonable file sizes. Use -q:a 0 for the highest VBR quality setting.

For streaming and Apple ecosystem delivery, use AAC at 128-192kbps. AAC produces better quality than MP3 at the same bitrate.

How do I extract audio to MP3 at a specific bitrate and quality?

MP3 extraction requires the libmp3lame encoder. FFmpeg ships with it in most builds.

bash

ffmpeg -i input.mp4 -vn -c:a libmp3lame -q:a 0 output.mp3

The -q:a flag controls VBR (variable bitrate) quality on a scale from 0 (best) to 9 (worst). Use -q:a 0 for the highest quality. This typically produces bitrates around 220-260kbps depending on the audio content.

For CBR (constant bitrate), use -b:a instead:

bash

ffmpeg -i input.mp4 -vn -c:a libmp3lame -b:a 192k output.mp3

Set the channel count and sample rate for MP3 output the same way as WAV:

bash

ffmpeg -i input.mp4 -vn -ac 1 -ar 22050 -c:a libmp3lame -q:a 0 output.mp3

This produces a mono MP3 at 22.05kHz with the highest VBR quality.

How do I extract audio to AAC?

AAC is the default audio format for MP4 containers and the standard for streaming delivery. Extract AAC with:

bash

ffmpeg -i input.mp4 -vn -c:a aac -b:a 192k output.m4a

Use the aac encoder that ships with FFmpeg. Set bitrate with -b:a. Good AAC quality starts at 128kbps for stereo content.

For the highest AAC quality, use the libfdk_aac encoder if your FFmpeg build supports it (most distribution builds do not due to licensing):

bash

ffmpeg -i input.mp4 -vn -c:a libfdk_aac -vbr 5 output.m4a

The -vbr flag for libfdk_aac uses a scale of 1 (lowest bitrate) to 5 (highest bitrate).

How do I extract only a portion of the audio (trim with timecodes)?

Add the -ss (start time) and -t (duration) flags to extract a segment:

bash

ffmpeg -i input.mp4 -ss 01:30:00 -t 00:05:00 -vn -ac 1 -ar 16000 clip.wav

This command extracts 5 minutes of audio starting at 1 hour and 30 minutes into the video.

The position of -ss matters for performance:

-ss before -i (fast seeking): FFmpeg jumps to the nearest keyframe. The seek is fast but the start point may be a few frames off.
-ss after -i (slow seeking): FFmpeg decodes from the beginning but cuts at the exact frame. The seek is slower but more accurate.

For most audio extraction use cases, fast seeking (before -i) is fine because the audio will be a few milliseconds off at most.

How do I extract a specific audio track from a multi-language video?

Video files with multiple audio tracks are common. Movies often have English, Spanish, and French audio tracks. MKV files from Blu-ray rips can have 6 or more audio streams.

First, list all streams in the input file:

bash

ffprobe input.mkv

Look for streams of type audio and note their index numbers.

Extract a specific track by index (zero-indexed):

bash

ffmpeg -i input.mkv -map 0:a:3 -vn output.wav

This extracts the fourth audio track (index 3) and writes it as a WAV file.

Extract all audio tracks to individual files in one command:

bash

ffmpeg -i input.mov -map 0:a:0 eng.wav -map 0:a:1 spa.wav -map 0:a:2 fr.wav

The -map 0:a syntax selects all audio streams and nothing else. It automatically excludes video and subtitle streams. This is cleaner than relying on stream copy behavior because it is explicit about which streams to include.

How do I check the audio codec of my input file?

Use ffprobe to inspect the audio stream of any media file:

bash

ffprobe -v error -show_entries stream=codec_name,channels,sample_rate -of default=noprint_wrappers=1 input.mp4

The output shows the codec name, channel count, and sample rate. This tells you whether stream copy is an option and what the correct output file extension should be.

For a more human-readable summary:

bash

ffprobe -v error -show_entries stream=index,codec_type,codec_name,channels,sample_rate input.mp4

How do I extract audio at scale (batch processing hundreds of files)?

Processing a single file with FFmpeg is straightforward. Processing 500 podcast episodes or a folder of meeting recordings is a different problem.

A simple bash loop handles small batches locally:

bash

for f in *.mp4; do
  ffmpeg -i "$f" -vn -ac 1 -ar 16000 "${f%.mp4}.wav"
done

This works for a dozen files. It breaks down when you need reliability, monitoring, or parallel processing. Local batch processing ties up your machine. A single encoding error can stop the entire loop. There is no built-in retry logic, no progress tracking, and no way to handle files that are too long for your machine.

This is where a hosted FFmpeg API becomes useful.

How does Very Good FFmpeg handle audio extraction at scale?

Very Good FFmpeg runs the exact same FFmpeg commands on dedicated cloud infrastructure. You pass the same flags you use locally. The only difference is the execution environment.

An audio extraction request looks like this:

bash

curl -X POST https://verygoodffmpeg.com/api/ffmpeg \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input_files": {
      "input.mp4": "https://your-bucket.s3.amazonaws.com/input.mp4"
    },
    "output_files": ["output.wav"],
    "ffmpeg_commands": ["-i {{input.mp4}} -vn -ac 1 -ar 16000 -c:a pcm_s16le {{output.wav}}"]
  }'

The command inside ffmpeg_commands is the same command you would run on your terminal. Very Good FFmpeg provides an official TypeScript and Python SDK so you can integrate extraction into your application code without managing subprocess calls.

Here is the Python version:

python

from very_good_ffmpeg import VGF

client = VGF("your-api-key")

job = client.run(
    input_files={"input.mp4": "https://your-bucket.s3.amazonaws.com/input.mp4"},
    output_files=["output.wav"],
    ffmpeg_commands=["-i {{input.mp4}} -vn -ac 1 -ar 16000 -c:a pcm_s16le {{output.wav}}"],
    wait=True,
)

print(job.output_files["output.wav"])

Why use a hosted API for batch extraction? Here is the comparison.

Factor	Local FFmpeg	Very Good FFmpeg API
Infrastructure	Your machine	Cloud (16 vCPU, 32GB RAM, NVMe)
Scaling	Manual bash loops	API calls, 100 requests per second
Max job runtime	Your machine uptime	6 hours
Cost	Electricity and hardware	$0.50/GB first 10GB, then $0.10/GB
Monitoring	DIY logging	Real-time logs, status URLs, webhooks
SDKs	None	TypeScript, Python, MCP server
Auto-retry	Manual	Built-in

For a concrete example, imagine processing 500 podcast episodes into mono WAV for a transcription pipeline. Running this locally takes days of wall clock time and ties up your machine. With the API, you fan out the requests in parallel (up to 100 per second) and collect the results via webhooks. The output lands at stable URLs that you can feed directly into your transcription pipeline.

Very Good FFmpeg also handles the edge cases that trip up local extraction. The 6-hour max runtime covers movies, long lectures, and full conference recordings. The 16 vCPU machines with 5+ GHz cores ensure that even large files process quickly. The auto-diagnosis feature analyzes failed FFmpeg commands and tells you exactly what went wrong.

Verdict

For a one-off audio extraction, local FFmpeg is the right tool. The commands in this article cover every scenario you will encounter.

For transcription pipelines, batch processing, or team workflows, offload extraction to a hosted FFmpeg API like Very Good FFmpeg. The commands are identical. The infrastructure, scaling, and monitoring are handled for you. You pay only for what you process, with no monthly minimums or subscription commitments.

Start local. Move to the API when your volume demands it.

FAQ

Does `-vn` reduce processing time?

Yes. FFmpeg skips decoding video frames entirely when -vn is set. Audio-only extraction runs significantly faster than a full transcode.

What happens if I omit `-ac 1` and `-ar 16000`?

The output WAV file preserves the source channel count and sample rate. This is fine for stereo audio but wrong for transcription pipelines. Speech-to-text engines expect mono 16kHz input specifically.

Can I extract audio to MP3 without quality loss?

No. MP3 is a lossy format. Quality loss is baked into the codec. Use -q:a 0 for the best VBR quality. For truly lossless output, use WAV or FLAC instead.

Why does FFmpeg produce a large WAV file?

WAV is uncompressed PCM audio. File size is determined by sample rate multiplied by bit depth multiplied by channel count multiplied by duration. Mono 16kHz 16-bit audio produces approximately 1.5 MB per minute. Stereo 44.1kHz 16-bit produces approximately 10 MB per minute.

How do I check the audio codec of my input file?

bash

ffprobe -v error -show_entries stream=codec_name,channels,sample_rate -of default=noprint_wrappers=1 input.mp4

Can I extract audio from a YouTube video with FFmpeg?

FFmpeg cannot directly download from YouTube. Use yt-dlp to download the video first, then run the extraction commands from this article. Very Good FFmpeg supports YouTube URLs as direct input, which bypasses the two-step download and extract workflow.

What is the difference between `-acodec` and `-c:a`?

They are equivalent. -acodec is the older syntax. -c:a is the newer, preferred syntax. Both set the audio codec. Use -c:a for consistency with modern FFmpeg documentation.

Does FFmpeg support GPU-accelerated audio encoding?

Audio encoding is typically CPU-bound and does not benefit significantly from GPU acceleration. Very Good FFmpeg provides GPU instances for video encoding workloads but audio extraction runs on the CPU machines.

How do I preserve original audio quality when extracting?

Use stream copy with -c:a copy if the source codec matches your desired output format. This copies the audio packets without any re-encoding. Zero quality loss, near-instant speed, identical output to the source.

Can I extract audio from a specific portion of a video by timestamp?

Yes. Use -ss for the start time and -t for the duration. For example, ffmpeg -i input.mp4 -ss 00:05:00 -t 00:02:30 -vn -ac 1 -ar 16000 clip.wav extracts 2 minutes and 30 seconds starting at 5 minutes into the video.

References

FFmpeg handles all of these scenarios with a handful of flags.

Key Takeaways

Output	Command	Best For
WAV mono 16kHz	`ffmpeg -i input.mp4 -vn -ac 1 -ar 16000 -c:a pcm_s16le output.wav`	Speech-to-text, Whisper, transcription
MP3 high quality	`ffmpeg -i input.mp4 -vn -c:a libmp3lame -q:a 0 output.mp3`	Podcasts, distribution
AAC stream copy (lossless)	`ffmpeg -i input.mp4 -vn -c:a copy output.m4a`	Maximum quality, no re-encode

Need to process hundreds of files? Offload extraction to Very Good FFmpeg. Same commands, cloud infrastructure, no server management.

Background

What does "extract audio from video" actually mean with FFmpeg?

There are two approaches to this.

Approach	Speed	Quality Change	Can Change Format?
Stream copy (`-c:a copy`)	Fast (no decode)	Lossless	No
Re-encode (default or `-c:a X`)	Slower (decode + encode)	Depends on codec	Yes

What are the key FFmpeg flags for audio extraction?

The FFmpeg command line uses these flags to control audio extraction:

-vn -- disable video recording. FFmpeg skips all video streams.
-c:a codec or -acodec codec -- set the audio codec. Common values are libmp3lame for MP3, pcm_s16le for WAV, aac for AAC, copy for stream copy.
-ac N -- set the number of audio channels. 1 for mono, 2 for stereo.
-ar freq -- set the audio sample rate in Hz. Typical values: 8000, 16000, 22050, 44100, 48000.
-b:a bitrate -- set the audio bitrate. For example, -b:a 192k.
-map 0:a -- select all audio streams and exclude video and subtitles.
-c copy -- enable stream copy mode. No decoding or encoding.

Why does sample rate and channel count matter?

Channel count determines whether the output is mono or stereo. Transcription and diarization engines expect mono input. Music and podcast delivery typically use stereo.

Main Sections

What is the exact FFmpeg command to extract audio to WAV in mono at a specific sample rate?

The canonical command for extracting audio as WAV mono at a specific sample rate is:

bash

ffmpeg -i input.mp4 -vn -ac 1 -ar 16000 -c:a pcm_s16le output.wav

Here is what each flag does:

-i input.mp4 -- the source video file.
-vn -- no video. Only process audio.
-ac 1 -- downmix to mono. If the source has stereo or 5.1 audio, FFmpeg performs an automatic downmix.
-ar 16000 -- resample to 16kHz. The standard sample rate for speech recognition.
-c:a pcm_s16le -- 16-bit PCM codec. This is the default for WAV output but it is good practice to be explicit.
output.wav -- the output file. The .wav extension tells FFmpeg to use the WAV container.

This exact combination is the universal preprocessing step for speech-to-text engines. OpenAI Whisper, Google Speech-to-Text, Deepgram, and AssemblyAI all expect mono 16-bit 16kHz WAV as input.

If you need different sample rates for different use cases, use these variants:

-ar 44100 -- CD quality, suitable for music analysis or high-fidelity archiving.
-ar 22050 -- voice quality, smaller file than CD quality.
-ar 8000 -- telephone quality, used for telephony or bandwidth-constrained applications.

How do I extract audio without re-encoding (lossless stream copy)?

If the audio codec in your source video is already the format you need, use stream copy. This copies the audio packets directly with zero quality loss and near-instant speed.

bash

ffmpeg -i input.mp4 -vn -c:a copy output.m4a

The file extension must match the source codec. Here is how to handle different scenarios:

Run ffprobe input.mp4 to check the audio codec of your input file. Look for the codec_name field under the audio stream section.
If the source audio is AAC (common in MP4 files), use .m4a or .aac as the output extension.
If the source audio is MP3 (common in AVI or old files), use .mp3.
If the source audio is PCM (common in MOV files from cameras), use .wav.

What is the best audio codec for my use case (WAV vs MP3 vs AAC)?

Each audio codec serves a different purpose. This table shows the tradeoffs.

Codec	Container	Compression	Quality	Best Use Case
PCM (`pcm_s16le`)	WAV	None (lossless)	Perfect	Transcription, editing, archival
MP3 (`libmp3lame`)	MP3	Lossy	Good at 192kbps+	Podcasts, distribution
AAC (`aac`)	M4A	Lossy	Better than MP3 at same bitrate	Streaming, Apple ecosystem
FLAC (`flac`)	FLAC	Lossless	Perfect	Music archival

For podcast delivery, use MP3 at 192kbps stereo. MP3 has universal player support and produces reasonable file sizes. Use -q:a 0 for the highest VBR quality setting.

For streaming and Apple ecosystem delivery, use AAC at 128-192kbps. AAC produces better quality than MP3 at the same bitrate.

How do I extract audio to MP3 at a specific bitrate and quality?

MP3 extraction requires the libmp3lame encoder. FFmpeg ships with it in most builds.

bash

ffmpeg -i input.mp4 -vn -c:a libmp3lame -q:a 0 output.mp3

For CBR (constant bitrate), use -b:a instead:

bash

ffmpeg -i input.mp4 -vn -c:a libmp3lame -b:a 192k output.mp3

Set the channel count and sample rate for MP3 output the same way as WAV:

bash

ffmpeg -i input.mp4 -vn -ac 1 -ar 22050 -c:a libmp3lame -q:a 0 output.mp3

This produces a mono MP3 at 22.05kHz with the highest VBR quality.

How do I extract audio to AAC?

AAC is the default audio format for MP4 containers and the standard for streaming delivery. Extract AAC with:

bash

ffmpeg -i input.mp4 -vn -c:a aac -b:a 192k output.m4a

Use the aac encoder that ships with FFmpeg. Set bitrate with -b:a. Good AAC quality starts at 128kbps for stereo content.

For the highest AAC quality, use the libfdk_aac encoder if your FFmpeg build supports it (most distribution builds do not due to licensing):

bash

ffmpeg -i input.mp4 -vn -c:a libfdk_aac -vbr 5 output.m4a

The -vbr flag for libfdk_aac uses a scale of 1 (lowest bitrate) to 5 (highest bitrate).

How do I extract only a portion of the audio (trim with timecodes)?

Add the -ss (start time) and -t (duration) flags to extract a segment:

bash

ffmpeg -i input.mp4 -ss 01:30:00 -t 00:05:00 -vn -ac 1 -ar 16000 clip.wav

This command extracts 5 minutes of audio starting at 1 hour and 30 minutes into the video.

The position of -ss matters for performance:

-ss before -i (fast seeking): FFmpeg jumps to the nearest keyframe. The seek is fast but the start point may be a few frames off.
-ss after -i (slow seeking): FFmpeg decodes from the beginning but cuts at the exact frame. The seek is slower but more accurate.

For most audio extraction use cases, fast seeking (before -i) is fine because the audio will be a few milliseconds off at most.

How do I extract a specific audio track from a multi-language video?

Video files with multiple audio tracks are common. Movies often have English, Spanish, and French audio tracks. MKV files from Blu-ray rips can have 6 or more audio streams.

First, list all streams in the input file:

bash

ffprobe input.mkv

Look for streams of type audio and note their index numbers.

Extract a specific track by index (zero-indexed):

bash

ffmpeg -i input.mkv -map 0:a:3 -vn output.wav

This extracts the fourth audio track (index 3) and writes it as a WAV file.

Extract all audio tracks to individual files in one command:

bash

ffmpeg -i input.mov -map 0:a:0 eng.wav -map 0:a:1 spa.wav -map 0:a:2 fr.wav

How do I check the audio codec of my input file?

Use ffprobe to inspect the audio stream of any media file:

bash

ffprobe -v error -show_entries stream=codec_name,channels,sample_rate -of default=noprint_wrappers=1 input.mp4

The output shows the codec name, channel count, and sample rate. This tells you whether stream copy is an option and what the correct output file extension should be.

For a more human-readable summary:

bash

ffprobe -v error -show_entries stream=index,codec_type,codec_name,channels,sample_rate input.mp4

How do I extract audio at scale (batch processing hundreds of files)?

Processing a single file with FFmpeg is straightforward. Processing 500 podcast episodes or a folder of meeting recordings is a different problem.

A simple bash loop handles small batches locally:

bash

for f in *.mp4; do
  ffmpeg -i "$f" -vn -ac 1 -ar 16000 "${f%.mp4}.wav"
done

This is where a hosted FFmpeg API becomes useful.

How does Very Good FFmpeg handle audio extraction at scale?

Very Good FFmpeg runs the exact same FFmpeg commands on dedicated cloud infrastructure. You pass the same flags you use locally. The only difference is the execution environment.

An audio extraction request looks like this:

bash

curl -X POST https://verygoodffmpeg.com/api/ffmpeg \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input_files": {
      "input.mp4": "https://your-bucket.s3.amazonaws.com/input.mp4"
    },
    "output_files": ["output.wav"],
    "ffmpeg_commands": ["-i {{input.mp4}} -vn -ac 1 -ar 16000 -c:a pcm_s16le {{output.wav}}"]
  }'

Here is the Python version:

python

from very_good_ffmpeg import VGF

client = VGF("your-api-key")

job = client.run(
    input_files={"input.mp4": "https://your-bucket.s3.amazonaws.com/input.mp4"},
    output_files=["output.wav"],
    ffmpeg_commands=["-i {{input.mp4}} -vn -ac 1 -ar 16000 -c:a pcm_s16le {{output.wav}}"],
    wait=True,
)

print(job.output_files["output.wav"])

Why use a hosted API for batch extraction? Here is the comparison.

Factor	Local FFmpeg	Very Good FFmpeg API
Infrastructure	Your machine	Cloud (16 vCPU, 32GB RAM, NVMe)
Scaling	Manual bash loops	API calls, 100 requests per second
Max job runtime	Your machine uptime	6 hours
Cost	Electricity and hardware	$0.50/GB first 10GB, then $0.10/GB
Monitoring	DIY logging	Real-time logs, status URLs, webhooks
SDKs	None	TypeScript, Python, MCP server
Auto-retry	Manual	Built-in

ffprobe -v error -show_entries stream=codec_name,channels,sample_rate -of default=noprint_wrappers=1 input.mp4