Extracting audio from a video file is the most common FFmpeg task. You have an MP4, MOV, or MKV and you need the audio in a specific format. Maybe you need mono WAV at 16kHz for a speech-to-text pipeline. Maybe you need high-quality MP3 for a podcast. Maybe you just want the audio stream copied out without any quality loss.
FFmpeg handles all of these scenarios with a handful of flags.
This article covers the exact commands for every audio extraction scenario. You get working commands for WAV, MP3, and AAC with full control over codec, channel count, and sample rate. Then we show how to run the same commands at scale using a hosted FFmpeg API when local processing stops making sense.
Key Takeaways
| Output | Command | Best For |
|---|---|---|
| WAV mono 16kHz | ffmpeg -i input.mp4 -vn -ac 1 -ar 16000 -c:a pcm_s16le output.wav | Speech-to-text, Whisper, transcription |
| MP3 high quality | ffmpeg -i input.mp4 -vn -c:a libmp3lame -q:a 0 output.mp3 | Podcasts, distribution |
| AAC stream copy (lossless) | ffmpeg -i input.mp4 -vn -c:a copy output.m4a | Maximum quality, no re-encode |
Key flags to remember: -vn strips video, -ac controls channel count, -ar sets sample rate, -c:a selects the audio codec. Stream copy with -c:a copy is lossless but cannot change format. Re-encoding is required for WAV output or any channel or sample rate change.
Need to process hundreds of files? Offload extraction to Very Good FFmpeg. Same commands, cloud infrastructure, no server management.
Background
What does "extract audio from video" actually mean with FFmpeg?
A video file is a container (MP4, MKV, MOV, AVI) that holds one or more video streams, audio streams, and metadata all muxed together. Extracting audio means demuxing the audio stream and writing it to its own file.
There are two approaches to this.
| Approach | Speed | Quality Change | Can Change Format? |
|---|---|---|---|
Stream copy (-c:a copy) | Fast (no decode) | Lossless | No |
Re-encode (default or -c:a X) | Slower (decode + encode) | Depends on codec | Yes |
Stream copy copies the audio packets as-is. It is fast and has zero quality loss. But it cannot change the codec, sample rate, or channel count. Re-encoding decodes the audio and then re-encodes it. This is slower and can reduce quality, but it gives you full control over the output format.
What are the key FFmpeg flags for audio extraction?
The FFmpeg command line uses these flags to control audio extraction:
-vn-- disable video recording. FFmpeg skips all video streams.-c:a codecor-acodec codec-- set the audio codec. Common values arelibmp3lamefor MP3,pcm_s16lefor WAV,aacfor AAC,copyfor stream copy.-ac N-- set the number of audio channels. 1 for mono, 2 for stereo.-ar freq-- set the audio sample rate in Hz. Typical values: 8000, 16000, 22050, 44100, 48000.-b:a bitrate-- set the audio bitrate. For example,-b:a 192k.-map 0:a-- select all audio streams and exclude video and subtitles.-c copy-- enable stream copy mode. No decoding or encoding.
Why does sample rate and channel count matter?
Sample rate is the number of audio samples captured per second. Higher sample rates capture more detail but produce larger files. The standard CD quality sample rate is 44.1kHz. The industry standard for speech recognition is 16kHz.
Channel count determines whether the output is mono or stereo. Transcription and diarization engines expect mono input. Music and podcast delivery typically use stereo.
Bit depth matters for WAV output. FFmpeg defaults to pcm_s16le, which is 16-bit signed little-endian PCM. This gives a good balance of quality and file size. For mono 16kHz 16-bit audio, expect roughly 1.5 MB of output per minute of audio. For stereo 44.1kHz 16-bit, expect roughly 10 MB per minute.
Main Sections
What is the exact FFmpeg command to extract audio to WAV in mono at a specific sample rate?
The canonical command for extracting audio as WAV mono at a specific sample rate is:
ffmpeg -i input.mp4 -vn -ac 1 -ar 16000 -c:a pcm_s16le output.wavHere is what each flag does:
-i input.mp4-- the source video file.-vn-- no video. Only process audio.-ac 1-- downmix to mono. If the source has stereo or 5.1 audio, FFmpeg performs an automatic downmix.-ar 16000-- resample to 16kHz. The standard sample rate for speech recognition.-c:a pcm_s16le-- 16-bit PCM codec. This is the default for WAV output but it is good practice to be explicit.output.wav-- the output file. The.wavextension tells FFmpeg to use the WAV container.
This exact combination is the universal preprocessing step for speech-to-text engines. OpenAI Whisper, Google Speech-to-Text, Deepgram, and AssemblyAI all expect mono 16-bit 16kHz WAV as input.
If you need different sample rates for different use cases, use these variants:
-ar 44100-- CD quality, suitable for music analysis or high-fidelity archiving.-ar 22050-- voice quality, smaller file than CD quality.-ar 8000-- telephone quality, used for telephony or bandwidth-constrained applications.
The file size depends on the sample rate and channel count. Mono 16kHz audio produces about 1.5 MB per minute. Stereo 44.1kHz audio produces about 10 MB per minute. Keep this in mind when planning disk space or storage costs for large batches.
How do I extract audio without re-encoding (lossless stream copy)?
If the audio codec in your source video is already the format you need, use stream copy. This copies the audio packets directly with zero quality loss and near-instant speed.
ffmpeg -i input.mp4 -vn -c:a copy output.m4aThe file extension must match the source codec. Here is how to handle different scenarios:
- Run
ffprobe input.mp4to check the audio codec of your input file. Look for thecodec_namefield under the audio stream section. - If the source audio is AAC (common in MP4 files), use
.m4aor.aacas the output extension. - If the source audio is MP3 (common in AVI or old files), use
.mp3. - If the source audio is PCM (common in MOV files from cameras), use
.wav.
Stream copy cannot change the codec, sample rate, or channel count. If you need any of those to be different, you must re-encode. The advantage is that stream copy is lossless and nearly instant even for large files.
What is the best audio codec for my use case (WAV vs MP3 vs AAC)?
Each audio codec serves a different purpose. This table shows the tradeoffs.
| Codec | Container | Compression | Quality | Best Use Case |
|---|---|---|---|---|
PCM (pcm_s16le) | WAV | None (lossless) | Perfect | Transcription, editing, archival |
MP3 (libmp3lame) | MP3 | Lossy | Good at 192kbps+ | Podcasts, distribution |
AAC (aac) | M4A | Lossy | Better than MP3 at same bitrate | Streaming, Apple ecosystem |
FLAC (flac) | FLAC | Lossless | Perfect | Music archival |
For transcription pipelines, use WAV mono 16kHz. This is the only format that every speech-to-text engine accepts without conversion. Do not use lossy codecs for transcription. The compression artifacts confuse speech models and reduce accuracy.
For podcast delivery, use MP3 at 192kbps stereo. MP3 has universal player support and produces reasonable file sizes. Use -q:a 0 for the highest VBR quality setting.
For streaming and Apple ecosystem delivery, use AAC at 128-192kbps. AAC produces better quality than MP3 at the same bitrate.
How do I extract audio to MP3 at a specific bitrate and quality?
MP3 extraction requires the libmp3lame encoder. FFmpeg ships with it in most builds.
ffmpeg -i input.mp4 -vn -c:a libmp3lame -q:a 0 output.mp3The -q:a flag controls VBR (variable bitrate) quality on a scale from 0 (best) to 9 (worst). Use -q:a 0 for the highest quality. This typically produces bitrates around 220-260kbps depending on the audio content.
For CBR (constant bitrate), use -b:a instead:
ffmpeg -i input.mp4 -vn -c:a libmp3lame -b:a 192k output.mp3Set the channel count and sample rate for MP3 output the same way as WAV:
ffmpeg -i input.mp4 -vn -ac 1 -ar 22050 -c:a libmp3lame -q:a 0 output.mp3This produces a mono MP3 at 22.05kHz with the highest VBR quality.
How do I extract audio to AAC?
AAC is the default audio format for MP4 containers and the standard for streaming delivery. Extract AAC with:
ffmpeg -i input.mp4 -vn -c:a aac -b:a 192k output.m4aUse the aac encoder that ships with FFmpeg. Set bitrate with -b:a. Good AAC quality starts at 128kbps for stereo content.
For the highest AAC quality, use the libfdk_aac encoder if your FFmpeg build supports it (most distribution builds do not due to licensing):
ffmpeg -i input.mp4 -vn -c:a libfdk_aac -vbr 5 output.m4aThe -vbr flag for libfdk_aac uses a scale of 1 (lowest bitrate) to 5 (highest bitrate).
How do I extract only a portion of the audio (trim with timecodes)?
Add the -ss (start time) and -t (duration) flags to extract a segment:
ffmpeg -i input.mp4 -ss 01:30:00 -t 00:05:00 -vn -ac 1 -ar 16000 clip.wavThis command extracts 5 minutes of audio starting at 1 hour and 30 minutes into the video.
The position of -ss matters for performance:
-ssbefore-i(fast seeking): FFmpeg jumps to the nearest keyframe. The seek is fast but the start point may be a few frames off.-ssafter-i(slow seeking): FFmpeg decodes from the beginning but cuts at the exact frame. The seek is slower but more accurate.
For most audio extraction use cases, fast seeking (before -i) is fine because the audio will be a few milliseconds off at most.
How do I extract a specific audio track from a multi-language video?
Video files with multiple audio tracks are common. Movies often have English, Spanish, and French audio tracks. MKV files from Blu-ray rips can have 6 or more audio streams.
First, list all streams in the input file:
ffprobe input.mkvLook for streams of type audio and note their index numbers.
Extract a specific track by index (zero-indexed):
ffmpeg -i input.mkv -map 0:a:3 -vn output.wavThis extracts the fourth audio track (index 3) and writes it as a WAV file.
Extract all audio tracks to individual files in one command:
ffmpeg -i input.mov -map 0:a:0 eng.wav -map 0:a:1 spa.wav -map 0:a:2 fr.wavThe -map 0:a syntax selects all audio streams and nothing else. It automatically excludes video and subtitle streams. This is cleaner than relying on stream copy behavior because it is explicit about which streams to include.
How do I check the audio codec of my input file?
Use ffprobe to inspect the audio stream of any media file:
ffprobe -v error -show_entries stream=codec_name,channels,sample_rate -of default=noprint_wrappers=1 input.mp4The output shows the codec name, channel count, and sample rate. This tells you whether stream copy is an option and what the correct output file extension should be.
For a more human-readable summary:
ffprobe -v error -show_entries stream=index,codec_type,codec_name,channels,sample_rate input.mp4How do I extract audio at scale (batch processing hundreds of files)?
Processing a single file with FFmpeg is straightforward. Processing 500 podcast episodes or a folder of meeting recordings is a different problem.
A simple bash loop handles small batches locally:
for f in *.mp4; do
ffmpeg -i "$f" -vn -ac 1 -ar 16000 "${f%.mp4}.wav"
doneThis works for a dozen files. It breaks down when you need reliability, monitoring, or parallel processing. Local batch processing ties up your machine. A single encoding error can stop the entire loop. There is no built-in retry logic, no progress tracking, and no way to handle files that are too long for your machine.
This is where a hosted FFmpeg API becomes useful.
How does Very Good FFmpeg handle audio extraction at scale?
Very Good FFmpeg runs the exact same FFmpeg commands on dedicated cloud infrastructure. You pass the same flags you use locally. The only difference is the execution environment.
An audio extraction request looks like this:
curl -X POST https://verygoodffmpeg.com/api/ffmpeg \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"input_files": {
"input.mp4": "https://your-bucket.s3.amazonaws.com/input.mp4"
},
"output_files": ["output.wav"],
"ffmpeg_commands": ["-i {{input.mp4}} -vn -ac 1 -ar 16000 -c:a pcm_s16le {{output.wav}}"]
}'The command inside ffmpeg_commands is the same command you would run on your terminal. Very Good FFmpeg provides an official TypeScript and Python SDK so you can integrate extraction into your application code without managing subprocess calls.
Here is the Python version:
from very_good_ffmpeg import VGF
client = VGF("your-api-key")
job = client.run(
input_files={"input.mp4": "https://your-bucket.s3.amazonaws.com/input.mp4"},
output_files=["output.wav"],
ffmpeg_commands=["-i {{input.mp4}} -vn -ac 1 -ar 16000 -c:a pcm_s16le {{output.wav}}"],
wait=True,
)
print(job.output_files["output.wav"])Why use a hosted API for batch extraction? Here is the comparison.
| Factor | Local FFmpeg | Very Good FFmpeg API |
|---|---|---|
| Infrastructure | Your machine | Cloud (16 vCPU, 32GB RAM, NVMe) |
| Scaling | Manual bash loops | API calls, 100 requests per second |
| Max job runtime | Your machine uptime | 6 hours |
| Cost | Electricity and hardware | $0.50/GB first 10GB, then $0.10/GB |
| Monitoring | DIY logging | Real-time logs, status URLs, webhooks |
| SDKs | None | TypeScript, Python, MCP server |
| Auto-retry | Manual | Built-in |
For a concrete example, imagine processing 500 podcast episodes into mono WAV for a transcription pipeline. Running this locally takes days of wall clock time and ties up your machine. With the API, you fan out the requests in parallel (up to 100 per second) and collect the results via webhooks. The output lands at stable URLs that you can feed directly into your transcription pipeline.
Very Good FFmpeg also handles the edge cases that trip up local extraction. The 6-hour max runtime covers movies, long lectures, and full conference recordings. The 16 vCPU machines with 5+ GHz cores ensure that even large files process quickly. The auto-diagnosis feature analyzes failed FFmpeg commands and tells you exactly what went wrong.
Verdict
For a one-off audio extraction, local FFmpeg is the right tool. The commands in this article cover every scenario you will encounter.
For transcription pipelines, batch processing, or team workflows, offload extraction to a hosted FFmpeg API like Very Good FFmpeg. The commands are identical. The infrastructure, scaling, and monitoring are handled for you. You pay only for what you process, with no monthly minimums or subscription commitments.
Start local. Move to the API when your volume demands it.
FAQ
Does -vn reduce processing time?
Yes. FFmpeg skips decoding video frames entirely when -vn is set. Audio-only extraction runs significantly faster than a full transcode.
What happens if I omit -ac 1 and -ar 16000?
The output WAV file preserves the source channel count and sample rate. This is fine for stereo audio but wrong for transcription pipelines. Speech-to-text engines expect mono 16kHz input specifically.
Can I extract audio to MP3 without quality loss?
No. MP3 is a lossy format. Quality loss is baked into the codec. Use -q:a 0 for the best VBR quality. For truly lossless output, use WAV or FLAC instead.
Why does FFmpeg produce a large WAV file?
WAV is uncompressed PCM audio. File size is determined by sample rate multiplied by bit depth multiplied by channel count multiplied by duration. Mono 16kHz 16-bit audio produces approximately 1.5 MB per minute. Stereo 44.1kHz 16-bit produces approximately 10 MB per minute.
How do I check the audio codec of my input file?
ffprobe -v error -show_entries stream=codec_name,channels,sample_rate -of default=noprint_wrappers=1 input.mp4Can I extract audio from a YouTube video with FFmpeg?
FFmpeg cannot directly download from YouTube. Use yt-dlp to download the video first, then run the extraction commands from this article. Very Good FFmpeg supports YouTube URLs as direct input, which bypasses the two-step download and extract workflow.
What is the difference between -acodec and -c:a?
They are equivalent. -acodec is the older syntax. -c:a is the newer, preferred syntax. Both set the audio codec. Use -c:a for consistency with modern FFmpeg documentation.
Does FFmpeg support GPU-accelerated audio encoding?
Audio encoding is typically CPU-bound and does not benefit significantly from GPU acceleration. Very Good FFmpeg provides GPU instances for video encoding workloads but audio extraction runs on the CPU machines.
How do I preserve original audio quality when extracting?
Use stream copy with -c:a copy if the source codec matches your desired output format. This copies the audio packets without any re-encoding. Zero quality loss, near-instant speed, identical output to the source.
Can I extract audio from a specific portion of a video by timestamp?
Yes. Use -ss for the start time and -t for the duration. For example, ffmpeg -i input.mp4 -ss 00:05:00 -t 00:02:30 -vn -ac 1 -ar 16000 clip.wav extracts 2 minutes and 30 seconds starting at 5 minutes into the video.
References
- FFmpeg Main Options -- official docs
- FFmpeg Audio Options -- official docs
- FFmpeg Stream Copy -- official docs
- FFmpeg WAV Format -- official docs
- FFmpeg Transcoding Pipeline -- official docs
- FFmpeg Audio Channel Manipulation -- FFmpeg Wiki
- Stack Overflow: How can I extract audio from video with FFmpeg? (860k+ views)
- Stack Overflow: Extract audio to WAV with mono and sample rate control (llogan, 850+ upvotes)
- Very Good FFmpeg -- Extract Audio Guide
- Very Good FFmpeg -- Podcast Use Case
- Very Good FFmpeg -- Pricing