Python Subprocess FFmpeg: Extract Audio From Video as WAV Mono 16kHz for Transcription

You are building a transcription pipeline. You have video files and you need clean audio for a speech-to-text model like Whisper. The standard input format is WAV: mono, 16kHz, 16-bit PCM.

Python subprocess calling ffmpeg is the most direct way to extract audio from video. It works on your local machine. It breaks when you try to deploy to production.

This article covers the exact ffmpeg command for audio extraction, the common Python pitfalls, and when you should offload extraction to a hosted FFmpeg API.

Key Takeaways

The canonical ffmpeg command for speech-to-text preprocessing is ffmpeg -i input.mp4 -ac 1 -ar 16000 -acodec pcm_s16le -vn output.wav.
Whisper uses a piped variant: ffmpeg -nostdin -threads 0 -i input.mp4 -f s16le -ac 1 -acodec pcm_s16le -ar 16000 -.
Common subprocess pitfalls: ffmpeg binary not in PATH (exit code 127 from a shell, FileNotFoundError from subprocess.run with a list of arguments), f-string formatting bugs, shell=True security risks, and pipe deadlocks on large files.
The ffmpeg-python wrapper library (11k stars) still requires the ffmpeg system binary. It does not solve serverless deployment problems.
Serverless platforms like AWS Lambda can bundle ffmpeg but hit the 15-minute timeout and the 512MB default /tmp size. Vercel and Cloudflare Workers cannot run ffmpeg reliably.
Very Good FFmpeg provides a Python SDK that accepts the exact same ffmpeg commands without requiring a local binary, with 6-hour runtimes and optional GPU.

What Is the Exact FFmpeg Command to Extract Audio for Speech Recognition?

The standard ffmpeg command for extracting audio as mono 16kHz WAV for transcription is:

bash

ffmpeg -i input.mp4 -ac 1 -ar 16000 -acodec pcm_s16le -vn output.wav

The flags break down as follows. -ac 1 forces mono output by downmixing stereo or multichannel audio to a single channel. -ar 16000 sets the sample rate to 16kHz, which is the standard input frequency for most speech recognition models. -acodec pcm_s16le specifies signed 16-bit little-endian PCM encoding, the WAV format that Whisper and similar tools expect. -vn disables video stream processing so only audio is written to the output file.

OpenAI Whisper uses a variant that pipes raw PCM data to stdout instead of writing a file:

bash

ffmpeg -nostdin -threads 0 -i input.mp4 -f s16le -ac 1 -acodec pcm_s16le -ar 16000 -

The -f s16le flag outputs raw signed 16-bit little-endian PCM samples instead of a WAV container. The trailing - tells ffmpeg to write to stdout. Whisper reads this pipe with subprocess.run().stdout and converts the raw bytes to a numpy float32 array. This approach avoids creating a temporary file on disk.
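Because the piped stream has no header, its length alone tells you how much audio arrived. The following is a minimal sketch of that arithmetic, assuming the s16le mono format described above. The function name pcm_duration_seconds is illustrative and is not part of Whisper.

```python
BYTES_PER_SAMPLE = 2  # s16le: signed 16-bit samples, two bytes each


def pcm_duration_seconds(raw: bytes, sample_rate: int = 16000, channels: int = 1) -> float:
    """Return the duration of a headerless s16le PCM buffer in seconds."""
    frame_size = BYTES_PER_SAMPLE * channels
    if len(raw) % frame_size != 0:
        raise ValueError("buffer does not contain a whole number of frames")
    return len(raw) / frame_size / sample_rate


# One second of silence at 16 kHz mono is 32,000 bytes.
one_second = bytes(16000 * BYTES_PER_SAMPLE)
print(pcm_duration_seconds(one_second))  # 1.0
```

A buffer whose length is not a multiple of the frame size usually means the stream was truncated, which is why the sketch raises instead of rounding.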

How Do You Run FFmpeg From Python Using Subprocess?

The most reliable way to call ffmpeg from Python is using subprocess.run with a list of arguments. This avoids shell injection risks and makes error handling straightforward.

python

import subprocess

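def extract_audio(input_path, output_path):
    """Extract mono 16 kHz 16-bit PCM WAV audio from a video file."""
    result = subprocess.run(
        ["ffmpeg", "-y",              # global option: overwrite existing output
         "-i", input_path,
         "-ac", "1", "-ar", "16000",  # mono, 16 kHz
         "-acodec", "pcm_s16le",      # signed 16-bit little-endian PCM
         "-vn", output_path],         # drop video, write audio only
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(f"ffmpeg failed: {result.stderr}")
    return output_path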

The -y flag overwrites the output file if it exists. capture_output=True captures both stdout and stderr so you can inspect ffmpeg error messages when something goes wrong. The list form of arguments avoids shell interpretation entirely.
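A zero return code does not by itself prove the file has the format the model wants. As a hedged sketch, the standard-library wave module can read the header back and compare it to the expected values. The helper name check_wav_format is an assumption made for illustration.

```python
import wave


def check_wav_format(path, channels=1, sample_rate=16000, sample_width=2):
    """Raise ValueError if the WAV header does not match the expected format."""
    with wave.open(path, "rb") as wav:
        actual = (wav.getnchannels(), wav.getframerate(), wav.getsampwidth())
    expected = (channels, sample_rate, sample_width)
    if actual != expected:
        raise ValueError(f"unexpected WAV format {actual}, wanted {expected}")
    return path
```

Calling check_wav_format(extract_audio("input.mp4", "output.wav")) would then fail loudly if, for example, the -ac or -ar flag were dropped from the command.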

To match the Whisper approach and pipe audio directly to your Python process without a temp file:

python

import subprocess
import numpy as np

def load_audio(file_path, sample_rate=16000):
    result = subprocess.run(
        ["ffmpeg", "-nostdin", "-threads", "0",
         "-i", file_path,
         "-f", "s16le",
         "-ac", "1",
         "-acodec", "pcm_s16le",
         "-ar", str(sample_rate),
         "-"],
        capture_output=True,
        check=True,
    )
    audio = np.frombuffer(result.stdout, dtype=np.int16).astype(np.float32) / 32768.0
    return audio

This function returns a float32 numpy array normalized to the range [-1.0, 1.0), exactly as Whisper expects it. No temporary WAV file is created.
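To see what dividing by 32768 does at the extremes, here is a small standard-library sketch of the same conversion without numpy. The function name pcm_to_floats is illustrative, and the two samples are the most negative and most positive int16 values.

```python
import array
import sys


def pcm_to_floats(raw: bytes) -> list:
    """Convert little-endian s16le bytes to floats by dividing by 32768."""
    samples = array.array("h")  # "h" is a signed 16-bit integer
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # the stream is little-endian regardless of host
    return [s / 32768.0 for s in samples]


# Most negative and most positive int16 values, encoded little-endian.
raw = (-32768).to_bytes(2, "little", signed=True) + (32767).to_bytes(2, "little", signed=True)
print(pcm_to_floats(raw))  # [-1.0, 0.999969482421875]
```

The lower bound is reached exactly, while the upper bound stops just short of 1.0 because int16 has one more negative value than positive.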

What Are the Most Common Pitfalls When Using FFmpeg From Python?

FFmpeg Binary Not Found in PATH

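Exit code 127 is what a shell returns when it cannot find a command, so you see it when ffmpeg is launched through a shell, for example with shell=True. When you pass a list of arguments to subprocess.run without a shell, a missing binary raises FileNotFoundError instead. Either way, the cause is the same: ffmpeg is not installed, or it is not on the PATH of the environment the Python process runs in.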

Check for the binary before running ffmpeg:

python

import shutil
ffmpeg_path = shutil.which("ffmpeg")
if ffmpeg_path is None:
    raise RuntimeError("ffmpeg is not installed or not in PATH")

On Linux, ffmpeg is typically at /usr/bin/ffmpeg. On macOS Homebrew, it is at /usr/local/bin/ffmpeg on Intel or /opt/homebrew/bin/ffmpeg on Apple Silicon. Windows requires either manual installation or a package manager like Chocolatey. Docker containers must include ffmpeg in the image explicitly.
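With list-form arguments and no shell, a missing executable surfaces as a FileNotFoundError raised by subprocess.run rather than as a return code. A minimal sketch of turning that into a readable error follows. The wrapper name run_tool is hypothetical.

```python
import subprocess


def run_tool(args):
    """Run a command given as a list and turn a missing binary into a clear error."""
    try:
        return subprocess.run(args, capture_output=True, text=True)
    except FileNotFoundError as exc:
        raise RuntimeError(f"{args[0]} is not installed or not in PATH") from exc
```

Wrapping the call this way keeps the original exception attached through "from exc", so the traceback still shows where the lookup failed.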

F-String Formatting Bugs

Python f-strings require the f prefix before the opening quote. Forgetting it leaves the literal text {input_path} and {output_path} in the command string, so ffmpeg looks for a file with that literal name and fails with a "No such file or directory" error.

python

input_path, output_path = "input.mp4", "output.wav"

# Wrong: no f prefix, so the braces stay in the string literally
command = "ffmpeg -i {input_path} -ac 1 -ar 16000 {output_path}"

# Right: f prefix
command = f"ffmpeg -i {input_path} -ac 1 -ar 16000 {output_path}"

# Also right: .format()
command = "ffmpeg -i {src} -ac 1 -ar 16000 {dst}".format(src=input_path, dst=output_path)

Always verify that your command string has the correct variable values before calling subprocess. Print the command to the console during development to catch these bugs early.
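Building the command as a list sidesteps both the formatting bug and the shell=True risk, because each path stays a single argument and no shell ever parses it. The sketch below assumes the same flags as the canonical command. The function name build_extract_command is illustrative.

```python
import shlex


def build_extract_command(input_path, output_path):
    """Return the ffmpeg argument list; each path stays one argument."""
    return ["ffmpeg", "-y", "-i", str(input_path),
            "-ac", "1", "-ar", "16000",
            "-acodec", "pcm_s16le", "-vn", str(output_path)]


# A hostile-looking filename is harmless when no shell is involved.
cmd = build_extract_command("my video; rm -rf ~.mp4", "out.wav")

# shlex.join renders a quoted, copy-pasteable string for logging only.
print(shlex.join(cmd))
```

Pass cmd itself to subprocess.run and use the joined string only for logs. The joined form is for humans, not for execution.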

Subprocess Deadlocks on Large Files

When you pipe ffmpeg output with subprocess.PIPE and do not read from the pipe, the buffer fills up and the process hangs. This is especially common with long videos that produce large amounts of audio data.

Use subprocess.run with capture_output=True instead of the lower-level subprocess.Popen with manual pipe management. The run function reads both stdout and stderr into memory, so no deadlock occurs. For very large files, consider writing to a temp file and reading it afterward instead of using pipes.
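When the output is too large to hold in memory, another option is to read the pipe in chunks as the child produces it. The following is a sketch under stated assumptions: stream_stdout is a hypothetical helper, stderr is discarded so that an unread stderr pipe cannot fill up, and a child Python process stands in for ffmpeg so the example runs without it.

```python
import subprocess
import sys


def stream_stdout(args, chunk_size=64 * 1024):
    """Yield stdout from a child process in chunks instead of buffering it all."""
    # stderr goes to DEVNULL so an unread stderr pipe cannot fill up and block the child.
    with subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL) as proc:
        while True:
            chunk = proc.stdout.read(chunk_size)
            if not chunk:
                break
            yield chunk
    if proc.returncode != 0:
        raise RuntimeError(f"command exited with code {proc.returncode}")


# Stand-in for ffmpeg: a child Python process that writes 1,000,000 bytes.
demo = [sys.executable, "-c", "import sys; sys.stdout.buffer.write(b'\\0' * 1_000_000)"]
total = sum(len(chunk) for chunk in stream_stdout(demo))
print(total)  # 1000000
```

The trade-off is that discarding stderr loses ffmpeg's error message. Redirecting stderr to a file instead would keep it without reintroducing the deadlock.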

Cross-Platform Binary Paths

Your development machine is probably macOS or Linux. Your production server might run Amazon Linux, Ubuntu, or a Windows container. The ffmpeg binary path differs across these environments.

Use shutil.which("ffmpeg") at runtime to locate the binary regardless of the operating system. Never hardcode paths like /usr/local/bin/ffmpeg in your code.

Does the ffmpeg-python Library Solve the Binary Problem?

The ffmpeg-python library provides a fluent Python API for building ffmpeg command chains. It has 11k stars on GitHub and is widely used for complex filter graphs.

python

import ffmpeg

stream = ffmpeg.input("input.mp4")
stream = ffmpeg.output(stream, "output.wav", ac=1, ar="16000")
ffmpeg.run(stream)

This is cleaner than raw subprocess calls but it does not solve the fundamental problem. The library is a pure Python wrapper that generates ffmpeg CLI arguments and launches the ffmpeg binary as a subprocess. It still requires the ffmpeg binary to be installed and accessible in PATH.

The library's own documentation states: ffmpeg-python makes no attempt to download or install FFmpeg. You must install it separately with a package manager or download a prebuilt binary.

Can You Run FFmpeg in AWS Lambda?

AWS Lambda supports custom container images up to 10GB in size. You can install ffmpeg in a Docker image and run it as a Lambda function.

dockerfile

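FROM public.ecr.aws/lambda/python:3.12
# ffmpeg is not in the default Amazon Linux repositories, so this assumes a static
# ffmpeg build placed next to this Dockerfile and copies it into the image.
COPY ffmpeg /usr/local/bin/ffmpeg
RUN chmod +x /usr/local/bin/ffmpeg
COPY function.py ${LAMBDA_TASK_ROOT}
CMD ["function.handler"]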

This approach works for short audio clips but has hard limits. Lambda functions time out after 15 minutes. The /tmp directory has 512MB of storage by default, and it can be configured up to 10GB. Processing a 2-hour video file or a batch of 50 files can exceed these limits.
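As a rough check on the storage limit, the size of the extracted audio is simple arithmetic: 16,000 samples per second at 2 bytes each is 32,000 bytes per second. The sketch below estimates the WAV size for a given duration. The function name and the 44-byte header figure are assumptions for illustration, and ffmpeg may write a few extra metadata bytes.

```python
WAV_HEADER_BYTES = 44  # canonical PCM WAV header; ffmpeg may add a few metadata bytes


def wav_size_bytes(duration_seconds, sample_rate=16000, channels=1, bytes_per_sample=2):
    """Estimate the size of a PCM WAV file of the given duration."""
    return WAV_HEADER_BYTES + int(duration_seconds * sample_rate * channels * bytes_per_sample)


two_hours = 2 * 60 * 60
print(wav_size_bytes(two_hours))  # 230400044
```

Two hours of audio comes to roughly 230MB, so the WAV alone fits in 512MB. If you download the source video to /tmp first, that file is usually what pushes the job over the limit.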

Lambda is viable for on-demand extraction of short video clips under a few minutes. It is not a solution for batch processing, long videos, or high-throughput pipelines.

Can Cloudflare Workers or Vercel Functions Run FFmpeg?

Cloudflare Workers run on V8 isolates that cannot spawn operating system processes. You cannot call ffmpeg from a Worker at all. The ffmpeg.wasm library runs FFmpeg compiled to WebAssembly, but it executes in the browser, not on the server, and has limited codec support and poor performance.

Vercel Serverless Functions have a 60-second timeout and a 50MB function size limit, and the exact values depend on your plan. The ffmpeg binary alone is larger than 50MB on most platforms. Even if you could fit it, the timeout is too short for extracting audio from long videos.

Neither platform supports running ffmpeg reliably for production transcription pipelines.

What Is the Alternative to Running FFmpeg Yourself?

A hosted FFmpeg API accepts your ffmpeg commands over HTTP, runs them on dedicated infrastructure, and returns the output. You send the exact same ffmpeg arguments you would use locally. The API handles the compute, storage, and binary management.

Very Good FFmpeg provides a Python SDK for this workflow:

python

from very_good_ffmpeg import VGF

client = VGF("your-key")

result = client.run(
    input_files={"input.mp4": "https://example.com/video.mp4"},
    output_files=["output.wav"],
    ffmpeg_commands=[
        "-i {{input.mp4}} -ac 1 -ar 16000 -acodec pcm_s16le -vn {{output.wav}}"
    ],
    wait=True,
)

The API accepts public URLs or uploaded files. You get back a download URL for the extracted audio. The same ffmpeg commands work without modification.

Key advantages over self-hosted ffmpeg:

Factor	Local subprocess	AWS Lambda	Very Good FFmpeg
ffmpeg binary	Must install	Must bundle in container	Managed
Max runtime	Unlimited	15 min	6 hours
/tmp storage	Disk dependent	512 MB (default)	32 GB RAM
GPU support	If you have one	No	RTX 4090/5090
Batch processing	Manual	Concurrency limits	Unlimited
Pricing	Free (compute cost)	Per request + infra	Per GB processed

The per-GB pricing model means you pay for the data you process, not for idle server time. The first 2GB are free, then $0.50/GB dropping to $0.08/GB above 100GB. There is no monthly minimum and no credit expiry.

When Should You Use a Hosted FFmpeg API Instead of Local Subprocess?

Use local subprocess when you are prototyping on your development machine with a few short files. The setup is simple and there is no external dependency.

Switch to a hosted API when any of these conditions apply:

Your extraction jobs run longer than 15 minutes and hit Lambda timeouts.
You process batches of videos and need to scale horizontally.
You deploy to Vercel, Cloudflare Workers, or other platforms that cannot run ffmpeg.
You need GPU acceleration for the transcription step and do not want to manage GPU infrastructure.
You want to avoid maintaining ffmpeg versions across multiple environments.
Your pipeline needs to run reliably without devops intervention for binary updates.

A common pattern is to prototype locally with subprocess and deploy to production with the hosted API. The ffmpeg commands are identical in both environments.
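One way to keep the two environments in step is to define the flags once and derive both forms from them. This is a sketch, not the SDK's API: the helper names are hypothetical, and the only thing taken from the hosted example is the {{name}} placeholder syntax in its command string.

```python
EXTRACT_FLAGS = ["-ac", "1", "-ar", "16000", "-acodec", "pcm_s16le", "-vn"]


def local_argv(input_path, output_path):
    """Argument list for subprocess.run on a machine with ffmpeg installed."""
    return ["ffmpeg", "-y", "-i", input_path, *EXTRACT_FLAGS, output_path]


def hosted_command(input_name, output_name):
    """Command string using the {{name}} placeholders shown in the SDK example."""
    flags = " ".join(EXTRACT_FLAGS)
    return "-i {{" + input_name + "}} " + flags + " {{" + output_name + "}}"


print(hosted_command("input.mp4", "output.wav"))
# -i {{input.mp4}} -ac 1 -ar 16000 -acodec pcm_s16le -vn {{output.wav}}
```

With a single EXTRACT_FLAGS list, a change such as a different sample rate is made once and applies to both the local prototype and the hosted job.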

How Does Very Good FFmpeg Compare to Running FFmpeg Locally for Transcription Pipelines?

Very Good FFmpeg runs the same ffmpeg binaries under the hood. The difference is where and how they execute.

Local execution ties your pipeline to a specific machine. If the machine runs out of disk space, the pipeline fails. If you deploy to a new server without ffmpeg installed, the pipeline fails. If you process a 4K video that needs more RAM than your instance has, the pipeline fails.

A hosted API abstracts these concerns. Each job runs on dedicated hardware with 16 vCPUs, 32GB RAM, and NVMe storage. GPU jobs run on RTX 4090 or A5000 cards. The 6-hour runtime limit covers even the longest video files. Concurrent job limits are based on your prepaid balance, not on shared infrastructure.

For a transcription pipeline that processes user-uploaded videos, this means no queue backpressure, no timeout failures on long videos, and no GPU provisioning delays.

Verdict

Python subprocess with ffmpeg is the right way to extract audio from video for transcription. The canonical command ffmpeg -i input.mp4 -ac 1 -ar 16000 -acodec pcm_s16le -vn output.wav is reliable and well-understood.

The problem is not the command. It is the environment. Local development works until you deploy. Lambda works until you hit the 15-minute wall. Cloudflare Workers cannot run ffmpeg at all, and Vercel's limits make it impractical.

For production transcription pipelines, offload audio extraction to a hosted FFmpeg API. You keep the exact same ffmpeg commands and eliminate the binary dependency, timeout limits, and infrastructure management.

FAQ

What is the exact ffmpeg command to extract audio for Whisper?

Whisper uses ffmpeg -nostdin -threads 0 -i input.mp4 -f s16le -ac 1 -acodec pcm_s16le -ar 16000 - which pipes raw PCM to stdout. The equivalent file-based command is ffmpeg -i input.mp4 -ac 1 -ar 16000 -acodec pcm_s16le -vn output.wav.

Why do speech models need wav mono 16khz?

Speech recognition models are trained on audio with consistent properties, and mono 16kHz 16-bit PCM is the most common input format. Whisper, DeepSpeech, and Wav2Vec2 work with 16kHz mono input, and many Kaldi recipes use it too.

What does exit code 127 mean when running ffmpeg from Python?

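Exit code 127 means a shell could not find the ffmpeg binary. If you call subprocess.run with a list of arguments and no shell, the same problem appears as a FileNotFoundError. Run shutil.which("ffmpeg") in Python to verify ffmpeg is installed and accessible in the PATH.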

Can I use ffmpeg-python without installing ffmpeg?

No. ffmpeg-python is a wrapper that generates ffmpeg CLI commands and calls them with subprocess. The ffmpeg binary must be installed on the system.

How do I extract audio from video in AWS Lambda?

Package an ffmpeg binary in a custom container image and deploy it as a Lambda function. Note the 15-minute timeout and the 512MB default /tmp size.

Can Cloudflare Workers run ffmpeg?

No. Cloudflare Workers cannot spawn operating system processes. The ffmpeg.wasm library runs in the browser, not on Cloudflare's server runtime.

What is the best way to extract audio at scale?

Use a hosted FFmpeg API that accepts the same commands as local ffmpeg. This avoids binary management, timeout limits, and infrastructure scaling.

Is Very Good FFmpeg compatible with Whisper?

Yes. You send the exact ffmpeg command for audio extraction to the API and receive the output WAV file. The Whisper model processes the result the same way as a locally extracted file.

How much does hosted ffmpeg cost for audio extraction?

Very Good FFmpeg charges per GB of data processed. The first 2GB are free. After that, $0.50/GB with volume discounts to $0.08/GB above 100GB. A 10-minute 1080p video is roughly 50-100MB, so extraction costs pennies.

Should I use subprocess or ffmpeg-python for my project?

Use subprocess directly for simple extraction tasks. Use ffmpeg-python if you need complex filter graphs with multiple inputs and outputs. Both require the ffmpeg binary.

References

Whisper audio.py: https://github.com/openai/whisper/blob/main/whisper/audio.py
Whisper README (ffmpeg dependency): https://github.com/openai/whisper
Stack Overflow: exit code 127 with subprocess and ffmpeg: https://stackoverflow.com/questions/59408588
Stack Overflow: Python extract wav from video: https://stackoverflow.com/questions/26741116
Stack Overflow: f-string formatting bug with ffmpeg: https://stackoverflow.com/questions/52197883
ffmpeg-python GitHub: https://github.com/kkroening/ffmpeg-python
ffmpeg.wasm docs: https://ffmpegwasm.netlify.app/docs/overview
ffmpeg Audio Options: https://ffmpeg.org/ffmpeg.html#Audio-Options
Reddit: mono 16kHz standard for speech preprocessing: https://www.reddit.com/r/audioengineering/comments/lub4yq
AWS blog: running ffmpeg on Lambda using container images: https://aws.amazon.com/blogs/media/running-ffmpeg-on-aws-lambda-using-container-images/
Very Good FFmpeg docs: https://verygoodffmpeg.com/docs
Very Good FFmpeg site: https://verygoodffmpeg.com
