If you build software that processes video, you will eventually need three operations: stitch clips together (concatenation), change dimensions (resize), and cut segments (trim). These are the building blocks of any video editing pipeline.
A handful of APIs offer these operations: Shotstack, Creatomate, Cloudinary, Mux, AWS Elemental MediaConvert, and hosted FFmpeg services like Very Good FFmpeg. Each takes a different approach. Some wrap video editing in JSON templates. Some use URL parameters. Some give you raw FFmpeg commands.
This guide compares each API specifically for concatenate, resize, and trim operations. It covers pricing, capability depth, and where each option falls short. If you already know FFmpeg, one of these options will feel like home.
Key Takeaways
- Most capable for all three operations: A hosted FFmpeg API. You get the full FFmpeg filter graph. Concat demuxer, scale filter, trim filter, all in one command. No syntax lock-in.
- Easiest for URL-based trim and resize: Cloudinary. Simple URL parameters. No server-side code. Concatenation is limited to video overlays.
- Best for streaming clipping workflows: Mux. Frame-accurate clipping from stored assets. No concatenation or resize for output files.
- Best for template-driven teams: Shotstack or Creatomate. Good if you need JSON-based timelines or template automation. Expensive at volume. Limited filter control.
- Cheapest at scale: Hosted FFmpeg API with usage-based pricing. Per-minute APIs cost more as batch volume grows.
- Bottom line: If you can write FFmpeg, a hosted FFmpeg API does all three operations without compromise. If you need templates or a visual editor, pick Shotstack or Creatomate and accept the trade-offs.
What does "video editing API" mean in this comparison?
For this guide, a video editing API is a REST API that takes source video files and returns edited output. The three specific operations are:
- Concatenate -- join multiple video clips end-to-end into a single file.
- Resize -- change the width and height of a video (also called scaling).
- Trim -- cut a segment from a video by start and end timestamps (also called clipping).
We are not comparing live streaming APIs, playback SDKs, or video player infrastructure. Mux appears in this guide because it offers clipping, but its core focus is streaming. The comparison stays on the three operations.
Why do developers need concat, resize, and trim in an API?
Three common pipelines drive demand for these operations:
Social media content pipelines. Trim a long recording into highlights. Resize each clip to 1080x1080 (Instagram square), 1920x1080 (YouTube), and 1080x1920 (TikTok). Concatenate multiple highlights into a compilation. All three operations run in sequence.
User-generated content platforms. A user uploads a video. The platform trims the first and last few seconds, resizes the result to a uniform resolution, and concatenates multiple uploads into a single reel. Each step must happen automatically through an API.
Batch marketing asset production. A team needs every video resized to 16:9, trimmed to 60 seconds, and concatenated with a branded intro. Running hundreds of these jobs requires an API that handles all three operations efficiently.
How does FFmpeg handle concat, resize, and trim?
FFmpeg is the open-source foundation beneath most video editing APIs. Understanding its primitives helps you evaluate how much control each API really gives you.
| Operation | FFmpeg method | How it works |
|---|---|---|
| Concatenate | concat demuxer or concat filter | Demuxer joins files with identical codecs without re-encoding. Filter joins files with different codecs by decoding and re-encoding through the filter graph; inputs still need a common resolution, so scale them first. |
| Resize | scale filter | scale=1920:1080 sets exact dimensions. scale=-2:720 sets height and auto-calculates width to preserve aspect ratio. scale=iw/2:ih/2 halves dimensions. Supports expressions. |
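| Trim | -ss/-t seek or trim filter | Input seeking (-ss before -i) is fast. When re-encoding, FFmpeg decodes from the previous keyframe and discards frames up to the requested position, so the cut is accurate. With stream copy (-c copy) the cut is fast but only accurate to keyframes. The trim filter always re-encodes and is frame-accurate. |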
Any hosted API that runs arbitrary FFmpeg commands supports all three operations out of the box. Proprietary APIs wrap FFmpeg under the hood but restrict what you can express.
Which video editing API handles concatenation (joining clips)?
Concatenation is the hardest of the three operations for proprietary APIs. Joining video files with different codecs, resolutions, or bitrates requires careful handling. Each API approaches it differently.
| API | Concat method | Handles different codecs? | Transitions? | Notes |
|---|---|---|---|---|
| Very Good FFmpeg | concat demuxer + concat filter | Yes (filter re-encodes) | Via FFmpeg | Full control. Same as local FFmpeg. |
| Cloudinary | fl_splice layer parameter | No (requires same codecs) | No | Spliced as overlay. Limited for multi-clip joins. |
| Mux | None | N/A | N/A | Mux does not support concatenation. |
| Shotstack | Timeline clips array | Yes (re-encodes) | Yes | JSON timeline with sequential clips. Requires knowing clip lengths. |
| Creatomate | RenderScript sequence | Yes (re-encodes) | Yes | Template-based sequencing. Not raw concatenation. |
| AWS MediaConvert | Input stitching (multiple inputs per job) | Yes (re-encodes) | No | Inputs are joined in job order, each with optional input clips. Configured through job settings, not an editing-style API. |
Does Shotstack concatenate like FFmpeg?
Shotstack uses a JSON timeline where you define an array of clips that play sequentially. It handles transitions between clips and supports audio mixing. You specify the start and length of each clip in seconds.
The trade-off is abstraction cost. Shotstack charges $0.20 to $0.30 per minute of rendered output. A 10-minute concatenated video costs $2.00 to $3.00 in rendering fees. The same operation on a hosted FFmpeg API costs a fraction of that because you pay for compute, not output duration.
Shotstack is easier for developers who want a high-level timeline model. But if you already know FFmpeg's concat demuxer, the timeline abstraction adds complexity without benefit.
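The timeline model described above comes down to computing a start offset for every clip. The sketch below shows that bookkeeping in Python. The field names (`timeline`, `tracks`, `clips`, `asset`, `trim`, `start`, `length`, `output`) follow the Shotstack edit format as best understood here and should be treated as assumptions to verify against the Shotstack documentation; the example URLs are placeholders.

```python
def sequential_timeline(clips):
    """Lay clips end-to-end on a single track.

    `clips` is a list of (source_url, trim_seconds, length_seconds) tuples.
    The JSON field names are assumptions based on Shotstack's edit format.
    """
    timeline_clips = []
    cursor = 0.0  # start time of the next clip on the timeline, in seconds
    for src, trim, length in clips:
        timeline_clips.append({
            "asset": {"type": "video", "src": src, "trim": trim},
            "start": cursor,
            "length": length,
        })
        cursor += length  # the next clip begins where this one ends
    return {
        "timeline": {"tracks": [{"clips": timeline_clips}]},
        "output": {"format": "mp4", "resolution": "hd"},
    }


edit = sequential_timeline([
    ("https://example.com/a.mp4", 10, 20),  # placeholder URLs
    ("https://example.com/b.mp4", 5, 15),
])
```

This is why the timeline approach requires knowing clip lengths up front: every `start` value depends on the lengths of all clips before it.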
Can Cloudinary splice videos end-to-end?
Cloudinary's fl_splice parameter appends one video to another in a transformation URL. It works for simple two-clip joins. The syntax is part of the video overlay system, which means each clip is treated as a layer rather than a segment in a sequence.
The limits appear quickly. You cannot apply per-clip encoding settings. Chaining more than two or three clips requires nested transformation URLs that become unreadable. There is no support for handling codec mismatches between clips.
Cloudinary concat is fine for quick social video joins. It falls apart for production pipelines with more than a few clips.
Which API offers the best resize and scale capabilities?
Resizing is the most widely supported operation across all APIs. The difference is in how much control you get over scaling behavior, aspect ratio preservation, and integration with other operations in the same pipeline.
| API | Resize method | Aspect ratio preserve? | Expressions? | Chain with trim/concat? |
|---|---|---|---|---|
| Very Good FFmpeg | -vf scale= | Yes (-2 syntax) | Yes (math expressions) | Yes, same filter graph |
| Cloudinary | c_scale, c_fill, c_pad | Yes (auto-gravity) | No (fixed params) | Separate URL calls |
| Mux | Playback transforms only | Limited | No | No output-file resize |
| Shotstack | Output width/height | Yes (fit/crop/stretch) | No | Implicit (same render job) |
| Creatomate | Template resolution | Per-element scale | No | Same render job |
| AWS MediaConvert | Output presets | Yes (various) | No | Job template |
Does Cloudinary's URL-based resize beat FFmpeg's scale filter for simplicity?
Cloudinary wins on simplicity for one-off resize operations. A URL like https://res.cloudinary.com/demo/video/upload/c_scale,w_400/sample.mp4 resizes a video without any backend code or API call. You can drop it into a <video> tag and it works.
FFmpeg's scale filter requires an API call. But it wins for anything beyond simple resize. You can express scale=iw/2:ih/2 to halve dimensions, scale=-2:360 to target a height with auto-width, or scale=1920:1080:force_original_aspect_ratio=decrease to fit inside 1920x1080 without distortion (add a pad filter to letterbox to the exact frame size). You can also chain resize with trim and concat in the same filter graph, something Cloudinary cannot do in a single operation.
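To make the URL-parameter style concrete, here is a small Python sketch that builds Cloudinary delivery URLs for a resize and for a trim. It uses only the parameters named in this guide (`c_scale`, `w_`, `so_`, `eo_`); the helper names and the `demo`/`sample` values are illustrative, and the exact parameter rules should be checked against Cloudinary's documentation.

```python
BASE = "https://res.cloudinary.com/{cloud}/video/upload/{transform}/{public_id}.mp4"


def resize_url(cloud, public_id, width):
    """URL that scales the video to the given width (c_scale, w_)."""
    return BASE.format(cloud=cloud, transform=f"c_scale,w_{width}", public_id=public_id)


def trim_url(cloud, public_id, start, end):
    """URL that keeps the segment between start and end offsets, in seconds (so_, eo_)."""
    return BASE.format(cloud=cloud, transform=f"so_{start},eo_{end}", public_id=public_id)


print(resize_url("demo", "sample", 400))
print(trim_url("demo", "sample", 10, 30))
```

The first call reproduces the example URL quoted above, which is the whole appeal: the transformation lives in the URL, with no request body to assemble.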
Can you resize from one resolution to multiple outputs in one job?
A hosted FFmpeg API can. Use command chaining to produce 1080p, 720p, and 480p versions from one source in a single request:
{
  "input_files": { "input.mp4": "https://.../source.mp4" },
  "output_files": ["1080p.mp4", "720p.mp4", "480p.mp4"],
  "ffmpeg_commands": [
    "-i {{input.mp4}} -vf scale=1920:1080 {{1080p.mp4}}",
    "-i {{input.mp4}} -vf scale=1280:720 {{720p.mp4}}",
    "-i {{input.mp4}} -vf scale=854:480 {{480p.mp4}}"
  ]
}

Cloudinary, Shotstack, and Creatomate require separate jobs or render calls for each resolution.
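When the list of renditions changes often, the request body can be generated instead of written by hand. The following Python sketch builds a payload in the same shape as the JSON example above. The field names (`input_files`, `output_files`, `ffmpeg_commands`) and the `{{name}}` placeholder syntax are taken from that example; the helper names and the source URL are hypothetical.

```python
import json


def ref(name):
    """Wrap a file name in the {{...}} placeholder syntax used by the API example."""
    return "{{" + name + "}}"


def rendition_payload(source_url, renditions):
    """Build one request body that produces one output per (width, height) rendition."""
    outputs = []
    commands = []
    for name, (width, height) in renditions.items():
        output = f"{name}.mp4"
        outputs.append(output)
        commands.append(
            f"-i {ref('input.mp4')} -vf scale={width}:{height} {ref(output)}"
        )
    return {
        "input_files": {"input.mp4": source_url},
        "output_files": outputs,
        "ffmpeg_commands": commands,
    }


payload = rendition_payload(
    "https://example.com/source.mp4",  # placeholder URL
    {"1080p": (1920, 1080), "720p": (1280, 720), "480p": (854, 480)},
)
print(json.dumps(payload, indent=2))
```

Keeping widths and heights even (854 rather than 853) avoids encoder errors with H.264, which needs dimensions divisible by 2 in the common 4:2:0 pixel format.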
Which video editing API handles trimming best?
Trimming looks easy until you need frame accuracy, fast seek for large files, or the ability to trim from a source that has not been pre-uploaded. Each API handles a different slice of these requirements.
| API | Trim method | Frame accurate? | Fast seek? | Trim without pre-upload? |
|---|---|---|---|---|
| Very Good FFmpeg | -ss/-t or trim filter | Yes (re-encode) | Yes (-ss before -i) | Yes (input URL direct) |
| Mux | start_time/end_time on asset | Yes | Yes | No (must exist as asset) |
| Cloudinary | so_/eo_ URL params | No (keyframe) | Yes | Yes (source URL in transform) |
| Shotstack | trim on clip object | Yes (re-encode) | Not exposed | Yes (input URL) |
| Creatomate | trim in RenderScript | Yes | Not exposed | Yes (input URL) |
| AWS MediaConvert | Input clip in job config | Yes | Yes | No (S3 source) |
Is Mux clipping good enough for most trim use cases?
Mux clipping is the best trim experience if your workflow is: upload video to Mux, then clip from the stored asset. The clipping API is a single POST with start_time and end_time in seconds. It is frame-accurate and fast.
The limit is that Mux clipping is a standalone operation. You cannot trim and concatenate in one step. You cannot trim and resize in one step. Each clip generates a new Mux asset, and there is no API to combine those assets. If your pipeline is trim-only, Mux works well. If you need trim followed by resize or concat, you need an additional tool.
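As a rough illustration of that single POST, the Python sketch below builds the request body for a clip. The `start_time` and `end_time` fields come from this guide; the `input` array and the `mux://assets/` URL form are assumptions based on Mux's clipping guide and should be confirmed there. The function only builds the body and does not call the API.

```python
def mux_clip_body(asset_id, start_time, end_time):
    """Request body for creating a clip from an existing Mux asset.

    The overall shape is an assumption; only start_time and end_time
    are taken from the text above.
    """
    if start_time < 0 or end_time <= start_time:
        raise ValueError("expected 0 <= start_time < end_time")
    return {
        "input": [
            {
                "url": f"mux://assets/{asset_id}",  # assumed reference format
                "start_time": start_time,
                "end_time": end_time,
            }
        ]
    }


body = mux_clip_body("ASSET_ID", 10.0, 30.0)  # ASSET_ID is a placeholder
```

Each call of this kind produces one new asset, which is the reason a trim-then-concat pipeline needs a second tool.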
Can you trim and concatenate in a single API call?
A hosted FFmpeg API can. Use the filter graph to trim each input then concatenate them in one command:
-i {{input1.mp4}} -i {{input2.mp4}} -filter_complex \
"[0:v]trim=start=10:end=30,setpts=PTS-STARTPTS[clip1]; \
[1:v]trim=start=5:end=20,setpts=PTS-STARTPTS[clip2]; \
[clip1][clip2]concat=n=2:v=1:a=0[out]" \
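-map "[out]" {{output.mp4}}

This example joins video only (a=0), so the output has no audio; carrying audio through needs matching atrim and asetpts chains and a=1 in the concat filter. Shotstack achieves the same via timeline -- each clip has a trim property and clips play sequentially. But the cost scales with output duration. Cloudinary and Mux cannot do this in one operation. Creatomate can sequence trimmed clips in a RenderScript but requires template setup.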
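The filter graph above follows a regular pattern, so it can be generated for any number of inputs. The Python sketch below is one way to do that, with an optional audio path using `atrim` and `asetpts`. The function name and the output labels are illustrative choices, not part of any API.

```python
def trim_concat_graph(segments, with_audio=False):
    """Build a -filter_complex string that trims each input, then concatenates.

    `segments` is a list of (start, end) times in seconds, one per input,
    in input order. Output pads are labelled [outv] and, with audio, [outa].
    """
    chains = []
    pads = []  # concat expects pads interleaved per segment: v0, a0, v1, a1, ...
    for i, (start, end) in enumerate(segments):
        chains.append(
            f"[{i}:v]trim=start={start}:end={end},setpts=PTS-STARTPTS[v{i}]"
        )
        pads.append(f"[v{i}]")
        if with_audio:
            chains.append(
                f"[{i}:a]atrim=start={start}:end={end},asetpts=PTS-STARTPTS[a{i}]"
            )
            pads.append(f"[a{i}]")
    audio_streams = 1 if with_audio else 0
    outputs = "[outv][outa]" if with_audio else "[outv]"
    chains.append(
        f"{''.join(pads)}concat=n={len(segments)}:v=1:a={audio_streams}{outputs}"
    )
    return ";".join(chains)


graph = trim_concat_graph([(10, 30), (5, 20)])
print(graph)
```

With `with_audio=True` the command also needs `-map "[outv]" -map "[outa]"`, and every input must actually contain an audio stream.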
How do these APIs compare head-to-head for concat + resize + trim?
| API | Concat | Resize | Trim | Chain all three? | Pricing model |
|---|---|---|---|---|---|
| Very Good FFmpeg | Full FFmpeg demuxer + filter | scale filter, any expression | trim filter, seek | Yes, one command | $0.50/GB usage |
| Cloudinary | fl_splice (limited) | c_scale, c_fill, auto-gravity | so/eo URL params | Separate URL calls | Credit + transformation |
| Mux | None | Playback only | start_time/end_time | No chaining | Storage + delivery |
| Shotstack | Timeline clips | Output width/height | Clip trim property | Single render job | $0.20-0.30/min |
| Creatomate | RenderScript sequence | Template resolution | trim in script | Single render job | ~14 credits/min 720p |
| AWS MediaConvert | Input stitching | Output presets | Input clip config | Job templates | Per-job transcoding |
Which API is cheapest for batch concat + resize + trim?
A hosted FFmpeg API wins on batch pricing. You pay for the compute and storage of input and output bytes. A job that concatenates three clips, resizes the result to 1080p, and trims the first 30 seconds runs as one FFmpeg command on a single machine. You pay once.
Shotstack and Creatomate charge per minute of output. A 30-second trimmed clip costs the same per-minute rate whether your source is 1 minute or 1 hour. At batch scale, per-minute pricing adds up fast.
Cloudinary charges per transformation plus storage. A concat + resize + trim pipeline counts as multiple transformations, each consuming credits. The cost is less predictable than usage-based compute pricing.
Mux has no batch concat or resize, so it is not comparable for this use case.
| Scenario | Very Good FFmpeg | Shotstack | Cloudinary |
|---|---|---|---|
| 100 videos, concat + resize to 720p + trim to 30s | ~$2-$5 total (usage) | ~$10-$15 total ($0.20-$0.30/min x 0.5 min x 100) | ~$5-$15 total (credits per transform) |
| 10,000 videos, same pipeline | ~$200-$500 total | ~$1,000-$1,500 total | ~$500-$1,500 total |
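The Shotstack column in the table follows directly from the per-minute rate quoted earlier. A short Python sketch of that arithmetic, using the $0.20 to $0.30 per minute range from this guide and assuming every output is exactly 30 seconds long:

```python
def per_minute_batch_cost(videos, output_seconds, rate_per_minute):
    """Total cost when billing is by minutes of rendered output."""
    return videos * (output_seconds / 60) * rate_per_minute


for count in (100, 10_000):
    low = per_minute_batch_cost(count, 30, 0.20)
    high = per_minute_batch_cost(count, 30, 0.30)
    print(f"{count} videos: ${low:,.0f} to ${high:,.0f}")
```

Because the formula depends only on output duration, a plain two-clip join and a multi-layer edit of the same length cost the same under this model.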
When should I pick a hosted FFmpeg API over proprietary video editing APIs?
A hosted FFmpeg API like Very Good FFmpeg is the right choice when:
You already know FFmpeg commands. There is no learning curve. The API accepts the same flags, filters, and syntax you use on the command line. Your existing FFmpeg knowledge transfers directly.
You need to chain concat + resize + trim in one job. This is the killer feature. No other API lets you write a filter graph that trims clip A, scales clip B, concatenates them, and outputs a single file. You do it in one FFmpeg command, one API call, one bill.
You want usage-based pricing without monthly minimums. Very Good FFmpeg charges $0.50/GB for the first 10 GB, dropping to $0.10/GB between 10-100 GB and $0.08/GB above 100 GB. The first 2 GB are free. There is no monthly subscription. If you process nothing in a month, you pay nothing.
You want full control over codecs, bitrate, CRF, and pixel format. FFmpeg exposes every encoding parameter. You can target specific H.264 profiles, set custom CRF values, choose between software and hardware encoding, and control pixel formats. Proprietary APIs expose a subset of these options.
You want zero vendor lock-in. Your FFmpeg commands work anywhere FFmpeg runs. If you switch hosts or self-host, your code moves with you. No JSON schema dependency, no timeline abstraction to port.
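The tiered pricing mentioned above is easy to misread, so here is a Python sketch of one interpretation. It uses the rates quoted in this guide and assumes that the free 2 GB counts toward the first 10 GB tier and that tiers apply to cumulative usage in a billing period. Both assumptions should be checked against the current pricing page.

```python
FREE_GB = 2.0  # assumption: the free allowance sits inside the first tier
TIERS = [  # (upper bound of tier in GB, price per GB)
    (10.0, 0.50),
    (100.0, 0.10),
    (float("inf"), 0.08),
]


def usage_cost(total_gb):
    """Cost in dollars for total_gb of usage under the assumed tier rules."""
    cost = 0.0
    tier_start = 0.0
    for tier_end, rate in TIERS:
        billable_start = max(tier_start, FREE_GB)  # skip the free allowance
        billable_end = min(total_gb, tier_end)
        if billable_end > billable_start:
            cost += (billable_end - billable_start) * rate
        tier_start = tier_end
    return cost


for gb in (1, 10, 50, 200):
    print(gb, "GB ->", round(usage_cost(gb), 2))
```

Under these assumptions 10 GB costs $4.00 (8 billable GB at $0.50) and 50 GB costs $8.00, since the extra 40 GB falls in the $0.10 tier.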
When should I avoid a hosted FFmpeg API?
You need a visual template editor or no-code workflow. Creatomate and Shotstack offer template builders and drag-and-drop timelines. A hosted FFmpeg API requires writing commands.
You only need streaming and playback infrastructure. Mux handles upload, encoding, delivery, and player SDKs in one platform. Hosted FFmpeg APIs handle encoding only. You need separate infrastructure for delivery.
You want URL-only transformations with zero backend code. Cloudinary's URL-based transforms are the simplest possible interface for resize and trim. Adding a hosted FFmpeg API means writing server-side code to make API calls.
You need broadcast-grade encoding features. AWS MediaConvert includes closed captions, HDR, Dolby Vision, and advanced audio. Hosted FFmpeg APIs cover standard broadcast codecs but not every enterprise feature.
Common problems when building a concat + resize + trim pipeline
Codec mismatch during concatenation
The FFmpeg concat demuxer requires all input files to have the same codec, resolution, and stream layout. If your clips come from different sources, the job fails or produces a file with broken playback.
Solution: use the concat filter instead of the demuxer. The filter works on decoded frames, so the source codecs do not have to match. Resolution still has to match across inputs, so add a scale step in front of each input that differs. It is slower (re-encoding is compute-intensive), but it works with any combination of source codecs.
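A pipeline can make this choice automatically if it already knows each clip's properties. The Python sketch below picks a method and, for the demuxer case, builds the list file and the local FFmpeg arguments. The clip dictionaries are a hypothetical shape (for example, values read from ffprobe), and how a hosted API accepts a list file is not covered here.

```python
def pick_concat_method(clips):
    """Return 'demuxer' when all clips share codec and resolution, else 'filter'."""
    signatures = {(c["codec"], c["width"], c["height"]) for c in clips}
    return "demuxer" if len(signatures) == 1 else "filter"


def concat_list(paths):
    """Contents of a concat demuxer list file: one file directive per line.

    Paths containing a single quote would need escaping; not handled here.
    """
    return "".join(f"file '{p}'\n" for p in paths)


def concat_demuxer_args(list_path, output):
    """FFmpeg arguments for a stream-copy join driven by a list file."""
    return ["-f", "concat", "-safe", "0", "-i", list_path, "-c", "copy", output]


clips = [
    {"path": "a.mp4", "codec": "h264", "width": 1920, "height": 1080},
    {"path": "b.mp4", "codec": "hevc", "width": 1280, "height": 720},
]
print(pick_concat_method(clips))
print(concat_list([c["path"] for c in clips]), end="")
```

A production check would also compare frame rate, pixel format, and audio parameters, since any of those can break a stream-copy join.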
Aspect ratio breaks after resize
Setting scale=1920:1080 on a vertical video stretches the content. Setting scale=1080:1920 on a horizontal video distorts it the same way, because scale with two fixed dimensions ignores the source aspect ratio.
Solution: use the -2 expression in the scale filter to preserve aspect ratio. scale=-2:720 sets the height to 720 pixels and auto-calculates an even width that maintains the original aspect ratio. For a fixed output frame, add force_original_aspect_ratio=decrease and a pad filter to letterbox, or force_original_aspect_ratio=increase and a crop filter to fill the frame.
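When the output frame size is fixed (for example 1080x1920 for vertical video), the two usual strategies are "fit and pad" and "fill and crop". The Python sketch below builds both filter chains from standard FFmpeg filters (`scale`, `pad`, `crop`); the function names are illustrative.

```python
def fit_and_pad(width, height):
    """Scale to fit inside the frame, then pad to the exact size (letterbox)."""
    return (
        f"scale={width}:{height}:force_original_aspect_ratio=decrease,"
        f"pad={width}:{height}:(ow-iw)/2:(oh-ih)/2"
    )


def fill_and_crop(width, height):
    """Scale to cover the frame, then crop the overflow from the center."""
    return (
        f"scale={width}:{height}:force_original_aspect_ratio=increase,"
        f"crop={width}:{height}"
    )


print(fit_and_pad(1080, 1920))
print(fill_and_crop(1080, 1920))
```

Either string can be passed to `-vf`. Fit-and-pad keeps the whole picture and adds bars; fill-and-crop keeps the frame full and loses the edges.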
Trim is not frame-accurate
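Using -ss before -i (fast seek) together with stream copy (-c copy) is fast because FFmpeg jumps to the nearest keyframe and copies packets from there without decoding. But the cut point may be off by up to several seconds depending on the keyframe interval.
Solution: re-encode the cut. Either keep -ss before -i and drop -c copy, so FFmpeg decodes from the keyframe and discards frames up to the requested position, or use the trim filter: -i {{input.mp4}} -vf trim=start=10:end=30,setpts=PTS-STARTPTS. The setpts step resets timestamps so the output starts at zero, and audio needs the matching atrim and asetpts filters. The trade-off is that re-encoding is slower than fast seek with stream copy.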
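The two trim strategies differ only in their arguments, so a pipeline can expose them as a speed-versus-accuracy switch. The Python sketch below builds both argument lists for a local FFmpeg run; the function names are illustrative, and the accurate variant assumes the input has an audio stream.

```python
def fast_trim_args(src, start, duration, dst):
    """Keyframe-aligned cut: input seeking plus stream copy, no re-encode."""
    return ["-ss", str(start), "-i", src, "-t", str(duration), "-c", "copy", dst]


def accurate_trim_args(src, start, end, dst):
    """Frame-accurate cut: trim/atrim filters, which force a re-encode."""
    video = f"trim=start={start}:end={end},setpts=PTS-STARTPTS"
    audio = f"atrim=start={start}:end={end},asetpts=PTS-STARTPTS"
    return ["-i", src, "-vf", video, "-af", audio, dst]


print(" ".join(fast_trim_args("in.mp4", 10, 20, "fast.mp4")))
print(" ".join(accurate_trim_args("in.mp4", 10, 30, "exact.mp4")))
```

Note that the fast variant takes a duration (`-t`) while the filter variant takes an absolute end time, so a 10-to-30-second cut is `-t 20` in one and `end=30` in the other.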
Pricing shock from per-minute APIs
Shotstack and Creatomate charge per minute of rendered output. A 10-minute video costs the same whether it is a simple concatenation of two clips or a complex multi-layer edit. At volume, per-minute pricing becomes expensive for simple operations.
Solution: use a usage-based hosted FFmpeg API for simple concat/resize/trim operations. The cost correlates with file sizes and compute time, not output duration. Reserve per-minute APIs for complex template-driven edits where their abstraction adds value.
Runtime limits on long videos
AWS Lambda caps at 15 minutes. Cloud Run caps at 60 minutes. If you need to concatenate or trim long videos, these limits force you to chunk the work.
Solution: use a hosted FFmpeg API with a 6-hour runtime limit. Very Good FFmpeg supports up to 6 hours per job, which covers full feature-length films. No chunking, no multi-job orchestration.
So what is the best video editing API for concatenate, resize, and trim in 2026?
For raw power and flexibility: a hosted FFmpeg API like Very Good FFmpeg. You get all three operations in one command. The full FFmpeg filter graph. No JSON schema to learn. Usage-based pricing with no monthly minimum. This is the right choice if you know FFmpeg or want the least locked-in option.
For quick URL-based trim and resize: Cloudinary. Drop-in URL parameters. No backend code. But concatenation is limited and chaining multiple operations requires separate URL calls.
For streaming and clipping workflows: Mux. Frame-accurate clipping from stored assets. Fast and reliable. But it only clips. No concatenation. No output-file resize.
For template-driven teams who need a visual editor: Shotstack or Creatomate. Both offer JSON timelines, template builders, and integrations with no-code platforms. Both are more expensive at scale and limit your access to the underlying encoder.
For enterprise broadcast workflows: AWS MediaConvert. Broadcast codecs, DRM, HDR, and compliance certifications. But it is not a video editing API for concat, resize, and trim. It is a transcoding service.
Final recommendation: If you can write FFmpeg, use a hosted FFmpeg API. It is the only option that does all three operations without compromise. If you cannot write FFmpeg and need templates or a visual editor, pick Shotstack or Creatomate and accept the higher cost and limited control.
Frequently asked questions about video editing APIs
Can I concatenate videos of different resolutions?
Yes, with the FFmpeg concat filter combined with scale in the same filter graph. A scale step brings each input to a common resolution, and the concat filter then joins the re-encoded streams. Most proprietary APIs require pre-normalized inputs or fail on mismatched resolutions.
Does Mux support concatenation?
No. Mux only supports clipping individual assets via start_time and end_time parameters. There is no API to join two Mux assets into one video.
Is Cloudinary good for batch video processing?
Cloudinary works for simple batch transforms like resize-all or trim-all. It is not good for batch pipelines that involve concatenation or multi-step filter chains. Each transform counts as a separate credit cost and requires separate URL processing.
What is the cheapest video editing API for trimming?
A hosted FFmpeg API with usage-based pricing. Per-minute APIs like Shotstack and Creatomate cost more per trim operation at scale, even for short clips, because they charge by output duration rather than compute used.
Can I resize and trim in one API call?
Yes, with a hosted FFmpeg API. The command -i {{input.mp4}} -vf trim=start=10:end=30,setpts=PTS-STARTPTS,scale=1280:720 {{output.mp4}} does both in one filter chain for the video stream (audio needs a matching atrim filter). Cloudinary requires two separate transformation URLs. Shotstack and Creatomate handle it in a single render job but with higher per-minute cost.
Do I need to know FFmpeg to use a hosted FFmpeg API?
Yes. That is the trade-off. You get full power over encoding and filters, but you need to know FFmpeg syntax. If you are not comfortable writing FFmpeg commands, Shotstack or Creatomate offer JSON-based alternatives with a gentler learning curve.
How do I handle codec mismatch when concatenating?
Use the FFmpeg concat filter instead of the concat demuxer. The demuxer requires identical codecs. The filter decodes every input and encodes the joined result with a single codec, which handles codec mismatches at the cost of additional compute time. Inputs with different resolutions also need a scale step before the concat.
What are the runtime limits for hosted FFmpeg APIs?
Very Good FFmpeg supports up to 6 hours per job. Lambda-based FFmpeg wrappers are capped at 15 minutes. Cloud Run caps at 60 minutes. Choose based on the length of videos you need to process. For long videos or batch jobs, the 6-hour limit avoids chunking and orchestration complexity.
Does Cloudinary support concatenation?
Partially. Cloudinary's fl_splice parameter can join videos, but the implementation treats each additional video as an overlay layer rather than a sequential segment. It works for simple two-clip joins but does not support multi-clip concatenation with per-clip encoding control.
Which API is best for social media batch processing?
A hosted FFmpeg API with command chaining. You can trim each source, resize to multiple aspect ratios (square, landscape, portrait), and concatenate highlights in parallel commands within a single API request. No other API matches this throughput per job.