seedance-1-0-pro-fast
Fast, high-quality video generation up to 1080p from text prompts, with optional first-frame image control, using ByteDance's Seedance 1.0 Pro Fast model.
ByteDance builds video models the way a car manufacturer builds trim levels. The Seedance 1.x Pro series is three models that share DNA but target different points on the speed-quality-cost triangle. Seedance 1.0 Pro, launched in June 2025, is the original full-quality offering. Seedance 1.0 Pro Fast trades some refinement for dramatically quicker turnaround. Seedance 1.5 Pro, released in December 2025, improves on the original with better motion coherence and optional audio generation. All three produce up to 1080p video from text prompts or first-frame images, and all three are available through inference.sh as simple API calls.
I want to be upfront about positioning here. ByteDance released Seedance 2.0 in February 2026, which brings native audio-video joint generation, reference-to-video with up to 12 input assets, phoneme-perfect lip-sync in 8+ languages, and improved temporal consistency. If you're starting fresh and cost isn't the primary concern, 2.0 is the more capable family. But the 1.x series hasn't become irrelevant overnight. The pricing is lower on certain tiers, the latency profile on Pro Fast is hard to beat, and for workflows that don't need audio or character references, the 1.x models still deliver clean, usable video without paying for capabilities you won't use. There's a reason car companies keep selling last year's model at a discount.
how the three tiers relate to each other
Think of Seedance 1.0 Pro as the baseline. It runs through ByteDance's full diffusion pipeline at native resolution, producing video with strong spatial detail and reasonable motion fidelity. The generation time is what you'd expect from a full-pass model - not instant, but competitive with Wan 2.7 and similar architectures. It supports durations from 2 to 10 seconds, resolutions at 720p and 1080p, and accepts either pure text prompts or an image as a first-frame anchor.
Seedance 1.0 Pro Fast uses the same underlying architecture but applies inference optimizations - likely step reduction and possibly distillation - to cut generation time substantially. The visual quality takes a hit, though it's smaller than you might expect. For rapid prototyping sessions where you're iterating on prompts and need to see results quickly, Pro Fast is significantly cheaper than Pro, making it the obvious choice. You wouldn't use it for final output in a paid production, but for narrowing down creative direction it's more than adequate.
Seedance 1.5 Pro represents a genuine architectural step forward, built on a Dual-Branch Diffusion Transformer with 4.5 billion parameters - one branch handles video frames while the other processes audio waveforms, connected by a cross-modal joint module that keeps audio and video synchronized at the millisecond level. The motion quality improvement is visible in direct comparisons - cloth drapes more naturally, water flows with better physical plausibility, and human movement shows fewer of the uncanny interpolation artifacts that plagued 1.0. ByteDance also claims over 10x faster inference than the previous generation. Duration runs from 4 to 10 seconds, a higher floor than the 1.0 models. The pricing lands between the other two tiers, with audio generation roughly doubling the cost when enabled. That audio capability - having the model generate ambient sound, music, or speech with lip-sync support across 8+ languages including English, Chinese, Japanese, Korean, Spanish, and Portuguese - bridges a gap that previously required a separate model and a sync step.
text-to-video generation
The text-to-video path across all three Seedance 1.x models works identically from an interface perspective. You provide a prompt describing the scene you want, select your resolution and duration, and receive an MP4. The differences are in output quality, speed, and cost rather than workflow.
Prompt writing for Seedance follows the patterns established by other diffusion-based video models. Cinematic language works well. Describing camera movement, lighting direction, and subject action in concrete terms produces better results than vague descriptions. I've found that Seedance 1.x responds particularly well to descriptions of atmosphere and mood - "overcast morning light filtering through industrial windows" gets you something distinctly different from "bright daylight in a warehouse," and the model makes sensible creative choices about color temperature and shadow direction based on those cues.
The camera_fixed parameter deserves mention because it solves a specific frustration. Video models love to move the camera. They'll dolly, pan, orbit, and zoom even when your prompt describes a static scene. Setting camera_fixed to true tells the model to hold its ground, producing stable footage that works as establishing shots or talking-head backgrounds without the constant drift that makes generated video feel restless. It's a small control but it makes a meaningful difference in usability for certain applications.
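To make that concrete, here's a minimal sketch of a text-to-video request with the camera locked off, using the client pattern from the api reference below. The field names (prompt, resolution, duration, camera_fixed) follow the input schema at the end of this page; the exact value formats are assumptions based on those descriptions.

```python
from inferencesh import inference

client = inference()

# text-to-video with a locked-off camera; field names follow the
# input schema in the api reference below, value formats assumed
result = client.run({
    "app": "bytedance/seedance-1-0-pro-fast",
    "input": {
        "prompt": (
            "overcast morning light filtering through industrial windows, "
            "dust drifting slowly in the air, static wide shot"
        ),
        "resolution": "1080p",
        "duration": 5,
        "camera_fixed": True,  # hold the camera still; no dolly or pan
    },
})

print(result["output"])  # the generated video output
```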
Duration ranges vary by tier. The original 1.0 Pro supports 2 to 10 seconds. Pro Fast matches that range. Seedance 1.5 Pro raises the minimum to 4 seconds while keeping the 10-second maximum, though quality at the upper bound remains stable enough that you rarely feel forced to truncate. The higher floor on 1.5 suggests ByteDance found that very short clips weren't utilizing the model's temporal coherence capabilities well - two seconds isn't enough time for the improved motion modeling to demonstrate its advantages.
image-to-video and first-frame control
All three Seedance 1.x models accept an optional image input that serves as the first frame of the generated video. This is where the models transition from creative exploration to production tool. When you need the output to match existing visual material - a brand's color palette, a specific character design, a product photograph - starting from a controlled first frame eliminates the randomness that makes pure text-to-video unreliable for commercial work.
The image-to-video mode maintains strong fidelity to the source frame's composition, color, and subject appearance while adding motion based on your text prompt. You provide the image and describe what should happen: "the woman turns to face the camera and smiles," "the car begins driving forward along the highway," "the leaves start blowing in the wind." The model handles the temporal progression while keeping the visual identity locked to your reference.
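A sketch of the same call in image-to-video mode, reusing the client setup from the earlier example. The image field name comes from the files example in the api reference; local paths are uploaded automatically by the sdk, and the file path here is a placeholder.

```python
from inferencesh import inference

client = inference()

# the first-frame image locks composition and identity;
# the prompt describes the motion to add on top of it
result = client.run({
    "app": "bytedance/seedance-1-0-pro-fast",
    "input": {
        "image": "/path/to/product-shot.png",  # local path, auto-uploaded
        "prompt": "the car begins driving forward along the highway",
        "resolution": "720p",
        "duration": 5,
    },
})

print(result["output"])
```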
One practical note: the quality of your input image matters more here than prompt sophistication. A sharp, well-lit photograph with clear subject separation from background will produce cleaner animation than a noisy or compositionally ambiguous source. The model struggles to invent detail that isn't present in the first frame, so starting from high-quality input material pays dividends in the output.
when each tier makes sense
The cost structure across the Seedance 1.x family follows video token pricing, calculated from resolution, frame rate, and duration. Pro Fast is the cheapest tier, Seedance 1.5 Pro sits in the middle, and the original 1.0 Pro is the most expensive per token. Notably, Seedance 1.5 Pro offers better quality than 1.0 at a lower price point - ByteDance clearly wants to migrate users forward without penalizing them financially. The audio surcharge doubles the cost for a feature you might not use. Leave it off unless you have a specific reason to include it.
The decision framework is straightforward. Use Pro Fast during ideation when you're testing prompts and exploring creative directions. Switch to 1.5 Pro for final renders where quality matters. Use 1.0 Pro only if you have a specific reason to prefer its output characteristics over 1.5 - and honestly, I haven't found one yet. The 1.5 improvements are consistent enough that it should be your default for production work within the 1.x family.
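If you drive more than one tier from the same pipeline, a thin selection helper keeps the stage-to-tier mapping in one place. Note that only the Pro Fast slug appears on this page; the slugs for 1.0 Pro and 1.5 Pro below are assumptions patterned on it, so verify them against the catalog before use.

```python
# app slugs for the other tiers are hypothetical, patterned on the
# pro-fast id; verify against the inference.sh catalog before use
SEEDANCE_TIERS = {
    "draft": "bytedance/seedance-1-0-pro-fast",  # cheap, fast iteration
    "final": "bytedance/seedance-1-5-pro",       # assumed slug
    "legacy": "bytedance/seedance-1-0-pro",      # assumed slug
}

def pick_tier(stage: str) -> str:
    """Route draft work to Pro Fast and production renders to 1.5 Pro."""
    return SEEDANCE_TIERS.get(stage, SEEDANCE_TIERS["draft"])
```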
For context against competitors: Wan 2.7 offers more modes (text, image, reference, editing) and longer maximum durations. Veo targets higher fidelity at higher cost. Seedance 2.0 adds audio and reference-to-video natively. The 1.x series competes on simplicity and cost-effectiveness for straightforward text-to-video and image-to-video workflows where you don't need the extended feature sets of newer models.
motion quality and known limitations
Seedance 1.0 Pro produces motion that's competent but occasionally mechanical. Repetitive actions like walking or waving sometimes slip into looping patterns where the same few frames repeat with minor variation rather than progressing naturally. Seedance 1.5 Pro reduces this substantially - the temporal modeling is clearly improved - but doesn't eliminate it entirely on longer clips.
Fast motion remains challenging across the entire family. Camera pans, quick subject movement, and action sequences introduce blur and frame inconsistency. This isn't unique to Seedance - it's a limitation shared by most current diffusion-based video models - but it means you'll get best results from deliberate, controlled motion rather than energetic action.
Human anatomy is handled with reasonable accuracy for medium shots and wider framing. Close-ups of hands, detailed facial expressions during speech, and full-body dance sequences still expose generation artifacts. Fingers may merge or split. Facial symmetry drifts over time. Feet sometimes slide relative to the ground rather than planting convincingly. These failure modes improve generation over generation - 1.5 is noticeably better than 1.0 - but they're not solved.
Text in generated scenes is unreliable. Signs, screens, books, and any surface that should contain legible writing will produce garbled characters. Plan to add text in post-production rather than relying on the model to render it.
The maximum resolution of 1080p means no 4K output. For social media, web content, and internal production, this is fine. For broadcast or large-screen display, you'll need either upscaling in post or a model that supports higher native resolution.
when to choose 1.x over seedance 2.0
This is the honest question. If 2.0 exists and improves on everything, why would anyone use 1.x? A few reasons hold up under scrutiny.
Cost sensitivity matters for high-volume workflows. If you're generating hundreds of clips for A/B testing ad creative, the token pricing difference between 1.x Pro Fast and 2.0 adds up fast. The quality gap won't matter when you're screening for performance metrics rather than pixel perfection.
Simplicity has value. The 1.x models are text-in, video-out. No audio configuration to think about, no reference-to-video complexity, no additional modes to understand. If your pipeline just needs a video from a prompt, the simpler interface means fewer parameters to get wrong.
Latency on Pro Fast is genuinely quick for what it delivers. When generation speed is the bottleneck - interactive applications, real-time previews, user-facing generation features - Pro Fast's turnaround time is competitive with anything in its quality class.
That said, for new projects without legacy integration concerns, I'd default to Seedance 2.0 and only fall back to 1.x if you hit a specific constraint around cost, speed, or pipeline simplicity that 2.0 doesn't accommodate. The newer family is better in almost every measurable dimension.
frequently asked questions
what's the maximum video duration across the seedance 1.x models?
Seedance 1.0 Pro and Pro Fast both support 2 to 10 seconds of output video. Seedance 1.5 Pro narrows this range to 4 to 10 seconds. The higher floor on 1.5 reflects ByteDance's finding that very short clips don't showcase the improved temporal modeling. For clips under 4 seconds, Pro Fast with its 2-second minimum remains the cheapest option. None of the 1.x models match the 15-second maximum that Wan 2.7 offers, so plan your shot lengths accordingly.
does seedance 1.5 pro generate audio automatically or do I need to enable it?
Audio generation on Seedance 1.5 Pro is not enabled by default. The standard output is video-only at the base rate. You must explicitly request audio generation, which doubles the cost. The audio output includes ambient sound and can include speech if your prompt describes dialogue, but the quality and appropriateness of generated audio varies. For professional work, you'll likely still want to replace generated audio with properly produced sound in post, making the surcharge unnecessary for most workflows.
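As a sketch, enabling audio might look like the call below. Both the 1.5 Pro slug and the generate_audio field name are hypothetical placeholders for illustration; check the seedance-1-5-pro schema for the actual parameter.

```python
from inferencesh import inference

client = inference()

result = client.run({
    "app": "bytedance/seedance-1-5-pro",  # hypothetical slug for the 1.5 tier
    "input": {
        "prompt": "a busker plays violin on a rainy street, ambient city sound",
        "duration": 8,
        "generate_audio": True,  # hypothetical flag; roughly doubles the cost
    },
})
```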
how does seedance 1.x compare to wan 2.7 for general text-to-video?
Wan 2.7 offers a broader toolkit with four modes covering text-to-video, image-to-video, reference-to-video, and video editing. Its maximum duration extends to 15 seconds, and it includes features like driving audio input and multi-frame interpolation that Seedance 1.x lacks entirely. On pure text-to-video quality at comparable durations, the two families produce similar results with different aesthetic tendencies - Seedance leans slightly more photorealistic while Wan handles stylized content marginally better. The practical choice often comes down to which additional features you need beyond basic generation.
api reference
1. calling the api
install the client
the client provides a convenient way to interact with the api.
```bash
pip install inferencesh
```

setup your api key
set INFERENCE_API_KEY as an environment variable. get your key from settings → api keys.
```bash
export INFERENCE_API_KEY="inf_your_key"
```

run and get result
submit a request and wait for the final result. best for batch processing or when you don't need progress updates.
```python
from inferencesh import inference

client = inference()

result = client.run({
    "app": "bytedance/seedance-1-0-pro-fast",
    "input": {}
})

print(result["output"])
```

stream live updates
get real-time progress updates as the task runs. ideal for showing progress bars, partial results, or long-running tasks.
```python
from inferencesh import inference

client = inference()

# stream=True yields updates as they arrive
for update in client.run({
    "app": "bytedance/seedance-1-0-pro-fast",
    "input": {}
}, stream=True):
    if update.get("progress"):
        print(f"progress: {update['progress']}%")
    if update.get("output"):
        print(f"output: {update['output']}")
```

2. authentication
the api uses api keys for authentication. see the authentication docs for detailed setup instructions.
3. files
file inputs are automatically handled by the sdk. you can pass local paths, urls, or base64 data.
automatic upload
the python sdk automatically detects local file paths and uploads them. urls are passed through as-is.
```python
# local file paths are automatically uploaded
result = client.run({
    "app": "bytedance/seedance-1-0-pro-fast",
    "input": {
        "image": "/path/to/local/image.png",  # detected & uploaded
        "audio": "https://example.com/audio.mp3",  # url passed through
    }
})
```

4. webhooks
get notified when a task completes by providing a webhook url. when the task reaches a terminal state (completed, failed, or cancelled), a POST request is sent to your url with the task result.
```python
result = client.run({
    "app": "bytedance/seedance-1-0-pro-fast",
    "input": {},
    "webhook": "https://your-server.com/webhook"
}, wait=False)
```

webhook payload
your endpoint receives a JSON POST with the task result:
```json
{
  "id": "task_abc123",
  "status": 9,
  "output": { ... },
  "error": "",
  "session_id": null,
  "created_at": "2024-01-15T10:30:00Z",
  "updated_at": "2024-01-15T10:30:05Z"
}
```
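On the receiving side, here's a minimal endpoint sketch, assuming Flask. The field names follow the payload example above; the numeric status code mapping isn't documented here, so the handler keys off error and output instead.

```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/webhook", methods=["POST"])
def seedance_webhook():
    task = request.get_json()
    # terminal states are completed, failed, or cancelled; the numeric
    # status mapping isn't documented here, so inspect error/output
    if task.get("error"):
        print(f"task {task['id']} failed: {task['error']}")
    elif task.get("output") is not None:
        print(f"task {task['id']} finished: {task['output']}")
    return "", 200
```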
5. schema

input

- prompt - text prompt describing the video content and motion. be descriptive about actions and camera movements.
- image - optional first-frame image for image-to-video generation. if not provided, generates from text only.
- resolution - video resolution. 1080p for highest quality, 720p for balanced, 480p for fastest generation.
- duration - duration of the video in seconds (2-10 seconds).
- camera_fixed - whether to fix the camera position during video generation. set to true for static camera shots.