Lip-sync limitations and workarounds

Lip-Sync re-animates the speaker’s mouth to match the dubbed audio, but it doesn’t work on every shot. Knowing its limits saves you credits and frustration. This article covers where Lip-Sync struggles and what you can do about it.

Where Lip-Sync struggles

Lip-Sync depends on the AI seeing a face clearly on screen. It tends to produce weak results in these situations:

There isn’t enough footage of the face (less than 10 seconds of continuous, clear facial coverage). The AI can’t properly analyze and animate the lip movements.
The speaker isn’t facing the camera (profile shots, quick turns away). Face tracking can lose the mouth.
The face is small in the frame (wide shots, crowd scenes, group photos). Detail is too low for realistic mouth movement.
The mouth is partially hidden — microphones held close, hands in front of the face, hoodies, masks, heavy beards.
Very fast cuts or rapid head motion. The model has less time to re-animate between frames.
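
If you want a rough pre-check for the 10-second requirement, the idea can be sketched in Python. This is a minimal illustration, not Dubly’s implementation: it assumes you already have one true/false value per frame saying whether a clear face was detected (produced by whatever face detector you prefer), and the function names are hypothetical.

```python
def longest_face_run_seconds(face_visible, fps):
    """Return the longest continuous stretch, in seconds, with a face visible.

    face_visible: iterable of booleans, one per frame (assumed input format).
    fps: frames per second of the video.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    longest = current = 0
    for visible in face_visible:
        # Extend the current run on a face frame, reset it otherwise.
        current = current + 1 if visible else 0
        longest = max(longest, current)
    return longest / fps


def has_enough_face_coverage(face_visible, fps, min_seconds=10.0):
    """Check the run against the 10-second figure mentioned in this article."""
    return longest_face_run_seconds(face_visible, fps) >= min_seconds
```

For example, 300 consecutive face frames at 30 FPS make a 10-second run, which passes the check. A video with many short face shots can fail it even when the face is on screen for most of the total duration.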

What Dubly does automatically

You don’t need to mark anything. The pipeline:

Detects faces per segment. Segments with no face visible are passed through unchanged — Lip-Sync is applied only where a face is on screen.
Normalizes the video to H.264 MP4 at up to 1920px width and 30 FPS before lip-syncing, so mild encoding quirks in the source don’t break the pipeline.
Retries per-segment failures up to five times before marking that segment failed. Other segments continue independently.
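
To make the three steps above concrete, here is a simplified Python sketch. It is an illustration under stated assumptions, not Dubly’s actual code: the function names, the segment dictionaries, and the sync_one callback are hypothetical, the ffmpeg options are one plausible way to get H.264 MP4 at a capped width, and “up to five times” is interpreted here as five retries after the first attempt.

```python
def build_normalize_command(src, dst):
    """Build an ffmpeg argument list: H.264 MP4, width capped at 1920, 30 FPS.

    Assumption: the frame rate is set to 30 rather than only capped.
    """
    # min(1920,iw) never upscales; -2 keeps the aspect ratio with an even height.
    video_filter = "scale='min(1920,iw)':-2,fps=30"
    return [
        "ffmpeg", "-y", "-i", src,
        "-vf", video_filter,
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        dst,
    ]


def sync_segments(segments, sync_one, max_retries=5):
    """Lip-sync each segment independently and return a status per index.

    segments: list of dicts with a boolean "has_face" key (hypothetical shape).
    sync_one: callable that lip-syncs one segment and raises on failure.
    """
    results = {}
    for index, segment in enumerate(segments):
        if not segment["has_face"]:
            # No face on screen: leave the segment unchanged.
            results[index] = "passed_through"
            continue
        status = "failed"
        for _attempt in range(1 + max_retries):
            try:
                sync_one(segment)
                status = "synced"
                break
            except Exception:
                continue  # try this segment again
        # A failed segment does not stop the remaining segments.
        results[index] = status
    return results
```

The point of the per-segment loop is isolation: one difficult shot ends up marked as failed on its own, while the segments before and after it are still processed.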

What you can do

Before you run Lip-Sync:

Crop tighter on the speaker’s face if it’s small in the frame — a tighter composition gives the model more to work with.
Cut out non-speaking B-roll — if your video has long stretches where nobody is talking on camera, Lip-Sync isn’t adding value there anyway.
Avoid strong face occlusion or extreme angles where possible.
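
If you prepare your footage with ffmpeg, a tighter crop can be expressed as a crop filter. The Python sketch below is only an illustration: it assumes you already know the face’s bounding box in pixels (for example from an editor or a face detector), and the function name and the padding value are hypothetical choices, not Dubly recommendations.

```python
def crop_filter_for_face(frame_w, frame_h, face_box, padding=0.5):
    """Return an ffmpeg crop filter string framing the face with some margin.

    face_box: (x, y, width, height) of the face in pixels (assumed input).
    padding: extra margin on each side, as a fraction of the face size.
    """
    x, y, w, h = face_box
    pad_x = int(w * padding)
    pad_y = int(h * padding)
    # Clamp the padded box to the frame so the crop never leaves the image.
    left = max(0, x - pad_x)
    top = max(0, y - pad_y)
    right = min(frame_w, x + w + pad_x)
    bottom = min(frame_h, y + h + pad_y)
    # Round down to even sizes, which H.264 with yuv420p requires.
    crop_w = (right - left) // 2 * 2
    crop_h = (bottom - top) // 2 * 2
    return f"crop={crop_w}:{crop_h}:{left}:{top}"
```

For a 1920x1080 frame with a 200x200 face at position (800, 300), this returns crop=400:400:700:200, which you could pass to ffmpeg with -vf. Keep in mind that cropping changes the framing and aspect ratio of the final video, so it only makes sense when the tighter shot is acceptable for your content.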

When Lip-Sync fails completely

If Lip-Sync produces a “Failed” status on the dub page, the most common cause is no face detected anywhere in the video. Animation, motion graphics, product videos without presenters, or voice-over-only footage can’t be lip-synced.

Cost reminder

Lip-Sync is billed separately from dubbing at 1 credit per minute per subdub. If a video clearly won’t benefit from Lip-Sync (wide shots, no speaker on camera), skipping it for that language saves those credits. See When to Enable Lip-Sync for the decision framework.
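
As a quick way to estimate the spend, the rate works out to minutes multiplied by the number of subdubs. The Python sketch below is a rough calculator, not an official billing formula: the function name is hypothetical, it assumes each subdub is one target-language version of the dub, and it does not round, because this article does not say how partial minutes are billed.

```python
def estimate_lipsync_credits(duration_minutes, subdubs, credits_per_minute=1):
    """Estimate Lip-Sync credits: rate x video length x number of subdubs.

    No rounding is applied; how partial minutes are billed is not stated here.
    """
    if duration_minutes < 0 or subdubs < 0:
        raise ValueError("duration_minutes and subdubs must be non-negative")
    return credits_per_minute * duration_minutes * subdubs
```

For example, a 12.5-minute video with Lip-Sync enabled on 3 subdubs comes to about 37.5 credits, while enabling it on only 1 of them comes to about 12.5.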

Still having problems?

If your source video meets the quality bar above (clear mouth, good framing, stable lighting, consistent frame rate) and you’re still seeing problems, or Lip-Sync failed completely, contact our support team with:

The dub link
A timestamp of the problem segment
A short description of the problem

This helps us investigate whether it’s something we need to tune on our side.