Lip-sync limitations and workarounds
Lip-Sync re-animates the speaker’s mouth to match the dubbed audio, but it doesn’t work on every shot. Knowing its limits saves you credits and frustration. This article covers where Lip-Sync struggles and what you can do about it.
Where Lip-Sync struggles
Lip-Sync depends on the AI seeing a face clearly on screen. It tends to produce weak results in these situations:
- There isn’t enough footage of the face. Lip-Sync needs at least 10 seconds of continuous, clear facial coverage to analyze and animate the lip movements properly.
- The speaker isn’t facing the camera (profile shots, quick turns away). Face tracking can lose the mouth.
- The face is small in the frame (wide shots, crowd scenes, group photos). Detail is too low for realistic mouth movement.
- The mouth is partially hidden by microphones held close, hands in front of the face, hoodies, masks, or heavy beards.
- Very fast cuts or rapid head motion. The model has less time to re-animate between frames.
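The 10-second coverage requirement in the list above can be pre-checked before you spend credits. The sketch below is a minimal illustration, not Dubly's actual check. It assumes you already have a per-frame list of booleans saying whether a clear face was found (for example, from a face detector of your choice). The names `longest_face_run_seconds` and `has_enough_face_coverage` are hypothetical.

```python
# The 10-second minimum comes from the article; everything else is illustrative.
MIN_FACE_SECONDS = 10


def longest_face_run_seconds(face_visible, fps):
    """Return the longest continuous stretch, in seconds, of frames with a clear face.

    face_visible: per-frame booleans (assumed output of your own face detector).
    fps: frame rate of the video the flags were sampled from.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    longest = current = 0
    for visible in face_visible:
        # Extend the current run on a visible frame, reset it otherwise.
        current = current + 1 if visible else 0
        longest = max(longest, current)
    return longest / fps


def has_enough_face_coverage(face_visible, fps):
    """True if some continuous run of face frames lasts at least MIN_FACE_SECONDS."""
    return longest_face_run_seconds(face_visible, fps) >= MIN_FACE_SECONDS
```

A video that fails this check is a likely candidate for weak Lip-Sync results, although passing it does not guarantee a good result.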
What Dubly does automatically
You don’t need to mark anything. The pipeline:
- Detects faces per segment. Segments with no face visible are passed through unchanged — Lip-Sync is applied only where a face is on screen.
- Normalizes the video to H.264 MP4 at up to 1920px width and 30 FPS before lip-syncing, so mild encoding quirks in the source don’t break the pipeline.
- Retries per-segment failures up to five times before marking that segment failed. Other segments continue independently.
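The per-segment behavior described above can be pictured with a short sketch. This is not Dubly's implementation: `has_face` and `lip_sync` are hypothetical callables standing in for the face detector and the lip-sync step, and the code reads "up to five times" as five attempts in total, which is an assumption.

```python
MAX_ATTEMPTS = 5  # "up to five times", per the article; read here as total attempts


def process_segments(segments, has_face, lip_sync):
    """Return a dict mapping segment index to 'passthrough', 'synced', or 'failed'.

    has_face(segment) -> bool and lip_sync(segment) are hypothetical callables.
    lip_sync is expected to raise an exception when an attempt fails.
    """
    results = {}
    for index, segment in enumerate(segments):
        if not has_face(segment):
            # No face on screen: the segment is left unchanged.
            results[index] = "passthrough"
            continue
        results[index] = "failed"  # stays 'failed' unless an attempt succeeds
        for _attempt in range(MAX_ATTEMPTS):
            try:
                lip_sync(segment)
            except Exception:
                continue  # this attempt failed; try again
            results[index] = "synced"
            break
        # A failed segment does not stop the loop; later segments still run.
    return results
```

The point of the structure is that failure is tracked per segment, so one difficult shot does not prevent the rest of the video from being lip-synced.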
What you can do
Before you run Lip-Sync:
- Crop tighter on the speaker’s face if it’s small in the frame — a tighter composition gives the model more to work with.
- Cut out non-speaking B-roll — if your video has long stretches where nobody is talking on camera, Lip-Sync isn’t adding value there anyway.
- Avoid strong face occlusion or extreme angles where possible.
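If you prefer to crop outside a video editor, one option is ffmpeg's `crop` filter. The helper below is a sketch that only builds the command. It assumes ffmpeg is installed, that you have picked the crop box yourself, and that the source audio can be copied into the output container as-is. The function name and file names are hypothetical, and Dubly does not prescribe any particular tool.

```python
def build_crop_command(src, dst, x, y, width, height):
    """Build an ffmpeg argument list that crops src to a width x height box at (x, y)."""
    # Round down to even dimensions, which H.264 encoders commonly require.
    width -= width % 2
    height -= height % 2
    if width <= 0 or height <= 0:
        raise ValueError("crop box is too small")
    return [
        "ffmpeg", "-i", src,
        "-vf", f"crop={width}:{height}:{x}:{y}",  # crop=out_w:out_h:x:y
        "-c:a", "copy",  # keep the audio stream untouched
        dst,
    ]
```

To run it, pass the list to `subprocess.run(build_crop_command("talk.mp4", "talk_cropped.mp4", 400, 100, 960, 720), check=True)` and upload the cropped file instead of the original.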
When Lip-Sync fails completely
If Lip-Sync shows a “Failed” status on the dub page, the most common cause is that no face was detected anywhere in the video. Animation, motion graphics, product videos without presenters, and voice-over-only footage can’t be lip-synced.
Cost reminder
Lip-Sync is billed separately from dubbing at 1 credit per minute per subdub. If a shot clearly won’t benefit from Lip-Sync (wide shots, no speaker on camera), skipping Lip-Sync for that language saves you that credit spend. See When to Enable Lip-Sync for the decision framework.
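As a rough way to estimate the spend, the rate above can be turned into a small calculation. This is a sketch under assumptions: it treats each target language as one subdub, and it does not round, because the article does not say how partial minutes are billed. The function name is hypothetical.

```python
def estimate_lipsync_credits(duration_seconds, subdub_count, credits_per_minute=1.0):
    """Estimate Lip-Sync credits at a flat rate per minute per subdub.

    No rounding is applied; how partial minutes are billed is not stated
    in the article, so treat the result as an approximation.
    """
    if duration_seconds < 0 or subdub_count < 0:
        raise ValueError("duration and subdub count must be non-negative")
    minutes = duration_seconds / 60
    return minutes * subdub_count * credits_per_minute


# A 10-minute video lip-synced in 3 languages:
print(estimate_lipsync_credits(600, 3))  # 30.0
```

Dropping Lip-Sync for one of those three languages would bring the estimate down to 20 credits.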
Still having problems?
If your source video meets the quality bar above (clear mouth, good framing, stable lighting, consistent frame rate) and you’re still seeing problems, or Lip-Sync failed completely, contact our support team with:
- The dub link
- A timestamp of the problem segment
- A short description of the problem
This helps us investigate whether it’s something we need to tune on our side.