
Why Audio, Subtitles, and Video Can Be Out of Sync

When text is translated into another language, the length of each sentence changes, and so, usually, does the time it takes to say it. For example, a Chinese sentence and its English translation differ in length, and the two generally take different amounts of time to pronounce.

Chinese: 有多远滚多远 (yǒu duō yuǎn gǔn duō yuǎn) (Go as far away as you can)

English: Get out of here as far as you can!

Chinese: 滚远点 (gǔn yuǎn diǎn) (Go away)

Japanese: ここから出て行け (Koko kara dete ike). (Get out of here.)

If the original Chinese speech in a video lasts 2 seconds and the translated English speech lasts 4 seconds, the two will inevitably fall out of sync.

How to Synchronize Them (Sync Only, Regardless of Quality)

As mentioned above, suppose a sentence lasts 2 seconds before translation and 4 seconds after. If all you need is synchronization, regardless of speaking rate or video speed, you can simply speed the audio up 2x: its 4 seconds shrink to 2 and it lines up with the video. Alternatively, you can slow the video down, stretching the original 2-second clip to 4 seconds, which also aligns the two.
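The arithmetic behind both options is a single ratio: how many times longer the dubbed audio is than the original clip. The minimal Python sketch below spells it out for the 2-second/4-second example; the helper name required_ratio is hypothetical and is not part of the software.

```python
def required_ratio(original_s: float, dubbed_s: float) -> float:
    """How many times longer the dubbed audio is than the original clip."""
    return dubbed_s / original_s


original, dubbed = 2.0, 4.0
ratio = required_ratio(original, dubbed)

# Option A: speed the audio up by `ratio`; it now lasts as long as the clip.
audio_after = dubbed / ratio
# Option B: slow the video down by `ratio`; it now lasts as long as the audio.
video_after = original * ratio
video_playback_rate = 1 / ratio  # fraction of normal playback speed

print(ratio, audio_after, video_after, video_playback_rate)  # 2.0 2.0 4.0 0.5
```

The same ratio serves as the audio speed-up multiple in option A and as the video slowdown multiple in option B; only which track absorbs it differs.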

Audio Speed Up Implementation Details:

  1. In the software interface, select "Automatic Audio Acceleration" and deselect "Automatic Video Slowdown."
  2. Open the menu Tools-Options and set the maximum audio acceleration multiple to 100.

This achieves synchronization, but the drawback is obvious: the speaking rate varies from sentence to sentence, because each sentence is sped up by a different amount.
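To see why the speaking rate becomes uneven, the sketch below computes a separate speed-up for each sentence when only audio acceleration is enabled. The sentence durations are made-up illustrative values, the function name is hypothetical, and it assumes (not confirmed by the software) that sentences which already fit are left at normal speed.

```python
# Illustrative, made-up sentence durations in seconds: (original clip, dubbed audio)
segments = [(2.0, 4.0), (2.0, 2.5), (1.5, 4.5), (4.0, 2.0)]

MAX_AUDIO_SPEEDUP = 100.0  # "maximum audio acceleration multiple" set in Tools-Options


def audio_only_speedup(original_s: float, dubbed_s: float,
                       max_speedup: float = MAX_AUDIO_SPEEDUP) -> float:
    """Speed-up applied to one dubbed sentence when only audio acceleration is on."""
    if dubbed_s <= original_s:
        return 1.0  # assumption: audio that already fits is left at normal speed
    return min(dubbed_s / original_s, max_speedup)


speedups = [audio_only_speedup(o, d) for o, d in segments]
print(speedups)  # [2.0, 1.25, 3.0, 1.0]
```

With a cap as high as 100, practically every sentence fits its slot, but the four sentences end up at four different speaking rates.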

Video Slowdown Implementation Details:

  1. Deselect "Automatic Audio Acceleration" in the software interface and select "Automatic Video Slowdown."

  2. Open the menu Tools-Options and set the maximum video slowdown multiple to 20.

This also achieves alignment. The speaking rate stays constant, but the video is slowed by different amounts in different places, so its playback speed becomes uneven instead.
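The mirror-image calculation for video-only slowdown is sketched below: each clip gets its own playback rate, so the picture now speeds up and slows down from sentence to sentence. The durations are made-up illustrative values, the function name is hypothetical, and clips that already fit are assumed to play at normal speed.

```python
# Illustrative, made-up sentence durations in seconds: (original clip, dubbed audio)
segments = [(2.0, 4.0), (2.0, 2.5), (1.5, 4.5), (4.0, 2.0)]

MAX_VIDEO_SLOWDOWN = 20.0  # "maximum video slowdown multiple" set in Tools-Options


def video_only_playback_rate(original_s: float, dubbed_s: float,
                             max_slowdown: float = MAX_VIDEO_SLOWDOWN) -> float:
    """Playback rate of one clip (1.0 = normal) when only video slowdown is on."""
    if dubbed_s <= original_s:
        return 1.0  # assumption: clips that already fit play at normal speed
    slowdown = min(dubbed_s / original_s, max_slowdown)
    return 1.0 / slowdown  # e.g. a 2x slowdown means playing at 0.5x speed


rates = [video_only_playback_rate(o, d) for o, d in segments]
print(rates)  # [0.5, 0.8, 0.3333333333333333, 1.0]
```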

If you only want simple alignment and don't care about the result, you can use either of these two methods.

Better, More Acceptable Synchronization Methods

Neither method above is very practical: audio that plays too fast or video that plays too slowly is unacceptable and makes for a poor viewing experience. For better results, enable both "Automatic Audio Acceleration" and "Automatic Video Slowdown."

Specific steps:

  1. When using faster mode or openai mode, preferably choose the medium model or a larger one, and select "overall recognition".

  2. In the software interface, select "Automatic Audio Acceleration" and "Automatic Video Slowdown", and set a small overall acceleration value, such as 10%.

  3. Open the menu Tools-Options and set the maximum audio acceleration multiple to 1.8, meaning speech may be played at most 1.8 times its normal speed. You can change this to 2, 1.5, or any other value greater than 1.

  4. Open the menu Tools-Options and set the maximum video slowdown multiple to 2, meaning the video can be slowed down to as little as 0.5 times its normal speed. You can change this to 3, 5, or any other value greater than 1.

  5. Even after steps 1-4, some sentences may still be out of sync, because both multiples are capped. When a cap is reached and the audio still does not fit, no further adjustment is made and the audio simply runs late. In that case, you can continue by adjusting the subtitle-related options under Tools-Options.
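The sketch below shows how the two caps from steps 3 and 4 can interact and when audio ends up delayed. It is a hypothetical policy for illustration only, not necessarily how the software divides the work: it assumes audio is sped up first (up to max_audio), the video is then stretched to cover what remains (up to max_video), and any leftover overrun becomes a delay. The names SyncResult and combined_sync are made up, and the 10% overall acceleration from step 2 is not modeled.

```python
from dataclasses import dataclass


@dataclass
class SyncResult:
    audio_speedup: float   # multiple applied to the dubbed audio (1.0 = unchanged)
    video_slowdown: float  # multiple by which the clip is stretched (1.0 = unchanged)
    delay_s: float         # seconds the audio still overruns and is pushed later


def combined_sync(original_s: float, dubbed_s: float,
                  max_audio: float = 1.8, max_video: float = 2.0) -> SyncResult:
    """Hypothetical policy: speed up audio first, then slow the video, then delay."""
    ratio = dubbed_s / original_s
    if ratio <= 1.0:
        return SyncResult(1.0, 1.0, 0.0)        # audio already fits the clip
    if ratio <= max_audio:
        return SyncResult(ratio, 1.0, 0.0)      # audio speed-up alone is enough
    audio_s = dubbed_s / max_audio              # audio length at the audio cap
    video = audio_s / original_s                # stretch the clip still needs
    if video <= max_video:
        return SyncResult(max_audio, video, 0.0)
    # Both caps reached: the remaining overrun cannot be absorbed and is delayed.
    return SyncResult(max_audio, max_video, audio_s - original_s * max_video)


print(combined_sync(3.0, 4.5))  # SyncResult(audio_speedup=1.5, video_slowdown=1.0, delay_s=0.0)
print(combined_sync(1.0, 5.0))
```

Under these assumptions, a 1-second clip with 5 seconds of dubbing still ends up roughly 0.78 seconds late, since no combination of the two caps can absorb a ratio above 1.8 × 2 = 3.6.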

Is There a Perfect Synchronization Method?

Apart from manual intervention, such as simplifying the translation or adding transition scenes, a perfect method that can be automated by programs has yet to be found.

Using automation alone to guarantee, simultaneously and for videos of any length and translation and dubbing between any languages, that audio acceleration stays within an acceptable range, that video slowdown stays within an acceptable range, and that speech starts exactly when the speaker's mouth opens, seems to be an impossible task.