
Synchronizing dubbing, subtitles, and video has always been a technical challenge in video translation. Different languages have vastly different grammatical structures and speaking rates, so translating a sentence changes its character count and spoken length. As a result, the translated dubbing no longer matches the duration of the original audio, and the subtitles, audio, and video drift out of sync.

Specifically, the characters in the original video may have finished speaking while the dubbing is only halfway through; or the next sentence in the original video has already started while the dubbing is still reading the previous one.

Character Count Changes Due to Translation

For example, after translating the following Chinese sentences into English, the length and syllable count change significantly, and the audio duration changes with them:

  • Chinese: 得国最正莫过于明 (Dé guó zuì zhèng mò guò yú Míng)

  • English: There is no country more upright than the Ming Dynasty.

  • Chinese: 我一生都在研究宇宙 (Wǒ yīshēng dōu zài yánjiū yǔzhòu)

  • English: I have been studying the universe all my life.

  • Chinese: 北京圆明园四只黑天鹅疑被流浪狗咬死 (Běijīng Yuánmíngyuán sì zhī hēi tiān'é yí bèi liúlàng gǒu yǎo sǐ)

  • English: Four black swans in Beijing's Yuanmingyuan Garden suspected of being bitten to death by stray dogs.

As can be seen, after translating Chinese subtitles into English subtitles and dubbing them, the dubbing duration usually exceeds the original Chinese audio duration. To solve this problem, the following strategies are usually adopted:
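The core quantity behind all of these strategies is the ratio of dubbed duration to original duration, which determines how much speed-up would be needed. A minimal sketch (the function name and the 3x cap are illustrative; the cap mirrors the "Maximum audio acceleration" default described later):

```python
def required_tempo(original_s: float, dubbed_s: float, max_tempo: float = 3.0) -> float:
    """Return the playback-speed multiplier needed to fit the dubbed
    audio into the original time slot, capped at max_tempo."""
    if original_s <= 0:
        raise ValueError("original duration must be positive")
    return min(dubbed_s / original_s, max_tempo)

# A 1-second original line whose English dub runs 3 seconds needs 3x speed-up.
print(required_tempo(1.0, 3.0))  # 3.0
```

When the uncapped ratio exceeds the cap, speeding up the audio alone cannot close the gap, which is what motivates the combined strategies below.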

Several Coping Strategies

  1. Increase Dubbing Speed: In theory, as long as there is no upper limit on the speaking speed, it is always possible to match the audio duration with the subtitle duration. For example, if the original audio duration is 1 second and the dubbing duration is 3 seconds, increasing the dubbing speed to 300% will synchronize the two. However, this method makes the audio sound rushed and unnatural, and the overall effect is often unsatisfactory due to the fluctuating speed.

  2. Simplify Translation: Reduce the dubbing duration by shortening the translation. For example, translating "我一生都在研究宇宙 (Wǒ yīshēng dōu zài yánjiū yǔzhòu)" into the simpler "Cosmology is my life's work". Although this method works best, it requires modifying the subtitles sentence by sentence, which is very inefficient.

  3. Adjust Silence Between Subtitles: If there is silence between two subtitles in the original audio, you can reduce or remove some of the silence to bridge the duration difference. For example, if there is 2 seconds of silence between two subtitles in the original audio, and the translated first subtitle is 1.5 seconds longer than the original subtitle, the silence time can be shortened to 0.5 seconds, so that the dubbing time of the second subtitle is aligned with the original audio time. However, not all subtitles have enough silence between them to be adjusted, and the applicability of this method is limited.

  4. Remove Silence Before and After Dubbing: Usually, some silence is retained before and after dubbing. Removing this silence can effectively shorten the dubbing duration.

  5. Slow Down Video Playback: If simply speeding up the dubbing doesn't work well, you can consider combining it with slowing down the video playback. For example, suppose the original audio for a subtitle lasts 1 second, and the dubbing runs 3 seconds. We can shorten the dubbing to 2 seconds (a 1.5x speed-up) and simultaneously slow the corresponding video clip to half speed (extending it to 2 seconds), thereby achieving synchronization.
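Strategy 5 above amounts to splitting the duration gap between an audio speed-up (up to some cap) and a video slow-down covering the remainder. A hedged sketch of that split (function name and the 1.5x audio cap are illustrative, chosen to reproduce the 1 s / 3 s example):

```python
def split_adjustment(original_s: float, dubbed_s: float, max_audio_speed: float = 1.5):
    """Fit a dub of length dubbed_s into a clip of length original_s:
    first speed up the audio (capped at max_audio_speed), then slow the
    video so the clip stretches to the remaining audio length."""
    audio_speed = min(dubbed_s / original_s, max_audio_speed)
    new_audio_s = dubbed_s / audio_speed
    video_factor = new_audio_s / original_s  # >1 means slower playback
    return audio_speed, video_factor

# 1 s clip, 3 s dub: speed audio 1.5x -> 2 s, slow video 2x -> both 2 s.
print(split_adjustment(1.0, 3.0))  # (1.5, 2.0)
```

A lower audio cap shifts more of the burden onto the video slow-down, trading naturalness of the voice against noticeably slowed footage.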

The above methods each have their own advantages and disadvantages and cannot perfectly solve all problems. To achieve the best synchronization effect, manual fine-tuning is usually required, but this contradicts the goal of software automation. Therefore, video translation software usually integrates the above strategies, striving to achieve the best results.

Implementation in Video Translation Software

In software, these strategies are usually controlled through the following settings:

  • Main Interface Settings:


    The "Dubbing Speed" setting is used to globally accelerate the dubbing;

    The "Dubbing Acceleration" setting automatically speeds up the dubbing so its duration matches the subtitle duration;

    The "Video Slow Motion" setting is used to automatically slow down the video playback speed to match the dubbing duration;

    The "Video Extension" setting freezes the last frame of the clip until the dubbing finishes, for cases where the video ends before the dubbing does.

  • Advanced Options Settings (Menu Bar -- Tools/Options -- Advanced Options -- Subtitle Sound and Picture Alignment):


    Options such as "Remove trailing silence of dubbed audio", "Remove silence between two subtitles", and "Remove subtitle duration greater than dubbing duration" (i.e., trim the portion of a subtitle's time slot that exceeds its dubbing) give users finer control over the synchronization of subtitles and dubbing.

    In addition, "Maximum audio acceleration" (default 3x) and "Video slowdown factor" (default 20x) limit the degree of acceleration and deceleration, preventing audio distortion or video playback from being too slow.

  • Audio Compensation Shift Left:

    Due to precision limitations in the underlying technology (ffmpeg), even if synchronization is achieved at the beginning of the video, the dubbing may gradually fall behind the subtitle timeline as the video plays. The "Audio Compensation Shift Left" setting shifts the entire subtitle timeline to the left, effectively alleviating this drift, for example by removing the accumulated gaps between subtitles every 3 minutes.
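Under the hood, the acceleration and slow-down settings map naturally onto ffmpeg's `atempo` (audio tempo) and `setpts` (video timestamp) filters. A sketch of how the filter strings could be built, assuming ffmpeg's documented constraint that each `atempo` stage accepts factors between 0.5 and 2.0, so larger speed-ups are chained (the function names are illustrative, not the software's actual API):

```python
def atempo_chain(speed: float) -> str:
    """Build an ffmpeg audio-speed filter chain; each atempo stage only
    accepts factors in [0.5, 2.0], so larger speed-ups are chained."""
    parts = []
    while speed > 2.0:
        parts.append("atempo=2.0")
        speed /= 2.0
    parts.append(f"atempo={speed:g}")
    return ",".join(parts)

def slowdown_filter(factor: float) -> str:
    """Slow video playback by `factor` (e.g. 2.0 = half speed)."""
    return f"setpts={factor:g}*PTS"

print(atempo_chain(3.0))     # atempo=2.0,atempo=1.5
print(slowdown_filter(2.0))  # setpts=2*PTS
```

These strings would be passed to ffmpeg via `-filter:a` and `-filter:v` respectively when re-encoding a clip.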

By flexibly combining the above settings, video translation software can automate subtitle and dubbing synchronization as far as possible, improving translation efficiency.