
An ideal translated video has accurate, well-timed subtitles, a dubbed voice that matches the original, and perfect synchronization between subtitles, audio, and video.

This guide will walk you through the four key steps of video translation, providing best practice recommendations for each stage.

Step 1: Speech Recognition

  • Goal: Transcribe the audio from the video into a subtitle file in the original language.

  • Corresponding Control Element: "Speech Recognition" row image.png

  • Best Configuration (Non-Chinese):

    • Select faster-whisper (local)
    • Choose the large-v2, large-v3, or large-v3-turbo model (see the transcription sketch after this step)
    • Speech Segmentation Mode: Overall Recognition
    • Check Preserve Original Background Sound (more time-consuming)
  • Best Configuration (Chinese):

    • Select Ali FunASR
    • Speech Segmentation Mode: Overall Recognition
    • Check Preserve Original Background Sound (more time-consuming)
  • Best Configuration (Less Common Languages):

    • Select Gemini Large Model Recognition
  • Note: Processing speed will be extremely slow without an NVIDIA graphics card or if CUDA acceleration is not enabled. Insufficient video memory can lead to crashes.
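
For reference, the sketch below shows roughly what the faster-whisper (local) path with Overall Recognition does under the hood. It assumes the faster-whisper Python package is installed and a CUDA-capable NVIDIA GPU is available; "audio.wav" and "source.srt" are placeholder file names, and pyvideotrans performs all of this for you through the interface.

```python
# Hedged sketch: transcribe the extracted audio track with the large-v3 model.
from faster_whisper import WhisperModel


def to_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


# device="cuda" requires an NVIDIA GPU; use device="cpu" if none is available (much slower).
model = WhisperModel("large-v3", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.wav", vad_filter=True)

with open("source.srt", "w", encoding="utf-8") as srt:
    for i, seg in enumerate(segments, start=1):
        srt.write(f"{i}\n{to_srt_time(seg.start)} --> {to_srt_time(seg.end)}\n{seg.text.strip()}\n\n")
```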

Step 2: Subtitle Translation

  • Goal: Translate the subtitle file generated in the first step into the target language.

  • Corresponding Control Element: "Translation Channel" row image.png

  • Best Configuration:

    • Preferred: If you have a VPN and understand how to configure it, use the gemini-2.5-flash model (Gemini AI Channel) in Menu - Translation Settings - Gemini Pro.
    • Second Best: If you don't have a VPN or don't know how to configure a proxy, select DeepSeek in "Translation Channel."

    Gemini AI usage instructions: https://pyvideotrans.com/gemini.html (a minimal API sketch follows below)
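
For reference, the sketch below shows one way to call the gemini-2.5-flash model directly through the google-generativeai package. The API key, prompt wording, and subtitle lines are placeholders, and this is only an illustration of the idea; pyvideotrans manages its own prompts, batching, and proxy settings.

```python
# Hedged sketch: translate a few subtitle lines with gemini-2.5-flash.
import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-2.5-flash")

# Subtitle text as produced by Step 1 (placeholder lines).
source_lines = ["第一句字幕", "第二句字幕"]
prompt = (
    "Translate the following subtitle lines into English. "
    "Return one line per subtitle, in the same order:\n" + "\n".join(source_lines)
)

response = model.generate_content(prompt)
print(response.text)
```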

Step 3: Voice Dubbing

  • Goal: Generate dubbing based on the translated subtitle file.

  • Corresponding Control Element: "Dubbing Channel" row image.png

  • Best Configuration:

    • Edge-TTS: free and supports all languages (see the dubbing sketch after this step)
    • Chinese or English: F5-TTS/Index-TTS (local)
    • Japanese/Korean: CosyVoice (local)

    You need to install the corresponding F5-TTS/CosyVoice/clone-voice integration package. See the documentation: https://pyvideotrans.com/f5tts.html
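
For reference, the sketch below generates one dubbed clip per translated subtitle line with the free edge-tts package. The voice name and subtitle lines are placeholders; pick a voice that matches your target language, and note that pyvideotrans selects voices and handles timing for you.

```python
# Hedged sketch: dub translated subtitle lines with Edge-TTS.
import asyncio

import edge_tts


async def dub_line(text: str, outfile: str, voice: str = "en-US-AriaNeural") -> None:
    """Synthesize one subtitle line to an MP3 file."""
    communicate = edge_tts.Communicate(text, voice)
    await communicate.save(outfile)


async def main() -> None:
    translated = ["First translated subtitle.", "Second translated subtitle."]  # placeholders
    for i, line in enumerate(translated, start=1):
        await dub_line(line, f"line_{i:04d}.mp3")


asyncio.run(main())
```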

Step 4: Subtitle, Dubbing, and Video Synchronization

  • Goal: Synchronize the subtitles, dubbing, and video.
  • Corresponding Control Element: the Synchronization row
  • Best Configuration:
    • When translating from Chinese to English, you can adjust the Dubbing Speed value (e.g., 10 or 15) to speed up the dubbing because English sentences are typically longer.
    • Select the Dubbing Acceleration and Video Slowdown options to force subtitles, audio, and video into alignment (see the sketch after this list for the basic idea).
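
For reference, the sketch below illustrates the basic idea behind Dubbing Acceleration: if a dubbed clip runs longer than its subtitle slot, speed the audio up with ffmpeg's atempo filter so it fits. It assumes ffmpeg is on your PATH; the file names and durations are placeholders, and pyvideotrans combines this with video slowdown and other adjustments internally.

```python
# Hedged sketch: speed up a dubbed clip so it fits its subtitle time slot.
import subprocess


def fit_audio_to_slot(infile: str, outfile: str, audio_len: float, slot_len: float) -> None:
    """Speed up the audio by audio_len / slot_len (a single atempo pass accepts roughly 0.5-2.0)."""
    tempo = max(1.0, min(audio_len / slot_len, 2.0))
    subprocess.run(
        ["ffmpeg", "-y", "-i", infile, "-filter:a", f"atempo={tempo:.3f}", outfile],
        check=True,
    )


# Example: a 4.8 s dubbed clip must fit a 4.0 s subtitle slot -> tempo 1.2.
fit_audio_to_slot("line_0001.mp3", "line_0001_fit.mp3", audio_len=4.8, slot_len=4.0)
```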

Output Video Quality Control

  • The default output uses lossy compression. For lossless output, set Video Transcoding Loss Control to 0 under Menu - Tools - Advanced Options - Video Output Control area (a rough ffmpeg equivalent is sketched below).
  • Note: If the original video is not in MP4 format or uses embedded hard subtitles, video encoding conversion will cause some loss, but the loss is usually negligible. Improving video quality will significantly reduce processing speed and increase the output video size.
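
For reference, setting the loss control to 0 corresponds roughly to an ffmpeg re-encode at CRF 0, which is lossless for libx264. The sketch below assumes ffmpeg is on your PATH; the file names are placeholders and the exact flags pyvideotrans uses may differ.

```python
# Hedged sketch: re-encode the finished video at the highest quality (CRF 0).
import subprocess

subprocess.run(
    [
        "ffmpeg", "-y",
        "-i", "video_with_dub.mp4",
        "-c:v", "libx264", "-crf", "0", "-preset", "slow",  # CRF 0 = lossless, largest file
        "-c:a", "copy",                                     # keep the dubbed audio as-is
        "output_lossless.mp4",
    ],
    check=True,
)
```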