Understanding the Main Interface Options
As shown in the picture above, here's a breakdown of what each option does:
- Select Video: Choose the original video you want to translate. The video must contain human speech, and the audio should be clear with minimal background noise; otherwise, recognition accuracy will suffer. Videos without speech will not work, even if they already have subtitles, because the software generates subtitles by recognizing human speech. You can select multiple videos at once by holding down the Ctrl key, but the spoken language must be the same across all of them.
- Translation Channel: "FreeGoogle" and "Microsoft" can be used directly without requiring a proxy or configuration. Other translation channels either require a proxy (like Google, even for free use) or need configuration (like Baidu Translate, Tencent Translate, etc.). If you're unsure, it's best to choose "Microsoft" or "FreeGoogle".
- Original Language: Select the language spoken in the video. For example, if the video contains English speech, select "English" here.
- Target Language: Choose the language you want to translate the video into. For example, if you want to dub the video into Chinese and embed Chinese subtitles, select "Chinese Simplified" here.
- Network Proxy Address: If you're using services that are inaccessible in your region without a proxy (like Google or Gemini), you must enter your proxy address here. For example, if you are using V2Ray, enter `http://127.0.0.1:10809`. If you don't understand what a proxy is, leave this blank and don't use services that require one.
- Voice Cloning Channel: "edgeTTS" is free and doesn't require configuration, so it can be used directly. Other voice cloning channels require configuration or installation. If you are unsure, choose "edgeTTS".
- Voice Cloning Role: Select the speaker role. Different roles have different timbres. You need to select the target language first and then choose the role.
- Faster Mode: The mode used for recognizing human speech in the video. If you're unsure, keep the default "faster" mode.
- Tiny: The speech recognition model used. By default, only the "tiny" model for the "faster" mode is included. For higher accuracy, "medium" or a larger model is recommended. To use any other model in "faster" mode or "openai" mode, you need to download it to the `models` directory under the software directory. Download other models from https://github.com/jianchang512/stt/releases/tag/0.0. If you're just trying it out, choose "tiny"; no download is needed.
- Overall Recognition: Leave it as the default. No need to change.
- Embed Subtitles: The method for embedding subtitles into the video. "Soft subtitles" require player support to be displayed and won't show up in web browsers. "Hard subtitles" are always visible, regardless of where the video is played, including in web browsers.
- Video End: The dubbed audio may be longer than the original video. Selecting this option extends the video by 10 ms at the end until the dubbing is complete. Selecting this option is recommended.
- Dubbing Auto Speedup: The dubbing duration may be longer than the original speech duration. Selecting this option forces the software to speed up the speech to match the original length. The maximum speedup can be modified in the menu under "Tools/Advanced Settings -> Advanced Settings".
- Video Auto Slowdown: Selecting this option slows down the video so that it stays aligned with the audio and subtitles. The slowdown amount can also be controlled in the Advanced Settings menu.
- Keep Background Sound: Select this to retain the original background sounds (e.g., background music) in the video. Processing will be slower if this is selected, especially for large videos.
- CUDA Acceleration: If you have an NVIDIA graphics card on your Windows or Linux machine, you can use it to accelerate processing. Requires CUDA to be installed. See the installation tutorial at https://pyvideotrans.com/gpu.html.
- Clean Generated Files: If you're repeatedly processing the same video, select this to delete previously generated files before starting again.
- Shutdown After Completion: Whether to shut down the computer after the task is finished.
- Start Processing: After setting everything up, click to begin.
- Import Subtitles: If you want to use existing local subtitle files, click "Import". Once imported, these subtitles will be used directly, and the software will skip speech recognition.
- Dubbing Overall Speed: Adjusts the overall dubbing speed as a percentage relative to normal speed. For example, 10 means 10% faster than normal, and -10 means 10% slower.
- Volume +: Adjust the volume relative to the normal volume. Only effective under edgeTTS.
- Pitch +: Adjust the pitch relative to the normal pitch. Only effective under edgeTTS.
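
To make the Dubbing Overall Speed numbers more concrete, here is a minimal Python sketch of how a value such as 10 or -10 can be read as a speed multiplier, and what that does to the length of a dubbed clip. The function names are hypothetical and exist only for illustration; this is not the software's own code. The signed percentage string (for example `+10%`) is likewise an assumption about how such a value might be passed along to a TTS engine, not a documented detail of this software.

```python
def speed_multiplier(percent: float) -> float:
    """Turn a Dubbing Overall Speed value into a playback-rate multiplier.

    10 means 10% faster than normal, -10 means 10% slower, 0 means unchanged.
    """
    if percent <= -100:
        raise ValueError("a decrease of 100% or more leaves no playable speed")
    return 1.0 + percent / 100.0


def dubbed_duration(seconds: float, percent: float) -> float:
    """Estimate how long a clip lasts once the speed change is applied."""
    return seconds / speed_multiplier(percent)


def signed_percent(value: int) -> str:
    """Format an integer as a signed percentage string, e.g. 10 -> '+10%'.

    Assumption: this string form is illustrative only.
    """
    return f"{value:+d}%"


if __name__ == "__main__":
    # Show how a 6-second line of dubbing changes at each setting.
    for setting in (-10, 0, 10):
        seconds = round(dubbed_duration(6.0, setting), 2)
        print(setting, signed_percent(setting), seconds)
```

A positive value shortens the dubbed audio and a negative value lengthens it, so a small positive setting can help when the dubbing tends to run longer than the original speech.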