
This document lists the available translation, dubbing, and speech recognition channels, grouped into free and paid options.

It also recommends the best combinations for each usage environment (for example, with or without a VPN), so that you can find suitable tools for your situation.

Completely Free Solutions

Translation Channels

  • Without VPN or Proxy

    • Preferred: DeepSeek or Zhipu AI as the translation channel. Register a DeepSeek or Zhipu AI account, create an SK (Secret Key), and enter it under "DeepSeek or Zhipu AI" in the translation settings.
    • Secondary choice: Microsoft Translator.
  • With VPN or Proxy

    • Preferred: Gemini AI translation. Secondary choice: Google Translate.
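
The choice above can be read as a small lookup keyed on whether a proxy is available. The following is a minimal sketch, not code from the software: the function name `free_translation_channels` and the channel strings are hypothetical labels taken from this list.

```python
from typing import List


def free_translation_channels(has_proxy: bool) -> List[str]:
    """Return the free translation channels listed above, most preferred first.

    The strings are plain labels for illustration (an assumption), not
    identifiers used by the software.
    """
    if has_proxy:
        # With a VPN or proxy: Gemini first, then Google Translate.
        return ["Gemini", "Google Translate"]
    # Without a VPN or proxy: DeepSeek and Zhipu AI are equally preferred,
    # with Microsoft Translator as the fallback.
    return ["DeepSeek", "Zhipu AI", "Microsoft Translator"]


if __name__ == "__main__":
    print(free_translation_channels(has_proxy=False))
```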

Dubbing Channels

  • Default: edge-TTS, which is free, requires no setup, and supports all languages.
  • When the target language is Chinese, prefer dubbing channels such as GPT-SoVITS, F5-TTS, and CosyVoice.
  • For any other target language, use edge-TTS.

Speech Recognition Channels

  • When the video language is Chinese

    • Preferred: "Ali FunASR," Alibaba's FunASR series of Chinese models, which recognize Chinese better than Whisper.
    • Secondary choice: faster-whisper or openai-whisper (local), with the "large-v2" model selected and the speech segmentation mode set to "overall recognition."
    • For Chinese, Japanese, and Korean text, a line is split into a new subtitle every 20 characters by default; this limit can be changed as needed.
  • When the video language is English or other languages

    • Preferred: faster-whisper or openai-whisper (local), with the "large-v2" or "large-v3-turbo" model selected and the speech segmentation mode set to "overall recognition."
  • When the video language is a less common language

    • Preferred: Gemini large model recognition, with the speech segmentation mode set to "overall recognition."

Note: Gemini is not available in all countries. If it reports that the current country is not supported, switch to a different VPN node; Singapore or Japan nodes are recommended. Alternatively, use Google Translate.
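
The 20-character default for Chinese, Japanese, and Korean subtitles mentioned above can be illustrated with a fixed-width split. This is only a sketch under a stated assumption: the software's actual splitting logic is not described here and may also take punctuation or timing into account. The function name `split_subtitle` and the parameter `max_chars` are hypothetical.

```python
from typing import List


def split_subtitle(text: str, max_chars: int = 20) -> List[str]:
    """Split one line of CJK text into chunks of at most max_chars characters.

    Assumption: a plain fixed-width split by character count; the real tool
    may split differently.
    """
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


if __name__ == "__main__":
    # A 45-character line becomes three subtitles.
    chunks = split_subtitle("字" * 45)
    print([len(chunk) for chunk in chunks])  # [20, 20, 5]
```

Passing a different `max_chars` corresponds to changing the default limit.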

Completely Paid Solutions

For higher translation quality, consider a third-party paid API.

Translation Channels

  • OpenAI ChatGPT (latest models), Gemini, 302.AI, and Chinese domestic AI services (such as DeepSeek, Zhipu AI).

Dubbing Channels

  • AzureTTS, ByteDance Volcano Speech Synthesis, Elevenlabs.io, OpenAI-TTS.

Speech Recognition Channels

  • For Chinese videos, the preferred choice is ByteDance Volcano Subtitle Generation.
  • For videos in other languages, use faster-whisper or openai-whisper (local), or Deepgram.com.

Best Combination Without Using a VPN

  • Translation Channels: Chinese domestic AI services (such as DeepSeek, Zhipu AI), Microsoft Translator.
  • Dubbing Channels: AzureTTS, edge-TTS, GPT-SoVITS, F5-TTS, CosyVoice, QwenTTS.
  • Speech Recognition: faster-whisper or openai-whisper (local) with the "large-v2" or "large-v3-turbo" model, the speech segmentation mode set to "overall recognition," and "Chinese re-segmentation" checked.

Best Combination With No Restrictions on Paid Services or VPN Use

  • Translation Channels: OpenAI ChatGPT (latest models), Gemini, DeepSeek, Google Translate, Microsoft Translator.
  • Dubbing Channels: AzureTTS/edge-TTS, ByteDance Volcano Speech Synthesis, Elevenlabs.io, OpenAI-TTS, GPT-SoVITS, F5-TTS, CosyVoice, QwenTTS.
  • Speech Recognition: faster-whisper or openai-whisper (local) / ByteDance Volcano Subtitle Generation / Ali FunASR.

Simplest Combination (No Proxy, No Configuration Required)

  • Translation Channel: Microsoft Translator (if you have a VPN and know how to use it, you can choose Google Translate).
  • Dubbing Channel: edge-TTS.
  • Speech Recognition: faster-whisper (local).

Best Speech Recognition Channels for Chinese-Language Videos

  • ByteDance Volcano Subtitle Generation
  • Ali FunASR
  • faster-whisper (local, large-v2/large-v3-turbo model)
  • openai-whisper (local, large-v2/large-v3-turbo model)

Best Speech Recognition Channels for Videos in Other Languages

  • Gemini Large Model Recognition
  • faster-whisper (local, large-v2/large-v3-turbo model)
  • openai-whisper (local, large-v2/large-v3-turbo model)

Translation Channels Ranked by Quality (Best First)

  1. OpenAI ChatGPT (latest models) / Gemini
  2. Chinese domestic AI translation (such as DeepSeek, Zhipu AI)
  3. Google/DeepL
  4. Microsoft Translator/Tencent Translate/Baidu Translate
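
One way to apply the ranking above is to walk the tiers in order and take the first channel you actually have access to. The snippet below is a rough sketch under assumptions: `TRANSLATION_TIERS` and `best_available` are hypothetical names, the strings are plain labels rather than identifiers from the software, and tier 2 uses DeepSeek and Zhipu AI only because they are the domestic AI examples named earlier in this document.

```python
from typing import List, Optional, Set

# Tiers copied from the ranking above; an earlier tier means better expected quality.
TRANSLATION_TIERS: List[List[str]] = [
    ["OpenAI ChatGPT", "Gemini"],
    ["DeepSeek", "Zhipu AI"],  # examples of domestic AI translation
    ["Google Translate", "DeepL"],
    ["Microsoft Translator", "Tencent Translate", "Baidu Translate"],
]


def best_available(available: Set[str]) -> Optional[str]:
    """Return the highest-ranked channel found in `available`, or None."""
    for tier in TRANSLATION_TIERS:
        for channel in tier:
            if channel in available:
                return channel
    return None


if __name__ == "__main__":
    # With only these two configured, the tier-3 channel wins over tier 4.
    print(best_available({"Microsoft Translator", "DeepL"}))  # DeepL
```

Channels within a tier are treated as equivalent; the sketch simply returns whichever is listed first.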