GPT-SoVITS is an outstanding open-source multilingual text-to-speech (TTS) project that supports Chinese, English, Japanese, Korean, and other languages. Its main features include:

Zero-Shot Text-to-Speech (TTS): Generate speech quickly with just a 5-second voice sample.

Few-Shot TTS: Fine-tune the model with only 1 minute of training data to improve timbre similarity and naturalness.

Cross-Lingual Support: Synthesizes speech in languages other than that of the training dataset. Currently supported: English, Japanese, Korean, Cantonese, and Chinese.

GPT-SoVITS has been upgraded to version 2, with the following new features:

  1. Added support for Korean and Cantonese
  2. Optimized text front-end processing
  3. Expanded the base model's training data to 5,000 hours
  4. Improved synthesis quality when the reference audio is of low quality (such as network audio with high-frequency loss or muffled sound)

GPT-SoVITS User Manual: https://www.yuque.com/baicaigongchang1145haoyuangong/ib3g1e

The video translation software has integrated GPT-SoVITS v2. This article briefly explains how to download the GPT-SoVITS integration package and use it with the video translation software.

Downloading the Integration Package

Download the official GPT-SoVITS integration package to ensure compatibility. The APIs of third-party packages are not compatible with the official version and may cause errors in the video translation software.

Download address: https://www.yuque.com/baicaigongchang1145haoyuangong/ib3g1e/dkxgpiy9zb96hob4

Starting the API Service

Open the GPT-SoVITS folder, type cmd in the File Explorer address bar, and press Enter. In the terminal window that opens, run .\runtime\python api_v2.py to start the API service.

The service listens on port 9880 by default, so the API address to enter in the video translation software is http://127.0.0.1:9880.

The API service must be running whenever you use GPT-SoVITS from the translation software.
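A quick way to confirm that the service is listening before you switch to the translation software is to try a TCP connection to 127.0.0.1:9880. The sketch below uses only Python's standard library; is_api_listening is a hypothetical helper name, and a successful connection only shows that some process is listening on that port, not that it is GPT-SoVITS.

```python
import socket


def is_api_listening(host: str = "127.0.0.1", port: int = 9880, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper: it proves that *some* process is listening on the port,
    not that the process is the GPT-SoVITS API service.
    """
    try:
        # create_connection opens a TCP socket to (host, port).
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError and socket timeouts are both subclasses of OSError.
        return False
```

For example, print(is_api_listening()) should print True once api_v2.py is running with the default port.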

Configuration in the Video Translation Dubbing Software

1. Enter the API Address

Start the software, click Menu -> TTS Settings -> GPT-SoVITS, and enter http://127.0.0.1:9880 in the API text box.

Note: The default port is 9880. If you change the port, change the API address to match. When deploying locally, use 127.0.0.1 in the address, not 0.0.0.0.
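The two rules in this note (the port must match the one the service uses, and a local address should use 127.0.0.1 rather than 0.0.0.0) can be checked mechanically before you paste the address into the software. This is a minimal sketch using urllib.parse; check_api_address is a hypothetical helper, not part of GPT-SoVITS or the video translation software, and requiring the http:// prefix is an assumption based on the address format shown above.

```python
from urllib.parse import urlsplit


def check_api_address(address: str, expected_port: int = 9880) -> list[str]:
    """Return a list of problems found in an API address; an empty list means none found."""
    problems = []
    parts = urlsplit(address.strip())
    if parts.scheme != "http":
        problems.append("the address should start with http://")
    if parts.hostname == "0.0.0.0":
        problems.append("use 127.0.0.1 instead of 0.0.0.0 for a local deployment")
    try:
        port = parts.port  # None when the URL has no explicit port
    except ValueError:     # raised for a non-numeric or out-of-range port
        port = None
    if port != expected_port:
        problems.append(f"port {port} does not match the service port {expected_port}")
    return problems


print(check_api_address("http://0.0.0.0:9880"))
# prints: ['use 127.0.0.1 instead of 0.0.0.0 for a local deployment']
```

If you started api_v2.py on a different port, pass that port as expected_port.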

2. Fill in the Reference Audio

Note: The reference audio must be in WAV format and be 5-10 seconds long; otherwise, a "400 Client Error" will be reported.

The reference audio is the clip whose timbre GPT-SoVITS imitates during speech synthesis. Suppose you have an audio file 1.wav that is 5 seconds long and whose spoken content is "今天是个好天气,瓢泼大雨倾盆下" ("Today is a good day, it's raining cats and dogs"). Copy the file into the GPT-SoVITS folder, next to api_v2.py, and enter the corresponding line in the Reference Audio text box of the software: 1.wav#今天是个好天气,瓢泼大雨倾盆下#zh.
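To avoid the 400 error, you can check a reference file's length before entering it. The sketch below uses Python's built-in wave module and the 5-10 second bounds from the note above; reference_wav_duration and check_reference_wav are hypothetical helper names. The wave module only reads PCM WAV files, so some valid WAV variants (such as 32-bit float files) may be rejected by this check even if the service would accept them.

```python
import wave


def reference_wav_duration(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds.

    Raises wave.Error if the file is not a WAV file the standard library can parse
    (for example, an MP3 file renamed to .wav).
    """
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference_wav(path: str, min_s: float = 5.0, max_s: float = 10.0) -> None:
    """Raise ValueError if the reference audio is outside the required length."""
    duration = reference_wav_duration(path)
    if not (min_s <= duration <= max_s):
        raise ValueError(f"{path}: {duration:.2f}s is outside {min_s}-{max_s}s")
```

For example, check_reference_wav("1.wav") returns silently for a 5-second clip and raises ValueError for a 3-second one.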

Language codes: zh for Chinese, en for English, ja for Japanese, and ko for Korean.

If you keep all reference audio files in a wavs folder inside the GPT-SoVITS directory, the reference audio entry becomes wavs/1.wav#今天是个好天气,瓢泼大雨倾盆下#zh.
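The reference audio entry has three parts separated by #: the audio path (relative to the folder containing api_v2.py), the transcript of the audio, and a language code. A minimal parser that validates this structure is sketched below. ReferenceEntry and parse_reference_entry are hypothetical names, the sketch assumes the transcript itself contains no # character, and it is not the video translation software's own parsing code.

```python
from dataclasses import dataclass

# Language codes listed in this article.
LANG_CODES = {"zh", "en", "ja", "ko"}


@dataclass(frozen=True)
class ReferenceEntry:
    path: str        # e.g. "wavs/1.wav", relative to the api_v2.py folder
    transcript: str  # what is actually spoken in the reference audio
    lang: str        # one of LANG_CODES


def parse_reference_entry(entry: str) -> ReferenceEntry:
    """Split 'path#transcript#lang' and validate each part."""
    parts = entry.strip().split("#")
    if len(parts) != 3:
        raise ValueError("expected exactly three #-separated fields: path#transcript#lang")
    path, transcript, lang = (p.strip() for p in parts)
    if not path.lower().endswith(".wav"):
        raise ValueError(f"reference audio must be a .wav file, got {path!r}")
    if not transcript:
        raise ValueError("transcript must not be empty")
    if lang not in LANG_CODES:
        raise ValueError(f"unknown language code {lang!r}; expected one of {sorted(LANG_CODES)}")
    return ReferenceEntry(path, transcript, lang)


entry = parse_reference_entry("wavs/1.wav#今天是个好天气,瓢泼大雨倾盆下#zh")
print(entry.path, entry.lang)  # prints: wavs/1.wav zh
```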

3. Check api_v2?

If you started the service with api_v2.py, make sure the api_v2? option is selected.
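For orientation, here is roughly what a synthesis request to the api_v2.py service looks like. The /tts endpoint and the query parameters text, text_lang, ref_audio_path, prompt_text, and prompt_lang are taken from the example in the docstring of the official api_v2.py; they may differ between versions, so check the docstring of your copy. How the video translation software builds its own request is not shown here, and build_tts_url is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_tts_url(base: str, text: str, text_lang: str,
                  ref_audio_path: str, prompt_text: str, prompt_lang: str) -> str:
    """Build a GET URL for the /tts endpoint of api_v2.py.

    Parameter names are assumed from the api_v2.py docstring; verify them
    against the version you run.
    """
    params = {
        "text": text,                      # text to synthesize
        "text_lang": text_lang,            # language code of `text`
        "ref_audio_path": ref_audio_path,  # reference audio, e.g. wavs/1.wav
        "prompt_text": prompt_text,        # transcript of the reference audio
        "prompt_lang": prompt_lang,        # language code of the transcript
    }
    # urlencode percent-encodes non-ASCII text (such as Chinese) as UTF-8.
    return f"{base.rstrip('/')}/tts?{urlencode(params)}"


url = build_tts_url("http://127.0.0.1:9880", "Hello there.", "en",
                    "wavs/1.wav", "今天是个好天气,瓢泼大雨倾盆下", "zh")
print(url)
```

With the service running, fetching this URL (for example with urllib.request.urlopen) should return the synthesized audio.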

4. Test Connection

Click Test. If there are no errors, the configuration is successful.

Common Issues

  1. 404 error during testing

    This is caused by using a third-party integration package, whose API is not compatible with the official version. Download and use the official package instead.

  2. "The remote computer actively refused" or "Please check whether the API service is started"

    The API service is either not running or blocked by a firewall. Make sure the API service is started, and allow it through the firewall or turn the firewall off.
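The two symptoms above can also be told apart from a script. The sketch below sends one GET request with urllib and maps the outcome to the causes listed in this section: an HTTP 404 points to an incompatible (third-party) API, and a refused connection points to a service that is not running or is blocked. diagnose_api is a hypothetical helper, and probing the /tts path is an assumption about which endpoint to test. Because no parameters are sent, a working official service is likely to answer with an error status other than 404; the sketch treats any such response as "reachable".

```python
import urllib.error
import urllib.request


def diagnose_api(base: str, path: str = "/tts", timeout: float = 5.0) -> str:
    """Map the outcome of one GET request to the common issues listed above.

    Hypothetical helper; probing `path` ("/tts" by default) is an assumption.
    """
    url = base.rstrip("/") + path
    # An empty ProxyHandler disables proxies, so a local address is contacted directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return f"reachable (HTTP {resp.status})"
    except urllib.error.HTTPError as err:
        # HTTPError (4xx/5xx) subclasses URLError, so it must be caught first.
        if err.code == 404:
            return "HTTP 404: endpoint not found; likely a third-party package, use the official one"
        return f"reachable, but the request was rejected (HTTP {err.code})"
    except urllib.error.URLError as err:
        if isinstance(err.reason, ConnectionRefusedError):
            return "connection refused: start the API service or check the firewall"
        return f"cannot connect ({err.reason}): service not running or blocked by a firewall?"
    except OSError as err:
        # For example, a timeout while reading the response.
        return f"cannot connect ({err}): service not running or blocked by a firewall?"
```

For example, print(diagnose_api("http://127.0.0.1:9880")). Proxies are bypassed on the assumption that a local service should never be reached through an HTTP proxy.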