This page details the F5-TTS integration method with pyVideoTrans, which is compatible with pyVideoTrans V3.66 and later. For previous integration methods, please refer to Using F5-TTS with pyVideotrans (Versions < 3.66).
From v3.68 onwards, this interface can be used with F5-TTS/Spark-TTS/index-TTS/Dia-TTS/VoxCPM. Just enter the correct URL (usually http://127.0.0.1:7860 for local machines) and select the corresponding service from the dropdown list.
- index-tts Deployment Method
- dia-1.6b Deployment Method
- spark-tts Deployment Method
- VoxCPM-tts Deployment Method
Configuration
To use TTS in the video translation software, first launch the corresponding TTS WebUI and keep its terminal window open.
Then enter the URL on this configuration page. The default is http://127.0.0.1:7860. If your WebUI started on a different address, enter that address instead.
In the "Reference Audio" field, enter the following:
Filename of the audio you want to use#The corresponding text in that audio file
Note: Place the reference audio file in the f5-tts folder under the pyVideotrans project root directory. If this folder does not exist, create it manually. For example, you can name the reference audio file nverguo.wav.
Example:
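As a rough illustration of the format, the sketch below splits a "Reference Audio" value into its filename and transcript and resolves the file inside the f5-tts folder. The helper name parse_reference_audio and the sample transcript are made up for this example and are not part of pyVideoTrans.

```python
from pathlib import Path


def parse_reference_audio(value: str, project_root: str = ".") -> tuple[Path, str]:
    """Split 'filename#text' and resolve the file under <project_root>/f5-tts."""
    # Split on the first '#' only, so a '#' inside the transcript is preserved.
    filename, sep, text = value.partition("#")
    if not sep or not filename.strip() or not text.strip():
        raise ValueError("expected the form: filename#text spoken in that audio")
    return Path(project_root) / "f5-tts" / filename.strip(), text.strip()


# nverguo.wav is the example file name from the note above;
# the transcript is a placeholder invented for this sketch.
audio_path, ref_text = parse_reference_audio(
    "nverguo.wav#Hello, this is my reference audio."
)
print(audio_path.as_posix(), "|", ref_text)
```

Calling audio_path.exists() before starting a job is a quick way to confirm that the file really is in the f5-tts folder.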
F5-TTS Source Code Deployment Method
F5-TTS is an open-source voice cloning tool from Shanghai Jiao Tong University, known for its excellent results. The initial version only supported Chinese and English cloning, but the latest version v1 has expanded to support multiple languages, including French, Italian, Hindi, Japanese, Russian, Spanish, and Finnish.
This article explains how to install and start F5-TTS from the official source code and integrate it with the pyVideotrans project. It also explains how to enable calls over a LAN by modifying the source code.
Due to limited time, I will no longer maintain the previous personal integration package and API interface, and will instead use the official interface to connect with the pyVideotrans project. The limitation of the official interface is that it can only be called from the local machine, not from other machines on the LAN. See the LAN section of this article for a solution.
Prerequisites
Your system must have Python 3.10 installed. Versions 3.11 and 3.12 may work in theory, but they have not been tested, so 3.10 is recommended.
If Python is not yet installed:
Mac System Installation: Download the pkg installer from the official Python website at https://www.python.org/downloads/macos and select version 3.10.11.
Check if Python is installed:
- Windows System: Press Win+R, enter cmd in the pop-up window, and press Enter. In the black window that opens, enter python --version. If 3.10.xx is displayed, Python is installed. If it prompts "python is not an internal or external command", Python is either not installed or not added to the Path environment variable, and it needs to be reinstalled.
- Mac System: Run python3 --version in the terminal. If 3.10.x is output, Python is installed; otherwise, it needs to be installed.
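The same check can be made from inside Python, which is handy when several interpreters are installed. The sketch below is only an illustration; the function name is_recommended_python is made up here.

```python
import sys


def is_recommended_python(version_info=sys.version_info) -> bool:
    """Return True when the interpreter is Python 3.10.x."""
    return tuple(version_info[:2]) == (3, 10)


# Report the interpreter that is actually running this script.
print("Python", sys.version.split()[0], "recommended:", is_recommended_python())
```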
Download F5-TTS Source Code
First, create an empty folder in a suitable location. A non-system disk that does not require special permissions, such as the D drive, is recommended. Avoid directories such as C:/Program Files, and use only letters and digits in every folder name along the path to avoid potential problems. For example, D:/f5/v1 is a good location, while a path containing spaces or Chinese characters, such as D:/open source f5/f5 v1, is not recommended.
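The naming rule above can be expressed as a small check. This is a sketch of the rule as stated here (letters and digits only in every folder name, with an optional Windows drive letter); the function name is_safe_install_path is hypothetical.

```python
import re


def is_safe_install_path(path: str) -> bool:
    """True when every folder name consists only of ASCII letters and digits."""
    parts = [p for p in re.split(r"[\\/]+", path) if p]
    # Skip a leading Windows drive such as "D:".
    if parts and re.fullmatch(r"[A-Za-z]:", parts[0]):
        parts = parts[1:]
    return all(re.fullmatch(r"[A-Za-z0-9]+", p) for p in parts)


for candidate in ("D:/f5/v1", "D:/open source f5/f5 v1", "C:/Program Files/f5"):
    print(candidate, "->", is_safe_install_path(candidate))
```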
This article uses an installation in the D:/python/f5ttsnew folder on Windows 10 as the example.
Open the website: https://github.com/SWivid/F5-TTS
As shown in the figure below, click to download the source code:
After downloading, unzip the archive and copy all the files in the F5-TTS-main folder to the D:/python/f5ttsnew folder, as shown below:
Create a Virtual Environment
It is strongly recommended to create a virtual environment unless your computer has no other Python or AI projects. Virtual environments can effectively avoid many potential errors.
Type cmd in the address bar of the newly created folder D:/python/f5ttsnew and press Enter (on Mac, open a terminal and change into the folder).
Run the following command to create a virtual environment: python -m venv venv. After it finishes, a folder named venv appears in the directory.
Next, activate the virtual environment (note the spaces and dot symbols):
- Windows System:
.\venv\scripts\activate
- Mac System:
. ./venv/bin/activate
After the virtual environment is activated, the string (venv) is added to the command line prompt. Make sure that all subsequent operations are performed in this virtual environment, and check that the prompt contains (venv) before each operation.
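Besides looking at the prompt, you can ask the interpreter whether it is running inside a virtual environment. This is a minimal sketch that relies on the standard venv behaviour, where sys.prefix differs from sys.base_prefix while an environment is active; the function name in_virtualenv is made up here.

```python
import sys


def in_virtualenv() -> bool:
    """True when Python was started from a venv-style virtual environment."""
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter prefix:", sys.prefix)
```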
Install Dependencies
In the terminal with the virtual environment activated, continue to enter the following command (note the spaces and dot symbols):
pip install -e .
Wait for the installation to complete. If CUDA acceleration is required, then run the following command (it is a single command; do not break it across lines):
pip install torch==2.4.0+cu124 torchaudio==2.4.0+cu124 --extra-index-url https://download.pytorch.org/whl/cu124
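To confirm that the CUDA build of PyTorch is the one actually in use, a short check such as the following can be run inside the activated environment. It is a sketch only: the function name describe_torch_device is made up, and the import is wrapped so the script still runs when torch is missing.

```python
def describe_torch_device() -> str:
    """Describe whether PyTorch is installed and whether it can see a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if torch.cuda.is_available():
        name = torch.cuda.get_device_name(0)
        return f"CUDA available: {name} (torch {torch.__version__})"
    return f"CUDA not available, running on CPU (torch {torch.__version__})"


print(describe_torch_device())
```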
Configure a Proxy (Scientific Internet Access)
Important Note: F5-TTS needs to download models from the huggingface.co website. This website is blocked in mainland China and cannot be reached directly, so you must set up a proxy ("scientific Internet access") and enable global or system proxy mode before starting.
If the proxy tool you are using provides an HTTP port (as shown below):
Enter the following command in the terminal to set the proxy (replace the port number with the port you actually use):
- Windows System:
set https_proxy=http://127.0.0.1:10808
- Mac System:
export https_proxy=http://127.0.0.1:10808
You can also set the proxy directly in the code so that you do not have to enter it in the terminal each time. Open the file F5-TTS root directory/src/f5_tts/infer/infer_gradio.py and add the following code at the top of the file:
import os
os.environ['https_proxy']='http://127.0.0.1:10808' # Fill in according to your actual proxy address
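Before starting F5-TTS it can help to confirm that something is actually listening on the proxy port. The sketch below only tests whether a TCP connection to the proxy address succeeds; it does not prove that huggingface.co is reachable through it. The function name proxy_is_listening is made up, and 10808 is the example port from this page.

```python
import socket
from urllib.parse import urlparse


def proxy_is_listening(proxy_url: str, timeout: float = 2.0) -> bool:
    """Return True when a TCP connection to the proxy host and port succeeds."""
    parsed = urlparse(proxy_url)
    if not parsed.hostname or not parsed.port:
        return False  # the URL must contain both a host and an explicit port
    try:
        with socket.create_connection((parsed.hostname, parsed.port), timeout=timeout):
            return True
    except OSError:
        return False


# Replace the port with the one your proxy tool actually uses.
print("proxy reachable:", proxy_is_listening("http://127.0.0.1:10808"))
```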
Start the WebUI Interface
After configuring the proxy, enter the following command in the terminal to start the WebUI:
f5-tts_infer-gradio
On first startup the program automatically downloads the model, which may be slow. Please be patient. On later startups the program may still connect to huggingface.co to check for the model, so it is recommended to keep the proxy enabled to avoid errors.
After the startup is successful, the terminal will display the IP address and port number, as shown below:
Open the displayed address in the browser. The default is http://127.0.0.1:7860.
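If you prefer to verify the WebUI from a script rather than a browser, a plain HTTP request is enough. This is a minimal sketch using only the standard library; the function name webui_is_up is made up, and the default URL is the one given above.

```python
import urllib.error
import urllib.request


def webui_is_up(url: str = "http://127.0.0.1:7860", timeout: float = 3.0) -> bool:
    """Return True when the WebUI answers an HTTP GET with a success status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False


print("WebUI reachable:", webui_is_up())
```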
Re-recognize?: By default, the text of the reference audio (the subtitles recognized during cloning) is sent to F5-TTS, so that F5-TTS does not have to start Whisper for speech recognition, which saves time. Sometimes you may want F5-TTS to re-recognize the audio, which can improve cloning quality to some extent. In that case, check the checkbox. Note that the first time you do this, F5-TTS downloads the openai-whisper-v3 model from huggingface.co, so make sure the proxy is working.
Solve LAN Problems
If your F5-TTS is deployed on another computer in the LAN, you need to modify the F5-TTS code to support LAN access.
Open the file F5-TTS project directory/src/f5_tts/infer/infer_gradio.py and add the following code below line 16:
# Add LAN start
import os
from pathlib import Path
ROOT=Path(os.getcwd()).as_posix()
TMP=f'{ROOT}/tmp'
Path(TMP).mkdir(exist_ok=True)
os.environ['GRADIO_TEMP_DIR']=TMP
gr.set_static_paths(paths=[TMP,tempfile.gettempdir()])
print(TMP)
## Add LAN end
Schematic diagram of code addition position:
After saving the changes, restart F5-TTS. Then, in pyVideotrans, enter the IP address and port number that F5-TTS reports after startup, such as http://192.168.0.12:7860.
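A typo in the address is an easy mistake at this step. The sketch below builds the URL from an IP address and port and rejects anything that is not a private address; the function name lan_webui_url is made up, and 192.168.0.12 is just the example address from this page. Note that Python's ipaddress module also counts loopback addresses such as 127.0.0.1 as private.

```python
import ipaddress


def lan_webui_url(ip: str, port: int = 7860) -> str:
    """Build the WebUI URL for a machine on the LAN, validating the address."""
    addr = ipaddress.ip_address(ip)  # raises ValueError for malformed input
    if not addr.is_private:
        raise ValueError(f"{ip} is not a private (LAN) address")
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    return f"http://{addr}:{port}"


print(lan_webui_url("192.168.0.12"))
```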
Add Other Languages
If you need to use models in other languages, you also need to modify the file F5-TTS project directory/src/f5_tts/infer/infer_gradio.py.
Find the code around line 59:
DEFAULT_TTS_MODEL_CFG = [
    "hf://SWivid/F5-TTS/F5TTS_v1_Base/model_1250000.safetensors",
    "hf://SWivid/F5-TTS/F5TTS_v1_Base/vocab.txt",
    json.dumps(dict(dim=1024, depth=22, heads=16, ff_mult=2, text_dim=512, conv_layers=4)),
]
Code location schematic diagram:
By default, the official Chinese and English model is configured here. To use a model for another language, replace this block with the matching configuration below. After the modification, restart F5-TTS and make sure the proxy is configured so that the program can download the new language model. Once the download succeeds, first clone a voice through the WebUI as a test, and then use it through pyVideoTrans.
Important Note: Before using it, make sure that the voiceover text language in pyVideoTrans matches the language of the model selected in F5-TTS.
The following are the configuration information for each language model:
French:
DEFAULT_TTS_MODEL_CFG = [
    "hf://RASPIAUDIO/F5-French-MixedSpeakers-reduced/model_last_reduced.pt",
    "hf://RASPIAUDIO/F5-French-MixedSpeakers-reduced/vocab.txt",
    json.dumps({"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}),
]
Hindi:
DEFAULT_TTS_MODEL_CFG = [
    "hf://SPRINGLab/F5-Hindi-24KHz/model_2500000.safetensors",
    "hf://SPRINGLab/F5-Hindi-24KHz/vocab.txt",
    json.dumps({"dim": 768, "depth": 18, "heads": 12, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}),
]
Italian:
DEFAULT_TTS_MODEL_CFG = [
    "hf://alien79/F5-TTS-italian/model_159600.safetensors",
    "hf://alien79/F5-TTS-italian/vocab.txt",
    json.dumps({"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}),
]
Japanese:
DEFAULT_TTS_MODEL_CFG = [
    "hf://Jmica/F5TTS/JA_25498980/model_25498980.pt",
    "hf://Jmica/F5TTS/JA_25498980/vocab_updated.txt",
    json.dumps({"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}),
]
Russian:
DEFAULT_TTS_MODEL_CFG = [
    "hf://hotstone228/F5-TTS-Russian/model_last.safetensors",
    "hf://hotstone228/F5-TTS-Russian/vocab.txt",
    json.dumps({"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}),
]
Spanish:
DEFAULT_TTS_MODEL_CFG = [
    "hf://jpgallegoar/F5-Spanish/model_last.safetensors",
    "hf://jpgallegoar/F5-Spanish/vocab.txt",
    json.dumps({"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "conv_layers": 4}),
]
Finnish:
DEFAULT_TTS_MODEL_CFG = [
    "hf://AsmoKoskinen/F5-TTS_Finnish_Model/model_common_voice_fi_vox_populi_fi_20241206.safetensors",
    "hf://AsmoKoskinen/F5-TTS_Finnish_Model/vocab.txt",
    json.dumps({"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}),
]
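If you switch languages often, one way to reduce editing mistakes is to keep the configurations in a lookup table and change a single key. The sketch below is an illustrative reorganization, not part of the official F5-TTS code: the names MODEL_CONFIGS, build_model_cfg, and the language keys are made up, and only the Spanish and Hindi entries from this page are included to keep it short. A restart of the WebUI is still required after changing the key.

```python
import json

# Checkpoint, vocab file, and architecture settings copied from the entries above.
MODEL_CONFIGS = {
    "es": (
        "hf://jpgallegoar/F5-Spanish/model_last.safetensors",
        "hf://jpgallegoar/F5-Spanish/vocab.txt",
        {"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2,
         "text_dim": 512, "conv_layers": 4},
    ),
    "hi": (
        "hf://SPRINGLab/F5-Hindi-24KHz/model_2500000.safetensors",
        "hf://SPRINGLab/F5-Hindi-24KHz/vocab.txt",
        {"dim": 768, "depth": 18, "heads": 12, "ff_mult": 2, "text_dim": 512,
         "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1},
    ),
}


def build_model_cfg(lang: str) -> list:
    """Return the three-item list in the same shape as DEFAULT_TTS_MODEL_CFG."""
    ckpt, vocab, arch = MODEL_CONFIGS[lang]
    return [ckpt, vocab, json.dumps(arch)]


# Change this one key to switch languages.
DEFAULT_TTS_MODEL_CFG = build_model_cfg("hi")
print(DEFAULT_TTS_MODEL_CFG[0])
```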
Other languages can be added in the same way. For newly published models, follow the official list at https://github.com/SWivid/F5-TTS/blob/main/src/f5_tts/infer/SHARED.md
Common Errors and Precautions
During API usage, you can close the WebUI interface in the browser, but you cannot close the terminal window that started F5-TTS.
Can I dynamically switch models in F5-TTS? No. You need to manually modify the code in the way described above, and then restart the WebUI.
A frequently encountered error:
raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /SWivid/F5-TTS/resolve/main/F5TTS_v1_Base/vocab.txt (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x000002174796DF60>, 'Connection to huggingface.co timed out. (connect timeout=10)'))"), '(Request ID: 0458b571-90ab-4edd-ae59-b93bd603cdd0)')
This is a proxy problem. Use a stable proxy, and see the proxy configuration section above.
- How to prevent connecting to huggingface.co every time?
Make sure you have successfully cloned at least once and the model has already been downloaded.
Open F5-TTS root directory/src/f5_tts/infer/utils_infer.py
Search for snapshot_download and find the line of code shown in the picture.
Modify it to:
local_path = snapshot_download(repo_id="nvidia/bigvgan_v2_24khz_100band_256x", cache_dir=hf_cache_dir,local_files_only=True)
Then search for hf_hub_download and find the 2 lines of code shown in the picture.
Modify them to:
config_path = hf_hub_download(repo_id=repo_id, cache_dir=hf_cache_dir, filename="config.yaml",local_files_only=True)
model_path = hf_hub_download(repo_id=repo_id, cache_dir=hf_cache_dir, filename="pytorch_model.bin",local_files_only=True)
In effect, this adds the new parameter local_files_only=True to these 3 calls. The model must already be downloaded locally; otherwise an error is reported that the model cannot be found.
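A possible alternative to editing each call is the HF_HUB_OFFLINE environment variable, which the huggingface_hub library reads to disable network requests. This is not the method described above, and whether it covers every download path in F5-TTS has not been verified here, so treat it as an assumption to test on your own setup. The sketch shows where such a line would go.

```python
import os

# Assumption: placed at the very top of the script that starts F5-TTS,
# before huggingface_hub is imported, because the library reads this
# variable when it is first imported.
os.environ["HF_HUB_OFFLINE"] = "1"

print("HF_HUB_OFFLINE =", os.environ["HF_HUB_OFFLINE"])
```

As with local_files_only=True, the models must already be in the local cache, otherwise loading fails.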
- F5-TTS is deployed normally, but pyVideotrans returns {detail:"Not found"} in the test
  - Check whether other AI projects are occupying the port. AI projects with a web interface generally use Gradio, which also defaults to port 7860. Close the others and restart F5-TTS.
  - If pyVideotrans is deployed from source code, run pip install --upgrade gradio_client and then try again.
  - Restart F5-TTS and start it with the command f5-tts_infer-gradio --api
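To see whether the service behind the URL really is F5-TTS and which API endpoints it exposes, you can query it with gradio_client directly. This is a sketch under the assumption that gradio_client is installed and the WebUI is running; the function name list_api_endpoints is made up, and the endpoint names you get back depend on the F5-TTS version.

```python
def list_api_endpoints(url: str = "http://127.0.0.1:7860") -> dict:
    """Connect to a running Gradio app and return its API description."""
    # Imported here so the module loads even when gradio_client is missing.
    from gradio_client import Client

    client = Client(url)
    # return_format="dict" returns the endpoint description instead of printing it.
    return client.view_api(print_info=False, return_format="dict")


# Usage, with the WebUI running:
#   info = list_api_endpoints("http://127.0.0.1:7860")
#   print(list(info["named_endpoints"]))
```

If the connection succeeds but the list shows endpoints from a different project, another Gradio application is probably occupying the port.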