Voice Denoising Methods | pyVideoTrans: Open Source Video Translation Tool (pyvideotrans.com, github.com/jianchang512/pyvideotrans)

Why Denoise?

In many voice-related application scenarios, the presence of noise can seriously affect performance and user experience. For example:

Speech Recognition: Noise can reduce the accuracy of speech recognition, especially in low signal-to-noise ratio environments.
Voice Cloning: Noise can degrade the naturalness and clarity of synthesized speech based on reference audio.

Voice denoising can solve these problems to some extent.

Common Denoising Methods

The main voice denoising techniques are:

Spectral Subtraction: A classic method with a simple principle: estimate the noise spectrum (typically from segments that contain no speech) and subtract it from the spectrum of the noisy signal.
Wiener Filtering: This method works well for stationary noise, but has limited effect on non-stationary (time-varying) noise.
Deep Learning: This is currently the most advanced denoising method. By leveraging powerful deep learning models, such as Recurrent Neural Networks (RNN), Convolutional Neural Networks (CNN), and Generative Adversarial Networks (GAN), it learns the complex relationships between noise and speech, achieving more accurate and natural denoising effects.
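
To make the first two methods concrete, here is a minimal sketch of their core arithmetic, applied to the per-bin power spectrum of a single frame. It assumes the noise power has already been estimated and skips the FFT, framing, and resynthesis steps a real denoiser needs. The function names, the `floor` parameter, and the toy values are illustrative, not part of any library.

```python
def spectral_subtraction(noisy_power, noise_power, floor=0.01):
    """Subtract the estimated noise power from each frequency bin.

    A spectral floor (a small fraction of the noisy power) keeps bins
    from dropping to zero or below.
    """
    return [
        max(y - n, floor * y)
        for y, n in zip(noisy_power, noise_power)
    ]


def wiener_gain(noisy_power, noise_power):
    """Per-bin Wiener gain, estimated as (noisy - noise) / noisy in [0, 1]."""
    gains = []
    for y, n in zip(noisy_power, noise_power):
        gains.append(max(y - n, 0.0) / y if y > 0 else 0.0)
    return gains


# Toy power spectrum for one frame: bins 1 and 2 carry speech, the rest is noise.
noisy = [1.0, 9.0, 5.0, 1.2]
noise = [1.0, 1.0, 1.0, 1.0]

cleaned = spectral_subtraction(noisy, noise)  # estimated clean power per bin
gains = wiener_gain(noisy, noise)             # multiply the noisy spectrum by these
```

Both functions rely on a single noise estimate per bin, which is why these methods struggle when the noise changes over time.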

ZipEnhancer Model: Deep Learning Denoising

This tool wraps the ZipEnhancer model, open-sourced by Tongyi Laboratory, in an easy-to-use web interface and API, so you can try deep learning denoising without writing any model code.

The project is open source on GitHub: https://github.com/jianchang512/remove-noise

The core of the ZipEnhancer model is the Transformer network structure and multi-task learning strategy. It can not only remove noise but also enhance speech quality and eliminate echo simultaneously. The working principle is as follows:

Self-Attention Mechanism: Captures important long-term dependencies in the speech signal, understanding the context of the sound.
Multi-Head Attention Mechanism: Analyzes speech features from different perspectives, achieving more refined noise suppression and speech enhancement.
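
As a rough illustration of the self-attention idea, the sketch below implements generic scaled dot-product attention in plain Python. It is not ZipEnhancer's actual code: the learned query/key/value projections are omitted, and the function names and toy frames are made up for this example. Multi-head attention runs several such computations in parallel, each on a different learned projection of the input.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Scaled dot-product self-attention with identity projections.

    Real models apply learned query/key/value projections first; they
    are omitted here to keep the sketch short.
    """
    dim = len(frames[0])
    outputs = []
    for query in frames:
        # Similarity of this frame to every frame in the sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in frames
        ]
        weights = softmax(scores)
        # Each output is a weighted mix of all frames (the "context").
        outputs.append([
            sum(w * frame[i] for w, frame in zip(weights, frames))
            for i in range(dim)
        ])
    return outputs


# Three toy 2-dimensional feature frames.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = self_attention(frames)
```

Because every output frame is computed from all input frames, distant parts of the signal can influence each other, which is what "long-term dependencies" refers to above.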

How to Use This Tool?

Windows Pre-packaged Version:

Download and unzip the pre-packaged version (https://github.com/jianchang512/remove-noise/releases/download/v0.1/win-remove-noise-0.1.7z).
Double-click the runapi.bat file, and the browser will automatically open http://127.0.0.1:5080.
Select an audio or video file to start denoising.

Source Code Deployment:

Environment Preparation: Ensure that Python 3.10 - 3.12 is installed.
Install Dependencies: Run pip install -r requirements.txt --no-deps.

CUDA Acceleration (Optional): If you have an NVIDIA graphics card, you can install CUDA 12.1 and the matching PyTorch build to accelerate processing:

bash

pip uninstall -y torch torchaudio torchvision
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Run the Program: Run python api.py.
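
If you followed the optional CUDA step, it can help to confirm that PyTorch actually sees the GPU before starting the server. The snippet below is a small sketch using `torch.cuda.is_available()`. The helper name `describe_device` is made up for this example, and the check only reports what PyTorch detects, not what the tool itself decides to use.

```python
def describe_device():
    """Return 'cuda', 'cpu', or 'torch-missing' depending on the local setup."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return "torch-missing"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"PyTorch device: {describe_device()}")
```

If this prints `cpu` on a machine with an NVIDIA card, the CPU-only build of PyTorch is probably still installed, or the driver does not match the CUDA build.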

Linux System:

You need to install the libsndfile library: sudo apt-get update && sudo apt-get install libsndfile1.
Note: Please ensure that the datasets library version is 3.0, otherwise errors may occur. You can use the pip list | grep datasets command to check the version.
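
On systems without `grep`, the same version check can be done from Python with the standard `importlib.metadata` module. This is a sketch. The helper names are made up for this example, and it assumes that "version 3.0" also covers patch releases such as 3.0.1.

```python
from importlib.metadata import PackageNotFoundError, version


def matches_release(installed, required="3.0"):
    """True if `installed` is `required` or a patch release of it (e.g. 3.0.1)."""
    return installed == required or installed.startswith(required + ".")


def check_datasets():
    """Report whether the installed datasets package is a 3.0 release."""
    try:
        installed = version("datasets")
    except PackageNotFoundError:
        return "datasets is not installed"
    if matches_release(installed):
        return f"datasets {installed} is OK"
    return f"datasets {installed} found, expected 3.0"


print(check_datasets())
```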

Interface Preview

API Usage

API Endpoint: http://127.0.0.1:5080/api

Request Method: POST

Request Parameters:

stream: 0 returns the audio URL, 1 returns the audio data.
audio: The audio or video file to be processed.

Return Result (JSON):

Success (stream=0): {"code": 0, "data": {"url": "Audio URL"}}
Success (stream=1): WAV audio data.
Failure: {"code": -1, "msg": "Error message"}
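
Because a stream=1 request can return either raw WAV bytes or a JSON error, a client may want to inspect the body before saving it. The helper below is a minimal sketch of that check. It assumes a failed stream=1 request uses the same JSON failure format shown above, and both the function name `interpret_body` and the sample URL are hypothetical.

```python
import json


def interpret_body(body):
    """Classify a response body from the API.

    Returns ("audio", bytes) for WAV data or ("url", str) for a JSON
    success; raises RuntimeError with the server's message on failure.
    """
    # WAV files start with a RIFF header and carry "WAVE" at bytes 8-11.
    if body[:4] == b"RIFF" and body[8:12] == b"WAVE":
        return "audio", body
    payload = json.loads(body.decode("utf-8"))
    if payload.get("code") == 0:
        return "url", payload["data"]["url"]
    raise RuntimeError(payload.get("msg", "unknown error"))


# Hypothetical success payload for stream=0.
sample = b'{"code": 0, "data": {"url": "http://127.0.0.1:5080/example.wav"}}'
kind, value = interpret_body(sample)
```

With `requests`, the bytes to pass in are available as `res.content`.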

Example Code (Python):

python

import requests

url = 'http://127.0.0.1:5080/api'
file_path = './300.wav'

# Get the audio URL (stream=0 returns JSON containing the URL)
try:
    # Open the file in a context manager so the handle is always closed
    with open(file_path, 'rb') as audio_file:
        res = requests.post(url, data={"stream": 0}, files={"audio": audio_file})
    res.raise_for_status()
    result = res.json()
    if result.get("code") == 0:
        print(f"Denoised audio URL: {result['data']['url']}")
    else:
        # The API reports failures as {"code": -1, "msg": "..."}
        print(f"Denoising failed: {result.get('msg')}")
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")

# Get the audio data (stream=1 returns the WAV bytes directly)
try:
    with open(file_path, 'rb') as audio_file:
        res = requests.post(url, data={"stream": 1}, files={"audio": audio_file})
    res.raise_for_status()
    with open("ceshi.wav", 'wb') as f:
        f.write(res.content)
    print("Denoised audio saved as ceshi.wav")
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")
