There are 14 speech recognition models in 3 categories. All of them transcribe human speech in videos into subtitle text.
To reduce the download size, the software ships with only the smallest model, "tiny", by default. This model has the lowest recognition accuracy. For better results, download one of the larger models.
Models Usable in Both OpenAI and Faster Modes
- tiny, tiny.en: Smallest model, fastest speed, least resource consumption, and lowest accuracy.
- base, base.en: Slightly larger than tiny.
- small, small.en: Slightly larger than base.
- medium, medium.en: Medium-sized model. For Chinese recognition, choose medium or larger.
- large-v1, large-v2, large-v3: Largest models, highest accuracy. They require 8 GB, 12 GB, or more of available video memory (VRAM).
Models ending with .en are intended only for audio and video in which the spoken language is English.
Models Only Usable in Faster Mode
- distil-whisper-small.en: Only for English videos.
- distil-whisper-medium.en: Only for English videos.
- distil-whisper-large-v2: Requires 8 GB or more of VRAM. Currently performs well on English videos but very poorly on other languages.
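The two lists above amount to a lookup from model name to usable modes. A minimal sketch of that lookup follows. The model names come from the lists, while the set names and the `supported_modes` function are hypothetical and are not part of the software.

```python
from __future__ import annotations

# Model names are taken from the lists above. The set names and the
# function below are illustrative assumptions, not the software's API.
BOTH_MODES = {
    "tiny", "tiny.en", "base", "base.en", "small", "small.en",
    "medium", "medium.en", "large-v1", "large-v2", "large-v3",
}
FASTER_ONLY = {
    "distil-whisper-small.en",
    "distil-whisper-medium.en",
    "distil-whisper-large-v2",
}


def supported_modes(model: str) -> set[str]:
    """Return the recognition modes in which a model can be used."""
    if model in BOTH_MODES:
        return {"openai", "faster"}
    if model in FASTER_ONLY:
        return {"faster"}
    raise ValueError(f"unknown model: {model}")
```

For instance, `supported_modes("medium")` contains both modes, while `supported_modes("distil-whisper-large-v2")` contains only `"faster"`.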
Category 1: Models with the .en Suffix
For example, tiny.en, base.en, medium.en, etc. As the name suggests, these models are only for videos whose original language is English. If the spoken language in the video you are processing is English, choosing a model with the .en suffix will yield better results than the equivalent model without the .en suffix.
Category 2: Models without the .en Suffix
These can be used for all supported languages. Examples include tiny, large-v1, etc.
Category 3: Models Starting with distil
There are currently only three models in this category, and all of them are suitable only for videos whose original language is English. Even when a model name lacks the .en suffix, use it only for videos with English speech. The results will be very poor for videos in other languages.
These models are characterized by faster processing. Note that distil models work only in "faster" mode and cannot be used in "openai" mode.
- distil-whisper-small.en
- distil-whisper-medium.en
- distil-whisper-large-v2
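The three categories can be told apart from the model name alone. The sketch below illustrates one way to do that. The function names and the use of `"en"` as the English language code are assumptions made for this example, not something the software defines.

```python
def category(model: str) -> int:
    """Return 1 (.en suffix), 2 (no .en suffix), or 3 (distil) for a model name."""
    # Check the distil prefix first: distil-whisper-small.en also ends
    # with .en but belongs to category 3.
    if model.startswith("distil"):
        return 3
    if model.endswith(".en"):
        return 1
    return 2


def suits_language(model: str, language: str) -> bool:
    """Return True if the model is recommended for the spoken language.

    `language` is a hypothetical language code; "en" stands for English.
    """
    if category(model) == 2:
        return True  # usable for all supported languages
    return language == "en"  # categories 1 and 3 are English-only
```

Under these assumptions, `suits_language("medium", "zh")` is true, while `suits_language("distil-whisper-large-v2", "zh")` is false even though that name has no .en suffix.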
Faster Model Download
All models are downloaded from this address: https://github.com/jianchang512/stt/releases/tag/0.0
After opening the page, choose the download that matches the mode you want to use. The faster models are recommended because they run faster.
A downloaded faster-model package contains a folder. Copy that folder into the "models" folder in the software directory.
For example, after downloading the "medium" model, open the package and copy the folder inside it to the "models" directory.
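The copy step can also be scripted. The following is a minimal sketch that assumes the package has already been extracted. The function name and both path arguments are hypothetical, and the folder name inside a real package may differ from anything shown here.

```python
import shutil
from pathlib import Path


def install_faster_model(extracted_folder: str, software_dir: str) -> Path:
    """Copy an extracted faster-model folder into <software_dir>/models."""
    src = Path(extracted_folder)
    if not src.is_dir():
        raise NotADirectoryError(f"not a folder: {src}")
    dest = Path(software_dir) / "models" / src.name
    # Create the models directory if it does not exist yet.
    dest.parent.mkdir(parents=True, exist_ok=True)
    # dirs_exist_ok (Python 3.8+) allows re-running over a previous copy.
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return dest
```

The folder keeps its original name under "models", which matches the manual instruction to copy the folder as it is.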
OpenAI Model Download
OpenAI models are downloaded from the same address: https://github.com/jianchang512/stt/releases/tag/0.0
Scroll down the page and download the file with the .pt suffix, then copy this file directly into the "models" directory.
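For the OpenAI models, the step is a single file copy. The sketch below assumes the .pt file has already been downloaded. The function name and path arguments are hypothetical.

```python
import shutil
from pathlib import Path


def install_openai_model(pt_file: str, software_dir: str) -> Path:
    """Copy a downloaded .pt model file into <software_dir>/models."""
    src = Path(pt_file)
    if src.suffix != ".pt":
        raise ValueError(f"expected a .pt file, got: {src.name}")
    models_dir = Path(software_dir) / "models"
    # Create the models directory if it does not exist yet.
    models_dir.mkdir(parents=True, exist_ok=True)
    # The file goes directly into models, not into a subfolder.
    return Path(shutil.copy2(src, models_dir / src.name))
```

Unlike the faster models, no subfolder is created: the .pt file sits directly inside "models".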