Difference Between Whole Recognition and Equal Division

Whole Recognition:

This method gives the best speech recognition results but consumes the most computer resources. With large videos and the large-v3 model, it may cause the program to crash.

During recognition, the entire audio file is passed to the model. The model internally uses VAD (Voice Activity Detection) for segmentation and sentence breaking. The default silence split is 200ms, and the maximum sentence length is 3s. These settings can be configured in Menu -- Tools/Options -- Advanced Options -- VAD.
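To illustrate how a silence threshold and a maximum sentence length interact, here is a minimal sketch. It is not the model's internal VAD code: the function name `group_speech`, its parameters, and the input format (a list of speech intervals in milliseconds, as some VAD might report them) are all assumptions made for illustration. Only the default values 200ms and 3s come from the settings described above.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start_ms, end_ms) of detected speech


def group_speech(
    intervals: List[Interval],
    min_silence_ms: int = 200,      # silence shorter than this does not split
    max_sentence_ms: int = 3000,    # a merged sentence may not exceed this
) -> List[Interval]:
    """Hypothetical helper: merge speech intervals into sentences.

    Two neighbouring intervals are joined when the silence between them is
    shorter than min_silence_ms and the joined sentence stays within
    max_sentence_ms. A single interval that is already longer than the
    limit is kept as-is; this sketch does not cut inside an interval.
    """
    sentences: List[Interval] = []
    for start, end in intervals:
        if sentences:
            prev_start, prev_end = sentences[-1]
            gap = start - prev_end
            if gap < min_silence_ms and end - prev_start <= max_sentence_ms:
                sentences[-1] = (prev_start, end)  # extend current sentence
                continue
        sentences.append((start, end))  # start a new sentence
    return sentences


speech = [(0, 1000), (1100, 2000), (2500, 3000), (3050, 4000), (4100, 6000)]
print(group_speech(speech))
# [(0, 2000), (2500, 4000), (4100, 6000)]
```

In this example the 500ms pause after 2000ms starts a new sentence, while the last interval starts a new sentence only because merging it would exceed the 3s limit.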

Equal Division:

As the name suggests, this method cuts the audio file into fixed-length segments and then passes them to the model. The OpenAI model always uses equal division: when it is selected, "Equal Division" is enforced regardless of whether you choose "Whole Recognition" or "Pre-segmentation."

With equal division, each segment is 10 seconds long, and the silence split interval is 500ms. These settings can be configured in Menu -- Tools/Options -- Advanced Options -- VAD.

Note: With the segment length set to 10 seconds, each subtitle will be approximately 10 seconds long. The actual duration of each voiceover, however, may not be exactly 10 seconds, because it depends on how long the speech lasts and because silence at the end of the voiceover is removed.
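The fixed-length cutting described above can be sketched as a simple boundary calculation. This is only an illustration under assumptions: the function name `equal_division` and its parameters are hypothetical and not part of the software, and the sketch ignores the silence split interval entirely.

```python
from typing import List, Tuple


def equal_division(total_ms: int, segment_ms: int = 10_000) -> List[Tuple[int, int]]:
    """Hypothetical helper: return (start_ms, end_ms) pairs of fixed length.

    The last segment is shorter when the total duration is not a
    multiple of segment_ms.
    """
    if segment_ms <= 0:
        raise ValueError("segment_ms must be positive")
    return [
        (start, min(start + segment_ms, total_ms))
        for start in range(0, total_ms, segment_ms)
    ]


# A 25-second audio file cut into 10-second segments
print(equal_division(25_000))
# [(0, 10000), (10000, 20000), (20000, 25000)]
```

Each pair would then be cut from the audio and passed to the model separately, which is why the resulting subtitles are each about one segment long.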
