Speaker Recognition | pyVideoTrans-Open Source Video Translation Tool -pyvideotrans.com github.com/jianchang512/pyvideotrans

Speaker Recognition/Diarization

From v3.74 onwards, three channels in the speech recognition feature support speaker recognition: Ali FunASR Chinese, Deepgram.com, and Gemini Large Model Recognition. After the audio is transcribed, each subtitle is prefixed with a label identifying its speaker, as in the following example:

1
00:00:01,920 --> 00:00:06,800
[spk0] Organic molecules have been discovered in the five-old star system. We are still some people away from the third type of contact.

2
00:00:07,260 --> 00:00:12,940
[spk1] Weibo is really starting the filming mission and is approaching middle age. Recently, many photos that were difficult to shoot in the past have also been sent.

3
00:00:13,460 --> 00:00:21,380
[spk0] In early June, Tianwei scholars published this photo in Natural History, with a circle of orange light around the blue core.

In the subtitles above, [spk0] indicates the first speaker, [spk1] indicates the second speaker, and so on.
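
If you need to post-process these subtitles, the speaker label can be separated from the text with a short script. The following is only a minimal sketch, not part of pyVideoTrans: the function names parse_srt and split_speaker are hypothetical, and it assumes every label has the form [spkN] at the very start of the subtitle text, as in the sample above.

```python
import re

# Matches a leading speaker tag such as "[spk0]" at the start of a subtitle.
# Assumption: tags always look like "[spk<number>]", as in the sample above.
SPEAKER_TAG = re.compile(r"^\[spk(\d+)\]\s*")


def split_speaker(text):
    """Return (speaker_index, text_without_tag); the index is None if no tag is found."""
    match = SPEAKER_TAG.match(text)
    if match is None:
        return None, text
    return int(match.group(1)), text[match.end():]


def parse_srt(srt_text):
    """Parse SRT text into a list of dicts with index, start, end, speaker, and text."""
    cues = []
    # Subtitle entries are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue  # skip malformed entries
        start, _, end = lines[1].partition(" --> ")
        speaker, text = split_speaker(" ".join(lines[2:]))
        cues.append({
            "index": int(lines[0]),
            "start": start.strip(),
            "end": end.strip(),
            "speaker": speaker,
            "text": text,
        })
    return cues


# Hypothetical sample data in the same layout as the subtitles above.
SAMPLE = """1
00:00:01,920 --> 00:00:06,800
[spk0] First line.

2
00:00:07,260 --> 00:00:12,940
[spk1] Second line.
"""

if __name__ == "__main__":
    for cue in parse_srt(SAMPLE):
        print(cue["speaker"], cue["start"], cue["text"])
```

Subtitles without a label get a speaker value of None, so the same function also works on files produced by channels that do not support speaker recognition.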

Note:

Ali FunASR Chinese: Supports Chinese speech only.
Deepgram.com: Supports multiple languages, but results for Chinese are poor.
Gemini Large Model Recognition: Supports any language.

Due to current model performance limitations, speaker recognition is not always accurate.
