Speaker Recognition/Diarization
From version v3.74 onwards, the Ali FunASR Chinese/Deepgram.com/Gemini Large Model Recognition
channels in the speech recognition feature support speaker recognition. This means that after audio transcription, the speaker will be identified and marked before the subtitles.
1
00:00:01,920 --> 00:00:06,800
[spk0] Organic molecules have been discovered in the five-old star system. We are still some people away from the third type of contact.
2
00:00:07,260 --> 00:00:12,940
[spk1] Weibo is really starting the filming mission and is approaching middle age. Recently, many photos that were difficult to shoot in the past have also been sent.
3
00:00:13,460 --> 00:00:21,380
[spk0] In early June, Tianwei scholars published this photo in Natural History, with a circle of orange light around the blue core.
In the subtitles above, [spk0]
indicates the first speaker, [spk1]
indicates the second speaker, and so on.
Note:
- Ali FunASR Chinese: Only supports recognizing Chinese pronunciation.
- Deepgram.com: Supports multiple languages, but the effect is not good for Chinese.
- Gemini Large Model Recognition: Supports any language.
Due to current model performance limitations, speaker recognition is not always accurate.