Speaker diarization
The process of partitioning an audio recording by speaker — determining "who spoke when" — and labelling each segment (Speaker 1, Speaker 2, …).
A diarization model segments the audio by speaker and assigns each segment a label. A
transcription pipeline then attaches those labels to the text, so the transcript reads
Speaker 1: … / Speaker 2: … rather than an undifferentiated wall of words.
Diarization is essential for multi-speaker recordings such as meetings, interviews, and calls, where knowing who said something matters as much as what was said. Open-source toolkits such as pyannote are commonly used.
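The label-attachment step can be sketched in a few lines of plain Python. This is a minimal illustration, not the API of pyannote or of any particular transcription tool: the `Turn` and `Word` types, the `label_words` and `render` helpers, and the "Unknown" fallback are all hypothetical names chosen for this example. It assumes the diarization model has already produced speaker turns with start and end times, and that the transcriber provides word-level timestamps. Each word is given the label of the turn it overlaps most in time.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One diarization segment: a speaker label and its time span in seconds."""
    start: float
    end: float
    speaker: str


@dataclass
class Word:
    """One transcribed word with its time span in seconds."""
    start: float
    end: float
    text: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_words(words, turns):
    """Pair each word with the speaker whose turn overlaps it the most.

    Words that overlap no turn are labelled "Unknown" (an assumption of this sketch).
    """
    labelled = []
    for w in words:
        best_speaker, best_overlap = "Unknown", 0.0
        for t in turns:
            o = overlap(w.start, w.end, t.start, t.end)
            if o > best_overlap:
                best_speaker, best_overlap = t.speaker, o
        labelled.append((best_speaker, w.text))
    return labelled


def render(labelled):
    """Merge consecutive words from the same speaker into 'Speaker N: ...' lines."""
    lines = []
    for speaker, text in labelled:
        if lines and lines[-1][0] == speaker:
            lines[-1][1].append(text)
        else:
            lines.append((speaker, [text]))
    return "\n".join(f"{speaker}: {' '.join(ws)}" for speaker, ws in lines)


if __name__ == "__main__":
    # Made-up timings, for illustration only.
    turns = [Turn(0.0, 2.0, "Speaker 1"), Turn(2.0, 4.0, "Speaker 2")]
    words = [
        Word(0.1, 0.5, "Hello"),
        Word(0.6, 1.0, "there"),
        Word(2.2, 2.6, "Hi"),
        Word(2.7, 3.1, "back"),
    ]
    print(render(label_words(words, turns)))
```

Assigning by maximum overlap is only one possible rule. It is simple, but it does not handle overlapping speech, where two turns cover the same word.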