Speaker Diarization
Speaker diarization is the "who spoke when" task: segmenting a multi-speaker audio recording by speaker and attaching a speaker label to each transcript segment. It is distinct from plain transcription, which produces a flat stream of words with no notion of who said what. Diarization is essential for meeting notes, sales-call analysis, podcast transcripts, and any downstream task that needs to attribute claims, questions, or action items to specific participants. Most modern speech-to-text (STT) vendors (Deepgram, AssemblyAI, and others) expose diarization as an optional flag on the transcription request rather than as a separate service.
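To make the difference between a flat word stream and a diarized transcript concrete, here is a minimal Python sketch. It assumes a hypothetical word-level result in which each word carries a zero-based speaker index and timestamps; the `Word` class, its field names, and `words_to_turns` are illustrative and do not reproduce any particular vendor's response format.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Word:
    # Hypothetical word-level record; real vendor payloads will differ.
    text: str
    start: float  # seconds
    end: float    # seconds
    speaker: int  # zero-based speaker index assigned by diarization


def words_to_turns(words):
    """Merge consecutive words from the same speaker into one turn."""
    turns = []
    for speaker, group in groupby(words, key=lambda w: w.speaker):
        chunk = list(group)
        turns.append({
            "speaker": f"Speaker {speaker + 1}",
            "start": chunk[0].start,
            "end": chunk[-1].end,
            "text": " ".join(w.text for w in chunk),
        })
    return turns


# Made-up sample data for illustration only.
words = [
    Word("Hi", 0.0, 0.3, 0),
    Word("there", 0.3, 0.6, 0),
    Word("Hello", 0.8, 1.2, 1),
    Word("Pricing?", 1.4, 1.9, 0),
]
turns = words_to_turns(words)
for turn in turns:
    print(f'{turn["speaker"]}: {turn["text"]}')
```

Grouping only consecutive words keeps the turns in conversation order, so a speaker who talks twice gets two separate turns instead of one merged block.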
Example
A sales-coaching tool transcribes a 45-minute discovery call with diarization enabled, producing a transcript in which each turn is labeled "Speaker 1" or "Speaker 2". The labels are then mapped to "AE" (account executive) and "Prospect" using the call metadata, and an LLM extracts only the prospect's objections. That task is not reliably possible from a non-diarized transcript, where every line is anonymous.
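The label-mapping step in this example can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the turn dictionaries, the `ROLE_MAP` contents, and the helper names `label_roles` and `turns_for` are all hypothetical, and in practice the decision about which generic label belongs to the AE would come from the call metadata.

```python
def label_roles(turns, role_map):
    """Replace generic diarization labels with role names where known."""
    return [
        {**turn, "speaker": role_map.get(turn["speaker"], turn["speaker"])}
        for turn in turns
    ]


def turns_for(turns, role):
    """Return only the text of turns spoken by the given role."""
    return [turn["text"] for turn in turns if turn["speaker"] == role]


# Made-up diarized turns for illustration only.
turns = [
    {"speaker": "Speaker 1", "text": "How are you handling reporting today?"},
    {"speaker": "Speaker 2", "text": "Mostly spreadsheets, and budget is tight this quarter."},
    {"speaker": "Speaker 1", "text": "Understood."},
]

# Assumption: call metadata tells us Speaker 1 is the account executive.
ROLE_MAP = {"Speaker 1": "AE", "Speaker 2": "Prospect"}

labeled = label_roles(turns, ROLE_MAP)
prospect_text = turns_for(labeled, "Prospect")
print(prospect_text)
```

Only `prospect_text` would then be passed to the LLM for objection extraction, so the AE's own statements are never mistaken for the prospect's.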