Transcription
The conversion of spoken audio into written text, usually by an automatic speech recognition (ASR) model such as Whisper.
Transcription is the process of turning speech into text. Modern automatic transcription relies on ASR models, with Whisper-class models as the current baseline; these reach low single-digit word error rates (WER) on clean audio.
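Word error rate is conventionally the number of word-level substitutions, deletions and insertions needed to turn the model output into a reference transcript, divided by the number of words in the reference. The sketch below is a minimal illustration of that calculation, not a standard scoring tool: the function name `word_error_rate` is hypothetical, and the normalization (lowercasing and splitting on whitespace) is an assumption that real evaluations usually replace with more careful text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to reach an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


# One substituted word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Because insertions count as errors, this ratio can exceed 1.0 when the hypothesis is much longer than the reference.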
On its own, a transcript is plain text. Adding diarization (who spoke when), structure, and metadata is what turns a transcript into a useful, searchable knowledge base.
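One simple way to combine a transcript with diarization output is to label each timed transcript segment with the speaker whose turn overlaps it most. The sketch below assumes both inputs carry start and end times in seconds; the `Segment` and `Turn` types and the `assign_speakers` function are hypothetical names for illustration and do not correspond to any particular library's API.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A timed piece of transcript text (hypothetical shape)."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """A diarization result: one speaker talking over one interval."""
    start: float
    end: float
    speaker: str


def overlap(seg: Segment, turn: Turn) -> float:
    """Length in seconds of the time shared by a segment and a turn."""
    return max(0.0, min(seg.end, turn.end) - max(seg.start, turn.start))


def assign_speakers(segments: list[Segment], turns: list[Turn]) -> list[tuple[str, str]]:
    """Label each segment with the speaker of the most-overlapping turn."""
    labeled = []
    for seg in segments:
        best = max(turns, key=lambda t: overlap(seg, t), default=None)
        if best is None or overlap(seg, best) == 0.0:
            speaker = "unknown"  # no diarization turn covers this segment
        else:
            speaker = best.speaker
        labeled.append((speaker, seg.text))
    return labeled


segments = [Segment(0.0, 2.0, "Hello"), Segment(2.0, 5.0, "Hi there")]
turns = [Turn(0.0, 2.2, "A"), Turn(2.2, 6.0, "B")]
for speaker, text in assign_speakers(segments, turns):
    print(f"{speaker}: {text}")
```

Speaker-labeled segments like these are what make it possible to search or filter a transcript by who said something, rather than only by what was said.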