Transcription

The conversion of spoken audio into written text, usually by an automatic speech recognition (ASR) model such as Whisper.

Updated 2026-06-18

Transcription is the process of turning speech into text. Modern automatic transcription relies on ASR models, with Whisper-class models as the current baseline. On clean audio these models reach low single-digit word error rates (WER), meaning only a few words in every hundred are wrong.
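
To make the word error rate figure concrete: it is conventionally computed as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The sketch below is a minimal illustration of that formula using a word-level edit distance. The function name `word_error_rate` and the normalization (lowercasing and splitting on whitespace) are assumptions made for this example; real evaluations usually apply more careful text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.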

On its own, a transcript is plain text. Adding diarization (who spoke when), structure and metadata is what turns a collection of transcripts into a useful, searchable knowledge base.
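
As a rough sketch of how diarization can be layered onto a plain transcript, the example below assigns each timed transcript segment to the speaker turn it overlaps most. The `Segment` and `SpeakerTurn` types and the `label_segments` function are hypothetical names for illustration; they assume the ASR model and the diarizer both emit start and end times in seconds, and do not reflect the output format of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A timed piece of transcript text (hypothetical ASR output)."""
    start: float
    end: float
    text: str


@dataclass
class SpeakerTurn:
    """A timed speaker label (hypothetical diarizer output)."""
    start: float
    end: float
    speaker: str


def label_segments(segments: list[Segment], turns: list[SpeakerTurn]) -> list[dict]:
    """Attach to each segment the speaker whose turn overlaps it most."""
    labeled = []
    for seg in segments:
        best_speaker = "unknown"  # used when no turn overlaps the segment
        best_overlap = 0.0
        for turn in turns:
            # Length of the shared time interval; negative means no overlap.
            overlap = min(seg.end, turn.end) - max(seg.start, turn.start)
            if overlap > best_overlap:
                best_overlap = overlap
                best_speaker = turn.speaker
        labeled.append(
            {"speaker": best_speaker, "start": seg.start, "end": seg.end, "text": seg.text}
        )
    return labeled


if __name__ == "__main__":
    segments = [Segment(0.0, 2.0, "Hello."), Segment(2.0, 5.0, "Hi there.")]
    turns = [SpeakerTurn(0.0, 2.2, "A"), SpeakerTurn(2.2, 5.0, "B")]
    for row in label_segments(segments, turns):
        print(f'{row["speaker"]}: {row["text"]}')
```

Records shaped like these can then carry further metadata (recording date, title, tags) and be indexed for search.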