Word error rate (WER)
A standard accuracy metric for transcription: the number of word errors (substitutions, insertions and deletions) expressed as a percentage of the words in a reference transcript.
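Word error rate (WER) measures transcription accuracy by comparing the system's output with a human reference transcript. It is the number of substitutions, insertions and deletions needed to turn the output into the reference, divided by the number of words in the reference. Lower is better: a 3% WER means roughly 3 errors per 100 reference words. Because insertions count as errors, WER can exceed 100% when the output contains many extra words.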
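The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name word_error_rate is hypothetical, and the sketch assumes words are separated by whitespace and compared exactly, with no case or punctuation normalization (real evaluations usually normalize both transcripts first).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N, where N is the reference word count.

    Assumption: whitespace tokenization and exact, case-sensitive matching.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the word-level edit distance between the reference
    # words processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr

    # Total errors divided by the number of reference words.
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    # One substitution (sat -> sit) and one deletion (the): 2 errors / 6 words.
    print(f"WER: {word_error_rate(ref, hyp):.1%}")
```

Dividing by the reference length rather than the output length is what allows the result to exceed 1.0: a one-word reference transcribed as four words yields three insertions and a WER of 300%.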
WER depends heavily on audio quality, accents and domain vocabulary, so a single published number is only a guide. Modern Whisper-class models reach low single-digit WER on clean audio.