Word error rate (WER)

A standard accuracy metric for transcription: the number of word errors (substitutions, insertions and deletions) as a percentage of the words in a reference transcript.


Word error rate (WER) measures transcription accuracy by aligning a system's output with a human reference transcript, counting the word-level substitutions, insertions and deletions, and dividing that count by the number of words in the reference. Lower is better: a 3% WER means roughly 3 errors per 100 reference words.
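A minimal Python sketch of the calculation, WER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in a minimum edit distance alignment and N is the number of words in the reference. The function name `word_error_rate` is hypothetical, and the sketch assumes plain whitespace tokenization with no normalization of case or punctuation, which real evaluations usually apply first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N using word-level edit distance.

    Assumption: whitespace tokenization, no case or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            )
        prev = curr

    # Total errors divided by the number of reference words.
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion ("the"): 2 errors / 6 words.
    print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 3))
```

Running it prints 0.333. Because N counts only reference words, insertions can push WER above 100%: a one-word reference with three extra words in the output scores 3.0.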

WER depends heavily on audio quality, accents and domain vocabulary, so a single published number is only a guide. Modern Whisper-class models reach low single-digit WER on clean audio.