
Character Error Rate (CER) — The Standard Metric for Transcription Accuracy

CER is the most widely used metric for evaluating handwritten text recognition. It measures the percentage of characters that differ between an AI transcription and a human-verified reference — and it is the number reviewers, funders, and fellow researchers will ask you about.


How CER is calculated

The Character Error Rate measures the edit distance between the AI transcription and the ground truth, normalized by the length of the reference text.

CER = (S + D + I) / N

S = substitutions, D = deletions, I = insertions, N = total characters in the reference text. A CER of 20.0% means 5 out of 25 characters differ.
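The formula can be computed directly from the Levenshtein edit distance. A minimal sketch in Python, using the standard Wagner–Fischer dynamic-programming algorithm (this is an illustration, not the implementation used by any particular platform):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: Levenshtein edit distance / reference length."""
    n, m = len(reference), len(hypothesis)
    # prev[j] holds the edit distance between reference[:i-1] and hypothesis[:j]
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[m] / n
```

For example, `cer("abcde", "abxde")` gives 0.2, i.e. 1 substitution over 5 reference characters = 20% CER.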

< 2%: Excellent
Publication-ready accuracy. Suitable for critical editions and scholarly work with minimal manual review.

2–5%: Good
Suitable for most research workflows. Spot-check and correct key passages before publishing.

5–10%: Needs review
Usable for keyword search and indexing. Consider training a custom model for better results.
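The bands above can be expressed as a small helper. The thresholds and labels mirror the table and are purely illustrative, not part of any official API:

```python
def cer_quality(cer_percent: float) -> str:
    """Map a CER percentage to the quality bands described above."""
    if cer_percent < 2:
        return "Excellent: publication-ready, minimal manual review"
    if cer_percent <= 5:
        return "Good: suitable for most research workflows, spot-check key passages"
    if cer_percent <= 10:
        return "Needs review: usable for keyword search and indexing"
    return "Above 10%: consider training a custom model"
```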

How much Ground Truth do you need?

The amount of training data depends on your material, your target accuracy, and how many different hands you're dealing with.

Single-hand collections

For documents written by one person in a consistent hand, 15–30 pages of Ground Truth typically achieve good results (CER under 5%).

Multi-hand collections

Registers, court records, or correspondence with many writers need more diversity in training data — typically 50–100 pages across different hands.

Start with a public model

300+ pre-trained models are available. Start with one, evaluate its CER on your material, and only train a custom model if needed.

Iterative improvement

You don't need all Ground Truth upfront. Start with 15 pages, train, evaluate, add more pages where the model struggles, retrain.
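The iterative workflow can be sketched as a simple loop. Here `train` and `evaluate` are hypothetical placeholders standing in for your actual HTR platform's training and CER-evaluation steps; the default thresholds echo the guidance in this section:

```python
def iterative_training(initial_pages, add_batch, train, evaluate,
                       target_cer=5.0, max_pages=100):
    """Sketch of the train / evaluate / add-pages / retrain loop.

    train(pages) -> model and evaluate(model) -> CER in percent are
    placeholder callables, not a real API.
    """
    pages = initial_pages
    while True:
        model = train(pages)
        cer = evaluate(model)
        if cer <= target_cer or pages >= max_pages:
            return pages, cer
        pages += add_batch  # add Ground Truth where the model struggles
```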

Target CER depends on use case

Full-text search works well at 5–8% CER. Scholarly editions may need under 2%. Keyword spotting tolerates even 10–15%.

Quality over quantity

Accurate Ground Truth matters more than volume. 20 carefully corrected pages outperform 100 pages with errors in the reference.

See how CER works — compare transcription quality at a glance

Each example below shows a Ground Truth line and the corresponding recognised text. Characters that differ are highlighted. The CER is calculated automatically from the Levenshtein edit distance.

Ground Truth
Am 15. März 1782 erschien vor dem Gericht der Bürger Johann Georg Müller
Recognised Text
Am 15. März 1782 erschien vor dem Gericht der Bürger Johann Georg Muller
Total characters (N): 72
Correct: 71
Substitutions (S): 1 ("ü" → "u" in "Müller")
Insertions (I): 0
Deletions (D): 0

CER = (1 + 0 + 0) / 72 = 1.4%
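The per-category counts in the example can be reproduced with Python's standard-library difflib. Note that SequenceMatcher is a heuristic aligner, so on pathological inputs its opcode counts may differ slightly from the minimal Levenshtein alignment; for a clean one-character difference like this one it is exact:

```python
import difflib

def cer_breakdown(reference: str, hypothesis: str):
    """Count substitutions, deletions, and insertions via difflib opcodes."""
    s = d = i = 0
    matcher = difflib.SequenceMatcher(None, reference, hypothesis)
    for tag, r1, r2, h1, h2 in matcher.get_opcodes():
        if tag == "replace":
            rlen, hlen = r2 - r1, h2 - h1
            s += min(rlen, hlen)          # aligned characters that differ
            d += max(rlen - hlen, 0)      # surplus reference characters
            i += max(hlen - rlen, 0)      # surplus hypothesis characters
        elif tag == "delete":
            d += r2 - r1
        elif tag == "insert":
            i += h2 - h1
    n = len(reference)
    return s, d, i, n, (s + d + i) / n

gt = "Am 15. März 1782 erschien vor dem Gericht der Bürger Johann Georg Müller"
rec = "Am 15. März 1782 erschien vor dem Gericht der Bürger Johann Georg Muller"
s, d, i, n, rate = cer_breakdown(gt, rec)
# s=1, d=0, i=0, n=72 → CER ≈ 1.4%
```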

Benchmarks

CER benchmarks across document types

Real-world CER values depend on the document type, script, and the model used. The table below compares typical results from Transkribus AI models against standard OCR engines.

Document type                        | Transkribus HTR | Standard OCR
Printed modern text (post-1950)      | 0.5–1% CER      | 1–3% CER
Typewritten documents (1920s–1960s)  | 1–3% CER        | 3–8% CER
Handwritten 19th century             | 2–5% CER        | 15–30% CER
Kurrent / Sütterlin (18th–19th c.)   | 3–8% CER        | Fails
Medieval manuscripts                 | 5–15% CER       | Fails

Values are indicative ranges based on well-matched models. Actual CER depends on document condition, handwriting consistency, and model training data.

What affects CER

Six factors that determine how accurately your documents can be transcribed — and what you can do about each one.

Document quality

Faded ink, stains, bleed-through, and physical damage all introduce noise that makes characters harder to recognise. High-quality scans of well-preserved originals yield the best CER.

Script type

Modern cursive is easier to recognise than Kurrent, Sütterlin, or medieval book hands. The further the script is from modern letterforms, the more training data the model needs.

Model training data

A model trained on material similar to yours will dramatically outperform a generic one. Custom models trained on 50–100 pages of Ground Truth can cut CER by half or more.

Image resolution

Scans at 300 DPI or higher preserve fine details needed to distinguish similar-looking characters. Low-resolution images increase substitution errors significantly.

Layout complexity

Multi-column layouts, marginalia, tables, and interlinear annotations require accurate layout analysis. Errors in text region detection directly increase the effective CER.

Language

Languages with complex diacritics, non-Latin scripts, or extensive ligatures present additional challenges. Dedicated language-specific models typically achieve the best results.

Find the right model

Find the right model for your documents

Browse over 300 public AI models in the Transkribus model catalogue. Filter by language, script type, and century to find models that match your material — and check their published CER scores before you start.

Built on trust, powered by community.

Transkribus is developed and hosted in Europe by a cooperative of researchers, archives, and libraries. Your data stays under your control.

Your data stays yours

Full ownership. Delete anytime.

Hosted in Austria, EU

All processing on our own servers. GDPR-compliant. No third-party cloud dependencies.

Cooperative, not a startup

Hundreds of universities, archives, and libraries as co-owners. Built for decades, not a VC exit.

Try Transkribus on your own documents

Create a free account and see what CER you can achieve on your material. Start with a public model or train your own.

50 free credits every month · No credit card required

200M+ Pages processed
500K+ Users worldwide
300+ Public AI models