Joshua Isom · PyLaia · Published May 10, 2025

Latin Court Hand: KB27/795 (1460)

Text Recognition

Description

This specialized Handwritten Text Recognition (HTR) model was developed with PyLaia in Transkribus to improve access to challenging plea rolls (CP40, KB27) from The National Archives. It uses images from the AALT website provided by Robert Palmer, Elspeth Rosbrook, and Susanne Brand. Initially focused on KB27/795, the model tackles dense, abbreviated Court Hand script.

The training strategy was iterative: HTR processing was followed by refinement using an LLM (Anthropic's Claude 3.7 Sonnet) guided by paleographic rules and Vance Mead's index. Uncertain lines, identified by a high Character Error Rate (CER) across multiple LLM transcriptions, were tagged "unclear." These "unclear" lines, often the result of manuscript damage or difficult script, were excluded from the ground truth used to retrain PyLaia. The result was a "clean" training set of high-confidence transcriptions, which improved the model's accuracy on clearer text and achieved ~5% CER on the target roll.

The transcription philosophy emphasizes manuscript fidelity: non-expansion of abbreviations, strict line integrity, and precise letterforms and capitalization. Because the model was trained only on clean data from KB27/795, it is most accurate on that roll; it is expected to perform well on similar rolls, with accuracy degrading gracefully as the material diverges. It provides visually faithful, non-expanded transcriptions that enhance access to these vital historical records, especially their clearer sections.
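The "unclear" filtering described above could be sketched roughly as follows. This is a minimal illustration, not the author's actual pipeline. It reads "high CER from multiple LLM transcriptions" as disagreement between several transcriptions of the same line. The mean pairwise comparison, the normalisation by the longer string, the 0.10 threshold, and all function names are assumptions made for the example.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def pairwise_cer(a: str, b: str) -> float:
    """Edit distance normalised by the longer string (assumed convention)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def disagreement(transcriptions: list[str]) -> float:
    """Mean pairwise CER across all transcriptions of one line."""
    pairs = list(combinations(transcriptions, 2))
    if not pairs:
        return 0.0
    return sum(pairwise_cer(a, b) for a, b in pairs) / len(pairs)


def split_training_lines(candidates: dict[str, list[str]], threshold: float = 0.10):
    """Separate line IDs into clean (kept for retraining) and unclear (excluded).

    The threshold is a hypothetical value; the source does not state one.
    """
    clean, unclear = [], []
    for line_id, versions in candidates.items():
        target = unclear if disagreement(versions) > threshold else clean
        target.append(line_id)
    return clean, unclear
```

With this sketch, a line whose transcriptions all agree lands in the clean set, while a line whose transcriptions diverge strongly is tagged unclear and left out of the ground truth.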

Low error rate: 5.25% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 5.25% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material.
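As a rough illustration of the metric, CER is commonly computed as the Levenshtein edit distance between the reference and the recognised text, divided by the reference length. The sketch below assumes that common definition. How Transkribus normalises text before scoring is not stated here, and the sample line is invented for the example.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Minimum number of substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate as a percentage of the reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# Invented example: one wrong character in a 21-character line.
reference = "Midd' ss Joh'es Smyth"
recognised = "Midd' ss Joh'es Smith"
print(f"{cer(reference, recognised):.2f}% CER")
```

One error in 21 characters gives a CER just under 5%, so a score of 5.25% corresponds to roughly one wrong character in every 19.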

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.

Words: 181,441
Lines: 11,177
Training Pages: 422
Model ID: 336333
Languages: Latin
Centuries: 14th c., 15th c., 16th c.