Projekt Trug&Schein · PyLaia · Published January 19, 2024

T&S_R1_M1

Text Recognition

Description

HTR model for the handwriting of the teacher Roland Nordhoff (pseudonym) (*1907). Nordhoff wrote mostly in German Kurrent and occasionally in Sütterlin and in Latin script. His writing includes German umlauts and some special characters (ß, ſ), which we partly modernized (to ß and s). We did not use a base model. Laura Fahnenbruck and Andrew S. Bergerson (University of Missouri-Kansas City) trained the model on letters from 1942 and 1943. Volunteers in the public history project Trug&Schein: Ein Briefwechsel. Eine kritische Begegnung mit dem Alltag des Zweiten Weltkriegs – Schreib mit! (2011-2022) spent countless hours transcribing the training data as Ground Truth. The large corpus of Nordhoff's letters to his wife, and hers to him, spans the years 1938 to 1946 and is published as transcripts at https://alltag-im-krieg.de/startseite.

Very low error rate: 2% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 2% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material. This is a larger model trained on diverse material, which generally makes it more robust across different handwriting styles. That said, larger training sets also make it harder to push the CER down further.

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.
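
To check how the model performs on your own material, you can compare its output with a manual transcription of the same lines. The sketch below uses the common definition of CER (Levenshtein edit distance divided by the length of the reference text). Whether Transkribus computes its reported figure in exactly this way is an assumption, and the function names `levenshtein` and `cer` are illustrative only.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and
    substitutions needed to turn hyp into ref."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j characters of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # character missing from hyp
                curr[j - 1] + 1,     # extra character in hyp
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate of hyp against the reference ref (assumed
    definition: edit distance / reference length)."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical example line: the recogniser wrote "ss" instead of "ß".
    reference = "Straße"
    recognised = "Strasse"
    print(f"CER: {cer(reference, recognised):.1%}")
```

Because the result is normalised by the reference length, a handful of errors weighs heavily on a short sample. Averaging over many lines gives a steadier estimate.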

Words: 177,857
Lines: 21,596
Training Pages: 727
Model ID: 58659
Languages: German
Centuries: 20th c.