Projekt Trug&Schein · PyLaia · Published January 21, 2024

T&S_HR_M1

Text Recognition

Description

HTR model for the handwriting of the worker Hilde Nordhoff (pseudonym) (*1920) and the teacher Roland Nordhoff (pseudonym) (*1907), a couple from rural Germany. She wrote in modern German, in Latin script, including German umlauts and the special character ß. Some of her characters retain traces of German Kurrent, such as an overline or curve above the u. He wrote mostly in German Kurrent and occasionally in Sütterlin, in Latin script, including German umlauts and some special characters (ß, ſ) that we partly modernized (to ß and s). German Giant I was used as the base model. The model was trained on letters from 1942 and 1943 by Laura Fahnenbruck and Andrew S. Bergerson (University of Missouri-Kansas City). The training data was transcribed into ground truth by a group of volunteers over countless hours in the public history project Trug&Schein: Ein Briefwechsel. Eine kritische Begegnung mit dem Alltag des Zweiten Weltkriegs – Schreib mit! (2011-2022). The large corpus of Hilde and Roland Nordhoff's letters spans the years 1938 to 1946 and is published as transcripts at https://alltag-im-krieg.de/startseite.

Very low error rate: 2% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 2% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material. This is a larger model trained on diverse material, which generally makes it more robust across different handwriting styles. That said, larger training sets also make it harder to push the CER down further.

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.
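
To make the CER figure concrete, here is a minimal Python sketch of how such a score is commonly computed. It assumes the usual definition (Levenshtein edit distance between the recognised text and the ground truth, divided by the length of the ground truth); the page does not state the exact formula used for this model, and the function names `levenshtein` and `cer` as well as the sample strings are illustrative only.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn reference into hypothesis."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # delete ref_char
                curr[j - 1] + 1,     # insert hyp_char
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate as a fraction of the reference length
    (assumed definition, see the paragraph above)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical example: the recogniser writes "ss" where the ground truth has "ß".
ground_truth = "große Freude"
recognised = "grosse Freude"
print(f"CER: {cer(ground_truth, recognised):.1%}")
```

Under this definition, a 2% CER corresponds to roughly two character edits per 100 characters of ground truth. Because insertions count as errors, the value can exceed 100% for very poor output.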

Words: 241,144
Lines: 27,547
Training Pages: 1,333
Model ID: 58704
Languages: German
Centuries: 20th c.