Milanka Matić-Chalkitis (MultiHTR project) · PyLaia · Published June 21, 2024

Faulhaber

Text Recognition

Description

This is the first version of a specialised model for the German Gabelsberger shorthand system, based on the handwriting of Michael Cardinal von Faulhaber. The training data comprises several of his handwritten diaries and so-called supplementary sheets, written in Gabelsberger shorthand between 1911 and 1952.

The model was trained by Milanka Matić-Chalkitis as part of the MultiHTR project (project leader: Prof. Dr. Achim Rabus) at the Department of Slavic Languages and Literatures of the University of Freiburg (Germany). The training data was kindly provided by the project “Kritische Online-Edition der Tagebücher Michael Kardinal von Faulhabers (1911-1952)” (https://www.faulhaber-edition.de/index.html). The data may be used subject to the applicable rights of use (https://www.faulhaber-edition.de/impressum.html#lizenz). We would particularly like to thank the project collaborators Dr. theol. Philipp Gahn and Dr. theol. Michael Pilarski for their expert support and for the exchange of ideas.

The model is best suited to documents handwritten by Michael Cardinal von Faulhaber in Gabelsberger shorthand. More generally, it can be used as a transcription aid for any manuscript written in Gabelsberger shorthand, in order to make the context of a text accessible. The “Gabelsberger_natural” model can also be used for manuscripts written in natural Gabelsberger shorthand.

Moderate error rate: 12.17% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 12.17% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material. This is a larger model trained on diverse material, which generally makes it more robust across different handwriting styles. That said, larger training sets also make it harder to push the CER down further.

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.
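To get a feel for how the model performs on your own material, you can compare a manually corrected transcription with the model output and compute the CER yourself. The following is a minimal sketch, assuming that CER is defined as the character-level edit distance (insertions, deletions, substitutions) divided by the length of the reference text. The function names `levenshtein` and `cer` are illustrative, and the exact text normalisation that Transkribus applies before scoring is not specified here, so figures may differ slightly from those shown in the platform.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character edits turning reference into hypothesis."""
    # prev[j] holds the distance between the processed part of the
    # reference and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # character missing in the hypothesis
                curr[j - 1] + 1,     # extra character in the hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate as a fraction of the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical example: one wrong character in an eight-character word.
print(f"{cer('Tagebuch', 'Tagebach'):.2%}")  # 12.50%
```

One wrong character in eight gives 12.5%, which is roughly the error density the reported 12.17% corresponds to: about one character in eight needs correction.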

Words: 259,537
Lines: 19,953
Training Pages: 829
Model ID: 113533