Milanka Matić-Chalkitis (MultiHTR project) · PyLaia · Published June 27, 2024

Gabelsberger_natural

Text Recognition

Description

The model is based on various manuscripts written in Gabelsberger shorthand. The training data includes part of the diaries and supplementary sheets of Michael Cardinal von Faulhaber (https://www.faulhaber-edition.de/index.html), part of the minutes of the Council of Ministers from 1900 (mrp, oeaw.ac.at), some war diaries of Carl Schmitt (Arbeitsgruppe Carl Schmitt / Carl Schmitt Tagebücher on GitLab), and some ego-documents from the private estates of various individuals.

The model was trained by Milanka Matić-Chalkitis as part of the MultiHTR project (project leader: Prof. Dr. Achim Rabus) at the Department of Slavic Languages and Literatures of the University of Freiburg (Germany). We would like to thank Dr. Philipp Gahn and Dr. Michael Pilarski (Institute of Contemporary History, Munich-Berlin), Dr. Stephan Kurz (Austrian Academy of Sciences), and Prof. Dr. Florian Meinl (University of Göttingen) and his team for kindly providing the ground-truth data and for their close cooperation.

The model is intended to assist those who have little or no expertise in Gabelsberger shorthand but who wish to explore the content of their documents themselves. Note that, although the model is based on a variety of training data, the quality of automatic transcription varies between individual manuscripts. We recommend comparing the model's transcription results with and without the language model.

Moderate error rate: 13.38% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 13.38% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material. This is a larger model trained on diverse material, which generally makes it more robust across different handwriting styles. That said, larger training sets also make it harder to push the CER down further.

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.

Words: 429,663
Lines: 30,691
Training Pages: 1,281
Model ID: 119053