Tobias Hodel · PyLaia · Published May 21, 2022

Medieval_Scripts_M2.4

Text Recognition

Description

This model combines ground truth for different charter and book scripts from a variety of projects and institutions, with the aim of building a generic model for Latin scripts of the Middle Ages. It is mainly based on documents from the projects CREMMA Manuscrits médiévaux latins, HIMANIS (CNRS), Itinera Nova (Stadsarchief Leuven), and Charters and Records of Königsfelden (Universität Zürich).

CREMMA Manuscrits médiévaux latins was produced by Clérice, Thibault; Chagué, Alix; Vlachou Efstathiou, Malamatenia. Licensed under a CC-BY 4.0 license. URL: https://github.com/HTR-United/CREMMA-Medieval-LAT

HIMANIS is partially published as HIMANIS Guérin, produced by Stutzmann, Dominique; Hamel, Sébastien; Kernier, Iseut de; Mühlberger, Günter; Hackl, Günter. Licensed under a CC-BY 4.0 license. DOI: 10.5281/zenodo.5535306

Charters and Records of Königsfelden Abbey and Bailiwick (1308-1662) was produced by Halter-Pernet, Colette; Teuscher, Simon; Hodel, Tobias; Barwitzki, Lukas; Egloff, Salome; Henggeler, Fabian; Nadig, Michael; Steinmann, Anina; Stettler, Sabine; Prada Ziegler, Ismail. Licensed under a CC-BY 4.0 license. DOI: 10.5281/zenodo.5179361

Low error rate: 7.1% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 7.1% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material. This is a larger model trained on diverse material, which generally makes it more robust across different handwriting styles. That said, larger training sets also make it harder to push the CER down further.

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.
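
To make the CER figure concrete, here is a minimal sketch of how such a rate can be computed for a single line. The page does not say exactly how Transkribus calculates its CER; the sketch assumes the common definition, namely the Levenshtein edit distance between the recognised text and the ground truth, divided by the length of the ground truth. The function names and the sample strings are illustrative only.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `hypothesis` into `reference`."""
    # previous[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER as edit distance divided by reference length (assumed definition)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical example: the recogniser reads "m" as "rn".
ground_truth = "in nomine domini amen"
recognised = "in nomine dornini amen"
print(f"CER: {character_error_rate(ground_truth, recognised):.1%}")
```

Under this definition, a CER of 7.1% corresponds to roughly seven character edits per hundred characters of ground truth.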

Words: 7,103,723
Lines: 1,020,225
Training Pages: 24,764
Model ID: 42143
Languages: French Middle (ca. 1400-1600), German Middle High (ca. 1050-1500), Latin, Dutch