michael.schonhardt · PyLaia · Published July 27, 2023

bdd-wormser-scriptorium-expanded-0.1

Text Recognition

Description

This model was trained as part of the ongoing edition project Burchards Dekret Digital (www.burchards-dekret-digital.de), funded by the Academy of Sciences and Literature Mainz. It is the project's first high-quality model specifically designed to produce a normalized transcription. The training data comes from three 11th-century manuscripts that can be traced to the episcopal scriptorium in Worms: Bamberg, SB, Msc.Can.6 (https://mdz-nbn-resolving.de/urn:nbn:de:bvb:12-bsb00140701-0), Frankfurt, UB, Ms. Barth. 50 (https://sammlungen.ub.uni-frankfurt.de/msma/urn/urn:nbn:de:hebis:30:2-12488), and Vatican, BAV, Pal.lat.585 (https://digi.vatlib.it/mss/detail/Pal.lat.585). Although trained on these 11th-century manuscripts, it also works well as a base model for later medieval scripts. The model was trained by Dr. Michael Schonhardt (Universität Kassel, https://orcid.org/0000-0002-2750-1900). Transcriptions were provided and proofread by Helena Geitz, Daniel Gneckow, Dr. Andreas Grote, Prof. Dr. Lotte Kéry, Dr. Birgit Kynast, Dr. Hans-Christian Lehner, Dr. Melanie Panse-Buchwalter, Michaela Parma, Dr. Cornelia Scherer, Dr. Michael Schonhardt, and Dr. des. Elena Vanelli. The project is led by Prof. Dr. Ingrid Baumgärtner, Prof. Dr. Klaus Herbers, and Prof. Dr. Ludger Körntgen.

Very low error rate: 3.3% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 3.3% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material.

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.
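
To make the CER figure concrete, here is a minimal sketch of how a character error rate is commonly computed: the edit (Levenshtein) distance between a reference transcription and the recognised text, divided by the length of the reference. This is the standard textbook definition, not necessarily the exact calculation Transkribus performs (any normalisation or whitespace handling it applies is not stated here). The function names and the sample strings are illustrative assumptions.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character insertions, deletions and
    substitutions needed to turn `hypothesis` into `reference`."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # character missing from hypothesis
                curr[j - 1] + 1,     # extra character in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate as a fraction of the reference length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Illustrative strings only, not taken from the model's validation set.
ground_truth = "in nomine domini"
recognised = "in nomjne domni"   # one wrong letter, one missing letter
print(f"{cer(ground_truth, recognised):.1%}")  # prints 12.5%
```

To estimate how the model performs on your own material, compare its output against a few pages you have transcribed by hand using a calculation like this, rather than relying only on the validation figure.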

Words: 69,650
Lines: 16,046
Training Pages: 316
Model ID: 53889
Languages: Latin