Badische Landesbibliothek · PyLaia · Published March 25, 2024

Latin Incunabula (Reichenau)

Text Recognition

Description

This model is trained to recognize the Gothic and Antiqua typefaces found in Latin incunabula and early prints. It was developed by the project “Digitalisierung und Volltexterkennung der ehemals Reichenauer Inkunabeln” (Digitization and full-text recognition of the former Reichenau incunabula) at the Badische Landesbibliothek, which was funded by the Stiftung Kulturgut Baden-Württemberg. The Ground Truth used to train and evaluate this model is based on a collection of incunabula and post-incunabula from the former Reichenau monastery, now held at the Badische Landesbibliothek in Karlsruhe. Because typically only 1–20 pages were drawn from each document, the Ground Truth covers a wide range of typefaces used by printers in the German-speaking area and Northern Italy. The transcription of the Ground Truth follows the guidelines documented at https://doi.org/10.57962/regionalia-22875 and uses a range of Unicode characters to represent Latin abbreviations. This model was created by the Badische Landesbibliothek and is published under the CC-BY-SA 4.0 license.

Very low error rate: 0.6% CER

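Character Error Rate (CER) measures the percentage of characters recognised incorrectly: the number of character substitutions, deletions and insertions needed to turn the recognised text into the reference transcription, divided by the number of characters in the reference. Lower is better. This model scored 0.6% on its validation set, i.e. roughly six errors per 1,000 characters. As a rule of thumb, a CER below 10% is considered good for most handwritten material. This is a larger model trained on diverse material, which generally makes it more robust across different typefaces. That said, a larger and more varied training set also makes it harder to push the CER down further.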

Measured on the model's own validation data. Results on your documents may differ depending on typeface, document condition, language, and how closely your material resembles the training data.
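To make the CER definition concrete, here is a minimal Python sketch that computes it from a reference transcription and a recognition result using a plain Levenshtein edit distance over Unicode code points. The functions `levenshtein` and `cer` are hypothetical helpers written for illustration; they are not part of Transkribus or PyLaia, and Transkribus may normalize text (for example the Unicode characters used for abbreviations) differently before scoring.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, deletions and insertions
    needed to align hyp with ref (dynamic programming, two rows)."""
    # prev[j] = distance between the current ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: ref char missing in hyp
                curr[j - 1] + 1,         # insertion: extra char in hyp
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            )
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character in a seven-character word.
    print(f"{cer('dominus', 'dominns'):.1%}")  # prints 14.3%
```

Because the distance is divided by the reference length, a single misread letter weighs far more in a short line than in a long one; validation CERs such as the 0.6% above are therefore usually computed over many lines at once rather than averaged per line.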

Words: 672,762
Lines: 87,101
Training Pages: 1,449
Model ID: 61337
Languages: Latin
Centuries: 15th c., 16th c.
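The training-set figures above can be related to one another with simple arithmetic. The short sketch below derives average words per line, lines per page and words per page, assuming that the word, line and page counts all describe the same set of pages (the page does not state whether validation pages are included in these totals).

```python
# Training-set figures as listed above (assumed to describe the same pages).
WORDS = 672_762
LINES = 87_101
PAGES = 1_449

words_per_line = WORDS / LINES
lines_per_page = LINES / PAGES
words_per_page = WORDS / PAGES

# prints: 7.7 words/line, 60.1 lines/page, 464 words/page
print(f"{words_per_line:.1f} words/line, "
      f"{lines_per_page:.1f} lines/page, "
      f"{words_per_page:.0f} words/page")
```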