wouter.haverals · PyLaia · Published October 14, 2023

Middle Dutch Gothic 14th Century

Text Recognition

Description

This handwritten text recognition model is trained on medieval manuscripts from the Carthusian monastery of Herne (ca. 1350-1400). It specifically targets the Middle Dutch texts produced by the monastery's scribal community and is designed to transcribe manuscripts written by the approximately 13 identified scribes who were active there during this period.

The training data consists of hyper-diplomatic transcriptions that meticulously preserve original manuscript features, including abbreviations, punctuation marks, marginal annotations, and exact glyph forms. The model is particularly suited to transcribing the Gothic scripts typical of 14th-century scriptoria in the Low Countries. It captures the distinctive scribal practices of the Herne community, including its characteristic abbreviation systems, its orthographic preferences, and the annotations that arose from its silent scribal practices.

The diplomatic transcriptions used for training were created by a team of researchers including Wouter Haverals and Mike Kestemont, together with collaborators Anouck Kuypers, Sam Verellen, Frans de Jonge, and Ine Kiekens. Following 'graphemic reproduction' standards, these transcriptions keep spelling as it appears in the manuscripts and preserve all abbreviations exactly as written on the page. The team manually transcribed at least 10% of each manuscript (1,199 folios transcribed in total) to ensure representative coverage across the entire corpus; some transcriptions were bootstrapped from existing diplomatic editions.

=================

This model was developed as part of the FWO-funded research project "Silent voices: A Digital Study of the Herne Charterhouse as a Textual Community (ca. 1350-1400)".

=================

References:
- Project website: https://hosting.uantwerpen.be/silent-voices/
- Haverals, W., & Kestemont, M. (2023). The Middle Dutch manuscripts surviving from the Carthusian monastery of Herne (14th century): constructing an open dataset of digital transcriptions. CEUR Workshop Proceedings (http://ceur-ws.org), ISSN 1613-0073.
- Haverals, W., & Kestemont, M. (2020). Silent voices: a digital study of the Herne Charterhouse Scribal Community (ca. 1350-1400). Queeste, 27(2), 186-195.

Very low error rate: 2.4% CER

Character Error Rate (CER) measures the proportion of characters recognised incorrectly: the number of character substitutions, deletions, and insertions needed to turn the model's output into the reference transcription, divided by the number of characters in the reference. Lower is better. This model scored 2.4% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material. This is a larger model trained on diverse material, which generally makes it more robust across different handwriting styles. That said, larger and more varied training sets also make it harder to push the CER down further.

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.
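To make the CER definition concrete, here is a minimal sketch of the standard computation: a character-level edit distance per line, summed over all lines and divided by the total number of reference characters. This is not Transkribus's own evaluation code. It assumes corpus-level aggregation and no Unicode normalization, and the function names `levenshtein` and `cer` and the sample line are hypothetical.

```python
# Minimal sketch of corpus-level Character Error Rate (CER).
# Assumption: standard definition only; not Transkribus's own evaluation code.


def levenshtein(ref: str, hyp: str) -> int:
    """Return the minimum number of character substitutions, deletions and
    insertions needed to align hyp with ref (dynamic-programming edit distance)."""
    prev = list(range(len(hyp) + 1))  # row for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # r missing from hyp (deletion)
                curr[j - 1] + 1,         # h extra in hyp (insertion)
                prev[j - 1] + (r != h),  # substitution, free when the characters match
            ))
        prev = curr
    return prev[-1]


def cer(refs: list[str], hyps: list[str]) -> float:
    """Total edits divided by total reference characters, over all lines."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of lines")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference transcriptions are empty")
    total_edits = sum(levenshtein(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars


if __name__ == "__main__":
    # Hypothetical example line: the recogniser dropped one 'h'.
    refs = ["ende die heilighe gheest"]
    hyps = ["ende die heilige gheest"]
    print(f"CER = {cer(refs, hyps):.2%}")  # 1 edit / 24 reference characters
```

At the reported 2.4%, this corresponds to roughly 24 character edits per 1,000 reference characters. Because the training transcriptions are hyper-diplomatic, abbreviation signs are characters in their own right, so under this computation a missed or wrong abbreviation mark counts as an error like any other.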

Words: 284,777
Lines: 41,855
Training pages: 1,199
Model ID: 55703
Languages: Dutch, Middle (ca. 1050-1350); Latin; Dutch