Dorothee Huff · PyLaia · Published September 6, 2023

German_Gothic_Scripts_14th-16th_century

Text Recognition

Description

This model is based on 50+ manuscripts (ca. 500 pages) containing religious texts and travelogues in prose and verse from the 15th and 16th centuries. The texts are written mainly in Gothic cursive and Bastarda, with some Textualis and Current script. They include various Middle High German and Low German dialects.

Transcription guidelines:
- abbreviations are expanded
- s-forms are normalised
- diacritical marks are mostly kept (because the edition guidelines changed during the project, there may be some inconsistencies)

The Ground Truth was created by the project “Narrative Vermittlung religiösen Wissens. Edition und Kommentierung geistlicher Vers- und Prosatexte des 13. bis 16. Jahrhunderts” at the Universities of Köln and Tübingen (https://religioese-kurzerzaehlungen.uni-koeln.de/), which is funded by the DFG, and by the project “Edition der deutschen Übersetzung der ‚Voyages‘ des Jean de Mandeville durch Otto von Diemeringen” (originally funded by the DFG and the Fritz Thyssen-Stiftung). The model training was carried out in cooperation with the University Library of Tübingen and the project OCR-BW (https://ocr-bw.bib.uni-mannheim.de/), which was funded by the MWK Baden-Württemberg.
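If you compare your own transcriptions with this model's output, differences in Unicode encoding or in the use of long s can inflate the error count even when the readings agree. The sketch below is a hypothetical pre-processing step, not part of the model or of Transkribus. It assumes that "s-forms are normalised" means long s (ſ, U+017F) is written as round s, and it applies Unicode NFC so that a diacritic typed as base letter plus combining mark matches its precomposed form. Diacritics themselves are kept, in line with the guidelines.

```python
import unicodedata

LONG_S = "\u017f"  # ſ LATIN SMALL LETTER LONG S


def normalise_for_comparison(text: str) -> str:
    """Hypothetical helper: harmonise encoding before comparing transcriptions."""
    # NFC composes base letter + combining mark (e.g. "u" + U+0308) into a
    # precomposed character (U+00FC) wherever Unicode defines one.
    text = unicodedata.normalize("NFC", text)
    # Assumption: the guidelines render long s as round s. NFC leaves ſ
    # unchanged, so it has to be replaced explicitly.
    return text.replace(LONG_S, "s")


print(normalise_for_comparison("ſprach u\u0308ber"))
```

NFKC would also fold ſ into s, but it rewrites many other compatibility characters as well, so the narrower NFC-plus-replace approach is used here.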

Very low error rate: 4.1% CER

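Character Error Rate (CER) measures the proportion of characters recognised incorrectly: the number of character substitutions, deletions and insertions needed to turn the recognised text into the ground truth, divided by the number of characters in the ground truth. Lower is better. This model scored 4.1% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material.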
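As an illustration of the definition above, the following minimal sketch computes CER from the Levenshtein (edit) distance between a ground-truth line and a recognised line. It is the textbook formulation, not necessarily the exact procedure Transkribus uses (which may, for example, normalise whitespace first); the function names and the sample line are hypothetical.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character substitutions, deletions and insertions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance divided by ground-truth length."""
    if not ref:
        raise ValueError("ground-truth text must not be empty")
    return levenshtein(ref, hyp) / len(ref)


# One inserted character in a 14-character ground-truth line.
print(f"{cer('vnd got sprach', 'vnd gott sprach'):.1%}")  # 7.1%
```

Because insertions count as errors, CER can exceed 100% when the recognised text is much longer than the ground truth.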

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.

Words: 103,304
Lines: 16,387
Training Pages: 465
Model ID: 54887
Languages: German; Middle High German (ca. 1050-1500); Low German
Centuries: 14th c., 15th c., 16th c.