Voices of the People, Aarhus University · PyLaia · Published September 10, 2025

Tekst-model, 1700-tallets administrative gotiske håndskrift, september 2025

Text Recognition

Description

The model was trained on proofread pages from 37 petition protocols (supplikprotokoller) of the Danish central administration (Danske Kancelli), covering the period 1740 to 1769. The texts contain 18th-century Danish Kurrent/Gothic handwriting from various hands. A few Latin phrases and terms, as well as Latin letters, also occur, and the model is trained on these too. Most of the training data comes from the 1758 petition protocol; the remainder is evenly distributed across the other 36 protocols. The best results are achieved with advanced settings, the language model, and smart search enabled.

Line Keller Nørbøge Ottosen and Anne Sørensen trained the model as part of the Voices of the People project, led by Nina Javette Koefoed and funded by the Carlsberg Foundation. Further information can be found at https://cas.au.dk/voices. All protocols are transcribed and published at https://app.transkribus.org/sites/supplikker/. The model was trained using "18C Danish Administrative Writing (PyLaia)" as its base model.

Very low error rate: 3.14% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 3.14% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material. This is a larger model trained on diverse material, which generally makes it more robust across different handwriting styles. That said, larger training sets also make it harder to push the CER down further.

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.
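As an illustration of what the CER figure means, the sketch below computes a character error rate as the Levenshtein edit distance between a reference transcription and a recognised line, divided by the reference length. This is the common textbook definition; the page does not state how Transkribus normalises text (whitespace, case, punctuation) before scoring, so the function names and the example strings here are assumptions for illustration only.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character insertions, deletions and
    substitutions needed to turn `hypothesis` into `reference`."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # character missing from the hypothesis
                curr[j - 1] + 1,     # extra character in the hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate as a fraction of the reference length.
    No text normalisation is applied (an assumption of this sketch)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical example: one dropped character in a 15-character word.
print(f"{cer('supplikprotokol', 'suplikprotokol'):.2%}")  # prints 6.67%
```

To check the model against your own material, you could run a comparison like this on a handful of pages you have transcribed by hand and see how the result compares with the 3.14% reported for the validation set.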

Words: 1,273,057
Lines: 217,711
Training Pages: 1,573
Model ID: 397597
Languages: Danish