Danish Newspapers 1750-1850

Description

This model was created to read Danish newspapers in their existing digitised form, as found in Mediestream or Loar. It was trained by Johan Heinsen, Camilla Bøgeskov and the team members of the project Klart som Blæk. For more information, see https://hislab.quarto.pub/aalborgonline/

The model performs best on running text. It reads Fraktur print better than Latin characters, although it can often still decipher the latter, since the newspapers used as training data occasionally include Latin characters. The model far outperforms OCR when dealing with deteriorated materials, small letterforms, or material that has been scanned from microfilm, as is the case with the Danish newspaper collection held by the Danish Royal Library.

It has been trained on materials from various advertisement papers, mainly from Copenhagen and Aalborg in the decades around 1800. When used on Danish material, it should be used with its language model. The model performs well on most of the newspapers from the period, though a special model suited to the colonial papers is needed, because these are often multilingual and use Latin characters far more extensively. As of October 2023, the Transkribus Print models perform better on the papers from St. Croix and St. Thomas. The model was updated in May 2025.

Very low error rate: 0.56% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 0.56% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material. This is a larger model trained on diverse material, which generally makes it more robust across different handwriting styles. That said, larger training sets also make it harder to push the CER down further.
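To make the metric concrete, here is a minimal sketch of how a CER is commonly computed: the character-level edit distance (insertions, deletions and substitutions) between a reference transcription and the model output, divided by the length of the reference. This is an assumption about the usual definition; the page does not state exactly how Transkribus computes its figure, and the function names `edit_distance` and `cer` are purely illustrative.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate as a fraction of the reference length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical example: one wrong character in a seven-character word.
    print(f"{cer('Aalborg', 'Aalbarg'):.2%}")
```

Under this definition, a CER of 0.56% corresponds to roughly one character error in every 180 characters of reference text.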

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.

Words: 420,266

Lines: 60,354

Training Pages: 642

Model ID: 306013

Related models

Nordic typewriter 1900-1950

19th century Schochisk Fractur

Seventeenth Century Danish Newspapers

Danish Fraktur SB 19th century PyLaia