Johan Heinsen (Aalborg University) and Max Odsbjerg Pedersen (Royal Danish Library) · Fields · Published October 23, 2025
Danish Newspapers 1800-1900
Field Extraction · Scholar+
Description
This model was trained on historical newspapers from the Danish newspaper collection at the Royal Danish Library (Det Kongelige Bibliotek), spanning 1800 to 1900. It is trained to handle highly variable newspaper formats, ranging from small A4 pages with two columns to large-format A2 broadsheets with up to six columns. For the best results and a correct reading order, apply column-wise region sorting after segmentation.
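As a rough illustration of what column-wise region sorting means, the Python sketch below orders detected regions column by column (left to right), then top to bottom within each column. It is not the Transkribus implementation. The `Region` record, the `sort_column_wise` function, and the assumption of equal-width columns with a known column count are all hypothetical simplifications made for this example.

```python
from dataclasses import dataclass


@dataclass
class Region:
    # Hypothetical region record: a bounding box in pixel coordinates.
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height
    label: str = ""


def sort_column_wise(regions, page_width, n_columns):
    """Order regions by column (left to right), then top to bottom.

    Assumes equal-width columns, which real newspaper pages only
    approximate; this is a simplification for illustration.
    """
    col_width = page_width / n_columns

    def key(region):
        centre_x = region.x + region.w / 2
        # Assign the region to a column by its horizontal centre,
        # clamped so slightly out-of-page boxes still get a valid column.
        col = int(centre_x // col_width)
        col = max(0, min(col, n_columns - 1))
        return (col, region.y, region.x)

    return sorted(regions, key=key)


# Four regions on a two-column page, listed in arbitrary order.
regions = [
    Region(x=520, y=100, w=450, h=300, label="c"),
    Region(x=20, y=600, w=450, h=300, label="b"),
    Region(x=20, y=100, w=450, h=300, label="a"),
    Region(x=520, y=600, w=450, h=300, label="d"),
]

ordered = [r.label for r in sort_column_wise(regions, page_width=1000, n_columns=2)]
print(ordered)  # ['a', 'b', 'c', 'd']
```

A plain top-to-bottom sort would interleave the two columns (a, c, b, d), which is why the column index comes first in the sort key.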
High precision: 89.28% MaP
Mean Average Precision (MaP) measures how accurately the model detects field regions (higher is better). This model scored 89.28% on its validation set. MaP is harder to compare across models than CER, because the score depends heavily on how many distinct region types the model must distinguish. A model detecting a handful of simple fields will naturally score higher than one trained to recognise many fine-grained regions, even if both perform well in practice.
This score reflects performance on the model's own validation data. Your results will depend on how closely your documents match the training material and the complexity of the structures you need to detect.
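To make the metric more concrete, here is a generic Python sketch of mean average precision for region detection. This page does not state the exact evaluation protocol behind the 89.28% figure, so the details are assumptions: the IoU threshold of 0.5, the greedy matching, and the non-interpolated form of average precision. The function names and sample boxes are hypothetical. Other common variants average over several IoU thresholds or use interpolated precision.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(predictions, ground_truth, iou_threshold=0.5):
    """Non-interpolated AP for one region type.

    predictions: list of (confidence, box); ground_truth: list of boxes.
    The 0.5 threshold is an assumed value, not taken from the source.
    """
    if not ground_truth:
        return 0.0
    matched = set()
    hits = []
    # Visit predictions from most to least confident.
    for _, box in sorted(predictions, key=lambda p: p[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt_box in enumerate(ground_truth):
            if idx in matched:
                continue  # each ground-truth region can be matched once
            score = iou(box, gt_box)
            if score > best_iou:
                best_iou, best_idx = score, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            hits.append(True)
        else:
            hits.append(False)
    # Sum precision at the rank of each true positive, then normalise
    # by the number of ground-truth regions.
    true_positives = 0
    precision_sum = 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / len(ground_truth)


def mean_average_precision(per_type):
    """Average the per-type AP over all region types."""
    scores = [average_precision(preds, gts) for preds, gts in per_type.values()]
    return sum(scores) / len(scores)


# Hypothetical data: two region types on one page.
per_type = {
    "article": (
        [(0.9, (0, 0, 100, 100)), (0.8, (500, 500, 600, 600)), (0.7, (200, 0, 300, 110))],
        [(0, 0, 100, 100), (200, 0, 300, 100)],
    ),
    "headline": (
        [(0.95, (0, 0, 50, 20))],
        [(0, 0, 50, 20)],
    ),
}

map_score = mean_average_precision(per_type)
print(round(map_score, 4))
```

Because the final score is an average over region types, adding more fine-grained types that are hard to tell apart tends to pull it down, which is the comparability caveat described above.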
Words: 149,913
Lines: 40,423
Training Pages: 1,467
Model ID: 420801