brievenproject · Baselines · Published December 19, 2022

Baselines_NIOD_WarLet_1935-1950

Layout Analysis

Description

The layout model ‘Baselines_NIOD_WarLet_1935-1950’ was trained on handwritten correspondence in Dutch from the period 1935-1950. The training set consists of 751 ‘Ground Truth’ transcriptions of high-resolution scans. All documents are part of the archival collection ‘247 Correspondentie’ held by the NIOD Institute for War, Holocaust, and Genocide Studies in Amsterdam. The training set contains personal correspondence from a wide variety of letter writers (e.g., children, soldiers, Jewish people in hiding).

This model was created as part of the project ‘First-Hand Accounts of War: War letters (1935-1950) from NIOD digitised’, which ran from 2020 to 2023. All documents used for training were scanned and transcribed within this project. The project was funded by the Mondriaan Fund, the Dutch Ministry of Health, Welfare, and Sport, and the NIOD Institute for War, Holocaust, and Genocide Studies in Amsterdam.

The ‘Ground Truth’ training set was created by project members Annelies van Nispen, Carlijn Keijzer, and Milan van Lange. Additional transcription and layout correction of the ‘Ground Truth’ transcriptions were performed under the supervision of Muriël Bouman by citizen scientists Hillebrand Verkroost, Bart Cohen, Evelien Bachrach, Marjo Janssens, and Cocky Sietses.
Very low loss: 4.45%

Loss indicates how far the predicted text regions deviate from the ground truth (lower is better). This model achieved 4.45% on its validation set. A loss below 10% generally indicates reliable baseline detection. Trained on a broad range of page layouts, this model should generalise well. Complex or unusual structures may still require fine-tuning.

Layout detection quality depends heavily on your document's structure. Pages with columns, marginalia, or non-standard layouts may produce different results.

Words: 120,737
Lines: 16,478
Training Pages: 751
Model ID: 48888
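
The training-set figures above can be turned into rough per-page and per-line averages, which may help when judging whether your own letters resemble the material the model was trained on. The following is a minimal sketch using only the word, line, and page counts listed on this page; the helper name `per_unit` is hypothetical, and the averages describe the training data as a whole, not any requirement of the model.

```python
# Figures copied from the model statistics on this page.
WORDS = 120_737
LINES = 16_478
PAGES = 751


def per_unit(total: int, units: int) -> float:
    """Return `total` divided by `units`, rounded to two decimals.

    Hypothetical helper for illustration only.
    """
    return round(total / units, 2)


# Simple averages over the whole training set.
words_per_line = per_unit(WORDS, LINES)
lines_per_page = per_unit(LINES, PAGES)
words_per_page = per_unit(WORDS, PAGES)

print(f"Average words per line: {words_per_line}")
print(f"Average lines per page: {lines_per_page}")
print(f"Average words per page: {words_per_page}")
```

On these figures, a training page averages roughly 22 lines of about 7 words each. Pages that are much denser or sparser than this are not necessarily a problem, but they are worth spot-checking after layout analysis.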