brievenproject · PyLaia · Published June 26, 2023

NIOD_WarLet_1935-1950

Text Recognition

Description

The HTR model ‘NIOD_WarLet_1935-1950’ was trained on handwritten correspondence in Dutch from the period 1935-1950. The training set consists of 1,087 ‘Ground Truth’ transcriptions of high-resolution scans. All documents included are part of the archival collection ‘247 Correspondentie’, held by the NIOD Institute for War, Holocaust, and Genocide Studies in Amsterdam. The training set contains personal correspondence from a wide variety of letter writers (e.g., children, soldiers, Jewish people in hiding).

This model was created as part of the project ‘First-Hand Accounts of War: War letters (1935-1950) from NIOD digitised’, which ran from 2020 to 2023. All documents used for training and validation were scanned and transcribed within this project. The project was funded by the Mondriaan Fund, the Dutch Ministry of Health, Welfare, and Sport, and the NIOD Institute for War, Holocaust, and Genocide Studies.

The ‘Ground Truth’ training set was created by project members Annelies van Nispen, Carlijn Keijzer, and Milan van Lange. Additional transcription and correction of the ‘Ground Truth’ transcriptions was performed, under the supervision of Muriël Bouman, by citizen scientists Hillebrand Verkroost, Bart Cohen, Evelien Bachrach, Marjo Janssens, and Cocky Sietses. The validation set contains a sample of 17 ‘Ground Truth’ transcriptions from various writers and sub-collections.

The model was trained with PyLaia HTR for 250 epochs at a learning rate of 0.0003, using the HTR model ‘IJsberg_PyLaia’ (id: 38769) as a base model.

Very low error rate: 4.6% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 4.6% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material. This is a larger model trained on diverse material, which generally makes it more robust across different handwriting styles. That said, larger training sets also make it harder to push the CER down further.
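As an illustration (not part of the original model card), the CER described above can be computed from the Levenshtein edit distance between the recognised text and the ‘Ground Truth’ transcription. A minimal Python sketch, with a hypothetical Dutch example line:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits (insertions,
    deletions, substitutions) needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate as a percentage of the reference length."""
    return 100.0 * levenshtein(reference, hypothesis) / len(reference)

# Hypothetical example: one deletion and one substitution
# in a 20-character ground-truth line give a CER of 10%.
print(round(cer("de brief is gelezen.", "de brif is gelesen."), 1))  # 10.0
```

A model-level CER, such as the 4.6% reported here, is typically computed over the whole validation set: total edit distance across all lines divided by the total number of ground-truth characters, rather than an average of per-line rates.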

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.

Words: 177,850
Lines: 23,697
Training Pages: 1,087
Model ID: 53102
Languages: Dutch