INT - Instituut voor de Nederlandse Taal · PyLaia · Published December 23, 2022

Dutch newspapers 17th century

Text Recognition

Description

This model for mixed roman/gothic Dutch print was trained on a ground truth set of 100 images of 17th century Dutch newspaper material from the "Couranten Corpus", cf. https://couranten.ivdnt.org/corpus-frontend/couranten/about. The corpus consists of ground truth transcriptions of newspapers digitized by the Dutch National Library. The model was trained with default parameters, with 10% of the set held out for validation.

Very low error rate: 3.5% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 3.5% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material.
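As an illustration of how a figure like this is typically obtained, here is a minimal sketch that computes CER as the Levenshtein edit distance between a reference transcription and the recognised text, divided by the length of the reference. This is the common textbook definition; the page does not state exactly how Transkribus computes or normalises its score, so treat the formula as an assumption. The function names and the sample strings are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate in percent (assumed definition: edits / reference length)."""
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# Hypothetical example: a long s misread as "f" in a 20-character line.
reference = "Amsterdam den 12 Mey"
recognised = "Amfterdam den 12 Mey"
print(f"{cer(reference, recognised):.1f}% CER")  # one substitution in 20 characters
```

Under this definition, a 3.5% CER corresponds to roughly 3 to 4 character edits per 100 reference characters.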

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.

Words: 116,287
Lines: 13,380
Training Pages: 90
Model ID: 48942
Centuries: 17th c.