National Archives Netherlands · PyLaia · Published December 17, 2021

IJsberg_PyLaia

Text Recognition

Description

This is the second model created by the National Archives of the Netherlands. It is based on the careful transcription of dozens of different hands from the 17th, 18th and 19th centuries. The training material comprises scans of the Incoming Documents of the Dutch East India Company (Overgekomen Brieven en Papieren van de VOC) held by the National Archives of the Netherlands, and of 19th-century notarial deeds from the Noord-Hollands Archief and eight other State Archives in the provinces. DocIDs: 146280, 146321, 158134, 165671, 165672, 192776, 192777, 269375. Every 100th scan is ground truth (GT).

Very low error rate: 4.1% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 4.1% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material. This is a larger model trained on diverse material, which generally makes it more robust across different handwriting styles. That said, larger training sets also make it harder to push the CER down further.
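To make the metric concrete, here is a minimal sketch of how CER is commonly computed: the character-level edit distance (insertions, deletions, substitutions) between the recognised text and the reference transcription, divided by the length of the reference. The function names `levenshtein` and `cer` are illustrative, and the exact normalisation Transkribus applies before scoring (whitespace, punctuation, case) is not stated here, so treat this as an assumption rather than the platform's implementation.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character insertions, deletions and
    substitutions needed to turn `hypothesis` into `reference`."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # character missing from hypothesis
                curr[j - 1] + 1,     # extra character in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate as a fraction of the reference length.

    Assumption: no text normalisation is applied before comparison.
    """
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical example line, not taken from the training data.
    truth = "Overgekomen Brieven"
    recognised = "Overgecomen Brieuen"
    print(f"CER: {cer(truth, recognised):.1%}")
```

Multiply the result by 100 to get a percentage. Because the edit distance counts inserted characters as well, the value can exceed 100% when the recognised text is much longer than the reference.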

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.

Words: 1,538,478
Lines: 247,861
Training Pages: 5,917
Model ID: 38769