ilkka.jokipii · PyLaia · Published November 29, 2022

Suomi 1870-1917

Text Recognition

Description

Suomi 1870-1917, version 1. Finnish handwriting from 1870 to 1917. Created by Ilkka Jokipii (National Archives of Finland), Sami Suodenjoki (University of Tampere) and Maria Niku (Finnish Literature Society). The ground truth set consists of Finnish court records from 1870 to 1917, citizens' letters to the Governor General's Office and the diaries of Eliel Aspelin-Haapkylä. This is the first version of the model, and it will be improved. If you have transcribed Finnish text from the model's era and would like to share it to improve the ground truth set, contact Ilkka Jokipii (ilkka.jokipii@kansallisarkisto.fi).

Very low error rate: 2.2% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 2.2% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material. This is a larger model trained on diverse material, which generally makes it more robust across different handwriting styles. That said, larger training sets also make it harder to push the CER down further.

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.
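
To make the metric concrete, here is a minimal Python sketch of how a character error rate is commonly computed: the edit (Levenshtein) distance between the recognised text and the reference transcription, divided by the length of the reference. This is an illustration under that assumption; the page does not state exactly how Transkribus computes the figure, and the function names `levenshtein` and `cer` are hypothetical.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn ref into hyp."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j characters of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # substitute r with h (or match)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate as a fraction of the reference length
    (assumed definition: edit distance / number of reference characters)."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(ref, hyp) / len(ref)


# One wrong character out of eight gives a CER of 12.5%.
print(f"{cer('Helsinki', 'Helsinkl'):.1%}")
```

By the same arithmetic, a CER of 2.2% corresponds to roughly 22 character errors per 1,000 characters of reference text.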

Words: 446,199
Lines: 98,400
Training Pages: 1,553
Model ID: 48363
Languages: Finnish
Centuries: 19th c., 20th c.