Ville-Pekka Kääriäinen · PyLaia · Published May 20, 2023

Swedish 17th century (Savo, Eastern Finland)

Text Recognition

Description

This model has been developed for 17th-century handwritten Swedish. The training material is written in the Gothic handwriting style, also known as the 'German' handwriting style.

The model was developed as part of Ville-Pekka Kääriäinen's doctoral project at the University of Helsinki, which focuses on 17th-century Upper Savonia (Ylä-Savo, Iisalmi/Idensalmi parish, Eastern Finland). Because of this specific geographic scope, the model's ability to recognise proper nouns (person and place names) from other regions may be somewhat limited. For more information on the model data, see: https://readcoop.eu/model/swedish-17th-century-savo-eastern-finland/

The model adheres closely to the source material in its structure. Monetary units, measurement units, and other abbreviations are transcribed according to their inherent logic, without being expanded. For instance, common currency units such as the mark and the thaler (swe: daler) are rendered as m/m:r or D/D:r, depending on the context.

The model has been created through substantial personal effort and commitment. It is my hope that it will prove beneficial to others. I am open to collaboration to further develop this model. Please feel free to contact me at: v.kaariainen@gmail.com.

Ground truth (GT):
Pages: 1,353 (training set) + 147 (validation set) = 1,500 pages
Words: 472,655 (training set) + 51,613 (validation set) = 524,268 words

Very low error rate: 3.8% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 3.8% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material. This is a larger model trained on diverse material, which generally makes it more robust across different handwriting styles. That said, larger training sets also make it harder to push the CER down further.

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.
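To make the CER figure concrete, here is a minimal Python sketch of the common definition: the character-level edit distance (insertions, deletions, and substitutions) between the recognised text and the reference transcription, divided by the length of the reference. This is an illustration only. The page does not state how Transkribus computes CER in detail (for example, how whitespace or normalisation is handled), and the function names and sample strings are hypothetical.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn hypothesis into reference."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate as a fraction of the reference length
    (assumed definition; not taken from the Transkribus source)."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical example: one wrong character in a five-character word.
example_cer = cer("daler", "dalar")
print(f"CER: {example_cer:.1%}")
```

Under this definition, a CER of 3.8% corresponds to roughly 38 character edits per 1,000 reference characters. Running the same calculation on a few manually transcribed pages of your own material gives a more realistic estimate than the validation-set figure.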

Words: 470,065
Lines: 60,595
Training Pages: 1,353
Model ID: 52321
Languages: Swedish