matija.ogrin · PyLaia · Published November 9, 2024

Slovenian 18th and 19th century manuscripts

Text Recognition

Description

This model recognises text in Slovenian manuscripts from the late 18th to the mid-19th century. It was trained on manuscript texts by four Slovenian writers. The total training set consists of approx. 170,000 words:

~ 55,000: Konrad Branka, Franciscan friar, theologian and professor; late 18th century
~ 20,000: Mihael Zagajšek, parish priest at Kalobje, preacher, spiritual writer, linguist
~ 12,000: Tobias Vernik, Franciscan lay brother; mid-19th century
~ 93,000: Ignazij Holzapfel, parish priest in Ribnica, preacher, spiritual writer

The size of the training set for each writer varies with the difficulty of the manuscript and the complexity of the hand. Holzapfel's handwriting is the most difficult of the four, so his training set is the largest.

The model was prepared by Marko Kunavar and Matija Ogrin. This work was funded by the CLARIN.SI consortium (Jožef Stefan Institute) and the Research Centre of the Slovenian Academy of Sciences and Arts (ARIS programme P6-0024).

Very low error rate: 3.29% CER

Character Error Rate (CER) measures the percentage of characters that are recognised incorrectly. Lower is better. This model scored 3.29% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material.
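To make the metric concrete, here is a minimal sketch of CER under its common definition: the character-level edit distance (substitutions, deletions, insertions) between a reference transcription and the recognised text, divided by the length of the reference. The function names are illustrative. This page does not describe how Transkribus normalises text or treats whitespace before scoring, so the sketch may not reproduce its figures exactly.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # previous[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1      # reference character missing from the output
            insertion = current[j - 1] + 1  # extra character in the output
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER as a fraction of the reference length (multiply by 100 for percent)."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One wrong character out of four gives a CER of 25%.
    print(f"{character_error_rate('abcd', 'abxd'):.2%}")
```

Running the same calculation on a few manually corrected lines from your own documents gives a rough indication of how well the model transfers to your material.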

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.

Words: 170,159
Lines: 15,564
Training pages: 258
Model ID: 216113
Languages: Slovenian
Centuries: 18th c., 19th c.