Vladimir Neumann (Berlin State Library) · PyLaia · Published December 4, 2024

Ogorodok-Limonar (Church Slavonic Prints from Kyiv) V1

Text Recognition

Description

This OCR model is based on printed Church Slavonic works from Kyiv and was trained using 700 ground-truth (GT) pages from Gistorija Varlaama Ioasafa (1637, Kutein). This foundation supports precise recognition of texts within the specific printing tradition of the Kyiv metropolitan area. The collection includes various theological and liturgical works published in different locations and time periods. The largest share belongs to Zercalo bogoslovija (1618, Pochaiv), accounting for 29.87% of the entries, followed by Ogorodok Marii Bogorodicy (1676, Kyiv) with 20.13%. The Limonar printed by Spiridon Sobol (1628, Kyiv) makes up 18.18% of the entries. Besedy na 14 poslanij svjatogo apostola Pavla (1623, Kyiv) represents 14.29% of the collection, while Triodʹ cvetnaja (Slozka) (1666, Lviv) accounts for 5.84%. Smaller shares include Paterik ili Otečnik Pečerskij (1661, Kyiv) with 3.25% and the Četveroevangelie from the Mamoniči printing house (1575, Vilna) with 1.95%. The model thus focuses strongly on Kyiv’s Church Slavonic printing tradition, with the individual works represented in clearly differing proportions. With a training base of 700 GT pages, it is intended to provide precise and reliable text recognition for printed works of this tradition. (Further information: https://slavistik-portal.de/corphub.html)

Low error rate: 5.49% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 5.49% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material.
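To make the metric concrete, here is a minimal Python sketch of one common way to compute CER: the Levenshtein edit distance between the reference transcription and the recognised text, divided by the reference length. This page does not say how Transkribus or PyLaia compute the figure internally, so the formula, the function names `edit_distance` and `cer`, and the sample strings are all illustrative assumptions.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning one string into the other."""
    # prev[j] holds the distance between the reference prefix processed so far
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate as a fraction of the reference length.

    Assumption: characters are Python code points, so a combining
    diacritic counts as its own character.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical example: one wrong character in a seven-character word.
    print(f"{cer('господь', 'господи'):.2%}")  # prints 14.29%
```

Under this definition, a CER of 5.49% corresponds to roughly one character edit per 18 reference characters.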

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.

Words: 28,875
Lines: 5,043
Training Pages: 145
Model ID: 236549
Languages: Belarusian, Church Slavic, Ukrainian