Achim Rabus · PyLaia · Published January 25, 2025

Generic Church Slavonic Handwriting 3

Text Recognition

Description

Model for Church Slavonic handwriting, trained primarily on East Slavic material but also including South Slavic material. It is suitable for transcribing Old Cyrillic script styles (uncial and semi-uncial).

The training data includes:

- Codex Suprasliensis (10th-11th cc., South Slavic recension)
- Manuscript of the Catecheses of Cyril of Jerusalem (transmitted text version used: 11th c., East Slavic recension)
- Methodius of Olympus: Symposion (transmitted text version used: 17th c., East Slavic recension)
- Parts of the Velikie Minei Četʹi (16th c., East Slavic recension), including:
  - Large parts of the volumes for March and May
  - Apostolos from the June volume
  - Methodius of Olympus: De lepra from the June volume

The model is an update of the public Church Slavonic models Combined_Full_VKS_2 and VMC_Test_4+ and was trained as part of the QuantiSlav project (https://quantislav.badw.de/) by Elena Renje.

Model Curator: Achim Rabus (Slavic Department, University of Freiburg).

Very low error rate: 3.29% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 3.29% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material. This is a larger model trained on diverse material, which generally makes it more robust across different handwriting styles. That said, larger training sets also make it harder to push the CER down further.

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.
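
To check how the model performs on your own material, you can compare its output against a manually corrected transcription of the same lines. The sketch below assumes the common definition of CER: the Levenshtein edit distance (substitutions, insertions, and deletions) divided by the number of characters in the reference text. The function names are illustrative, and the exact normalisation Transkribus applies before counting (whitespace, combining diacritics) is not stated here, so results may differ slightly from the platform's own figure.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of single-character edits turning hypothesis into reference."""
    # previous[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,                      # character missing from hypothesis
                current[j - 1] + 1,                   # extra character in hypothesis
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER in percent, relative to the length of the reference transcription."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return 100.0 * levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical example: the recogniser confuses the final soft and hard signs.
    ground_truth = "господь"
    model_output = "господъ"
    # One substituted character out of seven.
    print(f"{character_error_rate(ground_truth, model_output):.2f}% CER")  # 14.29% CER
```

Python strings are compared code point by code point, so a letter with a combining diacritic or titlo counts as more than one character unless both texts are normalised the same way beforehand.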

Words: 1,484,137
Lines: 309,959
Training Pages: 2,643
Model ID: 272469
Languages: Church Slavic