Vladimir Polomac · PyLaia · Published November 26, 2022

Dionisio 2.0.

Text Recognition

Description

Prof. Vladimir Polomac (University of Kragujevac, Serbia) created Dionisio 2.0., a generic model for the automatic recognition of Serbian Church Slavonic printed books of the 15th-17th centuries. The model was trained on material from various Serbian printing houses of that period (Cetinje, Venice, Goražde, Mileševa, Belgrade and Mrkša's Church). The percentage of incorrectly recognized characters on the validation set is 2.4%. The creation of the model with the CITlab HTR+ engine, together with its quantitative and qualitative performance, is described in detail in a separate paper. As models based on the CITlab HTR+ technology are no longer available in Transkribus, the Dionisio 2.0. model has been retrained with PyLaia technology, with almost identical performance.

Very low error rate: 2.4% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 2.4% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material. This is a larger model trained on diverse material, which generally makes it more robust across different handwriting styles. That said, larger training sets also make it harder to push the CER down further.

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.
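
To make the CER definition above concrete, here is a minimal sketch of how a character error rate is commonly computed: the edit (Levenshtein) distance between the reference transcription and the recognised text, divided by the length of the reference. The function names `levenshtein` and `cer` are illustrative assumptions, not part of Transkribus or PyLaia, and the actual evaluation may differ in details such as whitespace or Unicode normalisation.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions
    and substitutions needed to turn ref into hyp."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j characters of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # substitute r with h (or match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate as a fraction of the reference length.
    Hypothetical helper for illustration only."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One substituted character in a four-character reference.
    print(f"{cer('abcd', 'abxd'):.2%}")
```

Read this way, a CER of 2.4% corresponds to roughly 24 character errors per 1,000 characters of reference text on the validation set.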

Words: 176,481
Lines: 24,143
Training Pages: 1,050
Model ID: 48253
Languages: Church Slavic