Polina Staroverova, Alena Kuznetsova, Natalia Perkova, Dmitri Sichinava · PyLaia · Published November 27, 2022

Russian print XVIII cent PyLaia

Text Recognition

Description

This model was trained as a student project in the “Digital Humanities” master’s program between November 2021 and January 2022. The training corpus consists of books published after Peter I's reform of Russian orthography by the following printing houses: that of the Academy of Sciences in St. Petersburg, that of the Imperial Moscow University, that of Vilkovsky and Galchenko, and that of the Land Cadet Corps. It also includes some decrees printed in civil script. The training sources are books scanned by Rusneb (https://rusneb.ru/) and by Google Books. The model performs well on Russian-language material, but it does not recognize other languages that can occur in texts of this period.

Very low error rate: 2.4% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 2.4% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material.

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.
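
To make the metric concrete, here is a minimal sketch of how a CER is commonly computed: the character-level edit distance between a reference transcription and the recognised text, divided by the length of the reference. This is the textbook definition, not necessarily the exact procedure Transkribus uses (any normalisation or whitespace handling it applies is not stated here). The function names `levenshtein` and `cer` are illustrative.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and
    substitutions needed to turn ref into hyp."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j characters of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # substitute r with h (or match)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate as a fraction of the reference length.
    Assumes a non-empty reference; no text normalisation is applied."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One wrong character in a four-character reference.
    print(f"{cer('abcd', 'abed'):.2%}")
```

Read this way, a CER of 2.4% corresponds to roughly 24 character errors per 1,000 characters of reference text. Because insertions count as errors, a CER computed like this can exceed 100% when the recognised text is much longer than the reference.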

Words: 23,291
Lines: 4,653
Training Pages: 185
Model ID: 48282
Languages: Russian