aleksej.tikhonov · PyLaia · Published August 1, 2024

Ukrainian generic handwritten and typed

Text Recognition

Description

An extension of the Ukrainian generic handwriting 1 model. Curated and trained by Aleksej Tikhonov (MultiHTR project, University of Freiburg) with data from the Prozhito Project (with the participation of Misha Melnichenko) and the Foundation of the International Memorial Association (with the participation of Aren Vanyan and Nikita Lomakin). The model can transcribe Ukrainian manuscripts and typewritten texts from the 19th and 20th centuries. The project was funded by the Ministry of Science, Research and the Arts of Baden-Württemberg with funds from the state digitization strategy digital@bw.

Very low error rate: 4.57% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 4.57% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material. This is a larger model trained on diverse material, which generally makes it more robust across different handwriting styles. That said, larger training sets also make it harder to push the CER down further.
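To make the metric concrete, here is a minimal sketch of how a CER is commonly computed: the edit distance (substitutions, insertions and deletions) between the model output and a reference transcription, divided by the number of characters in the reference. The function names `levenshtein` and `cer` are illustrative, and the sample strings are made up. How Transkribus normalises text before scoring is not stated here, so figures from this sketch may not match the platform's exactly.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character edits turning ref into hyp."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # substitute r with h (or match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate as a fraction of the reference length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical example: one wrong character in a six-character word.
reference = "привіт"
hypothesis = "привет"
print(f"CER: {cer(reference, hypothesis):.2%}")
```

Running the same function over a handful of pages you have transcribed by hand gives a rough estimate of how the model performs on your own material.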

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.

Words: 152,416
Lines: 24,736
Training Pages: 773
Model ID: 144265
Languages: Ukrainian