A. Arkhipov, E. Lazarenko, S. Saldyr / INEL project · PyLaia · Published September 16, 2025

Lehtisalo-2.0

Text Recognition

Description

This model is trained on a printed edition of Nenets folklore: Lehtisalo (1947), "Juraksamojedische Volksdichtung" (pp. 1-169). The text is bilingual. It pairs the original Nenets text, transcribed in the Uralic Phonetic Alphabet (UPA), also known as the Finno-Ugric Transcription (FUT), with a German translation. The version of FUT used in this edition is particularly complex and includes more than 100 non-ASCII characters and diacritics.

Base Model: Transkribus Print M1

CER: 0.57% (very low error rate)

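Character Error Rate (CER) measures the percentage of characters recognised incorrectly: the number of character insertions, deletions and substitutions needed to turn the model's output into the reference transcription, divided by the number of characters in the reference. Lower is better. This model scored 0.57% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material; this model, however, was trained on printed text.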
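A minimal sketch of how a CER figure can be computed, for readers who want to check results on their own pages. It assumes plain Levenshtein edit distance over Unicode code points, NFC normalization (so that a precomposed letter and the same letter written as base character plus combining diacritic, both common in FUT text, count as one character), and corpus-level aggregation (total edits over total reference characters). Transkribus's exact evaluation procedure is not described on this page, so the function names and these choices are illustrative assumptions, not the platform's implementation.

```python
import unicodedata


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions that turn ref into hyp (row-by-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference character
                curr[j - 1] + 1,         # insertion of a hypothesis character
                prev[j - 1] + (r != h),  # substitution, or 0 cost on a match
            ))
        prev = curr
    return prev[-1]


def _norm(text: str) -> str:
    # Assumption: NFC normalization, so composed and decomposed diacritics compare equal.
    return unicodedata.normalize("NFC", text)


def cer(reference: str, hypothesis: str) -> float:
    """CER in percent for one line: edit distance / reference length * 100."""
    ref, hyp = _norm(reference), _norm(hypothesis)
    if not ref:
        raise ValueError("reference must be non-empty")
    return 100.0 * levenshtein(ref, hyp) / len(ref)


def corpus_cer(pairs: list[tuple[str, str]]) -> float:
    """CER in percent over many (reference, hypothesis) lines.
    Assumption: total edits divided by total reference characters,
    rather than an average of per-line CER values."""
    edits = sum(levenshtein(_norm(r), _norm(h)) for r, h in pairs)
    chars = sum(len(_norm(r)) for r, _ in pairs)
    if chars == 0:
        raise ValueError("references must contain at least one character")
    return 100.0 * edits / chars
```

For example, `cer("kitten", "sitting")` returns `50.0` (3 edits over 6 reference characters), and `cer("\u00e4", "a\u0308")` returns `0.0` because both strings normalize to the same precomposed "ä". Without the normalization step, that second pair would be scored as an error even though it renders identically.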

Measured on the model's own validation data. Results on your documents may differ depending on script or typeface, document condition, language, and how closely your material resembles the training data.

Words: 45,539
Lines: 5,644
Training Pages: 81
Model ID: 401517
Languages: German