Digital Humanities Centre, National Széchényi Library, Petőfi Literary Museum · PyLaia · Published November 11, 2022

Hungarian handwriting 19th–20th Century

Text Recognition

Description

The Digital Humanities Centre of the National Széchényi Library created the model. It is based on the correspondence between József Kiss (1843–1921), a Hungarian poet and editor, and his various correspondents. The training data contains professional and personal letters written throughout his life. The manuscripts used in the current model are held in the Petőfi Literary Museum, Budapest. More manuscripts from the same correspondence are held at the National Széchényi Library and are currently being processed. We expect to enlarge the model once their processing is complete.

Moderate error rate: 10.7% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 10.7% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material. This is a larger model trained on diverse material, which generally makes it more robust across different handwriting styles. That said, larger training sets also make it harder to push the CER down further.
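To make the metric concrete, here is a minimal sketch of one common way to compute CER: the character-level edit (Levenshtein) distance between the reference transcription and the recognised text, divided by the length of the reference. This is an illustration under that assumption, not necessarily the exact procedure Transkribus uses, and the function names and sample strings are hypothetical.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character insertions, deletions and
    substitutions needed to turn `hypothesis` into `reference`."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # reference character missing from hypothesis
                curr[j - 1] + 1,     # extra character in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate as a fraction of the reference length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical example: the recogniser drops one accent.
    truth = "kedves barátom"
    recognised = "kedves baratom"
    print(f"CER: {cer(truth, recognised):.1%}")
```

Because the denominator is the reference length, a CER of 10.7% corresponds to roughly one character edit for every nine to ten characters of ground-truth text. For accented text such as Hungarian, both strings should use the same Unicode normalisation form before comparison, otherwise a composed and a decomposed "á" would count as errors.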

The CER was measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.

Words: 78,893
Lines: 17,456
Training Pages: 1,265
Model ID: 46058
Languages: Hungarian