mikel.iruskieta · PyLaia · Published September 27, 2024

Contemporary Basque Student Handwritten

Text Recognition

Description

The Contemporary Basque Student Handwritten model is a Transkribus text recognition model for transcribing learners' handwriting in Basque. It achieves a Character Error Rate (CER) of 4.77% on the training set and 6.07% on the validation set. The training data consists of texts written at school by students aged 12–16, collected from various schools in the Basque Autonomous Community in 2023, and totals 51,195 words in Basque. The students' original errors were preserved and transcribed verbatim. Further details and an accompanying research paper will be made available soon. The model was trained by Mikel Iruskieta (HiTZ - Ixa, UPV/EHU) and Roberto Arias-Hermoso (Mondragon Unibertsitatea).

Low error rate: 6.07% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 6.07% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material. This is a larger model trained on diverse material, which generally makes it more robust across different handwriting styles. That said, larger training sets also make it harder to push the CER down further.
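To make the metric concrete, here is a minimal sketch of how a CER is commonly computed: the Levenshtein edit distance between a reference transcription and the model output, divided by the length of the reference. This is an illustration under assumptions, not the platform's own code. The function names `edit_distance` and `character_error_rate` and the sample strings are hypothetical, and how Transkribus normalises whitespace, case, or punctuation before scoring is not stated on this page.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions turning reference into hypothesis."""
    # previous[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # delete a reference character
                current[j - 1] + 1,      # insert a hypothesis character
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER as a percentage of the reference length (assumed definition)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical example: one wrong character in a 12-character line.
    reference = "kaixo mundua"
    hypothesis = "kaixo mundva"
    print(f"{character_error_rate(reference, hypothesis):.2f}% CER")  # 8.33% CER
```

Under this definition, a 6.07% CER corresponds to roughly six character edits per hundred reference characters. Because inserted characters also count as errors, the value can exceed 100% on very poor output.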

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.

Words: 51,195
Lines: 8,893
Training Pages: 525
Model ID: 185185
Languages: Basque