mikel.iruskieta · PyLaia · Published October 4, 2024

Contemporary Student Handwritten

Text Recognition

Description

The Contemporary Student Handwritten model is a multilingual AI model in Transkribus for transcribing learners' handwriting in Basque, Spanish, and English. It achieves a Character Error Rate (CER) of 4.46% on the training set and 7.68% on the validation set. The dataset consists of school texts written by adolescent students aged 12–16. Errors in the original writing were preserved and transcribed verbatim rather than corrected. The model was trained on a corpus of 96,931 words across the three languages, collected from various schools in the Basque Autonomous Community in 2023. Further details and an accompanying research paper will be published soon.

The model was trained by Mikel Iruskieta (HiTZ - Ixa, UPV/EHU) and Roberto Arias-Hermoso (Mondragon Unibertsitatea).

Low error rate: 7.68% CER

Character Error Rate (CER) is the number of character substitutions, deletions, and insertions needed to turn the model's output into the correct transcription, expressed as a percentage of the number of characters in that reference transcription. Lower is better. This model scored 7.68% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material. This is a larger model trained on diverse material, which generally makes it more robust across different handwriting styles. The trade-off is that a larger, more varied training set also makes it harder to push the CER down further.
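To make the CER figure concrete, here is a minimal sketch of how a character error rate is commonly computed: a character-level Levenshtein (edit) distance between the reference transcription and the model output, divided by the reference length. This is a generic textbook implementation, not Transkribus' own evaluation code. Whether Transkribus normalises whitespace or punctuation, and whether it pools edits across lines exactly as the hypothetical `corpus_cer` helper below does, are assumptions.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, deletions and insertions
    needed to turn `hyp` into `ref` (standard dynamic programming, one row at a time)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: ref char missing from hyp
                curr[j - 1] + 1,     # insertion: extra char in hyp
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate as a percentage of the reference length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return 100.0 * levenshtein(reference, hypothesis) / len(reference)


def corpus_cer(pairs: list[tuple[str, str]]) -> float:
    """Hypothetical dataset-level CER: total edits divided by total reference
    characters, so longer lines weigh more than in a plain average of line CERs."""
    edits = sum(levenshtein(ref, hyp) for ref, hyp in pairs)
    chars = sum(len(ref) for ref, _ in pairs)
    return 100.0 * edits / chars


if __name__ == "__main__":
    # One substituted character in a 12-character Basque line ("hello world").
    print(f"{cer('kaixo mundua', 'kaixo munduo'):.2f}%")  # 8.33%
```

Pooling edits and characters over the whole set, rather than averaging per-line scores, keeps very short lines from dominating the result. Because insertions count as errors, CER can exceed 100% when the output is much longer than the reference.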

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.

Words: 96,931
Lines: 14,877
Training pages: 862
Model ID: 187725
Languages: English, Basque, Castilian
Centuries: 20th c., 21st c.