hyj384412940 · PyLaia · Published April 9, 2025

Prensa Cataluña XIX v1.1 b

Text Recognition

Description

This model transcribes 19th-century printed newspapers from Catalonia written in Spanish (Castilian), using titles sourced from the Arxiu de Revistes Catalanes Antigues (ARCA: https://arca.bnc.cat/arcabib_pro/es/inicio/inicio.do). The training data comprises 75,471 words and 8,830 lines across 138 training pages, plus 15 validation pages, and the model achieves a CER of 1.07% on the validation set. It was developed at the University of Barcelona within the Grup de Gramàtica i Diacronia (GRADIA) [2017SGR1337] and the MINECO project "Marginalia en el centro de la investigación diacrónica. Verbos en serie y perífrasis en cadena" (PID2022-138259NB-I00). Prensa Cataluña XIX v1.1 is under active development. Contact: Yujian Han (yujianhan@ub.edu).

Very low error rate: 1.07% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 1.07% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material.

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.
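To make the metric concrete, here is a minimal sketch of how a CER can be computed. It assumes the common definition: the character-level edit (Levenshtein) distance between a reference transcription and the model output, divided by the reference length. The function names `levenshtein` and `cer` are illustrative, and the platform's own calculation may differ in details such as whitespace or Unicode normalisation.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    needed to turn `hypothesis` into `reference`."""
    # prev[j] holds the distance between the processed prefix of
    # `reference` and the first j characters of `hypothesis`.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (0 if ref_char == hyp_char else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate as a percentage of the reference length
    (assumed definition; not taken from the platform's source code)."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return 100.0 * levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical example: one missing accent in a nine-character word.
    print(f"{cer('periódico', 'periodico'):.2f}% CER")
```

Under this definition, a 1.07% CER corresponds to roughly one character error per 93 characters of reference text.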

Words: 75,471
Lines: 8,830
Training Pages: 138
Model ID: 320373
Languages: Castilian
Centuries: 19th c.