Transkribus Community · PyLaia · Published July 18, 2023

Coloso Español

Text Recognition

Description

Coloso Español is a versatile AI model in Transkribus designed to transcribe a wide array of Spanish texts, from medieval manuscripts to 20th-century documents. Developed in collaboration with the Transkribus community and supported by many researchers, it handles a variety of scripts and spellings across multiple periods. More details and an accompanying research paper will be available soon.

Please avoid using this model as a base model when training custom models, and avoid combining it with the language model or SmartSearch options: because of its size, it will not work properly in those settings.

The model training has been coordinated by Álvaro Cuéllar with the collaboration of Stefano Bazzaco, Alba Comino, Andrés Echavarria Peláez, José Manuel Fradejas Rueda, Francisco Gago Jover, Raquel Liceras-Garrido, Patricia Murrieta-Flores, Humberto Olea Montero, Rocío Ortuño Casanova, Fernando J. Pancorbo, Milena Peralta Friedburg, Eva Sánchez-Salido, Rodrigo Vega Sánchez, Juan Carlos Vallejo Velásquez and Ezequiel Villani.

Very low error rate: 4.8% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 4.8% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material. This is a larger model trained on diverse material, which generally makes it more robust across different handwriting styles. That said, larger training sets also make it harder to push the CER down further.

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.
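To check how the model performs on your own material, you can compute a CER against a hand-corrected transcription. The sketch below uses the common definition of CER: the character-level edit distance (substitutions, insertions and deletions) divided by the length of the reference text. It is an illustration only; the function names `levenshtein` and `cer` are hypothetical, and the exact normalisation Transkribus applies (whitespace, punctuation, abbreviations) is not stated here, so the numbers may not match the platform's figures exactly.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character substitutions, insertions and
    deletions needed to turn `hypothesis` into `reference`."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # character missing from the hypothesis
                curr[j - 1] + 1,     # extra character in the hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate as a fraction of the reference length.
    Assumption: no normalisation is applied before comparison."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    ground_truth = "En un lugar de la Mancha"
    recognised = "En vn lugar de la Mancha"  # one substituted character
    print(f"CER: {cer(ground_truth, recognised):.1%}")
```

Multiply the result by 100 to compare it with the percentage shown above. Because the denominator is the reference length, a transcription with many spurious insertions can produce a CER above 100%.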

Words: 11,442,533
Lines: 2,660,776
Training Pages: 38,795
Model ID: 53551
Languages: Castilian