josemanuel.fradejas · PyLaia · Published May 13, 2025

Spanish Gothic Print v2 (HSMS)

Text Recognition

Description

This model is trained to recognize the Gothic typefaces used in Castilian incunabula and early sixteenth-century printed books. It is based on twenty-three books printed in the workshops of Stanislao Polono & Meinardo Ungut (Seville: RGP, CNB, JOS, CLS, SPO), Fadrique de Basel (Burgos: LES, AYL, C87, AXP), Pablo Hurus (Zaragoza: ERI, APL, SVH, ACM, VTS, LIM), Cuatro Compañeros (Seville: CAR), Juan de Burgos (Burgos: AUG, BMP), Pedro Hagenbach (Toledo: CUR), and Guillén de Brocar (Pamplona: GEN), all printed between 1487 and 1558. For the correspondence between these acronyms (developed by the Hispanic Seminary of Medieval Studies) and the actual works and copies used, see Fradejas Rueda & Cossío Olavide (2025).

This second version, renamed Spanish Gothic Print (HSMS), was trained because it was discovered that some printers used a straight d instead of the usual uncial d, and a peculiar word-final s that was rendered inconsistently when the model was applied. The newly added texts were printed by Alonso Melgar (Burgos: TP24), Jorge Cocci (Zaragoza: SJ14), Salcedo (Alcalá de Henares: P57; only five openings) and Sebastián Martínez (Valladolid: P59; only five openings). All of these were printed between 1514 and 1559 in Gothic type.

The samples from these twenty-three editions consist of ten folios (verso-recto) per copy (except P57 and P59), and were transcribed according to the HSMS transcription system (http://hispanicseminary.org/manual-en.htm). This means that all abbreviations are expanded and enclosed between < > signs, and superscript letters are followed by a grave accent. Contrary to the original HSMS guidelines, however, ç, ñ and ¶ are transcribed as such instead of c', n~ and %.

Developed within the project 7PartidasDigital (PID2020-112621GB-I00; funded by MCIN/AEI/10.13039/501100011033) by José Manuel Fradejas Rueda.
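The transcription conventions described above can be illustrated with a small Python sketch for post-processing text. It is not part of the model or of Transkribus: the function names and sample words are hypothetical, and it assumes only what the description states, namely that expansions sit between < > signs and that ç, ñ and ¶ replace the legacy codes c', n~ and %. The grave-accent marking of superscript letters is left untouched.

```python
import re

# Legacy HSMS codes and the characters this model's transcriptions use instead,
# as listed in the description (lowercase forms only; an assumption of this sketch).
LEGACY_TO_MODEL = {"c'": "ç", "n~": "ñ", "%": "¶"}


def legacy_to_model(text: str) -> str:
    """Rewrite legacy HSMS codes (c', n~, %) as ç, ñ and ¶."""
    for legacy, modern in LEGACY_TO_MODEL.items():
        text = text.replace(legacy, modern)
    return text


def list_expansions(text: str) -> list[str]:
    """Return the letters supplied by the transcriber inside < > signs."""
    return re.findall(r"<([^<>]*)>", text)


def strip_expansion_marks(text: str) -> str:
    """Drop the < > signs but keep the expanded letters, for a reading text."""
    return re.sub(r"<([^<>]*)>", r"\1", text)


if __name__ == "__main__":
    line = "q<ue> el sen~or"  # hypothetical sample line
    print(legacy_to_model(line))
    print(list_expansions(line))
    print(strip_expansion_marks(legacy_to_model(line)))
```

A plain string replacement like this will also rewrite any genuine apostrophe after a c, so older transcriptions should be checked before converting them in bulk.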

Very low error rate: 0.73% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 0.73% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material.

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.
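
To check the CER figure against your own material, you can compare a model transcription with a manually corrected one. The sketch below uses the common definition of CER (character-level edit distance divided by the length of the reference); it is an assumption that this matches how the platform computes its score, and the function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate of hypothesis against reference, as a fraction."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One wrong character out of four gives a CER of 25%.
    print(f"{cer('casa', 'cara'):.2%}")
```

At the reported 0.73%, this corresponds to roughly 7 misrecognised characters in every 1,000.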

Words: 217,088
Lines: 27,372
Training Pages: 207
Model ID: 338253
Languages: Castilian