José Carlos Marques and Catarina Serafim (Arquivo Histórico Parlamentar) · PyLaia · Published April 29, 2025

AHP Handwritten Portuguese 19th-20th Centuries

Text Recognition

Description

This model recognises Portuguese handwriting from the 19th and early 20th centuries. It was trained on documents sent to the Portuguese Parliament by citizens and by public and private organizations. These are original manuscripts, each written by a different hand and distributed evenly across 20-year intervals from 1820 to 1910. The documents are part of the collections of the Arquivo Histórico Parlamentar (AHP), the Historical Parliamentary Archive of the National Parliament of Portugal.

Very low error rate: 2.53% CER

Character Error Rate (CER) measures the proportion of characters that are recognised incorrectly (substituted, omitted, or wrongly inserted) relative to the reference transcription. Lower is better. This model scored 2.53% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material. This is a larger model trained on diverse material, which generally makes it more robust across different handwriting styles. That said, larger training sets also make it harder to push the CER down further.
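CER is conventionally defined from the character-level edit distance, as shown below. This model card does not spell out the formula, so treat this standard formulation as an assumption about what Transkribus reports. S, D and I are the numbers of character substitutions, deletions and insertions needed to turn the recognised text into the reference transcription, and N is the number of characters in the reference. Because insertions also count, CER can in principle exceed 100%.

```latex
\mathrm{CER} = \frac{S + D + I}{N} \times 100\%
\qquad
0.0253 \times 1000 \approx 25 \text{ character errors per } 1000 \text{ characters}
```

In practical terms, a CER of 2.53% corresponds to roughly 25 character errors per 1,000 characters, on material similar to the validation set.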

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.
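One way to check how the model performs on your own material is to transcribe a sample of lines by hand and compare them with the model's output. The sketch below assumes you already have matching (reference, recognised) string pairs, exported by whatever means you prefer. It computes CER from the Levenshtein edit distance. The function names and sample strings are hypothetical and are not part of Transkribus or PyLaia.

```python
from typing import Iterable, Tuple


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum substitutions, deletions and insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (cost 0 if characters match)
            ))
        prev = curr
    return prev[-1]


def corpus_cer(pairs: Iterable[Tuple[str, str]]) -> float:
    """CER over many lines: total edits divided by total reference characters."""
    total_edits = 0
    total_chars = 0
    for ref, hyp in pairs:
        total_edits += edit_distance(ref, hyp)
        total_chars += len(ref)
    if total_chars == 0:
        raise ValueError("reference transcriptions are empty")
    return total_edits / total_chars


# Hypothetical ground-truth / model-output pairs
pairs = [
    ("Senhores Deputados", "Senhores Depntados"),  # one substitution
    ("Lisboa", "Lisboa"),                          # exact match
]
print(f"CER: {corpus_cer(pairs):.2%}")  # CER: 4.17%
```

Summing edits and characters across all lines gives longer lines more weight than averaging per-line CERs would. In this example, the per-line average would be (5.56% + 0%) / 2 ≈ 2.78%. This card does not say which aggregation Transkribus uses for its reported score.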

Words: 186,342
Lines: 27,008
Training Pages: 1,063
Model ID: 330493
Languages: Portuguese
Centuries: 19th c., 20th c.
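The training statistics above can be turned into rough per-page and per-line averages. These averages are useful when comparing this model's training set with your own material. The calculation is simple division over the published totals, and it assumes the word and line counts cover the same 1,063 pages, which the card does not state explicitly.

```python
# Published totals from this model card
words = 186_342
lines = 27_008
pages = 1_063

lines_per_page = lines / pages
words_per_line = words / lines
words_per_page = words / pages

print(f"Lines per page: {lines_per_page:.1f}")  # Lines per page: 25.4
print(f"Words per line: {words_per_line:.1f}")  # Words per line: 6.9
print(f"Words per page: {words_per_page:.1f}")  # Words per page: 175.3
```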