Portuguese Handwriting 16th-19th c.

Name: Portuguese Handwriting 16th-19th c.
Author: TraPrInq Project

Description

Generic model created within the framework of the TraPrInq Project (01.2022 to 07.2023), funded by the FCT (Portuguese Agency for Scientific Research), by the members of the team: Carla Vieira, Jorge Ferreira Paulo, Hervé Baudry, Leonor Dias Garcia, Ana Margarida Dias da Silva, Maria Olinda Alves Pereira, Mário Soares Fatela, Marize Helena de Campos, Natalia Casagrande Salvador, Susana Tavares Pedro, Suzana Maria de Sousa Santos Severs.

This HTR model is based on the trial records of the Portuguese Inquisition, produced between 1536 (with some documents dating even earlier) and 1821. It contains careful transcriptions of 6,226 pages (validation set: 505 pages; training set: 5,721 pages) extracted from 830 processes, mainly from the Lisbon court, with a total of 1,268,040 words (validation set: 107,760 words; training set: 1,160,280 words). Digitized files can be found on the website of the Portuguese National Archive (Arquivo Nacional da Torre do Tombo).

The model has also proved effective with hybrid texts (fill-in forms) and with documents from non-inquisitorial areas.

Broadly, the transcription reproduces the original spelling of words and abbreviations, uses special characters for baseline abbreviation signs and a single COMBINING MACRON for all superscript abbreviation signs, and modernises word separation. The detailed transcription protocol and character list are available at: https://site-2011948.mozfiles.com/files/2011948/Grelha_Criterios.pdf
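Because every superscript abbreviation sign is transcribed as a single COMBINING MACRON (Unicode U+0304), the model's output can be post-processed with ordinary string operations. The following is a minimal sketch, not part of the project's protocol: the function names and the sample string are invented for illustration, and it assumes the text may arrive in either composed or decomposed Unicode form, since the page does not state which normalisation is used.

```python
import unicodedata

# U+0304 is the Unicode character named COMBINING MACRON.
COMBINING_MACRON = "\u0304"


def count_abbreviation_marks(text: str) -> int:
    """Count combining macrons, decomposing first so that precomposed
    letters such as U+0101 (a with macron) are also counted."""
    decomposed = unicodedata.normalize("NFD", text)
    return decomposed.count(COMBINING_MACRON)


def strip_abbreviation_marks(text: str) -> str:
    """Remove combining macrons, e.g. to build a search-friendly copy.
    Other diacritics are preserved and recomposed with NFC."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = decomposed.replace(COMBINING_MACRON, "")
    return unicodedata.normalize("NFC", stripped)


# Invented sample string, not taken from the training corpus.
sample = "q\u0304 dito"
print(count_abbreviation_marks(sample))
print(strip_abbreviation_marks(sample))
```

Stripping the marks discards the information that a word was abbreviated, so a copy produced this way is best kept alongside the original transcription rather than in place of it.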

Low error rate: 5.2% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 5.2% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material. This is a larger model trained on diverse material, which generally makes it more robust across different handwriting styles. That said, larger training sets also make it harder to push the CER down further.

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.
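To check how the model performs on your own material, CER can be estimated by comparing a recognised line against a manually corrected transcription of it. The sketch below uses the common definition (character-level edit distance divided by the length of the reference); how the platform itself normalises text or treats whitespace before scoring is not stated on this page, so the figures may not match exactly. The function names and sample strings are illustrative.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn hyp into ref."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance relative to the reference length."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(ref, hyp) / len(ref)


# Illustrative strings: one wrong character in a ten-character reference.
reference = "inquisicam"
recognised = "inquisicao"
print(f"{cer(reference, recognised):.1%}")  # prints 10.0%
```

Combining characters such as the macron are separate code points, so in this sketch a missed abbreviation sign counts as one character error.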

Words: 1,159,586

Lines: 153,467

Training Pages: 5,721

Model ID: 53270

Related models

Western Sephardic Diaspora 1.2 (1676-1800)

General Portuguese M1

The Text Titan I (Super Model)

The German Giant I