Early Portuguese Printing (16th-19th)

Description

This model was trained on a selection of Portuguese grammars and linguistic publications spanning the 16th to the 18th centuries. These documents, along with many others, are publicly accessible through the Portuguese National Digital Library (bndigital.bnportugal.gov.pt). The training set for this version comprises 142,606 words (745 pages) printed in Portuguese since 1536. The texts include unique letters, diacritics, historical acronyms, typography, and fleurons characteristic of the historical Portuguese writing system as it was adapted to the new press technology, all of which this model has been trained to recognize. Because the training set has a linguistic focus, both grammatical and historical, the model can also recognize certain Greek letters, Latin text, table patterns, and simple initial capitals. However, training on these was limited, so the model is not recommended for those uses.

This model was developed as part of a master's degree project in the postgraduate linguistics program at the Universidade Federal de Santa Catarina (UFSC). The author (saulo.r@posgrad.ufsc.br) was financially supported by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES).

Very low error rate: 2.58% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 2.58% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material. This is a larger model trained on diverse material, which generally makes it more robust across different handwriting styles. That said, larger training sets also make it harder to push the CER down further.

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.
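To check how the model performs on your own pages, you can compute a CER yourself from a hand-corrected reference line and the model's output. The sketch below is a minimal illustration, not the platform's own implementation: it assumes the common definition of CER as character-level edit distance divided by the length of the reference, and it assumes both strings should be Unicode-normalized (NFC) first so that composed and decomposed diacritics compare as equal. The function names `levenshtein` and `cer` are hypothetical, chosen for this example.

```python
import unicodedata


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into "" takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # substitute r with h, or match
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate in percent, relative to the reference length.

    Assumption: both strings are NFC-normalized before comparison, so a
    precomposed letter and its base-plus-combining-mark form count as equal.
    """
    ref = unicodedata.normalize("NFC", ref)
    hyp = unicodedata.normalize("NFC", hyp)
    if not ref:
        raise ValueError("reference text must not be empty")
    return 100.0 * levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One wrong character out of four gives 25%.
    print(cer("abcd", "abxd"))
```

Under this definition, a score of 2.58% corresponds to roughly one character error in every 39 characters of reference text. Results computed this way on your own material may differ from the published figure, which was measured on the model's validation set.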

Words: 142,606

Lines: 23,045

Training Pages: 745

Model ID: 267229

Related models

Transkribus Print M1

Early Portuguese Printing

XXth century Typewritten Portuguese

SPJCL 17C 4.2