digitisation.lib · PyLaia · Published December 21, 2024

Mons. Paolo Pullicino AI Transcription Model v2.5

Text Recognition

Description

The AI Transcription Model v2.5 was developed by the Digitisation Department at the University of Malta Library. This iteration was revised and trained on 192 handwritten pages, comprising 22,235 words and 3,898 lines of text in multiple languages, including Italian, English, Latin, French, and Spanish. The training documents were written by Monsignor Paolo Pullicino (1815-1890), an influential figure in Malta's educational history who is often regarded as the "Father of Maltese Education." The model was trained for 150 epochs and achieved an overall Character Error Rate (CER) of 7.22%. The best CER recorded during training was 3.99%, and the best Word Error Rate (WER) was 30.63%.

Low error rate: 7.22% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 7.22% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material.

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.
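To make the two metrics concrete, here is a minimal Python sketch of how CER and WER are commonly computed: the Levenshtein edit distance between the reference transcription and the model output, divided by the length of the reference. The function names and the sample strings are illustrative assumptions, and the exact normalisation Transkribus applies (whitespace, case, punctuation) is not stated on this page, so figures from this sketch may not match the platform's to the decimal.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference items
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word edits divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Hypothetical example: one wrong character in a two-word line.
reference = "Scuola primaria"
hypothesis = "Scuola primarie"
print(f"CER: {cer(reference, hypothesis):.2%}")
print(f"WER: {wer(reference, hypothesis):.2%}")
```

In this example a single wrong character out of 15 gives a CER of about 6.67%, but it spoils one of the two words, so the WER is 50%. This is why a model's WER is typically several times higher than its CER.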

Words: 22,235
Lines: 3,898
Training pages: 192
Model ID: 249229
Languages: English, French, Italian, Latin, Castilian
Centuries: 19th c.