Sara Mansutti · PyLaia · Published February 28, 2024

Cosimo Bartoli's Italian Humanistic and Cursive Scripts (1562-1572)

Text Recognition

Description

The "Cosimo Bartoli's Italian Humanistic and Cursive Scripts (1562-1572)" model has been trained to recognise the letters and newsletters sent by Cosimo Bartoli (1503-1572), the Medici's agent in Venice between 1562 and 1572. The Training Set consists of 233,000 words and contains two primary writing styles: the humanistic hand of Cosimo Bartoli and the cursive hand of Curzio Bartoli, Cosimo's son. A small number of documents are written by a few other individuals. The documents are heavily abbreviated, and the abbreviated forms were kept in the transcriptions, so the model is trained to transcribe abbreviations exactly as they appear rather than expanding them. During training, the "Italian Administrative Hands, 1550-1700" model was used as the base model, and the advanced "Dewarping Method" was employed with the parameter set to "dewarp". This correction method for non-horizontal lines improves the recognition of curved lines, such as those near the inner margins of a tightly bound volume. The model achieved a Character Error Rate of 3.20%. It was trained in February 2024 by Sara Mansutti as part of her PhD project within the EURONEWS Project at University College Cork.

Try this model

Very low error rate: 3.2% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 3.2% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material. This is a larger model trained on diverse material, which generally makes it more robust across different handwriting styles. That said, larger training sets also make it harder to push the CER down further.

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.
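To make the Character Error Rate described above concrete, here is a minimal sketch in Python. It assumes the common definition of CER as the character-level edit (Levenshtein) distance between a reference transcription and the recognised text, divided by the length of the reference. The function names and the sample strings are hypothetical, and the exact normalisation applied by the platform (whitespace, punctuation, case) is not stated on this page, so results from this sketch may not match the reported figure exactly.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character insertions, deletions and
    substitutions needed to turn `reference` into `hypothesis`."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,                      # reference char missing from hypothesis
                curr[j - 1] + 1,                  # extra char in hypothesis
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER as a fraction: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical line: the recogniser drops one character.
    reference = "vostra signoria"
    hypothesis = "vostra signora"
    print(f"CER: {character_error_rate(reference, hypothesis):.2%}")
```

Under this definition, a CER of 3.2% corresponds to roughly three character-level errors per hundred characters of reference text. To estimate how the model performs on your own material, one option is to transcribe a few pages by hand and compare them with the model output in this way.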

Words: 233,000
Lines: 20,541
Training Pages: 939
Model ID: 60205
Languages: Italian