Silvia Carboni for Mediobanca SpA · PyLaia · Published October 10, 2025

20th Century Typewritten Italian

Text Recognition

Description

This model was created by the Historical Archive of Mediobanca, an Italian investment bank, to transcribe a variety of typewritten documents in Italian and English, such as correspondence, articles, meeting minutes, and memos. The training set consists of 28,137 words. The model is trained to expand abbreviations specific to banking activities as well as personal honorifics. Because Mediobanca's international activity left many documents in English, "Transkribus Print M1" was used as the base model. Although "20th Century Typewritten Italian" is geared specifically towards the Italian language, it was also tested on documents in English, where it recognized words and short sentences with satisfactory results. The model achieved a Character Error Rate of 0.82% on its validation set. It was created alongside the "20th Century Typewritten Letters - Diplomatics' Elements" Field Model as part of a larger project, and was trained by Silvia Carboni for Mediobanca's Historical Archive.

Very low error rate: 0.82% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 0.82% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material.
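To make the metric concrete, here is a minimal sketch of how a CER can be computed. It assumes the common definition (character-level edit distance between the reference transcription and the model output, divided by the reference length); the exact normalisation Transkribus applies is not stated on this page and may differ. The function names `levenshtein` and `cer` are illustrative, not part of any Transkribus API.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn ref into hyp."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j characters of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # substitute r with h (or match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate as a percentage of the reference length
    (assumed definition, see the note above)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return 100.0 * levenshtein(reference, hypothesis) / len(reference)
```

For example, `cer("banca", "banco")` returns `20.0`: one wrong character out of five. At 0.82%, the model makes on average fewer than one character error per hundred characters of its validation text.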

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.

Words: 28,137
Lines: 4,692
Training Pages: 187
Model ID: 413877
Languages: English, Italian
Centuries: 20th c.