Description
This model was created by the Historical Archive of Mediobanca, an Italian investment bank, to identify and tag the elements of 20th-century typewritten correspondence in accordance with contemporary diplomatics. The training set consists of 251 pages. It contains outgoing letters written by the bank and a wide variety of letters received from senders belonging to large organizations (banks and corporations, both Italian and international), so that the model is trained on a broad range of formats and structures. For the same reason, the training set also includes outgoing and incoming telegrams.
The model was trained to recognize the following diplomatic elements and their corresponding tags: 1) "INTESTAZIONE_letterhead"; 2) "DATA_date"; 3) "RICEZIONE_date-received", usually a stamp; 4) "MITTENTE_sender"; 5) "DESTINATARIO_recipient"; 6) "OGGETTO_subject"; 7) "CORPO_textbody"; 8) "FIRMA_signature"; 9) "NOTE_notes", a tag used for typed notes; 10) "NOTE-MS_handwritten-notes"; 11) "RESPONSABILI_written-by", the initials of those responsible for writing the letter; 12) "VISTO_read-by", the initials indicating who read the letter; 13) "ALLEGATO_attachment", as body text. Each tag name is written in both Italian and English, following the structure "ITALIAN_english". The model achieves a mean average precision (mAP) of 59.43%. It was created alongside the "20th Century Typewritten Italian" Text Recognition Model as part of a larger project, and was trained by Silvia Carboni for Mediobanca's Historical Archive.
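The "ITALIAN_english" naming structure makes it straightforward to separate the two halves of a tag when post-processing the model's output. The following is a minimal Python sketch, not part of the model or of any official tooling: the helper name `split_tag` and the `TAG_TABLE` dictionary are hypothetical, and the only thing taken from the description is the list of tag names itself.

```python
# Tag names exactly as listed in the description.
TAGS = [
    "INTESTAZIONE_letterhead",
    "DATA_date",
    "RICEZIONE_date-received",
    "MITTENTE_sender",
    "DESTINATARIO_recipient",
    "OGGETTO_subject",
    "CORPO_textbody",
    "FIRMA_signature",
    "NOTE_notes",
    "NOTE-MS_handwritten-notes",
    "RESPONSABILI_written-by",
    "VISTO_read-by",
    "ALLEGATO_attachment",
]


def split_tag(tag: str) -> tuple[str, str]:
    """Split an "ITALIAN_english" tag name into (italian, english).

    Hypothetical helper: it assumes the underscore is the only separator
    between the two halves, which holds for all 13 tags listed above
    (hyphens occur only inside a half, e.g. "NOTE-MS").
    """
    italian, english = tag.split("_", 1)
    return italian, english


# Lookup table from the full tag name to its two halves.
TAG_TABLE = {tag: split_tag(tag) for tag in TAGS}

if __name__ == "__main__":
    for tag, (italian, english) in TAG_TABLE.items():
        print(f"{tag}: italian={italian}, english={english}")
```

Splitting on the first underscore only keeps hyphenated halves such as "date-received" and "NOTE-MS" intact.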