Elpida Perdiki · PyLaia · Published September 24, 2022

Chrysostomicus I

Text Recognition

Description

This is a master model trained on transcriptions of 8 Byzantine manuscripts (namely: NLG Athens 263, 10th c.; ONB Vindob. theol. gr. 14, 15th c.; BNF Par. gr. 745, 12th c.; Athos Vatopedi 328, 14th c.; Patriarchal Library Alexandria 34, 10th c. (?); BSB Monac. gr. 377, 10th-11th c.; BSB Monac. gr. 353, 10th c.; Athos Dionysiou 70, 10th c.). The manuscripts date from the 10th to the 15th c. CE and are written in Greek (ancient to 1453). Most of them contain two of John Chrysostom's Homilies on the Epistle of St. Paul to Titus (Homilies 1 and 5). Two of the manuscripts contain only the 5th Homily (namely: BNF Par. gr. 745, 12th c.; ONB Vindob. theol. gr. 14, 15th c.).

Despite their differences, most of the manuscripts are uniform in style. They have few ligatures and abbreviations, which occur mostly in ending syllables and nomina sacra. Abbreviations were expanded in the metadata, whereas ligatures were normalised in the transcription. Continuing previous experiments, a base model was used to improve results. The base model was the one with the lowest CER in those experiments, trained on one of the dataset's 8 manuscripts (CER on Validation Set: 13.70%).

All manuscripts were transcribed, and the model was trained, by PhD student Elpida Perdiki as part of her dissertation under the supervision of Assistant Professor Maria Konstantinidou at the Department of Greek Philology, Democritus University of Thrace. The model is designed to support scholars in the automatic transcription and paleographic analysis of Byzantine Greek texts, with potential applications in historical research, philology, and digital humanities.

Very low error rate: 3.9% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 3.9% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material.

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.
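To check how the model performs on your own material, you can compare its output against a manual transcription of the same lines. The following is a minimal sketch, assuming the common definition of CER as the character-level edit distance (insertions, deletions, substitutions) divided by the length of the reference transcription. The function names `levenshtein` and `cer` are illustrative, the Unicode NFC normalisation step is an assumption that is often useful for polytonic Greek, and the exact procedure Transkribus uses to compute its reported figure may differ.

```python
import unicodedata


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character edits turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i characters of ref
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate as a fraction of the reference length."""
    # Assumption: normalise to NFC so that precomposed and decomposed
    # accented Greek characters compare as equal.
    reference = unicodedata.normalize("NFC", reference)
    hypothesis = unicodedata.normalize("NFC", hypothesis)
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical example line: manual transcription vs. model output.
    manual = "χάρις ὑμῖν καὶ εἰρήνη"
    recognised = "χάρις ὑμιν καὶ εἰρήνη"
    print(f"CER: {cer(manual, recognised):.1%}")
```

Because the denominator is the reference length, a CER computed this way can exceed 100% when the recognised text is much longer than the manual transcription.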

Words: 25,621
Lines: 5,916
Training Pages: 86
Model ID: 44872
Languages: Greek Ancient (to 1453)
Centuries: 10th c., 11th c., 12th c., 13th c., 14th c., 15th c.