Project Beta maṣāḥǝft: Manuscripts of Ethiopia and Eritrea · PyLaia · Published November 30, 2022

Manuscripts of Ethiopia and Eritrea

Text Recognition

Description

Model for the transcription of manuscripts of Ethiopia and Eritrea in Classical Ethiopic (Gǝʿǝz). It was trained as part of the Beta maṣāḥǝft project to feed a workflow that imports transcriptions into the project's database. Transcriptions for the training were kindly provided by:

- Alessandro Bausi for ESum039, ff. 16vb-29va;
- Antonella Brita for DAS002, ff. 101va-110ra;
- Dorothea Reule for ESqdq004, ff. 97ra-101vb, 104ra-109rb;
- Nafisa Valieva for BLorient718, ff. 1ra-7vb (images: British Library);
- Jeremy Brown for several parts of manuscripts pertaining to the Miracle of the Cannibal of Qemer.

Images and transcriptions were imported into Transkribus by Pietro Liuzzo.

The project Beta maṣāḥǝft: Manuscripts of Ethiopia and Eritrea (Schriftkultur des christlichen Äthiopiens und Eritreas: eine multimediale Forschungsumgebung) is a long-term project funded within the framework of the Academies' Programme (coordinated by the Union of the German Academies of Sciences and Humanities) under the supervision of the Akademie der Wissenschaften in Hamburg. Funding is provided for 25 years, from 2016 to 2040. The project is hosted by the Hiob Ludolf Centre for Ethiopian Studies at Universität Hamburg. It aims to create a virtual research environment for managing complex data related to the predominantly Christian manuscript tradition of the Ethiopian and Eritrean Highlands.

Very low error rate: 3.8% CER

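Character Error Rate (CER) counts the character substitutions, deletions and insertions needed to turn the recognised text into the reference transcription, expressed as a percentage of the number of characters in the reference. Lower is better. This model scored 3.8% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material.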

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.
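To check how closely your own material matches the reported figure, you can compare the model's output for a few lines against a manual transcription. The sketch below uses the common definition of CER (Levenshtein edit distance divided by reference length). It is an assumption that Transkribus computes the score the same way; its exact handling of whitespace and Unicode normalisation is not stated here. Characters are counted as Unicode code points, which works for Ethiopic because each fidel syllable (e.g. ሔ) is a single precomposed code point.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character substitutions, deletions and
    insertions needed to turn ref into hyp (dynamic-programming edit distance)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into "" takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate as a percentage of the reference length."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return 100.0 * levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one misread syllable (ሄ for ሔ) in a 7-syllable word.
    reference = "እግዚአብሔር"
    recognised = "እግዚአብሄር"
    print(round(cer(reference, recognised), 2))  # 14.29
```

Because the score is normalised by reference length, a single error in a short word weighs heavily; at the model's 3.8%, a 1,000-character passage would need about 38 character corrections. Insertions can push the CER above 100%, so it is not strictly a share of characters.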

Words: 53,830
Lines: 21,173
Training pages: 282
Model ID: 48371