An Gaodhal Irish-English Bilingual Model

Description

This bilingual, multi-script text recognition model was derived from the complete run of the late nineteenth-century newspaper An Gaodhal, a bilingual (Irish and English) publication produced in Brooklyn by the Irish emigrant Micheál Ó Lócháin (Michael J. Logan). It was developed in the course of the project "Building a Digitally Enhanced Edition of the Brooklyn-Published Irish-Language Newspaper An Gaodhal, 1881-1904," a collaboration between New York University and the University of Galway.

The Irish-language text in this newspaper is set almost exclusively in cló Gaelach, a non-Roman script commonly used for Irish at the time. All pages used for the model were corrected by a specialist in the historical forms of the language. The digital images were provided from the digital library holdings of the James Hardiman Library at the University of Galway. The model preserves key features of cló Gaelach. Notably, it outputs Unicode characters that retain the punctum, the dot placed over a consonant in the original text to mark lenition.

The project was funded with support from the Robert D. L. Gardiner Foundation, the Irish Institute of New York, Glucksman Ireland House, New York University, the University of Galway, Foras na Gaeilge, and the Department of Rural and Community Development and the Gaeltacht of the Government of Ireland.
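
If you need transcriptions in the modern spelling, where lenition is written as a following "h" (ḃ becomes bh), you can convert the dotted consonants with Python's standard unicodedata module. The sketch below is illustrative only. It assumes the model emits the punctum either as a precomposed letter (for example ḃ, U+1E03) or as a base letter followed by U+0307 COMBINING DOT ABOVE. The exact output encoding is not stated on this page. The function name expand_lenition is hypothetical and is not part of Transkribus.

```python
import unicodedata

# Assumption: the punctum appears as U+0307 COMBINING DOT ABOVE after NFD decomposition.
COMBINING_DOT_ABOVE = "\u0307"
# Consonants that can be lenited in Irish.
LENITABLE = set("bcdfgmpstBCDFGMPST")


def expand_lenition(text: str) -> str:
    """Hypothetical helper: rewrite dot-above lenition (e.g. 'ḃ') as modern 'bh'."""
    # NFD splits precomposed letters such as 'ḃ' into 'b' + U+0307,
    # so both encodings are handled the same way.
    decomposed = unicodedata.normalize("NFD", text)
    out = []
    i = 0
    while i < len(decomposed):
        ch = decomposed[i]
        nxt = decomposed[i + 1] if i + 1 < len(decomposed) else ""
        if ch in LENITABLE and nxt == COMBINING_DOT_ABOVE:
            out.append(ch + "h")  # drop the dot, append 'h'
            i += 2
        else:
            out.append(ch)
            i += 1
    # Recompose other accents (e.g. 'a' + U+0301 back to 'á').
    return unicodedata.normalize("NFC", "".join(out))


if __name__ == "__main__":
    print(expand_lenition("a ṁáṫair"))
```

Whether to expand the punctum is a project-level choice. Keeping the dotted characters preserves the model's fidelity to cló Gaelach. The h-spelling can make the text easier to search alongside modern Irish.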

Very low error rate: 1.27% CER

Character Error Rate (CER) measures the percentage of characters recognised incorrectly. Lower is better. This model scored 1.27% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material. This model, however, was trained on printed newspaper text rather than handwriting. It is a larger model trained on a large body of material, which generally makes it more robust across variation in the source documents. Larger training sets also make it harder to push the CER down further.

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.
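
You can compare your own transcriptions against this figure with the usual definition of CER. It is the character-level edit distance between the reference and the recognised text, counting substitutions S, deletions D and insertions I, divided by the number of characters N in the reference: CER = (S + D + I) / N. The sketch below NFC-normalises both strings first, so a precomposed dotted consonant and its decomposed form count as the same character. This normalisation step is an assumption about good practice. It does not describe how Transkribus itself prepares text before scoring.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum substitutions, deletions and insertions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate as a fraction; multiply by 100 for a percentage."""
    reference = unicodedata.normalize("NFC", reference)
    hypothesis = unicodedata.normalize("NFC", hypothesis)
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One missing character in a 10-character reference: CER = 1/10.
    print(f"{cer('an gaodhal', 'an gaodal'):.2%}")  # prints 10.00%
```

Because insertions are counted, CER can exceed 100% on very noisy output.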

Words: 1,412,355

Lines: 218,021

Training pages: 2,067

Model ID: 501397

Related models

The Text Titan I (Super Model)

Text Titan II

The Text Titan I ter

easy4you 1