Hervé Baudry, Kristian Hamon · PyLaia · Published August 4, 2024

Mouladurioù 17vet-19vet kantved / Breton Prints 17th-19th centuries

Text Recognition

Description

The first generic model for transcribing documents printed in Breton. The ground truth comprises 6,329 lines (40,364 words). The extracts (187 pages, 17 of which form the validation set) were selected from 52 items printed between 1608 and 1921. Some of them are bilingual (Breton-French). Accents are reproduced as they appear in the texts, including the tilde, whose use was limited over the period. Mouladurioù 17vet-19vet kantved / Breton Prints 17th-19th centuries was trained using the base model Transkribus Print M1.

Very low error rate: 1.36% CER

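Character Error Rate (CER) measures the percentage of characters that are incorrectly recognised: the number of character substitutions, deletions and insertions needed to turn the model's output into the reference transcription, divided by the number of characters in the reference. Lower is better. This model scored 1.36% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material.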

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.
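To make the CER definition concrete, here is a minimal sketch of the common edit-distance formulation, assuming CER is computed over a whole set as total edits divided by total reference characters. The function names `levenshtein` and `cer` are illustrative only; this is not Transkribus's or PyLaia's own code, and their exact normalisation (e.g. whitespace or Unicode handling) may differ.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from the reference
                            curr[j - 1] + 1,      # insert h into the output
                            prev[j - 1] + cost))  # substitute (or match if cost == 0)
        prev = curr
    return prev[-1]


def cer(refs: list[str], hyps: list[str]) -> float:
    """Set-level CER in percent: total edits divided by total reference characters."""
    if len(refs) != len(hyps):
        raise ValueError("need one hypothesis per reference line")
    edits = sum(levenshtein(r, h) for r, h in zip(refs, hyps))
    chars = sum(len(r) for r in refs)
    return 100.0 * edits / chars


# One wrong character out of four reference characters gives a CER of 25%.
print(cer(["abcd"], ["abxd"]))
```

Because the errors are summed before dividing, long lines weigh more than short ones; averaging per-line CERs instead would give a different figure.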

Words: 36,725
Lines: 5,773
Training Pages: 170
Model ID: 145625
Languages: Breton
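The totals in the description (187 pages, 6,329 lines, 40,364 words) are larger than the figures listed above. A plausible reading, which is an assumption and not stated on this page, is that the Words and Lines figures, like Training Pages, count only the training set. The page count supports this (187 − 17 = 170). Under that assumption, the size of the validation set follows by subtraction:

```python
# Totals from the description (training + validation) and the figures listed for the model.
# ASSUMPTION: the listed Words/Lines figures, like Training Pages, cover the training set only.
total = {"pages": 187, "lines": 6329, "words": 40364}
listed = {"pages": 170, "lines": 5773, "words": 36725}

validation = {k: total[k] - listed[k] for k in total}
share = {k: validation[k] / total[k] for k in total}

for k in total:
    print(f"{k}: {validation[k]} in validation ({share[k]:.1%} of the ground truth)")
```

On this reading, roughly 9% of the ground truth (17 pages, about 556 lines and 3,639 words) was held out for validation.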