Newseye-project · PyLaia · Published October 22, 2021

BnF_Newseye_M2+

Text Recognition

Description

The model works well on French documents from the late 18th century to the mid-20th century. On normal running text in French newspapers from that period, measured error rates were well below 1%. The model was created in the NewsEye project and is based on training data from Gallica, the digital library of the French National Library (BnF). Note: the model is trained on French-language documents (French dailies, 1850-1945) and will therefore perform less well on other languages.

Very low error rate: 4.2% CER

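Character Error Rate (CER) measures the percentage of characters recognised incorrectly: the number of character insertions, deletions and substitutions needed to turn the recognised text into the correct transcription, divided by the length of the correct transcription. Lower is better. This model scored 4.2% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material.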
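To make the metric concrete, here is a minimal sketch of how a CER figure can be computed from a reference transcription and a recognised line, using the standard Levenshtein (edit) distance. It assumes plain character-by-character comparison with no text normalisation; the exact normalisation Transkribus applies (whitespace, punctuation, Unicode forms) is not stated here, and the function names `levenshtein` and `cer` are illustrative only.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    needed to turn hyp into ref (classic dynamic-programming edit distance)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # ref character missing from hyp
                curr[j - 1] + 1,         # extra character in hyp
                prev[j - 1] + (r != h),  # substitution (cost 0 if characters match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate in percent: edit distance / reference length.
    Assumes no normalisation of either string."""
    if not reference:
        raise ValueError("reference transcription must be non-empty")
    return 100.0 * levenshtein(reference, hypothesis) / len(reference)


# One substitution (i -> l) and one deletion (final l) over 16 reference characters.
print(cer("Le Petit Journal", "Le Petlt Journa"))  # 12.5
```

Because insertions also count as errors, CER is not capped at 100% when the recognised text is much longer than the reference.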

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.

Words: 434,100
Lines: 68,481
Training pages: 127
Model ID: 37747
Languages: French
Centuries: 18th c., 19th c., 20th c.