University of Zurich · PyLaia · Published December 27, 2022

NZZ Gold Standard M1+

Text Recognition

Description

The model is based on 167 title pages from the Neue Zürcher Zeitung (NZZ) covering the years 1780 to 1940. Every 10th page is held out as the validation set. The model is provided by the Computational Linguistics Group (Simon Clematide, Philip Ströbel) of the University of Zurich within the framework of the Impresso project (https://impresso-project.ch/). The data can be downloaded from Zenodo: https://zenodo.org/record/3333627#.XcvqCNVKjDd
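
The split described above can be sketched as follows. This is only an illustration: the page names and the `split_pages` helper are hypothetical, and the choice to start counting at the first page (indices 0, 10, 20, ...) is an assumption, picked because it yields 150 training pages, which matches the Training Pages figure listed for this model.

```python
def split_pages(pages, every=10):
    """Hold out every `every`-th page (starting with the first) for validation."""
    validation = pages[::every]
    training = [page for index, page in enumerate(pages) if index % every != 0]
    return training, validation


# Hypothetical page identifiers standing in for the 167 NZZ title pages.
pages = [f"page_{number:03d}" for number in range(1, 168)]
training, validation = split_pages(pages)
print(len(training), len(validation))  # 150 17
```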

Very low error rate: 0.5% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 0.5% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material.
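
For readers who want to check CER on their own transcriptions, here is a minimal sketch. It assumes the common definition of CER as the Levenshtein (edit) distance between the recognised text and the reference, divided by the length of the reference. Whether Transkribus computes its reported figure in exactly this way is an assumption, and the function names are illustrative only.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character insertions, deletions and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate in percent, relative to the reference length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return 100.0 * levenshtein(reference, hypothesis) / len(reference)


# One wrong character in a six-character word gives a CER of about 16.7%.
print(round(cer("Zürich", "Zurich"), 1))  # 16.7
```

At a CER of 0.5%, roughly one character in every 200 would be wrong under this definition.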

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.

Words: 273,440
Lines: 38,757
Training Pages: 150
Model ID: 49007
Centuries: 18th c., 19th c., 20th c.