Tobias Hodel · PyLaia · Published December 21, 2022

StAZH_RRB_German_Kurrent_XIX

Text Recognition

Description

Model trained on the complete set of handwritten Zurich executive council minutes (1803-1887), based on automated text-to-image processing. The training set is licensed under CC-BY-SA and can be re-used. The minutes can be accessed here: https://www.archives-quickaccess.ch/search/stazh/rrb The TEI-XML is available on ZENODO: https://doi.org/10.5281/zenodo.803239

Very low error rate: 1.2% CER

Character Error Rate (CER) is the number of character-level edits (substitutions, insertions and deletions) needed to turn the recognised text into the reference transcription, divided by the number of characters in the reference. Lower is better. This model scored 1.2% on its validation set, i.e. roughly 12 character errors per 1,000 characters. As a rule of thumb, a CER below 10% is considered good for most handwritten material. Because this is a large model trained on diverse material, it is generally more robust across different handwriting styles. The flip side is that a larger, more varied training set also makes it harder to push the CER down further.
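To make the definition concrete, here is a minimal sketch of CER as Levenshtein edit distance divided by reference length. It is an illustration only: the function names `levenshtein` and `cer` are hypothetical, and Transkribus may apply its own text normalisation (for example, whitespace or punctuation handling) before scoring, so its reported figures need not match this code exactly.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character substitutions, insertions and deletions
    needed to turn `ref` into `hyp` (classic dynamic-programming edit distance)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance divided by the reference length.
    Assumption: no normalisation is applied to either string."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return levenshtein(ref, hyp) / len(ref)


# One spurious character in a 13-character reference gives a CER of 1/13 (about 7.7%).
print(f"{cer('Regierungsrat', 'Regierungsrath'):.3f}")
```

Note that insertions count as errors too, so on very noisy output the CER can exceed 100%.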

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.

Words: 26,026,908
Lines: 5,909,205
Training pages: 159,062
Model ID: 48925
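The training-set figures above can be put in perspective with simple averages. The sketch below only divides the totals listed on this card; averages hide variation between volumes and hands, so treat them as a rough indication of page density, not a description of any individual page.

```python
# Totals as listed on this model card.
WORDS = 26_026_908
LINES = 5_909_205
PAGES = 159_062

words_per_line = WORDS / LINES
lines_per_page = LINES / PAGES
words_per_page = WORDS / PAGES

print(f"{words_per_line:.1f} words per line")   # 4.4 words per line
print(f"{lines_per_page:.0f} lines per page")   # 37 lines per page
print(f"{words_per_page:.0f} words per page")   # 164 words per page
```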