Julian Helmchen · PyLaia · Published December 19, 2023

Viennese Property Registers 1420-1517

Text Recognition

Description

The model is based on the property registers of the city of Vienna from the 15th and early 16th centuries. These are part of the city books. They list all real estate transactions in which a property changed hands, for example through purchase or inheritance. The entries follow a form that varies only slightly, which is why the vocabulary represented in the training material is limited. The entries were written in Early New High German with a few Latin phrases. The scripts used are late Gothic minuscule, Bastarda and a very early Kurrent.

The training material consists of 1,228,264 words, which corresponds to approximately 3,500 pages. The Ground Truth was created as part of the DFG-funded research project Mapping Medieval Vienna, which focuses on analyzing the content of the sources. The transcription guidelines are therefore aimed at readability. Abbreviations have been resolved and medieval punctuation has been omitted. Letters are always transcribed in their basic form, diacritics have not been taken into account, and no distinction has been made between long and round "s". The following abbreviations were used for currency symbols: tl. = pound, s. = shilling, d. = pfenning, fl. = florin.

Due to the homogeneity of the source corpus, the model achieves a CER of 1.50% on a validation set.

Contact: j.helmchen@fu-berlin.de
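If you want to compare your own transcriptions with this model's output, it can help to bring them to the same conventions first. The following Python sketch approximates two of the rules described above: it maps the long "s" to a round "s" and drops diacritics. It is an illustration only, not the project's actual tooling; the function name `normalize_transcription` is hypothetical, and resolving abbreviations or removing medieval punctuation still has to be done by hand.

```python
import unicodedata


def normalize_transcription(text: str) -> str:
    """Approximate the 'basic letter form' convention (assumed helper, not project code)."""
    # No distinction between long and round s: map U+017F (long s) to "s".
    text = text.replace("\u017f", "s")
    # Decompose characters so that diacritics become separate combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every combining mark (umlaut dots, rings, superscript letters, ...).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever is left into the usual composed form.
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    print(normalize_transcription("h\u016f\u017f"))  # "hůſ" with ring and long s
```

Characters without a Unicode decomposition, such as "ß", pass through unchanged, so any further mapping would need an explicit rule.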

Very low error rate: 1.5% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 1.5% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material. This is a larger model trained on diverse material, which generally makes it more robust across different handwriting styles. That said, larger training sets also make it harder to push the CER down further.

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.
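To estimate the CER on your own documents, you can compare the model output against a manually corrected transcription of the same lines. The sketch below uses the common definition of CER: the character-level edit distance (insertions, deletions, substitutions) divided by the length of the reference text. It is a minimal illustration; the function names are hypothetical, and the exact counting rules Transkribus applies (for example for whitespace) may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits needed to turn a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep if equal)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate of hypothesis measured against reference."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One wrong character in a four-character reference gives a CER of 25%.
    print(f"{cer('haus', 'hans'):.2%}")
```

Because the divisor is the reference length, a hypothesis with many inserted characters can yield a CER above 100%.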

Words: 1,228,264
Lines: 127,905
Training Pages: 3,300
Model ID: 57815
Languages: German