Description
The model is based on the property registers of the city of Vienna from the 15th and early 16th centuries, which form part of the city books. They record every real estate transaction in which a property changed hands, for example through purchase or inheritance. The entries follow a form that varies only slightly, so the vocabulary represented in the training material is limited. The entries were written in Early New High German with a few Latin phrases. The scripts used are late Gothic minuscule, Bastarda and a very early Kurrent.
The training material consists of 1,228,264 words, which corresponds to approximately 3,500 pages. The Ground Truth was created as part of the DFG-funded research project Mapping Medieval Vienna, which focuses on analyzing the content of the sources. The transcription guidelines therefore favor readability over palaeographic detail. Abbreviations have been resolved and medieval punctuation has been omitted. Letters are always transcribed in their basic form: diacritics are not reproduced, and no distinction is made between long and round "s". The following abbreviations are used for currency units: tl. = pound, s. = shilling, d. = pfennig, fl. = florin. Owing to the homogeneity of the source corpus, the model achieves a character error rate (CER) of 1.50% on a validation set.
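For readers unfamiliar with the metric, the CER is commonly defined as the character-level edit distance between the model output and the Ground Truth, divided by the number of characters in the Ground Truth. The following Python sketch illustrates that common definition only. It is not necessarily the exact procedure used to obtain the figure above, and the function names and the sample line are invented for illustration.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character insertions, deletions and
    substitutions needed to turn `reference` into `hypothesis`."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,                      # delete ref_char
                curr[j - 1] + 1,                  # insert hyp_char
                prev[j - 1] + substitution_cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Invented sample line, not taken from the registers.
    ground_truth = "umb 10 tl. d."
    model_output = "vmb 10 tl. d."
    print(f"CER: {cer(ground_truth, model_output):.2%}")
```

Under this definition, a CER of 1.50% corresponds to roughly 15 character errors per 1,000 characters of Ground Truth.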
Contact: j.helmchen@fu-berlin.de