j.halaczkiewicz · PyLaia · Published June 8, 2024

The Polish Schwabacher

Text Recognition

Description

This text recognition model was developed from scans of Jakub Śliwski's Polish translation of "Historia del regno di Voxu del Giapone, dell'antichita, nobilta, e valore del suo re Idate Masamune" by Scipione Amati, published by Franciszek Cezary in Cracow in 1616. The book tells the story of the second Japanese mission to Europe, which took place after the Franciscan missionary Luis Sotelo (1574–1624) had successfully established the Christian faith in Japan. Scans of the source book are available at the National Digital Library Polona: https://polona.pl/preview/45b00d44-4957-41c8-af0e-6e9ccae557ae.

The source material was printed mainly in the Polish Schwabacher, a Gothic typeface that typesetters used for texts in the national language (see more: https://typoteka.pl/en). The book also contains italics (used to highlight quoted fragments) and a Latin typeface (for Latin words).

The model is intended to help prepare diplomatic transcriptions. All characters have been preserved, including ſ, á, v used as "u", y used as "i" or "j", and / used as a comma. Abbreviations and ligatures (such as sweg°, teg°, æ, &) have been expanded. The model does not recognize initials, and it may not recognize headlines correctly.

Contributors: Dr Joanna Hałaczkiewicz (Faculty of Polish Studies, Jagiellonian University, j.halaczkiewicz@uj.edu.pl), editor and supervisor; Karolina Kapuścińska, Agata Lech, Gabriela Paszkowska, Aleksandra Sobańska, Agnieszka Tkacz and Olga Zatońska, master's students of Polish philology with an emphasis on textual scholarship.

Very low error rate: 0.87% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 0.87% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material.
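To make the metric concrete, here is a minimal sketch of how a CER is commonly computed: the character-level edit distance between a reference transcription and the model output, divided by the length of the reference. The function names `levenshtein` and `cer` are illustrative, the sample word is made up for the demonstration, and the NFC normalisation step is an assumption; the exact procedure Transkribus uses (for example, how it treats whitespace or Unicode normalisation) may differ.

```python
import unicodedata


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `hyp` into `ref`."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # character missing from the hypothesis
                curr[j - 1] + 1,     # extra character in the hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate in percent, relative to the reference length."""
    # Assumption: compare NFC-normalised text so that letters such as "á"
    # count as one character however they happen to be encoded.
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return 100.0 * levenshtein(ref, hyp) / len(ref)


# Hypothetical example: the recogniser modernises two preserved characters.
print(f"{cer('cżáſu', 'cżasu'):.2f}%")  # 2 errors in 5 characters: 40.00%
```

Because a diplomatic transcription keeps characters such as ſ and á, a hypothesis that silently modernises them counts as wrong under this measure, which is why the two substitutions in the sample raise the score.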

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.

Words: 16,407
Lines: 1,860
Training pages: 56
Model ID: 101013
Languages: Polish
Centuries: 16th c., 17th c., 18th c.