Marija Đokić Petrović & Marko Simonović · PyLaia · Published November 28, 2025

Burgenland Croatian Typewritten 2010-2019

Text Recognition

Description

Burgenland Croatian Typewritten 2010–2019 is a text recognition model curated and trained by Marija Đokić Petrović (School of Computing, Union University Belgrade, Serbia) and Marko Simonović (Institute of Slavic Studies, University of Graz, Austria). It was trained on three issues of the weekly newspaper "Hrvatske novine" (18 June 2010, 17 October 2014 and 11 January 2019), obtained from ANNO, the Austrian National Library's online newspaper archive. As the first dedicated model for Burgenland Croatian (Glottolog: burg1244; IETF: ckm-AT), it focuses on post-2000 printed material, a period marked by intensified standardisation, and is therefore optimised for texts published after 2010. The model will be updated regularly to improve its accuracy.

Character Error Rate (CER): 2.48% (very low error rate)

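Character Error Rate (CER) measures the proportion of characters that are incorrectly recognised, counting substitutions, deletions and insertions relative to the number of characters in the reference (ground-truth) transcription. Lower is better. This model scored 2.48% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material.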
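The definition above can be made concrete with a short sketch. It assumes the common formulation: Levenshtein edit distance summed over all lines, divided by the total number of reference characters, with NFC Unicode normalisation so that letters such as "š" and "ć" count as one character. The card does not say exactly how Transkribus or PyLaia compute the reported figure, so `levenshtein` and `cer` here are hypothetical helpers, not the platform's code.

```python
import unicodedata


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, insertions and deletions
    needed to turn hyp into ref (classic dynamic programming, row by row)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,              # deletion: ref char missing from hyp
                curr[j - 1] + 1,          # insertion: extra char in hyp
                prev[j - 1] + (r != h),   # substitution (cost 0 if chars match)
            ))
        prev = curr
    return prev[-1]


def cer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level CER in percent: summed edits over summed reference length.
    Assumption: lines are already aligned one-to-one."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must be aligned line by line")
    edits = 0
    ref_chars = 0
    for ref, hyp in zip(references, hypotheses):
        # NFC so a precomposed "š" and "s" + combining caron compare as equal.
        ref = unicodedata.normalize("NFC", ref)
        hyp = unicodedata.normalize("NFC", hyp)
        edits += levenshtein(ref, hyp)
        ref_chars += len(ref)
    return 100.0 * edits / ref_chars if ref_chars else 0.0


# One misread character in a 15-character line.
print(f"{cer(['Hrvatske novine'], ['Hrvatske novlne']):.2f}%")  # 6.67%
```

On this definition, the reported 2.48% corresponds to roughly 25 character edits per 1,000 reference characters.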

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.

Words: 58,405
Lines: 12,298
Training Pages: 80
Model ID: 442685
Languages: Croatian
Centuries: 21st c.
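The dataset figures above can be put in proportion with simple division. This sketch assumes that the word and line counts describe the same 80 training pages, which the card does not state explicitly; the variable names are illustrative only.

```python
# Figures copied from the model card; assumed to describe the same 80 pages.
WORDS = 58_405
LINES = 12_298
TRAINING_PAGES = 80

words_per_line = WORDS / LINES            # ≈ 4.75
lines_per_page = LINES / TRAINING_PAGES   # ≈ 153.7
words_per_page = WORDS / TRAINING_PAGES   # ≈ 730.1

print(f"{words_per_line:.2f} words/line, {lines_per_page:.1f} lines/page, "
      f"{words_per_page:.1f} words/page")
```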