Transkribus · Baselines · Published May 6, 2023

Universal Lines

Layout Analysis · ★ Featured

Description

This model was trained by the Transkribus team on approximately 25,000 pages, drawn from the cBAD dataset and from pages curated as ground truth by the Transkribus community. It is the most general baseline model currently available on the platform and handles a wide range of text types, structures, and styles. We recommend it as the default choice, especially when you are unsure which model best fits your material. For very diverse or complex layouts, try this model or the "Mixed Line Orientation" model (ID: 49272). You can also adjust any model's parameter settings to your needs. For more information, see the help center: https://help.transkribus.org/advanced-layout-configuration-settings
Low loss: 8.94%

Loss indicates how far the predicted text regions deviate from the ground truth (lower is better). This model achieved a loss of 8.94% on its validation set. A loss below 10% generally indicates reliable baseline detection. Because it was trained on a broad range of page layouts, this model should generalise well, although complex or unusual structures may still require fine-tuning.

Layout detection quality depends heavily on your document's structure. Pages with columns, marginalia, or non-standard layouts may produce different results.

Words: 5,556,817
Lines: 1,061,279
Training Pages: 24,723
Model ID: 51962
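
To get a rough sense of how dense the training material was, the figures above can be turned into per-page and per-line averages. The following is only a minimal sketch: the variable names are illustrative and not part of any Transkribus API, and the averages are plain ratios of the published totals, not statistics reported by the Transkribus team.

```python
# Training statistics as published on this model card.
words = 5_556_817
lines = 1_061_279
pages = 24_723

# Simple ratios of the totals (illustrative, not official figures).
words_per_page = words / pages
lines_per_page = lines / pages
words_per_line = words / lines

print(f"Words per page: {words_per_page:.1f}")
print(f"Lines per page: {lines_per_page:.1f}")
print(f"Words per line: {words_per_line:.2f}")
```

This works out to roughly 225 words and 43 lines per page, or about 5 words per line. If your own pages differ strongly from these averages, that may be one hint that the layout parameters need adjusting.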