f.erhard · Baselines · Published February 8, 2024

Tibetan Modern Print (TMP) 4.3

Layout Analysis

Description

Tibetan Modern Print (TMP) 4.3 detects baselines in modern Tibetan print publications from the 1950s to the 1980s published in the PRC. For best results, in the advanced settings, under baseline options, set the "Minimal baseline length" to 10 (low) or even 5 to capture orphaned syllables and page numbers. The model was trained for the Divergent Discourses project, a UK-German collaborative research project based at Leipzig University and SOAS, London, and funded by the DFG and AHRC. It was trained on 18 documents, a total of 440 pages (training set: 342 pages; validation set: 37 pages).
Very low loss: 3.87%

Loss indicates how far the predicted text regions deviate from the ground truth (lower is better). This model achieved 3.87% on its validation set. A loss below 10% generally indicates reliable baseline detection.

Layout detection quality depends heavily on your document's structure. Pages with columns, marginalia, or non-standard layouts may produce different results.

Words: 8,901
Lines: 7,876
Training Pages: 417
Model ID: 59417
Languages: Tibetan