Tibetan Manuscript 3

Description

Tibetan Manuscript 3 was developed as part of the PaganTibet project (pagantibet.com) at EPHE-PSL Paris, with funding from the European Union (ERC, Reconstructing the Pagan Religion of Tibet (2023-2028), 101097364). The model was trained on a dataset of 16,673 images from a diverse collection of Tibetan-language manuscripts in pothi format, with 15,006 images used for training and 1,667 for validation. It is designed to detect text lines while ignoring folio numbers. For best results, open the advanced settings and set the “Minimal baseline length” parameter to low so that orphaned syllables and punctuation are also captured.
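
The training and validation counts above amount to roughly a 90/10 split. The following is a minimal Python sketch that checks this arithmetic from the figures quoted in the description. The constant and function names are illustrative only and are not part of Transkribus or of the model itself.

```python
# Figures quoted in the model description.
TOTAL_IMAGES = 16_673
TRAIN_IMAGES = 15_006
VALIDATION_IMAGES = 1_667


def split_shares(train: int, validation: int) -> tuple[float, float]:
    """Return the training and validation shares as fractions of their sum."""
    total = train + validation
    return train / total, validation / total


# The two subsets should account for every image in the dataset.
assert TRAIN_IMAGES + VALIDATION_IMAGES == TOTAL_IMAGES

train_share, validation_share = split_shares(TRAIN_IMAGES, VALIDATION_IMAGES)
print(f"train: {train_share:.1%}, validation: {validation_share:.1%}")
# prints "train: 90.0%, validation: 10.0%"
```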

Low loss: 5.43%

Loss indicates how far the predicted text regions deviate from the ground truth (lower is better). This model achieved 5.43% on its validation set. A loss below 10% generally indicates reliable baseline detection. Trained on a broad range of page layouts, this model should generalise well. Complex or unusual structures may still require fine-tuning.

Layout detection quality depends heavily on your document's structure. Pages with columns, marginalia, or non-standard layouts may produce different results.

Words: 1,661,711

Lines: 130,869

Training Pages: 15,006

Model ID: 229409

Related models

Mixed Line Orientation

Universal Lines

Horizontal Line Orientation

Sentenze tribunale