TibSchol Project · PyLaia · Published September 8, 2023

Tibetan cursive (Betsug)

Text Recognition

Description

This model is tailored to transcribing the handwritten Tibetan cursive script known as Betsug (dpe tshugs). It was created in the framework of the ERC project The Dawn of Tibetan Buddhist Scholasticism (11th-13th c.) (TibSchol) (https://www.oeaw.ac.at/projects/tibschol), hosted at the Institute for the Cultural and Intellectual History of Asia, Austrian Academy of Sciences, and was released by Rachael Griffiths (rachaelgriffiths1@gmail.com).

This Betsug model uses the Drutsa model (‘Tibetan cursive (Drutsa)’, id: 54525) as a base model. For best results, apply this model after running the "Tibetan pecha" layout recognition model (id: 54306).

The ground truth data consists of 93 folios from a selection of 8 Tibetan treatises being explored in the TibSchol project. 85 folios were used in the training set and 8 in the validation set. Transcripts use the extended Wylie transliteration system in the Roman alphabet. Abbreviations were transcribed as they appear in the manuscripts; for a list of abbreviations, see https://github.com/ERC-TibSchol/abbreviations. The ground truth also includes 360 images of abbreviations kindly provided by MonlamAI (https://monlam.ai/).

If you use this model as a base model for your own model, you are kindly requested to credit it. The TibSchol project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 101001002). This model is published within the project team's responsibility. The European Research Council or the European Commission must not be held responsible for its further use.

Very low error rate: 3.6% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised (substituted, deleted or inserted) relative to the reference transcription. Lower is better. This model scored 3.6% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material.

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.
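To make the metric concrete, here is a minimal sketch of the standard CER definition: the character-level Levenshtein (edit) distance between the reference and the recognised text, divided by the reference length. This is an illustration, not Transkribus's own implementation. The function names `levenshtein` and `cer` are hypothetical, and we assume no text normalisation (whitespace, case) is applied before comparison, since the model card does not say how its 3.6% figure was normalised.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn hyp into ref (dynamic programming)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,              # character missing from hyp (deletion)
                curr[j - 1] + 1,          # extra character in hyp (insertion)
                prev[j - 1] + (r != h),   # substitution, free if chars match
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate in percent (hypothetical helper, no normalisation)."""
    if not reference:
        raise ValueError("reference transcription must be non-empty")
    return 100.0 * levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One wrong vowel in a 9-character Wylie string: 1 / 9, about 11.1% CER.
    print(f"{cer('bsgrub pa', 'bsgrab pa'):.1f}%")
```

Because insertions count as errors, CER can exceed 100% when the output is much longer than the reference; a validation CER of 3.6% corresponds to roughly 36 edits per 1,000 reference characters.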

Words: 26,679
Lines: 857
Training Pages: 410
Model ID: 54935
Languages: Tibetan
Centuries: 11th c., 12th c., 13th c., 14th c., 15th c., 16th c., 17th c.