TibSchol Project · PyLaia · Published August 21, 2023

Tibetan cursive (Drutsa)

Text Recognition

Description

This model is tailored to transcribing the handwritten Tibetan cursive script known as Drutsa (’bru tsha). It was created within the framework of the ERC project The Dawn of Tibetan Buddhist Scholasticism (11th-13th c.) (TibSchol) (https://www.oeaw.ac.at/projects/tibschol), hosted at the Institute for the Cultural and Intellectual History of Asia, Austrian Academy of Sciences, and was released by Rachael Griffiths (rachaelgriffiths1@gmail.com). For best results, apply this model after running the "Tibetan pecha" layout recognition model (id: 54306).

The ground truth data consists of 466 folios from a selection of 19 Tibetan treatises being explored in the TibSchol project: 422 folios were used in the training set and 44 in the validation set. Transcripts use the extended Wylie transliteration system in the Roman alphabet. Abbreviations were transcribed as they appear in the manuscripts; for a list of abbreviations, see https://github.com/ERC-TibSchol/abbreviations.

If you use this model as a base model for your own model, you are kindly requested to mention it. The TibSchol project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 101001002). This model is published under the project team's responsibility. The European Research Council or the European Commission must not be held responsible for its further use.

Very low error rate: 1.4% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 1.4% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material.
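As a rough illustration of the metric described above, the sketch below computes CER as the character-level edit distance (substitutions, insertions, and deletions) between a reference transcript and a recognised line, divided by the reference length. This is the commonly used definition; whether the platform normalises text or aggregates over lines in exactly this way is an assumption. The function names and the sample Wylie strings are hypothetical and not taken from the model's data.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of single-character
    substitutions, insertions, and deletions turning one string into the other."""
    # previous[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER as a fraction of the reference length (multiply by 100 for a percentage)."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: one wrong character in an 11-character Wylie line.
reference_line = "bla ma dang"
recognised_line = "bla na dang"
print(f"CER: {character_error_rate(reference_line, recognised_line):.1%}")
```

In this illustrative case a single substituted character in an 11-character line gives a CER of about 9.1%. A score of 1.4% corresponds to roughly one character error in every 70 characters.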

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.

Words: 283,839
Lines: 3,803
Training Pages: 422
Model ID: 54525
Languages: Tibetan
Centuries: 11th c., 12th c., 13th c., 14th c., 15th c., 16th c., 17th c.