Tibetan Generic 0.1

Description

First version of a generic Tibetan model that covers Uchan (dbu can) and Ume (dbu med) scripts, as well as some English and Chinese. The texts date from the 18th to the 20th century and include legal texts (Daniel Wojahn), modern books from the 1950s to 1980s (Divergent Discourses), and Tibetan-language newspapers from the 1950s and 1960s (Divergent Discourses). "Test model Chinese" was chosen as the base model to introduce some basic knowledge of Chinese, which often appears in Tibetan texts but is represented in the training data only to a limited extent. Word count: 161482 words; validation set: 153 pages; training set: 1380 pages. Training cycles: 250; Early Stopping: 20; lines tagged "unclear" or "gap" were omitted; binarization enabled.

Very low error rate: 3.58% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 3.58% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material. This is a larger model trained on diverse material, which generally makes it more robust across different handwriting styles. That said, larger training sets also make it harder to push the CER down further.

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.
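To make the CER definition concrete, here is a minimal Python sketch that computes it as the character-level edit distance between a reference transcription and the recognised text, divided by the length of the reference. The function names `levenshtein` and `character_error_rate` are illustrative and not part of Transkribus; the platform's own calculation may normalise or tokenise text differently, so treat this as an approximation of the idea rather than a reproduction of the reported figure.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `reference` into `hypothesis`."""
    # previous[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(
                min(
                    previous[j] + 1,                      # drop ref_char
                    current[j - 1] + 1,                   # add hyp_char
                    previous[j - 1] + substitution_cost,  # match or substitute
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER as a fraction: edit distance divided by reference length.

    Assumption: characters are counted as Unicode code points, which is
    how Python indexes strings.
    """
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

Multiply the result by 100 to get a percentage comparable to the figure above. Because Python strings are sequences of code points, a Tibetan syllable stack written with several code points counts as several characters in this sketch.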

Words: 143,742

Lines: 97,979

Training Pages: 1,380

Model ID: 373545

Related models

PaganTibet Ume 5

PaganTibet Ume 4

PaganTibet Ume 3

Tibetan Modern U-chen Print 0.1