TibNewsOne4All 0.2

Description

The TibNewsOne4All model was trained on 500 pages (ca. 100,037 words) from 13 different Tibetan-language newspapers of the 1950s and 1960s, published in both India and the PRC. The model mainly transcribes Tibetan Uchen script, but it can also handle cursive scripts and, to a very limited extent, Chinese and English. TibNewsOne4All was trained for Divergent Discourses, a collaborative research project led by Robert Barnett at SOAS and Franz Xaver Erhard at Leipzig University, with funding from AHRC and DFG. For best results, perform text region and line polygon detection before HTR.

Settings:
- training set of 500 pages
- validation set of 27 pages
- lines tagged "unclear" were excluded
- 250 epochs
- early stopping: 20
- existing line polygons were not used in the training
- Tibetan language model TMUP 0.1 used as the base model

Very low error rate: 2.52% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 2.52% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material.

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.
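To check how the model performs on your own material, CER can be estimated by comparing a model transcription against a manually corrected reference. The following is a minimal sketch, assuming the common definition of CER as the Levenshtein (edit) distance divided by the number of characters in the reference. The function names `levenshtein` and `cer` are illustrative, and the sketch counts Unicode code points, so Tibetan vowel signs and subjoined letters each count as one character. Transkribus may normalise or count characters differently, so the result need not match the figure reported here exactly.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of insertions, deletions and substitutions
    needed to turn `hypothesis` into `reference`."""
    # prev[j] holds the distance between the reference prefix
    # processed so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # character missing from hypothesis
                curr[j - 1] + 1,     # extra character in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate as a fraction (multiply by 100 for percent)."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical line pair: corrected reference vs. model output.
    print(f"{cer('kitten', 'sitting') * 100:.2f}% CER")
```

Computing the rate over a sample of pages that reflects your own newspapers or manuscripts gives a more realistic picture than the validation figure alone.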

Words: 92,423

Lines: 67,093

Training Pages: 500

Model ID: 169581

Related models

PaganTibet Ume 5

PaganTibet Ume 4

PaganTibet Ume 3

PaganTibet Ume 2