RTA2 (Romanian Transition Alphabet)

Description

This is a first attempt at a model trained on texts in the Romanian Transition Alphabet (1830-1862). To train an HTR model for these texts, I chose 5 samples dating from before and after 1859, the year in which the 2 Romanian provinces became a country with an official language. The samples show the progression from a heavy use of Cyrillic letters to a sparser, more reader-friendly use that makes reading more fluent. As a general rule, Latin capital letters are preferred for titles after 1859.

The Latin letters Z/ z, M/ m, D/ d, S/ s, T/ t, N/ n, A/ a, I/ i, E/ e, O/ o, Î/ î, U/ u, Ŭ/ ŭ, Ĭ/ ĭ are present from the oldest sampled text (1853), whereas the following letters are still written in Cyrillic: Х/ х (ha), Ш/ ш (sha), Щ/ щ (shcha), Ц/ ц (tse), Џ/ џ (dze), Ч/ ч (che), Ъ/ ъ (ă), П/ п (pe), Р/ р (er), Ж/ ж (zhe), Ф/ ф (ef), К/ к (ca), В/ в (ve), Л/ л (el), Г/ г (ghe), Б/ б (be). Among these Cyrillic letters, the first to receive a Latin equivalent are: Ф/ ф (ef) → f; Г/ г (ghe) → g; Л/ л (el) → l; Ж/ ж (zhe) → j. At the same time, Р/ р (er), П/ п (pe), Ъ/ ъ (ă), Ч/ ч (che), В/ в (ve), Ш/ ш (sha), Щ/ щ (shcha), Ц/ ц (tse) tend to be maintained until 1862, when some of them are replaced with glyphs such as “ḑ” (dz), “ş” (sh) and “ț” (tz), which were imported from the Livonian alphabet but entered the printing circuit only after 1865.

The general guidelines for transcription have been established as follows:

1. Creation of the collection “ALFABET DE TRANZITIE” containing 6 items.
2. Random transcription of initial, middle, and end pages.
3. One-to-one transliteration of all Cyrillic letters, except where K/ k stands for the group Ch/ ch (e.g. Бukete → Bukete): Х/ х → H/ h; Ш/ ш → Ș/ ș; Щ/ щ → Șt/ șt; Ц/ ц → Ț/ ț; Ч/ ч → C/ c; Ъ/ ъ → Ă/ ă; П/ п → P/ p; C/ c → S/ s; Р/ р → R/ r; Ж/ ж → J/ j; Ф/ ф → F/ f; К/ к → C/ c; В/ в → V/ v; Л/ л → L/ l; Г/ г → G/ g; Б/ б → B/ b; Џ/ џ → G/ g.
4. Customization of the following glyphs: apostrophe, right double quotation mark, double low-9 quotation mark, Ŭ/ ŭ, Ĭ/ ĭ, á.
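The one-to-one transliteration rule from the guidelines can be sketched as a simple lookup table. This is only an illustration of the mapping listed in the description, not the procedure actually used to prepare the training data: the names `PAIRS`, `TABLE` and `transliterate` are hypothetical, the K/ k exception is not handled, and the C/ c → S/ s pair is left out because the description does not make clear whether that source glyph is Latin or Cyrillic.

```python
# Cyrillic -> Latin pairs copied from the transliteration rule in the description.
# Assumption: the table is applied character by character, with no context rules.
PAIRS = {
    "Х": "H", "х": "h",
    "Ш": "Ș", "ш": "ș",
    "Щ": "Șt", "щ": "șt",   # one Cyrillic letter becomes two Latin letters
    "Ц": "Ț", "ц": "ț",
    "Ч": "C", "ч": "c",
    "Ъ": "Ă", "ъ": "ă",
    "П": "P", "п": "p",
    "Р": "R", "р": "r",
    "Ж": "J", "ж": "j",
    "Ф": "F", "ф": "f",
    "К": "C", "к": "c",
    "В": "V", "в": "v",
    "Л": "L", "л": "l",
    "Г": "G", "г": "g",
    "Б": "B", "б": "b",
    "Џ": "G", "џ": "g",
}

# str.maketrans accepts single-character keys and replacement strings of any length.
TABLE = str.maketrans(PAIRS)


def transliterate(text: str) -> str:
    """Replace every listed Cyrillic letter; leave all other characters untouched."""
    return text.translate(TABLE)


if __name__ == "__main__":
    # Example taken from the description: the Cyrillic initial is replaced,
    # the Latin letters (including k) are kept as they are.
    print(transliterate("Бukete"))
```

Because the table only touches the listed Cyrillic letters, mixed-script words from the transition period pass through with their Latin letters unchanged.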

Very low error rate: 2.8% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 2.8% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material. This is a smaller, specialised model. It may achieve a very low CER on material similar to its training data, but could be less robust on unfamiliar handwriting or layouts.

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.
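As a rough illustration of how a character error rate can be computed, the sketch below uses the common definition: the edit (Levenshtein) distance between the recognised text and the reference transcription, divided by the length of the reference. Whether the figure reported on this page was computed in exactly this way is an assumption, and the function names `edit_distance` and `cer` are hypothetical.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions, substitutions."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_char != hyp_char),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate as a fraction of the reference length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One wrong character out of four gives a CER of 25%.
    print(f"{cer('abcd', 'abxd'):.1%}")
```

Running the same calculation on a few manually transcribed pages of your own material gives a more realistic estimate than the validation figure alone.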

Words: 10,250

Lines: 1,211

Training Pages: 41

Model ID: 51515

Related models

19th-century Romanian Transitional Script - GT corrected

Transkribus Print M1

Prensa filipina de 2 a 5 columnas

Dabbas 1706-1711