5 AI models for transcribing old Russian handwriting and printed Russian texts

Fiona Park

January 25, 2023·4 min read

As one of the world’s largest countries, Russia is also one of the most studied. Its turbulent history and influence on world politics make it the focus of many research projects, which often use historical documents — such as local registers, birth records or even personal diaries — as their primary sources.

In times gone by, deciphering the old Cyrillic handwriting or print within them was a time-consuming challenge requiring years of training. But AI has changed this. Using AI text recognition technology such as Transkribus, researchers can now simply run a scan of the document through the software and get an instant, automatic transcription. And as we all know, the less time we spend transcribing, the more time we have for the more satisfying parts of historical or genealogical research.

If you want to read historical documents in Russian, here are four public AI models that you can use with Transkribus to get instant transcriptions of your texts.

Reading Russian handwriting and print with AI:

AI text recognition platforms can read and transcribe historical documents in Russian. For example, software like Transkribus uses specialised AI models to recognise the handwritten or printed text, allowing users to convert images of historical documents into digital, searchable text.
Transkribus offers several models for Russian handwriting as well as for civil records and printed books in Russian.
These models can be selected when performing text recognition with Transkribus.

The Russian Generic Handwriting model is trained on handwriting and cursive from the last 200 years. © HKR Dataset for Russian and Kazakh

Russian Generic Handwriting 2

If you have a mix of documents from different genres and time periods, then this model, from the MultiHTR team at the University of Freiburg, is a good one to start with. Based on earlier models from the Estonian State Archives and the INEL project in Hamburg, as well as the Russian Civil Records model (see below) and the Prozhito database, it encompasses a wide range of Ground Truth mostly from the late 19th and early 20th centuries.

With a CER of 5.8%, it is capable of giving fairly accurate transcriptions for a wide variety of documents and is an excellent starting point for training your own model.
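CER stands for character error rate: roughly, the share of characters the model gets wrong when its output is compared with a correct reference transcription. The sketch below shows one common way to compute it, as the Levenshtein edit distance divided by the length of the reference. The function names and the sample strings are made up for illustration, and Transkribus may normalise or count characters differently, so treat this as an aid to reading the figures rather than a description of how the platform calculates them.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions needed to turn one string
    into the other."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # reference character missing from hypothesis
                current[j - 1] + 1,      # extra character in hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the number of characters in the reference."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: one wrong letter in a 17-character line.
reference = "метрическая книга"   # correct transcription (illustrative)
hypothesis = "метрическал книга"  # model output with one substitution
print(f"CER: {character_error_rate(reference, hypothesis):.1%}")
```

By this measure, a CER of 5.8% means about six character errors per hundred characters of text, which is usually readable but still needs proofreading.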

→ Try the model with your documents

Many civil records in Congress Poland, such as this birth certificate, were written in Russian. © Family Search

Russian Civil Records

This interesting model was created by the L’Dor V’Dor Foundation, who preserve Jewish historical records from around the world. They took handwritten civil records from Congress Poland, Ukraine and Russia from 1868 to 1914 as their Ground Truth, creating a model with a CER of 7.3%.

The model works particularly well with handwritten records from Congress Poland.

→ Try the model with your documents

Rychkov translated cultural information in the indigenous Evenki language (on the left side) into Russian (on the right side). From Arkhipov, Alexandre (2021). Using Handwritten Text Recognition on Bilingual Evenki-Russian Manuscripts of Konstantin Rychkov, pp. 233–244.

Russian Rychkov Archive

This model is ideal for use with pre-reform Cyrillic documents. It was trained on bilingual Evenki/Russian manuscripts by the Russian ethnographer and linguist Konstantin M. Rychkov, who collected various pieces of cultural information from the Evenki people and translated them into Russian.

The Ground Truth consisted of 581 pages from the Rychkov archive dating from 1911 to 1913, and the resulting model has a CER of 4.4%. The model was created by the INEL project at the University of Hamburg.

→ Try the model with your documents

Okorokov's Printing House printed scientific papers and manuscripts at Moscow State University in the 18th century. © Georg Gottlob Richter, Севастиан Клинский

Russian Print 18th Century (V. Okorokov’s Printing House)

Created at the European University in St Petersburg, this model was based on a series of scientific papers published by V. Okorokov’s Printing House at Moscow State University. The papers were all printed in Russian, with some scientific terms given in Latin script.

The CER on the validation set is just 0.6% and the model shows good results on printed texts from other publishing houses of the era.

→ Try the model with your documents

How to use these models with Transkribus

Working with historical documents in Russian, whether printed or handwritten, requires specialised tools. Using AI to automate the transcription process allows researchers, librarians, and archivists to save countless hours of manual work, freeing up valuable time and energy to focus on interpretation, analysis, and curation.

To learn more about how to use these models, or how to train your own for a specific collection, please visit our Help Centre at help.transkribus.org, or watch our introductory webinar on using public models in Transkribus.
