DiJeSt 3.0

Description

A model for printed (or typed) text in Hebrew script (mainly Hebrew and Yiddish, both modern and Weiberteitch). The training data includes the following contributions:

- DiJeSt 2.0. 1,757 pages. The basis for the previous Transkribus model of that name (https://app.transkribus.org/models/text/46003), collected with the support of Rothschild Foundation Hanadiv Europe in the framework of the project DiJeSt: Digitizing Jewish Studies. For more details see https://dijest.net/gtmodel/
- Hasidic Stories. 446 pages. Funded by the project “Historical Digital Analysis of Hasidic Stories Until 1914”, ISF research grant no. 1478/2, headed by Gadi Sagiv, the Open University of Israel.
- Zylbercweig Lexicon. 285 pages. Funded by the project “Historical Digital Analysis of Zalmen Zylbercweig’s Lexicon of Yiddish Theatre”, ISF grant no. 284/24, headed by Ruthie Abeliovich, Tel Aviv University.
- 20th-century Hebrew Newspapers. 32 pages. Funded by the project “The Double Movement? Towards a Socioeconomic Historiography of the Right in Israel (1948-1984)”, ISF grant no. 198/23, headed by Amir Goldstein, Tel Hai Academic College.
- Community regulations, 1711-1929, High German Jewish community in Amsterdam. 250 pages. Ronny Reshef and Mirjam Gutschow. Cf. https://zenodo.org/records/7692989, https://zenodo.org/records/11179901

Very low error rate: 1.79% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 1.79% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material. This is a larger model trained on diverse material, which generally makes it more robust across different handwriting styles. That said, larger training sets also make it harder to push the CER down further.

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.
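To make the metric concrete, here is a minimal sketch of how a character error rate can be computed: the Levenshtein (edit) distance between the reference transcription and the recognised text, divided by the length of the reference. This is the common textbook definition, not necessarily the exact procedure Transkribus uses (any text normalisation it applies is not stated here), and the function names `edit_distance` and `cer` are illustrative.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning reference into hypothesis."""
    # prev[j] holds the distance between the processed prefix of the
    # reference and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # substitution (or match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate as a fraction of the reference length (assumed definition)."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One wrong final letter in a 9-character line (spaces count as characters).
    print(f"{cer('שלום עולם', 'שלום עולס'):.2%}")
```

Under this definition, a CER of 1.79% corresponds to roughly 18 character edits per 1,000 reference characters.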

Words: 1,498,332

Lines: 173,190

Training Pages: 2,853

Model ID: 357765

Related models

DiJeSt 2.0

IGRA Sfardi Burial Hebrew

Vaybertaytsh.YidTakNL

TOME 3.0