transkribering.samla · PyLaia · Published October 25, 2025

Norwegian hand & typed 1850-1960

Text Recognition

Description

The model is based mainly on texts from three Norwegian tradition archives, published on samla.no. SAMLA is a collaboration between The Ethno-Folkloristic Archive at the University of Bergen (owner), The Norwegian Folklore Archives at the University of Oslo, and The Norwegian Ethnological Research at the Norwegian Folk Museum. The model is trained on a variety of handwritten Norwegian dialects, including Bokmål and Nynorsk, and on some material from the transition phase between Danish and Norwegian in the late 19th and early 20th centuries. The material describes Norwegian culture and nature through, for instance, folklore, legends, fairy tales, letters, diaries, questionnaires, and descriptions of crafts and traditions. The base model used is “NorHand 1820-1940”, from the National Library of Norway. The training material also includes some typed text, as some documents contain both handwriting and type. Additional material has been provided from the archive of The Royal Court, Norway. Due to the ownership and age of some of the material, the training data is not published, but parts can be accessed on request. The model is developed by Therese Foldvik (University of Oslo).

Very low error rate: 4.2% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 4.2% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material. This is a larger model trained on diverse material, which generally makes it more robust across different handwriting styles. That said, larger training sets also make it harder to push the CER down further.

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.
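To make the metric concrete, here is a minimal sketch of how a CER figure can be computed. It assumes the common definition (Levenshtein edit distance between the recognised text and the ground truth, divided by the length of the ground truth). The function names and the sample strings are hypothetical, and the exact normalisation and aggregation Transkribus applies when reporting CER may differ.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn reference into hypothesis."""
    # previous[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # character missing from the hypothesis
                current[j - 1] + 1,      # extra character in the hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER as a fraction: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: one wrong character in a 15-character line.
ground_truth = "eventyr og sagn"
recognised = "eventyr og sogn"
print(f"CER: {character_error_rate(ground_truth, recognised):.1%}")  # CER: 6.7%
```

By this definition, a CER of 4.2% corresponds to roughly 42 character errors per 1,000 characters of ground truth. Because insertions count as errors, the value can exceed 100% on very poor output.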

Words: 918,195
Lines: 148,630
Training Pages: 7,143
Model ID: 422161
Languages: Norwegian Bokmål, Norwegian
Centuries: 19th c., 20th c.