NAF Court Records M10 v2

Name: NAF Court Records M10 v2
Author: National Archives of Finland

Description

This model is based on the Renovated District Court Records (Fi: Kihlakunnanoikeuksien renovoidut tuomiokirjat, Swe: Häradsrätternas renoverade domböcker) from the years 1809-1870. The model's training set consists of 2,841 double-pages and its validation set of 100 double-pages. Because the records were written by dozens of scribes, the material combines many different handwriting styles. The Ground Truth was selected from 58 different court districts across Finland. Most of it is in Swedish, but some is in Finnish, since from the 1850s onwards some court districts began writing their Court Records in Finnish.

The Renovated District Court Records are split into two series: Main Records and Notification Records. The training material consists mostly of Notification Records, but the model also works well with Main Records.

This model was created as part of the READ project at the National Archives of Finland (NAF). It has been used to transcribe the Notification Records from the years 1809-1870 (all districts). As a result, a search interface has been implemented where you can run full-text searches and browse the automatically transcribed documents. The search interface and more information can be found at: https://tuomiokirjat.kansallisarkisto.fi/

Very low error rate: 2.5% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 2.5% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material. This is a larger model trained on diverse material, which generally makes it more robust across different handwriting styles. That said, larger training sets also make it harder to push the CER down further.

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.
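To make the metric concrete, here is a minimal sketch of the usual CER definition: the character-level edit distance (substitutions, deletions and insertions) between a reference transcription and the recognised text, divided by the length of the reference. This is the standard textbook formulation, not necessarily the exact procedure Transkribus uses (for example, any whitespace or Unicode normalisation it applies is not stated here); the function names are illustrative.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character substitutions, deletions and
    insertions needed to turn one string into the other (dynamic programming)."""
    prev = list(range(len(hyp) + 1))  # distances from "" to each prefix of hyp
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to ""
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance divided by the reference length."""
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One wrong character out of ten gives a CER of 10%.
    print(cer("Häradsrätt", "Haradsrätt"))
```

By this definition, a CER of 2.5% corresponds on average to about one incorrect character in every 40 characters of reference text.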

Words: 1,226,202

Lines: 207,773

Training Pages: 2,841

Model ID: 20686
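The figures above can be combined into rough per-page averages, which give a sense of how dense the training material is. This is a small arithmetic sketch that assumes the word and line counts refer to the same 2,841 training double-pages; the page does not state whether the validation set is included in these totals.

```python
# Totals as listed on this page (assumed to cover the training double-pages).
words, lines, pages = 1_226_202, 207_773, 2_841

lines_per_page = lines / pages   # average text lines per double-page
words_per_line = words / lines   # average words per text line
words_per_page = words / pages   # average words per double-page

print(f"lines per double-page: {lines_per_page:.1f}")
print(f"words per line: {words_per_line:.1f}")
print(f"words per double-page: {words_per_page:.0f}")
```

Under that assumption, a typical double-page holds roughly 73 lines of about 6 words each.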

Related models

Text Titan II

The Text Titan I ter

The Text Titan I (Super Model)

The Text Titan I bis