dahlj · PyLaia · Published March 31, 2025

Scottish Custom Books V0.8

Text Recognition

Description

This model was trained on samples from the Scottish port books dated between 1660 and 1691, all held in the collection of the National Records of Scotland. Its purpose is to support a systematic analysis of Scotland's overseas exports in the period from the Stuart Restoration of 1660 to the Act of Union of 1707. The training material consists of complete port books from nine different ports, representing the different jurisdictions of seventeenth-century Scotland.

The following port books are included in the training material: Aberdeen (1690-91), Ayr (1667, 1681, 1681-82, 1682, 1682-83, 1684-85, 1685-86, 1689, 1689-90, 1690), Blackness and Bo’ness (1681-82), Edinburgh (1672-73), Inverness (1665-67, 1668-69, 1672-73, 1684-85, 1690), Kelso (1689-90), Kirkcaldy (Fife) (1672, 1673, 1680-81, 1681, 1681-82, 1682-83, 1683-84, 1684-85, 1684-85, 1685-86, 1688-89, 1689-90), Leith (1671-72, 1681, 1682-83, 1683-84, 1684-85, 1684-85, 1685-86, 1688-89), Montrose (1672-73).

The training material contains a decent variety of hands, styles and page layouts representative of the port books of the period.

Training settings:
Validation set: 5%
Base model: The English Eagle
Epochs: 250
Learning rate: 0.0003
Use existing line polygons
Omit lines by tag: gap and unclear

A full model description and training set samples are available on Zenodo: https://zenodo.org/records/15324159

Moderate error rate: 10.42% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 10.42% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material. This is a larger model trained on diverse material, which generally makes it more robust across different handwriting styles. That said, larger training sets also make it harder to push the CER down further.

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.
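
To check how the model performs on your own material, the CER described above can be estimated by comparing a few lines of model output against a manually corrected transcription. The following is a minimal sketch in Python, assuming the common definition of CER as the character-level edit distance (insertions, deletions and substitutions) divided by the number of characters in the reference text. The function names and the sample lines are illustrative only, and the exact calculation used by Transkribus (for example, how whitespace or punctuation is normalised) may differ.

```python
from typing import Sequence


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions
    and substitutions needed to turn `hyp` into `ref`."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # character missing from the hypothesis
                curr[j - 1] + 1,     # extra character in the hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Character Error Rate in percent over paired lines of text."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    total_edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    return 100.0 * total_edits / total_chars


if __name__ == "__main__":
    # Invented sample lines, not taken from the port books.
    ground_truth = ["Shipped in the Margaret of Leith"]
    model_output = ["Shiped in the Margret of Leith"]
    print(f"CER: {cer(ground_truth, model_output):.2f}%")
```

Summing the edits and reference characters over all lines before dividing weights long lines more heavily than short ones, which is the usual convention for a set-level CER.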

Words: 116,003
Lines: 33,793
Training pages: 631
Model ID: 315953
Languages: English