shadowjayant · PyLaia · Published May 19, 2026

typo v6

Text Recognition

Description

This model is designed for recognizing upright handwritten English text in scanned historical and literary documents. The training data includes pages containing cursive handwriting, punctuation marks, symbols, mixed sentence structures, and varying line lengths. The model was trained on manually corrected transcriptions in Transkribus to improve recognition accuracy for challenging handwriting styles.

The dataset contains pages with natural variations in writing flow, including commas, periods, semicolons, quotation marks, apostrophes, and irregular spacing. Multiple corrected pages were used to help the model learn consistent character patterns, word structures, and symbols across different contexts. Training focused on improving recognition of upright handwriting while preserving the original textual structure.

This model is especially useful for:

* Handwritten English manuscripts
* Historical or literary documents
* Upright cursive writing
* OCR transcription workflows
* Digital archiving projects
* Research and text digitization

The model performs best on clear, upright page images with readable handwriting and consistent scan quality. It may still produce errors on heavily damaged pages, extremely decorative handwriting, low-resolution scans, or unusual symbols not sufficiently represented in the training data.

The goal of this project is a specialized OCR model optimized for upright handwritten English text, with accuracy improved over time through additional corrected training pages and iterative retraining.

Very low error rate: 0.08% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 0.08% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material. This is a smaller, specialised model. It may achieve a very low CER on material similar to its training data, but could be less robust on unfamiliar handwriting or layouts.
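To make the metric concrete, here is a minimal sketch of how a CER is commonly computed: the character-level edit distance (insertions, deletions, substitutions) between a reference transcription and the model output, divided by the reference length. This is the usual textbook definition and is an assumption here; the page does not state how Transkribus normalises whitespace, case, or punctuation before scoring, and the function names `levenshtein` and `cer` are illustrative only.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character edits turning ref into hyp."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j characters of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate in percent, relative to the reference length."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return 100.0 * levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character out of four reference characters.
    print(cer("abcd", "abed"))
```

Under this definition, a CER of 0.08% corresponds to roughly one character error per 1,250 reference characters.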

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.

Words: 26,047
Lines: 1,487
Training Pages: 33
Model ID: 571569
Languages: English
Centuries: 1st c., 2nd c., 3rd c., 4th c., 5th c., 6th c., 7th c., 8th c., 9th c., 10th c., 11th c., 12th c., 13th c., 14th c., 15th c., 16th c., 17th c., 18th c., 19th c., 20th c., 21st c.