HTR for Medieval Manuscript Transcription

500K+ Users worldwide

200M+ Pages processed

300+ Public AI models

500+ Universities and research institutions

The challenge

Why Medieval Handwriting Recognition Demands Specialized Tools

Medieval manuscripts present challenges that general-purpose OCR systems are not built to handle. The scripts themselves are the first barrier: a 12th-century Caroline minuscule codex shares few visual characteristics with a 15th-century bastarda charter. But the difficulties go far beyond letterforms. Medieval scribes used extensive abbreviation systems – suspension marks, contraction strokes, Tironian notes, and specialized symbols for common Latin words – that can shorten the written text by 30–40%. Ligatures merge characters in ways that vary by scriptorium and scribe. Damaged parchment, faded iron gall ink, palimpsests, and marginal glosses add further complexity. Standard OCR, trained on printed text, rarely produces usable output on these materials.

Abbreviation systems: suspension, contraction, superscript letters, Tironian notes – standard OCR has no framework to interpret these

Script diversity: Gothic textura, rotunda, cursiva, Caroline minuscule, Beneventan, Insular, bastarda – each requires distinct recognition models

Ligatures and letter fusion vary by scriptorium, period, and individual scribe

Physical damage: parchment holes, ink fading, palimpsests, water stains, and binding obscuring text near the gutter

Multi-layered text: marginal glosses, interlinear additions, corrections, and rubrication require sophisticated layout analysis

16th-century document with ornate calligraphy and decorative initial letters

The solution

How Transkribus Transcribes Medieval Documents with HTR

Transkribus uses Handwritten Text Recognition (HTR) – deep learning models trained on transcribed manuscript pages – rather than character-template matching. This approach is fundamentally suited to medieval scripts because it learns holistic word and line patterns, not isolated character shapes. The platform's public model repository includes models trained on specific medieval scripts by researchers who work with these materials daily. Where no existing model fits your collection, Transkribus allows you to train a custom HTR model on your own ground truth data, producing a recognition engine tuned to a specific scribe, scriptorium, or document type.

Public HTR models for Gothic textura, Caroline minuscule, and other major medieval scripts – ready to use immediately

Custom model training: provide 50–100 pages of ground truth and train a model for your specific manuscript hand

Layout analysis handles multi-column pages, marginal glosses, rubrication, and interlinear text

Abbreviation expansion can be incorporated into model training for fully resolved transcriptions

Export as TEI-XML with word-level coordinates and confidence scores for digital scholarly editions

How handwriting recognition works

From manuscript image to TEI-XML edition in 4 steps

Upload manuscript images

Import high-resolution scans or photographs of manuscript folios. Transkribus accepts TIFF, JPG, PNG, and PDF. Organize by codex, quire, or collection.

Select or train an HTR model

Choose from public models trained on medieval scripts, or train a custom model on your own ground truth. For a specific manuscript hand, 50–100 transcribed pages are typically enough to train a custom model.

Run layout analysis and recognition

Transkribus detects text regions, baselines, columns, and marginal zones automatically. HTR processes each detected line and returns the transcription with per-line confidence scores.

Review, correct, and export

Review the transcription in the built-in editor alongside the manuscript image. Correct errors, add TEI markup, then export as TEI-XML, PAGE XML, ALTO, or plain text for your edition or corpus.
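For downstream processing, an exported PAGE XML file can be read with a few lines of Python. The following is a minimal sketch, not a description of the actual export: it assumes the 2013-07-15 PAGE namespace and assumes that each TextLine carries a TextEquiv element with a Unicode child and an optional conf attribute. The sample page and the function name read_lines are invented for illustration; check the structure of your own export before relying on it.

```python
import xml.etree.ElementTree as ET

# Assumed namespace; compare with the xmlns attribute of your own export.
NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"}

# Hypothetical two-line page written for this example, not real platform output.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="f001r.jpg" imageWidth="2000" imageHeight="3000">
    <TextRegion id="r1">
      <TextLine id="r1l1">
        <Baseline points="100,200 900,205"/>
        <TextEquiv conf="0.97"><Unicode>In principio erat verbum</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="r1l2">
        <Baseline points="100,260 900,266"/>
        <TextEquiv conf="0.58"><Unicode>et verbum erat apud deum</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def read_lines(page_xml):
    """Return (line id, text, confidence or None) for every TextLine."""
    root = ET.fromstring(page_xml)
    lines = []
    for line in root.iterfind(".//pc:TextLine", NS):
        equiv = line.find("pc:TextEquiv", NS)
        if equiv is None:
            continue  # line was detected but has no transcription yet
        text = equiv.findtext("pc:Unicode", default="", namespaces=NS)
        conf = equiv.get("conf")
        lines.append((line.get("id"), text, float(conf) if conf is not None else None))
    return lines


for line_id, text, conf in read_lines(SAMPLE):
    print(line_id, conf, text)
```

Keeping the line id alongside the text makes it possible to trace any reading back to its position on the page image.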

Models and scripts

Gothic Script Recognition and Beyond: Public Models for Medieval Paleography

The Transkribus public model repository includes HTR models contributed by medieval studies researchers and digital humanities projects worldwide. These models cover the major script families encountered in European manuscript traditions from the 6th to the 16th century. Because the models are trained on actual manuscript pages – not synthetic data – they reflect the real-world variation of scribal hands, regional conventions, and period-specific abbreviation practices.

Gothic textura (textualis formata and libraria): Latin liturgical and literary manuscripts, 12th–15th century

Caroline minuscule: Carolingian-era codices, 9th–12th century – the foundation of later European scripts

Beneventan script: Southern Italian and Dalmatian manuscripts, 8th–13th century

Insular scripts (insular majuscule and minuscule): Irish and Anglo-Saxon manuscripts, 6th–9th century

Bastarda and hybrida: Late medieval administrative and literary manuscripts, 14th–16th century

Custom model training for any script not covered by existing public models

Browse public models

Early 19th-century document in historical German Kurrent script

Custom training

Train a Custom HTR Model for Your Manuscript Collection

No two medieval manuscript collections are alike. A 14th-century notarial register from Provence uses a different hand than a 14th-century psalter from Bohemia, even if both fall under 'Gothic cursiva.' Transkribus allows you to train a custom HTR model on your own transcribed ground truth, producing a recognition engine precisely calibrated to your documents. This is how research teams achieve the highest accuracy – by combining domain paleographic expertise with machine learning.

Start with 50–100 pages of manually transcribed ground truth from your manuscript

The training process typically takes a few hours and can be run from the Transkribus interface

Trained models can resolve scribal abbreviations if your ground truth includes expanded forms

Fine-tune an existing public model on your data for faster convergence and fewer training pages

Share your trained model with the research community or keep it private to your project

Train a custom model for your manuscript

Historical Grenzbeschreibung document with formal historical handwriting

Frequently Asked Questions

Which medieval scripts does Transkribus support?

Transkribus has public HTR models for the major medieval script families, including Gothic textura (textualis), Caroline minuscule, Beneventan, Insular (both majuscule and minuscule), bastarda, hybrida, and various regional cursive hands. The model catalog is continuously expanded by the research community. For scripts not yet covered, you can train a custom model on your own ground truth data.

What accuracy can I expect on medieval manuscripts?

Accuracy varies significantly depending on the script, the condition of the manuscript, and the model used. On well-preserved Gothic textura with a matched model, character error rates (CER) of 3–5% are achievable. More challenging materials – damaged parchment, heavily abbreviated text, unusual hands – may start at a CER of 10–15% with a public model and improve substantially with custom model training. Every line includes a confidence score for targeted review.
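To make these figures concrete: character error rate is conventionally the edit distance between the model output and a reference transcription, divided by the length of the reference. The sketch below is a plain-Python illustration of that definition, not the platform's own evaluation code; the function names and the sample line are assumptions chosen for the example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate of hyp measured against the reference ref."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(ref, hyp) / len(ref)


reference = "In principio erat verbum"
hypothesis = "In principio crat uerbum"  # two misread letters
print(f"CER: {cer(reference, hypothesis):.1%}")
```

Here two substitutions over 24 reference characters give a CER of about 8.3%. Note that the result depends on transcription conventions: a model that outputs u where the reference normalizes to v is counted as wrong even if the reading is paleographically defensible.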

How do I train a model for a specific manuscript hand?

Transcribe 50–100 representative pages from your manuscript using the Transkribus editor. This ground truth data serves as training input. Launch the training process from the interface – it typically runs for a few hours. The resulting model is specific to your manuscript's scribal hand, abbreviation system, and layout. You can iteratively improve the model by adding more ground truth.
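When preparing ground truth, it is common practice in machine learning to hold back a small share of the transcribed pages for validation, so that accuracy is measured on pages the model has not seen. The following sketch shows one way to plan such a split outside the platform. The 10% fraction, the fixed seed, and the page identifiers are assumptions for illustration; the training interface may offer its own way to choose validation pages.

```python
import random


def split_ground_truth(page_ids, validation_fraction=0.1, seed=42):
    """Split page identifiers into (training, validation) lists, reproducibly."""
    if not 0 < validation_fraction < 1:
        raise ValueError("validation_fraction must be between 0 and 1")
    ids = sorted(page_ids)  # sort first so the result does not depend on input order
    if len(ids) < 2:
        raise ValueError("need at least two pages to split")
    random.Random(seed).shuffle(ids)
    n_val = max(1, round(len(ids) * validation_fraction))
    return ids[n_val:], ids[:n_val]


# Hypothetical collection of 80 transcribed folios.
pages = [f"folio_{n:03d}" for n in range(1, 81)]
train, validation = split_ground_truth(pages)
print(len(train), "training pages,", len(validation), "validation pages")
```

Choosing validation pages at random across the manuscript, rather than taking the last few folios, helps the validation score reflect changes of hand or layout within the codex.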

How does Transkribus handle abbreviations and ligatures?

This depends on how your ground truth is prepared. If your training data expands abbreviations (e.g., transcribing 'dns' written with a contraction stroke as 'dominus'), the model learns to output expanded forms. If your ground truth preserves abbreviation marks as Unicode characters, the model reproduces them. Many researchers train two models – one for diplomatic transcription and one for expanded transcription – to match their editorial methodology.
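The difference between the two kinds of ground truth can be illustrated with a small lookup table that turns a diplomatic line into an expanded one. This is a deliberately simplified sketch: the table entries and the function name expand are assumptions for the example, real abbreviations are often context-dependent (the same sign can stand for different endings), and an HTR model learns expansions from examples rather than applying a table like this.

```python
import unicodedata

# Hypothetical expansion table, keyed by the diplomatic (as-written) form.
_RAW_EXPANSIONS = {
    "dn\u0304s": "dominus",  # d, n with combining macron, s
    "\u204a": "et",           # Tironian sign et
    "\ua751": "per",          # p with stroke through descender
}
# Normalize keys so lookups do not depend on how the marks were typed.
EXPANSIONS = {unicodedata.normalize("NFD", k): v for k, v in _RAW_EXPANSIONS.items()}


def expand(diplomatic_line):
    """Replace whole abbreviated words with their expanded forms."""
    tokens = unicodedata.normalize("NFD", diplomatic_line).split(" ")
    return " ".join(EXPANSIONS.get(token, token) for token in tokens)


diplomatic = "dn\u0304s \u204a rex"
print(diplomatic, "->", expand(diplomatic))
```

A pair of lines like this, the first preserving the marks and the second resolving them, is what distinguishes ground truth for a diplomatic model from ground truth for an expanded one.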

Can I export to TEI-XML for critical editions?

Yes. Transkribus supports TEI-XML export with word-level coordinates, confidence scores, and structural markup. This output can be integrated into digital edition frameworks such as EVT (Edition Visualization Technology) or used as input for collation tools like CollateX. PAGE XML and ALTO XML exports are also available for other downstream workflows.
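For critical editions, TEI offers the standard choice, abbr, and expan elements to record an abbreviated form together with its expansion. The sketch below builds such markup with Python's standard library as a possible post-processing step. It is an assumption-laden illustration: the function name encode_line and the input format are invented here, and whether an export already contains comparable markup depends on how the transcription was tagged.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def encode_line(tokens):
    """Build a TEI <ab> from (written, expanded) pairs; expanded is None for plain words."""
    ab = ET.Element(f"{{{TEI_NS}}}ab")
    last = None  # most recent <choice>; following plain text goes into its tail
    for written, expanded in tokens:
        if expanded is None:
            if last is None:
                ab.text = (ab.text or "") + written + " "
            else:
                last.tail = (last.tail or "") + written + " "
        else:
            choice = ET.SubElement(ab, f"{{{TEI_NS}}}choice")
            ET.SubElement(choice, f"{{{TEI_NS}}}abbr").text = written
            ET.SubElement(choice, f"{{{TEI_NS}}}expan").text = expanded
            choice.tail = " "
            last = choice
    # Drop the trailing separator left after the final token.
    if last is None:
        ab.text = (ab.text or "").rstrip()
    else:
        last.tail = (last.tail or "").rstrip()
    return ab


line = encode_line([("dn\u0304s", "dominus"), ("dixit", None)])
print(ET.tostring(line, encoding="unicode"))
```

Encoding both forms in one file lets an edition framework switch between a diplomatic and a normalized view without maintaining two separate transcriptions.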

How does Transkribus handle damaged or fragmentary manuscripts?

The layout analysis engine detects text regions even on pages with holes, stains, or missing sections. For damaged areas, the HTR model produces output with lower confidence scores, clearly flagging uncertain readings. Researchers can mark lacunae in the editor and exclude damaged regions from processing. The system does not hallucinate text where none is legible.
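Confidence scores lend themselves to a simple review queue: lines below a chosen threshold are checked first, starting with the least certain. The following sketch assumes the scores have already been extracted into (id, text, confidence) records. The Line type, the 0.8 threshold, and the sample data are hypothetical and should be adapted to your own material.

```python
from typing import NamedTuple, Optional


class Line(NamedTuple):
    line_id: str
    text: str
    confidence: Optional[float]  # None when no score is available


def review_queue(lines, threshold=0.8):
    """Return lines needing review, least confident first; unscored lines lead."""
    flagged = [ln for ln in lines if ln.confidence is None or ln.confidence < threshold]
    return sorted(flagged, key=lambda ln: -1.0 if ln.confidence is None else ln.confidence)


# Hypothetical recognition results for one damaged folio.
results = [
    Line("l1", "In principio erat verbum", 0.97),
    Line("l2", "et verbum erat apud deum", 0.58),
    Line("l3", "et deus erat", 0.74),
    Line("l4", "", None),
]

for ln in review_queue(results):
    print(ln.line_id, ln.confidence, repr(ln.text))
```

Treating unscored lines as the most urgent is a cautious default: a missing score often means the line fell in a damaged or excluded area and deserves a human look.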

How should I cite Transkribus in publications?

Transkribus provides a recommended citation format in its documentation. Typically, you cite the platform (Transkribus, developed at the University of Innsbruck), the specific HTR model used (including its ID and version), and the processing date. This ensures reproducibility – another researcher can apply the same model to verify your transcriptions. The READ-COOP publication list includes key reference papers.

Built for research. Hosted in Europe. Governed by the community.

Transkribus is developed and operated by the READ-COOP, a European cooperative of 250+ research institutions, archives, and libraries.

Your data stays yours

Full ownership of all uploaded documents and generated transcriptions. Delete anytime.

Hosted in Austria, EU

All processing on our own servers. GDPR-compliant. No third-party cloud dependencies.

Cooperative, not a startup

More than 250 research institutions, archives, and libraries as co-owners. Built for decades, not a VC exit.

Related resources

More for researchers

Explore the broader Transkribus research toolkit: Transkribus for researchers · What is HTR? · Archival backlog reduction · Create searchable PDFs

Historical document with wax seals and old cursive handwriting

Ready to accelerate your manuscript transcription?

Join 500+ universities already using Transkribus for handwritten text recognition. Start with free credits and explore public models for medieval scripts. Or try the free handwriting reader to see AI transcription in action.

Try free · See plans

50 free credits every month – No credit card required