
Medieval Manuscript Transcription Software: AI-Powered HTR for Historical Scripts

Gothic textura, Caroline minuscule, Beneventan, and more — AI-powered HTR turns months of manual transcription into hours, with TEI-XML export for critical editions.

Medieval Scripts · Gothic Textura · Custom Models · Free to Try

Trusted by 500,000+ users worldwide — 200M+ pages processed

500K+ users worldwide
200M+ pages processed
300+ public AI models
500+ universities and research institutions

The challenge

Why Medieval Handwriting Recognition Demands Specialized Tools

Medieval manuscripts present challenges that general-purpose OCR systems are not built to handle. The scripts themselves are the first barrier: a 12th-century Caroline minuscule codex shares almost no visual characteristics with a 15th-century bastarda charter. But the difficulties go far beyond letterforms. Medieval scribes used extensive abbreviation systems (suspension marks, contraction strokes, Tironian notes, and specialized symbols for common Latin words) that compress text by 30–40%. Ligatures merge characters in ways that vary by scriptorium and scribe. Damaged parchment, faded iron gall ink, palimpsests, and marginal glosses add further complexity. Standard OCR, trained on printed text, rarely produces usable output on these materials.
Abbreviation systems: suspension, contraction, superscript letters, tironian notes — standard OCR has no framework to interpret these
Script diversity: Gothic textura, rotunda, cursiva, Caroline minuscule, Beneventan, Insular, bastarda — each requires distinct recognition models
Ligatures and letter fusion vary by scriptorium, period, and individual scribe
Physical damage: parchment holes, ink fading, palimpsests, water stains, and binding obscuring text near the gutter
Multi-layered text: marginal glosses, interlinear additions, corrections, and rubrication require sophisticated layout analysis
[Image: 16th-century document with ornate calligraphy and decorative initial letters]

The solution

How Transkribus Transcribes Medieval Documents with HTR

Transkribus uses Handwritten Text Recognition (HTR) — deep learning models trained on transcribed manuscript pages — rather than character-template matching. This approach is fundamentally suited to medieval scripts because it learns holistic word and line patterns, not isolated character shapes. The platform's public model repository includes models trained on specific medieval scripts by researchers who work with these materials daily. Where no existing model fits your collection, Transkribus allows you to train a custom HTR model on your own ground truth data, producing a recognition engine tuned to a specific scribe, scriptorium, or document type.
Public HTR models for Gothic textura, Caroline minuscule, and other major medieval scripts — ready to use immediately
Custom model training: provide 50–100 pages of ground truth and train a model for your specific manuscript hand
Layout analysis handles multi-column pages, marginal glosses, rubrication, and interlinear text
Abbreviation expansion can be incorporated into model training for fully resolved transcriptions
Export as TEI-XML with word-level coordinates and confidence scores for digital scholarly editions

From manuscript image to TEI-XML edition in 4 steps

Upload manuscript images

Import high-resolution scans or photographs of manuscript folios. Transkribus accepts TIFF, JPG, PNG, and PDF. Organize by codex, quire, or collection.

Select or train an HTR model

Choose from public models trained on medieval scripts, or train a custom model on your own ground truth. For a specific manuscript hand, 50–100 transcribed pages are typically enough to train a custom model.

Run layout analysis and recognition

Transkribus detects text regions, baselines, columns, and marginal zones automatically. HTR processes each detected line and returns the transcription with per-line confidence scores.

Review, correct, and export

Review the transcription in the built-in editor alongside the manuscript image. Correct errors, add TEI markup, then export as TEI-XML, PAGE XML, ALTO, or plain text for your edition or corpus.
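
As a rough illustration of the review step, the sketch below reads a PAGE XML export and lists the lines whose confidence falls below a threshold, so they can be checked first. It assumes the export records a line-level `conf` attribute on each `TextEquiv` element (the PAGE schema allows this attribute, but whether a given export populates it is an assumption here). The sample XML, the file name `fol_001r.jpg`, the function name, and the 0.8 threshold are all hypothetical, and the sample omits the coordinate elements a real export would carry.

```python
import xml.etree.ElementTree as ET

# PAGE XML namespace for the 2013-07-15 schema version. A real export may
# declare a different version; adjust this to match the file's root element.
NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"}

# Hand-written, simplified sample standing in for an exported page
# (hypothetical data; Coords and Baseline elements are omitted).
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="fol_001r.jpg" imageWidth="2000" imageHeight="3000">
    <TextRegion id="r1">
      <TextLine id="r1l1">
        <TextEquiv conf="0.96"><Unicode>In principio erat verbum</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="r1l2">
        <TextEquiv conf="0.61"><Unicode>et verbum erat apud deum</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def low_confidence_lines(page_xml: str, threshold: float = 0.8):
    """Return (line id, confidence, text) for every line below the threshold."""
    root = ET.fromstring(page_xml)
    flagged = []
    for line in root.iterfind(".//pc:TextLine", NS):
        equiv = line.find("pc:TextEquiv", NS)
        if equiv is None or equiv.get("conf") is None:
            continue  # no line-level confidence recorded for this line
        conf = float(equiv.get("conf"))
        text = equiv.findtext("pc:Unicode", default="", namespaces=NS)
        if conf < threshold:
            flagged.append((line.get("id"), conf, text))
    return flagged


for line_id, conf, text in low_confidence_lines(SAMPLE):
    print(f"{line_id}\t{conf:.2f}\t{text}")
```

A list like this can be sorted by confidence to direct correction effort to the least certain lines first.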

Models and scripts

Gothic Script Recognition and Beyond: Public Models for Medieval Paleography

The Transkribus public model repository includes HTR models contributed by medieval studies researchers and digital humanities projects worldwide. These models cover the major script families encountered in European manuscript traditions from the 6th to the 16th century. Because each model is trained on actual manuscript pages, not synthetic data, it reflects the real-world variation of scribal hands, regional conventions, and period-specific abbreviation practices.
Gothic textura (textualis formata and libraria): Latin liturgical and literary manuscripts, 12th–15th century
Caroline minuscule: Carolingian-era codices, 9th–12th century — the foundation of later European scripts
Beneventan script: Southern Italian and Dalmatian manuscripts, 8th–13th century
Insular scripts (insular majuscule and minuscule): Irish and Anglo-Saxon manuscripts, 6th–9th century
Bastarda and hybrida: Late medieval administrative and literary manuscripts, 14th–16th century
Custom model training for any script not covered by existing public models
[Image: Early 19th-century document in historical German Kurrent script]

Custom training

Train a Custom HTR Model for Your Manuscript Collection

No two medieval manuscript collections are alike. A 14th-century notarial register from Provence uses a different hand than a 14th-century psalter from Bohemia, even if both fall under 'Gothic cursiva.' Transkribus allows you to train a custom HTR model on your own transcribed ground truth, producing a recognition engine precisely calibrated to your documents. This is how research teams achieve the highest accuracy — by combining domain paleographic expertise with machine learning.
Start with 50–100 pages of manually transcribed ground truth from your manuscript
The training process typically takes a few hours and can be run from the Transkribus interface
Trained models can resolve scribal abbreviations if your ground truth includes expanded forms
Fine-tune an existing public model on your data for faster convergence and fewer training pages
Share your trained model with the research community or keep it private to your project
[Image: Historical Grenzbeschreibung document with formal historical handwriting]
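
Because a trained model reproduces whatever convention its ground truth follows, some projects keep a diplomatic transcription and derive an expanded version from it. The sketch below shows one simple way this could be done with a token lookup table. The table entries, the function name, and the sample line are hypothetical; a real project would build the table from its own editorial conventions and would need context-sensitive rules for ambiguous abbreviations.

```python
import unicodedata

# Hypothetical expansion table keyed by abbreviated token. The combining
# macron (U+0304) stands in for the scribe's abbreviation stroke.
EXPANSIONS = {
    "dn\u0304s": "dominus",
    "sc\u0304s": "sanctus",
    "\u204a": "et",  # Tironian et
}

# Normalize keys once so lookups do not depend on how the text was typed.
_TABLE = {unicodedata.normalize("NFC", k): v for k, v in EXPANSIONS.items()}


def expand_line(diplomatic: str) -> str:
    """Replace each known abbreviated token with its expanded form."""
    tokens = unicodedata.normalize("NFC", diplomatic).split()
    return " ".join(_TABLE.get(token, token) for token in tokens)


diplomatic = "dn\u0304s \u204a sc\u0304s petrus"
print(expand_line(diplomatic))
```

Tokens that are not in the table pass through unchanged, so the diplomatic reading is never silently lost.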

Frequently Asked Questions

Transkribus has public HTR models for the major medieval script families, including Gothic textura (textualis), Caroline minuscule, Beneventan, Insular (both majuscule and minuscule), bastarda, hybrida, and various regional cursive hands. The model catalog is continuously expanded by the research community. For scripts not yet covered, you can train a custom model on your own ground truth data.
Accuracy varies significantly depending on the script, the condition of the manuscript, and the model used. On well-preserved Gothic textura with a matched model, character error rates of 3–5% are achievable. More challenging materials — damaged parchment, heavily abbreviated text, unusual hands — may start at 10–15% error rate with a public model and improve substantially with custom model training. Every line includes a confidence score for targeted review.
Transcribe 50–100 representative pages from your manuscript using the Transkribus editor. This ground truth data serves as training input. Launch the training process from the interface — it typically runs for a few hours. The resulting model is specific to your manuscript's scribal hand, abbreviation system, and layout. You can iteratively improve the model by adding more ground truth.
Whether a model expands abbreviations depends on how your ground truth is prepared. If your training data expands abbreviations (e.g., transcribing the suspension mark over 'dn' as 'dominus'), the model learns to output expanded forms. If your ground truth preserves abbreviation marks as Unicode characters, the model reproduces them. Many researchers train two models, one for diplomatic transcription and one for expanded transcription, depending on their editorial methodology.
Transkribus supports TEI-XML export with word-level coordinates, confidence scores, and structural markup. This output can be integrated into digital edition frameworks such as EVT (Edition Visualization Technology) or used as input for collation tools like CollateX. PAGE XML and ALTO XML exports are also available for other downstream workflows.
The layout analysis engine detects text regions even on pages with holes, stains, or missing sections. For damaged areas, the HTR model produces output with lower confidence scores, clearly flagging uncertain readings. Researchers can mark lacunae in the editor and exclude damaged regions from processing. The system does not hallucinate text where none is legible.
Transkribus provides a recommended citation format in its documentation. Typically, you cite the platform (Transkribus, developed at the University of Innsbruck), the specific HTR model used (including its ID and version), and the processing date. This ensures reproducibility — another researcher can apply the same model to verify your transcriptions. The READ-COOP publication list includes key reference papers.
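
The character error rates quoted in the accuracy answer above are conventionally computed as the edit distance between the model output and a reference transcription, divided by the length of the reference. The sketch below is a minimal, generic implementation of that definition, not Transkribus's own evaluation code; the function names and the sample strings are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


reference = "in principio erat verbum"
hypothesis = "in principio crat uerbum"  # two misread characters
print(f"CER: {cer(reference, hypothesis):.1%}")
```

Two wrong characters in a 24-character line already give a CER of about 8%, which helps put the 3–5% and 10–15% figures in perspective.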

Built for research. Hosted in Europe. Governed by the community.

Transkribus is developed and operated by the READ-COOP, a European cooperative of 250+ research institutions, archives, and libraries.

Your data stays yours

Full ownership of all uploaded documents and generated transcriptions. Delete anytime.

Hosted in Austria, EU

All processing on our own servers. GDPR-compliant. No third-party cloud dependencies.

Cooperative, not a startup

Thousands of archives, libraries, and universities as co-owners. Built for decades, not a VC exit.

Related resources

More for researchers

Explore the broader Transkribus research toolkit: Transkribus for researchers · What is HTR? · Archival backlog reduction · Create searchable PDFs
[Image: Historical document with wax seals and old cursive handwriting]

Ready to accelerate your manuscript transcription?

Join 500+ universities already using Transkribus for handwritten text recognition. Start with free credits and explore public models for medieval scripts.

50 free credits every month — No credit card required