Your digitisation project, managed from start to finish

Whether you need proven text recognition at scale or a completely new approach for material no standard method can handle, our team of domain experts, AI specialists, and archival scientists runs the entire project, from understanding your corpus to delivering structured, searchable results integrated with your systems.

Your documents: scans, images, manuscripts
Analysis & proof of concept: model selection, CER evaluation
Processing & training: recognition, custom models, QA
Structured delivery: XML, CSV, Sites, system integration

20M+ pages in a single project
2,000+ institutions trust Transkribus
95%+ accuracy on trained models

From standard processing to solving problems nobody else can

Every collection is different. We match the approach to the challenge — from routine batch processing to developing entirely new AI frameworks.

Batch processing with proven models

For well-scanned material with standard scripts: we select the right models from 100+ publicly available text and layout recognition models, configure the workflow, run batch processing, perform quality checks, and deliver.

  • Printed books and government records
  • Standard handwriting (Latin, Kurrent, Fraktur)
  • Large volumes with consistent quality

Custom model training for your material

When standard models do not reach the accuracy you need (unusual handwriting, degraded scans, rare scripts), we train AI models specifically on your material. We run multiple training rounds until the model reaches the target accuracy.

  • Rare or personal handwriting styles
  • Degraded scans or microfilm digitisation
  • Non-Latin writing systems
See the Bautzen project — custom Kurrent model for 200 years of council minutes

Schema definition, data extraction & system integration

Beyond plain text: we define extraction schemas for your document types (tables, fields, structured records) and deliver data in the format your systems need. Results can also be published as a searchable Transkribus Site with custom branding.

  • Table and field extraction from registers
  • CSV, Excel, or database-ready output
  • Integration with ArchivesSpace, AtoM, scopeArchiv
  • Transkribus Sites with full-text search
See the St. Gallen project — 200,000 pages published as a searchable Site

New frameworks when standard approaches fail

Some collections cannot be solved with existing tools. We develop novel AI approaches: end-to-end Smart Extract models that understand document structure contextually, Named Entity Recognition for automatic tagging, and custom frameworks for problems no off-the-shelf method can handle.

  • Smart Extract: contextual document understanding
  • Named Entity Recognition and geo-enrichment
  • Novel frameworks for non-standard documents
See the MfN Berlin project — first real-world Smart Extract deployment

How a managed project works

A proven process refined across dozens of institutional engagements. You stay in control of scope and quality — we handle the technical execution.

Understanding your material

We analyse your collection: document types, scripts, layouts, condition, volume. What data do you need extracted? What systems does it need to integrate with? What does success look like for your institution?

Proof of concept

You send us a representative sample. We run the full pipeline — including custom model training if needed — and return results with Character Error Rate measurements and a realistic cost estimate.
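For readers unfamiliar with the metric, Character Error Rate is commonly defined as the edit distance between the recognised text and a manually verified reference transcription, divided by the length of the reference. The sketch below is an illustration under that common definition, not a description of the evaluation tooling used in a proof of concept; the function names `levenshtein` and `character_error_rate` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference.

    Assumption: the reference is a manually verified transcription
    and the hypothesis is the model output for the same lines.
    """
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Illustrative strings only, not taken from any real project.
    reference = "Rathsprotokoll"
    hypothesis = "Rathsprotokol"
    print(f"CER: {character_error_rate(reference, hypothesis):.2%}")
```

Under this definition, a character accuracy figure can be read as roughly 1 minus the CER, so a CER of 5% corresponds to about 95% character accuracy. Note that CER can exceed 100% when the model output is much longer than the reference.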

Project planning & kickoff

We define scope, timeline, milestones, deliverables, and pricing. A dedicated project manager with a background in digital humanities or archival science becomes your single point of contact.

Processing, training & quality assurance

Your PM coordinates the technical pipeline: recognition, model refinement, data extraction, quality checks. Bi-weekly sync meetings keep you informed.

Milestone delivery & review

Results are delivered progressively at agreed milestones, each with quality metrics and sample review. You review and approve before we continue.

Final handover & integration

The complete dataset in your required format — PAGE XML, ALTO, TEI, CSV, searchable PDF — or published as a Transkribus Site. All custom-trained models are yours to keep.

What we have delivered

From 55,000 handwritten pages to 20 million newspaper scans — every project is different.

Specimen labels from the Museum für Naturkunde Berlin

Museum für Naturkunde Berlin

Germany
250K specimen labels transcribed

250,000 specimen labels with handwritten metadata spanning two centuries. Standard OCR failed entirely — faded ink, damaged paper, mixed scripts, and non-standard layouts.

Developed a Smart Extract model — a single-pass AI that understands label structure contextually. Added Named Entity Recognition with GeoNames enrichment to automatically tag species and resolve place names.

First real-world Smart Extract deployment. Complete machine-readable dataset of 250,000 transcribed and tagged labels — a replicable model for natural history collections worldwide.

Historical newspaper pages from the Zeitpunkt.NRW project

Zeitpunkt.NRW

North Rhine-Westphalia, Germany
20M newspaper pages fully searchable

The complete historical newspaper holdings of North Rhine-Westphalia — 20 million pages spanning centuries. Complex multi-column layouts, Fraktur print, advertisements, and mixed content types.

Full-text recognition at unprecedented scale. AI layout segmentation for complex newspaper pages, batch processing with quality assurance, and publication through a state-level digital newspaper portal.

One of the largest single text recognition projects ever completed. Citizens and researchers can now search across centuries of regional history through the publicly accessible Zeitpunkt.NRW portal.

Visit zeitpunkt.nrw

Notarial records from the Noord-Hollands Archief

Noord-Hollands Archief

Haarlem, Netherlands
2M scans of notarial archives searchable

Centuries of notarial archives — testaments, property transfers, inventories, witness statements — spanning 1570 to 1925. Nearly 2 million scans of handwritten documents across Haarlem, Kennemerland, and Amstel- en Meerlanden, inaccessible to anyone who cannot read historical scripts.

Applied HTR to the complete notarial archives. Published as a searchable Transkribus Site with fuzzy search for person names and locations. Achieved 93–98.6% character accuracy across collections. Part of the pioneering HTR project "De ijsberg zichtbaar maken" (2019–2021).

Nearly 2 million scans of notarial acts now fully text-searchable online. Researchers, genealogists, and citizens can search for names, locations, and subjects across 350 years of North Holland's notarial history.

Council meeting minutes from the St. Gallen archive

State Archives of St. Gallen

Switzerland
200K pages now publicly searchable

417 books, 200,000 pages of council meeting minutes, handwritten and typewritten, many digitised from older microfilm scans. Previously accessible only through in-person visits.

Custom model training on the council minutes. Combined automated transcription with manual correction. Published as a searchable Transkribus Site with side-by-side document and transcription views.

Council minutes from 1803 onward publicly accessible online — searchable around the clock. No expertise in historical handwriting required.

Historical Kurrent handwriting from the Bautzen archive

Archivverbund Bautzen

Germany
55K pages of city council history

257 volumes of city council minutes spanning 1623–1832 — 55,000 pages of Kurrent script. Digitised but inaccessible because the handwriting was too difficult for untrained researchers.

Applied the Early Kurrent model, then trained a custom model to improve accuracy. Published as a Transkribus Site with permalinks integrating into Archivportal-D and Findbuch.

200 years of Bautzen city history fully searchable. Seamless discovery through existing archival portals.

Trusted by leading institutions worldwide

Your data stays yours

Full ownership and control. Data Processing Agreements (DPAs) and custom agreements available.

Hosted in Austria, EU

All processing on our own servers. GDPR-compliant. No third-party cloud.

A cooperative, not a startup

250+ archives, libraries, and universities as co-owners. Built for decades, not for exit.

No vendor lock-in

All output in standard formats. Trained models are yours. Data always exportable.

Tell us about your project

Describe your collection and goals — we will get back to you within one business day with a tailored approach, from proof of concept to final delivery.

Frequently asked questions

Everything you need to know about managed digitisation projects.