
Submit thousands of jobs. We handle the rest.

The Transkribus API manages your processing queue intelligently. Submit documents one at a time or thousands in parallel: jobs are distributed across GPU clusters and processed asynchronously, and results are delivered via long polling or standard polling. The same workflow scales from a prototype integration to millions of archival pages.

Batch processing a document collection
200M+ pages processed on the platform
15M+ pages in a single project
300+ AI models for any script

Traditional pipeline vs. Transkribus

Document processing at scale used to mean managing people and queues manually. Transkribus handles that infrastructure for you.

Traditional approach

Hire transcribers

Recruit, train, and manage a team of skilled readers

Process sequentially

Each page transcribed by hand, one at a time

Quality review

Second reader checks every page for errors

Format and export

Manual conversion to the required output format

Linear — scales with headcount
vs

Transkribus batch processing

Submit jobs

Upload via web app or submit thousands of jobs via API

Intelligent queue

Jobs are distributed across GPU clusters automatically

Get results

Use long polling to get results as soon as they are ready, or poll asynchronously for batch jobs

Export

Plain text, PAGE XML, ALTO, TEI — structured output

Parallel — scales with infrastructure

Intelligent queue management

How the processing pipeline works

The Transkribus API is async by design. Submit jobs at any rate — the queue distributes them across available GPU capacity. For real-time integrations, use long polling to get results as soon as they're ready. Not satisfied with accuracy? Train a custom model on your specific documents using the visual editor, then reprocess the entire batch.
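The asynchronous pattern described above can be sketched as a client-side polling loop. This is a minimal illustration, not the Transkribus client: the `fetch_status` callable, the `state` field, and the `FINISHED` / `FAILED` values are assumptions made for the example, so check the API reference for the actual status endpoint and state names.

```python
import time
from typing import Callable, Dict


def wait_for_job(
    fetch_status: Callable[[str], Dict],
    job_id: str,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll a job until it reaches a terminal state.

    fetch_status is a hypothetical function that performs one status
    request and returns a dict with a "state" key (assumed shape).
    """
    delay = initial_delay
    waited = 0.0
    while True:
        status = fetch_status(job_id)
        # "FINISHED" and "FAILED" are assumed terminal state names.
        if status.get("state") in ("FINISHED", "FAILED"):
            return status
        if waited >= timeout:
            raise TimeoutError(f"job {job_id} still running after {waited:.0f}s")
        sleep(delay)
        waited += delay
        # Exponential backoff keeps request volume low for long batch jobs.
        delay = min(delay * 2, max_delay)
```

Passing `sleep` as a parameter makes the loop easy to test without real waiting. For interactive use, a small `initial_delay` approximates the low latency of long polling, while large batches benefit from the capped backoff.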

Submit

POST images via API — URL, base64, or file upload

Queue

Intelligent job distribution across GPU clusters

Process

Layout analysis + text recognition in parallel

Result

Long polling or async polling — your choice

Export

Plain text, PAGE XML, ALTO, or JSON
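The submit step of the pipeline above might look like the following sketch, which base64-encodes images and hands them to a submit function in parallel. The payload shape (`modelId`, `image.base64`) and the `submit` callable are hypothetical placeholders for whatever request format and HTTP client you actually use; only the encoding and the thread pool are standard-library behavior.

```python
import base64
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def build_payload(image: bytes, model_id: int) -> Dict:
    """Build a request body for one image (assumed, illustrative shape)."""
    return {
        "modelId": model_id,
        "image": {"base64": base64.b64encode(image).decode("ascii")},
    }


def submit_batch(
    images: List[bytes],
    model_id: int,
    submit: Callable[[Dict], str],
    max_workers: int = 8,
) -> List[str]:
    """Submit all images concurrently and return job IDs in input order.

    submit is a hypothetical function that POSTs one payload and
    returns the job ID assigned by the server.
    """
    payloads = [build_payload(img, model_id) for img in images]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so job IDs line up with images.
        return list(pool.map(submit, payloads))
```

Because the server-side queue handles distribution, the client only needs enough concurrency to keep uploads flowing; the returned job IDs can then be polled for results.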

Case study

Zeitpunkt.NRW: 15 million newspaper pages in a single project

The state of North Rhine-Westphalia used Transkribus to process 15 million historical newspaper pages — the largest single digitization project on the platform. The collection spans over a century of regional newspapers, now fully searchable and accessible to the public at zeitpunkt.nrw.
15 million pages processed with AI text recognition
Historical Fraktur and blackletter print handled automatically
Publicly accessible and full-text searchable

Structured output, not just flat text

Every page comes back with layout regions, text lines, word coordinates, and confidence scores.

Plain text

Simple UTF-8 text output. Feed into search indexes, databases, or NLP pipelines.

PAGE XML

Full layout coordinates — regions, lines, words, baselines. The standard for HTR workflows.

ALTO XML

Library-standard format for digitized collections. Compatible with Europeana, DFG Viewer, and IIIF.

TEI XML

Text Encoding Initiative format for scholarly editions and digital humanities projects.

Table data

Structured table recognition — rows, columns, and cell content extracted automatically.

Full-text search

Processed documents are instantly searchable within Transkribus — names, dates, places, keywords.
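As an illustration of working with the structured output listed above, the following sketch reads text lines and their polygon coordinates from a PAGE XML document using only the Python standard library. The sample document is hand-written for the example and is not actual Transkribus output; the parser matches elements by local name so it does not depend on a particular PAGE schema version.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Hand-written sample for illustration only.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="page1.jpg" imageWidth="1000" imageHeight="1500">
    <TextRegion id="r1">
      <Coords points="10,10 900,10 900,200 10,200"/>
      <TextLine id="r1l1">
        <Coords points="10,10 900,10 900,60 10,60"/>
        <TextEquiv><Unicode>Erste Zeile</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def local(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tags."""
    return tag.rsplit("}", 1)[-1]


def child(el: ET.Element, name: str):
    """Return the first direct child with the given local name, or None."""
    return next((c for c in el if local(c.tag) == name), None)


def parse_points(points: str) -> List[Tuple[int, ...]]:
    """Turn 'x1,y1 x2,y2 ...' into a list of (x, y) integer tuples."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def extract_lines(page_xml: str) -> List[Dict]:
    """Collect id, text, and polygon for every TextLine in the page."""
    lines = []
    for el in ET.fromstring(page_xml).iter():
        if local(el.tag) != "TextLine":
            continue
        coords = child(el, "Coords")
        equiv = child(el, "TextEquiv")
        unicode_el = child(equiv, "Unicode") if equiv is not None else None
        lines.append({
            "id": el.get("id"),
            "text": (unicode_el.text or "") if unicode_el is not None else "",
            "coords": parse_points(coords.get("points", "")) if coords is not None else [],
        })
    return lines
```

Calling `extract_lines(SAMPLE)` yields one dictionary per text line, which can be written to a search index or joined into plain text.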

Ready to process your collection?

Start with a free account to test on a sample. For large-scale projects, talk to our team about volume pricing and project support.

200M+ pages processed
Volume pricing available
EU-hosted, GDPR-compliant