Submit thousands of jobs. We handle the rest.
The Transkribus API manages your processing queue intelligently. Submit documents one at a time or thousands in parallel: jobs are distributed across GPU clusters and processed asynchronously, and results are delivered via long polling or standard polling. It scales from a prototype integration to millions of archival pages.

Traditional pipeline vs. Transkribus
Document processing at scale used to mean managing people and queues manually. Transkribus handles that infrastructure for you.
Traditional approach
Hire transcribers
Recruit, train, and manage a team of skilled readers
Process sequentially
Each page transcribed by hand, one at a time
Quality review
Second reader checks every page for errors
Format and export
Manual conversion to the required output format
Transkribus batch processing
Submit jobs
Upload via web app or submit thousands of jobs via API
Intelligent queue
Jobs are distributed across GPU clusters automatically
Get results
Long polling for results as soon as they're ready, or asynchronous polling for batch jobs
Export
Plain text, PAGE XML, ALTO, TEI — structured output
Intelligent queue management
How the processing pipeline works
The Transkribus API is async by design. Submit jobs at any rate — the queue distributes them across available GPU capacity. For real-time integrations, use long polling to get results as soon as they're ready. Not satisfied with accuracy? Train a custom model on your specific documents using the visual editor, then reprocess the entire batch.
Submit
POST images via API — URL, base64, or file upload
Queue
Intelligent job distribution across GPU clusters
Process
Layout analysis + text recognition in parallel
Result
Long polling or async polling — your choice
Export
Plain text, PAGE XML, ALTO, or JSON
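As a rough illustration of the submit-then-poll flow described above, here is a minimal Python sketch. It does not call the real Transkribus endpoints: `submit` and `get_status` are hypothetical callables you would implement against the actual API reference, and the `state` field with its `FINISHED` and `FAILED` values is an assumed response shape, not a documented one.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def wait_for_job(
    get_status: Callable[[str], dict],
    job_id: str,
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call get_status(job_id) repeatedly until the job finishes, fails, or times out.

    get_status is a hypothetical wrapper around a status request.
    The "state" key and its values are assumptions for this sketch.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(job_id)
        state = status.get("state")
        if state == "FINISHED":
            return status
        if state == "FAILED":
            raise RuntimeError(f"job {job_id} failed: {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout}s")
        # Wait before asking again; with long polling the server itself
        # holds the request open, so a very small interval is enough.
        sleep(interval)


def process_batch(
    submit: Callable[[str], str],
    get_status: Callable[[str], dict],
    images: Iterable[str],
    max_workers: int = 8,
    **wait_kwargs,
) -> Dict[str, dict]:
    """Submit every image first, then wait for all jobs concurrently.

    submit is a hypothetical wrapper that sends one image and returns a job id.
    """
    job_ids = [submit(image) for image in images]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(
            lambda job_id: wait_for_job(get_status, job_id, **wait_kwargs),
            job_ids,
        )
        # Consume the results while the pool is still open.
        return dict(zip(job_ids, results))
```

Submitting everything before waiting lets the server-side queue spread the jobs across capacity, while the thread pool only limits how many status checks run at once on the client.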
Case study
Zeitpunkt.NRW: 15 million newspaper pages in a single project

Structured output, not just flat text
Every page comes back with layout regions, text lines, word coordinates, and confidence scores.
Plain text
Simple UTF-8 text output. Feed into search indexes, databases, or NLP pipelines.
PAGE XML
Full layout coordinates — regions, lines, words, baselines. The standard for HTR workflows.
ALTO XML
Library-standard format for digitized collections. Compatible with Europeana, DFG Viewer, and IIIF.
TEI XML
Text Encoding Initiative format for scholarly editions and digital humanities projects.
Table data
Structured table recognition — rows, columns, and cell content extracted automatically.
Full-text search
Processed documents are instantly searchable within Transkribus — names, dates, places, keywords.
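To show what working with the structured output listed above can look like, here is a small Python sketch that reads text lines, baselines, and confidence values from a PAGE XML document using only the standard library. The embedded sample is hand-written for illustration, not real Transkribus output, and the namespace URI is an assumption: it depends on the PAGE schema version of your export, so check the root element of your own files.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Assumed PAGE namespace; adjust to match the schema version in your export.
NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"}

# Hand-written sample for illustration only.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="page_0001.jpg" imageWidth="2000" imageHeight="3000">
    <TextRegion id="r1">
      <Coords points="100,100 1900,100 1900,400 100,400"/>
      <TextLine id="r1l1">
        <Coords points="110,120 1890,120 1890,200 110,200"/>
        <Baseline points="110,190 1890,190"/>
        <TextEquiv conf="0.97"><Unicode>Anno Domini 1784</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="r1l2">
        <Coords points="110,220 1890,220 1890,300 110,300"/>
        <Baseline points="110,290 1890,290"/>
        <TextEquiv><Unicode>den 3. Martii</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def parse_points(points: str) -> List[Tuple[int, ...]]:
    """Turn a PAGE points string such as "1,2 3,4" into [(1, 2), (3, 4)]."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def extract_lines(xml_text: str) -> List[dict]:
    """Collect id, region, text, baseline, and confidence for every text line."""
    root = ET.fromstring(xml_text)
    lines = []
    for region in root.iterfind(".//pc:TextRegion", NS):
        for line in region.iterfind("pc:TextLine", NS):
            text_equiv = line.find("pc:TextEquiv", NS)
            unicode_el = line.find("pc:TextEquiv/pc:Unicode", NS)
            baseline = line.find("pc:Baseline", NS)
            conf = text_equiv.get("conf") if text_equiv is not None else None
            lines.append({
                "id": line.get("id"),
                "region": region.get("id"),
                "text": (unicode_el.text or "") if unicode_el is not None else "",
                "baseline": parse_points(baseline.get("points")) if baseline is not None else [],
                # The conf attribute is optional, so it may be missing.
                "confidence": float(conf) if conf is not None else None,
            })
    return lines


if __name__ == "__main__":
    for entry in extract_lines(SAMPLE):
        print(entry["id"], entry["confidence"], entry["text"])
```

Keeping the line id and region id next to the text makes it straightforward to link a search hit or a corrected transcription back to its position on the page image.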
Ready to process your collection?
Start with a free account to test on a sample. For large-scale projects, talk to our team about volume pricing and project support.