Technical Reference
Architecture, system requirements, and processing pipeline for Transkribus On-Prem.
Processing pipeline
What happens to your documents, step by step.
Upload
Documents enter the system as image files (TIFF, JPEG, PNG) or multi-page PDFs. They can be uploaded through the web dashboard, placed in a watched directory, or submitted via the REST API (Enterprise edition).
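A submission via the REST API might look like the sketch below. The endpoint path, header name, and `job_id` response field are assumptions for illustration, not the documented API; the `session` parameter accepts any object with a requests-style `post()` method, so the flow can be exercised without a live server.

```python
# Hypothetical sketch of submitting one scan via the REST API
# (Enterprise edition). Route and response fields are placeholders --
# consult your instance's API reference for the real ones.
from pathlib import Path

def upload_document(base_url: str, api_key: str, image_path: str, session) -> str:
    """POST one image file; return the job ID reported by the server."""
    path = Path(image_path)
    with path.open("rb") as fh:
        resp = session.post(
            f"{base_url}/api/v1/documents",          # hypothetical route
            headers={"Authorization": f"Bearer {api_key}"},
            files={"file": (path.name, fh, "image/tiff")},
        )
    resp.raise_for_status()
    return resp.json()["job_id"]                     # hypothetical field
```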
Recognition engines
Two engine tiers are included. Both run identically on-prem and on the cloud platform.
Standard HTR
Encoder-decoder neural network for handwritten and printed text. Optimised for throughput — suitable for large-scale batch processing.
| Spec | Detail |
|---|---|
| Scripts | Latin script, including German Kurrent and Fraktur, and other major European scripts |
| Accuracy | CER typically 2–5% on clean documents, 5–10% on challenging material |
| Throughput | ~2 seconds/page per GPU (warm, ~20 lines/page) |
| VRAM | ~4 GB per concurrent model |
Super Models
Larger model architecture with broader script coverage and higher accuracy on difficult material. Use when accuracy matters more than speed.
| Spec | Detail |
|---|---|
| Scripts | 70+ scripts including Latin, Greek, Cyrillic, Hebrew, Arabic, and East Asian |
| Accuracy | CER typically 1–3% on common scripts, 3–7% on rare material |
| Throughput | ~4 seconds/page per GPU (warm, ~20 lines/page) |
| VRAM | ~8 GB per concurrent model |
Use Standard HTR when processing large volumes of documents in well-supported scripts. Use Super Models when working with rare scripts, mixed-language documents, or when accuracy is the primary concern. Both can be available simultaneously — the user selects per job.
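Because both tiers can be loaded side by side, GPU memory is usually the binding constraint. A back-of-the-envelope budget using the per-model figures quoted above (~4 GB Standard HTR, ~8 GB Super Models) can be sketched as follows; the 2 GB safety reserve is an assumption, and real usage varies with batch size.

```python
# VRAM budgeting sketch: how many concurrent models of one tier fit on
# a single GPU, keeping a safety reserve for the runtime itself.
VRAM_PER_MODEL_GB = {"standard": 4, "super": 8}

def max_concurrent_models(gpu_vram_gb: float, tier: str,
                          reserve_gb: float = 2.0) -> int:
    """Number of models of one tier that fit, after the reserve."""
    usable = gpu_vram_gb - reserve_gb
    return max(0, int(usable // VRAM_PER_MODEL_GB[tier]))

print(max_concurrent_models(24, "standard"))  # 24 GB card -> 5
print(max_concurrent_models(12, "super"))     # minimum-spec GPU -> 1
```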
Model training
Train custom recognition models on your own documents. All training runs locally — no data leaves your infrastructure.
Ground Truth
Transcribe a sample of your documents — typically 50–100 pages for fine-tuning an existing base model. The web dashboard includes ground truth editing tools.
Fine-tuning typically takes hours, not days. A base model trained on similar material can be adapted to a specific hand or document collection with surprisingly little ground truth.
Output formats
| Format | Content | Use case |
|---|---|---|
| PageXML | Baselines, polygons, text, confidence scores, metadata | Round-trip with Transkribus, scholarly editing, preservation |
| ALTO XML | Library-standard OCR structure | METS containers, institutional repositories, Europeana |
| Searchable PDF | Invisible text layer over original scan | End-user access, full-text search, citation |
| Plain Text | UTF-8 text, one file per page | Full-text indexing, NLP pipelines, corpus building |
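For downstream processing, PageXML is the richest of these formats: transcriptions sit in `TextLine` elements nested inside `TextRegion` elements, with the text itself in `TextEquiv/Unicode`. A minimal extraction with the standard library might look like this; the namespace URI shown is the 2013-07-15 PAGE schema, and exports conforming to another schema version carry a different date in the URI.

```python
# Pull the plain text out of a PageXML document, one string per TextLine.
import xml.etree.ElementTree as ET

NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"}

def page_text(xml_source: str) -> str:
    """Return all TextLine transcriptions, joined with newlines."""
    root = ET.fromstring(xml_source)
    lines = []
    for line in root.iterfind(".//pc:TextLine", NS):
        unicode_el = line.find("pc:TextEquiv/pc:Unicode", NS)
        if unicode_el is not None and unicode_el.text:
            lines.append(unicode_el.text)
    return "\n".join(lines)
```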
Architecture — Workstation
Single-server deployment with Docker Compose. All services run on one machine. Includes a live visualisation dashboard with confidence heatmaps and a streaming API for integration.
System requirements — Workstation
| Component | Minimum | Recommended |
|---|---|---|
| OS | Ubuntu 22.04+ / Windows Server 2022 | Ubuntu 22.04 LTS |
| CPU | 8 cores | 16+ cores |
| RAM | 32 GB | 64 GB |
| GPU | NVIDIA, 12 GB VRAM (RTX 3060+) | RTX 4090 / A6000 (24 GB VRAM) |
| Storage | 500 GB SSD | 1 TB+ NVMe |
| NVIDIA Driver | 565.57+ | Latest stable |
| CUDA | 12.4+ | 12.4+ |
| Docker | 24.0+ | Latest stable |
Architecture — Enterprise
Kubernetes or OpenShift cluster with GPU worker nodes. Server/client GPU architecture with MIG partitioning, Redis pub/sub event coordination, S3 storage integration, and Prometheus monitoring. Horizontal scaling, rolling updates, GitOps deployment.
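The pub/sub coordination pattern is simple to state: workers subscribe to a channel, the API layer publishes job events, and every subscriber sees each message. The sketch below illustrates the pattern only; the channel name and message shape are invented, and a tiny in-process broker stands in for Redis so the flow is visible without a running cluster.

```python
# Illustrative pub/sub job coordination, with an in-process stand-in
# for Redis PUBLISH/SUBSCRIBE.
import json
from collections import defaultdict
from typing import Callable

class MiniBroker:
    """In-process stand-in for a Redis pub/sub connection."""
    def __init__(self):
        self._subs = defaultdict(list)

    def subscribe(self, channel: str, handler: Callable[[str], None]) -> None:
        self._subs[channel].append(handler)

    def publish(self, channel: str, message: str) -> int:
        for handler in self._subs[channel]:
            handler(message)
        return len(self._subs[channel])  # Redis also reports receiver count

broker = MiniBroker()
received = []
broker.subscribe("jobs", lambda m: received.append(json.loads(m)))
broker.publish("jobs", json.dumps({"job_id": "j-1", "stage": "recognition"}))
print(received[0]["stage"])  # -> recognition
```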
System requirements — Enterprise
| Component | Requirement |
|---|---|
| Orchestration | Kubernetes 1.27+ or OpenShift 4.x |
| GPU Operator | NVIDIA GPU Operator with MIG support |
| Storage | S3-compatible object storage (MinIO, Ceph, AWS S3) |
| GPU per worker | NVIDIA A100 or H100 recommended (MIG partitioning supported) |
| Event coordination | Redis (pub/sub for job coordination) |
| Monitoring | Prometheus + Grafana (metrics exported natively) |
| Deployment | Helm chart provided, ArgoCD recommended |
| NVIDIA Driver | 565.57+ / CUDA 12.4+ |
Performance
Throughput benchmarks at ~20 lines per page. Actual results depend on document complexity, page size, and lines per page.
Workstation (single GPU, RTX 3090)
| Metric | Standard HTR | Super Models |
|---|---|---|
| Per page (warm) | ~3 s | ~5 s |
| Daily (8 h) | ~9,600 pages | ~5,700 pages |
Enterprise (per A100)
| Metric | Standard HTR | Super Models |
|---|---|---|
| Per page (warm) | ~2 s | ~4 s |
| Daily per GPU (8 h) | ~14,000 pages | ~7,000 pages |
| 8× A100 cluster | ~100,000 pages/day | ~56,000 pages/day |
Cold start adds 5–10 seconds for model loading. Subsequent pages in the same batch use the warm throughput above.
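The daily figures above follow directly from the warm per-page times. A small capacity-planning helper makes the arithmetic explicit; the per-page seconds are the document's own, while the rounding and the assumption of one cold start per batch are simplifications.

```python
# Pages processed in one shift, given the warm per-page time and a
# one-off cold-start cost (worst case ~10 s) per batch.
def pages_per_shift(seconds_per_page: float, hours: float = 8.0,
                    batches: int = 1, cold_start_s: float = 10.0) -> int:
    usable = hours * 3600 - batches * cold_start_s
    return int(usable // seconds_per_page)

print(pages_per_shift(3))  # workstation Standard HTR -> 9596 (~9,600/day)
print(pages_per_shift(2))  # A100 Standard HTR -> 14395 (~14,000/day)
```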
Questions about deployment?
For hardware sizing, trial installations, and integration support — contact us.