
Technical Reference

Architecture, system requirements, and processing pipeline for Transkribus On-Prem.

Processing pipeline

What happens to your documents, step by step.

Upload

Documents enter the system as image files — TIFF, JPEG, PNG, or multi-page PDF. They can be uploaded through the web dashboard, placed in a watched directory, or submitted via the REST API (Enterprise edition).
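As a sketch of how an API submission might look, the following builds a multipart upload request using only the Python standard library. The endpoint path (`/api/v1/documents`), the `file` form field, and the bearer-token auth are illustrative assumptions — consult your installation's API reference for the actual routes.

```python
# Sketch: submitting one page image to a hypothetical On-Prem REST endpoint.
# URL path, form field name, and auth scheme are assumptions, not the real API.
import mimetypes
import os
import urllib.request
import uuid

def build_upload_request(host: str, image_path: str, token: str) -> urllib.request.Request:
    """Build a multipart/form-data POST for a single image file."""
    boundary = uuid.uuid4().hex
    ctype = mimetypes.guess_type(image_path)[0] or "application/octet-stream"
    with open(image_path, "rb") as f:
        payload = f.read()
    filename = os.path.basename(image_path)
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {ctype}\r\n\r\n"
    ).encode() + payload + f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        url=f"https://{host}/api/v1/documents",   # hypothetical route
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Sending the request (`urllib.request.urlopen(req)`) would then return the job reference used to poll for results.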

Recognition engines

Two engine tiers are included. Both run identically on-prem and on the cloud platform.

Standard HTR

Encoder-decoder neural network for handwritten and printed text. Optimised for throughput — suitable for large-scale batch processing.

Scripts: Latin, German (Kurrent, Fraktur), and major European scripts
Accuracy: CER typically 2–5% on clean documents, 5–10% on challenging material
Throughput: ~2 seconds/page per GPU (warm, ~20 lines/page)
VRAM: ~4 GB per concurrent model

Super Models

Larger model architecture with broader script coverage and higher accuracy on difficult material. Use when accuracy matters more than speed.

Scripts: 70+ scripts including Latin, Greek, Cyrillic, Hebrew, Arabic, and East Asian
Accuracy: CER typically 1–3% on common scripts, 3–7% on rare material
Throughput: ~4 seconds/page per GPU (warm, ~20 lines/page)
VRAM: ~8 GB per concurrent model

Use Standard HTR when processing large volumes of documents in well-supported scripts. Use Super Models when working with rare scripts, mixed-language documents, or when accuracy is the primary concern. Both can be available simultaneously — the user selects per job.
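The per-job choice described above can be sketched as a small decision rule. The tier identifiers and the set of well-supported scripts below are illustrative assumptions; in practice the selection is made per job in the dashboard or API call.

```python
# Sketch of the per-job engine choice. Tier names and the script set are
# illustrative, not the product's actual identifiers.
WELL_SUPPORTED = {"latin", "german-kurrent", "german-fraktur"}

def pick_engine(script: str, accuracy_critical: bool) -> str:
    """Prefer Standard HTR for throughput on well-supported scripts;
    use Super Models for rare scripts or accuracy-first jobs."""
    if accuracy_critical or script.lower() not in WELL_SUPPORTED:
        return "super-models"
    return "standard-htr"
```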

Model training

Train custom recognition models on your own documents. All training runs locally — no data leaves your infrastructure.

Ground Truth

Transcribe a sample of your documents — typically 50–100 pages for fine-tuning an existing base model. The web dashboard includes ground truth editing tools.

Fine-tuning typically takes hours, not days. A base model trained on similar material can be adapted to a specific hand or document collection with surprisingly little ground truth.

Output formats

Format         | Content                                                | Use case
PageXML        | Baselines, polygons, text, confidence scores, metadata | Round-trip with Transkribus, scholarly editing, preservation
ALTO XML       | Library-standard OCR structure                         | METS containers, institutional repositories, Europeana
Searchable PDF | Invisible text layer over original scan                | End-user access, full-text search, citation
Plain Text     | UTF-8 text, one file per page                          | Full-text indexing, NLP pipelines, corpus building
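For orientation, a heavily abbreviated PageXML fragment looks roughly like this — one region containing one line with its coordinates, baseline, and transcription. Real exports carry more attributes and metadata; the element names follow the PAGE content schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="page_0001.tif" imageWidth="2400" imageHeight="3200">
    <TextRegion id="r1">
      <Coords points="100,100 2300,100 2300,400 100,400"/>
      <TextLine id="r1_l1">
        <Coords points="120,140 2280,140 2280,210 120,210"/>
        <Baseline points="120,200 2280,200"/>
        <TextEquiv conf="0.97"><Unicode>First transcribed line</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>
```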

Architecture — Workstation

Single-server deployment with Docker Compose. All services run on one machine. Includes a live visualisation dashboard with confidence heatmaps and a streaming API for integration.
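A minimal docker-compose sketch of that single-server layout might look like the following. Service names, image names, and volume layout are illustrative assumptions — the compose file shipped with the product is authoritative.

```yaml
# Illustrative sketch only; image and service names are hypothetical.
services:
  web:
    image: transkribus/web            # hypothetical image name
    ports: ["443:443"]
    depends_on: [db, recognition]
  recognition:
    image: transkribus/recognition    # hypothetical image name
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
  db:
    image: postgres:16
    volumes: ["pgdata:/var/lib/postgresql/data"]
volumes:
  pgdata:
```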

Access:     Browser → Web Dashboard
Services:   Web Server (nginx, port 443)
Processing: Recognition (GPU-accelerated), Training (optional)
Data:       Database (PostgreSQL), Storage (local / NAS)

System requirements — Workstation

Component     | Minimum                             | Recommended
OS            | Ubuntu 22.04+ / Windows Server 2022 | Ubuntu 22.04 LTS
CPU           | 8 cores                             | 16+ cores
RAM           | 32 GB                               | 64 GB
GPU           | NVIDIA, 12 GB VRAM (RTX 3060+)      | RTX 4090 / A6000 (24 GB VRAM)
Storage       | 500 GB SSD                          | 1 TB+ NVMe
NVIDIA Driver | 565.57+                             | Latest stable
CUDA          | 12.4+                               | 12.4+
Docker        | 24.0+                               | Latest stable

Architecture — Enterprise

Kubernetes or OpenShift cluster with GPU worker nodes. Server/client GPU architecture with MIG partitioning, Redis pub/sub event coordination, S3 storage integration, and Prometheus monitoring. Horizontal scaling, rolling updates, GitOps deployment.

Access:     Ingress → API Gateway / load balancer
Services:   REST API (recognition service), Dashboard (web UI)
Processing: GPU workers 1…N (A100 / H100, scale out), training jobs (K8s Jobs)
Data:       S3 storage (MinIO / Ceph), ArgoCD (GitOps, optional)

System requirements — Enterprise

Component          | Requirement
Orchestration      | Kubernetes 1.27+ or OpenShift 4.x
GPU Operator       | NVIDIA GPU Operator with MIG support
Storage            | S3-compatible object storage (MinIO, Ceph, AWS S3)
GPU per worker     | NVIDIA A100 or H100 recommended (MIG partitioning supported)
Event coordination | Redis (pub/sub for job coordination)
Monitoring         | Prometheus + Grafana (metrics exported natively)
Deployment         | Helm chart provided; ArgoCD recommended
NVIDIA Driver      | 565.57+ / CUDA 12.4+
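The Redis pub/sub coordination listed above could be sketched as follows. The channel name and message schema are assumptions made for illustration; `redis` in the commented portion refers to the third-party redis-py client.

```python
# Sketch of pub/sub job coordination over Redis. Channel name and event
# schema are illustrative assumptions, not the product's actual wire format.
import json

JOB_CHANNEL = "transkribus:jobs"   # hypothetical channel name

def encode_job_event(job_id: str, status: str, page: int) -> str:
    """Serialise a job status event for publishing."""
    return json.dumps({"job_id": job_id, "status": status, "page": page})

def decode_job_event(raw: str) -> dict:
    """Parse an event received from the channel."""
    return json.loads(raw)

# Publishing side (inside a recognition worker), sketched:
#   import redis
#   r = redis.Redis(host="redis", port=6379)
#   r.publish(JOB_CHANNEL, encode_job_event("job-42", "page_done", 17))
#
# Subscribing side (e.g. the dashboard), sketched:
#   p = r.pubsub()
#   p.subscribe(JOB_CHANNEL)
#   for msg in p.listen():
#       if msg["type"] == "message":
#           event = decode_job_event(msg["data"])
```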

Performance

Throughput benchmarks at ~20 lines per page. Actual results depend on document complexity, page size, and lines per page.

Workstation (single GPU, RTX 3090)

Metric          | Standard HTR | Super Models
Per page (warm) | ~3 s         | ~5 s
Daily (8 h)     | ~9,600 pages | ~5,700 pages

Enterprise (per A100)

Metric              | Standard HTR       | Super Models
Per page (warm)     | ~2 s               | ~4 s
Daily per GPU (8 h) | ~14,000 pages      | ~7,000 pages
8× A100 cluster     | ~100,000 pages/day | ~56,000 pages/day

Cold start adds 5–10 seconds for model loading. Subsequent pages in the same batch use the warm throughput above.
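The daily figures above follow directly from the warm per-page times, as this back-of-envelope check shows. Note the quoted 8× A100 cluster figure sits somewhat below a naive 8× extrapolation of the single-GPU number, consistent with some scheduling overhead at scale.

```python
# Back-of-envelope check of the warm-throughput figures in the tables above.
def pages_per_day(seconds_per_page: float, hours: float = 8.0) -> int:
    """Pages processed in an N-hour shift at a steady warm rate."""
    return int(hours * 3600 / seconds_per_page)

# RTX 3090 workstation: ~3 s and ~5 s per page
assert pages_per_day(3) == 9600    # matches "~9,600 pages"
assert pages_per_day(5) == 5760    # quoted conservatively as "~5,700"

# A100: ~2 s per page
assert pages_per_day(2) == 14400   # quoted as "~14,000"
```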

Questions about deployment?

For hardware sizing, trial installations, and integration support — contact us.