Handwriting recognition API for developers
Integrate AI-powered text recognition into your application. REST API with Python, JavaScript, and cURL support. Process handwritten and printed documents at scale.
Used by archives, libraries, and research institutions worldwide
import requests
TOKEN = "your-bearer-token"
API = "https://transkribus.eu/processing/v2/processes"
# Start a transcription job
resp = requests.post(
    API,
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "config": {"modelId": 38230},
        "image": {
            "imageUrl": "https://your-archive.org/scan.jpg"
        }
    }
)
job = resp.json()
print(f"Job started: {job['processId']}")
Integrate in four steps
From API key to structured text output in minutes.
Authenticate
import requests
API_KEY = "your_api_key"
session = requests.Session()
session.headers["Authorization"] = f"Bearer {API_KEY}"
Get your API key from the Transkribus dashboard and initialize the client.
Upload
with open("document.pdf", "rb") as f:
    resp = session.post(
        "https://transkribus.eu/api/v2/uploads",
        files={"file": f}
    )
upload_id = resp.json()["uploadId"]
Upload scanned documents as PDF, JPEG, PNG, or TIFF. Batch upload is supported.
Transcribe
resp = session.post(
    "https://transkribus.eu/api/v2/jobs",
    json={"docId": upload_id, "modelId": 46003}
)
job_id = resp.json()["jobId"]
Choose a recognition model and start processing. Monitor progress via webhooks or polling.
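Polling can be sketched as a small helper that repeatedly fetches the job and backs off between attempts. This is a minimal sketch, not the official client: the `status` key and the terminal values `FINISHED`, `FAILED`, and `CANCELED` are assumptions for illustration, and `wait_for_job` is a hypothetical name. Check the API reference for the actual response fields.

```python
import time


def wait_for_job(fetch_status, interval=2.0, max_interval=30.0, timeout=600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it reports a terminal state or the timeout expires.

    fetch_status is any callable returning a dict with a "status" key.
    The key name and the terminal values below are assumptions, not documented API.
    """
    terminal = {"FINISHED", "FAILED", "CANCELED"}  # assumed status values
    deadline = clock() + timeout
    delay = interval
    while True:
        payload = fetch_status()
        if payload.get("status") in terminal:
            return payload
        if clock() >= deadline:
            raise TimeoutError("job did not reach a terminal state in time")
        sleep(delay)
        # Double the wait after each attempt, capped at max_interval.
        delay = min(delay * 2, max_interval)
```

With the session from step 1, a call might look like `wait_for_job(lambda: session.get(f"https://transkribus.eu/api/v2/jobs/{job_id}").json())`. Passing `sleep` and `clock` as parameters keeps the helper easy to test without real waiting.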
Export
resp = session.get(
    f"https://transkribus.eu/api/v2/jobs/{job_id}"
)
pages = resp.json()["result"]["pages"]
Download results as PAGE XML, ALTO XML, plain text, PDF, or TEI.
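If you export PAGE XML, the line-level transcription can be pulled out with the Python standard library. The sketch below assumes the export follows the usual PAGE layout, where each `TextLine` carries its text in a direct `TextEquiv/Unicode` child. `page_xml_lines` is a hypothetical helper name, and how you obtain the XML from the API is not shown here.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip the XML namespace so the code works across PAGE schema versions."""
    return tag.rsplit("}", 1)[-1]


def page_xml_lines(xml_text):
    """Return the transcribed text of each TextLine, in document order."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if _local(elem.tag) != "TextLine":
            continue
        # Use only the line-level TextEquiv; Word-level ones are nested deeper
        # and region-level ones belong to the parent TextRegion.
        for child in elem:
            if _local(child.tag) != "TextEquiv":
                continue
            for node in child:
                if _local(node.tag) == "Unicode":
                    lines.append(node.text or "")
    return lines
```

Joining the result with `"\n".join(page_xml_lines(xml_text))` gives a plain-text page suitable for search indexing or diffing against ground truth.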
API reference
Full REST API with client libraries for Python, Node.js, and direct HTTP access.
POST /v2/uploads
Parameters
file: binary, required
collection_id: integer
Request
import requests
response = requests.post(
    "https://transkribus.eu/api/v2/uploads",
    headers={"Authorization": "Bearer sk_..."},
    files={"file": open("document.pdf", "rb")}
)
Response
{
    "id": 12345,
    "status": "uploaded",
    "pages": 3,
    "created_at": "2024-01-15T10:30:00Z"
}
What developers build with the API
From batch processing pipelines to intelligent search — see how teams integrate Transkribus.
Batch processing pipelines
Process thousands of document pages automatically. Upload archives, trigger recognition, and collect structured output — all via script.
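For larger archives, it can help to run documents concurrently and record failures instead of stopping at the first error. The sketch below is one possible structure, not part of the API: `run_batch` and the `process_one` callable you pass in are hypothetical names, and the worker count is an arbitrary default.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_batch(items, process_one, max_workers=4):
    """Apply process_one to every item concurrently.

    Returns (results, failures): results maps item -> return value,
    failures maps item -> the exception that was raised.
    Items must be hashable, for example file paths.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process_one, item): item for item in items}
        for future in as_completed(futures):
            item = futures[future]
            try:
                results[item] = future.result()
            except Exception as exc:  # keep going; report failures at the end
                failures[item] = exc
    return results, failures
```

Here `process_one` would upload one file and start its job, returning the job ID. Sharing a single `requests.Session` across threads is not guaranteed to be safe, so creating one session per worker is the cautious choice.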
API = "https://transkribus.eu/api/v2"  # base URL of the endpoints used in the steps above
for doc in archive:
    resp = session.post(API + "/uploads", files={"file": doc})
    uid = resp.json()["uploadId"]
    session.post(API + "/jobs", json={"docId": uid})
Full-text search indexing
Make handwritten archives searchable. Transcribe documents and feed the output into Elasticsearch, Solr, or your custom search index.
resp = session.get(f"{API}/jobs/{job_id}")
text = resp.json()["result"]["text"]
es.index(index="archives", body={
    "content": text,
    "source": doc_meta
})
Structured data extraction
Extract tables, fields, and named entities from historical documents. Feed structured data into databases or spreadsheets.
resp = session.post(
    API + "/jobs",
    json={"docId": uid, "modelId": FIELD_MODEL}
)
# In practice, wait for the job to finish (webhook or polling) before reading the result.
result = session.get(f"{API}/jobs/{resp.json()['jobId']}")
for field in result.json()["result"]["fields"]:
    db.insert(field["name"], field["value"])
Custom ML pipelines
Train custom recognition models for specialized material. Integrate model training and evaluation into your ML workflow.
resp = session.post(
    API + "/models/train",
    json={
        "name": "Colonial Spanish 1600",
        "gtCollectionId": gt_id,
        "baseModelId": BASE_MODEL_ID
    }
)
print(resp.json()["modelId"])
How we compare
Metagrapho vs. other HTR/OCR APIs
General-purpose OCR APIs are built for printed text. Metagrapho is purpose-built for handwriting recognition, including historical scripts that other services cannot read.
| Feature | Metagrapho | Google / AWS / Azure |
|---|---|---|
| Modern handwriting recognition | Yes | Limited |
| Historical documents (pre-1900) | Yes | No |
| Custom model training | Yes | Limited |
| 300+ specialised HTR models | Yes | No |
| EU-hosted processing | Yes | Partial |
| GDPR-compliant by default | Yes | Partial |
| Credit-based pricing (no per-call fees) | Yes | No |
Comparison based on publicly available documentation as of 2025. Google Cloud Vision, AWS Textract, and Azure AI Document Intelligence offer general OCR with some handwriting support but no specialised HTR models or historical document capabilities. AWS and Azure offer limited custom training for printed forms. All three offer EU region options with additional configuration.
Start building with the Metagrapho API
Get your API credentials and start processing documents today. Organisation plans available for production workloads with dedicated throughput and support.
50 free credits per month. No credit card required.