Methodology Guide

How to Include Handwritten Text Recognition in Your Grant Proposal

A step-by-step guide to making the case for HTR in your research funding application — from methodology description and budget justification to references and data management planning. For DFG, ERC, NEH, AHRC, SNSF, FWF, and other research funders.

1. Why include HTR in your methodology

Handwritten text recognition (HTR) has matured from an experimental technique into an established research method used across the humanities and social sciences. Hundreds of peer-reviewed publications now cite AI-assisted transcription as a core part of their workflow, and major funding bodies — including the ERC, DFG, NEH, AHRC, SNSF, and FWF — have awarded grants to projects that rely on it.

The methodological case for HTR rests on three pillars:

  • Efficiency. Automated transcription processes pages in seconds rather than the 15–60 minutes required for manual transcription, making large-scale corpus work feasible within typical grant timelines.
  • Reproducibility. A trained model produces identical output on the same input every time. This deterministic behaviour is a significant advantage over manual transcription, where inter-annotator agreement is imperfect.
  • Measurability. Recognition quality is quantified using Character Error Rate (CER), an objective metric computed on held-out test data. This gives reviewers — and the research team — a concrete, verifiable quality indicator.

Including HTR in your methodology signals that your project leverages state-of-the-art digital methods while maintaining rigorous quality control. It also demonstrates awareness of scalability constraints that often concern reviewers evaluating large documentary corpora.

2. Describing the Transkribus workflow

Grant proposals require a clear, technically precise description of your tools and methods. Transkribus is an AI-powered platform for handwritten and printed text recognition, developed and operated by READ-COOP SCE, a European cooperative with 250+ institutional members, including archives, libraries, and universities.

The standard workflow consists of four stages:

  1. Upload. Document images (scans, photographs, or PDFs) are uploaded to the platform. Transkribus accepts all common image formats and handles batch uploads for large collections.
  2. Text recognition. An AI model — selected from 300+ publicly available models or custom-trained on your material — performs automatic transcription. Layout analysis detects text regions, baselines, and structural elements such as tables.
  3. Manual correction. The research team reviews and corrects the automated output in a built-in editor. This step produces Ground Truth data that can also be used to further train and improve models.
  4. Export. Corrected transcriptions are exported in standard formats (PAGE XML, ALTO XML, TEI, plain text, searchable PDF) for integration with databases, repositories, or further analysis pipelines.

For projects handling sensitive or restricted-access material, Transkribus offers on-premises deployment: the entire platform runs on your institution's own infrastructure, ensuring that documents never leave your servers. This is particularly relevant for archives with legal restrictions on data transfer.

3. Calculating time and cost

Accurate budget planning is essential for a credible grant proposal. Transkribus uses a credit-based system for text recognition, where the number of credits consumed depends on page count and the type of processing applied.

Estimating recognition costs:

  • Credits are consumed per page for text recognition, layout analysis, and related processing tasks.
  • Individual and organisation plans are available at different tiers, allowing you to match your plan to the project's scale.
  • Volume discounts are available for large institutional projects — contact the Transkribus team for a tailored quote.

Estimating manual correction time:

The time required for post-correction depends on material difficulty and target accuracy. As a rough guide:

  • Well-recognised material (CER below 5%): 2–5 minutes per page for verification and light correction.
  • Challenging material (CER 5–10%): 5–15 minutes per page for more substantial correction.
  • Very difficult material (CER above 10%): consider investing in custom model training before full-scale processing — this typically reduces per-page correction time significantly.

A pilot study on 50–100 representative pages will give you concrete correction-time estimates for your specific material. Include these figures in your proposal as preliminary data.
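
To turn these rules of thumb into concrete budget figures, it helps to script the arithmetic. The following Python sketch combines a page count, the correction-time bands above, and an hourly rate. Every number in it (the minutes per page, the 30 EUR/hour rate, the 5,000-page corpus) is an illustrative placeholder; substitute the figures from your own pilot study and your funder's salary scales.

```python
# Rough budget estimator for an HTR work package.
# All rates below are illustrative placeholders, not Transkribus pricing.

CORRECTION_MINUTES = {   # minutes of manual correction per page, by CER band
    "cer_below_5": 4,    # verification and light correction
    "cer_5_to_10": 10,   # more substantial correction
}

def estimate_correction(pages: int, band: str, hourly_rate_eur: float) -> dict:
    """Estimate person-hours and cost for manual post-correction."""
    minutes = pages * CORRECTION_MINUTES[band]
    hours = minutes / 60
    return {"hours": round(hours, 1), "cost_eur": round(hours * hourly_rate_eur, 2)}

# Example: 5,000 pages of well-recognised material, corrected by a
# student assistant at a hypothetical 30 EUR/hour.
print(estimate_correction(5000, "cer_below_5", 30.0))
# -> {'hours': 333.3, 'cost_eur': 10000.0}
```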

4. Data management and archival standards

Most research funders now require a data management plan (DMP) as part of the proposal. Transkribus supports compliance with FAIR data principles and long-term preservation standards.

Export formats:

  • PAGE XML — the de facto standard for layout and transcription data in document analysis research. Preserves baseline coordinates, region types, and reading order (see the parsing sketch after this list).
  • ALTO XML — widely used in digital library infrastructure and compatible with METS/IIIF workflows.
  • TEI XML — the standard encoding for digital scholarly editions in the humanities.
  • Plain text and searchable PDF — for downstream analysis, full-text search, and human-readable output.
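
Because these are open, documented schemas, exports can be post-processed with standard tools. As an illustration, the Python sketch below pulls the line-level transcriptions out of a PAGE XML export using only the standard library. The namespace URI shown is the published 2013-07-15 PAGE schema; check the xmlns declaration in your own exported files, as the schema version can differ.

```python
# Extract line-level transcriptions from a PAGE XML export.
# Assumes the 2013-07-15 PAGE namespace; verify against the xmlns
# declaration in your own files.
import xml.etree.ElementTree as ET

NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"}

def extract_lines(path: str) -> list[str]:
    """Return the Unicode transcription of every TextLine, in document order."""
    root = ET.parse(path).getroot()
    lines = []
    for line in root.iterfind(".//pc:TextLine", NS):
        unicode_el = line.find("./pc:TextEquiv/pc:Unicode", NS)
        if unicode_el is not None and unicode_el.text:
            lines.append(unicode_el.text)
    return lines

# Example usage with a hypothetical export file:
# for text in extract_lines("page_0001.xml"):
#     print(text)
```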

FAIR compliance:

  • Findable: Full-text search across collections; structured metadata in XML exports.
  • Accessible: Data can be exported at any time in open formats; no proprietary lock-in.
  • Interoperable: Standard XML schemas ensure compatibility with digital library systems, annotation tools, and text analysis software.
  • Reusable: Open formats with embedded metadata support long-term reuse and re-analysis.

Long-term preservation: Export your results for deposit in institutional repositories, domain-specific archives, or data centres. The open, non-proprietary formats ensure that data remains accessible independently of any single platform.

5. Model training and accuracy

Recognition accuracy is central to any HTR methodology section. Transkribus measures quality using Character Error Rate (CER): the proportion of characters that differ between the model's output and a manually verified reference transcription.

What reviewers should expect:

  • Public models on well-suited material: 2–5% CER (95–98% of characters correct).
  • Challenging scripts or degraded material with custom training: 5–10% CER.
  • CER is always computed on a held-out test set (typically 10–15% of Ground Truth data not used during training), ensuring an unbiased accuracy estimate.
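
Concretely, CER is the Levenshtein (edit) distance between the model's output and the reference transcription, divided by the number of characters in the reference. The sketch below computes it in pure Python for illustration; for real evaluations you would use an optimised implementation, but the figure is the same.

```python
# Character Error Rate: edit distance between hypothesis and reference,
# normalised by reference length.

def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer(hypothesis: str, reference: str) -> float:
    """CER as a percentage of the reference length."""
    return 100 * levenshtein(hypothesis, reference) / len(reference)

reference = "the quick brown fox"
hypothesis = "the quikc brovn fox"
print(f"{cer(hypothesis, reference):.1f}% CER")  # 3 edits / 19 chars = 15.8%
```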

Custom model training: For specialised material — unusual scripts, historical orthographies, or degraded documents — Transkribus allows you to train a custom model on your own Ground Truth data. Training typically requires 25–75 pages of manually transcribed material, depending on the complexity of the script.

For a detailed explanation of CER and how to report it in your proposal, see our dedicated guide: Character Error Rate (CER) Explained.

6. Collaboration and scalability

Research projects rarely operate in isolation. Transkribus supports collaborative workflows at every scale, from small teams to large multi-institutional initiatives.

Crowdsourcing: For projects that involve volunteer transcribers or citizen scientists, Transkribus provides built-in crowdsourcing capabilities. Volunteers contribute corrections through a streamlined interface, generating Ground Truth that improves model accuracy over time. See our guide on crowdsourcing transcription for details on setting up collaborative transcription campaigns.

API access: For projects requiring automated pipelines or integration with existing research infrastructure, the Transkribus API provides programmatic access to all recognition and processing functions. This enables batch processing, custom workflows, and integration with institutional digital library systems.
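
The general shape of such a pipeline is sketched below. Note that the base URL, endpoint paths, and response fields here are placeholders for illustration only; consult the current Transkribus API documentation for the actual routes and authentication flow.

```python
# Shape of a batch-recognition pipeline against a REST API.
# ENDPOINT PATHS, PARAMETERS, AND RESPONSE FIELDS ARE PLACEHOLDERS,
# not the actual Transkribus routes; see the official API docs.
import pathlib
import requests

API = "https://example.transkribus.api"  # placeholder base URL
session = requests.Session()
session.headers["Authorization"] = "Bearer <token>"  # via the documented auth flow

def process_batch(image_dir: str, model_id: int) -> None:
    """Upload each image and request recognition with the chosen model."""
    for image in sorted(pathlib.Path(image_dir).glob("*.jpg")):
        with image.open("rb") as fh:
            upload = session.post(f"{API}/documents", files={"file": fh})
        upload.raise_for_status()
        doc_id = upload.json()["id"]
        job = session.post(f"{API}/documents/{doc_id}/recognition",
                           json={"modelId": model_id})
        job.raise_for_status()
        print(f"{image.name}: job {job.json()['jobId']} submitted")

process_batch("scans/", model_id=12345)  # hypothetical model ID
```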

Scaling from pilot to full project:

  • Pilot phase (months 1–3): Process 50–100 representative pages, measure CER, estimate correction time.
  • Model refinement (months 3–6): If needed, train a custom model on pilot Ground Truth to improve accuracy.
  • Full processing (months 6+): Apply the optimised model to the entire corpus. Batch processing handles thousands of pages per day.

This phased approach is methodologically sound and demonstrates to reviewers that you have a realistic, evidence-based plan for scaling.

7. Sample methodology text

The following paragraph can be adapted for the methodology section of your grant proposal. Replace the bracketed fields with your project-specific details.

Handwritten text recognition will be performed using Transkribus (transkribus.org), an AI-powered platform developed and operated by the European cooperative READ-COOP SCE (250+ institutional members). The platform employs deep learning architectures trained on PAGE XML Ground Truth data to recognise historical handwriting with measurable accuracy. A pilot study on [N] representative pages of [material description] achieved a character error rate of [X]%, computed on a held-out test set comprising [Y]% of the Ground Truth corpus, confirming the feasibility of automated recognition for this material. During the project, approximately [N] pages of [script type] material from [archive/collection] will be processed using [a public model / a custom-trained model]. Recognition quality will be validated continuously by measuring CER on held-out test data. Manual post-correction by [team members / student assistants] will ensure transcription quality meets the project's standards. All outputs will be exported as [PAGE XML / TEI XML / ALTO XML] for deposit in [repository name] and integration with [database / analysis pipeline]. Data will be stored and processed on Transkribus servers in Austria (EU), in compliance with GDPR. [For sensitive material: on-premises deployment ensures documents remain on institutional infrastructure.]

8. References and further reading

Key publications:

  • Muehlberger, G. et al. (2019). 'Transforming scholarship in the archives through handwritten text recognition.' Journal of Documentation, 75(5), pp. 954–976.
  • Kahle, P. et al. (2017). 'Transkribus — A Service Platform for Transcription, Recognition and Retrieval of Historical Documents.' 14th IAPR International Conference on Document Analysis and Recognition (ICDAR 2017).
  • 'Handwritten Text Recognition for Historical Documents.' Open Research Europe, 5:16 (2025). open-research-europe.ec.europa.eu/articles/5-16

Project provenance:

  • EU Horizon 2020 READ project (grant no. 674943, 2016–2019) — the research programme under which Transkribus was developed.
  • READ-COOP SCE — the European cooperative that now operates and governs Transkribus, with 250+ institutional co-owners.

Infrastructure you can cite with confidence.

Transkribus is research infrastructure built and governed by the institutions that use it — a strong sustainability argument for any grant proposal.

  • Hosted in Austria, EU. All processing on our own servers. GDPR-compliant. No third-party cloud dependencies.
  • Cooperative, not a startup. 250+ archives, libraries, and universities as co-owners. Built for decades, not a VC exit.
  • Your data stays yours. Full ownership. Export and delete anytime. No third-party data sharing. On-premises option available.

Start your pilot study today

Test Transkribus on your source material before you write the proposal. Include real accuracy data as preliminary evidence — the strongest argument you can make to reviewers.

50 free credits every month · No credit card required
