
Extract structured data from any document

Research and digitization projects need more than readable text — they need structured data. Names, dates, places, amounts, relationships. Transkribus combines AI text recognition with table extraction, field models, and entity tagging to turn handwritten and printed documents into structured datasets ready for analysis, databases, and spreadsheets.

3 extraction methods · CSV + XML export formats · No coding required

Tables — rows, columns, and cells into spreadsheets
Fields — extract named fields from forms
Entities — tag persons, places, and dates
Trainable — custom models for your layouts

Three ways to extract data from documents

Different document types need different extraction methods. Transkribus offers all three — and they can be combined.


Table recognition

Detect rows, columns, and cell boundaries in tabular documents — parish registers, census records, tax rolls, ledgers. Each cell becomes a data point. Export the entire table as a spreadsheet or XML.


Field extraction

Train models to find and extract specific fields from structured documents — dates, names, reference numbers, amounts. Works on forms, index cards, certificates, and any document with repeating structure.


Entity tagging

Tag persons, places, dates, and custom entities in running text. Tags become searchable metadata. Export as TEI-XML or filter tagged entities as structured data for your research database.
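The TEI-XML export can be post-processed with any XML library. A minimal sketch in Python, assuming standard TEI element names such as persName, placeName, and date (the fragment below is illustrative, not actual Transkribus output, and the real export schema may differ in detail):

```python
import xml.etree.ElementTree as ET

# Illustrative TEI fragment with tagged entities (not actual
# Transkribus output -- the real export schema may differ).
TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><p>
    <persName>Anna Maier</persName> was baptised in
    <placeName>Innsbruck</placeName> on
    <date when="1842-03-05">5 March 1842</date>.
  </p></body></text>
</TEI>"""

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

def extract_entities(xml_text):
    """Collect tagged persons, places, and dates from a TEI document."""
    root = ET.fromstring(xml_text)
    return {
        "persons": [e.text for e in root.iterfind(".//tei:persName", NS)],
        "places": [e.text for e in root.iterfind(".//tei:placeName", NS)],
        "dates": [e.get("when") for e in root.iterfind(".//tei:date", NS)],
    }

entities = extract_entities(TEI)
print(entities)
```

From here the lists can be filtered, deduplicated, or loaded into a research database as structured records.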

See table extraction in action

Transkribus detects the grid structure of tabular records and extracts each cell into a structured spreadsheet — ready for your database, genealogy software, or research pipeline.

Extracted Table Data
Institution                   | Town              | Amount    | Object           | Date     | Disposition
Franklin College (6)          | New Athen, O.     |           | General          | 3/23/16  |
Fargo College (3)             | Fargo, N.D.       | 100,000   | Endowment        | 4/27/16  | Gen 1914, 5/18/16
Franklin Academy (2)          | Franklin, Neb.    | 5,000     | Library Building | 8/3/16   | Gen 1914, 8/7/16
Fessenden Acad. & Ind. School | Fessenden, Fla.   |           | General          | 12/22/16 |
Ferris Institute (2)          | Big Rapids, Mich. | 50,000    | Buildings        | 2/12/17  |
Findlay College (2)           | Findlay, O.       | 100,000   | Endowment        | 5/23/17  | Gen 1914, 5/28/17
Fairmount College             | Wichita, Kan.     | 200,000   | Endowment        | 6/7/17   | 6/14/17
Franklin College              | Franklin, Ind.    | 50,000    | General          | 9/13/17  | Gen 1914, 9/17/17
Fisk University               | Nashville, Tenn.  | 1,000,000 | Endowment        | 6/14/18  |
Friends University            | Wichita, Kan.     | 200,000   | Endowment        | 6/20/18  | Gen 1914, 8/8/18
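Once a table like this is exported as CSV, it loads with the standard library alone. A short sketch, using a couple of rows as they might appear in the export (the exact column headers are an assumption):

```python
import csv
import io

# A few rows as they might appear in a CSV export of the table
# above (illustrative; actual headers and quoting may differ).
CSV_DATA = """Institution,Town,Amount,Object,Date,Disposition
Fargo College (3),"Fargo, N.D.","100,000",Endowment,4/27/16,"Gen 1914, 5/18/16"
Fisk University,"Nashville, Tenn.","1,000,000",Endowment,6/14/18,
"""

def load_rows(csv_text):
    """Read exported cells into dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(csv_text)))

rows = load_rows(CSV_DATA)
# Strip the thousands separators so the Amount column is usable in analysis.
amounts = [int(r["Amount"].replace(",", "")) for r in rows if r["Amount"]]
print(amounts)  # [100000, 1000000]
```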

See field extraction in action

Field Models detect and extract specific data fields from documents — names, dates, locations, references — precisely and at scale. Train on your own form layouts for best results.


Intelligent Document Processing

From document images to research databases

The typical workflow: upload document scans, run AI text recognition to get machine-readable text, then apply table recognition or field extraction to pull structured data. Export as CSV for spreadsheets, as XML for databases, or feed directly into your NLP pipeline for named entity recognition, topic modelling, or network analysis.
Export tables and fields as CSV, Excel, or structured XML
Entity tags export as TEI-XML with coordinates linking to source images
REST API access for automated OCR data extraction pipelines
Batch processing for large document collections
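The last step of that workflow, moving an exported table into a research database, can be sketched with the standard library. Everything below is a minimal illustration: the CSV content, column names, and SQLite schema are assumptions; swap in your own export and connection.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export of an extracted table (illustrative only).
CSV_EXPORT = """Institution,Town,Amount
Fargo College (3),"Fargo, N.D.",100000
Fisk University,"Nashville, Tenn.",1000000
"""

# An in-memory database stands in for your real research database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grants (institution TEXT, town TEXT, amount INTEGER)")

# Named placeholders match the CSV header names, so DictReader rows
# can be inserted directly.
rows = csv.DictReader(io.StringIO(CSV_EXPORT))
conn.executemany(
    "INSERT INTO grants VALUES (:Institution, :Town, :Amount)",
    rows,
)

total = conn.execute("SELECT SUM(amount) FROM grants").fetchone()[0]
print(total)  # 1100000
```

The same pattern scales to batch exports: loop over the CSV files from a collection and insert each into the same table.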

Trainable

Train extraction models on your specific document type

Like text recognition models, table and field extraction models can be trained on your specific documents. If your records have a unique layout — a particular style of parish register, a regional census format, a type of index card — you can train a custom model that understands that structure and extracts data from handwritten documents with high accuracy.
Custom table models for non-standard layouts and complex registers
Custom field models for specific form types and index cards
No coding — training happens in the visual interface
Models improve as you add more training data
Share trained models with your team or the community

Use Cases

What researchers extract with Transkribus

Institutions and researchers worldwide use Transkribus to extract structured data from historical documents at scale. From genealogy databases built from parish registers to economic research based on colonial trade ledgers — the same extraction tools power hundreds of different research projects.
Parish registers → names, dates, relationships for genealogy databases
Census records → demographic data for population studies
Tax rolls and ledgers → economic data for historical analysis
Index cards and catalogs → structured metadata for library systems
Correspondence → tagged persons and places for network analysis

Handwriting specialists

The only IDP platform built for handwriting

Most intelligent document processing platforms focus on modern printed forms — invoices, receipts, contracts. Transkribus is different: it was built from the ground up for handwritten and historical documents. Our AI models handle centuries of handwriting styles, degraded paper, inconsistent layouts, and mixed scripts that defeat general-purpose OCR data extraction tools.
500,000+ users processing handwritten documents
300+ public AI models for historical handwriting
Works across 100+ languages and all major scripts
EU-hosted and GDPR-compliant — your documents stay in Europe

Start extracting data from your documents

Create a free account. Upload your scans, run text recognition, and extract structured data — no coding, no ML expertise needed.

300+ public AI models · CSV + XML export formats · EU-hosted, GDPR-compliant