
Extract structured data from any document

Research and digitization projects need more than readable text — they need structured data. Names, dates, places, amounts, relationships. Transkribus combines AI text recognition with table extraction, field models, and entity tagging to turn handwritten and printed documents into structured datasets ready for analysis, databases, and spreadsheets.

3 extraction methods · CSV + XML export formats · No coding required

Tables — rows, columns, and cells into spreadsheets
Fields — extract named fields from forms
Entities — tag persons, places, and dates
Trainable — custom models for your layouts

Three ways to extract data from documents

Different document types need different extraction methods. Transkribus offers all three — and they can be combined.


Table recognition

Detect rows, columns, and cell boundaries in tabular documents — parish registers, census records, tax rolls, ledgers. Each cell becomes a data point. Export the entire table as a spreadsheet or XML.


Field extraction

Train models to find and extract specific fields from structured documents — dates, names, reference numbers, amounts. Works on forms, index cards, certificates, and any document with repeating structure.


Entity tagging

Tag persons, places, dates, and custom entities in running text. Tags become searchable metadata. Export as TEI-XML or filter tagged entities as structured data for your research database.
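The TEI-XML export can be post-processed with any XML library. A minimal sketch in Python, assuming standard TEI element names such as persName, placeName, and date (the fragment below is illustrative, not actual Transkribus output, and the real export schema may differ in detail):

```python
import xml.etree.ElementTree as ET

# Illustrative TEI fragment with tagged entities (not actual
# Transkribus output -- the real export schema may differ).
TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><p>
    <persName>Anna Maier</persName> was baptised in
    <placeName>Innsbruck</placeName> on
    <date when="1842-03-05">5 March 1842</date>.
  </p></body></text>
</TEI>"""

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

def extract_entities(xml_text):
    """Collect tagged persons, places, and dates from a TEI document."""
    root = ET.fromstring(xml_text)
    return {
        "persons": [e.text for e in root.iterfind(".//tei:persName", NS)],
        "places": [e.text for e in root.iterfind(".//tei:placeName", NS)],
        "dates": [e.get("when") for e in root.iterfind(".//tei:date", NS)],
    }

entities = extract_entities(TEI)
print(entities)
```

From here the lists can be filtered, deduplicated, or loaded into a research database as structured records.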

See table extraction in action

Transkribus detects the grid structure of tabular records and extracts each cell into a structured spreadsheet — ready for your database, genealogy software, or research pipeline.

Extracted Table Data
Institution                   | Town              | Amount    | Object           | Date     | Disposition
Franklin College (6)          | New Athen, O.     |           | General          | 3/23/16  |
Fargo College (3)             | Fargo, N.D.       | 100,000   | Endowment        | 4/27/16  | Gen 1914, 5/18/16
Franklin Academy (2)          | Franklin, Neb.    | 5,000     | Library Building | 8/3/16   | Gen 1914, 8/7/16
Fessenden Acad. & Ind. School | Fessenden, Fla.   |           | General          | 12/22/16 |
Ferris Institute (2)          | Big Rapids, Mich. | 50,000    | Buildings        | 2/12/17  |
Findlay College (2)           | Findlay, O.       | 100,000   | Endowment        | 5/23/17  | Gen 1914, 5/28/17
Fairmount College             | Wichita, Kan.     | 200,000   | Endowment        | 6/7/17   | 6/14/17
Franklin College              | Franklin, Ind.    | 50,000    | General          | 9/13/17  | Gen 1914, 9/17/17
Fisk University               | Nashville, Tenn.  | 1,000,000 | Endowment        | 6/14/18  |
Friends University            | Wichita, Kan.     | 200,000   | Endowment        | 6/20/18  | Gen 1914, 8/8/18
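Once a table like this is exported as CSV, it loads with the standard library alone. A short sketch, using a couple of rows as they might appear in the export (the exact column headers are an assumption):

```python
import csv
import io

# A few rows as they might appear in a CSV export of the table
# above (illustrative; actual headers and quoting may differ).
CSV_DATA = """Institution,Town,Amount,Object,Date,Disposition
Fargo College (3),"Fargo, N.D.","100,000",Endowment,4/27/16,"Gen 1914, 5/18/16"
Fisk University,"Nashville, Tenn.","1,000,000",Endowment,6/14/18,
"""

def load_rows(csv_text):
    """Read exported cells into dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(csv_text)))

rows = load_rows(CSV_DATA)
# Strip the thousands separators so the Amount column is usable in analysis.
amounts = [int(r["Amount"].replace(",", "")) for r in rows if r["Amount"]]
print(amounts)  # [100000, 1000000]
```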

See field extraction in action

Field Models detect and extract specific data fields from documents — names, dates, locations, references — precisely and at scale. Train on your own form layouts for best results.


Intelligent Document Processing

From document images to research databases

The typical workflow: upload document scans, run AI text recognition to get machine-readable text, then apply table recognition or field extraction to pull structured data. Export as CSV for spreadsheets, as XML for databases, or feed directly into your NLP pipeline for named entity recognition, topic modelling, or network analysis.
Export tables and fields as CSV, Excel, or structured XML
Entity tags export as TEI-XML with coordinates linking to source images
REST API access for automated OCR data extraction pipelines
Batch processing for large document collections
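The last step of that workflow, moving an exported table into a research database, can be sketched with the standard library. Everything below is a minimal illustration: the CSV content, column names, and SQLite schema are assumptions; swap in your own export and connection.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export of an extracted table (illustrative only).
CSV_EXPORT = """Institution,Town,Amount
Fargo College (3),"Fargo, N.D.",100000
Fisk University,"Nashville, Tenn.",1000000
"""

# An in-memory database stands in for your real research database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grants (institution TEXT, town TEXT, amount INTEGER)")

# Named placeholders match the CSV header names, so DictReader rows
# can be inserted directly.
rows = csv.DictReader(io.StringIO(CSV_EXPORT))
conn.executemany(
    "INSERT INTO grants VALUES (:Institution, :Town, :Amount)",
    rows,
)

total = conn.execute("SELECT SUM(amount) FROM grants").fetchone()[0]
print(total)  # 1100000
```

The same pattern scales to batch exports: loop over the CSV files from a collection and insert each into the same table.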

Trainable

Train extraction models on your specific document type

Like text recognition models, table and field extraction models can be trained on your specific documents. If your records have a unique layout — a particular style of parish register, a regional census format, a type of index card — you can train a custom model that understands that structure and extracts data from handwritten documents with high accuracy.
Custom table models for non-standard layouts and complex registers
Custom field models for specific form types and index cards
No coding — training happens in the visual interface
Models improve as you add more training data
Share trained models with your team or the community

Use Cases

What researchers extract with Transkribus

Institutions and researchers worldwide use Transkribus to extract structured data from historical documents at scale. From genealogy databases built from parish registers to economic research based on colonial trade ledgers — the same extraction tools power hundreds of different research projects.
Parish registers → names, dates, relationships for genealogy databases
Census records → demographic data for population studies
Tax rolls and ledgers → economic data for historical analysis
Index cards and catalogs → structured metadata for library systems
Correspondence → tagged persons and places for network analysis

Handwriting specialists

The only IDP platform built for handwriting

Most intelligent document processing platforms focus on modern printed forms — invoices, receipts, contracts. Transkribus is different: it was built from the ground up for handwritten and historical documents. Our AI models handle centuries of handwriting styles, degraded paper, inconsistent layouts, and mixed scripts that defeat general-purpose OCR data extraction tools.
500,000+ users processing handwritten documents
300+ public AI models for historical handwriting
Works across 100+ languages and all major scripts
EU-hosted and GDPR-compliant — your documents stay in Europe

Start extracting data from your documents

Create a free account. Upload your scans, run text recognition, and extract structured data — no coding, no ML expertise needed.

300+ public AI models · CSV + XML export formats · EU-hosted, GDPR-compliant