
Your newspaper archive, fully searchable.

Millions of historical newspaper pages sit in archives — scanned but unsearchable. Transkribus reads the text, understands the layout, and turns every article, headline, and classified into structured, searchable data. From a single title to an entire national collection.

30M+ newspaper pages processed
15M+ pages in Zeitpunkt.NRW alone
100+ public print & Fraktur models

The output

What you end up with after processing your newspaper collection.


Searchable full text

Every article, headline, advertisement, and classified ad on every page — recognized and indexed. Search by name, date, keyword, or phrase across the entire collection.
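As a rough illustration of how keyword and phrase search over recognized text can work, here is a generic positional inverted index in Python. It is a sketch, not the search implementation Transkribus uses. The article IDs and sample texts are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of word characters; \w is Unicode-aware, so "Köln" stays whole.
    return re.findall(r"\w+", text.lower())


def build_index(articles: dict[str, str]) -> dict[str, dict[str, list[int]]]:
    # Map each token to {article_id: [token positions]} so phrases can be checked later.
    index: dict[str, dict[str, list[int]]] = defaultdict(lambda: defaultdict(list))
    for article_id, text in articles.items():
        for pos, token in enumerate(tokenize(text)):
            index[token][article_id].append(pos)
    return index


def search(index: dict[str, dict[str, list[int]]], query: str) -> set[str]:
    # Return IDs of articles containing the query tokens as a contiguous phrase.
    # A single-word query reduces to a plain keyword lookup.
    tokens = tokenize(query)
    if not tokens or any(t not in index for t in tokens):
        return set()
    # Candidate articles must contain every token of the query.
    candidates = set.intersection(*(set(index[t]) for t in tokens))
    hits = set()
    for article_id in candidates:
        positions = [set(index[t][article_id]) for t in tokens]
        # Match if some start position p has token i at position p + i for every i.
        if any(all(p + i in positions[i] for i in range(len(tokens))) for p in positions[0]):
            hits.add(article_id)
    return hits


articles = {
    "a1": "Der Rhein bei Köln ist zugefroren.",
    "a2": "Köln und Düsseldorf melden Hochwasser.",
    "a3": "Bei Köln trat der Rhein über die Ufer.",
}
index = build_index(articles)
print(sorted(search(index, "bei Köln")))  # ['a1', 'a3']
print(sorted(search(index, "Rhein bei")))  # ['a1']
```

Storing token positions, not just article IDs, is what lets the same index answer both single-keyword and exact-phrase queries.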


Structured layout data

The AI segments multi-column pages into individual content regions — articles, headlines, ads, captions. Each region is tagged and exported separately, so downstream systems can work with articles, not raw page dumps.
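Downstream systems usually read those tagged regions from the exported XML. The Python sketch below groups region text by type from a PAGE XML file. It assumes the type is stored either in a Transkribus-style `custom` attribute (for example `structure {type:heading;}`) or in the PAGE `type` attribute. The sample page is made up, and real exports may carry text only at line or word level rather than region level.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Matches e.g. "structure {type:heading;}" inside a Transkribus-style custom attribute (assumed format).
STRUCTURE_RE = re.compile(r"structure\s*\{[^}]*type:([^;}]+)")


def local(tag: str) -> str:
    # Drop the namespace, e.g. "{http://...}TextRegion" -> "TextRegion".
    return tag.rsplit("}", 1)[-1]


def region_type(region: ET.Element) -> str:
    # Prefer the structure tag in @custom, fall back to the PAGE @type attribute.
    match = STRUCTURE_RE.search(region.get("custom", ""))
    if match:
        return match.group(1).strip()
    return region.get("type", "unknown")


def region_text(region: ET.Element) -> str:
    # Read only the region-level TextEquiv/Unicode, not the ones nested in TextLine children.
    for child in region:
        if local(child.tag) == "TextEquiv":
            for unicode_el in child:
                if local(unicode_el.tag) == "Unicode":
                    return (unicode_el.text or "").strip()
    return ""


def regions_by_type(page_xml: str) -> dict[str, list[str]]:
    root = ET.fromstring(page_xml)
    grouped: dict[str, list[str]] = defaultdict(list)
    for element in root.iter():
        if local(element.tag) == "TextRegion":
            grouped[region_type(element)].append(region_text(element))
    return dict(grouped)


# Hypothetical, trimmed PAGE XML page with two tagged regions.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="page_001.jpg" imageWidth="2480" imageHeight="3508">
    <TextRegion id="r1" custom="readingOrder {index:0;} structure {type:heading;}">
      <TextEquiv><Unicode>Neueste Nachrichten</Unicode></TextEquiv>
    </TextRegion>
    <TextRegion id="r2" type="paragraph" custom="readingOrder {index:1;}">
      <TextEquiv><Unicode>Köln, 3. März. Der Rhein ist zugefroren.</Unicode></TextEquiv>
    </TextRegion>
  </Page>
</PcGts>"""

print(regions_by_type(SAMPLE))
```

Matching on local tag names keeps the sketch independent of which PAGE schema version the namespace declares.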


A browsable online collection

Processed newspapers can be published as a Transkribus Site — a hosted, searchable interface for your collection. No development needed. Branded with your institution's identity.

Case study

Zeitpunkt.NRW: 20 million newspaper pages for North Rhine-Westphalia

The Zeitpunkt.NRW project is digitizing the complete historical newspaper holdings of North Rhine-Westphalia: 20 million pages spanning centuries of regional history. Transkribus performs the full-text recognition at scale, turning scanned pages into searchable text that is published through the state's digital newspaper portal.
20 million newspaper pages processed with Transkribus
Centuries of regional newspapers from NRW libraries
Full-text search available through the Zeitpunkt.NRW portal

Case study

NewsEye: Improving newspaper text recognition with the National Library of Finland

The EU-funded NewsEye project (Horizon 2020) brought together the National Library of Finland with computer scientists and digital humanities researchers to improve text recognition on historical newspapers. Working with 2.5 million pages across 10 Finnish newspaper titles — half of them in Swedish, many in Gothic typefaces — the team used Transkribus to train custom models that improved recognition accuracy by an average of 10 percentage points over legacy OCR methods.
2.5 million newspaper pages (1771–1914), 10 titles
Gothic font recognition improved by 10 percentage points on average
Enhanced search across Finland's national digital library

The approach

From scans to structured text — how institutions digitize newspapers at scale

Newspaper digitization follows a proven workflow: upload your scans, select from 100+ pre-trained print and Fraktur models (or train your own on your specific typefaces), run batch text recognition with automatic layout analysis, and export structured results. The AI handles multi-column layouts, mixed content types, and historical typefaces — including Fraktur, blackletter, and early modern print.
100+ public models for Fraktur, blackletter, and historical print
Automatic layout segmentation for multi-column newspaper pages
Batch processing for thousands of pages — no manual intervention
Export as searchable PDF, plain text, or structured XML (ALTO, PAGE)
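For the ALTO export, here is a minimal sketch of turning a page into plain text with only Python's standard library, one paragraph per text block. It reads the `CONTENT` attribute of each `String` element and ignores hyphenation hints (`HYP`, `SUBS_CONTENT`) and coordinates. The sample file is hypothetical and trimmed to the elements the function uses.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    # Drop the namespace, e.g. "{http://www.loc.gov/standards/alto/ns-v4#}String" -> "String".
    return tag.rsplit("}", 1)[-1]


def alto_to_text(alto_xml: str) -> str:
    root = ET.fromstring(alto_xml)
    blocks = []
    for block in root.iter():
        if local(block.tag) != "TextBlock":
            continue
        lines = []
        for line in block.iter():
            if local(line.tag) == "TextLine":
                # Join the words on a line; the SP elements are replaced by single spaces.
                words = [s.get("CONTENT", "") for s in line if local(s.tag) == "String"]
                lines.append(" ".join(words))
        blocks.append("\n".join(lines))
    # Separate text blocks with a blank line.
    return "\n\n".join(blocks)


# Hypothetical, trimmed ALTO v4 page: a headline block and a two-line classified ad.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page ID="p1"><PrintSpace>
    <TextBlock ID="b1">
      <TextLine><String CONTENT="Amtliche"/><SP/><String CONTENT="Bekanntmachung"/></TextLine>
    </TextBlock>
    <TextBlock ID="b2">
      <TextLine><String CONTENT="Zu"/><SP/><String CONTENT="verkaufen:"/></TextLine>
      <TextLine><String CONTENT="ein"/><SP/><String CONTENT="Pferd."/></TextLine>
    </TextBlock>
  </PrintSpace></Page></Layout>
</alto>"""

print(alto_to_text(SAMPLE))
```

Keeping block boundaries as blank lines preserves some of the layout segmentation in the plain-text output; a fuller converter would also rejoin hyphenated words using `SUBS_CONTENT`.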

Guides and models

Tutorials, AI models, and related use cases for newspaper digitization.

How to Digitise Newspapers with Transkribus

Step-by-step guide: scanning, layout segmentation, model selection, and text recognition for historical newspapers.

Guide

AI Models for Fraktur, Kurrent & Sütterlin

The most common historical German print and handwriting scripts — and the public models that can read them.

Models

Archival Backlog Reduction

How archives use AI to process millions of unsearchable pages — the same approach that applies to newspaper collections.

Use Case

Ready to make your newspaper archive searchable?

Talk to our team about your collection. We'll help you find the right models, plan the workflow, and estimate the scope.

EU-hosted, GDPR-compliant