Your newspaper archive, fully searchable.

Millions of historical newspaper pages sit in archives — scanned but unsearchable. Transkribus reads the text, understands the layout, and turns every article, headline, and classified into structured, searchable data. From a single title to an entire national collection.

30M+ newspaper pages processed

15M+ pages in Zeitpunkt.NRW alone

100+ public print & Fraktur models

Searchable full text

Every article, headline, advertisement, and classified ad on every page — recognized and indexed. Search by name, date, keyword, or phrase across the entire collection.

Structured layout data

The AI segments multi-column pages into individual content regions — articles, headlines, ads, captions. Each region is tagged and exported separately, so downstream systems can work with articles, not raw page dumps.
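
As a rough illustration of what per-region output makes possible, the sketch below groups the text regions of a PAGE XML file by region type. The sample document is invented and simplified (it omits coordinates, for example), and `regions_by_type` is a hypothetical helper, not part of Transkribus. The `TextRegion`, `type`, and `TextEquiv`/`Unicode` names follow the PAGE schema, but how a given export tags articles and ads is an assumption to check against real output.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented, simplified sample shaped like a PAGE XML export.
# Real files also carry coordinates, text lines, and reading order.
SAMPLE_PAGE_XML = """
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="page_0001.jpg" imageWidth="2000" imageHeight="3000">
    <TextRegion id="r1" type="heading">
      <TextEquiv><Unicode>Town Council Approves New Bridge</Unicode></TextEquiv>
    </TextRegion>
    <TextRegion id="r2" type="paragraph">
      <TextEquiv><Unicode>The council voted on Tuesday to fund the crossing.</Unicode></TextEquiv>
    </TextRegion>
    <TextRegion id="r3" type="caption">
      <TextEquiv><Unicode>The old bridge, seen from the east bank.</Unicode></TextEquiv>
    </TextRegion>
    <TextRegion id="r4" type="paragraph">
      <TextEquiv><Unicode>Rooms to let near the market square.</Unicode></TextEquiv>
    </TextRegion>
  </Page>
</PcGts>
""".strip()


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix so the code works across schema versions."""
    return tag.rsplit("}", 1)[-1]


def regions_by_type(page_xml: str) -> dict[str, list[str]]:
    """Group region-level text by the region's 'type' attribute."""
    root = ET.fromstring(page_xml)
    grouped = defaultdict(list)
    for element in root.iter():
        if local_name(element.tag) != "TextRegion":
            continue
        region_type = element.get("type", "unknown")
        text = ""
        # Only look at the TextEquiv that is a direct child of the region.
        for child in element:
            if local_name(child.tag) == "TextEquiv":
                unicode_el = next(
                    (c for c in child if local_name(c.tag) == "Unicode"), None
                )
                if unicode_el is not None and unicode_el.text:
                    text = unicode_el.text.strip()
        grouped[region_type].append(text)
    return dict(grouped)


if __name__ == "__main__":
    for region_type, texts in regions_by_type(SAMPLE_PAGE_XML).items():
        print(region_type, texts)
```

A downstream system could use a grouping like this to index headlines separately from body text instead of treating the page as one block.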

A browsable online collection

Processed newspapers can be published as a Transkribus Site — a hosted, searchable interface for your collection. No development needed. Branded with your institution's identity.

Case study

Zeitpunkt.NRW: 20 million newspaper pages for North Rhine-Westphalia

The Zeitpunkt.NRW project is digitizing the complete historical newspaper holdings of North Rhine-Westphalia: 20 million pages spanning centuries of regional history. Transkribus handles full-text recognition at scale, turning scanned pages into searchable text that is published through the state's digital newspaper portal.

20 million newspaper pages processed with Transkribus

Centuries of regional newspapers from NRW libraries

Full-text search available through the Zeitpunkt.NRW portal

Visit Zeitpunkt.NRW

Case study

NewsEye: Improving newspaper text recognition with the National Library of Finland

The EU-funded NewsEye project (Horizon 2020) brought together the National Library of Finland with computer scientists and digital humanities researchers to improve text recognition on historical newspapers. Working with 2.5 million pages across 10 Finnish newspaper titles — half of them in Swedish, many in Gothic typefaces — the team used Transkribus to train custom models that improved recognition accuracy by an average of 10 percentage points over legacy OCR methods.

2.5 million newspaper pages (1771–1914), 10 titles

Gothic font recognition improved by 10 percentage points on average

Enhanced search across Finland's national digital library

Read about the NewsEye project

The approach

From scans to structured text — how institutions digitize newspapers at scale

Newspaper digitization follows a proven workflow: upload your scans, select from 100+ pre-trained print and Fraktur models (or train your own on your specific typefaces), run batch text recognition with automatic layout analysis, and export structured results. The AI handles multi-column layouts, mixed content types, and historical typefaces — including Fraktur, blackletter, and early modern print.

100+ public models for Fraktur, blackletter, and historical print

Automatic layout segmentation for multi-column newspaper pages

Batch processing for thousands of pages — no manual intervention

Export as searchable PDF, plain text, or structured XML (ALTO, PAGE)
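
To give a sense of what can be done with a structured export, here is a minimal sketch that reads an ALTO file, joins the words of each text block, and runs a simple keyword lookup. The sample XML is invented and stripped down (no coordinates or styles), and `block_texts` and `search` are hypothetical helpers for illustration. The `TextBlock`, `TextLine`, and `String` elements with their `CONTENT` attribute come from the ALTO schema. A production search index would look quite different.

```python
import xml.etree.ElementTree as ET

# Invented, simplified ALTO sample. Real exports include positions and sizes.
SAMPLE_ALTO_XML = """
<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page ID="p1">
      <PrintSpace>
        <TextBlock ID="b1">
          <TextLine>
            <String CONTENT="Harvest"/><SP/><String CONTENT="festival"/>
          </TextLine>
          <TextLine>
            <String CONTENT="opens"/><SP/><String CONTENT="Sunday"/>
          </TextLine>
        </TextBlock>
        <TextBlock ID="b2">
          <TextLine>
            <String CONTENT="Rooms"/><SP/><String CONTENT="to"/><SP/><String CONTENT="let"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
""".strip()


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix so any ALTO version is accepted."""
    return tag.rsplit("}", 1)[-1]


def block_texts(alto_xml: str) -> dict[str, str]:
    """Map each TextBlock ID to its text, with words and lines joined by spaces."""
    root = ET.fromstring(alto_xml)
    blocks: dict[str, str] = {}
    for block in root.iter():
        if local_name(block.tag) != "TextBlock":
            continue
        lines = []
        for line in block.iter():
            if local_name(line.tag) != "TextLine":
                continue
            # Each String element holds one recognized word in CONTENT.
            words = [s.get("CONTENT", "") for s in line if local_name(s.tag) == "String"]
            lines.append(" ".join(words))
        block_id = block.get("ID", f"block{len(blocks)}")
        blocks[block_id] = " ".join(lines)
    return blocks


def search(blocks: dict[str, str], keyword: str) -> list[str]:
    """Return the IDs of blocks whose text contains the keyword, ignoring case."""
    needle = keyword.casefold()
    return [block_id for block_id, text in blocks.items() if needle in text.casefold()]


if __name__ == "__main__":
    blocks = block_texts(SAMPLE_ALTO_XML)
    print(blocks)
    print(search(blocks, "festival"))
```

Working at the block level rather than the page level means a match can point to a specific article or ad region.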

How to digitise newspapers with Transkribus