Enevældens Nyheder Online: An award-winning project to create digital versions of historical newspapers

If you wanted to study social control under absolutist rule, there are many historical sources that could be of interest. Administrative records, land registers, and royal decrees are just some of the documents that researchers in the field often use for their work.

But Professor Johan Heinsen from Aalborg University had a different idea. He decided to study how social control in Denmark and Norway increased between 1750 and 1850 by analysing the reports of prison escapes, military desertions, and evasions from household masters in newspapers from the period. This approach proved so groundbreaking that it won the prestigious 'Original Idea of the Year' award from the Independent Research Fund Denmark.

We talked to Johan about transcribing complex, historical newspapers with Transkribus and the transformative effect that digitising whole corpora can have for research in the digital humanities.

Johan received the prestigious 'Original Idea of the Year' award for his innovative research. © Claus Lillevang and Independent Research Fund Denmark

The paradox of social control: When disappearance was possible

Johan’s core research question focused on social control and the gradual establishment of the modern Danish state, which he studied by examining a broad spectrum of escapes between 1750 and 1850. During this century of transition, servants, convicts, and soldiers tried to evade the grip of power, which was tightening considerably. By analysing the records of these escape attempts, the team could map out precisely how and when the state developed the infrastructure and mechanisms necessary to exert control over its population.

The findings illustrate a fundamental shift in the reality of life during the period. “As modern people, we think that we can’t just disappear. If people escape from prison today, they are almost always found again. But at the beginning of the period of my investigation, it was actually possible to disappear on Danish soil and become someone else. And by the end of the period in 1850, that became almost impossible,” Professor Heinsen explained.

Newspapers are valuable sources for historical information about escape attempts, as they were the primary medium for publishing "wanted" notices and advertisements requesting information on fugitives. About 15,000 separate runaways were advertised across the period, with many others described in early genres of crime reporting. The challenge, however, lay in access. While many newspapers from the period had been digitised and transcribed using traditional Optical Character Recognition (OCR) methods, the word accuracy often hovered around 50 per cent—a rate that is simply not high enough to be of any use for large-scale searching or keyword analysis. For the project to succeed, Johan and his team would have to find a way of transcribing large volumes of newspapers with a higher degree of accuracy.
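A back-of-the-envelope sketch shows why word accuracy around 50 per cent undermines keyword search. It relies on a simplifying assumption of ours, not the project's: every word is recognised correctly independently and with the same probability, so an exact-match search only finds an occurrence if every word in the query was transcribed correctly. The function name `expected_recall` is hypothetical.

```python
def expected_recall(word_accuracy: float, words_in_query: int = 1) -> float:
    """Share of true occurrences an exact-match search would find.

    Simplifying assumption (illustrative only): each word is recognised
    correctly independently with probability `word_accuracy`, and a hit
    requires every word of the query to be intact.
    """
    if not 0.0 <= word_accuracy <= 1.0:
        raise ValueError("word_accuracy must be between 0 and 1")
    return word_accuracy ** words_in_query


# Compare legacy OCR quality (~50%) with a high-accuracy model (~95%).
for accuracy in (0.50, 0.95):
    for n in (1, 2, 3):
        recall = expected_recall(accuracy, n)
        print(f"accuracy {accuracy:.0%}, {n}-word query: {recall:.1%} found")
```

Under this model, a single-word search over 50 per cent accurate text misses about half of the relevant notices, and a two-word name (a first name and a surname, say) is found only a quarter of the time. Real OCR errors are not independent, so this is only an order-of-magnitude illustration.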

The project pulled together a large collection of newspapers across centuries, many of which are in the holdings of the Royal Danish Library. © Royal Danish Library

Using Transkribus for complex print material

To achieve this, Johan and the team decided to explore a more sophisticated approach. They already had experience with Transkribus for historical Danish handwriting and felt that it might be better placed to deal with the often poor quality scans and the complex, noisy layouts of the print publications they wanted to transcribe.

The team trained a custom model tailored specifically for their Danish historical documents. The initial results were striking:

“In early 2022, we designed a prototype model based on about 100,000 words from eighteenth-century pages in the Danish collection. To our surprise, its performance exceeded expectations. On equivalent validation material, the model performed with an error rate below 1 per cent [at] character level, translating to a word accuracy above 95 per cent. That means—in almost all cases—a text that is legible as is.”
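The link between a character error rate (CER) below 1 per cent and word accuracy above 95 per cent can be checked with a small sketch. CER is commonly computed as the character-level edit (Levenshtein) distance between the model output and a reference transcription, divided by the length of the reference. To relate it to word accuracy, we assume (our simplification, not the team's method) that character errors occur independently at rate p, so a word of n characters is perfect with probability (1 − p)^n. The average word length of five characters and the example strings are illustrative assumptions, not project data.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by the length of the reference text."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


def estimated_word_accuracy(cer: float, avg_word_length: float = 5.0) -> float:
    """Word accuracy if character errors were independent (assumption)."""
    return (1.0 - cer) ** avg_word_length


# Hypothetical line: two substituted characters out of 18.
ref = "en bortløben Fange"
hyp = "en bortlöben Fauge"
print(f"CER: {character_error_rate(ref, hyp):.3f}")        # → CER: 0.111
print(f"word accuracy at 1% CER: {estimated_word_accuracy(0.01):.3f}")  # → 0.951
```

With p = 0.01 and five-character words, (0.99)^5 ≈ 0.951, which is in line with the "below 1 per cent" and "above 95 per cent" figures quoted. Longer words or clustered errors would shift the estimate either way.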

This high-accuracy prototype provided the necessary foundation. With the help of students, the team then added further training data to the model, teaching it to transcribe an even wider range of texts and to cope with variations in typeface and image quality across the corpus of more than 550,000 pages.

The newspapers first had to be segmented into regions, columns, and lines before the text recognition could take place. © Transkribus

Solving the layout challenge with Field Models

When automatically transcribing newspapers, it is often not the text that is the challenge but the layout. Historical newspapers feature narrow columns, advertisements, and complex arrangements of text, and Transkribus first has to be taught how to recognise these different layout elements—a process called segmentation—before it can start transcribing the text they contain.

Johan and his team solved this segmentation challenge by training Field Models that could accurately recognise the columns, headers, and distinct text regions on the page. “Early on, we had baseline models that worked for one newspaper at a time, typically even only for a period, or specific types of pages,” Professor Heinsen explained. “[We] had to do a lot of manual correction. Newspapers with narrow column separators were impossible, [which] guided our priorities, as we went for papers that fit the model. Then with Field Models, we could [choose whichever newspapers we wanted]. And the training data was super easy to create, so it felt like the last missing piece.”
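Segmentation results are stored as a hierarchy: a page contains text regions (such as columns), and each region contains text lines with a baseline and the recognised text. Transkribus can export this as PAGE XML, where the elements are `TextRegion`, `TextLine`, `Baseline`, and `TextEquiv/Unicode`. The sketch below walks that hierarchy with the Python standard library. The inline document is a hand-made, heavily abbreviated illustration rather than real project output: a real export also carries coordinates, metadata, and a `ReadingOrder` element, which this sketch ignores by simply using document order. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-made, abbreviated PAGE XML (illustrative; not a real export).
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="page_001.jpg" imageWidth="2000" imageHeight="3000">
    <TextRegion id="r1">
      <TextLine id="r1l1">
        <Baseline points="100,200 900,200"/>
        <TextEquiv><Unicode>Bortrømt</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="r1l2">
        <Baseline points="100,260 900,260"/>
        <TextEquiv><Unicode>fra Tugthuset</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
    <TextRegion id="r2">
      <TextLine id="r2l1">
        <Baseline points="1000,200 1900,200"/>
        <TextEquiv><Unicode>Til Salg</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix so any PAGE schema version matches."""
    return tag.rsplit("}", 1)[-1]


def children(elem: ET.Element, name: str) -> list[ET.Element]:
    """Direct children of elem whose local tag name equals name."""
    return [c for c in elem if local_name(c.tag) == name]


def extract_lines(xml_text: str) -> list[tuple[str, str, str]]:
    """Return (region id, line id, text) for every line, in document order."""
    root = ET.fromstring(xml_text)
    result = []
    for region in root.iter():
        if local_name(region.tag) != "TextRegion":
            continue
        for line in children(region, "TextLine"):
            text = ""
            for equiv in children(line, "TextEquiv"):
                for uni in children(equiv, "Unicode"):
                    text = (uni.text or "").strip()
            result.append((region.get("id"), line.get("id"), text))
    return result


for region_id, line_id, text in extract_lines(SAMPLE):
    print(region_id, line_id, text)
```

Keeping regions and lines separate matters for newspapers: text recognition runs line by line, and reading the lines in the wrong column order would scramble adjacent notices.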

As the ultimate goal was to allow researchers to search the material by individual article, rather than by page, the team decided to develop an additional layer of intelligence. Using the transcripts from Transkribus, they trained a BERT model, a type of language model that could make predictions on whether a line was the first line of a text, a header, or the last line of a text. This helped them to automatically separate pages of continuous text into discrete, searchable articles.
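The trained BERT classifier cannot be reproduced from this description, but the step that follows it can be sketched. Assuming every line has already received one of the labels mentioned above (header, first line of a text, last line of a text) or a default "body" label, grouping lines into articles is a single pass over the page in reading order. The label strings, the `Line` type, `split_articles`, and the sample lines are all hypothetical; in the project the labels came from the BERT model, which is replaced here by hard-coded values.

```python
from dataclasses import dataclass


@dataclass
class Line:
    text: str
    label: str  # hypothetical labels: "header", "first", "body", "last"


def split_articles(lines: list[Line]) -> list[str]:
    """Group labelled lines (in reading order) into article strings."""
    articles: list[list[Line]] = []
    current: list[Line] = []
    for line in lines:
        # A header or first line opens a new article, unless the current
        # article so far holds only header lines (e.g. its own headline).
        starts_new = line.label in ("header", "first") and any(
            prev.label != "header" for prev in current
        )
        if starts_new:
            articles.append(current)
            current = []
        current.append(line)
        if line.label == "last":  # a predicted last line closes the article
            articles.append(current)
            current = []
    if current:  # keep any trailing, unterminated article
        articles.append(current)
    return [" ".join(line.text for line in article) for article in articles]


PAGE_LINES = [
    Line("Efterlysning", "header"),
    Line("En Karl er bortløben", "first"),
    Line("fra sin Husbond.", "last"),
    Line("Til Salg", "header"),
    Line("En Gaard i Roskilde", "first"),
    Line("er til Salg.", "body"),
]

for article in split_articles(PAGE_LINES):
    print(article)
```

Splitting on predicted boundaries rather than on page breaks is what lets each notice be indexed and returned as its own search hit.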

On the ENO website, users can find all the newspapers used in the project, and perform a full text search on the entire collection. © Enevældens Nyheder Online

Sharing the corpus online

The plan was always to publish the resulting data online for other researchers to share and build upon. On the Enevældens Nyheder Online project website, users can search each publication for different keywords and access the high-quality transcriptions, and the team is continually updating the site with new publications.

With a historical corpus spanning 100 years, Johan had enough data about prison escapes to study how the individual’s relationship with the state changed over the period. His innovative research, and the methods used to carry it out, recently won him the ‘Original Idea of the Year’ award from the Independent Research Fund Denmark.

The foundation explained their reasoning: "The project is an excellent example of the revolution that digital methods are currently triggering in humanities research. It is an original proposal for writing history by asking new questions that focus on the breakthroughs taking place in the digital humanities, and on a theme—social control—that is also central in the present."

For Johan, the award shows the need to be innovative in research: "It's a great honour for me and my team. In a way, it's an acknowledgement that it's important to experiment."

The dataset is also already being used for other research purposes, demonstrating its value as a shared resource. For example, team member Camilla Bøgeskov is currently studying the advertisements in the corpus to investigate consumption patterns during the period, a line of inquiry that would have been impossible without the full-text searchability provided by the project.

Johan had the honour of receiving his award from Queen Mary of Denmark. © Claus Lillevang and Independent Research Fund Denmark

A true community highlight

This award-winning project shows what you can achieve when applying modern tools to historical resources. By digitising an entire corpus, such as Enevældens Nyheder Online, researchers can find relevant data in a fraction of the time manual transcription would take, making it easier to answer the big historical questions of our past and to explore their relevance to the present day.

We would like to congratulate Johan and his team once again on their achievement, and thank them for sharing their experiences with the community.
