Trolls and water spirits – transcribing Swedish folklore records with Handwritten Text Recognition

It’s time to hear about some remarkable new results with Handwritten Text Recognition (HTR) technology – this time from the Institute for Language and Folklore in Sweden.

The Institute holds a collection of more than 30,000 pages of folklore records written by the Swedish folklorist Carl-Martin Bergstrand between the 1920s and the 1960s. Dr Fredrik Skott, an associate professor and research archivist at the Institute, has helped to train an HTR model to transcribe these documents automatically.

Dr Skott used our Transkribus platform to transcribe around 20,000 words from pages that Bergstrand wrote in the early 1930s. A couple of example pages can be seen below; they contain Bergstrand’s records of an interview with August Svensson (b. 1842), in which Svensson talked about water spirits and trolls.

Transcripts and images of these documents were processed by CITlab HTR – a form of HTR technology which uses neural networks to recognise handwriting. The resulting HTR model can automatically produce transcripts of pages written by Bergstrand with an average Character Error Rate (CER) of 7.0%. When a dictionary is integrated into the recognition process, the CER can be as low as 5.5%.
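For readers unfamiliar with the metric, CER is commonly defined as the number of character insertions, deletions and substitutions needed to turn the automatic transcript into the correct one, divided by the number of characters in the correct transcript. The Python sketch below illustrates that common definition. The function names and the sample Swedish phrase are made up for illustration, and the exact calculation used inside Transkribus (for example, how whitespace or punctuation is normalised) may differ.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of single-character edits turning hypothesis into reference."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # character missing from the hypothesis
                curr[j - 1] + 1,     # extra character in the hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the reference transcript."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical example: a 20-character line with two misread characters.
reference = "trollen bor i berget"
hypothesis = "trolien bor i berqet"
print(f"CER: {character_error_rate(reference, hypothesis):.1%}")
```

Under this definition, a CER of 7.0% corresponds to roughly seven character errors per hundred characters of text.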

Dr Skott is excited about the possibilities: ‘Previously, I always thought that future generations would have difficulty reading the folklore collections. Now I know that they will find it easier to read the text than the present generation does. In short, the results of our tests with Transkribus are amazing. After manually transcribing just 150 pages, our HTR model now reads the folklore records better than many of our visitors do’.

The Institute for Language and Folklore is now working with these transcriptions to produce a digital map of myths and legends, which it plans to launch in autumn 2017.
