My first encounter with Transkribus was driven by envy. In 2014 I helped to draft an unsuccessful Horizon 2020 proposal at Trinity College Dublin for a call that was won by the READ consortium. I subsequently became involved in a major transcription project for the Irish Manuscripts Commission and attended READ’s 2018 user conference in Innsbruck to establish (1) whether Transkribus worked and (2), expecting that it did not, how READ’s original proposal had been better than ours.
Transkribus, of course, worked very well and is now a key component of a suite of software tools we are using to realise Beyond 2022, Ireland’s virtual record treasury. On 30 June 1922, at the outbreak of Ireland’s civil war that followed independence from the United Kingdom, the Public Record Office of Ireland was completely destroyed. On the centenary of this cultural catastrophe, 30 June 2022, the Beyond 2022 Project, funded by the government of Ireland and based at Trinity College Dublin, will unveil a virtual reconstruction of this building and digital surrogates of many of its contents. These surrogates are official or academic transcriptions sourced from over 50 libraries and archives around the world. Although some of these copies are printed, the majority are manuscript and range from contemporary 13th-century transcriptions to official copies made throughout the 19th century. The material is mainly in English or Latin.
We began by producing bespoke HTR models for specific large series of transcriptions, principally unpublished calendars of early material produced by the Irish Record Commissioners, 1810-1830. As these are written in a consistent copperplate hand to very high standards, the results from Transkribus are excellent. Our next steps were to produce models tailored to the more cursive hands used by Victorian antiquaries. These enthusiasts sometimes produced 10,000 pages of transcriptions, entirely on their own and for their own research. We are fortunate to have found several collections of these transcriptions made for private research in libraries as far afield as Chicago. The senior officials who ran Ireland on behalf of the British crown normally made copies of the official records produced during their tenure and left the copies in Ireland when they moved on, taking the originals with them. These also consist of large collections of around 10,000 pages, usually the work of one or two officials working carefully, and thus also ideal for a Transkribus approach. We have recently bundled our hand-specific models into a single base model that produces excellent results from most official documents in English, 1600-1900. This model will be made publicly available to all Transkribus users on 30 June 2021, the 99th anniversary of the destruction of the Public Record Office of Ireland.
The Beyond 2022 workflow consists of receiving digital images of historical printed and manuscript texts from our archival partners, ‘stitching’ these documents to their destroyed equivalents and placing them back on the virtual shelves of the PROI. We have produced a detailed 3D rendered model of the building and recovered the shelving arrangements for the 140,000 folders, boxes, bound volumes and parchment rolls that were its contents. We can, therefore, place each item at the exact location in the building where the original was destroyed. With Transkribus, we can produce high-quality searchable text. This text is then analysed by our own Natural Language Processing system, which in turn produces triples of entities that are the basis for a knowledge graph for Irish history. This final step is not possible without Transkribus sitting in the middle, producing millions of words of high-quality text.
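For readers unfamiliar with the term, a triple is a statement of the form subject, predicate, object, and a knowledge graph is a collection of such statements linked through the entities they share. The short Python sketch below is purely illustrative and is not the project's actual system: the `Triple` type, the `build_graph` function and the predicate names are all assumptions made for this example, and the three statements are restated from this post by hand rather than extracted from real transcriptions.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement (hypothetical type for this sketch)."""
    subject: str
    predicate: str
    obj: str


def build_graph(triples):
    """Index triples by subject so each entity maps to its outgoing statements."""
    graph = defaultdict(list)
    for t in triples:
        graph[t.subject].append((t.predicate, t.obj))
    return dict(graph)


# Hand-written example statements; a real NLP step would derive these from text.
triples = [
    Triple("Public Record Office of Ireland", "located_in", "Dublin"),
    Triple("Public Record Office of Ireland", "destroyed_on", "1922-06-30"),
    Triple("Irish Record Commissioners", "produced", "calendars"),
]

graph = build_graph(triples)

# List everything the graph records about one entity.
for predicate, obj in graph["Public Record Office of Ireland"]:
    print(predicate, obj)
```

Because every statement has the same three-part shape, triples extracted from millions of words of transcription can be merged into one graph and queried entity by entity.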
Trinity College Library joined the READ-COOP on behalf of the university as READ transitioned from an EU-funded research project to an independently funded cooperative. Beyond 2022 is one of several projects based at Trinity that use the COOP’s services, and many more are on the way.
For more on Beyond 2022 see our website: https://beyond2022.ie/