Text2Image

The Text2Image (Text-to-Image) tool tries to align existing page-level transcriptions with a line segmentation, so that each line of the existing text is assigned to the line in the image it belongs to.
Currently, you have to follow a 2-step approach using the Expert Client of Transkribus:

1 – Upload of text files:
Existing transcriptions can currently be specified during upload of the document as separate txt files in a subfolder called ‘txt’. Note that each txt file must have the same basename as the corresponding image file.
Those transcriptions are then stored in ‘dummy lines’, i.e. lines with the size of the image.
To upload text files for existing documents, go to “Menu -> Document -> Sync local text files with doc…” in the Expert Client.
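
Because the upload depends on each txt file sharing its basename with an image, it can help to check a local folder before uploading. The following is a minimal sketch, not part of Transkribus: the function name `find_unpaired_files` and the list of image extensions are assumptions made for illustration, and the only layout rule taken from the text above is the ‘txt’ subfolder with matching basenames.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to the formats you actually upload.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def find_unpaired_files(doc_dir):
    """Return (images without a txt file, txt files without an image).

    Files are compared by basename, i.e. the file name without its extension.
    """
    doc_dir = Path(doc_dir)
    txt_dir = doc_dir / "txt"

    # Basenames of all image files directly inside the document folder.
    image_stems = {
        p.stem
        for p in doc_dir.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    }

    # Basenames of all txt files inside the 'txt' subfolder (if it exists).
    txt_stems = (
        {p.stem for p in txt_dir.glob("*.txt")} if txt_dir.is_dir() else set()
    )

    return sorted(image_stems - txt_stems), sorted(txt_stems - image_stems)
```

Calling `find_unpaired_files("my_document")` on a hypothetical folder returns two lists: images that have no transcription yet, and txt files whose basename does not correspond to any image.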

2 – Start matching process:
To use Text2Image in the Expert Client, go to “Tools -> Other Tools -> Text2Image…”. The dialog has the following options:

  • Base model: a model must be selected which first performs an HTR – the resulting text is then compared with the input text to find a match.
  • Perform Layout Analysis: whether to perform a baseline detection before the HTR (if not selected, the existing baselines are used)
  • Keep unmatched lines: whether to keep the text from the lines of the HTR that could not be matched
  • Preserve line order: whether to preserve the line order of the input text during matching
  • Write similarity tag: whether to write a similarity tag with an accuracy value within the custom tag of each matched line
  • Region threshold: threshold for block-based matching (ranges between 0 and 1) – as a first step, the whole text of a page is assigned to a region according to this threshold – set this value to 0 to try matching the input text to each region (which leads to higher computational cost)
  • Line threshold: threshold for the line matching (ranges between 0 and 1) – after the text has been matched to a region, the input lines are matched to the lines of this region using this threshold – 0.45 is usually a good value for longer lines; for shorter lines, try a higher value, e.g. 0.7 or even 0.9, for better matching quality
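
To get a feeling for what the line threshold does, the idea can be sketched in a few lines of Python. This is only an illustration and not the algorithm Transkribus uses: the similarity measure (Python's `difflib.SequenceMatcher`), the function names and the greedy best-match strategy are all assumptions, and the sketch ignores the region threshold and the line order option.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity in [0, 1]; a stand-in, not the measure used by Transkribus."""
    return SequenceMatcher(None, a, b).ratio()


def match_lines(input_lines, htr_lines, line_threshold=0.45):
    """Pair each HTR line with its most similar input line.

    A pair is kept only if its similarity reaches line_threshold.
    Returns a list of (htr_index, input_index, score) tuples.
    """
    matches = []
    for htr_index, htr_text in enumerate(htr_lines):
        best_index, best_score = None, 0.0
        for input_index, input_text in enumerate(input_lines):
            score = similarity(htr_text, input_text)
            if score > best_score:
                best_index, best_score = input_index, score
        # HTR lines whose best candidate stays below the threshold are unmatched.
        if best_index is not None and best_score >= line_threshold:
            matches.append((htr_index, best_index, best_score))
    return matches
```

With a low threshold, an imperfect HTR line such as "In the yeer of our Lord" is still paired with the input line "In the year of our Lord"; with a threshold close to 1, the same pair is rejected. Short lines share fewer characters overall, so a chance overlap weighs more heavily, which is one way to read the advice to raise the threshold for them.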
