Introducing Text Titan II

Our most accurate transcription model yet: one multilingual model for print and handwriting across Latin scripts.
We are introducing Text Titan II, the successor to Text Titan I and the most accurate general-purpose model we have ever built. It transcribes both printed and handwritten documents across the major Latin-script languages, and it sets a new accuracy standard on the kind of material our users work with on Transkribus every day. Trained on more than 250 million words, Text Titan II delivers robust out-of-the-box performance.
Text Titan II reduces character error rates by an average of around 47% relative to Text Titan I across Latin scripts. We see large gains not only on print but also on handwritten material, the hardest and most common use case in our community collections.
What's new
Text Titan I established that a single model could handle a wide range of documents without forcing users to hunt for the right specialised model first. Text Titan II takes that further. It is more accurate across every language and document type we tested, even compared to our previous best language-specific Super Models. In practice, that means most users can reach for one model and trust it.
Both handwritten and print material see error rates roughly halved. Output that previously needed substantial correction is now closer to usable as it comes off the model, which changes the economics of large transcription projects.
The results
We evaluated Text Titan II on an internal benchmark of 2,016 pages, sampled to be representative of the material processed on the Transkribus platform and manually verified to ground-truth quality. The set is 81% handwriting and 19% print, with a language mix reflecting platform usage (34% German, 13% English, 12% Dutch, 10% French).
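For readers less familiar with the metric, here is a minimal sketch of how a character error rate (CER) and a relative reduction between two models are commonly computed. It is not the evaluation code used for this benchmark. The function names are hypothetical, the example figures are made up for illustration, and how the per-page or per-language results were averaged is not specified here.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and
    substitutions needed to turn `hyp` into `ref`."""
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning an empty hypothesis into ref[:i] takes i insertions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # character missing in hypothesis
                            curr[j - 1] + 1,      # extra character in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


def relative_reduction(old_cer: float, new_cer: float) -> float:
    """Fraction by which the error rate dropped, relative to the old rate."""
    if old_cer <= 0:
        raise ValueError("old_cer must be positive")
    return (old_cer - new_cer) / old_cer


# Illustrative values only, not measured results.
print(cer("abcd", "abxd"))                # one substitution in four characters
print(relative_reduction(0.100, 0.053))   # a drop from 10.0% to 5.3% CER
```

Under this definition, a relative reduction of about 47% means that a model which previously made roughly ten character errors per hundred characters would now make roughly five.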
Most of the world’s documentary heritage is handwritten and irregular. Across the centuries it has comprised hundreds of obscure languages and scripts. It is exactly this material that general-purpose tools struggle with. Unlike those tools, Text Titan II is built specifically around these archaic languages, scripts and documents. Progress on these benchmarks is difficult, but fundamental to unlocking our written past. It is the difference between a collection that can be searched, studied, and preserved, and one that stays locked away forever.
That is the problem Transkribus exists to solve, and it is why we measure ourselves on real platform material rather than on simpler, more modern test sets, or on those composed of a handful of historical collections. A benchmark drawn from the documents our users actually bring us is the harder test, as it draws from one of the most diverse collections of historical material in the world.
A note on how we build
Transkribus is operated by READ-COOP, a cooperative owned by the institutions and individuals who use it. Improvements to our models are not built to maximise a metric for a quarterly announcement; they are built for the archives, libraries, researchers, and family historians who are also our members. Text Titan II is a step in a roadmap we share with the people it serves.
Availability
Text Titan II is rolling out on the Transkribus platform. We recommend it as the default starting point for most Latin-script documents, printed or handwritten. For specialised scripts, particular historical hands, or languages outside its current coverage, a dedicated model may still give better results, and those models remain available.
This is an early version of Text Titan II, and accuracy will continue to improve as training progresses. We will keep you updated as it does, and as always, the best test is your own documents.


