Early modern text transcription revolutionized by ethical machine learning tools

In recent years, digitization efforts have made sixteenth- and seventeenth-century printed books more widely available than ever before. Scholars can now search digital transcriptions for keywords from their desks, without having to visit physical archives. Yet however easy access has become, most digitized material remains untranscribed because of limited time, labor, and funding.

Illuminated manuscript, Antiphonary, Santa Chiara (Naples), 16th century. Credit: Yair Haklai / CC BY-SA 4.0

A new article published in The Sixteenth Century Journal by Serena Strecker and Kimberly Lifton addresses both the technical and the ethical dimensions of this problem. The authors discuss alternatives to traditional transcription methods, which often relied on outsourced labor, such as graduate students or other workers, to transcribe historical texts by hand.

Optical Character Recognition (OCR) software, while effective for late 19th- and 20th-century texts, is poorly suited to the inconsistencies common in early modern print. Early modern scholars have therefore turned increasingly to Handwritten Text Recognition (HTR) technology. Transkribus, the most effective HTR software, lets users either apply publicly available transcription models or train their own, offering a new solution to the transcription problem.

Strecker and Lifton conducted a case study using Transkribus on a sample of four sixteenth-century German exempla collections. Their experiments showed that even publicly available HTR models can produce highly accurate transcriptions of early modern printed text. Scholars can also use Transkribus's public models to generate training data and then, through a five-step process, develop their own models tailored to their source materials.

Handwriting by Wilhelm Moritz Keferstein, around 1864: sample letters extracted from the handwritten chronicle of the Zoological Museum of Göttingen. Credit: F. Welter-Schultes

This approach not only improves transcription accuracy but also addresses the ethical problem. It is “no longer necessary nor desirable” to employ outsourced workers, the authors argue. Instead, they advocate empowering individual researchers to produce their own transcriptions, an approach that avoids reinforcing inequalities within academia or reproducing the lasting effects of colonial labor practices.

Despite HTR's promise, the authors stress that the early modern academic community still needs to discuss how the technology should be integrated into research workflows. “With the accurate and automated transcription of early modern print no longer a goal but a reality,” Strecker and Lifton conclude, “the field of early modern studies must consider what combination of human labor and machine learning technology will be accepted, supported, and will ultimately shape the future of research.”

They emphasize that future transcription practices must be not only technologically efficient but also ethical in how they use labor. “Only by insisting on ethical labor practices can scholars avoid either exacerbating inequities within the academic hierarchy or perpetuating the lasting inequalities of colonialism.”

More information: Strecker, S., & Lifton, K. (2025). Unlocking the digitized archive of early modern print: The automatic transcription of early modern printed books. The Sixteenth Century Journal, 56(2), 395–419. doi:10.1086/735052
