Early modern text transcription revolutionized by ethical machine learning tools

Over recent years, digitization efforts have made sixteenth- and seventeenth-century printed books more widely available than ever before. Scholars can now search digital transcriptions for keywords without leaving their desks or visiting physical archives. Yet despite this ease of access, most digitized material remains untranscribed because of limits on time, labor, and funding.

Illuminated manuscript, Antiphonary, Santa Chiara (Naples), 16th century. Credit: Yair Haklai / CC BY-SA 4.0

A new article published in The Sixteenth Century Journal by Serena Strecker and Kimberly Lifton addresses both the technical and the ethical dimensions of this issue. The authors discuss alternatives to traditional transcription methods, which often relied on outsourced laborers, such as graduate students or other workers, to manually transcribe historical texts.

Optical Character Recognition (OCR) software, while effective for transcribing late 19th- and 20th-century texts, is poorly suited to the inconsistencies common in early modern print. Early modern scholars have thus turned increasingly to Handwritten Text Recognition (HTR) technology. Transkribus, the most effective HTR software, lets users apply publicly available transcription models or train their own, providing a new solution to the transcription challenge.

Strecker and Lifton conducted a case study using Transkribus on a sample of four sixteenth-century German exempla collections. Their experiments showed that even publicly available HTR models can generate very accurate transcriptions of early modern printed text. Additionally, scholars who use the public Transkribus models to generate training data can develop their own models, tailored to their source materials, in a five-step process.

Handwriting by Wilhelm Moritz Keferstein around 1864, examples of letters extracted from the handwritten chronicle of the Zoological Museum of Göttingen. Credit: F. Welter-Schultes

This approach not only maximizes transcription accuracy but also addresses the ethical concerns raised by outsourced transcription. It is “no longer necessary nor desirable” to employ outsourced workers, the authors argue. Instead, they promote a shift toward empowering individual researchers to produce their own transcriptions, which avoids both reinforcing inequalities in academia and reproducing the long-lasting effects of colonial labor practices.

Despite the promise of HTR, the authors are clear that the early modern academic community needs to discuss how this technology can be integrated into research workflows. “With the accurate and automated transcription of early modern print no longer a goal but a reality,” Strecker and Lifton conclude, “the field of early modern studies must consider what combination of human labor and machine learning technology will be accepted, supported, and will ultimately shape the future of research.”

They emphasize that future transcriptions must not only be technologically efficient but also uphold labor ethics. “Only by insisting on ethical labor practices can scholars avoid either exacerbating inequities within the academic hierarchy or perpetuating the lasting inequalities of colonialism.”

More information: Strecker, S., & Lifton, K. (2025). Unlocking the digitized archive of early modern print: The automatic transcription of early modern printed books. The Sixteenth Century Journal, 56(2), 395–419. doi:10.1086/735052