Early modern text transcription revolutionized by ethical machine learning tools

Over recent years, digitization efforts have made sixteenth- and seventeenth-century printed books more widely available than ever before. Scholars can now search digital transcriptions for keywords without leaving their desks or visiting physical archives. Yet however easy access has become, most digitized material remains untranscribed because of limited time, labor, and funds.

Illuminated manuscript, Antiphonary, Santa Chiara (Naples), 16th century. Credit: Yair Haklai / CC BY-SA 4.0

A new article published in The Sixteenth Century Journal by Serena Strecker and Kimberly Lifton addresses both the technical and the ethical dimensions of this issue. The authors discuss alternatives to traditional transcription methods, which often relied on outsourced laborers—such as graduate students or workers—to manually transcribe historical texts.

Optical Character Recognition (OCR) software, while effective for transcribing late nineteenth- and twentieth-century texts, is poorly suited to the inconsistencies common in early modern print. Early modern scholars have therefore turned increasingly to Handwritten Text Recognition (HTR) technology. Transkribus, the most effective HTR software, offers access to public transcription models and lets users train their own, providing a new solution to the transcription challenge.

Strecker and Lifton conducted a case study using Transkribus on a sample of four sixteenth-century German exempla collections. Their experiments showed that even publicly available HTR models can generate very accurate transcriptions of early modern printed text. Additionally, scholars who use the public Transkribus models to generate training data can then develop their own models, tailored to their source materials, in a five-step process.

Handwriting by Wilhelm Moritz Keferstein around 1864, examples of letters extracted from the handwritten chronicle of the Zoological Museum of Göttingen. Credit: F. Welter-Schultes

This approach not only maximizes transcription accuracy but also guarantees ethical compliance. It is “no longer necessary nor desirable” to employ outsourced workers, the authors argue. Instead, they promote a shift toward empowering individual researchers to produce their own transcriptions, which avoids reinforcing inequalities in academia and reproducing the long-lasting effects of colonial labor practices.

Despite the promise of HTR, the authors are clear that the early modern academic community needs to discuss how this technology can be integrated into research workflows. “With the accurate and automated transcription of early modern print no longer a goal but a reality,” Strecker and Lifton conclude, “the field of early modern studies must consider what combination of human labor and machine learning technology will be accepted, supported, and will ultimately shape the future of research.”

They emphasize that future transcriptions must not only be technologically efficient but also uphold labor ethics. “Only by insisting on ethical labor practices can scholars avoid either exacerbating inequities within the academic hierarchy or perpetuating the lasting inequalities of colonialism.”

More information: Strecker, S., & Lifton, K. (2025). Unlocking the digitized archive of early modern print: The automatic transcription of early modern printed books. The Sixteenth Century Journal, 56(2), 395–419. doi:10.1086/735052
