Early modern text transcription revolutionized by ethical machine learning tools

In recent years, digitization efforts have made sixteenth- and seventeenth-century printed books more widely available than ever before. Scholars can now search digital transcriptions for keywords without leaving their desks or visiting physical archives. Yet despite this ease of access, most digitized material remains untranscribed because of limits on time, labor, and funding.

Illuminated manuscript, Antiphonary, Santa Chiara (Naples), 16th century. Credit: Yair Haklai / CC BY-SA 4.0

A new article published in The Sixteenth Century Journal by Serena Strecker and Kimberly Lifton addresses both the technical and the ethical dimensions of this issue. The authors discuss alternatives to traditional transcription methods, which often relied on outsourced laborers—such as graduate students or workers—to manually transcribe historical texts.

Optical Character Recognition (OCR) software, while effective for transcribing late 19th- and 20th-century texts, is ill-suited to the inconsistencies common in early modern print. Early modern scholars have therefore turned increasingly to Handwritten Text Recognition (HTR) technology. Transkribus, which the article describes as the most effective HTR software, gives users access to public transcription models and lets them train their own, providing a new solution to the transcription challenge.

Strecker and Lifton conducted a case study using Transkribus on a sample of four sixteenth-century German exempla collections. Their results showed that even publicly available HTR models can produce highly accurate transcriptions of early modern printed text. Additionally, scholars who use the public Transkribus models to generate training data can then develop their own models, tailored to their source materials, in a five-step process.

Handwriting by Wilhelm Moritz Keferstein around 1864, examples of letters extracted from the handwritten chronicle of the Zoological Museum of Göttingen. Credit: F. Welter-Schultes

This approach, the authors argue, improves transcription accuracy while also addressing ethical concerns. It is “no longer necessary nor desirable” to employ outsourced workers, they write. Instead, they promote a shift toward empowering individual researchers to produce their own transcriptions, which avoids reinforcing inequalities in academia and reproducing the long-lasting effects of colonial labor practices.

Despite the promise of HTR, the authors are clear that the early modern academic community needs to discuss how this technology can be integrated into research workflows. “With the accurate and automated transcription of early modern print no longer a goal but a reality,” Strecker and Lifton conclude, “the field of early modern studies must consider what combination of human labor and machine learning technology will be accepted, supported, and will ultimately shape the future of research.”

They emphasize that future transcriptions must not only be technologically efficient but also uphold labor ethics. “Only by insisting on ethical labor practices can scholars avoid either exacerbating inequities within the academic hierarchy or perpetuating the lasting inequalities of colonialism.”

More information: Strecker, S., & Lifton, K. (2025). Unlocking the digitized archive of early modern print: The automatic transcription of early modern printed books. The Sixteenth Century Journal, 56(2), 395–419. doi:10.1086/735052
