Early modern text transcription revolutionized by ethical machine learning tools

In recent years, digitization efforts have made sixteenth- and seventeenth-century printed books more widely available than ever before. Where transcriptions exist, scholars can search them for keywords without leaving their desks or visiting physical archives. Yet despite this ease of access, most digitized material remains untranscribed because of limited time, labor, and funds.

Illuminated manuscript, Antiphonary, Santa Chiara (Naples), 16th century. Credit: Yair Haklai / CC BY-SA 4.0

A new article published in The Sixteenth Century Journal by Serena Strecker and Kimberly Lifton addresses both the technical and the ethical dimensions of this issue. The authors discuss alternatives to traditional transcription methods, which often relied on outsourced laborers—such as graduate students or workers—to manually transcribe historical texts.

Optical Character Recognition (OCR) software, while effective for transcribing late 19th- and 20th-century texts, is poorly suited to the inconsistencies common in early modern print. Early modern scholars have therefore turned increasingly to Handwritten Text Recognition (HTR) technology. Transkribus, the most effective HTR software, gives users access to public transcription models and lets them train their own, offering a new solution to the transcription challenge.

Strecker and Lifton conducted a case study using Transkribus on a sample of four sixteenth-century German exempla collections. Their experiments showed that even publicly available HTR models can generate very accurate transcriptions of early modern printed text. Additionally, scholars who use the public Transkribus models to generate training data can then develop their own models, tailored to their source materials, in a five-step process.

Handwriting by Wilhelm Moritz Keferstein around 1864, examples of letters extracted from the handwritten chronicle of the Zoological Museum of Göttingen. Credit: F. Welter-Schultes

This approach not only maximizes transcription accuracy but also guarantees ethical compliance. It is “no longer necessary nor desirable” to employ outsourced workers, the authors argue. Instead, they promote a shift toward empowering individual researchers to produce their own transcriptions, which avoids reinforcing inequalities in academia and reproducing the long-lasting effects of colonial labor practices.

Despite the promise of HTR, the authors are clear that the early modern academic community needs to discuss how this technology can be integrated into research workflows. “With the accurate and automated transcription of early modern print no longer a goal but a reality,” Strecker and Lifton conclude, “the field of early modern studies must consider what combination of human labor and machine learning technology will be accepted, supported, and will ultimately shape the future of research.”

They emphasize that future transcriptions must not only be technologically efficient but also uphold labor ethics. “Only by insisting on ethical labor practices can scholars avoid either exacerbating inequities within the academic hierarchy or perpetuating the lasting inequalities of colonialism.”

More information: Strecker, S., & Lifton, K. (2025). Unlocking the digitized archive of early modern print: The automatic transcription of early modern printed books. The Sixteenth Century Journal, 56(2), 395–419. doi:10.1086/735052
