'Channels of Digital Scholarship' Seminar I: 'New tools and old questions in the analysis of textual corpora'

‘Greek and Latin corpora’

Convenors: Olivier Delouis, Christophe Gaillac

The study of Greek and Latin is increasingly concerned with corpus analysis. Digital text collections have grown dramatically over the last twenty years. Almost all ancient and medieval Greek literature, from Homer to the Fall of Constantinople in 1453, is now digitised in the "Thesaurus Linguae Graecae" or TLG (University of California, Irvine, founded in 1972), while on the Latin side the "Library of Latin Texts" on the Brepolis platform offers an immense range of texts from the beginnings of Latin literature to the present day (Brepols Publishers, first released in 1991). These established corpora, together with a growing number of open-access collections, are regularly surveyed, and they support studies that are generally confined to a single corpus (see for instance Monica Berti (ed.), Digital Classical Philology: Ancient Greek and Latin in the Digital Revolution, 2019).

Many methods and tools are available for the analysis of modern languages. However, Natural Language Processing (NLP), the branch of artificial intelligence that enables computers to process and understand human language, remains underdeveloped for classical languages. Many techniques now common in the analysis of modern corpora, such as deep learning approaches, convolutional and recurrent neural networks, contextual language models and, more recently, Bidirectional Encoder Representations from Transformers (BERT), are still far from routine use in the classical humanities.

In this seminar, we present the work of scholars engaged in cross-disciplinary collaboration between the study of classical literature and NLP.


Thibault Clérice (École nationale des chartes, PSL, Paris) – “Detecting sexual isotopies in Latin corpora: setting up an experiment and first results”
Marianne Reboul (IHRIM – UMR 5318 & ENS Lyon) – “Homer and Machine Learning: translation alignment on the Iliad and the Odyssey”
Thea Sommerschield (Marie Curie Fellow, Ca' Foscari University of Venice) – “Greek epigraphic data for Machine Learning applications: the Ithaca project”

'Channels of Digital Scholarship' Seminar

Research in the digital humanities has grown explosively over the last ten years. Two factors have driven this progress: first, the strong engagement of scientific and scholarly communities with this emerging field across all areas of the humanities; second, the extraordinary progress of digital tools and capabilities. The result has been a profusion of initiatives at every level: major digitisation projects led by libraries and academic institutions, the digitisation of corpora of all kinds and from all periods, and numerous research projects with targeted objectives.

The aim of this first Channels of Digital Scholarship seminar series is to reflect on new avenues for the analysis and use of textual corpora. Textual corpora and their uses pose challenges for the development and validation of digital analysis tools, for dialogue between disciplines, and for the institutional structures that support the wide range of projects now under way. In this series of four seminars, the Maison Française d'Oxford and Digital Scholarship @ Oxford, with the help of leaders of digital humanities initiatives in the CIVIS network, propose to explore these challenges from Franco-British and international perspectives.

Convenors: Goran Gaber (EHESS, LIER-FYT), Andrew Cusworth (Digital Scholarship, Oxford), Christophe Gaillac (Nuffield College, Oxford), Pascal Marty, Olivier Delouis, Tristan Alonge (MFO), Grégoire Lacaze (MFO/Aix Marseille).

