WP1: Digital editions in the 21st century: theory and practice
What are the dominant theories of digital editions? What are the field’s ambitions and principles, and how do they relate to scholarly editing in the pre-digital age? From a technical perspective, is proposing an extension to the TEI schema the best way to address born-digital sources, or might there be alternatives? What computational methods to assist editing have been developed or used to date (e.g. variant analysis and named-entity recognition)? This first phase of the project will conduct secondary research analysis of existing digital scholarly editions, practices and theories, including interviews with five of the world’s leading theorists and practitioners of digital scholarly editing. The chief output will be a much-needed whitepaper synthesising the existing research and perspectives on digital scholarly editing, which will act as an essential resource for both the project team and the wider scholarly community. It will, in part, serve as a comprehensive review of the aims and objectives of digital scholarly editing as an endeavour, and of the technical systems and models that facilitate the development of digital editions, providing a compendium and roadmap for the future of the field.
WP2: C21 editions in practice: case studies
A key element of the project’s methodology will be to develop two scholarly digital editions, as case studies, that explore the different methods and techniques for editing born-digital sources and incorporating machine-assisted insights in the editing process. As such, the development of these editions will run concurrently and in dialogue with the research that will take place during WPs 3 and 4 below. Each edition has been chosen because of the challenges it raises for each of our two problem areas. Each edition will become a publicly accessible output and demonstrator of the project’s learnings.
1. Ní Churreáin. Annemarie Ní Churreáin has been described as a poet “speaking for a generation”. Raised in Donegal, she has served as Writer in Residence at Maynooth University and received both a Next Generation Artists Award and a John Broderick Residency Award from the Arts Council of Ireland. Her two full-length collections of poetry, Bloodroot (Doire Press 2017) and The Poison Glen (The Gallery Press 2021), have been critically acclaimed. To demonstrate how and why an author’s social media content should be featured as the focus of a digital critical edition, the C21 Editions project will develop a minimal computing prototype of Ní Churreáin’s Instagram posts.
2. The Canterbury Tales. No version of Chaucer’s incomplete work The Canterbury Tales exists in his own hand; instead we have 80+ manuscript witnesses that differ in narrative structure and text. In the early to late 1990s the British Academy and AHRB funded The Canterbury Tales Project (dhi.ac.uk/projects/canterbury-tales) as a series of CD-ROMs which were published by Cambridge University Press. Since then, there has been no online edition of The Canterbury Tales that can be used by students and teachers apart from utilitarian versions such as the Norman Blake Edition (chaucermss.org) and Project Gutenberg’s e-book. However, the University of Sheffield retains copyright of the full-text transcriptions for eight of the most important witnesses (PI Michael Pidd was one of the original staff of The Canterbury Tales Project). C21 Editions will develop a new online, open access version of The Canterbury Tales and explore how machine-assisted insights combined with external knowledge resources can be used to produce a single annotated text that is comprehensive, accurate, scholarly and insightful as a core resource for teaching and study.
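As a rough illustration of the kind of machine-assisted variant analysis that could support an edition built from multiple witnesses, the sketch below aligns two readings of the same line word by word and lists where they differ. It uses only Python's standard-library difflib. The function name collate and the two sample strings are assumptions made for this example: the spellings are invented for illustration and are not transcriptions from the project's data, and a real collation workflow would need normalisation and multi-witness alignment that this sketch omits.

```python
from difflib import SequenceMatcher


def collate(base, witness):
    """Return (base_reading, witness_reading) pairs where two witnesses differ.

    Hypothetical helper: aligns the texts word by word and keeps only the
    spans that are not identical in both.
    """
    base_words = base.split()
    witness_words = witness.split()
    # autojunk=False stops difflib from discarding frequent words as noise.
    matcher = SequenceMatcher(a=base_words, b=witness_words, autojunk=False)
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            variants.append(
                (" ".join(base_words[i1:i2]), " ".join(witness_words[j1:j2]))
            )
    return variants


# Invented sample readings, for illustration only.
witness_a = "Whan that Aueryll with his shoures soote"
witness_b = "Whan that Aprill with hise shoures soote"

for base_reading, witness_reading in collate(witness_a, witness_b):
    print(f"{base_reading!r} / {witness_reading!r}")
```

Word-level alignment is chosen here because it keeps each variant readable as a unit; character-level alignment would surface spelling differences more finely but produces noisier output.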
WP3: Digital editions in the 21st century: a user-led approach
A survey designed to better understand the expectations and use of digital editions finds that, when asked “what use would you make of the data published in a digital edition”, respondents most frequently cited “teaching”. A very close second is “text analysis”.8 These findings demonstrate that one of the main things that individuals expect from digital editions, the ability to use quantitative techniques on the curated materials, is typically not a feature of the editions. In some instances, users are permitted to download the materials, and can subsequently conduct their own analyses if they have the specialist expertise or resources required to do so; but in most cases, it is either impossible or prohibitively cumbersome to use the data contained in digital editions for computer-assisted text analysis. This WP will convene two User Design Groups of 10 people each—one group per case study—and undertake two inclusive design workshops per group in order to develop a better understanding of what different audiences of end-users want to be able to see and do with digital scholarly editions. Inclusive design methods will use focus groups, paper prototyping, and user experience testing of the two editions at each stage of their development. Outputs will include design recommendations for digital scholarly editions more broadly.
WP4: A framework for C21 editions
This WP aims to develop practical, technical proposals that will enable future editors to create digital scholarly editions that a) meet the broad user requirements identified during the WP3 inclusive design sessions; b) enable born-digital content to be included, and in a way that does justice to the unique characteristics of this type of content; c) make use of computer-assisted insights during the editing process so that feature-rich editions can be created more speedily and capitalise on the large online knowledge resources that now exist. The case studies will be used to develop, test and validate the following work:
1. The project will develop a proposal for editing standards that accommodate born-digital content. The proposal will be in the form of online user guidelines, a schema (for XML or equivalent), and a review of appropriate tools for implementing the schema. The guidelines and schema will need to cover all aspects of born-digital texts, such as their format, structure, non-linearity, interrelationships, ownership, anonymisation and temporality. The project starts from the premise that such a schema might serve as an extension to the TEI P5 XML schema, given that TEI accommodates born-analogue texts and is widely used by editors in the arts and humanities, although the outcomes of research in WP1 might favour an alternative encoding standard. Regardless, the guidelines will show how the schema for one data encoding standard can be translated into an alternative schema or data model such as OWL or JSON-LD. The case studies (WP2) will show how such a schema might be implemented in practice.
2. The project will develop a proposal for using machine-assisted insights during the process of edition making. The proposal will be in the form of online user guidance that describes different methods and approaches, and a toolkit of open-source software and cloud-based services. The choice of machine-assisted methods will depend on the outcomes of research in WP1 as well as the broad user requirements established in WP3, but one might envisage the following ML and NLP methods: topic modelling, sentiment analysis, entity recognition and entity linking (for building and implementing domain ontologies), network analysis, and automatic summarisation. Key criteria for this work will be the reproducibility, sustainability, low entry barrier (in terms of technical literacy) and wide applicability of the tools, methods and services. For example, a service that can query Web APIs, pull in contextual information from Wikipedia, Wikidata and elsewhere using ML-based comparisons or entity linking, and then rewrite it (auto-summarisation) as annotations for an edition might have broad value for editors. By contrast, a software tool that has been trained to compare and disambiguate Middle English spelling variants will have more limited value, unless it can easily be trained to do the same for modern spellings in any language.
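The translation between encoding standards described in item 1 can be pictured with a minimal sketch that reads two header fields from a TEI document and re-expresses them as JSON-LD. Everything about the mapping is an assumption made for illustration: the function name tei_to_jsonld, the choice of the schema.org vocabulary, and the decision to map only title and author are not taken from the project, and the sample is a header fragment rather than a schema-valid TEI file. Only the Python standard library is used.

```python
import json
import xml.etree.ElementTree as ET

# The TEI P5 namespace; the "tei" prefix is a local label for the queries below.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def tei_to_jsonld(tei_xml):
    """Map a TEI header's title and author to a JSON-LD document.

    Hypothetical mapping: uses schema.org's CreativeWork and Person types,
    which is one possible target vocabulary among many.
    """
    root = ET.fromstring(tei_xml)
    title = root.findtext(".//tei:titleStmt/tei:title", namespaces=TEI_NS)
    author = root.findtext(".//tei:titleStmt/tei:author", namespaces=TEI_NS)
    return {
        "@context": "https://schema.org",
        "@type": "CreativeWork",
        "name": title,
        "author": {"@type": "Person", "name": author},
    }


# Invented fragment for illustration; not a complete or valid TEI document.
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt>
    <title>Example post</title>
    <author>Example Author</author>
  </titleStmt></fileDesc></teiHeader>
</TEI>"""

print(json.dumps(tei_to_jsonld(sample), indent=2))
```

A field-by-field mapping like this is easy to read and to document in user guidelines, but it loses anything the target vocabulary cannot express, which is one reason a fuller crosswalk between schemas would need to be specified explicitly.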