Work Packages

WP1: Digital editions in the 21st century: theory and practice 

What are the dominant theories of digital editions? What are the field’s ambitions and principles, and how do they relate to scholarly editing in the pre-digital age? From a technical perspective, is proposing an extension to the TEI schema the best way to address born-digital sources, or are there alternatives? What computational methods to assist editing (e.g. variant analysis and named-entity recognition) have been developed or used to date? This first phase of the project will conduct a secondary research analysis of existing digital scholarly editions, practices and theories, including interviews with five of the world’s leading theorists and practitioners of digital scholarly editing. The chief output will be a much-needed whitepaper synthesising existing research and perspectives on digital scholarly editing, which will act as an essential resource for both the project team and the wider scholarly community. It will serve in part as a comprehensive review of the aims and objectives of digital scholarly editing as an endeavour, and of the technical systems and models that support the development of digital editions, providing a vital compendium and roadmap for the future of the field.

WP2: C21 editions in practice: case studies 

A key element of the project’s methodology will be to develop two scholarly digital editions as case studies that explore different methods and techniques for editing born-digital sources and for incorporating machine-assisted insights into the editing process. The development of these editions will therefore run concurrently, and in dialogue, with the research in WPs 3 and 4 below. Each edition has been chosen for the challenges it raises in relation to our two problem areas. Each edition will become a publicly accessible output and a demonstrator of the project’s learnings.

1. Ní Churreáin. Annemarie Ní Churreáin has been described as a poet “speaking for a generation”. Raised in Donegal, she has served as Writer in Residence at Maynooth University and has received both a Next Generation Artists Award and a John Broderick Residency Award from the Arts Council of Ireland. Her two full-length collections of poetry, Bloodroot (Doire Press 2017) and The Poison Glen (The Gallery Press 2021), have been critically acclaimed. To demonstrate how and why an author’s social media content should be the focus of a digital critical edition, the C21 Editions project will develop a minimal computing prototype edition of Ní Churreáin’s Instagram posts.

2. The Canterbury Tales. No version of Chaucer’s incomplete work The Canterbury Tales exists in his own hand; instead we have more than 80 manuscript witnesses that differ in both narrative structure and text. From the early to the late 1990s, the British Academy and the AHRB funded The Canterbury Tales Project (dhi.ac.uk/projects/canterbury-tales), whose outputs were published by Cambridge University Press as a series of CD-ROMs. Since then, there has been no online edition of The Canterbury Tales suitable for students and teachers, apart from utilitarian versions such as the Norman Blake Edition (chaucermss.org) and Project Gutenberg’s e-book. The University of Sheffield does, however, retain copyright of the full-text transcriptions for eight of the most important witnesses (PI Michael Pidd was one of the original staff of The Canterbury Tales Project). C21 Editions will develop a new online, open access version of The Canterbury Tales and explore how machine-assisted insights, combined with external knowledge resources, can be used to produce a single annotated text that is comprehensive, accurate, scholarly and insightful, as a core resource for teaching and study.
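Comparing witnesses is one of the most familiar places for machine assistance in editing. As a minimal sketch (not the project’s method), word-level collation of two readings can be done with Python’s standard-library `difflib.SequenceMatcher`, which reports each point where a witness departs from a base text. The `collate` helper is hypothetical, and the two readings of the opening line are illustrative spellings rather than transcriptions of particular manuscripts; dedicated collation tools such as CollateX handle many witnesses, transpositions and normalisation far more robustly.

```python
from difflib import SequenceMatcher


def collate(base: str, witness: str) -> list[tuple[str, str, str]]:
    """Hypothetical helper: return (operation, base_reading, witness_reading)
    for every point where the witness differs from the base text."""
    a, b = base.split(), witness.split()  # naive whitespace tokenisation
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    variants = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":  # keep only replace / delete / insert
            variants.append((op, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return variants


# Illustrative readings only, not transcriptions of specific witnesses.
base = "Whan that Aprill with his shoures soote"
witness = "Whan that Aprille with hise shoures sote"

for op, base_reading, wit_reading in collate(base, witness):
    print(f"{op:8} {base_reading!r} -> {wit_reading!r}")
```

On these inputs the sketch reports three substitutions (Aprill/Aprille, his/hise, soote/sote). Deciding which of these are purely orthographic and which are substantive is where editorial judgement, or a trained normaliser, would come in.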

WP3: Digital editions in the 21st century: a user-led approach 

A survey designed to better understand the expectations and use of digital editions finds that, when asked “what use would you make of the data published in a digital edition”, respondents most frequently cited “teaching”, followed very closely by “text analysis”.8 These findings indicate that one of the main things individuals expect from digital editions, namely the ability to apply quantitative techniques to the curated materials, is typically not a feature of those editions. In some instances users are permitted to download the materials and can then conduct their own analyses if they have the specialist expertise or resources required; in most cases, however, it is either impossible or prohibitively cumbersome to use the data contained in digital editions for computer-assisted text analysis. This WP will convene two User Design Groups of 10 people each (one group per case study) and run two inclusive design workshops per group, in order to develop a better understanding of what different audiences of end-users want to be able to see and do with digital scholarly editions. Inclusive design methods will include focus groups, paper prototyping, and user experience testing of the two editions at each stage of their development. Outputs will include design recommendations for digital scholarly editions more broadly.
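To illustrate the kind of analysis users report wanting, the sketch below counts word frequencies in verse lines extracted from a downloaded TEI file, using only Python’s standard library. The abbreviated TEI fragment (its `teiHeader` is omitted) and the `word_frequencies` helper are assumptions for illustration; real edition data would need handling for abbreviations, apparatus and other markup before such counts become meaningful. Even this small step presupposes that the edition exposes its encoded text for download, which many editions do not.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical, abbreviated TEI fragment (teiHeader omitted for brevity).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <l n="1">Whan that Aprill with his shoures soote</l>
    <l n="2">The droghte of March hath perced to the roote</l>
  </body></text>
</TEI>"""


def word_frequencies(tei_xml: str) -> Counter:
    """Hypothetical helper: count lower-cased word tokens in every TEI <l>."""
    root = ET.fromstring(tei_xml)
    words = Counter()
    for line in root.iter(f"{{{TEI_NS}}}l"):  # namespaced element name
        text = "".join(line.itertext()).lower()  # flatten any inline markup
        words.update(re.findall(r"\w+", text))
    return words


print(word_frequencies(SAMPLE_TEI).most_common(3))
```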

WP4: A framework for C21 editions 

This WP aims to develop practical, technical proposals that will enable future editors to create digital scholarly editions that a) meet the broad user requirements identified during the WP3 inclusive design sessions; b) include born-digital content in a way that does justice to the unique characteristics of this type of content; and c) make use of computer-assisted insights during the editing process, so that feature-rich editions can be created more quickly and can capitalise on the large online knowledge resources that now exist. The case studies will be used to develop, test and validate the following work:

1. The project will develop a proposal for editing standards that accommodate born-digital content. The proposal will take the form of online user guidelines, a schema (for XML or equivalent), and a review of appropriate tools for implementing the schema. The guidelines and schema will need to cover all aspects of born-digital texts, such as their format, structure, non-linearity, interrelationships, ownership, anonymisation and temporality. The project starts from the premise that such a schema might serve as an extension to the TEI P5 XML schema, given that TEI accommodates born-analogue texts and is widely used by editors in the arts and humanities, although the outcomes of research in WP1 might favour an alternative encoding standard. Regardless, the guidelines will show how the schema for one data encoding standard can be translated into an alternative schema or data model, such as OWL or JSON-LD. The case studies (WP2) will show how such a schema might be implemented in practice.
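As a rough illustration of translating between encodings, the sketch below reads a single social media post encoded with existing TEI elements and re-expresses it as JSON-LD using the schema.org `SocialMediaPosting` type. The encoding (a `div` with `type="post"`), the placeholder date, caption and URL, and the `post_to_jsonld` helper are all hypothetical; the project’s proposed schema might define dedicated elements, and a full mapping would also need to cover authorship, media, comments and rights.

```python
import json
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # expanded name of xml:id

# Hypothetical encoding of one post using existing TEI elements;
# the date, caption and URL are placeholders, not real data.
SAMPLE_POST = """<div xmlns="http://www.tei-c.org/ns/1.0" type="post"
     xml:id="post-001">
  <head><date when="2021-06-01">1 June 2021</date></head>
  <p>[caption text]</p>
  <ref target="https://example.org/posts/001"/>
</div>"""


def post_to_jsonld(tei_xml: str) -> dict:
    """Hypothetical mapping from a TEI-encoded post to schema.org JSON-LD."""
    div = ET.fromstring(tei_xml)
    date = div.find("tei:head/tei:date", TEI_NS)
    ref = div.find("tei:ref", TEI_NS)
    # Join the text of all paragraphs as the post body.
    body = " ".join(
        "".join(p.itertext()).strip() for p in div.findall("tei:p", TEI_NS)
    )
    return {
        "@context": "https://schema.org",
        "@type": "SocialMediaPosting",
        "@id": "#" + div.get(XML_ID),
        "datePublished": date.get("when"),
        "text": body,
        "url": ref.get("target"),
    }


print(json.dumps(post_to_jsonld(SAMPLE_POST), indent=2))
```

Keeping the mapping in one small function makes it easy to see which TEI features have no counterpart in the target vocabulary, and therefore what guidelines for such a translation would need to document.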

2. The project will develop a proposal for using machine-assisted insights during the process of edition-making. The proposal will take the form of online user guidance that describes different methods and approaches, and a toolkit of open-source software and cloud-based services. The choice of machine-assisted methods will depend on the outcomes of research in WP1 and on the broad user requirements established in WP3, but one might envisage the following ML and NLP methods: topic modelling, sentiment analysis, entity recognition and entity linking (for building and implementing domain ontologies), network analysis, and automatic summarisation. Key criteria for this work will be the reproducibility, sustainability, low entry barrier (in terms of technical literacy) and wide applicability of the tools, methods and services. For example, whilst a service that can query Web APIs, pull in contextual information from Wikipedia, Wikidata and elsewhere using ML-based comparisons or entity linking, and then rewrite it (auto-summarisation) as annotations for an edition might have broad value for editors, a software tool that has been trained to compare and disambiguate Middle English spelling variants will have more limited value (unless it can easily be trained to do the same for modern spellings in any language).
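The entity-linking step can be pictured as matching surface forms in the text against a knowledge base and attaching identifiers and notes as annotations. The sketch below uses a small, hypothetical in-memory gazetteer (`GAZETTEER`, with made-up `ent:` identifiers) in place of a live query to Wikidata or another Web API, so that it runs offline. It matches single words only and performs no disambiguation beyond mapping spelling variants to a shared entry; the sample text is an illustrative couplet from the General Prologue.

```python
import re
from dataclasses import dataclass

# Hypothetical toy knowledge base: surface forms (including spelling
# variants) mapped to made-up local identifiers and short notes.
# A real service would query Wikidata or another knowledge base instead.
GAZETTEER = {
    "canterbury": ("ent:canterbury", "City in Kent, England"),
    "caunterbury": ("ent:canterbury", "City in Kent, England"),
    "london": ("ent:london", "Capital of England"),
}


@dataclass
class Annotation:
    start: int  # character offset where the match begins
    end: int  # character offset just after the match
    surface: str  # the form as it appears in the text
    entity_id: str
    note: str


def link_entities(text: str) -> list[Annotation]:
    """Hypothetical linker: look up each word token in the gazetteer."""
    annotations = []
    for match in re.finditer(r"\w+", text):
        entry = GAZETTEER.get(match.group().lower())
        if entry is not None:
            entity_id, note = entry
            annotations.append(
                Annotation(match.start(), match.end(), match.group(), entity_id, note)
            )
    return annotations


line = "And specially from every shires ende Of Engelond to Caunterbury they wende"
for a in link_entities(line):
    print(a.surface, a.entity_id, a.note)
```

Replacing the dictionary lookup with a call to a remote knowledge-base API is precisely where the reproducibility and sustainability criteria above come into play, since external services and their responses can change over time.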