Research Context

Digital scholarly editing typically involves the creation of an authoritative version of primary textual sources that can be read on computers and that utilises their affordances. The field to date has largely been occupied with producing machine-readable surrogates of printed materials such as books, manuscripts, historical documents or epistolary correspondence, using web technologies to facilitate familiar features such as annotation, editorial elaboration, glosses and links to other supporting materials, or metadata that enables materials to be classified or searched. Digital editions which treat born-digital materials or avail of the affordances of statistical methods are very rare.

Some notable exceptions do exist: Digital Fiction Curios is a virtual reality experience that houses works of electronic literature by UK-based digital artists Andy Campbell and Judi Alston, while Newcastle University’s Poetics of the Archive allows users to explore its Bloodaxe Books collection using network visualisations of word associations contained within the 1,500 digitised items. At a grander scale, HathiTrust Research Center Analytics allows users to perform computer-assisted text analysis on user-created collections of its volumes using out-of-the-box tools. Such projects demonstrate the affordances of using computers in the production of digital editions, and what might be achieved by using computers to curate born-digital content and, indeed, to make analytics and distant reading an inherent part of the archival process.

But these exemplars also demonstrate how digital scholarly editing and publishing, despite having “matured as a field” (Schreibman, 2013), has much to achieve in terms of reproducibility, feasibility, and methodological integration. Digital Fiction Curios is an engaging and immersive approach to curating works of born-digital electronic literature, but it is a creative endeavour and expressive experience in itself before it is an analytical tool suited to studying the older works of Campbell and Alston it has archived (for more on ‘electronic literature’, see Heckman et al., 2018). Poetics of the Archive marries the importance of facsimiles with novel, computer-assisted analytical techniques, but users are limited to one of three methods, and so can only use this feature of the archive to inform a very narrow set of research questions. HathiTrust Research Center Analytics offers a comprehensive set of tools for analysing its digital library, but it has been designed largely with remediated print content in mind. None of these archival systems is readily reproducible: they are either closed systems that have been purpose-built for very specific content, or major collaborations built on substantial funding.

The process of edition making remains a largely industrial craft process: the materials, medium and methods are technological, but the work itself remains largely manual and bespoke. Digital editions are labour-intensive and therefore limited in scale, whilst data mining techniques tend to be reserved for large, low-quality OCR’d text collections where the end user needs help with discovery. Few editors have realised that machine-assisted insights can benefit the practice of edition making, and in doing so increase the scale, rapidity and interpretive potential with which critical scholarly editions are created and published. Domain ontologies, linked data and machine learning can enable editors to draw on larger knowledge resources available on the web, and NLP methods such as entity linking and automatic summarisation can semi-automate the process of text annotation. Advancing the methods of digital scholarly editing to include machine-assisted insights is important for two reasons: it enables us to develop a better understanding of our sources by drawing on knowledge from our wider, data-rich world (new insights); and it enables us to develop editions that have high scholarly authority and value more quickly (faster insights).

The concern for replicating the scholarly editing models of printed editions limits most editions to searchable, annotated facsimiles that sit in physical and intellectual isolation from one another, despite often using a common data standard (usually TEI), and despite often setting out to unite physically disparate archive sources. The approach is not easily transferable to born-digital texts, which will increasingly become our historical sources. This is particularly true of social media, which presents challenges of reproducibility, durability, preservation, ownership, contextualisation and hypertextuality that are rarely a problem when working with analogue sources, where a text or corpus is traditionally treated as a fixed and bounded body of evidence.

Open, sustainable platforms suited to the creation of digital editions do exist, such as Omeka and Scalar, but these are inflexible, out-of-the-box tools with no analytical features and are somewhat unsuited to born-digital materials like tweets and digital fiction. Earlier, proprietary platforms such as DynaText were focussed on a digital publishing paradigm that mirrored the printed book, and editors had to resort to tools such as XMetaL and Emacs to create actual content, as many editors still do. Most online digital scholarly editions are “bespoke”, in that they rely on purpose-built user interfaces for presenting the text and tools, with all the software sustainability issues that this raises. Occasionally these interfaces are supported by purpose-built XML editing and document management back-end systems, as in the New Panorama of Polish Literature and the Archivio Storico Ricordi. That so many digital editions have purpose-built interfaces suggests that editors are straining to break free from the old paradigms but lack adequate turnkey solutions to help realise their visions.

The Text Encoding Initiative’s TEI Guidelines provide “an open-ended repository of suggested methods for encoding textual features” (Blanke et al., 2014, pp16-27), and its P5 XML schema has long been the data standard of choice for editors, even if turnkey systems for publishing it have been limited. There is much scope for the TEI Guidelines to be extended to and adapted for born-digital artifacts or coupled with computer-assisted analytical techniques. Basing any future models for digital editing on an existing standard like TEI makes sense for interoperability: “Without guidelines such as the TEI, exchange and repurposing of data will not be possible and electronic editions will be used as standalone objects with their own set of characteristics, objectives and requirements” (Franzini et al., 2019, p176). Despite TEI’s best efforts, the majority of digital editions are precisely that: standalone objects.

The digital scholarly edition remains central to the intellectual practices and productive outputs of the arts and humanities, and yet its form and structure have remained largely unchanged by the affordances of computers. The edition is often the version of the primary source that is most immediate, accessible, and informative to scholars and students alike, and so it is vital that we invest in projects which further enhance that dialogue, enabling researchers to establish the methods and principles for developing the scholarly digital editions of the future. To that end, C21 Editions will operate as a response to Joris van Zundert, who calls on theorists and practitioners to “intensify the methodological discourse” necessary to “implement a form of hypertext that truly represents textual fluidity and text relations in a scholarly viable and computational tractable manner”. “Without that dialogue,” he warns, “we relegate the raison d’être for the digital scholarly edition to that of a mere medium shift, we limit its expressiveness to that of print text, and we fail to explore the computational potential for digital text representation, analysis and interaction” (Zundert, 2016, p106).