Research Context

Digital scholarly editing typically involves the creation of an authoritative version of primary textual sources that can be read on, and that utilises the affordances of, computers. Some notable exceptions aside, the field of digital scholarly editing to date has largely been occupied with producing machine-readable surrogates of printed materials such as books, manuscripts, historical documents or epistolary correspondence, using web technologies to facilitate familiar features such as annotation, editorial elaboration, glosses and links to other supporting materials, or metadata that enables materials to be classified or searched. Digital editions which treat born-digital materials or avail of the affordances of statistical methods are very rare.

Some notable exceptions do exist: Digital Fiction Curios is a virtual reality experience that houses works of electronic literature by UK-based digital artists Andy Campbell and Judi Alston, while Newcastle University’s Poetics of the Archive allows users to explore its Bloodaxe Books collection using network visualisations of word associations contained within the 1,500 digitised items. At a grander scale, HathiTrust Research Center Analytics allows users to perform computer-assisted text analysis on user-created collections of its volumes using out-of-the-box tools. Such projects demonstrate the affordances of using computers in the production of digital editions, and what might be achieved in using computers to curate born-digital content and, indeed, make analytics and distant reading an inherent part of the archival process.

But these exemplars also demonstrate how digital scholarly editing and publishing, despite having “matured as a field” (Schreibman, 2013), has much to achieve in terms of reproducibility, feasibility, and methodological integration. Digital Fiction Curios is an engaging and immersive approach to curating works of born-digital electronic literature, but it is a creative endeavour and expressive experience in itself before it is an analytical tool suited to studying the older works of Campbell and Alston it has archived (for more on ‘electronic literature’, see Heckman et al., 2018). Poetics of the Archive marries the importance of facsimiles with novel, computer-assisted analytical techniques, but users are limited to using one of three methods, and so can only use this feature of the archive to inform a very narrow set of research questions. HathiTrust Research Center Analytics offers a comprehensive set of tools for analysing its digital library, but it has been designed largely with remediated print content in mind. None of these archival systems are readily reproducible: they are either closed systems that have been purpose-built for very specific content, or major collaborations built on substantial funding.

Edition making remains a largely industrial craft: the materials, medium and methods are technological, but the work itself remains largely manual and bespoke. Digital editions are labour-intensive and therefore limited in scale, whilst data-mining techniques tend to be reserved for large, low-quality OCR’d text collections where the end user needs help with discovery. Few editors have realised that machine-assisted insights can benefit the practice of edition making, increasing the scale, speed and interpretive potential with which critical scholarly editions are created and published. Domain ontologies, linked data and machine learning can enable editors to draw on larger knowledge resources out on the web, and NLP methods such as entity linking and automatic summarisation can semi-automate the process of text annotation. Advancing the methods of digital scholarly editing to include machine-assisted insights is important for two reasons: it enables us to develop a better understanding of our sources by drawing on knowledge from our wider, data-rich world (new insights); and it enables us to develop editions that have high scholarly authority and value more quickly (faster insights).
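The kind of semi-automated annotation described above can be sketched in a few lines. What follows is a minimal, illustrative gazetteer-based annotator, not a real NLP pipeline: a production system would use a trained named-entity recogniser and link entities to an external authority file or knowledge base, and every name and identifier in this sketch is hypothetical.

```python
import re

# Minimal sketch of machine-assisted annotation: a gazetteer maps known
# surface forms to a TEI-style element name and a hypothetical @ref pointer.
# A real pipeline would use trained NER and entity linking instead.
GAZETTEER = {
    "Andy Campbell": ("persName", "#campbell"),
    "Judi Alston": ("persName", "#alston"),
    "Newcastle": ("placeName", "#newcastle"),
}

def annotate(text: str) -> str:
    """Wrap each gazetteer match in a TEI-style element with a @ref pointer."""
    for surface, (tag, ref) in GAZETTEER.items():
        text = re.sub(
            re.escape(surface),
            f'<{tag} ref="{ref}">{surface}</{tag}>',
            text,
        )
    return text

sample = "Digital fictions by Andy Campbell and Judi Alston."
print(annotate(sample))
# → Digital fictions by <persName ref="#campbell">Andy Campbell</persName>
#   and <persName ref="#alston">Judi Alston</persName>.
```

Even a sketch this crude shows the division of labour at stake: the machine proposes candidate annotations at scale, while the editor's judgement is reserved for verifying and enriching them.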

The concern for replicating the scholarly editing models of printed editions limits most editions to searchable, annotated facsimiles that sit in physical and intellectual isolation from one another, despite often using a common data standard (usually TEI), and despite often setting out to unite physically disparate archive sources. The approach is not easily transferable to born-digital texts, which will increasingly become our historical sources. Social media in particular presents challenges of reproducibility, durability, preservation, ownership, contextualisation and hypertextuality that are rarely a problem when working with analogue sources, where a text or corpus is traditionally treated as a fixed and bounded body of evidence.

Open, sustainable platforms suited to the creation of digital editions do exist, such as Omeka and Scalar, but these are inflexible, out-of-the-box tools with no analytical features, and are somewhat unsuited to born-digital materials like Tweets and digital fiction. Earlier, proprietary platforms such as DynaText were focussed on a digital publishing paradigm that mirrored the printed book, and editors had to resort to tools such as XMetaL and Emacs to create actual content, as many editors still do. Most online digital scholarly editions are “bespoke”, in that they rely on purpose-built user interfaces for presenting the text and tools, with all the software-sustainability issues that this raises. Occasionally these interfaces are supported by purpose-built XML editing and document management back-end systems, such as the New Panorama of Polish Literature and the Archivio Storico Ricordi. That so many digital editions have purpose-built interfaces suggests that editors are straining to break free from the old paradigms but lack adequate turnkey solutions to help realise their visions.

The Text Encoding Initiative’s TEI Guidelines provide “an open-ended repository of suggested methods for encoding textual features” (Blanke et al., 2014, pp. 16–27), and its P5 XML schema has long been the data standard of choice for editors, even if turnkey systems for publishing it have been limited. There is much scope for the TEI Guidelines to be extended to and adapted for born-digital artifacts, or coupled with computer-assisted analytical techniques. Basing any future models for digital editing on an existing standard like TEI makes sense for interoperability: “Without guidelines such as the TEI, exchange and repurposing of data will not be possible and electronic editions will be used as standalone objects with their own set of characteristics, objectives and requirements” (Franzini et al., 2019, p. 176). Despite TEI’s best efforts, the majority of digital editions are precisely that.

The digital scholarly edition remains central to the intellectual practices and productive outputs of the arts and humanities, and yet its form and structure have remained largely unchanged by the affordances of computers. The edition is often the version of the primary source that is most immediate, accessible, and informative to scholars and students alike, and so it is vital that we invest in projects which further enhance that dialogue, enabling researchers to establish the methods and principles for developing the scholarly digital editions of the future. To that end, C21 Editions will operate as a response to Joris van Zundert, who calls on theorists and practitioners to “intensify the methodological discourse” necessary to “implement a form of hypertext that truly represents textual fluidity and text relations in a scholarly viable and computational tractable manner”.
“Without that dialogue,” he warns, “we relegate the raison d’être for the digital scholarly edition to that of a mere medium shift, we limit its expressiveness to that of print text, and we fail to explore the computational potential for digital text representation, analysis and interaction” (Zundert, 2016, p. 106).
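For readers unfamiliar with the standard discussed above, the minimal document structure that TEI P5 mandates can be sketched programmatically. The element names used here (`TEI`, `teiHeader`, `fileDesc`, `titleStmt`, `publicationStmt`, `sourceDesc`, `text`, `body`) are genuine TEI P5 elements, but the content and the born-digital framing are hypothetical, and this is an illustrative skeleton rather than a complete or schema-validated encoding.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialise TEI as the default namespace

def tei_skeleton(title: str, source_note: str, body_text: str) -> ET.Element:
    """Build the minimal TEI P5 structure: a <teiHeader> with the mandatory
    fileDesc children, and a <text>/<body> carrying the content."""
    def el(parent, tag, text=None):
        child = ET.SubElement(parent, f"{{{TEI_NS}}}{tag}")
        child.text = text
        return child

    tei = ET.Element(f"{{{TEI_NS}}}TEI")
    file_desc = el(el(tei, "teiHeader"), "fileDesc")
    el(el(file_desc, "titleStmt"), "title", title)
    el(el(file_desc, "publicationStmt"), "p", "Unpublished sketch.")
    el(el(file_desc, "sourceDesc"), "p", source_note)
    el(el(el(tei, "text"), "body"), "p", body_text)
    return tei

# Hypothetical born-digital source: a social-media post archived as TEI.
doc = tei_skeleton(
    title="An archived post",
    source_note="Captured from a social-media platform (hypothetical).",
    body_text="The post's text would be transcribed here.",
)
print(ET.tostring(doc, encoding="unicode"))
```

The point of the sketch is that the header/text split already accommodates exactly the contextual metadata (provenance, capture conditions, platform) that born-digital sources demand; what remains open is the methodological discourse around how such sources should populate it.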