A response to some responses …

It’s been a heartening (and in my career to date, a unique) experience to receive so much feedback on my MLQ article “The Equivalence of ‘Close’ and ‘Distant’ Reading; or, Toward a New Object for Data-Rich Literary History” (preprint). I’m grateful to those who felt drawn to engage with it.

The experience has also been a steep lesson in using Twitter – going from occasional (conference) lurking to contributing (threads are confusing!) – and I thought I would try another social media first for me by posting a response to some of the feedback.

As its title indicates, the MLQ article has two parts: the first (by far the longest) section diagnoses a problem; the second (much briefer) one offers a solution – or rather, steps towards one. The second section has generated the most commentary, and much of what I say below considers that. But I want, first, to respond to some characterisations of the diagnosis presented by Andrew Piper’s thought-provoking “Data, data, data. Why Katherine Bode’s new piece is so important and why it gets so much wrong about the field”.

two scholars =/= a field 

I think it’s important to be clear that I *do not think* Franco Moretti and his application of “distant reading” represent data-rich literary history – or computational literary criticism or digital literary studies or digital humanities. Yet for different audiences, depending on their familiarity with the research, he is often equated with all these fields. As someone who regularly assesses grant proposals from scholars wanting to use data in their research, I NEED to stop finding that the stated aim is to “conduct ‘distant reading’ (Moretti 2013)”. And while it is true, as Piper says, that much of the field has moved away from Moretti’s approach, as far as I can tell we haven’t communicated this move, or why it is necessary, to colleagues not directly engaged in such research (I wonder if we have even articulated the reasons to ourselves?). By exploring problems with the two approaches that have the highest profile among those not involved in data-rich literary history (Moretti’s and Jockers’s), I sought to indicate differences within the field so as to clear a space to discuss why we do what we do, on what basis, and with what standards. I allude to other research in the article specifically to show that the entire field does not follow the methods of these two researchers.

I’m also very aware that many scholars do publish their data. I’ve been publishing all of mine for years. While Moretti and Jockers do not publish their data, many others in the field do (in the article I discuss Ted Underwood’s work with HathiTrust as representative of that shift). Humanitiesdata.com likewise recognises that we need to make the grounds of our arguments accessible and reusable (thanks to Matthew Lavin for pointing me to this site), as does the example Piper highlights, the Journal of Cultural Analytics. I didn’t mention CA because it was launched after the article was accepted – one of the many problems of traditional publication. But Piper is right to say that CA emphasises the importance of data publication, and his introduction to the journal is one of the best explanations as to why this step is necessary (including why it must acknowledge the cultural analyst’s implication in such knowledge). Yet publishing data does not mean that we have shared standards for publication, let alone for modelling literary-historical formations: it’s the need for that conversation that I wanted to foreground. I’ll return to some of Piper’s other critiques at the end of this post.

Let me turn, now, to some responses to my proposal for a “scholarly edition of a literary system,” both to elaborate on what I’m trying to do in constructing one, and to complicate some of the features that have been ascribed to it.

building collaborations

Thomas Padilla’s post “Ghosting, or something else?” made the important point that scholarly discussion often substitutes the “archive” – a singular, impersonal thing – for the many things that collections contain, and for the “people who manage, curate, present, and make the data accessible”. I’ve certainly been guilty of this, and I agree that we need to find ways of discussing collections that enrich “our interactions, our arguments, and our relationships with each other”.

I don’t think a “scholarly edition of a literary system” solves these problems. But it does take steps in the right direction. Describing a history of transmission of literary data (a foundational element of such an edition) requires considering the specificity of the many things represented, and the processes and (yes!) the people involved in collecting, curating, remediating, and transforming those things. Bonnie Mak’s outstanding analysis of EEBO – “Archaeology of a digitization” (abstract) – demonstrates the potential for such a history to show how individuals – as well as historical events and technologies – shaped a mass-digitised collection and its capacity to support research questions, including data-rich ones. I try to do the same in my book for Trove’s newspaper collection.

If a history of transmission offers one way of discussing the multiple things and people that make collections, I think data-rich research is already creating collaborations between those who use and those who build collections; or more accurately, that these contemporary conditions of research are collapsing the distinction between those groups. Building a scholarly edition of fiction in nineteenth-century Australian newspapers has enabled me to work with the people who build Trove, because the edition’s detailed descriptions of the documentary record are valuable to the collection and will be re-ingested as new objects and entities in Trove.

The database through which this edition will be published also aims to build collaborations and collapse boundaries in another way: by making literary data available and interesting to both old and new users/creators of collections – not only librarians and digital researchers, but any scholar or member of the public. While maintaining the consistency of the curated dataset, it interacts with Trove to enable users/creators to index new fictional instalments and titles, and to improve textual data using Trove’s justifiably celebrated text correction facility. (Want to read Oscar Wilde as he was published in nineteenth-century Australian newspapers? Why not correct the OCR as you go along and end up with your own edition of the work! Or alternatively, want to read an Australian novel never published except in newspapers? Why not ensure all instalments are present and correct the OCR as you go along and then release a first book edition on Project Gutenberg!)

Such collaborations do not depend on the framework of a scholarly edition: Underwood’s work with HathiTrust amply demonstrates that connections between scholars and librarians can be created without it. But in its accessibility, a scholarly edition of a literary system is especially suited to building collaborations by providing new ways of representing and engaging with, and new publics for, literary data.

commensurability and statistics

Matthew Lincoln’s response raises the vital issue of “Commensurability”, noting that it’s at odds with the aim of developing “bespoke ontologies for encoding humanities data” (discussed by Miriam Posner and essential if data-rich literary history is to expand the terms of disciplinary debate). I wish I had a neat answer to this question! But I do think that scholarly editions of a literary system – should they ever exist in the plural – will be assisted in this aim because they will build on a longstanding and extensively elaborated framework for representing literary objects: bibliography. Bibliography’s multiple forms – enumerative, descriptive, analytical – have different aims and emphases. But they also often overlap. I wonder if “all the constituent records” need to be commensurable or if it is enough to have anchors or pivots for partial commensurability between editions? I think this is what Lincoln is getting at when he refers to commensurability in the “anatomical units” of literary data, and these are connections that can be built upon.

Although I do pass over statistical theories of confidence and sampling with barely a hand wave in the article, I’ve grappled with these issues elsewhere, for instance by arguing in an article in Victorian Periodicals Review that statistical methods of probability shouldn’t be used with network models of publication data (preprint). And I’m not averse to probabilistic methods per se. One of the chapters of my forthcoming book (A World of Fiction) analyses fiction in nineteenth-century Australian newspapers by using a probabilistic method (decision trees) to tie the results of another probabilistic method (topic modeling) to a documentary and historical context. Rather than opposing such methods, I think they need to be applied in a way that renders the historical and documentary context primary.

terms of debate

This brings me (and any resilient readers still with me!), finally, to Piper’s response to my proposal for a scholarly edition of a literary system. As I read his post, it identifies two problems with this framework: one of which raises an important question for how we might proceed as a field; the other of which misreads my argument by misunderstanding the nature of a scholarly edition.

Critique 1: A scholarly edition of a literary system limits the type of datasets that can be used for data-rich literary history.

It is definitely true that modelling relationships of production and reception is only one possible approach to the problem of historical representation. That approach is certainly necessary when one seeks to investigate (as Moretti and Jockers do) reception and influence – and indeed, to use reception to explain production. But one could explore multiple issues in literary history using only production data (including only first editions) or only reception data (including on library borrowers and borrowing).

In suggesting that models of literary systems integrate production and reception, I was not claiming that other datasets are not useful for certain research questions. I was thinking about how we might construct datasets that support a range of questions beyond those the creator wants to ask. If we’re going to devote so much time and energy to creating literary data, why not consider how to make the outcomes of that process resilient to a range of questions?

Making such datasets won’t be what everyone wants to do, just as creating a scholarly edition of a literary work was never what everyone wanted to do. But my question is whether there is value in moving from thinking about individual projects to using the outcomes of analysing mass-digitised collections as a basis for literary history in general, whether it is conducted by computational or non-computational means.

Critique 2: A scholarly edition of a literary system aligns data construction with “singularity – in its singular standing for history” – admitting no other reference points for historical understanding and acknowledging no limitations.

Piper asserts that I claim not only my dataset’s unsurpassed importance but also its perfection: “It may be the case that she has access to ‘all’ newspapers ever printed in Australia (though I’d be surprised). But are they all equally accessible in terms of textual quality (OCR) and what about other types of representations of fiction, say, books? Small presses? Manuscript circulation?” Piper also notes that, “waiting for the perfect data set or the perfect model is a bit like waiting for the white whale. And thinking that one set solves all problems is equally problematic.”

These summaries are diametrically opposed to the argument I intended. I proposed “a scholarly edition of a literary system” not because I think the terminology is elegant or catchy (it’s awful and clunky) but because I wanted to be clear about what I was trying to do: namely, apply to the modelling of literary systems a framework that has been used for centuries to model literary works in a way that acknowledges partiality and contingency, while providing a foundation for analysis. It would seem that my attempt at clarity in the article didn’t work – leaving me only with ugly nomenclature! – so I want to explain more fully what I meant by presenting the scholarly edition as a useful foundation for data-rich literary history.

The curated text of a conventional scholarly edition is not simply the work, or the work with extra references or a historical introduction. It embodies an argument about an imaginary whole – the literary work – which never existed in fact but only as an ideal, and where the contents are, by definition, contested. Likewise, the curated dataset for a scholarly edition of a literary system hypothesises the existence and interrelationships of literary works in the past. It is an argument about the nature and meaningfulness of a set of objects and connections that is not inevitable: that is, precisely, contestable.

This understanding of a literary system has a number of implications that can be usefully elaborated by extending the analogy with a conventional scholarly edition.

  • Scholarly (or, if you like, critical) editions are typically reserved for literary works considered important (due, at least in part, to the amount of time necessary to create them!). Such importance is not a fact but a historical argument inflected by a contemporary value judgment. Similarly, I chose to model fiction in nineteenth-century Australian newspapers not because I think this offers a singular representation of the past, but because I believe this imaginary whole (this literary system) is important for understanding nineteenth-century literary culture, in Australia and as literature circulated globally. (Historical argument – in contrast to the much more diversified markets in America, Britain, and Europe, newspapers were the main source of fiction for the colonies; contemporary value judgment – the popular fiction read by the majority of the population is useful for understanding culture.) Rather than a statement of singularity, my claim that this dataset is valuable for understanding the past is a scholarly argument.
  • Scholarly editions arise from and contribute to a scholarly/literary canon. As such they exclude – or rather, do not represent – many, many works. But the idea that we need to avoid canons misunderstands that they are inevitable products of scholarship (as it intermingles with regimes of academic policy and the public sphere). We need to be constantly vigilant about what is included in and excluded from our canons. But they are intrinsic to scholarly discussion. This does not mean that they are stable or absolute. Because scholarly editions (of literary works or systems) are claims about historical/contemporary importance, their value is a product of their use – some will inevitably be published and judged unimportant; some will be deemed relevant and useful for a time; some will continue to be relevant for a long time. The weight of critical discussion determines whether the original claim to importance is valid and for how long.

Although it might seem to be so, a conventional scholarly edition is not built from “all” of the parts of the work (whatever that would mean). The curated text embodies an argument about the whole based on the editor’s interpretation of the available parts, while the critical apparatus explains and justifies that argument with reference to the history by which those parts are transmitted to and by the present. Likewise, I do not have “all” the newspapers (whatever that would mean), and I freely admit that those that are digitised vary in coverage and legibility. These issues, and many other omissions, inconsistencies, biases, and transformations that constitute the contingency of the documentary record and my interpretation of it, are elaborated and explained in my critical apparatus. It explores the sequence of production and reception by which the nature of the imaginary whole can be known.

This is a long way of saying that, for the scholarly edition of a literary system, like the scholarly edition of a literary work, the whole that is represented is explicitly an artefact of inquiry, at the same time as it offers a scholarly object for analysis. Rather than singular, perfect, and absolute, it’s a proposal for how we might integrate the traditions of literary scholarship with new forms of digital infrastructure and knowledge production. It’s neither “finger pointing” nor a failure to be “supportive of the work other people are doing” to want to have this discussion.