MLA Convention 2020

Panel: “Databases and Print Culture Studies”

Presentation: “Data beyond representation: From computational modelling to performative materiality”

The framing of digital literary studies has shifted quite dramatically in recent years, as claims to objective facts have been replaced by references to interpretation and contingency, especially in discussions of computational modelling. Yet both positivist and modelling approaches are grounded in representationalism; today I'll argue that we should instead approach literary databases, and hence our enactments of digital literary studies, in performative terms.

By representationalism I mean the idea that a knowing human agent symbolically expresses – or represents – some thing-in-the-world (that thing is unchanged by that expression, and that expression is more available or apprehensible to the subject than the thing itself) – in short, the basic structure of Cartesian dualism. By contrast, a performative approach conceives of symbolic or discursive practices as always material, and vice versa; entities don't pre-exist engagements but are generated in an ongoing or emergent way, by those intra-actions.

That probably sounds fairly esoteric, but my intention today is to be very concrete: to demonstrate the need for this shift using examples from recent print-culture studies articles and from my own construction of two print culture studies databases.


The first of those, To be continued: The Australian Newspaper Fiction Database, is a collection of around 23,000 full-text records of novels, novellas and short stories (from around the world) published in 19th- and early 20th-century Australian newspapers. (These were identified by mining digitised historical newspapers for the first 21,000 and by crowdsourcing for an additional 2,000 and counting.)


The second, Reading at the Interface, is still under development; we're currently finalising data collection. It uses the 600,000 or so works of "Australian literature" listed in the national online bibliography to identify millions of pieces of writing about that literature on various platforms, including academic journals, newspapers, Goodreads and LibraryThing.

Let me begin by briefly elaborating the first claim I made: that both positivist and modelling approaches are representationalist.


Until the early 2010s, prominent contributions to digital literary studies presented data as a direct representation of the literary past. Interpretation, when not denied outright, was understood to occur after empirical data was subjected to objective methods to reveal the literary system (often in the form of a visualisation).


In the shift to modelling, interpretation (and often, specifically, subjectivity) is identified as intrinsic to digital literary studies. Whether creating a model of something, or using a model for some purpose, scholars understand themselves to be engaging with contingent representations of literary phenomena. Models are always “fictions,” in Willard McCarty’s words, or “wrong,” as Richard Jean So puts it.

In the same period that many digital literary scholars have embraced the contingency of representation, other areas of the humanities have called that paradigm into question. And as I said, I'll discuss three articles in print culture studies that move in this other direction. While not necessarily referring explicitly to representationalism or performativity, these articles foreground the emergent materialisation of literary phenomena that digital literary scholars often perceive as "objects" for "representation."


The first, by Meredith McGill, focuses on reprinting, or the resetting of type from already printed books. Although she mentions digital technologies only in passing, the challenge that McGill argues reprinting poses for literary studies also pertains to digital literary studies. In both cases, repetition – whether reprinting or digitising – disrupts the foundational idea that literary texts are single, stable, and self-evident things-in-the-world.

I won't detail how these first two points apply to digital literary studies, because both were central to an article I wrote a few years ago, called "The Equivalence of 'Close' and 'Distant' Reading," primarily about Franco Moretti's approach to quantitative literary history. There, I objected to the tendency (vis-à-vis McGill's second point) for "distant readers" to define literary works as singular entities, pinned in time and place, typically by the date of first book publication and the author's nationality; and I argued (vis-à-vis her first point) that this approach can't engage with the multiple acts of articulation, including reprinting and digitising, and the resulting interconnections and relationships, by which literature is constituted as literature. It's McGill's third point I want to focus on, because it offers a very concrete way of thinking about how, whether in reprinting or in digitising, we inscribe the boundaries we often presume to represent.


McGill illustrates the instability of the boundary of text and paratext by the frequency with which first edition title pages are reprinted in mass-market paperbacks. I like this example because it demonstrates the complexity – even in individual instances – of drawing a boundary we often take for granted at scale, in creating literary databases. (And indeed, that we often reify, in projecting it onto the distinction of data and metadata.) It also indicates the complexity of the materialisation of that boundary, in that we can say this is not the paratext for this mass-market novel, yet it is that paratext. Combining these two ideas doesn't mean that texts and paratexts don't exist – or that the line between them can be drawn anywhere. Rather, it identifies editors – and digital literary scholars – as part of a process of materialisation of a boundary that over time gives the effect of stability as it generates the form of the literary text.

In digital literary studies, the boundary between text and paratext always matters, both in the sense that it is materialised in our databases, and that it profoundly shapes what we can know of literary history.


For the Australian Newspaper Fiction Database, drawing this boundary involved automatic and manual data curation of the API field "heading," which distinguishes the first four lines of digitised article text, to tease out paratextual features ranging from subtitles to the ways in which nationality was inscribed in publication events. For my current project, the Reading at the Interface database, I'm interested in how different collections of paratexts (or writings about literature) might materialise or constitute different versions of "Australian literature".

In both cases, these boundaries could be drawn differently; but that doesn’t make them arbitrary. Rather, they are effects of material-semiotic engagements: for instance, with API structures, the zoning of digitised pages, data standards used by platforms, and so on. And these effects both generate and constrain the possibilities of inquiry. Digital methods such as stylistic and machine learning approaches engage directly with this unstable boundary – in identifying textual patterns, often with respect to paratextual features. The results of such methods are commonly taken to be observations or measurements of facts of literary texts. Yet recognising this boundary as performatively produced means reframing such results as part of a process of material-discursive emergence.


Laura Mandell makes a similar point about how digital projects categorise literary texts. Specifically, her essay criticises the tendency for stylistic research to use M/F (male/female) categories for authorship. But Mandell is not arguing that studies referring to "sex" are biased while those exploring "gender" (even in non-binary terms) are not. Her point, rather, is that – as with texts and paratexts – all inquiries create boundaries (or cuts) in a complex reality that can be organised in other ways; and all such boundary-making practices are inevitably biased at the same time as they are a condition of inquiry. That does not, however, make all inquiries or engagements equally valid. For categories of authorship, the difference between "sex" and "gender" is that the former tends to reinforce, while the latter helps to resist, the myth of "an objective observer measuring an inert reality".


I love the metaphor – of the stereotype – that Mandell uses to make this point and to advocate for a more enabling, what I would call a performative, digital literary studies. Stereotype can refer to plates for reprinting (as here) and to discriminatory assumptions about the world. Mandell argues that, because digital literary studies does not have to fix the text in place (as with stereotype plates), it can better enable "the fluid exploration of parameters and taxonomies."


Although I used M/F categories for the Australian Newspaper Fiction Database, I was interested in enabling precisely this fluid exploration. This meant generating multiple parameters of sex/gender: for instance, investigating the embodied "sex" of historical authors as well as the way gender was inscribed in publication events, and both in relation to whether a title was anonymous or pseudonymous, the place and time of publication, and related concepts of embodied and inscribed nationality. (And the slide just shows some of the multiple author taxonomies used in the database.)

My new project is a more deliberate experiment with the effects of parameterisation, in that data-mining explicitly involves applying a category – "Australian literature" – to various sites where that category might or might not have been implicated in the creation of literary meaning. In mining Goodreads, for instance, using a list of works defined by an academic bibliography, I'm not interested in representing discussion of "Australian literature" on Goodreads so much as in materialising that platform in ways that cannot be separated from my categories of analysis.


Michael Gavin’s response to Nan Z. Da’s “computational case against computational literary criticism” article is the third and final example I want to give of a shift from representational to performative conceptions of the supposed “objects” of digital literary studies. Where Da argues that all forms of digital (or computational) text analysis involve counting words, Gavin emphasises how literary texts are transformed by computational processes. As a result, the entities that digital literary scholars investigate are profoundly, and ontologically, different from the literary texts that word frequencies supposedly represent.


Gavin argues that digital literary scholars are engaging not with sets of texts but with semantic models in which words and documents are, as he puts it, "mutually constituted by the linear transformation of lexical space into bibliographical space."

A representational framework prevents us from coming to grips with these new material-discursive formations. It makes us believe that we are exploring symbolic expressions of things that exist elsewhere, whereas a corpus, a topic, a word embedding, and so on – these digital literary phenomena are effects of intra-actions that are part of the process of generating seemingly stable boundaries between humans and computers, computers and texts, texts and humans.


In the book I wrote with the Australian Newspaper Fiction Database, I proposed responding to this situation by modelling our databases on scholarly editions. With some exceptions, scholarly editors also don’t use the language of performativity. But a scholarly edition doesn’t represent a literary work (or, I suggested, a literary system); rather, it intervenes in – simultaneously transforming and engaging with – a complex unfolding event.

More recently, however, just as Mandell argues that digital scholarly environments shouldn’t be conceived as fixed, I now wonder if the framework of the scholarly edition is rather too bound by print conventions. As a consequence, in Reading at the Interface, I’m exploring what it might mean to conceive of literary databases as apparatuses, in the sense the term is used in various scientific disciplines, particularly physics.


There, an apparatus is a specific material configuration, including of physicists, wherein certain properties become determinate, while others are excluded. One can’t measure light as a particle and a wave using the same apparatus; but that doesn’t mean that light is not one thing when it is measured as the other. Although it must be said that the phenomena explored in digital literary studies are much more diverse than those for which apparatuses in physics are developed, I wonder if shifting to a conception of measurements as effects of particular material arrangements might help us to reframe some key debates in our field.

For instance, at present, discussions of "representativeness" and "reproducibility" are bound up together, with the implication that if we can represent something accurately enough, the results of analysis will be reproducible. Foregrounding the apparatus, by contrast, recognises that our knowledge-making practices, as Karen Barad puts it, "contribute to, and are part of, the phenomena we describe". Might attention to the material specificities of our engagements with literary phenomena be a route to reclaiming objectivity without insisting on an "inert reality" to be symbolically expressed?

A performative approach might also reframe discussion of the role of scientific methods in humanities inquiry. Whether presented as a critique or point of pride, the idea that digital literary studies imports such methods frequently imagines science in ways that have barely been updated since the early 20th century.


Might we reimagine such interdisciplinarity by recognising that many contemporary scientific fields understand measurement, for instance, not as representing the world but as contributing to its production?

To conclude, when digital literary scholars build “print culture studies” databases, we face quite a stark choice between representationalism and performativity. We can imagine these databases as symbolic expressions, by knowing agents, of stable, self-evident, and self-contained (printed) objects that exist elsewhere. Or, we can employ them as sites – or apparatuses – for engaging with literary texts as emergent events, always arising from and altering how the literary past is (re)configured. Today I’ve tried to show how digital and non-digital scholarship in print culture studies encourages us in this second path.


Works cited

Barad, Karen. Meeting the Universe Halfway: Quantum Physics and the Entanglement of Matter and Meaning. Duke University Press, 2007.

Bode, Katherine. A World of Fiction: Digital Collections and the Future of Literary History. University of Michigan Press, 2018.

Bode, Katherine. “The Equivalence of ‘Close’ and ‘Distant’ Reading; or, Toward a New Object for Data-Rich Literary History.” Modern Language Quarterly, vol. 78, no. 1, 2017, pp. 77-106.

Gavin, Michael. “Is There a Text in My Data? (Part 1): On Counting Words.” Journal of Cultural Analytics, 17 Sept. 2019.

Mandell, Laura. “Gender and Cultural Analytics: Finding or Making Stereotypes?” Debates in the Digital Humanities 2019, edited by Matthew K. Gold and Lauren F. Klein, University of Minnesota Press, 2019.

McCarty, Willard. Humanities Computing. Palgrave Macmillan, 2005.

McGill, Meredith. “Echocriticism: Repetition and the Order of Texts.” American Literature, vol. 88, no. 1, 2016, pp. 1-29.

So, Richard Jean. “All Models Are Wrong.” PMLA, vol. 132, no. 3, 2017, pp. 668-73.