[in which the citation practices of the inventor of the worldwide web and the co-founder of OpenLibHums are compared – and I attempt to interest people in data citation.]
Tim Berners-Lee’s “Information Management: A Proposal” is “gray” literature (an internal memo at CERN, never formally published), and is nonetheless one of the most influential and highly cited documents in history. However, in all of the published academic work devoted to this short memo, Sir Tim’s citation style has never received the critique it deserves.
In-text citation broadly comes in two-and-a-half flavours. Either the entire reference appears in a footnote or endnote (Oxford style), or a parenthetical key directs the reader to the appropriate entry in a list of references at the end of the work. That key is either a number assigned in order of first citation (Vancouver) or the author’s surname and year, plus a disambiguating identifier if needed (Harvard).
Sir TBL’s innovation was to use short text labels as citations, combining the ease of location of Vancouver with the immediacy of Harvard. Many of his references are to resources with no named author, or where significant additional text is required alongside a (broadly ACM-style) reference.
For instance, [HYP88] refers to “Hypertext on Hypertext”, a special edition of “Communications of the ACM” authored in Hyperties and sold on floppy disk, which contains the full text of eight papers from the July issue of Comm ACM.
(Not quite the first e-journal, though: that was Richard H. Zander’s Flora Online in 1987, available by subscription on disk and free via a BBS. It was primarily a directory of source code for academic software in botany.)
But back to TBL’s citation style. A citation can be seen as working on two levels: as an immediate aide-mémoire for those familiar with the current state of a field (a surname and a year being enough to identify the work in question), and as a way for those less familiar with the field to find the work by looking it up in the list of references. Seen this way, TBL’s short labels are a neat way of serving both purposes in a field where resources may be well known but formal citation is not the norm.
As, of course, is hypertext, which combines three levels of information that can indicate a source: the semantic context of the surrounding text, the raw URL viewable on mouseover, and (hopefully) the full resource, or at least a means of obtaining it, via the link itself.
Kathleen Fitzpatrick (@Kfitz on Twitter) considers the place of the academic citation in a world of hypertext in a recent article for the Los Angeles Review of Books. As lead editor of the most recent edition of the venerable MLA Handbook, she is well placed to reflect:
All of our current citation formats were invented for a print-based universe, in which each book or article gave the impression of standing alone. Bibliographic notes and markers connect these many individual texts into a broader, ongoing conversation. But now that we live in a world in which no text need be an island, in which scholarly publications are increasingly delivered digitally and so can be literally interconnected via links and embeds, it is reasonable to ask whether citations are still necessary.
Her conclusion rests on the need for a future scholar to know not just the name and authorship of the work referred to, but also the precise version of that work:
[P]ublications and other cultural objects are no longer quite as fixed in format as they were, and their very malleability may heighten the importance for future scholars of knowing precisely which version today’s researcher consulted.
It is a problem Martin Eve will know well from his examination of multiple versions of David Mitchell’s Cloud Atlas. It turns out there are two substantially different versions of the novel’s text, stemming from the UK and US editions and distinguishable by place of publication and ISBN.
We have unique identifiers for texts in the form of ISBNs but we have become complacent about assuming that all editions are equal on first publication. When we write of ‘Cloud Atlas’, to what are we referring? Is it the textual edition cited in the bibliography? Nominally, yes, but more often the assumption is that we mean this to refer to the ur-structure, the named entity of a text that is ‘the novel’.
(If you are wondering how he handles the two versions of the text within the paper, which he references as “Mitchell, D (2008). Cloud Atlas. London: Sceptre” and “Mitchell, D (2008). Cloud Atlas. New York: Amazon Kindle”: he uses ‘P’ for paper and ‘E’ for e-book in discussion, with what amount to occasional full references in the text as citations(!). The former appears as “(Mitchell, Cloud Atlas [Sceptre, 2008], ‘P’)”, and he cunningly never refers to the latter in the text of the paper, though I’d guess it would be “(Mitchell, Cloud Atlas [Amazon Kindle, 2008], ‘E’)”.)
So neither links nor citation/referencing is ideally adapted to dealing with such issues, although Tim Berners-Lee might have elegantly sidestepped all this by using [CLOUDATLASPRINT08] and [CLOUDATLASEBOOK08] in the text 🙂
This is bad news for anyone submitting to a journal that rigidly applies a citation/reference style (which is most of them), and it is not really an issue that EndNote, Zotero or Mendeley (those masters of arcane styles) can help us with either.
The reason I am concerning myself with such arcana (apart from the fact that I’m really boring) is that the citation of research data, of which there may be multiple versions, presents similar problems, except all the time rather than in rare cases like Cloud Atlas. When a paper compares sensor readings taken milliseconds apart, conventional citation is just not going to cut it, which of course is why we use unique identifiers (though there is also the issue of indicating relations between datasets, which is where PCDM, the Portland Common Data Model, comes in…). It’s another reason why we cannot simply carry traditional citation practices over to datasets.