This presentation owes an enormous debt to the opportunity I have had to both work and converse with Cameron Neylon of Curtin University. I should clarify that the good bits of the material below should be seen as his influence, the shoddier stuff as my lack of understanding and subtlety. It is presented in a personal capacity.
Substantial pixels have, of late, been devoted to the cultural “demise of the expert” and consequent de-legitimising of academic forms of knowledge. But if we want to know why people don’t believe what we believe, we need to take a long hard look at some of the crazier things we do believe.
Likewise, Open Education, as it matures as an idea, has been urged by some voices to consider itself an academic field. Audrey, in her #etug keynote, suggests that we examine the mechanisms, tropes and homunculi of academic prestige before we take that particular pill.
So I’ve been thinking about one of the new gods of academia – the citation index.
Whether we would have it or no, the purpose of [higher education] is changing. A decade ago the graduate of a college was thought to be fitted with the requisites of a cultural, liberal education, to be ready to begin [their] life work as a good citizen. Everywhere we see the demand for the expert worker, the professional […] who has devoted from two to four additional years to train […] in a special way in a particular field.
PLK and EM Gross of Pomona College writing in Science (October 1927)
I start by saluting PLK and EM Gross for writing a landmark article during a period of major institutional reorganisation. “College Libraries and Chemical Education” represents the birth myth of the science of bibliometrics – but was itself focused on identifying scholarly resources for reuse within undergraduate Chemistry education.
Gross and Gross took the latest volume of the Journal of the American Chemical Society as a starting point (“the most representative of American Chemistry”), and simply tabulated the number of references made therein to works in other journals. The academic journals most frequently cited in this periodical were deemed essential for the college library collection, as they had a demonstrably greater influence on the current state of American Chemistry.
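The Gross and Gross method really is that simple: tally which journals the references point at, then rank. A minimal sketch of the tabulation, with an invented handful of references (the journal names are real period titles, but the counts here are illustrative, not their data):

```python
from collections import Counter

# Hypothetical reference list extracted from one volume of a journal;
# in the Gross and Gross study this was done by hand from the
# Journal of the American Chemical Society.
references = [
    "Berichte der deutschen chemischen Gesellschaft",
    "Journal of the Chemical Society",
    "Berichte der deutschen chemischen Gesellschaft",
    "Annalen der Chemie",
    "Journal of the Chemical Society",
    "Berichte der deutschen chemischen Gesellschaft",
]

# Count how often each cited journal appears...
counts = Counter(references)

# ...and rank journals by citation frequency, most-cited first.
# The top of this list is what Gross and Gross deemed essential
# for the college library.
for journal, n in counts.most_common():
    print(journal, n)
```

(The heavy showing for German-language journals is, of course, why every undergraduate chemist was to learn German.)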
Aside from this contribution to library collection building (or resource discovery, if you’d rather) and the wonderful suggestion that every undergraduate chemistry student should have a working knowledge of German, you would be forgiven for thinking that this paper was naught but a historical curiosity. But it was the first in a series of papers that led, directly or indirectly, to the sorry state of academia today.
Vannevar Bush’s 1945 piece in the Atlantic “As We May Think” is often hailed as a founding text of the internet (our opening keynote has written, and spoken, with his usual eloquence about this aspect). Bush – fresh from the interdisciplinary practicalities of administering the Manhattan Project – took as his ostensible subject a similar issue to Gross and Gross: the impossibility of keeping up with the literature.
The difficulty seems to be, not so much that we publish unduly in view of the extent and variety of present day interests, but rather that publication has been extended far beyond our present ability to make real use of the record. The summation of human experience is being expanded at a prodigious rate, and the means we use for threading through the consequent maze to the momentarily important item is the same as was used in the days of square-rigged ships.
In considering the future of threading through the maze of scholarship (or an associative trail if you’d rather), Bush considered examining the links between knowledge (or resources signifying knowledge, I guess) as a means of synthesising and creating a greater understanding – allowing for, simply put, better and more accessible research.
Wholly new forms of encyclopedias will appear, ready made with a mesh of associative trails running through them, ready to be dropped into the memex and there amplified. […] The chemist, struggling with the synthesis of an organic compound, has all the chemical literature before him in his laboratory, with trails following the analogies of compounds, and side trails to their physical and chemical behavior.
Bush postulated the development of a “memex” – essentially a large database of literature alongside the links between its items, as a means of exploring a corpus. But our third great influence on contemporary academia, Eugene Garfield, was spurred by “As We May Think” to consider a much wider variety of uses.
Drawing on Bush’s ideas alongside HG Wells’ “World Brain” – Garfield’s idea was of an “informatorium” to serve “a new Renaissance during which the entire world will be thirsting for knowledge”. (As an aside, these early ideas of the future of scientific literature do not even consider the issue of copyright and ownership over knowledge, seeing it as a public treasury for all to enjoy. What a beautiful world this could be, what a glorious time to be free.)
Garfield developed the Science Citation Index, under the auspices of the Institute for Scientific Information (ISI). As is often the lot of those with a utopian vision, his early work met with almost universal indifference – compare Vannevar Bush’s struggles to establish what became the NSF. But his dream – and boundless energy – prevailed, and the work of the ISI as presented in the 60s almost reached the heights of the knowledge discovery solutions postulated by Bush and the Grosses.
Although ISI was founded to support resource discovery, it has competition built into its underlying logic. In the 60s it was fair to suggest that not every paper in every journal could be indexed, and indeed not every journal was. So the index started with the idea of a pre-sift. It is contended that the majority of references (80%) are made to a minority (20%) of journals, and that using methods similar to Gross and Gross one could identify and focus on indexing “the best” journals.
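The pre-sift can be sketched as a greedy cut-off: rank journals by citation count, then keep the smallest set that accounts for 80% of all references. The journal names and counts below are entirely made up for illustration:

```python
# Illustrative citation counts per journal (invented numbers).
citation_counts = {
    "Journal A": 500, "Journal B": 300, "Journal C": 120,
    "Journal D": 50, "Journal E": 20, "Journal F": 10,
}

total = sum(citation_counts.values())
selected, covered = [], 0

# Walk down the ranking, most-cited first, until 80% of all
# references are accounted for; everything below the line is
# simply not indexed.
for journal, n in sorted(citation_counts.items(), key=lambda kv: -kv[1]):
    if covered >= 0.8 * total:
        break
    selected.append(journal)
    covered += n

print(selected)  # the minority of journals receiving the majority of references
```

Note that with these numbers two journals out of six make the cut – which is precisely the appeal of the pre-sift, and precisely how everything else falls off the map.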
Even today the widely used Science Citation Index covers citations within only 3745 journals; Elsevier’s Scopus covers a wider range of literature, including over 21,000 journals – impressively comprehensive until one considers that no complete list of currently or formerly published academic journals exists. Of course, both major indexes focus mainly on English-language publications in clearly defined, established disciplines from traditional publishers.
This serves, of course, to amplify inequalities built into a publishing system where (predominantly white, male, Western) reviewers recommend to (predominantly white, male, Western) editors that articles written by (predominantly white, male, Western) academics are published. It is perhaps not too far a leap to suggest that academics with these characteristics are well cited. Laura Czerniewicz showed some truly humbling visualisations illustrating this in her OR2016 keynote.
It is unsafe to consider indexed papers as representing the sum total of world knowledge, or even as being representative of the human race. Indexers choose (predominantly) one form of expression, and – as we shall see – do not take this or other assumptions into account when describing the products of this indexing.
Citation index entries as text – and a series of terrible monetary metaphors
Paul Wouters, in his 1999 thesis (“The Citation Culture”) draws a distinction between the “reference” as what actually happens in scientific writing, and the “citation” – which is what appears in a citation index. Though each citation in the index has a corresponding reference, there are references (for instance to grey literature, data, source texts) that do not have corresponding citation index entries. And though each “reference” will have a unique context in describing a precise and nuanced relationship between two texts, a citation index entry – for the purposes of indexing – is simply a citation index entry. At that point, all those words of wisdom sound the same.
The role of the citation might also be compared with that of money, especially if the evaluative use of scientometrics is taken into account. Whenever the value of an article is expressed in its citation frequency, the citation is the unit of a “currency of science”. (p108)
In economic terms citation index entries are fungible – each has equal value within the index, and multiples of them can be measured against each other in order to ascribe comparative value. Also, like a fiat currency, there are several “central banks” (ISI, Scopus) which create (and destroy! – not every journal stays on the list forever) citation index entries in response to demand and to policy needs such as the need to control inflation – there is a theoretically infinite number of potential entries, but these are controlled by alterations to the coverage of the index. But there are important ways in which the citation index economy does not function like a fiat currency.
The citation shares still another property with the signs of money and language: it can only function properly in the midst of other citations. Therefore, citations need to be mass-produced. A lone citation does not make sense. It derives its function mainly from its relations to other citations. In other words, it is self-referential. Whether one tries to map science or to evaluate it, one needs large amounts of citation data. (p109)
I’m unfairly picking up a very, very small theme in Wouters’ superb and comprehensive thesis here, but I believe it is an important one.
In Blaise Cronin’s “The Citation Process” (1984), he touches on the practice of science as a mechanism of exchange in glossing work by Merton, Storer and others.
The commodity which scientists traditionally exchange is knowledge or information, and in drawing on the intellectual property of their peers, scientists have to enter the exchange system and ‘pay the going rate’, so to speak. The currency, to maintain the economic metaphor, is the ‘coin of recognition’. The exchange on which the social system hinges is information for recognition. The formal record of these transactions is the scientific establishment’s traditional ledger, the scholarly journal. The most common form of ‘currency’ is the citation. (p19-20)
Ledgers, with the advent of blockchain and other innovations in “fintech”, have become peculiarly fashionable in recent times. And indeed, it is possible to build a simple metaphorical model of scientific publishing as blockchain with aspects of a Ponzi scheme – with a distributed ledger (multiple journals) added to via proof of work (the writing and publication of a paper) hashed (rendered into academic language) in order that value is both realised and distributed in the form of the ‘coin of recognition’ to earlier contributors, with the promise that similar ‘coin’ will be forthcoming to the most recent participants.
This model almost continues to hold up if the monetary system around journals is taken into account. Doing good research, bluntly, is expensive and is becoming more so. In the majority of cases, one must then pay either to view or contribute to the journal – analogous to the contribution of electrical and processing power to blockchain creation. And – like blockchain – these costs, and the costs of conducting high quality underlying research, concentrate contributions into a smaller and smaller group of centres (universities) capable of getting a paper into a prestige journal as the proof of work becomes more arduous.
Of course, I should add that reputation economies have a literature of their own – though this year Cory Doctorow neatly described how much worse such an economy would be. And there is an appealing parallel to Google PageRank, and to the way that the SEO industry developed around this. Furthermore, Cameron Neylon has been mapping academic publishing to actual economic theory.
The black box(en)
Adding references adds to transparency – it’s why I’m adding all these links so you can see I am not making this all up. But citation indexing actually adds a couple of layers of opacity to our understanding of the scientific process.
The first is perhaps more a matter of obscurity than opacity – how do journals get selected to be on the index? Eugene Garfield would claim that this is the correct use of the Journal Impact Factor (the number of index entries received by articles published in that journal during the two preceding years, divided by the total number of articles published in that journal during the two preceding years – which requires, circularly, the citation index entries to happen in a journal that is already on the index!). But he cites a rather telling alternate explanation:
These days, both Web of Science and Scopus offer detailed criteria, predictably preserving this beautiful circularity alongside sundry cultural and language barriers. Basically, good journals are good because they are like other good journals.
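The impact factor arithmetic itself is almost embarrassingly simple – a single division. A sketch with made-up numbers for a hypothetical journal in year Y:

```python
# Journal Impact Factor, as described above, for a hypothetical journal.
# Both numbers below are invented for illustration.

# Index entries in year Y pointing at items the journal published
# in years Y-1 and Y-2. Circularity alert: these only count if they
# appear in journals already on the index.
citations_to_previous_two_years = 210

# Citable items the journal published in years Y-1 and Y-2.
citable_items_in_previous_two_years = 100

impact_factor = citations_to_previous_two_years / citable_items_in_previous_two_years
print(impact_factor)  # 2.1
```

One division, two contestable inputs, and an index-membership precondition baked into the numerator – yet this number decides careers.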
The second issue is simply how a reference becomes a citation index entry.
A reference is a polysemous thing – yes, it reflects a link between one piece of scholarship and “something else”, but it also holds contextual information (where in the paper is it?), intention information (am I being nice or naughty?), normative social information (is it the key reference in the field I am writing?), specific social information (am I citing Martin Weller in the hope he’ll cite me?), transactional information (did Martin Weller cite me so now I have to cite him?)…
All of this meaning is somehow compressed down into a simple, interchangeable (remember fungibility?) unit that can be combined with others – and this collection of units now somehow tells me something about academic quality. Hoeffel (above) attempts to sidestep this oddness by claiming it is all a proxy for what we all already know, but this is a proxy (as pointed out by Wilsdon and others) that has a deadly power in false precision.
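The compression can be made painfully literal. A sketch, in which the field names on the rich “reference” record are purely illustrative (no real index uses this schema), and the index entry is what actually survives:

```python
from dataclasses import dataclass

# A reference as it exists in a text: rich, contextual, polysemous.
# All field names here are illustrative, not any real index's schema.
@dataclass
class Reference:
    citing: str
    cited: str
    section: str      # where in the paper it appears
    sentiment: str    # nice or naughty?
    purpose: str      # background, method, comparison, social grooming...

ref = Reference(
    citing="doi:10.1234/a",
    cited="doi:10.5678/b",
    section="Discussion",
    sentiment="critical",
    purpose="comparison",
)

# What survives into the citation index: a bare, fungible pair.
# Everything else - context, intention, social meaning - is discarded.
index_entry = (ref.citing, ref.cited)
print(index_entry)
```

A critical demolition and a glowing endorsement both emerge from this pipeline as the same tuple, worth exactly one unit of the coin of recognition.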
So what can be done?
There are projects, such as Open Citations (formerly funded by Jisc) that sought to graph and link (via RDF) citations in Open Access literature. More recently, we have seen experiments in Semantometrics (another Jisc funded project) that seek to recognise the value of context in referencing practice in developing new indicators. CrossRef and open identifiers like ORCID improve data quality and add back some of the contextual information for those who wish to explore it. And there are “improvements” to impact factor style metrics made all the time.
On the other hand, James Wilsdon’s report for HEFCE – “The Metric Tide” – brought home in clear and uncompromising terms the human cost of reliance on oversimplified research metrics. And efforts exist to transfer the existing mess that represents practice around citation metrics to other forms of referend, such as research data.
Open access to research literature is key to all of these efforts to gain control and understanding over citation metrics. By allowing (along with appropriate formatting!) any paper to be indexed, it forces open one of the two black boxes that have held such sway over the past quarter-century of academic life.
The other, around the creation and destruction of meaning at the reference/index point of flux, requires rather more thought – and it will be difficult, if not impossible, to wean the educational superstructure off scientific-seeming measurements that add credence and objectivity to otherwise arbitrary decisions.
But what hath this to do with open education?
In open education, we stand at the beginning of the discovery and “rating” revolution. Resources are becoming an academic achievement, and our only-too-human urge to see the maximum use made of our work is already leading to conversations about measuring reuse and “Amazon-style recommendations”.
We’re at the start. Academic citation indexes, and similar systems, are the end point. Proceed with caution.