It’s often been suggested that many, if not all, edtech folks secretly want to be librarians. No bad thing, I might add. It starts with a bit of a dabble in ontologies and controlled vocabularies, but pretty soon people start wanting their own repository. And we all know where that leads…
… if I’m me, reading this in the future (1) stop reading your own blog and go and issue some books, (2) this is the post where it all started to go wrong for you…
I’ve spent a lot of time on here recently laying into various reports linked to higher education, but today I want to point to something important that will most likely be overlooked.
How do you describe what you teach? How do people feel sure that what you teach is the same as what is taught in a similarly named course or module at the university down the road?
The Higher Education Data and Information Improvement Programme (HEDIIP) is based at HESA (the UK HE statistics people) and overseen by the Regulatory Partnership Group (not the Better Regulation Group, as I originally wrote; cheers @AndyYouell at HESA). It has been charged with a body of work building on the emphasis in 2011’s “Students At The Heart Of The System” BIS white paper on providing accurate information on higher education.
As no-one but Andy McGettigan will recall, these groups were asked “to redesign the information landscape” in paragraph 6.22 of said document.
They kicked off in earth-shattering style by publishing, this month (July 2013), a thorough examination of JACS (the Joint Academic Coding System). JACS is, fundamentally, a semi-hierarchical near-comprehensive list of subject areas taught at universities and colleges in the UK. You’ll have noticed that that last statement contains more qualifications than any MOOC ever – this is because, although the aims and intent of JACS are laudable, it has a couple of very serious issues.
As a primer for the uninitiated, a JACS code is meant to define an academic subject precisely, using a short code consisting of one letter followed by three digits. So:
- Ú997 might be “Subject Coding LOLs”, with
- Ú990 being “Higher Education Policy Data LOLs”
- Ú900 being “Higher Education LOLs”, and
- Ú being LOLs.
So, ideally a properly coded data return could be drilled down to the appropriate level for analysis. Alas, as so often in metadata, the reality falls far short of this noble dream.
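To make the "drilling down" concrete, here is a minimal sketch of how a code of this shape could be rolled up to its broader levels. It assumes trailing zeros mean "not specified any further", as in the made-up Ú example; `split_code`, `roll_up` and `ancestors` are hypothetical helper names, not anything HESA publishes.

```python
import re
from collections import Counter

# One letter followed by three digits, as described above. Accepting any
# Unicode letter (so the made-up "Ú" group parses) is an assumption of this sketch.
CODE_SHAPE = re.compile(r"^([^\W\d_])(\d{3})$")


def split_code(code: str) -> tuple[str, str]:
    """Split a code into its subject-group letter and its three digits."""
    match = CODE_SHAPE.match(code)
    if match is None:
        raise ValueError(f"expected one letter and three digits, got {code!r}")
    return match.group(1), match.group(2)


def roll_up(code: str, level: int) -> str:
    """Truncate a code to `level` significant digits (0 = subject group only)."""
    if not 0 <= level <= 3:
        raise ValueError("level must be between 0 and 3")
    letter, digits = split_code(code)
    if level == 0:
        return letter
    # Assumption: unused trailing positions are padded with zeros.
    return letter + digits[:level].ljust(3, "0")


def ancestors(code: str) -> list[str]:
    """List the code and every broader code above it, most specific first."""
    chain: list[str] = []
    for level in (3, 2, 1, 0):
        broader = roll_up(code, level)
        if not chain or chain[-1] != broader:
            chain.append(broader)
    return chain


# A pretend data return: one code per module, counted at the first-digit level.
modules = ["Ú997", "Ú990", "Ú912", "G310"]
by_top_level = Counter(roll_up(code, 1) for code in modules)
```

With this, `ancestors("Ú997")` walks up through Ú990 and Ú900 to the bare letter, and `by_top_level` counts three modules under Ú900 and one under G300. Of course, the roll-up is only as trustworthy as the codes that were entered in the first place.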
Firstly, not many people actually want to use it, and those that do use it use it inconsistently. The main user is HESA, who manage it so kind of have to. UCAS initially wanted to use it but don’t, ending up inventing their own alternative method also involving letters and/or numbers. HEFCE sometimes use it, but not always (eg price groups for teaching funding do not map cleanly to JACS). Some institutions use it for internal purposes, some don’t.
The HEDIIP report describes a lovely test where institutional staff were asked to provide a JACS code for a module “Statistics for Archaeologists” when taught by (a) Mathematics staff and (b) Archaeology staff. HESA guidance suggested that “Applied Statistics” (G310) was the correct code in both cases, but only 10% of respondents gave that answer. (The most common answers were G310 for a Mathematics delivery and V460, Archaeological Techniques, for an archaeological delivery.)
(When I was at HEFCE we used it for an internal analysis of the CETLs – remember them? – so I know first-hand the vagaries of applying these codes to specific subject areas with any degree of consistency.)
“A respondent to the review survey cited one particularly anomalous example from the Unistats website where their BA Fashion Marketing course returns satisfaction statistics from Materials & Minerals Technology and work and/or further study statistics from Engineering & Technology. Without going into the detail of whether this is down to the institution’s own coding or the KIS algorithms, it is clear that such data is of little use (and indeed misleading) to potential students and that the current classification system is not lending itself to enabling clarity.” (p25)
Of course, a system where:
“A particular module of study will:
- have a main subject of study (identified by one or more JACS codes);
- belong to a cost centre in the HESA record (although this attribute derives from the teacher – see below);
- be taught by one or more members of staff who are assigned to a cost centre and belong to a (or possibly more than one) ‘home’ department (or school);
- be ‘owned’ by a particular department (or school or other organisational unit which may itself have multiple cost centres or which may indeed be part of a larger cost centre);
- belong to one or more courses or programmes of study (also identified by one or more JACS codes) which is also ‘owned’ by a particular department (or school) which may be different to the teaching department” (p23)
is bound to contain many such coding errors and unsafe generalisations when only the higher levels of JACS are used.
The second major issue is that, in an IPv4-like moment, we are running out of codes for emerging subjects. As academic subjects develop, largely in areas of new knowledge and on the boundaries of cognate subjects, codes must be assigned to them.
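A rough back-of-envelope sketch shows why a hierarchy like this runs dry sooner than the headline numbers suggest. The big assumption, taken only from the made-up Ú example rather than from the JACS specification, is that at each level the digits 1 to 9 name a child and 0 means "not specified any further". All function names here are hypothetical.

```python
# Assumption for this sketch: at each level the digits 1-9 name a child and
# 0 means "not specified any further", as in the made-up Ú900 / Ú990 / Ú997 example.
CHILDREN_PER_NODE = 9
LEVELS = 3


def codes_per_group() -> int:
    """Count every distinct code below one subject-group letter."""
    return sum(CHILDREN_PER_NODE ** depth for depth in range(1, LEVELS + 1))


def child_slots(parent: str) -> list[str]:
    """All codes exactly one level below `parent` (a bare letter or a full code)."""
    letter, prefix = parent[0], parent[1:].rstrip("0")
    if len(prefix) == LEVELS:
        return []  # already at the most specific level
    return [letter + (prefix + str(d)).ljust(LEVELS, "0") for d in range(1, 10)]


def free_slots(parent: str, ever_assigned: set[str]) -> list[str]:
    """Child codes still available under `parent`.

    Retired codes stay in `ever_assigned`, because codes are never reused.
    """
    return [code for code in child_slots(parent) if code not in ever_assigned]
```

Under these assumptions a subject group holds 9 + 81 + 729 = 819 codes, which sounds roomy. The squeeze is local: a fast-growing area can only ever have nine children under any one parent, however empty the neighbouring branches are.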
The last major revision to JACS was version 3.0, which applies to student entry for 2012. It included changes and additions in a range of subject areas, most notably the addition of a new “I” group for Computer Science, splitting from the former Mathematical and Computer Sciences (group “G”). As the report notes:
“This permitted some expansion in Mathematics (without any reuse or renaming of codes which is poor data management practice and, hence, explicitly prohibited in JACS) whilst supporting the larger of the growth areas. This has solved an immediate problem but the capacity for this type of amendment is extremely limited and there are other subject areas which will ultimately run out of codes.” (p9)
Again – codes, especially in occasionally lab-based subjects like computing, can mean the difference between securing HEFCE funding and being paid for by fees only. This might seem like a data nerd issue but it can mean the difference between a course running and staff being laid off.
Given that (rightly or wrongly) there is an increased emphasis on data quality and data availability in Higher Education, it is clear that “somebody should do something”. HEDIIP makes a number of rational and practicable recommendations:
- “the new framework should recognise the, currently implicit, assumption (at least by HEIs) that JACS is a discipline-based classification;
- in developing the new framework the Higher Education Academy’s discipline areas should be considered as a starting point;
- the new framework should consist of three rather than four levels;
- the new framework should be based on a six digit coding structure;
- the new framework should provide a persistent URI (uniform resource identifier) for each of the entities in the classification;
- the authoritative URIs should be developed and maintained as a web service for the sector;
- the new framework should be explicitly assigned an open licence;
- a project to implement the new coding framework will require a clear and targeted strand of communications activity.”
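For the URI recommendations, a small sketch of what "one persistent URI per entity" might look like in practice. Everything specific here is an assumption: the report does not say how six digits map onto three levels (two digits per level is my guess), and the `example.org` base address and the `SubjectCode` class are purely hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical base address; no real service is implied.
BASE_URI = "https://example.org/subject-codes/"


@dataclass(frozen=True)
class SubjectCode:
    """A six-digit code, assumed here to be three levels of two digits each."""

    code: str

    def __post_init__(self) -> None:
        if len(self.code) != 6 or not (self.code.isascii() and self.code.isdigit()):
            raise ValueError(f"expected six digits, got {self.code!r}")

    @property
    def uri(self) -> str:
        """The identifier for this entity: the base address plus the code, never reassigned."""
        return BASE_URI + self.code

    def broader(self) -> Optional["SubjectCode"]:
        """The next level up, or None if this is already a top-level code."""
        pairs = [self.code[i:i + 2] for i in (0, 2, 4)]
        for index in (2, 1):
            if pairs[index] != "00":
                pairs[index] = "00"  # assumption: "00" marks an unused level
                return SubjectCode("".join(pairs))
        return None

    def describe(self) -> dict:
        """A tiny linked-data-style record pointing at the broader entity."""
        parent = self.broader()
        return {"id": self.uri, "broader": parent.uri if parent else None}
```

So `SubjectCode("010203").describe()` would give a record whose `broader` field points at the URI for `010200`, which anyone could then follow without needing a copy of the spreadsheet.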
Proper data-nerds will feel the thrill of persistent URIs leading to a linked data approach and an open licence, which would be a very sensible way of allowing simpler and more distributed analysis of information linked to funding and league tables. One of the worst things (from a very strong field of worst things indeed) to come out of the Browne/Willetts funding model has been the increasing politicisation of educational data, and anything that can act as a brake on this can only be welcomed.
But these recommendations – notably – do not do anything to support consistent use of a new set of codes, or to avoid the systemic “gaming” of data to maximise funding returns and student recruitment. In this matter, sadly, it is the system – not the tools such as JACS or whatever follows – that is at fault.