Opening Pandora's Box - Exploring the Key Information Set


The number one question from today's prospective students - if I come to your institution to study, how likely is that to lead to a decent job afterwards?  And anyway, what courses are likely to be the most lucrative for me, and which ones should I avoid like the plague?  Thanks to an initiative called the Key Information Set (KIS), all the data you need to answer these and similar questions for yourself is now openly available.  Let's take a look...

The Open Data Strategy from the UK's Department for Business, Innovation and Skills says:

The Key Information Sets will help applicants to find quickly, and compare easily, the headline items which students consider most important.  The content includes:  
Course information 
 Student satisfaction
 Proportion of time spent in different learning and teaching activities
 Different assessment methods used
 Professional bodies that recognise the course  
Costs 
 Accommodation costs
 Tuition charges
 Bursaries, scholarships and other financial support  
Employment 
 Destinations of students six months after completing their course
 Proportion of students employed in a full-time ‘graduate’ job six months after completing course
 Salary for course six months after graduating
 Salary for that subject across all institutions six months after graduating
 Salary for that subject across all institutions forty months after graduating  
The students’ union
 Impact students’ union has had on time as a student

Open By Default

I've blogged recently about open data and the sector's move towards being open by default, and KIS is a great example of this principle at work.  The underlying KIS data is available in CSV format (suitable for importing into the likes of Excel, LibreOffice or Google Docs) and as structured XML data for more advanced processing.  You can also query a RESTful API operated as part of the Unistats service.  Grab your copy of KIS from the open data section of the Unistats website.
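For anyone who wants to go beyond a spreadsheet, here is a minimal Python sketch of pulling a CSV table into a program. It assumes a UTF-8 CSV file with a header row. The column names `institution` and `course` are hypothetical placeholders, not the actual KIS file layout, so check the real headers in the download before adapting it.

```python
import csv
import io
from typing import Dict, List, TextIO


def load_kis_rows(stream: TextIO) -> List[Dict[str, str]]:
    """Read a CSV table into a list of dicts keyed by its header row.

    Works with any text stream, e.g. open(path, newline="", encoding="utf-8")
    for a downloaded file, or io.StringIO for an in-memory string.
    """
    return list(csv.DictReader(stream))


# Hypothetical sample: these column names and values are placeholders,
# not the real KIS schema.
SAMPLE = """institution,course
Example University,Aeronautical Engineering
Another University,History
"""

rows = load_kis_rows(io.StringIO(SAMPLE))
for row in rows:
    print(f"{row['institution']}: {row['course']}")
```

Keying each row by its header means the code does not depend on column order, which helps if the layout of the published files changes between releases.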

The KIS data is made available under the provisions of something called the Open Government Licence (OGL), as shown in the figure below.  You're going to be seeing a lot more of this as more and more public sector data is "opened" for everyone to access.  A good example is the JISC Open Course Data Programme, which is opening up course marketing information using the XCRI-CAP standard and the OGL.  See our Open Lboro blog for more information about this.
Open Government Licence

Opening Pandora's Box

But there is a cost to opening up data, and this is where things start to get complicated, messy and expensive.  Part of this is redacting or anonymizing data that contains, say, personal information that was never expected to become public.  Another part is the cost of developing and maintaining facilities such as the Unistats RESTful API, and more generally the infrastructure used to serve up the information that you are making open.

This is a briar patch that most institutions are only just starting to venture into, by exploring Open Access to research publications and research data, or through Open Educational Resources.  When we start up a new project, it's relatively easy to be open by default from the outset, and then the habit sets in.  However, it probably isn't realistic or useful to aim to open up anything and everything from the larger body of pre-existing information.  A pragmatic approach would be to combine a central (funder's) mandate for key data sets or classes of data with opening up specific data on request.

Going back to KIS...  Now that we have KIS as open data, we can go beyond the use cases that services like Unistats have already been built around, and start to mash up the KIS data with other information.  One example that I find particularly interesting is the mapping between Learn Direct Classification System (LDCS) codes and the UCAS Joint Academic Coding System (JACS).  LDCS codes are used to classify Further Education (FE) courses, and JACS codes to classify Higher Education (HE) courses.  JACS codes are also part of KIS, which potentially gives students going into FE a way of mapping out their optimum path through both FE and HE in terms of the outcomes recorded in KIS.
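To make the LDCS-to-JACS idea a little more concrete, here is a hedged Python sketch of the join. Everything in it is an assumption for illustration: the codes are placeholders rather than real LDCS or JACS values, the outcome figures are made up, and `he_options_for` is a hypothetical helper. In practice the crosswalk would come from a published LDCS/JACS mapping and the outcomes from the KIS files.

```python
from typing import Dict, List, Tuple

# Hypothetical crosswalk from an FE (LDCS) code to related HE (JACS) codes.
# Placeholder codes only, not real LDCS or JACS values.
LDCS_TO_JACS: Dict[str, List[str]] = {
    "LDCS-A": ["JACS-1", "JACS-2"],
    "LDCS-B": ["JACS-3"],
}

# Hypothetical per-JACS outcome from KIS, e.g. the proportion of students in
# a full-time "graduate" job six months after the course. Made-up figures.
JACS_OUTCOME: Dict[str, float] = {
    "JACS-1": 0.62,
    "JACS-2": 0.81,
    "JACS-3": 0.55,
}


def he_options_for(ldcs_code: str) -> List[Tuple[str, float]]:
    """Return the JACS codes mapped from an LDCS code, best outcome first.

    JACS codes with no outcome figure are skipped rather than guessed, and an
    unknown LDCS code yields an empty list.
    """
    jacs_codes = LDCS_TO_JACS.get(ldcs_code, [])
    scored = [(j, JACS_OUTCOME[j]) for j in jacs_codes if j in JACS_OUTCOME]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


print(he_options_for("LDCS-A"))  # [('JACS-2', 0.81), ('JACS-1', 0.62)]
```

Keeping the crosswalk and the outcomes in separate tables mirrors the fact that they come from different sources, so either one can be refreshed independently.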

And why is this like opening Pandora's Box?  Once it becomes trivial to compare courses and institutions through KIS, market forces will start to intervene in earnest.  The "weaker" courses and institutions will a) find it more difficult to recruit, and b) struggle to justify charging £9K per annum for tuition fees.  KIS "widgets" have to be included on University prospectus sites, so that prospective students have all the facts at their disposal when making key decisions, e.g.

Key Information Set widget for Aeronautical Engineering on Loughborough website

Mash It Up and Start Again

One could say that KIS includes very little that was not already generally available.  However, two factors set KIS apart from the likes of the Destinations of Leavers from Higher Education (DLHE) survey: open (and free) access to the dataset, and the requirement that the KIS information be displayed prominently on University websites.

Unistats may be all about course comparison, but open access to the underlying data potentially lets everyone trivially ask questions like "which Universities have the most unemployed graduates?" and "which Universities' students have the lowest starting salaries?"  In an increasingly market-driven version of Higher Education, the answers to these questions could well determine the fate of an institution, as prospective students flock towards the more "successful" Universities.
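As an illustration of how little effort the "lowest starting salaries" question takes once the data is open, here is a minimal Python sketch. It assumes hypothetical columns `institution` and `salary_6m` (not the real KIS field names) and uses made-up sample data. It takes an unweighted mean of course-level figures per institution, ignoring cohort sizes, so treat it as a rough first pass rather than a sound league table.

```python
import csv
import io
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def lowest_salary_institutions(csv_text: str, n: int = 3) -> List[Tuple[str, float]]:
    """Rank institutions by the unweighted mean of their course-level
    six-month salary figures, lowest first, returning at most n entries.

    Assumes hypothetical columns 'institution' and 'salary_6m'; rows with a
    blank salary are skipped, so an institution with no figures is omitted.
    """
    salaries: Dict[str, List[float]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = row["salary_6m"].strip()
        if value:
            salaries[row["institution"]].append(float(value))
    averages = [(inst, mean(vals)) for inst, vals in salaries.items()]
    averages.sort(key=lambda pair: pair[1])
    return averages[:n]


# Made-up sample data for illustration only.
SAMPLE = """institution,course,salary_6m
Uni A,Course 1,20000
Uni A,Course 2,24000
Uni B,Course 3,19000
Uni C,Course 4,
"""

print(lowest_salary_institutions(SAMPLE))  # [('Uni B', 19000.0), ('Uni A', 22000.0)]
```

Swapping the salary column for an unemployment proportion (and sorting highest first) would answer the "most unemployed graduates" question in the same way.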

But let's not forget how the story of Pandora's Box ends...
Pandora, who felt all was lost, sadly opened the box. A beautiful sprite with gossamer wings flew shimmering into the sunlight. Round and round her body the creature flew, lighting only when a sore was encountered. As the creature touched the hurt, it was gone. When Pandora had been healed completely, the creature flew to heal Epimetheus. Pandora sat back against the box and thought. Hope (she was certain that was the creature's name) continued her healing. 
In time the sprite flew back and rested exhausted on Pandora's shoulder. Pandora watched as the creature drifted painlessly into her flesh and took up residence in her heart. She knew she had been given the gift that, even though it could not erase the pain she had brought to the world, could make that pain easier. 
She smiled a soft smile for knowing there is hope, and hope is sometimes enough.
 :-)



Postscript

Thanks to Owen Boswarva for alerting me to the fact that HESA offers the KIS dataset under its own license, which draws on the Open Government Licence, rather than under the OGL itself.  This is unfortunate.  As Nigel Shadbolt says on the data.gov.uk blog, the point of the Open Government Licence was to come up with a single licensing regime that...
anyone in the wider public sector can use [...] whether they are local authorities, police forces, universities, or hospitals. The Local Data Panel which I chair is recommending that all local authorities use the Open Government Licence when publishing their data to make themselves more accountable and open to taxpayers. In the past, licence variations were a significant barrier to data publication.
The key elements of the KIS license draw upon the OGL, but the document includes a significant amount of additional text, notably:
Your access and use of the Unistats Dataset is subject to your acceptance of, and compliance with, the provisions of the terms and conditions below, HESA's Website Privacy Policy set out at http://www.hesa.ac.u/privacy and any other legal notices and/or instructions which may appear on http://www.hesa.ac.uk/unistatsdata ("this Website") from time to time (together the "Terms of Use").
Whilst the KIS open data download is available directly to anyone who knows the URL (http://www.hesa.ac.uk/unistats_file.php), this is not advertised anywhere, and visitors to the Unistats site are instead directed to accept the terms and conditions of the KIS license on the HESA site before clicking through to the data.  It is debatable whether this goes against the letter of the Open Government Licence, but it is certainly not in the spirit of the OGL.  As the OGL FAQ says:
The Open Government Licence is an implied licence. By using information made available under the licence you indicate that you have accepted its terms and conditions.
Let's hope (as Pandora might) that this confusion about licensing is quickly resolved and the KIS data moved unequivocally to the OGL, with no click-through requirement!