You are currently browsing the monthly archive for March 2016.
One of the great things about new projects is that they offer the opportunity to learn new skills as well as build on existing knowledge. So our new library data project is giving plenty of opportunities to learn new things and new tools to help with data extraction and data analysis.
After a bit of experimentation with the best method of getting extracts of library data (including trying to do it through Access) we settled on using MySQL Workbench version 6.3 with read-only access to the database tables storing the library data. It's been a bit of a learning curve to understand the tool, the SQL syntax and the structure of our data, but direct access to the data means that the team can extract the data needed and quickly test out different options or extracts. In the past I've mainly used tools such as Cognos or Oracle Business Intelligence, which essentially hide the raw SQL queries behind a WYSIWYG interface, so it's been interesting to use this approach. It's been really useful to be learning the tool with the project team, because it means that I can get SQL queries checked to make sure they are doing what I think they are doing, and share queries across the team.
In the main I'm running the SQL query, checking that I've got the data I want, and then exporting the data as .csv to do further tidying and cleaning in MS Excel. But I have learnt a few useful things, including how to add an anonymised ID as part of the query (useful if you don't need the real ID but just need to know which users are unique, and much easier to do in SQL than in Excel).
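As a sketch of that anonymised-ID idea (using Python's built-in SQLite as a stand-in for MySQL, with made-up table and column names), a window function such as DENSE_RANK() can assign the same anonymous number to every row belonging to a user. Note that window functions need a reasonably recent database engine (MySQL 8+, SQLite 3.25+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (user_id TEXT, item TEXT)")
conn.executemany("INSERT INTO loans VALUES (?, ?)",
                 [("u123", "book A"), ("u123", "book B"), ("u456", "book C")])

# DENSE_RANK() gives every row for the same user the same number,
# so you can tell users apart without exposing the real ID.
rows = conn.execute("""
    SELECT DENSE_RANK() OVER (ORDER BY user_id) AS anon_id, item
    FROM loans
    ORDER BY anon_id, item
""").fetchall()
print(rows)  # [(1, 'book A'), (1, 'book B'), (2, 'book C')]
```

The real IDs never leave the database, which is handy when the extract is going to be shared or analysed outside the team.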
I’ve certainly learnt a lot more about Excel. It’s been the tool that I’ve used to process the data extracts, to join data together from other sources and (for the time being at least) to present tables and visualisations of the data. Filtering and pivot tables have been the main techniques, with frequent use of pivot tables to filter data and provide counts. Features such as Excel 2013’s pivot table ‘distinct count’ have been useful.
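To illustrate what 'distinct count' is doing (with invented usage records), the point is to count unique users per group rather than total rows, which an ordinary pivot-table count would give you:

```python
from collections import defaultdict

# Hypothetical usage records: (user_id, resource) pairs, with repeats.
records = [
    ("u1", "ejournals"), ("u1", "ejournals"), ("u2", "ejournals"),
    ("u1", "ebooks"), ("u3", "ebooks"),
]

# Equivalent of a pivot table 'distinct count' of users per resource:
# collect the set of users seen for each resource, then take its size.
users_per_resource = defaultdict(set)
for user, resource in records:
    users_per_resource[resource].add(user)

distinct_counts = {r: len(u) for r, u in users_per_resource.items()}
print(distinct_counts)  # {'ejournals': 2, 'ebooks': 2}
```

A plain count would report three 'ejournals' rows; the distinct count correctly reports two users.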
One of the tasks I've been doing in Excel is joining two data sources together, e.g. joining counts of library use via EZproxy and via Athens, or joining library use with data on student results. I'd started out mainly using VLOOKUP in Excel but have switched (on the recommendation of a colleague) to using INDEX/MATCH, as it seems to work much better (if you can get the syntax exactly right).
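The join itself is a simple idea, whichever tool does it. A sketch in Python with invented data shows the same lookup INDEX/MATCH performs: find the row whose key matches, and pull a value from another column:

```python
# Hypothetical extracts: EZproxy access counts and student results,
# both keyed by an (anonymised) user ID.
ezproxy = {"u1": 40, "u2": 12, "u3": 0}
results = [("u1", "pass"), ("u2", "fail"), ("u4", "pass")]

# The dict lookup plays the role of INDEX/MATCH: match on the key,
# return the value from the other table. Keys with no match come back
# as None, much like #N/A in Excel.
joined = [(user, grade, ezproxy.get(user)) for user, grade in results]
print(joined)  # [('u1', 'pass', 40), ('u2', 'fail', 12), ('u4', 'pass', None)]
```

Deciding what to do with the non-matches (the #N/A rows) is usually the part that needs actual thought.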
The project team is starting to think that, as we learn more about SQL, we should try to do more of the data manipulation and counts directly through the SQL queries, as doing them in Excel can be really time-consuming.
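As a small sketch of what that shift looks like (again using SQLite in place of MySQL, with made-up names), a single GROUP BY can replace an export plus a pivot table, producing both row counts and distinct user counts in one query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access_log (user_id TEXT, resource TEXT)")
conn.executemany("INSERT INTO access_log VALUES (?, ?)", [
    ("u1", "ejournals"), ("u1", "ejournals"), ("u2", "ejournals"),
    ("u2", "ebooks"),
])

# Aggregation pushed into SQL: total accesses and distinct users
# per resource, ready to export as a finished summary.
counts = conn.execute("""
    SELECT resource, COUNT(*) AS accesses, COUNT(DISTINCT user_id) AS users
    FROM access_log
    GROUP BY resource
    ORDER BY resource
""").fetchall()
print(counts)  # [('ebooks', 1, 1), ('ejournals', 3, 2)]
```

The result arrives already summarised, so Excel only has to present it rather than compute it.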
SPSS has been a completely new tool to me. We're using IBM SPSS Statistics version 21 to carry out the statistical analyses. Again it's got a steep learning curve and I'm finding I need frequent recourse to some of the walk-throughs on sites such as Laerd Statistics, e.g. https://statistics.laerd.com/spss-tutorials/one-way-anova-using-spss-statistics.php. But I'm slowly getting to grips with it, and as I get more familiar with it I can start to see more of its value. Once you've got the data into the data table and organised properly it's really quick to run correlation or variance tests, although that quickly starts to raise questions about which test to use and why, and what the results mean. I particularly like the output window, which tracks all the actions and shows any charts you've created or analyses you've undertaken on the data.
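To make the correlation tests a little more concrete, here is the Pearson correlation coefficient (the statistic behind SPSS's bivariate correlation) worked through by hand on invented figures for library logins and final marks: the covariance of the two variables divided by the product of their standard deviations.

```python
import math

# Hypothetical paired observations: library logins vs final marks.
logins = [2, 5, 1, 8, 4, 7]
marks  = [55, 64, 50, 80, 60, 72]

n = len(logins)
mx, my = sum(logins) / n, sum(marks) / n

# Covariance (unnormalised) and the two standard-deviation terms.
cov = sum((x - mx) * (y - my) for x, y in zip(logins, marks))
sx = math.sqrt(sum((x - mx) ** 2 for x in logins))
sy = math.sqrt(sum((y - my) ** 2 for y in marks))

r = cov / (sx * sy)
print(round(r, 3))  # 0.987 — a strong positive correlation in this toy data
```

Of course the hard part SPSS can't answer for you is the one flagged above: whether this is the right test for the data, and what a given r actually means.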
The team is in the early stages of exploring the SAS system that is used for our institutional data warehouse. Ultimately we'd want to get library use data into the institutional data warehouse and then query it alongside other institutional data directly from the warehouse. SAS apparently has statistical analysis capabilities, but the learning curve seems fairly steep. We've also thought about whether tools such as OpenRefine might be useful for cleaning up data but haven't been able to explore that yet. Similarly, I know we have a need for tools to present and visualise the data findings – ultimately that might be met by an institutional SAS Visual Analytics tool.
I was particularly interested in a term I came across the other week in a post on innovation on the Nesta blog: Innovation in the public sector: Is risk aversion a cause or a symptom? The post talks about Organisation Debt and Organisational Physics and is a really interesting take on why large organisations can struggle with innovation. It's well worth a read. It starts by referencing the concept of 'technical debt', described in the post as "… where quick fixes and shortcuts begin to accumulate over time and eventually, unless properly fixed, can damage operations." It's a term that tends to be associated with software development, but it started me thinking about how the concept might be relevant to the library world.
If we expand the technical debt concept to the library sector, I'd suggest there are at least three areas where it might have some resonance: library systems, library practices, and perhaps a third around library 'culture' – potentially a combination of collections, services and something of the 'tradition' of what a library might be.
Our systems are a complex and complicated mix: library management systems, e-resources management systems, discovery layers, OpenURL resolvers, link resolvers, PC booking systems and so on. It can be ten years or more between libraries changing their LMS, and although, with library services platforms, we are seeing some consolidation of systems into a single product, there is still a job to do integrating legacy systems into the mix. For me the biggest area of 'technical debt' comes in our approach to linking and websites. Libraries typically spend significant effort on making links persistent, and on coping with the transition from one web environment to another by redirecting URLs. It's not uncommon to have redirection processes in place to cope with direct links to content on previous websites, trying to connect users directly to the replacement sites. Yet on the open web 'link rot' is a simple fact of life. I'd suggest that trying to manage these legacy links is a significant technical debt that libraries carry.
I think you could point to several aspects of library practice that fall under the category of technical debt, but I'd suggest the primary one is in our library catalogues and cataloguing practices. Our practices change across the years, but overall the quality of our older records is often lower than we'd want. Yet we typically carry those records across from system to system. We try to improve them or clean them up, but frequently it's hard to justify the resource spent on 're-cataloguing' or 'retrospective cataloguing'. Newer approaches, making use of collective knowledge bases and linking holdings to records, have some impact on our ability to update our records, but the quality of some records in knowledge bases can also fall short of what libraries would like.
You could also describe some other aspects of the library world as showing the symptoms of technical debt. Our physical collections of print resources are increasingly unmanaged and often unused, as constrained resources are directed to higher priorities and more attention is spent on building online collections of ebooks, for example. You could even, potentially, see a common thread with the whole concept of a 'library' – the popular view of a library as a place of books means that while libraries develop new services they often struggle to change their image to include the new world.
One of the first projects I worked on at the OU was a Jisc-funded project called Telstar. Telstar built a reference management tool called MyReferences, integrating RefWorks into a Moodle Virtual Learning Environment (VLE). Well, that MyReferences tool shortly reaches what software people call 'End-of-Life' and the website world likes to refer to as 'sunsetting' – in other words, MyReferences is closing down later this month. So it seemed like a good time to reflect on some of the things I've learnt from that piece of work.
In a lot of ways, several things that Telstar and MyReferences did have now become commonplace and routine. References were stored remotely on the RefWorks platform (we'd now describe that as cloud-hosted), and that's almost become a default way of operating, whether you think of email with Outlook 365 or library management systems such as Ex Libris Alma. Integration with Moodle was achieved using an API – again, now a standard approach. But both seemed quite a new departure in 2010.
I remember it being a complex project in lots of ways, creating integrations not just between RefWorks and Moodle but also making use of some of the OpenURL capabilities of SFX. It was also quite ambitious in aiming to provide solutions applicable to both students and staff. Remit (the Reference Management Integration Toolkit) gives a good indication of some of the complexities not just in systems but also in institutional and reference management processes. The project not only ran a couple of successful Innovations in Reference Management events but led to the setup of a JiscMail reading list systems mailing list.
Complexity is the main word that comes to mind when thinking about some of the detailed work that went into mapping referencing styles between OU Harvard in RefWorks and MyReferences, to ensure that students could get a simplified reference management system in MyReferences without having to plunge straight into the complexity of full-blown RefWorks. It really flagged for me the implications of not having standard referencing styles across an institution, and the impact of designing your own custom institutional style rather than adopting a standard style that is already well supported. One of the drawbacks of using RefWorks as a resource list system was that each reference in each folder was a separate entity, meaning that any change to a resource (its name, for example) had to be updated in every list/folder. So it taught us quite a bit about what we ideally wanted from a resource list management/link management system.
Reference management has changed massively in the past few years, with web-based tools such as Zotero, RefME and Mendeley becoming more common, and Microsoft Office providing support for reference management. So the need to provide institutional systems has maybe passed, now that so many are available on the web. And I think it reflects how any tool or product has a lifecycle of development, adoption, use and retirement. Maybe that cycle is now much shorter than it would have been in the past.
A tweet from @aarontay referring to an Infodocket article flagged up the relaunch of Microsoft Academic Search. Badged as a preview of Microsoft Academic from Microsoft Research, the site says that it now includes 'over 80 million publications' and makes use of semantic search, as well as including an API, the Academic Knowledge API. Trying it out briefly with a subject that's a current hot topic, it quickly brings back good and relevant results. It sorts by relevance as a default but offers a re-sort by Newest, Oldest or Most citations. Sorting by newest brings back some results from 2016, so it seems to have up-to-date content. Sorting by citations brings back results in citation order but also shows some form of subject tagging, although it isn't clear how the tags are arrived at.
Where the full text is available there are links to it, or to PDFs for what are presumably open sources. There's a fairly extensive list of facets for Year, Authors, Affiliations, Fields of Study, Journals and Conference proceedings. Some of the links seem to go to Deepdyve.com, which offers a subscription service to academic content; others seem to make use of DOIs, linking to sites such as the Wiley Online Library. When you filter down to an individual journal, a block appears on the right with details of the journal, and you can drill down to links to the journal.
Results appear quickly and look to be relevant. What doesn't seem to be there at the moment is a way to configure your local EZproxy-type settings to link through to resources you have access to through an institutional subscription, and I wonder if it will offer the ability for institutions to publish their subscriptions/holdings, as they can with Google Scholar. It will be interesting to see how this develops and what the long-term plans are. But it might be a useful alternative to Google Scholar.