The news, reported in an article by Marshall Breeding in American Libraries, that EBSCO has decided to support a new open source library services platform is a fascinating development. To join with Kuali OLE but to develop what will essentially be a different open source product is a significant move for the library technology sector. It’s particularly interesting that EBSCO has gone the route of providing financial support to an open source system, rather than buying a library systems company. The scope and timescales are ambitious, aiming to have something ready for 2018.
Open source library management systems haven’t had the impact that systems like Moodle have had in the virtual learning environment sector, and in some ways it is odd that academic libraries haven’t been willing to adopt such a system, given that universities do seem to have an appetite for open source software. Maybe open source library systems products haven’t been developed sufficiently to compete with commercial providers. Software as a Service (SaaS) is now coming to be accepted by corporate IT departments as a standard method of service provision, something that I think a couple of the commercial providers realised at quite an early stage, so it is good to see this initiative recognising that reality. It will be interesting to see how this develops.
Analytics seems to be a major theme of a lot of conferences at the moment. I’ve been following a couple of library sector conferences this week on Twitter (Talis Insight http://go.talis.com/talis-insight-europe-2016-live #talisinsight and the 17th Distance Library Services Conference http://libguides.cmich.edu/dls2016 #dls16) and analytics has been a very common thread at both.
A colleague at the DLS conference tweeted a picture about the impact of a particular piece of practice, and that set us off thinking: did we have that data? Did we have examples of where we’d done something similar? The good thing now is that rather than thinking ‘it would be good if we could do something like that’, we have a bit more confidence – if we get the examples and the data, we know we can do the analyses, but we also know we ‘should’ be doing the analyses as a matter of course.
It was also good to see other colleagues (@DrBartRienties) at the university presenting some of the University’s learning analytics work at Talis Insight. Being at a university that is undertaking a lot of academic work on learning analytics is really helpful when you’re trying to look at library analytics, and it also provides a valuable source of advice and guidance in some of our explorations.
[As an aside, and having spent much of my library career in public libraries, I’m not sure how much academic librarians realise the value of being able to talk to academics in universities, to hear their talks, discuss their research or get their advice. In a lot of cases you’re able to talk with world-class researchers doing ground-breaking work and shaping the world around us.]
I was particularly interested in a term I came across the other week in a blog post on innovation on the Nesta blog: Innovation in the public sector: Is risk aversion a cause or a symptom? The post talks about Organisation Debt and Organisational Physics and is a really interesting take on why large organisations can struggle with innovation. It’s well worth a read. It starts by referencing the concept of ‘technical debt’, described in the post as “… where quick fixes and shortcuts begin to accumulate over time and eventually, unless properly fixed, can damage operations.” It’s a term that tends to be associated with software development, but it started me thinking about how the concept of ‘technical debt’ might be relevant to the library world.
If we expand the technical debt concept to the library sector, I’d suggest there are at least three areas where it might have some resonance: library systems, library practices, and maybe a third around library ‘culture’ – potentially a combination of collections, services and something of the ‘tradition’ of what a library might be.
Our systems are a complex and complicated mix: library management systems, e-resources management systems, discovery layers, OpenURL resolvers, link resolvers, PC booking systems and so on. It can be ten years or more between libraries changing their LMS and although, with Library Services Platforms, we are seeing some consolidation of systems into a single product, there is still a job to do of integrating legacy systems into the mix. For me the biggest area of ‘technical debt’ comes in our approach to linking and websites. Libraries typically spend significant effort in making links persistent, coping with the transition from one web environment to another by redirecting URLs. It’s not uncommon to have redirection processes in place to cope with direct links to content in previous websites, trying to connect users directly to replacement pages. Yet on the open web ‘link rot’ is a simple fact of life. Trying to manage these legacy links is, I’d suggest, a significant technical debt that libraries carry.
I think you could point to several aspects of library practice that fall under the category of technical debt, but I’d suggest the primary one is in our library catalogue and cataloguing practices. Our practices change across the years, but overall the quality of our older records is often lower than we’d want to see. Yet we typically carry those records across from system to system. We try to improve them or clean them up, but frequently it’s hard to justify the resource spent on ‘re-cataloguing’ or ‘retrospective cataloguing’. Newer approaches, making use of collective knowledge bases and linking holdings to records, have some impact on our ability to update our records, but the quality of some of the records in knowledge bases can also fall short of the level that libraries would like.
You could also describe some other aspects of the library world as showing the symptoms of technical debt. Take our physical collections of print resources: increasingly unmanaged and often unused, as constrained resources are directed to higher priorities and more attention is spent on building online collections of ebooks, for example. You can even, potentially, see a common thread with the whole concept of a ‘library’ – the popular view of a library as a place of books means that while libraries develop new services they often struggle to change their image to include the new world.
The end of 2015 and the start of 2016 seem to have delivered a number of interesting reports and presentations relevant to the library technology sphere. So Ken Chad’s latest paper ‘Rethinking the Library Services Platform’ picks up on the lack of interoperability between library systems, as does the new BiblioCommons report on the public library sector, ‘Essential Digital Infrastructure for Public Libraries in England’, commenting that “In retail, digital platforms with modular design have enabled quickly-evolving omnichannel user experiences. In libraries, however, the reliance on monolithic, locally-installed library IT has deadened innovation”.
As ‘Rethinking the Library Services Platform’ notes, in many ways the term ‘platform’ doesn’t really match the reality of the current generation of library systems. They aren’t a platform in the way an operating system such as Windows or Android is; they don’t operate in a way that lets third parties build applications to run on the platform. Yes, they offer integration with financial, student and reference management systems, but essentially they are the traditional library management system reimagined for the cloud. Many of the changes are a consequence of what becomes possible with a cloud-based solution, so their characteristic features are shared knowledge bases, with multi-tenanted applications shared by many users, as opposed to local databases and locally installed applications. The approach from the dwindling number of suppliers is to try to build as many products as possible to meet library needs, sometimes by developing these products in-house (e.g. Leganto) and sometimes by acquiring companies with products that can be brought into the supplier’s ecosystem. The acquisition model is exactly the same as that practised by both traditional and new technology companies as a way of building their reach. I’m starting to view the ‘platform’ as much more in line with the approach of a company like Google: a broad range of products aiming to secure customer loyalty to their ecosystem rather than that of a competitor. So it may not be so surprising that technology innovation, which to my mind is largely driven by vendors innovating to deliver what they see as library needs and shaped by what they see as an opportunity, isn’t delivering the sort of platform the term suggests. As Ken notes, Jisc’s LMS Change work discussed back in 2012 the sort of loosely-coupled, component-based approach that would give libraries the ability to integrate different elements for the best fit to their needs from a range of options. But in my view the options have very much narrowed since 2012/13.
The BiblioCommons report I find particularly interesting as it includes an assessment of how the format silos between print and electronic lead to a poor experience for users – in this case how ebook access simply doesn’t integrate into OPACs, with separate applications such as OverDrive being used alongside the catalogue; Buckinghamshire’s library service, with its separate ebooks platform and library catalogue, is typical. Few if any public libraries will have invested in the class of discovery systems now common in Higher Education (and essentially being proposed in this report), but even with discovery systems the integration of ebooks isn’t as seamless as we’d want, with users ending up in a variety of different platforms, each with its own interface and restrictions on what can be done with the ebook. In some ways, though, the public library ebook offer, which does provide some integration with the consumer ebook world of Kindle ebooks, is better than the HE world of ebooks, even if the integration through discovery platforms in HE is better. What did intrigue me about the proposal in the BiblioCommons report is the plan to build some form of middleware system using ‘shared data standards and APIs’, and that leads to wondering whether this can be part of the impetus for changing the way that library technology interoperates. Section 10.3 includes the proposal to ‘deliver middleware, aggregation services and an initial complement of modular applications as a foundation for the ecosystem, to provide a viable pathway from the status quo towards open alternatives’, so maybe this might start to make that sort of component-based platform and ecosystem a reality.
Discovery is the challenge that Oxford’s ‘Resource Discovery @ The University of Oxford’ report is tackling. The report, by the consultancy Athenaeum 21, looks at discovery from the perspective of a world-leading research institution with large collections of digital content, and looks at connecting not just resources but researchers, using visualisation tools for research networks and advanced search tools such as Elasticsearch. The recommendations include activities described as ‘Mapping the landscape of things’, ‘Mapping the landscape of people’, and ‘Supporting researchers’ established practices’. In some ways the problems being described echo the challenge faced in public libraries of finding better ways to connect users with content, but on a different scale, and the report takes in other cultural sector institutions such as museums.
I also noticed a presentation from Keith Webster of Carnegie Mellon University, ‘Leading the library of the future: W(h)ither technical services?’. The slidedeck gives a great summary of where academic libraries are now and the challenges they face with open access, pressure on library budgets and changes in scholarly practice. In a wide-ranging presentation it covers the changes that led to the demise of chains like Borders and Tower Records, and sets the library in the context of changing models of media consumption. Of particular interest to me were the later slides about areas for development that, like the other reports, had improving discovery as part of the challenge. The slides clearly articulate the need for innovation as an essential element of work in libraries (measured, for example, as a percentage of time spent compared with routine activities) and also the value of metrics around impact, something of particular interest in our current library data project.
Four different reports, across different types of libraries and cultural institutions, but all of which seem to me to be grappling with one issue: how do libraries reinvent themselves to maintain a role in the lives of their users when their traditional role is being eroded or when other challengers are out-competing them – whether through improving discovery, or by changing to stay relevant, or by doing something different that will be valued by users.
We’re in the early stages of our work with library data and I thought I’d write up some reflections on what we’ve found so far. To date we’ve mostly confined ourselves to trying to understand the library data we have and suitable methods to access and manipulate it. We’re interested in aggregations of data, e.g. by week, by month, by resource, in comparison with total student numbers and so on.
One of our main sources of data is EZproxy, which we use for both on- and off-campus access to online library resources. Around 85-90% of our authenticated resource access goes through this system. One of the first things we learnt when we started investigating this data source is that there are two levels of logfile – the full log of all resource requests and the SPU (Starting Point URL) logfile. The latter tracks only the first request to a domain in a session.
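As an illustration, here’s a minimal sketch of pulling user, timestamp and resource domain out of an SPU logfile. It assumes an NCSA-style log layout – EZproxy’s log format is set by a LogFormat directive, so field positions will vary – and the function name is just for illustration:

```python
# Minimal sketch: extract (user, timestamp, resource domain) from an SPU
# logfile, assuming an NCSA-style line such as:
#   203.0.113.5 - jsmith1 [12/Jan/2016:10:04:31 +0000] "GET http://example.com/ HTTP/1.1" 200 1234
# Adjust the regex to match your own LogFormat configuration.
import re
from urllib.parse import urlparse

SPU_LINE = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<url>\S+)'
)

def parse_spu_log(path):
    with open(path) as logfile:
        for line in logfile:
            match = SPU_LINE.match(line)
            if match:
                # The domain is usually enough to identify the platform.
                domain = urlparse(match.group("url")).netloc
                yield match.group("user"), match.group("ts"), domain
```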
We looked at approaches that others had taken to help shape how we approached analysing the data. Wollongong, for example, decided to analyse the timestamps as follows (there’s a code sketch of this logic after the quote):
- The day is divided into 144 10-minute sessions
- If a student has an entry in the log during a 10-minute period, then 1/6 is added to the sum of that student’s access for that session (or week, in the case of the Marketing Cube).
- Any further log entries during that student’s 10-minute period are not counted.
Using this logic, UWL measures how long students spent using its electronic resources with a reasonable degree of accuracy due to small time periods (10 minutes) being measured.
Discovering the Impact of Library Use and Student Performance, Cox and Jantti 2012 http://er.educause.edu/articles/2012/7/discovering-the-impact-of-library-use-and-student-performance
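As a rough illustration of that logic, here’s a minimal sketch in Python. It assumes the full log has already been parsed into (student ID, datetime) pairs; the function and variable names are mine, not Wollongong’s:

```python
# Sketch of the Wollongong-style measure: the day divides into 144
# ten-minute slots; a student's first log entry in a slot adds 1/6 of an
# hour to their total, and further entries in the same slot are ignored.
def estimated_hours(log_entries):
    """log_entries: iterable of (student_id, datetime) pairs."""
    seen_slots = set()
    hours_per_student = {}
    for student, ts in log_entries:
        # Identify the ten-minute slot this entry falls into.
        slot = (student, ts.date(), ts.hour, ts.minute // 10)
        if slot not in seen_slots:
            seen_slots.add(slot)
            hours_per_student[student] = hours_per_student.get(student, 0) + 1 / 6
    return hours_per_student
```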
To adopt this approach we’d need to be looking at the full log files to pick up each of the 10-minute sessions. Unfortunately, owing to the size of our full logs, we found this wasn’t going to be feasible; we’d have to use the SPU version and take a different approach.
A small proportion of our resource authentication goes through OpenAthens. Each month we get a logfile of resource accesses that have been authenticated via this route. Unlike the EZproxy data there is no date/timestamp; all we know is that those resources were accessed at some point during the month. Against each resource/user combination you get a count of the number of times that combination occurred during the month.
Looking into the data, one of the interesting things we’ve been able to identify is that OpenAthens authentication also gets used for things beyond library resources – for example we’re using it for some library tools such as RefWorks and Library Search – but it’s straightforward to take those out if they aren’t wanted in your analysis.
So one of the things we’ve been looking at is how easy it is to add the OpenAthens and EZproxy data together. There are similarities between the datasets, but some processing is needed to join them up. The EZproxy data can be aggregated at a monthly level, and there are a few resources that we have access to via both routes, so those resource names need to be normalised.
The biggest difference between the two datasets is that whereas you get a logfile entry for each SPU access in the EZproxy dataset, you get a total per month for each user/resource combination in the OpenAthens data. One approach we’ve tried is simply to duplicate the rows: where the count says a resource/user combination appeared twice in the month, we just copy the line. That way the two sets of data are comparable and can be analysed together – so if you wanted to do a headcount of users who’d accessed one or more library resources in a month, you could include data from both EZproxy and OpenAthens authenticated resources, as in the sketch below.
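Here’s a minimal sketch of that row-duplication approach using pandas. The column names (‘user’, ‘resource’, ‘month’, ‘count’) are illustrative rather than the actual schema of either dataset, and it assumes resource names have already been normalised:

```python
# Sketch: align monthly OpenAthens counts with per-access EZproxy rows by
# repeating each OpenAthens row 'count' times, so that one row means one
# access in both datasets.
import pandas as pd

def combine_usage(ezproxy_df, athens_df):
    ez = ezproxy_df[["user", "resource", "month"]]
    # Repeat each OpenAthens row by its monthly count.
    ath = athens_df.loc[athens_df.index.repeat(athens_df["count"]),
                        ["user", "resource", "month"]]
    return pd.concat([ez, ath], ignore_index=True)
```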
Numbers and counts
One thing we’ve found is that users of the data want several different counts of users and usage drawn from the library e-resources data. The sorts of questions we’ve had to think about so far include:
- What percentage of students have accessed a library resource in 2014-15? – (count of students who’ve accessed 1 or more library resources)
- What percentage of students have accessed library resources for modules starting in 2014? – a different question to the first one as students can be studying more than one module at a time
- How much use of library resources is made by the different Faculties?
- How many resources have students accessed – what’s the average per student, per module, per level?
Those have raised a few interesting questions, including which student number you take when calculating means – the number at the start, at the end, or part-way through? A sketch of a couple of these calculations follows.
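To make a couple of those questions concrete, here’s a minimal sketch against the combined usage data, assuming a hypothetical enrolment table with one row per student (‘student_id’, ‘faculty’); none of these names reflect our actual data model:

```python
# Sketch: headcount-style measures over the combined usage DataFrame.
import pandas as pd

def pct_students_accessing(usage_df, enrolment_df):
    # Students with one or more resource accesses, as a percentage of
    # all enrolled students.
    active = usage_df["user"].nunique()
    return 100 * active / enrolment_df["student_id"].nunique()

def usage_by_faculty(usage_df, enrolment_df):
    # Total accesses per faculty.
    merged = usage_df.merge(enrolment_df, left_on="user",
                            right_on="student_id")
    return merged.groupby("faculty").size()
```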
In the New Year we’ve more investigation and more data to tackle and should be able to start to join library data up with data that lets us explore correlations between library use, retention and student success.
I’m not sure how many people will be familiar with the work of Oliver Postgate, and specifically his stop-motion animation series, The Clangers. One of the characters in the series is Major Clanger, and he’s an inventor.
The character always comes to mind for me when thinking about development approaches, as an example of a typical approach to user engagement. The scene opens with a problem presenting itself. Major Clanger sees the problem and thinks he has an idea to solve it, so he disappears off and shuts himself away in his cave. Cue lots of banging and noises as Major Clanger is busy inventing a device to solve ‘the problem’. Then comes the great unveiling of the invention, often accompanied by some bemusement from the other Clangers about what the new invention actually is, how it works and what it is supposed to do. Often the invention turns out not to be quite what was wanted, or has unforeseen consequences. And that approach seems to me to characterise how we often ‘do’ development. We see a problem, we may ask users in a focus group or workshop to define their requirements, but then all too often we go and, like Major Clanger, build the product in complete isolation and then unveil it to users in what we describe as usability testing. And all too often they say ‘yeah, that’s not quite what we had in mind’ or ‘well, that would have been good when we were doing X but now we want something else’.
So how do we break that circle and solve our users’ problems with a better development style, one that builds products users can and will use? That’s where I think a more co-operative model of user engagement comes in. It starts with a different model of engagement, where users are involved throughout the requirements, development and testing stages. That’s an approach we’ve started to call ‘co-design’, and have piloted during our discovery research.
It starts with a Student Panel of students who agree to work with us on activities to improve library services. We recruit cohorts of a few dozen students with a commitment to carry out several activities with us during a defined period. We outline the activity we are going to undertake and the approach we will take, and make sure we have the necessary research/ethical approvals for the work.
For the discovery research we went through three stages:
- Requirements gathering – in this case testing a range of library search tools with a series of exercises based on typical user search activities. This helped to identify the typical features users wanted to see, or did not want to see. For example, at this stage, we were able to rule out using the ‘bento box’ results approach that has been popular at some other libraries.
- Feature definition – a stage that allows you to investigate some specific features in detail – in our case we used wireframes of search box options and layouts and tested them with a number of Student Panel members – ruling out tabbed search approaches and directing us much more towards a very simple search box without tabs or drop-downs. This stage lets you test a range of different features without the expense of code development, essentially letting you refine your requirements in more detail.
- Development cycles – this step took the form of a sequence of build and test cycles, creating a search interface from scratch using the requirements identified in stages one and two, and then refining it, testing specific new features and discarding or retaining them depending on user reactions. This involved working with a developer to build the site and then work through a series of development and test ‘sprints’, testing features identified either in the early research or arising from each of the cycles.
These steps took us to a viable search interface and built up a pool of evidence that we used to set up and customise Primo Library Search. That work led to further engagement with users as we went through a fourth stage of usability testing the interface and making further tweaks and adjustments in the light of user reactions. Importantly it’s an ongoing process, with a regular cycle of testing with users to continually improve the search tool. The latest testing is mainly around changes to introduce new corporate branding, but includes other updates that can be made to the setup or the CSS of the site in advance of the new branding being applied.
The ‘co-design’ model also fits with a more evolutionary or incremental approach to website development, a model that usability experts such as the Nielsen Norman Group often recommend, since users generally prefer a familiar design to a radical redesign. Continuous improvement systems typically favour incremental improvements too. Yet the ‘co-design’ model could equally be deployed for a complete site redesign: starting from scratch with a more radical design and structural changes, and then using the incremental approach to refine them into a design that meets user needs and overcomes the likely resistance from users familiar with the old site, by delivering an improved user experience that users can quickly get comfortable with.
I was intrigued to see a couple of pieces of evidence that the number of words used in scholarly searches is showing a steady increase. Firstly, Anurag Acharya from Google Scholar, in a presentation at ALPSP back in September entitled “What Happens When Your Library is Worldwide & All Articles Are Easy to Find” (on YouTube), mentioned an increase in the average query length to 4-5 words, and continuing to grow. He reported that they were seeing multiple concepts and ideas in search queries, and that unlike general Google searches, Google Scholar searches are mostly unique queries.
So I was really interested to see the publication of a set of search data from Swinburne University of Technology in Australia up on Tableau Public. https://public.tableau.com/profile/justin.kelly#!/vizhome/SwinburneLibrary-Homepagesearchanalysis/SwinburneLibrary-Homepagesearchanalysis The data covers search terms entered into their library website homepage search box at http://www.swinburne.edu.au/library/ which pushes searches to Primo, which is the same approach that we’ve taken. Included amongst the searches and search volumes was a chart showing the number of words per search growing steadily from between 3 and 4 in 2007 to over 5 in 2015, exactly the same sort of growth being seen by Google Scholar.
Across that time period we’ve seen the rise of discovery systems and new relevancy ranking algorithms. Maybe there is now an increasing expectation that systems can cope with more complex queries, or is it that users have learnt that systems need a more precise query? I know from feedback from our own users that they dislike the huge number of results that modern discovery systems can give them, the product of much larger underlying knowledge bases and perhaps also of more ‘sophisticated’ querying techniques. Maybe the increased number of search terms is a user reaction – an attempt to get a more refined, or simply smaller, set of results.
It’s also interesting to think that with discovery systems libraries have been trying to move towards ‘Google’-like search – single, simple search boxes, with relevancy ranking that surfaces the potentially most useful results at the top – because this is what users were telling us they wanted. But Google has noticed that users don’t like getting millions of results, so they increasingly seem to hide the ‘long tail’ of results. So libraries and discovery systems might be one step behind again?
So it’s an area for us to look at: do our own search queries show a similar pattern, either in the searches that go through the search box on the homepage of the library website, or in the searches that go into our discovery system? We’ve just got access to Primo Analytics, using Oracle Business Intelligence, and one of the reports covers popular searches back to the start of 2015. Looking at some of that data, excluding searches that seem to be ISBN searches or single-letter searches, and then restricting it to queries that have been seen more than fifty times (which may well introduce its own bias), gives the following pattern of words in search queries:
Just under 31,000 searches, with one-word searches the most common and then a relatively straightforward decline the longer the search query, but with one spike around 8 words and an overall average of 2.4 words per query. That’s a lot lower than the examples from Swinburne or Google Scholar. Is it because it is a smaller or incomplete set, or because it concentrates on the queries seen more than 50 times? Are less frequently seen queries likely to be longer by definition? Some areas to investigate further.
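For anyone wanting to try something similar, here’s a minimal sketch of the word-count calculation. It assumes the popular-searches report has been exported to a CSV with a ‘query’ column (a hypothetical name), and the ISBN filter is only a rough approximation of the exclusions described above:

```python
# Sketch: distribution of words per query from an exported searches
# report, excluding ISBN-like and single-character queries.
import pandas as pd

def word_length_distribution(csv_path):
    searches = pd.read_csv(csv_path)
    queries = searches["query"].astype(str).str.strip()
    # Drop queries that look like ISBNs (digits, hyphens, X) or that are
    # a single character.
    keep = ~queries.str.fullmatch(r"[0-9Xx\-]{10,17}") & (queries.str.len() > 1)
    words = queries[keep].str.split().str.len()
    print("mean words per query:", round(words.mean(), 1))
    return words.value_counts().sort_index()
```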
One of the pieces of work we’re just starting off in the team this year is to do some in-depth work on library data. In the past we’ve looked at activity data and how it can be used for personalised services (e.g. to build recommendations in the RISE project or more recently to support the OpenTree system), but in the last year we’ve been turning our attention to what the data can start to tell us about library use.
There have been a couple of activities that we’ve undertaken so far. We’ve provided some data to an institutional learning analytics project on the breakdown of library use of online resources for a dozen or so target modules. We’ve been able to take data from the EZproxy logfiles and show the breakdown by student ID, by week and by resource over the nine-month life of the different modules. That has put library data alongside other data, such as use of the Virtual Learning Environment, and allowed module teams to look at how library use might relate to the other data.
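The aggregation itself is straightforward once the logs are parsed. A minimal sketch, again with illustrative column names rather than our actual schema, might look like this:

```python
# Sketch: a students-by-week table of EZproxy accesses for one module,
# assuming parsed log rows in a DataFrame with 'student_id', 'week' and
# 'resource' columns, plus a list of the module's student IDs.
import pandas as pd

def weekly_breakdown(usage_df, module_students):
    module_use = usage_df[usage_df["student_id"].isin(module_students)]
    return module_use.pivot_table(index="student_id", columns="week",
                                  values="resource", aggfunc="count",
                                  fill_value=0)
```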
A colleague has also been able to make use of some data combining library use and satisfaction survey data for a small number of modules, to shed a little light on whether satisfied students were making more use of the library than unsatisfied ones (obviously not a causal relationship, but initial indications are that for some modules there does seem to be a pattern there).
Library Analytics roadmap
But these have been really early exploratory steps, so during last year we started to plan out a Library Analytics Roadmap to scope the range of work we need to do. This covers not just data analysis, but also some infrastructural developments to improve access to data and some effort to build skills in the library. It is backed up with engagement with our institutional learning analytics projects and some work to articulate a strategy around library analytics. The idea is that the roadmap activities will help us change how we approach data, so we have the necessary skills and processes to provide evidence of how library use relates to vital aspects such as student retention and achievement.
Library data project
We’re working on a definition of Library analytics as being about:
Using data about student engagement with library services and content to help institutions and students understand and improve library services to learners
Part of the roadmap activity this year is to start to carry out a more systematic investigation into library data, to match it against student achievement and retention data. The aim is to build an evidence base of case studies, based on quantitative data and some qualitative work we hope to do. Ideally we’d like to be able to follow the paths mapped out by the likes of Minnesota, Wollongong and Huddersfield in their various projects and demonstrate that there is a correlation between library use, student success and retention.
Challenges to address
We know that we’re going to need more data analysis skills, and some expertise from a statistician. We also have some challenges because of the nature of our institution. We won’t have library management system book loans, or details of visits to the library; we will mainly have to concentrate on use of online resources. So in some ways that simplifies things. But our model of study also throws up some challenges. At a traditional campus institution students study a degree over three or four years. There is a cohort of students that follows through year 1, 2, 3 and so on, and at the end of that period they take their exams and get their degree classification. So it is relatively straightforward to see retention as being about students who return in year 2 and year 3, or don’t drop out during the year, and to see success measured as their final degree classification. But with part-time distance learning – where, although students sign up to a qualification, they follow a pattern of modules, many will take longer than six years to complete, and often have one or more ‘breaks’ in study – following a cohort across modules might be difficult. So we might have to concentrate on analysis at the ‘module’ level… but then that raises another question for us. Our students could be studying more than one module at a time, so how do you easily know whether their library use relates to module A or module B? Lots of things to think about as we get into the detail.
We’ve been running Primo as our new Library Search discovery system since the end of April so it’s been ‘live’ for just over four months. Although it’s been a quieter time of year over the summer I thought it would be interesting to start to see what the analytics are saying about how Library Search is being used.
Some analytics are provided by the supplier in the form of click-through statistics, and there are some interesting figures that come out of those. The majority of searches, some 85%, are ‘basic’ searches; only about 11% of searches use advanced search. Advanced search isn’t offered against the Library Search box embedded in the home page of the library website, but is offered next to the search box on the results page and on any subsequent search. That’s probably slightly less use than I might have expected, as advanced search seemed to be mentioned fairly frequently as being used regularly with our previous search tool.
About 17% of searches lead to users refining their search using the facets. Refining with facets is something we are encouraging users to do, so that’s a figure we might want to see going up. Interestingly, only 13% navigated to the next page of search results using the forward arrow, suggesting that users overwhelmingly expect to see what they want on the first page of results. (I have a slight suspicion about this figure, as the interface presents links to pages 2-5 as well as the arrow – which goes to page 6 onwards – and I wonder if clicks on pages 2-5 are counted in the click-through figure.)
Very few searches (0.5%) led users to use the bX recommendations, despite these being in a prominent place on the page. The ‘Did you mean’ prompt was used in about 1% of searches, and the bookshelf feature, ‘add to e-shelf’, is used in about 2% of searches.
75% of traffic comes from Windows computers with 15% from Macintoshes. There’s a similar amount of traffic from tablets to what we see on our main library website, with tablet traffic running at about 6.6% but mobile traffic is a bit lower at just under 4%.
Devices using Library Search seem pretty much in line with traffic to other library websites. There’s less mobile phone use, but possibly that’s because Primo isn’t particularly well optimised for mobile devices; it’s also maybe something to test with users – whether they’re all that interested in searching library discovery systems through mobile phones.
I’m not so surprised that basic search is used much more than advanced search. It matches the expectation from the student research of a ‘Google-like’ simple search box. The data seems to suggest that users expect to find relevant results on page one and not go much further, something again to test with users: are they getting what they want? Perhaps I’m not too surprised that the ‘recommender’ suggestions are not being used, but it implies that having them at the top of the page might be taking up important space that could be used for something more useful to users. Some interesting pointers about things to follow up in research and testing with users.