You are currently browsing the category archive for the ‘User Activity Data’ category.
Analytics seems to be a major theme at conferences at the moment. I’ve been following a couple of library sector conferences this week on Twitter (Talis Insight http://go.talis.com/talis-insight-europe-2016-live #talisinsight and the 17th Distance Library Services Conference http://libguides.cmich.edu/dls2016 #dls16), and analytics featured prominently at both.
A colleague at the DLS conference tweeted a picture about the impact of a particular piece of practice, and that set us off thinking: did we have that data? Did we have examples of where we’d done something similar? The good thing now is that, rather than thinking ‘it would be good if we could do something like that’, we’ve a bit more confidence – if we get the examples and the data, we know we can do the analyses, but we also know we ‘should’ be doing the analyses as a matter of course.
It was also good to see other colleagues (@DrBartRienties) at the university presenting some of the University’s learning analytics work at Talis Insight. Being at a university that is undertaking a lot of academic work on learning analytics is really helpful when you’re trying to look at library analytics, and it also provides a valuable source of advice and guidance in some of our explorations.
[As an aside, and having spent much of my library career in public libraries, I’m not sure how much academic librarians realise the value of being able to talk to academics in universities, to hear their talks, discuss their research or get their advice. In a lot of cases you’re able to talk with world-class researchers doing ground-breaking work and shaping the world around us.]
We’re in the early stages of our work with library data and I thought I’d write up some reflections on the early stages. So far we’ve mostly confined ourselves to trying to understand the library data we have and suitable methods to access it and manipulate it. We’re interested in aggregations of data, e.g. by week, by month, by resource, in comparison with total student numbers etc.
One of our main sources of data is ezproxy, which we use for both on- and off-campus access to online library resources. Around 85-90% of our authenticated resource access goes through this system. One of the first things we learnt when we started investigating this data source is that there are two levels of logfile: the full log of all resource requests and the SPU (Starting Point URL) logfile. The latter tracks only the first request to a domain in a session.
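As an illustration (not our actual pipeline), a single EZproxy log entry can be parsed along these lines. EZproxy log formats are configurable per installation, so the NCSA-style layout, field names and sample line here are all assumptions:

```python
import re
from datetime import datetime

# Hypothetical layout: "ip - username [timestamp] \"request\" status bytes".
# A real installation's LogFormat directive may differ.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) - (?P<user>\S+) '
    r'\[(?P<ts>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+)'
)

def parse_line(line):
    """Parse one log line into a dict, or return None if it doesn't match."""
    m = LOG_PATTERN.match(line)
    if not m:
        return None
    rec = m.groupdict()
    # Convert the NCSA-style timestamp into a datetime for later aggregation
    rec["ts"] = datetime.strptime(rec["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return rec

sample = ('10.0.0.1 - abc123 [01/Dec/2015:09:15:02 +0000] '
          '"GET http://example.com/article HTTP/1.1" 200 5120')
print(parse_line(sample)["user"])  # abc123
```

Once each line is a structured record, the weekly/monthly aggregations mentioned above become straightforward grouping operations.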
We looked at approaches that others had taken to help shape how we approached analysing the data. Wollongong for example, decided to analyse the time stamp as follows:
- The day is divided into 144 10-minute sessions
- If a student has an entry in the log during a 10-minute period, then 1/6 is added to the sum of that student’s access for that session (or week, in the case of the Marketing Cube).
- Any further log entries during that student’s 10-minute period are not counted.
Using this logic, UWL measures how long students spent using its electronic resources with a reasonable degree of accuracy due to small time periods (10 minutes) being measured.
Discovering the Impact of Library Use and Student Performance, Cox and Jantti 2012 http://er.educause.edu/articles/2012/7/discovering-the-impact-of-library-use-and-student-performance
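The Wollongong bucketing logic described above can be sketched roughly like this (a simplified illustration, not their actual code):

```python
from datetime import datetime

def estimated_hours(timestamps):
    """Estimate time spent, Wollongong-style: each distinct 10-minute
    window containing at least one log entry contributes 1/6 of an hour;
    further entries in the same window are not counted."""
    buckets = set()
    for ts in timestamps:
        # Slot number 0..143 within the day (144 ten-minute sessions)
        slot = (ts.hour * 60 + ts.minute) // 10
        buckets.add((ts.date(), slot))
    return len(buckets) / 6  # hours

# Three entries, two falling in the same 10-minute window -> 2 buckets = 20 min
times = [datetime(2015, 12, 1, 9, 1),
         datetime(2015, 12, 1, 9, 7),
         datetime(2015, 12, 1, 9, 12)]
print(estimated_hours(times))
```

The key property is that repeated requests within a window are deduplicated, which is exactly why the full log (rather than the SPU log) is needed for this method.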
To adopt this approach we’d need to be looking at the full log files to pick up each of the 10-minute sessions. Unfortunately, owing to the size of our version of the full logs, that wasn’t going to be feasible; we’d have to use the SPU version and take a different approach.
A small proportion of our resource authentication goes through OpenAthens. Each month we get a logfile of resource accesses that have been authenticated using this route. Unlike ezproxy data we don’t get a date/timestamp, all we know is that those resources were accessed during the month. Against each resource/user combination you get a count of the number of times that combination occurred during the month.
Looking into the data, one of the interesting things we’ve been able to identify is that OpenAthens authentication also gets used for resources other than library resources – for example, we use it for some library tools such as RefWorks and Library Search – but it’s straightforward to take those out if they aren’t wanted in your analysis.
So one of the things we’ve been looking at is how easy it is to add the Athens and Ezproxy data together. There are similarities between the datasets but some processing is needed to join them up. The ezproxy data can be aggregated at a monthly level and there are a few resources that we have access to via both routes so those resource names need to be normalised.
The biggest difference between the two datasets is that whereas you get a logfile entry for each SPU access in the ezproxy dataset, you get a total per month for each user/resource combination in the OpenAthens data. One approach we’ve tried is simply to duplicate the rows: where the count says a resource/user combination appeared twice in the month, we copy the line. That makes the two sets of data comparable so they can be analysed together – if you wanted a headcount of users who’ve accessed one or more library resources in a month, you could include data from both ezproxy- and OpenAthens-authenticated resources.
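A minimal sketch of that row-duplication step, using illustrative field names rather than our real schema:

```python
def expand_counts(monthly_rows):
    """Expand OpenAthens-style (user, resource, count) monthly rows into
    one row per access, so they line up with per-event EZproxy SPU rows."""
    expanded = []
    for user, resource, count in monthly_rows:
        expanded.extend([(user, resource)] * count)
    return expanded

athens = [("u1", "JSTOR", 2), ("u2", "Scopus", 1)]
print(expand_counts(athens))
# [('u1', 'JSTOR'), ('u1', 'JSTOR'), ('u2', 'Scopus')]
```

After expansion the two sources share a shape, so monthly aggregations and headcounts can treat them as one dataset (remembering that the expanded OpenAthens rows carry no timestamp finer than the month).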
Numbers and counts
One thing we’ve found is that users of the data want several different counts of users and data from the library e-resources usage data. The sorts of questions we’ve had to think about so far include:
- What percentage of students have accessed a library resource in 2014-15? – (count of students who’ve accessed 1 or more library resources)
- What percentage of students have accessed library resources for modules starting in 2014? – a different question to the first one as students can be studying more than one module at a time
- How much use of library resources is made by the different Faculties?
- How many resources have students accessed – what’s the average per student, per module, per level
Those have raised a few interesting questions, including which student number you take when calculating means – the number at the start, at the end, or part-way through the year?
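The first headcount question might be sketched like this. The enrolment snapshot passed in is the denominator choice discussed above, and all names and data shapes are illustrative:

```python
def percent_active(access_rows, enrolled_students):
    """Headcount measure: % of enrolled students with >= 1 resource access.
    Note the denominator matters: an enrolment snapshot taken at the start,
    end or midpoint of the year gives different answers."""
    active = {user for user, _resource in access_rows}
    # Count only accesses from students in the chosen enrolment snapshot
    active &= set(enrolled_students)
    return 100 * len(active) / len(enrolled_students)

rows = [("u1", "JSTOR"), ("u1", "Scopus"), ("u2", "JSTOR")]
print(percent_active(rows, ["u1", "u2", "u3", "u4"]))  # 50.0
```

The per-module and per-faculty variants are the same calculation with the rows grouped by module or faculty first.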
In the New Year we’ve more investigation and more data to tackle and should be able to start to join library data up with data that lets us explore correlations between library use, retention and student success.
At the end of November I was at a different sort of conference to the ones I normally get to attend. This one, Design4learning, was held at the OU in Milton Keynes, but was a more general education conference. Described as a conference that “aims to advance the understanding and application of blended learning, design4learning and learning analytics”, Design4learning covered topics such as MOOCs, elearning, learning design and learning analytics.
There was a useful series of presentations at the conference and several of them are available from the conference website. We’d put together a poster for the conference talking about the work we’ve started to do in the library on ‘library analytics’ – entitled ‘Learning Analytics – exploring the value of library data’ – and it was good to talk to a few non-library people about the wealth of data that libraries capture and how that can contribute to the institutional picture of learning analytics.
Our poster covered some of the exploration that we’ve been doing, mainly with online resource usage from our EZProxy logfiles. In some cases we’ve been able to join that data with demographic and other data from surveys to start to look in a very small way at patterns of online library use.
The poster also highlighted the range of data that libraries capture and the sorts of questions that could be asked and potentially answered. It also flagged up the leading-edge work by projects such as Huddersfield’s Library Impact Data Project and the work of the Jisc Lamp project.
An interesting conference and an opportunity to talk with a different group of people about the potential of library data.
To Birmingham at the start of last week for the latest Jisc Library Analytics and Metrics Project (http://jisclamp.mimas.ac.uk/) Community Advisory and Planning group meeting. This was a chance to catch up on both the latest progress and the latest thinking about how this library analytics and metrics work will develop.
At a time when learning analytics is a hot topic it’s highly relevant for libraries to consider how they might respond to its challenges. [The 2014 Horizon report places learning analytics in the ‘one year or less to adoption’ category and describes it as ‘data analysis to inform decisions made on every tier of the education system, leveraging student data to deliver personalized learning, enable adaptive pedagogies and practices, and identify learning issues in time for them to be solved.’]
LAMP is looking at library usage data of the sort that libraries collect routinely (loans, gate counts, eresource usage) and combining it with course, demographic and achievement data, to allow libraries to start to analyse and identify trends and themes in the data.
LAMP will build a tool to store and analyse data and is already working with some pilot institutions to design and fine-tune the tool. We got to see some of the work so far and input into some of the wireframes and concepts, as well as hear about some of the plans for the next few months.
The day was also the chance to hear from the developers of a reference management tool called RefMe (www.refme.com). This referencing tool is aimed at students, who often struggle with the typically complex requirements of referencing styles and tools. To hear about one-click referencing, with thousands of styles and with features to integrate with MS Word, or to scan a barcode and reference a book, was really good. RefMe is available as an iOS or Android app and as a desktop version. As someone who’s spent a fair amount of time wrestling with the complexities of referencing in projects that have tried to get simple referencing tools in front of students, it is really good to see a start-up tackling this area.
It was Lorcan Dempsey who, I believe, coined the term ‘full library discovery’ in a blog post last year. As a stage beyond ‘full collection discovery’, ‘full library discovery’ adds in results drawn from LibGuides or library websites, alongside resource material from collections. So, for example, a search for psychology might include psychology resources, as well as help materials for those psychology resources and contact details for the subject librarian who covers psychology. Stanford and Michigan are two examples of that approach, combining lists of relevant resources with website results.
Princeton’s new All search feature offers a similar approach, discussed in detail on their FAQ. This combines results from their Books+, Articles+, Databases, Library Website and Library Guides into a ‘bento box’ style results display. Princeton’s approach is similar to the search from North Carolina State University who I think were about the first to come up with this style.
Although in most of these cases I suspect that the underlying systems are quite different the approach is very similar. It has the advantage of being a ‘loosely-coupled’ approach where your search results page is drawn together in a ‘federated’ search method by pushing your search terms to several different systems, making use of APIs and then displaying the results in a dashboard-style layout. It has the advantage that changes to any of the underlying systems can be accommodated relatively easily, yet the display to your users stays consistent.
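The fan-out pattern described above can be sketched as follows; the backends here are stand-in stub functions rather than real search APIs, and all names are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in backends: in practice each would call a separate system's
# search API (catalogue, discovery index, guides platform, website).
def search_catalogue(q):
    return [f"Book about {q}"]

def search_articles(q):
    return [f"Article about {q}"]

def search_guides(q):
    return [f"Subject guide: {q}"]

BACKENDS = {
    "Books+": search_catalogue,
    "Articles+": search_articles,
    "Guides": search_guides,
}

def bento_search(query):
    """Push the query to all backends in parallel and keep each result set
    in its own 'bento box' compartment; each compartment keeps the ranking
    its own backend produced."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, query) for name, fn in BACKENDS.items()}
        return {name: f.result() for name, f in futures.items()}

results = bento_search("psychology")
print(results["Guides"])  # ['Subject guide: psychology']
```

The loose coupling shows up in `BACKENDS`: swapping one underlying system only means replacing one entry, while the display layer stays the same.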
For me the disadvantages of this are the lack of any overriding relevancy ranking across the material, and that it perpetuates the ‘siloing’ of content to an extent (Books, Articles, Databases etc), driven largely by the underlying silos of systems that we rely on to manage that content. I’ve never been entirely convinced that users understand the distinction of what a ‘database’ might be. But the approach is probably as good as we can get until we have truly unified resource management and more control over relevancy ranking.
Going beyond ‘full library discovery’
But ‘full library discovery’ is still very much a ‘passive’ search tool – by which I mean that it isn’t personalised or ‘active’. At some stage, to use those resources, a student will log in to the system, and that opens up an important question for me. Once you know who the user is, how far should you go to provide a personalised search experience? You know who they are, so you could provide recommendations based on what other students studying their course have looked at (or borrowed); you might even stray into ‘learning analytics’ territory and know which resources the highest-achieving students looked at.
You might know what resources are on the reading list for the course that student is studying – so do you search those resources first and offer those up as they might be most relevant? You might even know what stage a student has got to in their studies and know what assignment they have to do, and what resources they need to be looking at. Do you ‘push’ those to a student?
How far do you go in assembling a profile of what might be ‘recommended’ for a course, module or assignment, what other students on the cohort might be looking at, or looked at the last time the course ran? Do you look at students’ previous search behaviour? How much of this might you do to build, and then search, some form of ‘knowledge base’ with the aim of surfacing the material likely to be of most relevance to a student? A search for psychology in NCSU’s Search All box gives you the top three articles out of 2,543,911 articles in Summon, and likely behaviour is not to look much beyond the first page of results. So should we be making sure that those are likely to be the most relevant ones?
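A very rough sketch of the cohort-based recommendation idea mentioned above – the event structure, module codes and names are all hypothetical:

```python
from collections import Counter

def cohort_recommendations(accesses, module, user, n=3):
    """Recommend the resources most viewed by other students on the same
    module that this user hasn't seen yet. `accesses` is a list of
    (user, module, resource) events."""
    seen = {r for u, m, r in accesses if u == user}
    counts = Counter(r for u, m, r in accesses
                     if m == module and u != user and r not in seen)
    return [resource for resource, _ in counts.most_common(n)]

log = [("u1", "DD102", "JSTOR"), ("u2", "DD102", "JSTOR"),
       ("u2", "DD102", "PsycINFO"), ("u3", "DD102", "PsycINFO"),
       ("u3", "DD102", "JSTOR")]
print(cohort_recommendations(log, "DD102", "u1"))  # ['PsycINFO']
```

Note this is exactly the kind of rule that produces the circularity worried about below: whatever the cohort already reads gets recommended, and hence read more.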
But, then there’s serendipity: finding the different things that you haven’t looked for before, or read before, because they are new or different. One of the issues with recommendations is the tendency for them to be circular – ‘What gets recommended gets read’, to corrupt the performance indicator mantra. So how far do you go? ‘Mind reading search’ anyone?
I was intrigued to read David Weinberger’s blog post ‘Protecting library privacy with a hard opt-in’, in which he suggests there is a case to be made for asking users to explicitly opt in to publishing details of their checkouts (loans) before you can use that activity data. I must admit that I’d completely missed the connection between David Weinberger, author of ‘Everything is miscellaneous’, and his role with the Harvard Innovation Lab, although I’m sure I’ve blogged about both in the past.
The concern that has been raised is about re-identification, where supposedly ‘anonymous’ datasets can be combined with other data to identify individuals. There’s a good description of the issue in this paper from 2008 from Michael Hay and others from the University of Massachusetts http://scholarworks.umass.edu/cgi/viewcontent.cgi?article=1176&context=cs_faculty_pubs
Obviously an issue of this type is of critical significance when you might be talking about medical trials data for example, but library data might also be personal or sensitive. Aside from the personal aspects you could also imagine that a researcher carrying out a literature search for material for a potential new research area would not want ‘competitors’ to know that they were looking at a particular area, particularly now cross-domain research activities are more common.
The issue of anonymity, and potentially being able to identify an individual from their activity data, is an area that has been explored through a number of projects, such as Jisc’s Activity Data programme and its synthesis project outputs at http://www.activitydata.org, particularly in the section on data protection. Most of the approaches tackled anonymization in two ways: by replacing user IDs with a generated ID (described, interestingly, by Hay as ‘naive anonymization’) and by removing data from the dataset where only small numbers of users were included (such as a course with only a few students enrolled).
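Those two anonymization steps might be sketched like this. This is an illustration of the general technique, not any project’s actual code; the field names, threshold and salt are all assumptions:

```python
import hashlib

def anonymize(rows, group_sizes, k=5, salt="replace-me"):
    """'Naive anonymization': replace user IDs with a salted hash and drop
    rows from groups (e.g. courses) with fewer than k members, since tiny
    groups can identify individuals on their own."""
    out = []
    for user, course in rows:
        if group_sizes[course] < k:
            continue  # suppress small groups
        pseudonym = hashlib.sha256((salt + user).encode()).hexdigest()[:12]
        out.append((pseudonym, course))
    return out

rows = [("alice", "BIG101"), ("bob", "TINY900")]
sizes = {"BIG101": 120, "TINY900": 3}
cleaned = anonymize(rows, sizes)
print(cleaned[0][1])  # BIG101
```

As the re-identification literature points out, this alone is weak protection: consistent pseudonyms still leave each user’s pattern of activity intact, which is exactly the ‘digital fingerprint’ that linkage attacks exploit.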
Re-identification techniques seem to work by finding unique patterns of use – ‘digital fingerprints’ – that can be used to identify individuals. When you combine an anonymized dataset with other material you can start to pick individuals out. It certainly seems to be something that needs to be thought about carefully when contemplating releasing datasets.
Is the suggested solution, of asking for explicit permission the right approach? If you are planning to release data openly, I’d probably agree. If you plan to use it only within your systems to generate recommendations, then yes it’s probably good practice. I worry slightly about the value of the activity data if there is a low opt-in level. That may significantly diminish its value and usefulness.
I’m not too convinced, though, by the approach that says users must agree to a public page listing their activity. That seems to me likely to put off people who would be happy for their data to be used, unattributed, in recommendations, but who don’t want it published. When we’ve asked students about their views on what data we should be able to use, they were quite happy for activity data to be used. My view would be that it’s fine to show individuals what they themselves have used (and we do that), but that it’s not something to share.
To Birmingham today for the second meeting of the Jisc LAMP (library analytics and metrics project) community advisory and planning group. This is a short Jisc-managed project that is working to build a prototype dashboard tool that should allow benchmarking and statistical significance tests on a range of library analytics data.
The LAMP project blog at http://jisclamp.mimas.ac.uk is a good place to start to get up to speed with the work that LAMP is doing and I’m sure that there will be an update on the blog soon to cover some of the things that we discussed during the day.
One of the things that I always find useful about these types of activity, beyond the specific discussions and knowledge sharing about the project and the opportunity to talk to other people working in the sector, is that there is invariably some tool or technique that gets used in the project or programme meetings that you can take away and use more widely. I think I’ve blogged before about the Harvard Elevator pitch from a previous Jisc programme meeting.
This time we were taken through an approach of carrying out a review of the project a couple of years hence, where you had to imagine that the project had failed totally. It hadn’t delivered anything that was useful, so no product, tool or learning came out of the project. It was a complete failure.
We were then asked to try to think about reasons why the project had failed to deliver. So we spent half an hour or so individually writing reasons onto post-it notes. At the end of that time we went round the room reading out the ideas and matching them with similar post-it notes, with Ben and Andy sticking them to a wall and arranging them in groups based on similarity.
It quickly shifted away from going round formally to more of a collective sharing of ideas but that was good and the technique really seemed to be pretty effective at capturing challenges. So we had challenges grouped around technology and data, political and community aspects, and legal aspects for example.
We then spent a bit of time reviewing and recategorising the post-it notes into categories that people were reasonably happy with. Then came the challenge of going through each of the groups of ideas and working out what, if anything, the project could or should do to minimise the risk of that possible outcome happening. That was a really interesting exercise to identify some actions that could be done in the project such as engagement to encourage more take up.
A really interesting demonstration of quite a powerful technique that’s going to be pretty useful for many project settings. It seemed to be a really good way of trying to think about potential hurdles for a project and went beyond what you might normally try to do when thinking about risks, issues and engagement.
It’s interesting to me how so many of the good project management techniques work on the basis of working backwards. Whether that is writing tasks for a One Page Project Plan by describing the task as if it has been completed, e.g. ‘Site launch completed’, or working backwards from an end state to plan out the steps and the timescale you will have to go through – both envisage what a successful project looks like, while the pre-mortem thinks about what might go wrong. Useful technique.
One of the bits of work that we’re doing at the moment is to talk to students about their thoughts about personalised library services. The aim of the work is to help us to understand what students might want (or not want) and to then use that information to build some tools that we can test with them. In part it is being driven by a realisation that library websites and systems are competing against expectations that are shaped by sites such as Google and Amazon. Traditional library websites such as OPACs seem to be a world away from a modern web experience (see Aaron Schmidt’s blogpost on Library Journal for example).
One of the interesting things coming out of the work for me is around attitudes and expectations for the personal data that is collected as part of user engagement with our systems. I’d expected that students would be quite guarded about what they would expect a library system to know about them because, generally speaking, libraries rarely seem to use data to provide much in the way of a personalised service. But expectations seem to be that once a student has logged in, library systems should know their name and the course they are studying, at least – and perhaps also what previous courses they’d studied, or their contact preferences. That is really interesting to know, as it’s difficult to think of many examples (any? other than some experimental work) of library systems that track what courses a student is studying and actively use that data to provide a tailored service.
When we asked some specific questions about whether students would object to us using certain data to tailor services, over 90% of respondents didn’t object to us using their course or previous material they’ve accessed as a means of providing personalised services. More than 80% had no objection to using their previous courses or search terms for personalisation. I think I would have expected a larger number of respondents who objected to the use of their data.
What I think is that there’s a trade-off between privacy and service (highlighted in this article by Li and Unger, ‘Willing to pay for quality personalization?‘ (link is to abstract), European Journal of Information Systems (2012) 21, 621–642, doi:10.1057/ejis.2012.13). A conscious calculation is being made: can you, the user, see that you’re getting a direct benefit from allowing the system to know something about you? As a user you make that calculation and judge whether it seems reasonable: ‘Does the benefit outweigh the loss of privacy?’ It strikes me that users might be ascribing a ‘value’ to their data that they’d ‘trade’ for a benefit. That makes me wonder whether the likes of Google (which essentially build their business model, in part at least, on the value they can leverage from user data) have had the effect of making users realise that their data has a value they can swap for a service.
Encouraged by some thinking about what sort of prototype resource usage tools we want to build to test with users in a forthcoming ‘New tools’ section I’ve been starting to think about what sort of features you could offer to library users to let them take advantage of library data.
For a few months we’ve been offering users of our mobile search interface (which just does a search of our EBSCO discovery system) a list of their recently viewed items and their recent searches. The idea behind testing it on a mobile device was that giving people a link to their recent searches or items viewed would make it easier for people to get back to things that they had accessed on their mobile device by just clicking single links rather than having to bookmark them or type in fiddly links. At the moment the tool just lists the resources and searches you’ve done through the mobile interface.
But our next step is to make a similar tool available through our main library website as a prototype of the ‘articles I’ve viewed’. And that’s where we start to wonder about whether the mobile version of the searches/results should be kept separate from the rest of your activities, or whether user expectations would be that, like a Kindle ebook that you can sync across multiple devices, your searches and activity should be consistent across all platforms?
At the moment our desktop version has all your viewed articles, regardless of the platform you used. But users might want to know in future which device they used to access the material, perhaps because some material isn’t easily accessible through a mobile device. That opens up another question: the mobile version and the desktop version may be different URLs, so you might want them pulled together as one resource with automatic detection of your device when you go to access it.
With the data about what resources are being accessed and what library web pages are being accessed it starts to open up the possibility of some more user-centred use of library activity and analytics data.
So you could conceive of being able to match that there is a spike of users accessing the Athens problems FAQ page and be able to tie that to users trying to access Athens-authenticated resources. Being able to match activity with students being on a particular module could allow you to push automatically some more targeted help material, maybe into the VLE website for relevant modules, as well as flag up an indication of a potential issue to the technical and helpdesk teams.
You could also contemplate mining reading lists and course schedules to predict when there are particular activities that are scheduled and automatically schedule pushing relevant help and support or online tutorials to students. Some of the most interesting areas seem to me to be around building skills and using activity (or lack of activity) to trigger promotion of targeted skills building activities. So knowing that students on module X should be doing an activity that involves looking at this set of resources, and being able to detect the students that haven’t accessed those resources, offering them some specific help material, or even contact from a librarian. Realistically those sorts of interventions simply couldn’t be managed manually and would have to rely on some form of learning analytics-type trigger system.
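A trigger of the kind described could be as simple as a set comparison. This is a sketch under assumed data shapes (expected resources from a reading list, an access log, a module cohort), not a real system:

```python
def students_to_nudge(expected_resources, access_log, cohort):
    """Flag students on a module who haven't touched any of the resources
    an activity expects them to use, so targeted help (or librarian
    contact) can be offered automatically."""
    flagged = []
    for student in cohort:
        used = {r for u, r in access_log if u == student}
        if not used & set(expected_resources):
            flagged.append(student)
    return flagged

log = [("u1", "PsycINFO"), ("u2", "JSTOR")]
print(students_to_nudge({"PsycINFO", "Scopus"}, log, ["u1", "u2", "u3"]))
# ['u2', 'u3']
```

Run against the full cohort on a schedule, this is the sort of rule that makes interventions feasible at a scale that manual checking never could.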
One of the areas that would be useful to look at would be some form of student dashboard for library engagement. This could give students some data about what engagement they have had with the library, e.g. resources accessed, library skills completed, library badges gained, library visits, books/ebooks borrowed etc. Maybe set against averages for their course, and perhaps with some metrics about what high-achieving students on their course last time did. Add to that a bookmarking feature, lists of recent searches and resources used, with lists of loans/holds. Finished off with useful library contacts and some suggested activities that might help them with their course, based on what is known about the level of library skills needed in the course.
Before you can do some of the more sophisticated learning analytics-type activities, I suspect it would be necessary to have a better understanding of the impact that library activities/skills/resources have on student retention and achievement. And that seems to me to argue for some really detailed work to understand library impact at a ‘pedagogic’ level.
I’d been thinking early this morning about writing up a blog post around some thoughts about ‘Library Analytics’, and thinking that it was interesting how ‘Library Analytics’ had been used by Harvard for their ‘Library Analytics Toolkit’ and by others as a way of talking about web analytics, but that neither really seemed to me to be quite analogous to the way that the Learning Analytics community, such as SoLAR, views analytics. There are several definitions of Learning Analytics. This one is from Educause’s 7 things you should know about first-generation learning analytics:
Learning analytics (LA) applies the model of analytics to the specific goal of improving learning outcomes. LA collects and analyzes the “digital breadcrumbs” that students leave as they interact with various computer systems to look for correlations between those activities and learning outcomes. The type of data gathered varies by institution and by application, but in general it includes information about the frequency with which students access online materials or the results of assessments from student exercises and activities conducted online. Learning analytics tools can track far more data than an instructor can alone, and at their best, LA applications can identify factors that are unexpectedly associated with student learning and course completion.
Much of the library interest in analytics seems to me to have mainly been about using activity data to understand user behaviour and make service improvements, but I’m increasingly of the view that whilst that is important, it is only half the story. One of the areas that interests me about both learning analytics and activity data is the empowering potential of that data as a tool for the user, rather than the lecturer or librarian, to find out interesting things about their behaviour, get suggested actions or activities, and essentially make better choices. And that seems to be the key: just as reviews and ratings on sites like TripAdvisor are helping people become informed consumers, we should be building library systems that help our users to be informed library consumers.
So it was great to see the announcement of the JiscLAMP project this morning http://infteam.jiscinvolve.org/wp/2013/02/01/jisc-lamp-shedding-light-on-library-data-and-metrics/ announcing the Library Analytics and Metrics project and talking about delivering a prototype shared library analytics service for UK academic libraries. I was particularly interested to see that the plan is to develop some use-cases for the data and great that Ben Showers shared some of the vision behind the idea. It’s a great first step to put data on a solid, consistent and sustainable basis, and should build a good platform to be able to exploit that vast reservoir of library data.