You are currently browsing the monthly archive for December 2011.

Harvard Library Innovation Laboratory

The second aspect of data that caught my interest today was Harvard’s Library Innovation Laboratory. I must admit that when I saw the link to it I did wonder whether it was going to be a list of library tools aimed directly at users (I’m sure I’ve seen the name used elsewhere recently for just such a list). I know we are looking at redoing our library toolbox to update it, and library lab or labspace sounded like a good name for something like that. But the Library Innovation Laboratory is a much more interesting proof of concept for anyone with any interest in what you can do with library activity data.

Using library circulation data that has been contributed to the LibraryCloud there are some really imaginative prototype visualisations in the Stack View and Shelf Rank tools. Two values are shown instantly. The book width is determined by the number of pages in the book and the book colour corresponds to the volume of loans, so the darker the blue the greater the traffic. Titles are then shown as a stack, one on top of another. It’s a really neat visualisation of the data and I’m already wondering if that approach would work equally well with visualising library data that is entirely electronic resources. [It’s actually one of the big problems with anything to do with electronic resources: there isn’t really a universal icon or symbol that everyone recognises as relating to stuff that is online and in electronic form.]
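To get a feel for how that sort of mapping might work, here is a minimal sketch in Python. It is not the Library Innovation Laboratory's code. The function names, the pixel range, the 1000-page cap and the two blue shades are all assumptions of mine, chosen just to illustrate page count driving width and loan volume driving colour.

```python
def book_width(pages, min_px=10, max_px=60, max_pages=1000):
    """Scale a page count to a spine width in pixels (all limits are assumed values)."""
    pages = max(0, min(pages, max_pages))  # clamp to the supported range
    return round(min_px + (max_px - min_px) * pages / max_pages)


def loan_colour(loans, max_loans):
    """Map loan volume to a hex colour: the more loans, the darker the blue."""
    if max_loans <= 0:
        share = 0.0
    else:
        share = max(0, min(loans, max_loans)) / max_loans
    light = (198, 219, 239)  # pale blue for low traffic (assumed shade)
    dark = (8, 48, 107)      # dark blue for high traffic (assumed shade)
    # Linear interpolation between the two shades, channel by channel.
    r, g, b = (round(lo + (hi - lo) * share) for lo, hi in zip(light, dark))
    return f"#{r:02x}{g:02x}{b:02x}"
```

The same two functions would work for electronic resources if something like file size or download counts stood in for pages and loans, although that substitution is only a guess on my part.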

There’s quite a lot of interesting stuff in the site and also in the LibraryCloud site. One of the things that particularly interested me (from experiences with the RISE Activity Data project) was the section about data privacy and anonymisation. A key requirement for any dataset where the aspiration is open release is that it must be prepared in a way that ensures that users cannot be identified individually.

The checkout visualisation is also a neat way of showing that sort of data in a nice clear fashion. The feature that lets you sort the data by different schools is useful and slightly brings to mind one of the MOSAIC competition entries that used a graph-type visualisation that allowed you to navigate through library use data. It did amuse me though that ‘Headphones’ appears twice in the top ten with different numbers. The perils of libraries using their Library Management Systems to loan all sorts of other things!

LibraryCloud currently has data from Harvard and Northeastern Universities and Darien, San Francisco and San Jose public libraries.  A couple of sites to keep an eye on over the next few months.

Courtesy of a couple of tweets from @psychemedia and @simonjbains, two items about data and data visualisation caught my attention today on Twitter. Firstly a great post by Pete Warden, ‘What the Sumerians can teach us about data’, on his PeteSearch blog and secondly Harvard’s Library Innovation Laboratory. Both items cover particular aspects of data: one talks about the history of data, the other is a great set of examples of how to use and visualise data, in Harvard’s case library circulation data using the LibraryCloud library metadata repository.

A history of data
I found the blog post on the Sumerians to be particularly interesting. The starting point is the contention that their greatest achievement was the invention of data and there are some good examples of how the written language was used to record who owned what (or who owed what to whom). I like the comparison made between the ‘threats of supernatural retribution’ being used to protect the integrity of the data and modern warnings over video copying, both being ‘ways of forcefully expressing society’s norms, rather than a credible threat of punishment’.

I find it interesting how often early examples of writing turn out to be lists, in other words data rather than stories. Another example that comes to mind is the Vindolanda tablets. These are from the Roman period and were found during excavations at a Roman fort in Northern England.

“… for dining pair(s) of blankets … paenulae, white (?) … from an outfit: paenulae … and a laena and a (?) … for dining loose robe(s) … under-paenula(e) … vests … from Tranquillus
under-paenula(e) … [[from Tranquillus]]
from Brocchus tunics … half-belted (?) … tunics for dining (?) … (Back, 2nd hand?) … branches (?), number … a vase …
with a handle rings with stones (?) …”
Writing lists of things seems to have been a recurrent story and it strikes me that being able to list and count things must have been an early skill that would have to have been mastered by early farmers at least.  To my mind there’s no reason to suppose that early peoples would have been any less intelligent than modern day people.  And as the archaeologists are fond of pointing out ‘absence of evidence isn’t evidence of absence’ so there’s no reason to suppose that people weren’t collecting lists of data long before the Sumerians maybe?

I also thought the comments making a comparison between instructions for interpreting omens and predicting the future from data to be really interesting. A great deal is often made of the importance of ‘facts and data’ and it has long seemed to me that the critical factor isn’t the data that you have, but how you interpret it and what decisions you make. And it often seems to me that the interpretation of data and decision making is a much less scientific exercise.

Part two covering the Harvard Library Innovation Laboratory to follow in the next blog post.

A comment on one of my search blog posts by Preedip Balaji suggested TAPoR text analysis as a useful tool to help with comparing the search terms lists that I was using to look at the terms that users were using on the tabbed search tool that we had on our old website. At the time we had three tabbed searches to cover the library catalogue, website and originally a federated search tool that then migrated to a discovery search tool. We’d found that there was quite considerable overlap between the search terms that users put into the search boxes, and subsequently we’ve gone away from a tabbed approach on the new website in favour of a single discovery search box. But at the time I wondered whether there were any text analysis tools that would help with trying to provide some form of assessment about the similarity between the search terms used.

TAPoRware seems to be exactly the sort of text comparison tool that I was looking for. Developed at the University of Alberta, TAPoR (Text Analysis Portal For Research) has a range of HTML, XML and plain text tools that allow you to analyse words, find patterns and look for data within text, for example. So I’ve been playing around with the Comparator tool to compare some of the lists of 100 search terms used in the website, federated and catalogue searches.

The comparator tool lets you compare two sets of data at a time and you can upload your set of data as a text file from a local file. For some reason it wouldn’t accept an Excel file, but it will display the results as either HTML or as a tab-delimited file. The comparator tool goes through and provides some data about how many words there are and how many are unique or appear multiple times. Then it provides a list of words that are common or unique to either file.

The tool only lets you compare two files at a time. Ideally I’d have liked to compare three files, so I’ve had to run three comparisons to compare each file with the other two. It also compares the words individually, whereas most of the search terms included in my files are actually search phrases. The table below summarises the comparisons and shows what percentage of terms are common or unique to each file of search terms.

                       Common          Unique to 1     Unique to 2
                       Number   %      Number   %      Number   %
Catalogue/Federated    80       41     56       29     60       31
Catalogue/Website      77       37     75       36     57       27
Federated/Website      61       28     95       43     75       34
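For anyone wanting to try the same kind of word-level comparison without the online tool, a rough sketch in Python is below. It is not TAPoR's own method. The function names and sample search terms are made up, and I've assumed each percentage is a share of the combined set of distinct words across both files.

```python
import re


def words(text):
    """Return the set of distinct lower-cased words in a block of text."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def compare(text1, text2):
    """Count words common to both texts and unique to each, with rounded percentages.

    Assumption: percentages are taken over all distinct words in the two texts.
    """
    w1, w2 = words(text1), words(text2)
    groups = {
        "common": w1 & w2,
        "unique_to_1": w1 - w2,
        "unique_to_2": w2 - w1,
    }
    total = len(w1 | w2)
    return {
        name: (len(group), round(100 * len(group) / total) if total else 0)
        for name, group in groups.items()
    }


# Hypothetical search terms, one phrase per line, as they might sit in a text file.
catalogue_terms = "harry potter\nbusiness studies\nlaw reports"
website_terms = "opening hours\nlaw library\nbusiness studies"
result = compare(catalogue_terms, website_terms)
```

Like the comparator, this splits phrases into individual words, so ‘law reports’ and ‘law library’ count as sharing a word even though they are different searches. Comparing whole phrases would mean splitting on line breaks instead.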

If I understand correctly then the implication is that there is more in common between the search terms for the catalogue and federated search than between federated search and the website. When I looked at the search terms originally there were around 45% that had been used across three of the search boxes, and website search terms did seem to differ slightly from the federated and catalogue searches. That seems to be borne out by the text comparator, which shows the website search data as having fewer words in common.

TAPoR looks like a useful tool, although I’ve barely scratched the surface of what it can do.  Now we’ve changed our website to just a single discovery search there’s some further work we can maybe do to analyse the terms that people are using now to compare with what they used to use on the tabbed search system.

Finishing off a blog post the other day, WordPress flagged up that it was post number 85. I hadn’t really been keeping track of exactly how many blog posts I’d written or ever had a particular number of posts in mind, never really aiming for a certain number a week or month. I’ve been running the blog since June 2009, so 85 posts work out at around two and three quarter posts a month. But looking in a bit more detail I’ve noticed that I’m blogging more frequently this year, with nearly twice as many posts per month.

That made me wonder about the topics that I’ve been blogging about and whether there was any particular pattern to the topics. Taking out the more generic post categories such as libraries and reflections gives the set of words that make up the wordle on the right. So I’ve most frequently blogged about the website, not surprising as a lot of my work revolves around the new library website that we’ve built over the last couple of years.

The next most frequent topics have been iPads and activity data, followed by analytics and digital libraries. I’ve also blogged a bit about Kindles and ebooks but only a couple of times about linked data or discovery systems, which slightly surprised me as both have featured quite a lot in my work. On reflection, I’ve also written a few posts about search, which includes some elements of comment on discovery systems, but largely has been about how library search is presented within library websites rather than about discovery systems per se.

Blog post statistics
Having looked at the topics I’ve written about I thought I’d also look at which topics got most views to see if there were any patterns. My feeling is that posts about search and iPads seem to get most views. But when I investigated the WordPress statistics in a bit more detail I realised pretty quickly that they weren’t really going to provide an answer. Although they do show in the list the number of views each post received, the biggest number of views have been of the home page. So people going directly to the home page to view the latest blog post aren’t going to show up in the statistics for a particular blog post unfortunately.

It’s always going to be the case I suppose that a lot of the views of a new blog post are going to be from the home page of the blog, and there would generally be several blog posts on the home page. But it does make it difficult to work out which are the most popular topics. WordPress’s statistics for their free blogs give you the basic idea of traffic to the blog but aren’t in any way an equivalent to something like Google Analytics. Unfortunately you aren’t able to add Google Analytics to a hosted blog.

WordPress blog statistics do show you search terms used to access your blog and where users are being referred from. The most popular search terms I’m seeing are: refworks api, ipad screen, library search, mendeley ios5 (a bit of a strange combination) and shared services. The list of search terms does throw up some really strange misspellings of search terms though: lidray search; libariy surch; libreary serch; library shearch; and library surch all appear in the list. Google’s ‘did you mean’ feature must be doing its stuff.

Referrers are quite an interesting set of statistics. Unsurprisingly search engines, largely Google, provide most of the referrals, but I was surprised that a largely text-based blog had nearly as many referrals from Google Images as Google search. Next along was Twitter, not too surprising as I generally tweet any new blog posts and benefit from a few retweets by people. There’s quite a wide range of referrers, which was quite a surprise, and even some from Facebook, which is interesting as I know I’ve not promoted any blog post from Facebook.

Overall, the statistics give you a reasonable indication of how many visits you are getting for your blog and some information about how people are finding your blog and where they are coming from.  If you want the full features of an analytics package you’d have to move to an alternative or paid-for blog host, but the statistics are OK for a free blog host.

A couple of tweets today flagged up Andrew Asher’s paper on Search Magic on his ‘An Anthropology of Algorithms’ blog (a great title for a blog). As he explains in the paper it is based on research he has been conducting into how students find and use information as part of the ERIAL project.

Student search behaviour is something that is of great interest to me as I work at a University that delivers courses at a distance so library search is one of the main ways that students interact with our library.  We’ve grappled with the challenge of how we present library search for a while and I’ve blogged about it before a couple of times, most recently here.

So it is really good to see Andrew’s thoughts and research into library search. It’s interesting to read about the rise of the secretive ‘algorithmic culture’ that he describes as it really starts to explain the trust that users invest in search engines like Google and the implications that this has for library search systems. We’ve all recognised the impact that Google has on student expectations, and Andrew clearly identifies the simplicity of the single search box and simple keyword search as something that libraries have been trying to mimic. Given that library resources have rather less internal coherence (e.g. the typical federated search systems) than Google’s search index then maybe it’s not surprising that the record is mixed.

The figures Andrew reports clearly show students using library search systems as they would Google, which leads to problems with too few or too many search results appearing. That is a problem that is all too familiar to users of the new generation Discovery systems such as Ebsco Discovery and Summon. As Andrew points out these systems also use relevance ranking algorithms that they can be quite proprietary about.

I suppose I’m not surprised that students largely aren’t using what librarians would consider to be the most appropriate search tool for their particular enquiry.  They use what they have had success with in the past.  At undergraduate level at least I’m not surprised that students don’t have the knowledge of which is the most appropriate database to use.  That’s a skill that librarians have had to master and although we all do a lot to try to get this type of domain search information across it clearly doesn’t get through.  But perhaps the concentration of effort on ‘one-stop’ type discovery searches is obscuring that message?

Andrew also covers students’ skills in evaluating (or not evaluating) the quality of results and the self-perpetuating loop of trusting results listed on the first page. Certainly the examples of students deciding that because their search didn’t turn up any results ‘then the information must not exist and they should give up on the topic’ are familiar.

A really fascinating and useful paper and piece of research into student search behaviour and something I look forward to hearing more about.

I was interested to see another blog post on the subject of ‘the user is not broken’ that picked up on librarians’ tendencies to try to fix the user. This one by Jenica Rogers on her Attempting Elegance blog includes the great comment:

“The user is not broken in that our job is to fulfill the user’s needs, and the user’s needs are, while not always well-defined, possible to meet, or understood by either side, valid — so accusing the user of Doing It Wrong is counterproductive to our goals and needs, and should be avoided. “

It’s something that came out clearly in our usability testing for our new library website and I blogged about it back in October here, talking about the history of that particular meme, so it’s interesting to see it come up again in connection with the search features on the Barnes and Noble website.

I do find it intriguing that it’s not an uncommon reaction among librarians to these sorts of examples of user behaviour. It seems to range from a comment along the lines of ‘well, they aren’t searching properly’ to ‘well, if I could just have 30 minutes with them I could show them how to get much better search results’.

In part I think it comes from wanting to help people have a better search experience, a user training aspect, which is a role that librarians have developed. So there’s a willingness to provide training to users which tends to translate into thinking that problems can be solved by training, guidance and support, rather than by focusing on getting the search experience right so users don’t need training.

But I wonder also whether there’s an aspect that librarians are so used to dealing with poor user interfaces in the typical search systems that they’ve had to master over the years.  So they are more ‘tolerant’ of poor search systems and expect to come up with effective search strategies.  So they forget that most users just want to search in a simple way, and really want a search that is just good enough.  They don’t necessarily want to construct the perfect search strategy.  Users have learnt that on search engines you can usually get some relevant results and expect other systems to be the same.

After blogging about Carol Tenopir’s fascinating library value seminar the other week I was interested to see a view about how librarians are struggling with library value metrics in this slideshow ‘Refining the Academic Library’ (courtesy of a Google+ post from Ben Showers). The slides are from the Education Advisory Board in the US. The Education Advisory Board ‘provides best practice research and practical advice to academic, business, and student affairs leaders at the nation’s leading universities.’

The slides themselves cover the challenges facing libraries and talk through the transformation steps involved in changing the model of delivery from a face-to-face, print-based, ‘just-in-case’ business model, to a digital, ‘just-in-time’ model.  Ranging through the issues of the rising costs of journals and how libraries are increasingly being disintermediated (although interestingly that view contradicts a comment from Carol Tenopir that academics may not read journals in the library but still the majority were using library resources as their main source for journal articles), through to looking at the power of Google’s digitisation potential compared with a typical academic library, the slides are a really good summary of the issues facing academic libraries and how academic libraries might move forward.

A lot of the content covers the challenges facing institutions with large physical collections and a building that is the focus of many of their traditional services, so some of that isn’t relevant to my particular interest. But many of the challenges and solutions around eresources, ebooks and support services are very relevant. One of the slides advocates ‘Going Where the Students Are’ which is something that we try to do as much as possible. So we have Library Resources embedded and linked within the Virtual Learning Environment and links to various library websites. We also have information literacy activities and have the goal of embedding as much as possible into VLE courses.

But that isn’t without issue. By closely integrating the library so students can follow a link to resources directly from their course materials you can easily make the library invisible to users. The value of the physical library is that it makes it clear to students that when they go through the door, the resources within are provided by the ‘Library’. Replicating that feeling in a virtual environment is quite a challenge and something that library portals have tried to address. A ‘brought to you by the Library’ pop-up message is feasible to do but potentially intrusive and irritating to users when it pops up on every single library resource link.

The slides give a really good run through the challenges and issues for libraries, with lots of useful material that is going to be quite helpful in telling the sort of stories that we will need to put across over the next few years. As a final thought it’s also interesting to see the NCSU mobile services being mentioned in the slides, as they have been something that we have looked at quite a lot when planning the new mobile library website that we now have made available here.

I’ve been part of a few discussions this week about various aspects of mobile and tablet devices, from the institutional data security aspects through to the implications for website development and for helping library staff to support users using these types of devices. So it was interesting to hear about another aspect, app collection management, blogged about here by Emily Clasper.

It was good to see the thought processes about the requirements for buying and maintaining the apps set out so clearly.  And to read the conclusion that, ‘choosing content for the iPad was pretty much the same as developing any library collection’.   It’s reassuring to see that the standard tried and tested library acquisitions processes of  stock objectives, stock plans and stock policies still have relevance to digital devices and apps.

I do wonder though if there is a bit of a difference compared with other library selection processes.  In a lot of cases library material is being pre-selected by the library stock supply industry and presented to librarians for them to choose from, so there is some weeding out of unsuitable material.  There are tools to help stock selection that may not yet have caught up with a need to include apps. And finding apps can be a bit of a hit and miss affair.  

Apps can also be quite unreliable and prone to bugs, but you could say the same of computer games and many libraries have been happily lending them and using them for a long time.  But you do have to factor in time to update the apps at regular intervals.

I’d also wonder about the detail in some of the license conditions with some of the apps.  Tablets are pretty much expected to be ‘personal’ devices, so the license conditions on an app aren’t likely to cover their use on a shared device in the library.
