You are currently browsing the monthly archive for December 2011.
A comment by Preedip Balaji on one of my search blog posts suggested TAPoR text analysis as a useful tool for comparing the lists of search terms that users were entering into the tabbed search tool on our old website. At the time we had three tabbed searches, covering the library catalogue, the website, and originally a federated search tool that later migrated to a discovery search tool. We'd found considerable overlap between the search terms that users put into the different search boxes, and we've since moved away from a tabbed approach on the new website in favour of a single discovery search box. But at the time I wondered whether there were any text analysis tools that could provide some form of assessment of the similarity between the search terms used.
TAPoRware seems to be exactly the sort of text comparison tool I was looking for. Developed at the University of Alberta, TAPoR (Text Analysis Portal for Research) has a range of HTML, XML and plain-text tools that let you analyse words, find patterns and look for data within text. So I've been playing around with the Comparator tool to compare some of the lists of 100 search terms used in the website, federated and catalogue searches.
The Comparator tool lets you compare two sets of data at a time, and you can upload each set as a text file from your local machine. For some reason it wouldn't accept an Excel file, but it will display the results as either HTML or a tab-delimited file. The tool reports how many words there are and how many are unique or appear multiple times, then provides a list of the words that are common or unique to either file.
The tool only lets you compare two files at a time; ideally I'd have liked to compare three. It also compares words individually, whereas most of the entries in my files are actually search phrases. So I've had to run three comparisons to compare each file against the other two. The table below summarises the comparisons and shows what percentage of terms are common or unique to each file of search terms.
[Table: for each pairwise comparison, the percentage of words common to both files, unique to file 1, and unique to file 2]
If I understand correctly, the implication is that there is more in common between the search terms for the catalogue and federated search than between the federated search and the website. When I looked at the search terms originally, around 45% had been used across all three search boxes, and the website search terms did seem to differ slightly from the federated and catalogue searches. That seems to be borne out by the text comparator, which shows the website search data as having fewer words in common.
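The word-level calculation the comparator performs can be sketched in a few lines of Python. This is my own illustration of the idea, not TAPoR's actual code, and the function names are my own:

```python
def load_words(path):
    """Read a file of search terms, splitting phrases into individual words,
    since the comparator works word by word rather than phrase by phrase."""
    with open(path, encoding="utf-8") as f:
        return set(word.lower() for line in f for word in line.split())

def compare(words_a, words_b):
    """Return the percentage of the combined vocabulary that is common
    to both files or unique to one of them."""
    total = len(words_a | words_b)
    return {
        "common %": round(100 * len(words_a & words_b) / total, 1),
        "unique to 1 %": round(100 * len(words_a - words_b) / total, 1),
        "unique to 2 %": round(100 * len(words_b - words_a) / total, 1),
    }
```

Since this only handles a pair of files at a time, comparing three files means running it three times, once per pairing, just as with the TAPoR tool.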
TAPoR looks like a useful tool, although I’ve barely scratched the surface of what it can do. Now we’ve changed our website to just a single discovery search there’s some further work we can maybe do to analyse the terms that people are using now to compare with what they used to use on the tabbed search system.
Finishing off a blog post the other day, WordPress flagged up that it was post number 85. I hadn't really been keeping track of exactly how many posts I'd written, and I've never had a particular number in mind or aimed for a certain number a week or month. I've been running the blog since June 2009, so 85 posts works out at around two and three-quarter posts a month. But looking in a bit more detail, I've noticed that I'm blogging more frequently this year, with nearly twice as many posts per month.
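The arithmetic behind that average is straightforward; a quick sketch, assuming the count runs from June 2009 through December 2011:

```python
# 85 posts over the life of the blog so far.
posts = 85

# June-December 2009 (7 months) + all of 2010 (12) + all of 2011 (12).
months = 7 + 12 + 12

per_month = posts / months
print(f"{per_month:.2f} posts per month")  # about 2.74, i.e. two and three-quarters
```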
That made me wonder about the topics I've been blogging about and whether there was any particular pattern to them. Taking out the more generic post categories, such as libraries and reflections, gives the set of words that make up the Wordle on the right. I've most frequently blogged about the website, which isn't surprising, as a lot of my work revolves around the new library website we've built over the last couple of years.
The next most frequent topics have been iPads and activity data, followed by analytics and digital libraries. I've also blogged a bit about Kindles and ebooks, but only a couple of times about linked data or discovery systems, which slightly surprised me as both have featured quite a lot in my work. On reflection, I've also written a few posts about search, which include some comment on discovery systems but have largely been about how library search is presented within library websites rather than about discovery systems per se.
Blog post statistics
Having looked at the topics I've written about, I thought I'd also look at which topics got the most views, to see if there were any patterns. My feeling is that posts about search and iPads seem to get the most views. But when I investigated the WordPress statistics in a bit more detail, I realised pretty quickly that they weren't really going to provide an answer. Although they do list the number of views each post received, the biggest number of views have been of the home page. So people going directly to the home page to read the latest blog post unfortunately don't show up in the statistics for that particular post.
It's always going to be the case, I suppose, that a lot of the views of a new blog post come from the home page of the blog, where there will generally be several recent posts. But it does make it difficult to work out which topics are most popular. WordPress.com's statistics for its free blogs give you a basic idea of traffic but are in no way equivalent to something like Google Analytics, and unfortunately you aren't able to add Google Analytics to a wordpress.com-hosted blog.
WordPress blog statistics do show you the search terms used to reach your blog and where visitors are being referred from. The most popular search terms I'm seeing are: refworks api, ipad screen, library search, mendeley ios5 (a bit of a strange combination) and shared services. The list does throw up some really strange misspellings, though: lidray search, libariy surch, libreary serch, library shearch and library surch all appear. Google's 'did you mean' feature must be doing its stuff.
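Those variants can be matched back to the phrase they were presumably aiming for using fuzzy string matching. A small sketch with Python's standard-library difflib, using the misspellings from the stats above (the list of 'intended' phrases is my own guess):

```python
from difflib import get_close_matches

# Misspelled referral terms seen in the WordPress stats.
misspellings = ["lidray search", "libariy surch", "libreary serch",
                "library shearch", "library surch"]

# Hypothetical candidate phrases the searchers probably meant.
intended = ["library search", "library website", "refworks api"]

for term in misspellings:
    # Return the single best match scoring above the similarity cutoff.
    match = get_close_matches(term, intended, n=1, cutoff=0.6)
    print(term, "->", match[0] if match else "no match")
```

This is roughly the kind of similarity matching that sits behind a 'did you mean' suggestion, albeit on a vastly smaller scale than Google's.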
Referrers are quite an interesting set of statistics. Unsurprisingly, search engines, largely Google, provide most of the referrals, but I was surprised that a largely text-based blog had nearly as many referrals from Google Images as from Google search. Next came Twitter, which isn't too surprising, as I generally tweet any new blog posts and benefit from a few retweets. There's quite a wide range of referrers, which was a surprise, and even some from Facebook, which is interesting as I know I've not promoted any blog post on Facebook.
Overall, the wordpress.com statistics give you a reasonable indication of how many visits your blog is getting, along with some information about how people are finding it and where they are coming from. If you want the full features of an analytics package you'd have to move to an alternative or paid-for blog host, but the statistics are OK for a free one.
A couple of tweets today flagged up Andrew Asher’s paper on Search Magic on his ‘An Anthropology of Algorithms’ blog (a great title for a blog). As he explains in the paper it is based on research he has been conducting into how students find and use information as part of the ERIAL project.
Student search behaviour is something that is of great interest to me as I work at a University that delivers courses at a distance so library search is one of the main ways that students interact with our library. We’ve grappled with the challenge of how we present library search for a while and I’ve blogged about it before a couple of times, most recently here.
So it is really good to see Andrew's thoughts and research into library search. It's interesting to read about the rise of the secretive 'algorithmic culture' he describes, as it really starts to explain the trust that users invest in search engines like Google and the implications this has for library search systems. We've all recognised the impact Google has on student expectations, and Andrew clearly identifies the simplicity of the single search box and simple keyword searching as something libraries have been trying to mimic. Given that library resources have rather less internal coherence (for example, the typical federated search system) than Google's search index, maybe it's not surprising that the record is mixed.
The figures Andrew reports clearly show students using library search systems as they would Google, which leads to problems with too few or too many search results. That is a problem all too familiar to users of the new generation of discovery systems, such as EBSCO Discovery and Summon. As Andrew points out, these systems also use relevance-ranking algorithms that their vendors can be quite proprietary about.
I suppose I'm not surprised that students largely aren't using what librarians would consider the most appropriate search tool for their particular enquiry; they use what has worked for them in the past. At undergraduate level at least, I'm not surprised that students don't know which database is most appropriate. That's a skill librarians have had to master, and although we all do a lot to try to get this type of domain knowledge across, it clearly doesn't get through. But perhaps the concentration of effort on 'one-stop' discovery searches is obscuring that message?
Andrew also covers students' skills in evaluating (or not evaluating) the quality of results and the self-perpetuating loop of trusting results listed on the first page. Certainly the examples of students deciding that because their search didn't turn up any results 'then the information must not exist and they should give up on the topic' are familiar.
A really fascinating and useful paper and piece of research into student search behaviour and something I look forward to hearing more about.
I was interested to see another blog post on the subject of 'the user is not broken' that picked up on librarians' tendencies to try to fix the user. This one, by Jenica Rogers on her Attempting Elegance blog, includes the great comment:
“The user is not broken in that our job is to fulfill the user’s needs, and the user’s needs are, while not always well-defined, possible to meet, or understood by either side, valid — so accusing the user of Doing It Wrong is counterproductive to our goals and needs, and should be avoided. “
It’s something that came out clearly in our usability testing for our new library website and I blogged about it back in October here, talking about the history of that particular meme, so it’s interesting to see it come up again in connection with the search features on the Barnes and Noble website.
I do find it intriguing that this sort of reaction to examples of user behaviour isn't uncommon among librarians. It seems to range from comments along the lines of 'well, they aren't searching properly' to 'well, if I could just have 30 minutes with them I could show them how to get much better search results'.
In part I think it comes from wanting to help people have a better search experience, a user-training aspect, which is a role that librarians have developed. So there's a willingness to provide training, which tends to translate into thinking that problems can be solved by training, guidance and support, rather than by focusing on getting the search experience right so users don't need training.
But I also wonder whether librarians are so used to dealing with the poor user interfaces of the typical search systems they've had to master over the years that they have become more 'tolerant' of poor search tools and expect to have to devise effective search strategies. They forget that most users just want to search in a simple way and really only want a search that is good enough; they don't necessarily want to construct the perfect search strategy. Users have learnt that on search engines you can usually get some relevant results, and they expect other systems to be the same.
After a couple of years of having the same header on this blog, I'd been thinking for a while that I wanted to update it. I did wonder about changing the whole look and feel of the blog and going for a new layout, but for the time being I've settled for a new header. The original was based on the blog title shown with a circuit-board infill for each of the letters, as something slightly different from the usual shaded text. The immediate trigger for changing it was a new free logo-creator site tweeted about by Phil Bradley at the end of last week and listed on his 'I want to…' blog here.
The site at http://logotypemaker.com lets you create your own logo by typing in the words you want to appear, and then shows you some sample logos. You can refresh the page for more alternatives, or pick one of the logos and change the colours, fonts and images to get the effect you want.
You can set the logo with a black, white or transparent background and resize it to fit your requirements. Once you have finished your changes you can save/export it as PNG, PDF or as a zipped PNG file.
For my new blog header I settled for a black background and used the Hattori Hanzo Light font for the text. I kept the reflection effect as it seemed to work quite well on the black background and I’ve always liked that type of effect (and probably overuse shadows and reflections in graphics and presentations).
For some reason I wasn't able to download the PNG version of the image, and the zipped version didn't crop correctly when I picked the 720×180-pixel size I needed for the blog header. So I ended up using the PDF version, then selecting the image and resizing it in Paint.NET to get the right size and proportions.
As a free tool I'm quite impressed with it. It's obviously not as powerful as a full-spec drawing package, but it's a good starting point if you aren't that skilled at creating images from scratch. It also seems to let you upload your own images, so if you have something you want to incorporate into your logo you can add it in. It looks like a useful addition to the range of tools available across the web.
After blogging about Carol Tenopir’s fascinating library value seminar the other week I was interested to see a view about how librarians are struggling with library value metrics in this slideshow ‘Refining the Academic Library’ (courtesy of a Google+ post from Ben Showers). The slides are from the Education Advisory Board in the US. The Education Advisory Board ‘provides best practice research and practical advice to academic, business, and student affairs leaders at the nation’s leading universities.’
The slides themselves cover the challenges facing libraries and talk through the transformation steps involved in moving from a face-to-face, print-based, 'just-in-case' delivery model to a digital, 'just-in-time' model. They range through the rising costs of journals and how libraries are increasingly being disintermediated (although interestingly that view contradicts a comment from Carol Tenopir that, while academics may not read journals in the library, the majority still use library resources as their main source for journal articles), through to the power of Google's digitisation capability compared with that of a typical academic library. The slides are a really good summary of the issues facing academic libraries and how they might move forward.
A lot of the content covers the challenges facing institutions with large physical collections and a building that is the focus of many of their traditional services, so some of that isn't relevant to my particular interest. But many of the challenges and solutions around e-resources, ebooks and support services are very relevant. One of the slides advocates 'Going Where the Students Are', which is something we try to do as much as possible. So we have Library Resources embedded and linked within the Virtual Learning Environment, along with links to various library websites, and we run information literacy activities with the goal of embedding as much as possible into VLE courses.
But that isn't without issues. By integrating the library so closely that students can follow a link to resources directly from their course materials, you can easily make the library invisible to users. The value of the physical library is that it makes clear to students that, when they go through the door, the resources within are provided by the 'Library'. Replicating that feeling in a virtual environment is quite a challenge and something that library portals have tried to address. A 'brought to you by the Library' pop-up message is feasible, but potentially intrusive and irritating when it appears on every single library resource link.
The slides give a really good run through the challenges and issues for libraries, with lots of useful material that is going to be quite helpful in telling the sort of stories we will need to put across over the next few years. As a final thought, it's also interesting to see the NCSU mobile services mentioned in the slides, as they were something we looked at quite a lot when planning the new mobile library website that we have now made available here.
I’ve been part of a few discussions this week about various aspects of mobile and tablet devices, from the institutional data security aspects through to the implications for website development and for helping library staff to support users using these types of devices. So it was interesting to hear about another aspect, app collection management, blogged about here by Emily Clasper.
It was good to see the thought processes about the requirements for buying and maintaining the apps set out so clearly. And to read the conclusion that, ‘choosing content for the iPad was pretty much the same as developing any library collection’. It’s reassuring to see that the standard tried and tested library acquisitions processes of stock objectives, stock plans and stock policies still have relevance to digital devices and apps.
I do wonder, though, whether there is a bit of a difference compared with other library selection processes. In a lot of cases, library material is pre-selected by the library stock-supply industry and presented to librarians to choose from, so there is some weeding out of unsuitable material. The tools that help with stock selection may not yet have caught up with the need to include apps, and finding apps can be a bit of a hit-and-miss affair.
Apps can also be quite unreliable and prone to bugs, but you could say the same of computer games, and many libraries have been happily lending and using those for a long time. You do have to factor in time to update the apps at regular intervals, though.
I'd also wonder about the detail of the licence conditions on some of the apps. Tablets are pretty much expected to be 'personal' devices, so an app's licence conditions aren't likely to cover its use on a shared device in the library.