Monthly Archive

You are currently browsing the monthly archive for January 2013.

Making the library’s search box work like Google

17/01/2013 in Discovery systems, libraries, Search | Tags: library search

I’ve been reading a great blog post by Peter Morville on Semantic Studios, ‘Inspiration Architecture: the Future of Libraries’, and it includes a description that really resonated:

There was even a big move towards the vision of “library as platform.” Noble geeks developed elaborate schemata for open source, open API, open access environments with linked data and semantic markup to unleash innovation and integration through transparency, crowdsourcing, and mashups. They waxed poetic about the potential of web analytics and cloud computing to uncover implicit relationships and emerging patterns, identify scholarly pathways and lines of inquiry, and connect and contextualize artifacts with adaptive algorithms. They promised ecosystems of participation and infrastructures for the creation and sharing of knowledge and culture.

Unfortunately, the folks controlling the purse strings had absolutely no idea what these geeks were talking about, and they certainly weren’t about to entrust the future of their libraries (and their own careers) to the same bunch of incompetent techies who had systematically failed, for more than ten years, to simply make the library’s search box work like Google.

That last sentence really struck home. The search for a library equivalent of the Google search box is familiar to anyone trying to build better ways of helping users get to library content; it has pretty much been a mantra over the past few years. (For a great summary of how library search systems differ from Google, look at Aaron Tay’s blog post from last May and his post on web-scale discovery from December last year.) So it’s easy to find examples of libraries and other organisations that have tried to put in place a Google-like search: the Biodiversity Heritage Library, the American University, the National Archives and Records Administration (reported in Information Management Journal, March 2011) and Oregon State University (Stefanie Buck and Jane Nichols, ‘Beyond the search box’, Reference & User Services Quarterly, March 2012).

The current generation of discovery systems (Summon, EDS, Primo etc.) is largely built around the concept of a ‘Google-like’ search, as reported here for McGill University, and by OCLC for WorldCat Local. In some ways it seems to me that we’ve been concentrating too much on the simplicity of the original Google interface. As Lorcan Dempsey pointed out in his ‘Thirteen Ways of Looking at Libraries, Discovery and the Catalog’:

‘a simple search box has only been one part of the Google formula. Pagerank has been very important in providing a good user experience, and Google has progressively added functionality as it included more resources in the search results. It also pulls targeted results to the top of the page (sports results, weather or movie times, for example), and it interjects specific categories of results in the general stream (news, images).’

So although we’ve implemented a ‘Google-like’ search box, it becomes apparent that it doesn’t entirely solve the problem. It’s a bit like a false summit: you think you’ve reached the top, then realise that you still have some way to go. Relevancy ranking becomes vitally important, and with the discovery-service generation you’ve essentially handed control of the relevancy algorithm over to a vendor. You can add your local content into the system and have some control, but it is limited. You are also constrained in what you can add to the discovery platform: your catalogue and link resolver/knowledge base generally yes, your institutional repository yes, but your other lists of resources held in simple databases not so easily, unless they happen to be available via OAI-PMH or as MARC records.
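For a local list that can be exposed over OAI-PMH, getting it into a discovery platform mostly means paging through `ListRecords` responses. The Python sketch below assumes a hypothetical repository endpoint and Dublin Core (`oai_dc`) records; it only builds request URLs and parses a (made-up) response page, leaving the HTTP fetching to whatever client you prefer.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for one page of results."""
    if resumption_token:
        # Follow-up pages: the protocol allows only verb + resumptionToken.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return ([(identifier, title), ...], next resumption token or None)."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", "", NS)
        title = record.findtext(".//dc:title", "", NS)
        records.append((identifier, title))
    # An empty or missing resumptionToken means this was the last page.
    token = root.findtext(".//oai:resumptionToken", None, NS)
    return records, (token or None)


# A trimmed, made-up response page for illustration.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Local database of e-resources</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://repository.example.ac.uk/oai"))
print(parse_list_records(SAMPLE))
```

A harvester would loop: request the first URL, parse it, and keep requesting with the returned token until it comes back as `None`.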

So you look at bringing together content from different systems, probably using the ‘bento box’ approach (as used by Stanford and discussed by them here), where you search across your different systems using APIs and return a set of results from each of them. Each set carries the relevancy ranking of its own system, rather than the results being ranked for relevancy as a whole. So is that going to be any better for users? Is it better to group the results by system, as Stanford have done, or should we be trying to merge the results together, as Google does? That’s something we need to test.
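To make the comparison concrete, here is a Python sketch of the two presentation options, with stub search functions standing in for real system APIs (all names and scores are invented). The merged version min-max normalises each system’s scores before interleaving, because raw relevance scores from different systems aren’t directly comparable; that normalisation is one simple assumption, not how Stanford or any vendor actually does it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Result:
    title: str
    score: float  # relevance score as reported by the source system
    source: str


Searcher = Callable[[str], List[Result]]


def bento_search(query: str, searchers: Dict[str, Searcher], per_box: int = 3) -> Dict[str, List[Result]]:
    """One box per system; each box keeps its own system's ranking."""
    return {name: search(query)[:per_box] for name, search in searchers.items()}


def merged_search(query: str, searchers: Dict[str, Searcher], limit: int = 10) -> List[Result]:
    """One Google-style list. Raw scores from different systems aren't
    comparable, so each system's scores are min-max normalised to 0..1 first."""
    pooled = []
    for search in searchers.values():
        results = search(query)
        if not results:
            continue
        hi = max(r.score for r in results)
        lo = min(r.score for r in results)
        for r in results:
            norm = 1.0 if hi == lo else (r.score - lo) / (hi - lo)
            pooled.append((norm, r))
    # Stable sort: ties keep the order in which systems were queried.
    pooled.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in pooled[:limit]]


def catalogue(query):  # hypothetical stand-in for a catalogue API
    return [Result("Book A", 12.0, "catalogue"), Result("Book B", 3.0, "catalogue")]


def articles(query):  # hypothetical stand-in for a discovery-system API
    return [Result("Article X", 0.9, "articles"), Result("Article Y", 0.5, "articles"),
            Result("Article Z", 0.1, "articles")]


systems = {"catalogue": catalogue, "articles": articles}
print({name: [r.title for r in box] for name, box in bento_search("plagiarism", systems, 2).items()})
print([r.title for r in merged_search("plagiarism", systems)])
```

Which layout serves users better is exactly the open question; a sketch like this just makes it cheap to put both in front of them.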

But there’s a nagging feeling that this still relies on users coming to ‘the library’ rather than the library being where the users are. OK, we can put library search boxes into the Virtual Learning Environment (we have an article search tool there that searches our discovery system), but if users start their search activity with Google, then the challenge is to go beyond Google Scholar and get library discovery up to the network level.

Mobiles, asset-light and the future of the library website

14/01/2013 in libraries, Mobiles, Personalisation, tablets, website | Tags: 2012 Internet Trends, asset-light, Kleiner Perkins Caufield Byers, KPCB

Mobile trends
Although I’d picked up that sales of mobiles and tablets have overtaken sales of desktop PCs and laptops, what hadn’t become obvious to me was that we now seem to be approaching the point at which the number of tablets/smartphones in circulation exceeds the number of desktops/laptops. December’s Internet Trends survey from Kleiner Perkins Caufield Byers shows, in the graph reproduced here, that they expect that stage to be reached globally sometime this year.

I’d probably add a couple of caveats: smartphone adoption in the developing world may slightly skew the figures, and people may ordinarily own more tablets/smartphones than desktops/laptops anyway. Nonetheless, it emphasises the point that mobile internet access is now mainstream. For many people it may be their preferred means of accessing your services, and they will expect it to just work and to give an equivalent or better experience than the ‘traditional’ desktop browser.

But the number of devices doesn’t yet map to usage of our websites. Under 10% of our traffic still comes from mobiles/tablets, so even if the number of devices in circulation is reaching parity, we aren’t yet at the stage where most of our use comes from those devices. Looking at the trends, though, that day may be on the horizon.

Asset-light
One of the interesting concepts in KPCB’s slideshow is the ‘asset-light’ idea: that more and more people, perhaps younger people especially, may be less inclined to own or acquire physical ‘stuff’ and prefer a more ‘mobile’ (as in able to move more readily) lifestyle. It is characterised as having your music on Spotify or iTunes rather than on physical CDs, or renting rather than buying your textbooks. For me it also brings to mind a personal version of ‘just-in-time’, the production strategy based on reducing inventory in favour of delivering items when you need them: the concept of ‘on-demand’ rather than ‘just-in-case’ ownership.

Potentially, as characterised in this blog post on Fail!Lab, it might mean major changes to our library websites, or even to the concept of a website. It’s a good and interesting thought. For a while we’ve certainly been pushing content into places where students go, such as pushing library resources via RSS feeds into our VLE. But these spaces are still websites. Yet once you’ve got a stream or feed of data, you could push or pull it into numerous places, whether apps, webpages or other systems.
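As an illustration of ‘once you’ve got a feed’, this Python sketch builds a minimal RSS 2.0 feed with the standard library and then reads it back the way a consuming app or VLE block might. The feed title and URLs are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_rss(title, link, description, items):
    """Serialise items (dicts with title/link/description) as an RSS 2.0 feed."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required RSS 2.0 channel elements.
    for tag, text in (("title", title), ("link", link), ("description", description)):
        ET.SubElement(channel, tag).text = text
    for item in items:
        entry = ET.SubElement(channel, "item")
        for tag in ("title", "link", "description"):
            if tag in item:
                ET.SubElement(entry, tag).text = item[tag]
    return ET.tostring(rss, encoding="unicode")


def read_rss(xml_text):
    """What a consuming app or VLE block might do: pull out (title, link) pairs."""
    channel = ET.fromstring(xml_text).find("channel")
    return [(i.findtext("title"), i.findtext("link")) for i in channel.iter("item")]


feed = build_rss(
    "Library resources for module X",  # hypothetical feed
    "https://library.example.ac.uk/module-x",
    "Key resources selected by the library",
    [{"title": "Evaluating information", "link": "https://library.example.ac.uk/evaluating"}],
)
print(read_rss(feed))
```

The point is the separation: the same `feed` string could just as easily be consumed by a mobile app or another system as by a webpage.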

The Fail!Lab post’s idea of artificially intelligent agents doing the ‘heavy lifting’ of finding resources for users is something that Paul Walk raised as part of his Library Management Systems vision (slideshare and blog post), so it’s interesting to see someone else postulating a similar future. For me it suggests a future where users choose their own environments/tools/agents, and we build systems that can feed data/content to those agents according to a set of data-sharing standards. It implies a time when users can write queries to interrogate your systems, whether for content, help materials or skills development activities, and a world of profiles, entitlements and charging mechanisms far removed from the current model of: go to this website, sign up and pass through the gateway into a ‘library’ of stuff.

Amazon Autorip

11/01/2013 in cloud, technology | Tags: amazon, Autorip, Cloud Player

Fascinating news yesterday about Amazon’s decision to give customers who have bought any of some 50,000 CDs the MP3 version of their CD in Amazon Cloud Player. (Read the BBC’s report and the Guardian’s report.) At the moment it is Amazon.com only, but Amazon’s comments suggest a commitment to extend it to other countries, including the UK. Presumably part of the delay is negotiating with record companies over which CDs are included. Looking at the list of US CDs, a lot of them are probably less likely to have been bought by a UK audience.

The MP3s seem to be added to your Amazon Cloud Player. I wondered what that would mean for the tool, as the free version currently has a limit of only 250 tracks, but it seems that Autorip MP3s won’t count towards that limit.

It’s a bold move from Amazon, and seems partly an attempt to encourage CD buyers to move to MP3s and sign up to Amazon’s Cloud Player. But it also opens up another front where Amazon is going directly up against Apple. First there was the Kindle Fire competing with the Apple iPad; now there is Cloud Player vs iTunes.

Amazon Cloud Player doesn’t seem to have a great deal of functionality at the moment, but then I’m not sure that iTunes has that much either. I’ve never been all that impressed with the iTunes user experience or interface when you use it in a browser. It also seems odd that Amazon Cloud Player has an iPhone/iPod app but doesn’t seem to have an iPad version.

It’s interesting that there are only quite basic tools to manage playlists or to use the analytics about what you play. There are the expected ‘people who bought this also bought that’ recommendations, but no really advanced features such as recommendations that this artist is like that artist, or ‘if you liked this, you might like this artist’.
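‘This artist is like that artist’ recommendations can be approximated from listening data alone. A minimal Python sketch, assuming a hypothetical mapping of users to the artists they play, scores each pair of artists by the Jaccard similarity of their listener sets; it illustrates the idea only and isn’t how Amazon or Apple compute recommendations.

```python
from itertools import combinations


def artist_similarity(listens):
    """listens: user -> set of artists played. Returns {(a, b): Jaccard score}
    for every pair of artists that share at least one listener."""
    listeners = {}
    for user, artists in listens.items():
        for artist in artists:
            listeners.setdefault(artist, set()).add(user)
    scores = {}
    for a, b in combinations(sorted(listeners), 2):
        shared = listeners[a] & listeners[b]
        if shared:
            scores[(a, b)] = len(shared) / len(listeners[a] | listeners[b])
    return scores


def similar_artists(artist, scores, n=3):
    """Top-n artists most similar to `artist`, ties broken alphabetically."""
    ranked = [(score, b if a == artist else a)
              for (a, b), score in scores.items() if artist in (a, b)]
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in ranked[:n]]


# Hypothetical listening histories.
listens = {
    "u1": {"Artist A", "Artist B"},
    "u2": {"Artist A", "Artist B", "Artist C"},
    "u3": {"Artist C", "Artist D"},
}
scores = artist_similarity(listens)
print(similar_artists("Artist C", scores))
```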

Web-archiving

07/01/2013 in Digital Libraries, libraries | Tags: Heritrix, Internet Archive, Library of Congress, web archiving, Wget

News that the Library of Congress have been collecting an archive of Twitter for the last few years (reported by the Library of Congress and in a story in the Washington Post) caught my attention. Web archiving seems to be gaining attention as something that libraries, particularly national libraries, should be engaged in. For example, as well as the work the Library of Congress are doing, the BL have the UK Web Archive and Australia has Pandora.

Although national libraries and the Internet Archive have been web archiving for a while, their coverage of the web is never going to be comprehensive, and in particular it is always likely to exclude material locked away in institutional systems. For universities, that means material in their Virtual Learning Environment (VLE), for example, isn’t going to be archived by these web-scale efforts, so if you want to preserve a record of how your institution offered online learning, someone has to take steps to actively archive those websites.

What was particularly interesting about the Twitter archive was that although processes have been put in place to capture and archive the material, there is still some way to go before that material can be made accessible. Web archiving is something we’ve been working on for the last few months as part of our digital library work, and it quickly became apparent that collecting the material and presenting it are two very different challenges. I’m not sure the analogy entirely works, but you could think of the collection stage as akin to ‘rescue archaeology’, in that what we often have to do is archive a website before it is deleted or the server/application is closed down.

Collecting web archiving material
We’ve been web archiving some of our internal websites, such as our Moodle VLE sites, of which there are several thousand going back to 2006. So we’ve had to establish selection criteria, eventually choosing the first and last presentations of each module, while recognising that we may also need to capture sites that display particularly significant pedagogical features or aspects of learning design.
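The first-and-last selection rule is simple enough to automate. A Python sketch, assuming hypothetical module codes, presentation labels that sort chronologically as strings (for example ‘2006-02’), and a hand-maintained set of sites flagged for pedagogical significance:

```python
from collections import defaultdict


def select_for_archiving(sites, flagged=frozenset()):
    """sites: (module_code, presentation) pairs. Keep the first and last
    presentation of each module, plus any sites flagged by hand."""
    by_module = defaultdict(list)
    for module, presentation in sites:
        by_module[module].append(presentation)
    selected = set(flagged)
    for module, presentations in by_module.items():
        presentations.sort()  # assumes labels sort chronologically as strings
        selected.add((module, presentations[0]))
        selected.add((module, presentations[-1]))
    return sorted(selected)


# Hypothetical module codes and presentation labels.
sites = [("A101", "2006-02"), ("A101", "2007-02"), ("A101", "2008-02"), ("B202", "2010-10")]
print(select_for_archiving(sites))
print(select_for_archiving(sites, flagged={("A101", "2007-02")}))
```

A module with a single presentation is selected once, since its first and last presentations are the same site.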

To capture the websites, our digital library developer initially used a web archiving tool called Heritrix, but found that it had problems with our authentication system. Switching to another tool, Wget, proved more successful and has allowed us to archive several hundred sites. Both tools essentially work by being given a URL and some parameters, then copying the webpage content, following links to retrieve files/images and continuing down through the hierarchy of the site. Getting the parameters right, so that you archive what you want from the site without straying into other sites, is usually a matter of trial and error, and there is some work in monitoring, stopping and restarting the processes to capture the right content. What you get at the end of the process is an archive file in WARC format.
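The ‘without straying into other sites’ part comes down to a scope rule. The Python sketch below shows the idea behind Wget’s default of staying on one host combined with its `--no-parent` option: follow links breadth-first, but only those on the same host and under the starting path. The link graph is an in-memory stand-in for fetched pages, and all URLs are hypothetical.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse


def in_scope(url, seed):
    """Same host as the seed, and at or below the seed's directory."""
    u, s = urlparse(url), urlparse(seed)
    base = s.path if s.path.endswith("/") else s.path.rsplit("/", 1)[0] + "/"
    return (u.scheme in ("http", "https")
            and u.netloc.lower() == s.netloc.lower()
            and u.path.startswith(base))


def crawl_order(seed, get_links, max_pages=1000):
    """Breadth-first list of pages a scoped crawl would fetch from seed."""
    seen, queue, order = {seed}, deque([seed]), []
    while queue and len(order) < max_pages:
        page = queue.popleft()
        order.append(page)
        for href in get_links(page):
            # Resolve relative links and drop #fragments before checking scope.
            url, _ = urldefrag(urljoin(page, href))
            if url not in seen and in_scope(url, seed):
                seen.add(url)
                queue.append(url)
    return order


# In-memory stand-in for fetched pages (hypothetical URLs).
SITE = {
    "https://www.example.ac.uk/library/": [
        "guide.html", "/library/help/faq.html#top",
        "https://other.example.org/", "/news/",
    ],
    "https://www.example.ac.uk/library/guide.html": ["/library/", "help/faq.html"],
    "https://www.example.ac.uk/library/help/faq.html": [],
}
print(crawl_order("https://www.example.ac.uk/library/", lambda p: SITE.get(p, [])))
```

Much of the trial and error is in choosing the seed and scope: too narrow and you miss a site’s images or attachments, too wide and the crawl wanders off.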

We have had some challenges to overcome, such as concerns that web archiving shouldn’t take place on live systems, because the activity could look similar to a ‘denial of service’ attack, given that it makes a large number of requests in a short space of time. Given that organisations such as the Internet Archive will be archiving our public sites regularly anyway, that one surprised us a little. Tools like Wget and Heritrix let you ‘throttle’ them so that they make a limited number of requests, minimising the impact on systems.
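A throttled Wget run might be assembled along these lines. This is a sketch, not our actual configuration: the option values are assumptions, `--warc-file` needs Wget 1.14 or later, and the optional `--load-cookies` (reusing a logged-in session’s cookies) is only one hypothetical way of dealing with authentication.

```python
import shlex


def wget_archive_command(url, warc_name, wait_seconds=2, rate_limit="200k", cookies_file=None):
    """Return a Wget argument list for a polite, throttled single-site crawl."""
    args = [
        "wget",
        "--mirror",                    # recursive, unlimited depth, timestamping
        "--page-requisites",           # also fetch the images/CSS each page needs
        "--no-parent",                 # never climb above the starting directory
        f"--wait={wait_seconds}",      # pause between requests (the main throttle)
        "--random-wait",               # vary the pause between 0.5x and 1.5x --wait
        f"--limit-rate={rate_limit}",  # cap bandwidth, e.g. 200k = 200 KB/s
        f"--warc-file={warc_name}",    # also write <warc_name>.warc.gz (Wget 1.14+)
    ]
    if cookies_file:
        # Hypothetical: reuse a logged-in session's cookies for authenticated sites.
        args.append(f"--load-cookies={cookies_file}")
    args.append(url)
    return args


cmd = wget_archive_command("https://vle.example.ac.uk/course/", "course-archive")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

With a two-second average wait, a site of a few thousand pages takes a couple of hours rather than minutes, which is the trade-off that keeps the load on a live server modest.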

Displaying web archived material
Although we have captured several hundred websites, we haven’t yet made them all available. As with the Library of Congress Twitter archive, we’ve found that making the websites available is a significant piece of work. We’ve concentrated on one test website as a proof of concept. Our digital library developer’s approach has been to use a local copy of the Wayback Machine software to ‘play back’ a version of the website. This works pretty well, giving a reasonable representation of the original website with functioning links to content within that site. As part of the digital library work, the website has also been broken down into its constituent parts, which have been indexed and ingested into the Fedora digital library so that the digital library search can find websites alongside other content.
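Wayback-style playback addresses each capture by a 14-digit timestamp (YYYYMMDDhhmmss) followed by the original URL, and when asked for a date it serves the nearest capture it holds. A minimal Python sketch of that lookup, with hypothetical captures and a hypothetical local base URL (the path of a real local Wayback installation will vary):

```python
from datetime import datetime

TS_FORMAT = "%Y%m%d%H%M%S"  # 14-digit capture timestamp, e.g. 20130107093000


def closest_capture(requested, captures):
    """Pick the capture timestamp nearest to the requested one."""
    target = datetime.strptime(requested, TS_FORMAT)
    return min(captures, key=lambda ts: abs(datetime.strptime(ts, TS_FORMAT) - target))


def replay_url(base, timestamp, original_url):
    """Wayback-style replay address: <base>/<timestamp>/<original URL>."""
    return f"{base.rstrip('/')}/{timestamp}/{original_url}"


# Hypothetical captures of one archived VLE site.
captures = ["20110301120000", "20120915090000", "20121201000000"]
best = closest_capture("20121101000000", captures)
print(replay_url("http://localhost:8080/wayback/", best, "http://vle.example.ac.uk/course/"))
```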

Whilst the process seems to work quite well, there’s still work to do to get all the sites loaded. So while we now have a fairly well-established routine for archiving the sites, we still need to put in place routines to publish the material into the digital library. But it’s been a good piece of work, and it adds to the content we can make available through the new digital library once it goes live later this year.

Sound links

03/01/2013 in libraries, Mobiles, tablets, tools, Uncategorized | Tags: Antiques Roadshow, audio watermarking, BBC, chirp, QR code

I read a post on the BBC Internet blog today (tweeted by @psychemedia) about their new Antiques Roadshow play-along app, which got me thinking about whether sound might be an alternative to QR codes for adding links to videos. Apart from the play-along game of guessing the valuation before it is broadcast, what caught my attention was the idea of using sound to pass information. The BBC app uses ‘audio watermarking’ to send out signals during the TV programme. These signals are inaudible to listeners but can be picked up by the device’s microphone and interpreted by the app on the phone or tablet.

Obviously the BBC are able to broadcast sounds as part of their programmes in a way that most of us can’t. But with websites and videos, many of us are now effectively media producers. I’m thinking particularly of the series of short animations we’ve been producing over the last couple of years, introducing topics like ‘Avoiding plagiarism’ or ‘Evaluating information’. These animations tend to end with a link to somewhere to find more information, and we’ve started adding QR codes to this final screen to give people an easy way of following the link on a mobile device. It would be interesting to see whether we could add a sound to the end of the animation that passed on the link.

One piece of technology that does something along those lines is Chirp, a tool that ‘sings’ information from one device to another. (For an example of Chirp in action, have a look at the presentation by Thomas Cochrane from AUT at last year’s mLibraries conference.) Chirp is only available for iOS devices at the moment, but apparently they plan to offer it on other platforms eventually. It also differs from the BBC’s ‘audio watermarking’ in that the sound is audible.

It looks like a potentially useful way of providing follow-up links on videos, at least.
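To get a feel for how a link might be ‘sung’, here is a toy Python sketch that encodes text as a sequence of audible tones (one of 16 pitches per half-byte) and decodes it again with the Goertzel algorithm, which measures signal strength at a single frequency. The sample rate, tone spacing and frame length are arbitrary assumptions; this is not how Chirp or the BBC’s inaudible watermarking actually encode data.

```python
import math
import random

RATE = 8000       # samples per second (assumed)
FRAME = 800       # samples per symbol: 0.1 s, so 10 Hz frequency resolution
BASE_HZ = 1000    # pitch for symbol 0 (assumed)
STEP_HZ = 100     # spacing between the 16 pitches (assumed)


def tone_freq(symbol):
    return BASE_HZ + STEP_HZ * symbol


def encode(text):
    """Turn text into audio samples: each byte becomes two 4-bit symbols,
    and each symbol becomes one frame of a pure tone."""
    samples = []
    for byte in text.encode("utf-8"):
        for symbol in (byte >> 4, byte & 0x0F):
            f = tone_freq(symbol)
            samples.extend(math.sin(2 * math.pi * f * n / RATE) for n in range(FRAME))
    return samples


def goertzel_power(frame, freq):
    """Signal power at a single frequency (Goertzel algorithm)."""
    k = round(len(frame) * freq / RATE)
    coeff = 2 * math.cos(2 * math.pi * k / len(frame))
    s_prev = s_prev2 = 0.0
    for x in frame:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2


def decode(samples):
    """Recover text by finding the loudest of the 16 pitches in each frame."""
    symbols = []
    for start in range(0, len(samples) - FRAME + 1, FRAME):
        frame = samples[start:start + FRAME]
        powers = [goertzel_power(frame, tone_freq(s)) for s in range(16)]
        symbols.append(max(range(16), key=powers.__getitem__))
    data = bytes((hi << 4) | lo for hi, lo in zip(symbols[0::2], symbols[1::2]))
    return data.decode("utf-8")


random.seed(1)
audio = encode("example.org/help")
noisy = [x + random.uniform(-0.3, 0.3) for x in audio]  # simulate a noisy room
print(decode(noisy))
```

The samples could be written out as a WAV file with Python’s standard `wave` module and appended to the end of an animation; a real system would also need synchronisation and error correction, which this sketch leaves out.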

Creative Commons License

Privacy policy