A tweet from @aarontay referring to an infodocket article flagged up the relaunch of Microsoft Academic Search. Badged on the Microsoft Research site as a preview of Microsoft Academic, it is said to include ‘over 80 million publications’ and to make use of semantic search, and it also offers an API, the Academic Knowledge API. Trying it out briefly with a subject that’s a current hot topic, it quickly brings back good, relevant results. It sorts by relevance by default but offers a re-sort by Newest, Oldest or Most citations. Sorting by newest brings back some results from 2016, so the content appears to be up to date. Sorting by citations brings back results in citation order and also shows some form of subject tagging, although it isn’t clear how the tags are arrived at.
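Out of curiosity, here is a sketch of what calling the Academic Knowledge API might look like. It is based on the preview documentation rather than anything tested against the live service: the endpoint URL, the Ocp-Apim-Subscription-Key header and the attribute names (Ti, Y, CC) are all assumptions from that documentation, and you would need your own subscription key.

```python
import json
import urllib.parse
import urllib.request

# Assumed preview endpoint; check the Academic Knowledge API docs for
# the current URL and for how to obtain a subscription key.
ENDPOINT = "https://api.labs.cognitive.microsoft.com/academic/v1.0/evaluate"

def build_query(title_words, count=10):
    """Build an evaluate-request URL for a title-word search.

    Uses the query expression grammar from the preview documentation,
    where W matches a normalised title word; treat it as illustrative.
    """
    expr = ",".join("W=='{}'".format(w.lower()) for w in title_words)
    if len(title_words) > 1:
        expr = "And({})".format(expr)
    params = urllib.parse.urlencode({
        "expr": expr,
        "count": count,
        "attributes": "Ti,Y,CC",  # title, year, citation count
    })
    return "{}?{}".format(ENDPOINT, params)

def top_titles(response_json):
    """Pull (title, year, citation count) tuples out of a response."""
    return [(e.get("Ti"), e.get("Y"), e.get("CC", 0))
            for e in response_json.get("entities", [])]

if __name__ == "__main__":
    url = build_query(["semantic", "search"])
    req = urllib.request.Request(
        url, headers={"Ocp-Apim-Subscription-Key": "YOUR-KEY-HERE"})
    # urllib.request.urlopen(req) would send the call (needs a real key):
    # print(top_titles(json.load(urllib.request.urlopen(req))))
    print(url)
```

Sorting by citation count would then just be a matter of ordering the tuples that top_titles returns.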
Where the full text is available there are links, or links to PDFs for what are presumably open sources. There’s a fairly extensive list of facets for Year, Authors, Affiliations, Fields of Study, Journals and Conference proceedings. Some of the links go to Deepdyve.com, which offers a subscription service to academic content; others make use of DOIs, linking to sites such as the Wiley Online Library. When you filter down to an individual journal a block appears on the right with details of the journal, and you can drill down to links to the journal itself.
Results appear quickly and look relevant. What doesn’t seem to be there at the moment is the ability to configure your local ezproxy-type settings to link through to resources you can access through an institutional subscription, and I wonder whether it will allow institutions to publish their subscriptions/holdings as they can with Google Scholar. It will be interesting to see how this develops and what the long-term plans are, but it might be a useful alternative to Google Scholar.
I was intrigued to see a couple of pieces of evidence that the number of words used in scholarly searches is steadily increasing. Firstly, Anurag Acharya from Google Scholar, in a presentation at ALPSP back in September entitled “What Happens When Your Library is Worldwide & All Articles Are Easy to Find” (on YouTube), mentioned that the average query length had increased to 4-5 words and was continuing to grow. He also reported seeing multiple concepts and ideas in search queries, and noted that, unlike general Google searches, Google Scholar searches are mostly unique queries.
So I was really interested to see the publication of a set of search data from Swinburne University of Technology in Australia on Tableau Public. https://public.tableau.com/profile/justin.kelly#!/vizhome/SwinburneLibrary-Homepagesearchanalysis/SwinburneLibrary-Homepagesearchanalysis The data covers search terms entered into the search box on their library website homepage at http://www.swinburne.edu.au/library/, which pushes searches to Primo, the same approach that we’ve taken. Included amongst the searches and search volumes was a chart showing the number of words per search growing steadily, from between 3 and 4 in 2007 to over 5 in 2015: exactly the same sort of growth being seen by Google Scholar.
Across that time period we’ve seen the rise of discovery systems and new relevancy-ranking algorithms. Maybe there is now an increasing expectation that systems can cope with more complex queries, or have users learnt that systems need a more precise query? I know from feedback from our own users that they dislike the huge number of results that modern discovery systems can give them, the product of much larger underlying knowledge bases and perhaps also of more ‘sophisticated’ querying techniques. Maybe the increased number of search terms is a user reaction, an attempt to get a more refined, or simply smaller, set of results.
It’s also interesting to reflect that with discovery systems libraries have been trying to move towards ‘Google’-like search: single, simple search boxes, with relevancy ranking that surfaces the potentially most useful results at the top, because this is what users were telling us they wanted. But Google has noticed that users don’t like getting millions of results, so it increasingly seems to hide the ‘long tail’ of results. So libraries and discovery systems might be one step behind again?
So it’s an area for us to look at our own search queries, to see whether we have a similar pattern either in searches that go through the search box on the library website homepage or in searches that go into our discovery system. We’ve just got access to Primo Analytics (built on Oracle Business Intelligence), and one of its reports covers popular searches back to the start of 2015. Looking at some of the data, excluding searches that appear to be ISBN searches or single-letter searches, and then restricting it to queries that have been seen more than fifty times (which may well introduce its own bias), gives the following pattern of words in search queries:
Just under 31,000 searches, with one-word searches the most common and then a fairly steady decline the longer the query gets, apart from a spike around 8 words. The overall average is 2.4 words per query, a lot lower than the examples from Swinburne or Google Scholar. Is that because it is a smaller or incomplete set, or because it concentrates on queries seen more than 50 times? Are less frequently seen queries likely to be longer by definition? Some areas to investigate further.
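As a rough illustration of the sort of processing involved, here is a small Python sketch of the filtering and word-count calculation described above. The thresholds and the shape of the input are illustrative; the real analysis was done on the Primo Analytics report.

```python
import re
from collections import Counter

def looks_like_isbn(query):
    """True for 10- or 13-digit ISBNs (hyphens/spaces allowed)."""
    digits = re.sub(r"[-\s]", "", query)
    return bool(re.fullmatch(r"\d{9}[\dXx]|\d{13}", digits))

def word_length_profile(query_counts, min_seen=50):
    """Histogram of words-per-query plus the weighted average length.

    query_counts maps each distinct query to how often it was seen.
    Queries seen min_seen times or fewer, ISBN-like queries and
    single-letter queries are excluded, mirroring the filtering above.
    """
    hist = Counter()
    total_words = total_queries = 0
    for query, seen in query_counts.items():
        if seen <= min_seen or looks_like_isbn(query) or len(query.strip()) == 1:
            continue
        n_words = len(query.split())
        hist[n_words] += seen
        total_words += n_words * seen
        total_queries += seen
    avg = total_words / total_queries if total_queries else 0.0
    return hist, avg
```

The same profile could then be run over the website search box data for a direct comparison with the Swinburne chart.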
For a few months now we’ve been running a project to look at student needs from library search. The idea behind the research is that we know students find library search tools difficult compared with Google; we know it’s a pain point. But we don’t actually know in much detail what it is about those tools that students find difficult, which features they really want to see in a library search tool, and which they don’t. So we’ve set about trying to understand more about their needs. In this blog post I’m going to run through the approach that we are taking. (In a later blog post hopefully I can cover some detail of the things that we are learning.)
Our overall approach is to work alongside students (something that we’ve done before in our personalisation research) in a model that draws a lot of inspiration from co-design. Instead of building something and then usability testing it with students at the end, we want to involve students at a much earlier stage in the process, so that, for example, they can help to draw up the functional specification.
We’re fortunate in having a pool of 350 or so students who agreed to work with us for a few months as a student panel. That means we can invite students from the panel to take part in research or give us feedback on a number of different activities. Students don’t have to take part in any particular activity, but being part of the panel means they are generally predisposed to working with us. So we’re getting a really good take-up of our invitations: so far more than 30 students have been involved at various stages, giving us a good breadth of opinions from students studying different subjects, at different study levels, and with different skills and knowledge.
We’ve split the research into three stages: an initial stage that looked at different search scenarios and different tools; a second stage that drew some general features out of the first phase and tried them out with students; and a third phase that creates a new search tool and then undertakes an iterative cycle of develop, test, develop, test and so on. The diagram shows the sequence of the process.
The overall aim of the project is to give us a better idea of student needs to inform the decisions we make about discovery: about the search tools we might build, or how we might set up the tools we use.
As with any research activities with students we worked with our student ethics panel to design the testing sessions and get approval for the research to take place.
We identified six typical scenarios: finding an article from a reference, finding a newspaper article from a reference, searching for information on a particular subject, searching for articles on a particular topic, finding an ebook from a reference, and finding the Oxford English Dictionary. All the scenarios were drawn from activities that we ask students to do, so we used the actual subjects and references they are asked to find. We also identified eight different search tools to use in the testing: our existing One stop search, the mobile search interface we created during the MACON project, a beta search tool on our library website, four search tools from other universities, and Google Scholar. The tools offered a mix of tabbed search, radio buttons and bento-box-style search results, chosen to introduce students to different approaches to search.
Because we are a distance learning institution, students aren’t on campus, so we set up a series of online interviews. We were fortunate to be able to make use of the usability labs at our Institute of Educational Technology and used Teamviewer software for the online interviews. In total we ran 18 separate sessions, with each one testing 3 scenarios in 3 different tools. This gave us a good range of different students testing different scenarios on each of the tools.
Sessions were recorded and notes were taken so we were able to pick up on specific comments and feedback. We also measured success rate and time taken to complete the task. The features that students used were also recorded. The research allowed us to see which tools students found easiest to use, which features they liked and used, and which tools didn’t work for certain scenarios.
For the second phase we chose to concentrate on testing very specific elements of the search experience. So for example, we looked at radio buttons and drop-down lists, and whether they should be for Author/Title/Keyword or Article/Journal title/library catalogue. We also looked at the layout of results screens, and the display of facets, to ask students how they wanted to see date facets presented for example.
We wanted to carry out this research with some very plain wireframes, to test individual features without the distraction of website designs confusing the picture. We use a wireframing tool called Balsamiq to create our wireframes rapidly, and we ran through another sequence of testing, this time with a total of 9 students in a series of online interviews, again using Teamviewer.
By using wireframing you can quickly create several versions of a search box or results page and put them in front of users. It’s a good way of being able to narrow down the features that it is worth taking through to full-scale prototyping. It’s much quicker than coding the feature and once you’ve identified the features that you want your developer to build you have a ready-made wireframe to act as a guide for the layout and features that need to be created.
The last phase is our prototype-building phase: taking all the research and distilling it into a set of functional requirements for our project developer to build against. In some of our projects we’ve shared the specification with students so they can agree which features they want to see; with this project the first two phases gave us a good idea of the features they wanted in a baseline search tool, so we missed out that stage. We did, however, split the functional requirements into two parts: a baseline set of requirements for the search box and the results, and then a section to capture the iterative requirements that would arise during the prototyping stage. We aimed for a rolling cycle of build and test, although in practice we’ve set up sessions for when students are available and then gone with the latest version each time, getting students to test and refine the features and identify new features to build and test. New features are added to what is essentially a product backlog (in scrum methodology/terminology). A weekly team meeting prioritises the tasks for the developer to work on and we go through a rolling develop/test cycle.
Reflections on the process
The process seems to have worked quite well. We’ve had really good engagement from students and really good feedback that is helping us to tease out what features we need to have in any library search tool. We’re about halfway through phase three and are aiming to finish the research by the end of July. The next step is to put the search tool up as a beta on the library website so a wider group of users can trial it.
Catching up this week with some of the things from last week’s UKSG conference, I’ve been viewing some of the presentations that have been put up on YouTube at https://www.youtube.com/user/UKSGLIVE There were a few of particular interest, especially those covering the Discovery strand.
The one that really got my attention was from Simone Kortekaas from Utrecht University, talking about their decision to move away from providing discovery themselves: they have shut down their own in-house developed search system and are now looking at shutting down their WebOPAC. The presentation is embedded below
I found it interesting to work through the process they went through: realising that most users were starting their search somewhere other than the library (mainly Google Scholar), and so deciding to focus on making it easier for users to access library content through that route, instead of trying to get users to come to the library and a library search tool. It recognises that other players (i.e. the big search engines) may do discovery better than libraries.
I think I’d agree with the principle that libraries need to be where their users are. So providing holdings to Google Scholar so that the ‘find it at your library’ feature works, and providing bookmarklet tools (e.g. http://www.open.ac.uk/library/new-tools/live-tools) to help users log in, are all important things to do. But whilst Google and Bing now seem to be better at finding academic content, they still lack Google Scholar’s ‘Library links’ feature and the ability to upload your holdings that would let you offer the same form of ‘Find it at the…’ feature in those spaces. And with Google Scholar you always worry about how ‘mainstream’ it is considered to be.
It is an interesting strategic direction to take, and it means that you need to carefully monitor (as Utrecht do) trends in user activity, and in particular changes in those major search engines, to make sure that your resources can still be found through them. One consequence is that users are largely taken to publisher websites to access content, and we know that the variations between these sites can cause users some difficulty and confusion. But it’s an approach to think about as we see where the trend for discovery takes us.
For a little while I’ve been trying to find some ways of characterising the different generations or ages of library ‘search’ systems. By library ‘search’ I’ve been thinking in terms of tools to find resources in libraries (search as a locating tool) as well as the more recent trend (although online databases have been with us for a while) of search as a tool to find information.
I wanted something that I could use as a comparison that picked up on some of the features of library search but compared them with some other domain that was reasonably well known. Then I was listening to the radio the other day and there was some mention that it was the anniversary of the 45rpm single, and that made me wonder whether I could compare the generations of library search against the changes in formats in the music industry.
My attempt at mapping them across is illustrated here. There are some connections: discovery systems and streaming services like Spotify are both cloud-hosted, and early printed music scores parallel printed library catalogues such as the original British Museum library catalogue. I’m not so sure about some of the stages in between, though; certainly the direction for both has been to make library/music content more accessible. But it seemed like a worthwhile thing to think about and try out. Maybe it works, maybe not.
It was Lorcan Dempsey who, I believe, coined the term ‘full library discovery’ in a blog post last year. As a stage beyond ‘full collection discovery’, ‘full library discovery’ adds in results drawn from LibGuides or library websites, alongside resource material from collections. So, for example, a search for psychology might include psychology resources, as well as help materials for those psychology resources and contact details for the subject librarian who covers psychology. Stanford and Michigan are two examples of that approach, combining lists of relevant resources with website results.
Princeton’s new All search feature offers a similar approach, discussed in detail in their FAQ. It combines results from their Books+, Articles+, Databases, Library Website and Library Guides into a ‘bento box’ style results display. Princeton’s approach is similar to the search from North Carolina State University, who I think were among the first to come up with this style.
Although in most of these cases I suspect the underlying systems are quite different, the approach is very similar. It is a ‘loosely coupled’ approach: the search results page is drawn together in a ‘federated’ search manner by pushing your search terms to several different systems, making use of APIs, and then displaying the results in a dashboard-style layout. It has the advantage that changes to any of the underlying systems can be accommodated relatively easily, while the display to your users stays consistent.
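As a sketch of how that federated, loosely coupled approach can work, here is a minimal Python version. The source callables are hypothetical stand-ins for real backend API calls; the point is that each ‘bento box’ is fetched independently, so a slow or failing backend only empties its own box rather than breaking the whole page.

```python
import concurrent.futures as futures

def bento_search(sources, terms, timeout=5.0):
    """Query every backend in parallel, keeping each section independent.

    sources maps a section label ('Articles', 'Databases', ...) to a
    callable that takes the search terms and returns a result list; in
    a real service each callable would wrap one backend API.
    """
    boxes = {}
    with futures.ThreadPoolExecutor(max_workers=max(len(sources), 1)) as pool:
        jobs = {label: pool.submit(fn, terms) for label, fn in sources.items()}
        for label, job in jobs.items():
            try:
                boxes[label] = job.result(timeout=timeout)
            except Exception:
                boxes[label] = []  # degrade gracefully, box by box
    return boxes
```

Each box then renders independently on the results page, which is what keeps the display consistent even when one of the underlying systems changes or goes down.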
For me the disadvantages are the lack of any overarching relevancy ranking across the material, and that it perpetuates the ‘silo-ing’ of content to an extent (Books, Articles, Databases etc.), driven largely by the underlying silos of systems that we rely on to manage that content. I’ve never been entirely convinced that users understand what a ‘database’ might be. But the approach is probably as good as we can get until we reach truly unified resource management and have more control over relevancy ranking.
Going beyond ‘full library discovery’
But ‘full library discovery’ is still very much a ‘passive’ search tool, by which I mean that it isn’t personalised or ‘active’. At some stage, to use those resources, a student will log in to the system, and that opens up an important question for me: once you know who the user is, how far should you go to provide a personalised search experience? You know who they are, so you could provide recommendations based on what other students studying their course have looked at (or borrowed); you might even stray into ‘learning analytics’ territory and know which resources the highest-achieving students looked at.
You might know which resources are on the reading list for the course the student is studying: so do you search those resources first and offer them up as potentially most relevant? You might even know what stage a student has reached in their studies, what assignment they have to do, and what resources they need to be looking at. Do you ‘push’ those to the student?
How far do you go in assembling a profile of what might be ‘recommended’ for a course, module or assignment, or of what other students in the cohort are looking at, or looked at the last time the course ran? Do you look at students’ previous search behaviour? How much of this might you do to build, and then search, some form of ‘knowledge base’, with the aim of surfacing the material likely to be of most relevance to a student? A search for psychology in NCSU’s Search All box gives you the top three articles out of 2,543,911 articles in Summon, and likely behaviour is not to look much beyond the first page of results. So should we be making sure those are likely to be the most relevant ones?
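To make the trade-off concrete, here is a deliberately naive sketch of one of those ideas: re-scoring search results so that items on a student’s reading list float upwards. The boost factor and scoring are entirely illustrative, not any discovery product’s actual algorithm.

```python
def personalised_rank(results, reading_list_ids, boost=2.0):
    """Re-order search results, nudging reading-list items upwards.

    results: (item_id, relevance_score) pairs from the search engine.
    reading_list_ids: ids of items on the student's module reading list.
    The multiplicative boost is arbitrary and would need tuning; a real
    system should also cap how far personalisation can outweigh topical
    relevance, to leave some room for serendipity.
    """
    rescored = [(item, score * (boost if item in reading_list_ids else 1.0))
                for item, score in results]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)
```

Even this toy version shows the tension: the stronger the boost, the more the first page of results becomes a mirror of the reading list.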
But then there’s serendipity: finding the different things that you haven’t looked for, or read, before, because they are new or different. One of the issues with recommendations is their tendency to be circular, ‘what gets recommended gets read’, to corrupt the performance-indicator mantra. So how far do you go? ‘Mind-reading search’, anyone?
I’ve noticed recently when searching Google on an iPad that I’m seeing a different results display from the standard desktop one. The results are now split up into a set of boxes: a box at the top containing paid advertising, followed by a box with three results from the web, followed by a box with a single result from news, and so on.
In landscape orientation you also get a related-searches box on the right of the screen; turn to portrait and the related searches drop to the bottom. At the foot of the page is a Next button that takes you to more results, including images. On this second screen the related searches have dropped to the bottom and been replaced by more advertising.
Some of the boxes have a ‘More’ link, for news and images for example. By pages three and four you are into a fairly standard Google web list, though still placed in a box. I’m not sure when Google started doing this, or whether it is a feature that is just being tested for mobile devices. Not everyone seems to see it on iPads, so I’d be interested to know under what circumstances you get this approach.
It is very reminiscent of the ‘bento box’ type approach, pulling results from different places and that’s something that we’ve been trying. It’s not dissimilar to NCSU’s approach in terms of showing results from different types of content. e.g. http://www.lib.ncsu.edu/search/?q=psychology
I think I’m quite surprised to find Google looking at this route. Libraries are looking at it because it is a way to bring together results from several different systems; those systems are often the front ends of the systems used to manage different types of content, and we often struggle to join all those types of content up into one integrated search solution. Google has come to this from a very different place, in that its content is organised by Google itself in what you would presume is a consistent way, yet it still feels the need to highlight content of different types (news, videos, images) to people.
But I think the difference is in the types of things being pulled out here. You can see from NCSU that a typical list of different ‘stuff’ for libraries is Articles, Databases, Books & Media, Journals, Library website. Yet for Google it is news, videos, images, maps: essentially quite high-level format concepts. And I’m starting to think that one of the real problems for libraries is that we have put ourselves in a position where articles and journals are somehow seen as two different and separate things, when in reality one is just several of the others packaged together.
Reading through Lown, Sierra and Boyer’s article from ACRL on ‘How Users Search the Library from a Single Search Box’ based on their work at NCSU, started me thinking about looking at some data around how people are using the single search box that we have been testing at http://www.open.ac.uk/libraryservices/beta/search/.
About three months or so ago we created a prototype tool that pulls together results from the discovery product we use (EBSCO Discovery) alongside results from the resources database that feeds the Library Resources pages on the library website, plus pages from the library website itself. Each set of results is shown in a box (à la ‘bento box’) and the boxes are simply listed down the screen, with Exact Title Matches and Title Matches at the top, followed by Databases, Library Pages, Ebooks, Ejournals and then articles from EBSCO Discovery. It was done in a deliberately simple way, without lots of extra options to manipulate or refine the lists, so we could get some very early views on how useful the approach was.
Looking at the data from Google Analytics, we’ve had just over 2,000 page views over the three months. There’s a spread of more than 800 different searches; the majority are repeated fewer than 6 times, with less than 10% repeated more often than that. I’d suspect that most of those repeated terms are ones where people have been testing the tool.
The data also allows us to see when people do a search and then choose to look at more results from one of the ‘bento boxes’; effectively they do this by applying a filter to the search string, e.g. (&Filter=EBOOK) takes you to all the Ebook resources that match your original search term. So 160 of the 2,000 page views were for Ebooks (8%) and 113 for Ejournals (6%), for example.
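Pulling those filter counts out of a Google Analytics page-path export only takes a few lines of Python. The path format here is an assumption based on the &Filter=EBOOK pattern described above.

```python
from collections import Counter
from urllib.parse import parse_qs, urlparse

def filter_breakdown(page_paths):
    """Count bento-box drill-downs by the Filter query parameter.

    page_paths is an iterable of request paths as exported from Google
    Analytics, e.g. '/beta/search/?q=nursing&Filter=EBOOK'. Views with
    no Filter parameter are counted as plain searches.
    """
    counts = Counter()
    for path in page_paths:
        params = parse_qs(urlparse(path).query)
        counts[params.get("Filter", ["(none)"])[0]] += 1
    return counts
```

Dividing each count by the total page views then gives the percentages quoted above.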
When it comes to the actual search terms, they are overwhelmingly ‘subject’-type searches, with very few journal article titles or author names in the search string. There are a few more journal or database names, such as Medline or Web of Science. But otherwise there is a very wide variety of search terms being employed, and it very quickly gets down to single-figure frequencies. The wordle word cloud at the top of the page shows the range of search terms used in the last three months.
We’ve more work to do to look in more detail at what people want to do, but being able to see the search terms people use and how they filter their results is quite useful. The next steps are to do a bit more digging into Google Analytics to see what other useful data can be gleaned about what users are doing in the prototype.
I noticed an interesting Jisc-funded project at Liverpool (blogged by Jisc today) that I hadn’t previously heard about, describing a method of sharing resources amongst students using a crowdsourcing approach. The service is called Kritikos and takes several quite interesting approaches. At the heart of the system is some work done with students to identify resources relevant to their subjects (in this case Engineering) and also to identify results that weren’t relevant (often because some engineering terms have different meanings elsewhere, e.g. stress). That’s an interesting approach, as one of the criticisms I’ve heard of discovery systems is that they struggle to distinguish between terms that are used across different disciplines (‘differentiation’, for example, having separate meanings in mathematics and biology).
The search system uses a Google Custom Search Engine but then presents the results as images, which is a fascinating way of approaching this aspect. Kritikos also makes use of the Learning Registry to store data about students’ interactions with resources and whether they found them relevant. It seems a really novel approach to providing a search system, and one that could go some way to addressing one of the common comments we’ve been seeing in work we’ve been doing with students: they feel they are being deluged with too much material and struggle to find the gold nuggets that give them everything they want.
Kritikos looks to be particularly useful for students in the later stages of their degrees, where they are more likely to be doing some research or independent study. One of the things that we are finding from our work is that students at earlier stages are less interested in what other students are doing or what they might recommend. But possibly if they were presented with something like Kritikos they might be more inclined to see the value of other students’ recommendations.