I was intrigued to see a couple of pieces of evidence that the number of words used in scholarly searches is steadily increasing.  Firstly, Anurag Acharya from Google Scholar, in a presentation at ALPSP back in September entitled “What Happens When Your Library is Worldwide & All Articles Are Easy to Find” (on YouTube), mentioned that the average query length has increased to 4-5 words and is continuing to grow.  He also reported that they were seeing multiple concepts and ideas in their search queries and that, unlike general Google searches, Google Scholar searches are mostly unique queries.

Secondly, I was really interested to see the publication of a set of search data from Swinburne University of Technology in Australia up on Tableau Public:  https://public.tableau.com/profile/justin.kelly#!/vizhome/SwinburneLibrary-Homepagesearchanalysis/SwinburneLibrary-Homepagesearchanalysis  The data covers search terms entered into the search box on their library website homepage at http://www.swinburne.edu.au/library/ which pushes searches to Primo, the same approach that we’ve taken.  Included amongst the searches and search volumes was a chart showing the number of words per search growing steadily from between 3 and 4 in 2007 to over 5 in 2015, the same sort of growth being seen by Google Scholar.

Across that time period we’ve seen the rise of discovery systems and new relevancy ranking algorithms.  Maybe there is now an increasing expectation that systems can cope with more complex queries, or is it that users have learnt that systems need a more precise query?  I know from feedback from our own users that they dislike the huge number of results that modern discovery systems can give them, the product of the much larger underlying knowledge bases and perhaps also of more ‘sophisticated’ querying techniques.  Maybe the increased number of search terms is a user reaction: an attempt to get a more refined set of results, or just a smaller set of results.

It’s also interesting for me to think that with discovery systems libraries have been trying to move towards ‘Google’-like search systems – single, simple search boxes, with relevancy ranking that surfaces the potentially most useful results at the top. Because this is what users were telling us that they wanted.  But Google have noticed that users didn’t like to get millions of results, so they increasingly seem to hide the ‘long-tail’ of results.  So libraries and discovery systems might be one step behind again?

So it’s an area for us to look at in our own search queries, to see if we have a similar pattern either in the searches that go through the search box on the homepage of the library website, or in the searches that go into our Discovery system.  We’ve just got access to Primo Analytics using Oracle Business Intelligence and one of the reports covers popular searches back to the start of 2015.  So looking at some of the data, excluding searches that seem to be ISBN searches or single letter searches, and then restricting it down to queries that have been seen more than fifty times (which may well introduce its own bias) gives the following pattern of words in search queries:

Search query length - OU Primo Jan - Oct 2015 - queries seen more than 50 times

Just under 31,000 searches, with one-word searches being the most common and then a relatively straightforward sequence, with the numbers reducing the longer the search query gets.  But there is one spike around 8 words, and the overall average is 2.4 words per query, a lot lower than the examples from Swinburne or Google Scholar.  Is it because it is a smaller set or incomplete, or because it concentrates on the queries seen more than 50 times?  Are less frequently seen queries likely to be longer by definition?  Some areas to investigate further.
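
For anyone wanting to try something similar, the filtering described above could be sketched roughly as below.  This is only an illustration, not the actual Primo Analytics report or the method used for the chart: the function names, the ISBN pattern, the assumption that each query is weighted by the number of times it was seen, and the sample queries are all made up for the example.

```python
import re
from collections import Counter

# Assumed heuristic (not from the Primo report): once hyphens and spaces
# are removed, an ISBN is 10 characters (9 digits plus a digit or X)
# or 13 digits.
ISBN_PATTERN = re.compile(r"^(?:\d{9}[\dXx]|\d{13})$")


def looks_like_isbn(query):
    """Return True if the query looks like an ISBN-10 or ISBN-13."""
    compact = re.sub(r"[\s-]", "", query)
    return bool(ISBN_PATTERN.match(compact))


def word_length_distribution(query_counts, min_seen=50):
    """Map number of words -> number of searches.

    query_counts maps each query string to the number of times it was
    seen.  Queries seen min_seen times or fewer are dropped, as are
    single-character queries and queries that look like ISBNs.
    """
    distribution = Counter()
    for query, seen in query_counts.items():
        text = query.strip()
        if seen <= min_seen:
            continue
        if len(text) <= 1 or looks_like_isbn(text):
            continue
        # Weight by the number of times seen (an assumption).
        distribution[len(text.split())] += seen
    return distribution


def average_words(distribution):
    """Average words per search, weighted by number of searches."""
    total = sum(distribution.values())
    if total == 0:
        return 0.0
    return sum(words * n for words, n in distribution.items()) / total


# Invented sample data, purely to show how the functions are called.
sample = {
    "nursing": 120,
    "climate change policy": 80,
    "9780141439518": 75,   # excluded: looks like an ISBN
    "a": 90,               # excluded: single letter
    "systematic review of qualitative research methods": 10,
}

popular_only = word_length_distribution(sample)            # seen > 50 times
everything = word_length_distribution(sample, min_seen=0)  # no threshold
print(average_words(popular_only), average_words(everything))
```

Comparing the average with and without the threshold, as in the last few lines, would be one way to check whether concentrating on queries seen more than 50 times is what pulls the average down.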