The end of 2015 and the start of 2016 seem to have delivered a number of interesting reports and presentations relevant to the library technology sphere.  Ken Chad’s latest paper ‘Rethinking the Library Services Platform’ picks up on the lack of interoperability between library systems, as does the new BiblioCommons report on the public library sector, ‘Essential Digital Infrastructure for Public Libraries in England’, which comments that “In retail, digital platforms with modular design have enabled quickly-evolving omnichannel user experiences. In libraries, however, the reliance on monolithic, locally-installed library IT has deadened innovation”.

As ‘Rethinking the Library Services Platform’ notes, in many ways the term ‘platform’ doesn’t really match the reality of the current generation of library systems.  They aren’t a platform in the way an operating system such as Windows or Android is: they don’t operate in a way that lets third parties build applications to run on the platform.  Yes, they offer integration with financial, student and reference management systems, but essentially they are the traditional library management system reimagined for the cloud.  Many of the changes are a consequence of what becomes possible with a cloud-based solution, so their features are shared knowledge bases and multi-tenanted applications shared by many users, as opposed to local databases and locally installed applications.  The approach from the dwindling number of suppliers is to try to build as many products as possible to meet library needs, sometimes by developing these products in-house (e.g. Leganto) and sometimes by acquiring companies with products that can be brought into the supplier’s ecosystem.  The acquisition model is exactly the same as that practised by both traditional and new technology companies as a way of building their reach.  I’m starting to view the ‘platform’ as much more in line with the approach a company like Google takes: a broad range of products aiming to secure customer loyalty to one ecosystem rather than another.  So it may not be so surprising that technology innovation, which to my mind is largely driven by vendors delivering to what they see as library needs, and shaped by what they see as an opportunity, isn’t producing the sort of platform the term suggests.  As Ken notes, Jisc’s LMS Change work discussed back in 2012 a loosely-coupled, component-based approach to library systems that would give libraries the ability to integrate different elements, picking the best fit for their needs from a range of options.  But in my view the options have narrowed considerably since 2012/13.

The BiblioCommons report I find particularly interesting as it includes an assessment of how the format silos between print and electronic lead to a poor experience for users: in this case, how ebook access simply doesn’t integrate into OPACs, with applications such as OverDrive sitting separate from the OPAC (Buckinghamshire’s library service, with its separate ebooks platform and library catalogue, is typical).  Few if any public libraries will have invested in the class of discovery systems now common in Higher Education (and essentially being proposed in this report), but even with discovery systems the integration of ebooks isn’t as seamless as we’d want, with users ending up in a variety of different platforms, each with its own interface and restrictions on what can be done with the ebook.  In some ways, though, the public library ebook offer, which does integrate to a degree with the consumer ebook world of Kindle, is better than the HE world of ebooks, even if the integration through discovery platforms in HE is better.  What did intrigue me about the proposal in the BiblioCommons report is the plan to build some form of middleware using ‘shared data standards and APIs’, which leads me to wonder whether this could be part of the impetus for changing the way library technology interoperates.  The plan includes, in section 10.3, the proposal to ‘deliver middleware, aggregation services and an initial complement of modular applications as a foundation for the ecosystem, to provide a viable pathway from the status quo towards open alternatives’, so maybe this might start to make that sort of component-based platform and ecosystem a reality.

Discovery is the challenge that Oxford’s ‘Resource Discovery @ The University of Oxford’ report is tackling.  The report, by the consultancy Athenaeum 21, looks at discovery from the perspective of a world-leading research institution with large collections of digital content, and looks at connecting not just resources but researchers, using visualisation tools for research networks and advanced search tools such as Elasticsearch.  The recommendations include activities described as ‘Mapping the landscape of things’, ‘Mapping the landscape of people’ and ‘Supporting researchers’ established practices’.  In some ways the problems being described echo the challenge faced in public libraries of finding better ways to connect users with content, but on a different scale, and they take in other cultural sector institutions such as museums.

I also noticed a presentation from Keith Webster of Carnegie Mellon University, ‘Leading the library of the future: W(h)ither technical services?’.  This slidedeck takes you through a great summary of where academic libraries are now and the challenges they face with open access, pressure on library budgets and changes in scholarly practice.  A wide-ranging presentation, it covers the changes that led to the demise of chains like Borders and Tower Records and sets the library in the context of changing models of media consumption.  Of particular interest to me were the later slides about areas for development which, like the other reports, had improving discovery as part of the challenge.  The slides clearly articulate the need for innovation as an essential element of work in libraries (measured, for example, as a percentage of time spent compared with routine activities) and also the value of metrics around impact, something of particular interest in our current library data project.

Four different reports, across different types of libraries and cultural institutions, but all of which seem to me to be grappling with one issue: how do libraries reinvent themselves to maintain a role in the lives of their users when their traditional role is being eroded, or when challengers are out-competing them, whether by improving discovery, by changing to stay relevant, or by doing something different that users will value.

We’re in the early stages of our work with library data and I thought I’d write up some reflections so far.  Mostly we’ve confined ourselves to trying to understand the library data we have and suitable methods to access and manipulate it.  We’re interested in aggregations of the data, e.g. by week, by month, by resource, or in comparison with total student numbers.

EZproxy data
One of our main sources of data is EZproxy, which we use for both on- and off-campus access to online library resources; around 85-90% of our authenticated resource access goes through this system.  One of the first things we learnt when we started investigating this data source is that there are two levels of logfile: the full log of all resource requests, and the SPU (Starting Point URL) logfile.  The latter records only the first request to a domain in a session.
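As a concrete starting point, here’s a minimal sketch of pulling user, timestamp and URL out of an SPU log.  EZproxy log formats are set locally, so the combined-log-style pattern below is an assumption to adapt, not a drop-in:

```python
import re
from datetime import datetime

# Assumes a combined-log-style EZproxy LogFormat: %h %l %u %t "%r" %s %b
# -- formats vary by site, so adjust the pattern to the local configuration.
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<ts>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+'
)

def parse_spu_line(line):
    """Return (user, timestamp, url) for one SPU log line, or None if it doesn't parse."""
    m = LINE_RE.match(line)
    if not m:
        return None
    ts = datetime.strptime(m.group('ts'), '%d/%b/%Y:%H:%M:%S %z')
    parts = m.group('request').split()          # e.g. 'GET http://... HTTP/1.1'
    url = parts[1] if len(parts) > 1 else ''
    return m.group('user'), ts, url

with open('ezproxy_spu.log') as logfile:
    events = [e for e in (parse_spu_line(line) for line in logfile) if e]
```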

We looked at approaches that others had taken to help shape how we analysed the data.  Wollongong, for example, decided to analyse the timestamps as follows:

  • The day is divided into 144 10-minute sessions
  • If a student has an entry in the log during a 10-minute period, then 1/6 is added to the sum of that student’s access for that session (or week, in the case of the Marketing Cube).
  • Any further log entries during that student’s 10-minute period are not counted.

Using this logic, UWL measures how long students spent using its electronic resources with a reasonable degree of accuracy due to small time periods (10 minutes) being measured.

Discovering the Impact of Library Use and Student Performance, Cox and Jantti 2012 http://er.educause.edu/articles/2012/7/discovering-the-impact-of-library-use-and-student-performance
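Translated into code, the UWL logic is compact.  A minimal sketch, assuming events have already been parsed from the full log into (user, timestamp, url) tuples as above:

```python
from collections import defaultdict

def estimated_hours(events):
    """UWL-style measure: split each day into 144 ten-minute blocks; a student's
    first log entry in a block adds 1/6 of an hour, repeats in the block are ignored."""
    seen = set()
    hours = defaultdict(float)
    for user, ts, _url in events:
        block = (ts.date(), ts.hour * 6 + ts.minute // 10)   # block index 0-143
        if (user, block) not in seen:
            seen.add((user, block))
            hours[user] += 1 / 6
    return hours
```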

To adopt this approach we’d need to work from the full log files to pick up each of the 10-minute sessions.  Unfortunately, owing to the size of our full logs, that wasn’t going to be feasible; we’d have to use the SPU version and take a different approach.

Athens data
A small proportion of our resource authentication goes through OpenAthens.  Each month we get a logfile of resource accesses that have been authenticated via this route.  Unlike the EZproxy data there is no date/timestamp; all we know is that those resources were accessed at some point during the month.  Against each resource/user combination you get a count of the number of times that combination occurred during the month.

Looking into the data, one of the interesting things we’ve identified is that OpenAthens authentication is also used for things other than library resources; for example, we’re using it for library tools such as RefWorks and Library Search.  It’s straightforward to take those out if they aren’t wanted in your analysis.

So one of the things we’ve been looking at is how easy it is to combine the OpenAthens and EZproxy data.  There are similarities between the datasets, but some processing is needed to join them up: the EZproxy data can be aggregated to a monthly level, and there are a few resources we have access to via both routes, so those resource names need to be normalised.

The biggest difference between the two datasets is that whereas you get a logfile entry for each SPU access in the EZproxy dataset, you get a total per month for each user/resource combination in the OpenAthens data.  One approach we’ve tried is simply to duplicate the rows: where the count says a resource/user combination appeared twice in the month, copy the line.  That makes the two sets of data comparable, so they can be analysed together; if you wanted a headcount of users who’ve accessed one or more library resources in a month, you could include data from both EZproxy- and OpenAthens-authenticated resources.
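A sketch of that join using pandas; the filenames and column names here are hypothetical stand-ins for the real exports:

```python
import pandas as pd

ez = pd.read_csv('ezproxy_spu.csv')          # one row per SPU access: user, resource, date
ath = pd.read_csv('openathens_monthly.csv')  # one row per user/resource/month, plus a count

# Aggregate EZproxy to the month and normalise resource names used by both routes.
ez['month'] = pd.to_datetime(ez['date']).dt.to_period('M')
name_map = {'ProQuest Cent.': 'ProQuest Central'}            # illustrative mapping only
for df in (ez, ath):
    df['resource'] = df['resource'].replace(name_map)

# Expand the OpenAthens monthly counts into one row per access, so both
# datasets have the same shape and can be analysed together.
ath_rows = ath.loc[ath.index.repeat(ath['count'])].drop(columns='count')

cols = ['user', 'resource', 'month']
combined = pd.concat([ez[cols], ath_rows[cols]], ignore_index=True)

# e.g. a headcount of users with one or more accesses in each month:
monthly_users = combined.groupby('month')['user'].nunique()
```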

Numbers and counts
One thing we’ve found is that users of the data want several different counts from the library e-resources usage data.  The sorts of questions we’ve had to think about so far include:

  • What percentage of students have accessed a library resource in 2014-15? – (count of students who’ve accessed 1 or more library resources)
  • What percentage of students have accessed library resources for modules starting in 2014? – a different question to the first one as students can be studying more than one module at a time
  • How much use of library resources is made by the different Faculties?
  • How many resources have students accessed – what’s the average per student, per module, per level?

These have raised a few interesting questions, including which student number you take when calculating means: the number at the start, at the end, or part-way through?
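The choice of denominator matters more than it might appear.  With purely illustrative numbers:

```python
# Purely illustrative figures: the same usage data gives three different
# headline percentages depending on which enrolment snapshot you divide by.
active_users = 18_240          # students with 1+ e-resource accesses in the year
enrolment = {'start of year': 25_000, 'mid-year': 23_400, 'end of year': 21_900}

for snapshot, total in enrolment.items():
    print(f"{snapshot}: {active_users / total:.1%}")
# start of year: 73.0%, mid-year: 77.9%, end of year: 83.3%
```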

Next steps
In the New Year we’ve more investigation and more data to tackle and should be able to start to join library data up with data that lets us explore correlations between library use, retention and student success.

I’m not sure how many people will be familiar with the work of Oliver Postgate, and specifically his stop-motion animation series The Clangers.  One of the characters in the series is Major Clanger, an inventor.

Image: Kieran Lamb, via https://flic.kr/p/dqthAU

The character always comes to mind when I’m thinking about development approaches, as an example of a typical approach to user engagement.  The scene opens with a problem presenting itself.  Major Clanger sees the problem and thinks he has an idea to solve it, so he disappears off and shuts himself away in his cave.  Cue lots of banging and noises as Major Clanger is busy inventing a device to solve ‘the problem’.  Then comes the great unveiling of the invention, often accompanied by some bemusement from the other Clangers about what the new invention actually is, how it works and what it is supposed to do.  Often the invention turns out not to be quite what was wanted, or has unforeseen consequences.  And that approach seems to me to characterise how we often ‘do’ development.  We see a problem, we may ask users in a focus group or workshop to define their requirements, but then all too often we go off and, like Major Clanger, build the product in complete isolation and then unveil it to users in what we describe as usability testing.  And all too often they go ‘yeah, that’s not quite what we had in mind’ or ‘well, that would have been good when we were doing X but now we want something else’.

So how do we break that circle and solve our users’ problems in a development style that builds products users can and will use?  That’s where I think a more co-operative model of user engagement comes in, one where users are involved throughout the requirements, development and testing stages.  It’s an approach we’ve started to call ‘co-design’, and have piloted during our discovery research.

It starts with a Student Panel: students who agree to work with us on activities to improve library services.  We recruit cohorts of a few dozen students with a commitment to carry out several activities with us during a defined period.  We outline the activity we are going to undertake and the approach we will take, and make sure we have the necessary research/ethical approvals for the work.

For the discovery research we went through three stages:

  1. Requirements gathering – in this case testing a range of library search tools with a series of exercises based on typical user search activities.  This helped to identify the features users wanted to see, or did not want to see.  For example, at this stage we were able to rule out the ‘bento box’ results approach that has been popular at some other libraries.
  2. Feature definition – a stage that allows you to investigate specific features in detail.  In our case we used wireframes of search box options and layouts and tested them with a number of Student Panel members, ruling out tabbed search approaches and directing us towards a very simple search box without tabs or drop-downs.  This stage lets you test a range of different features without the expense of code development, essentially letting you refine your requirements in more detail.
  3. Development cycles – a sequence of build-and-test cycles, creating a search interface from scratch using the requirements identified in stages one and two, then refining it, testing specific new features and discarding or retaining them depending on user reactions.  This involved working with a developer to build the site and then work through a series of development and test ‘sprints’, testing features identified either in the early research or arising from each of the cycles.

These steps took us to a viable search interface and built up a pool of evidence that we used to set up and customise Primo Library Search.  That work led to further engagement with users as we went through a fourth stage of usability testing the interface and making further tweaks and adjustments in the light of user reactions.  Importantly, it’s an ongoing process, with a regular cycle of testing with users to continually improve the search tool.  The latest testing is mainly around changes to introduce new corporate branding, but includes other updates that can be made to the setup or the CSS of the site in advance of the new branding being applied.

The ‘co-design’ model also fits with a more evolutionary or incremental approach to website development, a model that usability experts such as Nielsen Norman Group often recommend, since users generally want a familiar design rather than a radical redesign; continuous improvement systems likewise favour incremental change.  Yet the ‘co-design’ model could equally be deployed for a complete site redesign: starting from scratch with a more radical design and structural changes, then using the incremental approach to refine them into a design that meets user needs and overcomes the likely resistance from users familiar with the old site, by delivering an improved user experience that users can quickly get comfortable with.

In the early usability tests we ran for the discovery system we implemented earlier in the year, one of the aspects we looked at was the search facets.  Included amongst the facets is a feature to let users limit their search by a date range.  That sounds reasonably straightforward: filter your results by the publication date of the resource, narrowing them down by putting in a range of dates.  But one thing that emerged during the testing is that there’s a big assumption underlying this concept.  A user tried to use the date range to restrict results to journals for the current year, and was a little baffled that the search system didn’t work as they expected.  Their expectation was that by putting in 2015 it would show them journals in that subject where we had issues for the current year.  But the system didn’t know that continuing journals, whose date ranges are open-ended, were available for 2015, because the metadata didn’t include the current year, just a start date for the subscription period.  So the system didn’t ‘know’ that the journal was available for the current year.  That exposed for me the gulf between user and library understanding, and how our metadata and systems don’t match user expectations.  So that usability testing session came to mind when reading the following blog post about linked data.

I would really like my software to tell the user if we have this specific article in a bound print volume of the Journal of Doing Things, exactly which of our location(s) that bound volume is located at, and if it’s currently checked out (from the limited collections, such as off-site storage, we allow bound journal checkout).

My software can’t answer this question, because our records are insufficient. Why? Not all of our bound volumes are recorded at all, because when we transitioned to a new ILS over a decade ago, bound volume item records somehow didn’t make it. Even for bound volumes we have — or for summary of holdings information on bib/copy records — the holdings information (what volumes/issues are contained) are entered in one big string by human catalogers. This results in output that is understandable to a human reading it (at least one who can figure out what “v.251(1984:Jan./June)-v.255:no.8(1986)” means). But while the information is theoretically input according to cataloging standards — changes in practice over the years, varying practice between libraries, human variation and error, lack of validation from the ILS to enforce the standards, and lack of clear guidance from standards in some areas, mean that the information is not recorded in a way that software can clearly and unambiguously understand it.
From https://bibwild.wordpress.com/2015/11/23/linked-data-caution/ (the Bibliographic Wilderness blog)

Processes that worked for library catalogues or librarians, in this case the holdings description v.251(1984:Jan./June)-v.255:no.8(1986), need translating before a non-librarian, or a computer, can understand what they mean.
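To make that concrete, here’s a rough sketch of the translation job: one regex handling only the simple volume(year)-volume(year) shape, with an open end read as ‘to date’, which is exactly the assumption our usability tester tripped over with the date-range facet:

```python
import re

# Handles only the simple shape 'v.NNN(YYYY...)-v.NNN:no.N(YYYY...)'; real
# holdings statements vary far too much for one pattern, which is the problem.
HOLDING_RE = re.compile(
    r'v\.(?P<v1>\d+)\((?P<y1>\d{4})[^)]*\)'                              # start
    r'(?:-(?:v\.(?P<v2>\d+)(?::no\.\d+)?\((?P<y2>\d{4})[^)]*\))?)?'      # optional end
)

def covers_year(holding, year, current_year=2015):
    m = HOLDING_RE.match(holding)
    if not m:
        return None                        # can't tell: the software gives up
    start = int(m.group('y1'))
    # A trailing '-' with no end volume means an open, continuing subscription.
    end = int(m.group('y2')) if m.group('y2') else current_year
    return start <= year <= end

print(covers_year('v.251(1984:Jan./June)-v.255:no.8(1986)', 2015))  # False
print(covers_year('v.251(1984:Jan./June)-', 2015))                  # True (open-ended)
```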

It’s a good and interesting blog post and raises some important questions about why, despite the seemingly large number of identifiers in use in the library world (or maybe because of them), it is so difficult to pull together metadata and descriptions of material and consolidate versions.  It’s a problem that surfaces across a range of our work: in discovery systems, where we end up normalising data from different systems to reduce what look to users like duplicate entries; and in usage data, where consolidating the usage of a particular article or journal becomes impossible when versions of it are available from different providers, from institutional repositories, or at different URLs.

One of the areas we started to explore with our digital archive project for www.open.ac.uk/library/digital-archive was web archiving.  The opportunity arose to start to capture course websites from our Moodle virtual learning environment from 2006 onwards.  We made use of the standard web archive format WARC and eventually settled on Wget as the tool to archive the websites from Moodle (we’d started with Heritrix but discovered that it didn’t cope with our authentication processes).  As a proof of concept we included one website in the staff version of our digital archive (the downside of archiving course materials is that they are full of copyright materials) and made use of a local instance of the Wayback Machine software from the Internet Archive [OpenWayback is the latest development].  We’ve now archived several hundred module websites and will be starting to think about how we manage access to them and what people might want to do with them (beyond the obvious one of just looking at them to see what was in those old courses).
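For a sense of the approach, something like the following does the capture; the flags are standard GNU Wget (1.14+) options, but the URL is hypothetical and the cookie/session handling an authenticated Moodle site needs is left out:

```python
import subprocess

course_url = 'https://vle.example.ac.uk/course/view.php?id=1234'   # hypothetical

subprocess.run([
    'wget',
    '--mirror',                 # recurse and honour timestamps
    '--page-requisites',        # fetch the CSS, images etc. needed to render pages
    '--no-parent',              # stay within the course site
    '--wait=1',                 # be polite to the server
    '--warc-file=course-1234',  # write course-1234.warc.gz alongside the files
    course_url,
], check=True)
```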

So I was interested to see a tweet and then a blog post about a tool called warcbase, described as ‘an open-source platform for managing web archives…’, particularly because the blog post from Ian Milligan combined web archiving with something else I remembered Tony Hirst talking and blogging about: IPython and Jupyter.  It also reminded me of a session Tony ran in the library taking us through IPython and his ‘conversations with data’ approach.

The warcbase and Jupyter approach takes the notebook method of keeping track of your explorations and scripting and applies it to web archives, exploring the archive as a researcher might.  It covers the sort of analytical work we are starting to see with the UK Web Archive data (often written up on the UK Web Archive blog).  And it got me wondering whether warcbase might be a useful technology to explore as a way of providing access to the VLE websites archive.  It also made me think about the skills that librarians (or data librarians) might need to facilitate the work of researchers who want to run tools like Jupyter across a web archive, about the technology infrastructure we might need to support this type of research, and about the implications for the permissions and access that researchers might need.  A bit of an idle thought about what we might want to think about.
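warcbase itself sits on Scala and Spark, but for a first ‘conversation with the data’ in a notebook something lighter works; a sketch using the warcio Python library against one of the module-site WARCs (filename hypothetical):

```python
from warcio.archiveiterator import ArchiveIterator

# Walk one archived module site and list the HTML pages it captured.
with open('course-1234.warc.gz', 'rb') as stream:
    for record in ArchiveIterator(stream):
        if record.rec_type != 'response':
            continue
        uri = record.rec_headers.get_header('WARC-Target-URI')
        ctype = record.http_headers.get_header('Content-Type') or ''
        if 'text/html' in ctype:
            html = record.content_stream().read()
            print(uri, len(html))
```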

Plans are worthless, but planning is everything. – Dwight D. Eisenhower

I’ve always been intrigued by the difference between ‘plans’ and ‘planning’, and was taken by this quote from President Dwight D. Eisenhower.  Speaking to the National Defense Executive Reserve Conference in 1957, he observed that an emergency never happens the way you planned for, so you throw your plans out and start again.  But, critically, planning is vital; in Eisenhower’s own words, “That is the reason it is so important to plan, to keep yourselves steeped in the character of the problem that you may one day be called upon to solve–or to help to solve.”  There’s a similar quote generally attributed to Winston Churchill (although I’ve not been able to find an actual source for it): “Plans are of little importance, but planning is essential”.

Many examples of this sort of quote come from a military background, along the lines that no plan survives contact with reality.  But I think they hold true for any project or activity: our plans will need to adapt to fit the circumstances and will, and must, change.  A plan is a document that outlines what you want to do, based on the state of your knowledge at a particular time, often before you have started the activity.  It might have some elements based on experience of doing the same or a similar thing before, where you are undertaking a repeatable activity and have a greater degree of certainty about how to do X or how long Y will take.  But that often isn’t the case.  So it’s a starting point, your best guess about the activity.  You could think of a project as a journey, with the project plan as your itinerary: you might set out with a set of times for this train or that bus, but find your train delayed or taking a different route, and so your plan changes.

So you start with your destination and a worked-out plan for how to get there, but also, and this is where planning is important, some ideas about contingencies, options or alternative routes in case things don’t work out how your plan said they should.  This is the essence of why planning matters: it’s the process of thinking about what you are going to do, considering the circumstances, the environment and the potential alternatives or contingencies in the event that something unexpected happens.

For me, I’m becoming more convinced that there’s a relationship between a project’s length and complexity and the window within which you can realistically plan in detail.  At a high level you can plan where you want to get to, what you want to achieve, and maybe how you will measure whether you’ve achieved it: you could characterise that as the destination.  But when it comes to the detail of anything involving complexity, newness or innovation, the window of certainty for a detailed project plan (the itinerary) gets shorter and shorter.  A high-level plan is valuable, but expect the detail to change.  Shorter planning horizons then become more useful, much more akin to the agile approach.

So when you look at your planned activity and resource at the start of the project and then compare them with the actual activity and resource, often you’ll find there’s a gap.  They didn’t pan out how you expected at the start; well, they probably wouldn’t, and maybe shouldn’t.  Part way into the project you know much more than when you started.  As Donald Rumsfeld put it: “Reports that say that something hasn’t happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – the ones we don’t know we don’t know. And if one looks throughout the history of our country and other free countries, it is the latter category that tend to be the difficult ones”

As you go through your project, those ‘unknown unknowns’ become known, even if in some stages and in some projects it’s akin to turning over stones only to find more stones; but along the journey you build up a better picture and build better plans for the next cycle of activity.  (And if you really need to know the difference between planned and actuals, you can use MS Project to baseline your plan and then re-baseline it to track how the plan has changed over time.)

So, we’re at the start of a new project and I thought it was a useful time to reflect on the range of tools we’re using in its early stages for collaboration and project management.  These tools cover communication, project management, task management and bibliographic management.

Project Management
For small projects we’re using the One Page Project Plan (OPPP), an Excel template from www.oppmi.com.  It uses a single Excel worksheet to cover tasks, progress, responsibility and accountability, plus some confidence measures about how the project is progressing.  We’ve used it fairly consistently for two or three years for our small projects, and people are pretty familiar not only with how to use OPPPs for projects but also with how to read and interpret them.  You can only really get about 25-30 tasks onto the OPPP, so it will be used to track activities at a relatively high level, although we can reflect both the work-package level and some tasks within each work-package.  Tasks are generally described in the past tense using words such as ‘completed’ or ‘developed’, so although it gives a reasonable overview of when activities are due to happen, there is less of a sense of the actual activities taking place in each time period.  There’s a space on the page for a description of the status, which can be used to flag up what has been completed, or any particular issues.  For bigger projects several OPPPs might be used, maybe with a high-level overarching version.

Task tracking
To organise and track the tasks in the project we’re using Trello.  This freely available tool lets you create a board for your project and then arrange your tasks (each one termed a ‘card’) into groupings.  We’ve got several phases for the project and then To Do, Doing and Done lists of tasks.  You can add people to cards, send out emails, set deadlines and so on, and easily drag cards from one list to another, create new cards and share the board with the project team.  We’re only using the free version, not the Business Class version, and it seems to work fine for us.  Trello worked pretty well for our digital library development project, particularly in terms of focusing on which developments went into which software release, so it will be interesting to see how well it works on a project that is a bit more exploratory and research-based.

Bibliographic management
Looking at what work has already been done in this area is an important part of the project, so at an early stage we’re doing a literature review.  That’s partly to understand the context we’re working in and to give credit (through citations) to ideas that have come from other work, but specifically to look at the techniques people have been using to investigate the relationship between student success, retention and library use.  We’re not expecting to find a study that exactly matches our conditions (the lack of student book loans data, for one thing), but the approaches other people have taken are important for us to understand.  We’re also hoping to write up the work for publication, so keeping track of citations for other work is vital.  To do that we’re using RefMe, and have set up a shared folder for members of the project team to add references they find.  RefMe seems to be quite good at finding full references from partial details, although there are a few we’re adding manually.  To help with retrieving the articles we’re adding the local version of the URL so we can find each article again.  The tool also lets you add notes about a reference, which can be useful.  RefMe has an enormous range of reference styles and can output in a range of formats to other tools such as Zotero, Mendeley, RefWorks or EndNote.

Communications
To keep interested parties up to date with project activities we’re using a WordPress blog; for this project the blog is at www.open.ac.uk/blogs/LibraryData.  We’re fortunate in having an institutional blog environment established using a locally hosted version of the WordPress software.  Although it isn’t generally the latest version, there’s little maintenance overhead, we can track usage through the Google Analytics plug-in, and it integrates with our authentication system, so it does the job quite well.  We’ve used blogs fairly consistently through our projects and they have the advantage of allowing the project team to get messages and updates out quickly, encouraging some commenting and interaction, and allowing both short newsy updates and more in-depth reflective or detailed pieces.  They can be a relatively informal communication channel, are easy for people to edit and update, and there’s not much administrative overhead.  Getting a header sorted out for the blog is often the thing that takes up a bit of time.

Other tools and tools for the next steps
The usual round of office tools and templates are being used for project documents, from project mandates and project initiation documents through to documentation of risks, assumptions, issues and dependencies, stakeholder plans and communications plans.  These are mainly in-house templates in MS Word or Excel.  Having established the project with an initial set of tools, attention is now turning to approaches to manage the data and the statistics.  How do we manage the large amount of data to be able to merge datasets, extract data, carry out analyses, and develop and present visualisations?  Where can we use technologies we’ve already got, or already have licences for, and where might we need other tools?

I was intrigued to see a couple of pieces of evidence that the number of words used in scholarly searches is showing a steady increase.  Firstly, Anurag Acharya from Google Scholar, in a presentation at ALPSP back in September entitled “What Happens When Your Library is Worldwide & All Articles Are Easy to Find” (on YouTube), mentioned an increase in the average query length to 4-5 words, and continuing to grow, with multiple concepts and ideas appearing in search queries.  He also noted that, unlike general Google searches, Google Scholar searches are mostly unique queries.

So I was really interested to see the publication of a set of search data from Swinburne University of Technology in Australia on Tableau Public: https://public.tableau.com/profile/justin.kelly#!/vizhome/SwinburneLibrary-Homepagesearchanalysis/SwinburneLibrary-Homepagesearchanalysis  The data covers search terms entered into the search box on their library website homepage at http://www.swinburne.edu.au/library/, which pushes searches to Primo, the same approach that we’ve taken.  Included amongst the searches and search volumes was a chart showing the number of words per search growing steadily from between 3 and 4 in 2007 to over 5 in 2015, exactly the same sort of growth being seen by Google Scholar.

Across that time period we’ve seen the rise of discovery systems and new relevancy ranking algorithms.  Maybe there is now an increasing expectation that systems can cope with more complex queries, or is it that users have learnt that systems need a more precise query?  I know from feedback from our own users that they dislike the huge number of results that modern discovery systems can give them, the product of the much larger underlying knowledge bases and perhaps also the result of more ‘sophisticated’ querying techniques.  Maybe the increased number of search terms is user reaction and an attempt to get a more refined set of results, or just a smaller set of results.

It’s also interesting for me to think that with discovery systems libraries have been trying to move towards ‘Google’-like search systems – single, simple search boxes, with relevancy ranking that surfaces the potentially most useful results at the top – because this is what users were telling us they wanted.  But Google has noticed that users don’t like getting millions of results, so it increasingly seems to hide the ‘long tail’ of results.  So libraries and discovery systems might be one step behind again?

So it’s an area for us to look at our own search queries, to see if we have a similar pattern either in the searches that go through the search box on the library website homepage, or in those that go into our discovery system.  We’ve just got access to Primo Analytics (using Oracle Business Intelligence) and one of the reports covers popular searches back to the start of 2015.  Looking at some of that data, excluding searches that seem to be ISBN searches or single-letter searches, and then restricting it to queries that have been seen more than fifty times (which may well introduce its own bias), gives the following pattern of words in search queries:

[Chart: search query length, OU Primo, Jan-Oct 2015, queries seen more than 50 times.]  Just under 31,000 searches, with one-word searches the most common and then a relatively straightforward decline the longer the query, but with a spike around 8 words and an overall average of 2.4 words per query.  That’s a lot lower than the examples from Swinburne or Google Scholar.  Is it because this is a smaller or incomplete set, or because it concentrates on the queries seen more than 50 times?  Are less frequently seen queries likely to be longer by definition?  Some areas to investigate further.
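For anyone wanting to repeat the exercise, a rough sketch of the filtering and counting; the CSV and its query/times_seen columns are hypothetical stand-ins for a Primo Analytics export:

```python
import pandas as pd

searches = pd.read_csv('primo_popular_searches.csv')   # columns: query, times_seen
searches = searches.dropna(subset=['query'])

# Drop ISBN-like strings and single-character queries, then keep the
# queries seen more than fifty times (which may introduce its own bias).
isbn_like = searches['query'].str.fullmatch(r'[\dxX \-]{10,17}')
single_char = searches['query'].str.strip().str.len() == 1
popular = searches[~isbn_like & ~single_char & (searches['times_seen'] > 50)].copy()

popular['words'] = popular['query'].str.split().str.len()

# Distribution of query lengths, and the mean weighted by how often each query ran.
distribution = popular.groupby('words')['times_seen'].sum()
mean_words = (popular['words'] * popular['times_seen']).sum() / popular['times_seen'].sum()
print(distribution)
print(f"average words per query: {mean_words:.1f}")
```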

Two interesting pieces of news came out yesterday, with the sale of 3M Library Systems to Bibliotheca (http://www.bibliotheca.com) and then the news that ProQuest is buying Ex Libris.  For an industry take on the latter, see http://www.sr.ithaka.org/blog/what-are-the-larger-implications-of-proquests-acquisition-of-exlibris/

From the comments on Twitter yesterday it was a big surprise to people, but it seems to make some sense, and it is a sector that has always gone through major shifts and consolidations: library systems vendors seem to change hands frequently.  Have a look at Marshall Breeding’s graphic of the various LMS vendors over the years to see that change is pretty much a constant feature: http://librarytechnology.org/mergers/

There are some big crossovers in the product range, especially around discovery systems and the underlying knowledge bases.  Building and maintaining those vast metadata indexes must be a significant undertaking and maybe we will see some consolidation.  Primo and Summon fed from the same knowledge base in the future maybe?

Does it help with the conundrum of getting all the metadata in all the knowledge bases?  Maybe it puts Proquest/ExLibris in a place where they have their own metadata to trade?  But maybe it also opens up another competitive front.

It will be interesting to see the medium-term impact on plans and roadmaps.  Will products start to merge?  Will there be less choice in the marketplace when libraries come round to choosing future systems?

A fascinating couple of articles over the last few days about what is happening with ebook sales (from the US): two pieces from the Stratechery site (via @lorcanD and @aarontay), ‘Disconfirming ebooks’ and ‘Are ebooks declining, or just the publishers?’.  The first refers to an article in the NY Times reporting that ebook sales are plateauing; the second draws on a more detailed piece of work from Author Earnings analysing more data, and concludes that it is less a case of ebook sales plateauing and more that the market share of the big publishers is declining (postulating that price increases might play a part).  Overall the research seems to show growth in independent and self-publishing, but fairly low levels of growth overall.  The figures are mostly about market share rather than hard and fast sales per se, but it is interesting nonetheless to see how market share is moving away from ‘traditional’ print publishers.

The Stratechery articles are particularly interesting on the way that ebooks fit the disruptive model of new digital innovation challenging traditional industries, what is termed there ‘Aggregation Theory’.  [As an aside, it’s interesting from the Author Earnings article to note that many of the new ebooks from independent or self-publishers don’t have ISBNs.  What does that imply for the longer-term tracking of this type of material?  Already I suspect that they are hard for libraries to acquire and just don’t get surfaced in the library acquisitions sphere.  Does it mean that these titles are likely to become much more ephemeral?]

The conclusion of the second Stratechery article I find particularly interesting: essentially, ebooks aren’t revolutionising the publishing industry in terms of the form they take.  They are simply a digital form of the printed item.  Often they add little extra by being digital; maybe they are easier to acquire and store, but in price terms they often aren’t much cheaper than the printed version.  Amazon Kindle does offer some extra features, but I’ve never been sure how much they are taken up by readers.  Unlike music, you aren’t seeing books disaggregated into component parts or chapters (although it’s a little ironic, considering that some of Charles Dickens’ early works, such as The Pickwick Papers, were published in instalments, as part-works).  But I’d contend that the album in music isn’t quite the same as the novel.  Music albums seem like a convenient packaging (and pricing?) of a collection of music tracks for a physical format (with the possible exception of ‘concept’ albums), whereas most readers wouldn’t want to buy their novels in parts.  There’s probably more of a correlation between albums/tracks and journals/articles, in that tracks and articles lend themselves in a digital world to being the lowest-level, consumable package of material.

But I can’t help wondering why audiobooks don’t seem to have disrupted the industry either.  Audible offers audiobooks in a similar way to Netflix, but isn’t changing the book industry in the way the TV and movie industries are being changed.  That implies to me that there’s something beyond the current ‘book’ offering (or that the ‘book’ actually is a much more consumable, durable package of content than other media).  Does a digital ‘book’ have to be something quite different, drawing on the advantages of being digital: linking to or incorporating maps, images, videos or sound, or some other form of social interaction that could never be incorporated in a physical form?  Or are disaggregated books essentially what a blog is (modularisation, as suggested on Stratechery)?  Is the hybrid digital book the game-changer?  [There are already examples of extra material being published online to support novels; see Mark Watson’s Hotel Alpha stories building on his novel Hotel Alpha, for example.]  You could see online retailers as disrupting the bookselling industry as a first step; we’re perhaps only in the early stages of seeing how Amazon will ultimately disrupt the publishing industry.  Perhaps the data from the Author Earnings report points to the signs of change in ebook publishing.
