We had a couple of interesting workshops earlier in the week, led by Tony Hirst, an academic at the OU, familiar to many from his blog at and as @psychemedia on Twitter. The workshops covered a couple of topics that I believe are increasingly critical for librarians, looking at some challenges for libraries in the realms of the Social Researcher and Digital Infoskills. Slides from his presentation can be found on SlideShare at [PS. Tony has now blogged about the sessions at]

The second workshop, on Digital Infoskills, set a series of challenges for librarians in coming to terms with the new information sources and skills that Tony argues librarians should be acquiring. His contention is that if librarians don’t grab this role then someone else will. After a short but wide-ranging series of examples, from Facebook network visualisations through to Formula 1 circuit height mapping, we were set to work on a selection of tasks from eight challenges. These illustrated the sort of activities that ‘Digital Librarians’ might need to engage with, and showed how our current skills matched up.

Tony outlined eight challenges for librarians, each with some specific questions to answer:

  • Current awareness – keeping track of articles, resources or legislation
  • Live data – where to find particular types of data and how to make sure it is up to date for use in courses
  • Linked open data – using government data
  • Text handling – manipulating data
  • Mapping – plotting latitude and longitude
  • Visualisation – visualising a social network
  • Licencing – what can I do with different types of content?
  • Analytics – how are students using resources?

As a test of our skills and abilities, it asked some difficult questions. We had some answers: in some cases we knew where to get the data but didn’t know how to manipulate or combine it, and in others we could work out how to visualise it but didn’t know where to find the data. The feeling in the workshop was that most people thought these were skills that librarians should have so that they can help students, researchers and academics.

Reflecting on the session, I think the skills and knowledge needed by a ‘Digital Librarian’ break down into four distinct areas:

  1. What data is out there?
  2. What are the tools that help to extract, manipulate, combine and present data?
  3. What do librarians need to know about rights and licensing for this type of material?
  4. What tools can be used for assessing use, quality and value of this material?

Area 1 is clearly a library role, and I’d struggle to see it as being in any way different to finding data in printed reference sources or licensed databases. Librarians should know about relevant data sources in their area. Similarly, with area 3 I’d expect librarians to be aware of copyright requirements: they may not have detailed expertise, but they would know the basics and where to get advice on more complex matters.

With area 4, although the tools (such as website analytics) are different to COUNTER statistics or loan statistics, the concept that we should identify what use is being made of our resources is familiar, even if the particular systems and techniques might not be.

Area 2 is, I feel, slightly different, and takes librarians out of their comfort zone. In general, librarians don’t have the skills to manipulate data using regular expressions, and they don’t know about tools that can extract data from Facebook (e.g. netvizz) or tools for visualisation (e.g. Many Eyes). One of the challenges in this area is that these tools are proliferating and it is difficult to keep up to date.
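To give a flavour of what manipulating data with regular expressions involves, here is a minimal Python sketch that pulls latitude/longitude pairs out of free text, the kind of step that often comes before plotting points on a map. The pattern, the function name `extract_coordinates` and the sample sentence are illustrative assumptions of mine, not anything shown at the workshop.

```python
import re

# Illustrative pattern (an assumption, not a standard): a signed decimal latitude,
# a comma, then a signed decimal longitude, e.g. "51.5074, -0.1278".
# It does not check that values fall within -90..90 and -180..180.
COORD_PATTERN = re.compile(r"(-?\d{1,2}\.\d+)\s*,\s*(-?\d{1,3}\.\d+)")


def extract_coordinates(text):
    """Return a list of (latitude, longitude) float pairs found in text."""
    return [(float(lat), float(lon)) for lat, lon in COORD_PATTERN.findall(text)]


# Made-up sample text for illustration only.
sample = "Library A is at 51.5074, -0.1278 and Library B at 52.0406,-0.7594."
print(extract_coordinates(sample))
```

Because the pattern has two capture groups, `findall` returns one tuple per match, which makes it easy to convert each pair straight into numbers.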

So how do I think librarians should react to this challenge?

  • Firstly, library managers need to understand what is involved and decide exactly what role librarians should play (and to an extent who in libraries should be doing what)
  • Secondly, librarians need to consider data stores alongside other sources of information and build up (or enhance) their knowledge of what data is out there
  • Thirdly, librarians need to start building up their knowledge and skills in data manipulation and visualisation through training and other staff development activities
  • Fourthly, librarians need to start using these data sources in responding to customers (staff and students)
  • Finally, librarians need to start to demonstrate to users that they have mastered the skills and can advise and guide them on how to exploit these data sources.

OK, so if librarians take on these new roles, where does the time come from? Well, less time spent on managing print or cataloguing, and less time spent on activities that might be better handled through shared services or by other university systems. And like any new set of skills and change of role, it is going to mean challenges for librarians. But librarians have faced constant changes in the last 20 years as the digital information revolution has taken hold, and have constantly had to acquire new knowledge about different content sources and ways of exploiting them.

Like many of the best workshops, it probably raised more questions than answers, but it certainly made more people aware of the challenges and possibilities for future librarians, and it was great to get the chance to hear Tony Hirst’s thoughts on the challenges and issues.

The rise of the datastore
The last year or so has seen the growth of a new type of resource available on the web: the “datastore”. Datastores are collections of data, generally (but not necessarily) government data, and usually authoritative. Examples include DataSF from San Francisco, Chicago City Data, the UK Government datastore, the London datastore and the Guardian newspaper’s datastore.

The defining aspect of datastores is that they provide ‘raw data’ collected together in one place, rather than spread across many different government and other websites. That raw data can cover a wide variety of subjects, from mortality statistics and Indices of Deprivation through to ‘how many miles of high-speed railway’ and FTSE100 Directors’ pay. The data is generally presented as tables, often in Excel or CSV format (or exportable in those formats).
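Because so much of this data arrives as CSV, a few lines of Python’s standard `csv` module are enough to start working with it. This is a minimal sketch: the column names and numbers in `SAMPLE_CSV` are made-up placeholders standing in for a downloaded file, not real statistics, and `load_rows` and `column_as_numbers` are hypothetical helper names.

```python
import csv
import io

# Made-up stand-in for a CSV file downloaded from a datastore.
# Column names and values are placeholders, not real statistics.
SAMPLE_CSV = """area,year,value
Area A,2009,120
Area B,2009,95
Area C,2009,143
"""


def load_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def column_as_numbers(rows, column):
    """Convert one column to floats, skipping blank cells."""
    return [float(row[column]) for row in rows if row[column].strip()]


rows = load_rows(SAMPLE_CSV)
values = column_as_numbers(rows, "value")
print(len(rows), max(values))
```

For a real download, the `io.StringIO` wrapper would be replaced by a file opened with `open(path, newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption, and some spreadsheet exports use a different one.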

Alongside the benefits of having the data collected together in one place, and in many cases of having data that has never before been made publicly available, datastores offer the potential to present and analyse the data through visualisations and data mashups, combining data from more than one source and exposing connections. There are a few examples on Tony Hirst’s blog, and below is an example of a visualization of London population made with Many Eyes.

[Image: London population visualization]

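At its simplest, a mashup means joining two tables on a shared key. The Python sketch below assumes two hypothetical tables (population and number of libraries per area, with invented figures) and performs an inner join on the area name, then derives a figure that neither source holds on its own. The names `join_on`, `population` and `libraries` are illustrative; real datasets rarely share such clean keys, so matching names or codes is usually the harder part.

```python
# Two hypothetical tables from different sources, keyed on the same area name.
# All figures are invented for illustration.
population = [
    {"area": "Area A", "population": 200000},
    {"area": "Area B", "population": 150000},
    {"area": "Area C", "population": 250000},
]
libraries = [
    {"area": "Area A", "libraries": 10},
    {"area": "Area C", "libraries": 5},
]


def join_on(left, right, key):
    """Inner join: keep rows whose key appears in both tables, merging their fields."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left if row[key] in right_by_key]


combined = join_on(population, libraries, "area")
for row in combined:
    # Derived figure: libraries per 100,000 residents.
    per_100k = row["libraries"] * 100000 / row["population"]
    print(row["area"], per_100k)
```

Area B drops out because it has no match in the second table; an outer join would keep it with a missing value instead.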
Challenges for libraries and librarians
Although datastores help with discovery by collecting relevant data together in one place, such as on , they still present some specific challenges for libraries and librarians. There are still some discovery challenges, but I would consider the biggest challenges to be around librarians getting to grips with exploiting the data within the datastores.

The discovery challenges are about finding the datastores and understanding what each one contains. But these draw on skills that librarians already use to find and assess resources, so they shouldn’t present much of a challenge. Techniques such as building Google Custom Search engines to search datastores can help with finding relevant data within these resources.

Building a custom search engine to search the London, UK Government and Guardian datastores is fairly straightforward, so I’ve built a quick example at

Using this form of search engine makes it simple to discover which datastores have datasets that may be of interest.
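A Google Custom Search engine is configured through Google’s own web interface, but a rough approximation of the same restriction can be written with the `site:` operator in an ordinary Google query. This is a minimal Python sketch: the hostnames in `DATASTORE_SITES` are my assumptions about where the three datastores live and should be checked, and Google may treat a path such as `/data` in a `site:` restriction less strictly than a configured engine would.

```python
from urllib.parse import urlencode

# Assumed locations of the London, UK Government and Guardian datastores;
# verify the current addresses before relying on them.
DATASTORE_SITES = [
    "data.london.gov.uk",
    "data.gov.uk",
    "guardian.co.uk/data",
]


def datastore_query(terms, sites=DATASTORE_SITES):
    """Combine search terms with site: restrictions joined by OR."""
    restriction = " OR ".join(f"site:{site}" for site in sites)
    return f"{terms} ({restriction})"


def search_url(terms, sites=DATASTORE_SITES):
    """Build an ordinary Google web search URL for the restricted query."""
    return "https://www.google.com/search?" + urlencode({"q": datastore_query(terms, sites)})


print(datastore_query("population"))
print(search_url("population"))
```

Keeping the site list in one place makes it easy to add another datastore, which mirrors how sites are added to a Custom Search engine’s configuration.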



Exploiting datastores
Where I think it starts to become more difficult for libraries is in exploiting the data in the datastores. There is a question here about the role of the librarian. Is the role just to find the data, check its quality and promote it to academics and students, or is there a role in helping users find ways of using the data? The latter implies a much deeper understanding of how the data can be used: not just being able to export the data to a spreadsheet and produce a nice visualization, but also knowing how to use APIs to dig into datastores and how to use tools such as Yahoo Pipes to take data and transform it. The question is how far librarians and libraries see that as their role, and how far they see their role as supporting students and academics in exploiting the data, by learning and teaching the techniques needed to understand and exploit it.
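As an illustration of using an API to dig into a datastore, the Python sketch below assumes a portal running CKAN, an open-source data catalogue whose action API includes a `package_search` call. The base URL is a placeholder on `example.org`, and `SAMPLE_RESPONSE` is a trimmed, made-up response in the CKAN shape so that the sketch runs offline; `package_search_url` and `dataset_titles` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode

# Assumption: the datastore runs CKAN. The base URL is a placeholder, not a real portal.
BASE_URL = "https://datastore.example.org/api/3/action/package_search"


def package_search_url(query, rows=10, base_url=BASE_URL):
    """Build a CKAN package_search request URL for a free-text query."""
    return base_url + "?" + urlencode({"q": query, "rows": rows})


def dataset_titles(response_text):
    """Extract dataset titles from a CKAN package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("API reported failure")
    return [package["title"] for package in payload["result"]["results"]]


# Trimmed, made-up response in the CKAN shape, so the sketch runs without a network.
SAMPLE_RESPONSE = (
    '{"success": true, "result": {"count": 2, "results": ['
    '{"name": "ds-one", "title": "Dataset one"}, '
    '{"name": "ds-two", "title": "Dataset two"}]}}'
)

print(package_search_url("deprivation", rows=5))
print(dataset_titles(SAMPLE_RESPONSE))
```

Against a live CKAN portal, the response text could be fetched with `urllib.request.urlopen(url).read().decode("utf-8")` and passed to `dataset_titles`; other datastores may expose quite different APIs.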

Obviously some librarians are more comfortable playing around with data than others, but the interest among librarians in the Mashed Library events indicates that a growing number of librarians are starting to appreciate that this is an area relevant to libraries. And over the years libraries and librarians have had to get to grips with several generations of technological innovation, from CD-ROMs, through the world wide web, to RFID, and in each case librarians have taken on new skills to exploit the new technologies and help their users.

