You are currently browsing the category archive for the ‘Linked Data’ category.

The announcement yesterday of Oslo public library's new open-source linked data discovery interface comes just a few days after JSTOR unveiled their new prototype text analyzer. JSTOR's text analyzer takes a piece of text or an image, extracts meaning from it, and then finds recommendations from JSTOR for resources that might be relevant. It's a simple process and there's a neat interface showing the results on one side, with an easy route directly to the PDF of the article. The left-hand side of the screen picks out the entities analysed from your text and gives a slider feature to let you adjust your results by giving more weight to some concepts than others.

I've not been able to find any detailed description of how it works, but it looks very much like some form of semantic search feature, with processes in place to analyse the concepts in the submitted text and match them against the index of concepts from the JSTOR database. In a lot of ways it isn't dissimilar to the DiscOU tool we used in the STELLAR project (and that development made use of semantic technologies, with entity recognition, semantic indexes and a triple store).
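JSTOR hasn't published how the analyzer works, but a concept-matching recommender of this general shape can be sketched as a weighted overlap between the concepts extracted from your text (with the slider setting each concept's weight) and the concepts indexed for each document. Everything below, from the document IDs to the weights, is invented purely for illustration.

```python
def score(doc_concepts, query_weights):
    """Weighted overlap between a document's indexed concepts and the
    user-adjusted weights for concepts extracted from the query text."""
    return sum(query_weights.get(c, 0.0) * w for c, w in doc_concepts.items())

def recommend(docs, query_weights, top_n=3):
    """Rank documents by their weighted concept overlap, best first."""
    ranked = sorted(docs.items(), key=lambda kv: score(kv[1], query_weights), reverse=True)
    return [doc_id for doc_id, _ in ranked[:top_n]]

# Illustrative data: concept weights per document, as an entity-recognition
# index might produce them.
docs = {
    "article-1": {"linked data": 0.9, "libraries": 0.4},
    "article-2": {"libraries": 0.8},
    "article-3": {"text mining": 0.7, "linked data": 0.2},
}
# Slider positions for the concepts the analyzer pulled out of the submitted text.
weights = {"linked data": 1.0, "libraries": 0.5}

print(recommend(docs, weights))
```

Moving a slider simply changes one value in `weights` and re-ranks, which matches the instant-feedback behaviour of the prototype.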

Oslo’s linked data library system is a different approach, but again with linked data at the heart of the product.  It’s open source and looks to be making use of Koha the open source library management system, so essentially acts as an add-on to that product.  It has much the same clean look and feel of some of the latest commercial discovery products, with facets on the left and the main results on the right.  It will be interesting to see how it develops.

It's particularly interesting to see this new development, and it's a contrast to the approach from Folio, who are building their open source library management system but currently seem not to be aiming to include a discovery interface. It makes me wonder about the potential of this development as the discovery interface for Folio.

In the early usability tests we ran for the discovery system we implemented earlier in the year, one of the aspects we looked at was the search facets. Included amongst the facets is a feature to let users limit their search by a date range. That sounds reasonably straightforward: filter your results by the publication date of the resource, narrowing them down by putting in a range of dates. But one thing that emerged during the testing is that there's a big assumption underlying this concept. A user tried to use the date range to restrict results to journals for the current year and was a little baffled that the search system didn't work as they expected. Their expectation was that by putting in 2015 it would show them journals in that subject where we had issues for the current year. But the system didn't 'know' that journals with continuing, open-ended runs were available for 2015, because the metadata didn't include the current year, just a start date for the subscription period. That exposed for me the gulf that exists between user and library understanding, and how our metadata and systems don't seem to match user expectations. So that usability testing session came to mind when reading the following blog post about linked data.
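The mismatch the testing exposed can be sketched in a few lines: a date filter needs to treat a holding with no end date as running up to the present, which is the step our system was missing. The record structure and field names below are invented for illustration.

```python
from datetime import date

def covers_year(holding, year):
    """True if a journal holding covers the given year.

    A holding with end=None is an open-ended subscription, so it is
    treated as running up to the current year -- the behaviour the
    user expected, and the step our system was missing.
    """
    end = holding["end"] if holding["end"] is not None else date.today().year
    return holding["start"] <= year <= end

# Illustrative holdings records (field names invented for this sketch).
journal = {"title": "Journal of Doing Things", "start": 1998, "end": None}
print(covers_year(journal, 2015))  # open-ended run: includes recent years
```

A system that only compares against dates literally present in the metadata would return False here, which is exactly what baffled the user in testing.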

I would really like my software to tell the user if we have this specific article in a bound print volume of the Journal of Doing Things, exactly which of our location(s) that bound volume is located at, and if it’s currently checked out (from the limited collections, such as off-site storage, we allow bound journal checkout).

My software can’t answer this question, because our records are insufficient. Why? Not all of our bound volumes are recorded at all, because when we transitioned to a new ILS over a decade ago, bound volume item records somehow didn’t make it. Even for bound volumes we have — or for summary of holdings information on bib/copy records — the holdings information (what volumes/issues are contained) are entered in one big string by human catalogers. This results in output that is understandable to a human reading it (at least one who can figure out what “v.251(1984:Jan./June)-v.255:no.8(1986)”  means). But while the information is theoretically input according to cataloging standards — changes in practice over the years, varying practice between libraries, human variation and error, lack of validation from the ILS to enforce the standards, and lack of clear guidance from standards in some areas, mean that the information is not recorded in a way that software can clearly and unambiguously understand it.  From the Bibliographic Wilderness blog

Processes that worked for library catalogues or librarians, i.e. in this case the description v.251(1984:Jan./June)-v.255:no.8(1986), need translating before a non-librarian or a computer can understand what they mean.
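The problem can be made concrete with a small parsing sketch. The regular expression below handles only the exact shape of the string quoted above, and that is the point: the variation in real records (changes in practice, human error, lack of validation) is precisely why a single pattern, or any fixed set of patterns, cannot reliably interpret them.

```python
import re

# Illustrative only: parses one common shape of summary-holdings string,
# e.g. "v.251(1984:Jan./June)-v.255:no.8(1986)".  Anything that deviates
# from this shape falls through to None.
HOLDING = re.compile(
    r"v\.(?P<start_vol>\d+)\((?P<start_year>\d{4})[^)]*\)"       # start volume and year
    r"-v\.(?P<end_vol>\d+)(?::no\.\d+)?\((?P<end_year>\d{4})\)"  # end volume and year
)

def parse_holding(text):
    """Return (start_vol, start_year, end_vol, end_year), or None if the
    string doesn't match the one shape this sketch understands."""
    m = HOLDING.search(text)
    if m is None:
        return None
    return tuple(int(m.group(g)) for g in
                 ("start_vol", "start_year", "end_vol", "end_year"))

print(parse_holding("v.251(1984:Jan./June)-v.255:no.8(1986)"))
```

A human cataloguer reads past the inconsistencies; the software gives up at the first one.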

It's a good and interesting blog post, and it raises some important questions about why, despite the seemingly large number of identifiers in use in the library world (or maybe because of them), it is so difficult to pull together metadata and descriptions of material to consolidate versions together. It's a problem that affects a range of the work we try to do: in discovery systems, where we end up trying to normalise data from different systems to reduce what seem to users to be duplicate entries; and in usage data, where trying to consolidate the usage of a particular article or journal becomes impossible when versions of that article are available from different providers, from institutional repositories, or from different URLs.

I just remembered to put up my slides onto slideshare from a talk I gave to a group of students about the work that we’ve been doing around linked data, particularly in relation to the STELLAR project.   STELLAR was a Jisc-funded project that finished in July.  It investigated the value of a digital library collection of old course material, carried out an enhancement using linked data technology and then evaluated the impact on perceptions of value.

The slides talk through why semantic web technologies might be important to libraries, cover a very basic outline of linked data and then concentrate on discussing what we did in STELLAR, what we found and how we’ve embedded that technology into our new digital archive.

The slides are on slideshare and embedded below.

Latest project
From February I'm going to be involved in a new project, STELLAR (Semantic Technologies Enhancing the Lifecycle of LeArning Resources), funded by JISC. In some ways the project connects with previous work I've been involved with: like the Lucero project it will be employing linked data, and it will be working with learning materials, an area I've had some involvement with through our production and presentation learning systems in the VLE. But STELLAR will be dealing with a different area for me, in that we'll be looking at my institution's store of legacy learning materials. So it's a good opportunity to learn more about curation, preservation and digital lifecycles.

STELLAR is particularly going to be looking at trying to understand the value of those legacy learning materials by talking to the academics who have been involved in creating them. There are quite a few reasons why older course materials may still have value: they might be reusable in new courses, on the basis that reusing old materials might be less costly than creating new ones; they might have value in being transformed into Open Educational Resources; or they might have value as good historical examples of styles of teaching and learning. So STELLAR will be exploring different types and models of expressing the value of those materials.

Finding out about the value that is placed on these materials can also be an important factor when trying to understand which materials to preserve as a priority, or where you should expend your resources, and we’d hope that STELLAR would help to inform HE policies as institutions build up increasing amounts of digital learning materials.

As part of STELLAR we will be taking some digital legacy learning material and transforming it into linked data (with some help from our friends in KMi). This gives us the opportunity to connect old course materials into the OU's ecosystem by linking to existing datasets on current courses and OER material in OpenLearn. By transforming the content in this way we can then explore whether making it more discoverable changes the value proposition, makes the content more likely to be reused, or opens up other possibilities. It should be an interesting project and one that I'm looking forward to, as there are going to be a lot of opportunities to build up my understanding of these issues and aspects.

Last time I heard the results of a funding bid we'd submitted I was sitting in a conference in London. It seems to be becoming a habit, as we had the results of our latest funding bid just before Christmas. This time I was sitting in a coffee bar in Yorkshire, and it was a nice surprise to hear that we'd been successful, as I wasn't expecting the results before Christmas. We'd put in the bid back in November and, all being well with clarifications on a few points, are going to be doing some work starting next month with our digital legacy learning materials and linked data. We're looking forward to getting started on STELLAR.

Harvard Library Innovation Laboratory

The second aspect of data that caught my interest today was Harvard's Library Innovation Laboratory. I must admit that when I saw the link I did wonder whether it was going to be a list of library tools aimed directly at users (I'm sure I've seen the name used elsewhere recently for just such a list). I know we are looking at redoing our library toolbox to update it, and 'library lab' or 'labspace' sounded like a good name for something like that. But the Library Innovation Laboratory is a much more interesting proof of concept for anyone with any interest in what you can do with library activity data.

Using library circulation data that has been contributed to LibraryCloud, there are some really imaginative prototype visualisations in the Stack View and Shelf Rank tools. Two values are shown instantly: the book width is determined by the number of pages in the book, and the book colour corresponds to the volume of loans, so the darker the blue the greater the traffic. Titles are then shown as a stack, one on top of another. It's a really neat visualisation of the data, and I'm already wondering if that approach would work equally well for visualising library data that is entirely electronic resources. [It's actually one of the big problems with anything to do with electronic resources: there isn't really a universal icon or symbol that everyone recognises as relating to material that is online and in electronic form.]
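Stack View's two visual encodings (width from page count, shade of blue from loan volume) can be sketched as a simple mapping from data to drawing attributes. The pixel arithmetic and colour scale below are invented for illustration; the actual tool's rendering rules aren't documented here.

```python
def book_bar(pages, loans, max_loans):
    """Map one title to the two visual channels Stack View uses:
    bar width from page count, depth of blue from loan volume."""
    width_px = max(20, pages // 5)          # thicker books draw wider bars
    shade = loans / max_loans if max_loans else 0.0
    blue = int(255 - 155 * shade)           # more loans -> smaller value -> darker blue
    return {"width": width_px, "colour": f"rgb(0, 0, {blue})"}

# A well-thumbed 400-page title versus an untouched slim one.
print(book_bar(pages=400, loans=90, max_loans=100))
print(book_bar(pages=100, loans=0, max_loans=100))
```

The appeal of the visualisation is that both values are read at a glance, which is also why it is hard to replicate for e-resources, where 'width' has no physical analogue.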

There's quite a lot of interesting stuff on the site, and also on the LibraryCloud site. One of the things that particularly interested me (from experiences with the RISE Activity Data project) was the section about data privacy and anonymisation, as a key requirement always has to be that any dataset intended for open release is prepared in a way that ensures individual users cannot be identified.
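That kind of preparation can be sketched, under invented field names, as two steps: replacing patron IDs with salted hashes, and suppressing records from small groups before release (a basic k-anonymity-style safeguard). This is not how LibraryCloud or RISE actually do it, just an illustration of the idea.

```python
import hashlib
from collections import Counter

SALT = "local-secret-salt"  # kept private, so hashes can't be rebuilt from a patron list

def pseudonymise(user_id):
    """Replace a patron ID with a salted, truncated hash."""
    return hashlib.sha256((SALT + user_id).encode()).hexdigest()[:12]

def release(records, k=3):
    """Drop records whose (school, year) group has fewer than k members:
    a small group plus background knowledge can identify an individual."""
    groups = Counter((r["school"], r["year"]) for r in records)
    return [
        {"user": pseudonymise(r["user_id"]), "school": r["school"], "year": r["year"]}
        for r in records
        if groups[(r["school"], r["year"])] >= k
    ]

sample = [{"user_id": "patron-42", "school": "Arts", "year": 2011}]
print(release(sample))  # a group of one is suppressed: nothing is released
```

The suppression step is the part that is easy to forget: hashing alone doesn't help if a released record describes a group of one.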

The checkout visualisation is also a neat way of showing that sort of data in a nice clear fashion. The feature that lets you sort the data by different schools is useful, and slightly brings to mind one of the MOSAIC competition entries that used a graph-type visualisation to let you navigate through library use data. It did amuse me, though, that 'Headphones' appears twice in the top ten with different numbers. The perils of libraries using their Library Management Systems to loan all sorts of other things!

LibraryCloud currently has data from Harvard and Northeastern Universities and Darien, San Francisco and San Jose public libraries.  A couple of sites to keep an eye on over the next few months.

Courtesy of a couple of tweets from @psychemedia and @simonjbains, two items about data and data visualisation caught my attention on twitter today. Firstly, a great post by Pete Warden, 'What the Sumerians can teach us about data', on his PeteSearch blog; and secondly, Harvard's Library Innovation Laboratory. Both cover particular aspects of data: one the history of data, the other a great set of examples of how to use and visualise data, in Harvard's case library circulation data using the LibraryCloud library metadata repository.

A history of data
I found the blog post on the Sumerians particularly interesting. The starting point is the contention that their greatest achievement was the invention of data, and there are some good examples of how the written language was used to record who owned what (or who owed what to whom). I like the comparison made between the 'threats of supernatural retribution' used to protect the integrity of the data and modern warnings over video copying, both being 'ways of forcefully expressing society's norms, rather than a credible threat of punishment'.

I find it interesting how often early examples of writing turn out to be lists, in other words data rather than stories. Another example that comes to mind is the Vindolanda tablets, from the Roman period, found during excavations at a Roman fort in Northern England.

“… for dining pair(s) of blankets … paenulae, white (?) … from an outfit: paenulae … and a laena and a (?) … for dining loose robe(s) … under-paenula(e) … vests … from Tranquillus
under-paenula(e) … [[from Tranquillus]]
from Brocchus tunics … half-belted (?) … tunics for dining (?) … (Back, 2nd hand?) … branches (?), number … a vase …
with a handle rings with stones (?) …”
Writing lists of things seems to be a recurrent theme, and it strikes me that being able to list and count things must have been an early skill that early farmers at least would have had to master. To my mind there's no reason to suppose that early peoples were any less intelligent than modern-day people. And, as the archaeologists are fond of pointing out, 'absence of evidence isn't evidence of absence', so perhaps people were collecting lists of data long before the Sumerians.

I also thought the comments comparing instructions for interpreting omens with predicting the future from data were really interesting. A great deal is often made of the importance of 'facts and data', and it has long seemed to me that the critical factor isn't the data that you have, but how you interpret it and what decisions you make. And it often seems to me that the interpretation of data and decision-making is a much less scientific exercise.

Part two covering the Harvard Library Innovation Laboratory to follow in the next blog post.

I spent a couple of days this last week in Reading at the Institutional Web Managers Workshop run by UKOLN. Videos from the event and a lot more can be found on the event blog. It was an interesting couple of days, and it was fascinating to compare HE institutional web management with my previous experiences of local government institutional web management. The chill wind of budget cutbacks is being felt in the HE sector, although perhaps not yet as severely as had been feared. The need to justify (and explain) the difference the web team makes in hard cash terms is certainly something that local government has had to address for some time. So it was good to see one of the first presenters (Ranjit Sidhu) tackle how to present data about website value in a way that plays directly to that need: rather than showing visits, visitors or page views, showing usage in cost terms by segmenting use by location of visitor, concentrating on conversion rates and turning them into cash values. A much more powerful approach.
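The shift from page views to cash terms is simple arithmetic once traffic is segmented and a conversion rate and value per conversion can be attached to each segment. The figures below are invented purely for illustration; the point is the framing, not the numbers.

```python
def segment_value(visits, conversion_rate, value_per_conversion):
    """Estimated cash value of one visitor segment:
    visits x conversion rate x value of each conversion."""
    return visits * conversion_rate * value_per_conversion

# Invented figures: overseas prospectus visitors converting to applications,
# each application worth a notional amount in eventual fee income.
overseas = segment_value(visits=12_000, conversion_rate=0.02, value_per_conversion=9_000)
print(f"Estimated value of overseas segment: £{overseas:,.0f}")
```

"Our overseas pages generated an estimated £2m in applications" lands very differently with budget holders than "our overseas pages had 12,000 visits".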

Although a lot of the sessions were aimed directly at institutional web managers, there were quite a few that were of value to me. The parallel sessions I went to, one on legal aspects from JISC Legal and one on Linked and Open Data at Southampton, were particularly useful and interesting. Probably unsurprisingly, the legal issue on most people's minds was the new EU 'cookies' legislation. For a piece of legislation that we need to be complying with by next spring, there's still some way to go to sort out exactly what is needed and get it in place.

The Linked Data session from Chris Gutteridge of Southampton was also really useful. It was great to look at the range of data that they are making available and the approach they are taking to do the work. Using Google Docs as a simple way of getting end users to update their data is a good approach, and I particularly liked that they were keen to show the value of the data by getting quotes from data providers about the benefits. The catering data that they've done a lot of work with shows the value really well, and the map display is a really good way of providing access to the data. You can see Southampton's data on their open data site.

I've been to a couple of presentations in the last month about the Lucero linked data project, a JISC-funded project run by the OU's Knowledge Media Institute that has been working to publish a fairly wide range of university material as linked data. One presentation, by the Project Director, Mathieu d'Aquin, covered the wider project aspects for a university-wide audience; the other, by the Project Manager, Owen Stephens, was for a library audience.

It's a project I've been fortunate enough to have some involvement with, and it has some impressive achievements for a short project: establishing the first university-wide linked data repository; releasing a range of different datasets, from institutional repositories to course data; and, not least, going some way to getting the concepts of linked data out of the laboratory and into an area where they can start to be discussed as a practical technology.

Linked data
For anyone who isn't familiar with Linked Data, it's described by its proposer Tim Berners-Lee on his website thus:

‘The Semantic Web isn’t just about putting data on the web. It is about making links, so that a person or machine can explore the web of data.  With linked data, when you have some of it, you can find other, related, data’ 

[If you are interested in finding out more about Linked Data, there are plenty of introductory resources online worth exploring.]

I always find it interesting with new technologies how people describe them to other people. Mathieu described it as essentially publishing a raw database onto the web as RDF, with the data addressable using URIs, and talked of creating 'a very big distributed dataspace'. That's certainly something that is well illustrated by the 'traditional' linked data cloud image (without which no linked data presentation is complete). From more of a library perspective, Owen used the example of Charlotte Brontë as the creator of Jane Eyre as an illustration of the subject, predicate and object 'triple'.
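Owen's example can be written down directly: a triple is just a subject, predicate and object, each ideally identified by a URI so that machines can follow the links. The example.org URIs below are invented for illustration (though dcterms:creator is a real Dublin Core property a library might actually use).

```python
# A triple is just (subject, predicate, object).
triple = (
    "http://example.org/work/jane-eyre",           # subject: the work
    "http://purl.org/dc/terms/creator",            # predicate: Dublin Core 'creator'
    "http://example.org/person/charlotte-bronte",  # object: the author
)

subject, predicate, obj = triple
# An N-Triples-style serialisation of the statement.
print(f"<{subject}> <{predicate}> <{obj}> .")
```

Because the object is itself a URI, it can in turn be the subject of further triples (dates, other works, places), which is the 'when you have some of it, you can find other, related, data' idea in the quote above.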

Libraries and Linked Data
What has been particularly interesting from a library point of view is the way that linked data allows systems to extract data in new ways. For example, publishing course materials in RDF format has allowed queries to be created that make it possible to list all courses available in a particular country, something you can't easily do from current websites. And you start to see all kinds of possibilities for libraries and search systems: you are potentially less constrained by having to decide in advance what type of queries users can make of your data. I was interested in a comment made by Mathieu that the art of exploiting linked data is to build many small applications rather than a few big ones.
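That 'all courses in a particular country' query is, underneath, a triple-pattern match. In practice it would be a SPARQL query against an RDF triple store; the stdlib sketch below, with invented URIs and predicates, just shows the shape of the idea: you don't design the query in advance, you match a pattern over whatever triples exist.

```python
# Toy triple store: each entry is (subject, predicate, object).
# URIs and predicates are invented for illustration; the real data
# would be queried with SPARQL against an RDF triple store.
triples = [
    ("course/A101", "availableIn", "country/France"),
    ("course/A101", "title", "Introduction aux arts"),
    ("course/B202", "availableIn", "country/Spain"),
    ("course/C303", "availableIn", "country/France"),
]

def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard,
    like a variable in a SPARQL basic graph pattern."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "All courses available in France" -- a question the course website
# alone can't easily answer.
courses_in_france = [s for s, _, _ in match(triples, p="availableIn", o="country/France")]
print(courses_in_france)
```

The same store answers "everything we know about course A101" with `match(triples, s="course/A101")`, without anyone having anticipated either question when the data was published.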

Also last month there was the news that the Archives Hub, through the LOCAH project, has released some of its content as linked data as a proof of concept. So it seems to me that we are at an early stage for libraries in thinking about how Linked Data can be of use. Certainly one of the things we have to think about is whether we need to start to change our cataloguing practice. It's clear that the way we catalogue isn't ideal if we want to convert our catalogue data to Linked Data.

The process of deciding how to express your data as Linked Data is quite a time-consuming one, and very much an on-the-fly activity. Which I think is where libraries may start to feel a bit uncomfortable, without the safety net of some clear frameworks.

I think we've a way to go before this type of activity starts to be commonplace, and maybe we need some tools that help us to present our resources as Linked Data more easily. The obvious analogy is the early days of the web, when the first websites we built were in raw HTML. But it wasn't long before tools came along, such as FrontPage and Dreamweaver, that meant you could build sites without knowing too much HTML.

But I still think there is massive potential within the Linked Data world, and libraries need to engage with it and start to build prototypes that can show the benefits. Certainly I'm hopeful that we'll have the chance to do some further work in this area with our Digital Library.
