You are currently browsing the monthly archive for November 2015.

SunsetIn the early usability tests we ran for the discovery system we implemented earlier in the year one of the aspects we looked at were the search facets.   Included amongst the facets is a feature to let users limit their search by a date range.  So that sounds reasonably straight-forward, filter your results by the publication date of the resource, narrowing your results down by putting in a range of dates.  But one thing that emerged during the testing is that there’s a big assumption underlying this concept.  During the testing a user tried to use the date range to restrict results to journals for the current year and was a little baffled why the search system didn’t work as they expected.  Their expectation was that by putting in 2015 it would show them journals in that subject where we had issues for the current year.  But the system didn’t know that issues that were continuing and therefore had a date range that was open-ended were available for 2015 as the metadata didn’t include the current year, just a start date for the subscription period.  So consequently the system didn’t ‘know’ that the journal was available for the current year.  And that exposed for me the gulf that exists between user and library understanding and how our metadata and systems don’t seem to match user expectations.  So that usability testing session came to mind when reading the following blog post about linked data.

I would really like my software to tell the user if we have this specific article in a bound print volume of the Journal of Doing Things, exactly which of our location(s) that bound volume is located at, and if it’s currently checked out (from the limited collections, such as off-site storage, we allow bound journal checkout).

My software can’t answer this question, because our records are insufficient. Why? Not all of our bound volumes are recorded at all, because when we transitioned to a new ILS over a decade ago, bound volume item records somehow didn’t make it. Even for bound volumes we have — or for summary of holdings information on bib/copy records — the holdings information (what volumes/issues are contained) are entered in one big string by human catalogers. This results in output that is understandable to a human reading it (at least one who can figure out what “v.251(1984:Jan./June)-v.255:no.8(1986)”  means). But while the information is theoretically input according to cataloging standards — changes in practice over the years, varying practice between libraries, human variation and error, lack of validation from the ILS to enforce the standards, and lack of clear guidance from standards in some areas, mean that the information is not recorded in a way that software can clearly and unambiguously understand it.  From the Bibliographic Wilderness blog

Processes that worked for library catalogues or librarians i.e. in this case the description v.251(1984:Jan./June)-v.255:no.8(1986) need translating for a non-librarian or a computer to understand what they mean.

It’s a good and interesting blog post and raises some important questions about why, despite the seemingly large number of identifiers in use in the library world (or maybe because) it is so difficult to pull together metadata and descriptions of material to consolidate versions together.   It’s an issue that causes issues across a range of work we try to do, from discovery systems, where we end up trying to normalise data from different systems to reduce the number of what seem to users to be duplicate entries to work around usage data, where trying to consolidate usage data of a particular article or journal becomes impossible where versions of that article are available from different providers, or from institutional repositories or from different URLs.

Photograph of grass in sunlightOne of the areas we started to explore with our digital archive project for was web archiving.  The opportunity arose to start to capture course websites from our Moodle Virtual Learning environment from 2006 onwards.   We made use of the standard web archive format WARC and eventually settled on Wget as the tool to archive the websites from moodle, (we’d started with using Heritrix but discovered that it didn’t cope with our authentication processes).  As a proof of concept we included one website in our staff version of our digital archive (the downside of archiving course materials is that they are full of copyright materials) and made use of a local instance of the Wayback machine software from the Internet Archive.  [OpenWayback is the latest development].   So we’ve now archived several hundred module websites and will be starting to think about how we manage access to them and what people might want to do with them (beyond the obvious one of just looking at them to see what was in those old courses).

So I was interested to see a tweet and then a blog post about a tool called warcbase – described as ‘an open-source platform for managing web archives…’ but particularly because the blog post from Ian Milligan combined web archiving with something else that I’d remembered Tony Hirst talking and blogging about, IPython and Jupyter. It also reminded me of a session Tony ran in the library taking us through ipython and his ‘conversations with data’ approach.

The warcbase and jupyter approach takes the notebook method of keeping track of your explorations and scripting and applies it to the area of web archives to explore the web archive as a researcher might.  So it covers the sort of analytical work that we are starting to see with the UK Web Archive data (often written up on the UK Web Archive blog).   And it got me starting to wonder both about whether warcbase might be a useful technology to explore as a way of thinking about how we might develop a method of providing access to the VLE websites archive.  But it also made me think about what the implications might be of the skills that librarians (or data librarians) might need to have to facilitate the work of researchers who might want to run tools like jupyter across a web archive, and about the technology infrastructure that we might need to facilitate this type of research, and also about what the implications are for the permissions and access that researchers might need to explore the web archive.  A bit of an idle thought about what we might want to think about.

Plans are worthless, but planning is everything. Dwight D. Eisenhower

I’ve always been intrigued about the differences between ‘plans’ and ‘planning’ and was taken by this quote from President Dwight D. Eisenhower.  Talking to the National Defense Executive Reserve Conference in 1957 and talking about how when you are planning for an emergency it isn’t going to happen in the way you are planning, so you throw your plans out and start again.  But, critically, planning is vital, in Eisenhower’s own words “That is the reason it is so important to plan, to keep yourselves steeped in the character of the problem that you may one day be called upon to solve–or to help to solve.”  There’s a similar quote generally attributed to Winston Churchill (although I’ve not been able to find an actual source for it)   “Plans are of little importance, but planning is essential”

Bird flocks and sunsetMany of the examples of these sort of quotes seem to come from a military background, along the lines that no plan will survive contact with reality.  But the examples I think also hold true for any project or activity.  Our plans will need to adapt to fit the circumstances and will, and must, change.  Whereas a plan is a document that outlines what you want to do, it is based on the state of your knowledge at a particular time, often before you have started the activity.  It might have some elements based on experience of doing the same thing before, or doing a similar thing before, so you are undertaking some repeatable activity and will have a greater degree of certainty about how to do X or how long Y will take to do.  But that often isn’t the case.  So it’s a starting point, your best guess about the activity.  And you could think about a project as a journey, with the project plan as your itinerary.  You might set out with a set of times for this train or that bus, but you might find your train being delayed or taking a different route and so your plan changes.

So you may start with your destination, and a worked out plan about how to get there.  But, and this is where planning is important, some ideas about contingencies or options or alternative routes in case things don’t quite work out how your plan said they should.  And this is the essence of why planning is important in that it’s about the process of thinking about what you are going to do in the activity.  You can think about the circumstances, the environment and the potential alternatives or contingencies in the event that something unexpected happens.

For me, I’m becoming more convinced that there’s a relationship around project length and complexity and a window/level at which you can realistically plan in terms of level of detail and how far in advance you can go.  At a high level you can plan where you want to get to, what you want to achieve and maybe how you measure whether you’ve achieved what you want to – so, you could characterise that as the destination.  But when it comes to the detail of anything that involves any level of complexity, newness or innovation, then the window of being able to plan a detailed project plan (the itinery) starts have a shorter and shorter window of certainty.  A high-level plan is valuable, but expect that the detail will change.  But then shorter time periods of planning seem to be more useful – becoming much more akin to the agile approach.

So when you’re looking at your planned activity and resource at the start of the project and then comparing it with the actual resource and activity then often you’ll find there’s a gap.  They didn’t pan out how you expected at the start, well, they probably wouldn’t and maybe shouldn’t.  Part way into the project you know much more than when you started, as Donald Rumsfeld put it “Reports that say that something hasn’t happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – the ones we don’t know we don’t know. And if one looks throughout the history of our country and other free countries, it is the latter category that tend to be the difficult ones”

As you go through your project, those ‘unknown unknowns’ become known, even if at some stages and in some projects it’s akin to turning over stones to find more stones, and so on, but on your journey you build up a better picture and build better plans for the next cycle of activity.  (And if you really need to know the differences between Planned and Actuals you can use MS Project and can baseline your plan and then re-baseline it to track how the plan has changed over time).

Twitter posts



November 2015
« Oct   Dec »

Creative Commons License