
I was intrigued to read David Weinberger’s blog post ‘Protecting library privacy with a hard opt-in’. In it he suggests that there is a case to be made for asking users to explicitly opt in to publishing details of their checkouts (loans) before you can use that activity data. I must admit that I’d completely missed the connection between David Weinberger, author of ‘Everything is miscellaneous’, and his role with the Harvard Innovation Lab, although I’m sure I’ve probably blogged about both in the past.

The concern that has been raised is about re-identification, where supposedly ‘anonymous’ datasets can be combined with other data to identify individuals. There’s a good description of the issue in a 2008 paper by Michael Hay and others at the University of Massachusetts: http://scholarworks.umass.edu/cgi/viewcontent.cgi?article=1176&context=cs_faculty_pubs

Obviously an issue of this type is of critical significance when you are talking about medical trials data, for example, but library data might also be personal or sensitive. Aside from the personal aspects, you could also imagine that a researcher carrying out a literature search for a potential new research area would not want ‘competitors’ to know that they were looking at that area, particularly now that cross-domain research activities are more common.

The question of anonymity, and of whether an individual can be identified from their activity data, has been explored through a number of projects, such as Jisc’s Activity Data programme and its synthesis project outputs (http://www.activitydata.org), particularly in the section on data protection. Most of the approaches tackled anonymization in two ways: by replacing user IDs with a generated ID (described interestingly by Hay as ‘naive anonymization’), and by removing data from the dataset where only small numbers of users were included (such as a course with only a few students enrolled).
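To make those two steps concrete, here is a minimal sketch in Python. It is my own illustration rather than code from any of the Activity Data projects, and the field names (`user_id`, `course`, `item`) and the `min_group_size` threshold are assumptions chosen for the example.

```python
import secrets
from collections import defaultdict


def anonymize(records, min_group_size=5):
    """Sketch of 'naive anonymization' plus small-group suppression.

    `records` is a list of dicts with hypothetical keys
    'user_id', 'course' and 'item'.
    """
    # Count the distinct users seen in each course.
    users_per_course = defaultdict(set)
    for record in records:
        users_per_course[record["course"]].add(record["user_id"])

    pseudonyms = {}  # real user ID -> generated ID
    released = []
    for record in records:
        # Step 2: drop rows from courses with too few users.
        if len(users_per_course[record["course"]]) < min_group_size:
            continue
        # Step 1: swap the real ID for a random generated one,
        # reusing the same generated ID for the same user.
        if record["user_id"] not in pseudonyms:
            pseudonyms[record["user_id"]] = secrets.token_hex(8)
        released.append({
            "user": pseudonyms[record["user_id"]],
            "course": record["course"],
            "item": record["item"],
        })
    return released


sample = [
    {"user_id": "u1", "course": "A101", "item": "book1"},
    {"user_id": "u2", "course": "A101", "item": "book2"},
    {"user_id": "u1", "course": "A101", "item": "book3"},
    {"user_id": "u3", "course": "B202", "item": "book4"},
]
released = anonymize(sample, min_group_size=2)
```

With this sample data the single-student course B202 is dropped entirely, and the two A101 students keep consistent but meaningless IDs. Note that each person’s pattern of use is still intact in the released rows, which is exactly why Hay calls this approach ‘naive’.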

Re-identification techniques seem to work by finding unique patterns of use, called digital fingerprints, that can be used to identify individuals. When you combine data from an anonymized dataset with other material, you can start to match those patterns to named people. It certainly seems to be something that needs careful thought when contemplating releasing datasets.
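As a rough illustration of how such matching could work, the Python sketch below links generated IDs back to named people using a little outside knowledge about what they borrowed. This is a deliberately simplified toy of my own, not the method from the paper mentioned earlier, and the function name, the data and the people in it are all made up.

```python
def reidentify(anonymized_loans, known_items_by_person):
    """Toy linkage attack on an 'anonymized' loans dataset.

    `anonymized_loans` is a list of (generated_id, item) pairs.
    `known_items_by_person` maps a real name to a set of items an
    outsider already knows that person borrowed.
    """
    # Build a fingerprint for each generated ID: the set of items it borrowed.
    fingerprints = {}
    for generated_id, item in anonymized_loans:
        fingerprints.setdefault(generated_id, set()).add(item)

    matches = {}
    for person, known_items in known_items_by_person.items():
        # A generated ID is a candidate if it borrowed everything we know about.
        candidates = [
            generated_id
            for generated_id, items in fingerprints.items()
            if known_items <= items
        ]
        # Only a single candidate counts as a re-identification.
        if len(candidates) == 1:
            matches[person] = candidates[0]
    return matches


loans = [
    ("a1", "Book A"), ("a1", "Book B"), ("a1", "Book C"),
    ("b2", "Book A"), ("b2", "Book D"),
]
outside_knowledge = {
    "Alice": {"Book B", "Book C"},  # unusual combination
    "Bob": {"Book A"},              # popular item, matches several IDs
}
matches = reidentify(loans, outside_knowledge)
```

Here ‘Alice’ is matched to the generated ID `a1` because only one borrower has that combination of items, while ‘Bob’ stays hidden because his one known loan is shared by more than one ID. The more distinctive someone’s borrowing, the less protection a generated ID gives them.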

Is the suggested solution of asking for explicit permission the right approach? If you are planning to release data openly, I’d probably agree. If you plan to use it only within your own systems to generate recommendations, then yes, it’s probably still good practice. I do worry slightly about what happens if there is a low opt-in level, as that may significantly diminish the value and usefulness of the activity data.

I’m not too convinced, though, about the approach where users agree to a public page that lists their activity. That would seem to me to put off people who would otherwise be happy for their data to be used unattributed in recommendations. When we’ve asked students what data we should be able to use, they were quite happy for activity data to be used. My view is that it’s fine to show an individual what they have used (and we do that), but that it’s not something to share publicly.


