How to deal with sensitive individual data in open science?

+16 votes
548 views
asked Aug 5, 2015 in Open Science by Guilherme Kenji Chih (80 points)

I work with governmental register data, which include sensitive information about individuals, and therefore I cannot make datasets available to the public without risking a violation of my subjects' privacy. Even though the data are anonymised, they contain information such as addresses and dates of medical prescriptions that should not be made public. I myself need to work inside a secure computer lab and don't have free access to these data. What are the possibilities for open science in this case?

Edit: My terms of use prohibit any sharing of the dataset with others not authorised to use the data. In addition, I am not allowed to transfer the data off the computer where they are supposed to be analysed.



This post has been migrated from the Open Science private beta at StackExchange (A51.SE)
commented Aug 18, 2015 by Guilherme Kenji Chih (80 points)
That seems useful, but storing the dataset in a VCS repository is out of the question. I don't have control over how it is stored in the main database, only over the files that are delivered to me. In addition, I have agreed not to store the data extractions that I receive anywhere other than the computer where I'm supposed to analyse them. The terms of use prohibit sharing data with others, even if they are encrypted.

commented Aug 18, 2015 by kenorb (430 points)
If you're using a VCS repository to store your data, you may consider encrypting the sensitive data. See [BlackBox](https://github.com/StackExchange/blackbox) for more details.


2 Answers

+12 votes
answered Aug 5, 2015 by Thomas (915 points)
 
Best answer

Great question! Unfortunately, I don't think there are going to be great answers. Three ideas have come up in my discussions with users of registry data:

Describe data access procedures completely: Be as transparent as you possibly can be about how you acquired the data, how others can acquire them, and what the policies and costs associated with acquisition would be.

Show everything that you can: While you cannot share the raw data, you can likely share a considerable amount of information from the data. For example, you can include descriptive statistics and graphics that convey the univariate and multivariate patterns in the data. You may also be able to share certain aggregated statistics (e.g., data aggregated at a block or city level).

Offer to collaborate: You may have privileged access to data (i.e., others simply will not be able to access them). In that case, to the extent allowed by data access rules, you should offer to run analyses for others and collaborate with them using the data. This would mean that even your critics can use the data in a meaningful public exchange of ideas, even if the data themselves cannot be made public.
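
To make the "aggregated statistics" idea above a little more concrete, here is a minimal Python sketch of group-level aggregation with small-cell suppression. Everything in it is an assumption for illustration: the field names (`city`, `prescriptions`), the helper `aggregate_by_group`, and the threshold `MIN_CELL_SIZE = 5` are hypothetical, and the records are made up. The actual disclosure rules (minimum cell size, rounding, which variables may be released at all) are set by your data provider, so check with them before publishing anything.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical minimum cell size; the real threshold is set by the data provider.
MIN_CELL_SIZE = 5


def aggregate_by_group(records, group_key, value_key, min_cell_size=MIN_CELL_SIZE):
    """Return per-group count and mean, suppressing groups below min_cell_size."""
    groups = defaultdict(list)
    for row in records:
        groups[row[group_key]].append(row[value_key])

    summary = {}
    for group, values in groups.items():
        if len(values) < min_cell_size:
            # Too few individuals: publish neither the count nor the mean.
            summary[group] = {"n": None, "mean": None}
        else:
            summary[group] = {"n": len(values), "mean": round(mean(values), 2)}
    return summary


# Made-up records for illustration only; not real register data.
records = (
    [{"city": "A", "prescriptions": n} for n in (1, 2, 3, 4, 5, 6)]
    + [{"city": "B", "prescriptions": n} for n in (2, 9)]
)
summary = aggregate_by_group(records, "city", "prescriptions")
print(summary)
```

In this sketch, city "B" has only two records, so both its count and its mean are withheld. Suppression alone may not be enough when several overlapping tables are published, so treat this as a starting point rather than a complete disclosure-control procedure.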



commented Aug 18, 2015 by Guilherme Kenji Chih (80 points)
These seem to be the most pragmatic solutions to my problem.

+1 vote
answered Aug 5, 2015 by ArtOfCode (90 points)

While you can't share the data or the sensitive bits, you can share your results for whatever analyses you're doing.

For each analysis you make of the data, check your results. Remove any identifying or sensitive information, and then publish what you've got.

If others contact you about your results and ask about the underlying data, you just have to make it very clear that you have the data as part of your job, which gives you access to some sensitive material. If put nicely, no reasonable person will complain about the fact that you can't share the data with them.


