Collecting, exploring and sharing personal data: why, how and where

Tracking #: 596-1576

Authors:

	Name	ORCID
	Vero Estrada-Galiñanes	https://orcid.org/0000-0002-7791-8149
	Katarzyna Wac	https://orcid.org/0000-0002-8060-399X

Responsible editor:

Robert Hoehndorf

Submission Type:

Research Paper

Abstract:

New, multi-channel personal data sources (like heart rate, sleep patterns, travel patterns, or social activities) are enabled by the ever-increasing availability of miniaturised technologies embedded within smartphones and wearables. These data sources enable personal self-management of lifestyle choices (e.g., exercise, move to a bike-friendly area) and, on a large scale, scientific discoveries to improve health and quality of life (QoL). However, there are no simple and reliable ways for individuals to securely collect, explore and share these sources. Additionally, much data is wasted, especially when the technology provider ceases to exist and the users are left without any opportunity to retrieve their own datasets from dead devices or systems. Our research reveals evidence of what we term human data bleeding and offers guidance on how to address current issues by reasoning upon five core aspects, namely technological, financial, legal, institutional and cultural factors. To this end, we present preliminary specifications of an open platform for personal data storage and quality of life research. The Open Health Archive (OHA) is a platform that would support individual, community and societal needs by facilitating collecting, exploring and sharing personal health and QoL data.

Manuscript:

ds-paper-596.pdf

Supplementary Files (optional):

ds-supplementary-596-921.zip

Data repository URLs:

none

Date of Submission:

Friday, July 19, 2019

Date of Decision:

Tuesday, September 17, 2019

Nanopublication URLs:

Decision:

Solicited Reviews:

Review #1 submitted on 29/Aug/2019

By Viktoria Spaiser

https://orcid.org/0000-0002-5892-245X

Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Good
Suggested Decision: Accept
Technical Quality of the paper: Good
Presentation: Good
Reviewer's confidence: Low
Significance: Moderate significance
Background: Comprehensive
Novelty: Unable to judge
Data availability: All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

The paper discusses issues around multi-channel personal health and quality of life data (e.g. data being wasted when technology providers cease to exist). Solutions to these issues are suggested, specifically technological, financial, legal, institutional and "cultural" approaches. An open platform for personal data storage that would allow research with that data is outlined.

Reasons to accept:

This paper asks a few important questions and discusses issues that need greater consideration in the field. It suggests an interesting solution to the discussed problems, albeit all the discussions remain quite abstract; the paper is a theoretical paper. The suggested solutions seem to be reasonably novel, albeit building on existing systems and solutions.

Reasons to reject:

There are, in my view, no strong reasons to reject the paper, but I am not an expert in this field.
Things to improve:
- more careful proofreading; there are a few typos and grammatical errors throughout the manuscript
- a few key concepts and terms are not (properly) defined, e.g. what exactly is quality of life and how exactly does it relate to health
- Table 1 is not referenced/explained in the text
- section 4.5 (cultural and behavioural factor): I think the title should be changed to something about online communities, because strictly speaking the section is about neither cultural nor behavioural factors. The term "culture" should also be replaced in other parts of the paper (e.g. in the abstract) to avoid confusion.

Nanopublication comments:

Further comments:

Review #2 submitted on 10/Sep/2019

By Paul Schofield

https://orcid.org/0000-0002-5111-7263

Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Average
Suggested Decision: Undecided
Technical Quality of the paper: Good
Presentation: Average
Reviewer's confidence: High
Significance: Moderate significance
Background: Reasonable
Novelty: Limited novelty
Data availability: Not all used and produced data are FAIR and openly available in established data repositories; authors need to fix this
Length of the manuscript: This manuscript is too long for what it presents and should therefore be considerably shortened (below the general length limit)

Summary of paper in a few sentences:

The paper constitutes a critical review of issues surrounding the collection, use, and ownership of personal healthcare data with the emphasis on the collection and archiving of data by individuals using mobile devices, self reporting and largely commercial apps. The authors consider the issues of data ownership, utility and privacy. They end by considering the potential for an Open Health Archive, how it might be governed and how it might be of benefit.

Reasons to accept:

The subject matter is of intense societal concern at the moment and under commercial, regulatory and technical scrutiny. It is topical and highly relevant to issues of data management and governance.

Reasons to reject:

The paper takes little account of the clinical aspects of data collection and the utility of self-collected and uncurated data from a wide variety of unregulated and disparate sources. There are some very critical issues that the authors do not address, and they do not cite the relevant literature (e.g. BMJ 2019;364:l920 http://dx.doi.org/10.1136/bmj.l920), particularly the clinical and clinical informatics literature. They do not seem to be up to date with policy discussions in national health providers and regulators such as the FDA and the UK NHS. Several issues are not given sufficient discussion. The first is the utility of self-collected data: such data is not acceptable, for example, for clinical trials, as it does not conform to the FDA ALCOA requirements. Self-collected data is particularly subject to longitudinal interruptions and changes of platform and format, rendering it unreliable for accurate and objective analysis. The issues of interoperability are raised appropriately, but it is clear that an organised regulatory framework and licensing of apps as medical devices will be necessary in order to make any of this data usable for clinical research. Such regulatory standardisation and oversight is now being discussed, e.g. https://acmedsci.ac.uk/file-download/74634438, and more should be made of this. Discussion of FAIR data access for personal healthcare data seems rather fanciful considering the level of metadata required to comply with FAIR guidelines and the technical ability of most depositors of personal healthcare data. There is insufficient critical discussion of the unregulated sharing of data and insecure transmission of data between commercial entities. Sustainability of open archives is not discussed. Taking these shortcomings into account, I do not feel that the paper contributes significantly to discussion, and although it might be in part a useful review of the area, I do not think that its originality justifies the length.

Nanopublication comments:

Further comments:

The paper does not rely on data other than that cited in other publications and so although all this is not available this should not be a problem for publication.

Review #3 submitted on 15/Sep/2019

By Andreas Karwath

https://orcid.org/0000-0002-6942-3760

Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Good
Suggested Decision: Accept
Technical Quality of the paper: Average
Presentation: Good
Reviewer's confidence: Medium
Significance: High significance
Background: Comprehensive
Novelty: Limited novelty
Data availability: All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

The manuscript entitled ‘Collecting, exploring and sharing personal data: why, how and where’ presents the case for an Open Health Archive platform supporting the collection, the sharing, and exploitation of personal health data in a distributed fashion. The manuscript presents a variety of arguments for why this kind of platform is needed and how this could potentially be achieved.

The authors initially introduce their definition of different types and aims of personal health data collection, ranging from electronic medical and health records over patient-generated data to IoT devices, such as wearables and smart watches/cars etc. The authors then describe problems and challenges with current data collection approaches. This focuses to some extent on the different points of view of individuals and researchers. Furthermore, they highlight different aspects of value within the data, ranging from individual, technical, legal and cultural perspectives. This leads towards the argument for the need of an open and distributed platform, the Open Health Archive (OHA), supplying the data collectors, i.e. individuals like patients and citizens, with more control over the collected data when compared to the status quo, where data is held by individual companies and organizations with a variety of different technical and legal restrictions.

Reasons to accept:

The manuscript introduces valid arguments for a decentralized platform supporting the collection, the sharing, and the exploitation of personal health data in a distributed fashion with researchers, data providers (citizens) and privacy in mind.

Reasons to reject:

It is clear that, based on a slightly different line of argument, one could also come to a different outcome, even if this were based purely on practical considerations.

Nanopublication comments:

Further comments:

Overall, the manuscript reads quite well and fluently. Most of the arguments are well presented and easily comprehensible. However, they are all brought forward with the aim of achieving an open data platform. The discussion about monetary incentives for the actual data collectors is brushed aside. Not that the reviewer is in favor of such an approach, but it might be interesting to discuss the potential benefits to individuals of such an approach. Although the overall approach is very much conceivable, some aspects fail to convince: how is the long-term storage of the data achieved, in terms of storage itself and also in terms of financing? A further point that needs some clarification is the aim to have both a) a decentralized system in which individuals are able to retract any information access at any time and b) adherence to the FAIR principles. As far as the reviewer understands these principles, every piece of research performed on a FAIR-principled dataset should be repeatable at any later stage, allowing for transparent assessment of algorithms. But what would happen if an individual retracts data access? To enable FAIR-principled data and to repeat exactly the same piece of research, the data would have to be stored elsewhere, including the now unconsented parts. Would that not lead to inconsistencies and/or legal problems?

In the introductory section of the manuscript, the authors highlight the problem that the same measurements differ across devices although they should measure the same entities. It is a bit unclear to the reviewer how this would be overcome within the open platform. To overcome some parts of this problem, it might be necessary to publish the devices, including their OSs and/or software versions or even location. This might make it quite hard to achieve the level of anonymization required by the portal contributors.

On a rather minor note, I have not so far seen links being provided via the web archive. Although understandable (allowing access to web pages even if the content has changed in the meantime), it does not make reading the links very easy.

Minor comments:

On a different, less important note, and with the individual in mind, it might also be interesting for the data donor/collector to be able to view what data is accessed by whom. I personally would be interested in such information.

Although the overall manuscript reads very well, the last part of section 4.5 on page 11 has some minor mistakes (colloquial English and ‘this’ instead of ‘these’).

1 Comment

Meta-Review by Editor

Submitted by Tobias Kuhn on Tue, 09/17/2019 - 03:28

The manuscript has been reviewed by three experts. The reviewers agree that the topic is timely, novel, and important, but also identified several minor issues that must be addressed before the manuscript can be published. In particular, Reviewer 2 highlights a number of important points related to clinical, personal, and commercial data sharing, and the reviewer lists several references that should be taken into account in a more elaborate discussion. Reviewers 1 and 2 also highlight several minor points that should be addressed in the revised manuscript.

Robert Hoehndorf (http://orcid.org/0000-0001-8149-5890)

Data Science
