Reviewer has chosen not to be Anonymous
Overall Impression:
Accept
Technical Quality of the paper:
Limited novelty
Data availability:
All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript:
The length of this manuscript is about right
Summary of paper in a few sentences:
The manuscript entitled ‘Collecting, exploring and sharing personal data: why, how and where’ presents the case for an Open Health Archive platform supporting the collection, the sharing, and exploitation of personal health data in a distributed fashion. The manuscript presents a variety of arguments for why this kind of platform is needed and how this could potentially be achieved.
The authors first introduce their definition of the different types and aims of personal health data collection, ranging from electronic medical and health records through patient-generated data to IoT devices such as wearables, smart watches, smart cars, etc. They then describe the problems and challenges of current data collection approaches, focusing to some extent on the differing perspectives of individuals and researchers. Furthermore, they highlight different aspects of the value of the data from individual, technical, legal, and cultural perspectives. This leads to the argument for an open and distributed platform, the Open Health Archive (OHA), supplying the data collectors, i.e. individuals such as patients and citizens, with more control over the collected data than the status quo, where data is held by individual companies and organizations under a variety of technical and legal restrictions.
Reasons to accept:
The manuscript introduces valid arguments for a decentralized platform supporting the collection, the sharing, and the exploitation of personal health data in a distributed fashion with researchers, data providers (citizens) and privacy in mind.
Reasons to reject:
It is clear that, based on a slightly different line of argument, one could also arrive at a different outcome, even if this were based purely on practical considerations.
Overall, the manuscript reads quite well and fluently. Most of the arguments are well presented and easily comprehensible. However, they are all brought forward with the aim of achieving an open data platform. The discussion about monetary incentives for the actual data collectors is brushed aside. Not that the reviewer is in favor of such an approach, but it might be interesting to discuss its potential benefits for the individuals. Although the overall approach is very much conceivable, some aspects fail to convince: how is the long-term storage of the data achieved, both in terms of the storage itself and in terms of financing? A further point that needs some clarification is the aim to have both a) a decentralized system in which individuals are able to retract any information access at any time and b) adherence to the FAIR principles. As far as the reviewer understands these principles, every piece of research performed on a FAIR-principled dataset should be repeatable at any later stage, thus allowing for a transparent assessment of algorithms. But what would happen if an individual retracts data access? To keep the data FAIR and to repeat exactly the same piece of research, the data would have to be stored elsewhere, including the parts for which consent has since been withdrawn. Would that not lead to inconsistencies and/or legal problems?
In the introductory section of the manuscript, the authors highlight the problem that measurements differ across devices even though they should measure the same entities. It is somewhat unclear to the reviewer how this would be overcome within the open platform. To overcome parts of this problem, it might be necessary to publish the devices, including their OSs and/or software versions, or even their location. This might make it quite hard to achieve the level of anonymization required by the portal contributors.
On a rather minor note, I have not previously seen links provided via the web archive. Although this is understandable (it allows access to the web pages even if their content has been changed in the meantime), it does not make reading the links very easy.
On a different, not so important note, and with the individual in mind, it might also be interesting for the data donor/collector to be able to view which data is accessed by whom. I personally would be interested in such information.
Although the manuscript overall reads very well, the last part of Section 4.5 on page 11 has some minor mistakes (colloquial English and ‘this’ instead of ‘these’).