Reviewer has chosen not to be Anonymous
Overall Impression: Weak
Suggested Decision: Reject
Technical Quality of the paper: Bad
Presentation: Average
Reviewer's confidence: Medium
Significance: Moderate significance
Background: Reasonable
Novelty: Limited novelty
Data availability: Not all used and produced data are FAIR and openly available in established data repositories; authors need to fix this
Length of the manuscript: The length of this manuscript is about right
Summary of paper in a few sentences:
This paper describes how to use APHRODITE to generate FAIR phenotype definitions based on EHR data. The authors describe a way to share generated phenotype definitions between organizations using GitHub, allowing these phenotypes to be reusable and validated by other institutions. Sharing learned phenotype definitions between healthcare systems is a major challenge because although phenotypes can be shared, the data used to generate them often cannot be, which means that shared phenotypes are difficult/impossible to validate.
Reasons to accept:
* The idea of shareable phenotypes seems great.
* A system for sharing and reusing clinical phenotypes seems timely and important. It would be great for the community if there were such a path for validating learned phenotypes and adding them into standard phenotype definitions.
Reasons to reject:
* The design and implementation of the system leave much to be desired.
* I have a number of concerns and questions which I enumerate in Further Comments.
Nanopublication comments:
Further comments:
1. My primary concern is that the system you propose for sharing and validating phenotypes does not exist and, judging by your examples, would not work even if it did. The files you share in the GitHub repository contain very little information. When I load one, it is just a single variable which lists a set of predictor names. How are these at all useful for other institutions? There is no metadata about how these predictors were selected (size of cohort, model used, predictor metrics, etc.), and there is no way to know the strength of association between each predictor and the phenotype. As another institution, I face an impossible challenge using the information in this file.
2. GitHub is a for-profit company. There is no guarantee that it will remain publicly and freely available forever. It could start charging tomorrow; indeed, it already charges for private repositories above some limit. I am not entirely convinced by your argument that it is a better alternative than something like hosting git on an institutional server, which would give you the same benefit of a URI (commit hash).
3. Section 4.3: Phenotype definition metadata will use *either* JSON or GitHub Markdown for KR. Which one is it? This seems like a simple question to answer. Also, I would argue that Markdown is not suitable for KR, although what KR means in this context is unclear.
4. Section 4.4: A lot of “will”s. I am concerned that the authors are promising a lot, but there is no guarantee that any of it is done or will be done. I would prefer that the authors report on what they have actually done. Refer to point 1.
5. Section 6: The phenotype definition for MI is missing from your GitHub repo. There is also no information in this GitHub repo about any of these files or how you anticipate their being used.
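To make point 1 concrete, the following is a minimal sketch of the kind of metadata record that could accompany each shared predictor list. It is only an illustration: every class name, field name, and value below is a hypothetical placeholder, not something taken from the manuscript, the repository, or APHRODITE itself.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class Predictor:
    # Hypothetical fields: a predictor name plus some measure of its
    # strength of association with the phenotype (e.g. a model coefficient).
    name: str
    weight: float


@dataclass
class PhenotypeMetadata:
    # All field names are assumptions chosen to mirror the items listed in
    # point 1: cohort size, model used, and predictor metrics.
    phenotype: str
    cohort_size: int
    model: str
    metrics: dict
    predictors: list = field(default_factory=list)


def to_json(meta: PhenotypeMetadata) -> str:
    """Serialize the record (including nested predictors) to JSON text."""
    return json.dumps(asdict(meta), indent=2, sort_keys=True)


# Placeholder values only; these are not results from any real cohort.
example = PhenotypeMetadata(
    phenotype="example-phenotype",
    cohort_size=1000,
    model="example-model",
    metrics={"example_metric": 0.0},
    predictors=[Predictor(name="example_predictor", weight=0.0)],
)

if __name__ == "__main__":
    print(to_json(example))
```

A record along these lines, committed next to each predictor file, would let a receiving institution see how the predictors were obtained before attempting to validate them.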
Meta-Review by Editor
Submitted by Tobias Kuhn on
Overall, the idea of sharing phenotype information in a FAIR way is a good one. However, it was unclear how the paper contributed beyond what the authors have already published. Furthermore, the implementation status of the actual GitHub-based system was unclear from the paper: what was planned, and what has actually been implemented?
Paul Groth (https://orcid.org/0000-0003-0183-6910)