FAIR Phenotyping with APHRODITE

Tracking #: 588-1568

Authors:

	Name	ORCID
	Juan Banda	https://orcid.org/0000-0001-8499-824X
	Andrew Williams	https://orcid.org/0000-0001-6756-2751
	Mehr Kashyap	https://orcid.org/0000-0002-9220-067X
	Martin Seneviratne	https://orcid.org/0000-0003-0435-3738
	Aaron Potvien	https://orcid.org/0000-0001-8068-7444
	Jon Duke	https://orcid.org/0000-0001-5523-2368
	Nigam Shah	https://orcid.org/0000-0001-9385-7158

Responsible editor:

Paul Groth

Submission Type:

Position Paper

Abstract:

Over the years, electronic phenotyping has evolved from simple to complex rule-based definitions and has more recently entered the machine learning age with probabilistic phenotype models. With the added complexity comes the need for consistent and reproducible phenotype definitions for maintenance, replicability and community sharing. In this work we introduce how to construct probabilistic phenotype definitions with Automated PHenotype Routine for Observational Definition, Identification, Training and Evaluation (APHRODITE) that follow the FAIR principles to improve their reproducibility and quality. By using a centralized repository and creating a standard list of meta-data elements, we aim to guide probabilistic phenotype definition developers with a FAIR-compatible standard. By developing this standard within the Observational Health Data Sciences and Informatics (OHDSI) initiative, we aim to ensure community-wide compatibility and maximum reproducibility.

Manuscript:

ds-paper-588.docx

Special issue (if applicable):

FAIR Data, Systems and Analysis

Data repository URLs:

https://github.com/thepanacealab/FAIR_APHRODITE_phenotypes

Date of Submission:

Sunday, June 16, 2019

Date of Decision:

Monday, July 29, 2019

Nanopublication URLs:

Decision:

Reject

Solicited Reviews:

Review #1 submitted on 28/Jun/2019

By Lucy Lu Wang

https://orcid.org/0000-0001-8752-6635

Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Weak
Suggested Decision: Reject
Technical Quality of the paper: Bad
Presentation: Average
Reviewer's confidence: Medium
Significance: Moderate significance
Background: Reasonable
Novelty: Limited novelty
Data availability: Not all used and produced data are FAIR and openly available in established data repositories; authors need to fix this
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

This paper describes how to use APHRODITE to generate FAIR phenotype definitions based on EHR data. The authors describe a way to share generated phenotype definitions between organizations using GitHub, allowing these phenotypes to be reusable and validated by other institutions. Sharing learned phenotype definitions between healthcare systems is a major challenge because although phenotypes can be shared, the data used to generate them often cannot be, which means that shared phenotypes are difficult/impossible to validate.

Reasons to accept:

* The idea of shareable phenotypes seems great.
* A system for sharing and reusing clinical phenotypes seems timely and important. It would be great for the community if there were such a path for validating learned phenotypes and adding them into standard phenotype definitions.

Reasons to reject:

* The design and implementation of the system leave much to be desired.
* I have a number of concerns and questions which I enumerate in Further Comments.

Nanopublication comments:

Further comments:

1. My primary concern is that the system you propose for sharing and validating phenotypes does not exist and, judging by your examples, would not work even if it did. The files you share in the GitHub repository do not contain very much information. When I load one, it is just a single variable which lists a set of predictor names. How are these at all useful for other institutions? There is no metadata about how these predictors were selected (size of cohort, model used, predictor metrics, etc.), and there is no way to know the strength of association between each predictor and the phenotype. As another institution, I face an impossible challenge using the information in this file.

2. GitHub is a for-profit company. There is no guarantee of it being publicly and freely available forever. In fact, it could start charging tomorrow, and it already charges for private repositories above some limit. I’m not entirely convinced by your argument that it is a better alternative than something like hosting git on an institutional server, which would give you the same benefit of a URI (commit hash).

3. Section 4.3: Phenotype definition metadata will use *either* JSON or GitHub Markdown for KR. Which one is it? Seems like a simple question to answer. Also, I would argue that Markdown is not suitable for KR, although what KR means in this context is unclear.

4. Section 4.4: A lot of “will”s. I am concerned that the authors are promising a lot, but there is no guarantee that any of it is done or will be done. I would prefer if the authors report on what they did do. Refer to point 1.

5. Section 6: The phenotype definition for MI is missing from your GitHub repo. There is also no information in this GitHub repo about any of these files or how you anticipate their being used.
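
To make points 1 and 3 concrete, the following is a minimal sketch of a machine-readable metadata record that could accompany a predictor list, together with a check for the fields point 1 finds missing. Every field name and value here is hypothetical and chosen for illustration only; none is taken from APHRODITE, the manuscript, or the GitHub repository.

```python
import json

# Hypothetical required fields, mirroring the gaps listed in point 1.
# These names are assumptions, not part of APHRODITE or the repository.
REQUIRED_FIELDS = ("phenotype", "cohort_size", "model", "predictors")


def missing_fields(record: dict) -> list:
    """Return the required metadata fields that are absent from the record."""
    return [name for name in REQUIRED_FIELDS if name not in record]


# Illustrative record only; every value is a placeholder, not real data.
example = {
    "phenotype": "example-phenotype",
    "cohort_size": 0,
    "model": "example-model",
    "predictors": [
        {"name": "example-predictor", "weight": 0.0},
    ],
}

# A record holding predictor names only, as point 1 describes the shared files.
names_only = {"predictors": ["example-predictor"]}

print(json.dumps(example, indent=2))
print(missing_fields(names_only))
```

A record in this shape stays readable without R, and the check gives a receiving institution a quick way to see which descriptive fields a shared definition lacks.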

Review #2 submitted on 26/Jul/2019

By Pascale Gaudet

https://orcid.org/0000-0003-1813-6857

Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Weak
Suggested Decision: Undecided
Technical Quality of the paper: Average
Presentation: Weak
Reviewer's confidence: High
Significance: High significance
Background: Incomplete or inappropriate
Novelty: Clear novelty
Data availability: Not all used and produced data are FAIR and openly available in established data repositories; authors need to fix this
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

The paper describes how to produce phenotypic definitions with the APHRODITE algorithm, developed to generate phenotypic definitions using machine learning approaches, in a FAIR-compliant manner.

Reasons to accept:

N/A

Reasons to reject:

The work presented in this manuscript is interesting and worth sharing with the research community. However, the manuscript needs to be rewritten significantly to communicate the message better. The authors spend much of the text describing what APHRODITE does and how phenotypic definitions are generated, but say little about how to make them FAIR, other than by filling in all metadata.

- Section 4, about the anatomy of a FAIR phenotype definition, is written as if the guidelines for phenotype definitions were not yet established, i.e., everything is written in the future tense. Are phenotype definitions FAIR, or just proposed to be FAIR?

- Contrary to what is stated in the text, the GitHub repo does not contain 5 phenotype definitions, but only 4. Myocardial Infarction (MI) is missing.

- The phenotype definitions in the GitHub repo are in R language (.rda); I couldn't figure out how to visualize them. There should be some text version to promote better accessibility and interoperability.
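
As a rough illustration of the text version requested above, the sketch below writes a list of predictor names to tab-separated text. It assumes the names have already been read out of the .rda file by some other means (for example in R); that step is not shown. The function name and the single `predictor` column are hypothetical choices, not a format defined by the authors.

```python
import csv
import io


def predictors_to_tsv(predictors: list) -> str:
    """Serialize predictor names to a one-column TSV string with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["predictor"])  # hypothetical column name
    for name in predictors:
        writer.writerow([name])
    return buffer.getvalue()


# Placeholder names only; these are not predictors from the repository.
print(predictors_to_tsv(["example-predictor-a", "example-predictor-b"]))
```

Plain text of this kind can be viewed directly in the repository and diffed between commits, which a binary .rda file does not allow.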

Many sentences have syntactic problems that hinder comprehension. For example:
- "While the eMERGE network established important pioneering practices for externally validating phenotype definitions. "
- "The same patient attributes will often be represented by different coding systems or units across sites are treated as semantically equivalent."
- "Both of these methods which focus on shifting the paradigm of needing gold-standards and take advantage of clinical notes and structured EHR data. "
- "Both of these methods rely on using a large amount of imperfectly labeled training data, in order to learn good phenotype classifiers. " -> "To learn a classifier" does not make sense.

The manuscript contains many repetitions that make reading tedious. For example:
- "Both of these methods which focus on shifting the paradigm of needing gold-standards and take advantage of clinical notes and structured EHR data.", which is followed by "Both of these methods rely on using a large amount of imperfectly labeled training data, in order to learn good phenotype classifiers. "
- "Our probabilistic phenotyping software is called APHRODITE [20], which is an open source R software package for building phenotype models." (...) Two sentences later "Our FAIR phenotype definitions are a product of the R package APHRODITE, which implements phenotyping methods developed by (...)."

If these comments are addressed I am willing to re-review the manuscript.

Best regards,

Pascale Gaudet

Nanopublication comments:

Further comments:

Review #3 submitted on 26/Jul/2019

Review Details

Reviewer has chosen to be Anonymous

Overall Impression: Bad
Suggested Decision: Reject
Technical Quality of the paper: Weak
Presentation: Weak
Reviewer's confidence: High
Significance: High significance
Background: Incomplete or inappropriate
Novelty: Lack of novelty
Data availability: Not all used and produced data are FAIR and openly available in established data repositories; authors need to fix this
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

The paper presents work by the OHDSI consortium on their APHRODITE system for creating probabilistic phenotype descriptions and publishing them according to FAIR principles.

Reasons to accept:

None.

Reasons to reject:

The basics of APHRODITE have already been published by these authors in Banda et al. AMIA Jt Summits Transl Sci Proc. 2017; 2017: 48–57. The current article focuses on how they attain “FAIRness” according to Wilkinson et al 2016, by publishing their results on GitHub. See above for further comments, and the attached redlined Word doc for a more detailed line by line review.

Nanopublication comments:

Further comments:

See attached file containing a detailed line by line review.

Review Document: APHRODITE FAIR - DS paper REVIEW - DETAILS.pdf

1 Comment

Meta-Review by Editor

Submitted by Tobias Kuhn on Mon, 07/29/2019 - 08:25

Overall, the idea of sharing phenotype information in a FAIR way is a good one. However, it was unclear how the paper contributed beyond what the authors have already published. Furthermore, the implementation status of the actual GitHub-based system was unclear from the paper: what was planned, and what has actually been implemented?

Paul Groth (https://orcid.org/0000-0003-0183-6910)
