Review #1
Summary of paper in a few sentences:
A position paper on best practices in translational research.
Reasons to accept:
Timely and appropriate set of best practices supported by examples and historical relevance. Generally well written with reasonable use of figures.
--> I thank the reviewer for their comments.
Reasons to reject:
Not very novel; however, this seems reasonable given the format of a position paper.
--> I thank the reviewer for their constructive comments. I believe this manuscript is novel in the sense that it is the first clear overview of what 'rules' to follow in the field of translational research informatics.
Review #2
Length of the manuscript:
The authors need to elaborate more on certain aspects and the manuscript should therefore be extended (if the general length limit is already reached, I urge the editor to allow for an exception)
--> I have extended most of the commandments, as well as Table 1.
Summary of paper in a few sentences:
The paper aims to give a listing of ten commandments translational teams should follow when considering data for projects and how best to manage that data. They stress the importance of consistency and planning BEFORE a study actually begins, which will enable easier analysis at the end of the study.
Reasons to accept:
This type of information is extremely useful, especially to newly formed teams. Having worked in the field for years, I frequently encounter resistance to several of these topics from PIs, and it is often challenging to perform data analysis when some of these steps are not thought of at the beginning of a study. Having something to reference would be very helpful.
--> I thank the reviewer for their comments. Indeed, data management is something that is often overlooked by PIs at the start of a study. Hopefully this manuscript can help change the situation for the better.
Reasons to reject:
It is unclear who the intended audience is. The author seems to imply the commandments would be useful for new teams; however, much of the language is in acronyms which a new team might not know. The audience should be better defined, and that should be thought of when explaining each commandment. Along those lines, it would be nice to include examples for some of the commandments (e.g., for commandment 1, include or reference a work package). The author mentions this is for translational research, yet the focus seems to be mainly on clinical data. Basic science data needs to be mentioned/included more to make it more relevant to the entire translational research spectrum, and not just clinical research, which is how it largely reads at the moment.
--> The focus is indeed on translational research teams, because translational research informatics is what I have worked on over the past years (I can imagine that clinical research informatics would differ in some ways). I have included some examples for the commandments, as well as a more extensive description of what a work package is. I have now included some text on the other data types, including genomics data, which is typically generated in more fundamental research (as opposed to clinical research).
Explain acronyms...there are many, and some are not defined.
--> I have now included the full terms, with acronyms in brackets.
Commandment 1 - work package is strange phrasing, as I’ve never heard that term, and I suspect many others have not either. SOP (standard operating procedure) might be a better choice. A reference should be included for Horizon 2020. It would be nice to state what a WP should include in this commandment as well. Examples would be helpful for people forming new teams.
--> "Work package" is a common term in project management. However, I have changed the text to "work package (WP) or work stream (WS)”, and have explained what a WP entails (i.e. a sub-project with a goal, milestones, deliverables, FTEs and financial resources). I have included a link to the Horizon 2020 webpage.
Commandment 2 - What is EDC? Since NCATS is referenced earlier, it would be nice to include REDCap in the references here as well. How does this commandment apply to EMR or basic science data? Can you extract data directly from machines to ease this problem?
--> "EDC" stands for "Electronic Data Capture", this has been included in the text. I have referenced REDCap here. In case of automatic extraction from an EMR, I would still advise that someone checks the data entered into the EDC system.
Commandment 3 - in addition to a codebook, it would be good to consider restricting the data types for some variables to diminish errors (e.g., gender should not be recorded as a numeric value). It would also be good to include plausible values for some variables to diminish errors. This is in addition to having a codebook. What is eCRF? And how does that apply to basic science research/teams?
--> The codebook should include the data type and a list of possible values (in the case of a categorical variable). I have included a more extensive description of what information a codebook should contain. “eCRF” stands for “Electronic Case Report Form”. The eCRF is indeed used in clinical research and not so much in basic science, but I have included a paragraph in which I explain that derived data from the other data domains also needs to be included in the codebook. These other domains, such as genomics and imaging, are important in basic research as well.
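For illustration, a minimal sketch of how such a codebook entry could be represented and checked programmatically is shown below; the field names, allowed values and plausible ranges are purely hypothetical and only meant to convey the idea of type restrictions and plausibility checks.

```python
# Hypothetical sketch: a codebook entry per data field, plus a simple
# conformance check. Field names and ranges are illustrative only.

codebook = {
    "sex": {
        "description": "Biological sex of the participant",
        "data_type": "categorical",
        "allowed_values": ["male", "female", "unknown"],
    },
    "systolic_bp": {
        "description": "Systolic blood pressure (mmHg)",
        "data_type": "integer",
        "plausible_range": (60, 250),  # values outside this range are flagged
    },
}

def check_value(field, value):
    """Return True if a value conforms to the codebook entry for `field`."""
    entry = codebook[field]
    if entry["data_type"] == "categorical":
        return value in entry["allowed_values"]
    if entry["data_type"] == "integer":
        low, high = entry["plausible_range"]
        return isinstance(value, int) and low <= value <= high
    return True

print(check_value("sex", "female"))      # True
print(check_value("systolic_bp", 400))   # False: flagged as implausible
```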
Commandment 4 - it would be good to mention places where one can share data, as well as how to go about the sharing process, as it is not always straightforward.
--> In this paragraph, 'data sharing' means sending data from one party to another (usually done with secure data transfer software). Where the manuscript discusses places to share data (e.g. repositories), I have mentioned some examples.
Commandment 5 - It would be good to list, or reference, the specific imaging tools mentioned in the last sentence.
--> I have included and referenced two DICOM anonymization tools.
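As an aside, a minimal sketch of what DICOM de-identification involves is given below, using the open-source pydicom library; pydicom is not necessarily one of the tools referenced in the manuscript, and the tags blanked here are only a small, illustrative subset of a full anonymization profile.

```python
# Minimal, illustrative sketch of DICOM de-identification with pydicom;
# a real anonymization tool covers far more tags and edge cases.
import pydicom

def anonymize(in_path, out_path):
    ds = pydicom.dcmread(in_path)
    # Blank a few directly identifying attributes (illustrative subset only).
    for tag_name in ("PatientName", "PatientID", "PatientBirthDate"):
        if tag_name in ds:
            setattr(ds, tag_name, "")
    ds.remove_private_tags()  # drop vendor-specific private tags
    ds.save_as(out_path)

# Example (hypothetical file names):
# anonymize("scan.dcm", "scan_anon.dcm")
```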
Commandment 6 - Table 1 would benefit from a better description of the software, such as what each tool actually does, or what specific field (analytic area) the resources apply to. Potentially along the lines of why someone would want to visit each of the websites (i.e. what can one use i2b2 to help them with?)
--> I have included a more extensive description in Table 1, as well as the main data type processed by each tool.
Commandment 7 - should also mention the importance of dissemination of these tools so that others know they actually exist and can easily find them.
--> I have included a sentence about the dissemination of translational research informatics tools at the end of commandment 7.
Commandment 8 - it would be good to include an example of how to apply/follow the FAIR guidelines.
--> The Rembrandt brain cancer dataset has been included as an example of how to follow the FAIR guiding principles.
One additional thing to consider is that it is often a good idea to include in the team/data management plan a consultation with someone who will be doing the downstream analysis of the acquired data. This is to ensure that the fields collected will be in a format needed for easier access, and that the statistician/bioinformatician won't have to perform magic before analysis can be performed.
--> Thank you for the useful suggestion. I have included this in commandment 3, which now reads "Define all data fields up front together with the help of data analysis experts".
Review #3
Summary of paper in a few sentences:
The paper outlines several suggestions for planning data management processes based on the author's experience in prior projects. These mostly seem to revolve around clinical trials of translational research products. The suggestions are all reasonable, though I am not sure they are novel for those familiar with such work, or clear/compelling for those who are not.
Reasons to accept:
Principles described are all reasonable and important for successful data management and collection processes that could easily be overlooked by PIs only focused on the end results/overriding hypothesis. The "dirty work" of gluing together the study/data infrastructure is indeed important to ensure projects are successful.
--> I thank the reviewer for their comments.
Reasons to reject:
It is hard to parse out who the target audience is for the writing here. Suggestions are largely just "plan ahead," which is a generally good suggestion, but often not specific enough to be useful to those who need it. It may help some translational research managers formulate a checklist of elements to consider on predictable data management issues, but the limited background and examples/illustrations provided on the consequences mean that many of the people who would understand how to apply the suggestions (data managers) are likely those who would already know about them, whereas those the writer is probably trying to convince (research PIs) are not getting enough background or illustration to convince them to commit the extra resources and planning that are appropriately suggested.
--> Thank you for the useful suggestion. The intended audience is any translational research team, with the emphasis on the team leader (usually the PI) who does the planning and commits the resources. I have included more background and examples throughout the manuscript.
Excessive use of abbreviations without definition (e.g., WP, FTE, EDC, PCCM, eCRF, BFO, OBO, RO, GDPR, HIPAA, EHR, DICOM, ARX) is just a superficial example of how the writing is going to miss people at different ends of the translational research spectrum, and largely be "preaching to the choir" when it does hit.
--> I have now included the full terms, with acronyms in brackets. ARX does not appear to be an acronym.
It may help to reduce the number of suggestions (rather than being artificially constrained to the "ten commandments" framework), so there is more room for individual key sections on HOW to achieve the general suggestions made.
--> I kept the ten commandments framework in place, but have included several paragraphs throughout the paper with concrete examples.
For example, there is the reference to GDPR and HIPAA and how one should have good data processor agreements and a proper patient consent explanation. A reader who does not already know will have no greater clarity on how to do any of this. The paper could have an example of what a typical, but poor, patient consent statement would be vs. a good one.
--> I have included a paragraph on the consent form in commandment 4.
Commandment 10: Think about sustainability. A reader can "think" about it and not actually be able to do anything about it. Some concrete suggestions on example sustainability plans would help. (e.g., data repositories as suggested, or operational hosting plans, institutional core resources, etc.)
--> I have changed this commandment to "Make it sustainable", and have given some suggestions.
Review #4
Summary of paper in a few sentences:
This position paper offers a set of desiderata for the data stewardship of projects in translational research, or really any research. These principles are derived from the author's personal experience in several large research projects.
Reasons to accept:
The paper is accessible, written in a style that will grab attention and that communicates its ideas well. The topic is important and is not well covered in the translational science literature.
--> I thank the reviewer for their comments.
Reasons to reject:
The paper does not reference current literature on data stewardship and it does not seem to recognize that data stewardship is already a scholarly discipline.
--> The "FAIR Guiding Principles for scientific data management and stewardship" paper was already discussed in the manuscript. I have now included some more relevant literature in the introduction section.
Further comments:
The paper is generally well written, but needs copyediting for English usage. "Learnings" is not a concrete noun in English. "Up front" is two words. "Institutes" are transient, whereas "institutions" have durability. It is not clear that "replacement" refers specifically to a person.
--> I thank the reviewer for their suggestions. I have implemented the listed corrections (or alternatives).
It is not obvious to me that any of the commandments will necessarily become irrelevant with time as the author suggests. The ideas may become practiced more widely, but scientists will still need to be aware of these principles. It is also not obvious why these principles apply only to translational research. They would seem to have much wider applicability.
--> This is correct; I have changed this sentence in the discussion (and abstract). Translational research is at the interface between basic research and clinical research, and thus some of the principles apply to these areas as well.
The author needs to define all the acronyms in the paper: CRF, eCRF, PCCM, BFO, OBO, RO, GDPR, HIPAA, EHR
--> I have now included the full terms, with acronyms in brackets.
The reference to "ontologies for translational research" in Commandment 3 mentions BFO, OBO, and RO. OBO is not an ontology, but rather a library of ontologies. BFO and RO are ontologies sometimes used in the construction of other ontologies, but they themselves do not specify terms that are relevant for the creation of standardized metadata.
--> Indeed, the OBO Foundry maintains a list of ontologies. I have updated the text (and explained the acronyms).
What was the method by which the entries in Table 1 were created? How can the reader know that this list is in any way complete? Many of the descriptions in this table do not give the reader a sense of what the tool actually does.
--> I have included a sentence on how the list was created, and have extended the descriptions in the table.
Although these commandments are quite reasonable, how do we know that there are only ten such commandments? A review of the literature on data stewardship might reveal others worth including as well—although I appreciate the attraction of having exactly ten.
--> There could definitely be more than 10, but a list that is too long is very difficult to read. So, in some cases, multiple commandments were combined into one. For example, I have included the need to discuss the codebook with data analysis experts in commandment 3.
Review #5
Summary of paper in a few sentences:
Important and relevant topic, but this manuscript takes an overly simplistic and incomplete approach to development of such a list. The manuscript makes statements that are naïve to the complexities of the issues involved, for example, “at the end of the project, data should be shared with the whole world.” This is a nice idea, and could be useful to investigators, but is underdeveloped in its current form.
Reasons to accept:
This is a nice idea, and could be useful to investigators, but is underdeveloped in its current form.
Reasons to reject:
This is a nice idea, and could be useful to investigators, but is underdeveloped in its current form. For example, statements made about some ‘commandments’ are overly simplistic and naïve to the complexities of the issues involved.
For example, the statement that “at the end of the project, data should be shared with the whole world” is overly simplistic and frankly naive. What about PHI and PII? How do you balance public access to data with the need to incentivize investigators to (painstakingly) collect data for many years? How do you protect participant privacy when multiple datasets with their information may be released? Another example is the statement “if the party performing the data integration receives data that is not properly de-identified, it should be destroyed immediately because of the privacy risk.” Again, this is overly simplistic.
--> I have added text about the importance of the patient’s privacy and informed consent in this section, and have made clear that data should be shared after results have been published (which is usually the main incentive for the PIs). I have rephrased the statement about destroying data somewhat, and have included the possibility that the data integration expert is also responsible for de-identification and anonymization.
In addition, there are inaccuracies in the list of freely available software table. For example, the REDCap reference points to its implementation at the University of Minnesota, rather than the REDCap consortium webpage itself. This suggests that the author is making lists/recommendations for tools that he/she does not fully understand or has not adequately investigated.
--> REDCap is referenced in the manner provided at https://projectredcap.org/resources/citations/, with a link to Harris et al., 2009. The link in the table has been altered to https://projectredcap.org/. I do not know of any other inaccuracies.
Although the author refers to the “valley of death” (based on a published Nature article), reliance on the “Ten Commandments” structure and the religious overtones it entails is overdone and potentially offensive. Would suggest changing to “Ten Recommendations” and leaving out all the religious references.
--> I considered changing "commandments" to "recommendations", but I believe this would make the title less interesting. However, I have changed figure 3 (the ‘stone tablet’) to a table, as well as the reference to it in the text. I am not aware of any other references to religion.
Further comments:
The manuscript would benefit from substantial copy-editing. For example, there is no such word as “productized.”
--> "Productized" is an existing word, although perhaps not used often in science. According to the Cambridge Dictionary, "productize" means "to make something into a product which can be sold". I have replaced "productized" with "implemented in a product". As for some other language issues raised by the other reviews, these have been corrected.
Meta-Review by Editor
Thank you for your submission. While we believe you present important principles for data management, we cannot accept the manuscript in its current form. We would therefore like to invite you to revise your manuscript based on the reviewers’ suggestions, at which point we will reassess the piece’s suitability for publication.
Your paper attempts to address issues that are essential to the field of data science, but it is missing some critical pieces that could enable much greater impact. Overall, the commandments were too general and simplistic to be helpful in practice and could benefit from more concrete details with specific examples (perhaps of good practice contrasted with poor practice). Importantly, previous work on data stewardship should be described and referenced so that the ideas or principles described here can be distinguished from past practices. It is especially important to acknowledge what has been done and how (if at all) the ideas put forth here are new or different from previous work. In addition, the commandments are provided for translational research; however, the guidelines seem tailored to clinical research. Explicit inclusion and consideration of basic science would allow for a more impactful piece.
--> I have included more examples throughout the manuscript, and have focused more on translational research (including basic research), instead of only clinical research. The "FAIR Guiding Principles for scientific data management and stewardship" paper was already discussed within the manuscript. I have now included some more relevant literature in the introduction section. In contrast to most existing literature, I have placed more emphasis on phase 1 of the Research Data Lifecycle: creating data (using a codebook, eCRF, etc.).
The language used (including the excessive use of acronyms) is quite specialized indicating a narrow audience and should be made more general so that the principles can be adopted by the more general data science community. Whenever possible, examples should be used for illustration.
--> I have explained all acronyms, and have included more extensive descriptions and examples throughout the paper.
An important rule not listed is engagement with an expert in study design or statistics for downstream analysis. This point was notably missing.
--> I have included this point in commandment 3.
We believe this is an important and timely topic and we look forward to reviewing a revised version of the manuscript if you are interested in submitting one that addresses the reviewers' concerns.
--> Thank you. I believe all reviewers’ concerns have been addressed and look forward to your comments.
Meta-Review by Editor
Submitted by Tobias Kuhn on
Thank you for submitting your revised manuscript and responding to the initial reviewers' comments. There are still some concerns that remain that we would like you to further address that involve providing more nuanced and less simplistic guidance for the principles you provide. In addition, we request that you edit for more concise language.
Manisha Desai (http://orcid.org/0000-0002-6949-2651)