By Daniel Garijo
Review Details
Reviewer has chosen not to be Anonymous
Overall Impression: Weak
Suggested Decision: Reject
Technical Quality of the paper: Good
Presentation: Good
Reviewer's confidence: High
Significance: High significance
Background: Reasonable
Novelty: Limited novelty
Data availability: All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right
Summary of paper in a few sentences:
This paper describes worcs, a package and workflow to help make R code reproducible according to the TOP guidelines.
The paper describes a tutorial of the tool, and how it implements the TOP guidelines.
Reasons to accept:
- The paper is relevant to the journal and on a timely topic.
- The capabilities of the tool for synthetic data generation, connecting to OSF, preprint publishing, and dependency gathering show a lot of promise.
- All the software (except a 404 page in the help, which is minor) is available online; and the project seems maintained and well documented.
Reasons to reject:
- There is no user evaluation, and therefore all the claims about worcs are unsupported by evidence. If the authors had included a compelling evaluation I would be in favor of accepting the paper.
Nanopublication comments:
Further comments:
This paper describes worcs, a package and workflow to help make R code reproducible according to the TOP guidelines.
The paper is well written, easy to follow, and very relevant and timely for the journal. I think the capabilities of the worcs package for synthetic data generation, connecting to OSF, helping with preprints, and capturing dependencies are very valuable to the community. However, the main limitation of the paper is that the workflow and proposed package are not evaluated with users, and therefore the claims of the authors with respect to its capabilities are not supported by evidence. I would like to encourage the authors to gather user/community feedback and demonstrate with numbers that worcs can help scientists as they claim. Then, the paper would become a strong contribution to the journal.
Below I describe other points that could be improved in the current publication:
- Unsupported claims (beyond the ones derived from an absence of evaluation): The paper claims in many parts that worcs helps make an entire research project Open and Reproducible. However, at the same time there is also a claim for not supporting replication. Since replication is generally understood as a necessary step towards reproducibility, I find this somewhat contradictory. Maybe the claims can be modified to state that worcs helps documentation and understanding, hence supporting FAIR.
- I am confused by the citation "Van Lissa, Caspar J., Aaron Peikert, and Andreas M. Brandmaier. 2020. “Worcs: Workflow for Open Reproducible Code in Science.”". Is this referring to another paper or the actual package? If it's another paper then what is the contribution of this publication?
- If the paper is generated as DDG, where are the functions that show its capabilities? It is also not clear to me that some of the functionalities of LaTeX would be usable in RMarkdown, such as all the equations.
- It is not clear how worcs addresses dependency management. The section lists a series of tools that can be used, but the section is not very specific to worcs.
- The workflow of releasing data products on GitHub is not ideal. First, because datasets of 100 MBs usually require special handling. Second, it doesn't follow very well the FAIR principles for findability and attribution of data. Instead, repositories like Zenodo or FigShare could be used to address data storage, with proper metadata and its corresponding citation. The authors do not seem to describe very well what would happen in data science experiments where the data size is significant.
- In the 3 phases listed, it is not clear what worcs does and does not for users.
- In some cases it looks like the authors claim that everyone should use R for open science. While R has a strong community behind it, I believe there are others with very strong support (e.g. Python). I am not sure if suggesting a shift to R is the right approach. Note that this is the impression I got from reading the paper; maybe it was unintended by the authors.
- Vignette on citation leads to a 404.
- Synthetic data generation is fantastic, but it is not clear to me how worcs actually keeps the data consistent, e.g. for training models. Does it follow a similar distribution to the source data?
- Docker is deemed complicated for novice users, but the setup vignette looks quite complex to me as well. Wouldn't a Docker container with worcs installed be easier to use? Maybe these aspects would be better highlighted in a user evaluation.
Meta-Review by Editor
Submitted by Tobias Kuhn on
The reviewers agree that the paper has merit, but some of them also point to major shortcomings, in particular the lack of discussion of user adoption and of handling non-linear research processes. Note that according to our guidelines, resource papers do not need to have a full-blown user evaluation, but we expect "sound evidence of its (potential for) reuse".
Tobias Kuhn (http://orcid.org/0000-0002-1267-0234)
Action letter
Submitted by Caspar Van Lissa on
WORCS: A Workflow for Open Reproducible Code in Science
Dear Dr. Kuhn,
We would like to thank you for providing us with the opportunity to revise our manuscript, “WORCS: A Workflow for Open Reproducible Code in Science”. We sincerely appreciate the time you and the Reviewers have taken to read the manuscript, and the extensive and thought-provoking comments you have raised. We hereby submit our revision of the manuscript for your consideration. We have attempted to address each of the issues raised by yourself and the Reviewers on a point-by-point basis, and detail the changes made in the action letter below. We would like to thank you and the Reviewers for your time, and we look forward to receiving your response.
Sincerely, on behalf of all coauthors, Caspar van Lissa
Reviewer #1
Review #1 submitted on 19/Oct/2020 by Daniel Garijo, https://orcid.org/0000-0003-0454-7145
This paper describes worcs, a package and workflow to help make R code reproducible according to the TOP guidelines.
The paper describes a tutorial of the tool, and how it implements the TOP guidelines.
Reasons to accept:
Reasons to reject:
C1: There is no user evaluation, and therefore all the claims about worcs are unsupported by evidence. If the authors had included a compelling evaluation I would be in favor of accepting the paper.
Response: We appreciate the recommendation of performing a user evaluation, and this is certainly a next step in the development of WORCS. At this point, we have written this manuscript detailing the conceptual workflow, and released a software implementation for R-users. The next step, which we are now entering, is to build a user base, and then mount a user evaluation.
To address the Reviewer’s comment, we have set up an evaluation questionnaire, and sent it to the attendees of our previous workshops on WORCS. At the time of writing, responses are coming in. These are generally very favorable, and have not yet revealed major points of improvement.
We further want to point out that we have a strong history of addressing user comments submitted through the GitHub issues page. WORCS adheres to the Core Infrastructure Initiative “best practices” (CII Best Practices), which requires that the channels through which users can provide feedback are clearly indicated, as they are here. It is also evident from the resolved tickets on GitHub that we have been very responsive in actively incorporating user feedback. This is another requirement of CII Best Practices.
We are committed to continue evaluating user experience, both actively through user evaluation forms, and passively through the GitHub issues pages. We hope that this commitment, along with the evidence of our prior responsiveness to user feedback presented above, help assuage the Reviewer’s concerns.
Finally, we wish to point out that the Editor has indicated that ‘resource papers do not need to have a full-blown user evaluation, but we expect “sound evidence of its (potential for) reuse”’. We therefore also elaborate on the efforts we have made to ensure that WORCS is reusable, followed by evidence of its current adoption rates.
We have made the following efforts to ensure potential for reuse:
There is also evidence that WORCS is already being adopted, even prior to publication (as of 14-11-2020):
- The worcs R-package has been downloaded 2567 times from CRAN since being published on 18-05-2020.

Response: In addressing this comment, we respectfully refer to our response to the preceding comment (“There is no user evaluation”, R1 C1), and again emphasize the Editor's comment that “resource papers do not need to have a full-blown user evaluation”. We would also like to clarify that the goal of the present manuscript is to define the problem WORCS intends to address (specifically, meeting the TOP-guidelines for Open Science), introduce the tools WORCS uses to address this problem, explain how WORCS uses these tools, and provide a step-by-step workflow. It is not clear to us what “claims” are made that are unsupported by evidence. In the revision, we have attempted to further clarify this goal from the outset of the paper:
Abstract:
We again emphasize that we take the recommendation to perform user evaluation very seriously, but we also feel that it would be premature to do so even before the paper is published. We have built WORCS based on established tools and best practices, and we have gathered preliminary positive feedback from the community through workshops and teaching.
C3: Unsupported claims (beyond the ones derived from an absence of evaluation): The paper claims in many parts that worcs helps make an entire research project Open and Reproducible. However, at the same time there is also a claim for not supporting replication. Since replication is generally understood as a necessary step towards reproducibility, I find this somewhat contradictory. Maybe the claims can be modified to state that worcs helps documentation and understanding, hence supporting FAIR.
Response: We thank the Reviewer for bringing this apparent contradiction to our attention. The misunderstanding arises from a lack of explicit definitions of the terms “replication” and “reproducibility” in the original manuscript. We are aware of the oft-lamented lack of consistent definitions for these terms (e.g., Patil, Peng, & Leek, 2019; Plesser, 2018). In the revision, we have attempted to address this lack of clarity by including explicit definitions for these terms:
After making these definitions explicit, it becomes clear that the Reviewer's definition of replication as “a necessary step towards reproducibility” is not consistent with the definitions used in the present paper.
We further wish to clarify that the fact that we address guidelines 1-7, but not guideline 8 (“replication”) is fully consistent with the intended purpose of the TOP-guidelines. Specifically, the first seven guidelines are intended to ensure the “reproducibility of the reported results based on the originating data” (direct quote from Nosek et al. 2015). The eighth criterion, by contrast, is “not formally a transparency standard for authors, [but] addresses journal guidelines for consideration of independent replications for publication” (again a direct quote). To avoid further confusion or controversy, we now address this distinction between the first 7 guidelines and the final 8th explicitly in the revised manuscript:
C4: I am confused by the citation “Van Lissa, Caspar J., Aaron Peikert, and Andreas M. Brandmaier. 2020. “Worcs: Workflow for Open Reproducible Code in Science.”". Is this referring to another paper or the actual package? If it’s another paper then what is the contribution of this publication?
Response: We apologize; it appears that the reference list was not parsed correctly after rendering the manuscript to HTML - the preferred format of Data Science. The reference is indeed to the R-package. In the revised manuscript, we have repaired the references so that it is clear that this is a software package.
C5: If the paper is generated as DDG, where are the functions that show its capabilities? It is also not clear to me that some of the functionalities of LaTeX would be usable in RMarkdown, such as all the equations.
Response: We understand that our introduction of the RMarkdown format was a bit too brief, and that readers may wonder whether specific functionalities of LaTeX would be usable in RMarkdown. To address this comment, the revised manuscript now reads:
C6: It is not clear how worcs addresses dependency management. The section lists a series of tools that can be used, but the section is not very specific to worcs.
Response: We apologize for the apparent lack of clarity of this section. We have restructured and clarified this section to convey the following information:
- renv, the solution for dependency management used by the R-implementation of worcs

The changes are extensive, so we will not quote them all here. Instead, we refer the Reviewer to the revised manuscript.

We have also, in response to another Reviewer comment, created a Docker Hub build that runs worcs. Therefore, as of this revision, both of the tools discussed in this section (i.e., renv and Docker) are directly relevant for WORCS users, and the section should read less like a “series of tools that can be used”. If the Reviewer so desires, we would also be willing to cut the brief discussion of Code Ocean, but this had been included at the request of an informal reviewer of this paper (Daniël Lakens).

C7a: The workflow of releasing data products on GitHub is not ideal. First, because datasets of 100 MBs usually require special handling. […] The authors do not seem to describe very well what would happen in data science experiments where the data size is significant.
Response: We have split this comment up to separately address the handling of large data, and findability (see below).
With regard to the handling of large data, we agree that GitHub is not an ideal platform for handling data larger than 100 MB. It is important to note, however, that there are no restrictions on sharing files up to that size. Further, take into consideration that 100 MB is very large for tabular data; for example, if only integer numbers are stored, that is (100 MB ∗ 1024^2 bytes/MB) / (2^2 bytes per value) = 26,214,400 unique values. In the social sciences, datasets rarely exceed this limit, and so for our readership, GitHub is as good a platform for sharing data as any. Moreover, GitHub is ideal for version controlling analysis code, RMarkdown documents, and small to medium-sized data files. In our opinion, the advantages of having all of a paper's resources in the same repository outweigh the advantages of encouraging every reader to additionally use specialized solutions for “big data” (which most will not need).
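As a quick check of that back-of-the-envelope figure, a minimal sketch in base R (assuming, as above, 4 bytes per stored integer value):

```r
# How many 4-byte integer values fit in a 100 MB file?
size_bytes    <- 100 * 1024^2   # 100 MB expressed in bytes
bytes_per_int <- 2^2            # assumed storage cost of one integer value
n_values      <- size_bytes / bytes_per_int
n_values                        # 26214400

# Cross-check: an integer vector of that length occupies roughly 100 MB in R
format(object.size(integer(n_values)), units = "MB")
```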
To address this comment, we now explicitly address the problem of version controlling large files, and introduce a solution for larger files, which can be linked to git using Git Large File Storage:
We also added a paragraph to the Limitations section that discusses the fact that no single workflow is suitable for all situations:
C7b: Second, it doesn't follow very well the FAIR principles for findability and attribution of data. Instead, repositories like Zenodo or FigShare could be used to address data storage, with proper metadata and its corresponding citation.
Response: We agree completely that it is important to make open data findable beyond just providing it freely. In order to address the Reviewer’s comment, we’ve revised the paper and the workflow vignette to emphasize more clearly that an essential step of the workflow is to connect the GitHub repository to a project page on the Open Science Framework - a standard repository for open research. We also devote more attention to the importance of DOIs; both for the project as a whole, and for specific resources, such as data files. Finally, we have elaborated on the discussion of the FAIR principles, emphasizing that WORCS is not designed to address these comprehensively, but pointing out specific ways in which WORCS is consistent with these guiding principles. Please see our response to Reviewer 2, comment 2 for more details.
With regard to the specific platforms named by the Reviewer: We are not aware of arguments to support a strong preference for Zenodo or FigShare over GitHub. Without such arguments, as far as we are concerned, these repositories are largely interchangeable. One key advantage of storing the data on GitHub is that they are then bundled with the documentation (e.g., codebook), analysis code, and manuscript. Thus, readers have access to all materials required for replication in one repository. Moreover, the entire project can be readily updated from the console, and is version controlled. GitHub additionally offers user interaction with the materials through forking, cloning, opening issues, and sending pull requests - all useful tools for replication and collaboration.
It is further worth noting that GitHub is compatible with Zenodo; users can document GitHub repositories in Zenodo and generate a DOI. We now explicitly mention this option in the paper and workflow vignette. We also mention the potential redundancy with Steps 15-16 of the workflow, however: Connecting the GitHub repository to an OSF project and generating a DOI accomplishes effectively the same.
C8: In the 3 phases listed, it is not clear what worcs does and does not for users.
Response: It is not entirely clear to us whether the Reviewer is referring to the conceptual workflow (“WORCS”), or to the R implementation of WORCS, i.e., worcs. We suspect that the Reviewer means the worcs R-package, so to address this comment, the revision now states the following:

Just in case we misunderstood, and the Reviewer actually meant the conceptual workflow WORCS, we have also added a few sentences to clarify the purpose of the conceptual workflow:
C9: In some cases it looks like the authors claim that everyone should use R for open science. While R has a strong community behind it, I believe there are others with very strong support (e.g. Python). I am not sure if suggesting a shift to R is the right approach. Note that this is the impression I got from reading the paper; maybe it was unintended by the authors.
Response: We wish to emphasize that it was certainly unintended to suggest that everyone should use R for open science. To address this comment, we have done two things: First, throughout the paper, we have attempted to clarify the distinction between the platform-independent conceptual workflow “WORCS” and its implementation for R users, worcs. Second, throughout the paper, we encourage the implementation of WORCS in other languages. Most illustrative of these changes is the start of the paragraph on The R implementation of WORCS:

C10: Vignette on citation leads to a 404.
Response: We apologize for the oversight; we have fixed this URL and checked all URLs in the manuscript for mistakes (none found).
C11: Synthetic data generation is fantastic, but it is not clear to me how worcs actually keeps the data consistent, e.g. for training models. Does it follow a similar distribution to the source data?
Response: We agree that synthetic data generation is exciting, but feel obliged to clarify that this is not a novel contribution of the present paper or R-package. As indicated in the paper, we use the algorithm by Nowok and colleagues, who explain the details of the method. We re-implemented the method in the worcs package because the synthpop package is rarely maintained (e.g., one of our pull requests took almost 10 months to be accepted), and because we wanted a more flexible, customizable interface.

To answer the Reviewer's specific question, the synthetic data indeed follow similar distributions to the source data, as indicated in the paper:

We have additionally clarified the paragraph that introduces the synthetic() function, and added further references to the methodological paper and to the documentation of the synthetic() function (which summarizes the algorithm).
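For illustration, a minimal sketch of generating and inspecting a synthetic dataset with the worcs package (the call relies on the default model; the comparison code is purely illustrative):

```r
library(worcs)

# Generate a synthetic version of the built-in iris data. The synthetic
# values are drawn from models trained on the source data, so marginal
# distributions and associations resemble the original.
set.seed(1)
iris_syn <- synthetic(iris)

# Illustrative check: compare marginal distributions of one variable
summary(iris$Sepal.Length)
summary(iris_syn$Sepal.Length)

# Within a WORCS project, closed_data(iris) stores the real data privately
# and adds a synthetic copy to the public repository.
```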
C12: Docker is deemed complicated for novice users, but the setup vignette looks quite complex to me as well. Wouldn't a Docker container with worcs installed be easier to use? Maybe these aspects would be better highlighted in a user evaluation.

Response: We thank the Reviewer for this thoughtful suggestion. To address this comment, we have published an image on Docker Hub with all requirements of worcs installed, and added a Docker-specific setup vignette to the package and website, which is referenced in the Setup paragraph:

Reviewer #2
Submitted on 19/Oct/2020 by Remzi Celebi, https://orcid.org/0000-0001-7769-4272
The paper introduces the Workflow for Open Reproducible Code in Science (WORCS), a workflow that complies with most best practices for open science projects, and demonstrates the use of the R worcs package for adopting the WORCS workflow. The package provides a template for R users to create projects that follow open science principles.
Reasons to accept:
It provides a good discussion of the TOP-guidelines (a set of open science principles) and how the proposed workflow/package can be used to facilitate the adoption of these guidelines.
Reasons to reject:
C1: I would suggest the authors include a flowchart for users that shows the decision making process for which tools/operations should be used in WORCS. An example project would also help users better understand the package.
Response: We thank the Reviewer for these suggestions. Regarding the flowchart - we had already included one in the previous version of the manuscript. To address this comment, we have clarified the references in the text to this flowchart:
Regarding the example project, we have now curated a list of all public worcs projects on GitHub, and we reference this list in several places in the manuscript, including:

- Abstract
- Introducing the workflow
C2: The claim that WORCS is amenable to the FAIR Guiding Principles should be justified. What are the solutions provided by WORCS that make a project interoperable and reusable? It would be improved with a more thorough discussion on how WORCS meets each FAIR principle.
Response: To address this comment, we have made three changes:
- Updates to the worcs package, e.g., to include information on the project readme about how readers can obtain access to data.

This paragraph now reads:
C3: Is WORCS providing any service for de-identification?
Response: WORCS does not provide a service for de-identification, and we consider this to be outside of the scope of the present paper. If the Reviewer feels differently, we are open to being convinced, however!
The current team lacks relevant expertise to make this contribution, but since worcs is open source software, anybody is welcome to contribute this functionality. Such a contribution would warrant coauthorship on the package, in line with the contribution guidelines outlined here.

Reviewer #3
Submitted on 26/Oct/2020 by Anonymous
The paper presents WORCS, a library implemented within RStudio with the purpose of promoting the creation of reproducible research. In particular, it assists researchers with a number of tasks they have to deal with, from the formulation of their hypothesis, to version control, to dependency management, and finally to generating the scholarly article.
Reasons to accept:
On the positive side, the authors built their tool with Open Science principles in mind. In doing so, they have pinned down the actions that need to be performed as part of the research process. For instance, steps such as the pre-registration of the objective of the study before starting the investigation are ignored by existing solutions.
Reasons to reject:
The objective of the authors is ambiguous. That said, I do not think that WORCS provides sufficient features to make it stand out from state-of-the-art tools, for the following reasons.
C1: The process as described by the author for conducting research is assumed to be linear. However, researchers often go back and forth on steps that they have already completed, and sometimes have to undo them completely and start from scratch. This does not seem to be tackled by the tool.
Response: We agree with the Reviewer that researchers often go back and forth in their work. We did not intend to convey the impression that the process of conducting research is assumed to be linear. Of course, since we are presenting a workflow, there are a number of steps involved, which we have placed in roughly chronological order and numbered for reference purposes. For example, Study Design must logically occur prior to Writing and Analysis, and Publication can only logically occur after Writing. We also wish to emphasize that the “tool” (we assume the Reviewer is referring to the worcs R-package) does not enforce any kind of linearity.

To address the Reviewer's comment, we have made two changes: First, we now explicitly state, in two places in the paper, and in the Workflow vignette included with the worcs package, that the steps are not necessarily linear:

Introducing the workflow:
Limitations:
Second, while it was always possible to manually add a manuscript or preregistration at a later point in time after creating a new project in RStudio, we have added more functions to increase the non-linear flexibility of the worcs package: add_manuscript() and add_preregistration().
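For illustration, a minimal sketch of calling these functions in an existing project (the template names and the worcs_directory argument reflect our understanding of the current interface; see the function documentation for the options actually available):

```r
library(worcs)

# Add a manuscript to an existing WORCS project at a later stage;
# "APA6" is an illustrative template name, see ?add_manuscript
add_manuscript(worcs_directory = ".", manuscript = "APA6")

# Likewise, add a preregistration after the project was created;
# "cos_prereg" is an illustrative template name, see ?add_preregistration
add_preregistration(worcs_directory = ".", preregistration = "cos_prereg")
```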
C2: The gist of the tool seems to allow researchers to document their steps and generate a scholarly document that is ready for submission. In a sense, there is a similarity with existing notebooks such as Jupyter. So I was wondering why don't the authors start from an existing popular tool such as Jupyter as the starting point (given the similarity in features that they would like to add), and extend Jupyter, for instance, with new features such as dependency documentation and version control?
Response: There are two points we wish to address here. First, the Reviewer focuses on “the tool”, which we assume refers to the R package. However, the main goal of this paper is to introduce the conceptual workflow; the R package is merely offered to make adoption of the conceptual workflow easier for R users. This distinction is clarified in several places throughout the manuscript, and to prevent further confusion, we have clarified the distinction even further in the revision, e.g.:
Abstract:
Manuscript, introduction:
Manuscript, paragraph titled The R implementation of WORCS:
Second, in response to the question “why don’t the authors start from an existing popular tool such as Jupyter”, we completely agree with the Reviewer that the conceptual workflow as explained in the paper could be implemented in any software environment, including Jupyter. In fact, we emphasize this in several places throughout the manuscript, for instance the paragraph titled The R Implementation of WORCS:
And in Future developments:
To answer the Reviewer's question directly: We implemented this workflow in R because we are R developers. But in the paper, we also provide some arguments that motivate our preference for R and RStudio as a choice for statistical computing in open science:
C3: Regarding version control and dependency documentation, the authors use existing solutions in their system, namely git and renv, respectively. While I do not criticize reusing existing solutions (on the contrary), I do not believe that simply using such solutions adds substantial value. For example, git captures changes at the level of the line of code/text, whereas a researcher needs an abstraction that informs him/her on the coarse-grained elements that are significant from the point of view of development and research steps, so as to have a clear idea of where s/he is within the process and help him/her make the decision of the next steps to undertake.
Response: Respectfully, we are not entirely sure what the criticism is here. First, as addressed in our response to the previous comment, there seems to be a misunderstanding about the scope of the paper. To clarify, we again state that our paper introduces a workflow for open science, based on objective criteria set out in the TOP-guidelines. Regarding novelty: Aside from the work-in-press of our co-authors - which is focused on strict computational reproducibility and is compatible with the present workflow - this is one of the first efforts to translate the requirements of open science into a step-by-step workflow. When you search Google Scholar for “open science workflow”, the first hit that actually describes a workflow is the present manuscript.
Second, the Reviewer claims that using e.g. Git does not add substantial value - but this claim is in disagreement with published literature (which we cite) that makes a strong case for the value of using Git for research (see Ram 2013; Blischak, Davenport, and Wilson 2016).
Third, there seems to be some misunderstanding of the workings of existing solutions. For example, the Reviewer remarks that “git captures changes at the level of the line of code/text, whereas a researcher needs an abstraction that informs him/her on the coarse-grained elements […] so as to have a clear idea of where s/he is within the process and help him/her make the decision of the next steps to undertake”. This comment suggests a fundamental misunderstanding of how Git works. Although Git tracks file changes on a line-by-line basis, such changes are typically grouped together in a “commit”, which is how Git captures the more coarse-grained elements the Reviewer refers to. For example, in our response to these reviews, we have created one Git commit for each Reviewer comment, and these commits contain changes to the action letter, manuscript, and vignettes.
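To make this concrete, a minimal sketch using the gert R package (the file names and commit message are illustrative, not the actual commits in our repository):

```r
library(gert)

# Git tracks changes line by line, but a commit bundles all staged changes
# into one coarse-grained, labelled unit of progress, e.g. one commit per
# Reviewer comment addressed.
git_add(c("manuscript/manuscript.Rmd", "action_letter.md"))
git_commit("Address R1 C7a: discuss version control of large data files")

# The project history then reads as a sequence of research steps
git_log(max = 5)
```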
To address the Reviewer’s comment, we now clarify this point in the revised manuscript:
The GitHub function of creating “releases” also helps mark the coarse-grained progress of a research project. In the WORCS procedure, we encourage tagging the preprint and the submitted version of a manuscript as a “release” on GitHub. Such releases are prominently featured on the GitHub project page, and are accompanied by a downloadable archive of the historic state of the project. To further address the Reviewer's comment, we now mention this in the manuscript text as well:
C4: To sum up, I think that the initial idea is interesting. However, I do not think that WORCS will be adopted by researchers or promote the state of the art of reproducibility, even if we focus on researchers who conduct their development and analysis using R.
Response: We thank the Reviewer for acknowledging the idea to be interesting. Whether it will be adopted by researchers remains to be seen after publication. We can offer some preliminary evidence that the workflow is already being adopted (as of 14-11-2020):
- The worcs R-package has been downloaded 2567 times from CRAN since being published on 18-05-2020.

Reviewer #4
Submitted on 26/Oct/2020 by Katherine Wolstencroft https://orcid.org/0000-0002-1279-5133
This paper describes a Workflow for Open Reproducible Code in Science (WORCS). It promotes reproducible and FAIR R code and complies with the TOP-guidelines. The workflow is presented with an R library and a GitHub repository containing worcs templates and all supporting materials for this manuscript.
Reasons to accept:
The paper is well written and clearly describes WORCS, which promotes reproducible and open code for scientific investigations. It complies with current best practices for open and reproducible code and it also follows the FAIR principles. Such initiatives help promote better open and reproducible science. The approach taken here is pragmatic and lightweight, which is necessary for large scale uptake. The authors also focus on ease of use and provide a “one-click solution”.
WORCS is for scientists working with R and RMarkdown. It could therefore be argued that its utility is limited. However, R is widely used across science, and the approach taken here could serve as an example for other software.
Reasons to reject:
C1: The abstract of the paper states that the manuscript “provides examples of the implementation of worcs in R”, so I was expecting science projects that had been described using worcs. If these examples exist, they should be referenced and linked to from the manuscript, as they would provide a better demonstration of the practical utility of the workflow and software than tutorial style examples.
Response: We thank the Reviewer for this helpful suggestion! Although such user examples are not yet numerous, the lead author has six public WORCS repositories, and there are four public repositories by other users. In our opinion, it would be most useful to have a continuously cumulating list of example repositories, and link that list in the paper, rather than to publish a static list. Therefore, to address this comment, we have scripted a web scraper that searches GitHub for WORCS repositories (using the metadata tags created by the worcs package). We have embedded the scraper in the README.Rmd file on the worcs GitHub page, which is regularly updated. In the revised paper, we point readers to a list of public worcs projects in the Abstract, and when introducing the workflow.
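For illustration, a minimal sketch of such a scraper using the gh package (the search query is an assumption made for this example; the scraper actually embedded in README.Rmd may match on different metadata):

```r
library(gh)

# Query the GitHub search API for repositories tagged as WORCS projects
# (the query string is illustrative; the real scraper may use other tags)
res <- gh("GET /search/repositories", q = "topic:worcs")

# Collect repository names and URLs into a small data frame
repos <- data.frame(
  name = vapply(res$items, function(x) x$full_name, character(1)),
  url  = vapply(res$items, function(x) x$html_url, character(1))
)
head(repos)
```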
C2: There is a comparison in the paper to another R-based reproducible science solution (Peikert and Brandmaier (2019)). The authors discuss some of the differences between these approaches, but the comparison is limited. It is not clear to me how different these approaches actually are, apart from ease of use. If scientists are already familiar with R, is there a significant learning curve with either solution?
Response: We agree that this is an important point. We have attempted to address it by rewriting the paragraph comparing these two solutions (as cited below). We hope that this is satisfactory. At this point in time, this is all we can say about the comparison of the two workflows, because the workflow of Peikert and Brandmaier is still in press, and the software implementation has not yet been fully developed.
Editor
Submitted by Tobias Kuhn on Tue, 10/27/2020 - 02:04, http://orcid.org/0000-0002-1267-0234
The reviewers agree that the paper has merit, but some of them also point to major shortcomings, in particular the lack of discussion of user adoption and of handling non-linear research processes. Note that according to our guidelines, resource papers do not need to have a full-blown user evaluation, but we expect “sound evidence of its (potential for) reuse”.
Response: We thank the Editor and all Reviewers for their helpful comments on this manuscript. We have attempted to address all of these points in the action letter above. With regard to the specific point reiterated by the Editor:
User adoption
We thank the Editor for pointing out that a full-blown user evaluation is not required for a resource paper. Of course, we take this Reviewer suggestion very seriously, and intend to conduct a user evaluation after the paper is published and users begin to adopt the workflow. Please refer to our response to Reviewer 1’s comment 1, where we discuss our efforts to ensure the potential for reuse, and provide preliminary evidence for actual reuse/uptake of the workflow.
Non-linear research processes
Although Reviewer 3 claims that WORCS assumes a linear research process, this is incorrect.
We agree with Reviewer 3 that researchers often go back and forth in their work. We did not intend to convey the impression that the process of conducting research is assumed to be linear. Of course, since we are presenting a workflow, there are a number of steps involved, which we have placed in roughly chronological order and numbered for reference purposes. For example, Study Design must logically occur prior to Writing and Analysis, and Publication can only logically occur after Writing. We also wish to emphasize that the “tool” (we assume the Reviewer is referring to the worcs R-package) does not enforce any kind of linearity.

To address Reviewer 3's comment, we have made two changes: First, we now explicitly state, in two places in the paper, and in the Workflow vignette included with the worcs package, that the steps are not necessarily linear:

Introducing the workflow:
Limitations:
Second, while it was always possible to manually add a manuscript or preregistration at a later point in time after creating a new project in RStudio, we have added more functions to increase the non-linear flexibility of the worcs package: add_manuscript() and add_preregistration().

References
Allaire, J. J., Kevin Ushey, RStudio, and Yuan Tang. 2020. “R Markdown Python Engine.” 2020. https://rstudio.github.io/reticulate/articles/r_markdown.html.
Blischak, John D., Emily R. Davenport, and Greg Wilson. 2016. “A Quick Introduction to Version Control with Git and GitHub.” PLOS Computational Biology 12 (1): e1004668. https://doi.org/10.1371/journal.pcbi.1004668.
Grolemund, Garrett, and Hadley Wickham. 2017. R for Data Science. O’Reilly. https://r4ds.had.co.nz/.
Lamprecht, Anna-Lena, Leyla Garcia, Mateusz Kuzak, Carlos Martinez, Ricardo Arcila, Eva Martin Del Pico, Victoria Dominguez Del Angel, et al. 2019. “Towards FAIR Principles for Research Software.” Edited by Paul Groth. Data Science, November, 1–23. https://doi.org/10.3233/DS-190026.
Muenchen, Robert A. 2012. “The Popularity of Data Science Software.” R4stats.com. April 25, 2012. http://r4stats.com/articles/popularity/.
Nosek, B. A., G. Alter, G. C. Banks, D. Borsboom, S. D. Bowman, S. J. Breckler, S. Buck, et al. 2015. “Promoting an Open Research Culture.” Science 348 (6242): 1422–5. https://doi.org/10.1126/science.aab2374.
Nowok, Beata, Gillian M. Raab, and Chris Dibben. 2016. “Synthpop: Bespoke Creation of Synthetic Data in R.” Journal of Statistical Software 74 (1, 1): 1–26. https://doi.org/10.18637/jss.v074.i11.
Patil, Prasad, Roger D. Peng, and Jeffrey T. Leek. 2019. “A Visual Tool for Defining Reproducibility and Replicability.” Nature Human Behaviour 3 (7, 7): 650–52. https://doi.org/10.1038/s41562-019-0629-z.
Peikert, Aaron, and Andreas Markus Brandmaier. n.d. “A Reproducible Data Analysis Workflow with R Markdown, Git, Make, and Docker.” Accessed January 9, 2020. https://doi.org/10.31234/osf.io/8xzqy.
Ram, Karthik. 2013. “Git Can Facilitate Greater Reproducibility and Increased Transparency in Science.” Source Code for Biology and Medicine 8 (1): 7. https://doi.org/10.1186/1751-0473-8-7.
R Core Team. 2020. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Van Lissa, Caspar J., Aaron Peikert, and Andreas M. Brandmaier. 2020. Worcs: Workflow for Open Reproducible Code in Science (version 0.1.5). https://cran.r-project.org/web/packages/worcs/index.html.
Vicente-Saez, Ruben, and Clara Martinez-Fuentes. 2018. “Open Science Now: A Systematic Literature Review for an Integrated Definition.” Journal of Business Research 88 (July): 428–36. https://doi.org/10.1016/j.jbusres.2017.12.043.
Wilkinson, Mark D., Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, et al. 2016. “The FAIR Guiding Principles for Scientific Data Management and Stewardship.” Scientific Data 3 (1): 160018. https://doi.org/10.1038/sdata.2016.18.
Xie, Yihui, J. J. Allaire, and Garrett Grolemund. 2018. R Markdown: The Definitive Guide. The R Series. Chapman and Hall/CRC. https://bookdown.org/yihui/rmarkdown/.