Reviewer has chosen to be Anonymous
Overall Impression:
Undecided
Technical Quality of the paper:
Unable to judge
Data availability:
All used and produced data are FAIR and openly available in established data repositories
Length of the manuscript:
The authors need to elaborate more on certain aspects and the manuscript should therefore be extended (if the general length limit is already reached, I urge the editor to allow for an exception)
Summary of paper in a few sentences:
A review of the role of data scientists in 'team science', i.e. the conducting of large studies by large interdisciplinary teams. The paper identifies the importance of the data scientists in such teams and states that this will increase over time. As such it is important to consider now how that role should be defined and how it should interact with existing academic systems, especially with the criteria defining individual career progression for data scientists.
Reasons to accept:
This paper makes some important points concerning the mechanics of collaboration when data scientists become part of large research teams. The author uses examples from her own experience working in such teams to illustrate the important role that data scientists can play, as well as the challenges to the team and the individual. With the large-scale hiring into new data-oriented positions (at least at my own university), and the resulting pool of data scientists looking for collaborations, this is a timely issue to address.
Reasons to reject:
None. However, the paper skims or misses several issues which are detailed below in further comments. It would benefit from addressing these in more detail.
I feel that the author lets other team members off the hook rather easily. For example, take the statement: ‘Thus, today, the modern scientist cannot be expected to know all that is necessary about the data generated or the implications of how to address the research question raised.’ In my experience, many scientists convince themselves that they could not possibly understand data and statistics. Of course, this opens opportunities for data scientists, but it also leads to the quasi-consultancy relationships that are then criticised later in the paper. It also abdicates the responsibility co-authors have to adequately check each other’s work. A major challenge in bringing data scientists into teams is avoiding a balkanisation where the data and analysis are the responsibility of one individual and the collection of that data the responsibility of another, with too little common knowledge. Indeed, the article touches on this area: ‘When data scientists dig deep as collaborators, gaps in methodology can easily be identified.’ The flip side of this is that (in my own experience) too much time is wasted because some researchers lack the statistical literacy to adequately explain all of the constraints or peculiarities of the data collection process, until they are finally uncovered by the frustrated analyst.
On the issue of authorship, credit and evaluation (e.g. for promotion), the author focuses her attention on how institutions can appropriately change their metrics for promotion that relate to author-order. Perhaps this is a system-specific issue as I am not aware of any *explicit* metrics based on such factors in promotion within our university (though of course such things are relevant). If criteria are explicit and/or strictly quantitative then this should be an area of frequent review. However, I suggest that the author has neglected an equally important side of the equation: namely the responsibility or otherwise of the data scientist to demand the credit they deserve and need. For instance, how does this issue intersect with relative seniorities between colleagues, and with gender and ethnic biases? Considering why data scientists consistently occupy the middle author slots is as important as re-weighting the importance of those slots. This may mean also revisiting the previous point: are data scientists joining teams as full intellectual contributors or as consultants? Are both sides in agreement about this role a priori, or are they in unspoken disagreement? Could this be solved with a more explicit conversation about roles at the start of the collaboration, as in ‘this is what I want/need to get out of the project’?
References are sometimes oddly formatted, for instance: 'This information directly informed how we data scientists designed the study and developed our statistical models. 7' - the reference superscript is outside the sentence.
It would be good to have a citation for the use of ‘team science’ as a previously recognised term near the start of the introduction.
‘Further, individual labs have become more specialized, as they have consequently been generating and/or handling more specific types of data for which a unique skill set is required. Thus, today, the modern scientist cannot be expected to know all that is necessary about the data generated…’ - This seems a bit vague, and the second sentence possibly contradicts the first - if most data is being generated in specialised labs with skill sets focused on that type of data, then it seems that most researchers would know how to deal with the data their lab produces (but potentially miss other techniques or know-how that they could translate from other areas)
Some examples are needed for the statement: ‘Conferences that discuss new tools and strategies for successful cross-disciplinary (including multi-, inter-, and transdisciplinary) engagement have been established.’
‘In addition, research articles in scientific journals that describe methods for cross-disciplinary collaboration are appearing (e.g.,See Börner et al., 2010 and Bennett and Galdlin, 2010)’ - these references are now 7 years old; are there no more recent examples?
“For example, as part of a group working with investigators to study the comparative effectiveness of HIV anti-retroviral agents on cardiovascular disease, team members with expertise in treating HIV noted the importance of adjusting for potential confounders like CD4 count, viral load, and cholesterol levels. Further, they emphasized that while the literature implicated single agents as having a role in cardiovascular disease, in practice, therapy was prescribed and taken in combinations, making the entire combined therapy a more pertinent focus. In addition, as we observed in the data, patients switched their combinations with great frequency.” - this section seems to move awkwardly from third to first person. I suggest writing in the first person from the start, to let the reader know that this is a personally experienced example.