The integration of the data scientist into the team: implications and challenges

Tracking #: 460-1440

Authors:

	Name	ORCID
	Manisha Desai	https://orcid.org/0000-0002-6949-2651

Responsible editor:

Tobias Kuhn

Submission Type:

Position Paper

Abstract:

Modern biomedical research is complex and requires a cross section of experts collaborating using multi-, inter-, or transdisciplinary approaches to address scientific questions. Known as team science, such approaches have become so critical that they have given rise to a new field – the science of team science. In biomedical research, data scientists often play a critical role in team-based collaborations. Integration of data scientists into research teams has multiple advantages to the clinical and translational investigator as well as to the data scientist. Clinical and translational investigators benefit from having an invested dedicated collaborator who can assume principal responsibility for essential data-related activities, while the data scientist can build a career developing tools that are relevant and data-driven. Participation in team science, however, can pose challenges. One particular challenge is the ability to appropriately evaluate the data scientist’s scholarly contributions, necessary for promotion. Only a minority of academic health centers have attempted to address this challenge. In order for team science to thrive on academic campuses, leaders of institutions need to hire data science faculty for the purpose of doing team science, with novel systems in place that incentivize the data scientist’s engagement in team science and that allow for appropriate evaluation of performance. Until such systems are adopted at the institutional level, the ability to conduct team science to address modern biomedical research with its increasingly complex data needs will be compromised. Fostering team science on campuses by putting supportive systems in place will benefit not only clinical and translational investigators and data scientists, but also the larger academic institution.

Manuscript:

ds-paper-460.docx

Supplementary Files (optional):

ds-supplementary-460.doc

Previous Version:

The integration of the data scientist into the team: implications and challenges

Revised Version:

The integration of the data scientist into the team: implications and challenges

Data repository URLs:

None

Date of Submission:

Saturday, May 13, 2017

Date of Decision:

Tuesday, May 30, 2017

Nanopublication URLs:

Decision:

Solicited Reviews:

Review #1 submitted on 24/May/2017

By Olivia Woolley-Meza

https://orcid.org/0000-0003-4517-2765

Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Good
Suggested Decision: Accept
Technical Quality of the paper: Average
Presentation: Good
Reviewer's confidence: Medium
Significance: High significance
Background: Reasonable
Novelty: Limited novelty
Data availability: All used and produced data are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences (summary of changes and improvements for second round reviews):

As previous version

Reasons to accept:

My comments have been addressed in a satisfactory manner.

Reasons to reject:

None

Nanopublication comments:

Further comments:

To clarify the first comment I made in my original review, the point was simply that some of the power of different perspectives inherent in cross-disciplinary science could be lost in a team where social forces could lead to more conformism. There is nothing specific about cross-disciplinary work that makes it more susceptible to such a process. Rather I was just raising the point as a reminder that not all forms of team integration lead to the best science. But I do agree that this is outside the scope of the current paper.

Review #2 submitted on 26/May/2017

Review Details

Reviewer has chosen to be Anonymous

Overall Impression: Good
Suggested Decision: Accept
Technical Quality of the paper: Good
Presentation: Good
Reviewer's confidence: High
Significance: High significance
Background: Comprehensive
Novelty: Limited novelty
Data availability: All used and produced data are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences (summary of changes and improvements for second round reviews):

The paper raises awareness of the team-based nature of much scientific research (especially in biomedical research), and specifically the role of data scientists in these teams. It explores the issues arising from this in terms of incentives for data scientists to participate in teams and the appropriate distribution of credit for team work. The conclusion is that hiring panels and other institutional bodies should review how they assess the work of candidates to better account for the contributions data scientists make.

Reasons to accept:

The paper raises important questions about the role appropriate credit for scientists plays in maintaining well functioning teams and departments, and keeping excellent scientists involved in research. The specific focus on the role data scientists play is well suited to the journal and timely given the increasing number of researchers identifying as such.

The paper is more complete and substantially improved after revision.

Reasons to reject:

None

Nanopublication comments:

Further comments:

Review #3 submitted on 30/May/2017

By Brian Davis

https://orcid.org/0000-0002-5759-2655

Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Good
Suggested Decision: Accept
Technical Quality of the paper: Good
Presentation: Weak
Reviewer's confidence: High
Significance: Moderate significance
Background: Reasonable
Novelty: Clear novelty
Data availability: All used and produced data are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences (summary of changes and improvements for second round reviews):

See previous review.

Reasons to accept:

My comments have been addressed in a satisfactory manner

Reasons to reject:

No reasons

Nanopublication comments:

Further comments:

I would ask the Editor to ensure that the final version has a proper sectioning format.

3 Comments

Response by Author to Reviews of Previous Version

Submitted by Tobias Kuhn on Sat, 05/13/2017 - 06:18

Reviewer 1

Summary:

This position paper in essence advocates for the importance of team science and, more importantly, argues that to encourage the activity it must be recognised and rewarded in academic/research staff performance review. This is a valid point, but the paper could be more convincing with respect to structuring its arguments. I like the topic but it needs some more work.

Reasons to accept:

Well written
Team science is an important issue, and I welcome a publication that is concerned not with a technical experiment but with scholarship on the research skills and professional development of the research scientist (data scientist).
Relevant in that this paper is concerned with the (academic/research) practitioner of data science.
I argue that perspectives on an emerging field such as this should be encouraged.
Good related work.

Reasons to reject with point by point responses:

1. Referencing is a little confusing. Either footnotes or citations?
There is what appears to be a quote from Dr. Julie Segre. This should be surrounded by "". It's not clear to me if the author is referring to her or someone else. The latter is not acceptable for a quote in a publication.

Response: I apologize for the confusion. To rectify this, I changed the article style so that, instead of a superscript font to indicate citations, parentheses that include the numbered citation are used. There was no quote from Dr. Julie Segre. The superscript font was being used to indicate a reference from Dr. Segre’s work based on a team science approach. This should be clarified now in the newly revised presentation.

2. The paper is not well structured. There are no sections.

Response: I have restructured the article to include sections.

3. The author states the problem but does not go into detail about possible causes and/or what methods are needed to verify or explore this problem further with respect to credit/promotion.

Response: I have revised so that I now have a section labeled “Challenges and solutions to promotion of the team scientist” and have included greater details of solutions so that explicit recommendations are clear. More specifically, as discussed and referenced in the paper, Mazumdar et al (2015) have detailed concrete criteria by which team scientists can have their scholarly contributions noted, and they use biostatisticians to illustrate. I further present additional solutions. These include hiring faculty for the specific purpose of doing team science, creating a faculty line that is appropriate for the team scientist where promotion criteria appropriately consider team science-based contributions and where career trajectories differ from those in faculty lines not specifically intended for team science, and incentivizing all faculty to participate in team science. I have revised this section so that the solutions are presented clearly (Paragraph 2 of the Challenges and solutions section).

4. What can be found in related literature from other disciplines? How does one claim credit for scientific team work in other disciplines such as physics? How can team science skills be measured? What does the literature say with respect to other disciplines? I can't imagine that this is a new problem; rather, it needs some detective work in other disciplines under the scholarship of research/academic practice.

Response: As the reviewer points out, Mazumdar et al (2015) and I are not the first to recognize that an issue with participating in team science is the lack of appropriate recognition of scholarly contribution. For example, this has also been recognized in the Engineering field. Rikakis (2009) described the problem with respect to collaborative engineers and presented five measures with which to gauge scholarly contributions. The approach – which took years to establish, was specific to the engineering field, required knowledge of the field to implement, and required a new informatics tool for data collection – was implemented exclusively at the University of Arizona and has not been adopted elsewhere. Mazumdar et al (2015) therefore propose a broader framework that could be implemented at other institutions if adopted at the institutional level. We limit the discussion to the paper by Mazumdar et al (2015) as it includes discussion of previous work and is the most recent assessment of this topic.

Reviewer 2

Summary:

A review of the role of data scientists in 'team science', i.e. the conducting of large studies by large interdisciplinary teams. The paper identifies the importance of the data scientists in such teams and states that this will increase over time. As such it is important to consider now how that role should be defined and how it should interact with existing academic systems, especially with the criteria defining individual career progression for data scientists.

Reasons to accept:

This paper makes some important points concerning the mechanics of collaboration when data scientists become part of large research teams. The author uses examples of her own experience working in such teams to illustrate the important role that data scientists can play in such teams, as well as the challenges to the team and the individual. With the large scale hiring of new roles in data-oriented positions (at least in my own university), and the subsequent pool of data scientists looking for collaborations, this is a timely issue to address.

Reasons to reject with point by point responses:

None. However, the paper skims or misses several issues which are detailed below in further comments. It would benefit from addressing these in more detail.

1. I feel that the author lets other team members off the hook rather easily. For example, take the statement: ‘Thus, today, the modern scientist cannot be expected to know all that is necessary about the data generated or the implications of how to address the research question raised.’ In my experience, many scientists convince themselves that they could not possibly understand data and statistics. Of course, this opens opportunities for data scientists, but it also leads to the quasi-consultancy relationships that are then criticised later in the paper. It also abrogates the responsibility co-authors have to adequately check each other’s work. A major challenge in bringing data scientists into teams is avoiding a balkanisation where the data and analysis is the responsibility of one individual and the collection of that data another, with too little common knowledge. Indeed, the article touches on this area: ‘When data scientists dig deep as collaborators, gaps in methodology can easily be identified.’ The flip side of this is that (in my own experience) too much time is wasted because some researchers lack the statistical literacy to adequately explain all of the constraints or peculiarities of the data collection process, until they are finally uncovered by the frustrated analyst.

Response: This is an important point raised by the reviewer that I agree with and I have revised accordingly. I fully agree with the reviewer that scientists – within and outside of the field of data – have a responsibility to fully understand the problem at hand and not simply contribute superficially (as the reviewer points out that a consultant might). To better reflect these ideas (Paragraph 1) I deleted the sentence “Thus, today, the modern scientist cannot be expected to know all that is necessary about the data generated or the implications of how to address the research question raised.” Further, I revised to explain that the joining together of members within a team is largely due to the increasingly complex scientific landscape and the specialization of many labs, making collaboration across labs increasingly necessary to address today’s modern scientific questions (Paragraph 1).

2. On the issue of authorship, credit and evaluation (e.g. for promotion), the author focuses her attention on how institutions can appropriately change their metrics for promotion that relate to author-order. Perhaps this is a system-specific issue as I am not aware of any *explicit* metrics based on such factors in promotion within our university (though of course such things are relevant). If criteria are explicit and/or strictly quantitative then this should be an area of frequent review. However, I suggest that the author has neglected an equally important side of the equation: namely the responsibility or otherwise of the data scientist to demand the credit they deserve and need. For instance, how does this issue intersect with relative seniorities between colleagues, and with gender and ethnic biases? Considering why data scientists consistently occupy the middle author slots is as important as re-weighting the importance of those slots. This may mean also revisiting the previous point: are data scientists joining teams as full intellectual contributors or as consultants? Are both sides in agreement about this role a priori, or are they in unspoken disagreement? Could this be solved with a more explicit conversation about roles at the start of the collaboration, as in ‘this is what I want/need to get out of the project’?

Response: I agree that academic scientists need to demonstrate leadership to earn promotion, and I clarify this in the text (first paragraph under Challenges and solutions section). More specifically, I am not discussing situations where data scientists deserve an anchor position in the paper (first or last) and settle for something inappropriate. Rather I am suggesting that in the team science setting, we need to rethink contributions for a given effort. For example, if the data scientist leads in developing the analysis plan, oversees the analysis, provides principal interpretation for the study findings, and writes the methods and results section of the manuscript, a second place in authorship order may be appropriate, but should be recognized as a major contribution when promotion is considered because the effort greatly influenced the direction of the science. If not, data scientists may opt not to be involved in cutting-edge research where they are not listed as first or senior authors, and if this is the case, science that relies on team-based approaches will suffer. The revised text clarifies this position.

3. References are sometimes oddly formatted, for instance: 'This information directly informed how we data scientists designed the study and developed our statistical models. 7' - the reference superscript is outside the sentence.

Response: The reference style in the article has been revised.

4. It would be good to have a citation for the use of ‘team science’ as a previously recognised term near the start of the introduction.

Response: I have revised the manuscript so that the first sentence of the paper introduces this term with a citation and now reads:

Team science is a collaborative effort by scientists that join together across multiple disciplines to solve scientific questions (1).

5. ‘Further, individual labs have become more specialized, as they have consequently been generating and/or handling more specific types of data for which a unique skill set is required. Thus, today, the modern scientist cannot be expected to know all that is necessary about the data generated…’ - This seems a bit vague, and the second sentence possibly contradicts the first - if most data is being generated in specialised labs with skill sets focused on that type of data, then it seems that most researchers would know how to deal with the data their lab produces (but potentially miss other techniques or know-how that they could translate from other areas)

Response: This is a good point. I have revised the language to clarify my meaning (Paragraph 1).

6. Some examples are needed for the statement: ‘Conferences that discuss new tools and strategies for successful cross-disciplinary (including multi-, inter-, and transdisciplinary) engagement have been established.’

Response: I have now included two examples of such conferences.

7. ‘In addition, research articles in scientific journals that describe methods for cross-disciplinary collaboration are appearing (e.g., see Börner et al., 2010 and Bennett and Gadlin, 2010)’ - these references are now 7 years old; are there no more recent examples?

Response: I have revised to include a more recent example from 2017.

8. “For example, as part of a group working with investigators to study the comparative effectiveness of HIV anti-retroviral agents on cardiovascular disease, team members with expertise in treating HIV noted the importance of adjusting for potential confounders like CD4 count, viral load, and cholesterol levels. Further, they emphasized that while the literature implicated single agents as having a role in cardiovascular disease, in practice, therapy was prescribed and taken in combinations, making the entire combined therapy a more pertinent focus. In addition, as we observed in the data, patients switched their combinations with great frequency.” - this section seems to move awkwardly from third to first person. I suggest working in the first person immediately to let the reader know this is a personally experienced example.

Response: Excellent suggestion. I have revised and think the paragraph reads better now (Paragraph 4).

Reviewer 3

Summary:

The author argues for the importance of multi-, trans- and interdisciplinary work where the data scientist is well integrated in a team with the necessary domain specific knowledge. The specific case she discusses is data-science in biomedical research. The main points she makes are, first, that domain specific knowledge helps the data scientist select relevant models and produce interpretable and usable results. Second, that new technical gaps can be exposed through applied data-science work especially when the non-data scientists in the team push the data scientist to detect shortcomings in the existing approaches. This leads to advancements in "pure" data-science research. Further, the author suggests some of the institution level frameworks necessary to sustain and encourage this activity. Namely, the appropriate metrics need to be designed to evaluate data science work that is primarily team and application driven, and career tracks opened up for researchers that work principally in this environment.

Reasons to accept:

I think this is an important and urgent topic, with general applicability to the work of data-scientists. The commentary is well-written and an interesting read.

Reasons to reject with point by point responses:

However, I think that in some places the general thread of the argument is lost, and some critical points are touched upon superficially. I make some suggestions below. Once these points are addressed this will be a piece well-worth publishing.

1. The author uses the idea of relevance of a model as a central argument for why integrating a data-scientist in the team is good. Another point is the appropriate clinical interpretation. Specifically: "Without the context provided by the HIV experts, the data scientists would have developed a less relevant model, and possibly provided misleading findings. Thus, integration of the data scientist into the team environment enabled the data scientists to arrive at an approach that yielded clinically appropriate interpretation. The iterative team-based process also more generally ensures that products from the collaboration will be relevant." These are very important points, but also controversial. It is easy to make the opposite argument that the involvement in the team leads to "group-think", and narrowed/biased model selection according to conventional understanding, ruling out better and more innovative approaches. The author should address how to avoid/minimize this problem or why it is not an issue in the setting she is discussing.

Response: This is an interesting point raised by the reviewer. It is certainly possible that a research team – whether intra-disciplinary or cross-disciplinary – can arrive at a narrowed model selection approach. It is not apparent, however, why a cross-disciplinary approach would be more susceptible to this issue. If anything, I would think that by leveraging expertise across disciplines, different perspectives would be incorporated, increasing the potential for novelty.

2. In the example of exposing gaps in existing methods, the author does not provide a reference for the missing data imputation method developed. Either a reference is required and/or an explanation of how the method developed is different from the standard, and how the input of domain experts made this so.

Response: I have clarified in the text that this work is ongoing under a proposal, awarded by PCORI, to develop novel ideas for handling missing data in this context (Paragraph 6).

3. It is unclear to me where the line is drawn between an external "consultant" data scientist and one who is really part of the team. I think spelling this out more clearly early on in the argument would make it clearer.

Response: I agree with this comment and have revised the text to include a new paragraph that distinguishes between these two settings (Paragraph 5).

4. Finally, in regard to how to evaluate contribution to team work, and the overall performance of a researcher who primarily engages in this work, the author says: "The authors developed an excellent and systematic approach for how evaluation and promotion could better recognize the intellectual leadership of the team scientist.[8] They further described how to evaluate contributions to publications, grants, and research programs, in order to summarize overall scholarship in a way that appropriately weights contributions to team-based science." What is the main idea behind this systematic approach? The innovation? A short explanation of the specific method suggested is missing.

Response: Great point. I agree and have elaborated to provide greater details on the approach (Paragraph 11).

I wish to thank the reviewers and editor for their thoughtful review, and hope you will find the suggested changes have improved the manuscript.

Manisha Desai (http://orcid.org/0000-0002-6949-2651)

Meta-Review by Editor

Submitted by Tobias Kuhn on Tue, 05/30/2017 - 06:27

The previous issues are resolved with this revision, and I agree with the reviewers to accept this manuscript for publication.

Tobias Kuhn (http://orcid.org/0000-0002-1267-0234)

Link to Final PDF and JATS/XML Files

Submitted by Tobias Kuhn on Wed, 07/04/2018 - 08:41

https://github.com/data-science-hub/data/tree/master/publications/1-1-2/ds-1-1-2-ds008

Data Science
