A Novel Machine Learning Algorithm for Classifying Mortality Risk Patients for Intensive Care Unit Admissions

Tracking #: 626-1606

Authors:

	Name	ORCID
	Carol Hargreaves	https://orcid.org/0000-0002-5522-4058
	Hui Ling Juliet Tan	https://orcid.org/0000-0001-7903-3371
	Dinesh Kumar	https://orcid.org/0000-0001-6535-5441

Responsible editor:

Michael Krauthammer

Submission Type:

Research Paper

Abstract:

Introduction: In this paper, we focus on the classification of high mortality risk patients for Intensive Care Units (ICUs). Classification algorithms for identifying ICU mortality are necessary for measuring and improving ICU performance. Mortality risk severity scores are an essential part of hospital management and clinical decision-making. Proper application of classification models can help in decision making to lower hospital costs. In fact, high mortality risk classification models have become a necessary tool to explain differences in mortality risk. Purpose of Study: The purpose of this study is to develop and evaluate a new algorithm which more accurately predicts patient mortality in ICU, using patient information of vital signs and laboratory results only in the first 24 hours of ICU admission. Methods: We evaluate a novel approach, by statistically converting continuous variables into categorical variables and identifying optimal threshold cut points for stabilizing the coefficients of the classification mortality risk model. Using a machine learning method, namely the logistic regression model, the optimal threshold cut points and open source tools, we developed and evaluated a mortality risk algorithm for ICU patients. Results: An optimal set of 3 threshold values was derived that partitioned the data into 4 groups, resulting in the patient mortality risk scores being more distinguishable across the 4 partitioned groups. The most important variables for the ICU Mortality Risk were PO2 (120 – 125), followed by Cardiac Arrest (Yes), Bilirubin (0.75 – 1), Vasopressors (Yes), SPO2 (< 66), Bilirubin (> 7.75), Foley (< 6), Severe COPD (Yes), WBC (> 19.5) and BUN (> 49).
Conclusion: We present a new binary classification algorithm, the logistic regression with threshold cut points, designed to address the problem of continuous variables with high variability and extreme values, to stabilize our model coefficients and improve the accuracy of classifying high mortality risk ICU patients. Our proposed optimal threshold cut point model performed substantially better (AUC=0.944) in identifying ICU patients with high mortality risk compared to the current scoring systems commonly used in hospitals, such as the SAPS II (AUC=0.771), APACHE II (AUC=0.736) and SOFA (AUC=0.699). This accuracy is at least 30% (1.35 times) better than current mortality risk scoring systems. SAPS II, APACHE II and SOFA are static algorithms whereas our new optimal threshold algorithm is a data-driven algorithm which predicts mortality in ICU patients in real-time and may be useful for the timely identification of deteriorating patients. Our new binary classification algorithm will allow clinicians to accurately identify high-mortality risk patients early within 24 hours so that they can be given prompt treatment to reduce their risks of deteriorating or dying.

Manuscript:

ds-paper-626.docx

Data repository URLs:

https://github.com/CarolHargreaves/Classification-of-Mortality-Risk-Patients-for-Intensive-Care-Unit-Admissions

Date of Submission:

Sunday, March 29, 2020

Date of Decision:

Wednesday, May 13, 2020

Nanopublication URLs:

Decision:

Reject

Solicited Reviews:

Review #1 submitted on 11/Apr/2020

Review Details

Reviewer has chosen to be Anonymous

Overall Impression: Average
Suggested Decision: Undecided
Technical Quality of the paper: Weak
Presentation: Weak
Reviewer's confidence: Medium
Significance: High significance
Background: Reasonable
Novelty: Lack of novelty
Data availability: All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript: The authors need to elaborate more on certain aspects and the manuscript should therefore be extended (if the general length limit is already reached, I urge the editor to allow for an exception)

Summary of paper in a few sentences:

This paper describes an algorithm for classifying whether an admitted ICU patient is at a high risk of mortality or not (binary classification) by using diagnostic and monitoring data recorded in the first 24 hours following admission to the ICU. The algorithm takes a three-step approach to build a classifier. First, variables are selected univariately using t-tests between their values for survivors and non-survivors. Second, for all continuous numerical variables, optimal threshold cut-points are identified in order to discretize them. To do so, chi-square tests of independence are performed between the counts generated by thresholding a variable and the outcome variable (survived / not survived). Essentially, if a proposed threshold value splits the patient data in such a way that the chi-square test is significant (p-value < 0.05), that threshold is kept; otherwise it is rejected. The procedure can be performed recursively to increase the number of cut-points and thus the number of discrete categories of a continuous variable. Finally, logistic regression is performed in a 5-fold cross-validation scheme to learn the model. The main results are the identification of the variables high PO2, old age, eye-opening score, cardiac arrest, and COPD as highly predictive of mortality. Further, the authors report a high AUC score of 0.925, a large increase over the scores achieved by commonly used mortality risk algorithms like SAPS II and APACHE II. The authors used the publicly available MIMIC-III dataset to train and evaluate their approach.
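
The cut-point step summarised above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the candidate list, the absence of a continuity correction, and the choice to keep the significant candidate with the largest chi-square statistic are all assumptions made for illustration.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        return 0.0, 1.0  # degenerate split: no evidence of association
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the chi-square tail probability is erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))

def best_cut_point(values, died, candidates, alpha=0.05):
    """Return (threshold, statistic, p_value) for the significant candidate
    with the largest statistic, or None if no candidate reaches p < alpha."""
    best = None
    for t in candidates:
        below_died = sum(1 for v, y in zip(values, died) if v < t and y)
        below_surv = sum(1 for v, y in zip(values, died) if v < t and not y)
        above_died = sum(1 for v, y in zip(values, died) if v >= t and y)
        above_surv = sum(1 for v, y in zip(values, died) if v >= t and not y)
        stat, p = chi2_2x2(below_died, below_surv, above_died, above_surv)
        if p < alpha and (best is None or stat > best[1]):
            best = (t, stat, p)
    return best

# Toy data (hypothetical): every patient with a value above 10 dies.
values = list(range(1, 21))
died = [v > 10 for v in values]
result = best_cut_point(values, died, candidates=[5, 11, 15])
```

Applying `best_cut_point` again to the values on each side of the chosen threshold would give the recursive variant described in the summary.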

Reasons to accept:

1. The authors report a significant increase in predictive accuracy compared to the baseline approaches of SAPS II (AUC = 0.77) and APACHE II (AUC = 0.736).
2. The optimal threshold cut-point technique provides clinicians with thresholds that can be interpreted more easily in ICU environments. It makes it easier for doctors to apply the results from the classification algorithm to alleviate mortality risk.

Reasons to reject:

There are some major issues with statistical reporting in this article:

1. In section 2.3, it is stated that non-survivor data were upsampled such that the proportion of surviving and non-surviving patients was almost equal. This was done before 5-fold cross-validation based training and evaluation. However, in my view this is not a valid way to evaluate a model since the distribution of the data has been changed. Upsampling is fine for training the model (i.e. upsampling the non-survivor proportion in any of the training folds) but not for evaluation! In an actual ICU test environment, the model has to deal with the actual data distribution, i.e. a much lower overall mortality risk. AUC, sensitivity and specificity results reported on an already upsampled dataset are not valid.

2. In the Discussion section, paragraph 2, the authors state that "The 5-Fold and Leave-one-out Cross-Validation results showed a significant improvement in performance of the logistics regression model when the partitioned continuous variables were used instead of the raw continuous variables." However, I did not find any quantitative results tabulation to prove this claim in the manuscript.

3. The source for the quoted AUC values for the SAPS II, APACHE II and SOFA scoring systems is not provided. Did the authors test these systems on the same dataset themselves? Where are the details?
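
The evaluation scheme recommended in point 1 above (upsample inside each training fold only, and leave every test fold at its original class distribution) can be sketched as follows. This is a hypothetical illustration using only the Python standard library; the function names and the unstratified fold assignment are assumptions, and the model fitting itself is left out.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def upsample_minority(train_idx, labels, rng):
    """Resample minority-class indices with replacement until classes are balanced."""
    pos = [i for i in train_idx if labels[i] == 1]
    neg = [i for i in train_idx if labels[i] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        return list(train_idx)  # nothing to resample from
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra

def cv_splits_with_train_only_upsampling(labels, k=5, seed=0):
    """Yield (upsampled_train_idx, untouched_test_idx) for each of the k folds."""
    rng = random.Random(seed)
    folds = k_fold_indices(len(labels), k, seed)
    for f, test_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield upsample_minority(train_idx, labels, rng), test_idx

# Toy labels (hypothetical): 10% mortality, 1 = non-survivor.
labels = [1] * 10 + [0] * 90
splits = list(cv_splits_with_train_only_upsampling(labels, k=5))
```

Because duplicated non-survivor records never appear in a test fold, AUC, sensitivity and specificity computed on the test folds reflect the original mortality rate.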

##############

More issues regarding the approach taken:

1. I find that the authors do not show why their method is novel though they claim it to be so. The optimal threshold cut-point technique they use is very similar to the Sheth 2015 method, which itself is not much of a development from Donoho and Jin's paper: "Higher Criticism Thresholding: Optimal Feature Selection when Useful Features are Rare and Weak".

2. The authors do not justify well the reason for discretizing continuous variables. They do not show why this statistical testing based approach is better than a Decision Tree based method which would also partition the continuous variables and provide cut-points without statistical hypothesis testing.

3. The authors themselves show that a three-threshold partition is better than a two-threshold partition in terms of p-values and chi-square test statistics. If one extrapolates this argument, in the limit of an ever larger number of partitions one recovers the continuous variable. So wouldn't the original numerical variable be better under this line of reasoning?

4. Details on the number of features rejected in the initial feature selection are missing.

5. No comparison is made to any other state-of-the-art method, e.g. using deep learning or more powerful tree-based methods.

6. The identified important variables are not discussed in relation to what is already known about their significance in literature.
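
For comparison with point 2 above, the following is a minimal sketch of how a decision-tree-style split finds a cut-point without hypothesis testing: a depth-1 tree (a stump) picks the threshold that minimises the weighted Gini impurity of the two resulting groups. The function names and toy data are hypothetical, and this is not a method taken from the manuscript.

```python
def gini(pos, n):
    """Gini impurity of a binary group with `pos` positives out of `n`."""
    if n == 0:
        return 0.0
    p = pos / n
    return 2 * p * (1 - p)

def stump_cut_point(values, labels):
    """Return (threshold, weighted_impurity) for the best single split,
    scanning midpoints between consecutive distinct sorted values.
    The threshold is None if no split improves on the unsplit impurity."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    total_pos = sum(y for _, y in pairs)
    best_t, best_imp = None, gini(total_pos, n)
    left_pos = 0
    for i in range(1, n):
        left_pos += pairs[i - 1][1]
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical values
        imp = (i * gini(left_pos, i)
               + (n - i) * gini(total_pos - left_pos, n - i)) / n
        if imp < best_imp:
            best_t, best_imp = (pairs[i - 1][0] + pairs[i][0]) / 2, imp
    return best_t, best_imp

# Toy data (hypothetical): label 1 whenever the value exceeds 10.
values = list(range(1, 21))
labels = [1 if v > 10 else 0 for v in values]
cut, impurity = stump_cut_point(values, labels)
```

Recursing on each side of the chosen threshold yields further cut-points, which is how a deeper tree would discretize the same variable.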

Nanopublication comments:

Further comments:

Review #2 submitted on 30/Apr/2020

By Juerg Blaser

https://orcid.org/0000-0002-4928-1291

Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Average
Suggested Decision: Undecided
Technical Quality of the paper: Average
Presentation: Average
Reviewer's confidence: Medium
Significance: Moderate significance
Background: Reasonable
Novelty: Limited novelty
Data availability: With exceptions that are admissible according to the data availability guidelines, all used and produced data are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

The paper presents a novel mortality risk assessment for patients admitted to the ICU.

Reasons to accept:

A novel algorithm developed by machine learning methods allows for identifying patients at high mortality risk in real time. The results suggest better accuracy compared to established scoring systems.

Reasons to reject:

The results and conclusions presented were derived in a retrospective analysis of a rather old data set from a single center with a relatively low mortality rate. No clinical validation of the algorithm is presented, neither prospectively with a clinical study nor retrospectively using independent data sets obtained from other institutions.
The results and conclusions should be presented with more caution as long as no external validation data can be provided, and the limitations of the study need to be clearly addressed in the discussion. This theoretical study may be of some value for illustrating the potential of machine learning approaches, but it is still a long way from documenting a real-world impact on the efficiency and quality of clinical care and on patients' outcomes.
An app presents the individual risk predictions according to the algorithm. The app appears to be a stand-alone solution. Such decision support tools need to be both (a) highly integrated technically, by interfacing with the clinical information systems to allow for close to real-time updates of all parameters required, and (b) closely integrated into the workflow tools of clinicians fighting work and information overload. The app and the screenshots presented appear to be a proof of concept, without any study of the acceptance of such a solution within an ICU. Presenting the app does not add to the significance of the paper.
Specific comments
It is difficult to understand the effect on the results of data manipulations such as upsampling and missing data replacement. How robust are the results when incomplete data sets are omitted? To what extent were data missing?

Nanopublication comments:

Further comments:

1 Comment

Meta-Review by Editor

Submitted by Tobias Kuhn on Wed, 05/13/2020 - 06:20

Both reviewers commented on serious flaws in the evaluation of the method. There were questions about the novelty of the approach. The reviewers pointed to the lack of comparison with any other state-of-the-art method and of testing on an independent dataset.

Michael Krauthammer (https://orcid.org/0000-0002-4808-1845)

Data Science