Analysis of machine learning methods for COVID-19 detection using serum Raman spectroscopy

Tracking #: 691-1671


Responsible editor: 

Manik Sharma

Submission Type: 

Research Paper


One of the most challenging aspects of the emergent COVID-19 pandemic, caused by SARS-CoV-2 infection, has been the need for large-scale diagnostic testing to detect and track infection rates at the population level. Current tests such as RT-PCR can be low-throughput and labor-intensive. An ultra-fast and accurate means of detecting COVID-19 infection is crucial for healthcare workers to make informed decisions in fast-paced clinical settings. Raman spectra are high-dimensional and feature-rich, and have validated predictive power for identifying human disease, cancer, and bacterial and viral infections. This raises the possibility of training a supervised machine-learning classifier on Raman spectra of patient serum samples to detect COVID-19 infection. We developed a novel stacked subsemble classifier model, coupled with an iteratively validated and automated feature selection and engineering workflow, to predict COVID-19 infection status from Raman spectra of 250 human serum samples, achieving a 10-fold cross-validated classification accuracy of 98.4% (98.6% precision and 95.9% sensitivity). Furthermore, we benchmarked 9 machine learning and artificial neural network models using 8 standalone performance metrics to assess whether ensemble methods offered any improvement over baseline machine learning models. Using rank-normalized scores derived from the performance metrics, the stacked subsemble model ranked higher than the Multi-layer Perceptron, which in turn ranked higher than the 8 other machine learning models. This study serves as a proof of concept that stacked ensemble machine learning models are a powerful predictive tool for COVID-19 diagnostics.
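The abstract does not state how the rank-normalized scores were computed, so the following is only a minimal sketch of one plausible reading: each metric is converted to a rank scaled to [0, 1], and the scaled ranks are averaged per model. The function names, the model names, and all numeric values are hypothetical placeholders, not results from the study, and the sketch assumes that a higher value is better for every metric.

```python
from statistics import mean


def rank_normalize(scores):
    """Map raw metric values to [0, 1] by rank (1.0 = best).

    Assumes higher raw values are better. Tied models share the
    average of the rank positions they occupy.
    """
    names = list(scores)
    n = len(names)
    if n == 1:
        return {names[0]: 1.0}
    ordered = sorted(names, key=lambda m: scores[m])  # worst first
    normalized = {}
    i = 0
    while i < n:
        j = i
        # Extend j to cover every model tied with the one at position i.
        while j + 1 < n and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        avg_position = (i + j) / 2  # zero-based average rank of the tie group
        for k in range(i, j + 1):
            normalized[ordered[k]] = avg_position / (n - 1)
        i = j + 1
    return normalized


def aggregate(metric_table):
    """Average rank-normalized scores across metrics.

    metric_table has the shape {metric_name: {model_name: value}}.
    """
    models = list(next(iter(metric_table.values())))
    per_metric = {name: rank_normalize(vals) for name, vals in metric_table.items()}
    return {m: mean(per_metric[name][m] for name in metric_table) for m in models}


# Hypothetical placeholder values for illustration only (NOT from the study).
toy_metrics = {
    "accuracy": {"model_a": 0.90, "model_b": 0.95, "model_c": 0.85},
    "precision": {"model_a": 0.92, "model_b": 0.91, "model_c": 0.80},
    "sensitivity": {"model_a": 0.88, "model_b": 0.93, "model_c": 0.88},
}

overall = aggregate(toy_metrics)
ranking = sorted(overall, key=overall.get, reverse=True)
```

Averaging ranks rather than raw values keeps metrics with different scales or spreads from dominating the comparison, which may be why a rank-based summary is attractive when combining several performance metrics. Other aggregation choices (for example, median rank) would be equally consistent with the abstract's wording.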



  • Under Review

Data repository URLs: 

Date of Submission: 

Tuesday, March 30, 2021