Supervised Learning Inspired Fast Forecasting Model of 2019-nCoV Outbreak using Small Dataset

Tracking #: 628-1608

Responsible editor: 

Christine Chichester

Submission Type: 

Research Paper


The rapid spread of the 2019 novel coronavirus (2019-nCoV) epidemic poses a threat to society and the global economy. The epidemic caused by this contagious coronavirus has led to the suspension of day-to-day activities such as education, tourism, and community services in provinces of China and its neighboring countries. The real impact of the virus on society largely depends on the momentum of its outbreak. It is therefore imperative to formulate a robust and accurate prediction model to estimate its disastrous repercussions on human lives. Limited understanding of the 2019-nCoV outbreak, and the imprecision this entails, makes framing a prudent forecasting model extraordinarily challenging. This publication presents a collaborative framework combining Machine Learning (ML) and statistical prediction methods to estimate the adverse impact of this virus. The suggested framework offers a high degree of accuracy in evaluating the rise of the 2019-nCoV pandemic in Chinese provinces, with a reasonably small Root Mean Square Error (RMSE) on a small dataset provided by the World Health Organization (WHO).
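The RMSE cited here is the square root of the mean squared difference between observed and predicted values over the evaluated days. A minimal Python sketch, assuming the two series are already aligned day by day (the function name `rmse` is illustrative, not taken from the paper):

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root Mean Square Error: sqrt((1/n) * sum((y_i - yhat_i)^2)).

    Assumes both series are non-empty, of equal length, and day-aligned.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("series must be non-empty and of equal length")
    # Mean of the squared residuals, then the square root.
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return math.sqrt(mse)


# Hypothetical toy numbers, not WHO data.
print(rmse([10, 20, 30], [12, 18, 30]))  # sqrt(8/3), about 1.633
```

Because the errors are squared before averaging, a few large misses dominate the score. On a small dataset this makes RMSE sensitive to which days end up in the evaluated set.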


Supplementary Files (optional): 


  • Reviewed

Data repository URLs: 

Description of the produced data:

  1. The top two graphs of Fig. 2 were created from the numbers of infections and deaths due to the COVID-19 outbreak in China, using WHO data together with augmented data generated by the Linear Regression method. (Provided in the link: %2B Predicted Data using Linear Regression.xlsx).
  2. The same dataset was also used for classification and for calculating the RMSE values of the RFM and MLP methods, as reflected in Tables 2, 3, and 4 and Figs. 3 and 4.
  3. Fig. 5 shows the observed and predicted numbers of deaths induced by the 2019-nCoV outbreak using the ARIMA, ETS, and LR-lag methods. The corresponding datasets are presented in the links. The RMSE values calculated from these datasets for the three methods are given in Table 5.
  4. Using the same source datasets, Fig. 6 was created from the optimized RMSE values of the methods above, i.e., ARIMA, ETS, LR-lag, RFM, and MLP.
  5. The WHO's observed death data and the MLP-lag method's predicted data are plotted in Fig. 7, and the corresponding dataset link is
  6. Fig. 8 shows the observed deaths from the WHO data and the deaths predicted by the BATS model, from which we calculated the RMSE of this model. The corresponding dataset link is
  7. Finally, the RMSE values of the BATS, MLP-lag, and our CFPSD models are plotted in Fig. 9.
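This page does not spell out the Linear Regression augmentation or the "-lag" methods. As a rough illustration only, the sketch below makes two assumptions. First, it assumes augmentation fits a straight line of counts against day index and uses that line to fill days missing from the WHO record. Second, it assumes the "-lag" variants feed the previous few values to the regressor as inputs. The names `fit_line`, `augment`, and `lag_features` are hypothetical. The `is_augmented` flag is also an addition, included so that synthetic points stay distinguishable from real observations.

```python
from typing import Dict, List, Tuple


def fit_line(days: List[float], values: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of value = intercept + slope * day."""
    n = len(days)
    if n < 2:
        raise ValueError("need at least two observed days")
    mean_x = sum(days) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("observed days must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, values))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def augment(observed: Dict[int, float], all_days: List[int]) -> Dict[int, Tuple[float, bool]]:
    """Hypothetical helper: fill days missing from `observed` with
    linear-regression estimates. Returns day -> (value, is_augmented)."""
    days = sorted(observed)
    intercept, slope = fit_line([float(d) for d in days], [observed[d] for d in days])
    filled: Dict[int, Tuple[float, bool]] = {}
    for d in all_days:
        if d in observed:
            filled[d] = (observed[d], False)  # real observation
        else:
            filled[d] = (intercept + slope * d, True)  # synthetic point
    return filled


def lag_features(series: List[float], lags: int) -> List[Tuple[List[float], float]]:
    """Pair each value with the `lags` values before it (inputs for a lag regressor)."""
    return [(series[i - lags:i], series[i]) for i in range(lags, len(series))]


# Toy numbers, not WHO data: day 2 is missing and gets a fitted value.
filled = augment({0: 1.0, 1: 3.0, 3: 7.0}, [0, 1, 2, 3])
print(filled[2])
print(lag_features([1.0, 2.0, 3.0, 4.0], 2))
```

A straight line is a strong simplification for an epidemic curve. The sketch only shows the mechanics of filling gaps and building lagged inputs. It does not reproduce the contents of the linked spreadsheet.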


Date of Submission: 

Friday, April 3, 2020

Date of Decision: 

Tuesday, April 28, 2020



Solicited Reviews:


Meta-Review by Editor

We must reject this paper because the methodology for data augmentation, supposedly a main contribution of the paper, is severely flawed. Additionally, the lack of clarity regarding the use and composition of the test set makes the output of the model difficult to evaluate correctly.

Christine Chichester
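The concern about the test set can be made concrete. For small time series, one common safeguard is a chronological holdout: only real, non-augmented observations are scored, and any augmentation is fitted on the training period alone so that test values cannot leak into synthetic points. The editor does not prescribe this approach; it is offered only as an illustration. A minimal Python sketch follows, in which `chronological_split` and the `(day, value, is_augmented)` tuple layout are hypothetical:

```python
from typing import List, Tuple

Point = Tuple[int, float, bool]  # (day, value, is_augmented)


def chronological_split(points: List[Point], test_days: int) -> Tuple[List[Point], List[Point]]:
    """Hold out the last `test_days` observed days as the test set.

    Training keeps every point (observed or augmented) dated before the first
    test day; the test set keeps only real observations from that day onward.
    Augmented points dated inside the test window are discarded.
    """
    ordered = sorted(points, key=lambda p: p[0])
    observed_days = [day for day, _, is_aug in ordered if not is_aug]
    if not 1 <= test_days <= len(observed_days):
        raise ValueError("test_days must be between 1 and the number of observed days")
    cutoff = observed_days[-test_days]
    train = [p for p in ordered if p[0] < cutoff]
    test = [p for p in ordered if p[0] >= cutoff and not p[2]]
    return train, test


points = [(0, 1.0, False), (1, 3.0, False), (2, 5.0, True), (3, 7.0, False), (4, 9.0, False)]
train, test = chronological_split(points, test_days=2)
print([p[0] for p in train], [p[0] for p in test])  # [0, 1, 2] [3, 4]
```

The split alone does not prevent leakage. The holdout stays honest only if the augmented day 2 in this example was generated without using days 3 and 4.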