This research presents the analysis and forecasting of epidemic patterns in new SARS-CoV-2 positive patients using metaheuristic optimization and a long short-term memory (LSTM) network. The parameters of the LSTM are optimized with the Al-Biruni Earth Radius (BER) algorithm.

To evaluate the effectiveness of the proposed methodology, a dataset was collected from the cases recorded in Saudi Arabia between March 7^{th}, 2020 and July 13^{th}, 2022. Six regression models were included in the experiments as baselines against which the proposed approach is compared. The achieved results show that the proposed approach improves the mean square error (MSE), mean absolute error (MAE), and R^{2} by 5.92%, 3.66%, and 39.44%, respectively, when compared with the six base models. In addition, a statistical analysis is performed to measure the significance of the proposed approach.

The achieved results confirm the effectiveness, superiority, and significance of the proposed approach in predicting the infection cases of COVID-19.

On December 31, 2019, a cluster of pneumonia cases in Wuhan, China, was reported to the World Health Organization (WHO); the disease was later named Coronavirus Disease 2019 (COVID-19). A pandemic was declared on March 11, 2020, after the rapid global spread of COVID-19 [

Several optimization algorithms have been presented during the past decade to enhance the performance of machine learning models. The choice of algorithm generally depends on how well it performs against several criteria, such as computational cost, accuracy, and difficulty of implementation. According to [

In this paper, we propose applying the Al-Biruni Earth Radius (BER) optimization algorithm to optimize the parameters of an LSTM network and thereby improve the prediction of COVID-19 positive cases. The proposed approach is evaluated on a dataset collected from the cases recorded in Saudi Arabia from 7^{th} March 2020 to 13^{th} July 2022, and it is compared with six other regression models to show its effectiveness and superiority. The remainder of this paper is organized as follows. The background of COVID-19 prediction is presented in Section 2. The proposed methodology is discussed in Section 3, followed by an explanation of the achieved results in Section 4. Finally, the conclusions are presented in Section 5.

Time-series data are numerical data in which each value carries a time stamp. A time series can be studied with either statistical or machine learning techniques. The autoregressive integrated moving average (ARIMA) model is commonly used for this purpose, although it is not the only option. ARIMA combines the autoregression (AR) and moving average (MA) models in a single equation; its third component, integration, refers to differencing the series. Another ARIMA extension, known as seasonal ARIMA (SARIMA) [

Authors in [

To build a robust model for predicting COVID-19 positive cases, the LSTM is employed along with the BER optimization algorithm. A dataset is collected to verify the effectiveness of the proposed method. The records of the dataset are preprocessed by deleting empty entries, resolving missing values, and normalizing the data. The steps depicted in

The data of interest were obtained in comma-separated value (CSV) format. Under the International Organization for Standardization (ISO) 3166-1 standard, each country is identified by a country code. The code for Saudi Arabia ("SAU" in the ISO 3166-1 alpha-3 form) may be used to obtain the Saudi Arabian Department of Civil Protection’s dataset from “Our World in Data COVID-19 Cases”. A date range can be specified to narrow the scope of the data further.

Any dataset contains information that is irrelevant to the model. The variable of interest, the number of new positive cases, was extracted during the preprocessing stage. We used autocorrelation analysis to examine the stationarity of the time series.

Functions for building the training and testing datasets take two major arguments: the original dataset and the look-back, i.e., the number of previous time steps used as input variables to forecast the next time period. By default, the input X holds the observations of the look-back steps starting at time t, and the target is the observation at time t + look back. During training, we used a look-back value of seven (7 days, or one week). The data must be reshaped before they are fed to the LSTM model; the final format is [samples, time steps, features]. Each sample was built by looking back over the preceding days' observations, and the time step was one day, since the data were gathered daily.

We divided the data into two sets: 80% of the observations were used for training, while the remaining 20% were used for testing. Afterward, we held the test set aside and randomly selected 80% of the training set as the new training set, while the remaining 20% served as the validation set.
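The windowing and splitting steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the series is synthetic, the reshape places the seven look-back days on the time-step axis with a single feature (placing them on the feature axis instead is an equally common convention), and the 80/20 train/test split is assumed to be chronological because the text does not say how it was drawn.

```python
import numpy as np

def make_windows(series, look_back=7):
    """Pair each run of `look_back` past values with the value that follows it."""
    series = np.asarray(series, dtype=float)
    X, y = [], []
    for t in range(len(series) - look_back):
        X.append(series[t:t + look_back])   # inputs: days t .. t+look_back-1
        y.append(series[t + look_back])     # target: day t+look_back
    # Assumed layout: [samples, time steps, features] with one feature per step.
    return np.array(X).reshape(-1, look_back, 1), np.array(y)

def chronological_split(X, y, train_frac=0.8):
    """Assumption: the first 80% of windows train, the last 20% test."""
    cut = int(len(X) * train_frac)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])

def random_validation_split(X, y, train_frac=0.8, seed=0):
    """Randomly keep 80% of the training windows; the rest form the validation set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = int(len(X) * train_frac)
    return (X[idx[:cut]], y[idx[:cut]]), (X[idx[cut:]], y[idx[cut:]])

# Synthetic stand-in for the daily new-case series (not real data).
daily_cases = np.arange(100, dtype=float)
X, y = make_windows(daily_cases, look_back=7)
(train_X, train_y), (test_X, test_y) = chronological_split(X, y)
(fit_X, fit_y), (val_X, val_y) = random_validation_split(train_X, train_y)
print(X.shape, train_X.shape, test_X.shape, fit_X.shape, val_X.shape)
```

With 100 synthetic observations and a look-back of seven, 93 windows are produced; the test windows are never touched when the validation set is drawn.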

The Long Short-Term Memory (LSTM) model, shown in

where

To increase the accuracy of the COVID-19 prediction, a new approach is proposed by adjusting the hyperparameters of the LSTM. This section begins by presenting the LSTM’s structure and describing which parameters are being improved, followed by presenting the optimization algorithm that is employed to optimize the parameters of LSTM.
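For orientation, a single time step of a standard LSTM cell can be sketched in NumPy as below. This follows the textbook gate equations rather than any specific configuration from this paper: the weight layout (gates stacked as input, forget, candidate, output), the hidden size `H`, and the random initialization are assumptions made for illustration. The hidden size is one example of the kind of hyperparameter an optimizer could tune.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) biases,
    with the four gates stacked in the order [input, forget, candidate, output].
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate
    f = sigmoid(z[H:2 * H])        # forget gate
    g = np.tanh(z[2 * H:3 * H])    # candidate cell state
    o = sigmoid(z[3 * H:4 * H])    # output gate
    c = f * c_prev + i * g         # new cell state
    h = o * np.tanh(c)             # new hidden state
    return h, c

# Run the cell over one 7-day window with a single feature (synthetic values).
rng = np.random.default_rng(0)
D, H = 1, 4                        # H is an illustrative hidden size
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
window = np.linspace(0.0, 1.0, 7).reshape(7, 1)
h, c = np.zeros(H), np.zeros(H)
for x in window:
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)
```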

Optimization algorithms aim to find the best possible solution to a problem subject to given constraints. In BER, an individual from the population is represented as a vector S,

The evaluation of the proposed approach is performed, and the results are explained in this section. The section starts by describing the dataset included in the conducted experiments, followed by the evaluation criteria and an explanation of the achieved results.

The “Our World in Data COVID-19 Cases” dataset (ourworldindata.org/covid-cases, accessed on 13 July 2022) was used in the proposed approach. It covers the new cases registered between 7^{th} March 2020 and 13^{th} July 2022.

The metrics used to assess the proposed methodology, including R^{2}, and their corresponding formulas are presented in the table below.

Metric | Formula |
---|---|
NSE | |
RMSE | |
MAPE | |
MAE | |
NRMSE | |
R^{2} | |
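Since the formula column did not survive, the sketch below gives commonly used definitions of the listed metrics so that the reported quantities can be interpreted. These are assumptions, not the authors' formulas: several of these metrics have more than one convention in use (MAPE with or without the absolute value, NRMSE normalized by the mean or by the range, R^{2} as the squared Pearson correlation or as the coefficient of determination). Here `y` denotes the observed values and `yhat` the predictions.

```python
import math

def mse(y, yhat):
    return sum((a - p) ** 2 for a, p in zip(y, yhat)) / len(y)

def rmse(y, yhat):
    return math.sqrt(mse(y, yhat))

def mae(y, yhat):
    return sum(abs(a - p) for a, p in zip(y, yhat)) / len(y)

def mape(y, yhat):
    # Assumed convention: absolute percentage error; undefined if any y is 0.
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(y, yhat)) / len(y)

def nrmse(y, yhat):
    # Assumed convention: RMSE normalized by the mean of the observations.
    return rmse(y, yhat) / (sum(y) / len(y))

def nse(y, yhat):
    # Nash-Sutcliffe efficiency: 1 - residual variation / variation about the mean.
    mean_y = sum(y) / len(y)
    ss_res = sum((a - p) ** 2 for a, p in zip(y, yhat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def r2(y, yhat):
    # Assumed convention: squared Pearson correlation between y and yhat.
    mean_y, mean_p = sum(y) / len(y), sum(yhat) / len(yhat)
    cov = sum((a - mean_y) * (p - mean_p) for a, p in zip(y, yhat))
    var_y = sum((a - mean_y) ** 2 for a in y)
    var_p = sum((p - mean_p) ** 2 for p in yhat)
    return cov ** 2 / (var_y * var_p)

# Tiny synthetic example (not data from the paper).
observed, predicted = [2.0, 4.0], [1.0, 5.0]
print(rmse(observed, predicted), mae(observed, predicted), mape(observed, predicted))
```

Under these conventions NSE and R^{2} can differ for the same predictions, because NSE penalizes bias while the squared correlation does not.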

To demonstrate the effectiveness and superiority of the proposed approach, several experiments were conducted to predict COVID-19 cases. First, a set of baseline experiments was conducted using six base models: LSTM, BILSTM, GRU, LSTMs, BILSTMs, and CONVLSTMs. The results of these models were then compared with the results achieved by the LSTM optimized with the BER algorithm.

Model | LSTM | BILSTM | GRU | LSTMs | BILSTMs | CONVLSTMs | BER/LSTM |
---|---|---|---|---|---|---|---|
MSE train | 21463.01 | 24095.68 | 26017.09 | 79823.45 | 27384.12 | 24669.21 | |
MSE test | 48329.74 | 63435.45 | 62599.56 | 84688.31 | 89843.57 | 753748.1 | 45965.88 |
RMSE train | 146.5 | 155.23 | 161.3 | 282.53 | 165.48 | 157.06 | |
RMSE test | 219.84 | 251.86 | 250.2 | 291.01 | 299.74 | 868.19 | |
MAE train | 90.96 | 81.86 | 96.03 | 196.55 | 93.71 | 108.81 | |
MAE test | 114.62 | 107.79 | 140.99 | 153.64 | 125.02 | 375.86 | 103.84 |
R^{2} train | 0.98 | 0.97 | 0.97 | 0.91 | 0.97 | 0.97 | |
R^{2} test | 0.98 | 0.97 | 0.97 | 0.96 | 0.96 | 0.71 | 0.99 |
RRMSE train | 0.16 | 0.17 | 0.18 | 0.31 | 0.18 | 0.16 | |
RRMSE test | 0.26 | 0.3 | 0.29 | 0.31 | 0.35 | 0.94 | |
MAPE train | 38.4 | −44.63 | −23.84 | 186.69 | −67.3 | −7.9 | |
MAPE test | 29.7 | −54.78 | −8.82 | 109.72 | −91.66 | −195.08 | 26.8 |
NSE train | 0.98 | 0.97 | 0.97 | 0.91 | 0.97 | 0.97 | |
NSE test | 0.98 | 0.97 | 0.97 | 0.91 | 0.96 | 0.71 | 0.99 |

As presented in the table, the proposed approach achieves the best values over all the evaluation criteria, which confirms its superiority. The MSE on the test set using the proposed approach is 45965.88, whereas the best MSE achieved by the base models is 48329.74. In addition, the MAE, R^{2}, MAPE, and NSE on the test set using the proposed approach are 103.84, 0.99, 26.8, and 0.99, respectively. These values demonstrate the effectiveness of the proposed approach.
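The size of these gains can be expressed as a relative reduction against the best base model for each metric. The sketch below uses only the test-set values reported above; the helper name `relative_reduction` and the choice of the best single base model as the baseline are assumptions made for illustration.

```python
def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline

# Test-set errors of the six base models, as reported in the results table.
base_mse_test = {"LSTM": 48329.74, "BILSTM": 63435.45, "GRU": 62599.56,
                 "LSTMs": 84688.31, "BILSTMs": 89843.57, "CONVLSTMs": 753748.1}
base_mae_test = {"LSTM": 114.62, "BILSTM": 107.79, "GRU": 140.99,
                 "LSTMs": 153.64, "BILSTMs": 125.02, "CONVLSTMs": 375.86}

# Test-set errors of the proposed BER/LSTM model, as quoted in the text.
proposed_mse, proposed_mae = 45965.88, 103.84

mse_gain = relative_reduction(min(base_mse_test.values()), proposed_mse)
mae_gain = relative_reduction(min(base_mae_test.values()), proposed_mae)
print(round(mse_gain, 2))  # 4.89
print(round(mae_gain, 2))  # 3.66
```

Measured this way, the MAE reduction matches the 3.66% quoted at the start of the paper. The MSE reduction against the best single base model comes out lower than the quoted figure, so that percentage presumably rests on a different baseline.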

In addition, a statistical analysis is performed to further investigate the effectiveness of the proposed approach.

Statistic | LSTM | BILSTM | GRU | LSTMs | BILSTMs | CONVLSTMs | BER/LSTM |
---|---|---|---|---|---|---|---|
Num values | 13 | 13 | 13 | 13 | 13 | 13 | |
Minimum | 217.8 | 250.9 | 250.2 | 290 | 298.7 | 868.2 | |
25% percentile | 219.8 | 251.9 | 250.2 | 291 | 299.7 | 868.2 | |
Median | 219.8 | 251.9 | 250.2 | 291 | 299.7 | 868.2 | |
75% percentile | 219.8 | 251.9 | 250.7 | 291 | 299.9 | 876.7 | |
Maximum | 222.8 | 255 | 261 | 297 | 305 | 898.2 | |
Range | 5 | 4.14 | 10.8 | 7 | 6.26 | 30 | |
Mean | 219.8 | 252.1 | 251.5 | 291.7 | 300.3 | 873.3 | |
Std. deviation | 1.08 | 0.9718 | 3.155 | 1.974 | 1.589 | 9.59 | |
Std. err of mean | 0.2996 | 0.2695 | 0.8752 | 0.5475 | 0.4408 | 2.66 | |
Sum | 2858 | 3277 | 3269 | 3792 | 3903 | 11353 | |
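Descriptive statistics of this kind can be computed from any list of repeated-run scores. The sketch below is illustrative only: the function name `describe` and the sample scores are hypothetical, the standard deviation is assumed to be the sample (n − 1) form, and percentiles are left out because their definition varies between software packages.

```python
import math
import statistics

def describe(values):
    """Summary statistics for one model's repeated-run scores."""
    n = len(values)
    sd = statistics.stdev(values)          # sample standard deviation (n - 1)
    return {
        "n": n,
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
        "range": max(values) - min(values),
        "mean": statistics.mean(values),
        "std": sd,
        "sem": sd / math.sqrt(n),          # standard error of the mean
        "sum": sum(values),
    }

# Hypothetical scores from five runs (not values from the paper).
summary = describe([10.0, 10.5, 9.8, 10.2, 10.0])
print(summary["mean"], summary["std"], summary["sem"])

# Consistency check on the tabulated LSTM column: SEM = SD / sqrt(n).
print(round(1.08 / math.sqrt(13), 4))
```

The last line reproduces the tabulated standard error of the LSTM column to within rounding.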

Moreover, the one-way analysis of variance (ANOVA) and the Wilcoxon signed rank tests are performed to study the stability of the proposed approach. The results of these tests are presented in

ANOVA | SS | DF | MS | F (DFn, DFd) |
---|---|---|---|---|
Between columns | 4388558 | 6 | 731426 | F (6, 84) = 46279 |
Within columns | 1328 | 84 | 15.8 | |
Total | 4389885 | 90 | | |
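The entries of a one-way ANOVA table follow from the group data by a short calculation: each mean square is a sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. The sketch below implements that calculation in plain Python; the function name and the small example groups are hypothetical, and the final lines only re-derive the tabulated F value from the rounded sums of squares shown above.

```python
def one_way_anova(groups):
    """Return SS, DF, MS and F for a one-way ANOVA over a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {"ss_between": ss_between, "ss_within": ss_within,
            "df_between": df_between, "df_within": df_within,
            "ms_between": ms_between, "ms_within": ms_within,
            "F": ms_between / ms_within}

# Hypothetical groups (not data from the paper).
result = one_way_anova([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
print(result["F"])

# Re-deriving F from the rounded table entries: 7 models x 13 runs gives DF 6 and 84.
f_from_table = (4388558 / 6) / (1328 / 84)
print(round(f_from_table))
```

The value recomputed from the rounded sums of squares agrees with the reported F (6, 84) = 46279 to within about 0.03%, which is consistent with rounding in the table.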

Statistic | LSTM | BILSTM | GRU | LSTMs | BILSTMs | CONVLSTMs | BER/LSTM |
---|---|---|---|---|---|---|---|
Actual median | 219.8 | 251.9 | 250.2 | 291 | 299.7 | 868.2 | 201.3 |
Theoretical median | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Number of values | 13 | 13 | 13 | 13 | 13 | 13 | 13 |
Sum of positive ranks | 91 | 91 | 91 | 91 | 91 | 91 | 91 |
Sum of signed ranks | 91 | 91 | 91 | 91 | 91 | 91 | 91 |
P value (two-tailed) | 0.0002 | 0.0002 | 0.0002 | 0.0002 | 0.0002 | 0.0002 | 0.0002 |
Exact or estimate? | Exact | Exact | Exact | Exact | Exact | Exact | Exact |
P value summary | *** | *** | *** | *** | *** | *** | *** |
Significant | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
Discrepancy | 219.8 | 251.9 | 250.2 | 291 | 299.7 | 868.2 | 201.3 |

On the other hand, more results are shown in the plots depicted in

More plots are shown in

The Al-Biruni Earth Radius (BER) metaheuristic optimization algorithm was used in this research to improve the performance of the standard LSTM network in the analysis and forecasting of SARS-CoV-2 (COVID-19) positive cases. The proposed approach was tested on a Saudi Arabian dataset collected from an official data source, and its performance was evaluated using six key performance indicators. In addition, its performance was compared with that of six other prediction models to show its superiority. A set of statistical analysis experiments, including ANOVA and Wilcoxon tests, was also conducted to show the significance of the proposed approach. The recorded results confirmed the effectiveness, superiority, and significance of the proposed approach. By including numerous rates of contagiousness, as well as personal and clinical data sets (e.g., clustering data for ages and comorbidities, susceptible patients, and statistics on mobility), future research might improve the monitoring of SARS-CoV-2 variants.

Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2022R120), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

The authors declare that they have no conflicts of interest to report regarding the present study.