The COVID-19 outbreak began in December 2019 and was declared a global health emergency by the World Health Organization. The four most dominant variants are Beta, Gamma, Delta, and Omicron. After the administration of vaccine doses, a marked decline in new cases has been observed. The COVID-19 vaccine induces neutralizing antibodies and T-cells in the body. However, strong variants such as Delta and Omicron tend to escape the neutralizing antibodies elicited by COVID-19 vaccination. It is therefore essential to study, analyze, and, most importantly, predict the response of SARS-CoV-2-derived T-cell epitopes against COVID-19 variants in vaccinated and unvaccinated persons. Machine learning can be used effectively for this prediction. In this study, the response of T-cell epitopes was predicted for vaccinated and unvaccinated people for the Beta, Gamma, Delta, and Omicron variants. The dataset was divided into two classes, vaccinated and unvaccinated, and the predicted T-cell epitope response was divided into three categories: Strong, Impaired, and Over-activated. For this purpose, a self-proposed Bayesian neural network was designed by combining variational inference with a normalizing-flow optimizer. A Hidden Markov Model was also trained on the same dataset so that the results of the self-proposed Bayesian neural network could be compared with this state-of-the-art statistical approach. Extensive experimentation and results demonstrate the efficacy of the proposed network in terms of accurate prediction and reduced error.

Since its outbreak, the COVID-19 virus has affected our society and economies in several ways [

The response of the human body’s immune system plays a critical role against viruses or other foreign particles [

It is still under investigation whether the Omicron variant completely evades the T-cell immunity elicited by COVID-19 vaccination. It is therefore essential to study, analyze, and, most importantly, predict the response of SARS-CoV-2-derived T-cell epitopes against different COVID-19 variants in vaccinated and unvaccinated persons. Machine learning methods can effectively predict this response, which could support vaccine development studies by providing realistic prediction data.

This paper focuses on machine learning-based prediction of the T-cell epitope response against different COVID-19 variants in vaccinated and unvaccinated persons. The prediction analysis has been carried out using a self-proposed Bayesian neural network that combines variational inference with a normalizing-flow optimizer. A Hidden Markov Model (HMM) has also been trained on the same dataset so that the results of the self-proposed Bayesian neural network can be compared with this state-of-the-art statistical approach. Extensive experimentation and results demonstrate the efficacy of the proposed network in terms of accurate prediction and reduced error.

The main contributions of this paper are:

A self-proposed Bayesian neural network with variational inference and a normalizing-flow optimizer has been developed to predict the response of T-cell epitopes against four variants of COVID-19 in both vaccinated and unvaccinated persons.

The self-proposed Bayesian neural network has been compared with the Hidden Markov Model for validation purposes.

To cross-validate the variational inference of the proposed Bayesian neural network, it has been compared with the Monte Carlo method. Similarly, the HMM has been cross-validated with the Support Vector Machine (SVM).

In the literature, Bayesian neural networks and Hidden Markov Models have been applied to several related problems; for instance, a shallow long short-term memory (LSTM) based neural network has been proposed in [

Hidden Markov models (HMMs) capture randomness in spatio-temporal dynamics and uncertainty in observations. In recent literature, several articles use Hidden Markov Models for the prediction analysis of COVID data; for example, in [

While Bayesian networks (BNs) have drawn growing scientific interest, their use in practice has lagged, despite their potential to impact healthcare positively. The variety of medical conditions for which healthcare-related BN models have been presented has grown, as has the range of approaches taken by the models when applied to the most prevalent medical conditions. According to recent studies [

After an extensive literature review, the research gap identified, to the authors’ knowledge, is that machine learning-based prediction of the T-cell epitope response against COVID-19 has not been performed across four different variants. Hence, in this research, a robust analysis was conducted on four COVID-19 variants to predict the trend of upcoming variants or similar diseases using imbalanced and new datasets in a resource-constrained environment, i.e., with small datasets and low computational power.

In this study, the dataset has been divided into two categories, i.e., vaccinated and unvaccinated. Microarray datasets from the Gene Expression Omnibus of the National Center for Biotechnology Information were used to extract the raw gene data of “SARS-CoV” [

Variants | Vaccinated T-cell epitope (affected) | Non-vaccinated T-cell epitope (affected) | Vaccinated T-cell epitope (not affected/mildly affected) |
---|---|---|---|
Beta | 234 | 1749 | 1108 |
Gamma | 228 | 1612 | 1161 |
Delta | 1208 | 2948 | 1556 |
Omicron | 111 | 312 | 198 |
Total | 1781 | 6621 | 4023 |
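The imbalance mentioned for this dataset can be made concrete with a short calculation. The sketch below copies the counts from the table above; the dictionary keys and function names are shorthand introduced here for illustration and do not come from the original study.

```python
# Sample counts copied from the dataset table; key names are illustrative shorthand.
COUNTS = {
    "Beta": {"vaccinated_affected": 234, "unvaccinated_affected": 1749, "vaccinated_mild": 1108},
    "Gamma": {"vaccinated_affected": 228, "unvaccinated_affected": 1612, "vaccinated_mild": 1161},
    "Delta": {"vaccinated_affected": 1208, "unvaccinated_affected": 2948, "vaccinated_mild": 1556},
    "Omicron": {"vaccinated_affected": 111, "unvaccinated_affected": 312, "vaccinated_mild": 198},
}


def column_totals(counts):
    """Sum every column over all variants (the 'Total' row of the table)."""
    totals = {}
    for row in counts.values():
        for column, value in row.items():
            totals[column] = totals.get(column, 0) + value
    return totals


def variant_shares(counts):
    """Fraction of all samples that belongs to each variant."""
    per_variant = {name: sum(row.values()) for name, row in counts.items()}
    grand_total = sum(per_variant.values())
    return {name: n / grand_total for name, n in per_variant.items()}


totals = column_totals(COUNTS)
shares = variant_shares(COUNTS)
for name, share in shares.items():
    print(f"{name}: {share:.1%} of all samples")
```

Under these counts, Omicron contributes only about 5% of the samples while Delta contributes roughly 46%, which is the kind of imbalance a small-data method has to cope with.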

The overall design approach structure is shown in

The proposed Bayesian neural network approach helps to enhance the predictability of COVID-19 variants and their effects on T-cell epitopes. The method is especially beneficial for the Omicron variant, where it allows researchers to carry out a sophisticated analysis with limited data. In contrast to a standard neural network, which learns a single point estimate for each weight, a Bayesian neural network places a probability distribution over its weights and marginalizes over them when making predictions. This also yields an approximate predictive analysis. In this paper, the Bayesian neural network is implemented by integrating variational inference with deep learning. First, the TensorFlow Probability library is imported and a sequential model is initialized. The model is divided into three parts, i.e., the input layer, the hidden layer, and the output layer. Variational inference is applied as a dropout layer to obtain an approximate prediction analysis of the effect of different COVID-19 variants on T-cell epitopes. Moreover, variational inference is optimized with a normalizing-flow surrogate to enhance the results.
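To illustrate what it means for weights to follow distributions rather than single values, the following is a minimal sketch in plain Python. It is not the TensorFlow Probability model used in the study: it assumes independent Gaussian weight distributions (a mean-field surrogate), untrained and randomly initialized parameters, a hypothetical four-feature input, and hypothetical layer sizes. It only shows how weights are sampled with the reparameterization trick and how predictions over the three response categories are averaged over weight samples.

```python
import math
import random

CLASSES = ("Strong", "Impaired", "Over-activated")


def softplus(x):
    """Map an unconstrained value to a positive standard deviation."""
    return math.log1p(math.exp(x))


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class VariationalDense:
    """Dense layer whose weights follow independent Gaussians N(mu, sigma^2)."""

    def __init__(self, n_in, n_out, rng):
        self.rng = rng
        # Untrained, illustrative parameters (assumption): small random means,
        # small standard deviations via sigma = softplus(rho).
        self.mu = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        self.rho = [[-3.0] * n_in for _ in range(n_out)]

    def forward(self, x):
        # Reparameterization: w = mu + sigma * eps with eps ~ N(0, 1),
        # so every call uses a fresh weight sample.
        out = []
        for mu_row, rho_row in zip(self.mu, self.rho):
            weights = [m + softplus(r) * self.rng.gauss(0.0, 1.0)
                       for m, r in zip(mu_row, rho_row)]
            out.append(sum(w * xi for w, xi in zip(weights, x)))
        return out


def predict(layers, x, n_samples=200):
    """Monte Carlo predictive mean: average class probabilities over weight samples."""
    sums = [0.0] * len(CLASSES)
    for _ in range(n_samples):
        hidden = x
        for index, layer in enumerate(layers):
            hidden = layer.forward(hidden)
            if index < len(layers) - 1:
                hidden = [math.tanh(v) for v in hidden]
        sums = [s + p for s, p in zip(sums, softmax(hidden))]
    return [s / n_samples for s in sums]


rng = random.Random(0)
layers = [VariationalDense(4, 8, rng), VariationalDense(8, len(CLASSES), rng)]
features = [1.0, 0.0, 0.0, 1.0]  # hypothetical encoded sample
probs = predict(layers, features)
print(dict(zip(CLASSES, probs)))
```

In a trained model the means and standard deviations would be fitted by maximizing the ELBO; here they are fixed, so the printed probabilities carry no biological meaning.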

In this study, the surrogate is built using a normalizing flow, which makes it possible to construct custom probability distributions. This differs from the Bijector-based surrogate, in which a new Bijector transformation (also known as Inverse Autoregressive Flows (IAF)) has to be established, and which has been implemented in [

The HMM was built with the PyTorch library, using forward and backward propagation techniques to minimize the difference between the predicted total probability and the actual figures. Prediction analysis was carried out on an NVIDIA Tesla P100 PCIe 16 GB graphics processing unit (GPU) with a base clock of 1190 MHz and a memory clock of 715 MHz. The P100 is a dual-slot card that draws power from one 8-pin power connector, with a maximum rated power draw of 250 W.
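As a rough illustration of the forward and backward passes of an HMM, the sketch below implements the classic forward-backward recursions in plain Python rather than PyTorch. The two hidden states, the transition and emission matrices, and the observation sequence are hypothetical values chosen for the example; observation symbols 0, 1, and 2 are assumed to stand for Strong, Impaired, and Over-activated responses.

```python
def forward(obs, start, trans, emit):
    """alpha[t][j] = P(obs[0..t], state_t = j)."""
    n = len(start)
    alpha = [[start[j] * emit[j][obs[0]] for j in range(n)]]
    for symbol in obs[1:]:
        prev = alpha[-1]
        alpha.append([
            emit[j][symbol] * sum(prev[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ])
    return alpha


def backward(obs, trans, emit):
    """beta[t][i] = P(obs[t+1..] | state_t = i)."""
    n = len(trans)
    beta = [[1.0] * n]
    for symbol in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [
            sum(trans[i][j] * emit[j][symbol] * nxt[j] for j in range(n))
            for i in range(n)
        ])
    return beta


def state_posteriors(alpha, beta):
    """P(state_t = i | all observations) for every time step."""
    evidence = sum(alpha[-1])
    return [[a * b / evidence for a, b in zip(a_row, b_row)]
            for a_row, b_row in zip(alpha, beta)]


# Hypothetical parameters for illustration only.
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.7, 0.2, 0.1], [0.2, 0.5, 0.3]]
obs = [0, 1, 2, 0]

alpha = forward(obs, start, trans, emit)
beta = backward(obs, trans, emit)
posteriors = state_posteriors(alpha, beta)
print("sequence likelihood:", sum(alpha[-1]))
```

The forward and backward passes must agree on the sequence likelihood, which gives a simple consistency check for any implementation of this kind.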

The HMM architecture depicting the emission probabilities is shown in

The training performance of the model was evaluated with different parameters. These included prediction analysis, mean losses, median losses, IAF surrogate losses, and the prediction density with respect to the intercept.

First, the response of T-cell epitopes to the different variants was predicted, after rigorous training, for unvaccinated people. Overall, the analysis predicted that the Delta variant impaired T-cell epitopes more than the Omicron variant did, as shown in

Variant | Strong T-cell epitope | Impaired T-cell epitope | Over-activated T-cell epitope |
---|---|---|---|
% Prediction effect by Beta | 41.22% | 29.1% | 16% |
% Prediction effect by Gamma | 55.046% | 34.58% | 16.15% |
% Prediction effect by Delta | 58.17% | 51.91% | 23.89% |
% Prediction effect by Omicron | 41.4% | 37.13% | 11.99% |

Similar training was then performed on the vaccinated group. The results indicated that the T-cell epitope response was less affected by Omicron, which suggests that T-cells are attacking the Omicron variant or protecting the body from it, as shown in

Variant | Strong T-cell epitope | Impaired T-cell epitope | Over-activated T-cell epitope |
---|---|---|---|
% Prediction effect by Beta | 69.5% | 13.2% | 13.1% |
% Prediction effect by Gamma | 74.07% | 8.76% | 18.9% |
% Prediction effect by Delta | 68.17% | 31.23% | 24.111% |
% Prediction effect by Omicron | 73.19% | 28.2% | 9.2% |

While comparing

The broader aim of this study is an in-depth comparison of the HMM and the self-proposed model in terms of accuracy and computational cost (discussed in a later section). The accuracies of the state-of-the-art HMM and the Bayesian model are close to each other and reasonable. Nevertheless, to cross-verify the methodology of the self-proposed Bayesian model, further cross-validation analyses are performed in the following sections.

To assess convergence, the progress of the model fitting process was followed. The optimization can be observed through the loss function during fitting. The mean squared error (MSE) loss function has been used. The loss plots of the IAF surrogate in

The evidence lower bound (ELBO, also known as the variational lower bound) is a lower bound on the log-likelihood of the observed data under a given model. It is an important quantity to monitor because it indicates how well the surrogate density performs for a specific prediction model, for example, the prediction analysis of variants with respect to T-cell epitopes in our study.
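As a compact reference, the bound can be written out explicitly. The symbols are notation chosen here rather than taken from the original: $x$ denotes the observed data, $\theta$ the model parameters, and $q(\theta)$ the surrogate density.

$$
\log p(x) = \mathrm{ELBO}(q) + \mathrm{KL}\big(q(\theta)\,\|\,p(\theta \mid x)\big) \ge \mathrm{ELBO}(q),
\qquad
\mathrm{ELBO}(q) = \mathbb{E}_{q(\theta)}\big[\log p(x, \theta) - \log q(\theta)\big].
$$

Because the KL term is non-negative, a larger ELBO means the surrogate is closer to the true posterior. When the surrogate is a flow, a sample is obtained as $\theta = f(z)$ with $z \sim q_0$ for an invertible map $f$, and its density follows from the change of variables formula:

$$
\log q(\theta) = \log q_0(z) - \log\left|\det \frac{\partial f}{\partial z}\right|.
$$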

The sample size was then increased to check for better convergence of the model. The ELBO estimated for this model is −661.4; compared with the earlier loss values, it appears plausible that the ELBO should be slightly lower. However, the difference is minor and arguably negligible. Even though increasing the sample size to 30 raises the noise, the ELBO value increases only slightly. The drawback of noise is that it can disturb the stability of the whole case, as shown in

This analysis raises the question of whether normalizing flows perform better in variational inference. Because the ELBO value did not differ much across sample points, the estimated densities were compared for the prediction of all COVID-19 variants against the T-cell epitope dataset. We compare density estimates from the VI IAF model with those from Just Another Gibbs Sampler (JAGS), which uses MCMC, to see whether normalizing flows have helped with our current modelling difficulty. Because it does not use any surrogate density, Markov chain Monte Carlo sampling is the preferred approach here, as shown in

The training performance of the model was evaluated with different parameters. These included prediction analysis, comparison analysis with SVM, and processing analysis.

First, the response of T-cell epitopes to the different variants was predicted, after rigorous training, for unvaccinated people. Overall, the results predicted that the Delta variant impaired T-cell epitopes more than the Omicron variant did, as shown in

Variant | Strong T-cell epitope | Impaired T-cell epitope | Over-activated T-cell epitope |
---|---|---|---|
% Prediction effect by Beta | 47.32% | 32.67% | 15.41% |
% Prediction effect by Gamma | 56.44% | 31.23% | 17.55% |
% Prediction effect by Delta | 31.85% | 56.56% | 25.78% |
% Prediction effect by Omicron | 44.12% | 39.22% | 14.48% |

Similarly, the same training methodology was applied to the vaccinated group, and the results indicated that the T-cell epitope response was less affected by Omicron. This suggests that T-cells might be protecting the body from the Omicron variant, as shown in

Variant | Strong T-cell epitope | Impaired T-cell epitope | Over-activated T-cell epitope |
---|---|---|---|
% Prediction effect by Beta | 73.11% | 11.8% | 5.06% |
% Prediction effect by Gamma | 71.01% | 7.22% | 9.88% |
% Prediction effect by Delta | 63.67% | 33.001% | 14.11% |
% Prediction effect by Omicron | 71.9% | 26.90% | 9.64% |

In

SVM categorizes data points even when they are not linearly separable by mapping the data to a high-dimensional feature space. Once a separator between the categories has been found, the data are transformed so that the separator can be represented as a hyperplane. Therefore, cross-validation with SVM was carried out to cross-check the HMM on this dataset, and the results were considerable, as shown in ^{th} bulk epoch size. A support vector machine is a suitable validation method because it helps map high-dimensional points under dataset imbalance [

The comparison analysis was carried out as given in

Variant (%) | Strong T-cell epitope (HMM) | Strong T-cell epitope (BNN) | % error | Impaired T-cell epitope (HMM) | Impaired T-cell epitope (BNN) | % error | Over-activated T-cell epitope (HMM) | Over-activated T-cell epitope (BNN) | % error |
---|---|---|---|---|---|---|---|---|---|
Prediction effect by Beta | 73.11% | 69.5% | 3.61% | 11.8% | 13.2% | 1.4% | 5.06% | 13.1% | 8.04% |
Prediction effect by Gamma | 71.01% | 74.07% | 3.06% | 7.22% | 8.76% | 1.54% | 9.88% | 18.9% | 9.02% |
Prediction effect by Delta | 63.67% | 68.17% | 4.5% | 33.001% | 31.23% | 1.77% | 14.11% | 24.111% | 10% |
Prediction effect by Omicron | 71.9% | 73.19% | 1.29% | 26.9% | 28.2% | 1.3% | 9.64% | 9.2% | 0.44% |
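The "% error" columns in this table match the absolute difference, in percentage points, between the HMM and BNN predictions for the vaccinated group. The sketch below reproduces that arithmetic from the table values; the variable and function names are illustrative, and rounding to two decimals is an assumption made to match the precision shown in the table.

```python
# Vaccinated-group predictions (%), ordered as (Strong, Impaired, Over-activated).
HMM = {
    "Beta": (73.11, 11.8, 5.06),
    "Gamma": (71.01, 7.22, 9.88),
    "Delta": (63.67, 33.001, 14.11),
    "Omicron": (71.9, 26.9, 9.64),
}
BNN = {
    "Beta": (69.5, 13.2, 13.1),
    "Gamma": (74.07, 8.76, 18.9),
    "Delta": (68.17, 31.23, 24.111),
    "Omicron": (73.19, 28.2, 9.2),
}


def absolute_gaps(hmm, bnn):
    """Absolute HMM-vs-BNN difference per variant and category, in percentage points."""
    return {
        variant: tuple(round(abs(h - b), 2) for h, b in zip(hmm[variant], bnn[variant]))
        for variant in hmm
    }


def mean_gap_per_category(gaps):
    """Average gap over the four variants for each response category."""
    columns = list(zip(*gaps.values()))
    return tuple(sum(column) / len(column) for column in columns)


gaps = absolute_gaps(HMM, BNN)
mean_gaps = mean_gap_per_category(gaps)
for variant, values in gaps.items():
    print(variant, values)
print("mean gap (Strong, Impaired, Over-activated):", mean_gaps)
```

Averaged over the four variants, the two models disagree least on the Impaired category and most on the Over-activated category.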

A comparison analysis was conducted to check whether the proposed Bayesian neural network behaves as intended. Compared with the state-of-the-art Hidden Markov Model, the % error was relatively low for the strong and impaired T-cell epitope categories. Furthermore, an advantage of the self-proposed Bayesian neural network over the Hidden Markov Model was that it required no data cleaning, whereas the Hidden Markov Model required data cleaning to reach its accuracy. Moreover, the processing time of the self-proposed model was lower than that of the Hidden Markov Model and of other existing Bayesian neural networks. Similarly, cross-check optimizations were performed for the Bayesian neural network, and cross-check validation was performed with the state-of-the-art Monte Carlo method. It was observed that the Bayesian neural network performed exceptionally well. The graphical trend is shown in

Training losses were computed over 10,000 training steps, both before and after model optimization. After optimization of the Bayesian model, training losses were significantly reduced, as shown in

Average Processing time and computational cost at different states were observed in

Variant prediction analysis | Bayesian model (epoch = 1000) | Hidden Markov model (epoch = 1000) |
---|---|---|
Gamma | 2 h 46 min | 2 h 51 min |
Beta | 2 h 39 min | 2 h 44 min |
Delta | 3 h 56 min | 3 h 59 min |
Omicron | 49 min | 1 h 5 min |
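The time differences in this table are easier to compare once everything is expressed in minutes. The short sketch below does that conversion using the table values; the helper and variable names are illustrative only.

```python
def to_minutes(hours, minutes):
    """Convert an 'h min' duration into minutes."""
    return 60 * hours + minutes


# (Bayesian model, Hidden Markov model) processing times at 1000 epochs, in minutes.
TIMES = {
    "Gamma": (to_minutes(2, 46), to_minutes(2, 51)),
    "Beta": (to_minutes(2, 39), to_minutes(2, 44)),
    "Delta": (to_minutes(3, 56), to_minutes(3, 59)),
    "Omicron": (to_minutes(0, 49), to_minutes(1, 5)),
}

# Minutes saved by the Bayesian model, and the saving relative to the HMM time.
savings = {variant: hmm - bnn for variant, (bnn, hmm) in TIMES.items()}
relative = {variant: (hmm - bnn) / hmm for variant, (bnn, hmm) in TIMES.items()}

for variant in TIMES:
    print(f"{variant}: {savings[variant]} min saved ({relative[variant]:.1%})")
```

The absolute saving is a few minutes for Beta, Gamma, and Delta, and it is largest, both in minutes and in relative terms, for the small Omicron dataset.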

This paper presents a self-proposed Bayesian neural network with variational inference and normalizing flows for predicting the response of T-cell epitopes against four COVID-19 variants, i.e., Beta, Delta, Gamma, and Omicron, for vaccinated and unvaccinated samples. The dataset was divided into two categories: vaccinated and unvaccinated. In the unvaccinated group, the least affected variant predicted by the self-proposed Bayesian neural network was Beta, at 29.1% in the impaired category. In the vaccinated group, the least affected variant was Gamma. A comparison with the Hidden Markov Model was made to validate the results further, and its results were similar to those of the self-proposed model. An additional advantage of the self-proposed network was its reduced computational complexity compared with the standard Hidden Markov Model. Furthermore, in the unvaccinated category, the Delta variant was predicted to produce the most impaired T-cell epitopes, at more than 50%. In the vaccinated category, this percentage was considerably lower for all variants, especially Delta. For future work, the proposed algorithm can be deployed on real-time devices, and the network can also be trained in a distributed manner by exploiting federated learning techniques. In clinical practice, such algorithms on real-time devices can help predict the impact of COVID-19 variants on vaccinated and unvaccinated people.

The authors extend their appreciation to the Deanship of Scientific Research at Imam Mohammad Ibn Saud Islamic University for funding this work through Research Group No. RG-21-07-05.


The authors declare that they have no conflicts of interest to report regarding the present study.