Quantum machine learning is attracting great interest in a wide range of fields due to its potentially superior performance and capabilities. The massive increase in computational capacity and speed offered by quantum computers can lead to a quantum leap in the healthcare field. Heart disease seriously threatens human health, since it is the leading cause of death worldwide. Quantum machine learning methods can offer effective solutions to predict heart disease and aid in early diagnosis. In this study, an ensemble machine learning model based on quantum machine learning classifiers is proposed to predict the risk of heart disease. The proposed model is a bagging ensemble learning model in which a quantum support vector classifier is used as the base classifier. Furthermore, to make the model’s outcomes more explainable, the importance of every feature in the prediction is computed and visualized using the SHapley Additive exPlanations (SHAP) framework. In the experimental study, stand-alone quantum classifiers, namely Quantum Support Vector Classifier (QSVC), Quantum Neural Network (QNN), and Variational Quantum Classifier (VQC), are applied and compared with classical machine learning classifiers such as Support Vector Machine (SVM) and Artificial Neural Network (ANN). The experimental results on the Cleveland dataset reveal the superiority of QSVC over the others, which explains its use in the proposed bagging model. The Bagging-QSVC model outperforms all aforementioned classifiers with an accuracy of 90.16% while remaining competitive with some state-of-the-art models evaluated on the same dataset. The results of the study indicate that quantum machine learning classifiers perform better than classical machine learning classifiers in predicting heart disease. In addition, the study reveals that the bagging ensemble learning technique is effective in improving the prediction accuracy of quantum classifiers.

Healthcare is among the fields with the greatest influence on the safety of the global population. The continuous evolution of the healthcare sector facilitates disease prediction, diagnosis, treatment, and cure. Advances in research and technology in healthcare and public health have significantly decreased global mortality, with advanced healthcare systems helping to prevent disease progression and improve quality of life. However, the healthcare sector has recently experienced an explosion of data and increased system complexity. The scope and quality of healthcare data open up new opportunities for healthcare practitioners to use advances in data science to extract valuable insights from these enormous databases. Advancements in data analytics, computing power, and algorithms are rapidly changing the prospects of healthcare by improving clinical and operational decision-making [

Cardiovascular disease (CVD) is a medical term for diseases that affect the heart and blood vessels and can lead to heart attack, stroke, and heart failure. According to the World Health Organization (WHO), CVDs cause approximately 17.9 million deaths annually, or 32% of all deaths worldwide. Among the risk factors for these diseases are high blood pressure, high blood glucose levels, obesity, and high blood lipid levels [

This research followed a four-step methodology. First, the data were prepared using multiple pre-processing techniques, including feature selection, feature extraction, and normalization. Second, the performance of classical machine learning classifiers, namely Support Vector Classifier (SVC) and Artificial Neural Network (ANN), was compared with that of quantum machine learning classifiers to investigate the potential of QML models relative to traditional ones. Third, three distinct quantum machine learning classifiers were applied, namely Quantum Support Vector Classifier (QSVC), Quantum Neural Network (QNN), and Variational Quantum Classifier (VQC), with the objective of identifying the best-performing QML model for the given dataset. Finally, a Quantum Support Vector Classifier-based bagging ensemble learning model (Bagging-QSVC) was designed, applied, and evaluated. Beyond explaining the significance of the findings, interpreting a machine learning model helps clarify why a particular decision was made. The interpretability of the proposed model, where the importance and contribution of each feature to the prediction are computed and visualized using the SHapley Additive exPlanations (SHAP) framework, is thus another novel aspect of this work. To evaluate the models, several performance metrics, including accuracy, recall, precision, F1-score, and ROC, were calculated; the results indicate that the quantum classifiers outperform the classical classifiers. The study also demonstrated that ensemble learning yields superior predictive performance for quantum classifiers compared to classical classifiers, which suggests that combining ensemble learning with quantum classifiers can produce successful results.

The remaining sections are organized as follows. Section 2 reviews related work. Section 3 describes the background material used in this study, which comprises the two main ingredients, ensemble learning (EL) and quantum machine learning (QML), as well as the methodology employed, with a focus on the dataset, the investigated models, and the proposed ensemble model. Section 4 presents the experimental study, the obtained results, and the model interpretation. Section 5 presents conclusions and future work.

Machine learning classification algorithms are widely used in many fields to solve numerous problems. Healthcare is a rich domain for machine learning, where it can be employed to support various medical decisions. Heart disease is a major health problem that researchers have investigated using novel machine learning methods. Ensemble learning is one of the machine learning methods that has been shown to boost predictive performance. A considerable number of previous works used ensemble learning to improve the accuracy of heart disease prediction through multiple methodologies. Among recent studies, Gao et al. [

Among several studies, the majority voting ensemble learning approach is largely used in the literature of heart disease [

On the other hand, few researchers have combined the ensemble learning approach with quantum machine learning to solve the heart disease prediction problem [

The field of machine learning investigates algorithms and techniques that allow computers to automatically find solutions to complex problems that traditional programming methods cannot solve. Machine learning can be leveraged to provide insights into the pattern in a dataset by trying to design an efficient model that learns from a training dataset to predict outcomes [

Ensemble learning is a popular machine learning approach that combines several models and then assembles their outputs to make more accurate predictions [

The bagging (or bootstrap aggregation) method produces one model at a time from a random sample (or bootstrap sample) that has the same size as the dataset [

Personal computers operate on the principles of classical physics and use a bit, which is either 0 or 1, as the fundamental unit of information. Quantum machines, however, employ a quantum bit (or qubit) that can occupy a superposition of the two basis states, 0 and 1 [

The state of a quantum system is given by a vector that has a particular notation in quantum systems called the Dirac notation which is denoted by ∣ψ⟩ [

The linear coefficients alpha (α), and beta (β) belong to the complex numbers,

A qubit can be visualized using a Bloch sphere, which is a geometric representation of the qubit states. Any continuous combination of the two states |0⟩ and |1⟩ corresponds to a point on the surface of the Bloch sphere. Hence, a generic quantum state |ψ⟩ can be represented by the three parameters

The global factor

A state in quantum computing can be represented in the Bloch sphere as a vector that starts at the origin and ends on the surface of the sphere, drawn as an arrow pointing to a location on the sphere. The three-dimensional graphical representation of a single qubit using the Bloch sphere is represented in
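The mapping from a qubit state to a point on the Bloch sphere can be sketched in a few lines of Python. The sketch below assumes the standard textbook parametrization |ψ⟩ = cos(θ/2)|0⟩ + e^{iφ} sin(θ/2)|1⟩, with the global phase dropped; the function names `qubit_from_angles` and `bloch_vector` are illustrative and are not taken from this study's code.

```python
import cmath
import math


def qubit_from_angles(theta, phi):
    """Amplitudes (alpha, beta) of cos(theta/2)|0> + e^{i*phi} sin(theta/2)|1>.

    Assumed textbook parametrization; the global phase is omitted.
    """
    alpha = complex(math.cos(theta / 2.0), 0.0)
    beta = cmath.exp(1j * phi) * math.sin(theta / 2.0)
    return alpha, beta


def bloch_vector(alpha, beta):
    """Cartesian Bloch coordinates (x, y, z) of a normalized single-qubit state."""
    overlap = alpha.conjugate() * beta
    x = 2.0 * overlap.real
    y = 2.0 * overlap.imag
    z = abs(alpha) ** 2 - abs(beta) ** 2
    return x, y, z


# Equal superposition of |0> and |1>: theta = pi/2, phi = 0.
alpha, beta = qubit_from_angles(math.pi / 2.0, 0.0)
norm = abs(alpha) ** 2 + abs(beta) ** 2  # normalization |alpha|^2 + |beta|^2
point = bloch_vector(alpha, beta)        # lies on the equator, along the x axis
```

With θ = 0 the same functions return the north pole of the sphere, which corresponds to |0⟩, and with θ = π they return the south pole, which corresponds to |1⟩.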

The field of Quantum Machine Learning (QML) lies at the intersection of machine learning and quantum computing [

The purpose of this study is to investigate the potential of quantum machine learning algorithms for predicting heart disease. To this end, the research was divided into four distinct phases. The first phase prepared the Cleveland dataset using various pre-processing techniques, including Recursive Feature Elimination (RFE) for feature selection, Principal Component Analysis (PCA) for feature extraction, and Min-Max normalization. In the second phase, classical classifiers (SVC and ANN) were compared to quantum classifiers. Three different quantum machine learning classifiers (QSVC, QNN, and VQC) were investigated in the third phase. Finally, a bagging ensemble learning model based on the Quantum Support Vector Classifier (Bagging-QSVC) was designed and implemented.
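As an illustration of the normalization step only, Min-Max scaling maps each feature column linearly onto the range [0, 1]. The following is a minimal sketch in plain Python; the helper name `min_max_scale` and the sample values are hypothetical and do not come from the Cleveland dataset or from this study's code.

```python
def min_max_scale(column):
    """Rescale a list of numbers to [0, 1] using (v - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical values for one numerical feature.
raw = [100, 150, 200]
scaled = min_max_scale(raw)
```

In practice the minimum and maximum would be computed on the training split and reused for the test split, so that no information from the test data leaks into the pre-processing.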

This research uses the UCI machine learning repository’s [

Feature code | Feature name | Data type | Description |
---|---|---|---|
Age | Age | Numerical (continuous) | Age in years |
Sex | Sex | Categorical (binary) | 1 = male, 0 = female |
CP | Chest pain type | Categorical (multi-valued) | Chest pain types: |
Trestbps | Resting blood pressure | Numerical (continuous) | Resting blood pressure (in mm Hg) |
Chol | Cholesterol | Numerical (continuous) | Serum cholesterol (in mg/dl) |
Fbs | Fasting blood sugar | Categorical (binary) | Fasting blood sugar > 120 mg/dl: |
Thalach | Maximum heart rate | Numerical (continuous) | Maximum heart rate reached during thallium test |
Restecg | Resting electrocardiographic result | Categorical (multi-valued) | Resting electrocardiographic result: |
Exang | Exercise-induced angina | Categorical (binary) | Exercise-induced angina: |
Oldpeak | ST depression | Numerical (continuous) | ST depression caused by exercise relative to rest |
Slope | ST slope | Categorical (multi-valued) | The peak exercise ST segment slope: 1 = ascending, 2 = flat, 3 = descending |
Ca | Number of major vessels | Categorical (multi-valued) | Number of main vessels: |
Thal | Thallium heart test | Categorical (multi-valued) | Exercise thallium heart test result: |
Target | Heart disease | Categorical (binary) | Heart disease diagnosis: |

In quantum computers, Quantum SVC is the quantum counterpart of the classical SVC. Since the classical SVC handles problems in higher dimension space, the computational resources required to solve them on classical computers can be costly and time-consuming [

Quantum Neural Networks combine the fundamentals of conventional ANN with quantum computation models that outperform conventional ANN [

Variational Quantum Classifier [

Since ensemble learning has been shown to enhance model performance, the model proposed in this work is a bagging ensemble with QSVC as the base classifier (Bagging-QSVC). QSVC was chosen for the bagging model because it achieved the highest performance among the three quantum classifiers. In the bagging ensemble method, each model is trained on a random sample of the dataset, where the random samples (or bootstrap samples) have the same size as the original dataset. Bagging uses sampling with replacement, which duplicates some instances of the original dataset and omits others in each bootstrap sample. Therefore, each bootstrap sample contained different instances, and these samples were used to train 100 different QSVC models that were fitted in parallel. Once the individual models in the ensemble had been trained, the ensemble aggregated their predictions by returning the class that gained the majority of the votes as the final output of the ensemble model. The proposed Bagging-QSVC model is illustrated in
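The bagging procedure described above (bootstrap samples of the same size as the dataset, one model per sample, majority voting) can be sketched in plain Python. This is only an illustration under stated assumptions: the base learner below is a simple nearest-centroid classifier used as a stand-in, not a QSVC, and all names (`bootstrap_sample`, `NearestCentroid`, `fit_bagging`, `predict_majority`) and the toy data are hypothetical rather than taken from this study's code.

```python
import random
from collections import Counter


def bootstrap_sample(X, y, rng):
    """Draw len(X) rows with replacement: some rows repeat, others are left out."""
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]
    return [X[i] for i in idx], [y[i] for i in idx]


class NearestCentroid:
    """Stand-in base learner (NOT a QSVC): predicts the class with the closest mean."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, v in enumerate(row):
                acc[j] += v
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict_one(self, row):
        def dist2(c):
            return sum((a - b) ** 2 for a, b in zip(row, self.centroids[c]))
        return min(self.centroids, key=dist2)


def fit_bagging(X, y, n_estimators=100, seed=0, base=NearestCentroid):
    """Train one base model per bootstrap sample."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        Xb, yb = bootstrap_sample(X, y, rng)
        models.append(base().fit(Xb, yb))
    return models


def predict_majority(models, row):
    """Return the class that receives the most votes across the ensemble."""
    votes = Counter(m.predict_one(row) for m in models)
    return votes.most_common(1)[0][0]


# Toy, clearly separable data (hypothetical, not the Cleveland dataset).
X = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [0.9, 1.0], [1.0, 0.9], [0.8, 1.0]]
y = [0, 0, 0, 1, 1, 1]
models = fit_bagging(X, y, n_estimators=100)
low = predict_majority(models, [0.05, 0.05])
high = predict_majority(models, [0.95, 0.95])
```

Replacing `NearestCentroid` with any classifier that exposes `fit` and `predict_one` leaves the sampling and voting logic unchanged, which is the property that lets a bagging ensemble wrap a quantum base classifier.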

The experimental study was conducted in Python using Qiskit in a Jupyter notebook. The code is accessible at […]. The experiments ran on a machine with a […]th-generation Core i7 processor and 8 GB of RAM.

To evaluate the performance of the predictive models applied in this work (QSVC, SVC, QNN, ANN, VQC, and Bagging-QSVC), a set of performance measures was used: accuracy, precision, recall, F1-measure, and the area under the curve (AUC), also called the ROC index. The definitions of these performance measures are provided below.

– Accuracy: The percentage of correctly classified instances relative to the number of all tested instances.

– Precision: The ratio between the number of positive instances that are correctly classified and all instances predicted as positive. Precision indicates how confident we can be that an instance predicted as positive is actually positive.

– Recall (or sensitivity): The ratio between the number of positive instances that are correctly classified and the number of all actual positive instances. Recall indicates how confident we can be that the model found all the instances with a positive target.

– F1-measure: The harmonic mean of precision and recall. The F1-measure, precision, and recall take values in the range [0,1], where larger values indicate better performance.

– The ROC index (or Area Under the Curve): The ROC curve relates the True Positive Rate (TPR) (the positive points correctly predicted as positive) to the False Positive Rate (FPR). The diagonal of the ROC curve represents the expected performance of a model with random predictions, while the closer the curve is to the upper left corner (or a higher AUC value), the more predictive the model. The ROC index or AUC can take on values between 0 and 1, with larger values indicating superior model performance [
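The measures listed above can be computed directly from the confusion-matrix counts and from the classifier scores. The sketch below is a minimal plain-Python illustration; the function names and the counts are hypothetical and are not the values obtained in this study. The AUC is computed with the rank-based formulation: the fraction of (positive, negative) pairs in which the positive instance receives the higher score, with ties counted as one half.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores, for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

An AUC of 0.5 from `roc_auc` corresponds to the diagonal of the ROC plot, that is, to random predictions.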

Classifier | Accuracy | Precision | Recall | F1-score |
---|---|---|---|---|
SVM | 85.24% | 0.85 | 0.85 | 0.85 |
ANN | 85.24% | 0.85 | 0.85 | 0.85 |
QSVC | 88.52% | 0.88 | 0.89 | 0.89 |
QNN | 86.84% | 0.88 | 0.86 | 0.87 |
VQC | 86.89% | 0.85 | 0.85 | 0.85 |
Bagging-QSVC | 90.16% | 0.90 | 0.90 | 0.90 |

[Figure: confusion matrix and ROC curve for each classifier: SVM, ANN, QSVC, QNN, VQC, and Bagging-QSVC]

The comparative analysis shows that the proposed Bagging-QSVC outperforms some previous studies in predicting heart disease with improved accuracy.

Study | Ensemble learning method | Accuracy achieved |
---|---|---|
Latha et al. [ | Majority voting | 85.48% |
Tama et al. [ | Stacking ensemble learning | 85.71% |
Mehanović et al. [ | Majority voting | 87.37% |
Alim et al. [ | Random Forest | 86.94% |
Kumar et al. [ | Quantum random forest | 89% |
Our proposed model | Bagging-QSVC | 90.16% |

The SHapley Additive exPlanations (SHAP) framework was used to interpret and explain the model results [

The Beeswarm plot represents feature importance in descending order, as well as feature impacts on prediction, whether positive or negative. The SHAP values are used to depict the impact of higher and lower feature values on the model output. The feature determines the position on the y-axis, and positive and negative SHAP values determine the position on the x-axis. Each point in the plot represents a single observation, and the colour represents how the higher and lower values of the feature affect the result, with red representing a higher value and blue representing a lower value of a feature.

The SHAP force plot visualizes the SHAP values of each feature as a force that either increases or decreases the prediction, representing the contribution of each feature to the model’s prediction of a particular observation. Each feature's force is represented by a red or blue arrow, depending on whether it increases or decreases the model's score. The features that have a greater influence on the prediction are located closer to the dividing line, and the size of the arrow represents the magnitude of this influence. The force plot illustrates the significance of each feature in adjusting the model to increase or decrease the prediction based on a SHAP value baseline. These characteristics counterbalance one another to determine the final prediction of the data instance.
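The additive structure behind the force plot, where the baseline plus the sum of the per-feature SHAP values equals the model output for the instance, can be checked on a small example. The sketch below computes exact Shapley values by enumerating feature subsets and replacing absent features with a single baseline value. It is an illustration only: it does not use the SHAP library, and the toy model, the feature values, and the function name `shapley_values` are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a single baseline point.

    Features outside a coalition are set to their baseline value. The cost is
    exponential in the number of features, so this only suits tiny examples.
    """
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(v):
    """Hypothetical score with one interaction term between features 0 and 2."""
    return 0.2 + 0.5 * v[0] - 0.3 * v[1] + 0.1 * v[0] * v[2]


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
# Additivity: baseline output + sum of contributions = output for x.
reconstructed = toy_model(baseline) + sum(phi)
```

In this toy case feature 0 pushes the score up, feature 1 pushes it down, and the interaction term is split equally between features 0 and 2, which is the kind of opposing contribution that the red and blue arrows of a force plot depict.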

As implied by its name, the stacked SHAP force plot combines multiple force plots, each of which depicts the prediction for one instance in the dataset. The stacked plot depicts the predictions for all samples in the dataset in a single plot by rotating the force plot of each instance by 90 degrees and stacking the rotated plots horizontally next to one another, ordered by their clustering similarity. Each x-axis position represents an instance of the data, while each y-axis position represents the baseline prediction. The red SHAP values result in a higher prediction, whereas the blue SHAP values result in a lower prediction. Values at the top of the vertical axis indicate a higher probability of being classified as class 1, whereas values at the bottom of the vertical axis indicate a higher probability of being classified as class 0.

Heart disease is a major cause of death, and early detection can help prevent the disease’s progression. Consequently, the objective of this study was to investigate the potential of quantum machine learning for predicting the risk of cardiovascular disease by developing an ensemble learning model based on quantum machine learning algorithms. The Bagging-QSVC model draws random bootstrap samples from the dataset and models each sample using QSVC. An ensemble is then formed using the majority voting method. On the Cleveland dataset, the proposed ensemble model achieved a higher classification accuracy of 90.16%, compared to 88.52%, 86.84%, and 85.25% for QSVC, QNN, and VQC, respectively, and 85.24% for both SVC and ANN. Accordingly, the various performance measures and ROC curves showed that the proposed model performed better than the other machine learning models. According to the findings of this study, quantum classifiers are more effective than classical classifiers. Furthermore, the results showed that ensemble learning models can enhance the performance of quantum classifiers. The model proposed in this study can be used for the early prediction of heart disease risk to avoid serious consequences, as well as for predicting and diagnosing other cardiovascular diseases and aiding in medical decision-making.

The authors would like to acknowledge the Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R196), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.