In this digital era, cardiovascular disease (CVD) has become the leading cause of death, claiming 17.9 million lives each year. Early diagnosis of people at higher risk of CVD allows them to receive proper treatment and helps prevent deaths, so a solution that predicts CVD with high accuracy is indispensable. A system for predicting cardiovascular disease using a Deep Neural Network with a Binarized Butterfly Optimization Algorithm (DNN–BBoA) is proposed. The BBoA is incorporated to select the best features, and the optimal features are fed to the deep neural network classifier, which improves prediction accuracy while reducing time complexity. The proposed system is tested on two datasets: the heart disease dataset from the UCI repository and the CVD dataset from the Kaggle repository. The proposed work is compared with machine learning classifiers such as Support Vector Machine, Random Forest, and Decision Tree. The proposed DNN–BBoA achieves an accuracy of 99.35% on the UCI heart disease dataset and 80.98% on the Kaggle cardiovascular disease dataset.

The American Heart Association identified cardiovascular disease as the underlying cause of nearly 868,662 deaths in the US, as per an updated fact sheet [

Amin et al. [

Li et al. proposed a work [

Devi et al. proposed a work [

Nawaz et al. designed a framework [

Different researchers have applied different optimization algorithms to arrive at optimal results. Vijay Mohan and Indumathi Ganesan proposed an application for person identification [

It is inferred from the literature survey that almost all machine learning algorithms have been applied to disease prediction, often combined with feature selection algorithms. Still, gaps remain in accuracy, in the use of only the required and significant features, and in handling the volume of health data. The focus here is therefore on maximizing accuracy via a deep artificial neural network and on minimizing execution time by selecting significant features instead of the entire feature set. The flow of the proposed system is as follows: the datasets used in this work are pre-processed, the significant features are extracted, and these features are fed as input to the classifier to predict whether the patient is affected by cardiovascular disease or not.

The highlights of the proposed work are:

The significant features influencing the disease are identified using the BBoA.

The deep neural network architecture is constructed for accuracy boosting.

The accuracy of the prediction is improved which facilitates the disease diagnosis in a better way.

The proposed Deep Neural Network with Binarized Butterfly Optimization Algorithm (DNN–BBoA) is designed to predict cardiovascular disease. The overall structure of the proposed work is shown in

The input features are binarized. The input which is binarized with the value “

The population which yields the best results when validated with the optimization function is termed G_best as in

The selected features of the G_best population are fed to the deep artificial neural network classifier. The datasets used in this work are the heart disease dataset from the UCI repository and the cardiovascular disease dataset from the Kaggle repository. The system comprises three modules: data pre-processing, feature selection, and classification.

The UCI dataset is normalized using standard normalization. The CVD dataset from Kaggle contains certain irrelevant values that do not lie within the range of a normal human being; these values are dropped from the dataset in the following pre-processing steps.

The normal range of values is listed in

Features | Range |
---|---|
Blood pressure | 60–200 mm Hg |
Cholesterol | 120–600 mg/dL |
Maximum heart rate | 60–200 bpm |
Old peak | 0–6 |
Systolic BP | 120–180 mm Hg |
Diastolic BP | 80–120 mm Hg |
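The range-based filtering step above can be sketched as follows. This is a minimal illustration, assuming the records are held as a list of dictionaries; the column names and the in-memory format are assumptions, not taken from the paper.

```python
# Sketch of the range-based pre-processing step: rows whose vital-sign
# values fall outside the plausible ranges in the table above are dropped.
# Column names (ap_hi, ap_lo, cholesterol_mgdl) are illustrative only.

PLAUSIBLE_RANGES = {
    "ap_hi": (120, 180),          # systolic BP, mm Hg
    "ap_lo": (80, 120),           # diastolic BP, mm Hg
    "cholesterol_mgdl": (120, 600),
}

def drop_irrelevant(rows):
    """Keep only rows whose every constrained feature lies in range."""
    def in_range(row):
        return all(lo <= row[col] <= hi
                   for col, (lo, hi) in PLAUSIBLE_RANGES.items()
                   if col in row)
    return [r for r in rows if in_range(r)]

rows = [
    {"ap_hi": 130, "ap_lo": 90, "cholesterol_mgdl": 200},   # kept
    {"ap_hi": 250, "ap_lo": 90, "cholesterol_mgdl": 200},   # dropped: ap_hi
]
clean = drop_irrelevant(rows)
```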

From the pre-processed data, it is necessary to select the appropriate subset of features before the learning process so that the learning can be effective. It is essential and appropriate for many applications such as disease prediction, fault prediction, and sentiment analysis as in [

There are two types of feature selection strategies: filter-based and wrapper-based methods. The filter-based approach extracts the dominant features using univariate statistics; these methods are inexpensive, with a low computational cost. The statistical metrics include the chi-square test, information gain, Fisher's score, and the correlation coefficient. The dominant and informative features can be selected after ranking them based on these statistical metrics.
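As a small illustration of filter-based ranking, the sketch below scores each feature by the absolute Pearson correlation with a binary target and keeps the top-k. The toy data, the correlation score (one of several possible filter metrics), and k are assumptions for demonstration only.

```python
import numpy as np

# Filter-based feature ranking: score each feature by |corr(feature, target)|
# on synthetic data where only feature 2 actually drives the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 2] + 0.1 * rng.normal(size=100) > 0).astype(float)

scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
top_k = np.argsort(scores)[::-1][:2]   # indices of the 2 best-ranked features
```

Because filter scores are computed once per feature, independently of any classifier, this ranking is cheap compared with wrapper search.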

The wrapper-based approach exhaustively searches over all possible combinations of features. There are two selection strategies: forward selection and backward elimination. Forward selection starts with an empty set of features and adds one feature at a time until there is no further enhancement in accuracy, whereas backward elimination starts with the entire set of features and prunes the dataset by eliminating features one at a time. Being exhaustive, this approach increases the time complexity. The feature selection here is instead inspired by a bio-inspired optimization method, the Binarized Butterfly Optimization Algorithm.
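The forward-selection strategy described above can be sketched as follows. Here `evaluate` is a stand-in for training and scoring a classifier on a candidate feature subset; the toy scoring function is an assumption used purely to make the loop runnable.

```python
# Hedged sketch of wrapper-style forward selection: greedily add the single
# feature that most improves the score, stopping when no addition helps.

def forward_select(features, evaluate):
    selected, best_score = [], float("-inf")
    while True:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        scored = [(evaluate(selected + [f]), f) for f in candidates]
        score, feat = max(scored)
        if score <= best_score:        # no further enhancement: stop
            break
        best_score, selected = score, selected + [feat]
    return selected

# Toy scoring function: only features "a" and "c" carry signal.
useful = {"a": 0.6, "c": 0.3}
chosen = forward_select(["a", "b", "c"],
                        lambda s: sum(useful.get(f, -0.01) for f in s))
```

Each pass over the candidates retrains the model once per remaining feature, which is why wrapper methods become expensive as the feature count grows.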

Optimization is the process of finding the best solution. Among the various optimization techniques, the bio-inspired or metaheuristic algorithms are influenced by the biological behavior of birds and animals and can yield improved classification accuracy. The Grey Wolf Optimization (GWO) algorithm imitates the natural leadership structure and hunting mechanism of grey wolves: four sorts of grey wolves (alpha, beta, delta, and omega) replicate the leadership hierarchy, and three primary processes of hunting (searching for prey, encircling prey, and attacking prey) accomplish the optimization. The Whale Optimization Algorithm mimics the hunting behavior of whales, using three main components to implement the search for prey, the encircling of prey, and the bubble-net foraging behavior of humpback whales. The butterfly optimization algorithm mimics the food search and mating behavior of butterflies.

Optimization algorithms are used to tune the performance of classifiers. The butterfly optimization algorithm belongs to the category of bio-inspired algorithms and is enhanced here to obtain finer classification results, which is crucial for health-care applications.

Using the dataset as-is in classification may degrade performance and increase the computational cost of the classifier. Feature subsets can instead be selected based on their contribution to faster and more effective prediction. The significant features are identified using the Butterfly Optimization Algorithm, which is based on the movement of butterflies toward the best fragrance. The butterflies sense the fragrance with the help of three parameters: the sensory modality (c), the stimulus intensity (I), and the power exponent (a).

The fragrance of the solution is calculated using the
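In the standard butterfly optimization algorithm, the fragrance of a solution is f = c · I^a, where c is the sensory modality, I the stimulus intensity (the fitness of the solution), and a the power exponent. The sketch below uses common BOA default parameter values, which are assumptions rather than values reported in the paper.

```python
# Standard BOA fragrance: f = c * I**a. Parameter defaults (c = 0.01,
# a = 0.1) are typical BOA choices, assumed here for illustration.

def fragrance(intensity, c=0.01, a=0.1):
    return c * intensity ** a

f = fragrance(intensity=1.0)   # 0.01 * 1.0**0.1 = 0.01
```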

To compute the best solution, there are two strategies followed. They are:

The global search moves through the solution space towards the global best solution g*, represented as x_{i}^{t+1} = x_{i}^{t} + (r^{2} × g* − x_{i}^{t}) × f_{i}, where f_{i} is the fragrance of the i^{th} solution, r is a random value in the range [0, 1], and x_{i}^{t} is the position of the i^{th} butterfly at iteration t.

The local search moves through the solution space towards a new random place, and the local search step is given by x_{i}^{t+1} = x_{i}^{t} + (r^{2} × x_{j}^{t} − x_{k}^{t}) × f_{i}, where x_{j}^{t} and x_{k}^{t} are the positions of two randomly chosen butterflies.

The movement pattern of the solution is correlated with the process of searching for the optimal solution (i.e., best fitness function). The search for the solution is done in local and global spaces. The context migration between local and global search is based on the probability value p which is selected and initialized as an algorithmic specific parameter. The above processes are repeated until a saturated point is reached and the algorithm emits the best solution with high accuracy.
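The switch between global and local search described above can be sketched as one iteration over a real-valued population. The update equations follow the standard BOA; the population, fitness function, and parameter values are assumptions for illustration.

```python
import random

# One BOA iteration: with probability p a butterfly moves toward the
# global best (global search); otherwise it takes a local random walk
# between two randomly chosen peers (local search).

def boa_step(positions, fitness, g_best, p=0.8, c=0.01, a=0.1):
    new_positions = []
    for x in positions:
        f_i = c * abs(fitness(x)) ** a            # fragrance of this solution
        r = random.random()
        if random.random() < p:                    # global search
            moved = [xi + (r * r * gb - xi) * f_i
                     for xi, gb in zip(x, g_best)]
        else:                                      # local search
            xj, xk = random.choice(positions), random.choice(positions)
            moved = [xi + (r * r * xj_d - xk_d) * f_i
                     for xi, xj_d, xk_d in zip(x, xj, xk)]
        new_positions.append(moved)
    return new_positions

random.seed(0)
pop = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
nxt = boa_step(pop, fitness=lambda x: sum(v * v for v in x), g_best=pop[0])
```

Repeating this step until the fitness saturates, while tracking the best-scoring position, yields the G_best solution described earlier.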

The BOA is modified to have a novel global and local search algorithm based on the binarization of the attribute values with the help of a randomly generated threshold 't', as given in the algorithm. Algorithm 3.2(a) presents the pseudo-code for the Binarized Butterfly Optimization Algorithm, where epoch refers to the number of iterations performed to obtain the optimal solution and
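The binarization step can be sketched as follows: each continuous position value is compared against a randomly generated threshold t to produce a 0/1 mask over the features, where 1 marks a selected feature. The exact thresholding rule in the paper's Algorithm 3.2(a) may differ; this is an assumed, simplified form.

```python
import random

# Binarize a continuous butterfly position into a feature-selection mask
# using a randomly generated threshold t in [0, 1).

def binarize(position, rng=random):
    t = rng.random()
    return [1 if v >= t else 0 for v in position]

random.seed(1)
mask = binarize([0.9, 0.1, 0.7, 0.4])
selected_features = [i for i, bit in enumerate(mask) if bit == 1]
```

Only the columns whose mask bit is 1 would then be passed on to the classifier.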

Different classifiers are available for prediction; machine learning classifiers such as Support Vector Machine, Random Forest, Naïve Bayes, and Decision Tree have been used by different researchers. The deep artificial neural network classifier increases the learning rate and yields higher accuracy.

Machine learning comes under the umbrella of artificial intelligence, and deep learning is in turn a subset of machine learning. The artificial neural network used for classifying data can yield precise results: it mimics the structure and behavior of the human nervous system and learns effectively, yielding better performance. The network has a sequence of layers for learning features, with higher layers extracting features from lower layers; this extracted information forms the backbone of the model with fewer hidden parameters.

The features extracted out of the BBoA are fed to the input layer. The Deep Neural Network facilitates learning by accomplishing different abstraction layers. The number of layers decides the depth of the network. The deep neural network structure used in the proposed system is depicted in

The feature transformation of weighted input to output is done with the help of activation functions. The rectified linear unit (ReLU) and sigmoid activation functions are used for the transformation of input features. ReLU is the default activation function and is predominantly used, yielding improved classification accuracy. The activation function is implemented using

The sigmoid activation function is the nonlinear function used for input transformation and it is implemented as follows using
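The two activation functions can be written out directly: ReLU(x) = max(0, x) and sigmoid(x) = 1 / (1 + e^(−x)).

```python
import math

# The two activation functions used in the network.

def relu(x):
    return max(0.0, x)                   # clips negative inputs to zero

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))    # squashes input into (0, 1)

values = [relu(-2.0), relu(3.0), sigmoid(0.0)]   # → [0.0, 3.0, 0.5]
```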

There are different metrics to measure the performance of the binary classifiers such as accuracy, sensitivity, specificity, and F-measure.

Accuracy is the most widely used measure for comparing classification performance. It is the ratio of correct predictions to the total number of instances. Accuracy is calculated from the numbers of true positive (TP), true negative (TN), false positive (FP), and false negative (FN) samples, which are represented as a confusion matrix of 2 × 2 dimensions for a binary classification problem. It is represented as follows.

Specificity measures the ability of the classifier to predict the negative instances correctly, i.e., how well it identifies the true negatives. If the specificity is 100%, the system correctly identifies all instances without cardiovascular disease. It is computed using

Sensitivity measures the ability of the classifier to predict the positive instances correctly, i.e., how well it identifies the true positives. If the sensitivity is 100%, the system correctly identifies all instances with cardiovascular disease. It is represented by the
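The three metrics can be computed directly from the confusion-matrix counts; the counts below are made-up illustrative numbers.

```python
# Accuracy, sensitivity (true positive rate), and specificity (true
# negative rate) from the four confusion-matrix cells.

def metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity

acc, sens, spec = metrics(tp=90, tn=80, fp=20, fn=10)
# acc = 170/200, sens = 90/100, spec = 80/100
```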

The proposed DNN–BBoA system is tested on two datasets: the CVD dataset from the UCI machine learning repository and the CVD dataset from the Kaggle repository. Much prior research has focused on improving the prediction accuracy on the UCI machine learning dataset. The public cardiovascular disease data from the UCI repository covers the Cleveland, Hungary, Switzerland, and VA Long Beach datasets, which together form a merged database of 1025 records with 13 features. The target is a binary output, '0' or '1', indicating whether the patient is affected by CVD. The dataset taken from Kaggle, with 12 attributes and 70,000 instances, is also tested using the proposed system. The dataset descriptions are given in the

Sl. No | Attributes |
---|---|
1. | Age |
2. | Sex |
3. | Chest pain |
4. | Resting blood pressure |
5. | Serum cholesterol |
6. | Fasting blood sugar > 120 mg/dl |
7. | Resting electrocardiographic |
8. | Maximum heart rate |
9. | Exercise induced angina |
10. | Old peak |
11. | The slope of the peak exercise ST segment |
12. | Number of major vessels (0–3) coloured by fluoroscopy |
13. | Thallium scan |
14. | CVD–yes/no–target variable |

Sl. No | Attributes |
---|---|
1. | Id |
2. | Age |
3. | Gender |
4. | Height |
5. | Weight |
6. | Systolic blood pressure |
7. | Diastolic blood pressure |
8. | Cholesterol |
9. | Glucose |
10. | Smoke |
11. | Alcohol intake |
12. | Active |
13. | Cardio-target variable |

The features such as Age, Blood Pressure, Glucose, Cholesterol contribute to disease prediction. The heart disease dataset from the UCI machine learning dataset has

Different possibilities are explored to design an effective prediction system. The exploration is implemented and compared with different numbers of features, and these experimental results are shown in

As these results were not promising, the proposed algorithm was deployed to improve them further: the Binarized Butterfly Optimization Algorithm is applied to the raw dataset, yielding optimal features with higher accuracy, as explained in visual

The results are enhanced by applying the proposed system (DNN–BBoA), which is exercised on both datasets. The proposed algorithm (BBoA) assumes the values of the parameters

To obtain an efficient deep-layer architecture, different numbers of deep layers are exercised to achieve higher accuracy. The Rectified Linear Unit activation function, with different numbers of hidden neurons, is exercised on both datasets. The results of the different deep architectures are presented in

Layer | Activation function used | Heart disease data set: shape of features/weights | Heart disease data set: number of parameters (shape × shape + shape) | CVD data set: shape of features/weights | CVD data set: number of parameters (shape × shape + shape) |
---|---|---|---|---|---|
Dense layer 1 (input layer) | | 13 | 80 | 11 | |
Dense layer 2 (hidden layer 1) | ReLU | 14 | 210 | 14 | 210 |
Dense layer 3 (hidden layer 2) | ReLU | 14 | 210 | 14 | 210 |
Dense layer 4 (hidden layer 3) | ReLU | 14 | 210 | 14 | 210 |
Dense layer 5 (hidden layer 4) | ReLU | 14 | 210 | 14 | 210 |
Dense layer 6 (output layer) | Sigmoid | Prediction | | | |
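The parameter counts in the table follow from the fact that a dense layer with n inputs and m units has n·m + m trainable parameters, so each 14-to-14 hidden layer contributes 14 × 14 + 14 = 210 parameters. The sketch below checks that count and runs a tiny forward pass with ReLU hidden layers and a sigmoid output; the toy layer sizes and random weights are assumptions, not the trained model.

```python
import math
import random

# Dense-layer parameter count: weights (n_in * n_out) plus biases (n_out).
def dense_params(n_in, n_out):
    return n_in * n_out + n_out

hidden_params = dense_params(14, 14)   # 210, matching the table above

def forward(x, layers):
    """Forward pass: ReLU on hidden layers, sigmoid on the output layer."""
    for i, (W, b) in enumerate(layers):
        x = [sum(wi * xi for wi, xi in zip(row, x)) + bi
             for row, bi in zip(W, b)]
        last = i == len(layers) - 1
        x = [1 / (1 + math.exp(-v)) if last else max(0.0, v) for v in x]
    return x

random.seed(0)
layers = [([[random.uniform(-0.1, 0.1) for _ in range(2)] for _ in range(3)],
           [0.0] * 3),                                   # 2 -> 3 hidden (ReLU)
          ([[random.uniform(-0.1, 0.1) for _ in range(3)]], [0.0])]  # 3 -> 1 (sigmoid)
prob = forward([1.0, -1.0], layers)[0]   # predicted CVD probability in (0, 1)
```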

The performance of the proposed work is compared by incorporating different optimization algorithms along with different machine learning algorithms. The Grey Wolf Optimization algorithm was tested on both the UCI and Kaggle datasets and achieved accuracies of 90.58% and 75.53%, respectively. The Whale Optimization Algorithm was tested on both datasets, reaching accuracies of 88.56% and 76.69%, respectively. These algorithms, however, incurred increased time complexity even on the smaller dataset.

The experimental results shown in

The performance comparison shown in

State-of-the-art work | Methodology used (feature selection + classifier) | Sensitivity (%) | Specificity (%) | Accuracy (%) |
---|---|---|---|---|
Ghadiri Hedeshi et al. [ | Particle swarm optimization + boosting approach | 90.02 | 82.31 | 85.76 |
Ul Haq et al. [ | Relief, LASSO, mRMR + logistic regression and support vector machine | 98 | 98 | 89 |
Li et al. [ | FCMIM + support vector machine | 92.5 | 98 | 92.37 |
Ghosh et al. [ | LASSO + random forest bagging method | 97.57 | 99 | 99.05 |
Saqib Nawaz [ | Gradient descent optimization | 99.43 | 97.6 | 98.54 |
Mohan et al. [ | Hybrid random forest and linear method | 92.8 | 82.6 | 88.47 |
Proposed DNN–BBoA | Binarized butterfly optimization algorithm + deep neural network | 99.34 | 99.24 | |

We thank the anonymous referees for their helpful suggestions.
