Arrhythmia has been classified using a variety of methods. Because of the dynamic nature of electrocardiogram (ECG) data, traditional handcrafted approaches are difficult to apply, which makes machine learning (ML) solutions more appealing. Competent monitoring can save the lives of patients with cardiac arrhythmias, and cardiac arrhythmia classification and prediction have improved greatly in recent years. Arrhythmias are a category of conditions in which the heart's electrical activity is abnormally rapid or slow. Every year, they are among the leading causes of death worldwide for both men and women. For the classification of arrhythmias, this work proposes a novel technique based on optimized feature selection and an optimized K-nearest neighbors (KNN) classifier. The proposed method uses the 279-attribute, high-dimensional cardiac arrhythmia dataset from the UCI repository. The approach assigns cardiac arrhythmia patients to one of 16 groups based on the features of the electrocardiography dataset. The purpose is to design an efficient intelligent system that employs the dipper throated optimization method to categorize cardiac arrhythmia patients. This method of comprehensive arrhythmia classification outperforms earlier methods presented in the literature. The classification accuracy achieved using the proposed approach is 99.8%.

Heart disease is one of the most common diseases in the world and a major cause of fatality and morbidity, with over 385,000 individuals dying each year. A heart attack happens every 34 s [

The electrocardiogram (ECG) represents the electrical activity of the heart and is picked up using electrodes placed around the heart of the patient. The ECG signal provides information about heart beats, the regularity of the heart rhythm, and the detection of heart failure along with disorders that affect the performance of the heart. A normal ECG signal is shown in

The normal ECG signal can be divided into three waves (P, QRS, T), two segments (PR, ST), and four intervals (PR, QRS, ST, QT). The P wave represents the contraction of the atria; its amplitude lies between 0.2 and 0.25 mV and its duration between 0.06 and 0.12 s. The QRS complex marks the start of the contraction of the ventricles; its amplitude lies between 0.5 and 3 mV and its duration between 0.06 and 0.1 s. The T wave corresponds to the relaxation of the ventricles; its amplitude is in the range 0.1–0.8 mV and its duration between 0.05 and 0.25 s. In a normal ECG recording, the PR interval is between 0.12 and 0.2 s, the QRS duration between 0.06 and 0.10 s, the QT interval between 0.36 and 0.44 s, and the ST segment between 0.08 and 0.12 s. The ECG is a non-stationary signal with multiple frequencies: the QRS wave oscillates faster than the T wave, which in turn is faster than the P wave. The ECG signal acquires noise from the power supply and interference from the breathing muscles, both of which have to be removed before the signal is processed. This makes the ECG signal difficult to analyze, and clinical observation also takes a long time. For these reasons, automatic computer analysis of the ECG signal is preferred. An arrhythmia is a deviation of the ECG signal from the normal measurements above; it reflects a fast or slow heartbeat (tachycardia or bradycardia) or an irregular ECG pattern.
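As a rough illustration of how the normal ranges quoted above could be used to flag a suspicious beat, the following minimal sketch compares measured durations with those ranges. The function name, the dictionary layout, and the example beat are hypothetical and are not part of the proposed method, which classifies with KNN rather than fixed thresholds.

```python
# Normal duration ranges in seconds, taken from the paragraph above.
NORMAL_RANGES_S = {
    "PR interval": (0.12, 0.20),
    "QRS duration": (0.06, 0.10),
    "QT interval": (0.36, 0.44),
    "ST segment": (0.08, 0.12),
}


def out_of_range(measurements):
    """Return the names of measurements that fall outside the normal range."""
    flagged = []
    for name, value in measurements.items():
        low, high = NORMAL_RANGES_S[name]
        if not (low <= value <= high):
            flagged.append(name)
    return flagged


# Hypothetical beat with a prolonged PR interval (values in seconds).
example_beat = {
    "PR interval": 0.24,
    "QRS duration": 0.08,
    "QT interval": 0.40,
    "ST segment": 0.10,
}
flagged = out_of_range(example_beat)
print(flagged)
```

A rule of this kind only checks individual durations; it says nothing about rhythm regularity or wave morphology, which is one reason a learned classifier is attractive.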

Ventricular arrhythmias are a common type of cardiac arrhythmia that leads to an abnormal heartbeat, and they cause approximately 79% of sudden deaths. Cardiac arrest can be avoided if the arrhythmia is detected in time. Regular heart rate monitoring is therefore essential to prevent cardiovascular disease. Arrhythmia detection in ECG signals has attracted many researchers [

We present a new approach to improve the prediction accuracy of arrhythmia classification. We employ the dipper throated optimization algorithm to optimize feature selection and the K-nearest neighbors classifier to assign patients to one of 16 arrhythmia types. The medical industry can benefit greatly from this approach to arrhythmia categorization, which helps in determining whether an arrhythmia exists. The dataset for the simulation came from the UCI Machine Learning Repository, and the results showed a significant improvement in classification accuracy.

The rest of the paper is organized as follows. The basis of arrhythmia categorization is discussed in Section 2. The proposed method and strategy are explained in Section 3. The simulation results for the proposed technique are presented in Section 4. Section 5 concludes this work by summarizing the results and suggesting areas for further investigation.

Several methods for detecting and classifying cardiac arrhythmia have been presented in the last two decades. Simple statistical learning, traditional machine learning, and more contemporary deep learning techniques are all examples of these approaches.

Filtering to remove noise from ECG [

The use of particle swarm optimization (PSO) to increase the performance of SVM in the classification of five types of arrhythmia, by fine-tuning the discriminator function for selecting the best features, is introduced in [

A novel classifier is proposed to classify seventeen different types of arrhythmias [

A suggested algorithm used the short-time Fourier transform and the wavelet transform for the classification of atrial fibrillation. Also, a CNN model is built to interpret ECG segments [

Ref. | Database | Method | ACC (%) | Sensitivity (%) | Specificity (%) |
---|---|---|---|---|---|
[ | MIT-BIH arrhythmia | CNN | 96.00 | 95.49 | 94.19 |
[ | MIT-BIH arrhythmia | CNN-LSTM | 98.00 | 97.87 | 98.57 |
[ | MIT-BIH arrhythmia | Optimize CNN | 93.19 | 93.98 | 95.00 |
[ | ECG (Shaoxing Hospital) | CNN | 93.19 | 95.00 | 94.30 |
[ | MIT-BIH arrhythmia | CNN with focal loss | 98.55 | 82.00 | 79.00 |
[ | MIT-BIH arrhythmia | CNN with focal loss | 97.40 | 96.7 | 97.8 |
[ | MIT-BIH arrhythmia | CNN | 95.30 | 94.2 | 95.00 |
[ | China ECG Challenge | Cascade CNN | 86.50 | 85.3 | 82.00 |
[ | MIT-BIH arrhythmia | SVM | 99.51 | 99.28 | 99.63 |
[ | MIT-BIH arrhythmia | CNN | 97.16 | 99.28 | 99.63 |
[ | MIT-BIH arrhythmia | CNN | 97.38 | 92.00 | 95.63 |

In this paper, we propose applying the dipper throated optimization (DTO) algorithm to select the significant features of the given dataset. In addition, we employ the DTO algorithm to optimize the parameters of the KNN classification algorithm. In practice, the process starts with preprocessing the records of the dataset to ensure the consistency and integrity of the recorded data. The following sections present the main steps of the proposed approach.

Some features in the arrhythmia dataset have large numeric values while others have very small ones, and this difference in scale has a significant impact on classification accuracy. Many attributes in the dataset used in this research vary over a wide numeric range. For such attributes, data normalization is employed to limit the impact of higher-valued features and thereby improve the classification model's performance. The numeric stability of the proposed approach is improved by using a scaling and centering strategy for data normalization [
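One common reading of "scaling and centering" is z-score standardization, in which each feature column is shifted to zero mean and divided by its standard deviation. The sketch below assumes that reading; the text does not state which variant was actually used, and the function name and sample records are hypothetical.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Center each column to zero mean and scale it to unit standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has zero spread; divide by 1.0 so it simply becomes 0.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical records: column 0 holds large values, column 1 small ones.
records = [[80.0, 0.1], [100.0, 0.3], [120.0, 0.2]]
scaled = standardize(records)
for row in scaled:
    print([round(value, 3) for value in row])
```

After this step both columns contribute on the same scale to a distance-based classifier such as KNN, which is the motivation given in the paragraph above.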

The dipper throated optimization (DTO) algorithm has proved to be an efficient metaheuristic optimization algorithm. It is inspired by the hunting behavior of the dipper throated bird, which performs rapid bowing movements. The main formulation of this algorithm is expressed in terms of the following equations [

where the location and speed of the

Feature selection poses a distinct challenge because the search space is restricted to the binary values 0 and 1. Consequently, we use the sigmoid function to convert the continuous output of the standard optimizer into a binary one. To fit the feature selection problem, we apply the following equation to transform the continuous solution to binary.

where
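The continuous-to-binary step can be sketched as follows. This is only an illustration under stated assumptions: it uses the plain sigmoid 1 / (1 + e^(-x)) and a 0.5 threshold, whereas the exact sigmoid parameters and threshold used in the paper are not recoverable from the text here. The function name and the sample vector are hypothetical.

```python
import math


def binarize(position, threshold=0.5):
    """Map a continuous position vector to a 0/1 feature mask."""
    mask = []
    for value in position:
        probability = 1.0 / (1.0 + math.exp(-value))  # plain sigmoid (assumed form)
        mask.append(1 if probability >= threshold else 0)
    return mask


# Hypothetical continuous solution for five candidate features.
continuous = [2.3, -1.1, 0.0, 0.7, -3.2]
mask = binarize(continuous)
selected = [index for index, bit in enumerate(mask) if bit == 1]
print(mask, selected)
```

A mask entry of 1 keeps the corresponding feature and 0 drops it, so each candidate solution of the optimizer maps to one feature subset.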

The KNN prediction for a query instance is the majority class among its nearest neighbors. To find the K nearest neighbors, the classifier ranks the training examples by their distance to the query instance and keeps the K closest. Euclidean distance, which is defined in the following equation, is a widely used distance metric.

where each entry of the training set is indexed by $i$ and $j$, denoting the $i^{th}$ sample and the $j^{th}$ feature dimension. The detailed steps of KNN classifier are shown in
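The majority-vote rule with Euclidean distance can be sketched in a few lines. This is a minimal illustration, not the optimized classifier evaluated in the paper: K is fixed by hand rather than tuned by DTO, and the tiny training set and class labels are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest samples."""
    distances = sorted(
        (euclidean(sample, query), label)
        for sample, label in zip(train_x, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # On a tied vote, the label that appears first among the nearest wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-feature training set with two classes.
train_x = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1], [1.2, 0.8]]
train_y = [1, 1, 2, 2, 2]
prediction = knn_predict(train_x, train_y, [0.2, 0.1], k=3)
print(prediction)
```

Because every prediction depends on raw distances, features with large numeric ranges dominate unless the data are normalized first.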

The UCI ML Repository [

To prove the effectiveness of the proposed approach, a set of experiments were conducted to assess the performance of the steps of the proposed approach. The first experiment was conducted to assess the feature selection process using the dipper throated optimization algorithm. The results presented in

Metric | bDTO | bGWO | bPSO | bBA | bWAO | bFA | bGA |
---|---|---|---|---|---|---|---|
Average error | 0.6310 | 0.6482 | 0.682 | 0.6916 | 0.6818 | 0.6804 | 0.6618 |
Average select size | 0.5838 | 0.7838 | 0.7838 | 0.9232 | 0.9472 | 0.8183 | 0.7262 |
Average fitness | 0.6942 | 0.7104 | 0.7088 | 0.7317 | 0.7166 | 0.7607 | 0.7218 |
Best fitness | 0.5960 | 0.6307 | 0.6891 | 0.6214 | 0.6807 | 0.6794 | 0.6251 |
Worst fitness | 0.6945 | 0.6976 | 0.7568 | 0.7230 | 0.7568 | 0.7770 | 0.7402 |
Standard deviation of fitness | 0.5165 | 0.5212 | 0.5206 | 0.5305 | 0.5228 | 0.5574 | 0.5228 |

In addition, to choose the classifier best suited to the task at hand, three classifiers were evaluated.

Metric | KNN | SVM | NN |
---|---|---|---|
AUC | 0.951 | 0.946 | 0.938 |
MSE | 0.000289 | 0.000311 | 0.000116 |

Another set of experiments is conducted to evaluate the performance of the optimization applied to the KNN classifier. In these experiments, five optimization approaches were used to optimize the parameters of KNN classifier, and the results are recorded in

Statistic | DTO+KNN | WOA+KNN | GWO+KNN | GA+KNN | PSO+KNN |
---|---|---|---|---|---|
Number of values | 13 | 13 | 13 | 13 | 13 |
Minimum | 0.998 | 0.9551 | 0.956 | 0.969 | 0.961 |
25% Percentile | 0.999 | 0.971 | 0.976 | 0.989 | 0.981 |
Median | 0.999 | 0.971 | 0.976 | 0.989 | 0.981 |
75% Percentile | 0.999 | 0.971 | 0.976 | 0.989 | 0.981 |
Maximum | 0.999 | 0.981 | 0.9786 | 0.989 | 0.981 |
Range | 0.001 | 0.0259 | 0.0226 | 0.02 | 0.02 |
Mean | 0.9988 | 0.9705 | 0.9747 | 0.9867 | 0.9787 |
Std. Deviation | 0.0003755 | 0.005402 | 0.005653 | 0.005991 | 0.005991 |
Std. Error of Mean | 0.0001042 | 0.001498 | 0.001568 | 0.001662 | 0.001662 |
Coefficient of variation | 0.03760% | 0.5566% | 0.5800% | 0.6072% | 0.6122% |
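The last two rows of the table follow from the mean, the standard deviation, and the number of values by the standard definitions: the standard error of the mean is the standard deviation divided by the square root of the sample size, and the coefficient of variation is the standard deviation divided by the mean. The short sketch below recomputes them for two columns; the function name is hypothetical, and small differences in the last digit come from the rounded inputs.

```python
import math


def sem_and_cv(mean_value, std_value, n):
    """Return (standard error of the mean, coefficient of variation in percent)."""
    sem = std_value / math.sqrt(n)
    cv_percent = 100.0 * std_value / mean_value
    return sem, cv_percent


# Mean, standard deviation and sample size as reported in the table above.
dto_sem, dto_cv = sem_and_cv(0.9988, 0.0003755, 13)
woa_sem, woa_cv = sem_and_cv(0.9705, 0.005402, 13)
print(f"DTO+KNN: SEM={dto_sem:.7f}, CV={dto_cv:.4f}%")
print(f"WOA+KNN: SEM={woa_sem:.6f}, CV={woa_cv:.4f}%")
```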

The Wilcoxon signed rank test is performed to measure the statistical difference between the proposed approach and the other approaches.

Statistic | DTO+KNN | WOA+KNN | GWO+KNN | GA+KNN | PSO+KNN |
---|---|---|---|---|---|
Theoretical median | 0 | 0 | 0 | 0 | 0 |
Actual median | 0.999 | 0.971 | 0.976 | 0.989 | 0.981 |
Number of values | 13 | 13 | 13 | 13 | 13 |
Wilcoxon signed rank test | | | | | |
Sum of signed ranks (W) | 91 | 91 | 91 | 91 | 91 |
Sum of positive ranks | 91 | 91 | 91 | 91 | 91 |
Sum of negative ranks | 0 | 0 | 0 | 0 | 0 |
P value (two tailed) | 0.0002 | 0.0002 | 0.0002 | 0.0002 | 0.0002 |
Exact or estimate? | Exact | Exact | Exact | Exact | Exact |
P value summary | *** | *** | *** | *** | *** |
Significant (alpha = 0.05)? | Yes | Yes | Yes | Yes | Yes |
How big is the discrepancy? | | | | | |
Discrepancy | 0.999 | 0.971 | 0.976 | 0.989 | 0.981 |
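The identical W and P values across columns follow directly from the test setup: with a theoretical median of 0 and 13 positive values, every rank is positive, so W = 1 + 2 + ... + 13 = 91, and the exact two-tailed P value is 2 / 2^13 ≈ 0.0002. The sketch below reproduces this by enumerating all sign patterns. The 13 accuracy values are hypothetical stand-ins chosen to be consistent with the DTO+KNN summary statistics, since the individual runs are not listed, and the function name is likewise an assumption.

```python
from itertools import product


def signed_rank_exact(values, hypothesized_median=0.0):
    """One-sample Wilcoxon signed rank test with an exact two-tailed P value."""
    diffs = [v - hypothesized_median for v in values if v != hypothesized_median]
    n = len(diffs)
    order = sorted(range(n), key=lambda idx: abs(diffs[idx]))
    ranks = [0.0] * n
    start = 0
    while start < n:  # assign average ranks to tied absolute differences
        end = start
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    w_pos = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_neg = sum(r for r, d in zip(ranks, diffs) if d < 0)
    observed = abs(w_pos - w_neg)
    # Under the null hypothesis each rank is equally likely to be + or -.
    extreme = sum(
        1
        for signs in product((1, -1), repeat=n)
        if abs(sum(s * r for s, r in zip(signs, ranks))) >= observed - 1e-9
    )
    return w_pos, w_neg, extreme / 2 ** n


# Hypothetical 13 accuracy values (not the paper's raw runs).
accuracies = [0.998] * 3 + [0.999] * 10
w_pos, w_neg, p_value = signed_rank_exact(accuracies)
print(w_pos, w_neg, round(p_value, 4))
```

Any set of 13 strictly positive values gives the same W and P against a hypothesized median of 0, which is why all five columns report identical test results.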

In addition, a one-way analysis of variance (ANOVA) test is performed to study the difference between the proposed approach and the other approaches. The results are shown in

ANOVA table | SS | DF | MS | F (DFn, DFd) | P value |
---|---|---|---|---|---|
Treatment (between columns) | 0.006523 | 4 | 0.001631 | F (4, 60) = 61.27 | |
Residual (within columns) | 0.001597 | 60 | 2.66E-05 | | |
Total | 0.008119 | 64 | | | |
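The entries of the ANOVA table are linked by simple arithmetic: each mean square (MS) is the sum of squares (SS) divided by its degrees of freedom (DF), and the F statistic is the ratio of the treatment MS to the residual MS. The following sketch recomputes these quantities from the reported SS and DF values; the function name is hypothetical.

```python
def anova_f(ss_treatment, df_treatment, ss_residual, df_residual):
    """Return (MS treatment, MS residual, F statistic) for a one-way ANOVA."""
    ms_treatment = ss_treatment / df_treatment
    ms_residual = ss_residual / df_residual
    return ms_treatment, ms_residual, ms_treatment / ms_residual


# SS and DF values as reported in the table above.
ms_t, ms_r, f_statistic = anova_f(0.006523, 4, 0.001597, 60)
print(f"MS treatment = {ms_t:.6f}")
print(f"MS residual  = {ms_r:.2e}")
print(f"F(4, 60)     = {f_statistic:.2f}")
```

The degrees of freedom are also consistent with the design: five groups give 5 - 1 = 4 between-group DF, and 5 × 13 = 65 values give 65 - 5 = 60 within-group DF.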

Moreover,

In this paper, we proposed a new approach for the categorization of arrhythmia based on the dipper throated optimization algorithm. This algorithm is used for both feature selection and optimization of the parameters of the KNN classifier. The proposed approach outperformed earlier ML and optimization frameworks in terms of accuracy. The arrhythmia dataset was obtained from the UCI ML repository. In the future, we plan to use this optimization approach with clustering and noise reduction methods in other domains. Because the majority of the examples in the dataset used in this study belong to class 1, while the other classes contain only two to three instances each, the risk of misclassification increases when different methods are applied. Because class 1 has the most impact on the prediction model's output, collecting as many cases as feasible in the other classes is required to improve predictions in the future. If the characteristics of the arrhythmia dataset were grouped by their physical similarity, the algorithm's results would be more helpful. Cases with aberrant P waves, for example, could be grouped together, and all variables with abnormal Q waves could likewise be grouped together. The results of various methods could then be compared.

Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R308), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.